CN115758284B - Photovoltaic hot spot fault detection method and system based on fusion kernel entropy and information gain - Google Patents

Publication number
CN115758284B
CN115758284B CN202211418893.2A CN202211418893A CN115758284B CN 115758284 B CN115758284 B CN 115758284B CN 202211418893 A CN202211418893 A CN 202211418893A CN 115758284 B CN115758284 B CN 115758284B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211418893.2A
Other languages
Chinese (zh)
Other versions
CN115758284A (en)
Inventor
易辉
蒋尚俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Application filed by Nanjing Tech University
Priority to CN202211418893.2A
Publication of CN115758284A
Application granted
Publication of CN115758284B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00: Energy generation through renewable energy sources
    • Y02E10/50: Photovoltaic [PV] energy

Landscapes

  • Photovoltaic Devices (AREA)
  • Measurement Of Radiation (AREA)

Abstract

The invention discloses a photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain. The method comprises: fusing a test sample and a training sample to obtain a fusion sample; standardizing the fusion sample; projecting the fusion sample into a high-dimensional space by using a Gaussian kernel function; performing eigendecomposition on the kernel matrix to obtain eigenvalues and eigenvectors; selecting eigenvalues and eigenvectors to calculate the feature information of the fusion sample; determining a fault detection threshold by using the feature information of the training sample; and judging whether the photovoltaic module has a hot spot fault by comparing a fault detection variable with the fault detection threshold. The invention extracts the feature information of the samples by kernel entropy component analysis, which does not require the assumption that the sample data follow a Gaussian distribution, so its range of application is wider. Because kernel entropy component analysis is a feature extraction method based on information entropy, information gain is introduced as the detection variable for hot spot fault detection, which achieves a better detection effect.

Description

Photovoltaic hot spot fault detection method and system based on fusion kernel entropy and information gain
Technical Field
The invention relates to a photovoltaic hot spot fault detection method and system based on fusion kernel entropy and information gain, and belongs to the technical field of photovoltaic hot spot fault diagnosis.
Background
With the worsening of the energy crisis and environmental pollution, new energy technologies are being vigorously developed. Among new energy sources, photovoltaic power generation has become a research focus by virtue of being clean and efficient. Photovoltaic power generation systems are usually deployed in complex environments and are prone to faults caused by external influences. The hot spot fault is one of the main faults of a photovoltaic power generation system; a serious hot spot fault disturbs the stable operation of the system and can even endanger personal safety.
At present, the main detection method for photovoltaic hot spot faults is based on infrared images: an inspector acquires an infrared image of the photovoltaic module with an infrared thermal imager and judges whether a hot spot fault has occurred according to the temperature difference caused by the hot spot phenomenon. For small photovoltaic power generation systems, this approach is clearly unsuitable. With the development of big data, it has become feasible to judge the operating state of a photovoltaic power generation system by analyzing its monitoring data. A hot spot detection method based on electrical parameters does not require inspectors to enter the field, which greatly reduces their workload. In the prior art, photovoltaic hot spot fault detection generally adopts a prediction model based on data trees. For example, CN 107451600A discloses an online photovoltaic hot spot fault detection method based on an isolation mechanism, which can detect a hot spot fault of any photovoltaic panel in an online photovoltaic array, but it requires repeated sampling based on experience, and the detection results are inaccurate.
Disclosure of Invention
The invention aims to overcome the problems and shortcomings of the prior art by providing a photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain.
To solve the above technical problems, the invention adopts the following technical solution:
A photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain comprises the following steps:
Step 1, collecting photovoltaic module data to form sample data, wherein the training sample is $X_{train}$ and the test sample is $X_{test}$; fusing the training sample $X_{train}$ and the test sample $X_{test}$ to obtain a fusion sample $X_f$;
Step 2, calculating the mean and standard deviation of the fusion sample $X_f$, and standardizing $X_f$ into a normalized fusion sample $X'_f$ with mean 0 and standard deviation 1;
Step 3, selecting a Gaussian kernel function as the projection function, and using it to project $X'_f$ into a high-dimensional space to obtain a kernel matrix $K_f$;
Step 4, performing eigendecomposition on the kernel matrix $K_f$ to obtain an eigenvalue diagonal matrix $D$ and an eigenvector matrix $E$, calculating the Rényi entropy values, and sorting the eigenvalues and the corresponding eigenvectors in descending order of Rényi entropy value;
Step 5, selecting eigenvalues and eigenvectors according to the Rényi entropy values, and calculating the feature information $Y_f$;
Step 6, using the feature information $Y_{train}$ of the training sample to calculate the detection variable $F_{train}$ of the training sample and determine a fault detection threshold $F_m$;
Step 7, calculating the detection variable $F_{test}$ of the test sample, comparing $F_{test}$ with the detection threshold $F_m$, and judging whether the photovoltaic module has a hot spot fault.
Preferably, step 1 specifically comprises the following:
Characteristic data of the photovoltaic module are collected to form sample data $X = [x_1, x_2, \ldots, x_\theta, \ldots, x_S]^T$, where $S$ is the total number of samples collected, $\theta = 1, 2, \ldots, S$, and $x_\theta$ is the characteristic data of the photovoltaic module collected at the $\theta$-th sampling. The characteristic data comprise irradiance $G$ (W/m²), temperature $T$ (°C), open-circuit voltage $U_O$ (V), short-circuit current $I_S$ (A), maximum power point current $I_M$ (A), maximum power point voltage $U_M$ (V), maximum power $M_P$ (W) and fill factor $F$ (dimensionless). $n$ consecutive normal samples are selected from the sample data $X$ to form the training sample $X_{train} = [x_1, x_2, \ldots, x_n]^T$, where each of the $n$ samples contains $t$ photovoltaic module characteristic values. $m$ consecutive samples are selected from the remainder of $X$ to form the test sample $X_{test} = [x_1, x_2, \ldots, x_m]^T$, where each of the $m$ samples contains $t$ characteristic values. The training sample $X_{train}$ and the test sample $X_{test}$ are fused to obtain the fusion sample $X_f$:
$$X_f = [X_{train}, X_{test}] = [x_1, x_2, \ldots, x_i, \ldots, x_{n+m}]^T \quad (1)$$
where $x_i$ is the characteristic data of the photovoltaic module obtained at the $i$-th sampling and contains the $t$ characteristic values listed above.
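As a minimal sketch of the sample fusion in formula (1), the following Python (NumPy) fragment stacks the training and test samples row by row. The function name `build_fusion_sample`, the choice of the first `n` rows as the normal samples, and the example data are illustrative assumptions, not part of the patent.

```python
import numpy as np

def build_fusion_sample(X, n, m):
    """Split sample data X (S x t) into training and test parts and fuse them.

    Assumption: the first n rows are the consecutive normal samples and the
    next m rows are the consecutive test samples taken from the remainder.
    """
    X = np.asarray(X, dtype=float)
    if n + m > X.shape[0]:
        raise ValueError("n + m must not exceed the number of samples S")
    X_train = X[:n]                      # n consecutive normal samples
    X_test = X[n:n + m]                  # m consecutive samples from the rest
    X_f = np.vstack([X_train, X_test])   # fusion sample, shape (n + m, t)
    return X_train, X_test, X_f

# Hypothetical data: S = 10 samplings, t = 8 characteristic values each.
X = np.arange(80, dtype=float).reshape(10, 8)
X_train, X_test, X_f = build_fusion_sample(X, n=6, m=3)
```

Each row of `X_f` is one sampling $x_i$, so the training rows come first and the test rows follow, which makes it easy to split the extracted features again later.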
Step 2 specifically comprises the following: the fusion sample $X_f$ is normalized feature by feature to obtain the normalized fusion sample $X'_f$, using formula (2):
$$X'_f = \frac{X_f - \mathrm{mean}(X_f)}{\mathrm{sd}(X_f)} \quad (2)$$
where $\mathrm{mean}(X_f)$ is the mean of the fusion sample $X_f$ and $\mathrm{sd}(X_f)$ is the standard deviation of the fusion sample $X_f$.
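The standardization step can be sketched as follows. This is an illustration only: it assumes the mean and standard deviation are taken per feature (per column), uses the sample standard deviation (`ddof=1`), and guards against constant columns, none of which is specified in the text. The function name and the example values are hypothetical.

```python
import numpy as np

def standardize(X_f):
    """Scale each feature (column) of X_f to mean 0 and standard deviation 1."""
    X_f = np.asarray(X_f, dtype=float)
    mean = X_f.mean(axis=0)
    sd = X_f.std(axis=0, ddof=1)        # assumption: sample standard deviation
    sd = np.where(sd == 0, 1.0, sd)     # assumption: leave constant columns unscaled
    return (X_f - mean) / sd

# Hypothetical fusion sample with three features (irradiance, temperature, voltage).
X_f = np.array([[1000.0, 25.0, 36.0],
                [800.0, 30.0, 35.0],
                [600.0, 35.0, 33.0],
                [900.0, 28.0, 35.5]])
X_std = standardize(X_f)
```

Standardizing per feature keeps quantities with large numeric ranges, such as irradiance, from dominating the Euclidean distances used by the kernel function in the next step.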
Step 3 specifically comprises the following: the normalized fusion sample $X'_f$ is projected into a high-dimensional space, with a Gaussian kernel function as the projection function, given by formula (3):
[Formula (3): equation image not reproduced in this text]
where $x'_i$ is the $i$-th sample of the normalized fusion sample $X'_f$, $x'_j$ is the $j$-th sample of $X'_f$, $\|x'_i - x'_j\|$ is the Euclidean distance between samples $x'_i$ and $x'_j$, and $\sigma$ is the bandwidth of the Gaussian kernel function, $\sigma > 0$.
Evaluating the Gaussian kernel function $k(x'_i, x'_j)$ for every pair of samples gives the kernel matrix $K_f$ of the samples projected into the high-dimensional space:
$$K_f = \begin{bmatrix} k(x'_1, x'_1) & \cdots & k(x'_1, x'_{n+m}) \\ \vdots & \ddots & \vdots \\ k(x'_{n+m}, x'_1) & \cdots & k(x'_{n+m}, x'_{n+m}) \end{bmatrix} \quad (4)$$
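A minimal sketch of building the kernel matrix is given below. Because the exact form of the kernel formula is not available in this text, the sketch assumes the common convention $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$; the patent may use a different scaling of $\sigma$. The function name and the example data are hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """Return the (N x N) Gaussian kernel matrix of the rows of X.

    Assumption: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)   # remove tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Two hypothetical standardized samples at Euclidean distance 5.
X = np.array([[0.0, 0.0],
              [3.0, 4.0]])
K = gaussian_kernel_matrix(X, sigma=5.0)
```

The resulting matrix is symmetric with ones on the diagonal, and entries shrink toward zero as samples move apart relative to the bandwidth.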
Step 4 specifically comprises the following: the kernel matrix $K_f$ satisfies formula (5):
$$(K_f - \lambda I)e = 0 \quad (5)$$
where $\lambda$ is an eigenvalue of the kernel matrix $K_f$, $e$ is the corresponding eigenvector of $K_f$, and $I$ is the $(n+m) \times (n+m)$ identity matrix. Setting the determinant $|K_f - \lambda I| = 0$ gives the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_{n+m}$ of $K_f$, where $\lambda_i$ is the $i$-th eigenvalue of $K_f$. Substituting each eigenvalue into formula (5) gives the corresponding eigenvectors $e_1, e_2, \ldots, e_i, \ldots, e_{n+m}$, where $e_i$ is the $i$-th eigenvector of $K_f$.
Since the kernel matrix $K_f$ is positive semidefinite, it can be written in the form of formula (6):
$$K_f = E D E^T \quad (6)$$
In formula (6), $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_{n+m})$ is the diagonal matrix of the eigenvalues of $K_f$, and $E = (e_1, e_2, \ldots, e_i, \ldots, e_{n+m})$ is the eigenvector matrix.
The Rényi entropy value of the fusion sample is calculated, and the eigenvalues and the corresponding eigenvectors are sorted in descending order of Rényi entropy value:
$$\hat{V}(p) = \frac{1}{(n+m)^2}\mathbf{1}^T E D E^T \mathbf{1} \quad (7)$$
In formula (7), $\hat{V}(p)$ is the Rényi entropy value (the estimate of the Rényi entropy), $n+m$ is the number of samples, and $\mathbf{1}$ is an $(n+m) \times 1$ vector of ones.
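To see why the entropy value leads to a ranking of individual eigenpairs, the estimate can be expanded term by term. The derivation below assumes the usual kernel entropy component analysis estimator $\hat{V}(p) = \frac{1}{N^2}\mathbf{1}^T K_f \mathbf{1}$ with $N = n + m$ and the decomposition $K_f = E D E^T$; the symbol $\psi_i$ is introduced here for illustration only.

```latex
\hat{V}(p)
  = \frac{1}{N^2}\,\mathbf{1}^T K_f \mathbf{1}
  = \frac{1}{N^2}\,\mathbf{1}^T E D E^T \mathbf{1}
  = \frac{1}{N^2}\sum_{i=1}^{N} \lambda_i \left(e_i^T \mathbf{1}\right)^2
  = \sum_{i=1}^{N} \psi_i ,
\qquad
\psi_i = \frac{\lambda_i \left(e_i^T \mathbf{1}\right)^2}{N^2}.
```

Each eigenpair $(\lambda_i, e_i)$ therefore contributes $\psi_i$ to the entropy estimate. An eigenvector whose entries sum to zero contributes nothing, however large its eigenvalue, so sorting by $\psi_i$ can give a different order from sorting by $\lambda_i$ alone.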
Based on the Rényi entropy value $\hat{V}(p)$, the eigenvalues and the corresponding eigenvectors are sorted in descending order, giving the diagonal matrix of sorted eigenvalues $D' = \mathrm{diag}(\lambda'_1, \lambda'_2, \ldots, \lambda'_i, \ldots, \lambda'_{n+m})$ and the sorted eigenvector matrix $E' = (e'_1, e'_2, \ldots, e'_i, \ldots, e'_{n+m})$, where $\lambda'_i$ is the $i$-th eigenvalue after sorting and $e'_i$ is the $i$-th eigenvector after sorting.
Kernel entropy component analysis is then expressed as an optimization problem:
[Formula (8): equation image not reproduced in this text]
where $\Phi_{eca}$ is the feature mapping of kernel entropy component analysis; $\hat{V}_d(p)$ is the Rényi entropy value obtained from the first $d$ eigenvalues of the sorted diagonal matrix $D'$ and the corresponding eigenvectors; $D_d$ is the diagonal matrix of the first $d$ eigenvalues $\lambda'_i$ of $D'$, with $d \le n+m$; and $E_d$ is the matrix of the first $d$ eigenvectors of the sorted eigenvector matrix $E'$ (corresponding to the eigenvalues $\lambda'_i$).
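The decomposition and ranking can be sketched with NumPy as below. This is an illustration under assumptions: the per-component entropy contribution is taken as $\lambda_i (e_i^T \mathbf{1})^2 / N^2$, the usual kernel entropy component analysis convention, and the function name and example kernel matrix are hypothetical.

```python
import numpy as np

def rank_by_entropy(K):
    """Eigendecompose a symmetric kernel matrix and sort the eigenpairs by
    their contribution to the entropy estimate, largest first."""
    K = np.asarray(K, dtype=float)
    N = K.shape[0]
    eigvals, eigvecs = np.linalg.eigh(K)     # eigh: K is symmetric
    eigvals = np.clip(eigvals, 0.0, None)    # K is PSD; clip round-off negatives
    # Contribution of pair i: lambda_i * (e_i^T 1)^2 / N^2
    contrib = eigvals * eigvecs.sum(axis=0) ** 2 / N ** 2
    order = np.argsort(contrib)[::-1]        # descending entropy contribution
    return eigvals[order], eigvecs[:, order], contrib[order]

# Hypothetical 3 x 3 kernel matrix (symmetric, strictly diagonally dominant).
K = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
lam, E, contrib = rank_by_entropy(K)
```

The contributions sum to the full entropy estimate, which equals the mean of all entries of `K`, so the sorted `contrib` array shows directly how much each retained component preserves.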
Step 5 specifically comprises the following:
$\Phi_{eca}$ is constrained, and $d$ is selected according to
[Selection criterion: equation image not reproduced in this text]
to obtain the value of $d$, where $M_{\text{kernel entropy threshold}}$ is a constant.
The normalized fusion sample $X'_f$ is projected onto the space spanned by the selected eigenvectors to obtain the feature information $Y_f$, formula (9):
[Formula (9): equation image not reproduced in this text]
where [symbol image not reproduced] refers to the normalized fusion sample $X'_f$, $\Phi$ denotes the nonlinear mapping, and $\Phi(X'_f)$ denotes the matrix obtained by applying the nonlinear mapping to $X'_f$.
The feature information $Y_f$ comprises the feature information $Y_{train}$ of the training sample and the feature information $Y_{test}$ of the test sample.
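Since the selection criterion and the projection formula are not available in this text, the following sketch substitutes common kernel entropy component analysis conventions, all of which are assumptions: $d$ is the smallest number of ranked components whose cumulative entropy contribution reaches a fraction `entropy_threshold` of the total, and the features are $Y_f = E_d D_d^{1/2}$ with one row per sample. The function names and the example matrix are hypothetical.

```python
import numpy as np

def keca_features(K, entropy_threshold=0.9):
    """Select d entropy-ranked components and project the samples onto them.

    Assumptions: d is the smallest count whose cumulative entropy share
    reaches entropy_threshold; features are Y_f = E_d * sqrt(D_d).
    """
    K = np.asarray(K, dtype=float)
    N = K.shape[0]
    lam, E = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)
    contrib = lam * E.sum(axis=0) ** 2 / N ** 2
    order = np.argsort(contrib)[::-1]
    lam, E, contrib = lam[order], E[:, order], contrib[order]
    share = np.cumsum(contrib) / contrib.sum()
    d = min(int(np.searchsorted(share, entropy_threshold)) + 1, N)
    Y_f = E[:, :d] * np.sqrt(lam[:d])    # one row of features per sample
    return Y_f, d

def split_features(Y_f, n):
    """Split fused features back into training (first n rows) and test rows."""
    return Y_f[:n], Y_f[n:]

# Hypothetical kernel matrix of a fusion sample with n = 2 and m = 2.
K = np.array([[1.0, 0.8, 0.1, 0.1],
              [0.8, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.7],
              [0.1, 0.1, 0.7, 1.0]])
Y_f, d = keca_features(K, entropy_threshold=0.9)
Y_train, Y_test = split_features(Y_f, n=2)
```

Because the training rows come first in the fusion sample, slicing the rows of `Y_f` recovers $Y_{train}$ and $Y_{test}$ without any further bookkeeping.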
Step 6 specifically comprises the following:
The feature information $Y_{train}$ of the training sample is used to calculate the detection variable of the training sample and the fault detection threshold.
The information gain between the training sample and the test sample is defined as:
$$G_t = Y_{train} - Y_{test} \quad (11)$$
where $Y_{train}$ is the feature information of the training sample and $Y_{test}$ is the feature information of the test sample.
When the test sample is a hot spot fault sample, the information gain of the photovoltaic module under hot spot fault characteristics is:
$$G_h = Y_{train} - Y_h \quad (12)$$
where $Y_h$ is the feature information of the hot spot fault sample.
When the test sample is a normal sample, the information gain $G_n$ between the two groups of normal samples is:
$$G_n = Y_{train} - Y_n \quad (13)$$
where $Y_n$ is the feature information of the normal sample.
The trace of a matrix describes the characteristic information of the matrix. Considering that the information gain $G_t$ is not necessarily a square matrix, the detection variable for hot spot faults is defined as:
$$F_{test} = \mathrm{tr}\left((G_t)^T G_t\right) \quad (14)$$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, i.e. the sum of the diagonal elements of a square matrix.
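A short worked identity shows why the product with the transpose handles a non-square information gain. Here $G$ stands for any information gain matrix with $r$ rows and $d$ columns and entries $g_{ij}$; these symbols are introduced for illustration only.

```latex
G \in \mathbb{R}^{r \times d}
\;\Longrightarrow\;
G^T G \in \mathbb{R}^{d \times d},
\qquad
\mathrm{tr}\!\left(G^T G\right)
  = \sum_{j=1}^{d} \left(G^T G\right)_{jj}
  = \sum_{j=1}^{d} \sum_{i=1}^{r} g_{ij}^2
  = \|G\|_F^2 .
```

The detection variable is thus the squared Frobenius norm of the information gain: it is always non-negative and grows as the compared feature matrices move further apart.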
The feature information $Y_{train}$ of the training sample is divided in a 1:1 ratio into two parts, $Y_{train1}$ and $Y_{train2}$, that is, $Y_{train} = [Y_{train1}, Y_{train2}]$, which are used to calculate the information gain within the training sample. The information gain $G_a$ within the training sample is:
$$G_a = Y_{train1} - Y_{train2} \quad (15)$$
The detection variable of the training sample is:
$$F_{train} = \mathrm{tr}\left((G_a)^T G_a\right) \quad (16)$$
The detection threshold for hot spot faults of the photovoltaic module is $F_m = \max F_{train}$.
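The threshold computation can be sketched as follows. The text does not say over which set the maximum in $F_m = \max F_{train}$ is taken, so this sketch assumes, purely for illustration, that $F_{train}$ is evaluated on several training feature matrices and the largest value is used. It also assumes the 1:1 split takes the first and second half of the rows and drops a leftover row when the count is odd. All names and data are hypothetical.

```python
import numpy as np

def detection_variable(G):
    """Return tr(G^T G) for an information gain matrix G."""
    G = np.asarray(G, dtype=float)
    return float(np.trace(G.T @ G))

def training_detection_variable(Y_train):
    """Split Y_train 1:1 by rows and return F_train = tr(G_a^T G_a)."""
    Y_train = np.asarray(Y_train, dtype=float)
    half = Y_train.shape[0] // 2          # assumption: drop a leftover odd row
    Y_train1, Y_train2 = Y_train[:half], Y_train[half:2 * half]
    return detection_variable(Y_train1 - Y_train2)

def detection_threshold(F_train_values):
    """Assumption: F_m is the largest F_train over the available training sets."""
    return max(F_train_values)

# Two hypothetical training feature matrices (4 samples, 2 features each).
Y_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
Y_b = np.full((4, 2), 0.5)
F_values = [training_detection_variable(Y_a), training_detection_variable(Y_b)]
F_m = detection_threshold(F_values)
```

Taking the maximum over normal data makes the threshold the largest deviation observed among fault-free samples, so anything above it is treated as abnormal.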
Step 7 specifically comprises the following: based on the information gain $G_t$ between the training sample and the test sample, the fault detection variable $F_{test}$ of the test sample is calculated, and $F_{test}$ is compared with the detection threshold $F_m$ for hot spot faults of the photovoltaic module. If $F_{test} > F_m$, the photovoltaic module is judged to have a hot spot fault.
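The decision rule can be sketched as below. Because the information gain is a difference of two feature matrices, the sketch assumes that the training and test feature matrices have the same shape; the function names, the feature values and the threshold are hypothetical.

```python
import numpy as np

def detection_variable_of_test_sample(Y_train, Y_test):
    """Return F_test = tr(G_t^T G_t) with G_t = Y_train - Y_test."""
    Y_train = np.asarray(Y_train, dtype=float)
    Y_test = np.asarray(Y_test, dtype=float)
    if Y_train.shape != Y_test.shape:
        # Assumption: the difference needs matching shapes.
        raise ValueError("Y_train and Y_test must have the same shape")
    G_t = Y_train - Y_test
    return float(np.sum(G_t ** 2))       # equals tr(G_t^T G_t)

def has_hot_spot_fault(F_test, F_m):
    """Report a hot spot fault when F_test exceeds the threshold F_m."""
    return F_test > F_m

# Hypothetical feature matrices and threshold.
Y_train = np.array([[1.0, 0.0], [0.0, 1.0]])
Y_normal = np.array([[1.0, 0.1], [0.0, 0.9]])
Y_fault = np.array([[2.0, 1.0], [1.0, 3.0]])
F_m = 0.05

F_normal = detection_variable_of_test_sample(Y_train, Y_normal)
F_fault = detection_variable_of_test_sample(Y_train, Y_fault)
normal_flag = has_hot_spot_fault(F_normal, F_m)
fault_flag = has_hot_spot_fault(F_fault, F_m)
```

In this example the near-normal test features stay below the threshold while the strongly shifted features exceed it, which is the behaviour the comparison is meant to capture.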
A photovoltaic hot spot fault detection system based on fusion kernel entropy and information gain comprises a sample data establishment unit, a sample standardization unit, a kernel matrix calculation unit, a Rényi entropy value calculation unit, a feature information acquisition unit, a fault detection threshold calculation unit and a fault judgment unit.
The sample data establishment unit collects photovoltaic module data to form sample data, wherein the training sample is $X_{train}$ and the test sample is $X_{test}$, and fuses the training sample $X_{train}$ and the test sample $X_{test}$ to obtain a fusion sample $X_f$.
The sample standardization unit calculates the mean and standard deviation of the fusion sample $X_f$ and standardizes $X_f$ into a normalized fusion sample $X'_f$ with mean 0 and standard deviation 1.
The kernel matrix calculation unit selects a Gaussian kernel function as the projection function and uses it to project $X'_f$ into a high-dimensional space to obtain a kernel matrix $K_f$.
The Rényi entropy value calculation unit performs eigendecomposition on the kernel matrix $K_f$ to obtain an eigenvalue diagonal matrix $D$ and an eigenvector matrix $E$, calculates the Rényi entropy values, and sorts the eigenvalues and the corresponding eigenvectors in descending order of Rényi entropy value.
The feature information acquisition unit selects eigenvalues and eigenvectors according to the Rényi entropy values and calculates the feature information $Y_f$.
The fault detection threshold calculation unit uses the feature information $Y_{train}$ of the training sample to calculate the detection variable $F_{train}$ of the training sample and determine a fault detection threshold $F_m$.
The fault judgment unit calculates the detection variable $F_{test}$ of the test sample, compares $F_{test}$ with the detection threshold $F_m$, and judges whether the photovoltaic module has a hot spot fault.
The working process of the sample data establishment unit specifically comprises the following:
Characteristic data of the photovoltaic module are collected to form sample data $X = [x_1, x_2, \ldots, x_\theta, \ldots, x_S]^T$, where $S$ is the total number of samples collected, $\theta = 1, 2, \ldots, S$, and $x_\theta$ is the characteristic data of the photovoltaic module collected at the $\theta$-th sampling. The characteristic data comprise irradiance $G$ (W/m²), temperature $T$ (°C), open-circuit voltage $U_O$ (V), short-circuit current $I_S$ (A), maximum power point current $I_M$ (A), maximum power point voltage $U_M$ (V), maximum power $M_P$ (W) and fill factor $F$ (dimensionless). $n$ consecutive normal samples are selected from the sample data $X$ to form the training sample $X_{train} = [x_1, x_2, \ldots, x_n]^T$, where each of the $n$ samples contains $t$ photovoltaic module characteristic values. $m$ consecutive samples are selected from the remainder of $X$ to form the test sample $X_{test} = [x_1, x_2, \ldots, x_m]^T$, where each of the $m$ samples contains $t$ characteristic values. The training sample $X_{train}$ and the test sample $X_{test}$ are fused to obtain the fusion sample $X_f$:
$$X_f = [X_{train}, X_{test}] = [x_1, x_2, \ldots, x_i, \ldots, x_{n+m}]^T \quad (1)$$
where $x_i$ is the characteristic data of the photovoltaic module obtained at the $i$-th sampling and contains the $t$ characteristic values listed above.
The working process of the sample standardization unit specifically comprises the following: the fusion sample $X_f$ is normalized feature by feature to obtain the normalized fusion sample $X'_f$, using formula (2):
$$X'_f = \frac{X_f - \mathrm{mean}(X_f)}{\mathrm{sd}(X_f)} \quad (2)$$
where $\mathrm{mean}(X_f)$ is the mean of the fusion sample $X_f$ and $\mathrm{sd}(X_f)$ is the standard deviation of the fusion sample $X_f$.
The working process of the kernel matrix calculation unit specifically comprises the following:
The normalized fusion sample $X'_f$ is projected into a high-dimensional space, with a Gaussian kernel function as the projection function, given by formula (3):
[Formula (3): equation image not reproduced in this text]
where $x'_i$ is the $i$-th sample of the normalized fusion sample $X'_f$, $x'_j$ is the $j$-th sample of $X'_f$, $\|x'_i - x'_j\|$ is the Euclidean distance between samples $x'_i$ and $x'_j$, and $\sigma$ is the bandwidth of the Gaussian kernel function, $\sigma > 0$.
Evaluating the Gaussian kernel function $k(x'_i, x'_j)$ for every pair of samples gives the kernel matrix $K_f$ of the samples projected into the high-dimensional space:
$$K_f = \begin{bmatrix} k(x'_1, x'_1) & \cdots & k(x'_1, x'_{n+m}) \\ \vdots & \ddots & \vdots \\ k(x'_{n+m}, x'_1) & \cdots & k(x'_{n+m}, x'_{n+m}) \end{bmatrix} \quad (4)$$
The working process of the Rényi entropy value calculation unit specifically comprises the following:
The kernel matrix $K_f$ satisfies formula (5):
$$(K_f - \lambda I)e = 0 \quad (5)$$
where $\lambda$ is an eigenvalue of the kernel matrix $K_f$, $e$ is the corresponding eigenvector of $K_f$, and $I$ is the $(n+m) \times (n+m)$ identity matrix. Setting the determinant $|K_f - \lambda I| = 0$ gives the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_{n+m}$ of $K_f$, where $\lambda_i$ is the $i$-th eigenvalue of $K_f$. Substituting each eigenvalue into formula (5) gives the corresponding eigenvectors $e_1, e_2, \ldots, e_i, \ldots, e_{n+m}$, where $e_i$ is the $i$-th eigenvector of $K_f$.
Since the kernel matrix $K_f$ is positive semidefinite, it can be written in the form of formula (6):
$$K_f = E D E^T \quad (6)$$
In formula (6), $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_{n+m})$ is the diagonal matrix of the eigenvalues of $K_f$, and $E = (e_1, e_2, \ldots, e_i, \ldots, e_{n+m})$ is the eigenvector matrix.
The Rényi entropy value of the fusion sample is calculated, and the eigenvalues and the corresponding eigenvectors are sorted in descending order of Rényi entropy value:
$$\hat{V}(p) = \frac{1}{(n+m)^2}\mathbf{1}^T E D E^T \mathbf{1} \quad (7)$$
In formula (7), $\hat{V}(p)$ is the Rényi entropy value (the estimate of the Rényi entropy), $n+m$ is the number of samples, and $\mathbf{1}$ is an $(n+m) \times 1$ vector of ones.
Based on the Rényi entropy value $\hat{V}(p)$, the eigenvalues and the corresponding eigenvectors are sorted in descending order, giving the diagonal matrix of sorted eigenvalues $D' = \mathrm{diag}(\lambda'_1, \lambda'_2, \ldots, \lambda'_i, \ldots, \lambda'_{n+m})$ and the sorted eigenvector matrix $E' = (e'_1, e'_2, \ldots, e'_i, \ldots, e'_{n+m})$, where $\lambda'_i$ is the $i$-th eigenvalue after sorting and $e'_i$ is the $i$-th eigenvector after sorting.
Kernel entropy component analysis is performed and expressed as an optimization problem:
[Formula (8): equation image not reproduced in this text]
where $\Phi_{eca}$ is the feature mapping of kernel entropy component analysis; $\hat{V}_d(p)$ is the Rényi entropy value obtained from the first $d$ eigenvalues of the sorted diagonal matrix $D'$ and the corresponding eigenvectors; $D_d$ is the diagonal matrix of the first $d$ eigenvalues $\lambda'_i$ of $D'$, with $d \le n+m$; and $E_d$ is the matrix of the first $d$ eigenvectors of the sorted eigenvector matrix $E'$ (corresponding to the eigenvalues $\lambda'_i$).
The working process of the feature information acquisition unit specifically comprises the following:
$\Phi_{eca}$ is constrained, and $d$ is selected according to
[Selection criterion: equation image not reproduced in this text]
to obtain the value of $d$, where $M_{\text{kernel entropy threshold}}$ is a constant.
The normalized fusion sample $X'_f$ is projected onto the space spanned by the selected eigenvectors to obtain the feature information $Y_f$, formula (9):
[Formula (9): equation image not reproduced in this text]
where [symbol image not reproduced] refers to the normalized fusion sample $X'_f$, $\Phi$ denotes the nonlinear mapping, and $\Phi(X'_f)$ denotes the matrix obtained by applying the nonlinear mapping to $X'_f$.
The feature information $Y_f$ comprises the feature information $Y_{train}$ of the training sample and the feature information $Y_{test}$ of the test sample.
The working process of the fault detection threshold calculation unit specifically comprises the following:
The feature information $Y_{train}$ of the training sample is used to calculate the detection variable of the training sample and the fault detection threshold.
The information gain between the training sample and the test sample is defined as:
$$G_t = Y_{train} - Y_{test} \quad (11)$$
where $Y_{train}$ is the feature information of the training sample and $Y_{test}$ is the feature information of the test sample.
When the test sample is a hot spot fault sample, the information gain of the photovoltaic module under hot spot fault characteristics is:
$$G_h = Y_{train} - Y_h \quad (12)$$
where $Y_h$ is the feature information of the hot spot fault sample.
When the test sample is a normal sample, the information gain $G_n$ between the two groups of normal samples is:
$$G_n = Y_{train} - Y_n \quad (13)$$
where $Y_n$ is the feature information of the normal sample.
The trace of a matrix describes the characteristic information of the matrix. Considering that the information gain $G_t$ is not necessarily a square matrix, the detection variable for hot spot faults is defined as:
$$F_{test} = \mathrm{tr}\left((G_t)^T G_t\right) \quad (14)$$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, i.e. the sum of the diagonal elements of a square matrix.
The feature information $Y_{train}$ of the training sample is divided in a 1:1 ratio into two parts, $Y_{train1}$ and $Y_{train2}$, that is, $Y_{train} = [Y_{train1}, Y_{train2}]$, which are used to calculate the information gain within the training sample. The information gain $G_a$ within the training sample is:
$$G_a = Y_{train1} - Y_{train2} \quad (15)$$
The detection variable of the training sample is:
$$F_{train} = \mathrm{tr}\left((G_a)^T G_a\right) \quad (16)$$
The detection threshold for hot spot faults of the photovoltaic module is $F_m = \max F_{train}$.
The working process of the fault judgment unit specifically comprises the following:
Based on the information gain $G_t$ between the training sample and the test sample, the fault detection variable $F_{test}$ of the test sample is calculated, and $F_{test}$ is compared with the detection threshold $F_m$ for hot spot faults of the photovoltaic module. If $F_{test} > F_m$, the photovoltaic module is judged to have a hot spot fault.
The invention has the following beneficial effects:
The invention discloses a fault detection method based on fusion kernel entropy component analysis and information gain, which improves fault detection performance by fusing the test sample with the training sample.
The invention makes full use of the feature information in the training sample: the training sample and the test sample are fused, and the feature information in the samples is extracted by kernel entropy component analysis. Because kernel entropy component analysis is a feature extraction method based on information entropy, information gain is introduced as the detection variable, so that hot spot faults of the photovoltaic module are detected effectively.
The invention fully considers the feature information of the training sample during feature extraction. Extracting features after fusing the training sample and the test sample makes it easier to distinguish normal samples from fault samples.
Extracting the feature information of the samples by kernel entropy component analysis does not require the assumption that the sample data follow a Gaussian distribution, so the method has a wider range of application.
Because kernel entropy component analysis is a feature extraction method based on information entropy, introducing information gain as the detection variable for hot spot fault detection achieves a better detection effect.
In the invention, the training data and the test data are fused to obtain fused data, the feature information of the fusion sample is then extracted by kernel entropy component analysis, and finally the information gain is used to obtain the fault detection variable and the detection threshold. The invention comprises two stages: feature extraction and fault detection. In the feature extraction stage, the test data and the training data are fused to obtain fused data; the fused data are standardized and projected into a high-dimensional space by a Gaussian kernel function; the kernel matrix is eigendecomposed to obtain eigenvalues and eigenvectors; finally, the Rényi entropy is calculated, and eigenvalues and eigenvectors are selected according to the size of the Rényi entropy to calculate the feature information of the fused data. In the fault detection stage, the feature information of the fused data is divided into the feature information of the training data and the feature information of the test data; the feature information of the training data is used to calculate the fault detection variable of the training data and obtain the fault detection threshold; the fault detection variable calculated from the feature information of the test data is then compared with the fault detection threshold to judge whether a hot spot fault has occurred. The method is intended for use in the field of photovoltaic hot spot fault diagnosis.
Drawings
FIG. 1 is a fault detection flow chart of the photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain;
FIG. 2 is a graph comparing the feature extraction results of the invention;
FIG. 3 is a graph of the fault detection results of the invention.
Detailed Description
The invention is explained in further detail below with reference to the drawings and embodiments. The specific embodiments described here are illustrative only and are not intended to limit the invention. So that those skilled in the art can better understand the implementation of the invention, the R language is used to carry out fault diagnosis and verify the results of the invention.
As shown in fig. 1, a photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain comprises the following steps:
Step 1, collecting photovoltaic module data to form sample data, wherein the training sample is X_train and the test sample is X_test; fusing the training sample X_train and the test sample X_test to obtain a fusion sample X_f;
Step 2, calculating the mean and standard deviation of the fusion sample X_f, and standardizing X_f into a normalized fusion sample X'_f with mean 0 and standard deviation 1;
Step 3, selecting a Gaussian kernel function as the projection function, and using it to project X'_f into a high-dimensional space to obtain a kernel matrix K_f;
Step 4, performing eigendecomposition on the kernel matrix K_f to obtain an eigenvalue diagonal matrix D and an eigenvector matrix E, calculating the Rayleigh entropy value, and sorting the eigenvalues and the corresponding eigenvectors in descending order according to the Rayleigh entropy value;
Step 5, selecting eigenvalues and eigenvectors according to the Rayleigh entropy value, and calculating the feature information Y_f;
Step 6, using the feature information Y_train of the training sample to calculate the detection variable F_train of the training sample and to determine a fault detection threshold F_m;
Step 7, calculating the fault detection variable F_test of the test sample, comparing F_test with the detection threshold F_m, and judging whether the photovoltaic module has a hot spot fault.
Step 1 specifically comprises the following: characteristic data of the photovoltaic module are collected to form the sample data X = [x_1, x_2 ... x_θ ... x_S]^T, where S represents the total number of samplings, θ = 1, 2 ... S, and x_θ represents the characteristic data of the photovoltaic module collected at the θ-th sampling; the characteristic data of the photovoltaic module comprise irradiance G (W/m^2), temperature T (°C), open-circuit voltage U_O (V), short-circuit current I_S (A), maximum power point current I_M (A), maximum power point voltage U_M (V), maximum power M_P (W) and fill factor F (1); n consecutive normal samples are selected from the sample data X to form the training sample X_train = [x_1, x_2 ... x_n]^T, which contains n samples, each comprising 8 photovoltaic module characteristic data; from the remainder of the sample data X, excluding the training sample X_train, m consecutive samples are selected to form the test sample X_test = [x_1, x_2 ... x_m]^T, which contains m samples, each comprising 8 photovoltaic module characteristic data; the training sample X_train and the test sample X_test are fused to obtain the fusion sample X_f, as shown in formula (1):
$$X_f = [X_{train}, X_{test}] = [x_1, x_2 \ldots x_i \ldots x_{n+m}]^T \qquad (1)$$
wherein x_i represents the characteristic data of the photovoltaic module obtained at the i-th sampling, comprising the 8 photovoltaic module characteristic data listed above.
Step 2 specifically comprises the following: the fusion sample X_f is standardized feature by feature to obtain the normalized fusion sample X'_f, using formula (2):
$$X'_f = \frac{X_f - \mathrm{mean}(X_f)}{\mathrm{sd}(X_f)} \qquad (2)$$
wherein mean(X_f) represents the mean of the fusion sample X_f, and sd(X_f) represents the standard deviation of the fusion sample X_f.
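The fusion and standardization of steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the inventors' R implementation: the function names and the two-feature toy data are hypothetical, and the use of the sample standard deviation (divisor N-1, which is also what R's `sd` computes) is an assumption, since the text does not say which divisor is meant.

```python
from statistics import mean, stdev

def fuse(x_train, x_test):
    """Stack the training rows on top of the test rows, as in formula (1)."""
    return [list(row) for row in x_train] + [list(row) for row in x_test]

def standardize(x_f):
    """Scale each feature column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*x_f))
    mus = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # assumes no constant column (sd > 0)
    return [[(v - mu) / sd for v, mu, sd in zip(row, mus, sds)] for row in x_f]

# Toy data with two features per sample (irradiance, temperature); values are made up.
x_train = [[800.0, 25.0], [820.0, 26.0], [810.0, 27.0]]
x_test = [[400.0, 45.0]]
x_f_std = standardize(fuse(x_train, x_test))
```

Because the mean and standard deviation are computed on the fused sample, the test rows influence the scaling of the training rows, which is the point of fusing before standardizing.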
Step 3 specifically comprises the following: the normalized fusion sample X'_f is projected into a high-dimensional space, the projection function being the Gaussian kernel function of formula (3):
$$k(x'_i, x'_j) = \exp\left(-\frac{\|x'_i - x'_j\|^2}{2\sigma^2}\right) \qquad (3)$$
wherein x'_i represents the i-th sample of the normalized fusion sample X'_f, x'_j represents the j-th sample of X'_f, ||x'_i - x'_j|| represents the Euclidean distance between sample x'_i and sample x'_j, and σ is the bandwidth of the Gaussian kernel function, σ > 0;
evaluating the Gaussian kernel function k(x'_i, x'_j) for every pair of samples gives the kernel matrix K_f of the samples projected into the high-dimensional space:
$$K_f = \left[k(x'_i, x'_j)\right]_{i,j=1}^{n+m} \qquad (4)$$
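A minimal sketch of building the kernel matrix is given below. It assumes the common Gaussian kernel form exp(-||x - y||^2 / (2σ^2)); other parameterizations of the bandwidth exist, and the text alone does not settle which one the inventors used. The function names and sample values are illustrative.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel, assumed form exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_matrix(samples, sigma):
    """Evaluate the kernel for every pair of rows: K[i][j] = k(x_i, x_j)."""
    return [[gaussian_kernel(xi, xj, sigma) for xj in samples] for xi in samples]

samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
k_f = kernel_matrix(samples, sigma=1.0)
```

The result is symmetric with ones on the diagonal, and its size is (n+m) by (n+m), so memory grows quadratically with the number of fused samples.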
Step 4 specifically comprises the following: the kernel matrix K_f satisfies formula (5):
$$(K_f - \lambda I)e = 0 \qquad (5)$$
wherein λ represents an eigenvalue of the kernel matrix K_f, e represents the corresponding eigenvector of K_f, and I is the (n+m)×(n+m) identity matrix; setting the determinant |K_f - λI| = 0 gives the eigenvalues λ_1, λ_2 ... λ_i ... λ_{n+m} of the kernel matrix K_f, λ_i being its i-th eigenvalue; substituting the eigenvalues λ_1, λ_2 ... λ_i ... λ_{n+m} into formula (5) in turn gives the corresponding eigenvectors e_1, e_2 ... e_i ... e_{n+m}, e_i being the i-th eigenvector of K_f;
since the kernel matrix K_f is a positive semi-definite matrix, it can be written in the form of formula (6):
$$K_f = E D E^T \qquad (6)$$
In formula (6), D = diag(λ_1, λ_2 ... λ_i ... λ_{n+m}) is the diagonal matrix of the eigenvalues of the kernel matrix K_f, and E = (e_1, e_2 ... e_i ... e_{n+m}) is the eigenvector matrix;
the Rayleigh entropy value of the fusion sample is calculated, and the eigenvalues and the corresponding eigenvectors are sorted in descending order according to the Rayleigh entropy value:
$$\hat{V}(p) = \frac{1}{(n+m)^2}\mathbf{1}^T K_f \mathbf{1} = \frac{1}{(n+m)^2}\mathbf{1}^T E D E^T \mathbf{1} \qquad (7)$$
In formula (7), \hat{V}(p) is the Rayleigh entropy value (an estimate of the Rayleigh entropy), n+m represents the number of samples, and 1 is an (n+m)×1 column vector of ones;
the eigenvalues and eigenvectors are sorted in descending order of their corresponding Rayleigh entropy value \hat{V}(p), giving the diagonal matrix D' = diag(λ'_1, λ'_2 ... λ'_i ... λ'_{n+m}) of the sorted eigenvalues and the sorted eigenvector matrix E' = (e'_1, e'_2 ... e'_i ... e'_{n+m}); λ'_i is the i-th eigenvalue after sorting, and e'_i is the i-th eigenvector after sorting;
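Why sorting by entropy value differs from sorting by eigenvalue can be seen from a short derivation. Assuming the entropy estimate takes the usual kernel entropy component analysis form, with N = n+m and K_f = E D E^T, the estimate splits into one non-negative term per eigenpair:

```latex
\hat{V}(p)
  = \frac{1}{N^2}\,\mathbf{1}^T K_f\,\mathbf{1}
  = \frac{1}{N^2}\,\mathbf{1}^T E D E^T \mathbf{1}
  = \frac{1}{N^2}\sum_{i=1}^{N}\lambda_i\left(e_i^T\mathbf{1}\right)^2 ,
  \qquad N = n + m .
```

Under this reading, the i-th eigenpair contributes λ_i (e_i^T 1)^2, so a large eigenvalue whose eigenvector sums to nearly zero contributes little to the entropy estimate and is ranked low.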
the kernel entropy component analysis method is converted into an optimization problem:
$$\Phi_{eca} = D_d^{1/2} E_d^T:\quad \min_{\lambda'_1, e'_1, \ldots, \lambda'_{n+m}, e'_{n+m}} \hat{V}(p) - \hat{V}_d(p) \qquad (8)$$
In formula (8), Φ_eca represents the feature mapping of the kernel entropy component analysis; \hat{V}_d(p) represents the Rayleigh entropy value obtained by taking the first d eigenvalues from the sorted diagonal matrix D' together with the corresponding eigenvectors; D_d is the diagonal matrix formed by the first d eigenvalues of the sorted diagonal matrix D', d ≤ n+m; and E_d is the eigenvector matrix formed by the first d eigenvectors of the sorted eigenvector matrix E' (corresponding to the eigenvalues λ'_i).
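The sorting step can be sketched as below, assuming each eigenpair's entropy contribution is λ_i (e_i^T 1)^2 as in standard kernel entropy component analysis. The eigenpairs are taken as given (for example from an eigendecomposition routine); the function names and the 2×2 example are illustrative.

```python
import math

def entropy_contributions(eigenvalues, eigenvectors):
    """Contribution of each eigenpair: lambda_i * (sum of entries of e_i)^2."""
    return [lam * sum(vec) ** 2 for lam, vec in zip(eigenvalues, eigenvectors)]

def sort_by_contribution(eigenvalues, eigenvectors):
    """Reorder eigenpairs by descending entropy contribution."""
    contrib = entropy_contributions(eigenvalues, eigenvectors)
    order = sorted(range(len(contrib)), key=lambda i: contrib[i], reverse=True)
    return ([eigenvalues[i] for i in order],
            [eigenvectors[i] for i in order],
            [contrib[i] for i in order])

# Eigenpairs of K = [[1, 0.5], [0.5, 1]]; each inner list is one eigenvector.
r = 1.0 / math.sqrt(2.0)
lams = [0.5, 1.5]
vecs = [[r, -r], [r, r]]
sorted_lams, sorted_vecs, sorted_contrib = sort_by_contribution(lams, vecs)
```

In this example the contributions sum to 1^T K 1 = 3, and the eigenvector (r, -r) contributes nothing because its entries cancel.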
Step 5 specifically comprises the following:
Φ_eca is constrained by requiring
$$\frac{\hat{V}_d(p)}{\hat{V}(p)} \geq M_{\text{kernel entropy threshold}}$$
from which the value of d is obtained; in this embodiment, M_kernel entropy threshold = 0.85;
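One plausible reading of the selection rule is to keep the smallest number d of sorted components whose cumulative entropy contribution reaches the threshold share of the total. The sketch below implements that reading; the cumulative-ratio criterion and the function name are assumptions, and only the threshold value 0.85 comes from the embodiment.

```python
def choose_d(sorted_contrib, threshold=0.85):
    """Smallest d whose leading contributions reach `threshold` of the total.

    `sorted_contrib` must already be in descending order and have a
    positive sum.
    """
    total = sum(sorted_contrib)
    running = 0.0
    for d, value in enumerate(sorted_contrib, start=1):
        running += value
        if running / total >= threshold:
            return d
    return len(sorted_contrib)

d = choose_d([5.0, 3.0, 1.5, 0.5])  # cumulative shares: 0.50, 0.80, 0.95, 1.00
```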
the normalized fusion sample X'_f is projected onto the space spanned by the selected eigenvectors to obtain the feature information Y_f, as in formula (9):
[Formula (9): equation image GDA0004176483070000181, not reproduced in the text]
[Symbol image GDA0004176483070000182]
is a standardized fusion sample X'_f; φ represents the nonlinear mapping, and φ(X'_f) represents the matrix obtained by applying the nonlinear mapping to the fusion sample X'_f;
the feature information Y_f comprises the feature information Y_train of the training sample and the feature information Y_test of the test sample. As shown in FIG. 2, black marks the feature information extracted from the training sample by kernel entropy component analysis, and red marks the feature information extracted from the test sample (hot spot fault sample) by kernel entropy component analysis.
Step 6 specifically comprises the following: the feature information Y_train of the training sample is used to calculate the detection variable of the training sample and the fault detection threshold; kernel entropy component analysis is a feature extraction method based on Rayleigh entropy, and the information gain is introduced as the detection variable for hot spot fault detection, the information gain being defined as:
$$G(B, A) = H(B) - H(B|A) \qquad (10)$$
wherein H(B) is the information entropy of the data B, and H(B|A) is the conditional entropy of the data B given the feature A; G(B, A) represents the information gain of the data B under the feature A (that is, the degree to which the uncertainty of the information is reduced once the feature A is known);
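For discrete data, definition (10) can be illustrated directly. The sketch below uses Shannon entropy with base-2 logarithms (the base is an assumption; the text does not specify it), and the labels are made up for the example. It illustrates the general definition only; the detection variable actually used by the method is the matrix form defined in formulas (11) to (14).

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(B) in bits of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(values, feature):
    """H(B|A): entropy of `values` within each feature group, weighted by group size."""
    n = len(values)
    groups = {}
    for b, a in zip(values, feature):
        groups.setdefault(a, []).append(b)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(values, feature):
    """G(B, A) = H(B) - H(B|A)."""
    return entropy(values) - conditional_entropy(values, feature)

b = ["fault", "fault", "normal", "normal"]
a = ["hot", "hot", "cool", "cool"]
gain = information_gain(b, a)
```

Here the feature determines the label completely, so the gain equals the full entropy of B (1 bit); a feature that takes the same value for every sample would give a gain of 0.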
the information gain between the training sample and the test sample is defined as:
$$G_t = Y_{train} - Y_{test} \qquad (11)$$
wherein Y_train is the feature information of the training sample, and Y_test is the feature information of the test sample.
When the test sample is a hot spot fault sample, the information gain of the photovoltaic module under the hot spot fault feature is:
$$G_h = Y_{train} - Y_h \qquad (12)$$
wherein Y_h is the feature information of the hot spot fault sample.
When the test sample is a normal sample, the information gain between the two groups of normal samples is:
$$G_n = Y_{train} - Y_n \qquad (13)$$
wherein Y_n is the feature information of the normal sample.
The trace of a matrix summarizes its characteristic information; since the information gain G_t is not necessarily a square matrix, the detection variable of the hot spot fault is defined as:
$$F_{test} = \mathrm{tr}\left((G_t)^T G_t\right) \qquad (14)$$
wherein tr(·) denotes the trace of a matrix, i.e. the sum of the diagonal elements of a square matrix.
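Since tr(G^T G) equals the sum of the squared entries of G, the detection variable can be computed without forming G^T G explicitly. The sketch below assumes the two feature matrices have the same shape; the text does not say how differing sample counts n and m are reconciled, so that check is part of the assumption, as are the function name and the toy values.

```python
def detection_variable(y_ref, y_new):
    """F = tr(G^T G) with G = y_ref - y_new (matrices given as lists of rows)."""
    same_rows = len(y_ref) == len(y_new)
    if not same_rows or any(len(r) != len(s) for r, s in zip(y_ref, y_new)):
        raise ValueError("feature matrices must have the same shape")
    g = [[a - b for a, b in zip(r, s)] for r, s in zip(y_ref, y_new)]
    # Diagonal entry j of G^T G is the squared norm of column j of G.
    return sum(sum(row[j] ** 2 for row in g) for j in range(len(g[0])))

y_train = [[1.0, 2.0], [3.0, 4.0]]
y_test = [[0.0, 2.0], [1.0, 1.0]]
f_test = detection_variable(y_train, y_test)  # G = [[1, 0], [2, 3]]
```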
The feature information Y_train of the training sample is divided in a 1:1 ratio into two parts, Y_train1 and Y_train2, that is, Y_train = [Y_train1, Y_train2]; the two parts are used to calculate the information gain within the training sample. Both Y_train1 and Y_train2 are feature information of normal samples and therefore satisfy formula (13), so the information gain G_a within the training sample is:
$$G_a = Y_{train1} - Y_{train2} \qquad (15)$$
The detection variable of the training sample is:
$$F_{train} = \mathrm{tr}\left((G_a)^T G_a\right) \qquad (16)$$
The detection threshold of the hot spot fault of the photovoltaic module is F_m = max F_train.
Step 7 specifically comprises the following: the fault detection variable F_test of the test sample is calculated from the information gain G_t between the training sample and the test sample, and F_test is compared with the detection threshold F_m of the hot spot fault of the photovoltaic module to judge whether a hot spot fault has occurred; as shown in FIG. 3, if F_test > F_m, it is judged that the photovoltaic module has a hot spot fault.
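The threshold and decision steps can be sketched as follows. The sketch assumes the 1:1 division is made along the sample (row) direction and that a single split yields a single F_train value; the text takes F_m as the maximum of F_train without detailing how several values arise, so that part is left to the caller. All names and numbers are illustrative.

```python
def squared_gap(a, b):
    """tr((a - b)^T (a - b)) for two equally shaped matrices given as lists of rows."""
    return sum((x - y) ** 2 for r, s in zip(a, b) for x, y in zip(r, s))

def training_detection_variable(y_train):
    """F_train from one 1:1 split of the training features (assumed row-wise)."""
    half = len(y_train) // 2
    y_train1, y_train2 = y_train[:half], y_train[half:2 * half]
    return squared_gap(y_train1, y_train2)

def has_hot_spot(f_test, f_m):
    """Report a hot spot fault only when F_test strictly exceeds the threshold F_m."""
    return f_test > f_m

y_train = [[1.0, 1.0], [2.0, 2.0], [1.5, 1.0], [2.0, 2.5]]
f_m = max([training_detection_variable(y_train)])  # one split, so one candidate
```

With these toy values f_m is 0.5, so a test detection variable of 14.0 would be flagged as a hot spot fault, while a value equal to the threshold would not.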
A photovoltaic hot spot fault detection system based on fusion kernel entropy and information gain comprises a sample data establishment unit, a sample standardization unit, a kernel matrix calculation unit, a Rayleigh entropy value calculation unit, a feature information acquisition unit, a fault detection threshold calculation unit and a fault judgment unit;
the sample data establishment unit collects photovoltaic module data to form sample data, wherein the training sample is X_train and the test sample is X_test, and fuses the training sample X_train and the test sample X_test to obtain a fusion sample X_f;
the sample standardization unit calculates the mean and standard deviation of the fusion sample X_f, and standardizes X_f into a normalized fusion sample X'_f with mean 0 and standard deviation 1;
the kernel matrix calculation unit selects a Gaussian kernel function as the projection function, and uses it to project X'_f into a high-dimensional space to obtain a kernel matrix K_f;
the Rayleigh entropy value calculation unit performs eigendecomposition on the kernel matrix K_f to obtain an eigenvalue diagonal matrix D and an eigenvector matrix E, calculates the Rayleigh entropy value, and sorts the eigenvalues and the corresponding eigenvectors in descending order according to the Rayleigh entropy value;
the feature information acquisition unit selects eigenvalues and eigenvectors according to the Rayleigh entropy value, and calculates the feature information Y_f;
the fault detection threshold calculation unit uses the feature information Y_train of the training sample to calculate the detection variable F_train of the training sample and to determine a fault detection threshold F_m;
the fault judgment unit calculates the detection variable F_test of the test sample, compares F_test with the detection threshold F_m, and judges whether the photovoltaic module has a hot spot fault.
The working process of the sample data establishment unit specifically comprises the following:
characteristic data of the photovoltaic module are collected to form the sample data X = [x_1, x_2 ... x_θ ... x_S]^T, where S represents the total number of samplings, θ = 1, 2 ... S, and x_θ represents the characteristic data of the photovoltaic module collected at the θ-th sampling; the characteristic data of the photovoltaic module comprise irradiance G (W/m^2), temperature T (°C), open-circuit voltage U_O (V), short-circuit current I_S (A), maximum power point current I_M (A), maximum power point voltage U_M (V), maximum power M_P (W) and fill factor F (1); n consecutive normal samples are selected from the sample data X to form the training sample X_train = [x_1, x_2 ... x_n]^T, which contains n samples, each comprising t photovoltaic module characteristic data; from the remainder of the sample data X, m consecutive samples are selected to form the test sample X_test = [x_1, x_2 ... x_m]^T, which contains m samples, each comprising t photovoltaic module characteristic data; the training sample X_train and the test sample X_test are fused to obtain the fusion sample X_f:
$$X_f = [X_{train}, X_{test}] = [x_1, x_2 \ldots x_i \ldots x_{n+m}]^T \qquad (1)$$
wherein x_i represents the characteristic data of the photovoltaic module obtained at the i-th sampling, comprising the t photovoltaic module characteristic data listed above.
The working process of the sample standardization unit specifically comprises the following: the fusion sample X_f is standardized feature by feature to obtain the normalized fusion sample X'_f, using formula (2):
$$X'_f = \frac{X_f - \mathrm{mean}(X_f)}{\mathrm{sd}(X_f)} \qquad (2)$$
wherein mean(X_f) represents the mean of the fusion sample X_f, and sd(X_f) represents the standard deviation of the fusion sample X_f.
The working process of the kernel matrix calculation unit specifically comprises the following:
the normalized fusion sample X'_f is projected into a high-dimensional space, the projection function being the Gaussian kernel function of formula (3):
$$k(x'_i, x'_j) = \exp\left(-\frac{\|x'_i - x'_j\|^2}{2\sigma^2}\right) \qquad (3)$$
wherein x'_i represents the i-th sample of the normalized fusion sample X'_f, x'_j represents the j-th sample of X'_f, ||x'_i - x'_j|| represents the Euclidean distance between sample x'_i and sample x'_j, and σ is the bandwidth of the Gaussian kernel function, σ > 0;
evaluating the Gaussian kernel function k(x'_i, x'_j) for every pair of samples gives the kernel matrix K_f of the samples projected into the high-dimensional space:
$$K_f = \left[k(x'_i, x'_j)\right]_{i,j=1}^{n+m} \qquad (4)$$
The working process of the Rayleigh entropy value calculation unit specifically comprises the following:
the kernel matrix K_f satisfies formula (5):
$$(K_f - \lambda I)e = 0 \qquad (5)$$
wherein λ represents an eigenvalue of the kernel matrix K_f, e represents the corresponding eigenvector of K_f, and I is the (n+m)×(n+m) identity matrix; setting the determinant |K_f - λI| = 0 gives the eigenvalues λ_1, λ_2 ... λ_i ... λ_{n+m} of the kernel matrix K_f, λ_i being its i-th eigenvalue; substituting the eigenvalues λ_1, λ_2 ... λ_i ... λ_{n+m} into formula (5) in turn gives the corresponding eigenvectors e_1, e_2 ... e_i ... e_{n+m}, e_i being the i-th eigenvector of K_f;
since the kernel matrix K_f is a positive semi-definite matrix, it can be written in the form of formula (6):
$$K_f = E D E^T \qquad (6)$$
In formula (6), D = diag(λ_1, λ_2 ... λ_i ... λ_{n+m}) is the diagonal matrix of the eigenvalues of the kernel matrix K_f, and E = (e_1, e_2 ... e_i ... e_{n+m}) is the eigenvector matrix;
the Rayleigh entropy value of the fusion sample is calculated, and the eigenvalues and the corresponding eigenvectors are sorted in descending order according to the Rayleigh entropy value:
$$\hat{V}(p) = \frac{1}{(n+m)^2}\mathbf{1}^T K_f \mathbf{1} = \frac{1}{(n+m)^2}\mathbf{1}^T E D E^T \mathbf{1} \qquad (7)$$
In formula (7), \hat{V}(p) is the Rayleigh entropy value (an estimate of the Rayleigh entropy), n+m represents the number of samples, and 1 is an (n+m)×1 column vector of ones;
the eigenvalues and eigenvectors are sorted in descending order of their corresponding Rayleigh entropy value \hat{V}(p), giving the diagonal matrix D' = diag(λ'_1, λ'_2 ... λ'_i ... λ'_{n+m}) of the sorted eigenvalues and the sorted eigenvector matrix E' = (e'_1, e'_2 ... e'_i ... e'_{n+m}); λ'_i is the i-th eigenvalue after sorting, and e'_i is the i-th eigenvector after sorting;
the kernel entropy component analysis method is converted into an optimization problem:
$$\Phi_{eca} = D_d^{1/2} E_d^T:\quad \min_{\lambda'_1, e'_1, \ldots, \lambda'_{n+m}, e'_{n+m}} \hat{V}(p) - \hat{V}_d(p) \qquad (8)$$
wherein Φ_eca represents the feature mapping of the kernel entropy component analysis; \hat{V}_d(p) represents the Rayleigh entropy value obtained by taking the first d eigenvalues from the sorted diagonal matrix D' together with the corresponding eigenvectors; D_d is the diagonal matrix formed by the first d eigenvalues of the sorted diagonal matrix D', d ≤ n+m; and E_d is the eigenvector matrix formed by the first d eigenvectors of the sorted eigenvector matrix E' (corresponding to the eigenvalues λ'_i).
The working process of the feature information acquisition unit specifically comprises the following:
Φ_eca is constrained by requiring
$$\frac{\hat{V}_d(p)}{\hat{V}(p)} \geq M_{\text{kernel entropy threshold}}$$
from which the value of d is obtained; M_kernel entropy threshold is a constant;
the normalized fusion sample X'_f is projected onto the space spanned by the selected eigenvectors to obtain the feature information Y_f, as in formula (9):
[Formula (9): equation image GDA0004176483070000241, not reproduced in the text]
[Symbol image GDA0004176483070000242]
is a standardized fusion sample X'_f; φ represents the nonlinear mapping, and φ(X'_f) represents the matrix obtained by applying the nonlinear mapping to the fusion sample X'_f;
the feature information Y_f comprises the feature information Y_train of the training sample and the feature information Y_test of the test sample.
The working process of the fault detection threshold calculation unit specifically comprises the following:
the feature information Y_train of the training sample is used to calculate the detection variable of the training sample and the fault detection threshold;
the information gain between the training sample and the test sample is defined as:
$$G_t = Y_{train} - Y_{test} \qquad (11)$$
wherein Y_train is the feature information of the training sample, and Y_test is the feature information of the test sample;
when the test sample is a hot spot fault sample, the information gain of the photovoltaic module under the hot spot fault feature is:
$$G_h = Y_{train} - Y_h \qquad (12)$$
wherein Y_h is the feature information of the hot spot fault sample.
When the test sample is a normal sample, the information gain G_n between the two groups of normal samples is:
$$G_n = Y_{train} - Y_n \qquad (13)$$
wherein Y_n is the feature information of the normal sample;
the trace of a matrix summarizes its characteristic information; since the information gain G_t is not necessarily a square matrix, the detection variable of the hot spot fault is defined as:
$$F_{test} = \mathrm{tr}\left((G_t)^T G_t\right) \qquad (14)$$
wherein tr(·) denotes the trace of a matrix, i.e. the sum of the diagonal elements of a square matrix.
The feature information Y_train of the training sample is divided in a 1:1 ratio into two parts, Y_train1 and Y_train2, that is, Y_train = [Y_train1, Y_train2]; the two parts are used to calculate the information gain within the training sample, and the information gain G_a within the training sample is:
$$G_a = Y_{train1} - Y_{train2} \qquad (15)$$
The detection variable of the training sample is:
$$F_{train} = \mathrm{tr}\left((G_a)^T G_a\right) \qquad (16)$$
The detection threshold of the hot spot fault of the photovoltaic module is F_m = max F_train.
The working process of the fault judgment unit specifically comprises the following:
the fault detection variable F_test of the test sample is calculated from the information gain G_t between the training sample and the test sample, and F_test is compared with the detection threshold F_m of the hot spot fault of the photovoltaic module to judge whether a hot spot fault has occurred; if F_test > F_m, it is judged that the photovoltaic module has a hot spot fault.
The invention uses RStudio software for simulation verification and performance comparison of the method: the random number seed is set to 100, and the parameter of the Gaussian kernel function is set to 0.1. The verification shows that the method of this embodiment can effectively detect photovoltaic hot spot faults.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or groups of devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into a plurality of sub-modules.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or groups of embodiments may be combined into one module or unit or group, and furthermore they may be divided into a plurality of sub-modules or sub-units or groups. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as methods or combinations of method elements that may be implemented by a processor of a computer system or by other means of performing the functions. Thus, a processor with the necessary instructions for implementing the described method or method element forms a means for implementing the method or method element. Furthermore, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is for carrying out the functions performed by the elements for carrying out the objects of the invention.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions of the methods and apparatus of the present invention, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the method of the invention in accordance with instructions in said program code stored in the memory.
By way of example, and not limitation, computer readable media comprise computer storage media and communication media. Computer storage media store information such as computer readable instructions, data structures, program modules, or other data. Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media. Combinations of any of the above are also included within the scope of computer readable media.
As used herein, unless otherwise specified the use of the ordinal terms "first," "second," "third," etc., to describe a general object merely denote different instances of like objects, and are not intended to imply that the objects so described must have a given order, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims (6)

1. The photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain is characterized by comprising the following steps of:
Step 1, collecting photovoltaic module data to form sample data, wherein the training sample is X_train and the test sample is X_test; fusing the training sample X_train and the test sample X_test to obtain a fusion sample X_f;
Step 2, calculating the mean and standard deviation of the fusion sample X_f, and standardizing X_f into a normalized fusion sample X'_f with mean 0 and standard deviation 1;
Step 3, selecting a Gaussian kernel function as the projection function, and using it to project X'_f into a high-dimensional space to obtain a kernel matrix K_f;
Step 4, performing eigendecomposition on the kernel matrix K_f to obtain an eigenvalue diagonal matrix D and an eigenvector matrix E, calculating the Rayleigh entropy value, and sorting the eigenvalues and the corresponding eigenvectors in descending order according to the Rayleigh entropy value;
Step 5, selecting eigenvalues and eigenvectors according to the Rayleigh entropy value, and calculating the feature information Y_f;
Step 6, using the feature information Y_train of the training sample to calculate the detection variable F_train of the training sample and to determine a fault detection threshold F_m;
Step 7, calculating the detection variable F_test of the test sample, comparing F_test with the detection threshold F_m, and judging whether the photovoltaic module has a hot spot fault;
the step 1 specifically comprises the following steps:
collecting characteristic data of a photovoltaic module to form sample data X, X = [x_1, x_2, ..., x_θ, ..., x_S]^T, where S represents the total number of samplings, θ = 1, 2, ..., S, and x_θ is the characteristic data of the photovoltaic module acquired at the θ-th sampling; the characteristic data of the photovoltaic module comprise irradiance, temperature, open-circuit voltage, short-circuit current, maximum power point voltage, maximum power and fill factor; selecting n consecutive normal samples from the sample data X to form a training sample X_train = [x_1, x_2, ..., x_n]^T, wherein the training sample X_train comprises n samples and each sample comprises t photovoltaic module characteristic data; selecting m consecutive samples from the rest of the sample data X to form a test sample X_test = [x_1, x_2, ..., x_m]^T, wherein the test sample X_test comprises m samples and each sample comprises t photovoltaic module characteristic data; fusing the training sample X_train and the test sample X_test to obtain a fusion sample X_f:
X_f = [X_train, X_test] = [x_1, x_2, ..., x_i, ..., x_{n+m}]^T (1)
x_i represents the characteristic data of the photovoltaic module obtained at the i-th sampling;
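As a non-limiting illustration, the construction of the fusion sample in formula (1) might be sketched in Python with NumPy as follows. The function name build_fusion_sample, the argument test_start, and the convention that the first n rows of X are the normal training rows are assumptions made for this example and are not taken from the claims.

```python
import numpy as np

def build_fusion_sample(X, n, test_start, m):
    """Stack n training rows and m test rows of X (shape S x t) into X_f.

    Assumption: the first n rows of X are the consecutive normal samples,
    and the test rows start at index test_start in the remaining data.
    """
    X = np.asarray(X, dtype=float)
    if test_start < n or test_start + m > X.shape[0]:
        raise ValueError("test rows must lie in the data after the training rows")
    X_train = X[:n]                          # n consecutive normal samples
    X_test = X[test_start:test_start + m]    # m consecutive remaining samples
    X_f = np.vstack([X_train, X_test])       # shape (n + m, t), formula (1)
    return X_train, X_test, X_f
```

Each row of X_f is one sampling x_i and each column is one of the t characteristic quantities.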
the step 4 specifically comprises the following steps:
the kernel matrix K_f satisfies formula (5):
(K_f - λI)e = 0 (5)
wherein λ represents an eigenvalue of the kernel matrix K_f, e represents an eigenvector of the kernel matrix K_f, and I is the (n+m)×(n+m) identity matrix; setting the determinant |K_f - λI| = 0 and solving gives the eigenvalues λ_1, λ_2, ..., λ_i, ..., λ_{n+m} of the kernel matrix K_f, where λ_i represents the i-th eigenvalue of the kernel matrix K_f;
substituting the eigenvalues λ_1, λ_2, ..., λ_i, ..., λ_{n+m} into formula (5) respectively gives the corresponding eigenvectors e_1, e_2, ..., e_i, ..., e_{n+m}, where e_i represents the i-th eigenvector of the kernel matrix K_f;
the kernel matrix K_f is expressed in the form of formula (6):
K_f = E D E^T (6)
D = diag(λ_1, λ_2, ..., λ_i, ..., λ_{n+m}) is the diagonal matrix of the eigenvalues of the kernel matrix K_f, and E = (e_1, e_2, ..., e_i, ..., e_{n+m}) is the eigenvector matrix;
calculating the Rayleigh entropy value of the fusion sample, and sorting the eigenvalues and the corresponding eigenvectors in descending order according to the Rayleigh entropy value:
Figure FDA0004176483050000031
Figure FDA0004176483050000032
is the Rayleigh entropy value, n+m represents the number of samples, and 1 is an (n+m)×1 vector whose elements are all 1;
based on the Rayleigh entropy value
Figure FDA0004176483050000033
the corresponding eigenvalues and eigenvectors are sorted in descending order, giving the diagonal matrix D' = diag(λ'_1, λ'_2, ..., λ'_i, ..., λ'_{n+m}) composed of the sorted eigenvalues and the sorted eigenvector matrix E' = (e'_1, e'_2, ..., e'_i, ..., e'_{n+m}), where λ'_i is the i-th eigenvalue after the descending sort and e'_i is the i-th eigenvector after the descending sort;
performing kernel entropy component analysis:
Figure FDA0004176483050000034
Φ_eca represents the feature mapping of the kernel entropy component analysis;
V_d(p) represents the Rayleigh entropy value obtained by taking the first d eigenvalues and the corresponding eigenvectors from the sorted diagonal matrix D'; D_d is the diagonal matrix composed of the first d eigenvalues in the sorted diagonal matrix D', d ≤ n+m, and E_d is the eigenvector matrix formed by the first d eigenvectors in the sorted eigenvector matrix E'.
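As a non-limiting illustration, the eigendecomposition of formula (6) and the entropy-based sorting of step 4 might be sketched as follows. The entropy formula itself is given only as an image in this text, so the sketch assumes the form commonly used in kernel entropy component analysis (where the quantity is usually called the Renyi quadratic entropy estimate), V(p) = (1/N^2) 1^T K_f 1 = (1/N^2) Σ_i (sqrt(λ_i) e_i^T 1)^2 with N = n+m, and ranks each eigenpair by its term in that sum. The function name entropy_ranked_eig is hypothetical.

```python
import numpy as np

def entropy_ranked_eig(K):
    """Eigendecompose a symmetric kernel matrix and sort by entropy contribution.

    Assumption: the contribution of the i-th eigenpair is
    (sqrt(lam_i) * e_i^T 1)^2 / N^2, the usual KECA decomposition.
    """
    K = np.asarray(K, dtype=float)
    N = K.shape[0]
    lam, E = np.linalg.eigh(K)            # symmetric solver, orthonormal columns
    lam = np.clip(lam, 0.0, None)         # remove tiny negative round-off values
    ones = np.ones(N)
    contrib = lam * (E.T @ ones) ** 2 / N ** 2
    order = np.argsort(contrib)[::-1]     # descending entropy contribution
    return lam[order], E[:, order], contrib[order]
```

Under this assumption the contributions sum to the mean of all entries of K_f, and the resulting order can differ from a plain descending sort by eigenvalue, because an eigenvector whose elements sum to zero contributes nothing to the entropy estimate.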
2. The photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain according to claim 1, wherein
the step 2 specifically comprises the following steps:
normalizing the fusion sample X_f feature by feature to obtain a normalized fusion sample X'_f, the normalization being performed by formula (2):
X'_f = (X_f - mean(X_f)) / sd(X_f) (2)
wherein mean(X_f) represents the mean value of the fusion sample X_f and sd(X_f) represents the standard deviation of the fusion sample X_f.
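As a non-limiting illustration, the normalization of step 2 might be sketched as follows. Computing the mean and standard deviation per feature column, using the population standard deviation, and guarding against constant columns are assumptions of this example; the function name normalize_fusion_sample is hypothetical.

```python
import numpy as np

def normalize_fusion_sample(X_f):
    """Scale each feature column of X_f to mean 0 and standard deviation 1."""
    X_f = np.asarray(X_f, dtype=float)
    mean = X_f.mean(axis=0)                  # per-feature mean
    sd = X_f.std(axis=0)                     # per-feature population std (assumed)
    sd_safe = np.where(sd > 0, sd, 1.0)      # constant columns stay at 0 (added guard)
    return (X_f - mean) / sd_safe
```

Because the statistics are computed on the fused sample, training and test rows share one scale.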
3. The photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain according to claim 2, wherein
the step 3 specifically comprises the following steps:
projecting the normalized fusion sample X'_f into a high-dimensional space, the projection function being a Gaussian kernel function given by formula (3):
Figure FDA0004176483050000042
wherein x'_i represents the i-th sample of the normalized fusion sample X'_f, x'_j represents the j-th sample of the normalized fusion sample X'_f, ||x'_i - x'_j|| denotes the Euclidean distance between sample x'_i and sample x'_j, σ is the bandwidth of the Gaussian kernel function, and σ > 0;
evaluating the Gaussian kernel function k(x'_i, x'_j) for every pair of samples gives the kernel matrix K_f of the samples projected into the high-dimensional space:
Figure FDA0004176483050000043
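As a non-limiting illustration, the kernel matrix of step 3 might be computed as follows. Formula (3) is given only as an image in this text, so the scaling exp(-||x'_i - x'_j||^2 / (2σ^2)) used here is an assumption; other conventions divide by σ^2 alone. The function name gaussian_kernel_matrix is hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """Return the (n+m) x (n+m) Gaussian kernel matrix of the rows of X.

    Assumption: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances; clip round-off that would go below zero.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

The result is symmetric with ones on the diagonal, which is what the eigendecomposition of formula (6) relies on.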
4. The photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain according to claim 1, wherein
the step 5 specifically comprises the following steps:
constraining Φ_eca by selecting
Figure FDA0004176483050000044
to obtain the value of d, where the kernel entropy threshold M_(kernel entropy threshold) is a constant;
projecting the normalized fusion sample X'_f into the space spanned by the eigenvectors to obtain the feature information Y_f, formula (9):
Figure FDA0004176483050000051
Figure FDA0004176483050000052
wherein X'_f is the normalized fusion sample, φ represents the nonlinear mapping, and φ(X'_f) represents the matrix obtained by nonlinearly mapping the fusion sample X'_f;
the feature information Y_f comprises the feature information Y_train of the training sample and the feature information Y_test of the test sample.
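As a non-limiting illustration, step 5 might be sketched as follows. The selection criterion and formula (9) are given only as images in this text, so two assumptions are made here: d is taken as the smallest number of components whose cumulative share of the entropy reaches a threshold entropy_share, and the feature information is taken as the in-sample scores E_d D_d^(1/2), with one row per sample. The function name select_and_project and the argument names are hypothetical.

```python
import numpy as np

def select_and_project(lam, E, contrib, entropy_share, n):
    """Pick d components and split the scores into training and test parts.

    lam, E, contrib must already be sorted by descending entropy contribution.
    Assumptions: cumulative-share criterion for d; scores Y_f = E_d * sqrt(D_d).
    """
    lam = np.asarray(lam, dtype=float)
    E = np.asarray(E, dtype=float)
    cum = np.cumsum(contrib) / np.sum(contrib)
    # Smallest d whose cumulative share reaches the threshold, capped at n + m.
    d = min(int(np.searchsorted(cum, entropy_share)) + 1, len(lam))
    Y_f = E[:, :d] * np.sqrt(lam[:d])     # rows are samples, columns are components
    return d, Y_f[:n], Y_f[n:]            # d, Y_train, Y_test
```

The first n rows of Y_f correspond to the training sample and the remaining m rows to the test sample, following the row order of formula (1).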
5. The photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain according to claim 4, wherein
the step 6 specifically comprises the following steps:
using the feature information Y_train of the training sample to calculate the detection variable and the fault detection threshold of the training sample;
the information gain G_t between the training sample and the test sample is defined as:
G_t = Y_train - Y_test (11)
when the test sample is a hot spot fault sample, the information gain G_h of the photovoltaic module under the hot spot fault characteristic is:
G_h = Y_train - Y_h (12)
Y_h denotes the feature information of the hot spot fault sample;
when the test sample is a normal sample, the information gain G_n between the two groups of normal samples is:
G_n = Y_train - Y_n (13)
Y_n denotes the feature information of the normal sample;
the detection variable of the hot spot fault is defined as:
F_test = tr((G_t)^T G_t) (14)
tr() represents the trace of a matrix;
the information gain G_a within the training sample is:
G_a = Y_train1 - Y_train2 (15)
Y_train = [Y_train1, Y_train2]; Y_train is divided in a 1:1 ratio into two parts, Y_train1 and Y_train2, which are used to calculate the information gain within the training sample;
the detection variable of the training sample is:
F_train = tr((G_a)^T G_a) (16)
the detection threshold of the hot spot fault of the photovoltaic module is F_m = max F_train.
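As a non-limiting illustration, formulas (11) to (16) might be sketched as follows. The sketch assumes that the two blocks being subtracted have the same shape (for formula (11) this means n = m), that Y_train is split into its first and second halves, and that a single split is evaluated, in which case the maximum in F_m reduces to that one value. The function names are hypothetical.

```python
import numpy as np

def detection_variable(Y_a, Y_b):
    """tr(G^T G) for the information gain G = Y_a - Y_b, formulas (14) and (16)."""
    Y_a = np.asarray(Y_a, dtype=float)
    Y_b = np.asarray(Y_b, dtype=float)
    if Y_a.shape != Y_b.shape:
        raise ValueError("both feature blocks must have the same shape")
    G = Y_a - Y_b
    return float(np.trace(G.T @ G))       # equals the sum of squared entries of G

def training_threshold(Y_train):
    """F_m = max F_train over the evaluated 1:1 splits (one split assumed here)."""
    Y_train = np.asarray(Y_train, dtype=float)
    half = Y_train.shape[0] // 2
    F_train = [detection_variable(Y_train[:half], Y_train[half:2 * half])]
    return max(F_train)
```

A test block would then be compared through detection_variable(Y_train, Y_test) against training_threshold(Y_train). Since tr(G^T G) is the squared Frobenius norm of G, the statistic is defined even when G is not square.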
6. A photovoltaic hot spot fault detection system based on fusion kernel entropy and information gain, characterized by comprising a sample data establishing unit, a sample normalization unit, a kernel matrix calculation unit, a Rayleigh entropy value calculation unit, a feature information acquisition unit, a fault detection threshold calculation unit and a fault judging unit;
the sample data establishing unit acquires data of the photovoltaic module to form sample data, wherein the training sample is X_train and the test sample is X_test, and fuses the training sample X_train and the test sample X_test to obtain a fusion sample X_f;
the sample normalization unit calculates the mean and standard deviation of the fusion sample X_f and converts X_f into a normalized fusion sample X'_f with mean 0 and standard deviation 1;
the kernel matrix calculation unit selects a Gaussian kernel function as the projection function and uses it to project X'_f into a high-dimensional space to obtain a kernel matrix K_f;
the Rayleigh entropy value calculation unit performs eigendecomposition on the kernel matrix K_f to obtain an eigenvalue diagonal matrix D and an eigenvector matrix E, calculates the Rayleigh entropy value, and sorts the eigenvalues and the corresponding eigenvectors in descending order according to the Rayleigh entropy value;
the feature information acquisition unit selects eigenvalues and eigenvectors based on the Rayleigh entropy values and calculates the feature information Y_f;
the fault detection threshold calculation unit uses the feature information Y_train of the training sample to calculate the detection variable F_train of the training sample and determines a fault detection threshold F_m;
the fault judging unit calculates the detection variable F_test of the test sample, compares the detection variable F_test with the detection threshold F_m, and judges whether the photovoltaic module has a hot spot fault;
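As a non-limiting illustration, the seven units might be chained in a single function as follows. Every name and default value here is hypothetical. The sketch assumes a Gaussian kernel scaled by 2σ^2, the entropy decomposition commonly used in kernel entropy component analysis, a cumulative entropy share for choosing d, equal numbers of training and test rows, and that a detection variable above the threshold indicates a fault; none of these details is fixed by the text available here.

```python
import numpy as np

def detect_hot_spot(X_train, X_test, sigma=1.0, entropy_share=0.9):
    """Return (F_test, F_m, is_fault) for equally sized training and test blocks."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    n = X_train.shape[0]
    if X_test.shape != X_train.shape:
        raise ValueError("this sketch assumes n == m and equal feature counts")
    # Sample data establishing unit: fuse training and test rows.
    X_f = np.vstack([X_train, X_test])
    # Sample normalization unit: per-feature mean 0, standard deviation 1.
    sd = X_f.std(axis=0)
    X_n = (X_f - X_f.mean(axis=0)) / np.where(sd > 0, sd, 1.0)
    # Kernel matrix calculation unit: Gaussian kernel (assumed 2*sigma^2 scaling).
    sq = np.sum(X_n ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X_n @ X_n.T), 0.0)
    K_f = np.exp(-d2 / (2.0 * sigma ** 2))
    # Entropy value calculation unit: eigendecompose, rank by entropy term.
    lam, E = np.linalg.eigh(K_f)
    lam = np.clip(lam, 0.0, None)
    contrib = lam * E.sum(axis=0) ** 2    # column sum is e_i^T 1; 1/N^2 omitted
    order = np.argsort(contrib)[::-1]
    lam, E, contrib = lam[order], E[:, order], contrib[order]
    # Feature information acquisition unit: choose d, compute scores.
    cum = np.cumsum(contrib) / contrib.sum()
    d = min(int(np.searchsorted(cum, entropy_share)) + 1, len(lam))
    Y_f = E[:, :d] * np.sqrt(lam[:d])
    Y_train, Y_test = Y_f[:n], Y_f[n:]
    # Fault detection threshold calculation unit: 1:1 split of Y_train.
    half = n // 2
    G_a = Y_train[:half] - Y_train[half:2 * half]
    F_m = float(np.trace(G_a.T @ G_a))
    # Fault judging unit: compare the test statistic with the threshold.
    G_t = Y_train - Y_test
    F_test = float(np.trace(G_t.T @ G_t))
    return F_test, F_m, bool(F_test > F_m)
```

In a synthetic check, a test block whose seven features are shifted far from the training block yields a detection variable well above the threshold.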
the working process of the sample data establishing unit specifically comprises the following steps:
collecting characteristic data of a photovoltaic module to form sample data X, X = [x_1, x_2, ..., x_θ, ..., x_S]^T, where S represents the total number of samplings, θ = 1, 2, ..., S, and x_θ is the characteristic data of the photovoltaic module acquired at the θ-th sampling; the characteristic data of the photovoltaic module comprise irradiance, temperature, open-circuit voltage, short-circuit current, maximum power point voltage, maximum power and fill factor; selecting n consecutive normal samples from the sample data X to form a training sample X_train = [x_1, x_2, ..., x_n]^T, wherein the training sample X_train comprises n samples and each sample comprises t photovoltaic module characteristic data; selecting m consecutive samples from the rest of the sample data X to form a test sample X_test = [x_1, x_2, ..., x_m]^T, wherein the test sample X_test comprises m samples and each sample comprises t photovoltaic module characteristic data; fusing the training sample X_train and the test sample X_test to obtain a fusion sample X_f:
X_f = [X_train, X_test] = [x_1, x_2, ..., x_i, ..., x_{n+m}]^T (1)
x_i represents the characteristic data of the photovoltaic module obtained at the i-th sampling;
the working process of the sample normalization unit specifically comprises the following steps:
normalizing the fusion sample X_f feature by feature to obtain a normalized fusion sample X'_f, the normalization being performed by formula (2):
X'_f = (X_f - mean(X_f)) / sd(X_f) (2)
wherein mean(X_f) represents the mean value of the fusion sample X_f and sd(X_f) represents the standard deviation of the fusion sample X_f;
the working process of the kernel matrix calculation unit specifically comprises the following steps:
projecting the normalized fusion sample X'_f into a high-dimensional space, the projection function being a Gaussian kernel function given by formula (3):
Figure FDA0004176483050000081
wherein x'_i represents the i-th sample of the normalized fusion sample X'_f, x'_j represents the j-th sample of the normalized fusion sample X'_f, ||x'_i - x'_j|| denotes the Euclidean distance between sample x'_i and sample x'_j, σ is the bandwidth of the Gaussian kernel function, and σ > 0; evaluating the Gaussian kernel function k(x'_i, x'_j) for every pair of samples gives the kernel matrix K_f of the samples projected into the high-dimensional space:
Figure FDA0004176483050000082
the working process of the Rayleigh entropy value calculation unit specifically comprises the following steps:
the kernel matrix K_f satisfies formula (5):
(K_f - λI)e = 0 (5)
wherein λ represents an eigenvalue of the kernel matrix K_f, e represents an eigenvector of the kernel matrix K_f, and I is the (n+m)×(n+m) identity matrix; setting the determinant |K_f - λI| = 0 and solving gives the eigenvalues λ_1, λ_2, ..., λ_i, ..., λ_{n+m} of the kernel matrix K_f, where λ_i represents the i-th eigenvalue of the kernel matrix K_f; substituting the eigenvalues λ_1, λ_2, ..., λ_i, ..., λ_{n+m} into formula (5) respectively gives the corresponding eigenvectors e_1, e_2, ..., e_i, ..., e_{n+m}, where e_i represents the i-th eigenvector of the kernel matrix K_f;
the kernel matrix K_f is expressed in the form of formula (6):
K_f = E D E^T (6)
D = diag(λ_1, λ_2, ..., λ_i, ..., λ_{n+m}) is the diagonal matrix of the eigenvalues of the kernel matrix K_f, and E = (e_1, e_2, ..., e_i, ..., e_{n+m}) is the eigenvector matrix;
calculating the Rayleigh entropy value of the fusion sample, and sorting the eigenvalues and the corresponding eigenvectors in descending order according to the Rayleigh entropy value:
Figure FDA0004176483050000091
Figure FDA0004176483050000092
is the Rayleigh entropy value, n+m represents the number of samples, and 1 is an (n+m)×1 vector whose elements are all 1;
based on the Rayleigh entropy value
Figure FDA0004176483050000093
the corresponding eigenvalues and eigenvectors are sorted in descending order, giving the diagonal matrix D' = diag(λ'_1, λ'_2, ..., λ'_i, ..., λ'_{n+m}) composed of the sorted eigenvalues and the sorted eigenvector matrix E' = (e'_1, e'_2, ..., e'_i, ..., e'_{n+m}), where λ'_i is the i-th eigenvalue after the descending sort and e'_i is the i-th eigenvector after the descending sort;
performing kernel entropy component analysis:
Figure FDA0004176483050000094
Φ_eca represents the feature mapping of the kernel entropy component analysis;
V_d(p) represents the Rayleigh entropy value obtained by taking the first d eigenvalues and the corresponding eigenvectors from the sorted diagonal matrix D';
D_d is the diagonal matrix composed of the first d eigenvalues in the sorted diagonal matrix D', d ≤ n+m, and E_d is the eigenvector matrix formed by the first d eigenvectors in the sorted eigenvector matrix E';
the working process of the feature information acquisition unit specifically comprises the following steps:
constraining Φ_eca by selecting
Figure FDA0004176483050000101
to obtain the value of d, where the kernel entropy threshold M_(kernel entropy threshold) is a constant;
projecting the normalized fusion sample X'_f into the space spanned by the eigenvectors to obtain the feature information Y_f, formula (9):
Figure FDA0004176483050000102
Figure FDA0004176483050000103
wherein X'_f is the normalized fusion sample, φ represents the nonlinear mapping, and φ(X'_f) represents the matrix obtained by nonlinearly mapping the fusion sample X'_f;
the feature information Y_f comprises the feature information Y_train of the training sample and the feature information Y_test of the test sample;
the working process of the fault detection threshold calculation unit specifically comprises the following steps:
using the feature information Y_train of the training sample to calculate the detection variable and the fault detection threshold of the training sample;
the information gain G_t between the training sample and the test sample is defined as:
G_t = Y_train - Y_test (11)
when the test sample is a hot spot fault sample, the information gain G_h of the photovoltaic module under the hot spot fault characteristic is:
G_h = Y_train - Y_h (12)
Y_h denotes the feature information of the hot spot fault sample;
when the test sample is a normal sample, the information gain G_n between the two groups of normal samples is:
G_n = Y_train - Y_n (13)
Y_n denotes the feature information of the normal sample;
the trace of a matrix describes the characteristic information of the matrix; considering that the information gain G_t is not necessarily a square matrix, the detection variable of the hot spot fault is defined as:
F_test = tr((G_t)^T G_t) (14)
tr() represents the trace of a matrix;
the information gain G_a within the training sample is:
G_a = Y_train1 - Y_train2 (15)
Y_train = [Y_train1, Y_train2]; Y_train is divided in a 1:1 ratio into two parts, Y_train1 and Y_train2;
the detection variable of the training sample is:
F_train = tr((G_a)^T G_a) (16)
the detection threshold of the hot spot fault of the photovoltaic module is F_m = max F_train.
CN202211418893.2A 2022-11-14 2022-11-14 Photovoltaic hot spot fault detection method and system based on fusion kernel entropy and information gain Active CN115758284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211418893.2A CN115758284B (en) 2022-11-14 2022-11-14 Photovoltaic hot spot fault detection method and system based on fusion kernel entropy and information gain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211418893.2A CN115758284B (en) 2022-11-14 2022-11-14 Photovoltaic hot spot fault detection method and system based on fusion kernel entropy and information gain

Publications (2)

Publication Number Publication Date
CN115758284A CN115758284A (en) 2023-03-07
CN115758284B true CN115758284B (en) 2023-05-16

Family

ID=85370162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211418893.2A Active CN115758284B (en) 2022-11-14 2022-11-14 Photovoltaic hot spot fault detection method and system based on fusion kernel entropy and information gain

Country Status (1)

Country Link
CN (1) CN115758284B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109617526A (en) * 2018-12-20 2019-04-12 福州大学 A method of photovoltaic power generation array fault diagnosis and classification based on wavelet multiresolution analysis and SVM

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106094749B (en) * 2016-06-17 2019-03-01 浙江大学 Based on the nonlinear fault detection method and application for improving nuclear entropy constituent analysis
CN107247968A (en) * 2017-07-24 2017-10-13 东北林业大学 Based on logistics equipment method for detecting abnormality under nuclear entropy constituent analysis imbalance data
CN112947649B (en) * 2021-03-19 2021-11-23 安阳师范学院 Multivariate process monitoring method based on mutual information matrix projection
CN113743476A (en) * 2021-08-10 2021-12-03 湖州师范学院 Fault detection method based on improved KECA, electronic equipment and readable storage medium
CN114139614B (en) * 2021-11-18 2022-09-23 南京工业大学 Fisher photovoltaic module hot spot diagnosis method and system based on typical correlation analysis feature extraction

Also Published As

Publication number Publication date
CN115758284A (en) 2023-03-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant