CN111680820B

CN111680820B - Distributed photovoltaic power station fault diagnosis method and device

Info

Publication number: CN111680820B
Application number: CN202010381231.7A
Authority: CN
Inventors: 赵健; 周宁; 刘昊; 孙芊; 王鹏; 马建伟; 王磊; 张建宾; 朱红路
Original assignee: Beijing Sinokey Technology Co ltd; State Grid Corp of China SGCC; Electric Power Research Institute of State Grid Henan Electric Power Co Ltd
Current assignee: Beijing Sinokey Technology Co ltd; State Grid Corp of China SGCC; Electric Power Research Institute of State Grid Henan Electric Power Co Ltd
Priority date: 2020-05-08
Filing date: 2020-05-08
Publication date: 2022-08-19
Anticipated expiration: 2040-05-08
Also published as: CN111680820A

Abstract

The application relates to a distributed photovoltaic power station fault diagnosis method. The method first collects historical operation data of each photovoltaic power station, establishes a time series of the output data of each photovoltaic power station, and collects the historical fault diagnosis records of each photovoltaic power station. A BP neural network is trained on the collected historical operation data to obtain a prediction model of theoretical power generation, and a distributed photovoltaic fault diagnosis model is established using a decision tree method. The output data of each photovoltaic power station are then monitored and input into the prediction model to obtain the theoretical power generation; correlation coefficients are calculated from the output data and the theoretical power generation and are input into the distributed photovoltaic fault diagnosis model to judge the state of the power station.

Description

Distributed photovoltaic power station fault diagnosis method and device

Technical Field

The application belongs to the technical field of photovoltaic power generation fault diagnosis, and particularly relates to a distributed photovoltaic power station fault diagnosis method and device based on a BP neural network and a decision tree.

Background

Photovoltaic power is inexpensive, is not restricted by geographical location, can meet the energy demand of off-grid systems, and has a broad market. According to a report on the renewable energy generation (RDEG) technology market released by the international market research institute Technavio, the market will grow by 295.15 GW during 2019-. In recent years, photovoltaic power generation in China has developed rapidly: 44.06 GW of photovoltaic capacity was newly installed nationwide in 2018, and the total installed photovoltaic capacity nationwide reached 174.63 GW. With the rapid increase of the installed photovoltaic capacity, the intelligent operation function of the photovoltaic power generation system). The implementation of the above functions depends on the quality and reliability of the data.

Unlike large grid-connected photovoltaic power stations, distributed photovoltaic power stations often lack on-site meteorological data among the data collected during operation. The lack of meteorological information causes many existing photovoltaic power station analysis methods to fail and restricts their use in practical engineering applications. Therefore, photovoltaic power stations need to be analyzed and evaluated from a new perspective.

In addition, the actual output of a photovoltaic power station is complex and differs considerably from theoretical data. To analyze power station data, indexes that effectively describe the operating state of a photovoltaic power station are needed, and a photovoltaic power station direct current side weak point diagnosis method based on time and space functions is provided.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: in order to solve the defects in the prior art, the method and the device for diagnosing the faults of the distributed photovoltaic power station are provided.

The technical scheme adopted by the invention for solving the technical problem is as follows:

A distributed photovoltaic power station fault diagnosis method comprises the following steps:

S1: collecting historical operating data of each photovoltaic power station, establishing a time series of the output data of each photovoltaic power station, and collecting historical fault diagnosis records of each photovoltaic power station;

S2: training a BP neural network on the collected historical operating data of each photovoltaic power station to obtain a prediction model of theoretical power generation, wherein the prediction model can calculate the theoretical power generation of a specific photovoltaic power station from the operating data of other photovoltaic power stations, and establishing a time series of the theoretical power generation of each photovoltaic power station;

S3: intercepting the time series of the output data of each photovoltaic power station and the time series of the theoretical power generation over a certain time period at the same occurrence time, calibrating the fault type of each intercepted segment at the corresponding occurrence time according to the historical fault diagnosis records, and calculating the correlation coefficients between the intercepted segment of the output data time series and the intercepted segment of the theoretical power generation time series;

S4: taking the correlation coefficients as the input vector and the marked fault type as the output vector, using the input vector and the output vector as training data, and establishing a distributed photovoltaic fault diagnosis model by a decision tree method;

S5: monitoring the output data of each photovoltaic power station, inputting the output data into the theoretical power generation prediction model to obtain the theoretical power generation, calculating the correlation coefficients between the output data and the theoretical power generation, and inputting the correlation coefficients into the distributed photovoltaic fault diagnosis model to judge the state of the power station.

Preferably, in the distributed photovoltaic power station fault diagnosis method of the present invention, in step S1, similarity analysis is further performed on the time series of the output data of the photovoltaic power stations using the Pearson correlation coefficient, so as to obtain, for each photovoltaic power station, a plurality of photovoltaic power stations with higher similarity to it;

in step S2, the theoretical power generation of a specific photovoltaic power station is calculated from the operating data of the selected photovoltaic power stations with higher similarity.
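The station selection described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the choice of k, the handling of a constant series, and the toy output values are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def most_similar_stations(outputs, target, k=3):
    """Return the k station ids whose output series correlate best with `target`."""
    scores = {
        station: pearson(outputs[target], series)
        for station, series in outputs.items()
        if station != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy output series (hypothetical values, not data from the application).
outputs = {
    1: [0.0, 2.0, 5.0, 7.0, 4.0, 1.0],
    2: [0.0, 2.1, 4.8, 7.2, 4.1, 0.9],
    3: [0.0, 1.0, 1.0, 6.0, 6.0, 0.0],
    4: [5.0, 4.0, 3.0, 2.0, 1.0, 0.0],
}
reference_ids = most_similar_stations(outputs, target=1, k=2)
```

Ranking by the coefficient rather than applying a fixed cut-off keeps the number of network inputs constant for every target station, which is convenient when one network architecture is reused.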

Preferably, according to the fault diagnosis method for the distributed photovoltaic power station, the correlation coefficients are relative Euclidean distance and Pearson correlation coefficient.

Preferably, in the distributed photovoltaic power station fault diagnosis method, the calculation formula of the relative Euclidean distance is

In the formula: $\delta(X_{Tar}, X_{Ref})$ is the relative Euclidean distance, wherein $X_{Tar}$ is an intercepted segment of the time series of the output data and $X_{Ref}$ is an intercepted segment of the time series of the theoretical power generation;

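the Pearson correlation coefficient is calculated by the formula

$$r = \frac{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)\left(X_{Ref,i}-\bar{X}_{Ref}\right)}{\sqrt{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(X_{Ref,i}-\bar{X}_{Ref}\right)^{2}}}$$

wherein $X_{Tar}$ is an intercepted segment of the time series of the output data and $X_{Ref}$ is an intercepted segment of the time series of the theoretical power generation; $r$ is the Pearson correlation coefficient; $X_{Tar,i}$ and $X_{Ref,i}$ are the $i$-th of the $N$ samples in the respective segments;

$\bar{X}_{Tar}$ and $\bar{X}_{Ref}$ are the average values of the intercepted segment of the time series of the output data and of the intercepted segment of the time series of the theoretical power generation, respectively.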
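As a rough illustration of the two coefficients, the sketch below computes both for one pair of intercepted segments. The Pearson coefficient follows its standard definition. The relative Euclidean distance is written here as the Euclidean distance divided by the norm of the theoretical segment, which is an assumed normalisation chosen for the example and may differ from the one used in the application; the function names and sample values are likewise hypothetical.

```python
import math


def relative_euclidean_distance(x_tar, x_ref):
    """Euclidean distance between the segments, divided by the norm of x_ref.

    Assumption: this normalisation is illustrative only.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_tar, x_ref)))
    norm = math.sqrt(sum(b ** 2 for b in x_ref))
    return dist / norm if norm > 0 else 0.0


def pearson(x_tar, x_ref):
    """Standard Pearson correlation coefficient r of the two segments."""
    n = len(x_tar)
    mean_tar = sum(x_tar) / n
    mean_ref = sum(x_ref) / n
    cov = sum((a - mean_tar) * (b - mean_ref) for a, b in zip(x_tar, x_ref))
    var_tar = sum((a - mean_tar) ** 2 for a in x_tar)
    var_ref = sum((b - mean_ref) ** 2 for b in x_ref)
    if var_tar == 0 or var_ref == 0:
        return 0.0  # assumption: a flat segment carries no shape information
    return cov / math.sqrt(var_tar * var_ref)


def segment_features(x_tar, x_ref):
    """Input vector for the diagnosis model: [distance, correlation]."""
    return [relative_euclidean_distance(x_tar, x_ref), pearson(x_tar, x_ref)]


# Hypothetical segment: measured output is exactly half the theoretical output.
x_ref = [1.0, 2.0, 3.0, 4.0]
x_tar = [0.5, 1.0, 1.5, 2.0]
features = segment_features(x_tar, x_ref)
```

In this toy case the curve shape is preserved, so the correlation stays at 1 while the distance grows to 0.5. This is why the two coefficients are complementary: one responds to changes in shape, the other to changes in magnitude.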

Preferably, in the distributed photovoltaic power plant fault diagnosis method, the time period intercepted in the step S3 is 2-5 h.

A distributed photovoltaic power station fault diagnosis device comprises:

a data acquisition module: used for collecting historical operating data of each photovoltaic power station, establishing a time series of the output data of each photovoltaic power station, and collecting historical fault diagnosis records of each photovoltaic power station;

a prediction model of theoretical power generation: obtained by training a BP neural network on the collected historical operating data of each photovoltaic power station, wherein the prediction model can calculate the theoretical power generation of a specific photovoltaic power station from the operating data of other photovoltaic power stations;

a theoretical power generation calculation module: used for collecting operation data of each photovoltaic power station, calculating the theoretical power generation of each photovoltaic power station according to the collected operation data, and forming a theoretical power generation time series from the calculated theoretical power generation;

a correlation calculation module: used for intercepting the time series of the output data of each photovoltaic power station and the time series of the theoretical power generation over a certain time period at the same occurrence time, calibrating the fault type of each intercepted segment at the corresponding occurrence time according to the historical fault diagnosis records, and calculating the correlation coefficients between the intercepted segment of the output data time series and the intercepted segment of the theoretical power generation time series;

a fault diagnosis model: established by training with a decision tree method so as to obtain the fault type, taking the correlation coefficients obtained by the correlation calculation module as the input vector and the marked fault type as the output vector, with the input vector and the output vector as training data;

a fault diagnosis module: used for calculating, through the correlation calculation module, the correlation coefficients from the monitored output data of each photovoltaic power station, and inputting the correlation coefficients as an input vector into the distributed photovoltaic fault diagnosis model to judge the state of the power station.

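Preferably, in the distributed photovoltaic power station fault diagnosis device of the invention, the data acquisition module further performs similarity analysis on the time series of the output data of the photovoltaic power stations using the Pearson correlation coefficient, so as to obtain, for each photovoltaic power station, a plurality of photovoltaic power stations with higher similarity to it;

and the theoretical power generation of a specific photovoltaic power station is calculated in the theoretical power generation prediction model from the operating data of the selected photovoltaic power stations with higher similarity.

the calculation formula of the Pearson correlation coefficient is

$$r = \frac{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)\left(X_{Ref,i}-\bar{X}_{Ref}\right)}{\sqrt{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(X_{Ref,i}-\bar{X}_{Ref}\right)^{2}}}$$

wherein $X_{Tar}$ and $X_{Ref}$ are the intercepted segments of the time series of the output data and of the theoretical power generation, $X_{Tar,i}$ and $X_{Ref,i}$ are the $i$-th of their $N$ samples, and $\bar{X}_{Tar}$ and $\bar{X}_{Ref}$ are their average values.

Preferably, in the distributed photovoltaic power station fault diagnosis device, the time period intercepted by the correlation calculation module is 2-5 h.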

The beneficial effects of the invention are:

Drawings

The technical solution of the present application is further explained below with reference to the drawings and the embodiments.

FIG. 1 is a flow chart of a method of fault diagnosis for a photovoltaic power plant;

FIG. 2 is a schematic diagram of a BP neural network;

FIG. 3 is a graph of similarity of time series under different faults;

FIG. 4 is a distance graph of time series under different faults;

FIG. 5 is a schematic diagram of the operation of a decision tree.

Detailed Description

It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.

The technical solutions of the present application will be described in detail below with reference to the accompanying drawings in combination with embodiments.

Example 1

The embodiment provides a photovoltaic fault diagnosis method based on a BP neural network, as shown in fig. 1, the specific steps are as follows:

S1: collecting historical operation data of each photovoltaic power station, establishing a time series of the output data of each photovoltaic power station, and collecting historical fault diagnosis records of each photovoltaic power station;

The historical operating data of the photovoltaic power stations can also be preprocessed to remove large blank portions and obviously erroneous portions.

Take as an example the full-year 2018 data of 10 photovoltaic power stations in Xin county, China. Each of the ten photovoltaic power stations consists of a photovoltaic power generation system and a photovoltaic power station monitoring system, with a capacity of 40 MW each; the data sampling interval is 10 minutes, and the output data of the 10 power stations are arranged in time order to form the time series of the output data.

The function of the theoretical power generation prediction model is to calculate the theoretical power generation of each photovoltaic power station. When the theoretical power generation of a specific power station (such as power station 1) is calculated, it is calculated from the output data of the other photovoltaic power stations (power stations 2-9), not including the specific power station.

The BP neural network is a multi-layer feedforward neural network, characterized by forward propagation of the input signal and backward propagation of the error. In forward propagation, the input signal is processed layer by layer from the input layer through the hidden layer to the output layer. The state of the neurons in each layer affects only the state of the neurons in the next layer. If the output layer does not produce the desired output, the network switches to back-propagation and adjusts the network weights and thresholds according to the prediction error, so that the predicted output of the BP neural network continually approaches the desired output.

$X_1, X_2, X_3, \ldots, X_n$ are the input variables of the BP neural network (here, the output data of the other photovoltaic power stations, not including the specific power station) and form the input series X; $Y_1, Y_2, \ldots, Y_m$ are the output variables of the BP neural network (i.e. the theoretical power generation of the specific photovoltaic power station) and form the output series Y; $\omega_{ij}$ and $\omega_{jk}$ are the weights of the BP neural network. The BP neural network can be regarded as a nonlinear function: with n input variables and m output variables, it expresses a nonlinear mapping from n independent variables to m dependent variables.

Before a BP neural network can be used for prediction, the network must be trained. The trained neural network has the capability of prediction and judgment. Generally, the training process of the BP neural network is as follows.

Step 1, network initialization. According to the input and output sequences (X, Y) of the system, determine the number n of network input variables, the number l of hidden layer nodes and the number m of output variables; initialize the connection weights $\omega_{ij}$ and $\omega_{jk}$ between the neurons of the input layer, the hidden layer and the output layer; initialize the hidden layer threshold a and the output layer threshold b; and set the learning step length and the excitation function of the hidden layer.

Step 2, hidden layer output calculation. Calculate the hidden layer output G from the input sequence X, the connection weights $\omega_{ij}$ between the input layer and the hidden layer, and the hidden layer threshold a.

In the formula, l is the number of hidden layer nodes; f is the hidden layer excitation function, which can take various forms; the function selected in this embodiment is:

Step 3, output layer output calculation. Calculate the prediction output P of the BP neural network from the hidden layer output G, the connection weights $\omega_{jk}$ and the threshold b.

Step 4, error calculation. Calculate the network prediction error e from the network prediction output P and the expected output sequence Y.

$$e_k = Y_k - P_k \quad (4)$$

Step 5, weight update. Update the network connection weights $\omega_{ij}$ and $\omega_{jk}$ according to the network prediction error e calculated in the previous step.

$$\omega_{jk} = \omega_{jk} + \eta G_j e_k \quad (6)$$

In the formula, $\eta$ is the learning step length.

Step 6, threshold update. Update the network node thresholds a and b according to the calculated network prediction error e.

$$b_k = b_k + e_k \quad (8)$$

Step 7, judge whether the algorithm iteration is finished; if not, return to Step 2.
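The training loop in Steps 1 to 7 can be sketched in plain Python as below. This is an illustrative sketch under several assumptions, not the network used in the application: it assumes a sigmoid excitation function, a single linear output node, thresholds that are added to the weighted sums, the same learning step for weights and thresholds, and a fixed number of epochs as the stopping rule. The training data are synthetic stand-ins for normalised station outputs.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_bp(samples, n_hidden=4, eta=0.2, epochs=3000, seed=0):
    """Train a one-hidden-layer BP network with a single linear output.

    samples: list of (inputs, target) pairs, inputs being a list of floats.
    Returns a predict(inputs) function.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: initialise weights and thresholds.
    w_ij = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    w_jk = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    a = [0.0] * n_hidden  # hidden-layer thresholds
    b = 0.0               # output-layer threshold

    def forward(x):
        # Step 2: hidden layer output G; Step 3: prediction output P.
        g = [sigmoid(sum(x[i] * w_ij[i][j] for i in range(n_in)) + a[j])
             for j in range(n_hidden)]
        p = sum(g[j] * w_jk[j] for j in range(n_hidden)) + b
        return g, p

    for _ in range(epochs):  # Step 7: assumed stopping rule (fixed epoch count)
        for x, y in samples:
            g, p = forward(x)
            e = y - p  # Step 4: prediction error, as in Eq. (4)
            # Error signal at each hidden node, computed before w_jk changes.
            delta = [g[j] * (1.0 - g[j]) * w_jk[j] * e for j in range(n_hidden)]
            for j in range(n_hidden):
                w_jk[j] += eta * e * g[j]  # Step 5: same form as Eq. (6)
                a[j] += eta * delta[j]     # Step 6: hidden threshold
                for i in range(n_in):
                    w_ij[i][j] += eta * delta[j] * x[i]
            b += eta * e                   # Step 6: output threshold
    return lambda x: forward(x)[1]


# Synthetic data: two reference-station outputs (normalised to [0, 1]) and a
# target-station output that depends on both. Values are hypothetical.
samples = [([x1, x2], 0.6 * x1 + 0.4 * x2)
           for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 0.5, 1.0)]
predict = train_bp(samples)
mean_abs_error = sum(abs(predict(x) - y) for x, y in samples) / len(samples)
```

Normalising the station outputs before training keeps the sigmoid units away from saturation, which is one reason the inputs here are scaled to the unit interval.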

Preferably, only the output data of the power stations with higher correlation are selected for training, which reduces the amount of calculation and accelerates the training process.

To determine the number of reference power stations, power station 1 is selected as the target power station, n power stations (n = 2, 3, ..., 9) are randomly selected from the remaining power stations as reference power stations for fitting, and the fitting error is calculated. The process is repeated with each of power stations 2-9 as the target power station. Finally, the errors are averaged for each number n of fitting power stations. Generally, 3 power stations are selected, or the number of input sources is determined according to the calculation result.

For example, the Pearson correlation coefficients obtained from the 2018 historical operating data of 10 photovoltaic power stations in Seine county, China are shown in Table 1 below. Accordingly, power stations 1, 4, 3 and 7 can be selected as the power stations for calculation: when the theoretical power generation of power station 1 is calculated, the data of power stations 4, 3 and 7 can be selected for the calculation.

TABLE 1 plant similarity analysis

S3: intercepting the time sequence of the output data of each photovoltaic power station and the time sequence of the theoretical generated energy according to the same occurrence time in a certain time period (the intercepted time period is 2-5h, for example, 3h), calibrating the fault type of each intercepted section corresponding to the occurrence time according to historical fault diagnosis records, and respectively calculating the correlation coefficient of the intercepted section of the time sequence of the output data and the intercepted section of the time sequence of the theoretical generated energy;

the correlation coefficients are relative Euclidean distance and Pearson correlation coefficient.

The formula for calculating the relative Euclidean distance is

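The calculation formula of the Pearson correlation coefficient is

$$r = \frac{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)\left(X_{Ref,i}-\bar{X}_{Ref}\right)}{\sqrt{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(X_{Ref,i}-\bar{X}_{Ref}\right)^{2}}}$$

wherein $X_{Tar}$ is an intercepted segment of the time series of the output data, $X_{Ref}$ is an intercepted segment of the time series of the theoretical power generation, $X_{Tar,i}$ and $X_{Ref,i}$ are the $i$-th of the $N$ samples in the respective segments, and $\bar{X}_{Tar}$ and $\bar{X}_{Ref}$ are the average values of the intercepted segment of the time series of the output data and of the intercepted segment of the time series of the theoretical power generation, respectively.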
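The interception in step S3 amounts to cutting both series at the same instants and attaching a fault label to each window. A minimal sketch follows, assuming non-overlapping windows and a dictionary from sample index to fault label as a hypothetical stand-in for the historical fault diagnosis records; the function name and the series values are likewise invented for the example.

```python
def intercept_segments(actual, theoretical, fault_at, samples_per_window):
    """Cut two aligned series into consecutive windows and label each one.

    actual, theoretical: equal-length lists sampled at the same instants.
    fault_at: dict mapping a sample index to a fault label (hypothetical
              stand-in for the historical fault diagnosis records).
    Returns a list of (start_index, actual_segment, theoretical_segment, label).
    """
    if len(actual) != len(theoretical):
        raise ValueError("series must be aligned and of equal length")
    segments = []
    last_start = len(actual) - samples_per_window
    for start in range(0, last_start + 1, samples_per_window):
        end = start + samples_per_window
        # First recorded fault inside the window, otherwise "normal".
        label = next((fault_at[i] for i in range(start, end) if i in fault_at),
                     "normal")
        segments.append((start, actual[start:end], theoretical[start:end], label))
    return segments


SAMPLING_MINUTES = 10  # sampling interval stated in the embodiment
WINDOW_HOURS = 3       # example window length within the 2-5 h range
samples_per_window = WINDOW_HOURS * 60 // SAMPLING_MINUTES

# Hypothetical series of 40 samples with one recorded fault at sample 20.
actual = [float(i % 18) for i in range(40)]
theoretical = [value + 0.5 for value in actual]
segments = intercept_segments(actual, theoretical, {20: "open circuit fault"},
                              samples_per_window)
```

With a 10-minute sampling interval, a 3 h window holds 18 samples, so the 2-5 h range corresponds to 12 to 30 samples per segment.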

Table 2 Output vectors of fault types

Fault type	Output vector
Normal	η₁ = [1 0 0 0]
Abnormal aging	η₂ = [0 1 0 0]
Shadow occlusion	η₃ = [0 0 1 0]
Open circuit fault	η₄ = [0 0 0 1]
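The output vectors in Table 2 are a one-hot encoding of the four fault types. A small sketch of the encoding and its inverse is given below; the helper names and the lowercase label strings are illustrative assumptions.

```python
# Order follows Table 2; the label strings are illustrative.
FAULT_TYPES = ["normal", "abnormal aging", "shadow occlusion", "open circuit fault"]


def output_vector(fault_type):
    """One-hot output vector for a fault type, e.g. [0, 1, 0, 0]."""
    return [1 if name == fault_type else 0 for name in FAULT_TYPES]


def fault_type_of(vector):
    """Fault type whose position holds the largest entry of the vector."""
    return FAULT_TYPES[vector.index(max(vector))]


encoded = output_vector("shadow occlusion")
decoded = fault_type_of([0, 0, 0, 1])
```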

A decision tree is a tree structure in which sample attributes serve as nodes and attribute values serve as branches. The basic principle of building a decision tree is to recursively split the training data set into subsets such that each subset contains records whose target variable states are similar; these target variables are the predictable attributes. During splitting, the splitting attribute is selected using principles of information theory. Let the set of fault features of a piece of equipment be d = {d1, d2, …, dn} and the set of fault points be e = {e1, e2, …, em}, where d is the set of test attributes (the relative Euclidean distance and the Pearson correlation coefficient in this embodiment) and e is the set of class labels (the output vectors representing the fault types in this embodiment). The root node of the resulting fault feature decision tree is the training set, each internal node represents a test on a fault feature, each edge represents a test result, and each leaf represents a fault point or a fault handling mode. The attribute values of any internal node are discrete: for each fault, a fault feature takes the value 1 (the fault feature is present) or 0 (the fault feature is absent).

Historical fault diagnosis records are taken as training samples. Using the established model, 10 different types of faults are generated at different fault moments and different fault positions, and the corresponding fault characteristics are extracted to form training samples; there are 100 effective training samples for each of the 10 fault types, for a total of 1000 samples. The ID3 algorithm is adopted, with the observable fault characteristics as the test (splitting) attributes and the fault points as class labels; the records are divided accordingly, and pre-pruning is used to control the growth of the tree and form the decision tree, as follows.

1) Tree growth is stopped when the number of instances reaching a node is less than a certain threshold.

2) A substitution error rate is introduced. When a set on a certain branch of the subtree is being further divided and the samples do not all belong to the same class, but the numbers of records in the different classes differ greatly, the substitution error rate formula is applied:

In the formula: n represents the number of records in the branch; n' represents the number of records of the majority class in the branch; m represents the total number of records in the training set. If the value calculated by the formula is less than a certain threshold, the subtree is converted into a leaf node; otherwise, step 1) is called again for further decomposition. The algorithm recurses on the above operations until no fault feature attribute remains available to partition the current sample subset or the condition for stopping tree growth is satisfied.
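The tree construction can be sketched as a compact ID3 routine with information-gain splitting and the instance-count pre-pruning of rule 1). The substitution error rate of rule 2) is not implemented here. The binary attribute names and the mapping from attribute patterns to fault types are hypothetical examples, not results from the application.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the discrete attribute."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_id3(rows, labels, attrs, min_instances=2):
    """Return a nested-dict tree; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: pure node, no attributes left, or too few instances (pre-pruning).
    if len(set(labels)) == 1 or not attrs or len(rows) < min_instances:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    if information_gain(rows, labels, best) <= 0:
        return majority
    tree = {"attr": best, "default": majority, "branches": {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree["branches"][value] = build_id3(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best], min_instances)
    return tree


def classify(tree, row):
    """Walk the tree; unseen attribute values fall back to the node majority."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


# Hypothetical training set: two binary fault features per segment.
patterns = [(0, 0, "normal"), (1, 0, "abnormal aging"),
            (0, 1, "shadow occlusion"), (1, 1, "open circuit fault")]
rows = [{"large_distance": d, "low_correlation": c}
        for d, c, _ in patterns for _ in range(2)]
labels = [name for _, _, name in patterns for _ in range(2)]
tree = build_id3(rows, labels, ["large_distance", "low_correlation"])
diagnosis = classify(tree, {"large_distance": 1, "low_correlation": 0})
```

Raising `min_instances` prunes more aggressively: nodes reached by fewer records become leaves labelled with their majority class instead of being split further.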

The data used in this example are derived from equipment fault records and are used to build a decision tree that finds the association between fault signatures and fault points. Step 1: 320 pieces of data are extracted, of which 70% (224 records) are taken as training tuples and 30% as test data, and the test attribute fault feature set and the fault point set are compiled. Step 2: the data are preprocessed as above, and the statistics of the fault points in these 224 records are compiled. Step 3: the information gain of each fault feature is calculated one by one, and the attribute with the greatest information gain, i.e. the one that best divides the training set, is selected. As can be seen from Table 3, the accuracy of the method in diagnosing the various faults exceeds 97%, so the fault diagnosis method has high accuracy in actual fault diagnosis of photovoltaic power stations and has practical application value.

TABLE 3 Fault diagnosis accuracy statistics

The theoretical power generation is calculated by inputting the monitored output data of each photovoltaic power station into the theoretical power generation prediction model; if the prediction model was trained using only a subset of photovoltaic power stations with high similarity, the same photovoltaic power stations are used for the calculation at this point.

During judgment, the time series of the output data of the photovoltaic power station and the time series of the calculated theoretical power generation must likewise be intercepted using the same time period length as in step S3, namely 2-5 h.
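The online judgment chains three pieces together: the prediction model, the correlation calculation and the diagnosis model. The sketch below shows only that data flow. All three components are passed in as callables, and the stand-ins used in the demonstration (a mean over the reference stations, an energy ratio, and a threshold rule) are hypothetical placeholders, not the BP network, the correlation coefficients or the decision tree of the embodiment.

```python
def diagnose_window(actual_segment, reference_segments, predict_theoretical,
                    extract_features, classify):
    """Run one monitoring window through the chain and return the station state.

    actual_segment: monitored output of the target station for the window.
    reference_segments: one list per reference station, same window and length.
    """
    # Theoretical output at each instant from the reference stations' outputs.
    theoretical_segment = [
        predict_theoretical(list(inputs)) for inputs in zip(*reference_segments)
    ]
    features = extract_features(actual_segment, theoretical_segment)
    return classify(features)


# Hypothetical stand-ins so that the sketch runs on its own.
def mean_model(inputs):
    return sum(inputs) / len(inputs)


def energy_ratio(actual, theoretical):
    total = sum(theoretical)
    return [sum(actual) / total if total else 0.0]


def threshold_rule(features):
    return "normal" if features[0] > 0.8 else "abnormal"


references = [[1.0, 2.0, 3.0], [1.2, 2.2, 3.2]]
state_ok = diagnose_window([1.1, 2.1, 3.1], references,
                           mean_model, energy_ratio, threshold_rule)
state_bad = diagnose_window([0.0, 0.0, 0.0], references,
                            mean_model, energy_ratio, threshold_rule)
```

Passing the components as arguments keeps the window length and the set of reference stations identical between training and monitoring, which the paragraphs above require.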

Example 2

This embodiment provides a distributed photovoltaic power station fault diagnosis device, which corresponds to the method of embodiment 1 and includes:

a theoretical power generation calculation module: used for collecting operation data of each photovoltaic power station, calculating the theoretical power generation of each photovoltaic power station according to the collected operation data, and forming a theoretical power generation time series from the calculated theoretical power generation;

a fault diagnosis model: established by training with a decision tree method, taking the correlation coefficients obtained by the correlation calculation module as the input vector and the marked fault type as the output vector, with the input vector and the output vector as training data;

The data acquisition module also performs similarity analysis on the time series of the output data of the photovoltaic power stations using the Pearson correlation coefficient, so as to obtain, for each photovoltaic power station, a plurality of photovoltaic power stations with higher similarity to it;

The formula for calculating the relative Euclidean distance is

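the Pearson correlation coefficient is calculated by the formula

$$r = \frac{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)\left(X_{Ref,i}-\bar{X}_{Ref}\right)}{\sqrt{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(X_{Ref,i}-\bar{X}_{Ref}\right)^{2}}}$$

wherein $X_{Tar}$ and $X_{Ref}$ are the intercepted segments of the time series of the output data and of the theoretical power generation, $X_{Tar,i}$ and $X_{Ref,i}$ are the $i$-th of their $N$ samples, and $\bar{X}_{Tar}$ and $\bar{X}_{Ref}$ are their average values.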

The time period intercepted in the correlation calculation module is 2-5h, and preferably 3 h.

In light of the foregoing description of the preferred embodiments according to the present application, it is to be understood that various changes and modifications may be made without departing from the spirit and scope of the invention. The technical scope of the present application is not limited to the content of the specification, and must be determined according to the scope of the claims.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims

1. A distributed photovoltaic power station fault diagnosis method is characterized by comprising the following steps:

S3: intercepting the time series of the output data of each photovoltaic power station and the time series of the theoretical power generation over a certain time period at the same occurrence time, calibrating the fault type of each intercepted segment at the corresponding occurrence time according to the historical fault diagnosis records, and calculating the correlation coefficients between the intercepted segment of the output data time series and the intercepted segment of the theoretical power generation time series;

2. The distributed photovoltaic power station fault diagnosis method according to claim 1, wherein in step S1, similarity analysis is further performed on the time series of the output data of the photovoltaic power stations using the Pearson correlation coefficient, so as to obtain, for each photovoltaic power station, a plurality of photovoltaic power stations with higher similarity to it;

and in step S2, the theoretical power generation of the specific photovoltaic power station is calculated from the operating data of the selected photovoltaic power stations with higher similarity.

3. The distributed photovoltaic power plant fault diagnosis method of claim 1 wherein the correlation coefficients are relative Euclidean distance and Pearson correlation coefficients.

4. The distributed photovoltaic power plant fault diagnosis method of claim 3 wherein the calculation formula for the relative Euclidean distance is

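the calculation formula of the Pearson correlation coefficient is

$$r = \frac{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)\left(X_{Ref,i}-\bar{X}_{Ref}\right)}{\sqrt{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(X_{Ref,i}-\bar{X}_{Ref}\right)^{2}}}$$

wherein $X_{Tar}$ and $X_{Ref}$ are the intercepted segments of the time series of the output data and of the theoretical power generation, $X_{Tar,i}$ and $X_{Ref,i}$ are the $i$-th of their $N$ samples, and $\bar{X}_{Tar}$ and $\bar{X}_{Ref}$ are their average values.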

5. The distributed photovoltaic power plant fault diagnosis method as claimed in any one of claims 1 to 4, wherein the time period intercepted in step S3 is 2 to 5 hours.

6. A distributed photovoltaic power station fault diagnosis device is characterized by comprising:

a correlation calculation module: used for intercepting the time series of the output data of each photovoltaic power station and the time series of the theoretical power generation over a certain time period at the same occurrence time, calibrating the fault type of each intercepted segment at the corresponding occurrence time according to the historical fault diagnosis records, and calculating the correlation coefficients between the intercepted segment of the output data time series and the intercepted segment of the theoretical power generation time series;

7. The distributed photovoltaic power station fault diagnosis device according to claim 6, wherein the data acquisition module further performs similarity analysis on the time series of the output data of the photovoltaic power stations using the Pearson correlation coefficient, so as to obtain, for each photovoltaic power station, a plurality of photovoltaic power stations with higher similarity to it;

and the theoretical power generation of the specific photovoltaic power station is calculated in the theoretical power generation prediction model from the operating data of the selected photovoltaic power stations with higher similarity.

8. The distributed photovoltaic power plant fault diagnosis apparatus of claim 6 or 7 wherein the correlation coefficients are relative Euclidean distance and Pearson correlation coefficients.

9. The distributed photovoltaic power plant fault diagnosis apparatus of claim 8 wherein the calculation formula of the relative Euclidean distance is

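In the formula: $\delta(X_{Tar}, X_{Ref})$ is the relative Euclidean distance, wherein $X_{Tar}$ is an intercepted segment of the time series of the output data and $X_{Ref}$ is an intercepted segment of the time series of the theoretical power generation;

the calculation formula of the Pearson correlation coefficient is

$$r = \frac{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)\left(X_{Ref,i}-\bar{X}_{Ref}\right)}{\sqrt{\sum_{i=1}^{N}\left(X_{Tar,i}-\bar{X}_{Tar}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(X_{Ref,i}-\bar{X}_{Ref}\right)^{2}}}$$

wherein $X_{Tar,i}$ and $X_{Ref,i}$ are the $i$-th of the $N$ samples in the respective segments, and $\bar{X}_{Tar}$ and $\bar{X}_{Ref}$ are the average values of the segments.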

10. The distributed photovoltaic power plant fault diagnosis apparatus as claimed in any one of claims 6 or 7, wherein the time period intercepted in the correlation calculation module is 2-5 h.