CN115021679B

CN115021679B - Photovoltaic equipment fault detection method based on multi-dimensional outlier detection

Info

Publication number: CN115021679B
Application number: CN202210946811.5A
Authority: CN
Inventors: 陈运蓬; 尚文; 白静波; 赵锐; 马飞; 夏彦; 张红伟
Original assignee: Datong Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Current assignee: Datong Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Priority date: 2022-08-09
Filing date: 2022-08-09
Publication date: 2022-11-04
Anticipated expiration: 2042-08-09
Also published as: CN115021679A

Abstract

The invention discloses a photovoltaic equipment fault detection method based on multi-dimensional outlier detection, relating to the technical field of photovoltaic equipment detection. Clustering is performed with an FCM (fuzzy c-means) clustering algorithm that introduces a threshold mechanism, and abnormal data are detected by combining, as the core, an isolation forest algorithm based on data dimensionality reduction with a statistical decision algorithm based on the interquartile range. The method can detect photovoltaic equipment of different models in different environments, with good detection efficiency, accuracy and compatibility. The technical scheme allows the fault condition of photovoltaic equipment to be analyzed efficiently, in real time and at low cost, classifies and detects equipment of different models and environments, and improves detection compatibility, thereby making up for the deficiencies of the prior art.

Description

Photovoltaic equipment fault detection method based on multi-dimensional outlier detection

Technical Field

The invention relates to the technical field of photovoltaic equipment detection, in particular to a photovoltaic equipment fault detection method based on multi-dimensional outlier detection.

Background

Photovoltaic equipment faults mainly refer to abnormal current, voltage, power, temperature and similar readings caused by problems in the photovoltaic equipment, the circuit or the meter. Locating the fault point as early as possible and handling it in time is important for the normal operation of a photovoltaic power generation system. In recent years photovoltaic equipment has come from many manufacturers, and even different equipment versions from the same manufacturer use different communication protocols, which makes fault detection of photovoltaic equipment considerably more difficult.

At present, fault detection methods for photovoltaic equipment mainly use traditional detection algorithms such as the isolation forest. However, because photovoltaic equipment comes in many models and operates in different environments, the data generated in operation have high dimensionality and large differences, which reduces detection accuracy and efficiency, and the traditional detection algorithms offer no effective solution to these problems.

With the development of multi-dimensional outlier detection technology based on unsupervised learning and the continuous improvement of computing power, real-time fault detection of photovoltaic equipment combined with artificial intelligence technology has become possible. A multi-dimensional outlier detection method for photovoltaic equipment is therefore needed, one that uses an efficient detection algorithm to analyze fault conditions in real time and make up for the deficiencies of the prior art.

Disclosure of Invention

The purpose of the invention is to provide a multi-dimensional outlier detection method for photovoltaic equipment that improves the efficiency of real-time analysis of photovoltaic equipment fault conditions.

The technical scheme of the invention is as follows: a photovoltaic equipment fault detection method based on multi-dimensional outlier detection is provided, the method comprising:

S1. Collect and preprocess photovoltaic equipment operating state data: acquire the model and capacity of the photovoltaic equipment and the corresponding current, voltage, power, temperature and other data through a data collector; integrate the operating information collected at the same time into a data set L whose total number of records is N; and interpolate the missing information in L using a proportional mean filling method;

S2. Input the data processed in step S1 into an FCM clustering algorithm based on a threshold mechanism to obtain a clustering result, where the threshold mechanism comprises a threshold decision mechanism for initializing the cluster number C and a threshold initialization mechanism for initializing the membership matrix U. The threshold mechanism is first used to initialize the cluster number C of photovoltaic equipment categories and the membership matrix U; a new membership matrix U is then obtained through iterative computation of the cost function, and the clustering result is obtained from the final membership matrix;

S3. Detect abnormal data, specifically comprising the following steps:

S3A. Use a statistical decision algorithm based on the interquartile range to obtain the upper bound Top and the lower bound Bottom for every data type in every category, then check all data L and mark every value outside the upper and lower bounds of its own type as abnormal;

S3B. While step S3A is executed, concurrently process the preprocessed data with an isolation forest algorithm based on data dimensionality reduction. Perform the dimensionality reduction operation on the data of every category; select one category from the reduced data and randomly draw samples from it as the sub-sample for training an isolation tree; place the sub-sample at the root node of the isolation tree; randomly designate a data type and randomly generate a cut point between the maximum and minimum values of that data type in the current sub-sample. Samples whose value for the designated type is less than or equal to the cut point are placed at the left node, and samples whose value is greater than the cut point are placed at the right node. Cut points are generated repeatedly at the left and right branch nodes, constructing new leaf nodes until a leaf node holds only one record and cannot be cut further, or the tree reaches the set maximum height. The anomaly score is then calculated according to

$$S(x, h) = 2^{-E(\ell(x))/c(h)}, \qquad c(h) = 2H(h-1) - \frac{2(h-1)}{h},$$

where $E(\ell(x))$ is the expected path length of a record $x$ of L over the isolation trees, $c(h)$ is the average path length of a tree built on a data set containing $h$ samples, $h$ is the sub-sample capacity, and $H(\cdot)$ is the harmonic number. The isolation forest algorithm is executed repeatedly until the data of all categories have been detected;

S4. Perform weighted threshold judgment on the abnormal data detected in steps S3A and S3B to obtain the final detected abnormal data.

In any of the above technical solutions, further, filling the missing information by the proportional mean filling method in step S1 comprises: randomly extracting t records,

taking all values of the data type corresponding to the missing position from these records and sorting them in ascending order, the a-th value being denoted

，

using the formula

to obtain p, the value to be inserted, where E is the mathematical expectation.

In any of the above technical solutions, further, the membership matrix U to be initialized in step S2 is a matrix of N rows and C columns, where

denotes the element in row n, column c of the matrix U,

，（

；

) where

is the n-th value in the data W,

is the average power value of the c-th group,

is the average power value of the j-th group, and

denotes the membership of the n-th record in class c; for the N records in total, the memberships of each record over all classes sum to 1, that is, satisfy

；

The iterative calculation process specifically comprises: the first iteration, based on the initialized membership matrix U, calculates the cluster centers of the C clusters by the formula

$$v_c = \frac{\sum_{n=1}^{N} u_{nc}^{m}\, x_n}{\sum_{n=1}^{N} u_{nc}^{m}},$$

where $u_{nc}$ is the element in row n, column c of U, m takes the value 2, and $x_n$ is the n-th record of the N preprocessed records L. The cost function is then calculated using the formula

$$J = \sum_{n=1}^{N}\sum_{c=1}^{C} u_{nc}^{m}\, \lVert x_n - v_c \rVert^{2},$$

where $v_c$ is the cluster center of class c. Iteration then continues: the membership matrix U is recalculated using the formula

$$u_{nc} = \frac{1}{\sum_{d=1}^{C}\left(\dfrac{\lVert x_n - v_c \rVert}{\lVert x_n - v_d \rVert}\right)^{2/(m-1)}},$$

where $v_d$ is the d-th cluster center; the cluster centers are then recalculated with the center formula above, and the cost function J is recalculated with the cost formula above.

If the change between the value of J from this calculation and the value of J from the previous calculation is less than the threshold ε, the algorithm ends and the final membership matrix U is obtained; otherwise the above steps are repeated until the change in J is less than ε. Clustering is performed according to the final membership matrix U, each record being assigned to the class in which its membership is largest.

In any of the above technical solutions, further, the statistical decision algorithm based on the interquartile range used in step S3A specifically comprises: selecting one category from the preprocessed data and, for the data in that category, sorting the values of each data type in ascending order; randomly selecting one data type and finding its median, denoted Q2; taking Q2 as the dividing point, splitting the data into a left half and a right half and finding the median of each, the left median being denoted Q1 and the right median Q3; and, according to the formulas

$$\mathrm{IQR} = Q_3 - Q_1, \qquad \mathit{Top} = Q_3 + k \cdot \mathrm{IQR}, \qquad \mathit{Bottom} = Q_1 - k \cdot \mathrm{IQR},$$

calculating the interquartile range IQR, the upper bound Top and the lower bound Bottom, where k is a constant. The statistical decision algorithm is executed repeatedly until the upper bound Top and the lower bound Bottom have been obtained for every data type in every category.

In any of the above technical solutions, further, the dimensionality reduction operation in step S3B specifically comprises: first taking the data of one category, which contains M records, and arranging them into a matrix X of R rows and M columns; zero-meaning each row of X and obtaining the covariance matrix

$V = \frac{1}{M} X X^{T}$, where T is the transpose symbol; solving for the eigenvalues and eigenvectors of the matrix V, arranging the eigenvectors as the rows of a matrix from top to bottom in descending order of the corresponding eigenvalues, and taking the first s rows to form a matrix P, where

, and finally calculating $Y = PX$, Y being the data of the category reduced to s dimensions. The dimensionality reduction operation is repeated until all categories have been reduced.
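The dimensionality reduction described above follows the usual eigen-decomposition form of principal component analysis, and can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `pca_reduce`, the 1/M normalisation of the covariance matrix, and the layout of X (R data types as rows, M records as columns) are assumptions drawn from the standard technique and the surrounding description.

```python
import numpy as np

def pca_reduce(X, s):
    """Reduce an (R, M) matrix X (R data types x M records) to s dimensions.

    Returns (Y, P): Y has shape (s, M), P has shape (s, R).
    """
    X = np.asarray(X, dtype=float)
    # Zero-mean each row (each data type) of X.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Covariance matrix V (R x R); the 1/M factor is an assumption.
    V = (Xc @ Xc.T) / X.shape[1]
    # eigh suits symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(V)
    # Sort eigenvectors by descending eigenvalue and keep the first s as rows of P.
    order = np.argsort(eigvals)[::-1]
    P = eigvecs[:, order[:s]].T
    # Project the zero-meaned data: Y = P X.
    return P @ Xc, P

# Example: two perfectly correlated data types collapse onto one component.
Y, P = pca_reduce([[1, 2, 3, 4], [2, 4, 6, 8]], s=1)
```

Choosing s trades computation against retained variance; the source does not state how s is selected, so it is left as a parameter here.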

In any of the above technical solutions, further, the weighted threshold in step S4 is judged as follows: when a record is judged abnormal by both the statistical decision algorithm based on the interquartile range and the isolation forest algorithm based on data dimensionality reduction, the record is defined as abnormal data;

otherwise, the statistical-decision anomaly factor of the record is calculated:

，

and the isolation forest anomaly factor:

where S denotes the anomaly score,

and by the formula

the degree of abnormality D is obtained, where l is the constraint factor, defined as 0.3; when

holds, the record is judged to be abnormal.

The invention has the beneficial effects that:

In this technical scheme, because the clustering algorithm and the isolation forest algorithm are combined with the statistical decision algorithm, fault detection of photovoltaic equipment is more accurate than with other commonly used detection methods. Reducing the dimensionality of the photovoltaic equipment operating data lowers the computational burden, yields higher detection efficiency, and effectively helps operation and maintenance personnel locate the fault point quickly. The detection algorithm can process a large amount of data at once even when the data differ from one another, so that a small deployment can cover a wide detection range and the cost burden is reduced.

In a preferred implementation of the invention, a clustering algorithm is added to the algorithm, so that photovoltaic equipment of different models in different environments can be classified and detected, improving the compatibility of photovoltaic equipment detection.

Drawings

The above and additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flow diagram of a method of photovoltaic equipment fault detection based on multi-dimensional outlier detection according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the interquartile range algorithm in a photovoltaic equipment fault detection method based on multi-dimensional outlier detection according to an embodiment of the invention.

Detailed Description

In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.

As shown in fig. 1, the present embodiment provides a method for detecting a failure of a photovoltaic device based on multi-dimensional outlier detection, where the method includes:

S1. Collect and preprocess photovoltaic equipment operating state data: acquire the model and capacity of the photovoltaic equipment and the corresponding current, voltage, power, temperature and other data through a data collector; integrate the operating information collected at the same time into a data set L whose total number of records is N; and interpolate the missing information in L using a proportional mean filling method.

Specifically, information may be incomplete during data acquisition, so the data set L integrated in this way will have some values missing.

In the proportional mean filling method, t records are first selected at random,

all values of the data type corresponding to the missing information are taken from these records and sorted in ascending order, the a-th value being denoted

，（

) Using the formula

S2. Input the data processed in step S1 into the FCM clustering algorithm based on a threshold mechanism to obtain a clustering result, specifically as follows:

S21. Initialize the cluster number C of photovoltaic equipment categories using the threshold mechanism. C defaults to 1. Take the power values from all data L and sort them in ascending order, the N sorted power values being denoted W. Divide W into m intervals containing equal numbers of data, where

, m being rounded up. Calculate the average power value of each interval in turn, and calculate the difference between the average power values of each two adjacent intervals

，

where

denotes the average power value of the b-th interval (

). Whenever

holds, the cluster number C is increased by 1.
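The flow of step S21 can be sketched as follows. The formula for the interval count m and the condition under which C is incremented are not reproduced in this text, so both are treated here as hypothetical inputs: `m` is passed in directly, and the increment condition is assumed to be "the difference between adjacent interval averages exceeds `threshold`". The function name is likewise illustrative.

```python
def initial_cluster_count(power_values, m, threshold):
    """Estimate the initial cluster number C from sorted power values.

    m         -- number of equal-count intervals (hypothetical input; requires m <= N)
    threshold -- assumed jump in average power that signals a new category
    """
    W = sorted(power_values)          # N power values in ascending order
    N = len(W)
    # Split W into m consecutive intervals of (nearly) equal size.
    bounds = [i * N // m for i in range(m + 1)]
    means = [sum(W[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]
    C = 1                             # C defaults to 1
    for prev, cur in zip(means, means[1:]):
        # Assumed condition: a large gap between adjacent averages adds a cluster.
        if cur - prev > threshold:
            C += 1
    return C

# Two clearly separated power levels yield two initial clusters.
C = initial_cluster_count([1, 1, 1, 1, 1, 1, 50, 50, 50, 50, 50, 50], m=4, threshold=10)
```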

S22. Initialize the membership matrix U using the threshold mechanism. U is a matrix of N rows and C columns, where

denotes the element in row n, column c of the matrix U (

；

) and also denotes the membership of the n-th record in class c; for the N records in total, the memberships of each record over all classes sum to 1, that is, satisfy

W is then re-divided into C intervals containing equal numbers of data, the average power value of the c-th interval being denoted

and, according to the formula

，（

；

) the membership matrix U is initialized, where

is the n-th value in the data W.

S23. Based on the initialized membership matrix U, calculate the cluster centers of the C clusters by the formula

$$v_c = \frac{\sum_{n=1}^{N} u_{nc}^{m}\, x_n}{\sum_{n=1}^{N} u_{nc}^{m}},$$

where $u_{nc}$ is the element in row n, column c of U, m takes the value 2, and $x_n$ is the n-th record of the N preprocessed records L.

S24. Calculate the cost function using the formula

$$J = \sum_{n=1}^{N}\sum_{c=1}^{C} u_{nc}^{m}\, \lVert x_n - v_c \rVert^{2},$$

where $v_c$ is the cluster center of class c. Then recalculate the membership matrix U using the formula

$$u_{nc} = \frac{1}{\sum_{d=1}^{C}\left(\dfrac{\lVert x_n - v_c \rVert}{\lVert x_n - v_d \rVert}\right)^{2/(m-1)}},$$

where $v_d$ is the d-th cluster center; recalculate the cluster centers with the center formula of step S23, and recalculate the cost function J with the cost formula above.

If the change between the value of J from this calculation and the value of J from the previous calculation is less than the threshold ε, the algorithm ends and the final membership matrix U is obtained; otherwise the above steps are repeated until the change in J is less than ε. Clustering is performed according to the final membership matrix U, each record being assigned to the class in which its membership is largest.
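The iteration of steps S23 and S24 can be sketched with the standard fuzzy c-means updates. This is a hedged illustration rather than the patented procedure: the function name `fcm`, the `max_iter` guard, and in particular the random initial membership matrix are assumptions (the method described here initializes U from average power values instead, which can be supplied through the hypothetical `U0` argument).

```python
import numpy as np

def fcm(X, C, m=2.0, eps=1e-5, max_iter=300, U0=None, seed=0):
    """Standard fuzzy c-means. X has shape (N, features).

    Returns (U, centers, labels); U has shape (N, C) and each row sums to 1.
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    if U0 is None:
        # Assumption: random initial memberships, normalised per record.
        rng = np.random.default_rng(seed)
        U = rng.random((N, C))
        U /= U.sum(axis=1, keepdims=True)
    else:
        U = np.asarray(U0, dtype=float)
    J_prev = None
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the records.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances of every record to every center (floored to avoid division by zero).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        # Cost function J.
        J = float((Um * dist ** 2).sum())
        # Stop when J changes by less than the threshold eps.
        if J_prev is not None and abs(J - J_prev) < eps:
            break
        J_prev = J
        # Membership update: u_nc proportional to dist_nc^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    # Each record goes to the class with the largest membership.
    return U, centers, U.argmax(axis=1)

X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
U, centers, labels = fcm(X, C=2)
```

With m = 2 the membership update reduces to inverse-squared-distance weighting, which keeps each iteration cheap.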

S3. Detect anomalies by processing the data in parallel with a statistical decision algorithm based on the interquartile range and an isolation forest algorithm based on data dimensionality reduction. The two algorithms proceed as follows:

S3A. Using the statistical decision algorithm based on the interquartile range, select one category from the preprocessed data and, for the data in that category, sort the values of each data type in ascending order. Randomly select one data type and find its median, denoted Q2. Taking Q2 as the dividing point, split the data into a left half and a right half and find the median of each; the left median is denoted Q1 and the right median Q3. As shown in FIG. 2, for 11 data in total, Q2 is marked at x6, Q1 at x3, and Q3 at x9.

According to the formulas

$$\mathrm{IQR} = Q_3 - Q_1, \qquad \mathit{Top} = Q_3 + k \cdot \mathrm{IQR}, \qquad \mathit{Bottom} = Q_1 - k \cdot \mathrm{IQR},$$

calculate the interquartile range IQR, the upper bound Top and the lower bound Bottom, where k is a constant, defined as 1.5 in this embodiment. The statistical decision algorithm is executed repeatedly until the upper bound Top and the lower bound Bottom have been obtained for every data type in every category.

All data L are then checked: every value outside the upper and lower bounds of its own type is marked as abnormal, and a record is marked as abnormal data as soon as any one of its data types is detected as abnormal by the algorithm.
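The bounds of step S3A can be illustrated with a short sketch that uses the median-of-halves quartiles described above. The function names are illustrative, and the handling of an even number of values (splitting into two equal halves) is an assumption, since the description only shows the 11-point case of FIG. 2.

```python
from statistics import median

def iqr_bounds(values, k=1.5):
    """Return (Bottom, Top) for one data type; needs at least two values."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    # For odd n the median Q2 itself is excluded from both halves.
    left = xs[:half]
    right = xs[half + 1:] if n % 2 else xs[half:]
    q1, q3 = median(left), median(right)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Mark each value lying outside [Bottom, Top] as abnormal."""
    bottom, top = iqr_bounds(values, k)
    return [v < bottom or v > top for v in values]

# Eleven values as in FIG. 2: Q1 = x3 = 3, Q2 = x6 = 6, Q3 = x9 = 9.
bottom, top = iqr_bounds(range(1, 12))
flags = flag_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100])
```

With k = 1.5 and the values 1 to 11, the interquartile range is 6, so Bottom is -6 and Top is 18; the value 100 in the second call falls outside the bounds and is flagged.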

S3B. While step S3A is performed, process the preprocessed data with the isolation forest algorithm based on data dimensionality reduction. First take the data of one category, which contains M records, and arrange them into a matrix X of R rows and M columns; zero-mean each row of X and obtain the covariance matrix

$V = \frac{1}{M} X X^{T}$, where T is the transpose symbol; solve for the eigenvalues and eigenvectors of the matrix V, arrange the eigenvectors as the rows of a matrix from top to bottom in descending order of the corresponding eigenvalues, and take the first s rows to form a matrix P, where

, and finally calculate $Y = PX$, Y being the data of the category reduced to s dimensions.

Select one category from the reduced data and randomly draw samples with sample capacity h (h being 1/10 of the number of records in the category) as the sub-sample for training an isolation tree. Place the sub-sample at the root node of the isolation tree, randomly designate a data type, and randomly generate a cut point between the maximum and minimum values of that data type in the current sub-sample. Samples whose value for the designated type is less than or equal to the cut point are placed at the left node, and samples whose value is greater than the cut point are placed at the right node. Cut points are generated repeatedly at the left and right branch nodes, constructing new leaf nodes until a leaf node holds only one record and cannot be cut further, or the tree reaches the set maximum height; in this embodiment, the maximum height of the tree is set to 256.

The anomaly score is calculated according to

$$S(x, h) = 2^{-E(\ell(x))/c(h)}, \qquad c(h) = 2H(h-1) - \frac{2(h-1)}{h},$$

where $E(\ell(x))$ is the expected path length of a record $x$ of L over the isolation trees, $c(h)$ is the average path length of a tree built on a data set containing $h$ samples, $h$ is the sub-sample capacity, and $H(\cdot)$ is the harmonic number.

The isolation forest algorithm is executed repeatedly until the data of all categories have been detected. Specifically, experiments show that most of the values computed by the isolation forest for the data are negative, so in this embodiment 0.5 is added to the anomaly score and the decision threshold of the detection model is set to 0; that is, records whose value is still less than 0 after adding 0.5 are all judged to be abnormal data.
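The anomaly score of an isolation forest is conventionally computed from the expected path length and the average path length of a tree, and the sketch below follows that standard formulation. It is an assumption that this matches the formula whose image is not reproduced in this text; the function names are illustrative, and the harmonic number is approximated by ln(i) plus the Euler-Mascheroni constant.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def harmonic(i):
    """Approximate the i-th harmonic number as ln(i) + gamma."""
    return math.log(i) + EULER_GAMMA

def avg_path_length(h):
    """Average path length c(h) of a tree built on h samples."""
    if h <= 1:
        return 0.0
    if h == 2:
        return 1.0
    return 2.0 * harmonic(h - 1) - 2.0 * (h - 1) / h

def anomaly_score(expected_path_length, h):
    """Score in (0, 1]; shorter expected paths give scores closer to 1.

    Requires h >= 2 so that c(h) is non-zero.
    """
    return 2.0 ** (-expected_path_length / avg_path_length(h))

# A record whose expected path length equals c(h) scores 0.5.
s = anomaly_score(avg_path_length(256), 256)
```

Scores computed this way are always positive, so the negative values and the 0.5 offset mentioned above presumably refer to a sign-flipped or shifted variant of the score; the source does not specify which.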

S4. Perform weighted threshold judgment on the abnormal data detected in steps S3A and S3B to obtain the final detected abnormal data. The weighted threshold is judged as follows: a record is defined as abnormal data when it is judged abnormal by both the isolation forest algorithm and the statistical decision algorithm; otherwise, the statistical-decision anomaly factor of the record

and the isolation forest anomaly factor

are calculated, where S denotes the anomaly score, and by the formula

the degree of abnormality D is obtained, where l is the constraint factor, defined as 0.3; when

holds, the record is judged to be abnormal.

In one practical application of the embodiment, part of the operating data of a group of real photovoltaic devices at time t1 was collected, as shown in Table 1, and fault detection was performed on it.

Table 1. Statistics of actual data collected at one point in time

The detection algorithm classifies the first 6 records into a first category and the last 6 records into a second category. After data preprocessing and clustering, each record is checked for abnormality by the statistical decision of step S3A and by the isolation forest outlier detection of step S3B.

As shown in Table 2, the final detection result indicates that the 1st and 7th records are abnormal data.

Table 2. Data detection results

Simple inspection of the data shows that the 2nd record has an abnormal temperature and the 7th record has an abnormal PV2 voltage; both are genuinely abnormal data, consistent with the model's judgment, which demonstrates that the fault detection method is effective.

The technical scheme of the invention has been described in detail above with reference to the accompanying drawings. The invention provides a photovoltaic equipment fault detection method based on multi-dimensional outlier detection, in which clustering is performed with an FCM clustering algorithm that introduces a threshold mechanism, and abnormal data are detected by combining, as the core, an isolation forest algorithm based on data dimensionality reduction with a statistical decision algorithm based on the interquartile range. The method can detect photovoltaic equipment of different models in different environments, with good detection efficiency, accuracy and compatibility.

The steps in the invention can be reordered, combined and deleted according to actual requirements.

Although the present invention has been disclosed in detail with reference to the accompanying drawings, it is to be understood that such description is merely illustrative of and not restrictive on the application of the present invention. The scope of the invention is defined by the appended claims and may include various modifications, adaptations and equivalents of the invention without departing from its scope and spirit.

Claims

1. A method for detecting faults of photovoltaic equipment based on multi-dimensional outlier detection is characterized by comprising the following steps:

S1. Collecting and preprocessing photovoltaic equipment operating state data: acquiring the model and capacity of the photovoltaic equipment and the corresponding current, voltage, power and temperature through a data collector, integrating the operating information collected at the same time into a data set L whose total number of records is N, and interpolating the missing information in L using a proportional mean filling method, in which t records are first selected at random,

，

Using the formula

to obtain p, the value to be inserted, where E is the mathematical expectation;

S2. Inputting the data processed in step S1 into an FCM clustering algorithm based on a threshold mechanism to obtain a clustering result, wherein the threshold mechanism comprises a threshold decision mechanism for initializing the cluster number C: C defaults to 1; the power values are taken from all data L and sorted in ascending order, the N sorted power values being denoted W; W is divided into m intervals containing equal numbers of data, where

，

where

denotes the average power value of the b-th interval;

whenever

holds, the cluster number C is increased by 1; the threshold mechanism further comprises a threshold initialization mechanism for initializing the membership matrix U, the membership matrix U being a matrix of N rows and C columns, where

denotes the element in row n, column c of the matrix U; W is re-divided into C intervals containing equal numbers of data, the average power value of the c-th interval being denoted

, the average power value of the j-th interval being denoted

, and the n-th value of W being denoted

; according to the formula

the membership matrix U is initialized, where

、

、

、

; the threshold mechanism is first used to initialize the cluster number C of photovoltaic equipment categories and the membership matrix U, a new membership matrix U is then obtained through iterative computation of the cost function, and the clustering result is obtained from the final membership matrix, the iterative computation specifically comprising: in the first iteration, based on the initialized membership matrix U, calculating the cluster centers of the C clusters by the formula

$$v_c = \frac{\sum_{n=1}^{N} u_{nc}^{m}\, x_n}{\sum_{n=1}^{N} u_{nc}^{m}},$$

where $u_{nc}$ is the element in row n, column c of U, m takes the value 2, and $x_n$ is the n-th record of the N preprocessed records L; calculating the cost function using the formula

$$J = \sum_{n=1}^{N}\sum_{c=1}^{C} u_{nc}^{m}\, \lVert x_n - v_c \rVert^{2},$$

where $v_c$ is the cluster center of class c; recalculating the membership matrix U using the formula

$$u_{nc} = \frac{1}{\sum_{d=1}^{C}\left(\dfrac{\lVert x_n - v_c \rVert}{\lVert x_n - v_d \rVert}\right)^{2/(m-1)}},$$

where $v_d$ is the d-th cluster center; recalculating the cluster centers with the center formula above, and recalculating the cost function J with the cost formula above;

if the change between the value of J from this calculation and the value of J from the previous calculation is less than the threshold ε, the algorithm ends and the final membership matrix U is obtained; otherwise the above steps are repeated until the change in J is less than ε; clustering is performed according to the final membership matrix U, each record being assigned to the class in which its membership is largest;

s3, detecting abnormal data, and specifically comprising the following steps:

S3A. Using a statistical decision algorithm based on the interquartile range to obtain the upper bound Top and the lower bound Bottom for every data type in every category, checking all data L, and marking every value outside the upper and lower bounds of its own type as abnormal;

，

Is placed at the left node and has a value greater than

Calculating an abnormality score, wherein

is the expected value of the path length of L in the trees,

is the average path length of a tree for a data set containing h samples,

where h is the sub-sample capacity,

S4. Performing weighted threshold judgment on the abnormal data detected in steps S3A and S3B to obtain the final detected abnormal data, the weighted threshold judgment comprising:

when a record is judged abnormal by both the statistical decision algorithm based on the interquartile range and the isolation forest algorithm based on data dimensionality reduction, the record is defined as abnormal data;

，

and the isolation forest anomaly factor:

wherein S represents an abnormality score,

by the formula

If so, the data is judged to be abnormal.

2. The method for detecting faults of photovoltaic equipment based on multi-dimensional outlier detection as claimed in claim 1, wherein the statistical decision algorithm based on the interquartile range used in step S3A specifically comprises: selecting one category from the preprocessed data and, for the data in that category, sorting the values of each data type in ascending order; randomly selecting one data type and finding its median, denoted Q2; taking Q2 as the dividing point, splitting the data into a left half and a right half and finding the median of each, the left median being denoted Q1 and the right median Q3; and, according to the formulas

$$\mathrm{IQR} = Q_3 - Q_1, \qquad \mathit{Top} = Q_3 + k \cdot \mathrm{IQR}, \qquad \mathit{Bottom} = Q_1 - k \cdot \mathrm{IQR},$$

calculating the interquartile range IQR, the upper bound Top and the lower bound Bottom, where k is a constant, the statistical decision algorithm being executed repeatedly until the upper bound Top and the lower bound Bottom have been obtained for every data type in every category.

3. The method for detecting faults of photovoltaic equipment based on multi-dimensional outlier detection as claimed in claim 1, wherein the dimensionality reduction operation in step S3B specifically comprises: first taking the data of one category, which contains M records, and arranging them into a matrix X of R rows and M columns; zero-meaning each row of X and obtaining the covariance matrix

$V = \frac{1}{M} X X^{T}$, where T is the transpose symbol; solving for the eigenvalues and eigenvectors of the matrix V, arranging the eigenvectors as the rows of a matrix from top to bottom in descending order of the corresponding eigenvalues, and taking the first s rows to form a matrix P, where

，Final calculation