CN115021679B - Photovoltaic equipment fault detection method based on multi-dimensional outlier detection - Google Patents

Photovoltaic equipment fault detection method based on multi-dimensional outlier detection

Publication number: CN115021679B
Authority: CN (China)
Prior art keywords: data, value, photovoltaic equipment, matrix, clustering
Legal status: Active
Application number: CN202210946811.5A
Other languages: Chinese (zh)
Other versions: CN115021679A
Inventors: 陈运蓬, 尚文, 白静波, 赵锐, 马飞, 夏彦, 张红伟
Current assignee: Datong Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Original assignee: Datong Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Application CN202210946811.5A filed by Datong Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Priority to CN202210946811.5A
Publication of CN115021679A
Application granted
Publication of CN115021679B


Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02S GENERATION OF ELECTRIC POWER BY CONVERSION OF INFRARED RADIATION, VISIBLE LIGHT OR ULTRAVIOLET LIGHT, e.g. USING PHOTOVOLTAIC [PV] MODULES
    • H02S50/00 Monitoring or testing of PV systems, e.g. load balancing or fault identification
    • H02S50/10 Testing of PV devices, e.g. of PV modules or single PV cells
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00 Energy generation through renewable energy sources
    • Y02E10/50 Photovoltaic [PV] energy

Landscapes

  • Photovoltaic Devices (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a photovoltaic equipment fault detection method based on multi-dimensional outlier detection, in the technical field of photovoltaic equipment detection. The data are clustered with a fuzzy C-means (FCM) clustering algorithm that introduces a threshold mechanism. Abnormal data are then detected using two core components: an isolation forest algorithm based on data dimension reduction and a statistical decision algorithm based on the interquartile range. The method can detect photovoltaic equipment of different models operating in different environments, and it achieves good detection efficiency, accuracy and compatibility. With this technical scheme, the fault condition of photovoltaic equipment can be analyzed efficiently, in real time and at low cost. Photovoltaic equipment of different models and environments can be classified and detected, and the compatibility of photovoltaic equipment detection is improved, making up for the deficiencies of the prior art.

Description

Photovoltaic equipment fault detection method based on multi-dimensional outlier detection
Technical Field
The invention relates to the technical field of photovoltaic equipment detection, and in particular to a photovoltaic equipment fault detection method based on multi-dimensional outlier detection.
Background
Photovoltaic equipment faults mainly show up as abnormal current, voltage, power, temperature and similar readings. These are caused by problems in the photovoltaic equipment, its circuits or its meters. Locating the fault point as early as possible and handling it in time is important for the normal operation of a photovoltaic power generation system. In recent years the number of photovoltaic equipment manufacturers has grown, and even different versions of equipment from the same manufacturer use different communication protocols. This makes fault detection of photovoltaic equipment considerably harder.
At present, fault detection for photovoltaic equipment mainly relies on traditional algorithms such as isolation forest. Photovoltaic equipment, however, comes in many models and operates in different environments, so the operating data are high-dimensional and vary widely. This reduces detection accuracy and efficiency, and traditional detection algorithms offer no effective solution to these problems.
Unsupervised learning and multi-dimensional outlier detection techniques have developed, and computing power continues to grow. Together with artificial intelligence techniques, this makes real-time fault detection of photovoltaic equipment feasible. There is therefore a pressing need for a multi-dimensional outlier detection method for photovoltaic equipment that uses an efficient detection algorithm to analyze fault conditions in real time and make up for the deficiencies of the prior art.
Disclosure of Invention
Purpose of the invention: to provide a multi-dimensional outlier detection method for photovoltaic equipment that improves the efficiency of real-time analysis of equipment fault conditions.
The technical scheme of the invention is as follows. A photovoltaic equipment fault detection method based on multi-dimensional outlier detection is provided, the method comprising:
S1, collecting and preprocessing photovoltaic equipment operating state data: the model and capacity of the photovoltaic equipment and the corresponding current, voltage, power, temperature and other data are acquired through a data collector. The operating information data collected at the same time are integrated into a data set L containing N records in total, and the missing information in L is interpolated using a proportional mean filling method;
S2, inputting the data processed in step S1 into an FCM clustering algorithm based on a threshold mechanism to obtain a clustering result. The threshold mechanism comprises a threshold decision mechanism for initializing the number of clusters C and a threshold initialization mechanism for initializing the membership matrix U. The threshold mechanism first initializes the number of photovoltaic equipment categories C and the membership matrix U. A new membership matrix U is then obtained by iteratively computing the cost function, and the clustering result is obtained from the final membership matrix;
S3, detecting abnormal data, specifically comprising the following steps:
S3A, using a statistical decision algorithm based on the interquartile range to obtain the upper bound Top and lower bound Bottom of every data type under every category, then examining all data L and marking every value that does not lie within the upper and lower bounds of its own data type as abnormal;
S3B, in parallel with step S3A, processing the preprocessed data with an isolation forest algorithm based on data dimension reduction. Dimension reduction is performed on the data of all categories. A category is then selected from the reduced data, and samples drawn at random from that category form the sub-sample for training an isolation tree; this sub-sample is placed at the root node of the tree. A data type is specified at random, and a cut point is generated at random between the minimum and maximum values of that data type in the current sub-sample. Samples whose value of the specified type is less than or equal to the cut point are placed in the left child node, and samples whose value is greater than the cut point are placed in the right child node. Cut points are generated repeatedly at the left and right child nodes, continually building new leaf nodes. This continues until a leaf node holds only one record and can no longer be cut, or until the tree reaches the set maximum height. The anomaly score is calculated as $s(x,h)=2^{-E(x)/c(h)}$. Here $E(x)$ is the expected path length of a record x of L over the trees, and $c(h)$ is the average path length of a tree built on a data set of h samples, $c(h)=2H(h-1)-\frac{2(h-1)}{h}$, where h is the sub-sample size and $H(\cdot)$ is the harmonic number. The isolation forest algorithm is executed repeatedly until the data of all categories have been examined;
S4, performing weighted threshold judgment on the abnormal data detected in steps S3A and S3B to obtain the final detected abnormal data.
In any of the above technical solutions, further, filling the missing information with the proportional mean filling method in step S1 comprises the following. First, t records are selected at random ([formula image]). All values of the data type corresponding to the missing position are taken from these records and sorted in ascending order, the a-th value being denoted [formula image] ([formula image]). The value to be inserted, p, is then obtained from the formula [formula image], where E denotes mathematical expectation.
In any of the above technical solutions, further, the membership matrix U initialized in step S2 is a matrix of N rows and C columns. Its entry $u_{nc}$ in row n, column c ($n=1,\dots,N$; $c=1,\dots,C$) is given by the formula [formula image], which involves the n-th value of the power data W and the average power values of the c-th and j-th groups. $u_{nc}$ is the degree of membership of the n-th record in class c. For each of the N records, the memberships over all classes sum to 1, i.e. $\sum_{c=1}^{C}u_{nc}=1$.
The iterative calculation specifically comprises the following steps.
In the first iteration, the cluster centers of the C clusters are calculated from the initialized membership matrix U as $v_c=\frac{\sum_{n=1}^{N}u_{nc}^{m}L_n}{\sum_{n=1}^{N}u_{nc}^{m}}$, where m takes the value 2 and $L_n$ is the n-th of the N preprocessed records in L.
The cost function is calculated as $J=\sum_{n=1}^{N}\sum_{c=1}^{C}u_{nc}^{m}\lVert L_n-v_c\rVert^{2}$, where $v_c$ is the cluster center of class c.
Iteration then continues. The membership matrix U is recalculated as $u_{nc}=\left[\sum_{d=1}^{C}\left(\frac{\lVert L_n-v_c\rVert}{\lVert L_n-v_d\rVert}\right)^{2/(m-1)}\right]^{-1}$, where $v_d$ is the d-th cluster center. The cluster centers are recalculated with the center formula above, and the cost function J is recalculated.
If J has changed by less than the threshold ε since the previous calculation, the algorithm ends and the final membership matrix U is obtained. Otherwise the above steps are repeated until the change in J is less than ε. Clustering is then performed according to the final membership matrix U: each record is assigned to the class in which its membership is largest.
In any of the above technical solutions, further, the statistical decision algorithm based on the interquartile range in step S3A specifically comprises the following. A category is selected from the preprocessed data, and the values of each data type of all records in that category are sorted in ascending order. One data type is selected at random and its median is found, denoted Q2. Taking Q2 as the dividing point, the data are split into a left half and a right half, and the median of each half is found: the left median is denoted Q1 and the right median Q3. The interquartile range IQR, the upper bound Top and the lower bound Bottom are then calculated as $IQR=Q_3-Q_1$, $Top=Q_3+k\cdot IQR$ and $Bottom=Q_1-k\cdot IQR$, where k is a constant. The statistical decision algorithm is executed repeatedly until the upper bound Top and lower bound Bottom of every data type under every category are obtained.
In any of the above technical solutions, further, the dimension reduction operation in step S3B specifically comprises the following. First, the data of one category, M records in total, are taken and formed into a matrix X of R rows and M columns. Each row of X is zero-centered, and the covariance matrix $V=\frac{1}{M}XX^{T}$ is obtained, where T denotes transposition. The eigenvalues and eigenvectors of V are solved, and the eigenvectors are arranged as rows from top to bottom in descending order of their eigenvalues. The first s rows form the matrix P, where s is the target dimension. Finally, $Y=PX$ is calculated; Y is the data of that category reduced to s dimensions. The dimension reduction operation is repeated until all categories have been reduced.
In any of the above technical solutions, further, the weighted threshold in step S4 is applied as follows. When a record is judged abnormal by both the statistical decision algorithm based on the interquartile range and the isolation forest algorithm based on data dimension reduction, the record is defined as abnormal data;
otherwise, two factors are calculated separately for the record: the statistical decision anomaly factor, [formula image], and the isolation forest anomaly factor, [formula image], where S denotes the anomaly score. The degree of abnormality D is then obtained from the formula [formula image], where l is a constraint factor defined as 0.3. The record is judged abnormal when [formula image] holds.
The invention has the beneficial effects that:
In this technical scheme, the clustering algorithm and the isolation forest algorithm are combined with the statistical decision algorithm. As a result, photovoltaic equipment fault detection is more accurate than with other commonly used detection methods. Reducing the dimension of the photovoltaic equipment operating data lowers the computational burden and achieves higher detection efficiency, which effectively helps operation and maintenance personnel locate fault points quickly. The detection algorithm can also process a large amount of data at once, even when the data differ, so a small deployment can cover a wide range and the cost burden is reduced.
In a preferred implementation of the invention, a clustering algorithm is added, so that photovoltaic equipment of different models and in different environments can be classified and detected. This improves the compatibility of photovoltaic equipment detection.
Drawings
The above and additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow diagram of a method of photovoltaic equipment fault detection based on multi-dimensional outlier detection according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the interquartile range algorithm in a photovoltaic equipment fault detection method based on multi-dimensional outlier detection according to an embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
As shown in fig. 1, the present embodiment provides a method for detecting a failure of a photovoltaic device based on multi-dimensional outlier detection, where the method includes:
S1, collecting and preprocessing the operating state data of the photovoltaic equipment. The model and capacity of the photovoltaic equipment and the corresponding current, voltage, power, temperature and other data are acquired through a data acquisition unit. The operating information data acquired at the same time are integrated into one data set L containing N records in total, and the missing information in L is interpolated using a proportional mean filling method.
Specifically, information may be incomplete during data acquisition, so some of the integrated records $L_n$ may be partially missing.
The proportional mean filling method first selects t records at random ([formula image]). All values of the data type corresponding to the missing information are taken from these records and sorted in ascending order, the a-th value being denoted [formula image] ([formula image]). The value to be inserted, p, is then obtained from the formula [formula image], where E denotes mathematical expectation.
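A minimal sketch of the filling step follows. It rests on one large assumption: the formula that turns the sorted sample into p survives only as an image. The code therefore simply takes p as the mean (expectation) of t randomly drawn observed values of the same data type. The function name `fill_missing` and the list-of-dicts record layout are hypothetical.

```python
import random
import statistics


def fill_missing(records, field, t, seed=None):
    """Fill missing values (None) of `field` in a list of dict records.

    ASSUMPTION: the patent's exact formula for p is not recoverable, so p is
    taken here as the plain mean of t randomly drawn observed values.
    """
    rng = random.Random(seed)
    observed = [r[field] for r in records if r[field] is not None]
    if not observed:
        raise ValueError("no observed values to fill from")
    # Draw t records at random and sort ascending, as the text describes
    # (sorting does not change a plain mean; it only matters for the unknown weighting).
    sample = sorted(rng.sample(observed, min(t, len(observed))))
    p = statistics.fmean(sample)  # E[...] of the drawn values
    for r in records:
        if r[field] is None:
            r[field] = p
    return p
```

For example, `fill_missing([{"v": 1.0}, {"v": None}, {"v": 3.0}], "v", t=2)` draws both observed values and fills the gap with 2.0.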
S2, inputting the data processed in step S1 into the FCM clustering algorithm based on a threshold mechanism to obtain the clustering result, specifically comprising the following steps:
S21, initializing the number of photovoltaic equipment categories C with the threshold mechanism. C defaults to 1. All power values in the data L are taken out and sorted in ascending order, and the N sorted power values are denoted W. W is divided into m intervals containing equal numbers of values, where m is given by [formula image] and rounded up. The average power value of each interval is calculated in turn, together with the difference between the average power values of adjacent intervals, [formula image]. Here [formula image] denotes the average power value of the b-th interval ([formula image]). Whenever [formula image] holds, the number of clusters C is increased by 1.
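The S21 cluster-count rule can be sketched in a few lines. The formula for the number of intervals m and the gap condition are available only as images. In this sketch both are parameters (`m` and `threshold` are hypothetical stand-ins), and the condition is assumed to be "the difference between adjacent interval means exceeds a threshold".

```python
def init_cluster_count(powers, m, threshold):
    """Sketch of the threshold mechanism that initializes the cluster count C.

    ASSUMPTIONS: `m` is already rounded up by the caller, and a gap between
    adjacent interval means larger than `threshold` adds one cluster.
    """
    w = sorted(powers)                      # W: power values in ascending order
    n = len(w)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= number of power values")
    # Boundaries that split W into m intervals of (nearly) equal size.
    bounds = [i * n // m for i in range(m + 1)]
    means = [sum(w[bounds[i]:bounds[i + 1]]) / (bounds[i + 1] - bounds[i])
             for i in range(m)]
    c = 1                                   # C defaults to 1
    for b in range(m - 1):
        if means[b + 1] - means[b] > threshold:
            c += 1                          # a large jump suggests another category
    return c
```

With powers `[1, 1, 2, 2, 10, 10, 11, 11]`, `m=4` and `threshold=3`, the interval means are 1, 2, 10 and 11. Only the jump from 2 to 10 exceeds the threshold, so C = 2.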
S22, initializing the membership matrix U with the threshold mechanism. U is a matrix of N rows and C columns. Its entry $u_{nc}$ in row n, column c ($n=1,\dots,N$; $c=1,\dots,C$) is the degree of membership of the n-th record in class c. For each of the N records, the memberships over all classes sum to 1, i.e. $\sum_{c=1}^{C}u_{nc}=1$. W is then re-divided into C intervals containing equal numbers of values, and the average power value of the c-th interval is denoted [formula image]. U is initialized according to the formula [formula image] ($n=1,\dots,N$; $c=1,\dots,C$), where [formula image] is the n-th value of W.
S23, based on the initialized membership matrix U, calculating the cluster centers of the C clusters as $v_c=\frac{\sum_{n=1}^{N}u_{nc}^{m}L_n}{\sum_{n=1}^{N}u_{nc}^{m}}$. Here the exponent m takes the value 2, and $L_n$ is the n-th of the N preprocessed records in L.
S24, calculating the cost function $J=\sum_{n=1}^{N}\sum_{c=1}^{C}u_{nc}^{m}\lVert L_n-v_c\rVert^{2}$, where $v_c$ is the cluster center of class c. Iteration then continues. The membership matrix U is recalculated as $u_{nc}=\left[\sum_{d=1}^{C}\left(\frac{\lVert L_n-v_c\rVert}{\lVert L_n-v_d\rVert}\right)^{2/(m-1)}\right]^{-1}$, where $v_d$ is the d-th cluster center. The cluster centers are recalculated with the formula of S23, and the cost function J is recalculated. If J has changed by less than the threshold ε since the previous calculation, the algorithm ends and the final membership matrix U is obtained. Otherwise the above steps are repeated until the change in J is less than ε. Clustering is then performed according to the final membership matrix U: each record is assigned to the class in which its membership is largest.
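The S23 and S24 loop is standard fuzzy C-means with m = 2. The NumPy sketch below follows the text's order: centers, then cost J, a stop test on the change in J, then a membership update. Two points are assumptions. First, Euclidean distance is used, because the distance in the original formulas appears only as images. Second, the initial U is passed in, since the S22 initialization formula is not recoverable. The names `fcm`, `eps` and `max_iter` are hypothetical.

```python
import numpy as np


def fcm(data, u0, m=2.0, eps=1e-6, max_iter=300):
    """Fuzzy C-means iteration (sketch). Returns (U, centers, labels)."""
    x = np.asarray(data, dtype=float)       # (N, F) preprocessed records L_n
    u = np.asarray(u0, dtype=float)         # (N, C) initial membership matrix
    j_prev = None
    for _ in range(max_iter):
        um = u ** m
        # Cluster centers: means of the records weighted by u_nc^m.
        centers = (um.T @ x) / um.sum(axis=0)[:, None]
        # Euclidean distance of every record to every center, shape (N, C).
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        j = float((um * dist ** 2).sum())   # cost function J
        if j_prev is not None and abs(j_prev - j) < eps:
            break                           # change in J below threshold eps
        j_prev = j
        dist = np.fmax(dist, 1e-12)         # guard against division by zero
        # u_nc = 1 / sum_d (dist_nc / dist_nd)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=2)
    labels = u.argmax(axis=1)               # class with the largest membership
    return u, centers, labels
```

On two well-separated groups of points, starting from a mildly biased U, the memberships sharpen toward the two groups, and `labels` assigns each point to its own group.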
S3, anomaly detection: the data are processed in parallel by a statistical decision algorithm based on the interquartile range and an isolation forest algorithm based on data dimension reduction, as follows:
S3A, applying the statistical decision algorithm based on the interquartile range. A category is selected from the preprocessed data, and all records of that category are sorted in ascending order for each data type. One data type is selected at random and its median is found, denoted Q2. Taking Q2 as the dividing point, the data are split into a left half and a right half, and the median of each half is found: the left median is denoted Q1 and the right median Q3. As shown in FIG. 2, with 11 data points in total, Q2 is at x6, Q1 at x3 and Q3 at x9.
The interquartile range IQR, the upper bound Top and the lower bound Bottom are then calculated as
$IQR=Q_3-Q_1$
$Top=Q_3+k\cdot IQR$
$Bottom=Q_1-k\cdot IQR$
where k is a constant, set to 1.5 in this embodiment. The statistical decision algorithm is executed repeatedly until the upper bound Top and lower bound Bottom of every data type under every category are obtained.
All data L are then examined, and every value that does not lie within the upper and lower bounds of its own data type is marked abnormal. A record is marked as abnormal data as soon as any one of its data types is detected as abnormal by the algorithm.
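The S3A rule can be sketched for the records of one category as below. Quartiles follow the split-at-the-median procedure of FIG. 2: for 11 points, Q1 = x3, Q2 = x6 and Q3 = x9. For an even count the data are simply split into two equal halves, which is an assumption because the text only illustrates the odd case. The function names and the dict-per-record layout are hypothetical.

```python
def median(values):
    s = sorted(values)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(values):
    """Q1, Q2, Q3 by splitting the sorted data at the median (FIG. 2 style)."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = s[:half]                              # left half
    upper = s[half + 1:] if n % 2 else s[half:]   # right half (median excluded if odd)
    return median(lower), median(s), median(upper)


def iqr_bounds(values, k=1.5):
    """Return (Bottom, Top) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_records(records, k=1.5):
    """A record is abnormal if any data type falls outside its [Bottom, Top]."""
    fields = list(records[0])
    bounds = {f: iqr_bounds([r[f] for r in records], k) for f in fields}
    return [any(not bounds[f][0] <= r[f] <= bounds[f][1] for f in fields)
            for r in records]
```

For the values 1 to 11 this gives Q1 = 3, Q2 = 6 and Q3 = 9, so IQR = 6 and the bounds are [-6, 18].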
S3B, in parallel with step S3A, processing the preprocessed data with the isolation forest algorithm based on data dimension reduction. First, the data of one category, M records in total, are taken and formed into a matrix X of R rows and M columns. Each row of X is zero-centered, and the covariance matrix $V=\frac{1}{M}XX^{T}$ is obtained, where T denotes transposition. The eigenvalues and eigenvectors of V are solved, and the eigenvectors are arranged as rows from top to bottom in descending order of their eigenvalues. The first s rows form the matrix P, where s is the target dimension. Finally, $Y=PX$ is calculated; Y is the data of that category reduced to s dimensions. The dimension reduction operation is repeated until all categories have been reduced.
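The dimension reduction above is the textbook PCA recipe. The NumPy sketch below assumes that the rows of X are data types and the columns are the M records of one category. `pca_reduce` is a hypothetical name, and `numpy.linalg.eigh` is used because V is symmetric.

```python
import numpy as np


def pca_reduce(x, s):
    """Reduce an (R, M) matrix X to Y = P X of shape (s, M)."""
    x = np.asarray(x, dtype=float)
    r, m = x.shape
    if not 1 <= s <= r:
        raise ValueError("s must satisfy 1 <= s <= R")
    x = x - x.mean(axis=1, keepdims=True)   # zero-mean each row
    v = (x @ x.T) / m                       # covariance matrix V = X X^T / M
    eigvals, eigvecs = np.linalg.eigh(v)    # eigenvalues ascending, vectors as columns
    order = np.argsort(eigvals)[::-1][:s]   # indices of the s largest eigenvalues
    p = eigvecs[:, order].T                 # P: top-s eigenvectors as rows
    return p @ x                            # Y = P X
```

When the rows of X are linearly dependent, as in `[[1, 2, 3, 4], [2, 4, 6, 8], [0, 0, 0, 0]]`, a single component (s = 1) retains all of the centered variance.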
A category is selected from the dimension-reduced data, and h samples are drawn from it at random, h being 1/10 of the number of records in that category. These samples form the sub-sample for training an isolation tree and are placed at the root node of the tree. A data type is specified at random, and a cut point is generated at random between the minimum and maximum values of that data type in the current sub-sample. Samples whose value of the specified type is less than or equal to the cut point are placed in the left child node, and samples whose value is greater than the cut point are placed in the right child node. Cut points are generated repeatedly at the left and right child nodes, continually building new leaf nodes. This continues until a leaf node holds only one record and can no longer be cut, or until the tree reaches the set maximum height. In this embodiment the maximum height of the tree is set to 256.
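A compact pure-Python sketch of growing one isolation tree is shown below: random data type, random cut point between min and max, values at or below the cut go left, and the height is capped at 256. It is an illustration, not the authors' code, and the names are hypothetical. Drawing the 1/10 sub-sample is left to the caller. The path length returned is the raw edge count; the c(size) correction that standard isolation forest implementations add at leaves that were not fully isolated is omitted.

```python
import random


def build_itree(rows, height=0, max_height=256, rng=random):
    """Grow one isolation tree over `rows` (a list of equal-length numeric tuples)."""
    # Leaf: a single record remains, or the height limit has been reached.
    if len(rows) <= 1 or height >= max_height:
        return {"size": len(rows)}
    # Only data types whose values still differ can be cut.
    dims = [q for q in range(len(rows[0])) if len({r[q] for r in rows}) > 1]
    if not dims:
        return {"size": len(rows)}          # identical records cannot be cut
    q = rng.choice(dims)                    # randomly specified data type
    lo = min(r[q] for r in rows)
    hi = max(r[q] for r in rows)
    p = rng.uniform(lo, hi)                 # random cut point between min and max
    left = [r for r in rows if r[q] <= p]   # values <= cut point go left
    right = [r for r in rows if r[q] > p]   # values > cut point go right
    return {"q": q, "p": p,
            "left": build_itree(left, height + 1, max_height, rng),
            "right": build_itree(right, height + 1, max_height, rng)}


def path_length(x, node, depth=0):
    """Number of edges from the root to the leaf that record `x` falls into."""
    if "q" not in node:
        return depth
    child = node["left"] if x[node["q"]] <= node["p"] else node["right"]
    return path_length(x, child, depth + 1)
```

Averaged over many trees, a point far from the rest is typically isolated after one or two cuts. Points inside a dense group need several cuts.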
The anomaly score is calculated as $s(x,h)=2^{-E(x)/c(h)}$. Here $E(x)$ is the expected path length of a record x of L over the trees, and $c(h)$ is the average path length of a tree built on a data set of h samples, $c(h)=2H(h-1)-\frac{2(h-1)}{h}$, where h is the sub-sample size and $H(\cdot)$ is the harmonic number.
The isolation forest algorithm is executed repeatedly until the data of all categories have been examined. Experiments showed that the isolation forest's results for the data were mostly negative. In this embodiment, 0.5 is therefore added to the anomaly score and the decision threshold of the detection model is set to 0: any record whose value is still below 0 after adding 0.5 is judged abnormal.
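The scoring and the 0.5 shift can be checked numerically. The sketch assumes that the "mostly negative" values are the negated score -s, which is the convention of scikit-learn's `IsolationForest.score_samples`. Under that assumption, adding 0.5 and thresholding at 0 is the same as flagging s > 0.5. H(i) uses the usual approximation ln(i) + 0.5772156649 from the original isolation forest formulation. The special case c(2) = 1 follows common implementations and is not stated in the text. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def harmonic(i):
    """Approximate harmonic number H(i) ~ ln(i) + Euler's constant."""
    return math.log(i) + EULER_GAMMA


def c_factor(h):
    """Average path length c(h) of a tree built on h samples."""
    if h <= 1:
        return 0.0
    if h == 2:
        return 1.0                          # special case used by common implementations
    return 2.0 * harmonic(h - 1) - 2.0 * (h - 1) / h


def anomaly_score(expected_path, h):
    """s = 2^(-E/c(h)): near 1 is anomalous, well below 0.5 is normal."""
    if h < 2:
        raise ValueError("sub-sample size h must be at least 2")
    return 2.0 ** (-expected_path / c_factor(h))


def is_abnormal(expected_path, h):
    """ASSUMPTION: the raw output is -s, so the text's rule is (-s) + 0.5 < 0."""
    return -anomaly_score(expected_path, h) + 0.5 < 0
```

A record whose expected path length equals c(h) scores exactly 0.5, which sits on the decision threshold and is not flagged. Shorter paths push s toward 1 and are flagged.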
S4, performing weighted threshold judgment on the abnormal data detected in steps S3A and S3B to obtain the final detected abnormal data. The weighted threshold is applied as follows. A record is defined as abnormal data when it is judged abnormal by both the isolation forest algorithm and the statistical decision algorithm. Otherwise, two factors are calculated separately for the record: the statistical decision anomaly factor, [formula image], and the isolation forest anomaly factor, [formula image], where S denotes the anomaly score. The degree of abnormality D is then obtained from the formula [formula image], where l is a constraint factor defined as 0.3. The record is judged abnormal when [formula image] holds.
In one practical application of this embodiment, part of the operating data of a group of real photovoltaic devices was collected at time t1, and fault detection was performed on it, as shown in Table 1.
Table 1 Statistics of the actual data collected at one time
[Table 1 image not reproduced]
The detection algorithm assigned the first 6 records to the first category and the last 6 records to the second category. After data preprocessing and clustering, each record was checked for abnormality by the statistical decision of step S3A and by the isolation forest outlier detection of step S3B.
As shown in Table 2, the final detection results identify the 1st and 7th records as abnormal data.
Table 2 Data detection results
[Table 2 image not reproduced]
Simple inspection of the data shows that the 2nd record has an abnormal temperature and the 7th record has an abnormal PV2 voltage. Both are genuinely abnormal, which agrees with the model's judgment and demonstrates that the fault detection method is effective.
The technical scheme of the invention has been described in detail above with reference to the drawings. The invention provides a photovoltaic equipment fault detection method based on multi-dimensional outlier detection. Clustering is performed by an FCM clustering algorithm with a threshold mechanism, and abnormal data are detected with two core components: an isolation forest algorithm based on data dimension reduction and a statistical decision algorithm based on the interquartile range. The method can detect photovoltaic equipment of different models in different environments, with good detection efficiency, accuracy and compatibility.
The steps of the invention can be reordered, combined or deleted according to actual requirements.
Although the present invention has been disclosed in detail with reference to the accompanying drawings, it is to be understood that such description is merely illustrative of and not restrictive on the application of the present invention. The scope of the invention is defined by the appended claims and may include various modifications, adaptations and equivalents of the invention without departing from its scope and spirit.

Claims (3)

1. A photovoltaic equipment fault detection method based on multi-dimensional outlier detection, characterized by comprising the following steps:
S1, collecting and preprocessing photovoltaic equipment operating state data. The model and capacity of the photovoltaic equipment and the corresponding current, voltage, power and temperature are acquired through a data acquisition unit. The operating information data acquired at the same time are integrated into a data set L containing N records in total, and the missing information in L is interpolated using a proportional mean filling method. In this method, t records are first selected at random ([formula image]). All values of the data type corresponding to the missing information are taken from these records and sorted in ascending order, the a-th value being denoted [formula image] ([formula image]). The value to be inserted, p, is obtained from the formula [formula image], where E denotes mathematical expectation;
S2, inputting the data processed in step S1 into an FCM (fuzzy C-means) clustering algorithm based on a threshold mechanism to obtain a clustering result.

The threshold mechanism includes a threshold determination mechanism for initializing the number of clusters C, which works as follows. C defaults to 1. All power values in data L are taken out and sorted from small to large. The sorted N power values are recorded as W, and W is divided into m intervals containing equal numbers of data, where [formula image 005] and m is rounded up. The average power value of each interval is calculated in turn, together with the difference between the average power values of adjacent intervals, [formula image 006] [formula image 007], where [formula image 008] denotes the average power value of the b-th interval, [formula image 009]. Whenever [formula image 010] holds, the number of clusters C is increased by 1.

The threshold mechanism also includes a threshold initialization mechanism for the membership matrix U. U is a matrix of N rows and C columns, in which [formula image 011] denotes the element in row n and column c. W is re-divided into C intervals containing equal numbers of data. The average power value of the c-th interval is recorded as [formula image 012], the average power value of the j-th interval as [formula image 013], and the n-th value of W as [formula image 014]. U is then initialized according to the formula [formula image 015], where [formula image 016] [formula image 017] [formula image 018] [formula image 019].

After the threshold mechanism has initialized the number of photovoltaic equipment categories C and the membership matrix U, a new membership matrix U is obtained by iteratively computing a cost function. The clustering result is then read from the final membership matrix. The iterative computation is as follows:
- In the first iteration, the cluster centers of the C clusters, [formula image 021], are calculated from the initialized membership matrix U with the formula [formula image 020]. Here the fuzzy weighting exponent m takes the value 2 (this m is distinct from the interval count m above), and [formula image 022] is the n-th of the N preprocessed records in L.
- The cost function is calculated with the formula [formula image 023], where [formula image 024] and [formula image 025] denotes the cluster center corresponding to [formula image 014].
- The iteration continues by recalculating the membership matrix U with the formula [formula image 026], where [formula image 027] and [formula image 028] denotes the d-th cluster center, [formula image 029].
- The cluster centers are recalculated with [formula image 020], and the cost function is recalculated with [formula image 023], where [formula image 024].
- If the J value of this calculation differs from that of the previous calculation by less than the threshold ε, the algorithm ends and the final membership matrix U is obtained. Otherwise, the above steps are repeated until the change in J is less than ε.

Clustering is performed according to the final membership matrix U, and each record is assigned to the category with its maximum membership degree;
S3, detecting abnormal data, specifically comprising the following steps:
S3A, obtaining the upper bound Top and the lower bound Bottom for every data type in every category with a statistical decision algorithm based on the interquartile range, checking all data in L against these bounds, and marking every value that lies outside the bounds of its own data type as an abnormal value;
S3B, concurrently with step S3A, processing the preprocessed data with an isolation forest algorithm based on data dimension reduction, as follows.

A dimension-reduction operation is first performed on all data types. A category is then selected from the dimension-reduced data, and samples are randomly drawn from that category as the sub-sample for training an isolation tree. The sub-sample is placed at the root node of the tree.

A data type is randomly specified, and a cut point [formula image 030] is randomly generated between the maximum and minimum values of that data type in the current sub-sample, [formula image 031]. Samples whose value for the specified data type is less than or equal to [formula image 032] are placed in the left child node, and samples whose value is greater than [formula image 032] are placed in the right child node. Cut points are generated repeatedly in the left and right child nodes, and new leaf nodes are built until each leaf node contains only one record and can no longer be split, or until the tree reaches the set maximum height.

An anomaly score is then calculated according to [formula image 033], where:
- [formula image 034] is the expected path length over the L trees;
- [formula image 035] is the average path length of a tree built on a data set of h samples, [formula image 036], where h is the sub-sample size;
- [formula image 037] is the harmonic number.

The isolation forest algorithm is executed repeatedly until the data of all categories have been checked;
S4, performing a weighted threshold judgment on the abnormal data detected in steps S3A and S3B to obtain the final abnormal data, the weighted threshold judgment comprising:
when a record is judged abnormal by both the interquartile-range statistical decision algorithm and the dimension-reduction isolation forest algorithm, the record is defined as abnormal data;
otherwise, the statistical decision anomaly factor of the record, [formula image 038], and its isolation forest anomaly factor, [formula image 039], are calculated, where S denotes the anomaly score. The degree of abnormality D is then obtained by the formula [formula image 040], where the constraint factor l is set to 0.3. When [formula image 041] holds, the record is judged abnormal.
2. The method for detecting faults of photovoltaic equipment based on multi-dimensional outlier detection according to claim 1, wherein using the interquartile-range statistical decision algorithm in step S3A specifically comprises the following steps:
- selecting a category from the preprocessed data and sorting the values of each data type of all records in that category from small to large;
- selecting one data type at random and finding its median, denoted Q2;
- dividing the data into a left half and a right half with Q2 as the dividing point, and finding the median of each half, where the median of the left half is denoted Q1 and the median of the right half is denoted Q3;
- calculating the interquartile range IQR, the upper bound Top and the lower bound Bottom according to the formulas $\mathrm{IQR} = Q_3 - Q_1$, $\mathrm{Top} = Q_3 + k \cdot \mathrm{IQR}$ and $\mathrm{Bottom} = Q_1 - k \cdot \mathrm{IQR}$, where k is a constant.

The statistical decision algorithm is executed repeatedly until the upper bound Top and lower bound Bottom of every data type in every category have been obtained.
3. The method for detecting faults of photovoltaic equipment based on multi-dimensional outlier detection according to claim 1, wherein the dimension reduction in step S3B specifically comprises the following steps:
- taking the data of one category, which contains M records, and arranging them into a matrix X of R rows and M columns;
- zero-centering each row of X and computing the covariance matrix $V = \frac{1}{M} X X^{T}$, where T denotes the transpose;
- calculating the eigenvectors of V, arranging them as rows from top to bottom in descending order of their eigenvalues, and taking the first s rows to form the matrix P, where [formula image 046];
- calculating $Y = P X$, where Y is the data of the category reduced to s dimensions.

The dimension-reduction operation is repeated for the remaining categories until all categories have been reduced.
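The quartile fences of claim 2 can be sketched in Python as follows. The sketch assumes the standard Tukey fences IQR = Q3 − Q1, Top = Q3 + k·IQR and Bottom = Q1 − k·IQR. Two further details are assumptions, not taken from the claim. First, when a data type has an odd number of values, the median Q2 is left out of both halves before Q1 and Q3 are taken. Second, k defaults to 1.5, the conventional value; the claim says only that k is a constant. `iqr_bounds` and `flag_outliers` are hypothetical helper names, and each record is modelled as a dictionary that maps data types to values.

```python
from statistics import median


def iqr_bounds(values, k=1.5):
    """Return (Bottom, Top) for one data type; needs at least two values."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    # Assumption: for an odd count, the median Q2 belongs to neither half.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(categories, k=1.5):
    """`categories` maps each category to a list of {data_type: value} records.
    Returns {(category, record_index, data_type)} for every value that lies
    outside the bounds of its own data type."""
    flagged = set()
    for cat, records in categories.items():
        for dtype in records[0]:
            bottom, top = iqr_bounds([r[dtype] for r in records], k)
            for idx, rec in enumerate(records):
                if not bottom <= rec[dtype] <= top:
                    flagged.add((cat, idx, dtype))
    return flagged
```

For example, `iqr_bounds([1, 2, 3, 4, 5, 6, 7, 8])` gives Q1 = 2.5, Q3 = 6.5 and IQR = 4, so it returns `(-3.5, 12.5)`.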
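The dimension reduction of claim 3 and the isolation-tree construction of step S3B can be combined in one Python sketch. It uses only the standard library, so the leading eigenvectors of V = XXᵀ/M are found by power iteration with deflation rather than a full eigendecomposition. The anomaly score follows the standard isolation-forest formulation of Liu, Ting and Zhou: s = 2^(−E(h(x))/c(h)), with c(h) = 2H(h−1) − 2(h−1)/h and H(i) ≈ ln(i) + 0.5772156649. That the patent's unreproduced formula images show exactly these formulas is an assumption. The function names, tree count, sub-sample size and height limit ⌈log2 h⌉ are illustrative choices.

```python
import math
import random


def pca_reduce(x, s, iters=300, seed=0):
    """x has R rows (data types) and M columns (records); returns Y = P X (s x M)."""
    r, m = len(x), len(x[0])
    xc = []
    for row in x:
        mean = sum(row) / m
        xc.append([v - mean for v in row])             # zero-mean each row
    cov = [[sum(xc[a][t] * xc[b][t] for t in range(m)) / m for b in range(r)]
           for a in range(r)]                           # V = X X^T / M
    rng = random.Random(seed)
    p = []
    for _ in range(s):
        vec = [rng.random() + 0.1 for _ in range(r)]
        for _ in range(iters):                          # power iteration
            nxt = [sum(cov[a][b] * vec[b] for b in range(r)) for a in range(r)]
            norm = math.sqrt(sum(t * t for t in nxt)) or 1.0
            vec = [t / norm for t in nxt]
        lam = sum(vec[a] * cov[a][b] * vec[b] for a in range(r) for b in range(r))
        p.append(vec)                                   # rows of P, largest eigenvalue first
        cov = [[cov[a][b] - lam * vec[a] * vec[b] for b in range(r)]
               for a in range(r)]                       # deflate the found component
    return [[sum(row[a] * xc[a][t] for a in range(r)) for t in range(m)] for row in p]


def c_factor(h):
    """Average path length of an unsuccessful BST search among h samples."""
    if h <= 1:
        return 0.0
    if h == 2:
        return 1.0
    harmonic = math.log(h - 1) + 0.5772156649         # H(h - 1) approximation
    return 2.0 * harmonic - 2.0 * (h - 1) / h


def build_tree(rows, depth, max_depth, rng):
    if len(rows) <= 1 or depth >= max_depth:
        return ("leaf", len(rows))
    q = rng.randrange(len(rows[0]))                    # randomly specified data type
    lo, hi = min(r[q] for r in rows), max(r[q] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    cut = rng.uniform(lo, hi)                           # random cut point
    return ("node", q, cut,
            build_tree([r for r in rows if r[q] <= cut], depth + 1, max_depth, rng),
            build_tree([r for r in rows if r[q] > cut], depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    if node[0] == "leaf":
        return depth + c_factor(node[1])                # adjust for unsplit leaves
    _, q, cut, left, right = node
    return path_length(x, left if x[q] <= cut else right, depth + 1)


def iforest_scores(records, n_trees=100, h=256, seed=0):
    """Anomaly score s = 2 ** (-E(h(x)) / c(h)); needs at least two records."""
    rng = random.Random(seed)
    h = min(h, len(records))
    max_depth = math.ceil(math.log2(h))
    trees = [build_tree(rng.sample(records, h), 0, max_depth, rng)
             for _ in range(n_trees)]
    return [2 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / c_factor(h))
            for x in records]
```

In S3B terms, each category would be reduced with `pca_reduce`, transposed into records with `list(zip(*y))`, and then scored with `iforest_scores`. Scores close to 1 mark records that are isolated after only a few cuts.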
CN202210946811.5A 2022-08-09 2022-08-09 Photovoltaic equipment fault detection method based on multi-dimensional outlier detection Active CN115021679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210946811.5A CN115021679B (en) 2022-08-09 2022-08-09 Photovoltaic equipment fault detection method based on multi-dimensional outlier detection

Publications (2)

Publication Number Publication Date
CN115021679A (en) 2022-09-06
CN115021679B (en) 2022-11-04

Family

ID=83065644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210946811.5A Active CN115021679B (en) 2022-08-09 2022-08-09 Photovoltaic equipment fault detection method based on multi-dimensional outlier detection

Country Status (1)

Country Link
CN (1) CN115021679B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842322B (en) * 2023-07-19 2024-02-23 深圳市精微康投资发展有限公司 Electric motor operation optimization method and system based on data processing
CN116662729B (en) * 2023-08-02 2023-10-31 山东鲁玻玻璃科技有限公司 Low borosilicate glass feeding control data intelligent monitoring method
CN117077044B (en) * 2023-10-18 2024-02-06 深圳市大易电气实业有限公司 Method and device for judging faults of vacuum circuit breaker for generator

Citations (1)

Publication number Priority date Publication date Assignee Title
CN111666169A (en) * 2020-05-13 2020-09-15 云南电网有限责任公司信息中心 Improved isolated forest algorithm and Gaussian distribution-based combined data anomaly detection method

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
JP5581965B2 (en) * 2010-01-19 2014-09-03 オムロン株式会社 MPPT controller, solar cell control device, photovoltaic power generation system, MPPT control program, and MPPT controller control method
CN108776683B (en) * 2018-06-01 2022-01-21 广东电网有限责任公司 Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network
CN113378449A (en) * 2021-04-15 2021-09-10 黄山东安新高能源科技有限公司 Photovoltaic module health state diagnosis method based on fuzzy C-means clustering
CN113839618A (en) * 2021-10-15 2021-12-24 李力洋 Real-time fault detection method for large grid-connected solar photovoltaic power station

Also Published As

Publication number Publication date
CN115021679A (en) 2022-09-06

Similar Documents

Publication Publication Date Title
CN115021679B (en) Photovoltaic equipment fault detection method based on multi-dimensional outlier detection
CN110596492B (en) Transformer fault diagnosis method based on particle swarm optimization random forest model
US7415386B2 (en) Method and system for failure signal detection analysis
CN111340063B (en) Data anomaly detection method for coal mill
CN111460728A (en) Method and device for predicting residual life of industrial equipment, storage medium and equipment
CN113570138B (en) Method and device for predicting residual service life of equipment of time convolution network
CN111046961B (en) Fault classification method based on bidirectional long-time and short-time memory unit and capsule network
CN110097123B (en) Express mail logistics process state detection multi-classification system
CN111722046A (en) Transformer fault diagnosis method based on deep forest model
CN108491991B (en) Constraint condition analysis system and method based on industrial big data product construction period
CN111625399A (en) Method and system for recovering metering data
CN111343147A (en) Network attack detection device and method based on deep learning
CN109240276B (en) Multi-block PCA fault monitoring method based on fault sensitive principal component selection
CN114330486A (en) Power system bad data identification method based on improved Wasserstein GAN
CN114200245A (en) Construction method of line loss abnormity identification model of power distribution network
CN112817954A (en) Missing value interpolation method based on multi-method ensemble learning
CN111863135B (en) False positive structure variation filtering method, storage medium and computing device
CN113127464A (en) Agricultural big data environment feature processing method and device and electronic equipment
CN116400168A (en) Power grid fault diagnosis method and system based on depth feature clustering
CN111612149A (en) Main network line state detection method, system and medium based on decision tree
CN107728476B (en) SVM-forest based method for extracting sensitive data from unbalanced data
CN116304721A (en) Data standard making method and system for big data management based on data category
CN116578833A (en) IGBT module aging fault diagnosis system based on optimized random forest model
CN113889274B (en) Method and device for constructing risk prediction model of autism spectrum disorder
CN115729825A (en) Fuzzy test case generation method and device of industrial protocol and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant