CN115021679B - Photovoltaic equipment fault detection method based on multi-dimensional outlier detection - Google Patents
Photovoltaic equipment fault detection method based on multi-dimensional outlier detection
- Publication number: CN115021679B
- Application number: CN202210946811.5A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02S—GENERATION OF ELECTRIC POWER BY CONVERSION OF INFRARED RADIATION, VISIBLE LIGHT OR ULTRAVIOLET LIGHT, e.g. USING PHOTOVOLTAIC [PV] MODULES
- H02S50/00—Monitoring or testing of PV systems, e.g. load balancing or fault identification
- H02S50/10—Testing of PV devices, e.g. of PV modules or single PV cells
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E10/00—Energy generation through renewable energy sources
- Y02E10/50—Photovoltaic [PV] energy
Landscapes
- Photovoltaic Devices (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a photovoltaic equipment fault detection method based on multi-dimensional outlier detection and relates to the technical field of photovoltaic equipment detection. Data are clustered using an FCM clustering algorithm with a threshold mechanism, and abnormal data are then detected by combining, as the core, an isolated forest algorithm based on data dimension reduction with a statistical decision algorithm based on the inner distance of quartile points (the interquartile range). The method can detect photovoltaic equipment of different models operating in different environments, with good detection efficiency, accuracy and compatibility. With this technical scheme, the fault condition of photovoltaic equipment can be analyzed efficiently, in real time and at low cost; equipment of different models and environments can be classified and detected; and the compatibility of photovoltaic equipment detection is improved, making up for the defects of the prior art.
Description
Technical Field
The invention relates to the technical field of photovoltaic equipment detection, in particular to a photovoltaic equipment fault detection method based on multi-dimensional outlier detection.
Background
Photovoltaic equipment faults mainly refer to abnormal current, voltage, power, temperature and similar readings caused by problems with the photovoltaic equipment, the circuit or the meter. Locating the fault point as early as possible and handling it in time is of great significance for the normal operation of a photovoltaic power generation system. In recent years the number of photovoltaic equipment manufacturers has grown, and communication protocols differ even between equipment versions from the same manufacturer, which makes fault detection of photovoltaic equipment considerably more difficult.
At present, fault detection methods for photovoltaic equipment mainly use traditional detection algorithms such as the isolated forest. However, because photovoltaic equipment comes in many models and operates in different environments, the data generated in operation are high-dimensional and vary widely, which reduces detection accuracy and efficiency; traditional detection algorithms offer no effective solution to these problems.
With the development of unsupervised learning and multi-dimensional outlier detection technology and the continuous improvement of computing power, real-time fault detection of photovoltaic equipment combined with artificial intelligence technology has become possible. A multi-dimensional outlier detection method for photovoltaic equipment is therefore needed, one that uses an efficient detection algorithm to analyze fault conditions in real time and makes up for the defects of the prior art.
Disclosure of Invention
The purpose of the invention is to provide a multi-dimensional outlier detection method for photovoltaic equipment that improves the detection efficiency of real-time analysis of photovoltaic equipment fault conditions.
The technical scheme of the invention is as follows: a photovoltaic equipment fault detection method based on multi-dimensional outlier detection is provided, the method comprising:
S1, collecting and preprocessing photovoltaic equipment operating-state data: the model and capacity of the photovoltaic equipment and the corresponding current, voltage, power, temperature and other data are acquired through a data collector; the operating data collected at the same time are integrated into one piece of data L, the total number of data L being N; and the missing information in the data L is interpolated using a proportional mean filling method;
S2, using an FCM clustering algorithm based on a threshold mechanism, inputting the data processed in step S1 into the FCM clustering algorithm to obtain a clustering result, wherein the threshold mechanism comprises a threshold decision mechanism for initializing the cluster number C and a threshold initialization mechanism for initializing the membership matrix U; the threshold mechanism is first used to initialize the photovoltaic equipment category cluster number C and the membership matrix U, a new membership matrix U is then obtained through iterative calculation of a cost function, and the clustering result is obtained from the final membership matrix;
S3, detecting abnormal data, specifically comprising the following steps:
S3A, using a statistical decision algorithm based on the inner distance of quartile points to obtain the upper bound Top and the lower bound Bottom for every data type under every category, then judging all data L and marking every value that lies outside the upper and lower bounds of its own type as an abnormal value;
S3B, while step S3A is executed, concurrently processing the preprocessed data with an isolated forest algorithm based on data dimension reduction: performing the dimension reduction operation on the data of all categories; selecting a category from the dimension-reduced data and randomly drawing samples from it as the sub-sample for training an isolated tree; placing the sub-sample in the root node of the isolated tree; randomly designating a data type and randomly generating a cut point between the maximum and minimum values of that data type in the current sub-sample; placing samples whose value for the designated type is less than or equal to the cut point in the left node and samples whose value is greater than the cut point in the right node; repeatedly generating cut points at the left and right branch nodes and constructing new leaf nodes until a leaf node holds only one piece of data and cannot be cut further, or the tree reaches the set maximum height; calculating an anomaly score by the formula [formula omitted], whose terms are the expected value of the path lengths in the L trees and the average path length of a tree for a data set containing h samples, where h is the sub-sample capacity and the average path length is computed with a harmonic number; and repeating the isolated forest algorithm until the data of all categories have been detected;
S4, performing weighted threshold judgment on the abnormal data detected in steps S3A and S3B to obtain the final detected abnormal data.
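The anomaly score described in step S3B matches the wording of the conventional isolation forest definition. As a hedged reconstruction, assuming that standard form applies (the symbols $\ell(x)$ for the path length of a sample $x$, $s$ for the score and $H(i)$ for the harmonic number are notational assumptions, not taken from the source):

```latex
s(x, h) = 2^{-\frac{E\left(\ell(x)\right)}{c(h)}}, \qquad
c(h) = 2H(h-1) - \frac{2(h-1)}{h}, \qquad
H(i) \approx \ln(i) + 0.5772156649
```

Under this form, $E(\ell(x))$ is the expected path length of $x$ over the trees and $c(h)$ is the average path length for a sub-sample of capacity $h$; a score close to 1 suggests an anomaly, while a score well below 0.5 suggests normal data.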
In any of the above technical solutions, further, filling the missing information using the proportional mean filling method in step S1 comprises: randomly extracting t pieces of data; taking out all values of the data type corresponding to the missing information and sorting them from small to large, the a-th value being denoted [symbol omitted]; and obtaining p, the value to be inserted, by the formula [formula omitted], where E is the mathematical expectation.
In any of the above technical solutions, further, the membership matrix U to be initialized in step S2 is a matrix of N rows and C columns; its entry in row n and column c is given by the formula [formula omitted], whose terms are the n-th value in the data W, the average power value of the c-th group and the average power value of the j-th group; the entry represents the membership of the n-th data in class c, and for each of the N data the memberships over all classes sum to 1;
The iterative calculation process specifically comprises: in the first iteration, based on the initialized membership matrix U, calculating the cluster centers of the C clusters by the formula [formula omitted], where m takes the value 2 and [symbol omitted] is the n-th data L of the N preprocessed data; calculating a cost function J by the formula [formula omitted], in which [symbol omitted] denotes the corresponding cluster center; continuing the iteration by recalculating the membership matrix U by the formula [formula omitted], in which [symbol omitted] denotes the d-th cluster center, recalculating the cluster centers, and then recalculating the cost function by the formula [formula omitted];
If the change between the value J of this calculation and the value J of the previous calculation is less than the threshold ε, the algorithm ends and the final membership matrix U is obtained; otherwise the above steps are repeated until the change in J is less than the threshold ε. Clustering is performed according to the final membership matrix U: the class with the maximum membership for each data is its finally assigned class.
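The iteration described above reads like the conventional fuzzy C-means (FCM) update. As a hedged reference, assuming the standard FCM equations with Euclidean distance (the symbols $x_n$ for the n-th data, $v_c$ for the c-th cluster center and $u_{nc}$ for the membership entry are notational assumptions, not taken from the source):

```latex
v_c = \frac{\sum_{n=1}^{N} u_{nc}^{m}\, x_n}{\sum_{n=1}^{N} u_{nc}^{m}}, \qquad
J = \sum_{n=1}^{N} \sum_{c=1}^{C} u_{nc}^{m}\, \lVert x_n - v_c \rVert^{2}, \qquad
u_{nc} = \frac{1}{\sum_{d=1}^{C} \left( \frac{\lVert x_n - v_c \rVert}{\lVert x_n - v_d \rVert} \right)^{\frac{2}{m-1}}}
```

These are subject to $\sum_{c=1}^{C} u_{nc} = 1$ for every $n$, and with $m = 2$ the exponent $2/(m-1)$ reduces to 2. The iteration stops once $|J^{(k)} - J^{(k-1)}| < \varepsilon$.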
In any of the above technical solutions, further, the statistical decision algorithm based on the inner distance of quartile points in step S3A specifically comprises: selecting a category from the preprocessed data and, for each data type of the data in that category, sorting the values from small to large; selecting one data type at random and finding its median, denoted Q2; dividing the data into left and right halves at Q2 and finding the median of each half, the left median being denoted Q1 and the right median Q3; calculating the interquartile range IQR, the upper bound Top and the lower bound Bottom by the formulas $IQR = Q3 - Q1$, $Top = Q3 + k \cdot IQR$ and $Bottom = Q1 - k \cdot IQR$, where k is a constant; and repeating the statistical decision algorithm until the upper bound Top and the lower bound Bottom have been obtained for every data type under every category.
In any of the above technical solutions, further, the dimension reduction operation in step S3B specifically comprises: first taking the data of one category, M pieces in number, and arranging them into a matrix X of R rows and M columns; zero-meaning each row of X and obtaining the covariance matrix $V = \frac{1}{M} X X^{T}$, T being the transpose symbol; solving the eigenvalues and eigenvectors of V, arranging the eigenvectors as rows from top to bottom in descending order of their eigenvalues, and taking the first s rows to form a matrix P, where [condition omitted]; and finally calculating $Y = PX$, Y being the data of the category reduced to s dimensions; the dimension reduction operation is repeated until all categories have been reduced.
In any of the above technical solutions, further, the weighted threshold in step S4 is judged as follows: when a piece of data is judged abnormal by both the statistical decision algorithm based on the inner distance of quartile points and the isolated forest algorithm based on data dimension reduction, the data is defined as abnormal data;
otherwise, the statistical-decision anomaly factor of the data: [formula omitted]
and the isolated forest anomaly factor: [formula omitted]
are calculated respectively, and the degree of abnormality D is obtained by the formula [formula omitted], where l is a constraint factor defined as 0.3; when [condition omitted] holds, the data is determined to be abnormal.
The invention has the beneficial effects that:
Because the technical scheme combines a clustering algorithm and an isolated forest algorithm with a statistical decision algorithm, its photovoltaic equipment fault detection is more accurate than that of other commonly used detection methods. Dimension reduction of the photovoltaic equipment operating data reduces the computational burden and raises detection efficiency, which effectively helps operation and maintenance personnel locate the fault point quickly. The detection algorithm can process a large amount of data at once, even when the data differ from one another, so that a small deployment can cover detection over a wide range and the cost burden is reduced.
In a preferred implementation of the invention, a clustering algorithm is added to the method, so that photovoltaic equipment of different models and in different environments can be classified and detected, improving the compatibility of photovoltaic equipment detection.
Drawings
The above and additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow diagram of a method of photovoltaic equipment fault detection based on multi-dimensional outlier detection according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the quartile-point inner distance algorithm in a photovoltaic equipment fault detection method based on multi-dimensional outlier detection according to an embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
As shown in fig. 1, the present embodiment provides a method for detecting a failure of a photovoltaic device based on multi-dimensional outlier detection, where the method includes:
S1, collecting and preprocessing photovoltaic equipment operating-state data: the model and capacity of the photovoltaic equipment and the corresponding current, voltage, power, temperature and other data are acquired through a data acquisition unit; the operating data acquired at the same time are integrated into one piece of data L, the total number of data L being N; and the missing information in the data L is interpolated using a proportional mean filling method.
In particular, information may be incomplete during data acquisition, so that part of the integrated data is missing.
The proportional mean filling method randomly selects t pieces of data, takes out all values of the data type corresponding to the missing information and sorts them from small to large, the a-th value being denoted [symbol omitted]; p, the value to be inserted, is obtained by the formula [formula omitted], where E is the mathematical expectation.
S2, using an FCM clustering algorithm based on a threshold mechanism, the data processed in step S1 are input into the FCM clustering algorithm to obtain a clustering result, specifically comprising the following steps:
S21, initializing the photovoltaic equipment category cluster number C using the threshold mechanism: C defaults to 1; the power values in all data L are taken out and sorted from small to large, the sorted N power values being denoted W; W is divided, on the principle that each interval holds an equal number of data, into m intervals, where m [formula omitted] is rounded up; the average power value of each interval is calculated in turn, together with the difference between the average power values of each two adjacent intervals [formula omitted], the average power value of the b-th interval being denoted [symbol omitted]; whenever [threshold condition omitted] holds, the cluster number C is increased by 1.
S22, initializing the membership matrix U using the threshold mechanism: U is a matrix of N rows and C columns whose entry in row n and column c [symbol omitted] also represents the membership of the n-th data in class c; for each of the N data, the memberships over all classes sum to 1. W is then re-divided, on the principle that each interval holds an equal number of data, into C intervals, the average power value of the c-th interval being denoted [symbol omitted]; the membership matrix U is initialized according to the formula [formula omitted], in which [symbol omitted] is the n-th value of the data W.
S23, based on the initialized membership matrix U, calculating the cluster centers of the C clusters by the formula [formula omitted], where m takes the value 2 and [symbol omitted] is the n-th data L of the N preprocessed data.
S24, calculating a cost function J by the formula [formula omitted], in which [symbol omitted] denotes the corresponding cluster center; continuing the iteration by recalculating the membership matrix U by the formula [formula omitted], in which [symbol omitted] denotes the d-th cluster center, recalculating the cluster centers, and then recalculating the cost function by the formula [formula omitted];
If the change between the value J of this calculation and the value J of the previous calculation is less than the threshold ε, the algorithm ends and the final membership matrix U is obtained; otherwise the above steps are repeated until the change in J is less than the threshold ε. Clustering is performed according to the final membership matrix U: the class with the maximum membership for each data is its finally assigned class.
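Steps S23 and S24 can be sketched as follows. This is a minimal sketch that assumes the conventional FCM update rules with Euclidean distance; the function name `fcm` and its parameters `eps` and `max_iter` are hypothetical, and the threshold-based initialization of steps S21 and S22 is not reproduced, so the initial membership matrix `U` is simply passed in.

```python
import numpy as np

def fcm(X, U, m=2.0, eps=1e-5, max_iter=100):
    """Iterate FCM from an initial membership matrix.

    X: (N, R) data, one row per record; U: (N, C) memberships, rows sum to 1.
    Returns the final memberships, the cluster centers and hard labels.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    J_prev = None
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the data, shape (C, R).
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance from every record to every center, shape (N, C).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Cost function J for the current memberships and centers.
        J = float((Um * dist ** 2).sum())
        if J_prev is not None and abs(J - J_prev) < eps:
            break  # change in J fell below the threshold
        J_prev = J
        # Membership update; each row is renormalized to sum to 1.
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    # Each record is assigned to the class with maximum membership.
    return U, centers, U.argmax(axis=1)
```

Passing `U` in keeps the sketch independent of the initialization rule, whose formula is not available here; any matrix whose rows sum to 1 can be used to try it out.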
S3, anomaly detection: the data are processed in parallel by a statistical decision algorithm based on the inner distance of quartile points and an isolated forest algorithm based on data dimension reduction, the two algorithms being as follows:
S3A, using the statistical decision algorithm based on the inner distance of quartile points: a category is selected from the preprocessed data and, for each data type of the data in that category, the values are sorted from small to large; one data type is selected at random and its median is found and denoted Q2; the data are divided into left and right halves at Q2 and the median of each half is found, the left median being denoted Q1 and the right median Q3. As shown in FIG. 2, for 11 data in total, Q2 is marked at x6, Q1 at x3 and Q3 at x9.
The interquartile range IQR, the upper bound Top and the lower bound Bottom are calculated by the formulas $IQR = Q3 - Q1$, $Top = Q3 + k \cdot IQR$ and $Bottom = Q1 - k \cdot IQR$, where k is a constant, defined as 1.5 in this embodiment. The statistical decision algorithm is repeated until the upper bound Top and the lower bound Bottom have been obtained for every data type under every category.
All data L are then judged: every value that lies outside the upper and lower bounds of its own type is marked as abnormal, and a piece of data is marked as abnormal data as soon as any one of its data types is detected as abnormal by the algorithm.
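The bound calculation in step S3A can be sketched for a single data type as follows. This is a minimal sketch assuming the usual interquartile-range fences; the function names `median_of_sorted`, `iqr_bounds` and `flag_outliers` are hypothetical, and excluding Q2 from both halves when the count is odd is an assumption chosen to match the 11-point example of FIG. 2.

```python
def median_of_sorted(vals):
    """Median of an already sorted, non-empty list."""
    n = len(vals)
    mid = n // 2
    return vals[mid] if n % 2 else 0.5 * (vals[mid - 1] + vals[mid])

def iqr_bounds(values, k=1.5):
    """Return (Bottom, Top) for one data type within one category."""
    v = sorted(values)
    n = len(v)
    half = n // 2
    left = v[:half]                               # values left of Q2
    right = v[half + 1:] if n % 2 else v[half:]   # values right of Q2
    q1 = median_of_sorted(left)
    q3 = median_of_sorted(right)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """True for every value outside [Bottom, Top]."""
    bottom, top = iqr_bounds(values, k)
    return [not (bottom <= x <= top) for x in values]
```

For 11 sorted values the left half is x1 to x5 and the right half is x7 to x11, so Q1 falls on x3 and Q3 on x9, as in the figure. A full record would be marked abnormal if `flag_outliers` returns True for any of its data types.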
S3B, while step S3A is performed, the preprocessed data are processed by the isolated forest algorithm based on data dimension reduction. First, the data of one category are taken, M pieces in number, and arranged into a matrix X of R rows and M columns; each row of X is zero-meaned and the covariance matrix $V = \frac{1}{M} X X^{T}$ is obtained, T being the transpose symbol; the eigenvalues and eigenvectors of V are solved, the eigenvectors are arranged as rows from top to bottom in descending order of their eigenvalues, and the first s rows are taken to form a matrix P, where [condition omitted]; finally $Y = PX$ is calculated, Y being the data of the category reduced to s dimensions. The dimension reduction operation is repeated until all categories have been reduced.
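The dimension reduction just described corresponds to a principal component projection and can be sketched as follows. This is a minimal sketch that assumes the covariance matrix is the zero-meaned X multiplied by its transpose and divided by M; the function name `reduce_dimension` is hypothetical.

```python
import numpy as np

def reduce_dimension(X, s):
    """Project an (R, M) matrix, one column per record, down to (s, M)."""
    X = np.asarray(X, dtype=float)
    M = X.shape[1]
    # Zero-mean every row (every data type).
    Xc = X - X.mean(axis=1, keepdims=True)
    # Covariance matrix of the R data types, shape (R, R).
    V = Xc @ Xc.T / M
    # eigh suits symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(V)
    order = np.argsort(eigvals)[::-1]   # largest eigenvalue first
    # Eigenvectors as rows; keep the first s rows to form P, shape (s, R).
    P = eigvecs[:, order[:s]].T
    return P @ Xc
```

The rows of the result are uncorrelated and ordered by decreasing variance, which is why keeping the first s rows preserves as much of the variation as possible.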
A category is selected from the dimension-reduced data, and a sample of capacity h (h being 1/10 of the number of data in the category) is drawn at random as the sub-sample for training an isolated tree and placed in the root node of the isolated tree. A data type is designated at random and a cut point is generated at random between the maximum and minimum values of that data type in the current sub-sample; samples whose value for the designated type is less than or equal to the cut point are placed in the left node and samples whose value is greater than the cut point in the right node. Cut points are generated repeatedly at the left and right branch nodes and new leaf nodes are constructed until a leaf node holds only one piece of data and cannot be cut further, or the tree reaches the set maximum height; in this embodiment the maximum height of the tree is set to 256.
An anomaly score is calculated according to the formula [formula omitted], whose terms are the expected value of the path lengths in the L trees and the average path length of a tree for a data set containing h samples, where h is the sub-sample capacity and the average path length is computed with a harmonic number.
The isolated forest algorithm is repeated until the data of all categories have been detected. Experiments showed that the values the isolated forest calculates for the data are mostly negative, so in this embodiment 0.5 is added to the anomaly score and the decision threshold of the detection model is set to 0; that is, any data whose value is still less than 0 after 0.5 is added is determined to be abnormal data.
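The scoring and thresholding above can be sketched as follows. This is a minimal sketch assuming the conventional isolation forest score; the function names are hypothetical, and treating the "mostly negative" value as the negated score (so that adding 0.5 and comparing with 0 flags scores above 0.5) is an assumption, not something the source states.

```python
import math

EULER_GAMMA = 0.5772156649

def average_path_length(h):
    """c(h): average path length for a sub-sample of capacity h."""
    if h <= 1:
        return 0.0
    if h == 2:
        return 1.0
    harmonic = math.log(h - 1) + EULER_GAMMA   # approximation of H(h-1)
    return 2.0 * harmonic - 2.0 * (h - 1) / h

def anomaly_score(mean_path_length, h):
    """Score in (0, 1]; requires h > 1. Short paths give scores near 1."""
    return 2.0 ** (-mean_path_length / average_path_length(h))

def is_abnormal(mean_path_length, h, shift=0.5, threshold=0.0):
    """Assumed convention: negate the score, add the shift, compare with 0."""
    return (-anomaly_score(mean_path_length, h) + shift) < threshold
```

Here `mean_path_length` stands for the path length of one piece of data averaged over all trees. Under the assumed sign convention, a piece of data is flagged exactly when its score exceeds 0.5, that is, when its average path is shorter than `average_path_length(h)`.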
S4, performing weighted threshold judgment on the abnormal data detected in steps S3A and S3B to obtain the final detected abnormal data, the weighted threshold being judged as follows: a piece of data is defined as abnormal data when it is judged abnormal by both the isolated forest algorithm and the statistical decision algorithm; otherwise the statistical-decision anomaly factor [formula omitted] and the isolated forest anomaly factor [formula omitted] of the data are calculated respectively, where S denotes the anomaly score, and the degree of abnormality D is obtained by the formula [formula omitted], where l is a constraint factor defined as 0.3; when [condition omitted] holds, the data is determined to be abnormal.
In one practical application of the embodiment, partial data of a group of real photovoltaic devices in operation at the time t1 is collected, and fault detection is performed on the partial data as shown in table 1.
TABLE 1 statistics of the actual data collected at a time
The detection algorithm classifies the first 6 pieces of data into a first category and the last 6 pieces of data into a second category, and after data preprocessing and clustering, whether the data are abnormal or not is judged respectively through statistical judgment in the step S3A and outlier detection in the isolated forest algorithm in the step S3B.
As shown in Table 2, the final test results showed that the 1 st and 7 th data were abnormal data.
Table 2 data test results
Simple inspection of the data shows that the 2nd data has an abnormal temperature and the 7th data has an abnormal PV2 voltage; both are genuinely abnormal data, consistent with the model's judgment, which shows that the fault detection method is effective.
The technical scheme of the invention has been explained in detail above with reference to the accompanying drawings. The invention provides a photovoltaic equipment fault detection method based on multi-dimensional outlier detection in which data are clustered using an FCM clustering algorithm with a threshold mechanism, and abnormal data are detected by combining, as the core, an isolated forest algorithm based on data dimension reduction with a statistical decision algorithm based on the inner distance of quartile points. The method can detect photovoltaic equipment of different models and in different environments, with good detection efficiency, accuracy and compatibility.
The steps of the invention may be reordered, combined and deleted according to actual requirements.
Although the present invention has been disclosed in detail with reference to the accompanying drawings, it is to be understood that such description is merely illustrative of and not restrictive on the application of the present invention. The scope of the invention is defined by the appended claims and may include various modifications, adaptations and equivalents of the invention without departing from its scope and spirit.
Claims (3)
1. A method for detecting faults of photovoltaic equipment based on multi-dimensional outlier detection is characterized by comprising the following steps:
s1, collecting and preprocessing photovoltaic equipment operation state data, acquiring the model and capacity of the photovoltaic equipment and corresponding current, voltage, power and temperature through a data acquisition unit, and integrating the photovoltaic equipment operation information data acquired at the same time into a data setLData ofLIn a total amount ofNUsing a proportional mean filling method to the dataLThe missing information in the method is interpolated, firstly, the ratio mean filling method is selected randomlytThe number of the pieces of data is set,taking out all numerical values of data types corresponding to missing information in the data, and sorting the data from small to largeaA data note as,Using the formulaTo find outpI.e., the value to be inserted, wherein,Eis a mathematical expectation;
S2, using an FCM clustering algorithm based on a threshold mechanism, inputting the data processed in step S1 into the FCM clustering algorithm to obtain a clustering result, wherein the threshold mechanism comprises a threshold determination mechanism for initializing the cluster number C: C defaults to 1; the power values of all data in L are taken out and sorted from small to large, and the N sorted power values are recorded as W; W is divided into m intervals on the principle that each interval contains an equal number of data, m being given by a formula and rounded up; the average power value of each interval is calculated in turn, together with the difference between the average power values of each pair of adjacent intervals, the average power value of the b-th interval having its own symbol; and whenever this difference satisfies the threshold condition, the cluster number C is increased by 1;
the threshold mechanism further comprises a threshold initialization mechanism for initializing the membership matrix U: U is a matrix of N rows and C columns, each entry being identified by its row n and column c; W is re-divided into C intervals on the principle that each interval contains an equal number of data; the average power values of the c-th and j-th intervals and the n-th value of W are recorded; and U is initialized from these quantities according to a formula with its accompanying constraints; the threshold mechanism thus initializes the cluster number C of the photovoltaic equipment categories and the membership matrix U, a new membership matrix U is finally obtained through iterative calculation of a cost function, and the clustering result is obtained from the final membership matrix;
the iterative calculation process is specifically as follows: in the first iteration, the cluster centers of the C clusters are calculated by a formula from the initialized membership matrix U, wherein the exponent m takes the value 2 and the data point is the n-th of the N preprocessed data in L; the cost function J is then calculated by a formula in which each datum is paired with its corresponding cluster center; iteration then continues: the membership matrix U is recalculated by a formula involving the d-th cluster center, the cluster centers are recalculated, and the cost function J is recalculated;
if the change between the value of J from this calculation and the value of J from the previous calculation is less than the threshold ε, the algorithm ends and the final membership matrix U is obtained; otherwise the above steps are repeated until the change in J is less than the threshold ε; clustering is performed according to the finally obtained membership matrix U, and the category corresponding to the maximum membership degree of each datum is its finally assigned category;
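The iteration described in step S2 follows the general shape of textbook fuzzy C-means. The sketch below is a minimal illustration under stated assumptions, not the claimed method: it uses the standard FCM update rules with exponent m = 2, and it starts from a random membership matrix because the claim's power-interval initialization formula and cluster-count threshold are not recoverable from this text. The function name `fcm` and all parameter names are hypothetical.

```python
import numpy as np

def fcm(x, n_clusters, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Textbook fuzzy C-means on an (N, D) array; returns (U, centers, J)."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: random row-normalised start. The claim instead seeds U
    # from average power values of equal-count intervals (formula missing).
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    prev_j = np.inf
    for _ in range(max_iter):
        um = u ** m
        # Cluster centers: membership-weighted means of the data.
        centers = (um.T @ x) / um.sum(axis=0)[:, None]
        # Distance of every datum to every center, shape (N, C).
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Cost function J: weighted sum of squared distances.
        j = float((um * dist ** 2).sum())
        if abs(prev_j - j) < eps:  # change in J below threshold: stop
            break
        prev_j = j
        # Standard membership update; each row sums to 1.
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return u, centers, j

# Each datum is assigned to the cluster of maximum membership.
points = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
u, centers, j = fcm(points, n_clusters=2)
labels = u.argmax(axis=1)
```

On well-separated data such as `points`, the first three and last three rows end up in different clusters; which cluster index each group receives depends on the random start.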
S3, detecting abnormal data, specifically comprising the following steps:
S3A, using a statistical decision algorithm based on the interquartile range to obtain the upper bound Top and the lower bound Bottom for every data type under every category, judging all data in L against these bounds, and marking every value that lies outside the upper and lower bounds of its own type as an abnormal value;
S3B, while step S3A is executed, concurrently processing the preprocessed data with an isolated forest algorithm based on data dimension reduction: performing the dimension reduction operation on the data of every category; selecting one category from the dimension-reduced data and randomly drawing samples from it as the sub-sample for training an isolated tree; placing the sub-sample in the root node of the isolated tree; randomly designating a data type and randomly generating a cut point between the maximum and minimum values of that data type in the current sub-sample; placing samples whose value of the designated type is less than or equal to the cut point at the left node and samples whose value is greater than the cut point at the right node; repeating the generation of cut points at the left and right branch nodes and constructing new leaf nodes until a leaf node holds only one piece of data and cannot be cut further, or the tree reaches the set maximum height; and calculating an abnormality score according to a formula defined in terms of the expected path length of a sample over the isolated trees and the average path length of a tree built on a data set of h samples, wherein h is the sub-sample size and the average path length is expressed through the harmonic number; the isolated forest algorithm is executed repeatedly until the data of all categories have been detected;
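The abnormality score in step S3B is described in the same terms as the score of the standard isolation forest algorithm (rendered here as "isolated forest"). As a hedged illustration, the sketch below computes that standard score, assuming the claim's missing formula matches it; the function names and the logarithmic approximation of the harmonic number are choices made for this example, not taken from the patent.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def average_path_length(h):
    """Average path length c(h) of a tree built on h samples (h >= 2)."""
    if h <= 1:
        return 0.0
    if h == 2:
        return 1.0
    # Harmonic number H(h - 1), approximated as ln(h - 1) + gamma.
    harmonic = math.log(h - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (h - 1) / h

def anomaly_score(mean_path_length, h):
    """Standard score 2 ** (-E[path length] / c(h)); requires h >= 2."""
    return 2.0 ** (-mean_path_length / average_path_length(h))

# A sample isolated after few cuts scores closer to 1 (more anomalous)
# than a sample that needs many cuts.
short_path = anomaly_score(2.0, 256)
long_path = anomaly_score(12.0, 256)
```

Under this form, a sample whose expected path length equals the average path length scores exactly 0.5, and the score approaches 1 as the path length approaches zero.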
S4, performing a weighted threshold judgment on the abnormal data detected in step S3A and step S3B to obtain the finally detected abnormal data, wherein the weighted threshold judgment comprises:
when a piece of data is judged abnormal by both the statistical decision algorithm based on the interquartile range and the isolated forest algorithm based on data dimension reduction, the data is defined as abnormal data;
otherwise, separately calculating the statistical decision anomaly factor of the data:
and the isolated forest anomaly factor:
2. The photovoltaic equipment fault detection method based on multi-dimensional outlier detection according to claim 1, wherein using the statistical decision algorithm based on the interquartile range in step S3A specifically comprises: selecting one category from the preprocessed data; for all the data in that category, sorting the values of each data type from small to large; selecting one data type, finding its median and recording it as Q2; dividing the data into a left half and a right half with Q2 as the dividing point; finding the median of each half, the median of the left half being recorded as Q1 and the median of the right half as Q3; and calculating the interquartile range IQR, the upper bound Top and the lower bound Bottom according to the formulas $IQR = Q3 - Q1$, $Top = Q3 + k \cdot IQR$ and $Bottom = Q1 - k \cdot IQR$, wherein k is a constant; the statistical decision algorithm is executed repeatedly until the upper bound Top and the lower bound Bottom have been obtained for every data type under every category.
3. The photovoltaic equipment fault detection method based on multi-dimensional outlier detection according to claim 1, wherein the dimension reduction in step S3B specifically comprises: first taking the data of one category, the number of data in the category being M, and composing the category data into a matrix X of R rows and M columns; zero-averaging each row of X and obtaining the covariance matrix $V = \frac{1}{M} X X^{T}$, wherein T is the transpose symbol; calculating the eigenvalues and eigenvectors of V; arranging the eigenvectors as the rows of a matrix from top to bottom in descending order of their eigenvalues and taking the first s rows to form a matrix P; and finally calculating $Y = PX$, Y being the data of the category reduced to s dimensions; the dimension reduction operation is executed repeatedly until all categories have been reduced.
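The dimension reduction of claim 3 reads as principal component analysis by eigendecomposition of the covariance matrix. A minimal NumPy sketch of that reading follows, assuming the covariance is the row-centered X times its transpose divided by M and the projection is Y = PX; the function name `pca_reduce` is hypothetical.

```python
import numpy as np

def pca_reduce(x, s):
    """Project an (R, M) matrix (one column per record) onto s components."""
    # Zero-average each row (each data type).
    xc = x - x.mean(axis=1, keepdims=True)
    # Covariance matrix V = (1/M) X X^T, shape (R, R).
    v = (xc @ xc.T) / x.shape[1]
    # eigh suits symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(v)
    order = np.argsort(eigvals)[::-1][:s]
    # Rows of P are the eigenvectors with the s largest eigenvalues.
    p = eigvecs[:, order].T
    return p @ xc  # Y = P X, shape (s, M)

# Two perfectly correlated data types collapse onto one component.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0]])
y = pca_reduce(x, 1)
```

In this example all of the variance (6.25) lies along the first component, so reducing from two dimensions to one loses no information.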
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210946811.5A CN115021679B (en) | 2022-08-09 | 2022-08-09 | Photovoltaic equipment fault detection method based on multi-dimensional outlier detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210946811.5A CN115021679B (en) | 2022-08-09 | 2022-08-09 | Photovoltaic equipment fault detection method based on multi-dimensional outlier detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115021679A (en) | 2022-09-06 |
CN115021679B (en) | 2022-11-04 |
Family
ID=83065644
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210946811.5A Active CN115021679B (en) | 2022-08-09 | 2022-08-09 | Photovoltaic equipment fault detection method based on multi-dimensional outlier detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115021679B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116842322B (en) * | 2023-07-19 | 2024-02-23 | 深圳市精微康投资发展有限公司 | Electric motor operation optimization method and system based on data processing |
CN116662729B (en) * | 2023-08-02 | 2023-10-31 | 山东鲁玻玻璃科技有限公司 | Low borosilicate glass feeding control data intelligent monitoring method |
CN117077044B (en) * | 2023-10-18 | 2024-02-06 | 深圳市大易电气实业有限公司 | Method and device for judging faults of vacuum circuit breaker for generator |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111666169A (en) * | 2020-05-13 | 2020-09-15 | 云南电网有限责任公司信息中心 | Improved isolated forest algorithm and Gaussian distribution-based combined data anomaly detection method |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5581965B2 (en) * | 2010-01-19 | 2014-09-03 | オムロン株式会社 | MPPT controller, solar cell control device, photovoltaic power generation system, MPPT control program, and MPPT controller control method |
CN108776683B (en) * | 2018-06-01 | 2022-01-21 | 广东电网有限责任公司 | Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network |
CN113378449A (en) * | 2021-04-15 | 2021-09-10 | 黄山东安新高能源科技有限公司 | Photovoltaic module health state diagnosis method based on fuzzy C-means clustering |
CN113839618A (en) * | 2021-10-15 | 2021-12-24 | 李力洋 | Real-time fault detection method for large grid-connected solar photovoltaic power station |
Also Published As
Publication number | Publication date |
---|---|
CN115021679A (en) | 2022-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115021679B (en) | Photovoltaic equipment fault detection method based on multi-dimensional outlier detection | |
CN110596492B (en) | Transformer fault diagnosis method based on particle swarm optimization random forest model | |
US7415386B2 (en) | Method and system for failure signal detection analysis | |
CN111340063B (en) | Data anomaly detection method for coal mill | |
CN111460728A (en) | Method and device for predicting residual life of industrial equipment, storage medium and equipment | |
CN113570138B (en) | Method and device for predicting residual service life of equipment of time convolution network | |
CN111046961B (en) | Fault classification method based on bidirectional long-time and short-time memory unit and capsule network | |
CN110097123B (en) | Express mail logistics process state detection multi-classification system | |
CN111722046A (en) | Transformer fault diagnosis method based on deep forest model | |
CN108491991B (en) | Constraint condition analysis system and method based on industrial big data product construction period | |
CN111625399A (en) | Method and system for recovering metering data | |
CN111343147A (en) | Network attack detection device and method based on deep learning | |
CN109240276B (en) | Multi-block PCA fault monitoring method based on fault sensitive principal component selection | |
CN114330486A (en) | Power system bad data identification method based on improved Wasserstein GAN | |
CN114200245A (en) | Construction method of line loss abnormity identification model of power distribution network | |
CN112817954A (en) | Missing value interpolation method based on multi-method ensemble learning | |
CN111863135B (en) | False positive structure variation filtering method, storage medium and computing device | |
CN113127464A (en) | Agricultural big data environment feature processing method and device and electronic equipment | |
CN116400168A (en) | Power grid fault diagnosis method and system based on depth feature clustering | |
CN111612149A (en) | Main network line state detection method, system and medium based on decision tree | |
CN107728476B (en) | SVM-forest based method for extracting sensitive data from unbalanced data | |
CN116304721A (en) | Data standard making method and system for big data management based on data category | |
CN116578833A (en) | IGBT module aging fault diagnosis system based on optimized random forest model | |
CN113889274B (en) | Method and device for constructing risk prediction model of autism spectrum disorder | |
CN115729825A (en) | Fuzzy test case generation method and device of industrial protocol and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |