CN114595762A - Photovoltaic power station abnormal data sequence extraction method - Google Patents
- Publication number
- CN114595762A (application CN202210218937.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- data sequence
- sequence
- abnormal
- normal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/20—Administration of product repair or maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention provides a method for extracting abnormal data sequences from a photovoltaic power station. A complete data sequence sample set is built from the original electrical data sequences of several healthy power stations; the deviation rate and Mahalanobis distance of each sequence are calculated and compared with a health threshold to label the sequence as normal or abnormal. A support vector machine model is then trained to classify all data sequences in the sample set and is used to judge the state of multiple types of data sequences, which verifies the reliability and effectiveness of the method. Finally, the model extracts abnormal data sequences from the mass data of a problem power station, thereby locating the abnormal strings.
Description
Technical field:
The invention extracts abnormal data sequences in a photovoltaic power station as preparatory work for subsequent fault diagnosis of the station, and in particular relates to a method for extracting abnormal data sequences of a photovoltaic power station.
Background art:
Solar energy is the most abundant renewable energy resource and offers large reserves, silent operation, wide availability and durability. Under the Paris Agreement, which China ratified at the time of the Hangzhou G20 summit, China has committed to reducing carbon dioxide emissions per unit of GDP by 60-65% from the 2005 level by 2030. According to the medium and long-term development plan for renewable energy, installed solar power capacity is expected to reach 1.8 GW in 2020 and 600 GW in 2050, with solar accounting for 5% of China's installed electric power capacity. In 2019 the national strategy put forward the concept of "Intelligence+" for the first time, in connection with the digital economy and the traditional energy industry; guided by artificial intelligence technology, the photovoltaic industry is bound to meet a historic opportunity for intelligent upgrading. Compared with centralized photovoltaic power stations, the photovoltaic industry started late and its informatization is not yet fully developed, so data monitoring, statistical analysis, the formulation of operation and maintenance plans and similar work in station operation, management and maintenance are often still done manually.
Extracting abnormal data sequences from photovoltaic power stations is therefore the most immediate goal at present. Compared with manually monitoring data and then diagnosing faults, which is complex and error-prone, the method of the invention adds a data mining algorithm so that abnormal data sequences are extracted more intelligently, and it also provides a prerequisite step for subsequent fault diagnosis.
Summary of the invention:
The invention aims to provide a method for extracting abnormal data sequences of a photovoltaic power station. The monitoring data of a photovoltaic power station are typically large in scale, diverse and low in value density. To address the analysis and feature extraction of such high-dimensional, complex data, the invention adopts a classification approach based on a support vector machine: first, a data sequence sample set is established along three dimensions, namely the link axis, the data axis and the time axis; second, a trained support vector machine model judges the abnormal state of various types of data sequences; finally, abnormal data sequences are analyzed, compared, judged and extracted. The specific implementation steps are as follows:
Step one: import the original electrical data sequences of several healthy power stations to establish a complete data sequence sample set; after calculating the deviation rate sequences and the Mahalanobis distances, compare them with a health threshold and label each sequence as normal or abnormal to obtain the data sequence training samples. The complete data sequence sample set comprises a normal data sequence sample set and an abnormal data sequence sample set;
The specific steps for obtaining the data sequence sample set are as follows:
S1. From several healthy power stations, select a number of multi-channel normal data sequences with good consistency (same link, same type, same time period) as the normal data sequence sample set, expressed as formula (1):
$X_i = \{x_{i,t}\},\quad t = 1, 2, \ldots, 49 \quad (1)$
where $X_i$ in formula (1) is the normal data sequence sample set (i is the channel number) and $x_{i,t}$ is the voltage or current data of the i-th string at time t;
Because the current and voltage data in the normal data sequence sample set are easily affected by the module models used in the power station, they are normalized as shown in formula (2):
where $x_{aver\_t}$ in formula (2) is the average current or voltage at time t, and n is the number of string branches in the power station;
The average is recalculated after the data below the average are removed; repeating this calculation several times yields the expected value of the voltage and current data at each time, and these values form the corrected mean sequence;
S2. Calculate the deviation rate sequence $P_i$ between the normal data sequence $X_i$ and the corrected mean sequence, as shown in formula (3):
where $p_{i,t}$ in formula (3) is the deviation rate of the i-th branch at time t;
Because the deviation rate largely eliminates the influence of different irradiance levels and ambient temperatures on the data sequence values, the deviation rate sequence objectively reflects the degree of difference between data sequences. The mean of the multi-channel deviation rates at each time is then calculated to obtain the deviation rate mean sequence;
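The images for formulas (2) and (3) are not reproduced in this text. Based only on the definitions given for $x_{aver\_t}$, n and $p_{i,t}$, they plausibly take a form such as the one sketched below. The symbol $\hat{x}_t$ for the corrected mean at time t is introduced here for illustration, and the sign convention of the deviation rate is an assumption.

```latex
x_{aver\_t} = \frac{1}{n} \sum_{i=1}^{n} x_{i,t} \tag{2}

p_{i,t} = \frac{x_{i,t} - \hat{x}_t}{\hat{x}_t},
\qquad P_i = \{p_{i,t}\},\quad t = 1, 2, \ldots, 49 \tag{3}
```

Under this form, scaling every string by the same factor (for example, when irradiance drops uniformly) leaves $p_{i,t}$ unchanged, which matches the stated reason for using a deviation rate.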
S3. Use the Mahalanobis distance to describe the distance between each multi-channel deviation rate sequence $P_i$ and the deviation rate mean sequence, as shown in formula (4):
where $MD_i$ in formula (4) is the Mahalanobis distance of the i-th deviation rate sequence and $\Sigma^{-1}$ is the inverse of the covariance matrix of the multi-channel deviation rate sequences. Because abnormal samples are few, they deviate strongly from the sample set and their Mahalanobis distance is large. To determine a reasonable health threshold, the Mahalanobis distances of the deviation rate sequences are converted by a Box-Cox power transformation into a variable that follows a normal distribution, so that determining the threshold becomes determining a confidence level.
The confidence level reflects the probability that a sample falls within the confidence interval, that is, the likelihood that the sample is normal, so the confidence level is related to the proportion of abnormal samples. To determine a reasonable confidence level, the confidence interval corresponding to each confidence level is calculated, the corresponding threshold is obtained by the inverse Box-Cox transformation, the Mahalanobis distance of each deviation rate sequence is compared with the threshold, and the sample is labeled. The labeling results are checked against the waveform characteristics of abnormal data sequences, the number of misjudgments $n_1$ and the number of missed judgments $n_2$ are counted, and the accuracy $k_1$ is calculated by formula (5):
$k_1 = 1 - (n_1 + n_2)/n \quad (5)$
where n in the above formula is the number of samples;
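The image for formula (4) is likewise missing. The standard definition of the Mahalanobis distance, written with the symbols defined above, is sketched below together with the usual Box-Cox power transformation. The symbol $\bar{P}$ for the deviation rate mean sequence, the transformed variable $y_i$ and the parameter $\lambda$ are notation assumed here and do not appear in the source.

```latex
MD_i = \sqrt{(P_i - \bar{P})\,\Sigma^{-1}\,(P_i - \bar{P})^{\mathsf{T}}} \tag{4}

y_i =
\begin{cases}
\dfrac{MD_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln MD_i, & \lambda = 0
\end{cases}
```

A threshold chosen on the transformed variable $y$ at a given confidence level is mapped back to a Mahalanobis distance by inverting the second expression.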
S4. Select a sufficient number of data sequences (multiple links, multiple types, multiple time periods) as a complete data sequence sample set containing both normal and abnormal states, repeat the calculations of S1 to S2, calculate the Mahalanobis distance of every data sequence, compare it with the health threshold, and label each sequence as normal or abnormal to obtain a data sequence sample set that can be used to train the support vector machine model.
Step two: design and train a support vector machine model. The support vector machine is trained on the data sequence sample set obtained above to produce a classifier model that quickly classifies all data sequences in the sample set and judges the normal/abnormal state of the electrical data sequences of the photovoltaic power station;
The specific steps for training the support vector machine model are as follows:
a. According to the abnormal features of data sequences in the photovoltaic power station, a support vector machine model is constructed that seeks the optimal separating hyperplane over the training sample set obtained in step one. In the invention, the number of samples far exceeds the number of sample features; a polynomial kernel has many parameters and becomes complex at high orders, whereas a Gaussian kernel can map the sample features into an infinite-dimensional space, so the Gaussian kernel is the more suitable choice;
b. When constructing the support vector machine model, grid search is combined with cross-validation to select the penalty factor C and the Gaussian kernel parameter g;
c. To characterize the classification performance under different parameter combinations, the misjudgment rate $k_2$ and the missed-judgment rate $k_3$ are introduced, as shown in formula (6):
where $n_p$ and $n_n$ in formula (6) are the numbers of normal and abnormal samples, respectively.
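The image for formula (6) is not reproduced. Given that $n_1$ counts misjudgments, $n_2$ counts missed judgments, and $n_p$ and $n_n$ are the numbers of normal and abnormal samples, one natural reading is sketched below. This is an assumed form rather than the source's own formula; it is consistent with formula (5) in that $n = n_p + n_n$.

```latex
k_2 = \frac{n_1}{n_p}, \qquad k_3 = \frac{n_2}{n_n} \tag{6}
```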
Step three: judge the state of multiple types of data sequences with the support vector machine. Drawing on site operation and maintenance experience and on the way site anomalies or faults arise and evolve, the data sequence format is carefully designed along three dimensions: the link axis (power generation/transmission/distribution/grid-connection point), the data axis (current/voltage/power) and the time axis (hour/day/month/season). The support vector machine is then used to judge data sequences containing various types of anomalies or faults, which verifies the reliability and effectiveness of the method;
The data sequence format covers the time length, sampling period, data type, decision frequency and so on. For anomalies or faults that need urgent repair, such as a bypass diode conduction fault in a photovoltaic module, the data sequence is given a short time length, a small sampling period and a high decision frequency, and voltage data is selected as the data type;
Step four: after the support vector machine has been validated on multiple data types, it performs online abnormal-state judgment on the voltage, current and power data sequences monitored at the power station, classifying and identifying normal/abnormal states and extracting the abnormal data sequences for subsequent fault diagnosis. Based on the judgment and extraction results, the data sequence format parameters, the support vector machine model structure parameters, the kernel function and so on are revised to improve the online applicability and judgment accuracy of the method.
For photovoltaic power stations, the invention draws on machine learning methods to study, analyze and fuse the data sequence characteristics of various fault states, exploits the rich content and distinct features of the data sequences to extract abnormal data sequences, and prepares for subsequent fault diagnosis in the station. This makes station operation, management, maintenance and overhaul more efficient and allows the invention to be put into practical use in photovoltaic power stations.
Description of the drawings:
Fig. 1 is an overall flow chart of the photovoltaic power station abnormal data sequence extraction method of the invention.
Fig. 2 shows the process of constructing the data sequence sample set.
Fig. 3(a) and 3(b) show the numbers of misjudgments and missed judgments and the accuracy when labeling the current deviation rate sequences at each confidence level.
Fig. 4(a) and 4(b) show the numbers of misjudgments and missed judgments and the accuracy when labeling the voltage deviation rate sequences at each confidence level.
Fig. 5(a) and 5(b) are surface plots of the misjudgment rate and the missed-judgment rate when classifying the current deviation rate sequences under each parameter combination.
Fig. 6(a) and 6(b) are surface plots of the misjudgment rate and the missed-judgment rate when classifying the voltage deviation rate sequences under each parameter combination.
Detailed description of the embodiments:
Embodiments of the invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the abnormal data sequence extraction method for a photovoltaic power station is described in four steps.
Step one: import the original electrical data sequences of several healthy power stations to establish a complete data sequence sample set; after calculating the deviation rate sequences and the Mahalanobis distances, compare them with a health threshold and label each sequence as normal or abnormal to obtain the data sequence training samples. The complete data sequence sample set comprises a normal data sequence sample set and an abnormal data sequence sample set. Fig. 2 shows the steps for constructing the data sequence sample set;
S1. From several healthy power stations, select a number of multi-channel normal data sequences with good consistency (same link, same type, same time period) as the normal data sequence sample set, expressed as formula (1):
$X_i = \{x_{i,t}\},\quad t = 1, 2, \ldots, 49 \quad (1)$
where $X_i$ in formula (1) is the normal data sequence sample set (i is the channel number) and $x_{i,t}$ is the voltage or current data of the i-th string at time t;
Because the current and voltage data in the normal data sequence sample set are easily affected by the module models used in the power station, they are normalized as shown in formula (2):
where $x_{aver\_t}$ in formula (2) is the average current or voltage at time t, and n is the number of string branches in the power station;
The average is recalculated after the data below the average are removed; repeating this calculation several times yields the expected value of the voltage and current data at each time, and these values form the corrected mean sequence;
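The trimming procedure just described can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the `rounds` parameter are hypothetical, and the number of repetitions is an assumption because the text only says the calculation is repeated several times.

```python
from statistics import mean


def corrected_mean(values, rounds=2):
    """Mean of `values` after repeatedly discarding entries below the current mean.

    `rounds` (how many times the trimming is repeated) is an assumption;
    the source does not state a specific count.
    """
    kept = list(values)
    for _ in range(rounds):
        avg = mean(kept)
        trimmed = [v for v in kept if v >= avg]
        if len(trimmed) == len(kept):  # nothing left to discard
            break
        kept = trimmed
    return mean(kept)


def corrected_mean_sequence(strings, rounds=2):
    """Apply corrected_mean at every time step.

    `strings` is a list of equal-length sequences, one per string branch.
    """
    return [corrected_mean(column, rounds) for column in zip(*strings)]


# Three strings, two time steps; the third string reads low at both steps.
strings = [[8.0, 8.2], [8.2, 8.4], [5.0, 5.0]]
reference = corrected_mean_sequence(strings, rounds=1)
```

With one round, the low third string is dropped at each time step, so the reference is the mean of the two remaining strings. The number of rounds has to stay small: trimming until nothing changes would collapse the reference onto the maximum value.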
S2. Calculate the deviation rate sequence $P_i$ between the normal data sequence $X_i$ and the corrected mean sequence, as shown in formula (3):
where $p_{i,t}$ in formula (3) is the deviation rate of the i-th branch at time t;
Because the deviation rate largely eliminates the influence of different irradiance levels and ambient temperatures on the data sequence values, the deviation rate sequence objectively reflects the degree of difference between data sequences. The mean of the multi-channel deviation rates at each time is then calculated to obtain the deviation rate mean sequence;
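A short sketch of the deviation rate calculation is given below. The signed relative form `(x - ref) / ref` is an assumption, since formula (3) is not reproduced in the text, and the function names are hypothetical.

```python
def deviation_rates(sequence, reference):
    """Relative deviation of one string from the reference, per time step.

    Assumes the signed form (x - ref) / ref and a non-zero reference,
    i.e. daytime samples only.
    """
    return [(x - ref) / ref for x, ref in zip(sequence, reference)]


def mean_deviation_sequence(rate_sequences):
    """Average the deviation rates of all strings at each time step."""
    return [sum(column) / len(column) for column in zip(*rate_sequences)]


reference = [8.0, 10.0]
strings = [[8.0, 10.0], [7.2, 9.0], [4.0, 5.0]]
rates = [deviation_rates(s, reference) for s in strings]
mean_rates = mean_deviation_sequence(rates)

# Halving both the string and the reference (as a uniform drop in irradiance
# would) leaves the deviation rate unchanged.
dimmed = deviation_rates([3.6, 4.5], [4.0, 5.0])
```

The last two lines illustrate why a ratio is used: a uniform change in operating conditions scales the numerator and denominator together.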
S3. Use the Mahalanobis distance to describe the distance between each multi-channel deviation rate sequence $P_i$ and the deviation rate mean sequence, as shown in formula (4):
where $MD_i$ in formula (4) is the Mahalanobis distance of the i-th deviation rate sequence and $\Sigma^{-1}$ is the inverse of the covariance matrix of the multi-channel deviation rate sequences;
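The distance in S3 can be sketched with NumPy as below. The data are synthetic, the function name is hypothetical, and the use of a pseudo-inverse is an assumption made here because the covariance matrix is singular whenever there are fewer strings than time steps; the source does not say how that case is handled.

```python
import numpy as np


def mahalanobis_distances(rates):
    """Mahalanobis distance of each row of `rates` from the column-wise mean.

    `rates` has shape (n_strings, n_time_steps). The pseudo-inverse of the
    covariance matrix is an assumption, not something stated in the source.
    """
    rates = np.asarray(rates, dtype=float)
    centered = rates - rates.mean(axis=0)
    cov = np.atleast_2d(np.cov(rates, rowvar=False))
    inv_cov = np.linalg.pinv(cov)
    # Quadratic form d_i * inv_cov * d_i^T for every row d_i.
    squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(np.clip(squared, 0.0, None))


rng = np.random.default_rng(0)
normal = rng.normal(0.0, 0.01, size=(30, 3))  # synthetic healthy strings
faulty = np.array([[-0.30, -0.28, -0.31]])    # synthetic degraded string
distances = mahalanobis_distances(np.vstack([normal, faulty]))
```

In this synthetic set the degraded string, stored last, has by far the largest distance, which is the property the labeling step relies on.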
Because abnormal samples are few, they deviate strongly from the sample set and their Mahalanobis distance is large. To determine a reasonable health threshold, the Mahalanobis distances of the deviation rate sequences are converted by a Box-Cox power transformation into a variable that follows a normal distribution, so that determining the threshold becomes determining a confidence level.
The confidence level reflects the probability that a sample falls within the confidence interval, that is, the likelihood that the sample is normal, so the confidence level is related to the proportion of abnormal samples. To determine a reasonable confidence level, the confidence interval corresponding to each confidence level is calculated, the corresponding threshold is obtained by the inverse Box-Cox transformation, the Mahalanobis distance of each deviation rate sequence is compared with the threshold, and the sample is labeled. The labeling results are checked against the waveform characteristics of abnormal data sequences, the number of misjudgments $n_1$ and the number of missed judgments $n_2$ are counted, and the accuracy $k_1$ is calculated by formula (5):
$k_1 = 1 - (n_1 + n_2)/n \quad (5)$
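The threshold selection above can be sketched with the Python standard library. Several points are assumptions for illustration only: the Box-Cox parameter `lam` is passed in rather than estimated (in practice it would typically be fitted, for example by maximum likelihood), the confidence interval is treated as one-sided because only large distances indicate anomalies, and the distance values are invented.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev


def boxcox(x, lam):
    """Box-Cox power transform of a positive value."""
    return log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inv_boxcox(y, lam):
    """Inverse of boxcox."""
    return exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)


def health_threshold(distances, confidence, lam):
    """Distance threshold for a confidence level (one-sided, assumed)."""
    transformed = [boxcox(d, lam) for d in distances]
    mu, sigma = mean(transformed), stdev(transformed)
    upper = mu + NormalDist().inv_cdf(confidence) * sigma
    return inv_boxcox(upper, lam)


def label_abnormal(distances, threshold):
    """True marks a sequence whose distance exceeds the health threshold."""
    return [d > threshold for d in distances]


def accuracy_k1(n1, n2, n):
    """Formula (5): k1 = 1 - (n1 + n2) / n."""
    return 1.0 - (n1 + n2) / n


distances = [0.8, 1.0, 1.1, 1.3, 0.9, 6.0]  # hypothetical Mahalanobis distances
threshold = health_threshold(distances, confidence=0.9, lam=0.0)
labels = label_abnormal(distances, threshold)
```

Sweeping `confidence` over a range and recomputing $k_1$ against manually checked labels would reproduce the kind of curve the text describes for Fig. 3 and Fig. 4.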
Fig. 3(a) and 3(b) show the numbers of misjudgments and missed judgments and the accuracy when labeling the current deviation rate sequences at each confidence level. The accuracy first rises and then falls as the confidence level increases, reaching 100% for confidence levels of 61.8% to 65.8%. When the confidence level is below 61.8%, the corresponding health threshold is too small and misjudgments readily occur; when it is above 65.8%, the threshold is too large and missed judgments readily occur.
Fig. 4(a) and 4(b) show the corresponding statistics for labeling the voltage deviation rate sequences at each confidence level; the accuracy is highest for confidence levels of 71.2% to 75.6%. The health threshold corresponding to a confidence level of 73.4% is calculated and used to label the voltage deviation rate sequences.
S4. Select a sufficient number of data sequences (multiple links, multiple types, multiple time periods) as a complete data sequence sample set containing both normal and abnormal states, repeat the calculations of S1 to S2, calculate the Mahalanobis distance of every data sequence, compare it with the health threshold, and label each sequence as normal or abnormal to obtain a data sequence sample set that can be used to train the support vector machine model.
Step two: design and train a support vector machine model. The support vector machine is trained on the data sequence sample set obtained above to produce a classifier model that quickly classifies all data sequences in the sample set and judges the normal/abnormal state of the electrical data sequences of the photovoltaic power station;
The specific steps for training the support vector machine model are as follows:
a. According to the abnormal features of data sequences in the photovoltaic power station, a support vector machine model is constructed that seeks the optimal separating hyperplane over the training sample set obtained in step one. In the invention, the number of samples far exceeds the number of sample features; a polynomial kernel has many parameters and becomes complex at high orders, whereas a Gaussian kernel can map the sample features into an infinite-dimensional space, so the Gaussian kernel is the more suitable choice;
b. When constructing the support vector machine model, grid search is combined with cross-validation to select the penalty factor C and the Gaussian kernel parameter g;
c. To characterize the classification performance under different parameter combinations, the misjudgment rate $k_2$ and the missed-judgment rate $k_3$ are introduced, as shown in formula (6):
where $n_p$ and $n_n$ in formula (6) are the numbers of normal and abnormal samples, respectively.
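The three rates can be computed from labeled results as sketched below. The definitions of $k_2$ and $k_3$ as the misjudgment count over the number of normal samples and the missed-judgment count over the number of abnormal samples are an assumed reading, since the formula image is missing; the sample counts are hypothetical and were chosen only so that the rates land near the figures reported for the current sequences.

```python
def classification_rates(true_abnormal, predicted_abnormal):
    """Return (k1, k2, k3) for two equal-length lists of booleans.

    Assumed definitions: k2 = n1 / n_p and k3 = n2 / n_n, where n1 counts
    normal samples judged abnormal and n2 counts abnormal samples judged
    normal.
    """
    pairs = list(zip(true_abnormal, predicted_abnormal))
    n_p = sum(1 for t, _ in pairs if not t)        # normal samples
    n_n = sum(1 for t, _ in pairs if t)            # abnormal samples
    n1 = sum(1 for t, p in pairs if not t and p)   # misjudgments
    n2 = sum(1 for t, p in pairs if t and not p)   # missed judgments
    k1 = 1.0 - (n1 + n2) / len(pairs)              # formula (5)
    return k1, n1 / n_p, n2 / n_n


# Hypothetical set: 76 normal and 72 abnormal samples,
# with 1 misjudgment and 3 missed judgments.
truth = [False] * 76 + [True] * 72
predicted = [True] + [False] * 75 + [False] * 3 + [True] * 69
k1, k2, k3 = classification_rates(truth, predicted)
```

Because $k_2$ and $k_3$ use different denominators, a parameter choice can lower one while raising the other, which is the trade-off discussed for the parameter surfaces.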
The current misjudgment-rate surface in Fig. 5(a) shows that, as g increases, $k_2$ first increases, then decreases, and then remains unchanged. When $g \in [2^{-9}, 2^{-1.2}]$, $k_2$ is essentially 0; when $g \in [2^{-1.2}, 2^{2.8}]$, $k_2$ first increases and then decreases as C increases; when $g \in [2^{2.8}, 2^{12}]$, $k_2$ is essentially 0.
The current missed-judgment-rate surface in Fig. 5(b) shows that, as g increases, $k_3$ first decreases, then increases, and then remains unchanged. When $g \in [2^{-9}, 2^{0.4}]$, $k_3$ gradually decreases as C increases; when $g \in [2^{0.4}, 2^{4.4}]$, $k_3$ first decreases and then increases as C increases; when $g \in [2^{4.4}, 2^{12}]$, $k_3$ remains essentially unchanged as C increases.
As the parameter combination (C, g) varies, $k_2$ and $k_3$ generally show opposite trends. To balance the two, $g_i = 4$ and $C_i = 0.19$ are selected, giving $k_1 = 97.30\%$, $k_2 = 1.32\%$ and $k_3 = 4.17\%$.
The voltage misjudgment-rate surface in Fig. 6(a) shows that, as g increases, $k_2$ first increases, then decreases, and then remains unchanged. When $g \in [2^{-9}, 2^{5.2}]$, $k_2$ is essentially 0; when $g \in [2^{5.2}, 2^{8.4}]$, $k_2$ first increases and then decreases as C increases; when $g \in [2^{8.4}, 2^{12}]$, $k_2$ is essentially 0.
The voltage missed-judgment-rate surface in Fig. 6(b) shows that, as g increases, $k_3$ first decreases, then increases, and then remains unchanged. When $g \in [2^{-9}, 2^{-3.8}]$, $k_3$ gradually decreases as C increases; when $g \in [2^{-3.8}, 2^{2.4}]$, $k_3$ first decreases and then increases as C increases; when $g \in [2^{2.4}, 2^{12}]$, $k_3$ remains essentially unchanged as C increases. Therefore $g_u = 4.92$ and $C_u = 1.32$ are selected, giving $k_1 = 99.80\%$, $k_2 = 0\%$ and $k_3 = 0.42\%$.
Combining grid search with cross-validation thus determines the optimal model parameters, namely $g_i = 4$, $C_i = 0.19$, $g_u = 4.92$ and $C_u = 1.32$. With these parameters the method achieves a good classification effect and lays the foundation for extracting abnormal data sequences from problem power stations.
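A grid search with cross-validation over C and the Gaussian kernel parameter can be sketched with scikit-learn as follows. The data are synthetic, the grid ranges are illustrative, and treating the patent's g as scikit-learn's `gamma` is an assumption; the selection criterion here is plain cross-validated accuracy rather than the balance of $k_2$ and $k_3$ described in the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic deviation-rate sequences with 49 samples each.
normal = rng.normal(0.0, 0.02, size=(60, 49))
abnormal = rng.normal(-0.25, 0.05, size=(60, 49))
X = np.vstack([normal, abnormal])
y = np.array([0] * 60 + [1] * 60)  # 0 = normal, 1 = abnormal

# Powers of two for C and gamma; these ranges are illustrative only.
param_grid = {
    "C": [2.0 ** e for e in range(-3, 6, 2)],
    "gamma": [2.0 ** e for e in range(-9, 4, 2)],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

best_C = search.best_params_["C"]
best_gamma = search.best_params_["gamma"]
```

To reproduce surfaces like those in Fig. 5 and Fig. 6, the per-combination results in `search.cv_results_` could be replaced by a custom scorer that reports the misjudgment and missed-judgment rates separately.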
Step three: judge the state of multiple types of data sequences with the support vector machine. Drawing on site operation and maintenance experience and on the way site anomalies or faults arise and evolve, the data sequence format is carefully designed along three dimensions: the link axis (power generation/transmission/distribution/grid-connection point), the data axis (current/voltage/power) and the time axis (hour/day/month/season). The support vector machine is then used to judge data sequences containing various types of anomalies or faults, which verifies the reliability and effectiveness of the method;
The data sequence format covers the time length, sampling period, data type, decision frequency and so on. For anomalies or faults that need urgent repair, such as a bypass diode conduction fault in a photovoltaic module, the data sequence is given a short time length, a small sampling period and a high decision frequency, and voltage data is selected as the data type;
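One way to capture the data sequence format is a small configuration record, sketched below. The class name, field names and all numeric values are hypothetical; the source names the parameters but gives no concrete values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceFormat:
    """Format of one data sequence along the link, data and time axes."""

    link: str                 # e.g. "generation", "transmission", "grid_point"
    data_type: str            # "current", "voltage" or "power"
    duration_s: int           # time length of one sequence, in seconds
    sampling_period_s: int    # interval between samples, in seconds
    decision_interval_s: int  # how often a state judgment is made

    def samples_per_sequence(self) -> int:
        return self.duration_s // self.sampling_period_s


# Hypothetical settings: an urgent fault (such as a bypass diode conduction
# fault) gets a short, densely sampled, frequently judged voltage sequence.
urgent = SequenceFormat("generation", "voltage", 600, 10, 600)
routine = SequenceFormat("generation", "current", 86400, 1800, 86400)
```

Keeping the format in one immutable record makes it straightforward to revise these parameters later, as step four calls for.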
Step four: after the support vector machine has been validated on multiple data types, it performs online abnormal-state judgment on the voltage, current and power data sequences monitored at the power station, classifying and identifying normal/abnormal states and extracting the abnormal data sequences for subsequent fault diagnosis. Based on the judgment and extraction results, the data sequence format parameters, the support vector machine model structure parameters, the kernel function and so on are revised to improve the online applicability and judgment accuracy of the method.
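The extraction in step four can be sketched as a filter over incoming sequences. The function name, the string identifiers and the stand-in classifier are hypothetical; the only interface assumed of the trained model is a `predict` method that returns 1 for abnormal and 0 for normal, following the scikit-learn convention.

```python
def extract_abnormal(classifier, sequences):
    """Return the ids of the sequences the classifier judges abnormal.

    `sequences` maps a string id to its deviation-rate sequence.
    """
    ids = list(sequences)
    predictions = classifier.predict([sequences[i] for i in ids])
    return [i for i, label in zip(ids, predictions) if label == 1]


class MeanThresholdStub:
    """Stand-in for a trained SVM, used only to keep the sketch runnable."""

    def predict(self, batch):
        return [1 if sum(s) / len(s) < -0.1 else 0 for s in batch]


incoming = {
    "string_01": [0.00, -0.01, 0.01],
    "string_02": [-0.30, -0.28, -0.31],
    "string_03": [0.02, 0.00, -0.02],
}
abnormal_ids = extract_abnormal(MeanThresholdStub(), incoming)
```

Returning identifiers rather than raw data keeps the link to the physical string, which is what allows the abnormal string to be located for fault diagnosis.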
Claims (3)
1. A photovoltaic power station abnormal data sequence extraction method, characterized in that: the monitoring data of a photovoltaic power station are typically large in scale, diverse and low in value density; to address the analysis and feature extraction of such high-dimensional, complex data, the invention adopts a classification approach based on a support vector machine, in which a data sequence sample set is first established along three dimensions, namely the link axis, the data axis and the time axis, a trained support vector machine model then judges the abnormal state of various types of data sequences, and abnormal data sequences are finally analyzed, compared, judged and extracted, the specific implementation steps being as follows:
Step one: import the original electrical data sequences of several healthy power stations to establish a complete data sequence sample set; analyze the waveform characteristics of abnormal data sequences and summarize their waveform change patterns; after calculating the deviation rate sequences and the Mahalanobis distances, compare them with a set health threshold and label each sequence as normal or abnormal to obtain the data sequence training samples, wherein the complete data sequence sample set comprises a normal data sequence sample set and an abnormal data sequence sample set;
Step two: design and train a support vector machine model; train the support vector machine on the obtained data sequence sample set to produce a classifier model, determine the optimal model parameters by grid search and cross-validation, and then quickly classify all data sequences in the sample set to judge the normal/abnormal state of the electrical data sequences of the photovoltaic power station;
Step three: judge the state of multiple types of data sequences with the support vector machine; drawing on site operation and maintenance experience and on the way site anomalies or faults arise and evolve, carefully design the data sequence format along the three dimensions of the link axis, the data axis and the time axis, and use the support vector machine to judge data sequences containing various types of anomalies or faults, verifying the reliability and effectiveness of the method;
the data sequence format covers the time length, sampling period, data type, decision frequency and so on; for anomalies or faults that need urgent repair, such as a bypass diode conduction fault in a photovoltaic module, the data sequence is given a short time length, a small sampling period and a high decision frequency, and voltage data is selected as the data type;
Step four: after the support vector machine has been validated on multiple data types, it performs online abnormal-state judgment on the voltage, current and power data sequences monitored at the power station, classifying and identifying normal/abnormal states and extracting the abnormal data sequences for subsequent fault diagnosis. Based on the judgment and extraction results, the data sequence format parameters, the support vector machine model structure parameters, the kernel function and so on are revised to improve the online applicability and judgment accuracy of the method.
2. The method for extracting the abnormal data sequence of the photovoltaic power station as claimed in claim 1, wherein the specific steps of obtaining the training sample set of the data sequence in the step one are as follows:
S1, selecting, from a plurality of healthy power stations, a number of multi-channel normal data sequences (same link, same type and same time period) with good consistency as the normal data sequence sample set, as shown in formula (1):
$$X_i = \{x_{i,t}\}, \quad t = 1, 2, \ldots, 49 \tag{1}$$
wherein $X_i$ in formula (1) is the normal data sequence sample set (i being the channel serial number), and $x_{i,t}$ is the voltage or current data of the i-th string at time t;
the current and voltage data in the normal data sequence sample set are easily influenced by the model of the modules in the power station, so normalization is carried out, as shown in formula (2):
$$x_{aver\_t} = \frac{1}{n} \sum_{i=1}^{n} x_{i,t} \tag{2}$$
wherein $x_{aver\_t}$ in formula (2) is the average value of the current or voltage at time t, and n is the number of string branches in the power station;
the data lower than the average value are removed and the average value is calculated again; repeating this calculation several times gives the expected value $\bar{x}_t$ of the voltage or current data at each time, thereby forming the corrected average value sequence $\bar{X} = \{\bar{x}_t\}$;
S2, calculating the deviation rate sequence $P_i$ of each normal data sequence $X_i$ relative to the corrected average value sequence $\bar{X} = \{\bar{x}_t\}$, as shown in formula (3):
$$p_{i,t} = \frac{x_{i,t} - \bar{x}_t}{\bar{x}_t} \tag{3}$$
wherein $p_{i,t}$ in formula (3) is the deviation rate of the i-th branch at time t;
because the deviation rate largely eliminates the influence of differing light intensity and ambient temperature on the data sequence values, the deviation rate sequence objectively reflects the degree of difference between data sequences; the average of the multi-channel deviation rates at each time is then calculated to obtain the deviation rate average value sequence $\bar{P}$;
S3, using the Mahalanobis distance to describe the distance between each multi-channel deviation rate sequence $P_i$ and the deviation rate average value sequence $\bar{P}$; then applying the Box-Cox power transformation so that the Mahalanobis distances of the deviation rate sequences follow a normal distribution, and calculating their mathematical expectation and standard deviation; and selecting a suitable confidence level in combination with the operation and maintenance experience of the photovoltaic power station. The confidence level reflects the probability that a sample falls within the confidence interval, that is, the probability that the sample is normal, so the confidence level is related to the proportion of abnormal samples. To determine a reasonable confidence level, the confidence interval corresponding to each candidate confidence level is calculated, the corresponding threshold is obtained by the inverse Box-Cox transformation, the Mahalanobis distance of each deviation rate sequence is compared with the threshold and the sample is labeled accordingly, the labeling results are checked against the waveform characteristics of abnormal data sequences, the number of misjudgments $n_1$ and the number of missed judgments $n_2$ are counted, and the accuracy $k_1$ is calculated according to formula (4):
$$k_1 = 1 - \frac{n_1 + n_2}{n} \tag{4}$$
wherein n in formula (4) is the number of samples;
and S4, selecting a sufficient number of data sequences (multiple links, multiple types and multiple time periods) as a complete data sequence sample set covering both normal and abnormal states, repeating the calculation process of S1 to S2, calculating the Mahalanobis distance of every data sequence, comparing it with the health threshold, and labeling each data sequence as normal or abnormal, thereby obtaining a data sequence sample set that can be used for training the support vector machine model.
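Steps S1 to S4 can be sketched numerically as follows. This is an illustrative reading rather than the patented implementation, and it rests on several assumptions: three trimming passes for the corrected mean, a signed deviation rate, a covariance matrix and reference sequence estimated from the healthy set only, a one-sided upper threshold, a Box-Cox parameter chosen by a simple likelihood grid search, and entirely synthetic data. All function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def corrected_mean(X, n_iter=3):
    """S1: per-time mean after repeatedly discarding readings below the mean."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    for _ in range(n_iter):
        keep = X >= mean - 1e-9 * np.abs(mean)  # tolerance keeps the maximum
        mean = (X * keep).sum(axis=0) / keep.sum(axis=0)
    return mean


def deviation_rates(X):
    """S2: deviation rate of every string from the corrected mean sequence."""
    X = np.asarray(X, dtype=float)
    m = corrected_mean(X)
    return (X - m) / m


def mahalanobis(P, P_bar, cov_inv):
    """S3: Mahalanobis distance of each row of P from the reference P_bar."""
    diff = P - P_bar
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def boxcox(d, lam):
    return np.log(d) if abs(lam) < 1e-8 else (d ** lam - 1.0) / lam


def fit_lambda(d):
    """Grid search for the lambda maximizing the Box-Cox log-likelihood."""
    grid = np.linspace(-2.0, 2.0, 81)
    log_sum = np.log(d).sum()
    llf = [(lam - 1.0) * log_sum - 0.5 * len(d) * np.log(boxcox(d, lam).var())
           for lam in grid]
    return float(grid[int(np.argmax(llf))])


def health_threshold(d, confidence=0.99):
    """Upper bound in the transformed domain, mapped back by the inverse transform."""
    lam = fit_lambda(d)
    y = boxcox(d, lam)
    upper = y.mean() + NormalDist().inv_cdf(confidence) * y.std(ddof=1)
    if abs(lam) < 1e-8:
        return float(np.exp(upper))
    base = lam * upper + 1.0
    return float(base ** (1.0 / lam)) if base > 0 else float("inf")


# Synthetic demonstration: 49 points per sequence, as in formula (1).
rng = np.random.default_rng(0)
profile = 600.0 + 50.0 * np.sin(np.pi * np.arange(49) / 48)  # made-up voltages
healthy = profile * (1 + 0.01 * rng.standard_normal((2000, 49)))

P_h = deviation_rates(healthy)
P_bar = P_h.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(P_h, rowvar=False))
thr = health_threshold(mahalanobis(P_h, P_bar, cov_inv))

# S4: a mixed set in which the first five strings sag by 30 % at midday.
mixed = profile * (1 + 0.01 * rng.standard_normal((100, 49)))
mixed[:5, 20:30] *= 0.7
truth = np.zeros(100, dtype=bool)
truth[:5] = True
labels = mahalanobis(deviation_rates(mixed), P_bar, cov_inv) > thr

n1 = int(np.sum(labels & ~truth))   # misjudged: normal flagged as abnormal
n2 = int(np.sum(~labels & truth))   # missed: abnormal flagged as normal
k1 = 1 - (n1 + n2) / len(truth)     # accuracy, formula (4)
```

Because readings below the mean are discarded before re-averaging, a few sagging strings barely move the corrected mean, which is why the same reference can be reused on the mixed set. The covariance estimate needs noticeably more healthy sequences than time points; with too few, the in-sample distances carry little information.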
3. The method for extracting the abnormal data sequence of the photovoltaic power station as recited in claim 1, wherein the step two of designing and training the support vector machine model comprises the following specific steps:
a. according to the abnormal features of the data sequences in the photovoltaic power station, constructing a support vector machine model that seeks the optimal separating hyperplane on the training sample set obtained in step one; in the invention, the number of samples from the photovoltaic power station is far greater than the number of sample features; a polynomial kernel has more parameters and its complexity rises with the order, whereas a Gaussian kernel can map the sample features into an infinite-dimensional space, so the Gaussian kernel is the more suitable choice;
b. when constructing the support vector machine model, selecting the penalty factor C and the Gaussian kernel parameter g by grid search combined with cross-validation;
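c. introducing the misjudgment rate $k_2$ and the missed-judgment rate $k_3$ to characterize the classification effect under different parameter combinations, as shown in formula (5):
$$k_2 = \frac{n_1}{n_p}, \qquad k_3 = \frac{n_2}{n_n} \tag{5}$$
wherein $n_1$ and $n_2$ in formula (5) are the numbers of misjudgments and missed judgments, and $n_p$ and $n_n$ are the numbers of normal and abnormal samples.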
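A possible way to carry out steps a to c with scikit-learn is sketched below. Several points are assumptions rather than statements from the claim: the misjudgment rate is taken as the share of normal samples flagged abnormal and the missed-judgment rate as the share of abnormal samples flagged normal, the parameter pair minimizing their sum is selected, the grids are arbitrary, and the data are synthetic deviation-rate sequences. The helper names are hypothetical.

```python
from itertools import product

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC


def error_rates(y_true, y_pred):
    """Return (k2, k3) with labels 0 = normal and 1 = abnormal."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n_p = np.sum(y_true == 0)                       # normal samples
    n_n = np.sum(y_true == 1)                       # abnormal samples
    n1 = np.sum((y_true == 0) & (y_pred == 1))      # normal judged abnormal
    n2 = np.sum((y_true == 1) & (y_pred == 0))      # abnormal judged normal
    return float(n1 / n_p), float(n2 / n_n)


def grid_search_svm(X, y, C_grid, g_grid, folds=5):
    """Cross-validate every (C, g) pair of a Gaussian-kernel SVM."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    best = None
    for C, g in product(C_grid, g_grid):
        model = SVC(kernel="rbf", C=C, gamma=g)
        y_pred = cross_val_predict(model, X, y, cv=cv)
        k2, k3 = error_rates(y, y_pred)
        # Assumption: the pair with the smallest k2 + k3 wins.
        if best is None or k2 + k3 < best["k2"] + best["k3"]:
            best = {"C": C, "g": g, "k2": k2, "k3": k3}
    return best


# Synthetic deviation-rate sequences: 200 normal, 60 with a midday sag.
rng = np.random.default_rng(0)
X = 0.01 * rng.standard_normal((260, 49))
X[200:, 20:30] -= 0.3
y = np.r_[np.zeros(200, dtype=int), np.ones(60, dtype=int)]

best = grid_search_svm(X, y, C_grid=[1, 10, 100], g_grid=[0.1, 1, 10])
print(best)
```

Tracking the two rates separately, instead of a single accuracy figure, makes it visible when a parameter pair buys fewer false alarms at the cost of more missed faults.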
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210218937.0A CN114595762A (en) | 2022-03-08 | 2022-03-08 | Photovoltaic power station abnormal data sequence extraction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114595762A (en) | 2022-06-07 |
Family
ID=81815749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210218937.0A Withdrawn CN114595762A (en) | 2022-03-08 | 2022-03-08 | Photovoltaic power station abnormal data sequence extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114595762A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116930595A (en) * | 2023-09-18 | 2023-10-24 | 法拉迪电气有限公司 | Accurate data metering method for new energy grid-connected voltage regulation |
CN116930595B (en) * | 2023-09-18 | 2023-11-24 | 法拉迪电气有限公司 | Accurate data metering method for new energy grid-connected voltage regulation |
CN117114254A (en) * | 2023-10-25 | 2023-11-24 | 山东电力工程咨询院有限公司 | Power grid new energy abnormal data monitoring method and system |
CN117114254B (en) * | 2023-10-25 | 2024-03-19 | 山东电力工程咨询院有限公司 | Power grid new energy abnormal data monitoring method and system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110288136B (en) | Wind power multi-step prediction model establishment method | |
CN109308571B (en) | Distribution line variable relation detection method | |
CN111680820B (en) | Distributed photovoltaic power station fault diagnosis method and device | |
CN114595762A (en) | Photovoltaic power station abnormal data sequence extraction method | |
CN108876163B (en) | Transient state power angle stability rapid evaluation method integrating causal analysis and machine learning | |
CN104573879A (en) | Photovoltaic power station output predicting method based on optimal similar day set | |
CN113011481B (en) | Electric energy meter function abnormality assessment method and system based on decision tree algorithm | |
CN116911806B (en) | Internet + based power enterprise energy information management system | |
CN115758151A (en) | Combined diagnosis model establishing method and photovoltaic module fault diagnosis method | |
CN110968703B (en) | Method and system for constructing abnormal metering point knowledge base based on LSTM end-to-end extraction algorithm | |
CN116432123A (en) | Electric energy meter fault early warning method based on CART decision tree algorithm | |
CN115859099A (en) | Sample generation method and device, electronic equipment and storage medium | |
CN115146723A (en) | Electrochemical model parameter identification method based on deep learning and heuristic algorithm | |
CN112926686B (en) | BRB and LSTM model-based power consumption anomaly detection method and device for big power data | |
CN104951654A (en) | Method for evaluating reliability of large-scale wind power plant based on control variable sampling | |
CN112327190B (en) | Method for identifying health state of energy storage battery | |
CN107730399B (en) | Theoretical line loss evaluation method based on wind power generation characteristic curve | |
CN116681312B (en) | Ecological-oriented multi-objective reservoir optimal scheduling decision method and system | |
CN116842337A (en) | Transformer fault diagnosis method based on LightGBM (gallium nitride based) optimal characteristics and COA-CNN (chip on board) model | |
CN116436405A (en) | Hot spot fault diagnosis method for photovoltaic string | |
CN116663393A (en) | Random forest-based power distribution network continuous high-temperature fault risk level prediction method | |
CN114676931B (en) | Electric quantity prediction system based on data center technology | |
CN115796341A (en) | Carbon effect code-based collaborative measure method for enterprise low-carbon economic performance | |
CN115733258A (en) | Control method of all-indoor intelligent substation system based on Internet of things technology | |
CN115828165B (en) | New energy intelligent micro-grid data processing method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20220607 |