CN114595762A - Photovoltaic power station abnormal data sequence extraction method - Google Patents

Photovoltaic power station abnormal data sequence extraction method

Info

Publication number
CN114595762A
Authority
CN
China
Prior art keywords
data
data sequence
sequence
abnormal
normal
Legal status
Withdrawn
Application number
CN202210218937.0A
Other languages
Chinese (zh)
Inventor
郭倩
高剑
卫东
顾鑫磊
郭泽阳
Current Assignee
China Jiliang University
Original Assignee
China Jiliang University
Application filed by China Jiliang University
Priority to CN202210218937.0A
Publication of CN114595762A
Current legal status: Withdrawn


Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G06Q10/20 Administration of product repair or maintenance
    • G06Q50/06 Electricity, gas or water supply
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention provides a method for extracting abnormal data sequences from a photovoltaic power station. A complete data sequence sample set is built from the original electrical data sequences of several healthy power stations; the deviation rate and Mahalanobis distance of each data sequence are calculated and compared with a health threshold to mark each sequence as normal or abnormal. A support vector machine (SVM) model is then designed and trained to classify all data sequences in the sample set and is used to judge the state of multiple types of data sequences, verifying the reliability and effectiveness of the method. Finally, the model extracts abnormal data sequences from the mass data of a problem power station, thereby locating the abnormal strings.

Description

Photovoltaic power station abnormal data sequence extraction method
Technical field:
The invention relates to the extraction of abnormal data sequences in a photovoltaic power station as preparatory work for subsequent fault diagnosis, and in particular to a method for extracting abnormal data sequences of a photovoltaic power station.
Background art:
Solar energy is the renewable energy source with the largest resource quantity and offers notable advantages such as large reserves, noiseless operation, wide availability and durability. Under the Paris Agreement, signed at the time of the Hangzhou G20 Summit, China pledged that by 2030 its carbon dioxide emissions per unit of GDP would be 60-65% lower than in 2005. According to the Medium- and Long-Term Development Plan for Renewable Energy, the installed capacity of solar power generation was expected to reach 1.8 GW in 2020 and 600 GW in 2050, with solar power accounting for 5% of China's installed power capacity. In 2019 the concept of "Intelligence+" for traditional industries such as the digital economy and energy was put forward in national strategy for the first time, and, guided by artificial intelligence technology, the photovoltaic industry faces a historic opportunity for intelligent upgrading. Compared with centralized photovoltaic power stations, however, the photovoltaic industry started late and its informatization is not fully developed, so data monitoring, statistical analysis, formulation of operation and maintenance plans and similar work in power station operation, management and maintenance are often done manually.
Extracting the abnormal data sequences of a photovoltaic power station is therefore the most immediate goal at present. Compared with manually monitoring data and then performing fault diagnosis, which is complex and error-prone, the proposed method adds a data mining algorithm so that abnormal data sequences are extracted more intelligently, and it also provides a prerequisite step for subsequent fault diagnosis.
Summary of the invention:
The invention aims to provide a method for extracting abnormal data sequences of a photovoltaic power station. The monitoring data of a photovoltaic power station is characterized by large scale, diversity and low value density. To address the analysis and feature extraction of such high-dimensional, complex data, the method adopts a classification principle based on a support vector machine: it first builds a data sequence sample set along three dimensions (a link axis, a data axis and a time axis), then uses a trained SVM model to judge the abnormal states of various types of data sequences, and finally analyzes, compares, judges and extracts the abnormal data sequences. The specific implementation steps are as follows:
Step one: import the original electrical data sequences of several healthy power stations to build a complete data sequence sample set; calculate the deviation-rate sequences and Mahalanobis distances, compare them with a health threshold, and mark each sequence as normal or abnormal to obtain the data sequence training samples. The complete data sequence sample set comprises a normal data sequence sample set and an abnormal data sequence sample set;
The specific steps for obtaining the data sequence sample set are as follows:
S1. From several healthy power stations, select a certain number of multipath normal data sequences with good mutual consistency (same link, same type, same time period) as the normal data sequence sample set, shown in formula (1):
$$X_i = \{x_{i,t}\}, \quad t = 1, 2, \ldots, 49 \tag{1}$$
where $X_i$ is the normal data sequence of path $i$ ($i$ is the path index) and $x_{i,t}$ is the voltage and current data of the $i$-th string at time $t$;
Because the current and voltage data in the normal data sequence sample set are strongly affected by the model of the modules used in the power station, the data are normalized as shown in formula (2):
[Formula (2): image not reproduced in this text]
where $x_{aver\_t}$ is the average current or voltage at time $t$ and $n$ is the number of strings in the power station;
The data below the average are then removed and the average is recalculated; repeating this several times yields the expected voltage and current values at each time, which form a corrected mean sequence.
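The corrected mean of S1 can be illustrated with a short Python sketch. It assumes that at each time step the values below the current mean are dropped and the mean is recomputed a fixed number of times; `corrected_mean_sequence` and its `n_iter` argument are hypothetical names, since the text only says the calculation is repeated several times and gives no stopping rule.

```python
from statistics import fmean


def corrected_mean_sequence(strings, n_iter=3):
    """Corrected mean value at each time step (illustrative sketch).

    strings: one equal-length sequence x_{i,t} per string i.
    n_iter:  number of "drop values below the mean, re-average" passes
             (hypothetical; the source does not specify it).
    """
    corrected = []
    for column in zip(*strings):          # values of all strings at one time t
        values = list(column)
        mean = fmean(values)              # plain average x_aver_t over the n strings
        for _ in range(n_iter):
            # The largest value is always >= the mean, so this list is never empty.
            values = [v for v in values if v >= mean]
            mean = fmean(values)
        corrected.append(mean)
    return corrected
```

Dropping values below the mean biases the estimate toward the higher-output strings, which is presumably the intent: low readings are the ones most likely to come from shaded or faulty strings.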
S2. Calculate the deviation-rate sequence $P_i$ between each normal data sequence $X_i$ and the corrected mean sequence, as shown in formula (3):
[Formula (3): image not reproduced in this text]
where $p_{i,t}$ is the deviation rate of the $i$-th string at time $t$;
Because expressing the data as deviation rates largely eliminates the influence of differing irradiance and ambient temperature on the sequence values, the deviation-rate sequences objectively reflect how much the data sequences differ from one another. The multipath deviation rates at each time are then averaged to obtain the mean deviation-rate sequence.
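A sketch of the S2 calculations, assuming formula (3) has the relative-deviation form $p_{i,t} = (x_{i,t} - \bar{x}_t)/\bar{x}_t$, where $\bar{x}_t$ is the corrected mean at time $t$. The exact form is not legible in this text, so treat it as an assumption; the function names are hypothetical.

```python
def deviation_rate_sequences(strings, corrected_mean):
    """p_{i,t} = (x_{i,t} - mean_t) / mean_t for every string i and time t.

    Assumed form of formula (3); mean_t must be non-zero (daylight samples only).
    """
    return [
        [(x - m) / m for x, m in zip(seq, corrected_mean)]
        for seq in strings
    ]


def mean_deviation_sequence(deviations):
    """Average the deviation rates of all paths at each time step."""
    n_paths = len(deviations)
    return [sum(column) / n_paths for column in zip(*deviations)]
```

Dividing by the per-time mean is what makes the rates comparable across different irradiance and temperature conditions, as the text describes.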
S3. Use the Mahalanobis distance to describe the distance between each deviation-rate sequence $P_i$ and the mean deviation-rate sequence, as shown in formula (4):
[Formula (4): image not reproduced in this text]
where $MD_i$ is the Mahalanobis distance of the $i$-th deviation-rate sequence and $\Sigma^{-1}$ is the inverse of the covariance matrix $\Sigma$ of the multipath deviation-rate sequences. Because abnormal samples are few, they deviate strongly from the sample set and have large Mahalanobis distances. To set a reasonable health threshold, the Mahalanobis distance of each deviation-rate sequence is converted into a normally distributed variable with the Box-Cox power transformation, so that choosing the threshold becomes choosing a confidence level.
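Because formula (4) is not reproduced here, the sketch below uses the textbook Mahalanobis distance $MD_i = \sqrt{(P_i - \bar{P})\,\Sigma^{-1}\,(P_i - \bar{P})^{\mathsf T}}$, taking the mean deviation-rate sequence as $\bar{P}$ and treating the time steps as variables. It substitutes the Moore-Penrose pseudo-inverse for $\Sigma^{-1}$, an assumption made because $\Sigma$ is singular whenever there are no more paths than time steps (for example, at most 49 strings for the 49 time points of formula (1)). NumPy is assumed to be available; the function name is hypothetical.

```python
import numpy as np


def mahalanobis_distances(deviations):
    """Mahalanobis distance of each deviation-rate sequence P_i from the mean sequence.

    deviations: array-like of shape (n_paths, n_steps), one row per P_i.
    Uses the pseudo-inverse of the sample covariance (an assumption, see text).
    """
    P = np.asarray(deviations, dtype=float)
    diff = P - P.mean(axis=0)                          # P_i minus the mean deviation sequence
    sigma = np.atleast_2d(np.cov(P, rowvar=False))     # (n_steps, n_steps) covariance
    sigma_inv = np.linalg.pinv(sigma)
    md_squared = np.einsum("ij,jk,ik->i", diff, sigma_inv, diff)
    return np.sqrt(np.clip(md_squared, 0.0, None))     # clip tiny negative round-off
```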
The confidence level is the probability that a sample falls within the confidence interval, i.e. the likelihood that the sample is normal, so it is related to the proportion of abnormal samples. To choose a reasonable confidence level, the confidence interval for each candidate level is calculated and converted into a threshold by the inverse Box-Cox transformation, and the Mahalanobis distance of each deviation-rate sequence is compared with this threshold to mark the samples. The marking results are checked against the waveform characteristics of abnormal data sequences, the numbers of misjudgments $n_1$ and missed judgments $n_2$ are counted, and the accuracy $k_1$ is calculated by formula (5):
$$k_1 = 1 - \frac{n_1 + n_2}{n} \tag{5}$$
where $n$ is the number of samples;
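The marking and scoring just described reduce to a few lines. In this sketch a sequence is marked abnormal (1) when its Mahalanobis distance exceeds the health threshold; $n_1$ counts normal sequences marked abnormal and $n_2$ abnormal sequences marked normal, following the reading that a threshold that is too small produces misjudgments. The `reference` labels stand in for the waveform-based check, and all names are hypothetical.

```python
def label_by_threshold(distances, threshold):
    """Mark a sequence abnormal (1) if its Mahalanobis distance exceeds the threshold, else normal (0)."""
    return [1 if d > threshold else 0 for d in distances]


def labeling_accuracy(predicted, reference):
    """Accuracy k1 = 1 - (n1 + n2) / n from formula (5).

    n1: misjudgments (reference normal, marked abnormal)
    n2: missed judgments (reference abnormal, marked normal)
    """
    n1 = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 0)
    n2 = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 1)
    return 1 - (n1 + n2) / len(reference)
```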
S4. Select a sufficient number of data sequences (multiple links, types and time periods), covering both normal and abnormal states, as the complete data sequence sample set; repeat the calculations of S1 and S2, compute the Mahalanobis distances of all data sequences, compare them with the health threshold, and mark each data sequence as normal or abnormal. The result is a data sequence sample set that can be used to train the support vector machine model.
Step two: design and train a support vector machine model. The SVM is trained on the data sequence sample set obtained above to yield a classifier that quickly classifies all data sequences in the sample set and judges the normal/abnormal state of the electrical data sequences of the photovoltaic power station;
The specific steps for training the support vector machine model are as follows:
a. Based on the abnormal features of the data sequences in the photovoltaic power station, construct the SVM model, which seeks the optimal separating hyperplane on the training sample set obtained in step one. In this invention the number of samples is far larger than the number of sample features; the polynomial kernel has many parameters and its complexity grows with its order, whereas the Gaussian kernel maps the sample features into an infinite-dimensional space, so the Gaussian kernel is the more suitable choice;
b. When constructing the SVM model, select the penalty factor $C$ and the Gaussian kernel parameter $g$ by combining grid search with cross-validation;
c. To characterize the classification performance under different parameter combinations, introduce the misjudgment rate $k_2$ and the missed-judgment rate $k_3$, as shown in formula (6):
[Formula (6): image not reproduced in this text]
where $n_p$ and $n_n$ are the numbers of normal and abnormal samples.
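Steps a to c can be prototyped with scikit-learn, used here only as a convenient stand-in for whatever toolkit the authors used. The sketch assumes the patent's Gaussian kernel parameter $g$ corresponds to scikit-learn's `gamma` (the `-g` option in LIBSVM), uses stratified k-fold cross-validation, and searches powers of two. The $g$ span follows the $2^{-9}$ to $2^{12}$ range discussed for the parameter surfaces, while the $C$ span, the unit exponent step and the accuracy scoring are hypothetical choices; the reported optimum $g_u = 4.92$ is not a power of two, so a finer grid was presumably used. Each row of `features` would be one deviation-rate sequence.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC


def fit_gaussian_svm(features, labels, n_splits=5):
    """Select C and g for an RBF-kernel SVM by grid search with k-fold cross-validation.

    features: array of shape (n_sequences, n_features); labels: 0 = normal, 1 = abnormal.
    Returns the refitted best estimator and the chosen parameters.
    """
    param_grid = {
        "C": 2.0 ** np.arange(-5, 11),       # hypothetical C range
        "gamma": 2.0 ** np.arange(-9, 13),   # g from 2^-9 to 2^12
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, scoring="accuracy")
    search.fit(features, labels)
    return search.best_estimator_, search.best_params_
```

`GridSearchCV` also records the per-combination scores in its `cv_results_` attribute, which could be used to draw surfaces similar to those in Figs. 5 and 6.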
Step three: judge the states of various data sequences with the SVM. In combination with on-site operation and maintenance, and following how on-site abnormalities or faults arise and evolve, the data sequence format is carefully designed along three dimensions: a link axis (generation/transmission/distribution/grid-connection point), a data axis (current/voltage/power) and a time axis (hour/day/month/season). The SVM is then used to judge data sequences containing various types of abnormalities or faults, verifying the reliability and effectiveness of the method;
The data sequence format covers the time length, sampling period, data type, decision frequency, etc. For abnormalities or faults that need urgent repair, the data sequence is given a short time length, a small sampling period and a high decision frequency; for example, for a conduction fault of a photovoltaic module bypass diode, voltage data is selected as the data type;
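The format parameters can be captured in a small configuration type. The dataclass below is a hypothetical illustration: the field names, units and the comparison helper are not from the source, and the decision frequency is stored as its reciprocal (a decision period) so that "higher frequency" reads as "smaller period".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceFormat:
    """Hypothetical data sequence format along the link, data and time axes."""
    link: str                 # e.g. "generation", "transmission", "distribution", "grid-connection point"
    data_type: str            # "current", "voltage" or "power"
    time_length_s: float      # total time span covered by one sequence
    sampling_period_s: float  # interval between consecutive samples
    decision_period_s: float  # interval between successive judgments (1 / decision frequency)

    @property
    def n_samples(self) -> int:
        """Number of samples in one sequence, counting both end points."""
        return int(self.time_length_s // self.sampling_period_s) + 1


def is_more_urgent(fmt: SequenceFormat, baseline: SequenceFormat) -> bool:
    """True if fmt is shorter, samples faster and is judged more often than baseline."""
    return (fmt.time_length_s < baseline.time_length_s
            and fmt.sampling_period_s < baseline.sampling_period_s
            and fmt.decision_period_s < baseline.decision_period_s)
```

For a bypass-diode conduction fault one might pair a voltage format such as `SequenceFormat("generation", "voltage", 600, 10, 60)` with a slower routine format; these numbers are placeholders, not values from the source.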
Step four: after SVM state judgment has been completed for multiple data types, the voltage, current and power data sequences detected and monitored by the power station are judged online for abnormal states, classified and identified as normal or abnormal, and the abnormal data sequences are extracted for subsequent fault diagnosis. Based on the judgment and extraction results, the data sequence format parameters, the SVM model structure parameters, the kernel function, etc. are revised to improve the online applicability and state-judgment accuracy of the method.
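The online part of step four amounts to running the trained classifier over each newly monitored sequence and keeping those labeled abnormal. The sketch assumes any classifier with a scikit-learn-style `predict` method (such as a fitted `SVC`) and uses hypothetical string identifiers as keys so that extracted sequences can be traced back to their strings.

```python
from typing import Dict, List, Protocol, Sequence


class Classifier(Protocol):
    """Anything with a scikit-learn-style predict method (1 = abnormal, 0 = normal)."""

    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[int]: ...


def extract_abnormal(classifier: Classifier,
                     sequences: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Judge every monitored sequence and return only those classified as abnormal."""
    ids = list(sequences)
    if not ids:
        return {}
    labels = classifier.predict([sequences[k] for k in ids])
    return {k: sequences[k] for k, label in zip(ids, labels) if label == 1}
```

Keeping the identifiers is what turns classification into localization: the keys of the returned dictionary name the abnormal strings to hand over to fault diagnosis.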
For photovoltaic power stations, the invention draws on machine learning methods to further study, analyze and fuse the characteristics of data sequences under various fault states, develops the rich content and distinct characteristics of the data sequences, and extracts abnormal data sequences. This prepares for subsequent fault diagnosis in the power station, makes operation management, maintenance and overhaul more efficient, and allows the invention to be put into practical use in photovoltaic power stations.
Description of the drawings:
Fig. 1 is the overall flow chart of the photovoltaic power station abnormal data sequence extraction method of the present invention.
Fig. 2 shows the implementation process for constructing the data sequence sample set.
Fig. 3(a) and Fig. 3(b) show the numbers of misjudgments and missed judgments, and the accuracy, when marking the current deviation-rate sequences at each confidence level.
Fig. 4(a) and Fig. 4(b) show the numbers of misjudgments and missed judgments, and the accuracy, when marking the voltage deviation-rate sequences at each confidence level.
Fig. 5(a) and Fig. 5(b) are surface plots of the misjudgment rate and the missed-judgment rate when classifying the current deviation-rate sequences under each parameter combination.
Fig. 6(a) and Fig. 6(b) are surface plots of the misjudgment rate and the missed-judgment rate when classifying the voltage deviation-rate sequences under each parameter combination.
Detailed description of the embodiments:
Embodiments of the invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the abnormal data sequence extraction method for the photovoltaic power station consists of four steps.
Step one: import the original electrical data sequences of several healthy power stations to build a complete data sequence sample set; after calculating the deviation-rate sequences and Mahalanobis distances, compare them with a health threshold and mark each sequence as normal or abnormal to obtain the data sequence training samples. The complete data sequence sample set comprises a normal data sequence sample set and an abnormal data sequence sample set. The steps for constructing the data sequence sample set are shown in Fig. 2:
S1. From several healthy power stations, select a certain number of multipath normal data sequences with good mutual consistency (same link, same type, same time period) as the normal data sequence sample set, shown in formula (1):
$$X_i = \{x_{i,t}\}, \quad t = 1, 2, \ldots, 49 \tag{1}$$
where $X_i$ is the normal data sequence of path $i$ ($i$ is the path index) and $x_{i,t}$ is the voltage and current data of the $i$-th string at time $t$;
Because the current and voltage data in the normal data sequence sample set are strongly affected by the model of the modules used in the power station, the data are normalized as shown in formula (2):
[Formula (2): image not reproduced in this text]
where $x_{aver\_t}$ is the average current or voltage at time $t$ and $n$ is the number of strings in the power station;
The data below the average are then removed and the average is recalculated; repeating this several times yields the expected voltage and current values at each time, which form a corrected mean sequence.
S2. Calculate the deviation-rate sequence $P_i$ between each normal data sequence $X_i$ and the corrected mean sequence, as shown in formula (3):
[Formula (3): image not reproduced in this text]
where $p_{i,t}$ is the deviation rate of the $i$-th string at time $t$;
Because expressing the data as deviation rates largely eliminates the influence of differing irradiance and ambient temperature on the sequence values, the deviation-rate sequences objectively reflect how much the data sequences differ from one another. The multipath deviation rates at each time are then averaged to obtain the mean deviation-rate sequence.
S3. Use the Mahalanobis distance to describe the distance between each deviation-rate sequence $P_i$ and the mean deviation-rate sequence, as shown in formula (4):
[Formula (4): image not reproduced in this text]
where $MD_i$ is the Mahalanobis distance of the $i$-th deviation-rate sequence and $\Sigma^{-1}$ is the inverse of the covariance matrix $\Sigma$ of the multipath deviation-rate sequences;
Because abnormal samples are few, they deviate strongly from the sample set and have large Mahalanobis distances. To set a reasonable health threshold, the Mahalanobis distance of each deviation-rate sequence is converted into a normally distributed variable with the Box-Cox power transformation, so that choosing the threshold becomes choosing a confidence level.
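The Box-Cox thresholding can be sketched in pure Python. Several details are assumptions, since the text does not state them: the transformation parameter $\lambda$ is chosen by maximizing the Box-Cox profile log-likelihood over a grid from -2 to 2, the transformed distances are fitted with a normal distribution, and the threshold is the upper end of the central interval at the given confidence level, mapped back by the inverse transformation. The distances must be positive, and all function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def boxcox(x, lam):
    """Box-Cox power transform of a positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def inv_boxcox(y, lam):
    """Inverse Box-Cox transform."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1
    if base <= 0:
        # Outside the range of the transform: the limit is inf for lam < 0 and 0 for lam > 0.
        return math.inf if lam < 0 else 0.0
    return base ** (1 / lam)


def boxcox_loglik(data, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    z = [boxcox(x, lam) for x in data]
    return (-len(data) / 2 * math.log(pstdev(z) ** 2)
            + (lam - 1) * sum(math.log(x) for x in data))


def health_threshold(distances, confidence, lambdas=None):
    """Map a confidence level (0-1) to a Mahalanobis-distance threshold."""
    if lambdas is None:
        lambdas = [i / 20 for i in range(-40, 41)]      # -2.0 to 2.0 in steps of 0.05
    lam = max(lambdas, key=lambda cand: boxcox_loglik(distances, cand))
    z = [boxcox(x, lam) for x in distances]
    fitted = NormalDist(fmean(z), pstdev(z))
    upper = fitted.inv_cdf((1 + confidence) / 2)        # upper end of the central interval
    return inv_boxcox(upper, lam)
```

A higher confidence level widens the interval and therefore raises the threshold, which matches the trade-off described for the marking results: too small a threshold causes misjudgments, too large a threshold causes missed judgments.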
The confidence level is the probability that a sample falls within the confidence interval, i.e. the likelihood that the sample is normal, so it is related to the proportion of abnormal samples. To choose a reasonable confidence level, the confidence interval for each candidate level is calculated and converted into a threshold by the inverse Box-Cox transformation, and the Mahalanobis distance of each deviation-rate sequence is compared with this threshold to mark the samples. The marking results are checked against the waveform characteristics of abnormal data sequences, the numbers of misjudgments $n_1$ and missed judgments $n_2$ are counted, and the accuracy $k_1$ is calculated by formula (5):
$$k_1 = 1 - \frac{n_1 + n_2}{n} \tag{5}$$
Fig. 3(a) and 3(b) show the numbers of misjudgments and missed judgments and the accuracy when marking the current deviation-rate sequences at each confidence level. As the confidence level increases, the accuracy first rises and then falls, reaching 100% for confidence levels between 61.8% and 65.8%. Below 61.8% the corresponding health threshold is too small, which easily causes misjudgments; above 65.8% the threshold is too large, which easily causes missed judgments.
Fig. 4(a) and 4(b) show the corresponding marking statistics and accuracy for the voltage deviation-rate sequences; the accuracy is highest for confidence levels between 71.2% and 75.6%. The health threshold corresponding to a confidence level of 73.4% is calculated and used to mark the voltage deviation-rate sequences.
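The chosen 73.4% is the midpoint of the 71.2% to 75.6% range in which the voltage accuracy peaks. Assuming that is the selection rule (the text does not say so explicitly), it can be written as below; `accuracy_by_level` is a hypothetical mapping from each tested confidence level to its accuracy $k_1$.

```python
def best_confidence_level(accuracy_by_level):
    """Midpoint of the first contiguous run of confidence levels with the highest k1.

    accuracy_by_level: dict mapping confidence level (0-1) to accuracy k1.
    """
    levels = sorted(accuracy_by_level)
    best = max(accuracy_by_level.values())
    start = next(i for i, c in enumerate(levels) if accuracy_by_level[c] == best)
    end = start
    while end + 1 < len(levels) and accuracy_by_level[levels[end + 1]] == best:
        end += 1
    return (levels[start] + levels[end]) / 2
```

Taking the midpoint keeps the threshold as far as possible from both edges of the peak, where misjudgments or missed judgments start to appear.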
S4. Select a sufficient number of data sequences (multiple links, types and time periods), covering both normal and abnormal states, as the complete data sequence sample set; repeat the calculations of S1 and S2, compute the Mahalanobis distances of all data sequences, compare them with the health threshold, and mark each data sequence as normal or abnormal. The result is a data sequence sample set that can be used to train the support vector machine model.
Step two: design and train a support vector machine model. The SVM is trained on the data sequence sample set obtained above to yield a classifier that quickly classifies all data sequences in the sample set and judges the normal/abnormal state of the electrical data sequences of the photovoltaic power station;
The specific steps for training the support vector machine model are as follows:
a. Based on the abnormal features of the data sequences in the photovoltaic power station, construct the SVM model, which seeks the optimal separating hyperplane on the training sample set obtained in step one. In this invention the number of samples is far larger than the number of sample features; the polynomial kernel has many parameters and its complexity grows with its order, whereas the Gaussian kernel maps the sample features into an infinite-dimensional space, so the Gaussian kernel is the more suitable choice;
b. When constructing the SVM model, select the penalty factor $C$ and the Gaussian kernel parameter $g$ by combining grid search with cross-validation;
c. To characterize the classification performance under different parameter combinations, introduce the misjudgment rate $k_2$ and the missed-judgment rate $k_3$, as shown in formula (6):
[Formula (6): image not reproduced in this text]
where $n_p$ and $n_n$ are the numbers of normal and abnormal samples.
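Formula (6) is not legible in this text. A plausible reading, consistent with the definitions of $n_p$ and $n_n$, is $k_2 = n_1/n_p$ (normal samples judged abnormal) and $k_3 = n_2/n_n$ (abnormal samples judged normal). The sketch below implements that assumed form with labels 0 = normal and 1 = abnormal; the function name is hypothetical.

```python
def misjudgment_rates(true_labels, predicted_labels):
    """Return (k2, k3) under the assumed reading of formula (6).

    k2 = normal samples judged abnormal / n_p
    k3 = abnormal samples judged normal / n_n
    """
    pairs = list(zip(true_labels, predicted_labels))
    n_p = sum(1 for y, _ in pairs if y == 0)               # number of normal samples
    n_n = sum(1 for y, _ in pairs if y == 1)               # number of abnormal samples
    false_alarms = sum(1 for y, p in pairs if y == 0 and p == 1)
    misses = sum(1 for y, p in pairs if y == 1 and p == 0)
    k2 = false_alarms / n_p if n_p else 0.0
    k3 = misses / n_n if n_n else 0.0
    return k2, k3
```

Normalizing each rate by its own class size keeps $k_3$ meaningful even though abnormal samples are far fewer than normal ones.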
The current misjudgment-rate surface in Fig. 5(a) shows that, as $g$ increases, $k_2$ first increases, then decreases, and then remains unchanged. For $g \in [2^{-9}, 2^{-1.2}]$, $k_2$ is essentially 0; for $g \in [2^{-1.2}, 2^{2.8}]$, $k_2$ first increases and then decreases as $C$ increases; for $g \in [2^{2.8}, 2^{12}]$, $k_2$ is essentially 0.
The current missed-judgment-rate surface in Fig. 5(b) shows that, as $g$ increases, $k_3$ first decreases, then increases, and then remains unchanged. For $g \in [2^{-9}, 2^{0.4}]$, $k_3$ decreases gradually as $C$ increases; for $g \in [2^{0.4}, 2^{4.4}]$, $k_3$ first decreases and then increases as $C$ increases; for $g \in [2^{4.4}, 2^{12}]$, $k_3$ remains essentially unchanged as $C$ increases.
As the parameter combination $(C, g)$ varies, $k_2$ and $k_3$ generally show opposite trends. To balance the two, $g_i = 4$ and $C_i = 0.19$ are selected for the current sequences, giving $k_1 = 97.30\%$, $k_2 = 1.32\%$ and $k_3 = 4.17\%$.
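The text balances $k_2$ against $k_3$ without stating a formal criterion. One simple, assumed rule is to minimize $k_2 + k_3$ and, among ties, prefer the pair whose two rates are closest; `results` is a hypothetical dictionary of cross-validated rates per parameter pair.

```python
def balanced_parameters(results):
    """Return the (C, g) key whose (k2, k3) rates are jointly smallest.

    results: dict mapping (C, g) to (k2, k3).
    Ranks by k2 + k3 first, then by |k2 - k3| to prefer balanced errors.
    """
    def score(params):
        k2, k3 = results[params]
        return (k2 + k3, abs(k2 - k3))

    return min(results, key=score)
```

Other weightings are equally defensible; for example, a station that cannot afford to miss faults might weight $k_3$ more heavily than $k_2$.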
The voltage misjudgment-rate surface in Fig. 6(a) shows that, as $g$ increases, $k_2$ first increases, then decreases, and then remains unchanged. For $g \in [2^{-9}, 2^{5.2}]$, $k_2$ is essentially 0; for $g \in [2^{5.2}, 2^{8.4}]$, $k_2$ first increases and then decreases as $C$ increases; for $g \in [2^{8.4}, 2^{12}]$, $k_2$ is essentially 0.
The voltage missed-judgment-rate surface in Fig. 6(b) shows that, as $g$ increases, $k_3$ first decreases, then increases, and then remains unchanged. For $g \in [2^{-9}, 2^{-3.8}]$, $k_3$ decreases gradually as $C$ increases; for $g \in [2^{-3.8}, 2^{2.4}]$, $k_3$ first decreases and then increases as $C$ increases; for $g \in [2^{2.4}, 2^{12}]$, $k_3$ remains essentially unchanged as $C$ increases. Accordingly, $g_u = 4.92$ and $C_u = 1.32$ are selected for the voltage sequences, giving $k_1 = 99.80\%$, $k_2 = 0\%$ and $k_3 = 0.42\%$.
Combining grid search with cross-validation thus determines the optimal model parameters $g_i = 4$, $C_i = 0.19$ (current) and $g_u = 4.92$, $C_u = 1.32$ (voltage). With these parameters the method achieves good classification performance, laying the foundation for extracting abnormal data sequences in problem power stations.
Step three: judge the states of various data sequences with the SVM. In combination with on-site operation and maintenance, and following how on-site abnormalities or faults arise and evolve, the data sequence format is carefully designed along three dimensions: a link axis (generation/transmission/distribution/grid-connection point), a data axis (current/voltage/power) and a time axis (hour/day/month/season). The SVM is then used to judge data sequences containing various types of abnormalities or faults, verifying the reliability and effectiveness of the method;
The data sequence format covers the time length, sampling period, data type, decision frequency, etc. For abnormalities or faults that need urgent repair, the data sequence is given a short time length, a small sampling period and a high decision frequency; for example, for a conduction fault of a photovoltaic module bypass diode, voltage data is selected as the data type;
Step four: after SVM state judgment has been completed for multiple data types, the voltage, current and power data sequences detected and monitored by the power station are judged online for abnormal states, classified and identified as normal or abnormal, and the abnormal data sequences are extracted for subsequent fault diagnosis. Based on the judgment and extraction results, the data sequence format parameters, the SVM model structure parameters, the kernel function, etc. are revised to improve the online applicability and state-judgment accuracy of the method.

Claims (3)

1. A photovoltaic power station abnormal data sequence extraction method, characterized in that: because photovoltaic power station monitoring data is characterized by large scale, diversity and low value density, and in order to address the analysis and feature extraction of high-dimensional complex data, the invention adopts a classification principle based on a support vector machine: it first builds a data sequence sample set along three dimensions (a link axis, a data axis and a time axis), then uses a trained support vector machine model to judge the abnormal states of various types of data sequences, and finally analyzes, compares, judges and extracts the abnormal data sequences; the specific implementation steps are as follows:
step one: import the original electrical data sequences of several healthy power stations to build a complete data sequence sample set; analyze the waveform characteristics of abnormal data sequences to summarize their waveform change patterns; after calculating the deviation-rate sequences and Mahalanobis distances, compare the data sequences with a set health threshold and mark them as normal or abnormal to obtain the data sequence training samples, wherein the complete data sequence sample set comprises a normal data sequence sample set and an abnormal data sequence sample set;
step two: design and train a support vector machine model: train the support vector machine on the obtained data sequence sample set to obtain a classifier model, determine the optimal model parameters by grid search and cross-validation, and then quickly classify all data sequences in the sample set to judge the normal/abnormal state of the electrical data sequences of the photovoltaic power station;
step three: judge the states of various data sequences with the support vector machine: in combination with on-site operation and maintenance, and following how on-site abnormalities or faults arise and evolve, carefully design the data sequence format along three dimensions (a link axis, a data axis and a time axis), use the support vector machine to judge data sequences containing various abnormalities or faults, and verify the reliability and effectiveness of the method;
the data sequence format is as follows: time length, sampling period, data type, decision frequency, etc.; aiming at urgent repair abnormity or faults, a data sequence is set to be short in time length, small in sampling period and high in judgment frequency, for example, a photovoltaic module bypass diode conduction fault, voltage data and the like are selected according to the data type;
step four: after the state judgment of the support vector machine with multiple data types is completed, the on-line abnormal state judgment of the voltage, current and power data sequences is detected and monitored by the power station, normal/abnormal state classification and identification are realized, and the abnormal data sequences are extracted for the next fault diagnosis. According to the judging and extracting effects, the data sequence format parameters, the support vector machine model structure parameters, the kernel functions and the like are revised, and the online applicability and the state judging accuracy of the method are improved.
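The format parameters of step three and the online judgment of step four fit together naturally: the time length and sampling period fix how many samples one data sequence holds, and the decision frequency fixes how far the window moves between judgments. The following Python sketch assumes a fixed-length sliding window over a single monitored signal. `SequenceFormat`, `extract_abnormal_windows` and every numeric value are hypothetical (the patent names the parameters but gives no values), and `is_abnormal` only stands in for the trained SVM's decision.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class SequenceFormat:
    """Hypothetical container for the data sequence format parameters."""

    time_length_s: int      # time span covered by one data sequence, in seconds
    sampling_period_s: int  # interval between consecutive samples, in seconds
    data_type: str          # e.g. "voltage", "current" or "power"
    decision_period_s: int  # interval between judgments (1 / decision frequency)

    def window_and_step(self) -> Tuple[int, int]:
        """Samples per data sequence and samples between successive judgments."""
        for value in (self.time_length_s, self.decision_period_s):
            if value % self.sampling_period_s:
                raise ValueError("durations must be whole multiples of the sampling period")
        return (self.time_length_s // self.sampling_period_s,
                self.decision_period_s // self.sampling_period_s)


def extract_abnormal_windows(
    stream: Sequence[float],
    fmt: SequenceFormat,
    is_abnormal: Callable[[List[float]], bool],
) -> Iterator[Tuple[int, List[float]]]:
    """Slide a window over one monitored signal; yield (start index, segment) if abnormal."""
    window, step = fmt.window_and_step()
    for start in range(0, len(stream) - window + 1, step):
        segment = list(stream[start:start + window])
        if is_abnormal(segment):  # stand-in for the trained SVM's decision
            yield start, segment  # start index locates the segment for fault diagnosis


# Illustrative values only: an urgent-repair format, e.g. for a bypass-diode conduction
# fault, uses voltage data with a short window, fine sampling and frequent decisions.
URGENT_BYPASS_DIODE = SequenceFormat(600, 10, "voltage", 60)
```

With the illustrative urgent format, each judgment looks at the latest 60 samples and a new judgment is made every 6 samples; a routine format would use a longer window and a larger step. Swapping `is_abnormal` for a toy rule such as `lambda s: min(s) < 0.5 * max(s)` is enough to check that a dip in an otherwise flat signal is reported with the start index of the window containing it.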
2. The photovoltaic power station abnormal data sequence extraction method according to claim 1, wherein the specific steps for obtaining the data sequence training sample set in step one are as follows:
S1. From several healthy power stations, select a certain number of multipath normal data sequences with good consistency (same link, same type and same time period) as the normal data sequence sample set, given by formula (1):
$$X_i = \{x_{i,t}\},\quad t = 1, 2, \ldots, 49 \qquad (1)$$
where $X_i$ in formula (1) is a normal data sequence sample ($i$ is the path index) and $x_{i,t}$ is the voltage or current data of the $i$-th string at time $t$;
The current and voltage data in the normal data sequence sample set are easily affected by the component models used in the power station, so they are normalized as shown in formula (2):
[Formula (2): image not reproduced]
where $x_{aver\_t}$ in formula (2) is the mean of the current and voltage at time $t$, and $n$ is the number of series branches (strings) in the power station;
The mean is recalculated after the data below it are removed; repeating this calculation several times yields the expected value of the voltage and current data at each time $t$, and these expected values together form a corrected mean sequence.
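A minimal Python sketch of the corrected mean in S1, assuming the inputs have already been normalized by formula (2) (whose exact form is not reproduced) and that "repeating several times" means a fixed number of trimming rounds. The function name and the `iterations=2` default are illustrative choices, not values from the patent.

```python
from typing import List, Sequence


def corrected_mean_sequence(strings: Sequence[Sequence[float]], iterations: int = 2) -> List[float]:
    """Per-time-step mean across strings, recomputed after discarding values below the mean.

    strings[i][t] is the (already normalized) value of string i at time t.
    """
    result = []
    for t in range(len(strings[0])):
        values = [s[t] for s in strings]
        mean = sum(values) / len(values)
        for _ in range(iterations):
            kept = [v for v in values if v >= mean]  # drop data below the current mean
            if not kept:  # rounding can push the mean just above identical values
                break
            values = kept
            mean = sum(values) / len(values)
        result.append(mean)
    return result
```

Each round keeps only values at or above the previous mean, so the corrected mean can only rise; presumably the intent is that strings pulled down by faults or shading stop biasing the reference level.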
S2. Calculate the deviation-rate sequence $P_i$ between each normal data sequence $X_i$ and the corrected mean sequence, as shown in formula (3):
[Formula (3): image not reproduced]
where $p_{i,t}$ in formula (3) is the deviation rate of the $i$-th branch at time $t$;
Because expressing the data as deviation rates largely eliminates the influence of different irradiance levels and ambient temperatures on the numerical values of the data sequences, the deviation-rate sequences objectively reflect the degree of difference between data sequences. The mean of the multipath deviation rates at each time is then calculated to obtain the deviation-rate mean sequence.
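Formula (3) is not reproduced in the text, so the Python sketch below assumes the usual definition of a deviation rate relative to the corrected mean, p_{i,t} = (x_{i,t} - m_t) / m_t, where m_t is the corrected mean at time t. Both the formula and the function name are assumptions.

```python
from typing import List, Sequence, Tuple


def deviation_rates(
    strings: Sequence[Sequence[float]], mean_seq: Sequence[float]
) -> Tuple[List[List[float]], List[float]]:
    """Return the deviation-rate sequences P and their per-time-step mean.

    Assumed form of formula (3): p[i][t] = (x[i][t] - m[t]) / m[t].
    """
    if any(m == 0 for m in mean_seq):
        raise ValueError("corrected mean is zero at some time step")
    P = [[(x - m) / m for x, m in zip(row, mean_seq)] for row in strings]
    n = len(P)
    # Deviation-rate mean sequence: average over the paths at each time step.
    p_mean = [sum(row[t] for row in P) / n for t in range(len(mean_seq))]
    return P, p_mean
```

Dividing by the corrected mean is what cancels the irradiance and temperature level shared by all strings at the same time, which is the property the text relies on.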
S3. Describe the difference between each multipath deviation-rate sequence $P_i$ and the deviation-rate mean sequence by the Mahalanobis distance. Then make the Mahalanobis distances of the multipath deviation-rate sequences follow a normal distribution through the Box-Cox power transformation, and calculate their mathematical expectation and standard deviation. A suitable confidence level is selected based on the operation and maintenance experience of the photovoltaic power station. The confidence level is the probability that a sample falls within the confidence interval, i.e., the probability that the sample is normal, so it is tied to the proportion of abnormal samples. To determine a reasonable confidence level, the confidence interval for each candidate confidence level is calculated and the corresponding threshold is obtained by the inverse Box-Cox transformation. The Mahalanobis distance of each deviation-rate sequence is compared with the threshold to label the samples, the labeling result is checked against the waveform characteristics of abnormal data sequences, the number of misjudgments $n_1$ and the number of missed judgments $n_2$ are counted, and the accuracy $k_1$ is calculated according to formula (4):
$$k_1 = 1 - \frac{n_1 + n_2}{n} \qquad (4)$$
where $n$ is the number of samples;
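The thresholding and confidence-level selection in S3 can be sketched with the Python standard library alone, under several assumptions. The Mahalanobis distances are taken as already computed and strictly positive (Box-Cox is undefined at zero). The Box-Cox parameter `lam` is passed in as given; in practice it is usually fitted by maximum likelihood, which `scipy.stats.boxcox` does when no `lmbda` is supplied. The health threshold is taken to be the upper limit of a two-sided normal confidence interval. Reading n_1 as normal samples flagged abnormal and n_2 as abnormal samples not flagged is an interpretation of the text, and all function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Dict, Sequence, Tuple


def boxcox(x: float, lam: float) -> float:
    """Box-Cox power transform; x must be strictly positive."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def inv_boxcox(y: float, lam: float) -> float:
    """Inverse Box-Cox transform (requires lam * y + 1 > 0 when lam != 0)."""
    return math.exp(y) if lam == 0 else (lam * y + 1) ** (1 / lam)


def health_threshold(distances: Sequence[float], lam: float, confidence: float) -> float:
    """Upper limit of the two-sided normal confidence interval, mapped back to distances."""
    z = [boxcox(d, lam) for d in distances]
    upper = NormalDist(mean(z), stdev(z)).inv_cdf((1 + confidence) / 2)
    return inv_boxcox(upper, lam)


def choose_confidence(
    distances: Sequence[float],
    reference_abnormal: Sequence[bool],
    lam: float,
    levels: Sequence[float],
) -> Tuple[float, Dict[float, float]]:
    """Score each candidate confidence level by the accuracy k1 of formula (4)."""
    n = len(distances)
    scores: Dict[float, float] = {}
    for level in levels:
        thr = health_threshold(distances, lam, level)
        flagged = [d > thr for d in distances]
        # n1: normal samples flagged abnormal; n2: abnormal samples not flagged (assumed)
        n1 = sum(f and not r for f, r in zip(flagged, reference_abnormal))
        n2 = sum(r and not f for f, r in zip(flagged, reference_abnormal))
        scores[level] = 1 - (n1 + n2) / n
    best = max(levels, key=lambda level: scores[level])
    return best, scores
```

Here `reference_abnormal` plays the role of the waveform-based check; a higher confidence level widens the interval and trades misjudgments for missed judgments, which is why the level is chosen by k_1 rather than fixed in advance.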
S4. Select a sufficient number of data sequences (multiple links, multiple types and multiple time periods), covering both normal and abnormal states, as the complete data sequence sample set; repeat the calculation process of S1 to S2, obtain the Mahalanobis distances of all data sequences and compare them with the health threshold; then classify and label the data sequences by normal/abnormal state to obtain a data sequence sample set that can be used to train the support vector machine model.
3. The photovoltaic power station abnormal data sequence extraction method according to claim 1, wherein the specific steps for designing and training the support vector machine model in step two are as follows:
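The patent does not say which covariance matrix its Mahalanobis distance uses. The Python sketch below assumes each time step is one dimension and the covariance is estimated across the paths, which needs more paths than time steps for the matrix to be invertible; the optional `ridge` term is an addition, not from the source, that regularizes the diagonal when this does not hold. The helper names and the label convention (1 = abnormal, 0 = normal) are hypothetical.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular; use more sequences or a ridge")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_to_mean(P: Sequence[Sequence[float]], ridge: float = 0.0) -> List[float]:
    """Mahalanobis distance of each deviation-rate sequence from the mean sequence."""
    n, dim = len(P), len(P[0])
    mu = [sum(row[t] for row in P) / n for t in range(dim)]  # deviation-rate mean sequence
    cov = [
        [sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in P) / (n - 1)
         + (ridge if a == b else 0.0) for b in range(dim)]
        for a in range(dim)
    ]
    dists = []
    for row in P:
        diff = [row[t] - mu[t] for t in range(dim)]
        y = _solve(cov, diff)  # y = cov^-1 @ diff without forming the inverse
        dists.append(math.sqrt(max(0.0, sum(u * v for u, v in zip(diff, y)))))
    return dists


def label_by_threshold(dists: Sequence[float], threshold: float) -> List[int]:
    """1 = abnormal (distance above the health threshold), 0 = normal."""
    return [1 if d > threshold else 0 for d in dists]
```

Solving the linear system instead of inverting the covariance is the usual numerically safer choice; the resulting labels, paired with the deviation-rate sequences, form the training set for the support vector machine.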
a. According to the abnormal features of data sequences in the photovoltaic power station, construct a support vector machine model that seeks the optimal separating hyperplane for the training sample set obtained in step one. In this invention, the number of samples from the photovoltaic power station far exceeds the number of sample features; the polynomial kernel has more parameters and its complexity grows as its order increases, whereas the Gaussian kernel maps the sample features into an infinite-dimensional space, so the Gaussian kernel is the more suitable choice;
b. In constructing the support vector machine model, select the penalty factor $C$ and the Gaussian kernel parameter $g$ by combining grid search with cross-validation;
c. Introduce the misjudgment rate $k_2$ and the missed-judgment rate $k_3$ to characterize the classification performance under different parameter combinations, as shown in formula (5):
[Formula (5): image not reproduced]
where $n_p$ and $n_n$ in formula (5) are the numbers of normal and abnormal samples, respectively.
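Items b and c can be sketched with scikit-learn, which is an assumption since the patent names no library. `SVC(kernel="rbf")` uses K(x, x') = exp(-gamma * ||x - x'||^2), so the patent's Gaussian kernel parameter g corresponds to `gamma`, as with LIBSVM's `-g` option. The exponential grid follows the coarse ranges suggested in the LIBSVM practical guide and would normally be refined around the best cell. Formula (5) is not reproduced, so `error_rates` assumes k_2 = n_1 / n_p (normal samples misjudged as abnormal) and k_3 = n_2 / n_n (abnormal samples missed); that reading and both function names are hypothetical.

```python
from typing import Sequence, Tuple

from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC


def tune_rbf_svm(X, y, folds: int = 5, seed: int = 0):
    """Pick C and the RBF width g (gamma in scikit-learn) by grid search with k-fold CV."""
    grid = {
        "C": [2.0 ** k for k in range(-5, 16, 2)],      # 2^-5, 2^-3, ..., 2^15
        "gamma": [2.0 ** k for k in range(-15, 4, 2)],  # 2^-15, 2^-13, ..., 2^3
    }
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=cv, scoring="accuracy")
    search.fit(X, y)  # by default the best combination is refit on all of X, y
    return search.best_params_, search.best_score_, search.best_estimator_


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """(k2, k3) under an assumed reading of formula (5); labels 0 = normal, 1 = abnormal."""
    n_p = sum(1 for t in y_true if t == 0)
    n_n = sum(1 for t in y_true if t == 1)
    if n_p == 0 or n_n == 0:
        raise ValueError("need at least one normal and one abnormal sample")
    n1 = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # misjudged
    n2 = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed
    return n1 / n_p, n2 / n_n
```

If the grid search should optimize k_2 or k_3 directly rather than plain accuracy, either rate can be wrapped with `sklearn.metrics.make_scorer(..., greater_is_better=False)`, since lower rates are better.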
CN202210218937.0A 2022-03-08 2022-03-08 Photovoltaic power station abnormal data sequence extraction method Withdrawn CN114595762A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210218937.0A CN114595762A (en) 2022-03-08 2022-03-08 Photovoltaic power station abnormal data sequence extraction method

Publications (1)

Publication Number Publication Date
CN114595762A true CN114595762A (en) 2022-06-07

Family

ID=81815749

Country Status (1)

Country Link
CN (1) CN114595762A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116930595A (en) * 2023-09-18 2023-10-24 法拉迪电气有限公司 Accurate data metering method for new energy grid-connected voltage regulation
CN116930595B (en) * 2023-09-18 2023-11-24 法拉迪电气有限公司 Accurate data metering method for new energy grid-connected voltage regulation
CN117114254A (en) * 2023-10-25 2023-11-24 山东电力工程咨询院有限公司 Power grid new energy abnormal data monitoring method and system
CN117114254B (en) * 2023-10-25 2024-03-19 山东电力工程咨询院有限公司 Power grid new energy abnormal data monitoring method and system

Similar Documents

Publication Publication Date Title
CN110288136B (en) Wind power multi-step prediction model establishment method
CN109308571B (en) Distribution line variable relation detection method
CN111680820B (en) Distributed photovoltaic power station fault diagnosis method and device
CN114595762A (en) Photovoltaic power station abnormal data sequence extraction method
CN108876163B (en) Transient state power angle stability rapid evaluation method integrating causal analysis and machine learning
CN104573879A (en) Photovoltaic power station output predicting method based on optimal similar day set
CN113011481B (en) Electric energy meter function abnormality assessment method and system based on decision tree algorithm
CN116911806B (en) Internet + based power enterprise energy information management system
CN115758151A (en) Combined diagnosis model establishing method and photovoltaic module fault diagnosis method
CN110968703B (en) Method and system for constructing abnormal metering point knowledge base based on LSTM end-to-end extraction algorithm
CN116432123A (en) Electric energy meter fault early warning method based on CART decision tree algorithm
CN115859099A (en) Sample generation method and device, electronic equipment and storage medium
CN115146723A (en) Electrochemical model parameter identification method based on deep learning and heuristic algorithm
CN112926686B (en) BRB and LSTM model-based power consumption anomaly detection method and device for big power data
CN104951654A (en) Method for evaluating reliability of large-scale wind power plant based on control variable sampling
CN112327190B (en) Method for identifying health state of energy storage battery
CN107730399B (en) Theoretical line loss evaluation method based on wind power generation characteristic curve
CN116681312B (en) Ecological-oriented multi-objective reservoir optimal scheduling decision method and system
CN116842337A (en) Transformer fault diagnosis method based on LightGBM optimal features and a COA-CNN model
CN116436405A (en) Hot spot fault diagnosis method for photovoltaic string
CN116663393A (en) Random forest-based power distribution network continuous high-temperature fault risk level prediction method
CN114676931B (en) Electric quantity prediction system based on data center technology
CN115796341A (en) Carbon effect code-based collaborative measure method for enterprise low-carbon economic performance
CN115733258A (en) Control method of all-indoor intelligent substation system based on Internet of things technology
CN115828165B (en) New energy intelligent micro-grid data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220607
