CN112883019A - Data processing method and system - Google Patents
- Publication number
- CN112883019A (application CN202110194372.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- wind speed
- data set
- processed
- power
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The application provides a data processing method and system. The data processing method comprises the following steps: acquiring at least part of the data in a SCADA data set as a data set to be processed, wherein the data set to be processed comprises a plurality of data groups, and each data group comprises wind speed data and the corresponding output power data at the same moment; performing data cleaning on the data set to be processed by a quartile method to obtain an intermediate data set; dividing the intermediate data set into two sub data sets according to the cut-in wind speed V_min, the cut-out wind speed V_max and the rated wind speed V_r of the wind generating set, wherein the wind speed data of one sub data set is in the range [V_min, V_r) and the wind speed data of the other sub data set is in the range [V_r, V_max]; and performing data cleaning on the sub data sets by a bin algorithm. The quality of the SCADA data can thereby be improved.
Description
Technical Field
The application relates to the field of wind driven generators, in particular to a data processing method and system.
Background
The quality of wind generating set data directly determines the prediction quality and generalization capability of a model. Data quality involves many factors, such as accuracy, completeness, consistency, timeliness, credibility and interpretability. The data acquired by the SCADA system of a wind generating set in actual production may contain a large amount of abnormal data, such as missing values and noise, and may also contain abnormal data caused by manual entry errors, which hinders the training and development of models such as monitoring and early-warning models. Therefore, how to clean abnormal data during data analysis of the wind generating set, so as to eliminate fluctuation in the SCADA data and improve its quality, is a problem to be solved in the field.
Disclosure of Invention
The application provides a data processing method and system.
Specifically, the method is realized through the following technical scheme:
According to a first aspect of embodiments of the present application, there is provided a data processing method for processing a SCADA data set of a wind turbine generator system acquired by a SCADA system, including:
acquiring at least part of data in the SCADA data set as a data set to be processed, wherein the data set to be processed comprises a plurality of data groups, and each data group comprises wind speed data at the same moment and output power data of the wind generating set corresponding to the wind speed data;
carrying out data cleaning on the data set to be processed by adopting a quartile method to obtain an intermediate data set;
dividing the intermediate data set into two sub data sets according to the cut-in wind speed V_min, the cut-out wind speed V_max and the rated wind speed V_r of the wind generating set, wherein the wind speed data of one sub data set is in the range [V_min, V_r) and the wind speed data of the other sub data set is in the range [V_r, V_max];
and performing data cleaning on the sub data sets by a bin algorithm to obtain a processed data set.
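The division into two sub data sets can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `split_by_rated_speed`, the tuple layout of a data group and the sample turbine parameters are all assumptions introduced here.

```python
from typing import List, Tuple

# A data group is modelled as a (wind_speed, output_power) pair taken at the
# same moment. This layout is an assumption of the sketch.
DataGroup = Tuple[float, float]


def split_by_rated_speed(
    groups: List[DataGroup], v_min: float, v_r: float, v_max: float
) -> Tuple[List[DataGroup], List[DataGroup]]:
    """Split data groups into the ranges [v_min, v_r) and [v_r, v_max]."""
    below_rated = [g for g in groups if v_min <= g[0] < v_r]
    above_rated = [g for g in groups if v_r <= g[0] <= v_max]
    return below_rated, above_rated


# Hypothetical turbine parameters (m/s) and samples (m/s, kW), for illustration only.
sample = [(2.0, 0.0), (3.0, 40.0), (8.5, 900.0), (11.0, 2000.0), (25.0, 2000.0), (26.0, 0.0)]
low, high = split_by_rated_speed(sample, v_min=3.0, v_r=11.0, v_max=25.0)
print(low)   # groups with 3.0 <= wind speed < 11.0
print(high)  # groups with 11.0 <= wind speed <= 25.0
```

Groups whose wind speed lies outside [V_min, V_max] fall into neither sub data set in this sketch.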
Optionally, the performing data cleaning on the data set to be processed by using a quartile method to obtain an intermediate data set includes:
determining abnormal wind speed data in the data set to be processed by adopting a quartile method;
and removing the abnormal wind speed data and other data of the data group comprising the abnormal wind speed data to obtain an intermediate data set.
Optionally, the method further comprises:
dividing the output power data into a plurality of power data groups, and obtaining the output power data of each power data group and corresponding wind speed data;
the method for determining the abnormal wind speed data in the data set to be processed by adopting the quartile method comprises the following steps:
and respectively determining abnormal wind speed data in the wind speed data corresponding to the plurality of power data groups by adopting a quartile method.
Optionally, the determining, by using a quartile method, abnormal wind speed data in wind speed data corresponding to a plurality of power data groups respectively includes:
respectively determining the inner limit wind speed range of the wind speed data in the plurality of power data groups by adopting a quartile method;
and determining abnormal wind speed data in the wind speed data corresponding to the power data group according to the inner limit wind speed range.
Optionally, the determining, according to the inner limit wind speed range, abnormal wind speed data in the wind speed data corresponding to the power data group includes:
adjusting the maximum wind speed of the inner limit wind speed range according to a preset upper limit margin coefficient, and adjusting the minimum wind speed of the inner limit wind speed range according to a preset lower limit margin coefficient to obtain a new inner limit wind speed range;
the abnormal wind speed data is the wind speed data which exceeds the new inner limit wind speed range in the power data group.
Optionally, the performing data cleaning on the sub data set by using a bin algorithm to obtain a processed data set includes:
determining abnormal output power data in each sub data set by using a bin algorithm;
and removing the abnormal output power data and other data of the data group comprising the abnormal output power data to obtain a processed data set.
Optionally, the determining abnormal output power data in each sub data set by using a bin algorithm includes:
dividing the wind speed data of the sub data set into a plurality of wind speed sections;
determining a power expected value and a power standard deviation of the wind speed section;
and determining abnormal output power data in the output power data corresponding to the wind speed section according to the power expected value and the power standard deviation.
Optionally, for the sub data set whose wind speed data is in the range [V_min, V_r), determining the power expected value of the wind speed section includes:
dividing the wind speed section into a plurality of first sub-wind speed sections;
and determining the power expected value of the wind speed section according to the probability that the wind speed data falls in each first sub-wind speed section and the output power data corresponding to each first sub-wind speed section.
Optionally, for the sub data set whose wind speed data is in the range [V_min, V_r), determining abnormal output power data in the output power data corresponding to the wind speed section according to the power expected value and the power standard deviation includes:
determining a first effective power range as [power expected value ± 3 × power standard deviation];
and determining that output power data outside the first effective power range, in the wind speed section corresponding to the sub data set whose wind speed data is in the range [V_min, V_r), is abnormal output power data.
Optionally, for the sub data set whose wind speed data is in the range [V_r, V_max], determining the power expected value of the wind speed section includes:
dividing the wind speed section into a plurality of second sub-wind speed sections;
and determining the power expected value of the wind speed section as the rated power of the wind generating set.
Optionally, for the sub data set whose wind speed data is in the range [V_r, V_max], determining abnormal output power data in the output power data corresponding to the wind speed section according to the power expected value and the power standard deviation includes:
determining a second effective power range as [power expected value ± power standard deviation];
and determining that output power data outside the second effective power range, in the wind speed section corresponding to the sub data set whose wind speed data is in the range [V_r, V_max], is abnormal output power data.
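The range test applied to one wind speed section can be sketched as follows. This is a simplified illustration under stated assumptions: the helper `clean_section` is hypothetical, the sample mean of the section stands in for the probability-weighted power expected value, the population standard deviation of the section is used as the power standard deviation, and all sample values are invented.

```python
import statistics
from typing import List, Optional, Tuple

DataGroup = Tuple[float, float]  # (wind_speed, output_power)


def clean_section(
    groups: List[DataGroup],
    k_sigma: float,
    expected_power: Optional[float] = None,
) -> Tuple[List[DataGroup], List[DataGroup]]:
    """Return (kept, removed) data groups for one wind speed section.

    If expected_power is None, the sample mean of the section is used as
    the power expected value (an assumption of this sketch).
    """
    powers = [p for _, p in groups]
    if len(powers) < 2:
        return list(groups), []  # too few points to estimate a deviation
    mu = statistics.fmean(powers) if expected_power is None else expected_power
    sigma = statistics.pstdev(powers)
    low, high = mu - k_sigma * sigma, mu + k_sigma * sigma
    kept = [g for g in groups if low <= g[1] <= high]
    removed = [g for g in groups if not (low <= g[1] <= high)]
    return kept, removed


# Section below rated wind speed: expected value +/- 3 standard deviations.
below = [(7.0, 100.0)] * 20 + [(7.0, 500.0)]
kept_b, removed_b = clean_section(below, k_sigma=3.0)

# Section at or above rated wind speed: rated power (hypothetical 2000 kW)
# +/- 1 standard deviation.
above = [(14.0, 2000.0), (14.0, 1990.0), (14.0, 2010.0),
         (14.0, 2005.0), (14.0, 1995.0), (14.0, 1200.0)]
kept_a, removed_a = clean_section(above, k_sigma=1.0, expected_power=2000.0)
print(removed_b, removed_a)
```

Removing a whole data group, rather than only the power value, mirrors the step of removing the abnormal output power data together with the other data of its data group.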
Optionally, the method further comprises:
determining a wind speed expected value of the wind speed section;
and fitting a curve representing the relation between the wind speed and the output power of the wind generating set according to the wind speed expected value and the power expected value of each wind speed section to obtain a curve graph.
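The per-section expected values that such a curve is drawn through can be sketched as follows. This is only an illustration: it assumes equal-width wind speed sections and uses sample means as the wind speed and power expected values, and it stops short of the fitting step itself. The function name `power_curve_points` and the sample data are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

DataGroup = Tuple[float, float]  # (wind_speed, output_power)


def power_curve_points(
    groups: List[DataGroup], v_start: float, width: float
) -> List[Tuple[float, float]]:
    """Return one (mean wind speed, mean power) point per wind speed section."""
    sections: Dict[int, List[DataGroup]] = defaultdict(list)
    for ws, p in groups:
        # Index of the equal-width section that contains this wind speed.
        sections[int((ws - v_start) // width)].append((ws, p))
    points = []
    for idx in sorted(sections):
        ws_mean = statistics.fmean(ws for ws, _ in sections[idx])
        p_mean = statistics.fmean(p for _, p in sections[idx])
        points.append((ws_mean, p_mean))
    return points


# Invented samples (m/s, kW) in two 1 m/s sections starting at 3 m/s.
data = [(3.1, 50.0), (3.3, 70.0), (4.2, 150.0), (4.4, 170.0)]
points = power_curve_points(data, v_start=3.0, width=1.0)
print(points)
```

The resulting points could then be joined or smoothed by whatever fitting method is chosen; the sketch does not prescribe one.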
Optionally, after the removing the abnormal output power data and other data of the data group including the abnormal output power data to obtain the processed data set, the method further includes:
the processed data set is displayed on the graph in a first display mode.
Optionally, the method further comprises:
displaying the removed data set on the graph in a second display mode;
the first display mode is different from the second display mode.
Optionally, the acquiring at least part of the data in the SCADA dataset as a to-be-processed dataset includes:
and acquiring at least part of data with set time dimension in the SCADA data set as the data set to be processed.
Optionally, after acquiring at least part of the data in the SCADA data set as the data set to be processed, and before performing data cleaning on the data set to be processed by the quartile method to obtain the intermediate data set, the method further includes:
preprocessing the data set to be processed;
the performing data cleaning on the data set to be processed by the quartile method to obtain the intermediate data set includes:
performing data cleaning on the preprocessed data set to be processed by the quartile method to obtain the intermediate data set.
Optionally, the step of pre-processing comprises:
and replacing the data which is infinite in the data set to be processed with a default value.
Optionally, the step of preprocessing further comprises:
judging whether the number ratio of the data which is a default value in the data set to be processed is larger than a ratio threshold value, wherein the number ratio is the ratio of the number of the data which is a default value in the data set to be processed to the number of the data in the data set to be processed;
if so, removing the data which is the default value in the data set to be processed and other data of the data group comprising the data which is the default value;
if not, replacing the data which is the default value in the data set to be processed with the median of the data in the data set to be processed, and replacing other data of the data group which comprises the data which is the default value in the data set to be processed with the median of the other data.
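The default-value handling can be sketched as follows. Several details are assumptions of this sketch rather than statements of the method: NaN is used as the default value, the number ratio is counted over individual values, and the medians are taken over the data groups that contain no default value. The function name `preprocess_defaults` is hypothetical.

```python
import math
import statistics
from typing import List, Tuple

DataGroup = Tuple[float, float]  # (wind_speed, output_power)


def preprocess_defaults(groups: List[DataGroup], ratio_threshold: float) -> List[DataGroup]:
    """Mark infinite values as default, then drop or median-fill affected groups."""
    # Replace infinite values with the default marker (NaN here, an assumption).
    marked = [tuple(math.nan if math.isinf(x) else x for x in g) for g in groups]
    total = sum(len(g) for g in marked)
    n_default = sum(1 for g in marked for x in g if math.isnan(x))
    valid = [g for g in marked if not any(math.isnan(x) for x in g)]
    if total == 0 or not valid or n_default / total > ratio_threshold:
        # Too many default values: remove every group that contains one.
        return valid
    # Otherwise replace each affected group by the per-field medians.
    medians = tuple(statistics.median(col) for col in zip(*valid))
    return [medians if any(math.isnan(x) for x in g) else g for g in marked]


raw = [(5.0, 100.0), (math.inf, 120.0), (7.0, 140.0), (6.0, 110.0)]
filled = preprocess_defaults(raw, ratio_threshold=0.5)   # ratio 1/8 <= 0.5: fill
dropped = preprocess_defaults(raw, ratio_threshold=0.1)  # ratio 1/8 > 0.1: drop
print(filled)
print(dropped)
```

Note that in the fill branch the whole data group is replaced by medians, following the text, so the original power value 120.0 paired with the infinite wind speed is not retained.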
Optionally, the step of pre-processing comprises:
removing data in the data set to be processed that is not of a numerical type, together with the other data of the data group containing it; and/or,
removing wind speed data in the data set to be processed that is outside the wind speed range, together with the other data of the data group containing it, wherein the wind speed range is [V_min, V_max]; and/or,
removing output power data in the data set to be processed that indicates a negative output power, together with the other data of the data group containing it.
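The three removal rules can be sketched together as one filter. The function name `filter_invalid`, the record layout and the sample values are assumptions made for illustration only.

```python
from numbers import Real
from typing import Any, List, Tuple


def filter_invalid(
    raw_groups: List[Tuple[Any, Any]], v_min: float, v_max: float
) -> List[Tuple[float, float]]:
    """Keep only groups with numeric values, in-range wind speed and non-negative power."""
    kept = []
    for ws, power in raw_groups:
        # Rule 1: drop groups containing a non-numeric value (bools are excluded too).
        if not all(isinstance(x, Real) and not isinstance(x, bool) for x in (ws, power)):
            continue
        # Rule 2: drop groups whose wind speed is outside [v_min, v_max].
        if not (v_min <= ws <= v_max):
            continue
        # Rule 3: drop groups whose output power is negative.
        if power < 0:
            continue
        kept.append((ws, power))
    return kept


# Invented records: (wind speed in m/s, output power in kW).
raw = [(5.0, 300.0), ("n/a", 200.0), (1.0, 0.0), (30.0, 0.0), (8.0, -15.0), (12, 2000)]
clean = filter_invalid(raw, v_min=3.0, v_max=25.0)
print(clean)
```

Since the rules are joined by "and/or" in the text, any subset of them may be applied; the sketch applies all three.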
Optionally, after preprocessing the data set to be processed, and before performing data cleaning on the data set to be processed by the quartile method to obtain the intermediate data set, the method further includes:
correcting the wind speed data in the preprocessed data set to be processed according to the ambient temperature and the ambient air pressure at the current position of the wind generating set;
the performing data cleaning on the data set to be processed by the quartile method to obtain the intermediate data set includes:
performing data cleaning on the wind-speed-corrected data set to be processed by the quartile method to obtain the intermediate data set.
According to a second aspect of embodiments of the present application, there is provided a data processing system of a wind turbine generator system, comprising one or more processors for implementing the data processing method as described in any one of the above.
According to a third aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements a data processing method as described in any one of the above.
According to the technical solution provided by the embodiments of the present application, the quartile method is combined with a region-wise bin method, and the cut-in wind speed V_min, the cut-out wind speed V_max and the rated wind speed V_r of the wind generating set are taken into account when dividing the regions. This achieves a better abnormal data cleaning effect, reduces poor-quality data in the SCADA data and improves the quality of the SCADA data. It thereby improves the accuracy of health prediction and management of the wind generating set based on the SCADA data, provides a foundation and accurate, effective data support for later building monitoring and early-warning models of the wind generating set, reduces the influence of abnormal data on modeling, and helps ensure safe and reliable operation of the wind generating set.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of wind speed-power anomaly data characteristic distribution of a prior wind generating set;
FIG. 2 is a flow chart diagram illustrating a data processing method according to an exemplary embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating a process of performing data cleaning on a to-be-processed data set by a quartile method to obtain an intermediate data set according to an exemplary embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating a method for determining abnormal wind speed data in wind speed data corresponding to a plurality of power data sets by using a quartile method according to an exemplary embodiment of the present application;
FIG. 5 is a diagram illustrating an exemplary embodiment of the bin algorithm averaging process;
FIG. 6 is a schematic flow chart illustrating an exemplary embodiment of the present application for performing data cleaning on a sub-dataset using a bin algorithm to obtain a processed dataset;
FIG. 7 is a flow chart diagram illustrating a data processing method according to another exemplary embodiment of the present application;
FIG. 8A is a graph of wind speed and power data distribution in raw data;
FIG. 8B is a graph illustrating a wind speed-output power data distribution obtained after preprocessing the raw data shown in FIG. 8A according to an exemplary embodiment of the present application;
FIG. 8C is a wind speed frequency plot for a wind turbine generator set illustrated in an exemplary embodiment of the present application;
FIG. 8D is a graph illustrating a range of variation of a wind speed correction factor β according to an exemplary embodiment of the present application;
FIG. 8E is a wind speed-power curve comparison before wind speed correction for a wind turbine generator set according to an exemplary embodiment of the present application;
FIG. 8F is a wind speed-power curve comparison graph illustrating a wind generating set after wind speed correction according to an exemplary embodiment of the present application;
FIG. 8G is a graph comparing data after removing abnormal wind speed data by a quartile method according to an exemplary embodiment of the present application;
FIG. 8H is a plot of wind speed versus power obtained after a first data cleaning using the quartile method followed by a second data cleaning using the bin algorithm, as shown in an exemplary embodiment of the present application;
FIG. 9 is a system block diagram of a data processing system, shown in an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The data processing method and system of the present application will be described in detail below with reference to the accompanying drawings. The features of the following examples and embodiments may be combined with each other without conflict.
In wind power generation, wind is highly random and volatile, and the wind speed-power relationship changes accordingly. As a result, the operation data of the wind generating set collected by the SCADA system contains a large amount of abnormal data, which greatly affects abnormal-operation monitoring and the prediction and maintenance of the components of the wind generating set. As shown in fig. 1, the abnormal data can be classified into a first type, a second type and a third type, corresponding to three main causes:
1) First type of abnormal data: wind generating sets are mostly located in areas rich in wind energy resources where the operating environment is harsh, such as the Gobi, wilderness and mountains, all of which affect the wind generating set to some degree, so that its sensors are prone to faults. Planned shutdowns of the wind generating set for maintenance or faults may also cause the measured data to deviate from normal values. In the wind speed-power curve, this type of data appears as high wind speed with zero or negative power.
2) Second type of abnormal data: wind curtailment and power limiting refers to the suspension of some wind generating sets caused by insufficient absorption capacity of the local power grid, unstable wind power and the like. Wind curtailment and power limiting can keep the output power of the wind generating set at a low level for a long time, so that the output power remains below the rated power even when the wind speed exceeds the rated wind speed. This leaves a large amount of abnormal data in the raw data, which directly affects the modeling accuracy of the prediction model.
3) Third type of abnormal data: because the control terminal of the SCADA system is far from the wind farm, noise and external electromagnetic interference may arise during data transmission and make the operation data abnormal. In the wind speed-power scatter diagram, this abnormal data appears as points randomly and discretely distributed around the power curve.
These abnormal data directly affect the modeling accuracy of the prediction model. Therefore, it is necessary to perform data cleaning on the operation data of the wind generating set collected by the SCADA system so as to improve its quality.
Referring to fig. 2, an embodiment of the present application provides a data processing method for processing a SCADA data set of a wind turbine generator system collected by a SCADA system, where the method includes the following steps:
Step S21: acquiring at least part of the data in the SCADA data set as a data set to be processed, wherein the data set to be processed comprises a plurality of data groups, and each data group comprises wind speed data at the same moment and output power data of the wind generating set corresponding to the wind speed data.
As will be appreciated, the SCADA data may include monitoring data such as temperature, wind speed and power. All the data collected by the SCADA system at the same moment can be placed in the same data group. The SCADA (Supervisory Control And Data Acquisition) system, i.e. a data acquisition and monitoring control system, can monitor and control the wind generating set on site to realize functions such as data acquisition, equipment control, measurement, parameter adjustment and various signal alarms, and presents them to users in a suitable form such as sound, graphics or images, so that the states of the equipment parameters can be perceived in real time.
Optionally, each data group consists of wind speed data at a given moment and the output power data of the wind generating set corresponding to that wind speed data; in this case, the data group may also be referred to as a wind speed-power data pair.
In some optional embodiments, in step S21, acquiring at least part of the data in the SCADA data set as the data set to be processed includes: acquiring at least part of the data of a set time dimension in the SCADA data set as the data set to be processed. The SCADA data can be divided into two time dimensions: 10 min data and 30 s data, where the 10 min data is the mean of a plurality of 30 s data. Because of factors such as wind speed changes and the variable rotating speed of the wind generating set, the 30 s data fluctuates strongly, whereas the quantities monitored by the SCADA system change slowly. Optionally, the 10 min data is therefore selected, that is, the time dimension is set to 10 min, which reduces the data fluctuation caused by the variable rotating speed of the wind generating set and its influence on data analysis and processing. In addition, because the SCADA data comprises many types, all types of data may be acquired for data processing, or only some types, for example several types that have a large influence on the wind generating set, such as wind speed, temperature and power.
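The relation between the two time dimensions can be sketched as follows. The sketch assumes evenly spaced 30 s samples without gaps, so that 20 consecutive samples make up one 10 min value, and it drops an incomplete trailing window. The function name `ten_minute_means` is hypothetical.

```python
import statistics
from typing import List


def ten_minute_means(samples_30s: List[float], window: int = 20) -> List[float]:
    """Average consecutive 30 s samples into 10 min values (20 samples each)."""
    n_full = len(samples_30s) // window  # an incomplete trailing window is dropped
    return [
        statistics.fmean(samples_30s[i * window:(i + 1) * window])
        for i in range(n_full)
    ]


# Invented wind speed samples: two full 10 min windows plus 5 leftover samples.
samples = [1.0] * 20 + [3.0] * 20 + [5.0] * 5
means = ten_minute_means(samples)
print(means)
```

Real SCADA exports usually carry timestamps, in which case grouping by timestamp would be more robust than grouping by position.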
Step S22: performing data cleaning on the data set to be processed by a quartile method to obtain an intermediate data set.
Referring to fig. 3, performing data cleaning on the data set to be processed by the quartile method to obtain the intermediate data set may include the following steps:
Step S221: determining abnormal wind speed data in the data set to be processed by the quartile method.
The quartile method is an outlier detection algorithm that does not depend on the mean or variance and does not require the sequence to follow any particular distribution, so it identifies outliers well and applies broadly. Specifically, the quartile method divides an ordered sequence, arranged from smallest to largest, into four equal parts, each accounting for 25% of the whole sequence. Three data points serve as the boundaries between the parts: the lower quartile Q1, the median Q2, and the upper quartile Q3. For an ascending sequence X = [x_1, x_2, x_3, …, x_n], where x_i < x_{i+1}, n denotes the total number of samples and i indexes a point in the sequence, calculating the quartiles Q1, Q2 and Q3 divides the sequence X into four parts of 25% each.
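Formula for the median Q2:
$$Q_2 = \begin{cases} x_{(n+1)/2}, & n \text{ odd} \\ \dfrac{x_{n/2} + x_{n/2+1}}{2}, & n \text{ even} \end{cases} \qquad (1)$$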
the lower quantile Q1 and the upper quantile Q3 are calculated as follows:
when n = 2k (k = 1, 2, …), the median Q2 divides the sequence X into two parts, a first half and a second half, and Q1 and Q3 are then calculated for the two parts of the sequence using formulas (2) and (3), respectively.
When n = 4k + 3 (k = 0, 1, 2, …), the calculation formula is:
when n = 4k + 1 (k = 0, 1, 2, …), the calculation formula is:
From the obtained Q1 and Q3, the interquartile range IQR is calculated, namely:
IQR = Q3 - Q1 (4);
In statistics, the IQR is used to determine the inner limit of the sequence X:
[F1, Fu] = [Q1 - μ1·IQR, Q3 + μ2·IQR] (5);
In formula (5), μ1 and μ2 are weights; for example, μ1 = μ2 = 1.5.
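The inner limit of formulas (4) and (5) can be sketched as follows. This is only a minimal illustration: it assumes Python's standard `statistics.quantiles` as the quartile estimator, whose interpolation rule may differ from the formulas of this application for some n, and the function name `inner_limits` is hypothetical.

```python
from statistics import quantiles

def inner_limits(values, mu1=1.5, mu2=1.5):
    """Return the inner limit [Q1 - mu1*IQR, Q3 + mu2*IQR] of a sample."""
    # quantiles() sorts internally; n=4 yields the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = quantiles(values, n=4)
    iqr = q3 - q1                            # formula (4)
    return q1 - mu1 * iqr, q3 + mu2 * iqr    # formula (5)

sample = [1, 2, 3, 4, 5, 6, 7, 40]
low, high = inner_limits(sample)
# Values outside the inner limit are the outlier candidates.
outliers = [x for x in sample if not low <= x <= high]
print(low, high, outliers)
```

Larger weights μ1 and μ2 widen the inner limit and therefore flag fewer points.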
Illustratively, in some embodiments, the method further comprises: dividing the output power data in the data set to be processed into a plurality of power data groups, and obtaining the output power data of each power data group and the corresponding wind speed data.
Illustratively, the set of output power data in the data set to be processed is denoted P. The output power data in P is sorted from small to large, and the minimum power Pmin and the maximum power Pmax are determined, where Pmin ∈ P and Pmax ∈ P, which defines the power interval [Pmin, Pmax]. The power interval [Pmin, Pmax] is divided into Y (Y > 1) power data groups (Bins); the Y Bins may be obtained by dividing [Pmin, Pmax] equally or unequally.
Optionally, Y = 125; the value of Y may be adjusted according to the distribution of the abnormal data.
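As an illustration of the equal division case, the following sketch groups (wind speed, power) pairs into Y equal-width power data groups. The function name `split_into_power_bins` and the sample values are hypothetical; an unequal division would need explicit bin edges instead.

```python
def split_into_power_bins(pairs, num_bins=125):
    """Group (wind_speed, power) pairs into equal-width power data groups."""
    powers = [p for _, p in pairs]
    p_min, p_max = min(powers), max(powers)
    width = (p_max - p_min) / num_bins
    bins = [[] for _ in range(num_bins)]
    for v, p in pairs:
        if width > 0:
            # The maximum power would index one past the end, so clamp it.
            idx = min(int((p - p_min) / width), num_bins - 1)
        else:
            idx = 0  # all powers identical: everything goes into one group
        bins[idx].append((v, p))
    return bins

pairs = [(3.0, 0.0), (5.0, 400.0), (8.0, 1900.0), (12.0, 4000.0)]
bins = split_into_power_bins(pairs, num_bins=4)
print([len(b) for b in bins])
```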
In this embodiment, the quartile method is adopted to determine abnormal wind speed data in the wind speed data corresponding to the plurality of power data sets respectively. Referring to fig. 4, determining abnormal wind speed data in wind speed data corresponding to a plurality of power data sets by using a quartile method may include the following steps:
Step S2211, respectively determining the inner limit wind speed range of the wind speed data in the plurality of power data groups by adopting a quartile method.
Specifically, the wind speed data corresponding to each power data group is taken, and its lower quantile Q1 and upper quantile Q3 are calculated according to formula (2) or formula (3); formulas (4) and (5) are then used to determine the inner limit wind speed range [v1, vu] of the wind speed data in each power data group.
Step S2212, determining abnormal wind speed data in the wind speed data corresponding to the power data group according to the inner limit wind speed range.
When determining the abnormal wind speed data in the wind speed data corresponding to the power data group according to the inner limit wind speed range, different strategies may be adopted, for example, in some embodiments, the abnormal wind speed data is the wind speed data in the power data group which exceeds the inner limit wind speed range.
In other embodiments, the maximum wind speed of the inner limit wind speed range is adjusted according to a preset upper limit margin coefficient S1, and the minimum wind speed of the inner limit wind speed range is adjusted according to a preset lower limit margin coefficient S0, to obtain a new inner limit wind speed range; the abnormal wind speed data is then the wind speed data in the power data group that lies outside the new inner limit wind speed range. Updating the inner limit wind speed range with S1 and S0 and using the new range as the criterion for abnormal wind speed data avoids mistakenly deleting normal operation data. For example, the new inner limit wind speed range is [S0*v1, S1*vu], where 0 < S0 < 1 and S1 > 1. Optionally, S0 = 0.85 and S1 = 1.05; S0 and S1 may also be set to other values.
Step S222, removing the abnormal wind speed data and other data of the data group comprising the abnormal wind speed data to obtain an intermediate data set.
Illustratively, in some embodiments, if vi ∈ [v1, vu], the data is determined to be normal and is retained; if vi ∉ [v1, vu], it is determined to be abnormal wind speed data and is removed, and the corresponding output power data is deleted as well, where vi is the i-th wind speed data in the power data group.
In other embodiments, if vi ∈ [S0*v1, S1*vu], the data is determined to be normal and is retained; if vi ∉ [S0*v1, S1*vu], it is determined to be abnormal wind speed data and is removed, and the corresponding output power data is deleted as well.
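The removal rule with margin coefficients can be sketched for a single power data group as follows. This is a sketch under assumptions: `statistics.quantiles` stands in for the quartile formulas of this application, the function name `clean_power_bin` is hypothetical, and the sample values are invented for illustration.

```python
from statistics import quantiles

def clean_power_bin(pairs, s0=0.85, s1=1.05, mu=1.5):
    """Split one power data group into (kept, removed) wind speed/power pairs."""
    speeds = [v for v, _ in pairs]
    if len(speeds) < 2:              # quantiles() needs at least two points
        return list(pairs), []
    q1, _, q3 = quantiles(speeds, n=4)
    iqr = q3 - q1
    v_low, v_up = q1 - mu * iqr, q3 + mu * iqr
    # Margin coefficients widen the range (this assumes v_low > 0).
    v_low, v_up = s0 * v_low, s1 * v_up
    kept = [(v, p) for v, p in pairs if v_low <= v <= v_up]
    # A removed wind speed takes its paired output power with it.
    removed = [(v, p) for v, p in pairs if not v_low <= v <= v_up]
    return kept, removed

group = [(v, 2000.0) for v in (7.8, 8.0, 8.1, 8.2, 8.3, 8.4, 8.6, 15.0)]
kept, removed = clean_power_bin(group)
print(removed)
```

Setting `s0 = s1 = 1` reproduces the first strategy, in which the unmodified inner limit wind speed range is used.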
It should be appreciated that in other embodiments, the output power data in the data set to be processed need not be divided; instead, the quartile method may be applied directly to determine the abnormal wind speed data in the wind speed data corresponding to the whole power interval [Pmin, Pmax]. This is done in a manner similar to determining the abnormal wind speed data in the wind speed data corresponding to each power data group, and is not described again.
Step S23, dividing the intermediate data set into two sub data sets according to the cut-in wind speed Vmin, the cut-out wind speed Vmax and the rated wind speed Vr of the wind generating set, wherein the wind speed data of one sub data set lies in the range [Vmin, Vr) and the wind speed data of the other sub data set lies in the range [Vr, Vmax].
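The division of step S23 amounts to a half-open split at the rated wind speed. A minimal sketch, with a hypothetical function name and example wind speeds of 3 m/s, 11.5 m/s and 25 m/s, is:

```python
def split_by_rated_speed(pairs, v_min, v_rated, v_max):
    """Split (wind_speed, power) pairs into [v_min, v_rated) and [v_rated, v_max]."""
    below_rated = [(v, p) for v, p in pairs if v_min <= v < v_rated]
    above_rated = [(v, p) for v, p in pairs if v_rated <= v <= v_max]
    return below_rated, above_rated

data = [(3.0, 50.0), (8.0, 1900.0), (11.5, 4000.0), (20.0, 4000.0)]
below, above = split_by_rated_speed(data, v_min=3.0, v_rated=11.5, v_max=25.0)
print(len(below), len(above))
```

A data group whose wind speed equals the rated wind speed falls into the second sub data set, matching the interval notation above.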
And step S24, performing data cleaning on the sub-data set by adopting a bin algorithm to obtain a processed data set.
The wind speed-power curve is an important index for evaluating the performance of the wind generating set, the wind speed can be seen as the most main factor influencing the output power of the wind generating set in the wind speed-power curve, and a manufacturer of the wind generating set provides a theoretical wind speed-power curve for the produced wind generating set as a part of the technical specification and the performance index of the wind generating set. The theoretical wind speed-power curve provides an expected relationship between wind speed and power under standard operating conditions, but due to different operating environments of different wind farms, the wind speed-power relationship obtained from measured data of the wind farms is different from the theoretical curve.
The output power of the wind generating set varies with the wind speed and can, in theory, be expressed by the following formula:
$$P = \frac{1}{2}\,\rho\,A\,C_p\,v^{3} \qquad (6)$$
In formula (6), P is the output power of the wind generating set; Cp is the wind energy utilization coefficient of the wind generating set; A = πR² is the area swept by the wind wheel; R is the radius of the wind wheel; ρ is the air density; and v is the wind speed.
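A short numerical illustration of the relation P = ½·ρ·A·Cp·v³ follows. The rotor radius of 65 m and the coefficient Cp = 0.45 are hypothetical values chosen only for the example, and the function name is likewise an assumption.

```python
import math

def theoretical_power(v, rotor_radius, cp, rho=1.225):
    """Theoretical output power in watts for wind speed v (m/s)."""
    area = math.pi * rotor_radius ** 2       # swept area A = pi * R^2
    return 0.5 * rho * area * cp * v ** 3

p10 = theoretical_power(10.0, rotor_radius=65.0, cp=0.45)
p20 = theoretical_power(20.0, rotor_radius=65.0, cp=0.45)
print(p10, p20 / p10)
```

Because the power grows with the cube of the wind speed, doubling the wind speed multiplies the theoretical power by eight, which is why small wind speed errors have a large effect on the wind speed-power curve.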
At present, the wind speed-power curve of a wind generating set is usually analyzed with the processing method provided by the IEC 61400-12 standard. Referring to FIG. 5, the wind speed range is divided into a plurality of intervals (Bins) of width 0.5 m/s, the center of each Bin being an integral multiple of 0.5 m/s, as shown in the following formulas:
$$\bar{v}_I = \frac{1}{N_I}\sum_{j=1}^{N_I} v_j \qquad (7)$$
$$\bar{P}_I = \frac{1}{N_I}\sum_{j=1}^{N_I} p_j \qquad (8)$$
In formulas (7) and (8), $\bar{v}_I$ and $\bar{P}_I$ are the average wind speed and average power of the I-th Bin, vj and pj are the j-th wind speed-power data pair in the I-th Bin, and NI is the number of wind speed-power data pairs in the I-th Bin.
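The per-Bin averaging can be sketched as follows, assuming Bins of width 0.5 m/s centred on integral multiples of 0.5 m/s. The function name `bin_averages` is hypothetical, and the handling of values that fall exactly on a Bin boundary (left here to Python's `round`) is an assumption rather than something the standard prescribes.

```python
from collections import defaultdict

def bin_averages(pairs, width=0.5):
    """Return {bin_centre: (mean wind speed, mean power)} per wind speed Bin."""
    groups = defaultdict(list)
    for v, p in pairs:
        # round() maps v to the index of the nearest Bin centre.
        groups[round(v / width)].append((v, p))
    result = {}
    for idx, members in sorted(groups.items()):
        n = len(members)
        mean_v = sum(v for v, _ in members) / n
        mean_p = sum(p for _, p in members) / n
        result[idx * width] = (mean_v, mean_p)
    return result

averages = bin_averages([(4.9, 300.0), (5.1, 340.0), (5.6, 450.0)])
print(averages)
```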
Then, the wind speed data V_I in the I-th Bin is statistically analyzed, that is, the range [min(V_I), max(V_I)] is further divided into m sections (bins), and the expected wind speed $E(v_I)$ and the expected power $E(P_I)$ of the I-th Bin are calculated with the following formulas:
$$E(v_I) = \sum_{j=1}^{m} v_j\,p_j' \qquad (9)$$
$$E(P_I) = \sum_{j=1}^{m} p_j\,p_j' \qquad (10)$$
In formulas (9) and (10), pj′ = nj/NI × 100% is the probability that the wind speed lies in the j-th bin, nj is the number of wind speed values located in the j-th bin, vj is the wind speed corresponding to the j-th bin, and pj is the output power corresponding to the j-th bin.
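The probability-weighted expected values can be sketched as below. Two points are assumptions of this sketch, not statements of the application: the wind speed "corresponding to" a sub-bin is taken as its midpoint, and the output power "corresponding to" a sub-bin is taken as the mean power of its members. The function name is hypothetical.

```python
def expected_values(pairs, m=10):
    """Expected wind speed and power of one Bin from m equal-width sub-bins."""
    speeds = [v for v, _ in pairs]
    lo, hi = min(speeds), max(speeds)
    width = (hi - lo) / m or 1.0        # avoid zero width if all speeds are equal
    members = [[] for _ in range(m)]
    for v, p in pairs:
        j = min(int((v - lo) / width), m - 1)   # clamp the maximum into the last sub-bin
        members[j].append(p)
    total = len(pairs)
    exp_v = exp_p = 0.0
    for j, powers in enumerate(members):
        if not powers:
            continue
        prob = len(powers) / total              # p'_j = n_j / N_I
        v_j = lo + (j + 0.5) * width            # assumed: sub-bin midpoint
        p_j = sum(powers) / len(powers)         # assumed: sub-bin mean power
        exp_v += v_j * prob
        exp_p += p_j * prob
    return exp_v, exp_p

exp_v, exp_p = expected_values(
    [(5.0, 300.0), (5.1, 320.0), (5.4, 380.0), (5.5, 400.0)], m=2)
print(exp_v, exp_p)
```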
Referring to fig. 6, the data cleaning of the sub-dataset by using the bin algorithm to obtain the processed dataset may include the following steps:
Step S241, determining abnormal output power data in each sub data set by adopting a bin algorithm.
Optionally, determining abnormal output power data in each sub data set by using a bin algorithm includes the following steps:
Step 1), dividing the wind speed data of the sub data set into a plurality of wind speed segments.
Following the division manner shown in FIG. 5, the sub data set whose wind speed data lies in the range [Vmin, Vr) and the sub data set whose wind speed data lies in the range [Vr, Vmax] are each divided into a plurality of wind speed segments, such as Bin1, Bin2, …; the numbers of wind speed segments of the two sub data sets may be equal or unequal.
Illustratively, the sub data set with wind speed range [Vmin, Vr) is divided into N1 wind speed segments and the sub data set with wind speed range [Vr, Vmax] is divided into N2 wind speed segments, where N1 and N2 are positive integers. Optionally, N1 = 2 × (Vr - Vmin) and N2 = 2 × (Vmax - Vr).
Specifically, for the sub data set with wind speed range [Vmin, Vr), let K1 start at 0 with K1 < N1; the K1-th wind speed segment is [Vmin + K1·Vstep1, Vmin + (K1 + 1)·Vstep1], and the wind speed data Vm falling within this segment is found, where Vstep1 is the step length;
for the sub data set with wind speed range [Vr, Vmax], let K2 start at 0 with K2 < N2; the K2-th wind speed segment is [Vr + K2·Vstep2, Vr + (K2 + 1)·Vstep2], and the wind speed data Vn falling within this segment is found, where Vstep2 is the step length.
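The segment boundaries can be enumerated as in the sketch below, which assumes a step length of 0.5 m/s for both sub data sets (so that the segment counts equal twice the width of each wind speed range) and uses example wind speeds of 3 m/s, 11.5 m/s and 25 m/s. The function name `segment_edges` is hypothetical.

```python
def segment_edges(start, stop, step=0.5):
    """Return the (lower, upper) bounds of the wind speed segments in [start, stop]."""
    n = round((stop - start) / step)     # number of segments, e.g. 2 * (Vr - Vmin)
    return [(start + k * step, start + (k + 1) * step) for k in range(n)]

v_min, v_rated, v_max = 3.0, 11.5, 25.0
first = segment_edges(v_min, v_rated)    # segments of the [Vmin, Vr) sub data set
second = segment_edges(v_rated, v_max)   # segments of the [Vr, Vmax] sub data set
print(len(first), len(second))
```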
Step 2), determining the power expected value and the power standard deviation of the wind speed segment.
In the embodiment of the present application, for the sub data set with wind speed range [Vmin, Vr), the wind speed segment is divided into a plurality of first sub wind speed segments, and the power expected value of the wind speed segment is determined according to the probability that the wind speed data lies in each first sub wind speed segment and the output power data corresponding to each first sub wind speed segment. Specifically, the wind speed data Vm within the K1-th wind speed segment of this sub data set is statistically analyzed, that is, [min(Vm), max(Vm)] is further divided into m bins, and the power expected value of the K1-th wind speed segment is calculated according to formula (10).
For the sub data set with wind speed range [Vr, Vmax], the wind speed segment is divided into a plurality of second sub wind speed segments, and the power expected value of the wind speed segment is determined to be the rated power of the wind generating set. Specifically, the wind speed data Vn within the K2-th wind speed segment of this sub data set is statistically analyzed, that is, [min(Vn), max(Vn)] is further divided into m bins, and the power expected value of the K2-th wind speed segment is taken as Pr, where Pr is the rated power.
The power standard deviation is calculated with an existing standard deviation method, which is not specifically described in the present application.
Step 3), determining the abnormal output power data in the output power data corresponding to the wind speed segment according to the power expected value and the power standard deviation.
In the embodiment of the present application, for a wind speed segment of the sub data set with wind speed range [Vmin, Vr), a first effective power range is determined as [power expected value ± 3 × power standard deviation], and the output power data outside the first effective power range in that wind speed segment is determined to be abnormal output power data. That is, for these wind speed segments the abnormal output power data is detected with the 3σ criterion, where σ is the power standard deviation. For the K1-th wind speed segment with power expected value $E(P_{K1})$, the first effective power range is $[E(P_{K1}) - 3\sigma,\ E(P_{K1}) + 3\sigma]$, and the output power data outside this range is abnormal output power data.
For a wind speed segment of the sub data set with wind speed range [Vr, Vmax], a second effective power range is determined according to the power expected value and the power standard deviation as [power expected value ± power standard deviation], and the output power data outside the second effective power range in that wind speed segment is determined to be abnormal output power data. For the K2-th wind speed segment, whose power expected value is the rated power Pr, the second effective power range is $[P_r - \sigma,\ P_r + \sigma]$; the output power data outside this range is abnormal output power data, which keeps the noise small.
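The two effective power ranges differ only in the multiple of the standard deviation, so both can be sketched with one helper. The application does not state which standard deviation estimator is used; this sketch assumes the population standard deviation from `statistics.pstdev`, and the function name and sample powers are hypothetical.

```python
from statistics import pstdev

def flag_abnormal_power(powers, expected, k):
    """Return (kept, abnormal) using the band expected +/- k * sigma."""
    sigma = pstdev(powers)      # assumed estimator: population standard deviation
    low, high = expected - k * sigma, expected + k * sigma
    kept = [p for p in powers if low <= p <= high]
    abnormal = [p for p in powers if not low <= p <= high]
    return kept, abnormal

# A segment above the rated wind speed: the expected power is the rated power.
rated_power = 4000.0
segment = [4000.0, 3990.0, 4010.0, 4005.0, 3995.0, 3200.0]
kept_1s, abnormal_1s = flag_abnormal_power(segment, rated_power, k=1)
kept_3s, abnormal_3s = flag_abnormal_power(segment, rated_power, k=3)
print(abnormal_1s, abnormal_3s)
```

In this example the one-sigma band of the second effective power range flags the 3200 value, whereas a three-sigma band around the same expected value would retain it, which illustrates why the narrower band is used where the power should stay at its rated value.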
Step S242, remove the abnormal output power data and other data of the data group including the abnormal output power data to obtain a processed data set.
For example, the abnormal output power data and the corresponding wind speed data in each wind speed segment are removed, and the data left after the secondary data cleaning is the processed data set.
Referring to fig. 7, in some embodiments, the method further includes the following steps:
Step S71, determining the wind speed expected value of the wind speed segment.
The manner of dividing the wind speed data of the sub data set into a plurality of wind speed segments is as described above, and is not described herein again.
In the embodiment of the present application, for the sub data set with wind speed range [Vmin, Vr), the wind speed segment is divided into a plurality of first sub wind speed segments, and the wind speed expected value of the wind speed segment is determined according to the probability that the wind speed data lies in each first sub wind speed segment and the wind speed data corresponding to each first sub wind speed segment. Specifically, the wind speed data Vm within the K1-th wind speed segment of this sub data set is statistically analyzed, that is, [min(Vm), max(Vm)] is further divided into m bins, and the wind speed expected value of the K1-th wind speed segment is calculated according to formula (9).
For the sub data set with wind speed range [Vr, Vmax], the wind speed segment is divided into a plurality of second sub wind speed segments, and the wind speed expected value of the wind speed segment is determined according to the probability that the wind speed data lies in each second sub wind speed segment and the wind speed data corresponding to each second sub wind speed segment. Specifically, the wind speed data Vn within the K2-th wind speed segment of this sub data set is statistically analyzed, that is, [min(Vn), max(Vn)] is further divided into m bins, and the wind speed expected value of the K2-th wind speed segment is calculated according to formula (9).
Step S72, fitting a curve representing the relation between the wind speed and the output power of the wind generating set according to the wind speed expected value and the power expected value of each wind speed segment to obtain a graph.
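The application does not specify the fitting technique. As one simple possibility, the sketch below joins the (wind speed expected value, power expected value) points of the wind speed segments by piecewise-linear interpolation; this choice, the function name `make_power_curve` and the sample points are assumptions for illustration only.

```python
from bisect import bisect_right

def make_power_curve(points):
    """Build a piecewise-linear curve through (expected speed, expected power) points."""
    pts = sorted(points)
    xs = [v for v, _ in pts]
    ys = [p for _, p in pts]

    def curve(v):
        # Outside the fitted range, hold the nearest end value.
        if v <= xs[0]:
            return ys[0]
        if v >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, v)          # xs[i-1] <= v < xs[i]
        t = (v - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return curve

curve = make_power_curve([(3.25, 50.0), (5.25, 350.0), (11.75, 4000.0)])
print(curve(4.25), curve(20.0))
```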
Optionally, the abscissa of the curve represents the wind speed and the ordinate represents the output power.
The data cleaned by the quartile method are analyzed, the determined wind speed expected value and power expected value are more accurate, and a wind speed-power curve can be better fitted.
Further, the method may further include: after the abnormal output power data and the other data of the data groups comprising the abnormal output power data are removed to obtain the processed data set, displaying the processed data set on the graph in a first display mode. The graph thus visually shows the distribution area of the wind speed data and output power data of each data group in the processed data set, as well as the position of that distribution area relative to the fitted curve.
Still further, the method further comprises: and displaying the removed data group on the graph by adopting a second display mode, wherein the first display mode is different from the second display mode, so that the difference between the data processed by the data processing method of the embodiment of the application and the original data can be visually and clearly displayed. The color displayed by the first display mode is different from the color displayed by the second display mode, and/or the identification of the data point displayed by the first display mode (the wind speed data and the output power data of each data group form a data point) is different from the identification of the data point displayed by the second display mode; of course, the first display mode and the second display mode may be distinguished by other display modes.
In some possible embodiments, the method may further include: preprocessing the data set to be processed after at least part of the data in the SCADA data set is acquired as the data set to be processed and before the data set to be processed is cleaned by the quartile method to obtain the intermediate data set. In this case, performing data cleaning on the data set to be processed by the quartile method in step S22 to obtain the intermediate data set includes: performing data cleaning on the preprocessed data set to be processed by the quartile method to obtain the intermediate data set.
Different strategies may be used to preprocess the data set to be processed. For example, in some embodiments, the preprocessing step includes: replacing data that is infinite in the data set to be processed with the default value NaN. In the actual operation of the wind generating set, communication signals may be poor at the geographic location of the set, so signal interruptions occur frequently, and the data recorded in the SCADA system during an interruption is a default value. The preprocessing step aims to remove the data that is the default value in the data set to be processed, together with the other data at the same moment in the data group that includes it, which reduces the fluctuation of the SCADA data and improves the quality of the SCADA data and the accuracy of data analysis.
Further, the preprocessing step further comprises: judging whether the number ratio of the data that is the default value in the data set to be processed is larger than a ratio threshold, where the number ratio is the ratio of the number of data that is the default value in the data set to be processed to the number of data in the data set to be processed; if so, removing the data that is the default value in the data set to be processed and the other data of the data group comprising that data; if not, replacing the data that is the default value in the data set to be processed with the median of the data in the data set to be processed, and replacing the other data of the data group comprising the data that is the default value with the median of the other data. Optionally, the ratio threshold is 20%; it is understood that the ratio threshold may be set to other values as needed.
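The default-value handling can be sketched as follows. The sketch assumes that each data group is a (wind speed, power) tuple, that NaN marks a default value, and that the number ratio is counted per data group; the function name `preprocess_defaults` and the sample rows are hypothetical.

```python
import math
from statistics import median

def preprocess_defaults(rows, ratio_threshold=0.20):
    """Remove or median-fill data groups that contain a default (NaN) value."""
    # Map +/-inf to NaN so that NaN is the single default-value marker.
    rows = [tuple(x if math.isfinite(x) else math.nan for x in row) for row in rows]

    def has_default(row):
        return any(math.isnan(x) for x in row)

    good = [r for r in rows if not has_default(r)]
    n_bad = len(rows) - len(good)
    if n_bad == 0 or not good:
        return good
    if n_bad / len(rows) > ratio_threshold:
        return good                      # ratio above threshold: drop the groups
    # Otherwise replace each affected group by the column medians of the valid groups.
    medians = tuple(median(col) for col in zip(*good))
    return [medians if has_default(r) else r for r in rows]

few_gaps = [(5.0, 300.0), (6.0, 500.0), (math.nan, 700.0),
            (7.0, 800.0), (8.0, 1100.0), (9.0, 1500.0)]
many_gaps = [(5.0, 300.0), (math.inf, 500.0), (math.nan, 700.0)]
filled = preprocess_defaults(few_gaps)
dropped = preprocess_defaults(many_gaps)
print(filled[2], dropped)
```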
In some embodiments, the step of pre-processing comprises: and removing the data which is in the non-numerical type in the data set to be processed and other data of the data group comprising the data which is in the non-numerical type, and improving the quality of the SCADA data and the accuracy of data analysis.
In some embodiments, the wind speed data outside the wind speed range [Vmin, Vmax], and the other data of the data groups including that wind speed data, are removed from the data set to be processed. It can be understood that the cut-in wind speed Vmin applies to a grid-connected wind generating set: it is the wind speed at which the wind generating set reaches the grid-connection condition, namely the lowest wind speed at which power can be generated, below which the wind generating set stops automatically. The cut-out wind speed Vmax is the maximum wind speed for grid-connected power generation; when the wind speed exceeds it, the wind generating set is cut out of the power grid, that is, it shuts down and stops generating power. Once the wind generating set reaches the cut-in wind speed, its generator can generate power continuously and stably.
In some embodiments, the output power data representing a negative output power in the data set to be processed, and the other data of the data groups including that output power data, are removed; that is, the output power data that is greater than 0 and less than the rated power, and the other data of its data groups, are retained. In the actual operation of the wind generating set, due to factors such as maintenance, shutdown or low-wind weather, when the wind generating set does not reach the cut-out wind speed, the power value corresponding to the SCADA data recorded in the SCADA system may be negative, and such data is not beneficial to subsequent data analysis. The preprocessing step aims to remove the output power data representing a negative output power in the data set to be processed, together with the other data at the same moment in the data group that includes it, which reduces the fluctuation of the SCADA data and improves the quality of the SCADA data and the accuracy of data analysis.
In some optional embodiments, the pre-processing step may further comprise: and removing the data which exceed the alarm value in the data set to be processed and other data of the data group comprising the data which exceed the alarm value. In the actual operation of the wind generating set, each corresponding SCADA data point position can be set with an alarm value, and when the monitored data exceeds the alarm value, the data in the time period is explained to be out-of-tolerance data, which is not a normal state of the wind generating set, and is not beneficial to subsequent data analysis. For example, the alarm value of the bearing temperature is 60 degrees, and when the monitored actual bearing temperature is higher than 60 degrees, the state of the wind generating set is judged to be abnormal, and an alarm is given. The preprocessing step aims to remove the data which exceed the alarm value in the data set to be processed and other data at the same moment in the data set comprising the data which exceed the alarm value, so that the fluctuation of the SCADA data can be reduced, and the quality of the SCADA data and the accuracy of data analysis are improved.
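The removal rules for non-numerical data, out-of-range wind speed, negative power and alarm values can be combined into one predicate per data group, as sketched below. The record layout (a dictionary with `wind_speed`, `power` and further monitored quantities), the function name `keep_record` and the alarm limit of 60 degrees are assumptions for illustration.

```python
from numbers import Real

def keep_record(record, v_min, v_max, alarm_limits):
    """Return True if a data group survives all preprocessing filters."""
    # Non-numerical values (e.g. status strings) disqualify the whole data group.
    if not all(isinstance(x, Real) and not isinstance(x, bool)
               for x in record.values()):
        return False
    # Wind speed must lie between the cut-in and cut-out wind speeds.
    if not v_min <= record["wind_speed"] <= v_max:
        return False
    # Negative output power is discarded.
    if record["power"] < 0:
        return False
    # Any monitored quantity above its alarm value discards the data group.
    return all(record[name] <= limit
               for name, limit in alarm_limits.items() if name in record)

limits = {"bearing_temp": 60.0}
records = [
    {"wind_speed": 8.0, "power": 1900.0, "bearing_temp": 45.0},
    {"wind_speed": 8.0, "power": 1900.0, "bearing_temp": 61.0},
    {"wind_speed": 2.0, "power": 0.0, "bearing_temp": 40.0},
    {"wind_speed": 6.0, "power": -20.0, "bearing_temp": 40.0},
    {"wind_speed": 6.0, "power": "N/A", "bearing_temp": 40.0},
]
cleaned = [r for r in records if keep_record(r, 3.0, 25.0, limits)]
print(len(cleaned))
```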
Further, in some embodiments, the method further comprises: after the data set to be processed is preprocessed and before the data set to be processed is cleaned by the quartile method to obtain the intermediate data set, correcting the wind speed data in the preprocessed data set to be processed according to the ambient temperature and the ambient air pressure at the current position of the wind generating set. In this case, performing data cleaning on the data set to be processed by the quartile method in step S22 to obtain the intermediate data set includes: performing data cleaning on the data set to be processed after the wind speed correction by the quartile method to obtain the intermediate data set.
For example, if the wind speed data is the original average wind speed V measured within 10 minutes, the air density correction is based on the average air density ρ measured within the same 10 minutes; that is, the wind speed data is adjusted, and the adjusted wind speed data V′ is as follows:
$$V' = \beta V, \qquad \beta = \left(\frac{\rho}{\rho_0}\right)^{1/3} \qquad (13)$$
$$\rho = \frac{B}{R_0\,T} \qquad (14)$$
In formulas (13) and (14), β represents the wind speed correction factor, ρ0 is the sea-level dry air density specified by the International Organization for Standardization standard atmosphere (1.225 kg/m³), B is the ambient air pressure measured over 10 minutes, T is the ambient temperature measured over 10 minutes (absolute temperature, in K), and R0 is the gas constant of dry air, 287.05 J/(kg·K).
For example, for the data after preprocessing, the corrected wind speed V' is calculated according to the formula (13) by combining the ambient temperature T and the ambient air pressure B of the current position of the wind turbine generator system.
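A minimal numerical sketch of the correction follows, assuming the air density normalisation V′ = V·(ρ/ρ0)^(1/3) with ρ = B/(R0·T), pressure in pascals and absolute temperature in kelvin. The function name and the sample conditions are hypothetical.

```python
RHO_0 = 1.225    # reference dry air density at sea level, kg/m^3
R_0 = 287.05     # gas constant of dry air, J/(kg*K)

def corrected_wind_speed(v, pressure_pa, temperature_k):
    """Normalise a 10 min mean wind speed to the reference air density."""
    rho = pressure_pa / (R_0 * temperature_k)    # measured air density
    beta = (rho / RHO_0) ** (1.0 / 3.0)          # wind speed correction factor
    return beta * v

# Near-standard conditions leave the wind speed almost unchanged ...
v_standard = corrected_wind_speed(10.0, pressure_pa=101325.0, temperature_k=288.15)
# ... while colder, denser air raises the normalised wind speed.
v_cold = corrected_wind_speed(10.0, pressure_pa=101325.0, temperature_k=273.15)
print(v_standard, v_cold)
```

Because the density enters through a cube root, the correction factor stays close to 1 for typical site conditions.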
When the bin algorithm is used to fit the wind speed-power curve, it does not consider that the power remains unchanged when the wind speed is between the rated wind speed and the cut-out wind speed, so the fit is inaccurate where the power equals the rated power. The dbin algorithm (i.e., the combination of step S23 and step S24 in the present application) takes the constant power between the rated wind speed and the cut-out wind speed into account, and the fitted curve conforms better to the wind speed-power curve given by the manufacturer; however, when abnormal values are detected on the obtained wind speed-power curve with the 3σ criterion, the probability density of the wind speed-power data is often multimodal, and a good removal effect cannot be achieved. When the quartile method is used alone to process abnormal data, the abnormal data may be removed incompletely if it makes up a large proportion of the wind power data. Therefore, the present method combines the quartile method and the dbin method to achieve a better abnormal data cleaning effect.
To verify the data processing method, three years of SCADA data from a certain type of offshore 4 MW wind generating set were selected. The cut-in wind speed of the wind generating set is 3 m/s, the rated wind speed is 11.5 m/s, and the cut-out wind speed is 25 m/s. The wind speed-output power distribution of the original data is shown in FIG. 8A, where a large amount of the three types of abnormal data can be found. FIG. 8B shows the wind speed-output power distribution after preprocessing: the preprocessing retains the data between the cut-in and cut-out wind speeds and removes most of the first type of abnormal data, but a large amount of the second and third types of abnormal data remains.
According to the data processing method, the wind speed of the wind generating set is corrected, a wind speed frequency graph of the wind generating set is shown in fig. 8C, the variation range of the wind speed correction factor beta is shown in fig. 8D, and it can be seen that the correction factor is mainly concentrated on about 1. And keeping the wind speed data after the wind speed correction, and redrawing a wind speed-power distribution map by using the corrected wind speed data and the output power data. For example, as shown in fig. 8E and 8F, the wind speed-power curve of the wind turbine generator system before and after wind speed correction has no change in the overall trend after correction, but the local profile of the curve is changed, which can be seen in the profiles of the boxes in fig. 8E and 8F.
Fig. 8G shows a data comparison situation after removing abnormal wind speed data by the quartile method, where a light gray point in fig. 8G is a wind speed-power distribution situation of the original data, and a black point is a wind speed-power distribution diagram after removing the abnormal wind speed data. As can be seen from the scatter in fig. 8G, there are a large number of abnormal data points (light gray) in the original wind speed power diagram, and the wind curtailment data with a certain regularity. After processing, most outliers are removed, but there is also a small amount of curtailment data, such as the partial data within the ellipse in FIG. 8G.
FIG. 8H shows a plot of wind speed versus power obtained after a first data wash using the quartile method followed by a second data wash using the dbin algorithm. The abnormal wind speed data are cleaned by introducing a quartile method, the curve fitting wind speed-power curve of the dbin algorithm tends to be gentle, the fitting precision is high, the effective identification of accumulation type abnormal data and dispersion type abnormal data in the wind speed-power curve can be realized, and the effect of removing outliers is obvious compared with a single mode.
Referring to fig. 9, an embodiment of the present application further provides a data processing system of a wind turbine generator system, including one or more processors, for implementing the data processing method according to any of the above embodiments.
Embodiments of the data processing system may be applied on a wind park. The device embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. The software implementation is taken as an example, and as a device in a logical sense, a processor of the wind turbine generator set where the device is located reads corresponding computer program instructions in the nonvolatile memory into the memory for operation. From a hardware aspect, as shown in fig. 9, the hardware structure diagram of the wind turbine generator system in which the data processing system is located in the present application is shown, except for the processor, the internal bus, the memory, the network interface, and the nonvolatile memory shown in fig. 9, the wind turbine generator system in which the device is located in the embodiment may also include other hardware according to the actual function of the wind turbine generator, which is not described again.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The embodiment of the present application further provides a computer-readable storage medium, on which a program is stored, and when the program is executed by a processor, the program implements the data processing method according to any one of the above embodiments.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of the wind turbine generator system according to any of the foregoing embodiments. The computer readable storage medium may also be an external storage device of the wind turbine, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), and the like, provided on the device. Further, the computer readable storage medium may also comprise both an internal storage unit of the wind park and an external storage device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the wind park and may also be used for temporarily storing data that has been or will be output.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.
Claims (22)
1. A data processing method for processing a SCADA data set of a wind generating set collected by a SCADA system, characterized by comprising the following steps:
acquiring at least part of data in the SCADA data set as a data set to be processed, wherein the data set to be processed comprises a plurality of data groups, and each data group comprises wind speed data at the same moment and output power data of the wind generating set corresponding to the wind speed data;
carrying out data cleaning on the data set to be processed by adopting a quartile method to obtain an intermediate data set;
dividing the intermediate data set into two sub data sets according to the cut-in wind speed V_min, the cut-out wind speed V_max, and the rated wind speed V_r of the wind generating set, wherein the wind speed data of one sub data set is in the range [V_min, V_r), and the wind speed data of the other sub data set is in the range [V_r, V_max];
and performing data cleaning on the sub data sets by adopting a bin algorithm to obtain a processed data set.
2. The method of claim 1, wherein the performing data cleaning on the data set to be processed by using a quartile method to obtain an intermediate data set comprises:
determining abnormal wind speed data in the data set to be processed by adopting a quartile method;
and removing the abnormal wind speed data and other data of the data group comprising the abnormal wind speed data to obtain an intermediate data set.
3. The method of claim 2, further comprising:
dividing the output power data into a plurality of power data groups, and obtaining the output power data of each power data group and corresponding wind speed data;
the determining of abnormal wind speed data in the data set to be processed by adopting a quartile method comprises:
respectively determining, by adopting a quartile method, abnormal wind speed data in the wind speed data corresponding to each of the plurality of power data groups.
4. The method according to claim 3, wherein the determining abnormal wind speed data in the wind speed data corresponding to the plurality of power data sets by using a quartile method comprises:
respectively determining the inner limit wind speed range of the wind speed data in the plurality of power data groups by adopting a quartile method;
and determining abnormal wind speed data in the wind speed data corresponding to the power data group according to the inner limit wind speed range.
5. The method of claim 4, wherein the determining abnormal wind speed data in the wind speed data corresponding to the power data set according to the inner limit wind speed range comprises:
adjusting the maximum wind speed of the inner limit wind speed range according to a preset upper limit margin coefficient, and adjusting the minimum wind speed of the inner limit wind speed range according to a preset lower limit margin coefficient to obtain a new inner limit wind speed range;
the abnormal wind speed data is the wind speed data which exceeds the new inner limit wind speed range in the power data group.
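As an illustration only, the per-power-group quartile cleaning of claims 3 to 5 might be sketched in Python as follows. The function names, the fixed-width power grouping (`power_step`), the use of 1.5 × IQR inner fences, and the multiplicative application of the margin coefficients `k_low` and `k_up` are assumptions made for this sketch; the claims do not fix any of them.

```python
import statistics

def inner_limits(speeds, k_low=1.0, k_up=1.0):
    """Inner fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR], then scaled by the
    (assumed multiplicative) lower and upper margin coefficients."""
    q1, _, q3 = statistics.quantiles(speeds, n=4, method="inclusive")
    iqr = q3 - q1
    return (q1 - 1.5 * iqr) * k_low, (q3 + 1.5 * iqr) * k_up

def quartile_clean(groups, power_step, k_low=1.0, k_up=1.0):
    """groups: list of (wind_speed, power) data groups.
    Returns the data groups whose wind speed lies inside the adjusted
    inner limit wind speed range of their power data group."""
    by_power = {}
    for v, p in groups:
        # Assumed grouping: fixed-width power intervals of size power_step.
        by_power.setdefault(int(p // power_step), []).append((v, p))
    kept = []
    for members in by_power.values():
        if len(members) < 2:  # too few points to compute quartiles
            kept.extend(members)
            continue
        lo, hi = inner_limits([v for v, _ in members], k_low, k_up)
        # A whole data group is dropped when its wind speed is abnormal.
        kept.extend((v, p) for v, p in members if lo <= v <= hi)
    return kept
```

With coefficients of 1.0 the range is the unadjusted inner limit range; values above 1.0 for `k_up` widen the upper bound under the multiplicative assumption used here.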
6. The method of claim 1, wherein the data cleansing of the sub data set using a bin algorithm to obtain a processed data set comprises:
determining abnormal output power data in each sub data set by using a bin algorithm;
and removing the abnormal output power data and other data of the data group comprising the abnormal output power data to obtain a processed data set.
7. The method of claim 6, wherein said determining abnormal output power data in each of said sub-data sets using a bin algorithm comprises:
dividing the wind speed data of the sub data set into a plurality of wind speed sections;
determining a power expected value and a power standard deviation of the wind speed section;
and determining abnormal output power data in the output power data corresponding to the wind speed section according to the power expected value and the power standard deviation.
8. The method of claim 7, wherein, for the sub data set whose wind speed data is in the range [V_min, V_r), the determining of the power expected value of the wind speed section comprises:
dividing the wind speed segment into a plurality of first sub-wind speed segments;
and determining the expected power value of the wind speed section according to the probability that the wind speed data is positioned in each first sub-wind speed section and the output power data corresponding to each first sub-wind speed section.
9. The method according to claim 7 or 8, wherein, for the wind speed section corresponding to the sub data set whose wind speed data is in the range [V_min, V_r), the determining of abnormal output power data in the output power data corresponding to the wind speed section according to the power expected value and the power standard deviation comprises:
determining a first effective power range as [the power expected value ± 3 power standard deviations];
determining that output power data outside the first effective power range, in the wind speed section corresponding to the sub data set whose wind speed data is in the range [V_min, V_r), is abnormal output power data.
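By way of a non-authoritative sketch, the bin-based cleaning of claims 7 to 9 for one wind speed section in [V_min, V_r) could look like the following Python. The names `segment_stats` and `three_sigma_clean`, the number of sub-sections `n_sub`, the use of the mean power of each sub-section as "the output power data corresponding to" it, and the population standard deviation are all assumptions of this sketch.

```python
import math

def segment_stats(points, v_lo, v_hi, n_sub=4):
    """points: (wind_speed, power) data groups with v_lo <= wind_speed < v_hi.
    Returns (power expected value, power standard deviation)."""
    width = (v_hi - v_lo) / n_sub
    subs = [[] for _ in range(n_sub)]
    for v, p in points:
        k = min(int((v - v_lo) / width), n_sub - 1)
        subs[k].append(p)
    n = len(points)
    # Expected value = sum over sub-sections of
    # P(wind speed in sub-section) * mean power of that sub-section.
    expected = sum((len(s) / n) * (sum(s) / len(s)) for s in subs if s)
    variance = sum((p - expected) ** 2 for _, p in points) / n
    return expected, math.sqrt(variance)

def three_sigma_clean(points, v_lo, v_hi, n_sub=4):
    """Keep data groups whose power lies within expected value +/- 3 sigma."""
    if not points:
        return []
    expected, std = segment_stats(points, v_lo, v_hi, n_sub)
    lo, hi = expected - 3 * std, expected + 3 * std
    return [(v, p) for v, p in points if lo <= p <= hi]
```

Under these assumptions the probability-weighted sum reduces to the plain sample mean of the section; a different choice of representative power per sub-section would give a different expected value.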
10. The method of claim 7, wherein, for the sub data set whose wind speed data is in the range [V_r, V_max], the determining of the power expected value of the wind speed section comprises:
dividing the wind speed segment into a plurality of second sub-wind speed segments;
and determining the power expected value of the wind speed section as the rated power of the wind generating set.
11. The method according to claim 7 or 10, wherein, for the wind speed section corresponding to the sub data set whose wind speed data is in the range [V_r, V_max], the determining of abnormal output power data in the output power data corresponding to the wind speed section according to the power expected value and the power standard deviation comprises:
determining a second effective power range as [the power expected value ± the power standard deviation];
determining that output power data outside the second effective power range, in the wind speed section corresponding to the sub data set whose wind speed data is in the range [V_r, V_max], is abnormal output power data.
12. The method of claim 7, further comprising:
determining a wind speed expected value of the wind speed section;
and fitting a curve representing the relation between the wind speed and the output power of the wind generating set according to the wind speed expected value and the power expected value of each wind speed section to obtain a curve graph.
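The curve construction of claim 12 can be illustrated with a short Python sketch. It assumes fixed-width wind speed sections (`bin_width`), takes the arithmetic mean as the expected value of both wind speed and power in each section, and uses piecewise-linear interpolation between the resulting points as the "fit"; the claim itself does not specify a fitting method, and all names here are hypothetical.

```python
def bin_means(points, v_min, v_max, bin_width=0.5):
    """points: (wind_speed, power) data groups.
    Returns one (expected wind speed, expected power) pair per
    non-empty wind speed section, ordered by wind speed."""
    bins = {}
    for v, p in points:
        if v_min <= v <= v_max:
            bins.setdefault(int((v - v_min) / bin_width), []).append((v, p))
    curve = []
    for k in sorted(bins):
        vs, ps = zip(*bins[k])
        curve.append((sum(vs) / len(vs), sum(ps) / len(ps)))
    return curve

def power_at(curve, v):
    """Piecewise-linear interpolation between the section points,
    held constant outside the first and last points."""
    if v <= curve[0][0]:
        return curve[0][1]
    for (v0, p0), (v1, p1) in zip(curve, curve[1:]):
        if v <= v1:
            return p0 + (p1 - p0) * (v - v0) / (v1 - v0)
    return curve[-1][1]
```

The list returned by `bin_means` can be passed directly to a plotting routine to obtain the graph on which the processed and removed data are later displayed.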
13. The method of claim 12, wherein after removing the abnormal output power data and other data of the data group including the abnormal output power data to obtain the processed data set, further comprising:
the processed data set is displayed on the graph in a first display mode.
14. The method of claim 13, further comprising:
displaying the removed data set on the graph in a second display mode;
the first display mode is different from the second display mode.
15. The method of claim 1, wherein the acquiring of at least part of the data in the SCADA data set as a data set to be processed comprises:
acquiring at least part of the data of a set time dimension in the SCADA data set as the data set to be processed.
16. The method of claim 1, wherein after acquiring at least a portion of the SCADA data set as a to-be-processed data set, before performing data cleaning on the to-be-processed data set by using a quartile method to obtain an intermediate data set, the method further comprises:
preprocessing the data set to be processed;
the performing of data cleaning on the data set to be processed by adopting a quartile method to obtain an intermediate data set comprises:
performing data cleaning on the preprocessed data set to be processed by adopting a quartile method to obtain an intermediate data set.
17. The method of claim 16, wherein the step of pre-processing comprises:
and replacing the data which is infinite in the data set to be processed with a default value.
18. The method of claim 17, wherein the step of pre-processing further comprises:
judging whether the number ratio of the data which is a default value in the data set to be processed is larger than a ratio threshold value, wherein the number ratio is the ratio of the number of the data which is a default value in the data set to be processed to the number of the data in the data set to be processed;
if so, removing the data which is the default value in the data set to be processed and other data of the data group comprising the data which is the default value;
if not, replacing the data which is the default value in the data set to be processed with the median of the data in the data set to be processed, and replacing other data of the data group which comprises the data which is the default value in the data set to be processed with the median of the other data.
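A minimal Python sketch of the preprocessing in claims 17 and 18 follows. It assumes that the "default value" is a missing-value placeholder (represented here by `None`), that each data group is a `(wind_speed, power)` pair, and that the number ratio is counted over individual values; `preprocess` and `ratio_threshold` are illustrative names, not terms from the application.

```python
import math
import statistics

MISSING = None  # assumed representation of the "default value"

def preprocess(groups, ratio_threshold=0.1):
    """groups: list of (wind_speed, power) data groups."""
    # Claim 17: replace infinite values with the default value.
    cleaned = [
        tuple(MISSING if isinstance(x, float) and math.isinf(x) else x for x in g)
        for g in groups
    ]
    total = sum(len(g) for g in cleaned)
    missing = sum(x is MISSING for g in cleaned for x in g)
    if total == 0 or missing == 0:
        return cleaned
    # Claim 18: compare the number ratio with the ratio threshold.
    if missing / total > ratio_threshold:
        # Too many defaults: drop every data group that contains one.
        return [g for g in cleaned if MISSING not in g]
    # Otherwise replace the whole affected data group with column medians.
    med_v = statistics.median(v for v, _ in cleaned if v is not MISSING)
    med_p = statistics.median(p for _, p in cleaned if p is not MISSING)
    return [g if MISSING not in g else (med_v, med_p) for g in cleaned]
```

The sketch raises `statistics.StatisticsError` if a column contains no valid value at all, which a production implementation would need to handle.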
19. The method of claim 16, wherein the step of pre-processing comprises:
removing data which is not of numerical type in the data set to be processed and other data of the data group comprising the data which is not of numerical type; and/or,
removing wind speed data which exceeds the wind speed range in the data set to be processed and other data of the data group comprising the wind speed data which exceeds the wind speed range, wherein the wind speed range is [V_min, V_max]; and/or,
removing output power data which indicates a negative output power in the data set to be processed and other data of the data group comprising the output power data.
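The three removal rules of claim 19 can be sketched as a single filter. This Python example applies all three rules together (the claim allows any combination) and assumes each data group is a `(wind_speed, power)` pair; `basic_filter` is a hypothetical name.

```python
from numbers import Real

def basic_filter(groups, v_min, v_max):
    """Drop whole data groups that contain non-numeric data, a wind speed
    outside [v_min, v_max], or a negative output power."""
    kept = []
    for g in groups:
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if not all(isinstance(x, Real) and not isinstance(x, bool) for x in g):
            continue
        v, p = g
        if not (v_min <= v <= v_max):
            continue
        if p < 0:
            continue
        kept.append(g)
    return kept
```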
20. The method of claim 16, wherein after preprocessing the data set to be processed and before performing data cleaning on the data set to be processed by adopting a quartile method to obtain an intermediate data set, the method further comprises:
correcting the wind speed data in the preprocessed data set to be processed according to the environmental temperature and the environmental air pressure of the current position of the wind generating set;
the performing of data cleaning on the data set to be processed by adopting a quartile method to obtain an intermediate data set comprises:
performing data cleaning on the wind-speed-corrected data set to be processed by adopting a quartile method to obtain an intermediate data set.
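The claims do not state the correction formula. One common choice, shown here purely as an assumption, is the air-density normalization used in power-performance testing: the air density is computed from temperature and pressure as ρ = B / (R₀·T), and the wind speed is scaled by (ρ/ρ₀)^(1/3). The function name, the reference density of 1.225 kg/m³, and the gas constant of 287.05 J/(kg·K) are values assumed for this sketch and may differ from what the application uses.

```python
def corrected_wind_speed(v, temp_k, pressure_pa, rho0=1.225, r0=287.05):
    """Normalize a measured wind speed v (m/s) to reference air density.

    temp_k      -- ambient air temperature in kelvin
    pressure_pa -- ambient air pressure in pascals
    rho0        -- assumed reference air density in kg/m^3
    r0          -- gas constant of dry air in J/(kg*K)
    """
    rho = pressure_pa / (r0 * temp_k)      # air density from the ideal gas law
    return v * (rho / rho0) ** (1.0 / 3.0)  # cube-root density scaling
```

At roughly 15 °C and sea-level pressure the correction factor is close to 1, so the wind speed is left almost unchanged; hotter or higher-altitude sites with thinner air yield a lower normalized wind speed.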
21. A data processing system of a wind generating set, comprising one or more processors for implementing the data processing method according to any one of claims 1 to 20.
22. A computer-readable storage medium, having stored thereon a program which, when executed by a processor, implements a data processing method according to any one of claims 1 to 20.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110194372.2A CN112883019A (en) | 2021-02-20 | 2021-02-20 | Data processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112883019A (en) | 2021-06-01 |
Family
ID=76056644
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110194372.2A Pending CN112883019A (en) | 2021-02-20 | 2021-02-20 | Data processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112883019A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115342036A (en) * | 2022-09-02 | 2022-11-15 | 西安热工研究院有限公司 | Abnormity early warning method and system for variable pitch motor of wind power generation set |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111461564A (en) * | 2020-04-08 | 2020-07-28 | 湖南大学 | Wind turbine generator power characteristic evaluation method based on cloud model and optimal combined weighting |
CN111881617A (en) * | 2020-07-02 | 2020-11-03 | 上海电气风电集团股份有限公司 | Data processing method, and performance evaluation method and system of wind generating set |
CN112032003A (en) * | 2020-09-01 | 2020-12-04 | 浙江运达风电股份有限公司 | Method for monitoring operation performance of large wind turbine generator |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210601 |