CN114169681B - Wind turbine generator power curve data preprocessing method considering space-time outlier detection

Info

Publication number
CN114169681B
Authority
CN
China
Prior art keywords
power curve
data
box
wind speed
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111314376.6A
Other languages
Chinese (zh)
Other versions
CN114169681A (en)
Inventor
杨秦敏
方静宜
陈积明
孟文超
陈棋
傅凌焜
孙勇
王琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Windey Co Ltd
Original Assignee
Zhejiang University ZJU
Zhejiang Windey Co Ltd
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Windey Co Ltd
Priority to CN202111314376.6A
Publication of CN114169681A
Application granted
Publication of CN114169681B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses a wind turbine generator power curve data preprocessing method considering space-time outlier detection. The method uses the operation time, wind speed, active power and pitch angle data acquired from the supervisory control and data acquisition (SCADA) system of the wind turbine. Abnormal points that do not conform to the fan operation principle are first removed according to the wind speed information; an outlier judgment rule is then formulated in the time dimension to remove time outliers, and an isolation forest algorithm based on outlier scores is used in the space dimension to remove spatial outliers. The abnormal point removal step takes the operation principle of the fan into account, and detecting outliers in both the time dimension and the space dimension improves the accuracy of outlier detection. The method is based on real operation data of the wind turbine, its preprocessing flow is standardized and complete, and its outlier detection results are more credible; it can provide an effective basis for fan state monitoring and has engineering significance in practical application.

Description

Wind turbine generator power curve data preprocessing method considering space-time outlier detection
Technical Field
The invention relates to a preprocessing method for power curve data of a wind turbine, in particular to a preprocessing method for power curve data of a wind turbine considering space-time outlier detection.
Background
Worldwide, as industrial technology develops, the consumption of traditional fossil energy keeps rising and industrial pollution is becoming serious, so clean energy is attracting growing attention. Wind energy, a renewable clean energy source, has long been a focus of public attention, and the wind power industry is one of the new energy industries that have developed well at home and abroad in recent years. In China, the wind power industry has long received strong national policy support; however, factors such as unbalanced geographical distribution and the uncertainty of natural resources make fan control schemes complex, and the performance of wind turbines cannot be well guaranteed. Building a complete fault prediction and health management system for wind turbines is therefore very important for their operation planning.
Analysis of the operation principle of the wind turbine shows that its power curve reflects the current operation state of the fan and whether its components are faulty, so obtaining the power curve accurately is of great significance for evaluating wind turbine performance. However, because of factors such as sensor measurement accuracy and interference during data transmission, the power curve obtained from the supervisory control and data acquisition (SCADA) system of the wind turbine often contains abnormal data points. These abnormal data bias the performance evaluation made by operation and maintenance managers and hinder the health management and maintenance of the wind turbine. A preprocessing procedure is therefore necessary to clean the power curve data set that contains the abnormal data.
The existing preprocessing method for the power curve data of the wind turbine has the following defects:
(1) The existing preprocessing method for the power curve data of the wind turbine generator does not have strict specification and step requirements;
(2) The existing preprocessing method for the power curve data of the wind turbine does not remove abnormal data points which do not meet the operation principle of the wind turbine at different wind speeds;
(3) The existing wind turbine generator power curve data preprocessing method only considers outliers in the space dimension and does not consider outliers in the time dimension, so the data cleaning is incomplete.
Therefore, the preprocessing flow of wind turbine power curve data needs to be improved so that wind turbine operation and maintenance personnel can obtain valid data from the SCADA system.
Disclosure of Invention
The invention aims to remedy the shortcomings of the existing wind turbine generator power curve data preprocessing technology and to standardize the preprocessing flow, and provides a wind turbine generator power curve data preprocessing method considering space-time outlier detection. The method detects outliers in the wind turbine power curve data set in the time dimension and in the space dimension separately, which improves the accuracy of outlier detection and gives the result reference value and reliability for early warning of wind turbine faults.
The aim of the invention is achieved by the following technical scheme: a preprocessing method of wind turbine power curve data considering space-time outlier detection comprises the following steps:
1) Read N pieces of operation data of the wind turbine to be evaluated over the required period from its SCADA system to form the wind turbine operation data set. From this data set, select the running time $\{t_i\}$, wind speed $\{v_i\}$, active power $\{P_i\}$ and pitch angle $\{\beta_i\}$ to form the power curve information data set to be preprocessed, where $i = 1, 2, 3, \ldots, N$;
2) Detect abnormal points in the power curve information data set of step 1): classify the abnormal points that do not conform to the fan operation principle according to the wind speed $\{v_i\}$, and record the $N_{normal}$ pieces of operation data that do not match the abnormal point characteristics as the fan standard operation data set, where $i = 1, 2, 3, \ldots, N_{normal}$;
3) Combine the wind speed and active power in the fan standard operation data set of step 2) into the power curve data set, and divide it into b box sections based on wind speed, indexed by $j = 1, 2, 3, \ldots, b$;
4) Calculate the box section mean data point within each box section of step 3) and, based on the mean data points of the b box sections, fit a Logistic growth function to obtain the reference box power curve $PV_{basic}$;
5) Set the speed (step per translation) $v_{lf}$ and the maximum range of the left-right translation, and the speed $v_{ud}$ and the maximum range of the up-down translation, of the reference box power curve $PV_{basic}$ of step 4). Translating $PV_{basic}$ up and to the left gives an upper limit box power curve $PV_{up\_limit}$, and translating it down and to the right gives a lower limit box power curve $PV_{low\_limit}$. After each translation, calculate the ratio $\alpha$ of the number of data points lying between the upper and lower limit box power curves to the total number of data points; when $\alpha$ exceeds $\alpha_{limit}$, the current curves are taken as the optimal upper limit and optimal lower limit box power curves;
6) In time order, judge whether each data point of the power curve data set of step 3) lies within the boundary range formed by the optimal upper limit and optimal lower limit box power curves of step 5). If the i-th data point lies outside the boundary range and the a data points preceding it all lie within it, the i-th data point is determined to be a time outlier;
7) Use the power curve data set of step 3) as training data and randomly select $\beta$ samples as the root node of an isolation tree. Randomly designate a feature p of the power curve data set and a cutting point q within that feature; place data points whose value of feature p is smaller than q in the left child node and those whose value is larger than q in the right child node. Keep constructing child nodes by repeatedly designating p and q until k isolation trees have been generated. Then pass each data point through every isolation tree, calculate its outlier score, and determine the spatial outliers according to the outlier score;
8) Take the union of the time outliers of step 6) and the spatial outliers of step 7) to form the space-time outliers, and remove the space-time outliers from the power curve data set of step 3) to obtain the preprocessing result of the wind turbine power curve data considering space-time outlier detection, which provides a reliable basis for fan power curve fitting, state monitoring, fault early warning and the like.
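Step 8) amounts to a set union followed by a filter. The following is a minimal Python sketch of that idea, assuming outliers are tracked as integer positions in the power curve data set; the function name and the index-based representation are illustrative assumptions and are not taken from the patent. The sample points are (wind speed, active power) pairs from Table 1, and the flagged positions are made up for the example.

```python
def remove_space_time_outliers(points, time_idx, space_idx):
    """Drop every point flagged as a time outlier or as a spatial outlier.

    points    : list of (wind_speed, active_power) tuples
    time_idx  : positions flagged by the time-dimension rule (step 6)
    space_idx : positions flagged by the isolation forest (step 7)
    """
    space_time = set(time_idx) | set(space_idx)  # union of both outlier sets
    cleaned = [pt for i, pt in enumerate(points) if i not in space_time]
    return cleaned, sorted(space_time)


# Illustrative call: positions 1 and 3 are hypothetical flags.
points = [(5.680, 880), (5.878, 968), (4.253, 282), (4.631, 382)]
cleaned, removed = remove_space_time_outliers(points, time_idx=[1], space_idx=[1, 3])
```

A point flagged by both detectors is removed once, which is why the union count can be smaller than the sum of the two outlier counts.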
Further, in step 2), three types of abnormal points that do not conform to the fan operation principle are distinguished according to the wind speed $\{v_i\}$, on the following basis:
a) In the low wind speed state, the wind speed satisfies $0 < v_i < v_{cut\_in}$, where $v_{cut\_in}$ is the cut-in wind speed of the fan. If the active power $P_i$ is below 0 or within the active power threshold range, i.e. $P_i < 0$ or $0 \le P_i < P_{thres}$, the data point is judged to be an abnormal data point of the low wind speed state;
b) In the medium wind speed state, the wind speed satisfies $v_{cut\_in} < v_i < v_{rated}$, where $v_{rated}$ is the rated wind speed of the fan. If the pitch angle $\beta_i$ exceeds the pitch angle threshold $\beta_{thres}$, i.e. $|\beta_i| > \beta_{thres}$, the data point is judged to be an abnormal data point of the medium wind speed state;
c) In the high wind speed state, the wind speed satisfies $v_{rated} < v_i < v_{cut\_off}$, where $v_{cut\_off}$ is the cut-out wind speed of the fan. If the active power $P_i$ is greater than or equal to the active power threshold $P_{thres}$ and smaller than the difference between the rated active power $P_{rated}$ of the fan and $P_{thres}$, i.e. $P_{thres} \le P_i < P_{rated} - P_{thres}$, the data point is judged to be an abnormal data point of the high wind speed state.
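The three rules above can be sketched as a small classifier. This is an illustrative Python sketch, not the patented implementation: the class and function names are assumptions, and the default limits are the values quoted for the embodiment later in the description (cut-in 3 m/s, rated 11 m/s, cut-out 18 m/s, rated power 2050 kW, power threshold 10 kW, pitch threshold 7.5 degrees).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanLimits:
    v_cut_in: float = 3.0      # m/s
    v_rated: float = 11.0      # m/s
    v_cut_off: float = 18.0    # m/s
    p_rated: float = 2050.0    # kW
    p_thres: float = 10.0      # kW
    beta_thres: float = 7.5    # degrees


def abnormal_class(v, p, beta, lim=FanLimits()):
    """Return 'low', 'medium' or 'high' for an abnormal point, else None."""
    if 0 < v < lim.v_cut_in:
        # P < 0 or 0 <= P < P_thres, which together mean P < P_thres
        if p < lim.p_thres:
            return "low"
    elif lim.v_cut_in < v < lim.v_rated:
        if abs(beta) > lim.beta_thres:
            return "medium"
    elif lim.v_rated < v < lim.v_cut_off:
        if lim.p_thres <= p < lim.p_rated - lim.p_thres:
            return "high"
    return None


# Rows taken from Table 1: (wind speed, active power, pitch angle)
print(abnormal_class(5.680, 880, 0.0))      # normal operating point
print(abnormal_class(2.547, 1, 15.511))     # below cut-in with power under threshold
```

Because the rules use strict inequalities, points exactly at the cut-in, rated or cut-out wind speed fall into no class and are kept; whether that is intended is not stated in the text.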
Further, in step 3), the box sections are divided based on wind speed as follows:
a) The wind speed and active power in the fan standard operation data set form the data points. Find the maximum wind speed value and round it up to obtain the maximum of the whole wind speed range;
b) Set the width of each wind speed box section to 1, giving b wind speed box sections in total, indexed by $j = 1, 2, 3, \ldots, b$.
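As a rough illustration of the box section division, the sketch below rounds the maximum wind speed up and cuts the range into sections of width 1. It assumes the sections start at 0 m/s and are half-open, `[lower, upper)`, with the top edge assigned to the last section; neither detail is legible in this text, and the function names are hypothetical. Indices are 0-based here, whereas the description counts j from 1.

```python
import math


def make_wind_bins(wind_speeds, width=1.0):
    """Return a list of (lower, upper) wind speed box sections."""
    v_max = math.ceil(max(wind_speeds))   # round the maximum wind speed up
    b = math.ceil(v_max / width)          # number of box sections
    return [(j * width, (j + 1) * width) for j in range(b)]


def bin_of(v, bins):
    """Return the 0-based index of the box section containing v, or None."""
    for j, (lo, hi) in enumerate(bins):
        if lo <= v < hi:
            return j
    if bins and v == bins[-1][1]:         # top edge goes to the last section
        return len(bins) - 1
    return None


# Illustrative wind speeds; 18.4 m/s is a made-up maximum.
bins = make_wind_bins([5.680, 2.547, 18.4])
```

With a maximum between 18 and 19 m/s this yields 19 sections, which matches the b = 19 reported for the embodiment.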
Further, in step 4), the Logistic growth function is fitted to the box section mean data points of the b box sections to obtain the reference box power curve $PV_{basic}$ as follows:
a) Denote the box section mean data point of the j-th wind speed box section by $(\bar{v}_j, \bar{P}_j)$, with $\bar{v}_j = \frac{1}{n_j}\sum v_i$ and $\bar{P}_j = \frac{1}{n_j}\sum P_i$, where $n_j$ is the number of data points in the j-th wind speed box section and the sums run over the wind speed and the active power of the data points in that section;
b) Fit the Logistic growth function to the box section mean data points of the b box sections to obtain the expression of the reference box power curve $PV_{basic}$. In this expression, $P_{max}$ is the maximum active power in the fan standard operation data set, $P_0$ is the active power of the first data point in the fan standard operation data set, $r$ is the growth rate of the reference box power curve $PV_{basic}$, and the independent variable $\bar{v}$ is the mean wind speed of the box section mean data points.
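The fitted expression itself is not legible in this text. As a hedged illustration, the sketch below assumes the standard Logistic growth form $P(v) = P_{max} / \left(1 + (P_{max}/P_0 - 1)\,e^{-r v}\right)$, which uses exactly the parameters named above, and estimates the growth rate r by a simple grid search over candidate values. The grid search is a stand-in chosen to keep the example dependency-free; the patent does not say which fitting procedure is used, and all function names and numeric values here are assumptions.

```python
import math


def bin_means(points, width=1.0):
    """Mean (wind speed, active power) of each non-empty box section."""
    sums = {}
    for v, p in points:
        j = int(v // width)
        sv, sp, n = sums.get(j, (0.0, 0.0, 0))
        sums[j] = (sv + v, sp + p, n + 1)
    return [(sv / n, sp / n) for _, (sv, sp, n) in sorted(sums.items())]


def logistic(v, p_max, p0, r):
    """Assumed standard Logistic growth curve with P(0) = p0 and limit p_max."""
    return p_max / (1.0 + (p_max / p0 - 1.0) * math.exp(-r * v))


def fit_rate(means, p_max, p0, r_grid):
    """Pick the growth rate r from r_grid that minimises squared error."""
    def sse(r):
        return sum((logistic(v, p_max, p0, r) - p) ** 2 for v, p in means)
    return min(r_grid, key=sse)


# Synthetic check: means generated from r = 0.8 should be recovered.
means = [(float(v), logistic(float(v), 2050.0, 5.0, 0.8)) for v in range(1, 19)]
grid = [k / 10 for k in range(1, 16)]
r_hat = fit_rate(means, 2050.0, 5.0, grid)
```

Fixing $P_{max}$ and $P_0$ from the data, as the description suggests, leaves r as the only free parameter, which is why a one-dimensional search is enough for this sketch.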
Further, in step 5), the optimal upper limit and optimal lower limit box power curves are determined as follows:
a) Set the speed (step per translation) $v_{lf}$ and the maximum range of the left-right translation of the reference box power curve $PV_{basic}$, and the speed $v_{ud}$ and the maximum range of its up-down translation;
b) Denote by $PV_{l\_basic}$ and $PV_{f\_basic}$ the two box power curves obtained by the h-th left-right translation of the reference box power curve $PV_{basic} = P(v)$ through the displacement $x_{lf}$, where the displacement of the left-right translation is $x_{lf} = h \cdot v_{lf}$;
c) Denote by $PV_{up\_limit}$ and $PV_{low\_limit}$ the two box power curves obtained by the m-th up-down translation of $PV_{l\_basic}$ and $PV_{f\_basic}$ through the displacement $x_{ud}$; these are the upper limit and lower limit box power curves respectively, where the displacement of the up-down translation is $x_{ud} = m \cdot v_{ud}$;
d) After each translation, calculate the ratio $\alpha$ of the number of data points lying between the upper and lower limit box power curves to the total number of data points. When $\alpha$ exceeds $\alpha_{limit}$, the current curves are taken as the optimal upper limit and optimal lower limit box power curves, where $\alpha_{limit}$ is the data point proportion limit that determines the optimal boundary box power curves.
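The translation search can be sketched as follows. This is an illustrative reading rather than the patented procedure: it assumes the upper limit curve is $P(v + x_{lf}) + x_{ud}$ (shifted left and up), the lower limit curve is $P(v - x_{lf}) - x_{ud}$ (shifted right and down), and that candidate pairs (h, m) are tried in order of increasing total step count until $\alpha > \alpha_{limit}$. The search order and the function name are assumptions; the default steps and ranges are the embodiment values (0.1 m/s up to 4 m/s, 5 kW up to 100 kW, $\alpha_{limit}$ = 0.95).

```python
def find_limit_curves(points, curve, v_lf=0.1, range_lf=4.0,
                      v_ud=5.0, range_ud=100.0, alpha_limit=0.95):
    """Return (x_lf, x_ud, alpha) for the first band covering > alpha_limit.

    points : list of (wind_speed, active_power)
    curve  : callable P(v) standing in for the reference curve PV_basic
    """
    h_max = round(range_lf / v_lf)
    m_max = round(range_ud / v_ud)
    # Assumed search order: fewest translation steps first.
    candidates = sorted(
        ((h, m) for h in range(h_max + 1) for m in range(m_max + 1)),
        key=lambda hm: hm[0] + hm[1],
    )
    for h, m in candidates:
        x_lf, x_ud = h * v_lf, m * v_ud
        inside = sum(
            1 for v, p in points
            if curve(v - x_lf) - x_ud <= p <= curve(v + x_lf) + x_ud
        )
        alpha = inside / len(points)
        if alpha > alpha_limit:
            return x_lf, x_ud, alpha
    return None  # no band within the allowed ranges reaches alpha_limit


# Toy example: a straight line as the reference curve, points offset by 12 kW.
line = lambda v: 100.0 * v
points = [(float(v), 100.0 * v + 12.0) for v in range(1, 11)]
result = find_limit_curves(points, line)
```

For a monotonically increasing reference curve, shifting left raises the curve and shifting right lowers it, so the band widens with every step in either direction and the first band to exceed $\alpha_{limit}$ is a tight one under this search order.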
Further, in step 6), a data point is determined to be a time outlier as follows:
a) On the power curve data set, set a time sliding window with time span a;
b) In time order, judge whether the data points in the time sliding window with time span a lie within the boundary range formed by the optimal upper limit and optimal lower limit box power curves. If the i-th data point lies outside the boundary range and the a data points preceding it all lie within the boundary range, the i-th data point is determined to be a time outlier.
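The sliding-window rule can be expressed compactly once each point has been marked as inside or outside the boundary range. The sketch below is illustrative: it assumes the points are already in time order, represents the boundary test as a list of booleans, and skips the first a points because they have fewer than a predecessors (the text does not say how these are handled). The function name is hypothetical; a = 3 is the span used in the embodiment.

```python
def time_outliers(inside, a=3):
    """Return positions i where point i is outside the band while the
    a points immediately before it are all inside.

    inside : list of booleans in time order, True if the point lies between
             the optimal lower and upper limit box power curves
    """
    return [
        i for i in range(a, len(inside))
        if not inside[i] and all(inside[i - a:i])
    ]


# Made-up flags: positions 3 and 11 follow three in-band points.
flags = [True, True, True, False, True, True,
         False, False, True, True, True, False]
found = time_outliers(flags, a=3)
```

Under this rule a sustained excursion only flags its first point: positions 6 and 7 above are outside the band but are not preceded by three in-band points, so they are left for the space-dimension detection.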
Further, in step 7), a data point is determined to be a spatial outlier as follows:
a) Use the power curve data set as training data, randomly select $\beta$ samples from it to form a subset of the power curve data set as the root node of an isolation tree, and designate k as the number of isolation trees;
b) Randomly select a feature p of the power curve data set and randomly designate a cutting point q within feature p. Using q as the division basis, place the i-th data point in the right child node if its value $p_i$ of feature p is greater than q, and in the left child node if $p_i$ is less than q. Keep constructing child nodes by repeatedly designating p and q until k isolation trees have been generated, which completes the training;
c) Pass each data point of the power curve data set through every isolation tree and calculate its outlier score $Score_{abnorm}(i, \beta)$. The score is normalized by the average path length of the isolation tree, where $N_{normal}$ is the total number of data points in the power curve data set, $H(N_{normal})$ is a harmonic number calculated as $H(N_{normal}) \approx \ln(N_{normal}) + 0.5772156649$, and $E(h(i))$ is the expected path length of the i-th data point over the k isolation trees;
d) Determine the spatial outliers from the outlier score $Score_{abnorm}(i, \beta)$ of each data point.
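The score formula and the decision criterion are not legible in this text, so the sketch below falls back on the widely used isolation forest formulation: score $= 2^{-E(h(i))/c(n)}$ with $c(n) = 2H(n-1) - 2(n-1)/n$ and $H(m) \approx \ln(m) + 0.5772156649$. Whether the patent normalizes by the subsample size or by $N_{normal}$, and which score threshold it applies, are open points; here n is the number of samples used per tree, and all function names are assumptions. The data are synthetic: a tight cluster plus one far point. The defaults k = 50 and $\beta$ = 256 are the embodiment values.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # constant quoted in the description


def avg_path_length(n):
    """c(n): average path length of an unsuccessful search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA   # H(n-1) approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    p = rng.randrange(len(points[0]))          # random feature p
    lo = min(x[p] for x in points)
    hi = max(x[p] for x in points)
    if lo == hi:
        return {"size": len(points)}
    q = rng.uniform(lo, hi)                    # random cutting point q
    left = [x for x in points if x[p] < q]
    right = [x for x in points if x[p] >= q]
    return {"p": p, "q": q,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node):
    depth = 0
    while "size" not in node:
        node = node["left"] if x[node["p"]] < node["q"] else node["right"]
        depth += 1
    return depth + avg_path_length(node["size"])


def fit_forest(data, k=50, beta=256, seed=0):
    """Build k isolation trees, each on beta random samples (needs >= 2 points)."""
    rng = random.Random(seed)
    n = min(beta, len(data))
    max_depth = math.ceil(math.log2(n))
    forest = [build_tree(rng.sample(data, n), 0, max_depth, rng) for _ in range(k)]
    return forest, n


def outlier_score(x, forest, n):
    mean_path = sum(path_length(x, t) for t in forest) / len(forest)
    return 2.0 ** (-mean_path / avg_path_length(n))


# Synthetic data: 200 points near (8 m/s, 1000 kW) and one far point.
gen = random.Random(1)
cluster = [(8.0 + gen.uniform(-0.1, 0.1), 1000.0 + gen.uniform(-10.0, 10.0))
           for _ in range(200)]
far_point = (2.0, 2000.0)
forest, n = fit_forest(cluster + [far_point])
```

Scores close to 1 indicate points that are isolated after very few cuts; a decision threshold would have to be chosen separately, since the patent's criterion is not reproduced here.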
Compared with the prior art, the invention has the following innovative advantages and remarkable effects:
1) Three types of abnormal points which do not accord with the operation principle of the fan are divided according to the wind speed information, abnormal points are cleaned on a power curve of the fan, reliability of a data set is guaranteed, and accuracy of a subsequent data analysis process is guaranteed;
2) Aiming at the detection of the time outliers, a Logistic growth function is used for fitting a reference box type power curve, a boundary box type power curve is obtained through translation, a sliding time window and a time outlier judgment rule are designed, and outliers in a time dimension are effectively identified;
3) For the detection of spatial outliers, a fixed number of isolation trees are trained and the spatial outliers are judged by calculating outlier scores, so that outliers in the space dimension are identified simply and effectively; this has application significance for fault diagnosis and fault early warning analysis during fan operation.
Drawings
FIG. 1 is a flow chart of a preprocessing method of wind turbine power curve data;
FIG. 2 is a raw dataset wind speed-power scatter plot of an embodiment to which the present invention is applied;
FIG. 3 is a wind speed-power scatter diagram of abnormal point detection results of the wind turbine after the processing of step 2;
FIG. 4 is a wind speed-power scatter diagram of the wind turbine after abnormal points are removed in step 2;
FIG. 5 is a graph of reference box power of the wind turbine after the processing of steps 3-4 of the present invention;
FIG. 6 is a graph of optimal upper and lower limit box power of the wind turbine calculated in step 5 of the present invention;
FIG. 7 is a graph of the time outliers of the wind turbine after the processing of step 6 of the present invention;
FIG. 8 is a schematic diagram of spatial outliers of the wind turbine after the processing of step 7 according to the present invention;
FIG. 9 is a graph of wind speed versus power scatter of the wind turbine after pretreatment in step 8 of the present invention.
Detailed Description
The following detailed description of the specific embodiments and working principles of the present invention refers to the accompanying drawings, in which:
Examples
This embodiment performs data preprocessing on the fan power curve using data acquired by the SCADA system of a wind turbine in a wind farm during operation in 2016-2017. The data sampling interval of the SCADA system is 5 min, and the data cover one year and one month; the specific data acquisition time range is 2016/2/1 00:00:00 to 2017/2/28 23:55:00. The specific data variables and contents included in the data set are shown in Table 1 and Table 2:
TABLE 1 Partial data content of the SCADA system data set of a wind turbine in a wind farm
Data      Acquisition time       Pitch angle actual value (degree)   Wind speed (m/s)   Active power (kW)
1         2016/2/1 00:00:00      0                                   5.680              880
2         2016/2/1 00:05:00      0                                   5.878              968
72835     2016/10/22 07:00:00    0.001                               4.253              282
72836     2016/10/22 07:05:00    0.001                               4.631              382
93367     2017/1/2 01:45:00      15.511                              2.547              1
110022    2017/2/28 23:50:00     0.088                               6.762              1131
110023    2017/2/28 23:55:00     0.009                               6.78               1092
TABLE 2 Variable information of the SCADA system data set of a wind turbine in a wind farm
Variable name                        Meaning of variable                         Variable unit
Acquisition time                     Time point of data acquisition              year/month/day hour:minute:second
Pitch angle actual value (beta_i)    Current real-time pitch angle of the fan    degree
Wind speed (v_i)                     Current wind speed at the fan               m/s
Active power (P_i)                   Current active power of the fan             kW
In this embodiment, the data set to which the power curve data preprocessing method is applied is the operation data of the fan over a period of one year and one month, and the result of the method is the preprocessed fan power curve data. The implementation steps of the method are as follows:
1) Read N pieces of operation data of the wind turbine to be evaluated over the required period from its SCADA system to form the wind turbine operation data set. From this data set, select the running time $\{t_i\}$, wind speed $\{v_i\}$, active power $\{P_i\}$ and pitch angle $\{\beta_i\}$ to form the power curve information data set to be preprocessed, where $i = 1, 2, 3, \ldots, N$. In this embodiment, the 110023 records described in Tables 1 and 2 contain the information required in this step (acquisition time, pitch angle, wind speed, active power) and no other operation information; FIG. 2 shows the scatter diagram of the raw wind speed-power data of the fan;
2) Distinguish three types of abnormal points that do not conform to the fan operation principle based on the wind speed $\{v_i\}$, and record the $N_{normal}$ pieces of operation data that do not match the characteristics of the three types of abnormal points as the fan standard operation data set, where $i = 1, 2, 3, \ldots, N_{normal}$. Abnormal point detection on the fan power curve is carried out for the three wind speed ranges of the fan operating state, as follows:
a) In the low wind speed state, the wind speed satisfies $0 < v_i < v_{cut\_in}$, where $v_{cut\_in}$ is the cut-in wind speed of the fan. If the active power $P_i$ is below 0 or within the active power threshold range, i.e. $P_i < 0$ or $0 \le P_i < P_{thres}$, the data point is judged to be an abnormal data point of the low wind speed state. For this embodiment, the cut-in wind speed $v_{cut\_in}$ of the fan is 3 m/s, the active power threshold $P_{thres}$ is 10 kW, and there are 645 abnormal points in the low wind speed state;
b) In the medium wind speed state, the wind speed satisfies $v_{cut\_in} < v_i < v_{rated}$, where $v_{rated}$ is the rated wind speed of the fan. If the pitch angle $\beta_i$ exceeds the pitch angle threshold, i.e. $|\beta_i| > \beta_{thres}$, the data point is judged to be an abnormal data point of the medium wind speed state. For this embodiment, the rated wind speed $v_{rated}$ of the fan is 11 m/s, the pitch angle threshold $\beta_{thres}$ is 7.5 degrees, and there are 2082 abnormal points in the medium wind speed state;
c) In the high wind speed state, the wind speed satisfies $v_{rated} < v_i < v_{cut\_off}$, where $v_{cut\_off}$ is the cut-out wind speed of the fan. If the active power $P_i$ is greater than or equal to the active power threshold $P_{thres}$ and smaller than the difference between the rated active power of the fan and $P_{thres}$, i.e. $P_{thres} \le P_i < P_{rated} - P_{thres}$, the data point is judged to be an abnormal data point of the high wind speed state. For this embodiment, the cut-out wind speed $v_{cut\_off}$ of the fan is 18 m/s, the rated active power $P_{rated}$ of the fan is 2050 kW, and there are 180 abnormal points in the high wind speed state;
FIG. 3 shows the abnormal point detection result of this step, in which "Y" marks abnormal points detected in the low wind speed state, "◆" marks abnormal points detected in the medium wind speed state, "+" marks abnormal points detected in the high wind speed state, and the remaining points are normal data points; the power curve scatter diagram obtained after eliminating the three types of abnormal points is shown in FIG. 4.
3) Combine the wind speed and active power in the fan standard operation data set into the power curve data set and divide it into b box sections based on wind speed, indexed by $j = 1, 2, 3, \ldots, b$; for this embodiment, the number of box sections is b = 19 and the width of each wind speed section is 1 m/s.
4) Calculate the box section mean data point within each box section and, based on the mean data points of the b box sections, fit a Logistic growth function to obtain the reference box power curve $PV_{basic}$. The specific steps are as follows:
a) Denote the box section mean data point of the j-th wind speed box section by $(\bar{v}_j, \bar{P}_j)$, with $\bar{v}_j = \frac{1}{n_j}\sum v_i$ and $\bar{P}_j = \frac{1}{n_j}\sum P_i$, where $n_j$ is the number of data points in the j-th wind speed box section and the sums run over the wind speed and the active power of the data points in that section;
b) Fit the Logistic growth function to the box section mean data points of the b box sections to obtain the expression of the reference box power curve $PV_{basic}$. In this expression, $P_{max}$ is the maximum active power in the fan standard operation data set, $P_0$ is the active power of the first data point in the fan standard operation data set, $r$ is the growth rate of the reference box power curve $PV_{basic}$, and the independent variable $\bar{v}$ is the mean wind speed of the box section mean data points.
FIG. 5 shows the box section mean data points of the 19 wind speed box sections and the reference box power curve $PV_{basic}$ fitted for this embodiment using the Logistic growth function, where "·" marks the box section mean data points of the 19 wind speed box sections.
5) Determine the optimal upper limit and optimal lower limit box power curves as follows:
a) Set the speed (step per translation) $v_{lf}$ and the maximum range of the left-right translation of the reference box power curve $PV_{basic}$, and the speed $v_{ud}$ and the maximum range of its up-down translation. For this embodiment, the left-right translation speed $v_{lf}$ is 0.1 m/s per translation with a maximum left-right range of 4 m/s, and the up-down translation speed $v_{ud}$ is 5 kW per translation with a maximum up-down range of 100 kW;
b) Denote by $PV_{l\_basic}$ and $PV_{f\_basic}$ the two box power curves obtained by the h-th left-right translation of the reference box power curve $PV_{basic} = P(v)$ through the displacement $x_{lf}$, where the displacement of the left-right translation is $x_{lf} = h \cdot v_{lf}$;
c) Denote by $PV_{up\_limit}$ and $PV_{low\_limit}$ the two box power curves obtained by the m-th up-down translation of $PV_{l\_basic}$ and $PV_{f\_basic}$ through the displacement $x_{ud}$; these are the upper limit and lower limit box power curves respectively, where the displacement of the up-down translation is $x_{ud} = m \cdot v_{ud}$;
d) After each translation, calculate the ratio $\alpha$ of the number of data points lying between the upper and lower limit box power curves to the total number of data points. When $\alpha$ exceeds $\alpha_{limit}$, the current curves are taken as the optimal upper limit and optimal lower limit box power curves, where $\alpha_{limit}$ is the data point proportion limit that determines the optimal boundary box power curves. For this embodiment, $\alpha_{limit}$ is set to 0.95.
FIG. 6 shows the optimal upper limit and optimal lower limit box power curves obtained for this embodiment, where the dashed lines represent the two optimal limit curves and the scattered points represent the original power curve data points.
6) Determine whether a data point in the power curve data set is a time outlier as follows:
a) On the power curve data set, set a time sliding window with time span a; for this embodiment, the span a of the time sliding window is set to 3;
b) In time order, judge whether the data points in the time sliding window with time span a lie within the boundary range formed by the optimal upper limit and optimal lower limit box power curves. If the i-th data point lies outside the boundary range and the a data points preceding it all lie within the boundary range, the i-th data point is determined to be a time outlier. For this embodiment, 557 time outliers are determined according to this rule; FIG. 7 shows their locations, where "." represents the original power curve data points and "×" represents the time outliers.
7) Determine whether a data point in the power curve data set is a spatial outlier as follows:
a) Use the power curve data set as training data, randomly select $\beta$ samples from it to form a subset of the power curve data set as the root node of an isolation tree, and designate k as the number of isolation trees; for this embodiment, the number of isolation trees k is 50 and the number of subset samples $\beta$ is 256;
b) Randomly select a feature p of the power curve data set and randomly designate a cutting point q within feature p. Using q as the division basis, place the i-th data point in the right child node if its value $p_i$ of feature p is greater than q, and in the left child node if $p_i$ is less than q. Keep constructing child nodes by repeatedly designating p and q until k isolation trees have been generated, which completes the training;
Specifically, the power curve data set has only two features, wind speed and power, and one of them is randomly selected as feature p each time, after which a cutting point q is randomly designated. For example, if wind speed is selected as feature p the first time, the wind speed ranges over 0-18 m/s and 10 m/s is randomly designated as the cutting point q, then all data points above 10 m/s are placed on the right and all data points below 10 m/s are placed on the left.
c) Pass each data point of the power curve data set through every isolation tree and calculate its outlier score $Score_{abnorm}(i, \beta)$. The score is normalized by the average path length of the isolation tree, where $N_{normal}$ is the total number of data points in the power curve data set, $H(N_{normal})$ is a harmonic number calculated as $H(N_{normal}) \approx \ln(N_{normal}) + 0.5772156649$, and $E(h(i))$ is the expected path length of the i-th data point over the k isolation trees;
d) Determine the spatial outliers from the outlier score $Score_{abnorm}(i, \beta)$ of each data point.
For this embodiment, 482 spatial outliers are determined according to the spatial outlier determination rule; FIG. 8 shows the locations of the spatial outliers among the original power curve data points.
8) Take the union of the time outliers and the spatial outliers to form the space-time outliers, and remove the space-time outliers from the power curve data set to obtain the preprocessing result of the wind turbine power curve data considering space-time outlier detection, which provides a reliable basis for fan power curve fitting, state monitoring, fault early warning and the like; for this embodiment, FIG. 9 shows the power curve data scatter diagram at the end of preprocessing, after the space-time outliers have been eliminated, where "." represents the preprocessed power curve data points.
The invention relates to a fan power curve data preprocessing method considering space-time outliers, which mainly comprises three stages: eliminating three types of abnormal points that do not conform to the fan operation principle according to the wind speed information; formulating an outlier judgment rule in the time dimension to eliminate time outliers; and eliminating spatial outliers in the space dimension using an isolation forest algorithm based on outlier scores. Fig. 1 is a flowchart of the implementation of the method. The embodiment preprocesses the power curve data of the wind turbine according to the flow shown in Fig. 1. Figs. 2-8 are result diagrams of the data set processed according to this flow. They show that the preprocessed wind speed-power scatter diagram reflects the actual running state of the wind turbine well, and that the preprocessed data are authentic and reliable enough to guide subsequent work such as fan performance evaluation and fault early warning.
The foregoing is merely a preferred embodiment of the present invention. Although the invention has been disclosed through the preferred embodiment above, it is not limited thereto. Any person skilled in the art can, without departing from the scope of the technical solution of the present invention, make many possible variations and modifications to it, or modify it into equivalent embodiments, using the methods and technical contents disclosed above. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims (7)

1. A preprocessing method for wind turbine generator power curve data considering space-time outlier detection, characterized by comprising the following steps:
1) Reading N pieces of operation data of the wind turbine to be evaluated within the required period from its data acquisition and monitoring (SCADA) system to form an operation dataset of the wind turbine; selecting the running time {t_i}, wind speed {v_i}, active power {P_i} and pitch angle {β_i} from this dataset to form the power curve information dataset to be preprocessed, where i = 1, 2, 3, …, N;
2) Detecting abnormal points in the power curve information dataset of step 1): abnormal points that do not conform to the fan operation principle are identified according to the wind speed {v_i}, and the N_normal pieces of operation data that do not match the characteristics of any abnormal point are recorded as the fan standard operation dataset, where i = 1, 2, 3, …, N_normal;
3) Forming a power curve dataset from the wind speed and active power in the fan standard operation dataset of step 2), and dividing it into b box-type intervals based on wind speed, where j = 1, 2, 3, …, b;
4) Calculating the box interval mean data point inside each box interval of step 3), and fitting a Logistic growth function to the mean data points of the b box intervals to obtain the reference box power curve PV_basic;
5) Setting the speed v_lf and maximum range of the left-right translation, and the speed v_ud and maximum range of the up-down translation, of the reference box power curve PV_basic of step 4); translating PV_basic toward the upper left to obtain the upper limit box power curve PV_up_limit and toward the lower right to obtain the lower limit box power curve PV_low_limit; calculating, for each translation, the ratio α of the number of data points enclosed by the upper and lower limit box power curves to the total number of data points; and, when α is larger than α_limit, taking the current curves as the optimal upper limit box power curve and the optimal lower limit box power curve;
6) Judging in time order whether each data point in the power curve dataset of step 3) lies within the boundary range formed by the optimal upper limit and optimal lower limit box power curves of step 5); if the i-th data point is outside the boundary range and the preceding a data points are all within it, determining the i-th data point as a temporal outlier;
7) Using the power curve dataset of step 3) as training data, randomly selecting β samples as the root node of an isolated tree; randomly designating a feature p of the power curve dataset and a cut point q within feature p; placing data points whose value of feature p is smaller than q in the left child node and data points whose value is larger than q in the right child node; continuing to construct child nodes by repeatedly designating p and q until k isolated trees are generated; then passing each data point through every isolated tree, calculating its outlier score, and determining the spatial outliers according to the outlier score;
8) Taking the union of the temporal outliers of step 6) and the spatial outliers of step 7) to form the space-time outliers, and removing the space-time outliers from the power curve dataset of step 3) to obtain the preprocessing result of the wind turbine generator power curve data considering space-time outlier detection.
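Step 8) amounts to a set union followed by a filter. A minimal sketch in Python, assuming the outliers are tracked by their index in the power curve dataset; the function name and the sample values are hypothetical and not taken from the patent:

```python
def remove_space_time_outliers(points, temporal_idx, spatial_idx):
    """Drop every point whose index is a temporal or a spatial outlier."""
    # Union of both outlier index sets gives the space-time outliers.
    space_time_idx = set(temporal_idx) | set(spatial_idx)
    kept = [pt for i, pt in enumerate(points) if i not in space_time_idx]
    return kept, space_time_idx

# Illustrative (wind speed, active power) pairs; values are made up.
points = [(3.0, 50.0), (5.0, 300.0), (7.0, 20.0), (9.0, 1200.0), (11.0, 90.0)]
kept, removed = remove_space_time_outliers(points, temporal_idx=[2], spatial_idx=[2, 4])
```

A point flagged by both detectors, such as index 2 here, is removed once, which is why a union rather than a concatenation is used.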
2. The method for preprocessing wind turbine generator power curve data considering space-time outlier detection according to claim 1, wherein in step 2), three types of abnormal points that do not conform to the fan operation principle are identified according to the wind speed {v_i}, on the following basis:
a) In the low wind speed state, the wind speed satisfies 0 < v_i < v_cut_in, where v_cut_in is the cut-in wind speed of the fan; if the active power P_i is lower than 0 or within the active power threshold range, namely P_i < 0 or 0 ≤ P_i < P_thres, the data point is judged to be an abnormal data point in the low wind speed state;
b) In the medium wind speed state, the wind speed satisfies v_cut_in < v_i < v_rated, where v_rated is the rated wind speed of the fan; if the pitch angle β_i exceeds the pitch angle threshold β_thres, namely |β_i| > β_thres, the data point is judged to be an abnormal data point in the medium wind speed state;
c) In the high wind speed state, the wind speed satisfies v_rated < v_i < v_cut_off, where v_cut_off is the cut-out wind speed of the fan; if the active power P_i is greater than or equal to the active power threshold P_thres and smaller than the difference between the fan's rated active power P_rated and P_thres, namely P_thres ≤ P_i < P_rated − P_thres, the data point is judged to be an abnormal data point in the high wind speed state.
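The three rules of claim 2 can be sketched as a single predicate. This is an illustration only: the inequalities are transcribed literally from the claim as worded in this text, the function name is hypothetical, and the turbine limits in the example are made-up values rather than figures from the patent.

```python
def is_abnormal(v, p, beta, v_cut_in, v_rated, v_cut_off, p_rated, p_thres, beta_thres):
    """Return True if one operating record matches an abnormal-point rule."""
    if 0 < v < v_cut_in:            # a) low wind speed state
        return p < 0 or 0 <= p < p_thres
    if v_cut_in < v < v_rated:      # b) medium wind speed state
        return abs(beta) > beta_thres
    if v_rated < v < v_cut_off:     # c) high wind speed state
        return p_thres <= p < p_rated - p_thres
    # Wind speeds exactly on a boundary or outside all three ranges are not
    # covered by the strict inequalities, so they are left unflagged here.
    return False

# Hypothetical turbine limits, for illustration only.
limits = dict(v_cut_in=3.0, v_rated=11.0, v_cut_off=25.0,
              p_rated=1500.0, p_thres=30.0, beta_thres=5.0)

flags = [
    is_abnormal(2.0, 10.0, 0.0, **limits),     # low wind, power inside threshold range
    is_abnormal(8.0, 700.0, 12.0, **limits),   # medium wind, large pitch angle
    is_abnormal(15.0, 900.0, 10.0, **limits),  # high wind, power well below rated
    is_abnormal(15.0, 1490.0, 10.0, **limits), # high wind, power near rated
]
```

Records for which the predicate returns False would form the fan standard operation dataset of step 2).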
3. The method for preprocessing wind turbine generator power curve data considering space-time outlier detection according to claim 1, wherein in step 3), the steps of dividing the box-type intervals based on wind speed are as follows:
a) Forming data points from the wind speed and active power in the fan standard operation dataset, searching for the maximum wind speed value, and rounding it up to obtain the maximum value of the whole wind speed interval;
b) Setting the width of each wind speed box interval to 1, giving b wind speed box intervals in total, where j = 1, 2, 3, …, b.
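A minimal sketch of the binning in claim 3, under two assumptions that this text does not state: wind speeds are non-negative, and each interval is half-open on the right, with a wind speed equal to the rounded-up maximum assigned to the last interval. The function name is hypothetical.

```python
import math

def split_into_wind_speed_bins(points):
    """Group (wind speed, active power) points into bins of width 1."""
    # Round the largest wind speed up to get the top of the whole range,
    # which with width 1 is also the number of bins b.
    b = max(1, math.ceil(max(v for v, _ in points)))
    bins = [[] for _ in range(b)]
    for v, p in points:
        j = min(int(v), b - 1)  # clamp v == b into the last bin
        bins[j].append((v, p))
    return bins

points = [(0.4, 0.0), (3.2, 40.0), (3.9, 75.0), (7.5, 600.0), (8.0, 700.0)]
bins = split_into_wind_speed_bins(points)
```

Here the maximum wind speed 8.0 rounds up to 8, so eight intervals are created and the point at exactly 8.0 lands in the last one.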
4. The method for preprocessing wind turbine generator power curve data considering space-time outlier detection according to claim 1, wherein in step 4), the steps of fitting the box interval mean data points of the b box intervals with a Logistic growth function to obtain the reference box power curve PV_basic are as follows:
a) Recording the box interval mean data point of the j-th wind speed box interval, computed using n_j, the number of data points within the j-th wind speed box interval, and the sum of the active power of the data points in that interval;
b) Fitting the box interval mean data points of the b box intervals with a Logistic growth function to obtain the expression of the reference box power curve PV_basic, in which P_max is the maximum of the active power in the fan standard operation dataset, P_0 is the active power of the first data point in the fan standard operation dataset, r is the growth rate of the reference box power curve PV_basic, and the independent variable is the mean wind speed of the box interval mean data points.
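The expression of PV_basic did not survive in this text. The sketch below therefore assumes the standard Logistic growth form, P(v) = P_max / (1 + (P_max/P_0 − 1)·e^(−r·v)), which matches the three parameters named in the claim but is not confirmed as the patent's exact formula. The grid search over r, the skipping of empty intervals and all function names are likewise assumptions made for illustration.

```python
import math

def bin_mean_points(bins):
    """Return (mean wind speed, mean active power) for each non-empty bin."""
    means = []
    for pts in bins:
        if pts:  # assumption: empty bins contribute no mean point
            n_j = len(pts)
            means.append((sum(v for v, _ in pts) / n_j,
                          sum(p for _, p in pts) / n_j))
    return means

def logistic_power(v, p_max, p_0, r):
    """Assumed standard Logistic growth form; equals p_0 at v = 0 (p_0 > 0)."""
    return p_max / (1.0 + (p_max / p_0 - 1.0) * math.exp(-r * v))

def fit_growth_rate(means, p_max, p_0, candidates):
    """Pick the candidate r with the smallest squared error (simple grid search)."""
    def squared_error(r):
        return sum((logistic_power(v, p_max, p_0, r) - p) ** 2 for v, p in means)
    return min(candidates, key=squared_error)

bins = [[(3.0, 40.0), (3.5, 60.0)], [], [(5.0, 200.0)]]
means = bin_mean_points(bins)

# Synthetic mean points generated with r = 0.8 to check that the search recovers it.
synthetic = [(v, logistic_power(v, 1500.0, 5.0, 0.8)) for v in (2.0, 4.0, 6.0, 8.0, 10.0)]
r_fit = fit_growth_rate(synthetic, 1500.0, 5.0, [i / 10 for i in range(1, 21)])
```

A least-squares routine from a numerical library could replace the grid search; the grid is used here only to keep the example dependency-free.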
5. The method for preprocessing wind turbine generator power curve data considering space-time outlier detection according to claim 1, wherein in step 5), the steps of determining the optimal upper limit box power curve and the optimal lower limit box power curve are as follows:
a) Setting the speed v_lf and maximum range of the left-right translation, and the speed v_ud and maximum range of the up-down translation, of the reference box power curve PV_basic;
b) Recording the two box power curves PV_l_basic and PV_f_basic obtained from the h-th left-right translation, by displacement x_lf, of the reference box power curve PV_basic = P(v), where the displacement of the left-right translation is x_lf = h·v_lf;
c) Recording the two box power curves PV_up_limit and PV_low_limit obtained from the m-th up-down translation, by displacement x_ud, of the two box power curves PV_l_basic and PV_f_basic as the upper limit box power curve and the lower limit box power curve respectively, where the displacement of the up-down translation is x_ud = m·v_ud;
d) Calculating, for each translation, the ratio α of the number of data points enclosed by the upper and lower limit box power curves to the total number of data points, and, when α is larger than α_limit, taking the current curves as the optimal upper limit box power curve and the optimal lower limit box power curve, where α_limit is the data point proportion constraint that determines the optimal boundary box power curves.
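The translated-curve expressions are missing from this text, so the following is only a sketch of the search in claim 5 under stated assumptions: the upper limit curve is taken as P(v + x_lf) + x_ud (shifted left and up), the lower limit curve as P(v − x_lf) − x_ud (shifted right and down), and h and m are increased together, one step at a time, until α exceeds α_limit or both maximum ranges are reached. The function name, the linear stand-in curve and the data are hypothetical.

```python
def find_limit_curves(points, curve, v_lf, x_lf_max, v_ud, x_ud_max, alpha_limit):
    """Widen the band around `curve` until it holds more than alpha_limit of the points."""
    h = m = 0
    while True:
        x_lf, x_ud = h * v_lf, m * v_ud
        inside = sum(
            1 for v, p in points
            if curve(v - x_lf) - x_ud <= p <= curve(v + x_lf) + x_ud
        )
        alpha = inside / len(points)
        can_shift_lf = (h + 1) * v_lf <= x_lf_max
        can_shift_ud = (m + 1) * v_ud <= x_ud_max
        if alpha > alpha_limit or not (can_shift_lf or can_shift_ud):
            return x_lf, x_ud, alpha
        if can_shift_lf:
            h += 1
        if can_shift_ud:
            m += 1

# Stand-in for the fitted reference curve; a real run would use PV_basic.
reference = lambda v: 100.0 * v
points = [(1.0, 100.0), (2.0, 210.0), (3.0, 285.0), (4.0, 400.0), (5.0, 900.0)]
x_lf, x_ud, alpha = find_limit_curves(points, reference, v_lf=0.1, x_lf_max=0.5,
                                      v_ud=10.0, x_ud_max=50.0, alpha_limit=0.7)
```

In this example the untranslated curve covers 2 of 5 points, and a single step in each direction raises the coverage to 4 of 5, which satisfies the constraint.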
6. The method for preprocessing wind turbine generator power curve data considering space-time outlier detection according to claim 1, wherein in step 6), the steps of determining that a data point is a temporal outlier are as follows:
a) Setting a time sliding window with time span a on the power curve dataset;
b) Judging in time order whether the data points in the time sliding window with time span a lie within the boundary range formed by the optimal upper limit box power curve and the optimal lower limit box power curve; if the i-th data point is outside the boundary range and the preceding a data points are all within the boundary range, determining the i-th data point as a temporal outlier.
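The rule in claim 6 can be sketched as a scan over time-ordered points. The sliding-window expression is missing from this text, so the sketch assumes the window is simply the a points immediately before point i, and that points with fewer than a predecessors are never flagged. The function name, boundary curves and data are hypothetical.

```python
def find_temporal_outliers(points, lower, upper, a):
    """Return indices of points outside the band whose a predecessors are all inside it."""
    # points must already be sorted by running time.
    inside = [lower(v) <= p <= upper(v) for v, p in points]
    outliers = []
    for i in range(a, len(points)):
        if not inside[i] and all(inside[i - a:i]):
            outliers.append(i)
    return outliers

# Stand-ins for the optimal lower and upper limit box power curves.
lower = lambda v: 100.0 * v - 20.0
upper = lambda v: 100.0 * v + 20.0
points = [(1.0, 100.0), (2.0, 200.0), (3.0, 500.0),
          (4.0, 400.0), (5.0, 900.0), (6.0, 950.0)]
temporal_idx = find_temporal_outliers(points, lower, upper, a=2)
```

Only index 2 is flagged: indices 4 and 5 also lie outside the band, but each has an out-of-band point among its two predecessors, so the rule as worded does not mark them.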
7. The method for preprocessing wind turbine generator power curve data considering space-time outlier detection according to claim 1, wherein in step 7), the steps of determining that a data point is a spatial outlier are as follows:
a) Using the power curve dataset as training data, randomly selecting β samples from it to form a subset of the power curve dataset as the root node of an isolated tree, and designating k as the number of isolated trees;
b) Randomly selecting a feature p of the power curve dataset and randomly designating a cut point q within feature p as the basis for division: for the i-th data point, if p_i is greater than q, the data point is placed in the right child node, and if p_i is less than q, it is placed in the left child node; child nodes are constructed by repeatedly designating p and q until k isolated trees are generated, which completes training;
c) Passing each data point of the power curve dataset through every isolated tree and calculating its outlier score Score_abnorm(i, β), whose formula uses the average path length of the isolated tree, where N_normal is the total number of data points in the power curve dataset, H(N_normal) is the harmonic number, and E(H(i)) is the expected path length of the i-th data point over the k isolated trees;
d) Determining the spatial outliers from the outlier score Score_abnorm(i, β) of each data point, according to the following specific criteria:
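The score formula and the decision criteria of claim 7 are missing from this text. As an illustration only, the sketch below uses the widely published isolation forest score, 2^(−E(h)/c(β)) with c(n) = 2H(n−1) − 2(n−1)/n and H approximated by ln(n) plus Euler's constant; whether the patent normalizes in exactly this way is an assumption. All function names, the depth limit and the data are hypothetical, and β must satisfy 2 ≤ β ≤ number of points.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """c(n): average path length used to normalize isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(sample, depth, max_depth, rng):
    """Recursively split on a random feature p at a random cut point q."""
    if depth >= max_depth or len(sample) <= 1:
        return {"size": len(sample)}
    p = rng.randrange(len(sample[0]))
    lo, hi = min(x[p] for x in sample), max(x[p] for x in sample)
    if lo == hi:
        return {"size": len(sample)}
    q = rng.uniform(lo, hi)
    return {"p": p, "q": q,
            "left": build_tree([x for x in sample if x[p] < q], depth + 1, max_depth, rng),
            "right": build_tree([x for x in sample if x[p] >= q], depth + 1, max_depth, rng)}

def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus the leaf-size adjustment."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if x[node["p"]] < node["q"] else node["right"]
    return path_length(x, child, depth + 1)

def outlier_scores(points, beta, k, seed=0):
    """Score every point with k trees, each grown on beta random samples."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(beta))
    trees = [build_tree(rng.sample(points, beta), 0, max_depth, rng) for _ in range(k)]
    c = average_path_length(beta)
    scores = []
    for x in points:
        mean_path = sum(path_length(x, t) for t in trees) / k
        scores.append(2.0 ** (-mean_path / c))
    return scores

# A tight cluster of (wind speed, active power) points plus one far-away point.
points = [(8.0 + 0.1 * i, 800.0 + 5.0 * j) for i in range(5) for j in range(4)]
points.append((3.0, 1400.0))
scores = outlier_scores(points, beta=16, k=200)
```

Scores closer to 1 indicate points that are isolated after few splits. A threshold or a fixed outlier fraction would then be applied to the scores; the patent's own criteria are not recoverable here, so none is shown.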
CN202111314376.6A 2021-11-08 Wind turbine generator power curve data preprocessing method considering space-time outlier detection Active CN114169681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111314376.6A CN114169681B (en) 2021-11-08 Wind turbine generator power curve data preprocessing method considering space-time outlier detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111314376.6A CN114169681B (en) 2021-11-08 Wind turbine generator power curve data preprocessing method considering space-time outlier detection

Publications (2)

Publication Number Publication Date
CN114169681A (en) 2022-03-11
CN114169681B (en) 2024-07-16

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105322519A (en) * 2015-11-02 2016-02-10 湖南大学 Big data fusion analysis and running state monitoring method for intelligent power distribution network
CN108171400A (en) * 2017-12-06 2018-06-15 浙江大学 A kind of power of fan curve data preprocess method based on abnormal point and outlier detection

Similar Documents

Publication Publication Date Title
CN109740175B (en) Outlier discrimination method for power curve data of wind turbine generator
CN111275367B (en) Regional comprehensive energy system energy efficiency state evaluation method
CN111539553B (en) Wind turbine generator fault early warning method based on SVR algorithm and off-peak degree
CN111260503B (en) Wind turbine generator power curve outlier detection method based on cluster center optimization
CN108443088B (en) Wind turbine generator system state judging method based on cumulative probability distribution
CN107732970A (en) A kind of static security probability evaluation method of failure of new-energy grid-connected power system
CN111275570A (en) Wind turbine generator set power abnormal value detection method based on iterative statistics and hypothesis test
CN113657662B (en) Downscaling wind power prediction method based on data fusion
CN111209934A (en) Fan fault prediction and alarm method and system
CN110991701A (en) Wind power plant fan wind speed prediction method and system based on data fusion
CN116609055A (en) Method for diagnosing wind power gear box fault by using graph convolution neural network
CN114169681B (en) Wind turbine generator power curve data preprocessing method considering space-time outlier detection
Elijorde et al. A wind turbine fault detection approach based on cluster analysis and frequent pattern mining
CN108876060B (en) Big data based prediction method for wind power output probability of sample collection
CN115062007A (en) Wind turbine generator set wind speed and power data cleaning method based on isolated forest algorithm
Han et al. Characteristic curve fitting method of wind speed and wind turbine output based on abnormal data cleaning
CN111794921B (en) Onshore wind turbine blade icing diagnosis method based on migration component analysis
CN114033631B (en) Online identification method for wind energy utilization coefficient of wind turbine generator
CN116365500A (en) Wind power plant power generation power prediction method based on special region set prediction
Souza et al. Evaluation of data based normal behavior models for fault detection in wind turbines
CN114169681A (en) Wind turbine generator power curve data preprocessing method considering space-time outlier detection
CN113554203B (en) Wind power prediction method and device based on high-dimensional meshing and LightGBM
CN115326393A (en) Wind turbine generator bearing pair fault diagnosis method based on temperature information
CN113946977A (en) Application method for early warning of fan variable pitch fault based on decision tree algorithm
CN110334951B (en) Intelligent evaluation method and system for high-temperature capacity reduction state of wind turbine generator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant