CN112581315A - Wind power field clustering method based on extreme gradient dynamic density clustering - Google Patents

Wind power field clustering method based on extreme gradient dynamic density clustering

Info

Publication number
CN112581315A
Authority
CN
China
Prior art keywords
clustering
wind power
power plant
tree
average gain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011565766.6A
Other languages
Chinese (zh)
Inventor
王长江
陈厚合
姜涛
李雪
李国庆
范维
段方维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd
Northeast Electric Power University
Original Assignee
Northeast Dianli University
Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Dianli University and Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd
Priority to CN202011565766.6A
Publication of CN112581315A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Pure & Applied Mathematics (AREA)
  • Marketing (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Computing Systems (AREA)
  • Water Supply & Treatment (AREA)
  • Operations Research (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Public Health (AREA)
  • Game Theory and Decision Science (AREA)
  • Molecular Biology (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Primary Health Care (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)

Abstract

The invention discloses a wind farm clustering method based on extreme gradient dynamic density clustering, comprising the following steps: selecting clustering indexes for the wind turbines in the wind farm, and performing outlier detection and truncation on the corresponding index data over a given period; performing dimension-reduction selection on the preprocessed clustering index data using XGBoost; and dividing the selected index data into clusters using DBSCAN-DTW clustering. The method effectively addresses partial data loss in real wind farms and improves model accuracy; it processes the multidimensional time-series operating data of the wind turbines and yields an accurate and effective cluster division within the wind farm.

Description

Wind power field clustering method based on extreme gradient dynamic density clustering
Technical Field
The invention relates to the field of clustering within new-energy plants of electric power systems, and in particular to a wind farm clustering method based on extreme gradient dynamic density clustering.
Background
As wind farms grow in scale, their dynamic characteristics increasingly affect power system stability, so a simulation model that accurately reflects those characteristics is needed. Modeling every wind turbine in detail produces a structurally complex, high-dimensional model, which increases both the complexity of the power system model and the simulation time required. Establishing an equivalent wind farm model that accurately represents the operating characteristics is therefore important for analyzing the safe and stable operation of the system after a large wind farm is connected to the grid, and clustering is a key step in wind farm equivalencing.
Clustering by means of clustering algorithms has been a research hotspot in recent years. Most clustering algorithms are fast and have clear advantages on large data sets. Common wind farm clustering algorithms include k-means clustering, fuzzy C-means clustering and spectral clustering, of which k-means is the most widely used. In essence, k-means repeatedly searches for the centroid of each cluster and therefore forms spherical clusters. It has low time and space complexity, but it is sensitive to the initial points and to noisy data: noise shifts the resulting clusters as a whole. It also requires the value of k to be set in advance, which is a strong limitation.
However, in practical applications, the above methods have the following disadvantages:
1. When only mechanical characteristic indexes are selected, accuracy is affected by the large inertia time constant; using electrical indexes alone likewise gives only a single viewpoint. Selecting mechanical and electrical indexes together improves the validity of the clustering result, but subjective manual selection of the clustering indexes may leave the unit information carried by the clustering data insufficiently mined, and the correlation among variables and the redundancy in the data strongly affect the clustering result;
2. The clustering methods are fast but are easily affected by the shape of the data set, noisy data, initial operating points and the like. Wind turbines in a wind farm see different wind speeds and so have different dynamic response times, and the resulting time-series misalignment cannot be resolved by distance and density alone.
Disclosure of Invention
To improve equivalencing accuracy and applicability across the multiple operating conditions of a wind farm, the invention provides a method that reduces the dimension of the clustering indexes using eXtreme Gradient Boosting (XGBoost) and divides clusters using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) optimized by Dynamic Time Warping (DTW), so as to process the multidimensional time-series operating data of the wind turbines and obtain an accurate and effective wind farm cluster division, as described in detail below:
A wind farm clustering method based on extreme gradient dynamic density clustering, the method comprising:
selecting the wind farm clustering indexes, and performing outlier detection and truncation on the corresponding index data over a given period;
performing dimension-reduction selection on the preprocessed clustering index data using XGBoost;
and dividing the selected index data into clusters using DBSCAN-DTW clustering.
Selecting the wind farm clustering indexes and performing outlier detection and truncation on the corresponding index data over a given period specifically comprises:
selecting 13 wind farm clustering indexes for each wind turbine: rotor angular speed, pitch angle, electromagnetic torque, mechanical torque, stator voltage, active power, reactive power, rotor voltage d-axis component, rotor voltage q-axis component, stator current d-axis component, stator current q-axis component, rotor current d-axis component and rotor current q-axis component;
the lower and upper limits for outlier truncation are:
$$\min = Q_1 - 1.5\,\mathrm{IQR}, \qquad \max = Q_3 + 1.5\,\mathrm{IQR}$$
where min and max are the lower and upper limits of data truncation; $Q_1$ and $Q_3$ are the lower and upper quartiles, respectively; and $\mathrm{IQR} = Q_3 - Q_1$.
Further, performing dimension-reduction selection on the clustering index data using XGBoost specifically comprises:
obtaining the objective function of each boosted tree; after node iteration, obtaining the optimal value of the difference between the loss before and after node splitting; computing the average gain of each feature to represent its importance; and then selecting the more important features to achieve dimension reduction;
recording the average gain each time a feature is used for a split, and finally dividing the sum of all average gain values of the feature by the number of times the feature is used to split a node, to obtain a quantitative score of the feature's contribution;
and deleting the features one at a time in order of contribution from low to high, clustering again after each deletion, and after the traversal obtaining the index selection scheme corresponding to the clustering with the highest silhouette (contour) value.
Dividing the selected index data into clusters using DBSCAN-DTW clustering specifically comprises:
stretching and compressing the time series, applying constraints and pruning, and searching for the optimal warping path; the similarity between two time series is measured by the sum of the Euclidean distances between all matched points, which is called the warping path distance.
Further, the method further comprises:
obtaining the optimal clustering by continually adjusting the parameters, selecting the clustering indexes with XGBoost to achieve feature dimension reduction, performing DBSCAN-DTW clustering again, and outputting the clustering result.
The technical scheme provided by the invention has the beneficial effects that:
1. Doubly-fed induction generators (DFIGs) of the same model in a wind farm have different dynamic response times at different wind speeds, and the differences grow larger still with different electrical distances, fault locations or fault types. Conventional measures such as the Euclidean distance and the cosine distance presuppose aligned time series when used in a clustering algorithm, so they cannot cluster the DFIGs effectively. The method uses the DTW algorithm to compute the warping path and resolves this problem through time-series warping; taking the Euclidean distance along the warping path as the similarity index between wind turbines is more reliable, effectively addresses partial data loss in real wind farms, and improves model accuracy;
2. Because the DFIGs are randomly distributed in the multidimensional feature space, similar DFIGs usually form irregular clusters, whereas conventional distance-based clustering algorithms can only form spherical clusters. Similar DFIGs may therefore be left outside their cluster during clustering, which raises model complexity and lowers the silhouette (contour) value. In addition, under multidimensional features the wind farm data contain many noise signals during steady operation, fault occurrence, fault recovery and other periods, which reduces the accuracy of conventional clustering methods. The density-based DBSCAN algorithm yields clusters of different shapes, has low sensitivity to noise and can eliminate its influence, thereby solving these problems and improving the simplicity and accuracy of the model;
3. The operating characteristics of a wind turbine are high-dimensional. To take the characteristics of the DFIG fully into account, many electrical and mechanical indexes are used, and strong correlation and noise may exist among them, which lowers clustering speed and accuracy, while manual index selection is highly subjective. The method uses XGBoost to select the clustering indexes, which eliminates strong correlation among variables, filters out noise and redundant data, and speeds up the subsequent clustering. It avoids the problems that principal component analysis is unsuitable for non-Gaussian data and that principal components are not interpretable, and it is markedly faster and more accurate in dimension reduction than the conventional random forest algorithm, so the model is simplified more quickly, accurately and effectively.
Drawings
FIG. 1 is a flow chart of a clustering method in a wind farm based on extreme gradient dynamic density clustering;
FIG. 2 is a distance matrix grid diagram;
FIG. 3 is a schematic diagram of cumulative minimum distance paths;
FIG. 4 is a clustering diagram;
FIG. 5 is a clustering flow chart;
FIG. 6 is a wind farm structural diagram;
FIG. 7 is a graph showing the clustering results.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
To establish an equivalent wind farm model that accurately represents the operating characteristics, and referring to FIG. 1, an embodiment of the present invention provides a wind farm clustering method based on extreme gradient dynamic density clustering, described in detail below:
Step 101: select the wind farm clustering indexes, and perform outlier detection and outlier truncation on the corresponding index data over a given period;
Step 102: perform dimension-reduction selection on the preprocessed clustering index data using XGBoost;
Step 103: divide the selected index data into clusters using a clustering method based on DBSCAN-DTW.
In summary, through steps 101 to 103 the embodiment of the present invention can process the multidimensional time-series operating data of the wind turbines and so obtain an accurate and effective wind farm cluster division scheme.
Example 2
The scheme of Example 1 is further described below with reference to FIGS. 1 to 5, specific calculation formulas and examples:
201: Detect and process abnormal data values;
Thirteen wind farm clustering indexes are selected: 4 mechanical characteristic indexes of each wind turbine, namely rotor angular velocity wr, pitch angle Pitch, electromagnetic torque Tem and mechanical torque Tm; and 9 electrical characteristic indexes of each wind turbine, namely stator voltage Vs, active power P, reactive power Q, rotor voltage d-axis component Vrd, rotor voltage q-axis component Vrq, stator current d-axis component Isd, stator current q-axis component Isq, rotor current d-axis component Ird and rotor current q-axis component Irq. Owing to practical engineering conditions such as measurement error, the initial wind farm data set contains many abnormal values, which can shift the subsequent clustering result as a whole. Data cleaning can be done by graph-based detection, drawing a box plot of each variable, and the lower and upper limits for outlier truncation are determined by formula (1):
$$\min = Q_1 - 1.5\,\mathrm{IQR}, \qquad \max = Q_3 + 1.5\,\mathrm{IQR} \tag{1}$$
where min and max are the lower and upper limits of data truncation; $Q_1$ and $Q_3$ are the lower and upper quartiles (terms from mathematical statistics), respectively; and $\mathrm{IQR} = Q_3 - Q_1$.
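The truncation rule of formula (1) can be sketched as follows. This is a minimal illustration rather than the implementation used in the embodiment: the whisker factor `k = 1.5` is the conventional box-plot choice and is exposed as a parameter, the function names and sample values are hypothetical, and clipping outliers to the limits (instead of discarding them) is an assumption about what truncation means here.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) truncation limits Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def truncate(values, k=1.5):
    """Clip every value into the [lower, upper] interval (assumed meaning of truncation)."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


# Hypothetical rotor-speed samples (per unit) with one obvious measurement error.
rotor_speed = [1.00, 1.02, 1.01, 0.99, 1.03, 1.00, 5.00]
print(iqr_bounds(rotor_speed))
print(truncate(rotor_speed))
```

Only the last sample falls outside the limits, so it is the only value changed by `truncate`; each of the 13 indexes would be processed separately in the same way.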
202: extracting and reducing the dimension of the wind power plant characteristics through XGboost;
XGBoost in the embodiment of the invention is an ensemble algorithm based on the regression tree model. While implementing the Gradient Boosting Decision Tree (GBDT) algorithm, it further optimizes space and time for engineering problems. For example, nodes at the same level can be processed in parallel, and with cross validation tree building can stop early once the prediction is good enough; a regular term is added to the objective function, which reduces model complexity, improves generalization and effectively prevents overfitting; and a Taylor expansion is applied to the loss function, whose approximation markedly speeds up computation and allows the loss function to be reduced further.
The overall structure of the algorithm is given by formula (2):
$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i) \tag{2}$$
where i denotes the i-th sample, and the estimated value $\hat{y}_i$ is obtained from the input sample data $x_i$; $\phi(x_i)$ is the mapping from the sample data $x_i$ to the estimated value $\hat{y}_i$; and $f_k$ is the function obtained by training the k-th tree. The overall mapping is the sum of the functions generated by the K trees.
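The additive structure described above, in which the estimate is the sum of the outputs of K trees, can be illustrated with a short sketch. The three "trees" below are hypothetical stand-in functions chosen only to show the summation; they are not trained models.

```python
def predict(trees, x):
    """Estimate for one sample: the sum of the outputs of all K tree functions."""
    return sum(f(x) for f in trees)


# Hypothetical tree functions f_1..f_3 acting on a two-feature sample x.
trees = [
    lambda x: 0.5 * x[0],
    lambda x: -0.1,
    lambda x: 0.2 if x[1] > 1.0 else 0.0,
]

y_hat = predict(trees, [2.0, 1.5])
print(y_hat)
```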
The objective function is expressed as:
$$Y^{(t)} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t)}\right) + \Omega(f_t) + c$$
where L is the training error, i.e. the loss function of the predicted and true values, which represents how well the model fits the training set; $\Omega(f_t)$ is the regular term, which represents the complexity of the model (the more complex the model, the larger its value); c is a constant term; and $y_i$ is the true value.
For the loss function, regression problems are measured by the mean square error (MSE) and classification problems by the cross entropy (which measures the dissimilarity between two probability distributions). With the MSE, the loss is:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where n is the number of sample points.
XGBoost adds a regular term to the objective function to avoid overfitting. The regular term is built from the parameters of the leaf nodes of the new tree and can be defined with the L1 norm or the L2 norm:
$$\text{L1:}\quad \Omega(w) = \lambda \lVert w \rVert_1 \tag{5}$$
$$\text{L2:}\quad \Omega(w) = \lambda \lVert w \rVert_2 \tag{6}$$
where w is the set of leaf-node scores, λ keeps the leaf-node scores from becoming too large (i.e. prevents overfitting), and Ω(w) is the norm-based penalty.
The embodiment of the invention uses the L2 norm to define the regular term:
$$\Omega(f_t) = \mu W + \frac{1}{2}\lambda \lVert w \rVert_2^2$$
where W is the number of leaf nodes of each tree, and μ and λ are manually set hyper-parameters.
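As a numerical illustration of the regular term, the sketch below assumes the form commonly used by XGBoost, μW plus one half of λ times the sum of squared leaf scores. That exact form, the function name and the sample values are assumptions made for illustration.

```python
def regular_term(leaf_scores, mu, lam):
    """Complexity penalty: mu * (number of leaves) + 0.5 * lam * sum of squared leaf scores."""
    return mu * len(leaf_scores) + 0.5 * lam * sum(m * m for m in leaf_scores)


# Hypothetical tree with W = 3 leaves.
penalty = regular_term([0.5, -1.0, 2.0], mu=0.1, lam=1.0)
print(penalty)
```

A tree with more leaves or larger leaf scores receives a larger penalty, which is how the term discourages overly complex trees.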
For the loss function, the gradient boosting tree uses an approximation in which the negative gradient represents the residual. The gradient gives the direction in which the directional derivative of the function at a point is largest, i.e. the direction in which the function changes fastest at that point. Each training round adds one function in order to reduce the residual, and to reduce it as fast as possible, each new model is built along the gradient direction in which the residual decreases.
Suppose the model is to be trained for t rounds. The final function is determined as follows:
$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$$
where $f_t(x_i)$ is the function obtained by training the t-th tree, evaluated for the i-th sample.
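A new function $f_t(x_i)$ is added in each new round so as to reduce the objective function as much as possible. In the t-th round, the objective function is:
$$Y^{(t)} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + c$$
Approximating with a Taylor expansion gives:
$$f(x + \Delta x) \approx f(x) + f'(x)\,\Delta x + \frac{1}{2} f''(x)\,\Delta x^2$$
$$Y^{(t)} \approx \sum_{i=1}^{n}\left[ L\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + c$$
where the increment Δx corresponds to the function added in the new round; f'(x) is the first derivative of f(x), and $g_i$ is the first derivative of the training error with respect to $\hat{y}_i^{(t-1)}$; f''(x) is the second derivative of f(x), and $h_i$ is the second derivative of the training error; and $L\left(y_i, \hat{y}_i^{(t-1)}\right)$ is the training error of the i-th sample after the (t-1)-th tree.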
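For the squared-error loss, the first and second derivatives of the training error have closed forms, and the second-order Taylor approximation is exact. The sketch below checks this numerically. It assumes the per-sample loss (y − ŷ)², for which g = 2(ŷ − y) and h = 2; the function names and numbers are illustrative.

```python
def squared_error(y, y_hat):
    """Per-sample training error for a regression problem."""
    return (y - y_hat) ** 2


def grad_hess(y, y_hat_prev):
    """First and second derivatives of the squared error w.r.t. the previous prediction."""
    g = 2.0 * (y_hat_prev - y)
    h = 2.0
    return g, h


def taylor_loss(y, y_hat_prev, f_t):
    """Second-order expansion of the loss after adding the new tree output f_t."""
    g, h = grad_hess(y, y_hat_prev)
    return squared_error(y, y_hat_prev) + g * f_t + 0.5 * h * f_t ** 2


y, y_hat_prev, f_t = 3.0, 2.5, 0.25  # true value, previous estimate, new tree output
exact = squared_error(y, y_hat_prev + f_t)
approx = taylor_loss(y, y_hat_prev, f_t)
print(exact, approx)
```

For other losses, such as the cross entropy, the two values would differ slightly, which is why the expansion is described as an approximation.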
Define the leaf node average gain $G_j$ and the leaf node average gain trend $H_j$:
$$G_j = \sum_{i \in I_j} g_i$$
$$H_j = \sum_{i \in I_j} h_i$$
where $I_j$ is the sample space corresponding to the j-th leaf node.
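The per-leaf quantities can be accumulated in one pass over the samples. The sketch below assumes, as in standard XGBoost, that G_j and H_j are the sums of the first and second derivatives g_i and h_i over the samples assigned to leaf j; the leaf assignment and derivative values are hypothetical.

```python
from collections import defaultdict


def leaf_sums(leaf_of_sample, g, h):
    """Sum g_i and h_i over the samples that fall into each leaf node."""
    G = defaultdict(float)
    H = defaultdict(float)
    for leaf, g_i, h_i in zip(leaf_of_sample, g, h):
        G[leaf] += g_i
        H[leaf] += h_i
    return dict(G), dict(H)


# Hypothetical assignment of five samples to two leaves (I_0 and I_1).
leaf_of_sample = [0, 0, 1, 1, 1]
g = [-1.0, -0.5, 0.4, 0.6, 1.0]
h = [2.0, 2.0, 2.0, 2.0, 2.0]

G, H = leaf_sums(leaf_of_sample, g, h)
print(G, H)
```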
The new objective function is obtained as:
$$Y^{(t)} \approx \sum_{i=1}^{n}\left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$
$$Y^{(t)} = \sum_{j=1}^{W}\left[ G_j m_j + \frac{1}{2}\left(H_j + \lambda\right) m_j^2 \right] + \mu W$$
where $Y^{(t)}$ is the objective function of each boosted tree; λ is the hyper-parameter on the leaf-node prediction scores; and $m_j$ is the prediction score of the j-th leaf node.
The above is the objective function of the t-th step, i.e. the function to be minimized in the t-th training round. The approximation markedly speeds up computation, so more rounds can be run and the loss function reduced further.
The boosted tree is generated as follows:
Step 1: For the first tree, traverse every feature value of every wind turbine sample, split the original set of wind turbine samples into two parts at that feature value, and compute the mean square error of each of the two sets. Among all feature values of the turbine samples, find the feature that minimizes the sum of the mean square errors of the left and right sets (a set is also called a node), record the name of that feature and the corresponding feature value, and split the tree into left and right nodes accordingly.
Step 2: Repeat the above step until the average gain G is negative at every position where a feature could still be split, at which point the model stops growing.
Step 3: When no further split is made, the nodes of the last layer are called the child nodes of the tree (each node is a set), and the mean of the current predicted values of the samples falling in a child node is the result for that node.
An ordinary regression tree uses the squared-error loss function. With the forward stagewise approach, optimizing each step is enough to ensure that the whole is optimized. Owing to the special form of the squared error, each step only needs to fit the residual (true value minus predicted value), so the input used to generate the second tree is the residual, and that tree is generated on the same principle as the previous one.
The above is the method for generating the trees, which are subsequently used to select the wind turbine features.
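Step 1 and the residual fitting described above can be sketched for a single split as follows. This is a simplified illustration, not the embodiment's implementation: it searches one split only, uses the sum of the left and right mean square errors as the criterion stated in Step 1, and works on hypothetical two-feature samples.

```python
def mse(values):
    """Mean square error of a set around its own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(X, y):
    """Return (feature index, threshold, cost) minimizing MSE(left) + MSE(right)."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue  # a split must leave samples on both sides
            cost = mse(left) + mse(right)
            if best is None or cost < best[2]:
                best = (j, threshold, cost)
    return best


# Hypothetical samples: [rotor speed, pitch angle] and a target value per turbine.
X = [[1.0, 10.0], [1.1, 20.0], [0.4, 10.0], [0.5, 20.0]]
y = [5.0, 5.2, 1.0, 1.1]

feature, threshold, cost = best_split(X, y)
left = [t for row, t in zip(X, y) if row[feature] <= threshold]
right = [t for row, t in zip(X, y) if row[feature] > threshold]
pred = [sum(left) / len(left) if row[feature] <= threshold else sum(right) / len(right)
        for row in X]
residuals = [t - p for t, p in zip(y, pred)]  # input for the second tree
print(feature, threshold, residuals)
```

The split lands on the first feature at 0.5, and the residuals left by the two leaf means are what the next tree would be trained on.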
Using the above algorithm, the wind turbine features are selected as follows:
1) Input the wind turbine samples x and the loss function L;
2) Build a tree with a greedy algorithm, learning a new function that fits the residual of the previous prediction;
3) Train the loss function L iteratively, making the error as small as possible;
4) Define the regular term, compute the complexity, and describe the generated tree by a structure s and weights m.
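Minimizing the objective function $Y^{(t)}$ with respect to the leaf weights, whose optimal values are
$$m_j^{*} = -\frac{G_j}{H_j + \lambda}$$
gives
$$Y^{(t)*} = -\frac{1}{2}\sum_{j=1}^{W} \frac{G_j^2}{H_j + \lambda} + \mu W$$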
5) After node iteration, obtain the optimal value of the difference between the loss before and after splitting a node (i.e. the gain of the node split). This value is used to compute the average gain G of a feature, which represents the importance of the feature; the more important features are then selected to achieve dimension reduction:
$$G = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \mu$$
where $G_L$, $H_L$ and $G_R$, $H_R$ are the sums of the first and second derivatives over the left and right child nodes; $\frac{G_L^2}{H_L+\lambda}$ is the score of the left subtree, $\frac{G_R^2}{H_R+\lambda}$ is the score of the right subtree, $\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}$ is the score obtained without splitting, and μ is the complexity introduced by the new node.
6) Repeat the above steps until enough trees have been generated that the predicted value is closest to the true value, at which point the algorithm ends.
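The gain of a node split in step 5) can be evaluated with a few lines of code. The sketch assumes the standard XGBoost expressions: a leaf with derivative sums G and H has optimal score −G/(H+λ) and contributes G²/(H+λ) to the structure score, and a split is worth half the increase in that score minus the per-node complexity μ. The function names and numbers are hypothetical.

```python
def leaf_weight(G, H, lam):
    """Optimal prediction score of a leaf under L2 regularization."""
    return -G / (H + lam)


def structure_score(G, H, lam):
    """Score G^2 / (H + lam) of a node holding derivative sums G and H."""
    return G * G / (H + lam)


def split_gain(G_left, H_left, G_right, H_right, lam, mu):
    """Loss reduction from splitting one node into left and right children."""
    before = structure_score(G_left + G_right, H_left + H_right, lam)
    after = structure_score(G_left, H_left, lam) + structure_score(G_right, H_right, lam)
    return 0.5 * (after - before) - mu


gain = split_gain(G_left=-4.0, H_left=2.0, G_right=3.0, H_right=2.0, lam=1.0, mu=0.5)
print(gain, leaf_weight(-4.0, 2.0, 1.0))
```

A negative gain means the split does not pay for the added complexity, which matches the stopping condition in Step 2 of the tree generation.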
The embodiment of the invention applies this algorithm to the dimension reduction of the clustering indexes. The larger the gain, the greater the difference between the models before and after the split, i.e. the more the objective function decreases and the closer it comes to its optimum. Therefore, each time a feature is used for a split, its G value is recorded; finally, the sum of all G values of the feature is divided by the number of times the feature was used to split a node, giving a quantitative score of the feature's contribution (i.e. total G value divided by number of splits). Features are then deleted one at a time in order of contribution from low to high, clustering is repeated after each deletion, and after the traversal the index selection scheme corresponding to the clustering with the highest silhouette (contour) value is obtained. The silhouette value is a common index for evaluating clustering quality that is well known to those skilled in the art and is not described further in the embodiments of the present invention.
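The scoring and elimination procedure in this paragraph can be sketched as follows. The split records, feature names and the `toy_silhouette` function are hypothetical placeholders: in practice the split gains would come from the trained trees, and the evaluation function would re-run the clustering on the remaining features and return its silhouette value.

```python
from collections import defaultdict


def average_gain(splits):
    """Contribution score per feature: total gain divided by number of splits."""
    total = defaultdict(float)
    count = defaultdict(int)
    for feature, gain in splits:
        total[feature] += gain
        count[feature] += 1
    return {f: total[f] / count[f] for f in total}


def select_features(scores, evaluate):
    """Drop features from lowest to highest score; keep the subset with the best evaluation."""
    remaining = sorted(scores, key=scores.get)  # ascending contribution
    best_subset, best_value = list(remaining), evaluate(remaining)
    while len(remaining) > 1:
        remaining = remaining[1:]  # delete the least important remaining feature
        value = evaluate(remaining)
        if value > best_value:
            best_subset, best_value = list(remaining), value
    return best_subset, best_value


# Hypothetical (feature, gain) records collected from node splits.
splits = [("P", 4.0), ("wr", 9.0), ("P", 2.0), ("Pitch", 0.5), ("wr", 7.0)]
scores = average_gain(splits)


def toy_silhouette(subset):
    """Placeholder for 'cluster on this subset and return the silhouette value'."""
    return {3: 0.41, 2: 0.63, 1: 0.52}[len(subset)]


subset, value = select_features(scores, toy_silhouette)
print(scores, subset, value)
```

The placeholder values are made up purely to exercise the loop and are not results from the patent.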
Step 203: a clustering method based on DBSCAN-DTW is provided for cluster division.
For any kind of time-series characteristic data, two sequences P and Q are defined, the lengths of which are m and n respectively, and are expressed as P ═ P (P) respectively1,p2,…,pm),Q=(q1,q2,…,qn). When m is equal to n, the distance between a point and a point can be calculated by using the euclidean distance:
Figure BDA0002861460530000087
Owing to wind direction, terrain, wake and other effects, the units in one wind farm experience different wind speeds, so even wind turbines of the same model have different dynamic response times. Because of the structure of the collector network inside the wind farm, different wind turbines also experience a fault differently; for example, when a fault occurs at the wind farm outlet, some wind turbines may go through low-voltage ride-through while others are unaffected. Under these complex conditions the operating data of different wind turbines are not aligned in time, so the Euclidean distance cannot provide an effective measure of the similarity between two time series within the DBSCAN clustering algorithm. The method therefore uses DTW to compute the similarity of time-series data in DBSCAN: the time series are stretched and compressed, constraints and pruning are applied, and the optimal warping path is searched for; the similarity between the two series is then measured by the sum of the Euclidean distances between all matched points, called the Warping Path Distance (WPD).
DTW is an important method for measuring the similarity of two sequences of different lengths. Its core is that it overcomes the restriction of unequal sequence lengths: by stretching and compressing the sequences, points on one sequence are aligned with points on the other, and the cumulative minimum distance between the points of the two sequences is computed as follows:
$$\gamma(i, j) = D(i, j) + \min\{\gamma(i-1, j),\ \gamma(i, j-1),\ \gamma(i-1, j-1)\}$$
where $\gamma(i, j)$ is the cumulative minimum distance and D(i, j) is the distance between the i-th point of sequence P and the j-th point of sequence Q.
To align the two sequences, an m × n distance matrix grid is constructed, in which each element (i, j) is the distance $d(p_i, q_j)$ between two points; the smaller the distance, the higher the similarity. Suppose sequence P is (2, 3, 6, 2, 1) and sequence Q is (1, 3, 6, 4). The distance between their first points is then 1, and the distance matrix grid is shown in FIG. 2.
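For the two example sequences, the distance matrix grid can be computed directly. The sketch takes the absolute difference as the point-to-point distance, which coincides with the Euclidean distance for one-dimensional points; the layout may differ from that of FIG. 2.

```python
P = [2, 3, 6, 2, 1]
Q = [1, 3, 6, 4]

# grid[i][j] is the distance between the (i+1)-th point of P and the (j+1)-th point of Q.
grid = [[abs(p - q) for q in Q] for p in P]

for row in grid:
    print(row)
```

Zeros mark pairs of identical values, such as the second points of both sequences, and a low-cost alignment tends to pass through them.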
Let R denote the warping path of the sequences and K its length, so that $R = (r_1, r_2, \ldots, r_K)$. The warping path distance function is
$$\mathrm{WPD}(P, Q) = \min_{R} \sum_{k=1}^{K} d(r_k)$$
where $d(r_k)$ is the distance between the pair of points matched by path element $r_k$.
The warping path must satisfy certain constraints:
1) Boundary: the start and end points of the two sequences P and Q must correspond, i.e. $r_1 = (1, 1)$ and $r_K = (m, n)$.
2) Monotonicity: the matched points on P and Q must advance monotonically, so that the matches between the two sequences do not cross.
3) Continuity: a point can only be matched with adjacent points and cannot skip over points, i.e. for consecutive path elements $r_{k-1} = (i', j')$ and $r_k = (i, j)$, $0 \le i - i' \le 1$ and $0 \le j - j' \le 1$.
Once these constraints are satisfied, the warping path and the cumulative minimum distance can be computed, as shown in FIG. 3.
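The cumulative minimum distance and a warping path that respects the three constraints can be obtained by dynamic programming. The sketch below is a plain textbook DTW without the pruning mentioned in the text, uses the absolute difference as the point distance, and reports path elements as 1-based (i, j) pairs; the function name is illustrative.

```python
def dtw(P, Q):
    """Return (cumulative minimum distance, warping path) for two sequences."""
    m, n = len(P), len(Q)
    inf = float("inf")
    # acc[i][j] is the cumulative minimum distance up to P[i-1] and Q[j-1].
    acc = [[inf] * (n + 1) for _ in range(m + 1)]
    acc[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(P[i - 1] - Q[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Backtrack from (m, n) to (1, 1), moving only to adjacent cells.
    i, j = m, n
    path = [(i, j)]
    while (i, j) != (1, 1):
        steps = [(acc[i - 1][j - 1], i - 1, j - 1),
                 (acc[i - 1][j], i - 1, j),
                 (acc[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i, j))
    path.reverse()
    return acc[m][n], path


distance, path = dtw([2, 3, 6, 2, 1], [1, 3, 6, 4])
print(distance, path)
```

For the example sequences P = (2, 3, 6, 2, 1) and Q = (1, 3, 6, 4), the path starts at (1, 1), ends at (5, 4) and matches the last two points of P to the last point of Q, which is how DTW absorbs the difference in length.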
The similarity between the multidimensional time series obtained through DTW is then fed into the DBSCAN clustering algorithm. DBSCAN divides regions of sufficient density into clusters and finds clusters of arbitrary shape in a noisy spatial database; it defines a cluster as the largest set of density-connected points. Unlike distance-based clustering algorithms, density-based algorithms can find clusters of arbitrary shape: high-density regions separated by low-density regions are located in the data set, and each separated high-density region is treated as an independent category. DBSCAN is one of the more flexible clustering algorithms; it can break through the limitation of the clustering algorithm on the number of samples and is suitable for both dense and non-dense data. It judges how close samples are on the basis of density and assigns the samples that meet the requirement to one category, i.e. a cluster. Unlike other clustering algorithms, DBSCAN has two important parameters that must be set manually, Epsilon and MinPts, where Epsilon is the clustering radius and MinPts is the minimum number of samples in a class.
The parameters Epsilon and MinPts divide all samples into three categories, as shown in FIG. 4:
Core point: select a point M in the samples and let N be the number of samples that are density-reachable from it; when N is greater than or equal to MinPts, M is a core point.
Boundary point: select a point P in the samples and let r be the radius from P to a core point M; when r = Epsilon, P is a boundary point.
Noise point: select a point Q in the samples; if Q is not density-reachable from any core point, Q is a noise point.
Once the two parameters are set, a point in the samples can be selected as a core point, and all samples that are density-reachable from it are collected into one category, with every point of the category lying inside the Epsilon neighborhood. Let the sample set be E; for a point m:
Epsilon(m,M)={m∈E|d(m,M)≤Epsilon} (20)
where Epsilon(m, M) is the set of data points within the clustering radius around the core point M.
By analogy, further core points are found; the algorithm flow is shown in fig. 1, and the algorithm terminates once every core object has been assigned a category.
Based on the acquired fan data, DTW (dynamic time warping) is used to compute the optimal warping path between fans, and the similarity is computed from the Euclidean distance along that path. This similarity serves as the clustering basis for DBSCAN (density-based spatial clustering of applications with noise), and the optimal grouping is obtained by repeatedly adjusting the parameters. At this stage the computational dimensionality is very high, and the features may be strongly correlated and redundant, so XGBoost is used to select the clustering indexes and reduce the feature dimensionality; DBSCAN-DTW clustering is then performed again and the clustering result is output.
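The cluster-growing procedure can be sketched on a precomputed distance matrix, which is how a matrix of pairwise warping path distances would be passed to DBSCAN. This is a minimal illustration under stated assumptions rather than the patent's code: the function name `dbscan` is hypothetical, a point counts itself among its neighbours (as common library implementations do), and the label -1 marks noise.

```python
def dbscan(dist, eps, min_pts):
    """Cluster samples given a symmetric distance matrix; -1 marks noise."""
    n = len(dist)
    # Epsilon neighbourhood of every sample (includes the sample itself).
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a boundary point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # boundary point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is also a core point: keep expanding
    return labels


# Toy 1-D samples standing in for fans; the real input would be DTW distances.
points = [0, 1, 2, 10, 11, 12, 30]
dist = [[abs(a - b) for b in points] for a in points]
print(dbscan(dist, eps=1, min_pts=2))
```

Working from a distance matrix keeps the clustering step independent of how similarity is measured, so the Euclidean distance can be swapped for the warping path distance without touching the clustering logic.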
Example 3
The feasibility of the schemes of Examples 1 and 2 is verified below with specific tests, calculations and Tables 1-3, as described in detail below. In this embodiment of the invention, the MATLAB/Simulink simulation platform is used to build a wind farm consisting of 16 DFIGs rated at 1.5 MW each, as shown in FIG. 6. The 690 V terminal voltage of each DFIG is stepped up to 35 kV on site through a one-generator-one-transformer unit connection, transmitted over an overhead line to a 35 kV/220 kV substation, and delivered to the external grid. The initial wind speed data of the wind turbines are shown in the following table.
TABLE 1 initial wind speed
Figure BDA0002861460530000101
Figure BDA0002861460530000111
In terms of software configuration, this embodiment uses Python-based software, including the sklearn machine learning library, the vim editor and the Anaconda environment manager.
A three-phase short-circuit fault is applied at the outlet of the wind farm for a certain period, and 13-dimensional transient and steady-state feature time series are collected for each fan. The 13 features comprise 4 mechanical indexes of each fan (rotor angular velocity wr, pitch angle Pitch, electromagnetic torque Tem and mechanical torque Tm) and 9 further indexes of each fan (stator voltage Vs, active power P, reactive power Q, rotor voltage d-axis component Vrd, rotor voltage q-axis component Vrq, stator current d-axis component Isd, stator current q-axis component Isq, rotor current d-axis component Ird and rotor current q-axis component Irq). XGBoost and the random forest algorithm are each used to reduce the dimensionality of the collected 13-dimensional DFIG time series features, and the resulting ranking of index contributions is shown in Table 2 below.
TABLE 2 index contribution ranking
Figure BDA0002861460530000112
The XGBoost training time is 3.53 s and the random forest training time is 52.85 s; XGBoost trains far faster than the random forest, which verifies its training speed.
After dimensionality reduction with XGBoost, the highest silhouette (contour) value obtained is 0.954, and the corresponding clustering index set is 3-dimensional: Tem, Vrq, Ird. After dimensionality reduction with the random forest, the highest silhouette value obtained is 0.897, and the corresponding clustering index set is also 3-dimensional: Tem, Ird, Irq. The former is clearly higher than the latter. In the embodiment of the invention the silhouette value of an outlier is set to 0, so the silhouette index is an optimization target that jointly accounts for few outliers and high clustering similarity of the DFIGs; this shows that XGBoost selects the clustering indexes more accurately.
DTW yields the Euclidean distance along the warping path, and the similarity it represents is used for clustering. The maximum search radius is set to the maximum similarity value between two fans, all points in the high-dimensional space are traversed, and the solution space is screened.
After the wind farm has been grouped by the clustering algorithm, the equivalent parameters of the DFIGs are calculated by the capacity weighting method and the equivalent parameters of the collector network inside the wind farm are obtained by the loss-invariance principle. The corresponding MATLAB/Simulink simulation model is then built and its outlet dynamic response is compared with that of the original model. To verify the clustering accuracy of DBSCAN-DTW with XGBoost dimensionality reduction, the embodiment of the invention runs the same experiment with a K-means clustering algorithm, a DBSCAN-DTW clustering algorithm, and a DBSCAN-DTW clustering algorithm with random forest dimensionality reduction for comparison; the comparison data are shown in the following table.
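The scoring rule described here, in which outliers contribute a value of 0, can be sketched as below. This is an illustrative reading and not the patent's code: it assumes the standard silhouette definition $s = (b - a) / \max(a, b)$ computed from a precomputed distance matrix, uses the label -1 for outliers, and also scores singleton clusters (or a single-cluster result) as 0. The function name is hypothetical.

```python
def silhouette_with_outliers(dist, labels):
    """Mean silhouette over all samples; samples labelled -1 score 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        # Outliers, degenerate results and singleton clusters get 0 (assumption).
        if lab == -1 or len(clusters) < 2 or len(clusters[lab]) == 1:
            scores.append(0.0)
            continue
        own = clusters[lab]
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(labels)


# Toy data: two tight groups and one outlier (not the patent's fan data).
points = [0, 1, 2, 10, 11, 12, 30]
dist = [[abs(a - b) for b in points] for a in points]
print(silhouette_with_outliers(dist, [0, 0, 0, 1, 1, 1, -1]))
```

Because every outlier pulls the mean towards 0, a grouping scores highly only if it is both compact and leaves few fans unassigned, which matches the optimization target described above.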
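One simple reading of the capacity weighting step and of the deviation comparison is sketched below. It is an assumption-laden illustration: the patent does not give the formulas in this section, so the capacity-weighted mean, the mean relative deviation and all function names are hypothetical, and the collector network equivalence by the loss-invariance principle is not covered.

```python
def capacity_weighted_parameter(values, capacities):
    """Capacity-weighted mean of one machine parameter within a cluster."""
    total = sum(capacities)
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return sum(v * s for v, s in zip(values, capacities)) / total


def equivalent_by_cluster(labels, values, capacities):
    """Return {cluster label: equivalent parameter} for the grouped machines."""
    groups = {}
    for lab, v, s in zip(labels, values, capacities):
        groups.setdefault(lab, []).append((v, s))
    return {lab: capacity_weighted_parameter([v for v, _ in items],
                                             [s for _, s in items])
            for lab, items in groups.items()}


def mean_relative_deviation(equivalent, detailed):
    """Average |equivalent - detailed| / |detailed| over a sampled response."""
    return sum(abs(e - d) / abs(d)
               for e, d in zip(equivalent, detailed)) / len(detailed)


# Toy numbers only: three 1.5 MW machines in two clusters.
print(equivalent_by_cluster([0, 0, 1], [2.0, 4.0, 5.0], [1.5, 1.5, 1.5]))
print(mean_relative_deviation([0.98, 1.02], [1.0, 1.0]))
```

With identical ratings, as for the 16 DFIGs of 1.5 MW in this example, the capacity-weighted mean reduces to a plain average within each cluster.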
TABLE 3 comparison of cluster partitioning results with dynamic equivalent relative deviation
Figure BDA0002861460530000121
As shown in Table 3 above, compared with the equivalent models clustered by the other methods, the DBSCAN-DTW clustering with XGBoost dimensionality reduction proposed in the embodiment of the present invention reduces the voltage deviation, the active power deviation and the reactive power deviation of the equivalent model and significantly improves the accuracy.
Model evaluation:
1) When a three-phase short-circuit fault occurs at the grid connection point, DFIGs of the same model have different dynamic response times at different wind speeds, and the differences become larger once different electrical distances, fault locations or fault types are considered. Conventional measures such as the Euclidean distance and the cosine distance assume that the time series correspond point by point, so a clustering algorithm that uses them cannot cluster the DFIGs effectively. The method uses the DTW algorithm to calculate the warping path, solves this problem through time series warping, and improves the accuracy of the model.
2) Because the distribution of the DFIGs in the multi-dimensional feature space is random, similar DFIGs usually form irregular clusters. Conventional distance-based clustering algorithms can only form spherical clusters, so abnormal points shift the cluster centroids during clustering and lower the silhouette (contour) value. DBSCAN solves this problem through density-based clustering and improves the model accuracy.
3) Because the multidimensional features of the wind farm contain considerable noise during stable operation, fault occurrence, fault recovery and other periods, conventional clustering methods lose clustering accuracy. DBSCAN has very low sensitivity to noise, offers better noise immunity, and improves the model accuracy.
4) To capture the characteristics of the DFIG fully, the electrical and mechanical indexes have high dimensionality; the indexes may be strongly correlated and may also contain noise, which reduces the clustering speed and precision. XGBoost can effectively select the clustering features, increase the clustering speed, filter out noise and redundant data, and markedly raise the silhouette (contour) value. XGBoost also reduces dimensionality faster than Random Forest and yields a higher silhouette value, so the model is simplified more quickly, accurately and effectively.
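The feature selection loop referred to in item 4), deleting the least important feature, re-clustering and keeping the best-scoring subset, can be sketched as follows. This is a hedged outline only: `score_fn` is a hypothetical callback standing in for "run DBSCAN-DTW on these features and return the silhouette value", and the scores in the usage lines are toy numbers, not results from the patent.

```python
def select_features(ranked_features, score_fn):
    """Backward elimination over a contribution ranking.

    ranked_features: feature names ordered from highest to lowest contribution.
    score_fn: callable taking a list of feature names and returning a
              clustering quality score (higher is better).
    """
    subset = list(ranked_features)
    best_subset, best_score = list(subset), score_fn(subset)
    while len(subset) > 1:
        subset = subset[:-1]  # delete the lowest-contribution feature left
        score = score_fn(subset)  # re-cluster and re-score
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score


# Toy scoring table keyed by subset size; a real score_fn would cluster the data.
toy_scores = {5: 0.2, 4: 0.4, 3: 0.9, 2: 0.6, 1: 0.1}
ranking = ["Tem", "Vrq", "Ird", "P", "Q"]  # hypothetical ranking order
print(select_features(ranking, lambda feats: toy_scores[len(feats)]))
```

The loop evaluates every prefix of the ranking rather than stopping at the first drop in score, since the score need not change monotonically as features are removed.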
In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A wind power plant clustering method based on extreme gradient dynamic density clustering is characterized by comprising the following steps:
selecting grouping indexes for the wind power plant, and performing outlier detection and truncation on the corresponding index data over a certain period;
performing dimensionality-reduction selection on the preprocessed clustering index data using XGBoost;
and dividing the selected index data into clusters based on DBSCAN-DTW clustering.
2. The wind power plant clustering method based on extreme gradient dynamic density clustering according to claim 1, wherein the selecting of grouping indexes for the wind power plant and the outlier detection and truncation of the corresponding index data over a certain period specifically comprise:
selecting 13 wind power plant grouping indexes, namely: the rotor angular speed, the pitch angle, the electromagnetic torque, the mechanical torque, the stator voltage, the active power, the reactive power, the rotor voltage d-axis component, the rotor voltage q-axis component, the stator current d-axis component, the stator current q-axis component, the rotor current d-axis component and the rotor current q-axis component of each fan;
the upper and lower outlier truncation limits are:
Figure FDA0002861460520000011
in the formula: min and max denote the lower and upper limits of data truncation; $Q_1$ and $Q_3$ denote the lower and upper quartiles, respectively; and $IQR = Q_3 - Q_1$.
3. The wind power field clustering method based on extreme gradient dynamic density clustering according to claim 1, wherein the performing dimensionality-reduction selection on the clustering index data using XGBoost specifically comprises:
obtaining an objective function of each boosted tree, obtaining, after node iteration, the optimal value of the difference between the loss before node splitting and the loss after node splitting, calculating the average gain of each feature to represent its importance, and then selecting the more important features to realize dimensionality reduction;
recording the average gain each time a feature is used for a split, and finally dividing the sum of all average gain values of the feature by the number of times the feature is used to split nodes, to obtain a quantitative score of the contribution of the feature;
and deleting the features one by one in order of contribution from low to high, clustering again each time, and traversing to obtain the index selection scheme corresponding to the clustering with the highest silhouette (contour) value.
4. The wind farm clustering method based on extreme gradient dynamic density clustering according to claim 1, wherein the dividing of the selected index data into clusters based on DBSCAN-DTW clustering specifically comprises:
calculating the similarity between two time series by stretching and compressing the time series, searching for the optimal warping path under certain constraints and pruning, and measuring the similarity between the two time series by the sum of the Euclidean distances between all matched points, which is called the warping path distance.
5. The wind farm clustering method based on extreme gradient dynamic density clustering according to claim 1, wherein the method further comprises:
obtaining the optimal clustering by repeatedly adjusting the parameters, selecting the clustering indexes using XGBoost to realize feature dimensionality reduction, performing DBSCAN-DTW clustering again, and outputting the clustering result.
6. The wind power plant clustering method based on extreme gradient dynamic density clustering according to claim 3, wherein the objective function of each boosted tree is as follows:
Figure FDA0002861460520000021
wherein W is the number of leaf nodes of each tree, $G_j$ is the average gain of the leaf nodes, $m_j$ is the prediction score of the leaf node, n is the number of sample points, $H_j$ represents the average gain trend of the leaf nodes, $\theta$ is a hyper-parameter of the leaf node prediction scores, and $\mu$ represents the complexity of the new node.
7. The wind farm clustering method based on extreme gradient dynamic density clustering according to claim 6, wherein the average gain is as follows:
Figure FDA0002861460520000022
wherein
Figure FDA0002861460520000023
representing the score of the left sub-tree,
Figure FDA0002861460520000024
representing the score of the right sub-tree,
Figure FDA0002861460520000025
representing the score obtained without segmentation; $G_L$ is the average gain of the leaf nodes of the left sub-tree, $H_L$ is the average gain trend of the leaf nodes of the left sub-tree, $G_R$ is the average gain of the leaf nodes of the right sub-tree, and $H_R$ is the average gain trend of the leaf nodes of the right sub-tree.
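The numeric steps named in claims 2, 3 and 7 can be illustrated with a short sketch. Several points are assumptions because the formulas survive only as figures: the truncation limits use the common $Q_1 - 1.5\,IQR$ and $Q_3 + 1.5\,IQR$ rule (the factor 1.5 is assumed), the split gain uses the standard XGBoost form with $\theta$ as the leaf-score regularization term and $\mu$ as the per-node complexity penalty, and all function names are hypothetical.

```python
import statistics


def truncation_limits(data, k=1.5):
    """Lower and upper truncation limits from the quartiles (k is assumed)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def truncate(data, k=1.5):
    """Clip every value to the truncation limits."""
    lo, hi = truncation_limits(data, k)
    return [min(max(x, lo), hi) for x in data]


def split_gain(g_left, h_left, g_right, h_right, theta=1.0, mu=0.0):
    """Standard XGBoost-style gain: left score + right score - unsplit score."""
    def score(g, h):
        return g * g / (h + theta)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - mu


def average_gain(split_records):
    """Sum of gains per feature divided by the number of splits that used it.

    split_records: iterable of (feature name, gain) pairs, one per split.
    """
    totals, counts = {}, {}
    for name, gain in split_records:
        totals[name] = totals.get(name, 0.0) + gain
        counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}


print(truncate([1, 2, 3, 4, 5, 6, 7, 8, 100]))
print(split_gain(-4.0, 2.0, 6.0, 3.0))
print(average_gain([("Tem", 4.0), ("Tem", 2.0), ("Ird", 1.0)]))
```

The quartiles here come from the default method of `statistics.quantiles`; other quartile conventions give slightly different limits on small samples.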

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011565766.6A CN112581315A (en) 2020-12-25 2020-12-25 Wind power field clustering method based on extreme gradient dynamic density clustering

Publications (1)

Publication Number Publication Date
CN112581315A true CN112581315A (en) 2021-03-30

Family

ID=75139752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011565766.6A Pending CN112581315A (en) 2020-12-25 2020-12-25 Wind power field clustering method based on extreme gradient dynamic density clustering

Country Status (1)

Country Link
CN (1) CN112581315A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210622

Address after: 132012 No. 169, Changchun Road, Jilin, Jilin

Applicant after: NORTHEAST DIANLI University

Applicant after: STATE GRID LIAONING ELECTRIC POWER Research Institute

Applicant after: STATE GRID CORPORATION OF CHINA

Address before: 132012 No. 169, Changchun Road, Jilin, Jilin

Applicant before: NORTHEAST DIANLI University

Applicant before: STATE GRID LIAONING ELECTRIC POWER Research Institute

RJ01 Rejection of invention patent application after publication

Application publication date: 20210330