CN109684673B

CN109684673B - Feature extraction and cluster analysis method for transient stability result of power system

Info

Publication number: CN109684673B
Application number: CN201811466907.1A
Authority: CN
Inventors: 刘颂凯; 毛丹; 程江洲; 杨楠; 王灿; 杨苗; 李欣; 郭攀锋
Original assignee: China Three Gorges University CTGU
Current assignee: China Three Gorges University CTGU
Priority date: 2018-12-03
Filing date: 2018-12-03
Publication date: 2023-03-24
Anticipated expiration: 2038-12-03
Also published as: CN109684673A

Abstract

A feature extraction and cluster analysis method for transient stability results of a power system comprises the following steps: 1) performing normalization preprocessing on the transient stability feature data so that the data can be clustered; 2) performing feature extraction and abnormal point judgment on the preprocessed data using an improved clustering algorithm; 3) evaluating the effectiveness of the clustering result; 4) analyzing the features extracted from the transient stability data in combination with geographical position information. The invention aims to solve the technical problems of low normalization degree, loose connection among different parameters, and slow network convergence in existing methods for feature extraction and cluster analysis of transient stability results of a power system. The method extracts system data features effectively and accurately, and helps power system planners identify the system response.

Description

Feature extraction and cluster analysis method for transient stability result of power system

Technical Field

The invention relates to the field of transient stability analysis of a power system, in particular to a method for extracting characteristics and clustering analysis of transient stability results of the power system.

Background

Transient stability simulation tools are key to the safe and reliable operation of a power system. In a large real-world power system, transient stability studies often generate a large amount of data. This data provides an important basis for monitoring and controlling the power system, but because thorough manual analysis is very inefficient, it also poses a great challenge for power system planners who must analyze the response of the whole system and identify possible abnormal conditions. It is therefore of great significance to develop a method that automatically extracts such information from transient stability data. Applying clustering techniques to transient stability data, such as voltage and frequency response signals, addresses this need: by extracting common features, outliers with unusual features can be identified. In addition, to make the transient stability results easier to observe visually, geographic information can be incorporated into the result display so that the transient results are clearer.

However, in practical power system applications, existing methods for feature extraction and cluster analysis of transient stability results have several disadvantages: (1) traditional feature extraction methods fail to identify some abnormal values, or identify them with insufficient accuracy, which leaves potential hazards; (2) the clustering methods used determine the number of clusters and the initial cluster centers randomly, which introduces large errors and an element of chance, so the clustering effect is often unsatisfactory; (3) a simple feature extraction scheme does not present information clearly enough, and the analysis of transient stability data in combination with geographic information needs improvement; for example, as the number of stages increases the data become dense, and a traditional bar graph can no longer distinguish individual signals accurately. Feature extraction and cluster analysis of transient stability results play an important role in system operation, and improving the algorithms and optimizing the scheme is a key topic in the field of transient research.

Disclosure of Invention

The invention aims to solve the technical problems of low normalization degree, loose connection among different parameters and slow network convergence of the existing method for extracting and clustering the transient stability result characteristics of the power system.

The purpose of the invention is realized by the following steps:

a method for extracting characteristics and clustering analysis of transient stability results of a power system comprises the following steps:

step 1, carrying out normalization preprocessing on transient stability characteristic data to cluster the data;

step 2, performing feature extraction and abnormal point judgment on the preprocessed data by utilizing an improved clustering algorithm;

step 3, evaluating the effectiveness of the clustering effect;

step 4, analyzing the features extracted from the transient stability data in combination with geographical position information.

In step 1, the feature data are preprocessed by a joint normalization method, in which the column vectors and then the row vectors are normalized in sequence by the maximum value method;

in step 2, the feature extraction comprises two steps: data dimension reduction and high-quality clustering;

in step 3, the clustering effect of the feature extraction is evaluated by the silhouette coefficient;

in step 4, a force-directed model is introduced on the basis of the classical spring (elastic) model graph layout, and the geographical positions of the nodes in the graph are adjusted.

In step 2, the method specifically comprises the following steps:

step 2-1: improving the K-means clustering algorithm by using statistical linear regression and residual analysis: outliers are identified according to the range in which the standardized residuals fall, which yields the cluster centers, and the number of clusters K is then obtained by inverse mapping, giving a K-means algorithm that determines the number of clusters and the cluster centers by itself, namely the CANCS K-means algorithm;

step 2-2: high-quality clustering, which re-clusters the new clusters obtained by the CANCS K-means algorithm by applying a limiting inter-class diameter.

In step 2-1, the CANCS K-means algorithm proceeds as follows:

(1) Input the data set to be clustered, $S = \{x_1, x_2, \ldots, x_N\}$;

(2) Standardize the data set S using the joint normalization method to obtain the standardized data set;

(3) For each data object $x_i$ in the standardized data set, compute the local density $\rho_i$ from the local density formula, and the distance $\delta_i$ from $x_i$ to the closest sample object having a higher local density, using the Euclidean distance between samples

$$d(x_i, x_j) = \sqrt{\sum_{m=1}^{\tau} (x_{im} - x_{jm})^2}, \qquad \delta_i = \min_{j:\ \rho_j > \rho_i} d(x_i, x_j)$$

Here $k_0\_distance(x_i)$, used in the local density formula, is the sum of the Euclidean distances between $x_i$ and its $k_0$ nearest neighbor objects, where $k_0 = n \times 4\%$, and $k_0 = 3$ when $n$ is less than 100;

(4) Let $\rho^* = 1/\rho$, and fit the relationship between $\rho_i^*$ and $\delta_i$ with the linear function

$$\hat{\delta}_i = a_0 + a_1 \rho_i^*$$

where $a_0$ is the linear constant and $a_1$ is the linear coefficient;

(5) Calculate the residual of each $\delta_i$,

$$\varepsilon_i = \delta_i - \hat{\delta}_i$$

where $\delta_i$ and $\hat{\delta}_i$ are the distance response value and the fitted value corresponding to $x_i$, and standardize all residuals by dividing each residual by their standard deviation;

(6) From the standardized residuals, select the data whose absolute residual is greater than 3. The data objects in the standardized data set corresponding to these points are the cluster centers sought;

(7) Using the data objects obtained as the initial cluster centers $c_i$, perform the K-means clustering operation on the data in the standardized data set;

(8) Output the clustering result $C = \{c_1^*, c_2^*, \ldots, c_k^*\}$, where k is the number of clusters and $c_i^*$ is the i-th cluster.

In step (7) of step 2-1, the K-means clustering operation on the standardized data proceeds as follows:

1) Take the automatically determined cluster centers as the initial cluster centers;

2) Using the Euclidean distance formula, compute the distance from each sample in the standardized data set to each cluster center $c_i$;

3) Find the minimum distance from each sample point to the cluster centers $c_i$, and assign the sample point to the corresponding cluster;

4) Recalculate the new cluster center of each cluster from the samples assigned to it, and update the cluster centers;

5) Repeat steps 2) to 4) until no cluster center changes or the maximum number of iterations is reached.

In step 2-2, the high-quality clustering algorithm proceeds as follows:

(1) Select a cluster center $c_i^*$ from the list of cluster centers of the K clusters;

(2) Determine the Euclidean distance between a node in the cluster and the cluster center; if the distance is smaller than the pre-specified quality threshold distance, group the two together;

(3) Execute step (2) for each node in the cluster in turn until all nodes in the cluster have been traversed, which yields a new cluster; recalculate the mean of the cluster data to obtain the new cluster center $c'_i$;

(4) Select the next candidate cluster and repeat steps (2) and (3) until all K clusters have been re-clustered and their new cluster centers calculated; the number of clusters remains unchanged.
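
The re-clustering pass above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function name `quality_recluster` is hypothetical, and the text does not say what happens to nodes that fall outside the quality threshold, so the sketch simply assumes they are set aside as outlier candidates.

```python
import math


def quality_recluster(clusters, centers, threshold):
    """Keep the nodes lying within `threshold` of their cluster center.

    Returns (refined clusters, recomputed centers, rejected nodes).
    The number of clusters is left unchanged, as in step (4).
    """
    refined, new_centers, rejected = [], [], []
    for members, center in zip(clusters, centers):
        kept = [p for p in members if math.dist(p, center) < threshold]
        # ASSUMPTION: nodes outside the threshold become outlier candidates.
        rejected.extend(p for p in members if math.dist(p, center) >= threshold)
        refined.append(kept)
        if kept:
            # New center c'_i is the mean of the nodes that were kept.
            new_centers.append(tuple(sum(c) / len(kept) for c in zip(*kept)))
        else:
            new_centers.append(tuple(center))
    return refined, new_centers, rejected


clusters = [[(0.0, 0.0), (0.2, 0.0), (3.0, 0.0)], [(10.0, 0.0), (10.0, 0.2)]]
centers = [(0.1, 0.0), (10.0, 0.1)]
refined, new_centers, rejected = quality_recluster(clusters, centers, 0.5)
```

In this toy input the node at (3.0, 0.0) lies well beyond the 0.5 threshold of its center, so it is the only node set aside.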

In step 3, the silhouette coefficient Sil of a sample $x_i$ is defined as:

$$Sil(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\ b(x_i)\}}$$

where $r_i$ denotes the number of samples in each cluster, $a(x_i)$ denotes the average distance from sample $x_i$ to the remaining samples in its cluster, and $b(x_i)$ denotes the minimum of the average distances from sample $x_i$ to the samples in the other clusters. For the whole data set, the effectiveness of the clustering result can be evaluated by the average silhouette index:

$$\overline{Sil} = \frac{1}{N} \sum_{i=1}^{N} Sil(x_i)$$

where N denotes the sample size of the data set, and $\overline{Sil} \in [-1, 1]$. The closer $\overline{Sil}$ is to 1, the better the clustering effect.

In step 4, the geographical position of the node in the graph is adjusted, and the steps are as follows:

(1) Giving all points an initial random position;

(2) Firstly, calculating mutual repulsive force among nodes, then calculating mutual attractive force of the nodes connected with edges in a graph, and finally integrating the attractive force and the repulsive force to adjust the positions of the nodes in the layout through force action;

(3) Repeat step (2) until the network reaches equilibrium or a given number of iterations has been performed.

In step (2) of step 4, an FR algorithm is used for processing, specifically the following steps are used:

1) Calculating a network balance distance;

2) Calculating the geometric distance between the nodes;

3) Obtaining an attraction function between adjacent nodes;

4) A repulsive force function between adjacent nodes is obtained.

By adopting the technical scheme, the following technical effects can be brought:

(1) Traditional data preprocessing methods suffer from low normalization degree, loose connection among different parameters, slow network convergence and similar problems; by contrast, the joint normalization method adopted by the invention ties different data features closely together and has good anti-interference performance;

(2) The invention improves the K-means clustering algorithm by applying statistical linear regression and residual analysis, obtaining the CANCS K-means clustering algorithm in which the number of clusters and the initial center points are determined automatically, thereby avoiding the degradation of the clustering effect caused by randomness and chance;

(3) To address the shortcoming in data display, the invention incorporates geographic information into the transient data and improves the regional node layout by means of a force-directed model, which makes power system data monitoring more visual and effectively solves the problem that a traditional bar graph cannot convey geographic information.

Drawings

The invention is further illustrated by the following examples in conjunction with the accompanying drawings:

FIG. 1 is an overall flow chart of the present invention;

FIG. 2 is a diagram of a feature extraction model of the present invention;

FIG. 3 is a flow chart of the K-means clustering algorithm of the present invention;

FIG. 4 is a diagram of the IEEE 118-node system used in the example of the present invention;

FIG. 5 is a graph of distance threshold selection versus number of clusters of the present invention;

FIG. 6 is a table of generator terminal voltage signals in combination with geographical location information classification in accordance with the present invention.

Detailed Description

A feature extraction and cluster analysis method for transient stability results of a power system, as shown in FIG. 1, comprises the following steps:

Step 1: perform normalization preprocessing on the transient stability feature data so that the data can be clustered. The feature data are preprocessed by a joint normalization method, in which the column vectors and then the row vectors are normalized in sequence by the maximum value method. This makes the system data more closely related and more adaptable to system disturbances.

The data set $S = \{x_1, x_2, \ldots, x_N\}$ is a set of N sample objects, and each sample object $x_i = \{x_{i1}, x_{i2}, \ldots, x_{i\tau}\}$ ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, \tau$) in S contains τ dimensions, where $x_{ij}$ denotes the j-th dimension of sample object $x_i$, so that a (τ × N) sample matrix can be formed. The column vectors and row vectors are normalized by the maximum value method as follows:

(1) Column vector normalization:

$$x'_{ij} = \frac{x_{ij} - X_{\min j}}{X_{\max j} - X_{\min j}}$$

where $i = 0, 1, \ldots, N-1$ and $j = 0, 1, \ldots, \tau-1$; $x_{ij}$ and $x'_{ij}$ are the values before and after column vector normalization, and $X_{\max j}$ and $X_{\min j}$ are the maximum and minimum values before column vector normalization.

(2) Row vector normalization:

$$x''_{ij} = \frac{x'_{ij} - X'_{\min i}}{X'_{\max i} - X'_{\min i}}$$

where $j = 0, 1, \ldots, \tau-1$; $x'_{ij}$ and $x''_{ij}$ are the values before and after row vector normalization, and $X'_{\max i}$ and $X'_{\min i}$ are the maximum and minimum values before row vector normalization.
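
As a rough illustration of the joint normalization in step 1, the sketch below scales the columns and then the rows of a small matrix. It assumes that the "maximum value method" is ordinary min-max scaling built from the maximum and minimum values named above; the helper names `min_max` and `joint_normalize`, the sample values, and the rule that a constant vector maps to zeros are all assumptions made for the example.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def joint_normalize(matrix):
    """Normalize every column, then every row, of a list-of-lists matrix."""
    # Pass 1: column vector normalization.
    columns = [min_max(list(col)) for col in zip(*matrix)]
    by_column = [list(row) for row in zip(*columns)]
    # Pass 2: row vector normalization of the column-normalized matrix.
    return [min_max(row) for row in by_column]


# Hypothetical per-unit voltage samples: one row per signal, one column per instant.
signals = [[1.00, 0.95, 1.02],
           [0.98, 0.97, 1.01],
           [1.01, 0.90, 1.03]]
normalized = joint_normalize(signals)
```

After both passes every row spans the full range from 0 to 1, which is one way to read the claim that the combined method ties the features of different signals more closely together.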

Step 2: perform feature extraction and abnormal point judgment on the preprocessed data using an improved clustering algorithm. The feature extraction comprises two steps, data dimension reduction and high-quality clustering, as shown in FIG. 2:

Step 2-1: the K-means clustering algorithm is improved by applying statistical linear regression and residual analysis: outliers are identified according to the range in which the standardized residuals fall, which yields the cluster centers, and the number of clusters K is then obtained by inverse mapping, giving a K-means algorithm that determines the number of clusters and the cluster centers by itself, namely the CANCS K-means algorithm.

Step 2-2: high-quality clustering re-clusters the new clusters obtained by the CANCS K-means algorithm by applying a limiting inter-class diameter.

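In step 2-1, the CANCS K-means algorithm comprises the following steps, as shown in step 2-1 of FIG. 2:

(1) Input the data set to be clustered, $S = \{x_1, x_2, \ldots, x_N\}$;

(2) Standardize the data set S using the joint normalization method to obtain the standardized data set;

(3) For each data object $x_i$ in the standardized data set, compute the local density $\rho_i$ from the local density formula, and the distance $\delta_i$ from $x_i$ to the closest sample object having a higher local density, using the Euclidean distance between samples

$$d(x_i, x_j) = \sqrt{\sum_{m=1}^{\tau} (x_{im} - x_{jm})^2}, \qquad \delta_i = \min_{j:\ \rho_j > \rho_i} d(x_i, x_j)$$

Here $k_0\_distance(x_i)$, used in the local density formula, is the sum of the Euclidean distances between $x_i$ and its $k_0$ nearest neighbor objects, where $k_0 = n \times 4\%$, and $k_0 = 3$ when $n$ is less than 100;

(4) Let $\rho^* = 1/\rho$, and fit the relationship between $\rho_i^*$ and $\delta_i$ with the linear function

$$\hat{\delta}_i = a_0 + a_1 \rho_i^*$$

where $a_0$ is the linear constant and $a_1$ is the linear coefficient;

(5) Calculate the residual of each $\delta_i$,

$$\varepsilon_i = \delta_i - \hat{\delta}_i$$

where $\delta_i$ and $\hat{\delta}_i$ are the distance response value and the fitted value corresponding to $x_i$, and standardize all residuals by dividing each residual by their standard deviation;

(6) From the standardized residuals, select the data whose absolute residual is greater than 3. The data objects in the standardized data set corresponding to these points are the cluster centers sought;

(7) Using the data objects obtained in step (6) as the initial cluster centers $c_i$, perform the K-means clustering operation on the data in the standardized data set;

(8) Output the clustering result $C = \{c_1^*, c_2^*, \ldots, c_k^*\}$, where k is the number of clusters and $c_i^*$ is the i-th cluster.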
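
The center-selection part of CANCS K-means, steps (3) to (6), might be sketched as follows. This is a hedged illustration rather than the patented code. The local density formula itself is not reproduced in the text, so the sketch assumes a density equal to $k_0$ divided by $k_0\_distance(x_i)$, and it assumes that the densest point takes its largest distance to any other point as $\delta$. The function names and the random test blobs are hypothetical, and duplicate points or fewer than two samples are not handled.

```python
import math
import random
import statistics


def density_and_delta(points):
    """Return (rho, delta): local densities and distances to the nearest denser point."""
    n = len(points)
    k0 = max(1, round(n * 0.04)) if n >= 100 else 3
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k0]
        # ASSUMPTION: density = k0 / (sum of distances to the k0 nearest neighbors).
        rho.append(k0 / sum(nearest))
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # ASSUMPTION: a point with no denser neighbor uses its largest distance.
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


def candidate_centers(rho, delta, threshold=3.0):
    """Indices whose standardized regression residual exceeds the threshold."""
    x = [1.0 / r for r in rho]                       # rho* = 1 / rho
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(delta)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, delta))
    a1 = sxy / sxx                                   # least-squares linear coefficient
    a0 = y_mean - a1 * x_mean                        # least-squares linear constant
    residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, delta)]
    sd = statistics.stdev(residuals)                 # sample standard deviation
    return [i for i, e in enumerate(residuals) if abs(e / sd) > threshold]


random.seed(0)
blobs = [(random.gauss(cx, 0.1), random.gauss(cy, 0.1))
         for cx, cy in [(0.0, 0.0), (5.0, 5.0)] for _ in range(30)]
rho, delta = density_and_delta(blobs)
centers = candidate_centers(rho, delta)   # indices into `blobs`; K = len(centers)
```

The number of indices returned plays the role of K, and the corresponding points would serve as the initial centers for the K-means pass in step (7).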

K-means clustering is a typical distance-based clustering algorithm: distance is used as the measure of similarity, so the closer two objects are, the greater their similarity. The algorithm regards a cluster as a group of closely spaced objects, and therefore takes compact, well-separated clusters as its final goal. Step (7) of step 2-1 is shown in FIG. 3, and its operation steps are as follows:

1) Take the automatically determined cluster centers as the initial cluster centers;

2) Using the Euclidean distance formula, compute the distance from each sample in the standardized data set to each cluster center $c_i$;

3) Find the minimum distance from each sample point to the cluster centers $c_i$, and assign the sample point to the corresponding cluster;

4) Recalculate the new cluster center of each cluster from the samples assigned to it, and update the cluster centers;

5) Repeat steps 2) to 4) until no cluster center changes or the maximum number of iterations is reached.
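
The iteration above is the standard K-means loop started from supplied centers instead of random ones. A minimal sketch follows; the function name `kmeans`, the default iteration cap, and the choice to keep the old center when a cluster becomes empty are assumptions, not details taken from the text.

```python
import math


def kmeans(points, centers, max_iter=100):
    """K-means iterations from the given initial centers; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Steps 2) and 3): assign every sample to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Step 4): recompute each center as the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)  # ASSUMPTION: an empty cluster keeps its center
        # Step 5): stop as soon as no center has moved.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
final_centers, labels = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
```

Because the starting centers are fixed, the result is deterministic for a given data set, which is the property the CANCS initialization is meant to provide.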

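In step 2-2, the high-quality clustering algorithm steps are as follows, as in step 2-2 of FIG. 2:

(1) Select a cluster center $c_i^*$ from the list of cluster centers of the K clusters;

(2) Determine the Euclidean distance between a node in the cluster and the cluster center; if the distance is smaller than the pre-specified quality threshold distance, group the two together;

(3) Execute step (2) for each node in the cluster in turn until all nodes in the cluster have been traversed, which yields a new cluster; recalculate the mean of the cluster data to obtain the new cluster center $c'_i$;

(4) Select the next candidate cluster and repeat steps (2) and (3) until all K clusters have been re-clustered and their new cluster centers calculated; the number of clusters remains unchanged.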

In step 3, the effectiveness of the clustering effect is evaluated. The clustering effect of the feature extraction can be evaluated by the silhouette coefficient; for a sample $x_i$, the silhouette coefficient Sil is defined as:

$$Sil(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\ b(x_i)\}}$$

where $r_i$ denotes the number of samples in each cluster, $a(x_i)$ denotes the average distance from sample $x_i$ to the remaining samples in its cluster, and $b(x_i)$ denotes the minimum of the average distances from sample $x_i$ to the samples in the other clusters.

For the whole data set, the effectiveness of the clustering result can be evaluated by the average silhouette index:

$$\overline{Sil} = \frac{1}{N} \sum_{i=1}^{N} Sil(x_i)$$

where N denotes the sample size of the data set, and $\overline{Sil} \in [-1, 1]$.

The closer $\overline{Sil}$ is to 1, the better the clustering effect.
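
To make the evaluation in step 3 concrete, the sketch below computes the silhouette coefficient of every sample and their average. It assumes at least two clusters and scores a sample that is alone in its cluster as 0, a common convention that the text does not state; the function name `silhouette_values` is hypothetical.

```python
import math


def silhouette_values(points, labels):
    """Silhouette coefficient of every sample, given one cluster label per sample."""
    values = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a(x_i): mean distance to the remaining samples of the same cluster.
        own = [math.dist(p, q) for j, (q, lab_q) in enumerate(zip(points, labels))
               if lab_q == lab and j != i]
        if not own:
            values.append(0.0)  # ASSUMPTION: a singleton cluster scores 0
            continue
        a = sum(own) / len(own)
        # b(x_i): smallest mean distance to the samples of any other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab_q in zip(points, labels) if lab_q == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        values.append((b - a) / max(a, b))
    return values


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
sil = silhouette_values(points, labels)
mean_sil = sum(sil) / len(sil)   # average silhouette index of the data set
```

For these two tight, well-separated pairs the average index comes out close to 0.9, which by the criterion above indicates a good clustering.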

In step 4, the features extracted from the transient stability data are analyzed in combination with geographical position information. Because the volume of transient data is large and the clusters are dense, the influence of overlapping parts must be considered in the cluster analysis. A force-directed model is therefore introduced on the basis of the classical spring (elastic) model graph layout, and the geographical positions of the nodes in the graph are adjusted as follows:

(1) Giving all points an initial random position;

(2) Firstly, calculating mutual repulsive force among nodes, then calculating mutual attractive force of the nodes connected with edges in the graph, and finally integrating the attractive force and the repulsive force to adjust the positions of the nodes in the layout through force action;

(3) Repeat step (2) until the network reaches equilibrium (the forces acting on all nodes are small) or a given number of iterations has been performed.

In step (2) of step 4, the node positions are adjusted by the FR algorithm (a force-directed layout algorithm that reaches a stable layout by iteratively applying the attractive and repulsive forces between pairs of points). For a display area with height W and width H, each node n has two layout parameters: its position pos and the position offset produced by the resultant force. FR is essentially defined as follows:

1) Balance distance:

$$d_p = \sqrt{\frac{W \times H}{|N|}}$$

where $|N|$ is the number of nodes in the graph; $d_p$ is also referred to as the optimal distance.

2) Geometric distance between two nodes u and v:

$$dist(u,v) = \sqrt{(u.pos_x - v.pos_x)^2 + (u.pos_y - v.pos_y)^2}$$

where $u.pos_x$, $u.pos_y$ are the position information of node u, and $v.pos_x$, $v.pos_y$ are the position information of node v.

3) Attraction function between adjacent nodes u and v:

$$f_a(u,v) = \frac{(dist(u,v))^2}{d_p} \qquad (7)$$

4) Repulsive force function between adjacent nodes u and v:

$$f_r(u,v) = \frac{d_p^2}{dist(u,v)} \qquad (8)$$
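
A compact sketch of a force-directed layout built on the attraction and repulsion functions (7) and (8) is given below. Several details are assumptions because the text does not specify them: the balance distance is taken as the square root of the display area divided by the node count, repulsion is applied between every pair of nodes while attraction acts only along edges, and the temperature cap, cooling factor, and clamping to the display area follow the usual Fruchterman-Reingold recipe. The function name `fr_layout` and the small test graph are hypothetical.

```python
import math
import random


def fr_layout(nodes, edges, width, height, iterations=50, seed=0):
    """Return {node: (x, y)} after a fixed number of force-directed iterations."""
    rng = random.Random(seed)
    # Step (1): give every node an initial random position.
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    d_p = math.sqrt(width * height / len(nodes))   # ASSUMPTION: balance distance
    temperature = width / 10                       # ASSUMPTION: initial step cap
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion f_r = d_p^2 / dist between every pair of nodes.
        for u in nodes:
            for v in nodes:
                if u == v:
                    continue
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = d_p ** 2 / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
        # Attraction f_a = dist^2 / d_p along every edge.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist ** 2 / d_p
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node by at most `temperature`, staying inside the display area.
        for n in nodes:
            length = math.hypot(*disp[n]) or 1e-9
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + disp[n][0] / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + disp[n][1] / length * step))
        temperature *= 0.95                        # ASSUMPTION: geometric cooling
    return {n: tuple(p) for n, p in pos.items()}


layout = fr_layout(["a", "b", "c"], [("a", "b")], 100.0, 100.0)
```

Seeding the random generator keeps the layout reproducible from run to run, which is convenient when the positions are later compared against real geographical coordinates.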

As shown in FIG. 4, the IEEE 118-node system is used. All 19 generators are modeled with the classical generator model, and the synchronous generators are modeled as Thevenin voltage sources so that a known voltage signal can be replayed. At 1 second, a three-phase fault is simulated between node 23 and node 25, and the fault is cleared at 1.12 seconds by opening the line; this disturbs the system, and the per-unit terminal voltages are recorded. The transient stability data are then preprocessed according to step 1 in preparation for feature extraction, and subsequently processed according to steps 2 and 3 of the invention.

FIG. 5 shows the sensitivity of the number of clusters in the system to the selected threshold diameter. The number of clusters used for feature extraction analysis depends on the selected threshold diameter: after the system automatically determines the number of clusters, it can be seen that the smaller the distance threshold, the more clusters are obtained, and once the distance threshold becomes sufficiently large, all clusters merge into one. Choosing a suitable threshold diameter is therefore important. In this example, a threshold diameter of 0.03 is selected for the feature analysis of the system terminal voltage signals.

The table in FIG. 6 lists the 6 clusters obtained by using high-quality clustering to separate and group the generator terminal voltage signals in combination with geographical location information: the first cluster groups generators by voltage response and contains 13 generators (black dots on the single-line diagram in FIG. 4); the second cluster contains 2 generators (point 5 on the single-line diagram of FIG. 4); the remaining four clusters each consist of one generator (points 1, 2, 3 and 4 in FIG. 4), with the 4th cluster connected to the faulted node 25. In an actual system, each cluster has corresponding position information; once it is obtained, the extracted voltage information can be further combined with it according to step 4 to obtain a view of the transient information that is convenient and easy to observe.

Claims

1. A method for extracting and clustering transient stability results of a power system is characterized by comprising the following steps:

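step 1, carrying out normalization preprocessing on transient stability feature data so that the data can be clustered;

step 2, performing feature extraction and abnormal point judgment on the preprocessed data by utilizing an improved clustering algorithm;

step 3, evaluating the effectiveness of the clustering effect;

step 4, analyzing the data features extracted from the transient stability results in combination with geographical position information,

in step 4, introducing a force-directed model on the basis of the classical spring (elastic) model graph layout, and adjusting the geographical positions of the nodes in the graph;

in step 2, the method specifically comprises the following steps:

step 2-1: improving the K-means clustering algorithm by using statistical linear regression and residual analysis, identifying outliers according to the range in which the standardized residuals fall to obtain the cluster centers, obtaining the number of clusters K by inverse mapping, and thereby obtaining a K-means algorithm that determines the number of clusters and the cluster centers by itself, namely the CANCS K-means algorithm;

step 2-2: high-quality clustering, which re-clusters the new clusters obtained by the CANCS K-means algorithm by applying a limiting inter-class diameter;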

in step 2-1, the CANCS K-means algorithm proceeds by the following steps:

(1) inputting the data set to be clustered, $S = \{x_1, x_2, \ldots, x_N\}$;

(2) standardizing the data set S by the joint normalization method to obtain the standardized data set;

(3) for each data object $x_i$ in the standardized data set, computing the local density $\rho_i$ from the local density formula, and the distance $\delta_i$ from $x_i$ to the closest sample object having a higher local density, using the Euclidean distance between samples

$$d(x_i, x_j) = \sqrt{\sum_{m=1}^{\tau} (x_{im} - x_{jm})^2}, \qquad \delta_i = \min_{j:\ \rho_j > \rho_i} d(x_i, x_j)$$

$k_0\_distance(x_i)$ denotes the sum of the Euclidean distances between sample object $x_i$ and its $k_0$ nearest neighbor objects, where $k_0 = n \times 4\%$, and $k_0$ is taken as 3 when $n$ is less than 100;

(4) letting $\rho^* = 1/\rho$, and fitting the relationship between $\rho_i^*$ and $\delta_i$ with the linear function

$$\hat{\delta}_i = a_0 + a_1 \rho_i^*$$

$a_0$ is the linear constant and $a_1$ is the linear coefficient;

(5) calculating the residual of each $\delta_i$,

$$\varepsilon_i = \delta_i - \hat{\delta}_i$$

and standardizing all residuals, i.e. dividing each residual by their standard deviation; $\delta_i$ and $\hat{\delta}_i$ are the distance response value and the fitted value corresponding to $x_i$;

(6) selecting, from the standardized residuals, the data whose absolute residual is greater than 3, the data objects in the standardized data set corresponding to these points being the cluster centers sought;

(7) using the data objects obtained as the initial cluster centers $c_i$, and performing the K-means clustering operation on the data in the standardized data set;

(8) outputting the clustering result $C = \{c_1^*, c_2^*, \ldots, c_k^*\}$; k is the number of clusters, and $c_i^*$ is the i-th cluster;

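in step (7) of step 2-1, the K-means clustering operation on the data in the standardized data set adopts the following steps:

1) taking the automatically determined cluster centers as the initial cluster centers;

2) using the Euclidean distance formula, computing the distance from each sample in the standardized data set to each cluster center $c_i$;

3) finding the minimum distance from each sample point to the cluster centers $c_i$, and assigning the sample point to the corresponding cluster;

4) recalculating the new cluster center of each cluster from the samples assigned to it, and updating the cluster centers;

5) repeating steps 2) to 4) until no cluster center changes or the maximum number of iterations is reached.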

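2. The method for feature extraction and cluster analysis of transient stability results of a power system as claimed in claim 1, wherein in step 2-2, the high-quality clustering algorithm comprises the following steps:

(1) selecting a cluster center $c_i^*$ from the list of cluster centers of the K clusters;

(2) determining the Euclidean distance between a node in the cluster and the cluster center, and if the distance is smaller than the pre-specified quality threshold distance, grouping the two together;

(3) executing step (2) for each node in the cluster in turn until all nodes in the cluster have been traversed to obtain a new cluster, and recalculating the mean of the cluster data to obtain the new cluster center $c'_i$;

(4) selecting the next candidate cluster, and repeating steps (2) and (3) until all K clusters have been re-clustered and their new cluster centers calculated, the number of clusters remaining unchanged.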

3. The method for feature extraction and cluster analysis of transient stability results of a power system according to claim 1, wherein in step 3, the silhouette coefficient Sil of a sample $x_i$ is defined as:

$$Sil(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\ b(x_i)\}}$$

wherein: $r_i$ denotes the number of samples in each cluster, $a(x_i)$ denotes the average distance from sample $x_i$ to the remaining samples in its cluster, and $b(x_i)$ denotes the minimum of the average distances from sample $x_i$ to the samples in the other clusters; for the whole data set, the effectiveness of the clustering result can be evaluated by the average silhouette index, which is expressed as follows:

$$\overline{Sil} = \frac{1}{N} \sum_{i=1}^{N} Sil(x_i)$$

wherein: N denotes the sample size of the data set, and $\overline{Sil} \in [-1, 1]$;

the closer $\overline{Sil}$ is to 1, the better the clustering effect.

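4. The method for feature extraction and cluster analysis of transient stability results of a power system according to claim 1, wherein in step 4, the geographical positions of the nodes in the graph are adjusted by the following steps:

(1) giving all points an initial random position;

(2) first calculating the mutual repulsive forces among the nodes, then calculating the mutual attractive forces of the nodes connected by edges in the graph, and finally combining the attractive and repulsive forces to adjust the positions of the nodes in the layout through the action of the forces;

(3) repeating step (2) until the network reaches equilibrium or a given number of iterations has been performed.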

5. The method for feature extraction and cluster analysis of transient stability results of an electric power system according to claim 4, wherein in step (2) of step 4, node positions are adjusted by using an FR algorithm, specifically adopting the following steps:

1) Calculating a network balance distance;

2) Calculating the geometric distance between the nodes;

3) Obtaining an attraction function between adjacent nodes;

4) A repulsive force function between adjacent nodes is obtained.