Disclosure of Invention
The invention provides a coherent cluster identification method based on complexity invariance and a deep neural network, which improves the accuracy and generalization of coherent cluster identification in a power system, as described in detail below:
a coherent cluster identification method based on complexity invariance and a deep neural network, the method comprising:
mapping the samples to an n-dimensional space, and mining wide-area phasor measurement data features based on the constructed autoencoder deep neural network;
calculating the complexity invariance distance between the unit time series data according to the wide-area phasor measurement data features;
and determining the similarity between the samples in the n-dimensional space based on the complexity invariance distance, and clustering the unit samples according to the similarity result.
Wherein the autoencoder deep neural network comprises an encoder and a decoder;
the encoder includes: a 1-dimensional convolutional neural network, a max-pooling layer, and two layers of bidirectional long short-term memory neural networks;
the decoder includes: an upsampling layer and a deconvolution neural network.
Further, calculating the complexity invariance distance between the unit time series data according to the wide-area phasor measurement data features specifically comprises:
the complexity invariance distance of the time series Q and C is:
$$\mathrm{CID}(Q,C)=\mathrm{ED}(Q,C)\times\mathrm{CF}(Q,C)$$
wherein ED(Q,C) denotes the Euclidean distance between the time series Q and C, and CF(Q,C) denotes the complexity correction factor between the time series Q and C:
$$\mathrm{CF}(Q,C)=\frac{\max(\mathrm{CE}(Q),\mathrm{CE}(C))}{\min(\mathrm{CE}(Q),\mathrm{CE}(C))}$$
wherein CE(Q) and CE(C) are the complexity estimates of the sequences Q and C, respectively:
$$\mathrm{CE}(Q)=\sqrt{\sum_{i=1}^{n-1}(q_i-q_{i+1})^2}$$
wherein $q_i$ is the i-th sampling point of the time series Q, and n is the number of sampling points.
Wherein determining the similarity between the samples in the n-dimensional space based on the complexity invariance distance, and clustering the unit samples according to the similarity result, specifically comprises:
calculating the probability that each sample belongs to each class, comparing it with the true distribution, calculating the KL divergence, updating the initial centroids of the clustering layer and the encoder parameters by back-propagation, and finally selecting, through grid search, the clustering scheme with the minimum KL divergence loss value as the optimal scheme;
based on GMM and EM, searching for the optimal clustering result by iteratively minimizing the KL divergence.
Further, calculating the probability that each sample belongs to a certain class specifically comprises:
wherein $q_{ik}$ denotes the probability that the i-th sample belongs to centroid k; $\phi(z_i\mid\theta_k)$ denotes the similarity between the i-th sample point and the k-th centroid; $a_k$ is the Gaussian distribution coefficient; $z_i$ is the latent variable of sample i; $\theta_k$ denotes the Gaussian distribution parameters; K is the total number of centroids; $\mu_k$ is the mean of the Gaussian distribution corresponding to the k-th centroid; and $\sigma_k$ is the variance of the Gaussian distribution corresponding to the k-th centroid.
Comparing with the true distribution, calculating the KL divergence, and updating the initial centroids of the clustering layer and the encoder parameters by back-propagation specifically comprises:
the clustering mean of centroid k in the new iteration is:
wherein η is the learning rate, and the loss function Loss is defined as the KL divergence between the predicted probability and the true estimated probability:
the true estimated probability distribution is:
the covariance matrix is:
the Gaussian distribution coefficient is:
wherein $p_{ik}$ is the true probability that the i-th sample belongs to centroid k, N is the total number of samples, and T denotes transposition; $\Sigma_k$ is the covariance matrix; $a_k$ is the Gaussian distribution coefficient.
Further, finally selecting, through grid search, the optimal clustering scheme with the minimum KL divergence loss value specifically comprises:
continuously fine-tuning the initial encoder parameters and the centroids of the clustering layer by back-propagation, jointly with the training of the initial decoder and the initial encoder; the process ends when both loss functions, the KL divergence and the MSE, reach their minimum.
The beneficial effects of the technical scheme provided by the invention are:
1. the invention maps the samples to the n-dimensional space, determines the similarity among the samples in the n-dimensional space, and clusters similar samples, wherein the similarity measure defined by the invention is the complexity invariance distance; this effectively solves the problem that some generators have their largest correlation coefficient with generators in other clusters while having their smallest correlation coefficient with the other generators in their own cluster;
2. the multi-layer deep neural network constructed by the invention mines far more data features than other methods, so that the loss of key data features is avoided;
3. the clustering layer in the neural network can accurately identify isolated units or clusters containing few units after a local system fault, and effectively guides the formulation of a subsequent active splitting scheme.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
In order to accurately identify the coherent clusters of a power system, an embodiment of the present invention provides a coherent cluster identification method based on complexity invariance and a deep neural network. Referring to fig. 1 to 3, the method includes the following steps:
Step 101: mapping the samples to an n-dimensional space, and mining wide-area phasor measurement data features based on the constructed autoencoder deep neural network;
Step 102: calculating the complexity invariance distance between the unit time series data according to the wide-area phasor measurement data features;
Step 103: determining the similarity between the samples in the n-dimensional space based on the complexity invariance distance, and clustering the unit samples according to the similarity result.
In summary, the embodiment of the present invention can process the multidimensional time series operating data of wide-area phasor measurement, thereby obtaining an accurate and effective coherent cluster division scheme, clustering the unit samples, and completing the identification of the coherent clusters of the power system.
Example 2
The scheme in embodiment 1 is further described in detail below with reference to specific calculation formulas, fig. 1 to 4, and examples:
Step 201: constructing an autoencoder deep neural network;
The autoencoder (Auto-encoder) neural network structure constructed by the embodiment of the invention is shown in fig. 2.
After each output, the error is back-propagated through the network and the parameters are continuously optimized. The encoder and the decoder can be understood as inverse functions of each other: the dimension is progressively reduced during encoding and increased during decoding. When the autoencoder extracts features by convolution, the encoding process is equivalent to a deep convolutional neural network with multiple convolution and pooling layers, so the decoding process must perform the corresponding deconvolution and unpooling. Because the data collected by the PMU contain noise, random noise can be added to the time series data before input so that each input differs slightly; the noisy time series is then passed through the autoencoder, and the output is compared with the time series before the noise was added, so that the trained network is robust to noise.
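The noise-injection idea can be sketched as follows. This is a minimal illustration, assuming zero-mean Gaussian noise with a hypothetical standard deviation `sigma`; the noise distribution, the function names and the sample values are illustrative assumptions.

```python
import random


def add_noise(series, sigma=0.01, seed=None):
    """Return a noisy copy of `series` (Gaussian noise and `sigma` are assumptions)."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]


def mse(a, b):
    """Mean-square error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


clean = [50.00, 50.02, 49.98, 50.01]          # illustrative frequency samples
noisy = add_noise(clean, sigma=0.01, seed=0)  # what the encoder would receive

# The reconstruction target is the clean series, not the noisy one, i.e. the
# training loss would be mse(reconstruction_of_noisy_input, clean).
print(mse(noisy, clean) > 0.0)
```

Comparing the output with the clean series rather than the noisy input is what pushes the network to discard the injected noise.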
The encoder portion constructed by the embodiment of the present invention includes: a 1-dimensional Convolutional Neural Network (CNN), a Max-Pooling Layer (MPL), and two layers of Bidirectional Long Short-Term Memory (BiLSTM).
By stacking multiple filters, the CNN performs edge detection at different angles in space, so that it can capture a variety of features in the power plant PMU time series data and decompose the time series into several features. The mapping parameters of the filters are learned from the data, while hyperparameters such as the stride, the number of filters and the filter dimension are tuned manually.
In essence, the MPL takes the maximum of the time series data over successive windows. For example, with the MPL hyperparameter set to 4, the maximum of every 4 values is taken, which reduces the dimension. Intuitively, this makes the neural network attend only to the most salient feature in each region.
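The pooling step with a window of 4 can be sketched as below. The sketch assumes non-overlapping windows and drops a trailing partial window; both choices are assumptions made for illustration.

```python
def max_pool_1d(series, pool_size=4):
    """Non-overlapping max pooling; a trailing partial window is dropped (assumption)."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    n_windows = len(series) // pool_size
    return [max(series[i * pool_size:(i + 1) * pool_size]) for i in range(n_windows)]


pooled = max_pool_1d([1, 3, 2, 0, 5, 4, 9, 7], pool_size=4)
print(pooled)  # [3, 9]
```

Eight input values become two, each the largest value of its window.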
The BiLSTM is built on the Long Short-Term Memory neural network (LSTM). The input of the LSTM is the one-dimensional feature vector produced by the preceding CNN, which still contains timing information; the LSTM maps this higher-dimensional timing information into a lower-dimensional vector, with the mapping parameters learned from the data. Special mechanisms inside the LSTM allow it to learn the temporal correlation between points instead of treating each data point as independent, so the resulting feature vector contains both the spatial edge features of the data (provided by the CNN) and the temporal features (provided by the LSTM).
The BiLSTM feeds the input data once in the forward direction and once in the reverse direction and concatenates the two outputs, so that both preceding and following context can be learned; the final number of features equals the number of hidden-layer nodes of the forward LSTM plus that of the reverse LSTM. The hidden-layer information is the internal language of the neural network, and the number of nodes can be adjusted manually.
In the encoder part, the embodiment of the invention compresses power plant PMU time series data with tens of thousands of time points into hundreds or even tens of features, which are described hierarchically by the hidden layers inside the neural network. Before the advent of neural networks, processing time series data usually required manual feature mining, typically using means, standard deviations, ratios of maxima to minima, or statistics such as F-values. The embodiment of the invention compresses data of size (number of power plants × number of time points) into data of size (number of power plants × number of features).
Once the encoder has produced the data of size (number of power plants × number of features), each power plant can be mapped directly into a Euclidean space to calculate the CID (Complexity-Invariant Distance); the remaining problem is to determine the mapping parameters. To train a structure that maps time series data into feature vectors, the embodiment of the present invention adopts the basic idea of the autoencoder: the data are compressed, and if the compressed data can be essentially restored, that is, the input layer and the output layer are essentially consistent, then the compressed data retain most of the information. The restoration is implemented by adding a decoder.
Referring to fig. 2, the decoder portion constructed by the embodiment of the present invention includes: an upsampling layer (Upsample) and a deconvolution (inverse) CNN.
The CNN decomposes the time series data into several features, which the upsampling layer needs to restore. The upsampling layer of the embodiment of the present invention is a simple interpolation method, in which each time step is repeated several times along the time axis.
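Upsampling by repetition can be sketched as follows; the function name and the repetition factor of 4 are illustrative assumptions.

```python
def upsample_repeat(series, factor):
    """Repeat each time step `factor` times along the time axis."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [x for x in series for _ in range(factor)]


print(upsample_repeat([3, 9], 4))  # [3, 3, 3, 3, 9, 9, 9, 9]
```

With a factor equal to the pooling window, the sequence regains the length it had before max pooling, although the values discarded by pooling are not recovered.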
Deconvolution is in fact a special convolution operation that maps a smaller sample to a larger one. Specifically, the original sequence is first zero-padded, the filter is then flipped, and the result is calculated according to the convolution formula.
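The three steps (zero-pad, flip the filter, apply the sliding dot product) can be sketched for the stride-1, single-channel case. The stride, the padding width of `len(kernel) - 1` on each side and the kernel values are assumptions for illustration.

```python
def transposed_conv_1d(series, kernel):
    """Stride-1 transposed convolution: zero-pad, flip the kernel, slide a dot product."""
    k = len(kernel)
    if k == 0:
        raise ValueError("kernel must not be empty")
    # Step 1: pad the input with k - 1 zeros on both sides.
    padded = [0.0] * (k - 1) + list(series) + [0.0] * (k - 1)
    # Step 2: flip (reverse) the filter.
    flipped = list(kernel)[::-1]
    # Step 3: slide the flipped filter over the padded input.
    out_len = len(series) + k - 1
    return [sum(padded[i + j] * flipped[j] for j in range(k)) for i in range(out_len)]


print(transposed_conv_1d([1.0, 2.0], [1.0, 0.5, 0.25]))  # [1.0, 2.5, 1.25, 0.5]
```

A 2-point input becomes a 4-point output, which is the small-to-large mapping described above.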
After the two layers of upsampling and deconvolution CNN, the data are restored from size (number of power plants × number of features) to size (number of power plants × number of time points), and the difference between the input and the restored output is measured by the Mean-Square Error (MSE). By training together with the decoder, the encoder learns the mapping parameters; the decoder layers are then removed, and what remains is a structure that maps power plant PMU time series data into feature vectors.
Step 202: calculating complexity invariance distance between unit time sequence data according to the wide-area phasor measurement data characteristics;
In clustering power plant frequency time series, a pair of oscillating power plants (which should belong to the same class) often has a larger simple Euclidean distance than a pair of other power plants (which do not necessarily belong to the same class). Thus, if only the simple Euclidean distance is used, two other power plants of different classes may be grouped into the same class, while a pair of oscillating power plants of the same class is split into two classes.
The embodiment of the invention defines a distance that takes complexity into account, namely the CID, as the similarity measure for mining time series data. Complexity invariance uses information about the complexity difference between two time series as a correction factor for an existing distance measure, which significantly improves classification accuracy. Because the correction is simple and parameter-free, the improvement does not reduce the efficiency of algorithms that call the distance measure frequently, and with a modified triangle inequality most existing indexing and data mining algorithms can still be used. The CID of the time series Q and C is:
$$\mathrm{CID}(Q,C)=\mathrm{ED}(Q,C)\times\mathrm{CF}(Q,C) \tag{1}$$
wherein ED(Q,C) denotes the Euclidean distance between the time series Q and C, and CF(Q,C) denotes the complexity correction factor between the time series Q and C.
The complexity correction factor CF is a complexity ratio: the more the complexities of the two series differ, the larger CF is and the larger the CID becomes, i.e. the farther apart the individuals are:
$$\mathrm{CF}(Q,C)=\frac{\max(\mathrm{CE}(Q),\mathrm{CE}(C))}{\min(\mathrm{CE}(Q),\mathrm{CE}(C))} \tag{2}$$
wherein CE(Q) and CE(C) are the complexity estimates of the sequences Q and C, respectively.
In information theory there are many ways to measure the complexity of a time series; the embodiment of the invention uses an intuitive, interpretable and parameter-free one: the more complex a time series is, the longer it is when "straightened out", i.e.:
$$\mathrm{CE}(Q)=\sqrt{\sum_{i=1}^{n-1}(q_i-q_{i+1})^2} \tag{3}$$
wherein $q_i$ is the i-th sampling point of the time series Q, and n is the number of sampling points. Since Q and C are both time series, CE(C) is calculated in the same way as CE(Q), see formula (3), and is not described again here.
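A minimal sketch of the CID computation follows. It assumes the standard complexity-invariant distance definitions: CE is the square root of the summed squared successive differences, and CF is the larger CE divided by the smaller. The handling of flat series (CE equal to zero) is an added assumption, as are the function names and sample values.

```python
import math


def complexity_estimate(series):
    """CE: length of the series when 'straightened out'."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(series, series[1:])))


def euclidean(q, c):
    """ED: Euclidean distance between two equal-length series."""
    if len(q) != len(c):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def cid(q, c):
    """CID(Q, C) = ED(Q, C) * CF(Q, C)."""
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    low, high = min(ce_q, ce_c), max(ce_q, ce_c)
    if high == 0.0:
        cf = 1.0        # both series flat: no correction (assumption)
    elif low == 0.0:
        cf = math.inf   # exactly one flat series: treated as maximally far (assumption)
    else:
        cf = high / low
    return euclidean(q, c) * cf


q = [0.0, 0.0, 3.0, 3.0]   # CE = 3
c = [0.0, 3.0, 3.0, 7.0]   # CE = 5
print(euclidean(q, c))     # 5.0
print(cid(q, c))           # 5 * (5 / 3), about 8.33
```

The two series are 5 apart in plain Euclidean terms; the differing complexities stretch that distance by the factor 5/3.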
Step 203: and determining the similarity between the samples in the n-dimensional space based on the complexity invariance distance, and clustering the unit samples according to the similarity result.
Among traditional clustering algorithms, k-means is the most commonly used; it is fast, structurally simple and widely applied. The idea of k-means is as follows: k centroids are selected, and the distance from each data point to each centroid is calculated to determine which centroid the data point belongs to. After all points have been assigned, the center point (i.e., the mean) of each cluster is taken as the new centroid and the process is repeated. The iteration terminates when the distance each centroid moves in an update falls below a certain threshold, or when a preset number of iterations has been performed. The data are thereby aggregated into k classes.
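The k-means procedure can be sketched as below. The initial centroids are passed in explicitly, and an empty cluster keeps its previous centroid; both are simplifying assumptions, and `tol` and `max_iter` are hypothetical names for the two termination conditions.

```python
import math


def kmeans(points, centroids, tol=1e-6, max_iter=100):
    """Plain k-means on coordinate tuples; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # empty cluster keeps its centroid (assumption)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # centroids have effectively stopped moving
            break
    return centroids, labels


pts = [(0.0,), (0.2,), (5.0,), (5.2,)]
cents, labs = kmeans(pts, centroids=[(0.0,), (5.0,)])
print(labs)  # [0, 0, 1, 1]
```

Note that the number of centroids k is fixed by the caller, which is the limitation discussed next.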
However, the conventional k-means algorithm requires the number of clusters to be determined in advance, which severely limits its application in many scenarios. Therefore, in the embodiment of the invention, the samples are not assigned to classes directly; instead, the probability that each sample belongs to each class is calculated and compared with the true distribution to obtain the Kullback-Leibler (KL) divergence, the initial centroids of the clustering layer and the encoder parameters are updated by back-propagation, and the clustering scheme with the minimum KL divergence loss is finally selected through grid search, so that the number of groups need not be determined in advance.
Based on Gaussian Mixture Model (GMM) and Expectation-maximization algorithm (EM), the optimal clustering result is searched by iteratively minimizing KL divergence, and the flowchart is shown in fig. 3.
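The grid-search selection can be sketched as follows. `fit_and_score` is a hypothetical callback, not something defined in this document: it is assumed to train the clustering layer for a given number of centroids and return the final KL divergence loss. The loss values in the demonstration are placeholders, not results.

```python
def select_by_grid_search(candidate_ks, fit_and_score):
    """Return (best_k, losses): the candidate with the smallest loss, and all losses."""
    losses = {k: fit_and_score(k) for k in candidate_ks}
    best_k = min(losses, key=losses.get)
    return best_k, losses


# Placeholder losses for illustration only; real values would come from training.
demo_losses = {2: 0.41, 3: 0.17, 4: 0.23}
best_k, losses = select_by_grid_search([2, 3, 4], demo_losses.get)
print(best_k)  # 3
```

Running the clustering once per candidate and keeping the lowest loss is what removes the need to fix the number of groups beforehand.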
The predicted probability distribution in the E-step in the figure is as follows:
wherein $q_{ik}$ denotes the probability that the i-th sample belongs to centroid k; $\phi(z_i\mid\theta_k)$ denotes the similarity between the i-th sample point and the k-th centroid; $a_k$ is the Gaussian distribution coefficient; $z_i$ is the latent variable of sample i; $\theta_k$ denotes the Gaussian distribution parameters; K is the total number of centroids; $\mu_k$ is the mean of the Gaussian distribution corresponding to the k-th centroid; and $\sigma_k$ is the variance of the Gaussian distribution corresponding to the k-th centroid.
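A minimal sketch of the prediction step, under two assumptions: the similarity is taken to be a univariate Gaussian density, and the class probability is taken to be the normalized weighted density (the standard GMM responsibility). Function and argument names are illustrative.

```python
import math


def gaussian_pdf(z, mu, var):
    """Univariate Gaussian density with mean `mu` and variance `var`."""
    return math.exp(-((z - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(z, weights, means, variances):
    """q[i][k] = a_k * phi(z_i | mu_k, var_k) / sum_j a_j * phi(z_i | mu_j, var_j)."""
    q = []
    for z_i in z:
        scores = [a * gaussian_pdf(z_i, m, v)
                  for a, m, v in zip(weights, means, variances)]
        total = sum(scores)  # may underflow to 0 for points far from every centroid
        q.append([s / total for s in scores])
    return q


q = e_step([0.0, 2.5, 5.0], weights=[0.5, 0.5], means=[0.0, 5.0], variances=[1.0, 1.0])
for row in q:
    print([round(p, 4) for p in row])
```

Each row sums to 1; the point midway between the two means is split evenly, while the points at the means belong almost entirely to their own centroid.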
The new iteration parameters calculated in the M-step are as follows:
the clustering mean of centroid k in the new iteration is:
where η is the learning rate. The embodiment of the invention creatively defines the loss function Loss as the KL divergence (a measure of the difference between two distributions) between the predicted probability and the true estimated probability, namely:
the true estimated probability distribution is:
the covariance matrix is:
the Gaussian distribution coefficient is:
wherein $p_{ik}$ is the true probability that the i-th sample belongs to centroid k, N is the total number of samples, and T denotes transposition; $\Sigma_k$ is the covariance matrix; $a_k$ is the Gaussian distribution coefficient.
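The quantities named above can be sketched under stated assumptions: the loss is taken as the sum over samples and centroids of p·log(p/q), the Gaussian distribution coefficient as the average of p over the N samples, and the covariance as the p-weighted scatter around the mean, shown here in the scalar case. The true probabilities `p` are treated as given inputs, and the gradient step on the mean with learning rate η is not shown.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Loss = sum_i sum_k p_ik * log(p_ik / q_ik); eps guards against log(0)."""
    return sum(
        p_ik * math.log((p_ik + eps) / (q_ik + eps))
        for p_row, q_row in zip(p, q)
        for p_ik, q_ik in zip(p_row, q_row)
        if p_ik > 0.0
    )


def mixture_weights(p):
    """a_k = (1 / N) * sum_i p_ik."""
    n = len(p)
    return [sum(row[k] for row in p) / n for k in range(len(p[0]))]


def weighted_variances(p, z, means):
    """Scalar covariance: sum_i p_ik * (z_i - mu_k)^2 / sum_i p_ik."""
    out = []
    for k, mu in enumerate(means):
        mass = sum(row[k] for row in p)
        if mass == 0.0:
            raise ValueError("centroid %d has no probability mass" % k)
        out.append(sum(row[k] * (z_i - mu) ** 2 for row, z_i in zip(p, z)) / mass)
    return out


p = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]          # illustrative hard assignments
q_pred = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]     # illustrative predictions
print(kl_divergence(p, q_pred) > 0.0)
print(mixture_weights(p))
print(weighted_variances(p, [0.0, 2.0, 10.0], means=[1.0, 10.0]))  # [1.0, 0.0]
```

The loss is zero only when the predicted probabilities match the true ones, which is why minimizing it pulls the centroids and the encoder toward the true distribution.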
Because the parameters of the encoder and the decoder need to be optimized continuously, the embodiment of the invention continuously fine-tunes the initial encoder parameters and the centroids by back-propagation, jointly with the training of the initial decoder and the initial encoder in fig. 2; the process ends when both loss functions, the KL divergence and the MSE, reach their minimum.
Example 3
The feasibility of the schemes of examples 1 and 2 is verified below with reference to fig. 4, a calculation example, and tables 1 to 4, as described in detail below:
A typical 16-machine, 68-node, 5-area power system is set up for the example analysis. With generator G1 as the reference machine, a permanent three-phase fault is set on an AC branch between nodes 46 and 49; the fault occurs at 0.1 s and lasts 0.1 s, the faulted line is cut at 0.2 s, and the whole simulation lasts 20 s.
Using the algorithm of the embodiment of the invention, the generator frequency data are input; if the fault is detected at 0.2 s, coherency identification is carried out from 0.2 s. After repeated experiments, the identification results of the coherent clusters are shown in table 1 below.
TABLE 1 Coherent clusters after a fault between nodes 46 and 49
The fault location is then changed: three-phase short-circuit faults are set between nodes 1 and 2, between nodes 8 and 9, and between nodes 41 and 42, respectively. The previous steps are repeated, and the identification results of the coherent clusters are shown in tables 2 to 4 below.
TABLE 2 Coherent clusters after a fault between nodes 1 and 2
TABLE 3 Coherent clusters after a fault between nodes 8 and 9
TABLE 4 Coherent clusters after a fault between nodes 41 and 42
As can be seen from the above table, due to different dominant oscillation modes excited by different faults, the identification results of the coherent clusters may be slightly different, wherein the coherent clusters identified when three-phase short-circuit faults occur between the nodes 8 to 9 and between the nodes 41 to 42 are the same.
The calculation example shows that the deep neural network constructed on the autoencoder idea in the embodiment of the invention identifies the coherent clusters of the power system from its wide-area synchronous phasor measurement information. A deep neural network is built from CNN, MPL, BiLSTM, upsampling and deconvolution layers, with the input layer and the output layer regarded as the same, to perform data mining and feature extraction. Considering that the time series of coherent units follow similar trends, the CID of the data features is calculated to represent the similarity between units. GMM probabilistic clustering is improved on the basis of the KL divergence, and the parameters of the clustering process are set automatically without relying on a predetermined number of clusters, so that an optimal coherent cluster division scheme is obtained and isolated units or clusters with few units can be accurately identified after a local system fault. The method is robust and performs coherency clustering entirely from wide-area measurement data, which avoids the influence of uncertainty in the power system model and parameters on the clustering results, avoids the loss of key features in data mining, does not depend on a predetermined number of clusters, and accurately identifies isolated units, thereby achieving fast and accurate identification of the coherent clusters of the power system. The results are of reference value for online security and stability monitoring and wide-area control of power systems based on wide-area measurement.
In the embodiment of the present invention, unless the model of a device is specifically stated, the models of the devices are not limited, as long as the devices can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and that the serial numbers of the above embodiments are merely for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.