CN112396113A - Two-stage selection method for operation mode data characteristics of power system

Info

Publication number: CN112396113A
Application number: CN202011318226.8A
Authority: CN
Inventors: 夏德明; 胡伟; 阴宏民; 田增垚; 刘洋; 王克非; 岳涵; 侯凯元; 屈可丁; 沈毅; 张博闻; 马坤; 蒋振宇
Original assignee: Northeast Branch Of State Grid Corp Of China; Tsinghua University; State Grid Corp of China SGCC
Current assignee: Northeast Branch Of State Grid Corp Of China; Tsinghua University; State Grid Corp of China SGCC
Priority date: 2020-11-23
Filing date: 2020-11-23
Publication date: 2021-02-23

Abstract

The invention belongs to the technical field of power system operation and control, and relates to a two-stage method for selecting features from power system operation mode data. First, a Filter-stage IG-RFE feature selection algorithm based on normalized mutual information and interaction information gain removes the prior parameters required by related algorithms and correctly identifies the complex dependencies among features, so that key features of the operation mode data are searched and extracted automatically. Then, an improved mixed kernel function SVM algorithm, combined with the recursive feature elimination (RFE) search method, removes poor features until a feature set of a preset size is obtained, so that the key features of the power system are found efficiently, accurately and automatically. The method improves the efficiency and accuracy of feature selection for power system operation mode data, and provides a technical basis and a practical method for feature selection on power grid operation mode data.

Description

Two-stage selection method for operation mode data characteristics of power system

Technical Field

The invention belongs to the technical field of operation and control of an electric power system, and relates to a two-stage selection method for data characteristics of an electric power system operation mode.

Background

The ultra-high-voltage AC/DC hybrid power grid in China continues to expand, and the wide integration of new energy and the growing two-way interaction between flexible loads and the grid increase uncertainty on both the source and load sides. Grid characteristics are therefore increasingly complex, which poses great challenges for dispatchers who monitor and regulate the safe and stable operation of the grid. Studying potential security and stability problems of the grid, improving the observability and controllability of the power system, and evaluating the transient stability of large grids efficiently and quickly are thus of great significance for maintaining safe and stable operation. With the maturing of wide-area measurement technology and the development of big data theory, online transient stability assessment (TSA) based on artificial intelligence methods offers a new approach to the intelligent control of large power grids.

An actual large-scale AC/DC hybrid system contains numerous variables, and its data are large in scale and high in dimension. Only a few feature quantities affect the stability level of the system, and most features are redundant. If all of them are used as inputs to a transient assessment model, computational efficiency, classification performance and suitability for online application all suffer. The initial input features of the power system therefore need to be screened to find the features that play a key role in the security and stability problem under study.

Disclosure of Invention

The invention aims to provide a two-stage method for selecting features from power system operation mode data, comprising a Filter-stage feature selection process based on information theory and a Wrapper-stage feature selection process based on an improved SVM algorithm.

The invention provides a two-stage selection method for the operation mode data characteristics of an electric power system, which comprises the following steps:

(1) performing Filter-stage feature selection on the power system operation mode data based on normalized mutual information and interaction information gain, comprising the following steps:

(1-1) acquiring power system operation mode data from the synchronized phasor measurement units (PMUs) of the power system, filling in missing data and deleting duplicate data to obtain valid data, constructing one sample and its features from each group of operation data, labeling each sample 0 or 1 according to whether the transient state is stable, and recording the label as the class attribute C of the sample;

(1-2) constructing a selected feature subset S and initializing S as an empty set; constructing a candidate feature set US and initializing US as an empty set; constructing classification accuracy indexes for the sets S and US, denoted Acc(S) and Acc(US), and setting both Acc(S) and Acc(US) to 0 at initialization;

Acc(S) and Acc(US) respectively denote the classification accuracy of the target optimal feature subset S and of the candidate feature set US, and the classification accuracy is calculated as follows:

Acc = (TP + TN) / (TP + FN + FP + TN)

where TP, FN, FP and TN respectively denote, for a given machine learning algorithm, the number of samples correctly classified as positive, the number of samples incorrectly classified as negative, the number of samples incorrectly classified as positive, and the number of samples correctly classified as negative;

(1-3) constructing an initial feature set of the power system operation mode data from the valid data of step (1-1), and setting the candidate feature set US to this initial feature set;

(1-4) performing discretization and probability density estimation on each feature in the current candidate feature set US using the Parzen window method, to obtain the probability distributions of the power system operation mode data features, namely the probability distribution p(f_i) of feature f_i, the joint probability distribution p(f_i, f_j) of features f_i and f_j, the probability distribution p(C) of the class attribute C, and the conditional probability distribution p(f_j | c) of feature f_j given the class attribute C;

(1-5) respectively calculating the normalized mutual information NMI (f; C) of all the features f and the class attributes C in the candidate feature set US of the step (1-4) by using the following formula:

where MI (f; C) represents mutual information between the feature f and the feature class attribute C, namely:

H(f) represents the information entropy of the feature f, and H(C) represents the information entropy of the class attribute C; for a single continuous variable X, the information entropy is calculated as follows:

H(X) = -∫ p(x) log₂ p(x) dx

where p(x) is the probability density of the variable X;

removing from the candidate feature set US every feature whose NMI(f; C) is zero;

(1-6) calculating the NIG index and the Score index between every pair of features in the candidate feature set US from step (1-4), and combining them with the following formula into the IG-RFE evaluation criterion of each feature, that is, computing the weight score w(f_i) of every feature in the candidate feature set US:

Wherein N is the number of features in the candidate feature set US;

the score index Score(f_i, f_j) is expressed as follows:

(1-7) sorting the weight scores w(f_i) of all features from step (1-6), and removing from the candidate feature set US the max(1, r × N) features with the smallest weight scores, where r is the minimum removal proportion for a single backward-search iteration and N is the total number of features in the candidate feature set US;

(1-8) checking the number of features remaining in the candidate feature set US after step (1-7): if US is empty, outputting the current selected feature subset S as the optimal feature subset, which completes the two-stage selection of the power system operation mode data features; if US is not empty, taking the current candidate feature set US as the input of step (2) for second-stage screening;

(2) taking the candidate feature set US obtained in step (1) as the input of an improved mixed kernel function SVM and performing second-stage Wrapper feature selection, comprising the following steps:

(2-1) training a classifier on the candidate feature set US with the improved mixed kernel function support vector machine algorithm under 10-fold cross-validation, and outputting the classification accuracy Acc(US) of the current candidate feature set US;

(2-2) comparing the classification accuracy Acc(S) of the selected feature subset S with the classification accuracy Acc(US) of the candidate feature set US calculated in step (2-1): if Acc(US) is greater than Acc(S), updating S to US and Acc(S) to Acc(US), and returning to step (1-4); if Acc(US) is less than or equal to Acc(S), returning directly to step (1-4).
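Steps (1-4) through (2-2) form a backward-elimination loop, which can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: `score_features` (standing in for the IG-RFE weight computation of steps (1-4) to (1-6)) and `evaluate_accuracy` (standing in for the 10-fold cross-validated SVM of step (2-1)) are hypothetical callables, rounding r × N down is an assumption, and the removal of zero-NMI features in step (1-5) is omitted for brevity.

```python
import math
from typing import Callable, Dict, Set, Tuple


def two_stage_select(
    features: Set[str],
    score_features: Callable[[Set[str]], Dict[str, float]],  # hypothetical: IG-RFE weights w(f_i)
    evaluate_accuracy: Callable[[Set[str]], float],          # hypothetical: Acc(US) from the Wrapper stage
    r: float = 0.1,                                          # minimum removal proportion per iteration
) -> Tuple[Set[str], float]:
    """Return the best feature subset S found and its accuracy Acc(S)."""
    selected: Set[str] = set()      # S, initialized as the empty set
    acc_selected = 0.0              # Acc(S), initialized to 0
    candidates = set(features)      # US, set to the initial feature set

    while candidates:
        # Filter stage: weight every remaining feature.
        weights = score_features(candidates)
        # Step (1-7): drop the max(1, r*N) lowest-weight features
        # (flooring r*N is an assumption of this sketch).
        n_drop = max(1, math.floor(r * len(candidates)))
        ranked = sorted(candidates, key=lambda f: (weights[f], f))
        candidates -= set(ranked[:n_drop])
        # Step (1-8): stop once US is empty.
        if not candidates:
            break
        # Wrapper stage, steps (2-1) and (2-2): keep US if it beats S.
        acc = evaluate_accuracy(candidates)
        if acc > acc_selected:
            selected, acc_selected = set(candidates), acc

    return selected, acc_selected
```

Because at least one feature is removed per iteration, the loop terminates after at most N iterations, and S always holds the most accurate candidate set seen so far.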

In the step (1-1) of the feature extraction method, the initial feature set is steady-state operation data information before a fault in the power system, and the initial feature set comprises element feature data and system feature data, wherein the element feature data comprises active power and reactive power of each generator set in the system before the fault, active power and reactive power of loads of nodes in the system before the fault, active power and reactive power of a power transmission line, and voltage and phase angle of each bus in the system before the fault; the system characteristic data are total active output and total reactive output of a generator in the system before the fault, all active loads and all reactive loads in the system before the fault, the sum of mechanical input power in the system before the fault, total reactive reserve capacity in the system before the fault and network topology indexes of the electric power system before the fault.

In step (1-4) of the above feature extraction method, the Parzen window method is a non-parametric estimation method: the value space of the cleaned power system operation mode data features from step (1-1) is divided into regions, and the frequency of samples in each region is taken as the probability at the coordinates of the region's center point, giving the density distribution of the operation mode data features.
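The region-and-frequency estimate described here can be illustrated for a single feature as follows. This is only a sketch under stated assumptions: the function name `binned_distribution`, the use of equal-width regions and the default `n_bins` are illustrative choices, not details given in the text.

```python
from typing import List, Sequence, Tuple


def binned_distribution(values: Sequence[float], n_bins: int = 10) -> List[Tuple[float, float]]:
    """Return (region center, relative frequency) pairs for one feature."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every sample has the same value.
        return [(lo, 1.0)]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the upper edge; clamp it into the last region.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(values)
    # The frequency in each region is used as the probability at its center point.
    return [(lo + (k + 0.5) * width, c / n) for k, c in enumerate(counts)]


if __name__ == "__main__":
    for center, prob in binned_distribution([0.0, 0.1, 0.2, 0.9, 1.0], n_bins=2):
        print(center, prob)
```

Joint distributions such as p(f_i, f_j) could be estimated the same way by counting samples in a two-dimensional grid of regions.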

In step (1-6) of the above feature extraction method, the NIG index is the normalized information gain index NIG(f_i; f_j; C), expressed as follows:

where IG(f_i; f_j; C) denotes the interaction information gain index among the features f_i, f_j and the class attribute C, IG(f_i; f_j; C) = MI(f_i; f_j; C) = MI(f_i; C) - MI(f_j; C), and H(f_i) denotes the information entropy of the feature f_i.
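The entropy and mutual information terms on which the NMI and NIG indexes are built can be estimated from discretized samples. The sketch below is illustrative only: it uses empirical frequencies and the standard identity MI(X; Y) = H(X) + H(Y) - H(X, Y), the function names are hypothetical, and it deliberately stops short of the normalization and interaction gain formulas.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy in bits of a discrete sample."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical MI(X; Y) = H(X) + H(Y) - H(X, Y) in bits."""
    joint = list(zip(xs, ys))  # paired observations give the joint distribution
    return entropy(xs) + entropy(ys) - entropy(joint)


if __name__ == "__main__":
    feature = [0, 0, 1, 1]
    label = [0, 0, 1, 1]   # label fully determined by the feature
    print(mutual_information(feature, label))
```

A feature that determines the class label completely has mutual information equal to the label entropy, while a feature independent of the label has mutual information zero, which is the case pruned in step (1-5).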

In step (1-6) of the above feature extraction method, the IG-RFE evaluation criterion of a single feature combines, by weighting, the degree of relevance measured by the normalized mutual information index NMI with the degree of synergy between features measured by the interaction information gain index NIG, and is expressed as follows:

where N is the total number of features in the candidate feature set US evaluated in step (1-6).

In step (2) of the above feature extraction method, the improved mixed kernel function support vector machine algorithm uses a mixed kernel function as the mapping transformation: the data samples are mapped into a high-dimensional space, where the two classes of samples are separated by a linear hyperplane. The improved mixed kernel function is expressed as follows:

K_mix＝λK_local+(1-λ)K_global

where K_local represents the local kernel function, for which the RBF kernel function is selected;

K_global represents the global kernel function, for which the polynomial kernel function k(x, x') = (x·x' + c)^d is selected.
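As a rough illustration of how the mixed kernel combines its two parts, the following sketch evaluates K_mix for a pair of sample vectors. Several points are assumptions rather than statements of the source: the RBF kernel is taken in the common form exp(-||x - x'||² / (2σ²)), the offset c = 1.0 is an illustrative value, and the function names are hypothetical. The defaults λ = 0.783, σ = 2.5 and d = 3 follow the parameter table of the embodiment.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Local kernel; the exp(-||x - y||^2 / (2 sigma^2)) form is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x: Sequence[float], y: Sequence[float], c: float, d: int) -> float:
    """Global kernel k(x, x') = (x . x' + c)^d."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d


def mixed_kernel(
    x: Sequence[float],
    y: Sequence[float],
    lam: float = 0.783,   # mixing weight lambda
    sigma: float = 2.5,   # RBF width
    c: float = 1.0,       # polynomial offset (assumed value)
    d: int = 3,           # polynomial degree
) -> float:
    """K_mix = lambda * K_local + (1 - lambda) * K_global."""
    return lam * rbf_kernel(x, y, sigma) + (1.0 - lam) * polynomial_kernel(x, y, c, d)


if __name__ == "__main__":
    print(mixed_kernel([1.0, 0.0], [1.0, 0.0]))
```

For 0 ≤ λ ≤ 1 and c ≥ 0 the combination is a non-negative weighted sum of two valid kernels and is therefore itself a valid kernel, with λ trading the local sensitivity of the RBF term against the global behavior of the polynomial term.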

The two-stage selection method for the operation mode data characteristics of the power system, provided by the invention, has the advantages that:

1. The two-stage selection method first applies a Filter-stage IG-RFE (Interaction Gain - Recursive Feature Elimination) feature selection algorithm based on normalized mutual information and interaction information gain, which removes the prior parameters required by related algorithms and correctly identifies the complex dependencies among features, so that key features of the power system operation mode data are searched and extracted automatically. It then removes poor features using an improved mixed kernel function SVM algorithm combined with the recursive feature elimination (RFE) search method until a feature set of a preset size is obtained, so that the key features of the power system are found efficiently, accurately and automatically.

2. The method is easy to implement: by introducing data-driven methods from the field of artificial intelligence and designing its two key components, namely the feature selection algorithm based on normalized mutual information and interaction information gain, and the classifier based on the improved mixed kernel function SVM algorithm combined with the recursive feature elimination (RFE) search method, it realizes automatic search for the key features of the power system.

Detailed Description

wherein TP, FN, FP, and TN respectively represent the number of samples correctly classified into positive examples, the number of samples incorrectly classified into negative examples, the number of samples incorrectly classified into positive examples, and the number of samples correctly classified into negative examples in a given machine learning algorithm; reference may be made to the following table:

Sample class attribute	Predicted 0	Predicted 1
Actually 0	TN	FP
Actually 1	FN	TP
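The counts in the table above translate directly into the classification accuracy, the fraction of samples that are classified correctly. A minimal sketch, with hypothetical function names, that tallies the four counts from true and predicted 0/1 labels:

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for binary labels 0/1, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Fraction of correctly classified samples: (TP + TN) / (TP + FN + FP + TN)."""
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("accuracy is undefined for an empty sample set")
    return (tp + tn) / total


if __name__ == "__main__":
    counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
    print(counts, accuracy(*counts))
```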

H(X) = -∫ p(x) log₂ p(x) dx

where p(x) is the probability density of the variable X;

(1-6) calculating the NIG index and the Score index between every pair of features in the candidate feature set US from step (1-4), and combining them with the following formula into the IG-RFE evaluation criterion of each feature, that is, computing the weight score w(f_i) of every feature in the candidate feature set US:

Where N is the number of features in the candidate feature set US.

The score index Score(f_i, f_j) is expressed as follows:

(1-8) judging the total number of the features in the candidate feature set US in the step (1-7), if the US is an empty set, indicating that the selection process of the backward-eliminated feature subset is finished, and outputting the current selected feature subset S as an optimal feature subset to realize two-stage selection of the data features of the power system operation mode; if the US is not an empty set, taking the current alternative feature set US as the input of the step (2) to carry out the second-stage screening;

(2-2) comparing the classification accuracy Acc(S) of the selected feature subset S with the classification accuracy Acc(US) of the candidate feature set US calculated in step (2-1): if Acc(US) is greater than Acc(S), the current feature subset US performs better than the selected feature subset S, so S is updated to US, Acc(S) is updated to Acc(US), and the method returns to step (1-4); if Acc(US) is less than or equal to Acc(S), US does not classify better than the selected feature subset S, so no update is made and the method returns to step (1-4).

In step (1-6) of the above feature extraction method, the NIG index is the normalized information gain index NIG(f_i; f_j; C), expressed as follows:

K_mix＝λK_local+(1-λ)K_global

K_global represents the global kernel function, for which the polynomial kernel function k(x, x') = (x·x' + c)^d is selected. In one embodiment of the invention, the data precision requirements, the classification performance of the algorithm and the actual running time requirements were considered together, the four parameters were tested, and the typical values given in the following table were finally selected as the actual values.

Parameter	σ	C	d	λ
Value	2.5	1	3	0.783

The two-stage selection method for power system operation mode data features begins with a Filter-stage feature selection process based on information theory: the IG-RFE Filter-stage algorithm, based on normalized mutual information and interaction information gain, removes the prior parameters required by related algorithms and correctly identifies complex dependencies among features, including relevance, redundancy and complementarity, so that the key features of the power system are searched and extracted automatically. It then applies a Wrapper-stage feature selection process based on an improved SVM algorithm: the improved mixed kernel function SVM algorithm, combined with the recursive feature elimination (RFE) search method, repeatedly removes poor features from the current feature set until a feature set of a preset size is obtained. Power grid operation mode data features proven in practice are included among the initial candidate features, and features that provide supplementary information are selected from them, so that as much power flow information as possible is conveyed by as few features as possible, which makes the result easier for dispatchers to apply and monitor.

For power system operation mode data that are large in scale and high in dimension, the method combines a Filter-stage feature selection method based on information theory with a Wrapper-stage feature selection method based on an improved SVM algorithm. The two-stage method can be applied to different training tasks and effectively improves the efficiency and accuracy of the automatic search for key power system features.

The method of the invention is divided into two stages. The first is a Filter-stage feature selection process based on information theory. This stage introduces the definitions and basic indexes of the relevant information-theoretic concepts and proposes the corresponding IG-RFE evaluation criterion, which serves as the evaluation index of the Filter-stage algorithm. The criterion characterizes well both the correlation among features and their joint synergistic effect, and thereby largely avoids the correlation errors that manually set prior parameters may cause in traditional methods. This is the first inventive point of the method. The second is a Wrapper-stage feature selection process based on an improved SVM algorithm, in which the improved mixed kernel function SVM algorithm, combined with the recursive feature elimination (RFE) search method, repeatedly removes poor features from the current feature set until a feature set of a preset size is obtained. This is another essential difference between the method of the invention and other methods, and is its second inventive point.

Claims

1. A two-stage selection method for operation mode data characteristics of a power system comprises the following steps:

H(X) = -∫ p(x) log₂ p(x) dx

where p(x) is the probability density of the variable X;

Wherein N is the number of features in the candidate feature set US;

the score index Score(f_i, f_j) is expressed as follows:

2. The feature extraction method according to claim 1, wherein the initial feature set in step (1-1) is steady-state operation data information before a fault in the power system, and includes element feature data and system feature data, where the element feature data includes active power and reactive power of each generator set in the system before the fault, active power and reactive power of loads of nodes in the system before the fault, active power and reactive power of the transmission line, and voltage and phase angle of each bus in the system before the fault; the system characteristic data are total active output and total reactive output of a generator in the system before the fault, all active loads and all reactive loads in the system before the fault, the sum of mechanical input power in the system before the fault, total reactive reserve capacity in the system before the fault and network topology indexes of the electric power system before the fault.

3. The feature extraction method of claim 1, wherein the Parzen window method in the step (1-4) is a non-parametric estimation method, and the operating mode data features of the power system after data cleaning in the step (1-1) are spatially divided, and the frequency is used as the probability corresponding to the coordinates of the spatial center point to obtain the density distribution of the operating mode data features.

4. The feature extraction method of claim 1, wherein the NIG index in step (1-6) is the normalized information gain index NIG(f_i; f_j; C), expressed as follows:

5. The feature extraction method of claim 1, wherein the IG-RFE evaluation criterion of a single feature in step (1-6) combines, by weighting, the degree of relevance measured by the normalized mutual information index NMI with the degree of synergy between features measured by the interaction information gain index NIG, and is expressed as follows:

6. The feature extraction method of claim 1, wherein the improved mixed kernel function support vector machine algorithm in step (2) uses a mixed kernel function as the mapping transformation, the data samples are mapped into a high-dimensional space in which the two classes of samples are separated by a linear hyperplane, and the improved mixed kernel function is expressed as follows:

K_mix＝λK_local+(1-λ)K_global