CN112396113A - Two-stage selection method for operation mode data characteristics of power system - Google Patents
- Publication number
- CN112396113A CN202011318226.8A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
The invention belongs to the technical field of power system operation and control, and relates to a two-stage method for selecting features from power system operation mode data. First, a Filter-stage IG-RFE feature selection algorithm based on normalized mutual information and interaction information gain removes the prior parameters required by related algorithms and correctly identifies complex dependencies among features, so that key features of the operation mode data are searched for and extracted automatically. Then, an improved mixed kernel function SVM algorithm, combined with the recursive feature elimination (RFE) search method, removes poor features until a feature set of a preset size is obtained, so that the key features of the power system are found efficiently, accurately, and automatically. The method improves the efficiency and accuracy of feature selection for power system operation mode data, and provides a technical basis and a practical method for feature selection on power grid operation mode data.
Description
Technical Field
The invention belongs to the technical field of power system operation and control, and relates to a two-stage method for selecting features from power system operation mode data.
Background
As China's ultra-high-voltage AC/DC hybrid power grid continues to expand, the wide integration of new energy and the growing two-way interaction between flexible loads and the grid increase uncertainty on both the source and load sides. Grid characteristics are becoming increasingly complex, which poses great challenges for dispatching operators who monitor and regulate the safe and stable operation of the grid. Studying potential safety and stability problems, improving the observability and controllability of the power system, and evaluating the transient stability of large power grids efficiently and quickly are therefore of great significance for maintaining safe and stable operation. With the maturing of wide-area measurement technology and the development of big data theory, online transient stability assessment (TSA) based on artificial intelligence methods offers a new approach to the intelligent control of large power grids.
An actual large-scale AC/DC hybrid system contains numerous variables, and its data are large in scale and high in dimension. Only a few feature quantities affect the stability level of the system, and most features are redundant. If all of them are used as inputs to a transient stability assessment model, computational efficiency, classification performance, and suitability for online application all suffer. The initial input features of the power system therefore need to be screened to find those that play a key role in the safety and stability problem under study.
Disclosure of Invention
The invention aims to provide a two-stage method for selecting features from power system operation mode data, comprising a Filter-stage feature selection process based on information theory and a Wrapper-stage feature selection process based on an improved SVM algorithm.
The invention provides a two-stage selection method for the operation mode data characteristics of an electric power system, which comprises the following steps:
(1) performing Filter-stage feature selection on the power system operation mode data based on normalized mutual information and interaction information gain, comprising the following steps:
(1-1) acquiring power system operation mode data from the synchronized phasor measurement units of the power system, filling in missing data and deleting duplicate data to obtain valid data, constructing a sample and its features for each group of operation data, labeling each sample 0 or 1 according to whether it is transiently stable, and recording the label as the class attribute C of the sample;
(1-2) constructing a selected feature subset S and initializing it to the empty set; constructing a candidate feature set US and initializing it to the empty set; constructing classification accuracy indexes for S and US, denoted Acc(S) and Acc(US), and initializing both to 0;
Acc(S) and Acc(US) denote the classification accuracy of the target optimal feature subset S and of the candidate feature set US, respectively, and the classification accuracy is calculated as follows:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
wherein TP, FN, FP, and TN denote, for a given machine learning algorithm, the number of positive samples correctly classified as positive, the number of positive samples incorrectly classified as negative, the number of negative samples incorrectly classified as positive, and the number of negative samples correctly classified as negative, respectively;
(1-3) constructing an initial feature set of the power system operation mode data from the valid data of step (1-1), and setting the candidate feature set US to the initial feature set;
(1-4) performing discrete estimation and probability density estimation on each feature in the current candidate feature set US by the Parzen window method to obtain the probability distributions of the operation mode data features, namely the probability distribution $p(f_i)$ of feature $f_i$, the joint probability distribution $p(f_i, f_j)$ of features $f_i$ and $f_j$, the probability distribution $p(C)$ of the class attribute C, and the conditional probability distribution $p(f_j \mid c)$ of feature $f_j$ given the class attribute C;
(1-5) calculating the normalized mutual information NMI(f; C) between each feature f in the candidate feature set US of step (1-4) and the class attribute C using the following formula:
where MI(f; C) denotes the mutual information between feature f and the class attribute C, namely:
$$MI(f; C) = \sum_{c} \int p(f, c) \log_2 \frac{p(f, c)}{p(f)\,p(c)} \, df$$
H(f) denotes the information entropy of feature f and H(C) the information entropy of the class attribute C; for a single continuous variable X the information entropy is calculated as:
$$H(X) = -\int p(x) \log_2 p(x) \, dx$$
where p(x) is the probability distribution of the variable X;
removing from the candidate feature set US the features whose NMI(f; C) value is zero;
(1-6) calculating the NIG index and the Score index between every pair of features in the candidate feature set US of step (1-4), and forming the IG-RFE evaluation criterion value of each feature using the following formula, that is, computing the weight score $w(f_i)$ of every feature in the candidate feature set US:
where N is the number of features in the candidate feature set US;
the Score index $Score(f_i, f_j)$ is expressed as follows:
(1-7) sorting the weight scores $w(f_i)$ of all features from step (1-6), and removing from the candidate feature set US the max(1, r × N) features with the smallest weight scores, where r is the minimum removal proportion for a single backward-search iteration and N is the total number of features in the candidate feature set US;
(1-8) checking the number of features in the candidate feature set US after step (1-7): if US is empty, outputting the current selected feature subset S as the optimal feature subset, which completes the two-stage selection of the power system operation mode data features; if US is not empty, taking the current candidate feature set US as the input of step (2) for second-stage screening;
(2) taking the candidate feature set US obtained in step (1) as the input of an improved mixed kernel function SVM and performing second-stage Wrapper feature selection, comprising the following steps:
(2-1) using the improved mixed kernel function support vector machine algorithm with the candidate feature set US as input, performing classification training by 10-fold cross-validation, and outputting the classification accuracy Acc(US) of the current candidate feature set US;
(2-2) comparing the classification accuracy Acc(S) of the selected feature subset S with the classification accuracy Acc(US) calculated in step (2-1): if Acc(US) is greater than Acc(S), updating S to US and Acc(S) to Acc(US), and returning to step (1-4); if Acc(US) is less than or equal to Acc(S), returning directly to step (1-4).
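The control flow of steps (1-2) through (2-2) can be sketched in Python as follows. This is only an illustrative outline under stated assumptions: `score_fn` and `accuracy_fn` are hypothetical callbacks standing in for the IG-RFE weight score of step (1-6) and the cross-validated SVM accuracy of step (2-1), the zero-NMI filter of step (1-5) is omitted, and rounding r × N down to an integer is an assumption, since the text does not state how the product is rounded.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def two_stage_select(
    features: Sequence[str],
    score_fn: Callable[[List[str]], Dict[str, float]],  # hypothetical: w(f_i) per feature
    accuracy_fn: Callable[[List[str]], float],          # hypothetical: Acc(US)
    r: float = 0.1,
) -> Tuple[List[str], float]:
    """Backward search over US that keeps the best subset S seen so far."""
    selected: List[str] = []      # S, initialised to the empty set (step 1-2)
    acc_selected = 0.0            # Acc(S), initialised to 0
    candidates = list(features)   # US, set to the initial feature set (step 1-3)

    while candidates:
        # Steps (1-4) to (1-6): weight score for every remaining candidate.
        weights = score_fn(candidates)
        # Step (1-7): drop the max(1, r*N) lowest-scoring features
        # (flooring r*N is an assumption of this sketch).
        n_drop = max(1, int(r * len(candidates)))
        ranked = sorted(candidates, key=lambda f: weights[f])
        candidates = ranked[n_drop:]
        # Step (1-8): an empty US ends the search.
        if not candidates:
            break
        # Steps (2-1) and (2-2): wrapper evaluation, then update S if better.
        acc_candidates = accuracy_fn(candidates)
        if acc_candidates > acc_selected:
            selected, acc_selected = list(candidates), acc_candidates
    return selected, acc_selected
```

Because S is replaced only when Acc(US) strictly exceeds Acc(S), the function returns the best-scoring subset encountered during the backward search rather than the last non-empty one.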
In step (1-1) of the above feature extraction method, the initial feature set consists of pre-fault steady-state operation data of the power system and comprises component feature data and system feature data. The component feature data comprise the pre-fault active and reactive power of each generator set in the system, the pre-fault active and reactive power of the load at each node, the active and reactive power of each transmission line, and the pre-fault voltage and phase angle of each bus. The system feature data comprise the pre-fault total active and total reactive output of the generators, the pre-fault total active and total reactive load, the pre-fault sum of mechanical input power, the pre-fault total reactive reserve capacity, and the pre-fault network topology indexes of the power system.
In step (1-4) of the above feature extraction method, the Parzen window method is a non-parametric estimation method: the space of the power system operation mode data features cleaned in step (1-1) is partitioned into regions, and the frequency of samples in each region is taken as the probability at the coordinate of that region's center point, giving the density distribution of the operation mode data features.
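The partition-and-count estimate described above can be sketched in Python for a single feature. This is a minimal one-dimensional illustration that assumes equal-width regions; the function name `binned_distribution` and the default number of regions are hypothetical and not taken from the patent.

```python
from collections import Counter
from typing import Dict, Sequence


def binned_distribution(samples: Sequence[float], n_bins: int = 10) -> Dict[float, float]:
    """Map each region's center coordinate to the relative frequency of samples in it."""
    lo, hi = min(samples), max(samples)
    # Fall back to a unit width when all samples are identical.
    width = (hi - lo) / n_bins or 1.0
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in samples)
    total = len(samples)
    return {lo + (k + 0.5) * width: c / total for k, c in sorted(counts.items())}
```

Only occupied regions appear in the result, and the returned frequencies sum to 1, so the dictionary can be used directly as a discrete probability distribution when computing entropies.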
In step (1-6) of the above feature extraction method, the NIG index is the normalized interaction gain index $NIG(f_i; f_j; C)$, expressed as follows:
where $IG(f_i; f_j; C)$ denotes the interaction information gain index among features $f_i$, $f_j$ and the class attribute C, $IG(f_i; f_j; C) = MI(f_i, f_j; C) - MI(f_i; C) - MI(f_j; C)$, and $H(f_i)$ denotes the information entropy of feature $f_i$.
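As an illustration of how an interaction gain between two features and the class attribute can be computed, the Python sketch below works on already discretized values. It assumes the commonly used definition IG(f_i; f_j; C) = MI(f_i, f_j; C) - MI(f_i; C) - MI(f_j; C) and estimates probabilities by simple counting; the function names are hypothetical, and the normalization that turns IG into NIG is not shown because its formula is not reproduced in this text.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of a discrete sample."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """MI(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_gain(fi: Sequence[Hashable], fj: Sequence[Hashable],
                     c: Sequence[Hashable]) -> float:
    """Assumed form: IG = MI(fi, fj; C) - MI(fi; C) - MI(fj; C)."""
    joint = list(zip(fi, fj))  # treat the feature pair as one joint variable
    return (mutual_information(joint, c)
            - mutual_information(fi, c)
            - mutual_information(fj, c))
```

Under this definition a positive value indicates complementary features (together they tell more about C than separately), while a negative value indicates redundancy. For example, when C is the exclusive-or of two independent binary features, each feature alone carries no information about C but the pair determines it completely.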
In step (1-6) of the above feature extraction method, the IG-RFE evaluation criterion of a single feature is a weighted combination of the normalized mutual information index NMI, which measures the degree of association, and the normalized interaction gain index NIG, which measures the degree of cooperation between features; the IG-RFE evaluation criterion of a single feature is expressed as follows:
where N is the total number of features in the set US used in step (1-6).
In step (2) of the above feature extraction method, the improved mixed kernel function support vector machine algorithm uses a mixed function as the selected mapping function, maps the data samples into a high-dimensional space, and separates the two classes of data samples by a linear hyperplane in that space. The improved mixed function is expressed as follows:
$$K_{mix} = \lambda K_{local} + (1 - \lambda) K_{global}$$
where $K_{local}$ denotes the local kernel function, chosen as the RBF kernel function, and $K_{global}$ denotes the global kernel function, chosen as the polynomial kernel function $k(x, x') = (x \cdot x' + c)^d$.
The two-stage selection method for power system operation mode data features provided by the invention has the following advantages:
1. The method first applies the Filter-stage IG-RFE (Interaction Gain and Recursive Feature Elimination) feature selection algorithm based on normalized mutual information and interaction information gain, which removes the prior parameters required by related algorithms and correctly identifies complex dependencies among features, so that key features of the power system operation mode data are searched for and extracted automatically. It then removes poor features with the improved mixed kernel function SVM algorithm combined with the recursive feature elimination (RFE) search method until a feature set of a preset size is obtained, so that the key features of the power system are found efficiently, accurately, and automatically.
2. The method is easy to implement. By introducing data-driven methods from the field of artificial intelligence and designing the two key parts, namely the feature selection algorithm based on normalized mutual information and interaction information gain, and the classifier that combines the improved mixed kernel function SVM algorithm with the recursive feature elimination (RFE) search method, it realizes the automatic search for key features of the power system.
Detailed Description
The invention provides a two-stage selection method for the operation mode data characteristics of an electric power system, which comprises the following steps:
(1) performing Filter-stage feature selection on the power system operation mode data based on normalized mutual information and interaction information gain, comprising the following steps:
(1-1) acquiring power system operation mode data from the synchronized phasor measurement units of the power system, filling in missing data and deleting duplicate data to obtain valid data, constructing a sample and its features for each group of operation data, labeling each sample 0 or 1 according to whether it is transiently stable, and recording the label as the class attribute C of the sample;
(1-2) constructing a selected feature subset S and initializing it to the empty set; constructing a candidate feature set US and initializing it to the empty set; constructing classification accuracy indexes for S and US, denoted Acc(S) and Acc(US), and initializing both to 0;
Acc(S) and Acc(US) denote the classification accuracy of the target optimal feature subset S and of the candidate feature set US, respectively, and the classification accuracy is calculated as follows:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
wherein TP, FN, FP, and TN denote, for a given machine learning algorithm, the number of positive samples correctly classified as positive, the number of positive samples incorrectly classified as negative, the number of negative samples incorrectly classified as positive, and the number of negative samples correctly classified as negative, respectively; reference may be made to the following table:
Sample class attribute | Predicted 0 | Predicted 1 |
Actual 0 | TN | FP |
Actual 1 | FN | TP |
(1-3) constructing an initial feature set of the power system operation mode data from the valid data of step (1-1), and setting the candidate feature set US to the initial feature set;
(1-4) performing discrete estimation and probability density estimation on each feature in the current candidate feature set US by the Parzen window method to obtain the probability distributions of the operation mode data features, namely the probability distribution $p(f_i)$ of feature $f_i$, the joint probability distribution $p(f_i, f_j)$ of features $f_i$ and $f_j$, the probability distribution $p(C)$ of the class attribute C, and the conditional probability distribution $p(f_j \mid c)$ of feature $f_j$ given the class attribute C;
(1-5) calculating the normalized mutual information NMI(f; C) between each feature f in the candidate feature set US of step (1-4) and the class attribute C using the following formula:
where MI(f; C) denotes the mutual information between feature f and the class attribute C, namely:
$$MI(f; C) = \sum_{c} \int p(f, c) \log_2 \frac{p(f, c)}{p(f)\,p(c)} \, df$$
H(f) denotes the information entropy of feature f and H(C) the information entropy of the class attribute C; for a single continuous variable X the information entropy is calculated as:
$$H(X) = -\int p(x) \log_2 p(x) \, dx$$
where p(x) is the probability distribution of the variable X;
removing from the candidate feature set US the features whose NMI(f; C) value is zero;
(1-6) calculating the NIG index and the Score index between every pair of features in the candidate feature set US of step (1-4), and forming the IG-RFE evaluation criterion value of each feature using the following formula, that is, computing the weight score $w(f_i)$ of every feature in the candidate feature set US:
where N is the number of features in the candidate feature set US.
The Score index $Score(f_i, f_j)$ is expressed as follows:
(1-7) sorting the weight scores $w(f_i)$ of all features from step (1-6), and removing from the candidate feature set US the max(1, r × N) features with the smallest weight scores, where r is the minimum removal proportion for a single backward-search iteration and N is the total number of features in the candidate feature set US;
(1-8) checking the number of features in the candidate feature set US after step (1-7): if US is empty, the backward-elimination selection of feature subsets is finished, and the current selected feature subset S is output as the optimal feature subset, which completes the two-stage selection of the power system operation mode data features; if US is not empty, taking the current candidate feature set US as the input of step (2) for second-stage screening;
(2) taking the candidate feature set US obtained in step (1) as the input of an improved mixed kernel function SVM and performing second-stage Wrapper feature selection, comprising the following steps:
(2-1) using the improved mixed kernel function support vector machine algorithm with the candidate feature set US as input, performing classification training by 10-fold cross-validation, and outputting the classification accuracy Acc(US) of the current candidate feature set US;
(2-2) comparing the classification accuracy Acc(S) of the selected feature subset S with the classification accuracy Acc(US) calculated in step (2-1): if Acc(US) is greater than Acc(S), the current candidate set US performs better than the selected feature subset S, so S is updated to US and Acc(S) to Acc(US), and the method returns to step (1-4); if Acc(US) is less than or equal to Acc(S), the classification performance of US is no better than that of the selected feature subset S, so no update is made and the method returns to step (1-4).
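The accuracy index used in steps (1-2), (2-1) and (2-2) can be illustrated with a short Python sketch. It assumes the standard accuracy definition (TP + TN) / (TP + TN + FP + FN) with label 1 as the positive class, matching the table above; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # actual 1, predicted 0
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of samples classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total
```

In a 10-fold cross-validation setting, one plausible way to obtain Acc(US) is to average this value over the ten held-out folds, although the text does not specify how the fold results are aggregated.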
In step (1-1) of the above feature extraction method, the initial feature set consists of pre-fault steady-state operation data of the power system and comprises component feature data and system feature data. The component feature data comprise the pre-fault active and reactive power of each generator set in the system, the pre-fault active and reactive power of the load at each node, the active and reactive power of each transmission line, and the pre-fault voltage and phase angle of each bus. The system feature data comprise the pre-fault total active and total reactive output of the generators, the pre-fault total active and total reactive load, the pre-fault sum of mechanical input power, the pre-fault total reactive reserve capacity, and the pre-fault network topology indexes of the power system.
In step (1-4) of the above feature extraction method, the Parzen window method is a non-parametric estimation method: the space of the power system operation mode data features cleaned in step (1-1) is partitioned into regions, and the frequency of samples in each region is taken as the probability at the coordinate of that region's center point, giving the density distribution of the operation mode data features.
In step (1-6) of the above feature extraction method, the NIG index is the normalized interaction gain index $NIG(f_i; f_j; C)$, expressed as follows:
where $IG(f_i; f_j; C)$ denotes the interaction information gain index among features $f_i$, $f_j$ and the class attribute C, $IG(f_i; f_j; C) = MI(f_i, f_j; C) - MI(f_i; C) - MI(f_j; C)$, and $H(f_i)$ denotes the information entropy of feature $f_i$.
In step (1-6) of the above feature extraction method, the IG-RFE evaluation criterion of a single feature is a weighted combination of the normalized mutual information index NMI, which measures the degree of association, and the normalized interaction gain index NIG, which measures the degree of cooperation between features; the IG-RFE evaluation criterion of a single feature is expressed as follows:
where N is the total number of features in the set US used in step (1-6).
In step (2) of the above feature extraction method, the improved mixed kernel function support vector machine algorithm uses a mixed function as the selected mapping function, maps the data samples into a high-dimensional space, and separates the two classes of data samples by a linear hyperplane in that space. The improved mixed function is expressed as follows:
$$K_{mix} = \lambda K_{local} + (1 - \lambda) K_{global}$$
where $K_{local}$ denotes the local kernel function, chosen as the RBF kernel function, and $K_{global}$ denotes the global kernel function, chosen as the polynomial kernel function $k(x, x') = (x \cdot x' + c)^d$. In one embodiment of the invention, the data precision requirement, the classification performance of the algorithm, and the actual running time requirement were considered together, the four parameters were tested, and the typical values given in the following table were selected as the actual values.
Parameter | σ | C | d | λ |
Value | 2.5 | 1 | 3 | 0.783 |
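To make the mixed kernel concrete, the following Python sketch evaluates it for two sample vectors with the tabulated values of σ, d and λ. Several points are assumptions rather than statements of the patent: the RBF kernel is written in the common form exp(-||x - x'||² / (2σ²)), since its formula is not reproduced in this text; the polynomial offset c is set to 1.0 for illustration, because the table lists C, which may refer to the SVM penalty parameter rather than to c; and the function names are hypothetical.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Local kernel, assumed form exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def poly_kernel(x: Sequence[float], y: Sequence[float], c: float, d: int) -> float:
    """Global kernel (x . y + c)^d."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d


def mixed_kernel(x: Sequence[float], y: Sequence[float],
                 lam: float = 0.783, sigma: float = 2.5,
                 c: float = 1.0, d: int = 3) -> float:
    """K_mix = lam * K_local + (1 - lam) * K_global; c = 1.0 is an assumed offset."""
    return lam * rbf_kernel(x, y, sigma) + (1.0 - lam) * poly_kernel(x, y, c, d)
```

With λ between 0 and 1 the mixture is a non-negative weighted sum of two valid kernels and is therefore itself a valid kernel. The RBF term responds mainly to nearby samples, while the polynomial term contributes a response that depends on the inner product over the whole input space, which is the usual motivation for combining a local and a global kernel.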
In the two-stage selection method for power system operation mode data features, the Filter-stage feature selection process based on information theory is carried out first: the Filter-stage IG-RFE feature selection algorithm based on normalized mutual information and interaction information gain removes the prior parameters required by related algorithms and correctly identifies complex dependencies among features, including correlation, redundancy, and complementarity, so that the key features of the power system are searched for and extracted automatically. Then, in the Wrapper-stage feature selection process based on the improved SVM algorithm, the improved mixed kernel function SVM algorithm, combined with the recursive feature elimination (RFE) search method, repeatedly removes poor features from the current feature set until a feature set of a preset size is obtained. Power grid operation mode data features proven in practice are included in the initial candidate features, and features that provide supplementary information are screened out, so that as much power flow information as possible is provided with as few features as possible, which facilitates operation and monitoring by dispatching operators.
For power system operation mode data of large scale and high dimension, the method adopts a two-stage feature selection method that combines a Filter-stage method based on information theory with a Wrapper-stage method based on an improved SVM algorithm. It can be applied to different training tasks and effectively improves the efficiency and accuracy of the automatic search for key features of the power system.
The method of the invention is divided into two main stages. The first is the Filter-stage feature selection process based on information theory, for which a corresponding IG-RFE evaluation criterion is proposed. This stage introduces the definitions and basic indexes of the relevant information-theoretic concepts. The criterion serves as the evaluation index of the Filter-stage algorithm and characterizes well both the correlation among features and their combined synergistic effect, thereby largely avoiding the errors that manually set prior parameters may introduce in traditional methods. This is the first inventive point of the method. The second is the Wrapper-stage feature selection process based on the improved SVM algorithm, in which the improved mixed kernel function SVM algorithm, combined with the recursive feature elimination (RFE) search method, repeatedly removes poor features from the current feature set until a feature set of a preset size is obtained. This is another essential difference between the method of the invention and other methods, and is its second inventive point.
Claims (6)
1. A two-stage selection method for operation mode data characteristics of a power system comprises the following steps:
(1) the method for selecting the characteristics of the Filter stage based on the standardized mutual information and the interactive information gain on the operation mode data of the power system comprises the following steps:
(1-1) acquiring power system operation mode data from a synchronous vector measurement unit of a power system, supplementing missing data in the data, deleting repeated data to obtain effective data, constructing a sample and characteristics of the sample for each group of operation data, marking a label of 0 or 1 on the sample according to whether the transient state is stable or not, and recording the label as a class attribute C of the sample;
(1-2) constructing a selected feature subset S, and initializing S to be an empty set; constructing an alternative feature set US, and initializing the alternative feature set US into an empty set; respectively constructing classification accuracy indexes of a set S and a set US, recording the classification accuracy indexes as Acc (S) and Acc (US), and respectively setting Acc (S) and Acc (US) as 0 during initialization;
the foregoing acc (US) and acc (S) respectively represent the classification accuracy of the target optimal feature subset S and the candidate feature set US, and the calculation formula of the classification accuracy is as follows:
wherein TP, FN, FP, and TN respectively represent the number of samples correctly classified into positive examples, the number of samples incorrectly classified into negative examples, the number of samples incorrectly classified into positive examples, and the number of samples correctly classified into negative examples in a given machine learning algorithm;
(1-3) constructing an initial feature set of the power system operation mode data from the valid data of step (1-1), and updating the candidate feature set US to the initial feature set;
(1-4) performing discretization and probability density estimation on each feature in the current candidate feature set US using the Parzen window method to obtain the probability distributions of the power system operation mode data features, namely the probability distribution p(f_i) of feature f_i, the joint probability distribution p(f_i, f_j) of features f_i and f_j, the probability distribution p(C) of the class attribute C, and the conditional probability distribution p(f_j, c) of feature f_j under the class attribute C;
(1-5) calculating the normalized mutual information NMI(f; C) between each feature f in the candidate feature set US of step (1-4) and the class attribute C using the following formula:
where MI(f; C) denotes the mutual information between the feature f and the class attribute C, namely:
H(f) denotes the information entropy of the feature f and H(C) denotes the information entropy of the class attribute C; for a single continuous variable X, the information entropy is calculated as follows:
$$H(X) = -\int p(x)\,\log_2 p(x)\,dx$$
wherein p(x) is the probability density of the variable X;
removing from the candidate feature set US the features whose NMI(f; C) value is zero;
(1-6) calculating the NIG index and the Score index between every two features in the candidate feature set US of step (1-4), and forming the IG-RFE evaluation result of each feature using the following formula, that is, computing the weight score w(f_i) of every feature in the candidate feature set US:
wherein N is the number of features in the candidate feature set US;
the Score index Score(f_i, f_j) is expressed as follows:
(1-7) sorting the weight scores w(f_i) of all features obtained in step (1-6), and removing from the candidate feature set US the max(1, r × N) features with the smallest weight scores, wherein r is the minimum removal proportion for a single iteration of the backward search and N is the total number of features in the candidate feature set US;
(1-8) checking the total number of features in the candidate feature set US after step (1-7): if US is the empty set, outputting the current selected feature subset S as the optimal feature subset, thereby completing the two-stage selection of the power system operation mode data features; if US is not the empty set, taking the current candidate feature set US as the input of step (2) for second-stage screening;
(2) taking the candidate feature set US obtained in step (1) as the input of an improved mixed-kernel-function SVM and performing second-stage Wrapper feature selection, comprising the following steps:
(2-1) using the improved mixed-kernel-function support vector machine algorithm with the candidate feature set US as input, performing classification training by 10-fold cross-validation, and outputting the classification accuracy Acc(US) of the current candidate feature set US;
(2-2) comparing the classification accuracy Acc(S) of the selected feature subset S with the classification accuracy Acc(US) of the candidate feature set US calculated in step (2-1); if Acc(US) is greater than Acc(S), updating S to US and Acc(S) to Acc(US), and returning to step (1-4); if Acc(US) is less than or equal to Acc(S), returning directly to step (1-4).
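As an illustration only, the iteration of steps (1-4) through (2-2) of claim 1 can be sketched in Python as below. The names `two_stage_select`, `score_fn` and `accuracy_fn` are hypothetical and do not appear in the claims: `score_fn` stands in for the IG-RFE weight scores w(f_i), and `accuracy_fn` stands in for the 10-fold cross-validated accuracy of the mixed-kernel SVM. Rounding r × N down to an integer is an assumption, and the removal of zero-NMI features in step (1-5) is omitted for brevity.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


def two_stage_select(
    features: Iterable[str],
    score_fn: Callable[[FrozenSet[str]], Dict[str, float]],
    accuracy_fn: Callable[[FrozenSet[str]], float],
    r: float = 0.1,
) -> Tuple[FrozenSet[str], float]:
    """Backward elimination driven by filter scores, judged by wrapper accuracy."""
    selected: FrozenSet[str] = frozenset()  # S, initially empty
    acc_selected = 0.0                      # Acc(S), initially 0
    candidates = frozenset(features)        # US, set to the initial feature set

    while candidates:
        # Filter stage: hypothetical stand-in for the weight scores w(f_i).
        scores = score_fn(candidates)
        n = len(candidates)
        # Step (1-7): drop the max(1, r * N) lowest-scoring features.
        # Flooring r * N to an integer is an assumption of this sketch.
        n_remove = max(1, int(r * n))
        ranked = sorted(candidates, key=lambda f: (scores[f], f))
        candidates = frozenset(ranked[n_remove:])

        # Step (1-8): stop once US is empty and output S.
        if not candidates:
            break

        # Wrapper stage, steps (2-1) and (2-2): keep US if it beats S.
        acc = accuracy_fn(candidates)
        if acc > acc_selected:
            selected, acc_selected = candidates, acc

    return selected, acc_selected


# Toy usage with fixed scores and a made-up accuracy function.
toy_scores = {"a": 0.9, "b": 0.5, "c": 0.2, "d": 0.1}
best_set, best_acc = two_stage_select(
    toy_scores,
    score_fn=lambda us: {f: toy_scores[f] for f in us},
    accuracy_fn=lambda us: 1.0 if us == frozenset({"a", "b"}) else 0.5,
    r=0.25,
)
```

In this toy run the features are removed one at a time in the order d, c, b, a, and the subset with the highest wrapper accuracy seen along the way is retained.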
2. The feature extraction method according to claim 1, wherein the initial feature set in step (1-1) is steady-state operation data information before a fault in the power system, and includes element feature data and system feature data, where the element feature data includes active power and reactive power of each generator set in the system before the fault, active power and reactive power of loads of nodes in the system before the fault, active power and reactive power of the transmission line, and voltage and phase angle of each bus in the system before the fault; the system characteristic data are total active output and total reactive output of a generator in the system before the fault, all active loads and all reactive loads in the system before the fault, the sum of mechanical input power in the system before the fault, total reactive reserve capacity in the system before the fault and network topology indexes of the electric power system before the fault.
3. The feature extraction method of claim 1, wherein the Parzen window method in step (1-4) is a non-parametric estimation method in which the space of the power system operation mode data features, after the data cleaning of step (1-1), is partitioned, and the frequency within each partition is taken as the probability at the coordinates of the center point of that partition, thereby obtaining the density distribution of the operation mode data features.
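A minimal one-dimensional sketch of a Parzen window estimate is given below for orientation. It assumes a rectangular (box) window of width `h` centered on the query point, so that the estimate is the fraction of samples falling inside the window divided by `h`. The function name `parzen_density`, the choice of window shape and the bandwidth value are assumptions of this sketch, not details taken from the claim.

```python
from typing import Sequence


def parzen_density(samples: Sequence[float], x: float, h: float) -> float:
    """Box-window Parzen estimate of the density at x.

    p(x) = (1 / (n * h)) * #{ i : |x - x_i| / h <= 1/2 }
    """
    if h <= 0:
        raise ValueError("window width h must be positive")
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    # Count the samples that fall inside the window centered at x.
    inside = sum(1 for s in samples if abs(x - s) / h <= 0.5)
    return inside / (n * h)


# Three of the four samples lie within 0.25 of the query point 0.1.
density_near = parzen_density([0.0, 0.1, 0.2, 1.0], x=0.1, h=0.5)
density_far = parzen_density([0.0, 0.1, 0.2, 1.0], x=5.0, h=0.5)
```

A smooth window such as a Gaussian could be substituted for the box window; the choice mainly affects how smooth the estimated density is.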
4. The feature extraction method of claim 1, wherein the NIG index of step (1-6) is the normalized information gain index NIG(f_i; f_j; C), expressed as follows:
wherein IG(f_i; f_j; C) denotes the mutual information gain index among the features f_i, f_j and the class attribute C, IG(f_i; f_j; C) = MI(f_i; f_j; C) = MI(f_i; C) - MI(f_j; C), and H(f_i) denotes the information entropy of the feature f_i.
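The entropy H(·) and mutual information MI(·;·) on which these indices are built can be estimated from data. The sketch below uses a simple plug-in estimate on already discretized samples, which is an assumption made here for brevity in place of the Parzen window estimate named in the claims. The function names are hypothetical, and the identity MI(X; Y) = H(X) + H(Y) - H(X, Y) is the standard one. The NIG index itself is not computed.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy in bits of a discrete sample."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """MI(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    joint = list(zip(xs, ys))  # samples of the pair (X, Y)
    return entropy(xs) + entropy(ys) - entropy(joint)


# A feature identical to the class label carries one full bit about it;
# a feature balanced independently of the label carries none.
labels = [0, 0, 1, 1]
mi_informative = mutual_information([0, 0, 1, 1], labels)
mi_uninformative = mutual_information([0, 1, 0, 1], labels)
```

Under step (1-5), a feature such as the second one, whose mutual information with the class attribute is zero, would be removed from the candidate set.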
5. The feature extraction method of claim 1, wherein the IG-RFE evaluation criterion of each feature in step (1-6) is obtained by weighting together the relevance measured by the normalized mutual information index NMI and the synergy among features calculated by the normalized information gain index NIG, and the IG-RFE evaluation criterion of each feature is expressed as follows:
where N is the total number of features in the set US used in step (1-6).
6. The feature extraction method of claim 1, wherein the improved mixed-kernel-function support vector machine algorithm in step (2) uses the selected mixed kernel function as the mapping function to transform the data samples into a high-dimensional space, in which the two classes of data samples are separated by a linear hyperplane, the improved mixed kernel function being expressed as follows:
$$K_{mix} = \lambda K_{local} + (1 - \lambda) K_{global}$$
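The convex combination above can be sketched as follows. The claim does not state which kernels play the roles of K_local and K_global, so this sketch assumes a Gaussian (RBF) kernel as the local kernel and a polynomial kernel as the global kernel. The function names and the parameters `gamma`, `degree`, `coef0` and `lam` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Assumed local kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def poly_kernel(x: Vector, y: Vector, degree: int = 2, coef0: float = 1.0) -> float:
    """Assumed global kernel: (<x, y> + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def mixed_kernel(x: Vector, y: Vector, lam: float = 0.5) -> float:
    """K_mix = lam * K_local + (1 - lam) * K_global, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * rbf_kernel(x, y) + (1.0 - lam) * poly_kernel(x, y)


def gram_matrix(xs: Sequence[Vector], ys: Sequence[Vector], lam: float = 0.5) -> List[List[float]]:
    """Pairwise mixed-kernel values, one row per element of xs."""
    return [[mixed_kernel(x, y, lam) for y in ys] for x in xs]


# For x = y = (1, 0): K_local = 1 and K_global = (1 + 1)^2 = 4.
k_half = mixed_kernel([1.0, 0.0], [1.0, 0.0], lam=0.5)
gram = gram_matrix([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], lam=0.5)
```

Because a convex combination of valid kernels is itself a valid kernel, a Gram matrix built this way could be supplied to an SVM implementation that accepts precomputed or user-defined kernels. The weight λ trades local fitting ability against global generalization.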
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011318226.8A CN112396113A (en) | 2020-11-23 | 2020-11-23 | Two-stage selection method for operation mode data characteristics of power system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112396113A true CN112396113A (en) | 2021-02-23 |
Family
ID=74606851
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011318226.8A Pending CN112396113A (en) | 2020-11-23 | 2020-11-23 | Two-stage selection method for operation mode data characteristics of power system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112396113A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107609760A (en) * | 2017-08-30 | 2018-01-19 | 清华大学 | The key feature system of selection of power system and device |
CN107992722A (en) * | 2017-11-07 | 2018-05-04 | 大连理工大学 | Based on symmetrical uncertain and information exchange gain feature selection approach |
WO2019090878A1 (en) * | 2017-11-09 | 2019-05-16 | 合肥工业大学 | Analog circuit fault diagnosis method based on vector-valued regularized kernel function approximation |
US20200271720A1 (en) * | 2020-05-09 | 2020-08-27 | Hefei University Of Technology | Method for diagnosing analog circuit fault based on vector-valued regularized kernel function approximation |
Non-Patent Citations (1)
Title |
---|
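Xu Xialing et al.: "Research on a screening method for key stability features of power grids considering feature combination effects", Proceedings of the CSEE, vol. 38, no. 8, 20 April 2018 (2018-04-20), pages 2232 - 2238 * |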
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114021425A (en) * | 2021-10-11 | 2022-02-08 | 清华大学 | Power system operation data modeling and feature selection method and device, electronic equipment and storage medium |
CN114021425B (en) * | 2021-10-11 | 2024-04-12 | 清华大学 | Power system operation data modeling and feature selection method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||