CN112396113A - Two-stage selection method for operation mode data characteristics of power system - Google Patents

Two-stage selection method for operation mode data characteristics of power system

Info

Publication number
CN112396113A
Authority
CN
China
Prior art keywords
feature
data
features
power system
acc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011318226.8A
Other languages
Chinese (zh)
Inventor
夏德明
胡伟
阴宏民
田增垚
刘洋
王克非
岳涵
侯凯元
屈可丁
沈毅
张博闻
马坤
蒋振宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Branch Of State Grid Corp Of China
Tsinghua University
State Grid Corp of China SGCC
Original Assignee
Northeast Branch Of State Grid Corp Of China
Tsinghua University
State Grid Corp of China SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Branch Of State Grid Corp Of China, Tsinghua University, State Grid Corp of China SGCC
Priority to CN202011318226.8A
Publication of CN112396113A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines


Abstract

The invention belongs to the technical field of power system operation and control, and relates to a two-stage selection method for power system operation mode data features. First, Filter-stage feature selection is performed with the IG-RFE algorithm, which is based on normalized mutual information and interaction information gain; this removes the prior parameters required by related algorithms and correctly identifies complex dependencies among features, so that the key features of power system operation mode data are searched for and extracted automatically. Then an improved mixed kernel function SVM algorithm, used together with the recursive feature elimination (RFE) search method, removes poor features until a feature set of the preset size is obtained, so that the key features of the power system are found efficiently, accurately, and automatically. The method improves the efficiency and accuracy of feature selection for power system operation mode data, and provides a technical basis and a practical method for selecting the data features of power grid operation modes.

Description

Two-stage selection method for operation mode data characteristics of power system
Technical Field
The invention belongs to the technical field of operation and control of an electric power system, and relates to a two-stage selection method for data characteristics of an electric power system operation mode.
Background
The ultra-high-voltage AC/DC hybrid power grid in China continues to grow in scale. The wide integration of new energy sources and the increasing two-way interaction between flexible loads and the grid raise the uncertainty on both the source side and the load side, so the characteristics of the grid are increasingly complex, which poses great challenges for dispatchers who monitor and regulate the safe and stable operation of the grid. Studying the potential safety and stability problems of the grid, improving the observability and controllability of the power system, and evaluating the transient stability of large power grids efficiently and quickly are therefore of great significance for maintaining the safe and stable operation of the power system. With the maturing of wide-area measurement technology and the development of big data theory, online transient stability assessment (TSA) based on artificial intelligence methods offers a new approach to the intelligent control of large power grids.
An actual large-scale AC/DC hybrid system contains numerous variables, and its data are large in scale and high in dimension. Only a few feature quantities affect the stability level of the system, and most features are redundant; if all of them are used as inputs to a transient evaluation model, computational efficiency, classification performance, and suitability for online application all suffer. The initial input features of the power system therefore need to be screened to find the features that play a key role in the safety and stability problem under study.
Disclosure of Invention
The invention aims to provide a two-stage selection method for the data characteristics of the operation mode of an electric power system, which comprises a Filter stage characteristic selection process based on an information theory and a Wrapper stage characteristic selection process based on an improved SVM algorithm.
The invention provides a two-stage selection method for the operation mode data characteristics of an electric power system, which comprises the following steps:
(1) performing Filter-stage feature selection on the power system operation mode data, based on normalized mutual information and interaction information gain, comprising the following steps:
(1-1) acquiring power system operation mode data from the synchronized phasor measurement units (PMUs) of the power system, filling in missing data and deleting duplicate data to obtain valid data, constructing a sample and its features for each group of operation data, labeling each sample 0 or 1 according to whether the transient state is stable, and recording the label as the class attribute C of the sample;
(1-2) constructing a selected feature subset S and initializing S to the empty set; constructing a candidate feature set US and initializing US to the empty set; defining classification accuracy indexes for S and US, denoted Acc(S) and Acc(US), and initializing both to 0;
Acc(S) and Acc(US) respectively represent the classification accuracy of the target optimal feature subset S and of the candidate feature set US, and the classification accuracy is calculated as follows:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, FN, FP, and TN respectively denote, for a given machine learning algorithm, the number of positive samples correctly classified as positive, the number of positive samples incorrectly classified as negative, the number of negative samples incorrectly classified as positive, and the number of negative samples correctly classified as negative;
(1-3) constructing an initial feature set of the power system operation mode data from the valid data of step (1-1), and updating the candidate feature set US to this initial feature set;
(1-4) performing discrete estimation and probability density estimation on each feature in the current candidate feature set US using the Parzen window method, to obtain the probability distributions of the power system operation mode data features, namely the probability distribution p(f_i) of feature f_i, the joint probability distribution p(f_i, f_j) of features f_i and f_j, the probability distribution p(C) of the class attribute C, and the conditional probability distribution p(f_j, c) of feature f_j under the class attribute C;
(1-5) respectively calculating the normalized mutual information NMI (f; C) of all the features f and the class attributes C in the candidate feature set US of the step (1-4) by using the following formula:
Figure BDA0002791959330000022
where MI (f; C) represents mutual information between the feature f and the feature class attribute C, namely:
Figure BDA0002791959330000023
Figure BDA0002791959330000024
H(f) denotes the information entropy of the feature f and H(C) the information entropy of the class attribute C; for a single continuous variable X, the information entropy is calculated as follows:
$$H(X) = -\int p(x) \log_2 p(x)\, dx$$
where p(x) is the probability distribution of the variable X;
removing from the candidate feature set US every feature whose NMI(f; C) value is zero;
(1-6) calculating the NIG index and the Score index between every pair of features in the candidate feature set US of step (1-4), and forming the IG-RFE evaluation criterion result of each feature with the following formula, that is, computing for each feature in US its weight score w(f_i):
Figure BDA0002791959330000031
Wherein N is the number of features in the candidate feature set US;
the Score index Score(f_i, f_j) is expressed as follows:
Figure BDA0002791959330000032
(1-7) sorting the weight scores w(f_i) of all features obtained in step (1-6), and removing from the candidate feature set US the max(1, r×N) features with the smallest weight scores, where r is the minimum removal proportion for a single iteration of the backward search and N is the total number of features in the candidate feature set US;
(1-8) judging the total number of the features in the candidate feature set US in the step (1-7), and if the US is an empty set, outputting the current selected feature subset S as an optimal feature subset to realize two-stage selection of the power system operation mode data features; if the US is not an empty set, taking the current alternative feature set US as the input of the step (2) to carry out the second-stage screening;
(2) taking the candidate feature set US obtained in step (1) as the input of an improved mixed kernel function SVM, and performing second-stage Wrapper feature selection, comprising the following steps:
(2-1) adopting an improved mixed kernel function support vector machine algorithm, taking the alternative feature set US as input, carrying out classification training on the alternative feature set US according to a 10-fold cross validation method, and outputting to obtain the classification accuracy Acc (US) of the current alternative feature set US;
(2-2) comparing the classification accuracy Acc (S) of the selected feature subset S with the classification accuracy of the candidate subset US calculated in the step (2-1), if Acc (US) is greater than Acc (S), updating S to US, updating Acc (S) to Acc (US), returning to the step (1-4), and if Acc (US) is less than or equal to Acc (S), directly returning to the step (1-4).
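The classification accuracy used in steps (1-2) and (2-2) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `confusion_counts` and `accuracy` are hypothetical, label 1 is treated as the positive class (as in the confusion table of the Detailed Description), and returning 0 for an empty evaluation is an assumption chosen to match the initial value of Acc(S) and Acc(US).

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for labels in {0, 1}, with 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Acc = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    # Assumption: an empty evaluation scores 0, like the initial Acc values.
    return (tp + tn) / total if total else 0.0


# Illustrative labels only; not data from the patent.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn, accuracy(tp, tn, fp, fn))
```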
In the step (1-1) of the feature extraction method, the initial feature set is steady-state operation data information before a fault in the power system, and the initial feature set comprises element feature data and system feature data, wherein the element feature data comprises active power and reactive power of each generator set in the system before the fault, active power and reactive power of loads of nodes in the system before the fault, active power and reactive power of a power transmission line, and voltage and phase angle of each bus in the system before the fault; the system characteristic data are total active output and total reactive output of a generator in the system before the fault, all active loads and all reactive loads in the system before the fault, the sum of mechanical input power in the system before the fault, total reactive reserve capacity in the system before the fault and network topology indexes of the electric power system before the fault.
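The data cleaning of step (1-1) might look like the sketch below. The source does not say how missing data are supplemented, so mean imputation is purely an assumption here, and `clean_samples` is a hypothetical helper name. Each row stands for one sample of pre-fault features, with `None` marking a missing value.

```python
def clean_samples(rows):
    """Fill missing values (None) with the column mean, then drop duplicate rows."""
    if not rows:
        return []
    filled = [list(row) for row in rows]
    for j in range(len(filled[0])):
        present = [row[j] for row in filled if row[j] is not None]
        # Assumption: mean imputation; a column with no values at all gets 0.0.
        fill = sum(present) / len(present) if present else 0.0
        for row in filled:
            if row[j] is None:
                row[j] = fill
    seen, unique = set(), []
    for row in filled:
        key = tuple(row)
        if key not in seen:  # keep only the first occurrence of each row
            seen.add(key)
            unique.append(row)
    return unique


# Illustrative values only; not data from the patent.
raw = [
    [1.0, 0.98, 10.0],
    [1.0, 0.98, 10.0],  # exact duplicate of the first row
    [1.2, None, 12.0],  # missing second feature
    [0.8, 1.02, 8.0],
]
cleaned = clean_samples(raw)
print(len(cleaned), cleaned[1])
```

Filling before deduplicating follows the order in which step (1-1) lists the two operations; with mean imputation the order slightly changes the filled value, so a real pipeline would need to fix it deliberately.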
In step (1-4) of the feature extraction method, the Parzen window method is a non-parametric estimation method: the feature space of the power system operation mode data cleaned in step (1-1) is partitioned, and the frequency in each partition is taken as the probability at the coordinates of that partition's center point, yielding the density distribution of the operation mode data features.
In step (1-6) of the feature extraction method, the NIG index is the normalized information gain index NIG(f_i; f_j; C), expressed as follows:
Figure BDA0002791959330000041
where IG(f_i; f_j; C) denotes the interaction information gain index among the features f_i, f_j and the class attribute C, IG(f_i; f_j; C) = MI(f_i; f_j; C) = MI(f_i; C) - MI(f_j; C), and H(f_i) denotes the information entropy of the feature f_i.
In step (1-6) of the feature extraction method, the IG-RFE evaluation criterion of a single feature is obtained by a weighted combination of the degree of association, measured by the normalized mutual information index NMI, and the degree of cooperation between features, calculated from the interaction information gain NIG; the IG-RFE evaluation criterion of a single feature is expressed as follows:
Figure BDA0002791959330000042
where N is the total number of features in the set US used in step (1-6).
In step (2) of the above feature extraction method, the improved mixed kernel function support vector machine algorithm uses the selected mapping function transformation as a mixed function, maps the data samples into a high-dimensional space, and distinguishes two types of data samples by a linear hyperplane in the high-dimensional space, wherein a specific expression of the improved mixed function is as follows:
$$K_{mix} = \lambda K_{local} + (1 - \lambda) K_{global}$$
where K_local denotes the local kernel function, for which the RBF kernel is chosen:
Figure BDA0002791959330000051
K_global denotes the global kernel function, for which the polynomial kernel k(x, x') = (x·x' + c)^d is chosen.
The two-stage selection method for the operation mode data characteristics of the power system, provided by the invention, has the advantages that:
1. The method first performs Filter-stage feature selection with the IG-RFE (Interaction Gain - Recursive Feature Elimination) algorithm, which is based on normalized mutual information and interaction information gain. This removes the prior parameters required by related algorithms and correctly identifies complex dependencies among features, so that the key features of power system operation mode data are searched for and extracted automatically. It then uses the improved mixed kernel function SVM algorithm together with the recursive feature elimination (RFE) search method to remove poor features until a feature set of the preset size is obtained, so that the key features of the power system are found efficiently, accurately, and automatically.
2. The method is easy to implement. It introduces data-driven methods from the field of artificial intelligence and designs the two key parts, namely the feature selection algorithm based on normalized mutual information and interaction information gain, and the improved mixed kernel function SVM classifier used in cooperation with the recursive feature elimination (RFE) search method, so that the key features of the power system are searched for automatically.
Detailed Description
The invention provides a two-stage selection method for the operation mode data characteristics of an electric power system, which comprises the following steps:
(1) performing Filter-stage feature selection on the power system operation mode data, based on normalized mutual information and interaction information gain, comprising the following steps:
(1-1) acquiring power system operation mode data from the synchronized phasor measurement units (PMUs) of the power system, filling in missing data and deleting duplicate data to obtain valid data, constructing a sample and its features for each group of operation data, labeling each sample 0 or 1 according to whether the transient state is stable, and recording the label as the class attribute C of the sample;
(1-2) constructing a selected feature subset S and initializing S to the empty set; constructing a candidate feature set US and initializing US to the empty set; defining classification accuracy indexes for S and US, denoted Acc(S) and Acc(US), and initializing both to 0;
Acc(S) and Acc(US) respectively represent the classification accuracy of the target optimal feature subset S and of the candidate feature set US, and the classification accuracy is calculated as follows:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, FN, FP, and TN respectively denote, for a given machine learning algorithm, the number of positive samples correctly classified as positive, the number of positive samples incorrectly classified as negative, the number of negative samples incorrectly classified as positive, and the number of negative samples correctly classified as negative; reference may be made to the following table:
Sample class attribute    Predicted 0    Predicted 1
Actual 0                  TN             FP
Actual 1                  FN             TP
(1-3) constructing an initial feature set of the power system operation mode data according to the effective data in the step (1-1), and updating the alternative feature set US into the initial feature set;
(1-4) performing discrete estimation and probability density estimation on each feature in the current candidate feature set US using the Parzen window method, to obtain the probability distributions of the power system operation mode data features, namely the probability distribution p(f_i) of feature f_i, the joint probability distribution p(f_i, f_j) of features f_i and f_j, the probability distribution p(C) of the class attribute C, and the conditional probability distribution p(f_j, c) of feature f_j under the class attribute C;
(1-5) respectively calculating the normalized mutual information NMI (f; C) of all the features f and the class attributes C in the candidate feature set US of the step (1-4) by using the following formula:
Figure BDA0002791959330000061
where MI (f; C) represents mutual information between the feature f and the feature class attribute C, namely:
Figure BDA0002791959330000062
Figure BDA0002791959330000063
H(f) denotes the information entropy of the feature f and H(C) the information entropy of the class attribute C; for a single continuous variable X, the information entropy is calculated as follows:
$$H(X) = -\int p(x) \log_2 p(x)\, dx$$
where p(x) is the probability distribution of the variable X;
removing from the candidate feature set US every feature whose NMI(f; C) value is zero;
(1-6) calculating the NIG index and the Score index between every pair of features in the candidate feature set US of step (1-4), and forming the IG-RFE evaluation criterion result of each feature with the following formula, that is, computing for each feature in US its weight score w(f_i):
Figure BDA0002791959330000071
Where N is the number of features in the candidate feature set US.
The Score index Score(f_i, f_j) is expressed as follows:
Figure BDA0002791959330000072
(1-7) sorting the weight scores w(f_i) of all features obtained in step (1-6), and removing from the candidate feature set US the max(1, r×N) features with the smallest weight scores, where r is the minimum removal proportion for a single iteration of the backward search and N is the total number of features in the candidate feature set US;
(1-8) judging the total number of the features in the candidate feature set US in the step (1-7), if the US is an empty set, indicating that the selection process of the backward-eliminated feature subset is finished, and outputting the current selected feature subset S as an optimal feature subset to realize two-stage selection of the data features of the power system operation mode; if the US is not an empty set, taking the current alternative feature set US as the input of the step (2) to carry out the second-stage screening;
(2) taking the candidate feature set US obtained in step (1) as the input of an improved mixed kernel function SVM, and performing second-stage Wrapper feature selection, comprising the following steps:
(2-1) adopting an improved mixed kernel function support vector machine algorithm, taking the alternative feature set US as input, carrying out classification training on the alternative feature set US according to a 10-fold cross validation method, and outputting to obtain the classification accuracy Acc (US) of the current alternative feature set US;
(2-2) comparing the classification accuracy Acc (S) of the selected feature subset S with the classification accuracy of the candidate subset US calculated in the step (2-1), if Acc (US) is greater than Acc (S), indicating that the performance of the feature subset of the current US is better than that of the selected feature subset S, updating S to US, updating Acc (S) to Acc (US), returning to the step (1-4), and if Acc (US) is less than or equal to Acc (S), indicating that the performance of the US at the moment is not as good as that of the classification of the selected feature subset S, so updating is not performed, and returning to the step (1-4).
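The control flow of steps (1-2) through (2-2) can be sketched as a single loop. This is a simplified outline under stated assumptions, not the patented implementation: `score_features` stands in for the IG-RFE weight computation of steps (1-4) to (1-6), `evaluate` stands in for the 10-fold cross-validated accuracy of the mixed kernel SVM in step (2-1), both are hypothetical callables supplied by the caller, the zero-NMI pruning of step (1-5) is omitted, and flooring r×N to an integer is an assumption because the source does not state a rounding rule.

```python
def two_stage_select(initial_features, score_features, evaluate, r=0.1):
    """Backward elimination that keeps the best-scoring candidate set seen so far."""
    selected, acc_selected = [], 0.0      # S and Acc(S), step (1-2)
    candidates = list(initial_features)   # US, step (1-3)
    while candidates:
        weights = score_features(candidates)           # steps (1-4) to (1-6)
        n_remove = max(1, int(r * len(candidates)))    # step (1-7), floor assumed
        ranked = sorted(candidates, key=lambda f: weights[f])
        dropped = set(ranked[:n_remove])
        candidates = [f for f in candidates if f not in dropped]
        if not candidates:                             # step (1-8): US is empty
            break
        acc_candidates = evaluate(candidates)          # step (2-1)
        if acc_candidates > acc_selected:              # step (2-2)
            selected, acc_selected = list(candidates), acc_candidates
    return selected, acc_selected


# Toy stand-ins for the two stages; all numbers are illustrative only.
toy_weights = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}
toy_acc = {("a", "c", "d"): 0.80, ("a", "c"): 0.85, ("a",): 0.70}
result = two_stage_select(
    ["a", "b", "c", "d"],
    score_features=lambda us: toy_weights,
    evaluate=lambda us: toy_acc[tuple(us)],
    r=0.25,
)
print(result)
```

In the toy run the candidate set shrinks by one feature per iteration, and S is only replaced when the candidate accuracy strictly exceeds Acc(S), which mirrors the "greater than" test in step (2-2).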
In the step (1-1) of the feature extraction method, the initial feature set is steady-state operation data information before a fault in the power system, and the initial feature set comprises element feature data and system feature data, wherein the element feature data comprises active power and reactive power of each generator set in the system before the fault, active power and reactive power of loads of nodes in the system before the fault, active power and reactive power of a power transmission line, and voltage and phase angle of each bus in the system before the fault; the system characteristic data are total active output and total reactive output of a generator in the system before the fault, all active loads and all reactive loads in the system before the fault, the sum of mechanical input power in the system before the fault, total reactive reserve capacity in the system before the fault and network topology indexes of the electric power system before the fault.
In step (1-4) of the feature extraction method, the Parzen window method is a non-parametric estimation method: the feature space of the power system operation mode data cleaned in step (1-1) is partitioned, and the frequency in each partition is taken as the probability at the coordinates of that partition's center point, yielding the density distribution of the operation mode data features.
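The estimation described in this paragraph, where the space is partitioned and each partition's frequency is assigned to its center point, can be sketched for a single feature as below. The helper name `histogram_density`, the equal-width partition, and the default of four partitions are all assumptions; a Parzen window estimator in the usual sense would additionally smooth with a window (kernel) function, which this sketch does not do.

```python
def histogram_density(values, n_bins=4):
    """Return (centers, probs): partition centers and the relative frequency in each."""
    lo, hi = min(values), max(values)
    # Assumption: equal-width partitions; a constant feature falls back to width 1.0.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    probs = [c / len(values) for c in counts]
    return centers, probs


# Illustrative values only; not data from the patent.
values = [0.0, 1.0, 1.5, 3.0, 4.0, 8.0, 8.0, 2.0]
centers, probs = histogram_density(values)
print(centers, probs)
```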
In step (1-6) of the feature extraction method, the NIG index is the normalized information gain index NIG(f_i; f_j; C), expressed as follows:
Figure BDA0002791959330000081
where IG(f_i; f_j; C) denotes the interaction information gain index among the features f_i, f_j and the class attribute C, IG(f_i; f_j; C) = MI(f_i; f_j; C) = MI(f_i; C) - MI(f_j; C), and H(f_i) denotes the information entropy of the feature f_i.
In step (1-6) of the feature extraction method, the IG-RFE evaluation criterion of a single feature is obtained by a weighted combination of the degree of association, measured by the normalized mutual information index NMI, and the degree of cooperation between features, calculated from the interaction information gain NIG; the IG-RFE evaluation criterion of a single feature is expressed as follows:
Figure BDA0002791959330000082
where N is the total number of features in the set US used in step (1-6).
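The association measure underlying step (1-5) can be illustrated with discrete (already binned) features. The sketch below computes the information entropy and the mutual information in bits using their standard discrete definitions, whereas the text states the entropy for a continuous variable. The exact normalization that turns MI into NMI is given only in a figure not reproduced here, so it is deliberately left out; the sketch assumes the normalizer is positive, in which case a feature has zero NMI exactly when its MI with the class attribute is zero. The function names and the tolerance `1e-12` are hypothetical.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy H(X) in bits of a discrete sample."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """MI(X; Y) in bits, from empirical joint and marginal frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


# Illustrative binned features and class labels; not data from the patent.
labels = [0, 0, 1, 1]
features = {"f1": [0, 0, 1, 1], "f2": [0, 1, 0, 1]}
mi = {name: mutual_information(col, labels) for name, col in features.items()}
kept = [name for name, value in mi.items() if value > 1e-12]  # drop zero-MI features
print(entropy(labels), mi, kept)
```

Here `f1` determines the label completely and carries 1 bit of information about it, while `f2` is independent of the label and is removed.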
In step (2) of the above feature extraction method, the improved mixed kernel function support vector machine algorithm uses the selected mapping function transformation as a mixed function, maps the data samples into a high-dimensional space, and distinguishes two types of data samples by a linear hyperplane in the high-dimensional space, wherein a specific expression of the improved mixed function is as follows:
$$K_{mix} = \lambda K_{local} + (1 - \lambda) K_{global}$$
where K_local denotes the local kernel function, for which the RBF kernel is chosen:
Figure BDA0002791959330000083
K_global denotes the global kernel function, for which the polynomial kernel k(x, x') = (x·x' + c)^d is chosen. In one embodiment of the invention, the four parameters were tested with the precision requirements of the data, the classification performance of the algorithm, and the actual running time requirements all taken into account, and the typical values given in the following table were finally selected as the actual values.
Parameter    σ      C    d    λ
Value        2.5    1    3    0.783
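A minimal sketch of the mixed kernel with the tabulated values is given below. Two points are assumptions rather than statements from the source: the RBF expression appears only in a figure not reproduced here, so the common Gaussian form exp(-||x - x'||² / (2σ²)) is assumed, and the table entry "C" is read as the polynomial constant c, although it could instead denote the SVM penalty parameter. The function names are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """Local kernel. Assumed form: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def poly_kernel(x, y, c, d):
    """Global kernel: (x . y + c)^d."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d


def mixed_kernel(x, y, sigma=2.5, c=1.0, d=3, lam=0.783):
    """K_mix = lam * K_local + (1 - lam) * K_global, defaults from the table."""
    return lam * rbf_kernel(x, y, sigma) + (1.0 - lam) * poly_kernel(x, y, c, d)


# Illustrative feature vectors only; not data from the patent.
x, y = [1.0, 2.0], [0.5, -1.0]
print(mixed_kernel(x, y), mixed_kernel([0.0, 0.0], [0.0, 0.0]))
```

Because both component kernels are symmetric in their arguments, so is the mixture; with λ between 0 and 1 it is a convex combination of the local and global kernels.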
The two-stage selection method for power system operation mode data features proceeds as follows. First, in the information-theoretic Filter-stage feature selection process, the IG-RFE Filter-stage algorithm based on normalized mutual information and interaction information gain removes the prior parameters required by related algorithms and correctly identifies complex dependencies among features, including correlation, redundancy, and complementarity, so that the key features of the power system are searched for and extracted automatically. Then, in the Wrapper-stage feature selection process based on the improved SVM algorithm, the improved mixed kernel function SVM algorithm, together with the recursive feature elimination (RFE) search method, repeatedly removes poor features from the current feature set until a feature set of the preset size is obtained. Power grid operation mode data features that have been proven in practice are included in the initial candidate features, and features that provide supplementary information are selected, so that as much power flow information as possible is provided with as few features as possible, which facilitates use and monitoring by dispatchers.
For power system operation mode data that are large in scale and high in dimension, the method combines a Filter-stage feature selection method based on information theory with a Wrapper-stage feature selection method based on an improved SVM algorithm. The resulting two-stage feature selection method can be used for different training tasks and effectively improves the efficiency and accuracy of the automatic search for key features of the power system.
The method of the invention is divided into two main stages. The first is the Filter-stage feature selection process based on information theory, for which a corresponding IG-RFE evaluation criterion is proposed. This stage introduces the definitions and basic indexes of the relevant information-theoretic concepts. The evaluation criterion serves as the evaluation index of the Filter-stage algorithm and characterizes well both the correlation among features and their combined synergistic effect, which largely avoids the errors that manually set prior parameters can cause in traditional methods. This is the first inventive point of the method. The second is the Wrapper-stage feature selection process based on the improved SVM algorithm, in which the improved mixed kernel function SVM algorithm, together with the recursive feature elimination (RFE) search method, repeatedly removes poor features from the current feature set until a feature set of the preset size is obtained. This is another essential difference between the method of the invention and other methods, and is the second inventive point.

Claims (6)

1. A two-stage selection method for operation mode data characteristics of a power system comprises the following steps:
(1) performing Filter-stage feature selection on the power system operation mode data, based on normalized mutual information and interaction information gain, comprising the following steps:
(1-1) acquiring power system operation mode data from the synchronized phasor measurement units (PMUs) of the power system, filling in missing data and deleting duplicate data to obtain valid data, constructing a sample and its features for each group of operation data, labeling each sample 0 or 1 according to whether the transient state is stable, and recording the label as the class attribute C of the sample;
(1-2) constructing a selected feature subset S and initializing S to the empty set; constructing a candidate feature set US and initializing US to the empty set; defining classification accuracy indexes for S and US, denoted Acc(S) and Acc(US), and initializing both to 0;
Acc(S) and Acc(US) respectively represent the classification accuracy of the target optimal feature subset S and of the candidate feature set US, and the classification accuracy is calculated as follows:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
wherein TP, FN, FP, and TN respectively denote the number of samples correctly classified as positive, incorrectly classified as negative, incorrectly classified as positive, and correctly classified as negative by a given machine learning algorithm;
(1-3) constructing an initial feature set of the power system operation mode data from the valid data of step (1-1), and updating the candidate feature set US to the initial feature set;
(1-4) performing discrete estimation and probability density estimation on each feature in the current candidate feature set US using the Parzen window method to obtain the probability distributions of the power system operation mode data features, namely the probability distribution $p(f_i)$ of the feature $f_i$, the joint probability distribution $p(f_i, f_j)$ of the features $f_i$ and $f_j$, the probability distribution $p(C)$ of the class attribute C, and the conditional probability distribution $p(f_j, c)$ of the feature $f_j$ under the class attribute C;
(1-5) calculating, for every feature f in the candidate feature set US of step (1-4), the normalized mutual information NMI(f; C) between f and the class attribute C using the following formula:
Figure FDA0002791959320000021
where MI(f; C) denotes the mutual information between the feature f and the class attribute C, namely:
Figure FDA0002791959320000022
Figure FDA0002791959320000023
H(f) denotes the information entropy of the feature f, and H(C) denotes the information entropy of the class attribute C; for a single continuous variable X, the information entropy is calculated as follows:
$$H(X) = -\int p(x)\,\log_2 p(x)\,dx$$
wherein p(x) is the probability density of the variable X;
removing from the candidate feature set US every feature whose NMI(f; C) value is zero;
(1-6) calculating the NIG index and the Score index between every two features in the candidate feature set US of step (1-4), and forming the IG-RFE evaluation result of each feature using the following formula, that is, obtaining the weight score $w(f_i)$ of every feature in the candidate feature set US:
Figure FDA0002791959320000024
wherein N is the number of features in the candidate feature set US;
the Score index $\mathrm{Score}(f_i, f_j)$ is expressed as follows:
Figure FDA0002791959320000025
(1-7) sorting the weight scores $w(f_i)$ of all features from step (1-6), and removing from the candidate feature set US the max(1, r × N) features with the smallest weight scores, wherein r is the minimum removal proportion for a single iteration of the backward search and N is the total number of features in the candidate feature set US;
(1-8) checking the total number of features in the candidate feature set US of step (1-7): if US is an empty set, outputting the current selected feature subset S as the optimal feature subset, thereby completing the two-stage selection of the power system operation mode data features; if US is not an empty set, using the current candidate feature set US as the input to step (2) for second-stage screening;
(2) taking the candidate feature set US obtained in step (1) as the input of an improved mixed-kernel SVM and performing second-stage Wrapper feature selection, comprising the following steps:
(2-1) using the improved mixed-kernel support vector machine algorithm with the candidate feature set US as input, performing classification training on US by 10-fold cross-validation, and outputting the classification accuracy Acc(US) of the current candidate feature set US;
(2-2) comparing the classification accuracy Acc(S) of the selected feature subset S with the classification accuracy Acc(US) of the candidate feature set US calculated in step (2-1): if Acc(US) is greater than Acc(S), updating S to US and Acc(S) to Acc(US), then returning to step (1-4); if Acc(US) is less than or equal to Acc(S), returning directly to step (1-4).
2. The feature extraction method according to claim 1, wherein the initial feature set in step (1-1) is the steady-state operation data of the power system before a fault and comprises element feature data and system feature data; the element feature data comprise the active and reactive power of each generator set before the fault, the active and reactive power of the load at each node before the fault, the active and reactive power of each transmission line, and the voltage and phase angle of each bus before the fault; the system feature data comprise the total active and total reactive output of the generators before the fault, the total active load and total reactive load before the fault, the sum of the mechanical input power before the fault, the total reactive reserve capacity before the fault, and the network topology indexes of the power system before the fault.
3. The feature extraction method of claim 1, wherein the Parzen window method in step (1-4) is a non-parametric estimation method in which the space of the power system operation mode data features, after the data cleaning of step (1-1), is partitioned, and the frequency in each partition is taken as the probability at the coordinates of that partition's center point, thereby obtaining the density distribution of the operation mode data features.
4. The feature extraction method of claim 1, wherein the NIG index of step (1-6) is the normalized information gain index $\mathrm{NIG}(f_i; f_j; C)$, expressed as follows:
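The control flow of claim 1 can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the names `drop_lowest`, `two_stage_select`, `weigh`, and `evaluate` are hypothetical, `weigh` stands in for steps (1-4) to (1-6) (returning weight scores only for features that survive the NMI check), `evaluate` stands in for the 10-fold cross-validated mixed-kernel SVM of step (2-1), and rounding r × N down to an integer is an assumption. The toy weights and toy accuracy function are made up purely to exercise the loop.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Classification accuracy from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    return (tp + tn) / total if total else 0.0


def drop_lowest(weights: Dict[str, float], r: float) -> Set[str]:
    """Step (1-7): discard the max(1, r*N) features with the smallest weights.

    Assumption: r*N is rounded down to an integer.
    """
    n = len(weights)
    k = max(1, int(r * n))
    ranked = sorted(weights, key=weights.get)  # ascending by weight score
    return set(ranked[k:])


def two_stage_select(
    features: Iterable[str],
    weigh: Callable[[Set[str]], Dict[str, float]],
    evaluate: Callable[[Set[str]], float],
    r: float = 0.2,
) -> Tuple[Set[str], float]:
    """Backward search that alternates a filter step with a wrapper check."""
    s: Set[str] = set()        # selected subset S, step (1-2)
    acc_s = 0.0                # Acc(S)
    us = set(features)         # candidate set US, step (1-3)
    while us:
        weights = weigh(us)            # steps (1-4) to (1-6), hypothetical callable
        us = drop_lowest(weights, r)   # step (1-7)
        if not us:                     # step (1-8): US empty, stop and output S
            break
        acc_us = evaluate(us)          # step (2-1), hypothetical callable
        if acc_us > acc_s:             # step (2-2): keep the better subset
            s, acc_s = set(us), acc_us
    return s, acc_s


# Toy stand-ins (invented for illustration only).
TOY_WEIGHTS = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.2, "e": 0.1}


def toy_weigh(us: Set[str]) -> Dict[str, float]:
    return {f: TOY_WEIGHTS[f] for f in us}


def toy_evaluate(us: Set[str]) -> float:
    useful = {"a", "b"}
    return 0.6 + 0.1 * len(us & useful) - 0.05 * len(us - useful)
```

Calling `two_stage_select(TOY_WEIGHTS, toy_weigh, toy_evaluate, r=0.2)` removes one feature per pass and remembers the subset with the highest toy accuracy seen before US becomes empty.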
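As a rough illustration of the partition-and-count idea in claim 3, the sketch below estimates a one-dimensional density with a rectangular Parzen window centred on each cell of an equal-width grid. The one-dimensional setting, the rectangular window, the function names, and the grid bounds are all assumptions made for the example; the claim does not fix them.

```python
from typing import List, Sequence, Tuple


def parzen_density(samples: Sequence[float], x: float, h: float) -> float:
    """Rectangular-window Parzen estimate of the density at point x.

    Counts the samples within h/2 of x and divides by n*h.
    """
    if h <= 0:
        raise ValueError("window width h must be positive")
    if not samples:
        raise ValueError("at least one sample is required")
    inside = sum(1 for s in samples if abs(s - x) <= h / 2)
    return inside / (len(samples) * h)


def density_on_grid(
    samples: Sequence[float], lo: float, hi: float, bins: int
) -> List[Tuple[float, float]]:
    """Split [lo, hi] into equal cells and estimate the density at each centre."""
    h = (hi - lo) / bins
    centres = [lo + (i + 0.5) * h for i in range(bins)]
    return [(c, parzen_density(samples, c, h)) for c in centres]
```

With the window width equal to the cell width, this reduces to a normalized histogram, except that a sample lying exactly on a cell boundary is counted in both neighbouring cells.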
Figure FDA0002791959320000031
wherein $\mathrm{IG}(f_i; f_j; C)$ denotes the mutual information gain index among the features $f_i$, $f_j$ and the class attribute C, $\mathrm{IG}(f_i; f_j; C) = \mathrm{MI}(f_i; f_j; C) = \mathrm{MI}(f_i; C) - \mathrm{MI}(f_j; C)$, and $H(f_i)$ denotes the information entropy of the feature $f_i$.
5. The feature extraction method of claim 1, wherein the IG-RFE evaluation criterion for a single feature in step (1-6) is obtained by a weighted combination of the degree of association measured by the normalized mutual information index NMI and the degree of synergy between features calculated by the mutual information gain index NIG, and is expressed as follows:
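The entropy and mutual information terms that these indexes are built from can be illustrated on discrete data. The sketch below is only an aid to intuition and makes several assumptions: it uses relative frequencies of discrete values rather than the Parzen window estimate of the claims, the function names are hypothetical, and it does not reproduce the NIG formula itself. The XOR-style data show why a pairwise term is useful: each feature alone carries no information about the class, but the two together determine it.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy in bits, from the relative frequencies of the values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """MI(X; Y) = H(X) + H(Y) - H(X, Y) for paired discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# XOR-style toy data: the class is 1 exactly when the two features differ.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
c = [0, 1, 1, 0]

mi_f1 = mutual_information(f1, c)                     # single feature vs class
mi_pair = mutual_information(list(zip(f1, f2)), c)    # feature pair vs class
```

Here `mi_f1` is zero while `mi_pair` equals one bit, which is the kind of synergy between features that a single-feature relevance score cannot capture.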
Figure FDA0002791959320000041
wherein N is the total number of features in the set US used in the calculation of step (1-6).
6. The feature extraction method of claim 1, wherein the improved mixed-kernel support vector machine algorithm in step (2) uses a mixed kernel function as the selected mapping transformation to map the data samples into a high-dimensional space, in which the two classes of data samples are separated by a linear hyperplane, the improved mixed kernel function being expressed as follows:
$$K_{\mathrm{mix}} = \lambda K_{\mathrm{local}} + (1-\lambda) K_{\mathrm{global}}$$
where $K_{\mathrm{local}}$ denotes the local kernel function, chosen as the RBF kernel function
Figure FDA0002791959320000042
$K_{\mathrm{global}}$ denotes the global kernel function, chosen as the polynomial kernel function $k(x, x') = (x \cdot x' + c)^d$.
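A convex combination of a local and a global kernel of this kind can be sketched as below. The parametrization is an assumption: the RBF kernel is written in the common form exp(-gamma * ||x - x'||^2) because the claim's own RBF expression is not reproduced in the text, the polynomial kernel is taken as (x . x' + c)^d, and the function names and default parameter values are hypothetical.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Local kernel: exp(-gamma * squared Euclidean distance). Form is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(
    x: Sequence[float], y: Sequence[float], c: float = 1.0, d: int = 2
) -> float:
    """Global kernel: (x . y + c) ** d."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d


def mixed_kernel(
    x: Sequence[float],
    y: Sequence[float],
    lam: float = 0.5,
    gamma: float = 1.0,
    c: float = 1.0,
    d: int = 2,
) -> float:
    """K_mix = lam * K_local + (1 - lam) * K_global, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * rbf_kernel(x, y, gamma) + (1.0 - lam) * poly_kernel(x, y, c, d)
```

Setting `lam=1.0` recovers the pure RBF kernel and `lam=0.0` the pure polynomial kernel, so the mixing weight controls how much the classifier relies on local versus global similarity.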

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011318226.8A CN112396113A (en) 2020-11-23 2020-11-23 Two-stage selection method for operation mode data characteristics of power system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011318226.8A CN112396113A (en) 2020-11-23 2020-11-23 Two-stage selection method for operation mode data characteristics of power system

Publications (1)

Publication Number Publication Date
CN112396113A true CN112396113A (en) 2021-02-23

Family

ID=74606851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011318226.8A Pending CN112396113A (en) 2020-11-23 2020-11-23 Two-stage selection method for operation mode data characteristics of power system

Country Status (1)

Country Link
CN (1) CN112396113A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609760A (en) * 2017-08-30 2018-01-19 Tsinghua University Key feature selection method and device for power system
CN107992722A (en) * 2017-11-07 2018-05-04 Dalian University of Technology Feature selection method based on symmetrical uncertainty and information interaction gain
WO2019090878A1 (en) * 2017-11-09 2019-05-16 Hefei University of Technology Analog circuit fault diagnosis method based on vector-valued regularized kernel function approximation
US20200271720A1 (en) * 2020-05-09 2020-08-27 Hefei University Of Technology Method for diagnosing analog circuit fault based on vector-valued regularized kernel function approximation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
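Xu Xialing et al.: "Research on a key stability feature screening method for power grids considering feature combination effects", Proceedings of the CSEE (《中国电机工程学报》), vol. 38, no. 8, 20 April 2018 (2018-04-20), pages 2232 - 2238 *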

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021425A (en) * 2021-10-11 2022-02-08 Tsinghua University Power system operation data modeling and feature selection method and device, electronic equipment and storage medium
CN114021425B (en) * 2021-10-11 2024-04-12 Tsinghua University Power system operation data modeling and feature selection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Nguyen et al. Filter based backward elimination in wrapper based PSO for feature selection in classification
CN106897821B (en) Transient evaluation feature selection method and device
Lane et al. Gaussian based particle swarm optimisation and statistical clustering for feature selection
Nguyen et al. PSO and statistical clustering for feature selection: A new representation
Naik et al. Genetic algorithm-aided dynamic fuzzy rule interpolation
CN110781174A (en) Feature engineering modeling method and system using pca and feature intersection
Mo et al. Power transformer fault diagnosis using support vector machine and particle swarm optimization
CN112396113A (en) Two-stage selection method for operation mode data characteristics of power system
Wu et al. Remaining useful life prediction of Lithium-ion batteries based on PSO-RF algorithm
CN114186862A (en) Entropy weight TOPSIS model-based double-layer energy performance evaluation system
CN109074348A (en) For being iterated the equipment and alternative manner of cluster to input data set
CN116628136A (en) Collaborative query processing method, system and electronic equipment based on declarative reasoning
CN115713032A (en) Power grid prevention control method, device, equipment and medium
CN116225752A (en) Fault root cause analysis method and system for micro-service system based on fault mode library
CN115936303A (en) Transient voltage safety analysis method based on machine learning model
Erhart et al. Constructing Local Bases for a Deep Variational Quantum Eigensolver for Molecular Systems
CN111814394B (en) Power system safety assessment method based on correlation and redundancy detection
CN109713665B (en) Minimum collision set algorithm suitable for multiple multiphase faults of power distribution network
Sagar et al. Error Evaluation on K-Means and Hierarchical Clustering with Effect of Distance Functions for Iris Dataset
Zarif et al. Improving performance of multi-label classification using ensemble of feature selection and outlier detection
Li et al. Prediction of pareto dominance using an attribute tendency model for expensive multi-objective optimization
Saha et al. Optimized Decision Tree-based Early Phase Software Dependability Analysis in Uncertain Environment
Drias et al. Swarm intelligence with clustering for solving SAT
Wang et al. Transient Stability Evaluation of Power System based on Neighborhood Rough Set and Extreme Learning Machine [J]
Lu et al. Power System Transient Stability Assessment Based on Graph Convolutional Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination