CN116031879A - A hybrid intelligent feature selection method for power system transient voltage stability assessment

Info

Publication number: CN116031879A
Application number: CN202310174560.8A
Authority: CN
Inventors: 王渝红; 朱玲俐; 郑宗生; 周旭; 李晨鑫; 史云翔; 何其多; 陈明雪
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2023-02-28
Filing date: 2023-02-28
Publication date: 2023-04-28

Abstract

The invention discloses a hybrid intelligent feature selection method for transient voltage stability assessment of a power system. The method mainly comprises the following steps. S1, sample generation: a high-dimensional time-series sample data set is obtained through analysis of stability-related factors, construction of the original feature set, and multi-scenario transient time-domain simulation. S2, feature validity measurement and preliminary screening based on the T-Relief algorithm. S3, feature selection and stability assessment based on an improved swarm intelligence algorithm: the feature validity metric obtained in step S2 is used to enhance the search performance of the swarm intelligence optimization algorithm, yielding an improved swarm intelligence optimization algorithm; a wrapper feature selection scheme is built on this algorithm with a ConvGRU assessment model embedded as the subset evaluator, so that the feature subset is further optimized on the basis of the preliminary screening in step S2 and feature redundancy is reduced. The method processes time-series features directly and improves search efficiency while maintaining the quality of the selected feature subset.

Description

A hybrid intelligent feature selection method for transient voltage stability assessment of power systems

Technical Field

The invention relates to the technical field of power systems, and in particular to a hybrid intelligent feature selection method for transient voltage stability assessment of power systems.

Background Art

The large-scale grid integration of renewable energy and the commissioning of multiple UHVDC transmission projects have given the new-type power system pronounced "double high, one low" characteristics and "hollowed-out" load centers, and have intensified source-load uncertainty. The high penetration of power electronics has profoundly changed the dynamic characteristics of the grid and reduced its fault tolerance, which raises the risk of transient voltage instability. With the maturing of big data theory and artificial intelligence technology and the rapid spread of grid measurement devices, response-data-driven artificial intelligence methods offer a new approach to fast transient voltage stability assessment. However, a real power system contains a large number of components of many kinds, the grid is very large, and the electrical features form high-dimensional data; redundancy among features can greatly degrade the classification performance and efficiency of the assessment model. In recent years, to fully exploit the information on how key features evolve after a disturbance and thereby improve assessment accuracy, some studies have proposed using dynamic time-series data for transient stability analysis, which further increases the computational burden and the risk of overfitting. Therefore, finding a suitable feature selection scheme to reduce the original feature dimension is a key issue in applying artificial intelligence methods to transient voltage stability assessment of power systems.

In earlier studies, key features were mostly selected manually, that is, variables consistent with prior knowledge were chosen for stability analysis on the basis of expert experience. Many researchers now study feature selection based on data mining, which falls into three main types: filter, embedded, and wrapper methods. Filter methods usually compute the relevance between each feature and the label and keep the features most relevant to the target attribute; typical relevance criteria include Relief statistics, information measures, and Fisher statistics. Embedded methods perform feature selection during the training of the learning model itself; decision trees are a typical example. Wrapper methods combine a search algorithm with a learning model and iteratively search for the feature subset that best suits that model.

Because the fault characteristics of many new dynamic-response devices in today's power systems are unclear, and the mechanism of transient voltage instability is not fully understood, selecting features by manual experience alone no longer suits the characteristics of large, complex grids and easily omits information. Among data-mining-based approaches, filter methods are fast and easy to apply, but their screening criterion is simple, they do not effectively reduce feature redundancy, and they cannot be applied effectively to time-series features. Embedded methods depend too heavily on the learning model and are poorly suited to the requirement of fast transient voltage stability assessment. Wrapper methods select features that give high assessment accuracy, but their computational cost is high and their efficiency needs improvement. Existing feature selection methods therefore still fall short in balancing screening efficiency and screening accuracy.

Summary of the Invention

To address the difficulty of achieving both feature subset screening efficiency and classification performance in current feature selection methods for power system analysis, the present invention provides a hybrid intelligent feature selection method for power system transient voltage stability assessment.

The hybrid intelligent feature selection method for power system transient voltage stability assessment provided by the present invention comprises the following steps:

S1. Sample generation:

A high-dimensional time-series sample data set is obtained through analysis of stability-related factors, construction of the original feature set, and multi-scenario transient time-domain simulation. The time-domain simulation yields transient voltage sample data after the power system is disturbed, and each sample is labeled as stable or unstable according to practical engineering criteria. This produces the training and test data sets for the feature selection and assessment models, that is, the high-dimensional time-series sample data set.

S2. Feature validity measurement and preliminary screening based on the T-Relief algorithm.

The Relief algorithm is computationally efficient but is not directly applicable to selecting time-series input features. It is therefore improved through time-series stratification to obtain the T-Relief algorithm, which is used for the preliminary screening of the original features. On the one hand, it computes the feature classification validity metric that enhances the search performance of the subsequent step S3; on the other hand, it achieves a preliminary dimensionality reduction that lowers the iterative computation cost of step S3. This step includes the following sub-steps:

S21. Assume that the original feature data set contains M samples, each with d feature attributes, recorded at N time points. The M high-dimensional time-series feature samples with N time steps are stratified by time step to form N stratified high-dimensional feature matrices.

S22. For each of the N stratified matrices, calculate the Euclidean distance between each of the M samples and every other sample, and construct the Euclidean distance matrix.

S23. Using the Euclidean distance matrix, find the nearest-neighbor samples of each sample in each layer, and calculate the relevance statistic of each feature at the corresponding time step.

S24. Average the relevance statistics obtained for the different time layers to obtain the validity metric δ of each feature, which integrates the time-series information.

S25. Set a threshold τ and pass the features that satisfy it to the next step. The threshold τ may be, for example, the number of features to retain or a specific value of the relevance statistic.
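Step S25 leaves the form of τ open: it may be a feature count or a cut-off on the metric value. The following is a minimal Python sketch of both readings, assuming the metric values δ are held in a NumPy array. The function names `select_top_k` and `select_above` and the sample values are illustrative and do not come from the patent.

```python
import numpy as np

def select_top_k(delta, k):
    """Indices of the k features with the largest metric values (tau read as a count)."""
    order = np.argsort(delta)[::-1]      # feature indices sorted by descending delta
    return np.sort(order[:k])

def select_above(delta, tau):
    """Indices of the features whose metric value exceeds tau (tau read as a cut-off)."""
    return np.flatnonzero(delta > tau)

delta = np.array([0.82, 0.15, 0.64, 0.58, 0.91])   # made-up metric values
print(select_top_k(delta, 3))    # the three highest-scoring features
print(select_above(delta, 0.6))  # the features scoring above 0.6
```

A count-based τ fixes the cost of the later wrapper stage in advance, whereas a value-based τ lets the number of retained features follow the data.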

S3. Feature selection and stability assessment based on an improved swarm intelligence algorithm.

This step uses the feature validity metric obtained in step S2 to enhance the search performance of the swarm intelligence optimization algorithm (the binary grasshopper optimization algorithm, BGOA), yielding the improved algorithm (IBGOA). A wrapper feature selection scheme is built on this algorithm, with a ConvGRU assessment model embedded as the subset evaluator. The scheme takes into account both the temporal evolution of the features and the way they combine, further optimizes the feature subset on the basis of the preliminary screening in step S2, and reduces feature redundancy.

This step includes the following sub-steps:

S31. Perform feature selection with the improved binary grasshopper optimization algorithm.

The validity metric δ obtained in step S24 is used to enhance the search performance of the binary grasshopper optimization algorithm, yielding the improved binary grasshopper optimization algorithm. In the improved algorithm, the grasshopper position initialization formula and the iterative update formula are as follows:

In the initialization formula, a∈(0,1) is a weight coefficient, r∈[0,1] is a uniformly distributed random number, and Round(·) denotes the round-to-nearest function.

In the iterative update formula, β, η, and γ are weight coefficients, and r∈[0,1] is a uniformly distributed random number.

S32. A multi-dimensional time-series binary classification model with embedded ConvGRU evaluates the classification performance of each feature subset, with the fitness function serving as the comprehensive evaluation index. The method then checks whether the required number of iterations has been reached. If it has, the optimal feature subset is output; if not, feature selection with the improved binary grasshopper optimization algorithm is repeated.
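The control flow of S31 and S32 can be sketched as a fixed-budget wrapper loop. The sketch below is only a skeleton under stated assumptions: `evaluate` stands in for the ConvGRU-based fitness evaluation, and `propose` stands in for the IBGOA position update, whose formulas are not reproduced in this text. The toy fitness and the one-bit-flip proposal are hypothetical placeholders, not the patented search step.

```python
import numpy as np

def wrapper_search(n_features, evaluate, propose, n_iter=50, pop_size=10, seed=0):
    """Fixed-budget wrapper loop: evaluate candidate masks, keep the best, propose new ones."""
    rng = np.random.default_rng(seed)
    population = rng.integers(0, 2, size=(pop_size, n_features))
    best_mask, best_fit = population[0].copy(), evaluate(population[0])
    for _ in range(n_iter):                  # stop once the iteration budget is used up
        for mask in population:
            fit = evaluate(mask)             # subset evaluator (ConvGRU model in the patent)
            if fit < best_fit:               # lower fitness is better
                best_mask, best_fit = mask.copy(), fit
        population = propose(population, best_mask, rng)
    return best_mask, best_fit

# Toy stand-ins, both hypothetical: a synthetic fitness and a one-bit-flip proposal.
target = np.array([1, 0, 1, 1, 0, 0, 1, 0])

def toy_evaluate(mask):
    return int(np.sum(mask != target))       # number of positions that differ from the target

def toy_propose(population, best_mask, rng):
    new = np.tile(best_mask, (len(population), 1))
    flips = rng.integers(0, new.shape[1], size=len(new))
    new[np.arange(len(new)), flips] ^= 1     # flip one random bit in each candidate
    return new

best_mask, best_fit = wrapper_search(8, toy_evaluate, toy_propose)
print(best_mask, best_fit)
```

Passing the evaluator and the proposal rule as arguments keeps the loop independent of the classifier and of the swarm update, which is the property that makes a wrapper scheme reusable across models.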

In this step, ConvGRU is computed as follows:

$$R_t = \sigma(W_{xr} * X_t + W_{hr} * H_{t-1})$$

$$Z_t = \sigma(W_{xz} * X_t + W_{hz} * H_{t-1})$$

where σ and tanh denote activation functions; ⊙ denotes element-wise multiplication; * denotes the convolution operation; $R_t$ is the reset gate; $Z_t$ is the update gate; $X_t$ is the input data at the current time step; $\tilde{H}_t$ is the current candidate value; $H_{t-1}$ is the hidden state at the previous time step; and $W_{xr}$, $W_{hr}$, $W_{xz}$, $W_{hz}$, $W_{xh}$, and $W_{hh}$ are the weight coefficients of the corresponding connections.
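To make the gate equations concrete, the following is a minimal single-channel, one-dimensional ConvGRU step in NumPy. The reset and update gates follow the two equations given above. The candidate state and the final state update are not reproduced in this text, so the sketch assumes a commonly used ConvGRU form for them; the kernel size, the `W` dictionary layout, and the helper names are also illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_same(x, w):
    """1-D convolution whose output has the same length as x."""
    return np.convolve(x, w, mode="same")

def convgru_step(x_t, h_prev, W):
    """One ConvGRU step for a single-channel 1-D input."""
    r_t = sigmoid(conv_same(x_t, W["xr"]) + conv_same(h_prev, W["hr"]))  # reset gate R_t
    z_t = sigmoid(conv_same(x_t, W["xz"]) + conv_same(h_prev, W["hz"]))  # update gate Z_t
    # Assumed candidate and state update (a common ConvGRU form, not given in the text).
    h_cand = np.tanh(conv_same(x_t, W["xh"]) + r_t * conv_same(h_prev, W["hh"]))
    return (1.0 - z_t) * h_cand + z_t * h_prev

rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.5, size=3) for k in ("xr", "hr", "xz", "hz", "xh", "hh")}
h = np.zeros(8)
for x_t in rng.normal(size=(5, 8)):   # 5 time steps, 8 feature positions
    h = convgru_step(x_t, h, W)
print(h.shape)
```

Replacing the matrix products of a standard GRU with convolutions lets the gates share weights across neighboring feature positions, which is what allows the unit to capture local structure in multi-dimensional time-series input.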

The fitness function used is as follows:

where β∈(0,1) is the weight coefficient, |R| is the dimension of the candidate feature subset, |N| is the dimension of the original feature set, and $P_c$ is the comprehensive misjudgment rate index.

The comprehensive misjudgment rate index $P_c$ is calculated as follows:

$$P_c = \alpha P_{fs} + (1-\alpha) P_{fus}$$

Here $P_{fs}$ is the missed-detection rate and $P_{fus}$ is the false-alarm rate. In these formulas, α is the weight coefficient; $F_s$ is the number of samples incorrectly judged as stable (missed detections); $F_{us}$ is the number of samples incorrectly judged as unstable (false alarms); $T_s$ is the number of samples correctly judged as stable; and $T_{us}$ is the number of samples correctly judged as unstable.
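The weighted combination can be sketched in Python from the four counts. The definitions of the two rates are not reproduced in this text, so the sketch assumes that each rate is the corresponding error count divided by the total number of samples; a per-class denominator is an equally plausible reading. The function name and the example counts are illustrative.

```python
def composite_misjudgment_rate(f_s, f_us, t_s, t_us, alpha):
    """P_c = alpha * P_fs + (1 - alpha) * P_fus, with assumed rate definitions."""
    total = f_s + f_us + t_s + t_us
    p_fs = f_s / total      # assumed: missed detections over all samples
    p_fus = f_us / total    # assumed: false alarms over all samples
    return alpha * p_fs + (1 - alpha) * p_fus

# Made-up counts: 3 missed detections and 7 false alarms out of 100 samples.
p_c = composite_misjudgment_rate(f_s=3, f_us=7, t_s=60, t_us=30, alpha=0.5)
print(p_c)
```

Raising α makes a missed detection (an unstable case reported as stable) cost more than a false alarm, which matches the usual operational preference in stability assessment.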

Compared with the prior art, the present invention has the following advantages:

(1) The T-Relief algorithm, obtained by adapting the Relief algorithm to time series, is suitable for measuring the validity of high-dimensional time-series features and retains the computational efficiency of Relief.

(2) The feature validity metric is incorporated into the initialization and iterative update formulas of the swarm intelligence optimization algorithm, which improves the efficiency and performance of wrapper feature selection based on this algorithm.

(3) The hybrid intelligent feature selection method based on two-stage screening effectively reduces feature dimensionality. The preliminary screening stage provides the feature validity metric and a first dimensionality reduction, which improves the efficiency of the subsequent subset optimization. The wrapper stage, based on the improved swarm intelligence optimization algorithm with the embedded ConvGRU time-series assessment model, takes the time-varying behavior of the features into account and achieves higher classification accuracy. Compared with other commonly used feature selection methods, the feature subset selected by the method of the present invention gives better classification performance.

Other advantages, objectives, and features of the present invention will in part be set out in the following description, and in part will be understood by those skilled in the art through study and practice of the present invention.

Brief Description of the Drawings

Figure 1 is a block diagram of the hybrid intelligent feature selection algorithm.

Figure 2 is a flow chart of the improved T-Relief algorithm of the present invention.

Figure 3 is a schematic diagram of the ConvGRU unit structure.

Figure 4 compares the iterative convergence curves of IBGOA and BGOA.

Detailed Description

The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the present invention and do not limit it.

As shown in Figure 1, the hybrid intelligent feature selection method for power system transient voltage stability assessment provided by the present invention mainly includes three parts: sample generation; feature validity measurement and preliminary screening based on the T-Relief algorithm; and feature subset optimization and stability assessment based on the swarm intelligence optimization algorithm.

The principle of the T-Relief algorithm is as follows:

The Relief algorithm is a filter feature selection method proposed for binary classification problems. It calculates a relevance statistic from each feature's ability to distinguish between nearby samples, and uses this statistic to evaluate the validity of the different features. The specific steps are as follows:

(1) Using the Euclidean distance, the same-class sample closest to a given sample is taken as its "near-hit", and the different-class sample closest to it is taken as its "near-miss". The relevance statistic of each feature is then calculated from the selected near-hit and near-miss. Finally, the relevance statistics of all samples in the training set are summed to give the overall feature importance measure. The calculation formula is as follows:

where the superscript j denotes the j-th feature of the corresponding sample, δ is the list of relevance statistics of all features, $x_i$ is the sample currently being processed, $x_{i,nh}$ is the near-hit of the current sample, and $x_{i,nm}$ is the near-miss of the current sample. Because power system response data are continuous variables, the difference term in the formula takes the following value:

(2) Sort the relevance statistics of the features, select a suitable threshold, and use the features of higher importance for the classification task.
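The formulas referred to in step (1) are not reproduced in this text. For orientation only, the widely used textbook form of the Relief statistic for continuous features normalized to [0,1] is given below in the notation defined above; that the patent uses exactly this form is an assumption, and diff(·) is the name conventionally given to the difference term.

```latex
\delta^{j} = \sum_{i} \left[ -\operatorname{diff}\!\left(x_i^{j},\, x_{i,nh}^{j}\right)^{2}
                            + \operatorname{diff}\!\left(x_i^{j},\, x_{i,nm}^{j}\right)^{2} \right],
\qquad
\operatorname{diff}\!\left(x_a^{j},\, x_b^{j}\right) = \left| x_a^{j} - x_b^{j} \right|
```

Under this form a feature scores highly when a sample lies close to its near-hit and far from its near-miss along that feature, that is, when the feature separates the two classes locally.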

In the method of the present invention, the Relief algorithm is adapted to time series through time-series stratification, giving the T-Relief algorithm, which can screen high-dimensional time-series features directly. Assume that the original feature data set contains M samples, each with d feature attributes, recorded at N time points. The computation flow of the T-Relief algorithm is shown in Figure 2 and proceeds as follows:

① Stratify the M high-dimensional time-series feature samples with N time steps by time step to form N stratified high-dimensional feature matrices.

② For each of the N stratified matrices, calculate the Euclidean distance between each of the M samples and every other sample, and construct the Euclidean distance matrix.

③ Using the Euclidean distance matrix, find the nearest-neighbor samples of each sample in each layer, normalize the original feature data, and calculate the relevance statistic of each feature at the corresponding time step. The calculation formula is as follows:

④ Average the relevance statistics obtained for the different time layers to obtain the validity metric of each feature, which integrates the time-series information.

⑤ Set a threshold τ and pass the features that satisfy it to the next screening stage. The threshold τ in this method can be the number of features to retain or a specific value of the relevance statistic.
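Steps ① to ④ can be sketched in NumPy as follows. The sketch assumes the data are stored as an array of shape (M, N, d), uses min-max normalization within each time layer, and uses the squared-difference Relief statistic; the per-layer formula is not reproduced in this text, so these choices, the order of normalization and distance computation, and the function names are assumptions. The toy data are synthetic.

```python
import numpy as np

def relief_layer(X, y):
    """Relief statistics for one time layer. X: (M, d) scaled to [0, 1]; y: (M,) binary labels."""
    M, d = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # Euclidean distance matrix
    np.fill_diagonal(dist, np.inf)                                 # exclude the sample itself
    delta = np.zeros(d)
    for i in range(M):
        same = y == y[i]
        hit = np.argmin(np.where(same, dist[i], np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist[i], np.inf))  # nearest other-class sample
        delta += (X[i] - X[miss]) ** 2 - (X[i] - X[hit]) ** 2
    return delta

def t_relief(data, y):
    """data: (M, N, d) = samples x time steps x features. Returns the (d,) averaged metric."""
    M, N, d = data.shape
    per_layer = np.zeros((N, d))
    for t in range(N):                                   # one Relief pass per time layer
        layer = data[:, t, :]
        lo, hi = layer.min(axis=0), layer.max(axis=0)
        layer = (layer - lo) / np.where(hi > lo, hi - lo, 1.0)   # min-max normalization
        per_layer[t] = relief_layer(layer, y)
    return per_layer.mean(axis=0)                        # average over the N layers

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
informative = y[:, None] + 0.01 * rng.normal(size=(20, 4))  # tracks the label at every step
noise = rng.random(size=(20, 4))                             # unrelated to the label
data = np.stack([informative, noise], axis=2)                # shape (20, 4, 2)
scores = t_relief(data, y)
print(scores)
```

In this toy case the label-tracking feature receives a much larger score than the noise feature, which is the ordering that the threshold in step ⑤ relies on.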

The principle of the improved swarm intelligence optimization algorithm is as follows:

Wrapper feature selection based on swarm intelligence is currently an important way of finding an approximate solution for the optimal feature subset. The present invention uses the comparatively robust and reliable binary grasshopper optimization algorithm (BGOA) as the search strategy for wrapper feature selection.

BGOA is a swarm intelligence optimization algorithm that imitates the foraging behavior of grasshoppers and models the social forces between individuals mathematically. The algorithm has a simple structure and stable performance. Its mathematical model is:

$$X_i = S_i + G_i + A_i$$

where the subscript i denotes the i-th grasshopper in the population; $X_i$ is its position; $S_i$ is the social force exerted on it by the other agents; $G_i$ is the gravity acting on it; and $A_i$ is the wind force acting on it. Gravity and wind are often neglected in practice. $S_i$ is calculated as follows:

where $d_{ij} = |x_i - x_j|$ is the distance between two grasshoppers, and s(·) is the social force strength function, for which positive values indicate attraction and negative values indicate repulsion. It is calculated as follows:

where f is the attraction intensity parameter and l is the attraction range parameter, with f = 0.5 and l = 1.5; r is simply the argument of s(·) and stands for the distance $d_{ij}$ in the formula above.
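The social force strength function is not reproduced in this text. The sketch below assumes the form used in the standard grasshopper optimization algorithm, s(r) = f·exp(−r/l) − exp(−r), with the parameter values f = 0.5 and l = 1.5 stated above. The function name is illustrative.

```python
import numpy as np

def social_force(r, f=0.5, l=1.5):
    """Assumed standard GOA form: s(r) = f * exp(-r / l) - exp(-r)."""
    return f * np.exp(-r / l) - np.exp(-r)

print(social_force(0.0))               # negative: repulsion at zero distance
print(social_force(3.0))               # positive: attraction at larger distance
print(social_force(3.0 * np.log(2)))   # approximately zero: neither attraction nor repulsion
```

With these parameter values the force changes sign at r = 3·ln 2 ≈ 2.08: closer grasshoppers repel each other and more distant ones attract, which keeps the swarm spread out while it still converges.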

In the feature selection problem, the position vector $X_i$ of BGOA is a binary sequence of length D that records which of the D features are selected: $X_i^d = 1$ means that the d-th feature is selected, and $X_i^d = 0$ means that it is not. Taking the social force into account, the step vector for updating a grasshopper's position is defined as $\Delta x_i$, and an S-shaped transfer function converts it into a selection probability. The calculation is as follows:

where $u^d$ and $l^d$ are the upper and lower limits of the feature value; t denotes the iteration round; T(·) is the transfer function; and c is the adaptive adjustment coefficient, which is adjusted as follows:

$$c = c_{\max} - t\,\frac{c_{\max} - c_{\min}}{L}$$

where $c_{\max}$ and $c_{\min}$ are the maximum and minimum adjustment coefficients, and L is the maximum number of iterations.
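The linear schedule for c and the probabilistic binarization can be sketched as follows. The schedule is the formula given above. The transfer function is not reproduced in this text, so the sketch assumes the logistic sigmoid as the S-shaped function and sets a bit to 1 when a uniform random number falls below the resulting probability; the default values of `c_max` and `c_min` and the function names are illustrative.

```python
import numpy as np

def adjust_coefficient(t, L, c_max=1.0, c_min=1e-5):
    """c = c_max - t * (c_max - c_min) / L, decreasing linearly with the iteration t."""
    return c_max - t * (c_max - c_min) / L

def binarize(step, rng):
    """Map a real-valued step vector to 0/1 positions through an S-shaped transfer function."""
    prob = 1.0 / (1.0 + np.exp(-step))                     # assumed logistic sigmoid
    return (rng.random(step.shape) < prob).astype(int)     # select with probability prob

rng = np.random.default_rng(0)
print(adjust_coefficient(0, 50), adjust_coefficient(25, 50), adjust_coefficient(50, 50))
print(binarize(np.array([-2.0, 0.0, 2.0, 50.0]), rng))
```

A large c early in the run favors exploration, and the shrinking c later narrows the search around the best subset found so far.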

In the present invention, so that the feature validity metric from the T-Relief stage guides the iterative update of the grasshoppers, the grasshopper position initialization formula and the iterative update formula are improved as follows:

In the initialization formula, a∈(0,1) is a weight coefficient, r∈[0,1] is a uniformly distributed random number, and Round(·) denotes the round-to-nearest function.

In the iterative update formula, β, η, and γ are weight coefficients, and r∈[0,1] is a uniformly distributed random number.
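The improved formulas themselves are not reproduced in this text. Purely as an illustration of how a validity metric could bias the initial population, the sketch below blends δ with a uniform random number using the weight a and rounds the result. This blending rule is a hypothetical construction built only from the symbols listed above (a, r, Round) and should not be read as the patented formula; it also assumes that δ has been scaled to [0,1].

```python
import numpy as np

def guided_init(delta, pop_size, a=0.8, seed=0):
    """Hypothetical guided initialization: blend the metric with uniform noise, then round."""
    rng = np.random.default_rng(seed)
    r = rng.random((pop_size, delta.size))        # uniform random numbers in [0, 1)
    blended = a * delta + (1 - a) * r             # assumed blending rule, not from the patent
    return np.floor(blended + 0.5).astype(int)    # round half up to 0 or 1

delta = np.array([0.9, 0.1, 0.5])   # made-up validity metrics scaled to [0, 1]
population = guided_init(delta, pop_size=6)
print(population)
```

With a = 0.8 the first feature is always selected and the second never is, while the third is decided by the random term, so highly rated features start out selected in every individual.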

The structure of the ConvGRU unit is shown in Figure 3. A candidate feature subset is evaluated on two aspects: the classification performance of stability assessment using that subset, and the dimension of the subset.

To evaluate classifier performance, the accuracy $P_{acc}$, the missed-detection rate $P_{fs}$, and the false-alarm rate $P_{fus}$ are defined for the transient stability classifier (ConvGRU). A weight coefficient α > 1 is introduced to balance the importance of false alarms and missed detections, and the comprehensive misjudgment rate index $P_c$ is defined to evaluate the classification performance of a feature subset:

$$P_c = \alpha P_{fs} + (1-\alpha) P_{fus}$$

To maximize classification accuracy and minimize the feature subset dimension, the objective function for stability assessment feature selection (that is, the fitness function optimized by the improved BGOA feature selection algorithm) is defined as:

where β∈(0,1) is the weight coefficient, |R| is the dimension of the candidate feature subset, and |N| is the dimension of the original feature set.
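The objective function is not reproduced in this text. A common form for wrapper feature selection that uses exactly the symbols defined above is a weighted sum of the error index and the relative subset size, β·P_c + (1 − β)·|R|/|N|. The sketch below assumes that form; the function name and the default β are illustrative.

```python
def fitness(p_c, n_selected, n_original, beta=0.9):
    """Assumed form: beta * P_c + (1 - beta) * |R| / |N|; lower values are better."""
    return beta * p_c + (1 - beta) * n_selected / n_original

# At equal misjudgment rate, the smaller subset gets the lower (better) fitness.
print(fitness(0.05, 18, 370))
print(fitness(0.05, 131, 370))
```

A β close to 1 makes classification quality dominate, so the size term mainly breaks ties between subsets that classify about equally well.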

Performance comparison and analysis:

(1) Performance comparison between the improved BGOA (IBGOA) and the original BGOA

To verify that the improved BGOA (IBGOA) of the present invention performs better during iterative subset optimization, a comparative experiment was carried out. Experimental setup: BGOA and IBGOA were each used to screen the original feature set with 50 iterations, and the fitness at each iteration was averaged over multiple trials. The resulting fitness convergence curves are shown in Figure 4. The figure shows that IBGOA searches for the subset more efficiently and reaches a lower fitness value for the optimal subset, which demonstrates the effectiveness of the improvement to BGOA.

(2) Comparison of transient voltage stability assessment before and after feature selection

The effect of feature selection is compared in terms of stability assessment accuracy, missed-detection rate, false-alarm rate, feature dimension, and model training time.

Experimental details: starting from the original 370 features, T-Relief retained the 131 features whose validity metric exceeded 0.6; iterative optimization with IBGOA then retained 18 key features (averaged over multiple trials of 50 iterations each). The results are shown in Table 1. After feature selection with the method of the present invention, only 18 key features remain and the model training time is reduced by 76.41%. As model complexity grows and the number of samples increases sharply, this greatly eases model training. In addition, although the feature dimension is greatly reduced, the assessment accuracy rises by 2.73%, and the false-alarm rate and the missed-detection rate each fall by 1.365%. This shows that the proposed feature selection method effectively removes ineffective features from the high-dimensional feature set and thereby improves the accuracy of the transient voltage stability assessment model.
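The size of the reduction at each stage follows directly from the feature counts reported above. A short Python check, using only those three counts (the variable names are illustrative):

```python
n_original, n_after_t_relief, n_final = 370, 131, 18

kept_after_screening = 100 * n_after_t_relief / n_original   # share kept by T-Relief
kept_final = 100 * n_final / n_original                      # share kept overall
kept_by_ibgoa = 100 * n_final / n_after_t_relief             # share of screened features kept by IBGOA

print(f"{kept_after_screening:.1f}% {kept_final:.2f}% {kept_by_ibgoa:.1f}%")
# prints 35.4% 4.86% 13.7%
```

T-Relief therefore removes roughly two thirds of the features before the wrapper stage starts, and the final subset is under 5% of the original dimension.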

Table 1. Comparison of stability assessment results before and after feature selection

The above is only a preferred embodiment of the present invention and does not limit the present invention in any form. Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the technical content disclosed above to make minor changes or modifications that result in equivalent embodiments. Any simple modification, equivalent change, or refinement made to the above embodiment according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the scope of the technical solution of the present invention.

Claims

1. A hybrid intelligent feature selection method suitable for transient voltage stability evaluation of a power system is characterized by comprising the following steps:

s1, sample generation:

the method comprises the steps of obtaining a high-dimensional time sequence sample data set through stability related factor analysis, original feature set construction and multi-scene transient time domain simulation;

S2, feature effectiveness measurement and preliminary screening based on the T-Relief algorithm, comprising the following substeps:

S21, assuming that the original feature data set contains M samples, each sample has d feature attributes, and the data are recorded at N time points; layering the M high-dimensional time-series samples by time step to form N layered high-dimensional feature matrices;

S22, for each of the N layered high-dimensional feature matrices, calculating the Euclidean distance between each of the M samples and every other sample, and constructing a Euclidean distance matrix;

S23, searching for the neighbor samples of each sample in each layered high-dimensional feature matrix according to the Euclidean distance matrix, and calculating the relevance statistic of each feature at the corresponding time point;

S24, averaging the relevance statistics obtained for the different time layers to obtain, for each feature, an effectiveness metric $\delta$ that integrates the time-series information;

S25, setting a threshold $\tau$, retaining the effective features that satisfy the threshold condition, and proceeding to the next step;

S3, feature selection and stability assessment based on an improved swarm intelligence algorithm, comprising the following substeps:

S31, performing feature selection with an improved binary locust optimization algorithm;

the search performance of the binary locust optimization algorithm is enhanced by means of the effectiveness metric $\delta$ obtained in step S24, yielding the improved binary locust optimization algorithm; in the improved binary locust optimization algorithm, the improved locust position initialization formula and iterative update formula are as follows:

where $a \in (0, 1)$ is a weight coefficient, $r \in [0, 1]$ is a uniformly distributed random number, and $\mathrm{round}(\cdot)$ denotes the rounding function;

where $\beta$, $\eta$ and $\gamma$ are weight coefficients, and $r \in [0, 1]$ is a uniformly distributed random number;

S32, evaluating the classification performance of each feature subset with a multi-dimensional time-series classification model that embeds ConvGRU, using a fitness function as the comprehensive evaluation index; judging whether the iteration-count requirement is met; if it is met, outputting the optimal feature subset; if it is not met, performing feature selection again with the improved binary locust optimization algorithm.
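Steps S21 to S25 of claim 1 can be illustrated with a minimal Python sketch. The claim does not state how the relevance statistic is computed, so the sketch assumes the classic Relief statistic (squared difference to the nearest miss minus squared difference to the nearest hit), binary stability labels, features already scaled to comparable ranges, and at least two samples per class. The names `t_relief` and `screen_features` are hypothetical and do not come from the patent.

```python
import numpy as np

def t_relief(X, y):
    """Time-layered Relief sketch (assumed statistic, see lead-in).

    X: array of shape (M, N, d) with M samples, N time steps, d features.
    y: array of shape (M,) with binary stability labels.
    Returns delta of shape (d,): relevance averaged over the N time layers.
    """
    M, N, d = X.shape
    same = y[:, None] == y[None, :]        # (M, M) same-class mask
    not_self = ~np.eye(M, dtype=bool)      # exclude each sample from its own hits
    delta = np.zeros(d)
    for t in range(N):
        layer = X[:, t, :]                 # S21: (M, d) matrix for time layer t
        # S22: pairwise Euclidean distance matrix for this layer
        diff = layer[:, None, :] - layer[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        # S23: nearest hit (same class) and nearest miss (other class)
        hit = np.where(same & not_self, dist, np.inf).argmin(axis=1)
        miss = np.where(~same, dist, np.inf).argmin(axis=1)
        # S23: per-feature relevance statistic (classic Relief, an assumption)
        stat = (layer - layer[miss]) ** 2 - (layer - layer[hit]) ** 2
        delta += stat.mean(axis=0)
    return delta / N                       # S24: average over time layers

def screen_features(delta, tau):
    """S25: indices of features whose metric exceeds the threshold tau."""
    return np.flatnonzero(delta > tau)
```

Averaging per-layer statistics keeps the temporal structure in play without flattening the N time steps into one long vector, which is the point of the time layering in S21.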

2. The hybrid intelligent feature selection method suitable for transient voltage stability assessment of a power system according to claim 1, wherein in step S1, transient voltage sample data following a power system disturbance are obtained through time-domain simulation, and the stability of each sample is labeled according to practical engineering criteria, so as to construct the training and testing data sets for the feature selection and assessment models, namely the high-dimensional time-series sample data set.

3. The hybrid intelligent feature selection method suitable for transient voltage stability assessment of a power system according to claim 1, wherein in step S32, the calculation formulas of ConvGRU are as follows:

$$R_t = \sigma(W_{xr} * X_t + W_{hr} * H_{t-1})$$

$$Z_t = \sigma(W_{xz} * X_t + W_{hz} * H_{t-1})$$

where $\sigma$ and $\tanh$ denote activation functions; $\odot$ denotes the element-wise (Hadamard) product; $*$ denotes the convolution operation; $R_t$ denotes the reset gate; $Z_t$ denotes the update gate; $X_t$ denotes the input data at the current time; $\tilde{H}_t$ denotes the current candidate value; $H_{t-1}$ denotes the hidden state at the previous time; and $W_{xr}$, $W_{hr}$, $W_{xz}$, $W_{hz}$, $W_{xh}$ and $W_{hh}$ denote the weight coefficients of the corresponding connections.
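The two gate equations of claim 3 can be exercised in a small NumPy sketch. The candidate-state and hidden-state update equations are not reproduced in this text, so the sketch assumes a widely used GRU form, $\tilde{H}_t = \tanh(W_{xh} * X_t + W_{hh} * (R_t \odot H_{t-1}))$ and $H_t = (1 - Z_t) \odot H_{t-1} + Z_t \odot \tilde{H}_t$, which may differ from the patent's own. It also assumes a single-channel 1-D convolution along the feature axis with no bias terms. The names `convgru_step` and `convgru_run` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv(w, x):
    """1-D convolution of signal x with kernel w, output length equal to len(x)."""
    return np.convolve(x, w, mode="same")

def convgru_step(x_t, h_prev, W):
    """One ConvGRU step for 1-D inputs.

    x_t, h_prev: arrays of shape (d,).
    W: dict of 1-D kernels with keys "xr", "hr", "xz", "hz", "xh", "hh".
    """
    # Gate equations as given in claim 3
    r_t = sigmoid(conv(W["xr"], x_t) + conv(W["hr"], h_prev))  # reset gate
    z_t = sigmoid(conv(W["xz"], x_t) + conv(W["hz"], h_prev))  # update gate
    # Assumed candidate state and state update (not given in the source text)
    h_cand = np.tanh(conv(W["xh"], x_t) + conv(W["hh"], r_t * h_prev))
    return (1.0 - z_t) * h_prev + z_t * h_cand

def convgru_run(X, W):
    """Run the cell over a sequence X of shape (N, d), starting from a zero state."""
    h = np.zeros(X.shape[1])
    for x_t in X:
        h = convgru_step(x_t, h, W)
    return h
```

Because the convolution kernels are shared across positions, the number of weights depends on the kernel length rather than on the feature dimension d, unlike a fully connected GRU.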

4. The hybrid intelligent feature selection method suitable for transient voltage stability assessment of a power system according to claim 3, wherein in step S32, the fitness function employed is as follows:

where $\beta \in (0, 1)$ is a weight coefficient, $|R|$ is the dimension of the corresponding feature subset, $|N|$ is the dimension of the original feature set, and $P_c$ is the comprehensive misjudgment rate index;

$P_c$ is calculated as follows:

$$P_c = \alpha P_{fs} + (1 - \alpha) P_{fus}$$

where $\alpha$ is a weight coefficient;

the miss rate $P_{fs}$ is calculated as follows:

the false positive rate $P_{fus}$ is calculated as follows:

where $F_s$ denotes the number of samples erroneously determined to be stable; $F_{us}$ denotes the number of samples erroneously determined to be unstable; $T_s$ denotes the number of samples correctly determined to be stable; and $T_{us}$ denotes the number of samples correctly determined to be unstable.
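The indices of claim 4 can be computed as in the following sketch. Only $P_c = \alpha P_{fs} + (1 - \alpha) P_{fus}$ is given in this text. The forms used here for $P_{fs}$, $P_{fus}$ and the fitness function are assumptions based on common practice: each rate is taken over the samples of the true class concerned, and the fitness trades off $P_c$ against the subset-size ratio $|R|/|N|$. All function names are hypothetical.

```python
def misjudgment_rates(t_s, t_us, f_s, f_us):
    """Return (P_fs, P_fus) from the four confusion counts.

    Assumed definitions (formulas not present in the source text):
      P_fs  = F_s  / (F_s  + T_us)  share of truly unstable samples judged stable
      P_fus = F_us / (F_us + T_s)   share of truly stable samples judged unstable
    Both classes are assumed to be non-empty.
    """
    p_fs = f_s / (f_s + t_us)
    p_fus = f_us / (f_us + t_s)
    return p_fs, p_fus

def comprehensive_rate(p_fs, p_fus, alpha):
    """P_c = alpha * P_fs + (1 - alpha) * P_fus, as stated in claim 4."""
    return alpha * p_fs + (1.0 - alpha) * p_fus

def fitness(p_c, n_selected, n_original, beta):
    """Assumed fitness: beta * P_c + (1 - beta) * |R| / |N| (lower is better)."""
    return beta * p_c + (1.0 - beta) * n_selected / n_original

# Usage with illustrative counts (not taken from the patent)
p_fs, p_fus = misjudgment_rates(t_s=90, t_us=45, f_s=5, f_us=10)
p_c = comprehensive_rate(p_fs, p_fus, alpha=0.7)
score = fitness(p_c, n_selected=20, n_original=100, beta=0.9)
```

Choosing $\alpha > 0.5$ penalizes missed instability more heavily than false alarms, which is a common preference in stability assessment, since judging an unstable case as stable is usually the costlier error.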