CN117332322A

CN117332322A - Boundary region importance sampling method

Info

Publication number: CN117332322A
Application number: CN202311072657.4A
Authority: CN
Inventors: 刘颂凯; 陈浩; 刘峻良; 晏光辉; 张涛; 李文武; 李欣; 郭攀锋; 刁良涛; 江进波; 曹成; 王丰; 李丹
Original assignee: China Three Gorges University CTGU
Current assignee: China Three Gorges University CTGU
Priority date: 2020-04-16
Filing date: 2020-04-16
Publication date: 2024-01-02
Also published as: CN111401476B; CN111401476A

Abstract

A boundary region importance sampling method, comprising the steps of: step (1) determining a boundary region using information entropy; step (2) using a sampling method based on a Monte Carlo variance reduction technology (MCVR), constructing effective sampling, introducing deviation in the sampling process, so that the characterization of rare events in an evaluation stage is increased; through the steps, the offline training sample set is efficiently generated.

Description

Boundary region importance sampling method

Technical Field

The invention relates to the field of transient security assessment of power systems, in particular to a boundary area importance sampling method, which is a divisional application of an invention patent with the name of a transient security assessment method (application number 2020103010104) based on boundary area importance sampling and a kernel vector machine.

Background

On the one hand, due to the environment of the interconnected power systems, a large number of devices, such as smart meters and new energy sources, have been connected to the grid. As the scale of the power grid continues to expand, the complexity of the power grid also continues to increase, which presents a significant challenge for safe operation of the power grid. Meanwhile, with the continuous development of national economy and society, the requirements on the safe and stable operation and the power supply reliability of the power system are also higher and higher. The reform of the electric power system in China is deepened continuously, and the electric power system is developed in the direction of long distance and extra-high voltage. The novel loads such as various large-scale energy storage elements, electric automobile charging piles and the like are continuously connected, and the trans-regional high-capacity tie line power transmission system is gradually put into operation, so that the stability and the scheduling operation of the power system face serious challenges. In order to avoid huge economic loss and social influence caused by power outage in the whole country, the transient stability assessment of the power system plays an important role in analysis and judgment of the dynamic behavior of the system.

On the other hand, time-domain simulation, direct methods (including the Lyapunov method and the transient energy function method) and the extended equal-area criterion are the mainstream methods for transient stability assessment of power systems. These methods can provide real-time or near-real-time assessment of transient stability, but leave room for improvement in computational accuracy, speed and capacity. Time-domain simulation is slow, cannot provide a stability margin, and is difficult to apply to real-time online analysis. The direct methods and the extended equal-area criterion can obtain the stability margin of the system, but are limited to simple models and cannot fully meet the requirements of online calculation. Although phasor measurement units are being deployed ever more widely in power systems, these methods cannot exploit the large volume of phasor measurement unit data for real-time online calculation. Meanwhile, with the maturation of wide-area measurement technology and the development of big data theory, machine learning has become one of the main approaches to online stability assessment of power systems. However, conventional machine learning methods still have a number of drawbacks: generating the training sample set is inefficient; the assessment results are hard to evaluate; the transient security information cannot be visualized; and training takes too long to suit large-scale data. In addition, unforeseen contingencies often occur in a power system in actual operation, and conventional transient security assessment models have difficulty assessing them.

In summary, the conventional method is difficult to adapt to the practical requirement of the modern power grid with high-speed development on real-time transient security evaluation, and a real-time evaluation method capable of meeting high adaptability and high precision is urgently needed.

The patent document with grant publication number CN104881741A discloses a power system transient stability judging method based on a support vector machine (SVM), in which the input feature set of the SVM is determined by round-by-round optimization and a transient stability assessment rule is then established through the SVM. First, the candidate set of input vectors, the number of input vector elements, the kernel function of the SVM and the training parameters are determined. Training and test samples are then generated, each candidate feature is added to the input feature set in turn, the SVM is trained, and the feature giving the highest SVM classification accuracy is selected. The method then judges whether feature selection is complete, outputs the input feature set, and finally trains the SVM to obtain the stability rule. However, this method has the following drawbacks:

(1) A power system security assessment method requires a large sample set to train or test its performance, and generating such a sample set is very difficult, even for small-scale power systems. Determining the input feature set of the SVM by round-by-round optimization is also quite time-consuming.

(2) Compared to SVM, the kernel vector machine (Core Vector Machine, CVM) has higher accuracy, lower temporal and spatial complexity, and thus higher efficiency.

Disclosure of Invention

The invention aims to provide a method which is beneficial to improving the evaluation speed and precision, so that the method has extremely strong applicability in the field of transient safety evaluation of a power system, is beneficial to system operators to take preventive control measures in time, and improves the stability of safe operation of a power grid.

The aim of the invention is realized as follows:

the transient state safety evaluation method based on the boundary area importance sampling and the kernel vector machine is characterized by comprising the following steps of:

step one): acquiring a system operation sample by utilizing historical operation data of the power system and analog simulation of a series of faults of the power system, constructing a dynamic safety index and establishing a corresponding sample database;

step two): for the sample database, a boundary area importance sampling method is used for sampling the sample database to efficiently generate an offline training sample set, and standard normalization is carried out on the sample set;

step three): based on the sample set, combining with CVM, constructing a transient security assessment model of the power system, and performing offline training and updating on the model by utilizing the sample set;

step four): based on the real-time operation data of the power system, the continuously updated evaluation model is utilized to complete the evaluation of the real-time transient state safety state of the power system, and a transient state safety evaluation result is obtained.

In the first step), based on historical operation data and an expected accident set of the power system, carrying out detailed power flow analysis and time domain simulation to obtain a system operation sample, and establishing a corresponding sample database.

Time-domain simulation is performed with PSS/E software to obtain the critical clearing time (CCT) of each fault location under each operating state. Typically, when the CCT is greater than the actual clearing time (ACT), the operating state of the system is judged to be safe. A transient safety index, the transient stability margin (TSM), is therefore constructed as shown in equation (1):

wherein: CCT_i is the critical clearing time at a given location of the power system under contingency i; ACT_i is the actual clearing time of the fault under contingency i; TSM_i is the transient stability margin at that location. The definition of TSM is shown in equation (2):
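The images for equations (1) and (2) are not reproduced in this text. The following is a minimal sketch of how such a margin and the resulting security label could be computed. The normalised form (CCT − ACT)/CCT is an assumption; only the rule "CCT greater than ACT means safe" comes from the description. The function names and the sample values are hypothetical.

```python
def transient_stability_margin(cct, act):
    """Assumed normalised margin (CCT - ACT) / CCT; only its sign is grounded in the text."""
    if cct <= 0:
        raise ValueError("CCT must be positive")
    return (cct - act) / cct


def is_secure(cct, act):
    """Label a case as secure (1) when CCT exceeds ACT, otherwise insecure (0)."""
    return 1 if transient_stability_margin(cct, act) > 0 else 0


# Hypothetical (CCT, ACT) pairs in seconds, for illustration only.
cases = [(0.42, 0.20), (0.15, 0.20), (0.30, 0.25)]
labels = [is_secure(cct, act) for cct, act in cases]
print(labels)
```

Because the label depends only on the sign of the margin, any positive normalisation of CCT − ACT would give the same classes.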

in step two), the boundary region importance sampling method used for the established sample database is divided into the following two steps:

1) The information entropy is used to determine the boundary region, as shown in equation (3):

E(S) = −Σ_{i=1}^{C} p_i log_2 p_i    (3)

wherein: S is the sample data set; C is the number of categories; p_i is the proportion of S classified as class i. The concept of entropy provides a measure of the purity of the sample database. The larger the value of E(S), the lower the purity, i.e., the richer the information content; a region where the entropy is relatively large is therefore defined as a boundary region. The boundary region is approximately determined in this way.
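As an illustration of locating a high-entropy region, the sketch below computes the class entropy inside equal-width bins of a single operating variable and picks the bin with the largest value. The binning scheme, the function names and the data are assumptions made for this example; the description does not specify how the state space is partitioned.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels; 0.0 for an empty list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def bin_entropies(xs, labels, n_bins, lo, hi):
    """Entropy of the class labels falling in each of n_bins equal-width bins on [lo, hi]."""
    bins = [[] for _ in range(n_bins)]
    width = (hi - lo) / n_bins
    for x, y in zip(xs, labels):
        idx = min(int((x - lo) / width), n_bins - 1)
        bins[idx].append(y)
    return [entropy(b) for b in bins]


# Hypothetical 1-D operating variable with security labels (1 = secure, 0 = insecure).
xs = [0.05, 0.15, 0.25, 0.35, 0.45, 0.48, 0.52, 0.55, 0.65, 0.75, 0.85, 0.95]
ys = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
ent = bin_entropies(xs, ys, n_bins=5, lo=0.0, hi=1.0)
boundary_bin = max(range(len(ent)), key=ent.__getitem__)
print(ent, boundary_bin)
```

In this toy data set the two classes mix only in the middle bin, so that bin has the highest entropy and would be treated as the approximate boundary region.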

2) An efficient sampling is constructed using a sampling method based on the monte carlo variance reduction (Monte Carlo Variance Reduction, MCVR) technique. A bias is introduced during sampling such that the characterization of rare events in the evaluation phase increases. The boundary region is approximately determined in step 1), and the sampling method based on the MCVR technology is used to bias the sampling process towards the boundary region. Thus, an offline training sample set may be obtained.

Standard normalization is applied to the efficiently generated training sample set to reduce the computational burden, as shown in equation (4):

x̂_i = (x_i − x_{i_min}) / (x_{i_max} − x_{i_min})    (4)

wherein: x̂_i is the value of an operating variable after standard normalization; x_i is the original value of the operating variable; x_{i_min} is the minimum value of the variable in the acquired samples; x_{i_max} is the maximum value of the variable in the acquired samples. In this way the values of all variables range from 0 to 1.
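A minimal sketch of this min-max scaling for one operating variable follows. The handling of a constant column (mapped to 0.0) and the sample values are assumptions for illustration.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; a constant column maps to 0.0 (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical bus-voltage magnitudes in per unit.
voltages = [0.95, 1.00, 1.05, 0.98]
scaled = min_max_normalize(voltages)
print(scaled)
```

In practice the minimum and maximum would be taken from the training samples and reused when scaling real-time measurements, so that both are mapped consistently.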

A power system security assessment method requires a large sample set to train or test its performance. Because historical data usually contain only a limited number of abnormal cases, and information near the boundary region is often missing, simulation data are required. Generating such a sample set is very difficult, even for small-scale power systems. With the boundary region importance sampling method, the sampling process is biased mainly toward the boundary region, so that the secure and insecure regions can be mapped. The resulting sample set is rich in information yet small in volume, which makes the training process faster and the prediction accuracy higher.

In step three), the efficiently generated sample set is input into the training model. Through a feature mapping φ, the CVM projects the sample set S into a high-dimensional space to build a minimum enclosing ball (MEB), and the MEB problem is solved with the CVM algorithm. S_t, c_t and R_t denote the core set, the center of the ball and the radius at iteration t, respectively. The center and radius of a ball B are denoted c_B and r_B. Given a positive number ε, the offline training process is as follows:

1) Initialization of S_0, c_0 and R_0:

Select an arbitrary point z ∈ S. In the feature space, find the point z_a ∈ S furthest from z, then find the point z_b ∈ S furthest from z_a. The initial core set is S_0 = {z_a, z_b}, the initial center is c_0 = (φ(z_a) + φ(z_b))/2, and the initial radius is R_0 = ‖φ(z_a) − φ(z_b)‖/2;

2) If no point φ(z), z ∈ S, falls outside the (1+ε)-ball B(c_t, (1+ε)R_t), the algorithm ends. Otherwise, the core set is updated to S_{t+1} = S_t ∪ {z}, where z is the point whose image φ(z) is furthest from c_t;

3) Searching for the new MEB:

The new MEB(S_{t+1}) is found for the core set given by step 2), and c_{t+1} and R_{t+1} are obtained from the Lagrange multipliers α = [α_1, α_2, ..., α_m]' and the kernel matrix K. The procedure then returns to step 2) for the next iteration.

Through the steps, an offline training model can be obtained.
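The core-set idea behind the steps above can be illustrated with a much simpler procedure. The sketch below is not the CVM algorithm: it works directly in the input space (identity feature map, no kernel) and replaces the quadratic-programming step 3) with the Badoiu-Clarkson update, which repeatedly moves the center toward the current furthest point. All names and data are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def approx_meb(points, eps):
    """Return (center, radius, core_set) of a (1 + eps)-approximate enclosing ball.

    Uses the Badoiu-Clarkson update c <- c + (z - c) / (t + 1), where z is the
    point furthest from the current center, for ceil(1 / eps**2) iterations.
    """
    center = list(points[0])
    core_set = [tuple(points[0])]
    for t in range(1, math.ceil(1.0 / eps ** 2) + 1):
        z = max(points, key=lambda p: dist(p, center))  # furthest point from c_t
        if tuple(z) not in core_set:
            core_set.append(tuple(z))
        center = [c + (zi - c) / (t + 1) for c, zi in zip(center, z)]
    radius = max(dist(p, center) for p in points)
    return center, radius, core_set


# Hypothetical 2-D points; the exact minimum enclosing ball has radius 1.
points = [(1, 0), (-1, 0), (0, 1), (0, -1), (0.2, 0.3), (-0.5, 0.1)]
center, radius, core_set = approx_meb(points, eps=0.1)
print(center, radius, len(core_set))
```

The returned ball encloses every point and its radius is within a factor (1 + ε) of the optimum, while only the few furthest points ever enter the core set. That is the property that lets core-set methods scale to large sample sets.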

A variety of factors that may affect the transient safety state of the power system are comprehensively considered, including topology changes, generator power changes, load power changes, and other operating condition changes. Aiming at the situation, a near-real-time updating sample set is obtained, and the sample set is used for updating the offline training model so as to obtain an updated transient security assessment model.

In the fourth step), the synchronous phasor measurement unit and the wide area monitoring system are utilized to collect the operation variables of the power system in real time, and based on real-time data, the updated transient state safety evaluation model is utilized to predict the transient state safety state of the power system, so that an online transient state safety evaluation result is obtained.

A boundary region importance sampling method, comprising the steps of:

step (1) determining a boundary region using information entropy;

step (2) uses a sampling method based on MCVR techniques to construct an efficient sample, introducing a bias in the sampling process that increases the characterization of rare events during the evaluation phase.

In step (1), the information entropy is used to determine the boundary region, as shown in equation (5):

E(S) = −Σ_{i=1}^{C} p_i log_2 p_i    (5)

wherein: s is a sample data set, C is the number of categories, p _i The proportion in S classified as class i; from the concept of entropy, a measure of purity of the sample database can be obtained; the larger the value of E (S), the lower the purity, i.e., the more abundant the information content, and therefore, a boundary region is defined as a region where the entropy value is relatively large, and the boundary region is roughly determined by this method.

In step (2), an effective sampling is constructed by using a sampling method based on the MCVR technology, deviation is introduced in the sampling process, so that the characterization of rare events in the evaluation stage is increased, the boundary area is roughly determined in step (1), and the sampling process can be biased towards the boundary area by using the sampling method based on the MCVR technology; the method comprises the following two steps:

1) Variance reduction of importance samples:

The probability of an unacceptable event, i.e., P(Y ∈ unacceptable event), is defined as shown in equation (6):

wherein: Y = t represents the threshold and Y < t represents the occurrence of an unacceptable event. The indicator function I(Y) can be defined as shown in equation (7):

equation (6) can thus be defined as shown in equation (8):

The expectation above gives a crude Monte Carlo estimate, where y_i is a Monte Carlo sample drawn from the distribution f(y). This estimate has a variance associated with it, because the value of h(y_i) varies with y_i. The variance of the estimate is reduced by reformulating the expectation as shown in equation (9):

wherein: y_i is a Monte Carlo sample drawn from the distribution g(y), which is chosen so that the value of h(y_i)f(y_i)/g(y_i) is nearly constant across y_i;
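The variance-reduction idea can be sketched numerically. In the example below, f is taken to be a standard normal density, the threshold t = −3 is hypothetical, and the proposal g is a unit-variance normal centered at t. All three are assumptions chosen so that the exact probability is known; none of them comes from the description.

```python
import math
import random


def normal_pdf(y, mean=0.0):
    """Density of a unit-variance normal distribution with the given mean."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def crude_estimate(t, n, rng):
    """Crude Monte Carlo: average the indicator h(y) = 1[y < t] over y ~ f."""
    return sum(1.0 for _ in range(n) if rng.gauss(0.0, 1.0) < t) / n


def importance_estimate(t, n, rng):
    """Importance sampling: draw y ~ g = N(t, 1) and average h(y) f(y) / g(y)."""
    total = 0.0
    for _ in range(n):
        y = rng.gauss(t, 1.0)
        if y < t:
            total += normal_pdf(y) / normal_pdf(y, mean=t)
    return total / n


rng = random.Random(0)
t = -3.0                                       # hypothetical threshold
exact = 0.5 * math.erfc(-t / math.sqrt(2.0))   # P(Y < t) for Y ~ N(0, 1)
print(exact, crude_estimate(t, 20000, rng), importance_estimate(t, 20000, rng))
```

With crude sampling only a handful of the 20000 draws fall below the threshold, so the estimate is noisy. Under the shifted proposal about half of the draws do, and the weights f/g correct for the bias, which gives a much tighter estimate for the same sample size.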

2) Efficient generation of training samples:

The first stage provides the boundary region where X is most likely to occur, and thus determines the region of X-space in which the biased samples are to be generated. In terms of the indicator function, the sampled region is as shown in equation (10):

wherein: S is the boundary region; for example, in the univariate case, S = {x : x_1 ≤ x ≤ x_2}. The sampling distribution function g_X(x) can be constructed as |h(x)| f_X(x), i.e., f_X(x) restricted to S, and the importance sampling density is expressed as shown in equation (11):

wherein: k_1 and k_2 are bias weights satisfying the probability condition k_1 + k_2 = 1; f_1X(x) is the probability density function inside the boundary region, and f_2X(x) is the probability density function outside the boundary region. When k_1 = 1, the sampling distribution function g_X(x) is completely biased toward the boundary region, and the state-space probability distribution conditioned on the boundary region is as shown in equations (12) and (13):

a = ∫_S f_X(x) dx    (13)

wherein: a is a scaling factor that satisfies 0 ≤ a ≤ 1. The above equations change the probability distribution so that more data come from the boundary region, and an offline training sample set is thus obtained.
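A minimal sketch of sampling from such a two-component density follows, assuming a mixture g_X = k_1 f_1X + k_2 f_2X over a single variable. The original density f_X is assumed uniform on [0, 1] and the boundary region S = [x_1, x_2] is hypothetical; under these assumptions the scaling factor a of equation (13) is simply x_2 − x_1.

```python
import random


def sample_biased(n, x1, x2, k1, rng):
    """Draw n points on [0, 1] from g = k1 * f1 + k2 * f2 with k2 = 1 - k1.

    f_X is assumed uniform on [0, 1]; f1 is f_X conditioned on the boundary
    region S = [x1, x2], and f2 is f_X conditioned on the rest of [0, 1].
    """
    a = x2 - x1                      # a = integral of f_X over S for uniform f_X
    samples = []
    for _ in range(n):
        if rng.random() < k1:        # draw from f1: uniform on S
            samples.append(rng.uniform(x1, x2))
        else:                        # draw from f2: uniform on [0, x1) U (x2, 1]
            u = rng.uniform(0.0, 1.0 - a)
            samples.append(u if u < x1 else u + a)
    return samples, a


rng = random.Random(0)
samples, a = sample_biased(10000, x1=0.4, x2=0.6, k1=0.8, rng=rng)
frac_in_s = sum(1 for s in samples if 0.4 <= s <= 0.6) / len(samples)
print(a, frac_in_s)
```

Under the original density only a fraction a of the samples would land in S. With bias k_1 the fraction rises to roughly k_1, and with k_1 = 1 every sample comes from the boundary region.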

By adopting the technical scheme, the following technical effects can be brought:

(1) By using the concept of information entropy, the region with rich information can be judged, so that the boundary region can be roughly determined;

(2) By using the sampling method based on the MCVR technology, the sampling can be biased to the boundary area, and a high-efficiency sample set with rich information and small data volume can be generated, so that the speed of the offline training process can be faster;

(3) Based on the sample set generated efficiently, a transient stability evaluation model of the power system is built by combining CVM, and the evaluation result has higher precision and less use time.

Drawings

The invention is further illustrated by the following examples in conjunction with the accompanying drawings:

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a flow chart of a sampling method according to the present invention;

FIG. 3 is a diagram of an IEEE 39 node system topology in accordance with an embodiment of the invention;

FIG. 4 is a graph comparing data processing speeds for four different models tested by an embodiment of the present invention;

FIG. 5 is a graph comparing accuracy of model evaluation using three different sampling methods tested by an embodiment of the present invention.

Detailed Description

The transient security assessment method based on the boundary region importance sampling and the kernel vector machine, as shown in fig. 1, specifically comprises the following steps:

Time-domain simulation is performed with PSS/E software to obtain the CCT of each fault location under each operating state. Typically, when the CCT is greater than the ACT, the operating state of the system is judged to be safe. A transient safety index, the TSM, is therefore constructed as shown in equation (1):

in step two), a boundary region importance sampling method is used for the established sample database, as shown in fig. 2, specifically including the following steps:

wherein: S is the sample data set, C is the number of categories, and p_i is the proportion of S classified as class i. The concept of entropy provides a measure of the purity of the sample database. The larger the value of E(S), the lower the purity, i.e., the richer the information content; a region where the entropy is relatively large is therefore defined as a boundary region. In this way, the boundary region is approximately determined.

2) An efficient sampling is constructed using a sampling method based on MCVR techniques. A bias is introduced during sampling such that the characterization of rare events in the evaluation phase increases. The boundary region is approximately determined in step 1), and the sampling method based on the MCVR technology is used to bias the sampling process towards the boundary region. The method comprises the following two steps:

1) Variance reduction of importance samples:

wherein: Y = t represents the threshold and Y < t represents the occurrence of an unacceptable event. The indicator function I(Y) can be defined as shown in equation (7):

equation (6) can thus be defined as shown in equation (8):

The expectation above gives a crude Monte Carlo estimate, where y_i is a Monte Carlo sample drawn from the distribution f(y). This estimate has a variance associated with it, because the value of h(y_i) varies with y_i. The variance of the estimate is reduced by reformulating the expectation as shown in equation (9):

wherein: y_i is a Monte Carlo sample drawn from the distribution g(y), which is chosen so that the value of h(y_i)f(y_i)/g(y_i) is nearly constant across y_i.

2) Efficient generation of training samples:

The first stage provides the boundary region where X is most likely to occur, and thus determines the region of X-space in which the biased samples are to be generated. In terms of the indicator function, the sampled region is as shown in equation (10):

wherein: S is the boundary region; for example, in the univariate case, S = {x : x_1 ≤ x ≤ x_2}. The sampling distribution function g_X(x) can be constructed as |h(x)| f_X(x), i.e., f_X(x) restricted to S, and the importance sampling density is expressed as shown in equation (11):

wherein: k_1 and k_2 are bias weights satisfying the probability condition k_1 + k_2 = 1; f_1X(x) is the probability density function inside the boundary region, and f_2X(x) is the probability density function outside the boundary region. When k_1 = 1, the sampling distribution function g_X(x) is completely biased toward the boundary region, and the state-space probability distribution conditioned on the boundary region is as shown in equations (12) and (13):

a = ∫_S f_X(x) dx    (13)

wherein: a is a scaling factor that satisfies 0 ≤ a ≤ 1. The above equations change the probability distribution so that more data come from the boundary region, and an offline training sample set is thus obtained.

In step three), the efficiently generated sample set is input into the training model. Through a feature mapping φ, the CVM projects the sample set S into a high-dimensional space to build MEB(S), and the MEB problem is solved with the CVM algorithm. S_t, c_t and R_t denote the core set, the center of the ball and the radius at iteration t, respectively. The center and radius of a ball B are denoted c_B and r_B. Given a positive number ε, the offline training process is as follows:

1) Initialization of S_0, c_0 and R_0:

Select an arbitrary point z ∈ S. In the feature space, find the point z_a ∈ S furthest from z, then find the point z_b ∈ S furthest from z_a. The initial core set is S_0 = {z_a, z_b}, the initial center is c_0 = (φ(z_a) + φ(z_b))/2, and the initial radius is R_0 = ‖φ(z_a) − φ(z_b)‖/2;

3) Searching for new MEBs:

Through the steps, an offline training model can be obtained.

Examples:

This embodiment uses the IEEE 39-node system. As shown in Fig. 3, the test system comprises 39 nodes, 10 generators and 46 transmission lines. The base power is 100 MVA and the base voltage is 345 kV. Phasor measurement units are assumed to be installed on all buses in order to collect a large data set. To generate a reasonable data set, the operating conditions of the test system are varied randomly. Ten load levels are considered (80%, 85%, 90%, 95%, 100%, 105%, 110%, 115%, 120%, 125%), with the generator output changed accordingly. On this basis, the power flow problem of the power system is solved with the load and generation varied in this way. The contingencies considered are mainly three-phase ground faults on each bus and at three locations on each transmission line (25%, 50% and 75% of the line length). The simulation assumes that each fault occurs at 0.1 s and is cleared at 0.3 s (or 0.35 s, 0.4 s). The generators use a fourth-order model and the loads use a constant-impedance model. A total of 6310 samples were obtained, and the boundary region importance sampling method was applied to them to obtain 1890 samples for testing. 10-fold cross-validation was applied to the 1890 samples, with each validation repeated 10 times.
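The validation protocol can be sketched as index bookkeeping. The example below reads "each validation repeated 10 times" as repeated 10-fold cross-validation with a fresh shuffle per repeat; that reading, the function name and the seed are assumptions.

```python
import random


def repeated_kfold(n_samples, k, repeats, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]   # k disjoint folds covering all samples
        for f in range(k):
            test_idx = folds[f]
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train_idx, test_idx


splits = list(repeated_kfold(n_samples=1890, k=10, repeats=10))
print(len(splits), len(splits[0][2]), len(splits[0][3]))  # 100 1701 189
```

Each of the 100 splits trains on 1701 samples and tests on the remaining 189, and the reported metrics would then be averaged over all splits.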

Four different models were trained and tested: SVM, core vector data description (CVDD), ball vector machine (BVM) and CVM. The four assessment models were evaluated comprehensively using the confusion matrix shown in Table 1. In the table, class = 1 and class = 0 denote stability and instability, respectively. f_11 means that the actual and predicted conditions agree and the system is stable. f_00 means that the actual and predicted conditions agree and the system is unstable. f_10 means that an unstable state is predicted but the system is actually stable. f_01 means that a stable state is predicted but the system is actually unstable.

The accuracy AC, the missed alarm rate FD and the false alarm rate FA are used as evaluation indexes of the classification performance.
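The formulas for AC, FD and FA are not reproduced in this text. The sketch below assumes that all three are normalised by the total number of test cases, with missed alarms counted as f_01 (unstable cases predicted stable) and false alarms as f_10 (stable cases predicted unstable). The counts used are hypothetical.

```python
def classification_metrics(f11, f10, f01, f00):
    """Return (AC, FD, FA) in percent from confusion-matrix counts.

    f11: stable predicted stable      f10: stable predicted unstable
    f01: unstable predicted stable    f00: unstable predicted unstable
    """
    total = f11 + f10 + f01 + f00
    ac = 100.0 * (f11 + f00) / total   # accuracy
    fd = 100.0 * f01 / total           # missed alarms (assumed definition)
    fa = 100.0 * f10 / total           # false alarms (assumed definition)
    return ac, fd, fa


# Hypothetical counts, for illustration only.
ac, fd, fa = classification_metrics(f11=90, f10=4, f01=6, f00=100)
print(ac, fd, fa)
```

Under this normalisation the three indexes always sum to 100%, which matches every row of Table 2 (for example 93.04 + 3.83 + 3.13 = 100 for CVM).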

TABLE 1

The performance test results for the four models are given in Table 2 and Fig. 4. As shown in Table 2, the CVM model has the highest accuracy AC, and its missed alarm rate FD and false alarm rate FA are both the lowest. Fig. 4 gives the data processing time of the four models; the CVM model takes the least time. Compared with the other three models, the CVM model therefore achieves higher accuracy, with lower time and space complexity and higher efficiency.

TABLE 2

Model	AC (%)	FD (%)	FA (%)
SVM	77.85	13.29	8.86
CVDD	79.11	12.53	8.36
BVM	83.54	9.88	6.58
CVM	93.04	3.83	3.13

Fig. 5 shows the results of a further test comparing the model assessment accuracy obtained with three different sampling methods: sampling from the entire state space according to its probability distribution, uniform sampling, and boundary region importance sampling. The boundary region importance sampling method maintains high accuracy even when the amount of data is reduced.

The results prove the effectiveness of a transient security assessment model based on the boundary area importance sampling and the kernel vector machine. The result shows that the CVM algorithm has extremely high performance, and under the condition of smaller data volume, more information content can be generated by using the boundary region importance sampling method, so that the performance of the evaluation model is improved. The training sample set generation method provided by the invention can be applied to other data mining technologies, and the proposed evaluation model can also solve the safety problem of other power systems and can be applied to actual power system operation.

Claims

1. A method for sampling importance of a boundary region, comprising the steps of:

step 1: determining a boundary region using the information entropy;

step 2: using a sampling method based on a Monte Carlo variance reduction technology MCVR to construct effective sampling, introducing deviation in the sampling process, so that the characterization of rare events in an evaluation stage is increased;

through the steps, the offline training sample set is efficiently generated.

2. The boundary region importance sampling method according to claim 1, wherein in step 1, the boundary region is determined using information entropy as shown in formula (5):

E(S) = −Σ_{i=1}^{C} p_i log_2 p_i    (5)

wherein: s is a sample data set, C is the number of categories, p _i The proportion in S classified as class i; from the concept of entropy, a measure of purity of the sample database can be obtained; the larger the value of E (S) is, the lower the purity, i.e., the more abundant the information content is, and therefore, a boundary region is defined as a place where the entropy value is relatively large, and the boundary region is determined by this method.

3. The boundary region importance sampling method according to claim 1 or 2, characterized in that in step 2, an efficient sampling is constructed using a sampling method based on the monte carlo variance reduction technique MCVR, wherein deviations are introduced during the sampling process, such that the characterization of rare events in the evaluation phase is increased, and wherein in step 1 the boundary region is determined, and wherein the sampling process is biased towards the boundary region using a sampling method based on the monte carlo variance reduction technique MCVR.

4. A method according to claim 3, characterized in that in step 2, it comprises in particular the steps of:

1) Variance reduction of importance samples:

equation (6) can thus be defined as shown in equation (8):

the expectation of equation (8) gives a crude Monte Carlo estimate, where y_i is a Monte Carlo sample drawn from the distribution f(y); this estimate has a variance associated with it, because the value of h(y_i) varies with y_i; the variance of the estimate is reduced by reformulating the expectation as shown in equation (9):

wherein: y_i is a Monte Carlo sample drawn from the distribution g(y), which is chosen so that the value of h(y_i)f(y_i)/g(y_i) is nearly constant across y_i;

2) Efficient generation of training samples:

wherein: S is the boundary region; in the univariate case, S = {x : x_1 ≤ x ≤ x_2}; the sampling distribution function g_X(x) can be constructed as |h(x)| f_X(x), i.e., f_X(x) restricted to S, and the importance sampling density is expressed as shown in formula (11):

a = ∫_S f_X(x) dx    (13)

wherein: a is a scaling factor satisfying 0 ≤ a ≤ 1, and the probability distribution is changed as described by formulas (12) and (13), so that more data come from the boundary region and an offline training sample set is thus obtained.

5. The method according to claim 1 or 2 or 4, characterized in that in obtaining an offline training model from the sample set, the following steps are taken:

the efficiently generated sample set is input into a training model; through a feature mapping φ, the kernel vector machine CVM projects the sample set S into a high-dimensional space to establish a minimum enclosing ball MEB, and the MEB problem is solved with the CVM algorithm; S_t, c_t and R_t respectively denote the core set, the center of the ball and the radius at iteration t, and the center and radius of a ball B are denoted c_B and r_B; given a positive number ε, the offline training process is as follows:

1) Initialization of S_0, c_0 and R_0:

2) If no point φ(z), z ∈ S, falls outside the (1+ε)-ball B(c_t, (1+ε)R_t), the algorithm ends; otherwise, the core set is updated to S_{t+1} = S_t ∪ {z}, where z is the point whose image φ(z) is furthest from c_t;

3) Searching for a new minimum enclosing ball MEB:

the new MEB(S_{t+1}) is found for the core set given by step 2), and c_{t+1} and R_{t+1} are obtained from the Lagrange multipliers α = [α_1, α_2, ..., α_m]' and the kernel matrix K, after which the procedure returns to step 2) for the next iteration;

through the steps, an offline training model can be obtained.