
CN115293249A - Power system typical scene probability prediction method based on dynamic time sequence prediction

Info

Publication number: CN115293249A
Application number: CN202210877305.5A
Authority: CN
Inventors: 廖思阳; 姜新雄; 徐箭; 李琰; 王新迎; 尚学军; 王天昊
Original assignee: Wuhan University WHU; China Electric Power Research Institute Co Ltd CEPRI; State Grid Tianjin Electric Power Co Ltd
Current assignee: Wuhan University WHU; China Electric Power Research Institute Co Ltd CEPRI; State Grid Tianjin Electric Power Co Ltd
Priority date: 2022-07-25
Filing date: 2022-07-25
Publication date: 2022-11-04

Abstract

The invention relates to a typical scene probability prediction method for a power system based on dynamic time series prediction. First, a dynamic time series prediction model is constructed for the characteristic variables. A support vector machine then models the data to obtain a decision score for each data sample. The decision score is corrected with a sigmoid function, which maps it to the interval [0,1]. Finally, the sigmoid parameters are determined by maximum likelihood estimation, giving a quantitative probability prediction of the target typical scene. The invention reduces the dimensionality of the original data using the maximum mutual information coefficient, combines the dynamic time series prediction model with the scene classification model, and uses a decision score probability conversion method based on maximum likelihood estimation to convert the classification information of a power system typical scene into probability information. This helps dispatching operators evaluate the future risk of the system and make a more accurate dispatching operation plan.

Description

Power system typical scene probability prediction method based on dynamic time sequence prediction

Technical Field

The invention belongs to the technical field of typical scene probability prediction of an electric power system, and particularly relates to a typical scene probability prediction method of the electric power system based on dynamic time sequence prediction.

Background

With the rapid development of new energy and stochastic loads, their large-scale connection to the power grid introduces randomness and intermittency that greatly increase the difficulty of grid planning and dispatching, and can give rise to scenes such as serious faults, heavy loading of important transmission sections, imbalance between power supply and demand, and blocked consumption of new energy. The current dispatching mode, however, is passive and cannot meet the needs of active dispatching. This shows mainly in the following: grid accidents are waited for passively, and early warning and prevention of predictable risks such as natural disasters leave room for improvement; there is no effective means of monitoring the power fluctuation of generating units, so dispatchers rely on power plant reports; there is no means of monitoring sudden public-opinion events concerning the grid, so dispatchers rely on reports from lower-level dispatching centers; and the information in technical support systems is scattered, with no centralized monitoring function. If typical operation scenes could be predicted effectively, dispatching operators could be warned in advance and controllable resources could be dispatched as early as possible to take part in active regulation of the power system. This would greatly improve the ability of the power system to monitor and predict abnormal states and raise the level of safe and stable grid operation.

Existing research related to scene prediction in power systems focuses mainly on situational awareness and stability assessment. Situational awareness is the perception of a large number of environmental elements within a certain time and space, the understanding of their meaning, and the prediction of their state in the near future, so as to gain a decision advantage. This definition was first proposed by Endsley in 1988. Situational awareness is generally divided into three levels: perception, understanding and prediction. Its core parts are the extraction of situation elements, the understanding of the current situation, the prediction of the future situation, and decision making and action. Research on stability assessment mainly covers static stability assessment and transient stability assessment. Existing work on static stability prediction includes prediction of line power flow limit violations, prediction of node voltage limit violations, and prediction of the limit transmission capacity. Most of these methods build a data-driven learning model on a subset of the measurable variables of the system. Their output, however, is usually the state type of the future power system or the numerical value of a particular index. In general, type information or index values alone hardly reflect how critical a power system scene is, so they cannot help dispatching operators judge the situation accurately. The output of these methods therefore falls short of its intended role, and operators still have to reassess the situation from their own experience before deciding on a strategy.

Based on the above analysis, and given that the rapid development of new energy makes the operating state of the system more complex and changeable, the invention aims to capture the operating trend of the system fully and to provide more feed-forward information for formulating dispatching strategies, thereby improving the situational awareness of the power system. To this end, the invention designs a method that outputs a probability by correcting the decision scores of a support vector machine (SVM) with a sigmoid function. The method gives a quantitative probability early warning of the future state of a target scene or key element and so provides a more reliable indication for system operation and dispatching. First, a time series prediction model of the key variables of the typical scene is constructed based on a long short-term memory (LSTM) network. An SVM is then trained to classify historical samples of the screened features according to their scene attributes, so that future states can be predicted accurately. Probability correction learning is then carried out with a sigmoid function applied to the decision scores output by the SVM model, and the parameters of the sigmoid function are determined by the maximum likelihood method. This completes the mapping from decision scores to probability values and finally achieves quantitative probability prediction of the target typical scene of the power system.

Disclosure of Invention

The invention provides a typical scene probability prediction method for a power system based on dynamic time series prediction. First, according to the physical characteristics of the target typical scene of the power system, a feature subset related to the scene is screened out using the maximum mutual information coefficient, time series data of the characteristic variables are collected from historical data, and a multi-dimensional time series data set is formed by combining them with the state sequence of the target typical scene. Next, based on a long short-term memory (LSTM) network and the historical time series data, a dynamic time series prediction model for the associated characteristic variables is constructed through cross validation and grid search. The scene prediction problem is then converted into a classification problem: a support vector machine (SVM) models the data and yields a decision score for each data sample. A parameterized sigmoid function then corrects the SVM decision score output, mapping the decision score to the interval [0,1] to give a probability output. Finally, the parameter values of the sigmoid function are determined by maximum likelihood estimation, which completes the mapping from decision scores to probability values and achieves quantitative probability prediction of the target typical scene of the power system.

The invention provides a power system typical scene probability prediction method based on dynamic time sequence prediction, which is characterized by comprising the following steps of:

screening out a feature subset related to the power system based on a maximum mutual information measurement method according to physical characteristics of a target typical scene of the power system, collecting time series data of feature variables from historical data, and forming a multi-dimensional time series data set by combining a target typical scene state sequence;

based on the long and short time memory network and the historical time sequence data, a dynamic time sequence prediction model for the associated characteristic variables is constructed through cross validation and grid search, a support vector machine is used for modeling data knowledge, historical samples of the screened characteristics are subjected to classified learning, and a decision score for each data sample is obtained;

the decision score output of the SVM is modified by adopting a sigmoid function with parameters, the decision score value is mapped to an interval [0,1], the parameter value in the sigmoid function is determined by utilizing a maximum likelihood estimation method, the mapping from the decision score to the probability value is realized, a typical scene probability prediction model of the power system target is obtained, and finally the quantitative probability prediction of the typical scene of the power system target is realized by combining the dynamic time sequence prediction result of the characteristic variable.

In the prediction method, the target typical scene of the power system to be predicted is determined, and its state sequence is constructed from historical data:

$$Y = [y_1, y_2, \dots, y_N]^{T}$$

where N is the total number of data points and $y_k \in \{0,1\}$ is the target scene state at the k-th time point: $y_k = 1$ indicates that the target scene occurred (positive example) and $y_k = 0$ indicates that it did not occur (negative example). The time series data of every characteristic variable recorded in the historical data are acquired at the same time and denoted X. The maximum mutual information coefficient (MIC) between each characteristic variable and the target typical scene state sequence Y is then computed, and a threshold is set to eliminate low-correlation characteristic variables, giving the feature subset

$$Q \subseteq \{F^{1}, F^{2}, \dots, F^{D}\}, \qquad |Q| = D'$$

where D' equals the total number of characteristic variables D minus the number of eliminated characteristic variables. The time series data of the characteristic variables contained in the feature subset Q are then extracted from the historical data and combined with the target typical scene state sequence Y to form a multi-dimensional time series data set. The time series data are written as

$$X = [ts^{1}, ts^{2}, \dots, ts^{D}], \qquad ts^{i} = [x^{i}_{1}, x^{i}_{2}, \dots, x^{i}_{N}]^{T}$$

In the above formula, $ts^{i}$ denotes the time series data of the i-th characteristic variable $F^{i}$, $x^{i}_{k}$ is the measured value at the k-th time point in the time series data of the i-th characteristic variable, D is the total number of characteristic variables, and N is the total number of data points, that is, the length of the time series data.

In the prediction method described above, based on the acquired multi-dimensional time series data set, a long short-term memory (LSTM) network is used to construct a dynamic time series prediction model for the characteristic variables contained in the feature subset Q, called the characteristic variable dynamic time series prediction model. The training input of the LSTM network is the set of sample-target pairs

$$\{(X'_k,\; x'_{k+\alpha})\}$$

where $X'_k$ is the input multi-dimensional time series sample, $x'_{k+\alpha}$ is the regression prediction target corresponding to sample $X'_k$, and α is the number of time steps by which the prediction leads. Through cross validation and grid search, the trained characteristic variable dynamic time series prediction model can predict the characteristic variables in the feature subset Q α time steps in advance. The input sample is

$$X'_k = [x'_{k-L+1}, \dots, x'_{k-1}, x'_k], \qquad x'_k \in \mathbb{R}^{D'}$$

where $x'_k$ collects the values of the D' characteristic variables in Q at the k-th time point and L is the length of each time series segment in the multi-dimensional time series sample.

In the prediction method, a supervised classification data set for predicting the target typical scene of the power system is built from the constructed multi-dimensional time series data set:

$$\{(x'_k,\; y_k)\}_{k=1}^{N}$$

where $y_k = 1$ denotes that sample $x'_k$ is a positive example and $y_k = 0$ denotes that it is a negative example. The samples are then divided into m groups. m-1 groups are extracted and used to train a support vector machine model as the power system typical scene classification model, which gives an SVM decision function f(·). The decision function is then used to compute and store the decision scores of the remaining group. This process is repeated m times with a different set of m-1 groups each time, so that a decision score is obtained for every data sample, and the decision score-label set is established:

$$\{(f_k,\; y_k)\}_{k=1}^{N}$$

For the k-th data sample $x'_k$, $f_k$ is its decision score under the support vector machine model and $y_k$ is the value of the target scene state at the k-th time point, where k = 1, 2, ..., N.
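The m-group procedure above can be sketched as follows. This is a minimal illustration and not the patent's implementation: it assumes scikit-learn is available, and the RBF kernel, the hyperparameters, the stratified shuffled split and the synthetic data are all assumptions made only so that the example runs.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC


def out_of_fold_scores(X, y, m=5, seed=0):
    """Return one decision score f_k per sample, each computed by an SVM
    that was trained on the other m-1 groups only."""
    svm = SVC(kernel="rbf", C=1.0, gamma="scale")  # assumed settings
    folds = StratifiedKFold(n_splits=m, shuffle=True, random_state=seed)
    return cross_val_predict(svm, X, y, cv=folds, method="decision_function")


# Synthetic stand-in for the samples x'_k and labels y_k (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (60, 3)), rng.normal(1.0, 1.0, (60, 3))])
y = np.array([0] * 60 + [1] * 60)

f = out_of_fold_scores(X, y, m=5)
score_label_set = np.column_stack([f, y])  # rows are (f_k, y_k)
```

Because every score comes from a model that never saw the corresponding sample, the resulting score-label set is less optimistic than scores computed on the training data, which matters when the scores are later used to fit the probability mapping.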

In the prediction method, a sigmoid function with parameters A and B is used to correct the decision score output by the SVM and map it to the interval [0,1]. Based on the decision score-label set obtained in step 3,

$$\{(f_k,\; y_k)\}_{k=1}^{N}$$

the values of the sigmoid parameters A and B are determined by maximum likelihood estimation, which converts the decision score into the occurrence probability of the target typical scene.

The decision score mapping form based on the sigmoid function is as follows:

$$P(y = 1 \mid x') = \frac{1}{1 + \exp(Af + B)}$$

in the formula: A and B are the sigmoid function parameters, f is the decision score corresponding to the input sample x', and $P(y = 1 \mid x')$ is the probability that the input sample x' is a positive example.

Based on the obtained decision score-label set

$$\{(f_k,\; y_k)\}_{k=1}^{N}$$

the specific process of solving for the parameters A and B by maximum likelihood estimation is as follows:

$$\min_{A,B} F(A,B) = -\sum_{k=1}^{N} \left[ t_k \ln p_k + (1 - t_k) \ln (1 - p_k) \right]$$

in the formula:

$$p_k = \frac{1}{1 + \exp(Af_k + B)}, \qquad t_k = \begin{cases} \dfrac{N_{+} + 1}{N_{+} + 2}, & y_k = 1 \\[2mm] \dfrac{1}{N_{-} + 2}, & y_k = 0 \end{cases}$$

$p_k$ is the estimated probability that the k-th sample is a positive example, $N_{+}$ is the number of positive examples among all samples, and $N_{-}$ is the number of negative examples among all samples. The parameters A and B are obtained by solving min F(A, B), which converts the decision score of the SVM model into a probability output.
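As an illustration of the parameter fit, the sketch below minimizes a negative log-likelihood of the standard Platt-scaling form, with targets smoothed by the class counts N₊ and N₋, using a damped Newton iteration in NumPy. The solver, the starting point and the synthetic scores are assumptions; the text only states that A and B are obtained by minimizing F(A, B).

```python
import numpy as np


def platt_targets(y):
    """Smoothed targets t_k built from the class counts N_+ and N_-."""
    n_pos = np.sum(y == 1)
    n_neg = np.sum(y == 0)
    return np.where(y == 1, (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0))


def objective(params, f, t):
    """F(A, B) = -sum[t ln p + (1 - t) ln(1 - p)] with p = 1 / (1 + exp(A f + B)),
    rewritten as sum[log(1 + e^z) - (1 - t) z] for numerical stability."""
    z = params[0] * f + params[1]
    return np.sum(np.logaddexp(0.0, z) - (1.0 - t) * z)


def fit_sigmoid(f, y, max_iter=100, tol=1e-8):
    """Estimate A and B by Newton's method with backtracking (assumed solver)."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y)
    t = platt_targets(y)
    params = np.array([0.0, np.log((np.sum(y == 0) + 1.0) / (np.sum(y == 1) + 1.0))])
    for _ in range(max_iter):
        z = params[0] * f + params[1]
        p = np.exp(-np.logaddexp(0.0, z))          # p_k = 1 / (1 + exp(z_k))
        residual = t - p                           # dF/dz_k
        grad = np.array([np.sum(residual * f), np.sum(residual)])
        if np.linalg.norm(grad) < tol:
            break
        w = p * (1.0 - p)
        hessian = np.array([[np.sum(w * f * f), np.sum(w * f)],
                            [np.sum(w * f), np.sum(w)]]) + 1e-12 * np.eye(2)
        step = np.linalg.solve(hessian, grad)
        current, scale = objective(params, f, t), 1.0
        while (scale > 1e-10 and objective(params - scale * step, f, t)
               > current - 1e-4 * scale * (grad @ step)):
            scale *= 0.5                           # halve the step until F decreases
        params = params - scale * step
    return params[0], params[1]


def to_probability(f, A, B):
    """Map decision scores to P(y = 1 | x') = 1 / (1 + exp(A f + B))."""
    return np.exp(-np.logaddexp(0.0, A * np.asarray(f, dtype=float) + B))


# Hypothetical decision scores: negatives centred at -1, positives at +1.
rng = np.random.default_rng(1)
f = np.concatenate([rng.normal(-1.0, 1.0, 100), rng.normal(1.0, 1.0, 100)])
y = np.array([0] * 100 + [1] * 100)
A, B = fit_sigmoid(f, y)
```

With this sign convention a useful classifier yields a negative A, so that larger decision scores map to larger probabilities.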

In the prediction method, the latest multi-dimensional time series sample $X'_t$ available at the current time t is acquired dynamically in real time and input to the established characteristic variable dynamic time series prediction model, which gives the predicted values $x'_{t+\alpha}$ of the D' characteristic variables α time steps ahead. $x'_{t+\alpha}$ is then input to the SVM-based power system typical scene classification model established in step 3 to obtain the corresponding decision score $f_{t+\alpha}$. The sigmoid function with the parameters A and B determined in step 4 then gives the probability $p_{t+\alpha}$ that sample $x'_{t+\alpha}$ is a positive example, that is, the probability that the target typical scene of the power system occurs α time steps in the future. This completes the probability prediction of the target typical scene α time steps in advance.
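The online step chains three components: forecast, score, and probability mapping. The sketch below shows only that chaining; the persistence forecaster and the linear scoring function are hypothetical stand-ins for the trained LSTM and SVM models, and the values of A and B are made up for the example.

```python
import math

import numpy as np


def predict_scene_probability(window, forecaster, decision_function, A, B):
    """Forecast x'_{t+alpha} from the latest window X'_t, score it, and map
    the score to a probability with the fitted sigmoid parameters."""
    x_future = forecaster(window)                   # x'_{t+alpha}
    f_future = float(decision_function(x_future))   # f_{t+alpha}
    return 1.0 / (1.0 + math.exp(A * f_future + B)) # p_{t+alpha}


def persistence_forecaster(window):
    """Hypothetical stand-in for the LSTM: repeat the latest observation."""
    return window[-1]


def linear_score(x):
    """Hypothetical stand-in for the SVM decision function f(.)."""
    return float(np.sum(x) - 1.0)


# Window of L = 2 time steps (rows) and D' = 2 characteristic variables (columns).
window = np.array([[0.2, 0.1],
                   [0.4, 0.9]])
p = predict_scene_probability(window, persistence_forecaster, linear_score,
                              A=-2.0, B=0.0)
```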

The invention provides, for the first time, a typical scene probability prediction method for a power system based on dynamic time series prediction. It adds a probability output to the type prediction of power system typical scenes, which gives dispatching operators more reference information and helps them establish a more accurate dispatching control strategy. First, a feature subset strongly related to the target scene is screened out using the maximum mutual information coefficient, time series data of the characteristic variables are collected from historical data, and a multi-dimensional time series data set is formed by combining them with the scene state sequence. Next, a dynamic time series prediction model of the associated characteristic variables is constructed based on a long short-term memory network and the historical time series data. A support vector machine then learns from the scene data samples to obtain a decision score for each data sample, and a parameterized sigmoid function corrects the SVM decision score and maps it to the interval [0,1]. Finally, the parameter values of the sigmoid function are determined by maximum likelihood estimation, achieving quantitative probability prediction of a typical scene of the power system.

The invention has the following advantages: 1. the dimensionality of the original data is reduced using the maximum mutual information coefficient, which improves the subsequent training efficiency of the LSTM-based characteristic variable dynamic time series prediction model and of the SVM-based typical scene classification model, and reduces the complexity of the overall model; 2. the characteristic variable dynamic time series prediction model is combined with the typical scene classification model, which achieves dynamic perception of the future state of the power system and supports a comprehensive understanding of the trend of power system scenes; 3. the decision score probability conversion method based on maximum likelihood estimation converts the classification information of the target typical scene of the power system into probability information, which helps dispatching operators evaluate the future risk of the system and make a more accurate dispatching operation plan.

Drawings

FIG. 1 is a diagram of an SVM classification model.

FIG. 2 is a schematic diagram showing the fluctuation segment of the actual load 1 in the embodiment of the present invention.

FIG. 3 is a schematic diagram showing the fluctuation segment of the actual load 2 in the embodiment of the present invention.

FIG. 4 is a schematic diagram of key variables extracted based on MIC in the present invention.

FIG. 5 is a schematic diagram of an optimal prediction model training process according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating the time-series prediction effect of feature 8 according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating the time-series prediction effect of the feature 17 according to the embodiment of the present invention.

FIG. 8 is a diagram illustrating the time-series prediction effect of the feature 15 according to the embodiment of the present invention.

FIG. 9 is a diagram illustrating the effect of scene probability prediction in an embodiment of the present invention.

FIG. 10 is a schematic flow chart of the method of the present invention.

Detailed Description

The technical scheme of the invention is further specifically described by the following examples and the accompanying drawings.

1. Typical scene associated feature selection based on maximum mutual information coefficient

To screen out the feature subset most relevant to the target typical scene of the power system and to reduce the computational complexity of building the subsequent learning models, the method measures the correlation between each candidate feature and the target typical scene using the maximum mutual information coefficient and sets a threshold to eliminate low-correlation features.

The maximum mutual information coefficient is an effective measure of the correlation between two variables. It is robust and can capture a wide range of linear and nonlinear relationships. Its basic idea is to lay a grid over the scatter plot of the two variables and to evaluate the correlation between them on that grid. To obtain the maximum mutual information coefficient between two variables, as many gridding strategies as possible are tried, and the largest correlation value computed over all grids is called the maximum mutual information coefficient.

Specifically, assume a given set of candidate features

$$\{F^{1}, F^{2}, \dots, F^{D}\}$$

where $F^{i}$ represents the i-th characteristic variable. The time series data of the characteristic variables are collected from historical data and recorded as X:

$$X = [ts^{1}, ts^{2}, \dots, ts^{D}], \qquad ts^{i} = [x^{i}_{1}, x^{i}_{2}, \dots, x^{i}_{N}]^{T}$$

in the formula, $ts^{i}$ denotes the time series data of the i-th characteristic variable $F^{i}$, $x^{i}_{k}$ is the measured value at the k-th time point in the time series data of the i-th characteristic variable, D is the total number of candidate characteristic variables, and N is the total number of data points, that is, the length of the time series data.

Likewise, the state sequence of the target typical scene can be obtained:

$$Y = [y_1, y_2, \dots, y_N]^{T}$$

where N is the total number of data points and $y_k \in \{0,1\}$ is the target scene state at the k-th time point: $y_k = 1$ indicates that the target scene occurred (positive example) and $y_k = 0$ indicates that it did not occur (negative example). Thus, for the i-th characteristic variable $F^{i}$, a set of data pairs can be constructed from its historical data and the target scene state sequence:

$$D^{i} = \{(x^{i}_{k},\; y_k)\}_{k=1}^{N}$$

Plotting these pairs in a two-dimensional coordinate system gives a scatter distribution. The horizontal axis is divided into a intervals and the vertical axis into b intervals (intervals containing no data points are allowed), which gives an a-by-b grid, denoted G, where a and b are positive integers. Given a grid G, $D^{i}|_{G}$ denotes the distribution of the points of $D^{i}$ over the cells of G, so for a given data set $D^{i}$ different grids G give different distributions $D^{i}|_{G}$. Thus, for a data set $D^{i}$ and fixed parameters a and b, the maximum mutual information that can be obtained over all possible grids G can be expressed as follows:

$$I^{*}(D^{i}, a, b) = \max_{G} I(D^{i}|_{G})$$

in the formula, $I(D^{i}|_{G})$ is the mutual information between the two variables computed from the distribution $D^{i}|_{G}$. From this, the characteristic matrix of $D^{i}$ with respect to $I^{*}$ can be defined as follows:

$$M(D^{i})_{a,b} = \frac{I^{*}(D^{i}, a, b)}{\log \min\{a, b\}}$$

in the formula, $M(D^{i})$ is the characteristic matrix of $D^{i}$ and $M(D^{i})_{a,b}$ is the element in row a and column b of the matrix. The denominator normalizes the mutual information to the interval [0,1], which ensures a fair comparison of the maximum mutual information coefficient among different features. The maximum mutual information coefficient (MIC) between the i-th characteristic variable $F^{i}$ and the target scene label Y can then be defined as the maximum element of the matrix $M(D^{i})$:

$$MIC(F^{i}, Y) = \max_{ab < B(N)} \{ M(D^{i})_{a,b} \}$$

where $MIC(F^{i}, Y)$ denotes the maximum mutual information coefficient between the i-th characteristic variable $F^{i}$ and the target scene label Y, and B(N) is a function of the number of data points N that controls how many gridding strategies are considered; it is usually set to $B(N) = N^{0.6}$. MIC values lie in the interval [0,1], and a larger value indicates a higher correlation between the two variables. Computing the maximum mutual information coefficient between each of the D candidate variables and the target scene label sequence in turn gives the correlation of each candidate variable with the target scene. The key characteristic variables can then be screened by setting a threshold, which gives the key feature subset

$$Q \subseteq \{F^{1}, F^{2}, \dots, F^{D}\}, \qquad |Q| = D'$$

where D' equals the total number of candidate variables D minus the number of eliminated characteristic variables. The time series data of the characteristic variables contained in the feature subset Q can then be extracted from the historical data and combined with the target typical scene state sequence Y to form a multi-dimensional time series data set.
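The screening step can be illustrated with a simplified MIC estimate. The sketch below is not the exact MIC: instead of searching over all grid placements it only tries equal-frequency grids, so it gives a lower bound on the value defined above. The threshold of 0.3 and the synthetic features are assumptions made for the example.

```python
import numpy as np


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of at most n_bins bins of roughly equal count."""
    distinct = np.unique(values)
    if distinct.size <= n_bins:
        return np.searchsorted(distinct, values)   # one bin per distinct value
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, values, side="right")


def mutual_information(x_bins, y_bins):
    """Mutual information (in nats) of the gridded scatter distribution."""
    joint = np.zeros((x_bins.max() + 1, y_bins.max() + 1))
    np.add.at(joint, (x_bins, y_bins), 1.0)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / outer[nonzero])))


def approx_mic(x, y, exponent=0.6):
    """Largest normalized mutual information over a-by-b equal-frequency grids
    with a * b < B(N) = N ** exponent."""
    budget = len(x) ** exponent
    best = 0.0
    for a in range(2, int(budget) // 2 + 1):
        for b in range(2, int(budget) // a + 1):
            if a * b >= budget:
                continue
            mi = mutual_information(equal_frequency_bins(x, a),
                                    equal_frequency_bins(y, b))
            best = max(best, mi / np.log(min(a, b)))
    return best


# Hypothetical data: feature 0 drives the scene label, feature 1 is pure noise.
rng = np.random.default_rng(0)
N = 500
informative = rng.uniform(0.0, 1.0, N)
noise = rng.uniform(0.0, 1.0, N)
y = (informative > 0.5).astype(int)
features = np.column_stack([informative, noise])

scores = [approx_mic(features[:, i], y) for i in range(features.shape[1])]
threshold = 0.3  # assumed value; the text does not state the threshold
selected = [i for i, s in enumerate(scores) if s >= threshold]
```

Because the scene label is binary, the vertical axis never has more than two occupied intervals, so the search is effectively over the number of horizontal intervals only.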

2. Construction of characteristic variable dynamic time sequence prediction model

(1) Long and short time memory network

With the explosive growth of data volume and the improvement of computer performance, traditional neural networks, owing to their own limitations, leave little room for further gains in efficiency and performance on big-data problems, and conventional machine learning often cannot process data in its raw format. For example, to classify a picture, the pixel values of the whole picture are not used directly as input; instead, features of the picture are extracted by hand, converted into numerical form, and then used to train a network. Deep learning is an important branch of machine learning. It refers to a set of methods that are based on deep neural networks and use a series of algorithms for feature learning and processing to solve problems such as detection, tracking and classification of images and text. Deep learning is a form of representation learning, that is, a learning method that lets a machine discover the features of raw data automatically. It stacks multiple representation layers built from simple but nonlinear modules, and each layer transforms its input representation into a more abstract one and passes it to the next layer. The core task of deep learning is to learn the features of the data: hierarchical features are acquired through the successive layers of the neural network, which removes the earlier need to extract features manually. Deep learning uses many kinds and variants of neural networks. The two most widely used at present are the RNN (Recurrent Neural Network) and the CNN (Convolutional Neural Network).

CNN (Convolutional Neural Network): it can be seen as an advanced version of the standard neural network. It includes convolutional layers, pooling layers and fully connected layers. These structures enable it to take the full pixel values of a picture as input without manual feature extraction;

RNN (Recurrent Neural Network): it is a model quite different from the CNN and is dedicated to representing the dynamic behavior of sequence data. An RNN takes sequence data as input, processes the input sequence in combination with an internal state, recurses along the direction in which the sequence evolves, and connects all of its nodes (recurrent units) in a chain. Time series data are data collected at different points in time that reflect how the state or degree of some object or phenomenon changes over time. Sequence data need not be indexed by time, text sequences being one example, but all sequence data share one feature: later data are related to earlier data.

A traditional neural network comprises an input layer, a hidden layer and an output layer, and information flows from the input layer through the hidden layer to the output layer. The output is controlled by activation functions, adjacent layers are fully connected through weights, and the nodes within a layer are not connected to each other. The activation function is fixed in advance, so what the neural network model learns through training is contained in the weights. However, a basic neural network only establishes weighted connections between layers and is therefore unsuitable for many problems. For example, predicting the next word in a sentence generally requires the preceding words, because the words in a sentence are not independent of each other. The biggest difference of the RNN is that weighted connections are also established between the neurons within a layer. The RNN is called a recurrent neural network because the current output for a sequence also depends on the previous outputs. Concretely, the network memorizes earlier information and applies it to the calculation of the current output: the nodes of the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. In theory, an RNN can process sequence data of any length. This ability to obtain an efficient representation of the information in a sequence has led to the widespread use of RNNs in natural language processing (NLP) tasks such as speech recognition, language modeling and machine translation, as well as in various kinds of time series prediction, where they have produced a range of strong results.
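The recurrence described above, in which the hidden layer receives both the current input and its own previous output, can be sketched in a few lines of NumPy. The tanh activation, the weight names W_x, W_h and b, and the random weights are illustrative assumptions rather than details taken from this document.

```python
import numpy as np


def rnn_forward(inputs, W_x, W_h, b):
    """Run a plain recurrent layer over a sequence.
    inputs: (T, input_dim); returns the hidden states, shape (T, hidden_dim)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        # The hidden state mixes the current input with the previous hidden state.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
W_x = rng.normal(0.0, 0.5, (4, 3))
W_h = rng.normal(0.0, 0.5, (4, 4))
b = np.zeros(4)

seq_a = rng.normal(size=(5, 3))
seq_b = seq_a.copy()
seq_b[0] += 1.0  # change only the first time step

states_a = rnn_forward(seq_a, W_x, W_h, b)
states_b = rnn_forward(seq_b, W_x, W_h, b)

# With the recurrent weights removed, the layer has no memory of earlier inputs.
no_memory_a = rnn_forward(seq_a, W_x, np.zeros((4, 4)), b)
no_memory_b = rnn_forward(seq_b, W_x, np.zeros((4, 4)), b)
```

Changing only the first input alters the later hidden states when W_h is nonzero, which is the memory effect the paragraph describes; with W_h set to zero the later states are unaffected.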

The long short-term memory (LSTM) network architecture is an improvement over the traditional RNN model. The LSTM is a special variant of the RNN. Because of vanishing gradients, a plain RNN has only short-term memory, whereas the LSTM introduces additive updates into the network through carefully designed gates, which alleviates the vanishing gradient problem to a certain extent and allows long-term dependencies to be learned. Very long sequences can still suffer from vanishing gradients (this may occur for lengths exceeding 300), so the LSTM is better described as a somewhat longer short-term memory. The LSTM was proposed by Hochreiter & Schmidhuber (1997) and was later refined and popularized by Alex Graves. It has been very successful and is widely used on many problems.
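For reference, the gating and the additive cell update mentioned above are commonly written as follows. This is the widely used formulation with a forget gate, given here as background; the symbols are not taken from this document, and the exact cell variant used in the invention is not specified.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(additive cell update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The sum in the update of $c_t$ is the addition operation referred to in the text: gradients can flow through it from $c_t$ to $c_{t-1}$ scaled only by the forget gate, rather than being repeatedly squashed by an activation function.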

(2) Characteristic variable dynamic time sequence prediction model based on LSTM

Based on the acquired multi-dimensional time series data set, a long short-term memory network is used to construct a dynamic time series prediction model for each characteristic variable in the feature subset Q, referred to as the characteristic variable dynamic time series prediction model. The training set for the long short-term memory network consists of pairs $(X'_k, x'_{k+\alpha})$.

$X'_k$ is the input multi-dimensional time series sample:

$$X'_k \in \mathbb{R}^{D' \times L}$$

where D' is the dimension of the feature subset and L is the length of each time series segment in the multi-dimensional time series sample.

$x'_{k+\alpha}$ is the regression prediction target corresponding to sample $X'_k$:

$$x'_{k+\alpha} \in \mathbb{R}^{D'}$$

where α is the number of time steps by which the time series prediction leads.

Finally, through cross validation and grid search, the trained characteristic variable dynamic time series prediction model predicts the characteristic variables in the feature subset Q α time steps ahead.
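
The pairing of an input window with a target α steps ahead can be sketched as follows. The sketch assumes that each window ends at time index k and is stored time-major (L rows of D' values each); the helper name make_windows and the toy series are illustrative only.

```python
def make_windows(series, L, alpha):
    """Build (window, target) pairs from a multivariate series.

    series: list of time steps, each a list of D' feature values.
    Returns (X, targets): X[i] holds L consecutive steps ending at index k,
    targets[i] is the feature vector at index k + alpha.
    """
    X, targets = [], []
    for k in range(L - 1, len(series) - alpha):
        X.append(series[k - L + 1 : k + 1])
        targets.append(series[k + alpha])
    return X, targets

# Toy series with D' = 2 features and 8 time steps.
series = [[float(t), float(10 * t)] for t in range(8)]
X, targets = make_windows(series, L=3, alpha=2)
```

Each window is paired with the feature vector α steps after its last row, so the number of usable samples is the series length minus L - 1 + α.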

3. Typical scene classification model construction and probability transformation method for power system

(1) Support vector machine model

The support vector machine (SVM) is widely applied in power systems, shows good prediction performance in tasks such as transient stability assessment and transformer fault diagnosis, requires few training samples and generalizes well. Moreover, for any sample x, the SVM can provide the distance from that sample to the classification hyperplane, so that a confidence level of the classification result can be defined, which gives the result a probabilistic meaning. The SVM maps the input space into a high-dimensional feature space and searches for a classification hyperplane there, so that the classification margin is maximized on the premise that the sample points are separated without error, and the optimal classification effect is obtained.

The support vector machine is a binary classification model. Its basic model is the maximum-margin linear classifier defined on a feature space, and the maximum margin distinguishes it from the perceptron; the support vector machine also includes the kernel trick, which makes it in essence a non-linear classifier. The learning strategy of the support vector machine is margin maximization, which can be formulated as a convex quadratic programming problem and is also equivalent to minimizing a regularized hinge loss function. The learning algorithm of the support vector machine is an optimization algorithm for solving convex quadratic programming.

The learning methods of the support vector machine build models from simple to complex, where the simple model is the basis of the complex model and also a special case of it. When the training data are linearly separable, a linear classifier, namely the linearly separable support vector machine (also called the hard-margin support vector machine), is learned through hard margin maximization; when the training data are approximately linearly separable, a linear classifier, namely the linear support vector machine, is learned through soft margin maximization; when the training data are not linearly separable, a non-linear support vector machine is learned by using the kernel trick together with soft margin maximization.

When the input space is a Euclidean space or a discrete set and the feature space is a Hilbert space, a kernel function represents the inner product between the feature vectors obtained by mapping the inputs from the input space to the feature space. A non-linear support vector machine can be learned by using a kernel function, which is equivalent to implicitly learning a linear support vector machine in a high-dimensional feature space; this approach is called the kernel trick. The kernel method is a more general machine learning method than the support vector machine.
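
The statement that a kernel function equals an inner product in the feature space can be checked numerically. The sketch below uses a degree-2 polynomial kernel on 2-dimensional inputs, chosen here only because its feature map can be written out explicitly; the function names are illustrative and not part of the described method.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map whose inner product reproduces poly2_kernel."""
    return [x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x, z = [1.0, 2.0], [3.0, -1.0]
k_implicit = poly2_kernel(x, z)    # computed in the 2-D input space
k_explicit = dot(phi(x), phi(z))   # computed in the 3-D feature space
```

Both values agree, although the kernel never forms the feature vectors; this is what allows a linear classifier to be learned implicitly in a high-dimensional space.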

(2) Typical scene classification model construction and probability transformation method based on SVM

According to the historical time series data of the feature subset and the target typical scene state sequence, a supervised classification data set for predicting the target typical scene of the power system can be established:

$$\{(x'_k, y_k)\}_{k=1}^{N}$$

where $x'_k \in \mathbb{R}^{D'}$ is the D'-dimensional input feature of the k-th sample and $y_k$ is the sample label value; $y_k = 1$ denotes that sample $x'_k$ is a positive example and $y_k = 0$ denotes that it is a negative example, $k = 1, 2, \ldots, N$. The SVM uses a kernel function $K(\cdot)$ to map the original problem to a high-dimensional space and then constructs the optimal classification hyperplane $f(x')$ in the transformed space, namely the SVM decision function:

$$f(x') = \sum_{k=1}^{N} \alpha_k y_k K(x'_k, x') + b$$

where $\alpha_k$ are the Lagrange multipliers and $b \in \mathbb{R}$ is the offset; as in the standard SVM formulation, the negative label enters the decision function and the optimization problem as $-1$ rather than $0$. The $\alpha_k$ can be obtained by solving the following optimization problem:

$$\max_{\alpha} \; \sum_{k=1}^{N} \alpha_k - \frac{1}{2} \sum_{k=1}^{N} \sum_{j=1}^{N} \alpha_k \alpha_j y_k y_j K(x'_k, x'_j)$$

$$\text{s.t.} \quad \sum_{k=1}^{N} \alpha_k y_k = 0, \qquad C \ge \alpha_k \ge 0, \; k = 1, \ldots, N$$

where $C \in \mathbb{R}$ is the penalty factor. The kernel function $K(x'_k, x'_j)$ can take many forms. The radial basis kernel function can approximate any function with arbitrarily small error, and many existing studies construct models with this kernel and obtain good test performance, so the radial basis kernel function is selected in this research:

$$K(x'_k, x'_j) = \exp\left(-\gamma \left\| x'_k - x'_j \right\|^2\right)$$

where $\gamma \in \mathbb{R}$ is the kernel parameter. In general, the optimal model parameters γ and C can be obtained through grid search or a heuristic algorithm.
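
A minimal sketch of evaluating an SVM decision function with a radial basis kernel is given below. It assumes the common kernel form K(u, v) = exp(-γ‖u - v‖²) and labels encoded as ±1; the support vectors, multipliers and offset are made-up values, not results from the example system.

```python
import math

def rbf_kernel(u, v, gamma):
    """Radial basis kernel exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def decision_score(x, support_vectors, signed_labels, alphas, b, gamma):
    """f(x) = sum_k alpha_k * y_k * K(x_k, x) + b, with y_k in {-1, +1}."""
    total = b
    for alpha, label, sv in zip(alphas, signed_labels, support_vectors):
        total += alpha * label * rbf_kernel(sv, x, gamma)
    return total

# Hypothetical trained quantities (assumed, for illustration only).
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
signed_labels = [-1, 1]
alphas = [1.0, 1.0]
b, gamma = 0.0, 0.5

score_pos = decision_score([2.0, 2.0], support_vectors, signed_labels, alphas, b, gamma)
score_neg = decision_score([0.0, 0.0], support_vectors, signed_labels, alphas, b, gamma)
score_mid = decision_score([1.0, 1.0], support_vectors, signed_labels, alphas, b, gamma)
```

The sign of the score gives the predicted class, and a point equidistant from both support vectors lies on the separating surface in this toy configuration.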

As shown above, for any sample x' to be classified, the decision function f(x') outputs a numerical value, which is the decision score. In a binary classification problem, the SVM assigns the sample to the positive or negative class according to whether the decision score is greater than 0, thereby completing the class prediction; however, the decision score only reflects the distance of the sample from the classification hyperplane and has no probabilistic meaning. A sigmoid function with parameters is therefore adopted to modify the decision score output of the SVM, mapping the decision function value to the interval [0,1] and thereby realizing a probability output of the following form:

$$P(y = 1 \mid x') = \frac{1}{1 + \exp(A f + B)}$$

where A and B are the sigmoid function parameters, f is the decision score corresponding to the input sample x', and $P(y = 1 \mid x')$ is the probability that the input sample x' belongs to the positive class.
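
The mapping from decision score to probability can be sketched as below, assuming the usual parameterized sigmoid 1 / (1 + exp(A·f + B)). The values of A and B are hypothetical; with this form, a negative A makes larger scores map to larger probabilities.

```python
import math

def score_to_probability(f, A, B):
    """Map a decision score f to a probability 1 / (1 + exp(A*f + B))."""
    z = A * f + B
    # Evaluate in a form that avoids overflow for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

A, B = -2.0, 0.0  # hypothetical parameters, not fitted values
p_boundary = score_to_probability(0.0, A, B)
p_far_pos = score_to_probability(3.0, A, B)
p_far_neg = score_to_probability(-3.0, A, B)
```

With B = 0 a sample on the hyperplane receives probability 0.5, and samples far from the hyperplane approach 0 or 1.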

Given a data set $\{(x'_k, y_k)\}_{k=1}^{N}$, an SVM classifier and the parameters A and B need to be trained simultaneously. To avoid overfitting, the original training samples are divided into two parts by cross validation: one part is used to train the SVM classification model, and the other part is used to determine A and B of the sigmoid function by maximum likelihood estimation. Following this idea, the training sample set $\{(x'_k, y_k)\}_{k=1}^{N}$ is divided into m groups; an SVM model is trained on m-1 groups and used to calculate the decision scores of the samples in the remaining group. Repeating this m times yields the decision scores of all samples, which form the decision score-label set

$$\{(f_k, y_k)\}_{k=1}^{N}$$
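
The m-group procedure can be sketched as follows. The grouping rule (sample k goes to group k mod m) is an assumption, since the text does not say how the groups are formed, and fit_midpoint_scorer is a hypothetical stand-in for SVM training used only to keep the sketch self-contained.

```python
def out_of_fold_scores(X, y, m, fit):
    """Return the decision score-label pairs [(f_k, y_k), ...].

    For each of the m groups, fit(X_train, y_train) is called on the other
    m-1 groups and must return a callable that scores a single sample; that
    callable is then applied to the held-out group.
    """
    n = len(X)
    scores = [None] * n
    for g in range(m):
        held_out = [k for k in range(n) if k % m == g]
        train = [k for k in range(n) if k % m != g]
        score = fit([X[k] for k in train], [y[k] for k in train])
        for k in held_out:
            scores[k] = score(X[k])
    return list(zip(scores, y))

def fit_midpoint_scorer(X_train, y_train):
    """Hypothetical stand-in for an SVM: score = feature minus class midpoint."""
    pos = [x[0] for x, lab in zip(X_train, y_train) if lab == 1]
    neg = [x[0] for x, lab in zip(X_train, y_train) if lab == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: x[0] - mid

X = [[0.1], [0.9], [0.2], [1.1], [0.0], [1.0]]
y = [0, 1, 0, 1, 0, 1]
score_label_set = out_of_fold_scores(X, y, m=3, fit=fit_midpoint_scorer)
```

Every score is produced by a model that never saw the corresponding sample, which is what keeps the later fitting of A and B from simply reproducing the training fit.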

The parameters A and B are solved by maximum likelihood estimation as follows:

$$\min_{A,B} F(A, B) = -\sum_{k=1}^{N} \left[ t_k \ln p_k + (1 - t_k) \ln (1 - p_k) \right]$$

$$p_k = \frac{1}{1 + \exp(A f_k + B)}, \qquad t_k = \begin{cases} \dfrac{N_+ + 1}{N_+ + 2}, & y_k = 1 \\[2ex] \dfrac{1}{N_- + 2}, & y_k = 0 \end{cases}$$

where $p_k$ is the estimated probability that the k-th sample belongs to the positive class, $t_k$ is the target probability of the k-th sample, $N_+$ is the number of positive samples among all samples, and $N_-$ is the number of negative samples among all samples. Solving $\min F(A, B)$ gives the parameters A and B, so that the decision score of the SVM model is converted into a probability output.
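
A minimal sketch of fitting A and B by maximum likelihood is given below. It assumes the regularized targets commonly used with this sigmoid fitting, (N+ + 1)/(N+ + 2) for positive and 1/(N- + 2) for negative samples, and uses plain gradient descent rather than any particular solver; the step size, iteration count and toy scores are arbitrary choices.

```python
import math

def targets(labels):
    """Target probabilities built from the positive and negative counts."""
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    return [t_pos if lab == 1 else t_neg for lab in labels]

def prob(f, A, B):
    """p = 1 / (1 + exp(A*f + B)), evaluated without overflow."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_sigmoid(scores, labels, lr=0.05, iters=5000):
    """Minimize F(A, B) = -sum[t ln p + (1 - t) ln(1 - p)] by gradient descent."""
    t = targets(labels)
    A, B = 0.0, 0.0
    for _ in range(iters):
        p = [prob(f, A, B) for f in scores]
        # dF/dA = sum (t_k - p_k) f_k and dF/dB = sum (t_k - p_k)
        grad_A = sum((tk - pk) * fk for tk, pk, fk in zip(t, p, scores))
        grad_B = sum(tk - pk for tk, pk in zip(t, p))
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B

# Toy decision score-label set (made-up values).
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]
A, B = fit_sigmoid(scores, labels)
```

Because the targets are pulled slightly away from 0 and 1, the optimum stays finite even when the scores separate the two classes perfectly, as they do in this toy set.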

The probability estimate reflects the likelihood that the sample x' to be identified belongs to each category and effectively normalizes the distance between the sample and the classification hyperplane; for an ordinary binary classification problem, the output probability $p_k$ lies between 0% and 100%. In addition to the predicted label given by the classification model, the probability output provides more information, so that the personnel concerned can understand the situation more comprehensively.

4. Simulation verification of scene probability prediction model

In the calculation example, a heavy-load typical scene of a key section of an electric power system is taken as an example, and an IEEE 300-node system is used for simulation verification of the typical scene probability prediction based on dynamic time sequence prediction. To construct training samples of sufficient quantity and variety, so that the LSTM dynamic time sequence prediction model and the SVM scene probability prediction model generalize well, actual load data acquired from the Huazhong power grid are first connected to the IEEE 300-node system as a fluctuation source. The values of the monitored variables at each time section are then obtained through power flow calculation, forming a complete data sample. Finally, the data samples are fed to the time sequence prediction model and the scene probability prediction model, so that a classification model capable of online rolling prediction of future scene probabilities is trained.

Based on the above idea, 360 days of actual data from two actual load points are selected and connected to two nodes of the system, with a load data sampling interval of 15 min. Fig. 2 and Fig. 3 show the characteristics of partial segments of the actual data; it can be seen that the actual data fluctuate considerably. After time-series power flow simulation, sample data for 34560 time points are formed in total, containing 1960 candidate data features. According to the maximum mutual information coefficient method, 19 key features are screened out, and the specific information is shown in Fig. 4.

The extracted data of the key variables are input into an LSTM network for training. In this research, the data of the first 301 days are used for training and the data of the remaining 59 days are used for testing; the same split is applied to the SVM scene classification model.
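
The sample counts implied by the figures above (360 days, 15 min sampling, 301 training days) can be checked with a short sketch. The split is assumed to be chronological, first days for training and last days for testing, and the helper name chronological_split is illustrative.

```python
POINTS_PER_DAY = 24 * 60 // 15   # 15 min sampling interval
TOTAL_DAYS = 360
TRAIN_DAYS = 301

def chronological_split(samples, train_days, points_per_day):
    """Split a time-ordered sample list into leading and trailing parts."""
    cut = train_days * points_per_day
    return samples[:cut], samples[cut:]

# Sample indices stand in for the actual feature rows.
samples = list(range(TOTAL_DAYS * POINTS_PER_DAY))
train, test = chronological_split(samples, TRAIN_DAYS, POINTS_PER_DAY)
```

Keeping the test days strictly after the training days avoids using future information when the models are evaluated.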

After the complete sample set is divided into a training set and a test set, cross validation combined with grid search is applied to the training set to learn the optimal model and parameter settings, so that the trained model generalizes as well as possible. Once the model parameters are determined, the model is trained on all of the data so that the constructed model learns as much knowledge as possible. After the time sequence prediction model is constructed, a similar method is adopted: the data of the first 301 days are used to train the SVM classification model, with whether a certain key section is overloaded as the label in this example (that is, an overloaded key section is a positive example and a non-overloaded one is a negative example). The SVM model thus learns the data characteristics of overload samples from historical data and can then make accurate scene classification predictions based on the LSTM time sequence predictions. Fig. 5 shows the decrease of the loss during training of the optimal LSTM model. Figs. 6-8 illustrate the dynamic time sequence prediction effect for some of the variables.

As shown by the time sequence prediction results in Figs. 6-8, the time sequence prediction model constructed in the invention predicts the short-term future values of the key variables well, which lays a solid foundation for the subsequent scene probability prediction. With the LSTM model and the SVM model constructed, and using the scene probability prediction method described above, the decision score of the SVM is converted into a scene probability through the sigmoid function. Quantitative probability early warning of a typical scene of the future power system based on dynamic time sequence prediction is thereby realized, and the probability prediction effect is shown in Fig. 9.
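
The chain from time sequence forecast to scene probability can be sketched as a single function. The forecaster and the decision function below are hypothetical stand-ins (a persistence forecast and a linear score) for the trained LSTM and SVM models, and the values of A and B are assumed rather than fitted.

```python
import math

def sigmoid_probability(f, A, B):
    """p = 1 / (1 + exp(A*f + B)), evaluated without overflow."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def predict_scene_probability(X_t, forecast, decision_function, A, B):
    """Forecast the features, score them, then map the score to a probability."""
    x_future = forecast(X_t)           # predicted features alpha steps ahead
    f = decision_function(x_future)    # decision score of the classifier
    return x_future, f, sigmoid_probability(f, A, B)

# Hypothetical stand-ins for the trained models.
def persistence_forecast(window):
    return window[-1]                  # repeat the latest observation

def linear_score(x):
    return x[0] - 0.8                  # loading level minus an assumed limit

X_t = [[0.70], [0.75], [0.90]]         # latest window, one feature
x_future, f, p = predict_scene_probability(
    X_t, persistence_forecast, linear_score, A=-10.0, B=0.0
)
```

In online use the same call would be repeated at every new time point with the most recent window, giving a rolling probability forecast.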

As can be seen from Fig. 9, the SVM model predicts the future scene category of the system well: in the simulation of 96 points in one day, only one time point is predicted incorrectly. With the scene probability prediction model provided by the invention, even when the category prediction is incorrect, the probability prediction information can still help the dispatching personnel judge the situation. With scene probability prediction available, the judgment of the future scene type is no longer a binary zero-or-one decision; the probability information gives the dispatching personnel more confidence in their judgment and thereby helps them make more accurate and effective decisions. As can be seen from the figure, the probability prediction curve (dotted line) tracks the changes of the typical scene category of the system well and provides more reference information on top of the category information.

The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims

1. A typical scene probability prediction method of an electric power system based on dynamic time sequence prediction, characterized by comprising the following steps:

constructing, through cross validation and grid search and based on a long short-term memory network and historical time series data, a dynamic time sequence prediction model for the associated characteristic variables; modeling data knowledge by using a support vector machine (SVM), performing classification learning on the historical samples of the screened features, and acquiring a decision score for each data sample;

2. The prediction method of claim 1, wherein a target typical scene of the power system to be predicted is determined, and a state sequence of the target typical scene is constructed according to historical data information:

$$Y = \{y_1, y_2, \ldots, y_N\}$$

where N is the total number of data points and $y_k$ is the value of the target scene state at the k-th time point, $y_k \in \{0, 1\}$; $y_k = 1$ indicates that the target scene occurs (positive example) and $y_k = 0$ indicates that it does not occur (negative example); at the same time, the time series data of each characteristic variable recorded in the historical data information are acquired and recorded as X; then the maximum mutual information coefficient (MIC) between each characteristic variable and the target typical scene state sequence Y is obtained by the MIC measurement method, and a threshold is set to remove low-correlation characteristic variables, giving a feature subset Q of dimension D', where D' equals the total number D of characteristic variables minus the number of characteristic variables removed; then the time series data of the characteristic variables in the feature subset Q are extracted from the historical data and combined with the target typical scene state sequence Y to form a multi-dimensional time series data set.

3. The prediction method according to claim 1, wherein a long short-term memory network is used to construct a dynamic time sequence prediction model for each characteristic variable in the feature subset Q, referred to as the characteristic variable dynamic time sequence prediction model; the training set for the long short-term memory network consists of pairs $(X'_k, x'_{k+\alpha})$, where $X'_k$ is the input multi-dimensional time series sample, $x'_{k+\alpha}$ is the regression prediction target corresponding to sample $X'_k$, and α is the number of time steps by which the time sequence prediction leads; through cross validation and grid search, the trained characteristic variable dynamic time sequence prediction model predicts the characteristic variables in the feature subset Q α time steps ahead.

4. The prediction method according to claim 1, wherein a supervised classification data set for predicting the target typical scene of the power system is established:

$$\{(x'_k, y_k)\}_{k=1}^{N}$$

where $y_k = 1$ denotes that sample $x'_k$ is a positive example and $y_k = 0$ denotes that it is a negative example; the data set is then divided into m groups; m-1 of the groups are extracted and a typical scene classification model of the power system is constructed by using a support vector machine model to obtain the SVM decision function $f(\cdot)$; the decision function $f(\cdot)$ is then used to obtain and store the decision scores of the samples in the remaining group; this process is repeated m times, with a different set of m-1 groups extracted each time, so that a decision score is obtained for every data sample, and a decision score-label set is established:

$$\{(f_k, y_k)\}_{k=1}^{N}$$

where, for the k-th data sample $x'_k$, $f_k$ is the decision score of the support vector machine model and $y_k$ is the target scene state value at the k-th time point, $k = 1, 2, \ldots, N$.

5. The prediction method of claim 1, wherein the decision score output by the SVM is modified by using a sigmoid function with parameters A and B, and the decision score is mapped to the [0, 1] interval and based on the decision score-label set obtained in step 3

6. The prediction method according to claim 1, characterized in that the decision score mapping based on the sigmoid function takes the following form:

$$P(y = 1 \mid x') = \frac{1}{1 + \exp(A f + B)}$$

where A and B are the sigmoid function parameters, f is the decision score corresponding to the input sample x', and $P(y = 1 \mid x')$ is the probability that the input sample x' belongs to the positive class.

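7. The prediction method of claim 1, wherein, based on the obtained decision score-label set $\{(f_k, y_k)\}_{k=1}^{N}$, the parameters A and B are solved by maximum likelihood estimation as follows:

$$\min_{A,B} F(A, B) = -\sum_{k=1}^{N} \left[ t_k \ln p_k + (1 - t_k) \ln (1 - p_k) \right]$$

$$p_k = \frac{1}{1 + \exp(A f_k + B)}, \qquad t_k = \begin{cases} \dfrac{N_+ + 1}{N_+ + 2}, & y_k = 1 \\[2ex] \dfrac{1}{N_- + 2}, & y_k = 0 \end{cases}$$

where $p_k$ is the estimated probability that the k-th sample belongs to the positive class; $t_k$ is the target probability of the k-th sample; $N_+$ is the number of positive samples among all samples, and $N_-$ is the number of negative samples among all samples; by solving $\min F(A, B)$, the parameters A and B are obtained and the decision score of the SVM model is converted into a probability output.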

8. The prediction method of claim 1, wherein the latest multi-dimensional time series sample $X'_t$ available at the current time t is dynamically obtained and input into the characteristic variable dynamic time sequence prediction model established in step 2 to obtain the predicted values $x'_{t+\alpha}$ of the D' characteristic variables α time steps ahead; $x'_{t+\alpha}$ is input into the SVM typical scene classification model of the power system established in step 3 to obtain the corresponding decision score $f_{t+\alpha}$; and the sigmoid function with the parameters A and B determined in step 4 is used to obtain the probability $p_{t+\alpha}$ that sample $x'_{t+\alpha}$ belongs to the positive class, namely the probability that the target typical scene of the power system occurs α time steps in the future, thereby completing the probability prediction of the target typical scene α time steps in advance.