CN115293249A - Power system typical scene probability prediction method based on dynamic time sequence prediction - Google Patents

Info

Publication number
CN115293249A
CN115293249A CN202210877305.5A CN202210877305A CN115293249A CN 115293249 A CN115293249 A CN 115293249A CN 202210877305 A CN202210877305 A CN 202210877305A CN 115293249 A CN115293249 A CN 115293249A
Authority
CN
China
Prior art keywords
prediction
probability
data
power system
target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210877305.5A
Other languages
Chinese (zh)
Inventor
廖思阳
姜新雄
徐箭
李琰
王新迎
尚学军
王天昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
China Electric Power Research Institute Co Ltd CEPRI
State Grid Tianjin Electric Power Co Ltd
Original Assignee
Wuhan University WHU
China Electric Power Research Institute Co Ltd CEPRI
State Grid Tianjin Electric Power Co Ltd
Application filed by Wuhan University WHU, China Electric Power Research Institute Co Ltd CEPRI, State Grid Tianjin Electric Power Co Ltd
Priority to CN202210877305.5A
Publication of CN115293249A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention relates to a method for predicting the probability of a typical power system scene based on dynamic time sequence prediction. First, a dynamic time sequence prediction model is constructed for the characteristic variables. A support vector machine (SVM) then models the data to obtain a decision score for each data sample. A sigmoid function corrects the decision score and maps it to the interval [0,1]. Finally, maximum likelihood estimation determines the sigmoid parameters, yielding a quantitative probability prediction of the target typical scene. The invention uses the maximum mutual information coefficient to reduce the dimensionality of the original data, combines the dynamic time sequence prediction model with a scene classification model, and applies a decision-score-to-probability conversion based on maximum likelihood estimation to turn the classification information of a typical power system scene into probability information. This helps dispatching operators evaluate the future risk of the system and make a more accurate dispatching operation plan.

Description

Power system typical scene probability prediction method based on dynamic time sequence prediction
Technical Field
The invention belongs to the technical field of typical scene probability prediction of an electric power system, and particularly relates to a typical scene probability prediction method of the electric power system based on dynamic time sequence prediction.
Background
With the rapid development of new energy and stochastic loads and their large-scale connection to the power grid, their randomness and intermittency greatly increase the difficulty of grid planning and scheduling. They can also give rise to scenes such as serious faults, heavy loading of important transmission sections, imbalance between power supply and demand, and curtailed new energy consumption. However, the current dispatching mode is passive and cannot meet the requirements of active dispatching. This shows mainly in four ways: operators wait passively for grid accidents, and early warning and prevention of predictable risks such as natural disasters leave room for improvement; there is no effective means of monitoring unit power fluctuations, so dispatchers rely on reports from power plants; there is no means of monitoring sudden public opinion events concerning the grid, so dispatchers rely on reports from lower-level dispatching centers; and the information in technical support systems is scattered, with no centralized monitoring function. If typical operation scenes can be predicted effectively, dispatching operators can be warned in advance and controllable resources can be dispatched as early as possible to take part in the active regulation of the power system. This greatly improves the ability to monitor and predict abnormal states and raises the level of safe and stable grid operation.
Existing research related to power system scene prediction focuses mainly on situation awareness and stability assessment. Situation awareness means perceiving a large number of environmental elements in time and space, understanding their meaning, and predicting their state in the near future in order to gain a decision advantage. The definition was first proposed by Endsley in 1988: the extraction and understanding of environmental elements within a certain time and space, and the prediction of their short-term future state. Situation awareness is generally divided into three levels: perception, understanding, and prediction. Its core parts include extraction of situation elements, understanding of the current situation, prediction of the future situation, decision making, and action implementation. Research on stability assessment mainly covers static stability assessment and transient stability assessment. Existing research on static stability prediction includes line power flow limit violation prediction, node voltage limit violation prediction, and limit transmission capacity prediction. Most of these methods build a data-driven learning model from some of the measurable variables of the system in order to make predictions. However, their output is usually the state type of the future power system or the value of a particular index of concern. In general, type information or index values alone can hardly reflect how critical a power system scene is, so they cannot help dispatching operators judge the situation accurately. The output of these methods therefore falls short of its intended role, and operators still have to reassess the situation from their own experience and formulate strategies.
Based on the above analysis, and given the rapid development of new energy, the operating state of the system is becoming more complex and changeable. To fully grasp how system operation is developing and to provide more feed-forward information for formulating dispatching strategies, thereby improving the situation awareness capability of the power system, the invention designs a probability output method that corrects the decision scores of a support vector machine (SVM) with a sigmoid function. This provides quantitative probability early warning of the future state of a target scene or key element, and thus a more reliable indication for system operation and dispatching. The method first constructs a time sequence prediction model of the key variables of the typical scene based on a long short-term memory (LSTM) network. An SVM then performs classification learning on historical samples of the screened features, so that the samples are classified accurately according to their scene attributes and future states can be predicted accurately. Next, a sigmoid function performs probability calibration learning on the decision scores output by the SVM model, and the parameters of the sigmoid function are determined by the maximum likelihood method, which completes the mapping from decision scores to probability values. This finally achieves quantitative probability prediction of the target typical scene of the power system.
Disclosure of Invention
The invention provides a method for predicting the probability of a typical power system scene based on dynamic time sequence prediction. First, according to the physical characteristics of the target typical scene of the power system, a feature subset related to the scene is screened out using the maximum mutual information coefficient measure; time series data of the characteristic variables are collected from historical data and combined with the state sequence of the target typical scene to form a multi-dimensional time series dataset. Second, based on a long short-term memory (LSTM) network and the historical time series data, a dynamic time sequence prediction model for the associated characteristic variables is constructed through cross validation and grid search. Third, the scene prediction problem is converted into a classification problem: a support vector machine (SVM) models the data and yields a decision score for each data sample. Fourth, a parameterized sigmoid function corrects the decision score output of the SVM and maps it to the interval [0,1], giving a probability output. Finally, the parameter values of the sigmoid function are determined by maximum likelihood estimation, which completes the mapping from decision scores to probability values and achieves quantitative probability prediction of the target typical scene of the power system.
The power system typical scene probability prediction method based on dynamic time sequence prediction provided by the invention is characterized by comprising the following steps:
according to the physical characteristics of the target typical scene of the power system, screening out a feature subset related to the target typical scene using the maximum mutual information coefficient measure, collecting time series data of the characteristic variables from historical data, and combining them with the state sequence of the target typical scene to form a multi-dimensional time series dataset;
based on a long short-term memory (LSTM) network and the historical time series data, constructing a dynamic time sequence prediction model for the associated characteristic variables through cross validation and grid search; modeling the data with a support vector machine (SVM), performing classification learning on the historical samples of the screened features, and obtaining a decision score for each data sample;
correcting the decision score output of the SVM with a parameterized sigmoid function so that the decision score is mapped to the interval [0,1]; determining the parameter values of the sigmoid function by maximum likelihood estimation, which realizes the mapping from decision scores to probability values and gives a probability prediction model of the target typical scene of the power system; and finally, combining this model with the dynamic time sequence prediction results of the characteristic variables to achieve quantitative probability prediction of the target typical scene of the power system.
In the above prediction method, the target typical scene of the power system to be predicted is determined, and its state sequence is constructed from historical data:
$$Y = \{y_1, y_2, \ldots, y_N\}$$
where $N$ is the total number of data points and $y_k \in \{0,1\}$ is the state of the target scene at the $k$-th time point: $y_k = 1$ indicates that the target scene occurred (positive example), and $y_k = 0$ indicates that it did not occur (negative example). The time series data of every characteristic variable recorded in the historical data are collected at the same time and denoted $X$. The maximum mutual information coefficient (MIC) between each characteristic variable and the target typical scene state sequence $Y$ is then computed, and a threshold is set to eliminate low-correlation characteristic variables, giving the feature subset
$$Q = \{F_1, F_2, \ldots, F_{D'}\}$$
where $D'$ equals the total number of characteristic variables $D$ minus the number of eliminated characteristic variables. The time series data of the characteristic variables contained in $Q$ are then extracted from the historical data and combined with the target typical scene state sequence $Y$ to form a multi-dimensional time series dataset.
$$X = \{ts_1, ts_2, \ldots, ts_D\}, \qquad ts_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,N}\}$$
In the above formula, $ts_i$ denotes the time series data of the $i$-th characteristic variable $F_i$, $x_{i,k}$ is the measured value at the $k$-th time point in the time series data of the $i$-th characteristic variable, $D$ is the total number of characteristic variables, and $N$ is the total number of data points, that is, the length of the time series data.
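The screening and dataset assembly described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the helper names `screen_features` and `build_dataset`, the feature names, the MIC scores, and the threshold value 0.3 are all hypothetical, and the MIC scores are assumed to have been computed beforehand.

```python
def screen_features(mic_scores, series, threshold):
    """Keep the features whose MIC with the state sequence Y reaches the threshold.

    mic_scores: {feature name: MIC in [0, 1]} (assumed precomputed)
    series:     {feature name: list of N measurements}
    Whether the comparison is >= or > is an assumption of this sketch.
    """
    kept = [name for name, score in mic_scores.items() if score >= threshold]
    return kept, {name: series[name] for name in kept}


def build_dataset(kept, kept_series, y):
    """Return N rows, each a pair (feature vector of length D', state y_k)."""
    n = len(y)
    for name in kept:
        if len(kept_series[name]) != n:
            raise ValueError(f"series {name!r} has length {len(kept_series[name])}, expected {n}")
    return [([kept_series[name][k] for name in kept], y[k]) for k in range(n)]


# Illustrative values only (not taken from the patent).
mic_scores = {"load": 0.82, "wind": 0.35, "temperature": 0.10}
series = {
    "load": [0.61, 0.74, 0.93, 0.88],
    "wind": [0.20, 0.15, 0.05, 0.10],
    "temperature": [21.0, 22.5, 22.0, 21.5],
}
y = [0, 0, 1, 1]

kept, kept_series = screen_features(mic_scores, series, threshold=0.3)
dataset = build_dataset(kept, kept_series, y)
print(kept)        # the D' retained feature names
print(dataset[2])  # feature vector and scene state at the third time point
```

Here D = 3 candidate variables are reduced to D' = 2, and each row of `dataset` pairs the retained measurements at one time point with the scene state at that time point.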
In the above prediction method, based on the acquired multi-dimensional time series dataset, a long short-term memory (LSTM) network is used to construct a dynamic time sequence prediction model for the characteristic variables contained in the feature subset $Q$, referred to as the characteristic variable dynamic time sequence prediction model. The training set of the LSTM network is
$$\{(X'_k,\ x'_{k+\alpha})\}$$
where $X'_k$ is an input multi-dimensional time series sample, $x'_{k+\alpha}$ is the prediction target corresponding to sample $X'_k$, and $\alpha$ is the number of time steps by which the time series prediction leads. Trained through cross validation and grid search, the characteristic variable dynamic time sequence prediction model predicts the characteristic variables in the feature subset $Q$ $\alpha$ time steps ahead.
$$X'_k = [x'_{k-L+1}, \ldots, x'_{k-1}, x'_k]$$
where $L$ is the length of each time series segment in a multi-dimensional time series sample.
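A short sketch of how such training pairs could be cut from the dataset is given below. It assumes that each sample $X'_k$ is the window of the $L$ most recent feature vectors ending at time point $k$ and that the target is the feature vector $\alpha$ steps later; the function name `make_windows` and the toy data are hypothetical, and the LSTM training itself (with cross validation and grid search) is not shown.

```python
def make_windows(rows, L, alpha):
    """Build (X'_k, x'_{k+alpha}) pairs from a list of N feature vectors.

    rows:  list of N feature vectors, one per time point
    L:     number of time points in each input window (assumed window layout)
    alpha: number of time steps by which the prediction leads
    """
    if L < 1 or alpha < 1:
        raise ValueError("L and alpha must be positive integers")
    pairs = []
    # k is the (0-based) index of the last time point inside the window.
    for k in range(L - 1, len(rows) - alpha):
        window = rows[k - L + 1 : k + 1]   # the L most recent vectors up to k
        target = rows[k + alpha]           # the vector alpha steps ahead
        pairs.append((window, target))
    return pairs


# Toy series with N = 6 time points and D' = 2 features (illustrative only).
rows = [[float(k), 10.0 * k] for k in range(6)]
pairs = make_windows(rows, L=3, alpha=2)
for window, target in pairs:
    print(window, "->", target)
```

With N = 6, L = 3 and α = 2, only two windows have a target inside the series, which shows how the usable sample count shrinks as L or α grows.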
In the above prediction method, a supervised classification dataset for predicting the target typical scene of the power system is built from the multi-dimensional time series dataset:
$$\{(x'_k,\ y_k)\}_{k=1}^{N}$$
where $y_k = 1$ denotes that sample $x'_k$ is a positive example and $y_k = 0$ denotes that it is a negative example. The samples are divided into $m$ groups. $m-1$ groups are selected and used to train an SVM-based typical scene classification model of the power system, giving an SVM decision function $f(\cdot)$; $f(\cdot)$ is then applied to the remaining group to obtain its decision scores, which are stored. This process is repeated $m$ times with a different selection of $m-1$ groups each time, so that a decision score is obtained for every data sample, and a decision score-label set is built:
$$\{(f_k,\ y_k)\}_{k=1}^{N}$$
where $f_k$ is the decision score of the support vector machine model for the $k$-th data sample $x'_k$, and $y_k$ is the state of the target scene at the $k$-th time point, $k = 1, 2, \ldots, N$.
Figure BDA0003762985380000043
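The m-group procedure above amounts to collecting out-of-fold decision scores, which can be sketched as follows. To keep the example dependency-free, the classifier is a simple class-centroid linear scorer (`fit_centroid_scorer`), which is a stand-in and not an SVM; in practice the `fit` argument would train an SVM and return its decision function (for example the `decision_function` of scikit-learn's `SVC`). The interleaved fold assignment `k % m` and the toy data are also assumptions of this sketch.

```python
def fit_centroid_scorer(samples, labels):
    """Stand-in for SVM training (NOT an SVM): f(x) = w.x + b from class centroids."""
    dim = len(samples[0])
    pos = [x for x, y in zip(samples, labels) if y == 1]
    neg = [x for x, y in zip(samples, labels) if y == 0]
    mu_pos = [sum(x[j] for x in pos) / len(pos) for j in range(dim)]
    mu_neg = [sum(x[j] for x in neg) / len(neg) for j in range(dim)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    # The decision boundary passes through the midpoint of the two centroids.
    b = -sum(wj * (p + n) / 2.0 for wj, p, n in zip(w, mu_pos, mu_neg))
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b


def out_of_fold_scores(samples, labels, m, fit=fit_centroid_scorer):
    """Train on m-1 groups, score the held-out group, repeat m times.

    Returns the decision score-label set [(f_k, y_k)] in the original sample order.
    """
    n = len(samples)
    scores = [None] * n
    for g in range(m):
        train = [k for k in range(n) if k % m != g]
        held_out = [k for k in range(n) if k % m == g]
        f = fit([samples[k] for k in train], [labels[k] for k in train])
        for k in held_out:
            scores[k] = f(samples[k])
    return list(zip(scores, labels))


# Toy one-feature samples (illustrative only).
samples = [[0.1], [0.9], [0.2], [1.1], [0.0], [1.0], [0.3], [0.8]]
labels = [0, 1, 0, 1, 0, 1, 0, 1]
score_label_set = out_of_fold_scores(samples, labels, m=4)
for f_k, y_k in score_label_set:
    print(round(f_k, 3), y_k)
```

Because every score comes from a model that never saw the sample it scores, the resulting set is less optimistic than scores computed on the training data, which matters when the set is later used to fit the sigmoid parameters.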
In the above prediction method, a sigmoid function with parameters $A$ and $B$ corrects the decision score output by the SVM and maps it to the interval $[0,1]$. Based on the decision score-label set $\{(f_k, y_k)\}_{k=1}^{N}$ obtained in step 3, the values of the parameters $A$ and $B$ of the sigmoid function are determined by maximum likelihood estimation, which converts decision scores into occurrence probabilities of the target typical scene.
The decision score mapping based on the sigmoid function is:
$$P(y = 1 \mid x') = \frac{1}{1 + \exp(Af + B)}$$
where $A$ and $B$ are the sigmoid function parameters, $f$ is the decision score corresponding to the input sample $x'$, and $P(y = 1 \mid x')$ is the probability that the input sample $x'$ is a positive example.
Based on the decision score-label set $\{(f_k, y_k)\}_{k=1}^{N}$, the parameters $A$ and $B$ are solved by maximum likelihood estimation as follows:
$$\min_{A,B} F(A,B) = -\sum_{k=1}^{N}\left[t_k \log p_k + (1 - t_k)\log(1 - p_k)\right]$$
$$t_k = \begin{cases}\dfrac{N_+ + 1}{N_+ + 2}, & y_k = 1\\[6pt] \dfrac{1}{N_- + 2}, & y_k = 0\end{cases}$$
where
$$p_k = \frac{1}{1 + \exp(Af_k + B)}$$
is the probability estimate that the $k$-th sample is a positive example, $N_+$ is the number of positive examples among all samples, and $N_-$ is the number of negative examples among all samples. Solving $\min F(A,B)$ gives the parameters $A$ and $B$, which converts the decision score of the SVM model into a probability output.
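The parameter fit can be sketched in Python as below. This is a minimal illustration under stated assumptions: the objective is taken to be the standard sigmoid-calibration negative log-likelihood with smoothed targets $t_k = (N_+ + 1)/(N_+ + 2)$ for positive examples and $t_k = 1/(N_- + 2)$ for negative examples, and plain gradient descent with a hypothetical learning rate and step count replaces whatever solver the inventors used. The scores are illustrative values, not results from the patent.

```python
import math


def sigmoid_prob(f, A, B):
    """P(y=1 | f) = 1 / (1 + exp(A*f + B)), evaluated without overflow."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def smoothed_targets(labels):
    """Targets t_k built from the class counts N+ and N- (assumed form)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    return [t_pos if y == 1 else t_neg for y in labels]


def neg_log_likelihood(scores, t, A, B):
    """F(A, B) = -sum_k [t_k log p_k + (1 - t_k) log(1 - p_k)]."""
    total = 0.0
    for f, tk in zip(scores, t):
        p = min(max(sigmoid_prob(f, A, B), 1e-12), 1.0 - 1e-12)
        total -= tk * math.log(p) + (1.0 - tk) * math.log(1.0 - p)
    return total


def fit_sigmoid(scores, labels, lr=0.5, steps=5000):
    """Minimise F(A, B) by gradient descent on the mean gradient (sketch only)."""
    t = smoothed_targets(labels)
    n = len(scores)
    A, B = 0.0, 0.0
    for _ in range(steps):
        residuals = [tk - sigmoid_prob(f, A, B) for f, tk in zip(scores, t)]
        grad_A = sum(r * f for r, f in zip(residuals, scores)) / n  # dF/dA = sum (t_k - p_k) f_k
        grad_B = sum(residuals) / n                                  # dF/dB = sum (t_k - p_k)
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B


# Illustrative decision scores and labels (not from the patent).
scores = [-0.35, 0.28, -0.27, 0.44, -0.42, 0.36, -0.18, 0.20]
labels = [0, 1, 0, 1, 0, 1, 0, 1]
A, B = fit_sigmoid(scores, labels)
print("A =", A, "B =", B)
print("P(y=1 | f=0.4) =", sigmoid_prob(0.4, A, B))
```

With this sign convention a useful fit has $A < 0$, so that larger decision scores map to higher probabilities. The fixed learning rate suits scores of roughly unit scale; a Newton-type solver would be a more robust choice in practice.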
In the above prediction method, the latest multi-dimensional time series sample $X'_t$ available at the current time $t$ is acquired dynamically in real time and input into the established characteristic variable dynamic time sequence prediction model, giving the predicted values $x'_{t+\alpha}$ of the $D'$ characteristic variables $\alpha$ time steps ahead. $x'_{t+\alpha}$ is input into the SVM-based typical scene classification model of the power system established in step 3 to obtain the corresponding decision score $f_{t+\alpha}$, and the sigmoid function with the parameters $A$ and $B$ determined in step 4 gives the probability $p_{t+\alpha}$ that sample $x'_{t+\alpha}$ is a positive example. This is the probability that the target typical scene of the power system occurs $\alpha$ time steps in the future, which completes the probability prediction of the target typical scene $\alpha$ time steps ahead.
$$f_{t+\alpha} = f(x'_{t+\alpha})$$
$$p_{t+\alpha} = \frac{1}{1 + \exp(Af_{t+\alpha} + B)}$$
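The online prediction step chains the three trained components, as the following sketch shows. The forecaster and the decision function used here are hypothetical stand-ins (a persistence forecast and a hand-written linear score), and the values of `A` and `B` are illustrative; in the method described above they would be the trained LSTM model, the SVM decision function, and the fitted sigmoid parameters.

```python
import math


def predict_scene_probability(window, forecast, decision_function, A, B):
    """Return p_{t+alpha} for the latest sample window X'_t."""
    x_future = forecast(window)              # x'_{t+alpha}: predicted feature vector
    f_future = decision_function(x_future)   # f_{t+alpha}: decision score
    z = A * f_future + B
    # 1 / (1 + exp(z)), written so that exp never overflows.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Hypothetical stand-ins for the trained models (assumptions of this sketch).
def persistence_forecast(window):
    return window[-1]          # predicts that the latest vector persists


def toy_decision(x):
    return x[0] - 0.5          # positive score when the first feature exceeds 0.5


latest_window = [[0.2], [0.6], [0.9]]
p = predict_scene_probability(latest_window, persistence_forecast, toy_decision, A=-4.0, B=0.0)
print("probability of the target scene alpha steps ahead:", p)
```

Passing the models in as callables keeps the step independent of how they were trained, so the same function serves whenever a new sample window arrives.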
The invention is the first to propose a power system typical scene probability prediction method based on dynamic time sequence prediction. It adds a probability output to the type prediction of typical power system scenes, giving dispatching operators more reference information and helping them formulate a more accurate dispatching control strategy. First, a feature subset strongly related to the target scene is screened out using the maximum mutual information coefficient measure, time series data of the characteristic variables are collected from historical data, and a multi-dimensional time series dataset is formed by combining them with the scene state sequence. Next, a dynamic time sequence prediction model of the associated characteristic variables is constructed based on a long short-term memory (LSTM) network and the historical time series data. A support vector machine then learns from the scene data samples to obtain a decision score for each data sample, and a parameterized sigmoid function corrects the decision score of the SVM and maps it to the interval [0,1]. Finally, the parameter values of the sigmoid function are determined by maximum likelihood estimation, achieving quantitative probability prediction of a typical power system scene. The invention has the following advantages:
1. The maximum mutual information coefficient measure reduces the dimensionality of the original data, which improves the subsequent training efficiency of the LSTM-based characteristic variable dynamic time sequence prediction model and the SVM-based typical scene classification model, and reduces the complexity of the overall model.
2. Combining the characteristic variable dynamic time sequence prediction model with the typical scene classification model realizes dynamic perception of the future state of the power system and supports a comprehensive understanding of how power system scenes are trending.
3. The decision-score-to-probability conversion based on maximum likelihood estimation turns the classification information of the target typical scene of the power system into probability information, which helps dispatching operators evaluate the future risk of the system and make a more accurate dispatching operation plan.
Drawings
FIG. 1 is a diagram of an SVM classification model.
FIG. 2 is a schematic diagram showing the fluctuation segment of the actual load 1 in the embodiment of the present invention.
FIG. 3 is a schematic diagram showing the fluctuation segment of the actual load 2 in the embodiment of the present invention.
FIG. 4 is a schematic diagram of key variables extracted based on MIC in the present invention.
FIG. 5 is a schematic diagram of an optimal prediction model training process according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the time-series prediction effect of feature 8 according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the time-series prediction effect of the feature 17 according to the embodiment of the present invention.
FIG. 8 is a diagram illustrating the time-series prediction effect of the feature 15 according to the embodiment of the present invention.
FIG. 9 is a diagram illustrating the effect of scene probability prediction in an embodiment of the present invention.
FIG. 10 is a schematic flow chart of the method of the present invention.
Detailed Description
The technical scheme of the invention is further specifically described by the following examples and the accompanying drawings.
1. Typical scene associated feature selection based on maximum mutual information coefficient
To screen out the feature subset most relevant to the target typical scene of the power system and to reduce the computational complexity of building the subsequent learning models, the method measures the correlation between each candidate feature and the target typical scene with the maximum mutual information coefficient and sets a threshold to eliminate low-correlation features.
The maximum mutual information coefficient is an effective and robust measure of the correlation between two variables, and it captures a wide range of both linear and nonlinear relationships. Its basic idea is to lay a grid over the scatter plot of the two variables and evaluate their correlation from the resulting partition. To obtain the maximum mutual information coefficient between two variables, as many gridding strategies as possible are tried, and the largest correlation value obtained over all of them is called the maximum mutual information coefficient.
Specifically, assume a given set of candidate features
$$F = \{F_1, F_2, \ldots, F_D\}$$
where $F_i$ denotes the $i$-th characteristic variable. Time series data of the characteristic variables are collected from historical data and denoted $X$:
$$X = \{ts_1, ts_2, \ldots, ts_D\}, \qquad ts_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,N}\}$$
where $ts_i$ denotes the time series data of the $i$-th characteristic variable $F_i$, $x_{i,k}$ is the measured value at the $k$-th time point in the time series data of the $i$-th characteristic variable, $D$ is the total number of candidate characteristic variables, and $N$ is the total number of data points, that is, the length of the time series data.
Likewise, the state sequence of the target typical scene can be obtained:
$$Y = \{y_1, y_2, \ldots, y_N\}$$
where $N$ is the total number of data points and $y_k \in \{0,1\}$ is the state of the target scene at the $k$-th time point: $y_k = 1$ indicates that the target scene occurred (positive example), and $y_k = 0$ indicates that it did not occur (negative example). Thus, for the $i$-th characteristic variable $F_i$, a set of data pairs can be built from its historical data and the target scene state sequence:
$$D_i = \{(x_{i,k},\ y_k)\}_{k=1}^{N}$$
Plotting these pairs in a two-dimensional coordinate system gives a scatter distribution. The horizontal axis is divided into $a$ intervals and the vertical axis into $b$ intervals (intervals containing no data points are allowed). This is an $a$-by-$b$ grid, denoted $G$, where $a$ and $b$ are positive integers. Given a grid $G$, let $D_i|_G$ denote the distribution of the points of $D_i$ over the cells of $G$. For a given dataset $D_i$, different grids $G$ yield different distributions $D_i|_G$. For a dataset $D_i$ and fixed parameters $a$ and $b$, the maximum mutual information obtainable over all possible grids $G$ is:
$$I^*(D_i, a, b) = \max_G I(D_i|_G)$$
where $I(D_i|_G)$ is the mutual information between the variables computed from the distribution $D_i|_G$. The characteristic matrix of $D_i$ with respect to $I^*$ is then defined as:
$$M(D_i)_{a,b} = \frac{I^*(D_i, a, b)}{\log \min(a, b)}$$
where $M(D_i)$ is the characteristic matrix of $D_i$ and $M(D_i)_{a,b}$ is its element in row $a$ and column $b$. The denominator normalizes the mutual information to the interval $[0,1]$, which ensures a fair comparison of maximum mutual information coefficients across different features. The maximum mutual information coefficient (MIC) between the $i$-th characteristic variable $F_i$ and the target scene label $Y$ is defined as the maximum element of the matrix $M(D_i)$:
$$\mathrm{MIC}(F_i, Y) = \max_{ab < B(N)} M(D_i)_{a,b}$$
where $\mathrm{MIC}(F_i, Y)$ is the maximum mutual information coefficient between the $i$-th characteristic variable $F_i$ and the target scene label $Y$, and $B(N)$ is a function of the number of data points $N$ that controls how many grids need to be considered, usually set to $B(N) = N^{0.6}$. MIC values lie in the interval $[0,1]$, and a larger value indicates a higher correlation between the two variables. Computing the maximum mutual information coefficient between each of the $D$ candidate variables and the target scene label sequence in turn gives the correlation between every candidate variable and the target scene. Setting a threshold then screens out the key characteristic variable subset
$$Q = \{F_1, F_2, \ldots, F_{D'}\}$$
where $D'$ equals the total number of candidate variables $D$ minus the number of eliminated characteristic variables. The time series data of the characteristic variables contained in the feature subset $Q$ can then be extracted from the historical data and combined with the target typical scene state sequence $Y$ to form a multi-dimensional time series dataset.
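The MIC definition above can be illustrated with a simplified estimator. This sketch makes several assumptions that depart from the full definition: because $Y$ is binary, the vertical axis is always split into $b = 2$ cells; the horizontal axis is split into equal-frequency intervals instead of searching over all possible grids, so the result is only a lower-bound style approximation of $I^*$; and natural logarithms are used throughout. The function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xbins, ybins):
    """Mutual information (in nats) of two discrete sequences of equal length."""
    n = len(xbins)
    joint = Counter(zip(xbins, ybins))
    px = Counter(xbins)
    py = Counter(ybins)
    return sum(c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def equal_frequency_bins(values, a):
    """Assign each value to one of a bins holding (nearly) equal numbers of points."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    bins = [0] * len(values)
    for rank, k in enumerate(order):
        bins[k] = rank * a // len(values)
    return bins


def approx_mic(x, y):
    """Approximate MIC(F_i, Y) for a continuous feature x and binary labels y."""
    budget = len(x) ** 0.6            # B(N) = N^0.6
    best = 0.0
    a = 2
    while a * 2 < budget:             # grids with a*b < B(N), where b = 2
        mi = mutual_information(equal_frequency_bins(x, a), y)
        best = max(best, mi / math.log(2))   # log min(a, b) = log 2 when b = 2
        a += 1
    return best


# Toy data (illustrative only): one informative and one uninformative feature.
n = 40
y = [0] * 20 + [1] * 20
informative = [float(k) for k in range(n)]        # y changes exactly at the median
uninformative = [float(k % 2) + 0.01 * k for k in range(n)]
print("informative:", approx_mic(informative, y))
print("uninformative:", approx_mic(uninformative, y))
```

A feature that separates the two scene states scores close to 1, while a feature unrelated to the state scores close to 0, and the threshold is applied to these values. A full implementation would search over grid placements as in the definition.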
2. Construction of characteristic variable dynamic time sequence prediction model
(1) Long short-term memory network
With the explosive growth of data volume and the improvement of computer performance, the traditional neural networks limit the possibility that the efficiency and performance can be further improved so as to deal with big data problems due to the limitations of the traditional neural networks; conventional machine learning often fails to process data in raw format. For example, when a picture is to be classified, all pixel values of the whole picture are not used as input, but features of the picture are artificially extracted, converted into a digital form, and then used for training a network. Deep learning is one of important components of machine learning, and refers to a method set which is based on a deep neural network and is used for designing a series of algorithms for feature learning and processing so as to solve the problems of detection, tracking, classification and the like of images and texts. Deep learning is representation learning, which is a learning method enabling a machine to automatically detect the characteristics of original data, and is used for a plurality of representation layers which are combined together in a simple but nonlinear module form, and each layer converts the representation into higher-layer abstraction and transmits the higher-layer abstraction to the next layer. Deep learning mainly takes the characteristics of learning data as a core task, and acquires hierarchical data characteristics through different layers of neural networks, so that the problem that the characteristics need to be manually extracted in the past is solved. There are many kinds and variations of Neural networks used in deep learning, and at present, there are two kinds of Neural networks which are most widely applied and are the most common, namely, RNN (Recurrent Neural Network) and CNN (Convolutional Neural Network).
CNN (Convolutional Neural Network): it can be regarded as an advanced version of the standard neural network. It comprises convolutional layers, pooling layers and fully connected layers. This structure enables it to take the full pixel values of a picture as input without manual feature extraction;
RNN (Recurrent Neural Network): a model entirely different from the CNN, dedicated to representing the dynamic behavior of sequence data. An RNN takes sequence data as input, processes the input sequence in combination with an internal state, recurses along the direction in which the sequence evolves, and connects all of its nodes (recurrent units) in a chain. Time series data are data collected at different points in time that reflect how the state or degree of some object or phenomenon changes over time. Sequence data need not be indexed by time (a text sequence is one example), but all sequence data share one property: later data are related to earlier data.
In a traditional neural network, data flow from the input layer through the hidden layer to the output layer, the output is governed by an activation function, adjacent layers are fully connected through weights, and the nodes within each layer are not connected to one another. The activation function is fixed in advance, and what the model learns through training is stored in the weights. However, because such a basic network only establishes weighted connections between layers, it is unsuited to many problems. For example, predicting the next word in a sentence generally requires the preceding words, because the words in a sentence are not independent. The biggest difference in an RNN is that weighted connections are also established between the neurons within a layer. The RNN is called a recurrent neural network because the current output of a sequence also depends on the previous outputs. Concretely, the network memorizes earlier information and applies it to the calculation of the current output; that is, the nodes of the hidden layer are no longer unconnected but connected, and the input of the hidden layer comprises not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. This ability to capture an effective representation of information across a time series has led to the wide use of RNNs in natural language processing (NLP) tasks such as speech recognition, language modeling and machine translation, as well as in many kinds of time series prediction, with a range of strong results.
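The recurrence described above, in which the hidden layer receives both the current input and its own previous output, can be illustrated with a scalar sketch. The weights `w_x`, `w_h`, `b` and the tanh activation are assumptions chosen for illustration; the patent does not specify them.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a scalar recurrent unit over a sequence and return all hidden states."""
    h = h0
    states = []
    for x in inputs:
        # The new hidden state depends on the current input AND the previous hidden state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


sequence = [1.0, 0.0, 0.0]
with_memory = rnn_forward(sequence, w_x=1.0, w_h=0.5, b=0.0)
without_memory = rnn_forward(sequence, w_x=1.0, w_h=0.0, b=0.0)
print(with_memory)       # later states still carry a trace of the first input
print(without_memory)    # with w_h = 0 the unit forgets everything immediately
```

Setting `w_h = 0` removes the hidden-to-hidden connection and reduces the unit to an ordinary feed-forward neuron, which makes the role of the recurrent weight visible.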
The long short-term memory (LSTM) network is a special variant of the RNN and an improvement on the traditional RNN model. Because of vanishing gradients, a plain RNN has only short-term memory; the LSTM introduces additive operations into the network through carefully designed gating, which alleviates the vanishing-gradient problem to some extent and allows long-term dependencies to be learned. However, sequences that are too long still exhibit vanishing gradients (this may occur for lengths exceeding 300), so the LSTM is best described as a short-term memory that lasts somewhat longer. The LSTM was proposed by Hochreiter & Schmidhuber (1997) and later refined and popularized by Alex Graves. It has achieved considerable success and wide use on many problems.
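For reference, the gating and the additive cell-state update mentioned above are usually written as below. This is the commonly used LSTM formulation, given here as an assumption about the variant intended; the symbols W, U, b (weights and biases), σ (logistic function) and ⊙ (element-wise product) are not defined in the patent text.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(additive cell-state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The sum in the update of c_t is the "addition operation" referred to above: gradients can flow through it along the cell state without being repeatedly squashed by an activation function.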
(2) Characteristic variable dynamic time sequence prediction model based on LSTM
Based on the acquired multi-dimensional time series data set, a long short-term memory network is used to construct a dynamic time sequence prediction model for each feature variable included in the feature subset Q, referred to as the characteristic variable dynamic time sequence prediction model. The training input for the long short-term memory network is
Figure BDA0003762985380000091
where X'_k is the input multi-dimensional time series sample, as follows:
Figure BDA0003762985380000092
where D' is the dimension of the feature subset and L is the length of each time series segment in the multi-dimensional time series sample.
x'_{k+α} is the regression prediction target corresponding to sample X'_k, as follows:
Figure BDA0003762985380000101
where α is the number of advance time steps of the timing prediction.
Finally, through cross validation and grid search, the trained characteristic variable dynamic time sequence prediction model can predict the feature variables in the feature subset Q α time steps ahead.
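The construction of training pairs (X'_k, x'_{k+α}) can be sketched as a sliding window. This assumes that X'_k holds the L most recent time steps ending at time k and that the target is the D'-dimensional vector α steps later; the exact layout in the patent's formula images is not recoverable here, and the function name is hypothetical.

```python
def make_windows(series, window_len, lead):
    """Build (X_k, x_{k+lead}) pairs from a multivariate series.

    series: list of time steps, each a list of D' feature values.
    window_len: L, the number of consecutive steps in each input window.
    lead: alpha, how many steps after the window's last step the target lies.
    """
    samples = []
    last_start = len(series) - window_len - lead
    for start in range(last_start + 1):
        end = start + window_len          # exclusive end of the input window
        window = series[start:end]        # L x D' input sample
        target = series[end - 1 + lead]   # D' values, alpha steps after the window
        samples.append((window, target))
    return samples


# Toy series with D' = 2 variables and 6 time steps.
series = [[t, 10 * t] for t in range(6)]
pairs = make_windows(series, window_len=3, lead=2)
print(len(pairs))
print(pairs[0])
```

With 6 time steps, L = 3 and α = 2, only two complete pairs exist, because each pair needs L input steps plus α further steps for its target.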
3. Typical scene classification model construction and probability transformation method for power system
(1) Support vector machine model
The support vector machine (SVM) is widely applied in power systems. It shows good prediction performance in tasks such as transient stability assessment and transformer fault diagnosis, requires few training samples, and generalizes well. In addition, for any sample x the SVM can compute the distance from the sample to the classification hyperplane, which can be used to define the confidence of the classification result and therefore carries probabilistic meaning. The SVM maps the input space into a high-dimensional feature space and searches for a classification hyperplane there, maximizing the classification margin on the condition that the sample points are separated without error, so as to obtain the best classification.
The support vector machine is a binary classification model. Its basic form is the maximum-margin linear classifier defined on the feature space, and the maximum margin is what distinguishes it from the perceptron; with the kernel trick, the support vector machine becomes an essentially nonlinear classifier. Its learning strategy is margin maximization, which can be formulated as a convex quadratic programming problem and is equivalent to minimizing a regularized hinge loss function. The learning algorithm of the support vector machine is therefore an optimization algorithm for solving a convex quadratic program.
Support vector machine learning methods build models from simple to complex, where the simple model is both the basis and a special case of the complex one. When the training data are linearly separable, a linear classifier is learned by hard margin maximization, namely the linearly separable support vector machine (also called the hard-margin support vector machine); when the training data are approximately linearly separable, a linear classifier is learned by soft margin maximization, namely the linear support vector machine; when the training data are not linearly separable, a nonlinear support vector machine is learned by using the kernel trick together with soft margin maximization.
When the input space is a Euclidean space or a discrete set and the feature space is a Hilbert space, a kernel function gives the inner product between the feature vectors obtained by mapping inputs from the input space into the feature space. A nonlinear support vector machine can be learned by using a kernel function, which is equivalent to implicitly learning a linear support vector machine in the high-dimensional feature space; this approach is called the kernel trick. Kernel methods are a more general machine learning technique than the support vector machine.
(2) Typical scene classification model construction and probability transformation method based on SVM
From the historical time series data of the feature subset and the target typical scene state sequence, a supervised classification data set for predicting the power system target typical scene can be established
Figure BDA0003762985380000111
where y_k = 1 denotes that sample x'_k is a positive example and y_k = 0 denotes that sample x'_k is a negative example, k = 1, 2, ..., N:
Figure BDA0003762985380000112
where x'_k ∈ R^{D'} is the D'-dimensional input feature of the kth sample and y_k is the sample label value. The SVM uses a kernel function K(·) to map the original problem into a high-dimensional space and then constructs an optimal classification hyperplane f(x') in the transformed space, namely the SVM decision function:
Figure BDA0003762985380000113
in the formula: α_k is the Lagrange multiplier and b ∈ R is the offset. α_k can be obtained by solving the following optimization problem:
Figure BDA0003762985380000114
s.t. C ≥ α_k ≥ 0, k = 1, ..., N
Figure BDA0003762985380000115
where C ∈ R is the penalty factor. The kernel function K(x'_k, x'_j) can take several forms. The radial basis kernel function can approximate any function with arbitrarily small error, and many existing studies have built models with it and obtained good test performance, so the radial basis kernel function is selected in this research:
Figure BDA0003762985380000116
in the formula: γ ∈ R is the kernel parameter. In general, the optimal model parameters γ and C can be obtained through grid search or a heuristic algorithm.
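A small sketch of how a radial basis kernel enters the decision function follows. It assumes the common parameterisation K(x, z) = exp(-γ‖x - z‖²) and the usual dual form f(x) = Σ α_k s_k K(x_k, x) + b with signs s_k ∈ {-1, +1} (label 1 mapped to +1, label 0 to -1); the formula images in the patent may differ in detail, and the support vectors and multipliers below are hand-picked, not trained.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_score(x, support_vectors, alphas, signs, bias, gamma):
    """f(x) = sum_k alpha_k * s_k * K(x_k, x) + b, with s_k in {-1, +1}."""
    total = bias
    for x_k, alpha_k, s_k in zip(support_vectors, alphas, signs):
        total += alpha_k * s_k * rbf_kernel(x_k, x, gamma)
    return total


# Hypothetical "trained" quantities, chosen by hand for illustration only.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
alphas = [1.0, 1.0]
signs = [+1, -1]          # label 1 -> +1, label 0 -> -1
bias = 0.0
gamma = 0.5

score_pos = decision_score([0.9, 1.1], support_vectors, alphas, signs, bias, gamma)
score_neg = decision_score([-1.2, -0.8], support_vectors, alphas, signs, bias, gamma)
print(score_pos > 0, score_neg < 0)
```

A sample close to the positive support vector receives a positive score and one close to the negative support vector a negative score; the magnitude of the score is what the later probability mapping works on.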
As shown above, for any sample x' to be classified, the decision function f(x') outputs a numerical value, the decision score. In a binary classification problem, the SVM assigns samples to the positive or negative class according to whether the decision score is greater than 0, which completes the class prediction; however, the decision score only reflects the distance of the sample from the classification hyperplane and has no probabilistic meaning. A sigmoid function with parameters is therefore used to modify the decision score output of the SVM, mapping the decision function value to the interval [0,1] and thus producing a probability output of the following form:
Figure BDA0003762985380000121
in the formula: A and B are the sigmoid function parameters, f is the decision score corresponding to the input sample x', and P(y = 1 | x') is the probability that the input sample x' is a positive example.
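The mapping can be sketched as below, assuming the widely used Platt form P(y = 1 | x') = 1 / (1 + exp(A·f + B)); the patent's own formula is only available as an image, so this form and the example parameter values are assumptions.

```python
import math


def platt_probability(score, a, b):
    """Map an SVM decision score f to P(y=1 | x') = 1 / (1 + exp(a*f + b))."""
    z = a * score + b
    # Evaluate in a numerically stable way for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Hypothetical parameters: a negative A makes larger scores more probable.
A, B = -2.0, 0.0
for f in (-1.0, 0.0, 1.0):
    print(f, round(platt_probability(f, A, B), 4))
```

With these values a score of 0 (a sample on the hyperplane) maps to 0.5, and scores on either side map symmetrically above and below 0.5.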
For a data set
Figure BDA0003762985380000122
an SVM classifier and the parameters A and B need to be trained at the same time. To avoid overfitting, cross validation is used to divide the original training samples into two parts: one part is used to train the SVM classification model and the other is used to determine A and B of the sigmoid function by maximum likelihood estimation. Following this idea, the training sample set
Figure BDA0003762985380000123
is divided into m groups; an SVM model is trained on m-1 of the groups and used to compute the decision scores of the remaining group. Repeating this m times yields the decision scores of all samples, which form the decision score-label set
Figure BDA0003762985380000124
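The m-group procedure can be sketched as follows. To keep the example self-contained, a simple nearest-centroid scorer stands in for SVM training; this is a substitution made purely for illustration, and the function names and the interleaved fold assignment are likewise assumptions.

```python
def train_centroid_scorer(samples, labels):
    """Stand-in for SVM training: returns a function giving a signed score.

    The score is the squared distance to the negative-class mean minus the
    squared distance to the positive-class mean, so positive scores favour class 1.
    """
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = mean([x for x, y in zip(samples, labels) if y == 1])
    neg = mean([x for x, y in zip(samples, labels) if y == 0])

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return d_neg - d_pos

    return score


def out_of_fold_scores(samples, labels, m, train=train_centroid_scorer):
    """Score every sample with a model trained on the other m-1 groups."""
    n = len(samples)
    scores = [None] * n
    for fold in range(m):
        held_out = [k for k in range(n) if k % m == fold]
        kept = [k for k in range(n) if k % m != fold]
        scorer = train([samples[k] for k in kept], [labels[k] for k in kept])
        for k in held_out:
            scores[k] = scorer(samples[k])
    return list(zip(scores, labels))   # the decision score-label set


samples = [[0.0], [1.0], [0.2], [0.9], [0.1], [1.1]]
labels = [0, 1, 0, 1, 0, 1]
score_label_set = out_of_fold_scores(samples, labels, m=3)
print(score_label_set)
```

Each sample is scored exactly once, by a model that never saw it during training, which is what keeps the subsequent fit of A and B from simply memorising the training scores.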
The specific process for solving the parameters A and B based on the maximum likelihood estimation method is as follows:
Figure BDA0003762985380000125
Figure BDA0003762985380000126
in the formula:
Figure BDA0003762985380000127
is the estimated probability that the kth sample is a positive example. N_+ is the number of positive samples in the total sample set, and N_- is the number of negative samples. The parameters A and B are obtained by solving min F(A, B), which converts the decision score of the SVM model into a probability output.
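A minimal sketch of the maximum likelihood fit is given below. It assumes Platt's standard formulation: smoothed targets t_k = (N_+ + 1)/(N_+ + 2) for positive samples and 1/(N_- + 2) for negative samples, p_k = 1/(1 + exp(A·f_k + B)), and F(A, B) = -Σ [t_k log p_k + (1 - t_k) log(1 - p_k)]. The patent's formulas are only available as images, so this form is an assumption, and plain gradient descent is used here in place of whatever solver the authors employed.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def smoothed_targets(labels):
    """Targets t_k: (N+ + 1)/(N+ + 2) for positives, 1/(N- + 2) for negatives."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    hi = (n_pos + 1.0) / (n_pos + 2.0)
    lo = 1.0 / (n_neg + 2.0)
    return [hi if y == 1 else lo for y in labels]


def neg_log_likelihood(a, b, scores, targets):
    """F(A, B) = -sum_k [t_k log p_k + (1 - t_k) log(1 - p_k)]."""
    total = 0.0
    for f, t in zip(scores, targets):
        z = a * f + b
        total += t * softplus(z) + (1.0 - t) * softplus(-z)
    return total


def fit_sigmoid(scores, labels, iterations=5000):
    """Minimise F(A, B) by plain gradient descent with a conservative fixed step."""
    targets = smoothed_targets(labels)
    step = 4.0 / sum(f * f + 1.0 for f in scores)   # 1 / (upper bound on curvature)
    a, b = 0.0, 0.0
    for _ in range(iterations):
        grad_a = grad_b = 0.0
        for f, t in zip(scores, targets):
            p = math.exp(-softplus(a * f + b))      # p_k = 1 / (1 + exp(A f_k + B))
            grad_a += (t - p) * f                   # dF/dA
            grad_b += (t - p)                       # dF/dB
        a -= step * grad_a
        b -= step * grad_b
    return a, b


# Hypothetical decision score-label set with slight class overlap.
scores = [-2.0, -1.2, -0.4, 0.3, -0.1, 0.9, 1.5, 2.1]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
A, B = fit_sigmoid(scores, labels)
targets = smoothed_targets(labels)
print(A < 0, neg_log_likelihood(A, B, scores, targets) < neg_log_likelihood(0.0, 0.0, scores, targets))
```

The fitted A is negative, so larger decision scores map to larger probabilities, and the smoothed targets keep A and B finite even when the two classes are perfectly separated by the scores.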
The probability estimate reflects the probability that the sample x' to be identified belongs to each category and effectively normalizes the distance between the sample and the classification hyperplane; for an ordinary binary classification problem, the output probability p_k lies between 0% and 100%. Compared with the prediction label given by the classification model alone, the probability output provides more information, so that the personnel concerned can understand the situation more fully.
4. Simulation verification of scene probability prediction model
In this calculation example, a heavy-load scene on a key transmission section of a power system is taken as the target typical scene, and the IEEE 300-node system is used to verify by simulation the typical scene probability prediction based on dynamic time sequence prediction. To construct training samples that are sufficient in quantity and rich in type, so that the LSTM dynamic time sequence prediction model and the SVM scene probability prediction model generalize well, actual load data collected from the Huazhong power grid are first connected to the IEEE 300-node system as a source of fluctuation. The values of the monitored variables at each time section are then obtained by load flow calculation, forming complete data samples. These samples are fed to the time sequence prediction model and the scene probability prediction model, so as to train a classification model capable of online rolling prediction of the probability of a future scene.
Following this approach, 360 days of actual data from two actual load points are selected and connected to two nodes of the system; the load data sampling interval is 15 min. Fig. 2 and Fig. 3 show segments of the actual data, which fluctuate considerably. After time-series power flow simulation, sample data for 34560 time points are formed in total, containing 1960 candidate data features. Using the maximum mutual information coefficient method, 19 key features are screened out; the details are shown in Fig. 4.
The data of the extracted key variables are input into an LSTM network for training. In this research, the data of the first 301 days are used for training and the data of the following 59 days for testing; the same split is applied to the SVM scene classification model.
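The split sizes follow directly from the figures above (360 days, 15 min sampling, 301 training days). A short sketch, with hypothetical names, that keeps the time order intact:

```python
POINTS_PER_DAY = 24 * 60 // 15          # 96 samples per day at a 15-minute interval
TOTAL_DAYS, TRAIN_DAYS = 360, 301


def chronological_split(samples, train_days, points_per_day=POINTS_PER_DAY):
    """Split a time-ordered list into an earlier training part and a later test part."""
    cut = train_days * points_per_day
    return samples[:cut], samples[cut:]


time_index = list(range(TOTAL_DAYS * POINTS_PER_DAY))
train, test = chronological_split(time_index, TRAIN_DAYS)
print(len(time_index), len(train), len(test))   # 34560 28896 5664
```

Splitting by position rather than at random ensures that every test point lies later in time than every training point, which matches the forecasting setting.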
After the complete sample set is divided into a training set and a test set, cross validation combined with grid search is applied on the training set to find the optimal model and parameter settings, chosen so that the trained model generalizes as well as possible. Once the model parameters are determined, the model is trained with all the data so that it learns as much as possible. After the time sequence prediction model is constructed, a similar method is used to train the SVM classification model on the data of the first 301 days. In this example, whether a certain key section is overloaded is used as the label (an overloaded key section is a positive example and a non-overloaded one a negative example), so that the SVM model learns the data characteristics of overload samples from historical data and can then make accurate scene classification predictions from the LSTM time sequence predictions. Fig. 5 illustrates the decrease of the loss during training of the optimal LSTM model. Figs. 6-8 illustrate the dynamic time sequence prediction effect for some of the variables.
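The combination of grid search and cross validation mentioned above can be sketched generically. The parameter grid and the scoring function below are hypothetical: `toy_cv_score` merely stands in for "train on the folds and return the mean validation score", which in practice would involve the actual LSTM or SVM.

```python
from itertools import product


def grid_search(param_grid, cv_score):
    """Return the parameter combination with the highest cross-validation score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_cv_score(params):
    """Hypothetical stand-in for a mean validation score over the folds."""
    return -((params["C"] - 10.0) ** 2) - (params["gamma"] - 0.1) ** 2 * 100.0


grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_cv_score)
print(best_params)
```

Every combination in the grid is evaluated once, so the cost grows with the product of the grid sizes; this is why coarse grids are often refined in a second pass.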
As the time sequence prediction results in Figs. 6-8 show, the time sequence prediction model constructed in the invention predicts the short-term future values of the key variables well, which lays a solid foundation for the subsequent scene probability prediction. With the LSTM model and the SVM model constructed, and using the scene probability prediction method described above, the decision score of the SVM is converted into a scene probability through a sigmoid function. This realizes quantitative probability early warning of a future typical scene of the power system based on dynamic time sequence prediction; the probability prediction effect is shown in Fig. 9.
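The three stages can be chained as in the sketch below. The forecaster and scorer are simple hypothetical stand-ins for the trained LSTM and SVM, the sigmoid is assumed to take the form 1 / (1 + exp(A·f + B)), and the parameter values are invented for illustration.

```python
import math


def predict_scene_probability(latest_window, forecaster, scorer, a, b):
    """Chain the three stages: forecast -> decision score -> probability."""
    predicted_features = forecaster(latest_window)     # x'_{t+alpha}
    score = scorer(predicted_features)                 # f_{t+alpha}
    probability = 1.0 / (1.0 + math.exp(a * score + b))  # p_{t+alpha}
    return predicted_features, score, probability


# Hypothetical stand-ins for the trained LSTM and SVM (illustration only).
def persistence_forecaster(window):
    return window[-1]                    # repeat the most recent observation


def linear_scorer(features):
    return sum(features) - 1.0           # positive when the summed loading exceeds 1.0


window = [[0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
features, score, prob = predict_scene_probability(
    window, persistence_forecaster, linear_scorer, a=-3.0, b=0.0
)
print(features, round(score, 3), round(prob, 3))
```

Because the stages are passed in as callables, the same function can be called at every new time step t with the latest window, which corresponds to the online rolling prediction described in the text.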
As can be seen from Fig. 9, the SVM model predicts the future scene category of the system well: in the simulation with 96 points per day, only one time point is predicted incorrectly. With the scene probability prediction model provided by the invention, even when the category prediction is wrong, the probability prediction can still help the dispatcher judge the situation. Once scene probability prediction is available, the judgment of the future scene type is no longer a binary 0-or-1 decision; the probability information gives the dispatcher more confidence and thus supports more accurate and effective decisions. As the figure shows, the probability prediction curve (dotted line) tracks the changes in the typical scene category of the system well and provides additional reference information on top of the category information.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (8)

1. A power system typical scene probability prediction method based on dynamic time sequence prediction, characterized by comprising the following steps:
Screening out a feature subset related to the power system based on a maximum mutual information measurement method according to physical characteristics of a target typical scene of the power system, collecting time series data of feature variables from historical data, and forming a multi-dimensional time series data set by combining a target typical scene state sequence;
constructing a dynamic time sequence prediction model for the associated feature variables through cross validation and grid search, based on the long short-term memory network and the historical time sequence data; modeling data knowledge by using a support vector machine (SVM), performing classification learning on the historical samples of the screened features, and acquiring a decision score for each data sample;
the decision score output of the SVM is modified by adopting a sigmoid function with parameters, the decision score value is mapped to an interval [0,1], the parameter value in the sigmoid function is determined by utilizing a maximum likelihood estimation method, the mapping from the decision score to the probability value is realized, a typical scene probability prediction model of the power system target is obtained, and finally the quantitative probability prediction of the typical scene of the power system target is realized by combining the dynamic time sequence prediction result of the characteristic variable.
2. The prediction method of claim 1, wherein a typical target scenario of the power system to be predicted is determined, and a state sequence of the typical target scenario is constructed according to historical data information
Figure FDA0003762985370000011
where N is the total number of data points and y_k is the target scene state value at the kth time point, y_k ∈ {0, 1}; y_k = 1 indicates that the target scene occurs (positive example) and y_k = 0 indicates that it does not occur (negative example); at the same time, time series data of each feature variable recorded in the historical data are acquired and denoted X; the maximum mutual information coefficient (MIC) between each feature variable and the target typical scene state sequence Y is then obtained with the MIC measurement method, and a threshold is set to remove weakly correlated feature variables, giving the feature subset
Figure FDA0003762985370000012
D' is equal to the total number D of the characteristic variables minus the number of the characteristic variables to be eliminated; then, time series data of the feature variables included in the feature subset Q are extracted from the historical data, and combined with the target typical scene state sequence Y, a multi-dimensional time series data set is formed.
3. The prediction method according to claim 1, wherein a long short-term memory network is used to construct a dynamic time sequence prediction model for each feature variable included in the feature subset Q, referred to as the characteristic variable dynamic time sequence prediction model; the training input for the long short-term memory network is
Figure FDA0003762985370000013
where X'_k is the input multi-dimensional time series sample, x'_{k+α} is the regression prediction target corresponding to sample X'_k, and α is the number of lead time steps of the time sequence prediction; through cross validation and grid search, the trained characteristic variable dynamic time sequence prediction model can predict the feature variables in the feature subset Q α time steps ahead.
4. The prediction method according to claim 1, wherein a power system target typical scene prediction classification supervision format data set is established
Figure FDA0003762985370000021
where y_k = 1 denotes that sample x'_k is a positive example and y_k = 0 denotes that sample x'_k is a negative example; the data set is then divided into m groups; m-1 of the groups are extracted and a power system typical scene classification model is constructed with the support vector machine model to obtain the SVM decision function f(·); the decision function f(·) is then used to obtain and store the decision scores of the remaining group of samples; this process is repeated m times, with a different selection of m-1 groups each time, so that a decision score is obtained for every data sample, and a decision score-label set is established
Figure FDA0003762985370000022
where, for the kth data sample x'_k, f_k is its decision score under the support vector machine model and y_k is the target scene state value at the kth time point, k = 1, 2, ..., N.
5. The prediction method of claim 1, wherein the decision score output by the SVM is modified by using a sigmoid function with parameters A and B, the decision score is mapped to the interval [0, 1], and, based on the decision score-label set obtained in step 3
Figure FDA0003762985370000023
the values of the parameters A and B of the sigmoid function are determined by the maximum likelihood estimation method, realizing the conversion from the decision score to the occurrence probability of the target typical scene.
6. The prediction method according to claim 1, characterized in that the decision score mapping form based on the sigmoid function is adopted as follows:
Figure FDA0003762985370000024
in the formula: A and B are the sigmoid function parameters, f is the decision score corresponding to the input sample x', and P(y = 1 | x') is the probability that the input sample x' is a positive example.
7. The prediction method of claim 1, wherein, based on the obtained decision score-label set
Figure FDA0003762985370000025
the specific process of solving the parameters A and B by the maximum likelihood estimation method is as follows:
Figure FDA0003762985370000026
Figure FDA0003762985370000027
in the formula:
Figure FDA0003762985370000028
is the estimated probability that the kth sample is a positive example; N_+ is the number of positive samples among all samples and N_- is the number of negative samples among all samples; by solving min F(A, B), the parameters A and B can be obtained, and the decision score of the SVM model is converted into a probability output.
8. The prediction method of claim 1, wherein the latest multi-dimensional time series sample X'_t available at the current time t is dynamically obtained and input into the characteristic variable dynamic time sequence prediction model established in step 2, to obtain the predicted values x'_{t+α} of the D' feature variables α time steps ahead; x'_{t+α} is input into the SVM power system typical scene classification model established in step 3 to obtain the corresponding decision score f_{t+α}; and the sigmoid function with the parameters A and B determined in step 4 is used to obtain the probability p_{t+α} that sample x'_{t+α} is a positive example, namely the probability that the power system target typical scene occurs α time steps in the future, thereby completing the probability prediction of the target typical scene α time steps in advance.
CN202210877305.5A 2022-07-25 2022-07-25 Power system typical scene probability prediction method based on dynamic time sequence prediction Pending CN115293249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210877305.5A CN115293249A (en) 2022-07-25 2022-07-25 Power system typical scene probability prediction method based on dynamic time sequence prediction

Publications (1)

Publication Number Publication Date
CN115293249A true CN115293249A (en) 2022-11-04

Family

ID=83823830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210877305.5A Pending CN115293249A (en) 2022-07-25 2022-07-25 Power system typical scene probability prediction method based on dynamic time sequence prediction

Country Status (1)

Country Link
CN (1) CN115293249A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116668194A (en) * 2023-07-27 2023-08-29 北京弘明复兴信息技术有限公司 Network security situation assessment system based on Internet centralized control platform
CN116668194B (en) * 2023-07-27 2023-10-10 北京弘明复兴信息技术有限公司 Network security situation assessment system based on Internet centralized control platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination