CN116404637A - Short-term load prediction method and device for electric power system - Google Patents

Info

Publication number
CN116404637A
Authority
CN
China
Prior art keywords
short
load prediction
model
term load
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310326869.4A
Other languages
Chinese (zh)
Inventor
耿华
江博臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202310326869.4A
Publication of CN116404637A

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003 Load forecast, e.g. methods or systems for forecasting future load demand
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Power Engineering (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a power system short-term load prediction method and a power system short-term load prediction device, relating to the technical field of electric power. The method comprises: clustering the short-term load history curves of the prediction scenes by combining the time-sequence similarity and the distribution similarity of the curves to obtain an optimal clustering result; establishing different Transformer-based load prediction models according to different target application scenes, and training and evaluating the load prediction models in each category using the short-term load history curves and known features to obtain a performance optimal model for each category; and performing model migration on the performance optimal model of each category so that each performance optimal model is applied to other application scenes among the categories for short-term load prediction. The method and the device allow short-term load prediction to be carried out simply for any application scene; the prediction accuracy for low-level loads is improved, the amount of computation is small, and the computation speed is high.

Description

Short-term load prediction method and device for electric power system
Technical Field
The invention relates to the technical field of electric power, and in particular to a power system short-term load prediction method and a power system short-term load prediction device.
Background
Power data analysis has entered the big-data era, and the large volume of real-time, fine-grained energy consumption data collected by Advanced Metering Infrastructure (AMI) provides more reliable information for analyzing the balance of power supply and demand. Accurate short-term load forecasting (STLF) can provide a reference for power retailers, dispatchers, and users, enabling more rational power sales plans, dispatch plans, and electricity usage plans. It is therefore of great significance to study how, across a large number of similar load prediction scenes, to reduce the training time cost of the prediction model while maintaining a certain prediction accuracy, and how to improve load prediction with small samples.
Over the past decades, researchers have designed a variety of short-term load forecasting models (SLFM), such as autoregressive models (autoregressive moving average, autoregressive integrated moving average), support vector machines, and the like. Many studies have shown their effectiveness in the STLF field. However, the nonlinear characteristics of low-level loads, particularly loads at the residential distribution transformer level, are more pronounced than those of system-level loads, so the prediction accuracy of the above methods on such loads is limited.
Disclosure of Invention
In view of the above, the present invention proposes a power system short-term load prediction method and a power system short-term load prediction device.
The embodiment of the invention provides a power system short-term load prediction method, which comprises the following steps:
based on the respective short-term load history curves of each prediction scene, clustering is carried out by combining the time sequence and the distribution similarity of each short-term load history curve, and an optimal clustering result is obtained, wherein the optimal clustering result comprises: a plurality of categories of optimal clusters;
establishing different Transformer-based load prediction models according to different target application scenes, and performing model training and evaluation on the load prediction models in each category by using the short-term load history curves and the known features, so as to obtain a performance optimal model for each category;
and performing model migration on the respective performance optimal model of each category so that each performance optimal model is applied to other application scenes among the categories to perform short-term load prediction.
Optionally, the different load prediction models comprise a single encoder and a plurality of decoders, or a single encoder and a single decoder;
establishing different Transformer-based load prediction models includes:
the single encoder receives known historical features and known forecast features, performs an encoding operation using the Transformer to obtain output features, and transmits the output features to the single decoder or the plurality of decoders, wherein one decoder corresponds to one prediction scene;
in the case of a single decoder, the decoder performs a decoding operation on the output features using the Transformer to obtain the short-term load prediction curve of the corresponding prediction scene;
in the case of a plurality of decoders, each decoder performs a decoding operation on the output features using the Transformer to obtain the short-term load prediction curve of its corresponding prediction scene.
Optionally, the encoder receiving the known historical features and the known forecast features and performing an encoding operation using the Transformer to obtain output features includes:
denoting the known historical features as $X_h \in R^{h\times n}$ and the known forecast features as $X_p \in R^{p\times n}$, where $h$ represents the time length of the known historical features, $p$ represents the time length of the known forecast features, and $n$ represents the number of features used for prediction;
inputting the known historical features and the known forecast features respectively into a plurality of attention layers, wherein the output of each attention layer is obtained through linear mapping, dot product and normalization, and the outputs of the plurality of attention layers are concatenated to obtain the output $\mathrm{Multihead}(Q,K,V)$ of the multi-head attention layer, expressed as:
$\mathrm{Multihead}(Q,K,V)=\mathrm{concat}(\mathrm{head}_1,\ldots,\mathrm{head}_m)W^O$
Figure BDA0004153558980000021
In the above formulas, $m$ represents the number of attention heads; $W^O$ represents the linear mapping weight that fuses the multi-head attention outputs and maps them to the appropriate dimension; $Q=XW^Q$, $K=XW^K$, $V=XW^V$, where $X$ represents the input data of the attention layer, namely $X_h$ or $X_p$;
Figure BDA0004153558980000031
respectively represent the linear mapping weights, and $Q$, $K$, $V$ respectively represent the query matrix, the key matrix and the value matrix;
adding the output $\mathrm{Multihead}(Q,K,V)$ of the multi-head attention layer to the input $X$ of the attention layer and performing layer normalization to obtain $\mathrm{Norm}_{out1}$, expressed as:
$\mathrm{Norm}_{out1}=\mathrm{Norm}(X+\mathrm{Multihead}(Q,K,V))$
inputting $\mathrm{Norm}_{out1}$ into a feedforward neural network to obtain the feedforward output features, then adding $\mathrm{Norm}_{out1}$ to the feedforward output features and performing layer normalization to obtain $\mathrm{Norm}_{out2}$, expressed as:
$\mathrm{Norm}_{out2}=\mathrm{Norm}(\mathrm{Norm}_{out1}+\mathrm{FC}(\mathrm{Norm}_{out1}))$
in the above formula, $\mathrm{FC}(\cdot)$ represents a fully connected neural network;
stacking the output features of the known historical features and the output features of the known forecast features to obtain $\mathrm{Encoder}_{out}$, expressed as:
Figure BDA0004153558980000032
in the above formula, the output feature of the known historical features $X_h$ is
Figure BDA0004153558980000033
and the output feature of the known forecast features $X_p$ is
Figure BDA0004153558980000034
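The encoder steps above can be illustrated with a minimal NumPy sketch. Several details are assumptions rather than statements of the patent, because the corresponding formula images are not reproduced in this text: each head is taken to be the standard scaled dot-product attention with a softmax, the fully connected network `FC` is taken to be two linear layers with a ReLU in between, layer normalization has no learned scale or shift, and the same (randomly initialized) parameters are reused for $X_h$ and $X_p$. All function and variable names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalise each time step (row) to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multihead(X, p):
    # Assumed head definition: softmax(Q K^T / sqrt(d_k)) V for each head.
    heads = []
    for Wq, Wk, Wv in zip(p["Wq"], p["Wk"], p["Wv"]):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
        heads.append(weights @ V)
    # concat(head_1, ..., head_m) W^O
    return np.concatenate(heads, axis=-1) @ p["Wo"]

def encoder_layer(X, p):
    norm_out1 = layer_norm(X + multihead(X, p))          # Norm(X + Multihead(Q, K, V))
    fc = np.maximum(0.0, norm_out1 @ p["W1"]) @ p["W2"]  # assumed FC: linear, ReLU, linear
    return layer_norm(norm_out1 + fc)                    # Norm(Norm_out1 + FC(Norm_out1))

def init_params(n, m, d_k, d_ff, rng):
    return {
        "Wq": [rng.normal(scale=0.1, size=(n, d_k)) for _ in range(m)],
        "Wk": [rng.normal(scale=0.1, size=(n, d_k)) for _ in range(m)],
        "Wv": [rng.normal(scale=0.1, size=(n, d_k)) for _ in range(m)],
        "Wo": rng.normal(scale=0.1, size=(m * d_k, n)),
        "W1": rng.normal(scale=0.1, size=(n, d_ff)),
        "W2": rng.normal(scale=0.1, size=(d_ff, n)),
    }

rng = np.random.default_rng(0)
h, p_len, n = 6, 3, 4                       # history length, forecast length, feature count
X_h = rng.normal(size=(h, n))               # stand-in for known historical features
X_p = rng.normal(size=(p_len, n))           # stand-in for known forecast features
params = init_params(n, m=2, d_k=3, d_ff=8, rng=rng)

# Stack the two encoded blocks along the time axis to form the encoder output.
encoder_out = np.vstack([encoder_layer(X_h, params), encoder_layer(X_p, params)])
```

With these shapes the stacked output has $h+p$ rows and $n$ columns, which is what a decoder for a given prediction scene would consume.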
Optionally, clustering the short-term load history curves of the prediction scenes by combining the time-sequence similarity and the distribution similarity of the curves to obtain an optimal clustering result comprises the following steps:
performing z-score standardization on each short-term load history curve to obtain a standardized curve;
setting a peak height and a peak width, and extracting the peak and valley points of the sequence for each standardized curve;
stretching the horizontal axis of the peak and valley points, performing density clustering along the vertical axis using DBSCAN to align the peak and valley points, extracting the sequence key points of each standardized curve, and ignoring outliers;
performing similarity metric calculation using the Euclidean distance as the metric of time-sequence similarity, and performing hierarchical clustering and similarity index calculation;
and obtaining the optimal clustering result based on the similarity metric calculation result and the hierarchical clustering and similarity index calculation results, in combination with a preset distribution similarity threshold.
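The first two steps of this procedure can be sketched as follows. This is a simplified illustration, not the patent's implementation: it applies only a height threshold (the peak-width criterion, the axis stretching and the DBSCAN alignment are omitted), and the function names, the threshold value and the sample curve are hypothetical.

```python
import numpy as np

def z_score(curve):
    # Standardise a load curve to zero mean and unit standard deviation.
    curve = np.asarray(curve, dtype=float)
    return (curve - curve.mean()) / curve.std()

def peak_valley_points(curve, min_height=0.5):
    # Local maxima at or above +min_height and local minima at or below -min_height.
    peaks, valleys = [], []
    for t in range(1, len(curve) - 1):
        if curve[t] > curve[t - 1] and curve[t] > curve[t + 1] and curve[t] >= min_height:
            peaks.append(t)
        if curve[t] < curve[t - 1] and curve[t] < curve[t + 1] and curve[t] <= -min_height:
            valleys.append(t)
    return peaks, valleys

load = [3, 2, 1, 2, 4, 7, 5, 4, 6, 9, 6, 4]   # made-up 12-point load curve
z = z_score(load)
peaks, valleys = peak_valley_points(z)
print(peaks, valleys)  # [5, 9] [2]
```

The shallow dip at index 7 is a local minimum but stays above the negative threshold after standardization, so it is not reported as a valley; this is the kind of minor fluctuation the height setting is meant to filter out.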
Optionally, performing similarity metric calculation using the Euclidean distance as the metric of time-sequence similarity, and performing hierarchical clustering and similarity index calculation, includes:
calculating the Euclidean distance matrix among the key points of different sequences: $D_E \in R^{m\times m}$;
estimating the probability distribution of each short-term load prediction curve using kernel density estimation, and calculating the KL divergence matrix among the different short-term load prediction curves, taking the KL divergence as the metric of sequence distribution similarity;
calculating the hierarchical clustering result under the clustering number $n$: $C=\{c_1,\ldots,c_n\}$, and the corresponding distribution similarity index $Sim_{dis}$, given by the following formula:
Figure BDA0004153558980000041
in the above formula,
Figure BDA0004153558980000042
represents the sum of divergences within classes, and
Figure BDA0004153558980000043
represents the sum of divergences between different classes; $x_p$ is a load sequence in any class $c_i$, and $x_q$ is a load sequence in any class $c_j$.
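The two similarity measures can be sketched in NumPy as below. The sketch assumes a Gaussian kernel with a fixed bandwidth and evaluates the densities on a shared grid; both choices, the synthetic curves, and all names are hypothetical. Because the exact expression for $Sim_{dis}$ is given only in a formula image, the sketch stops at the two sums that the text describes and does not combine them into an index.

```python
import numpy as np

def euclidean_matrix(points):
    # points: (m, k) array, one row of key-point values per sequence.
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kde(samples, grid, bandwidth=0.3):
    # Gaussian kernel density estimate on a grid, normalised to a discrete distribution.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1) + 1e-12  # small floor avoids log(0)
    return density / density.sum()

def kl_matrix(curves, grid):
    probs = [kde(c, grid) for c in curves]
    m = len(probs)
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            D[i, j] = np.sum(probs[i] * np.log(probs[i] / probs[j]))
    return D

def within_between(D, labels):
    # Sum of divergences inside classes and across different classes.
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return D[same & off_diag].sum(), D[~same].sum()

rng = np.random.default_rng(0)
curves = [rng.normal(0.0, 1.0, 48), rng.normal(0.0, 1.0, 48),   # two similar scenes
          rng.normal(3.0, 1.0, 48), rng.normal(3.0, 1.0, 48)]   # two shifted scenes
grid = np.linspace(-4.0, 7.0, 200)

D_E = euclidean_matrix(np.stack(curves))
D_KL = kl_matrix(curves, grid)
within, between = within_between(D_KL, labels=[0, 0, 1, 1])
```

For a labeling that groups similar scenes together, the within-class sum should come out well below the between-class sum, which is the property a distribution similarity threshold can then test.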
Optionally, establishing different Transformer-based load prediction models according to different target application scenes, and performing model training and evaluation on the load prediction models of each category by using the short-term load history curves and the known features to obtain a performance optimal model for each category, includes:
determining whether the target application scene is a large sample scene or a small sample scene;
constructing a first Transformer-based load prediction model in the case that the target application scene is the large sample scene;
constructing a second Transformer-based load prediction model in the case that the target application scene is the small sample scene;
and, according to the optimal clustering result, dividing a training set and a test set for the first load prediction model or the second load prediction model by using the short-term load history curves and the known features, then training and evaluating the model to obtain the performance optimal model of each category.
Optionally, the known features include known historical features and known forecast features;
dividing the training set and the test set for the first load prediction model or the second load prediction model by using the short-term load history curves and the known features according to the optimal clustering result, and training and evaluating the model to obtain the performance optimal model of each category, includes:
according to the optimal clustering result $C_{opt}$, selecting any scene $r$ as a reference scene from the target application scenes corresponding to the first load prediction model;
dividing the known historical features, the known forecast features and the short-term load history curve of the reference scene into a training set and a test set corresponding to the first load prediction model according to preset conditions;
taking the minimum mean square error as the objective function, training the first load prediction model multiple times on the training set through the gradient back propagation algorithm, and evaluating the first load prediction model on the test set to obtain the performance optimal model of each category; or,
according to the optimal clustering result $C_{opt}$, selecting any class $c_i$ from the target application scenes corresponding to the second load prediction model;
dividing the known historical features, the known forecast features and the short-term load history curves of class $c_i$ into a training set and a test set corresponding to the second load prediction model according to preset conditions;
and taking the minimum mean square error as the objective function, training the second load prediction model multiple times on the training set through the gradient back propagation algorithm, and evaluating the second load prediction model on the test set to obtain the performance optimal model of each category.
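The split-train-evaluate loop can be illustrated with a deliberately small stand-in. In the sketch below a single linear layer replaces the Transformer, so the gradient of the mean square error is written out analytically instead of being obtained by back propagation through a network; the 80/20 chronological split stands in for the unspecified "preset conditions". The synthetic data, the learning rate and every name are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_features = 200, 3
X = rng.normal(size=(N, n_features))            # stand-in for the known features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=N)       # stand-in for the load values

# Hypothetical preset condition: the first 80% of samples train, the rest test.
split = int(0.8 * N)
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(n_features)
initial_train_mse = mse(w, X_train, y_train)
for _ in range(500):
    # Gradient of the mean square error with respect to the weights.
    grad = 2.0 / len(y_train) * X_train.T @ (X_train @ w - y_train)
    w -= 0.1 * grad

final_train_mse = mse(w, X_train, y_train)
test_mse = mse(w, X_test, y_test)               # evaluation on held-out data
```

In a setting with several candidate models per category, the same held-out error would be the natural quantity for picking the performance optimal model.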
Optionally, performing model migration on the performance optimal model of each category, so that each performance optimal model is applied to other application scenes among the categories for short-term load prediction, includes:
according to the optimal clustering result $C_{opt}$, applying the performance optimal model of each class to other target scenes among the classes;
constructing a third Transformer-based load prediction model based on the other target scenes, dividing the known historical features, the known forecast features and the short-term load history curves of the other target scenes into a training set and a test set corresponding to the third load prediction model according to preset conditions, and fine-tuning the output layer of the performance optimal model of the category to which the other target scenes belong;
and solving the output layer parameters of the third load prediction model on the training set by the least squares method with a 2-norm constraint, and evaluating the third load prediction model on the test set, so that the third load prediction model is applied to the other target scenes for short-term load prediction.
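One common reading of "least squares with a 2-norm constraint" is L2-regularized (ridge) least squares, which has a closed-form solution; the sketch below adopts that reading as an assumption. Here `H_train` stands for the features produced by the frozen layers of the migrated model, and the regularization weight `lam`, the synthetic data and all names are hypothetical.

```python
import numpy as np

def fit_output_layer(H, Y, lam=1e-2):
    # Closed-form minimiser of ||H W - Y||^2 + lam * ||W||^2.
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ Y)

rng = np.random.default_rng(1)
H_train = rng.normal(size=(120, 5))     # stand-in for frozen features of the migrated model
W_true = rng.normal(size=(5, 2))
Y_train = H_train @ W_true + 0.05 * rng.normal(size=(120, 2))

W_small = fit_output_layer(H_train, Y_train, lam=1e-3)    # weak penalty
W_large = fit_output_layer(H_train, Y_train, lam=100.0)   # strong penalty shrinks the weights
```

Because only a linear system is solved, fine-tuning the output layer this way needs no iterative training, which fits the stated aim of a small amount of computation on the migrated scene.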
Optionally, performing model migration on the performance optimal model of each category, so that each performance optimal model is applied to other application scenes among the categories for short-term load prediction, includes:
according to the optimal clustering result $C_{opt}$, applying the performance optimal model of each class to other target scenes among the classes;
calculating the time-sequence similarity and the distribution similarity between the other target scenes and each prediction scene, and assigning the other target scenes to the closest prediction scene;
constructing a fourth Transformer-based load prediction model based on the closest prediction scene, and dividing the known historical features, the known forecast features and the short-term load history curves of the other target scenes into a training set and a test set corresponding to the fourth load prediction model according to preset conditions;
and taking the minimum mean square error as the objective function, training the decoder of the fourth load prediction model on the training set through the BP algorithm, and evaluating the fourth load prediction model after decoder training on the test set, so that the fourth load prediction model after decoder training is applied to the other target scenes for short-term load prediction.
The embodiment of the invention also provides a device for predicting the short-term load of the power system, which comprises:
the clustering module is used for clustering based on the respective short-term load history curves of each prediction scene and combining the time sequence and the distribution similarity of each short-term load history curve to obtain an optimal clustering result, wherein the optimal clustering result comprises: a plurality of categories of optimal clusters;
the modeling training and evaluating module is used for establishing different Transformer-based load prediction models according to different target application scenes, and performing model training and evaluation on the load prediction models in each category by using the short-term load history curves and the known features to obtain a performance optimal model for each category;
and the migration module is used for carrying out model migration on the respective performance optimal model of each category so that each performance optimal model is respectively applied to other application scenes among the categories to carry out short-term load prediction.
Optionally, the clustering module includes:
the normalization unit is used for performing z-score standardization on each short-term load history curve to obtain a standardized curve;
the extraction unit is used for setting a peak height and a peak width, and extracting the peak and valley points of the sequence for each standardized curve;
the alignment unit is used for stretching the horizontal axis of the peak and valley points, performing density clustering along the vertical axis using DBSCAN to align the peak and valley points, extracting the sequence key points of each standardized curve, and ignoring outliers;
the computing unit is used for performing similarity metric calculation using the Euclidean distance as the metric of time-sequence similarity, and performing hierarchical clustering and similarity index calculation;
and the clustering unit is used for obtaining the optimal clustering result based on the similarity metric calculation result and the hierarchical clustering and similarity index calculation results, in combination with a preset distribution similarity threshold.
Optionally, the computing unit is specifically configured to:
calculate the Euclidean distance matrix among the key points of different sequences: $D_E \in R^{m\times m}$;
estimate the probability distribution of each short-term load prediction curve using kernel density estimation, and calculate the KL divergence matrix among the different short-term load prediction curves, taking the KL divergence as the metric of sequence distribution similarity;
calculate the hierarchical clustering result under the clustering number $n$: $C=\{c_1,\ldots,c_n\}$, and the corresponding distribution similarity index $Sim_{dis}$, given by the following formula:
Figure BDA0004153558980000071
in the above formula,
Figure BDA0004153558980000072
represents the sum of divergences within classes, and
Figure BDA0004153558980000073
represents the sum of divergences between different classes; $x_p$ is a load sequence in any class $c_i$, and $x_q$ is a load sequence in any class $c_j$.
Optionally, the modeling training and evaluation module comprises:
the scene unit is used for determining whether the target application scene is a large sample scene or a small sample scene;
the first modeling unit is used for constructing a first Transformer-based load prediction model in the case that the target application scene is the large sample scene;
the second modeling unit is used for constructing a second Transformer-based load prediction model in the case that the target application scene is the small sample scene;
and the training and evaluating unit is used for dividing a training set and a test set for the first load prediction model or the second load prediction model by using the short-term load history curves and the known features according to the optimal clustering result, then training and evaluating the model to obtain the performance optimal model of each category.
Optionally, the known features include known historical features and known forecast features; the training and evaluating unit is specifically configured to:
according to the optimal clustering result $C_{opt}$, select any scene $r$ as a reference scene from the target application scenes corresponding to the first load prediction model;
divide the known historical features, the known forecast features and the short-term load history curve of the reference scene into a training set and a test set corresponding to the first load prediction model according to preset conditions;
taking the minimum mean square error as the objective function, train the first load prediction model multiple times on the training set through the gradient back propagation algorithm, and evaluate the first load prediction model on the test set to obtain the performance optimal model of each category; or,
according to the optimal clustering result $C_{opt}$, select any class $c_i$ from the target application scenes corresponding to the second load prediction model;
divide the known historical features, the known forecast features and the short-term load history curves of class $c_i$ into a training set and a test set corresponding to the second load prediction model according to preset conditions;
and, taking the minimum mean square error as the objective function, train the second load prediction model multiple times on the training set through the gradient back propagation algorithm, and evaluate the second load prediction model on the test set to obtain the performance optimal model of each category.
Optionally, the migration module includes:
the first application unit is used for applying, according to the optimal clustering result $C_{opt}$, the performance optimal model of each class to other target scenes among the classes;
the modeling fine-tuning unit is used for constructing a third Transformer-based load prediction model based on the other target scenes, dividing the known historical features, the known forecast features and the short-term load history curves of the other target scenes into a training set and a test set corresponding to the third load prediction model according to preset conditions, and fine-tuning the output layer of the performance optimal model of the category to which the other target scenes belong;
and the solving application unit is used for solving the output layer parameters of the third load prediction model on the training set by the least squares method with a 2-norm constraint, and evaluating the third load prediction model on the test set, so that the third load prediction model is applied to the other target scenes for short-term load prediction.
Optionally, the migration module further includes:
the second application unit is used for applying, according to the optimal clustering result $C_{opt}$, the performance optimal model of each class to other target scenes among the classes;
the classifying unit is used for calculating the time-sequence similarity and the distribution similarity between the other target scenes and each prediction scene, and assigning the other target scenes to the closest prediction scene;
the modeling dividing unit is used for constructing a fourth Transformer-based load prediction model based on the closest prediction scene, and dividing the known historical features, the known forecast features and the short-term load history curves of the other target scenes into a training set and a test set corresponding to the fourth load prediction model according to preset conditions;
and the training application unit is used for training the decoder of the fourth load prediction model on the training set through the BP algorithm, taking the minimum mean square error as the objective function, and evaluating the fourth load prediction model after decoder training on the test set, so that the fourth load prediction model after decoder training is applied to the other target scenes for short-term load prediction.
According to the power system short-term load prediction method, the short-term load history curves of the prediction scenes are first clustered by combining the time-sequence similarity and the distribution similarity of the curves, and an optimal clustering result is obtained.
Then, different Transformer-based load prediction models are established according to different target application scenes, and the short-term load history curves and known features are used to train and evaluate the load prediction models in each category, that is, to extract general features, so that the performance optimal model of each category is obtained; model migration is then performed on the performance optimal model of each category so that each performance optimal model is applied to other application scenes among the categories for short-term load prediction.
The method targets low-level loads, in particular loads at the residential distribution transformer level. First, the short-term load history curve of each prediction scene is obtained; because the characteristics of such loads are concentrated, the curves can be clustered into a certain number of categories. General features are then extracted for each practical application scene to obtain the performance optimal model of each category. Finally, the performance optimal model is migrated to other application scenes among the classes, so that short-term load prediction can be carried out simply and conveniently for any application scene. The prediction accuracy for low-level loads is greatly improved, the amount of computation is small, the computation speed is high, and the practicability is high.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flow chart of a method for predicting short-term load of an electrical power system according to an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a single encoder and a plurality of decoders according to an embodiment of the present invention;
fig. 3 is a block diagram of a power system short-term load prediction apparatus according to an embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The inventors found that, although various short-term prediction models have been designed so far, load nonlinear characteristics of loads at a low level, particularly at a level such as a residential distribution transformer, are more remarkable than loads at a system level, and prediction accuracy of the short-term prediction models which have been mentioned so far is limited, resulting in a reduction in prediction accuracy.
The inventors have further found that, in recent years, with the rise of artificial intelligence, a number of methods based on artificial neural networks (ANN) have been applied to STLF, such as fully connected neural networks (Full Connect Neural Network, FCN), convolutional neural networks (Convolutional Neural Network, CNN) and long short-term memory networks (Long Short-Term Memory network, LSTM). LSTM in particular shows excellent performance in STLF. However, LSTM is recurrent along the time direction of the sequence; in other words, its computation is sequential and therefore limited by time complexity, which makes it complex to apply in STLF and computationally expensive.
In the field of natural language processing, the Transformer is commonly used to capture context information; it uses an attention mechanism to allocate limited computing resources to the important parts, which can greatly improve model interpretability. In addition, a network model built with the Transformer is a feedforward neural network and can be computed in parallel, which greatly reduces model training cost.
In addition, the inventors have found that model-based transfer learning assumes that a well-trained source-domain model has learned a great deal of structural knowledge from the data. Reusing the model learned in the source domain therefore avoids re-extracting training data or re-inferring the relationships in complex data representations, so that model-based transfer learning is more efficient and retains the high-level knowledge of the source domain. Model-based transfer learning has been applied in natural language processing, where good results are obtained by pretraining on a large amount of text data and then fine-tuning the model output layer on a downstream task.
Based on the above research, the inventors creatively combine clustering, Transformer model construction, model migration and other techniques and apply them to STLF, and propose the power system short-term load prediction method and the power system short-term load prediction device of the present invention. The method and the device are explained and described in detail below.
Referring to fig. 1, a flowchart of a method for predicting short-term load of an electric power system according to an embodiment of the present invention is shown, the method including:
Step 101: based on the respective short-term load history curves of each prediction scene, clustering is carried out by combining the time-series similarity and the distribution similarity of the short-term load history curves, and an optimal clustering result is obtained, wherein the optimal clustering result comprises: a plurality of categories of optimal clusters.
For the scenes of a power system, especially low-level equipment, the variation characteristics of the load data are highly similar, so the load data can be clustered and one or more of the most representative load data can stand in for the rest, which reduces the amount of data in subsequent operations and improves operation efficiency. Based on this consideration, clustering is carried out on the short-term load history curves of the prediction scenes by combining their time-series similarity and distribution similarity, and an optimal clustering result is obtained. In general, the optimal clustering result contains one or more clusters, each different from the others, so the result can be divided into categories by cluster, with one category corresponding to one cluster.
In a preferred embodiment, the clustering method specifically includes:
First, Z-score standardization is carried out on each short-term load history curve to obtain a standardized curve; a peak height and a peak width are set, and the peak and valley points of the sequence are extracted from each standardized curve; the peak and valley points are stretched along the horizontal axis and density-clustered along the vertical axis using DBSCAN, so that the peak and valley points are aligned, the sequence key points of each standardized curve are extracted, and outliers are ignored.
Then, with the Euclidean distance as the measure of time-series similarity, the similarity metric calculation, hierarchical clustering and similarity index calculation are carried out; finally, based on the results of the similarity metric calculation, the hierarchical clustering and the similarity index calculation, and in combination with a preset distribution similarity threshold, the optimal clustering result is obtained.
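The first two operations above (standardization and peak-valley extraction) can be sketched as follows. This is a minimal illustration and not the embodiment's implementation: the function names and the `min_height` threshold are hypothetical, only a height threshold is applied (no peak width), and the DBSCAN alignment step is omitted.

```python
import numpy as np

def z_score(curve):
    """Standardize a load curve to zero mean and unit variance."""
    curve = np.asarray(curve, dtype=float)
    std = curve.std()
    if std == 0.0:
        return np.zeros_like(curve)
    return (curve - curve.mean()) / std

def peak_valley_points(curve, min_height=0.5):
    """Return indices of local peaks and valleys of the standardized curve.

    min_height is a hypothetical threshold: a local extremum is kept only
    if its standardized magnitude is at least min_height.
    """
    z = z_score(curve)
    peaks, valleys = [], []
    for t in range(1, len(z) - 1):
        if z[t] > z[t - 1] and z[t] > z[t + 1] and z[t] >= min_height:
            peaks.append(t)
        elif z[t] < z[t - 1] and z[t] < z[t + 1] and z[t] <= -min_height:
            valleys.append(t)
    return peaks, valleys

# Synthetic load curve used only for illustration.
load = [3.0, 2.0, 1.0, 2.0, 5.0, 8.0, 5.0, 4.0, 6.0, 9.0, 6.0, 3.0]
peaks, valleys = peak_valley_points(load)
```

The shallow dip at index 7 is a local minimum but falls below the height threshold, so it is not kept as a key point.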
Performing the similarity metric calculation, hierarchical clustering and similarity index calculation, with the Euclidean distance as the measure of time-series similarity, comprises:
first, calculating the Euclidean distance matrix between the key points of different sequences, $D_E \in \mathbb{R}^{m \times m}$; then estimating the probability distribution of the short-term load prediction curves using kernel density estimation, and calculating the KL (Kullback-Leibler) divergence matrix between different short-term load prediction curves, with the KL divergence as the measure of sequence distribution similarity.
Assuming that the number of clusters is n, the hierarchical clustering result under cluster number n, $C = \{c_1, \ldots, c_n\}$, and the corresponding distribution similarity index $Sim_{dis}$ are calculated, as shown in the following formula:
Figure BDA0004153558980000121
In the above formula, the term
Figure BDA0004153558980000122
represents the sum of divergence within classes, and the term
Figure BDA0004153558980000123
represents the sum of divergence between different classes; $x_p$ is a load sequence in any class $c_i$, and $x_q$ is a load sequence in any class $c_j$. It should be noted that the function is monotonically decreasing, so the smaller the value, the better the clustering effect; however, the formula has no extreme point, so a threshold must be set manually, namely the preset distribution similarity threshold.
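The two matrices described above can be sketched with NumPy as follows. This is an illustration under stated assumptions: the key-point sequences are assumed to be aligned and of equal length, the kernel is Gaussian with a hypothetical fixed `bandwidth`, and the densities are evaluated on a fixed grid. The index $Sim_{dis}$ itself is not computed, because its exact formula is given only in the figure.

```python
import numpy as np

def euclidean_matrix(key_points):
    """Pairwise Euclidean distances between equal-length key-point sequences."""
    K = np.asarray(key_points, dtype=float)
    diff = K[:, None, :] - K[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kde_pdf(samples, grid, bandwidth=0.3):
    """Gaussian kernel density estimate on a grid, normalized to sum to 1."""
    samples = np.asarray(samples, dtype=float)
    scaled = (grid[:, None] - samples[None, :]) / bandwidth
    density = np.exp(-0.5 * scaled ** 2).sum(axis=1)
    return density / density.sum()

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL(p || q); eps avoids log(0) and division by zero."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def kl_matrix(curves, grid):
    """KL divergence between the estimated distributions of every curve pair."""
    pdfs = [kde_pdf(c, grid) for c in curves]
    m = len(pdfs)
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            D[i, j] = kl_divergence(pdfs[i], pdfs[j])
    return D

# Three synthetic standardized curves; the first two are nearly identical.
grid = np.linspace(-4.0, 4.0, 161)
curves = [
    [-1.0, -0.5, 0.0, 0.5, 1.0],
    [-1.0, -0.4, 0.1, 0.5, 0.9],
    [1.5, 2.0, 2.0, 2.5, 3.0],
]
D_E = euclidean_matrix(curves)
D_KL = kl_matrix(curves, grid)
```

Note that the KL divergence is not symmetric, so `D_KL` is generally not a symmetric matrix, unlike `D_E`.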
Step 102: according to different target application scenes, different load prediction models based on a transducer are established, and the load prediction models in each category are subjected to model training and evaluation by utilizing a short-term load history curve and known characteristics, so that the respective performance optimal model of each category is obtained.
After the clustering result is obtained, different load prediction models based on a transducer can be established according to different target application scenes, and the load prediction models in each category are subjected to model training and evaluation by utilizing a short-term load history curve and known characteristics, namely, the general characteristics are extracted, and the respective performance optimal model of each category is obtained. Generally, application scenarios in practical applications include: a large sample scene and a small sample scene. A large sample scene refers to a scene with a relatively large amount of data, and a small sample scene refers to a scene with a relatively small amount of data. Taking a resident distribution transformer as an example, a large sample scene is that the load data of the resident distribution transformer is large, the load data of the resident distribution transformer from the current day to the first 1 year is stored, and a small sample scene is that the load data of the resident distribution transformer is small, and the load data of the resident distribution transformer only from the current day to the first two months is stored. Therefore, the small sample scene does not reflect the real load data of the residential distribution transformer well.
There are two different load prediction models. One is defined as a multi-target load prediction model, which includes a single encoder and a plurality of decoders; the other is defined as a single-target load prediction model, which includes a single encoder and a single decoder.
First, a single-target or multi-target load prediction model is established based on the characteristics of the Transformer; the known historical features and the known forecast features can then be processed with this model to obtain the short-term load prediction curve of each prediction scene. This is because the single-target or multi-target load prediction model is essentially a regression model that reflects the relationship between the known historical features, the known forecast features and the short-term load prediction curve of each scene.
In a preferred embodiment, the encoder receives the known historical features and the known forecast features, encodes them using the characteristics of the Transformer, and transmits the resulting output features to the single decoder or to the plurality of decoders. The single decoder decodes the output features using the characteristics of the Transformer to obtain the short-term load prediction curve of the corresponding prediction scene; the plurality of decoders decode the output features using the characteristics of the Transformer to obtain the short-term load prediction curves of their corresponding prediction scenes. Each decoder yields the short-term load prediction curve of one prediction scene, so a plurality of decoders yield the curves of their respective prediction scenes.
The single-encoder, multiple-decoder structure described above can be understood more clearly with reference to the schematic structure shown in FIG. 2. Let the known historical features be $X_h \in \mathbb{R}^{h \times n}$ and the known forecast features be $X_p \in \mathbb{R}^{p \times n}$, where h represents the time length of the known historical features, p represents the time length of the known forecast features, and n represents the number of features used for prediction; these two features serve as the inputs to the encoder.
The known historical features and the known forecast features are each input to a multi-head attention layer (denoted Multi-Head Attention in FIG. 2). The output of each attention head is obtained through linear mapping, dot product and normalization, and the heads are concatenated to obtain the output Multihead(Q, K, V) of the multi-head attention layer, expressed as follows:
$\mathrm{Multihead}(Q, K, V) = \mathrm{concat}(\mathrm{head}_1, \ldots, \mathrm{head}_m)\, W^O$
Figure BDA0004153558980000131
In the above formulas, m represents the number of attention heads; $W^O$ represents the linear mapping weight that fuses the multi-head attention and maps it to the appropriate dimension; $Q = XW^Q$, $K = XW^K$, $V = XW^V$; X represents the input data of the attention layer, namely $X_h$ and $X_p$;
Figure BDA0004153558980000132
respectively represent linear mapping weights; and Q, K, V respectively represent the query matrix, the key matrix and the value matrix.
The output Multihead(Q, K, V) of the multi-head attention layer is then added to the input X (namely $X_h$ and $X_p$) and layer-normalized (denoted Add & Norm in FIG. 2) to obtain $\mathrm{Norm}_{out1}$, expressed as:
$\mathrm{Norm}_{out1} = \mathrm{Norm}(X + \mathrm{Multihead}(Q, K, V))$
$\mathrm{Norm}_{out1}$ is input to a feedforward neural network (denoted Feed Forward in FIG. 2) to obtain the output features of the feedforward neural network; $\mathrm{Norm}_{out1}$ is then added to these output features and layer-normalized (denoted by the Add & Norm above Feed Forward in FIG. 2) to obtain $\mathrm{Norm}_{out2}$, expressed as:
$\mathrm{Norm}_{out2} = \mathrm{Norm}(\mathrm{Norm}_{out1} + \mathrm{FC}(\mathrm{Norm}_{out1}))$
In the above formula, FC(·) represents a fully connected neural network.
Finally, the output features of the known historical features and the output features of the known forecast features are concatenated (denoted Concate in FIG. 2) to obtain $\mathrm{Encoder}_{out}$, expressed as:
Figure BDA0004153558980000141
In the above formula, the output feature of the known historical features $X_h$ is
Figure BDA0004153558980000142
and the output feature of the known forecast features $X_p$ is
Figure BDA0004153558980000143
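The encoder computation described above (multi-head attention, Add & Norm, Feed Forward, Add & Norm, then concatenation) can be sketched in NumPy as follows. Several points are assumptions rather than details taken from the embodiment: each head uses the standard scaled dot-product attention with softmax, the feedforward network has one hidden layer with ReLU, layer normalization has no learnable parameters, the two inputs share one set of randomly initialized weights, and the sizes h = 48, p = 24, n = 8 are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    """Normalize each row (time step) to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """Multihead(Q, K, V) = concat(head_1, ..., head_m) W_o."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_head = Q.shape[1] // num_heads
    heads = []
    for i in range(num_heads):
        cols = slice(i * d_head, (i + 1) * d_head)
        # Assumed head form: scaled dot-product attention.
        scores = Q[:, cols] @ K[:, cols].T / np.sqrt(d_head)
        heads.append(softmax(scores) @ V[:, cols])
    return np.concatenate(heads, axis=1) @ W_o

def encoder_layer(X, params, num_heads):
    attn = multi_head_attention(X, params["W_q"], params["W_k"],
                                params["W_v"], params["W_o"], num_heads)
    norm_out1 = layer_norm(X + attn)                      # Add & Norm
    fc = np.maximum(0.0, norm_out1 @ params["W_1"]) @ params["W_2"]
    return layer_norm(norm_out1 + fc)                     # Add & Norm

rng = np.random.default_rng(0)
h, p, n, m = 48, 24, 8, 2   # history length, forecast length, features, heads
params = {k: rng.normal(scale=0.1, size=(n, n))
          for k in ("W_q", "W_k", "W_v", "W_o")}
params["W_1"] = rng.normal(scale=0.1, size=(n, 4 * n))
params["W_2"] = rng.normal(scale=0.1, size=(4 * n, n))

X_h = rng.normal(size=(h, n))   # known historical features (synthetic)
X_p = rng.normal(size=(p, n))   # known forecast features (synthetic)

# Concatenate the two encoded feature blocks along the time axis.
encoder_out = np.concatenate(
    [encoder_layer(X_h, params, m), encoder_layer(X_p, params, m)], axis=0)
```

Because no step depends on the previous time step's output, all rows are processed in a single matrix operation, which is the parallelism that distinguishes this structure from a recurrent network.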
The multiple decoders each receive $\mathrm{Encoder}_{out}$, decode it, and output the short-term load prediction curves $y_1, y_2, \ldots, y_m$ of their respective scenes. The specific decoding operations of a decoder can be understood with reference to the encoder and to known decoder principles, and are not described in detail here. The features in the embodiments of the present invention are understood to be features corresponding to information other than load data or load curves, for example: temperature, rainfall, air pressure, wind speed, humidity, sunshine duration and other information.
In a preferred embodiment, step 102 specifically includes:
Determining whether the target application scene is a large sample scene or a small sample scene; a separate prediction model is established depending on whether the target application scene to be predicted is a large sample scene or a small sample scene. Namely:
in the case that the target application scene is a large sample scene, constructing a first Transformer-based load prediction model; and in the case that the target application scene is a small sample scene, constructing a second Transformer-based load prediction model.
Finally, according to the optimal clustering result, a training set and a test set are divided for the first load prediction model or the second load prediction model using the short-term load history curves and the known features, and the model is trained and evaluated to obtain the performance optimal model of each category.
Let the optimal clustering result be denoted $C_{opt}$. Then, for a large sample scene:
according to the optimal clustering result $C_{opt}$, any scene r is selected as a reference scene from the target application scenes corresponding to the first load prediction model; the known historical features, the known forecast features and the short-term load history curve of the reference scene r are divided into a training set and a test set corresponding to the single-target load prediction model according to preset conditions. The preset conditions in the embodiment of the present invention may be set according to actual requirements, for example: 80% of all known historical features, 80% of all known forecast features and 80% of all short-term load history curves of the reference scene r are selected as the training set, and the remaining 20% are selected as the test set.
Finally, taking the minimum mean square error as the objective function, the first load prediction model is trained multiple times on the training set through the gradient back propagation algorithm and evaluated on the test set, to obtain the performance optimal models $\{M_1, \ldots, M_k\}$ of the categories, where $M_1$ represents the performance optimal model of the first category and $M_k$ represents the performance optimal model of the k-th category.
As one example: during training, the inputs are the known historical features and the known forecast features relative to those historical features, and the output is the true short-term load; during prediction, an unknown load is predicted. For example, assume that the features and load data of the past week are available. In the first training pass, the input known historical features are the features of Monday and Tuesday, the known forecast features relative to them are the features of Wednesday, and the output is the true short-term load of Wednesday. In the second training pass, the input known historical features are the features of Tuesday and Wednesday, the known forecast features are the features of Thursday, and the output is the true short-term load of Thursday; training is repeated in this way, and the model is finally evaluated. When the load prediction model is used to predict the next Monday, the input known historical features are the features of Saturday and Sunday, the known forecast features are the forecast features of Monday, and the unknown short-term load of Monday can be accurately predicted by the load prediction model.
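The sample construction in this example can be sketched as a sliding window, assuming that each sample pairs two consecutive days of known historical features with the forecast features and true load of the following day. The function names, the `temp` feature and the load values are hypothetical, and the 80%/20% chronological split follows the example preset condition given above.

```python
def make_samples(features, loads, history_days=2):
    """Build (known_history, known_forecast, true_load) training samples.

    features[d] and loads[d] hold the feature record and the load of day d.
    """
    samples = []
    for d in range(history_days, len(loads)):
        known_history = features[d - history_days:d]   # e.g. Mon, Tue
        known_forecast = features[d]                   # e.g. Wed
        samples.append((known_history, known_forecast, loads[d]))
    return samples

def split_train_test(samples, train_ratio=0.8):
    """Chronological split: the first train_ratio of samples for training."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
# Synthetic one-week data used only for illustration.
features = [{"day": d, "temp": 20 + i} for i, d in enumerate(days)]
loads = [100.0 + 5 * i for i in range(len(days))]

samples = make_samples(features, loads)
train, test = split_train_test(samples)
```

At prediction time the same window is applied once more: the features of Saturday and Sunday form the known history, and the forecast features of the coming Monday are supplied in place of `features[d]`.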
It should be noted that, since the amount of data in a large sample scene is relatively large, the scene corresponding to each category is determined, and the first load prediction model is therefore a single-target load prediction model. The amount of data in a small sample scene is relatively small, and it cannot be determined which prediction scene's load characteristics the small sample scene most resembles; the second load prediction model is therefore constructed as a multi-target load prediction model, so that short-term load prediction for the small sample scene can be more accurate. For a small sample scene:
according to the optimal clustering result $C_{opt}$, any class $c_i$ is selected from the target application scenes corresponding to the second load prediction model; the known historical features, the known forecast features and the short-term load history curves of the selected class $c_i$ are divided into a training set and a test set corresponding to the second load prediction model according to preset conditions; finally, taking the minimum mean square error as the objective function, the second load prediction model is trained multiple times on the training set through the gradient back propagation algorithm and evaluated on the test set, to obtain the performance optimal models of the categories, denoted $\{M_1, \ldots, M_k\}$, where $M_1$ represents the performance optimal model of the first category and $M_k$ represents the performance optimal model of the k-th category.
Step 103: and performing model migration on the respective performance optimal model of each category so that each performance optimal model is applied to other application scenes among the categories to perform short-term load prediction.
After the performance optimal model of each category is obtained, that model captures only the general features extracted for the category and cannot accurately reflect the situation of the other scenes in the category. Model migration is therefore needed, so that the migrated model accurately reflects the situation of the other scenes in the category and short-term load prediction can be performed accurately.
For a large sample scene, the category to which the target application scene belongs is explicitly known. Therefore, according to the optimal clustering result $C_{opt}$, the performance optimal model of each class is applied to the other target scenes among the classes, and the third Transformer-based load prediction model constructed for the other target scenes is also a single-target load prediction model. Likewise, for a small sample scene, according to the optimal clustering result $C_{opt}$, the performance optimal model of each class is applied to the other target scenes among the classes, and the fourth Transformer-based load prediction model constructed for the other target scenes is essentially a multi-target load prediction model.
Based on the above principle, for a large sample scene, the known historical features, the known forecast features and the short-term load history curves of the other target scenes among the classes are divided into a training set and a test set corresponding to the third load prediction model according to the preset conditions, and the output layer of the performance optimal model of the category in which the other target scenes are located is fine-tuned. For example, a distribution transformer A and another distribution transformer B are in the same category C, and the performance optimal model of category C has been obtained based on distribution transformer A. When distribution transformer B is the target application scene, the known historical features, the known forecast features and the short-term load history curve of distribution transformer B are divided into a training set and a test set corresponding to the third load prediction model according to the preset conditions, and the output layer of the performance optimal model of category C is fine-tuned.
After fine-tuning, the output layer parameters of the third load prediction model are solved on the training set using a least squares method with a 2-norm constraint, and the third load prediction model is evaluated on the test set, so that the third load prediction model is applied to the other target scenes for short-term load prediction. Continuing the foregoing example: after fine-tuning, the output layer parameters of the third load prediction model of category C are solved on the training set using the least squares method with a 2-norm constraint, and the third load prediction model of category C is evaluated on the test set, so that it is applied to distribution transformer B and the short-term load of distribution transformer B is predicted accurately.
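One common way to realize a least squares solution with a 2-norm constraint is the penalized (ridge) form, which has a closed-form solution. The sketch below assumes that form; the hidden features `H_train` stand in for the outputs of the frozen layers of the migrated model, and all names, sizes and the weight `lam` are hypothetical.

```python
import numpy as np

def fit_output_layer(H, y, lam=1e-2):
    """Solve min_W ||H W - y||^2 + lam * ||W||^2 in closed form.

    H: (samples, hidden) features from the frozen layers of the model.
    y: (samples,) true short-term loads of the target scene.
    """
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ y)

rng = np.random.default_rng(1)
# Synthetic stand-ins: hidden features and loads of the target scene.
H_train = rng.normal(size=(200, 6))
w_true = np.array([0.5, -1.0, 2.0, 0.0, 1.5, -0.5])
y_train = H_train @ w_true + 0.01 * rng.normal(size=200)

W_out = fit_output_layer(H_train, y_train)
mse = float(np.mean((H_train @ W_out - y_train) ** 2))
```

Because only the output layer is solved and the solution is a single linear system, no gradient iterations are needed for the target scene, which is consistent with the small computational cost the method aims for.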
For a small sample scene, the procedure differs slightly from that for a large sample scene:
first, the time-series similarity and the distribution similarity between the other target scene and each prediction scene are calculated, and the other target scene is assigned to the closest prediction scene; that is, it is first determined into which category the target scene should be placed.
Then, based on the closest prediction scene, a fourth Transformer-based load prediction model is constructed, and the known historical features, the known forecast features and the short-term load history curve of the other target scene are divided into a training set and a test set corresponding to the fourth load prediction model according to preset conditions; finally, taking the minimum mean square error as the objective function, the decoder of the fourth load prediction model is trained on the training set using the BP algorithm (gradient back propagation algorithm), and the fourth load prediction model with the trained decoder is evaluated on the test set, so that it is applied to the other target scene and the short-term load of the target scene is predicted accurately.
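The assignment of a small sample target scene to its closest prediction scene can be sketched as a nearest-neighbor search. This illustration simplifies the step: it compares Z-score standardized curves by Euclidean distance only and leaves out the distribution similarity term; the scene names and curve values are hypothetical.

```python
import numpy as np

def z_score(x):
    """Standardize a curve to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def nearest_scene(target_curve, scene_curves):
    """Return the name of the closest prediction scene and all distances."""
    target = z_score(target_curve)
    distances = {
        name: float(np.linalg.norm(target - z_score(curve)))
        for name, curve in scene_curves.items()
    }
    return min(distances, key=distances.get), distances

# Synthetic curves: the target has roughly the shape of scene_A at 10x scale.
scene_curves = {
    "scene_A": [1.0, 2.0, 4.0, 8.0, 4.0, 2.0],
    "scene_B": [8.0, 4.0, 2.0, 1.0, 2.0, 4.0],
}
target_curve = [10.0, 21.0, 39.0, 82.0, 41.0, 19.0]
best, distances = nearest_scene(target_curve, scene_curves)
```

Standardizing before comparison makes the assignment depend on curve shape rather than load magnitude, so a small transformer can be matched to a larger one with the same daily pattern.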
In an embodiment of the present invention, based on the above power system short-term load prediction method, a power system short-term load prediction device is further provided. Referring to FIG. 3, which shows a block diagram of the power system short-term load prediction device, the device includes:
the clustering module 310 is configured to cluster based on the respective short-term load history curves of each predicted scenario, and combine the time sequence and the distribution similarity of each short-term load history curve to obtain an optimal clustering result, where the optimal clustering result includes: a plurality of categories of optimal clusters;
the modeling training and evaluating module 320 is configured to establish different Transformer-based load prediction models according to different target application scenes, and to train and evaluate the load prediction model in each category using the short-term load history curves and the known features, to obtain the performance optimal model of each category;
the migration module 330 is configured to perform model migration on the respective performance best model of each class, so that each performance best model is applied to other application scenarios between classes, and performs short-term load prediction.
Optionally, the clustering module 310 includes:
the normalization unit is used for performing Z-SCORE normalization on each short-term load history curve to obtain a normalization curve;
An extraction unit for setting peak height and peak width, and extracting sequence peak-valley points for each standardized curve;
the alignment unit is used for stretching the peak and valley points along the horizontal axis, performing density clustering along the vertical axis using DBSCAN, aligning the peak and valley points, extracting the sequence key points of each standardized curve, and ignoring outliers;
the computing unit is used for performing the similarity metric calculation, hierarchical clustering and similarity index calculation, with the Euclidean distance as the measure of time-series similarity;
and the clustering unit is used for obtaining the optimal clustering result based on the similarity measurement calculation result, the hierarchical clustering and the similarity index calculation result and combining a preset distribution similarity threshold value.
Optionally, the computing unit is specifically configured to:
calculating the Euclidean distance matrix between the key points of different sequences: $D_E \in \mathbb{R}^{m \times m}$;
estimating the probability distribution of the short-term load prediction curves using kernel density estimation, and calculating the KL divergence matrix between different short-term load prediction curves, with the KL divergence as the measure of sequence distribution similarity;
calculating the hierarchical clustering result under the cluster number n, $C = \{c_1, \ldots, c_n\}$, and the corresponding distribution similarity index $Sim_{dis}$, as shown in the following formula:
Figure BDA0004153558980000181
In the above formula, the term
Figure BDA0004153558980000182
represents the sum of divergence within classes, and the term
Figure BDA0004153558980000183
represents the sum of divergence between different classes; $x_p$ is a load sequence in any class $c_i$, and $x_q$ is a load sequence in any class $c_j$.
Optionally, the modeling training and evaluation module 320 includes:
the scene unit is used for determining whether the target application scene is a large sample scene or a small sample scene;
the first modeling unit is used for constructing a first Transformer-based load prediction model in the case that the target application scene is the large sample scene;
the second modeling unit is used for constructing a second Transformer-based load prediction model in the case that the target application scene is the small sample scene;
and the training and evaluating unit is used for dividing the training set and the testing set into the first load prediction model or the second load prediction model by utilizing the short-term load history curve and the known characteristics according to the optimal clustering result, and training the models and evaluating the models to obtain the respective performance optimal model of each category.
Optionally, the known features include: known historical features, known forecast features; the training and evaluation unit is specifically configured to:
according to the optimal clustering result $C_{opt}$, selecting any scene r as a reference scene from the target application scenes corresponding to the first load prediction model;
dividing the known historical features, the known forecast features and the short-term load history curve of the reference scene into a training set and a test set corresponding to the first load prediction model according to preset conditions;
taking the minimum mean square error as the objective function, training the first load prediction model multiple times on the training set through the gradient back propagation algorithm, and evaluating the first load prediction model on the test set, to obtain the performance optimal model of each category; or,
according to the optimal clustering result $C_{opt}$, selecting any class $c_i$ from the target application scenes corresponding to the second load prediction model;
dividing the known historical features, the known forecast features and the short-term load history curves of class $c_i$ into a training set and a test set corresponding to the second load prediction model according to preset conditions;
and taking the minimum mean square error as the objective function, training the second load prediction model multiple times on the training set through the gradient back propagation algorithm, and evaluating the second load prediction model on the test set, to obtain the performance optimal model of each category.
Optionally, the migration module 330 includes:
a first application unit, configured to apply, according to the optimal clustering result $C_{opt}$, the performance optimal model of each class to the other target scenes among the classes;
the modeling fine-tuning unit is used for constructing a third Transformer-based load prediction model based on the other target scenes, dividing the known historical features, the known forecast features and the short-term load history curves of the other target scenes into a training set and a test set corresponding to the third load prediction model according to preset conditions, and fine-tuning the output layer of the performance optimal model of the category in which the other target scenes are located;
and the solving application unit is used for solving the output layer parameters of the third load prediction model by utilizing a least square method with 2-norm constraint in the training set, and evaluating the third load prediction model by utilizing the test set so that the third load prediction model is applied to the other target scenes to perform short-term load prediction.
Optionally, the migration module 330 further includes:
a second application unit, configured to apply, according to the optimal clustering result $C_{opt}$, the performance optimal model of each class to the other target scenes among the classes;
the classifying unit is used for calculating the time-series similarity and the distribution similarity between the other target scenes and each prediction scene, and assigning the other target scenes to the closest prediction scene;
the modeling dividing unit is used for constructing a fourth Transformer-based load prediction model based on the closest prediction scene, and dividing the known historical features, the known forecast features and the short-term load history curves of the other target scenes into a training set and a test set corresponding to the fourth load prediction model according to preset conditions;
the training application unit is used for training the decoder of the fourth load prediction model on the training set using the BP algorithm, with the minimum mean square error as the objective function, and evaluating the fourth load prediction model with the trained decoder on the test set, so that it is applied to the other target scenes for short-term load prediction.
In summary, in the power system short-term load prediction method, clustering is first performed on the short-term load history curves of the prediction scenes by combining their time-series similarity and distribution similarity, so as to obtain an optimal clustering result.
Then, according to different target application scenes, different Transformer-based load prediction models are established, and the load prediction model in each category is trained and evaluated using the short-term load history curves and the known features, that is, the general features are extracted, so that the performance optimal model of each category is obtained; model migration is then performed on the performance optimal model of each category, so that each performance optimal model is applied to other application scenes among the categories for short-term load prediction.
The method aims at low-level loads, particularly at the level loads of a residential distribution transformer, firstly, a respective short-term load history curve of each prediction scene is obtained, and because the characteristics of the loads are concentrated, clustering can be carried out, and a certain number of clustering categories are obtained. And extracting general features for each practical application scene, and further obtaining the respective performance optimal model of each category. And finally, migrating the performance optimal model to other application scenes among classes, so that short-term load prediction can be simply and conveniently performed on any application scene. The prediction precision of the low-level load is greatly improved, the operation amount is small, the operation speed is high, and the practicability is high.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.

Claims (10)

1. A method for predicting short-term load of an electric power system, the method comprising:
based on the respective short-term load history curves of each prediction scene, clustering is carried out by combining the time sequence and the distribution similarity of each short-term load history curve, and an optimal clustering result is obtained, wherein the optimal clustering result comprises: a plurality of categories of optimal clusters;
according to different target application scenes, different Transformer-based load prediction models are established, and model training and evaluation are carried out on the load prediction models in each category by utilizing the short-term load history curve and the known features, so that respective performance optimal models of each category are obtained;
and performing model migration on the respective performance optimal model of each category so that each performance optimal model is applied to other application scenes among the categories to perform short-term load prediction.
2. The power system short-term load prediction method according to claim 1, wherein each of the different load prediction models comprises a single encoder and a plurality of decoders, or a single encoder and a single decoder;
establishing different Transformer-based load prediction models comprises:
the single encoder receives known historical features and known forecast features, performs an encoding operation using the characteristics of the Transformer, obtains output features and transmits the output features to the single decoder or the plurality of decoders, wherein one decoder corresponds to one prediction scene;
the single decoder performs a decoding operation on the output features using the characteristics of the Transformer to obtain the short-term load prediction curve of the corresponding prediction scene;
or the plurality of decoders each perform a decoding operation on the output features using the characteristics of the Transformer to obtain the short-term load prediction curves of their corresponding prediction scenes.
3. The method of claim 2, wherein the encoder receiving the known historical features and the known forecast features, performing an encoding operation using the characteristics of the Transformer, and obtaining the output features comprises:
presetting the known historical features as $X_h \in \mathbb{R}^{h \times n}$ and the known forecast features as $X_p \in \mathbb{R}^{p \times n}$, where $h$ represents the time length of the known historical features, $p$ represents the time length of the known forecast features, and $n$ represents the number of features used for prediction;
inputting the known historical features and the known forecast features respectively into a plurality of attention layers, obtaining the output of each attention layer through linear mapping, dot product and normalization, and stacking the plurality of attention layers to obtain the output $\mathrm{Multihead}(Q, K, V)$ of the multi-head attention layer, expressed as:
$\mathrm{Multihead}(Q, K, V) = \mathrm{concat}(\mathrm{head}_1, \ldots, \mathrm{head}_m) W^O$
Figure FDA0004153558960000021
in the above formula, $m$ represents the number of attention heads, $W^O$ represents the linear mapping weight that fuses the multi-head attention and maps it to the appropriate dimension, $Q = XW^Q$, $K = XW^K$, $V = XW^V$, and $X$ represents the input data of the attention layer, namely $X_h$ or $X_p$;
Figure FDA0004153558960000022
respectively represent linear mapping weights, and $Q$, $K$, $V$ respectively represent the query matrix, the key matrix and the value matrix;
adding the output $\mathrm{Multihead}(Q, K, V)$ of the multi-head attention layer to the input $X$ of the attention layer and performing layer normalization to obtain $\mathrm{Norm}_{out1}$, expressed as:
$\mathrm{Norm}_{out1} = \mathrm{Norm}(X + \mathrm{Multihead}(Q, K, V))$
inputting $\mathrm{Norm}_{out1}$ into a feedforward neural network to obtain the feedforward neural network output features, then adding $\mathrm{Norm}_{out1}$ to the feedforward neural network output features and performing layer normalization to obtain $\mathrm{Norm}_{out2}$, expressed as:
$\mathrm{Norm}_{out2} = \mathrm{Norm}(\mathrm{Norm}_{out1} + \mathrm{FC}(\mathrm{Norm}_{out1}))$
in the above formula, $\mathrm{FC}(\cdot)$ represents a fully connected neural network;
stacking the output features of the known historical features and the output features of the known forecast features to obtain $\mathrm{Encoder}_{out}$, expressed as:
Figure FDA0004153558960000023
in the above formula, the output feature of the known historical features $X_h$ is
Figure FDA0004153558960000024
and the output feature of the known forecast features $X_p$ is
Figure FDA0004153558960000025
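As a non-limiting illustration, the encoder computation recited in claim 3 can be sketched in NumPy. The sketch assumes several things the text does not state: each head uses the standard scaled dot-product form softmax(QK^T / sqrt(d)) V (the head formula is only available as an image), FC(.) is a two-layer ReLU network, layer normalization has no learned gain or bias, the same weights encode both $X_h$ and $X_p$, and "stacking" means concatenation along the time axis. All sizes, the parameter dictionary `P` and the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-shift for numerical stability
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-6):
    # Normalise each time step over the feature axis (no learned gain/bias)
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / (x.std(axis=-1, keepdims=True) + eps)

def multihead(X, Wq, Wk, Wv, Wo):
    heads = []
    for wq, wk, wv in zip(Wq, Wk, Wv):               # one iteration per head
        Q, K, V = X @ wq, X @ wk, X @ wv             # Q = X W^Q, K = X W^K, V = X W^V
        A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # assumed scaled dot product
        heads.append(A @ V)
    return np.concatenate(heads, axis=-1) @ Wo       # concat(head_1..head_m) W^O

def encoder_layer(X, P):
    # Norm_out1 = Norm(X + Multihead(Q, K, V))
    norm1 = layer_norm(X + multihead(X, P["Wq"], P["Wk"], P["Wv"], P["Wo"]))
    # FC(.) assumed to be a two-layer ReLU network
    fc = np.maximum(norm1 @ P["W1"], 0.0) @ P["W2"]
    # Norm_out2 = Norm(Norm_out1 + FC(Norm_out1))
    return layer_norm(norm1 + fc)

rng = np.random.default_rng(0)
h, p, n, m, d = 6, 3, 4, 2, 3                        # illustrative sizes only
P = {
    "Wq": rng.normal(size=(m, n, d)), "Wk": rng.normal(size=(m, n, d)),
    "Wv": rng.normal(size=(m, n, d)), "Wo": rng.normal(size=(m * d, n)),
    "W1": rng.normal(size=(n, 8)), "W2": rng.normal(size=(8, n)),
}
X_h = rng.normal(size=(h, n))                        # known historical features
X_p = rng.normal(size=(p, n))                        # known forecast features
# Assumed stacking: concatenate the two encoded blocks along the time axis
encoder_out = np.vstack([encoder_layer(X_h, P), encoder_layer(X_p, P)])
```

Under these assumptions `encoder_out` has one row per time step of the historical and forecast windows, which is the array a decoder for a given prediction scene would consume.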
4. The method for predicting short-term load of a power system according to claim 1, wherein clustering based on the respective short-term load history curves of each prediction scene, in combination with the time-sequence similarity and distribution similarity of each short-term load history curve, to obtain an optimal clustering result comprises:
performing Z-score standardization on each short-term load history curve to obtain a standardized curve;
setting a peak height and a peak width, and extracting the sequence peak and valley points of each standardized curve;
stretching the horizontal axis of the peak and valley points, performing density clustering on the vertical axis using DBSCAN, aligning the peak and valley points, extracting the sequence key points of each standardized curve, and ignoring outliers;
using the Euclidean distance as the measure of time-sequence similarity, performing similarity measure calculation, and performing hierarchical clustering and similarity index calculation;
and obtaining the optimal clustering result based on the similarity measure calculation result and the hierarchical clustering and similarity index calculation results, in combination with a preset distribution similarity threshold.
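As a non-limiting illustration, the first two steps of claim 4 (Z-score standardization and peak and valley extraction) can be sketched as follows. This is a simplified stand-in: it keeps only strict local extrema above a height threshold and omits the peak-width filter, the DBSCAN-based alignment and the later clustering steps. The threshold `min_height`, the sample curve `load` and the function names are hypothetical.

```python
import numpy as np

def z_score(curve):
    # Z-score standardisation: zero mean, unit standard deviation
    curve = np.asarray(curve, dtype=float)
    return (curve - curve.mean()) / curve.std()

def local_peaks(y, min_height):
    # Interior indices i with y[i-1] < y[i] > y[i+1] and y[i] >= min_height
    i = np.arange(1, len(y) - 1)
    mask = (y[i] > y[i - 1]) & (y[i] > y[i + 1]) & (y[i] >= min_height)
    return i[mask]

def peak_valley_points(curve, min_height=0.5):
    z = z_score(curve)
    peaks = local_peaks(z, min_height)      # prominent maxima
    valleys = local_peaks(-z, min_height)   # prominent minima, found on the mirrored curve
    return z, peaks, valleys

# Hypothetical 12-point load curve
load = [3.0, 2.0, 1.0, 2.0, 5.0, 9.0, 5.0, 4.0, 6.0, 8.0, 6.0, 3.0]
z, peaks, valleys = peak_valley_points(load)
print(peaks.tolist(), valleys.tolist())     # [5, 9] [2]
```

The shallow dip at index 7 is a local minimum but lies within 0.5 standard deviations of the mean, so the height threshold discards it.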
5. The power system short-term load prediction method according to claim 4, wherein using the Euclidean distance as the measure of time-sequence similarity, performing similarity measure calculation, and performing hierarchical clustering and similarity index calculation comprises:
calculating the Euclidean distance matrix among the key points of different sequences: $D_E \in \mathbb{R}^{m \times m}$;
estimating the probability distribution of the short-term load prediction curves by kernel density estimation, and calculating the KL divergence matrix among different short-term load prediction curves, taking the KL divergence as the measure of sequence distribution similarity;
calculating the hierarchical clustering result under the number of clusters $n$, $C = \{c_1, \ldots, c_n\}$, and the corresponding distribution similarity index $\mathrm{Sim}_{dis}$, as shown in the following formula:
Figure FDA0004153558960000031
in the above formula,
Figure FDA0004153558960000032
represents the sum of divergence within the same class,
Figure FDA0004153558960000033
represents the sum of divergence between different classes, $x_p$ is a load sequence in any class $c_i$, and $x_q$ is a load sequence in any class $c_j$.
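As a non-limiting illustration, the two similarity matrices of claim 5 can be sketched in NumPy. The sketch assumes a fixed-bandwidth Gaussian kernel density estimate evaluated on a common grid, and it uses the raw curves as stand-ins for the aligned key points. It computes the within-class and between-class divergence sums separately and does not combine them into $\mathrm{Sim}_{dis}$, because that formula is only available as an image. The bandwidth, grid, labels and function names are hypothetical.

```python
import numpy as np

def euclidean_matrix(K):
    # K: (m, k) key-point vectors -> (m, m) pairwise Euclidean distances D_E
    diff = K[:, None, :] - K[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kde_pdf(samples, grid, bandwidth=0.3):
    # Gaussian kernel density estimate on the grid, normalised to sum to 1
    u = (grid[:, None] - samples[None, :]) / bandwidth
    dens = np.exp(-0.5 * u ** 2).sum(axis=1)
    return dens / dens.sum()

def kl_matrix(curves, grid, eps=1e-12):
    # eps keeps the logarithm finite where a density is numerically zero
    P = np.array([kde_pdf(c, grid) for c in curves]) + eps
    P /= P.sum(axis=1, keepdims=True)
    logP = np.log(P)
    # KL[p, q] = sum_x P_p(x) * (log P_p(x) - log P_q(x))
    return (P[:, None, :] * (logP[:, None, :] - logP[None, :, :])).sum(axis=-1)

def divergence_sums(KL, labels):
    same = labels[:, None] == labels[None, :]
    return KL[same].sum(), KL[~same].sum()   # within-class sum, between-class sum

t = np.linspace(0.0, 1.0, 50)
curves = np.array([-1 + 2 * t, -1.1 + 2.2 * t, 2 + 2 * t, 2.1 + 2.1 * t])
labels = np.array([0, 0, 1, 1])              # hypothetical cluster assignment
grid = np.linspace(-3.0, 6.0, 200)

D_E = euclidean_matrix(curves)
KL = kl_matrix(curves, grid)
intra, inter = divergence_sums(KL, labels)
```

For a sensible clustering the within-class sum should be much smaller than the between-class sum, which is the property a distribution similarity threshold can test.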
6. The method for predicting short-term load of an electric power system according to claim 1, wherein establishing different Transformer-based load prediction models according to different target application scenes, and performing model training and evaluation on the load prediction models in each category by using the short-term load history curve and the known features to obtain respective performance optimal models of each category, comprises:
determining whether the target application scene is a large sample scene or a small sample scene;
constructing a Transformer-based first load prediction model in the case that the target application scene is the large sample scene;
constructing a Transformer-based second load prediction model in the case that the target application scene is the small sample scene;
and, according to the optimal clustering result, dividing a training set and a test set for the first load prediction model or the second load prediction model by using the short-term load history curve and the known features, and training and evaluating the model to obtain the respective performance optimal model of each category.
7. The power system short-term load prediction method according to claim 6, characterized in that the known features include known historical features and known forecast features;
according to the optimal clustering result, dividing a training set and a test set for the first load prediction model or the second load prediction model by using the short-term load history curve and the known features, and training and evaluating the model to obtain the respective performance optimal model of each category, comprises:
according to the optimal clustering result $C_{opt}$, selecting any scene $r$ in the target application scenes corresponding to the first load prediction model as a reference scene;
dividing the known historical features, the known forecast features and the short-term load history curve of the reference scene into a training set and a test set corresponding to the first load prediction model according to preset conditions;
taking the minimum mean square error as the objective function, training the first load prediction model multiple times on the training set by the gradient back-propagation algorithm, and evaluating the first load prediction model on the test set to obtain the respective performance optimal model of each category; or,
according to the optimal clustering result $C_{opt}$, selecting any class $c_i$ in the target application scenes corresponding to the second load prediction model;
dividing the known historical features, the known forecast features and the short-term load history curves of class $c_i$ into a training set and a test set corresponding to the second load prediction model according to preset conditions;
and taking the minimum mean square error as the objective function, training the second load prediction model multiple times on the training set by the gradient back-propagation algorithm, and evaluating the second load prediction model on the test set to obtain the respective performance optimal model of each category.
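As a non-limiting illustration, the division into a training set and a test set recited in claim 7 can be sketched as follows. The text only says "preset conditions"; the sketch assumes sliding windows of $h$ historical steps and $p$ forecast steps and an 80/20 chronological split. For simplicity the same feature matrix supplies both the historical and the forecast windows. `make_samples`, `chronological_split` and all sizes are hypothetical.

```python
import numpy as np

def make_samples(features, load, h, p):
    # features: (T, n) known features; load: (T,) short-term load history curve
    X_h, X_p, Y = [], [], []
    for t in range(h, len(load) - p + 1):
        X_h.append(features[t - h:t])      # known historical features, shape (h, n)
        X_p.append(features[t:t + p])      # known forecast features, shape (p, n)
        Y.append(load[t:t + p])            # load to be predicted, shape (p,)
    return np.array(X_h), np.array(X_p), np.array(Y)

def chronological_split(arrays, train_ratio=0.8):
    # Earlier samples go to the training set, later ones to the test set
    n_train = int(len(arrays[0]) * train_ratio)
    return [a[:n_train] for a in arrays], [a[n_train:] for a in arrays]

rng = np.random.default_rng(0)
T, n, h, p = 48, 3, 8, 4                   # illustrative sizes only
features = rng.normal(size=(T, n))
load = rng.normal(size=T)

X_h, X_p, Y = make_samples(features, load, h, p)
train, test = chronological_split([X_h, X_p, Y])
```

A chronological split is chosen here so that the test set contains only samples later than the training set, which avoids leaking future load values into training.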
8. The method for predicting short-term load of an electric power system according to claim 1, wherein performing model migration on the respective performance optimal model of each category, so that each performance optimal model is applied to other application scenes among the categories to perform short-term load prediction, comprises:
according to the optimal clustering result $C_{opt}$, respectively applying the performance optimal model in each class to other target scenes among the classes;
constructing a Transformer-based third load prediction model based on the other target scenes, dividing the known historical features, the known forecast features and the short-term load history curves of the other target scenes into a training set and a test set corresponding to the third load prediction model according to preset conditions, and fine-tuning the output layer of the performance optimal model in the category where the other target scenes are located;
and solving the output layer parameters of the third load prediction model on the training set by the least squares method with a 2-norm constraint, and evaluating the third load prediction model on the test set, so that the third load prediction model is applied to the other target scenes to perform short-term load prediction.
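As a non-limiting illustration, solving the output layer by least squares with a 2-norm constraint can be sketched as a closed-form ridge regression. This reads the constraint as a Tikhonov penalty $\lambda \lVert W \rVert_2^2$, which is an assumption, since the text does not give the exact formulation. `H` stands for the features produced by the frozen layers of the performance optimal model; `H`, `lam` and `fit_output_layer` are hypothetical.

```python
import numpy as np

def fit_output_layer(H, Y, lam=1e-2):
    # W = argmin ||H W - Y||^2 + lam * ||W||^2 = (H^T H + lam I)^(-1) H^T Y
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ Y)

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 4))        # stand-in for frozen-layer features of 50 samples
W_true = rng.normal(size=(4, 2))
Y = H @ W_true                      # noise-free targets, for illustration only

W_small = fit_output_layer(H, Y, lam=1e-6)   # nearly unregularised solution
W_large = fit_output_layer(H, Y, lam=100.0)  # strongly shrunk solution
```

Because only a linear system is solved, no iterative training is needed for the migrated model, which is consistent with the small computation cost the description emphasises.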
9. The method for predicting short-term load of an electric power system according to claim 1, wherein performing model migration on the respective performance optimal model of each category, so that each performance optimal model is applied to other application scenes among the categories to perform short-term load prediction, comprises:
according to the optimal clustering result $C_{opt}$, respectively applying the performance optimal model in each class to other target scenes among the classes;
calculating the time-sequence similarity and distribution similarity between the other target scenes and each prediction scene, and assigning each of the other target scenes to its closest prediction scene;
constructing a Transformer-based fourth load prediction model based on the closest prediction scene, and dividing the known historical features, the known forecast features and the short-term load history curves of the other target scenes into a training set and a test set corresponding to the fourth load prediction model according to preset conditions;
and taking the minimum mean square error as the objective function, training the decoder of the fourth load prediction model on the training set by the BP algorithm, and evaluating the decoder-trained fourth load prediction model on the test set, so that the decoder-trained fourth load prediction model is applied to the other target scenes to perform short-term load prediction.
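As a non-limiting illustration, training only the decoder with a mean-square-error objective can be sketched with a deliberately simplified model. The frozen encoder is a fixed random projection with a tanh activation and the decoder is a single linear layer updated by plain gradient descent, so the back-propagation step reduces to one matrix product. Neither stand-in is the Transformer encoder or decoder of the claims, and `W_enc`, `encode`, `train_decoder` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(5, 8))            # frozen encoder weights (stand-in)
W_enc_before = W_enc.copy()

def encode(X):
    # Stand-in for the migrated encoder; its weights are never updated
    return np.tanh(X @ W_enc)

def train_decoder(X, Y, lr=0.05, epochs=500):
    H = encode(X)                          # computed once because the encoder is frozen
    W_dec = np.zeros((H.shape[1], Y.shape[1]))
    losses = []
    for _ in range(epochs):
        err = H @ W_dec - Y
        losses.append(float((err ** 2).mean()))   # mean square error
        grad = 2.0 * H.T @ err / err.size         # gradient of the MSE w.r.t. W_dec
        W_dec -= lr * grad                        # only the decoder is updated
    return W_dec, losses

X = rng.normal(size=(40, 5))               # hypothetical training inputs
Y = encode(X) @ rng.normal(size=(8, 2))    # hypothetical training targets
W_dec, losses = train_decoder(X, Y)
```

Freezing the encoder keeps the general features learned on the closest prediction scene, while the decoder adapts to the load curve of the new target scene.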
10. An electric power system short-term load prediction apparatus, characterized by comprising:
the clustering module is used for clustering based on the respective short-term load history curves of each prediction scene and combining the time sequence and the distribution similarity of each short-term load history curve to obtain an optimal clustering result, wherein the optimal clustering result comprises: a plurality of categories of optimal clusters;
the modeling, training and evaluating module is used for establishing different Transformer-based load prediction models according to different target application scenes, and carrying out model training and evaluation on the load prediction models in each category by utilizing the short-term load history curve and the known features to obtain respective performance optimal models of each category;
and the migration module is used for carrying out model migration on the respective performance optimal model of each category so that each performance optimal model is respectively applied to other application scenes among the categories to carry out short-term load prediction.
CN202310326869.4A 2023-03-29 2023-03-29 Short-term load prediction method and device for electric power system Pending CN116404637A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310326869.4A CN116404637A (en) 2023-03-29 2023-03-29 Short-term load prediction method and device for electric power system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310326869.4A CN116404637A (en) 2023-03-29 2023-03-29 Short-term load prediction method and device for electric power system

Publications (1)

Publication Number Publication Date
CN116404637A true CN116404637A (en) 2023-07-07

Family

ID=87011718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310326869.4A Pending CN116404637A (en) 2023-03-29 2023-03-29 Short-term load prediction method and device for electric power system

Country Status (1)

Country Link
CN (1) CN116404637A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422182A (en) * 2023-12-18 2024-01-19 保大坊科技有限公司 Data prediction method, device and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination