CN114358185A - Improved K-means clustering CCA-BiLSTM-based multi-dimensional short-term power load prediction method

Info

Publication number
CN114358185A
Authority
CN
China
Prior art keywords
data
historical
load
vector
daily
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210003822.XA
Other languages
Chinese (zh)
Inventor
李鑫
李�昊
杨桢
李洪珠
左辉
马煜翔
徐彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Technical University
Original Assignee
Liaoning Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Technical University
Priority to CN202210003822.XA
Publication of CN114358185A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003 Load forecast, e.g. methods or systems for forecasting future load demand
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Power Engineering (AREA)
  • Probability & Statistics with Applications (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM, belonging to the technical field of power load prediction. The method first preprocesses the historical load data and the multi-dimensional data, removing abnormal values and filling missing values month by month. It then sets candidate numbers k of daily load characteristic labels, clusters the historical load data with a K-means algorithm improved by PCCs (Pearson correlation coefficients), evaluates the results with the Davies-Bouldin index (DBI), and determines the number W of daily load labels and the characteristics of each label by combining the analysis results with engineering experience. Next, it constructs a set of preprocessed historical multi-dimensional data vectors, performs CCA (canonical correlation analysis) contribution analysis between these vectors and the historical load data, and selects 10 characteristic variables to reconstruct the feature data set. Finally, it trains a BiLSTM network with the historical load data, the load labels and the reconstructed data set, and uses the trained network to predict future short-term power load data. The method reduces the time redundancy of the prediction process, reduces the dimension of the required external variables, and effectively improves the accuracy and generality of the load prediction results.

Description

Improved K-means clustering CCA-BiLSTM-based multi-dimensional short-term power load prediction method
Technical Field
The invention relates to the technical field of power load prediction, and in particular to a multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM.
Background
Short-term power load prediction is an important part of guaranteeing the stable operation of a power system. Accurate prediction helps improve the supply-side structure, helps the grid track changes in user load demand in time, guides the power supplier in making more efficient and safer dispatching strategies, and provides an important reference for smart grid construction and for energy conservation and emission reduction.
According to the basic prediction method adopted, current power load prediction methods fall into 3 types: those based on a traditional mathematical model, those based on a single intelligent algorithm, and those based on a combined structure. Methods based on a traditional mathematical model rely mainly on mathematical derivation and analyze the load trend through the characteristics of the data; representative methods include Kalman filtering, exponential smoothing, wavelet analysis and linear regression analysis. These methods were widely used early on because of their low computational cost and accurate prediction of simple linear loads, but their accuracy and adaptability have dropped sharply as nonlinear loads have increased. Methods based on a single intelligent algorithm are built mainly on artificial intelligence algorithms such as shallow neural networks and support vector machines. Compared with traditional mathematical models they handle nonlinear data better and can analyze multi-dimensional information to improve prediction accuracy, but insufficient model depth and weak generalization make the constructed network prone to instability and non-convergence. Combined-structure methods generally combine several algorithms with different strengths, either directly or by weighting, to improve overall performance and meet the practical requirements of short-term power load prediction.
In general, combined-structure methods are more accurate than single models. In practice, however, the short-term power load is influenced by multi-dimensional parameters, and these affect the combined-structure prediction: the data are insufficiently mined and the accuracy of the final result suffers. It is therefore necessary to find a multi-dimensional short-term power load prediction method with high accuracy.
Disclosure of Invention
To address the defects of the prior art, the invention provides a multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM.
The technical scheme adopted by the invention is a multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM. The general flow is shown in figure 1, and the method comprises the following 9 steps.
Step 1: Preprocess the historical daily load data and the historical multi-dimensional information. According to their characteristics, the data are divided into three types: ordinal-type, daily-average-type and nominal-type. Because of the recording equipment, the recording method and similar causes, the raw data of all three types may contain recording errors or gaps, so the historical data are preprocessed and corrected before analysis. The flow is shown in figure 2, and the specific steps are as follows.
Step 1.1: Distinguish the data types. Raw data expressed as continuous functions of time, such as historical power load, historical temperature and historical wind power, are ordinal-type data. Raw data obtained by daily averaging of ordinal-type data, such as daily average temperature and daily average wind power, are daily-average-type data. Discrete time or characteristic data, such as season, month, week, holiday and working day, are nominal-type data.
Step 1.2: Divide and pad the ordinal-type historical data. If the ordinal-type historical data cover N days, divide them by day into vectors $x_1, x_2, \ldots, x_N$, each containing n time-granularity elements. If an element $x_{ij}$ of some daily vector $x_i$ is missing, mark that day as a missing day and fill the element with 0; the missing data are processed further in the later steps.
Step 1.3: Detect abnormal values in the ordinal-type historical data. Let $x_{ij}$ denote the element at the j-th time granularity of the i-th day (i = 1 to N, j = 1 to n). Group the ordinal-type historical data $x_1$ to $x_N$ by month and, for each of the n time granularities, calculate the mean $\mu_j$ and standard deviation $\sigma_j$ of the elements at that granularity over the days of the month:
$$\mu_j = \frac{1}{Ne}\sum_{i=1}^{Ne} x_{ij}, \qquad \sigma_j = \sqrt{\frac{1}{Ne}\sum_{i=1}^{Ne}\left(x_{ij}-\mu_j\right)^2} \tag{1}$$
where Ne is the number of days in the month. Using the Pauta (3σ) criterion, judge for each month whether every element $x_{ij}$ of the Ne vectors $x_i$ satisfies the condition:
$$x_{ij} \in [\mu_j - 3\sigma_j,\ \mu_j + 3\sigma_j] \tag{2}$$
Theoretically, for normally distributed data an element value falls in this interval with probability 0.9973. If an element $x_{ij}$ falls outside the interval, it is judged to be an abnormal value; its value is set to 0 and treated as a missing value, and the corresponding daily vector is treated as a missing day.
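The monthly 3σ screening of Step 1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `flag_outliers` is hypothetical, one month of data is passed as a list of daily vectors, and the standard deviation uses the divisor Ne (population form), which the text does not specify.

```python
import math

def flag_outliers(month):
    """Apply the 3-sigma rule per time slot to one month of daily vectors.

    month: list of Ne daily vectors, each with n time-slot values.
    Returns (cleaned, missing_days): a copy with outliers set to 0.0
    (treated as missing) and the set of day indices that were flagged.
    """
    ne = len(month)
    n = len(month[0])
    cleaned = [list(day) for day in month]
    missing_days = set()
    for j in range(n):
        column = [month[i][j] for i in range(ne)]
        mu = sum(column) / ne
        # Assumption: population standard deviation (divisor Ne).
        sigma = math.sqrt(sum((v - mu) ** 2 for v in column) / ne)
        for i, v in enumerate(column):
            if abs(v - mu) > 3 * sigma:
                cleaned[i][j] = 0.0      # mark as missing for Step 1.4
                missing_days.add(i)
    return cleaned, missing_days
```

Because the statistics are computed per month, a spike is judged against days with a similar seasonal load level rather than against the whole record.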
Step 1.4: To keep the data continuous, fill the missing values in each kind of historical data after the abnormal values have been removed. For each missing element $x_{ij}$ of an ordinal-type variable on a missing day, first fill the value by modified Akima cubic Hermite interpolation. The filled value satisfies:
Figure BDA0003454658730000022
where $x_{(i-A)j}$ is the value of the nearest non-missing element at the same time granularity before the missing day, A days before it; $x_{(i+B)j}$ is the value of the nearest non-missing element at the same time granularity after the missing day, B days after it; $x'_{(i-A)j}$ is the derivative of the raw time series at $x_{(i-A)j}$; and $x'_{(i+B)j}$ is the derivative of the raw time series at $x_{(i+B)j}$.
The missing element $x_{ij}$ of the missing day is then filled a second time by linear interpolation. Writing the linear result as $x^{L}_{ij}$ to distinguish it from the Hermite result $x^{H}_{ij}$ of formula (3), it satisfies:
$$x^{L}_{ij} = x_{(i-A)j} + \frac{A}{A+B}\left(x_{(i+B)j} - x_{(i-A)j}\right) \tag{4}$$
The mean of the two interpolation results gives the final estimate of $x_{ij}$:
$$\hat{x}_{ij} = \frac{x^{H}_{ij} + x^{L}_{ij}}{2} \tag{5}$$
Missing daily-average-type data are filled by manually recalculating the average, and missing nominal-type data are filled with the mode of the same month.
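The two-pass fill of Step 1.4 can be sketched as follows. This is a minimal illustration rather than the patented procedure: it assumes the standard cubic Hermite basis on the interval from day i-A to day i+B, takes the two endpoint derivatives (per day) as inputs instead of computing them with the modified Akima rule, and all function names are hypothetical.

```python
def hermite_fill(x_prev, x_next, d_prev, d_next, a, b):
    """Cubic Hermite estimate at the missing day.

    x_prev, x_next: nearest valid values a days before and b days after.
    d_prev, d_next: derivatives (per day) at those two points; assumed
    given here, not derived with the modified Akima rule.
    """
    h = a + b                 # interval length in days
    t = a / h                 # position of the missing day in [0, 1]
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * x_prev + h10 * h * d_prev + h01 * x_next + h11 * h * d_next

def linear_fill(x_prev, x_next, a, b):
    """Linear interpolation between the same two valid values."""
    return x_prev + (x_next - x_prev) * a / (a + b)

def fill_missing(x_prev, x_next, d_prev, d_next, a, b):
    """Average of the Hermite and linear estimates, as in Step 1.4."""
    return 0.5 * (hermite_fill(x_prev, x_next, d_prev, d_next, a, b)
                  + linear_fill(x_prev, x_next, a, b))
```

Averaging the two estimates damps the overshoot a cubic can produce when the gap is long, while keeping more curvature than a purely linear fill.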
Step 2: Suppose the historical daily load comprises N days of load data. Let k denote the number of daily load labels, where k takes integer values from 1 to M and M is set manually according to experience.
Step 3: Cluster the N days of historical daily load data M times with the PCCs-improved K-means algorithm, each time into k classes according to the set number of daily load labels. The specific steps are as follows.
Step 3.1: Divide the N days of historical daily load data into N sample sets $L_1, L_2, \ldots, L_N$, where $L_i$ is the vector formed by the historical daily load of day i.
Step 3.2: Generate k daily cluster center vectors according to the number of daily load labels, denoted $c_1, c_2, \ldots, c_k$.
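Step 3.3: For each sample set $L_i$ (i = 1 to N), calculate the PCCs distances $d_{i1}$ to $d_{ik}$ between the sample and the k cluster centers:
$$d_{ij} = 1 - \rho_{ij}, \qquad \rho_{ij} = \frac{\operatorname{cov}(L_i, c_j)}{\sigma_{L_i}\,\sigma_{c_j}} = \frac{\sum_{z=1}^{n}\left(L_{iz}-\bar{L}_i\right)\left(c_{jz}-\bar{c}_j\right)}{\sqrt{\sum_{z=1}^{n}\left(L_{iz}-\bar{L}_i\right)^2}\,\sqrt{\sum_{z=1}^{n}\left(c_{jz}-\bar{c}_j\right)^2}} \tag{6}$$
where $d_{ij}$ is the PCCs distance between sample $L_i$ and the j-th cluster center $c_j$; $\rho_{ij}$ is the correlation coefficient between the sample vector $L_i$ and the cluster center vector $c_j$, with range [-1, 1]; $\operatorname{cov}(L_i, c_j)$ is the covariance between the two vectors; $\sigma_{L_i}$ and $\sigma_{c_j}$ are the standard deviations of vectors $L_i$ and $c_j$; n is the number of time granularities of sample $L_i$; $L_{iz}$ is the z-th element of vector $L_i$ and $c_{jz}$ is the z-th element of vector $c_j$, with z = 1 to n; and $\bar{L}_i$ and $\bar{c}_j$ are the means of vectors $L_i$ and $c_j$.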
Step 3.4: Compare the k PCCs distances of sample set $L_i$ to obtain its shortest PCCs distance $d_i$:
$$d_i = \min_{j} d_{ij} \tag{7}$$
The cluster center $c_j$ that attains this minimum is the center nearest to sample $L_i$, so the daily load data $L_i$ are assigned to the j-th load characteristic label. Proceeding in the same way, the daily load data of all N days are classified under the k load characteristic labels.
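Steps 3.1 to 3.4 can be sketched as a small K-means loop in which the Euclidean distance is swapped for a correlation-based one. The sketch rests on several assumptions that the text does not confirm: the PCCs distance is taken as 1 minus the Pearson coefficient, the centers are updated as the element-wise mean of their members, the initial centers are supplied by the caller, and the names `pearson`, `pcc_distance` and `kmeans_pcc` are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mu_u, mu_v = sum(u) / n, sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu_u) ** 2 for a in u))
    sv = math.sqrt(sum((b - mu_v) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0            # flat profile: treat as uncorrelated
    return cov / (su * sv)

def pcc_distance(u, v):
    """Assumed PCCs distance: 0 for identical shapes, 2 for opposite."""
    return 1.0 - pearson(u, v)

def kmeans_pcc(samples, centers, iterations=10):
    """Assign each daily load vector to its nearest center by PCCs distance."""
    centers = [list(c) for c in centers]
    labels = [0] * len(samples)
    for _ in range(iterations):
        labels = [min(range(len(centers)),
                      key=lambda j: pcc_distance(s, centers[j]))
                  for s in samples]
        for j in range(len(centers)):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:       # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

A correlation-based distance groups days by the shape of the load curve rather than by its magnitude, so a high-load and a low-load day with the same daily pattern can share a label.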
Step 4: Use the Davies-Bouldin index (DBI) to evaluate the clustering quality of the classification results obtained with each number k of daily load characteristic labels. The DBI is calculated as follows:
$$\mathrm{DBI} = \frac{1}{k}\sum_{i=1}^{k}\max_{j\neq i}\frac{S_i+S_j}{R_{ij}}, \qquad S_i = \frac{1}{|C_i|}\sum_{L\in C_i}\lVert L - c_i\rVert$$
where k is the number of clusters; $S_i$ is the mean distance from the elements of class i to its cluster center $c_i$; $R_{ij}$ is the distance between the centers of class i and class j, measured as the Euclidean distance; $|C_i|$ is the number of elements in class i; and L is an element of the historical load set. The DBI results for the different numbers of daily load labels are then analyzed.
A smaller DBI means that the objects of each class lie closer to their class center and that the classes are better separated, that is, the clustering is better. In some cases, however, the result cannot be judged by the index value alone; a suitable DBI is then chosen as the standard, in combination with historical experience and actual conditions, to judge whether the clustering result meets expectations.
Through the DBI cluster analysis, the number of labels of the historical daily load data is finally determined as W.
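The index used in Step 4 can be computed with a short helper. The sketch below follows the standard Davies-Bouldin definition and assumes Euclidean distance for both the within-class scatter and the center-to-center distance; it needs at least two non-empty clusters, and the function names are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def davies_bouldin(samples, labels, centers):
    """Davies-Bouldin index; lower values indicate tighter, better separated clusters."""
    k = len(centers)
    scatter = []
    for i in range(k):
        members = [s for s, lab in zip(samples, labels) if lab == i]
        # Mean distance from the members of class i to its center.
        scatter.append(sum(euclidean(s, centers[i]) for s in members) / len(members))
    total = 0.0
    for i in range(k):
        # Worst-case similarity of class i to any other class.
        total += max((scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k
```

Running the clustering for each candidate k and comparing the resulting index values gives the curve from which the label count W is chosen.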
Step 5: To facilitate analysis of the correlation of the multi-dimensional data, construct a matrix $[X_1, X_2, \ldots]$ from the preprocessed multi-dimensional historical data, in which each column vector represents one set of external influence variables.
Step 6: Perform CCA between each external influence column vector X of the constructed multi-dimensional data matrix and the preprocessed historical load data L, finally obtaining the contribution degree R of every external influence column vector to the historical load data.
Step 6.1: Normalize all X column vectors and the historical load data L as follows:
$$X_c = \frac{x - X_{\min}}{X_{\max} - X_{\min}}$$
where $X_c$ is the normalized value corresponding to element x of vector X, $X_{\min}$ is the minimum of the elements of X, and $X_{\max}$ is the maximum of the elements of X.
Step 6.2: Let X' and L' be the one-dimensional vectors obtained by projecting the normalized column vector $X_c$ and the normalized historical load data $L_c$. They are related by:
$$X' = \alpha^{T} X_c, \qquad L' = \beta^{T} L_c \tag{9}$$
Here α and β are the linear coefficient vectors of the two projections; only their direction matters, not their magnitude.
To fix the scale of the solution, α and β are constrained by:
$$\alpha^{T} S_{XX}\,\alpha = 1, \qquad \beta^{T} S_{LL}\,\beta = 1 \tag{10}$$
where $S_{XX}$ and $S_{LL}$ are the variances of X and L, respectively.
Step 6.3: Define the Lagrangian function:
$$J(\alpha,\beta) = \alpha^{T} S_{XL}\,\beta - \frac{\lambda}{2}\left(\alpha^{T} S_{XX}\,\alpha - 1\right) - \frac{\theta}{2}\left(\beta^{T} S_{LL}\,\beta - 1\right) \tag{11}$$
where $S_{XL}$ is the covariance between vectors X and L, and λ and θ are Lagrange multipliers; the maximum of J is sought.
Take the partial derivatives of J(α, β) with respect to α and β and set them to 0:
$$S_{XL}\,\beta - \lambda S_{XX}\,\alpha = 0, \qquad S_{LX}\,\alpha - \theta S_{LL}\,\beta = 0 \tag{12}$$
Step 6.4: Left-multiply the two equations of formula (12) by $\alpha^{T}$ and $\beta^{T}$ respectively and apply the constraints of formula (10):
$$\lambda = \theta = \alpha^{T} S_{XL}\,\beta \tag{13}$$
Rearranging further gives:
$$S_{XX}^{-1} S_{XL} S_{LL}^{-1} S_{LX}\,\alpha = \lambda^{2}\alpha \tag{14}$$
Let
$$H = S_{XX}^{-1} S_{XL} S_{LL}^{-1} S_{LX} \tag{15}$$
Perform singular value decomposition on H to obtain the maximum singular value and the corresponding left and right singular vectors u and v.
Step 6.5: calculating linear coefficient vectors α and β:
Figure BDA0003454658730000055
X' and L' are then obtained from formula (9).
Step 6.6: Calculate the contribution degree R of the influence factor vector X' to the historical load data vector L':
Figure BDA0003454658730000056
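The core of Step 6 can be illustrated in the simplest setting, where both the influence variable X and the load L are single columns. Every S term is then a scalar, so H of formula (15) reduces to the squared correlation and its square root is the canonical correlation. This is a sketch under that simplifying assumption, with hypothetical function names; it does not compute the contribution degree R, whose formula is not reproduced in this text.

```python
import math

def min_max(values):
    """Min-max normalization of Step 6.1 (requires max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def canonical_correlation_1d(x, load):
    """Canonical correlation of one influence variable with one load series."""
    xc, lc = min_max(x), min_max(load)
    n = len(xc)
    mx, ml = sum(xc) / n, sum(lc) / n
    s_xx = sum((a - mx) ** 2 for a in xc) / n
    s_ll = sum((b - ml) ** 2 for b in lc) / n
    s_xl = sum((a - mx) * (b - ml) for a, b in zip(xc, lc)) / n
    # Scalar form of H = S_XX^-1 S_XL S_LL^-1 S_LX; its only eigenvalue is lambda^2.
    h = s_xl * s_xl / (s_xx * s_ll)
    return math.sqrt(h)
```

For multi-column X or L the same quantities become matrices and the largest eigenvalue of H would have to be found numerically, for example with a linear algebra library.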
Step 7: Sort the contribution degrees R of all influence factors X to the historical load data from high to low, select the 10 influence variables with relatively high contribution degrees, and reconstruct the feature data set O.
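The ranking and selection of Step 7 amounts to a sort followed by a cut. A minimal sketch, assuming the contribution degrees are already available as a mapping from variable name to score (the function name and the example variable names in the usage note are hypothetical):

```python
def select_top_features(contributions, count=10):
    """Return the names of the `count` variables with the highest contribution."""
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:count]]
```

For example, `select_top_features({"temperature": 0.9, "humidity": 0.4, "wind": 0.7}, count=2)` returns `["temperature", "wind"]`; the columns with those names would then form the reduced feature set.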
Step 8: Feed the data set built from the reconstructed feature data set O, the labels of the historical daily load data (W in number) and the historical load data L into the BiLSTM network for training. The BiLSTM algorithm comprises a forward group and a reverse group of LSTM subunit networks; the input data are processed by the LSTM networks in both directions simultaneously, and the two results are superimposed to give the final output. The BiLSTM training process is shown in figure 3, where (a) is the internal process of each LSTM subunit and (b) is the process of the whole BiLSTM network. The specific training process is as follows.
Step 8.1: The input data $x_t$ enter the forget gate and, together with the output $h_{t-1}$ of the LSTM subunit at the previous time step, are used to screen and retain the result processed by the previous memory unit and to adjust the state parameters of the LSTM unit. The output $f_t$ of the forget gate is:
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{18}$$
where σ is the activation function, usually the sigmoid function, and $W_f$, $U_f$ and $b_f$ are network training parameters.
Step 8.2: The input data $x_t$ enter the input gate and, together with the output $h_{t-1}$ of the LSTM subunit at the previous time step, are used to update the state $C_t$ of the LSTM unit; the candidate state $\tilde{C}_t$ is computed with the tanh activation function. The output $i_t$ of the input gate and the candidate state are:
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \qquad \tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{19}$$
where σ is the activation function, usually the sigmoid function; tanh is the activation function of the candidate state; and $W_i$, $U_i$, $b_i$, $W_c$, $U_c$ and $b_c$ are network training parameters.
Step 8.3: The input gate result $i_t$ controls how much of the candidate state $\tilde{C}_t$ enters the cell state, and the forget gate result $f_t$ controls how much of the previous state $C_{t-1}$ is retained. The updated state $C_t$ of the LSTM subunit is:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{20}$$
Step 8.4: Feed the input data $x_t$ into the output gate to produce the output gate result $o_t$:
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{21}$$
where σ is the activation function, usually the sigmoid function, and $W_o$, $U_o$ and $b_o$ are network training parameters; $b_o$ is usually 1.
Step 8.5: The result $o_t$, controlled by the updated state $C_t$ of the LSTM unit, gives the final output $h_t$ of the LSTM unit for the current data:
$$h_t = o_t \odot \tanh(C_t) \tag{22}$$
Step 8.6: Superimpose the results $h_t$ and $h'_t$ obtained by the forward and reverse LSTM subunit layers of the BiLSTM. The final output $y_t$ of the BiLSTM network is:
$$y_t = a_t h_t + b_t h'_t + c_t \tag{23}$$
where $a_t$ is the weight of the output of the forward LSTM subunit layer, $b_t$ is the weight of the output of the reverse LSTM subunit layer, and $c_t$ is the overall bias optimization parameter at the current time.
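The gate equations of Steps 8.1 to 8.6 can be traced with a scalar toy model. The sketch below is illustrative only: it uses a hidden size of 1 so that every parameter is a plain float, takes the standard LSTM forms for the cell update and the output, keeps the combination weights a, b and c constant over time, and performs a forward pass without any training. All names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step; p maps parameter names to floats."""
    f_t = sigmoid(p["Wf"] * x_t + p["Uf"] * h_prev + p["bf"])      # forget gate
    i_t = sigmoid(p["Wi"] * x_t + p["Ui"] * h_prev + p["bi"])      # input gate
    c_hat = math.tanh(p["Wc"] * x_t + p["Uc"] * h_prev + p["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                               # cell state update
    o_t = sigmoid(p["Wo"] * x_t + p["Uo"] * h_prev + p["bo"])      # output gate
    h_t = o_t * math.tanh(c_t)                                     # unit output
    return h_t, c_t

def run_lstm(xs, p):
    """Run the cell over a sequence, starting from zero states."""
    h, c, outputs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs

def bilstm(xs, p_fwd, p_bwd, a=0.5, b=0.5, c=0.0):
    """Weighted sum of forward and reverse outputs at each time step."""
    fwd = run_lstm(xs, p_fwd)
    bwd = run_lstm(xs[::-1], p_bwd)[::-1]   # re-align reverse pass with time
    return [a * hf + b * hb + c for hf, hb in zip(fwd, bwd)]
```

The reverse pass sees the sequence from the end, so each output combines information from both earlier and later time steps, which is what distinguishes the bidirectional network from a single LSTM.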
Step 9: Feed the label information of the period to be predicted and the 10 dimension-reduced external factors as input variables into the trained BiLSTM network to predict the future power load data.
The beneficial effects of the above technical scheme are as follows. The invention provides a multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM. The clustering metric of the conventional K-means algorithm is replaced by the PCCs (Pearson correlation coefficients) distance for load cluster analysis, the DBI is used to evaluate the clustering, and the daily short-term power load data are labeled to expose their time-variable characteristics, which effectively reduces the time redundancy of the prediction. The CCA algorithm evaluates the contribution degree of each multi-dimensional external influence factor, and the high-contribution factors are selected to reconstruct the feature set, reducing the dimension of the data set and improving computational efficiency. The BiLSTM network is trained bidirectionally on the dimension-reduced data set, and its memory units further mine the load information of past periods, improving the accuracy of the final load prediction.
Drawings
FIG. 1 is a flow chart of the multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM.
FIG. 2 is a flow chart of the preprocessing of the present invention.
FIG. 3 is a structure diagram of the BiLSTM network of the present invention,
where (a) is the structure of the LSTM algorithm and (b) is the structure of the BiLSTM algorithm.
FIG. 4 shows the DBI cluster analysis results according to an embodiment of the present invention.
FIG. 5 shows the top 20 variables ranked by CCA contribution degree according to an embodiment of the present invention.
FIG. 6 compares the predicted results with the actual results according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
This embodiment takes as an example the power load of a city in Jiangsu Province, China, from November to August of the following year, together with the multi-dimensional information data related to the power load. Two days are randomly selected from each month (20 days in total) as the test set, and the remaining days form the training set. All training set data are processed according to the first 8 steps of the overall flow shown in fig. 1 and used to train the model. The test set data are preprocessed according to step 1 of fig. 1 and, after the training set has completed the first 8 steps, enter step 9 to verify the method.
Step 1: Preprocess the historical daily load data and the historical multi-dimensional information. According to their characteristics, the data are divided into three types: ordinal-type, daily-average-type and nominal-type. Because of the recording equipment, the recording method and similar causes, the raw data of all three types may contain recording errors or gaps, so the historical data are preprocessed and corrected before analysis. The flow is shown in figure 2, and the specific steps are as follows:
Step 1.1: Distinguish the data types. Ordinal-type data are mostly expressed as continuous functions of time, daily-average-type data are obtained by daily averaging of ordinal-type data, and discrete time or characteristic data are nominal-type data. The correspondence between the raw data of this embodiment and the ordinal, daily-average and nominal types is shown in table 1.
TABLE 1
Type               | Raw data
Ordinal type       | Historical power load, historical temperature, historical wind power
Daily-average type | Daily average temperature, daily average wind power, daily average load
Nominal type       | Season, working day, weather, wind direction, month, week, holiday
Step 1.2: The ordinal-type historical data of the city in Jiangsu Province cover 304 days and are divided by day into vectors $x_1, x_2, \ldots, x_{304}$, each containing 96 time-granularity elements. If an element $x_{ij}$ of some daily vector $x_i$ is missing, that day is marked as a missing day and the element is filled with 0; the missing data are processed further in the later steps.
Step 1.3: Detect abnormal values in the ordinal-type historical data. Let $x_{ij}$ denote the element at the j-th time granularity of the i-th day (i = 1 to 304, j = 1 to 96). Group the ordinal-type historical data $x_1$ to $x_{304}$ by month and, for each of the 96 time granularities, calculate the mean $\mu_j$ and standard deviation $\sigma_j$ of the elements at that granularity over the days of the month:
$$\mu_j = \frac{1}{Ne}\sum_{i=1}^{Ne} x_{ij}, \qquad \sigma_j = \sqrt{\frac{1}{Ne}\sum_{i=1}^{Ne}\left(x_{ij}-\mu_j\right)^2} \tag{1}$$
where Ne is the number of days in each month. The historical data cover 10 months: Ne is 31 for January, March, May, July and August, 30 for April, June, September and November, and 28 for February.
Using the Pauta (3σ) criterion, judge for each month whether every element $x_{ij}$ of the Ne vectors $x_i$ satisfies the condition:
$$x_{ij} \in [\mu_j - 3\sigma_j,\ \mu_j + 3\sigma_j] \tag{2}$$
Theoretically, for normally distributed data an element value falls in this interval with probability 0.9973. If an element $x_{ij}$ falls outside the interval, it is judged to be an abnormal value; its value is set to 0 and treated as a missing value, and the corresponding daily vector is treated as a missing day.
Step 1.4: To keep the data continuous, fill the missing values in each kind of historical data after the abnormal values have been removed. For each missing element $x_{ij}$ of an ordinal-type variable on a missing day, first fill the value by modified Akima cubic Hermite interpolation. The filled value satisfies:
Figure BDA0003454658730000082
In the formula, $x_{(i-A)j}$ is the value of the nearest non-missing element at the same time granularity before the missing day, A days earlier; $x_{(i+B)j}$ is the value of the nearest non-missing element at the same time granularity after the missing day, B days later; $x'_{(i-A)j}$ is the derivative of the original time series at $x_{(i-A)j}$; and $x'_{(i+B)j}$ is the derivative of the original time series at $x_{(i+B)j}$.
The missing element $x_{ij}$ of the missing day is then filled a second time by linear interpolation; the filled element $x_{ij}$ satisfies:
Figure BDA0003454658730000091
Taking the mean of the two interpolation results gives the final estimate of $x_{ij}$:
Figure BDA0003454658730000092
Missing daily-average-type data are filled by manually recalculating the average, and missing nominal-type data are filled with the mode of the same month.
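Two of the filling rules in Step 1.4 can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `linear_fill` and `mode_fill` are hypothetical, the linear formula is the standard two-point interpolation across days (assumed to correspond to equation (4), whose image was not preserved), and the modified Akima part is omitted.

```python
from collections import Counter

def linear_fill(days, j, missing):
    """Linearly interpolate slot j of each missing day from its neighbours.

    days: list of daily vectors; missing: set of day indices whose slot j
    is missing. Returns {day_index: estimate}.
    """
    filled = {}
    for i in sorted(missing):
        a = 1
        while i - a in missing:   # A: days back to the nearest valid day
            a += 1
        b = 1
        while i + b in missing:   # B: days forward to the nearest valid day
            b += 1
        if i - a < 0 or i + b >= len(days):
            raise ValueError("no valid neighbour on one side of day %d" % i)
        left, right = days[i - a][j], days[i + b][j]
        filled[i] = left + (right - left) * a / (a + b)
    return filled

def mode_fill(values):
    """Replace None entries of nominal data with the most common known value."""
    known = [v for v in values if v is not None]
    mode = Counter(known).most_common(1)[0][0]
    return [mode if v is None else v for v in values]
```

In the method described above, the linear estimate would be averaged with the modified Akima estimate to give the final value of each missing element.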
Step 2: The historical daily load training set contains 284 days of load data. Let the number of daily load labels be k, where k is an integer from 1 to M; in this embodiment M is set manually to 8 on the basis of the existing literature and historical experience.
Step 3: The 284 days of historical daily load data are clustered 8 times using the K-means algorithm improved with PCCs (Pearson correlation coefficients), each run clustering into k classes according to the set number of daily load labels. The specific steps are as follows:
Step 3.1: Divide the 284 days of historical daily load data into 284 sample sets $L_1, L_2, \ldots, L_{284}$, where each $L_i$ is the vector of historical daily loads on day i.
Step 3.2: Generate k daily cluster center vectors according to the number of daily load labels, denoted $c_1, c_2, \ldots, c_k$.
Step 3.3: For each sample set $L_i$ (i = 1 to 284), compute the PCCs distances $d_{i1}$ to $d_{ik}$ between the sample set and the k cluster centers:
Figure BDA0003454658730000093
In the formula, $d_{ij}$ is the PCCs distance between sample $L_i$ and the j-th cluster center $c_j$; $\rho_{ij}$ is the correlation coefficient between the sample vector $L_i$ and the cluster center vector $c_j$, with range [-1, 1]; $\mathrm{cov}(x_i, c_j)$ is the covariance between the two vectors; $\sigma_{L_i}$ and $\sigma_{c_j}$ are the standard deviations of the $L_i$ and $c_j$ vectors; $L_{iz}$ is the z-th element of $L_i$ and $c_{jz}$ is the z-th element of $c_j$, where z = 1 to 96; and $\bar{L}_i$ and $\bar{c}_j$ are the means of the $L_i$ and $c_j$ vectors.
Step 3.4: Compare the k PCCs distances of sample set $L_i$ to obtain its shortest PCCs distance $d_i$, namely:
$d_i = \min_j d_{ij}$ (7)
The cluster center $c_j$ corresponding to this $d_{ij}$ is the center nearest to sample $L_i$, so the load data of the day labeled $L_i$ are assigned to the j-th load characteristic label. Proceeding in the same way, each of the 284 days of load data is classified under one of the load characteristic labels, for label numbers from 1 to 8.
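The assignment step (Steps 3.3 and 3.4) can be sketched as below. Since the image of equation (6) was not preserved, the distance $d_{ij} = 1 - \rho_{ij}$ used here is an assumption (a common way to turn a Pearson correlation into a distance), and the function names are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def assign_labels(samples, centers):
    """Assign each daily load vector to the center with the smallest distance.

    Assumption: distance d_ij = 1 - rho_ij. Labels are 0-based indices
    into `centers`.
    """
    labels = []
    for sample in samples:
        dists = [1.0 - pearson(sample, center) for center in centers]
        labels.append(dists.index(min(dists)))
    return labels
```

A correlation-based distance groups days by the shape of their load curve rather than by its absolute level, which appears to be the motivation for replacing the Euclidean distance of standard K-means.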
Step 4: The Davies-Bouldin index (DBi) is used to evaluate the clustering quality of the historical daily load classification results obtained with each number of daily load characteristic labels k. The DBi is calculated as follows:
Figure BDA0003454658730000101
In the formula, k is the number of clusters; $S_i$ is the mean distance from the elements of class i to its cluster center; $R_{ij}$ is the distance between the centers of class i and class j, measured as Euclidean distance; $|C_i|$ is the number of elements in class i; and L is the set of historical load elements. The DBi results for the different numbers of daily load labels are then analyzed.
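A minimal sketch of the Davies-Bouldin computation is given below, following the standard definition $DB = \frac{1}{k}\sum_i \max_{j \ne i} (S_i + S_j)/R_{ij}$. It is an illustration under assumptions: Euclidean distance is used for both $S_i$ and $R_{ij}$, cluster centers are taken as class means, and the function names are hypothetical. The standard index is undefined for a single cluster, so the sketch requires at least two.

```python
import math

def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def davies_bouldin(samples, labels):
    """Davies-Bouldin index of a labelled partition (lower is better)."""
    clusters = {}
    for sample, label in zip(samples, labels):
        clusters.setdefault(label, []).append(sample)
    keys = sorted(clusters)
    if len(keys) < 2:
        raise ValueError("the index needs at least two clusters")
    # Cluster centers as element-wise means of the members.
    centers = {c: [sum(col) / len(clusters[c]) for col in zip(*clusters[c])]
               for c in keys}
    # S_i: mean distance of the members of class i to their center.
    scatter = {c: sum(euclid(s, centers[c]) for s in clusters[c]) / len(clusters[c])
               for c in keys}
    total = 0.0
    for i in keys:
        total += max((scatter[i] + scatter[j]) / euclid(centers[i], centers[j])
                     for j in keys if j != i)
    return total / len(keys)
```

Calling this once per candidate label number gives the curve of index values against which the label count is chosen.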
The DBi results for label numbers 1 to 8 are shown in FIG. 4.
A smaller DBi index indicates that the objects lie closer to their class centers and that the classes are better separated, that is, a better clustering result. In some cases, however, the choice cannot be made on the index value alone: a suitable DBi value must be selected as the criterion, combining historical experience with actual conditions, to judge whether the clustering result meets expectations.
As the DBi results in FIG. 4 show, the DBi value increases continuously as the number of daily load labels increases, and it changes relatively smoothly when the number of labels is 3 to 4. Combined with existing research, the number of labels W of the historical daily load data is finally set to 4. The load characteristics corresponding to labels 1 to 4 are shown in Table 2.
TABLE 2
Label number | Daily load characteristic | Description
1 | High-temperature days | Late July to early August
2 | Spring and autumn working days | Non-weekend days in March to May and September to November
3 | Summer and winter working days | Non-weekend, non-high-temperature days in December to February and June to August
4 | Non-working days | Weekends and legal holidays
Step 5: To facilitate analysis of the correlations in the multidimensional data, a matrix $[Th_1, Th_2, \ldots, Th_{23}, Td_1, Td_2, \ldots, Td_7, Wh_1, Wh_2, \ldots, Wh_{23}, Wd_1, Wd_2, \ldots, Wd_7, Dd_1, Dd_2, \ldots, Dd_7, Hd_1, Hd_2, \ldots, Hd_7, Lh_1, Lh_2, \ldots, Lh_{23}, Ld_1, Ld_2, \ldots, Ld_7]$ is constructed from the preprocessed multidimensional historical data. Each column vector represents one external influence variable, where $Th_i$ is the historical air temperature i hours earlier, $Td_i$ is the historical average air temperature i days earlier, $Wh_i$ is the historical wind power i hours earlier, $Wd_i$ is the historical average wind power i days earlier, $Dd_i$ is the historical wind direction i days earlier, $Hd_i$ is the historical weather i days earlier, $Lh_i$ is the historical load i hours earlier, and $Ld_i$ is the historical average load i days earlier.
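The lagged columns of Step 5 can be built with a small helper. This is a sketch only: `lag_features` is a hypothetical name, and the series is assumed to be sampled at the same step as the lag unit (hourly for the `h` columns, daily for the `d` columns).

```python
def lag_features(series, lags, prefix):
    """Build lagged copies of a series as named columns.

    Row t of column f"{prefix}{lag}" holds series[t - lag]. The first
    max(lags) rows are dropped so that every column has the same length.
    """
    start = max(lags)
    return {
        f"{prefix}{lag}": [series[t - lag] for t in range(start, len(series))]
        for lag in lags
    }
```

For example, `lag_features(hourly_load, range(1, 24), "Lh")` would give the 23 hourly load columns, and analogous calls on the other series would give the remaining columns of the matrix.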
Step 6: CCA (canonical correlation analysis) is performed between each external influence variable column vector X of the constructed multidimensional data matrix and the preprocessed historical load data L, finally yielding the contribution degree R of every external influence column vector to the historical load data.
Step 6.1: All X column vectors and the historical load data L are normalized as follows:
$X_c = \dfrac{x - X_{\min}}{X_{\max} - X_{\min}}$
In the formula, $X_c$ is the vector obtained by normalizing the X vector, x is an element of the X vector, $X_{\min}$ is the minimum of the elements of the X vector, and $X_{\max}$ is the maximum of the elements of the X vector.
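The min-max normalization of Step 6.1 is short enough to show directly. The handling of a constant vector (mapped to zeros) is an assumption added here to avoid division by zero; the source does not discuss that case.

```python
def min_max(values):
    """Scale a vector to [0, 1] using its own minimum and maximum."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: a constant column carries no information; map it to 0.
        return [0.0] * len(values)
    return [(v - low) / (high - low) for v in values]
```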
Step 6.2: Let X' and L' be the one-dimensional vectors obtained by projecting the normalized column vector $X_c$ and the normalized historical load data $L_c$; they are related by:
$X' = \alpha^T X_c,\quad L' = \beta^T L_c$ (9)
Here, α and β are the linear coefficient vectors of the two projections; only their direction matters, not their magnitude.
So that the solution is general, α and β are subject to the following constraint:
Figure BDA0003454658730000112
In the formula, $S_{XX}$ and $S_{LL}$ are the variances of X and L, respectively.
Step 6.3: defining a Lagrangian function:
Figure BDA0003454658730000113
In the formula, $S_{XL}$ is the covariance between the vectors X and L, and λ and θ are the Lagrange multipliers; the maximum of the function is sought.
Taking the partial derivatives of J(α, β) with respect to α and β and setting them to 0 gives:
Figure BDA0003454658730000114
Step 6.4: Left-multiplying the two equations of formula (12) by $\alpha^T$ and $\beta^T$ respectively, and using the constraints of formula (10), gives:
$\lambda = \theta = \alpha^T S_{XL} \beta$ (13)
Further rearrangement gives:
Figure BDA0003454658730000115
Let
$H = S_{XX}^{-1} S_{XL} S_{LL}^{-1} S_{LX}$ (15)
Singular value decomposition of H gives the largest singular value and the corresponding left and right singular vectors u and v.
Step 6.5: calculating linear coefficient vectors α and β:
Figure BDA0003454658730000121
X' and L' are then obtained from formula (9).
Step 6.6: Calculate the contribution degree R of the influence factor vector X' to the historical load data vector L':
Figure BDA0003454658730000122
Step 7: Sort the contribution degrees R of all influence factors X to the historical load data from high to low, select 10 influence variables with relatively high contribution degrees while keeping the selection comprehensive, and reconstruct the feature data set O.
The top 20 variables after sorting are shown in FIG. 5. To keep the information comprehensive while taking the contribution degree R into account, the following 10 variables are finally selected to construct the feature data set O: $Lh_1$ to $Lh_4$, $Lh_{22}$ to $Lh_{23}$, $Th_1$, $Ld_1$, $Td_1$ and $Wd_1$.
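The ranking in Steps 6 and 7 can be illustrated with a simplified stand-in. When a single column X is compared with a single load vector L, the first canonical correlation reduces to the absolute value of their Pearson correlation, so the sketch scores each column by $|\rho|$. This is an assumption made for illustration: the patent's contribution degree R (equation (17)) was not preserved and may be defined differently, and the names `abs_pearson` and `top_features` are hypothetical.

```python
import math

def abs_pearson(u, v):
    """Absolute Pearson correlation of two equal-length, non-constant vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return abs(cov) / math.sqrt(var_u * var_v)

def top_features(columns, load, n_keep=10):
    """Rank named feature columns by |rho| with the load; keep the best n_keep.

    Returns (kept_names, scores). The score is a stand-in for the
    contribution degree R, not the patent's formula.
    """
    scores = {name: abs_pearson(col, load) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep], scores
```

The embodiment does not take the top 10 blindly; it also weighs comprehensiveness, so a purely score-based cut such as this one is only a starting point.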
Step 8: The data set formed from the reconstructed feature data set O, the determined label number W of the historical daily load data, and the historical load data L is fed into a BiLSTM network for training. The BiLSTM algorithm comprises a forward group and a reverse group of LSTM subunit networks; the input data are processed by the LSTM networks in both directions simultaneously, and the two results are further processed before being output. The BiLSTM training process is shown in FIG. 3, where (a) shows the internal process of each LSTM subunit and (b) shows the process of the whole BiLSTM network. The training process is as follows.
Step 8.1: The input data $x_t$ enter the forget gate and, combined with the output $h_{t-1}$ of the LSTM subunit at the previous time step, are used to screen and retain the result processed by the previous memory unit and to adjust the state parameters of the LSTM unit. The output $f_t$ of the forget gate is:
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$ (18)
where σ is the activation function, usually the sigmoid function, and $W_f$, $U_f$ and $b_f$ are network training parameters.
Step 8.2: The input data $x_t$ enter through the input gate and, combined with the output $h_{t-1}$ of the LSTM subunit at the previous time step, are used to update the state $C_t$ of the LSTM unit; the activation function used for the candidate state is tanh. The output $i_t$ of the input gate and the candidate state $\tilde{C}_t$ are:
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i),\quad \tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$ (19)
where σ is the activation function, usually the sigmoid function; tanh is the activation function of the candidate state; and $W_i$, $U_i$, $b_i$, $W_c$, $U_c$ and $b_c$ are network training parameters.
Step 8.3: according to the result i of the input gatetControlling updated cell state CtFinal states are used to update CtSimultaneously with the result f of the forgetting gatetFinally, the updated state C of the LSTM subunit is obtainedtComprises the following steps:
Figure BDA0003454658730000124
Step 8.4: The input data $x_t$ are fed into the output gate, producing the output gate result $o_t$:
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$ (21)
where σ is the activation function, usually the sigmoid function, and $W_o$, $U_o$ and $b_o$ are network training parameters; $b_o$ is usually 1.
Step 8.5: The result $o_t$ is controlled by the updated state $C_t$ of the LSTM unit, and the final output $h_t$ of the LSTM unit for the current data is:
$h_t = o_t \odot \tanh(C_t)$ (22)
Step 8.6: The results $h_t$ and $h'_t$ obtained by the forward and reverse LSTM subunit layers of the BiLSTM are superposed, and the final output $y_t$ of the BiLSTM network is:
$y_t = a_t h_t + b_t h'_t + c_t$ (23)
In the formula, $a_t$ is the weight of the output of the forward-propagating LSTM subunit layer, $b_t$ is the weight of the output of the back-propagating LSTM subunit layer, and $c_t$ is the overall bias optimization parameter at the current time.
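Steps 8.1 to 8.6 can be traced with a scalar toy version of the cell. This is a sketch for reading the equations, not a trainable network: the cell update uses the standard LSTM form (the images of equations (19), (20), (22) and (23) were not preserved), all weights are plain floats in a hypothetical parameter dictionary, and the combination weights `a`, `b`, `c` are held constant over time, which is a simplification of $a_t$, $b_t$, $c_t$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step; p maps names such as 'Wf', 'Uf', 'bf' to floats."""
    f_t = sigmoid(p["Wf"] * x_t + p["Uf"] * h_prev + p["bf"])      # forget gate
    i_t = sigmoid(p["Wi"] * x_t + p["Ui"] * h_prev + p["bi"])      # input gate
    c_hat = math.tanh(p["Wc"] * x_t + p["Uc"] * h_prev + p["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                               # state update
    o_t = sigmoid(p["Wo"] * x_t + p["Uo"] * h_prev + p["bo"])      # output gate
    h_t = o_t * math.tanh(c_t)                                     # cell output
    return h_t, c_t

def bilstm(xs, fwd, bwd, a=0.5, b=0.5, c=0.0):
    """Run the sequence forwards and backwards and combine the outputs.

    Assumption: a, b, c are fixed scalars rather than learned, time-varying
    parameters.
    """
    def run(seq, p):
        h, cell, outputs = 0.0, 0.0, []
        for x in seq:
            h, cell = lstm_step(x, h, cell, p)
            outputs.append(h)
        return outputs

    h_fwd = run(xs, fwd)
    h_bwd = run(xs[::-1], bwd)[::-1]  # re-align backward outputs with time
    return [a * hf + b * hb + c for hf, hb in zip(h_fwd, h_bwd)]
```

In practice a deep-learning framework would supply vector-valued gates and learn all parameters by backpropagation; the scalar form only makes the data flow of the steps visible.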
Step 9: The data label information of the 20 test-set days in the historical data and the corresponding 10 external factors after dimensionality reduction are fed as input variables into the trained BiLSTM network to predict the power load data of those days.
Taking the June data as an example, a comparison of the predicted and actual results is shown in FIG. 6.
The prediction results for the 10 months obtained by the method were evaluated using the mean absolute percentage error (MAPE) and the root mean square error (RMSE) as evaluation indices; the results are shown in Table 3.
TABLE 3
Month | MAPE | RMSE
November | 1.24% | 0.044
December | 1.32% | 0.053
January | 1.12% | 0.045
February | 1.62% | 0.058
March | 1.51% | 0.049
April | 1.34% | 0.051
May | 1.23% | 0.045
June | 1.05% | 0.041
July | 1.47% | 0.054
August | 1.39% | 0.042
As can be seen from FIG. 6 and Table 3, with the improved K-means clustering CCA-BiLSTM multi-dimensional short-term power load prediction method, the MAPE of the load prediction results is below 1.7% and the RMSE is below 0.06 in every month.
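The two evaluation indices can be computed as follows. This is a minimal sketch using the usual definitions; the source does not state the units or any normalization of the load values behind its RMSE figures, so none is assumed here.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```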
Using the same data, the method was compared with several existing power load prediction algorithms: single BP, single BiLSTM, PSO-BiLSTM and SSA-BiLSTM. The comparison results are shown in Table 4.
TABLE 4
Method | MAPE | RMSE
BP | 89.25% | 2.587
BiLSTM | 48.24% | 1.258
PSO-BiLSTM | 16.21% | 0.477
SSA-BiLSTM | 9.27% | 0.352
Improved K-means clustering CCA-BiLSTM | 1.05% | 0.041
As the comparison in Table 4 shows, the improved K-means clustering CCA-BiLSTM short-term power load prediction method provided by the invention is markedly more accurate than the existing methods compared: its prediction accuracy is clearly better than that of the single-structure BP and BiLSTM methods, and also better than that of the parameter-optimized PSO-BiLSTM and SSA-BiLSTM load prediction methods.
Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM, characterized by comprising the following 9 steps:
Step 1: Preprocess the historical daily load data and the historical multi-dimensional information. According to their characteristics, the data are divided into three types: ordinal-type data, daily-average-type data and nominal-type data. Because of the recording equipment, recording means and the like, the original data of all three types may contain recording errors or gaps, so they are preprocessed and corrected before data analysis.
Step 2: Given N days of historical daily load data, let the number of daily load labels be k, where k is an integer from 1 to M and the value of M is set manually according to experience.
Step 3: Cluster the N days of historical daily load data M times using the K-means algorithm improved with PCCs, each run clustering into k classes according to the set number of daily load labels.
Step 4: Use the Davies-Bouldin index (DBi) to evaluate the clustering quality of the historical daily load classification results obtained with each number of daily load characteristic labels k. A smaller DBi index indicates that the objects lie closer to their class centers and that the classes are better separated, that is, a better clustering result. In some cases, however, the choice cannot be made on the index value alone: a suitable DBi value must be selected as the criterion, combining historical experience with actual conditions, to judge whether the clustering result meets expectations. The number of labels of the historical daily load data is finally determined as W through the DBi cluster analysis.
Step 5: To facilitate analysis of the correlations in the multidimensional data, construct a matrix $[X_1, X_2, \ldots]$ from the preprocessed multidimensional historical data, in which each column vector represents one external influence variable.
Step 6: Perform CCA analysis between each external influence variable column vector X of the constructed multidimensional data matrix and the preprocessed historical load data L, finally obtaining the contribution degree R of every external influence column vector to the historical load data.
Step 7: Sort the contribution degrees R of all influence factors X to the historical load data from high to low, select 10 influence variables with relatively high contribution degrees while keeping the selection comprehensive, and reconstruct the feature data set O.
Step 8: Feed the data set formed from the reconstructed feature data set O, the determined label number W of the historical daily load data, and the historical load data L into a BiLSTM network for training, wherein the BiLSTM algorithm consists of a forward LSTM subunit network and a reverse LSTM subunit network, the input data are processed by the LSTM networks in both directions simultaneously, and the two results are further processed before being output.
Step 9: Feed the data label information of the time to be predicted and the corresponding 10 external factors after dimensionality reduction as input variables into the trained BiLSTM network to predict the future power load data.
2. The multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM according to claim 1, wherein step 1 proceeds as follows:
Step 1.1: Distinguish the data types. Original data expressed as continuous functions of time, such as historical power load, historical temperature and historical wind power, are ordinal-type data; data obtained by averaging ordinal-type data, such as daily average temperature and daily average wind power, are daily-average-type data; and discrete time or characteristic data, such as season, month, week, holiday and working day, are nominal-type data.
Step 1.2: Preliminarily divide and pad the ordinal-type historical data. Given N days of ordinal-type historical data, divide them by day into vectors $x_1, x_2, \ldots, x_N$, each daily vector containing n time-granularity elements. If an element $x_{ij}$ at some time granularity of a daily vector $x_i$ is missing, that day is marked as a missing day and the missing element is filled with 0 pending further processing.
Step 1.3: Detect abnormal values in the ordinal-type historical data. Let $x_{ij}$ denote the element at the j-th time granularity of day i (i = 1 to N, j = 1 to n). Group the ordinal-type historical data $x_1$ to $x_N$ by month and, within each month, compute the mean $\mu_j$ and standard deviation $\sigma_j$ of the elements at the same time granularity across days, for each of the n granularities, as follows:
Figure FDA0003454658720000021
In the formula, Ne is the number of days in each month. Using the Lauda (3σ) criterion, each element $x_{ij}$ of the Ne daily vectors $x_i$ in each month of the ordinal-type historical data is checked against the condition:
$x_{ij} \in [\mu_j - 3\sigma_j,\ \mu_j + 3\sigma_j]$ (2)
In theory, the probability that an element value falls within this interval is 0.9973. If an element $x_{ij}$ falls outside the interval, the point is judged to be an abnormal value: its value is set to 0 so that it is treated as a missing value, and the corresponding day vector is treated as a missing day.
Step 1.4: To preserve the continuity of the data as a whole, missing values in each type of historical data are filled in after the abnormal values have been removed. For the ordinal-type variables, each missing element $x_{ij}$ of a missing day is first filled by the modified Akima cubic Hermite interpolation method; the filled element $x_{ij}$ satisfies:
Figure FDA0003454658720000022
In the formula, $x_{(i-A)j}$ is the value of the nearest non-missing element at the same time granularity before the missing day, A days earlier; $x_{(i+B)j}$ is the value of the nearest non-missing element at the same time granularity after the missing day, B days later; $x'_{(i-A)j}$ is the derivative of the original time series at $x_{(i-A)j}$; and $x'_{(i+B)j}$ is the derivative of the original time series at $x_{(i+B)j}$.
The missing element $x_{ij}$ of the missing day is then filled a second time by linear interpolation; the filled element $x_{ij}$ satisfies:
Figure FDA0003454658720000023
Taking the mean of the two interpolation results gives the final estimate of $x_{ij}$:
Figure FDA0003454658720000031
Missing daily-average-type data are filled by manually recalculating the average, and missing nominal-type data are filled with the mode of the same month.
3. The multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM according to claim 1, wherein step 3 proceeds as follows:
Step 3.1: Divide the N days of historical daily load data into N sample sets $L_1, L_2, \ldots, L_N$, where $L_i$ is the vector of historical daily loads on day i.
Step 3.2: Generate k daily cluster center vectors according to the number of daily load labels, denoted $c_1, c_2, \ldots, c_k$.
Step 3.3: For each sample set $L_i$ (i = 1 to N), compute the PCCs distances $d_{i1}$ to $d_{ik}$ between the sample set and the k cluster centers:
Figure FDA0003454658720000032
In the formula, $d_{ij}$ is the PCCs distance between sample $L_i$ and the j-th cluster center $c_j$; $\rho_{ij}$ is the correlation coefficient between the sample vector $L_i$ and the cluster center vector $c_j$, with range [-1, 1]; $\mathrm{cov}(x_i, c_j)$ is the covariance between the two vectors; $\sigma_{L_i}$ and $\sigma_{c_j}$ are the standard deviations of the $L_i$ and $c_j$ vectors; n is the number of time granularities of sample $L_i$; $L_{iz}$ is the z-th element of $L_i$ and $c_{jz}$ is the z-th element of $c_j$, where z = 1 to n; and $\bar{L}_i$ and $\bar{c}_j$ are the means of the $L_i$ and $c_j$ vectors.
Step 3.4: Compare the k PCCs distances of sample set $L_i$ to obtain its shortest PCCs distance $d_i$, namely:
$d_i = \min_j d_{ij}$ (7)
The cluster center $c_j$ corresponding to this $d_{ij}$ is the center nearest to sample $L_i$, so the load data of the day labeled $L_i$ are assigned to the j-th load characteristic label. Proceeding in the same way, each of the N days of load data is classified under one of the k load characteristic labels.
4. The multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM according to claim 1, wherein step 4 proceeds as follows:
The DBi index is calculated as follows:
Figure FDA0003454658720000041
In the formula, k is the number of clusters; $S_i$ is the mean distance from the elements of class i to its cluster center; $R_{ij}$ is the distance between the centers of class i and class j, measured as Euclidean distance; $|C_i|$ is the number of elements in class i; and L is the set of historical load elements. The DBi results for the different numbers of daily load labels are then analyzed.
5. The multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM according to claim 1, wherein step 6 proceeds as follows:
Step 6.1: All X column vectors and the historical load data L are normalized as follows:
$X_c = \dfrac{x - X_{\min}}{X_{\max} - X_{\min}}$
In the formula, $X_c$ is the vector obtained by normalizing the X vector, x is an element of the X vector, $X_{\min}$ is the minimum of the elements of the X vector, and $X_{\max}$ is the maximum of the elements of the X vector.
Step 6.2: Let X' and L' be the one-dimensional vectors obtained by projecting the normalized column vector $X_c$ and the normalized historical load data $L_c$; they are related by:
$X' = \alpha^T X_c,\quad L' = \beta^T L_c$ (9)
Here, α and β are the linear coefficient vectors of the two projections; only their direction matters, not their magnitude.
So that the solution is general, α and β are subject to the following constraint:
Figure FDA0003454658720000043
In the formula, $S_{XX}$ and $S_{LL}$ are the variances of X and L, respectively.
Step 6.3: Define the Lagrangian function:
Figure FDA0003454658720000044
In the formula, $S_{XL}$ is the covariance between the vectors X and L, and λ and θ are the Lagrange multipliers; the maximum of the function is sought.
Taking the partial derivatives of J(α, β) with respect to α and β and setting them to 0 gives:
Figure FDA0003454658720000045
Step 6.4: Left-multiplying the two equations of formula (12) by $\alpha^T$ and $\beta^T$ respectively, and using the constraints of formula (10), gives:
$\lambda = \theta = \alpha^T S_{XL} \beta$ (13)
Further rearrangement gives:
Figure FDA0003454658720000051
Let
$H = S_{XX}^{-1} S_{XL} S_{LL}^{-1} S_{LX}$ (15)
Singular value decomposition of H gives the largest singular value and the corresponding left and right singular vectors u and v.
Step 6.5: Calculate the linear coefficient vectors α and β:
Figure FDA0003454658720000052
X' and L' are then obtained from formula (9).
Step 6.6: Calculate the contribution degree R of the influence factor vector X' to the historical load data vector L':
Figure FDA0003454658720000053
6. The multi-dimensional short-term power load prediction method based on improved K-means clustering and CCA-BiLSTM according to claim 1, wherein step 8 proceeds as follows:
Step 8.1: The input data $x_t$ enter the forget gate and, combined with the output $h_{t-1}$ of the LSTM subunit at the previous time step, are used to screen and retain the result processed by the previous memory unit and to adjust the state parameters of the LSTM unit. The output $f_t$ of the forget gate is:
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$ (18)
where σ is the activation function, usually the sigmoid function, and $W_f$, $U_f$ and $b_f$ are network training parameters.
Step 8.2: The input data $x_t$ enter through the input gate and, combined with the output $h_{t-1}$ of the LSTM subunit at the previous time step, are used to update the state $C_t$ of the LSTM unit; the activation function used for the candidate state is tanh. The output $i_t$ of the input gate and the candidate state $\tilde{C}_t$ are:
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i),\quad \tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$ (19)
where σ is the activation function, usually the sigmoid function; tanh is the activation function of the candidate state; and $W_i$, $U_i$, $b_i$, $W_c$, $U_c$ and $b_c$ are network training parameters.
Step 8.3: The input gate result $i_t$ controls how much of the candidate state is used to update $C_t$, and the forget gate result $f_t$ controls how much of the previous state is retained. The updated state $C_t$ of the LSTM subunit is:
$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ (20)
Step 8.4: The input data $x_t$ are fed into the output gate, producing the output gate result $o_t$:
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$ (21)
where σ is the activation function, usually the sigmoid function, and $W_o$, $U_o$ and $b_o$ are network training parameters; $b_o$ is usually 1.
Step 8.5: The result $o_t$ is controlled by the updated state $C_t$ of the LSTM unit, and the final output $h_t$ of the LSTM unit for the current data is:
$h_t = o_t \odot \tanh(C_t)$ (22)
Step 8.6: The results $h_t$ and $h'_t$ obtained by the forward and reverse LSTM subunit layers of the BiLSTM are superposed, and the final output $y_t$ of the BiLSTM network is:
$y_t = a_t h_t + b_t h'_t + c_t$ (23)
In the formula, $a_t$ is the weight of the output of the forward-propagating LSTM subunit layer, $b_t$ is the weight of the output of the back-propagating LSTM subunit layer, and $c_t$ is the overall bias optimization parameter at the current time.
CN202210003822.XA 2022-01-04 2022-01-04 Improved K-means clustering CCA-BilSTM-based multi-dimensional short-term power load prediction method Pending CN114358185A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210003822.XA CN114358185A (en) 2022-01-04 2022-01-04 Improved K-means clustering CCA-BilSTM-based multi-dimensional short-term power load prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210003822.XA CN114358185A (en) 2022-01-04 2022-01-04 Improved K-means clustering CCA-BilSTM-based multi-dimensional short-term power load prediction method

Publications (1)

Publication Number Publication Date
CN114358185A true CN114358185A (en) 2022-04-15

Family

ID=81108225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210003822.XA Pending CN114358185A (en) 2022-01-04 2022-01-04 Improved K-means clustering CCA-BilSTM-based multi-dimensional short-term power load prediction method

Country Status (1)

Country Link
CN (1) CN114358185A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115392387A (en) * 2022-09-01 2022-11-25 国网江苏省电力有限公司镇江供电分公司 Low-voltage distributed photovoltaic power generation output prediction method
CN115392387B (en) * 2022-09-01 2023-08-08 国网江苏省电力有限公司镇江供电分公司 Low-voltage distributed photovoltaic power generation output prediction method
CN115876257A (en) * 2023-02-10 2023-03-31 南京城建隧桥智慧管理有限公司 Dynamic determination method for early warning value of tunnel structure health monitoring sensor
CN116258280A (en) * 2023-05-12 2023-06-13 国网湖北省电力有限公司经济技术研究院 Short-term load prediction method based on time sequence clustering
CN116258280B (en) * 2023-05-12 2023-08-11 国网湖北省电力有限公司经济技术研究院 Short-term load prediction method based on time sequence clustering
CN116884554A (en) * 2023-09-06 2023-10-13 济宁蜗牛软件科技有限公司 Electronic medical record classification management method and system
CN116884554B (en) * 2023-09-06 2023-11-24 济宁蜗牛软件科技有限公司 Electronic medical record classification management method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination