CN112991091A - Short-term power load prediction method and device based on Stacking algorithm - Google Patents

Short-term power load prediction method and device based on Stacking algorithm Download PDF

Info

Publication number
CN112991091A
Authority
CN
China
Prior art keywords
load
model
kelm
prediction
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110174198.5A
Other languages
Chinese (zh)
Inventor
卢先领
金辰曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202110174198.5A priority Critical patent/CN112991091A/en
Publication of CN112991091A publication Critical patent/CN112991091A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a short-term power load prediction method and device based on the Stacking algorithm. The method comprises the following steps: acquiring hourly load data over a historical period, the load data comprising load quantity, weather data and time type; training KELM models with different kernels in a basic model layer based on the acquired historical load data, using each trained KELM model to predict the load of the day to be predicted, and obtaining load prediction values for the day to be predicted; and, using the Stacking algorithm, fusing the prediction results of the KELMs with different kernels in the basic model layer with the acquired historical load data, training a KELM model in the secondary model layer, and using the trained secondary-layer KELM model to predict the load of the day to be predicted, obtaining the final load prediction value for that day. By constructing KELM models in two layers, the invention improves the prediction accuracy of the model.

Description

Short-term power load prediction method and device based on Stacking algorithm
Technical Field
The invention belongs to the technical field of power load prediction, and particularly relates to a short-term power load prediction method based on a Stacking algorithm, and further relates to a short-term power load prediction device based on the Stacking algorithm.
Background
At present, a single prediction model is mostly adopted in the field of power load prediction, but a single model has limited prediction capability and low robustness. Prior work has combined an LSTM and XGBoost by weighting to perform ultra-short-term load prediction and obtained higher prediction accuracy, but the weighted combination cannot reflect the influence of the prediction error of an individual sample in the sample set on the overall weights.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a short-term power load prediction method based on the Stacking algorithm, solving the technical problems of low prediction accuracy and model overfitting caused by randomly set parameters in the prior art.
In order to solve the technical problems, the invention provides a short-term power load prediction method based on a Stacking algorithm, which comprises the following steps:
acquiring load data per hour in a historical period, wherein the load data comprises load capacity, weather data and time type;
training KELM models with different kernels in the basic model layer based on the acquired historical load data, using each trained KELM model to predict the load of the day to be predicted, and obtaining load prediction values for the day to be predicted;
and, using the Stacking algorithm, fusing the prediction results of the KELMs with different kernels in the basic model layer with the acquired historical load data, training a KELM model in the secondary model layer, and using the trained secondary-layer KELM model to predict the load of the day to be predicted, obtaining the final load prediction value for the day to be predicted.
Optionally, the load quantity includes the loads 168 hours, 48 hours, 25 hours and 24 hours before the time to be predicted; the weather data includes ultraviolet intensity and temperature; and the time type includes a holiday (day-of-week) type and an hour type, the holiday type covering Monday to Sunday and the hour type covering hours 1 to 24.
Optionally, the KELM models with different kernels in the basic model layer include: a KELM model with a linear kernel, a KELM model with a Gaussian kernel, and a KELM model with a polynomial kernel.
Optionally, the KELM model in the secondary model layer includes a KELM model with a Gaussian kernel.
Optionally, in the training process of the KELM model:
and optimizing the nuclear parameters in the KELM model by adopting a whale optimization algorithm.
Optionally, the optimizing the nuclear parameters in the KELM model by using a whale optimization algorithm includes:
1) introducing a Cauchy inverse cumulative distribution function, generating larger disturbance near a reference target by utilizing the characteristics of smaller Cauchy distribution peak value and longer two ends, expanding the search range, and improving the formula as follows:
Figure BDA0002940037410000021
in the formula, X is whale position, A is coefficient vector, t is iteration number, and r is random number between 0 and 1;
2) introducing a dynamic inertia weight combining nonlinear decay with a random factor, which weakens the influence of the early-stage optimal individual on the current individual, the improved weight formula being:
w(t) = ω_min + (ω_max - ω_min) · r · e^(-t/T_max)    (5)
wherein ω_min is the minimum inertia weight, ω_max is the maximum inertia weight, t is the current iteration number, and T_max is the maximum number of iterations;
3) introducing a variable spiral strategy, and setting the approach to the target as a dynamic spiral encirclement, the improved spiral position-update formula being:
[Formula (6), the improved variable-spiral position-update formula, appears only as an image in the original publication.]
wherein X* is the global optimal position, D is the distance between the whale and the optimal solution, l is a random constant, and b is a constant defining the shape of the logarithmic spiral.
Correspondingly, the invention also provides a short-term power load prediction device based on the Stacking algorithm, which comprises:
a data acquisition module for acquiring load data for each hour in a historical period, the load data comprising load quantity, weather data and time type;
a basic prediction module for training KELM models with different kernels in the basic model layer based on the acquired historical load data, using each trained KELM model to predict the load of the day to be predicted, and obtaining load prediction values for the day to be predicted;
and a fusion prediction module for, by means of the Stacking algorithm, fusing the prediction results of the KELMs with different kernels in the basic model layer with the acquired historical load data, training a KELM model in the secondary model layer on the fused data, and using the trained secondary-layer KELM model to predict the load of the day to be predicted, obtaining the final load prediction value for the day to be predicted.
Optionally, the load quantity includes the loads 168 hours, 48 hours, 25 hours and 24 hours before the time to be predicted; the weather data includes ultraviolet intensity and temperature; and the time type includes a holiday (day-of-week) type and an hour type, the holiday type covering Monday to Sunday and the hour type covering hours 1 to 24.
Optionally, the KELM models with different kernels in the basic model layer include: a KELM model with a linear kernel, a KELM model with a Gaussian kernel, and a KELM model with a polynomial kernel.
Optionally, the KELM model in the secondary model layer includes a KELM model with a Gaussian kernel.
Compared with the prior art, the invention has the following beneficial effects:
1) The traditional whale algorithm is improved, which effectively expands the target search range, increases the search speed, avoids local optima and improves convergence. The improved whale algorithm is used to optimize the parameters of the kernel extreme learning machine, which effectively avoids the overfitting problem caused by improper parameter settings.
2) The Stacking algorithm comprehensively considers the prediction differences of the basic models, and fusing several independent models yields excellent generalization capability; adding the correlation features as new inputs further improves the prediction accuracy of the model.
3) The load conditions of different seasons are predicted by constructing a combined load prediction model, and the result shows that the model has better accuracy and applicability.
Drawings
FIG. 1 is a flow chart of an improved whale optimization algorithm;
FIG. 2 is a flow chart of an improved Stacking algorithm.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The invention discloses a short-term power load prediction method based on the Stacking algorithm, which, as illustrated in the drawings, comprises the following process:
Step S1: load data for each hour in a historical period are acquired. The load data comprise load quantity, weather data and time type; the load quantity comprises the loads 168 hours, 48 hours, 25 hours and 24 hours before the time to be predicted, the weather data comprise ultraviolet intensity and temperature, and the time type comprises a holiday (day-of-week) type and an hour type, the holiday type covering Monday to Sunday and the hour type covering hours 1 to 24.
Each hour of load data forms a row vector comprising the ultraviolet intensity (UV), the holiday (day-of-week) type, the hour type, the temperature, and the loads 168 hours, 48 hours, 25 hours and 24 hours earlier.
For example, to predict the load on March 31, 2010, the data from January 1, 2010 to March 30, 2010 are selected, giving 2136 hours in the historical period; the acquired historical data therefore form 2136 rows and 8 columns, where 8 is the number of highly correlated variables.
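For illustration only, the row vectors described above could be assembled as in the following Python sketch; the column names ("load", "uv", "temp") and the use of pandas are assumptions for this example, not part of the patent.

import pandas as pd

def build_feature_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """df: hourly DatetimeIndex with assumed 'load', 'uv' and 'temp' columns."""
    feats = pd.DataFrame(index=df.index)
    feats["uv"] = df["uv"]                        # ultraviolet intensity
    feats["weekday"] = df.index.dayofweek + 1     # holiday (day-of-week) type: 1..7
    feats["hour"] = df.index.hour + 1             # hour type: 1..24
    feats["temp"] = df["temp"]                    # temperature
    for lag in (168, 48, 25, 24):                 # prior loads used as inputs
        feats[f"load_{lag}h"] = df["load"].shift(lag)
    feats["target"] = df["load"]                  # load value to be predicted
    return feats.dropna()                         # drop rows without full lag history

Applied to the example above, the resulting frame would have 8 feature columns plus the target for each usable hour of the historical period.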
Step S2: KELM models with different kernels are trained based on the acquired historical load data, and each trained KELM model is used to predict the load of the day to be predicted, giving predicted load values for the day to be predicted.
The acquired historical load data are divided into a training set and a test set by cross-validation.
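As a small illustration (an assumption about the procedure, since the patent only states that cross-validation is used), the split could look like the following Python sketch, with random placeholder data standing in for the 2136 x 8 matrix.

import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(2136, 8)   # placeholder for the 2136 x 8 historical feature matrix
y = np.random.rand(2136)      # placeholder for the corresponding hourly loads

for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(X)):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    # ...fit one base KELM on the training fold and evaluate it on the test fold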
Because of the randomness of the extreme learning machine's mapping, different numbers of hidden-layer neurons and different input weights influence the prediction differently. Introducing a kernel function reduces the amount of parameter setting and gives faster computation, higher prediction accuracy and stronger generality for regression problems. The kernel parameters in the model are then optimized by the whale algorithm, which effectively improves the prediction capability of the model. Meanwhile, to overcome the tendency of the whale algorithm to fall into local optima, the reference target, the movement mode and other elements of the algorithm are improved, further improving the prediction capability of the load prediction model.
An Extreme Learning Machine (ELM) is a single-hidden-layer feed-forward network that randomly generates the input-layer weight matrix and the hidden-layer bias. The ELM model with random mapping can be expressed as:
f(x) = g(ωx + b)β = h(x)β    (1)
where f(x) is the output of the ELM, g(·) is the sigmoid activation function, ω is the input-layer weight matrix, x is the model input matrix (the matrix formed by the acquired data), b is the hidden-layer bias matrix, β is the output-layer weight matrix, and h(·) is shorthand for g(ωx + b), introduced to simplify the later formulas.
The ELM is obtained by solving the linear system Hβ = T. Introducing a regularization coefficient C in the optimization stage, the output-layer weight matrix β is derived as:
β = H^T (HH^T + I/C)^(-1) T    (2)
where H is the hidden-layer mapping matrix, I is the identity matrix, and T is the target-value matrix.
In the original ELM, ω and b are set randomly, so HH^T is also a random quantity. After a kernel function is introduced in the Kernel Extreme Learning Machine (KELM), the random mapping becomes a kernel mapping and HH^T becomes a fixed value.
According to the Mercer condition (any positive semi-definite function can serve as a kernel function), the kernel matrix of the KELM is defined as:
Ω = HH^T,  Ω_(i,j) = h(x_i) · h(x_j) = K(x_i, x_j)    (3)
where Ω is the kernel matrix, K(·) is the kernel function, and i and j take arbitrary values in the range (0, N).
As for the parameters in the kernel function, such as the penalty factor, different settings influence the prediction result differently; moreover, because the weight matrix and the hidden-layer bias are set randomly, the number of hidden-layer nodes of the network cannot be determined, which increases the prediction difficulty and degrades the prediction performance. To address the problems caused by random mapping, a kernel function can be introduced to map the input historical load data into a high-dimensional feature space, which improves the computational capability, enhances the linear separability of the data and simplifies the computation. Different kernel functions, such as the linear kernel, the Gaussian kernel and the polynomial kernel, map the data in different ways and therefore mine different kinds of data information.
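As an illustration of the KELM regressor described above, the following Python sketch (an assumption-based example, not code from the patent) solves for the dual output weights (Ω + I/C)^(-1) y in closed form and predicts with K(x_new, X); the default values of C, gamma, degree and coef0 are placeholders rather than values given in the patent.

import numpy as np

def kernel(A, B, kind="gaussian", gamma=0.1, degree=3, coef0=1.0):
    """Kernel matrix between the rows of A and the rows of B."""
    if kind == "linear":
        return A @ B.T
    if kind == "poly":
        return (A @ B.T + coef0) ** degree
    # default: Gaussian (RBF) kernel
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

class KELM:
    """Kernel extreme learning machine regressor: f(x) = K(x, X) (Omega + I/C)^-1 y."""
    def __init__(self, C=100.0, kind="gaussian", **kernel_params):
        self.C, self.kind, self.kernel_params = C, kind, kernel_params

    def fit(self, X, y):
        self.X_train = X
        omega = kernel(X, X, self.kind, **self.kernel_params)       # kernel matrix Omega
        n = X.shape[0]
        # dual output weights (Omega + I/C)^-1 y
        self.alpha = np.linalg.solve(omega + np.eye(n) / self.C, y)
        return self

    def predict(self, X_new):
        return kernel(X_new, self.X_train, self.kind, **self.kernel_params) @ self.alpha

With this sketch, the three base learners discussed below would simply be KELM(kind="linear"), KELM(kind="gaussian") and KELM(kind="poly").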
Here, the Whale Optimization Algorithm (WOA) is adopted to set the parameters of the KELM models with different kernel functions reasonably. The whale algorithm obtains the optimal parameters of the model by evaluating a fitness function, and its final output is the set of optimal kernel parameters.
The WOA optimization process, shown in FIG. 1, simulates the feeding behaviour of whales to traverse the parameter range: according to a random probability it decides whether to perform a spiral update or to continue searching for the optimal solution, and if the search continues, the search mode is determined by the magnitude of the coefficient vector, so as to find the minimum error under the optimal setting. The final optimization result is the parameter set that minimizes the prediction error of the model.
The conventional WOA updates positions in three ways, namely encircling the prey, spiral bubble-net attacking and random searching, and decides which update to use according to a random probability and artificially set vector coefficients. Although the algorithm converges quickly and is simple to operate, the random generation of the reference target can cause premature convergence, and the static weight and the single way of approaching the target can trap the search in a local optimum. An Improved Whale Optimization Algorithm (IWOA) is therefore proposed, with the following specific improvements:
the Cauchy inverse cumulative distribution function is introduced, and the characteristics of small Cauchy distribution peak value and long two ends are utilized to generate large disturbance near the reference target, so that the search range is expanded, and the local optimum is easier to jump out. The improved formula is as follows:
1. the Cauchy inverse cumulative distribution function is introduced, and the characteristics of small Cauchy distribution peak value and long two ends are utilized to generate large disturbance near the reference target, so that the search range is expanded, and the local optimum is easier to jump out. The improved formula is as follows:
Figure BDA0002940037410000071
in the formula, X is whale position, A is coefficient vector, t is iteration number, and r is random number between 0 and 1.
2. A dynamic inertia weight combining nonlinear decay with a random factor is introduced, which weakens the influence of the early-stage optimal individual on the current individual and improves the global search capability of the algorithm. The improved weight formula is:
w(t) = ω_min + (ω_max - ω_min) · r · e^(-t/T_max)    (5)
where ω_min is the minimum inertia weight, ω_max is the maximum inertia weight, t is the current iteration number, and T_max is the maximum number of iterations.
3. A variable spiral strategy is introduced, and the approach to the target is set as a dynamic spiral encirclement, which increases the ability to search unexplored regions. Combined with the dynamic inertia factor, the improved spiral position-update formula is:
[Formula (6), the improved variable-spiral position-update formula, appears only as an image in the original publication.]
where X* is the global optimal position, D is the distance between the whale and the optimal solution, l is a random constant, and b is a constant defining the shape of the logarithmic spiral.
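The following Python sketch is an illustrative reading of the improved whale optimizer, not the patent's implementation: formulas (4) and (6) are available above only as images, so the Cauchy perturbation of the reference target and the dynamic spiral factor are assumptions, while the inertia weight follows formula (5) and the remaining moves follow the standard WOA update rules.

import numpy as np

def iwoa(fitness, dim, low, high, pop=20, t_max=50, w_min=0.2, w_max=0.9, b=1.0):
    """Minimize fitness over [low, high]^dim (low, high are scalar bounds)."""
    X = np.random.uniform(low, high, (pop, dim))        # whale positions
    fit = np.array([fitness(x) for x in X])
    best_i = int(fit.argmin())
    best, best_fit = X[best_i].copy(), fit[best_i]

    for t in range(1, t_max + 1):
        a = 2.0 * (1 - t / t_max)                        # linearly decreasing coefficient a
        # dynamic inertia weight, formula (5): w = w_min + (w_max - w_min) * r * exp(-t/T_max)
        w = w_min + (w_max - w_min) * np.random.rand() * np.exp(-t / t_max)
        for i in range(pop):
            r1, r2, p = np.random.rand(3)
            A, C = 2 * a * r1 - a, 2 * r2
            # assumed Cauchy disturbance of the reference target (formula (4) is an image)
            ref = best + 0.1 * (high - low) * np.tan(np.pi * (np.random.rand(dim) - 0.5))
            if p < 0.5:
                if abs(A) < 1:                           # encircle the (perturbed) prey
                    X[i] = w * ref - A * np.abs(C * ref - X[i])
                else:                                    # random search for prey
                    rand = X[np.random.randint(pop)]
                    X[i] = w * rand - A * np.abs(C * rand - X[i])
            else:                                        # variable-spiral bubble-net move
                D = np.abs(best - X[i])
                l = np.random.uniform(-1.0, 1.0)
                s = b * (1 + t / t_max)                  # assumed dynamic spiral factor
                X[i] = D * np.exp(s * l) * np.cos(2 * np.pi * l) + w * best
            X[i] = np.clip(X[i], low, high)
        fit = np.array([fitness(x) for x in X])
        if fit.min() < best_fit:
            best_i = int(fit.argmin())
            best, best_fit = X[best_i].copy(), fit[best_i]
    return best

In this setting, fitness(x) would return, for example, the validation MAPE of a Gaussian-kernel KELM whose regularization coefficient and kernel width are taken from x.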
Based on the training set, the KELM models with different kernels are trained, and each trained KELM model is used to predict the load of the day to be predicted, giving the predicted load values for that day.
Step S3: using the Stacking algorithm, the prediction results of the KELMs with different kernels in the basic model layer are fused with the acquired historical load data, a KELM model in the secondary model layer is trained on the fused data, and the trained secondary-layer KELM model is used to predict the load of the day to be predicted, giving the final load prediction value for that day.
Although a single model with optimized parameters can obtain good prediction results, according to the No-Free-Lunch (NFL) theorem a single model has limited prediction capability and weak generalization, whereas a Stacking model has stronger heterogeneous-data processing capability and generalization and can improve prediction accuracy; the Stacking algorithm is therefore adopted to fuse several models for predicting future data. However, the algorithm loses hidden correlation features during fusion, so time-series correlation features are embedded in the training process as new inputs, increasing the proportion of time-series information and further improving the generalization capability and prediction accuracy of the model.
The method uses the Stacking algorithm to fuse several KELM models with different kernel functions, as shown in FIG. 2, in order to improve the prediction accuracy. The fusion system is designed as a two-layer structure: the first layer is the basic model layer of the system, composed of KELM models with different kernels; the second layer is the secondary model layer, composed of a single KELM model, and together the two layers form the fusion model. Although the fusion model can optimize the combined structure through its self-learning capability and thereby improve prediction performance, a conventional fusion considers only the basic-model outputs as inputs during training, ignores the importance of the data in the time dimension, and, by not accounting for the time sequence, tends to overfit, whereas in conventional nonlinear regression analysis the final prediction is related to the current regression feature vector.
In the basic model layer, the load of the day to be predicted is predicted by the KELM models with different kernels respectively, and the corresponding prediction results are obtained.
In the secondary model layer, the acquired historical load data (load quantity, weather data and date data) and the outputs of the basic models are fused and used together as the input of the secondary model, which outputs the final load prediction for the day to be predicted. Using the historical load data as input increases the proportion of the time-series feature variables (historical load) in the fusion process, further improving the prediction accuracy of the model; meanwhile, multi-kernel fusion improves the generalization capability of the model.
The fusion formula is as follows:
I_zi = F(F_1(X_i), ..., F_h(X_i), ..., F_n(X_i), V)    (7)
where X_i denotes the model input variables, i.e. the data acquired in step S1, F_h is the h-th basic-model function of the first layer, F is the prediction-model function of the second layer, n is the number of first-layer basic models, and V is the highly correlated variable (historical load data). The output F_h(X_i) of the h-th basic model of the first layer is recorded during fusion and used as an input to the second-layer prediction model.
The number of basic models is strongly related to the fusion effect: too few models cannot achieve complementary fusion, while too many cause redundancy and increase the number of system parameters and the prediction time; 3 to 5 basic models are usually suitable. In this design, KELMs with three different kernel functions (a Gaussian kernel, a linear kernel and a polynomial kernel) are placed in the basic model layer of the Stacking fusion system, and a G-KELM (Gaussian-kernel extreme learning machine) with strong learning capability is placed in the secondary model layer. The highly correlated time-series variables (historical load data) are used, together with the outputs of the basic models, as the input of the secondary model layer, which improves the prediction performance of the Stacking algorithm.
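The two-layer fusion of formula (7) can be sketched in Python as follows, reusing the KELM class sketched earlier; this is an illustrative assumption rather than the patent's code: the base predictions are computed in-sample for brevity (a k-fold out-of-fold scheme would normally be used in Stacking), and the kernel parameters are placeholder values.

import numpy as np

def fit_stacking(X_train, y_train, V_train):
    """Train three differently-kerneled base KELMs plus a Gaussian-kernel meta KELM."""
    base_models = [KELM(kind="linear"),
                   KELM(kind="gaussian", gamma=0.1),
                   KELM(kind="poly", degree=3)]
    for model in base_models:
        model.fit(X_train, y_train)
    # secondary-layer input: base predictions F_1(X)..F_n(X) plus the highly
    # correlated historical-load columns V, as in formula (7)
    Z_train = np.column_stack([m.predict(X_train) for m in base_models] + [V_train])
    meta_model = KELM(kind="gaussian", gamma=0.1).fit(Z_train, y_train)
    return base_models, meta_model

def predict_stacking(base_models, meta_model, X_new, V_new):
    Z_new = np.column_stack([m.predict(X_new) for m in base_models] + [V_new])
    return meta_model.predict(Z_new)

Here V_train and V_new hold the lagged-load columns of the feature matrix, so the secondary G-KELM sees both the basic-model outputs and the raw time-series information.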
The invention has the following beneficial effects:
1. and optimizing the nuclear parameters in the nuclear extreme learning machine by using a whale algorithm, improving the optimization algorithm, improving the efficiency of searching each nuclear parameter in the model, and enhancing the executable capacity of the research scheme.
2. The Stacking fusion algorithm is improved, so that the model can learn more hidden information, the prediction capability of the model is improved, and the adaptability of the model to load prediction is enhanced.
Examples
Hourly power load prediction is carried out for March 31, 2010 for a region of Malaysia, with load data from January 1 to March 30, 2010 (2136 time points in total) as training data. The data come from the regional power company, with the load in units of kMW (gigawatts). Training and testing were carried out in a Matlab 2019b environment on a microcomputer platform with an Intel Core i5-9400F CPU, an NVIDIA GeForce GTX 1660 GPU and 8.00 GB of RAM.
The Mean Absolute Percentage Error (MAPE) is selected as the main evaluation standard of the prediction performance of each model, and the Root Mean Square Error (RMSE) is selected as the auxiliary evaluation standard. MAPE, RMSE were calculated as follows:
MAPE = (1/n) · Σ_{t=1}^{n} |(y_t - ŷ_t) / y_t| × 100%    (8)
RMSE = sqrt( (1/n) · Σ_{t=1}^{n} (y_t - ŷ_t)^2 )    (9)
where y_t is the actual load value at time t, ŷ_t is the predicted power load value at time t, and n is the number of power load data points.
First, the WOA is improved. The improved algorithm (IWOA) and the original WOA are each used to minimize the fitness function: the IWOA finds the optimal solution at the 10th iteration, whereas the WOA fails to find it within 50 iterations, showing that the IWOA searches faster and has stronger convergence.
The IWOA is then used to optimize the parameters of the G-KELM, and the load of the target day is predicted with IWOA-G-KELM, G-KELM and WOA-G-KELM; the resulting prediction errors are shown in Table 1. As can be seen from Table 1, the prediction errors of IWOA-G-KELM and WOA-G-KELM are lower than that of G-KELM, indicating that optimizing the model parameters effectively reduces the prediction error. In addition, compared with WOA-G-KELM, the MAPE of IWOA-G-KELM is reduced by 0.029% and the RMSE by 0.016 kMW, showing that the IWOA can effectively improve the prediction accuracy of the model.
TABLE 1 Prediction errors before and after parameter optimization
[Table 1 data are reproduced only as an image in the original publication.]
In the model-fusion process, the Stacking algorithm is improved by adding hidden correlation features to its secondary model layer; the unimproved model is denoted Sta-KELM and the improved model is denoted iSta-KELM. The typical hidden correlation features that are added are shown in Table 2.
TABLE 2 Hidden correlation features
[Table 2 data are reproduced only as images in the original publication.]
To verify the effectiveness of the model improvement for load prediction, iSta-KELM, Sta-KELM and each single-kernel model are used to predict the load of the same day; the prediction errors are shown in Table 3.
As can be seen from Table 3, the single-kernel models have limited learning ability and fragile structures, whereas the fusion models effectively reduce the load-prediction error and are structurally stable. In addition, after the Stacking algorithm is improved, the MAPE decreases by 0.1141% and the RMSE by 0.0157 kMW, indicating that iSta-KELM has higher prediction capability and stronger robustness.
TABLE 3 improved model validation errors
[Table 3 data are reproduced only as an image in the original publication.]
From the error curves between each model's predicted values and the actual load over the 24 hours of the day, it can be seen that the single-kernel models are structurally rather unstable when predicting the load. L-KELM has a large error at hour 8 and frequent peaks at hours 12, 16 and 20, and is not robust. P-KELM shows a large error from hour 5 and a large peak at hour 16. G-KELM performs better than the previous two but still shows large error peaks at hours 10 and 15. Compared with the single-kernel models, the fusion algorithms perform stably. Compared with Sta-KELM, iSta-KELM is structurally more stable; it fluctuates more between hours 12 and 14 but outperforms the other comparison models in the remaining periods.
To verify that the proposed model outperforms other models on this dataset, it is compared with an LSTM and a random forest. The load over 30 days is predicted with these three models, and the resulting prediction curves and errors are summarized in Table 4. The LSTM, which considers only the sequence itself, is designed as a single-input single-output structure whose hidden part is a two-layer network with 64 and 32 neurons respectively; the random forest is configured with 25 trees and 4 leaves.
Each model predicts the load curve for the 30 days of June. All three methods perform well when the curve is smooth, but when the load fluctuates sharply the random forest, whose ability to learn nonlinear sequences is weak, tends to overfit, and its predictions deviate considerably from the actual values. Although the LSTM predicts the trend of the load sequence well, it considers only the load itself and ignores factors such as the weather, so a certain gap remains between its predictions and the actual values. Compared with these models, the proposed model gives better predictions.
Table 4 shows the prediction errors of each model. As can be seen from Table 4, when handling the same regression-fitting problem the MAPE of iSta-KELM is reduced by 0.25 and 0.19 compared with the random forest and the LSTM respectively, indicating that the model is structurally superior to the comparison models and more robust; the RMSE of iSta-KELM is reduced by 0.21 and 0.17 respectively, indicating better prediction accuracy. In conclusion, the proposed model achieves higher prediction accuracy with a more stable structure and can be used in practical load-prediction work.
TABLE 4 prediction error of different methods
[Table 4 data are reproduced only as an image in the original publication.]
To rule out any influence of the particularity of the selected quarter, load prediction is also carried out for the other three quarters of the region, further verifying the generality of the proposed model. The load of each quarter is used for the experiment, the load of the 30 days of the last month of each quarter is predicted with the different methods, and the prediction results of each model are shown in Table 5.
TABLE 5 different model generalization error analysis
[Table 5 data are reproduced only as an image in the original publication.]
Table 5 compares the load-prediction errors of each model over the 30 days of June, of September and of December. It can be seen that iSta-KELM performs well in every season. Compared with the other models, iSta-KELM has a more stable structure, and the improved scheme effectively strengthens the proportion of temporal information, improving the model's prediction capability. In addition, iSta-KELM fits the load curves of all seasons, showing strong learning ability. In conclusion, iSta-KELM has strong generalization ability.
Example 3
The invention relates to a short-term power load prediction device based on a Stacking algorithm, which comprises:
the data acquisition module is used for acquiring load data of each hour in a historical period, and the load data comprises load capacity, weather data and time type;
a basic prediction module for training KELM models with different kernels in the basic model layer based on the acquired historical load data, using each trained KELM model to predict the load of the day to be predicted, and obtaining load prediction values for the day to be predicted;
and a fusion prediction module for, by means of the Stacking algorithm, fusing the prediction results of the KELMs with different kernels in the basic model layer with the acquired historical load data, training a KELM model in the secondary model layer on the fused data, and using the trained secondary-layer KELM model to predict the load of the day to be predicted, obtaining the final load prediction value for the day to be predicted.
Optionally, the load quantity includes the loads 168 hours, 48 hours, 25 hours and 24 hours before the time to be predicted; the weather data includes ultraviolet intensity and temperature; and the time type includes a holiday (day-of-week) type and an hour type, the holiday type covering Monday to Sunday and the hour type covering hours 1 to 24.
Optionally, the KELM models with different kernels in the basic model layer include: a KELM model with a linear kernel, a KELM model with a Gaussian kernel, and a KELM model with a polynomial kernel.
Optionally, the KELM model in the secondary model layer includes a KELM model with a Gaussian kernel.
The specific implementation scheme of each module in the device refers to the specific implementation step process of the method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A short-term power load prediction method based on a Stacking algorithm is characterized by comprising the following steps:
acquiring load data per hour in a historical period, wherein the load data comprises load capacity, weather data and time type;
training KELM models with different kernels in the basic model layer based on the acquired historical load data, using each trained KELM model to predict the load of the day to be predicted, and obtaining load prediction values for the day to be predicted;
and, using the Stacking algorithm, fusing the prediction results of the KELMs with different kernels in the basic model layer with the acquired historical load data, training a KELM model in the secondary model layer, and using the trained secondary-layer KELM model to predict the load of the day to be predicted, obtaining the final load prediction value for the day to be predicted.
2. The method according to claim 1, wherein the load quantity comprises the loads 168 hours, 48 hours, 25 hours and 24 hours before the time to be predicted, the weather data comprises ultraviolet intensity and temperature, and the time type comprises a holiday (day-of-week) type and an hour type, the holiday type covering Monday to Sunday and the hour type covering hours 1 to 24.
3. The method of claim 1, wherein the KELM models with different kernels in the basic model layer comprise: a KELM model with a linear kernel, a KELM model with a Gaussian kernel, and a KELM model with a polynomial kernel.
4. The method of claim 1, wherein the KELM model in the secondary model layer comprises a KELM model with a Gaussian kernel.
5. The method for predicting the short-term power load based on the Stacking algorithm as claimed in claim 1, wherein in the training process of the KELM model:
and optimizing the nuclear parameters in the KELM model by adopting a whale optimization algorithm.
6. The method as claimed in claim 5, wherein the optimizing kernel parameters in the KELM model by using whale optimization algorithm comprises:
1) introducing a Cauchy inverse cumulative distribution function, and using the low peak and long tails of the Cauchy distribution to generate a larger disturbance near the reference target, thereby expanding the search range, the improved formula being:
[Formula (4), the Cauchy-disturbed position-update formula, appears only as an image in the original publication.]
wherein X is the whale position, A is the coefficient vector, t is the iteration number, and r is a random number between 0 and 1;
2) introducing a dynamic inertia weight combining nonlinear decay with a random factor, which weakens the influence of the early-stage optimal individual on the current individual, the improved weight formula being:
w(t) = ω_min + (ω_max - ω_min) · r · e^(-t/T_max)    (5)
wherein ω_min is the minimum inertia weight, ω_max is the maximum inertia weight, t is the current iteration number, and T_max is the maximum number of iterations;
3) introducing a variable spiral strategy, and setting the approach to the target as a dynamic spiral encirclement, the improved spiral position-update formula being:
[Formula (6), the improved variable-spiral position-update formula, appears only as an image in the original publication.]
wherein X* is the global optimal position, D is the distance between the whale and the optimal solution, l is a random constant, and b is a constant defining the shape of the logarithmic spiral.
7. A short-term power load prediction device based on a Stacking algorithm is characterized by comprising the following components:
the data acquisition module is used for acquiring load data of each hour in a historical period, and the load data comprises load capacity, weather data and time type;
a basic prediction module for training KELM models with different kernels in the basic model layer based on the acquired historical load data, using each trained KELM model to predict the load of the day to be predicted, and obtaining load prediction values for the day to be predicted;
and a fusion prediction module for, by means of the Stacking algorithm, fusing the prediction results of the KELMs with different kernels in the basic model layer with the acquired historical load data, training a KELM model in the secondary model layer on the fused data, and using the trained secondary-layer KELM model to predict the load of the day to be predicted, obtaining the final load prediction value for the day to be predicted.
8. The short-term power load prediction device based on the Stacking algorithm as claimed in claim 7, wherein the load quantity comprises the loads 168 hours, 48 hours, 25 hours and 24 hours before the time to be predicted, the weather data comprises ultraviolet intensity and temperature, and the time type comprises a holiday (day-of-week) type and an hour type, the holiday type covering Monday to Sunday and the hour type covering hours 1 to 24.
9. The device of claim 7, wherein the KELM models with different kernels in the basic model layer comprise: a KELM model with a linear kernel, a KELM model with a Gaussian kernel, and a KELM model with a polynomial kernel.
10. The device for predicting short-term power load based on the Stacking algorithm as claimed in claim 7, wherein the KELM model in the secondary model layer comprises a KELM model with a Gaussian kernel.
CN202110174198.5A 2021-02-09 2021-02-09 Short-term power load prediction method and device based on Stacking algorithm Pending CN112991091A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110174198.5A CN112991091A (en) 2021-02-09 2021-02-09 Short-term power load prediction method and device based on Stacking algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110174198.5A CN112991091A (en) 2021-02-09 2021-02-09 Short-term power load prediction method and device based on Stacking algorithm

Publications (1)

Publication Number Publication Date
CN112991091A true CN112991091A (en) 2021-06-18

Family

ID=76349271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110174198.5A Pending CN112991091A (en) 2021-02-09 2021-02-09 Short-term power load prediction method and device based on Stacking algorithm

Country Status (1)

Country Link
CN (1) CN112991091A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345005A (en) * 2018-09-12 2019-02-15 中国电力科学研究院有限公司 A kind of integrated energy system multidimensional optimization method based on improvement whale algorithm
CN110516831A (en) * 2019-06-18 2019-11-29 国网(北京)节能设计研究院有限公司 A kind of short-term load forecasting method based on MWOA algorithm optimization SVM
CN110503251A (en) * 2019-08-12 2019-11-26 江苏方天电力技术有限公司 A kind of non-festivals or holidays load forecasting method based on Stacking algorithm
CN112163620A (en) * 2020-09-27 2021-01-01 昆明理工大学 Stacking model fusion method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MO ZHOU et al.: "Holographic Ensemble Forecasting Method for Short-Term Power Load", IEEE Transactions on Smart Grid, pages 425-433 *
刘磊 et al.: "A Whale Optimization Algorithm with a Global Search Strategy" (in Chinese), 《小型微型计算机系统》, pages 1820-1823 *
张文涛 et al.: "Short-Term Power Load Forecasting Based on an Optimized Kernel Extreme Learning Machine" (in Chinese), 《计算机仿真》, pages 125-128 *
林春伟 et al.: "Application of a Kernel Extreme Learning Machine Optimized by an Improved Whale Algorithm in Water-Quality Spectral Analysis" (in Chinese), 《光电技术及应用》, pages 114-116 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591399A (en) * 2021-08-23 2021-11-02 贵州大学 Short-term wind power prediction method
CN113962458A (en) * 2021-10-20 2022-01-21 江南大学 Short-term load prediction system and method based on mayflies optimization algorithm
CN113962458B (en) * 2021-10-20 2024-06-07 江南大学 Short-term load prediction system and method based on mayday optimization algorithm
CN115345072A (en) * 2022-08-12 2022-11-15 中山大学 Method and system for predicting impact damage of fan blade and readable storage medium
CN115409292A (en) * 2022-10-31 2022-11-29 广东电网有限责任公司佛山供电局 Short-term load prediction method for power system and related device

Similar Documents

Publication Publication Date Title
Ibrahim et al. A novel hybrid model for hourly global solar radiation prediction using random forests technique and firefly algorithm
CN112991091A (en) Short-term power load prediction method and device based on Stacking algorithm
CN106022521B (en) Short-term load prediction method of distributed BP neural network based on Hadoop architecture
CN111563706A (en) Multivariable logistics freight volume prediction method based on LSTM network
CN109345027B (en) Micro-grid short-term load prediction method based on independent component analysis and support vector machine
CN105303262A (en) Short period load prediction method based on kernel principle component analysis and random forest
CN116596044B (en) Power generation load prediction model training method and device based on multi-source data
CN112884236B (en) Short-term load prediction method and system based on VDM decomposition and LSTM improvement
CN114399032A (en) Method and system for predicting metering error of electric energy meter
CN116468181A (en) Improved whale-based optimization method
CN110738363B (en) Photovoltaic power generation power prediction method
CN114444821A (en) Integrated learning load prediction method, system and medium for power internet of things
CN114154753A (en) Load prediction method and system
CN113435595A (en) Two-stage optimization method for extreme learning machine network parameters based on natural evolution strategy
CN113537614A (en) Construction method, system, equipment and medium of power grid engineering cost prediction model
JP2020038435A (en) Optimization device and control method of optimization device
Al-Hajj et al. Estimating solar irradiance using genetic programming technique and meteorological records.
CN112488399A (en) Power load prediction method and device
CN111697560A (en) Method and system for predicting load of power system based on LSTM
CN116151469A (en) Model for forecasting air quality
CN112836885B (en) Combined load prediction method, combined load prediction device, electronic equipment and storage medium
CN112581311B (en) Method and system for predicting long-term output fluctuation characteristics of aggregated multiple wind power plants
CN112465195A (en) Bus load prediction method and system considering high-proportion distributed photovoltaic access
Pratama et al. Implementation of local regression smoothing and fuzzy-grammatical evolution on rainfall forecasting for rice planting calendar
CN110910164A (en) Product sales forecasting method, system, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination