CN117977568A

CN117977568A - Power load prediction method based on nested LSTM and quantile calculation

Info

Publication number: CN117977568A
Application number: CN202410049336.0A
Authority: CN
Inventors: 李丹; 张远航; 孙光帆; 杨保华; 王奇; 缪书唯; 李振兴; 刘颂凯
Original assignee: China Three Gorges University CTGU
Current assignee: China Three Gorges University CTGU
Priority date: 2020-10-13
Filing date: 2020-10-13
Publication date: 2024-05-03
Also published as: CN112232561A; CN112232561B

Abstract

The invention discloses a power load prediction method based on nested LSTM and quantile calculation. The method comprises: collecting load power and influencing-factor data for a plurality of sample days to form a data set; establishing a nested LSTM model and pre-training the LSTM for each quantile in the nested LSTM model to obtain a weight and bias parameter set; training the nested LSTM model as a whole and fine-tuning the weights and biases during training to determine the optimal weights and biases of the nested LSTM model; inputting the validation set into the trained nested LSTM model and selecting the optimal hyperparameters of the model according to the validation error; and inputting the test samples into the nested LSTM model with the optimal hyperparameters and inverse-normalizing the prediction it outputs. By using the nested LSTM model for quantile regression prediction of the power load, the invention makes the predicted load probability distribution more reasonable and avoids crossing between quantile prediction values.

Description

Power load prediction method based on nested LSTM and quantile calculation

Technical Field

The invention belongs to the field of power load prediction, and particularly relates to a power load prediction method based on nested LSTM and quantile calculation.

Background

Short-term power load prediction is the basis for safe and economic operation of a power system, and it provides important information for power system planning and operation, energy trading, unit commitment, economic dispatch and the like. More accurate load prediction helps raise the utilization of power equipment and minimize energy waste.

At present, load probability prediction methods mainly include interval estimation, kernel density estimation and quantile regression. The first two estimate the probability distribution of point-prediction errors mainly by parametric statistics, whereas quantile regression directly describes the relation between the explanatory variables and the response variable at different quantiles, which has made it a focus of the load probability prediction literature in recent years. However, the quantile predictions obtained by quantile regression can cross one another, which makes the predicted distribution unreasonable.

A common approach to load probability prediction combines a machine learning algorithm with quantile regression to build a quantile model. However, conventional machine learning algorithms usually require feature engineering to process the data. Deep neural networks have proven more efficient than traditional machine learning methods for short-term load prediction on large data sets. In particular, long short-term memory (LSTM) neural networks, whose structure is shown in Fig. 2, are widely used because they adapt well to time-series data.

Therefore, a short-term power load probability prediction method based on quantile regression with a nested LSTM neural network is studied.

Disclosure of Invention

The technical problem addressed by the invention is that the quantile predictions of existing quantile regression methods for power load can cross one another, which makes the predicted distribution unreasonable.

The invention aims to solve this problem by providing a power load prediction method based on nested LSTM and quantile calculation. The method combines the robustness and memory of LSTM with the probabilistic prediction capability of quantile regression and takes into account the inherent ordering of the predicted load quantiles: a combination layer that encodes the constraint relation between quantile prediction values is added to build a nested LSTM, namely the constrained parallel long short-term memory network model (constrained parallel Long-Short Term Memory, CP-LSTM). As a result, the predicted load probability distribution is more reasonable and crossing between quantile prediction values is avoided.

The technical solution of the invention is a power load prediction method based on nested LSTM and quantile calculation, comprising the following steps:

Step 1: collecting load power and influencing-factor data for a plurality of sample days to form a data set, and dividing the data set into a training set, a validation set and a test set;

Step 2: establishing a nested LSTM model and setting the model hyperparameters; pre-training the parallel LSTM under each quantile in the nested LSTM model by a parallel training method to obtain the global parameter set {W(τ_i), b(τ_i)}_opt;

Step 3: taking the global parameter set {W(τ_i), b(τ_i)}_opt as the initial parameters of the nested LSTM model, training the nested LSTM model as a whole, and fine-tuning the weights and biases during training to determine the optimal weights and biases of the nested LSTM model;

Step 4: inputting the validation set into the trained nested LSTM model and selecting the optimal hyperparameters of the model according to the validation error;

Step 5: inputting the test samples into the nested LSTM model with the optimal hyperparameters and inverse-normalizing the prediction output by the model to obtain a plurality of quantile prediction values of the load at each moment of the prediction day;

Step 6: calculating the probability density curve of the predicted point from the plurality of load quantiles obtained in Step 5.

Preferably, Step 1 further comprises normalizing each type of data in the data set into the interval [-1, 1].

Specifically, for each sample day, 96 load power points are collected at 15-minute intervals from 0:00 to 24:00. The 96-point load of the day before the prediction day, together with the 24 hourly air temperatures and the regional rainfall of the prediction day, forms the multidimensional input feature vector, and the 96-point load quantiles of the prediction day form the output vector. The input variable is X_d = [T_d, R_d], where T_d = [T₁, T₂, …, T₂₄]_d is the air temperature, with T_i, i ∈ {1, 2, …, 24}, the temperature measured at hour i; R_d = [R₁, R₂, …, R_M]_d is the rainfall, with R_j, j ∈ {1, 2, …, M}, the rainfall of the j-th sub-area of the prediction area; d ∈ {1, 2, …, D}, where D is the total number of historical sample days; and M is the number of sub-areas in the prediction area.
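A minimal Python sketch of how one training sample could be assembled from the variables above. It assumes the previous day's 96 load points, the prediction day's 24 hourly temperatures T_d and the M sub-area rainfall values R_d are simply concatenated in that order; the function name `build_sample` and this ordering are illustrative assumptions, not taken from the patent.

```python
def build_sample(prev_day_load, temps, rainfall, target_load):
    """Assemble one (input, target) pair for prediction day d.

    Assumed layout: previous day's 96 loads, then T_d (24 hourly
    temperatures), then R_d (M sub-area rainfalls). The target is the
    prediction day's 96 loads, whose quantiles the model learns.
    """
    if len(prev_day_load) != 96 or len(target_load) != 96:
        raise ValueError("expected 96 load points per day (15-minute resolution)")
    if len(temps) != 24:
        raise ValueError("expected 24 hourly temperatures")
    x = list(prev_day_load) + list(temps) + list(rainfall)
    y = list(target_load)
    return x, y


# Example with M = 3 sub-areas: 96 + 24 + 3 = 123 input features.
x, y = build_sample([1.0] * 96, [20.0] * 24, [0.0, 2.5, 1.2], [1.1] * 96)
print(len(x), len(y))  # 123 96
```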

In Step 2, the model hyperparameters include the number of neurons m, the sample time-window length l, the number of computing nodes n, and the penalty parameters λ₁ and λ₂.

Preferably, the parallel training is implemented by GPU distributed computing. The training set is divided equally into several subsets that are distributed to the nodes of the computing system, so that each node processes a different subset of the data and the total training time of the neural network is reduced. The parameter sets obtained by the nodes are combined by a gradient-descent formula into a new global weight set, which is then redistributed to every node. The formula is as follows:

where Z_φ = {W, b}^(φ) is the global parameter set obtained in the φ-th training iteration, ΔZ_{φ,j} is the parameter gradient computed by the j-th computing node in the φ-th iteration, n is the total number of computing nodes, and the remaining coefficient in the formula is a scaling factor.
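The formula itself is not reproduced in the text, so the sketch below assumes the common synchronous data-parallel form Z_{φ+1} = Z_φ − s · (1/n) · Σ_j ΔZ_{φ,j}, where s stands for the scaling factor. The averaging over nodes, the helper names `split_into_subsets` and `global_update`, and the dict-of-lists parameter layout are all assumptions made for illustration.

```python
def split_into_subsets(samples, n):
    """Divide the training set into n contiguous subsets of (nearly) equal size."""
    size, extra = divmod(len(samples), n)
    subsets, start = [], 0
    for j in range(n):
        end = start + size + (1 if j < extra else 0)
        subsets.append(samples[start:end])
        start = end
    return subsets


def global_update(params, node_grads, scale):
    """One synchronous update of the global parameter set (assumed form).

    params: Z_phi as a dict, parameter name -> list of floats
    node_grads: n dicts, node_grads[j] holding the gradient of node j
    scale: the scaling factor, acting like a learning rate
    Returns Z_phi - scale * (1/n) * sum_j grad_j.
    """
    n = len(node_grads)
    return {
        name: [v - scale * sum(g[name][k] for g in node_grads) / n
               for k, v in enumerate(values)]
        for name, values in params.items()
    }


subsets = split_into_subsets(list(range(10)), 3)
print([len(s) for s in subsets])  # [4, 3, 3]

Z = {"W": [0.5, -0.2], "b": [0.1]}
grads = [{"W": [0.2, 0.0], "b": [0.4]}, {"W": [0.4, 0.2], "b": [0.0]}]
new_Z = global_update(Z, grads, scale=0.5)
print(new_Z)  # W moves to about [0.35, -0.25], b to about [0.0]
```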

In Step 3, the weights and biases are fine-tuned by a gradient descent algorithm according to the loss function.

Preferably, the probability density curve of the predicted point is calculated by Gaussian kernel density estimation.

Preferably, step 1 divides the data set into a training set, a validation set and a test set in a ratio of 8:1:1.

Preferably, the prediction results of Step 5 are evaluated with an index that takes the quantile constraint relation into account, so as to measure quantile crossing. By the inherent property of quantiles, the quantile prediction values at time t should satisfy

$$\hat{y}_t^{\tau_i} \le \hat{y}_t^{\tau_{i+1}} \quad \text{for all adjacent quantiles } \tau_i < \tau_{i+1}.$$

The index that accounts for the quantile constraint relation is:

$$X_{CS} = \sqrt{\frac{2\theta}{N}\sum_{t=1}^{N}\sum_{i} v_{t,i}^{2}}, \qquad v_{t,i} = \max\left(0,\ \hat{y}_t^{\tau_i} - \hat{y}_t^{\tau_{i+1}}\right)$$

where X_CS is the evaluation index of the quantile constraint relation; ŷ_t^{τ_i} is the predicted value at time t under quantile τ_i; N is the total number of test moments; v_{t,i} is the constraint-violation function; and θ = τ_{i+1} − τ_i is the constant step between quantiles. v_{t,i} is 0 when adjacent quantiles satisfy the constraint and equals the positive difference between them when the constraint is violated, so it measures the degree of violation. The coefficient 2θ/N normalizes the sum of squared constraint errors, so X_CS is the normalized root mean square of v_{t,i} over all test samples and all adjacent quantile pairs, and it quantifies the degree of quantile crossing.
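A minimal Python sketch of X_CS as described above: v_{t,i} is taken as max(0, ŷ_t^{τ_i} − ŷ_t^{τ_{i+1}}) and the coefficient 2θ/N is applied inside the square root. The list-of-rows input layout and the function name `constraint_score` are assumptions.

```python
import math


def constraint_score(preds, theta):
    """Quantile-crossing index X_CS.

    preds: N rows; row t holds the predictions at tau_1 < tau_2 < ...
    theta: constant step tau_{i+1} - tau_i between adjacent quantiles
    """
    n = len(preds)
    total = 0.0
    for row in preds:
        for lower, upper in zip(row, row[1:]):
            v = max(0.0, lower - upper)  # v_{t,i}: size of the crossing, 0 if ordered
            total += v * v
    return math.sqrt(2.0 * theta / n * total)


ordered = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]
crossed = [[1.0, 2.0, 3.0], [1.5, 3.0, 2.5]]
print(constraint_score(ordered, 0.25))  # 0.0
print(constraint_score(crossed, 0.25))  # 0.25
```

A perfectly ordered set of quantiles scores exactly zero, so any positive X_CS signals crossing.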

Since the predicted quantiles perform better when X_QS and X_CS are both low, the two are combined into the comprehensive probabilistic evaluation index X_QCS:

Compared with the prior art, the invention has the following beneficial effects:

1) By using the nested LSTM model for quantile regression prediction of the power load, the invention makes the predicted load probability distribution more reasonable and avoids crossing between quantile prediction values.

2) Each quantile LSTM in the nested LSTM model is pre-trained by the parallel training method to obtain a weight and bias parameter set, which serves as the initial parameters of the nested LSTM model; the model is then trained as a whole and its weights and biases are fine-tuned to obtain their optimal values. This makes the model more efficient and yields accurate point prediction results.

3) The proposed evaluation index, which accounts for the quantile constraint relation, can be used to evaluate quantile crossing.

Drawings

The invention is further described below with reference to the drawings and examples.

Fig. 1 is a flowchart of the power load probability prediction method of an embodiment.

Fig. 2 is a schematic view of the LSTM structure.

Fig. 3 is a schematic structural diagram of the nested LSTM model of an embodiment.

Fig. 4 is a schematic diagram of the parallel training of an embodiment.

Fig. 5 is a schematic diagram of the training process of a parallel LSTM.

Fig. 6 compares the evaluation index X_CS on the test-set sample days for the different prediction models of the embodiment.

Detailed Description

As shown in Fig. 1, the power load prediction method based on nested LSTM and quantile calculation comprises the following steps:

Step 1: load, air temperature and rainfall data at 15-minute intervals are collected for an actual area from January 1, 2016 to June 30, 2017 to form a data set, which is divided into a training set, a validation set and a test set in the ratio 8:1:1. The input variable is X_d = [T_d, R_d], comprising the 24 hourly air temperatures T_d = [T₁, T₂, …, T₂₄]_d and the rainfall R_d = [R₁, R₂, …, R_M]_d of the M sub-areas. Because the magnitudes of the different data types differ considerably, each type of data is normalized into [-1, 1]. Let x be a sample value before normalization, x' the normalized input, and x_max and x_min the maximum and minimum of the N sample values; the processing formula is:

$$x' = \frac{2\,(x - x_{\min})}{x_{\max} - x_{\min}} - 1$$
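The [-1, 1] normalization above, the inverse normalization used in Step 5, and the 8:1:1 split can be sketched as follows. The split is assumed to be chronological (the text gives only the ratio), and the helper names are illustrative.

```python
def normalize(values):
    """Map values into [-1, 1]; return the scaled list and (x_min, x_max)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("cannot normalize a constant series")
    scaled = [2.0 * (x - x_min) / span - 1.0 for x in values]
    return scaled, (x_min, x_max)


def denormalize(scaled, bounds):
    """Inverse normalization (Step 5): map [-1, 1] back to original units."""
    x_min, x_max = bounds
    return [(s + 1.0) * (x_max - x_min) / 2.0 + x_min for s in scaled]


def split_8_1_1(samples):
    """Split in time order into training, validation and test sets (8:1:1)."""
    n = len(samples)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


scaled, bounds = normalize([100.0, 150.0, 200.0])
print(scaled)                         # [-1.0, 0.0, 1.0]
print(denormalize(scaled, bounds))    # [100.0, 150.0, 200.0]
print([len(part) for part in split_8_1_1(list(range(547)))])  # [437, 54, 56]
```

The bounds returned by `normalize` are kept so that the model's outputs can later be mapped back to load units with the same x_min and x_max.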

Step 2: a nested LSTM model is built. As shown in Fig. 3, it comprises an input layer, a hidden layer, an output layer and a regression layer; the hidden layer contains a plurality of quantile long short-term memory network models (Quantile Long-Short Term Memory, Q-LSTM).

The model hyperparameters are set, including the number of neurons m, the sample time-window length l, the number of computing nodes n and the penalty parameters λ₁ and λ₂. In this embodiment m = 200, l = 6, λ₁ = 1 and λ₂ = 20, and there are 547 sample days in total.

The parallel LSTM under each quantile in the nested LSTM model is pre-trained by the parallel training method: the training set is divided into n equal subsets, and the corresponding n computing nodes train the network in parallel;

As shown in Fig. 4, data-parallel training of the neural network is implemented by GPU distributed computing. The training set is divided equally into several subsets that are distributed to the nodes of the computing system, each node processing a different subset so as to reduce the total training time. Each node trains on its subset to obtain a set of model parameters; the parameter sets from all nodes are combined by a gradient-descent formula into a new global weight set, which is then redistributed to every node. The formula is as follows:

where Z_φ = {W, b}^(φ) is the global parameter set obtained in the φ-th iteration, ΔZ_{φ,j} is the parameter gradient of the j-th computing node in the φ-th iteration, n is the total number of computing nodes, and the scaling factor plays a role similar to the learning rate.

As shown in Fig. 5, the Q-LSTM model trained separately on each node is trained as follows:

(1) Input the initial weights W_0(τ_i) and initial biases b_0(τ_i);

(2) Compute the current iteration values of the LSTM input gate i_t, forget gate f_t, output gate o_t, candidate memory cell c̃_t, new memory state C_t and hidden layer state h_t.

Given the current input x_t and the hidden layer state h_{t-1} and memory state C_{t-1} of the previous moment, the detailed calculation is:

$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right)$$
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right)$$
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right)$$
$$\tilde{c}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(C_t)$$

where W_i, W_f, W_o and W_c are the corresponding weight matrices, b_i, b_f, b_o and b_c are the corresponding bias vectors, [h_{t-1}, x_t] denotes concatenation, ⊙ denotes element-wise multiplication, and σ(·) and tanh(·) are the sigmoid and hyperbolic tangent activation functions, respectively. The final output of the output layer, ŷ_t, is calculated from the hidden layer state h_t:

$$\hat{y}_t = W_S h_t + b_S$$

where W_S is the hidden-to-output connection weight matrix and b_S is the corresponding bias vector.
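A pure-Python sketch of one forward step of a single Q-LSTM, using the gates named above. It assumes the standard LSTM form in which each gate acts on the concatenation [h_{t-1}, x_t] and the output layer is linear; in the patent each quantile τ_i has its own parameter set, which here would simply be a separate `params` dict. The helper names are illustrative.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step followed by the linear output layer.

    p maps "W_i", "W_f", "W_o", "W_c" to matrices of shape
    hidden x (hidden + input), "b_i" ... "b_c" to bias vectors,
    and "W_S", "b_S" to the hidden-to-output weights and biases.
    """
    z = list(h_prev) + list(x_t)  # concatenation [h_{t-1}, x_t]

    def gate(name, act):
        pre = matvec(p["W_" + name], z)
        return [act(a + b) for a, b in zip(pre, p["b_" + name])]

    i_t = gate("i", sigmoid)        # input gate
    f_t = gate("f", sigmoid)        # forget gate
    o_t = gate("o", sigmoid)        # output gate
    c_hat = gate("c", math.tanh)    # candidate memory cell
    c_t = [f * c + i * ch for f, c, i, ch in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    y_t = [a + b for a, b in zip(matvec(p["W_S"], h_t), p["b_S"])]
    return y_t, h_t, c_t


# Tiny check with all-zero weights: every gate is sigmoid(0) = 0.5 and the
# candidate cell is tanh(0) = 0, so the memory state is simply halved.
H, D = 1, 1
params = {"W_" + g: [[0.0] * (H + D) for _ in range(H)] for g in "ifoc"}
params.update({"b_" + g: [0.0] * H for g in "ifoc"})
params.update({"W_S": [[0.0] * H], "b_S": [0.3]})
y, h, c = lstm_step([2.0], [0.0], [1.0], params)
print(c, y)  # [0.5] [0.3]
```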

(3) According to the loss function, compute by gradient descent the gradient of the loss with respect to the hidden layer state h_t and with respect to the memory state C_t, and from these the gradient of each weight and bias. The loss function is as follows:

where

W(τ_i) = {W_f(τ_i), W_i(τ_i), W_c(τ_i), W_o(τ_i), W_S(τ_i)}, b(τ_i) = {b_f(τ_i), b_i(τ_i), b_c(τ_i), b_o(τ_i), b_S(τ_i)}

are, respectively, the sets of all weight matrices and bias vectors of the LSTM neural network under quantile τ_i; λ₁ is the regularization penalty parameter that prevents overfitting; and ρ_τ(·) is the check function, defined as:

$$\rho_\tau(u) = \begin{cases} \tau u, & u \ge 0 \\ (\tau - 1)\,u, & u < 0 \end{cases}$$
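The check (pinball) function and the resulting single-quantile data loss can be sketched in a few lines of Python. The λ₁ regularization term is deliberately left out because its exact form is not given in the text; the function names are illustrative.

```python
def check_function(u, tau):
    """Check (pinball) function: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def quantile_loss(y_true, y_pred, tau):
    """Sum of check-function values of the residuals y_t - yhat_t.

    The lambda_1 regularization term is omitted (its form is not given).
    """
    return sum(check_function(y - p, tau) for y, p in zip(y_true, y_pred))


# At tau = 0.9, under-predicting by 2 costs 1.8 but over-predicting by 2
# costs only about 0.2.
print(check_function(2.0, 0.9), check_function(-2.0, 0.9))
print(quantile_loss([10.0, 10.0], [9.0, 11.0], 0.9))
```

This asymmetry is what drives the output of each Q-LSTM toward its own quantile τ_i rather than toward the mean.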

The gradient functions are defined as the derivative of the loss function with respect to the hidden layer state h_t and the derivative of the loss function with respect to the memory state C_t.

1) The gradients of the hidden-to-output layer parameters are the derivatives, taken through the hidden layer state h_t, with respect to the hidden-to-output connection weight matrix W_S and with respect to the bias vector b_S.

2) From the gradients with respect to h_t and C_t, the gradients of the parameters of the forget gate, input gate, candidate memory cell and output gate are calculated, respectively;

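(4) The weights and biases are updated as follows:

$$W_* \leftarrow W_* - \eta\,\frac{\partial E}{\partial W_*}, \qquad b_* \leftarrow b_* - \eta\,\frac{\partial E}{\partial b_*}$$

where E denotes the loss function, η is the learning rate, and W_* and b_* represent the corresponding weight matrix and bias vector, respectively.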
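A sketch of one gradient-descent step restricted to the hidden-to-output layer, assuming a scalar output ŷ_t = W_S · h_t + b_S and the check-function loss for a single quantile τ (the λ₁ regularization term is omitted). The LSTM gate gradients, which require backpropagation through time, are not shown, and the function name `output_layer_step` is hypothetical.

```python
def output_layer_step(W_S, b_S, hs, ys, tau, eta):
    """One gradient-descent step on W_S and b_S for a single quantile tau.

    hs: hidden states h_t (lists of floats); ys: actual loads y_t
    Prediction: yhat_t = W_S . h_t + b_S (scalar output).
    Subgradient of rho_tau(y_t - yhat_t) w.r.t. yhat_t:
        delta_t = -(tau - 1[y_t < yhat_t]).
    """
    grad_W = [0.0] * len(W_S)
    grad_b = 0.0
    for h, y in zip(hs, ys):
        yhat = sum(w * x for w, x in zip(W_S, h)) + b_S
        delta = -(tau - (1.0 if y < yhat else 0.0))
        grad_W = [g + delta * x for g, x in zip(grad_W, h)]  # dE/dW_S += delta * h_t
        grad_b += delta                                     # dE/db_S += delta
    new_W = [w - eta * g for w, g in zip(W_S, grad_W)]     # W_S <- W_S - eta * dE/dW_S
    return new_W, b_S - eta * grad_b                       # b_S <- b_S - eta * dE/db_S


# The prediction (0.0) is below the target (2.0), so one step raises it.
W, b = output_layer_step([0.0], 0.0, [[1.0]], [2.0], tau=0.5, eta=0.1)
print(W, b)
```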

Steps (2) to (4) are repeated until the convergence condition is reached, yielding the optimal model parameters {W(τ_i), b(τ_i)}_opt.

Step 3: the weight and bias parameter set {W(τ_i), b(τ_i)}_opt obtained above is taken as the initial parameters of the nested LSTM model, the nested LSTM model is trained as a whole, and {W(τ_i), b(τ_i)}_r is fine-tuned to determine the optimal weights and biases of the CP-LSTM short-term load probability prediction model. To obtain these optimal parameters, the model parameters {W(τ_i), b(τ_i)}_opt are searched by gradient descent. The nested LSTM model is trained in the same way as the Q-LSTM; only the loss function and the gradients differ. Based on the training sample set, the loss function of the nested LSTM model is as follows:

Wherein the method comprises the steps of ,/>To violate the penalty parameters of the constraint, the corresponding gradient/>、/>And/>The phase change is as follows:

The elements of the vector u_i are, respectively:

The gradients of the forget gate, input gate, memory cell, candidate memory cell and output gate parameters are calculated in the same way as in Step 2.
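Because the nested-model loss is not reproduced in the text, the following is only one plausible form, offered as an assumption: the check-function losses of all quantiles are summed, and a squared-hinge penalty, weighted by a constraint-violation penalty parameter (plausibly λ₂, which the embodiment sets to 20), is added for every adjacent quantile pair that crosses. The patent's actual combination layer and penalty may differ; `constrained_loss` is a hypothetical name.

```python
def constrained_loss(y_true, preds, taus, lam):
    """Joint pinball loss over all quantiles plus a crossing penalty.

    ASSUMED form, for illustration only:
      sum_t sum_i rho_{tau_i}(y_t - yhat_t^{tau_i})
      + lam * sum_t sum_i max(0, yhat_t^{tau_i} - yhat_t^{tau_{i+1}})**2
    preds[t][i] is the prediction at time t for taus[i] (taus increasing).
    """
    pinball = 0.0
    penalty = 0.0
    for y, row in zip(y_true, preds):
        for tau, p in zip(taus, row):
            u = y - p
            pinball += tau * u if u >= 0 else (tau - 1.0) * u
        for lower, upper in zip(row, row[1:]):
            penalty += max(0.0, lower - upper) ** 2  # zero when quantiles are ordered
    return pinball + lam * penalty


print(constrained_loss([10.0], [[9.0, 11.0]], [0.1, 0.9], lam=20.0))  # ordered
print(constrained_loss([10.0], [[11.0, 9.0]], [0.1, 0.9], lam=20.0))  # crossed
```

Swapping the two predictions leaves the interval width unchanged but makes the loss jump sharply, which is the behavior a crossing penalty is meant to produce.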

Step 4: the validation set is input into the nested LSTM model trained in Step 3, and the optimal hyperparameters are selected according to the validation error.

In the embodiment, 10% of the 547 days of sample data is used for validation, and the best hyperparameters are selected according to the error between the final output and the true values.
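The text does not say how candidate hyperparameters are enumerated; a plain grid search is assumed in this sketch. `train_and_validate` is a hypothetical callable that would train the nested LSTM with the given settings and return its validation error; here a toy function stands in for it.

```python
from itertools import product


def select_hyperparameters(grid, train_and_validate):
    """Return the combination in `grid` with the lowest validation error."""
    names = sorted(grid)
    best, best_err = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = train_and_validate(**params)
        if err < best_err:
            best, best_err = params, err
    return best, best_err


def toy_validation_error(m, l):
    """Stand-in for real training: the error is smallest at m=200, l=6."""
    return abs(m - 200) / 100 + abs(l - 6)


print(select_hyperparameters({"m": [100, 200, 300], "l": [4, 6, 8]},
                             toy_validation_error))
# ({'l': 6, 'm': 200}, 0.0)
```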

Step 5: the test samples are input into the nested LSTM model with the optimal hyperparameters, the output is converted back to the original scale (inverse normalization), and the predictions are compared with the actual values. Because the quantile predictions should satisfy the quantile constraint, the invention proposes, on the basis of the common probabilistic evaluation index Quantile Score (QS), an evaluation index Constraint Score (CS) that accounts for the quantile constraint relation. By the inherent property of quantiles, the quantile prediction values at time t should satisfy ŷ_t^{τ_i} ≤ ŷ_t^{τ_{i+1}} for adjacent quantiles τ_i < τ_{i+1}; accordingly, the index that accounts for the quantile constraint relation is as follows:

When X_QS and X_CS are both low, the predicted quantiles perform better, so the two are combined into the comprehensive evaluation index X_QCS:

In addition, the reliability index, PI coverage probability (PICP), and the sharpness index, PI normalized root-mean-square width (PINRW), of the prediction interval (PI) are also important indexes for evaluating probabilistic predictions.

The common probabilistic evaluation index X_QS is:

where L_{τ_i} is the pinball loss at quantile τ_i, y_t is the actual power load at time t, ŷ_t^{τ_i} is the predicted value at time t under quantile τ_i, and N is the total number of test moments.

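Reliability index X_PICP:

$$X_{PICP}^{\alpha} = \frac{\varepsilon^{\alpha}}{N}$$

where ε^α is the number of test samples whose actual value falls within the prediction interval at confidence level 1 − α, and N is the total number of test moments.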

The deviation of the actual PI coverage PICP from the PI nominal confidence (PINC) gives the coverage probability deviation index X_Dev:

Sharpness index X_PINRW:

$$X_{PINRW}^{\alpha} = \frac{1}{R}\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(U_t^{\alpha} - L_t^{\alpha}\right)^{2}}$$

where X_PINRW^α is the normalized root-mean-square width of the prediction interval at confidence level 1 − α, U_t^α and L_t^α are the upper and lower bounds of the prediction interval of the t-th test sample at confidence level 1 − α, and R is the difference between the maximum and minimum load in the test set.
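A sketch of the three interval metrics. It assumes X_PICP is the hit fraction ε^α/N, X_Dev is the absolute difference between X_PICP and the nominal confidence 1 − α (the text does not say whether the deviation is signed), and X_PINRW is the normalized root-mean-square width above; the function names are illustrative.

```python
import math


def picp(y_true, lower, upper):
    """Share of actual values inside [L_t, U_t], i.e. epsilon^alpha / N."""
    hits = sum(1 for y, lo, up in zip(y_true, lower, upper) if lo <= y <= up)
    return hits / len(y_true)


def coverage_deviation(picp_value, alpha):
    """Absolute gap between PICP and the nominal confidence PINC = 1 - alpha."""
    return abs(picp_value - (1.0 - alpha))


def pinrw(lower, upper, load_range):
    """sqrt(mean((U_t - L_t)^2)) / R, with R = max - min load of the test set."""
    mean_sq = sum((up - lo) ** 2 for lo, up in zip(lower, upper)) / len(lower)
    return math.sqrt(mean_sq) / load_range


y = [10.0, 12.0, 20.0]
lo = [9.0, 11.0, 15.0]
up = [11.0, 13.0, 18.0]
p = picp(y, lo, up)  # the third value (20.0) falls outside its interval
print(p, coverage_deviation(p, alpha=0.5), pinrw(lo, up, load_range=max(y) - min(y)))
```

Taking the mean and maximum of `coverage_deviation` over several confidence levels would correspond to X_AD and X_MD in Table 2.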

Step 6: from the quantiles of the predicted load obtained in Step 5, the probability density curve of the predicted point is calculated by Gaussian kernel density estimation, a method described in the paper "Short-term power load probability density forecasting based on Yeo-Johnson transformation quantile regression and Gaussian kernel function", published in the journal Energy in 2018.
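A self-contained sketch of Gaussian kernel density estimation over the quantile predictions at one moment. Treating the quantile predictions as the KDE samples and using Silverman's rule of thumb for the bandwidth are assumptions; the cited paper may choose the bandwidth differently.

```python
import math


def gaussian_kde(samples, xs, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated at each x in xs.

    If no bandwidth is given, Silverman's rule of thumb
    h = 1.06 * sigma * n**(-1/5) is used (an assumed choice).
    """
    n = len(samples)
    if bandwidth is None:
        mean = sum(samples) / n
        sigma = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
        bandwidth = 1.06 * sigma * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
            for x in xs]


# Hypothetical quantile predictions of the load at one moment.
quantile_preds = [95.0, 98.0, 100.0, 102.0, 105.0]
grid = [60.0 + 0.1 * k for k in range(801)]
density = gaussian_kde(quantile_preds, grid)
print(round(sum(density) * 0.1, 3))  # integrates to about 1
```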

In the embodiment, the 15-minute load data set of an actual area from January 1, 2016 to June 30, 2017 is used, and the day-ahead load probability is predicted with the method. To verify the predictive performance of the nested LSTM model, it is compared with the linear quantile regression model L-QR, the quantile regression neural network bQRNN, the QRNN with a parametric rectified linear activation function (RCLU), and the Q-LSTM without the combination layer. The evaluation indexes of the probabilistic predictions of each model are summarized in Tables 1 and 2. Table 1 lists the training time T_train, the common probabilistic index X_QS, the constraint index X_CS, the comprehensive index X_QCS, the sharpness index X_PINRW at 50% confidence, and the proportion f of samples that violate the adjacent-quantile constraint. Table 2 compares the reliability index X_PICP and the deviation index X_Dev at different confidence levels, where X_AD and X_MD are the mean and the maximum of X_Dev over the confidence levels.

As Fig. 6 and Table 1 show, the X_CS index of the nested LSTM (CP-LSTM) is markedly lower than that of the other methods on most sample days. Over the whole test set, the X_CS of the CP-LSTM is only 27.28% of that of the Q-LSTM, and the proportion f of samples violating the constraint is 16.3% lower than for the Q-LSTM, while the X_QS index, which reflects prediction accuracy, shows no significant change. The CP-LSTM therefore effectively avoids quantile crossing and makes the predicted quantiles more reasonable without reducing prediction accuracy.

Table 1 Comparison of the evaluation indexes of the models

Table 2 Comparison of X_PICP and X_Dev of the models

The scope of the present invention is not limited thereto, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. A power load prediction method based on nested LSTM and quantile calculation, characterized by comprising the following steps:

Step 2: establishing a nested LSTM model and setting the model hyperparameters; pre-training each quantile LSTM in the nested LSTM model by a parallel training method to obtain a weight and bias parameter set;

the quantile long short-term memory network model Q-LSTM comprises an input gate, a forget gate, an output gate and a candidate memory cell;

Step 3: taking the obtained weight and bias parameter set as the initial parameters of the nested LSTM model, training the nested LSTM model as a whole, and fine-tuning the weights and biases during training to determine the optimal weights and biases of the nested LSTM model;

Step 5: inputting the test samples into the nested LSTM model with the optimal hyperparameters, and inverse-normalizing the prediction output by the nested LSTM model to obtain a plurality of quantile prediction values of the load at each moment of the prediction day.

2. The power load prediction method based on nested LSTM and quantile calculation according to claim 1, further comprising Step 6: calculating the probability density curve of the predicted point from the plurality of quantiles of the predicted load obtained in Step 5.

3. The power load prediction method based on nested LSTM and quantile calculation according to claim 2, wherein Step 1 collects, for each sample day, 96 load power points at 15-minute intervals from 0:00 to 24:00; the 96-point load of the day before the prediction day, together with the 24 hourly air temperatures and the regional rainfall of the prediction day, forms the multidimensional input feature vector, and the 96-point load quantiles of the prediction day form the output vector; the input variable is X_d = [T_d, R_d], where T_d = [T₁, T₂, …, T₂₄]_d is the air temperature, with T_i, i ∈ {1, 2, …, 24}, the temperature measured at hour i; R_d = [R₁, R₂, …, R_M]_d is the rainfall, with R_j, j ∈ {1, 2, …, M}, the rainfall of the j-th sub-area of the prediction area; d ∈ {1, 2, …, D}, where D is the total number of historical sample days; and M is the number of sub-areas in the prediction area.

4. The power load prediction method based on nested LSTM and quantile calculation according to claim 3, wherein in Step 2 the model hyperparameters include the number of neurons m, the sample time-window length l, the number of computing nodes n, and the penalty parameters λ₁ and λ₂.

5. The power load prediction method based on nested LSTM and quantile calculation according to claim 4, wherein the parallel training is implemented by GPU distributed computing: the training set is divided equally into a plurality of subsets distributed to the nodes of the computing system, each node processing a different subset so as to reduce the total training time of the neural network; the parameter sets obtained by the nodes are combined by a gradient-descent formula into a new global weight set, which is then redistributed to every node, the formula being as follows:

6. The power load prediction method based on nested LSTM and quantile calculation according to claim 5, wherein the quantile long short-term memory network model Q-LSTM trained separately on each node is trained as follows:

(1) inputting the initial weights W_0(τ_i) and initial biases b_0(τ_i);

(2) calculating the current iteration values of the LSTM input gate, forget gate, output gate, candidate memory cell, new memory state and hidden layer state as follows:

wherein W_S is the connection weight matrix of the hidden and output layers, and b_S is the corresponding bias vector;

(3) calculating, by gradient descent according to the loss function, the gradients with respect to the hidden layer state and the memory state, and from them the gradient of each weight and bias, the loss function being as follows:

wherein W(τ_i) = {W_f(τ_i), W_i(τ_i), W_c(τ_i), W_o(τ_i), W_S(τ_i)} and b(τ_i) = {b_f(τ_i), b_i(τ_i), b_c(τ_i), b_o(τ_i), b_S(τ_i)} are, respectively, the sets of all weight matrices and bias vectors of the LSTM neural network under quantile τ_i; λ₁ is the regularization penalty parameter that prevents overfitting; and the loss is built on the check function;

the gradients of the hidden-to-output layer parameters are the derivatives, taken through the hidden layer state, with respect to the connection weight matrix W_S of the hidden and output layers and with respect to the bias vector b_S;

according to the gradients with respect to the hidden layer state and the memory state, calculating the gradients of the parameters of the forget gate, input gate, candidate memory cell and output gate, respectively;

(4) updating the weights and biases by the following formula:

wherein η is the learning rate, and W_* and b_* respectively represent the corresponding weight matrix and bias vector;

repeating steps (2) to (4) until the convergence condition is reached, to obtain the optimal model parameters {W(τ_i), b(τ_i)}_opt.

7. The power load prediction method based on nested LSTM and quantile calculation according to claim 6, wherein in Step 3, to obtain the optimal parameters of the constrained parallel LSTM model, the model parameters {W(τ_i), b(τ_i)}_opt are searched by gradient descent; the constrained parallel LSTM model is trained in the same way as the quantile long short-term memory network model Q-LSTM, except for the loss function and the gradients; based on the training sample set, the loss function of the constrained parallel LSTM model is:

wherein the loss includes a penalty parameter for violating the constraint conditions;

the gradients of the forget gate, input gate, memory cell, candidate memory cell and output gate parameters are calculated in the same way as in Step 2.

8. The power load prediction method based on nested LSTM and quantile calculation according to claim 7, wherein the prediction result of Step 5 is evaluated with an index that accounts for the quantile constraint relation, so as to evaluate quantile crossing, the index being as follows:

wherein the index value evaluates the quantile constraint relation and is computed from the predicted values at time t under each quantile; N is the total number of test moments; v_{t,i} is the constraint-violation function; θ is the step between quantiles; v_{t,i} is 0 when adjacent quantiles satisfy the constraint and equals the positive difference between adjacent quantiles when the constraint is violated, reflecting the degree of violation; and the coefficient 2θ/N normalizes the squared quantile constraint error.

9. The power load prediction method based on nested LSTM and quantile calculation according to claim 8, wherein in Step 6 the probability density curve of the predicted point is calculated by Gaussian kernel density estimation.