CN115689001A

CN115689001A - Short-term Load Forecasting Method Based on Pattern Matching

Info

Publication number: CN115689001A
Application number: CN202211321149.0A
Authority: CN
Inventors: 唐志远; 唐義坤; 高毅; 张梁; 周进; 陈月; 迟福建; 张桂婷
Original assignee: Sichuan University; State Grid Corp of China SGCC; State Grid Tianjin Electric Power Co Ltd
Current assignee: Sichuan University; State Grid Corp of China SGCC; State Grid Tianjin Electric Power Co Ltd
Priority date: 2022-10-26
Filing date: 2022-10-26
Publication date: 2023-02-03

Abstract

The invention relates to a short-term load forecasting method based on pattern matching and belongs to the technical field of power system load forecasting. For each load pattern, the prediction model with the best validation performance is selected from several candidate models, which improves the prediction accuracy for that pattern. The method adaptively matches user loads to load patterns and matches each load pattern to its best prediction model, so that the strengths of each model are fully used and the overall prediction accuracy is effectively improved.

Description

Short-term Load Forecasting Method Based on Pattern Matching

Technical Field

The invention belongs to the technical field of power system load forecasting, and in particular relates to a short-term load forecasting method based on pattern matching.

Background

Load forecasting is a key step in preparing power dispatch plans. Accurate load forecasting helps the system schedule unit start-up and shutdown sensibly, raise the utilization of new energy, and support automatic generation control and safety maintenance; it plays an important role in the secure, stable and economic operation of the system. As the construction of China's new-type power system deepens and advanced metering infrastructure is deployed on the user side, a large amount of user-side electricity consumption data has accumulated. However, because user clusters are geographically dispersed and consume relatively little electricity, their load is highly volatile and random, and accurate forecasting of user-cluster load remains an important problem to be solved.

In recent years, researchers in China and abroad have studied short-term load forecasting extensively and proposed many forecasting models and methods. These fall roughly into two categories: traditional forecasting methods, represented by time series methods, and data-driven artificial intelligence methods. The autoregressive (AR) and autoregressive moving average (ARMA) models are commonly used time series models. Both use only historical load data and are computationally light and fast, but they have no learning ability and require fairly stationary data. To address the low accuracy and poor adaptability of traditional methods, data-driven artificial intelligence methods are increasingly applied to load forecasting, for example artificial neural networks (ANN), support vector machines (SVM), random forests and deep belief networks. One existing approach uses a deep belief network for substation load forecasting and obtains the optimal network parameters with an adaptive matrix estimation algorithm, which makes the network adaptive but is computationally complex. Another first applies K-means clustering to the cells and then builds an SVM prediction model for spatial load forecasting, but the SVM model is strongly affected by the kernel function coefficients and the penalty coefficient c. Another proposes a short-term power load forecasting method based on a Bayesian-optimized convolutional neural network (CNN) and bidirectional gated recurrent unit (BiGRU); however, neural network training is prone to overfitting, which weakens generalization and reduces prediction accuracy. Yet another clusters historical samples with the fuzzy C-means algorithm and then builds a random forest regression model for each class of data; this method must create many decision trees, so training requires considerable space and time.

However, as new energy generation systems become widespread on the user side, the randomness and volatility of user-cluster load increase, and any single forecasting method described above may generalize poorly because of this randomness.

Summary of the Invention

The purpose of the invention is to provide a short-term load forecasting method based on pattern matching that solves the technical problems in the prior art described above. It adaptively matches user loads to load patterns and matches each load pattern to its best prediction model, making full use of the strengths of each model and effectively improving the overall prediction accuracy.

To achieve the above purpose, the technical solution of the invention is as follows.

A short-term load forecasting method based on pattern matching, comprising the following steps:

S1. Using a hierarchical K-means clustering algorithm based on the Pearson correlation coefficient, optimally divide the users' historical load data into several monthly load patterns;

S2. For each monthly load pattern, select the algorithm with the best validation performance from the backpropagation (BP) neural network, support vector regression and linear regression, and match it to that pattern;

S3. According to the similarity between each user's most recent month of load data and the monthly load patterns, adaptively match the user's electricity consumption data to a monthly load pattern; the prediction results of all patterns are summed to obtain the total prediction result.

Further, step S1 is specifically as follows:

Clustering similarity measure based on the Pearson correlation coefficient:

Let $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ and $x_j = (x_{j1}, x_{j2}, \ldots, x_{jn})$ be two load curves. Their Pearson correlation coefficient is:

where $\bar{x}_i$ and $\bar{x}_j$ are the mean values of $x_i$ and $x_j$, respectively;

Principle of the hierarchical K-means clustering method:

Repeat K-means clustering several times with the same K value and record the cluster centers from each run;

Perform hierarchical clustering on all recorded cluster centers;

Use the cluster centers obtained from hierarchical clustering as the initial cluster centers and run K-means clustering once more to obtain the final clustering result;

Selection of the clustering index:

The evaluation index is:

where K is the number of clusters, $v_c$ is the cluster center of each cluster, and $n_K$ is the number of samples in the K-th cluster.

Further, step S2 is specifically as follows:

Mechanism of the BP neural network algorithm:

Here P is the input vector, b is the bias constant, w is the weight vector from the input signals to the neuron, f is the activation function, and y is the output signal of the neuron. The output of the neuron is

$y = f(wP + b)$ (14)

The BP neural network consists of an input layer, a hidden layer and an output layer, and the neurons of adjacent layers are fully connected;

The BP neural network algorithm has two parts: the first is forward propagation of the signal; the second computes the error between the output value and the true value, propagates the error backward, and repeatedly corrects the parameters of each neuron with the L-M algorithm. After the parameters are corrected, training is run again until the training result meets the error requirement or the maximum number of training iterations is reached;

Mechanism of the support vector regression algorithm:

The input vectors are mapped into a high-dimensional feature space by a nonlinear mapping, and a regression hyperplane is computed such that the sum of the distances from all sample points in the set to the hyperplane is minimized. The nonlinear mapping is also called the kernel function. Let the input samples be $M = \{(x_i, y_i), i = 1, 2, \ldots, n\}$, $x_i \in R^d$, $y_i \in R$. The hyperplane function obtained by SVR is:

$f(x) = \omega \Phi(x) + b$ (15)

where $\Phi(\cdot)$ is the kernel function, b is the threshold, and $\omega$ is the weight vector;

The SVR loss function is:

where $\varepsilon$ is the tolerated distance from a sample point to the hyperplane, that is, when the distance from a sample point to the hyperplane is less than or equal to $\varepsilon$, the loss is 0;

Mechanism of the linear regression algorithm:

Linear regression is an analysis method that uses a regression equation to model the relationship between one or more independent variables x and a dependent variable y. Its model is:

$f(x) = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n$ (17)

$y = f(x) + \delta$ (18)

where w denotes the weight coefficients and $\delta$ is the residual. The main purpose of linear regression is to find the coefficients w that minimize the sum of the distances from all sample points $M = \{(x_i, y_i), i = 1, 2, \ldots, n\}$ to f(x). Linear regression uses the least squares method to solve for its equation;

$(w_1, \ldots, w_n)^T = (X^T X)^{-1} X^T Y$ (20)

Load forecasting with linear regression is computationally light and directly yields the relationship between the input features and the predicted quantity.

Further, step S3 is specifically as follows:

Input feature parameters:

To predict the load $P_t$ at time t, the model must be given feature parameters that are strongly correlated with the load at time t. The input feature parameters are:

$x = \{P_{t-h}, P_{t-h-1}, P_{t-h-2}, P_{t-h+1}, P_{t-h+2}, P_{t-2h}, P_{t-2h-1}, P_{t-2h+1}, P_{t-3h}, P_{t-7h}, \mathrm{weekday}, \mathrm{hour}\}$

where the first ten entries are historical load data, h is the number of sampling instants per day in the collected load data, and weekday and hour denote the day of the week and the hour of the day, respectively;

Electricity load pattern matching based on the Pearson correlation coefficient:

A month is taken as 28 days. When predicting the load on day q+1 (q = 0, 1, …, 27) of a month, the load data of the 28 days before the forecast day are used for pattern matching. These comprise data1, the data of the first q days of the current month, and data2, the load data of the last 28−q days of the previous month. To keep the pattern matching consistent in time, data1 is moved in front of data2 and the two are spliced into data3, a complete month of load data for the user running from the beginning to the end of the month. The Pearson correlation coefficient between each user's data3 and each monthly load pattern is computed, and each user is assigned to the monthly load pattern with the lowest value, that is, the pattern most similar in shape;

Overall load forecasting process:

(1) Preprocess the user load data;

(2) Taking one month of load from all users as samples, cluster the samples several times with K-means based on the Pearson coefficient, compute the clustering index V, and obtain the optimal number of clusters $K_b$;

(3) With the number of clusters set to $K_b$, perform hierarchical K-means clustering based on the Pearson correlation coefficient to obtain $K_b$ classes of monthly load, and take the cluster center of each class as a monthly load pattern;

(4) Build BPNN, SVR and LR models for each monthly load pattern, and match the model with the best test performance to that pattern;

(5) Match each user's load for the month before the forecast day to the monthly load pattern most similar to it in shape;

(6) Sum the monthly loads of the users matched to the same pattern and compute the degree of matching with the monthly load pattern, that is, the Pearson correlation coefficient;

(7) Forecast the load with the best prediction model of each monthly load pattern, and sum the prediction results of all patterns to obtain the final load forecast;

(8) In (7), if the best prediction model of a monthly load pattern is BPNN and the degree of matching between the obtained user load data and the monthly load pattern is less than a certain threshold, the BPNN prediction model is used; otherwise the second-best prediction model of that monthly load pattern is used;

(9) Evaluate the user-cluster prediction performance with MAPE and RMSE;

where e is the actual value, o is the predicted value, and N is the number of predictions.
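By way of non-limiting illustration, the two error measures of step (9) can be computed as follows. The formulas are not reproduced in this text, so the sketch assumes the standard definitions, MAPE = (100/N) Σ |(e − o)/e| and RMSE = sqrt((1/N) Σ (e − o)²); the function names and sample values are hypothetical.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error in percent (assumed standard definition).

    Undefined when an actual value e is 0, which can occur for net load
    of households with photovoltaic generation.
    """
    n = len(actual)
    return 100.0 / n * sum(abs((e - o) / e) for e, o in zip(actual, predicted))

def rmse(actual, predicted):
    """Root mean square error (assumed standard definition)."""
    n = len(actual)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(actual, predicted)) / n)

# Hypothetical actual (e) and predicted (o) loads.
e_values = [100.0, 200.0]
o_values = [110.0, 180.0]
mape_value = mape(e_values, o_values)
rmse_value = rmse(e_values, o_values)
```

MAPE is scale-free and therefore comparable across clusters of different size, while RMSE is expressed in the unit of the load itself.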

Compared with the prior art, the invention has the following beneficial effects:

One beneficial effect of this solution is that it first fully mines the shape characteristics of the users' historical load data and clusters out load patterns with strongly regular fluctuations. According to the similarity between the latest user load data and the load patterns, user load data are matched to load patterns so that data with consistent fluctuation behavior are automatically grouped into one class. For each load pattern, the prediction model with the best validation performance is selected from several prediction models to improve the prediction accuracy for that pattern. The method thus achieves adaptive matching between user loads and load patterns as well as matching between each load pattern and its best prediction model, makes full use of the strengths of each prediction model, and effectively improves the overall prediction accuracy. Users' electricity loads are clustered by shape using the Pearson correlation coefficient and adaptively matched to load patterns, and the best prediction model is applied to each load pattern to obtain the best prediction result. A calculation example shows that the algorithm based on adaptive pattern matching effectively improves prediction accuracy.

Description of Drawings

Fig. 1 compares the shape similarity of three load curves in a specific embodiment of the invention.

Fig. 2 is a neuron model diagram of a specific embodiment of the invention.

Fig. 3 is a structural diagram of a three-layer BP network in a specific embodiment of the invention.

Fig. 4 is a schematic diagram of pattern matching in a specific embodiment of the invention.

Fig. 5 is a forecasting flow chart of a specific embodiment of the invention.

Fig. 6 is a clustering index diagram of a specific embodiment of the invention.

Fig. 7 is a monthly load pattern diagram of a specific embodiment of the invention.

Fig. 8 is a diagram of user-cluster prediction results in a specific embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to Figs. 1 to 8. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

Embodiment:

A short-term load forecasting method based on pattern matching is provided. Based on the load data of 300 households with photovoltaic generation systems, the Pearson correlation coefficient is used to measure the shape distance between users' monthly loads, and a hierarchical K-means clustering method is used to optimally divide the data into different monthly load patterns. For each monthly load pattern, a backpropagation neural network (BPNN), a support vector regression (SVR) model and a linear regression (LR) model are trained and validated. When forecasting, all users are matched to the load pattern with the smallest shape distance according to their electricity consumption data for the month before the forecast day, and the prediction model with the best test performance for that pattern is used. Finally, the mean absolute percentage error (MAPE) and root mean square error (RMSE) are used as error measures, and the proposed user-cluster forecasting method is validated by comparison with other forecasting methods.

1. Division and matching of monthly load patterns based on the hierarchical K-means clustering algorithm

Hierarchical K-means clustering:

The main goal of cluster analysis is to compare the similarity of all samples and group highly similar samples into one class while separating dissimilar samples into different classes. A common similarity measure normalizes the samples and computes the Euclidean distance between the normalized samples. However, normalization compresses and loses information, and a Euclidean measure captures the overall difference between two samples through the sum of squared distances while ignoring differences in shape, which leads to unsatisfactory clustering. K-means is a classic clustering algorithm widely used in power load cluster analysis, but its initial cluster centers are random and the number of clusters K must be chosen manually, which can bias the clustering result. To solve these problems, a hierarchical K-means clustering algorithm based on the Pearson correlation coefficient is adopted.

The Pearson coefficient is commonly used to measure the similarity of curve shapes. Let $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ and $x_j = (x_{j1}, x_{j2}, \ldots, x_{jn})$ be two load curves. Their Pearson correlation coefficient is:

where $\bar{x}_i$ and $\bar{x}_j$ are the mean values of $x_i$ and $x_j$, respectively. As the formula shows, the sample mean is subtracted from each sample, which removes the influence of curve amplitude on the similarity. The more similar two curves are in shape, the lower the Pearson correlation coefficient; when two curves are identical in shape, the coefficient is 0. As shown in Fig. 1, the three curves are the electricity load curves Q₁, Q₂ and Q₃ of three users on a given day. The pairwise Euclidean distances d and Pearson correlation coefficients r are d(Q₁,Q₂) = 133.2, d(Q₂,Q₃) = 346.4, d(Q₁,Q₃) = 363.7, r(Q₁,Q₂) = 0.2524, r(Q₁,Q₃) = 0.2524 and r(Q₂,Q₃) = 0. By Euclidean distance Q₁ and Q₂ are the closest, but Q₂ and Q₃ are in fact identical in shape, with a computed Pearson coefficient of 0. Cosine similarity has also been used as a shape similarity measure, but it is affected by curve amplitude. The Pearson correlation coefficient is therefore chosen as the measure of load curve shape similarity.
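By way of non-limiting illustration, the shape measure can be sketched in code. The formula itself is not reproduced in this text. Because the description states that the coefficient is 0 for curves of identical shape and decreases as similarity increases, one reading consistent with it is the distance 1 − r, where r is the standard Pearson correlation. That reading, the function name and the sample curves below are assumptions.

```python
import math

def pearson_distance(x, y):
    """Return 1 - r, where r is the standard Pearson correlation of x and y.

    Assumed form of the measure: 0 for identical shape, 2 for opposite shape.
    Raises ZeroDivisionError if either curve is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

# Hypothetical curves: q3 has the same shape as q2 but a larger amplitude and offset.
q1 = [5.0, 4.0, 2.0, 3.0, 1.0]
q2 = [1.0, 3.0, 2.0, 5.0, 4.0]
q3 = [10.0 + 4.0 * v for v in q2]

d23 = pearson_distance(q2, q3)  # identical shape
d12 = pearson_distance(q1, q2)  # different shape
```

Under this assumption the measure is unchanged when a curve is shifted or scaled by a positive factor, which is the property the comparison in Fig. 1 relies on.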

K-means is a classic partitioning algorithm whose clustering result depends on the initial cluster centers and the number of clusters K; poor choices lead directly to unsatisfactory clustering. Hierarchical clustering needs no initial cluster centers and gives stable results, but its computational complexity is high, which makes it unsuitable for large data sets. Combining the two into a hierarchical K-means clustering algorithm reduces computational complexity and improves clustering stability. The steps of the hierarchical K-means clustering algorithm are:

(1) Repeat K-means clustering several times with the same K value and record the cluster centers from each run;

(2) Perform hierarchical clustering on all recorded cluster centers;

(3) Use the cluster centers obtained from hierarchical clustering as the initial cluster centers and run K-means clustering once more to obtain the final clustering result.
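By way of non-limiting illustration, steps (1) to (3) can be sketched as follows. Several choices are assumptions not stated in the description: the distance is taken as 1 − r (r being the standard Pearson correlation), cluster centers are updated as the plain mean of their members, the hierarchical stage uses average linkage, and the number of runs, the seed and the sample curves are hypothetical.

```python
import math
import random

def pdist(x, y):
    """Assumed Pearson-based distance 1 - r; 0 means identical shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy) if sx and sy else 1.0  # flat curve: treat as uncorrelated

def mean_curve(curves):
    return [sum(col) / len(curves) for col in zip(*curves)]

def assign(data, centers):
    """Index of the nearest center for every curve."""
    return [min(range(len(centers)), key=lambda c: pdist(x, centers[c])) for x in data]

def kmeans(data, centers, iters=50):
    """Plain K-means iterations starting from the given initial centers."""
    for _ in range(iters):
        labels = assign(data, centers)
        new = []
        for c in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            new.append(mean_curve(members) if members else centers[c])
        if new == centers:
            break
        centers = new
    return centers, assign(data, centers)

def agglomerate(points, k):
    """Average-linkage agglomerative clustering down to k groups; returns group means."""
    groups = [[p] for p in points]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                total = sum(pdist(a, b) for a in groups[i] for b in groups[j])
                d = total / (len(groups[i]) * len(groups[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i] += groups.pop(j)  # j > i, so index i stays valid
    return [mean_curve(g) for g in groups]

def hierarchical_kmeans(data, k, runs=10, seed=0):
    rng = random.Random(seed)
    recorded = []
    for _ in range(runs):                        # step (1): repeated K-means
        centers, _labels = kmeans(data, rng.sample(data, k))
        recorded.extend(centers)
    init = agglomerate(recorded, k)              # step (2): cluster the recorded centers
    return kmeans(data, init)                    # step (3): final K-means

# Hypothetical curves: three rising and three falling, with different amplitudes.
rising = [[s * (t + 1) + o for t in range(6)] for s, o in [(1, 0), (2, 5), (3, 1)]]
falling = [[s * (6 - t) + o for t in range(6)] for s, o in [(1, 0), (2, 3), (4, 2)]]
centers, labels = hierarchical_kmeans(rising + falling, k=2)
```

Because only K × runs centers enter the hierarchical stage, its quadratic cost applies to a small set rather than to all user curves.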

This algorithm removes the randomness of the initial cluster centers in K-means clustering, but the number of clusters K must still be chosen manually. Selecting a clustering index makes it possible to determine the optimal number of clusters.

Selection of the clustering index:

By selecting a suitable clustering quality index and checking the validity of the clustering, the optimal number of clusters K can be determined. Existing work builds a cosine-similarity clustering evaluation index from graph-based validity indices; the same approach applies to cluster analysis based on the Pearson correlation coefficient. The evaluation index is:

where K is the number of clusters, $v_c$ is the cluster center of each cluster, and $n_K$ is the number of samples in the K-th cluster. The index represents the sum of the distances from the samples of each class to their cluster center. V decreases as the number of clusters increases; the number of clusters at which the decline of V levels off is the optimal number of clusters.
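By way of non-limiting illustration, the index and the choice of K can be sketched as follows. The formula for V is not reproduced in this text, so the sketch uses the unnormalized sum of sample-to-center distances described above; how $n_K$ enters the original formula is not recoverable here. The levelling-off rule (`flat_ratio`) and the V values are hypothetical.

```python
def cluster_index(data, centers, labels, dist):
    """Sum of the distances from every sample to the center of its cluster."""
    return sum(dist(x, centers[lab]) for x, lab in zip(data, labels))

def pick_k(v_by_k, flat_ratio=0.1):
    """Return the first K after which V drops by less than flat_ratio * V(K_min).

    Hypothetical rule for deciding when the decline of V has levelled off.
    """
    ks = sorted(v_by_k)
    limit = flat_ratio * v_by_k[ks[0]]
    for k, k_next in zip(ks, ks[1:]):
        if v_by_k[k] - v_by_k[k_next] < limit:
            return k
    return ks[-1]

# Hypothetical index values for K = 2..6.
v = {2: 40.0, 3: 25.0, 4: 12.0, 5: 11.0, 6: 10.5}
best_k = pick_k(v)
```

In practice the curve of V against K is usually also inspected visually, as in Fig. 6, rather than relying on a fixed ratio.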

2. Load forecasting methods and process

Load forecasting methods:

Different forecasting methods perform differently on different load patterns. To improve prediction accuracy, the best prediction model for each load pattern is selected from three methods: BPNN, SVR and LR.
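By way of non-limiting illustration, the per-pattern selection, together with the fallback rule of step (8) of the overall process, can be sketched as follows. The sketch interprets step (8) as applying only when BPNN is the best model, and treats the degree of matching as a distance for which lower means a closer match. The validation errors and the threshold value are hypothetical, since the description does not give the threshold.

```python
def rank_models(val_errors):
    """Return model names sorted by validation error, best first."""
    return sorted(val_errors, key=val_errors.get)

def choose_model(ranked, match_distance, threshold):
    """Keep BPNN only when the user cluster matches its pattern closely enough."""
    best = ranked[0]
    if best == "BPNN" and match_distance >= threshold:
        return ranked[1]  # fall back to the second-best model
    return best

# Hypothetical validation errors (for example MAPE in percent) for one load pattern.
errors = {"BPNN": 4.1, "SVR": 4.6, "LR": 5.3}
ranked = rank_models(errors)
well_matched = choose_model(ranked, match_distance=0.05, threshold=0.2)
poorly_matched = choose_model(ranked, match_distance=0.35, threshold=0.2)
```

The fallback reflects the concern that a neural network trained on one pattern may generalize poorly to a cluster whose shape deviates from that pattern.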

Mechanism of the BP neural network algorithm:

Artificial neural networks are self-learning, self-organizing and self-adaptive, approximate nonlinear functions well, and are highly fault tolerant. Fig. 2 shows the structure of the neuron model.

In the figure, P is the input vector, b is the bias constant, w is the weight vector from the input signals to the neuron, f is the activation function, and y is the output signal of the neuron. The output of the neuron is

$y = f(wP + b)$ (25)

The BP neural network, a commonly used artificial neural network, consists of an input layer, a hidden layer and an output layer, with the neurons of adjacent layers fully connected. Its structure is shown in Fig. 3.
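By way of non-limiting illustration, the forward pass through such a three-layer network applies equation (25) layer by layer. The activation functions (tanh in the hidden layer, identity at the output), the layer sizes and the weights below are assumptions; training by backpropagation is not shown.

```python
import math

def layer(inputs, weights, biases, f):
    """One fully connected layer: each row of `weights` feeds one neuron, y = f(w.P + b)."""
    return [f(sum(w * p for w, p in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(p, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> hidden layer (tanh) -> output layer (identity)."""
    hidden = layer(p, w_hidden, b_hidden, math.tanh)
    return layer(hidden, w_out, b_out, lambda z: z)

# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output.
w_hidden = [[0.5, -0.25], [0.1, 0.3]]
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0]]
b_out = [0.5]
y = forward([1.0, 2.0], w_hidden, b_hidden, w_out, b_out)
```

With these values the first hidden neuron receives 0.5·1 − 0.25·2 + 0 = 0, so its output is tanh(0) = 0 and only the second hidden neuron contributes to y.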

The BP neural network algorithm has two main parts: the first is forward propagation of the signal; the second computes the error between the output value and the true value, propagates the error backward, and repeatedly corrects the parameters of each neuron with the L-M (Levenberg-Marquardt) algorithm. After the parameters are corrected, training is run again until the training result meets the error requirement or the maximum number of training iterations is reached. The BP neural network model has no explicit mathematical expression; the model is characterized entirely by the network structure and its parameters.

Support vector regression is an important application branch of the support vector machine (SVM). It maps the input vectors into a high-dimensional feature space by a nonlinear mapping and computes a regression hyperplane such that the sum of the distances from all sample points in the set to the hyperplane is minimized. The nonlinear mapping is also called the kernel function; commonly used kernel functions include linear, polynomial, Gaussian and radial basis functions. Let the input samples be $M = \{(x_i, y_i), i = 1, 2, \ldots, n\}$, $x_i \in R^d$, $y_i \in R$. The hyperplane function obtained by SVR is:

$f(x) = \omega \Phi(x) + b$ (26)

where $\Phi(\cdot)$ is the kernel function, b is the threshold, and $\omega$ is the weight vector.

The SVR loss function is:

where $\varepsilon$ is the tolerated distance from a sample point to the hyperplane, that is, when the distance from a sample point to the hyperplane is less than or equal to $\varepsilon$, the loss is 0. The solution of the model is described in much of the prior art and is not repeated here.
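By way of non-limiting illustration, the behavior described for ε matches the standard ε-insensitive loss, max(0, |y − f(x)| − ε). The loss formula is not reproduced in this text, so this standard form is an assumption; the function name and values are hypothetical.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps tube around the prediction, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

# Hypothetical loads in kW with a tolerance of 0.5 kW.
inside_tube = eps_insensitive_loss(10.0, 10.25, eps=0.5)   # deviation 0.25 <= eps
outside_tube = eps_insensitive_loss(10.0, 11.0, eps=0.5)   # deviation 1.0 > eps
```

Errors smaller than ε do not influence the fitted hyperplane, which makes the model insensitive to small fluctuations in the load.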

Mechanism of the linear regression algorithm:

Linear regression (LR) is an analysis method that uses a regression equation (function) to model the relationship between one or more independent variables x and a dependent variable y. Its model is:

$f(x) = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n$ (28)

$y = f(x) + \delta$ (29)

where w denotes the weight coefficients and $\delta$ is the residual. The main purpose of linear regression is to find the coefficients w that minimize the sum of the distances from all sample points $M = \{(x_i, y_i), i = 1, 2, \ldots, n\}$ to f(x). Linear regression generally uses the least squares method to solve for its equation.

$(w_1, \ldots, w_n)^T = (X^T X)^{-1} X^T Y$ (31)

Load forecasting with linear regression is computationally light and directly yields the relationship between the input features and the predicted quantity, which makes it easy to interpret.
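By way of non-limiting illustration, the least squares solution can be obtained by solving the normal equations $(X^T X) w = X^T Y$. The sketch below prepends a column of ones to X so that the intercept $w_0$ is estimated together with the other coefficients; that convention, the function name and the sample data are assumptions.

```python
def fit_least_squares(rows, targets):
    """Solve (X^T X) w = X^T Y by Gauss-Jordan elimination; returns [w0, w1, ..., wn].

    Raises ZeroDivisionError if X^T X is singular (for example duplicated features).
    """
    X = [[1.0] + list(r) for r in rows]  # intercept column
    m = len(X[0])
    # Augmented matrix [X^T X | X^T Y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(m)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(m)]
    for col in range(m):
        # Partial pivoting for numerical stability
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]

# Hypothetical samples generated from y = 2 + 3*x1 - x2 without noise.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
targets = [2.0, 5.0, 1.0, 4.0, 7.0]
w = fit_least_squares(rows, targets)
```

Each fitted coefficient can be read directly as the sensitivity of the forecast to one input feature, which is the interpretability advantage noted above.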

三、基于模式匹配的负荷预测流程3. Load forecasting process based on pattern matching

输入特征参数部分：Enter the feature parameter section:

预测第t时刻的负荷Pt，需要给模型输入与t时刻负荷相关性很强的特征参数。历史负荷，工作日，周末以及每天所处的时刻对用电负荷影响较大，选择的输入特征参数为：To predict the load Pt at time t, it is necessary to input characteristic parameters that are highly correlated with the load at time t to the model. Historical load, weekdays, weekends and the time of day have a great influence on the power load. The selected input characteristic parameters are:

$$x = \{P_{t-h}, P_{t-h-1}, P_{t-h-2}, P_{t-h+1}, P_{t-h+2}, P_{t-2h}, P_{t-2h-1}, P_{t-2h+1}, P_{t-3h}, P_{t-7h}, \text{weekday}, \text{hour}\}$$

In the formula, the $P$ entries are historical load data, $h$ is the number of sampling instants per day in the collected load data, and weekday and hour denote the day of the week and the hour of the day, respectively.
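The construction of this input vector from a load series can be sketched as follows. The function name, the integer encoding of weekday and hour, and the synthetic series are assumptions for illustration; h = 48 corresponds to 30-minute data.

```python
def build_features(load, t, h, weekday, hour):
    """Assemble the input vector used to predict load[t]."""
    # Lags in the order of the feature set: around one day back,
    # around two days back, then exactly three and seven days back.
    lags = [h, h + 1, h + 2, h - 1, h - 2,
            2 * h, 2 * h + 1, 2 * h - 1,
            3 * h, 7 * h]
    if t - max(lags) < 0:
        raise ValueError("not enough history before index t")
    return [load[t - lag] for lag in lags] + [weekday, hour]


h = 48                                   # 30-minute data gives 48 instants per day
load = [float(i) for i in range(8 * h)]  # synthetic series in which load[i] == i
t = 7 * h + 10
features = build_features(load, t, h, weekday=1, hour=5)
```

At least seven full days of history are needed before the first usable t, because the last load term reaches back 7h instants.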

Electricity load pattern matching based on the Pearson correlation coefficient:

A month is taken as 28 days (the same applies below). To predict the load on day q+1 (q = 0, 1, …, 27) of a month, the load data of the 28 days before the forecast day are selected for pattern matching. These comprise data1, the data of the first q days of the current month, and data2, the load data of the last 28 − q days of the previous month. To keep the pattern matching consistent in time, data1 is placed before data2, and the two are spliced into data3, one complete month of load data for the user running from the beginning of the month to the end. The Pearson correlation coefficient between each user's data3 and each monthly load pattern is calculated, and each user is assigned, by the lowest value, to the monthly load pattern whose shape is most similar. Pattern matching is shown in Figure 4.
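The splicing and matching described above can be sketched in a few lines. One point is an assumption of this sketch rather than a statement of the patent: the compared quantity is taken to be the dissimilarity 1 − r, so that the lowest value marks the most similar shape. All function names and the toy data (two points per day instead of 48) are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def splice_month(current_month_days, previous_month_days, q):
    """Build data3: days 1..q of this month followed by days q+1..28 of last month."""
    data1 = current_month_days[:q]        # first q days of the current month
    data2 = previous_month_days[q:28]     # last 28 - q days of the previous month
    return data1 + data2


def match_pattern(data3, patterns):
    """Return (index of best pattern, scores) using 1 - r as the assumed dissimilarity."""
    flat = [value for day in data3 for value in day]
    scores = [1.0 - pearson(flat, pattern) for pattern in patterns]
    return min(range(len(patterns)), key=scores.__getitem__), scores


# Toy example with two points per day and a steadily rising load.
previous = [[float(d), d + 0.5] for d in range(28)]
current = [[float(d), d + 0.5] for d in range(28)]
data3 = splice_month(current, previous, q=3)
rising = [float(i) for i in range(56)]
falling = rising[::-1]
best, scores = match_pattern(data3, [rising, falling])
```

Here the spliced month follows the rising pattern exactly, so its dissimilarity to that pattern is close to 0 and it is matched to index 0.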

Overall load forecasting process:

The overall flow of the proposed user cluster forecasting method based on adaptive pattern matching is shown in Figure 5. The specific steps are:

(1) Preprocess the user load data.

(2) Taking one month (28 days) of load of every user as a sample, perform K-means clustering based on the Pearson coefficient on the samples multiple times, calculate the clustering index V, and obtain the optimal number of clusters $K_b$.

(3) With the number of clusters set to $K_b$, perform hierarchical K-means clustering based on the Pearson correlation coefficient to obtain $K_b$ classes of monthly load, and take the cluster center of each class as a monthly load pattern.

(4) Build BPNN, SVR, and LR models for each monthly load pattern, and match each monthly load pattern with the model that gives the best test performance.

(5) Match each user's load for the month before the forecast day with the monthly load pattern whose shape is most similar to it.

(6) Sum the monthly loads of the users matched to the same pattern, and calculate the matching degree with the monthly load pattern, that is, the Pearson correlation coefficient.

(7) Forecast the load with the best prediction model of each monthly load pattern, and add the forecasts of all patterns to obtain the final load forecast.

(8) In step (7), if the best prediction model of a monthly load pattern is the BPNN, the BPNN prediction model is used only when the matching degree between the obtained user load data and the monthly load pattern is less than a certain threshold (set to …); otherwise, the second-best prediction model of that monthly load pattern is used. This addresses the overfitting that can occur during BPNN training, which leads to poor generalization and poor forecasts.

(9) Evaluate the user cluster forecasting performance using MAPE and RMSE.
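Steps (7) and (8) amount to a small decision rule followed by a point-by-point sum. The sketch below shows one way to write it; the function names, the error values, the matching degrees, and the threshold of 0.2 are all hypothetical, since the threshold value is not given in this text.

```python
def choose_model(test_errors, matching_degree, threshold):
    """Pick the model for one pattern following steps (7) and (8).

    test_errors maps a model name to its test error; lower is better.
    """
    ranked = sorted(test_errors, key=test_errors.get)
    best = ranked[0]
    # The BPNN is kept only when the matching degree is below the threshold;
    # otherwise the second-best model of the pattern is used instead.
    if best == "BPNN" and matching_degree >= threshold:
        return ranked[1]
    return best


def forecast_cluster(pattern_forecasts):
    """Step (7): add the per-pattern forecasts point by point."""
    return [sum(values) for values in zip(*pattern_forecasts)]


errors = {"BPNN": 0.08, "SVR": 0.09, "LR": 0.12}   # hypothetical test errors
kept = choose_model(errors, matching_degree=0.05, threshold=0.2)
fallback = choose_model(errors, matching_degree=0.5, threshold=0.2)
total = forecast_cluster([[1.0, 2.0], [0.5, 0.5]])
```

With these illustrative numbers the first call keeps the BPNN, while the second falls back to SVR because the matching degree is not below the threshold.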

Case study:

Grid-side electricity consumption data of 300 Australian residential users with photovoltaic generation systems are selected for analysis. The data run from July 1, 2010 to June 30, 2011, at a time resolution of 30 minutes.

Monthly load division results:

The data of the 300 households for the first 10 months starting from July 1, 2010 are selected, giving 3000 monthly load samples in total. Following step (2) above, the clustering index obtained is shown in Figure 6.

As the figure shows, for K between 9 and 12 the decline of the evaluation index V levels off, so $K_b = 9$ is selected as the optimal number of clusters. Hierarchical K-means clustering is then performed, yielding 9 classes of monthly load pattern, as shown in Figure 7.

As the figure shows, the 9 monthly load patterns all fluctuate differently. Patterns 1, 2, 3, 5, 8, and 9 have relatively steady and highly regular daily load fluctuations. Pattern 4 shows a daily load amplitude that tends to increase. Patterns 6 and 7 fluctuate more randomly.

Each monthly load pattern has 1344 points. If the data of a monthly load pattern are joined end to end, so that the data at the end of the month can be used to predict the load at the start of the month, 1344 training samples can be constructed. From these, 1344 samples are drawn uniformly with replacement to train the prediction models; about 63% of the samples end up being used for training, and the remaining samples that were never drawn are used for testing. The test results are shown in Table 1 below.
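The figure of about 63% follows from sampling with replacement: when n items are drawn n times, the expected share of distinct items is 1 − (1 − 1/n)^n, which approaches 1 − 1/e ≈ 0.632. A minimal sketch, with illustrative function names and an arbitrary random seed:

```python
import random


def expected_unique_fraction(n):
    """Expected share of distinct items when drawing n times with replacement from n items."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def bootstrap_split(n, seed=0):
    """Draw n indices with replacement; indices never drawn form the test set."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    train = sorted(drawn)
    test = [i for i in range(n) if i not in drawn]
    return train, test


n = 1344                      # 28 days x 48 half-hour points
expected = expected_unique_fraction(n)
train, test = bootstrap_split(n)
```

The split uses every sample exactly once, either for training or for testing, without needing a separate hold-out month.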

Table 1. Test error (MAPE) of different models for each monthly load pattern

The values in bold in the table are the test errors of the best model for each monthly load pattern. For example, the SVR prediction model performs best for monthly load pattern 1, with a test error of 0.087.

Experimental setup and result analysis:

To verify the effectiveness of the proposed method, the following four schemes are set up for load forecasting.

Scheme 1: no pattern division; forecast directly with the BPNN.

Scheme 2: no pattern division; forecast directly with SVR.

Scheme 3: no pattern division; forecast directly with LR.

Scheme 4: perform pattern division and matching, and forecast each pattern with its best prediction method.

The load of the first 7 days of November is forecast; the results are shown in Figure 8 and Table 2:

Table 2. User cluster forecasting results of the different experimental schemes

Comparing the above results shows that, relative to the methods that use a single model, the proposed adaptive pattern matching forecasting method achieves the best forecasting performance and improves forecasting accuracy by about 0.6%.

The pattern matching results for November 7 are shown in Table 3 below:

Table 3. Pattern matching results for November 7

Table 3 shows that the fewer users are matched to a pattern, the higher the matching degree, that is, the lower the shape similarity between the user load data and the monthly load pattern. This degrades the forecast to some extent, but because the number of users is small, the effect on the overall load forecast is small.

The above are preferred embodiments of the present invention. Any change made according to the technical solution of the present invention whose functional effect does not go beyond the scope of that technical solution falls within the protection scope of the present invention.

Claims

1. A short-term load forecasting method based on pattern matching, characterized in that it comprises the following steps:

S1. Using the hierarchical K-means clustering algorithm based on the Pearson correlation coefficient, the user's historical load data is optimally divided into various monthly load patterns;

S2. For each monthly load pattern, select as the algorithm matched to that pattern the one with the best validation performance among the BP neural network, support vector regression, and linear regression;

S3. According to the similarity between the user's latest monthly load data and the monthly load pattern, the user's electricity consumption data is adaptively matched with the monthly load pattern, and the prediction results of each pattern are added to obtain the total prediction result.

2. The short-term load forecasting method based on pattern matching as claimed in claim 1, characterized in that step S1 is specifically as follows:

Clustering similarity measure function based on Pearson correlation coefficient:

Suppose $x_i = (x_{i1}, x_{i2}, \dots, x_{in})$ and $x_j = (x_{j1}, x_{j2}, \dots, x_{jn})$ are two load curves, and their Pearson correlation coefficient is:

In the formula, $\bar{x}_i$ and $\bar{x}_j$ are the mean values of $x_i$ and $x_j$ respectively;
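For reference, the standard definition of the Pearson correlation coefficient for two such curves can be written as below. The symbol $\rho$ and the bar notation for the means are assumed here, since the formula itself is not reproduced in this text.

```latex
\rho(x_i, x_j) =
  \frac{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}
       {\sqrt{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)^2}\,
        \sqrt{\sum_{k=1}^{n} (x_{jk} - \bar{x}_j)^2}},
\qquad
\bar{x}_i = \frac{1}{n} \sum_{k=1}^{n} x_{ik}
```

The coefficient depends only on the shape of the two curves, not on their amplitude or offset, which is why it suits the comparison of load profiles of users with different consumption levels.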

Hierarchical K-means clustering method principle:

Repeat K-means clustering with the same K value multiple times, and record each cluster center;

Perform hierarchical clustering on all recorded cluster centers;

The cluster center obtained by hierarchical clustering is used as the initial cluster center, and then K-means clustering is performed again to obtain the final clustering result;
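The three steps above can be sketched in plain Python as follows. Several choices are assumptions of this sketch and not taken from the claim: the dissimilarity 1 − r, random initial centers drawn from the samples, average linkage for the hierarchical step, and the fixed iteration counts.

```python
import math
import random


def pearson_distance(a, b):
    """1 - Pearson correlation, used here as the dissimilarity (an assumption)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 1.0  # a flat curve carries no shape information
    return 1.0 - cov / math.sqrt(va * vb)


def mean_curve(curves):
    """Point-by-point mean of a non-empty list of curves."""
    return [sum(col) / len(col) for col in zip(*curves)]


def kmeans(samples, centers, iterations=20):
    """Plain K-means loop starting from the given centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for s in samples:
            nearest = min(range(len(centers)),
                          key=lambda c: pearson_distance(s, centers[c]))
            groups[nearest].append(s)
        # An empty cluster keeps its previous center.
        centers = [mean_curve(g) if g else c for g, c in zip(groups, centers)]
    return centers


def merge_centers(centers, k):
    """Average-linkage agglomerative clustering of recorded centers into k groups."""
    groups = [[c] for c in centers]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = sum(pearson_distance(a, b) for a in groups[i] for b in groups[j])
                d /= len(groups[i]) * len(groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i].extend(groups.pop(j))
    return [mean_curve(g) for g in groups]


def hierarchical_kmeans(samples, k, repeats=5, seed=0):
    """Repeat K-means, cluster the recorded centers, then run K-means once more."""
    rng = random.Random(seed)
    recorded = []
    for _ in range(repeats):
        recorded.extend(kmeans(samples, rng.sample(samples, k)))
    initial = merge_centers(recorded, k)
    return kmeans(samples, initial)


# Toy curves: three rising and three falling shapes.
samples = [[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0], [1.0, 2.0, 3.0, 5.0],
           [6.0, 4.0, 2.0, 0.0], [5.0, 3.0, 2.0, 1.0], [3.0, 2.0, 1.0, 0.0]]
centers = hierarchical_kmeans(samples, k=2)
```

Pooling the centers of several runs before the final run is what reduces the sensitivity of plain K-means to its random initial centers.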

Selection of clustering indicators:

The evaluation indicators are:

In the formula: $K$ is the number of clusters, $v_c$ is the cluster center of each cluster, and $n_K$ is the number of samples contained in the current K-th cluster.

3. The short-term load forecasting method based on pattern matching as claimed in claim 2, characterized in that step S2 is specifically as follows:

The mechanism of the BP neural network algorithm is as follows:

Here, P is the input vector, b is the bias constant, w is the weight vector from the input signal to the neuron, f is the activation function, and y is the output signal of the neuron; the output of the neuron is

y=f(wP+b) (3)

The BP neural network consists of three parts: the input layer, the hidden layer and the output layer, and the neurons between each layer are fully connected;

The BP neural network algorithm consists of two parts: the first is the forward propagation of the signal; the second is to calculate the error between the output value and the true value, propagate the error backward, and continuously correct the parameters of each neuron through the L-M algorithm; training continues until the training result meets the error requirement or the maximum number of training iterations is reached;
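The single-neuron relation y = f(wP + b) in equation (3) can be illustrated with a short forward-pass sketch. The tanh activation and the numeric values are assumptions chosen for the example; the text does not name a specific activation function here.

```python
import math


def neuron_output(weights, inputs, bias, activation=math.tanh):
    """Compute y = f(w . P + b) for a single neuron."""
    weighted_sum = sum(w * p for w, p in zip(weights, inputs)) + bias
    return activation(weighted_sum)


# Illustrative values: the weighted sum is 0.5*2 - 0.25*4 + 0 = 0.
y = neuron_output([0.5, -0.25], [2.0, 4.0], bias=0.0)
```

Forward propagation through the network repeats this computation layer by layer, and training adjusts w and b to reduce the output error.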

Support vector machine regression algorithm mechanism:

The input vector is mapped into a high-dimensional feature space by a nonlinear mapping, and a regression hyperplane is computed such that the sum of the distances from all sample points in the set to the hyperplane is minimized; the nonlinear mapping is also called the kernel function; let the input samples be $M = \{(x_i, y_i),\ i = 1, 2, \dots, n\}$, $x_i \in R^d$, $y_i \in R$; the hyperplane function obtained by SVR is:

f(x)=ωΦ(x)+b (4)

In the formula: Φ( ) is the kernel function, b is the threshold, and ω is the weight vector;

The SVR loss function is:

In the formula: $\varepsilon$ is the tolerated distance from a sample point to the hyperplane, that is, when the distance between the sample point and the hyperplane is less than or equal to $\varepsilon$, the loss is 0;

Linear regression algorithm mechanism:

Linear regression is an analytical method that uses a regression equation to model the relationship between one or more independent variables x and dependent variables y; its model is:

$$f(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n \tag{6}$$

$$y = f(x) + \delta \tag{7}$$

In the formula: $w$ denotes the weight coefficients and $\delta$ is the residual; the main purpose of linear regression is to find the coefficients $w$ that minimize the sum of the distances from all sample points in the set $M = \{(x_i, y_i),\ i = 1, 2, \dots, n\}$ to $f(x)$; linear regression uses the least squares method to obtain its equation;

$$w_1, \dots, w_n = (X^{T}X)^{-1}X^{T}Y \tag{9}$$

Using linear regression to do load forecasting requires a small amount of calculation, and the relationship between input features and forecast information can be directly obtained.

4. The short-term load forecasting method based on pattern matching as claimed in claim 3, characterized in that step S3 is specifically as follows:

Input feature parameters:

To predict the load $P_t$ at time t, it is necessary to input to the model feature parameters that are highly correlated with the load at time t; the input feature parameters are:

$$x = \{P_{t-h}, P_{t-h-1}, P_{t-h-2}, P_{t-h+1}, P_{t-h+2}, P_{t-2h}, P_{t-2h-1}, P_{t-2h+1}, P_{t-3h}, P_{t-7h}, \text{weekday}, \text{hour}\}$$

In the formula, the $P$ entries are historical load data, $h$ is the number of sampling instants per day in the collected load data, and weekday and hour denote the day of the week and the hour of the day, respectively;

Electric load pattern matching based on Pearson correlation coefficient:

A month is taken as 28 days. To predict the load on day q+1 (q = 0, 1, …, 27) of a month, the load data of the 28 days before the forecast day are selected for pattern matching, comprising data1, the data of the first q days of the current month, and data2, the load data of the last 28 − q days of the previous month; to keep the pattern matching consistent in time, data1 is placed before data2, and the two are spliced into data3, one complete month of load data for the user from the beginning of the month to the end; the Pearson correlation coefficient between each user's data3 and each monthly load pattern is calculated, and each user is assigned, by the lowest value, to the monthly load pattern whose shape is most similar;

The overall process of load forecasting:

(1) Preprocessing the user load data;

(2) Taking the monthly load of all users as samples, perform K-means clustering based on the Pearson coefficient on the samples multiple times, calculate the clustering index V, and obtain the optimal number of clusters $K_b$;

(3) With the number of clusters set to $K_b$, perform hierarchical K-means clustering based on the Pearson correlation coefficient to obtain $K_b$ classes of monthly load, and use the cluster center of each class as a monthly load pattern;

(4) Build BPNN, SVR and LR models for each monthly load pattern; select the model with the best test effect to match the monthly load pattern;

(5) Match the load of the month before each user's forecast with the monthly load that is most similar to it;

(6) Sum the monthly loads of users matched to the same pattern, and calculate the matching degree with the monthly load pattern, that is, the Pearson correlation coefficient;

(7) Carry out load forecasting with the best forecasting model of the monthly load mode, and add up the forecasting results of each mode to obtain the final load forecasting result;

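(8) In (7), if the best prediction model of the monthly load pattern is the BPNN, the BPNN prediction model is used only when the matching degree between the obtained user load data and the monthly load pattern is less than a certain threshold; otherwise the second-best prediction model of that monthly load pattern is used;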

(9) Use MAPE and RMSE to evaluate the user cluster prediction effect;

In the formula: e is the actual value, o is the predicted value, and N is the number of predicted points.
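The two evaluation metrics can be computed as in the sketch below, which assumes their standard definitions because the formulas are not reproduced in this text. MAPE is returned as a fraction rather than a percentage, and the sample values are illustrative.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.05 means 5 %)."""
    return sum(abs((e - o) / e) for e, o in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the load."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(actual, predicted)) / len(actual))


actual = [100.0, 200.0]        # e: actual values (illustrative)
predicted = [110.0, 180.0]     # o: predicted values (illustrative)
mape_value = mape(actual, predicted)
rmse_value = rmse(actual, predicted)
```

MAPE is undefined when an actual value is zero, so points with zero load would need separate handling before this function is applied.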