CN117273802B

CN117273802B - Medium-short term electricity sales quantity prediction method based on deep neural network

Info

Publication number: CN117273802B
Application number: CN202311239313.8A
Authority: CN
Inventors: 何军; 龚妮; 向婷; 陈秘; 康蓝心; 李明超; 付饶; 周韵
Original assignee: State Grid Sichuan Electric Power Co Ltd
Current assignee: State Grid Sichuan Electric Power Co Ltd
Priority date: 2023-09-25
Filing date: 2023-09-25
Publication date: 2024-01-19
Anticipated expiration: 2043-09-25
Also published as: CN117273802A

Abstract

The invention discloses a medium- and short-term electricity sales prediction method based on a deep neural network, in the technical field of electricity sales data processing. The method comprises: acquiring an original sample set for each station area; cleaning the original sample set; preprocessing the cleaned data to obtain a preprocessed sample set; having the management center analyze the prediction optimization coefficient YH of each station area's preprocessed sample set to obtain a prediction optimization sequence, which improves data processing efficiency; after the prediction terminal receives a preprocessed sample set, processing the time series by sequence differencing and establishing an LSTM neural network model; initializing the neural network weights and the particle swarm optimization parameters, using the Nguyen-Widrow method; and training the neural network model and evaluating it with a loss function to obtain the optimal medium- and short-term electricity sales prediction model that minimizes the overall error on the training samples, which improves prediction accuracy.

Description

A short- and medium-term electricity sales prediction method based on a deep neural network

Technical field

The invention relates to the technical field of electricity sales data processing, and specifically to a short- and medium-term electricity sales prediction method based on a deep neural network.

Background

With the development of the electricity market and rising user demand, the safe and economical operation of the power grid has become crucial. Accurate short-term forecasting of electricity sales in each station area helps ensure safe grid operation, reduce generation costs, meet user demand, and improve social and economic benefits. Because station-area electricity sales are clearly periodic and are affected by many factors, such as user consumption behavior, load changes, seasonal changes, and holidays, an advanced and accurate short- and medium-term electricity sales prediction method is needed.

Electricity sales forecasting commonly uses one of two neural network models: the recurrent neural network (RNN) or the long short-term memory network (LSTM). Given sufficient time, RNN/LSTM can meet the requirements of most power departments. In the RNN/LSTM structure, however, the input of the current layer is the output of the previous layer. This makes RNN/LSTM well suited to time series problems, but the same serial structure limits training speed and prevents parallel processing, which seriously reduces modeling efficiency when the data volume is large and the schedule is tight. The support vector method, which is built on statistical theory, relies on manual experience to choose its free parameters and functions, and this also affects prediction quality. To address these shortcomings, the present invention proposes a short- and medium-term electricity sales prediction method based on a deep neural network.

Summary of the invention

The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, it proposes a short- and medium-term electricity sales prediction method based on a deep neural network.

To achieve the above objective, an embodiment of the first aspect of the present invention proposes a short- and medium-term electricity sales prediction method based on a deep neural network, which includes the following steps:

Step 1: Obtain the original sample set of each station area; clean the original sample set; preprocess the cleaned data to obtain a preprocessed sample set and cache it at the management center; the original sample set includes date data, climate text data, and historical electricity sales data;

Step 2: The management center analyzes the prediction optimization coefficient YH of each station area's preprocessed sample set and sorts the preprocessed sample sets by YH to obtain the prediction optimization sequence of the preprocessed sample sets;

Then, following the prediction optimization sequence, the preprocessed sample sets of the station areas are sent in turn to the prediction terminal for training the electricity sales data prediction model;

Step 3: After receiving a preprocessed sample set, the prediction terminal splits it into mutually exclusive training, validation, and test sets; processes the time series by sequence differencing; and establishes an LSTM neural network model, in which the number of LSTM input nodes is specified according to the number of input variables, and a suitable number of hidden-layer nodes and the number of output nodes representing the predicted electricity sales are set;

Step 4: Initialize the neural network weights and the particle swarm optimization parameters, using the Nguyen-Widrow method to initialize the neural network weights; train the neural network model to obtain a trained short- and medium-term electricity sales data prediction model;

Step 5: After the short- and medium-term electricity sales data prediction model with optimal parameters is obtained, use the electricity sales data of the period preceding the period to be predicted and the climate text data of the period to be predicted as model inputs to obtain the predicted electricity sales for the period to be predicted.

Further, the management center analyzes the prediction optimization coefficient YH of each station area's preprocessed sample set. The specific analysis steps are:

Obtain the station area corresponding to the preprocessed sample set and obtain the power supply area of that station area; record the length of the power supply lines in the power supply area as L1, the number of households supplied as HL, and the average electricity consumption per household as DL;

Calculate the regional correlation coefficient GD of the station area using the formula GD=L1×g2+HL×g3+DL×g4, where g2, g3, and g4 are preset coefficient factors;

Within a preset time period, collect the supply-sales error data of each sampling interval of the station area; the supply-sales error data is the difference between the electricity supplied and the electricity sold, taken as a positive value;

Mark the supply-sales error data of each sampling interval as Wi and compare Wi with a preset error threshold; if Wi > the preset error threshold, the corresponding station area has large power losses, which raises generation costs and causes additional losses in grid operation;

Count the proportion Zb of sampling intervals in which Wi > the preset error threshold; for each such interval, take the difference between Wi and the preset error threshold, and sum these differences to obtain the total excess error LZ; calculate the power supply loss coefficient GS of the station area using the formula GS=Zb×g1+LZ×g5, where g1 and g5 are preset coefficient factors;

Normalize the regional correlation coefficient and the power supply loss coefficient, take their numerical values, and calculate the prediction optimization coefficient YH of the preprocessed sample set using the formula YH=ƒ×(GD×b1+GS×b2), where b1 and b2 are preset coefficient factors and ƒ is a preset equalization coefficient.
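
The coefficient calculation above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, every factor value is chosen by the caller, and min-max scaling across station areas stands in for the normalization step, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class StationArea:
    name: str
    line_length: float          # L1: length of the power supply lines
    households: float           # HL: number of households supplied
    avg_consumption: float      # DL: average consumption per household
    errors: Sequence[float]     # Wi: positive supply-sales differences


def region_coefficient(a: StationArea, g2: float, g3: float, g4: float) -> float:
    """GD = L1*g2 + HL*g3 + DL*g4."""
    return a.line_length * g2 + a.households * g3 + a.avg_consumption * g4


def loss_coefficient(a: StationArea, threshold: float, g1: float, g5: float) -> float:
    """GS = Zb*g1 + LZ*g5, with Zb the share of intervals over the threshold."""
    excess = [w - threshold for w in a.errors if w > threshold]
    zb = len(excess) / len(a.errors) if a.errors else 0.0
    lz = sum(excess)            # LZ: total excess error
    return zb * g1 + lz * g5


def min_max(values: List[float]) -> List[float]:
    """Assumed normalization: scale values to [0, 1] across station areas."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def prediction_optimization(areas: List[StationArea], *, threshold: float,
                            g1: float, g2: float, g3: float, g4: float,
                            g5: float, b1: float, b2: float,
                            f: float) -> Dict[str, float]:
    """YH = f * (GD*b1 + GS*b2), computed on the normalized GD and GS."""
    gd = min_max([region_coefficient(a, g2, g3, g4) for a in areas])
    gs = min_max([loss_coefficient(a, threshold, g1, g5) for a in areas])
    return {a.name: f * (d * b1 + s * b2) for a, d, s in zip(areas, gd, gs)}
```

Because GD and GS are on very different scales before normalization, the choice of normalization strongly affects the resulting ranking; a different scheme could be substituted in `min_max` without changing the rest of the sketch.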

Further, cleaning the original sample set includes: filling or discarding null values; removing duplicate data; cleaning climate text data whose values fall outside the valid range; and validating the climate text data.
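
A minimal sketch of these cleaning rules is given below. The record layout (`date`, `sales_kwh`, `temperature_c`), the temperature bounds, and the choice to drop rather than fill records with missing sales values are all assumptions made for illustration; the text does not fix any of them.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[object]]


def clean_records(records: List[Record],
                  temp_range: Tuple[float, float] = (-50.0, 60.0)) -> List[Record]:
    """Drop null and duplicate records and blank out out-of-range climate values."""
    seen_dates = set()
    cleaned: List[Record] = []
    for rec in records:
        # Null handling: discard records without a date or a sales value.
        if rec.get("date") is None or rec.get("sales_kwh") is None:
            continue
        # De-duplication: keep only the first record for each date.
        if rec["date"] in seen_dates:
            continue
        # Range check: treat an implausible temperature as missing.
        temp = rec.get("temperature_c")
        if temp is not None and not (temp_range[0] <= temp <= temp_range[1]):
            rec = {**rec, "temperature_c": None}
        seen_dates.add(rec["date"])
        cleaned.append(rec)
    return cleaned
```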

Further, preprocessing the cleaned data includes:

Convert the units of the historical electricity sales data, complete the sampling time points so that they are continuous, and fill missing sample values by average interpolation to obtain the station-area electricity sales time series; if sampling time points and data are missing over a large span, fill them with data from the same period in other years.
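
One way to read "average interpolation" is to replace each missing sample with the mean of its nearest valid neighbors. The sketch below assumes that reading, uses a hypothetical function name, and does not cover the fallback to same-period data from other years.

```python
from typing import List, Optional


def fill_missing_with_neighbor_mean(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each None with the mean of the nearest valid values on both sides."""
    filled = list(series)
    n = len(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Nearest original (not yet filled) values to the left and right.
        left = next((series[j] for j in range(i - 1, -1, -1)
                     if series[j] is not None), None)
        right = next((series[j] for j in range(i + 1, n)
                      if series[j] is not None), None)
        if left is not None and right is not None:
            filled[i] = (left + right) / 2.0
        elif left is not None:
            filled[i] = left        # gap at the end: carry the last value
        elif right is not None:
            filled[i] = right       # gap at the start: use the first value
    return filled
```

Gaps are filled from the original neighbors only, so a run of consecutive missing points receives the same value; linear interpolation would be a reasonable alternative for long runs.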

Further, the time series is processed by sequence differencing. The specific steps include:

From the monthly or quarterly data of the time series, compute the 12-month or 4-quarter moving average of the station-area electricity sales to obtain the long-term moving-average trend data Q;

According to the multiplicative model, calculate Y/Q=S×C×I, where Y represents the year, S the seasonal component, C the cyclical component, and I the irregular component;

Recalculate Y/Q for the same month or quarter of each year and take the mean to obtain the arithmetic mean Pi for that month or quarter across years;

Taking the arithmetic mean Pi of the same month or quarter across years as the numerator and the sum of the arithmetic means of all months or quarters as the denominator, calculate the seasonal ratio Si of each month or quarter, where Si=(N×Pi)/(P1+P2+…+PN); the seasonal ratio Si is the correction coefficient that the seasonal factor applies to the long-term trend of electricity sales; P1+P2+…+PN is the sum of the arithmetic means of all months or quarters; N is the number of samples;

Calculate the product of Tt, from which seasonal factors have been removed, and the seasonal ratio Si corresponding to period t; this product is the predicted electricity sales for period t, where Qt is the moving-average trend data of period t.
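
The seasonal-ratio steps can be sketched as below. Several points are assumptions rather than statements from the text: Y is treated as the observed sales series, the moving average is centered (the text only says "moving average"), N is taken as the number of months or quarters per cycle so that the ratios average to 1, and the function names are hypothetical.

```python
from typing import List, Optional


def centered_moving_average(y: List[float], period: int) -> List[Optional[float]]:
    """Centered moving average Q for an even period (4 quarters, 12 months)."""
    half = period // 2
    q: List[Optional[float]] = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]            # period + 1 values
        # The two end points each get half weight, so the weights sum to `period`.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        q[t] = total / period
    return q


def seasonal_ratios(y: List[float], period: int) -> List[float]:
    """Si = N * Pi / (P1 + ... + PN), with Pi the mean of Y/Q per season."""
    q = centered_moving_average(y, period)
    buckets: List[List[float]] = [[] for _ in range(period)]
    for t, (yt, qt) in enumerate(zip(y, q)):
        if qt is not None:
            buckets[t % period].append(yt / qt)      # Y/Q = S x C x I
    if any(not b for b in buckets):
        raise ValueError("need at least two full cycles of data")
    p = [sum(b) / len(b) for b in buckets]           # Pi per month or quarter
    return [period * pi / sum(p) for pi in p]


# Three years of quarterly sales with a repeating seasonal pattern.
ratios = seasonal_ratios([8.0, 12.0, 10.0, 10.0] * 3, period=4)
```

Averaging Y/Q over several years is what suppresses the cyclical and irregular components C and I, leaving mainly the seasonal component in each Pi.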

Further, the neural network weights include at least one of the model's input values, hidden values, number of layers, neuron dropout rate, activation function, initial feature weights, and biases.

Further, training the neural network model to obtain a trained short- and medium-term electricity sales data prediction model specifically includes:

Input the training, validation, and test sets into the LSTM neural network model as historical feature values for model training, and evaluate the model with a loss function to obtain the optimal short- and medium-term electricity sales data prediction model that minimizes the overall error on the training samples.

Further, evaluating the model with the loss function includes:

Use the root mean square error as the loss function and the coefficient of determination (R2_score) as the measure of goodness of fit to evaluate the loss between the model's predicted values and the true values;

Determine the trained short- and medium-term electricity sales data prediction model according to the loss evaluation results.
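
For reference, the two evaluation measures named above can be written in plain Python as follows. The function names are illustrative; the definitions are the standard ones for root mean square error and the coefficient of determination.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R2 is undefined when the true values are constant")
    return 1.0 - ss_res / ss_tot
```

A lower RMSE and an R2 closer to 1 both indicate a better fit; R2 equals 0 for a model that always predicts the mean of the true values.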

Compared with the prior art, the beneficial effects of the present invention are:

1. The invention acquires the original sample set of each station area, cleans it, and preprocesses the cleaned data to obtain a preprocessed sample set that is cached at the management center. The management center analyzes the prediction optimization coefficient YH of each station area's preprocessed sample set, calculating YH from the length of the power supply lines in the power supply area, the number of households supplied, the average electricity consumption per household, and the supply-sales error data of each sampling interval. The preprocessed sample sets are then sorted by YH to obtain the prediction optimization sequence, which effectively improves data processing efficiency;

2. After the prediction terminal receives a preprocessed sample set, it splits the set into three mutually exclusive data sets; processes the time series by sequence differencing and establishes an LSTM neural network model; initializes the neural network weights and the particle swarm optimization parameters, using the Nguyen-Widrow method; inputs the training, validation, and test sets into the LSTM neural network model as historical feature values for training; and evaluates the model with a loss function to obtain the optimal short- and medium-term electricity sales data prediction model that minimizes the overall error on the training samples, which improves prediction accuracy.

Description of the drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a schematic block diagram of the short- and medium-term electricity sales prediction method based on a deep neural network according to the present invention.

Detailed description

The technical solution of the present invention is described clearly and completely below with reference to the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

As shown in Figure 1, a short- and medium-term electricity sales prediction method based on a deep neural network includes the following steps:

Step 1: Obtain the original sample set of each station area; clean the original sample set; preprocess the cleaned data to obtain a preprocessed sample set and cache it at the management center; the original sample set includes date data, climate text data, and historical electricity sales data;

Cleaning the original sample set includes:

Because the historical electricity sales data of a sample station area may have a small amount of missing data, and missing samples affect the VMD decomposition, the invention uses average interpolation to fill in the missing data;

Preprocessing the cleaned data includes:

Convert the units of the historical electricity sales data, complete the sampling time points so that they are continuous, and fill missing sample values by average interpolation to obtain the station-area electricity sales time series; if sampling time points and data are missing over a large span, fill them with data from the same period in other years;

Step 2: The management center analyzes the prediction optimization coefficient YH of each station area's preprocessed sample set and sorts the preprocessed sample sets by YH to obtain the prediction optimization sequence of the preprocessed sample sets; the specific analysis steps are:

Obtain the station area corresponding to the preprocessed sample set and obtain the power supply area of that station area; record the length of the power supply lines in the power supply area as L1, the number of households supplied as HL, and the average electricity consumption per household as DL;

Calculate the regional correlation coefficient GD of the station area using the formula GD=L1×g2+HL×g3+DL×g4, where g2, g3, and g4 are preset coefficient factors;

Within a preset time period, collect the supply-sales error data of each sampling interval of the station area; the supply-sales error data is the difference between the electricity supplied and the electricity sold, taken as a positive value;

Mark the supply-sales error data of each sampling interval as Wi and compare Wi with a preset error threshold; the larger Wi is, the greater the power loss in the corresponding station area, which raises generation costs and causes additional losses in grid operation;

Count the proportion Zb of sampling intervals in which Wi > the preset error threshold; for each such interval, take the difference between Wi and the preset error threshold, and sum these differences to obtain the total excess error LZ;

Calculate the power supply loss coefficient GS of the station area using the formula GS=Zb×g1+LZ×g5, where g1 and g5 are preset coefficient factors;

Normalize the regional correlation coefficient and the power supply loss coefficient, take their numerical values, and calculate the prediction optimization coefficient YH of the preprocessed sample set using the formula YH=ƒ×(GD×b1+GS×b2), where b1 and b2 are preset coefficient factors and ƒ is a preset equalization coefficient;

Following the prediction optimization sequence, the management center sends the preprocessed sample sets of the station areas in turn to the prediction terminal for training the electricity sales data prediction model;
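
The ordering step can be illustrated with a short Python sketch. The text says only that the sample sets are sorted by the size of YH, so the descending order used here is an assumption, and the function name and sample values are hypothetical.

```python
from typing import Dict, List


def dispatch_order(yh_by_area: Dict[str, float], descending: bool = True) -> List[str]:
    """Return station-area names in the order their sample sets are dispatched."""
    ranked = sorted(yh_by_area.items(), key=lambda item: item[1], reverse=descending)
    return [name for name, _ in ranked]


# Station areas with a larger YH are sent to the prediction terminal first.
order = dispatch_order({"area_a": 0.2, "area_b": 0.9, "area_c": 0.5})
```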

Step 3: After receiving a preprocessed sample set, the prediction terminal splits it into three mutually exclusive data sets: a training set, a validation set, and a test set; processes the time series by sequence differencing; and establishes an LSTM neural network model, in which the number of LSTM input nodes is specified according to the number of input variables, and a suitable number of hidden-layer nodes and the number of output nodes representing the predicted electricity sales are set;
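
A mutually exclusive three-way split might look like the sketch below. The chronological ordering and the 70/15/15 proportions are assumptions made for illustration; the text requires only that the three sets do not overlap.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T], train_frac: float = 0.7,
                        val_frac: float = 0.15) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered samples into disjoint train, validation, and test sets."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a non-empty test share")
    n = len(samples)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    # Slicing at two cut points guarantees that the three sets never overlap.
    return (list(samples[:train_end]),
            list(samples[train_end:val_end]),
            list(samples[val_end:]))
```

Keeping the samples in time order, rather than shuffling them, prevents later observations from leaking into the training set of a forecasting model.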

The time series is processed by sequence differencing, and the specific steps include:

Taking the arithmetic mean Pi of the same month or quarter across years as the numerator and the sum of the arithmetic means of all months or quarters as the denominator, calculate the seasonal ratio Si of each month or quarter, where Si=(N×Pi)/(P1+P2+…+PN); the seasonal ratio Si is the correction coefficient that the seasonal factor applies to the long-term trend of electricity sales; P1+P2+…+PN is the sum of the arithmetic means of all months or quarters; N is the number of samples;

Calculate the product of Tt, from which seasonal factors have been removed, and the seasonal ratio Si corresponding to period t; this product is the predicted electricity sales for period t, where Qt is the moving-average trend data of period t;
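
The final product step can be sketched as follows, assuming the deseasonalized trend values Tt and the seasonal ratios Si are already available. The function name, the `start_index` parameter, and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def seasonal_forecast(trend: Sequence[float], seasonal_ratios: Sequence[float],
                      start_index: int = 0) -> List[float]:
    """Forecast for period t = Tt * Si, with Si chosen by the season of period t."""
    period = len(seasonal_ratios)
    return [t * seasonal_ratios[(start_index + k) % period]
            for k, t in enumerate(trend)]


# Five quarters of trend values combined with four quarterly ratios.
forecast = seasonal_forecast([100.0, 100.0, 100.0, 100.0, 110.0],
                             [0.8, 1.2, 1.0, 1.0])
```

`start_index` identifies the season of the first trend value, so a forecast that begins in the third quarter would pass `start_index=2`.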

Step 4: Initialize the neural network weights and the particle swarm optimization parameters; train the neural network model to obtain a trained short- and medium-term electricity sales data prediction model;

The Nguyen-Widrow method is used to initialize the neural network weights; the neural network weights include at least one of the model's input values, hidden values, number of layers, neuron dropout rate, activation function, initial feature weights, and biases;
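
For readers unfamiliar with Nguyen-Widrow initialization, the sketch below applies its usual formulation to a single fully connected layer: weights are drawn uniformly, each hidden unit's weight vector is rescaled to length beta = 0.7 × H^(1/n), and biases are drawn from [-beta, beta]. How the method is applied to the individual LSTM gate matrices is not described in the text, so this single-layer version is an assumption, as are the function and parameter names.

```python
import math
import random
from typing import List, Optional, Tuple


def nguyen_widrow_init(n_inputs: int, n_hidden: int,
                       rng: Optional[random.Random] = None
                       ) -> Tuple[List[List[float]], List[float]]:
    """Return (weights, biases) for one layer with n_hidden units."""
    rng = rng or random.Random()
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)        # target weight-vector norm
    weights: List[List[float]] = []
    for _ in range(n_hidden):
        row = [0.0] * n_inputs
        while not any(row):                          # guard against an all-zero draw
            row = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        norm = math.sqrt(sum(w * w for w in row))
        weights.append([beta * w / norm for w in row])
    biases = [rng.uniform(-beta, beta) for _ in range(n_hidden)]
    return weights, biases
```

The rescaling spreads the active regions of the hidden units across the input range, which is intended to speed up early training compared with purely random small weights; the method assumes inputs scaled to roughly [-1, 1].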

Training the neural network model to obtain a trained short- and medium-term electricity sales data prediction model specifically includes:

Input the training, validation, and test sets into the LSTM neural network model as historical feature values for model training, and evaluate the model with a loss function to obtain the optimal short- and medium-term electricity sales data prediction model that minimizes the overall error on the training samples;

A further technical solution is that evaluating the model with the loss function includes:

Determine the trained short- and medium-term electricity sales data prediction model according to the loss evaluation results;

All of the above formulas are computed on dimensionless numerical values. Each formula was obtained by collecting a large amount of data and running software simulations to find the form closest to the real situation; the preset parameters and preset thresholds in the formulas are set by those skilled in the art according to actual conditions or obtained from simulation on a large amount of data.

Working principle of the invention:

In operation, the short- and medium-term electricity sales prediction method based on a deep neural network acquires the original sample set of each station area, cleans it, and preprocesses the cleaned data to obtain a preprocessed sample set that is cached at the management center. The management center analyzes the prediction optimization coefficient YH of each station area's preprocessed sample set, calculating YH from the length of the power supply lines in the power supply area, the number of households supplied, the average electricity consumption per household, and the supply-sales error data of each sampling interval. The preprocessed sample sets are sorted by YH to obtain the prediction optimization sequence, and the management center follows this sequence to send the preprocessed sample sets of the station areas in turn to the prediction terminal for training the electricity sales data prediction model, which improves data processing efficiency;

After the prediction terminal receives a preprocessed sample set, it splits the set into three mutually exclusive data sets; processes the time series by sequence differencing and establishes an LSTM neural network model; initializes the neural network weights and the particle swarm optimization parameters, using the Nguyen-Widrow method; inputs the training, validation, and test sets into the LSTM neural network model as historical feature values for training; and evaluates the model with a loss function to obtain the optimal short- and medium-term electricity sales data prediction model that minimizes the overall error on the training samples, which improves prediction accuracy.

In this specification, a description that refers to the terms "one embodiment," "example," "specific example," and the like means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such references do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The preferred embodiments of the invention disclosed above are intended only to help explain the invention. They do not describe every detail, nor do they limit the invention to the specific implementations described. Many modifications and variations are possible in light of this specification. These embodiments were selected and described in detail to better explain the principles and practical applications of the invention, so that those skilled in the art can understand and use it well. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. The medium-short-term electricity sales quantity prediction method based on the deep neural network is characterized by comprising the following steps of:

step one: acquiring an original sample set of each station area; performing data cleaning on the original sample set; preprocessing the cleaned data to obtain a preprocessed sample set and caching the preprocessed sample set at a management center; the original sample set comprises date data, climate text data and historical electricity sales data;

step two: the management center carries out prediction optimization coefficient YH analysis on the preprocessed sample set of each station area, and the specific analysis steps are as follows:

obtaining the station area corresponding to a preprocessed sample set, and obtaining the power supply area of the station area; recording the length of the power supply lines in the power supply area as L1, the number of households supplied as HL and the average electricity consumption per household as DL;

calculating a regional correlation coefficient GD of the station area by using a formula GD=L1×g2+HL×g3+DL×g4, wherein g2, g3 and g4 are preset coefficient factors;

collecting supply-sales error data of each sampling interval of the station area in a preset time period; the supply-sales error data are the difference data between the electricity supplied and the electricity sold; the difference data are taken as positive values;

marking the supply-sales error data of each sampling interval as Wi; comparing the supply-sales error data Wi with a preset error threshold; if Wi is larger than the preset error threshold, the power loss of the corresponding station area is large, the power generation cost is increased, and extra loss is caused to the operation of the power grid;

counting the proportion Zb of occurrences in which the supply-sales error data Wi > the preset error threshold, and when Wi > the preset error threshold, obtaining the difference between Wi and the preset error threshold and summing the differences to obtain a total excess error LZ; calculating a power supply loss coefficient GS of the station area by using a formula GS=Zb×g1+LZ×g5, wherein g1 and g5 are preset coefficient factors;

normalizing the area association coefficient and the power supply loss coefficient and taking the values of the area association coefficient and the power supply loss coefficient, and calculating by using a formula YH= ƒ × (GDXb1+GS Xb 2) to obtain a prediction optimization coefficient YH of the pretreatment sample set, wherein b1 and b2 are preset coefficient factors; ƒ is a preset equalization coefficient;

sequencing the pretreatment sample set according to the magnitude of the prediction optimization coefficient YH to obtain a prediction optimization sequence of the pretreatment sample set; then, the pretreatment sample sets of each platform area are sequentially sent to a prediction terminal for electricity selling data prediction model training according to the prediction optimization sequence;

step three: after receiving a preprocessed sample set, the prediction terminal splits the preprocessed sample set into mutually exclusive training, validation and test sets; processes the time series by means of sequence differencing, and establishes an LSTM neural network model; wherein the number of input nodes of the long short-term memory (LSTM) neural network is specified according to the number of input variables, and a suitable number of hidden-layer nodes and the number of output nodes representing the predicted electricity sales quantity are set;

step four: initializing the weights of the neural network and the parameters of a particle swarm optimization algorithm, the neural network weights being initialized by the Nguyen-Widrow method; performing model training on the neural network model to obtain a trained medium- and short-term electricity sales data prediction model;

step five: after the medium- and short-term electricity sales data prediction model with the optimal parameters is obtained, the electricity sales quantity data of the time period preceding the time period to be predicted and the climate text data of the time period to be predicted are used as model inputs, and the predicted electricity sales quantity of the time period to be predicted is obtained.
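The ranking in step two can be sketched as follows. This is a minimal illustration, not the claimed implementation: the claim does not say which normalization is used or in which direction the sets are sorted, so min-max scaling and descending order are assumptions, and the names `StationArea`, `min_max` and `rank_station_areas` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StationArea:
    name: str
    line_length: float      # L1: length of power supply lines
    supply_count: float     # HL: number of power supplies
    avg_consumption: float  # DL: average power consumption of users
    errors: List[float]     # Wi: positive supply-sales error per sampling interval


def association_coefficient(a: StationArea, g2: float, g3: float, g4: float) -> float:
    """GD = L1*g2 + HL*g3 + DL*g4."""
    return a.line_length * g2 + a.supply_count * g3 + a.avg_consumption * g4


def loss_coefficient(a: StationArea, threshold: float, g1: float, g5: float) -> float:
    """GS = Zb*g1 + LZ*g5, using only the intervals where Wi exceeds the threshold."""
    excess = [w - threshold for w in a.errors if w > threshold]
    zb = len(excess)  # number of threshold violations
    lz = sum(excess)  # total amount by which the threshold was exceeded
    return zb * g1 + lz * g5


def min_max(values: List[float]) -> List[float]:
    """Assumed normalization: scale to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_station_areas(areas, threshold, g1, g2, g3, g4, g5, b1, b2, f) -> List[Tuple[str, float]]:
    """Return (name, YH) pairs, largest YH first (assumed ordering)."""
    gd = min_max([association_coefficient(a, g2, g3, g4) for a in areas])
    gs = min_max([loss_coefficient(a, threshold, g1, g5) for a in areas])
    yh = [(a.name, f * (d * b1 + s * b2)) for a, d, s in zip(areas, gd, gs)]
    return sorted(yh, key=lambda pair: pair[1], reverse=True)
```

The sample sets would then be sent to the prediction terminal in the order of the returned list.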

2. The medium- and short-term electricity sales prediction method based on a deep neural network according to claim 1, wherein the data cleaning of the original sample set comprises the following steps: filling or discarding null values; de-duplicating repeated data; cleaning climate text data whose values fall outside the valid range; and performing data verification on the climate text data.
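One possible reading of these cleaning steps is sketched below. The record layout (`date`, `sales_kwh`, `temperature_c`), the valid temperature range and the choice to fill rather than discard out-of-range climate values are all assumptions made for illustration; the claim does not fix any of them.

```python
def clean_records(records, temp_range=(-50.0, 60.0)):
    """Drop unusable rows, de-duplicate by date, and repair climate values.

    records: list of dicts with hypothetical keys 'date', 'sales_kwh', 'temperature_c'.
    """
    lo, hi = temp_range
    seen_dates = set()
    cleaned = []
    for rec in records:
        date, sales, temp = rec.get("date"), rec.get("sales_kwh"), rec.get("temperature_c")
        if date is None or sales is None:
            continue  # discard rows whose key fields are null
        if date in seen_dates:
            continue  # de-duplicate: keep the first row for each date
        if temp is not None and not (lo <= temp <= hi):
            temp = None  # out-of-range climate value is treated as missing
        seen_dates.add(date)
        cleaned.append({"date": date, "sales_kwh": sales, "temperature_c": temp})

    # Fill missing climate values with the mean of the valid ones (assumed strategy).
    valid = [r["temperature_c"] for r in cleaned if r["temperature_c"] is not None]
    if valid:
        fill = sum(valid) / len(valid)
        for r in cleaned:
            if r["temperature_c"] is None:
                r["temperature_c"] = fill
    return cleaned
```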

3. The medium- and short-term electricity sales prediction method based on a deep neural network according to claim 2, wherein preprocessing the cleaned data comprises the following steps:

performing unit conversion on the electricity sales history data, completing the sampling time points to ensure the continuity of the electricity sales history data, and filling the missing data of the sampling points by mean interpolation to obtain a time series of the electricity sales quantity of the station area; if sampling time points and data are missing over a large span, filling is performed using the data of the same period in other years.
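The two filling rules can be sketched as below. The claim does not define "mean interpolation" or how large a "large span" is, so this sketch assumes that a short gap is filled with the mean of its two neighbouring observations and that any gap longer than a hypothetical `max_gap` is filled with the mean of the same position in other cycles (for example the same month of other years).

```python
def fill_series(values, period=12, max_gap=2):
    """Fill None entries in an evenly sampled series.

    values: observations in time order, None where a sampling point is missing.
    period: samples per yearly cycle (12 for monthly data, 4 for quarterly data).
    """
    n = len(values)
    filled = list(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        j = i
        while j < n and values[j] is None:
            j += 1  # j is the first index after the gap
        left = values[i - 1] if i > 0 else None
        right = values[j] if j < n else None
        short_gap = (j - i) <= max_gap and left is not None and right is not None
        for k in range(i, j):
            if short_gap:
                filled[k] = (left + right) / 2  # mean of the neighbouring observations
            else:
                # Same position in other cycles, using only original observations.
                same = [values[m] for m in range(k % period, n, period) if values[m] is not None]
                filled[k] = sum(same) / len(same) if same else None
        i = j
    return filled
```

A point stays `None` only when neither rule has any observation to draw on.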

4. The medium- and short-term electricity sales prediction method based on a deep neural network according to claim 1, wherein processing the time series by means of sequence differencing comprises the following specific steps:

according to the monthly or quarterly data of the time series, obtaining a moving average of the electricity sales quantity of the station area over 12 months or 4 quarters, thereby obtaining long-term moving average trend data Q;

calculating Y/Q = S × C × I according to the multiplicative model; wherein Y represents year, S represents the seasonal component, C represents the cyclical component, and I represents the irregular component;

then averaging Y/Q over the same month or the same quarter of each year to obtain the arithmetic mean Pi for that month or quarter;

calculating the seasonal ratio Si of each month or quarter by taking the arithmetic mean Pi of the same month or quarter of each year as the numerator and the sum of the arithmetic means of all months or quarters as the denominator, wherein Si = (N × Pi)/(P1 + P2 + … + PN); the seasonal ratio Si is the correction coefficient applied by the seasonal factor to the long-term trend of the electricity sales quantity; P1 + P2 + … + PN is the sum of the arithmetic means of all months or quarters; N is the number of samples;

calculating the product of Qt, from which seasonal factors have been removed, and the seasonal ratio Si corresponding to period t, this product being the predicted electricity sales quantity for period t; wherein Qt is the moving average trend data for period t.
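The seasonal-ratio calculation can be sketched as follows. Two points are assumptions rather than statements of the claim: the moving average is taken as a trailing window of one full cycle, and N is read as the number of months or quarters in a cycle, so that the ratios average to 1. The function names are hypothetical.

```python
def moving_average(y, period):
    """Trailing moving average Q; entries before the first full window are None."""
    q = [None] * len(y)
    for t in range(period - 1, len(y)):
        q[t] = sum(y[t - period + 1 : t + 1]) / period
    return q


def seasonal_ratios(y, period):
    """Si = N * Pi / (P1 + ... + PN), with Pi the mean of Y/Q for season i."""
    q = moving_average(y, period)
    buckets = [[] for _ in range(period)]
    for t, (yt, qt) in enumerate(zip(y, q)):
        if qt is not None:
            buckets[t % period].append(yt / qt)  # Y/Q = S x C x I
    p = [sum(b) / len(b) for b in buckets]        # arithmetic mean Pi per season
    total = sum(p)
    return [period * pi / total for pi in p]


def predict(q_t, ratios, t):
    """Predicted sales for period t: trend value Qt times the matching seasonal ratio."""
    return q_t * ratios[t % len(ratios)]
```

With at least two full cycles of history every season has a Y/Q value to average, which is the minimum this sketch needs.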

5. The medium- and short-term electricity sales prediction method based on a deep neural network according to claim 1, wherein the neural network weights comprise at least one of an input value, a hidden value, a number of model layers, a neuron dropout rate, an activation function, an initial value of the feature weights, and a bias of the model.
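Claim 1 names the Nguyen-Widrow method for initializing the network weights, and the initial weights and biases appear in the list above. The sketch below shows the usual form of that method for a single fully connected layer. How it is applied to the gates of an LSTM is not stated in the claims, so treating each gate as such a layer would be an assumption, and the function name is hypothetical.

```python
import math
import random


def nguyen_widrow_init(n_inputs, n_hidden, rng=None):
    """Return (weights, biases) for one layer, initialized Nguyen-Widrow style.

    weights: n_hidden rows of n_inputs values, each row rescaled to norm beta.
    biases:  n_hidden values drawn uniformly from [-beta, beta].
    """
    rng = rng or random.Random()
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)  # scale factor of the method
    weights = []
    for _ in range(n_hidden):
        row = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        norm = math.sqrt(sum(w * w for w in row))
        weights.append([beta * w / norm for w in row])  # rescale the row to norm beta
    biases = [rng.uniform(-beta, beta) for _ in range(n_hidden)]
    return weights, biases
```

Passing a seeded `random.Random` makes the initialization reproducible between training runs.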

6. The medium- and short-term electricity sales prediction method based on a deep neural network according to claim 1, wherein performing model training on the neural network model to obtain the trained medium- and short-term electricity sales data prediction model specifically comprises the following steps:

inputting the training set, the validation set and the test set as historical feature values into the LSTM neural network model for model training, and performing model evaluation through a loss function to obtain the optimal medium- and short-term electricity sales data prediction model that minimizes the overall error on the training samples.
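How the samples and the three mutually exclusive sets might be built is sketched below. Following step five of claim 1, each sample pairs the sales of the preceding period and the climate of the target period with the sales of the target period. The chronological split and the default fractions are assumptions, since the claims only require the sets to be mutually exclusive, and both function names are hypothetical.

```python
def make_samples(sales, climate):
    """Build ((previous-period sales, target-period climate), target sales) pairs."""
    return [((sales[t - 1], climate[t]), sales[t]) for t in range(1, len(sales))]


def chronological_split(samples, train_frac=0.7, val_frac=0.15):
    """Split time-ordered samples into disjoint training, validation and test sets."""
    n_train = int(len(samples) * train_frac)
    n_val = int(len(samples) * val_frac)
    train = samples[:n_train]
    val = samples[n_train : n_train + n_val]
    test = samples[n_train + n_val :]  # the remainder, most recent periods last
    return train, val, test
```

Keeping the split chronological avoids evaluating the model on periods that precede its training data.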

7. The medium- and short-term electricity sales prediction method based on a deep neural network according to claim 6, wherein the model evaluation through the loss function comprises:

adopting the root mean square error as the loss function and the coefficient of determination (R2_score) as the measure of model goodness of fit, and evaluating the loss between the model predicted values and the true values;

and determining the trained medium- and short-term electricity sales data prediction model according to the loss evaluation result.
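The two evaluation measures named in this claim have standard definitions, shown here in plain Python as a minimal sketch; the function names are illustrative and no particular library is implied by the claim.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # must be non-zero
    return 1.0 - ss_res / ss_tot
```

A lower RMSE and an R2 closer to 1 both indicate a better fit, so a candidate model can be kept when it improves on both over the validation data.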