CN113485986B - Electric power data restoration method - Google Patents

Electric power data restoration method

Info

Publication number
CN113485986B
CN113485986B (application CN202110717117.1A)
Authority
CN
China
Prior art keywords
power data
data
gate
output
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110717117.1A
Other languages
Chinese (zh)
Other versions
CN113485986A (en)
Inventor
夏飞
汤铭
王鹏飞
邹昊东
宋浒
胡游君
刘军
邱玉祥
张磊
刘赛
高雪
晁凯
杨勰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Information and Communication Technology Co
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Nari Information and Communication Technology Co
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nari Information and Communication Technology Co, Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd filed Critical Nari Information and Communication Technology Co
Priority to CN202110717117.1A priority Critical patent/CN113485986B/en
Publication of CN113485986A publication Critical patent/CN113485986A/en
Application granted granted Critical
Publication of CN113485986B publication Critical patent/CN113485986B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/21: Design, administration or maintenance of databases
    • G06F 16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a power data repair method. An SOM neural network classifies the power data in a historical power data set; Pearson correlation analysis identifies, for each power data type, the influencing factors that meet a correlation threshold; the influencing factors of missing data are fed into a trained LSTM neural network to obtain the power data type of the missing data; and the data are then repaired with a method chosen according to that type. The method accounts for the complex nonlinearity of power data, exploits the strong learning ability of neural networks to handle nonlinear problems, realizes the repair of power data, and effectively improves classification efficiency and accuracy.

Description

A Method for Repairing Power Data

Technical Field

The invention relates to a power data repair method and belongs to the technical field of power data detection and repair.

Background

The continuous development of digital technology has produced large volumes of power data in the power system. However, power data often go missing because of external interference, transmission errors, equipment faults, network delays, and similar causes, which compromises the correctness and timeliness of data processing in the power system.

Existing repair methods mostly rely on traditional machine learning. However, the grid structure is becoming increasingly complex, particularly with the integration of renewable generation systems and electric vehicles and the application of demand-response mechanisms, so traditional machine learning methods cannot cope with repairing such highly stochastic power data.

To guarantee the optimized and stable operation of an intelligent power system, complete and correct power data must be available, so that the safe and stable operation of the system is not compromised. Practitioners in this field therefore urgently need higher-accuracy repair of power system data.

Summary of the Invention

Objective: to overcome the deficiencies of the prior art, the present invention provides a power data repair method.

Technical solution: to solve the above technical problem, the present invention adopts the following technical solution:

A power data repair method, comprising the following steps:

Step S1: obtain a historical power data set and classify the power data in it with a SOM neural network, obtaining the power data types.

Step S2: using the Pearson correlation coefficient, analyze the correlation between each power data type and candidate influencing factors, retain the factors whose correlation with the type meets a threshold, and use those factors as the feature values of the type.

Step S3: train an LSTM neural network with the power data types and their corresponding feature values as training samples, obtaining a trained LSTM neural network.

Step S4: feed the influencing factors of the missing data into the trained LSTM neural network to obtain the power data type of the missing data.

Step S5: repair the data with a method chosen according to the power data type of the missing data.

As a preferred solution, the SOM neural network consists of an input layer and an output layer that are fully connected. When input power data are classified, the output-layer neurons compete for the chance to respond to the input, and the winning neuron becomes the output. Through weight updates, the weights of the winner and of the neurons around it are adjusted accordingly, so that each output neuron gradually specializes in a particular category and the output layer learns to classify the input power data. The weights usually take values in the range [0, 1].
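The competitive learning just described can be sketched in a few lines of NumPy. This is not the patent's implementation: the grid size, learning-rate decay, and Gaussian neighborhood schedule below are illustrative assumptions.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal SOM: output nodes on a grid compete for each input;
    the best-matching unit (BMU) and its neighbors move toward it."""
    rng = np.random.default_rng(seed)
    n_nodes = grid[0] * grid[1]
    w = rng.random((n_nodes, data.shape[1]))          # weights start in [0, 1)
    coords = np.array([(i, j) for i in range(grid[0])
                       for j in range(grid[1])], dtype=float)
    for t in range(epochs):
        lr = lr0 * (1.0 - t / epochs)                 # decaying learning rate
        sigma = sigma0 * (1.0 - t / epochs) + 1e-3    # shrinking neighborhood
        for x in data:
            bmu = int(np.argmin(np.linalg.norm(w - x, axis=1)))  # competition
            d = np.linalg.norm(coords - coords[bmu], axis=1)
            h = np.exp(-d ** 2 / (2.0 * sigma ** 2))             # cooperation
            w += lr * h[:, None] * (x - w)                       # adaptation
    return w, coords

def classify(data, w):
    """Label each record by the index of its best-matching unit."""
    return np.array([int(np.argmin(np.linalg.norm(w - x, axis=1)))
                     for x in data])
```

For power data, each row of `data` would be a normalized feature vector of one record, and the BMU index serves as its type label.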

As a preferred solution, the Pearson correlation coefficient is computed as:

ρ(X,Y) = Σ (Xi − X̄)(Yi − Ȳ) / ( √Σ (Xi − X̄)² · √Σ (Yi − Ȳ)² ), with the sums taken over i = 1, …, n

where ρ(X,Y) is the correlation coefficient, in the range [−1, 1]; X and Y are continuous variables; n is the number of samples; and X̄ and Ȳ are the means of the two variables. ρ(X,Y) = 0 indicates that the variables are uncorrelated, ρ(X,Y) > 0 a positive correlation, and ρ(X,Y) < 0 a negative correlation; the larger |ρ(X,Y)| is, the stronger the association between the variables.
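The coefficient can be computed directly from its definition; this short NumPy sketch (function and variable names are illustrative) matches the formula term by term.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient in [-1, 1]: centered cross-product
    of the two samples over the product of their centered norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))
```

In step S2, a factor would be kept as a feature value when its coefficient meets the chosen threshold, e.g. `abs(pearson(temperature, load)) >= threshold`.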

As a preferred solution, the LSTM neural network is constructed as follows.

The LSTM neural network uses three gate structures to predict the power data type: a forget gate, an input gate, and an output gate.

(1) Forget gate. It filters the input and computes how much information to retain; a sigmoid layer outputs a value in the range [0, 1], where a larger value means more is retained. The forget gate is computed as:

f_t = σ(W_xf · x_t + W_hf · h_(t−1) + B_f)

where f_t is the output of the forget gate; W_xf and W_hf are network parameters to be learned by the forget gate; B_f is the bias of the forget gate; and σ is the sigmoid function:

σ(x) = 1 / (1 + e^(−x))

(2) Input gate. It updates the cell state and combines two pieces of information: a sigmoid layer selects the data to be kept, and a tanh layer generates candidate information C′_t from the current input x_t. The two are combined into the new memory state:

i_t = σ(W_xi · x_t + W_hi · h_(t−1) + B_i)

C′_t = tanh(W_xC · x_t + W_hC · h_(t−1) + B_C)

C_t = f_t × C_(t−1) + i_t × C′_t

where i_t is the output of the input gate; W_xi and W_hi are network parameters to be learned by the input gate; B_i is the bias of the input gate; C′_t is the candidate state produced by the tanh layer; W_xC and W_hC are the network parameters of the cell state; B_C is the bias of the cell state; and C_t is the cell state.

(3) Output gate. It computes how much of the state is exposed as output for the input x_t: the cell state is passed through tanh and multiplied by the sigmoid gate output to obtain the final hidden unit h_t:

o_t = σ(W_xo · x_t + W_ho · h_(t−1) + B_o)

h_t = o_t × tanh(C_t)

where o_t is the output of the output gate; h_t is the output hidden unit; W_xo and W_ho are network parameters to be learned by the output gate; and B_o is the bias of the output gate.
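The three gate equations can be exercised with a single NumPy forward step. This is a sketch of one LSTM cell update only, with randomly initialized parameters; in the patented method the parameters are learned from the training samples, and the dimensions and initialization below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM step following the gate equations above; p holds the
    weight matrices W_* and biases B_*."""
    f_t = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["Bf"])  # forget gate
    i_t = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["Bi"])  # input gate
    C_tilde = np.tanh(p["WxC"] @ x_t + p["WhC"] @ h_prev + p["BC"])
    C_t = f_t * C_prev + i_t * C_tilde                           # cell state
    o_t = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["Bo"])  # output gate
    h_t = o_t * np.tanh(C_t)                                     # hidden unit
    return h_t, C_t

def init_params(n_in, n_hid, seed=0):
    """Random small weights for demonstration; real values are learned."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fio":
        p[f"Wx{g}"] = rng.normal(0, 0.1, (n_hid, n_in))
        p[f"Wh{g}"] = rng.normal(0, 0.1, (n_hid, n_hid))
        p[f"B{g}"] = np.zeros(n_hid)
    p["WxC"] = rng.normal(0, 0.1, (n_hid, n_in))
    p["WhC"] = rng.normal(0, 0.1, (n_hid, n_hid))
    p["BC"] = np.zeros(n_hid)
    return p
```

Because o_t lies in (0, 1) and tanh is bounded, the hidden unit h_t is always confined to (−1, 1), which keeps the recurrent state numerically stable.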

As a preferred solution, repairing the data with a method chosen according to the power data type of the missing data comprises the following steps:

When the missing data are electric energy data, stratified mean imputation is used: the energy data are stratified by the same calendar month across years, and the mean of each stratum replaces the missing value.

When the missing data are load data, the expectation-maximization (EM) algorithm is used: a maximum-likelihood estimate of the missing data is refined iteratively, and the final output replaces the missing value.

When the missing data are current, voltage, or frequency data, mean imputation is used: the average of the preceding and following observations, or the mean of the non-missing data, replaces the missing value.
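Two of the three repair rules can be sketched in a few lines of pandas (the EM-based repair for load data is omitted here); the series layout and function names are illustrative assumptions, not the patent's implementation.

```python
import numpy as np
import pandas as pd

def fill_energy_by_month(s: pd.Series) -> pd.Series:
    """Stratified mean imputation for energy data: stratify a
    datetime-indexed series by calendar month (same month across years)
    and fill each gap with its stratum's mean."""
    return s.fillna(s.groupby(s.index.month).transform("mean"))

def fill_by_neighbor_mean(s: pd.Series) -> pd.Series:
    """Mean imputation for current/voltage/frequency data: the average of
    the previous and next observations, falling back to the overall mean
    of the non-missing data at the series edges."""
    filled = (s.ffill() + s.bfill()) / 2
    return filled.fillna(s.mean())
```

A record missing in April, for example, would be filled with the mean of the April values observed in other years.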

Beneficial effects: the power data repair method provided by the present invention accounts for the complex nonlinearity of power data and exploits the strong learning ability of neural networks and their capacity for nonlinear problems to repair power data, effectively improving classification efficiency and accuracy.

Brief Description of the Drawings

FIG. 1 is a schematic flow chart of the method of the present invention.

FIG. 2 is an accuracy comparison chart of the SOM-LSTM classification model provided by one embodiment of the present invention.

FIG. 3 is an accuracy comparison chart of the SOM-LSTM classification model provided by another embodiment of the present invention.

Detailed Description

The present invention is further described below in conjunction with specific embodiments.

The power data repair method proposed by the present invention is described in further detail below with reference to the drawings and specific embodiments; its advantages and features will become clearer from the following description. Note that the drawings are in a greatly simplified form and use imprecise proportions, serving only to assist in explaining the embodiments conveniently and clearly. The structures, proportions, and sizes illustrated in the drawings merely accompany the content disclosed in the specification for the understanding of those skilled in the art; they do not limit the conditions under which the invention may be practiced and thus carry no substantive technical meaning. Any modification of structure, change of proportion, or adjustment of size that does not affect the effects and purposes achievable by the invention still falls within the scope covered by the disclosed technical content.

It should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between them. Moreover, the terms "include" and "comprise" and their variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to it. Absent further restriction, an element introduced by "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.

As shown in FIG. 1, the power data repair method provided by this embodiment comprises the following steps:

Step S1: obtain a historical power data set and classify the power data in it with a SOM neural network, obtaining the power data types.

Step S2: using the Pearson correlation coefficient, analyze the correlation between each power data type and candidate influencing factors, retain the factors whose correlation with the type meets a threshold, and use those factors as the feature values of the type.

Step S3: train an LSTM neural network with the power data types and their corresponding feature values as training samples, obtaining a trained LSTM neural network.

Step S4: feed the influencing factors of the missing data into the trained LSTM neural network to obtain the power data type of the missing data.

Step S5: repair the data with a method chosen according to the power data type of the missing data.


As shown in FIGS. 2 and 3, to verify the effectiveness of the proposed SOM-LSTM power data repair method, the experimental simulations were run in MATLAB R2019a on a computer with an Intel(R) Core(TM) i5-8300H CPU @ 2.30 GHz and an NVIDIA GTX 1050 Ti GPU with 4 GB of video memory.

Household electricity data from a region of China in August 2018 and the temperature data collected by the corresponding meteorological station were selected, with a data collection interval of 15 minutes. Mean absolute error (MAE) and root mean square error (RMSE) are used as evaluation metrics to analyze the power data repair results.
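Both evaluation metrics are standard; for reference, a minimal NumPy version (the data used in the experiment is not reproduced here):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the repair error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large repair errors more heavily."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

By the power-mean inequality, RMSE is always at least as large as MAE on the same errors, so a gap between the two indicates a few large repair errors rather than many small ones.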

To analyze the accuracy of the proposed SOM-LSTM model, curves of the repair results for continuously missing power data on typical working days and non-working days were plotted for the SOM-LSTM model, an Extreme Learning Machine (ELM) model, and a plain LSTM model; the results are shown in FIG. 2 and FIG. 3.

As FIG. 2 and FIG. 3 show, when continuously missing user voltage data on typical working days and non-working days are repaired, the plain LSTM network has the largest error, and its accuracy drops markedly as more data are missing, indicating that its fitting and repair capability is insufficient. Compared with the LSTM and ELM models, the SOM-LSTM network, which predicts and repairs data after SOM classification, effectively mitigates the low repair accuracy on continuously missing data and maintains a high repair rate even when a large share of the data is missing.

Although the content of the present invention has been described in detail through the above preferred embodiments, the above description should not be taken as limiting the invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the above. The scope of protection of the invention is therefore defined by the appended claims.

Claims (7)

1.一种电力数据修复方法,其特征在于:包括如下步骤:1. A method for repairing power data, characterized in that it comprises the following steps: 步骤S1、获取历史电力数据集,利用SOM神经网络对历史电力数据集中电力数据进行分类处理,得到电力数据类型;Step S1, obtaining a historical power data set, and using a SOM neural network to classify the power data in the historical power data set to obtain a power data type; 步骤S2、利用Pearson相关系数理论,对电力数据类型与影响因素进行相关性分析,获得电力数据类型满足关联性阈值的影响因素,将电力数据类型对应的影响因素作为特征值;Step S2: using the Pearson correlation coefficient theory, perform a correlation analysis on the power data type and the influencing factors, obtain the influencing factors of the power data type that meet the correlation threshold, and use the influencing factors corresponding to the power data type as feature values; 步骤S3、将电力数据类型与对应的特征值作为训练样本,对LSTM神经网络进行训练,获得训练好的LSTM神经网络;Step S3: Use the power data type and the corresponding characteristic value as training samples to train the LSTM neural network to obtain a trained LSTM neural network; 步骤S4、将缺失数据的影响因素输入训练好的LSTM神经网络,获得缺失数据的电力数据类型;Step S4, input the influencing factors of the missing data into the trained LSTM neural network to obtain the power data type of the missing data; 步骤S5、根据缺失数据的电力数据类型采用不同的方法对数据进行修复。Step S5: Use different methods to repair the data according to the type of power data of the missing data. 2.根据权利要求1所述的一种电力数据修复方法,其特征在于:所述SOM神经网络由输入层和输出层两层组成,输入层与输出层之间采用全连接;在对输入电力数据进行分类时,输入层的各神经元协同工作,分别竞争输入电力数据的响应机会,从而得到输出的神经元;通过对权值的更新,输出层的神经元四周的权值都会进行相应的调整,电力数据经过调整后输入层的每一个神经元对特定类别的学习,实现输出层对输入电力数据的学习分类。2. 
A power data repair method according to claim 1, characterized in that: the SOM neural network consists of two layers, an input layer and an output layer, which are fully connected; when the input power data are classified, the neurons in the output layer compete for the opportunity to respond to the input power data, yielding a winning output neuron; through the weight update, the weights of the winning neuron and of its neighbourhood in the output layer are adjusted accordingly, so that after adjustment each neuron in the output layer comes to represent a specific category, whereby the output layer learns and classifies the input power data.

3. A power data repair method according to claim 2, characterized in that: the weights take values in the range [0, 1].

4. A power data repair method according to claim 1, characterized in that: under the Pearson correlation coefficient theory, the coefficient is calculated as follows:

ρ_(X,Y) = Σ_(i=1..n) (X_i − X̄)(Y_i − Ȳ) / √[ Σ_(i=1..n) (X_i − X̄)² · Σ_(i=1..n) (Y_i − Ȳ)² ]

where ρ_(X,Y) is the correlation coefficient, with range [−1, 1]; X and Y are continuous variables; n is the number of samples of the continuous variables; X̄ and Ȳ are the respective means of the continuous variables; ρ_(X,Y) = 0 indicates that the variables are uncorrelated; ρ_(X,Y) > 0 indicates a positive correlation; ρ_(X,Y) < 0 indicates a negative correlation; the larger |ρ_(X,Y)| is, the stronger the correlation between the variables, and vice versa.

5. A power data repair method according to claim 1, characterized in that: the LSTM neural network comprises three gate structures for predicting the power data type, namely a forget gate, an input gate and an output gate.

6. A power data repair method according to claim 5, characterized in that: the forget gate is calculated as follows:

f_t = σ(W_xf · x_t + W_hf · h_(t−1) + B_f)

where f_t is the output of the forget gate; W_xf and W_hf are the network parameters to be learned by the forget gate; B_f is the bias of the forget gate; σ is the sigmoid function.

The input gate is calculated as follows:

i_t = σ(W_xi · x_t + W_hi · h_(t−1) + B_i)
C′_t = tanh(W_xC · x_t + W_hC · h_(t−1) + B_C)
C_t = f_t × C_(t−1) + i_t × C′_t

where i_t is the output of the input gate; W_xi and W_hi are the network parameters to be learned by the input gate; B_i is the bias of the input gate; C′_t is the candidate input updated by the tanh function; W_xC and W_hC are the network parameters to be learned for the cell state; B_C is the bias of the cell state; C_t is the cell state.

The output gate is calculated as follows:

o_t = σ(W_xo · x_t + W_ho · h_(t−1) + B_o)
h_t = o_t × tanh(C_t)

where o_t is the output of the output gate; h_t is the output hidden unit; W_xo and W_ho are the network parameters to be learned by the output gate; B_o is the bias of the output gate.

7. A power data repair method according to claim 1, characterized in that: repairing the data with different methods according to the power data type of the missing data comprises the following steps:

when the power data type of the missing data is electric energy data, a stratified mean filling method is used: the electric energy data are stratified by the same calendar month of each year, and the mean of each stratum replaces the missing value;

when the power data type of the missing data is load data, the expectation-maximization algorithm is used to perform maximum likelihood estimation of the missing data, iterating continuously, and the final output value replaces the missing value;

when the power data type of the missing data is current data, voltage data or frequency data, a mean filling method is used: the mean of the preceding and following observations, or the mean of the non-missing data, replaces the missing value.
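The competitive learning and neighbourhood weight update described in claim 2 can be sketched as follows. This is a minimal one-step illustration, not the patented implementation: the 1-D output topology, the Gaussian neighbourhood function, and all parameter names (`lr`, `sigma`, `som_step`) are assumptions; only the output-layer competition and the [0, 1] weight range of claim 3 come from the claims.

```python
import numpy as np

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One competitive-learning step of a 1-D SOM, in the spirit of claim 2.
    weights: (n_output_neurons, n_features) array, values kept in [0, 1]
    per claim 3. The 1-D neighbourhood is an illustrative choice."""
    # Output-layer neurons compete: the winner (best matching unit) is the
    # neuron whose weight vector is closest to the input sample.
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Gaussian neighbourhood: neurons near the winner are updated more strongly.
    dist = np.abs(np.arange(len(weights)) - bmu)
    h = np.exp(-(dist ** 2) / (2 * sigma ** 2))
    # Move weights toward the input; clip to [0, 1] to respect claim 3.
    new_weights = weights + lr * h[:, None] * (x - weights)
    return bmu, np.clip(new_weights, 0.0, 1.0)
```

After repeated steps over a data set, each output neuron settles on a region of the input space, which is the sense in which the output layer "learns a specific category".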
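The Pearson coefficient of claim 4 is the standard sample correlation; a direct NumPy transcription of the formula (function name `pearson` is illustrative) is:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient rho_(X,Y) as defined in claim 4."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # X_i - X-bar
    dy = y - y.mean()  # Y_i - Y-bar
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))
```

A perfectly linear increasing pair gives +1 and a decreasing pair gives −1, matching the [−1, 1] range and the sign interpretation stated in the claim.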
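The gate equations of claims 5 and 6 can be checked numerically with a single LSTM time step. The sketch below follows the claimed formulas term by term; the dictionary-of-parameters representation and the function name `lstm_step` are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM time step following the gate equations of claim 6.
    p maps parameter names (mirroring the claim: W_xf, B_f, ...) to arrays."""
    # Forget gate: f_t = sigma(W_xf x_t + W_hf h_(t-1) + B_f)
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["B_f"])
    # Input gate and candidate cell input C'_t
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["B_i"])
    C_cand = np.tanh(p["W_xC"] @ x_t + p["W_hC"] @ h_prev + p["B_C"])
    # Cell state update: C_t = f_t * C_(t-1) + i_t * C'_t
    C_t = f_t * C_prev + i_t * C_cand
    # Output gate and hidden state: h_t = o_t * tanh(C_t)
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["B_o"])
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t
```

With all parameters zero, every gate outputs sigmoid(0) = 0.5 and the candidate input is tanh(0) = 0, so the cell state is simply halved each step; this is a quick sanity check that the equations are wired as claimed.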
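Two of the three filling rules of claim 7 can be sketched directly (the EM step for load data is omitted, as the claim does not specify its model). Function names and the array-based data layout are illustrative assumptions, not from the patent.

```python
import numpy as np

def stratified_mean_fill(values, months):
    """Claim 7, electric energy data: stratify by calendar month across
    years and replace each missing value with the mean of its stratum."""
    vals = np.asarray(values, dtype=float)
    months = np.asarray(months)
    out = vals.copy()
    for m in np.unique(months):
        mask = months == m
        layer = vals[mask]
        out[mask] = np.where(np.isnan(layer), np.nanmean(layer), layer)
    return out

def neighbor_mean_fill(values):
    """Claim 7, current/voltage/frequency data: replace a missing value
    with the mean of the preceding and following observations, falling
    back to the mean of all non-missing data."""
    vals = np.asarray(values, dtype=float)
    out = vals.copy()
    for i in np.flatnonzero(np.isnan(vals)):
        if 0 < i < len(vals) - 1 and np.isfinite(vals[i - 1]) and np.isfinite(vals[i + 1]):
            out[i] = 0.5 * (vals[i - 1] + vals[i + 1])
        else:
            out[i] = np.nanmean(vals)
    return out
```

A dispatcher on the predicted power data type (the output of the LSTM classifier in the method) would then choose between these fillers and the EM-based estimator for load data.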
CN202110717117.1A 2021-06-25 2021-06-25 Electric power data restoration method Active CN113485986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110717117.1A CN113485986B (en) 2021-06-25 2021-06-25 Electric power data restoration method


Publications (2)

Publication Number Publication Date
CN113485986A CN113485986A (en) 2021-10-08
CN113485986B true CN113485986B (en) 2024-08-02

Family

ID=77936302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110717117.1A Active CN113485986B (en) 2021-06-25 2021-06-25 Electric power data restoration method

Country Status (1)

Country Link
CN (1) CN113485986B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114298374A (en) * 2021-11-30 2022-04-08 国家电网有限公司 Demand response capability determination method and system considering missing user power curve

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015158198A1 (en) * 2014-04-17 2015-10-22 北京泰乐德信息技术有限公司 Fault recognition method and system based on neural network self-learning
CN110097920A (en) * 2019-04-10 2019-08-06 大连理工大学 A kind of metabolism group shortage of data value fill method based on neighbour's stability

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3514908B1 (en) * 2018-01-22 2022-02-09 Hitachi Energy Switzerland AG Methods and devices for condition classification of power network assets
CN109091867B (en) * 2018-07-26 2023-04-07 深圳市腾讯网络信息技术有限公司 Operation control method, device, equipment and storage medium
CN109820525A (en) * 2019-01-23 2019-05-31 五邑大学 A driving fatigue recognition method based on CNN-LSTM deep learning model
CN110083699B (en) * 2019-03-18 2021-01-12 中国科学院自动化研究所 News popularity prediction model training method based on deep neural network
CN110334726A (en) * 2019-04-24 2019-10-15 华北电力大学 A method for identifying and repairing abnormal data of electric load based on density clustering and LSTM
CN111597080A (en) * 2020-05-22 2020-08-28 广东省生态环境技术研究所 Method for repairing underground water level missing data based on ground statistics and neural network


Also Published As

Publication number Publication date
CN113485986A (en) 2021-10-08

Similar Documents

Publication Publication Date Title
CN108959732B (en) A Convolutional Neural Network Based Fault Type Identification Method for Transmission Lines
CN109086913B (en) A method and system for transient stability assessment of power system based on deep learning
CN109063939B (en) A Wind Speed Prediction Method and System Based on Neighborhood Gate Long Short-Term Memory Network
CN115688982B (en) Building photovoltaic data complement method based on WGAN and whale optimization algorithm
CN106126906A (en) Short-term wind speed forecasting method based on C C Yu ELM
Wang et al. Wind power curve modeling with asymmetric error distribution
CN113361803A (en) Ultra-short-term photovoltaic power prediction method based on generation countermeasure network
CN115935810A (en) Power medium-term load forecasting method and system based on fusion features of attention mechanism
CN111585277B (en) A dynamic safety assessment method of power system based on hybrid integrated model
CN114792156A (en) Photovoltaic output power prediction method and system based on clustering of curve characteristic indexes
CN105354643A (en) Risk prediction evaluation method for wind power grid integration
CN116559975A (en) Multi-step long weather prediction method based on multi-element time sequence diagram neural network
CN115409317A (en) Transformer area line loss detection method and device based on feature selection and machine learning
CN113485986B (en) Electric power data restoration method
CN111091141B (en) Photovoltaic backboard fault diagnosis method based on layered Softmax
CN106503793B (en) A kind of neural network short-term wind speed forecasting method based on improvement difference algorithm
CN115860212A (en) Risk prediction method and terminal for power distribution network
CN115392697A (en) A data-model hybrid driven power system security assessment method and system
CN113762591B (en) Short-term electric quantity prediction method and system based on GRU and multi-core SVM countermeasure learning
Zhang Short‐Term Power Load Forecasting Based on SAPSO‐CNN‐LSTM Model considering Autocorrelated Errors
CN110543615A (en) Interaction Analysis Method of Risk Factors Based on SPSS Interpretation Structure Model
CN106372440A (en) Method and device for estimating self-adaptive robust state of distribution network through parallel computation
CN111628531B (en) Data driving method for static voltage stability evaluation of power system
CN118365016A (en) Method and system for predicting load of power exchange station by fusing Attention and CNN-GRU
CN118523425A (en) Multi-source data-fused wind turbine generator prediction optimization control method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant