WO2016165378A1 - Method and system for cleaning massive data of an energy storage power station - Google Patents

Method and system for cleaning massive data of an energy storage power station

Info

Publication number
WO2016165378A1
WO2016165378A1 PCT/CN2015/097998 CN2015097998W WO2016165378A1 WO 2016165378 A1 WO2016165378 A1 WO 2016165378A1 CN 2015097998 W CN2015097998 W CN 2015097998W WO 2016165378 A1 WO2016165378 A1 WO 2016165378A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
energy storage
value
cleaning
power station
Prior art date
Application number
PCT/CN2015/097998
Other languages
English (en)
French (fr)
Inventor
李相俊
郑昊
姚继锋
惠东
王向前
徐琛
王立业
董文琦
岳巍澎
郭光朝
贾学翠
张亮
汪奂伶
郑高
Original Assignee
国网新源张家口风光储示范电站有限公司
中国电力科学研究院
国家电网公司
国网福建省电力有限公司
国网福建省电力有限公司电力科学研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 国网新源张家口风光储示范电站有限公司, 中国电力科学研究院, 国家电网公司, 国网福建省电力有限公司, 国网福建省电力有限公司电力科学研究院
Publication of WO2016165378A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The invention relates to a method and a system in the technical field of energy storage, and in particular to a method and a system for cleaning massive data of an energy storage power station.
  • The massive data of energy storage power stations have two main characteristics: (1) Large data volume: an energy storage power station contains a large number of batteries, each battery has many monitoring devices, and the amount of data collected per second is huge, so the data must be cleaned correctly and quickly. (2) Complex causes of abnormal data: because there are many monitoring devices, and because of various objective and unpredictable factors such as accuracy and network signals, abnormal data exist in the data.
  • The arrival of the big data era provides an opportunity for the development of energy storage technology.
  • Energy storage battery data are of great practical value.
  • Accurate and efficient processing of the massive data of an energy storage power station is an important foundation for evaluating station operation and equipment characteristics and for refined control and management.
  • However, owing to objective causes such as monitoring equipment defects and unstable network transmission signals, energy storage power station data often contain many outliers and missing (default) values, which greatly interferes with the analysis and calculation of the massive data.
  • The accuracy of analysis and calculation on the massive battery data therefore depends to a large extent on how effectively the original massive battery data are cleaned.
  • The present invention provides a massive data cleaning method and system for an energy storage power station.
  • A method for cleaning massive data of an energy storage power station comprises the following steps: I. locating and replacing missing (default) values in the energy storage power station data set; II. locating and replacing outliers in the data set;
  • III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them.
  • In step I, a statistical processing method is used to locate the missing value; the K-nearest neighbor algorithm is used to determine a normal value near the missing value, and the missing value is replaced by that normal value.
  • In step II, the outlier is located by the Pauta criterion (3σ rule); a normal value near the outlier is determined by the K-nearest neighbor algorithm, and the outlier is replaced by that normal value.
  • In step III, the unreasonable data are determined according to the different characteristics of the data in the data set and are replaced by the normal value before or after the unreasonable data.
  • The types of energy storage battery data include current, voltage, temperature, SOC, and power.
  • The different category characteristics include abrupt-change thresholds determined for each category of data from prior knowledge.
  • Step III includes traversing the data of each category, determining the unreasonable data according to the abrupt-change threshold, and replacing the unreasonable data with the data of the previous moment.
  • A massive data cleaning system for an energy storage power station comprises a data storage module, a data cleaning module and a display module.
  • The data storage module constructs a battery data table based on HBase; the battery data table stores all the energy storage power station data involved.
  • The data cleaning module cleans the energy storage power station data based on Hadoop.
  • The display module displays the energy storage power station data before and after cleaning.
  • The data cleaning module includes sub-modules that implement the following steps: I. locating and replacing missing (default) values in the energy storage power station data set; II. locating and replacing outliers in the data set;
  • III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them.
  • In step I, a statistical processing method is used to locate the missing value; the K-nearest neighbor algorithm is used to determine a normal value near the missing value, and the missing value is replaced by that normal value.
  • In step II, the outlier is located by the Pauta criterion (3σ rule); a normal value near the outlier is determined by the K-nearest neighbor algorithm, and the outlier is replaced by that normal value.
  • The types of energy storage battery data include current, voltage, temperature, SOC, and power.
  • The different category characteristics include abrupt-change thresholds determined for each category of data from prior knowledge.
  • Step III includes traversing the data of each category, determining the unreasonable data according to the abrupt-change threshold, and replacing the unreasonable data with the data of the previous moment.
  • Compared with the prior art, the present invention has the following beneficial effects:
  • The method and system of the invention clean massive battery data while meeting the requirements of distributed processing of massive data. They combine the K-nearest neighbor algorithm, the Pauta criterion (3σ rule) and distributed processing to optimize the cleaning and preprocessing of the massive battery data of an energy storage power station, and improve the preprocessing and utilization of the massive data of large-capacity battery energy storage power stations.
  • The cleaning method proposed by the invention combines a statistical method with a supplementary processing method, which improves the cleaning effect;
  • Multiple nodes clean the massive battery data in parallel, which enlarges the cleaning range and improves the cleaning accuracy.
  • Parallel processing also improves efficiency.
  • The Hadoop distributed computing framework ensures efficient parallel data processing and scalability; adding processing nodes further improves the cleaning efficiency and range.
  • The NoSQL database HBase is used to ensure the storage of massive battery data.
  • The method and its distributed system use the Map/Reduce computing framework to process the massive battery data by category, which reduces the computational complexity.
  • FIG. 1 is a flow chart of a method for cleaning a large amount of battery data of an energy storage power station according to the present invention
  • FIG. 2 is a structural diagram of a mass battery data cleaning system for an energy storage power station according to the present invention
  • FIG. 3 is a structural diagram of a massive battery data table of an HBase energy storage power station according to the present invention.
  • FIG. 4 is a flow chart of distributed cleaning based on Hadoop in the present invention.
  • FIG. 1 is a flowchart of a method for cleaning massive battery data of an energy storage power station according to the present invention; the method includes the following steps: I. locating and replacing missing (default) values in the energy storage power station data set; II. locating and replacing outliers in the data set;
  • III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them.
  • Step I: a statistical processing method is used to locate the missing value; the K-nearest neighbor algorithm is used to determine a normal value near the missing value, and the missing value is replaced by that normal value, which achieves data cleaning.
  • S101: The original data of each battery monitoring point over a period of time are loaded into memory. The original data consist of data numbers and the corresponding data values; each point whose value is null is located as a missing value.
  • S102: The K-nearest neighbor algorithm is applied near each missing battery data value: the number of occurrences of each of the K nearby samples in a data set of range N is counted, and the battery data value with the highest frequency is used as the normal value to replace the missing value.
  • Step II: the Pauta criterion (3σ rule) is used to locate the outlier; the K-nearest neighbor algorithm is used to determine a normal value near the outlier, and the outlier is replaced by that normal value, which achieves data cleaning.
  • S201: The battery monitoring data are assumed to follow a normal distribution.
  • According to the Pauta criterion, the mathematical expectation and the standard deviation of the data set containing the original data are determined; any data value whose deviation exceeds a multiple of the standard deviation (generally 3 times) is considered an outlier.
  • An application example is provided in which a certain temperature T is measured 11 times.
  • S202: The K-nearest neighbor algorithm is applied near each missing battery data value: the number of occurrences of each of the K nearest-neighbor samples in a data set of range N is counted, and the battery data value with the highest frequency is used as the normal value to replace the missing value.
  • In one scheme, the K-nearest neighbor algorithm is used to determine the value used for replacement, that is, the K nearest neighbors of x are found among N samples.
  • In another scheme, the K-nearest neighbor algorithm is used to determine the category of the value used for replacement.
  • Step III: According to the different category characteristics of the energy storage battery data, unreasonable data are determined in the data set obtained after the replacement and are replaced, which completes further cleaning.
  • Step 301: The data in the data set are classified by identifier into five categories: temperature, voltage, current, SOC, and active power.
  • After classification, 5 sets are obtained, each representing the data set of one category.
  • The threshold of each category is set with reference to prior knowledge; the data are traversed in order to check whether they exceed the threshold, and if the i-th value exceeds it, that value is replaced by the (i-1)-th value.
  • As shown in FIG. 2, an embodiment of the present invention further provides a massive battery data cleaning system for an energy storage power station, including a battery data storage module, a battery data cleaning module, and a battery display module.
  • The data storage module builds a battery data table based on HBase for storing all the energy storage power station data involved; the data cleaning module cleans the energy storage power station data based on Hadoop; the display module displays the energy storage power station data before and after cleaning.
  • The data cleaning module includes sub-modules that implement the following steps: I. locating and replacing missing (default) values in the energy storage power station data set; II. locating and replacing outliers in the data set; III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them.
  • A system embodiment is provided, including a battery data storage module, a battery data cleaning module, and a battery data display module.
  • A data table table1 is created in HBase to store the massive battery data of the energy storage power station.
  • The table structure is shown in FIG. 3.
  • The row key is composed of the data identifier, the number of days since January 1, 1970, and the number of seconds since the start of the day, separated by "|".
  • A battery data cleaning module is built on the Hadoop distributed framework.
  • The cleaning program built according to the cleaning method is verified.
  • The cleaning program is ported to the Hadoop distributed framework to build the mapreduce program.
  • Hadoop reads the massive battery data from HBase, splits it, and distributes the splits to the nodes of the Hadoop cluster for map processing.
  • Through the map program and the shuffle phase, the data of each battery monitoring point are gathered into one data slice.
  • The Reduce program on each node cleans the data of the battery monitoring point it receives and stores the result in HBase.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Water Supply & Treatment (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Charge And Discharge Circuits For Batteries Or The Like (AREA)

Abstract

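A method and system for cleaning massive data of an energy storage power station. The method comprises the following steps: I. locating and replacing missing (default) values in the energy storage power station data set; II. locating and replacing outliers in the data set; III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them. The method and system clean massive battery data while meeting the requirements of distributed processing of massive data; they achieve optimized cleaning and preprocessing of the massive battery data of an energy storage power station by jointly considering the K-nearest neighbor algorithm, the Pauta criterion (3σ rule) and distributed processing, and improve the preprocessing and utilization of the massive data of large-capacity battery energy storage power stations.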

Description

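Method and system for cleaning massive data of an energy storage power station
Technical Field
The present invention relates to a method and system in the technical field of energy storage, and in particular to a method and system for cleaning massive data of an energy storage power station.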
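Background
At present, methods for collecting, storing and managing energy storage power station data are not yet standardized, and further in-depth research is needed on techniques for managing and mining the massive data of energy storage power stations. These massive data have two main characteristics: (1) Large data volume: an energy storage power station contains a large number of batteries, each battery has many monitoring devices, and the amount of data collected per second is huge, so the data must be cleaned correctly and quickly. (2) Complex causes of abnormal data: because there are many monitoring devices, and because of various objective and unpredictable factors such as accuracy and network signals, abnormal data exist in the data.
The arrival of the big data era provides an opportunity for the development of energy storage technology. Energy storage battery data are of great practical value, and accurate and efficient processing of the massive data of an energy storage power station is an important foundation for evaluating station operation and equipment characteristics and for refined control and management. However, owing to objective causes such as monitoring equipment defects and unstable network transmission signals, energy storage power station data often contain many outliers and missing (default) values, which greatly interferes with the analysis and calculation of the massive data. The accuracy of analysis and calculation on the massive battery data therefore depends to a large extent on how effectively the original massive battery data are cleaned.
To clean massive raw data, the common existing approach is to divide the data into batches by a fixed period and then clean them batch by batch in a pipeline. This approach has the following drawbacks:
1. The range of a single batch is limited, so each statistical analysis covers only a small amount of data and the cleaning accuracy is low;
2. It cannot handle parallel processing of massive data; single-line cleaning is time-consuming, slow and inefficient;
3. The data are of many kinds, and a single batch must consider every case, so processing is complicated and computation becomes more difficult.
In view of this, there is a need for an energy storage power station data cleaning method and system that overcomes the above drawbacks of the prior art.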
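Summary of the Invention
To overcome the above shortcomings of the prior art, the present invention provides a method and system for cleaning massive data of an energy storage power station.
The solution adopted to achieve the above object is as follows:
A method for cleaning massive data of an energy storage power station, the method comprising the following steps:
I. locating and replacing missing (default) values in the energy storage power station data set;
II. locating and replacing outliers in the data set;
III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them.
Preferably, in step I, a statistical processing method is used to locate the missing value; the K-nearest neighbor algorithm is used to determine a normal value near the missing value, and the missing value is replaced by that normal value.
Preferably, in step II, the Pauta criterion (3σ rule) is used to locate the outlier; the K-nearest neighbor algorithm is used to determine a normal value near the outlier, and the outlier is replaced by that normal value.
Preferably, in step III, the unreasonable data are determined according to the different characteristics of the data in the data set and are replaced by the normal value before or after the unreasonable data.
Preferably, the types of energy storage battery data include current, voltage, temperature, SOC and power;
the different category characteristics include abrupt-change thresholds determined for each category of data from prior knowledge;
step III includes traversing the data of each category, determining the unreasonable data according to the abrupt-change threshold, and replacing the unreasonable data with the data of the previous moment.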
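A system for cleaning massive data of an energy storage power station, the system comprising a data storage module, a data cleaning module and a display module;
the data storage module constructs a battery data table based on HBase, and the battery data table is used to store all the energy storage power station data involved;
the data cleaning module cleans the energy storage power station data based on Hadoop;
the display module is used to display the energy storage power station data before and after cleaning.
Preferably, the data cleaning module is used to clean the energy storage power station data and includes sub-modules that implement the following steps:
I. locating and replacing missing (default) values in the energy storage power station data set;
II. locating and replacing outliers in the data set;
III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them.
Preferably, in step I, a statistical processing method is used to locate the missing value; the K-nearest neighbor algorithm is used to determine a normal value near the missing value, and the missing value is replaced by that normal value.
Preferably, in step II, the Pauta criterion (3σ rule) is used to locate the outlier; the K-nearest neighbor algorithm is used to determine a normal value near the outlier, and the outlier is replaced by that normal value.
Preferably, the types of energy storage battery data include current, voltage, temperature, SOC and power;
the different category characteristics include abrupt-change thresholds determined for each category of data from prior knowledge;
step III includes traversing the data of each category, determining the unreasonable data according to the abrupt-change threshold, and replacing the unreasonable data with the data of the previous moment.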
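Compared with the prior art, the present invention has the following beneficial effects:
1. The method and system of the invention clean massive battery data while meeting the requirements of distributed processing of massive data. They achieve optimized cleaning and preprocessing of the massive battery data of an energy storage power station by jointly considering the K-nearest neighbor algorithm, the Pauta criterion (3σ rule) and distributed processing, and improve the preprocessing and utilization of the massive data of large-capacity battery energy storage power stations.
2. In view of the characteristics of the massive battery data of an energy storage power station, the cleaning method proposed by the invention combines a statistical method with a supplementary processing method, which improves the cleaning effect;
Using the distributed processing capability of Hadoop, multiple nodes clean the massive battery data in parallel, which enlarges the cleaning range and improves the cleaning accuracy; parallel processing also improves efficiency.
The Hadoop distributed computing framework ensures efficient parallel data processing and scalability, and adding processing nodes further improves the cleaning efficiency and range; the NoSQL database HBase ensures the storage of the massive battery data.
3. The method and its distributed system use the Map/Reduce computing framework to process the massive battery data by category, which reduces the computational complexity.
4. Using the multi-version feature of HBase tables, the massive battery data before and after cleaning are both preserved and are displayed with the front-end technology EChart, giving the user an intuitive view of the cleaning effect.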
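Brief Description of the Drawings
FIG. 1 is a flowchart of the method for cleaning massive battery data of an energy storage power station according to the present invention;
FIG. 2 is a structural diagram of the system for cleaning massive battery data of an energy storage power station according to the present invention;
FIG. 3 is a structural diagram of the HBase table of massive battery data of an energy storage power station according to the present invention;
FIG. 4 is a flowchart of the Hadoop-based distributed cleaning according to the present invention.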
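Detailed Description
Specific embodiments of the present invention are described in further detail below with reference to the drawings.
FIG. 1 is a flowchart of a method for cleaning massive battery data of an energy storage power station provided by the present invention. The method includes the following steps:
I. locating and replacing missing (default) values in the energy storage power station data set;
II. locating and replacing outliers in the data set;
III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them.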
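Step I: a statistical processing method is used to locate the missing (default) value; the K-nearest neighbor algorithm is used to determine a normal value near the missing value, and the missing value is replaced by that normal value, which achieves data cleaning.
S101. The original data of each battery monitoring point over a period of time are loaded into memory. The original data consist of data numbers and the corresponding data values; each point whose value is null is located as a missing value.
S102. The K-nearest neighbor algorithm is applied near each missing battery data value: the number of occurrences of each of the K nearby samples in a data set of range N is counted, and the battery data value with the highest frequency is used as the normal value to replace the missing value.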
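Steps S101 and S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes "nearby" means nearest by data number, that the range N counts samples on each side of the gap, and that null values are represented as `None`. The function name `fill_missing_knn` and its parameters are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def fill_missing_knn(values: List[Optional[float]], k: int = 3, n: int = 10) -> List[Optional[float]]:
    """Replace each None with the most frequent value among its K nearest valid samples."""
    cleaned = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue  # S101: only null points are treated as missing
        # Window of range N around the gap (assumption: N samples on each side).
        lo, hi = max(0, i - n), min(len(values), i + n + 1)
        window = [(abs(j - i), values[j]) for j in range(lo, hi) if values[j] is not None]
        if not window:
            continue  # nothing nearby to vote with, so the gap is left as-is
        window.sort(key=lambda pair: pair[0])
        neighbours = [val for _, val in window[:k]]  # K nearest by data number
        counts = Counter(val for _, val in window)  # occurrences within the window
        # S102: among the K neighbours, keep the value occurring most often in the window.
        cleaned[i] = max(neighbours, key=lambda val: counts[val])
    return cleaned
```

For example, `fill_missing_knn([3.7, 3.7, None, 3.8, 3.7], k=3, n=2)` fills the gap with 3.7, the value that occurs most often around it.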
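Step II: the Pauta criterion (3σ rule) is used to locate the outlier; the K-nearest neighbor algorithm is used to determine a normal value near the outlier, and the outlier is replaced by that normal value, which achieves data cleaning.
S201. The battery monitoring data are assumed to follow a normal distribution. According to the Pauta criterion, the mathematical expectation and the standard deviation of the data set containing the original data are determined; any data value whose deviation exceeds a multiple of the standard deviation (generally 3 times) is considered an outlier.
That is, if the battery monitoring data as a whole follow a normal distribution, experimental data greater than μ+3σ or less than μ-3σ are treated as abnormal data and removed, where μ and σ denote the mathematical expectation and the standard deviation of the normal population. After removal, the deviations and the standard deviation of the remaining measurements are recalculated and the review continues until every deviation is smaller than 3σ.
An application example is provided in which a certain temperature T is measured 11 times, with the following data:
Figure PCTCN2015097998-appb-000001
The calculation gives:
Figure PCTCN2015097998-appb-000002
3σ=3.01×3=9.03
Figure PCTCN2015097998-appb-000003
20.33 is determined to be an outlier, and this value is replaced using the K-nearest neighbor algorithm.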
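The repeated 3σ review described in S201 can be sketched as below. The sketch assumes the sample standard deviation (`statistics.stdev`); the source does not say which estimator is used. The function name `pauta_outliers` and the readings in the usage note are hypothetical and are not the measurement data of the example above.

```python
import statistics
from typing import List


def pauta_outliers(values: List[float], factor: float = 3.0) -> List[int]:
    """Return the indices of values rejected by repeated application of the 3-sigma rule."""
    kept = list(range(len(values)))  # indices still regarded as normal
    rejected: List[int] = []
    while len(kept) >= 2:
        sample = [values[i] for i in kept]
        mu = statistics.mean(sample)
        sigma = statistics.stdev(sample)  # assumption: sample standard deviation
        flagged = [i for i in kept if abs(values[i] - mu) > factor * sigma]
        if not flagged:
            break  # every remaining deviation is within factor * sigma
        rejected.extend(flagged)
        kept = [i for i in kept if i not in flagged]
    return sorted(rejected)
```

With twenty hypothetical readings of 10.0 followed by a single 100.0, the function reports index 20 as the only outlier, and the second pass over the remaining readings flags nothing.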
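S202. The K-nearest neighbor algorithm is applied near each missing battery data value: the number of occurrences of each of the K nearest-neighbor samples in a data set of range N is counted, and the battery data value with the highest frequency is used as the normal value to replace the missing value.
The present invention further provides a scheme in which, in steps S102 and S202, the K-nearest neighbor algorithm is used to determine the value used for replacement, that is, the K nearest neighbors of x are found among N samples. Suppose the N samples belong to classes W1, W2, …, Wc, and let K1, K2, …, Kc be the numbers of samples among the K nearest neighbors that belong to classes W1, W2, …, Wc respectively. The discriminant function is defined as Gi(x) = Ki, i = 1, 2, 3, …, c. If Gj(x) = max Ki over i = 1, …, c, the decision is x ∈ Wj, and Wj is used to replace the missing value x.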
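The present invention further provides another scheme in which, in steps S102 and S202, the K-nearest neighbor algorithm is used to determine the category of the value used for replacement, specifically including the following steps:
Let x be the missing value. Take A[1]~A[k] as the initial nearest neighbors of x and compute the Euclidean distance d(x,A[i]), i=1~k, between each of them and the test sample x;
Sort by d(x,A[i]) in ascending order and compute the distance between the farthest sample and x, D = max{d(x,A[j])}, j=1~k;
for (i=k+1; i<=n; i++)
    compute the distance d(x,A[i]) between A[i] and x;
    if d(x,A[i]) < D
    then replace the farthest sample with A[i];
    sort by d(x,A[i]) in ascending order and recompute D = max{d(x,A[j])} over the k samples currently kept;
Compute the probability of each category among the first k samples A[i], i=1~k; the category with the highest probability is the class of sample x.
Finally, replace x with a neighbor value of the category with the highest probability.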
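The neighbor search and vote above can be sketched in Python as follows. This is an illustrative reading rather than the patented code: samples are assumed to be (feature vector, class label) pairs, the class labels are hypothetical, and "probability" is taken as the share of votes among the k kept neighbors.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_classify(x: Sequence[float], samples: List[Sample], k: int) -> str:
    """Return the majority class among the k samples nearest to x (Euclidean distance)."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    # Initial neighbours: the first k samples, sorted by ascending distance to x.
    nearest = sorted(
        ((math.dist(x, features), label) for features, label in samples[:k]),
        key=lambda pair: pair[0],
    )
    for features, label in samples[k:]:
        d = math.dist(x, features)
        if d < nearest[-1][0]:  # closer than the current farthest neighbour
            nearest[-1] = (d, label)  # replace the farthest sample
            nearest.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # class with the largest share of the k votes
```

Re-sorting the k kept neighbors after each replacement mirrors the pseudocode; for large k a heap would avoid the repeated sort.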
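Step III: According to the different category characteristics of the energy storage battery data, unreasonable data are determined in the data set obtained after the replacement and are replaced, which completes further cleaning. Specifically:
Step 301: The data in the data set are classified by identifier into five categories: temperature, voltage, current, SOC and active power. After classification, 5 sets are obtained, each representing the data set of one category. The threshold of each category is set with reference to prior knowledge. The data are traversed in order to check whether they exceed the threshold, and if the i-th value exceeds it, that value is replaced by the (i-1)-th value.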
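Step III can be sketched as below, under one possible reading of "exceeds the threshold": the jump between consecutive samples is compared with a per-category abrupt-change threshold. The threshold numbers and the names `JUMP_THRESHOLDS`, `replace_jumps` and `clean_by_category` are hypothetical; the source only states that the thresholds come from prior knowledge.

```python
from typing import Dict, List

# Hypothetical per-category jump limits (not taken from the source).
JUMP_THRESHOLDS: Dict[str, float] = {
    "temperature": 5.0,
    "voltage": 0.5,
    "current": 50.0,
    "soc": 5.0,
    "active_power": 100.0,
}


def replace_jumps(series: List[float], threshold: float) -> List[float]:
    """Replace value i with value i-1 whenever the jump between them exceeds the threshold."""
    cleaned = list(series)
    for i in range(1, len(cleaned)):
        if abs(cleaned[i] - cleaned[i - 1]) > threshold:
            cleaned[i] = cleaned[i - 1]  # carry the previous moment's value forward
    return cleaned


def clean_by_category(data: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply the category-specific threshold to each of the classified data sets."""
    return {category: replace_jumps(values, JUMP_THRESHOLDS[category])
            for category, values in data.items()}
```

Because each value is compared with the already cleaned previous value, a genuine step change larger than the threshold would be held at the old level; whether that is acceptable depends on how the thresholds are chosen.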
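As shown in FIG. 2, an embodiment of the present invention further provides a system for cleaning massive battery data of an energy storage power station, including a battery data storage module, a battery data cleaning module and a battery display module.
The data storage module constructs a battery data table based on HBase, and the battery data table is used to store all the energy storage power station data involved; the data cleaning module cleans the energy storage power station data based on Hadoop; the display module is used to display the energy storage power station data before and after cleaning.
The data cleaning module is used to clean the energy storage power station data and includes sub-modules that implement the following steps: I. locating and replacing missing (default) values in the energy storage power station data set; II. locating and replacing outliers in the data set; III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them.
A system embodiment is provided, including a battery data storage module, a battery data cleaning module and a battery data display module.
Building the battery data storage module.
A data table table1 is created in HBase to store the massive battery data of the energy storage power station; the table structure is shown in FIG. 3.
The row key is composed of the data identifier, the number of days since January 1, 1970, and the number of seconds since the start of the day, separated by "|". The table holds 2 versions of the data: t0 denotes the data before cleaning and t1 denotes the data after cleaning. Column: "data" is the column family and value is the column name; the number that follows is the monitored battery data.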
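The row key layout described above can be illustrated with a short helper. This is a sketch under assumptions: days and seconds are counted in UTC, timestamps are timezone-aware, and the identifier `T001` in the usage note is a hypothetical example. It builds only the key string and does not call any HBase API.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def make_row_key(identifier: str, ts: datetime) -> str:
    """Build '<identifier>|<days since 1970-01-01>|<seconds since the start of the day>'."""
    delta = ts.astimezone(timezone.utc) - EPOCH  # assumption: days are counted in UTC
    return f"{identifier}|{delta.days}|{delta.seconds}"
```

For instance, `make_row_key("T001", datetime(1970, 1, 2, 0, 0, 1, tzinfo=timezone.utc))` yields `T001|1|1`. Since the identifier comes first, all rows of one monitoring point sort together and in time order only if the day and second fields are zero-padded, which the source does not specify.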
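Building the battery data cleaning module, which is built on the Hadoop distributed framework.
The cleaning program built according to the cleaning method is verified. The cleaning program is then ported to the Hadoop distributed framework to build the mapreduce program.
As shown in FIG. 4, Hadoop reads the massive battery data from HBase, splits it, and distributes the splits to the nodes of the Hadoop cluster for map processing. Through the map program and the shuffle phase, the data of each battery monitoring point are gathered into one data slice for the reduce program. The Reduce program on each node then cleans the data of the battery monitoring point it receives and stores the result in HBase.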
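The map, shuffle and reduce flow of FIG. 4 can be imitated in a single process to show how records are grouped per monitoring point before cleaning. This is a plain-Python simulation, not Hadoop or HBase code; the record layout and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int, float]  # (monitoring point id, timestamp, value)


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Tuple[int, float]]]:
    """Emit (monitoring point id, (timestamp, value)) so records can be grouped by point."""
    for point_id, ts, value in records:
        yield point_id, (ts, value)


def shuffle_phase(pairs: Iterable[Tuple[str, Tuple[int, float]]]) -> Dict[str, List[Tuple[int, float]]]:
    """Gather all pairs that share a key, as the shuffle stage does between map and reduce."""
    groups: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for key, item in pairs:
        groups[key].append(item)
    return groups


def reduce_phase(groups: Dict[str, List[Tuple[int, float]]],
                 clean: Callable[[List[float]], List[float]]) -> Dict[str, List[float]]:
    """Clean each monitoring point's time-ordered series with the supplied cleaning function."""
    result: Dict[str, List[float]] = {}
    for point_id, items in groups.items():
        values = [value for _, value in sorted(items)]  # order by timestamp
        result[point_id] = clean(values)
    return result
```

Passing the cleaning routine in as the `clean` argument keeps the grouping logic independent of the cleaning steps, so each reducer handles exactly one monitoring point's series.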
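Building the display module for the massive battery data of the energy storage power station: the EChart front-end technology is used to present each battery's data before and after cleaning to the user as charts. Comparing the data before and after cleaning gives an intuitive judgment of the cleaning effect.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application and not to limit its scope of protection. Although the present application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that, after reading this application, those skilled in the art may still make various changes, modifications or equivalent substitutions to the specific embodiments of the application, and that all such changes, modifications or equivalent substitutions fall within the scope of protection of the pending claims.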

Claims (10)

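  1. A method for cleaning massive data of an energy storage power station, characterized in that the method comprises the following steps:
    I. locating and replacing missing (default) values in the energy storage power station data set;
    II. locating and replacing outliers in the data set;
    III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them.
  2. The method according to claim 1, characterized in that in step I, a statistical processing method is used to locate the missing value; the K-nearest neighbor algorithm is used to determine normal values near the missing value, and the missing value is replaced by the normal value with the highest frequency of occurrence.
  3. The method according to claim 1, characterized in that in step II, the Pauta criterion (3σ rule) is used to locate the outlier; the K-nearest neighbor algorithm is used to determine normal values near the outlier, and the outlier is replaced by the normal value with the highest frequency of occurrence.
  4. The method according to claim 1, characterized in that in step III, the unreasonable data are determined according to the different characteristics of the data in the data set and are replaced by the normal value before or after the unreasonable data.
  5. The method according to claim 1, characterized in that the types of energy storage battery data include current, voltage, temperature, SOC and power;
    the different category characteristics include abrupt-change thresholds determined for each category of data from prior knowledge;
    step III includes traversing the data of each category, determining the unreasonable data according to the abrupt-change threshold, and replacing the unreasonable data with the data of the previous moment.
  6. A system for cleaning massive data of an energy storage power station, characterized in that the system comprises a data storage module, a data cleaning module and a display module;
    the data storage module constructs a battery data table based on HBase, and the battery data table is used to store all the energy storage power station data involved;
    the data cleaning module cleans the energy storage power station data based on Hadoop;
    the display module is used to display the energy storage power station data before and after cleaning.
  7. The system according to claim 6, characterized in that the data cleaning module is used to clean the energy storage power station data and includes sub-modules that implement the following steps:
    I. locating and replacing missing (default) values in the energy storage power station data set;
    II. locating and replacing outliers in the data set;
    III. according to the different category characteristics of the energy storage battery data, determining unreasonable data in the data set obtained after the replacement, and replacing them.
  8. The system according to claim 7, characterized in that in step I, a statistical processing method is used to locate the missing value; the K-nearest neighbor algorithm is used to determine a normal value near the missing value, and the missing value is replaced by that normal value.
  9. The system according to claim 7, characterized in that in step II, the Pauta criterion (3σ rule) is used to locate the outlier; the K-nearest neighbor algorithm is used to determine a normal value near the outlier, and the outlier is replaced by that normal value.
  10. The system according to claim 7, characterized in that the types of energy storage battery data include current, voltage, temperature, SOC and power;
    the different category characteristics include abrupt-change thresholds determined for each category of data from prior knowledge;
    step III includes traversing the data of each category, determining the unreasonable data according to the abrupt-change threshold, and replacing the unreasonable data with the data of the previous moment.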
PCT/CN2015/097998 2015-04-16 2015-12-21 Method and system for cleaning massive data of an energy storage power station WO2016165378A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510181094.1 2015-04-16
CN201510181094.1A CN104750861B (zh) 2015-04-16 2015-04-16 Method and system for cleaning massive data of an energy storage power station

Publications (1)

Publication Number Publication Date
WO2016165378A1 (zh) 2016-10-20

Family

ID=53590545

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/097998 WO2016165378A1 (zh) 2015-04-16 2015-12-21 Method and system for cleaning massive data of an energy storage power station

Country Status (2)

Country Link
CN (1) CN104750861B (zh)
WO (1) WO2016165378A1 (zh)

Also Published As

Publication number Publication date
CN104750861A (zh) 2015-07-01
CN104750861B (zh) 2019-05-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15889074; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 15889074; Country of ref document: EP; Kind code of ref document: A1)