CN115982232A - Hadoop-based power grid data processing method and system

Info

Publication number: CN115982232A
Application number: CN202211595468.0A
Authority: CN
Inventors: 王捷; 李晶; 黄杰; 崔一铂; 王晋; 朱国威; 刘畅; 喻潇; 周亮; 唐泽洋; 田里; 徐江珮; 龙凤; 董重重; 苏昊扬; 徐成伟; 赵环
Original assignee: State Grid Hubei Electric Power Co Ltd; Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Current assignee: State Grid Hubei Electric Power Co Ltd; Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Priority date: 2022-12-13
Filing date: 2022-12-13
Publication date: 2023-04-18

Abstract

The present invention provides a Hadoop-based power grid data processing method and system. The method comprises: Step 1: collecting power grid big data using Hadoop-based power grid data mining and analysis technology, the power grid big data including real-time data information, equipment parameter data, and power generation and load data of the power grid; Step 2: using MapReduce technology, storing and managing the collected power grid big data on a power grid big data storage platform for data security situation storage; Step 3: establishing a zero-trust framework to realize security protection of power terminals. The invention can improve data quality, data storage security, and terminal security protection; reduce the probability of data being tampered with, destroyed, or leaked; promote data circulation; make full use of the value of power grid data; meet national requirements for data sharing and exchange; and support data-driven innovation, the mining of data dividends, and the development of the data economy.

Description

A Hadoop-based power grid data processing method and system

Technical Field

The invention relates to the technical field of smart grids, and in particular to a Hadoop-based power grid data processing method and system.

Background

As society's demand for electric energy keeps growing, power grids are becoming increasingly intelligent. In a smart grid, the customer-side energy-use control system links customers to the smart energy service platform. It is an important means of supporting the customer-side ubiquitous power Internet of Things, and it is the execution unit for integrated energy services such as demand response and energy-efficiency improvement.

As the power grid grows in scale, its operation becomes more complex and data security risks become more prominent. Existing approaches show an insufficient understanding of the grid's common requirements, such as data acquisition and monitoring and demand response, and suffer from high development cost and poor portability and reusability. It is therefore necessary to use big-data analysis technologies to realize power grid data security situation storage, power grid data mining and analysis, and power terminal security protection.

Summary of the Invention

The purpose of the invention is to provide a Hadoop-based power grid data processing method and system that realizes power grid data security situation storage, power grid data mining and analysis, and power terminal security protection, while addressing the high development cost and poor portability and reusability of the prior art.

A Hadoop-based power grid data processing method comprises the following steps:

Step 1: collect power grid big data using Hadoop-based power grid data mining and analysis technology, the power grid big data including real-time data information, equipment parameter data, and power generation and load data of the power grid;

Step 2: using MapReduce technology, store and manage the collected power grid big data on a power grid big data storage platform for data security situation storage;

Step 3: establish a zero-trust framework to realize security protection of power terminals.

Further, the Hadoop-based power grid data mining and analysis technology is implemented with a data acquisition layer, a data storage layer, a business application layer, and a user layer;

The data acquisition layer adopts a distributed, targeted acquisition architecture. Each terminal site in the different networks serves as a basic task unit of network data acquisition; the units collect raw network data and aggregate and transmit it to the data storage layer. Each basic task unit uses its own acquisition rules and strategies;

The data storage layer aggregates, stores, and performs initial processing of the raw data, and provides different types of function-call services; the data storage layer is implemented with the Hadoop framework;

The business application layer retrieves and analyzes the network data processed by the data storage layer, so that common components are decoupled from business-specific application components, and sends the results of the network data analysis to the user layer for real-time display;

The user layer transmits and displays the data information of the business application layer.

Further, the basic task unit includes a data acquisition unit that collects data using a dynamic web page acquisition method and a web page information extraction method; information is extracted with a method based on a line-block distribution function, thereby obtaining the data.

Further, the data acquisition unit obtains feed addresses by breadth-first traversal of the sites, collects the information behind each feed address in real time, tracks updates, and collects information incrementally.
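The feed discovery and incremental collection described above can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the site is modelled as a dictionary of page links rather than fetched over a network, and the names `discover_feeds`, `IncrementalCollector`, and the `id` field on each item are assumptions introduced only for illustration.

```python
from collections import deque


def discover_feeds(start, links, feeds):
    """Breadth-first traversal of a site graph.

    links: page -> list of linked pages (hypothetical site map)
    feeds: page -> feed address exposed by that page (hypothetical)
    Returns the feed addresses in the order the pages are visited.
    """
    seen, queue, found = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        if page in feeds:
            found.append(feeds[page])
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


class IncrementalCollector:
    """Keeps track of what was already collected from each feed."""

    def __init__(self):
        self._seen = {}  # feed address -> set of item ids already collected

    def collect(self, feed, items):
        """Return only the items of `feed` that were not collected before."""
        known = self._seen.setdefault(feed, set())
        new = [item for item in items if item["id"] not in known]
        known.update(item["id"] for item in new)
        return new
```

Calling `collect` repeatedly with the current contents of a feed returns only the entries that appeared since the previous call, which is one simple way to realize incremental updating.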

Further, the acquisition rules and strategies include semi-automatic generation of vertical-search templates, optimized access to dynamic pages, and an intelligent crawl-process scheduling strategy.

Further, the processing of raw data in the data storage layer includes using a window technique to partition the data to be processed into blocks, using a sliding-window model to describe changes in the stream data, and using the sliding-window model to retain the patterns in the existing data.
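As a rough illustration of a sliding-window model over stream data, the sketch below keeps item frequencies for the most recent `size` items and updates them from the added and removed items only, without rescanning the window. Treating "patterns" as item frequencies is an assumption made for this example; the text does not specify what kind of pattern is mined, and the class name `SlidingWindowCounts` is hypothetical.

```python
from collections import Counter, deque


class SlidingWindowCounts:
    """Item frequencies over the last `size` items of a stream."""

    def __init__(self, size):
        self.size = size
        self.items = deque()     # current window contents, oldest first
        self.counts = Counter()  # frequency of each item in the window

    def push(self, item):
        # Account for the newly added data.
        self.items.append(item)
        self.counts[item] += 1
        # Account for the data that leaves the window.
        if len(self.items) > self.size:
            old = self.items.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
```

Each `push` costs constant time regardless of the window size, because the unchanged part of the window is never recomputed.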

Further, retaining the patterns in the existing data with the sliding-window model specifically comprises:

Partitioning the data into blocks according to how the data has changed, and storing the patterns of the unchanged portion in the sliding window; computing the patterns of the added portion and of the deleted portion separately; and updating the patterns held in the sliding window according to the patterns of the changed portions;

Using a multi-window method to support users' online mining requests. The multi-window method divides the data stream into fixed-length segments, each of which forms a window. When the number of windows in memory reaches a given number, these windows are merged into a window of a higher summary level. As the data stream flows in, windows of different summary levels form a hierarchy, and each window is then equivalent to a snapshot of the data between two predefined timestamps on the data stream.
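The multi-window hierarchy can be sketched as follows. This is an illustrative reading, not the patented implementation: the summary kept per window (an item count and a sum), the parameters `segment_len` and `merge_at`, and the class names are all assumptions, since the text does not say what a window summary contains or how many windows trigger a merge.

```python
from dataclasses import dataclass


@dataclass
class Window:
    level: int    # summary level; 0 is the finest
    start: int    # timestamp of the first item covered
    end: int      # timestamp of the last item covered
    count: int    # number of items summarized
    total: float  # sum of the summarized values


class MultiWindow:
    def __init__(self, segment_len=4, merge_at=2):
        self.segment_len = segment_len  # items per level-0 window
        self.merge_at = merge_at        # windows per level that trigger a merge
        self.windows = []               # oldest (coarsest) first
        self._buf = []                  # (timestamp, value) pairs of the open segment

    def add(self, ts, value):
        self._buf.append((ts, value))
        if len(self._buf) == self.segment_len:
            values = [v for _, v in self._buf]
            self.windows.append(
                Window(0, self._buf[0][0], ts, len(values), sum(values)))
            self._buf = []
            self._merge()

    def _merge(self):
        level = 0
        while True:
            same = [w for w in self.windows if w.level == level]
            if len(same) < self.merge_at:
                return
            merged = Window(level + 1, same[0].start, same[-1].end,
                            sum(w.count for w in same),
                            sum(w.total for w in same))
            first = next(i for i, w in enumerate(self.windows)
                         if w.level == level)
            self.windows = [w for w in self.windows if w.level != level]
            self.windows.insert(first, merged)
            level += 1
```

With `segment_len=2` and `merge_at=2`, ten items with timestamps 0 to 9 leave two windows in memory: a level-2 window covering timestamps 0 to 7 and a level-0 window covering 8 to 9. Older data is therefore kept at a coarser summary level than recent data.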

Further, the data security situation storage based on MapReduce technology includes the following steps:

Step 2.1: retrieve the data information of the user layer and input it to the user program;

Step 2.2: the MapReduce library divides the input file of the user program into M splits, where M is defined by the user;

Step 2.3: a worker assigned a Map job reads the input data of its split; the Map job extracts key-value pairs from the input data and passes each pair as an argument to the map function; the intermediate key-value pairs produced by the map function are buffered in memory;

Step 2.4: the buffered intermediate key-value pairs are periodically written to local disk and divided into R partitions, where R is defined by the user; each partition will later correspond to one Reduce job. The locations of the intermediate key-value pairs are reported to the master, which forwards them to the Reduce workers;

Step 2.5: the master tells each worker assigned a Reduce job where the partitions it is responsible for are located; once a Reduce worker has read all of the intermediate key-value pairs it is responsible for, it sorts them so that pairs with the same key are grouped together;

Step 2.6: the Reduce worker iterates over the sorted intermediate key-value pairs and, for each unique key, passes the key and its associated values to the reduce function; the output of the reduce function is appended to the output file of that partition;

Step 2.7: when all Map and Reduce jobs have completed, the master wakes up the user program, and the MapReduce call returns to the user program's code.
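The data flow of the map, partition, sort, and reduce stages described above can be imitated in a single process. The sketch below is not Hadoop code and uses no Hadoop API; it only mirrors the flow of key-value pairs. The record format (`"device,load"`), the choice of computing the peak load per device, and the function names are assumptions made for illustration.

```python
import zlib
from itertools import groupby


def map_fn(record):
    """Extract a (device, load) key-value pair from one input record."""
    device, load = record.split(",")
    yield device, float(load)


def reduce_fn(key, values):
    """Reduce all values of one key; here, the peak load of a device."""
    return key, max(values)


def run_mapreduce(records, m, r):
    # Divide the input into M splits.
    splits = [records[i::m] for i in range(m)]

    # Map phase: each split is mapped, and every intermediate pair is
    # assigned to one of R partitions by a deterministic hash of its key.
    partitions = [[] for _ in range(r)]
    for split in splits:
        for record in split:
            for key, value in map_fn(record):
                index = zlib.crc32(key.encode()) % r
                partitions[index].append((key, value))

    # Reduce phase: sort each partition so equal keys are adjacent,
    # then call reduce_fn once per unique key.
    outputs = []
    for part in partitions:
        part.sort(key=lambda kv: kv[0])
        outputs.append([
            reduce_fn(key, [v for _, v in group])
            for key, group in groupby(part, key=lambda kv: kv[0])
        ])
    return outputs  # one output list per partition
```

In a real cluster the splits, the intermediate files, and the R output files live on different machines and the master coordinates the workers; here everything is a Python list, which keeps the key-value flow visible.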

Further, establishing the zero-trust framework to realize security protection of power terminals includes the following steps:

Step 3.1: build a zero-trust module that collects the device information of the power terminal devices, computes a trust score from the collected device information to give a trust value, and evaluates each device by its trust value, classifying the power terminal devices as trusted devices or abnormal devices;

Step 3.2: collect data from the trusted devices of step 3.1 to obtain collected data;

Step 3.3: build a security situation awareness module that performs situation awareness on the collected data of step 3.2; when the data passes, the collected data is converted into perception data;

Step 3.4: build a real-time management and control module that manages and controls the perception data of step 3.3 and generates security instructions;

Step 3.5: issue the security instructions of step 3.4 to the power terminal devices to protect and harden them.

A Hadoop-based power grid data processing system comprises a computer-readable storage medium and a processor;

The computer-readable storage medium is used to store executable instructions;

The processor is used to read the executable instructions stored in the computer-readable storage medium and execute the Hadoop-based power grid data processing method described above.

The invention strengthens security protection for power grid data security services, improves the accuracy of data security event identification and the timeliness of source tracing, reduces the probability of data being tampered with, destroyed, or leaked, promotes data circulation, makes full use of the value of power grid data, meets national requirements for data sharing and exchange, and supports data-driven innovation, the mining of data dividends, and the development of the data economy.

Brief Description of the Drawings

Fig. 1 is a flowchart of a Hadoop-based power grid data processing method according to an embodiment of the invention;

Fig. 2 is a flowchart of data security situation storage based on MapReduce technology in an embodiment of the invention.

Detailed Description

To make the purpose, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

Referring to Fig. 1, a first aspect of the invention provides a Hadoop-based power grid data processing method, comprising the following steps:

The Hadoop-based power grid data mining and analysis technology of the invention is implemented with a data acquisition layer, a data storage layer, a business application layer, and a user layer.

The data acquisition layer adopts a distributed, targeted acquisition architecture. Each terminal site in the different networks serves as a basic task unit of power grid data acquisition; the units collect real-time data information, equipment parameter data, and power generation and load data, and aggregate and transmit them to the data storage layer. The basic task unit includes a data acquisition unit that collects data using a dynamic web page acquisition method and a web page information extraction method, extracting information with a method based on a line-block distribution function. Specifically, the data acquisition unit obtains feed addresses by breadth-first traversal of the sites, collects the information behind each feed address in real time, tracks updates, and collects information incrementally. Each basic task unit uses its own acquisition rules and strategies, which include semi-automatic generation of vertical-search templates, optimized access to dynamic pages, and an intelligent crawl-process scheduling strategy.

The data storage layer aggregates, stores, and performs initial processing of the raw data, and provides different types of function-call services; the data storage layer is implemented with the Hadoop framework;

The business application layer retrieves and analyzes the data processed by the data storage layer, so that common components are decoupled from business-specific application components, and sends the results of the network data analysis to the user layer for real-time display.

The data security situation storage based on MapReduce technology is implemented with the distributed file system HDFS and MapReduce. HDFS is Hadoop's file system and is used to store very large files; MapReduce is Hadoop's parallel programming model and is used for in-depth analysis of the data stored on HDFS.

As shown in Fig. 2, the data security situation storage based on MapReduce technology includes the following steps:

Step 2.3: a worker assigned a Map job starts reading the input data of its split. The number of Map jobs is determined by M, with one job per split. The Map job extracts key-value pairs from the input data and passes each pair as an argument to the map function; the intermediate key-value pairs produced by the map function are buffered in memory;

Step 2.4: the buffered intermediate key-value pairs are periodically written to local disk and divided into R partitions, where R is defined by the user; each partition will later correspond to one Reduce job. The locations of these intermediate key-value pairs are reported to the master, which forwards them to the Reduce workers;

Step 2.5: the master tells each worker assigned a Reduce job where the partitions it is responsible for are located; once a Reduce worker has read all of the intermediate key-value pairs it is responsible for, it first sorts them so that pairs with the same key are grouped together;

Step 2.6: the Reduce worker iterates over the sorted intermediate key-value pairs and, for each unique key, passes the key and its associated values to the reduce function; the output of the reduce function is appended to the output file of that partition;

Establishing the zero-trust framework to realize security protection of power terminals includes the following steps:

Step 3.1: build a zero-trust module that collects the device information of the power terminal devices; compute a trust score from the collected device information to give a trust value; evaluate each device by its trust value, classifying the power terminal devices as trusted devices or abnormal devices;

The zero-trust module collects device information as follows: read the device data, read the rule file, parse the rule base, and collect the device information. The zero-trust module also performs continuous, dynamic device identity verification on the power terminal devices to block false device information. The trust value is the indicator used for identity verification and is obtained by a comprehensive score based on the device's basic attributes and access latency. The trust value is maintained as follows:

(1) the trust value has a maximum of M and a minimum of N, with M > N;

(2) the trust threshold is H; a value greater than or equal to H indicates a legitimate user, and a value below H indicates an illegitimate user;

(3) each successful verification increases the trust value by T;

(4) each failed verification decreases the trust value by T.
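The four maintenance rules can be sketched as a small record type. This is an illustrative reading only: the numeric defaults are invented for the example, the field names are hypothetical stand-ins for M, N, H, and the per-verification step T, and clamping the value to the range [N, M] is an interpretation of rule (1).

```python
from dataclasses import dataclass


@dataclass
class TrustRecord:
    value: float               # current trust value of one device
    max_value: float = 100.0   # "M" in rule (1); example number only
    min_value: float = 0.0     # "N" in rule (1); example number only
    threshold: float = 60.0    # "H" in rule (2); example number only
    step: float = 5.0          # "T" in rules (3) and (4); example number only

    def update(self, verified: bool) -> float:
        """Apply rule (3) on success or rule (4) on failure, then clamp."""
        delta = self.step if verified else -self.step
        self.value = min(self.max_value,
                         max(self.min_value, self.value + delta))
        return self.value

    @property
    def is_legitimate(self) -> bool:
        """Rule (2): at or above the threshold counts as legitimate."""
        return self.value >= self.threshold
```

Because verification is continuous, a device that was legitimate can fall below the threshold after a failed check and regain its status only through later successful checks.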

The trust value comprises a direct trust value, a delay-evaluation trust value, and an abnormal-behavior-evaluation trust value, and is computed as:

$T = T_d + T_t + T_a$

where $T$ is the trust value, $T_d$ is the direct trust value, $T_t$ is the delay-evaluation trust value, and $T_a$ is the abnormal-behavior-evaluation trust value;

The direct trust value is a sigmoid (S-shaped) function, computed as:

where $T_d$ is the direct trust value and $f$ is the direct-trust constraint coefficient of the different devices; the delay-evaluation trust value and the abnormal-behavior-evaluation trust value together form the indirect trust value;

The delay-evaluation trust value is evaluated from the device response time, computed as:

where $T_t$ is the delay-evaluation trust value, $\tau$ is the maximum allowed device response delay, and $D$ is the information transmission delay;

The abnormal-behavior-evaluation trust value is evaluated from the proportion of the device's abnormal behavior relative to its normal behavior, computed as:

where $T_a$ is the abnormal-behavior-evaluation trust value, $A_u$ is the amount of abnormal behavior, and $A_n$ is the amount of normal behavior;

Step 3.2: collect data from the trusted devices of step 3.1 to obtain collected data;

Step 3.3: build a security situation awareness module that performs situation awareness on the collected data of step 3.2; when the data passes, the collected data is converted into perception data;

The situation awareness includes intrusion detection, vulnerability awareness, file integrity checking, and log monitoring;

Step 3.4: build a real-time management and control module that manages and controls the perception data of step 3.3 and generates security instructions;

Step 3.5: issue the security instructions of step 3.4 to the power terminal devices to protect and harden them;

The invention can improve data quality, data storage security, and terminal security protection; reduce the probability of data being tampered with, destroyed, or leaked; promote data circulation; make full use of the value of power grid data; meet national requirements for data sharing and exchange; and support data-driven innovation, the mining of data dividends, and the development of the data economy.

Another aspect of the invention provides a Hadoop-based power grid data processing system, comprising a computer-readable storage medium and a processor;

The processor is used to read the executable instructions stored in the computer-readable storage medium and execute the Hadoop-based power grid data processing method of the first aspect.

Another aspect of the invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the Hadoop-based power grid data processing method of the first aspect.

Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of protection of its claims.

Claims

1. A Hadoop-based power grid data processing method, characterized by comprising the following steps:

step one: collecting power grid big data using Hadoop-based power grid data mining and analysis technology, wherein the power grid big data comprises real-time data information, equipment parameter data, and power generation and load data of the power grid;

step two: storing and managing the collected power grid big data on a power grid big data storage platform based on MapReduce technology, for data security situation storage;

step three: establishing a zero-trust framework to realize security protection of power terminals.

2. The Hadoop-based power grid data processing method according to claim 1, wherein the Hadoop-based power grid data mining and analysis technology is implemented with a data acquisition layer, a data storage layer, a business application layer, and a user layer;

the data acquisition layer adopts a distributed, targeted acquisition architecture in which each terminal site in the different networks serves as a basic task unit of network data acquisition, the units collecting raw network data and aggregating and transmitting it to the data storage layer, wherein each basic task unit uses its own acquisition rules and strategies;

the data storage layer aggregates, stores, and performs initial processing of the raw data and provides different types of function-call services, the data storage layer being implemented with the Hadoop framework;

the business application layer retrieves and analyzes the network data processed by the data storage layer, so that common components are decoupled from business-specific application components, and sends the results of the network data analysis to the user layer for real-time display;

and the user layer transmits and displays the data information of the business application layer.

3. The Hadoop-based power grid data processing method as claimed in claim 2, wherein the basic task unit comprises a data acquisition unit for acquiring data by a dynamic web page acquisition method and a web page information extraction method, and extracting information by a method based on a row-block distribution function to further acquire data.

4. The Hadoop-based power grid data processing method as claimed in claim 3, wherein the data acquisition unit acquires Feed addresses through a breadth traversal site, acquires information corresponding to each Feed address in real time, tracks updated information, and acquires information in an incremental updating manner.

5. The Hadoop-based power grid data processing method as claimed in claim 2, wherein the collection rules and policies include vertical search template semi-automatic generation technology, dynamic page optimization access technology, and intelligent capture process scheduling policy.

6. The Hadoop-based power grid data processing method according to claim 2, wherein the processing of the original data in the data storage layer comprises partitioning the data to be processed into blocks by using a window technique, describing changes in the stream data by using a sliding window model, and storing patterns in the original data by using the sliding window model.

7. The Hadoop-based power grid data processing method as claimed in claim 6, wherein a sliding window model is used to store patterns in the original data, specifically:

according to the blocks of data that have changed, storing the patterns of the unchanged portion of the data in a sliding window; separately calculating the patterns of the added portion and the deleted portion of the data; updating the patterns stored in the sliding window according to the patterns of the changed portion of the data;

using a multi-window method to support online mining requests from a user; the multi-window method divides the data stream into a plurality of fixed-length segments, each segment forming a window; as the data stream flows in, when the number of windows in memory reaches a set number, windows are merged to form a window with a higher summary level, so that windows with different summary levels form a hierarchical structure, and at this point each window is equivalent to a snapshot of the data between two predefined time stamps on the data stream.
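
For illustration only, the multi-window hierarchy described in claim 7 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: each window's summary is assumed to be a simple item-frequency count, merging is assumed to add the counts of the two oldest windows of a level, timestamps are assumed to be item positions in the stream, and the names `MultiWindow`, `segment_len` and `max_windows` are hypothetical.

```python
from collections import Counter


class MultiWindow:
    """Hierarchy of windows over a stream; level 0 holds the finest summaries."""

    def __init__(self, segment_len=4, max_windows=2):
        self.segment_len = segment_len   # fixed segment length (assumed value)
        self.max_windows = max_windows   # windows kept per level before merging
        self.levels = [[]]               # levels[k] = list of (start, end, Counter)
        self.buffer = []                 # items of the segment being filled
        self.clock = 0                   # logical timestamp: items seen so far

    def add(self, item):
        self.buffer.append(item)
        self.clock += 1
        if len(self.buffer) == self.segment_len:
            start = self.clock - self.segment_len
            self.levels[0].append((start, self.clock, Counter(self.buffer)))
            self.buffer = []
            self._merge(0)

    def _merge(self, level):
        # When a level exceeds its capacity, fold its two oldest windows
        # into a single window at the next (coarser) summary level.
        while len(self.levels[level]) > self.max_windows:
            (s1, _, c1), (_, e2, c2) = self.levels[level][:2]
            del self.levels[level][:2]
            if level + 1 == len(self.levels):
                self.levels.append([])
            self.levels[level + 1].append((s1, e2, c1 + c2))
            self._merge(level + 1)

    def snapshot(self):
        """Return (level, start, end, counts) for every window, finest level first."""
        return [(k, s, e, dict(c))
                for k, ws in enumerate(self.levels) for (s, e, c) in ws]
```

Each tuple returned by `snapshot` covers the stream between its `start` and `end` timestamps, so older data is retained only at coarser summary levels while recent data stays at full resolution.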

8. The Hadoop-based power grid data processing method according to claim 1, wherein the data security situation storage based on the MapReduce technology comprises the following steps:

step 2.1: calling data information of a user layer and inputting the data information into a user program;

step 2.2: dividing an input file of a user program into M parts by a MapReduce library, wherein M is defined by a user;

step 2.3: a worker to which a Map operation is allocated reads the input data of the corresponding fragment; the Map operation extracts key-value pairs from the input data, passes each key-value pair to the Map function as a parameter, and caches the intermediate key-value pairs generated by the Map function in memory;

step 2.4: the cached intermediate key-value pairs are periodically written to a local disk and divided into R regions, wherein R is defined by a user and each region corresponds to a future Reduce operation; the locations of the intermediate key-value pairs are reported to a master, and the master is responsible for forwarding this information to the Reduce workers;

step 2.5: the master informs each worker to which a Reduce operation is allocated of the location of the partition for which it is responsible; after the Reduce worker has read all the intermediate key-value pairs for which it is responsible, it sorts them so that key-value pairs with the same key are grouped together;

step 2.6: the Reduce worker traverses the sorted intermediate key-value pairs and, for each unique key, passes the key and its associated values to the Reduce function, and appends the output generated by the Reduce function to the output file of the partition;

step 2.7: when all Map and Reduce jobs are completed, the master wakes up the user program, and the MapReduce function call returns to the code of the user program.
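
For illustration only, the split, map, partition, sort and reduce stages of claim 8 might be imitated in a single process as follows. This is a minimal sketch under stated assumptions and does not use the Hadoop API: the function name `run_mapreduce` is hypothetical, partitioning by `hash(key) % r` is an assumed partition function, and the master and worker processes are replaced by ordinary loops.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn, m=3, r=2):
    """Single-process imitation of a MapReduce job; m and r are the user-defined M and R."""
    # Split stage: divide the input into M fragments.
    splits = [records[i::m] for i in range(m)]

    # Map and partition stages: each map task emits intermediate key-value
    # pairs, which are divided into R regions by an assumed hash partitioner
    # (Python string hashes differ between runs, so region contents may too).
    regions = [defaultdict(list) for _ in range(r)]
    for split in splits:
        for record in split:
            for key, value in map_fn(record):
                regions[hash(key) % r][key].append(value)

    # Sort and reduce stages: each reduce task visits its keys in sorted
    # order and writes one result per unique key to its own output partition.
    outputs = []
    for region in regions:
        outputs.append({key: reduce_fn(key, region[key]) for key in sorted(region)})

    # Completion: control returns to the calling user program.
    return outputs
```

As a usage example with hypothetical load readings, `run_mapreduce(readings, lambda rec: [(rec[0], rec[1])], lambda key, values: sum(values))` totals the readings per terminal identifier and returns one dictionary per output partition.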

9. The Hadoop-based power grid data processing method as claimed in claim 1, wherein the establishing of the zero trust framework to realize the power terminal security protection comprises the following steps:

step 3.1, a zero trust module is constructed, equipment information of the electric power terminal equipment is collected, trust scoring is carried out according to the collected equipment information, a trust value is given, the electric power terminal equipment is evaluated according to the trust value, and the electric power terminal equipment is divided into trusted equipment and abnormal equipment;

step 3.2, data acquisition is carried out on the trusted equipment in the step 3.1, and acquired data are obtained;

step 3.3, a security situation awareness module is constructed, situation awareness is carried out on the data acquired in step 3.2, and the acquired data are converted into perception data when the perception is qualified;

step 3.4, a real-time management and control module is constructed, the perception data in the step 3.3 are managed and controlled, and a safety instruction is generated;

step 3.5, the safety instruction of step 3.4 is issued to the electric power terminal equipment, and safety protection and safety reinforcement are carried out on the electric power terminal equipment.
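
For illustration only, the flow of steps 3.1 to 3.5 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the device attributes, scoring weights, trust threshold, reading limit and instruction names (`allow`, `harden`, `isolate`) are all hypothetical, and the treatment of abnormal equipment is an assumption because the claim does not specify it.

```python
from dataclasses import dataclass


@dataclass
class Terminal:
    device_id: str
    firmware_current: bool   # hypothetical device attribute
    cert_valid: bool         # hypothetical device attribute
    failed_logins: int       # hypothetical device attribute


def trust_value(t):
    """Step 3.1: score a terminal from its device information (weights are assumed)."""
    score = 40 * t.cert_valid + 30 * t.firmware_current + 30
    return max(0, score - 10 * t.failed_logins)


def classify(terminals, threshold=70):
    """Step 3.1: divide terminals into trusted and abnormal equipment by trust value."""
    trusted = [t for t in terminals if trust_value(t) >= threshold]
    abnormal = [t for t in terminals if trust_value(t) < threshold]
    return trusted, abnormal


def protect(terminals, readings, limit=100.0):
    """Steps 3.2 to 3.5: acquire data from trusted terminals, check it, issue instructions."""
    trusted, abnormal = classify(terminals)
    # Assumption: abnormal equipment is isolated; the claim leaves this open.
    instructions = {t.device_id: "isolate" for t in abnormal}
    for t in trusted:
        value = readings[t.device_id]        # step 3.2: acquired data
        qualified = 0.0 <= value <= limit    # step 3.3: assumed qualification check
        instructions[t.device_id] = "allow" if qualified else "harden"  # step 3.4
    return instructions                      # step 3.5: issued to each terminal
```

The sketch keeps scoring, classification and instruction generation in separate functions so that each module of the claim can be replaced independently.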

10. A Hadoop-based power grid data processing system, comprising: a computer-readable storage medium and a processor;

the computer-readable storage medium is used for storing executable instructions;

the processor is used for reading executable instructions stored in the computer-readable storage medium and executing the Hadoop-based power grid data processing method of any one of claims 1 to 9.