CN105321124A - Hadoop-based electric power cloud platform design scheme

Info

Publication number: CN105321124A
Application number: CN201510817806.4A
Authority: CN
Inventors: 刘琦; 蔡卫东; 肖博; 沈剑; 付章杰
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2015-11-23
Filing date: 2015-11-23
Publication date: 2016-02-10

Abstract

The present invention provides a design scheme for a Hadoop-based electric power cloud platform. HDFS and HBase improve the efficiency of data storage and indexing, and MapReduce is used for efficient offline analysis of the data. A REST-style web service makes it convenient to upload user data and also simplifies third-party development. The scheme addresses the low efficiency and poor performance of traditional single-machine data processing, solves the data consistency problem, ensures stable operation of the cluster, achieves high throughput, and provides a higher-standard data interface that avoids much of the cost of repeated development.

Description

A Hadoop-based power cloud platform design scheme

Technical field

The invention relates to distributed computing and distributed storage technology, belongs to the field of cloud computing, and specifically concerns a design scheme for a Hadoop-based electric power cloud platform.

Background

With the growing number of electrical devices, power data is becoming larger and more complex. Traditional data processing methods struggle with TB-scale or even PB-scale data, and even when they process it correctly they are inefficient. In addition, data consistency and reliability cannot be guaranteed, and even a complete system has limited throughput and no suitable interface for third-party data access.

Cloud computing has been studied extensively in both academia and industry. Its large scale, virtualization, control of failures between connected components, and asynchronous communication give it distinctive advantages. The demand for distributed systems that deliver on-demand services, computing power, and storage resources is becoming increasingly urgent. MapReduce, GFS, and BigTable, proposed by Google, address the difficulties of traditional power data processing well, and MapReduce has been the most popular distributed programming model in cloud environments. With MapReduce on cloud infrastructure, such datasets can be processed easily and efficiently.

A Hadoop-based power cloud platform design scheme that combines Apache Hadoop with flexible use of MapReduce, the Hadoop Distributed File System (HDFS), and HBase can process data efficiently and solve the data consistency problem.

Summary of the invention

The purpose of the invention is to provide a design scheme for a Hadoop-based power cloud platform that overcomes the low efficiency and poor performance of traditional data processing: processing on a single machine is inefficient, and even with distributed computing, data consistency and cluster stability are hard to guarantee.

Technical solution:

The Hadoop-based power cloud platform design scheme provided by the invention consists of several modules: HDFS, MapReduce, HBase, WebService, and a website. HDFS serves as the distributed file storage system for general storage of power data. Data that needs to be queried conveniently is stored in HBase. MapReduce reads data from HBase or HDFS, processes it, and writes the results back. The website can also read from and write to HBase. Both the APP and the power data sources interact with HBase through the WebService.

The specific operation process is as follows:

(1) Data upload:

Data from power sensors is uploaded to the cluster through the WebService. Data collected by users or by the power department can also be uploaded directly through the WebService.

(2) Data processing:

After the data is uploaded, classification and statistical calculations are performed by default, and power consumption predictions are made at a later stage. These steps are carried out by multiple MapReduce programs, each responsible for particular operations on the data, and the results are finally written to HBase.

(3) Data display and management:

Data is displayed through the website, and an APP is developed by calling the WebService. Both the website and the APP can display recent power consumption, infer users with similar consumption behavior, and recommend more power-efficient devices.

The specific functions of each module are as follows:

(1) HDFS: In a cloud environment, traditional file systems can no longer meet user requirements for data disaster recovery and data consistency. HDFS is a distributed file system that runs on commodity hardware. It has much in common with existing distributed file systems, but it is also highly fault-tolerant and suited to deployment on inexpensive machines. In this design, HDFS is the basic file system: it provides high-throughput data access and exchanges data with MapReduce, HBase, and other components.

(2) MapReduce: Because power data is sent at high frequency, in large volumes, and by many users, traditional processing programs can no longer keep up. MapReduce is a programming model for parallel computation over large datasets (larger than 1 TB). In this scheme, MapReduce processes data held in HDFS and HBase and writes the results back to HBase at regular intervals. The power cloud platform mainly includes MapReduce-based statistics programs, an SVM prediction model program, and some other programs. The specific processing operations are:

2-1) Statistics of the total power consumption of a single user;

2-2) Classification of users by power consumption level;

2-3) Prediction of user power consumption over the next 24 hours;

2-4) Classification of electrical equipment (by power consumption);

2-5) Analysis of the historical power consumption of electrical equipment;

2-6) Some other statistical operations.
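Operations 2-1) and 2-2) can be illustrated with a minimal in-process sketch of the map, shuffle, and reduce phases. It is not the patent's actual program: the record layout `(user_id, device_id, date, kWh)`, the sample readings, and the level thresholds are all assumptions made for illustration, and a real job would run on Hadoop rather than in a single Python process.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical record layout: (user_id, device_id, date, kWh).
READINGS = [
    ("user1", "device1", "20150101", 1.5),
    ("user1", "light1", "20150101", 0.5),
    ("user2", "device1", "20150101", 12.0),
    ("user1", "device1", "20150102", 2.0),
]


def map_reading(record):
    """Map step: emit one (user_id, kWh) pair per reading."""
    user_id, _device_id, _date, kwh = record
    yield user_id, kwh


def reduce_total(user_id, values):
    """Reduce step: sum all kWh values that share a user_id."""
    return user_id, sum(values)


def run_job(records):
    """Simulate map -> shuffle (sort and group by key) -> reduce."""
    pairs = [pair for record in records for pair in map_reading(record)]
    pairs.sort(key=itemgetter(0))
    return dict(
        reduce_total(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    )


def usage_level(total_kwh, thresholds=(3.0, 10.0)):
    """Grade a total into a level; the thresholds are illustrative only."""
    low, high = thresholds
    if total_kwh < low:
        return "low"
    if total_kwh < high:
        return "medium"
    return "high"


totals = run_job(READINGS)
levels = {user: usage_level(total) for user, total in totals.items()}
```

Keying the map output by `user_id` is what lets each reducer see all readings of one user; keying by device instead would give the per-device statistics of 2-4) and 2-5).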

(3) HBase: HBase is a distributed, column-oriented open source database, an open source implementation of Google's Bigtable, and a distributed storage system for structured data. It is a column-oriented store, a type of NoSQL database, and is well suited to storing unstructured data. In this design, HBase replaces a traditional database and also provides disaster recovery and data consistency. It interacts well with HDFS: power data in HDFS can be freely handed over to HDFS for operation, and HBase can also interact with the MapReduce-based power data processing programs.
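The patent does not describe a table layout. As a hedged illustration of why a row-ordered store makes queries convenient, the sketch below assumes a composite row key of the form `user/device/date` (mirroring the URL layout used by the WebService) and uses a small in-memory class, `SortedTable`, as a stand-in for a table sorted by row key. HBase does keep rows sorted by row key, but the key format, class, and sample values here are hypothetical.

```python
import bisect


def row_key(user_id, device_id, date):
    """Build a composite row key; this format is an assumption."""
    return f"{user_id}/{device_id}/{date}"


class SortedTable:
    """Tiny in-memory stand-in for a table ordered by row key."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> stored value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan_prefix(self, prefix):
        """Return all (key, value) pairs whose key starts with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result


table = SortedTable()
table.put(row_key("user1", "device1", "20150102"), 2.0)
table.put(row_key("user1", "device1", "20150101"), 1.5)
table.put(row_key("user1", "light1", "20150101"), 0.5)
table.put(row_key("user2", "device1", "20150101"), 12.0)

# All days recorded for one device come back as one contiguous range.
device1_days = table.scan_prefix("user1/device1/")
```

Because keys with a shared prefix sit next to each other, a per-device or per-user query touches one contiguous range instead of the whole table.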

(4) Website and APP: The website is written with JSP and J2EE, and the APP is available in Android and iOS versions. Both give users access to graphical displays of the data and offer some customized services, specifically the following functions:

4-1) Monthly power consumption reminder;

4-2) Reminder mode setting;

4-3) Recommendation of power-saving appliance models;

4-4) Appliance aging prediction.

(5) WebService: A REST-style WebService is exposed as the source of the platform's data. Data from power sensors is uploaded to the cluster through the WebService and then handed to multiple MapReduce programs for processing. The WebService also lets third-party users upload large batches of data directly, or obtain data directly from the power department. In addition, the REST-style service supports the following operations:

5-1) PUT

Uploads power data to the cloud platform. Usage:

PUT http://aaa.com/user1/device1

5-2) DELETE

Deletes power data, for example the data of 20150101. Usage:

DELETE http://aaa.com/user1/device1/20150101

5-3) GET

Retrieves power data, such as the total for a given day, for example the data of 20150101. Usage:

GET http://aaa.com/user1/device1/20150101

5-4) POST

Modifies data in the cloud platform, for example the data of light1. Usage:

POST http://aaa.com/user1/device1/light1
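The four operations can be sketched as a small dispatcher over an in-memory store. This is an illustration under stated assumptions, not the patent's service: the `handle` function, the `STORE` dictionary (a stand-in for the HBase-backed data), the status codes, and the payload shape are all hypothetical. The sketch follows the patent's own mapping, in which PUT uploads and POST modifies; many REST APIs assign these two verbs the other way round.

```python
from urllib.parse import urlparse

# In-memory stand-in for the HBase-backed store (assumption for illustration).
STORE = {}


def resource_key(url):
    """Split the URL path into its non-empty segments, e.g. user, device, date."""
    return tuple(part for part in urlparse(url).path.split("/") if part)


def handle(method, url, body=None):
    """Dispatch one request and return (status_code, payload)."""
    key = resource_key(url)
    if method == "PUT":        # 5-1) upload power data
        STORE[key] = body
        return 201, body
    if method == "GET":        # 5-3) retrieve power data
        return (200, STORE[key]) if key in STORE else (404, None)
    if method == "POST":       # 5-4) modify existing data
        if key not in STORE:
            return 404, None
        STORE[key] = body
        return 200, body
    if method == "DELETE":     # 5-2) delete power data
        if key not in STORE:
            return 404, None
        del STORE[key]
        return 204, None
    return 405, None           # any other verb is not supported
```

For example, `handle("PUT", "http://aaa.com/user1/device1/20150101", {"kwh": 1.5})` stores a reading under the key `("user1", "device1", "20150101")`, and a later `GET` on the same URL returns it.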

Beneficial effects

The Hadoop-based power cloud platform design scheme processes data efficiently, solves the data consistency problem, ensures stable operation of the cluster, and achieves high throughput. It also provides a higher-standard data interface that supports third-party data access and avoids much of the cost of repeated development.

Description of drawings

Figure 1 is a structural diagram of the Hadoop-based power cloud platform design scheme.

Figure 2 is a data flow diagram of an implementation of the Hadoop-based power cloud platform design scheme.

Detailed description

The invention is described in detail below with reference to the drawings and embodiments, using an actual deployment as an example.

The platform structure is shown in Figure 1. The power cloud platform design comprises HDFS, MapReduce, HBase, WebService, and a website. HDFS serves as the distributed file storage system for general storage of power data. Data that needs to be queried conveniently is stored in HBase. MapReduce reads data from HBase or HDFS, processes it, and writes the results back. The website can also read from and write to HBase. Both the APP and the power data sources interact with HBase through the WebService.

More specifically, the cloud platform consists of 1 master node, 1 secondary master node, and 20 slave nodes. Of the slave nodes, 15 are physical machines with a 500 GB hard disk and 4 GB of memory, and 5 are virtual machines implemented with Oracle's Oracle VM VirtualBox software: 2 are configured with a 200 GB hard disk and 4 GB of memory, and 3 with a 500 GB hard disk and 8 GB of memory.

(1) HDFS: In a cloud environment, traditional file systems can no longer meet user requirements for data disaster recovery and data consistency. HDFS is a distributed file system that runs on commodity hardware. It has much in common with existing distributed file systems, but it is also highly fault-tolerant and suited to deployment on inexpensive machines. In this design, HDFS is the basic file system: it provides high-throughput data access and exchanges data with MapReduce, HBase, and other components. The HDFS replication factor is set to 3, which preserves data integrity when a single node goes down.
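As a rough worked example of what a replication factor of 3 means for this deployment, the sketch below totals the raw disk of the 20 slave nodes and divides by the replication factor. It assumes the node configuration reads as 15 machines with 500 GB, 2 with 200 GB, and 3 with 500 GB, and it ignores operating system, HBase, and temporary-file overhead, so the result is an upper bound rather than a measured figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    count: int       # number of slave nodes in this group
    disk_gb: int     # hard disk per node, in GB
    memory_gb: int   # memory per node, in GB


# Slave nodes as read from the deployment description (an interpretation).
SLAVES = [
    NodeGroup(count=15, disk_gb=500, memory_gb=4),  # physical machines
    NodeGroup(count=2, disk_gb=200, memory_gb=4),   # virtual machines
    NodeGroup(count=3, disk_gb=500, memory_gb=8),   # virtual machines
]
REPLICATION = 3  # each HDFS block is stored three times

total_nodes = sum(group.count for group in SLAVES)
raw_disk_gb = sum(group.count * group.disk_gb for group in SLAVES)

# Every block occupies REPLICATION copies, so unique data is bounded by:
usable_gb = raw_disk_gb / REPLICATION
```

Under these assumptions the cluster has 9400 GB of raw disk, so it can hold at most about 3133 GB of unique data.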

(2) MapReduce: Because power data is sent at high frequency, in large volumes, and by many users, traditional processing programs can no longer keep up. MapReduce is a programming model for parallel computation over large datasets (larger than 1 TB). In this scheme, MapReduce processes data held in HDFS and HBase and writes the results back to HBase at regular intervals. The power cloud platform mainly includes MapReduce-based statistics programs, an SVM prediction model program, and some other programs. On each node, MapReduce is configured with 1 GB of memory and 1 processor. When processing of the power data starts, there are 200 map tasks and 40 reduce tasks, and the processed data is written back to HBase.
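How map output reaches the 40 reduce tasks can be sketched with a hash partitioner. Hadoop's default partitioner computes the key's hash, masks off the sign bit, and takes the remainder modulo the number of reduce tasks. The sketch below follows the same idea but substitutes `zlib.crc32` for Java's `hashCode`, because Python's built-in string hash varies between runs. The user IDs are made up for illustration.

```python
import zlib
from collections import Counter

NUM_REDUCE_TASKS = 40  # from the deployment described above


def partition(key, num_reducers=NUM_REDUCE_TASKS):
    """Assign a key to a reduce task with a stable hash.

    crc32 is a stand-in for Java's hashCode; the mask keeps the value
    non-negative, as Hadoop's default hash partitioner does.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


# Hypothetical user IDs, used only to show how keys spread over reducers.
user_ids = [f"user{i}" for i in range(1, 1001)]
load = Counter(partition(user_id) for user_id in user_ids)
```

Every record with the same key lands on the same reduce task, which is what allows a per-user total to be computed by a single reducer.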

The specific processing operations are:

2-5) Some other statistical operations.

(3) HBase: HBase is a distributed, column-oriented open source database, an open source implementation of Google's Bigtable, and a distributed storage system for structured data. It is a column-oriented store, a type of NoSQL database, and is well suited to storing unstructured data. In this design, HBase replaces a traditional database and also provides disaster recovery and data consistency. It interacts well with HDFS: power data in HDFS can be freely handed over to HDFS for operation, and HBase can also interact with the MapReduce-based power data processing programs.

4-1) Monthly power consumption reminder;

4-2) Reminder mode setting;

4-4) Appliance aging prediction.

(5) WebService: A REST-style WebService is exposed as the source of the platform's data. Data from power sensors is uploaded to the cluster through the WebService and then handed to multiple MapReduce programs for processing. The WebService also lets third-party users upload large batches of data directly, or obtain data directly from the power department. In addition, the REST-style service supports the following operations:

5-1) PUT

PUT http://aaa.com/user1/device1

5-2) DELETE

5-3) GET

5-4) POST

The specific operation process is as follows:

(1) Data upload:

Data from power sensors is uploaded to the cluster through the WebService. Data collected by users or by the power department can also be uploaded directly through the WebService. Specifically, the data is sent to the actual URL with a POST operation.

(2) Data processing:

(3) Data display and management:

Data is displayed through the website, and an APP is developed by calling the WebService. Both the website and the APP can display recent power consumption, infer users with similar consumption behavior, and recommend more power-efficient devices. In addition, users can operate directly on their own device data through the WebService, for example to add a device. Once a device number is updated, the displayed data is updated immediately.

Claims

1. A design scheme for a Hadoop-based power cloud platform, characterized by comprising the modules HDFS, MapReduce, HBase, WebService, a website, and an APP, and specifically comprising the following operation process:

(1) data upload:

the data of the power sensor is uploaded to the cluster through the WebService, and data collected by a user or a power department can also be uploaded directly through the WebService;

(2) data processing:

after the data is uploaded, classification and statistical operations are performed by default, and power consumption is predicted at a later stage, wherein these steps are composed of a plurality of MapReduce programs, each responsible for particular operations on the data, and the results are finally written into HBase;

(3) data display and management:

data is displayed through the website, and an APP is developed by calling the WebService; the website and the APP can display recent power consumption, infer users with similar power consumption behavior, and recommend power-saving equipment.

Wherein,

the HDFS serves as a basic file system, provides high-throughput data access, and can interact with MapReduce, HBase and the like;

the MapReduce is responsible for processing data in the HDFS and the HBase and writing the data back to the HBase at regular intervals, and the specific processing operations include:

2-1) counting the total electricity consumption of a single user;

2-2) grading the electricity consumption of the user;

2-3) predicting the electricity consumption of the user over the next 24 hours;

2-4) classifying the electrical equipment (according to the power consumption condition);

2-5) analyzing the historical electricity consumption of the electrical equipment;

2-6) some other statistical operations;

the HBase replaces a traditional database, provides disaster recovery and data consistency functions, can interact with the HDFS, such that power data in the HDFS can be freely handed over to the HDFS for operation, and can also interact with power data processing programs written based on MapReduce;

the website is written based on JSP and J2EE, and the APP includes an Android version and an iOS version; both provide user access, graphical display of data, and customized services, specifically comprising:

4-1) reminding the user of electricity consumption every month;

4-2) setting a reminding mode;

4-3) recommending power-saving appliance models;

4-4) predicting the aging of the electric appliance;

the WebService is used as the data source: the data of the power sensor is uploaded to the cluster through the WebService and then processed by a plurality of MapReduce programs; meanwhile, the WebService also supports third-party users directly uploading large batches of data, or directly acquiring data from the power department through the WebService; in addition, the REST-style service supports the following operations:

5-1) PUT: uploading power data to the cloud platform;

5-2) DELETE: deleting the corresponding power data;

5-3) GET: acquiring power data;

5-4) POST: modifying data in the cloud platform.