CN105321124A - Hadoop-based electric power cloud platform design scheme


Info

Publication number
CN105321124A
Authority
CN
China
Prior art keywords
data
power
hbase
webservice
mapreduce
Prior art date
Legal status
Pending
Application number
CN201510817806.4A
Other languages
Chinese (zh)
Inventor
刘琦
蔡卫东
肖博
沈剑
付章杰
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
2015-11-23
Filing date
2015-11-23
Publication date
2016-02-10
Application filed by Nanjing University of Information Science and Technology
Priority to CN201510817806.4A
Publication of CN105321124A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a Hadoop-based electric power cloud platform design scheme. HDFS and HBase improve the efficiency of data storage and indexing; MapReduce is used for offline analysis and efficient processing of the data; and a REST-standard Web Service makes it easy to upload user data and convenient for third parties to develop on the platform. The scheme addresses the low efficiency and poor performance of conventional data processing and the low efficiency of maintaining data consistency on a conventional single machine, ensures stable cluster operation, and achieves high throughput. Because it uses standardized data interfaces, it also avoids the cost of repeated development.

Description

Hadoop-based electric power cloud platform design scheme
Technical Field
The invention relates to distributed computing and distributed storage technologies, belongs to the field of cloud computing, and in particular relates to a design scheme for a Hadoop-based power cloud platform.
Background
With the growing number of electrical devices, power data is becoming larger and more complex. The traditional data processing mode struggles with TB- and PB-scale data; even when such data can be processed correctly, efficiency is low. Second, the consistency and reliability of the data cannot be guaranteed. Even where a complete system exists, its throughput is limited and it offers no suitable interface through which third parties can access the data.
Cloud computing is widely studied in both academia and industry. Owing to characteristics such as large scale, virtualization, fault control among connected components, and asynchronous communication, it has unique advantages. Requirements for distributed systems and on-demand services, computing power, and storage resources are becoming increasingly stringent. MapReduce, GFS, and BigTable, proposed by Google, address the difficulties of traditional power data processing well, and MapReduce has long been the most popular distributed programming model in cloud environments. With MapReduce on cloud infrastructure, large data sets can be processed easily and efficiently.
The Hadoop-based power cloud platform design scheme makes comprehensive use of Apache Hadoop, flexibly combining MapReduce, the Hadoop Distributed File System (HDFS), HBase, and related components, so that it can process data efficiently and solve the problem of data consistency.
Disclosure of Invention
The invention aims to provide a Hadoop-based electric power cloud platform design scheme that overcomes the low efficiency and poor performance of traditional data processing, the low efficiency of data processing on a single machine, and the difficulty, even with distributed computation, of guaranteeing data consistency and cluster stability.
The technical scheme is as follows:
The invention provides a Hadoop-based power cloud platform design scheme consisting mainly of the following modules: HDFS, MapReduce, HBase, WebService, and the website. HDFS serves as the distributed file storage system for general storage of power data; data that needs to be queried conveniently is also stored in HBase. MapReduce programs read data from HBase or HDFS, process it, and write the results back. In addition, the website can read from and write to HBase. Both the APP and the incoming power data interact with HBase through the WebService.
The specific operation process is as follows:
(1) Data uploading:
Data from the power sensors is uploaded to our cluster through the WebService; data collected by users or the power department can also be uploaded directly through the WebService.
(2) Data processing:
After data is uploaded, classification and statistical operations are performed by default, and power consumption can be predicted at a later stage. These steps consist of multiple MapReduce programs, each responsible for part of the data operations, with the final results written into HBase.
(3) Data display and management:
Data is displayed through the website, and the APP is developed by calling the WebService. The website and the APP display recent electricity consumption, infer users with similar electricity consumption behavior, and recommend devices with relatively low electricity consumption.
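The three-step flow above can be sketched as a single-process Python simulation. It is a minimal illustration under stated assumptions, not the platform's code: a list stands in for raw files in HDFS, a dictionary stands in for an HBase table, a plain loop stands in for the MapReduce programs, and the record layout `(user, device, date, kWh)` and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-ins: a list for raw files in HDFS, a dict for an HBase table.
hdfs_raw: list[tuple[str, str, str, float]] = []
hbase_usage: dict[str, float] = {}


def upload(user: str, device: str, date: str, kwh: float) -> None:
    """(1) Data uploading: the WebService stores each reading as raw data."""
    hdfs_raw.append((user, device, date, kwh))


def process() -> None:
    """(2) Data processing: a batch job (standing in for the MapReduce programs)
    recomputes per-user totals from the raw data and writes them to 'HBase'."""
    totals: dict[str, float] = defaultdict(float)
    for user, _device, _date, kwh in hdfs_raw:
        totals[user] += kwh
    hbase_usage.clear()
    hbase_usage.update(totals)


def display(user: str) -> str:
    """(3) Data display: the website/APP reads only the processed results."""
    return f"{user}: {hbase_usage.get(user, 0.0):.1f} kWh"
```

Keeping raw readings and processed results in separate stores mirrors the split described in the text: HDFS holds the bulk data, while the query-facing results are served from HBase.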
The specific functions of the modules are as follows:
(1) HDFS: in a cloud environment, a traditional file system cannot meet users' requirements for data disaster recovery, data consistency, and the like. HDFS is a distributed file system that runs on general-purpose hardware. It has many similarities with existing distributed file systems, but it is also highly fault-tolerant and suitable for deployment on inexpensive machines. In this design scheme, HDFS is the basic file system: it provides high-throughput data access and exchanges data with MapReduce, HBase, and other components.
(2) MapReduce: because power data is transmitted at high frequency, in large volumes, and for a large number of users, traditional processing programs cannot meet the requirements. MapReduce is a programming model for parallel processing of large-scale data sets (larger than 1 TB). In this scheme, MapReduce processes data in HDFS and HBase and periodically writes the results back to HBase. The power cloud platform mainly comprises several MapReduce-based statistics programs, an SVM prediction model program, and other programs. The specific processing operations are as follows:
2-1) counting the total electricity consumption of each user;
2-2) grading users' electricity consumption;
2-3) predicting each user's electricity consumption over the next 24 hours;
2-4) classifying electrical equipment according to power consumption;
2-5) analyzing the historical electricity consumption of electrical equipment;
2-6) other statistical operations.
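Operations 2-1 and 2-2 fit naturally into a single MapReduce job. The sketch below is written in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated `key<TAB>value` lines and the framework sorts mapper output by key before the reducer sees it; here `sorted()` stands in for that shuffle so the example runs locally. The input line format `user,device,date,kWh` and the grade thresholds are hypothetical, not taken from the patent.

```python
from itertools import groupby
from typing import Iterable, Iterator

# Hypothetical grade boundaries in kWh as (upper bound, label); not from the patent.
GRADES = [(200.0, "low"), (400.0, "medium"), (float("inf"), "high")]


def grade(total_kwh: float) -> str:
    """2-2) Map a user's total consumption onto a grade label."""
    for upper, label in GRADES:
        if total_kwh <= upper:
            return label
    return GRADES[-1][1]


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'user<TAB>kWh' for each hypothetical CSV reading 'user,device,date,kWh'."""
    for line in lines:
        user, _device, _date, kwh = line.strip().split(",")
        yield f"{user}\t{kwh}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """2-1) Sum kWh per user (input must be sorted by key), then attach the 2-2 grade."""
    pairs = (line.split("\t") for line in sorted_lines)
    for user, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(float(kwh) for _user, kwh in group)
        yield f"{user}\t{total:.1f}\t{grade(total)}"


if __name__ == "__main__":
    readings = [
        "user2,device1,20150101,250.0",
        "user1,device1,20150101,120.0",
        "user1,light1,20150102,30.5",
        "user2,device2,20150102,180.0",
    ]
    # sorted() plays the role of Hadoop's shuffle-and-sort between the two phases.
    for out in reducer(sorted(mapper(readings))):
        print(out)
```

Under Hadoop Streaming proper, the two functions would become separate scripts reading standard input and writing standard output.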
(3) HBase is a distributed, column-oriented, open-source database: an open-source implementation of Google Bigtable, a distributed storage system for structured data. HBase is a column-oriented storage database, one of the NoSQL databases, and is suited to storing unstructured data. In this design scheme, HBase replaces a traditional database and provides disaster recovery, data consistency, and similar functions. It integrates well with HDFS, can freely exchange power data with HDFS for processing, and can also interact with the power data processing programs written on MapReduce.
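HBase keeps rows sorted lexicographically by row key, so a key that begins with the user and device turns per-user and per-device reads into contiguous range scans. The sketch below models that property with a sorted Python list and `bisect`. The row-key layout `user/device/yyyymmdd` is a hypothetical choice that mirrors the WebService URL paths, not a schema given in the patent, and the class does not use the real HBase client API.

```python
import bisect


class SortedRowStore:
    """Toy model of an HBase table: rows kept in lexicographic row-key order."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._values: dict[str, float] = {}

    def put(self, row_key: str, kwh: float) -> None:
        if row_key not in self._values:
            bisect.insort(self._keys, row_key)  # keep keys sorted, as HBase does
        self._values[row_key] = kwh

    def scan_prefix(self, prefix: str) -> list[tuple[str, float]]:
        """Return all rows whose key starts with prefix (one contiguous range)."""
        start = bisect.bisect_left(self._keys, prefix)
        rows = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # sorted order means no later key can match
            rows.append((key, self._values[key]))
        return rows


def row_key(user: str, device: str, date: str) -> str:
    """Hypothetical row-key layout: user/device/yyyymmdd."""
    return f"{user}/{device}/{date}"


if __name__ == "__main__":
    table = SortedRowStore()
    table.put(row_key("user1", "device1", "20150102"), 4.0)
    table.put(row_key("user1", "device1", "20150101"), 3.5)
    table.put(row_key("user2", "device1", "20150101"), 2.0)
    print(table.scan_prefix("user1/device1/"))
```

The trailing `/` in the scan prefix keeps `user1` from also matching `user10`.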
(4) The website is built with JSP and J2EE, and the APP comes in Android and iOS versions. They give users access to the platform, display data graphically, and provide customized services, specifically the following functions:
4-1) reminding users of their electricity consumption every month;
4-2) setting the reminder mode;
4-3) recommending energy-saving appliance models;
4-4) predicting the aging of electrical appliances.
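Functions 4-1 and 4-2 amount to aggregating daily readings by month and notifying the user through a chosen channel. A minimal sketch follows; the `yyyymmdd` date keys mirror the WebService URLs, while the channel names, the consumption limit, and the message text are hypothetical.

```python
from collections import defaultdict

# Hypothetical reminder channels a user could choose from (4-2).
CHANNELS = {"sms", "email", "app"}


def monthly_totals(daily: dict[str, float]) -> dict[str, float]:
    """Sum daily kWh readings keyed by 'yyyymmdd' into 'yyyymm' totals."""
    totals: dict[str, float] = defaultdict(float)
    for date, kwh in daily.items():
        totals[date[:6]] += kwh
    return dict(totals)


def monthly_reminders(daily: dict[str, float], channel: str, limit_kwh: float) -> list[str]:
    """4-1) Build one reminder per month; flag months above the user's (hypothetical) limit."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown reminder channel: {channel}")
    messages = []
    for month, total in sorted(monthly_totals(daily).items()):
        note = " (above your limit)" if total > limit_kwh else ""
        messages.append(f"[{channel}] {month}: {total:.1f} kWh{note}")
    return messages


if __name__ == "__main__":
    readings = {"20150101": 10.0, "20150115": 12.5, "20150203": 30.0}
    for msg in monthly_reminders(readings, channel="app", limit_kwh=25.0):
        print(msg)
```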
(5) WebService: a WebService following the open REST standard serves as the data source. Data from the power sensors is uploaded to the cluster through the WebService and then processed by several MapReduce programs. The WebService also lets third-party users upload large batches of data directly, or acquire data directly from the power department. This REST-standard service supports the following operations:
5-1) PUT
Uploads power data to the cloud platform, for example:
PUT http://aaa.com/user1/device1
5-2) DELETE
Deletes the corresponding power data, for example the data for 20150101:
DELETE http://aaa.com/user1/device1/20150101
5-3) GET
Obtains power data, for example the total for a given day; the data for 20150101 is obtained with:
GET http://aaa.com/user1/device1/20150101
5-4) POST
Modifies data in the cloud platform, for example the data of light1:
POST http://aaa.com/user1/device1/light1
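The four operations can be sketched as a small dispatcher that maps an HTTP method and URL path onto an in-memory store. This is an illustration only: the path layout follows the example URLs (`/user/device` and `/user/device/<item>`), but the store, the function name, the status codes, and treating the last path segment as a record key are all assumptions. A real deployment would sit behind a web framework or servlet container and write to HBase.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical in-memory store standing in for HBase: {(user, device): {item: value}}.
store: dict[tuple[str, str], dict[str, float]] = {}


def handle(method: str, url: str, body: Optional[dict] = None) -> tuple[int, object]:
    """Dispatch a REST call with PUT/DELETE/GET/POST semantics; returns (status, payload)."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) not in (2, 3):
        return 404, "unknown resource"
    user, device = parts[0], parts[1]
    item = parts[2] if len(parts) == 3 else None
    records = store.setdefault((user, device), {})

    if method == "PUT" and item is None:
        records.update(body or {})  # 5-1) upload a batch of readings
        return 201, len(records)
    if method == "DELETE" and item is not None:
        if item not in records:
            return 404, "no such record"
        del records[item]  # 5-2) delete one record, e.g. one day's data
        return 204, None
    if method in ("GET", "POST") and item is not None:
        if item not in records:
            return 404, "no such record"
        if method == "POST":  # 5-4) modify an existing record
            records[item] = float((body or {}).get("value", records[item]))
        return 200, records[item]  # 5-3) read the record
    return 405, "method not allowed"
```

For example, `handle("GET", "http://aaa.com/user1/device1/20150101")` returns `(404, "no such record")` until a matching PUT has stored that day's reading.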
Advantageous Effects
The Hadoop-based power cloud platform design scheme processes data efficiently, solves the problem of data consistency, ensures stable cluster operation, and achieves high throughput. It provides higher-standard data interfaces, supports third-party data access, and avoids the cost of repeated development.
Drawings
FIG. 1 is a structural diagram of a design scheme of a Hadoop-based power cloud platform.
FIG. 2 is a data flow diagram implemented by a Hadoop-based power cloud platform design scheme.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings and embodiments, taking an actual deployment scenario as an example.
The platform structure is shown in FIG. 1. The power cloud platform comprises HDFS, MapReduce, HBase, WebService, and the website. HDFS serves as the distributed file storage system for general storage of power data; data that needs to be queried conveniently is also stored in HBase. MapReduce programs read data from HBase or HDFS, process it, and write the results back. In addition, the website can read from and write to HBase. Both the APP and the incoming power data interact with HBase through the WebService.
Further, the cloud platform consists of 1 master node, 1 secondary master node, and 20 slave nodes. Of the slave nodes, 15 are physical machines, each with a 500 GB hard disk and 4 GB of memory, and 5 are virtual machines created with Oracle VM VirtualBox from Oracle Corporation: 2 of the virtual machines are configured with a 200 GB hard disk and 4 GB of memory, and the other 3 with 8 GB of memory.
(1) HDFS: in a cloud environment, a traditional file system cannot meet users' requirements for data disaster recovery, data consistency, and the like. HDFS is a distributed file system that runs on general-purpose hardware. It has many similarities with existing distributed file systems, but it is also highly fault-tolerant and suitable for deployment on inexpensive machines. In this design scheme, HDFS is the basic file system: it provides high-throughput data access and exchanges data with MapReduce, HBase, and other components. The HDFS replication factor is set to 3 on every node, so data integrity is preserved when a single node goes down.
(2) MapReduce: because power data is transmitted at high frequency, in large volumes, and for a large number of users, traditional processing programs cannot meet the requirements. MapReduce is a programming model for parallel processing of large-scale data sets (larger than 1 TB). In this scheme, MapReduce processes data in HDFS and HBase and periodically writes the results back to HBase. The power cloud platform mainly comprises several MapReduce-based statistics programs, an SVM prediction model program, and other programs. Each node running MapReduce is allocated 1 GB of memory and 1 processor. Processing the power data spawns 200 map tasks and 40 reduce tasks, and the results are written back to HBase once processing finishes.
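The effect of a replication factor of 3 can be checked with a small simulation: place each block on three distinct nodes, remove any single node, and confirm that every block still has a live replica. The placement here is simple round-robin over hypothetical node names; real HDFS uses a rack-aware placement policy, and on an actual cluster the factor is set through the `dfs.replication` property in `hdfs-site.xml`.

```python
from itertools import cycle


def place_blocks(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, set[str]]:
    """Assign each block to `replication` distinct nodes (round-robin, NOT rack-aware)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    ring = cycle(nodes)
    # Consecutive draws from the ring are distinct because replication <= len(nodes).
    return {block: {next(ring) for _ in range(replication)} for block in range(num_blocks)}


def survives_any_single_failure(placement: dict[int, set[str]], nodes: list[str]) -> bool:
    """True if, whichever single node fails, every block keeps at least one replica."""
    return all(
        all(replicas - {failed} for replicas in placement.values())
        for failed in nodes
    )


if __name__ == "__main__":
    slaves = [f"slave{i:02d}" for i in range(1, 21)]  # hypothetical names for the 20 slave nodes
    print(survives_any_single_failure(place_blocks(100, slaves, replication=3), slaves))
    print(survives_any_single_failure(place_blocks(100, slaves, replication=1), slaves))
```

With a factor of 1 the check fails as soon as the node holding any block is removed, which is the situation the factor of 3 is meant to rule out.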
The specific processing operations are as follows:
2-1) counting the total electricity consumption of each user;
2-2) grading users' electricity consumption;
2-3) predicting each user's electricity consumption over the next 24 hours;
2-4) classifying electrical equipment according to power consumption;
2-5) other statistical operations.
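With 40 reduce tasks, every intermediate key must be routed to exactly one reducer so that all readings for one user are summed in one place. Hadoop's default `HashPartitioner` does this by taking the key's hash, masked to be non-negative, modulo the number of reduce tasks. The sketch below imitates that scheme with `zlib.crc32` (chosen only because Python's built-in string `hash` changes between runs) and measures how evenly hypothetical user IDs spread across 40 partitions.

```python
import zlib
from collections import Counter

NUM_REDUCE_TASKS = 40  # reduce-task count stated for the deployment


def partition(key: str, num_reducers: int = NUM_REDUCE_TASKS) -> int:
    """Route a key to a reducer: non-negative hash modulo the reducer count."""
    # Same shape as Hadoop's HashPartitioner, (hash & MAX_INT) % n,
    # but using CRC32 so the result is stable across Python runs.
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def load_per_reducer(keys: list[str], num_reducers: int = NUM_REDUCE_TASKS) -> Counter:
    """Count how many distinct keys each reducer would receive."""
    return Counter(partition(k, num_reducers) for k in keys)


if __name__ == "__main__":
    users = [f"user{i}" for i in range(10_000)]  # hypothetical user IDs
    load = load_per_reducer(users)
    print(f"reducers used: {len(load)}, min keys: {min(load.values())}, max keys: {max(load.values())}")
```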
(3) HBase is a distributed, column-oriented, open-source database: an open-source implementation of Google Bigtable, a distributed storage system for structured data. HBase is a column-oriented storage database, one of the NoSQL databases, and is suited to storing unstructured data. In this design scheme, HBase replaces a traditional database and provides disaster recovery, data consistency, and similar functions. It integrates well with HDFS, can freely exchange power data with HDFS for processing, and can also interact with the power data processing programs written on MapReduce.
(4) The website is built with JSP and J2EE, and the APP comes in Android and iOS versions. They give users access to the platform, display data graphically, and provide customized services, specifically the following functions:
4-1) reminding users of their electricity consumption every month;
4-2) setting the reminder mode;
4-3) recommending energy-saving appliance models;
4-4) predicting the aging of electrical appliances.
(5) WebService: a WebService following the open REST standard serves as the data source. Data from the power sensors is uploaded to the cluster through the WebService and then processed by several MapReduce programs. The WebService also lets third-party users upload large batches of data directly, or acquire data directly from the power department. This REST-standard service supports the following operations:
5-1) PUT
Uploads power data to the cloud platform, for example:
PUT http://aaa.com/user1/device1
5-2) DELETE
Deletes the corresponding power data, for example the data for 20150101:
DELETE http://aaa.com/user1/device1/20150101
5-3) GET
Obtains power data, for example the total for a given day; the data for 20150101 is obtained with:
GET http://aaa.com/user1/device1/20150101
5-4) POST
Modifies data in the cloud platform, for example the data of light1:
POST http://aaa.com/user1/device1/light1
The specific operation process is as follows:
(1) Data uploading:
Data from the power sensors is uploaded to our cluster through the WebService; data collected by users or the power department can also be uploaded directly through the WebService. Specifically, a POST request is sent to the actual service address.
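A sensor-side or third-party client only needs to issue an HTTP request to the service. The sketch below builds such a request with Python's standard `urllib.request` without sending it. The JSON body layout (date mapped to kWh) and the helper name are hypothetical, and the HTTP method is a parameter because this passage names POST while the operation list uses PUT for uploads. The host `aaa.com` is the placeholder from the patent's examples.

```python
import json
import urllib.request


def build_upload_request(user: str, device: str, readings: dict[str, float],
                         method: str = "POST") -> urllib.request.Request:
    """Build (but do not send) an upload request for one device's readings."""
    url = f"http://aaa.com/{user}/{device}"
    body = json.dumps(readings).encode("utf-8")  # hypothetical payload format
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method=method,
    )


if __name__ == "__main__":
    req = build_upload_request("user1", "device1", {"20150101": 3.5})
    print(req.get_method(), req.full_url)
    # Against a live service the request would be sent with:
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     print(resp.status)
```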
(2) Data processing:
After data is uploaded, classification and statistical operations are performed by default, and power consumption can be predicted at a later stage. These steps consist of multiple MapReduce programs, each responsible for part of the data operations, with the final results written into HBase.
(3) Data display and management:
Data is displayed through the website, and the APP is developed by calling the WebService. The website and the APP display recent electricity consumption, infer users with similar electricity consumption behavior, and recommend devices with relatively low electricity consumption. In addition, users can operate on their own device data directly through the WebService, for example by adding devices; once the number of devices is updated, the displayed data is updated immediately.

Claims (1)

1. A design scheme for a Hadoop-based power cloud platform, characterized by comprising the modules HDFS, MapReduce, HBase, WebService, website, and APP, and specifically comprising the following operation processes:
(1) data uploading:
data from the power sensors is uploaded to the cluster through the WebService, and data collected by users or the power department can also be uploaded directly through the WebService;
(2) data processing:
after data is uploaded, classification and statistical operations are performed by default, and power consumption can be predicted at a later stage; this step consists of a plurality of MapReduce programs, each responsible for part of the data operations, with the results finally written into HBase;
(3) data display and management:
data is displayed through the website, and the APP is developed by calling the WebService; the website and the APP display recent electricity consumption, infer users with similar electricity consumption behavior, and recommend electricity-saving equipment.
Wherein,
HDFS serves as the basic file system, provides high-throughput data access, and can interact with MapReduce, HBase, and the like;
the MapReduce programs process data in HDFS and HBase and periodically write the results back to HBase, the specific processing operations including:
2-1) counting the total electricity consumption of each user;
2-2) grading users' electricity consumption;
2-3) predicting each user's electricity consumption over the next 24 hours;
2-4) classifying electrical equipment according to power consumption;
2-5) analyzing the historical electricity consumption of electrical equipment;
2-6) other statistical operations;
HBase replaces a traditional database, provides disaster recovery and data consistency, interacts with HDFS, can freely exchange power data with HDFS for processing, and can also interact with power data processing programs written on MapReduce;
the website is built with JSP and J2EE, and the APP has Android and iOS versions; they provide user access, display data graphically, and provide customized services, specifically including:
4-1) reminding users of their electricity consumption every month;
4-2) setting the reminder mode;
4-3) recommending energy-saving appliance models;
4-4) predicting the aging of electrical appliances;
the WebService serves as the data source: data from the power sensors is uploaded to the cluster through the WebService and then processed by a plurality of MapReduce programs; the WebService also allows third-party users to upload large batches of data directly, or to acquire data directly from the power department; this REST-standard service supports the following operations:
5-1) PUT: uploading power data to the cloud platform;
5-2) DELETE: deleting the corresponding power data;
5-3) GET: acquiring power data;
5-4) POST: modifying data in the cloud platform.
CN201510817806.4A 2015-11-23 2015-11-23 Hadoop-based electric power cloud platform design scheme Pending CN105321124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510817806.4A CN105321124A (en) 2015-11-23 2015-11-23 Hadoop-based electric power cloud platform design scheme

Publications (1)

Publication Number Publication Date
CN105321124A (en) 2016-02-10

Family

ID=55248454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510817806.4A Pending CN105321124A (en) 2015-11-23 2015-11-23 Hadoop-based electric power cloud platform design scheme

Country Status (1)

Country Link
CN (1) CN105321124A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106055678A (en) * 2016-06-07 2016-10-26 国网河南省电力公司电力科学研究院 Hadoop-based panoramic big data distributed storage method
CN107508880A (en) * 2017-08-20 2017-12-22 成都才智圣有科技有限责任公司 Data-storage system based on big data processing
WO2018099406A1 (en) * 2016-11-29 2018-06-07 中兴通讯股份有限公司 Method for realizing mobile cloud-computing intermediate platform and method for realizing distribution

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169505A (en) * 2011-05-16 2011-08-31 苏州两江科技有限公司 Recommendation system building method based on cloud computing
CN102955977A (en) * 2012-11-16 2013-03-06 国家电气设备检测与工程能效测评中心(武汉) Energy efficiency service method and energy efficiency service platform adopting same on basis of cloud technology
CN103136335A (en) * 2013-01-31 2013-06-05 北京千分点信息科技有限公司 Data control method based on data platforms
CN203149803U (en) * 2013-03-21 2013-08-21 衡水供电公司 Summarization and analysis system for electric power data
CN104135516A (en) * 2014-07-29 2014-11-05 浪潮软件集团有限公司 Distributed cloud storage method based on industry data acquisition
CN104361110A (en) * 2014-12-01 2015-02-18 广东电网有限责任公司清远供电局 Mass electricity consumption data analysis system as well as real-time calculation method and data mining method
CN104394211A (en) * 2014-11-21 2015-03-04 浪潮电子信息产业股份有限公司 Hadoop-based user behavior analysis system design and implementation method
CN104468220A (en) * 2014-12-11 2015-03-25 汤亿则 Early warning control platform of power telecommunication network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160210
