CN105321124A - Hadoop-based electric power cloud platform design scheme - Google Patents
- Publication number
- CN105321124A (application CN201510817806.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- power
- hbase
- webservice
- mapreduce
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a Hadoop-based electric power cloud platform design scheme. HDFS and HBase increase the efficiency of data storage and indexing, and MapReduce is used for offline analysis and efficient processing of data. A RESTful Web Service makes it easy to upload user data and convenient for third parties to develop against the platform. The scheme addresses the low efficiency and poor performance of conventional data processing, including the inefficiency of maintaining data consistency under conventional single-machine processing. It keeps the cluster running stably with high throughput, and its higher-standard data interfaces avoid the cost of repeated development.
Description
Technical Field
The invention relates to a distributed computing technology and a distributed storage technology, belongs to the field of cloud computing, and particularly relates to a design scheme of a Hadoop-based power cloud platform.
Background
With the growing number of electrical devices, power data is becoming larger and more complex. Traditional data processing approaches struggle with TB- and PB-scale data, and even when such data can be processed correctly, efficiency is low. In addition, data consistency and reliability cannot be guaranteed, and even where a complete system exists, its throughput is limited and it offers no suitable interface through which third parties can access the data.
Cloud computing is widely studied in both academia and industry. Its characteristics, such as large scale, virtualization, failure control between connected components, and asynchronous communication, give it unique advantages. Requirements for distributed systems, on-demand services, computing power, and storage resources are becoming increasingly stringent. MapReduce, GFS, and BigTable, proposed by Google, address the difficulties of traditional power data processing well, and MapReduce has long been the most popular distributed programming model in cloud environments. With MapReduce on a cloud infrastructure, data sets can be processed easily and efficiently.
The Hadoop-based power cloud platform design scheme makes combined use of Apache Hadoop, flexibly applying MapReduce, the Hadoop Distributed File System (HDFS), HBase, and related components. It processes data efficiently and solves the problem of data consistency.
Disclosure of Invention
The invention aims to provide a Hadoop-based electric power cloud platform design scheme that overcomes the low efficiency and poor performance of traditional data processing, including the low efficiency of processing on a single machine, and the difficulty, even with distributed computing, of guaranteeing data consistency and cluster stability.
The technical scheme is as follows:
The invention provides a Hadoop-based power cloud platform design scheme consisting mainly of the following modules: HDFS, MapReduce, HBase, WebService, and websites. HDFS serves as the distributed file storage system for general storage of power data, while data that need to be queried conveniently are also stored in HBase. MapReduce programs read data from HBase or HDFS, process them, and write the results back. The website can likewise read from and write to HBase, and both the APP and power data sources interact with HBase through WebService.
The specific operation process is as follows:
(1) Data uploading:
Data from power sensors are uploaded to our cluster through WebService. Data collected by users or by the power department can also be uploaded directly through WebService.
(2) Data processing:
After data are uploaded, classification and statistical operations are performed by default, and electricity consumption can be predicted at a later stage. These steps consist of multiple MapReduce programs, each responsible for particular operations on the data, and the results are finally written into HBase.
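The processing step above (several MapReduce programs whose results end up in HBase) can be illustrated with a minimal in-memory sketch. This is plain Python that emulates the map, shuffle, and reduce phases. It is not the Hadoop API. The record layout `(user, device, date, kWh)`, the row keys, and the column names such as `stat:total_kwh` are hypothetical, and a dict stands in for the HBase table.

```python
from collections import defaultdict

# Hypothetical input record: (user_id, device_id, date "YYYYMMDD", kWh).
# The record layout is an assumption; the source does not specify one.

def run_job(records, mapper, reducer):
    """Run one MapReduce-style job in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in sorted(groups.items())}

def user_total_mapper(record):
    user, _device, _date, kwh = record
    yield user, kwh

def user_day_mapper(record):
    user, _device, date, kwh = record
    yield f"{user}#{date}", kwh

def total_reducer(values):
    return round(sum(values), 3)

def process_uploaded(records):
    """Run two jobs in sequence and write both results into a dict standing in for HBase."""
    table = {}  # hypothetical layout: row key -> {"family:qualifier": value}
    for row, total in run_job(records, user_total_mapper, total_reducer).items():
        table.setdefault(row, {})["stat:total_kwh"] = total
    for row, total in run_job(records, user_day_mapper, total_reducer).items():
        table.setdefault(row, {})["stat:daily_kwh"] = total
    return table
```

On a real cluster, each job would be a separate MapReduce program, and the final loop would write results through an HBase client or output format rather than into a dict.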
(3) Data display and management:
Data are displayed through a website, and WebService is called to develop the APP. The website and the APP display recent electricity consumption, infer users with similar electricity consumption behavior, and recommend devices with relatively low electricity consumption.
The specific functions of the modules are as follows:
(1) HDFS. In a cloud environment, a traditional file system cannot meet users' requirements for data disaster recovery, data consistency, and the like. HDFS is a distributed file system designed to run on general-purpose hardware. It has many similarities with existing distributed file systems, but it is also highly fault-tolerant and suitable for deployment on inexpensive machines. In this design scheme, HDFS serves as the base file system, provides high-throughput data access, and exchanges data with MapReduce, HBase, and other components.
(2) MapReduce. Power data are transmitted at high frequency, in large volumes, and for a large number of users, so traditional processing programs cannot meet the requirements. MapReduce is a programming model for parallel processing of large-scale data sets (larger than 1 TB). In this scheme, MapReduce processes data in HDFS and HBase and periodically writes the results back to HBase. The power cloud platform mainly comprises several MapReduce-based statistics programs, an SVM prediction model program, and other programs. The specific processing operations are as follows:
2-1) counting the total electricity consumption of a single user;
2-2) grading the electricity consumption of the user;
2-3) predicting the user's electricity consumption over the next 24 hours;
2-4) classifying electrical equipment (according to its power consumption);
2-5) analyzing the historical electricity consumption of electrical equipment;
2-6) other statistical operations.
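Operations 2-2 (grading users) and 2-4 (classifying equipment by consumption) can be sketched as threshold lookups applied to reducer output. The source gives no thresholds. The grade boundaries (kWh per month) and class boundaries (average kW) below are hypothetical values chosen only for illustration.

```python
import bisect

# Hypothetical monthly grade boundaries in kWh; the source gives no thresholds.
GRADE_BOUNDS = [200.0, 400.0]  # <200 -> grade 1, 200..<400 -> grade 2, >=400 -> grade 3

def grade_user(monthly_kwh: float) -> int:
    """Operation 2-2: map a user's monthly total to a 1-based consumption grade."""
    return bisect.bisect_right(GRADE_BOUNDS, monthly_kwh) + 1

# Hypothetical class boundaries on average power draw in kW.
CLASS_BOUNDS = [0.1, 1.0]
CLASS_LABELS = ["low", "medium", "high"]

def classify_devices(readings: dict[str, list[float]]) -> dict[str, str]:
    """Operation 2-4: label each device by its average power draw."""
    labels = {}
    for device, samples in readings.items():
        average = sum(samples) / len(samples)
        labels[device] = CLASS_LABELS[bisect.bisect_right(CLASS_BOUNDS, average)]
    return labels
```

The functions use `bisect_right`, so a value that falls exactly on a boundary is placed in the higher grade. For example, 200 kWh is grade 2.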
(3) HBase is a distributed, column-oriented, open-source database. It is an open-source implementation of Google Bigtable, a distributed storage system for structured data. HBase is a column-oriented store in the NoSQL family and is well suited to storing unstructured data. In this design scheme, HBase replaces a traditional database and provides disaster recovery, data consistency, and similar functions. It interacts well with HDFS, so power data can move freely between HDFS and HBase for processing. HBase also interacts with power data processing programs written with MapReduce.
(4) The website is written with JSP and J2EE, and the APP comes in Android and iOS versions. Both give users access to the platform, display data graphically, and provide customized services. Specifically, they offer the following functions:
4-1) monthly reminders of the user's electricity consumption;
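HBase keeps rows sorted lexicographically by row key, so a composite key makes per-user or per-device queries into contiguous range scans. The sketch below models this with a sorted list. The key layout `user#device#YYYYMMDD` and the column name `d:kwh` are hypothetical; the source does not describe its schema. `ToyTable` is an illustrative stand-in, not an HBase client.

```python
import bisect

def row_key(user: str, device: str, date: str) -> str:
    """Hypothetical composite row key; '#' keeps 'user1#' from matching 'user10#'."""
    return f"{user}#{device}#{date}"

class ToyTable:
    """In-memory stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self):
        self._keys: list[str] = []
        self._rows: dict[str, dict[str, float]] = {}

    def put(self, key: str, column: str, value: float) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)  # maintain lexicographic order
            self._rows[key] = {}
        self._rows[key][column] = value

    def scan_prefix(self, prefix: str) -> list[tuple[str, dict[str, float]]]:
        """Return all rows whose key starts with prefix, in key order (a range scan)."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result
```

Because the date is stored as `YYYYMMDD`, lexicographic order is also chronological order. A scan over `user1#device1#` therefore returns that device's history oldest first.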
4-2) setting the reminder mode;
4-3) recommending electricity-saving appliance models;
4-4) predicting appliance aging.
(5) WebService is an open, REST-standard web service that serves as the data source. Data from power sensors are uploaded to the cluster through it and then processed by several MapReduce programs. WebService also lets third parties upload large batches of data directly, or obtain data directly from the power department. In accordance with the REST standard, the service supports the following operations:
5-1)PUT
Uploads power data to the cloud platform. Usage:
PUT http://aaa.com/user1/device1
5-2)DELETE
Deletes power data, for example the data for 20150101. Usage:
DELETE http://aaa.com/user1/device1/20150101
5-3)GET
Retrieves power data, for example the total for a given day. To retrieve the data for 20150101:
GET http://aaa.com/user1/device1/20150101
5-4)POST
Modifies data in the cloud platform, for example the data of light1. Usage:
POST http://aaa.com/user1/device1/light1
Advantageous Effects
The Hadoop-based power cloud platform design scheme processes data efficiently and solves the problem of data consistency. It keeps the cluster running stably with high throughput. It also provides a higher-standard data interface that supports third-party data access and avoids the cost of repeated development.
Drawings
FIG. 1 is a structural diagram of a design scheme of a Hadoop-based power cloud platform.
FIG. 2 is a data flow diagram implemented by a Hadoop-based power cloud platform design scheme.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings and embodiments, taking an actual deployment scenario as an example.
The platform structure is shown in FIG. 1. The power cloud platform comprises HDFS, MapReduce, HBase, WebService, and websites. HDFS serves as the distributed file storage system for general storage of power data, while data that need to be queried conveniently are also stored in HBase. MapReduce programs read data from HBase or HDFS, process them, and write the results back. The website can likewise read from and write to HBase, and both the APP and power data sources interact with HBase through WebService.
Further, the cloud platform consists of 1 master node, 1 secondary master node, and 20 slave nodes. Of the slave nodes, 15 are physical machines, each with a 500 GB hard disk and 4 GB of memory. The other 5 are virtual machines created with Oracle VM VirtualBox from Oracle Corporation. Of these virtual machines, 2 are configured with a 200 GB hard disk and 4 GB of memory, and 3 are configured with 3 hard disks and 8 GB of memory.
(1) HDFS. In a cloud environment, a traditional file system cannot meet users' requirements for data disaster recovery, data consistency, and the like. HDFS is a distributed file system designed to run on general-purpose hardware. It has many similarities with existing distributed file systems, but it is also highly fault-tolerant and suitable for deployment on inexpensive machines. In this design scheme, HDFS serves as the base file system, provides high-throughput data access, and exchanges data with MapReduce, HBase, and other components. The HDFS replication factor is set to 3, so data remain intact when a single node goes down.
(2) MapReduce. Power data are transmitted at high frequency, in large volumes, and for a large number of users, so traditional processing programs cannot meet the requirements. MapReduce is a programming model for parallel processing of large-scale data sets (larger than 1 TB). In this scheme, MapReduce processes data in HDFS and HBase and periodically writes the results back to HBase. The power cloud platform mainly comprises several MapReduce-based statistics programs, an SVM prediction model program, and other programs. Each node running MapReduce is allocated 1 GB of memory and 1 processor. Processing the power data generates 200 map tasks and 40 reduce tasks, and the results are written back to HBase afterwards.
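The claim that a replication factor of 3 keeps data intact after a single node failure can be checked with a small model. This sketch uses simplified round-robin placement over the 20 slave nodes. Real HDFS uses a rack-aware placement policy, so the node names and placement rule here are assumptions made only to show the counting argument.

```python
def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.
    Simplified: real HDFS placement is rack-aware."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {b: [nodes[(b + i) % len(nodes)] for i in range(replication)]
            for b in range(num_blocks)}

def min_replicas_after_failure(placement: dict[int, list[str]], failed: str) -> int:
    """Smallest number of surviving replicas of any block when `failed` goes down."""
    return min(sum(1 for n in replicas if n != failed) for replicas in placement.values())

def survives_any_single_failure(placement: dict[int, list[str]], nodes: list[str]) -> bool:
    """True if every block keeps at least one replica whichever single node fails."""
    return all(min_replicas_after_failure(placement, n) >= 1 for n in nodes)

slaves = [f"slave{i}" for i in range(1, 21)]  # hypothetical names for the 20 slave nodes
```

With three replicas on distinct nodes, losing one node leaves at least two copies of every block. With a single replica, some block is lost when its only node fails.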
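With 40 reduce tasks, each intermediate key must be routed to one reducer. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below reproduces that rule in Python. The key's hash is emulated with Java's `String.hashCode()` as an assumption, since the actual hash depends on the key type (Hadoop's `Text`, for instance, hashes its bytes differently). Keys such as `user7` are hypothetical.

```python
from collections import Counter

def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode(): h = 31*h + char with 32-bit signed overflow.
    Assumes characters in the Basic Multilingual Plane (one UTF-16 unit each)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def partition(key: str, num_reduce_tasks: int = 40) -> int:
    """HashPartitioner-style rule: (hash & Integer.MAX_VALUE) % numReduceTasks."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks

def reducer_loads(keys, num_reduce_tasks: int = 40) -> Counter:
    """Count how many distinct keys each reduce task would receive."""
    return Counter(partition(k, num_reduce_tasks) for k in keys)
```

The same key always lands on the same reducer, so per-user totals are computed in one place. The number of map tasks (200 here) is normally determined by the number of input splits. The reduce count is set in the job configuration, for example with `Job.setNumReduceTasks(40)`.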
The specific processing operations are as follows:
2-1) counting the total electricity consumption of a single user;
2-2) grading the electricity consumption of the user;
2-3) predicting the user's electricity consumption over the next 24 hours;
2-4) classifying electrical equipment (according to its power consumption);
2-5) other statistical operations.
(3) HBase is a distributed, column-oriented, open-source database. It is an open-source implementation of Google Bigtable, a distributed storage system for structured data. HBase is a column-oriented store in the NoSQL family and is well suited to storing unstructured data. In this design scheme, HBase replaces a traditional database and provides disaster recovery, data consistency, and similar functions. It interacts well with HDFS, so power data can move freely between HDFS and HBase for processing. HBase also interacts with power data processing programs written with MapReduce.
(4) The website is written with JSP and J2EE, and the APP comes in Android and iOS versions. Both give users access to the platform, display data graphically, and provide customized services. Specifically, they offer the following functions:
4-1) monthly reminders of the user's electricity consumption;
4-2) setting the reminder mode;
4-3) recommending electricity-saving appliance models;
4-4) predicting appliance aging.
(5) WebService is an open, REST-standard web service that serves as the data source. Data from power sensors are uploaded to the cluster through it and then processed by several MapReduce programs. WebService also lets third parties upload large batches of data directly, or obtain data directly from the power department. In accordance with the REST standard, the service supports the following operations:
5-1)PUT
Uploads power data to the cloud platform. Usage:
PUT http://aaa.com/user1/device1
5-2)DELETE
Deletes power data, for example the data for 20150101. Usage:
DELETE http://aaa.com/user1/device1/20150101
5-3)GET
Retrieves power data, for example the total for a given day. To retrieve the data for 20150101:
GET http://aaa.com/user1/device1/20150101
5-4)POST
Modifies data in the cloud platform, for example the data of light1. Usage:
POST http://aaa.com/user1/device1/light1
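On the service side, each request must be mapped from its method and path to an operation on the platform. The sketch below shows one possible router for the URL shapes listed above (`/user/device` and `/user/device/target`). The operation names and the returned dict are hypothetical. The source does not describe the server implementation.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from HTTP method to platform operation, following 5-1 to 5-4.
OPERATIONS = {"PUT": "upload", "DELETE": "delete", "GET": "query", "POST": "modify"}

def route(method: str, url: str) -> dict:
    """Parse /user/device[/target] and return the requested operation.
    Raises ValueError for unsupported methods or path shapes."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if method not in OPERATIONS or len(parts) not in (2, 3):
        raise ValueError(f"unsupported request: {method} {url}")
    target = parts[2] if len(parts) == 3 else None  # a date such as 20150101, or an item such as light1
    return {"op": OPERATIONS[method], "user": parts[0], "device": parts[1], "target": target}
```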
The specific operation process is as follows:
(1) Data uploading:
Data from power sensors are uploaded to our cluster through WebService. Data collected by users or by the power department can also be uploaded directly through WebService. Specifically, the data are sent to the service's actual URL using the POST operation.
(2) Data processing:
After data are uploaded, classification and statistical operations are performed by default, and electricity consumption can be predicted at a later stage. These steps consist of multiple MapReduce programs, each responsible for particular operations on the data, and the results are finally written into HBase.
(3) Data display and management:
Data are displayed through a website, and WebService is called to develop the APP. The website and the APP display recent electricity consumption, infer users with similar electricity consumption behavior, and recommend devices with relatively low electricity consumption. In addition, users can operate on their own device data directly through WebService, for example by adding devices. Once the device number is updated, the displayed data are updated immediately.
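The source does not say how users with similar electricity consumption behavior are inferred or how low-consumption devices are chosen. One common technique, swapped in here as an assumption, is cosine similarity between daily load profiles followed by picking the lowest-rated model in a category. The profile vectors, model catalog, and function names below are hypothetical.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two load profiles (e.g. 24 hourly kWh values)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar_user(target: str, profiles: dict[str, list[float]]) -> str:
    """Return the other user whose profile has the highest cosine similarity to target's."""
    others = (u for u in profiles if u != target)
    return max(others, key=lambda u: cosine(profiles[target], profiles[u]))

def recommend_model(catalog: list[tuple[str, str, float]], category: str) -> str:
    """Pick the model with the lowest rated consumption in a category.
    catalog entries: (model_name, category, rated_kwh), all hypothetical."""
    return min((kwh, name) for name, cat, kwh in catalog if cat == category)[1]
```

Cosine similarity compares the shape of consumption over the day rather than its magnitude. A household that uses twice as much power at the same hours therefore counts as behaving similarly.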
Claims (1)
1. A Hadoop-based power cloud platform design scheme, characterized by comprising the modules HDFS, MapReduce, HBase, WebService, websites, and APP, and specifically comprising the following operation processes:
(1) data uploading:
the data of the power sensor is uploaded to the cluster through WebService, and the data collected by a user or a power department can also be directly uploaded through WebService;
(2) data processing:
after data are uploaded, classification and statistical operations are performed by default, and power consumption can be predicted at a later stage, wherein this step is composed of a plurality of MapReduce programs, each responsible for particular data operations, with the results finally written into HBase;
(3) data display and management:
data are displayed through a website, and the APP is developed by calling WebService; the website and the APP display recent electricity consumption, infer users with similar electricity consumption behavior, and recommend electricity-saving equipment.
Wherein,
the HDFS serves as a basic file system, provides high-throughput data access, and can interact with MapReduce, HBase and the like;
the MapReduce is responsible for processing data in HDFS and HBase and periodically writing the results back to HBase, and the specific processing operations include:
2-1) counting the total electricity consumption of a single user;
2-2) grading the electricity consumption of the user;
2-3) predicting the user's electricity consumption over the next 24 hours;
2-4) classifying electrical equipment (according to its power consumption);
2-5) analyzing the historical electricity consumption of electrical equipment;
2-6) other statistical operations;
the HBase replaces a traditional database, provides disaster recovery and data consistency functions, and interacts with HDFS, so that power data can move freely between HDFS and HBase for processing; HBase can also interact with power data processing programs written with MapReduce;
the website is written with JSP and J2EE, and the APP includes Android and iOS versions; together they provide user access, display data graphically, and provide customized services, specifically comprising:
4-1) monthly reminders of the user's electricity consumption;
4-2) setting the reminder mode;
4-3) recommending electricity-saving appliance models;
4-4) predicting appliance aging;
the WebService serves as a data source: data of power sensors are uploaded to the cluster through WebService and then processed by a plurality of MapReduce programs; meanwhile, WebService also supports third parties in directly uploading large batches of data, or in directly acquiring data from the power department through WebService; in accordance with the REST standard, the service supports the following operations:
5-1) PUT: uploading power data to the cloud platform;
5-2) DELETE: deleting the corresponding power data;
5-3) GET: acquiring power data;
5-4) POST: modifying data in the cloud platform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510817806.4A CN105321124A (en) | 2015-11-23 | 2015-11-23 | Hadoop-based electric power cloud platform design scheme |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510817806.4A CN105321124A (en) | 2015-11-23 | 2015-11-23 | Hadoop-based electric power cloud platform design scheme |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105321124A true CN105321124A (en) | 2016-02-10 |
Family
ID=55248454
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510817806.4A Pending CN105321124A (en) | 2015-11-23 | 2015-11-23 | Hadoop-based electric power cloud platform design scheme |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105321124A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106055678A (en) * | 2016-06-07 | 2016-10-26 | 国网河南省电力公司电力科学研究院 | Hadoop-based panoramic big data distributed storage method |
CN107508880A (en) * | 2017-08-20 | 2017-12-22 | 成都才智圣有科技有限责任公司 | Data-storage system based on big data processing |
WO2018099406A1 (en) * | 2016-11-29 | 2018-06-07 | 中兴通讯股份有限公司 | Method for realizing mobile cloud-computing intermediate platform and method for realizing distribution |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102169505A (en) * | 2011-05-16 | 2011-08-31 | 苏州两江科技有限公司 | Recommendation system building method based on cloud computing |
CN102955977A (en) * | 2012-11-16 | 2013-03-06 | 国家电气设备检测与工程能效测评中心(武汉) | Energy efficiency service method and energy efficiency service platform adopting same on basis of cloud technology |
CN103136335A (en) * | 2013-01-31 | 2013-06-05 | 北京千分点信息科技有限公司 | Data control method based on data platforms |
CN203149803U (en) * | 2013-03-21 | 2013-08-21 | 衡水供电公司 | Summarization and analysis system for electric power data |
CN104135516A (en) * | 2014-07-29 | 2014-11-05 | 浪潮软件集团有限公司 | Distributed cloud storage method based on industry data acquisition |
CN104361110A (en) * | 2014-12-01 | 2015-02-18 | 广东电网有限责任公司清远供电局 | Mass electricity consumption data analysis system as well as real-time calculation method and data mining method |
CN104394211A (en) * | 2014-11-21 | 2015-03-04 | 浪潮电子信息产业股份有限公司 | Hadoop-based user behavior analysis system design and implementation method |
CN104468220A (en) * | 2014-12-11 | 2015-03-25 | 汤亿则 | Early warning control platform of power telecommunication network |
- 2015-11-23: application CN201510817806.4A filed in CN, published as CN105321124A (status: Pending)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160210 |