CN103023995B - Hadoop-based automatic hierarchical data management system for distributed cloud storage - Google Patents

Publication number
CN103023995B
Authority
CN
China
Prior art keywords
data
message
module
staging
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210499413.XA
Other languages
Chinese (zh)
Other versions
CN103023995A (en)
Inventor
张大华
罗志明
周里涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
State Grid Sichuan Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
State Grid Sichuan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-11-29
Filing date
2012-11-29
Publication date
2015-09-09
Application filed by State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI, State Grid Sichuan Electric Power Co Ltd
Priority to CN201210499413.XA
Publication of CN103023995A
Application granted
Publication of CN103023995B
Active legal status
Anticipated expiration

Abstract

The invention provides a Hadoop-based automatic hierarchical data management system for distributed cloud storage, comprising node servers and a central server. The node servers collect server status messages and data temperature distribution messages and send the collected messages to the data staging management module of the central server. By deploying the data staging management module on the central server, the invention centrally receives the messages sent by the data nodes and the name node of the HDFS cluster, processes them into out-of-band data staging instructions, and sends these to the name node of the HDFS cluster, which is responsible for the final redistribution of data blocks. This realizes automatic hierarchical data management for a Hadoop-based distributed cloud storage system and improves the utilization of storage resources.

Description

Hadoop-based automatic hierarchical data management system for distributed cloud storage
Technical field
The invention belongs to the field of computer technology and specifically relates to a Hadoop-based automatic hierarchical data management system for distributed cloud storage.
Background art
With the rapid development of cloud computing technology at home and abroad, cloud storage technology based on the Hadoop Distributed File System (HDFS) has come into wide use. Large-scale HDFS clusters are built from reused and newly added PC servers, and the local disks of those servers provide a high-performance, cost-effective, elastically scalable distributed cloud storage service.
Because the server nodes that make up an HDFS cluster differ from one another, the nodes in a cluster are likely to have different storage performance and storage capacity. How to take these differences between nodes fully into account and optimize the allocation of storage resources is therefore a problem that urgently needs to be solved when building a Hadoop-based distributed cloud storage system.
Summary of the invention
To overcome the above deficiencies of the prior art, the invention provides a Hadoop-based automatic hierarchical data management system for distributed cloud storage that realizes automatic hierarchical data management and improves the utilization of storage resources.
To achieve the above object, the invention adopts the following technical scheme:
A Hadoop-based automatic hierarchical data management system for distributed cloud storage is provided. The system comprises node servers and a central server; the node servers collect server status messages and data temperature distribution messages and send the collected messages to the data staging management module of the central server.
The node server comprises a server information collection module, a data temperature collection module and a data staging proxy module. The server information collection module and the data temperature collection module collect server status messages and data temperature distribution messages respectively, and each sends the messages it collects to the data staging management module.
The server information collection module is deployed on the data nodes of the HDFS cluster, the data temperature collection module and the data staging proxy module are deployed on the name node of the HDFS cluster, and the data staging management module is deployed on the central server.
The server status message contains the hardware configuration and running-state information of a data node; the data temperature distribution message contains a data temperature message and a data distribution message.
Each message consists of a message header and a message body. The message header contains the central server name, IP address, node ID, encryption method, checksum and timestamp. The message body contains the node hardware configuration, node running state, data temperature and data distribution, encrypted with the encryption method specified in the message header.
The data staging proxy module receives an out-of-band data staging command, parses it to obtain the information on the data blocks to be moved, and passes that information to the name node. On receiving the information, the name node copies each target data block to the destination node and deletes the data block on the source data node. Once all of the data-block move information has been processed, the name node sends a success message to the data staging proxy module, and the data staging proxy module sends an acknowledgement message to the data staging management module.
The information on a data block to be moved comprises the ID of the target data block, the source data node ID and the destination node ID.
The central server comprises a message receiving module, an information persistence module, an information cache module, a data staging management module, an instruction processing module, an instruction sending module, an analysis engine module and a node registration module.
The message receiving module receives the server status messages and data temperature distribution messages sent by the data nodes and the name node respectively, parses the messages and forwards them to the data staging management module;
The data staging management module generates out-of-band data staging commands and periodically sends them to the name node;
The information cache module receives messages from the message receiving module, processes them into valid information and stores it in the information cache area. It also manages and maintains the content of the information cache area, and sends the information to the data staging management module after classifying it. Maintenance of the cache area content comprises creating, updating and deleting information;
When the information cache area exceeds its capacity, the timer expires, or the service stops, the information persistence module writes the messages processed by the information cache module to disk; the data staging management module reads the information back from disk and loads it into the information cache area;
The analysis engine module receives the server status messages and data temperature distribution messages sent by the information cache module, builds a data temperature distribution vector map, and keeps the state of that map up to date. Based on the data temperature distribution vector map it forms out-of-band data staging instructions and sends them to the instruction processing module;
The instruction processing module processes the output of the analysis engine module into encoded instructions that the instruction sending module can send to a specific node;
The instruction sending module receives the out-of-band data staging commands generated by the data staging management module and sends instructions to the destination nodes accordingly;
The node registration module receives registration information from the information cache area and registers or updates the information of the specified node.
Compared with the prior art, the beneficial effects of the invention are:
1. Unlike other tiered storage methods, which distribute data across different storage media (memory, solid-state disk, magnetic disk, SAN, tape), the system provided by the invention targets the case in which data is stored on the local disks of x86 servers (SATA or SCSI interface). It compares the static configuration information of the node servers (disk capacity, number of disks, interface type, read/write rate) and combines it with server running-state messages (dynamic information such as disk capacity, CPU and network bandwidth) and data temperature distribution messages (number of accesses, access time and access frequency of the data), so that data of different temperatures is placed on servers of different performance and the server storage resources are used optimally.
2. The invention does not try to place data blocks optimally when data is first stored. Instead it uses an offline data block redistribution strategy: by sending out-of-band data staging instructions, data blocks are moved when the load on the whole HDFS cluster is lightest or at another suitable time. This allows the temperature information of the data to be calculated more reasonably while reducing the impact on normal use of the storage.
3. The invention is loosely coupled to the Hadoop distributed file system it targets: only two places in HDFS need to be modified in the design. It can therefore be ported quickly to other cloud storage platforms that use a distributed file system with centralized metadata management, providing them with a tiered data storage scheme, and so has strong portability.
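The first benefit combines static configuration with running-state information to decide which servers should hold hot data. The patent does not give a formula, so the following is only a minimal sketch under assumed inputs: the field names (`read_write_mb_s`, `disk_used_ratio`, `cpu_load`), the 500 MB/s normalisation constant and the weights are all hypothetical.

```python
def node_score(static, dynamic, weights=(0.5, 0.3, 0.2)):
    """Higher score means a better home for hot data (weights are assumed)."""
    w_rate, w_free, w_cpu = weights
    rate = min(static["read_write_mb_s"] / 500.0, 1.0)  # normalised disk speed
    free = 1.0 - dynamic["disk_used_ratio"]              # remaining disk capacity
    idle = 1.0 - dynamic["cpu_load"]                     # spare CPU
    return w_rate * rate + w_free * free + w_cpu * idle


def rank_nodes(nodes):
    """Return node IDs sorted from best to worst score.

    `nodes` maps node ID -> (static_info, dynamic_info).
    """
    return sorted(nodes, key=lambda n: node_score(*nodes[n]), reverse=True)
```

A ranking like this could feed the placement decision: hot data goes to the nodes at the front of the list, cold data to the nodes at the end.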
Brief description of the drawings
Fig. 1 is a diagram of the data management scheme for Hadoop-based distributed cloud storage;
Fig. 2 is a schematic diagram of the logical architecture of the Hadoop-based automatic hierarchical data management system for distributed cloud storage;
Fig. 3 is a schematic diagram of the modules that make up the central server.
Embodiment
The invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1 and Fig. 2, a Hadoop-based automatic hierarchical data management system for distributed cloud storage is provided. The system comprises node servers and a central server; the node servers collect server status messages and data temperature distribution messages and send the collected messages to the data staging management module of the central server.
The node server comprises a server information collection module, a data temperature collection module and a data staging proxy module;
The server information collection module collects the hardware configuration information of the data node (static information on the configuration of CPU, memory, hard disk, network and so on), generates a server status message from it and sends the message to the central server. Thereafter the service periodically collects the running-state information of the data node (dynamic information on the usage of CPU, memory, hard disk, network and so on), and after analysis and processing forms a server status message and sends it to the central server.
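The two collection phases described above (one static report, then periodic dynamic reports) might look roughly like the sketch below. It uses only the Python standard library and covers CPU count and disk usage; the dictionary layout and function names are illustrative assumptions, not taken from the patent.

```python
import os
import platform
import shutil
import time


def collect_static_info(disk_path="."):
    """Static hardware configuration, reported once when the service starts."""
    total, _used, _free = shutil.disk_usage(disk_path)
    return {
        "kind": "static",
        "cpu_count": os.cpu_count() or 1,
        "architecture": platform.machine(),
        "disk_total_bytes": total,
    }


def collect_dynamic_info(disk_path="."):
    """Running state, collected periodically after the static report."""
    total, used, free = shutil.disk_usage(disk_path)
    return {
        "kind": "dynamic",
        "collected_at": time.time(),
        "disk_used_ratio": used / total if total else 0.0,
        "disk_free_bytes": free,
    }


def build_status_message(node_id, info):
    """Wrap collected information into a server status message."""
    return {"node_id": node_id, "type": "server_status", "payload": info}
```

A real collector would also sample memory and network usage, which need platform-specific sources that are left out here.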
The data temperature collection module records the number of times and the frequency with which data is accessed (either directly, by modifying the name node source code in the HDFS cluster, or indirectly, by embedding a counter update in the read request message of the distributed cloud storage client) and calculates the temperature information of each piece of stored data (which exists in file form). It obtains the distribution information of the files by parsing the metadata on the name node. From this periodically updated information it generates a data temperature distribution message and sends it to the central server.
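The patent states that temperature is derived from access count and frequency but does not specify the calculation. One common choice, assumed here purely for illustration, is an exponentially decayed access counter: each access adds one to a file's score, and the score halves after every `half_life_seconds` without access.

```python
import math
from collections import defaultdict


class TemperatureTracker:
    """Tracks per-file accesses and derives a decayed temperature score."""

    def __init__(self, half_life_seconds=3600.0):
        self.half_life = half_life_seconds
        self.score = defaultdict(float)   # decayed score at the last access
        self.last_seen = {}               # time of the last access per file
        self.count = defaultdict(int)     # raw access count per file

    def record_access(self, path, now):
        # Decay the old score to the current time, then add the new access.
        self.score[path] = self.temperature(path, now) + 1.0
        self.last_seen[path] = now
        self.count[path] += 1

    def temperature(self, path, now):
        """Current temperature of a file; 0.0 if it was never accessed."""
        if path not in self.last_seen:
            return 0.0
        elapsed = max(0.0, now - self.last_seen[path])
        return self.score[path] * math.pow(0.5, elapsed / self.half_life)
```

With a decayed counter, a file that was read heavily last month but not since cools down on its own, which matches the idea of moving cold data to slower nodes.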
The server information collection module is deployed on the data nodes of the HDFS cluster, the data temperature collection module and the data staging proxy module are deployed on the name node of the HDFS cluster, and the data staging management module is deployed on the central server.
The server status message contains the hardware configuration and running-state information of a data node; the data temperature distribution message contains a data temperature message and a data distribution message.
Each message consists of a message header and a message body. The message header contains the central server name, IP address, node ID, encryption method, checksum and timestamp. The message body contains the node hardware configuration, node running state, data temperature and data distribution, encrypted with the encryption method specified in the message header.
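The header and body layout can be sketched as follows. The header fields mirror the list in the text; everything else is an assumption made for illustration: JSON as the body encoding, SHA-256 as the checksum, and `"none"` as a placeholder encryption method (no real encryption is performed in this sketch).

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class MessageHeader:
    central_server_name: str
    ip_address: str
    node_id: str
    encryption_method: str
    checksum: str
    timestamp: float


def build_message(server, ip, node_id, body, now=None):
    """Serialize the body, then describe it in the header."""
    # Placeholder: a real deployment would encrypt body_bytes here with the
    # method named in the header. This sketch sends the body unencrypted.
    body_bytes = json.dumps(body, sort_keys=True).encode("utf-8")
    header = MessageHeader(
        central_server_name=server,
        ip_address=ip,
        node_id=node_id,
        encryption_method="none",
        checksum=hashlib.sha256(body_bytes).hexdigest(),
        timestamp=time.time() if now is None else now,
    )
    return {"header": asdict(header), "body": body_bytes.decode("utf-8")}


def verify_message(message):
    """Recompute the checksum over the body and compare it with the header."""
    digest = hashlib.sha256(message["body"].encode("utf-8")).hexdigest()
    return digest == message["header"]["checksum"]
```

The receiver would call `verify_message` before parsing the body, so that a corrupted message is rejected rather than fed into the analysis engine.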
The data staging proxy module receives an out-of-band data staging command, parses it to obtain the information on the data blocks to be moved, and passes that information to the name node. On receiving the information, the name node copies each target data block to the destination node and deletes the data block on the source data node. Once all of the data-block move information has been processed, the name node sends a success message to the data staging proxy module, and the data staging proxy module sends an acknowledgement message to the data staging management module.
The information on a data block to be moved comprises the ID of the target data block, the source data node ID and the destination node ID.
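The copy-then-delete sequence can be simulated in memory as below. This is a sketch only: the command layout (`moves`, `block_id`, `source`, `target`) is a hypothetical encoding, and the `placement` dictionary stands in for the name node's real block map. No actual HDFS API is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockMove:
    block_id: str
    source_node: str
    target_node: str


def parse_command(command):
    """Proxy side: turn an out-of-band command into a list of block moves."""
    return [BlockMove(m["block_id"], m["source"], m["target"])
            for m in command["moves"]]


def apply_moves(placement, moves):
    """Name node side: copy each block to the target, then delete the source copy.

    `placement` maps node ID -> set of block IDs held by that node.
    Returns "success" once every move has been handled.
    """
    for move in moves:
        if move.source_node == move.target_node:
            continue  # nothing to do, and deleting would lose the block
        if move.block_id not in placement.get(move.source_node, set()):
            raise KeyError(f"{move.block_id} not on {move.source_node}")
        placement.setdefault(move.target_node, set()).add(move.block_id)  # copy
        placement[move.source_node].discard(move.block_id)                # delete
    return "success"
```

Copying before deleting means a failure partway through leaves an extra replica rather than a missing block.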
As shown in Fig. 3, the central server comprises a message receiving module, an information persistence module, an information cache module, a data staging management module, an instruction processing module, an instruction sending module, an analysis engine module and a node registration module.
The message receiving module receives the server status messages and data temperature distribution messages sent by the data nodes and the name node respectively, parses the messages and forwards them to the data staging management module;
The data staging management module generates out-of-band data staging commands and periodically sends them to the name node;
The information cache module receives messages from the message receiving module, processes them into valid information and stores it in the information cache area. It also manages and maintains the content of the information cache area, and sends the information to the data staging management module after classifying it. Maintenance of the cache area content comprises creating, updating and deleting information;
When the information cache area exceeds its capacity, the timer expires, or the service stops, the information persistence module writes the messages processed by the information cache module to disk; the data staging management module reads the information back from disk and loads it into the information cache area;
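The three persistence triggers (capacity exceeded, timer expired, service stopping) can be illustrated with a small cache class. The JSON file format, the default capacity and interval, and the injectable `clock` parameter are assumptions made for this sketch; the patent does not describe the storage format.

```python
import json
import os
import time


class InfoCache:
    """In-memory cache flushed to disk on capacity, timer expiry, or stop."""

    def __init__(self, path, capacity=1000, flush_interval=60.0,
                 clock=time.monotonic):
        self.path = path
        self.capacity = capacity
        self.flush_interval = flush_interval
        self.clock = clock
        self.entries = {}
        self.last_flush = clock()

    def put(self, key, value):
        self.entries[key] = value  # create or update
        over_capacity = len(self.entries) > self.capacity
        timer_expired = self.clock() - self.last_flush >= self.flush_interval
        if over_capacity or timer_expired:
            self.flush()

    def delete(self, key):
        self.entries.pop(key, None)

    def flush(self):
        """Persist the cache content; also call this when the service stops."""
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.entries, fh)
        self.last_flush = self.clock()

    def load(self):
        """Read persisted information back into the cache area."""
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as fh:
                self.entries.update(json.load(fh))
```

This sketch keeps entries in memory after a flush. A fuller version would evict persisted entries when over capacity, otherwise every later `put` triggers another write.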
The analysis engine module receives the server status messages and data temperature distribution messages sent by the information cache module, builds a data temperature distribution vector map, and keeps the state of that map up to date. Based on the data temperature distribution vector map it forms out-of-band data staging instructions and sends them to the instruction processing module;
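How the analysis engine turns temperature and placement data into staging instructions is not detailed in the patent. A deliberately simple policy, assumed here for illustration, is threshold based: blocks hotter than `hot_threshold` go to the fastest node, blocks colder than `cold_threshold` go to the slowest, and everything else stays put. All parameter names are hypothetical.

```python
def plan_moves(block_temperature, block_location, node_rank,
               hot_threshold, cold_threshold):
    """Derive block moves from temperature and node performance.

    block_temperature: block ID -> temperature score
    block_location:    block ID -> node ID currently holding the block
    node_rank:         node IDs ordered from fastest to slowest
    """
    fastest, slowest = node_rank[0], node_rank[-1]
    moves = []
    for block_id in sorted(block_temperature):
        temp = block_temperature[block_id]
        current = block_location[block_id]
        if temp >= hot_threshold and current != fastest:
            target = fastest
        elif temp <= cold_threshold and current != slowest:
            target = slowest
        else:
            continue  # block is already well placed
        moves.append({"block_id": block_id, "source": current, "target": target})
    return moves
```

A production policy would also check free capacity on the target and spread hot blocks over several fast nodes instead of one.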
The instruction processing module processes the output of the analysis engine module into encoded instructions that the instruction sending module can send to a specific node;
The instruction sending module receives the out-of-band data staging commands generated by the data staging management module and sends instructions to the destination nodes accordingly;
The node registration module receives registration information from the information cache area and registers or updates the information of the specified node (forming the node ID, which the analysis engine uses for operations such as completing the parsing of status information and data temperature distribution information, identifying nodes, and generating instructions).
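The register-or-update behaviour of the node registration module can be sketched with a small in-memory registry. Keying nodes by network address and the `node-N` ID format are assumptions made for the example.

```python
class NodeRegistry:
    """Keeps one record per node and hands out stable node IDs."""

    def __init__(self):
        self._by_address = {}  # address -> node record
        self._next_id = 1

    def register(self, address, info):
        """Register a new node or update an existing one; return its node ID."""
        record = self._by_address.get(address)
        if record is None:
            record = {"node_id": f"node-{self._next_id}", "info": {}}
            self._by_address[address] = record
            self._next_id += 1
        record["info"].update(info)
        return record["node_id"]

    def lookup(self, address):
        """Return the node ID for an address, or None if it is not registered."""
        record = self._by_address.get(address)
        return record["node_id"] if record else None
```

Because a repeated registration returns the same ID, the analysis engine can use the ID as a stable key when matching status and temperature messages to nodes.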
Finally, it should be noted that the above embodiments are intended only to illustrate the technical scheme of the invention and not to limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified or equivalently replaced, and that any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the invention.

Claims (7)

1. A Hadoop-based automatic hierarchical data management system for distributed cloud storage, characterized in that: the system comprises node servers and a central server; the node servers collect server status messages and data temperature distribution messages and send the collected messages to the data staging management module of the central server;
The node server comprises a server information collection module, a data temperature collection module and a data staging proxy module; the server information collection module and the data temperature collection module collect server status messages and data temperature distribution messages respectively, and each sends the messages it collects to the data staging management module;
The data staging proxy module receives an out-of-band data staging command, parses it to obtain the information on the data blocks to be moved, and passes that information to the name node; on receiving the information, the name node copies each target data block to the destination node and deletes the data block on the source data node; once all of the data-block move information has been processed, the name node sends a success message to the data staging proxy module, and the data staging proxy module sends an acknowledgement message to the data staging management module.
2. The Hadoop-based automatic hierarchical data management system for distributed cloud storage according to claim 1, characterized in that: the server information collection module is deployed on the data nodes of the HDFS cluster, the data temperature collection module and the data staging proxy module are deployed on the name node of the HDFS cluster, and the data staging management module is deployed on the central server.
3. The Hadoop-based automatic hierarchical data management system for distributed cloud storage according to claim 1, characterized in that: the server status message contains the hardware configuration and running-state information of a data node, and the data temperature distribution message contains a data temperature message and a data distribution message.
4. The Hadoop-based automatic hierarchical data management system for distributed cloud storage according to claim 3, characterized in that: the server status message and the data temperature distribution message each consist of a message header and a message body; the message header contains the central server name, IP address, node ID, encryption method, checksum and timestamp; the message body contains the node hardware configuration, node running state, data temperature and data distribution, encrypted with the encryption method specified in the message header.
5. The Hadoop-based automatic hierarchical data management system for distributed cloud storage according to claim 1, characterized in that: the information on a data block to be moved comprises the ID of the target data block, the source data node ID and the destination node ID.
6. The Hadoop-based automatic hierarchical data management system for distributed cloud storage according to claim 1 or 2, characterized in that: the central server comprises a message receiving module, an information persistence module, an information cache module, a data staging management module, an instruction processing module, an instruction sending module, an analysis engine module and a node registration module.
7. The Hadoop-based automatic hierarchical data management system for distributed cloud storage according to claim 6, characterized in that: the message receiving module receives the server status messages and data temperature distribution messages sent by the data nodes and the name node respectively, parses the messages and forwards them to the data staging management module;
The data staging management module generates out-of-band data staging commands and periodically sends them to the name node;
The information cache module receives messages from the message receiving module, processes them into valid information and stores it in the information cache area; it also manages and maintains the content of the information cache area, and sends the information to the data staging management module after classifying it; maintenance of the cache area content comprises creating, updating and deleting information;
When the information cache area exceeds its capacity, the timer expires, or the service stops, the information persistence module writes the messages processed by the information cache module to disk, and the data staging management module reads the information back from disk and loads it into the information cache area;
The analysis engine module receives the server status messages and data temperature distribution messages sent by the information cache module, builds a data temperature distribution vector map, and keeps the state of that map up to date; based on the data temperature distribution vector map it forms out-of-band data staging instructions and sends them to the instruction processing module;
The instruction processing module processes the output of the analysis engine module into encoded instructions that the instruction sending module can send to a specific node;
The instruction sending module receives the out-of-band data staging commands generated by the data staging management module and sends instructions to the destination nodes accordingly;
The node registration module receives registration information from the information cache area and registers or updates the information of the specified node.
CN201210499413.XA 2012-11-29 2012-11-29 Hadoop-based automatic hierarchical data management system for distributed cloud storage Active CN103023995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210499413.XA CN103023995B (en) 2012-11-29 2012-11-29 Hadoop-based automatic hierarchical data management system for distributed cloud storage

Publications (2)

Publication Number Publication Date
CN103023995A CN103023995A (en) 2013-04-03
CN103023995B true CN103023995B (en) 2015-09-09

Family

ID=47972119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210499413.XA Active CN103023995B (en) 2012-11-29 2012-11-29 Hadoop-based automatic hierarchical data management system for distributed cloud storage

Country Status (1)

Country Link
CN (1) CN103023995B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336670B * 2013-06-04 2016-11-23 华为技术有限公司 Method and apparatus for automatically distributing data blocks based on data temperature
CN103780622B * 2014-01-24 2016-09-28 华中科技大学 Classified data encryption method for cloud storage
CN104021503A * 2014-05-08 2014-09-03 国家电网公司 Relaying cloud establishing method based on virtualized Hadoop cluster
CN104135516B * 2014-07-29 2017-04-05 浪潮软件集团有限公司 Distributed cloud storage method based on industry data acquisition
CN104462577B * 2014-12-29 2018-04-13 北京奇艺世纪科技有限公司 Data storage method and device
CN105930102B * 2016-05-06 2018-12-21 歌尔股份有限公司 Method and system for synchronized secure transfer of test data for micro-electromechanical products
CN106470242B * 2016-09-07 2019-07-19 东南大学 Rapid quantitative grading method for large-scale heterogeneous cluster nodes in a cloud data center
CN108600281B * 2017-03-16 2021-12-31 杭州海康威视数字技术股份有限公司 Cloud storage system, media data storage method and system
CN107135274A * 2017-06-20 2017-09-05 郑州云海信息技术有限公司 Storage management method and device for a distributed cluster system
CN109361560A * 2018-01-24 2019-02-19 广州Tcl智能家居科技有限公司 Cluster node communication processing method, system, storage medium and server
CN111177486B * 2019-12-19 2020-09-08 四川蜀天梦图数据科技有限公司 Message transmission method and device in distributed graph calculation process
CN113407620B * 2020-03-17 2023-04-21 北京信息科技大学 Data block placement method and system based on heterogeneous Hadoop cluster environment
CN115190168B * 2022-07-08 2023-08-04 苏州浪潮智能科技有限公司 Edge server management system and server cluster

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957863A (en) * 2010-10-14 2011-01-26 广州从兴电子开发有限公司 Data parallel processing method, device and system
CN102263822A (en) * 2011-07-22 2011-11-30 北京星网锐捷网络技术有限公司 Distributed cache control method, system and device
CN102638566A (en) * 2012-02-28 2012-08-15 山东大学 BLOG system running method based on cloud storage
CN102646121A (en) * 2012-02-23 2012-08-22 武汉大学 Two-stage storage method combined with RDBMS (relational database management system) and Hadoop cloud storage

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王峰, 雷葆华. Model analysis of the Hadoop distributed file system. 《研究与开发》. 2010, (No. 12), full text. *

Also Published As

Publication number Publication date
CN103023995A (en) 2013-04-03


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant