CN104113597B - HDFS data read-write method for multiple data centers - Google Patents

HDFS data read-write method for multiple data centers

Info

Publication number
CN104113597B
Authority
CN
China
Prior art keywords
data
metadata
node
hdfs
centre
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410344218.9A
Other languages
Chinese (zh)
Other versions
CN104113597A (en)
Inventor
Dong Bo (董博)
Ruan Jianfei (阮建飞)
Zheng Qinghua (郑庆华)
He Huan (贺欢)
Zhang Hanning (张汉宁)
Zhang Weizhan (张未展)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201410344218.9A
Publication of CN104113597A
Application granted
Publication of CN104113597B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention provides an HDFS data read-write system and method for multiple data centers, characterized in that a global metadata server is established to store and manage global metadata information, to receive data read and write access requests from clients, and to select an HDFS data center according to a preset scheduling algorithm. The client then interacts with the selected data center to perform the data read or write operation; after the operation completes, the metadata node of the data center synchronizes the metadata changes to the global metadata server. The system and method of the invention implement data read-write access across multiple HDFS data centers, provide a unified data access interface, and effectively enable resource and data sharing among multiple HDFS data centers.

Description

HDFS data read-write method for multiple data centers
Technical field
The present invention relates to cloud storage technology, and in particular to a data read-write system and method based on the HDFS distributed file system.
Background art
Cloud storage is a concept extended and developed from cloud computing (Cloud Computing). It refers to using cluster technology, grid technology, distributed file systems, or similar functions, together with application software, to make a large number of storage devices of various types in a network work together, jointly providing data storage and business access functions to the outside while ensuring data security.
Technologies represented by the Hadoop Distributed File System (HDFS) and the parallel programming framework Hadoop MapReduce, both from the Hadoop project of the Apache open source community, are gradually becoming the mainstream for massive data storage and analysis. HDFS in particular has become one of the most popular distributed file systems and is the mainstream file system for building cloud storage today.
As shown in Fig. 1, the HDFS architecture consists mainly of the metadata node (NameNode), data nodes (DataNode), and the client (Client). The NameNode, also called the Master node, manages the HDFS namespace and the data block mapping information, configures the replication policy, and handles client requests. A DataNode, also called a Slave node, stores the actual data, performs read and write operations on data blocks, and periodically reports the blocks it stores to the NameNode. The Client splits data files into blocks, accesses or manages HDFS through the command line, interacts with the NameNode to obtain file location information, and interacts with DataNodes to read and write data.
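The client-side splitting of a file into blocks mentioned above can be illustrated with a minimal sketch. The function name `split_into_blocks` and the tiny block size are assumptions chosen for illustration; they are not part of HDFS or of the invention.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks; the last block may be shorter.

    Hypothetical helper: real HDFS clients use far larger blocks and stream
    them, but the partitioning idea is the same.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# A 10-byte file cut into 4-byte blocks.
blocks = split_into_blocks(b"abcdefghij", 4)
```

Joining the blocks in order restores the original file, which is what a reader relies on when it fetches the blocks from different data nodes.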
At present, HDFS is widely deployed in the data centers of many enterprises, universities, and research institutions, and is gradually becoming the basic storage system of data centers, carrying massive data storage workloads. As more and more independent small and medium-sized data centers are built in scattered locations, how to share the storage resources and data of each data center effectively, and how to provide a unified data access interface to external services, has become one of the core problems restricting the rapid development and wide application of cloud storage systems. No concrete public report on HDFS data read-write technology for multiple data centers has been found so far, and it is a technical problem in urgent need of a solution.
Summary of the invention
The object of the invention is to provide an HDFS data read-write system and method that offers a unified access interface for data reads and writes and enables effective sharing of information and resources across multiple data centers.
To achieve the above object, the invention adopts the following technical solution:
An HDFS data read-write system for multiple data centers, characterized by comprising one global metadata server, n data centers, and one client, each data center having one metadata node and multiple data nodes. The global metadata server is linked to the client and to the metadata node of each data center over a wide area network; within each data center, the metadata node is linked to the data nodes over a local area network. The global metadata server stores and manages global metadata information and is responsible for allocating a metadata namespace to each data center. The metadata node of each data center contains a GMSplugin module, which registers with the global metadata server and periodically reports the data center's resource usage status and metadata information. The global metadata server receives HDFS data read and write access requests from the client and selects a data center that meets the requirements according to a preset scheduling algorithm. The client accesses the metadata node of the selected data center, and this metadata node schedules the HDFS data read or write. After the client's HDFS data read or write completes, the metadata node of the data center synchronizes the metadata changes to the global metadata server.
An HDFS data read-write method for multiple data centers, characterized by comprising two major steps, read and write:
Step 1, HDFS data read, comprising:
(1) a global metadata server is established to store and manage global metadata information; the global metadata server allocates a namespace to each data center, and each data center reports its metadata information to the global metadata server;
(2) the global metadata server receives a read data request from the client, selects a data center that meets the read requirement according to a preset algorithm, and returns the metadata node information of the selected data center;
(3) the client accesses the metadata node of that data center, and the metadata node returns data block and data node information to the client according to a preset scheduling algorithm;
(4) the client interacts with the data nodes to read the data and notifies the metadata node once reading has finished, which completes the read operation;
Step 2, HDFS data write, comprising:
(1) the same as step (1) of the HDFS data read;
(2) the global metadata server receives a write data request from the client, selects a data center that meets the write requirement according to a preset algorithm, and returns the metadata node information of the selected data center;
(3) the client accesses the metadata node of the selected HDFS data center; the metadata node creates the metadata information, allocates data nodes according to a preset algorithm, and returns the data node information to the client;
(4) the client interacts with the data nodes to write the data and notifies the metadata node once writing has finished; the client writes data block by block, block replication is completed automatically by the data nodes, and the client notifies the metadata node that the write is complete only after all data blocks have been written successfully;
(5) after the write completes, the metadata node of the data center synchronizes the metadata changes to the global metadata server.
In the above method, the client read data request includes any of the file path, data block index, and buffer size; the client write data request includes any of the path of the file to be created, the size of the data to be written, and the access permissions.
The data center selection algorithm preset in the global metadata server selects a data center using data-distribution-priority and performance-priority policies, based on the read or write request and on any of the data distribution, system performance, and load condition of each data center.
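The text leaves the selection algorithm open, so the following is only one possible sketch of how the two named policies might be combined. The `DataCenterStatus` fields, the `load` metric, and the tie-breaking rules are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataCenterStatus:
    """Hypothetical snapshot of what a data center reports to the server."""
    name: str
    free_bytes: int
    load: float                      # assumed metric: 0.0 idle .. 1.0 saturated
    paths: set = field(default_factory=set)


def select_for_read(centers, path):
    """Data-distribution priority: only centers holding the file qualify;
    performance priority: among those, pick the least loaded."""
    holders = [c for c in centers if path in c.paths]
    return min(holders, key=lambda c: c.load) if holders else None


def select_for_write(centers, size):
    """Pick the least loaded center with enough free space; ties go to the
    center with more free space."""
    candidates = [c for c in centers if c.free_bytes >= size]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c.load, -c.free_bytes))


centers = [
    DataCenterStatus("dc01", free_bytes=100, load=0.9, paths={"/a"}),
    DataCenterStatus("dc02", free_bytes=50, load=0.2, paths={"/a", "/b"}),
    DataCenterStatus("dc03", free_bytes=500, load=0.5),
]
```

With these sample values, a read of `/a` goes to `dc02` (it holds the file and is the least loaded holder), while an 80-byte write goes to `dc03`, because `dc02` lacks the space and `dc01` is more heavily loaded.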
The preset scheduling algorithm of the metadata node makes its selection using distance-priority and distribution-fairness policies, based on any of the data size, the number of blocks, the distance between data blocks and the client, and the data block distribution.
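As a rough illustration of the two policies named here, the sketch below orders replicas by distance for reads and spreads new blocks over the least-filled data nodes for writes. The function names, the numeric distance table, and the per-node block counts are assumptions; the source does not prescribe a concrete algorithm.

```python
def order_replicas_by_distance(replicas, distance):
    """Distance priority: return replica holders nearest to the client first.

    `distance` is an assumed mapping from data node name to a numeric
    distance from the client.
    """
    return sorted(replicas, key=lambda node: distance[node])


def assign_nodes_fairly(block_counts, n_blocks, replication=1):
    """Distribution fairness: place each new block on the `replication`
    data nodes that currently hold the fewest blocks."""
    counts = dict(block_counts)          # work on a copy
    plan = []
    for _ in range(n_blocks):
        # Ties are broken by node name so the result is deterministic.
        chosen = sorted(counts, key=lambda n: (counts[n], n))[:replication]
        for node in chosen:
            counts[node] += 1
        plan.append(chosen)
    return plan


read_order = order_replicas_by_distance(
    ["dn1", "dn2", "dn3"], {"dn1": 4, "dn2": 0, "dn3": 2}
)
write_plan = assign_nodes_fairly({"dn1": 2, "dn2": 1, "dn3": 2}, 3, replication=2)
```

The write plan starts with the emptiest node and evens out the counts as blocks are placed, which is one simple reading of "distribution fairness".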
The multi-data-center HDFS data read-write system of the invention adopts a scheduling architecture with two logically separated layers. In the global logic layer, the global metadata server is responsible for allocating the namespace of each data center, querying global metadata, and selecting a data center for data reads and writes; it is the core that integrates the independent data centers into a unified whole. The business logic layer modifies the HDFS metadata node by adding a GMSplugin module, which is linked to the global metadata server as a subordinate module, forming a multi-HDFS data center resource sharing framework that supports metadata synchronization and sharing. The invention achieves global sharing of metadata while retaining the original functions of each HDFS data center, such as data management by the metadata node. This approach keeps the original system efficient and stable while reducing system complexity, and enables data read-write access across multiple HDFS data centers quickly and effectively.
Description of the drawings
Fig. 1 is the HDFS system architecture diagram.
Fig. 2 is the architecture diagram of the multi-data-center HDFS data read-write system of the invention.
Fig. 3 is the flowchart of HDFS data reading across multiple data centers according to the invention.
Fig. 4 is the flowchart of HDFS data writing across multiple data centers according to the invention.
Embodiment
To explain the technical solution of the invention more clearly, the invention is described below with reference to the drawings and specific embodiments.
As shown in Fig. 2, an HDFS data read-write system for multiple data centers comprises one global metadata server (Global Metadata Server, GMS), n data centers numbered 01 to N, and one client (Client). Each data center has one metadata node (NameNode) and multiple data nodes (DataNode). The global metadata server is linked to the client over a wide area network and to the metadata node of each data center over a wide area network; within each data center, the metadata node is linked to the data nodes over a local area network. The global metadata server stores and manages global metadata information and is responsible for allocating a metadata namespace to each data center. The metadata node of each data center contains a GMSplugin (global metadata server middleware) module, which links to the global metadata server, registers with it, and periodically reports the data center's resource usage status and metadata information.
The global metadata server receives HDFS data read and write access requests from the client and selects a data center that meets the requirements according to a preset scheduling algorithm. The client accesses the metadata node of the selected data center, and this metadata node schedules the HDFS data read or write. After the client's HDFS data read or write completes, the metadata node of the data center synchronizes the metadata changes to the global metadata server.
The global metadata server stores and manages global metadata information; allocates a metadata namespace to each data center; receives HDFS data read and write access requests from the client and, according to a preset scheduling algorithm, selects the metadata node of a data center that meets the requirements; and receives metadata updates from the metadata node of each data center.
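The source does not say what form an allocated namespace takes. As a sketch under the assumption that each data center receives its own top-level path prefix, namespace allocation and lookup might look as follows; the class `NamespaceTable` and the `/<center id>` prefix scheme are hypothetical.

```python
class NamespaceTable:
    """Hypothetical table mapping path prefixes to data centers."""

    def __init__(self):
        self._prefix_to_center = {}

    def allocate(self, center_id: str) -> str:
        """Reserve the prefix '/<center_id>' for a data center."""
        prefix = f"/{center_id}"
        if prefix in self._prefix_to_center:
            raise ValueError(f"namespace {prefix} is already allocated")
        self._prefix_to_center[prefix] = center_id
        return prefix

    def resolve(self, path: str):
        """Return the data center owning `path`, or None if no prefix matches."""
        for prefix, center in self._prefix_to_center.items():
            # Match whole path components only, so '/dc02' does not own '/dc020'.
            if path == prefix or path.startswith(prefix + "/"):
                return center
        return None


table = NamespaceTable()
first_prefix = table.allocate("dc01")
table.allocate("dc02")
```

Keeping the namespaces disjoint lets the server route a path to exactly one data center without consulting per-file metadata.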
The global metadata server consists mainly of three modules: the access interface, the GMS service program, and metadata management. The access interface is the module through which the client interacts with the global metadata server; it handles client requests to read, write, and query HDFS data. The GMS service program is the service daemon module of the global metadata server; it monitors the server's operation and restarts modules, ensuring that the global metadata server runs stably. Metadata management is the module through which the metadata nodes of the data centers interact with the global metadata server; it manages the metadata node of each data center, receives metadata synchronization update requests from each data center, stores the global metadata information, processes the data read and write requests received by the access interface module, and selects a suitable data center according to the global metadata information and the state of each data center.
The GMSplugin module is middleware for communicating with the global metadata server. It registers with the global metadata server and synchronizes the status information and metadata information of its own data center to the global metadata server in real time.
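The interaction between a GMSplugin and the global metadata server can be sketched in memory as below. This is an illustrative model only: the class and method names (`register`, `report_status`, `sync_metadata`, `heartbeat`, `on_metadata_change`) and the status fields are assumptions, and a real deployment would carry these calls over the wide area network.

```python
class GlobalMetadataServer:
    """Hypothetical in-memory stand-in for the global metadata server."""

    def __init__(self):
        self.status = {}     # data center id -> latest reported status
        self.metadata = {}   # file path -> data center id

    def register(self, center_id):
        self.status.setdefault(center_id, {})

    def report_status(self, center_id, status):
        if center_id not in self.status:
            raise KeyError(f"{center_id} is not registered")
        self.status[center_id] = dict(status)

    def sync_metadata(self, center_id, created=(), deleted=()):
        for path in created:
            self.metadata[path] = center_id
        for path in deleted:
            if self.metadata.get(path) == center_id:
                del self.metadata[path]


class GMSPlugin:
    """Hypothetical plugin living inside one data center's metadata node."""

    def __init__(self, center_id, gms):
        self.center_id = center_id
        self.gms = gms
        gms.register(center_id)          # registration on startup

    def heartbeat(self, free_bytes, load):
        """Report the data center's resource usage status."""
        self.gms.report_status(
            self.center_id, {"free_bytes": free_bytes, "load": load}
        )

    def on_metadata_change(self, created=(), deleted=()):
        """Push metadata changes to the global server."""
        self.gms.sync_metadata(self.center_id, created, deleted)


gms = GlobalMetadataServer()
plugin = GMSPlugin("dc01", gms)
plugin.heartbeat(free_bytes=100, load=0.3)
plugin.on_metadata_change(created=["/dc01/a.txt"])
```

After these calls the server knows both the current load of `dc01` and that `/dc01/a.txt` lives there, which is the information the selection step needs.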
The metadata node of each data center (which contains the GMSplugin module) manages the directory tree and file metadata of its own data center. When its metadata changes, it synchronizes the change to the global metadata server in real time through the GMSplugin module, according to a preset algorithm. The metadata node also manages the data nodes of its own data center and handles HDFS data read and write requests from the client; based on the parameters of the data to be accessed and a preset scheduling policy, it can select data nodes from the data center it manages.
The data nodes of each data center manage the storage, block lists, and data reads and writes on their own node. Under the scheduling of the metadata node, a data node creates, deletes, and replicates blocks. Each data node periodically reports its data block information to the metadata node according to a preset algorithm.
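The periodic block reports can be modelled with a small sketch in which the metadata node keeps, per data node, the set of blocks last reported. The class `BlockMap` and its methods are hypothetical names used only for illustration.

```python
class BlockMap:
    """Hypothetical view a metadata node builds from data node block reports."""

    def __init__(self):
        self._blocks_by_node = {}

    def receive_report(self, node, block_ids):
        """A full report replaces whatever the node reported before."""
        self._blocks_by_node[node] = set(block_ids)

    def locations(self, block_id):
        """Return the data nodes currently known to hold `block_id`."""
        return sorted(
            node
            for node, blocks in self._blocks_by_node.items()
            if block_id in blocks
        )


block_map = BlockMap()
block_map.receive_report("dn1", [1, 2])
block_map.receive_report("dn2", [2, 3])
holders_before = block_map.locations(2)

# dn1 later reports that it no longer stores block 2.
block_map.receive_report("dn1", [1])
holders_after = block_map.locations(2)
```

Because each report overwrites the previous one, a block deleted or lost on a data node disappears from the metadata node's view at the next report.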
The client (Client) interacts with the system. It splits the data to be written into blocks and interacts with the global metadata server and with the metadata node and data nodes of the data center, respectively, to complete HDFS data read and write operations.
In the multi-data-center HDFS data read-write system of Fig. 2, the number of data centers n may be chosen from 1 to 200.
Based on the system of Fig. 2, the invention also provides an HDFS data read method for multiple data centers, described below with reference to Fig. 3:
S301: a global metadata server is established to store and manage global metadata information; the global metadata server allocates a namespace to each HDFS data center, and each data center reports its metadata information to the global metadata server;
S302: the global metadata server receives an HDFS read data request from the client, selects an HDFS data center that meets the read requirement according to a preset algorithm, and returns the metadata node information of the selected data center;
The client read data request includes information such as the file path, data block index, and buffer size;
The preset scheduling algorithm selects a data center using policies such as data-distribution priority and performance priority, based on the HDFS read request and on information such as the data distribution, system performance, and load condition of each data center;
S303: the client accesses the metadata node of the HDFS data center, and the metadata node returns data block and data node information to the client according to a preset scheduling algorithm;
The preset scheduling algorithm of the metadata node uses information such as the distance between data blocks and the client, the number of blocks, and the data block distribution to provide a recommended reading order, selecting by distance-priority and distribution-fairness policies; it can be customized as needed by those skilled in the art;
S304: the client interacts with the data nodes to read the data and notifies the metadata node once reading has finished, which completes the read operation.
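Steps S302 to S304 can be traced end to end with a minimal in-memory sketch. The dictionaries standing in for the global metadata server, the metadata nodes, and the data nodes are assumptions made for illustration; network calls, selection among several candidate centers, and error handling are omitted.

```python
def read_file(path, gms_index, name_nodes, data_nodes):
    """Follow steps S302 to S304 for one file and return its bytes."""
    # S302: the global metadata server maps the path to a data center.
    center = gms_index[path]
    name_node = name_nodes[center]
    # S303: the metadata node returns (block_id, data_node) pairs in read order.
    locations = name_node["files"][path]
    # S304: the client fetches each block from its data node and reassembles them.
    data = b"".join(data_nodes[node][block_id] for block_id, node in locations)
    # S304 (end): the client notifies the metadata node that the read finished.
    name_node["completed_reads"].append(path)
    return data


# Hypothetical state: one file stored in data center dc02 as two blocks.
gms_index = {"/logs/a.txt": "dc02"}
name_nodes = {
    "dc02": {
        "files": {"/logs/a.txt": [("b0", "dn1"), ("b1", "dn2")]},
        "completed_reads": [],
    }
}
data_nodes = {"dn1": {"b0": b"hello "}, "dn2": {"b1": b"world"}}

content = read_file("/logs/a.txt", gms_index, name_nodes, data_nodes)
```

Note that the bulk data never passes through the global metadata server: it only answers the routing question, and the blocks flow directly between the client and the data nodes.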
Based on the system of Fig. 2, the invention provides an HDFS data write method for multiple data centers, described below with reference to Fig. 4:
S401: a global metadata server is established to store and manage global metadata information; the global metadata server allocates a namespace to each data center, and each data center reports its metadata information to the global metadata server;
S402: the global metadata server receives a write data request from the client, selects an HDFS data center that meets the write requirement according to a preset algorithm, and returns the metadata node information of the selected HDFS data center;
The client write data request includes information such as the path of the file to be created, the size of the data to be written, and the access permissions;
The preset scheduling algorithm of the global metadata server selects a specific data center based on the request and on information such as the data distribution, system performance, and load condition of each data center, scheduling by policies such as data-distribution priority and performance priority; the scheduling algorithm can be flexibly customized as needed by those skilled in the art;
S403: the client accesses the metadata node of the selected HDFS data center; the metadata node creates the metadata information, allocates data nodes according to a preset scheduling algorithm, and returns the data node information to the client;
The preset scheduling algorithm of the metadata node uses information such as the data size, the number of blocks, and the data block distribution, scheduling by policies such as distance priority and distribution fairness; it can be customized as needed by those skilled in the art;
S404: the client interacts with the data nodes to write the data and notifies the metadata node once writing has finished; the client writes data block by block, block replication is completed automatically by the data nodes, and the client notifies the metadata node that the write is complete only after all data blocks have been written successfully;
S405: after the write completes, the metadata node of the HDFS data center synchronizes the metadata changes to the global metadata server.
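Steps S403 to S405 can likewise be traced with a minimal in-memory sketch, taking the data center chosen in S402 as an input. The dictionaries, the round-robin placement, and the function name `write_file` are assumptions for illustration; replication between data nodes and failure handling are left out, and at least one data node is assumed to exist.

```python
def write_file(path, data, block_size, center, gms_index, name_node, data_nodes):
    """Follow steps S403 to S405 for one file; `center` was chosen in S402."""
    # S403: the metadata node assigns a data node to each block
    # (round robin here, as a stand-in for the preset scheduling algorithm).
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    node_names = sorted(data_nodes)
    placement = [
        (block_id, node_names[block_id % len(node_names)])
        for block_id in range(len(blocks))
    ]
    # S404: the client writes block by block to the assigned data nodes.
    for (block_id, node), block in zip(placement, blocks):
        data_nodes[node][(path, block_id)] = block
    # S404 (end): all blocks succeeded, so the metadata node records the file.
    name_node[path] = placement
    # S405: the metadata node synchronizes the change to the global server.
    gms_index[path] = center
    return placement


gms_index = {}
name_node = {}
data_nodes = {"dn1": {}, "dn2": {}}
placement = write_file(
    "/dc01/f", b"abcdefg", 3, "dc01", gms_index, name_node, data_nodes
)
```

The global index is updated last, so a file only becomes visible to other clients once every block has been written and the local metadata node has recorded it.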
In summary, the invention addresses the problems that, as independent small and medium-sized data centers scattered in various places multiply, it is difficult to share the storage resources and data of each data center effectively and to provide a unified data access interface to external services. It realizes an open and stable HDFS data read-write framework and method for multiple data centers with unified management and a unified interface.

Claims (1)

1. An HDFS data read-write method for multiple data centers, characterized by comprising two major steps, read and write:
Step 1, HDFS (Hadoop Distributed File System) data read, comprising:
(1) a global metadata server is established to store and manage global metadata information; the global metadata server allocates a namespace to each data center, and each data center reports its metadata information to the global metadata server;
(2) the global metadata server receives a read data request from the client, selects a data center that meets the read requirement according to a preset algorithm, and returns the metadata node information of the selected data center;
(3) the client accesses the metadata node of that data center, and the metadata node returns data block and data node information to the client according to a preset scheduling algorithm;
(4) the client interacts with the data nodes to read the data and notifies the metadata node once reading has finished, which completes the read operation;
Step 2, HDFS data write, comprising:
(1) the same as step (1) of the HDFS data read;
(2) the global metadata server receives a write data request from the client, selects a data center that meets the write requirement according to a preset algorithm, and returns the metadata node information of the selected data center;
(3) the client accesses the metadata node of the selected HDFS data center; the metadata node creates the metadata information, allocates data nodes according to a preset algorithm, and returns the data node information to the client;
(4) the client interacts with the data nodes to write the data and notifies the metadata node once writing has finished; the client writes data block by block, block replication is completed automatically by the data nodes, and the client notifies the metadata node that the write is complete only after all data blocks have been written successfully;
(5) after the write completes, the metadata node of the data center synchronizes the metadata changes to the global metadata server;
In the above method, the metadata node of each data center contains a GMSplugin global metadata server middleware module, which registers with the global metadata server and periodically reports the data center's resource usage status and metadata information; the data center selection algorithm preset in the global metadata server selects a data center using data-distribution-priority and performance-priority policies, based on the read or write request and on any of the data distribution, system performance, and load condition of each data center; the preset scheduling algorithm of the metadata node uses information on the data size, the number of blocks, and the data block distribution, and selects according to distance-priority and distribution-fairness policies.
CN201410344218.9A 2014-07-18 2014-07-18 HDFS data read-write method for multiple data centers Active CN104113597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410344218.9A CN104113597B (en) 2014-07-18 2014-07-18 HDFS data read-write method for multiple data centers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410344218.9A CN104113597B (en) 2014-07-18 2014-07-18 HDFS data read-write method for multiple data centers

Publications (2)

Publication Number Publication Date
CN104113597A CN104113597A (en) 2014-10-22
CN104113597B true CN104113597B (en) 2016-06-08

Family

ID=51710229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410344218.9A Active CN104113597B (en) 2014-07-18 2014-07-18 HDFS data read-write method for multiple data centers

Country Status (1)

Country Link
CN (1) CN104113597B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104506527B (en) * 2014-12-23 2021-12-17 苏州海博智能系统有限公司 Multi-dimensional information pointer platform and data access method thereof
CN105049504B (en) * 2015-07-09 2019-03-05 国云科技股份有限公司 A kind of big data transfer transmission is synchronous and storage method
CN105760556B (en) * 2016-04-19 2019-05-24 江苏物联网研究发展中心 More wave files of low delay high-throughput read and write optimization method
CN105847392A (en) * 2016-04-25 2016-08-10 乐视控股(北京)有限公司 HDFS writing method and device
CN107451138A (en) * 2016-05-30 2017-12-08 中兴通讯股份有限公司 A kind of distributed file system storage method and system
CN106357723A (en) * 2016-08-15 2017-01-25 杭州古北电子科技有限公司 Synchronous system and method for multi-cluster information caching based on cloud host
CN106502795A (en) * 2016-11-03 2017-03-15 郑州云海信息技术有限公司 The method and system of scientific algorithm application deployment are realized on distributed type assemblies
CN107483571A (en) * 2017-08-08 2017-12-15 柏域信息科技(上海)有限公司 A kind of dynamic cloud storage method and system
CN107562926B (en) * 2017-09-14 2023-09-26 丙申南京网络技术有限公司 Multi-hadoop distributed file system for big data analysis
CN107958159A (en) * 2017-11-15 2018-04-24 广东电网有限责任公司电力调度控制中心 A kind of method and system of big data migration
CN110022338B (en) * 2018-01-09 2022-05-27 阿里巴巴集团控股有限公司 File reading method and system, metadata server and user equipment
CN109582686B (en) * 2018-12-13 2021-01-15 中山大学 Method, device, system and application for ensuring consistency of distributed metadata management
CN109726250B (en) * 2018-12-27 2020-01-17 星环信息科技(上海)有限公司 Data storage system, metadata database synchronization method and data cross-domain calculation method
CN110213352B (en) * 2019-05-17 2020-12-18 北京航空航天大学 Method for aggregating dispersed autonomous storage resources with uniform name space
CN110825704B (en) * 2019-09-27 2023-09-01 华为云计算技术有限公司 Data reading method, data writing method and server
CN111030858B (en) * 2019-12-06 2023-04-07 北京浪潮数据技术有限公司 Data management method, system and related device for distributed multi-cluster system
CN111124301B (en) * 2019-12-18 2024-02-23 深圳供电局有限公司 Data consistency storage method and system of object storage device
CN111198849A (en) * 2020-01-10 2020-05-26 国网福建省电力有限公司 Power supply data read-write system based on Hadoop and working method thereof
CN111327681A (en) * 2020-01-21 2020-06-23 北京工业大学 Cloud computing data platform construction method based on Kubernetes
CN112395354B (en) * 2020-11-05 2022-08-02 深圳市中博科创信息技术有限公司 Distributed relational database based on HDFS metadata server and construction method
CN113419687B (en) * 2021-07-13 2022-08-12 广东电网有限责任公司 Object storage method, system, equipment and storage medium
CN117076391B (en) * 2023-10-12 2024-03-22 长江勘测规划设计研究有限责任公司 Water conservancy metadata management system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102419766B (en) * 2011-11-01 2013-11-20 西安电子科技大学 Data redundancy and file operation methods based on Hadoop distributed file system (HDFS)
CN103793425B (en) * 2012-10-31 2017-07-14 国际商业机器公司 Data processing method and device for distributed system
CN103473365B (en) * 2013-09-25 2017-06-06 北京奇虎科技有限公司 A kind of file memory method based on HDFS, device and distributed file system

Also Published As

Publication number Publication date
CN104113597A (en) 2014-10-22

Similar Documents

Publication Publication Date Title
CN104113597B (en) HDFS data read-write method for multiple data centers
CN107066319B (en) Multi-dimensional scheduling system for heterogeneous resources
CN105468473B (en) Data migration method and data migration device
US10366111B1 (en) Scalable distributed computations utilizing multiple distinct computational frameworks
CN103106152B (en) Based on the data dispatching method of level storage medium
CN109314721B (en) Management of multiple clusters of a distributed file system
US20150379050A1 (en) Configurable-capacity time-series tables
CN104508639B (en) Use the coherency management of coherency domains table
CN103067461A (en) Metadata management system of document and metadata management method thereof
JPWO2012121316A1 (en) Distributed storage system and method
CN105574217B (en) The method of data synchronization and device of distributed relation database
CN103067433A (en) Method, device and system of data migration of distributed type storage system
CN103080903A (en) Scheduler, multi-core processor system, and scheduling method
CN102981929A (en) Management method and system for disk mirror images
CN105339899B (en) For making the method and controller of application program cluster in software defined network
CN101753439A (en) Method for distributing and transmitting streaming media
CN114553865B (en) Heterogeneous hybrid cloud system architecture design method
CN103793442A (en) Spatial data processing method and system
CN103067488A (en) Implement method of unified storage
US10776404B2 (en) Scalable distributed computations utilizing multiple distinct computational frameworks
CN104539730A (en) Load balancing method of facing video in HDFS
CN105677761A (en) Data sharding method and system
CN104869140A (en) Multi-cluster system and method for controlling data storage of multi-cluster system
CN102207978A (en) Database access method and system
CN109992373A (en) Resource regulating method, approaches to IM and device and task deployment system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Dong Bo

Inventor after: Ruan Jianfei

Inventor after: Zheng Qinghua

Inventor after: He Huan

Inventor after: Zhang Hanning

Inventor after: Zhang Weizhan

Inventor before: Dong Bo

Inventor before: Zhang Hanning

Inventor before: Zheng Qinghua

Inventor before: He Huan

Inventor before: Zhang Weizhan

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: DONG BO ZHANG HANNING ZHENG QINGHUA HE HUAN ZHANG WEIZHAN TO: DONG BO RUANJIANFEI ZHENG QINGHUA HE HUAN ZHANG HANNING ZHANG WEIZHAN

C14 Grant of patent or utility model
GR01 Patent grant