CN108984635A - HDFS storage system and data storage method - Google Patents
- Publication number
- CN108984635A (application number CN201810643546.7A)
- Authority
- CN
- China
- Prior art keywords
- management node
- storage
- metadata
- metadata management
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an HDFS storage system comprising: multiple metadata management nodes, a distributed-system high-availability component connected to each metadata management node, and a metadata storage pool. The metadata management nodes receive and process storage requests for data to be stored. When the metadata management node corresponding to a given distributed-system high-availability component goes down, that component transfers the storage requests sent to the failed node to another metadata management node. The metadata storage pool stores the data to be stored, and each of the metadata management nodes establishes a communication link with the metadata storage pool. The HDFS storage system provided by the invention guarantees data consistency during service switchover, so that data to be stored is not lost during the switchover. The invention also provides a data storage method with the same beneficial effects.
Description
Technical field
The present invention relates to the technical field of data storage, and in particular to an HDFS storage system and a data storage method.
Background
HDFS is the storage component of the Hadoop big data platform and is responsible for storing all of its data. The NameNode is the metadata management module of HDFS; if the NameNode fails, the entire HDFS storage system becomes unavailable. For this reason, HDFS introduced a high-availability solution based on an active/standby mode: at any given time the active NameNode handles the data storage service, and if the active NameNode fails, the standby NameNode takes over the service so that storage for the whole platform continues.
In the active/standby NameNode architecture of a traditional HDFS storage system, only the active NameNode is in the active state at any given time and can accept data storage requests; the standby NameNode is in the Standby state. The active and standby NameNodes share one storage area. On switchover, the standby NameNode reads the shared storage area, obtains the latest state, and becomes the active NameNode. High availability implemented in this way may lose data or leave it inconsistent during the switchover. In addition, because only one NameNode serves requests at any given time, the overall load on it is heavy.
Existing HDFS storage systems use log-based management, and the log is exported at fixed time intervals. If the active NameNode fails, the standby NameNode reads the log and takes over the service. Because the log is exported only at intervals, any data that had not yet been synchronized to the standby NameNode when the active NameNode failed is lost. Moreover, only one NameNode serves external requests at any given time, which leads to overload.
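The loss window described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not HDFS code: the class `PrimaryNameNode`, the parameter `export_every`, and the helper `ops_lost_on_failover` are hypothetical names, and the export interval is counted in operations rather than in time to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryNameNode:
    """Toy active node that exports its log to shared storage only every `export_every` operations."""
    export_every: int
    local_log: list = field(default_factory=list)
    shared_log: list = field(default_factory=list)

    def apply(self, op: str) -> None:
        self.local_log.append(op)
        # The export happens only at interval boundaries, not on every operation.
        if len(self.local_log) % self.export_every == 0:
            self.shared_log = list(self.local_log)


def ops_lost_on_failover(primary: PrimaryNameNode) -> list:
    """Operations the standby never sees if the active node fails right now."""
    return primary.local_log[len(primary.shared_log):]


primary = PrimaryNameNode(export_every=3)
for i in range(5):
    primary.apply(f"mkdir /d{i}")

lost = ops_lost_on_failover(primary)
print(lost)
```

In this run the third operation triggers an export, so the last two operations exist only on the failed node and are the ones a standby could not recover.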
In summary, how to guarantee data consistency during service switchover when the active metadata management node fails is the problem that currently needs to be solved.
Summary of the invention
The object of the present invention is to provide an HDFS storage system that, when a metadata management node fails, automatically switches to another metadata management node and guarantees data consistency during service switchover. The present invention also provides a data storage method with the same beneficial effects.
To solve the above technical problem, the present invention provides an HDFS storage system comprising: multiple metadata management nodes, a distributed-system high-availability component connected to each metadata management node, and a metadata storage pool. The metadata management nodes receive and process storage requests for data to be stored. When the metadata management node corresponding to a given distributed-system high-availability component goes down, that component transfers the storage requests sent to that metadata management node to another metadata management node. The metadata storage pool stores the data to be stored, and each of the metadata management nodes establishes a communication link with the metadata storage pool.
Preferably, the system further comprises a client that establishes communication connections with the multiple metadata management nodes; the multiple metadata management nodes provide multiple virtual IP addresses to the client.
Preferably, the client sends the storage request for the data to be stored to a virtual IP address of a metadata management node; when the metadata management node corresponding to that virtual IP address goes down, the distributed-system high-availability component transfers the virtual IP address to another metadata management node.
Preferably, the multiple metadata management nodes are specifically configured to: receive and process the storage requests for data to be stored sent by the client, maintain the file directory tree of the entire file system, and maintain the correspondence between files and their lists of data blocks.
Preferably, the metadata storage pool is a distributed storage pool, and the multiple metadata management nodes stay in communication with the metadata cluster in the metadata storage pool.
Preferably, the system further comprises data nodes connected to the metadata storage pool, which store and retrieve data as scheduled by the client or the metadata management nodes and, at preset time intervals, send the metadata management nodes a list of the blocks that each data node stores.
The present invention also provides a data storage method, comprising:
receiving and processing storage requests for data to be stored using multiple metadata management nodes;
wherein each metadata management node is connected to a distributed-system high-availability component;
when the metadata management node corresponding to a given distributed-system high-availability component goes down, transferring the storage requests sent to that metadata management node to another metadata management node;
establishing communication connections between the multiple metadata management nodes and the metadata storage pool, so that the data to be stored is stored in the metadata storage pool.
Preferably, receiving and processing storage requests for data to be stored using multiple metadata management nodes comprises: receiving and processing, using the multiple metadata management nodes, the storage requests for data to be stored sent by a client, wherein the client establishes communication connections with the multiple metadata management nodes and the multiple metadata management nodes provide multiple virtual IP addresses to the client.
Preferably, the multiple metadata management nodes providing multiple virtual IP addresses to the client comprises: the client sends the storage request for the data to be stored to a virtual IP address of a metadata management node; when the metadata management node corresponding to that virtual IP address goes down, the distributed-system high-availability component transfers the virtual IP address to another metadata management node.
Preferably, the metadata storage pool is a distributed storage pool, and the multiple metadata management nodes stay in communication with the metadata cluster in the metadata storage pool.
The HDFS storage system provided by the present invention comprises multiple metadata management nodes; each metadata management node is connected to a distributed-system high-availability component, and each such component is connected to the metadata storage pool. The metadata management nodes receive and process the data storage requests sent by the client. When the current metadata management node fails, the distributed-system high-availability component connected to it switches the storage request to another metadata management node for processing. The metadata management nodes are independent of one another, need no data synchronization between them, and jointly operate on the same copy of the data. When a metadata management node fails, the distributed-system high-availability component sends the storage request to another metadata management node for processing, so that the storage service is not interrupted, the service as a whole remains highly available, data consistency is guaranteed during service switchover, and data to be stored is not lost during the switchover.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a structural block diagram of an HDFS storage system provided by an embodiment of the present invention;
Fig. 2 is a flow chart of a specific embodiment of the data storage method provided by an embodiment of the present invention.
Detailed description of the embodiments
The core of the present invention is to provide an HDFS storage system that guarantees data consistency during service switchover, so that data to be stored is not lost during the switchover. The present invention also provides a data storage method with the same beneficial effects.
In an existing HDFS storage system, the NameNodes synchronize metadata by reading an image file. The active NameNode records the operations performed on the system by writing a log file and, at fixed time intervals, writes the log information into the image file. When a NameNode switchover is detected, the standby NameNode actively reads the image file to obtain the state of the active NameNode and thereby complete the switchover. If the service is interrupted after log entries have been recorded but before they are written to the image, data is lost or becomes inconsistent.
To overcome these shortcomings of the prior art, the present invention provides an HDFS storage system with multiple metadata management nodes that uses the high-availability component CTDB so that, when a metadata management node fails, the client's storage requests are transferred to another metadata management node. This ensures the high availability of the storage system and the consistency of data during service switchover.
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a structural block diagram of an HDFS storage system provided by an embodiment of the present invention. The HDFS storage system provided by this embodiment comprises: multiple metadata management nodes, a distributed-system high-availability component connected to each metadata management node, and a metadata storage pool. The metadata management nodes receive and process storage requests for data to be stored. When the metadata management node corresponding to a given distributed-system high-availability component goes down, that component transfers the storage requests sent to that metadata management node to another metadata management node. The metadata storage pool stores the data to be stored, and each of the metadata management nodes establishes a communication link with the metadata storage pool.
In this embodiment, multiple metadata management nodes can provide service externally, for example three or four metadata management nodes. The multiple metadata management nodes jointly operate on the same copy of the metadata, which solves the problem of excessive load pressure caused by the single metadata management node in existing storage systems and greatly improves the working efficiency of the HDFS storage system.
In this embodiment, the HDFS storage system further comprises a client that establishes communication connections with the multiple metadata management nodes; the multiple metadata management nodes provide multiple virtual IP addresses to the client. The client sends storage requests to these virtual IP addresses. When the metadata management node corresponding to a virtual IP address fails, the distributed-system high-availability component CTDB transfers that virtual IP address to another metadata management node, which continues to provide the service.
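The virtual IP takeover can be sketched as follows. This is a toy model under stated assumptions and does not use or reproduce the CTDB interface: `VirtualIpManager`, the node names `mn1` to `mn3`, the IP addresses, and the "fewest addresses held" placement rule are all hypothetical choices made for illustration.

```python
class VirtualIpManager:
    """Hypothetical stand-in for the high-availability component (not the CTDB API)."""

    def __init__(self, assignments: dict, nodes: list):
        self.assignments = dict(assignments)  # virtual IP -> owning node name
        self.healthy = set(nodes)

    def node_down(self, node: str) -> None:
        """Mark a node as failed and move each of its virtual IPs to a healthy node."""
        self.healthy.discard(node)
        if not self.healthy:
            raise RuntimeError("no healthy metadata management node left")
        for vip, owner in list(self.assignments.items()):
            if owner == node:
                owners = list(self.assignments.values())
                # Pick the healthy node that currently holds the fewest virtual IPs.
                self.assignments[vip] = min(sorted(self.healthy), key=owners.count)

    def route(self, vip: str) -> str:
        """Return the node that currently answers on this virtual IP."""
        return self.assignments[vip]


manager = VirtualIpManager(
    {"10.0.0.101": "mn1", "10.0.0.102": "mn2", "10.0.0.103": "mn3"},
    ["mn1", "mn2", "mn3"],
)
manager.node_down("mn1")
print(manager.route("10.0.0.101"))
```

The client keeps sending to the same virtual IP address; only the node behind the address changes, which is why the client needs no reconfiguration after a failure.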
The metadata management nodes receive and process storage requests for data to be stored. They also maintain the file directory tree of the entire file system and the correspondence between files and their lists of data blocks. The multiple metadata management nodes jointly maintain and share a single copy of the metadata.
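The idea of several management nodes maintaining one shared copy of the metadata can be sketched like this. It is a minimal single-process illustration: `MetadataPool`, `MetadataNode`, and their methods are hypothetical names, and the concurrency control that a real distributed pool would need is deliberately left out because the text does not describe it.

```python
class MetadataPool:
    """Toy shared pool: one copy of the namespace, visible to every management node."""

    def __init__(self):
        self.directories = {"/"}
        self.file_blocks = {}  # file path -> ordered list of block ids


class MetadataNode:
    """Toy metadata management node that keeps no private copy of the metadata."""

    def __init__(self, name: str, pool: MetadataPool):
        self.name = name
        self.pool = pool

    def mkdir(self, path: str) -> None:
        self.pool.directories.add(path)

    def add_block(self, path: str, block_id: str) -> None:
        self.pool.file_blocks.setdefault(path, []).append(block_id)

    def blocks_of(self, path: str) -> list:
        return list(self.pool.file_blocks.get(path, []))


pool = MetadataPool()
node_a = MetadataNode("mn1", pool)
node_b = MetadataNode("mn2", pool)

node_a.mkdir("/logs")
node_a.add_block("/logs/app.log", "blk_1")
node_a.add_block("/logs/app.log", "blk_2")

# A different node answers from the same shared state, with no synchronization step.
print(node_b.blocks_of("/logs/app.log"))
```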
The metadata storage pool is a distributed storage pool, and the multiple metadata management nodes stay in communication with the metadata cluster in the metadata storage pool.
The HDFS storage system provided by this embodiment further comprises data nodes (DataNodes) connected to the metadata storage pool. The data nodes store and retrieve data as scheduled by the client or the metadata management nodes and, at preset time intervals, send the metadata management nodes a list of the blocks that each data node stores.
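The periodic block report can be sketched with a simulated clock. The names `DataNode`, `maybe_report`, and `block_map`, as well as the 10-second interval, are assumptions for illustration; the text specifies only that the list is sent at a preset interval.

```python
class DataNode:
    """Toy data node that reports its block list at a preset interval (in seconds)."""

    def __init__(self, name: str, report_interval_s: float):
        self.name = name
        self.report_interval_s = report_interval_s
        self.blocks = set()
        self.last_report = None  # time of the most recent report, if any

    def store(self, block_id: str) -> None:
        self.blocks.add(block_id)

    def maybe_report(self, now: float, block_map: dict) -> bool:
        """Send the block list if the interval has elapsed; return whether a report was sent."""
        due = self.last_report is None or now - self.last_report >= self.report_interval_s
        if due:
            block_map[self.name] = sorted(self.blocks)
            self.last_report = now
        return due


block_map = {}  # metadata-side view: data node name -> block ids
dn = DataNode("dn1", report_interval_s=10)
dn.store("blk_1")
first = dn.maybe_report(0, block_map)    # first report is sent immediately
dn.store("blk_2")
early = dn.maybe_report(5, block_map)    # interval not yet elapsed, nothing sent
stale_view = list(block_map["dn1"])
later = dn.maybe_report(10, block_map)   # interval elapsed, full list sent
```

Between two reports the metadata side sees a slightly stale block list, which is the trade-off of interval-based reporting.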
In this embodiment, the distributed-system high-availability component CTDB is used to provide distributed high availability, changing the way high availability is achieved in traditional storage systems. Through CTDB, the metadata management nodes jointly maintain a single copy of the metadata; each metadata management node works independently of the others while sharing the same copy of the data. When one of the metadata management nodes fails, CTDB restarts the service according to the cluster state at that moment and switches the storage requests on the failed metadata management node to other metadata management nodes. This ensures that the overall service of the HDFS storage system is not interrupted, keeps the service as a whole highly available, and guarantees data consistency during service switchover.
In addition, the HDFS storage system provided by the embodiment of the present invention allows multiple metadata management nodes to serve the client at the same time, which solves the problem of excessive load pressure on the single metadata management node in prior-art storage systems and improves the efficiency with which the storage system processes data.
Referring to Fig. 2, Fig. 2 is a flow chart of a specific embodiment of the data storage method provided by an embodiment of the present invention. The specific steps are as follows:
Step S201: receive and process storage requests for data to be stored using multiple metadata management nodes, wherein each metadata management node is connected to a distributed-system high-availability component;
Step S202: when the metadata management node corresponding to a given distributed-system high-availability component goes down, transfer the storage requests sent to that metadata management node to another metadata management node;
Step S203: establish communication connections between the multiple metadata management nodes and the metadata storage pool, so that the data to be stored is stored in the metadata storage pool.
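Steps S201 to S203 can be read together as the following request path. This is a hedged sketch, not the patented implementation: `MetadataNode`, `NodeDown`, and `store` are hypothetical names, the shared pool is modelled as a plain dictionary, and the transfer in S202 is modelled as trying the next node in a list.

```python
class NodeDown(Exception):
    """Raised when a request reaches a metadata management node that has failed."""


class MetadataNode:
    def __init__(self, name: str, pool: dict):
        self.name = name
        self.pool = pool   # S203: every node is connected to the same shared pool
        self.alive = True

    def handle(self, key: str, value: str) -> str:
        # S201: a node receives and processes the storage request.
        if not self.alive:
            raise NodeDown(self.name)
        self.pool[key] = value
        return self.name


def store(key: str, value: str, nodes: list) -> str:
    """Send the request to the first node; on failure, transfer it to another (S202)."""
    for node in nodes:
        try:
            return node.handle(key, value)
        except NodeDown:
            continue
    raise RuntimeError("all metadata management nodes are down")


shared_pool = {}
nodes = [MetadataNode("mn1", shared_pool), MetadataNode("mn2", shared_pool)]
nodes[0].alive = False                      # simulate the current node going down
served_by = store("/data/file1", "payload", nodes)
print(served_by, shared_pool)
```

Because the request is written to the shared pool by whichever node ends up handling it, no state has to be copied from the failed node.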
The data storage method provided by this embodiment is used to implement the HDFS storage system described above; for specific implementations of the method, see the embodiments of the HDFS storage system above, which are not repeated here.
The HDFS storage system and the data storage method provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to aid understanding of the method of the invention and its core idea. It should be noted that a person of ordinary skill in the art may make improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (10)
1. An HDFS storage system, characterized by comprising:
multiple metadata management nodes, a distributed-system high-availability component connected to each metadata management node, and a metadata storage pool;
wherein the metadata management nodes are used to receive and process storage requests for data to be stored;
the distributed-system high-availability component is used to transfer, when the metadata management node corresponding to that component goes down, the storage requests sent to that metadata management node to another metadata management node;
the metadata storage pool is used to store the data to be stored, and each of the multiple metadata management nodes establishes a communication link with the metadata storage pool.
2. The HDFS storage system according to claim 1, characterized by further comprising: a client that establishes communication connections with the multiple metadata management nodes; the multiple metadata management nodes provide multiple virtual IP addresses to the client.
3. The HDFS storage system according to claim 2, characterized in that the client sends the storage request for the data to be stored to a virtual IP address of a metadata management node; when the metadata management node corresponding to that virtual IP address goes down, the distributed-system high-availability component transfers the virtual IP address to another metadata management node.
4. The HDFS storage system according to claim 3, characterized in that the multiple metadata management nodes are specifically used to: receive and process the storage requests for data to be stored sent by the client, maintain the file directory tree of the entire file system, and maintain the correspondence between files and their lists of data blocks.
5. The HDFS storage system according to claim 1, characterized in that the metadata storage pool is a distributed storage pool, and the multiple metadata management nodes stay in communication with the metadata cluster in the metadata storage pool.
6. The HDFS storage system according to claim 1, characterized by further comprising: data nodes connected to the metadata storage pool, used to store and retrieve data as scheduled by the client or the metadata management nodes and, at preset time intervals, to send the metadata management nodes a list of the blocks that each data node stores.
7. A data storage method, characterized in that it is applied to an HDFS storage system and comprises:
receiving and processing storage requests for data to be stored using multiple metadata management nodes;
wherein each metadata management node is connected to a distributed-system high-availability component;
when the metadata management node corresponding to a given distributed-system high-availability component goes down, transferring the storage requests sent to that metadata management node to another metadata management node;
establishing communication connections between the multiple metadata management nodes and the metadata storage pool, so that the data to be stored is stored in the metadata storage pool.
8. The data storage method according to claim 7, characterized in that receiving and processing storage requests for data to be stored using multiple metadata management nodes comprises:
receiving and processing, using the multiple metadata management nodes, the storage requests for data to be stored sent by a client;
wherein the client establishes communication connections with the multiple metadata management nodes, and the multiple metadata management nodes provide multiple virtual IP addresses to the client.
9. The data storage method according to claim 8, characterized in that the multiple metadata management nodes providing multiple virtual IP addresses to the client comprises:
the client sends the storage request for the data to be stored to a virtual IP address of a metadata management node; when the metadata management node corresponding to that virtual IP address goes down, the distributed-system high-availability component transfers the virtual IP address to another metadata management node.
10. The data storage method according to claim 1, characterized in that the metadata storage pool is a distributed storage pool, and the multiple metadata management nodes stay in communication with the metadata cluster in the metadata storage pool.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810643546.7A CN108984635A (en) | 2018-06-21 | 2018-06-21 | HDFS storage system and data storage method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108984635A true CN108984635A (en) | 2018-12-11 |
Family
ID=64541664
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810643546.7A Pending CN108984635A (en) | 2018-06-21 | 2018-06-21 | HDFS storage system and data storage method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108984635A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104994168A (en) * | 2015-07-14 | 2015-10-21 | 苏州科达科技股份有限公司 | distributed storage method and distributed storage system |
CN107181608A (en) * | 2016-03-11 | 2017-09-19 | 阿里巴巴集团控股有限公司 | A kind of method and operation management system for recovering service and performance boost |
CN107920131A (en) * | 2017-12-08 | 2018-04-17 | 郑州云海信息技术有限公司 | A kind of metadata management method and device of HDFS storage systems |
Non-Patent Citations (1)
Title |
---|
MINGFEI10: "分布式高可用CTDB方案", 《CHINAUNIX》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111338647A (en) * | 2018-12-18 | 2020-06-26 | 杭州海康威视数字技术股份有限公司 | Big data cluster management method and device |
CN111338647B (en) * | 2018-12-18 | 2023-09-12 | 杭州海康威视数字技术股份有限公司 | Big data cluster management method and device |
CN111143027A (en) * | 2019-12-06 | 2020-05-12 | 北京浪潮数据技术有限公司 | Cloud platform management method, system, equipment and computer readable storage medium |
CN113824812A (en) * | 2021-08-27 | 2021-12-21 | 济南浪潮数据技术有限公司 | Method, device and storage medium for HDFS service to acquire service node IP |
CN113824812B (en) * | 2021-08-27 | 2023-02-28 | 济南浪潮数据技术有限公司 | Method, device and storage medium for HDFS service to acquire service node IP |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107590182B (en) | Distributed log collection method | |
EP3039549B1 (en) | Distributed file system using consensus nodes | |
CN104679772B (en) | Method, apparatus, equipment and the system of file are deleted in Distributed Data Warehouse | |
CN108984635A (en) | HDFS storage system and data storage method | |
CN100375093C (en) | Processing of multiroute processing element data | |
CN108964948A (en) | Principal and subordinate's service system, host node fault recovery method and device | |
CN103905537A (en) | System for managing industry real-time data storage in distributed environment | |
CN104320401A (en) | Big data storage and access system and method based on distributed file system | |
US11068499B2 (en) | Method, device, and system for peer-to-peer data replication and method, device, and system for master node switching | |
CN103237046A (en) | Distributed file system supporting mixed cloud storage application and realization method thereof | |
CN105095317A (en) | Distributive database service management system | |
CN102143237A (en) | Grid-based Internet content delivery method and system | |
CN103207841A (en) | Method and device for data reading and writing on basis of key-value buffer | |
CN108881512A (en) | Virtual IP address equilibrium assignment method, apparatus, equipment and the medium of CTDB | |
CN103544285A (en) | Data loading method and device | |
CN109992373A (en) | Resource regulating method, approaches to IM and device and task deployment system | |
CN102546776A (en) | Method for realizing off-line reading files in SAN (Storage Area Networking) shared file system | |
CN109871365A (en) | A kind of distributed file system | |
CN107682411A (en) | A kind of extensive SDN controllers cluster and network system | |
CN102831038B (en) | The disaster recovery method and ENUM-DNS of ENUM-DNS | |
CN101621535B (en) | Network communication method and device of real-time monitoring system | |
CN111475537B (en) | Global data synchronization system based on pulsar | |
CN112835862B (en) | Data synchronization method, device, system and storage medium | |
CN106227470A (en) | A kind of SRM method and device | |
CN105007172A (en) | Method for realizing HDFS high-availability scheme |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20181211 |