CN105278882A - Disk management method of distributed file system - Google Patents


Publication number
CN105278882A
Authority
CN
China
Prior art keywords
disk
management
node
step
metadata
Prior art date
Application number
CN201510699041.9A
Other languages
Chinese (zh)
Inventor
杨卫华
严鹏
Original Assignee
创新科存储技术有限公司
创新科存储技术(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 创新科存储技术有限公司, 创新科存储技术(深圳)有限公司 filed Critical 创新科存储技术有限公司
Priority to CN201510699041.9A priority Critical patent/CN105278882A/en
Publication of CN105278882A publication Critical patent/CN105278882A/en

Abstract

The invention discloses a disk management method for a distributed file system. The method comprises the following steps: when a disk is added to the distributed file system, the data storage node hosting the disk sends a disk-added notification message to the metadata management node; after receiving the notification message, the metadata management node allocates a disk management identifier to the disk and stores the identifier in its database; and the disk is then operated on based on the disk management identifier. The scheme enables highly reliable maintenance of the distributed file system.

Description

A disk management method for a distributed file system

Technical field

The present application relates to the technical fields of computer networks and storage, and in particular to a disk management method for a distributed file system.

Background

A distributed file system (Distributed File System) is a file system in which the physical storage resources under management are not necessarily attached directly to the local node, but are connected to nodes through a computer network.

A traditional computer manages and stores data on local disks through its file system. With the rapid growth of applications such as mobile devices, social networks, and the Internet of Things, the data produced by human society is growing explosively. Simply expanding the storage capacity of a computer's file system by adding hard disks gives unsatisfactory results in capacity, capacity growth rate, data backup, and data security. A distributed file system can effectively solve this data storage and management problem: a file system fixed at one location is extended to any number of locations and file systems, and the many nodes form a file system network. The nodes can be located in different places and communicate and transfer data with one another over the network. Users of a distributed file system need not care which node data is stored on or retrieved from; they manage and store data in the file system just as they would with a local file system.

Once the number of nodes in a distributed file system reaches a certain scale, the operability of maintaining the system's physical machines, and especially its disks, becomes a particularly prominent issue.

Summary of the invention

The present application provides a disk management method for a distributed file system that enables highly reliable maintenance of the distributed file system.

The disk management method for a distributed file system provided by the embodiments of the present application comprises:

A. When a disk is added to the distributed file system, the data storage node hosting the disk sends a disk-added notification message to the metadata management node;

B. After receiving the notification message, the metadata management node allocates a disk management identifier to the disk and stores the identifier in the database of the metadata management node;

C. The disk is operated on based on the disk management identifier.

Optionally, the method further comprises: when the management information of a disk changes, the data storage node notifies the metadata management node of the disk's disk management identifier and disk management information; the metadata management node retrieves the corresponding disk management information from the database based on the disk management identifier and updates it.

Optionally, the disk management information comprises: disk capacity, free space, cluster identifier, and the universally unique identifier (UUID) generated when the disk is formatted.

Optionally, the operations on the disk in step C comprise: data reading and writing, data migration, data balancing, bringing a disk online, and dropping a disk.

Optionally, the method further comprises:

When a disk rejoins the distributed file system after going offline, the data storage node hosting the disk updates its local disk management information;

The data storage node sends its locally maintained disk management information to the metadata management node;

After receiving the disk management information, the metadata management node verifies the disk. The verification comprises: checking whether the disk's cluster identifier matches the current cluster; if so, checking whether the disk's disk management identifier exists in the database; and if so, further checking whether the UUID is consistent with the database. If verification passes, the metadata management node updates the disk management information in the database according to the received disk management information and returns a verification success message to the data storage node. Otherwise, the metadata management node returns a verification failure message to the data storage node.

Optionally, step C comprises:

If a disk on a data storage node is pulled out or suddenly goes offline, the data storage node detects the disk's offline state and updates its locally maintained disk management information;

The data storage node sends the metadata management node a disk-offline notification message carrying the disk management identifier of the disk;

The metadata management node receives the disk-offline notification message and sets the disk management information corresponding to the disk management identifier in the database to invalid.

Optionally, after the disk management information corresponding to the disk management identifier in the database is set to invalid, the method further comprises:

The metadata management node reads information from other data storage nodes, reconstructs the data stored on the disk corresponding to the disk management identifier, and writes that data to other disks in the distributed file system.

Optionally, step C comprises:

The metadata management node determines, from the heartbeat messages exchanged with the data storage nodes, whether any data storage node has gone offline; if so, the metadata management node determines the disk management identifiers of the disks on that data storage node and sets the disk management information corresponding to those identifiers in the database to invalid;

The metadata management node reads information from other data storage nodes, reconstructs the data stored on the disks corresponding to those disk management identifiers, and writes that data to other disks in the distributed file system.

Optionally, step C comprises:

Step 601: A client sends a data write request to the metadata management node through the proxy module;

Step 602: The metadata management node searches the disk management information in the database, finds a disk whose storage space can satisfy the write request, and sends the disk management identifier and location information of that disk to the proxy module;

Step 603: The proxy module routes the write request to the corresponding disk according to the disk management identifier and location information, and the data is written;

Step 604: Determine whether the write succeeded; if so, the process ends; otherwise, perform step 605;

Step 605: The data storage node hosting the disk returns a write failure message to the metadata management node, and the process returns to step 602.

Optionally, step C comprises:

Step 701: A client sends a data read request to the metadata management node through the proxy module;

Step 702: The metadata management node searches the disk management information in the database, finds the disk corresponding to the read request, and sends the disk management identifier and location information of that disk to the proxy module;

Step 703: The proxy module routes the read request to the corresponding disk according to the disk management identifier and location information, and the data is read;

Step 704: Determine whether the read succeeded; if so, the process ends; otherwise, perform step 705;

Step 705: Determine whether the number of read attempts exceeds a preset threshold; if so, the process ends; otherwise, perform step 706;

Step 706: The data storage node hosting the disk returns a read failure message to the metadata management node, and the process returns to step 702.

As can be seen from the above technical solution, each disk has a globally unique disk management identifier within the distributed file system, and operations such as data placement, reading, and writing are performed on the disk based on this identifier. Disks are independent of data storage nodes: after a disk has joined the distributed file system, it can be moved from one data storage node to another without migrating or modifying the data on it. Because disks are independent of data storage nodes and are maintained through their management identifiers, the scheme simplifies maintenance of the cluster, organizes all hard disks in a rational way, and provides corresponding error handling and recovery mechanisms.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of the distributed file system provided by an embodiment of the present application;

Fig. 2 is a schematic flowchart of the disk management method for a distributed file system provided by an embodiment of the present application;

Fig. 3 is a schematic flowchart of the management process for a disk coming back online after going offline, provided by an embodiment of the present application;

Fig. 4 is a schematic flowchart of the handling of a disk going offline, provided by an embodiment of the present application;

Fig. 5 is a schematic flowchart of the handling of a data storage node going offline, provided by an embodiment of the present application;

Fig. 6 is a schematic flowchart of the handling of a data write exception, provided by another embodiment of the present application;

Fig. 7 is a schematic flowchart of the handling of a data read exception, provided by another embodiment of the present application.

Detailed description

The basic design idea of the disk management method for a distributed file system provided by the present application is as follows: each disk has a globally unique identifier within the distributed file system, and operations such as data placement, reading, and writing are performed on the disk based on this identifier. Disks are independent of data storage nodes: after a disk has joined the distributed file system, it can be moved from one data storage node to another without migrating or modifying the data on it.

To make the technical principles, features, and technical effects of the solution clearer, the solution is described in detail below with reference to specific embodiments.

Fig. 1 is a schematic structural diagram of the distributed file system provided by an embodiment of the present application. The distributed file system comprises an agent node 101, a metadata management node 102, and multiple data storage nodes 103. Each data storage node holds at least one disk, and each disk has a globally unique management identifier. The database in the metadata management node 102 stores and manages these management identifiers together with other necessary disk management information.

The agent node 101 serves as the communication interface between the distributed file system and clients: it receives the various requests from clients and forwards each request to the corresponding node.

The data storage node 103 is the functional unit that stores file data. Each data storage node 103 maintains the disk management information relevant to itself. When its local metadata needs to be updated, it sends a data update request to the metadata management node 102; the metadata management node 102 verifies the request and, once verification completes, updates the disk management information in the database.

The flow of the disk management method for a distributed file system provided by an embodiment of the present application is shown in Fig. 2 and comprises the following steps:

Step 201: When a disk is added to the distributed file system, the data storage node hosting the disk sends a disk-added notification message to the metadata management node.

Step 202: After receiving the notification message, the metadata management node allocates a globally unique identifier to the disk (hereinafter referred to as the management identifier).

Step 203: The management identifier is stored in the database of the metadata management node.

Step 204: The disk is operated on based on the disk management identifier.

All subsequent operations on the disk are performed based on this management identifier. The operations include but are not limited to: data reading and writing, data migration, data balancing, bringing a disk online, and dropping a disk.
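Steps 201 to 204 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `MetadataNode` class, its method names, the counter-based identifier scheme, and the `node`/`slot` record fields are all assumptions made for illustration, and the database is a plain in-memory dictionary.

```python
import itertools


class MetadataNode:
    """Hypothetical in-memory stand-in for the metadata management node."""

    def __init__(self):
        self.database = {}               # management identifier -> disk record
        self._ids = itertools.count(1)   # assumed ID scheme: a simple counter

    def on_disk_added(self, node, slot):
        """Steps 202-203: allocate a globally unique ID and store it."""
        disk_id = next(self._ids)
        self.database[disk_id] = {"node": node, "slot": slot}
        return disk_id

    def on_disk_moved(self, disk_id, node, slot):
        """Only the location changes; the identifier stays the same."""
        self.database[disk_id].update(node=node, slot=slot)


meta = MetadataNode()
disk_a = meta.on_disk_added("dn1", slot=0)   # step 201 notification arrives
disk_b = meta.on_disk_added("dn1", slot=1)
meta.on_disk_moved(disk_a, "dn2", slot=3)    # disk keeps its ID on a new node
```

Because every later operation looks the disk up by `disk_id` rather than by its host node, moving the disk only requires refreshing its location record.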

When the management information of a disk changes, the data storage node notifies the metadata management node of the disk's disk management identifier and management information; the metadata management node retrieves the corresponding disk management information from the database based on the disk management identifier and updates it. The disk management information includes but is not limited to: disk capacity, free space, cluster identifier, and the universally unique identifier (UUID, Universally Unique Identifier) generated when the disk is formatted.
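As a rough sketch of this record and its update path, the disk management information can be modeled as a small immutable structure keyed by the management identifier. The `DiskInfo` field names and the `apply_update` helper are hypothetical; the source only lists the kinds of information stored.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DiskInfo:
    """Fields named in the text; the attribute names are assumptions."""
    capacity_bytes: int
    free_bytes: int
    cluster_id: str
    uuid: str          # generated when the disk is formatted


def apply_update(database, disk_id, **changes):
    """Look up the record by management identifier and replace changed fields."""
    current = database[disk_id]              # raises KeyError for an unknown ID
    database[disk_id] = replace(current, **changes)
    return database[disk_id]


database = {7: DiskInfo(capacity_bytes=1000, free_bytes=1000,
                        cluster_id="cluster-a", uuid="disk-uuid-7")}
updated = apply_update(database, 7, free_bytes=400)   # storage node reports a change
```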

If a data storage node adds a new disk or removes a disk, that data storage node notifies the metadata management node. If the distributed file system is restarted, all data storage nodes in the system notify the metadata management node.

Taking data reading and writing as an example, the process is as follows: the client sends a write request through the proxy module; the metadata management node searches the database for a disk with free space and returns that disk's management identifier and location information (the location of the data storage node and the position of the disk within that node) to the proxy module. The proxy module then writes the data at the data storage node according to the management identifier and location information.

While a distributed file system is running, abnormal conditions such as disk failures, disks going offline, and disks coming back online are inevitable. These conditions must be handled so that the distributed file system keeps running normally.

Fig. 3 shows the management process, provided by an embodiment of the present application, for a disk coming back online after going offline, comprising:

Step 301: When a disk rejoins the distributed file system after going offline, the data storage node hosting the disk updates its local disk management information.

Step 302: The data storage node sends its locally maintained disk management information to the metadata management node. Preferably, in this step the data storage node may send only the changed part of the disk management information to the metadata management node.

Step 303: After receiving the disk management information, the metadata management node verifies the disk. The verification comprises: checking whether the cluster information matches the current cluster; if so, checking whether the disk's disk management identifier exists in the database; and if so, further checking whether the disk UUID is consistent with the database. If verification passes, perform step 304; otherwise, perform step 305.

Step 304: The metadata management node updates the disk management information in the database according to the received disk management information and returns a verification success message to the data storage node.

Step 305: The metadata management node returns a verification failure message to the data storage node.
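The three checks of step 303 and the outcomes of steps 304 and 305 can be sketched as a single function. This is an illustrative sketch only: the function name, the dictionary-based records, and the key names `cluster_id` and `uuid` are assumptions, and the success or failure message is reduced to a boolean return value.

```python
def verify_rejoin(database, current_cluster, disk_id, reported):
    """Return True (verification success) or False (verification failure)."""
    # Check 1: the disk's cluster identifier must match the current cluster.
    if reported["cluster_id"] != current_cluster:
        return False
    # Check 2: the management identifier must already exist in the database.
    record = database.get(disk_id)
    if record is None:
        return False
    # Check 3: the UUID must be consistent with the stored one.
    if record["uuid"] != reported["uuid"]:
        return False
    # Step 304: update the stored information from the received information.
    record.update(reported)
    return True


database = {7: {"cluster_id": "cluster-a", "uuid": "u-7", "free_bytes": 100}}
good = {"cluster_id": "cluster-a", "uuid": "u-7", "free_bytes": 60}
ok = verify_rejoin(database, "cluster-a", 7, good)
wrong_uuid = verify_rejoin(database, "cluster-a", 7,
                           {"cluster_id": "cluster-a", "uuid": "other"})
unknown_id = verify_rejoin(database, "cluster-a", 8, good)
```

Ordering the checks from cheapest to most specific mirrors the order given in step 303, and the database is touched only after all three pass.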

Fig. 4 shows the handling of a disk going offline, provided by an embodiment of the present application, comprising:

Step 401: If a disk on a data storage node is pulled out or suddenly goes offline, the data storage node detects the disk's offline state and updates its locally maintained disk management information.

Specifically, the data storage node can obtain the disk state from the udev events of the Linux system.

Step 402: The data storage node sends the metadata management node a disk-offline notification message carrying the disk management identifier of the disk.

Step 403: The metadata management node receives the disk-offline notification message and sets the disk management information corresponding to the disk management identifier in the database to invalid.

Optionally, after step 403 the method further comprises:

Step 404: The metadata management node reads information from other data storage nodes, reconstructs the data stored on the disk corresponding to the disk management identifier, and writes that data to other disks in the distributed file system.
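Steps 403 and 404 can be sketched as follows. The source does not say how the lost data is reconstructed, so this sketch assumes simple block replication: `placement` is a hypothetical map from block name to the set of disk identifiers holding a copy, and a "rebuild" just records which surviving copy is read and which disk receives the new copy.

```python
def handle_disk_offline(database, placement, disk_id):
    """Mark the disk invalid, then plan a new copy for each affected block."""
    database[disk_id]["valid"] = False            # step 403
    rebuilt = {}
    for block, holders in placement.items():      # step 404 (assumed replication)
        if disk_id not in holders:
            continue
        holders.discard(disk_id)
        survivors = [d for d in holders if database[d]["valid"]]
        targets = [d for d, rec in database.items()
                   if rec["valid"] and d not in holders]
        if survivors and targets:
            source, target = min(survivors), min(targets)
            holders.add(target)                   # the target now holds a copy
            rebuilt[block] = (source, target)
    return rebuilt


database = {1: {"valid": True}, 2: {"valid": True}, 3: {"valid": True}}
placement = {"b1": {1, 2}, "b2": {2, 3}}
plan = handle_disk_offline(database, placement, 1)
```

Picking the lowest-numbered survivor and target keeps the sketch deterministic; a real system would presumably weigh free space and load instead.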

Fig. 5 shows the handling of a data storage node going offline, provided by an embodiment of the present application, comprising:

Step 501: The metadata management node determines, from the heartbeat messages exchanged with the data storage nodes, whether any data storage node has gone offline; if so, perform step 502.

Step 502: The metadata management node determines the disk management identifiers of the disks on that data storage node and sets the disk management information corresponding to those identifiers in the database to invalid.

Step 503: The metadata management node reads information from other data storage nodes, reconstructs the data stored on the disks corresponding to those disk management identifiers, and writes that data to other disks in the distributed file system.
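Steps 501 and 502 can be sketched with a timeout on the last heartbeat seen from each node. The timeout rule is an assumption: the source only says that offline nodes are detected from heartbeat messages. The function names and record layout are likewise hypothetical.

```python
def find_offline_nodes(last_heartbeat, now, timeout):
    """Step 501: a node is treated as offline if its heartbeat is too old."""
    return [node for node, seen in last_heartbeat.items() if now - seen > timeout]


def invalidate_node_disks(database, node):
    """Step 502: mark every disk registered on the offline node as invalid."""
    disk_ids = [d for d, rec in database.items() if rec["node"] == node]
    for disk_id in disk_ids:
        database[disk_id]["valid"] = False
    return disk_ids


last_heartbeat = {"dn1": 100.0, "dn2": 95.0}     # timestamps in seconds
database = {1: {"node": "dn1", "valid": True},
            2: {"node": "dn2", "valid": True},
            3: {"node": "dn2", "valid": True}}
offline = find_offline_nodes(last_heartbeat, now=106.0, timeout=10.0)
affected = [invalidate_node_disks(database, node) for node in offline]
```

The identifiers returned by `invalidate_node_disks` are the ones whose data would then be rebuilt on other disks in step 503.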

Fig. 6 shows the handling of a data write exception, provided by another embodiment of the present application, comprising:

Step 601: A client sends a data write request to the metadata management node through the proxy module.

Step 602: The metadata management node searches the disk management information in the database, finds a disk whose storage space can satisfy the write request, and sends the disk management identifier and location information of that disk to the proxy module.

Step 603: The proxy module routes the write request to the corresponding disk according to the disk management identifier and location information, and the data is written.

Step 604: Determine whether the write succeeded; if so, the process ends; otherwise, perform step 605.

Step 605: The data storage node hosting the disk returns a write failure message to the metadata management node, and the process returns to step 602.
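The loop formed by steps 602 to 605 can be sketched as below. Two details are assumptions added so the sketch terminates: disks that already failed are skipped on the next pass through step 602, and the loop gives up when no candidate remains. The `try_write` callback is a hypothetical stand-in for the proxy module routing the request to the disk.

```python
def write_with_retry(database, size, try_write):
    """Return the ID of the disk that accepted the write, or None."""
    failed = set()                                    # assumed: skip failed disks
    while True:
        # Step 602: find a valid disk whose free space satisfies the request.
        candidates = sorted(d for d, rec in database.items()
                            if rec["valid"] and rec["free"] >= size
                            and d not in failed)
        if not candidates:
            return None
        disk_id = candidates[0]
        # Steps 603-604: route the write and check whether it succeeded.
        if try_write(disk_id):
            database[disk_id]["free"] -= size
            return disk_id
        # Step 605: failure reported; go back to step 602.
        failed.add(disk_id)


database = {1: {"valid": True, "free": 50}, 2: {"valid": True, "free": 50}}
chosen = write_with_retry(database, 10, try_write=lambda d: d == 2)
none_left = write_with_retry(database, 10, try_write=lambda d: False)
```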

Fig. 7 shows the handling of a data read exception, provided by another embodiment of the present application, comprising:

Step 701: A client sends a data read request to the metadata management node through the proxy module.

Step 702: The metadata management node searches the disk management information in the database, finds the disk corresponding to the read request, and sends the disk management identifier and location information of that disk to the proxy module.

Step 703: The proxy module routes the read request to the corresponding disk according to the disk management identifier and location information, and the data is read.

Step 704: Determine whether the read succeeded; if so, the process ends; otherwise, perform step 705.

Step 705: Determine whether the number of read attempts exceeds a preset threshold; if so, the process ends; otherwise, perform step 706.

Step 706: The data storage node hosting the disk returns a read failure message to the metadata management node, and the process returns to step 702.
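The bounded retry in steps 702 to 706 can be sketched as below. The `locate` and `try_read` callbacks are hypothetical stand-ins for the database lookup and the routed read, and a failed read is represented by `None`; the source does not define these interfaces.

```python
def read_with_retry(locate, try_read, threshold):
    """Return the data read, or None once the attempts exceed the threshold."""
    attempts = 0
    while True:
        disk_id = locate()                 # step 702: find the disk for the request
        attempts += 1
        data = try_read(disk_id)           # step 703: route the read to that disk
        if data is not None:               # step 704: success ends the process
            return data
        if attempts > threshold:           # step 705: too many attempts, stop
            return None
        # Step 706: failure reported to the metadata node; back to step 702.


calls = []

def always_fail(disk_id):
    calls.append(disk_id)
    return None

responses = iter([None, b"payload"])
gave_up = read_with_retry(lambda: 1, always_fail, threshold=2)
recovered = read_with_retry(lambda: 1, lambda d: next(responses), threshold=2)
```

With the comparison written as "exceeds", a threshold of 2 allows three attempts in total before the process ends.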

In the disk management method provided by the present application, disks are independent of data storage nodes and are maintained through their management identifiers. This simplifies maintenance of the cluster, organizes all hard disks in a rational way, and provides corresponding error handling and recovery mechanisms.

It should be understood that although this specification is described in terms of individual embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity. Those skilled in the art should treat the specification as a whole; the technical solutions in the individual embodiments may also be combined as appropriate to form other embodiments that those skilled in the art can understand.

The foregoing describes only preferred embodiments of the present application and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the technical solution shall fall within the scope of protection of the present application.

Claims (10)

1. A disk management method for a distributed file system, characterized by comprising:
A. When a disk is added to the distributed file system, the data storage node hosting the disk sends a disk-added notification message to the metadata management node;
B. After receiving the notification message, the metadata management node allocates a disk management identifier to the disk and stores the identifier in the database of the metadata management node;
C. The disk is operated on based on the disk management identifier.
2. The method according to claim 1, characterized in that the method further comprises: when the management information of a disk changes, the data storage node notifies the metadata management node of the disk's disk management identifier and disk management information; the metadata management node retrieves the corresponding disk management information from the database based on the disk management identifier and updates it.
3. The method according to claim 2, characterized in that the disk management information comprises: disk capacity, free space, cluster identifier, and the universally unique identifier (UUID) generated when the disk is formatted.
4. The method according to claim 1, characterized in that the operations on the disk in step C comprise: data reading and writing, data migration, data balancing, bringing a disk online, and dropping a disk.
5. The method according to claim 3, characterized in that the method further comprises:
When a disk rejoins the distributed file system after going offline, the data storage node hosting the disk updates its local disk management information;
The data storage node sends its locally maintained disk management information to the metadata management node;
After receiving the disk management information, the metadata management node verifies the disk. The verification comprises: checking whether the disk's cluster identifier matches the current cluster; if so, checking whether the disk's disk management identifier exists in the database; and if so, further checking whether the UUID is consistent with the database. If verification passes, the metadata management node updates the disk management information in the database according to the received disk management information and returns a verification success message to the data storage node. Otherwise, the metadata management node returns a verification failure message to the data storage node.
6. The method according to claim 1, characterized in that step C comprises:
If a disk on a data storage node is pulled out or suddenly goes offline, the data storage node detects the disk's offline state and updates its locally maintained disk management information;
The data storage node sends the metadata management node a disk-offline notification message carrying the disk management identifier of the disk;
The metadata management node receives the disk-offline notification message and sets the disk management information corresponding to the disk management identifier in the database to invalid.
7. The method according to claim 6, characterized in that, after the disk management information corresponding to the disk management identifier in the database is set to invalid, the method further comprises:
The metadata management node reads information from other data storage nodes, reconstructs the data stored on the disk corresponding to the disk management identifier, and writes that data to other disks in the distributed file system.
8. The method according to claim 1, characterized in that step C comprises:
The metadata management node determines, from the heartbeat messages exchanged with the data storage nodes, whether any data storage node has gone offline; if so, the metadata management node determines the disk management identifiers of the disks on that data storage node and sets the disk management information corresponding to those identifiers in the database to invalid;
The metadata management node reads information from other data storage nodes, reconstructs the data stored on the disks corresponding to those disk management identifiers, and writes that data to other disks in the distributed file system.
9. The method according to claim 1, characterized in that step C comprises:
Step 601: A client sends a data write request to the metadata management node through the proxy module;
Step 602: The metadata management node searches the disk management information in the database, finds a disk whose storage space can satisfy the write request, and sends the disk management identifier and location information of that disk to the proxy module;
Step 603: The proxy module routes the write request to the corresponding disk according to the disk management identifier and location information, and the data is written;
Step 604: Determine whether the write succeeded; if so, the process ends; otherwise, perform step 605;
Step 605: The data storage node hosting the disk returns a write failure message to the metadata management node, and the process returns to step 602.
10. The method according to claim 1, characterized in that step C comprises:
Step 701: A client sends a data read request to the metadata management node through the proxy module;
Step 702: The metadata management node searches the disk management information in the database, finds the disk corresponding to the read request, and sends the disk management identifier and location information of that disk to the proxy module;
Step 703: The proxy module routes the read request to the corresponding disk according to the disk management identifier and location information, and the data is read;
Step 704: Determine whether the read succeeded; if so, the process ends; otherwise, perform step 705;
Step 705: Determine whether the number of read attempts exceeds a preset threshold; if so, the process ends; otherwise, perform step 706;
Step 706: The data storage node hosting the disk returns a read failure message to the metadata management node, and the process returns to step 702.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510699041.9A CN105278882A (en) 2015-10-26 2015-10-26 Disk management method of distributed file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510699041.9A CN105278882A (en) 2015-10-26 2015-10-26 Disk management method of distributed file system

Publications (1)

Publication Number Publication Date
CN105278882A true CN105278882A (en) 2016-01-27

Family

ID=55147969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510699041.9A CN105278882A (en) 2015-10-26 2015-10-26 Disk management method of distributed file system

Country Status (1)

Country Link
CN (1) CN105278882A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250278A (en) * 2016-08-04 2016-12-21 深圳市泽云科技有限公司 The data of magnetic disk array restoration methods that an a kind of key performs
CN106406754A (en) * 2016-08-31 2017-02-15 北京小米移动软件有限公司 Data migration method and device
CN106527996A (en) * 2016-11-23 2017-03-22 成都广达新网科技股份有限公司 Disk identification method and device for disk cabinet
CN106528005A (en) * 2017-01-12 2017-03-22 郑州云海信息技术有限公司 Disk adding method and device for distributed type storage system
CN106528005B (en) * 2017-01-12 2019-12-31 苏州浪潮智能科技有限公司 Disk adding method and device of distributed storage system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102122306A (en) * 2011-03-28 2011-07-13 中国人民解放军国防科学技术大学 Data processing method and distributed file system applying same
CN102843403A (en) * 2011-06-23 2012-12-26 盛大计算机(上海)有限公司 File processing method based on distributed file system, system, and client
US20140250155A1 (en) * 2011-11-17 2014-09-04 Huawei Technologies Co., Ltd. Metadata storage and management method for cluster file system
CN104113606A (en) * 2014-08-02 2014-10-22 成都致云科技有限公司 Uniformity dynamically-balanced distributed metadata node framework
CN104731915A (en) * 2015-03-24 2015-06-24 上海爱数软件有限公司 Magnetic disk device mapping method in distributed memory system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160127