WO2018058949A1 - Data storage method, device and system

Info

Publication number: WO2018058949A1
Application number: PCT/CN2017/082141
Authority: WO
Inventors: 任永强; 谢晓芹
Original assignee: 华为技术有限公司
Priority date: 2016-09-30
Filing date: 2017-04-27
Publication date: 2018-04-05
Also published as: CN106446197B; CN106446197A

Abstract

A data storage method, device and system, the method comprising: determining a subtree path in a distributed file system (101); obtaining home node information on the subtree path, the home node information indicating information about connections between subtrees on the subtree path and the home data node of each subtree (102); and storing the home node information on the subtree path (103). The present technical solution can resolve the problem of long subtree takeover time when a fault occurs in an MDS, and lower the requirement for consistency of the subtree information of the MDSs.

Description

Data storage method, device and system

Technical field

The present invention relates to the field of communications technologies, and in particular, to a data storage method, apparatus, and system.

Background technique

A distributed file system (Distributed File System) is a file system network formed by extending a file system fixed to one node to any number of nodes/file systems and connecting them through a plurality of nodes. The system usually includes the following three types of nodes: protocol server (Protocol Server), metadata server (Meta Data Server, MDS), and data server (Data Server, DS). The distributed file system has various ways of dividing metadata services, such as a dynamic subtree based approach. In the dynamic subtree based approach, each node maintains one or more subtrees and is responsible for a portion of the metadata service.

Currently, when an MDS in a distributed file system fails, a normal MDS (that is, the receiver) must be selected to take over the subtrees belonging to the faulty MDS so that the metadata service can continue. This process is called subtree takeover or subtree recovery. Subtree takeover proceeds as follows: the receiver collects the subtree path information and the attribution relationships from all the nodes in the system, and deduplicates the collected subtree information to regenerate the subtree information of the faulty MDS, that is, to reconstruct the faulty MDS. In this mode, the receiver needs to send a request to all the MDSs in the system. When the cluster is large, the receiver sends and receives a large number of messages during the takeover, which easily creates a single-point bottleneck and results in a long takeover time. Moreover, this method requires the subtree information of the MDSs to be highly consistent; if the subtree information of the MDSs is inconsistent, the receiver's reconstruction algorithm is likely to become complicated.

Summary of the invention

The embodiments of the invention provide a data storage method, device and system, which can solve the problem that subtree takeover takes a long time when a data node fails, and can reduce the consistency requirement on the subtree information of the data nodes.

In a first aspect, an embodiment of the present invention provides a data storage method, including:

Determining a subtree path in the distributed file system, obtaining node attribution information on the subtree path, and storing node attribution information on the subtree path.

The subtree path may include the root node of each subtree on each data node in the distributed file system (a subtree root node, referred to as a “subtree root”), and the subtree nodes on the path from the root node of each such subtree to the root node of the system. The node attribution information indicates connection information between subtrees on the subtree path and the home data node of each subtree. Therefore, operations such as subtree migration or subtree takeover can be performed based on the stored node attribution information, thereby reducing the consistency requirement on subtree information in the system.

In an optional embodiment, storing the node attribution information on the subtree path may specifically be: generating a subtree path attribution table that includes the node attribution information on the subtree path, and storing the subtree path attribution table. For example, the subtree path attribution table can be written to disk for persistent storage.

The connection information between the subtrees on the subtree path includes an inode number of each subtree node on the subtree path, and an inode number of a previous subtree node of each subtree node. Optionally, the subtree path attribution table may further include subtree root node information and the like on the subtree path.

In an optional embodiment, when a target subtree belonging to a first data node needs to be migrated to a second data node, the index node number of the target subtree may be obtained; an index node number that is the same as the index node number of the target subtree is found in the subtree path attribution table; and the home data node corresponding to the found index node number in the subtree path attribution table is updated to the second data node. The index node number of the target subtree may specifically refer to the index node number of the root node of the target subtree. Subtree migration can thereby be implemented based on the stored node attribution information.

In an optional embodiment, a subtree takeover request indicating that a third data node is faulty may also be received; in response to the subtree takeover request, the home data node of the subtrees belonging to the third data node in the subtree path attribution table is updated to a fourth data node; and the subtree information that belonged to the third data node in the subtree path attribution table before the update is returned to the fourth data node, so that the fourth data node performs subtree reconstruction in its data cache. Subtree takeover can thus be performed based on the stored node attribution information, which solves the problem that subtree takeover takes a long time when a data node such as an MDS fails.

In a second aspect, the embodiment of the present invention further provides a data storage system, including: a first data node and a central node; wherein

a first data node, configured to provide a data service for a subtree of the first data node;

a central node, configured to determine a subtree path in the distributed file system where the first data node is located, acquire node attribution information on the subtree path, and store the node attribution information on the subtree path.

The subtree path may include the root node of each subtree on each data node in the distributed file system, and the subtree nodes on the path from the root node of each such subtree to the root node of the system. The node attribution information indicates connection information between subtrees on the subtree path and the home data node of each subtree. Optionally, the central node may be any data node in the distributed file system, such as an MDS (that is, the central-node identity is superimposed on the data node, which still provides its metadata service), or may be a node set up separately in the system, which is not limited in the embodiment of the present invention.

In an optional embodiment, the central node may be further configured to generate a subtree path attribution table including node attribution information on the subtree path, and store the subtree path attribution table.

The connection information between the subtrees on the subtree path may include an inode number of each subtree node on the subtree path and an inode number of a previous subtree node of each subtree node. Optionally, the subtree path attribution table may further include subtree root node information on the subtree path, a subtree node name associated with the inode number, and the like.

Further, the system may further include: a second data node; wherein

The first data node may be further configured to send a migration notification message to the central node, where the migration notification message indicates an inode number of the target subtree to be migrated and a second data node to be migrated to;

The central node is further configured to: find, in the subtree path attribution table, an index node number that is the same as the index node number of the target subtree; and update the home data node corresponding to the found index node number in the subtree path attribution table to the second data node.

The first data node is the data node that needs to perform subtree migration, and the second data node is the data node determined as the migration destination. Optionally, the second data node may be selected according to a preset rule, such as the data node with the lowest heat, or may be randomly selected, which is not limited in the embodiment of the present invention. Subtree migration can thereby be implemented based on the stored node attribution information.

Further, the second data node may be further configured to construct a data cache, and update a home data node of the target subtree in the cache to the second data node;

The first data node is further configured to update a home data node of the target subtree in a data cache of the first data node to the second data node.

That is, the second data node may also construct a data cache. After the subtree path attribution table is updated, the second data node may change the home data node of the target subtree in its cache to itself, and the first data node may change the home data node of the target subtree in its cache to the second data node, so that subtree information can be extracted quickly.

In an optional embodiment, the system may include a third data node and a fourth data node, where the third data node is a failed data node and the fourth data node is the data node determined to take over the subtrees of the third data node;

The fourth data node is configured to send, to the central node, a subtree takeover request for indicating that the third data node is faulty;

The central node is configured to receive the subtree takeover request, update the home data node of the subtrees belonging to the third data node in the subtree path attribution table to the fourth data node, and return to the fourth data node the subtree information that belonged to the third data node in the subtree path attribution table before the update;

The fourth data node is further configured to perform subtree reconstruction in a data cache of the fourth data node based on the subtree information of the third data node.

The central node, the first data node, the second data node, the third data node, the fourth data node, and the like may be the same data node or different nodes, which is not limited in the embodiment of the present invention. Subtree migration and subtree takeover can therefore be performed based on the stored node attribution information, which solves the problem that subtree takeover takes a long time when a data node such as an MDS fails, and reduces the consistency requirement on subtree information in the system.

In a third aspect, the embodiment of the present invention further provides a data storage device, including: a path determining module, a first obtaining module, and a storage module, wherein the data storage device can implement some or all of the steps of the data storage method of the first aspect by using the foregoing modules.

In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores a program, and the program includes some or all of the steps of the data storage method of the first aspect.

In a fifth aspect, an embodiment of the present invention further provides a data server, including: a memory and a processor, where the memory is connected to the processor;

The memory is used to store driver software;

The processor is configured to read the driver software from the memory and execute the function of the driver software:

Determining the subtree path in the distributed file system;

Obtaining node attribution information on the subtree path, where the node attribution information indicates connection information between subtrees on the subtree path and a home data node of each subtree;

The node attribution information on the subtree path is stored.

Optionally, the processor is further configured to read the driver software from the memory and perform some or all of the steps of the data storage method of the first aspect described above by the driver software.

Embodiments of the present invention have the following beneficial effects:

In the embodiments of the present invention, the subtree paths in the distributed file system are determined, node attribution information indicating the connection information between the subtrees on the subtree paths and the home data node of each subtree is obtained, and the node attribution information is stored. Migration between subtrees and takeover of subtrees upon a fault can then be performed based on the stored node attribution information, which solves the problem that subtree takeover takes a long time when a data node fails and reduces the consistency requirement on the subtree information of the data nodes.

Description of the drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without any creative work.

FIG. 1 is a structural diagram of a distributed file system according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a data storage method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a subtree grouping according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a subtree path according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of interaction of a data storage method according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a subtree migration according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of another subtree path according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of interaction of another data storage method according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of yet another subtree path according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a data storage device according to an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a data storage system according to an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of a data server according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

A distributed file system usually includes three types of nodes: a protocol server (Protocol Server), an MDS, and a DS. A plurality of nodes are connected to form a cluster. The protocol server is responsible for providing users with standard network file services such as NFS (Network File System) and SMB (Server Message Block). The MDS is responsible for providing metadata-related services such as parsing paths, finding files, and creating files. The DS is responsible for storing file data. A user program can access the file system through a protocol client. The protocol client and the protocol server are connected through a front-end network (FE), and the nodes in the file system are connected to one another through a back-end network (BE). The front-end network carries requests and data between the user service and the distributed file system, and the back-end network carries requests and data between the node devices within the distributed file system. Further, the three types of nodes may be logical nodes, and may be deployed on the same physical device or deployed separately, which is not limited in the embodiment of the present invention.

In a distributed file system, in order to achieve better performance, the metadata service of the entire system can be distributed across the MDSs (or DSs), with each MDS (or DS) responsible for a part of the metadata service. In addition, the system can continue to provide metadata services when one or several MDSs (or DSs) fail, thereby improving system reliability. The embodiments of the present invention are described by taking an MDS as the data node. The distributed file system can divide the metadata service among the MDSs in several ways, for example, based on dynamic subtrees. Specifically, the system can count the heat of each file, directory (or directory fragment), and directory tree in the cache according to the access type of the service (such as getattr, setattr, readdir, etc.) and the access frequency: different access types accumulate different amounts of heat, and the higher the access frequency, the higher the heat. As the heat increases, the entire tree can be divided into multiple subtrees that are migrated to different MDSs. Each MDS is responsible for providing the metadata service for the subtrees that belong to it. Each MDS only needs to cache the metadata related to itself, including: the metadata of the subtrees belonging to itself (used to provide the metadata service); the boundary points between its subtrees and their subordinate subtrees (so that, when access proceeds downward, it knows which other MDS the metadata belongs to); and the path between the root of each subtree and the root directory (so that the full path can still be parsed when a node fails).

Further, as the load of each MDS and the heat of each subtree change, subtrees can move freely between MDSs and can be split into smaller subtrees or merged into larger ones. The division and attribution of subtrees are thus adjusted dynamically with service access (that is, with the heat, which reflects the load of the nodes) rather than being static.
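The heat accounting described above can be sketched as follows. This is only an illustration under stated assumptions: the per-operation weights, the propagation of heat to ancestor directories, and the names `HEAT_WEIGHTS` and `HeatCounter` are hypothetical. The text states only that different access types accumulate different heat and that more frequent access means higher heat.

```python
from collections import defaultdict

# Hypothetical weights: the text names the access types but gives no values.
HEAT_WEIGHTS = {"getattr": 1.0, "setattr": 2.0, "readdir": 3.0}


class HeatCounter:
    """Accumulates heat per path; each access adds the weight of its type."""

    def __init__(self, weights=HEAT_WEIGHTS):
        self.weights = weights
        self.heat = defaultdict(float)

    def record(self, path, access_type):
        # Assumption: an access also heats every ancestor directory, so the
        # heat of a directory tree reflects the accesses beneath it.
        parts = [p for p in path.split("/") if p]
        for depth in range(len(parts) + 1):
            prefix = "/" + "/".join(parts[:depth])
            self.heat[prefix] += self.weights[access_type]


counter = HeatCounter()
counter.record("/usr/share/vim", "readdir")
counter.record("/usr/share/vim", "getattr")
counter.record("/var/log", "getattr")
```

With these example weights, `/usr/share/vim` and its ancestors below the root accumulate more heat than `/var/log`, which is the kind of signal that could drive a decision to split off and migrate a subtree.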

The present application introduces a central node that centrally manages the subtree information of the entire system. The central node may be any MDS (or DS) node in the system (that is, the central-node identity is superimposed on the MDS, which still provides the metadata service as an MDS), or may be an additionally set up node, which is not limited in the embodiment of the present invention. This solves the problems of a large single-point message volume and a long takeover time when subtrees are taken over in a large cluster, and also simplifies the takeover logic.

The embodiments of the invention disclose a data storage method, device, data server and system, which can solve the problem that subtree takeover takes a long time when a data node fails, and can reduce the consistency requirement on the subtree information of the data nodes. The details are explained below.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a data storage method according to an embodiment of the present invention. Specifically, the method of the embodiment of the present invention may be specifically applied to a central node such as an MDS or a DS. As shown in FIG. 2, the method of the embodiment of the present invention may include the following steps:

101. Determine a subtree path in the distributed file system.

The subtree path of a subtree may refer to the path from the root of the subtree to the root of the system. In the embodiment of the present invention, the subtree path of each subtree in the distributed file system (hereinafter referred to as the “system”) may be determined first. That is, the subtree paths in the system may include the root node of each subtree on each data node in the system, and the subtree nodes on the path from each such subtree root to the root node of the system (the system root directory).
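As a minimal sketch of this step, the path of one subtree can be obtained by following parent links from the subtree root up to the system root, and the subtree paths of the system are the union of those walks. The inode numbers, the `PARENT` map, and both function names are hypothetical and serve only as an illustration.

```python
# Hypothetical parent links (child inode -> parent inode).
# None marks the system root directory.
PARENT = {1: None, 2: 1, 4: 2, 9: 4}


def subtree_path(root_ino, parent):
    """Return the inode numbers from a subtree root up to the system root."""
    path = []
    ino = root_ino
    while ino is not None:
        path.append(ino)
        ino = parent[ino]
    return path


def system_subtree_paths(subtree_roots, parent):
    """Union of the paths of all subtree roots, in first-seen order."""
    seen = {}
    for root in subtree_roots:
        for ino in subtree_path(root, parent):
            seen.setdefault(ino, None)
    return list(seen)
```

In this example, inode 9 lies below subtree root 4 and is therefore not on any subtree path; only the nodes between each subtree root and the system root are collected.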

102. Acquire node attribution information on the subtree path, where the node attribution information indicates connection information between subtrees on the subtree path and a home data node of each subtree.

The connection information between the subtrees on the subtree path may include information such as an inode number of each subtree node and an inode number of a previous subtree node of each subtree node. The index node number is identification information of the subtree node, and each index node number uniquely identifies a subtree node (such as a directory, a file, etc.) in the distributed file system.

103. Store node attribution information on the subtree path.

Optionally, storing the node attribution information on the subtree path may specifically be: generating a subtree path attribution table that includes the node attribution information on the subtree path, and storing the subtree path attribution table. That is, the subtree path attribution table includes the subtree paths from all the subtrees in the system to the system root node. Optionally, the subtree path attribution table may further include subtree root node information on the subtree path, the name of the subtree node associated with each inode number, and so on.

Referring to FIG. 3, FIG. 3 is a schematic diagram of a subtree grouping according to an embodiment of the present invention. As shown in FIG. 3, the distributed file system includes three data nodes, namely MDS1, MDS2, and MDS3, and is divided into four subtrees: A, B, C, and D. The subtree nodes in subtree A (/, var, lib, etc, usr, lib, lib.so.6) belong to MDS1; the subtree nodes in subtree B (log, messages, news, news.err) and the subtree nodes in subtree D (such as vim72, plugin) belong to MDS2; and the subtree nodes in subtree C (share, bash, helpfiles, kill, cd, vim, site) belong to MDS3. "/" is the root node of the distributed file system, that is, the system root directory. The subtree nodes on the subtree paths in the distributed file system are /, var, usr, log, share, vim, and vim72. Assume that they are recorded as ino1, ino2, ino3, ino4, ino5, ino6, and ino7, respectively (other forms of identification are of course possible; this is only an example). Each subtree node on a subtree path is associated with its corresponding identification information ino ("/" with ino1, "var" with ino2, "usr" with ino3, "log" with ino4, "share" with ino5, "vim" with ino6, and "vim72" with ino7), as shown in FIG. 4. Further, based on the node attribution information of the subtree nodes on the subtree paths, a subtree path attribution table may be generated, as shown in Table 1 below:

Table 1

Here, ino is a subtree node on the subtree path; dirino is the previous subtree node (parent node) of that subtree node; subtree flag carries the subtree root node (referred to as “subtree root”) information, where “1” indicates that the subtree node is the subtree root node of a data node and “0” indicates that it is not; and auth is the home data node of the subtree node. The subtree flag thus marks whether a node on the path is a subtree root (that is, a boundary at which the attribution of the upper and lower nodes changes). Optionally, the subtree root information need not be stored in the subtree path attribution table, because it can also be deduced from the attribution information, that is, determined from ino, dirino, and auth. For example, the home data node of ino3 is MDS1 and the home data node of ino4 is MDS2; because the attribution changes, it can be determined that a subtree starts at ino4 and that ino4 is the subtree root of that subtree. Further, each ino recorded in the attribution table is associated with one of /, var, usr, log, share, vim, and vim72. The ino is part of the attributes of a file or directory and is unique in the system, so it is equivalent to an ID of the file, directory, or the like. Therefore, the ino in the attribution table cannot be replaced by the name of the subtree node, because multiple files (or directories) with the same name may exist in the system; if there are multiple directories named "var", for example, the name cannot uniquely determine the subtree node.
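The deduction of subtree roots from ino, dirino, and auth can be sketched as follows. The home data nodes are taken from the FIG. 3 description above; the parent (dirino) links are assumptions based on a conventional directory layout (/var/log and /usr/share/vim/vim72), and the `Row` type and `subtree_roots` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    ino: str               # inode number of the subtree node
    dirino: Optional[str]  # inode number of the parent node (None for "/")
    auth: str              # home data node of the subtree node


TABLE = [
    Row("ino1", None, "MDS1"),    # /
    Row("ino2", "ino1", "MDS1"),  # var   (parent assumed to be /)
    Row("ino3", "ino1", "MDS1"),  # usr   (parent assumed to be /)
    Row("ino4", "ino2", "MDS2"),  # log   (parent assumed to be var)
    Row("ino5", "ino3", "MDS3"),  # share (parent assumed to be usr)
    Row("ino6", "ino5", "MDS3"),  # vim   (parent assumed to be share)
    Row("ino7", "ino6", "MDS2"),  # vim72 (parent assumed to be vim)
]


def subtree_roots(table):
    """A node is treated as a subtree root if it has no parent in the table
    or if its home data node differs from that of its parent."""
    auth_of = {row.ino: row.auth for row in table}
    return [
        row.ino
        for row in table
        if row.dirino is None or auth_of[row.dirino] != row.auth
    ]
```

Under these assumptions the deduced roots are ino1, ino4, ino5, and ino7, matching the four subtrees A, B, C, and D. Note that this deduction cannot detect a boundary between two adjacent subtrees that share a home data node, which is one situation in which storing the subtree flag explicitly may be preferable.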

Optionally, the subtree path attribution table may be written to the disk for persistent storage. Further, the content in the subtree path attribution table may be cached in the form of a tree in the memory to quickly access the subtree path and obtain the subtree node information.
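A minimal sketch of persisting the table and rebuilding an in-memory tree from it is shown below. The JSON file format, the write-then-rename strategy, and the names `ROWS`, `persist`, and `load_tree` are assumptions made for illustration; the text specifies only that the table may be written to disk and cached in memory in the form of a tree.

```python
import json
import os
import tempfile
from collections import defaultdict

# Hypothetical rows: inode number, parent inode number, home data node.
ROWS = [
    {"ino": 1, "dirino": None, "auth": "MDS1"},
    {"ino": 2, "dirino": 1, "auth": "MDS1"},
    {"ino": 3, "dirino": 1, "auth": "MDS1"},
    {"ino": 4, "dirino": 2, "auth": "MDS2"},
]


def persist(rows, path):
    """Write the table to a temporary file, then atomically replace `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(rows, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the disk
    os.replace(tmp, path)


def load_tree(path):
    """Reload the table and index it as parent inode -> list of child inodes."""
    with open(path) as f:
        rows = json.load(f)
    children = defaultdict(list)
    for row in rows:
        children[row["dirino"]].append(row["ino"])
    return rows, dict(children)
```

The `children` map lets the central node walk a subtree path downward from the system root (stored under the key `None`) without scanning the whole table.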

Further optionally, after the node attribution information on the subtree path is stored, subtree migration may be performed based on the stored node attribution information. Specifically, when a target subtree belonging to a first data node needs to be migrated to a second data node, the central node may acquire the index node number of the target subtree; find, in the subtree path attribution table, an index node number that is the same as the index node number of the target subtree; and update the home data node corresponding to the found index node number in the subtree path attribution table to the second data node. Optionally, the subtree migration process may be triggered when a node is too hot, or when the heat allocation of the system needs to be adjusted, and so on. The second data node that is the migration destination may be any data node in the system; it may be selected according to a preset rule, such as the data node with the lowest heat, or may be randomly selected, which is not limited in the embodiment of the present invention.
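The table update for a migration can be sketched as follows. This follows the text literally by updating only the row whose inode number matches the target subtree's root; the row layout and the name `migrate_subtree` are hypothetical.

```python
# Hypothetical excerpt of a subtree path attribution table.
TABLE = [
    {"ino": "ino1", "dirino": None, "auth": "MDS1"},
    {"ino": "ino6", "dirino": "ino5", "auth": "MDS3"},
    {"ino": "ino7", "dirino": "ino6", "auth": "MDS2"},
]


def migrate_subtree(table, target_ino, new_auth):
    """Find the row whose ino equals target_ino and set its home data node.

    Returns the previous home data node; raises KeyError if the inode
    number is not in the table.
    """
    for row in table:
        if row["ino"] == target_ino:
            old_auth = row["auth"]
            row["auth"] = new_auth
            return old_auth
    raise KeyError(target_ino)
```

For example, `migrate_subtree(TABLE, "ino7", "MDS1")` changes the home data node of ino7 from MDS2 to MDS1 and leaves the other rows unchanged.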

Further optionally, after the node attribution information on the subtree path is stored, subtree takeover (subtree recovery) may be performed based on the stored node attribution information when a data node fails. Specifically, a subtree takeover request indicating that a third data node is faulty is received; in response to the subtree takeover request, the home data node of the subtrees belonging to the third data node in the subtree path attribution table is updated to a fourth data node; and the subtree information that belonged to the third data node in the subtree path attribution table before the update is returned to the fourth data node, so that the fourth data node performs subtree reconstruction in its cache. The third data node is the faulty node, and the fourth data node is the data node determined to take over the subtrees of the third data node. Optionally, the fourth data node may be selected according to a preset rule, such as the data node with the lowest heat, or may be randomly selected, which is not limited in the embodiment of the present invention.

In the embodiments of the present invention, the subtree paths in the distributed file system are determined, node attribution information indicating the connection information between the subtrees on the subtree paths and the home data node of each subtree is obtained, and the node attribution information is stored. Migration between subtrees and takeover of subtrees upon a fault can then be performed based on the stored node attribution information, which solves the problem that subtree takeover takes a long time when a data node such as an MDS fails and reduces the consistency requirement on the subtree information of the MDSs.
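The central node's handling of a takeover request can be sketched as follows: every row homed on the failed node is reassigned to the receiver, and the pre-update rows are returned so that the receiver can rebuild the subtrees in its cache. The row layout and the name `take_over` are hypothetical.

```python
# Hypothetical excerpt of a subtree path attribution table.
TABLE = [
    {"ino": "ino1", "dirino": None, "auth": "MDS1"},
    {"ino": "ino4", "dirino": "ino2", "auth": "MDS2"},
    {"ino": "ino5", "dirino": "ino3", "auth": "MDS3"},
    {"ino": "ino7", "dirino": "ino6", "auth": "MDS2"},
]


def take_over(table, failed, receiver):
    """Reassign every row homed on `failed` to `receiver`.

    Returns copies of the affected rows as they were before the update.
    """
    before = [dict(row) for row in table if row["auth"] == failed]
    for row in table:
        if row["auth"] == failed:
            row["auth"] = receiver
    return before
```

Because the central node already holds the attribution of every node on the subtree paths, the receiver obtains what it needs from one request instead of querying every MDS in the cluster.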

Further, please refer to FIG. 5. FIG. 5 is a schematic diagram of interaction of a data storage method according to an embodiment of the present invention. The embodiment of the present invention is applied to a subtree migration scenario between data nodes, that is, a target subtree belonging to a first data node needs to be migrated to a second data node. In the embodiment of the present invention, it is assumed that the subtree D (i.e., the target subtree) of the MDS2 (i.e., the first data node) needs to be migrated to the MDS1 (i.e., the second data node), as shown in FIG. 6. Referring to FIG. 5 together with FIG. 6, the method of the embodiment of the present invention may include the following steps:

201. The MDS2 sends a migration request to the MDS1.

Optionally, the migration request may be triggered when the service access heat of the MDS2 is too high, for example, when it exceeds a preset heat threshold. The MDS to be migrated to may be determined by any MDS or the central node in the distributed file system in which the MDS2 is located; for example, based on the heat of each MDS, the MDS1 with the lowest heat is determined as the MDS to be migrated to. Specifically, the migration request may carry metadata information of the subtree D in the MDS2. The metadata information includes information such as file names, attributes, and sizes.

202. The MDS1 constructs metadata in the subtree D in the cache.

203. The MDS1 replies to the MDS2 with a response message that the migration preparation is completed.

Specifically, the MDS2 notifies the MDS1 to start migrating the subtree D by sending a migration request to the MDS1. MDS1 can construct the metadata in subtree D in the cache according to the metadata information in the request, and can reply MDS2 with a migration preparation completion message.

204. The MDS2 sends a migration notification message to the central node.

In addition, when it is determined that the subtree D in the MDS2 needs to be migrated, a migration notification message may also be sent to the central node to notify the central node that the attribution of the subtree D becomes MDS1.

205. The central node updates the subtree path attribution table.

Specifically, after receiving the migration notification message, the central node may modify the subtree path attribution table by updating the home data node of the subtree D from MDS2 to MDS1, thereby obtaining a new subtree path attribution table, and persist the new subtree path attribution table, that is, save it to disk. The new subtree path attribution table can be as shown in Table 2 below:

Table 2

Further, the subtree path structure in the cache may be updated, that is, the new subtree path attribution table is cached in the form of a tree in the memory, as shown in FIG. 7, which is a schematic diagram of the updated subtree path structure.

206. The central node returns a response message of successful update to the MDS2.

207. MDS2 notifies MDS1 that the migration is successful.

Specifically, after the central node successfully updates the subtree path attribution table, it may return an update-success response message to the MDS2 to notify the MDS2 that the attribution table has been changed. After receiving the response message from the central node, the MDS2 can send a message to the MDS1 to notify it that the subtree migration is successful.

208. The MDS1 changes the attribution information of the subtree D in the cache.

209. The MDS1 returns a response message that the migration succeeds to the MDS2.

210. The MDS2 changes the attribution information of the subtree D in the cache.

Specifically, after receiving the notification message of successful migration, the MDS1 can change the subtree attribution information in the cache to itself, and can return a response message to the MDS2. After receiving the response message, the MDS2 can change the subtree ownership information in the cache to MDS1.
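The ordering of steps 204 to 210 can be illustrated with a small in-process simulation. The class names, method names, and reply strings below are assumptions made for this sketch; real nodes would exchange network messages and handle failures.

```python
class CentralNode:
    def __init__(self, table):
        self.table = table  # simplified: inode number -> home MDS name

    def on_migration_notification(self, ino, new_auth):
        self.table[ino] = new_auth   # step 205: update the attribution table
        return "update_ok"           # step 206: reply to the source MDS


class MDS:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # inode number -> home MDS name, as seen by this MDS

    def on_migration_success(self, ino):
        self.cache[ino] = self.name  # step 208: the target now owns the subtree
        return "migration_ok"        # step 209: reply to the source MDS


def finish_migration(central, source, target, ino):
    """Run steps 204-210 in order and return a log of the replies."""
    log = []
    reply = central.on_migration_notification(ino, target.name)  # steps 204-206
    log.append(("central", reply))
    if reply != "update_ok":
        return log
    reply = target.on_migration_success(ino)                     # steps 207-209
    log.append((target.name, reply))
    if reply == "migration_ok":
        source.cache[ino] = target.name                          # step 210
        log.append((source.name, "cache_updated"))
    return log


central = CentralNode({"D": "MDS2"})
mds1, mds2 = MDS("MDS1"), MDS("MDS2")
mds2.cache["D"] = "MDS2"
log = finish_migration(central, source=mds2, target=mds1, ino="D")
```

In this sketch the caches on the two MDSs change only after the central node has confirmed the table update, which matches the order of the steps above.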

Further, please refer to FIG. 8. FIG. 8 is a schematic diagram of interaction of another data storage method according to an embodiment of the present invention. The embodiment of the present invention can be applied to a subtree takeover scenario when a data node is faulty, that is, when the third data node fails, the subtree of the third data node needs to be migrated to the fourth data node. In the embodiment of the present invention, if it is detected that the MDS2 (ie, the third data node) is faulty, the subtree of the MDS2 needs to be migrated to the MDS1 (ie, the fourth data node). As shown in FIG. 8, the method of the embodiment of the present invention may include the following steps:

301. When the MDS2 fails, the MDS1 sends a takeover request to the central node.

Specifically, when MDS2 fails, the MDS that is to take over the subtrees of MDS2 can be determined. Optionally, the failure of MDS2 may be detected by the central node or by any MDS other than MDS2 in the distributed file system where MDS2 is located. The MDS that is to take over may likewise be determined by the central node or by any MDS in that distributed file system, for example according to the heat of each MDS, with the MDS having the lowest heat (here MDS1) chosen as the takeover MDS. If the takeover MDS is MDS1, MDS1 can send a takeover request to the central node (if the data node determined to take over is the central node itself, the takeover request need not be sent). The takeover request may carry the identification information of MDS2, to inform the central node that MDS2 has failed and that MDS1 needs to take over the subtrees of MDS2.
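Choosing the takeover MDS by lowest heat can be sketched as below. How heat is measured is not specified in the text, so the numeric values and the function name `choose_takeover_mds` are purely illustrative assumptions.

```python
def choose_takeover_mds(heat, failed):
    """Return the surviving MDS with the lowest heat value."""
    candidates = {name: value for name, value in heat.items() if name != failed}
    if not candidates:
        raise ValueError("no surviving MDS can take over")
    return min(candidates, key=candidates.get)


# Hypothetical heat values; any comparable metric would work the same way.
heat = {"MDS1": 0.2, "MDS2": 0.9, "MDS3": 0.5}
takeover = choose_takeover_mds(heat, failed="MDS2")
```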

302. The central node updates the subtree path attribution table.

303. The central node returns the subtree information of the MDS2 to the MDS1.

Specifically, after receiving the takeover request, the central node may search the subtree path attribution table, change the home information of all subtrees belonging to MDS2 to MDS1, obtain a new subtree path attribution table, and persist the new subtree path attribution table, that is, save it to disk. The new subtree path attribution table can be as shown in Table 3 below:

Table 3

Ino	Dirino	Subtree flag	Auth
ino1	--	1	MDS1
ino2	ino1	0	MDS1
ino3	ino1	0	MDS1
ino4	ino2	1	MDS1
ino5	ino3	1	MDS3
ino6	ino5	0	MDS3
ino7	ino6	1	MDS1

Further, the subtree path structure in the cache may be updated, and the new subtree path attribution table is cached in the form of a tree in the memory, as shown in FIG. 9 , which is a schematic diagram of the updated subtree path structure.
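As an illustration of caching the table in the form of a tree, the rows of Table 3 can be turned into a parent-to-children map using the Dirino column. The function names and the text rendering are assumptions of this sketch; the layout of FIG. 9 itself is not reproduced.

```python
from collections import defaultdict

# Rows of Table 3: (ino, dirino, subtree flag, auth). "--" is written as None.
TABLE_3 = [
    ("ino1", None, 1, "MDS1"),
    ("ino2", "ino1", 0, "MDS1"),
    ("ino3", "ino1", 0, "MDS1"),
    ("ino4", "ino2", 1, "MDS1"),
    ("ino5", "ino3", 1, "MDS3"),
    ("ino6", "ino5", 0, "MDS3"),
    ("ino7", "ino6", 1, "MDS1"),
]
INFO = {ino: (flag, auth) for ino, _dirino, flag, auth in TABLE_3}


def build_tree(rows):
    """Return (root inode, mapping from parent inode to its child inodes)."""
    children = defaultdict(list)
    root = None
    for ino, dirino, _flag, _auth in rows:
        if dirino is None:
            root = ino
        else:
            children[dirino].append(ino)
    return root, dict(children)


def render(node, children, info, depth=0):
    """Return indented text lines describing the tree below `node`."""
    flag, auth = info[node]
    lines = [f"{'  ' * depth}{node} auth={auth}{' [subtree root]' if flag else ''}"]
    for child in children.get(node, []):
        lines.extend(render(child, children, info, depth + 1))
    return lines


root, children = build_tree(TABLE_3)
lines = render(root, children, INFO)
```

Printing `"\n".join(lines)` gives one line per node, indented by depth, with the home MDS and the subtree root flag shown next to each inode number.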

Further, after updating the subtree path attribution table, the central node may also return all subtree information attributed to the MDS2 to the MDS1.
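The central node's handling of a takeover request (steps 302 and 303) can be sketched as follows. The rows are an excerpt of Table 3, but the scenario (MDS3 failing and MDS1 taking over) and the function name `take_over` are hypothetical and used only to exercise the logic.

```python
def take_over(table, failed, successor):
    """Reassign every entry homed on `failed` to `successor`.

    Returns (new_table, taken). `taken` holds the entries that belonged to the
    failed node before the update, which is what the successor needs in order
    to rebuild its cache.
    """
    taken = {ino: dict(row) for ino, row in table.items() if row["auth"] == failed}
    new_table = {
        ino: {**row, "auth": successor} if row["auth"] == failed else dict(row)
        for ino, row in table.items()
    }
    return new_table, taken


# Excerpt of Table 3, keyed by inode number.
table = {
    "ino5": {"dirino": "ino3", "subtree": 1, "auth": "MDS3"},
    "ino6": {"dirino": "ino5", "subtree": 0, "auth": "MDS3"},
    "ino7": {"dirino": "ino6", "subtree": 1, "auth": "MDS1"},
}
new_table, taken = take_over(table, failed="MDS3", successor="MDS1")
```

The returned `taken` entries keep the pre-update home, so the successor can tell which subtrees it has just inherited.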

304. The MDS1 reconstructs the subtree information of the MDS2 in the cache.

Specifically, after receiving the subtree information of MDS2, MDS1 can reconstruct in its cache the subtrees that originally belonged to MDS2. Because the subtree path attribution table records only reduced subtree information (only the ino is recorded), the metadata cache reconstructed by MDS1 is also incomplete; such entries are called the empty type. Metadata of this type can be filled in gradually as service reads trigger it. Therefore, when an MDS node fails, the receiver can obtain the subtree information of the faulty MDS directly from the central node without sending a request to all MDSs, which reduces the amount of message transmission in the system, shortens the subtree takeover time when an MDS fails, and lowers the consistency requirement on the subtree information of each MDS.
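The idea of empty-type entries that are filled in on first read can be sketched with a small cache class. The class name `RebuiltCache` and the `load_metadata` callback are assumptions of this sketch; the text does not specify how the full metadata is fetched.

```python
class RebuiltCache:
    """Cache rebuilt from inode numbers only; full metadata is loaded on first read."""

    def __init__(self, inos, load_metadata):
        self._load = load_metadata                    # hypothetical loader, e.g. a storage read
        self._entries = {ino: None for ino in inos}   # None marks an empty-type entry

    def is_empty(self, ino):
        return self._entries[ino] is None

    def read(self, ino):
        if self._entries[ino] is None:                # first access fills the entry
            self._entries[ino] = self._load(ino)
        return self._entries[ino]


loads = []

def fake_load(ino):
    """Stand-in for reading full metadata; records which inodes were loaded."""
    loads.append(ino)
    return {"ino": ino}

cache = RebuiltCache(["ino5", "ino6"], fake_load)
cache.read("ino5")
cache.read("ino5")  # second read is served from the cache
```

Only entries that are actually read get loaded, so the takeover node does not need complete metadata before it starts serving requests.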

Further optionally, if the faulty MDS is the central node, a new central node may also be determined, for example MDS1. MDS1 can then read the subtree path attribution table from the underlying storage (because the subtree path attribution table is stored on disk) and reconstruct the subtree information of the entire system in its cache. Optionally, the failure of the central node may be detected by any MDS in the system, and the MDS that is to take over may also be determined by any MDS in the system; details are not described here. Further, MDS1 can also check whether the failed central node has subtrees that need to be taken over. If so, MDS1 can look up the subtrees belonging to the failed central node in the subtree path attribution table, change their home to MDS1 to obtain a new subtree path attribution table, and persist the new table to disk, replacing the previous subtree path attribution table.
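Persisting the table and reading it back on a new central node can be sketched as below. The JSON file format, the file name, and the write-then-rename approach are assumptions chosen for the example; the text only states that the table is saved to disk and that the new table replaces the previous one.

```python
import json
import os
import tempfile


def persist_table(table, path):
    """Write the table to a temporary file, then swap it in for the previous copy."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(table, f)
        f.flush()
        os.fsync(f.fileno())       # make sure the bytes reach the disk
    os.replace(tmp_path, path)     # replaces the previous table file


def load_table(path):
    """Read the persisted table back, as a newly chosen central node would."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as workdir:
    table_path = os.path.join(workdir, "subtree_table.json")  # hypothetical file name
    persist_table({"ino1": {"dirino": None, "auth": "MDS2"}}, table_path)
    persist_table({"ino1": {"dirino": None, "auth": "MDS1"}}, table_path)
    restored = load_table(table_path)
```

Writing to a temporary file first means a reader never sees a half-written table, which matters when the node that reads the file is recovering from a failure.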

In the embodiment of the present invention, the affiliation between subtrees and MDSs and the subtree path information are managed by the central node and persisted to disk. When a data node in the system (an MDS or the central node) fails, the subtree path attribution table can be reconstructed by reading the persisted information, without sending requests to all MDSs to collect subtree information for subtree reconstruction. This reduces the message burst on the takeover node and shortens the subtree takeover time when an MDS fails, so that the takeover time is not affected by the cluster size and the switching time can be kept within 5 s. In addition, during subtree takeover the subtree path attribution table on the central node is taken as the authoritative copy, which reduces the complexity of reconstructing the subtree information and lowers the consistency requirement on the subtree information held by each MDS.

Referring to FIG. 10, FIG. 10 is a schematic structural diagram of a data storage device according to an embodiment of the present invention. Specifically, as shown in FIG. 10, the data storage device of the embodiment of the present invention may include a path determining module 11, a first obtaining module 12, and a storage module 13, wherein:

The path determining module 11 is configured to determine a subtree path in the distributed file system.

The subtree path may include the root node of each subtree on each data node in the distributed file system, and the subtree nodes on the path from the root node of each data node's subtree to the root node of the system.
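Determining which nodes lie on a subtree path can be sketched as walking from each subtree root up to the system root. The `parent` map, the function name, and the sample inode names are assumptions of this sketch.

```python
def subtree_path_nodes(parent, subtree_roots):
    """Collect every node on the path from each subtree root up to the system root.

    `parent` maps an inode number to the inode number of its parent directory
    (None for the system root).
    """
    on_path = set()
    for node in subtree_roots:
        # Stop early once a node is already recorded: its ancestors are too.
        while node is not None and node not in on_path:
            on_path.add(node)
            node = parent[node]
    return on_path


# Illustrative directory tree; "ino8" is a directory that is not on any subtree path.
parent = {"ino1": None, "ino2": "ino1", "ino3": "ino1", "ino4": "ino2", "ino8": "ino3"}
nodes = subtree_path_nodes(parent, subtree_roots=["ino4"])
```

Only the nodes returned here would need a row in the stored attribution information; directories that are not on any such path are left out.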

The first obtaining module 12 is configured to acquire node attribution information on the subtree path.

The node attribution information indicates connection information between subtrees on the subtree path and a home data node of each subtree.

The storage module 13 is configured to store node attribution information on the subtree path.

Optionally, in the embodiment of the present invention, the storage module 13 may be specifically configured to:

Generating a subtree path attribution table including node attribution information on the subtree path, and storing the subtree path attribution table.

Further optionally, the device may further include:

a second acquiring module, configured to acquire an index node number of the target subtree when a target subtree belonging to the first data node needs to be migrated to the second data node;

a searching module, configured to search, from the subtree path attribution table, an inode number that is the same as an inode number of the target subtree;

And a first update module, configured to update the home data node corresponding to the found index node number in the subtree path attribution table to the second data node.

Further optionally, the device may further include:

a receiving module, configured to receive a subtree takeover request indicating that the third data node is faulty;

a second update module, configured to update a home data node of the subtree belonging to the third data node in the subtree path attribution table to a fourth data node in response to the subtree takeover request;

And a sending module, configured to return, to the fourth data node, subtree information attributed to the third data node, so that the fourth data node performs subtree reconstruction in a data cache of the fourth data node.

The central node, the first data node, the second data node, the third data node, the fourth data node, and the like may be the same data node or a different node, which is not limited in the embodiment of the present invention. Thereby, subtree migration and subtree takeover can be performed based on the stored node attribution information.

Referring to FIG. 11, FIG. 11 is a schematic structural diagram of a data storage system according to an embodiment of the present invention. Specifically, as shown in FIG. 11, the data storage system of the embodiment of the present invention may include a first data node 2 and a central node 1;

The first data node 2 is configured to provide a data service for a subtree of the first data node 2;

The central node 1 is configured to determine a subtree path in the distributed file system where the first data node 2 is located, acquire node attribution information on the subtree path, and store the node attribution information on the subtree path.

Optionally, in the embodiment of the present invention,

The central node 1 is further configured to generate a subtree path attribution table including node attribution information on the subtree path, and store the subtree path attribution table;

The connection information between the subtrees on the subtree path includes an inode number of each subtree node on the subtree path, and an inode number of a previous subtree node of each subtree node. Optionally, the subtree path attribution table may further include subtree root node information on the subtree path.
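One possible in-memory shape for a row of the subtree path attribution table is sketched below, together with a simple consistency check on the parent links. The field names follow the columns of Table 3; the class name and the helper function are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AttributionRow:
    ino: str                 # inode number of this subtree node
    dirino: Optional[str]    # inode number of the previous subtree node; None for the root
    subtree_root: bool       # optional subtree root flag
    auth: str                # home data node of this node


def dangling_parents(rows: List[AttributionRow]) -> List[str]:
    """Return the inode numbers whose dirino does not refer to any row in the table."""
    known = {row.ino for row in rows}
    return [row.ino for row in rows if row.dirino is not None and row.dirino not in known]


rows = [
    AttributionRow("ino1", None, True, "MDS1"),
    AttributionRow("ino2", "ino1", False, "MDS1"),
    AttributionRow("ino4", "ino2", True, "MDS1"),
]
broken = dangling_parents(rows)
```

Because every row stores the inode number of the previous node, the path from any subtree root to the system root can be recovered from the table alone.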

Further optionally, the system further includes: a second data node 3; wherein

The first data node 2 is configured to send a migration notification message to the central node 1, where the migration notification message indicates an inode number of a target subtree to be migrated and a second data node 3 to be migrated to;

The central node 1 is configured to search the subtree path attribution table for an index node number that is the same as the index node number of the target subtree, and to update the home data node corresponding to the found index node number in the subtree path attribution table to the second data node 3.

The first data node 2 is configured to provide a data service for a subtree of the first data node, and the second data node 3 is configured to provide a data service for a subtree of the second data node.

Further optionally, in the embodiment of the present invention,

The second data node 3 is further configured to construct a data cache, and update the home data node of the target subtree in the cache to the second data node 3;

The first data node 2 is further configured to update the home data node of the target subtree in the data cache of the first data node 2 to the second data node 3.

That is, the second data node 3 can also construct a data cache. After the subtree path attribution table is updated, the second data node 3 can change the home data node of the target subtree in its cache to itself, and the first data node 2 can change the home data node of the target subtree in its cache to the second data node 3, so that subtree information can be extracted quickly.

Further optionally, the system further includes: a third data node 4 and a fourth data node 5, wherein the third data node 4 is a failed data node;

The fourth data node 5 is configured to send, to the central node 1, a subtree takeover request for indicating that the third data node 4 is faulty;

The central node 1 is configured to receive the subtree takeover request, update the home data node of the subtree belonging to the third data node 4 in the subtree path attribution table to the fourth data node 5, and return, to the fourth data node 5, the subtree information that belonged to the third data node 4 in the subtree path attribution table before the update;

The fourth data node 5 is further configured to perform subtree reconstruction in a data cache of the fourth data node based on the subtree information of the third data node 4.

In the embodiment of the present invention, the data nodes in the distributed file system can communicate with each other. The central node, the first data node, the second data node, the third data node, the fourth data node, and so on may be the same data node or different nodes, which is not limited in the embodiment of the present invention. Specifically, for the central node, the first data node, the second data node, the third data node, and the fourth data node in the embodiment of the present invention, refer to the related description in the embodiments corresponding to FIG. 1 to FIG. 9; details are not repeated here.

Referring to FIG. 12, FIG. 12 is a schematic structural diagram of a data server according to an embodiment of the present invention. Specifically, as shown in FIG. 12, the data server in the embodiment of the present invention includes a communication interface 300, a memory 200, and a processor 100, and the processor 100 is connected to the communication interface 300 and to the memory 200. The memory 200 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. The communication interface 300, the memory 200, and the processor 100 may be connected to each other through a bus or by other means; this embodiment is described using a bus connection. Specifically, the data server in the embodiment of the present invention may correspond to the central node in the embodiments corresponding to FIG. 1 to FIG. 11, and may specifically be a data node in the distributed file system, such as an MDS or a DS. For details, refer to the related description in those embodiments. Wherein:

The memory 200 is configured to store driver software;

The processor 100 reads the driver software from the memory 200 and, under the action of the driver software, performs the following steps:

Determining the subtree path in the distributed file system;

The node attribution information on the subtree path is stored.

Optionally, when storing the node attribution information on the subtree path under the action of the driver software, the processor 100 specifically performs the following steps:

Generating a subtree path attribution table including node attribution information on the subtree path, and storing the subtree path attribution table;

Optionally, the subtree path attribution table may also store subtree root node information on the subtree path.

Optionally, the processor 100 is further configured to perform the following steps by using the driver software:

Obtaining an index node number of the target subtree when the target subtree belonging to the first data node needs to be migrated to the second data node;

Finding an index node number that is the same as an index node number of the target subtree from the subtree path attribution table;

Updating the home data node corresponding to the found index node number in the subtree path attribution table to the second data node.

Receiving, by the communication interface 300, a subtree takeover request indicating that the third data node is faulty;

Responding to the subtree takeover request, updating the home data node of the subtree belonging to the third data node in the subtree path attribution table to a fourth data node;

Returning, by the communication interface 300, the subtree information that belonged to the third data node in the subtree path attribution table before the update to the fourth data node, so that the fourth data node performs subtree reconstruction in the data cache of the fourth data node.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. The division of the modules is only a logical function division; in actual implementation there may be other division manners, for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or module, and may be electrical, mechanical, or in another form.

The modules described as separate components may or may not be physically separated. The components displayed as modules may or may not be physical modules, that is, they may be located in one place or may be distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module. The above integrated modules can be implemented in the form of hardware or in the form of hardware plus software function modules.

The above integrated modules, when implemented in the form of software function modules, can be stored in a computer-readable storage medium. The software function modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention and are not intended to be limiting. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

A data storage method, comprising:

Determining the subtree path in the distributed file system;

Obtaining node attribution information on the subtree path, where the node attribution information indicates connection information between subtrees on the subtree path and a home data node of each subtree;

The node attribution information on the subtree path is stored.
The method according to claim 1, wherein the storing the node attribution information on the subtree path comprises:

Generating a subtree path attribution table including node attribution information on the subtree path, and storing the subtree path attribution table;

The connection information between the subtrees on the subtree path includes an inode number of each subtree node on the subtree path, and an inode number of a previous subtree node of each subtree node.
The method of claim 2, wherein the method further comprises:

Obtaining an index node number of the target subtree when the target subtree belonging to the first data node needs to be migrated to the second data node;

Finding an index node number that is the same as an index node number of the target subtree from the subtree path attribution table;

Updating the home data node corresponding to the found index node number in the subtree path attribution table to the second data node.
The method of claim 2, wherein the method further comprises:

Receiving a subtree takeover request indicating that the third data node is faulty;

Responding to the subtree takeover request, updating the home data node of the subtree belonging to the third data node in the subtree path attribution table to a fourth data node;

Returning, to the fourth data node, the subtree information that belonged to the third data node in the subtree path attribution table before the updating, so that the fourth data node performs subtree reconstruction in the data cache of the fourth data node.
A data storage system, comprising: a first data node and a central node; wherein

The first data node is configured to provide a data service for a subtree of the first data node;

The central node is configured to determine a subtree path in the distributed file system where the first data node is located, acquire node attribution information on the subtree path, and store the node attribution information on the subtree path;

The node attribution information indicates connection information between subtrees on the subtree path and a home data node of each subtree.
The system of claim 5 wherein:

The central node is further configured to generate a subtree path attribution table including node attribution information on the subtree path, and store the subtree path attribution table;

The connection information between the subtrees on the subtree path includes an inode number of each subtree node on the subtree path, and an inode number of a previous subtree node of each subtree node.
The system of claim 6 wherein said system further comprises: a second data node; wherein

The first data node is further configured to send a migration notification message to the central node, where the migration notification message indicates the inode number of the target subtree that needs to be migrated and the second data node to which it needs to be migrated;

The central node is further configured to search the subtree path attribution table for an index node number that is the same as the index node number of the target subtree, and to update the home data node corresponding to the found index node number in the subtree path attribution table to the second data node.
The system of claim 7 wherein:

The second data node is further configured to construct a data cache, and update a home data node of the target subtree in the cache to the second data node;

The first data node is further configured to update a home data node of the target subtree in a data cache of the first data node to the second data node.
The system of claim 6 wherein said system further comprises: a third data node and a fourth data node, said third data node being a failed data node, said fourth data node being a data node that takes over the subtree of the third data node; wherein

The fourth data node is configured to send, to the central node, a subtree takeover request for indicating that the third data node is faulty;

The central node is further configured to receive the subtree takeover request, update the home data node of the subtree belonging to the third data node in the subtree path attribution table to the fourth data node, and return, to the fourth data node, the subtree information that belonged to the third data node in the subtree path attribution table before the updating;

The fourth data node is further configured to perform subtree reconstruction in a data cache of the fourth data node based on the subtree information of the third data node.
A data storage device, comprising:

a path determining module, configured to determine a subtree path in the distributed file system;

a first acquiring module, configured to acquire node attribution information on the subtree path, where the node attribution information indicates connection information between subtrees on the subtree path and a home data node of each subtree;

a storage module, configured to store node attribution information on the subtree path.
The device according to claim 10, wherein the storage module is specifically configured to:

Generating a subtree path attribution table including node attribution information on the subtree path, and storing the subtree path attribution table;

The connection information between the subtrees on the subtree path includes an inode number of each subtree node on the subtree path, and an inode number of a previous subtree node of each subtree node.
The device according to claim 11, wherein the device further comprises:

a second acquiring module, configured to acquire an index node number of the target subtree when a target subtree belonging to the first data node needs to be migrated to the second data node;

a searching module, configured to search, from the subtree path attribution table, an inode number that is the same as an inode number of the target subtree;

And a first update module, configured to update the home data node corresponding to the found index node number in the subtree path attribution table to the second data node.
The device according to claim 11, wherein the device further comprises:

a receiving module, configured to receive a subtree takeover request indicating that the third data node is faulty;

a second update module, configured to update, in response to the subtree takeover request, the home data node of the subtree belonging to the third data node in the subtree path attribution table to a fourth data node;

a sending module, configured to return, to the fourth data node, the subtree information that belonged to the third data node in the subtree path attribution table before the updating, so that the fourth data node performs subtree reconstruction in the data cache of the fourth data node.
A data server, comprising: a memory and a processor, wherein the memory is connected to the processor; wherein

The memory is used to store driver software;

The processor is configured to read the driver software from the memory and, under the action of the driver software, perform the following steps:

Determining the subtree path in the distributed file system;

Obtaining node attribution information on the subtree path, where the node attribution information indicates connection information between subtrees on the subtree path and a home data node of each subtree;

The node attribution information on the subtree path is stored.