WO2019000949A1

WO2019000949A1 - Metadata storage method and system in distributed storage system, and storage medium

Info

Publication number: WO2019000949A1
Application number: PCT/CN2018/075077
Authority: WO
Inventors: 饶蓉; 魏明昌
Original assignee: 华为技术有限公司
Priority date: 2017-06-28
Filing date: 2018-02-02
Publication date: 2019-01-03
Also published as: CN109144406A; CN109144406B; CN111949210A

Abstract

Disclosed is a metadata storage method in a distributed storage system. In a scenario where metadata is organized into a metadata stripe by an erasure coding (EC) algorithm to achieve data reliability, a primary data storage node backs up the other metadata blocks in the metadata stripe. Because the metadata blocks on the data storage nodes only need to be backed up on the primary data storage node, less storage space is used than with multiple copies of all the metadata blocks. When a client accesses metadata, all the metadata blocks can be read from the primary data storage node alone, which increases the metadata access speed.

Description

Metadata storage method, system and storage medium in distributed storage system

Technical field

The present invention relates to the field of data storage technologies, and in particular, to a metadata storage method, system, and storage medium in a distributed storage system.

Background

In a distributed storage system, after the management node stores user data to the storage nodes, metadata recording information such as the logical address and physical address of the data is generated, and this metadata is also stored on the storage nodes. A common metadata storage method scatters the blocks of a metadata stripe across the storage nodes. To read the metadata, the blocks of the metadata stripe must be read from each storage node and assembled into the metadata stripe. This causes a large amount of data forwarding between storage nodes, which affects performance. Another method stores multiple copies of the metadata on the storage nodes, but this increases storage overhead.

Summary of the invention

In a first aspect, an embodiment of the present invention provides a metadata storage solution in a distributed storage system. The distributed storage system includes a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store a partition view of the metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N except A, and r is each of the natural numbers 1 to M. In the storage solution, the management node determines, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r for the metadata stripe. The metadata stripe includes metadata blocks D_A and D_i and check blocks C_r. The management node sends D_i to the data storage node DS_i, sends D_A to the primary data storage node DS_A, and sends C_r to the check storage node CS_r. The check storage node CS_r receives and stores C_r. The data storage node DS_i receives and stores D_i and sends D_i to the primary data storage node DS_A according to the partition view of the metadata stripe. The primary data storage node DS_A receives and stores D_A and D_i. In this solution, where the metadata is protected by an erasure coding (EC) mechanism, the primary data storage node DS_A backs up the other metadata blocks D_i in the metadata stripe. Because only the metadata blocks D_i on the data storage nodes DS_i need to be backed up on the primary data storage node DS_A, and no copy of the check blocks is needed, less storage space is used than in the prior art, which keeps multiple copies of all metadata blocks. In addition, when a client accesses the metadata, all metadata blocks can be accessed from the primary data storage node DS_A, which improves the speed of metadata access. The distributed storage system of this solution may be a distributed file system, a distributed object storage system, or distributed block device storage.

Optionally, the management node determining, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r for the metadata stripe specifically includes: the management node determines the partition corresponding to the metadata stripe according to the write request that generates the metadata in the metadata stripe; and the management node queries the partition view of the metadata stripe according to that partition to determine the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r.

Optionally, the management node determines, according to the address carried by the write request, a partition corresponding to the metadata stripe.

Optionally, the check storage node CS_r storing C_r specifically includes: the check storage node CS_r allocates a fragment S_r to C_r and establishes a mapping between the identifier of C_r and the fragment S_r. The data storage node DS_i storing D_i specifically includes: the data storage node DS_i allocates a fragment SD_i to D_i and establishes a mapping between the identifier of D_i and the fragment SD_i. The primary data storage node DS_A storing D_A and D_i specifically includes: the primary data storage node DS_A allocates a fragment SD_A to D_A and establishes a mapping between the identifier of D_A and the fragment SD_A, and allocates a fragment SD_i to D_i and establishes a mapping between the identifier of D_i and the fragment SD_i.

Further, the management node establishes a mapping between the identifier of D_i and both the data storage node DS_i and the primary data storage node DS_A. When garbage collection is performed on the metadata stripe, the management node can reclaim the data of a metadata block on the data storage node and on the primary data storage node according to the mapping between the identifier of the metadata block in the metadata stripe and the storage nodes, which improves the efficiency of metadata reclamation.

In a second aspect, an embodiment of the present invention further provides a distributed storage system. The distributed storage system includes a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store a partition view of the metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N except A, and r is each of the natural numbers 1 to M. The distributed storage system is used to implement the various implementations of the first aspect.

Accordingly, the present invention also provides a non-volatile computer readable storage medium and a computer program product. The non-volatile computer readable storage medium and the computer program product contain computer program instructions that run in a distributed storage system. The distributed storage system includes a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store a partition view of the metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N except A, and r is each of the natural numbers 1 to M. When one or more computers execute the computer program instructions, they act as the management node, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r of the distributed storage system to implement the various implementations of the first aspect.

The metadata storage scheme in the various distributed storage systems disclosed in the first aspect can also be applied to the storage of data corresponding to the metadata. Accordingly, the distributed storage system of the second aspect and the non-transitory computer readable storage medium and computer program product of the third aspect are equally applicable to data storage.

Brief description of the drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly described below.

FIG. 1 is a schematic diagram of a storage structure of a distributed block device according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a server in a distributed block device according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a relationship between a data stripe and a partition view according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of data striping according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a partition view according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a relationship between a metadata stripe and a partition view according to an embodiment of the present invention;

FIG. 7 is a flowchart of metadata storage according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of metadata striping according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of metadata storage according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention.

Distributed storage systems mainly include distributed file system storage, distributed object storage, and distributed block device storage. The embodiment of the present invention is described using distributed block device storage as an example. Illustratively, as shown in FIG. 1, the distributed block device storage includes a plurality of servers (server 1, server 2, server 3, server 4, server 5, and server 6), and the servers communicate with each other. In an actual application, the number of servers in the distributed block device storage may be increased according to actual requirements, which is not limited by the embodiment of the present invention. Each server in the distributed block device storage has the structure shown in FIG. 2.

As shown in FIG. 2, each server in the distributed block device storage includes a central processing unit (CPU) 201, a memory 202, a hard disk 1, a hard disk 2, and a hard disk 3. The memory 202 stores computer instructions, and the CPU 201 executes the program instructions in the memory 202 to perform the corresponding operations. Each hard disk may be a mechanical hard disk or a solid state drive. In addition, to save the computing resources of the CPU 201, a field programmable gate array (FPGA) or other hardware may perform the corresponding operations in place of the CPU 201, or together with the CPU 201. For convenience of description, the embodiments of the present invention refer collectively to whatever implements the above operations as a processor.

In the structure shown in FIG. 2, when an application is loaded in the memory 202 and the CPU 201 executes the application instructions in the memory 202, the server serves as a client. The application may be a virtual machine (VM) or a specific application such as office software. The client writes data to or reads data from the distributed block device storage. When a storage management program is loaded in the memory 202 and the CPU 201 executes the storage management program instructions in the memory 202, acting as the virtual block storage management program, the server serves as a management node. The management node is responsible for managing volume metadata and provides the client with a block protocol access interface and a distributed storage access point service, so that the client can access the storage resources of the distributed block device storage through the management node. When a storage object program is loaded in the memory 202 and the CPU 201 executes the storage object program instructions in the memory 202, the server serves as a storage node that performs the specific input/output (I/O) operations.

Multiple storage object program processes can run on each server. By default, one storage object program process runs for each hard disk: each process manages one hard disk, and each process running on the server acts as a storage node. Alternatively, a single storage object program process on a server may manage all the hard disks on that server. The embodiment of the present invention is described using the case where one storage object program process manages one hard disk. When the distributed block device storage is initialized, each storage object program process manages its hard disk in units of 1 MB and records the allocation information of each 1 MB fragment in the metadata management area of the hard disk. The hard disks managed in this way form a storage resource pool. The storage management program communicates with all the storage object programs of the resource pools that it can access. That is, the management node communicates with all the storage nodes of the resource pools it can access, so that it can access all the hard disks of those resource pools concurrently.

When the distributed block device storage is initialized, the hash space (for example, 0 to 2^32) is divided into N equal parts, each of which is a partition, and the N partitions are distributed evenly across the hard disks. For example, in distributed block device storage N defaults to 3600, that is, the partitions are P1, P2, P3, ..., P3600. As shown in FIG. 3, assuming that the distributed block device storage currently has 18 hard disks (storage nodes), each storage node carries 200 partitions. The correspondence between partitions and storage nodes, that is, the partition view, is allocated when the distributed block device storage is initialized and is later adjusted as the number of hard disks in the distributed block device storage changes. Each server in the distributed block device storage keeps the partition view in the memory 202, and the management node uses the partition view for fast routing. Each storage node also stores all the partition views of the distributed block device storage system, that is, the correspondence between each partition and the storage nodes.

In addition, depending on the reliability requirements of the distributed block device storage, an erasure coding (EC) algorithm can be used to improve data reliability. For example, in 3+1 mode, 3 data blocks and 1 check block form a data stripe, as shown in FIG. 4. The partition view then has the form "Partition - Primary Data Storage Node - Data Storage Node 1 - Data Storage Node 2 - Check Storage Node", as shown in FIG. 5. For each partition, this partition view identifies the primary data storage node, the data storage node 1 and data storage node 2 that store the other data blocks of the data stripe, and the check storage node that stores the check data. The backup storage node for the data blocks stored on data storage node 1 and data storage node 2 is the primary data storage node.
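The partitioning described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the equal-width range split and the round-robin placement of four consecutive nodes per partition are assumptions, and the names `partition_of` and `build_partition_view` are hypothetical.

```python
HASH_SPACE = 2 ** 32      # hash space named in the text
NUM_PARTITIONS = 3600     # default partition count named in the text


def partition_of(hash_value: int) -> int:
    """Map a hash value in [0, 2**32) to a partition number 1..3600."""
    # Assumed equal-width ranges; ceiling division keeps the result in range.
    width = -(-HASH_SPACE // NUM_PARTITIONS)
    return hash_value // width + 1


def build_partition_view(num_nodes: int) -> dict:
    """Build a 3+1 partition view: partition -> primary, data nodes, check node."""
    view = {}
    for p in range(1, NUM_PARTITIONS + 1):
        # Hypothetical round-robin placement over four consecutive nodes.
        nodes = [(p - 1 + k) % num_nodes for k in range(4)]
        view[p] = {"primary": nodes[0], "data": nodes[1:3], "check": nodes[3]}
    return view


view = build_partition_view(18)
primaries_on_node_0 = sum(1 for entry in view.values() if entry["primary"] == 0)
```

With 18 storage nodes, this placement makes each node the primary for 200 of the 3600 partitions, which matches the figure given in the text.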

The distributed block device storage logically slices each logical unit number (LUN) into 1 MB fragments. For example, a 1 GB LUN is sliced into 1024 fragments of 1 MB each. As shown in FIG. 3, when the client sends a write request for the LUN through the management node, the Small Computer System Interface (SCSI) command carries the LUN identifier (ID), the logical block address (LBA) ID, and the data to be written. The management node where the client is located receives the write request and forms a key from the LUN ID and the LBA ID. The key contains the result of rounding the LBA ID to 1 MB. The distributed hash table (DHT) hash of the key yields an integer in the range 0 to 2^32, which falls in a specific partition. The management node where the client is located then determines the primary data storage node, data storage node 1, data storage node 2, and the check storage node according to the partition view recorded in the memory 202. The management node sends data block 1, data block 2, data block 3, and check block 1 of the EC data stripe to the primary data storage node, data storage node 1, data storage node 2, and the check storage node, respectively. The primary data storage node stores data block 1, data storage node 1 stores data block 2, data storage node 2 stores data block 3, and the check storage node stores check block 1. Data storage nodes 1 and 2 each determine the primary data storage node according to the partition view. Data storage node 1 backs up data block 2 to the primary data storage node, data storage node 2 backs up data block 3 to the primary data storage node, and the primary data storage node stores data block 2 and data block 3.

In a specific implementation, the primary data storage node allocates fragment 1 to data block 1 from the hard disk it manages and establishes a mapping between the identifier of data block 1 and fragment 1; data storage node 1 allocates fragment 2 to data block 2 from the hard disk it manages and establishes a mapping between the identifier of data block 2 and fragment 2; data storage node 2 allocates fragment 3 to data block 3 from the hard disk it manages and establishes a mapping between the identifier of data block 3 and fragment 3; and the check storage node allocates fragment 4 to check block 1 from the hard disk it manages and establishes a mapping between the identifier of check block 1 and fragment 4. The primary data storage node receives data block 2 sent by data storage node 1 and data block 3 sent by data storage node 2, allocates fragment 5 and fragment 6 from the hard disk it manages, and establishes a mapping between the identifier of data block 2 and fragment 5 and a mapping between the identifier of data block 3 and fragment 6.

The embodiment of the present invention uses the mapping between the identifier of a data block and a fragment as an example. When one storage object program process corresponds to one hard disk, that is, when the storage node is the hard disk itself, the mapping between the identifier of the data block and the fragment is a mapping between the identifier of the data block and the physical address of the fragment. When one storage object program process corresponds to multiple hard disks, that is, when the storage node manages multiple hard disks, the mapping between the identifier of the data block and the fragment includes a mapping between the identifier of the data block and the hard disk storing the data block, and a mapping from that hard disk to the physical address of the fragment storing the data block.

Further, data block 2 is stored in both fragment 2 and fragment 5, and data block 3 is stored in both fragment 3 and fragment 6. The management node establishes and saves a mapping between the identifier of data block 2 and both data storage node 1 and the primary data storage node, and a mapping between the identifier of data block 3 and both data storage node 2 and the primary data storage node. Further, data storage node 1 saves the mapping between the identifier of data block 2 and both data storage node 1 and the primary data storage node, and data storage node 2 saves the mapping between the identifier of data block 3 and both data storage node 2 and the primary data storage node. When garbage collection is performed on the data stripe, the management node can reclaim the data of a data block on the data storage node and on the primary data storage node according to the mapping between the identifier of the data block in the data stripe and the storage nodes, which improves the efficiency of data reclamation.
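The routing step at the start of this write path (LUN ID and LBA to key, key to hash, hash to partition) can be sketched as below. The source does not name the hash function or the key layout, so SHA-256, the `lun:fragment` key format, and the treatment of the LBA as a byte offset are all assumptions made for illustration; `make_key`, `hash_key`, and `route` are hypothetical names.

```python
import hashlib

FRAGMENT_SIZE = 1 << 20   # 1 MB slicing unit from the text
HASH_SPACE = 2 ** 32      # hash space from the text
NUM_PARTITIONS = 3600     # default partition count from the text


def make_key(lun_id: int, lba_bytes: int) -> str:
    """Form the key from the LUN ID and the LBA rounded down to a 1 MB boundary."""
    return f"{lun_id}:{lba_bytes // FRAGMENT_SIZE}"


def hash_key(key: str) -> int:
    """Hash a key into [0, 2**32). SHA-256 is a stand-in, not the patent's hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def route(lun_id: int, lba_bytes: int) -> int:
    """Return the partition number (1..3600) that a write request falls in."""
    return hash_key(make_key(lun_id, lba_bytes)) * NUM_PARTITIONS // HASH_SPACE + 1


fragments_in_1gb_lun = (1 << 30) // FRAGMENT_SIZE   # 1024, as stated in the text
same_partition = route(7, 4096) == route(7, 8192)   # both addresses share a fragment
```

Because the key is built from the rounded address, every write inside the same 1 MB fragment of a LUN routes to the same partition and therefore to the same set of storage nodes in the partition view.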

In the embodiment of the present invention, when the client sends a write request to write data to the distributed block device storage, metadata is generated to record the logical address and physical address of the data. The metadata corresponding to the data is stored using the same EC algorithm as the data. The metadata stripe formed by the EC algorithm has the same partition view as the data stripe formed by the EC algorithm described above, as shown in FIG. 6.

Metadata is stored in the distributed storage system as follows. The distributed storage system includes a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store a partition view of the metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N except A, and r is each of the natural numbers 1 to M. The flow shown in FIG. 7 is executed in the distributed storage system:

Step 701: The management node determines, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r for the metadata stripe. The metadata stripe includes metadata blocks D_A and D_i and check blocks C_r.

Specifically, the management node determines the partition corresponding to the metadata stripe according to the write request that generates the metadata in the metadata stripe, and then queries the partition view of the metadata stripe according to that partition to determine the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r.

Specifically, the management node determines the partition corresponding to the metadata stripe according to the address carried by the write request. For details, refer to the above description of how the distributed block device storage handles a write request sent by the client; the details are not repeated here.

Step 702: The management node sends D_i to the data storage node DS_i, sends D_A to the primary data storage node DS_A, and sends C_r to the check storage node CS_r.

Step 703: The check storage node CS_r receives and stores C_r.

Step 704: The data storage node DS_i receives and stores D_i, and sends D_i to the primary data storage node DS_A according to the partition view of the metadata stripe.

Step 705: The primary data storage node DS_A receives and stores D_A and D_i.
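The flow of steps 702 to 705 can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: network transfer is replaced by direct method calls, and the `StorageNode` class, the `write_metadata_stripe` function, and the layout of the `view` dictionary are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    name: str
    blocks: dict = field(default_factory=dict)   # block identifier -> payload

    def store(self, block_id: str, payload: bytes) -> None:
        self.blocks[block_id] = payload


def write_metadata_stripe(view, data_blocks, check_blocks):
    """Run steps 702-705 for one stripe.

    view:         {"primary": DS_A, "data": [DS_i, ...], "check": [CS_r, ...]}
    data_blocks:  [(identifier, payload)], D_A first, then each D_i
    check_blocks: [(identifier, payload)], one per check storage node
    """
    primary = view["primary"]
    (id_a, d_a), others = data_blocks[0], data_blocks[1:]
    primary.store(id_a, d_a)                       # steps 702 and 705: D_A on DS_A
    for node, (cid, c) in zip(view["check"], check_blocks):
        node.store(cid, c)                         # steps 702 and 703: C_r on CS_r
    for node, (did, d) in zip(view["data"], others):
        node.store(did, d)                         # steps 702 and 704: D_i on DS_i
        primary.store(did, d)                      # steps 704 and 705: DS_i backs up D_i to DS_A


ds_a, ds_1, ds_2, cs_1 = (StorageNode(n) for n in ("DS_A", "DS_1", "DS_2", "CS_1"))
view = {"primary": ds_a, "data": [ds_1, ds_2], "check": [cs_1]}
write_metadata_stripe(
    view,
    [("D1", b"aaaa"), ("D2", b"bbbb"), ("D3", b"cccc")],
    [("C1", b"pppp")],
)
```

After the call, `ds_a` holds all three metadata blocks, each other data storage node holds only its own block, and the check block exists only on `cs_1`.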

Specifically, the check storage node CS_r storing C_r includes: the check storage node CS_r allocates a fragment S_r to C_r and establishes a mapping between the identifier of C_r and the fragment S_r. The data storage node DS_i storing D_i includes: the data storage node DS_i allocates a fragment SD_i to D_i and establishes a mapping between the identifier of D_i and the fragment SD_i. The primary data storage node DS_A storing D_A and D_i includes: the primary data storage node DS_A allocates a fragment SD_A to D_A and establishes a mapping between the identifier of D_A and the fragment SD_A, and allocates a fragment SD_i to D_i and establishes a mapping between the identifier of D_i and the fragment SD_i. Further, the management node establishes a mapping between the identifier of D_i and both the data storage node DS_i and the primary data storage node DS_A. Further, the data storage node DS_i saves the mapping between the identifier of D_i and both the data storage node DS_i and the primary data storage node DS_A. When garbage collection is performed on the metadata stripe, the management node can reclaim the data of a metadata block on the data storage node and on the primary data storage node according to the mapping between the identifier of the metadata block in the metadata stripe and the storage nodes, which improves the efficiency of metadata reclamation.
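The two kinds of mapping described here, block identifier to fragment on each storage node and block identifier to storage nodes on the management node, can be sketched as follows. The free-list allocator and the names `FragmentStore` and `collect_garbage` are hypothetical; the source does not describe how fragments are chosen or how reclamation is triggered.

```python
class FragmentStore:
    """One storage node managing one hard disk as fixed-size fragments."""

    def __init__(self, name: str, fragment_count: int):
        self.name = name
        self.free = list(range(fragment_count))   # unallocated fragment numbers
        self.block_to_fragment = {}               # block identifier -> fragment number

    def store(self, block_id: str) -> int:
        fragment = self.free.pop(0)               # allocate the next free fragment
        self.block_to_fragment[block_id] = fragment
        return fragment

    def reclaim(self, block_id: str) -> None:
        # Drop the mapping and return the fragment to the free list.
        self.free.append(self.block_to_fragment.pop(block_id))


def collect_garbage(locations: dict, block_ids: list) -> None:
    """Reclaim every copy of each block using the management node's mapping."""
    for block_id in block_ids:
        for node in locations.pop(block_id, []):
            node.reclaim(block_id)


ds_a = FragmentStore("DS_A", fragment_count=4)
ds_i = FragmentStore("DS_i", fragment_count=4)
ds_a.store("D_A")                        # DS_A allocates a fragment for D_A
ds_i.store("D_i")                        # DS_i allocates a fragment for D_i
ds_a.store("D_i")                        # DS_A allocates a fragment for the backup of D_i
locations = {"D_i": [ds_i, ds_a]}        # management node mapping: D_i -> DS_i and DS_A
collect_garbage(locations, ["D_i"])
```

Because the management node records both locations of D_i, a single lookup is enough to reclaim the original and its backup, which is the efficiency gain the text refers to.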

In the embodiment of the present invention, in combination with the distributed block device storage and the data storage manner described above, and as shown in FIG. 8, the metadata blocks in the metadata stripe formed by the EC algorithm are D₁, D₂, and D₃, and the check block is C₁. The management node where the client is located determines the primary data storage node, data storage node 1, data storage node 2, and the check storage node according to the partition view "Partition - Primary Data Storage Node - Data Storage Node 1 - Data Storage Node 2 - Check Storage Node" recorded in the memory 202. For each partition, the partition view identifies the primary data storage node, the data storage node 1 and data storage node 2 that store the other metadata blocks of the metadata stripe, and the check storage node that stores the check data. The backup storage node for the metadata blocks stored on data storage node 1 and data storage node 2 is the primary data storage node. The management node sends D₁, D₂, D₃, and C₁ of the EC-based metadata stripe to the primary data storage node, data storage node 1, data storage node 2, and the check storage node, respectively. The primary data storage node receives and stores D₁, data storage node 1 receives and stores D₂, data storage node 2 receives and stores D₃, and the check storage node receives and stores C₁. Data storage nodes 1 and 2 each determine the primary data storage node according to the partition view. Data storage node 1 backs up D₂ to the primary data storage node, data storage node 2 backs up D₃ to the primary data storage node, and the primary data storage node receives and stores D₂ and D₃.

In a specific implementation, as shown in FIG. 9, the primary data storage node allocates fragment 7 to D₁ from the hard disk it manages and establishes a mapping between the identifier of D₁ and fragment 7; data storage node 1 allocates fragment 8 to D₂ from the hard disk it manages and establishes a mapping between the identifier of D₂ and fragment 8; data storage node 2 allocates fragment 9 to D₃ from the hard disk it manages and establishes a mapping between the identifier of D₃ and fragment 9; and the check storage node allocates fragment 10 to C₁ from the hard disk it manages and establishes a mapping between the identifier of C₁ and fragment 10. The primary data storage node receives D₂ sent by data storage node 1 and D₃ sent by data storage node 2, allocates fragment 11 and fragment 12 from the hard disk it manages, and establishes a mapping between the identifier of D₂ and fragment 11 and a mapping between the identifier of D₃ and fragment 12.

The embodiment of the present invention uses the mapping between the identifier of a metadata block and a fragment as an example. When one storage object program process corresponds to one hard disk, that is, when the storage node is the hard disk itself, the mapping between the identifier of the metadata block and the fragment is a mapping between the identifier of the metadata block and the physical address of the fragment. When one storage object program process corresponds to multiple hard disks, that is, when the storage node manages multiple hard disks, the mapping between the identifier of the metadata block and the fragment includes a mapping between the identifier of the metadata block and the hard disk storing the metadata block, and a mapping from that hard disk to the fragment storing the metadata block.

Further, D₂ is stored in both fragment 8 and fragment 11, and D₃ is stored in both fragment 9 and fragment 12. The management node establishes and saves a mapping between the identifier of D₂ and both data storage node 1 and the primary data storage node, and a mapping between the identifier of D₃ and both data storage node 2 and the primary data storage node. Further, data storage node 1 saves the mapping between the identifier of D₂ and both data storage node 1 and the primary data storage node, and data storage node 2 saves the mapping between the identifier of D₃ and both data storage node 2 and the primary data storage node. When garbage collection is performed on the metadata stripe, the management node can reclaim the data of a metadata block on the data storage node and on the primary data storage node according to the mapping between the identifier of the metadata block in the metadata stripe and the storage nodes, which improves the efficiency of metadata reclamation.
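The 3+1 example can be made concrete with a short sketch. The source does not say which erasure code is used, so bytewise XOR parity is assumed here purely because it is the simplest code with one check block; the node names and block contents are hypothetical.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks (assumed stand-in for the EC algorithm)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


d1, d2, d3 = b"meta", b"data", b"blok"
c1 = xor_blocks(d1, d2, d3)              # check block C1 of the 3+1 stripe

# Placement from the text: the primary holds D1 plus backups of D2 and D3.
nodes = {
    "primary": {"D1": d1, "D2": d2, "D3": d3},
    "data1": {"D2": d2},
    "data2": {"D3": d3},
    "check": {"C1": c1},
}

# Normal read: all three metadata blocks come from the primary alone.
stripe = b"".join(nodes["primary"][k] for k in ("D1", "D2", "D3"))

# If the primary is lost, D1 can be rebuilt from the surviving three blocks.
rebuilt_d1 = xor_blocks(nodes["data1"]["D2"], nodes["data2"]["D3"], nodes["check"]["C1"])
```

The read path touches a single node, while the stripe still survives the loss of the primary because D₂ and D₃ remain on their own nodes and D₁ is recoverable from the check block.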

Therefore, in a scenario where metadata is organized into a metadata stripe by the EC algorithm to achieve data reliability, the primary data storage node backs up the other metadata blocks in the metadata stripe. Because only the metadata blocks on the data storage nodes need to be backed up on the primary data storage node, less storage space is used than with the multiple copies of all metadata blocks in the prior art. In addition, when the client accesses the metadata, all metadata blocks can be accessed from the primary data storage node alone, which improves the metadata access speed.
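A rough block count illustrates the storage comparison. The source does not state how many copies the prior-art scheme keeps, so the three-copy baseline below is an assumption, and the saving depends on the copy count chosen; the function names are hypothetical.

```python
def scheme_blocks(n: int, m: int) -> int:
    """Blocks stored per stripe: n on the primary data storage node, one on each
    of the other n - 1 data storage nodes, and m check blocks."""
    return n + (n - 1) + m


def replicated_blocks(n: int, copies: int) -> int:
    """Blocks stored per stripe when all n metadata blocks are kept in full copies."""
    return n * copies


def nodes_read(n: int, scattered: bool) -> int:
    """Storage nodes contacted to read the n metadata blocks of one stripe."""
    return n if scattered else 1


ec_with_backup = scheme_blocks(n=3, m=1)          # 3+1 mode from the text
three_copies = replicated_blocks(n=3, copies=3)   # assumed prior-art baseline
```

For the 3+1 mode this gives 6 stored blocks against 9 for three full copies, and one node contacted per read against three when the blocks are scattered without a primary backup.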

The embodiment of the present invention further provides a non-transitory computer readable storage medium and a computer program product. The non-transitory computer readable storage medium and the computer program product contain computer program instructions, and the CPU executes the computer program instructions loaded in the memory to implement the functions of the management node and the storage nodes (the primary data storage node, the data storage nodes, and the check storage node) in the implementations of the present invention.

The descriptions given in the embodiments of the present invention are exemplary. "Slice 1", "Slice 2", ..., "Slice 12" and the like in the embodiments of the present invention are not used to strictly define an order, but are used to distinguish different slices. The slice in the embodiments of the present invention may be a physical block or the like in the hard disk. The hard disk in the embodiments of the present invention may be at least one of a mechanical disk and a solid state hard disk, as described above. The hard disk corresponding to the process of the storage object program in the embodiments of the present invention may also be a storage array or the like, which is not limited in the embodiments of the present invention.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the division of the units described in the apparatus embodiments above is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, apparatus, or unit, and may be in an electrical, mechanical, or other form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

Claims

A metadata storage method in a distributed storage system, characterized in that the distributed storage system comprises a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store a partitioned view of a metadata stripe; the partitioned view of the metadata stripe includes a primary data storage node DS A , a data storage node DS i , and a check storage node CS r ; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N except A, and r is each of the natural numbers 1 to M; the method includes:

The management node determines, according to the partitioned view of the metadata stripe, the primary data storage node DS A , the data storage node DS i , and the check storage node CS r for the metadata stripe; the metadata stripe includes the metadata blocks D A and D i and the check block C r ;

The management node sends D i to the data storage node DS i , sends D A to the primary data storage node DS A , and sends C r to the check storage node CS r ;

The check storage node CS r receives and stores C r ;

The data storage node DS i receives and stores D i , and sends D i to the primary data storage node DS A according to the partitioned view of the metadata stripe;

The primary data storage node DS A receives and stores D A and D i .
The method according to claim 1, wherein the management node determining, according to the partitioned view of the metadata stripe, the primary data storage node DS A , the data storage node DS i , and the check storage node CS r for the metadata stripe specifically includes:

Determining, by the management node, a partition corresponding to the metadata stripe according to a write request that generates the metadata in the metadata stripe;

Querying, by the management node, the partitioned view of the metadata stripe according to the partition corresponding to the metadata stripe, to determine the primary data storage node DS A , the data storage node DS i , and the check storage node CS r .
The method according to claim 2, wherein the management node determines the partition corresponding to the metadata stripe according to an address carried by the write request.
The method according to claim 1, wherein the check storage node CS r storing C r specifically comprises: the check storage node CS r allocates a slice S r to C r , and establishes a mapping relationship between the identifier of C r and the slice S r ;

The data storage node DS i storing D i specifically comprises: the data storage node DS i allocates a slice SD i to D i , and establishes a mapping relationship between the identifier of D i and the slice SD i ;

The primary data storage node DS A storing D A and D i specifically comprises: the primary data storage node DS A allocates a slice SD A to D A and establishes a mapping relationship between the identifier of D A and the slice SD A , and allocates a slice SD i to D i and establishes a mapping relationship between the identifier of D i and the slice SD i .
A distributed storage system, characterized in that the distributed storage system comprises a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store a partitioned view of a metadata stripe; the partitioned view of the metadata stripe includes a primary data storage node DS A , a data storage node DS i , and a check storage node CS r ; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N except A, and r is each of the natural numbers 1 to M;

The management node is configured to determine, according to the partitioned view of the metadata stripe, the primary data storage node DS A , the data storage node DS i , and the check storage node CS r for the metadata stripe, wherein the metadata stripe includes the metadata blocks D A and D i and the check block C r ; and to send D i to the data storage node DS i , send D A to the primary data storage node DS A , and send C r to the check storage node CS r ;

The check storage node CS r is configured to receive and store C r ;

The data storage node DS i is configured to receive and store D i , and send D i to the primary data storage node DS A according to the partitioned view of the metadata stripe;

The primary data storage node DS A is for receiving and storing D A and D i .
The system according to claim 5, wherein the management node is specifically configured to determine, according to a write request that generates the metadata in the metadata stripe, a partition corresponding to the metadata stripe, and to query the partitioned view of the metadata stripe according to the partition corresponding to the metadata stripe, to determine the primary data storage node DS A , the data storage node DS i , and the check storage node CS r .
The system according to claim 6, wherein the management node is further configured to determine the partition corresponding to the metadata stripe according to an address carried by the write request.
The system according to claim 5, wherein the check storage node CS r is specifically configured to allocate a slice S r to C r and establish a mapping relationship between the identifier of C r and the slice S r ;

The data storage node DS i is specifically configured to allocate a slice SD i to D i and establish a mapping relationship between the identifier of D i and the slice SD i ;

The primary data storage node DS A is specifically configured to allocate a slice SD A to D A and establish a mapping relationship between the identifier of D A and the slice SD A , and to allocate a slice SD i to D i and establish a mapping relationship between the identifier of D i and the slice SD i .
A non-volatile readable storage medium, characterized in that the non-volatile readable storage medium comprises computer program instructions that can be run in a distributed storage system, the distributed storage system comprising a management node and (M+N) storage nodes, wherein the management node and the (M+N) storage nodes each store a partitioned view of a metadata stripe; the partitioned view of the metadata stripe includes a primary data storage node DS A , a data storage node DS i , and a check storage node CS r ; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N except A, and r is each of the natural numbers 1 to M; when one or more computers execute the computer program instructions, the one or more computers, used as the management node, are configured to determine, according to the partitioned view of the metadata stripe, the primary data storage node DS A , the data storage node DS i , and the check storage node CS r for the metadata stripe, wherein the metadata stripe includes the metadata blocks D A and D i and the check block C r , and to send D i to the data storage node DS i , send D A to the primary data storage node DS A , and send C r to the check storage node CS r ; the one or more computers, used as the check storage node CS r , are configured to receive and store C r ;

The one or more computers are used as the data storage node DS i to receive and store D i , and send D i to the primary data storage node DS A according to the partitioned view of the metadata stripe;

The one or more computers are used as the primary data storage node DS A for receiving and storing D A and D i .
The storage medium according to claim 9, further comprising computer program instructions that cause the one or more computers, used as the management node, specifically to determine, according to a write request that generates the metadata in the metadata stripe, a partition corresponding to the metadata stripe, and to query the partitioned view of the metadata stripe according to the partition corresponding to the metadata stripe, to determine the primary data storage node DS A , the data storage node DS i , and the check storage node CS r .