WO2019000949A1 - Metadata storage method and system in a distributed storage system, and storage medium
- Publication number: WO2019000949A1 (PCT/CN2018/075077)
- Authority: WO, WIPO (PCT)
- Prior art keywords: storage node, metadata, data storage, node, stripe
Classifications
- G06F3/0611: Improving I/O performance in relation to response time
- G06F11/14: Error detection or correction of the data by redundancy in operation
- G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0608: Saving storage space on storage systems
- G06F3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
- H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
- H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
- G06F11/1464: Management of the backup or restore process for networked environments
Definitions
- The present invention relates to the field of data storage technologies, and in particular to a metadata storage method, system, and storage medium in a distributed storage system.
- When data is stored, metadata such as the logical address and physical address of the recorded data is generated, and the metadata is also stored on the storage nodes.
- A common metadata storage method is to scatter the blocks of a metadata stripe across the storage nodes. To read the metadata, the blocks of the metadata stripe must be read from each storage node and pieced together into the metadata stripe. The resulting volume of data forwarded between storage nodes is large, which hurts performance. Another method stores multiple copies of the metadata on the storage nodes, but this increases storage overhead.
- An embodiment of the present invention provides a metadata storage solution in a distributed storage system, where the distributed storage system includes a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store a partitioned view of the metadata stripe;
- the partitioned view of the metadata stripe includes a primary data storage node DS A, a data storage node DS i, and a check storage node CS r;
- N is a natural number not less than 2
- M is a natural number not less than 1
- A is one of natural numbers 1 to N
- i is each of natural numbers 1 to N except A
- r is each of natural numbers 1 to M
- the management node determines, according to the partitioned view of the metadata stripe, the primary data storage node DS A, the data storage node DS i, and the check storage node CS r for the metadata stripe;
- the metadata stripe includes metadata blocks D A and D i and a check block C r.
- In this solution, the primary data storage node DS A backs up the other metadata blocks D i in the metadata stripe. Because only the metadata block D i on each data storage node DS i needs to be backed up on the primary data storage node DS A, and the check block needs no copy, storage space is reduced compared with the prior-art approach of keeping multiple copies of all metadata blocks. At the same time, a client accessing the metadata can obtain all metadata blocks from the primary data storage node DS A, which improves the speed of metadata access.
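The storage saving can be illustrated with a simple block count per stripe. This is only a sketch: the function names are hypothetical, and the baseline of keeping three full copies of every metadata block is an assumption chosen for comparison, not a figure given in the text.

```python
def blocks_with_primary_backup(n: int, m: int) -> int:
    """Blocks stored per stripe under the scheme described above.

    N metadata blocks and M check blocks are stored once each, and the
    primary data storage node holds one backup of each of the other
    N - 1 metadata blocks. Check blocks are not copied.
    """
    if n < 2 or m < 1:
        raise ValueError("the text requires N >= 2 and M >= 1")
    return n + m + (n - 1)


def blocks_with_replication(n: int, copies: int) -> int:
    """Assumed baseline: every metadata block is kept in `copies` copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return n * copies


# 3+1 stripe: 3 + 1 + 2 = 6 blocks, versus 3 * 3 = 9 blocks with three copies.
primary_backup_total = blocks_with_primary_backup(3, 1)
replication_total = blocks_with_replication(3, 3)
```

Under these assumptions the scheme stores 6 blocks per 3+1 stripe, while three-copy replication stores 9.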
- The distributed storage system of this solution can be distributed file system storage, distributed object storage, or distributed block device storage.
- That the management node determines, according to the partitioned view of the metadata stripe, the primary data storage node DS A, the data storage node DS i, and the check storage node CS r for the metadata stripe specifically includes: the management node determines the partition corresponding to the metadata stripe according to the write request that generates the metadata in the metadata stripe; the management node then queries the partitioned view of the metadata stripe according to that partition to determine the primary data storage node DS A, the data storage node DS i, and the check storage node CS r.
- Specifically, the management node determines the partition corresponding to the metadata stripe according to the address carried by the write request.
- That the check storage node CS r stores C r specifically includes: the check storage node CS r allocates a fragment S r to C r and establishes a mapping relationship between the identifier of C r and the fragment S r;
- that the data storage node DS i stores D i specifically includes: the data storage node DS i allocates a fragment SD i to D i and establishes a mapping relationship between the identifier of D i and the fragment SD i;
- that the primary data storage node DS A stores D A and D i specifically includes: the primary data storage node DS A allocates a fragment SD A to D A and establishes a mapping relationship between the identifier of D A and the fragment SD A, and allocates a fragment SD i to D i and establishes a mapping relationship between the identifier of D i and the fragment SD i.
- The management node establishes a mapping relationship between the identifier of D i and the data storage node DS i and the primary data storage node DS A.
- The management node can then reclaim the metadata blocks held on the data storage node and the primary data storage node according to the mapping relationship between the identifier of each metadata block in the metadata stripe and the storage nodes, which improves the efficiency of metadata reclamation.
- An embodiment of the present invention further provides a distributed storage system, where the distributed storage system includes a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store a partitioned view of the metadata stripe; the partitioned view of the metadata stripe includes a primary data storage node DS A, a data storage node DS i, and a check storage node CS r; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of natural numbers 1 to N, i is each of natural numbers 1 to N except A, and r is each of natural numbers 1 to M;
- the distributed storage system is used to implement the various implementations of the first aspect.
- The present invention also provides a non-volatile computer readable storage medium and a computer program product that contain computer program instructions.
- The computer program instructions operate in a distributed storage system that comprises a management node and (M+N) storage nodes, all of which store a partitioned view of the metadata stripe; the partitioned view of the metadata stripe includes a primary data storage node DS A, a data storage node DS i, and a check storage node CS r; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of natural numbers 1 to N, i is each of natural numbers 1 to N except A, and r is each of natural numbers 1 to M. When one or more computers execute the computer program instructions, they act as the management node, the primary data storage node DS A, the data storage node DS i, and the check storage node CS r of the distributed storage system, to implement the various implementations of the first aspect.
- The metadata storage scheme in the distributed storage system disclosed in the first aspect can also be applied to the storage of the data corresponding to the metadata. Accordingly, the distributed storage system of the second aspect and the non-transitory computer readable storage medium and computer program product of the third aspect are equally applicable to data storage.
- FIG. 1 is a schematic diagram of a storage structure of a distributed block device according to an embodiment of the present invention.
- FIG. 2 is a schematic structural diagram of a server in a distributed block device according to an embodiment of the present invention.
- FIG. 3 is a schematic diagram of a relationship between a data stripe and a partition view according to an embodiment of the present invention.
- FIG. 4 is a schematic diagram of data striping according to an embodiment of the present invention.
- FIG. 5 is a schematic diagram of a partition view according to an embodiment of the present invention.
- FIG. 6 is a schematic diagram of a relationship between a metadata stripe and a partition view according to an embodiment of the present invention.
- FIG. 7 is a flowchart of metadata storage according to an embodiment of the present invention.
- FIG. 8 is a schematic diagram of metadata striping according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram of metadata storage according to an embodiment of the present invention.
- Distributed storage systems mainly include distributed file system storage, distributed object storage, and distributed block device storage, such as of Series products.
- The embodiments of the present invention are described by taking distributed block device storage as an example.
- The distributed block device storage includes a plurality of servers, such as server 1, server 2, server 3, server 4, server 5, and server 6, and the servers communicate with each other.
- The number of servers in the distributed block device storage may be increased according to actual requirements, which is not limited by the embodiments of the present invention.
- Each server in the distributed block device storage has the structure shown in FIG. 2.
- Each server in the distributed block device storage includes a central processing unit (CPU) 201, a memory 202, a hard disk 1, a hard disk 2, and a hard disk 3.
- The memory 202 stores computer instructions, and the CPU 201 executes the program instructions in the memory 202 to perform the corresponding operations.
- The hard disk can be at least one of a mechanical hard disk and a solid state hard disk.
- A Field Programmable Gate Array (FPGA) or other hardware may also perform the corresponding operations in place of the CPU 201, or the FPGA or other hardware may perform the corresponding operations together with the CPU 201.
- For ease of description, the embodiments of the present invention refer to whatever implements the corresponding operations above collectively as a processor.
- When an application is loaded in the memory 202 and the CPU 201 executes the application instructions in the memory 202, the server serves as a client.
- The application can be a virtual machine (VM) or a specific application, such as office software.
- The client writes data to or reads data from the distributed block device storage.
- When a storage management program, such as a virtual block storage management program, is loaded in the memory 202 and the CPU 201 executes its instructions, the server acts as a management node, which is responsible for managing volume metadata and provides a block protocol access interface to the client.
- The management node provides a distributed storage access point service for the client, so that the client can access the storage resources of the distributed block device storage through the management node.
- When a storage object program is loaded in the memory 202 and the CPU 201 executes the storage object program instructions in the memory 202, the server functions as a storage node that performs the specific input/output (I/O) operations.
- By default, one storage object program process runs for each hard disk.
- Each storage object program process manages one hard disk, and each storage object program process running on a server acts as a storage node.
- The embodiments of the present invention are described for the case where one storage object program process manages one hard disk.
- When the distributed block device storage is initialized, each storage object program process divides the hard disk it manages into 1 MB fragments and records the allocation information of each 1 MB fragment in the metadata management area of the hard disk.
- The storage management program communicates with all the storage object programs of the storage resource pools that it can access; that is, the management node communicates with all the storage nodes of the resource pools it can access, so that the management node can concurrently access all the hard disks of those resource pools.
- The hash space (for example, 0 to 2^32) is divided into N equal parts, each of which is a partition, and the N partitions are distributed evenly according to the number of hard disks.
- In the distributed block device storage, N defaults to 3600; that is, the partitions are P1, P2, P3, ..., P3600.
- In this example, each storage node carries 200 partitions. The correspondence between partitions and storage nodes, that is, the partition view, is allocated when the distributed block device storage is initialized and is then adjusted as the number of hard disks in the distributed block device storage changes.
- Each server in the distributed block device storage saves the partition view in the memory 202, and the management node uses the partition view for fast routing.
- Each storage node also stores all partition views of the distributed block device storage system, that is, the correspondence between each partition and the storage nodes.
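A partition view of this kind can be sketched as a table built once at initialization. The sketch below is illustrative only: the round-robin placement, the node names, the four-node width of each entry, and the count of 18 storage nodes (chosen so that 3600 / 18 = 200) are assumptions, not details stated in the text.

```python
from typing import Dict, List

NUM_PARTITIONS = 3600  # default partition count quoted in the text


def build_partition_view(nodes: List[str], width: int = 4) -> Dict[int, List[str]]:
    """Map each partition P1..P3600 to `width` distinct storage nodes.

    Position 0 of each entry is treated as the primary data storage node
    and the last position as the check storage node. Round-robin
    placement is an assumption made for this sketch.
    """
    if len(nodes) < width:
        raise ValueError("need at least `width` storage nodes")
    view: Dict[int, List[str]] = {}
    for p in range(1, NUM_PARTITIONS + 1):
        start = (p - 1) % len(nodes)
        view[p] = [nodes[(start + k) % len(nodes)] for k in range(width)]
    return view


def primaries_per_node(view: Dict[int, List[str]]) -> Dict[str, int]:
    """Count how many partitions each node serves as primary."""
    counts: Dict[str, int] = {}
    for members in view.values():
        counts[members[0]] = counts.get(members[0], 0) + 1
    return counts


example_view = build_partition_view([f"node{i}" for i in range(18)])
```

With the assumed 18 nodes, every node ends up as primary for 200 partitions. Adjusting the view when hard disks are added or removed would mean rebuilding or patching this table, which the sketch does not attempt.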
- The Erasure Coding (EC) algorithm can be used to improve data reliability, for example in 3+1 mode, in which 3 data blocks and 1 check block form a data stripe.
- The partition view is then "Partition - Primary Data Storage Node - Data Storage Node 1 - Data Storage Node 2 - Check Storage Node"; an example partition view is shown in FIG. 5.
- In this partition view, the primary data storage node, data storage node 1, and data storage node 2 of a partition store the data blocks of a data stripe, and the check storage node stores the check data.
- The primary data storage node also serves as the backup storage node for the data blocks stored on data storage node 1 and data storage node 2.
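The 3+1 mode can be illustrated with the simplest single-check-block code, byte-wise XOR parity. The text does not say which erasure code is used, so XOR is an assumption here, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode_3_plus_1(d1: bytes, d2: bytes, d3: bytes) -> bytes:
    """Compute the single check block for three data blocks."""
    return xor_blocks([d1, d2, d3])


def rebuild_missing(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (marked None) in [D1, D2, D3, C1]."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost block")
    repaired = list(stripe)
    if missing:
        present = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(present)
    return repaired
```

Because XOR is its own inverse, any one of the four blocks can be recomputed from the other three, which is the property a 3+1 stripe relies on.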
- The distributed block device storage logically slices each logical unit number (LUN) into 1 MB fragments. For example, a 1 GB LUN is sliced into 1024 fragments of 1 MB each.
- The write request sent by the client, for example a Small Computer System Interface (SCSI) request, carries a LUN identifier (LUN ID), a logical block address (LBA) ID, and the data to be written. The management node where the client is located receives the write request and forms a key from the LUN ID and the LBA ID.
- The key contains the result of rounding the LBA ID to 1 MB.
- A Distributed Hash Table (DHT) hash of the key yields an integer (within 0 to 2^32) that falls in a specific partition; the management node where the client is located then determines the primary data storage node, data storage node 1, data storage node 2, and the check storage node according to the partition view recorded in the memory 202.
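The routing from a write request to a partition can be sketched as follows. Several details are assumptions rather than statements from the text: the LBA is treated as a byte offset, rounding is taken to mean rounding down to a 1 MB boundary, SHA-256 stands in for the unspecified DHT hash, and the key format is hypothetical.

```python
import hashlib

FRAGMENT_SIZE = 1 << 20   # 1 MB fragments, as in the text
HASH_SPACE = 1 << 32      # hash range 0 to 2^32, as in the text
NUM_PARTITIONS = 3600     # default partition count, as in the text


def make_key(lun_id: int, lba_bytes: int) -> str:
    """Form the key from the LUN ID and the LBA rounded down to 1 MB."""
    fragment_index = lba_bytes // FRAGMENT_SIZE
    return f"{lun_id}:{fragment_index}"


def partition_for_key(key: str) -> int:
    """Hash the key into [0, 2^32) and return a partition number 1..3600."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    value = int.from_bytes(digest[:4], "big")         # integer in [0, 2^32)
    return value * NUM_PARTITIONS // HASH_SPACE + 1   # equal-width ranges


# A 1 GB LUN spans 1024 fragments of 1 MB each.
fragments_in_1gb = (1 << 30) // FRAGMENT_SIZE
```

All addresses inside the same 1 MB fragment of a LUN produce the same key and therefore the same partition, so one lookup in the partition view gives the four storage nodes for the whole fragment.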
- The management node sends data block 1, data block 2, data block 3, and check block 1 of the EC data stripe to the primary data storage node, data storage node 1, data storage node 2, and the check storage node, respectively.
- The primary data storage node stores data block 1, data storage node 1 stores data block 2, data storage node 2 stores data block 3, and the check storage node stores check block 1.
- Data storage nodes 1 and 2 each determine the primary data storage node according to the partition view; data storage node 1 backs up data block 2 to the primary data storage node, data storage node 2 backs up data block 3 to the primary data storage node, and the primary data storage node stores data block 2 and data block 3.
- The primary data storage node allocates fragment 1 to data block 1 from the hard disk it manages and establishes a mapping relationship between the identifier of data block 1 and fragment 1;
- data storage node 1 allocates fragment 2 to data block 2 from the hard disk it manages and establishes a mapping relationship between the identifier of data block 2 and fragment 2;
- data storage node 2 allocates fragment 3 to data block 3 from the hard disk it manages and establishes a mapping relationship between the identifier of data block 3 and fragment 3;
- the check storage node allocates fragment 4 to check block 1 from the hard disk it manages and establishes a mapping relationship between the identifier of check block 1 and fragment 4.
- The primary data storage node receives data block 2 sent by data storage node 1 and data block 3 sent by data storage node 2, allocates fragment 5 and fragment 6 from the hard disk it manages, and establishes a mapping relationship between the identifier of data block 2 and fragment 5 and between the identifier of data block 3 and fragment 6.
- The embodiments of the present invention take one storage object program process managing one hard disk as an example.
- In that case, the mapping relationship between the identifier of a data block and a fragment is the mapping between the identifier of the data block and the physical address of the fragment. When one storage object program process corresponds to multiple hard disks, that is, when the storage node manages multiple hard disks,
- the mapping relationship between the identifier of the data block and the fragment includes a mapping from the identifier of the data block to the hard disk storing the data block and a mapping from that hard disk to the fragment storing the data block.
- Data block 2 is thus stored in fragment 2 and fragment 5,
- and data block 3 is stored in fragment 3 and fragment 6. The management node establishes and saves a mapping relationship between the identifier of data block 2 and data storage node 1 and the primary data storage node,
- and establishes and saves a mapping relationship between the identifier of data block 3 and data storage node 2 and the primary data storage node.
- Further, data storage node 1 saves the mapping relationship between the identifier of data block 2 and data storage node 1 and the primary data storage node,
- and data storage node 2 saves the mapping relationship between the identifier of data block 3 and data storage node 2 and the primary data storage node.
- When garbage collection is performed on the data stripe,
- the management node can reclaim the data blocks held on the data storage nodes and the primary data storage node according to the mapping relationship between the identifier of each data block in the data stripe and the storage nodes, which improves the efficiency of data reclamation.
- The metadata corresponding to the data is stored using the same EC algorithm as the data.
- The EC-based metadata stripe has the same partitioned view as the EC-based data stripe described above, as shown in FIG. 6.
- The distributed storage system includes a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store a partitioned view of the metadata stripe;
- the partitioned view of the metadata stripe includes a primary data storage node DS A, a data storage node DS i, and a check storage node CS r; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of natural numbers 1 to N, i is each of natural numbers 1 to N except A, and r is each of natural numbers 1 to M; the flow shown in FIG. 7 is executed in the distributed storage system:
- Step 701: The management node determines, according to the partitioned view of the metadata stripe, the primary data storage node DS A, the data storage node DS i, and the check storage node CS r for the metadata stripe; the metadata stripe includes metadata blocks D A and D i and a check block C r.
- That the management node determines, according to the partitioned view of the metadata stripe, the primary data storage node DS A, the data storage node DS i, and the check storage node CS r for the metadata stripe specifically includes: the management node determines the partition corresponding to the metadata stripe according to the write request that generates the metadata in the metadata stripe; the management node then queries the partitioned view of the metadata stripe according to that partition to determine the primary data storage node DS A, the data storage node DS i, and the check storage node CS r.
- Specifically, the management node determines the partition corresponding to the metadata stripe according to the address carried by the write request.
- For details on determining the partition corresponding to the metadata stripe, refer to how the distributed block device storage handles the write request sent by the client, described above; the details are not repeated here.
- Step 702: The management node sends D i to the data storage node DS i, sends D A to the primary data storage node DS A, and sends C r to the check storage node CS r.
- Step 703: The check storage node CS r receives and stores C r.
- Step 704: The data storage node DS i receives and stores D i, and sends D i to the primary data storage node DS A according to the partitioned view of the metadata stripe.
- Step 705: The primary data storage node DS A receives and stores D A and D i.
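Steps 701 to 705 can be walked through with a small in-memory model. This is a sketch under stated assumptions: the `StorageNode` class and the function name are hypothetical, the partition view is passed in as an ordered list, and A is fixed to 1 so that the first data block belongs to the primary node.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageNode:
    name: str
    blocks: Dict[str, bytes] = field(default_factory=dict)

    def store(self, block_id: str, payload: bytes) -> None:
        self.blocks[block_id] = payload


def write_metadata_stripe(view: List[StorageNode], data: List[bytes],
                          checks: List[bytes]) -> None:
    """Write one stripe; `view` is [DS_A, DS_i..., CS_r...] from step 701."""
    n, m = len(data), len(checks)
    if n < 2 or m < 1 or len(view) != n + m:
        raise ValueError("expected N >= 2, M >= 1 and N + M storage nodes")
    primary, data_nodes, check_nodes = view[0], view[1:n], view[n:]
    # Steps 702 and 705: D_A is sent to the primary node, which stores it.
    primary.store("D1", data[0])
    # Steps 702 and 703: each check storage node receives and stores C_r.
    for r, (node, block) in enumerate(zip(check_nodes, checks), start=1):
        node.store(f"C{r}", block)
    # Steps 702 and 704: each data storage node stores D_i and forwards it.
    for i, (node, block) in enumerate(zip(data_nodes, data[1:]), start=2):
        node.store(f"D{i}", block)
        primary.store(f"D{i}", block)  # step 705: primary stores the backup
```

After the call, the primary node holds every metadata block of the stripe, each data storage node holds only its own block, and the check block exists in a single copy.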
- That the check storage node CS r stores C r specifically includes: the check storage node CS r allocates a fragment S r to C r and establishes a mapping relationship between the identifier of C r and the fragment S r;
- that the data storage node DS i stores D i specifically includes: the data storage node DS i allocates a fragment SD i to D i and establishes a mapping relationship between the identifier of D i and the fragment SD i;
- that the primary data storage node DS A stores D A and D i specifically includes: the primary data storage node DS A allocates a fragment SD A to D A and establishes a mapping relationship between the identifier of D A and the fragment SD A, and allocates a fragment SD i to D i and establishes a mapping relationship between the identifier of D i and the fragment SD i.
- The management node establishes a mapping relationship between the identifier of D i and the data storage node DS i and the primary data storage node DS A. Further, the data storage node DS i saves the mapping relationship between the identifier of D i and the data storage node DS i and the primary data storage node DS A.
- The management node can then reclaim the metadata blocks held on the data storage node and the primary data storage node according to the mapping relationship between the identifier of each metadata block in the metadata stripe and the storage nodes, which improves the efficiency of metadata reclamation.
- In the following example, the metadata blocks in the metadata stripe using the EC algorithm are D 1, D 2, and D 3,
- and the check block is C 1.
- The management node where the client is located determines the primary data storage node, data storage node 1, data storage node 2, and the check storage node according to the partitioned view "Partition - Primary Data Storage Node - Data Storage Node 1 - Data Storage Node 2 - Check Storage Node" recorded in the memory 202.
- In this partitioned view, the primary data storage node, data storage node 1, and data storage node 2 of a partition store the metadata blocks of the metadata stripe, and the check storage node stores the check data.
- The primary data storage node also serves as the backup storage node for the metadata blocks stored on data storage node 1 and data storage node 2.
- The management node sends D 1, D 2, D 3, and C 1 of the EC-based metadata stripe to the primary data storage node, data storage node 1, data storage node 2, and the check storage node, respectively.
- The primary data storage node receives and stores D 1, data storage node 1 receives and stores D 2, data storage node 2 receives and stores D 3, and the check storage node receives and stores C 1.
- Data storage nodes 1 and 2 each determine the primary data storage node according to the partition view; data storage node 1 backs up D 2 to the primary data storage node, data storage node 2 backs up D 3 to the primary data storage node, and the primary data storage node receives and stores D 2 and D 3.
- 9, 7 D 1 slice allocated from the hard disk management a mapping relationship between the identifier 7 of slices 1 D primary data storage node; data storage node from a hard disk management
- a slice 8 is allocated for D 2 to establish a mapping relationship between the identifier of D 2 and the slice 8; the data storage node 2 allocates a slice 9 to D 3 from the hard disk it manages, and establishes the identifier of the D 3 and the slice 9 Mapping relationship; verifying that the storage node allocates a fragment 10 to C 1 from its managed hard disk, and establishes a mapping relationship between the identifier of C 1 and the fragment 10 .
- The primary data storage node receives D2 sent by data storage node 1 and D3 sent by data storage node 2, allocates slice 11 and slice 12 from the hard disk it manages, and establishes a mapping between the identifier of D2 and slice 11 and a mapping between the identifier of D3 and slice 12.
- Take the mapping between the identifier of a metadata block and a slice as an example. This mapping may be a mapping between the identifier of the metadata block and the physical address of the slice. When one process of the storage object program corresponds to multiple hard disks, that is, when the storage node manages multiple hard disks, the mapping comprises a mapping between the identifier of the metadata block and the hard disk storing the metadata block, and a mapping from that hard disk to the slice holding the metadata block.
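The two-level mapping described for a storage node that manages several hard disks can be sketched as follows. The classes `Disk` and `StorageNode`, the disk name `disk0`, and the free-slice list are hypothetical names chosen for illustration; the slice numbers 7, 11, and 12 are taken from the primary data storage node in the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical hard disk that hands out slices from a free list."""
    name: str
    free_slices: list
    block_to_slice: dict = field(default_factory=dict)

    def allocate(self, block_id):
        # Take the next free slice and record block identifier -> slice.
        slice_no = self.free_slices.pop(0)
        self.block_to_slice[block_id] = slice_no
        return slice_no


@dataclass
class StorageNode:
    """Storage node managing one or more disks."""
    disks: dict  # disk name -> Disk
    block_to_disk: dict = field(default_factory=dict)

    def store(self, block_id, disk_name):
        # First level: block identifier -> disk; second level: disk -> slice.
        self.block_to_disk[block_id] = disk_name
        return self.disks[disk_name].allocate(block_id)

    def locate(self, block_id):
        disk_name = self.block_to_disk[block_id]
        return disk_name, self.disks[disk_name].block_to_slice[block_id]


primary = StorageNode({"disk0": Disk("disk0", [7, 11, 12])})
for block_id in ("D1", "D2", "D3"):
    primary.store(block_id, "disk0")
```

With a single disk per process, the first level collapses and only the identifier-to-slice mapping is needed, which corresponds to the simpler case in the text.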
- D2 is stored in slice 8 (on data storage node 1) and in slice 11 (on the primary data storage node).
- D3 is stored in slice 9 (on data storage node 2) and in slice 12 (on the primary data storage node).
- The management node establishes and saves a mapping between the identifier of D2 and both data storage node 1 and the primary data storage node, and establishes and saves a mapping between the identifier of D3 and both data storage node 2 and the primary data storage node.
- Data storage node 1 saves the mapping between the identifier of D2 and both data storage node 1 and the primary data storage node.
- Data storage node 2 saves the mapping between the identifier of D3 and both data storage node 2 and the primary data storage node.
- Based on the mapping between the identifiers of the metadata blocks in the metadata stripe and the storage nodes, the management node can recover a metadata block from either its data storage node or the primary data storage node, which improves the efficiency of metadata recovery.
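A minimal sketch of how such a mapping could drive recovery is shown below. The dictionary `block_locations`, the node names, and the function `recovery_source` are hypothetical; the text only states that the management node keeps the mapping and uses it to recover blocks.

```python
# Mapping kept by the management node: block identifier -> nodes holding a copy.
block_locations = {
    "D2": ["node1", "primary"],
    "D3": ["node2", "primary"],
}


def recovery_source(block_id, failed_nodes, locations=block_locations):
    """Return the first surviving node that holds block_id, or None if no copy survives."""
    for node in locations.get(block_id, []):
        if node not in failed_nodes:
            return node
    return None
```

For example, `recovery_source("D2", {"node1"})` returns `"primary"`. When the function returns `None`, a real system would presumably fall back to reconstructing the block from the rest of the stripe with the EC algorithm; that path is not modeled here.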
- The primary data storage node backs up the other metadata blocks of the metadata stripe. Because the metadata blocks on the data storage nodes only need to be backed up on the primary data storage node, less storage space is used than with the multiple copies of all metadata blocks kept in the prior art. When a client accesses the metadata, it only needs to read all the metadata blocks from the primary data storage node, which improves metadata access speed.
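The storage saving can be checked with simple block counting. The sketch below assumes, purely for illustration, that the prior-art scheme keeps three replicas of every metadata block; the text says only "multiple copies", so the replica count and both function names are assumptions.

```python
def blocks_with_primary_backup(n_data, n_check):
    """Blocks stored when every non-primary data block is also copied to the primary."""
    # n_data data blocks + n_check check blocks + one backup per non-primary data block.
    return n_data + n_check + (n_data - 1)


def blocks_with_full_replication(n_data, replicas):
    """Blocks stored when every data block is kept in `replicas` copies."""
    return n_data * replicas


ec_total = blocks_with_primary_backup(n_data=3, n_check=1)
replica_total = blocks_with_full_replication(n_data=3, replicas=3)
```

For the stripe in the example (D1, D2, D3, and C1), the scheme stores 6 blocks in total, against 9 under the assumed three-replica scheme.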
- An embodiment of the present invention further provides a non-transitory computer-readable storage medium and a computer program product, each containing computer program instructions. A CPU executes the computer program instructions loaded in memory to implement the functions of the management node and the storage nodes (the primary data storage node, the data storage nodes, and the verification storage node) in the implementations of the present invention.
- The slice in the embodiments of the present invention may be a physical block or the like in the hard disk.
- The hard disk in the embodiments of the present invention may be at least one of a mechanical disk and a solid-state disk, as described above.
- The hard disk corresponding to a process of the storage object program in the embodiments of the present invention may also be a storage array or the like, which is not limited in the embodiments of the present invention.
- The disclosed apparatus and method may be implemented in other manners.
- The division of the units described in the device embodiments above is only a logical function division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or in another form.
- The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
Abstract
The invention relates to a method for storing metadata in a distributed storage system. In a distributed storage system, in a scenario where a metadata stripe formed using an EC algorithm provides data reliability, a primary data storage node backs up the other metadata blocks in the metadata stripe. Because the metadata blocks on a data storage node only need to be backed up on the primary data storage node, storage space is reduced compared with keeping multiple replicas of all metadata blocks, and when a client accesses metadata it only needs to access all the metadata blocks from the primary data storage node, which increases metadata access speed.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710508014.8A CN109144406B (zh) | 2017-06-28 | 2017-06-28 | Metadata storage method, system and storage medium in a distributed storage system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019000949A1 (fr) | 2019-01-03 |
Family
ID=64740945
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/075077 WO2019000949A1 (fr) | 2017-06-28 | 2018-02-02 | Metadata storage method and system in a distributed storage system, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (2) | CN109144406B (fr) |
WO (1) | WO2019000949A1 (fr) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11734248B2 (en) * | 2019-03-04 | 2023-08-22 | Hitachi Vantara Llc | Metadata routing in a distributed system |
WO2021046693A1 (fr) * | 2019-09-09 | 2021-03-18 | 华为技术有限公司 | Data processing method in a storage system, device, and storage system |
CN111444274B (zh) * | 2020-03-26 | 2021-04-30 | 上海依图网络科技有限公司 | Data synchronization method, data synchronization system and apparatus thereof, medium and system |
CN111638995B (zh) * | 2020-05-08 | 2024-09-20 | 杭州海康威视系统技术有限公司 | Metadata backup method, apparatus and device, and storage medium |
CN116490847A (zh) * | 2020-11-05 | 2023-07-25 | 阿里巴巴集团控股有限公司 | Virtual data replication supporting garbage collection in a distributed file system |
CN112947864B (zh) * | 2021-03-29 | 2024-03-08 | 南方电网数字平台科技(广东)有限公司 | Metadata storage method, apparatus, device and storage medium |
CN115904794A (zh) * | 2021-08-18 | 2023-04-04 | 华为技术有限公司 | Data processing method and apparatus |
CN115268801B (zh) * | 2022-09-30 | 2023-01-10 | 天津卓朗昆仑云软件技术有限公司 | Backup system and method for block devices |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411637A (zh) * | 2011-12-30 | 2012-04-11 | 创新科软件技术(深圳)有限公司 | Metadata management method for a distributed file system |
CN103699494A (zh) * | 2013-12-06 | 2014-04-02 | 北京奇虎科技有限公司 | Data storage method, data storage device and distributed storage system |
US20140310489A1 (en) * | 2013-04-16 | 2014-10-16 | International Business Machines Corporation | Managing metadata and data for a logical volume in a distributed and declustered system |
CN106599308A (zh) * | 2016-12-29 | 2017-04-26 | 郭晓凤 | Distributed metadata management method and system |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7051155B2 (en) * | 2002-08-05 | 2006-05-23 | Sun Microsystems, Inc. | Method and system for striping data to accommodate integrity metadata |
CN103399823B (zh) * | 2011-12-31 | 2016-03-30 | 华为数字技术(成都)有限公司 | Service data storage method, device and system |
US8914668B2 (en) * | 2012-09-06 | 2014-12-16 | International Business Machines Corporation | Asynchronous raid stripe writes to enable response to media errors |
CN102937964B (zh) * | 2012-09-28 | 2015-02-11 | 无锡江南计算技术研究所 | Intelligent data service method based on a distributed system |
US9529675B2 (en) * | 2013-07-26 | 2016-12-27 | Huawei Technologies Co., Ltd. | Data recovery method, data recovery device and distributed storage system |
CN103729436A (zh) * | 2013-12-27 | 2014-04-16 | 中国科学院信息工程研究所 | Distributed metadata management method and system |
US9772787B2 (en) * | 2014-03-31 | 2017-09-26 | Amazon Technologies, Inc. | File storage using variable stripe sizes |
EP3152648B1 (fr) * | 2014-06-04 | 2021-08-04 | Pure Storage, Inc. | Automatically reconfiguring a storage memory topology |
CN106662983B (zh) * | 2015-12-31 | 2019-04-12 | 华为技术有限公司 | Method, apparatus and system for data reconstruction in a distributed storage system |
CN106294772B (zh) * | 2016-08-11 | 2019-03-19 | 电子科技大学 | Cache management method for a distributed in-memory columnar database |
- 2017-06-28: CN CN201710508014.8A, patent CN109144406B (zh), status: Active
- 2017-06-28: CN CN202010648620.1A, patent CN111949210A (zh), status: Pending
- 2018-02-02: WO PCT/CN2018/075077, patent WO2019000949A1 (fr), status: Application Filing
Non-Patent Citations (1)
Title |
---|
ZHANG, BO.: "Research on the metadata management of multinamenodes based on HDFS", CHINA MASTER'S THESES FULL-TEXT DATABASE (ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE), vol. 2014, no. 5, 15 May 2014 (2014-05-15), ISSN: 1674-0246 * |
Also Published As
Publication number | Publication date |
---|---|
CN111949210A (zh) | 2020-11-17 |
CN109144406B (zh) | 2020-08-07 |
CN109144406A (zh) | 2019-01-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18825500; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 18825500; Country of ref document: EP; Kind code of ref document: A1 |