WO2019000949A1 - Method and system for storing metadata in a distributed storage system, and storage medium - Google Patents

Method and system for storing metadata in a distributed storage system, and storage medium

Info

Publication number
WO2019000949A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage node
metadata
data storage
node
stripe
Prior art date
Application number
PCT/CN2018/075077
Other languages
English (en)
Chinese (zh)
Inventor
饶蓉
魏明昌
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2019000949A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0611Improving I/O performance in relation to response time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments

Definitions

  • the present invention relates to the field of data storage technologies, and in particular, to a metadata storage method, system, and storage medium in a distributed storage system.
  • Metadata such as the logical address and the physical address of written data is generated when the data is stored, and this metadata is likewise stored in the storage nodes.
  • a common metadata storage method is to scatter the blocks of a metadata stripe across the storage nodes. When reading the metadata, the blocks of the stripe must be read from each storage node and reassembled into the metadata stripe; this causes a large amount of data forwarding between storage nodes, which hurts performance. Another way is to store multiple full copies of the metadata on the storage nodes, but this increases storage overhead.
  • an embodiment of the present invention provides a metadata storage solution in a distributed storage system, where the distributed storage system includes a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store a partition view of the metadata stripe;
  • the partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r, wherein:
  • N is a natural number not less than 2
  • M is a natural number not less than 1
  • A is one of the natural numbers 1 to N
  • i is each of the natural numbers 1 to N except A
  • r is each of the natural numbers 1 to M
  • the management node determines, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r for the metadata stripe; the metadata stripe includes metadata blocks D_A and D_i and check blocks C_r
  • in this solution, the primary data storage node DS_A backs up the other metadata blocks D_i of the metadata stripe. Because the metadata block D_i on the data storage node DS_i only needs to be backed up on the primary data storage node DS_A, and no copies of the check blocks are needed, storage space is reduced compared with the prior-art approach of keeping multiple copies of all metadata blocks; and when a client accesses the metadata, all metadata blocks can be read from the primary data storage node DS_A alone, which improves the speed of metadata access.
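  • as an illustrative count (the 3+1 configuration here is an assumption for the example, not a requirement of the solution): with N = 3 metadata blocks and M = 1 check block, this scheme stores the 3 metadata blocks on their own nodes, 2 backups of D_i on DS_A, and 1 check block, for 3 + 2 + 1 = 6 blocks in total, whereas keeping, say, three full copies of all 3 metadata blocks would store 3 × 3 = 9 blocks.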
  • the distributed storage system of the present solution can be a distributed file system, a distributed object storage system, or distributed block device storage.
  • the management node determining, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r for the metadata stripe specifically includes: the management node determines, according to the write request that generates the metadata in the metadata stripe, the partition corresponding to the metadata stripe; the management node then queries the partition view according to the partition corresponding to the metadata stripe to determine the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r.
  • the management node determines, according to the address carried by the write request, a partition corresponding to the metadata stripe.
  • the check storage node CS_r storing C_r specifically includes: the check storage node CS_r allocates a fragment S_r to C_r and establishes a mapping relationship between the identifier of C_r and the fragment S_r;
  • the data storage node DS_i storing D_i includes: the data storage node DS_i allocates a fragment SD_i to D_i and establishes a mapping relationship between the identifier of D_i and the fragment SD_i;
  • the primary data storage node DS_A storing D_A and D_i includes: the primary data storage node DS_A allocates a fragment SD_A to D_A and establishes a mapping relationship between the identifier of D_A and the fragment SD_A, and allocates fragments SD_i to the D_i and establishes mapping relationships between the identifiers of the D_i and the fragments SD_i.
  • the management node establishes a mapping relationship between the identifier of D_i and the data storage node DS_i and the primary data storage node DS_A.
  • the management node can recover the data of the metadata blocks on the data storage nodes and the primary data storage node according to the mapping relationship between the identifiers of the metadata blocks in the metadata stripe and the storage nodes, improving the efficiency of metadata recovery.
  • the embodiment of the present invention further provides a distributed storage system, where the distributed storage system includes a management node and (M+N) storage nodes, and the management node and each of the (M+N) storage nodes store a partition view of the metadata stripe; the partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N except A, and r is each of the natural numbers 1 to M;
  • the distributed storage system is used to implement various implementations of the first aspect.
  • the present invention also provides a non-volatile computer-readable storage medium and a computer program product; the computer program instructions they contain are loaded into the memory of a storage device provided by an embodiment of the present invention.
  • the computer program instructions operate in a distributed storage system that comprises a management node and (M+N) storage nodes, where the management node and the (M+N) storage nodes all store a partition view of the metadata stripe; the partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N except A, and r is each of the natural numbers 1 to M; when one or more computers execute the computer program instructions, the management node, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r of the distributed storage system implement the various implementations of the first aspect.
  • the metadata storage scheme in the various distributed storage systems disclosed in the first aspect can also be applied to the storage of data corresponding to the metadata. Accordingly, the distributed storage system of the second aspect and the non-transitory computer readable storage medium and computer program product of the third aspect are equally applicable to data storage.
  • FIG. 1 is a schematic diagram of a storage structure of a distributed block device according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a server in a distributed block device according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a relationship between a data stripe and a partition view according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of data striping according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a partition view according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a relationship between a metadata stripe and a partition view according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of metadata storage according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of metadata striping according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of metadata storage according to an embodiment of the present invention.
  • Distributed storage systems mainly include distributed file system storage, distributed object storage, and distributed block device storage.
  • the embodiment of the present invention is described by taking a distributed block device storage as an example.
  • the distributed block device storage includes a plurality of servers: server 1, server 2, server 3, server 4, server 5, and server 6, which communicate with each other.
  • the number of servers in the distributed block device storage may be increased according to actual requirements, which is not limited by the embodiment of the present invention.
  • each server in the distributed block device storage has the structure shown in FIG. 2.
  • each server in the distributed block device storage includes a central processing unit (CPU) 201, a memory 202, a hard disk 1, a hard disk 2, and a hard disk 3.
  • the memory 202 stores computer instructions, and the CPU 201 executes the program instructions in the memory 202 to perform the corresponding operations.
  • the hard disk can be at least one of a mechanical hard disk and a solid state hard disk.
  • a Field Programmable Gate Array (FPGA) or other hardware may also be used for the corresponding operations of the CPU 201, or the FPGA or other hardware may perform the corresponding operations together with the CPU 201.
  • for ease of description, the embodiments of the present invention generally refer to the hardware implementing the corresponding operations described above as a processor.
  • when an application is loaded in the memory 202 and the CPU 201 executes the application instructions in the memory 202, the server serves as a client.
  • the application can be a virtual machine (VM) or a specific application, such as office software.
  • the client writes data to, or reads data from, the distributed block device storage.
  • when the storage management program is loaded in the memory 202 and the CPU 201 executes the storage management program instructions in the memory 202 (a virtual block storage management program), the server acts as a management node, which is responsible for managing volume metadata and provides a block-protocol access interface to the client.
  • the management node provides a distributed storage access point service for the client, so that the client can access the storage resources of the distributed block device storage through the management node.
  • when the storage object program is loaded in the memory 202 and the CPU 201 executes the storage object program instructions in the memory 202, the server functions as a storage node that performs specific input/output (I/O) operations.
  • by default, one hard disk corresponds to one running storage object program process; each storage object program process manages one hard disk, and each storage object program process running on the server acts as a storage node.
  • the embodiment of the present invention describes a case where a storage object program process manages a hard disk.
  • when the distributed block device storage is initialized, each storage object program process manages its hard disk in units of 1 MB and records the allocation information of each 1 MB fragment in the metadata management area of the hard disk.
  • the hard disks form a storage resource pool. The storage management program communicates with all storage object programs of the resource pools it can access; that is, the management node communicates with all storage nodes of the resource pool that it can access, so that the management node can concurrently access all hard disks of the resource pool.
  • the hash space (for example, 0 to 2^32) is divided into N equal parts, each equal part being one partition, and the partitions are divided evenly according to the number of hard disks.
  • for distributed block device storage, N defaults to 3600; that is, the partitions are P1, P2, P3, ..., P3600.
  • with 18 hard disks, for example, each storage node carries 200 partitions. The correspondence between partitions and storage nodes, that is, the partition view, is allocated when the distributed block device storage is initialized and is adjusted later as the number of hard disks in the distributed block device storage changes, as sketched below.
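  • a minimal sketch of this initial assignment (illustrative only: the function name and the round-robin placement are assumptions, and a real system rebalances partitions as disks come and go):

```python
# Illustrative sketch: divide the hash space into 3600 partitions and assign
# them evenly to storage nodes at initialization time (round-robin). Here one
# owning node per partition; in the EC mode below, a partition maps to a group.
N_PARTITIONS = 3600  # default partition count for distributed block device storage

def build_partition_view(storage_nodes):
    """Map each of the 3600 partitions to a storage node."""
    return {p: storage_nodes[p % len(storage_nodes)] for p in range(N_PARTITIONS)}

nodes = [f"node-{k}" for k in range(18)]      # e.g. 18 hard disks / storage nodes
partition_view = build_partition_view(nodes)
# each storage node ends up carrying 3600 / 18 = 200 partitions
assert sum(1 for n in partition_view.values() if n == "node-0") == 200
```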
  • each server of the distributed block device storage saves the partition view in the memory 202, and the management node uses the partition view for fast routing.
  • each storage node also stores the complete partition view of the distributed block device storage system, that is, the correspondence between every partition and its storage nodes.
  • the erasure coding (EC) algorithm can be used to improve data reliability, for example a 3+1 mode in which 3 data blocks and 1 check block form a data stripe (illustrated below).
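  • as an illustration of the 3+1 mode, a single check block can be computed as the bytewise XOR of the three data blocks (a sketch under assumptions: the patent does not prescribe a particular EC code, and the tiny block size is for the demo only; real fragments are 1 MB):

```python
# Illustrative 3+1 erasure coding with one XOR parity block: any single lost
# block (data or check) can be rebuilt by XOR-ing the three surviving blocks.
BLOCK_SIZE = 16  # tiny blocks for the demo; real fragments are 1 MB

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [bytes([d]) * BLOCK_SIZE for d in (1, 2, 3)]  # data blocks 1, 2, 3
check = xor_blocks(data)                             # check block 1
rebuilt = xor_blocks([data[0], data[2], check])      # lose data block 2 ...
assert rebuilt == data[1]                            # ... and rebuild it
```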
  • in this mode the partition view is "Partition - Primary Data Storage Node - Data Storage Node 1 - Data Storage Node 2 - Check Storage Node"; an example partition view is shown in FIG. 5.
  • this partition view indicates, for a partition, the primary data storage node and the data storage nodes 1 and 2 that store the data blocks of a data stripe, and the check storage node that stores the check data; the backup data storage node for the data blocks stored on data storage node 1 and data storage node 2 is the primary data storage node.
  • the distributed block device storage logically slices each logical unit number (LUN) into 1 MB units. For example, a 1 GB LUN is sliced into 1024 fragments of 1 MB each.
  • the client sends a write request carrying the LUN identity (ID), the Small Computer System Interface (SCSI) logical block address (LBA) ID, and the data to be written. The management node where the client is located receives the write request and forms a key from the LUN ID and the LBA ID; the key contains the result of rounding the LBA ID down to 1 MB. A distributed hash table (DHT) hash of the key yields an integer (within 0 to 2^32) that falls in a specific partition, and the management node where the client is located then determines the primary data storage node, data storage node 1, data storage node 2, and the check storage node according to the partition view recorded in the memory 202 (see the sketch below).
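  • a minimal sketch of this routing (assumptions: the key layout and the CRC32 stand-in hash are illustrative; the patent only requires a DHT-style hash into the 0 to 2^32 space):

```python
import zlib

N_PARTITIONS = 3600
MB = 1024 * 1024

def route_write(lun_id, lba, partition_view):
    """Form the key from the LUN ID and the LBA rounded down to 1 MB, hash it
    into the 0..2^32 space, and look up the nodes owning that partition."""
    key = f"{lun_id}:{(lba // MB) * MB}"        # rounding of the LBA to 1 MB
    h = zlib.crc32(key.encode()) & 0xFFFFFFFF   # stand-in for the DHT hash
    partition = h * N_PARTITIONS // 2**32       # equal hash ranges -> partition
    # in EC mode each entry holds (primary node, data node 1, data node 2, check node)
    return partition_view[partition]
```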
  • the management node sends data block 1, data block 2, data block 3, and check block 1 of the EC data stripe to the primary data storage node, data storage node 1, data storage node 2, and the check storage node, respectively.
  • the primary data storage node stores data block 1, data storage node 1 stores data block 2, data storage node 2 stores data block 3, and the check storage node stores check block 1.
  • data storage nodes 1 and 2 each determine the primary data storage node according to the partition view; data storage node 1 backs up data block 2 to the primary data storage node, data storage node 2 backs up data block 3 to the primary data storage node, and the primary data storage node stores data block 2 and data block 3 separately.
  • the primary data storage node allocates fragment 1 to data block 1 from the hard disk it manages and establishes a mapping relationship between the identifier of data block 1 and fragment 1;
  • data storage node 1 allocates fragment 2 to data block 2 from the hard disk it manages and establishes a mapping relationship between the identifier of data block 2 and fragment 2;
  • data storage node 2 allocates fragment 3 to data block 3 from the hard disk it manages and establishes a mapping relationship between the identifier of data block 3 and fragment 3;
  • the check storage node allocates fragment 4 to check block 1 from the hard disk it manages and establishes a mapping relationship between the identifier of check block 1 and fragment 4 (a bookkeeping sketch follows).
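  • the per-node allocation bookkeeping can be pictured as follows (a minimal sketch: the class and method names are assumptions, and a real storage object program persists this table in the hard disk's metadata management area):

```python
class StorageNode:
    """Minimal sketch of a storage object program process managing one disk."""
    def __init__(self, fragment_count):
        self.free = list(range(1, fragment_count + 1))  # free 1 MB fragment numbers
        self.block_to_fragment = {}                     # block identifier -> fragment

    def store(self, block_id, data):
        fragment = self.free.pop(0)                  # allocate a fragment
        self.block_to_fragment[block_id] = fragment  # identifier -> fragment mapping
        # ... write `data` into the fragment on the managed hard disk ...
        return fragment

node = StorageNode(fragment_count=1024)
assert node.store("data-block-1", b"...") == 1  # e.g. data block 1 -> fragment 1
```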
  • the primary data storage node receives data block 2 sent by data storage node 1 and data block 3 sent by data storage node 2, allocates fragment 5 and fragment 6 from the hard disk it manages, and establishes a mapping relationship between the identifier of data block 2 and fragment 5 and between the identifier of data block 3 and fragment 6.
  • this description takes the case in which one storage object program process manages one hard disk as an example; the mapping relationship between the identifier of a data block and a fragment is then a mapping between the identifier of the data block and the physical address of the fragment. When one storage object program process corresponds to multiple hard disks, that is, when the storage node manages multiple hard disks, the mapping relationship comprises a mapping from the identifier of the data block to the hard disk storing the data block together with a hard-disk-to-fragment mapping for the data block.
  • data block 2 is thus stored in fragment 2 and fragment 5, and data block 3 is stored in fragment 3 and fragment 6. The management node establishes and saves the mapping relationship between the identifier of data block 2 and data storage node 1 and the primary data storage node, and establishes and saves the mapping relationship between the identifier of data block 3 and data storage node 2 and the primary data storage node.
  • data storage node 1 saves the mapping relationship between the identifier of data block 2 and data storage node 1 and the primary data storage node; data storage node 2 saves the mapping relationship between the identifier of data block 3 and data storage node 2 and the primary data storage node.
  • when garbage collection is performed on the data stripe, the management node can reclaim the data of the data blocks on the data storage nodes and the primary data storage node according to the mapping relationship between the identifiers of the data blocks in the data stripe and the storage nodes, thereby improving the efficiency of data reclamation.
  • the metadata storage corresponding to the data uses the same EC algorithm as the data storage.
  • an EC-based metadata stripe has the same partition view as the EC-based data stripe described above, as shown in FIG. 6.
  • the distributed storage system includes a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes each store the partition view of the metadata stripe;
  • the partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N except A, and r is each of the natural numbers 1 to M; the distributed storage system executes the flow shown in FIG. 7:
  • Step 701: The management node determines, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r for the metadata stripe; the metadata stripe includes metadata blocks D_A and D_i and check blocks C_r.
  • the management node determining, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r for the metadata stripe specifically includes: the management node determines, according to the write request that generates the metadata in the metadata stripe, the partition corresponding to the metadata stripe; the management node then queries the partition view according to the partition corresponding to the metadata stripe to determine the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r.
  • the management node determines, according to the address carried by the write request, the partition corresponding to the metadata stripe.
  • for details on determining the partition corresponding to the metadata stripe, refer to the scheme by which the distributed block device storage handles the write request sent by the client; the details are not repeated here.
  • Step 702: The management node sends D_i to the data storage node DS_i, sends D_A to the primary data storage node DS_A, and sends C_r to the check storage node CS_r.
  • Step 703: The check storage node CS_r receives and stores C_r.
  • Step 704: The data storage node DS_i receives and stores D_i, and sends D_i to the primary data storage node DS_A according to the partition view of the metadata stripe.
  • Step 705: The primary data storage node DS_A receives and stores D_A and D_i.
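  • steps 701 to 705 can be condensed into the following sketch, reusing the StorageNode sketch above (illustrative assumptions: message passing is modeled as direct method calls, and resolve_view stands in for the partition-view lookup of step 701):

```python
def write_metadata_stripe(stripe, resolve_view):
    """Sketch of steps 701-705 for one metadata stripe.
    stripe = {"D_A": block, "D_i": {i: block}, "C_r": {r: block}}"""
    # Step 701: resolve the nodes for this stripe from the partition view.
    primary, data_nodes, check_nodes = resolve_view(stripe)
    # Steps 702-703: each check block C_r goes to its check storage node CS_r.
    for r, c_block in stripe["C_r"].items():
        check_nodes[r].store(f"C_{r}", c_block)
    # Steps 702 and 704: each data storage node DS_i stores D_i and forwards
    # it to the primary data storage node DS_A named in the partition view.
    for i, d_block in stripe["D_i"].items():
        data_nodes[i].store(f"D_{i}", d_block)
        primary.store(f"D_{i}", d_block)  # Step 705: backup copy lands on DS_A
    # Steps 702 and 705: the primary node stores its own metadata block D_A.
    primary.store("D_A", stripe["D_A"])
```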
  • the check storage node CS_r storing C_r specifically includes: the check storage node CS_r allocates a fragment S_r to C_r and establishes a mapping relationship between the identifier of C_r and the fragment S_r;
  • the data storage node DS_i storing D_i includes: the data storage node DS_i allocates a fragment SD_i to D_i and establishes a mapping relationship between the identifier of D_i and the fragment SD_i;
  • the primary data storage node DS_A storing D_A and D_i includes: the primary data storage node DS_A allocates a fragment SD_A to D_A and establishes a mapping relationship between the identifier of D_A and the fragment SD_A, and allocates fragments SD_i to the D_i and establishes mapping relationships between the identifiers of the D_i and the fragments SD_i.
  • the management node establishes a mapping relationship between the identifier of D_i and the data storage node DS_i and the primary data storage node DS_A. Further, the data storage node DS_i saves the mapping relationship between the identifier of D_i and the data storage node DS_i and the primary data storage node DS_A.
  • the management node can recover the data of the metadata blocks on the data storage nodes and the primary data storage node according to the mapping relationship between the identifiers of the metadata blocks in the metadata stripe and the storage nodes, improving the efficiency of metadata recovery.
  • for example, the metadata blocks of a metadata stripe using the EC algorithm are D_1, D_2, and D_3, and the check block is C_1.
  • the management node where the client is located determines the primary data storage node, data storage node 1, data storage node 2, and the check storage node according to the partition view "Partition - Primary Data Storage Node - Data Storage Node 1 - Data Storage Node 2 - Check Storage Node" recorded in the memory 202.
  • the partition view indicates the primary data storage node and the data storage nodes 1 and 2 that store the metadata blocks of the metadata stripe, and the check storage node that stores the check data; the backup data storage node for the metadata blocks stored on data storage node 1 and data storage node 2 is the primary data storage node.
  • the management node sends D_1, D_2, D_3, and C_1 of the EC-based metadata stripe to the primary data storage node, data storage node 1, data storage node 2, and the check storage node, respectively.
  • the primary data storage node receives and stores D_1, data storage node 1 receives and stores D_2, data storage node 2 receives and stores D_3, and the check storage node receives and stores C_1.
  • data storage nodes 1 and 2 each determine the primary data storage node according to the partition view; data storage node 1 backs up D_2 to the primary data storage node, data storage node 2 backs up D_3 to the primary data storage node, and the primary data storage node receives and stores D_2 and D_3.
  • as shown in FIG. 9, the primary data storage node allocates fragment 7 to D_1 from the hard disk it manages and establishes a mapping relationship between the identifier of D_1 and fragment 7; data storage node 1 allocates fragment 8 to D_2 from the hard disk it manages and establishes a mapping relationship between the identifier of D_2 and fragment 8; data storage node 2 allocates fragment 9 to D_3 from the hard disk it manages and establishes a mapping relationship between the identifier of D_3 and fragment 9; the check storage node allocates fragment 10 to C_1 from the hard disk it manages and establishes a mapping relationship between the identifier of C_1 and fragment 10.
  • the primary data storage node receives D_2 sent by data storage node 1 and D_3 sent by data storage node 2, allocates fragment 11 and fragment 12 from the hard disk it manages, and establishes a mapping relationship between the identifier of D_2 and fragment 11 and a mapping relationship between the identifier of D_3 and fragment 12.
  • this description takes the case in which one storage object program process manages one hard disk as an example; the mapping relationship between the identifier of a metadata block and a fragment is then a mapping between the identifier of the metadata block and the physical address of the fragment. When one storage object program process corresponds to multiple hard disks, that is, when the storage node manages multiple hard disks, the mapping relationship comprises a mapping from the identifier of the metadata block to the hard disk storing the metadata block together with a hard-disk-to-fragment mapping for the metadata block.
  • D_2 is thus stored in fragment 8 and fragment 11, and D_3 is stored in fragment 9 and fragment 12. The management node establishes and saves the mapping relationship between the identifier of D_2 and data storage node 1 and the primary data storage node, and establishes and saves the mapping relationship between the identifier of D_3 and data storage node 2 and the primary data storage node.
  • data storage node 1 saves the mapping relationship between the identifier of D_2 and data storage node 1 and the primary data storage node; data storage node 2 saves the mapping relationship between the identifier of D_3 and data storage node 2 and the primary data storage node.
  • the management node can recover the data of the metadata blocks on the data storage nodes and the primary data storage node according to the mapping relationship between the identifiers of the metadata blocks in the metadata stripe and the storage nodes, improving the efficiency of metadata recovery.
  • in summary, the primary data storage node backs up the other metadata blocks in the metadata stripe. Because a metadata block on a data storage node only needs to be backed up on the primary data storage node, storage space is reduced compared with keeping multiple copies of all metadata blocks as in the prior art; and when the client accesses the metadata, all metadata blocks need only be read from the primary data storage node, which improves the speed of metadata access.
  • the embodiment of the present invention further provides a non-transitory computer-readable storage medium and a computer program product containing computer program instructions; the CPU executes the computer program instructions loaded in the memory to implement the functions corresponding to the management node and the storage nodes (the primary data storage node, the data storage nodes, and the check storage node) in the implementations of the present invention.
  • the fragment in the embodiments of the present invention may be, for example, a physical block in the hard disk.
  • the hard disk in the embodiment of the present invention may be at least one of a mechanical disk and a solid state hard disk as described above.
  • the hard disk corresponding to the process of storing the object program in the embodiment of the present invention may also be a storage array or the like, which is not limited in the embodiment of the present invention.
  • the disclosed apparatus and method may be implemented in other manners.
  • the division of the units described in the device embodiments above is only a division by logical function; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for storing metadata in a distributed storage system is disclosed. In a distributed storage system, in a scenario where a metadata stripe composed by an EC algorithm provides data reliability, a primary data storage node backs up the other metadata blocks of the metadata stripe. Because the metadata blocks on a data storage node only need to be backed up on the primary data storage node, storage space is reduced in comparison with keeping multiple replicas of all metadata blocks, and, when a client accesses metadata, all metadata blocks need only be read from the primary data storage node, which increases the speed of metadata access.
PCT/CN2018/075077 2017-06-28 2018-02-02 Method and system for storing metadata in a distributed storage system, and storage medium WO2019000949A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710508014.8 2017-06-28
CN201710508014.8A CN109144406B (zh) Metadata storage method, system, and storage medium in a distributed storage system

Publications (1)

Publication Number Publication Date
WO2019000949A1 (fr)

Family

ID=64740945

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/075077 WO2019000949A1 (fr) 2017-06-28 2018-02-02 Method and system for storing metadata in a distributed storage system, and storage medium

Country Status (2)

Country Link
CN (2) CN109144406B (fr)
WO (1) WO2019000949A1 (fr)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11734248B2 (en) * 2019-03-04 2023-08-22 Hitachi Vantara Llc Metadata routing in a distributed system
WO2021046693A1 * 2019-09-09 2021-03-18 华为技术有限公司 Data processing method and device in a storage system, and storage system
CN111444274B * 2020-03-26 2021-04-30 上海依图网络科技有限公司 Data synchronization method, data synchronization system, and apparatus, medium, and system thereof
CN111638995B * 2020-05-08 2024-09-20 杭州海康威视系统技术有限公司 Metadata backup method, apparatus, and device, and storage medium
CN116490847A * 2020-11-05 2023-07-25 阿里巴巴集团控股有限公司 Virtual data replication supporting garbage collection in distributed file systems
CN112947864B * 2021-03-29 2024-03-08 南方电网数字平台科技(广东)有限公司 Metadata storage method, apparatus, device, and storage medium
CN115904794A * 2021-08-18 2023-04-04 华为技术有限公司 Data processing method and apparatus
CN115268801B * 2022-09-30 2023-01-10 天津卓朗昆仑云软件技术有限公司 Backup system and method for block devices

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411637A * 2011-12-30 2012-04-11 创新科软件技术(深圳)有限公司 Metadata management method for a distributed file system
CN103699494A * 2013-12-06 2014-04-02 北京奇虎科技有限公司 Data storage method, data storage device, and distributed storage system
US20140310489A1 * 2013-04-16 2014-10-16 International Business Machines Corporation Managing metadata and data for a logical volume in a distributed and declustered system
CN106599308A * 2016-12-29 2017-04-26 郭晓凤 Distributed metadata management method and system

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7051155B2 (en) * 2002-08-05 2006-05-23 Sun Microsystems, Inc. Method and system for striping data to accommodate integrity metadata
CN103399823B * 2011-12-31 2016-03-30 华为数字技术(成都)有限公司 Method, device, and system for storing service data
US8914668B2 * 2012-09-06 2014-12-16 International Business Machines Corporation Asynchronous raid stripe writes to enable response to media errors
CN102937964B * 2012-09-28 2015-02-11 无锡江南计算技术研究所 Intelligent data service method based on a distributed system
US9529675B2 * 2013-07-26 2016-12-27 Huawei Technologies Co., Ltd. Data recovery method, data recovery device and distributed storage system
CN103729436A * 2013-12-27 2014-04-16 中国科学院信息工程研究所 Distributed metadata management method and system
US9772787B2 * 2014-03-31 2017-09-26 Amazon Technologies, Inc. File storage using variable stripe sizes
EP3152648B1 * 2014-06-04 2021-08-04 Pure Storage, Inc. Automatic reconfiguration of a storage memory topology
CN106662983B * 2015-12-31 2019-04-12 华为技术有限公司 Method, apparatus, and system for data reconstruction in a distributed storage system
CN106294772B * 2016-08-11 2019-03-19 电子科技大学 Cache management method for a distributed in-memory columnar database

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411637A * 2011-12-30 2012-04-11 创新科软件技术(深圳)有限公司 Metadata management method for a distributed file system
US20140310489A1 * 2013-04-16 2014-10-16 International Business Machines Corporation Managing metadata and data for a logical volume in a distributed and declustered system
CN103699494A * 2013-12-06 2014-04-02 北京奇虎科技有限公司 Data storage method, data storage device, and distributed storage system
CN106599308A * 2016-12-29 2017-04-26 郭晓凤 Distributed metadata management method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG, BO.: "Research on the metadata management of multinamenodes based on HDFS", CHINA MASTER'S THESES FULL-TEXT DATABASE (ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE), vol. 2014, no. 5, 15 May 2014 (2014-05-15), ISSN: 1674-0246 *

Also Published As

Publication number Publication date
CN111949210A (zh) 2020-11-17
CN109144406B (zh) 2020-08-07
CN109144406A (zh) 2019-01-04

Similar Documents

Publication Publication Date Title
US11379142B2 (en) Snapshot-enabled storage system implementing algorithm for efficient reclamation of snapshot storage space
US11853780B2 (en) Architecture for managing I/O and storage for a virtualization environment
US11386042B2 (en) Snapshot-enabled storage system implementing algorithm for efficient reading of data from stored snapshots
WO2019000949A1 (fr) Procédé et système de stockage de métadonées dans un système de stockage distribué, et support de stockage
US11243706B2 (en) Fragment management method and fragment management apparatus
US10169365B2 (en) Multiple deduplication domains in network storage system
US10374792B1 (en) Layout-independent cryptographic stamp of a distributed dataset
CN102255962B A distributed storage method, apparatus, and system
US11061594B1 (en) Enhanced data encryption in distributed datastores using a cluster-wide fixed random tweak
US8868877B2 (en) Creating encrypted storage volumes based on thin-provisioning mode information
JP2018532166A (ja) 記憶システムにおける重複排除のための方法、記憶システムおよびコントローラ
US11199990B2 (en) Data reduction reporting in storage systems
US8566541B2 (en) Storage system storing electronic modules applied to electronic objects common to several computers, and storage control method for the same
US20190114076A1 (en) Method and Apparatus for Storing Data in Distributed Block Storage System, and Computer Readable Storage Medium
US11573711B2 (en) Enhanced data encryption in distributed datastores using random tweaks stored in data blocks
WO2020134143A1 Method for reconstructing a stripe in a storage system, and stripe server
US11775194B2 (en) Data storage method and apparatus in distributed storage system, and computer program product
US20210311654A1 (en) Distributed Storage System and Computer Program Product
US11144445B1 (en) Use of compression domains that are more granular than storage allocation units
CN107145305B Method for using distributed physical disks, and virtual machine

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18825500

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18825500

Country of ref document: EP

Kind code of ref document: A1