CN109144406A - Metadata storage method, system, and storage medium in a distributed storage system

Info

Publication number
CN109144406A
Authority
CN
China
Prior art keywords
node
metadata
stripe
storage node
fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710508014.8A
Other languages
Chinese (zh)
Other versions
CN109144406B (en)
Inventor
饶蓉
魏明昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201710508014.8A (CN109144406B)
Priority to CN202010648620.1A (CN111949210A)
Priority to PCT/CN2018/075077 (WO2019000949A1)
Publication of CN109144406A
Application granted
Publication of CN109144406B
Active
Anticipated expiration

Classifications

    • G06F3/0611: Improving I/O performance in relation to response time
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0608: Saving storage space on storage systems
    • G06F3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G06F11/1464: Management of the backup or restore process for networked environments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A metadata storage method in a distributed storage system is disclosed. In a scenario where the distributed storage system protects data reliability with metadata stripes built by an erasure coding (EC) algorithm, the primary data storage node backs up the other metadata blocks of the metadata stripe. Because only the metadata blocks held on the data storage nodes need to be backed up on the primary data storage node, storage space is reduced compared with the prior-art approach of keeping multiple copies of all metadata blocks. In addition, when a client accesses metadata, it only needs to access all the metadata blocks from the primary data storage node, which improves metadata access speed.

Description

Metadata storage method, system, and storage medium in a distributed storage system
Technical field
The present invention relates to the field of data storage technologies, and in particular to a metadata storage method, system, and storage medium in a distributed storage system.
Background
In a distributed storage system, after the management node stores user data on the storage nodes, metadata is generated that records the logical address, physical address, and so on of the data, and this metadata must also be stored on the storage nodes. A common way to store metadata is to scatter the blocks of a metadata stripe across the storage nodes. To read the metadata, the blocks of the metadata stripe must be read from each storage node and assembled into the metadata stripe, so a large volume of data is forwarded between storage nodes, which hurts performance. Another approach stores the metadata on the storage nodes as multiple copies, but this increases storage space overhead.
Summary of the invention
In a first aspect, an embodiment of the present invention provides a metadata storage scheme in a distributed storage system. The distributed storage system comprises a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes all store the partition view of a metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and parity storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N other than A, and r is each of the natural numbers 1 to M. In the storage scheme: the management node determines, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r for the metadata stripe; the metadata stripe includes metadata blocks D_A and D_i and parity blocks C_r; the management node sends D_i to the data storage node DS_i, sends D_A to the primary data storage node DS_A, and sends C_r to the parity storage node CS_r; the parity storage node CS_r receives and stores C_r; the data storage node DS_i receives and stores D_i, and sends D_i to the primary data storage node DS_A according to the partition view of the metadata stripe; the primary data storage node DS_A receives and stores D_A and D_i. In this scheme, with the metadata protected by an erasure coding (EC) mechanism, the primary data storage node DS_A backs up the other metadata blocks D_i of the metadata stripe. Because only the metadata blocks D_i on the data storage nodes DS_i need to be backed up on the primary data storage node DS_A, and no copies of the parity blocks are needed, storage space is reduced compared with the prior-art approach of keeping multiple copies of all metadata blocks. In addition, when a client accesses metadata, it can access all the metadata blocks from the primary data storage node DS_A, which improves metadata access speed. The distributed storage system in this scheme may be a distributed file system, a distributed object storage system, or distributed block device storage.
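The space claim above can be checked with simple block counting. The following is a minimal sketch, assuming Python; the function name `stripe_block_counts` and the three-copy baseline (`copies=3`) are illustrative assumptions and do not come from the patent text.

```python
def stripe_block_counts(n, m, copies=3):
    """Blocks stored per metadata stripe under three layouts.

    n, m  : data and parity block counts (N >= 2, M >= 1 in the text).
    copies: replica count of the multi-copy baseline (an assumption).
    """
    if n < 2 or m < 1:
        raise ValueError("the text requires N >= 2 and M >= 1")
    return {
        # One copy of every data block and every parity block.
        "ec_only": n + m,
        # EC stripe plus the N-1 non-primary metadata blocks backed up
        # on the primary data storage node; parity blocks are not copied.
        "ec_with_primary_backup": n + m + (n - 1),
        # Every metadata block stored `copies` times, with no parity.
        "multi_copy": copies * n,
    }


counts = stripe_block_counts(n=3, m=1)
print(counts)
```

For the 3+1 case this gives 6 stored blocks for the scheme, against 4 for a plain EC stripe and 9 for the assumed three-copy baseline, while all 3 metadata blocks remain readable from a single node.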
Optionally, the management node determining the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r for the metadata stripe according to the partition view of the metadata stripe specifically includes: the management node determines the partition corresponding to the metadata stripe according to the write request that generates the metadata in the metadata stripe; and the management node queries the partition view of the metadata stripe according to the partition corresponding to the metadata stripe to determine the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r.
Optionally, the management node determines the partition corresponding to the metadata stripe according to the address carried in the write request.
Optionally, the parity storage node CS_r storing C_r specifically includes: the parity storage node CS_r allocates a fragment S_r for C_r and establishes a mapping between the identifier of C_r and the fragment S_r. The data storage node DS_i storing D_i specifically includes: the data storage node DS_i allocates a fragment SD_i for D_i and establishes a mapping between the identifier of D_i and the fragment SD_i. The primary data storage node DS_A storing D_A and D_i specifically includes: the primary data storage node DS_A allocates a fragment SD_A for D_A and establishes a mapping between the identifier of D_A and the fragment SD_A, and allocates a fragment SD_i for D_i and establishes a mapping between the identifier of D_i and the fragment SD_i.
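The per-node bookkeeping described above (allocate a fragment, then map the block identifier to it) can be sketched as follows. This is a minimal illustration in Python; the class `StorageNode`, its fields, and the first-free allocation policy are assumptions, since the text does not specify how a fragment is chosen.

```python
class StorageNode:
    """A node that places each received block in a free fragment."""

    def __init__(self, name, fragment_count):
        self.name = name
        self.free_fragments = list(range(fragment_count))  # free fragment indices
        self.block_to_fragment = {}  # block identifier -> fragment index
        self.fragment_data = {}      # fragment index -> stored bytes

    def store(self, block_id, payload):
        """Allocate a fragment for the block and record the mapping."""
        if block_id in self.block_to_fragment:
            raise KeyError(f"{block_id} is already stored on {self.name}")
        if not self.free_fragments:
            raise RuntimeError(f"{self.name} has no free fragment")
        fragment = self.free_fragments.pop(0)  # assumed policy: first free fragment
        self.block_to_fragment[block_id] = fragment
        self.fragment_data[fragment] = payload
        return fragment

    def load(self, block_id):
        """Read a block back through the identifier-to-fragment mapping."""
        return self.fragment_data[self.block_to_fragment[block_id]]


primary = StorageNode("DS_A", fragment_count=4)
data_node = StorageNode("DS_i", fragment_count=4)

data_node.store("D_i", b"metadata block i")   # fragment for D_i on DS_i
primary.store("D_A", b"metadata block A")     # fragment for D_A on DS_A
primary.store("D_i", data_node.load("D_i"))   # backup copy of D_i on DS_A
```

Each node keeps its own mapping, so the backup copy of D_i on DS_A occupies a fragment of DS_A's disk that is independent of the fragment used on DS_i.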
Further, the management node establishes a mapping between the identifier of D_i and both the data storage node DS_i and the primary data storage node DS_A. When garbage collection is performed on the metadata stripe, the management node can use the mapping between the identifiers of the metadata blocks in the metadata stripe and the storage nodes to reclaim the data of a metadata block on both the data storage node and the primary data storage node, which improves the efficiency of metadata reclamation.
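A minimal sketch of how such a mapping could drive garbage collection, assuming Python. The class `ManagementNode`, the methods `record` and `reclaim`, and the representation of each storage node as a plain dictionary are hypothetical simplifications, not the patent's implementation.

```python
class ManagementNode:
    """Tracks which storage nodes hold a copy of each block."""

    def __init__(self, nodes):
        self.nodes = nodes          # node name -> {block identifier: payload}
        self.block_locations = {}   # block identifier -> list of node names

    def record(self, block_id, node_names):
        """Save the mapping from a block identifier to its storage nodes."""
        self.block_locations[block_id] = list(node_names)

    def reclaim(self, block_id):
        """Delete every copy of a block by following the recorded mapping."""
        reclaimed = []
        for name in self.block_locations.pop(block_id, []):
            if self.nodes[name].pop(block_id, None) is not None:
                reclaimed.append(name)
        return reclaimed


nodes = {
    "DS_A": {"D_1": b"a", "D_2": b"b"},  # primary: own block plus backup of D_2
    "DS_2": {"D_2": b"b"},               # data storage node holding D_2
}
mgmt = ManagementNode(nodes)
mgmt.record("D_2", ["DS_2", "DS_A"])
reclaimed = mgmt.reclaim("D_2")
```

Because the locations are looked up directly, both copies of D_2 are removed without scanning the storage nodes.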
In a second aspect, an embodiment of the present invention correspondingly provides a distributed storage system. The distributed storage system comprises a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes all store the partition view of a metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and parity storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N other than A, and r is each of the natural numbers 1 to M. The distributed storage system is configured to implement the various implementations of the first aspect.
Correspondingly, the present invention further provides a non-volatile computer-readable storage medium and a computer program product. When the memory of the storage device provided in the embodiments of the present invention loads the computer program instructions contained in the non-volatile computer-readable storage medium and the computer program product, the computer program instructions can run in a distributed storage system. The distributed storage system comprises a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes all store the partition view of a metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and parity storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N other than A, and r is each of the natural numbers 1 to M. When one or more computers execute the computer program instructions, they respectively act as the management node, the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r in the distributed storage system, to implement the various implementations of the first aspect.
The metadata storage scheme in the distributed storage system disclosed in the first aspect can also be applied to storing the data to which the metadata corresponds. Correspondingly, the distributed storage system of the second aspect and the non-volatile computer-readable storage medium and computer program product of the third aspect are equally applicable to data storage.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below.
Fig. 1 is a schematic diagram of a distributed block device storage architecture according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a server in distributed block device storage according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the relationship between a data stripe and a partition view according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a data stripe according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a partition view according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the relationship between a metadata stripe and a partition view according to an embodiment of the present invention;
Fig. 7 is a flowchart of metadata storage according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of a metadata stripe according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of metadata storage according to an embodiment of the present invention.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention are described clearly below with reference to the accompanying drawings in the embodiments of the present invention.
Distributed storage systems mainly take several forms, such as distributed file system storage, distributed object storage, and distributed block device storage, for example a series of Huawei products. The embodiments of the present invention are described using distributed block device storage as an example. As shown by way of example in Fig. 1, distributed block device storage includes multiple servers, namely server 1, server 2, server 3, server 4, server 5, and server 6, which communicate with each other. In practice, the number of servers in distributed block device storage can be increased as needed; the embodiments of the present invention do not limit this. Each server in distributed block device storage has the structure shown in Fig. 2.
As shown in Fig. 2, each server in distributed block device storage includes a central processing unit (Central Processing Unit, CPU) 201, memory 202, hard disk 1, hard disk 2, and hard disk 3. Computer instructions are stored in memory 202, and CPU 201 executes the program instructions in memory 202 to perform the corresponding operations. A hard disk may be at least one of a mechanical hard disk and a solid-state drive. In addition, to save the computing resources of CPU 201, a field programmable gate array (Field Programmable Gate Array, FPGA) or other hardware may perform the above operations in place of CPU 201, or the FPGA or other hardware may perform them jointly with CPU 201. For ease of description, the embodiments of the present invention refer collectively to a processor that implements the above operations.
In the structure shown in Fig. 2, when an application program is loaded in memory 202 and CPU 201 executes the application program instructions in memory 202, the server acts as a client. The application program may be a virtual machine (Virtual Machine, VM) or a specific application such as office software. The client writes data to distributed block device storage or reads data from it. When a storage management program is loaded in memory 202 and CPU 201 executes the storage management program instructions, which act as a virtual block storage management program, the server acts as a management node. The management node is responsible for managing volume metadata, provides a block protocol access interface to the client, and provides a distributed storage access point service, so that the client can access the storage resources of distributed block device storage through the management node. When a storage object program is loaded in memory 202 and CPU 201 executes the storage object program instructions in memory 202, the server acts as a storage node that performs the actual input/output (Input/Output, I/O) operations. Multiple storage object program processes can run on each server. For example, by default one hard disk corresponds to one storage object program process, each storage object program process manages one hard disk, and the server runs each storage object program process as one storage node. In a specific implementation, one storage object program process running on a server may instead correspond to all the hard disks of that server. The embodiments of the present invention are described using the example of one storage object program process managing one hard disk. When distributed block device storage is initialized, each storage object program process divides its hard disk into fragments in units of 1 MB and records the allocation information of each 1 MB fragment in the metadata management area of the hard disk; the fragments of the hard disks form a storage resource pool. The storage management program communicates point-to-point with the processes of all storage object programs in the resource pools it can access; that is, the management node communicates with all storage nodes of the resource pools it can access, so that the management node can access all hard disks of a resource pool concurrently.
When distributed block device storage is initialized, the hash space (for example, 0 to 2^32) is divided into N equal parts. Each part is one partition (Partition), and the N parts are distributed evenly across the hard disks. For example, N defaults to 3600 in distributed block device storage, so the partitions are P1, P2, P3, ..., P3600. As shown in Fig. 3, assuming that distributed block device storage currently has 18 hard disks (storage nodes), each storage node carries 200 partitions. The correspondence between partitions and storage nodes, that is, the partition view, is assigned when distributed block device storage is initialized and is adjusted later as the number of hard disks in distributed block device storage changes. A server in distributed block device storage can keep the partition view in memory 202, and the management node uses the partition view for fast routing. Each storage node also keeps all partition views of the distributed block device storage system, that is, the correspondence between each partition and the storage nodes. In addition, depending on the reliability requirements of distributed block device storage, an erasure coding (Erasure Coding, EC) algorithm can be used to improve data reliability. For example, in 3+1 mode, 3 data blocks and 1 parity block form a data stripe, as shown in Fig. 4. The partition view is then "partition - primary data storage node - data storage node 1 - data storage node 2 - parity storage node"; an example partition view is shown in Fig. 5. The partition view indicates, for a partition, the corresponding primary data storage node, data storage node 1 and data storage node 2 that store the other data blocks of the data stripe, and the parity storage node that stores the parity data. The backup data storage node for the data blocks stored on data storage node 1 and data storage node 2 is the primary data storage node.
Distributed block device storage can logically slice each logical unit number (Logical Unit Number, LUN) into 1 MB slices; for example, a 1 GB LUN is cut into 1024 slices of 1 MB. As shown in Fig. 3, when the client sends a write request to a LUN through the management node, the Small Computer System Interface (Small Computer System Interface, SCSI) command carries a LUN identifier (Identifier, ID), a logical block address (Logical Block Address, LBA) ID, and the data to be written. The management node where the client resides receives the write request and forms a key from the LUN ID and the LBA ID; the key may include the result of rounding the LBA ID to 1 MB. A distributed hash table (Distributed Hash Table, DHT) hash of the key yields an integer in the range 0 to 2^32, which falls into a specific partition. The management node where the client resides determines the primary data storage node, data storage node 1, data storage node 2, and the parity storage node according to the partition view recorded in memory 202, and sends data block 1, data block 2, data block 3, and parity block 1 of the EC data stripe to the primary data storage node, data storage node 1, data storage node 2, and the parity storage node, respectively. The primary data storage node stores data block 1, data storage node 1 stores data block 2, data storage node 2 stores data block 3, and the parity storage node stores parity block 1. Data storage nodes 1 and 2 each determine the primary data storage node according to the partition view; data storage node 1 backs up data block 2 to the primary data storage node, data storage node 2 backs up data block 3 to the primary data storage node, and the primary data storage node stores data block 2 and data block 3. In a specific implementation, the primary data storage node allocates fragment 1 for data block 1 from the hard disk it manages and establishes a mapping between the identifier of data block 1 and fragment 1; data storage node 1 allocates fragment 2 for data block 2 from the hard disk it manages and establishes a mapping between the identifier of data block 2 and fragment 2; data storage node 2 allocates fragment 3 for data block 3 from the hard disk it manages and establishes a mapping between the identifier of data block 3 and fragment 3; and the parity storage node allocates fragment 4 for parity block 1 from the hard disk it manages and establishes a mapping between the identifier of parity block 1 and fragment 4. The primary data storage node receives data block 2 sent by data storage node 1 and data block 3 sent by data storage node 2, allocates fragment 5 and fragment 6 from the hard disk it manages, and establishes a mapping between the identifier of data block 2 and fragment 5 and a mapping between the identifier of data block 3 and fragment 6. Regarding the mapping between the identifier of a data block and a fragment in the embodiments of the present invention: when one storage object program process corresponds to one hard disk, that is, the storage node is the hard disk itself, the mapping between the identifier of a data block and a fragment is a mapping between the identifier of the data block and the physical address of the fragment; when one storage object program process corresponds to multiple hard disks, that is, the storage node manages multiple hard disks, the mapping includes a mapping from the identifier of the data block to the hard disk that stores the data block, and a mapping from that hard disk to the fragment. Further, data block 2 is stored in both fragment 2 and fragment 5, and data block 3 is stored in both fragment 3 and fragment 6. The management node establishes and saves a mapping between the identifier of data block 2 and both data storage node 1 and the primary data storage node, and a mapping between the identifier of data block 3 and both data storage node 2 and the primary data storage node. Further, data storage node 1 saves the mapping between the identifier of data block 2 and both data storage node 1 and the primary data storage node, and data storage node 2 saves the mapping between the identifier of data block 3 and both data storage node 2 and the primary data storage node. When garbage collection is performed on a data stripe, the management node can use the mapping between the identifiers of the data blocks in the data stripe and the storage nodes to reclaim the data of a data block on both the data storage node and the primary data storage node, which improves the efficiency of data reclamation.
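The routing step at the start of this write path (forming a key from the LUN ID and LBA ID, hashing it, and locating the partition) can be sketched as follows. This is a minimal illustration in Python; the key format, the 512-byte logical block size, and the use of CRC-32 as the hash are assumptions, since the text does not name the DHT hash function or the LBA unit.

```python
import zlib

PARTITION_COUNT = 3600   # default partition count given in the text
HASH_SPACE = 1 << 32     # hash values lie in 0 .. 2^32 - 1
SLICE_SIZE = 1 << 20     # LUNs are sliced in 1 MB units
LBA_SIZE = 512           # assumed logical block size in bytes


def make_key(lun_id, lba):
    """Build the routing key from the LUN ID and the LBA rounded to 1 MB."""
    slice_index = (lba * LBA_SIZE) // SLICE_SIZE
    return f"{lun_id}:{slice_index}"  # hypothetical key format


def partition_of(key):
    """Hash the key into the 32-bit space and find its partition (1-based)."""
    # CRC-32 stands in for the unspecified DHT hash function.
    hash_value = zlib.crc32(key.encode("utf-8"))
    # The hash space is split into PARTITION_COUNT equal ranges.
    return hash_value * PARTITION_COUNT // HASH_SPACE + 1


key_a = make_key(lun_id=7, lba=0)
key_b = make_key(lun_id=7, lba=2047)   # last block of the same 1 MB slice
key_c = make_key(lun_id=7, lba=2048)   # first block of the next slice
partition_a = partition_of(key_a)
partition_b = partition_of(key_b)
```

Every address inside the same 1 MB slice produces the same key and therefore the same partition, so the whole slice is handled by one set of storage nodes taken from the partition view.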
In the embodiments of the present invention, when a client sends a write request to write data to distributed block device storage, metadata is generated to record the logical address, physical address, and so on of the data. The metadata corresponding to the data is stored using the same EC algorithm as the data. A metadata stripe built by the EC algorithm has the same partition view as the data stripe built by the EC algorithm described above, as shown in Fig. 6.
Metadata is stored in a distributed storage system, where the distributed storage system comprises a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes all store the partition view of a metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and parity storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N other than A, and r is each of the natural numbers 1 to M. When storing metadata, the distributed storage system performs the process shown in Fig. 7:
Step 701: The management node determines, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r for the metadata stripe. The metadata stripe includes metadata blocks D_A and D_i and parity blocks C_r.
Specifically, the management node determining the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r for the metadata stripe according to the partition view of the metadata stripe includes: the management node determines the partition corresponding to the metadata stripe according to the write request that generates the metadata in the metadata stripe; and the management node queries the partition view of the metadata stripe according to the partition corresponding to the metadata stripe to determine the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r.
Specifically, the management node determines the partition corresponding to the metadata stripe according to the address carried in the write request. For details, refer to the scheme that distributed block device storage uses when storing a write request sent by a client; the details are not repeated here.
Step 702: The management node sends D_i to the data storage node DS_i, sends D_A to the primary data storage node DS_A, and sends C_r to the parity storage node CS_r.
Step 703: The parity storage node CS_r receives and stores C_r.
Step 704: The data storage node DS_i receives and stores D_i, and sends D_i to the primary data storage node DS_A according to the partition view of the metadata stripe.
Step 705: The primary data storage node DS_A receives and stores D_A and D_i.
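Steps 702 to 705 can be simulated in a few lines. The following is a minimal sketch in Python in which the partition view from step 701 is passed in ready-made; the classes `Node` and `PartitionView`, the functions `store_metadata_stripe` and `read_metadata`, and the in-memory dictionaries that stand in for network transfers and disks are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    blocks: dict = field(default_factory=dict)  # block identifier -> payload

    def store(self, block_id, payload):
        self.blocks[block_id] = payload


@dataclass
class PartitionView:
    primary: str   # name of DS_A
    data: list     # names of the DS_i
    parity: list   # names of the CS_r


def store_metadata_stripe(view, nodes, data_blocks, parity_blocks):
    """Distribute one stripe; data_blocks[0] is D_A, the rest are the D_i."""
    (primary_id, primary_payload), *others = data_blocks
    if len(others) != len(view.data) or len(parity_blocks) != len(view.parity):
        raise ValueError("stripe shape does not match the partition view")
    primary = nodes[view.primary]
    # Steps 702 and 705: D_A is sent to DS_A and stored there.
    primary.store(primary_id, primary_payload)
    for node_name, (block_id, payload) in zip(view.data, others):
        # Steps 702 and 704: D_i is sent to DS_i and stored there.
        nodes[node_name].store(block_id, payload)
        # Steps 704 and 705: DS_i forwards D_i to DS_A, which stores it too.
        primary.store(block_id, nodes[node_name].blocks[block_id])
    for node_name, (block_id, payload) in zip(view.parity, parity_blocks):
        # Steps 702 and 703: C_r is sent to CS_r and stored there.
        nodes[node_name].store(block_id, payload)


def read_metadata(view, nodes, block_ids):
    """Read all metadata blocks of the stripe from the primary node only."""
    return [nodes[view.primary].blocks[b] for b in block_ids]


view = PartitionView(primary="primary", data=["data1", "data2"], parity=["parity"])
nodes = {name: Node(name) for name in ["primary", "data1", "data2", "parity"]}
store_metadata_stripe(
    view, nodes,
    data_blocks=[("D1", b"m1"), ("D2", b"m2"), ("D3", b"m3")],
    parity_blocks=[("C1", b"p1")],
)
metadata = read_metadata(view, nodes, ["D1", "D2", "D3"])
```

After the call, the primary node holds D1, D2, and D3, each data storage node holds only its own block, and the parity block exists only on the parity storage node, so a metadata read touches a single node.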
Specifically, the parity storage node CS_r storing C_r includes: the parity storage node CS_r allocates a fragment S_r for C_r and establishes a mapping between the identifier of C_r and the fragment S_r. The data storage node DS_i storing D_i includes: the data storage node DS_i allocates a fragment SD_i for D_i and establishes a mapping between the identifier of D_i and the fragment SD_i. The primary data storage node DS_A storing D_A and D_i includes: the primary data storage node DS_A allocates a fragment SD_A for D_A and establishes a mapping between the identifier of D_A and the fragment SD_A, and allocates a fragment SD_i for D_i and establishes a mapping between the identifier of D_i and the fragment SD_i. Further, the management node establishes a mapping between the identifier of D_i and both the data storage node DS_i and the primary data storage node DS_A. Further, the data storage node DS_i saves the mapping between the identifier of D_i and both the data storage node DS_i and the primary data storage node DS_A. When garbage collection is performed on the metadata stripe, the management node can use the mapping between the identifiers of the metadata blocks in the metadata stripe and the storage nodes to reclaim the data of a metadata block on both the data storage node and the primary data storage node, which improves the efficiency of metadata reclamation.
In the embodiment of the present invention, in conjunction with the distributed block device storage and the data storage method described above, as shown in Fig. 8, the metadata blocks in a metadata slitting formed using the EC algorithm are D1, D2 and D3, and the check block is C1. The management node where the client resides determines the primary data store node, data memory node 1, data memory node 2 and the verification memory node according to the subregion view "subregion - primary data store node - data memory node 1 - data memory node 2 - verification memory node" recorded in memory 202. The subregion view indicates that the subregion corresponds to the primary data store node and the other data memory nodes that store metadata blocks of the metadata slitting (data memory node 1 and data memory node 2), as well as the verification memory node that stores the check data; the backup data memory node for the metadata blocks stored on data memory node 1 and data memory node 2 is the primary data store node. The management node sends D1, D2, D3 and C1 in the EC-based metadata slitting to the primary data store node, data memory node 1, data memory node 2 and the verification memory node, respectively. The primary data store node receives and stores D1, data memory node 1 receives and stores D2, data memory node 2 receives and stores D3, and the verification memory node receives and stores C1. Data memory nodes 1 and 2 each determine the primary data store node according to the subregion view; data memory node 1 backs up D2 to the primary data store node, data memory node 2 backs up D3 to the primary data store node, and the primary data store node receives and stores D2 and D3.
In a specific implementation, as shown in Fig. 9, the primary data store node allocates fragment 7 for D1 from the hard disk it manages and establishes a mapping relation between the identifier of D1 and fragment 7; data memory node 1 allocates fragment 8 for D2 from the hard disk it manages and establishes a mapping relation between the identifier of D2 and fragment 8; data memory node 2 allocates fragment 9 for D3 from the hard disk it manages and establishes a mapping relation between the identifier of D3 and fragment 9; the verification memory node allocates fragment 10 for C1 from the hard disk it manages and establishes a mapping relation between the identifier of C1 and fragment 10. The primary data store node receives D2 sent by data memory node 1 and D3 sent by data memory node 2, allocates fragment 11 and fragment 12 from the hard disk it manages, and establishes a mapping relation between the identifier of D2 and fragment 11 and a mapping relation between the identifier of D3 and fragment 12. In the embodiment of the present invention, taking the mapping relation between the identifier of a metadata block and a fragment as an example: when one process of the storage object program corresponds to one hard disk, that is, the memory node is the hard disk itself, the mapping relation between the identifier of the metadata block and the fragment is the mapping relation between the identifier of the metadata block and the physical address of the fragment; when one process of the storage object program corresponds to multiple hard disks, that is, the memory node manages multiple hard disks, the mapping relation between the identifier of the metadata block and the fragment includes the mapping from the identifier of the metadata block to the hard disk that stores the metadata block, and the mapping from that hard disk to the fragment.
Further, D2 is stored in fragment 8 and fragment 11, and D3 is stored in fragment 9 and fragment 12. The management node establishes and saves the mapping relation between the identifier of D2 and both data memory node 1 and the primary data store node, and establishes and saves the mapping relation between the identifier of D3 and both data memory node 2 and the primary data store node. Further, data memory node 1 saves the mapping relation between the identifier of D2 and both data memory node 1 and the primary data store node, and data memory node 2 saves the mapping relation between the identifier of D3 and both data memory node 2 and the primary data store node. When garbage collection is performed on the metadata slitting, the management node can, according to the mapping relations between the identifiers of the metadata blocks in the metadata slitting and the memory nodes, reclaim the data of the metadata blocks on the data memory nodes and the primary data store node, which improves the efficiency of metadata reclamation.
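The placement in this worked example, and the two ways of resolving an identifier to a fragment, can be written out as plain lookup tables. The fragment numbers are taken from the example; the node keys, the disk name `disk-b` and the function names are hypothetical and serve only as an illustration.

```python
# Block identifier -> fragment on each node, as in the worked example.
PLACEMENT = {
    "primary": {"D1": 7, "D2": 11, "D3": 12},
    "data1": {"D2": 8},
    "data2": {"D3": 9},
    "check": {"C1": 10},
}


def nodes_holding(block_id):
    """Nodes on which a block identifier is mapped to a fragment."""
    return sorted(node for node, mapping in PLACEMENT.items() if block_id in mapping)


def resolve_one_disk(node, block_id):
    """One process per hard disk: the identifier maps straight to a fragment."""
    return PLACEMENT[node][block_id]


# One process managing several hard disks: identifier -> disk -> fragment.
# The disk name below is an assumption made for this sketch.
BLOCK_TO_DISK = {"D2": "disk-b"}
DISK_TO_FRAGMENT = {"disk-b": {"D2": 8}}


def resolve_multi_disk(block_id):
    """Two-level lookup: first the disk holding the block, then its fragment."""
    disk = BLOCK_TO_DISK[block_id]
    return disk, DISK_TO_FRAGMENT[disk][block_id]


print(nodes_holding("D2"))                 # ['data1', 'primary']
print(resolve_one_disk("primary", "D3"))   # 12
print(resolve_multi_disk("D2"))            # ('disk-b', 8)
```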
Therefore, in a scenario where data reliability is achieved through a metadata slitting formed using the EC algorithm, the primary data store node backs up the other metadata blocks in the metadata slitting. Because only the metadata blocks on the data memory nodes need to be backed up on the primary data store node, storage space is reduced compared with the prior art, which keeps multiple copies of all metadata blocks. In addition, when a client accesses the metadata, it only needs to access all the metadata blocks from the primary data store node, which improves metadata access speed.
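The space comparison can be made concrete with a small count of stored blocks. This is only an illustrative calculation: the function names are hypothetical, and the figure of three full copies for the prior-art scheme is an assumption made here, since the text only says "multiple copies".

```python
def blocks_with_primary_backup(n, m):
    """N metadata blocks + M check blocks + (N - 1) backups on the primary node."""
    if n < 2 or m < 1:
        raise ValueError("the scheme assumes N >= 2 and M >= 1")
    return n + m + (n - 1)


def blocks_with_full_copies(n, copies=3):
    """Every metadata block kept `copies` times (copy count is an assumption)."""
    return n * copies


n, m = 3, 1                                # the D1, D2, D3 and C1 example
print(blocks_with_primary_backup(n, m))    # 6
print(blocks_with_full_copies(n))          # 9
```

With a different assumed copy count the gap changes, so the numbers should be read as an example of the trend rather than a general result.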
An embodiment of the present invention further provides a non-volatile computer-readable storage medium and a computer program product. The non-volatile computer-readable storage medium and the computer program product contain computer program instructions; a CPU executes the computer program instructions loaded in memory to implement the functions corresponding to the management node and the memory nodes (primary data store node, data memory node and verification memory node) in the embodiments of the present invention.
The descriptions provided in the embodiments of the present invention are exemplary. "Fragment 1", "fragment 2", "fragment 12" and so on in the embodiments of the present invention are not intended to strictly define an order; they are only used to distinguish different fragments. A fragment in the embodiments of the present invention may be a physical block in a hard disk or the like. A hard disk in the embodiments of the present invention, as mentioned above, may be at least one of a mechanical disk and a solid state disk. The hard disk corresponding to a process of the storage object program in the embodiments of the present invention may also be a storage array or the like, which is not limited by the embodiments of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the division of units in the apparatus embodiments described above is merely a division by logical function; in actual implementation there may be other ways of dividing, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

Claims (10)

1. A metadata storing method in a distributed memory system, characterized in that the distributed memory system includes a management node and (M+N) memory nodes, and the management node and the (M+N) memory nodes store a subregion view of a metadata slitting; the subregion view of the metadata slitting includes a primary data store node DSA, a data memory node DSi and a verification memory node CSr; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N other than A, and r is each of the natural numbers 1 to M; the method includes:
the management node determines, according to the subregion view of the metadata slitting, the primary data store node DSA, the data memory node DSi and the verification memory node CSr for the metadata slitting; the metadata slitting includes metadata blocks DA and Di and a check block Cr;
the management node sends Di to the data memory node DSi, sends DA to the primary data store node DSA, and sends Cr to the verification memory node CSr;
the verification memory node CSr receives and stores Cr;
the data memory node DSi receives and stores Di, and sends Di to the primary data store node DSA according to the subregion view of the metadata slitting;
the primary data store node DSA receives and stores DA and Di.
2. The method according to claim 1, characterized in that the management node determining, according to the subregion view of the metadata slitting, the primary data store node DSA, the data memory node DSi and the verification memory node CSr for the metadata slitting specifically includes:
the management node determines the subregion corresponding to the metadata slitting according to the write request that generates the metadata in the metadata slitting;
the management node queries the subregion view of the metadata slitting according to the subregion corresponding to the metadata slitting, and determines the primary data store node DSA, the data memory node DSi and the verification memory node CSr.
3. The method according to claim 2, characterized in that the management node determines the subregion corresponding to the metadata slitting according to the address carried in the write request.
4. The method according to claim 1, characterized in that the verification memory node CSr storing Cr specifically includes: the verification memory node CSr allocates a fragment Sr for Cr and establishes a mapping relation between the identifier of Cr and the fragment Sr;
the data memory node DSi storing Di specifically includes: the data memory node DSi allocates a fragment SDi for Di and establishes a mapping relation between the identifier of Di and the fragment SDi;
the primary data store node DSA storing DA and Di specifically includes: the primary data store node DSA allocates a fragment SDA for DA and establishes a mapping relation between the identifier of DA and the fragment SDA, and allocates a fragment SDi for Di and establishes a mapping relation between the identifier of Di and the fragment SDi.
5. A distributed memory system, characterized in that the distributed memory system includes a management node and (M+N) memory nodes, and the management node and the (M+N) memory nodes store a subregion view of a metadata slitting; the subregion view of the metadata slitting includes a primary data store node DSA, a data memory node DSi and a verification memory node CSr; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N other than A, and r is each of the natural numbers 1 to M;
the management node is configured to determine, according to the subregion view of the metadata slitting, the primary data store node DSA, the data memory node DSi and the verification memory node CSr for the metadata slitting, the metadata slitting including metadata blocks DA and Di and a check block Cr; and to send Di to the data memory node DSi, send DA to the primary data store node DSA, and send Cr to the verification memory node CSr;
the verification memory node CSr is configured to receive and store Cr;
the data memory node DSi is configured to receive and store Di, and to send Di to the primary data store node DSA according to the subregion view of the metadata slitting;
the primary data store node DSA is configured to receive and store DA and Di.
6. The system according to claim 5, characterized in that the management node is specifically configured to determine the subregion corresponding to the metadata slitting according to the write request that generates the metadata in the metadata slitting, and to query the subregion view of the metadata slitting according to the subregion corresponding to the metadata slitting to determine the primary data store node DSA, the data memory node DSi and the verification memory node CSr.
7. The system according to claim 6, characterized in that the management node is further configured to determine the subregion corresponding to the metadata slitting according to the address carried in the write request.
8. The system according to claim 5, characterized in that the verification memory node CSr is specifically configured to allocate a fragment Sr for Cr and establish a mapping relation between the identifier of Cr and the fragment Sr;
the data memory node DSi is specifically configured to allocate a fragment SDi for Di and establish a mapping relation between the identifier of Di and the fragment SDi;
the primary data store node DSA is specifically configured to allocate a fragment SDA for DA and establish a mapping relation between the identifier of DA and the fragment SDA, and to allocate a fragment SDi for Di and establish a mapping relation between the identifier of Di and the fragment SDi.
9. A non-volatile readable storage medium, characterized in that the non-volatile readable storage medium includes computer program instructions that can run in a distributed memory system; the distributed memory system includes a management node and (M+N) memory nodes, and the management node and the (M+N) memory nodes store a subregion view of a metadata slitting; the subregion view of the metadata slitting includes a primary data store node DSA, a data memory node DSi and a verification memory node CSr; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N other than A, and r is each of the natural numbers 1 to M; when one or more computers execute the computer program instructions, the one or more computers, acting as the management node, are configured to determine, according to the subregion view of the metadata slitting, the primary data store node DSA, the data memory node DSi and the verification memory node CSr for the metadata slitting, the metadata slitting including metadata blocks DA and Di and a check block Cr, and to send Di to the data memory node DSi, send DA to the primary data store node DSA, and send Cr to the verification memory node CSr; the one or more computers, acting as the verification memory node CSr, are configured to receive and store Cr;
the one or more computers, acting as the data memory node DSi, are configured to receive and store Di, and to send Di to the primary data store node DSA according to the subregion view of the metadata slitting;
the one or more computers, acting as the primary data store node DSA, are configured to receive and store DA and Di.
10. The storage medium according to claim 9, characterized by further including computer program instructions that cause the one or more computers, acting as the management node, to determine the subregion corresponding to the metadata slitting according to the write request that generates the metadata in the metadata slitting, and to query the subregion view of the metadata slitting according to the subregion corresponding to the metadata slitting to determine the primary data store node DSA, the data memory node DSi and the verification memory node CSr.
CN201710508014.8A 2017-06-28 2017-06-28 Metadata storage method, system and storage medium in distributed storage system Active CN109144406B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201710508014.8A CN109144406B (en) 2017-06-28 2017-06-28 Metadata storage method, system and storage medium in distributed storage system
CN202010648620.1A CN111949210A (en) 2017-06-28 2017-06-28 Metadata storage method, system and storage medium in distributed storage system
PCT/CN2018/075077 WO2019000949A1 (en) 2017-06-28 2018-02-02 Metadata storage method and system in distributed storage system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710508014.8A CN109144406B (en) 2017-06-28 2017-06-28 Metadata storage method, system and storage medium in distributed storage system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010648620.1A Division CN111949210A (en) 2017-06-28 2017-06-28 Metadata storage method, system and storage medium in distributed storage system

Publications (2)

Publication Number Publication Date
CN109144406A true CN109144406A (en) 2019-01-04
CN109144406B CN109144406B (en) 2020-08-07

Family

ID=64740945

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710508014.8A Active CN109144406B (en) 2017-06-28 2017-06-28 Metadata storage method, system and storage medium in distributed storage system
CN202010648620.1A Pending CN111949210A (en) 2017-06-28 2017-06-28 Metadata storage method, system and storage medium in distributed storage system

Country Status (2)

Country Link
CN (2) CN109144406B (en)
WO (1) WO2019000949A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444274A (en) * 2020-03-26 2020-07-24 上海依图网络科技有限公司 Data synchronization method, data synchronization system, and apparatus, medium, and system thereof
WO2021046693A1 (en) * 2019-09-09 2021-03-18 华为技术有限公司 Data processing method in storage system, device, and storage system
CN112947864A (en) * 2021-03-29 2021-06-11 南方电网数字电网研究院有限公司 Metadata storage method, device, equipment and storage medium
CN113508372A (en) * 2019-03-04 2021-10-15 日立数据管理有限公司 Metadata routing in distributed systems
WO2022094895A1 (en) * 2020-11-05 2022-05-12 Alibaba Group Holding Limited Virtual data copy supporting garbage collection in distributed file systems
CN115268801A (en) * 2022-09-30 2022-11-01 天津卓朗昆仑云软件技术有限公司 Backup system and method for block device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115904794A (en) * 2021-08-18 2023-04-04 华为技术有限公司 Data processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040024963A1 (en) * 2002-08-05 2004-02-05 Nisha Talagala Method and system for striping data to accommodate integrity metadata
CN103399823A (en) * 2011-12-31 2013-11-20 华为数字技术(成都)有限公司 Method, equipment and system for storing service data
US20140068324A1 (en) * 2012-09-06 2014-03-06 International Business Machines Corporation Asynchronous raid stripe writesto enable response to media errors
CN106233264A (en) * 2014-03-31 2016-12-14 亚马逊科技公司 Use the file storage device of variable stripe size
CN106471461A (en) * 2014-06-04 2017-03-01 纯存储公司 Automatically reconfigure storage device memorizer topology
CN106662983A (en) * 2015-12-31 2017-05-10 华为技术有限公司 Method, apparatus and system for data reconstruction in distributed storage system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411637B (en) * 2011-12-30 2013-07-24 创新科软件技术(深圳)有限公司 Metadata management method of distributed file system
CN102937964B (en) * 2012-09-28 2015-02-11 无锡江南计算技术研究所 Intelligent data service method based on distributed system
US9104332B2 (en) * 2013-04-16 2015-08-11 International Business Machines Corporation Managing metadata and data for a logical volume in a distributed and declustered system
CN103699494B (en) * 2013-12-06 2017-03-15 北京奇虎科技有限公司 A kind of date storage method, data storage device and distributed memory system
CN103729436A (en) * 2013-12-27 2014-04-16 中国科学院信息工程研究所 Distributed metadata management method and system
CN106599308B (en) * 2016-12-29 2020-01-31 郭晓凤 distributed metadata management method and system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113508372A (en) * 2019-03-04 2021-10-15 日立数据管理有限公司 Metadata routing in distributed systems
WO2021046693A1 (en) * 2019-09-09 2021-03-18 华为技术有限公司 Data processing method in storage system, device, and storage system
CN113544635A (en) * 2019-09-09 2021-10-22 华为技术有限公司 Data processing method and device in storage system and storage system
CN111444274A (en) * 2020-03-26 2020-07-24 上海依图网络科技有限公司 Data synchronization method, data synchronization system, and apparatus, medium, and system thereof
CN111444274B (en) * 2020-03-26 2021-04-30 上海依图网络科技有限公司 Data synchronization method, data synchronization system, and apparatus, medium, and system thereof
WO2022094895A1 (en) * 2020-11-05 2022-05-12 Alibaba Group Holding Limited Virtual data copy supporting garbage collection in distributed file systems
CN116490847A (en) * 2020-11-05 2023-07-25 阿里巴巴集团控股有限公司 Virtual data replication supporting garbage collection in a distributed file system
CN112947864A (en) * 2021-03-29 2021-06-11 南方电网数字电网研究院有限公司 Metadata storage method, device, equipment and storage medium
CN112947864B (en) * 2021-03-29 2024-03-08 南方电网数字平台科技(广东)有限公司 Metadata storage method, apparatus, device and storage medium
CN115268801A (en) * 2022-09-30 2022-11-01 天津卓朗昆仑云软件技术有限公司 Backup system and method for block device
CN115268801B (en) * 2022-09-30 2023-01-10 天津卓朗昆仑云软件技术有限公司 Backup system and method for block device

Also Published As

Publication number Publication date
CN109144406B (en) 2020-08-07
WO2019000949A1 (en) 2019-01-03
CN111949210A (en) 2020-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant