CN109144406A - Metadata storage method, system, and storage medium in a distributed storage system - Google Patents
- Publication number
- CN109144406A (application CN201710508014.8A)
- Authority
- CN
- China
- Prior art keywords
- node
- metadata
- stripe
- storage node
- fragment
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Disclosed is a metadata storage method in a distributed storage system. When data reliability is achieved by forming metadata stripes with an erasure coding (EC) algorithm, the primary data storage node backs up the other metadata blocks of the metadata stripe. Because only the metadata blocks on the data storage nodes need to be backed up on the primary data storage node, storage space is reduced compared with the prior art, which keeps multiple copies of all metadata blocks. In addition, a client accessing metadata only needs to read all of the metadata blocks from the primary data storage node, which improves metadata access speed.
Description
Technical field
The present invention relates to the field of data storage technologies, and in particular, to a metadata storage method, system, and storage medium in a distributed storage system.
Background
In a distributed storage system, after the management node stores user data on the storage nodes, it generates metadata that records the data's logical address, physical address, and so on, and this metadata must also be stored on the storage nodes. A common approach scatters the blocks of a metadata stripe across the storage nodes; reading the metadata then requires fetching the stripe's blocks from every storage node and reassembling the metadata stripe, which causes heavy data forwarding between storage nodes and degrades performance. Another approach stores the metadata on the storage nodes as multiple copies, which increases storage space overhead.
Summary of the invention
In a first aspect, an embodiment of the present invention provides a metadata storage scheme in a distributed storage system. The distributed storage system includes a management node and (M+N) storage nodes, and the management node and each of the (M+N) storage nodes store a partition view of a metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one natural number from 1 to N, i is each natural number from 1 to N other than A, and r is each natural number from 1 to M.

In the storage scheme, the management node determines, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r for the metadata stripe. The metadata stripe includes metadata blocks D_A and D_i and check blocks C_r. The management node sends D_i to the data storage node DS_i, D_A to the primary data storage node DS_A, and C_r to the check storage node CS_r. The check storage node CS_r receives and stores C_r. The data storage node DS_i receives and stores D_i and, according to the partition view of the metadata stripe, sends D_i to the primary data storage node DS_A. The primary data storage node DS_A receives and stores D_A and D_i.

In this scheme, where metadata is protected by erasure coding (EC), the primary data storage node DS_A backs up the other metadata blocks D_i of the metadata stripe. Because only the metadata blocks D_i on the data storage nodes DS_i need to be backed up on DS_A, and the check blocks need no copies, storage space is reduced compared with the prior art, which keeps multiple copies of all metadata blocks. At the same time, a client accessing metadata can read all of the metadata blocks from DS_A, which improves metadata access speed. The distributed storage system in this scheme may be a distributed file system, a distributed object storage system, or a distributed block storage system.
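The storage saving claimed above can be checked by counting blocks. The sketch below is illustrative only: it counts blocks per metadata stripe under this scheme (each D_i stored on DS_i and backed up on DS_A), under plain EC striping, and under a hypothetical baseline, assumed here, in which every block of the stripe, check blocks included, is kept in two copies. The function name is hypothetical, and actual overhead also depends on block sizes and replica counts that the text does not specify.

```python
def blocks_per_stripe(n: int, m: int) -> dict:
    """Count stored blocks for one stripe of N metadata blocks and M check blocks."""
    if n < 2 or m < 1:
        raise ValueError("the scheme is defined for N >= 2 and M >= 1")
    plain_ec = n + m                 # D_A, each D_i and each C_r stored once
    scheme = plain_ec + (n - 1)      # plus one backup of each D_i (i != A) on DS_A
    two_copy_stripe = 2 * (n + m)    # hypothetical baseline: every block, C_r included, stored twice
    return {"plain_ec": plain_ec, "scheme": scheme, "two_copy_stripe": two_copy_stripe}


if __name__ == "__main__":
    print(blocks_per_stripe(3, 1))   # {'plain_ec': 4, 'scheme': 6, 'two_copy_stripe': 8}
```

For the 3+1 layout used later in the embodiment, the scheme stores 6 blocks against 8 for the two-copy baseline. The difference is the copy of the check block and the second copy of D_A, neither of which the scheme needs.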
Optionally, the management node's determination of DS_A, DS_i, and CS_r for the metadata stripe according to the partition view specifically includes the following: the management node determines the partition corresponding to the metadata stripe according to the write request that generated the metadata in the stripe, and then queries the partition view of the metadata stripe using that partition to determine the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r.
Optionally, the management node determines the partition corresponding to the metadata stripe according to the address carried in the write request.
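A minimal sketch of the lookup in the two optional steps above. The key format, the 512-byte sector size, and the use of `zlib.crc32` as the hash are assumptions for illustration. The text only says that the key contains the LBA rounded to 1 MB and that a DHT hash maps it into a 0 to 2^32 space split into equal partitions (3600 by default in the embodiment). `ViewEntry`, `partition_for`, and `nodes_for` are hypothetical names.

```python
import zlib
from typing import NamedTuple, Tuple

MB = 1 << 20
NUM_PARTITIONS = 3600        # default partition count quoted in the embodiment
SECTOR_BYTES = 512           # assumed LBA granularity (not stated in the text)


class ViewEntry(NamedTuple):
    primary: str             # DS_A
    data: Tuple[str, ...]    # DS_i for every i != A
    check: Tuple[str, ...]   # CS_r for r = 1..M


def partition_for(lun_id: int, lba: int) -> int:
    """Map the address carried by a write request to a partition index."""
    slice_index = (lba * SECTOR_BYTES) // MB          # round the LBA down to its 1 MB slice
    key = f"{lun_id}:{slice_index}".encode()          # hypothetical key format
    h = zlib.crc32(key)                               # stand-in hash, 0 <= h < 2**32
    return (h * NUM_PARTITIONS) >> 32                 # equal ranges of the hash space


def nodes_for(view: dict, lun_id: int, lba: int) -> ViewEntry:
    """Look up DS_A, DS_i and CS_r for the stripe that the write lands in."""
    return view[partition_for(lun_id, lba)]
```

Because the LBA is rounded to its 1 MB slice before hashing, all writes within the same slice of a LUN land in the same partition and therefore on the same set of nodes.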
Optionally, the check storage node CS_r stores C_r as follows: CS_r allocates a fragment S_r for C_r and establishes a mapping between the identifier of C_r and the fragment S_r. The data storage node DS_i stores D_i as follows: DS_i allocates a fragment S_Di for D_i and establishes a mapping between the identifier of D_i and the fragment S_Di. The primary data storage node DS_A stores D_A and D_i as follows: DS_A allocates a fragment S_DA for D_A and establishes a mapping between the identifier of D_A and the fragment S_DA, and it allocates a fragment S_Di for D_i and establishes a mapping between the identifier of D_i and the fragment S_Di.
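The per-node bookkeeping described above, which allocates a fragment and then records identifier -> fragment, can be sketched as follows. `StorageNode` is a hypothetical in-memory class. A real node would write the payload to disk and persist its index, and the first-free allocation policy is an assumption.

```python
class StorageNode:
    """Toy storage node: allocates fragments and maps block identifiers to them."""

    def __init__(self, name: str, num_fragments: int):
        self.name = name
        self.free = list(range(num_fragments))  # indices of unallocated fragments
        self.fragments = {}                     # fragment index -> stored payload
        self.index = {}                         # block identifier -> fragment index

    def store(self, block_id: str, payload: bytes) -> int:
        """Allocate a fragment for the block and record identifier -> fragment."""
        if not self.free:
            raise RuntimeError(f"{self.name}: no free fragment")
        frag = self.free.pop(0)                 # first free fragment (assumed policy)
        self.fragments[frag] = payload
        self.index[block_id] = frag
        return frag

    def read(self, block_id: str) -> bytes:
        return self.fragments[self.index[block_id]]


if __name__ == "__main__":
    ds_a = StorageNode("DS_A", num_fragments=8)
    ds_a.store("D_A", b"own block")           # fragment S_DA
    ds_a.store("D_2", b"backup of D_2")       # fragment S_D2 on DS_A
    print(ds_a.index)                         # {'D_A': 0, 'D_2': 1}
```

The same class models DS_i and CS_r, which each store a single block. DS_A simply makes one `store` call per block it holds.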
Further, the management node establishes a mapping between the identifier of D_i and both the data storage node DS_i and the primary data storage node DS_A. When garbage collection is performed on a metadata stripe, the management node can use the mapping between the identifiers of the metadata blocks in the stripe and the storage nodes to reclaim those metadata blocks on both the data storage nodes and the primary data storage node, which makes metadata reclamation more efficient.
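A sketch of how the management node's identifier -> {DS_i, DS_A} mapping lets garbage collection reclaim both copies of a metadata block in one pass. `ManagementNode`, `record`, and `collect` are hypothetical names, and the storage nodes are modeled as plain dictionaries. In the described system, the deletions would be requests sent to the nodes.

```python
from collections import defaultdict


class ManagementNode:
    """Toy management node that remembers which storage nodes hold each block."""

    def __init__(self):
        self.placement = defaultdict(set)   # block identifier -> names of nodes holding it

    def record(self, block_id, *nodes):
        self.placement[block_id].update(nodes)

    def collect(self, block_id, stores):
        """Garbage-collect one block by deleting it on every node that holds a copy.

        `stores` maps node name -> {block identifier: payload}. Returns the nodes touched.
        """
        nodes = self.placement.pop(block_id, set())
        for node in nodes:
            stores[node].pop(block_id, None)
        return nodes
```

After `record("D_2", "DS_2", "DS_1")`, a single `collect("D_2", stores)` removes the block from DS_2 and its backup from the primary node DS_1, with no need to search other nodes.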
In a second aspect, correspondingly, an embodiment of the present invention further provides a distributed storage system. The distributed storage system includes a management node and (M+N) storage nodes, and the management node and each of the (M+N) storage nodes store a partition view of a metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one natural number from 1 to N, i is each natural number from 1 to N other than A, and r is each natural number from 1 to M. The distributed storage system is configured to implement the various implementations of the first aspect.
Correspondingly, the present invention further provides a non-volatile computer-readable storage medium and a computer program product. The memory of a storage device provided in an embodiment of the present invention loads the computer program instructions contained in the non-volatile computer-readable storage medium and the computer program product, and these instructions can run in a distributed storage system. The distributed storage system includes a management node and (M+N) storage nodes, and the management node and each of the (M+N) storage nodes store a partition view of a metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one natural number from 1 to N, i is each natural number from 1 to N other than A, and r is each natural number from 1 to M. One or more computers execute the computer program instructions, acting respectively as the management node, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r of the distributed storage system, and thereby implement the various implementations of the first aspect.
The metadata storage schemes for distributed storage systems disclosed in the first aspect can also be applied to storing the data that the metadata describes. Correspondingly, the distributed storage system of the second aspect and the non-volatile computer-readable storage medium and computer program product of the third aspect are equally applicable to data storage.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below.
Fig. 1 is a schematic diagram of a distributed block storage architecture according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a server architecture in distributed block storage according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the relationship between a data stripe and the partition view according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a data stripe according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a partition view according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the relationship between a metadata stripe and the partition view according to an embodiment of the present invention;
Fig. 7 is a flowchart of metadata storage according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of a metadata stripe according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of metadata storage according to an embodiment of the present invention.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention are described clearly below with reference to the accompanying drawings.
Distributed storage systems mainly take several forms, such as distributed file storage, distributed object storage, and distributed block storage, for example Huawei's … series of products. The embodiments of the present invention are described using distributed block storage as an example. As shown in Fig. 1, the distributed block storage includes multiple servers (server 1, server 2, server 3, server 4, server 5, and server 6) that communicate with one another. In practice, the number of servers in the distributed block storage can be increased as needed, and the embodiments of the present invention impose no limitation on this. Each server in the distributed block storage has the structure shown in Fig. 2.
As shown in Fig. 2, each server in the distributed block storage includes a central processing unit (CPU) 201, a memory 202, hard disk 1, hard disk 2, and hard disk 3. The memory 202 stores computer instructions, and the CPU 201 executes the program instructions in the memory 202 to perform the corresponding operations. The hard disks may be mechanical hard disks, solid-state drives, or a combination of both. In addition, to save computing resources of the CPU 201, a field programmable gate array (FPGA) or other hardware may perform the operations described above in place of the CPU 201, or the FPGA or other hardware may perform them jointly with the CPU 201. For ease of description, the embodiments of the present invention refer to all of these uniformly as a processor that performs the corresponding operations.
In the structure shown in Fig. 2, when an application is loaded in the memory 202 and the CPU 201 executes the application's instructions, the server acts as a client. The application may be a virtual machine (VM) or a specific application such as office software. The client writes data to, or reads data from, the distributed block storage.

When a storage management program is loaded in the memory 202 and the CPU 201 executes its instructions as a virtual block storage management program, the server acts as a management node. The management node manages volume metadata, provides the client with a block protocol access interface, and provides a distributed storage access point service, so that the client can access the storage resources of the distributed block storage through the management node.

When a storage object program is loaded in the memory 202 and the CPU 201 executes its instructions, the server acts as a storage node and performs the actual input/output (I/O) operations. Each server can run multiple storage object program processes. For example, by default each hard disk corresponds to one running storage object program process that manages that disk, and each such process run by the server acts as one storage node. In another implementation, a single storage object program process on a server may manage all of the hard disks in that server. The embodiments of the present invention are described using the case in which one storage object program process manages one hard disk.

When the distributed block storage is initialized, each storage object program process manages its hard disk in 1 MB units (fragments) and records the allocation information of each 1 MB fragment in the disk's metadata management area. The fragments of the hard disks form a storage resource pool. The storage management program communicates point to point with the processes of all storage object programs in the resource pool it can access. In other words, the management node communicates with all storage nodes in that resource pool, so it can access all hard disks of the resource pool concurrently.
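A minimal sketch of the per-disk allocation record described above, in which each hard disk is managed in 1 MB fragments and one flag per fragment tracks allocation. Keeping the record as a `bytearray` and allocating the first free fragment are assumptions, because the text does not specify the layout of the metadata management area. `DiskAllocator` is a hypothetical name.

```python
MB = 1 << 20


class DiskAllocator:
    """Allocation record for one hard disk managed in 1 MB fragments."""

    def __init__(self, disk_bytes: int):
        # One byte per fragment: 0 = free, 1 = allocated. Any tail under 1 MB is ignored.
        self.used = bytearray(disk_bytes // MB)

    def allocate(self) -> int:
        i = self.used.find(0)          # index of the first free fragment, -1 if none
        if i < 0:
            raise RuntimeError("disk full")
        self.used[i] = 1
        return i

    def release(self, i: int) -> None:
        self.used[i] = 0

    def free_count(self) -> int:
        return self.used.count(0)
```

Under these assumptions, a resource pool is the collection of such records for all disks. For example, `sum(d.free_count() for d in pool)` gives the number of free 1 MB fragments across the pool.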
When the distributed block storage is initialized, the hash space (for example, 0 to 2^32) is divided into N equal parts, each of which is one partition, and the N partitions are distributed evenly according to the number of hard disks. For example, in distributed block storage N defaults to 3600, so the partitions are P1, P2, P3, ..., P3600. As shown in Fig. 3, if the distributed block storage currently has 18 hard disks (storage nodes), each storage node carries 3600 / 18 = 200 partitions. This correspondence between partitions and storage nodes, called the partition view, can be assigned when the distributed block storage is initialized and is later adjusted as the number of hard disks in the distributed block storage changes. The servers of the distributed block storage can keep the partition view in the memory 202, and the management node uses it for fast routing. Each storage node also stores the full partition view of the distributed block storage system, that is, the correspondence between every partition and the storage nodes.

In addition, depending on the reliability requirements of the distributed block storage, an erasure coding (EC) algorithm can be used to improve data reliability, for example in 3+1 mode, in which 3 data blocks and 1 check block form a data stripe, as shown in Fig. 4. The partition view is then "partition - primary data storage node - data storage node 1 - data storage node 2 - check storage node"; an example partition view is shown in Fig. 5. For each partition, the partition view indicates the primary data node, data storage node 1 and data storage node 2 (which store the other data blocks of the data stripe), and the check storage node (which stores the check data). The primary data storage node is the node that stores the backups of the data blocks held by data storage node 1 and data storage node 2.
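The arithmetic above (3600 partitions over 18 disks gives 200 partitions per disk) and the 3+1 view layout can be sketched as follows. The round-robin placement, in which each partition takes four consecutive disks, is an assumption chosen for illustration, because the text does not say how the view is generated. Partitions are indexed from 0 here rather than from P1, and `build_view` is a hypothetical name.

```python
from collections import Counter

NUM_PARTITIONS = 3600


def build_view(nodes, num_partitions=NUM_PARTITIONS, width=4):
    """Round-robin partition view.

    Partition p maps to `width` consecutive nodes starting at p mod len(nodes), in role
    order (primary data node, data node 1, data node 2, check node) for the 3+1 layout.
    """
    if width > len(nodes):
        raise ValueError("need at least `width` distinct nodes per partition")
    n = len(nodes)
    return {p: tuple(nodes[(p + k) % n] for k in range(width)) for p in range(num_partitions)}


if __name__ == "__main__":
    disks = [f"disk{i}" for i in range(18)]
    view = build_view(disks)
    print(view[0])                                                # ('disk0', 'disk1', 'disk2', 'disk3')
    print(Counter(entry[0] for entry in view.values())["disk5"])  # 200
```

With this placement, every disk is the primary data node for exactly 200 partitions. A real system would also rebalance the view when disks are added or removed, as the text notes.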
The distributed block storage logically slices each logical unit number (LUN) into 1 MB pieces; for example, a 1 GB LUN is cut into 1024 pieces of 1 MB. As shown in Fig. 3, when a client sends a write request for a LUN through the management node, the Small Computer System Interface (SCSI) command carries the LUN identifier (ID), the logical block address (LBA) ID, and the data to be written. The management node serving the client receives the write request and forms a key from the LUN ID and the LBA ID; the key includes the LBA ID rounded to 1 MB. A distributed hash table (DHT) hash computes an integer (in the range 0 to 2^32) from the key, and this integer falls into a specific partition. Using the partition view recorded in the memory 202, the management node serving the client determines the primary data storage node, data storage node 1, data storage node 2, and the check storage node. It then sends data block 1, data block 2, and data block 3 of the EC data stripe, and check block 1, to the primary data storage node, data storage node 1, data storage node 2, and the check storage node, respectively. The primary data storage node stores data block 1, data storage node 1 stores data block 2, data storage node 2 stores data block 3, and the check storage node stores check block 1. Data storage nodes 1 and 2 each determine the primary data storage node from the partition view. Data storage node 1 backs up data block 2 to the primary data storage node, data storage node 2 backs up data block 3 to the primary data storage node, and the primary data storage node stores data block 2 and data block 3.

In a specific implementation, the primary data storage node allocates fragment 1 for data block 1 from the hard disk it manages and establishes a mapping between the identifier of data block 1 and fragment 1. Data storage node 1 allocates fragment 2 for data block 2 from the hard disk it manages and establishes a mapping between the identifier of data block 2 and fragment 2. Data storage node 2 allocates fragment 3 for data block 3 from the hard disk it manages and establishes a mapping between the identifier of data block 3 and fragment 3. The check storage node allocates fragment 4 for check block 1 from the hard disk it manages and establishes a mapping between the identifier of check block 1 and fragment 4. After receiving data block 2 from data storage node 1 and data block 3 from data storage node 2, the primary data storage node allocates fragment 5 and fragment 6 from the hard disk it manages, and establishes a mapping between the identifier of data block 2 and fragment 5 and a mapping between the identifier of data block 3 and fragment 6.

In the embodiments of the present invention, the form of the mapping between a data block identifier and a fragment depends on the deployment. When one storage object program process corresponds to one hard disk, that is, the storage node is the hard disk itself, the mapping is between the data block identifier and the physical address of the fragment. When one storage object program process corresponds to multiple hard disks, that is, the storage node manages multiple hard disks, the mapping consists of a mapping from the data block identifier to the hard disk that stores the data block, plus a mapping from that hard disk to the fragment (that is, to the fragment's physical address).

Further, data block 2 is stored in fragment 2 and fragment 5, and data block 3 is stored in fragment 3 and fragment 6. The management node establishes and saves a mapping between the identifier of data block 2 and both data storage node 1 and the primary data storage node, and a mapping between the identifier of data block 3 and both data storage node 2 and the primary data storage node. Further, data storage node 1 saves the mapping between the identifier of data block 2 and both data storage node 1 and the primary data storage node, and data storage node 2 saves the mapping between the identifier of data block 3 and both data storage node 2 and the primary data storage node. When garbage collection is performed on a data stripe, the management node can use the mapping between the identifiers of the data blocks in the stripe and the storage nodes to reclaim those data blocks on both the data storage nodes and the primary data storage node, which makes data reclamation more efficient.
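The two shapes of the identifier -> fragment mapping described above can be sketched as follows. All names are hypothetical. Treating a fragment's physical address as its index times 1 MB from the start of the disk is an assumption, and an actual layout may differ, for example by reserving space for the metadata management area.

```python
from typing import Dict, NamedTuple, Tuple

FRAGMENT_BYTES = 1 << 20    # fragments are 1 MB


class DiskFragment(NamedTuple):
    disk: str
    fragment: int


def fragment_offset(fragment: int) -> int:
    # Assumption: fragment k starts k MB into the disk.
    return fragment * FRAGMENT_BYTES


def resolve_one_disk(mapping: Dict[str, int], block_id: str) -> int:
    """One process per disk: identifier -> fragment physical address."""
    return fragment_offset(mapping[block_id])


def resolve_multi_disk(mapping: Dict[str, DiskFragment], block_id: str) -> Tuple[str, int]:
    """One process, several disks: identifier -> disk, then disk -> fragment address."""
    loc = mapping[block_id]
    return loc.disk, fragment_offset(loc.fragment)
```

In the single-disk case the disk is implied by the process, so the mapping only needs a fragment index. In the multi-disk case the disk has to be recorded alongside the fragment.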
In the embodiments of the present invention, when a client sends a write request to write data to the distributed block storage, metadata is generated that records the data's logical address, physical address, and so on. The metadata corresponding to the data is stored using the same EC algorithm as the data itself. A metadata stripe formed with the EC algorithm has the same partition view as the data stripe formed with the EC algorithm described above, as shown in Fig. 6.
Metadata is stored in a distributed storage system that includes a management node and (M+N) storage nodes, and the management node and each of the (M+N) storage nodes store a partition view of a metadata stripe. The partition view of the metadata stripe includes a primary data storage node DS_A, data storage nodes DS_i, and check storage nodes CS_r, where N is a natural number not less than 2, M is a natural number not less than 1, A is one natural number from 1 to N, i is each natural number from 1 to N other than A, and r is each natural number from 1 to M. During storage, the distributed storage system performs the process shown in Fig. 7:
Step 701: The management node determines, according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r for the metadata stripe. The metadata stripe includes metadata blocks D_A and D_i and check blocks C_r.
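The text does not name the EC algorithm used to compute the check blocks C_r. For the M = 1 case, such as the 3+1 layout of the embodiment, a single XOR parity block is the simplest valid erasure code. It is used below purely to illustrate how a check block relates to the metadata blocks; layouts with M >= 2 need a code such as Reed-Solomon instead. The function names are hypothetical.

```python
def xor_parity(blocks):
    """Single check block for M = 1: byte-wise XOR of equal-length data blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild(surviving_blocks, parity):
    """Recover the single missing data block from the others plus the check block."""
    return xor_parity(list(surviving_blocks) + [parity])
```

For example, with D_1 = b"\x01\x02", D_2 = b"\x10\x20", and D_3 = b"\xff\x00", the check block C_1 is b"\xee\x22", and `rebuild([D_1, D_3], C_1)` returns D_2.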
Specifically, the management node determines DS_A, DS_i, and CS_r for the metadata stripe according to the partition view as follows: the management node determines the partition corresponding to the metadata stripe according to the write request that generated the metadata in the stripe, and queries the partition view of the metadata stripe using that partition to determine the primary data storage node DS_A, the data storage nodes DS_i, and the check storage nodes CS_r.
Specifically, the management node determines the partition corresponding to the metadata stripe according to the address carried in the write request. For details, refer to the scheme described above by which the distributed block storage handles a write request sent by a client; the details are not repeated here.
Step 702: The management node sends D_i to the data storage node DS_i, D_A to the primary data storage node DS_A, and C_r to the check storage node CS_r.
Step 703: The check storage node CS_r receives and stores C_r.
Step 704: The data storage node DS_i receives and stores D_i, and sends D_i to the primary data storage node DS_A according to the partition view of the metadata stripe.
Step 705: The primary data storage node DS_A receives and stores D_A and D_i.
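Steps 702 to 705 can be simulated in a few lines. This is an in-process sketch under stated assumptions: nodes are dictionary entries rather than servers, messages are direct assignments, block labels such as "D2" are hypothetical identifiers, and nothing here models failures or acknowledgements. `write_stripe` and `read_from_primary` are hypothetical names.

```python
from collections import defaultdict


def write_stripe(data_nodes, check_nodes, data_blocks, check_blocks, a=1):
    """Simulate steps 702-705 for one metadata stripe.

    data_nodes[k-1] holds D_k (k = 1..N); data_nodes[a-1] is DS_A.
    check_nodes[r-1] holds C_r (r = 1..M). Returns node name -> {block id: payload}.
    """
    if len(data_blocks) != len(data_nodes) or len(check_blocks) != len(check_nodes):
        raise ValueError("one block per node is expected")
    if not 1 <= a <= len(data_nodes):
        raise ValueError("A must be between 1 and N")
    primary = data_nodes[a - 1]
    store = defaultdict(dict)
    for r, (node, block) in enumerate(zip(check_nodes, check_blocks), start=1):
        store[node][f"C{r}"] = block            # steps 702-703: CS_r receives and stores C_r
    for k, (node, block) in enumerate(zip(data_nodes, data_blocks), start=1):
        store[node][f"D{k}"] = block            # step 702: D_k reaches its node (DS_A when k == a)
        if k != a:
            store[primary][f"D{k}"] = block     # steps 704-705: DS_k forwards D_k, DS_A stores it
    return dict(store)


def read_from_primary(store, primary):
    """Read path from the summary: every metadata block of the stripe is on DS_A."""
    return {bid: blk for bid, blk in store[primary].items() if bid.startswith("D")}
```

For a 3+1 stripe written to nodes n1 to n3 with check node n4, n1 ends up holding D1, D2, and D3, so a reader can fetch the whole stripe's metadata from n1 alone, while n4 holds only C1.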
Specifically, the parity storage node CS_r stores C_r as follows: CS_r allocates a shard S_r for C_r and establishes a mapping between the identifier of C_r and the shard S_r. The data storage node DS_i stores D_i as follows: DS_i allocates a shard SD_i for D_i and establishes a mapping between the identifier of D_i and the shard SD_i. The primary data storage node DS_A stores D_A and D_i as follows: DS_A allocates a shard SD_A for D_A and establishes a mapping between the identifier of D_A and the shard SD_A, and it also allocates a shard SD_i for D_i and establishes a mapping between the identifier of D_i and the shard SD_i. Further, the management node establishes a mapping between the identifier of D_i and the data storage node DS_i and the primary data storage node DS_A. Further, the data storage node DS_i saves the mapping between the identifier of D_i and the nodes DS_i and DS_A. When garbage collection is performed on the metadata stripe, the management node can use the mapping between the identifiers of the metadata blocks in the stripe and the storage nodes to reclaim the data of each metadata block on both the data storage node and the primary data storage node, which improves the efficiency of metadata reclamation.
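The shard bookkeeping and the garbage-collection path above can be modelled with two small classes. `StorageNode`, `ManagementNode`, and the reuse of reclaimed shards through a free list are assumptions made for illustration; the text only requires that each node map block identifiers to shards and that the management node know which nodes hold each D_i.

```python
import itertools


class StorageNode:
    """Allocates shards and records the block-id -> shard mapping (hypothetical)."""

    def __init__(self, name: str, first_shard: int = 0):
        self.name = name
        self._fresh = itertools.count(first_shard)  # never-used shard numbers
        self.free: list[int] = []                   # reclaimed shards (assumed reuse)
        self.shard_of: dict[str, int] = {}

    def store(self, block_id: str) -> int:
        """Allocate a shard for the block and record identifier -> shard."""
        shard = self.free.pop() if self.free else next(self._fresh)
        self.shard_of[block_id] = shard
        return shard

    def reclaim(self, block_id: str) -> None:
        """Drop the mapping and return the shard to the free list."""
        self.free.append(self.shard_of.pop(block_id))


class ManagementNode:
    """Records which storage nodes hold a copy of each data block."""

    def __init__(self):
        self.holders: dict[str, list[StorageNode]] = {}

    def record(self, block_id: str, *nodes: StorageNode) -> None:
        self.holders[block_id] = list(nodes)

    def collect(self, block_id: str) -> None:
        """Garbage-collect a block on every node that holds it (DS_i and DS_A)."""
        for node in self.holders.pop(block_id):
            node.reclaim(block_id)
```

Because the management node already knows both holders of D_i, one `collect` call reaches the data node and the primary node without scanning other nodes.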
In an embodiment of the present invention, combining the distributed block device storage and data storage method described above, as shown in Figure 8, a metadata stripe that uses the EC algorithm has the metadata blocks D1, D2, and D3 and the parity block C1. The management node where the client resides determines the primary data storage node, data storage node 1, data storage node 2, and the parity storage node according to the partition view "partition - primary data storage node - data storage node 1 - data storage node 2 - parity storage node" recorded in memory 202. This partition view indicates that the partition corresponds to the primary data node; to data storage nodes 1 and 2, which store the other data blocks of the metadata stripe; and to the parity storage node, which stores the parity data. It also indicates that the primary data storage node is the backup data storage node for the metadata blocks stored on data storage nodes 1 and 2.
The management node sends D1, D2, D3, and C1 of the EC-based metadata stripe to the primary data storage node, data storage node 1, data storage node 2, and parity storage node 4, respectively. The primary data storage node receives and stores D1, data storage node 1 receives and stores D2, data storage node 2 receives and stores D3, and the parity storage node receives and stores C1. Data storage nodes 1 and 2 each determine the primary data storage node according to the partition view; data storage node 1 backs up D2 to the primary data storage node and data storage node 2 backs up D3 to it, so the primary data storage node receives and stores D2 and D3.
In a specific implementation, as shown in Figure 9, the primary data storage node allocates shard 7 for D1 from the hard disk it manages and establishes a mapping between the identifier of D1 and shard 7; data storage node 1 allocates shard 8 for D2 from the hard disk it manages and establishes a mapping between the identifier of D2 and shard 8; data storage node 2 allocates shard 9 for D3 from the hard disk it manages and establishes a mapping between the identifier of D3 and shard 9; and the parity storage node allocates shard 10 for C1 from the hard disk it manages and establishes a mapping between the identifier of C1 and shard 10. After receiving D2 from data storage node 1 and D3 from data storage node 2, the primary data storage node allocates shards 11 and 12 from the hard disk it manages and establishes a mapping between the identifier of D2 and shard 11 and a mapping between the identifier of D3 and shard 12.
In this embodiment, taking the mapping between a metadata block identifier and a shard as an example: when one process of the storage object program corresponds to one hard disk, that is, when the storage node is the hard disk itself, the mapping is between the metadata block identifier and the physical address of the shard. When one process of the storage object program corresponds to multiple hard disks, that is, when the storage node manages multiple hard disks, the mapping consists of a mapping from the metadata block identifier to the hard disk that stores the block and a mapping from that hard disk to the shard.
Further, because D2 is stored in shards 8 and 11 and D3 is stored in shards 9 and 12, the management node establishes and saves a mapping between the identifier of D2 and data storage node 1 and the primary data storage node, and a mapping between the identifier of D3 and data storage node 2 and the primary data storage node. Further, data storage node 1 saves the mapping between the identifier of D2 and data storage node 1 and the primary data storage node, and data storage node 2 saves the mapping between the identifier of D3 and data storage node 2 and the primary data storage node. When garbage collection is performed on the metadata stripe, the management node can use the mapping between the metadata block identifiers in the stripe and the storage nodes to reclaim the data of each metadata block on both the data storage node and the primary data storage node, which improves the efficiency of metadata reclamation.
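The Figure 8 and Figure 9 example can be replayed to check the resulting layout. This sketch hands out shard numbers 7 to 12 from a single counter in the order the text describes, which is only a convenience, since shard numbers in the embodiments do not imply any order. The node names such as "data1" and the function `replay_figure9` are hypothetical.

```python
import itertools


def replay_figure9(first_shard: int = 7):
    """Replay the Figure 8/9 example: three data blocks and one parity block."""
    shards = itertools.count(first_shard)
    layout: dict[str, dict[str, int]] = {
        "primary": {}, "data1": {}, "data2": {}, "parity": {}}
    # Steps 702 to 705: each node stores the block the management node sent it.
    for node, block in [("primary", "D1"), ("data1", "D2"),
                        ("data2", "D3"), ("parity", "C1")]:
        layout[node][block] = next(shards)
    # Data nodes 1 and 2 back up D2 and D3 on the primary node.
    for block in ("D2", "D3"):
        layout["primary"][block] = next(shards)
    # Management node view: which nodes hold each backed-up block (used for GC).
    holders = {
        block: tuple(node for node, blocks in layout.items() if block in blocks)
        for block in ("D2", "D3")
    }
    return layout, holders
```

The primary node ends up holding D1, D2, and D3, which is what lets a client read the whole stripe's metadata from one node.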
Therefore, in the scenario where the data reliability of a metadata stripe is provided by the EC algorithm, the primary data storage node backs up the other metadata blocks of the stripe. Because only the metadata blocks held on the data storage nodes need to be backed up on the primary data storage node, storage space is reduced compared with the prior art, which keeps multiple copies of all metadata blocks. In addition, when a client accesses metadata, it only needs to read all the metadata blocks from the primary data storage node, which improves metadata access speed.
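The space and read-path argument above can be checked with simple arithmetic. Per stripe, the scheme stores N data blocks, M parity blocks, and N - 1 backups on DS_A, that is, 2N + M - 1 blocks. The replication baseline assumes three copies of every metadata block; the text does not state a replica count, so `copies=3` is an assumption, as are the function names.

```python
def scheme_blocks(n: int, m: int) -> int:
    """Blocks stored per stripe: N data + M parity + (N - 1) backups on DS_A."""
    return n + m + (n - 1)


def replicated_blocks(n: int, copies: int = 3) -> int:
    """Blocks stored if each of the N metadata blocks has `copies` replicas (assumed baseline)."""
    return n * copies


def nodes_read(n: int, primary_holds_all: bool) -> int:
    """Nodes a client must contact to read all N metadata blocks of a stripe."""
    return 1 if primary_holds_all else n
```

For N = 3 and M = 1 (the Figure 8 stripe), the scheme stores 6 blocks against 9 under three-way replication, and a client reads all three metadata blocks from one node instead of three. With two-way replication both layouts store 6 blocks, so the size of the saving depends on the baseline replica count.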
The embodiments of the present invention further provide a non-volatile computer-readable storage medium and a computer program product, both of which contain computer program instructions. A CPU executes the computer program instructions loaded into memory to implement the functions of the management node and of the storage nodes (the primary data storage node, the data storage nodes, and the parity storage nodes) in the embodiments of the present invention.
The descriptions provided in the embodiments of the present invention are exemplary. Labels such as "shard 1", "shard 2", ..., "shard 12" in the embodiments do not strictly define an order; they only distinguish different shards. A shard in the embodiments may be, for example, a physical block on a hard disk. As described above, a hard disk in the embodiments may be at least one of a mechanical disk and a solid-state drive. The hard disk corresponding to a process of the storage object program may also be a storage array or the like; the embodiments of the present invention impose no limitation on this.
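The two forms of block-to-shard mapping described for the Figure 9 example can be sketched as lookups. `locate_single`, `locate_multi`, and the dictionary layouts are hypothetical; they only illustrate a one-level mapping (one hard disk per storage object process) versus a two-level mapping (several hard disks, or a storage array, per process).

```python
def locate_single(block_to_addr: dict[str, int], block_id: str) -> int:
    """One disk per storage object process: block id -> shard physical address."""
    return block_to_addr[block_id]


def locate_multi(block_to_disk: dict[str, str],
                 disk_to_shards: dict[str, dict[str, int]],
                 block_id: str) -> tuple[str, int]:
    """Several disks per process: block id -> disk, then that disk's block -> shard table."""
    disk = block_to_disk[block_id]
    return disk, disk_to_shards[disk][block_id]
```

The two-level form costs one extra table lookup per access but keeps each disk's shard table independent of the others.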
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the division into units in the apparatus embodiments described above is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
Claims (10)
1. A metadata storage method in a distributed storage system, wherein the distributed storage system comprises a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes store a partition view of a metadata stripe; the partition view of the metadata stripe comprises a primary data storage node DS_A, data storage nodes DS_i, and parity storage nodes CS_r; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N other than A, and r is each of the natural numbers 1 to M; the method comprises:
the management node determining, for the metadata stripe according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r, wherein the metadata stripe comprises metadata blocks D_A and D_i and parity blocks C_r;
the management node sending D_i to the data storage node DS_i, sending D_A to the primary data storage node DS_A, and sending C_r to the parity storage node CS_r;
the parity storage node CS_r receiving and storing C_r;
the data storage node DS_i receiving and storing D_i, and sending D_i to the primary data storage node DS_A according to the partition view of the metadata stripe; and
the primary data storage node DS_A receiving and storing D_A and D_i.
2. The method according to claim 1, wherein the management node determining, for the metadata stripe according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r specifically comprises:
the management node determining the partition corresponding to the metadata stripe according to the write request that generates the metadata in the metadata stripe; and
the management node querying the partition view of the metadata stripe according to the partition corresponding to the metadata stripe, to determine the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r.
3. The method according to claim 2, wherein the management node determines the partition corresponding to the metadata stripe according to the address carried in the write request.
4. The method according to claim 1, wherein the parity storage node CS_r storing C_r specifically comprises: the parity storage node CS_r allocating a shard S_r for C_r and establishing a mapping between the identifier of C_r and the shard S_r;
the data storage node DS_i storing D_i specifically comprises: the data storage node DS_i allocating a shard SD_i for D_i and establishing a mapping between the identifier of D_i and the shard SD_i; and
the primary data storage node DS_A storing D_A and D_i specifically comprises: the primary data storage node DS_A allocating a shard SD_A for D_A and establishing a mapping between the identifier of D_A and the shard SD_A, and allocating a shard SD_i for D_i and establishing a mapping between the identifier of D_i and the shard SD_i.
5. A distributed storage system, wherein the distributed storage system comprises a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes store a partition view of a metadata stripe; the partition view of the metadata stripe comprises a primary data storage node DS_A, data storage nodes DS_i, and parity storage nodes CS_r; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N other than A, and r is each of the natural numbers 1 to M;
the management node is configured to determine, for the metadata stripe according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r, wherein the metadata stripe comprises metadata blocks D_A and D_i and parity blocks C_r, and to send D_i to the data storage node DS_i, D_A to the primary data storage node DS_A, and C_r to the parity storage node CS_r;
the parity storage node CS_r is configured to receive and store C_r;
the data storage node DS_i is configured to receive and store D_i, and to send D_i to the primary data storage node DS_A according to the partition view of the metadata stripe; and
the primary data storage node DS_A is configured to receive and store D_A and D_i.
6. The system according to claim 5, wherein the management node is specifically configured to determine the partition corresponding to the metadata stripe according to the write request that generates the metadata in the metadata stripe, and to query the partition view of the metadata stripe according to the partition corresponding to the metadata stripe, to determine the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r.
7. The system according to claim 6, wherein the management node is further configured to determine the partition corresponding to the metadata stripe according to the address carried in the write request.
8. The system according to claim 5, wherein the parity storage node CS_r is specifically configured to allocate a shard S_r for C_r and establish a mapping between the identifier of C_r and the shard S_r;
the data storage node DS_i is specifically configured to allocate a shard SD_i for D_i and establish a mapping between the identifier of D_i and the shard SD_i; and
the primary data storage node DS_A is specifically configured to allocate a shard SD_A for D_A and establish a mapping between the identifier of D_A and the shard SD_A, and to allocate a shard SD_i for D_i and establish a mapping between the identifier of D_i and the shard SD_i.
9. A non-volatile readable storage medium, wherein the non-volatile readable storage medium comprises computer program instructions that can run in a distributed storage system; the distributed storage system comprises a management node and (M+N) storage nodes, and the management node and the (M+N) storage nodes store a partition view of a metadata stripe; the partition view of the metadata stripe comprises a primary data storage node DS_A, data storage nodes DS_i, and parity storage nodes CS_r; wherein N is a natural number not less than 2, M is a natural number not less than 1, A is one of the natural numbers 1 to N, i is each of the natural numbers 1 to N other than A, and r is each of the natural numbers 1 to M; when one or more computers execute the computer instructions, the one or more computers, acting as the management node, are configured to determine, for the metadata stripe according to the partition view of the metadata stripe, the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r, wherein the metadata stripe comprises metadata blocks D_A and D_i and parity blocks C_r, and to send D_i to the data storage node DS_i, D_A to the primary data storage node DS_A, and C_r to the parity storage node CS_r; the one or more computers, acting as the parity storage node CS_r, are configured to receive and store C_r;
the one or more computers, acting as the data storage node DS_i, are configured to receive and store D_i, and to send D_i to the primary data storage node DS_A according to the partition view of the metadata stripe; and
the one or more computers, acting as the primary data storage node DS_A, are configured to receive and store D_A and D_i.
10. The storage medium according to claim 9, further comprising computer program instructions that cause the one or more computers, acting as the management node, to be specifically configured to determine the partition corresponding to the metadata stripe according to the write request that generates the metadata in the metadata stripe, and to query the partition view of the metadata stripe according to the partition corresponding to the metadata stripe, to determine the primary data storage node DS_A, the data storage nodes DS_i, and the parity storage nodes CS_r.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710508014.8A CN109144406B (en) | 2017-06-28 | 2017-06-28 | Metadata storage method, system and storage medium in distributed storage system |
CN202010648620.1A CN111949210A (en) | 2017-06-28 | 2017-06-28 | Metadata storage method, system and storage medium in distributed storage system |
PCT/CN2018/075077 WO2019000949A1 (en) | 2017-06-28 | 2018-02-02 | Metadata storage method and system in distributed storage system, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710508014.8A CN109144406B (en) | 2017-06-28 | 2017-06-28 | Metadata storage method, system and storage medium in distributed storage system |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010648620.1A Division CN111949210A (en) | 2017-06-28 | 2017-06-28 | Metadata storage method, system and storage medium in distributed storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109144406A (en) | 2019-01-04 |
CN109144406B CN109144406B (en) | 2020-08-07 |
Family
ID=64740945
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710508014.8A Active CN109144406B (en) | 2017-06-28 | 2017-06-28 | Metadata storage method, system and storage medium in distributed storage system |
CN202010648620.1A Pending CN111949210A (en) | 2017-06-28 | 2017-06-28 | Metadata storage method, system and storage medium in distributed storage system |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010648620.1A Pending CN111949210A (en) | 2017-06-28 | 2017-06-28 | Metadata storage method, system and storage medium in distributed storage system |
Country Status (2)
Country | Link |
---|---|
CN (2) | CN109144406B (en) |
WO (1) | WO2019000949A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111444274A (en) * | 2020-03-26 | 2020-07-24 | 上海依图网络科技有限公司 | Data synchronization method, data synchronization system, and apparatus, medium, and system thereof |
WO2021046693A1 (en) * | 2019-09-09 | 2021-03-18 | 华为技术有限公司 | Data processing method in storage system, device, and storage system |
CN112947864A (en) * | 2021-03-29 | 2021-06-11 | 南方电网数字电网研究院有限公司 | Metadata storage method, device, equipment and storage medium |
CN113508372A (en) * | 2019-03-04 | 2021-10-15 | 日立数据管理有限公司 | Metadata routing in distributed systems |
WO2022094895A1 (en) * | 2020-11-05 | 2022-05-12 | Alibaba Group Holding Limited | Virtual data copy supporting garbage collection in distributed file systems |
CN115268801A (en) * | 2022-09-30 | 2022-11-01 | 天津卓朗昆仑云软件技术有限公司 | Backup system and method for block device |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115904794A (en) * | 2021-08-18 | 2023-04-04 | 华为技术有限公司 | Data processing method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040024963A1 (en) * | 2002-08-05 | 2004-02-05 | Nisha Talagala | Method and system for striping data to accommodate integrity metadata |
CN103399823A (en) * | 2011-12-31 | 2013-11-20 | 华为数字技术(成都)有限公司 | Method, equipment and system for storing service data |
US20140068324A1 (en) * | 2012-09-06 | 2014-03-06 | International Business Machines Corporation | Asynchronous raid stripe writesto enable response to media errors |
CN106233264A (en) * | 2014-03-31 | 2016-12-14 | 亚马逊科技公司 | Use the file storage device of variable stripe size |
CN106471461A (en) * | 2014-06-04 | 2017-03-01 | 纯存储公司 | Automatically reconfigure storage device memorizer topology |
CN106662983A (en) * | 2015-12-31 | 2017-05-10 | 华为技术有限公司 | Method, apparatus and system for data reconstruction in distributed storage system |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411637B (en) * | 2011-12-30 | 2013-07-24 | 创新科软件技术(深圳)有限公司 | Metadata management method of distributed file system |
CN102937964B (en) * | 2012-09-28 | 2015-02-11 | 无锡江南计算技术研究所 | Intelligent data service method based on distributed system |
US9104332B2 (en) * | 2013-04-16 | 2015-08-11 | International Business Machines Corporation | Managing metadata and data for a logical volume in a distributed and declustered system |
CN103699494B (en) * | 2013-12-06 | 2017-03-15 | 北京奇虎科技有限公司 | A kind of date storage method, data storage device and distributed memory system |
CN103729436A (en) * | 2013-12-27 | 2014-04-16 | 中国科学院信息工程研究所 | Distributed metadata management method and system |
CN106599308B (en) * | 2016-12-29 | 2020-01-31 | 郭晓凤 | distributed metadata management method and system |
2017
- 2017-06-28 CN CN201710508014.8A (CN109144406B): Active
- 2017-06-28 CN CN202010648620.1A (CN111949210A): Pending
2018
- 2018-02-02 WO PCT/CN2018/075077 (WO2019000949A1): Application Filing
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113508372A (en) * | 2019-03-04 | 2021-10-15 | 日立数据管理有限公司 | Metadata routing in distributed systems |
WO2021046693A1 (en) * | 2019-09-09 | 2021-03-18 | 华为技术有限公司 | Data processing method in storage system, device, and storage system |
CN113544635A (en) * | 2019-09-09 | 2021-10-22 | 华为技术有限公司 | Data processing method and device in storage system and storage system |
CN111444274A (en) * | 2020-03-26 | 2020-07-24 | 上海依图网络科技有限公司 | Data synchronization method, data synchronization system, and apparatus, medium, and system thereof |
CN111444274B (en) * | 2020-03-26 | 2021-04-30 | 上海依图网络科技有限公司 | Data synchronization method, data synchronization system, and apparatus, medium, and system thereof |
WO2022094895A1 (en) * | 2020-11-05 | 2022-05-12 | Alibaba Group Holding Limited | Virtual data copy supporting garbage collection in distributed file systems |
CN116490847A (en) * | 2020-11-05 | 2023-07-25 | 阿里巴巴集团控股有限公司 | Virtual data replication supporting garbage collection in a distributed file system |
CN112947864A (en) * | 2021-03-29 | 2021-06-11 | 南方电网数字电网研究院有限公司 | Metadata storage method, device, equipment and storage medium |
CN112947864B (en) * | 2021-03-29 | 2024-03-08 | 南方电网数字平台科技(广东)有限公司 | Metadata storage method, apparatus, device and storage medium |
CN115268801A (en) * | 2022-09-30 | 2022-11-01 | 天津卓朗昆仑云软件技术有限公司 | Backup system and method for block device |
CN115268801B (en) * | 2022-09-30 | 2023-01-10 | 天津卓朗昆仑云软件技术有限公司 | Backup system and method for block device |
Also Published As
Publication number | Publication date |
---|---|
CN109144406B (en) | 2020-08-07 |
WO2019000949A1 (en) | 2019-01-03 |
CN111949210A (en) | 2020-11-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |