WO2021088586A1 - Method and apparatus for managing metadata in storage system - Google Patents

Info

Publication number
WO2021088586A1
WO2021088586A1 PCT/CN2020/119929 CN2020119929W WO2021088586A1 WO 2021088586 A1 WO2021088586 A1 WO 2021088586A1 CN 2020119929 W CN2020119929 W CN 2020119929W WO 2021088586 A1 WO2021088586 A1 WO 2021088586A1
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
data
storage
storage unit
written
Application number
PCT/CN2020/119929
Other languages
French (fr)
Chinese (zh)
Inventor
王晨
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2021088586A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2043 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share a common memory address space
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094 Redundant storage or storage space
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation

Definitions

  • This application relates to the field of storage technology, and in particular to a method and device for managing metadata in a storage system.
  • the metadata instance can be understood as a program code used to implement a value-added service based on metadata, such as a service for snapshotting metadata or a service for cloning metadata.
  • the present application provides a method and device for managing metadata in a storage system, which are used to simplify the steps of performing redundancy protection on metadata.
  • a method for managing metadata in a storage system is provided. The storage system includes a plurality of storage units, and each storage unit is mapped to a physical storage space corresponding to at least two storage devices included in the storage system; in other words, the storage unit is a logical storage unit.
  • after metadata corresponding to the data to be written is generated, a storage unit for storing the metadata is determined from the plurality of storage units included in the storage system, and the metadata is then stored in the at least two storage devices corresponding to the determined storage unit.
  • because each storage unit is mapped to the physical storage space corresponding to at least two storage devices, when one of the storage devices corresponding to a certain storage unit fails, the metadata can be recovered from the remaining storage device corresponding to that storage unit, so that redundancy protection of the metadata is realized. Therefore, in the embodiments of the present application, there is no need to create multiple metadata instances that store the same metadata, and a simpler method for redundancy protection of metadata is provided.
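The redundancy idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `StorageDevice` and `StorageUnit` are hypothetical, and plain mirroring is assumed as the redundancy scheme (the description also allows RAID-style layouts).

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    """Hypothetical stand-in for one physical storage device."""
    name: str
    blocks: dict = field(default_factory=dict)  # (unit_id, offset) -> bytes
    failed: bool = False


@dataclass
class StorageUnit:
    """Logical storage unit backed by at least two devices (assumed mirroring)."""
    unit_id: int
    devices: list

    def __post_init__(self):
        if len(self.devices) < 2:
            raise ValueError("a storage unit needs at least two devices")

    def write(self, offset: int, metadata: bytes) -> None:
        # Store the same metadata on every healthy backing device.
        for dev in self.devices:
            if not dev.failed:
                dev.blocks[(self.unit_id, offset)] = metadata

    def read(self, offset: int) -> bytes:
        # Serve the read from the first healthy device that holds the entry.
        for dev in self.devices:
            if not dev.failed and (self.unit_id, offset) in dev.blocks:
                return dev.blocks[(self.unit_id, offset)]
        raise IOError("metadata unavailable on all backing devices")


dev_a, dev_b = StorageDevice("A"), StorageDevice("B")
unit = StorageUnit(0, [dev_a, dev_b])
unit.write(0, b"metadata-v1")
dev_a.failed = True          # simulate the failure of one device
recovered = unit.read(0)     # still served from device B
```

Because the redundancy lives in the storage unit, a single metadata instance is enough in this sketch; no second instance holding a copy of the metadata is needed.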
  • the storage unit may store the metadata in an append-write (additional write) mode.
  • in this way, the efficiency of writing metadata can be improved. In addition, when new data is added to the storage system, the old data (that is, the previously stored data) may be determined to be invalid data, and multiple consecutive pieces of previously stored old data may all be invalid data. The multiple consecutive storage units corresponding to that invalid data are then all storage units that need to be garbage collected, which can reduce the overhead of garbage collection.
  • a data write request for writing the data to be written into the storage system may be received, and a record item corresponding to the metadata is generated according to the data write request and the metadata. The record item includes the data write operation corresponding to the data write request and the metadata updated after the data write operation is executed.
  • when the storage unit for storing metadata fails, the metadata as it was before the failure can be recovered from the content of the record items, which can increase the stability of the storage system.
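A minimal Python sketch of recovery from record items follows. The field names `operation`, `key`, and `updated_metadata`, and the idea of replaying the items in order so that the latest update wins, are assumptions made for illustration; the description only states that a record item holds the write operation and the metadata updated after it.

```python
from dataclasses import dataclass


@dataclass
class RecordItem:
    """Hypothetical record item: a write operation plus the metadata after it."""
    operation: str          # description of the data write operation
    key: str                # which metadata entry the operation updated
    updated_metadata: dict  # the metadata entry after the operation


def recover_metadata(record_items):
    """Rebuild the metadata table by replaying record items in order."""
    table = {}
    for item in record_items:
        # A later record for the same key overrides an earlier one.
        table[item.key] = dict(item.updated_metadata)
    return table


log = [
    RecordItem("write LU1 offset 0", "LU1:0", {"unit": 0, "blocks": [0, 1, 2, 3]}),
    RecordItem("write LU1 offset 32", "LU1:32", {"unit": 0, "blocks": [4, 5, 6, 7]}),
    RecordItem("rewrite LU1 offset 0", "LU1:0", {"unit": 0, "blocks": [8, 9, 10, 11]}),
]
restored = recover_metadata(log)
```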
  • the metadata includes:
  • the logical address of each fragment is the logical address corresponding to the storage unit occupied by the fragment;
  • the metadata includes:
  • the logical address of each copy is the logical address corresponding to the storage unit occupied by the copy;
  • the set of logical addresses of each fragment included in the data to be written, or the logical address of each copy included in the data to be written, is the logical address of the data to be written.
  • metadata can record a variety of different contents according to actual usage requirements, which can increase the flexibility and applicability of the storage system.
  • the storage system may also create a first metadata instance for performing business operations on metadata in a preset storage unit.
  • the metadata instance no longer performs business operations on the metadata in a preset physical storage space, but operates on the metadata in the preset storage unit, which provides a new way of creating a metadata instance.
  • a second metadata instance may be created, and the second metadata instance can access the metadata stored in the preset storage unit.
  • when a new metadata instance is created, the new metadata instance can directly use the metadata in the shared storage unit, which avoids copying and transmitting metadata to the new metadata instance, reduces the delay of creating a new metadata instance, and improves efficiency. Furthermore, since there is no need to transmit metadata between multiple metadata instances, transmission resources can be saved.
  • a management device for metadata in a storage system may be a management node or a management server, or a device in a management node or a management server.
  • the management device includes a processor for implementing the method described in the first aspect.
  • the management device may also include a memory for storing program instructions and data. The memory is coupled with the processor, and the processor can call and execute the program instructions stored in the memory to implement any one of the methods described in the first aspect.
  • the processor of the metadata management device executes the program instructions in the memory to realize the following functions:
  • Determining a storage unit for storing the metadata where the storage system includes a plurality of storage units, and each storage unit is mapped to a physical storage space corresponding to at least two storage devices included in the storage system;
  • the metadata is stored in at least two storage devices corresponding to the storage unit.
  • the storage unit stores the metadata in an append-write (additional write) mode.
  • the processor executes the program instructions stored in the memory to realize the following functions:
  • a record item corresponding to the metadata is generated; the record item includes the data write operation corresponding to the data write request and the metadata updated after the data write operation is executed.
  • the description of the metadata is similar to the corresponding content in the first aspect, and will not be repeated here.
  • the processor executes the program instructions stored in the memory to realize the following functions:
  • the processor executes the program instructions stored in the memory to realize the following functions:
  • a second metadata instance is created, and the second metadata instance can access the metadata stored in the preset storage unit.
  • a management device for metadata in a storage system may be a management node or a management server, or a device in a management node or a management server.
  • the management device may include a generating unit, a determining unit, and an executing unit, and these units may execute the corresponding function executed in any of the design examples of the first aspect, specifically:
  • the generating unit is used to generate metadata corresponding to the data to be written
  • a determining unit configured to determine a storage unit for storing the metadata, the storage system includes a plurality of storage units, and each storage unit is mapped to a physical storage space corresponding to at least two storage devices included in the storage system;
  • the execution unit is configured to store the metadata in at least two storage devices corresponding to the storage unit.
  • an embodiment of the present application provides a computer-readable storage medium that stores a computer program, and the computer program includes program instructions that, when executed by a computer, cause the computer to execute the method described in any one of the first aspect.
  • an embodiment of the present application provides a computer program product, the computer program product stores a computer program, the computer program includes program instructions, and when executed by a computer, the program instructions cause the computer to execute the method of any one of the first aspect.
  • the present application provides a chip system.
  • the chip system includes a processor and may also include a memory for implementing the method described in the first aspect.
  • the chip system can be composed of chips, or it can include chips and other discrete devices.
  • an embodiment of the present application provides a storage system that includes the metadata management device of the storage system described in the second aspect and any one of the designs of the second aspect, or the storage system includes the metadata management device of the storage system described in the third aspect and any one of the designs of the third aspect.
  • FIG. 1 is a schematic diagram of an example of an application scenario of an embodiment of the application
  • FIG. 2 is a schematic structural diagram of an example of a storage unit provided by this embodiment
  • FIG. 3 is a flowchart of the data storage process in an embodiment of the application.
  • FIG. 4 is a schematic diagram of an example of multiple strips included in a storage unit in an embodiment of the application.
  • FIG. 5 is a schematic diagram of an example of a mapping relationship between a storage unit and a storage device in an embodiment of the application
  • FIG. 6 is a flowchart of the metadata storage process in an embodiment of the application.
  • FIG. 7 is a schematic diagram of an example of writing metadata to a storage unit in an embodiment of the application.
  • FIG. 8 is a schematic diagram of an example of a metadata structure in an embodiment of the application.
  • FIG. 9 is a flowchart of the garbage collection process of metadata in an embodiment of the application.
  • FIG. 10 is a flowchart of the management process of metadata instances in an embodiment of the application.
  • FIG. 11 is a schematic structural diagram of an example of a metadata management device of a storage system provided in an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of another example of a management device for metadata of a storage system provided in an embodiment of the application.
  • “multiple” refers to two or more than two. In view of this, “multiple” may also be understood as “at least two” in the embodiments of the present application. “At least one” can be understood as one or more, for example, one, two or more. For example, including at least one refers to including one, two or more, and does not limit which ones are included. For example, including at least one of A, B, and C, then the included can be A, B, C, A and B, A and C, B and C, or A and B and C.
  • ordinal numbers such as “first” and “second” mentioned in the embodiments of the present application are used to distinguish multiple objects, and are not used to limit the order, timing, priority, or importance of multiple objects.
  • the metadata management method provided in the embodiments of this application can be applied to various storage systems, for example, a centralized storage system, a distributed storage system, or a cloud storage system such as a public cloud or a private cloud, and so on; there is no restriction here.
  • the application of this metadata management method in a distributed storage system is taken as an example below.
  • FIG. 1 is a schematic diagram of an example of an application scenario provided by an embodiment of this application.
  • a client server (client server) 100 and a storage system 110 are included, and the client server 100 communicates with the storage system 110.
  • the storage system 110 includes a management module 111 and at least one storage node 112 (in FIG. 1, three storage nodes 112, namely storage node 1 to storage node 3, are taken as an example), and the management module 111 is used to write data to each storage node 112 and to read data from at least one storage node 112.
  • the storage node 112 in FIG. 1 may be an independent server, or it may also be a storage array including at least one storage device.
  • the storage device may be a hard disk drive (HDD) disk device, a solid state drive (SSD) disk device, a serial advanced technology attachment (SATA) disk device, a small computer system interface (SCSI) disk device, a serial attached SCSI (SAS) disk device, or a Fibre Channel (FC) disk device, etc.
  • the management module 111 and at least one storage node 112 in FIG. 1 may be independent devices.
  • the management module 111 is an independent server; or, the management module 111 may also be a software module, which is deployed on a certain storage node 112.
  • the management module 111 and a certain storage node 112 run on the same server, and the specific forms of the management module 111 and the storage node 112 are not limited here.
  • each storage node includes at least one storage unit.
  • the storage unit is a segment of logical space.
  • the logical space is obtained by mapping the physical space of the storage devices included in the storage node; that is, the actual physical space still comes from multiple storage nodes.
  • FIG. 2 is a schematic structural diagram of an example of the storage unit provided in this embodiment.
  • the storage unit is a collection of multiple logic blocks.
  • the logical block is a logical space concept, which is obtained by the space division of the storage device.
  • the size of a logical block can be 4KB or 8KB, etc.
  • the size of the logical block is not limited here.
  • Each logical block corresponds to a physical storage space of the same size as the logical block in the storage device. It should be noted that the multiple logical blocks included in a storage unit come from multiple storage devices, and the multiple storage devices may come from different storage nodes, or may come from the same storage node, which is not limited here.
  • the storage node 112 may, based on a configured redundant array of independent disks (RAID) type, map the logical blocks in the logical block set included in the storage unit to data storage units for storing data fragments, and generate a checksum based on the data fragments stored in each logical block
  • a storage unit contains one or more strips.
  • the data storage unit includes at least two logic blocks
  • the verification storage unit includes at least one logic block.
  • the storage node 112 takes one logical block from each of four storage devices, such as storage device A to storage device D, and the four logical blocks form the data storage unit of a stripe; it then takes one logical block from each of two other storage devices to form the check storage unit.
  • when any two logical blocks in the stripe fail (they may be the logical blocks corresponding to any two data storage units, to any two check storage units, or to one data storage unit and one check storage unit), the data in the failed logical blocks can be reconstructed according to the data in the remaining logical blocks.
  • the storage node 112 may also divide the multiple logical blocks in the logical block set included in the storage unit into copy units according to the configured multiple copy type.
  • each copy unit includes at least one logic block, the at least one logic block stores data, and the data stored in each copy unit is the same. For example, if a copy unit includes two logical blocks, the storage node 112 will take out one logical block from each of the two storage devices to form a copy unit. Assume that the multiple copy type is copy type 3, that is, one data needs to be stored in three copies.
  • the storage node 112 can take one logical block from each of four other storage devices, and compose every two logical blocks into a copy unit to obtain another two copy units, and the same data is stored in the three copy units. In this way, when any copy unit fails, data can be obtained from the other two copy units.
  • the application scenario shown in FIG. 1 is taken as an example to describe the metadata management method provided by the embodiment of the present application.
  • the technical solutions of the embodiments of the present application will be introduced in the following four aspects.
  • the steps executed by the storage system 110 may all be executed by the management module 111 of the storage system 110.
  • the first aspect is the data storage process.
  • FIG. 3 is a flowchart of the data storage process in an embodiment of this application. The flowchart is described as follows:
  • the client server 100 sends a data write request to the storage system 110.
  • the data write request includes the data to be written and the virtual storage address of the data to be written.
  • the virtual storage address refers to the identifier and offset of the logical unit (LU) to which the data to be written is to be written, and the virtual storage address is an address visible to the client server 100.
  • the data write request may be obtained by the client server 100 according to a user's operation, or may be generated according to system requirements during operation.
  • the storage system 110 determines a storage unit for storing the data to be written.
  • after the management module 111 of the storage system 110 receives the data write request, it determines the storage unit for the data to be written according to the usage of the storage units in the storage system 110 and the size of the data to be written carried in the data write request.
  • the storage system 110 determines that the data to be written requires 1 storage unit. The storage system 110 determines that no data is stored before receiving the data write request, and then determines that the storage unit occupied by the data to be written is storage unit 0.
  • the initial storage unit is the storage unit 0 as an example. In other embodiments, the initial storage unit may also be the storage unit 1, which is not limited here.
  • a storage unit may include multiple strips, that is, a striped data storage unit includes some logical blocks in the logical block set corresponding to the storage unit.
  • a storage unit contains 3 stripes. If the size of the data stored in a stripe is 32KB, the size of a storage unit is 96KB. If the size of the data to be written is smaller than the size of a storage unit, it can be determined to store the data to be written in some of the logical blocks included in a certain storage unit, for example, in the logical blocks corresponding to at least one stripe. For example, a storage unit includes 12 logical blocks, and every 4 logical blocks correspond to a stripe, that is, every 4 logical blocks can store 32KB of data.
  • if the storage system 110 determines that, before the data write request was received, data had already been stored in the first 4 logical blocks of storage unit 0 (that is, logical block 0 to logical block 3), it can determine to store the data to be written in logical block 4 to logical block 7 of storage unit 0.
  • a storage unit may correspond to more than 3 strips. For example, it can correspond to dozens or hundreds of strips.
  • the number of strips shown in Figure 4 is only an example. It should not be understood as a restriction on the storage unit.
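The block selection in the example above can be sketched as follows. The function name `next_free_blocks` is hypothetical, and placing each write in whole stripes directly after the used blocks is an assumption; the constants follow the example (12 logical blocks per storage unit, 4 blocks and 32KB per stripe, hence 8KB per block).

```python
BLOCK_SIZE_KB = 8        # 32 KB per stripe / 4 data blocks per stripe
BLOCKS_PER_STRIPE = 4
STRIPES_PER_UNIT = 3


def next_free_blocks(used_blocks: int, data_size_kb: int) -> range:
    """Return the logical-block indices for the next write in a storage unit.

    Assumption: a write occupies whole stripes placed directly after the
    blocks that are already in use.
    """
    stripe_kb = BLOCK_SIZE_KB * BLOCKS_PER_STRIPE
    stripes_needed = -(-data_size_kb // stripe_kb)        # ceiling division
    first_stripe = -(-used_blocks // BLOCKS_PER_STRIPE)   # next stripe boundary
    if first_stripe + stripes_needed > STRIPES_PER_UNIT:
        raise ValueError("not enough free stripes in this storage unit")
    start = first_stripe * BLOCKS_PER_STRIPE
    return range(start, start + stripes_needed * BLOCKS_PER_STRIPE)


# Blocks 0 to 3 already hold data; a 32 KB write lands in blocks 4 to 7.
chosen = list(next_free_blocks(used_blocks=4, data_size_kb=32))
```

A real allocator would also have to move on to another storage unit when the current one is full; here that case simply raises an error.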
  • each storage device may provide a segment of logical address instead of providing it to the storage unit in the form of a logical block.
  • the storage unit is a collection of multiple logical address segments.
  • the storage system 110 stores the data to be written according to the determined storage unit for storing the data to be written.
  • the management module 111 of the storage system 110 pre-stores the mapping relationship between each storage unit and the storage devices of the storage nodes. When the storage unit used to store the data to be written is determined, the data to be written is written to the corresponding storage node according to the mapping relationship.
  • the management module 111 of the storage system 110 stores the data written to the storage unit according to a preset RAID type.
  • the storage unit 0 includes 12 logical blocks, every 4 logical blocks correspond to a stripe, and the 4 logical blocks are used to store data fragments.
  • logical block 0 to logical block 3 are the logical blocks used to store data fragments in the first stripe
  • logical block 4 to logical block 7 are the logical blocks used to store data fragments in the second stripe.
  • logical block 8 to logical block 11 are the logical blocks used to store data fragments in the third stripe, and each stripe also includes logical blocks used to store check data fragments; for example, the first stripe also includes a logical block P0 and a logical block Q0.
  • the second stripe also includes a logical block P1 and a logical block Q1
  • the third stripe also includes a logical block P2 and a logical block Q2.
  • the storage system 110 presets a mapping relationship between the logical blocks included in each stripe and the storage devices of the storage nodes.
  • the mapping relationship is: the 4 logical blocks used to store data fragments in each stripe correspond in turn to storage device A in storage node 1 to storage node 4, and the logical blocks used to store check data fragments in each stripe correspond in turn to storage device A in storage node 5 and storage node 6.
  • in multiple strips corresponding to a storage unit logical blocks with the same position are from the same storage node.
  • the storage unit shown in FIG. 4 includes 3 strips.
  • the first stripe includes logical block 0 to logical block 3, logical block P0, and logical block Q0
  • the second stripe includes logical block 4 to logical block 7, logical block P1, and logical block Q1.
  • logic block 0 and logic block 4 are located in the same position
  • logic block 1 and logic block 5 are located in the same position, and so on.
  • after the management module 111 receives the data to be written, it can divide the data to be written into multiple data fragments according to the preset RAID type, calculate the parity fragments, and store the data fragments and parity fragments in the storage devices corresponding to the logical blocks. For example, if the size of the data to be written is 32KB and it is determined that the data to be written is stored in logical block 4 to logical block 7, the management module 111 divides the data to be written into 4 data fragments of 8KB each, and then calculates 2 parity fragments from the 4 data fragments; the size of each parity fragment is also 8KB.
  • the management module 111 sends each data fragment and the verification data fragment to the corresponding storage node for persistent storage.
  • the management module 111 sends 4 data fragments to storage node 1 to storage node 4 respectively, and sends 2 parity data fragments to storage node 5 and storage node 6 respectively.
  • Each storage node stores corresponding data in a preset storage device.
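The fragmenting and placement steps above can be sketched as follows. This is a simplification under stated assumptions: only one XOR parity fragment is computed and placed on node 5, whereas the example in the text uses two parity fragments on nodes 5 and 6; a second independent parity (such as the Q parity of RAID 6) relies on Galois-field arithmetic and is left out. All function names are hypothetical.

```python
def split_into_fragments(data: bytes, count: int) -> list:
    """Split data into `count` equal-sized fragments."""
    if len(data) % count != 0:
        raise ValueError("data length must be a multiple of the fragment count")
    size = len(data) // count
    return [data[i * size:(i + 1) * size] for i in range(count)]


def xor_parity(fragments: list) -> bytes:
    """Byte-wise XOR of equally sized fragments."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return bytes(parity)


def place_fragments(data: bytes) -> dict:
    """Map node number -> fragment: data on nodes 1 to 4, XOR parity on node 5."""
    fragments = split_into_fragments(data, 4)
    placement = {node: frag for node, frag in enumerate(fragments, start=1)}
    placement[5] = xor_parity(fragments)
    return placement


def rebuild_lost_fragment(placement: dict, lost_node: int) -> bytes:
    """Recover the fragment of one lost node by XOR-ing all surviving fragments."""
    survivors = [frag for node, frag in placement.items() if node != lost_node]
    return xor_parity(survivors)


payload = bytes(range(256)) * 128            # 32 KB of data to be written
layout = place_fragments(payload)            # four 8 KB fragments plus parity
rebuilt = rebuild_lost_fragment(layout, 3)   # equals the fragment held by node 3
```

With a single XOR parity only one lost fragment can be rebuilt; tolerating two simultaneous failures, as described for the stripe with two check blocks, needs the second parity that this sketch omits.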
  • the management module 111 of the storage system 110 stores the data written to the storage unit according to a preset multiple copy type.
  • the storage unit 0 includes 12 logic blocks, and each logic block is used to store data.
  • the storage system 110 presets the mapping relationship between each logical block and the storage devices of the storage nodes. For example, if the multiple copy type is 2 copies, each logical block can correspond to two different storage devices on a storage node, and the mapping relationship is: logical block 0 to logical block 3 correspond in turn to storage device A and storage device B on storage node 1 to storage node 4. The mapping relationship between other logical blocks and storage devices may be similar to that of logical block 0 to logical block 3, and will not be repeated here.
  • after the management module 111 receives the data to be written, it can copy the data to be written into multiple pieces according to the preset multiple copy type, and store the data to be written and the copied data in the storage devices corresponding to the logical blocks. For example, the size of the data to be written is 32KB, and the size of each logical block is 4KB. If it is determined to write the data to be written into logical blocks 0 to 4, the management module 111 divides the data to be written into 4 pieces of 8KB each, and then copies the 4 pieces to obtain 8 pieces of data. The management module 111 then sends the 8 pieces of data to the corresponding storage nodes for persistent storage. With the mapping relationship described above, the management module 111 sends the two identical pieces of each pair to storage node 1 to storage node 4 respectively, and each storage node stores the corresponding data in a preset storage device.
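As a rough illustration of the 2-copy placement just described, the sketch below splits the data into 4 pieces and assigns each piece to storage device A and storage device B of one node. The function name `place_copies` and the `(node, device)` key are hypothetical choices, not taken from the source.

```python
def place_copies(data: bytes, pieces: int = 4, devices=("A", "B")) -> dict:
    """Map (node, device) -> piece for a multiple-copy layout.

    Piece i is assigned to node i + 1, once for every device in `devices`.
    """
    if len(data) % pieces != 0:
        raise ValueError("data length must be a multiple of the piece count")
    size = len(data) // pieces
    placement = {}
    for i in range(pieces):
        piece = data[i * size:(i + 1) * size]
        for dev in devices:
            placement[(i + 1, dev)] = piece
    return placement


copies = place_copies(bytes(32 * 1024))   # 8 pieces of 8 KB in total
```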
  • the data to be written is written into the storage unit of the storage system 110. From a physical point of view, the data is ultimately still stored in multiple storage nodes. For each fragment, the identifier of the storage unit where it is located and its location inside the storage unit form the logical address of the fragment, and the actual address of the fragment in the storage node is the physical address of the fragment.
  • the second aspect is the storage process of metadata.
  • after the data to be written is stored in the storage device, in order to facilitate subsequent searching or reading of the data, the storage system 110 also needs to store the description information of the data.
  • when the storage node receives a data read request, it usually finds the metadata of the data to be read based on the information carried in the data read request, and then obtains the data to be read according to the metadata.
  • Metadata includes, but is not limited to: the correspondence between the logical address and the physical address of each fragment, the correspondence between the logical address of the data and the logical address of each fragment contained in the data, the correspondence between the logical address and the physical address of each copy, and the correspondence between the logical address of the data and the logical address of each copy of the data.
  • the set of logical addresses of each fragment contained in the data or the logical address of each copy is the logical address of the data.
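One possible in-memory shape for such metadata is sketched below. The class and field names are hypothetical, and using a separate `data_id` key (for instance the LU identifier and offset seen by the client) to index the data is an assumption; the text itself treats the set of fragment logical addresses as the logical address of the data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogicalAddress:
    unit_id: int   # identifier of the storage unit holding the fragment
    offset: int    # position of the fragment inside the storage unit


@dataclass(frozen=True)
class PhysicalAddress:
    node: int      # storage node
    device: str    # storage device on that node
    offset: int    # position on the device


@dataclass
class Metadata:
    # logical address of each fragment (or copy) -> its physical address
    fragment_location: dict = field(default_factory=dict)
    # data identifier -> logical addresses of the fragments it consists of
    data_fragments: dict = field(default_factory=dict)

    def add_fragment(self, data_id, logical, physical):
        self.fragment_location[logical] = physical
        self.data_fragments.setdefault(data_id, []).append(logical)

    def locate(self, data_id):
        """Return the physical addresses to visit when reading `data_id`."""
        return [self.fragment_location[la] for la in self.data_fragments[data_id]]


meta = Metadata()
for block in range(4, 8):   # fragments in logical blocks 4 to 7 of storage unit 0
    meta.add_fragment("LU1:32", LogicalAddress(0, block), PhysicalAddress(block - 3, "A", 1))
where = meta.locate("LU1:32")
```

A read request would first resolve the data identifier to fragment logical addresses and then to physical addresses, which mirrors the two correspondences listed above.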
  • FIG. 6 is a flowchart of the metadata storage process in an embodiment of this application. The flowchart is described as follows:
  • the storage system 110 generates metadata.
  • after the data to be written is stored in the storage system 110, the management module 111 of the storage system 110 generates metadata of the data to be written. For example, in the embodiment shown in FIG. 3, the management module 111 stores the data to be written in logic block 0 to logic block 4 of the storage unit, and then the management module 111 generates the metadata of the data to be written according to the size, storage address, and other information of the data to be written.
  • the content of metadata is not limited here.
  • the storage system 110 determines a storage unit for storing the metadata.
  • the physical storage space used by the storage system 110 for storing data and the physical storage space used for storing metadata are separated.
  • for example, each storage node includes 4 storage devices. Normally, the metadata of data occupies less storage space than the data itself. Therefore, storage device A to storage device C in each storage node in the storage system 110 can be set to store data, and storage device D in each storage node can be set to store metadata; or, if the storage system 110 includes 4 storage nodes, all storage devices in storage node 1 to storage node 3 can be set to store data, and all storage devices in storage node 4 can be set to store metadata.
  • the storage unit used to store data and the storage unit used to store metadata are essentially the same, except that the content stored in the storage unit is different.
  • the storage unit used to store data and the storage unit used to store metadata are different.
  • the storage unit of metadata comes from different storage devices.
  • the management module 111 can determine the storage unit used to store the metadata according to the usage of the storage unit used to store the metadata in the storage system 110.
  • For example, a storage unit for storing metadata includes 6 logical blocks, and every 2 logical blocks correspond to a stripe.
  • If the management module 111 determines that, before the metadata was generated, data had already been stored in the first two logical blocks (that is, logical block 0 and logical block 1) of storage unit 0, which is used to store metadata, the management module 111 can determine to store the generated metadata in logical block 2 and logical block 3 of storage unit 0. This method can be understood as storing metadata in the storage unit in an append-write manner.
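  • The append-write placement in this example can be sketched as follows. This is a minimal illustration that assumes a storage unit only needs to remember the index of its first unused logical block; the class `MetadataStorageUnit` and its method names are hypothetical.

```python
class MetadataStorageUnit:
    """Hypothetical append-only storage unit made of fixed-size stripes."""

    def __init__(self, num_blocks=6, blocks_per_stripe=2):
        self.num_blocks = num_blocks
        self.blocks_per_stripe = blocks_per_stripe
        self.next_free_block = 0  # every block below this index is in use

    def allocate_stripe(self):
        """Return the logical blocks of the next unused stripe, or None if full."""
        start = self.next_free_block
        end = start + self.blocks_per_stripe
        if end > self.num_blocks:
            return None
        self.next_free_block = end
        return list(range(start, end))


unit0 = MetadataStorageUnit()
first = unit0.allocate_stripe()   # earlier metadata occupies blocks 0 and 1
second = unit0.allocate_stripe()  # newly generated metadata goes to blocks 2 and 3
```

  • Because allocation only ever moves forward, existing blocks are never overwritten, which is the property the append-write manner relies on.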
  • step S63 may be performed before step S62.
  • the storage system 110 generates a record item corresponding to the metadata.
  • After the management module 111 generates the metadata, it can obtain the write-ahead log (WAL) record item corresponding to the metadata according to the metadata and the operation corresponding to the metadata.
  • the operation corresponding to metadata is illustrated by an example.
  • the management module 111 saves the record item in the memory, and the memory can be understood as the memory of the node or server where the management module 111 is located.
  • The preset condition may be that the number of WAL record items recorded in the memory reaches a threshold. When it does, the management module 111 determines to write the metadata in the multiple WAL record items recorded in the memory to the storage unit, and then executes step S62 to determine the storage unit corresponding to the metadata in each WAL record item.
  • The method of determining the storage unit corresponding to the metadata in each WAL record item can be similar to step S62: according to the usage of the storage units used to store metadata, the storage unit used to store the metadata in each WAL record item is determined in turn, which is not repeated here.
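  • A minimal sketch of the threshold condition described above: record items accumulate in memory and are handed off as one batch once their number reaches the threshold. The class `WalBuffer`, the record layout, and the `flush` callback are assumptions made for illustration; the application does not specify them.

```python
class WalBuffer:
    """Hypothetical in-memory buffer of write-ahead log (WAL) record items."""

    def __init__(self, threshold, flush):
        self.threshold = threshold  # number of record items that triggers a flush
        self.flush = flush          # callback that writes a batch to storage units
        self.records = []

    def append(self, operation, metadata):
        # Each record item pairs an operation with the metadata it produced.
        self.records.append({"operation": operation, "metadata": metadata})
        if len(self.records) >= self.threshold:
            batch, self.records = self.records, []
            self.flush(batch)


flushed = []  # stands in for the storage units that receive the batches
wal = WalBuffer(threshold=3, flush=flushed.append)
wal.append("write", {"name": "a"})
wal.append("write", {"name": "f"})
wal.append("update", {"name": "h'"})  # third record item reaches the threshold
```

  • In this sketch the callback would be the place where a storage unit is determined for each record item in turn.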
  • the storage system 110 writes the metadata into the determined storage unit.
  • Step S64 is similar to step S33, and a specific example is used for description below.
  • For example, storage unit 0 for storing metadata includes 6 logical blocks, and every 2 logical blocks correspond to a stripe: logical block 0 and logical block 1 correspond to the first stripe, logical block 2 and logical block 3 correspond to the second stripe, and logical block 4 and logical block 5 correspond to the third stripe. These are the logical blocks used to store metadata fragments in each stripe. Each stripe also includes a logical block for storing check metadata: the first stripe includes logical block P0, the second stripe includes logical block P1, and the third stripe includes logical block P2.
  • After the management module 111 determines to store the generated metadata in logical block 2 and logical block 3 of storage unit 0, the metadata to be written can be divided into multiple metadata fragments according to the preset RAID type, the check fragment is obtained by calculation, and the metadata fragments and the check fragment are stored in the storage devices corresponding to the respective logical blocks.
  • The management module 111 copies each metadata fragment according to a preset multi-copy type, and then stores each metadata fragment and its copies in the respective storage devices. This is similar to step S33 and is not repeated here.
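  • The fragment-plus-check layout can be illustrated with single XOR parity. This is only a sketch: the application does not state which RAID type is used, so XOR parity, zero padding, and the function names `split_with_parity` and `recover_fragment` are assumptions.

```python
def split_with_parity(metadata, num_fragments=2):
    """Split metadata into equal fragments and compute one XOR check fragment."""
    size = -(-len(metadata) // num_fragments)  # ceiling division
    padded = metadata.ljust(size * num_fragments, b"\x00")
    fragments = [padded[i * size:(i + 1) * size] for i in range(num_fragments)]
    parity = bytes(size)  # all zero bytes
    for fragment in fragments:
        parity = bytes(a ^ b for a, b in zip(parity, fragment))
    return fragments, parity


def recover_fragment(surviving_fragments, parity):
    """Rebuild the single missing fragment from the survivors and the parity."""
    lost = parity
    for fragment in surviving_fragments:
        lost = bytes(a ^ b for a, b in zip(lost, fragment))
    return lost


fragments, parity = split_with_parity(b"metadata-z")
# Pretend the device holding the second fragment failed.
rebuilt = recover_fragment([fragments[0]], parity)
```

  • With two metadata fragments and one check fragment per stripe, losing any one of the three storage devices still leaves enough information to rebuild the metadata.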
  • The management module 111 can perform step S62 and step S64, or perform step S62 to step S64, to store the metadata in the corresponding storage device. That is, the management module 111 has two ways to store metadata, and can select between them according to a preset judgment condition.
  • The preset judgment condition may be whether the metadata is metadata for new data or metadata for updating old data. Metadata for new data can be understood as metadata that does not need to be updated in place, and step S62 and step S64 can be performed for it. Metadata for updating old data can be understood as metadata that needs to be updated in place, and step S62 to step S64 can be performed for it.
  • the preset judgment condition can also be other content, which is not limited here.
  • the storage system 110 updates the metadata structure.
  • After the management module 111 writes the metadata into the corresponding storage device, it also needs to update the metadata structure of the storage system 110.
  • The metadata structure may be a B-tree (Btree) or a log-structured merge-tree (LSM tree); it may also be another type of structure that can be stored in an append-write mode.
  • FIG. 8(a) shows the Btree corresponding to the metadata that has already been saved in the storage system 110.
  • After the management module 111 stores the metadata in the corresponding storage device, it can update the Btree based on the metadata of the data to be written. For example, the Btree in FIG. 8(a) includes metadata h, metadata e, metadata s, metadata a, metadata f, and metadata q.
  • If the name of the metadata corresponding to the data to be written is metadata z and the metadata z includes the metadata s, the metadata z is taken as a child node of the metadata s, and the Btree shown in FIG. 8(b) is obtained.
  • If the name of the metadata corresponding to the data to be written is metadata h', the Btree shown in FIG. 8(c) is obtained.
  • Step S65 is an optional step, which is represented by a dotted line in FIG. 6.
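  • The two updates in the example (adding metadata z, then writing metadata h' as a newer version of metadata h) can be sketched with a simplified tree. This is not a real Btree: the class `AppendOnlyTree` is hypothetical, it only models parent and child names, and it omits metadata a, f, and q because their positions are not stated in the text.

```python
class AppendOnlyTree:
    """Hypothetical tree in which an updated node is appended as a new version."""

    def __init__(self, root, children):
        self.root = root
        self.children = {root: list(children)}  # node name -> child names
        self.garbage = set()                    # names of superseded nodes

    def add_child(self, parent, child):
        # New metadata (for example z) is attached under an existing node.
        self.children.setdefault(parent, []).append(child)
        self.children.setdefault(child, [])

    def supersede_root(self, new_root):
        # The updated version (for example h') is written as a new node that
        # points at the same children; the old root becomes garbage metadata.
        old_root = self.root
        self.children[new_root] = list(self.children[old_root])
        self.garbage.add(old_root)
        self.root = new_root


tree = AppendOnlyTree("h", ["e", "s"])
tree.add_child("s", "z")   # roughly the change from FIG. 8(a) to FIG. 8(b)
tree.supersede_root("h'")  # roughly the change that yields FIG. 8(c)
```

  • Keeping the old node and marking it as garbage, instead of overwriting it, is what later makes garbage collection of the metadata necessary.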
  • the third aspect is the garbage collection process of metadata.
  • FIG. 9 is a flowchart of the garbage collection process of metadata in an embodiment of this application. The flowchart is described as follows:
  • the storage system 110 determines a storage unit used for garbage collection.
  • garbage collection is performed in units of storage units.
  • The storage unit used for garbage collection may be a storage unit whose garbage metadata reaches a first set threshold, the storage unit that contains the most garbage metadata among the multiple storage units, a storage unit whose valid metadata is lower than a second set threshold, or the storage unit that contains the least valid metadata among the multiple storage units.
  • For example, both metadata h and metadata h' are parent nodes of metadata e and metadata s, and metadata h' is stored after metadata h, so the management module 111 can determine that metadata h is garbage metadata.
  • The logical blocks occupied by metadata h are logical block 1 and logical block 2 of storage unit 0.
  • Storage unit 0 therefore includes 2 garbage logical blocks.
  • a preset threshold which may be 3
  • In the following, storage unit 0 is used as an example of the storage unit used for garbage collection.
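  • One way to combine two of the selection criteria above (most garbage, and a garbage threshold) is sketched below. The function `pick_gc_unit`, the per-unit counts, and the choice to combine the criteria are assumptions for illustration; the application lists the criteria as alternatives.

```python
def pick_gc_unit(garbage_blocks, threshold):
    """Return the storage unit with the most garbage blocks if it reaches the
    threshold, otherwise None (no unit is worth collecting yet)."""
    if not garbage_blocks:
        return None
    unit = max(garbage_blocks, key=garbage_blocks.get)
    return unit if garbage_blocks[unit] >= threshold else None


# Hypothetical counts: storage unit id -> number of garbage logical blocks.
candidate = pick_gc_unit({0: 4, 1: 1, 2: 0}, threshold=3)
```

  • A policy based on valid metadata would look the same with `min` and a "lower than" comparison.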
  • the storage system 110 migrates the valid metadata in the storage unit used for garbage collection to other storage units.
  • If storage unit 0 is the storage unit used for garbage collection, the valid metadata in storage unit 0 is migrated to other storage units.
  • For example, if the garbage metadata is stored in logical block 1 to logical block 4 of storage unit 0 and the valid metadata is stored in logical block 5 and logical block 6, the management module 111 migrates the valid metadata stored in logical block 5 and logical block 6 to a new storage unit, for example, storage unit 2.
  • the storage system 110 releases the storage space occupied by the storage unit used for garbage collection.
  • The management module 111 may send a deletion instruction to the storage node corresponding to storage unit 0 to delete the metadata fragments or check metadata fragments corresponding to storage unit 0.
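  • The migrate-then-release sequence can be sketched as follows. The dictionary layout, the function `collect`, and the assumption that the target unit is filled contiguously from block 0 are all illustrative choices, not details from the application.

```python
def collect(units, victim, target):
    """Move valid metadata from the victim unit to the target, then release it.

    `units` maps a storage unit id to {logical block: (metadata name, is_valid)}.
    The target unit is assumed to be filled contiguously starting at block 0.
    """
    for _block, (name, is_valid) in sorted(units[victim].items()):
        if is_valid:
            next_block = len(units[target])  # append after existing entries
            units[target][next_block] = (name, True)
    units[victim] = {}  # release the space occupied by the victim unit


units = {
    0: {1: ("h", False), 2: ("h", False), 5: ("q", True), 6: ("f", True)},
    2: {},
}
collect(units, victim=0, target=2)
```

  • Only the valid entries are copied, so the garbage metadata disappears when the victim unit is released.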
  • the fourth aspect is the management process of metadata instances.
  • the storage system 110 can implement various value-added services by creating different metadata instances, such as a service for snapshotting metadata or a service for cloning metadata.
  • Metadata instances can be understood as program codes used to implement a certain value-added service.
  • FIG. 10 is a flowchart of the metadata instance management process in an embodiment of this application. The flowchart is described as follows:
  • the storage system 110 creates a first metadata instance.
  • the first metadata instance is used to perform business operations on the metadata stored in the preset storage unit.
  • the business operation is a snapshot operation, that is, the first metadata instance is an instance of snapshotting metadata in a preset storage unit.
  • the preset storage unit may be part or all of the storage units used to store metadata in the storage system 110.
  • For example, the storage units used to store metadata in the storage system 110 include storage unit 0 to storage unit 4, and the preset storage units may be storage unit 0 and storage unit 1, which can be set according to actual usage.
  • the storage space corresponding to physical address 1 to physical address 20 in the storage system 110 needs to be stored.
  • the management module 111 of the storage system 110 will create at least two metadata instances for the storage space.
  • the at least two metadata instances may include metadata instance 1 and metadata instance 2.
  • the management module 111 allocates storage space for storing metadata for each metadata instance.
  • For example, the storage space configured for metadata instance 1 to store metadata is the storage space corresponding to physical address 50 to physical address 55, and the storage space configured for metadata instance 2 is the storage space corresponding to physical address 60 to physical address 65.
  • Metadata instance 1 stores the metadata of the data in its configured storage space. For example, if the metadata of the data is metadata 1, metadata instance 1 stores metadata 1 in a storage space starting at physical address 50. Then, the management module 111 of the storage system 110 copies the metadata stored by metadata instance 1 and stores the copy in the storage space configured for metadata instance 2. For example, the management module 111 copies metadata 1 and stores the copy in another storage space whose starting address is physical address 60. It can be seen that related technologies need to create multiple metadata instances, which is more complicated. In the embodiments of the present application, the metadata in the storage system 110 is already stored in the storage devices using a preset RAID type or multi-copy type, so the metadata is already redundantly protected. Therefore, the embodiments of the present application do not need to create multiple metadata instances storing the same metadata, and provide a simpler method for redundant protection of metadata.
  • When the preset RAID type is used to store metadata, there is no need to store multiple copies of the same metadata, so the storage space occupied by the metadata can be reduced and storage space utilization can be improved.
  • the storage system 110 determines that the first metadata instance is faulty, and then creates a second metadata instance.
  • The management module 111 can create a second metadata instance for taking a snapshot of the metadata, and set the storage units accessible to the second metadata instance to be the same as those accessible to the first metadata instance.
  • For example, if the storage units that can be accessed by the first metadata instance are storage unit 0 and storage unit 1, the storage units that can be accessed by the second metadata instance are also storage unit 0 and storage unit 1. Multiple metadata instances thus share the accessible storage units, so a newly created metadata instance can directly use the metadata in the shared storage units. This reduces the copying and transmission of metadata to the new metadata instance, which shortens the delay of creating a new metadata instance and improves efficiency. Furthermore, since metadata does not need to be transmitted between multiple metadata instances, transmission resources can be saved.
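  • The sharing described above can be sketched by having the replacement instance reference the same storage units instead of receiving a copy. The class `MetadataInstance`, the function `replace_failed`, and the dictionary standing in for the storage units are hypothetical.

```python
class MetadataInstance:
    """Hypothetical metadata instance that snapshots metadata in preset units."""

    def __init__(self, name, storage_units, store):
        self.name = name
        self.storage_units = tuple(storage_units)  # units this instance may access
        self.store = store                         # shared reference, not a copy
        self.healthy = True

    def snapshot(self):
        # Business operation: capture the metadata currently in the preset units.
        return {unit: dict(self.store[unit]) for unit in self.storage_units}


def replace_failed(instance, new_name):
    # The new instance is given the same accessible storage units, so no
    # metadata has to be copied or transmitted to it.
    return MetadataInstance(new_name, instance.storage_units, instance.store)


shared_store = {0: {"h'": "root"}, 1: {"z": "leaf"}, 2: {"q": "leaf"}}
first = MetadataInstance("first", [0, 1], shared_store)
first.healthy = False  # the first metadata instance is found to be faulty
second = replace_failed(first, "second")
```

  • Creating `second` costs only a reference to the shared units, which corresponds to the reduced creation delay noted in the text.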
  • The above description takes metadata instance management for metadata in preset storage units as an example; the creation and management of metadata instances are not limited to this.
  • The storage system may include a hardware structure and/or a software module, and achieves the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is executed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
  • FIG. 11 shows a schematic structural diagram of an apparatus 1100 for managing metadata of a storage system.
  • The apparatus 1100 for managing metadata of the storage system may be the device where the management module 111 in the embodiment shown in FIG. 3, FIG. 6, FIG. 9, or FIG. 10 is located, or may be an apparatus located in that device, and is used to realize the functions of the management module 111.
  • the apparatus 1100 for managing metadata of the storage system may be a hardware structure or a hardware structure plus a software module.
  • the device 1100 for managing metadata of the storage system includes at least one memory for storing program instructions and/or data.
  • the apparatus 1100 for managing metadata of the storage system further includes at least one processor, the at least one processor is coupled to the memory, and the at least one processor can execute the program instructions stored in the memory.
  • the apparatus 1100 for managing metadata of a storage system may include a generating unit 1101, a determining unit 1102, and an executing unit 1103.
  • the generating unit 1101 may call the processor to execute the program instructions stored in the memory to execute step S61 in the embodiment shown in FIG. 6 and/or other processes for supporting the technology described herein.
  • The determining unit 1102 may call the processor to execute the program instructions stored in the memory to execute step S32 in the embodiment shown in FIG. 3, or step S62 in the embodiment shown in FIG. 6, or step S91 in the embodiment shown in FIG. 9, and/or other processes used to support the technology described herein.
  • The execution unit 1103 may call the processor to execute the program instructions stored in the memory to execute step S33 in the embodiment shown in FIG. 3, or steps S63 to S65 in the embodiment shown in FIG. 6, or steps S92 to S93 in the embodiment shown in FIG. 9, or steps S101 to S102 in the embodiment shown in FIG. 10, and/or other processes used to support the technology described herein.
  • The apparatus 1100 for managing metadata of the storage system may further include a receiving unit 1104, which may call the processor to execute the program instructions stored in the memory to execute step S31 in the embodiment shown in FIG. 3, and/or other processes used to support the techniques described herein.
  • the receiving unit 1104 is used for the storage system metadata management device 1100 to communicate with other modules, and it can be a circuit, a device, an interface, a bus, a software module, a transceiver, or any other device that can implement communication.
  • the receiving unit 1104 is not necessary. In FIG. 11, the receiving unit 1104 is represented by a dotted line.
  • the division of modules in the embodiment shown in FIG. 11 is illustrative, and is only a logical function division. In actual implementation, there may be other division methods.
  • The functional modules in each embodiment of the present application may be integrated in one processor, may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software function modules.
  • FIG. 12 shows an apparatus 1200 for managing metadata of a storage system provided by an embodiment of the present application.
  • The apparatus 1200 for managing metadata of a storage system may be the device where the management module 111 in the embodiment shown in FIG. 3, FIG. 6, FIG. 9, or FIG. 10 is located, or an apparatus in that device, and can be used to implement the functions of the management module 111.
  • the apparatus 1200 for managing metadata of a storage system includes at least one processor 1220, and the apparatus 1200 for managing metadata of a storage system is used to implement or support the function of the management module 111 in the method provided in the embodiment of the present application.
  • the processor 1220 may determine a storage unit for storing metadata. For details, refer to the detailed description in the method example, which is not repeated here.
  • the apparatus 1200 for managing metadata of the storage system may further include at least one memory 1230 for storing program instructions and/or data.
  • the memory 1230 and the processor 1220 are coupled.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units or modules, and may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the processor 1220 may operate in cooperation with the memory 1230.
  • the processor 1220 may execute program instructions stored in the memory 1230. At least one of the at least one memory may be included in the processor.
  • the apparatus 1200 for managing metadata of the storage system may further include a communication interface 1210 for communicating with other devices through a transmission medium, so that the apparatus 1200 for managing metadata of the storage system may communicate with other devices.
  • the other device may be a client or a storage device.
  • the processor 1220 may use the communication interface 1210 to send and receive data.
  • the embodiment of the present application does not limit the specific connection medium between the aforementioned communication interface 1210, the processor 1220, and the memory 1230.
  • the memory 1230, the processor 1220, and the communication interface 1210 are connected by a bus 1250.
  • The bus is represented by a thick line in FIG. 12, and this is not limiting.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 12 to represent it, but it does not mean that there is only one bus or one type of bus.
  • The processor 1220 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • The memory 1230 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, for example, random-access memory (RAM).
  • the memory is any other medium that can be used to carry or store desired program codes in the form of instructions or data structures and that can be accessed by a computer, but is not limited to this.
  • the memory in the embodiments of the present application may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.
  • The embodiments of the present application also provide a computer-readable storage medium, including instructions, which when run on a computer, cause the computer to execute the method executed by the management module 111 in the embodiment shown in FIG. 3, FIG. 6, FIG. 9, or FIG. 10.
  • the embodiments of the present application also provide a computer program product, including instructions, which when run on a computer, cause the computer to execute the method executed by the management module 111 in the embodiment shown in FIG. 3 or FIG. 6 or FIG. 9 or FIG. 10 .
  • the embodiment of the present application provides a storage system, and the storage system includes the management module 111 in the embodiment shown in FIG. 3 or FIG. 6 or FIG. 9 or FIG. 10.
  • the methods provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented by software, the methods can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • a computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, an SSD).

Abstract

Provided are a method and apparatus for managing metadata in a storage system; said storage system comprises a plurality of storage units, and each storage unit is mapped to a physical storage space corresponding to at least two storage devices comprised by the storage system, that is to say, the storage unit is a logical storage unit; in the method, after metadata corresponding to data to be written is generated in the storage system, the storage unit used for storing the metadata is determined from among the plurality of storage units comprised by the storage system, thus the metadata is stored in at least two storage devices corresponding to the determined storage unit. Each storage unit is mapped to a physical storage space corresponding to at least two storage devices; thus if one of the plurality of storage devices corresponding to one storage unit fails, the metadata can also be recovered from the remaining storage device corresponding to the storage unit, thereby achieving metadata redundancy protection.

Description

Method and apparatus for managing metadata in a storage system
This application claims priority to Chinese patent application No. 201911072812.6, filed with the China Patent Office on November 5, 2019 and entitled "A hard disk", and to Chinese patent application No. 202010021351.6, filed with the China Patent Office on January 9, 2020 and entitled "Method and apparatus for managing metadata in a storage system", both of which are incorporated herein by reference in their entirety.
Technical field
This application relates to the field of storage technology, and in particular to a method and apparatus for managing metadata in a storage system.
Background
In a storage system, to ensure the reliability of the stored data and metadata, it is usually necessary to perform redundancy protection on the data and the metadata.
Taking redundancy protection of metadata as an example, one approach is to create multiple metadata instances in the physical storage space, with each instance holding one copy of the metadata. When one of the metadata instances fails, the metadata of the data stored in the physical storage space can still be obtained through the other metadata instances, which achieves redundancy protection of that metadata. A metadata instance can be understood as program code used to implement a metadata-based value-added service, for example a service for taking snapshots of metadata or a service for cloning metadata.
In the above technical solution, redundancy protection of metadata requires creating multiple metadata instances, so the implementation is complicated.
Summary of the invention
This application provides a method and apparatus for managing metadata in a storage system, which are used to simplify the steps of performing redundancy protection on metadata.
In a first aspect, a method for managing metadata in a storage system is provided. The storage system includes a plurality of storage units, and each storage unit is mapped to the physical storage space corresponding to at least two storage devices included in the storage system; that is, the storage unit is a logical storage unit. In the method, after metadata corresponding to data to be written is generated in the storage system, a storage unit for storing the metadata is determined from the plurality of storage units included in the storage system, and the metadata is then stored in the at least two storage devices corresponding to the determined storage unit.
In the above technical solution, each storage unit is mapped to the physical storage space corresponding to at least two storage devices. If one of the storage devices corresponding to a storage unit fails, the metadata can still be recovered from the remaining storage devices corresponding to that storage unit, which achieves redundancy protection of the metadata. Therefore, the embodiments of this application do not need to create multiple metadata instances storing the same metadata, and provide a simpler method for redundancy protection of metadata.
In a possible design, the storage unit may store the metadata in an append-write manner.
Append writing improves the efficiency of writing metadata. In addition, after new data is appended to the storage system, old data (that is, previously stored data) may be determined to be invalid, so multiple consecutive pieces of previously stored old data may all be invalid. The multiple consecutive storage units corresponding to that invalid data are then all storage units that need garbage collection, which reduces the overhead of garbage collection.
In a possible design, before the storage unit for storing the metadata is determined, a data write request for writing the data to be written into the storage system may be received, and a record item corresponding to the metadata may be generated according to the data write request and the metadata. The record item includes the write operation corresponding to the data write request and the metadata updated after the write operation is executed.
In this way, when the storage unit used to store the metadata fails, the metadata as it was before the failure can be recovered from the content of the record item, which increases the stability of the storage system.
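The recovery described above can be sketched as replaying the record items in order. The record layout (`operation`, `key`, `metadata`) and the function name `replay` are assumptions made for illustration; the application only states that a record item contains the write operation and the metadata updated by it.

```python
def replay(record_items):
    """Rebuild the latest metadata by applying record items in log order."""
    metadata = {}
    for item in record_items:
        if item["operation"] == "write":
            # A later record item for the same key replaces the earlier one.
            metadata[item["key"]] = item["metadata"]
    return metadata


# Hypothetical log: the second write to "lun0/0" updates the first.
log = [
    {"operation": "write", "key": "lun0/0", "metadata": {"unit": 0, "block": 0}},
    {"operation": "write", "key": "lun0/8", "metadata": {"unit": 0, "block": 2}},
    {"operation": "write", "key": "lun0/0", "metadata": {"unit": 1, "block": 0}},
]
recovered = replay(log)
```

Replaying in order matters because each record item carries the metadata as it was after its own write operation.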
In a possible design, the metadata includes:
the correspondence between the logical address and the physical address of each fragment of the data to be written, and the correspondence between the logical address of the storage unit occupied by the data to be written and the logical addresses of the fragments contained in the data to be written, where the logical address of each fragment is the logical address corresponding to the storage unit occupied by that fragment; or,
the metadata includes:
the correspondence between the logical address and the physical address of each copy of the data to be written, and the correspondence between the logical address of the data to be written and the logical addresses of the copies contained in the data to be written, where the logical address of each copy is the logical address corresponding to the storage unit occupied by that copy;
the set of the logical addresses of the fragments contained in the data to be written, or the logical address of each copy contained in the data to be written, is the logical address of the data to be written.
In the above technical solution, the metadata can record a variety of different contents according to actual usage requirements, which increases the flexibility and applicability of the storage system.
在一种可能的设计中,存储系统还可以创建用于对预设的存储单元中的元数据进行业务操作的第一元数据实例。In a possible design, the storage system may also create a first metadata instance for performing business operations on metadata in a preset storage unit.
在上述技术方案中,元数据实例不再是对预设的物理存储空间中的元数据进行业务操作,而是对预设的存储单元中的元数据进行操作,提供了一种新的元数据实例的创建方式。In the above technical solution, the metadata instance is no longer to perform business operations on the metadata in the preset physical storage space, but to operate on the metadata in the preset storage unit, providing a new kind of metadata How the instance was created.
在一种可能的设计中,在所述第一元数据实例发生故障后,可以创建第二元数据实例,所述第二元数据实例能够访问所述预设的存储单元中存储的元数据。In a possible design, after the first metadata instance fails, a second metadata instance may be created, and the second metadata instance can access the metadata stored in the preset storage unit.
在上述技术方案中,当创建新的元数据实例后,该新的元数据实例可以直接使用共享的存储单元中的元数据,减少了向新的元数据实例复制并传输元数据的过程,可以减少创建新的元数据实例的时延,提高效率。进一步,由于多个元数据实例之间不用传输元数据,从而可以节省传输资源。In the above technical solution, when a new metadata instance is created, the new metadata instance can directly use the metadata in the shared storage unit, which reduces the process of copying and transmitting metadata to the new metadata instance. Reduce the time delay of creating a new metadata instance and improve efficiency. Furthermore, since there is no need to transmit metadata between multiple metadata instances, transmission resources can be saved.
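The takeover described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `SharedStorageUnit` standing in for the preset storage unit; the class names, the key format, and the `fail_over` helper are illustrative assumptions and do not come from the application itself.

```python
class SharedStorageUnit:
    """Hypothetical stand-in for a preset storage unit that holds metadata."""

    def __init__(self):
        self.records = []  # (key, value) metadata records, oldest first

    def append(self, key, value):
        self.records.append((key, value))


class MetadataInstance:
    """Hypothetical metadata instance bound to a storage unit, not to a private copy."""

    def __init__(self, name, unit):
        self.name = name
        self.unit = unit
        self.alive = True

    def put(self, key, value):
        self.unit.append(key, value)

    def get(self, key):
        # Scan from newest to oldest so the latest record for a key wins.
        for k, v in reversed(self.unit.records):
            if k == key:
                return v
        return None


def fail_over(failed, new_name):
    """Create a replacement instance that reuses the failed instance's storage unit."""
    failed.alive = False
    return MetadataInstance(new_name, failed.unit)  # no metadata is copied


unit = SharedStorageUnit()
first = MetadataInstance("instance-1", unit)
first.put("lu0@0", "unit0/blocks4-7")

second = fail_over(first, "instance-2")
print(second.get("lu0@0"))  # prints "unit0/blocks4-7"
```

Because the second instance holds only a reference to the same storage unit, nothing is transferred between instances, which is the saving the paragraph above describes.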
In a second aspect, a metadata management apparatus in a storage system is provided. The management apparatus may be a management node or a management server, or may be an apparatus in a management node or a management server. The management apparatus includes a processor configured to implement the method described in the first aspect. The management apparatus may further include a memory configured to store program instructions and data. The memory is coupled to the processor, and the processor can invoke and execute the program instructions stored in the memory to implement any one of the methods described in the first aspect.
In a possible design, the processor of the metadata management apparatus executes the program instructions in the memory to implement the following functions:
generating metadata corresponding to data to be written;
determining a storage unit for storing the metadata, where the storage system includes a plurality of storage units, and each storage unit is mapped to physical storage space corresponding to at least two storage devices included in the storage system; and
storing the metadata in the at least two storage devices corresponding to the storage unit.
In a possible design, the storage unit stores the metadata in append-write mode.
In a possible design, the processor executes the program instructions stored in the memory to implement the following functions:
receiving a data write request, where the data write request is used to write the data to be written into the storage system; and
generating, according to the data write request and the metadata, a record item corresponding to the metadata, where the record item includes the write-data operation corresponding to the data write request and the metadata updated after the write-data operation is performed.
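The append-write design and the record item described above can be sketched as follows. This is a minimal illustration under stated assumptions: `RecordItem`, `AppendOnlyUnit`, the string form of the operation, and the dictionary form of the metadata are all hypothetical and are not defined by the application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecordItem:
    """Hypothetical record item: one write operation plus the metadata it produced."""
    operation: str          # description of the write-data operation
    updated_metadata: dict  # metadata as updated by that operation


@dataclass
class AppendOnlyUnit:
    """Hypothetical storage unit that only ever appends record items."""
    items: list = field(default_factory=list)

    def append(self, item):
        self.items.append(item)      # existing items are never overwritten
        return len(self.items) - 1   # position of the new item in the unit

    def current_metadata(self):
        # Replay all record items in order; later items supersede earlier ones.
        state = {}
        for item in self.items:
            state.update(item.updated_metadata)
        return state


unit = AppendOnlyUnit()
unit.append(RecordItem("write lu0@0", {"lu0@0": "unit0/blocks4-7"}))
unit.append(RecordItem("write lu0@0", {"lu0@0": "unit1/blocks0-3"}))
print(unit.current_metadata())  # prints {'lu0@0': 'unit1/blocks0-3'}
```

Keeping the superseded item in place rather than overwriting it is what makes the write an append; reclaiming such stale items is a separate concern.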
In a possible design, the description of the metadata is similar to the corresponding content in the first aspect, and details are not repeated here.
In a possible design, the processor executes the program instructions stored in the memory to implement the following function:
creating a first metadata instance, where the first metadata instance is used to perform service operations on metadata in a preset storage unit.
In a possible design, the processor executes the program instructions stored in the memory to implement the following function:
after the first metadata instance fails, creating a second metadata instance, where the second metadata instance can access the metadata stored in the preset storage unit.
In a third aspect, a metadata management apparatus in a storage system is provided. The management apparatus may be a management node or a management server, or may be an apparatus in a management node or a management server. The management apparatus may include a generating unit, a determining unit, and an execution unit, and these units can perform the corresponding functions in any design example of the first aspect. Specifically:
the generating unit is configured to generate metadata corresponding to data to be written;
the determining unit is configured to determine a storage unit for storing the metadata, where the storage system includes a plurality of storage units, and each storage unit is mapped to physical storage space corresponding to at least two storage devices included in the storage system; and
the execution unit is configured to store the metadata in the at least two storage devices corresponding to the storage unit.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a computer, cause the computer to perform the method according to any one of the designs of the first aspect.
In a fifth aspect, an embodiment of this application provides a computer program product. The computer program product stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a computer, cause the computer to perform the method according to any one of the designs of the first aspect.
In a sixth aspect, this application provides a chip system. The chip system includes a processor, may further include a memory, and is configured to implement the method described in the first aspect. The chip system may consist of a chip, or may include a chip and other discrete components.
In a seventh aspect, an embodiment of this application provides a storage system. The storage system includes the metadata management apparatus described in the second aspect or in any design of the second aspect, or the storage system includes the metadata management apparatus described in the third aspect or in any design of the third aspect.
For the beneficial effects of the second to seventh aspects and their implementations, refer to the description of the beneficial effects of the method of the first aspect and its implementations.
Description of the drawings
FIG. 1 is a schematic diagram of an example of an application scenario of an embodiment of this application;
FIG. 2 is a schematic structural diagram of an example of a storage unit provided in this embodiment;
FIG. 3 is a flowchart of the data storage process in an embodiment of this application;
FIG. 4 is a schematic diagram of an example of multiple stripes included in a storage unit in an embodiment of this application;
FIG. 5 is a schematic diagram of an example of the mapping relationship between a storage unit and storage devices in an embodiment of this application;
FIG. 6 is a flowchart of the metadata storage process in an embodiment of this application;
FIG. 7 is a schematic diagram of an example of writing metadata to a storage unit in an embodiment of this application;
FIG. 8 is a schematic diagram of an example of a metadata structure in an embodiment of this application;
FIG. 9 is a flowchart of the metadata garbage collection process in an embodiment of this application;
FIG. 10 is a flowchart of the metadata instance management process in an embodiment of this application;
FIG. 11 is a schematic structural diagram of an example of a metadata management apparatus of a storage system provided in an embodiment of this application;
FIG. 12 is a schematic structural diagram of another example of a metadata management apparatus of a storage system provided in an embodiment of this application.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the embodiments are described in further detail below with reference to the accompanying drawings.
In the embodiments of this application, "multiple" means two or more; accordingly, "multiple" may also be understood as "at least two". "At least one" means one or more, for example one, two, or more. For example, "including at least one" means including one, two, or more, without limiting which ones are included: if at least one of A, B, and C is included, then what is included may be A; B; C; A and B; A and C; B and C; or A, B, and C. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that both A and B exist, or that B exists alone. In addition, unless otherwise specified, the character "/" generally indicates an "or" relationship between the objects before and after it. In the embodiments of this application, the two terms translated here as "node" may be used interchangeably.
Unless otherwise stated, ordinal numbers such as "first" and "second" in the embodiments of this application are used to distinguish between multiple objects and are not used to limit the order, timing, priority, or importance of those objects.
The metadata management method provided in the embodiments of this application can be applied to various storage systems, for example a centralized storage system, a distributed storage system, or a cloud storage system such as a public cloud or a private cloud, which is not limited here. For ease of description, the following uses the application of the metadata management method to a distributed storage system as an example.
Refer to FIG. 1, which is a schematic diagram of an example of an application scenario provided by an embodiment of this application. FIG. 1 includes a client server 100 and a storage system 110, and the client server 100 communicates with the storage system 110. The storage system 110 includes a management module 111 and at least one storage node 112 (FIG. 1 uses three storage nodes 112, namely storage node 1 to storage node 3, as an example). The management module 111 is configured to write data to each storage node 112 and to read data from the at least one storage node 112.
The storage node 112 in FIG. 1 may be an independent server, or may be a storage array including at least one storage device. The storage device may be a hard disk drive (HDD) disk device, a solid state drive (SSD) disk device, a serial advanced technology attachment (SATA) disk device, a small computer system interface (SCSI) disk device, a serial attached SCSI (SAS) disk device, a fibre channel (FC) disk device, or the like.
The management module 111 and the at least one storage node 112 in FIG. 1 may be mutually independent devices; for example, the management module 111 is an independent server. Alternatively, the management module 111 may be a software module deployed on one of the storage nodes 112; for example, the management module 111 and one storage node 112 run on the same server. The specific forms of the management module 111 and the storage node 112 are not limited here.
In this embodiment, each storage node contains at least one storage unit. A storage unit is a segment of logical space that is obtained by mapping the physical space of the storage devices included in the storage nodes; that is, the actual physical space still comes from multiple storage nodes.
Refer to FIG. 2, which is a schematic structural diagram of an example of the storage unit provided in this embodiment. In FIG. 2, a storage unit is a set of multiple logical blocks. A logical block is a logical-space concept obtained by dividing the space of a storage device. The size of a logical block may be 4 KB, 8 KB, or another value, and is not limited here. Each logical block corresponds to a physical storage space of the same size in a storage device. It should be noted that the logical blocks included in one storage unit come from multiple storage devices, and those storage devices may belong to different storage nodes or to the same storage node, which is not limited here.
Take the case in which the logical blocks included in one storage unit come from storage devices of the same storage node 112 in the storage system. As one example, the storage node 112 may, according to a configured redundant array of independent disks (RAID) type, map logical blocks in the storage unit's logical block set to data storage units for storing data fragments, generate parity fragments from the data fragments stored in those logical blocks, and store the parity fragments in parity storage units. The data storage units and the parity storage units form a stripe, and a storage unit contains one or more stripes. The data storage units include at least two logical blocks, and the parity storage units include at least one logical block. For example, the storage node 112 takes one logical block from each of four storage devices, for example storage device A to storage device D, and these four logical blocks form the data storage units of one stripe; it then takes one logical block from each of two other storage devices to form the parity storage units.
In this way, when any two logical blocks in the stripe fail (the two may both belong to data storage units, both belong to parity storage units, or one may belong to each), the data in the failed logical blocks can be reconstructed from the data in the remaining logical blocks.
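The reconstruction idea can be illustrated with a deliberately simplified sketch. Note the substitution: the example in the text uses four data blocks plus two parity blocks and tolerates two failures, which requires a second, independent parity computation (for example a Reed-Solomon style code). The sketch below instead uses a single XOR parity block, which tolerates only one failed block per stripe. The function names and block contents are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe, lost_index):
    """Rebuild the block at lost_index from all surviving blocks of the stripe."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Four 8-byte data blocks (stand-ins for logical blocks on storage devices A to D).
data = [bytes([value] * 8) for value in (1, 2, 3, 4)]
parity = xor_blocks(data)   # single XOR parity block (simplification)
stripe = data + [parity]

# Losing any one block, data or parity, can be repaired from the other four.
assert reconstruct(stripe, 2) == data[2]
assert reconstruct(stripe, 4) == parity
```

The XOR of all blocks in such a stripe is zero, so the XOR of the survivors equals the missing block; a two-parity layout extends the same principle with a second equation.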
As another example, the storage node 112 may, according to a configured multi-copy type, divide the logical blocks in the storage unit's logical block set into copy units. Each copy unit includes at least one logical block in which data is stored, and all copy units store the same data. For example, if one copy unit includes two logical blocks, the storage node 112 takes one logical block from each of two storage devices to form one copy unit. Assuming that the multi-copy type is three copies, that is, each piece of data must be stored three times, the storage node 112 can take one logical block from each of four other storage devices and group every two of these logical blocks into a copy unit, yielding two more copy units. The three copy units store the same data. In this way, when any one copy unit fails, the data can be obtained from the other two copy units.
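A minimal sketch of the copy-unit grouping in this example follows. The device names and the helper functions `build_copy_units` and `first_healthy_unit` are hypothetical; the sketch only shows how six logical blocks, one per storage device, could be grouped into three copy units of two blocks each.

```python
def build_copy_units(devices, copies, blocks_per_unit):
    """Group one logical block per device into `copies` copy units."""
    needed = copies * blocks_per_unit
    if len(devices) < needed:
        raise ValueError(f"need {needed} devices, got {len(devices)}")
    return [
        devices[i * blocks_per_unit:(i + 1) * blocks_per_unit]
        for i in range(copies)
    ]


def first_healthy_unit(copy_units, failed_units):
    """Return the index of a copy unit that can still serve reads, or None."""
    for index in range(len(copy_units)):
        if index not in failed_units:
            return index
    return None


devices = ["dev1", "dev2", "dev3", "dev4", "dev5", "dev6"]  # hypothetical names
units = build_copy_units(devices, copies=3, blocks_per_unit=2)
print(units)  # prints [['dev1', 'dev2'], ['dev3', 'dev4'], ['dev5', 'dev6']]
print(first_healthy_unit(units, failed_units={0}))  # prints 1
```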
The following describes the metadata management method provided in the embodiments of this application, using the application scenario shown in FIG. 1 as an example. For ease of understanding, the technical solutions of the embodiments are introduced in the following four aspects. In the following description, the steps performed by the storage system 110 may all be performed by the management module 111 of the storage system 110.
In the first aspect, the data storage process.
Refer to FIG. 3, which is a flowchart of the data storage process in an embodiment of this application. The flowchart is described as follows:
S31. The client server 100 sends a data write request to the storage system 110.
The data write request includes the data to be written and the virtual storage address of the data to be written. The virtual storage address refers to the identifier and offset of the logical unit (LU) into which the data is to be written, and is an address visible to the client server 100. The data write request may be obtained by the client server 100 in response to a user operation, or may be generated according to system requirements during operation.
S32. The storage system 110 determines a storage unit for storing the data to be written.
After receiving the data write request, the management module 111 of the storage system 110 determines the storage unit for the data to be written according to the usage of the storage units in the storage system 110 and the size of the data to be written carried in the data write request.
As one example, assume that the data to be written is 1 MB and that each storage unit is 1 MB; the storage system 110 then determines that the data needs to occupy one storage unit. If the storage system 110 determines that no data had been stored before the data write request was received, it determines that the storage unit occupied by the data is storage unit 0. In this example the first storage unit is storage unit 0; in other embodiments the first storage unit may instead be storage unit 1, which is not limited here.
As another example, one storage unit may contain multiple stripes; that is, the data storage units of one stripe include only some of the logical blocks in the storage unit's logical block set. Referring to FIG. 4, one storage unit contains 3 stripes; if one stripe stores 32 KB of data, the size of one storage unit is 96 KB. If the data to be written is smaller than one storage unit, the data may be stored in some of the logical blocks of a storage unit, for example in the logical blocks corresponding to at least one stripe. For example, one storage unit includes 12 logical blocks and every 4 logical blocks correspond to one stripe, that is, every 4 logical blocks can store 32 KB of data. If the data to be written is 32 KB and the storage system 110 determines that, before the data write request was received, data had already been stored in the first 4 logical blocks of storage unit 0 (logical block 0 to logical block 3), it can determine that the data to be written is to be stored in logical block 4 to logical block 7 of storage unit 0.
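The placement arithmetic in this example can be written out as a short sketch. The constants mirror the numbers in the text (8 KB logical blocks, 4 data blocks per stripe, 3 stripes per storage unit); the function `place` and its whole-stripe allocation policy are assumptions for illustration, since the text does not prescribe an allocation algorithm.

```python
BLOCK_SIZE = 8 * 1024      # 8 KB per logical block (32 KB / 4 blocks)
BLOCKS_PER_STRIPE = 4      # data blocks per stripe
STRIPES_PER_UNIT = 3       # stripes per storage unit, as in FIG. 4


def place(data_size, used_blocks):
    """Return the data block indices for a write, or None if the unit is too full.

    Assumption: space is handed out in whole stripes, in ascending order.
    """
    stripe_capacity = BLOCK_SIZE * BLOCKS_PER_STRIPE
    stripes_needed = -(-data_size // stripe_capacity)         # ceiling division
    first_free_stripe = -(-used_blocks // BLOCKS_PER_STRIPE)  # ceiling division
    if first_free_stripe + stripes_needed > STRIPES_PER_UNIT:
        return None
    start = first_free_stripe * BLOCKS_PER_STRIPE
    return list(range(start, start + stripes_needed * BLOCKS_PER_STRIPE))


# 32 KB of new data when logical blocks 0 to 3 are already occupied.
print(place(32 * 1024, used_blocks=4))  # prints [4, 5, 6, 7]
```

With these constants the unit capacity is 3 x 32 KB = 96 KB, matching the figure given in the text.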
It should be noted that, in actual use, one storage unit may correspond to more than 3 stripes, for example dozens or hundreds of stripes. The number of stripes shown in FIG. 4 is only an example and should not be understood as a limitation on the storage unit.
In other implementations, each storage device may provide the storage unit with a segment of logical addresses rather than logical blocks; in this case the storage unit is a set of multiple logical address segments.
S33. The storage system 110 stores the data to be written according to the storage unit determined for it.
The management module 111 of the storage system 110 stores in advance the mapping relationship between each storage unit and the storage devices of the storage nodes. After the storage unit for the data to be written is determined, the data is written to the corresponding storage nodes according to this mapping relationship.
As one example, the management module 111 of the storage system 110 stores the data written to a storage unit according to a preset RAID type. Referring again to FIG. 4, storage unit 0 includes 12 logical blocks, every 4 of which correspond to one stripe and are used to store data fragments. For example, logical block 0 to logical block 3 are the logical blocks that store data fragments in the first stripe, logical block 4 to logical block 7 are those in the second stripe, and logical block 8 to logical block 11 are those in the third stripe. Each stripe further includes logical blocks for storing parity fragments: the first stripe also includes logical block P0 and logical block Q0, the second stripe also includes logical block P1 and logical block Q1, and the third stripe also includes logical block P2 and logical block Q2.
The storage system 110 presets the mapping relationship between the logical blocks of each stripe and the storage devices of the storage nodes. For example, the mapping relationship is: the 4 logical blocks that store data fragments in each stripe correspond in turn to storage device A in storage node 1 to storage node 4, and the logical blocks that store parity fragments in each stripe correspond in turn to storage device A in storage node 5 and storage node 6. In FIG. 4, among the stripes of one storage unit, logical blocks at the same position come from the same storage node. For example, the storage unit shown in FIG. 4 includes 3 stripes, where the first stripe includes logical block 0 to logical block 3, logical block P0, and logical block Q0, and the second stripe includes logical block 4 to logical block 7, logical block P1, and logical block Q1. Logical block 0 and logical block 4 are then at the same position, logical block 1 and logical block 5 are at the same position, and so on.
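The example mapping above reduces to a position-in-stripe rule, sketched below. The function `node_for_block` and the string labels are hypothetical; the rule itself (data blocks at stripe position 0 to 3 go to storage node 1 to 4, P blocks to storage node 5, Q blocks to storage node 6, all on storage device A) is taken from the example in the text.

```python
DATA_BLOCKS_PER_STRIPE = 4


def node_for_block(label):
    """Map a logical block label ("0".."11", "P0", "Q1", ...) to (node, device)."""
    if label.startswith("P"):
        return ("storage node 5", "storage device A")
    if label.startswith("Q"):
        return ("storage node 6", "storage device A")
    position = int(label) % DATA_BLOCKS_PER_STRIPE  # position within its stripe
    return (f"storage node {position + 1}", "storage device A")


# Blocks at the same position in different stripes land on the same node.
print(node_for_block("0"))   # prints ('storage node 1', 'storage device A')
print(node_for_block("4"))   # prints ('storage node 1', 'storage device A')
print(node_for_block("P1"))  # prints ('storage node 5', 'storage device A')
```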
After the management module 111 receives the data to be written, it can split the data into multiple data fragments according to the preset RAID type, compute the parity fragments, and store the data fragments and parity fragments in the storage devices corresponding to the respective logical blocks. For example, if the data to be written is 32 KB and it is determined that the data is to be stored in logical block 4 to logical block 7, the management module 111 divides the data into 4 data fragments of 8 KB each and then computes 2 parity fragments from these 4 data fragments, each parity fragment also being 8 KB. The management module 111 then sends each data fragment and parity fragment to the corresponding storage node for persistent storage. With the mapping relationship described above, the management module 111 sends the 4 data fragments to storage node 1 to storage node 4 respectively and sends the 2 parity fragments to storage node 5 and storage node 6 respectively, and each storage node stores the received fragment in its preset storage device.
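The splitting and dispatch step can be sketched as below. The sketch covers only the data fragments; computing the two parity fragments is omitted because the text does not specify the parity code. The helper names and the dictionary returned by `route_data_fragments` are assumptions for illustration.

```python
def split_into_fragments(data, count):
    """Split data into `count` equally sized fragments."""
    if len(data) % count != 0:
        raise ValueError("data length must be a multiple of the fragment count")
    size = len(data) // count
    return [data[i * size:(i + 1) * size] for i in range(count)]


def route_data_fragments(fragments):
    """Assign data fragment i to storage node i + 1, as in the example mapping."""
    return {f"storage node {i + 1}": fragment for i, fragment in enumerate(fragments)}


data = bytes(range(256)) * 128            # 32 KB of sample data
fragments = split_into_fragments(data, 4)
plan = route_data_fragments(fragments)

print([len(f) for f in fragments])        # prints [8192, 8192, 8192, 8192]
print(sorted(plan))                       # storage node 1 to storage node 4
```

Concatenating the fragments in order yields the original data, which is what a later read would rely on.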
As another example, the management module 111 of the storage system 110 stores the data written to a storage unit according to a preset multi-copy type. Referring to FIG. 5, storage unit 0 includes 12 logical blocks, each of which is used to store data. The storage system 110 presets the mapping relationship between each logical block and the storage devices of the storage nodes. For example, if the multi-copy type is two copies, each logical block can correspond to two different storage devices on one storage node. The mapping relationship is: logical block 0 to logical block 3 correspond in turn to storage device A and storage device B on storage node 1 to storage node 4. The mapping between the other logical blocks and the storage devices may be similar to that of logical block 0 to logical block 3 and is not repeated here.
After the management module 111 receives the data to be written, it can copy the data according to the preset multi-copy type and store the data to be written together with the copies in the storage devices corresponding to the respective logical blocks. For example, the data to be written is 32 KB, each logical block is 8 KB, and it is determined that the data is to be written into logical block 0 to logical block 3. The management module 111 divides the data into 4 parts of 8 KB each, copies the 4 parts to obtain 8 parts in total, and sends the 8 parts to the corresponding storage nodes for persistent storage. With the mapping relationship described above, the management module 111 sends each pair of identical parts to one of storage node 1 to storage node 4, and each storage node stores the two parts in its preset storage devices.
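A minimal sketch of this two-copy write follows: 32 KB is divided into 4 parts of 8 KB, each part is duplicated, and the pair goes to storage device A and storage device B of one storage node. The function `plan_two_copy_write` and the tuple layout of its result are hypothetical.

```python
def plan_two_copy_write(data, part_count=4):
    """Return (node, device, part) tuples for a two-copy write."""
    if len(data) % part_count != 0:
        raise ValueError("data length must be a multiple of the part count")
    size = len(data) // part_count
    plan = []
    for i in range(part_count):
        part = data[i * size:(i + 1) * size]
        node = f"storage node {i + 1}"
        plan.append((node, "storage device A", part))  # first copy
        plan.append((node, "storage device B", part))  # identical second copy
    return plan


data = bytes(range(256)) * 128   # 32 KB of sample data
plan = plan_two_copy_write(data)

print(len(plan))                 # prints 8
print(plan[0][:2], plan[1][:2])
```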
Logically, the data to be written is written into a storage unit of the storage system 110. Physically, the data is ultimately still stored in multiple storage nodes. For each fragment, the identifier of the storage unit in which it resides, together with its position inside that storage unit, is the logical address of the fragment, and the actual address of the fragment in the storage node is the physical address of the fragment.
In the second aspect, the metadata storage process.
After the data to be written is stored in the storage devices, the storage system 110 also needs to store description information of the data so that the data can later be found or read. When a storage node receives a data read request, it usually locates the metadata of the data to be read according to information carried in the request (for example, a data name or a virtual address), and then obtains the data according to that metadata. The metadata includes, but is not limited to: the correspondence between the logical address and the physical address of each fragment; the correspondence between the logical address of the data and the logical addresses of the fragments contained in the data; the correspondence between the logical address and the physical address of each copy; and the correspondence between the logical address of the data and the logical addresses of the copies of the data. The set of logical addresses of the fragments contained in the data, or the logical addresses of the copies, is the logical address of the data.
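The two correspondences listed above form a two-level lookup, which can be sketched as follows. All names here (`LogicalAddress`, `PhysicalAddress`, the key `"lu0@0"`, and the offsets) are hypothetical illustrations; the application does not define a concrete metadata layout in this passage.

```python
from typing import NamedTuple


class LogicalAddress(NamedTuple):
    unit_id: int   # identifier of the storage unit holding the fragment
    block: int     # position of the fragment inside that storage unit


class PhysicalAddress(NamedTuple):
    node: str
    device: str
    offset: int    # hypothetical byte offset on the device


# Level 1: logical address of the data -> logical addresses of its fragments.
data_to_fragments = {
    "lu0@0": [LogicalAddress(0, block) for block in range(4, 8)],
}

# Level 2: logical address of each fragment -> its physical address.
fragment_to_physical = {
    LogicalAddress(0, block): PhysicalAddress(
        f"storage node {block % 4 + 1}", "storage device A", 8192
    )
    for block in range(4, 8)
}


def resolve(virtual_address):
    """Follow both metadata levels to find where each fragment physically lives."""
    return [fragment_to_physical[la] for la in data_to_fragments[virtual_address]]


for address in resolve("lu0@0"):
    print(address)
```

A read request carrying the virtual address is thus resolved in two steps: first to the fragments' logical addresses, then to their physical locations.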
Refer to FIG. 6, which is a flowchart of the metadata storage process in an embodiment of this application. The flowchart is described as follows:
S61. The storage system 110 generates metadata.
After the data to be written is stored in the storage system 110, the management module 111 of the storage system 110 generates the metadata of the data to be written. For example, in the embodiment shown in FIG. 3, the management module 111 stores the data to be written in logical block 0 to logical block 4 of a storage unit, and then generates the metadata of the data to be written according to information such as the size and the storage address of the data. The specific content of the metadata is not limited here.
S62. The storage system 110 determines a storage unit for storing the metadata.
In the embodiments of this application, the physical storage space that the storage system 110 uses to store data is separate from the physical storage space that it uses to store metadata. Metadata usually occupies less storage space than the data it describes. Therefore, if each storage node includes four storage devices, for example, storage device A to storage device C in each storage node of the storage system 110 may be configured to store data, and storage device D in each storage node may be configured to store metadata. Alternatively, if the storage system 110 includes four storage nodes, all storage devices in storage node 1 to storage node 3 may be configured to store data, and all storage devices in storage node 4 may be configured to store metadata. In the embodiments of this application, a storage unit for storing data and a storage unit for storing metadata are essentially the same; only the stored content differs. In other words, the storage units for storing data and the storage units for storing metadata come from different storage devices.
As an example, after generating the metadata, the management module 111 may determine the storage unit for storing the metadata according to the usage of the storage units that the storage system 110 uses to store metadata. For example, referring to FIG. 7, a storage unit for storing metadata includes six logical blocks, and every two logical blocks correspond to one stripe. If the management module 111 determines that, before the metadata was generated, the first two logical blocks (logical block 0 and logical block 1) of storage unit 0, a storage unit for storing metadata, already held data, the management module 111 may decide to store the generated metadata in logical block 2 and logical block 3 of storage unit 0. This can be understood as storing metadata in the storage unit in an append-write manner.
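The append-write selection in this example can be sketched as follows. This is a minimal illustration, assuming that each storage unit tracks only the index of its first unused logical block; `MetadataStorageUnit`, `allocate_stripe`, and `choose_blocks` are hypothetical names.

```python
class MetadataStorageUnit:
    """Hypothetical storage unit with a fixed number of logical blocks."""

    def __init__(self, unit_id, num_blocks=6, blocks_per_stripe=2):
        self.unit_id = unit_id
        self.num_blocks = num_blocks
        self.blocks_per_stripe = blocks_per_stripe
        self.next_free = 0  # index of the first unused logical block

    def allocate_stripe(self):
        """Return the logical blocks of the next free stripe, or None if full."""
        end = self.next_free + self.blocks_per_stripe
        if end > self.num_blocks:
            return None
        blocks = list(range(self.next_free, end))
        self.next_free = end
        return blocks


def choose_blocks(units):
    """Pick the first unit that still has a free stripe (append-only)."""
    for unit in units:
        blocks = unit.allocate_stripe()
        if blocks is not None:
            return unit.unit_id, blocks
    raise RuntimeError("no free stripe in any metadata storage unit")


unit0 = MetadataStorageUnit(0)
unit0.allocate_stripe()              # logical blocks 0 and 1 are already in use
target = choose_blocks([unit0])      # the new metadata goes to blocks 2 and 3
```

Because blocks are only ever handed out at the tail, existing metadata is never overwritten in place, which is the property that the garbage collection process relies on later.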
In other embodiments, step S63 may be performed before step S62.
S63. The storage system 110 generates a record item corresponding to the metadata.
After generating the metadata, the management module 111 may obtain a write-ahead log (WAL) record item corresponding to the metadata according to the metadata and the operation corresponding to the metadata. Once the WAL record item is stored in the corresponding storage space, a WAL log is formed.
As an example of the operation corresponding to the metadata: if the metadata is generated according to a data write request sent by the client server 100, the operation corresponding to the metadata is a data write operation. The management module 111 then saves the record item in memory, which can be understood as the memory of the node or server where the management module 111 is located. When the WAL record items recorded in the memory satisfy a preset condition, for example, when the number of WAL record items recorded in the memory reaches a threshold, the management module 111 determines to write the metadata in those WAL record items into storage units, and therefore performs step S62 to determine the storage unit corresponding to the metadata in each WAL record item. The storage unit for the metadata in each WAL record item may be determined in a manner similar to that described for step S62, that is, determined in turn according to the usage of the storage units for storing metadata. Details are not repeated here.
Because a WAL record item records the metadata and the corresponding operation, when a storage unit for storing metadata fails, the metadata as it was before the failure can be recovered from the content of the WAL record items, which improves the stability of the storage system 110.
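The buffering and recovery behaviour of the WAL record items can be sketched as follows. This is an illustration under stated assumptions: the record layout (`op`, `key`, `metadata`), the `delete` operation, and the names `WalBuffer` and `replay` are hypothetical, and the flush callback stands in for steps S62 and S64.

```python
class WalBuffer:
    """Hypothetical in-memory buffer of WAL record items."""

    def __init__(self, threshold, flush):
        self.threshold = threshold  # preset condition: number of buffered records
        self.flush = flush          # callback that writes metadata to storage units
        self.records = []

    def append(self, operation, key, metadata):
        """Buffer one record item; flush when the threshold is reached."""
        self.records.append({"op": operation, "key": key, "metadata": metadata})
        if len(self.records) >= self.threshold:
            self.flush(list(self.records))
            self.records.clear()


def replay(records):
    """Rebuild the metadata state by re-applying record items in order."""
    state = {}
    for rec in records:
        if rec["op"] == "write":
            state[rec["key"]] = rec["metadata"]
        elif rec["op"] == "delete":   # assumed operation, not named in the text
            state.pop(rec["key"], None)
    return state


flushed = []
wal = WalBuffer(threshold=2, flush=flushed.extend)
wal.append("write", "obj-1", {"unit": 0, "blocks": [2, 3]})
wal.append("write", "obj-1", {"unit": 0, "blocks": [4, 5]})
recovered = replay(flushed)
```

Replaying the record items in order yields the latest metadata for each key, which is how the state before a failure could be reconstructed.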
S64. The storage system 110 writes the metadata into the determined storage unit.
Step S64 is similar to step S33. A specific example is described below.
Still referring to FIG. 7, storage unit 0 for storing metadata includes six logical blocks, and every two logical blocks correspond to one stripe: logical block 0 and logical block 1 correspond to the first stripe, logical block 2 and logical block 3 correspond to the second stripe, and logical block 4 and logical block 5 correspond to the third stripe. These are the logical blocks in each stripe that store metadata fragments. Each stripe also includes a logical block for storing check metadata: the first stripe includes logical block P0, the second stripe includes logical block P1, and the third stripe includes logical block P2.
When the management module 111 determines to store the generated metadata in logical block 2 and logical block 3 of storage unit 0, it may split the metadata to be written into multiple metadata fragments according to the preset RAID type, calculate a check fragment, and store the metadata fragments and the check fragment in the storage devices corresponding to the respective logical blocks.
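The splitting and check-fragment calculation can be sketched as follows. The application does not specify the RAID type, so this sketch assumes a simple layout of two data fragments plus one XOR check fragment per stripe (matching the two logical blocks and one P block per stripe in FIG. 7); `split_with_parity` and `rebuild` are hypothetical helper names.

```python
def split_with_parity(payload: bytes, data_fragments: int = 2):
    """Split payload into equal data fragments and one XOR check fragment."""
    size = -(-len(payload) // data_fragments)  # ceiling division
    padded = payload.ljust(size * data_fragments, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(data_fragments)]
    parity = bytearray(size)
    for fragment in fragments:
        for i, byte in enumerate(fragment):
            parity[i] ^= byte
    return fragments, bytes(parity)


def rebuild(fragments, parity, lost_index):
    """Recover one lost data fragment from the survivors and the check fragment."""
    recovered = bytearray(parity)
    for index, fragment in enumerate(fragments):
        if index == lost_index:
            continue  # the lost fragment is not available for reading
        for i, byte in enumerate(fragment):
            recovered[i] ^= byte
    return bytes(recovered)


fragments, parity = split_with_parity(b"metadata-z")
restored = rebuild(fragments, parity, lost_index=1)
```

With XOR parity, any single lost fragment in a stripe can be rebuilt from the remaining fragments, which is the kind of redundancy protection the later comparison with the related art depends on.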
Alternatively, the management module 111 copies each metadata fragment according to a preset multi-copy type, and then stores the metadata fragments and the copied metadata fragments in the respective storage devices. This is similar to step S33 and is not repeated here.
As described above, after generating the metadata, the management module 111 may perform step S62 and step S64, or perform step S62 to step S64, to store the metadata in the corresponding storage devices. In other words, the management module 111 can store metadata in two ways, and it may select between them according to a preset judgment condition. As an example, the preset judgment condition may be whether the metadata is metadata of new data or metadata that updates old data. Metadata of new data can be understood as metadata that does not need to be updated in place, and step S62 and step S64 may be performed for it. Metadata that updates old data can be understood as metadata that needs to be updated in place, and step S62 to step S64 may be performed for it. The preset judgment condition may also be something else, which is not limited here.
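The selection between the two storage paths can be sketched as a small decision function. This is only an illustration of the example judgment condition; the function name `storage_steps` is hypothetical, and the ordering with S63 first follows the statement that step S63 may be performed before step S62.

```python
def storage_steps(is_update: bool):
    """Return the step sequence used to store one piece of metadata."""
    if is_update:
        # Metadata that updates old data: generate the WAL record item first,
        # then determine the storage unit and write the metadata.
        return ["S63", "S62", "S64"]
    # Metadata of new data: determine the storage unit and write directly.
    return ["S62", "S64"]


new_path = storage_steps(is_update=False)
update_path = storage_steps(is_update=True)
```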
S65. The storage system 110 updates the metadata structure.
After writing the metadata into the corresponding storage devices, the management module 111 also needs to update the metadata structure of the storage system 110. In the embodiments of this application, the metadata structure may be a binary tree (Btree) or a log-structured merge-tree (LSM tree), or may be another metadata structure that can be stored in an append-write manner. The metadata structure is not limited here.
For example, FIG. 8(a) shows the Btree corresponding to the metadata already saved in the storage system 110. After the management module 111 stores the metadata in the corresponding storage devices, it may update the Btree according to the content of the metadata of the data to be written. For example, FIG. 8(a) includes metadata h, metadata e, metadata s, metadata a, metadata f, and metadata q. If the metadata corresponding to the data to be written is named metadata z, and metadata z includes metadata s, metadata z is made a child node of metadata s, yielding the Btree shown in FIG. 8(b).
As another example, if the metadata corresponding to the data to be written is named metadata h', and metadata h' includes metadata e and metadata s, metadata h' is made the parent node of metadata e and metadata s, yielding the Btree shown in FIG. 8(c).
Step S65 is optional and is shown with a dashed line in FIG. 6.
Third aspect: the metadata garbage collection process.
To make reasonable use of the storage space in the metadata partition, garbage collection may be started when the storage system 110 contains a large amount of garbage metadata. Refer to FIG. 9, which is a flowchart of the metadata garbage collection process in an embodiment of this application. The flowchart is described as follows:
S91. The storage system 110 determines a storage unit for garbage collection.
In this embodiment, garbage collection is performed in units of storage units. The storage unit for garbage collection may be a storage unit whose garbage metadata reaches a first set threshold, the storage unit that contains the most garbage metadata among the multiple storage units, a storage unit whose valid metadata is below a second set threshold, or the storage unit that contains the least valid metadata among the multiple storage units. For example, in the Btree shown in FIG. 8(c), metadata h and metadata h' are both parent nodes of metadata e and metadata s, and metadata h' was stored after metadata h, so the management module 111 can determine that metadata h is garbage metadata. The logical blocks occupied by metadata h are logical block 1 and logical block 2 of storage unit 0, so storage unit 0 is determined to include two garbage logical blocks. When the number of garbage logical blocks in a storage unit reaches a preset threshold, which may be 3, the storage unit is determined to be a storage unit for garbage collection. For ease of description, the following uses storage unit 0 as the storage unit for garbage collection.
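The selection of a storage unit for garbage collection can be sketched as follows. The text lists several alternative criteria; this sketch combines two of them (most garbage, and a threshold on garbage logical blocks) purely for illustration, and `pick_gc_unit` and its dictionary input are hypothetical.

```python
def pick_gc_unit(garbage_blocks, threshold=3):
    """Choose a storage unit for garbage collection.

    garbage_blocks maps a storage unit id to its number of garbage logical
    blocks. The unit with the most garbage is returned if it reaches the
    threshold; otherwise None is returned and no collection is started.
    """
    if not garbage_blocks:
        return None
    unit_id = max(garbage_blocks, key=garbage_blocks.get)
    return unit_id if garbage_blocks[unit_id] >= threshold else None


below_threshold = pick_gc_unit({0: 2, 1: 1})   # unit 0 has only 2 garbage blocks
selected = pick_gc_unit({0: 4, 1: 2})          # unit 0 now exceeds the threshold
```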
S92. The storage system 110 migrates the valid metadata in the storage unit for garbage collection to another storage unit.
After storage unit 0 is determined to be the storage unit for garbage collection, the valid metadata in storage unit 0 is migrated to another storage unit. For example, still referring to FIG. 7, if logical block 1 to logical block 4 of storage unit 0 store garbage metadata and logical block 5 and logical block 6 store valid metadata, the management module 111 migrates the valid metadata stored in logical block 5 and logical block 6 to a new storage unit, for example, storage unit 2.
S93. The storage system 110 releases the storage space occupied by the storage unit for garbage collection.
Specifically, the management module 111 may send a delete instruction to the storage nodes corresponding to storage unit 0 to delete the metadata fragments or check metadata fragments corresponding to storage unit 0.
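Steps S92 and S93 together can be sketched as follows, assuming a hypothetical in-memory model in which each storage unit is a dictionary from logical block index to a `(metadata, is_valid)` pair. Deleting the dictionary entry stands in for the delete instruction sent to the storage nodes.

```python
def collect(units, victim_id, target_id):
    """Migrate valid metadata out of the victim unit, then release the victim.

    units maps a storage unit id to {block_index: (metadata, is_valid)}.
    """
    if victim_id == target_id:
        raise ValueError("target unit must differ from the unit being collected")
    victim = units[victim_id]
    target = units.setdefault(target_id, {})
    next_block = len(target)  # append after the blocks already in the target
    for block in sorted(victim):
        metadata, is_valid = victim[block]
        if is_valid:
            target[next_block] = (metadata, True)
            next_block += 1
    del units[victim_id]  # stands in for the delete instruction (S93)
    return units


units = {0: {0: ("h", False), 1: ("h", False), 2: ("e", True), 3: ("s", True)}}
collect(units, victim_id=0, target_id=2)
```

Only valid metadata is copied, so the space held by garbage metadata is reclaimed as a side effect of releasing the whole storage unit.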
Fourth aspect: the management process of metadata instances.
The storage system 110 can implement various value-added services by creating different metadata instances, for example, a service that takes snapshots of metadata or a service that clones metadata. A metadata instance can be understood as program code for implementing a value-added service. Refer to FIG. 10, which is a flowchart of the management process of metadata instances in an embodiment of this application. The flowchart is described as follows:
S101. The storage system 110 creates a first metadata instance.
The first metadata instance is used to perform a service operation on the metadata stored in preset storage units. As an example, the service operation is a snapshot operation, that is, the first metadata instance is an instance that takes snapshots of the metadata in the preset storage units. The preset storage units may be some or all of the storage units that the storage system 110 uses to store metadata. For example, if those storage units include storage unit 0 to storage unit 4, the preset storage units may be storage unit 0 and storage unit 1, set according to actual use. The first metadata instance is created when the management module 111 runs the program code corresponding to it.
It should be noted that, in the related art, redundancy protection of metadata is implemented by creating multiple metadata instances. For example, to protect the metadata of the data stored in the storage space corresponding to physical address 1 to physical address 20 in the storage system 110, the management module 111 of the storage system 110 creates at least two metadata instances for that storage space, which may include metadata instance 1 and metadata instance 2. The management module 111 allocates storage space for storing metadata to each metadata instance: for example, the storage space corresponding to physical address 50 to physical address 55 for metadata instance 1, and the storage space corresponding to physical address 60 to physical address 65 for metadata instance 2. After data is stored at physical address 1 to physical address 20, metadata instance 1 stores the metadata of the data in its configured storage space; for example, if the metadata of the data is metadata 1, metadata instance 1 stores metadata 1 in a segment of storage space whose start address is physical address 50. The management module 111 then copies the metadata stored in metadata instance 1 and stores the copy in the storage space configured for metadata instance 2; for example, it copies metadata 1 and stores the copy in another segment of storage space whose start address is physical address 60. The related art therefore needs to create multiple metadata instances, which is relatively complex. In the embodiments of this application, by contrast, the metadata in the storage system 110 is already redundancy-protected by the preset RAID type or multi-copy type when it is stored in the storage devices. Multiple metadata instances storing the same metadata therefore do not need to be created, which provides a simpler method of redundancy protection for metadata.
In addition, when the preset RAID type is used to store metadata, the same metadata does not need to be stored in multiple copies, which reduces the storage space occupied by metadata and improves storage space utilization.
S102. When the storage system 110 determines that the first metadata instance has failed, it creates a second metadata instance.
When the management module 111 determines that the first metadata instance has failed, it may create a second metadata instance for taking snapshots of the metadata, and set the storage units that the second metadata instance can access to be the same as those of the first metadata instance. For example, if the first metadata instance can access storage unit 0 and storage unit 1, the second metadata instance can also access storage unit 0 and storage unit 1. The storage units accessible to multiple metadata instances are thus shared. A newly created metadata instance can directly use the metadata in the shared storage units, which removes the need to copy and transfer metadata to the new instance, reduces the delay of creating a new metadata instance, and improves efficiency. Furthermore, because metadata does not need to be transferred between metadata instances, transmission resources are saved.
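The sharing of storage units between the failed instance and its replacement can be sketched as follows. `MetadataInstance`, its `healthy` flag, and `replace_if_failed` are hypothetical names; the point of the sketch is that the new instance receives a reference to the same storage units rather than a copy of the metadata.

```python
class MetadataInstance:
    """Hypothetical metadata instance bound to a set of shared storage units."""

    def __init__(self, name, service, units):
        self.name = name
        self.service = service   # for example, "snapshot"
        self.units = units       # shared reference, not a copy
        self.healthy = True


def replace_if_failed(instance, new_name):
    """Create a second instance over the same storage units if the first failed."""
    if instance.healthy:
        return instance
    return MetadataInstance(new_name, instance.service, instance.units)


shared_units = {0: ["metadata e", "metadata s"], 1: ["metadata h'"]}
first = MetadataInstance("first", "snapshot", shared_units)
first.healthy = False                      # simulate a fault in the first instance
second = replace_if_failed(first, "second")
```

Because `second.units` is the same object as `first.units`, no metadata is copied or transmitted when the second instance is created.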
It should be noted that the management of metadata instances is described above using the example of a metadata instance that manages the metadata in preset storage units. The creation and management of metadata instances are not limited to this.
In the embodiments provided above, to implement the functions of the methods provided in the embodiments of this application, the storage system may include a hardware structure and/or a software module, and may implement the functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is performed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
FIG. 11 is a schematic structural diagram of an apparatus 1100 for managing metadata of a storage system. The apparatus 1100 may be the device in which the management module 111 in the embodiment shown in FIG. 3, FIG. 6, FIG. 9, or FIG. 10 is located, or may be located in that device, and can be used to implement the functions of the management module 111. The apparatus 1100 may be a hardware structure or a hardware structure plus a software module.
The apparatus 1100 for managing metadata of a storage system includes at least one memory for storing program instructions and/or data. The apparatus 1100 further includes at least one processor coupled to the memory, and the at least one processor can execute the program instructions stored in the memory.
The apparatus 1100 for managing metadata of a storage system may include a generating unit 1101, a determining unit 1102, and an executing unit 1103.
The generating unit 1101 may invoke the processor to execute the program instructions stored in the memory, to perform step S61 in the embodiment shown in FIG. 6 and/or other processes that support the technology described herein.
The determining unit 1102 may invoke the processor to execute the program instructions stored in the memory, to perform step S32 in the embodiment shown in FIG. 3, step S62 in the embodiment shown in FIG. 6, or step S91 in the embodiment shown in FIG. 9, and/or other processes that support the technology described herein.
The executing unit 1103 may invoke the processor to execute the program instructions stored in the memory, to perform step S33 in the embodiment shown in FIG. 3, step S63 to step S65 in the embodiment shown in FIG. 6, step S92 and step S93 in the embodiment shown in FIG. 9, or step S101 and step S102 in the embodiment shown in FIG. 10, and/or other processes that support the technology described herein.
In a possible design, the apparatus 1100 for managing metadata of a storage system may further include a receiving unit 1104, which may invoke the processor to execute the program instructions stored in the memory, to perform step S31 in the embodiment shown in FIG. 3 and/or other processes that support the technology described herein. The receiving unit 1104 is used by the apparatus 1100 to communicate with other modules, and may be a circuit, a device, an interface, a bus, a software module, a transceiver, or any other apparatus capable of communication. The receiving unit 1104 is optional and is shown with a dashed line in FIG. 11.
All relevant content of the steps in the foregoing method embodiments can be cited in the functional descriptions of the corresponding functional modules, and is not repeated here.
The division into modules in the embodiment shown in FIG. 11 is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation. In addition, the functional modules in the embodiments of this application may be integrated in one processor, may exist separately as physical entities, or two or more modules may be integrated into one module. An integrated module may be implemented in the form of hardware or in the form of a software functional module.
FIG. 12 shows an apparatus 1200 for managing metadata of a storage system according to an embodiment of this application. The apparatus 1200 may be the device in which the management module 111 in the embodiment shown in FIG. 3, FIG. 6, FIG. 9, or FIG. 10 is located, or may be located in that device, and can be used to implement the functions of the management module 111.
The apparatus 1200 for managing metadata of a storage system includes at least one processor 1220, configured to implement, or to support the apparatus 1200 in implementing, the functions of the management module 111 in the methods provided in the embodiments of this application. For example, the processor 1220 may determine a storage unit for storing metadata. For details, refer to the detailed description in the method examples, which is not repeated here.
The apparatus 1200 for managing metadata of a storage system may further include at least one memory 1230 for storing program instructions and/or data. The memory 1230 is coupled to the processor 1220. Coupling in the embodiments of this application is an indirect coupling or communication connection between apparatuses, units, or modules; it may be electrical, mechanical, or in another form, and is used for information exchange between the apparatuses, units, or modules. The processor 1220 may operate in cooperation with the memory 1230 and may execute the program instructions stored in the memory 1230. At least one of the at least one memory may be included in the processor.
The apparatus 1200 for managing metadata of a storage system may further include a communication interface 1210 for communicating with other devices through a transmission medium, so that the apparatus 1200 can communicate with those devices. For example, the other device may be a client or a storage device. The processor 1220 may send and receive data through the communication interface 1210.
The embodiments of this application do not limit the specific connection medium between the communication interface 1210, the processor 1220, and the memory 1230. In FIG. 12, the memory 1230, the processor 1220, and the communication interface 1210 are connected through a bus 1250, which is drawn as a thick line; the connection manner between other components is only illustrative and is not limiting. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 12, but this does not mean that there is only one bus or only one type of bus.
In the embodiments of this application, the processor 1220 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
In the embodiments of this application, the memory 1230 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or may be a volatile memory, such as a random-access memory (RAM). The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of this application may also be a circuit or any other apparatus capable of implementing a storage function, and is used to store program instructions and/or data.
The embodiments of this application further provide a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform the method performed by the management module 111 in the embodiments shown in FIG. 3, FIG. 6, FIG. 9, or FIG. 10.
The embodiments of this application further provide a computer program product including instructions that, when run on a computer, cause the computer to perform the method performed by the management module 111 in the embodiments shown in FIG. 3, FIG. 6, FIG. 9, or FIG. 10.
The embodiments of this application provide a storage system that includes the management module 111 in the embodiments shown in FIG. 3, FIG. 6, FIG. 9, or FIG. 10.
The methods provided in the embodiments of this application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, an SSD).

Claims (15)

  1. A method for managing metadata in a storage system, comprising:
    generating metadata corresponding to data to be written;
    determining a storage unit for storing the metadata, wherein the storage system comprises a plurality of storage units, and each storage unit is mapped to physical storage space corresponding to at least two storage devices comprised in the storage system; and
    storing the metadata in the at least two storage devices corresponding to the storage unit.
  2. The method according to claim 1, wherein the storage unit stores the metadata in an append-write manner.
  3. The method according to claim 1 or 2, wherein before the determining of the storage unit for storing the metadata, the method further comprises:
    receiving a data write request, wherein the data write request is used to write the data to be written into the storage system; and
    generating, according to the data write request and the metadata, a record item corresponding to the metadata, wherein the record item comprises a write-data operation corresponding to the data write request and the metadata updated after the write-data operation is performed.
  4. The method according to any one of claims 1-3, wherein
    the metadata comprises:
    a correspondence between the logical address and the physical address of each fragment of the data to be written, and a correspondence between the logical address of the storage unit occupied by the data to be written and the logical addresses of the fragments contained in the data to be written, wherein the logical address of each fragment is the logical address corresponding to the storage unit occupied by that fragment; or
    the metadata comprises:
    a correspondence between the logical address and the physical address of each copy of the data to be written, and a correspondence between the logical address of the data to be written and the logical addresses of the copies contained in the data to be written, wherein the logical address of each copy is the logical address corresponding to the storage unit occupied by that copy;
    wherein the set of logical addresses of the fragments contained in the data to be written, or the logical address of each copy contained in the data to be written, is the logical address of the data to be written.
  5. The method according to any one of claims 1-4, wherein the method further comprises:
    creating a first metadata instance, wherein the first metadata instance is used to perform service operations on the metadata in a preset storage unit.
  6. The method according to claim 5, wherein the method further comprises:
    after the first metadata instance fails, creating a second metadata instance, wherein the second metadata instance is able to access the metadata stored in the preset storage unit.
  7. An apparatus for managing metadata in a storage system, comprising:
    a generating unit, configured to generate metadata corresponding to data to be written;
    a determining unit, configured to determine a storage unit for storing the metadata, wherein the storage system comprises a plurality of storage units, and each storage unit is mapped to physical storage space corresponding to at least two storage devices comprised in the storage system; and
    an execution unit, configured to store the metadata in the at least two storage devices corresponding to the storage unit.
  8. The apparatus according to claim 7, wherein the storage unit stores the metadata in an append-write manner.
  9. The apparatus according to claim 7 or 8, wherein the apparatus further comprises:
    a receiving unit, configured to receive a data write request, wherein the data write request is used to write the data to be written into the storage system;
    wherein the generating unit is further configured to generate, according to the data write request and the metadata, a record item corresponding to the metadata, wherein the record item comprises a write-data operation corresponding to the data write request and the metadata updated after the write-data operation is performed.
  10. The apparatus according to any one of claims 7-9, wherein
    the metadata comprises:
    a correspondence between the logical address and the physical address of each fragment of the data to be written, and a correspondence between the logical address of the storage unit occupied by the data to be written and the logical addresses of the fragments contained in the data to be written, wherein the logical address of each fragment is the logical address corresponding to the storage unit occupied by that fragment; or
    the metadata comprises:
    a correspondence between the logical address and the physical address of each copy of the data to be written, and a correspondence between the logical address of the data to be written and the logical addresses of the copies contained in the data to be written, wherein the logical address of each copy is the logical address corresponding to the storage unit occupied by that copy;
    wherein the set of logical addresses of the fragments contained in the data to be written, or the logical address of each copy contained in the data to be written, is the logical address of the data to be written.
  11. The apparatus according to any one of claims 7-10, wherein the execution unit is further configured to:
    create a first metadata instance, wherein the first metadata instance is used to perform service operations on the metadata in a preset storage unit.
  12. The apparatus according to claim 11, wherein the execution unit is further configured to:
    after the first metadata instance fails, create a second metadata instance, wherein the second metadata instance is able to access the metadata stored in the preset storage unit.
  13. An apparatus for managing metadata in a storage system, comprising a processor and a memory, wherein the memory stores computer-executable instructions that, when invoked by the processor, cause the processor to perform the method according to any one of claims 1-6.
  14. A computer storage medium, wherein the computer storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1-6.
  15. A computer program product, wherein the computer program product stores instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1-6.
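
For illustration only, the flow recited in claims 1-3 and 5-6 can be sketched as follows. This is a minimal in-memory model and not the applicant's implementation: the class names `StorageDevice`, `StorageUnit`, and `MetadataInstance`, the record layout, and the address strings are all hypothetical, and real devices, fragmenting, and failure detection are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    """Hypothetical stand-in for one physical storage device."""
    name: str
    log: list = field(default_factory=list)  # append-only list of record items


@dataclass
class StorageUnit:
    """Logical unit mapped to space on at least two devices (claim 1)."""
    unit_id: int
    devices: list

    def __post_init__(self):
        if len(self.devices) < 2:
            raise ValueError("a storage unit must map to at least two devices")

    def append(self, record: dict) -> None:
        # Append-write (claim 2): nothing is overwritten, and every
        # device behind the unit receives its own copy of the record.
        for dev in self.devices:
            dev.log.append(dict(record))

    def read_all(self) -> list:
        # All devices hold the same records in this sketch; read the first.
        return [dict(r) for r in self.devices[0].log]


class MetadataInstance:
    """Serves metadata for one preset storage unit (claims 5 and 6)."""

    def __init__(self, unit: StorageUnit):
        self.unit = unit
        self.index = {}  # logical address -> physical address
        # A newly created instance rebuilds its view by replaying the unit.
        for rec in unit.read_all():
            self.index[rec["logical_addr"]] = rec["physical_addr"]

    def write(self, logical_addr: str, physical_addr: str) -> None:
        # Record item (claim 3): the write operation plus the metadata
        # as updated by that operation.
        record = {
            "op": "write",
            "logical_addr": logical_addr,
            "physical_addr": physical_addr,
        }
        self.unit.append(record)
        self.index[logical_addr] = physical_addr


# Usage: the first instance writes, "fails", and a second one takes over.
unit = StorageUnit(1, [StorageDevice("devA"), StorageDevice("devB")])
first = MetadataInstance(unit)
first.write("lba-0", "devA:0x10")
first.write("lba-1", "devB:0x20")
del first  # simulate failure of the first metadata instance
second = MetadataInstance(unit)  # sees the same metadata via the shared unit
```

Because the metadata lives in the shared storage unit rather than inside the instance, the second instance in this sketch needs no state from the first; it only replays the records already stored.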
PCT/CN2020/119929 2019-11-05 2020-10-09 Method and apparatus for managing metadata in storage system WO2021088586A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201911072812.6 2019-11-05
CN201911072812 2019-11-05
CN202010021351.6A CN112783698A (en) 2019-11-05 2020-01-09 Method and device for managing metadata in storage system
CN202010021351.6 2020-01-09

Publications (1)

Publication Number Publication Date
WO2021088586A1 true WO2021088586A1 (en) 2021-05-14

Family

ID=75749970

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119929 WO2021088586A1 (en) 2019-11-05 2020-10-09 Method and apparatus for managing metadata in storage system

Country Status (2)

Country Link
CN (1) CN112783698A (en)
WO (1) WO2021088586A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342751B (en) * 2021-07-30 2021-11-09 联想凌拓科技有限公司 Metadata processing method, device, equipment and readable storage medium
CN113867642B (en) * 2021-09-29 2023-08-04 杭州海康存储科技有限公司 Data processing method, device and storage equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1776675A (en) * 2004-11-17 2006-05-24 国际商业机器公司 Method, system for storing and using metadata in multiple storage locations
CN107622019A (en) * 2016-07-14 2018-01-23 爱思开海力士有限公司 Accumulator system and its operating method
CN108108308A (en) * 2016-11-24 2018-06-01 爱思开海力士有限公司 Storage system and its operating method
US20190079859A1 (en) * 2017-09-13 2019-03-14 Intel Corporation Apparatus, computer program product, system, and method for managing multiple regions of a memory device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8819208B2 (en) * 2010-03-05 2014-08-26 Solidfire, Inc. Data deletion in a distributed data storage system

Also Published As

Publication number Publication date
CN112783698A (en) 2021-05-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20884466

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20884466

Country of ref document: EP

Kind code of ref document: A1