WO2021063242A1 - Method for sending metadata of a storage system, and storage system - Google Patents

Method for sending metadata of a storage system, and storage system


Publication number
WO2021063242A1
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
storage system
updated
partition
updated metadata
Application number
PCT/CN2020/117416
Inventor
乔建峰
伍华涛
倪敏芳
王金红
葛挺峰
李建平
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021063242A1

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
              • G06F3/0601 Interfaces specially adapted for storage systems
                • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
                  • G06F3/061 Improving I/O performance
                  • G06F3/0604 Improving or facilitating administration, e.g. storage management
                • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
                  • G06F3/0638 Organizing or formatting or addressing of data
                    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools

Definitions

  • This application relates to the field of storage technology, and in particular to a method for sending metadata of a storage system and a storage system.
  • Other business functions, such as search, are not listed one by one here.
  • At present, metadata is accessed mainly through the enumeration interface provided by the storage system.
  • As the amount of metadata in the storage system keeps increasing, enumeration seriously affects the efficiency of metadata processing in the storage system.
  • The embodiments of this application provide a method for sending metadata of a storage system, and a storage system, to improve the efficiency of metadata processing in the storage system.
  • According to a first aspect, this application provides a method for sending metadata of a storage system.
  • Metadata updated in the storage system according to an operation request is referred to as updated metadata.
  • In this method, the updated metadata is first obtained and then sent.
  • The storage system may provide only the metadata that has been updated. Compared with enumerating metadata as in the prior art, this improves metadata processing efficiency.
  • It also reduces the amount of metadata transmitted and saves bandwidth resources.
  • When the storage system includes multiple metadata partitions, obtaining the updated metadata of the storage system may be obtaining the updated metadata of one of the partitions, for example, a first metadata partition.
  • When the storage system records metadata in a write-ahead log, each metadata corresponds to a metadata record item, and each record item contains the operation corresponding to the operation request and the updated metadata. In this case, obtaining the updated metadata of the storage system may be obtaining the updated metadata record items in the write-ahead log of the storage system.
  • Obtaining the updated metadata of the storage system may include, but is not limited to, the following two methods.
  • In the first method, the updated metadata of the storage system is obtained in real time.
  • In this way, the storage system can send updated metadata as soon as possible, keeping the metadata used by the user consistent with the metadata in the storage system.
  • In the second method, the storage system first receives an acquisition request for the updated metadata in the storage system and then sends the updated metadata.
  • In this way, the storage system can send updated metadata according to user needs.
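The two acquisition methods above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the class `UpdatedMetadataSource`, its methods, and the `(key, value)` update tuple are hypothetical names chosen here. Real-time delivery is modeled as a push to registered callbacks, and request-driven delivery as draining a buffer of pending updates.

```python
# Hypothetical sketch: names are illustrative, not an API from the application.
from typing import Callable, List, Tuple

Update = Tuple[str, str]  # (metadata key, new metadata value), assumed representation


class UpdatedMetadataSource:
    """Tracks only metadata changed by operation requests."""

    def __init__(self) -> None:
        self._pending: List[Update] = []  # updates not yet returned on request
        self._subscribers: List[Callable[[Update], None]] = []

    def subscribe(self, callback: Callable[[Update], None]) -> None:
        """First method: register for real-time delivery."""
        self._subscribers.append(callback)

    def record_update(self, key: str, value: str) -> None:
        update = (key, value)
        self._pending.append(update)
        for callback in self._subscribers:
            callback(update)  # push immediately to real-time consumers

    def handle_acquisition_request(self) -> List[Update]:
        """Second method: return accumulated updates only when asked."""
        updates, self._pending = self._pending, []
        return updates


pushed: List[Update] = []
source = UpdatedMetadataSource()
source.subscribe(pushed.append)
source.record_update("k1", "v1")
source.record_update("k2", "v2")
```

Only changed entries ever enter either path, which is the contrast with enumerating all metadata.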
  • The storage system may send the updated metadata in the form of messages.
  • The storage system includes multiple metadata partitions, which may change: for example, two metadata partitions may be merged, or one metadata partition may be split into two new metadata partitions. When one of the partitions, for example a second metadata partition, changes, the storage system can send a first indication message instructing that order preservation of the updated metadata in the second metadata partition be canceled.
  • After obtaining the updated metadata in the changed metadata partition, the storage system first sends a second indication message instructing that the order of the updated metadata in the changed metadata partition be preserved, and then sends that updated metadata.
  • Together, the first and second indication messages preserve the order of the updated metadata across a partition change.
  • If the user has already received the first indication message when the second indication message arrives, all updated metadata in the metadata partition before the change has been sent successfully; the user then receives the updated metadata in the metadata partition after the change, so order is preserved.
  • In another method for sending metadata of a storage system, a write-ahead log is used to record the metadata in the storage system.
  • Each metadata corresponds to a metadata record item, and each record item contains the operation corresponding to the operation request and the updated metadata.
  • The storage system first obtains the updated metadata record items in its write-ahead log and then sends them.
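A write-ahead log whose record items pair the operation with the updated metadata might look like the sketch below. The `WalRecord` fields, the `WriteAheadLog` class, and the use of the record index as the resume point are illustrative assumptions; the application does not specify a record layout.

```python
# Hypothetical sketch of a WAL whose record items carry operation + metadata.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WalRecord:
    index: int      # position of the record item in the log
    operation: str  # operation from the request, e.g. "write", "modify", "delete"
    key: int        # metadata key
    metadata: str   # updated metadata carried by the record item


class WriteAheadLog:
    def __init__(self) -> None:
        self._records: List[WalRecord] = []

    def append(self, operation: str, key: int, metadata: str) -> WalRecord:
        record = WalRecord(len(self._records), operation, key, metadata)
        self._records.append(record)  # append-only: existing items are never rewritten
        return record

    def records_after(self, last_sent_index: int) -> List[WalRecord]:
        """Updated record items: everything appended after the last one sent (-1 = none)."""
        return self._records[last_sent_index + 1:]


wal = WriteAheadLog()
wal.append("write", 8, "meta-v1")
wal.append("modify", 8, "meta-v2")
wal.append("delete", 21, "")
```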
  • The storage system may provide only the record items of metadata that has been updated. Compared with enumerating metadata as in the prior art, this improves metadata processing efficiency.
  • When the storage system includes multiple metadata partitions, obtaining the updated metadata record items may be obtaining the updated metadata record items of one of the partitions, for example, the first metadata partition.
  • Obtaining the updated metadata record items of the storage system may include, but is not limited to, the following two methods.
  • In the first method, the updated metadata record items of the storage system are obtained in real time.
  • In the second method, the storage system first receives an acquisition request for the updated metadata record items in the storage system and then sends them.
  • The storage system may send the updated metadata record items in the form of messages.
  • The multiple metadata partitions of the storage system may change.
  • When the second metadata partition changes, the storage system may send a first indication message instructing that order preservation of the updated metadata in the second metadata partition be canceled.
  • After obtaining the updated metadata record items in the changed metadata partition, the storage system first sends a second indication message instructing that the order of the updated metadata in the changed metadata partition be preserved, and then sends the updated metadata record items in the changed metadata partition.
  • The storage system may be a storage node or a storage server, or a device in a storage node or storage server.
  • The storage space management device includes a processor configured to implement the method described in the first aspect.
  • The storage space management device may further include a memory for storing program instructions and data.
  • The memory is coupled to the processor, and the processor can call and execute the program instructions stored in the memory to implement any one of the methods described in the first aspect.
  • The storage system may further include a communication interface that communicates with the processor.
  • The storage system includes a communication interface and a processor, where:
  • the processor is configured to obtain updated metadata of the storage system, where the updated metadata indicates metadata updated in the storage system according to an operation request; and
  • the communication interface is configured to send the updated metadata.
  • The storage system includes multiple metadata partitions; the processor is specifically configured to:
  • The processor is specifically configured to:
  • the updated metadata record item includes the operation corresponding to the operation request and the updated metadata.
  • The processor is specifically configured to:
  • The communication interface is specifically configured to send the updated metadata in the form of a message.
  • The communication interface is further configured to receive an acquisition request, where the acquisition request is used to acquire the updated metadata in the storage system.
  • The communication interface is further configured to send a first indication message, where the first indication message instructs that order preservation of the updated metadata in the second metadata partition be canceled.
  • The processor is further configured to:
  • The communication interface is further configured to:
  • The storage system may be a storage node or a storage server, or a device in a storage node or storage server.
  • The storage system may include a processing unit and a transceiver unit, which can perform the corresponding functions in any design example of the first aspect. Specifically:
  • the processing unit is configured to obtain updated metadata of the storage system, where the updated metadata indicates metadata updated in the storage system according to an operation request; and
  • the transceiver unit is configured to send the updated metadata.
  • The storage system includes multiple metadata partitions; the processing unit is specifically configured to:
  • The processing unit is specifically configured to:
  • the updated metadata record item includes the operation corresponding to the operation request and the updated metadata.
  • The processing unit is specifically configured to:
  • The transceiver unit is specifically configured to send the updated metadata in the form of a message.
  • The transceiver unit is further configured to receive an acquisition request, where the acquisition request is used to acquire the updated metadata in the storage system.
  • The transceiver unit is further configured to send a first indication message, where the first indication message instructs that order preservation of the updated metadata in the second metadata partition be canceled.
  • The processing unit is further configured to:
  • The transceiver unit is further configured to:
  • An embodiment of this application provides a computer-readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a computer, cause the computer to perform the method described in any design of the first aspect.
  • An embodiment of this application provides a computer program product storing a computer program. The computer program includes program instructions that, when executed by a computer, cause the computer to perform the method described in any design of the first aspect.
  • This application provides a chip system that includes a processor, and may further include a memory, for implementing the method described in the first aspect.
  • The chip system may consist of chips, or may include chips and other discrete devices.
  • FIG. 1 is an example architecture diagram of a storage system in the prior art.
  • FIG. 2 is a schematic diagram of a process of writing data to a storage system.
  • FIG. 3 is an example architecture diagram of a storage system according to an embodiment of the application.
  • FIG. 4 is a flowchart of the method in an embodiment of the application.
  • FIG. 5 is a schematic diagram of an example of multiple metadata partitions in an embodiment of the application.
  • FIG. 6 is a schematic diagram of an example of changes in metadata partitions in an embodiment of the application.
  • FIG. 7 is a schematic diagram of an example of dynamic changes of metadata partitions in an embodiment of the application.
  • FIG. 8 is a schematic diagram of an example of metadata partition splitting in an embodiment of the application.
  • FIG. 9 is an example architecture diagram of a storage system implemented with a publish/subscribe system in an embodiment of the application.
  • FIG. 10 is a schematic diagram of metadata sent by a storage system implemented with a publish/subscribe system in an embodiment of the application.
  • FIG. 11 is a schematic diagram of an example of a storage system in an embodiment of the application.
  • FIG. 12 is a schematic diagram of another example of a storage system in an embodiment of the application.
  • “Multiple” means two or more; therefore, in the embodiments of this application, “multiple” may also be understood as “at least two”. “At least one” means one or more, for example, one, two, or more, without limiting which items are included. For example, “at least one of A, B, and C” may mean A, B, C, A and B, A and C, B and C, or A, B, and C.
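The reading of “at least one of A, B, and C” is every non-empty combination of the listed items, which a few lines of Python can enumerate (the helper name `at_least_one_of` is hypothetical):

```python
from itertools import combinations
from typing import List, Sequence, Set


def at_least_one_of(items: Sequence[str]) -> List[Set[str]]:
    """Every non-empty selection of `items` (hypothetical helper)."""
    return [set(combo) for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]


choices = at_least_one_of(["A", "B", "C"])
# 7 selections: A, B, C, A+B, A+C, B+C, A+B+C
```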
  • Ordinal numbers such as “first” and “second” in the embodiments of this application are used to distinguish multiple objects, not to limit their order, timing, priority, or importance.
  • The method in the embodiments of this application can be applied to a storage system.
  • The storage system can be a centralized or distributed storage system. Specifically, it can be a database storage system, a file storage system, a block storage system, an object storage system, a column storage system, or the like; it can also be a cloud storage system or a combination of the above. The form of the storage system is not limited here.
  • As shown in FIG. 1, the storage system 100 includes an access node 110 and one or more storage nodes 120.
  • FIG. 1 takes as an example a storage system 100 with one access node 110 and two storage nodes 120 (storage node 1 and storage node 2).
  • The access node 110 and the storage nodes 120 may be independent servers or virtual devices.
  • When they are virtual devices, the virtual devices may run on different servers or on the same server, as determined by those skilled in the art according to the actual operating environment.
  • When the storage system 100 writes data, the access node 110 first receives a write operation request that includes the data to be written, generates metadata corresponding to that data, and then stores the generated metadata in a storage node, for example, storage node 2.
  • The storage system 100 uses write-ahead logging (WAL) to record operation requests and metadata.
  • The embodiments of this application provide a method and device for sending metadata of a storage system, so as to improve the efficiency of metadata processing in the storage system.
  • FIG. 3 is an example architecture diagram of a storage system according to an embodiment of this application.
  • Unlike the storage system shown in FIG. 1, the access node 110 in FIG. 3 has an additional interface for providing updated metadata.
  • This interface is referred to as the update metadata interface; its name is not restricted here.
  • The storage system 100 can obtain changed metadata. For example, when a client writes new data to the storage system 100, the metadata corresponding to the new data is changed metadata; when the client modifies data A stored in the storage system 100, the metadata corresponding to the modified data is changed metadata; and when the client deletes data B stored in the storage system 100, the metadata corresponding to data B is changed metadata. That is, updated metadata is metadata updated according to an operation request that updates data in the storage system (for example, a request to modify data, delete data, or write new data).
  • Through the update metadata interface, the storage system 100 can provide the metadata that has changed.
  • In this example, the new interface is set in the storage system 100. In other embodiments, it may be set outside the storage system 100; this is not restricted here.
  • FIG. 4 is a flowchart of the method, described as follows.
  • The access node receives an operation request.
  • The operation request is used to update data in the storage system.
  • The operation request may be a request to write new data to the storage system, a request to delete data already stored in the storage system, a request to modify data stored in the storage system, or the like.
  • The access node generates metadata corresponding to the operation request.
  • After receiving the operation request, the access node updates the data in storage node 1 according to the request and, once the data is updated, generates metadata corresponding to the updated data.
  • The specific process is similar to that shown in FIG. 2 and is not repeated here.
  • The access node sends the generated metadata to the metadata storage node, which stores it.
  • The metadata storage node may be storage node 2 shown in FIG. 3; for ease of description, storage node 2 is used as the metadata storage node in the following.
  • Storage node 2 may store the metadata in, but not limited to, the following manners.
  • Storage node 2 can use out-of-place updates, that is, it writes the metadata sequentially in the order in which the access node generates it. For example, if storage node 2 receives first metadata and then second metadata, it writes the first metadata into the first metadata entry and then writes the second metadata into the second metadata entry.
  • Alternatively, storage node 2 includes multiple metadata partitions.
  • For example, range partitioning can be used to divide all the metadata that storage node 2 can store into partitions according to the key (which can be understood as the metadata index number).
  • For example, the key range of the metadata that storage node 2 can store is 0-100.
  • As shown in FIG. 5, the metadata is divided into 5 metadata partitions.
  • Partition boundaries can follow a left-closed, right-open or a left-open, right-closed convention so that each key belongs to exactly one partition. For example, the key range of the metadata of the first metadata partition shown in FIG.
  • After receiving metadata, storage node 2 can determine the partition to which the metadata belongs according to its key.
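Range partitioning with left-closed, right-open boundaries can be sketched with a sorted list of lower bounds and a binary search. The five equal-width partitions over keys 0-100 are an assumed layout chosen to be consistent with the 0-19 and 20-39 ranges mentioned in the text, and `RangePartitioner` is a hypothetical name.

```python
# Hypothetical sketch of range partitioning by metadata key.
import bisect
from typing import List


class RangePartitioner:
    """Maps a key to a partition index using left-closed, right-open ranges."""

    def __init__(self, lower_bounds: List[int], upper_bound: int) -> None:
        # Partition i covers [lower_bounds[i], lower_bounds[i + 1]);
        # the last partition ends just before upper_bound.
        self.lower_bounds = sorted(lower_bounds)
        self.upper_bound = upper_bound

    def partition_of(self, key: int) -> int:
        if key < self.lower_bounds[0] or key >= self.upper_bound:
            raise KeyError(f"key {key} is outside the stored key range")
        # bisect_right returns the position of the first lower bound greater than key.
        return bisect.bisect_right(self.lower_bounds, key) - 1


# Assumed layout: keys 0-100 in five partitions starting at 0, 20, 40, 60, 80.
partitioner = RangePartitioner([0, 20, 40, 60, 80], upper_bound=101)
```

A boundary key such as 20 lands in the partition it opens, which is what makes attribution unambiguous.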
  • Different metadata partitions may also be stored in different storage nodes; the embodiments of the present invention do not limit this.
  • The multiple metadata partitions may take the following forms.
  • In the first form, the multiple metadata partitions are fixed.
  • For example, the storage system is preset with the five metadata partitions shown in FIG. 5 and keeps them throughout subsequent use.
  • In the second form, the multiple metadata partitions can be dynamically merged or split as needed. Because the amount of metadata may differ between partitions, to keep the load balanced, a metadata partition can be split into new metadata partitions when the amount of metadata in it exceeds a threshold; when the amount of metadata in one or more partitions decreases, those partitions can be merged.
  • For example, the second metadata partition shown in FIG. 5 is split into two metadata partitions, 20-30 and 31-39, yielding the 6 metadata partitions shown in FIG. 6(a).
  • Alternatively, the first and second metadata partitions shown in FIG. 5 may be merged to obtain the 4 metadata partitions shown in FIG. 6(b).
  • Because the metadata partitions change dynamically, metadata with the same key may be located in different metadata partitions in different time periods.
  • For example, the metadata with key 8 is initially located in metadata partition 0-19, where “0-19” denotes the range of metadata keys in the partition; a partition is therefore identified by its key range.
  • Later, metadata partition 0-19 is split into metadata partitions 0-9 and 10-19, so the metadata with key 8 is located in partition 0-9. During the time period t5-t6, metadata partition 0-9 is split into metadata partitions 0-5 and 6-9, and the metadata with key 8 is then located in partition 6-9.
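Splitting, merging, and the key-8 example can be replayed with a small sketch. Partitions are represented as inclusive key ranges such as `(0, 19)`, mirroring the notation in the text; the functions `split`, `merge`, and `locate` are hypothetical helpers, and only two partitions are modeled to keep the trace short.

```python
# Hypothetical sketch: partitions as inclusive key ranges (low, high), e.g. (0, 19).
from typing import List, Tuple

Partition = Tuple[int, int]


def split(parts: List[Partition], target: Partition, at: int) -> List[Partition]:
    """Split `target` into (low, at - 1) and (at, high)."""
    low, high = target
    if not low < at <= high:
        raise ValueError("split point must fall inside the partition")
    i = parts.index(target)
    return parts[:i] + [(low, at - 1), (at, high)] + parts[i + 1:]


def merge(parts: List[Partition], left: Partition, right: Partition) -> List[Partition]:
    """Merge two neighbouring partitions into one covering both key ranges."""
    i = parts.index(left)
    if i + 1 >= len(parts) or parts[i + 1] != right or left[1] + 1 != right[0]:
        raise ValueError("only adjacent partitions can be merged")
    return parts[:i] + [(left[0], right[1])] + parts[i + 2:]


def locate(parts: List[Partition], key: int) -> Partition:
    """Return the partition whose key range contains `key`."""
    for low, high in parts:
        if low <= key <= high:
            return (low, high)
    raise KeyError(key)


# Replaying the key-8 example with the other partitions abbreviated.
layout = [(0, 19), (20, 39)]
history = [locate(layout, 8)]             # (0, 19)
layout = split(layout, (0, 19), at=10)    # -> (0, 9), (10, 19), (20, 39)
history.append(locate(layout, 8))         # (0, 9)
layout = split(layout, (0, 9), at=6)      # -> (0, 5), (6, 9), ...
history.append(locate(layout, 8))         # (6, 9)
```

The same helpers reproduce the FIG. 6 cases: splitting `(20, 39)` at 31 gives `(20, 30)` and `(31, 39)`, and merging `(0, 19)` with `(20, 39)` gives `(0, 39)`.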
  • Each metadata partition uses WAL to record its metadata.
  • The operation request and the corresponding metadata are recorded as log entries.
  • The access node receives an acquisition request for the updated metadata of the storage system 100 and sends the acquisition request to storage node 2.
  • For example, when the client needs to obtain all the metadata at the current moment, it can send the acquisition request to the access node by calling the update metadata interface.
  • Alternatively, another application or system may send the acquisition request to obtain the updated metadata of the storage system 100.
  • Storage node 2 obtains the updated metadata in the storage system.
  • Storage node 2 may obtain the updated metadata in, but not limited to, the following manners.
  • In one manner, storage node 2 obtains updated metadata through snapshots. For example, storage node 2 takes a first snapshot of the metadata it stores at a first moment and a second snapshot after a preset period; by comparing the two snapshots, it obtains the updated metadata.
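The snapshot comparison amounts to a keyed diff of two point-in-time copies. In this hypothetical sketch a snapshot is a plain `dict` from metadata key to value, and `diff_snapshots` reports writes, modifications, and deletions as `(old, new)` pairs, with `None` standing for an absent key.

```python
# Hypothetical sketch: a snapshot is a dict from metadata key to metadata value.
from typing import Dict, Optional, Tuple

Snapshot = Dict[int, str]
Change = Tuple[Optional[str], Optional[str]]  # (value in first, value in second)


def diff_snapshots(first: Snapshot, second: Snapshot) -> Dict[int, Change]:
    """Keys written, modified, or deleted between two snapshots."""
    changed: Dict[int, Change] = {}
    for key in first.keys() | second.keys():
        old, new = first.get(key), second.get(key)
        if old != new:
            changed[key] = (old, new)  # None marks a key absent from that snapshot
    return changed


t1 = {1: "a", 2: "b", 3: "c"}  # first snapshot at the first moment
t2 = {1: "a", 2: "B", 4: "d"}  # second snapshot after the preset period
updated = diff_snapshots(t1, t2)
```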
  • In another manner, when storage node 2 stores metadata, it can add identification information to each metadata item indicating that the item has not been sent; once the item is sent, the identification information can be deleted. Storage node 2 can then obtain the updated metadata by searching for metadata that carries the identification information.
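The identification-information approach can be sketched as a set of “not yet sent” keys maintained next to the metadata. `FlaggedMetadataStore` and its methods are illustrative names; keeping the flag in a separate set rather than inside each metadata item is a simplification.

```python
# Hypothetical sketch: each stored item carries a "not yet sent" flag until sent.
from typing import Dict, Iterable, Set


class FlaggedMetadataStore:
    def __init__(self) -> None:
        self._values: Dict[int, str] = {}
        self._unsent: Set[int] = set()  # keys whose identification information is present

    def store(self, key: int, value: str) -> None:
        self._values[key] = value
        self._unsent.add(key)  # add the identification: updated, not yet sent

    def updated_metadata(self) -> Dict[int, str]:
        # Updated metadata = items still carrying the identification.
        return {k: self._values[k] for k in sorted(self._unsent)}

    def mark_sent(self, keys: Iterable[int]) -> None:
        self._unsent.difference_update(keys)  # delete the identification after sending


store = FlaggedMetadataStore()
store.store(1, "a")
store.store(2, "b")
```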
  • In yet another manner, the stored metadata has serial numbers.
  • Storage node 2 can record the serial number of the metadata sent each time. For example, in the initial state the storage system has not sent any metadata, so the recorded serial number of sent metadata is initially 0, and storage node 2 determines that metadata with a serial number greater than 0 is updated metadata. The serial number of a metadata item can be understood as the number of the metadata entry in which it is stored.
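The serial-number approach keeps a single watermark: the serial number of the last metadata sent. The sketch below assumes serial numbers start at 1 and equal the entry number, matching the example's initial recorded value of 0; `SerialNumberTracker` is a hypothetical name.

```python
# Hypothetical sketch of serial-number-based detection of updated metadata.
from typing import List, Tuple

Entry = Tuple[int, str]  # (serial number, metadata)


class SerialNumberTracker:
    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.last_sent = 0  # initial state: nothing has been sent yet

    def store(self, metadata: str) -> int:
        serial = len(self.entries) + 1  # assumed: serial number = entry number
        self.entries.append((serial, metadata))
        return serial

    def send(self) -> List[Entry]:
        """Return entries with a serial number above the watermark, then advance it."""
        batch = [e for e in self.entries if e[0] > self.last_sent]
        if batch:
            self.last_sent = batch[-1][0]  # record the serial number sent this time
        return batch


tracker = SerialNumberTracker()
tracker.store("a")
tracker.store("b")
```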
  • Metadata may be stored by metadata partition.
  • In this case, obtaining the updated metadata in the storage system 100 in the embodiments of the present invention may be obtaining the updated metadata in one metadata partition of the storage system (for example, the first metadata partition).
  • For example, if the acquisition request is used to obtain all metadata related to a certain service and all of that metadata is stored in the first metadata partition, storage node 2 may obtain only the updated metadata in the first metadata partition instead of all changed metadata in the storage system, which reduces the amount of metadata transmitted and saves bandwidth resources.
  • Each metadata partition may use WAL to record its metadata.
  • Alternatively, all the updated metadata in the storage system 100 can be obtained.
  • S406: Storage node 2 sends the acquired updated metadata to the access node, and the access node sends it to the client.
  • The storage system may provide only updated metadata. Compared with listing metadata as in the prior art, this improves metadata processing efficiency, reduces the amount of metadata transmitted, and saves bandwidth resources.
  • Storage node 2 sends the updated metadata in the order in which it was updated, so the metadata obtained by the client is also in order.
  • When metadata is stored by metadata partition, storage node 2 sends the updated metadata in a metadata partition to the access node, and the access node sends it to the client, which reduces the amount of metadata transmitted and saves bandwidth resources.
  • When the updated metadata to be obtained in the storage system 100 is in multiple metadata partitions, the storage node sends the updated metadata of these partitions to the access node in parallel, which reduces sending latency.
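Parallel sending per partition can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. The function `send_partitions_in_parallel` and the `send` callback are hypothetical; each partition's updates go out in one sequential call, so order is kept within a partition while partitions do not wait for each other.

```python
# Hypothetical sketch: one worker per partition, order kept inside each partition.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def send_partitions_in_parallel(
    updates_by_partition: Dict[str, List[str]],
    send: Callable[[str, List[str]], None],
) -> None:
    """Call send(partition, updates) for every partition concurrently."""
    if not updates_by_partition:
        return  # nothing to send; ThreadPoolExecutor needs max_workers >= 1
    with ThreadPoolExecutor(max_workers=len(updates_by_partition)) as pool:
        futures = [pool.submit(send, name, updates)
                   for name, updates in updates_by_partition.items()]
        for future in futures:
            future.result()  # re-raise any error from a send call
```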
  • Metadata can be sent by sending WAL record items.
  • Alternatively, the location information of the WAL record items that record the updated metadata can be sent, and the client or another application reads the updated metadata from the WAL according to the location information.
  • The location information may include the identifier of each WAL record item and/or the offset of the WAL record item, and so on.
  • For multiple WAL record items, the location information can include the identifier of the WAL record item, the offset of the WAL record item, and the length of the unit storage space occupied by the multiple WAL record items, so that the WAL record items can be obtained in batches.
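Batch reading by location information can be sketched as follows, assuming fixed-size WAL record items stored contiguously in a byte buffer. `WalLocation` and `read_records` are hypothetical; the identifier is carried along but is not needed for the byte arithmetic.

```python
# Hypothetical sketch: fixed-size WAL record items stored back to back in bytes.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WalLocation:
    record_id: int    # identifier of the first WAL record item described
    offset: int       # byte offset of that record item in the log
    unit_length: int  # bytes occupied by each record item (assumed fixed)
    count: int = 1    # number of consecutive record items described


def read_records(log: bytes, loc: WalLocation) -> List[bytes]:
    """Read `count` consecutive record items starting at `offset`."""
    end = loc.offset + loc.unit_length * loc.count
    if end > len(log):
        raise ValueError("location points past the end of the log")
    return [log[loc.offset + i * loc.unit_length: loc.offset + (i + 1) * loc.unit_length]
            for i in range(loc.count)]
```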
  • A change of the second metadata partition may be a split of the second metadata partition or a merge of the second metadata partition with other metadata partitions.
  • A split of the second metadata partition is used as an example below.
  • The second metadata partition in the metadata storage node is split into multiple metadata partitions.
  • The second metadata partition is any one of the multiple metadata partitions in the metadata storage node.
  • For example, metadata partition 1 has a key range of A to C and is split into metadata partition 2 (keys from A to less than B) and metadata partition 3 (keys from B to C).
  • The metadata storage node sends a first indication message to the access node, where the first indication message instructs that order preservation of the updated metadata in the second metadata partition be canceled.
  • Here, the serial number of metadata refers to its storage serial number; when WAL is used, it refers to the order of the WAL record items.
  • The access node generates a mark indicating that the order-preserving relationship of the second metadata partition is canceled.
  • The metadata storage node obtains the updated metadata in the metadata partitions after the split.
  • S411 The metadata storage node sends a second indication message to the access node, and the access node sends the second indication message to the client.
  • the second indication message is used to instruct to preserve the order of the updated metadata in the metadata partition after the split.
  • S412 The client sends a confirmation response message for the second indication message to the access node, and the access node sends the confirmation response message to the metadata storage node.
  • after the client receives the second indication message, it queries whether the access node holds a mark cancelling the order-preserving relationship of the second metadata partition. If the mark exists, the client sends a confirmation response to the access node, and the access node clears the mark after receiving the confirmation response message. If, after receiving the partition order-preservation establishment message, the client determines that the mark does not exist in the access node, the client does not send a confirmation response to the access node (or sends a negative response).
  • because the first indication message is sent only after all updated metadata in the pre-split metadata partition has been sent, if the client confirms that it received the first indication message before receiving the second indication message, all updated metadata in the pre-split metadata partition has been sent successfully. The client then receives updated metadata according to the post-split metadata partitions, so that order is preserved.
  • the metadata storage node may keep sending the second indication message until the confirmation response message is received.
  • the metadata storage node sends the updated metadata in the metadata partition after the split to the access node, and the access node sends the updated metadata to the client.
  • after receiving the confirmation response message, the metadata storage node sends the updated metadata in the post-split metadata partitions to the access node.
  • the access node can also record the serial number of the updated metadata that has been sent.
  • when the access node determines that the serial number of received updated metadata is less than the recorded serial number, the updated metadata is considered duplicate data and can be discarded directly, avoiding repeated processing.
  • S414 The metadata storage node sends a metadata deletion message to the access node, and the access node deletes the metadata according to the metadata deletion message.
  • a deletion operation on the sent metadata can be triggered periodically, so that a metadata deletion message is sent to the access node. Because metadata messages in the embodiments of the present application are sent in order, when the access node receives the metadata deletion message, it is guaranteed that the metadata corresponding to the metadata partition has already been consumed, so the access node can delete the updated metadata it received, saving storage resources of the storage system.
  • step S404 is an optional step, that is, it is not necessary to be performed.
  • the storage node 2 can obtain updated metadata in the storage system in real time, and then actively send the updated metadata.
  • a monitoring event can be set in the storage node 2 to detect whether the metadata in the storage node has changed. In this way, whenever metadata in the storage node 2 changes, the monitoring event is triggered and the updated metadata is actively sent to the access node. Therefore, in FIG. 4, step S404 is drawn with a dotted line to indicate that it is optional.
  • in another implementation of the embodiment of the present invention, the implementation shown in FIG. 4 may be performed entirely by a storage node or by an access node.
  • in a centralized storage system, the embodiment of the present invention may be implemented by an array controller, which is not limited by the embodiment of the present invention.
  • the storage system obtains the updated metadata and sends the updated metadata to the client or a third-party system/application.
  • the above various implementation manners are collectively referred to as being implemented by a storage system.
  • the embodiments of the present invention may also support implementation by a third-party device independent of the storage system, which will not be repeated here.
  • FIG. 9 is an example architecture diagram of a storage system implemented by adopting a publish/subscribe system.
  • the difference from the storage system described in FIG. 1 is that a publish/subscribe system 130 is added in FIG. 9.
  • the publish/subscribe system 130 communicates with the access node and the storage node 120 respectively.
  • the publish/subscribe system 130 is used to send the updated metadata in the form of messages.
  • the publish/subscribe system 130 can also communicate with third-party applications.
  • the publish/subscribe system includes two participants, a message publisher (producer) and a message subscriber (consumer).
  • the message publisher is used to create a topic in the publish/subscribe system and then send messages to the topic. The publish/subscribe system retains the messages in the topic for message subscribers and forwards them to each message subscriber.
  • after a message subscriber receives a message from the topic and confirms receipt, the publish/subscribe system removes the message from the topic.
  • when the publish/subscribe system is applied to the storage system, the storage node 120 can act as the message publisher, and the access node of the storage system, a client, or a third-party application can act as the message subscriber. The storage node 120 publishes metadata in the publish/subscribe system, and the access node 110, client, or third-party application consumes the metadata in real time, so that it can provide corresponding services based on the acquired metadata.
  • a topic is created in the publish/subscribe system
  • a message publisher is created in the storage node
  • a message subscriber is created in the access node.
  • a topic is first created in the publish/subscribe system, and the topic can include a default topic partition (TPartition).
  • to send messages through a topic, a message publisher must be created to produce messages and a message subscriber must be created to consume them. Therefore, a message publisher also needs to be created in the storage node and a message subscriber in the access node.
  • the message publisher includes only one default message publisher (TProducer)
  • the message subscriber includes only one default message subscriber (TConsumer)
  • the default topic is associated with the default message publisher and the default message subscriber.
  • each metadata partition includes a unique identifier
  • the unique identifier of a metadata partition can be used to map that metadata partition to a topic partition.
  • when the storage system starts for the first time, the metadata storage node includes only one metadata partition.
  • as new metadata is generated, the metadata partition splits, producing new metadata partitions. For example, the default metadata partition is split into metadata partition 1 and metadata partition 2.
  • after the metadata storage node obtains the updated metadata in metadata partition 2, it generates a new WAL record item corresponding to the updated metadata and writes the new WAL record item to the partition update queue.
  • the default message publisher determines from the new WAL record item that no message publisher corresponding to metadata partition 2 exists, so a message publisher corresponding to metadata partition 2, marked as message publisher 2, is created. Message publisher 2 then obtains the new WAL record item from the partition update queue and sends it to the topic.
  • after the default topic partition receives the new WAL record item, it determines that the topic does not include a topic partition corresponding to the new WAL record item. Topic partition 2, corresponding to metadata partition 2, is therefore created in the topic and receives the new WAL record item.
  • the default topic partition sends the default message subscriber an event message indicating that a new metadata partition has been generated.
  • after receiving the event message, the default message subscriber creates message subscriber 2 corresponding to metadata partition 2, and message subscriber 2 obtains the new WAL record item from topic partition 2.
  • the access node of the storage system serves as the publisher of the message, and the client or third-party application serves as the subscriber of the message.
  • the storage system acts as the publisher of the message, and the client or third-party application acts as the message subscriber.
  • the storage system may include a hardware structure and/or a software module, and implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is executed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
  • FIG. 11 shows a schematic structural diagram of a storage system 1100.
  • the storage system 1100 may be used to implement the functions of the storage node of the storage system or the array controller of the storage array.
  • the storage system 1100 may be a hardware structure, a software module, or a hardware structure plus a software module.
  • the storage system 1100 may be implemented by a chip system.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the storage system 1100 may include a processing unit 1101 and a transceiving unit 1102.
  • the transceiver unit 1102 may be used to execute step S403, step S404, step S406, step S408, step S411 to step S414 in the embodiment shown in FIG. 4, and/or other processes used to support the technology described herein.
  • the transceiver unit 1102 can be used to communicate with the processing unit 1101, or the transceiver unit 1102 can be used to communicate with the storage system 1100 and other modules, which can be circuits, devices, interfaces, buses, software modules, Transceiver or any other device that can realize communication.
  • the processing unit 1101 may be used to execute step S405, step S407, and step S410 in the embodiment shown in FIG. 4, and/or other processes used to support the technology described herein.
  • the division of modules in the embodiment shown in FIG. 11 is illustrative, and is only a logical function division. In actual implementation, there may be other division methods.
  • the functional modules in each embodiment of the present application may be integrated into one processor, may exist physically alone, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software function modules.
  • FIG. 12 shows a storage system 1200 provided by an embodiment of the application, where the storage system 1200 may be used to implement the functions of storage nodes of the storage system or array controllers of the storage array.
  • the storage system 1200 may be a chip system.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the storage system 1200 includes at least one processor 1220, which is used to implement or support the storage system 1200 to implement the functions of the storage node or the array controller of the storage array in the method provided in the embodiment of the present application.
  • the processor 1220 may obtain the updated metadata in the storage system. For details, refer to the detailed description in the method example, which is not repeated here.
  • the storage system 1200 may further include at least one memory 1230 for storing program instructions and/or data.
  • the memory 1230 and the processor 1220 are coupled.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units or modules, and may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the processor 1220 may operate in cooperation with the memory 1230.
  • the processor 1220 may execute program instructions stored in the memory 1230. At least one of the at least one memory may be included in the processor.
  • the storage system 1200 may further include an interface 1210 for communicating with the processor 1220, or for communicating with other devices through a transmission medium, so that the storage system 1200 can communicate with other devices.
  • the other device may be a client.
  • the processor 1220 can use the interface 1210 to send and receive data.
  • the embodiment of the present application does not limit the specific connection medium between the aforementioned interface 1210, the processor 1220, and the memory 1230.
  • the memory 1230, the processor 1220, and the interface 1210 are connected by a bus 1240.
  • the bus is represented by a thick line in FIG. 12. The connection modes between other components are only schematic and are not limiting.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 12 to represent it, but it does not mean that there is only one bus or one type of bus.
  • the processor 1220 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the memory 1230 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
  • the memory may also be, without being limited to, any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory in the embodiments of the present application may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.
  • An embodiment of the present application also provides a computer-readable storage medium, including instructions, which when run on a computer, cause the computer to execute the method executed by the storage node or the array controller in the embodiment shown in FIG. 4.
  • the embodiments of the present application also provide a computer program product, including instructions, which when run on a computer, cause the computer to execute the method executed by the storage node or the array controller in the embodiment shown in FIG. 4.
  • the embodiment of the present application provides a chip system.
  • the chip system includes a processor and may also include a memory for implementing the functions of the storage node or the array controller in the foregoing method.
  • the chip system can be composed of chips, or it can include chips and other discrete devices.
  • An embodiment of the present application provides a storage system, which includes a storage device and a storage node or an array controller in the embodiment shown in FIG. 4.
  • the methods provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented by software, the methods can be implemented wholly or partly in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by the computer or a data storage device such as a server, data center, etc. integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, an SSD), etc.

Abstract

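A method for sending metadata of a storage system, and a storage system. In the method, metadata in the storage system that has been updated according to an operation request (the updated metadata) is obtained first, and the updated metadata is then sent. Because the storage system can provide only the metadata that has been updated, metadata processing efficiency is improved compared with the prior-art approach of enumerating metadata.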

Description

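Method for Sending Metadata of a Storage System, and Storage System
Technical Field
This application relates to the field of storage technologies, and in particular, to a method for sending metadata of a storage system and to a storage system.
Background
In a storage system, various service functions can be provided to users by accessing the metadata of the storage system. For example, all metadata under a directory can be scanned to compute the directory's remaining capacity. Alternatively, all metadata under a directory can be obtained and the data stored under that directory analyzed, to implement service functions such as file classification or fuzzy search. Other examples are not listed here one by one.
In existing storage systems, metadata is accessed mainly through an enumeration interface provided by the storage system. However, as the amount of metadata in the storage system keeps growing, this approach severely degrades the metadata processing efficiency of the storage system.
Summary
Embodiments of this application provide a method for sending metadata of a storage system, and a storage system, to improve the metadata processing efficiency of the storage system.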
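According to a first aspect, this application provides a method for sending metadata of a storage system. In the method, metadata in the storage system that has been updated according to an operation request (the updated metadata) is obtained first, and the updated metadata is then sent.
In this technical solution, the storage system can provide only the metadata that has been updated, which improves metadata processing efficiency compared with the prior-art approach of enumerating metadata.
In addition, because metadata that has not been updated does not need to be provided, the volume of metadata transmitted is reduced and bandwidth resources are saved.
In a possible design, if the storage system includes multiple metadata partitions, obtaining the updated metadata of the storage system may mean obtaining the updated metadata of one of those partitions, for example, a first metadata partition.
In this technical solution, only the updated metadata in the first metadata partition needs to be obtained, rather than all changed metadata in the storage system. This further reduces the volume of metadata transmitted and saves bandwidth resources.
In a possible design, the metadata in the storage system may be recorded in a write-ahead log. Each piece of metadata corresponds to one metadata record entry, and each record entry contains the operation corresponding to the operation request together with the updated metadata. In this case, obtaining the updated metadata of the storage system may mean obtaining the updated metadata record entries in the write-ahead log of the storage system.
This technical solution supplies updated metadata through the write-ahead log, which increases the flexibility of the storage system.
In a possible design, the updated metadata of the storage system may be obtained in, but not limited to, the following two manners:
Manner 1: The updated metadata of the storage system is obtained in real time.
In this way, as soon as updated metadata exists in the storage system, the storage system can send it immediately. This keeps the metadata used by users consistent with the metadata in the storage system.
Manner 2: The storage system first receives an obtaining request for the updated metadata in the storage system, and then sends the updated metadata.
In this way, the storage system can send updated metadata according to user requirements.
In a possible design, the storage system may send the updated metadata in the form of messages.
In a possible design, the storage system includes multiple metadata partitions, and these partitions may change. For example, two metadata partitions may be merged, or one metadata partition may be split into two new ones. In this case, after one of the partitions, for example, a second metadata partition, changes, the storage system may send a first indication message. The first indication message indicates cancellation of order preservation for the updated metadata in the second metadata partition.
In this technical solution, when a metadata partition changes, the partition as it was before the change may still contain updated metadata that has not been sent. After the change, the unsent updated metadata continues to be sent, and the first indication message is sent once that sending is complete.
In a possible design, after obtaining updated metadata in the changed metadata partition, the storage system first sends a second indication message and then sends the updated metadata in the changed metadata partition. The second indication message indicates that order is to be preserved for the updated metadata in the changed metadata partition.
In this technical solution, when a metadata partition changes, order preservation of the updated metadata is achieved through the first indication message and the second indication message. For example, if a user received the first indication message before receiving the second indication message, all updated metadata in the pre-change metadata partition has been sent successfully. The user then receives the updated metadata of the post-change metadata partition, so order is preserved.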
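According to a second aspect, a method for sending metadata of a storage system is provided. In the method, the storage system records its metadata in a write-ahead log. Each piece of metadata corresponds to one metadata record entry, and each record entry contains the operation corresponding to the operation request together with the updated metadata. The storage system first obtains the updated metadata record entries in its write-ahead log, and then sends the updated metadata record entries.
In this technical solution, the storage system can provide only the record entries of metadata that has been updated, which improves metadata processing efficiency compared with the prior-art approach of enumerating metadata.
In a possible design, if the storage system includes multiple metadata partitions, obtaining the updated metadata record entries of the storage system may mean obtaining the updated metadata record entries of one of those partitions, for example, a first metadata partition.
In a possible design, the updated metadata record entries of the storage system may be obtained in, but not limited to, the following two manners:
Manner 1: The updated metadata record entries of the storage system are obtained in real time.
Manner 2: The storage system first receives an obtaining request for the updated metadata record entries in the storage system, and then sends the updated metadata record entries.
In a possible design, the storage system may send the updated metadata record entries in the form of messages.
In a possible design, the storage system includes multiple metadata partitions, and these partitions may change. In this case, after one of the partitions, for example, a second metadata partition, changes, the storage system may send a first indication message. The first indication message indicates cancellation of order preservation for the updated metadata in the second metadata partition.
In a possible design, after obtaining the updated metadata record entries in the changed metadata partition, the storage system first sends a second indication message and then sends the updated metadata record entries in the changed metadata partition. The second indication message indicates that order is to be preserved for the updated metadata in the changed metadata partition.
According to a third aspect, a storage system is provided. The storage system may be a storage node or a storage server, or an apparatus in a storage node or storage server. The storage system includes a processor configured to implement the method described in the first aspect. It may further include a memory for storing program instructions and data. The memory is coupled to the processor, and the processor can invoke and execute the program instructions stored in the memory to implement any method described in the first aspect. The storage system may further include a communications interface that communicates with the processor.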
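In a possible design, the storage system includes a communications interface and a processor, where:
the processor is configured to obtain updated metadata of the storage system, where the updated metadata indicates metadata in the storage system that has been updated according to an operation request; and
the communications interface is configured to send the updated metadata.
In a possible design, the storage system includes multiple metadata partitions, and the processor is specifically configured to:
obtain updated metadata of a first metadata partition among the multiple metadata partitions.
In a possible design, the processor is specifically configured to:
obtain updated metadata record entries in a write-ahead log of the storage system, where an updated metadata record entry contains the operation corresponding to the operation request and the updated metadata.
In a possible design, the processor is specifically configured to:
obtain the updated metadata of the storage system in real time.
In a possible design, the communications interface is specifically configured to:
send the updated metadata in the form of messages.
In a possible design, the communications interface is further configured to:
receive an obtaining request, where the obtaining request is used to obtain the updated metadata in the storage system.
In a possible design, the communications interface is further configured to:
after a second metadata partition among the multiple metadata partitions changes, send a first indication message, where the first indication message indicates cancellation of order preservation for the updated metadata in the second metadata partition.
In a possible design, the processor is further configured to:
obtain updated metadata in the changed metadata partition;
and the communications interface is further configured to:
send a second indication message, where the second indication message indicates that order is to be preserved for the updated metadata in the changed metadata partition; and
send the updated metadata in the changed metadata partition.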
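According to a fourth aspect, a storage system is provided. The storage system may be a storage node or a storage server, or an apparatus in a storage node or storage server. It may include a processing unit and a transceiver unit, which can perform the corresponding functions in any design example of the first aspect. Specifically:
the processing unit is configured to obtain updated metadata of the storage system, where the updated metadata indicates metadata in the storage system that has been updated according to an operation request; and
the transceiver unit is configured to send the updated metadata.
In a possible design, the storage system includes multiple metadata partitions, and the processing unit is specifically configured to:
obtain updated metadata of a first metadata partition among the multiple metadata partitions.
In a possible design, the processing unit is specifically configured to:
obtain updated metadata record entries in a write-ahead log of the storage system, where an updated metadata record entry contains the operation corresponding to the operation request and the updated metadata.
In a possible design, the processing unit is specifically configured to:
obtain the updated metadata of the storage system in real time.
In a possible design, the transceiver unit is specifically configured to:
send the updated metadata in the form of messages.
In a possible design, the transceiver unit is further configured to:
receive an obtaining request, where the obtaining request is used to obtain the updated metadata in the storage system.
In a possible design, the transceiver unit is further configured to:
after a second metadata partition among the multiple metadata partitions changes, send a first indication message, where the first indication message indicates cancellation of order preservation for the updated metadata in the second metadata partition.
In a possible design, the processing unit is further configured to:
obtain updated metadata in the changed metadata partition;
and the transceiver unit is further configured to:
send a second indication message, where the second indication message indicates that order is to be preserved for the updated metadata in the changed metadata partition; and
send the updated metadata in the changed metadata partition.
According to a fifth aspect, an embodiment of this application provides a computer-readable storage medium. The medium stores a computer program that includes program instructions. When executed by a computer, the program instructions cause the computer to perform the method according to any one of the first aspect.
According to a sixth aspect, an embodiment of this application provides a computer program product. The product stores a computer program that includes program instructions. When executed by a computer, the program instructions cause the computer to perform the method according to any one of the first aspect.
According to a seventh aspect, this application provides a chip system. The chip system includes a processor, may further include a memory, and is configured to implement the method according to the first aspect. It may consist of chips, or may include chips and other discrete components.
For the beneficial effects of the second to seventh aspects and their implementations, refer to the description of the beneficial effects of the method of the first aspect and its implementations.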
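Brief Description of Drawings
FIG. 1 is an architecture diagram of an example of a storage system in the prior art;
FIG. 2 is a schematic diagram of a process of writing data to a storage system;
FIG. 3 is an architecture diagram of an example of a storage system according to an embodiment of this application;
FIG. 4 is a flowchart of a metadata sending method according to an embodiment of this application;
FIG. 5 is a schematic diagram of an example of multiple metadata partitions according to an embodiment of this application;
FIG. 6 is a schematic diagram of an example of changes to metadata partitions according to an embodiment of this application;
FIG. 7 is a schematic diagram of an example of dynamic changes to metadata partitions according to an embodiment of this application;
FIG. 8 is a schematic diagram of an example of splitting a metadata partition according to an embodiment of this application;
FIG. 9 is an architecture diagram of an example of a storage system implemented by using a publish/subscribe system according to an embodiment of this application;
FIG. 10 is a schematic diagram of sending metadata by a storage system implemented by using a publish/subscribe system according to an embodiment of this application;
FIG. 11 is a schematic diagram of an example of a storage system according to an embodiment of this application;
FIG. 12 is a schematic diagram of another example of a storage system according to an embodiment of this application.
Description of Embodiments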
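To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the following describes the technical solutions in the embodiments in detail with reference to the accompanying drawings and specific implementations.
In the embodiments of this application, "multiple" means two or more, so "multiple" may also be understood as "at least two". "At least one" may be understood as one or more, for example, one, two, or more. Including at least one means including one, two, or more, without limiting which ones are included. For example, if at least one of A, B, and C is included, then A, B, C, A and B, A and C, B and C, or A, B, and C may be included. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate that only A exists, that both A and B exist, or that only B exists. In addition, unless otherwise specified, the character "/" generally indicates an "or" relationship between the associated objects.
Unless otherwise stated, ordinal numbers such as "first" and "second" in the embodiments of this application distinguish between multiple objects. They do not limit the sequence, timing, priority, or importance of those objects.
The method in the embodiments of this application may be applied to a storage system, which may be centralized or distributed. Specifically, it may be a database storage system, file storage system, block storage system, object storage system, columnar storage system, or the like. It may also be a cloud storage system or a combination of the foregoing. The form of the storage system is not limited here.
Referring to FIG. 1, a storage system 100 includes an access node 110 and one or more storage nodes 120. In FIG. 1, the storage system 100 includes one access node 110 and two storage nodes 120 (storage node 1 and storage node 2) as an example.
It should be noted that the access node 110 and the storage nodes 120 may be independent servers or virtual devices. When they are virtual devices, these virtual devices may run on multiple servers or on the same server, as a person skilled in the art may decide according to the actual running environment.
Referring to FIG. 2, when the storage system 100 writes data, the access node 110 first receives a write operation request that includes the data to be written. It then generates metadata corresponding to that data and stores the generated metadata in a storage node, for example, storage node 2. In one implementation, the storage system 100 records the operation request and the metadata by using write-ahead logging (Write-ahead Logging, WAL).
Embodiments of this application provide a method and an apparatus for sending metadata of a storage system, to improve the metadata processing efficiency of the storage system.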
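The following describes the method provided in the embodiments of this application with reference to the accompanying drawings. To implement the method, the storage system shown in FIG. 1 is first improved as follows:
Refer to FIG. 3, which is an architecture diagram of an example of a storage system according to an embodiment of this application. Unlike the storage system shown in FIG. 1, a new interface is added to the access node 110 in FIG. 3. This interface provides updated metadata. For ease of description, it is called the update-metadata interface in the embodiments of this application; its name is not limited.
The storage system 100 can obtain changed metadata. For example, when a client writes new data to the storage system 100, the metadata corresponding to the new data is changed metadata. When a client modifies data A already stored in the storage system 100, the metadata corresponding to the modified data is changed metadata. When a client deletes data B already stored in the storage system 100, the metadata corresponding to data B is changed metadata. In other words, updated metadata results from an operation request that updates data in the storage system (for example, a modify-data request, a delete-data request, or a write-new-data request). The update-metadata interface provides the changed metadata in the storage system 100.
It should be noted that the storage system shown in FIG. 3 places the new interface inside the storage system 100 as an example. In some other embodiments, the new interface may be placed outside the storage system 100; this is not limited here.
The following uses the storage system shown in FIG. 3 as an example to describe the metadata sending method in the embodiments of this application. Refer to FIG. 4, which is a flowchart of the method. The flowchart is described as follows:
S401: The access node receives an operation request.
The operation request is used to update data in the storage system. For example, it may be a request to write new data to the storage system, a request to delete data already stored in the storage system, or a request to modify data stored in the storage system.
S402: The access node generates metadata corresponding to the operation request.
After receiving the operation request, the access node updates data in storage node 1 according to the request, and then generates metadata corresponding to the updated data. The specific process is similar to that shown in FIG. 2 and is not described again.
S403: The access node sends the generated metadata to the metadata storage node, and the metadata storage node stores the metadata.
In the embodiments of this application, the metadata storage node may be storage node 2 in FIG. 3; for ease of description, storage node 2 is used as the metadata storage node below. Storage node 2 may store the metadata in, but not limited to, the following manners.
Storage manner 1: Storage node 2 may use out-of-place updates, that is, write metadata into storage node 2 sequentially in the order in which the access node generates it. For example, if storage node 2 receives first metadata and then second metadata, it writes the first metadata to its first metadata entry and then writes the second metadata to its second metadata entry.
Storage manner 2: Storage node 2 includes multiple metadata partitions. For example, range partitioning may be used to divide all metadata that storage node 2 can store into partitions by key (which can be understood as the index number of the metadata). As shown in FIG. 5, the key range of metadata that storage node 2 can store is {0~100}, and the metadata is divided into five metadata partitions according to a preset step. Metadata on a boundary point is assigned under a fixed left-closed right-open (or left-open right-closed) rule, so that its ownership is unambiguous. For example, in FIG. 5 the key range of the first metadata partition is 0~19, that of the second is 20~39, and so on. After receiving metadata, storage node 2 can therefore determine the partition to which it belongs from its key. In another implementation, different metadata partitions are stored on different storage nodes. This is not limited in the embodiments of the present invention.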
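The key-to-partition lookup in storage manner 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes the FIG. 5 layout (five partitions with a step of 20 over keys 0~100, left-closed right-open boundaries), and it assumes the last partition also absorbs key 100. The names `LOWER_BOUNDS`, `MAX_KEY`, and `partition_of` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical partition table for the FIG. 5 example: each entry is the
# inclusive lower bound of one partition (left-closed, right-open).
LOWER_BOUNDS = [0, 20, 40, 60, 80]
MAX_KEY = 100  # assumed inclusive upper end; the last partition owns 80..100


def partition_of(key: int) -> int:
    """Return the 0-based index of the partition that owns `key`."""
    if not 0 <= key <= MAX_KEY:
        raise KeyError(f"key {key} is outside the key space")
    # bisect_right returns the position of the first bound strictly greater
    # than key, so the owning partition is the one just before it.
    return bisect_right(LOWER_BOUNDS, key) - 1
```

With this table, key 19 falls in partition 0 and key 20 in partition 1, which is the boundary rule the text requires. A binary search over sorted lower bounds keeps the lookup logarithmic in the number of partitions.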
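In the embodiments of this application, the multiple metadata partitions may take the following forms.
Form 1: The metadata partitions are fixed. For example, the storage system presets the five metadata partitions shown in FIG. 5 and keeps them unchanged during subsequent use.
Form 2: The metadata partitions can be dynamically merged or split as needed. Because the amount of metadata may differ across partitions, a partition can be split to keep the load balanced once its metadata exceeds a threshold, generating new metadata partitions. When the amount of metadata in one or more partitions decreases, partitions can be merged. For example, the second metadata partition shown in FIG. 5 can be split into {20~30} and {31~39}, giving the six metadata partitions shown in FIG. 6(a). Alternatively, the first and second metadata partitions shown in FIG. 5 can be merged, giving the four metadata partitions shown in FIG. 6(b).
In this case, because the metadata partitions change dynamically, metadata with the same key may be located in different metadata partitions at different times. For example, refer to FIG. 7 and consider the metadata whose key is 8. During t1~t2, it is located in metadata partition {0~19}, where {0~19} denotes the key range of metadata in that partition; a partition is therefore identified by its key range. During t3~t4, metadata partition {0~19} has been split into {0~9} and {10~19}, so the metadata with key 8 is located in {0~9}. During t5~t6, metadata partition {0~9} is split into {0~5} and {6~9}, and the metadata with key 8 is located in {6~9}.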
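The FIG. 7 walk-through can be reproduced with a small, hypothetical range-partition map. It assumes that partitions are identified by inclusive key ranges, as in the text, and are stored internally as sorted lower bounds. `RangePartitionMap`, `split`, `merge`, and `locate` are illustrative names only.

```python
from bisect import bisect_right, insort


class RangePartitionMap:
    """Sketch of dynamically split/merged range partitions (names hypothetical)."""

    def __init__(self, lo: int, hi: int):
        self.hi = hi          # inclusive upper end of the key space
        self.bounds = [lo]    # sorted inclusive lower bounds, one per partition

    def split(self, at: int) -> None:
        """Split the partition containing `at` so that a new partition starts at `at`."""
        if not self.bounds[0] < at <= self.hi:
            raise ValueError("split point must lie strictly inside the key space")
        if at not in self.bounds:
            insort(self.bounds, at)

    def merge(self, at: int) -> None:
        """Merge the partition that starts at `at` into its left neighbour."""
        if at in self.bounds and at != self.bounds[0]:
            self.bounds.remove(at)

    def locate(self, key: int) -> tuple[int, int]:
        """Return the (low, high) inclusive key range of the partition owning `key`."""
        i = bisect_right(self.bounds, key) - 1
        low = self.bounds[i]
        high = self.bounds[i + 1] - 1 if i + 1 < len(self.bounds) else self.hi
        return (low, high)
```

Starting from `RangePartitionMap(0, 19)`, calling `split(10)` and then `split(6)` moves key 8 from {0~19} to {0~9} and then to {6~9}, matching the three time windows in FIG. 7.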
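In one implementation of the embodiments of the present invention, each metadata partition uses a WAL to record its metadata, for example, by recording the operation request and its metadata as log record entries.
S404: The access node receives an obtaining request for the updated metadata of the storage system 100, and sends the obtaining request to storage node 2.
As an example, when a client needs to obtain all metadata at the current moment, it can send the obtaining request to the access node by calling the update-metadata interface. In another implementation of the embodiments of the present invention, another application or system may send the obtaining request to obtain the updated metadata of the storage system 100.
S405: Storage node 2 obtains the updated metadata in the storage system.
In the embodiments of this application, storage node 2 may obtain the updated metadata in, but not limited to, the following manners.
Obtaining manner 1: Storage node 2 may obtain updated metadata through snapshots. For example, storage node 2 takes a first snapshot of its stored metadata at a first moment and a second snapshot after a preset duration. Comparing the two snapshots yields the updated metadata in storage node 2.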
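Obtaining manner 1 amounts to a diff between two snapshots. The sketch below assumes, for illustration only, that a snapshot can be represented as a mapping from metadata key to metadata value. The function name `diff_snapshots` and the added/modified/deleted classification are hypothetical.

```python
def diff_snapshots(first: dict, second: dict) -> dict:
    """Compare two metadata snapshots (key -> metadata) and classify the changes.

    `first` is the earlier snapshot and `second` the later one.
    """
    added = {k: second[k] for k in second.keys() - first.keys()}
    deleted = {k: first[k] for k in first.keys() - second.keys()}
    modified = {
        k: second[k]
        for k in first.keys() & second.keys()
        if first[k] != second[k]
    }
    return {"added": added, "modified": modified, "deleted": deleted}
```

Set operations on dictionary key views keep the comparison concise. A real snapshot comparison would more likely work on on-disk structures than on in-memory dictionaries.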
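Obtaining manner 2: When storing metadata, storage node 2 may attach identification information to each piece of metadata to indicate that it has not been sent. Once the metadata has been sent, the identification information is deleted. Storage node 2 can then obtain updated metadata by searching for metadata that carries the identification information.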
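A minimal sketch of obtaining manner 2 follows, assuming the identification information is a boolean "unsent" flag on each in-memory entry. `MetadataEntry`, `FlaggedStore`, `collect_updates`, and `mark_sent` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MetadataEntry:
    key: str
    value: dict
    unsent: bool = True  # identification information: set on store, cleared once sent


class FlaggedStore:
    def __init__(self):
        self.entries: dict[str, MetadataEntry] = {}

    def put(self, key: str, value: dict) -> None:
        # Every store, whether new or an update, marks the entry as not yet sent.
        self.entries[key] = MetadataEntry(key, value)

    def collect_updates(self) -> list[MetadataEntry]:
        # Updated metadata = entries that still carry the "unsent" mark.
        return [e for e in self.entries.values() if e.unsent]

    def mark_sent(self, keys) -> None:
        # Remove the identification information after the entries are sent.
        for k in keys:
            if k in self.entries:
                self.entries[k].unsent = False
```

Re-storing a key that was already sent sets the flag again, so the entry shows up as updated in the next collection.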
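Obtaining manner 3: Metadata in storage node 2 is stored sequentially, so each stored piece of metadata has a sequence number. Storage node 2 may record the sequence number of the metadata it sends each time. For example, in the initial state of the storage system no metadata has been sent, so the recorded sent sequence number starts at 0, and storage node 2 treats metadata with sequence numbers greater than 0 as updated metadata. The sequence number of a piece of metadata can be understood as the number of the metadata entry in which it is stored.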
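Obtaining manner 3 can be sketched with an append-only list in which each entry's position gives its sequence number. The sketch assumes numbering starts at 1 and that a recorded value of 0 means nothing has been sent, as in the text. `SequencedMetadataLog` and its methods are hypothetical.

```python
class SequencedMetadataLog:
    """Append-only metadata log; entries[i] has sequence number i + 1."""

    def __init__(self):
        self.entries = []
        self.last_sent_seq = 0  # 0: nothing has been sent yet

    def append(self, metadata) -> int:
        self.entries.append(metadata)
        return len(self.entries)  # sequence number of the new entry

    def updated_since_last_send(self):
        # Entries whose sequence number is greater than last_sent_seq are updated.
        start = self.last_sent_seq  # list index of the first unsent entry
        return [(i + 1, m) for i, m in enumerate(self.entries[start:], start=start)]

    def mark_sent(self, seq: int) -> None:
        # Never move the recorded sequence number backwards.
        self.last_sent_seq = max(self.last_sent_seq, seq)
```

Only one integer per log has to be persisted, which is cheaper than flagging every entry as in obtaining manner 2.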
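In the embodiments of the present invention, metadata may be stored by metadata partition. In this case, obtaining the updated metadata in the storage system 100 may specifically mean obtaining the updated metadata in one metadata partition (for example, a first metadata partition) of the storage system. For example, suppose the obtaining request asks for all metadata related to a particular service, and all such metadata is stored in the first metadata partition. Storage node 2 can then obtain only the updated metadata in the first metadata partition instead of all changed metadata in the storage system, which reduces the volume of metadata transmitted and saves bandwidth resources.
In the embodiments of the present invention, each metadata partition may use a WAL to record its metadata. In this case, obtaining the updated metadata in the storage system 100 may specifically mean obtaining the updated metadata record entries in the write-ahead log of the storage system.
S406: Storage node 2 sends the obtained updated metadata to the access node, and the access node sends the updated metadata to the client.
In the embodiments of the present invention, the storage system can provide only the updated metadata. Compared with the prior-art approach of enumerating metadata, this improves metadata processing efficiency and reduces the volume of metadata transmitted, which saves bandwidth resources.
In one implementation, storage node 2 sends the updated metadata in the order in which it was updated, so the metadata obtained by the client is also ordered.
In another implementation, metadata is stored by metadata partition, and obtaining the updated metadata in the storage system 100 means obtaining the updated metadata in one metadata partition. The storage node then sends the updated metadata of that partition to the access node, and the access node sends it to the client. This reduces the volume of metadata transmitted and saves bandwidth resources.
In another implementation, metadata is stored by metadata partition, and obtaining the updated metadata in the storage system 100 means obtaining the updated metadata in multiple metadata partitions. The storage node then sends the updated metadata of those partitions to the access node in parallel, which reduces sending latency.
In another implementation, when metadata is recorded using a WAL, metadata can be sent by sending WAL record entries.
In another implementation, when metadata is recorded using a WAL, the location information of the updated metadata recorded in the WAL can be sent instead, and the client or another application reads the updated metadata from the WAL according to that location information. The location information may include the identifier of each WAL record entry and/or the offset of each WAL record entry. Multiple WAL record entries to be sent are stored sequentially, so the location information may instead include the identifier of the WAL record entry, its offset, and the length of the unit storage space occupied by the multiple entries. The WAL record entries can then be obtained in a batch from this location information.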
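The batch read by location information can be illustrated as follows. The sketch assumes each WAL record is stored with a 4-byte big-endian length prefix; this encoding is not specified in the source. It also assumes the location information carries a starting offset and the total byte length of the consecutive entries. `append_record` and `read_batch` are hypothetical helpers.

```python
import struct

HEADER = struct.Struct(">I")  # assumed 4-byte big-endian length prefix per record


def append_record(wal: bytearray, payload: bytes) -> int:
    """Append one length-prefixed record and return its byte offset in the WAL."""
    offset = len(wal)
    wal += HEADER.pack(len(payload)) + payload  # in-place extend of the caller's bytearray
    return offset


def read_batch(wal: bytes, offset: int, length: int) -> list[bytes]:
    """Read every record in wal[offset:offset + length] using one contiguous slice."""
    chunk = bytes(wal[offset:offset + length])
    records, pos = [], 0
    while pos < len(chunk):
        (size,) = HEADER.unpack_from(chunk, pos)
        pos += HEADER.size
        records.append(chunk[pos:pos + size])
        pos += size
    return records
```

A consumer handed a location such as `{"first_entry_id": 2, "offset": o, "length": n}` needs only one contiguous read of `n` bytes to decode all of the entries. This is the batching benefit the text describes for sequentially stored WAL record entries.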
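S407: A second metadata partition in the metadata storage node changes, generating new metadata partitions.
In the embodiments of the present invention, a change to the second metadata partition may be a split of that partition or its merge with another metadata partition. For ease of description, a split of the second metadata partition is used as the example below.
As the amount of metadata stored in the metadata storage node grows, the second metadata partition splits into multiple metadata partitions once a split condition is met. The second metadata partition may be any one of the metadata partitions in the metadata storage node.
As an example, as shown in FIG. 8, the partition key range of metadata partition 1 is A~C. Metadata partition 1 is split at key B into metadata partition 2 (keys from A up to but not including B) and metadata partition 3 (keys from B to C).
S408: The metadata storage node sends a first indication message to the access node, where the first indication message indicates cancellation of order preservation for the updated metadata in the second metadata partition.
As shown in FIG. 8, when metadata partition 1 splits, the largest sequence number of updated metadata in metadata partition 1 is 102, but the metadata storage node has sent metadata to the access node only up to sequence number 100. To ensure that the metadata with sequence numbers 101 and 102 is still sent in order, the metadata storage node continues to send it after the split. After sending the metadata with sequence number 102, it sends the access node a message indicating cancellation of order preservation for the updated metadata in the second metadata partition. In the embodiments of the present invention, the sequence number of metadata is the sequence number under which the metadata is stored. For example, when metadata is recorded in WAL mode, the sequence number is the order of the WAL record entries.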
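The drain-then-release rule in S408 can be sketched as a generator of outgoing messages. The tuple message format and the name `split_handover` are assumptions made for illustration. The inputs mirror FIG. 8: pending entries 100 to 102, with everything up to 100 already delivered.

```python
def split_handover(pending, last_sent_seq, partition_id):
    """Yield the messages a storage node emits for a partition that has just split.

    pending: (seq, metadata) pairs still held by the pre-split partition.
    Entries with seq <= last_sent_seq were already delivered and are skipped.
    """
    for seq, metadata in sorted(pending, key=lambda e: e[0]):
        if seq > last_sent_seq:
            yield ("metadata", partition_id, seq, metadata)
    # Only after the old partition is fully drained is order preservation released
    # (the first indication message).
    yield ("cancel_order", partition_id)
```

For the FIG. 8 numbers, the generator emits metadata 101, then 102, and only then the cancel-order message. The first indication message therefore cannot overtake any pre-split metadata.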
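S409: The access node generates a mark that cancels the order-preserving relationship of the second metadata partition.
S410: The metadata storage node obtains the updated metadata in the metadata partitions produced by the split.
S411: The metadata storage node sends a second indication message to the access node, and the access node sends the second indication message to the client.
The second indication message indicates that order is to be preserved for the updated metadata in the metadata partitions produced by the split.
S412: The client sends an acknowledgment message for the second indication message to the access node, and the access node sends the acknowledgment message to the metadata storage node.
After receiving the second indication message, the client queries whether the access node holds a mark cancelling the order-preserving relationship of the second metadata partition. If the mark exists, the client returns an acknowledgment to the access node, and the access node clears the mark after receiving it. If, after receiving this partition order-preservation establishment message, the client determines that the mark does not exist in the access node, the client does not send an acknowledgment to the access node (or sends a negative acknowledgment).
The first indication message is sent only after all updated metadata in the pre-split metadata partition has been sent. Therefore, when the client confirms that it received the first indication message before the second indication message, all updated metadata in the pre-split partition has been sent successfully. The client then receives updated metadata according to the post-split metadata partitions, so order is preserved.
If the metadata storage node does not receive the acknowledgment message, it may keep sending the second indication message until the acknowledgment message arrives.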
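The S409 to S412 handshake can be modelled with in-process objects. This is a sketch under two assumptions: calls stand in for network messages, and the storage node's "keep resending" is capped at `max_attempts` so the example terminates. `AccessNode`, `Client`, and `send_second_indication` are hypothetical names.

```python
class AccessNode:
    """Holds the 'order preservation cancelled' marks created in S409."""

    def __init__(self):
        self.cancel_marks = set()

    def on_first_indication(self, old_partition):
        # S409: remember that ordering for the old partition was released.
        self.cancel_marks.add(old_partition)

    def relay_second_indication(self, client, old_partition) -> bool:
        # S411/S412: forward to the client and clear the mark once the client ACKs.
        ack = client.on_second_indication(self, old_partition)
        if ack:
            self.cancel_marks.discard(old_partition)
        return ack


class Client:
    def on_second_indication(self, access_node, old_partition) -> bool:
        # ACK only if the cancel mark (first indication) exists; otherwise no ACK
        # (or a negative acknowledgment), modelled here as False.
        return old_partition in access_node.cancel_marks


def send_second_indication(access_node, client, old_partition, max_attempts=3) -> bool:
    """Storage-node side: resend the second indication until it is ACKed (bounded here)."""
    for _ in range(max_attempts):
        if access_node.relay_second_indication(client, old_partition):
            return True  # S413 may now send the post-split metadata
    return False
```

If the first indication has not been processed, every attempt fails and post-split metadata is held back. Once the mark exists, the acknowledgment succeeds and clears the mark, which is exactly the ordering guarantee the text relies on.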
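S413: The metadata storage node sends the updated metadata in the post-split metadata partitions to the access node, and the access node sends the updated metadata to the client.
After receiving the acknowledgment message, the metadata storage node sends the updated metadata in the post-split metadata partitions to the access node.
It should be noted that when metadata partitions are merged, the processing is the same as described above and is not repeated here.
In addition, the access node may record the sequence number of the updated metadata it has already sent. When the access node determines that the sequence number of received updated metadata is less than the recorded sequence number, it treats that metadata as duplicate data and can discard it directly, avoiding repeated processing.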
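The duplicate check can be sketched as a small filter. Two assumptions apply. First, sequence numbers are tracked per metadata partition, since each partition keeps its own WAL. Second, the recorded value is the next sequence number expected, so "less than the recorded number" exactly captures already-forwarded entries. `DuplicateFilter` is a hypothetical name.

```python
class DuplicateFilter:
    """Per-partition duplicate filter keyed on metadata sequence numbers."""

    def __init__(self):
        # partition id -> lowest sequence number not yet forwarded
        self.next_expected = {}

    def accept(self, partition, seq: int) -> bool:
        if seq < self.next_expected.get(partition, 0):
            return False  # already forwarded: duplicate, discard
        self.next_expected[partition] = seq + 1
        return True
```

Duplicates typically arise when the storage node resends after a timeout. Discarding them at the access node keeps client-side processing idempotent.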
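S414: The metadata storage node sends a metadata deletion message to the access node, and the access node deletes metadata according to the metadata deletion message.
After the metadata storage node sends updated metadata, a deletion operation on already-sent metadata can be triggered periodically, which sends a metadata deletion message to the access node. Metadata messages in the embodiments of this application are sent in order. Therefore, when the access node receives the metadata deletion message, the metadata corresponding to the metadata partition is guaranteed to have been consumed already, and the access node can delete the updated metadata it received, saving storage resources of the storage system.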
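Because delivery is ordered, deletion at the access node reduces to trimming a per-partition queue from the front. The source does not specify what the deletion message carries. This sketch assumes it names a partition and the highest sequence number that may be dropped. `AccessNodeBuffer` and `on_delete_message` are hypothetical names.

```python
from collections import defaultdict, deque


class AccessNodeBuffer:
    def __init__(self):
        # partition -> deque of (seq, metadata), kept in arrival (= sequence) order
        self.buffers = defaultdict(deque)

    def receive(self, partition, seq: int, metadata) -> None:
        self.buffers[partition].append((seq, metadata))

    def on_delete_message(self, partition, up_to_seq: int) -> None:
        # Ordered delivery means every entry <= up_to_seq sits at the front,
        # so trimming from the left is sufficient.
        buf = self.buffers[partition]
        while buf and buf[0][0] <= up_to_seq:
            buf.popleft()
```

The front-trimming shortcut is valid only because of the ordering guarantee. With out-of-order delivery the buffer would have to be scanned in full.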
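It should be noted that step S404 is optional; it does not have to be performed. For example, storage node 2 can obtain updated metadata in the storage system in real time and actively send it. As an example, a monitoring event can be set in storage node 2 to detect whether the metadata in the storage node has changed. Whenever metadata in storage node 2 changes, the monitoring event is triggered and the updated metadata is actively sent to the access node. Therefore, step S404 is drawn with a dashed line in FIG. 4 to indicate that it is optional.
In another implementation of the embodiments of the present invention, the implementation shown in FIG. 4 may be performed entirely by a storage node or by an access node. In a centralized storage system, the embodiments of the present invention may be implemented by an array controller; this is not limited in the embodiments of the present invention. The storage system obtains the updated metadata and sends it to a client or a third-party system/application. In the embodiments of the present invention, all of the foregoing implementations are collectively referred to as implementation by the storage system. In addition, the embodiments of the present invention may also be implemented by a third-party device independent of the storage system; details are not repeated.
The following describes the metadata sending method in the embodiments of this application in detail, using an implementation based on a publish/subscribe system (pub/sub system) as an example.
Refer to FIG. 9, which is an architecture diagram of an example of a storage system implemented by using a publish/subscribe system. Unlike the storage system described in FIG. 1, FIG. 9 adds a publish/subscribe system 130. The publish/subscribe system 130 communicates with both the access node and the storage nodes 120, and sends updated metadata in the form of messages. It may also communicate with third-party applications.
To help a person skilled in the art understand, the publish/subscribe system is described first.
A publish/subscribe system has two kinds of participants: message publishers (producers) and message subscribers (consumers). A message publisher creates a topic in the publish/subscribe system and then sends messages to the topic. The publish/subscribe system retains the messages in the topic for message subscribers and forwards them to each message subscriber. After a message subscriber receives a message from the topic and confirms receipt to the publish/subscribe system, the publish/subscribe system removes the message from the topic.
When the publish/subscribe system is applied to the storage system, the storage nodes 120 can act as message publishers, and the access node of the storage system, clients, or third-party applications can act as message subscribers. The storage nodes 120 publish metadata in the publish/subscribe system, and the access node 110, clients, or third-party applications consume the metadata in real time. They can then provide corresponding services based on the obtained metadata.
The following uses the storage system shown in FIG. 9 as an example to describe the metadata sending method in the embodiments of this application.
When the storage system starts for the first time, a topic is created in the publish/subscribe system, a message publisher is created in the storage node, and a message subscriber is created in the access node.
When the storage system starts for the first time, a topic is first created in the publish/subscribe system; the topic may include one default topic partition (TPartition). Sending messages through a topic requires a message publisher to produce messages and a message subscriber to consume them, so a message publisher must also be created in the storage node and a message subscriber in the access node. At first startup, the message publisher includes only one default message publisher (TProducer), and the message subscriber includes only one default message subscriber (TConsumer). The default topic is associated with both. Messages produced by the default message publisher are sent to the default topic, and messages in the default topic are pushed to the default message subscriber for processing.
It should be noted that in the embodiments of this application, when the storage system stores metadata in metadata partitions, the topic may be partitioned correspondingly. Each metadata partition has a unique identifier, through which it can be mapped to a topic partition. When the storage system starts for the first time, the metadata storage node includes only one metadata partition. As new metadata is generated, this partition splits and produces new metadata partitions; for example, the default metadata partition splits into metadata partition 1 and metadata partition 2.
As an example, consider a message publisher that sends WAL record entries. Referring to FIG. 10, after obtaining updated metadata in metadata partition 2, the metadata storage node generates a new WAL record entry for the updated metadata and writes it to the partition update queue. From the new WAL record entry, the default message publisher determines that no message publisher exists for metadata partition 2, so a message publisher for metadata partition 2, denoted message publisher 2, is created. Message publisher 2 then obtains the new WAL record entry from the partition update queue and sends it to the topic. After receiving the new WAL record entry, the default topic partition determines that the topic has no topic partition for this entry, so topic partition 2, corresponding to metadata partition 2, is created in the topic and receives the new WAL record entry. At the same time, the default topic partition sends the default message subscriber an event message indicating that a new metadata partition has been generated. On receiving the event message, the default message subscriber creates message subscriber 2 for metadata partition 2, which then obtains the new WAL record entry from topic partition 2.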
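The lazy, per-partition creation of publisher, topic partition, and subscriber in FIG. 10 can be mimicked with an in-memory stand-in. This is not a client for any real publish/subscribe product. The sketch assumes that consumption immediately counts as acknowledgment, so consumed records are removed from the topic, as the pub/sub description above states. `PubSubSketch`, `publish`, and `consume_and_ack` are hypothetical names.

```python
class PubSubSketch:
    """In-memory stand-in for the FIG. 10 flow (hypothetical, not a real pub/sub API)."""

    def __init__(self):
        self.publishers = {"default"}       # TProducer plus per-partition publishers
        self.topic = {"default": []}        # topic partition id -> retained records
        self.subscribers = {"default"}      # TConsumer plus per-partition subscribers
        self.events = []                    # event messages seen by the default subscriber

    def publish(self, partition_id, wal_record) -> None:
        # Default publisher finds no publisher for this metadata partition: create one.
        if partition_id not in self.publishers:
            self.publishers.add(partition_id)
        # Default topic partition finds no matching topic partition: create it and
        # notify the default subscriber with a "new metadata partition" event.
        if partition_id not in self.topic:
            self.topic[partition_id] = []
            self.events.append(("new_metadata_partition", partition_id))
            self._on_new_partition_event(partition_id)
        self.topic[partition_id].append(wal_record)

    def _on_new_partition_event(self, partition_id) -> None:
        # Default subscriber reacts by creating the subscriber for this partition.
        self.subscribers.add(partition_id)

    def consume_and_ack(self, partition_id) -> list:
        # Reading is treated as acknowledgment, so the records leave the topic.
        if partition_id not in self.subscribers:
            raise KeyError(f"no subscriber for {partition_id}")
        records, self.topic[partition_id] = self.topic[partition_id], []
        return records
```

After `publish("mp2", "wal-1")`, a publisher, a topic partition, and a subscriber for `"mp2"` all exist, and exactly one new-partition event has been recorded. Further records for `"mp2"` reuse those objects without raising another event.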
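In another implementation, the access node of the storage system acts as the message publisher, and clients or third-party applications act as message subscribers. In yet another implementation, the storage system acts as the message publisher, and clients or third-party applications act as message subscribers.
In the embodiments provided in this application, the storage system may include a hardware structure and/or a software module to implement the functions in the methods provided in the embodiments. These functions may take the form of a hardware structure, a software module, or a hardware structure plus a software module. Which form performs a given function depends on the specific application and the design constraints of the technical solution.
FIG. 11 is a schematic structural diagram of a storage system 1100. The storage system 1100 may be used to implement the functions of a storage node of a storage system or an array controller of a storage array. The storage system 1100 may be a hardware structure, a software module, or a hardware structure plus a software module, and may be implemented by a chip system. In the embodiments of this application, a chip system may consist of chips, or may include chips and other discrete components.
The storage system 1100 may include a processing unit 1101 and a transceiver unit 1102.
The transceiver unit 1102 may be configured to perform steps S403, S404, S406, S408, and S411 to S414 in the embodiment shown in FIG. 4, and/or other processes supporting the technology described herein. In a possible implementation, the transceiver unit 1102 may communicate with the processing unit 1101, or may be used for communication between the storage system 1100 and other modules. It may be a circuit, device, interface, bus, software module, transceiver, or any other apparatus capable of communication.
The processing unit 1101 may be configured to perform steps S405, S407, and S410 in the embodiment shown in FIG. 4, and/or other processes supporting the technology described herein.
All relevant content of the steps in the foregoing method embodiments can be cited in the functional descriptions of the corresponding functional modules; details are not repeated here.
The division into modules in the embodiment shown in FIG. 11 is illustrative and is only a logical functional division; other divisions are possible in actual implementation. In addition, the functional modules in the embodiments of this application may be integrated into one processor or may exist physically on their own, or two or more modules may be integrated into one module. An integrated module may be implemented in the form of hardware or in the form of a software functional module.
FIG. 12 shows a storage system 1200 provided in an embodiment of this application. The storage system 1200 may be used to implement the functions of a storage node of a storage system or an array controller of a storage array, and may be a chip system. In the embodiments of this application, a chip system may consist of chips, or may include chips and other discrete components.
The storage system 1200 includes at least one processor 1220. The processor implements, or supports the storage system 1200 in implementing, the functions of the storage node or the array controller of the storage array in the method provided in the embodiments of this application. For example, the processor 1220 may obtain the updated metadata in the storage system; for details, refer to the detailed description in the method example, which is not repeated here.
The storage system 1200 may further include at least one memory 1230 for storing program instructions and/or data. The memory 1230 is coupled to the processor 1220. Coupling in the embodiments of this application is an indirect coupling or communication connection between apparatuses, units, or modules. It may be electrical, mechanical, or in another form, and is used for information exchange between them. The processor 1220 may operate in cooperation with the memory 1230 and may execute the program instructions stored in the memory 1230. At least one of the at least one memory may be included in the processor.
The storage system 1200 may further include an interface 1210 for communicating with the processor 1220, or with other devices through a transmission medium, so that the storage system 1200 can communicate with other devices. For example, the other device may be a client. The processor 1220 may use the interface 1210 to send and receive data.
The embodiments of this application do not limit the specific connection medium among the interface 1210, the processor 1220, and the memory 1230. In FIG. 12, the memory 1230, the processor 1220, and the interface 1210 are connected by a bus 1240, represented by a thick line. The connections between other components are merely illustrative and are not limiting. The bus may be classified into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 12, but this does not mean that there is only one bus or only one type of bus.
In the embodiments of this application, the processor 1220 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
In the embodiments of this application, the memory 1230 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM). The memory may also be, without being limited to, any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory in the embodiments of this application may also be a circuit or any other apparatus capable of implementing a storage function, for storing program instructions and/or data.
An embodiment of this application further provides a computer-readable storage medium that includes instructions. When run on a computer, the instructions cause the computer to perform the method performed by the storage node or array controller in the embodiment shown in FIG. 4.
An embodiment of this application further provides a computer program product that includes instructions. When run on a computer, the instructions cause the computer to perform the method performed by the storage node or array controller in the embodiment shown in FIG. 4.
An embodiment of this application provides a chip system. The chip system includes a processor, may further include a memory, and implements the functions of the storage node or array controller in the foregoing method. The chip system may consist of chips, or may include chips and other discrete components.
An embodiment of this application provides a storage system that includes a storage device and the storage node or array controller in the embodiment shown in FIG. 4.
The methods provided in the embodiments of this application may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product, which includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, an SSD), or the like.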

Claims (26)

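  1. A method for sending metadata of a storage system, comprising:
    obtaining updated metadata of a storage system, wherein the updated metadata indicates metadata in the storage system that has been updated according to an operation request; and
    sending the updated metadata.
  2. The method according to claim 1, wherein the storage system comprises a plurality of metadata partitions, and the obtaining updated metadata of a storage system comprises:
    obtaining updated metadata of a first metadata partition among the plurality of metadata partitions.
  3. The method according to claim 1 or 2, wherein the obtaining updated metadata of a storage system comprises:
    obtaining an updated metadata record entry from a write-ahead log of the storage system, wherein the updated metadata record entry comprises the operation corresponding to the operation request and the updated metadata.
  4. The method according to any one of claims 1 to 3, wherein the obtaining updated metadata of a storage system comprises:
    obtaining the updated metadata of the storage system in real time.
  5. The method according to any one of claims 1 to 4, wherein the sending the updated metadata specifically comprises:
    sending the updated metadata in the form of a message.
  6. The method according to any one of claims 1 to 3, wherein before the sending the updated metadata, the method further comprises:
    receiving an acquisition request, wherein the acquisition request is used to obtain the updated metadata in the storage system.
  7. The method according to any one of claims 2 to 6, wherein the method further comprises:
    after a second metadata partition among the plurality of metadata partitions changes, sending a first indication message, wherein the first indication message indicates cancellation of order preservation for updated metadata in the second metadata partition.
  8. The method according to claim 7, wherein the method further comprises:
    obtaining updated metadata in the changed metadata partition;
    sending a second indication message, wherein the second indication message indicates that order is to be preserved for the updated metadata in the changed metadata partition; and
    sending the updated metadata in the changed metadata partition.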
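  9. A storage system, comprising a communication interface and a processor, wherein:
    the processor is configured to obtain updated metadata of the storage system, wherein the updated metadata indicates metadata in the storage system that has been updated according to an operation request; and
    the communication interface is configured to send the updated metadata.
  10. The storage system according to claim 9, wherein the storage system comprises a plurality of metadata partitions, and the processor is specifically configured to:
    obtain updated metadata of a first metadata partition among the plurality of metadata partitions.
  11. The storage system according to claim 9 or 10, wherein the processor is specifically configured to:
    obtain an updated metadata record entry from a write-ahead log of the storage system, wherein the updated metadata record entry comprises the operation corresponding to the operation request and the updated metadata.
  12. The storage system according to any one of claims 9 to 11, wherein the processor is specifically configured to:
    obtain the updated metadata of the storage system in real time.
  13. The storage system according to any one of claims 9 to 12, wherein the communication interface is specifically configured to:
    send the updated metadata in the form of a message.
  14. The storage system according to any one of claims 9 to 13, wherein the communication interface is further configured to:
    receive an acquisition request, wherein the acquisition request is used to obtain the updated metadata in the storage system.
  15. The storage system according to any one of claims 10 to 14, wherein the communication interface is further configured to:
    after a second metadata partition among the plurality of metadata partitions changes, send a first indication message, wherein the first indication message indicates cancellation of order preservation for updated metadata in the second metadata partition.
  16. The storage system according to claim 15, wherein the processor is further configured to:
    obtain updated metadata in the changed metadata partition; and
    the communication interface is further configured to:
    send a second indication message, wherein the second indication message indicates that order is to be preserved for the updated metadata in the changed metadata partition; and
    send the updated metadata in the changed metadata partition.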
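  17. A storage system, comprising a processing unit and a transceiver unit, wherein:
    the processing unit is configured to obtain updated metadata of the storage system, wherein the updated metadata indicates metadata in the storage system that has been updated according to an operation request; and
    the transceiver unit is configured to send the updated metadata.
  18. The storage system according to claim 17, wherein the storage system comprises a plurality of metadata partitions, and the processing unit is specifically configured to:
    obtain updated metadata of a first metadata partition among the plurality of metadata partitions.
  19. The storage system according to claim 17 or 18, wherein the processing unit is specifically configured to:
    obtain an updated metadata record entry from a write-ahead log of the storage system, wherein the updated metadata record entry comprises the operation corresponding to the operation request and the updated metadata.
  20. The storage system according to any one of claims 17 to 19, wherein the processing unit is specifically configured to:
    obtain the updated metadata of the storage system in real time.
  21. The storage system according to any one of claims 17 to 20, wherein the transceiver unit is specifically configured to:
    send the updated metadata in the form of a message.
  22. The storage system according to any one of claims 17 to 21, wherein the transceiver unit is further configured to:
    receive an acquisition request, wherein the acquisition request is used to obtain the updated metadata in the storage system.
  23. The storage system according to any one of claims 17 to 22, wherein the transceiver unit is further configured to:
    after a second metadata partition among the plurality of metadata partitions changes, send a first indication message, wherein the first indication message indicates cancellation of order preservation for updated metadata in the second metadata partition.
  24. The storage system according to claim 23, wherein the processing unit is further configured to:
    obtain updated metadata in the changed metadata partition; and
    the transceiver unit is further configured to:
    send a second indication message, wherein the second indication message indicates that order is to be preserved for the updated metadata in the changed metadata partition; and
    send the updated metadata in the changed metadata partition.
  25. A computer storage medium, wherein the computer storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 8.
  26. A computer program product, wherein the computer program product stores instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 8.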
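The sending method of claims 1 to 8 can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each metadata partition keeps an append-only write-ahead log whose records carry an increasing log sequence number (lsn) plus the operation and the updated key and value, and that "sending" means handing a dict-shaped message to a caller-supplied callback. The message types "update", "cancel_order", and "preserve_order" are illustrative stand-ins for the update messages and the first and second indication messages; the claims do not specify a message format, and this is not presented as the applicant's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class WalRecord:
    """Hypothetical WAL entry: the operation plus the metadata it updated (claim 3)."""
    lsn: int                 # log sequence number, increasing within one partition
    operation: str           # e.g. "create", "rename" (illustrative values)
    key: str
    value: Optional[str]


@dataclass
class MetadataPartition:
    """Hypothetical metadata partition with an append-only write-ahead log (claim 2)."""
    pid: int
    wal: List[WalRecord] = field(default_factory=list)


class MetadataSender:
    """Sketch of claims 1 to 8: read new WAL records per partition and send them as messages."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self.send = send                   # transport is an assumption (e.g. a message queue)
        self.cursor: Dict[int, int] = {}   # last LSN already sent, per partition ID

    def poll(self, partition: MetadataPartition) -> int:
        """Obtain WAL records newer than the cursor and send each as an update message."""
        last = self.cursor.get(partition.pid, -1)
        sent = 0
        for rec in partition.wal:          # the log is append-only, so LSNs are ascending
            if rec.lsn > last:
                self.send({"type": "update", "partition": partition.pid,
                           "lsn": rec.lsn, "op": rec.operation,
                           "key": rec.key, "value": rec.value})
                last = rec.lsn
                sent += 1
        self.cursor[partition.pid] = last
        return sent

    def on_partition_changed(self, old_pid: int, new_partition: MetadataPartition) -> None:
        """Claims 7 and 8, assuming the changed partition is identified by a new partition ID."""
        # First indication message: stop preserving order for the old partition's updates.
        self.send({"type": "cancel_order", "partition": old_pid})
        self.cursor.pop(old_pid, None)
        # Second indication message: preserve order for the changed partition's updates.
        self.send({"type": "preserve_order", "partition": new_partition.pid})
        # Then send the changed partition's updated metadata.
        self.poll(new_partition)
```

Calling poll each time a record is appended approximates the real-time acquisition of claim 4, and calling it in response to a consumer's request corresponds to the acquisition request of claim 6. Keeping one cursor per partition means ordering is only guaranteed within a partition, which is why a partition change is bracketed by the two indication messages. Restarting from the beginning of the changed partition's log is a simplifying assumption; a real system would likely need a durable cursor to avoid resending records.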
PCT/CN2020/117416 2019-09-30 2020-09-24 Method for sending metadata of storage system, and storage system WO2021063242A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910944349.3 2019-09-30
CN201910944349.3A CN112578996A (zh) 2019-09-30 2019-09-30 Method for sending metadata of storage system, and storage system

Publications (1)

Publication Number Publication Date
WO2021063242A1 true WO2021063242A1 (zh) 2021-04-08

Family

ID=75116633

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117416 WO2021063242A1 (zh) 2019-09-30 2020-09-24 Method for sending metadata of storage system, and storage system

Country Status (2)

Country Link
CN (1) CN112578996A (zh)
WO (1) WO2021063242A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080162817A1 (en) * 2006-12-29 2008-07-03 Yusuf Batterywala Method and system for caching metadata of a storage system
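CN105487500A (zh) * 2014-10-06 2016-04-13 Fisher-Rosemount Systems, Inc. Streaming data for analytics in process control systems
CN105718484A (zh) * 2014-12-04 2016-06-29 ZTE Corporation Methods for writing, reading, deleting, and querying a file, and client
CN108347455A (zh) * 2017-01-24 2018-07-31 Alibaba Group Holding Limited Metadata interaction method and system
CN110018796A (zh) * 2019-04-11 2019-07-16 Suzhou Inspur Intelligent Technology Co., Ltd. Method and apparatus for processing data requests by a storage system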

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7010645B2 (en) * 2002-12-27 2006-03-07 International Business Machines Corporation System and method for sequentially staging received data to a write cache in advance of storing the received data
US10585627B2 (en) * 2016-03-24 2020-03-10 Microsoft Technology Licensing, Llc Distributed metadata management in a distributed storage system
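CN110019267A (zh) * 2017-11-21 2019-07-16 China Mobile Communication Co., Ltd. Research Institute Metadata update method, apparatus, and system, electronic device, and storage medium
CN110134340B (zh) * 2019-05-23 2020-03-06 Suzhou Inspur Intelligent Technology Co., Ltd. Metadata update method, apparatus, and device, and storage medium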


Also Published As

Publication number Publication date
CN112578996A (zh) 2021-03-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20872133

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20872133

Country of ref document: EP

Kind code of ref document: A1