CN115599295A

CN115599295A - Node capacity expansion method and device of storage system

Info

Publication number: CN115599295A
Application number: CN202211180673.0A
Authority: CN
Inventors: 贾世萌; 郑磊; 豆森
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2022-09-27
Filing date: 2022-09-27
Publication date: 2023-01-13

Abstract

The application provides a node capacity expansion method and device for a storage system. The multi-layer data structure of each original metadata node in the storage system is improved so that each SST file deployed on each layer contains metadata of only one metadata fragment file, together with a fragment index field for that metadata fragment file. After a metadata node is newly added to the storage system, the original metadata node can respond to a metadata splitting request by using the fragment index field contained in each SST file to determine the SST files to be split in at least one layer below the highest layer of the original metadata node. The version information corresponding to the sub-data structure formed by the SST files to be split in the original metadata node is then migrated directly into the version information of the newly added metadata node, without transferring a large amount of metadata. This improves metadata splitting efficiency, saves network and hardware resources, avoids IO service interruption at the newly added metadata node, and avoids affecting the performance of the original metadata node.

Description

Node capacity expansion method and device of storage system

Technical Field

The present application relates generally to the field of storage, and more particularly, to a node capacity expansion method and apparatus for a storage system.

Background

As applications such as the Internet and the Internet of Things generate ever more data, the capacity demand on distributed storage systems grows. When the remaining capacity of a storage system is insufficient, its storage nodes and metadata nodes need to be expanded to improve the performance of the storage system.

When metadata nodes are expanded, new metadata nodes are usually added to the storage system, and part of the metadata of the original metadata nodes is split off, read out, migrated, and written into the newly added metadata nodes, so as to balance the load across the metadata nodes in the storage system. However, this split-and-migrate approach consumes a large amount of time and hardware resources, and a newly added metadata node cannot respond to read services until it holds the complete data.

Disclosure of Invention

In order to solve the above technical problem, in one aspect, the present application provides a node capacity expansion method for a storage system, where the method includes:

determining that the storage system has newly added a metadata node in response to a metadata node capacity expansion request; each original metadata node in the storage system has a multi-layer data structure, each layer can deploy at least one SST file, and each SST file comprises metadata of the same metadata fragment file and a fragment index field for that metadata fragment file;

responding to a metadata splitting request, and determining, according to the fragment index field of each SST file of the original metadata node, the SST files to be split in which the metadata fragment file to be split is located;

migrating version information corresponding to a sub-data structure formed by the SST file to be split in the original metadata node to the version information of the newly added metadata node; the sub-data structure comprises SST files located at least one layer below the highest layer of the original metadata node.

Optionally, each SST file includes metadata of the same metadata fragment file, and the method includes:

responding to a file integration event aiming at any metadata node in the storage system, and determining SST files to be integrated existing in the metadata node;

according to the fragment file identifiers corresponding to each piece of metadata contained in the SST files to be integrated, writing the metadata to be integrated that corresponds to the same fragment file identifier in the SST files to be integrated into a newly created SST file, and writing the fragment file identifier into the identifier field of the corresponding newly created SST file;

wherein the fragment file identifier is used for identifying different metadata fragment files.

Optionally, the metadata is stored as key-value pairs, and writing the metadata to be integrated that corresponds to the same fragment file identifier in the SST files to be integrated into a newly created SST file, according to the fragment file identifier corresponding to each piece of metadata contained in the SST files to be integrated, includes:

sequentially writing the key-value pairs obtained by iteratively querying the SST files to be integrated into a new SST file, where each key-value pair corresponds to the fragment file identifier of a metadata fragment file;

upon determining that the key-value pair just written into the new SST file is the last key-value pair of the metadata fragment file with the corresponding fragment file identifier, writing the fragment file identifier into the identifier field of the new SST file;

creating a further new SST file, and writing the next iteratively queried key-value pair into the newly created SST file;

and upon determining that the key-value pair just written into the new SST file is not the last key-value pair of the metadata fragment file with the corresponding fragment file identifier, continuing to write the next iteratively queried key-value pair into the same new SST file.
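The iteration described above can be illustrated with a minimal Python sketch. It is not the implementation of the application: the `SstFile` class, the `compact_by_shard` function, and the key layout `(shard_id, user_key)` are assumptions made for illustration, and "last key of the shard" is detected here simply by peeking at the next key.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

# Hypothetical key layout: (shard_id, user_key). The source only states that
# each key value corresponds to a fragment file identifier.
Key = Tuple[int, str]


@dataclass
class SstFile:
    shard_id: Optional[int] = None  # identifier field, filled in when the file is sealed
    entries: List[Tuple[Key, bytes]] = field(default_factory=list)


def compact_by_shard(sorted_entries: Iterable[Tuple[Key, bytes]]) -> List[SstFile]:
    """Write iterated key-value pairs into new SST files, one shard per file."""
    entries = list(sorted_entries)
    outputs: List[SstFile] = []
    current: Optional[SstFile] = None
    for i, (key, value) in enumerate(entries):
        if current is None:
            current = SstFile()  # create a new SST file for the next shard
        current.entries.append((key, value))
        # Assumption: the last key of a shard is the one followed by a key of
        # a different shard, or by nothing at all.
        is_last_of_shard = i + 1 == len(entries) or entries[i + 1][0][0] != key[0]
        if is_last_of_shard:
            current.shard_id = key[0]  # write the identifier field
            outputs.append(current)
            current = None
    return outputs
```

Each returned file holds the keys of exactly one shard, so a later split can operate on whole files instead of individual keys.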

Optionally, the migrating the version information corresponding to the sub-data structure formed by the SST file to be split in the original metadata node to the version information of the newly added metadata node includes:

extracting version information of the SST files to be split existing in layers below the highest layer in the original metadata nodes;

migrating the extracted version information to a layer corresponding to a newly added metadata node, and merging the received version information with the version information of the layer corresponding to the newly added metadata node;

and updating the version information of the multilayer data structure currently possessed by the original metadata node according to the migrated version information in the original metadata node.

Optionally, the extracting version information of the SST file to be split existing in each layer below the highest layer in the original metadata node includes:

if it is determined that SST files to be split exist in the highest layer of the original metadata node, triggering the original metadata node to respond to a file integration event, and integrating the SST files to be split in the highest layer into SST files created in the lower layer;

and if it is determined that no SST file to be split exists in the highest layer of the original metadata node, extracting the version information of the SST files to be split that exist in each layer of the original metadata node.
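A rough Python sketch of this two-case extraction follows. The names `SstMeta` and `extract_split_version` are hypothetical, and the "integration" of highest-layer files is simplified to moving their entries down one layer; a real compaction would rewrite the files.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class SstMeta:
    name: str
    shard_id: int  # value of the fragment index field


def extract_split_version(
    levels: Dict[int, List[SstMeta]], split_shards: Set[int]
) -> Dict[int, List[SstMeta]]:
    """Return per-layer entries of the SST files to be split, never from layer 0."""
    pending = [f for f in levels.get(0, []) if f.shard_id in split_shards]
    if pending:
        # Case 1: to-be-split files exist in the highest layer. Stand-in for
        # the triggered file integration event: push them down to layer 1.
        levels[0] = [f for f in levels[0] if f.shard_id not in split_shards]
        levels.setdefault(1, []).extend(pending)
    # Case 2 (or after case 1): layer 0 is clean, extract from the lower layers.
    return {
        level: [f for f in files if f.shard_id in split_shards]
        for level, files in levels.items()
        if level > 0 and any(f.shard_id in split_shards for f in files)
    }
```

The returned mapping only references files by name and layer, which is what keeps the amount of migrated data small.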

In another aspect, the present application further provides a node capacity expansion method for a storage system, where the method includes:

the method comprises the steps that a newly-added metadata node in a storage system receives version information corresponding to a sub data structure formed by SST files to be split migrated by an original metadata node; each original metadata node in the storage system has a multi-layer data structure, each layer can deploy at least one SST file, and each SST file comprises metadata of the same metadata fragment file and a fragment index field aiming at the metadata fragment file; the SST file to be split refers to the SST file where the metadata fragment file to be split is located; the sub data structure comprises SST files located at least one layer below the highest layer of the original metadata node;

and merging the received version information with the version information of the corresponding layer in the multilayer data structure of the self.

Optionally, the method further includes:

acquiring the maximum sequence number of the splitting key value of the to-be-split metadata fragment file in the original metadata node;

determining the global sequence number of the newly added metadata node by using the maximum sequence number of the split key value; the global sequence number of the newly added metadata node is larger than the maximum sequence number of the split key value;

while merging the version information migrated from the original metadata node, prohibiting responses to file integration events, and obtaining a data write request;

responding to the data write request according to the global sequence number;

and upon determining that the merging of the version information is complete, cancelling the prohibition on responding to file integration events.
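A minimal sketch of these steps on the newly added node is given below. The class and method names are hypothetical; the only facts taken from the text are that the node's global sequence number must exceed the maximum sequence number of the split key values, and that file integration is prohibited while the version merge is in progress.

```python
import threading
from typing import Dict, Tuple


class NewMetadataNode:
    """Toy model of the newly added node during a version merge (hypothetical names)."""

    def __init__(self, max_split_sequence: int) -> None:
        # Start at the largest sequence number seen in the split key values, so
        # the first local write already receives a strictly larger number.
        self._sequence = max_split_sequence
        self._lock = threading.Lock()
        self.compaction_allowed = True
        self.memtable: Dict[str, Tuple[int, bytes]] = {}

    def begin_version_merge(self) -> None:
        # Prohibit responses to file integration events while merging.
        self.compaction_allowed = False

    def finish_version_merge(self) -> None:
        # Cancel the prohibition once the version merge is complete.
        self.compaction_allowed = True

    def write(self, key: str, value: bytes) -> int:
        # Writes are still served during the merge, tagged with the global
        # sequence number.
        with self._lock:
            self._sequence += 1
            self.memtable[key] = (self._sequence, value)
            return self._sequence
```

Presumably the larger sequence numbers let writes accepted during the merge take precedence over the older migrated entries, as is usual in sequence-numbered key-value stores; the source does not spell this out.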

Optionally, the method further includes:

obtaining a data reading request;

reading the requested data to be read in the newly added metadata node according to the version information of the current multilayer data structure;

and if the data to be read is not read successfully, forwarding the data read request to the corresponding original metadata node, which responds to the data read request.
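The read path can be sketched as a local lookup with a fallback. The function name and the two lookup callables are assumptions for illustration; how the request is actually forwarded between nodes is not specified in the text.

```python
from typing import Callable, Optional


def read_with_fallback(
    key: str,
    local_lookup: Callable[[str], Optional[bytes]],
    forward_to_original: Callable[[str], Optional[bytes]],
) -> Optional[bytes]:
    """Try the newly added node first; on a miss, let the original node answer."""
    value = local_lookup(key)
    if value is not None:
        return value
    # The data is not (yet) visible in the new node's current version.
    return forward_to_original(key)
```

For example, with plain dictionaries standing in for the two nodes, `read_with_fallback("b", new_node.get, original_node.get)` returns the original node's value whenever the new node does not hold `"b"`.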

In another aspect, the present application further provides a node capacity expansion apparatus of a storage system, where the apparatus includes:

the newly added metadata node determining module is used for determining that the storage system has newly added a metadata node in response to a metadata node capacity expansion request; each original metadata node in the storage system has a multi-layer data structure, each layer can deploy at least one SST file, and each SST file comprises metadata of the same metadata fragment file and a fragment index field for that metadata fragment file;

the to-be-split SST file determining module is used for responding to a metadata split request and determining the to-be-split SST file where the to-be-split metadata fragment file is located according to the fragment index field of each SST file of the original metadata node;

the version information migration module is used for migrating the version information corresponding to the sub-data structure formed by the SST file to be split in the original metadata node to the version information of the newly added metadata node; the sub-data structure comprises SST files located at least one layer below the highest layer of the original metadata node.

In another aspect, the present application further provides a node capacity expansion apparatus for a storage system, where the apparatus includes:

the version information receiving module is used for receiving the version information corresponding to the sub-data structure formed by the SST files to be split migrated by the original metadata nodes by the newly added metadata nodes in the storage system; each original metadata node in the storage system has a multi-layer data structure, each layer can deploy at least one SST file, and each SST file comprises metadata of the same metadata fragment file and a fragment index field aiming at the metadata fragment file; the SST files to be split refer to SST files where the metadata fragment files to be split are located;

and the version information merging module is used for merging the received version information with the version information of the corresponding layer in the self multi-layer data structure.

In yet another aspect, the present application further proposes a computer-readable storage medium, on which a computer program can be stored, the computer program being called and loaded by a processor to implement the node capacity expansion method of the storage system.

Therefore, after a new metadata node is added to the storage system, the original metadata node can respond to a metadata splitting request by using the fragment index field contained in each SST file to determine the SST files to be split in at least one layer below the highest layer of the original metadata node, and then directly migrate the version information corresponding to the sub-data structure formed by the SST files to be split in the original metadata node into the version information of the new metadata node, without transferring a large amount of metadata. This improves metadata splitting efficiency and saves network and hardware resources; it neither interrupts the IO service of the new metadata node nor affects the performance of the original metadata node.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a schematic diagram illustrating a metadata migration flow of a source levelDB in a levelDB capacity expansion scenario;

FIG. 2 is a schematic diagram of an alternative scenario of a node capacity expansion method applied to the storage system proposed in the present application;

FIG. 3 is a schematic flowchart of an alternative example of a node capacity expansion method of the storage system proposed in the present application;

FIG. 4 is a schematic flowchart illustrating another alternative example of a node capacity expansion method for a storage system according to the present application;

FIG. 5 is a schematic flowchart of yet another alternative example of a node capacity expansion method of the storage system provided in the present application;

FIG. 6 is a schematic flowchart illustrating another alternative example of a node capacity expansion method for a storage system according to the present application;

FIG. 7 is a schematic flowchart illustrating yet another alternative example of a node capacity expansion method for a storage system according to the present application;

FIG. 8 is a schematic flowchart illustrating yet another alternative example of a node capacity expansion method for the storage system proposed in the present application;

FIG. 9 is a schematic flowchart of a levelDB capacity expansion scenario to which the node capacity expansion method of the storage system proposed in the present application is applicable;

FIG. 10 is a schematic structural diagram of an alternative example of a node capacity expansion apparatus of the storage system according to the present application;

FIG. 11 is a schematic structural diagram of yet another alternative example of a node capacity expansion apparatus of the storage system proposed in the present application;

FIG. 12 is a schematic structural diagram of yet another alternative example of a node capacity expansion apparatus of the storage system proposed in the present application;

FIG. 13 is a schematic structural diagram of an alternative example of a storage system suitable for the node capacity expansion method of the storage system proposed in the present application.

Detailed Description

With regard to the background described above, take as an example a levelDB database serving as a metadata node of a distributed storage system; that is, levelDB provides the metadata service, and a shard is the smallest unit of metadata fragmentation, referred to here as a metadata fragment file. Referring to the node capacity expansion scenario of a storage system shown in fig. 1, after the shard to be split (the gray region in fig. 1) is determined, an iterator over that shard is generally created from the source levelDB (levelDB1 in fig. 1). Starting from the start key of the metadata in the shard to be split, the iterator reads all metadata in the shard (that is, obtains the key-value pairs to be split), and the key-value pairs read are then written (pushed) into the newly added levelDB (levelDB2 in fig. 1). Migrating metadata across nodes in this way consumes a great deal of hardware resources and also requires considerable network transmission resources. Moreover, until all metadata of the shard to be split has been copied, the newly added levelDB cannot respond to IO (Input/Output) services (such as the read/write services shown in fig. 1) because it does not hold complete data, which causes a long IO service interruption and reduces IO service processing efficiency.

In addition, after the iteration over the metadata is completed, the iterator usually has to be destroyed and the keys of the shard to be split deleted one by one from the source levelDB; that is, the old data in the original metadata node is deleted. This requires generating a tombstone for each key, which is time-consuming, and generating too many tombstones seriously degrades the performance of the source levelDB.

To address the above problems, the present application provides a new metadata splitting method to support node capacity expansion of a storage system. Referring to the scenario of the node capacity expansion method shown in fig. 2, only the version information of the sub-data structure holding the metadata to be split in the original metadata node (e.g., the source levelDB) needs to be split off and merged into the version information of the corresponding layer in the newly added metadata node (e.g., the new levelDB). The process does not need to migrate a large amount of metadata across nodes, which saves network transmission resources and improves node capacity expansion efficiency.

Moreover, because the migrated sub-data structure does not include the highest layer, the process does not affect the read/write operations that IO services perform on the highest layer; that is, read/write services and the metadata splitting process of the present application can be executed simultaneously, and user services are not blocked. In addition, the node capacity expansion process generates no garbage data, so no extra resources are needed to clean it up, and the original metadata node is not negatively affected after the migration is completed.

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.

Referring to fig. 3, a flow diagram of an optional example of the node capacity expansion method for the storage system provided by the present application is shown, where the method may be described from any original metadata node side in the storage system, that is, the method provided by this embodiment may be applied to the original metadata node, which may be a levelDB database, or other type of device node, as the case may be. As shown in fig. 3, the node capacity expansion method of the storage system according to this embodiment may include:

step S31, determining that the storage system responds to the metadata node capacity expansion request and newly adds metadata nodes;

In this embodiment, each SST file may include metadata of the same metadata fragment file; that is, each SST file in each metadata node in the storage system corresponds to metadata of only one metadata fragment file. This may be achieved by an improved compaction (integration) method; for the implementation process, refer to the description of the corresponding parts of the following method embodiments, which is not detailed here.

In addition, the present application adds a shard field to each SST file, that is, a fragment index field for the metadata fragment file (shard). This field records which shard the SST file contains, index information about its metadata, and so on, so that the metadata fragment file (shard) can be located by subsequent queries.

In some embodiments, the storage system may be a key-value storage system, and the multi-layer data structure of each metadata node included in the storage system may be an LSM tree (Log-Structured Merge Tree), that is, a layered, ordered, disk-oriented data structure in which the highest layer L0 is located in memory and the layers below the highest layer are located on disk; the application does not limit the respective types of the memory and the disk. In practical applications, incremental modifications to data or newly obtained service data are usually written into memory first and, after a certain capacity is reached, merged into disk in batches, for example by merge-sorting the in-memory data and appending it to the tail of a disk queue, which improves write performance. The working principle of the LSM tree is not detailed in the present application.
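For readers unfamiliar with this write path, the following toy Python sketch shows the general idea of buffering writes in memory and flushing them to disk in sorted batches. It is a generic illustration, not the data structure of the application: the class `TinyLsm`, its entry-count threshold, and the in-memory list standing in for the disk queue are all assumptions.

```python
from typing import Dict, List, Optional, Tuple


class TinyLsm:
    """Toy LSM-style store: an in-memory table plus sorted runs 'on disk'."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, bytes] = {}
        # Each flushed run is a sorted list; newer runs are appended at the tail.
        self.disk_runs: List[List[Tuple[str, bytes]]] = []

    def put(self, key: str, value: bytes) -> None:
        self.memtable[key] = value  # writes land in memory first
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Batch-merge the in-memory data into a sorted run and append it.
        self.disk_runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str) -> Optional[bytes]:
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.disk_runs):  # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None
```

Because a flush is one sequential batch write rather than many random updates, this layout favors write throughput at the cost of extra work on reads.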

Step S32, responding to the metadata splitting request, and determining, according to the fragment index field of each SST file of the original metadata node, the SST files to be split in which the metadata fragment file to be split is located;

After it is determined that a metadata node has been newly added to the storage system, in order to balance the cluster load, enable the newly added metadata node to process services, and improve the performance of the storage system, the metadata on the original metadata nodes needs to be split, with part of it split off to the new metadata node. To this end, a metadata splitting operation is performed on each original metadata node determined to need splitting, so that the original metadata node obtains a corresponding metadata splitting request.

Of course, the user may also use the management node in the storage system to send a corresponding metadata splitting request to one or more original metadata nodes to request splitting of the metadata node that receives the metadata splitting request, splitting at least part of the metadata to a newly added metadata node, and the like.

In combination with the above description of the fragment index field included in each SST file, an original metadata node in the storage system may respond to the obtained metadata splitting request by querying the fragment index field of each SST file it holds, determining which SST files contain the metadata fragment file (shard, the smallest unit of metadata fragmentation) requested to be split, and determining those SST files as the SST files to be split. The implementation method for determining the shard to be split and the SST files to be split is not limited.
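One possible way to perform this query is to build an inverted index from the fragment index field to the SST files that carry it. The sketch below is only an illustration; the function name and the file names are made up, and the text itself leaves the determination method open.

```python
from collections import defaultdict
from typing import Dict, List


def build_shard_index(sst_shard_fields: Dict[str, int]) -> Dict[int, List[str]]:
    """Map each shard id to the SST files whose fragment index field names it."""
    index: Dict[int, List[str]] = defaultdict(list)
    for sst_name, shard_id in sst_shard_fields.items():
        index[shard_id].append(sst_name)
    return dict(index)


# Hypothetical node state: SST file name -> value of its fragment index field.
fields = {"000012.sst": 3, "000015.sst": 7, "000021.sst": 7}
ssts_to_split = build_shard_index(fields).get(7, [])
```

Only the small per-file field is inspected; the key-value contents of the SST files are never read.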

Step S33, migrating the version information corresponding to the sub-data structure formed by the SST file to be split in the original metadata node to the version information of the newly added metadata node; the sub data structure contains SST files located at least one level below the highest level of the original metadata node.

Each metadata node in the storage system uses a multi-layer data structure to manage the data of its SST files. As shown in fig. 2, newly obtained data is first written into the highest layer L0 of the metadata node, and as data accumulates across the layers, the lower the layer, the older the metadata stored in its SST files, that is, the earlier the metadata node obtained that metadata. The SST files in the different layers thus form a corresponding data structure, such as an LSM tree, and the version information (Version) of the data structure is recorded to represent the SST files of each layer contained in the metadata node.

Therefore, after the original metadata node determines the SST files to be split as described above, it can reduce the amount of data transmitted across nodes and improve metadata splitting efficiency as follows. Because each SST file under the improved scheme contains only one metadata fragment file, each SST file can be treated as a node of the data structure. The original metadata node can split off the version information of the sub-data structure (such as a subtree) formed by the determined SST files to be split and send that version information to the newly added metadata node. The newly added metadata node then merges the version information of its own multi-layer data structure with the received version information of the sub-data structure, and uses the merged result as the new version information (New Version) of its multi-layer data structure to support its normal service processing.

The SST files to be split come from at least one layer below the highest layer in the multi-layer data structure of the original metadata node and do not include SST files of the highest layer L0. Consequently, when the newly added metadata node merges version information, the data of its highest layer is not touched, the merge does not conflict with read/write services on the highest layer, and the newly added metadata node can process data write services normally during metadata splitting.

Moreover, only the version information of the sub-data structure formed by the SST files to be split needs to be migrated between the original metadata node and the newly added metadata node. Compared with transmitting all the metadata to be split contained in those SST files, the version information is very small, so the time and hardware resources consumed by metadata splitting are greatly reduced and metadata splitting efficiency after node capacity expansion is improved.
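As a simplified illustration of step S33, version information can be modeled as a mapping from layer number to the names of the SST files in that layer. This representation, the `Version` alias, and the function `migrate_version` are assumptions; the sketch also presumes that the newly added node can open the referenced SST files, which the text above does not spell out.

```python
from typing import Dict, List

# Hypothetical representation of version information: layer -> SST file names.
Version = Dict[int, List[str]]


def migrate_version(source: Version, target: Version, sub_version: Version) -> None:
    """Move the entries of a sub-data structure from one version to another."""
    for level, names in sub_version.items():
        if level == 0:
            raise ValueError("the highest layer L0 is never migrated")
        moved = set(names)
        # Merge into the corresponding layer of the newly added node.
        target.setdefault(level, []).extend(names)
        # Update the original node's version so it no longer lists these files.
        source[level] = [n for n in source.get(level, []) if n not in moved]
```

Only file references change hands, so the cost depends on the number of SST files to be split rather than on the number of keys they contain.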

Referring to fig. 4, a schematic flowchart of yet another optional example of the node capacity expansion method of a storage system proposed in the present application is shown. This embodiment improves the data structure of any metadata node in the storage system and describes an improved implementation of the integration (compaction) processing of the multi-layer data of the metadata node. As shown in fig. 4, the method may include:

step S41, responding to a file integration event aiming at any metadata node in the storage system, and determining SST files to be integrated existing in the metadata node;

The multi-layer data structure of any metadata node in the storage system may be an LSM tree. When user data is written into the metadata node, it is usually written to a WAL (Write-Ahead Logging) log first and then to the in-memory table Memtable; once a certain condition is met, the Memtable is frozen and a dump operation is performed to form an SST file. As more data is written into the metadata node, the number of dumps increases, and so does the number of dumped SST files. Too many SST files increase the number of IO operations needed for a data query, and the key ranges of SST files in different layers may overlap, so these SST files need to be merged; this process may be referred to as compaction.

Therefore, the SST files of each layer are scanned, and when a preset integration condition (that is, a condition for performing compaction on SST files) is met, the metadata node can respond to the corresponding file integration event and determine the SST files to be integrated that exist in the metadata node. In the embodiment of the present application, the integration condition, that is, the condition that triggers generation of the file integration event, may include, but is not limited to: any SST file contains metadata belonging to different metadata fragment files (shards), in which case that SST file is determined to be an SST file to be integrated.
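The integration condition named here can be checked with a few lines of Python. The function name and the key layout `(shard_id, user_key)` are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Tuple


def ssts_needing_integration(
    ssts: Dict[str, Iterable[Tuple[int, str]]]
) -> List[str]:
    """Return the SST files whose keys belong to more than one shard."""
    return [
        name
        for name, keys in ssts.items()
        if len({shard_id for shard_id, _ in keys}) > 1
    ]
```

A file whose keys all share one shard id already satisfies the one-shard-per-file property and is left alone by this particular condition.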

Step S42, according to the fragment file identifiers corresponding to each piece of metadata contained in the SST files to be integrated, writing the metadata to be integrated that corresponds to the same fragment file identifier in the SST files to be integrated into a newly created SST file, and writing the fragment file identifier into the identifier field of the corresponding newly created SST file;

In the embodiment of the application, a corresponding fragment file identifier may be configured for each metadata fragment file (shard) contained in the SST files, so that the metadata node can distinguish different shards according to their fragment file identifiers. The fragment file identifier may be a unique shard id, and may be stored in the identifier field of the SST file.

To merge the metadata belonging to the same shard into one SST file that contains metadata of that shard only, the metadata node may, according to the fragment file identifiers corresponding to the metadata contained in the SST files to be integrated, identify the metadata to be integrated that corresponds to the same fragment file identifier across the SST files to be integrated and write it into a newly created SST file, so that the SST file is aligned from head to tail with the boundary of the shard corresponding to that fragment file identifier. The metadata node then writes the fragment file identifier into the identifier field of the newly created SST file so that the storage location of the shard can be identified later.

It should be noted that the improved compaction process proposed in the present application includes, but is not limited to, the implementation described above; it may be flexibly adjusted according to actual requirements, and is not detailed here.

Step S43, obtaining a metadata splitting request aiming at the newly added metadata node of the storage system;

In combination with the above description of the technical solution of the embodiment, after the storage system adds a new metadata node in response to the metadata node capacity expansion request, a metadata splitting request for at least one original metadata node may be generated for load balancing. For example, according to the load of each original metadata node of the storage system, one or more metadata nodes with the highest current load may be selected as the metadata nodes to be split, and a metadata splitting request for the newly added metadata node may be sent to them, that is, a request that the original metadata node split part of its metadata to the newly added metadata node.
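The load-based selection can be sketched as follows. The function name, the node names, and the use of a single numeric load value per node are assumptions; the text does not define how load is measured.

```python
import heapq
from typing import Dict, List


def pick_nodes_to_split(load_by_node: Dict[str, float], count: int = 1) -> List[str]:
    """Return the `count` original metadata nodes with the highest current load."""
    return heapq.nlargest(count, load_by_node, key=load_by_node.get)


# Hypothetical load figures for three original metadata nodes.
nodes_to_split = pick_nodes_to_split({"n1": 0.4, "n2": 0.9, "n3": 0.7}, count=2)
```

Each selected node would then receive a metadata splitting request targeting the newly added metadata node.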

Step S44, responding to the metadata splitting request, and determining, according to the fragment index field of each SST file of the original metadata node, the SST files to be split in which the metadata fragment file to be split is located;

In combination with the above description of the improved integration processing of SST files, the SST files of each layer in an original metadata node of the storage system each contain metadata of only one metadata fragment file, and each SST file is configured with a fragment index field for that metadata fragment file. Therefore, by parsing the obtained metadata splitting request, the metadata node can determine, from the index information in the fragment index field of each SST file, the SST files in which the metadata fragment file requested to be split is located, and mark them as the SST files to be split.

Step S45, extracting version information of the SST files to be split that exist in the layers below the highest layer of the original metadata node;

Step S46, migrating the extracted version information to the corresponding layer of the newly added metadata node, so that the newly added metadata node merges the received version information with the version information of its own corresponding layer;

In order to improve the metadata splitting efficiency, reduce the network resources consumed by data transmission, and ensure that the metadata splitting process does not interfere with the IO service of the newly added metadata node, it may first be detected whether an SST file to be split exists among the SST files of the highest layer L0 of the original metadata node. If so, the metadata in that SST file may be merged into an SST file of the next layer in the integration manner described above, so that no SST file to be split remains in the highest layer L0.

Based on this, the above step S45 may include: when it is determined that an SST file to be split exists in the highest layer of the original metadata node, triggering the original metadata node to respond to a file integration event, and integrating the SST file to be split in the highest layer into an SST file created in a lower layer; and when it is determined that no SST file to be split exists in the highest layer of the original metadata node, extracting the version information of the SST files to be split in each layer of the original metadata node.

Optionally, the SST files to be split in the highest layer of the metadata node may be integrated by minor compaction, that is, one or more small, adjacent dumped SST files and zero or more frozen memory tables are selected and merged into a larger SST file. The detailed implementation of minor compaction is not described in the present application, and the implementation of compaction is not limited to this manner.

Therefore, in the metadata splitting process, starting from the second-highest layer L1 of the original metadata node, the SST files to be split in each layer can be determined, and the version information of the sub-data structure (such as an LSM subtree) formed by the SST files to be split in the different layers is sent to the newly added metadata node. The newly added metadata node then merges the received version information of the SST files to be split in each layer into the version information of the SST files in its own corresponding layer, thereby forming new version information for that layer.

The version information (Version) can represent the SST files to be split contained in the corresponding layer of the multi-layer data structure, the metadata fragment files they contain, and other content. The content of the version information and the method for extracting it are not limited in the present application and can be determined according to actual requirements.

During this process, the newly added metadata node may prohibit compaction, that is, it does not respond to file integration events, and each SST file to be split that corresponds to the migrated version information contains metadata of only one metadata fragment file (shard).

Step S47, updating the version information of the multi-layer data structure currently possessed by the original metadata node according to the version information migrated out of the original metadata node.

For any original metadata node in the storage system, according to the method described above, after the version information of the sub-data structure formed by all the SST files to be split is migrated to the new metadata node, the version information of the original metadata node can be updated in time, that is, the sub-data structure is deleted from the original multi-layer data structure, so as to obtain the new data structure and the version information thereof. Therefore, when the metadata in the SST file to be split in the storage system is read subsequently, the corresponding data reading request can be sent to the newly added metadata node, and the requested data to be read is inquired by using the version information of the current multilayer data structure of the newly added metadata node.

Therefore, in the node capacity expansion process of the storage system, the original metadata node only needs to migrate version information of a sub data structure formed by the SST files to be split to the newly added metadata node, and compared with a processing mode of migrating all metadata in the SST files to be split to the newly added metadata node, the method greatly reduces the data transmission amount, improves the capacity expansion efficiency of the node, and saves network resources. Moreover, the original metadata node only needs to delete the migrated version information and does not need to delete the metadata in the SST file to be split in sequence, and negative effects on the original metadata node are avoided.
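Steps S45 to S47 can be sketched as an operation on version information alone. The Python sketch below is illustrative only: it assumes version information can be modelled as a mapping from layer number to a list of (SST file name, shard id) entries, which the application does not specify, and the function name `split_version` is hypothetical. No SST file content is read or copied.

```python
from typing import Dict, List, Set, Tuple

# Assumed model of version information: layer -> [(sst_name, shard_id)].
Version = Dict[int, List[Tuple[str, int]]]


def split_version(source: Version, target: Version, shard_ids: Set[int]) -> None:
    """Move version entries for the requested shards from source to target."""
    # Precondition of S45: the highest layer L0 must no longer hold any SST
    # file to be split (it is expected to have been compacted downwards).
    if any(sid in shard_ids for _, sid in source.get(0, [])):
        raise RuntimeError("L0 still holds SST files to be split; compact first")
    for level in sorted(source):
        if level == 0:
            continue
        moving = [e for e in source[level] if e[1] in shard_ids]
        if not moving:
            continue
        # S46: merge into the corresponding layer of the new node.
        target.setdefault(level, []).extend(moving)
        # S47: drop the migrated entries from the original node's version.
        source[level] = [e for e in source[level] if e[1] not in shard_ids]


source = {
    0: [("000020.sst", 3)],
    1: [("000003.sst", 3), ("000004.sst", 7)],
    2: [("000001.sst", 7)],
}
target = {1: [("100001.sst", 9)]}
split_version(source, target, {7})
```

After the call, `target` lists the two shard-7 files in layers 1 and 2, and `source` no longer references them.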

Referring to fig. 5, which is a flowchart illustrating another optional example of the node capacity expansion method of the storage system proposed in the present application, this embodiment may describe an optional detailed implementation manner of the node capacity expansion method of the storage system described above, and as shown in fig. 5, the method may include:

step S51, responding to a file integration event aiming at any metadata node in the storage system, and determining SST files to be integrated existing in the metadata node;

regarding the implementation process of step S51, reference may be made to the description of the corresponding parts in the above embodiments, which is not described herein again.

Step S52, key values obtained by iterative query of the SST files to be integrated are sequentially written into new SST files; the key value corresponds to a fragment file identifier of the corresponding metadata fragment file;

In this embodiment of the present application, the storage system may be a key-value storage system, and the metadata of each metadata node (e.g., LevelDB) may be stored as key-value pairs. Thus, referring to the flowchart of an optional method for integrating the SST files of a metadata node shown in fig. 6, after any metadata node starts a compaction operation, the layers participating in the compaction, such as at least one layer below the highest layer in the multi-layer data structure of the metadata node, may be determined first, and then the SST files to participate in the compaction, that is, the SST files to be integrated, may be determined from the SST files contained in those layers. The layers participating in the compaction and the SST files to be integrated may be determined according to the load of each original metadata node in the storage system, but are not limited thereto.

Then, as shown in fig. 6, an iterator for the compaction operation, that is, an iterator over the metadata fragment file (shard), may be created, so that when the compaction is performed, iteration starts from the start key corresponding to the shard id of the determined SST file to be integrated, and the iterated keys are written into the newly created SST file in sequence. The creation of the iterator and its working principle are not described in detail in the present application.

Step S53, determining whether the key value written into the new SST file is the last key value of the metadata fragmented file with the corresponding fragmented file identifier; if yes, go to step S54; if not, go to step S56;

step S54, the fragment file identification is written into the identification field of the new SST file;

step S55, creating a new SST file, and writing the next key value which is inquired in an iterative manner into the newly created SST file;

and step S56, continuously writing the next key value which is inquired by iteration into the new SST file.

In the improved multi-layer data structure of the present application, each SST file contains metadata of only one metadata fragment file (shard). Therefore, as shown in fig. 6, while the iterator iterates over the keys of the metadata in an SST file to be integrated, it can be detected whether the iterated key has reached the boundary of the corresponding shard id. If the current key has not reached the boundary, the metadata keys corresponding to that shard id have not all been iterated, and the next key can be iterated and written into the new SST file in the above manner.

If the iterated metadata key reaches the boundary of the corresponding shard id, all metadata keys belonging to the metadata fragment file (shard) with that shard id have been iterated, and the complete metadata of the shard has been written into the newly created SST file. This ensures that the newly created SST file contains only the metadata keys corresponding to that shard id, and the shard id is written into the identification field of the new SST file so that the metadata fragment file to be split can be identified later. Then, if the SST files to be integrated have not all been integrated, a new SST file can be created and the metadata keys corresponding to the next shard id are recorded in the same way; when all the SST files to be integrated have been integrated, the compaction operation of the metadata node ends.

In practical applications, after each metadata node in the storage system has processed data writes for a period of time, the compaction operation may be performed according to the method described above, so as to ensure that each SST file in each layer of the metadata node contains metadata of only one metadata fragment file. It should be noted that, while a newly added metadata node is merging the version information of sub-data structures from one or more original metadata nodes, compaction may be temporarily prohibited; since each SST file to be split corresponding to the received version information contains metadata of only one shard, SST files with overlapping metadata do not occur. After the newly added metadata node completes the metadata splitting, its compaction is resumed, and newly obtained data can still be integrated according to the SST file integration method described above.
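The loop of steps S52 to S56 can be sketched as follows. This Python sketch is a simplified model, not the implementation of the application: it assumes the iterator yields (shard id, key, value) tuples already ordered by shard and key, keeps the output in memory, and uses the hypothetical names `NewSst` and `compact_by_shard`.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

Entry = Tuple[int, str, str]  # (shard_id, key, value), ordered by shard, then key


@dataclass
class NewSst:
    entries: List[Tuple[str, str]] = field(default_factory=list)
    shard_id: Optional[int] = None  # identification field, set once the shard is complete


def compact_by_shard(entries: Iterable[Entry]) -> List[NewSst]:
    """Write iterated keys into new SST files, one shard per file."""
    items = list(entries)
    outputs: List[NewSst] = []
    current: Optional[NewSst] = None
    for i, (shard_id, key, value) in enumerate(items):
        if current is None:
            current = NewSst()  # S55: create a new SST file
            outputs.append(current)
        current.entries.append((key, value))  # S52: write the iterated key
        # S53: is this the last key of the shard (shard boundary reached)?
        at_boundary = i + 1 == len(items) or items[i + 1][0] != shard_id
        if at_boundary:
            current.shard_id = shard_id  # S54: record the shard id
            current = None  # the next key, if any, starts a new SST file
        # otherwise S56: keep writing into the same SST file
    return outputs


merged = compact_by_shard([(1, "a", "v1"), (1, "b", "v2"), (2, "c", "v3")])
```

In this example `merged` holds two files: one for shard 1 with keys `a` and `b`, and one for shard 2 with key `c`.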

Referring to fig. 7, which is a schematic flowchart of a further optional example of the node capacity expansion method of the storage system proposed in the present application, this embodiment describes the method from the side of the newly added metadata node of the storage system. As shown in fig. 7, the method may include:

step S71, receiving version information corresponding to a subdata structure formed by SST files to be split migrated by original metadata nodes;

regarding the process of obtaining version information corresponding to a sub-data structure formed by the SST file to be split, reference may be made to the above node expansion method of the storage system described from the original metadata node side of the storage system, and details of the embodiment of the present application are not described here.

As can be seen from the above analysis, the metadata nodes of the storage system use the improved compaction to integrate the metadata in SST files, so that each SST file contains metadata of only one metadata fragment file and is configured with a fragment index field, such as a shard id, for that metadata fragment file (shard). Since an SST file to be split is an SST file in which a metadata fragment file to be split is located, each SST file to be split likewise contains the metadata keys of only one shard, which improves the query reliability of the sub-data structure (such as an LSM subtree) formed by the SST files to be split.

Step S72, merging the received version information with the version information of the corresponding layer in the node's own multi-layer data structure.

The version information received by the newly added metadata node indicates the sub-data structure formed by the SST files to be split determined by the original metadata node, for example, which layer of the multi-layer data structure each SST file to be split is located in and which shard it contains. Compared with the volume of metadata contained in the SST files to be split, the volume of the version information is very small, so the network resources consumed by transmitting it are greatly reduced, the metadata splitting is faster, and the node capacity expansion efficiency is further improved.

In order to ensure data consistency when the SST file to be split includes a plurality of SST files located at different layers, a newly added metadata node in a storage system may merge version information corresponding to the SST file to be split at each layer with version information of the layer of the newly added metadata node itself, so as to obtain new version information of a multilayer data structure of the newly added metadata node, that is, a new data structure is formed, and data reading services may be subsequently processed accordingly.

Because the version information received by the newly added metadata node does not contain the highest-level data structure, the version information merging processing process executed by the newly added metadata node does not interfere with the processing of the data writing service, namely the IO service of the newly added metadata node does not need to be interrupted.

Referring to fig. 8, which is a schematic flowchart of still another optional example of the node capacity expansion method of the storage system proposed in the present application, this embodiment still describes the method from the side of the newly added metadata node of the storage system, and may be an optional detailed implementation of the method described above. As shown in fig. 8, the method may include:

Step S81, acquiring the maximum sequence number of the split key values of the metadata fragment files to be split in the original metadata node;

Step S82, determining the global sequence number of the newly added metadata node by using the maximum sequence number of the split key values;

In practical applications, in a key-value storage system each key may store multiple versions of its value. To order operations such as reading, writing, and modifying data, each metadata node is usually configured with a sequence number field, such as sequence number (a global auto-increment sequence number, an unsigned 64-bit integer), which records the new sequence number generated by each modification of the node's metadata; for example, the sequence number may be increased by 1 for each modification. Therefore, the larger the sequence number, the newer the corresponding key-value pair. The method for acquiring the sequence number of each key-value pair of each metadata node is not limited in the present application.

Based on the above analysis, in order to ensure data consistency and improve the reliability of service processing on the newly added metadata node, the global sequence number (e.g., sequence number) of the newly added metadata node needs to be greater than the maximum sequence number of the split key values determined by the original metadata node, so that the newly added metadata node can read the latest metadata when responding to a data reading request according to the global sequence number.

Therefore, after the storage system starts the metadata splitting operation on the original metadata node, referring to the LevelDB capacity expansion scenario shown in fig. 9, the source LevelDB may determine the maximum sequence number of the keys in the shards to be split, that is, the maximum value of sequence number. As shown in fig. 9, it may also count the number n of keys in each shard to be split and send n to the newly added LevelDB, so that the sequence number of the new LevelDB is raised above the sequence number of the source LevelDB, for example by adding n to the sequence number of the new LevelDB to obtain its global sequence number.

For the LevelDB capacity expansion scenario of a key-value storage system shown in fig. 9, after a metadata splitting operation is started, any source LevelDB determines the SST files to be split in each layer of its LSM tree and then detects whether SST files to be split exist in the highest layer L0. If so, minor compaction may be used to merge the metadata of those SST files into the lower layers, so that no SST file to be split remains in the highest layer L0. The version information (Version) of the subtree formed by the SST files to be split in L1 and the layers below it is then split from the current source LevelDB and migrated to the newly added LevelDB. This does not involve migrating data across nodes, takes little time, and does not consume hardware resources.

The newly added LevelDB can then merge the received version information into the corresponding layers of its own LSM tree and record the SST files to be split, obtaining the version information (Version) of its new LSM tree. Afterwards, the source LevelDB can remove the SST files to be split from its LSM tree and create new version information (Version).
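The constraint on the global sequence number can be illustrated with a short Python sketch. It is an assumption-laden simplification: instead of the add-n adjustment of fig. 9, it simply takes one more than the larger of the two sequence numbers, which is one straightforward way to satisfy the stated requirement. The function names and the (shard id, key, sequence number) tuples are hypothetical.

```python
from typing import Iterable, Set, Tuple

U64_MAX = 2**64 - 1  # the sequence number is described as an unsigned 64-bit integer


def max_sequence_of_shards(
    entries: Iterable[Tuple[int, str, int]], shard_ids: Set[int]
) -> int:
    """Largest sequence number among the keys of the shards to be split."""
    return max((seq for sid, _, seq in entries if sid in shard_ids), default=0)


def initial_global_sequence(local_seq: int, max_split_seq: int) -> int:
    """Pick a global sequence number strictly above both inputs (assumed rule)."""
    candidate = max(local_seq, max_split_seq) + 1
    if candidate > U64_MAX:
        raise OverflowError("sequence number would exceed 64 bits")
    return candidate


# Hypothetical keys on the source node: (shard_id, key, sequence number).
entries = [(3, "a", 10), (7, "b", 42), (7, "c", 17)]
max_split = max_sequence_of_shards(entries, {7})
new_seq = initial_global_sequence(local_seq=5, max_split_seq=max_split)
```

Any write accepted by the newly added node from `new_seq` onwards is then ordered after every migrated key.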

Step S83, during the merging of the version information migrated from the original metadata node, prohibiting responses to file integration events, and obtaining a data write request;

regarding the process of merging the version information migrated from the original metadata node by the newly added metadata node, reference may be made to the description of the corresponding part of the above embodiments, which is not described in detail herein.

In combination with the description of the corresponding parts of the above embodiments, in the embodiments of the present application the SST files to be split in each layer of the original metadata node are merged into the newly added metadata node in a single pass, each SST file to be split contains metadata of only one shard, and the newly added metadata node can prohibit compaction, so that the SST files in each layer are guaranteed not to contain overlapping metadata. How the metadata node disables compaction is not limited in the present application and may be determined as the case may be.

Step S84, responding the data writing request according to the global serial number of the newly added metadata node;

In the embodiment of the present application, for the multi-layer data structure of a metadata node, such as an LSM tree, newly obtained data is usually written into the highest layer L0 of the LSM tree, which does not conflict with the merging of version information from the original metadata node. Moreover, to ensure that the latest data can be read subsequently, when a data write request is answered, the sequence number assigned to a newly written key-value pair, or to a new value of an existing key, is greater than the maximum sequence number of the keys in the SST files to be split. That is, in the newly added metadata node, the sequence number of any new write is greater than the sequence numbers migrated from the original metadata node. The process of updating the sequence number of the newly added metadata node is not described in detail in the present application.

Based on this, when the newly added metadata node obtains a data reading request, it can first read the requested data according to the version information of its current multi-layer data structure. If the read fails, the data reading request is forwarded to the corresponding original metadata node, which responds to it, so that the data reading requirement of the service is met. That is, as shown in fig. 9, a data reading request of the storage system may first be served by the newly added LevelDB and, if the data is not found there, forwarded to the source LevelDB. This ensures that no read or write service is blocked and that the data of the original metadata node is consistent with that of the newly added metadata node.

Step S85, determining that the merging of the version information from the original metadata node has ended, and cancelling the prohibition on responding to file integration events.

After the split version information has been merged as described above, compaction on the newly added metadata node may be re-enabled to integrate the SST files in the highest layer of the metadata node, and data reading requests obtained afterwards no longer need to be forwarded to other metadata nodes.
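The read path during splitting can be sketched as a lookup with a fallback. The Python sketch below is a toy model: each node is reduced to an in-memory dictionary, and the class name `MetadataNode` and its `fallback` attribute are hypothetical.

```python
from typing import Dict, Optional


class MetadataNode:
    """Toy node: a dictionary of metadata plus an optional node to forward to."""

    def __init__(self, data: Dict[str, str], fallback: Optional["MetadataNode"] = None):
        self.data = data
        self.fallback = fallback  # original node to ask while splitting is in progress

    def get(self, key: str) -> Optional[str]:
        # Try the local multi-layer data structure first.
        if key in self.data:
            return self.data[key]
        # Local read failed: forward the request to the original node, if any.
        if self.fallback is not None:
            return self.fallback.get(key)
        return None


source_node = MetadataNode({"a": "old"})
new_node = MetadataNode({"b": "new"}, fallback=source_node)

local_hit = new_node.get("b")
forwarded = new_node.get("a")
missing = new_node.get("z")
```

Once merging has finished, the new node's `fallback` would be cleared so that reads are no longer forwarded.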
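The prohibit-then-restore behaviour of steps S83 and S85 can be sketched with a small gate object. This Python sketch is illustrative only; the application leaves the mechanism for disabling compaction open, and the class name `CompactionGate` and its members are hypothetical.

```python
from contextlib import contextmanager


class CompactionGate:
    """Decides whether a file integration event may trigger compaction."""

    def __init__(self) -> None:
        self.enabled = True
        self.skipped = 0  # events ignored while compaction was prohibited

    def on_integration_event(self) -> bool:
        """Return True if compaction may run for this event."""
        if not self.enabled:
            self.skipped += 1
            return False
        return True

    @contextmanager
    def paused(self):
        # S83: prohibit responses while version information is being merged.
        self.enabled = False
        try:
            yield self
        finally:
            # S85: cancel the prohibition once merging has ended.
            self.enabled = True


gate = CompactionGate()
with gate.paused():
    during_merge = gate.on_integration_event()
after_merge = gate.on_integration_event()
```

Using a context manager means the prohibition is lifted even if the merge raises an error partway through.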

Referring to fig. 10, a schematic structural diagram of an optional example of a node capacity expansion apparatus of a storage system provided in the present application, this embodiment may be described from an original metadata node side of the storage system, as shown in fig. 10, the apparatus may include:

a newly added metadata node determination module 101, configured to determine that a storage system responds to a metadata node capacity expansion request and newly adds a metadata node;

each original metadata node in the storage system has a multi-layer data structure, each layer can deploy at least one SST file, and each SST file comprises metadata of the same metadata fragment file and a fragment index field aiming at the metadata fragment file;

the to-be-split SST file determining module 102 is configured to respond to a metadata split request, and determine, according to the segment index field of each SST file that the original metadata node has, an SST file to be split where the to-be-split metadata segment file is located;

a version information migration module 103, configured to migrate version information corresponding to a sub-data structure formed by the SST file to be split in the original metadata node to the version information of the newly added metadata node; the sub data structure comprises SST files located at least one layer below the highest layer of the original metadata nodes.

Optionally, in order to ensure that each SST file contains metadata of the same metadata fragment file, the apparatus may further include:

the SST file to be integrated determining module is used for responding to a file integration event aiming at any metadata node in the storage system and determining the SST files to be integrated existing in the metadata node;

the integration processing module is used for writing the metadata to be integrated corresponding to the same fragment file identifier in the SST files to be integrated into the newly created SST file according to the fragment file identifier corresponding to each metadata contained in the SST files to be integrated, and writing the fragment file identifier into the corresponding identifier field of the newly created SST file; wherein the shard file identifier is used to identify the different metadata shard files.

In some embodiments, the metadata is stored in a key-value pair manner, and based on this, the integration processing module may include:

the key value iterative write-in unit is used for sequentially writing the key values which are iteratively inquired by the SST files to be integrated into new SST files; the key value corresponds to a fragment file identifier of the metadata fragment file;

a first determining unit, configured to determine that the key value written into the new SST file this time is the last key value of a metadata fragment file having a corresponding fragment file identifier, and write the fragment file identifier into an identifier field of the new SST file;

the SST file creating unit is used for creating a new SST file and writing the next key value which is inquired in an iterative manner into the newly created SST file;

and the second determining unit is used for determining that the key value written into the new SST file at this time is not the last key value of the metadata fragment file with the corresponding fragment file identifier, and continuously writing the next key value obtained by iterative query into the new SST file.

In still other embodiments, the version information migration module 103 may include:

the version information extraction unit is used for extracting the version information of the SST files to be split existing in layers below the highest layer in the original metadata nodes;

the version information migration unit is used for migrating the extracted version information to a layer corresponding to a newly added metadata node, and the newly added metadata node merges the received version information with the version information of the layer corresponding to the newly added metadata node;

and the version information updating unit is used for updating the version information of the multilayer data structure currently possessed by the original metadata node according to the migrated version information in the original metadata node.

Optionally, the version information extracting unit may include:

the file integration unit is used for determining that the SST files to be split exist in the highest layer of the original metadata node, triggering the original metadata node to respond to a file integration event, and integrating the SST files to be split existing in the highest layer to the SST files created in the lower layer;

and the extracting unit is used for determining that the SST files to be split do not exist in the highest layer of the original metadata node and extracting the version information of the SST files to be split existing in each layer of the original metadata node.

Referring to fig. 11, a schematic structural diagram of yet another alternative example of a node capacity expansion apparatus of a storage system proposed in the present application, where this embodiment may be described from a side of a new metadata node of the storage system, as shown in fig. 11, the apparatus may include:

the version information receiving module 111 is configured to receive, by a newly added metadata node in the storage system, version information corresponding to a sub-data structure formed by the SST file to be split and migrated by an original metadata node;

each original metadata node in the storage system has a multi-layer data structure, each layer can deploy at least one SST file, and each SST file comprises metadata of the same metadata fragment file and a fragment index field aiming at the metadata fragment file; the SST files to be split refer to SST files where the metadata fragment files to be split are located;

and a version information merging module 112, configured to merge the received version information with the version information of the corresponding layer in the multi-layer data structure of the newly added metadata node itself.

In still other embodiments, as shown in fig. 12, the apparatus may further include:

a maximum sequence number obtaining module 113, configured to obtain a maximum sequence number of a split key value of a to-be-split metadata fragment file in the original metadata node;

a global sequence number determining module 114, configured to determine a global sequence number of a newly added metadata node by using a maximum sequence number of the split key value; the global sequence number of the newly-added metadata node is larger than the maximum sequence number of the split key value;

a file integration event prohibition response module 115, configured to prohibit a response to a file integration event during merging of the version information migrated from the original metadata node;

a data write request obtaining module 116, configured to obtain a data write request;

a data write request response module 117, configured to respond to the data write request according to the global sequence number;

and the file integration event recovery response module 118 is configured to determine that the merging of the version information has ended, and cancel the prohibition on responding to file integration events.

It should be noted that, various modules, units, and the like in the embodiments of the foregoing apparatuses may be stored in the memory as program modules, and the processor executes the program modules stored in the memory to implement corresponding functions, and for the functions implemented by the program modules and their combinations and the achieved technical effects, reference may be made to the description of corresponding parts in the embodiments of the foregoing methods, which is not described in detail in this embodiment.

The present application further provides a computer-readable storage medium, on which a computer program may be stored, where the computer program may be called and loaded by a processor to implement the steps of the node capacity expansion method of the storage system described in the above embodiments.

Referring to fig. 13, which is a schematic structural diagram of an optional example of a storage system suitable for the node capacity expansion method proposed in the present application, the storage system may include a plurality of metadata nodes. Each metadata node may be a computer device with a multi-layer data structure, such as a LevelDB database server, or a terminal device with a certain data processing capability, such as a desktop computer, a robot, or an intelligent terminal in any field. The node devices of the storage system shown in fig. 13 are only an example and should not limit the functions or scope of use of the storage system in the present application.

In the embodiment of the present application, each metadata node may be used to implement the node capacity expansion method of the storage system provided by the present application, and the implementation process may refer to the description of the corresponding part in the above embodiment, and details of the embodiment of the present application are not described herein.

Optionally, in order to implement the node expansion method of the storage system provided by the present application, the metadata node may be configured with at least one memory, a processor, and the like, where the memory may be used to store a program for implementing the node expansion method of the storage system provided by the present application, and the processor may load and execute the program to implement the node expansion method of the storage system provided by the present application.

In practical applications, the storage device may include at least one memory, a magnetic disk, etc., such as at least one magnetic disk storage device or other volatile solid state storage devices; the processor may include a Central Processing Unit (CPU), an application-specific integrated circuit (ASIC), a Digital Signal Processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, and the like. The application does not limit the device structure of the metadata node, which may be determined as the case may be.

It should be understood that the structure of the storage system shown in fig. 13 does not constitute a limitation on the storage system in the embodiments of the present application. In practical applications, the storage system may include more devices than those shown in fig. 13, such as a management node and a monitoring node, which implement data management, monitoring, and the like for each metadata node in the storage system. These may be determined according to the actual situation and are not enumerated one by one here.

Finally, it should be noted that, with respect to the above-described embodiments, unless the context clearly dictates otherwise, the terms "a", "an" and/or "the" do not refer specifically to the singular and may also include the plural. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps or elements are included; these steps and elements do not constitute an exclusive list, and the method or apparatus may comprise further steps or elements. An element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

In the description of the embodiments of the present application, "/" means "or"; for example, A/B may indicate A or B. "And/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, in the description of the embodiments of the present application, "a plurality" means two or more.

Terms such as "first" and "second" in this application are used for descriptive purposes only, to distinguish one operation, element, or module from another, and do not necessarily require or imply any actual relationship or order between such elements, operations, or modules. Nor are they to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated; a feature defined as "first" or "second" may therefore explicitly or implicitly include one or more of such features.

In addition, in the present specification, the embodiments are described in a progressive or parallel manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments can be referred to each other. The device, the storage system, and the metadata node disclosed in the embodiments correspond to the method disclosed in the embodiments, so their description is relatively brief; for the relevant points, refer to the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of node capacity expansion for a storage system, the method comprising:

migrating the version information corresponding to the sub-data structure formed by the SST file to be split in the original metadata node to the version information of the newly added metadata node; the sub-data structure comprises SST files located at least one layer below the highest layer of the original metadata node.

2. The method of claim 1, wherein each of the SST files contains metadata for the same metadata fragment file, comprising:

responding to a file integration event aiming at any metadata node in the storage system, and determining SST files to be integrated which exist in the metadata node;

according to the fragment file identifications corresponding to the metadata contained in the SST file to be integrated, writing the metadata to be integrated corresponding to the same fragment file identification in the SST file to be integrated into a newly created SST file, and writing the fragment file identification into the corresponding identification field of the newly created SST file;

3. The method as claimed in claim 2, wherein the metadata is stored in a key-value pair manner, and the writing of the metadata to be integrated corresponding to the same fragment file identifier in the SST file to be integrated into the newly created SST file according to the fragment file identifier corresponding to each metadata included in the SST file to be integrated includes:

and determining that the key value written into the new SST file at this time is not the last key value of the metadata fragment file with the corresponding fragment file identifier, and continuously writing the next key value obtained by iterative query into the new SST file.
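
The integration described in claims 2 and 3 can be illustrated with a minimal sketch. It is not the implementation of the application: the `SSTFile` class, the `shard_of` helper, and the `"<fragment id>:<name>"` key layout are assumptions made only for illustration, and keys are assumed to be unique across the input files.

```python
from dataclasses import dataclass, field
from heapq import merge
from typing import List, Optional, Tuple


@dataclass
class SSTFile:
    """Hypothetical in-memory stand-in for an SST file."""
    shard_id: Optional[int] = None  # identification field for the fragment file
    entries: List[Tuple[str, str]] = field(default_factory=list)  # sorted key-value pairs


def shard_of(key: str) -> int:
    # Assumption: every key is laid out as "<fragment id>:<name>".
    return int(key.split(":", 1)[0])


def integrate(inputs: List[SSTFile]) -> List[SSTFile]:
    """Merge the input SST files so that each output file holds one fragment only."""
    outputs: List[SSTFile] = []
    current: Optional[SSTFile] = None
    # Iterate over all key-value pairs of the inputs in sorted key order.
    for key, value in merge(*(sst.entries for sst in inputs)):
        sid = shard_of(key)
        if current is None or current.shard_id != sid:
            # The previous key was the last one of its fragment:
            # create a new SST file and record the fragment identifier in it.
            current = SSTFile(shard_id=sid)
            outputs.append(current)
        # Otherwise keep writing into the SST file that is currently open.
        current.entries.append((key, value))
    return outputs
```

With two input files that each mix fragments 1 and 2, `integrate` returns two files, one per fragment, each carrying its fragment identifier in `shard_id`.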

4. The method according to claim 1, wherein migrating the version information corresponding to the sub-data structure formed by the SST file to be split in the original metadata node to the version information of the newly added metadata node includes:

migrating the extracted version information to a newly added metadata node corresponding layer, and combining the received version information with the version information of the corresponding layer by the newly added metadata node;

5. The method as claimed in claim 4, wherein the extracting version information of the SST files to be split existing in the layers below the highest layer in the original metadata nodes comprises:

determining that SST files to be split exist in the highest layer of the original metadata node, triggering the original metadata node to respond to a file integration event, and integrating the SST files to be split existing in the highest layer to SST files created in the lower layer;

and determining that the SST files to be split do not exist in the highest layer of the original metadata node, and extracting the version information of the SST files to be split existing in each layer of the original metadata node.
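
The extraction and migration of version information described in claims 4 and 5 can be sketched as follows. This is a simplified model under stated assumptions rather than the implementation of the application: version information is modeled as a mapping from layer number to file metadata, layer 0 stands for the highest layer, every file is assumed to carry a single fragment index, and `compact_top_level` is a hypothetical stand-in for the file integration event that merely moves the affected files to layer 1.

```python
from typing import Dict, List, Set, Tuple

FileMeta = Tuple[str, int]           # (file name, fragment index), hypothetical
Version = Dict[int, List[FileMeta]]  # layer number -> file metadata; 0 is the highest layer


def compact_top_level(version: Version, split_ids: Set[int]) -> None:
    """Stand-in for the file integration event: push the layer-0 files to split down to layer 1."""
    stay = [f for f in version.get(0, []) if f[1] not in split_ids]
    go = [f for f in version.get(0, []) if f[1] in split_ids]
    version[0] = stay
    version.setdefault(1, []).extend(go)


def extract_split_version(version: Version, split_ids: Set[int]) -> Version:
    """Remove and return the version information of the SST files to be split."""
    # If the highest layer still holds files to split, integrate them downward first.
    if any(sid in split_ids for _, sid in version.get(0, [])):
        compact_top_level(version, split_ids)
    extracted: Version = {}
    for level, files in list(version.items()):
        if level == 0:
            continue
        moving = [f for f in files if f[1] in split_ids]
        if moving:
            extracted[level] = moving
            version[level] = [f for f in files if f[1] not in split_ids]
    return extracted


def merge_version(target: Version, received: Version) -> None:
    """On the newly added node, merge the received information into the corresponding layers."""
    for level, files in received.items():
        target.setdefault(level, []).extend(files)
```

Only the file metadata moves between the two mappings in this sketch; no key-value data is copied.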

6. A method of node capacity expansion for a storage system, the method comprising:

the method comprises the steps that a newly-added metadata node in a storage system receives version information corresponding to a sub data structure formed by SST files to be split migrated by an original metadata node; each original metadata node in the storage system has a multi-layer data structure, each layer can deploy at least one SST file, and each SST file comprises metadata of the same metadata fragment file and a fragment index field aiming at the metadata fragment file; the SST files to be split refer to SST files where the metadata fragment files to be split are located; the sub data structure comprises SST files located at least one layer below the highest layer of the original metadata node;

7. The method of claim 6, further comprising:

responding to the data writing request according to the global sequence number;

8. The method of claim 6, further comprising:

obtaining a data reading request;

and determining that the data to be read is not read successfully, forwarding the data reading request to the corresponding original metadata node, so that the original metadata node responds to the data reading request.
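
The read path of claim 8 can be sketched as a local lookup with a fallback to the original metadata node. The `MetadataNode` class and its `origin` attribute are hypothetical names used only for this illustration, and a plain dictionary stands in for the multi-layer data structure.

```python
from typing import Dict, Optional


class MetadataNode:
    """Hypothetical metadata node whose local data is modeled as a dictionary."""

    def __init__(self, store: Dict[str, str], origin: Optional["MetadataNode"] = None):
        self.store = dict(store)
        self.origin = origin  # original metadata node this node was split from, if any

    def read(self, key: str) -> Optional[str]:
        # Try to serve the data reading request locally first.
        if key in self.store:
            return self.store[key]
        # The local read did not succeed: forward the request to the original node.
        if self.origin is not None:
            return self.origin.read(key)
        return None
```

For example, a newly added node created with an empty store and `origin` set to an existing node returns that node's value for any key it does not hold itself.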

9. A node capacity expansion apparatus of a storage system, the apparatus comprising:

a version information migration module, configured to migrate version information corresponding to a sub-data structure formed by the SST file to be split in the original metadata node to the version information of the newly added metadata node; the sub data structure comprises SST files located at least one layer below the highest layer of the original metadata nodes.

10. A node capacity expansion apparatus of a storage system, the apparatus comprising:

and a version information merging module, configured to merge the received version information with the version information of the corresponding layer in the multi-layer data structure of the newly added metadata node.