CN114138192A - Storage node online upgrading method, device, system and storage medium - Google Patents


Info

Publication number
CN114138192A
Authority
CN
China
Prior art keywords
storage
data
storage nodes
nodes
upgrading
Prior art date
Legal status
Pending
Application number
CN202111395328.4A
Other languages
Chinese (zh)
Inventor
柯丹丹 (Ke Dandan)
Current Assignee
Macrosan Technologies Co Ltd
Original Assignee
Macrosan Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Macrosan Technologies Co Ltd
Priority to CN202111395328.4A
Publication of CN114138192A

Classifications

    • G06F 3/0619: Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F 11/10: Error detection or correction by redundancy in data representation, e.g. parity check
    • G06F 11/1469: Backup restoration techniques
    • G06F 3/065: Replication mechanisms
    • G06F 3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The application provides a method, a device, a system and a storage medium for online upgrade of storage nodes. The method includes the following steps: determining the maximum number of failed storage nodes supported by a storage pool, the storage pool including at least one storage node; upgrading the storage nodes in batches and recording related information during the upgrade process, wherein the number of storage nodes upgraded in each batch does not exceed the maximum number of failed nodes, and the related information comprises information about the storage nodes being upgraded in batches and index information of data not completely written during the upgrade; and performing data recovery on the upgraded storage nodes according to the related information. On the one hand, multiple storage nodes can be upgraded at the same time, shortening the total time needed to upgrade all storage nodes; on the other hand, data recovery uses the recorded information to target only the affected nodes, avoiding a traversal of all storage nodes, which further shortens the upgrade time and reduces risk.

Description

Storage node online upgrading method, device, system and storage medium
Technical Field
The present application relates to the field of data storage technologies, and in particular, to a method, an apparatus, a system, and a storage medium for online upgrade of a storage node.
Background
With the rapid development of Internet applications, the volume of data circulating on networks is growing quickly, and mass storage at the PB and EB scale has become common. Because the amount of data to be stored is so large, the storage server of a conventional centralized storage system bears heavy network pressure, and its system performance severely limits data access services. Distributed storage systems are therefore widely used for mass data storage: data is distributed across multiple independent server devices, and the servers are then used to locate the stored information. This allows the storage load to be shared among multiple servers, improves the reliability, availability and access efficiency of the system, and makes the system easy to expand. In a distributed storage system, data is stored on different storage nodes, and because the amount of data is very large, the number of storage nodes in a single system is correspondingly large.
To keep the system working properly, a distributed storage system must regularly perform maintenance operations such as software upgrades and firmware upgrades. During maintenance, a storage node or the services on it may need to be restarted. Therefore, to avoid affecting online services and to prevent data damage or loss, the existing approach to online upgrade is to pick idle periods of the service and upgrade the storage nodes sequentially, with only one storage node upgrading at a time. Because the nodes queue up and upgrade one by one, the whole upgrade takes a long time; moreover, while a node is upgrading the system's ability to handle unexpected failures is reduced, so a longer maintenance window brings a greater risk.
Disclosure of Invention
To overcome the problems in the related art, the present application provides a method, an apparatus, a system and a storage medium for online upgrade of a storage node. The technical solutions are as follows:
a method for online upgrading of storage nodes in a distributed storage system comprises the following steps:
determining a maximum number of failed storage nodes supported by a storage pool, the storage pool including at least one storage node;
upgrading the storage nodes in batches and recording related information during the upgrade process, wherein the number of storage nodes upgraded in each batch does not exceed the maximum number of failed nodes, and the related information comprises information about the storage nodes being upgraded in batches and index information of data not completely written during the upgrade;
and performing data recovery on the upgraded storage node according to the related information.
An apparatus for online upgrade of storage nodes in a distributed storage system, the apparatus comprising:
a number calculation unit for determining a maximum number of failed storage nodes supported by a storage pool, the storage pool including at least one storage node;
a batch upgrading unit, configured to upgrade the storage nodes in batches and record related information during the upgrade process, wherein the number of storage nodes upgraded in each batch does not exceed the maximum number of failed nodes, and the related information comprises information about the storage nodes being upgraded in batches and index information of data not completely written during the upgrade;
and the data recovery unit is used for recovering the data of the upgraded storage node according to the related information.
A distributed storage system, the system comprising:
at least one storage node; and a management node;
the management node is configured to implement the steps of any of the storage node online upgrade methods in a distributed storage system provided by this application;
and the storage node is configured to respond to read-write instructions from the management node.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any one of the methods for online upgrade of storage nodes in a distributed storage system provided by this specification.
According to the above technical solutions, the maximum number of failed nodes supported by the storage pool is first calculated according to actual application requirements; the storage nodes in the pool are then upgraded in batches according to that number, with related information recorded during the upgrade process; and after the batch upgrades complete, data recovery is performed on the upgraded nodes according to the recorded information. On the one hand, multiple storage nodes can be upgraded at the same time, shortening the total time needed to upgrade all storage nodes and reducing risk; on the other hand, data recovery is targeted using the recorded information, avoiding a node-by-node check of the data stored on every storage node, which further shortens the upgrade time, shortens the period during which data is incomplete, and reduces risk.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating a method for online upgrade of storage nodes in a distributed storage system according to an exemplary embodiment of the present application.
Fig. 2 is a flowchart illustrating a method for online upgrade of storage nodes in a storage pool using an EC policy in a distributed storage system according to an exemplary embodiment of the present application.
Fig. 3 is a flowchart illustrating a method for online upgrade of storage nodes in a storage pool using a multi-copy policy in a distributed storage system according to an exemplary embodiment of the present application.
Fig. 4 is a flowchart illustrating a specific process of performing a batch upgrade on storage nodes in a distributed storage system according to an exemplary embodiment of the present application.
Fig. 5 is a schematic structural diagram of an online storage node upgrade apparatus in a distributed storage system according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
In a distributed storage system, data files are stored on multiple storage servers at different locations. A storage node identifies the location of a storage server and is used to point to the server that actually stores the data when the storage system performs data access services. A storage medium is the device on each storage server that actually holds the data; in this application it may be any device with a storage function, for example a mechanical hard disk drive (HDD), a solid state drive (SSD), or a floppy disk. A storage node may have one storage medium or several. A storage pool is a logical concept: an aggregate of resources providing a storage function, composed of one or more storage media on one or more storage nodes. A data storage policy is set on the storage pool, and the storage media on all of the pool's storage nodes store data according to that policy.
In some embodiments, the distributed storage system may store data using object storage. The basic entities stored in the object storage include a bucket and an object, wherein the bucket is a container for storing the object, and the object is an aggregate of data of the data file to be stored and related attribute information thereof. Any object necessarily belongs to a certain bucket, and a plurality of objects can be contained in one bucket. The related attribute information of the object includes an object name, object data, and object metadata. The object name refers to a unique identifier of an object in the bucket, the object data refers to actual data of the corresponding object, and the object metadata is a set of name value pairs and comprises system metadata and user-defined metadata of the object. In the object storage system, a node that actually stores an object is a storage node, and a plurality of object storage nodes may constitute an object storage pool.
A data protection policy, such as an EC (Erasure Code) policy or a multi-copy policy, may also be set on the storage pool. Under a data protection policy, the storage system tolerates failures of a certain number of storage nodes in the pool without affecting normal reads and writes on the other nodes; it guarantees that data access services continue within the supported failure range, and it performs data recovery on the failed nodes after they are repaired, thereby keeping the stored data consistent.
Erasure coding (EC) is an encoding technique. An N+M EC policy means that the data to be stored is split into N data blocks, from which M parity blocks are computed. Data blocks and parity blocks are collectively called EC blocks, and all EC blocks of the same stored data file form an EC stripe. The N+M EC blocks are stored on multiple storage nodes, each node holding one or more EC blocks, with the placement of each block determined by the system according to fixed rules. If no more than M EC blocks are damaged or lost, the missing blocks can be reconstructed from any N surviving EC blocks (data blocks or parity blocks), recovering all of the original, undamaged data. An EC policy therefore guarantees that data access services are unaffected as long as no more than M EC blocks fail, and damaged data can be rebuilt using the EC technique. For example, in a pool with a 4+2 EC policy, damaged data can be rebuilt provided the number of failed EC blocks does not exceed 2. In some embodiments, the 4+2 EC blocks are stored on 6 storage nodes, at most one block per node, so the pool can rebuild damaged data and preserve data integrity when any 2 or fewer storage nodes fail.
In other embodiments, the 4+2 EC blocks may be stored on fewer storage nodes, for example 3 or 4, with each node holding one or two EC blocks. In that case the pool can rebuild damaged data only when at most 1 storage node fails, since a single node failure may take out 2 EC blocks at once. Note that the number of EC blocks stored on any one node should not exceed the number of parity blocks M; otherwise the failure of that one node would make the data unrepairable.
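The node-failure tolerance implied by the two layouts above can be sketched as a one-line calculation. This is an illustrative sketch, not part of the patent; the function name and parameters are ours.

```python
def ec_max_failed_nodes(m_parity_blocks, max_blocks_per_node):
    """Nodes of an N+M EC pool that may fail at once while every stripe
    stays repairable: a stripe survives losing at most M EC blocks, and a
    failed node takes out up to max_blocks_per_node of them."""
    if max_blocks_per_node > m_parity_blocks:
        raise ValueError("a node must not hold more EC blocks of a stripe "
                         "than there are parity blocks")
    return m_parity_blocks // max_blocks_per_node

# 4+2 stripe, one EC block per node (6-node layout): any 2 nodes may fail
assert ec_max_failed_nodes(2, 1) == 2
# 4+2 stripe on fewer nodes, two EC blocks per node: only 1 node may fail
assert ec_max_failed_nodes(2, 2) == 1
```

The guard clause encodes the last sentence above: packing more blocks per node than there are parity blocks would make a single node failure unrecoverable.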
A replica (copy) is the same piece of data persisted on different storage nodes. A multi-copy policy with parameter K stores K copies of a data file on K different storage nodes; when the copy on one node is lost, the data can be read from a copy on another node and the lost copy repaired. To keep access services running, a multi-copy policy generally requires that at least a certain number of the nodes holding copies remain online to process read and write operations.
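The multi-copy tolerance follows directly from that service floor. Again a hedged sketch; the parameter names are ours, not the patent's.

```python
def replica_max_failed_nodes(k_copies, min_online_copies):
    """With K copies and a requirement that min_online_copies copies stay
    online for service, at most K - min_online_copies replica-holding
    nodes may be offline (failed or upgrading) at the same time."""
    if min_online_copies > k_copies:
        raise ValueError("cannot require more online copies than exist")
    return k_copies - min_online_copies

# A 3-copy pool that needs 1 copy online tolerates 2 simultaneous failures
assert replica_max_failed_nodes(3, 1) == 2
```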
When a storage node is being upgraded, to keep online services running normally, the node by default does not accept read-write requests from the front-end server; the other online nodes accept requests as usual, so reads and writes of stored data are not affected. Consequently, if data is written during the upgrade, the node being upgraded misses the newly written data, leaving its data incomplete. Incomplete data on a node carries risks: it can slow data reads and writes and reduce data redundancy, and if a hard disk or storage node then fails, some data may become unreadable and unwritable. A data recovery operation therefore needs to be performed on the node. A conventional recovery operation traverses all storage nodes, confirms the data stored on each one, and then recovers data on the nodes found incomplete. Even with batch upgrading, such a recovery after the upgrade would still traverse every node, still take a long time, and leave data in an incomplete state for a long period before recovery finishes, so a substantial risk would remain.
Therefore, during batch upgrading, related information about the upgrade process can be recorded, for example information about the storage nodes being upgraded in each batch and index information of data whose write failed because its target node was upgrading. From this information, the storage nodes whose data may be incomplete after the upgrade, and the incompletely written data itself, can be determined.
The method for online upgrade of storage nodes in a distributed storage system according to some embodiments is described in detail below.
As shown in fig. 1, fig. 1 is a flowchart illustrating a method for online upgrading a storage node in a distributed storage system according to an exemplary embodiment of the present application, including the following steps:
step S101, determining the maximum number of fault storage nodes supported by a storage pool, wherein the storage pool comprises at least one storage node;
step S102, upgrading the storage nodes in batches and recording related information during the upgrade process, wherein the number of storage nodes upgraded in each batch does not exceed the maximum number of failed nodes, and the related information comprises information about the storage nodes being upgraded in batches and index information of data not completely written during the upgrade;
and step S103, performing data recovery on the upgraded storage node according to the related information.
In step S101, the maximum number of storage nodes in a storage pool that may fail simultaneously is derived from information such as the pool's data protection policy; this is also the maximum number of nodes in the pool that may be upgraded simultaneously. That is, when the storage system is upgraded, the maximum number of failed nodes supported by the pool can be determined from the number of storage nodes, the pool's data protection policy, and the mapping between storage nodes and storage pools.
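Step S101 can be sketched as a small dispatch over the pool's data protection policy. The descriptor layout below is an assumption for illustration only; it simply combines the EC and multi-copy limits described earlier.

```python
def pool_max_failed_nodes(policy):
    """Derive a pool's simultaneous-failure (and hence simultaneous-
    upgrade) limit from a policy descriptor. The descriptor format is a
    hypothetical one for this sketch:
      {"type": "ec", "n": 4, "m": 2, "blocks_per_node": 1}
      {"type": "replica", "k": 3, "min_online": 1}
    """
    if policy["type"] == "ec":
        # at most M EC blocks may be lost; a node holds up to blocks_per_node
        return policy["m"] // policy["blocks_per_node"]
    if policy["type"] == "replica":
        # K copies minus the copies that must stay online for service
        return policy["k"] - policy["min_online"]
    raise ValueError("unknown data protection policy: %r" % policy["type"])

# 4+2 EC, one block per node -> up to 2 nodes may upgrade at once
assert pool_max_failed_nodes({"type": "ec", "n": 4, "m": 2,
                              "blocks_per_node": 1}) == 2
```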
Since the storage media on a storage node may be of various types, a node may belong to multiple storage pools. For example, the HDDs on a set of nodes may form a storage pool whose data protection policy is an EC policy, while the SSDs form a storage pool whose policy is a multi-copy policy; a node whose HDD belongs to the former pool and whose SSD belongs to the latter then belongs to both storage pools at the same time.
In some embodiments, storage nodes that belong to at least two storage pools are upgraded before the other nodes in those pools. To ensure that no pool is put at risk, when such nodes are upgraded, the number of nodes per batch may be the minimum of the maximum failed-node counts of all the pools they belong to. For example, suppose storage nodes s1, s2, s3, s4, s5, s6, s7 and s8 belong to pool Pool-1, nodes s4, s5, s6, s7, s8, s9, s10 and s11 belong to pool Pool-2, Pool-1 supports Y1 simultaneous node failures, Pool-2 supports Y2, and Y is the minimum of Y1 and Y2. Then each batch selects Y nodes from the intersection s4, s5, s6, s7, s8 of Pool-1 and Pool-2 and upgrades them simultaneously. After the intersection is done, the remaining nodes are upgraded: the rest of Pool-1 may be upgraded Y1 nodes at a time, and the rest of Pool-2 Y2 nodes at a time.
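The batching order in that example can be sketched as follows. This is a simplified illustration under our own naming assumptions: shared nodes first, limited by the smallest applicable pool limit, then the remainder of each pool at that pool's own limit.

```python
def plan_upgrade_batches(pools, limits):
    """pools:  pool name -> set of storage-node names
    limits: pool name -> max nodes of that pool allowed down at once
    Returns a list of batches (lists of node names), shared nodes first."""
    all_nodes = set().union(*pools.values())
    shared = {n for n in all_nodes
              if sum(n in members for members in pools.values()) >= 2}
    batches = []
    if shared:
        # batch size Y = minimum limit over every pool containing shared nodes
        y = min(limits[p] for p, members in pools.items() if members & shared)
        queue = sorted(shared)
        batches += [queue[i:i + y] for i in range(0, len(queue), y)]
    upgraded = set(shared)
    for pool, members in sorted(pools.items()):
        # remaining nodes of this pool use the pool's own limit
        rest = sorted(members - upgraded)
        y = limits[pool]
        batches += [rest[i:i + y] for i in range(0, len(rest), y)]
        upgraded |= set(rest)
    return batches
```

With the Pool-1/Pool-2 example above and Y1 = Y2 = 2, the shared nodes s4-s8 are upgraded two at a time before any pool-exclusive node.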
In step S102, the storage nodes are upgraded in batches and related information about the upgrade process is recorded; the number of nodes upgraded in each batch does not exceed the maximum number of failed nodes obtained in step S101.
As shown in fig. 4, fig. 4 is a flowchart illustrating a specific process of batch upgrading storage nodes in a distributed storage system according to an exemplary embodiment of the present application. If the maximum number of failed nodes is Y, the batch upgrade may proceed as follows:
step S102a, selecting Y eligible storage nodes, setting their upgrade flag to 1, and starting the batch upgrade and restart;
step S102b, processing data read-write requests normally during the upgrade and recording related information about the upgrade process;
step S102c, resetting the upgrade flag to 0 once the Y storage nodes finish upgrading, completing the batch;
and step S102d, repeating the above steps for the remaining storage nodes until all storage nodes are upgraded.
In the above embodiment, the upgrade flag is set to 1 or 0 to distinguish, within each batch, the nodes being upgraded from those that are not: when a node is about to start or is in the middle of an upgrade, its flag is set to 1; when it has completed all steps of the upgrade process, the flag is set back to 0. Other embodiments may use any other mechanism, as long as upgrading nodes can be distinguished from non-upgrading nodes, which this application does not limit.
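Steps S102a through S102d can be sketched as a loop; the function names and callback signatures here are our own illustrative assumptions.

```python
def upgrade_in_batches(nodes, y, do_upgrade, record):
    """nodes: ordered list of node names; y: max nodes per batch;
    do_upgrade(node): perform the upgrade and restart of one node;
    record(node): record related upgrade information for the node."""
    flags = {n: 0 for n in nodes}  # 0 = not upgrading, 1 = upgrading
    pending = list(nodes)
    while pending:                                 # S102d: loop over batches
        batch, pending = pending[:y], pending[y:]  # S102a: pick up to Y nodes
        for n in batch:
            flags[n] = 1                           # mark as upgrading
        for n in batch:
            do_upgrade(n)                          # S102b: upgrade + restart;
            record(n)                              # flagged nodes skip requests
        for n in batch:
            flags[n] = 0                           # S102c: batch complete
    return flags
```

The flag dictionary plays the role of the 1/0 upgrade identifier described above; in a real system it would live in shared cluster state so that request routing can consult it.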
During the batch upgrade, a node flagged as upgrading does not accept data read-write requests, while nodes not so flagged continue to accept and process them; the requests may come from a front-end service or from other clients or servers, which this application does not limit. Depending on the pool's data protection policy, data read requests are handled with different processing details during the batch upgrade, which this application likewise does not limit.
In step S103: during the upgrade, the storage system accepts read-write requests normally, but a node being upgraded does not; when new data is written, that node cannot receive it, so it misses the new data, its data becomes incomplete, and data recovery must be performed after the upgrade completes. To point precisely at the nodes that need recovery, related information is recorded while each batch of nodes is upgraded, and this information is used afterwards for data recovery on the upgraded nodes. Related information means information about the batch upgrade process and anything that may affect it; with it, the storage system can perform operations such as data recovery to reduce the impact of the batch upgrade on its services. For example, the related information may include information about the storage nodes being upgraded in batches and index information of data not completely written during the upgrade, and it may include other useful information, which this application does not limit. In some embodiments, the distributed storage system uses object storage, and the recorded information may be the index information of the object data to be written, including the bucket name and object version number of that data, together with index information of the storage node it should be written to.
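The object-storage example of recorded related information can be sketched as a small record type plus a write-routing check. Field and function names are illustrative only, following the bucket/version example above.

```python
from dataclasses import dataclass

@dataclass
class MissedWriteRecord:
    """Index information for a write missed by a node under upgrade."""
    bucket: str          # bucket name of the object data
    object_name: str     # unique object identifier within the bucket
    object_version: int  # object version number
    target_node: str     # storage node that should have received the write

upgrade_log = []  # related information recorded during the batch upgrade

def route_write(bucket, object_name, version, node, upgrading_nodes):
    """Return True if the node can take the write now; otherwise record
    the index information so recovery can target this node later."""
    if node in upgrading_nodes:
        upgrade_log.append(MissedWriteRecord(bucket, object_name,
                                             version, node))
        return False
    return True
```

After the batch finishes, `upgrade_log` identifies exactly which nodes and objects need recovery, which is what lets the system avoid traversing every node.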
After a storage node finishes upgrading, its location in the storage system can be found from the recorded information; data is then recovered from the other online storage nodes and written to the corresponding positions on the node that missed the writes, according to the storage pool's data protection policy.
Data recovery is required whenever storage nodes are upgraded. It may be performed either as soon as each batch of nodes finishes upgrading, or after all storage nodes in the storage pool have been upgraded.
In some embodiments, the system performs data recovery immediately after each batch of storage nodes completes its upgrade. This restores data integrity quickly, greatly shortening the period during which data is incomplete and reducing the associated risk. However, because recovery runs after every batch, the overall upgrade inevitably takes longer; this approach suits scenarios where the threat of incomplete data is the main concern.
In other embodiments, the system performs data recovery once, after all storage nodes in the storage pool have been upgraded. With this method, the data recovery process is completed in a single pass, which greatly shortens the overall upgrade and reduces the risks that may arise during it. However, storage nodes left in an incomplete-data state after each batch must wait until all storage nodes have been upgraded before recovery, so the incomplete-data state of some storage nodes lasts longer and the risk incurred while data is incomplete is less well controlled; this approach is suitable when the total upgrade time is the primary concern.
In some embodiments, the protection policy for a storage pool is an EC policy. Fig. 3 is a flowchart illustrating a method for online upgrade of storage nodes in a storage pool using an EC policy in a distributed storage system according to an exemplary embodiment of the present application, and includes the following steps:
step S301, determining the maximum number of failed storage nodes supported by a storage pool based on the number of check blocks, wherein the storage pool comprises at least one storage node;
step S302, upgrading the storage nodes in batches and recording relevant information in the upgrading process, wherein the number of storage nodes upgraded in each batch does not exceed the maximum number of failed nodes, and the relevant information comprises the associated information of the storage nodes being upgraded in batches and the index information of data not completely written during the upgrading process;
step S3021, when a request to read data is received, reading data blocks or check blocks from N storage nodes that are not being upgraded, combining them into the data file, and returning the data file to the requester;
step S3022, when a request to write data is received, splitting the newly written data into N data blocks, calculating M check blocks, writing the data blocks and check blocks into the corresponding storage nodes that are not being upgraded, and recording the index information of the newly written data if at least one of the storage nodes corresponding to the data blocks or check blocks is being upgraded;
step S303, performing data recovery on the upgraded storage node according to the related information.
In step S301, if the protection policy of the storage pool is an EC policy of the form N + M, then as long as no more than M EC blocks are damaged, the correct N data blocks can be recovered from any N EC blocks (data blocks or check blocks) by EC erasure techniques and combined into the complete data file. Therefore, when the number of storage nodes is greater than or equal to N + M and each storage node stores at most one EC block, the storage pool supports the simultaneous failure of M storage nodes; that is, the maximum number of failed nodes supported by the storage pool is M. When a storage node may store more than one EC block, it should be understood that the number of EC blocks stored on a single node must not exceed M, and the maximum number of failed nodes supported by the storage pool must be such that even if the storage nodes storing the largest numbers of EC blocks fail together, the number of failed EC blocks in the storage pool does not exceed M. For example, when M is 4 and a storage node may store at most 2 EC blocks, the maximum number of failed nodes supported by the storage pool is 2, so that even if 2 storage nodes each storing 2 EC blocks fail at the same time, the total number of failed EC blocks does not exceed 4; when M is 3 and a storage node may store at most 2 EC blocks, the maximum number of failed nodes supported by the storage pool is 1, which ensures that the number of failed EC blocks does not exceed 3.
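The arithmetic of the preceding paragraph can be sketched as a small helper function. This is an illustrative sketch; the function name and signature are assumptions, not from the patent:

```python
def max_failed_nodes_ec(m_check_blocks: int, max_ec_blocks_per_node: int) -> int:
    """Maximum number of storage nodes that may be offline at once under an
    N + M EC policy: even if the nodes holding the most EC blocks fail
    together, no more than M EC blocks may be lost."""
    if max_ec_blocks_per_node < 1:
        raise ValueError("a node must be able to hold at least one EC block")
    # Losing f nodes can destroy at most f * max_ec_blocks_per_node blocks,
    # so the largest safe f is M divided (integer) by the per-node maximum.
    return m_check_blocks // max_ec_blocks_per_node

# Examples from the text:
#   M = 4, at most 2 EC blocks per node -> 2 nodes may fail (2 * 2 <= 4)
#   M = 3, at most 2 EC blocks per node -> 1 node may fail  (2 * 2 >  3)
```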
In step S302, besides upgrading the storage nodes in batches and recording relevant information during the upgrading process, the storage system receives and processes data read-write requests normally, as shown in steps S3021 and S3022.
In step S3021, if the data protection policy of the storage pool is an EC policy of the form N + M and the maximum number of failed nodes supported by the storage pool is Y, then at most Y storage nodes are being upgraded at any time. From the relationship between the storage nodes and the storage pool it follows that Y is less than or equal to M, so the number of storage nodes not being upgraded is greater than or equal to N. When a request to read data is received, EC blocks are read from N storage nodes that are not being upgraded. If all the read EC blocks are data blocks, they are directly combined into the complete data file; if the read EC blocks include check blocks, all the data blocks are first computed by EC erasure techniques and then combined into the complete data file, which is returned to the requester.
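The read path can be illustrated for the simplest erasure case, N + 1 with a single XOR check block (M = 1); a general N + M policy would use Reed-Solomon-style coding instead. All names here are illustrative assumptions, not from the patent:

```python
def _xor(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, x in enumerate(b):
            out[i] ^= x
    return bytes(out)

def split_with_parity(data, n):
    """Split a data file into n data blocks plus one XOR check block
    (the N + 1, M = 1 case). Pads with zeros; the caller must remember
    the original length to strip the padding after a read."""
    if len(data) % n:
        data += b"\x00" * (n - len(data) % n)
    size = len(data) // n
    blocks = [data[i * size:(i + 1) * size] for i in range(n)]
    return blocks + [_xor(blocks)]

def read_any_n(blocks, n):
    """Rebuild the file from any n of the n + 1 blocks; None marks a block
    whose node is mid-upgrade. At most one block may be unavailable."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise RuntimeError("more blocks lost than the policy tolerates")
    if missing and missing[0] < n:
        # A data block is missing: the XOR of the remaining n blocks
        # (n - 1 data blocks plus the check block) restores it.
        blocks = list(blocks)
        blocks[missing[0]] = _xor([b for b in blocks if b is not None])
    return b"".join(blocks[:n])
```

If the unavailable block is the check block itself, the n data blocks are simply concatenated; no erasure computation is needed.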
In step S3022, when a request to write data is received, if the data protection policy of the storage pool is an EC policy, the newly written data is split into N data blocks and M check blocks are calculated; the data blocks and check blocks are then written into the corresponding storage nodes that are not being upgraded, the storage node into which each data block and check block is to be written being determined by the storage system. If a data block or check block is assigned to a storage node that is being upgraded, writing that block is considered to have failed and the block is lost; that is, the newly written data is incompletely written. The index information of the newly written data therefore needs to be recorded, so that after the upgrade is completed the missing blocks can be recovered and the data can be stored completely in the storage system.
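The write-path bookkeeping of step S3022 can be sketched as follows, with in-memory dicts standing in for the storage nodes; the function and parameter names are illustrative assumptions:

```python
def write_ec_blocks(obj_key, ec_blocks, placement, upgrading,
                    node_store, pending_index):
    """Write each EC block (data or check) to its assigned node. A block
    whose target node is mid-upgrade is skipped, and the object's index
    info is recorded so the block can be rebuilt there afterwards."""
    hit_upgrading = False
    for block_no, (block, node) in enumerate(zip(ec_blocks, placement)):
        if node in upgrading:
            hit_upgrading = True          # this node will miss the block
            continue
        node_store.setdefault(node, {})[(obj_key, block_no)] = block
    if hit_upgrading:
        # index info of the incompletely written data (step S303 reads this)
        pending_index.append(obj_key)
```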
In step S303, during data recovery, if the data protection policy of the storage pool is an EC policy of the form N + M: if the storage node whose data is to be recovered corresponds to a data block, the missing data block can be calculated from the other N-1 data blocks and one check block, and written into that storage node; if the storage node corresponds to a check block, the check block can be recalculated from the N data blocks and used to replace the check block on that storage node.
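For the single-check-block case (M = 1), the recovery computation of step S303 reduces to one XOR: because the XOR of all N + 1 EC blocks is zero, the block an upgraded node missed, whether a data block or the check block, equals the XOR of the remaining N blocks. A sketch with illustrative names:

```python
def recover_block(ec_blocks, missing_no):
    """Recompute the one EC block a just-upgraded node missed (M = 1 case).
    ec_blocks maps block number -> bytes for the surviving blocks; the
    result is written back to the recovering node by the caller."""
    others = [b for no, b in sorted(ec_blocks.items()) if no != missing_no]
    out = bytearray(len(others[0]))
    for b in others:
        for i, x in enumerate(b):
            out[i] ^= x
    return bytes(out)
```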
In some embodiments, the protection policy for the storage pool is a multiple copy policy. Fig. 4 is a flowchart illustrating a method for online upgrade of storage nodes in a storage pool using a multi-copy policy in a distributed storage system according to an exemplary embodiment of the present application, where the method includes the following steps:
step S401, determining the maximum number of failed storage nodes supported by a storage pool based on the minimum number of data copies that must be online simultaneously, wherein the storage pool comprises at least one storage node;
step S402, upgrading the storage nodes in batches and recording relevant information in the upgrading process, wherein the number of storage nodes upgraded in each batch does not exceed the maximum number of failed nodes, and the relevant information comprises the associated information of the storage nodes being upgraded in batches and the index information of data not completely written during the upgrading process;
step S4021, when a request to read data is received, reading a data copy from any storage node that is not being upgraded and returning the data copy to the requester;
step S4022, when a request to write data is received, copying the newly written data into K data copies, writing the data copies into the corresponding storage nodes that are not being upgraded, and recording the index information of the newly written data if at least one storage node corresponding to a data copy is being upgraded;
and S403, performing data recovery on the upgraded storage node according to the related information.
In step S401, if the protection policy of the storage pool is a multi-copy policy with K copies, where K is greater than 1, the maximum number of failed nodes supported by the storage pool is determined based on the minimum number of data copies that must be online simultaneously, i.e. the number of simultaneously online copies required for read-write operations to proceed and for the access service to run normally. For example, if the multi-copy policy of the storage pool allows reads and writes as long as more than half of the copies are online, then when K is even the storage pool supports K/2 - 1 storage nodes failing simultaneously, and when K is odd it supports (K+1)/2 - 1 storage nodes failing simultaneously. If the multi-copy policy allows reads and writes as long as 1 copy is online, K - 1 storage nodes may fail. In particular, to ensure system availability, at least 2 storage nodes may be kept online even when a single online copy suffices for reads and writes, in which case K - 2 storage nodes may fail.
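The copy-counting rules above can be sketched as small helpers; the function names are illustrative assumptions, not from the patent:

```python
def majority_min_online(k: int) -> int:
    """'More than half of the K copies must be online' as a copy count."""
    return k // 2 + 1

def max_failed_nodes_multicopy(k: int, min_online: int) -> int:
    """Maximum nodes that may be offline at once under a K-copy policy,
    given the minimum number of copies that must stay online."""
    if not 1 <= min_online <= k:
        raise ValueError("min_online must be between 1 and K")
    return k - min_online

# K = 4, majority rule: 4 - 3 = 1 failed node   (K/2 - 1)
# K = 5, majority rule: 5 - 3 = 2 failed nodes  ((K+1)/2 - 1)
# one copy suffices:            K - 1 failed nodes
# keep 2 online for safety:     K - 2 failed nodes
```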
In step S402, besides upgrading the storage nodes in batches and recording relevant information during the upgrading process, the storage system receives and processes data read-write requests normally, as shown in steps S4021 and S4022.
In step S4021, if the data protection policy of the storage pool is a multi-copy policy with K copies and the maximum number of failed nodes supported by the storage pool is Y, then at most Y storage nodes are being upgraded at any time and at least one storage node is not being upgraded. When a request to read data is received, a data copy is read from any storage node that is not being upgraded and returned to the requester; the copy returned to the requester should be the latest data copy.
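The read path of step S4021, including the latest-copy preference, can be sketched with version-tagged in-memory replicas; all names are illustrative assumptions:

```python
def read_latest_copy(obj_key, nodes, upgrading, node_store):
    """Read the object from any node not being upgraded, preferring the
    copy with the highest version number so a stale replica is not served.
    node_store maps node -> {obj_key: (version, data)}."""
    best = None
    for node in nodes:
        if node in upgrading:
            continue                      # node mid-upgrade: do not read it
        entry = node_store.get(node, {}).get(obj_key)
        if entry is not None and (best is None or entry[0] > best[0]):
            best = entry
    if best is None:
        raise RuntimeError("no online replica holds the object")
    return best[1]
```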
In step S4022, when a request to write data is received, if the data protection policy of the storage pool is a multi-copy policy, the newly written data is copied into K data copies, which are written into the corresponding storage nodes that are not being upgraded, the storage node into which each data copy is to be written being determined by the storage system. If a data copy is assigned to a storage node that is being upgraded, writing that copy is considered to have failed and the copy is lost; that is, the newly written data is incompletely written. The index information of the newly written data therefore needs to be recorded, so that after the upgrade is completed the missing copy can be recovered and the data can be stored completely in the storage system.
In step S403, during data recovery, if the data protection policy of the storage pool is a multi-copy policy, the data missing from the storage node to be recovered can be read from the corresponding data copy on any other storage node and written into the storage node to be recovered.
Corresponding to the embodiment of the method, the application also provides a device for online upgrading of the storage node in the distributed storage system. As shown in fig. 5, fig. 5 is a schematic structural diagram of an apparatus for online upgrading a storage node in a distributed storage system according to an exemplary embodiment of the present application, where the apparatus includes:
a number calculation unit 510 for determining a maximum number of failed storage nodes supported by a storage pool, the storage pool including at least one storage node;
a batch upgrading unit 520, configured to upgrade the storage nodes in batches and record relevant information in the upgrading process, where the number of storage nodes upgraded in each batch does not exceed the maximum number of failed nodes, and the relevant information includes the associated information of the storage nodes being upgraded in batches and the index information of data not completely written during the upgrading process;
a data recovery unit 530, configured to perform data recovery on the upgraded storage node according to the relevant information.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
In an exemplary embodiment, the present application further provides a distributed storage system including at least one storage node and a management node. The management node controls operations such as the online upgrade and maintenance of the storage nodes in the distributed storage system; the method used by the system for online upgrading of storage nodes is any of the methods for online upgrading of storage nodes in a distributed storage system proposed in this specification, that is, the management node is configured to implement the steps of any such method. The storage nodes respond to the read-write instructions of the management node, that is, data is stored, read, and written on the storage nodes; the storage nodes are also the objects on which the online upgrading method is performed.
In some embodiments, the management node may be a server node distinct from the storage nodes, or it may be the same server node as a storage node. In some embodiments, only some of the storage nodes may also serve as management nodes while the others do not; alternatively, all storage nodes may serve as management nodes, or none of them may. This is not limited in the present application.
In an exemplary embodiment, the present application further provides a non-transitory computer-readable storage medium comprising instructions, which when executed by a processor on a computer device or a control terminal of a distributed storage system, enable the computer device or the control terminal of the distributed storage system to perform any one of the methods for online upgrade of storage nodes in the distributed storage system proposed in the present application. The non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (10)

1. A method for online upgrading of storage nodes in a distributed storage system is characterized by comprising the following steps:
determining a maximum number of failed storage nodes supported by a storage pool, the storage pool including at least one storage node;
upgrading the storage nodes in batches and recording relevant information in the upgrading process, wherein the number of storage nodes upgraded in each batch does not exceed the maximum number of failed nodes, and the relevant information comprises the associated information of the storage nodes being upgraded in batches and the index information of data not completely written during the upgrading process;
and performing data recovery on the upgraded storage node according to the related information.
2. The method of claim 1, further comprising:
if storage nodes belong to at least two storage pools simultaneously, upgrading those storage nodes first and the other storage nodes afterwards, wherein the number of storage nodes belonging to at least two storage pools that are upgraded in each batch is the minimum of the maximum numbers of failed nodes corresponding to all the storage pools to which those storage nodes belong.
3. The method of claim 1, wherein data recovery for the upgraded storage nodes is performed either after the storage nodes of the current batch have been upgraded or after all the storage nodes in the storage pool have been upgraded.
4. The method of claim 1, wherein the storage nodes of the storage pool store data files based on an EC policy that data files are split into N data blocks and M parity blocks are computed, and wherein the data blocks and the parity blocks are stored in a plurality of storage nodes, and wherein the maximum number of failed nodes is determined based on the number of parity blocks.
5. The method of claim 4, further comprising:
when a data reading request is received, reading the data blocks or the check blocks from the N storage nodes which are not in upgrading to form a data file, and returning the data file to a requesting party;
when a data writing request is received, splitting newly written data into N data blocks, calculating M check blocks, respectively writing the data blocks and the check blocks into corresponding storage nodes which are not in upgrading, and recording index information of the newly written data if at least one of the data blocks or the storage nodes corresponding to the check blocks is in upgrading.
6. The method of claim 1, wherein the storage nodes of the storage pool store data files based on a multi-copy policy, the multi-copy policy being to copy data files into K data copies and store each of the data copies in a storage node, the maximum number of failed nodes being determined based on the number of data copies that are at least simultaneously online.
7. The method of claim 6, further comprising:
when a data reading request is received, reading the data copy from any storage node which is not in upgrading, and returning the data copy to a requester;
when a data writing request is received, copying newly written data into K data copies, respectively writing the data copies into corresponding storage nodes which are not in upgrading, and recording index information of the newly written data if at least one storage node corresponding to the data copy is in upgrading.
8. An online upgrade device for storage nodes in a distributed storage system, the device comprising:
a number calculation unit for determining a maximum number of failed storage nodes supported by a storage pool, the storage pool including at least one storage node;
the batch upgrading unit is used for carrying out batch upgrading on the storage nodes and recording related information in the upgrading process, wherein the number of the storage nodes of each batch of upgrading does not exceed the maximum fault node number, and the related information comprises the related information of the storage nodes carrying out batch upgrading and the index information of data which is not completely written in the upgrading process;
and the data recovery unit is used for recovering the data of the upgraded storage node according to the related information.
9. A distributed storage system, the system comprising:
at least one storage node; and a management node;
the management node is configured to implement the steps of the method according to any one of claims 1 to 7;
and the storage node is used for responding to the read-write instruction of the management node.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202111395328.4A 2021-11-23 2021-11-23 Storage node online upgrading method, device, system and storage medium Pending CN114138192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111395328.4A CN114138192A (en) 2021-11-23 2021-11-23 Storage node online upgrading method, device, system and storage medium


Publications (1)

Publication Number Publication Date
CN114138192A true CN114138192A (en) 2022-03-04

Family

ID=80391458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111395328.4A Pending CN114138192A (en) 2021-11-23 2021-11-23 Storage node online upgrading method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN114138192A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115373896A (en) * 2022-06-23 2022-11-22 北京志凌海纳科技有限公司 Replica data recovery method and system based on distributed block storage

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060130042A1 (en) * 2004-12-15 2006-06-15 Dias Daniel M Method and apparatus for dynamic application upgrade in cluster and grid systems for supporting service level agreements
CN104503781A (en) * 2014-12-10 2015-04-08 华为技术有限公司 Firmware upgrading method for hard disk and storage system
US20170075761A1 (en) * 2014-12-09 2017-03-16 Hitachi Data Systems Corporation A system and method for providing thin-provisioned block storage with multiple data protection classes
US9798534B1 (en) * 2015-07-01 2017-10-24 EMC IP Holding Company LLC Method and system to perform non-intrusive online disk firmware upgrades
US20180060061A1 (en) * 2016-08-26 2018-03-01 Nicira, Inc. Method and system for tracking progress and providing fault tolerance in automated upgrade of a network virtualization platform
CN107943510A (en) * 2017-11-23 2018-04-20 郑州云海信息技术有限公司 Distributed memory system upgrade method, system, device and readable storage medium storing program for executing
CN109525410A (en) * 2017-09-20 2019-03-26 华为技术有限公司 The method, apparatus and distributed memory system of distributed memory system updating and management
CN113138880A (en) * 2021-04-09 2021-07-20 浙商银行股份有限公司 Block chain system gray level release method, device, equipment and storage medium



Similar Documents

Publication Publication Date Title
US8060468B2 (en) Storage system and data recovery method
US7509544B2 (en) Data repair and synchronization method of dual flash read only memory
JP2005301497A (en) Storage management system, restoration method and its program
CN110058787B (en) Method, apparatus and computer program product for writing data
CN110058965B (en) Data reconstruction method and device in storage system
US8019953B2 (en) Method for providing atomicity for host write input/outputs (I/Os) in a continuous data protection (CDP)-enabled volume using intent log
CN111400267A (en) Method and device for recording log
CN111125040A (en) Method, apparatus and storage medium for managing redo log
CN107092598A (en) The management method and device of data storage location information
US9436554B2 (en) Information processing apparatus and data repairing method
CN114138192A (en) Storage node online upgrading method, device, system and storage medium
CN105892954A (en) Data storage method and device based on multiple copies
CN112486942A (en) Multi-copy storage method and multi-copy storage system for file data
JP5719083B2 (en) Database apparatus, program, and data processing method
CN110287164B (en) Data recovery method and device and computer equipment
CN112000623A (en) Metadata access method and device and computer readable storage medium
US10452496B2 (en) System and method for managing storage transaction requests
CN113703673B (en) Single machine data storage method and related device
US20130110789A1 (en) Method of, and apparatus for, recovering data on a storage system
CN112346913A (en) Data recovery method, device, equipment and storage medium
CN114064346A (en) Erasure code data consistency guaranteeing method and system
CN108984343B (en) Virtual machine backup and storage management method based on content analysis
CN113625950A (en) Method, system, equipment and medium for initializing redundant array of independent disks
CN111400098A (en) Copy management method and device, electronic equipment and storage medium
US11256583B2 (en) Efficient handling of RAID-F component repair failures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination