WO2024099444A1 - Storage cluster upgrade control method and apparatus, device, and storage medium - Google Patents

Storage cluster upgrade control method and apparatus, device, and storage medium

Info

Publication number
WO2024099444A1
Authority
WO
WIPO (PCT)
Prior art keywords
weight
storage
upgrade
storage node
node
Prior art date
Application number
PCT/CN2023/131087
Other languages
French (fr)
Chinese (zh)
Inventor
韩宾
Original Assignee
苏州元脑智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司
Publication of WO2024099444A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • G06F8/65Updates
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • the present application relates to the technical field of data storage, and in particular to a storage cluster upgrade control method, device, equipment and non-volatile readable storage medium.
  • the purpose of this application is to provide a storage cluster upgrade control method, device, equipment and non-volatile readable storage medium, which can realize efficient concurrent online upgrade of large-scale clusters while ensuring that the storage cluster business system is not affected.
  • the specific scheme is as follows:
  • a first aspect of the present application provides a storage cluster upgrade control method, comprising:
  • weight classes include all storage services deployed in the storage cluster and the underlying fault domain of the storage cluster; the weights represent the number of nodes that the corresponding weight class allows to be upgraded concurrently;
  • the online concurrent upgrade process of each storage node in the storage cluster is controlled.
  • determining a target weight class corresponding to each storage node in the storage cluster and a target weight value corresponding to the target weight class includes:
  • Each storage service deployed on each storage node in the storage cluster and the underlying fault domain to which the storage node belongs are determined as the target weight class corresponding to the storage node, and the target weight value corresponding to the target weight class is determined based on all weight classes and the weight values corresponding to the weight classes.
  • the method further includes:
  • weights corresponding to the other weight classes are added to the weight matrix.
  • the online concurrent upgrade process of each storage node in the storage cluster is controlled based on the weight matrix of each storage node, including:
  • the storage nodes corresponding to the weight matrix of the target weight class without duplication are divided into concurrent upgrade node groups, and each storage node in the concurrent upgrade node group is controlled to perform online concurrent upgrade; wherein the concurrent upgrade node group contains at least one storage node;
  • the remaining storage nodes in the storage cluster that are not divided into the concurrent upgrade node group are divided into the remaining storage node group, and the target weight corresponding to the target weight class of each storage node divided into the concurrent upgrade node group is reduced by 1 in the weight matrices of the remaining storage node group, to obtain an updated weight matrix for each storage node;
  • the online concurrent upgrade process of each storage node in the storage cluster is controlled.
  • the storage nodes corresponding to the weight matrices of the target weight classes that do not have duplication are divided into concurrent upgrade node groups, including:
  • the weight matrices of all storage nodes are processed using the bubbling principle to determine the storage nodes corresponding to the weight matrices of the target weight classes that do not have duplicates, and divide them into concurrent upgrade node groups.
  • the weight matrices of all storage nodes are processed using the bubbling principle to determine the storage nodes corresponding to the weight matrices of the target weight class that do not have duplication, including:
  • the weight matrix including nodes with completely different weight types is selected from the weight matrix table, wherein the weight matrix table includes the weight matrices of all nodes in the storage cluster, and the completely different weight types of the nodes are used to represent the isolation between the nodes;
  • Storage nodes with completely different weight categories are screened out from the weight matrix including nodes with completely different weight categories.
  • the online concurrent upgrade process of each storage node in the storage cluster is controlled based on the updated weight matrix of each storage node, including:
  • Subtract 1 from the target weight corresponding to the target weight class of the storage node that is currently assigned to the concurrent upgrade node group in the weight matrix of the remaining storage node group after the elimination, to obtain an updated weight matrix of each storage node;
  • the storage node with the least number of weight classes is selected from the remaining storage node group after elimination, including:
  • one storage node is randomly selected from the multiple storage nodes as the storage node with the least number of weighted classes.
  • after obtaining the latest remaining storage node group, the method further includes:
  • the method further includes:
  • the screened nodes are upgraded concurrently, and the weights of the corresponding weight classes in the remaining non-upgraded nodes are reduced by 1 according to the weight classes of the upgraded nodes.
  • the weight corresponding to each weight class is greater than or equal to 1.
  • the storage cluster upgrade control method further includes:
  • the real-time business pressure of the storage cluster is obtained, and the weight corresponding to the weight class is adjusted according to the real-time business pressure.
  • the weights corresponding to the weight classes are adjusted according to real-time business pressure, including:
  • the weight corresponding to the weight class is increased to reduce the number of concurrent upgrades;
  • the weight corresponding to the weight class is lowered to increase the number of concurrent upgrades.
  • the storage cluster upgrade control method further includes:
  • the upgrade process controller is used to trigger the storage cluster upgrade to be in a paused state and archive the current upgrade progress, or to trigger the storage cluster to continue the upgrade.
  • using the upgrade process controller to trigger the storage cluster upgrade to be in a paused state and archive the current upgrade progress includes:
  • when receiving an instruction from a user to interrupt and exit the upgrade process before the upgrade is completed, the upgrade process controller responds to the instruction and triggers the cluster to suspend the upgrade process.
  • triggering the storage cluster to continue the upgrade includes:
  • the storage cluster can be restored to continue the upgrade process through the continue function of the upgrade process controller.
  • the storage cluster is a cluster under a distributed storage system and has an underlying fault domain structure based on a scalable pseudo-random data distribution algorithm structure.
  • a second aspect of the present application provides a storage cluster upgrade control device, including:
  • the weight class and weight value determination module is configured to determine all weight classes in the storage cluster and weight values corresponding to the weight classes; wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domain of the storage cluster; the weight values represent the number of nodes that the corresponding weight class allows to be upgraded concurrently;
  • a weight matrix generation module is configured to determine a target weight class corresponding to each storage node in the storage cluster and a target weight corresponding to the target weight class to generate a weight matrix for each storage node;
  • the upgrade module is configured to control the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node.
  • a third aspect of the present application provides an electronic device, the electronic device comprising a processor and a memory; wherein the memory is configured to store a computer program, and the computer program is loaded and executed by the processor to implement the aforementioned storage cluster upgrade control method.
  • a fourth aspect of the present application provides a computer non-volatile readable storage medium, in which computer-executable instructions are stored.
  • when the computer-executable instructions are loaded and executed by a processor, the aforementioned storage cluster upgrade control method is implemented.
  • all weight classes and weights corresponding to the weight classes in the storage cluster are first determined; wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster; the weights represent the number of nodes that the corresponding weight classes allow for concurrent upgrades; then the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to the target weight classes are determined to generate a weight matrix for each storage node; finally, based on the weight matrix of each storage node, the online concurrent upgrade process of each storage node in the storage cluster is controlled.
  • this application sets a weight matrix for the storage nodes in the storage cluster, and controls the upgrade process of the storage nodes by controlling the weight classes and weights in the weight matrix, thereby achieving efficient concurrent online upgrades of large-scale clusters while ensuring that the storage cluster business system is not affected.
  • FIG1 is a flow chart of a storage cluster upgrade control method provided by the present application.
  • FIG2 is a flow chart of an optional storage cluster upgrade control method provided by the present application.
  • FIG3 is a schematic diagram of an optional storage cluster upgrade control method provided by the present application.
  • FIG4 is a schematic diagram of the structure of a storage cluster upgrade control device provided by the present application.
  • FIG5 is a structural diagram of a storage cluster upgrade control electronic device provided in the present application.
  • existing storage systems are upgraded online serially, that is, the next node is upgraded only after the previous node has been upgraded, or concurrent upgrades are achieved only in some scenarios.
  • the existing online upgrade solutions have many restrictions and are not universal.
  • the present application provides a storage cluster upgrade control solution, which sets a weight matrix for the storage nodes in the storage cluster, and controls the upgrade process of the storage nodes by controlling the weight classes and weights in the weight matrix, thereby achieving efficient concurrent online upgrades of large-scale clusters while ensuring that the storage cluster business system is not affected.
  • FIG1 is a flow chart of a storage cluster upgrade control method provided by an embodiment of the present application.
  • the storage cluster upgrade control method includes:
  • S11 Determine all weight classes in the storage cluster and weights corresponding to the weight classes; wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domain of the storage cluster; and the weights represent the number of nodes that the corresponding weight classes are allowed to upgrade concurrently.
  • the weight classes are the various services deployed on the nodes and the underlying fault domain structure to which the nodes belong.
  • One service or one fault domain structure is a weight class, and a storage node can contain multiple weight classes.
  • the weight is the number of nodes that each weight class allows to be upgraded concurrently.
  • the weight class determines whether the node can be upgraded, and the weight determines how many nodes can be upgraded concurrently at the same time.
  • the storage cluster of this embodiment is a cluster under a distributed storage system and has an underlying fault domain structure based on a scalable pseudo-random data distribution algorithm (crush root structure).
  • the online upgrade of this embodiment has two basic rules, which are formulated by the upgrade process controller.
  • Basic rule 1 The weights of all weight classes under the node must be greater than 0;
  • Basic rule 2 The number of nodes with the same weight class that are upgraded concurrently cannot exceed their basic weight.
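  • As a minimal sketch of how the two basic rules might be checked, the Python below assumes each node's weight classes are held in a list and the per-class weights in dictionaries. The names `can_upgrade`, `within_base_weight`, `remaining`, and `base` are illustrative and do not come from the application.

```python
def can_upgrade(node_classes, remaining):
    """Basic rule 1: every weight class on the node must have weight > 0."""
    return all(remaining[c] > 0 for c in node_classes)


def within_base_weight(upgrading, base):
    """Basic rule 2: per weight class, the number of nodes upgrading
    concurrently must not exceed that class's base weight."""
    counts = {}
    for classes in upgrading.values():
        for c in classes:
            counts[c] = counts.get(c, 0) + 1
    return all(n <= base[c] for c, n in counts.items())


# Illustrative values only (hypothetical, not taken from the application).
base = {"f1": 2, "d1": 2}
remaining = {"f1": 1, "d1": 0}

rule1_f1_only = can_upgrade(["f1"], remaining)       # f1 still has budget
rule1_f1_d1 = can_upgrade(["f1", "d1"], remaining)   # d1 is exhausted
rule2_two_nodes = within_base_weight(
    {"n1": ["f1", "d1"], "n2": ["f1", "d1"]}, base)  # 2 nodes per class, limit 2
```

  • Keeping the two checks separate mirrors the text: rule 1 is evaluated per candidate node, while rule 2 is a property of the whole set of nodes currently upgrading.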
  • the process controller upgrade rules are based on the storage underlying fault domain structure (crush root structure), so the impact of the upper layer protocol on the upgrade rules can be effectively avoided, thereby realizing concurrent online upgrades in all scenarios.
  • S12 Determine the target weight class corresponding to each storage node in the storage cluster and the target weight value corresponding to the target weight class to generate a weight matrix for each storage node.
  • each storage service deployed on each storage node in the storage cluster and the underlying fault domain to which the storage node belongs are determined as the target weight class corresponding to the storage node, and the target weight value corresponding to the target weight class is determined based on all weight classes and the weight values corresponding to the weight classes.
  • the node weight matrix is obtained. Whether a node can be upgraded depends on the node weight matrix.
  • the node weight matrix consists of two dimensions: one is the weight class and the other is the weight.
  • the process controller based on the node weight matrix is highly scalable: when other weight classes need to be added, only the weights of those weight classes need to be added to the weight matrix. It also supports high concurrency, keeping the storage cluster in the best concurrent upgrade state in real time.
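  • One way to picture the per-node weight matrix is as a mapping from weight class to weight. The sketch below assumes that representation; `build_weight_matrix`, `class_weights`, and the extra class `f4` are hypothetical names used only for illustration.

```python
def build_weight_matrix(services, fault_domain, class_weights):
    """Return {weight class: weight} for one node.

    A node's weight classes are the services deployed on it plus the
    fault domain it belongs to (if any).
    """
    classes = list(services)
    if fault_domain is not None:
        classes.append(fault_domain)
    return {c: class_weights[c] for c in classes}


# Cluster-wide weights: how many nodes of each class may upgrade at once.
class_weights = {"f1": 2, "f2": 2, "f3": 2, "d1": 2, "d2": 2}

n1 = build_weight_matrix(["f1"], "d1", class_weights)  # service + fault domain
n9 = build_weight_matrix(["f3"], None, class_weights)  # service only
n10 = build_weight_matrix([], "d2", class_weights)     # fault domain only

# Adding a new weight class only requires one more weight entry
# ("f4" is a made-up class for this example).
class_weights["f4"] = 1
n11 = build_weight_matrix(["f4"], "d1", class_weights)
```

  • Because the matrix is derived from a single table of class weights, supporting a new service touches only that table, which is the scalability property described above.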
  • the online concurrent upgrade process of each storage node in the storage cluster is controlled based on the weight matrix of each storage node.
  • the process controller based on the node weight matrix in this embodiment has high agility.
  • when the cluster business or a service is under high pressure, the controller can quickly respond and adjust the number of concurrently upgraded nodes to keep cluster business running normally.
  • to do so, first obtain the real-time business pressure of the storage cluster and adjust the weight corresponding to the weight class according to the real-time business pressure.
  • if the real-time business pressure is higher than the preset pressure value, the weight corresponding to the weight class is increased to reduce the number of concurrent upgrades; if the real-time business pressure is lower than the preset pressure value, the weight corresponding to the weight class is lowered to increase the number of concurrent upgrades.
  • Automatic adjustment of the weight class weights during the online upgrade process can be achieved.
  • when the upgrade process controller detects that the storage cluster business pressure is high or the pressure of a certain service is high, it can actively adjust the weight of the node weight class. Adjusting the weight directly affects the number of concurrent upgrades of the nodes of this weight class, thereby ensuring that more nodes can handle cluster business.
  • This embodiment can also use the upgrade process controller to trigger the storage cluster upgrade to be in a suspended state and archive the current upgrade progress, or trigger the storage cluster to continue the upgrade.
  • the upgrade process controller has the function of pausing and continuing the upgrade. Before the upgrade is completed, the user can interrupt and exit the upgrade process at any time and can actively trigger the cluster to suspend the upgrade process. After the upgrade is suspended, the cluster can be restored through the continue function to continue the upgrade.
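  • The pause, archive, and continue behavior can be sketched as a small state object. The application does not describe the archive format or the controller interface, so everything below (the `UpgradeController` class, its method names, and the JSON archive) is an assumption made for illustration.

```python
import json


class UpgradeController:
    """Hypothetical controller that tracks upgrade progress."""

    def __init__(self, nodes):
        self.pending = list(nodes)   # nodes not yet upgraded
        self.done = []               # nodes already upgraded
        self.paused = False

    def upgrade_next(self):
        """Upgrade one node unless the upgrade is paused or finished."""
        if self.paused or not self.pending:
            return None
        node = self.pending.pop(0)
        self.done.append(node)
        return node

    def pause(self):
        """Enter the paused state and return an archive of the progress."""
        self.paused = True
        return json.dumps({"pending": self.pending, "done": self.done})

    @classmethod
    def resume(cls, archive):
        """Rebuild a controller from an archive and continue the upgrade."""
        state = json.loads(archive)
        controller = cls(state["pending"])
        controller.done = state["done"]
        return controller


ctl = UpgradeController(["n1", "n2", "n3"])
first = ctl.upgrade_next()          # upgrades n1
archive = ctl.pause()               # user interrupts before completion
blocked = ctl.upgrade_next()        # nothing happens while paused
ctl2 = UpgradeController.resume(archive)
second = ctl2.upgrade_next()        # continues with n2
```

  • Serializing the progress means the controller process itself can exit after a pause and a fresh one can pick up where it left off.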
  • the embodiment of the present application first determines all weight classes in the storage cluster and the weights corresponding to the weight classes; wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster; the weights represent the number of nodes that the corresponding weight class allows for concurrent upgrades; then the target weight class corresponding to each storage node in the storage cluster and the target weight corresponding to the target weight class are determined to generate a weight matrix for each storage node; finally, based on the weight matrix of each storage node, the online concurrent upgrade process of each storage node in the storage cluster is controlled.
  • the embodiment of the present application sets a weight matrix for the storage nodes in the storage cluster, and controls the upgrade process of the storage nodes by controlling the weight classes and weights in the weight matrix, thereby achieving efficient concurrent online upgrades of large-scale clusters while ensuring that the storage cluster business system is not affected.
  • FIG2 is a flow chart of an optional storage cluster upgrade control method provided in an embodiment of the present application.
  • the storage cluster upgrade control method includes:
  • S21 Determine all weight classes and weights corresponding to the weight classes in the storage cluster, and determine the target weight class corresponding to each storage node in the storage cluster and the target weight corresponding to the target weight class to generate a weight matrix for each storage node.
  • S22 Divide the storage nodes corresponding to the weight matrix of the target weight class without duplication into a concurrent upgrade node group, and control each storage node in the concurrent upgrade node group to perform online concurrent upgrade.
  • the storage nodes corresponding to the weight matrices of the target weight classes that do not have duplicates are divided into concurrent upgrade node groups.
  • the weight matrices of all storage nodes can be processed using the bubbling principle to determine the storage nodes corresponding to the weight matrices of the target weight classes that do not have duplicates, and divide them into concurrent upgrade node groups. That is, nodes with completely different weight classes are screened out. Completely different node weight classes indicate that the nodes are isolated from each other and can be upgraded concurrently. Then, each storage node in the concurrent upgrade node group is controlled to perform online concurrent upgrades. Among them, the concurrent upgrade node group contains at least one storage node.
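  • The text does not spell out the bubbling procedure, so the sketch below substitutes a simple single pass: walk the weight matrix table in order and keep a node only if it shares no weight class with the nodes already kept. The function name `pick_isolated` is hypothetical; the ten-node table is the one from the worked example in this document.

```python
def pick_isolated(table):
    """Return nodes whose weight classes are pairwise disjoint (first-fit)."""
    chosen, used = [], set()
    for node, matrix in table.items():
        classes = set(matrix)
        if classes.isdisjoint(used):
            chosen.append(node)
            used |= classes
    return chosen


table = {
    "n1": {"f1": 2, "d1": 2}, "n2": {"f1": 2, "d1": 2},
    "n3": {"f1": 2, "d1": 2}, "n4": {"f2": 2, "d2": 2},
    "n5": {"f2": 2, "d2": 2}, "n6": {"f2": 2, "d2": 2},
    "n7": {"f3": 2, "d1": 2}, "n8": {"f3": 2, "d2": 2},
    "n9": {"f3": 2}, "n10": {"d2": 2},
}

group = pick_isolated(table)
print(group)  # ['n1', 'n4', 'n9']
```

  • A first-fit pass depends on table order and is not guaranteed to find the largest possible group; it is used here only because it is easy to follow.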
  • the remaining storage nodes in the storage cluster that are not divided into the concurrent upgrade node group are divided into the remaining storage node group, and then the target weight corresponding to the target weight class of each storage node divided into the concurrent upgrade node group is reduced by 1 in the weight matrices of the remaining storage node group, to obtain an updated weight matrix for each storage node.
  • the selected nodes are upgraded concurrently, and the weights of the corresponding weight classes of the remaining non-upgraded nodes are adjusted according to the weight classes of the upgraded nodes, and the weights of the corresponding weight classes are reduced by 1.
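  • The weight reduction can be sketched as follows, assuming each remaining node's weight matrix is a dictionary from weight class to weight. `decrement_for` and the sample values are illustrative names and data, not part of the application.

```python
def decrement_for(selected_classes, remaining_table):
    """Reduce by 1, in every remaining node, the weight of each weight
    class that the newly selected node carries."""
    for matrix in remaining_table.values():
        for c in selected_classes:
            if c in matrix:
                matrix[c] -= 1


# A node carrying classes f1 and d1 has just been selected for upgrade.
remaining_table = {
    "n2": {"f1": 2, "d1": 2},   # shares both classes
    "n7": {"f3": 2, "d1": 2},   # shares only d1
    "n5": {"f2": 2, "d2": 2},   # shares nothing
}
decrement_for(["f1", "d1"], remaining_table)
```

  • Only classes shared with the selected node change, so nodes in unrelated services and fault domains keep their full budget.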
  • S25 Filter out the storage node with the least number of weight classes from the remaining storage node group after elimination, and divide the storage node into the concurrent upgrade node group.
  • the storage nodes corresponding to the weight matrices with target weights of 0 in the remaining storage node group are eliminated to obtain the remaining storage node group after elimination. That is, the nodes whose weights are all not zero among the remaining nodes are screened, wherein the fact that all weights are not zero indicates that they can be upgraded.
  • Each target weight in the weight matrix of the storage nodes in the remaining storage node group after elimination is not 0.
  • the storage node with the least number of weight classes is screened out from the remaining storage node group after elimination, and the storage node is divided into the concurrent upgrade node group. In particular, if there are multiple storage nodes with the least number of weight classes in the remaining storage node group after elimination, one is randomly selected from the multiple storage nodes as the storage node with the least number of weight classes.
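  • The elimination and selection steps might look like the sketch below. It assumes dictionary weight matrices; `eligible` and `pick_fewest_classes` are hypothetical helper names, and the sample weights are illustrative.

```python
import random


def eligible(table):
    """Eliminate nodes that have any weight equal to 0."""
    return {n: m for n, m in table.items() if all(w > 0 for w in m.values())}


def pick_fewest_classes(table, rng=random):
    """Among eligible nodes, pick one with the fewest weight classes;
    ties are broken at random."""
    candidates = eligible(table)
    if not candidates:
        return None
    fewest = min(len(m) for m in candidates.values())
    return rng.choice([n for n, m in candidates.items() if len(m) == fewest])


table = {
    "n2": {"f1": 1, "d1": 1},
    "n5": {"f2": 1, "d2": 1},
    "n7": {"f3": 1, "d1": 1},
    "n8": {"f3": 1, "d2": 1},
    "n10": {"d2": 1},
}
picked = pick_fewest_classes(table)   # n10 is the only node with one class

# Once d2 is exhausted, every node carrying d2 is eliminated.
exhausted = {
    "n2": {"f1": 1, "d1": 1},
    "n5": {"f2": 1, "d2": 0},
    "n7": {"f3": 1, "d1": 1},
    "n8": {"f3": 1, "d2": 0},
}
still_eligible = sorted(eligible(exhausted))
```

  • Preferring the node with the fewest weight classes consumes the least concurrency budget per selection, which tends to leave room for more nodes in the same round.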
  • the target weight corresponding to the target weight class of the storage node divided into the concurrent upgrade node group in the weight matrix of the remaining storage node group after elimination is reduced by 1, and the updated weight matrix of each storage node is obtained. Then, the node elimination step, the storage node selection according to the number of weight classes, and the weight reduction step are repeated until the latest weight matrix of each storage node in the remaining storage node group after elimination has a target weight of 0, and the latest remaining storage node group is obtained.
  • the node elimination step, the storage node screening according to the number of weight classes and the weight reduction step are repeatedly executed until there is zero in the weights of the remaining nodes.
  • a weight value of 0 indicates that the number of concurrent upgrade nodes of the weight class has reached the threshold, and the nodes of the weight class are not allowed to be upgraded before the upgrade of the nodes of the weight class is completed.
  • S28 Determine whether the upgrade of each storage node in the concurrent upgrade node group is completed. If the upgrade of each storage node in the concurrent upgrade node group is completed, restore the weight matrix of the upgraded storage node, and add 1 to the target weight corresponding to the target weight class of the restored storage node in the weight matrix of the latest remaining storage node group to obtain the final remaining storage node group.
  • It is determined whether each storage node in the concurrent upgrade node group has completed its upgrade. If a storage node in the concurrent upgrade node group has completed its upgrade, the weight matrix of the upgraded storage node is restored, and the target weight corresponding to the target weight class of the restored storage node is increased by 1 in the weight matrices of the latest remaining storage node group, to obtain the final remaining storage node group. That is, after a node that was being upgraded completes its upgrade, the weights of the corresponding weight classes in the remaining non-upgraded nodes are refreshed according to the weight classes of the upgraded node: each such weight is increased by 1.
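  • The recovery step is the mirror image of the weight reduction. In the sketch below, `restore_after_upgrade` is a hypothetical name, and capping each weight at its base value is an added safety assumption that the text does not state.

```python
def restore_after_upgrade(finished_classes, remaining_table, base):
    """Add 1 back, in every remaining node, to the weight of each weight
    class carried by a node that has finished upgrading."""
    for matrix in remaining_table.values():
        for c in finished_classes:
            if c in matrix:
                # Assumption: never exceed the class's base weight.
                matrix[c] = min(matrix[c] + 1, base[c])


base = {"f1": 2, "f2": 2, "d1": 2, "d2": 2}
remaining_table = {
    "n3": {"f1": 0, "d1": 0},   # blocked while two f1/d1 nodes upgrade
    "n5": {"f2": 1, "d2": 0},   # unrelated to the finished node
}
# A node carrying f1 and d1 completes its upgrade.
restore_after_upgrade(["f1", "d1"], remaining_table, base)
```

  • After the call, n3 has non-zero weights again and becomes eligible in the next screening pass, while n5 stays blocked on d2.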
  • the node elimination steps, storage node screening according to the number of weight classes, weight reduction step, node weight matrix recovery and weight addition step are repeated for the final remaining storage node group until all storage nodes are upgraded.
  • the upgrade control algorithm provided in the embodiment of the present application is described below with an example, and the operation flow of the process controller is shown in FIG3 .
  • Consider a storage cluster consisting of 10 storage nodes, in which three services f1, f2, and f3 are deployed and the underlying fault domains are d1 and d2.
  • the weights of f1, f2, and f3, that is, their concurrency thresholds, are set to 2, 2, and 2, and the weights of d1 and d2, that is, their concurrency thresholds, are set to 2 and 2.
  • Step 1 First, generate a storage cluster node weight matrix table, which contains the weight matrices of all storage nodes.
  • the weight matrix tables of each storage node are constructed as follows: n1: [f1, 2] [d1, 2]; n2: [f1, 2] [d1, 2]; n3: [f1, 2] [d1, 2]; n4: [f2, 2] [d2, 2]; n5: [f2, 2] [d2, 2]; n6: [f2, 2] [d2, 2]; n7: [f3, 2] [d1, 2]; n8: [f3, 2] [d2, 2]; n9: [f3, 2]; n10: [d2, 2].
  • Step 2 Use the bubbling principle to filter nodes, and first filter out nodes with completely different weight classes.
  • Completely different node weight classes indicate that the nodes are isolated from each other and can be upgraded concurrently.
  • Step 3 Upgrade the selected nodes concurrently, and adjust the weights of the corresponding weight classes in the remaining non-upgraded nodes according to the weight classes of the upgraded nodes, and reduce the weights of the corresponding weight classes by 1.
  • With n1, n4, and n9 selected, the weights of the remaining nodes are reduced by 1; specifically, the weights of the five weight classes f1, d1, f2, d2, and f3 are each reduced by 1.
  • the updated weight matrix is: n2: [f1, 1][d1, 1]; n3: [f1, 1][d1, 1]; n5: [f2, 1][d2, 1]; n6: [f2, 1][d2, 1]; n7: [f3, 1][d1, 1]; n8: [f3, 1][d2, 1]; n10: [d2, 1].
  • Step 4 Filter the remaining nodes whose weights are all non-zero (if the weights are all non-zero, the node can be upgraded).
  • Nodes whose weights are all non-zero: n2: [f1, 1] [d1, 1]; n3: [f1, 1] [d1, 1]; n5: [f2, 1] [d2, 1]; n6: [f2, 1] [d2, 1]; n7: [f3, 1] [d1, 1]; n8: [f3, 1] [d2, 1]; n10: [d2, 1].
  • Step 5 Select the node with the smallest number of weight classes from the nodes generated in step 4 to upgrade and adjust the weights of the corresponding weight classes in the remaining nodes accordingly, and reduce the weight of the corresponding weight class by 1. If the number of weight classes is the same, one is randomly selected.
  • n10: [d2, 1] has the fewest weight classes and is selected; the corresponding weights of the remaining nodes are reduced by 1, specifically, the weight of the weight class d2 is reduced by 1: n2: [f1, 1][d1, 1]; n3: [f1, 1][d1, 1]; n5: [f2, 1][d2, 0]; n6: [f2, 1][d2, 0]; n7: [f3, 1][d1, 1]; n8: [f3, 1][d2, 0].
  • Step 6 Loop through steps 4-5 until the remaining nodes have zero weights (a weight value of 0 indicates that the number of concurrently upgraded nodes of this weight class has reached the threshold, and upgrading nodes of this weight class is not allowed before the upgrade of nodes of this weight class is completed):
  • Step 4 Nodes whose weights are not 0: n2: [f1, 1] [d1, 1]; n3: [f1, 1] [d1, 1]; n7: [f3, 1] [d1, 1];
  • Step 5 Select the node with the smallest number of weight classes. If the number of weight classes is the same, select randomly: n2: [f1, 1] [d1, 1]; the corresponding weights of the remaining nodes are reduced by 1: n3: [f1, 0] [d1, 0]; n5: [f2, 1] [d2, 0]; n6: [f2, 1] [d2, 0]; n7: [f3, 1] [d1, 0]; n8: [f3, 1] [d2, 0].
  • Step 7 After the node waiting for the upgrade is completed, the weights of the corresponding weight classes of the remaining nodes that have not been upgraded are refreshed according to the weight classes of the nodes that have completed the upgrade, and the weights of the corresponding weight classes are increased by 1.
  • When n1: [f1, 2][d1, 2] completes its upgrade, the corresponding weights of the remaining nodes are increased by 1: n3: [f1, 1][d1, 1]; n5: [f2, 1][d2, 0]; n6: [f2, 1][d2, 0]; n7: [f3, 1][d1, 1]; n8: [f3, 1][d2, 0].
  • Step 8 Repeat steps 4-7 until all nodes have completed the upgrade:
  • Step 4 Nodes whose weights are not 0: n3: [f1, 1] [d1, 1]; n7: [f3, 1] [d1, 1];
  • Step 5 Select the node with the smallest number of weight classes. If the number of weight classes is the same, select randomly: n3: [f1, 1] [d1, 1]; the corresponding weights of the remaining nodes are reduced by 1: n5: [f2, 1] [d2, 0]; n6: [f2, 1] [d2, 0]; n7: [f3, 1] [d1, 0]; n8: [f3, 1] [d2, 0]; wait for the node weights to recover.
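  • The whole loop can be approximated in a few lines of Python. This is a simplified sketch, not the controller described in the application: it assumes that every node in a batch finishes before the next batch starts (so all weights are restored between batches), it tracks one shared budget per weight class instead of a matrix per node, and it uses a first-fit pass in place of the bubbling step. `plan_batches` is a hypothetical name.

```python
import random


def plan_batches(table, base, seed=0):
    """Group nodes into batches that respect the per-class weights."""
    rng = random.Random(seed)
    pending = {n: list(cs) for n, cs in table.items()}
    batches = []
    while pending:
        budget = dict(base)   # concurrency left per weight class
        batch = []

        def start(node):
            for c in pending.pop(node):
                budget[c] -= 1
            batch.append(node)

        if not batches:
            # Step 2: nodes with pairwise disjoint weight classes.
            used, isolated = set(), []
            for node, classes in pending.items():
                if used.isdisjoint(classes):
                    isolated.append(node)
                    used.update(classes)
            for node in isolated:
                start(node)

        while True:
            # Step 4: keep nodes whose weights are all non-zero.
            ok = [n for n, cs in pending.items()
                  if all(budget[c] > 0 for c in cs)]
            if not ok:
                break
            # Step 5: fewest weight classes first, random tie-break.
            fewest = min(len(pending[n]) for n in ok)
            start(rng.choice([n for n in ok if len(pending[n]) == fewest]))

        if not batch:
            raise ValueError("no node can start; every weight must be >= 1")
        batches.append(batch)
    return batches


table = {
    "n1": ["f1", "d1"], "n2": ["f1", "d1"], "n3": ["f1", "d1"],
    "n4": ["f2", "d2"], "n5": ["f2", "d2"], "n6": ["f2", "d2"],
    "n7": ["f3", "d1"], "n8": ["f3", "d2"], "n9": ["f3"], "n10": ["d2"],
}
base = {"f1": 2, "f2": 2, "f3": 2, "d1": 2, "d2": 2}
batches = plan_batches(table, base)
print(batches[0][:4])  # ['n1', 'n4', 'n9', 'n10']
```

  • The first four picks are deterministic for this table; later picks depend on the random tie-break, so only the invariants (each node upgraded once, no class over its weight within a batch) should be relied on.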
  • during the upgrade, the weights of the weight classes can be adjusted automatically.
  • when the upgrade process controller detects that the storage cluster business pressure is high or the pressure of a certain service is high, it can actively adjust the weights of the node weight classes. Adjusting the weights directly affects the number of concurrent upgrades of the nodes of this weight class, thereby ensuring that more nodes can handle cluster business.
  • the embodiment of the present application also discloses a storage cluster upgrade control device, including:
  • the weight class and weight value determination module 11 is configured to determine all weight classes in the storage cluster and weight values corresponding to the weight classes; wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domain of the storage cluster; the weight values represent the number of nodes that the corresponding weight class allows to be upgraded concurrently;
  • the weight matrix generation module 12 is configured to determine the target weight class corresponding to each storage node in the storage cluster and the target weight corresponding to the target weight class to generate a weight matrix for each storage node;
  • the upgrade module 13 is configured to control the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node.
  • the weight class and weight value determination module 11 determines all weight classes in the storage cluster and the weight values corresponding to the weight classes.
  • the weight classes include all storage services deployed in the storage cluster and the underlying fault domain of the storage cluster; the weight value represents the number of nodes that the corresponding weight class allows to be upgraded concurrently.
  • the weight class is the various services deployed on the node and the underlying fault domain structure to which the node belongs.
  • a service or a fault domain structure is a weight class, and a storage node can contain multiple weight classes.
  • the weight value is the number of nodes that each weight class allows to be upgraded concurrently.
  • the weight class determines whether the node can be upgraded, and the weight value determines how many nodes can be upgraded concurrently at the same time.
  • The storage cluster of this embodiment is a cluster in a distributed storage system whose underlying fault domain structure is built on a scalable pseudo-random data distribution algorithm structure (crush root structure).
  • the online upgrade of this embodiment has two basic rules, which are formulated by the upgrade process controller.
  • Basic rule 1: The weights of all weight classes under a node must be greater than 0;
  • Basic rule 2: The number of nodes sharing a weight class that are upgraded concurrently cannot exceed the base weight of that class.
  • Because the process controller's upgrade rules are built on the storage system's underlying fault domain structure (crush root structure), they are not affected by upper-layer protocols, which enables concurrent online upgrades in all scenarios.
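
The two basic rules can be expressed as a single admission check. The sketch below is illustrative only; the function name `may_start_upgrade` and its parameters are assumptions, and the weights are modelled as plain dictionaries keyed by weight class.

```python
from collections import Counter
from typing import Dict, Iterable, List


def may_start_upgrade(
    node_classes: Iterable[str],
    current_weights: Dict[str, int],
    base_weights: Dict[str, int],
    upgrading: List[Iterable[str]],
) -> bool:
    """Return True if a node with `node_classes` may join the concurrent upgrade."""
    classes = list(node_classes)
    # Basic rule 1: every weight class on the node must have a weight above 0.
    if any(current_weights[c] <= 0 for c in classes):
        return False
    # Basic rule 2: per class, the nodes upgrading at once must not exceed the base weight.
    in_flight = Counter(c for other in upgrading for c in other)
    return all(in_flight[c] + 1 <= base_weights[c] for c in classes)


base = {"f1": 1, "d1": 2}
allowed = may_start_upgrade(["f1", "d1"], {"f1": 1, "d1": 1}, base, [["f2", "d1"]])
blocked_by_rule_1 = may_start_upgrade(["f1", "d1"], {"f1": 0, "d1": 1}, base, [])
blocked_by_rule_2 = may_start_upgrade(["f1", "d1"], {"f1": 1, "d1": 1}, base, [["f1", "d1"]])
```

In the last call the f1 class already has one node in flight and a base weight of 1, so a second node carrying f1 is refused.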
  • the weight matrix generation module 12 determines the target weight class corresponding to each storage node in the storage cluster and the target weight corresponding to the target weight class to generate the weight matrix of each storage node.
  • each storage service deployed on each storage node in the storage cluster and the underlying fault domain to which the storage node belongs are determined as the target weight class corresponding to the storage node, and the target weight corresponding to the target weight class is determined based on all weight classes and the weights corresponding to the weight classes.
  • the node weight matrix is obtained. Whether a node can be upgraded depends on the node weight matrix.
  • the node weight matrix consists of two dimensions: one is the weight class and the other is the weight.
  • The process controller based on the node weight matrix is highly extensible: when another weight class needs to be added, only the weight of that class needs to be added to the weight matrix. It also supports high concurrency, keeping the storage cluster in the best concurrent upgrade state in real time.
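
As a rough illustration of the two dimensions of the node weight matrix, the sketch below builds a per-node mapping from weight class to weight. The helper `build_weight_matrix`, the class names (`f1`, `d1`, `p1`) and the weights are hypothetical values chosen for the example.

```python
from typing import Dict, Iterable


def build_weight_matrix(
    services: Iterable[str],
    fault_domain: str,
    class_weights: Dict[str, int],
) -> Dict[str, int]:
    """Return {weight class: weight} for one storage node."""
    # A node's weight classes are its deployed services plus its fault domain.
    classes = list(services) + [fault_domain]
    return {cls: class_weights[cls] for cls in classes}


# Cluster-wide weight classes and their weights (assumed values).
class_weights = {"f1": 1, "f2": 1, "d1": 1, "d2": 1}
n1_matrix = build_weight_matrix(["f1", "f2"], "d1", class_weights)

# Extending the scheme: register a new weight class, then rebuild the affected node.
class_weights["p1"] = 2
n1_extended = build_weight_matrix(["f1", "f2", "p1"], "d1", class_weights)
```

Adding the extra class only touches the cluster-wide weight table and the nodes that carry the class, which is the extensibility property described above.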
  • the upgrade module 13 controls the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node.
  • The process controller based on the node weight matrix in this embodiment is also highly agile.
  • When the cluster business or a service is under high pressure, it can quickly adjust the number of concurrently upgraded nodes to keep the cluster business running normally.
  • The real-time business pressure of the storage cluster is first obtained, and the weight corresponding to the weight class is adjusted according to that pressure.
  • If the real-time business pressure exceeds the preset pressure value, the size of the weight corresponding to the weight class is increased to reduce the number of concurrent upgrades; if the real-time business pressure is lower than the preset pressure value, the size of the weight corresponding to the weight class is lowered to increase the number of concurrent upgrades.
  • In this way, the weights of the weight classes are adjusted automatically during the online upgrade.
  • When the upgrade process controller detects that the business pressure on the storage cluster, or on a particular service, is high, it can actively adjust the weight of the node weight class.
  • The adjustment of the weight directly affects the number of concurrently upgraded nodes of that weight class, thereby ensuring that more nodes can handle cluster business.
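
One way to picture the adjustment is as a small feedback rule on the per-class concurrency allowance. In the sketch below the names `adjust_allowance`, `step`, `floor` and `ceiling` are hypothetical, and the direction of the change is an assumption: because the weight is defined as the number of nodes a weight class may upgrade concurrently, the sketch lowers that allowance when the pressure exceeds the preset value and raises it when the pressure is below it, so that high pressure leaves more nodes serving business traffic.

```python
from typing import Optional


def adjust_allowance(
    allowance: int,
    pressure: float,
    threshold: float,
    step: int = 1,
    floor: int = 1,
    ceiling: Optional[int] = None,
) -> int:
    """Return the new number of nodes a weight class may upgrade concurrently."""
    if pressure > threshold:
        # High business pressure: fewer concurrent upgrades, never below `floor`.
        return max(floor, allowance - step)
    if pressure < threshold:
        # Low business pressure: more concurrent upgrades, capped by `ceiling` if given.
        raised = allowance + step
        return raised if ceiling is None else min(ceiling, raised)
    return allowance


under_load = adjust_allowance(2, pressure=0.9, threshold=0.7)
idle = adjust_allowance(1, pressure=0.3, threshold=0.7, ceiling=3)
```

The `floor` of 1 mirrors the optional constraint that each weight is at least 1, so an upgrade can always make progress.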
  • This embodiment can also use the upgrade process controller to trigger the storage cluster upgrade to be in a suspended state and archive the current upgrade progress, or trigger the storage cluster to continue the upgrade.
  • the upgrade process controller has the function of pausing and continuing the upgrade. Before the upgrade is completed, the user can interrupt and exit the upgrade process at any time and can actively trigger the cluster to suspend the upgrade process. After the upgrade is suspended, the cluster can be restored through the continue function to continue the upgrade.
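
The pause and continue behaviour can be sketched as a controller that archives its progress to a file and rebuilds itself from that file. Everything here is an assumption made for illustration: the class name `UpgradeController`, the JSON archive format, and the serial `upgrade_next` placeholder that stands in for the real upgrade work.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class UpgradeController:
    pending: List[str]
    done: List[str] = field(default_factory=list)
    paused: bool = False

    def upgrade_next(self) -> bool:
        """Upgrade one pending node; return False when paused or finished."""
        if self.paused or not self.pending:
            return False
        self.done.append(self.pending.pop(0))
        return True

    def pause(self, archive: Path) -> None:
        """Suspend the upgrade and archive the current progress."""
        self.paused = True
        archive.write_text(json.dumps({"pending": self.pending, "done": self.done}))

    @classmethod
    def resume(cls, archive: Path) -> "UpgradeController":
        """Rebuild a controller from the archived progress and continue."""
        state = json.loads(archive.read_text())
        return cls(pending=state["pending"], done=state["done"])


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "upgrade_progress.json"
    controller = UpgradeController(pending=["n1", "n2", "n3"])
    controller.upgrade_next()
    controller.pause(archive)
    blocked = controller.upgrade_next()  # no progress while paused
    resumed = UpgradeController.resume(archive)
    while resumed.upgrade_next():
        pass
```

Archiving before exit means the continue operation only needs the archive, not the interrupted process.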
  • The embodiment of the present application first determines all weight classes in the storage cluster and the weights corresponding to the weight classes, wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster, and the weights represent the number of nodes that the corresponding weight class allows to be upgraded concurrently; then the target weight class corresponding to each storage node in the storage cluster and the target weight corresponding to the target weight class are determined to generate a weight matrix for each storage node; finally, the online concurrent upgrade process of each storage node in the storage cluster is controlled based on the weight matrix of each storage node.
  • the embodiment of the present application sets a weight matrix for the storage nodes in the storage cluster, and controls the upgrade process of the storage nodes by controlling the weight classes and weights in the weight matrix, thereby achieving efficient concurrent online upgrades of large-scale clusters while ensuring that the storage cluster business system is not affected.
  • the weight matrix generation module 12 is configured to determine each storage service deployed on each storage node in the storage cluster and the underlying fault domain to which the storage node belongs as the target weight class corresponding to the storage node, and determine the target weight corresponding to the target weight class based on all weight classes and the weights corresponding to the weight classes.
  • the upgrade module 13 includes:
  • the first division unit is configured to divide the storage nodes corresponding to the weight matrix of the target weight class without duplication into a concurrent upgrade node group, and control each storage node in the concurrent upgrade node group to perform online concurrent upgrade; wherein the concurrent upgrade node group includes at least one storage node;
  • the second division unit is configured to divide the remaining storage nodes in the storage cluster that are not divided into the concurrent upgrade node group into the remaining storage node group, and reduce by 1 the target weight corresponding to the target weight class of the storage node divided into the concurrent upgrade node group in the weight matrix of the remaining storage node group, so as to obtain an updated weight matrix of each storage node;
  • the elimination unit is configured to eliminate the storage nodes corresponding to the weight matrices having target weights of 0 in the remaining storage node group, so as to obtain the remaining storage node group after the elimination; wherein each target weight in the weight matrix of the storage nodes in the remaining storage node group after the elimination is not 0;
  • a screening unit is configured to screen out a storage node with the least number of weight classes from the remaining storage node group after elimination, and divide the storage node into a concurrent upgrade node group;
  • a weight reduction unit is configured to reduce by 1 the target weight corresponding to the target weight class of the storage node currently divided into the concurrent upgrade node group in the weight matrix of the remaining storage node group after the elimination, so as to obtain an updated weight matrix of each storage node;
  • the first loop unit is configured to repeatedly execute the node elimination step, the storage node screening according to the number of weight categories, and the weight reduction step until the latest weight matrix of each storage node in the remaining storage node group after the elimination has a target weight of 0, thereby obtaining the latest remaining storage node group;
  • a judgment unit is configured to judge whether each storage node in the concurrently upgraded node group has been upgraded. If each storage node in the concurrently upgraded node group has been upgraded, the weight matrix of the upgraded storage node is restored, and the target weight corresponding to the target weight class of the restored storage node in the weight matrix of the latest remaining storage node group is increased by 1 to obtain a final remaining storage node group;
  • the second loop unit is configured to repeatedly execute the node elimination step, the storage node screening according to the number of weight categories, the weight reduction step, the node weight matrix recovery and the weight increase step for the final remaining storage node group until all storage nodes are upgraded.
  • the first partitioning unit is further configured to process the weight matrices of all storage nodes using the bubbling principle to determine the storage nodes corresponding to the weight matrices of the target weight classes without duplication, and divide them into concurrent upgrade node groups.
  • the elimination unit is further configured to randomly select one storage node from among the plurality of storage nodes as the storage node with the least number of weight classes if there are multiple storage nodes with the least number of weight classes in the remaining storage node group after the elimination.
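
The elimination, screening, weight reduction and recovery loop can be sketched as a planner that groups nodes into waves of concurrent upgrades. This is a simplified model under several assumptions: the name `plan_waves` is hypothetical; a single shared counter per weight class stands in for the per-node weight matrices; the initial grouping of nodes with no shared weight classes is folded into the same greedy loop; ties are broken by node name instead of randomly so the result is repeatable; and each wave is assumed to finish as a whole before weights recover.

```python
from typing import Dict, List, Sequence


def plan_waves(
    node_classes: Dict[str, Sequence[str]],
    base_weights: Dict[str, int],
) -> List[List[str]]:
    """Group nodes into waves of concurrent upgrades under per-class limits."""
    for node, classes in node_classes.items():
        if any(base_weights[c] < 1 for c in classes):
            raise ValueError(f"{node} can never be upgraded: a class weight is below 1")
    weights = dict(base_weights)  # current weight per class
    waiting = dict(node_classes)  # nodes not yet upgraded
    waves: List[List[str]] = []
    while waiting:
        wave: List[str] = []
        while True:
            # Elimination: skip nodes that have any weight class at 0.
            eligible = [n for n, cs in waiting.items() if all(weights[c] > 0 for c in cs)]
            if not eligible:
                break
            # Screening: fewest weight classes first; name order replaces the random tie-break.
            node = min(eligible, key=lambda n: (len(waiting[n]), n))
            # Weight reduction for every class carried by the selected node.
            for c in waiting.pop(node):
                weights[c] -= 1
            wave.append(node)
        waves.append(wave)
        # Recovery: the whole wave is assumed finished, so weights return to base.
        weights = dict(base_weights)
    return waves


nodes = {
    "n1": ["f1", "d1"],
    "n2": ["f1", "d2"],
    "n3": ["f2", "d1"],
    "n4": ["f2", "d2"],
}
waves = plan_waves(nodes, {"f1": 1, "f2": 1, "d1": 1, "d2": 1})
```

With every weight set to 1, the four nodes are upgraded in two waves of two nodes, and no two nodes in a wave share a service or a fault domain.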
  • the storage cluster upgrade control device further includes:
  • An adjustment module is configured to obtain real-time business pressure of the storage cluster and adjust the weight corresponding to the weight class according to the real-time business pressure;
  • the control module is configured to use the upgrade process controller to trigger the storage cluster upgrade to be in a suspended state and archive the current upgrade progress, or to trigger the storage cluster to continue to perform the upgrade.
  • the adjustment module includes:
  • the first adjustment unit is configured to increase the size of the weight corresponding to the weight class to reduce the number of concurrent upgrades if the real-time business pressure exceeds the preset pressure value;
  • the second adjustment unit is configured to reduce the size of the weight corresponding to the weight class to increase the number of concurrent upgrades if the real-time business pressure is lower than the preset pressure value.
  • Fig. 5 is a structural diagram of an electronic device 20 according to an exemplary embodiment, and the content in the diagram cannot be regarded as any limitation on the scope of application of the present application.
  • FIG. 5 is a schematic diagram of the structure of an electronic device 20 provided in an embodiment of the present application.
  • the electronic device 20 may include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26.
  • the memory 22 is configured to store a computer program, which is loaded and executed by the processor 21 to implement the relevant steps in the storage cluster upgrade control method disclosed in any of the aforementioned embodiments.
  • the power supply 23 is configured to provide working voltage for each hardware device on the electronic device 20;
  • the communication interface 24 can create a data transmission channel between the electronic device 20 and the external device, and the communication protocol it follows is any communication protocol that can be applied to the technical solution of the present application, and is not specifically limited here;
  • the input and output interface 25 is configured to obtain external input data or output data to the outside world, and its specific interface type can be selected according to specific application needs, and is not specifically limited here.
  • the memory 22, as a carrier for storing resources can be a read-only memory, a random access memory, a disk or an optical disk, etc.
  • the resources stored thereon may include an operating system 221, a computer program 222 and data 223, etc.
  • the storage method can be temporary storage or permanent storage.
  • the operating system 221 is used to manage and control the hardware devices and computer programs 222 on the electronic device 20, so as to realize the operation and processing of the massive data 223 in the memory 22 by the processor 21, which can be Windows Server, Netware, Unix, Linux, etc.
  • the computer program 222 can further include a computer program that can be used to complete other specific tasks.
  • the data 223 can include weight class information and weight value information collected by the electronic device 20.
  • the embodiment of the present application further discloses a non-volatile readable storage medium in which a computer program is stored; when the computer program is loaded and executed by a processor, the steps of the storage cluster upgrade control method disclosed in any of the aforementioned embodiments are implemented.
  • each embodiment is described in a progressive manner, and each embodiment focuses on the differences from other embodiments.
  • the same or similar parts between the embodiments can be referred to each other.
  • Because the device embodiments correspond to the method embodiments, their description is relatively brief, and the relevant details can be found in the method part.
  • the storage cluster upgrade control method, device, equipment and non-volatile readable storage medium provided by the present application are introduced in detail above. Specific examples are used in this article to illustrate the principles and implementation methods of the present application. The description of the above embodiments is only used to help understand the method of the present application and its core idea; at the same time, for general technical personnel in this field, according to the idea of the present application, there will be changes in the specific implementation method and application scope. In summary, the content of this specification should not be understood as a limitation on the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the technical field of data storage, and discloses a storage cluster upgrade control method and apparatus, a device, and a non-volatile readable storage medium. The method comprises: determining all weight classes in a storage cluster and weights corresponding to the weight classes, wherein the weight classes comprise all storage services deployed in the storage cluster and a bottom fault domain of the storage cluster, and the weight represents the number of nodes allowed to be concurrently upgraded by the corresponding weight class; determining a target weight class corresponding to each storage node in the storage cluster and a target weight corresponding to the target weight class to generate a weight matrix of each storage node; and controlling an online concurrent upgrade process of each storage node in the storage cluster on the basis of the weight matrix of each storage node. Therefore, according to the present application, the efficient concurrent online upgrade of a large-scale cluster can be achieved while ensuring that a storage cluster service system is not affected.

Description

A storage cluster upgrade control method, device, equipment and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. 202211412298.8, filed with the China Patent Office on November 11, 2022 and entitled “A Storage Cluster Upgrade Control Method, Device, Equipment and Storage Medium”, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data storage, and in particular to a storage cluster upgrade control method, device, equipment and non-volatile readable storage medium.
Background
As storage systems grow larger, a storage cluster that has gone live must be upgraded regularly to fix vulnerabilities, improve performance, or adopt new version features. To ensure that the upgrade does not affect the normal business of the storage cluster, the upgrade has to be performed online. Currently, most storage systems upgrade online serially, that is, the next node is upgraded only after the previous node has finished. Concurrent upgrades have been implemented in some scenarios, but the existing online upgrade solutions impose many restrictions and are not universally applicable.
Therefore, the above technical problems urgently need to be solved by those skilled in the art.
Summary of the Invention
In view of this, the purpose of the present application is to provide a storage cluster upgrade control method, device, equipment and non-volatile readable storage medium that can achieve efficient concurrent online upgrades of large-scale clusters while ensuring that the storage cluster business system is not affected. The specific scheme is as follows:
A first aspect of the present application provides a storage cluster upgrade control method, comprising:
Determine all weight classes in the storage cluster and the weights corresponding to the weight classes; wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domain of the storage cluster; the weights represent the number of nodes that the corresponding weight class allows to be upgraded concurrently;
Determine the target weight class corresponding to each storage node in the storage cluster and the target weight corresponding to the target weight class to generate a weight matrix for each storage node;
Control the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node.
Optionally, determining a target weight class corresponding to each storage node in the storage cluster and a target weight corresponding to the target weight class includes:
Each storage service deployed on each storage node in the storage cluster and the underlying fault domain to which the storage node belongs are determined as the target weight class corresponding to the storage node, and the target weight corresponding to the target weight class is determined based on all weight classes and the weights corresponding to the weight classes.
Optionally, after generating the weight matrix of each storage node, the method further includes:
When other weight classes are added to the weight matrix, the weights corresponding to the other weight classes are added to the weight matrix.
Optionally, controlling the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node includes:
The storage nodes corresponding to weight matrices with no duplicated target weight classes are divided into a concurrent upgrade node group, and each storage node in the concurrent upgrade node group is controlled to perform an online concurrent upgrade; wherein the concurrent upgrade node group contains at least one storage node;
The remaining storage nodes in the storage cluster that are not divided into the concurrent upgrade node group are divided into the remaining storage node group, and the target weight corresponding to the target weight class of the storage node divided into the concurrent upgrade node group is reduced by 1 in the weight matrices of the remaining storage node group to obtain an updated weight matrix for each storage node;
Based on the updated weight matrix of each storage node, the online concurrent upgrade process of each storage node in the storage cluster is controlled.
Optionally, dividing the storage nodes corresponding to weight matrices with no duplicated target weight classes into a concurrent upgrade node group includes:
The weight matrices of all storage nodes are processed using the bubbling principle to determine the storage nodes corresponding to weight matrices with no duplicated target weight classes, and these nodes are divided into the concurrent upgrade node group.
Optionally, processing the weight matrices of all storage nodes using the bubbling principle to determine the storage nodes corresponding to weight matrices with no duplicated target weight classes includes:
Weight matrices of nodes whose weight classes are completely different are selected from the weight matrix table, wherein the weight matrix table includes the weight matrices of all nodes in the storage cluster, and completely different weight classes indicate that the nodes are isolated from each other;
Storage nodes with completely different weight classes are selected from those weight matrices.
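
The selection of mutually isolated nodes can be sketched as a single greedy pass over the weight matrix table. The text does not spell out the bubbling principle, so the pass below is an assumption made for illustration, as are the name `pick_isolated_nodes` and the sample table.

```python
from typing import Dict, List, Set


def pick_isolated_nodes(matrix_table: Dict[str, Dict[str, int]]) -> List[str]:
    """Select nodes whose weight classes do not overlap with any node already selected."""
    used: Set[str] = set()
    group: List[str] = []
    for node, matrix in matrix_table.items():
        classes = set(matrix)
        # A node joins the group only if it shares no service or fault domain with it.
        if used.isdisjoint(classes):
            group.append(node)
            used |= classes
    return group


table = {
    "n1": {"f1": 1, "d1": 1},
    "n2": {"f1": 1, "d2": 1},
    "n3": {"f2": 1, "d2": 1},
}
first_group = pick_isolated_nodes(table)
```

Here n2 is skipped because it shares f1 with n1, while n3 shares nothing with n1 and joins the group. A greedy pass depends on the iteration order and does not necessarily find the largest possible group.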
Optionally, controlling the online concurrent upgrade process of each storage node in the storage cluster based on the updated weight matrix of each storage node includes:
Eliminate from the remaining storage node group the storage nodes whose weight matrices contain a target weight of 0, to obtain the remaining storage node group after elimination; wherein every target weight in the weight matrices of the storage nodes in the remaining storage node group after elimination is non-zero;
Select the storage node with the fewest weight classes from the remaining storage node group after elimination, and divide that storage node into the concurrent upgrade node group;
Reduce by 1, in the weight matrices of the remaining storage node group after elimination, the target weight corresponding to the target weight class of the storage node just divided into the concurrent upgrade node group, to obtain an updated weight matrix of each storage node;
Repeat the node elimination step, the step of selecting a storage node by number of weight classes, and the weight reduction step until the latest weight matrix of every storage node in the remaining storage node group after elimination contains a target weight of 0, to obtain the latest remaining storage node group.
Optionally, selecting the storage node with the fewest weight classes from the remaining storage node group after elimination includes:
If there are multiple storage nodes with the fewest weight classes in the remaining storage node group after elimination, one of them is selected at random as the storage node with the fewest weight classes.
Optionally, after obtaining the latest remaining storage node group, the method further includes:
Determine whether each storage node in the concurrent upgrade node group has been upgraded. If so, restore the weight matrix of the upgraded storage node, and add 1, in the weight matrices of the latest remaining storage node group, to the target weight corresponding to the target weight class of the restored storage node, to obtain the final remaining storage node group;
Repeat, for the final remaining storage node group, the node elimination step, the step of selecting a storage node by number of weight classes, the weight reduction step, the node weight matrix recovery step and the weight increase step until all storage nodes are upgraded.
Optionally, after obtaining the updated weight matrix of each storage node, the method further includes:
The selected nodes are upgraded concurrently, and, according to the weight classes of the upgrading nodes, the weights of the corresponding weight classes in the remaining nodes that have not been upgraded are reduced by 1.
Optionally, the weight corresponding to each weight class is greater than or equal to 1.
Optionally, the storage cluster upgrade control method further includes:
The real-time business pressure of the storage cluster is obtained, and the weight corresponding to the weight class is adjusted according to the real-time business pressure.
Optionally, adjusting the weight corresponding to the weight class according to the real-time business pressure includes:
If the real-time business pressure exceeds the preset pressure value, the weight corresponding to the weight class is increased to reduce the number of concurrent upgrades;
If the real-time business pressure is lower than the preset pressure value, the weight corresponding to the weight class is lowered to increase the number of concurrent upgrades.
Optionally, the storage cluster upgrade control method further includes:
The upgrade process controller is used to trigger the storage cluster upgrade to enter a paused state and archive the current upgrade progress, or to trigger the storage cluster to continue the upgrade.
Optionally, using the upgrade process controller to trigger the storage cluster upgrade to enter a paused state and archive the current upgrade progress includes:
When an instruction from a user to interrupt and exit the upgrade process is received before the upgrade is completed, the upgrade process controller responds to the instruction and triggers the cluster to suspend the upgrade process.
Optionally, triggering the storage cluster to continue the upgrade includes:
When the upgrade process is in a paused state, the continue function of the upgrade process controller restores the storage cluster so that it continues the upgrade process.
Optionally, the storage cluster is a cluster in a distributed storage system whose underlying fault domain structure is based on a scalable pseudo-random data distribution algorithm structure.
A second aspect of the present application provides a storage cluster upgrade control device, including:
The weight class and weight value determination module is configured to determine all weight classes in the storage cluster and the weights corresponding to the weight classes; wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domain of the storage cluster; the weights represent the number of nodes that the corresponding weight class allows to be upgraded concurrently;
The weight matrix generation module is configured to determine a target weight class corresponding to each storage node in the storage cluster and a target weight corresponding to the target weight class to generate a weight matrix for each storage node;
The upgrade module is configured to control the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node.
A third aspect of the present application provides an electronic device comprising a processor and a memory; wherein the memory is configured to store a computer program, and the computer program is loaded and executed by the processor to implement the aforementioned storage cluster upgrade control method.
A fourth aspect of the present application provides a computer non-volatile readable storage medium in which computer executable instructions are stored. When the computer executable instructions are loaded and executed by a processor, the aforementioned storage cluster upgrade control method is implemented.
In this application, all weight classes in the storage cluster and the weights corresponding to the weight classes are first determined; wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster; the weights represent the number of nodes that the corresponding weight class allows to be upgraded concurrently; then the target weight class corresponding to each storage node in the storage cluster and the target weight corresponding to the target weight class are determined to generate a weight matrix for each storage node; finally, the online concurrent upgrade process of each storage node in the storage cluster is controlled based on the weight matrix of each storage node. It can be seen that this application sets a weight matrix for the storage nodes in the storage cluster and controls the upgrade process of the storage nodes through the weight classes and weights in the weight matrix, thereby achieving efficient concurrent online upgrades of large-scale clusters while ensuring that the storage cluster business system is not affected.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a storage cluster upgrade control method provided by the present application;
FIG. 2 is a flow chart of an optional storage cluster upgrade control method provided by the present application;
FIG. 3 is a schematic diagram of an optional storage cluster upgrade control method provided by the present application;
FIG. 4 is a schematic structural diagram of a storage cluster upgrade control apparatus provided by the present application;
FIG. 5 is a structural diagram of a storage cluster upgrade control electronic device provided by the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Existing storage systems are upgraded online in a serial manner, that is, the next node is upgraded only after the previous node has finished upgrading; alternatively, concurrent upgrade is implemented in certain scenarios, but the existing online upgrade solutions are subject to many restrictions and are not universally applicable. In view of these technical defects, the present application provides a storage cluster upgrade control solution that sets a weight matrix for the storage nodes in the storage cluster and controls the upgrade process of the storage nodes by controlling the weight classes and weights in the weight matrix, thereby achieving efficient concurrent online upgrade of large-scale clusters while ensuring that the service system of the storage cluster is not affected.
FIG. 1 is a flow chart of a storage cluster upgrade control method provided by an embodiment of the present application. Referring to FIG. 1, the storage cluster upgrade control method includes:
S11: Determine all weight classes in the storage cluster and the weight corresponding to each weight class, wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster, and each weight represents the number of nodes that the corresponding weight class allows to be upgraded concurrently.
In this embodiment, all weight classes in the storage cluster and the weight corresponding to each weight class are determined first. The weight classes include all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster; each weight represents the number of nodes that the corresponding weight class allows to be upgraded concurrently. A weight class is a service deployed on a node or the underlying fault domain structure to which the node belongs: each service or fault domain structure is one weight class, and one storage node may contain multiple weight classes. A weight is the number of nodes that a weight class allows to be upgraded concurrently. The weight classes determine whether a node can be upgraded, and the weights determine how many nodes can be upgraded at the same time. In addition, the storage cluster of this embodiment is a cluster under a distributed storage system, and its underlying fault domain structure is based on the structure of a scalable pseudo-random data distribution algorithm (the crush root structure).
The online upgrade of this embodiment follows two basic rules, which are set by the upgrade process controller. Basic rule 1: the weights of all weight classes under a node must be greater than 0. Basic rule 2: the number of nodes with the same weight class that are upgraded concurrently must not exceed the base weight of that weight class. The upgrade rules of the process controller are based on the underlying storage fault domain structure (the crush root structure), so the influence of upper-layer protocols on the upgrade rules is effectively avoided, which makes concurrent online upgrade possible in all scenarios.
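The two basic rules can be illustrated with a minimal Python sketch. The function names, the dictionary layout (weight class mapped to weight) and the sample nodes are assumptions made for illustration only; the application does not prescribe a concrete data structure.

```python
from collections import Counter


def satisfies_rule_1(weight_matrix):
    """Basic rule 1: every weight class in the node's matrix has a weight > 0."""
    return all(weight > 0 for weight in weight_matrix.values())


def satisfies_rule_2(upgrading_nodes, node_classes, base_weights):
    """Basic rule 2: per weight class, the number of nodes upgrading at the
    same time must not exceed the base weight of that class."""
    in_flight = Counter(
        cls for node in upgrading_nodes for cls in node_classes[node]
    )
    return all(count <= base_weights[cls] for cls, count in in_flight.items())


# Hypothetical sample: three nodes that all carry service f1 in fault domain d1.
base_weights = {"f1": 2, "d1": 2}
node_classes = {"n1": ["f1", "d1"], "n2": ["f1", "d1"], "n3": ["f1", "d1"]}

print(satisfies_rule_1({"f1": 1, "d1": 1}))                              # True
print(satisfies_rule_1({"f1": 0, "d1": 1}))                              # False
print(satisfies_rule_2(["n1", "n2"], node_classes, base_weights))        # True
print(satisfies_rule_2(["n1", "n2", "n3"], node_classes, base_weights))  # False
```

With a base weight of 2 for both f1 and d1, two of the three nodes may upgrade together, while a third would violate basic rule 2.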
S12: Determine the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to the target weight classes, so as to generate a weight matrix for each storage node.
In this embodiment, the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to the target weight classes are determined, so as to generate a weight matrix for each storage node. Optionally, the storage services deployed on a storage node and the underlying fault domain to which the storage node belongs are determined as the target weight classes of that storage node, and the target weights corresponding to the target weight classes are determined from all the weight classes and their corresponding weights.
This yields the node weight matrix. Whether a node can be upgraded depends on its weight matrix, which has two dimensions: the weight class and the weight. A process controller based on the node weight matrix is highly extensible: when another weight class needs to be added, only the weight of that weight class needs to be added to the weight matrix. Such a process controller also provides high concurrency and can keep the storage cluster in the best concurrent upgrade state in real time.
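As a rough sketch of step S12, the weight matrix of each node can be assembled from the cluster-wide weights, the services deployed on the node and the fault domain the node belongs to. The function name `build_weight_matrices`, its parameters and the sample values are illustrative assumptions, not part of the application.

```python
def build_weight_matrices(class_weights, node_services, node_fault_domain):
    """Return {node: {weight_class: weight}} for every known node.

    class_weights     -- cluster-wide weight of every weight class (step S11)
    node_services     -- services deployed on each node
    node_fault_domain -- fault domain each node belongs to (may be absent)
    """
    matrices = {}
    for node in sorted(set(node_services) | set(node_fault_domain)):
        # Target weight classes: the node's services plus its fault domain.
        classes = list(node_services.get(node, []))
        domain = node_fault_domain.get(node)
        if domain is not None:
            classes.append(domain)
        # Target weights are looked up in the cluster-wide table.
        matrices[node] = {cls: class_weights[cls] for cls in classes}
    return matrices


# Hypothetical sample values.
class_weights = {"f1": 2, "f3": 2, "d1": 2, "d2": 2}
node_services = {"n1": ["f1"], "n7": ["f3"], "n9": ["f3"]}
node_fault_domain = {"n1": "d1", "n7": "d1", "n10": "d2"}

matrices = build_weight_matrices(class_weights, node_services, node_fault_domain)
print(matrices["n1"])   # {'f1': 2, 'd1': 2}
print(matrices["n10"])  # {'d2': 2}
```

Adding a further weight class then only requires a new entry in `class_weights` and in the per-node inputs, which is the extensibility property described above.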
S13: Control the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node.
In this embodiment, the online concurrent upgrade process of each storage node in the storage cluster is controlled based on the weight matrix of each storage node. It should be noted that the process controller based on the node weight matrix is highly agile: when the cluster business or a service is under high pressure, it can quickly adjust the number of concurrently upgraded nodes to keep the cluster business running normally. Optionally, the real-time service pressure of the storage cluster is obtained first, and the weights corresponding to the weight classes are adjusted according to the real-time service pressure. If the real-time service pressure exceeds a preset pressure value, the weight corresponding to the weight class is increased to reduce the number of concurrent upgrades; if the real-time service pressure is lower than the preset pressure value, the weight corresponding to the weight class is decreased to increase the number of concurrent upgrades. In this way, the weights of the weight classes are adjusted automatically during the online upgrade: when the upgrade process controller detects that the service pressure of the storage cluster is high, or that the pressure on a certain service is high, it can actively adjust the weight of the corresponding weight class. Adjusting the weight directly affects the number of nodes of that weight class that are upgraded concurrently, thereby ensuring that more nodes remain available to handle cluster business.
In this embodiment, the upgrade process controller can also be used to put the storage cluster upgrade into a paused state and archive the current upgrade progress, or to trigger the storage cluster to continue the upgrade. The upgrade process controller provides pause and resume functions: before the upgrade ends, the user can interrupt and exit the upgrade process at any time and can actively trigger the cluster to pause the upgrade process. After the upgrade is paused, the resume function can be used to let the cluster continue the upgrade.
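A minimal sketch of archiving and restoring the upgrade progress is shown below. The application does not specify an archive format; the JSON layout and the names `save_progress` and `load_progress` are assumptions chosen purely for illustration.

```python
import json


def save_progress(upgraded, upgrading, matrices):
    """Serialize the controller state when the upgrade is paused."""
    return json.dumps({
        "upgraded": sorted(upgraded),    # nodes that have finished upgrading
        "upgrading": sorted(upgrading),  # nodes that were in flight at pause time
        "matrices": matrices,            # current weight matrices of remaining nodes
    })


def load_progress(text):
    """Rebuild the controller state when the upgrade is resumed."""
    state = json.loads(text)
    return set(state["upgraded"]), set(state["upgrading"]), state["matrices"]


# Hypothetical state at the moment of pausing.
snapshot = save_progress({"n1"}, {"n4", "n9"}, {"n2": {"f1": 1, "d1": 1}})
upgraded, upgrading, matrices = load_progress(snapshot)
print(upgraded, sorted(upgrading), matrices)
```

How nodes that were mid-upgrade at pause time are treated on resume is left open here, because the source text does not describe it.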
It can be seen that the embodiment of the present application first determines all weight classes in the storage cluster and the weight corresponding to each weight class, wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster, and each weight represents the number of nodes that the corresponding weight class allows to be upgraded concurrently; then determines the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to the target weight classes, so as to generate a weight matrix for each storage node; and finally controls the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node. The embodiment of the present application sets a weight matrix for the storage nodes in the storage cluster and controls the upgrade process of the storage nodes by controlling the weight classes and weights in the weight matrix, thereby achieving efficient concurrent online upgrade of large-scale clusters while ensuring that the service system of the storage cluster is not affected.
FIG. 2 is a flow chart of an optional storage cluster upgrade control method provided by an embodiment of the present application. Referring to FIG. 2, the storage cluster upgrade control method includes:
S21: Determine all weight classes in the storage cluster and the weight corresponding to each weight class, and determine the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to the target weight classes, so as to generate a weight matrix for each storage node.
In this embodiment, for the process of step S21, reference may be made to the corresponding content disclosed in the foregoing embodiment, which is not repeated here.
S22: Divide the storage nodes whose weight matrices contain no duplicated target weight classes into a concurrent upgrade node group, and control each storage node in the concurrent upgrade node group to perform online concurrent upgrade.
In this embodiment, the storage nodes whose weight matrices share no target weight class with one another are divided into the concurrent upgrade node group. The weight matrices of all storage nodes can be processed using the bubbling principle to identify these storage nodes and divide them into the concurrent upgrade node group. In other words, nodes whose weight classes are completely different are selected; completely different weight classes mean that the nodes are isolated from one another and can be upgraded concurrently. Each storage node in the concurrent upgrade node group is then controlled to perform online concurrent upgrade. The concurrent upgrade node group contains at least one storage node.
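The selection in step S22 can be sketched as a single pass over the nodes that keeps a node only if none of its weight classes has been claimed by an earlier node. The application names a bubbling principle without detailing it, so this first-fit scan, the function name `select_disjoint_nodes` and the ten-node layout with all weights set to 2 are assumptions for illustration.

```python
def select_disjoint_nodes(matrices):
    """Pick nodes, in iteration order, whose weight classes do not overlap
    with the weight classes of any node picked before them."""
    selected = []
    used_classes = set()
    for node, matrix in matrices.items():
        classes = set(matrix)
        if used_classes.isdisjoint(classes):
            selected.append(node)
            used_classes |= classes
    return selected


# Hypothetical ten-node cluster; every weight class starts with weight 2.
node_classes = {
    "n1": ["f1", "d1"], "n2": ["f1", "d1"], "n3": ["f1", "d1"],
    "n4": ["f2", "d2"], "n5": ["f2", "d2"], "n6": ["f2", "d2"],
    "n7": ["f3", "d1"], "n8": ["f3", "d2"], "n9": ["f3"], "n10": ["d2"],
}
matrices = {n: {c: 2 for c in cs} for n, cs in node_classes.items()}

print(select_disjoint_nodes(matrices))  # ['n1', 'n4', 'n9']
```

The result depends on the scan order; a different order could yield another valid group of mutually isolated nodes.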
S23: Divide the remaining storage nodes in the storage cluster that have not been divided into the concurrent upgrade node group into a remaining storage node group, and subtract 1 from each target weight, in the weight matrices of the remaining storage node group, that corresponds to a target weight class of the storage nodes just divided into the concurrent upgrade node group, to obtain the updated weight matrix of each storage node.
In this embodiment, the remaining storage nodes in the storage cluster that have not been divided into the concurrent upgrade node group are divided into the remaining storage node group. Then, in the weight matrices of the remaining storage node group, each target weight that corresponds to a target weight class of the storage nodes just divided into the concurrent upgrade node group is reduced by 1, to obtain the updated weight matrix of each storage node. That is, the selected nodes are upgraded concurrently, and at the same time the weights of the corresponding weight classes in the remaining non-upgraded nodes are adjusted according to the weight classes of the upgrading nodes: the weight of each corresponding weight class is reduced by 1.
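The weight update of step S23 can be sketched as follows. The function name `decrement_for_selected` and the sample data are illustrative assumptions; the sketch copies the matrices rather than modifying them in place, which is a design choice made only to keep the example easy to test.

```python
def decrement_for_selected(matrices, selected):
    """Return the weight matrices of the remaining nodes after the selected
    nodes start upgrading: each class of a selected node loses 1 everywhere."""
    selected_set = set(selected)
    remaining = {
        node: dict(matrix)
        for node, matrix in matrices.items()
        if node not in selected_set
    }
    for node in selected:
        for cls in matrices[node]:
            for matrix in remaining.values():
                if cls in matrix:
                    matrix[cls] -= 1
    return remaining


# Hypothetical ten-node cluster; every weight class starts with weight 2.
node_classes = {
    "n1": ["f1", "d1"], "n2": ["f1", "d1"], "n3": ["f1", "d1"],
    "n4": ["f2", "d2"], "n5": ["f2", "d2"], "n6": ["f2", "d2"],
    "n7": ["f3", "d1"], "n8": ["f3", "d2"], "n9": ["f3"], "n10": ["d2"],
}
matrices = {n: {c: 2 for c in cs} for n, cs in node_classes.items()}

remaining = decrement_for_selected(matrices, ["n1", "n4", "n9"])
print(remaining["n7"])   # {'f3': 1, 'd1': 1}
print(remaining["n10"])  # {'d2': 1}
```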
S24: Eliminate from the remaining storage node group the storage nodes whose weight matrices contain a target weight of 0, to obtain the remaining storage node group after elimination, wherein no target weight in the weight matrices of the storage nodes in the remaining storage node group after elimination is 0.
S25: Select, from the remaining storage node group after elimination, the storage node with the fewest weight classes, and divide that storage node into the concurrent upgrade node group.
In this embodiment, the storage nodes whose weight matrices contain a target weight of 0 are eliminated from the remaining storage node group to obtain the remaining storage node group after elimination. That is, the nodes whose weights are all non-zero are selected from the remaining nodes; weights that are all non-zero indicate that the node can be upgraded. No target weight in the weight matrices of the storage nodes in the remaining storage node group after elimination is 0. The storage node with the fewest weight classes is then selected from the remaining storage node group after elimination and divided into the concurrent upgrade node group. In particular, if several storage nodes in the remaining storage node group after elimination have the fewest weight classes, one of them is selected at random.
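Steps S24 and S25 can be sketched together as one function. The name `pick_next_node`, the optional `rng` parameter (added so that a seeded `random.Random` instance can make the tie-break reproducible) and the sample matrices are assumptions for illustration.

```python
import random


def pick_next_node(remaining, rng=random):
    """S24: drop nodes that have a weight of 0; S25: among the rest, return
    the node with the fewest weight classes (ties are broken at random).
    Returns None when no node is currently allowed to upgrade."""
    eligible = {
        node: matrix
        for node, matrix in remaining.items()
        if all(weight > 0 for weight in matrix.values())
    }
    if not eligible:
        return None
    fewest = min(len(matrix) for matrix in eligible.values())
    candidates = [n for n, m in eligible.items() if len(m) == fewest]
    return rng.choice(candidates)


# Hypothetical remaining nodes after a first group has started upgrading.
remaining = {
    "n2": {"f1": 1, "d1": 1}, "n3": {"f1": 1, "d1": 1},
    "n5": {"f2": 1, "d2": 1}, "n7": {"f3": 1, "d1": 1},
    "n10": {"d2": 1},
}
print(pick_next_node(remaining))  # n10

blocked = {"n3": {"f1": 0, "d1": 0}, "n5": {"f2": 1, "d2": 0}}
print(pick_next_node(blocked))    # None
```

Here n10 is the only node with a single weight class, so the result is deterministic even though ties would be resolved randomly.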
S26: Subtract 1 from each target weight, in the weight matrices of the remaining storage node group after elimination, that corresponds to a target weight class of the storage node just divided into the concurrent upgrade node group, to obtain the updated weight matrix of each storage node.
S27: Repeat the node elimination step, the step of selecting a storage node by number of weight classes and the weight-minus-1 step until the latest weight matrix of every storage node in the remaining storage node group contains a target weight of 0, to obtain the latest remaining storage node group.
In this embodiment, each target weight in the weight matrices of the remaining storage node group after elimination that corresponds to a target weight class of the storage node just divided into the concurrent upgrade node group is reduced by 1, to obtain the updated weight matrix of each storage node. The node elimination step, the step of selecting a storage node by number of weight classes and the weight-minus-1 step are then repeated until the latest weight matrix of every storage node in the remaining storage node group contains a target weight of 0, which yields the latest remaining storage node group.
In this embodiment, the node elimination step, the step of selecting a storage node by number of weight classes and the weight-minus-1 step are executed in a loop until the weights of every remaining node include a zero. A weight of 0 indicates that the number of concurrently upgrading nodes of that weight class has reached the threshold, and nodes that have this weight class are not allowed to upgrade until the upgrade of a node of this weight class has completed.
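The loop formed by steps S24 to S27 can be sketched as below. The function name `fill_concurrent_group`, the `rng` parameter and the sample state are assumptions for illustration; the loop ends when every remaining node has at least one weight of 0.

```python
import random


def fill_concurrent_group(remaining, rng=random):
    """Repeat S24-S26 until no remaining node has all weights > 0.
    Returns the nodes added to the concurrent upgrade group and the
    updated matrices of the nodes still waiting."""
    remaining = {node: dict(matrix) for node, matrix in remaining.items()}
    picked = []
    while True:
        # S24: keep only nodes whose weights are all non-zero.
        eligible = [
            n for n, m in remaining.items() if all(w > 0 for w in m.values())
        ]
        if not eligible:
            return picked, remaining
        # S25: fewest weight classes first, random tie-break.
        fewest = min(len(remaining[n]) for n in eligible)
        node = rng.choice([n for n in eligible if len(remaining[n]) == fewest])
        picked.append(node)
        # S26: the picked node's classes lose 1 in every remaining matrix.
        classes = remaining.pop(node)
        for matrix in remaining.values():
            for cls in classes:
                if cls in matrix:
                    matrix[cls] -= 1


# Hypothetical state after a first group (one node each of f1/d1, f2/d2, f3)
# has started upgrading.
state = {
    "n2": {"f1": 1, "d1": 1}, "n3": {"f1": 1, "d1": 1},
    "n5": {"f2": 1, "d2": 1}, "n6": {"f2": 1, "d2": 1},
    "n7": {"f3": 1, "d1": 1}, "n8": {"f3": 1, "d2": 1},
    "n10": {"d2": 1},
}
picked, waiting = fill_concurrent_group(state, random.Random(0))
print(picked)   # n10 first, then one of n2, n3 or n7
print(waiting)  # every waiting node now has at least one weight of 0
```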
S28: Determine whether each storage node in the concurrent upgrade node group has finished upgrading; if a storage node in the concurrent upgrade node group has finished upgrading, restore the weight matrix of that storage node, and add 1 to each target weight, in the weight matrices of the latest remaining storage node group, that corresponds to a target weight class of the storage node just restored, to obtain the final remaining storage node group.
S29: Repeat, for the final remaining storage node group, the node elimination step, the step of selecting a storage node by number of weight classes, the weight-minus-1 step, and the node weight matrix restoration and weight-plus-1 step until all storage nodes have been upgraded.
In this embodiment, it is determined whether each storage node in the concurrent upgrade node group has finished upgrading. If a storage node in the concurrent upgrade node group has finished upgrading, the weight matrix of that storage node is restored, and each target weight in the weight matrices of the latest remaining storage node group that corresponds to a target weight class of the storage node just restored is increased by 1, to obtain the final remaining storage node group. That is, after a node that is being upgraded finishes upgrading, the weights of the corresponding weight classes in the remaining non-upgraded nodes are refreshed according to the weight classes of the finished node: the weight of each corresponding weight class is increased by 1. On this basis, the node elimination step, the step of selecting a storage node by number of weight classes, the weight-minus-1 step, and the node weight matrix restoration and weight-plus-1 step are repeated for the final remaining storage node group until all storage nodes have been upgraded.
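The weight-plus-1 refresh of step S28 can be sketched as follows. The function name `restore_after_completion` and the sample state are assumptions for illustration; the sketch does not cap the restored weight at the base weight because the source text does not mention such a cap.

```python
def restore_after_completion(remaining, completed_classes):
    """After a node finishes upgrading, give 1 back to each of its weight
    classes in the matrices of the nodes that are still waiting."""
    restored = {node: dict(matrix) for node, matrix in remaining.items()}
    for matrix in restored.values():
        for cls in completed_classes:
            if cls in matrix:
                matrix[cls] += 1
    return restored


# Hypothetical waiting nodes; f1, d1 and d2 have reached their thresholds.
waiting = {
    "n3": {"f1": 0, "d1": 0},
    "n5": {"f2": 1, "d2": 0},
    "n7": {"f3": 1, "d1": 0},
}
# A node with weight classes f1 and d1 has just finished upgrading.
refreshed = restore_after_completion(waiting, ["f1", "d1"])
print(refreshed["n3"])  # {'f1': 1, 'd1': 1}
print(refreshed["n7"])  # {'f3': 1, 'd1': 1}
print(refreshed["n5"])  # {'f2': 1, 'd2': 0}
```

After the refresh, n3 and n7 again have only non-zero weights and become candidates for the next round of selection, while n5 stays blocked by d2.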
The upgrade control algorithm provided by the embodiment of the present application is described below with an example; the operation flow of the process controller is shown in FIG. 3.
For a storage cluster consisting of 10 storage nodes, three services f1, f2 and f3 are deployed, and the underlying fault domains are d1 and d2. The weights (that is, the concurrency thresholds) of f1, f2 and f3 are set to 2, 2 and 2, and the weights of d1 and d2 are set to 2 and 2.
Step 1: First generate the node weight matrix table of the storage cluster, which contains the weight matrices of all storage nodes. The weight matrix table constructed for the storage nodes is: n1: [f1,2][d1,2]; n2: [f1,2][d1,2]; n3: [f1,2][d1,2]; n4: [f2,2][d2,2]; n5: [f2,2][d2,2]; n6: [f2,2][d2,2]; n7: [f3,2][d1,2]; n8: [f3,2][d2,2]; n9: [f3,2]; n10: [d2,2].
Step 2: Screen the nodes using the bubbling principle. First select the nodes whose weight classes are completely different; completely different weight classes mean that the nodes are isolated from one another and can be upgraded concurrently. The storage nodes with completely different weight classes are n1: [f1,2][d1,2], n4: [f2,2][d2,2] and n9: [f3,2]; these three storage nodes can be upgraded concurrently.
Step 3: Upgrade the selected nodes concurrently and, at the same time, adjust the weights of the corresponding weight classes in the remaining non-upgraded nodes according to the weight classes of the upgrading nodes: the weight of each corresponding weight class is reduced by 1. Specifically, the weights of the five weight classes f1, d1, f2, d2 and f3 are reduced by 1. The updated weight matrices of the remaining nodes are: n2: [f1,1][d1,1]; n3: [f1,1][d1,1]; n5: [f2,1][d2,1]; n6: [f2,1][d2,1]; n7: [f3,1][d1,1]; n8: [f3,1][d2,1]; n10: [d2,1].
Step 4: Select the remaining nodes whose weights are all non-zero (weights that are all non-zero indicate that the node can be upgraded). The nodes whose weights are all non-zero are: n2: [f1,1][d1,1]; n3: [f1,1][d1,1]; n5: [f2,1][d2,1]; n6: [f2,1][d2,1]; n7: [f3,1][d1,1]; n8: [f3,1][d2,1]; n10: [d2,1].
Step 5: From the nodes produced in step 4, select the node with the fewest weight classes for upgrade (here n10: [d2,1], which has a single weight class), and adjust the weights of the corresponding weight classes in the remaining nodes accordingly: the weight of each corresponding weight class is reduced by 1. If several nodes have the same number of weight classes, one is selected at random. Specifically, the weight of the weight class d2 is reduced by 1 in the remaining nodes: n2: [f1,1][d1,1]; n3: [f1,1][d1,1]; n5: [f2,1][d2,0]; n6: [f2,1][d2,0]; n7: [f3,1][d1,1]; n8: [f3,1][d2,0].
Step 6: Loop over steps 4 and 5 until the weights of every remaining node include a zero (a weight of 0 indicates that the number of concurrently upgrading nodes of that weight class has reached the threshold, and nodes that have this weight class are not allowed to upgrade until the upgrade of a node of this weight class has completed):
{Step 4: Nodes whose weights are all non-zero: n2: [f1,1][d1,1]; n3: [f1,1][d1,1]; n7: [f3,1][d1,1];
Step 5: Select the node with the fewest weight classes, choosing at random if the numbers are equal: n2: [f1,1][d1,1]. The weights of f1 and d1 in the remaining nodes are reduced by 1: n3: [f1,0][d1,0]; n5: [f2,1][d2,0]; n6: [f2,1][d2,0]; n7: [f3,1][d1,0]; n8: [f3,1][d2,0].}
Step 7: After a node that is being upgraded finishes upgrading, refresh the weights of the corresponding weight classes in the remaining non-upgraded nodes according to the weight classes of the finished node: the weight of each corresponding weight class is increased by 1. For example, when n1: [f1,2][d1,2] finishes upgrading, the weights of f1 and d1 in the remaining nodes are increased by 1: n3: [f1,1][d1,1]; n5: [f2,1][d2,0]; n6: [f2,1][d2,0]; n7: [f3,1][d1,1]; n8: [f3,1][d2,0].
Step 8: Repeat steps 4 to 7 in a loop until all nodes have completed the upgrade:
{Step 4: Nodes whose weights are all non-zero: n3: [f1,1][d1,1]; n7: [f3,1][d1,1];
Step 5: Select the node with the fewest weight classes, choosing at random if the numbers are equal: n3: [f1,1][d1,1]. The weights of f1 and d1 in the remaining nodes are reduced by 1: n5: [f2,1][d2,0]; n6: [f2,1][d2,0]; n7: [f3,1][d1,0]; n8: [f3,1][d2,0]. Wait for the node weights to be restored.}
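The example above can be replayed with a small simulation. This is a sketch under stated assumptions rather than the controller itself: the per-class counter `free` stands in for the per-node weight matrices (every node sharing a class sees the same value), ties are broken by taking the first candidate instead of a random one, and nodes are assumed to finish in the order in which they started. The names `simulate`, `BASE` and `NODES` are illustrative.

```python
from collections import deque

BASE = {"f1": 2, "f2": 2, "f3": 2, "d1": 2, "d2": 2}
NODES = {
    "n1": ["f1", "d1"], "n2": ["f1", "d1"], "n3": ["f1", "d1"],
    "n4": ["f2", "d2"], "n5": ["f2", "d2"], "n6": ["f2", "d2"],
    "n7": ["f3", "d1"], "n8": ["f3", "d2"], "n9": ["f3"], "n10": ["d2"],
}


def simulate(nodes, base):
    free = dict(base)              # current weight of every weight class
    pending = dict(nodes)          # nodes that have not started upgrading
    running = deque()              # nodes being upgraded, oldest first
    log = []                       # ("start" | "done", node) events
    peak = {cls: 0 for cls in base}  # highest concurrency seen per class

    def start(node):
        for cls in pending.pop(node):
            free[cls] -= 1         # weight minus 1 for each class of the node
            peak[cls] = max(peak[cls], base[cls] - free[cls])
        running.append(node)
        log.append(("start", node))

    # Steps 2-3: first group, nodes with pairwise different weight classes.
    used = set()
    for node in list(pending):
        if used.isdisjoint(pending[node]):
            used |= set(pending[node])
            start(node)

    while pending or running:
        # Steps 4-6: keep adding nodes until every pending node is blocked.
        while True:
            eligible = [
                n for n in pending if all(free[c] > 0 for c in pending[n])
            ]
            if not eligible:
                break
            start(min(eligible, key=lambda n: len(pending[n])))
        if not running:
            raise ValueError("a weight class with weight 0 blocks all progress")
        # Step 7: one node finishes; weight plus 1 for each of its classes.
        done = running.popleft()
        for cls in nodes[done]:
            free[cls] += 1
        log.append(("done", done))
    return log, peak


log, peak = simulate(NODES, BASE)
print(log[:7])
print(peak)
```

Under these assumptions the first events are the start of n1, n4 and n9, then n10 and n2, then the completion of n1 followed by the start of n3, which matches the sequence walked through in steps 2 to 8, and no weight class ever has more than two nodes upgrading at once.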
The weights of the weight classes are adjusted automatically during the online upgrade process described above: when the upgrade process controller detects that the service pressure of the storage cluster is high, or that the pressure on a certain service is high, it can actively adjust the weight of the corresponding weight class. Adjusting the weight directly affects the number of nodes of that weight class that are upgraded concurrently, thereby ensuring that more nodes remain available to handle cluster business.
Referring to FIG. 4, an embodiment of the present application correspondingly discloses a storage cluster upgrade control apparatus, including:
a weight class and weight determination module 11, configured to determine all weight classes in the storage cluster and the weight corresponding to each weight class, wherein the weight classes include all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster, and each weight represents the number of nodes that the corresponding weight class allows to be upgraded concurrently;
a weight matrix generation module 12, configured to determine the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to the target weight classes, so as to generate a weight matrix for each storage node;
an upgrade module 13, configured to control the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node.
本实施例中,权类及权值确定模块11确定存储集群中的全部权类及与权类对应的权值。其中,权类包括存储集群中部署的全部存储服务和存储集群的底层故障域;权值表征对应的权类允许并发升级的节点个数。权类为节点上部署的各种服务以及节点所属的底层故障域结构,一种服务或一个故障域结构即为一个权类,一个存储节点可包含多个权类。权值为每个权类所允许并发升级的节点个数。权类决定节点能否升级,权值则确定可同时并发多少个节点升级。另外,本实施例的存储集群为分布式存储系统下的集群且以可扩展的伪随机数据分布算法结构(crush root结构)为基础的底层故障域结构。In this embodiment, the weight class and weight value determination module 11 determines all weight classes in the storage cluster and the weight values corresponding to the weight classes. Among them, the weight classes include all storage services deployed in the storage cluster and the underlying fault domain of the storage cluster; the weight value represents the number of nodes that the corresponding weight class allows to be upgraded concurrently. The weight class is the various services deployed on the node and the underlying fault domain structure to which the node belongs. A service or a fault domain structure is a weight class, and a storage node can contain multiple weight classes. The weight value is the number of nodes that each weight class allows to be upgraded concurrently. The weight class determines whether the node can be upgraded, and the weight value determines how many nodes can be upgraded concurrently at the same time. In addition, the storage cluster of this embodiment is a cluster under a distributed storage system and is based on an underlying fault domain structure based on an extensible pseudo-random data distribution algorithm structure (crush root structure).
本实施例的在线升级具有两个基本规则,由升级流程控制器来制定。基本规则1:节点下所有权类的权值必须大于0;基本规则2:具有相同权类的节点并发同时升级的个数不能超过其基础权值。流程控制器升级规则是以存储底层故障域结构(crush root结构)为基础的,因此可以有效避免上层协议对于升级规则的影响,进而可实现全场景并发在线升级。 The online upgrade of this embodiment has two basic rules, which are formulated by the upgrade process controller. Basic rule 1: The weight of the ownership class under the node must be greater than 0; Basic rule 2: The number of nodes with the same weight class that are upgraded concurrently cannot exceed their basic weight. The process controller upgrade rules are based on the storage underlying fault domain structure (crush root structure), so it can effectively avoid the impact of the upper layer protocol on the upgrade rules, and thus realize full-scenario concurrent online upgrades.
In this embodiment, the weight matrix generation module 12 determines the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to those target weight classes, so as to generate a weight matrix for each storage node. Optionally, the storage services deployed on a storage node and the underlying fault domain to which the storage node belongs are determined as the target weight classes of that storage node, and the target weights corresponding to the target weight classes are determined from the full set of weight classes and their weights.
This yields the node weight matrix. Whether a node can be upgraded depends on its weight matrix, which has two dimensions: the weight class and the weight. A process controller based on the node weight matrix is highly extensible: when another weight class needs to be added, only the weight of that weight class has to be added to the weight matrix. It also supports high concurrency and keeps the storage cluster in the best concurrent upgrade state in real time.
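As a rough illustration of how such a matrix might be assembled, the following Python sketch maps each of a node's weight classes to the cluster-wide weight of that class. The dictionary layout, the function names, and the class names (`mon`, `osd`, `mds`, `rack1`) are assumptions made for the example and do not come from the application.

```python
def build_weight_matrix(cluster_weights, services, fault_domain):
    """Return the node's weight matrix as {weight class: weight}.

    The node's target weight classes are the services deployed on it plus
    the fault domain it belongs to; the target weights are looked up in the
    cluster-wide table of weight classes.
    """
    classes = list(services) + [fault_domain]
    return {c: cluster_weights[c] for c in classes}


def add_weight_class(cluster_weights, node_matrix, new_class, weight):
    """Extend the cluster table and one node matrix with a new weight class."""
    cluster_weights[new_class] = weight
    node_matrix[new_class] = weight


# Hypothetical cluster-wide weights.
cluster_weights = {"mon": 1, "osd": 2, "rack1": 1}
matrix = build_weight_matrix(cluster_weights, ["mon", "osd"], "rack1")
add_weight_class(cluster_weights, matrix, "mds", 1)
print(matrix)
```

Adding a weight class touches only the table and the affected matrices, which is the extensibility property described above.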
In this embodiment, the upgrade module 13 controls the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node. The process controller based on the node weight matrix is also highly agile: when the cluster's workload or a service is under high pressure, it can quickly adjust the number of concurrently upgrading nodes to keep the cluster's services running normally. Optionally, the real-time service pressure of the storage cluster is obtained first, and the weights corresponding to the weight classes are adjusted according to it. If the real-time service pressure exceeds a preset pressure value, the weight corresponding to the weight class is raised to reduce the number of concurrent upgrades; if the real-time service pressure is below the preset pressure value, the weight corresponding to the weight class is lowered to increase the number of concurrent upgrades. The weights of the weight classes are thus adjusted automatically during the online upgrade: when the upgrade process controller detects that the storage cluster, or a particular service, is under heavy pressure, it can proactively adjust the weights of the node's weight classes. Adjusting a weight directly changes the number of nodes of that weight class that upgrade concurrently, which ensures that more nodes remain available to handle cluster services.
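The intended effect of the adjustment (fewer concurrent upgrades under high pressure, more under low pressure) can be sketched in Python as follows. The sketch tracks that effect directly as a per-class concurrency limit; how the limit maps onto raising or lowering the stored weight is deliberately left open. The function name, the `step` size, and the `base_limit` cap are assumptions, and the floor of 1 follows the requirement stated elsewhere in this document that each weight be at least 1.

```python
def adjust_concurrency_limit(limit, pressure, threshold, base_limit, step=1):
    """Return the new concurrency limit of one weight class.

    limit      -- current number of nodes of this class allowed to upgrade together
    pressure   -- observed real-time service pressure (any comparable unit)
    threshold  -- preset pressure value
    base_limit -- assumed upper bound: the limit configured before any adjustment
    """
    if pressure > threshold:
        # High pressure: allow fewer concurrent upgrades, but never fewer than 1.
        return max(1, limit - step)
    if pressure < threshold:
        # Low pressure: allow more concurrent upgrades, up to the configured bound.
        return min(base_limit, limit + step)
    return limit


print(adjust_concurrency_limit(3, pressure=0.9, threshold=0.7, base_limit=4))
```

Calling the function once per monitoring interval gives a gradual response; a larger `step` would react faster at the cost of more abrupt changes in upgrade throughput.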
This embodiment may also use the upgrade process controller to put the storage cluster upgrade into a paused state and archive the current upgrade progress, or to trigger the storage cluster to continue the upgrade. The upgrade process controller provides pause and resume functions: before the upgrade finishes, the user can interrupt and exit the upgrade process at any time, actively triggering the cluster to pause the upgrade; after a pause, the resume function lets the cluster continue the upgrade.
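A minimal Python sketch of pause, archive and resume is given below. The class name, its methods, and the JSON archive format are hypothetical; the application does not specify how the progress is archived.

```python
import json


class UpgradeController:
    """Toy controller that upgrades nodes one at a time and can be paused."""

    def __init__(self, nodes):
        self.pending = list(nodes)  # nodes still to be upgraded
        self.done = []              # nodes already upgraded
        self.paused = False

    def step(self):
        """Upgrade the next node; return its name, or None if paused or finished."""
        if self.paused or not self.pending:
            return None
        node = self.pending.pop(0)
        self.done.append(node)
        return node

    def pause(self):
        """Enter the paused state and return an archive of the current progress."""
        self.paused = True
        return json.dumps({"pending": self.pending, "done": self.done})

    def resume(self, archive):
        """Restore progress from an archive and continue the upgrade."""
        state = json.loads(archive)
        self.pending = state["pending"]
        self.done = state["done"]
        self.paused = False


controller = UpgradeController(["node1", "node2", "node3"])
controller.step()
archive = controller.pause()
controller.resume(archive)
print(controller.step())
```

Because the archive is a plain string, it could equally be written to disk so that the upgrade survives a restart of the controller itself.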
It can be seen that the embodiment of the present application first determines all weight classes in the storage cluster and the weights corresponding to the weight classes, where the weight classes comprise all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster, and each weight represents the number of nodes that the corresponding weight class allows to be upgraded concurrently. It then determines the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to those target weight classes, so as to generate a weight matrix for each storage node. Finally, it controls the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node. By assigning a weight matrix to each storage node and controlling the upgrade of the storage nodes through the weight classes and weights in that matrix, the embodiment achieves efficient concurrent online upgrade of large-scale clusters without affecting the service system of the storage cluster.
As an optional embodiment, the weight matrix generation module 12 is configured to determine the storage services deployed on each storage node in the storage cluster and the underlying fault domain to which the storage node belongs as the target weight classes of that storage node, and to determine the target weights corresponding to the target weight classes from the full set of weight classes and their weights.
As an optional embodiment, the upgrade module 13 includes:
a first division unit, configured to place the storage nodes whose weight matrices contain no duplicate target weight classes into a concurrent upgrade node group, and to control the storage nodes in the concurrent upgrade node group to perform online concurrent upgrade; wherein the concurrent upgrade node group contains at least one storage node;
a second division unit, configured to place the remaining storage nodes of the storage cluster that were not placed into the concurrent upgrade node group into a remaining storage node group, and to subtract 1 from each target weight in the weight matrices of the remaining storage node group that corresponds to a target weight class of a storage node just placed into the concurrent upgrade node group, obtaining an updated weight matrix for each storage node;
an elimination unit, configured to eliminate from the remaining storage node group the storage nodes whose weight matrices contain a target weight of 0, obtaining the remaining storage node group after elimination; wherein no target weight in the weight matrices of the storage nodes in the remaining storage node group after elimination is 0;
a screening unit, configured to select, from the remaining storage node group after elimination, the storage node with the fewest weight classes, and to place that storage node into the concurrent upgrade node group;
a weight decrement unit, configured to subtract 1 from each target weight in the weight matrices of the remaining storage node group after elimination that corresponds to a target weight class of the storage node just placed into the concurrent upgrade node group, obtaining an updated weight matrix for each storage node;
a first loop unit, configured to repeat the node elimination step, the step of screening storage nodes by number of weight classes, and the weight decrement step, until the latest weight matrix of every storage node in the remaining storage node group after elimination contains a target weight of 0, obtaining the latest remaining storage node group;
a judgment unit, configured to judge whether the storage nodes in the concurrent upgrade node group have finished upgrading and, if they have, to restore the weight matrices of the upgraded storage nodes and to add 1 to each target weight in the weight matrices of the latest remaining storage node group that corresponds to a target weight class of a restored storage node, obtaining the final remaining storage node group;
a second loop unit, configured to repeat, for the final remaining storage node group, the node elimination step, the step of screening storage nodes by number of weight classes, the weight decrement step, and the node weight matrix restoration and weight increment step, until all storage nodes have been upgraded.
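The cooperation of the units above can be sketched as a batch planner in Python. This is a simplified model under stated assumptions, not the claimed implementation: the per-node weight matrices are represented by one shared `available` table (equivalent here, since every decrement applies to all remaining nodes holding that class), weights are fully restored once a whole batch has finished, ties are broken by node name for reproducibility, and all identifiers are hypothetical.

```python
def plan_upgrade_batches(node_classes, base_weights):
    """Return a list of batches; nodes in one batch are upgraded concurrently.

    node_classes -- {node name: set of weight classes on that node}
    base_weights -- {weight class: allowed number of concurrent upgrades}
    """
    remaining = dict(node_classes)   # nodes not yet upgraded
    batches = []
    first_round = True
    while remaining:
        available = dict(base_weights)   # weights restored after the previous batch
        batch = []
        if first_round:
            # First division: nodes that share no weight class with each other.
            used = set()
            for node in sorted(remaining):
                if used.isdisjoint(remaining[node]):
                    batch.append(node)
                    used |= set(remaining[node])
            # Second division: decrement the weights used by the chosen nodes.
            for node in batch:
                for c in remaining.pop(node):
                    available[c] -= 1
            first_round = False
        while True:
            # Elimination: skip nodes whose matrix contains a weight of 0.
            candidates = [n for n, cs in remaining.items()
                          if all(available[c] > 0 for c in cs)]
            if not candidates:
                break
            # Screening: the node with the fewest weight classes joins the batch.
            node = min(candidates, key=lambda n: (len(remaining[n]), n))
            batch.append(node)
            # Weight decrement for the classes of the node just chosen.
            for c in remaining.pop(node):
                available[c] -= 1
        if not batch:
            raise ValueError("no node can be upgraded; a weight is below 1")
        batches.append(batch)
    return batches


nodes = {
    "n1": {"mon", "osd", "rack1"},
    "n2": {"osd", "rack1"},
    "n3": {"osd", "rack2"},
    "n4": {"mon", "osd", "rack2"},
}
weights = {"mon": 1, "osd": 2, "rack1": 1, "rack2": 1}
print(plan_upgrade_batches(nodes, weights))  # [['n1', 'n3'], ['n2', 'n4']]
```

In the example, `n1` is taken by the first division, `n3` is then the only node whose weights are all still positive, and `n2` and `n4` wait for the second batch because `rack1` and `mon` are exhausted. A controller that restores weights as each individual node finishes, rather than per batch, could start waiting nodes earlier.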
As an optional embodiment, the first division unit is further configured to process the weight matrices of all storage nodes using the bubbling principle, so as to identify the storage nodes whose weight matrices contain no duplicate target weight classes and to place them into the concurrent upgrade node group.
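The text does not spell out the bubbling principle. One plausible reading, sketched below in Python purely as an assumption, is a pass over the nodes in which each node is compared with those already kept and is kept only if it shares no weight class with any of them. The function name and the sample data are hypothetical.

```python
def pick_isolated_nodes(node_classes):
    """Select nodes whose weight classes are pairwise disjoint.

    node_classes -- {node name: set of weight classes}; nodes are scanned
    in dictionary order, and a node is kept only if none of its weight
    classes has been seen on a node kept earlier.
    """
    chosen = []
    used = set()
    for node, classes in node_classes.items():
        if used.isdisjoint(classes):
            chosen.append(node)
            used |= set(classes)
    return chosen


table = {
    "a": {"s1", "d1"},
    "b": {"s1", "d2"},
    "c": {"s2", "d2"},
    "d": {"s3", "d3"},
}
print(pick_isolated_nodes(table))  # ['a', 'c', 'd']
```

The result depends on scan order: scanning `b` before `a` would keep `b` and `d` only. A real controller might try several orders to maximize the size of the first group.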
As an optional embodiment, the elimination unit is further configured to, if several storage nodes in the remaining storage node group after elimination are tied for the fewest weight classes, randomly select one of those storage nodes as the storage node with the fewest weight classes.
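The random tie-break can be sketched in Python as follows; the function name, the optional `rng` parameter (added so that a seeded generator can make runs repeatable), and the sample data are assumptions for illustration.

```python
import random


def pick_fewest_classes(candidates, rng=random):
    """Return a node with the fewest weight classes, choosing randomly among ties.

    candidates -- {node name: set of weight classes}, must be non-empty
    rng        -- the random module or a random.Random instance
    """
    fewest = min(len(classes) for classes in candidates.values())
    tied = sorted(node for node, classes in candidates.items()
                  if len(classes) == fewest)
    return rng.choice(tied)


candidates = {"n1": {"a", "b", "c"}, "n2": {"a", "b"}, "n3": {"c", "d"}}
print(pick_fewest_classes(candidates, random.Random(0)) in {"n2", "n3"})  # True
```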
As an optional embodiment, the storage cluster upgrade control apparatus further includes:
an adjustment module, configured to obtain the real-time service pressure of the storage cluster and to adjust the weights corresponding to the weight classes according to the real-time service pressure;
a control module, configured to use the upgrade process controller to put the storage cluster upgrade into a paused state and archive the current upgrade progress, or to trigger the storage cluster to continue the upgrade.
As an optional embodiment, the adjustment module includes:
a first adjustment unit, configured to raise the weight corresponding to the weight class to reduce the number of concurrent upgrades if the real-time service pressure exceeds the preset pressure value;
a second adjustment unit, configured to lower the weight corresponding to the weight class to increase the number of concurrent upgrades if the real-time service pressure is below the preset pressure value.
The embodiment of the present application further provides an electronic device. Fig. 5 is a structural diagram of an electronic device 20 according to an exemplary embodiment; the content of the figure shall not be regarded as any limitation on the scope of use of the present application.
Fig. 5 is a schematic structural diagram of an electronic device 20 provided in an embodiment of the present application. The electronic device 20 may include at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the storage cluster upgrade control method disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide operating voltage for the hardware devices of the electronic device 20. The communication interface 24 creates a data transmission channel between the electronic device 20 and external devices; it may follow any communication protocol applicable to the technical solution of the present application, which is not specifically limited here. The input/output interface 25 is configured to obtain input data from the outside or to output data to the outside; its specific interface type may be selected according to application needs and is not specifically limited here.
In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it may include an operating system 221, a computer program 222, and data 223, and the storage may be temporary or permanent.
The operating system 221 manages and controls the hardware devices of the electronic device 20 and the computer program 222, so that the processor 21 can operate on and process the massive data 223 in the memory 22; it may be Windows Server, NetWare, Unix, Linux, or the like. Besides the computer program used to carry out the storage cluster upgrade control method performed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs used to carry out other specific tasks. The data 223 may include the weight class information and weight information collected by the electronic device 20.
The embodiment of the present application further discloses a non-volatile readable storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the storage cluster upgrade control method disclosed in any of the foregoing embodiments.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
Finally, it should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The storage cluster upgrade control method, apparatus, device, and non-volatile readable storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. A person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification shall not be construed as limiting the present application.

Claims (20)

  1. A storage cluster upgrade control method, characterized by comprising:
    determining all weight classes in a storage cluster and the weights corresponding to the weight classes; wherein the weight classes comprise all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster; and each weight represents the number of nodes that the corresponding weight class allows to be upgraded concurrently;
    determining the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to the target weight classes, so as to generate a weight matrix for each storage node;
    controlling the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node.
  2. The storage cluster upgrade control method according to claim 1, characterized in that determining the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to the target weight classes comprises:
    determining the storage services deployed on each storage node in the storage cluster and the underlying fault domain to which the storage node belongs as the target weight classes corresponding to that storage node, and determining the target weights corresponding to the target weight classes from all weight classes and the weights corresponding to the weight classes.
  3. The storage cluster upgrade control method according to claim 1, characterized in that, after generating the weight matrix for each storage node, the method further comprises:
    in a case where another weight class is added to the weight matrix, adding the weight corresponding to that weight class to the weight matrix.
  4. The storage cluster upgrade control method according to claim 1, characterized in that controlling the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node comprises:
    placing the storage nodes whose weight matrices contain no duplicate target weight classes into a concurrent upgrade node group, and controlling the storage nodes in the concurrent upgrade node group to perform online concurrent upgrade; wherein the concurrent upgrade node group contains at least one storage node;
    placing the remaining storage nodes of the storage cluster that were not placed into the concurrent upgrade node group into a remaining storage node group, and subtracting 1 from each target weight in the weight matrices of the remaining storage node group that corresponds to a target weight class of a storage node just placed into the concurrent upgrade node group, obtaining an updated weight matrix for each storage node;
    controlling the online concurrent upgrade process of each storage node in the storage cluster based on the updated weight matrix of each storage node.
  5. The storage cluster upgrade control method according to claim 4, characterized in that placing the storage nodes whose weight matrices contain no duplicate target weight classes into a concurrent upgrade node group comprises:
    processing the weight matrices of all storage nodes using the bubbling principle to identify the storage nodes whose weight matrices contain no duplicate target weight classes, and placing them into the concurrent upgrade node group.
  6. The storage cluster upgrade control method according to claim 5, characterized in that processing the weight matrices of all storage nodes using the bubbling principle to identify the storage nodes whose weight matrices contain no duplicate target weight classes comprises:
    selecting, from a weight matrix table, the weight matrices of nodes whose weight classes are completely different, wherein the weight matrix table contains the weight matrices of all nodes of the storage cluster, and completely different weight classes indicate that the nodes are isolated from one another;
    selecting, from the weight matrices of nodes whose weight classes are completely different, the storage nodes whose weight classes are completely different.
  7. The storage cluster upgrade control method according to claim 4, characterized in that controlling the online concurrent upgrade process of each storage node in the storage cluster based on the updated weight matrix of each storage node comprises:
    eliminating from the remaining storage node group the storage nodes whose weight matrices contain a target weight of 0, obtaining the remaining storage node group after elimination; wherein no target weight in the weight matrices of the storage nodes in the remaining storage node group after elimination is 0;
    selecting, from the remaining storage node group after elimination, the storage node with the fewest weight classes, and placing that storage node into the concurrent upgrade node group;
    subtracting 1 from each target weight in the weight matrices of the remaining storage node group after elimination that corresponds to a target weight class of the storage node just placed into the concurrent upgrade node group, obtaining an updated weight matrix for each storage node;
    repeating the node elimination step, the step of screening storage nodes by number of weight classes, and the weight decrement step, until the latest weight matrix of every storage node in the remaining storage node group after elimination contains a target weight of 0, obtaining the latest remaining storage node group.
  8. The storage cluster upgrade control method according to claim 7, characterized in that selecting, from the remaining storage node group after elimination, the storage node with the fewest weight classes comprises:
    if several storage nodes in the remaining storage node group after elimination are tied for the fewest weight classes, randomly selecting one of those storage nodes as the storage node with the fewest weight classes.
  9. The storage cluster upgrade control method according to claim 7, characterized in that, after obtaining the latest remaining storage node group, the method further comprises:
    judging whether the storage nodes in the concurrent upgrade node group have finished upgrading and, if they have, restoring the weight matrices of the upgraded storage nodes and adding 1 to each target weight in the weight matrices of the latest remaining storage node group that corresponds to a target weight class of a restored storage node, obtaining the final remaining storage node group;
    repeating, for the final remaining storage node group, the node elimination step, the step of screening storage nodes by number of weight classes, the weight decrement step, and the node weight matrix restoration and weight increment step, until all storage nodes have been upgraded.
  10. The storage cluster upgrade control method according to claim 7, characterized in that, after obtaining the updated weight matrix for each storage node, the method further comprises:
    upgrading the selected nodes concurrently and, according to the weight classes of the upgrading nodes, subtracting 1 from the weights of the corresponding weight classes in the remaining nodes that have not been upgraded.
  11. The storage cluster upgrade control method according to claim 1, characterized in that the weight corresponding to each weight class is greater than or equal to 1.
  12. The storage cluster upgrade control method according to claim 1, characterized by further comprising:
    obtaining the real-time service pressure of the storage cluster, and adjusting the weights corresponding to the weight classes according to the real-time service pressure.
  13. The storage cluster upgrade control method according to claim 12, characterized in that adjusting the weights corresponding to the weight classes according to the real-time service pressure comprises:
    if the real-time service pressure exceeds a preset pressure value, raising the weight corresponding to the weight class to reduce the number of concurrent upgrades;
    if the real-time service pressure is below the preset pressure value, lowering the weight corresponding to the weight class to increase the number of concurrent upgrades.
  14. The storage cluster upgrade control method according to claim 1, characterized by further comprising:
    using an upgrade process controller to put the storage cluster upgrade into a paused state and archive the current upgrade progress, or to trigger the storage cluster to continue the upgrade.
  15. The storage cluster upgrade control method according to claim 14, characterized in that using the upgrade process controller to put the storage cluster upgrade into a paused state and archive the current upgrade progress comprises:
    in a case where an instruction from a user to interrupt and exit the upgrade process is received before the upgrade finishes, the upgrade process controller, in response to the instruction, triggering the cluster to pause the upgrade process.
  16. The storage cluster upgrade control method according to claim 14, characterized in that triggering the storage cluster to continue the upgrade comprises:
    in a case where the upgrade process is in the paused state, resuming the upgrade process of the storage cluster through the resume function of the upgrade process controller.
  17. The storage cluster upgrade control method according to any one of claims 1 to 16, characterized in that the storage cluster is a cluster in a distributed storage system whose underlying fault domain structure is based on a scalable pseudo-random data distribution algorithm structure.
  18. A storage cluster upgrade control apparatus, characterized by comprising:
    a weight class and weight determination module, configured to determine all weight classes in a storage cluster and the weights corresponding to the weight classes; wherein the weight classes comprise all storage services deployed in the storage cluster and the underlying fault domains of the storage cluster; and each weight represents the number of nodes that the corresponding weight class allows to be upgraded concurrently;
    a weight matrix generation module, configured to determine the target weight classes corresponding to each storage node in the storage cluster and the target weights corresponding to the target weight classes, so as to generate a weight matrix for each storage node;
    an upgrade module, configured to control the online concurrent upgrade process of each storage node in the storage cluster based on the weight matrix of each storage node.
  19. An electronic device, comprising a processor and a memory; wherein the memory is configured to store a computer program, and the computer program is loaded and executed by the processor to implement the storage cluster upgrade control method according to any one of claims 1 to 17.
  20. A non-volatile computer-readable storage medium, configured to store computer-executable instructions, wherein the computer-executable instructions, when loaded and executed by a processor, implement the storage cluster upgrade control method according to any one of claims 1 to 17.
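
The weight-based control recited in claim 18 and the pause and resume behaviour of claims 15 and 16 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The class `UpgradeScheduler`, its method names, the node names, and the class names such as `mon`, `osd`, and `rack-a` are all hypothetical. The admission rule (a node may start only when every weight class it belongs to is below its weight value) is an assumption about how the weight matrix is applied.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class UpgradeScheduler:
    # Weight class -> weight value (nodes allowed to upgrade concurrently).
    limits: dict[str, int]
    # Node -> weight classes it belongs to (its row of the weight matrix).
    node_classes: dict[str, set[str]]
    in_flight: dict[str, int] = field(default_factory=dict)
    upgrading: set[str] = field(default_factory=set)
    done: set[str] = field(default_factory=set)
    paused: bool = False

    def can_start(self, node: str) -> bool:
        """Assumed rule: every class of the node must be below its limit."""
        if self.paused or node in self.upgrading or node in self.done:
            return False
        return all(
            self.in_flight.get(c, 0) < self.limits[c]
            for c in self.node_classes[node]
        )

    def start_batch(self) -> list[str]:
        """Start every node that the weight values currently admit."""
        started = []
        for node in sorted(self.node_classes):
            if self.can_start(node):
                for c in self.node_classes[node]:
                    self.in_flight[c] = self.in_flight.get(c, 0) + 1
                self.upgrading.add(node)
                started.append(node)
        return started

    def finish(self, node: str) -> None:
        """Mark a node as upgraded and release its concurrency slots."""
        self.upgrading.remove(node)
        self.done.add(node)
        for c in self.node_classes[node]:
            self.in_flight[c] -= 1

    def pause(self) -> None:
        self.paused = True   # no new nodes are admitted while paused

    def resume(self) -> None:
        self.paused = False  # admission continues from the saved state


# Hypothetical cluster: two services and two fault domains as weight classes.
limits = {"mon": 1, "osd": 2, "rack-a": 1, "rack-b": 1}
node_classes = {
    "n1": {"mon", "osd", "rack-a"},
    "n2": {"osd", "rack-a"},
    "n3": {"osd", "rack-b"},
    "n4": {"mon", "osd", "rack-b"},
}
sched = UpgradeScheduler(limits, node_classes)
first = sched.start_batch()  # n1 and n3: one node per rack, one mon at a time
```

In this sketch, pausing only blocks the admission of new nodes. Nodes that are already upgrading are left to finish, which is one possible reading of the paused state rather than a behaviour stated in the claims.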
PCT/CN2023/131087 2022-11-11 2023-11-10 Storage cluster upgrade control method and apparatus, device, and storage medium WO2024099444A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211412298.8 2022-11-11
CN202211412298.8A CN115658116B (en) 2022-11-11 2022-11-11 Storage cluster upgrade control method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2024099444A1 (en) 2024-05-16

Family

ID=85020783

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/131087 WO2024099444A1 (en) 2022-11-11 2023-11-10 Storage cluster upgrade control method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN115658116B (en)
WO (1) WO2024099444A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115658116B (en) * 2022-11-11 2023-03-28 苏州浪潮智能科技有限公司 Storage cluster upgrade control method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021003677A1 (en) * 2019-07-09 2021-01-14 华为技术有限公司 Service upgrade method and apparatus in distributed system, and distributed system
CN112463185A (en) * 2020-11-12 2021-03-09 苏州浪潮智能科技有限公司 Distributed cluster online upgrading method and related components
CN112463195A (en) * 2020-12-07 2021-03-09 苏州浪潮智能科技有限公司 Method, system, terminal and storage medium for cluster grouping online upgrade
US10990286B1 (en) * 2019-10-30 2021-04-27 EMC IP Holding Company LLC Parallel upgrade of nodes in a storage system
CN115658116A (en) * 2022-11-11 2023-01-31 苏州浪潮智能科技有限公司 Storage cluster upgrade control method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103118076B (en) * 2013-01-11 2015-05-13 烽火通信科技股份有限公司 Upgraded server cluster system and load balancing method thereof
CN110597531B (en) * 2019-08-05 2022-11-08 平安科技(深圳)有限公司 Distributed module upgrading method and device and storage medium
CN112650624B (en) * 2020-12-25 2023-05-16 浪潮(北京)电子信息产业有限公司 Cluster upgrading method, device, equipment and computer readable storage medium
CN114697213A (en) * 2022-03-30 2022-07-01 联想(北京)有限公司 Upgrading method and device

Also Published As

Publication number Publication date
CN115658116B (en) 2023-03-28
CN115658116A (en) 2023-01-31

Similar Documents

Publication Publication Date Title
WO2024099444A1 (en) Storage cluster upgrade control method and apparatus, device, and storage medium
US10209908B2 (en) Optimization of in-memory data grid placement
US20180302335A1 (en) Orchestrating computing resources between different computing environments
US20050060608A1 (en) Maximizing processor utilization and minimizing network bandwidth requirements in throughput compute clusters
JP6246923B2 (en) Management server, computer system and method
EP2944070B1 (en) Service migration across cluster boundaries
US20070050768A1 (en) Incremental web container growth to control startup request flooding
US20090193113A1 (en) Systems and methods for grid-based data scanning
CN103744734A (en) Method, device and system for task operation processing
JP2018527668A (en) Method and system for limiting data traffic
US20120078858A1 (en) De-Duplicating Data in a Network with Power Management
EP3591530B1 (en) Intelligent backup and recovery of cloud computing environment
WO2014131263A1 (en) Rule set arrangement processing method and apparatus, and trunking data system
JP2004303190A (en) Program, information processor, method for controlling information processor, and recording medium
US20240134762A1 (en) System and method for availability group database patching
CN111522651A (en) Managing metadata for distributed processing systems
US8977752B2 (en) Event-based dynamic resource provisioning
US8819234B1 (en) Supplying data storage services
US10769153B2 (en) Computer system and method for setting a stream data processing system
CN114064438A (en) Database fault processing method and device
WO2024113898A1 (en) Metadata reporting method and apparatus, and device and storage medium
WO2014133502A1 (en) Sending a request to a management service
JP2014038551A (en) Data storage device, method for controlling data storage device, and control program of data storage device
JP6751231B2 (en) Job scheduler test program, job scheduler test method, and parallel processing device
US8214613B2 (en) Storage system and copy method

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23888129

Country of ref document: EP

Kind code of ref document: A1