CN116436839A

CN116436839A - Link self-adaptive fault tolerance method, device and server for storage multi-control cluster

Info

Publication number: CN116436839A
Application number: CN202310449577.XA
Authority: CN
Inventors: 吴磊; 王电轻
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2023-04-24
Filing date: 2023-04-24
Publication date: 2023-07-14

Abstract

The invention relates to the field of storage multi-control clusters and discloses a link self-adaptive fault-tolerance method, device and server for a storage multi-control cluster. The method comprises: acquiring a first priority path set and selecting a path from it for communication among the storage multi-control clusters; when the communication performance of any path in the first priority path set does not meet a preset communication performance, migrating that path from the first priority path set to a second priority path set; and when the first priority path set is empty, selecting paths from the second priority path set for communication among the storage multi-control clusters. The method removes from the first priority path set the paths that may cause large inter-cluster communication delays or timeouts, which preserves the cluster communication rate and prevents cluster lease timeouts caused by communication timeouts. Meanwhile, when no path remains in the first priority path set, a path is selected from the second priority path set, so that the service keeps running and abnormal service interruption is avoided.

Description

Link self-adaptive fault tolerance method, device and server for storage multi-control cluster

Technical Field

The invention relates to the technical field of storage multi-control clusters, in particular to a link self-adaptive fault-tolerant method, device and server of a storage multi-control cluster.

Background

With the continuous development of IT technology, data centers in industries such as finance place ever higher demands on the reliability of storage systems, often requiring 99.9999% reliability. These strict requirements have led to the adoption of multi-control storage clusters.

At present, a storage multi-control cluster link is treated as having only two states: normal and faulty. Owing to the complexity of real environments, however, a link can also be in an intermediate state. For example, when cluster nodes communicate over fiber links, a folded or broken fiber or a low-power optical module can introduce bit errors or delay, and a faulty optical module or switch can make a link flap repeatedly. Inter-cluster communication I/O (input/output) is then retried repeatedly or blocked on the unhealthy link, which leads to I/O timeouts; if the timeout reaches the cluster lease timeout, the cluster lease expires and the cluster service becomes abnormal.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

In view of this, the invention provides a link self-adaptive fault-tolerance method, device and server for a storage multi-control cluster, to solve the following problem: adding fiber and other connections to an existing storage multi-control cluster enlarges its fault domain, so cluster communication I/O may be repeatedly retried or blocked on unhealthy links, causing I/O timeouts and possibly, in the end, cluster lease timeouts.

In a first aspect, the present invention provides a link adaptive fault-tolerance method for a storage multi-control cluster. The method includes: acquiring a first priority path set and selecting a path from it for communication among the storage multi-control clusters, where each path includes at least one link; when the communication performance of any path in the first priority path set does not meet a preset communication performance, migrating that path from the first priority path set to a second priority path set; and when the first priority path set is empty, selecting paths from the second priority path set for communication among the storage multi-control clusters. In this way, paths whose links may cause large delays or timeouts in inter-cluster communication are removed from the first priority path set, which maximizes the cluster communication rate and prevents cluster lease timeouts caused by communication timeouts. Meanwhile, when the first priority path set is empty, paths from the second priority path set keep inter-cluster communication going, so that the service continues to run and is not abnormally interrupted.
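A minimal Python sketch of the first aspect, assuming paths are identified by strings and that round-robin is the routing policy (the embodiments list it as one option among several). `PathSets`, `demote` and `select` are hypothetical names introduced for illustration, not part of the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class PathSets:
    """Hypothetical container for the two priority path sets of the first aspect."""
    opt: list[str] = field(default_factory=list)      # first priority path set
    degrade: list[str] = field(default_factory=list)  # second priority path set
    _rr: int = 0                                      # round-robin cursor (assumed policy)

    def demote(self, path: str) -> None:
        """Move a path whose performance misses the preset target to the second set."""
        if path in self.opt:
            self.opt.remove(path)
            self.degrade.append(path)

    def select(self) -> str:
        """Pick from the first set; fall back to the second set only when the first is empty."""
        pool = self.opt if self.opt else self.degrade
        if not pool:
            raise RuntimeError("no path available in either priority set")
        path = pool[self._rr % len(pool)]
        self._rr += 1
        return path
```

Keeping the fallback inside `select` means callers never need to know which set served the I/O; they only see a `RuntimeError` when both sets are empty.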

In an alternative embodiment, after selecting a path in the first priority path set for communication between storage multi-control clusters, the method further comprises:

when any path in the first priority path set fails, migrating the failed path from the first priority path set to a fault path set, where a path fault includes a physical disconnection of any link in the path and/or a fault of a node on any link;

issuing a routine test program to each failed path in the fault path set to detect whether it has resumed normal communication;

when a failed path resumes normal communication, migrating it from the fault path set back to the first priority path set.
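The fault-path handling above can be sketched as follows. `fail_path`, `probe_fault_set` and the `probe` callback (standing in for the routine test program) are hypothetical names, and a probe is assumed to return True once the path communicates normally again.

```python
from typing import Callable


def fail_path(opt: list[str], fault: list[str], path: str) -> None:
    """Move a path with a physical disconnect or node fault out of the first priority set."""
    if path in opt:
        opt.remove(path)
        fault.append(path)


def probe_fault_set(opt: list[str], fault: list[str],
                    probe: Callable[[str], bool]) -> list[str]:
    """Send one routine test to every failed path; move recovered paths back.

    `probe` is a hypothetical stand-in for the routine test program."""
    recovered = [p for p in fault if probe(p)]  # decide first, then mutate the lists
    for p in recovered:
        fault.remove(p)
        opt.append(p)
    return recovered
```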

In an alternative embodiment, a path that does not meet the preset communication performance includes:

a path whose communication performance falls short of the preset communication performance because any link in the path is unstable, delayed, or has check errors.

In an alternative embodiment, the step of migrating a path that does not meet the preset communication performance from the first priority path set to the second priority path set includes:

when the number of flaps (link down/up events) of the link corresponding to any path in the first priority path set within a preset time exceeds a preset threshold, migrating the path from the first priority path set to the second priority path set;

when, under the routine test program, at least a first preset percentage of the input/output data on the link corresponding to any path in the first priority path set exceeds a preset delay time, migrating the path from the first priority path set to the second priority path set;

and calculating the newly added cyclic redundancy check (CRC) errors on the link corresponding to any path in the first priority path set, and when the increase in newly added CRC errors relative to the previous calculation point exceeds a second preset percentage, migrating the path from the first priority path set to the second priority path set.

In an alternative embodiment, after migrating paths that do not meet the preset communication performance from the first priority path set to the second priority path set, the method further includes:

issuing a routine test program to each degraded path in the second priority path set;

when the communication performance of a degraded path meets the preset communication performance throughout a preset time, migrating the degraded path from the second priority path set back to the first priority path set;

and when the number of flaps of the link corresponding to any path in the second priority path set within the preset time exceeds the preset threshold, migrating the path from the second priority path set to the fault path set.

In an alternative embodiment, the method further comprises:

when both the first priority path set and the second priority path set are empty, reporting a cluster communication fault alarm.

In a second aspect, the present invention provides a link adaptive fault-tolerance device for a storage multi-control cluster. The device mainly includes a first path selection module, a path migration module and a second path selection module. The first path selection module acquires a first priority path set and selects paths from it for communication among the storage multi-control clusters, where each path includes at least one link. The path migration module migrates any path in the first priority path set whose communication performance does not meet the preset communication performance to the second priority path set. The second path selection module selects paths from the second priority path set for communication among the storage multi-control clusters when the first priority path set is empty. As with the first aspect, paths whose links may cause large delays or timeouts are removed from the first priority path set, which maximizes the cluster communication rate and prevents cluster lease timeouts, while the second priority path set keeps the service running when the first priority path set is empty.

In a third aspect, the present invention provides a server comprising a memory and a processor that are communicatively connected to each other. The memory stores computer instructions, and the processor executes them to perform the link adaptive fault-tolerance method of the storage multi-control cluster according to the first aspect or any of its embodiments.

In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions that cause a computer to perform the link adaptive fault-tolerance method of the storage multi-control cluster according to the first aspect or any of its embodiments.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic illustration of an application environment of an embodiment of the present invention;

FIG. 2 is a flow chart of a link adaptive fault tolerance method for a storage multi-control cluster according to an embodiment of the present invention;

FIG. 3 is a flow chart of another link adaptive fault-tolerance method of a storage multi-control cluster according to an embodiment of the present invention;

FIG. 4 is a flow chart of yet another link adaptive fault-tolerance method of a storage multi-control cluster according to an embodiment of the present invention;

FIG. 5 is a data flow diagram of a further link adaptive fault-tolerance method of a storage multi-control cluster according to an embodiment of the present invention;

FIG. 6 is a block diagram of a link adaptive fault-tolerance device of a storage multi-control cluster according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a hardware structure of a server according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to FIG. 1, FIG. 1 is a schematic diagram of an application environment according to an embodiment of the present invention. It includes a first storage multi-control server 10 and a second storage multi-control server 20, which are connected through a fiber switch 30 to form a storage multi-control cluster. The communication paths between the first storage multi-control server 10 and the second storage multi-control server 20 constitute the cluster links used for communication among the storage multi-control clusters; each path includes at least one link.

Cluster software is deployed on each storage multi-control server to form a storage multi-control cluster system. Through the cluster software, the system assigns all cluster links to the first priority path set and selects paths from that set for communication. When any path in the first priority path set fails, the failed path is migrated to the fault path set, and it is migrated back to the first priority path set once its link is detected to have resumed normal communication. When the communication performance of any path in the first priority path set does not meet the preset communication performance, that path is migrated to the second priority path set, and each degraded path in the second priority path set is later migrated either back to the first priority path set or to the fault path set according to its communication performance. When the first priority path set is empty, paths are selected from the second priority path set for communication; when both the first and the second priority path sets are empty, a cluster communication fault alarm is reported.

The two storage multi-control servers are only an example; storage multi-control clusters include dual-control, four-control, six-control, eight-control, sixteen-control or larger configurations. A cluster link is a communication path between storage multi-control servers. For example, one dual-control storage server constitutes one controller enclosure; when two dual-control storage servers form a storage multi-control cluster, they are connected and communicate via the Fibre Channel (FC) protocol or another protocol, and the cluster link may be a communication link between the two dual-control storage servers.

According to an embodiment of the present invention, an embodiment of a link adaptive fault-tolerance method for a storage multi-control cluster is provided. Note that the steps shown in the flowcharts may be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
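The migrations described above form a small state machine over three path sets. The sketch below, with hypothetical names, encodes only the transitions this embodiment describes plus the alarm condition; it illustrates the rules and is not the cluster software's actual implementation.

```python
from enum import Enum


class PathState(Enum):
    OPT = "first priority path set"
    DEGRADE = "second priority path set"
    FAULT = "fault path set"


# Migrations named in the embodiment; any other move is rejected (assumption).
ALLOWED = {
    (PathState.OPT, PathState.FAULT),      # path fails
    (PathState.FAULT, PathState.OPT),      # failed path recovers
    (PathState.OPT, PathState.DEGRADE),    # performance below preset target
    (PathState.DEGRADE, PathState.OPT),    # performance back on target
    (PathState.DEGRADE, PathState.FAULT),  # degraded path keeps flapping
}


def migrate(states: dict[str, PathState], path: str, target: PathState) -> None:
    """Move `path` to `target`, refusing transitions the embodiment does not describe."""
    current = states[path]
    if (current, target) not in ALLOWED:
        raise ValueError(f"{path}: {current.name} -> {target.name} not allowed")
    states[path] = target


def needs_alarm(states: dict[str, PathState]) -> bool:
    """True when neither priority set holds a path (cluster communication fault alarm)."""
    return not any(s in (PathState.OPT, PathState.DEGRADE) for s in states.values())
```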

In this embodiment, a link adaptive fault tolerance method of a storage multi-control cluster is provided, which may be used in the storage multi-control server described above, and fig. 2 is a flowchart of a link adaptive fault tolerance method of a storage multi-control cluster according to an embodiment of the present invention, as shown in fig. 2, where the flowchart includes the following steps:

Step S201, a first priority path set is obtained, and a path is selected in the first priority path set to carry out communication among storage multi-control clusters.

In this embodiment, communication connections between the storage multi-control servers are first established to obtain a storage multi-control cluster. The cluster links formed by the communication paths within the cluster are then assigned to the first priority (preferred) path set, and paths are selected from that set for communication, which ensures the efficiency and quality of communication among the storage multi-control clusters. Each path includes at least one link.

In an alternative implementation, two storage multi-control servers are running normally, are connected to a fiber switch through optical fibers, and communicate through zones configured on the switch. The two servers establish a storage multi-control cluster system through the cluster software, and the system's path health status monitoring program is started. At the same time, all cluster links formed by the communication paths within the cluster are assigned to the first priority path set, and paths are selected from that set for cluster communication. In other words, the optimal path is selected dynamically according to the input/output load on each physical link between the storage multi-control servers, which improves the efficiency, reliability and stability of input/output data transmission. The path health status monitoring program is actually a sub-module of the cluster software; it can be understood as an independent process that runs on the configuration node of the storage multi-control cluster and monitors the state of the cluster links.

Optionally, a path may be selected from the first priority path set by a routing algorithm, including but not limited to a round-robin scheduling algorithm, a minimum input/output queue depth scheduling algorithm, and a minimum input/output task amount scheduling algorithm.

When the number of input/output queues and the queued bytes on each physical link are roughly comparable, the round-robin algorithm can be used, that is, input/output data is issued to each link in turn. When the number of queued input/output requests differs considerably between links, the minimum queue depth algorithm can be used, that is, the path with the fewest queued input/output requests is selected to carry the current input/output data. When the queued input/output bytes on some physical links are large, the minimum task amount algorithm can be used, that is, the current input/output data is inserted into the queue with relatively lower data transmission pressure.

For example, two storage multi-control servers are connected through a fiber switch, and the cluster software assigns all cluster links to the first priority path set (OptPathSet), giving OptPathSet [Path1, Path2 … PathN]. In the initial state, the storage multi-control cluster system issues input/output data only on paths selected from the OptPathSet according to the routing algorithm; if the routing algorithm is round-robin, the input/output data of cluster communication is issued to all paths in the OptPathSet in turn.
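The three routing options can be sketched as selection functions over per-link queue statistics. `LinkQueue` and its fields are assumptions introduced for illustration; the disclosure does not specify how queue depth or pending bytes are tracked.

```python
from dataclasses import dataclass


@dataclass
class LinkQueue:
    """Hypothetical per-link queue statistics kept by the cluster software."""
    name: str
    depth: int = 0          # outstanding I/O requests on this link
    pending_bytes: int = 0  # total bytes of those outstanding requests


def round_robin(links: list[LinkQueue], turn: int) -> LinkQueue:
    """Issue I/O to each link in turn; suits links with comparable load."""
    return links[turn % len(links)]


def min_queue_depth(links: list[LinkQueue]) -> LinkQueue:
    """Pick the link with the fewest queued requests."""
    return min(links, key=lambda q: q.depth)


def min_task_amount(links: list[LinkQueue]) -> LinkQueue:
    """Pick the link with the fewest queued bytes (lowest transmission pressure)."""
    return min(links, key=lambda q: q.pending_bytes)
```

With `Path2` holding one large request and `Path3` two small ones, queue depth favors `Path2` while task amount favors `Path3`, which is why the choice depends on whether request counts or byte volumes are uneven.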

Step S202, when the communication performance of any path in the first priority path set is not consistent with the preset communication performance, the path not consistent with the preset communication performance is migrated from the first priority path set to the second priority path set.

In this embodiment, after input/output data is issued on paths selected from the first priority path set, the path health status monitoring program monitors the communication performance of each path. When the communication performance of any path in the first priority path set does not meet the preset communication performance, that path is migrated to the second priority path set and no longer takes part in communication among the storage multi-control clusters. This keeps the communication links of the cluster in a healthy online state and avoids inter-cluster communication timeouts or failures caused by unstable links, high latency, bit errors and similar causes.

Optionally, a path that does not meet the preset communication performance includes a path whose communication performance falls short because any of its links is unstable, delayed, or has check errors. Link instability includes link flapping, for example flapping caused by a fabric switch problem; link delay includes delay caused by link flapping and/or bit errors; and link check errors include bit errors caused by a folded or broken fiber cable or a low-power optical module.

In an alternative implementation, if any link of Path3 shows bit errors because the fiber cable is folded or broken or the optical module has low power, or if the link flaps repeatedly because of a fiber switch problem, Path3 is removed from the OptPathSet and placed in the second priority path set (degradePathSet). The OptPathSet then becomes [Path1, Path2, Path4 … PathN] and the degradePathSet becomes [Path3]. If paths on other links also flap or show bit errors, the OptPathSet and degradePathSet are updated accordingly. As long as an available path exists in the OptPathSet, input/output data for communication among the storage multi-control clusters is issued only on paths selected from the OptPathSet, which avoids the performance degradation or input/output blocking caused by retries on faulty links.

In step S203, when the paths in the first priority path set are empty, a path is selected from the second priority path set to perform communication between the storage multi-control clusters.

In this embodiment, when the first priority path set is empty, paths are selected from the second priority path set for communication among the storage multi-control clusters, so that the system temporarily keeps basic operation instead of the cluster crashing outright.

In an alternative implementation, if every path in the OptPathSet has failed or degraded so that none is available, the storage multi-control cluster system selects paths from the degradePathSet to send input/output data. The service therefore continues to run: although some delay or timeouts may occur, the storage multi-control clusters can still communicate.

In summary, with this link self-adaptive fault-tolerance method, paths are selected from the first priority path set for communication, which ensures the efficiency and quality of communication among the storage multi-control clusters. Any path whose communication performance does not meet the preset communication performance is migrated to the second priority path set and no longer carries input/output data, so the cluster's communication links stay in a healthy online state and timeouts or failures caused by unstable links, high latency, bit errors and similar causes are avoided. When the first priority path set is empty, paths from the second priority path set are used to temporarily keep the cluster system running instead of letting it crash.

In this embodiment, a link adaptive fault tolerance method of a storage multi-control cluster is provided, which may be used in the storage multi-control server described above, and fig. 3 is a flowchart of a link adaptive fault tolerance method of a storage multi-control cluster according to an embodiment of the present invention, as shown in fig. 3, where the flowchart includes the following steps:

Step S301, a first priority path set is obtained, and a path is selected in the first priority path set to carry out communication among storage multi-control clusters.

For details, refer to step S201 in the embodiment shown in FIG. 2; they are not repeated here.

In step S302, when the communication performance of any path in the first priority path set is not consistent with the preset communication performance, the path not consistent with the preset communication performance is migrated from the first priority path set to the second priority path set.

Specifically, the step S302 includes:

Step S3021, when the link corresponding to any path in the first priority path set flaps more than a preset threshold number of times within a preset time, the path is migrated from the first priority path set to the second priority path set.

In this embodiment, a path whose link flaps more than the preset threshold number of times within the preset time is migrated from the first priority path set to the second priority path set and no longer carries input/output data among the storage multi-control clusters. This keeps the cluster's communication links in a healthy online state and avoids communication timeouts or failures caused by unstable links. The preset time and the preset threshold can be adjusted to actual requirements.

In an alternative embodiment, if the link of Path3 flaps repeatedly because of a fabric switch problem and flaps more than 3 times within 30 minutes, the link is considered unstable: Path3 is removed from the OptPathSet and placed in the second priority path set (degradePathSet), so that the OptPathSet becomes [Path1, Path2, Path4 … PathN] and the degradePathSet becomes [Path3]. If other links also flap or show bit errors, the OptPathSet and degradePathSet are updated accordingly. Each time the link goes down, the driver returns a link error (linkdown) error code to the storage system; the storage system marks the cluster link as faulty, removes it from the preferred path set, and returns it to the preferred path set after the link recovers. If this happens repeatedly, the link is proven unstable: although it works intermittently, it is unstable overall. This case is called link-instability degradation.
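A sketch of the flap criterion, assuming each linkdown error code is recorded with a timestamp in seconds. `FlapCounter` is a hypothetical helper, and its defaults are the example values above (more than 3 flaps in 30 minutes).

```python
from collections import deque


class FlapCounter:
    """Counts linkdown events in a sliding window (hypothetical helper).

    Defaults follow the example: window of 30 minutes, demote on more than 3 flaps."""

    def __init__(self, window_s: float = 30 * 60, threshold: int = 3) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self.events: deque = deque()  # timestamps of recent linkdown events

    def record_linkdown(self, t: float) -> bool:
        """Record a linkdown at time t; return True if the path should be demoted."""
        self.events.append(t)
        # Drop events that fall outside the window ending at t.
        while self.events and self.events[0] <= t - self.window_s:
            self.events.popleft()
        return len(self.events) > self.threshold
```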

In step S3022, when, under the routine test program, at least a first preset percentage of the input/output data on the link corresponding to any path in the first priority path set exceeds a preset delay time, the path is migrated from the first priority path set to the second priority path set.

In this embodiment, migrating such high-latency paths to the second priority path set keeps the cluster's communication links in a healthy online state and avoids communication timeouts or failures caused by high link latency. The first preset percentage and the delay time can be adjusted to actual requirements.

In an alternative embodiment, the routine test program checks the response delay of the link of Path3. If 50% of 100 input/output requests (that is, 50 of them) take longer than 500 ms, the link is considered high-latency and marked as high-latency degradation; Path3 is removed from the OptPathSet and placed in the second priority path set (degradePathSet), so that the OptPathSet becomes [Path1, Path2, Path4 … PathN] and the degradePathSet becomes [Path3]. If other links also flap or show bit errors, the OptPathSet and degradePathSet are updated accordingly.
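The high-latency criterion from this example, as a sketch. `is_high_latency` is a hypothetical name, and treating exactly 50% slow requests as a breach is an assumption drawn from the "50% of 100" wording.

```python
def is_high_latency(latencies_ms: list[float], delay_ms: float = 500.0,
                    ratio: float = 0.5) -> bool:
    """True when at least `ratio` of the probe I/Os took longer than `delay_ms`.

    Defaults are the example values (500 ms, 50%); ">= ratio" is an assumption."""
    if not latencies_ms:
        return False  # no samples yet: do not degrade the path
    slow = sum(1 for lat in latencies_ms if lat > delay_ms)
    return slow / len(latencies_ms) >= ratio
```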

In step S3023, the newly added cyclic redundancy check (CRC) errors on the link corresponding to any path in the first priority path set are calculated, and when their increase relative to the previous calculation point exceeds a second preset percentage, the path is migrated from the first priority path set to the second priority path set.

In this embodiment, the newly added CRC errors on the link of each path in the first priority path set are calculated periodically, at irregular intervals, or on request. When their increase relative to the previous calculation point exceeds the second preset percentage, the path is migrated to the second priority path set, which keeps the cluster's communication links in a healthy online state and avoids communication timeouts or failures caused by link bit errors and similar causes. The second preset percentage can be adjusted to actual requirements.

In an alternative embodiment, the newly added CRC errors on the link of Path3 (mostly caused by bit errors from a folded or broken fiber or a low-power optical module) are counted periodically, at irregular intervals, or on request. If they increase by more than 80% compared with the previous period, the link is marked as bit-error degradation: Path3 is removed from the OptPathSet and placed in the second priority path set (degradePathSet), so that the OptPathSet becomes [Path1, Path2, Path4 … PathN] and the degradePathSet becomes [Path3]. If other links also flap or show bit errors, the OptPathSet and degradePathSet are updated accordingly.
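A sketch of the CRC-growth check, interpreting "increase" as the relative growth of newly added errors from one counting period to the next. The function names, the zero-baseline rule and the counter-reset handling are assumptions not stated in the source.

```python
def new_crc_errors(cumulative_before: int, cumulative_now: int) -> int:
    """Newly added CRC errors between two reads of a cumulative link counter."""
    if cumulative_now < cumulative_before:
        # Assumption: the counter was reset to zero, so everything counted is new.
        return cumulative_now
    return cumulative_now - cumulative_before


def crc_degraded(prev_new_errors: int, curr_new_errors: int,
                 growth: float = 0.8) -> bool:
    """True when this period's new errors grew by more than `growth` (80% in the example)."""
    if prev_new_errors == 0:
        # Assumption: with no baseline, any new CRC error counts as bit-error degradation.
        return curr_new_errors > 0
    increase = (curr_new_errors - prev_new_errors) / prev_new_errors
    return increase > growth
```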

Step S303, issuing the example test program to each degraded path in the second priority path set.

In an alternative embodiment, the path health monitoring program may issue the example test program to the paths in the second priority path set periodically, at irregular intervals, or on a test request; for example, it may be issued to each degraded path in the second priority path set every 5 seconds. The example test program is management I/O actively issued by the cluster software: it routinely sends I/O over all cluster links and checks the corresponding data of that I/O.

In step S304, when the communication performance of a degraded path meets the preset communication performance throughout the preset time, the degraded path is migrated from the second priority path set to the first priority path set.

In an alternative embodiment, a degraded path is migrated from the second priority path set back to the first priority path set when it shows no further flaps, bit errors, or high delay for 30 minutes.
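
The 30-minute recovery rule, combined with the 5-second probing of degraded paths, can be sketched as a tracker that restarts a stability window whenever a probe sees a flap, a bit error, or high delay. `DegradedPathTracker` and its methods are hypothetical names; the sketch assumes the caller runs the probes every 5 seconds and passes in a timestamp in seconds.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PROBE_INTERVAL_S = 5          # caller probes each degraded path this often
STABLE_WINDOW_S = 30 * 60     # 30 minutes without flaps, bit errors or high delay


@dataclass
class DegradedPathTracker:
    """Remembers when each degraded path last misbehaved (hypothetical helper)."""
    window_s: float = STABLE_WINDOW_S
    last_bad_s: Dict[str, float] = field(default_factory=dict)

    def degrade(self, path: str, now_s: float) -> None:
        """Start tracking a path that has just entered the second priority path set."""
        self.last_bad_s[path] = now_s

    def record_probe(self, path: str, healthy: bool, now_s: float) -> None:
        """Record one probe result; an unhealthy probe restarts the window."""
        if not healthy and path in self.last_bad_s:
            self.last_bad_s[path] = now_s

    def pop_restorable(self, now_s: float) -> List[str]:
        """Remove and return paths that stayed healthy for the whole window."""
        ready = sorted(p for p, t in self.last_bad_s.items()
                       if now_s - t >= self.window_s)
        for p in ready:
            del self.last_bad_s[p]
        return ready


tracker = DegradedPathTracker()
tracker.degrade("path3", now_s=0)
tracker.record_probe("path3", healthy=False, now_s=600)  # another anomaly at 10 min
print(tracker.pop_restorable(now_s=1800))  # []: only 20 min since the last anomaly
print(tracker.pop_restorable(now_s=2400))  # ['path3']: return it to the first priority set
```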

When a link goes down, the driver returns a linkdown error code to the storage multi-control cluster system, which removes the path from the preferred path set; after the link recovers, the path is moved back to the preferred path set. If this happens repeatedly, the link is proven unstable: although it works intermittently, it is unstable overall, so such an unstable link is degraded.

In step S305, when the link corresponding to any path in the second priority path set flaps more than a preset threshold number of times within a preset time, the path corresponding to that link is migrated from the second priority path set to the fault path set.

In an alternative embodiment, a degraded path is migrated from the second priority path set to the fault path set when it flaps more than 3 times within 30 minutes. The preset time and the value of the preset threshold can be adjusted according to actual requirements.
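
Counting "more than 3 flaps (intermittent disconnections) in 30 minutes" is naturally done with a sliding window of event timestamps. This is a hedged sketch: `FlapCounter` is a hypothetical name, the thresholds come from the example, and treating a flap exactly 30 minutes old as still inside the window is our assumption.

```python
from collections import deque
from typing import Deque, Dict

FLAP_WINDOW_S = 30 * 60   # 30 minutes
FLAP_THRESHOLD = 3        # more than 3 flaps inside the window triggers migration


class FlapCounter:
    """Sliding-window count of link flaps per path (hypothetical helper)."""

    def __init__(self, window_s: float = FLAP_WINDOW_S,
                 threshold: int = FLAP_THRESHOLD) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self._events: Dict[str, Deque[float]] = {}

    def record_flap(self, path: str, now_s: float) -> bool:
        """Record one flap; return True if the path now exceeds the threshold."""
        events = self._events.setdefault(path, deque())
        events.append(now_s)
        # Discard flaps that are older than the window.
        while events and now_s - events[0] > self.window_s:
            events.popleft()
        return len(events) > self.threshold


counter = FlapCounter()
print([counter.record_flap("path3", t) for t in (0, 300, 600, 900)])
# [False, False, False, True]: the 4th flap within 30 min moves path3 to the fault path set
```

A deque keeps both appending new flaps and expiring old ones at constant cost per event.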

In step S306, when the first priority path set is empty, a path is selected from the second priority path set for communication between the storage multi-control clusters.

For details, refer to step S203 of the embodiment shown in Fig. 2; they are not repeated here.

Step S307, when both the first priority path set and the second priority path set are empty, reporting a cluster communication fault alarm.
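
The tiered selection of steps S306 and S307 amounts to a simple priority check. The sketch below is illustrative only: `select_paths` and `ClusterCommFault` are hypothetical names, and raising an exception stands in for whatever alarm mechanism the cluster software actually uses.

```python
from typing import List, Tuple


class ClusterCommFault(Exception):
    """Stands in for the cluster communication fault alarm (hypothetical)."""


def select_paths(opt_path_set: List[str],
                 degrade_path_set: List[str]) -> Tuple[str, List[str]]:
    """Return (tier, candidate paths) in the priority order described above."""
    if opt_path_set:
        return "opt", list(opt_path_set)
    if degrade_path_set:
        # Degraded paths may add delay or timeouts, but they keep the service running.
        return "degrade", list(degrade_path_set)
    raise ClusterCommFault("no usable inter-cluster path")


print(select_paths(["path1", "path4"], ["path3"]))  # ('opt', ['path1', 'path4'])
print(select_paths([], ["path3"]))                  # ('degrade', ['path3'])
```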

In the link self-adaptive fault-tolerant method for the storage multi-control cluster of this embodiment, a first priority path set is obtained and paths are selected from it for communication, which ensures the efficiency and quality of communication within the storage multi-control cluster. When the communication performance of any path in the first priority path set does not meet the preset communication performance, that path is migrated to the second priority path set and no longer carries input/output data between the storage multi-control clusters; this keeps the inter-cluster communication links in a healthy online state and avoids communication timeouts or failures caused by link bit errors and similar problems. When the first priority path set is empty, paths are selected from the second priority path set, which temporarily keeps the storage multi-control cluster system running in a basic state instead of letting it crash. The example test program is issued to each degraded path in the second priority path set, and a degraded path whose communication performance meets the preset communication performance within the preset time is migrated back to the first priority path set, so that a normally communicating path does not remain in the second priority path set. Degraded paths that still cannot resume normal communication within the preset time are migrated from the second priority path set to the fault path set, which keeps the second priority path set reliable and prompts the user to troubleshoot the fault path in time. When both the first priority path set and the second priority path set are empty, a cluster communication fault alarm is reported to prompt the user to investigate the faulty paths in time.

This embodiment provides a link adaptive fault tolerance method for a storage multi-control cluster, which may be used in the storage multi-control server described above. Fig. 4 is a flowchart of the method according to an embodiment of the present invention; as shown in Fig. 4, the flow includes the following steps:

Step S401, obtaining a first priority path set and selecting a path in it for communication between storage multi-control clusters.

Step S402, when any path in the first priority path set fails, the failed path is migrated from the first priority path set to the failed path set.

A path failure includes a broken physical connection on any link of the path and/or a failed node on any link.

In an alternative embodiment, if the physical connection of any link in path2 is broken and/or a node on any link fails, path2 is removed from the OptPathSet and placed into the fault path set (FaultPathSet), so that OptPathSet is [path1, path3 … pathN] and FaultPathSet is [path2]; if links in other paths also break, the OptPathSet and FaultPathSet are updated accordingly. Path failure handling follows the T10 standard protocols (t10.org). For example, after a link is unplugged, the driver returns a linkdown error code to the storage multi-control cluster system, which then marks the path as faulty according to its established logic.

Step S403, issuing the example test program to each fault path in the fault path set to detect whether the fault path has resumed normal communication.

In an alternative embodiment, the path health monitoring program issues the example test program to the paths in the fault path set every 5 seconds. This period can be adjusted according to actual requirements, and the example test program may also be issued aperiodically or on an example test request.

Step S404, when the fault path resumes normal communication, the fault path is migrated from the fault path set to the first priority path set.

In this embodiment, when a fault path resumes normal communication, it is migrated from the fault path set to the first priority path set and carries normal communication again.
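
Steps S402 to S404 can be sketched as two operations on the path sets: a linkdown report moves a path into the fault path set, and each probe round moves recovered paths back to the first priority path set. `PathSets`, `on_link_down` and `probe_fault_paths` are hypothetical names; the probe is passed in as a function because the actual example test I/O is not specified here. Moving a degraded path to the fault set on linkdown follows the application embodiment described later in this document.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PathSets:
    """Hypothetical container for the three path sets."""
    opt: List[str] = field(default_factory=list)      # OptPathSet
    degrade: List[str] = field(default_factory=list)  # DegradePathSet
    fault: List[str] = field(default_factory=list)    # FaultPathSet

    def on_link_down(self, path: str) -> None:
        """Driver reported linkdown on a link of `path`: move it to FaultPathSet."""
        for current in (self.opt, self.degrade):
            if path in current:
                current.remove(path)
        if path not in self.fault:
            self.fault.append(path)

    def probe_fault_paths(self, probe: Callable[[str], bool]) -> List[str]:
        """One probe round (e.g. every 5 s); `probe` returns True if the test I/O
        succeeded. Recovered paths are moved back to OptPathSet."""
        recovered = [p for p in self.fault if probe(p)]
        for p in recovered:
            self.fault.remove(p)
            self.opt.append(p)
        return recovered


sets = PathSets(opt=["path1", "path2", "path3"])
sets.on_link_down("path2")                     # step S402
print(sets.opt, sets.fault)                    # ['path1', 'path3'] ['path2']
print(sets.probe_fault_paths(lambda p: True))  # ['path2'] (steps S403 and S404)
print(sets.opt, sets.fault)                    # ['path1', 'path3', 'path2'] []
```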

In step S405, when the communication performance of any path in the first priority path set does not meet the preset communication performance, that path is migrated from the first priority path set to the second priority path set.

For details, refer to step S202 of the embodiment shown in Fig. 2; they are not repeated here.

In step S406, when the first priority path set is empty, a path is selected from the second priority path set for communication between the storage multi-control clusters.

In the link self-adaptive fault-tolerant method for the storage multi-control cluster of this embodiment, a first priority path set is obtained and paths are selected from it for communication, which ensures the efficiency and quality of communication within the storage multi-control cluster. When the communication performance of any path in the first priority path set does not meet the preset communication performance, that path is migrated to the second priority path set and no longer carries input/output data between the storage multi-control clusters; this keeps the inter-cluster communication links in a healthy online state and avoids communication timeouts or failures caused by link bit errors and similar problems. When the first priority path set is empty, paths are selected from the second priority path set, which temporarily keeps the storage multi-control cluster system running in a basic state instead of letting it crash. Finally, the example test program is issued to each fault path in the fault path set, and a fault path that resumes normal communication is migrated back to the first priority path set, so that a normally communicating path does not remain in the fault path set.

A specific application embodiment of the present invention, shown in Fig. 5, includes the following:

First, two storage multi-control servers operate normally, are connected to a fiber switch through optical fibers, and have their ports configured into zones for communication. The two storage multi-control servers form a storage multi-control cluster system through cluster software, and the path health state monitoring program of the storage multi-control cluster system is started.

Second, the cluster software assigns all cluster links to the first priority path set, forming OptPathSet [Path1, Path2 … PathN], where each path includes at least one link. In the initial state, the storage multi-control cluster system transmits cluster input/output data only on paths selected from the OptPathSet by the routing algorithm. For example, if the routing algorithm is round-robin, cluster communication sends the cluster input/output data to all paths in the OptPathSet in turn.
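
Round-robin routing over the OptPathSet could look like the following sketch. `RoundRobinSelector` is a hypothetical name, and the text does not say how the rotation should behave when the OptPathSet shrinks or grows between calls; this sketch simply wraps its index to the current set size.

```python
from typing import List, Optional


class RoundRobinSelector:
    """Rotates through the current OptPathSet (hypothetical sketch)."""

    def __init__(self) -> None:
        self._next = 0

    def pick(self, opt_path_set: List[str]) -> Optional[str]:
        """Return the next path in turn, or None if the set is empty."""
        if not opt_path_set:
            return None
        # The set can change between calls, so wrap the index every time.
        path = opt_path_set[self._next % len(opt_path_set)]
        self._next = (self._next + 1) % len(opt_path_set)
        return path


rr = RoundRobinSelector()
print([rr.pick(["path1", "path2", "path3"]) for _ in range(4)])
# ['path1', 'path2', 'path3', 'path1']
```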

Third, while the storage multi-control cluster system is running, if a link in path2 is disconnected, path2 is removed from the OptPathSet and placed into the fault path set (FaultPathSet), so that OptPathSet is [path1, path3 … pathN] and FaultPathSet is [path2]. If links in other paths also break, the OptPathSet and FaultPathSet are updated accordingly. As long as the OptPathSet contains an available path, cluster input/output data for inter-cluster service communication is sent only on paths selected from the OptPathSet, which avoids degraded cluster I/O performance or blocked I/O caused by retries on a faulty path. Meanwhile, the path health state monitoring program issues the example test program to the paths in the FaultPathSet every 5 seconds; if a path reconnects, it is removed from the FaultPathSet and put back into the OptPathSet.

Fourth, if the link of path3 develops bit errors because the fiber cable is bent or broken or the optical module's power is low, or if the link flaps repeatedly because of a switch problem (for example, more than 3 flaps within 30 minutes), path3 is removed from the OptPathSet and placed into the second priority path set (DegradePathSet), so that OptPathSet is [path1, path4 … pathN], FaultPathSet is [path2], and DegradePathSet is [path3]. If other links also flap or show bit errors, the OptPathSet and DegradePathSet are updated accordingly. As long as the OptPathSet contains an available path, input/output data for inter-cluster communication is sent only on paths selected from the OptPathSet, which prevents degraded performance, blocked I/O, and similar problems caused by retries on a faulty path. Meanwhile, the path health state monitoring program issues the example test program to the DegradePathSet every 5 seconds; if a path shows no further flaps or bit errors within 30 minutes, it is removed from the DegradePathSet and put back into the OptPathSet. If a degraded path in the DegradePathSet is disconnected, it is moved from the DegradePathSet to the FaultPathSet.
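
The movements between the three sets described in the third and fourth steps, together with step S305, can be summarized as a small state machine. This is a hedged summary sketch, not the cluster software's actual logic: the enum and event names are hypothetical, and any (state, event) pair not listed is assumed to leave the path where it is.

```python
from enum import Enum


class PathState(Enum):
    OPT = "OptPathSet"
    DEGRADE = "DegradePathSet"
    FAULT = "FaultPathSet"


class Event(Enum):
    LINK_DOWN = "physical break or node failure"
    FLAP_LIMIT_EXCEEDED = "more than 3 flaps in 30 minutes"
    CRC_OR_DELAY = "bit errors or high delay"
    STABLE_30_MIN = "no anomaly for 30 minutes"
    PROBE_OK = "example test I/O succeeded"


# (current state, event) -> next state
TRANSITIONS = {
    (PathState.OPT, Event.LINK_DOWN): PathState.FAULT,
    (PathState.OPT, Event.FLAP_LIMIT_EXCEEDED): PathState.DEGRADE,
    (PathState.OPT, Event.CRC_OR_DELAY): PathState.DEGRADE,
    (PathState.DEGRADE, Event.STABLE_30_MIN): PathState.OPT,
    (PathState.DEGRADE, Event.LINK_DOWN): PathState.FAULT,
    (PathState.DEGRADE, Event.FLAP_LIMIT_EXCEEDED): PathState.FAULT,  # step S305
    (PathState.FAULT, Event.PROBE_OK): PathState.OPT,
}


def next_state(state: PathState, event: Event) -> PathState:
    """Apply one event; unlisted pairs leave the path in its current set."""
    return TRANSITIONS.get((state, event), state)


print(next_state(PathState.OPT, Event.CRC_OR_DELAY).value)  # DegradePathSet
```

Keeping the rules in one table makes it easy to check that every path always belongs to exactly one of the three sets.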

Fifth, if every path in the OptPathSet has failed or been degraded so that the OptPathSet contains no available path, the cluster system selects a path from the DegradePathSet to send the cluster input/output data for inter-cluster communication. Although some delay or timeouts may occur, inter-cluster communication continues and the service keeps running.

Sixth, if both the OptPathSet and the DegradePathSet are empty, no path is available anywhere between the clusters; the cluster reports a cluster communication fault alarm, and inter-cluster services are abnormally interrupted.

This embodiment also provides a link adaptive fault tolerance apparatus for a storage multi-control cluster, which implements the foregoing embodiment and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

This embodiment provides a link adaptive fault tolerance apparatus for a storage multi-control cluster which, as shown in Fig. 6, includes:

The first path selection module 601 is configured to obtain a first priority path set and select a path in it for communication between storage multi-control clusters.

Wherein each path includes at least one link.

The path migration module 602 is configured to migrate, when the communication performance of any path in the first priority path set is inconsistent with the preset communication performance, a path inconsistent with the preset communication performance from the first priority path set to the second priority path set.

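The paths that do not meet the preset communication performance include: a path whose link flaps more than a preset threshold number of times within a preset time, a path whose link has high delay under the example test program, and a path whose link shows a sharp increase in cyclic redundancy check errors.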

In some alternative embodiments, the path migration module 602 includes:

The first migration unit is configured to migrate a path from the first priority path set to the second priority path set when the link corresponding to that path flaps (intermittently disconnects) more than a preset threshold number of times within a preset time.

The first migration unit is further configured to migrate a failed path from the first priority path set to the fault path set when any path in the first priority path set fails. A path failure includes a broken physical connection on any link of the path and/or a failed node on any link.

The second migration unit is configured to migrate a path from the first priority path set to the second priority path set when, under the example test program, a first preset percentage of the input/output data on the link corresponding to that path exceeds the delay time.

The third migration unit is configured to count the new cyclic redundancy check errors on the link corresponding to any path in the first priority path set and to migrate the path from the first priority path set to the second priority path set when the increase in new errors over the previous calculation point exceeds a second preset percentage.

The second path selection module 603 is configured to select a path from the second priority path set for communication between the storage multi-control clusters when the first priority path set is empty.

In some alternative embodiments, the second path selection module 603 includes:

An example test issuing unit, configured to issue the example test program to each degraded path in the second priority path set.

The example test issuing unit is further used for issuing an example test program to each fault path in the fault path set so as to detect whether the fault path resumes normal communication.

A fourth migration unit, configured to migrate a degraded path from the second priority path set to the first priority path set when its communication performance within the preset time meets the preset communication performance.

The fourth migration unit is further configured to migrate a fault path from the fault path set to the first priority path set when the fault path resumes normal communication.

A fifth migration unit, configured to migrate the path corresponding to a link from the second priority path set to the fault path set when that link flaps more than a preset threshold number of times within the preset time.

In some optional embodiments, the apparatus further includes a fault alarm module configured to report a cluster communication fault alarm when both the first priority path set and the second priority path set are empty.

The link adaptive fault tolerance apparatus of the storage multi-control cluster in this embodiment is presented in the form of functional units. A unit here may be an ASIC, a processor and memory executing one or more software or firmware programs, and/or another device that can provide the functionality described above.

Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.

An embodiment of the invention further provides a server equipped with the link adaptive fault tolerance apparatus for a storage multi-control cluster shown in Fig. 6.

Fig. 7 is a schematic structural diagram of a server according to an alternative embodiment of the present invention. As shown in Fig. 7, the server includes one or more processors 10, a memory 20, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are communicatively coupled through different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the server, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In some alternative embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple servers may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). Fig. 7 takes one processor 10 as an example.

The processor 10 may be a central processing unit, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, which may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.

The memory 20 stores instructions executable by the at least one processor 10, so that the at least one processor 10 performs the method of the embodiments described above.

The memory 20 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required for the functions; the data storage area may store data created through use of the server, and the like. In addition, the memory 20 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may include memory located remotely from the processor 10 and connected to the server over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.

The server also includes a communication interface 30 for the server to communicate with other devices or communication networks.

An embodiment of the present invention further provides a computer-readable storage medium. The method according to the embodiments described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code originally stored on a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network, and stored on a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, a flash memory, a hard disk, a solid-state drive, or the like; the storage medium may also include a combination of the above types of memory. It will be appreciated that a computer, processor, microprocessor, controller, or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated in the above embodiments.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims

1. A link adaptive fault tolerance method for a storage multi-control cluster, the method comprising:

acquiring a first priority path set, and selecting a path in the first priority path set to perform communication among storage multi-control clusters; wherein each path includes at least one link;

when the communication performance of any path in the first priority path set is not consistent with the preset communication performance, migrating the path not consistent with the preset communication performance from the first priority path set to a second priority path set;

and selecting a path from the second priority path set to perform communication among the storage multi-control clusters when the paths in the first priority path set are empty.

2. The method of claim 1, wherein after selecting a path in the first set of priority paths for communication between storage multi-control clusters, the method further comprises:

When any path in the first priority path set fails, migrating a failure path from the first priority path set to a failure path set; wherein the path fault comprises physical connection disconnection of any link in the path and/or node fault of any link;

and when the fault path resumes normal communication, migrating the fault path from the fault path set to the first priority path set.

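3. The method of claim 2, wherein the path that does not meet the preset communication performance comprises:

a path whose link flaps more than a preset threshold number of times within a preset time, a path whose link has high delay, and a path whose link shows an increase in cyclic redundancy check errors.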

4. A method according to claim 3, wherein when the communication performance of any path in the first set of priority paths is inconsistent with a preset communication performance, the step of migrating the path inconsistent with a preset communication performance from the first set of priority paths to a second set of priority paths comprises:

When the number of flaps (intermittent disconnections) of the link corresponding to any path in the first priority path set within a preset time is greater than a preset threshold, migrating the path from the first priority path set to a second priority path set;

when, under an example test program, a first preset percentage of the input/output data on the link corresponding to any path in the first priority path set exceeds a delay time, migrating the path from the first priority path set to the second priority path set;

and counting newly added cyclic redundancy check errors on the link corresponding to any path in the first priority path set, and migrating the path from the first priority path set to the second priority path set when the increase in newly added cyclic redundancy check errors compared with the previous calculation point is greater than a second preset percentage.

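5. The method of claim 4, wherein after migrating the paths that do not meet the preset communication performance from the first priority path set to the second priority path set, the method further comprises:

issuing an example test program to each degraded path in the second priority path set;

and when the communication performance of a degraded path within a preset time meets the preset communication performance, migrating the degraded path from the second priority path set to the first priority path set.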

6. The method of claim 5, wherein after migrating paths that do not meet a preset communication performance from the first set of priority paths to the second set of priority paths, the method further comprises:

and when the number of flaps of the link corresponding to any path in the second priority path set within a preset time is greater than a preset threshold, migrating the path corresponding to the link from the second priority path set to a fault path set.

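7. The method according to any one of claims 1 to 6, further comprising:

when both the first priority path set and the second priority path set are empty, reporting a cluster communication fault alarm.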

8. A link-adaptive fault tolerance apparatus for a storage multi-control cluster, the apparatus comprising:

the first path selection module is used for acquiring a first priority path set and selecting a path in the first priority path set to carry out communication among the storage multi-control clusters; wherein each path includes at least one link;

a path migration module, configured to migrate, when a communication performance of any path in the first priority path set is not consistent with a preset communication performance, the path inconsistent with the preset communication performance from the first priority path set to a second priority path set;

And the second path selection module is used for selecting a path from the second priority path set to carry out communication among the storage multi-control clusters when the paths in the first priority path set are empty.

9. A server, comprising:

a memory and a processor in communication with each other, the memory having stored therein computer instructions which, upon execution, cause the processor to perform the method of any of claims 1 to 7.

10. A computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 7.