WO2024078015A1 - Fault injection method, apparatus, device and storage medium based on mirror pair - Google Patents

Fault injection method, apparatus, device and storage medium based on mirror pair

Info

Publication number
WO2024078015A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
injected
fault
type
mirror pair
Prior art date
Application number
PCT/CN2023/102838
Other languages
English (en)
French (fr)
Inventor
刘粉粉
Original Assignee
苏州元脑智能科技有限公司
Application filed by 苏州元脑智能科技有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites

Definitions

  • The present application relates to the field of computer technology, and in particular to a fault injection method, apparatus, computer device and storage medium based on mirror pairs.
  • Fault injection is widely used in software testing to verify the fault tolerance, robustness, security and reliability of software.
  • Fault injection refers to deliberately generating faults in a target system according to a selected fault model, in order to accelerate the occurrence of errors and failures in that system.
  • In this way, properties such as fault tolerance and fault safety can be verified.
  • Current product equipment supports fault injection on only one node at a time: an automated test script executes the fault injection command on a single node to achieve single-node fault injection.
  • With the current method, the fault injection simulation for the next node starts only after the simulation for the previous node has completed, which wastes time.
  • A fault injection method based on a mirror pair comprises:
  • The cluster includes multiple nodes, and the mirror pair information includes the mirror pair and the associated nodes.
  • Recovery information of the fault-injected node is obtained. When the recovery information indicates that the node has recovered, the method returns to the step of determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table. Fault injection stops when the number of fault injections reaches a preset fault injection count threshold.
  • Before logging in to the cluster, the method includes: determining each node in the cluster; pairing each node with an adjacent node to form a mirror pair and the corresponding mirror pair information; and generating the preset mirror pair record table from the mirror pair information.
  • Determining a node to be injected from the multiple nodes based on the mirror pair information in the preset mirror pair record table includes: obtaining a preset fault injection rule; obtaining the type of fault to be injected and the type of node to be injected according to the preset fault injection rule; and determining the node to be injected from the multiple nodes based on the type of fault to be injected, the type of node to be injected, and the mirror pair information.
  • Determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected and the mirror pair information includes: when the type of node to be injected is sequential injection, one node is determined in turn as the node to be injected according to the node order described in the mirror pair information; when the type of node to be injected is random injection, one node is randomly selected from the multiple nodes according to the mirror pair information as the node to be injected.
  • Determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected and the mirror pair information includes: when the type of node to be injected is sequential injection, the non-duplicate nodes in two mirror pairs are selected in sequence, based on the mirror pair information, as the nodes to be injected; when the type of node to be injected is random injection, the non-duplicate nodes in two mirror pairs are randomly selected, based on the mirror pair information, as the nodes to be injected.
  • Determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected and the mirror pair information includes: when the type of node to be injected is sequential injection, three nodes are selected in sequence as the nodes to be injected, based on the node order described in the mirror pair information; when the type of node to be injected is random injection, three nodes are randomly selected as the nodes to be injected, based on the mirror pair information.
  • Determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected and the mirror pair information includes: when the type of node to be injected is sequential injection, one node is selected in turn as the node to be injected, based on the node order described in the mirror pair information.
  • Determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected and the mirror pair information includes: when the type of node to be injected is random injection, one node is randomly selected as the node to be injected, based on the mirror pair information.
  • After a single node is determined as the node to be injected, the method also includes: detecting whether the node meets the fault injection conditions. When the conditions are not met, the method waits in a loop until the node meets them, and then executes the step of injecting the fault into the node.
  • The method further includes: when the fault injection conditions are met, executing the step of performing fault injection on the node to be injected.
  • The method further includes: when the current service status indicates that the front-end IO service is interrupted, determining that the fault injection of the node to be injected has failed.
  • A fault injection apparatus based on a mirror pair includes:
  • an acquisition module, used to log in to the cluster and acquire a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes the mirror pair and the associated nodes;
  • a screening module, used to determine a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table;
  • an injection module, used to perform fault injection on the node to be injected and obtain the current service status corresponding to the front-end IO service;
  • a determination module, used to determine that the fault injection of the node to be injected is successful when the current service status indicates that the front-end IO service is not interrupted.
  • A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the following steps are implemented:
  • The preset mirror pair record table includes mirror pair information.
  • The cluster includes multiple nodes, and the mirror pair information includes mirror pairs and associated nodes.
  • A non-volatile computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the following steps are implemented:
  • The cluster includes multiple nodes, and the mirror pair information includes the mirror pair and the associated nodes.
  • With the above fault injection method, apparatus, computer device and storage medium based on mirror pairs, at least one node to be injected is selected using the mirror pair information of each node recorded in the cluster's preset mirror pair record table, fault injection is performed on the selected node or nodes, and whether the injection succeeded is determined from the current service status of the front-end IO service during the injection. Because the nodes are selected based on the mirror pair relationships in the cluster, faults can be injected into several nodes at the same time, without waiting for one node to complete its fault injection simulation before starting the next.
  • Performing the fault injection simulation on multiple nodes at the same time saves time, improves the efficiency of multi-node fault injection, and covers the multi-node failures that may occur in real deployments. Further, whether fault recovery has a problem can be determined from whether the front-end IO is interrupted, which improves the reliability of multi-node fault injection.
  • FIG. 1 is a diagram of an application environment of a fault injection method based on mirror pairs in some embodiments;
  • FIG. 2 is a schematic flow chart of a fault injection method based on a mirror pair in some embodiments;
  • FIG. 3 is a schematic flow chart of a fault injection method based on a mirror pair in some embodiments;
  • FIG. 4 is a schematic flow chart of a fault injection method based on a mirror pair in some embodiments;
  • FIG. 5 is a schematic flow chart of the steps for determining a node to be injected with a fault in some embodiments;
  • FIG. 6 is a structural block diagram of a fault injection apparatus based on a mirror pair in some embodiments;
  • FIG. 7 is a diagram of the internal structure of a computer device in some embodiments.
  • The fault injection method based on mirror pairs provided in the present application can be applied in the application environment shown in FIG. 1.
  • The terminal 102 communicates with the server 104 through a network.
  • The terminal 102 can be, but is not limited to, a personal computer, laptop, smartphone, tablet computer or portable wearable device, and the server 104 can be implemented as an independent server or as a server cluster composed of multiple servers.
  • The server 104 can include multiple controllers, and each controller can be regarded as a node.
  • The terminal 102 can initiate a cluster login request to the server 104. The server 104 logs in to the cluster and obtains a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and associated nodes. The server 104 then determines the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table, performs fault injection on that node, and obtains the current service status corresponding to the front-end IO service. When the current service status indicates that the front-end IO service has not been interrupted, the server 104 determines that the fault injection of the node to be injected is successful.
  • A fault injection method based on a mirror pair is provided. It is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:
  • Step 202: log in to the cluster and obtain a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information.
  • The cluster includes multiple nodes, and the mirror pair information includes mirror pairs and associated nodes.
  • The cluster here refers to a group composed of multiple nodes, and a node can be a controller.
  • The chassis can be regarded as the cluster and each controller as a node.
  • Each node can form a mirror pair with an adjacent node, and these pairs yield the preset mirror pair record table.
  • The preset mirror pair record table records the information of each mirror pair, and each piece of mirror pair information includes a mirror pair and its associated nodes. It can be understood that, within a mirror pair, the cache data of a node and the cache data of its adjacent node mirror each other.
  • The information of these four mirror pairs constitutes a preset mirror pair record table.
  • The preset mirror pair record table can be built in advance, with the mirror pair information for each node in the cluster set according to actual business needs, actual product needs or actual application scenarios.
  • A preset mirror pair record table is obtained according to the cluster login request. The preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and associated nodes.
  • Because the cache data of the two nodes in a mirror pair mirror each other, even if one of the nodes fails or goes down, the other node of the pair can continue to provide services externally.
  • Step 204: determine a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table.
  • The preset mirror pair record table records the mirror pair information between the nodes, and the node to be injected can be determined from the multiple nodes in the cluster according to that information.
  • There can be one or more nodes to be injected; the number is determined by the preset fault injection rules and the information of each mirror pair.
  • The preset fault injection rules are rules set in advance, including but not limited to the type of fault to be injected and the type of node to be injected.
  • The type of fault to be injected specifies the number of nodes to be injected, and the type of node to be injected specifies whether nodes are selected randomly or sequentially. The node to be injected can therefore be determined from the multiple nodes according to the preset fault injection rules and the information of each mirror pair in the preset mirror pair record table.
  • Step 206: perform fault injection on the node to be injected, and obtain the current service status corresponding to the front-end IO service.
  • A fault injection command is sent to the node to be injected.
  • After receiving the fault injection command, the node performs fault injection according to the command.
  • The current service status corresponding to the front-end IO service is then obtained.
  • That is, the status of the front-end IO service while the node is undergoing fault injection is obtained as the current service status.
  • The current service status is either interrupted or not interrupted.
  • Step 208: when the current service status indicates that the front-end IO service is not interrupted, it is determined that the fault injection of the node to be injected is successful.
  • When the current service status shows that the front-end IO service is not interrupted, the front-end IO service is unaffected by the fault injection, so it can be determined that the fault injection of the node to be injected is successful. Conversely, when the current service status shows that the front-end IO service is interrupted, the front-end IO service is affected by the fault injection, so it can be determined that the fault injection of the node to be injected has failed.
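
Steps 204 to 208 can be sketched as a single round. This is a minimal illustration, not the patent's implementation: `select_nodes`, `inject_fault` and `io_interrupted` are hypothetical callbacks standing in for the node selection rule, the fault injection command and the front-end IO check, none of which the text specifies in code.

```python
from typing import Callable, List, Tuple

MirrorPair = Tuple[str, str]


def run_injection_round(
    mirror_pairs: List[MirrorPair],
    select_nodes: Callable[[List[MirrorPair]], List[str]],
    inject_fault: Callable[[str], None],
    io_interrupted: Callable[[], bool],
) -> bool:
    """Run steps 204 to 208 once and report whether the injection succeeded."""
    targets = select_nodes(mirror_pairs)  # step 204: choose the node(s)
    for node in targets:                  # step 206: inject the fault
        inject_fault(node)
    # Step 208: success means the front-end IO service kept running.
    return not io_interrupted()
```

Passing the three operations in as callbacks keeps the verdict logic testable without a real cluster.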
  • At least one node to be injected is selected using the mirror pair information of each node recorded in the cluster's preset mirror pair record table, and fault injection is performed on the selected node or nodes. If the front-end IO service is not interrupted while the fault injection is executed, the fault injection of the node to be injected is successful. Because the nodes are selected based on the mirror pair relationships in the cluster, faults can be injected into several nodes at the same time, without waiting for one node to complete its fault injection simulation before starting the next.
  • Performing the fault injection simulation on multiple nodes at the same time saves time, improves the efficiency of multi-node fault injection, and covers the multi-node failures that may occur in real deployments. Further, whether fault recovery has a problem can be determined from whether the front-end IO is interrupted, which improves the reliability of multi-node fault injection.
  • The above-mentioned fault injection method based on mirror pairs further includes:
  • Step 302: obtain recovery information of the fault-injected node, and when the recovery information indicates that the node has recovered, return to the step of determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table.
  • Step 304: when the number of fault injections reaches a preset fault injection count threshold, stop fault injection.
  • After fault injection is performed on the node to be injected, that node becomes the fault-injected node. A fault-injected node recovers after a period of time, and its recovery information is obtained; that is, the recovery information reflects whether the fault-injected node has recovered.
  • After fault injection is performed on a node, the recovery information corresponding to the fault-injected node is obtained and used to determine whether the node has recovered. If it has, the method returns to the step of determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table, in order to perform fault injection on the next node. Fault injection stops once the number of fault injections reaches the preset fault injection count threshold.
  • The preset fault injection count threshold can be determined according to actual business needs, actual product needs, or actual application scenarios.
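
The recover-then-repeat loop of steps 302 and 304 might look like the sketch below. The names `pick_targets`, `inject_fault`, `is_recovered`, `max_injections` and `poll_seconds` are assumptions made for illustration, and waiting for recovery after the final injection is a design choice of this sketch that the text does not mandate.

```python
import time
from typing import Callable, List


def injection_loop(
    pick_targets: Callable[[int], List[str]],
    inject_fault: Callable[[str], None],
    is_recovered: Callable[[str], bool],
    max_injections: int,
    poll_seconds: float = 0.0,
) -> int:
    """Inject faults round by round until the preset count threshold is reached."""
    done = 0
    while done < max_injections:            # step 304: stop at the threshold
        targets = pick_targets(done)        # back to the node selection step
        for node in targets:
            inject_fault(node)
        done += 1
        # Step 302: poll the recovery information before the next round.
        while not all(is_recovered(node) for node in targets):
            time.sleep(poll_seconds)
    return done
```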
  • Step 402: determine each node in the cluster.
  • Step 404: pair each node with an adjacent node to form a mirror pair and the corresponding mirror pair information.
  • Step 406: generate a preset mirror pair record table according to the information of each mirror pair.
  • Determining each node in the cluster specifically includes enumerating the nodes in the cluster.
  • Pairing each node with an adjacent node to form the corresponding mirror pair information includes: determining the current node, forming a mirror pair from the current node and its adjacent node so that the cache data of the current node and the cache data of the adjacent node mirror each other, and obtaining the corresponding mirror pair information.
  • The current node is any node in the cluster, and the adjacent node is a node adjacent to the current node.
  • For example, the cluster includes 4 nodes, namely node1, node2, node3 and node4, forming four mirror pairs: (node1, node2), (node2, node3), (node3, node4), (node4, node1). If node1 is the current node and node2 is its adjacent node, node1 and node2 form a mirror pair, and the cache data of node1 and the cache data of node2 mirror each other, forming the mirror pair information. These four mirror pairs generate the preset mirror pair record table.
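
Steps 402 to 406 amount to pairing each node with its neighbour in a ring. The sketch below reproduces the four-node example; the function name and the `domainN` keys are illustrative assumptions (the text mentions domain0 and domain1 elsewhere but does not define how the table is keyed).

```python
from typing import Dict, List, Tuple


def build_mirror_pair_table(nodes: List[str]) -> Dict[str, Tuple[str, str]]:
    """Pair every node with its next neighbour, wrapping around at the end."""
    table: Dict[str, Tuple[str, str]] = {}
    for i, node in enumerate(nodes):
        neighbour = nodes[(i + 1) % len(nodes)]  # last node pairs with the first
        table[f"domain{i}"] = (node, neighbour)
    return table


table = build_mirror_pair_table(["node1", "node2", "node3", "node4"])
# table["domain0"] is ("node1", "node2"); table["domain3"] is ("node4", "node1")
```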
  • Determining a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table includes:
  • Step 502: obtain the preset fault injection rules.
  • Step 504: obtain the type of fault to be injected and the type of node to be injected according to the preset fault injection rules.
  • Step 506: determine the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information.
  • The preset fault injection rules are the execution rules for fault injection on nodes, set in advance.
  • The preset fault injection rules include the type of fault to be injected and the type of node to be injected.
  • The type of fault to be injected reflects the number of nodes that need to be injected, and the type of node to be injected reflects the order in which the nodes are injected. Both can be set according to actual business needs, actual product needs or actual application scenarios.
  • The fault injection type option (--inject_type) takes values 1 to 4: single-node failure (value: 1, default), two nodes failing at the same time (value: 2), three nodes failing in sequence (value: 3), and two nodes failing in sequence (value: 4). The node type option (--select_type) sets whether the node to be injected is chosen sequentially or randomly, with the sequential value seq (default) and the random value random.
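
The two options could be parsed as follows. The option names, values and defaults come from the text; the use of Python's `argparse` and the help strings are assumptions of this sketch, since the patent does not say how the test script reads its options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for the preset fault injection rule."""
    parser = argparse.ArgumentParser(description="Mirror-pair fault injection (sketch)")
    parser.add_argument(
        "--inject_type", type=int, choices=[1, 2, 3, 4], default=1,
        help="1: single node, 2: two nodes at once, "
             "3: three nodes in sequence, 4: two nodes in sequence",
    )
    parser.add_argument(
        "--select_type", choices=["seq", "random"], default="seq",
        help="seq: pick nodes in order, random: pick nodes at random",
    )
    return parser


args = build_parser().parse_args(["--inject_type", "2", "--select_type", "random"])
# args.inject_type is 2 and args.select_type is "random"
```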
  • The type of fault to be injected and the type of node to be injected are obtained from the preset fault injection rule: the former determines the number of nodes to be injected, and the latter determines the order in which nodes are selected. At least one node to be injected is then determined from the multiple nodes in the cluster in combination with the mirror pair information.
  • Determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes:
  • When the type of node to be injected is sequential injection, one node is determined in turn as the node to be injected according to the node order described in the mirror pair information.
  • When the type of node to be injected is random injection, one node is randomly selected from the multiple nodes according to the mirror pair information as the node to be injected.
  • When the type of fault to be injected is the single-node fault type, each fault injection targets one node, so the number of nodes per fault injection is one.
  • When the type of fault to be injected is the single-node fault type and the type of node to be injected is sequential injection, one node can be determined in turn as the node to be injected using the node order described in the mirror pair information.
  • When the type of fault to be injected is the single-node fault type and the type of node to be injected is random injection,
  • one node can be randomly selected from the multiple nodes using the mirror pair information as the node to be injected.
  • The first node of each mirror pair in the stable initial state of the cluster is used as a fault injection node and saved in a fault node list [node1, node2, node3, node4].
  • This allows faults to be injected into all nodes in the cluster in turn, or into randomly selected nodes.
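
Single-node selection can be sketched as below. `fault_node_list` follows the text (first node of each mirror pair); `pick_single_node`, its `round_index` counter and the optional `rng` argument are hypothetical names introduced for this example.

```python
import random
from typing import List, Optional, Tuple

MirrorPair = Tuple[str, str]


def fault_node_list(mirror_pairs: List[MirrorPair]) -> List[str]:
    """Take the first node of each mirror pair as a fault injection candidate."""
    return [first for first, _ in mirror_pairs]


def pick_single_node(
    mirror_pairs: List[MirrorPair],
    select_type: str,
    round_index: int,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one node, either in list order ('seq') or at random ('random')."""
    nodes = fault_node_list(mirror_pairs)
    if select_type == "seq":
        return nodes[round_index % len(nodes)]  # cycle through the list
    if select_type == "random":
        return (rng or random.Random()).choice(nodes)
    raise ValueError(f"unknown select_type: {select_type!r}")
```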
  • Determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes:
  • When the type of node to be injected is sequential injection, the non-duplicate nodes in two mirror pairs are selected in sequence as the nodes to be injected.
  • When the type of node to be injected is random injection, the non-duplicate nodes in two mirror pairs are randomly selected as the nodes to be injected.
  • When the type of fault to be injected is the simultaneous two-node failure type, two nodes fail at the same time in each fault injection, so the number of nodes per fault injection is two.
  • When the type of fault to be injected is the simultaneous two-node failure type and the type of node to be injected is sequential injection,
  • the non-duplicate nodes in two mirror pairs can be selected sequentially as the nodes to be injected using the mirror pair information. If the node shared by the two mirror pairs were selected for simultaneous fault injection, the data cached by the nodes would be lost. Therefore, based on the mirror pair information, the non-duplicate nodes in two mirror pairs are selected sequentially and injected with faults at the same time.
  • For example, the non-duplicate nodes in two mirror pairs selected in turn as the nodes to be injected are node1 and node3.
  • When the type of node to be injected is random injection, the non-duplicate nodes in two mirror pairs can be randomly selected as the nodes to be injected using the mirror pair information. For example, if the mirror pair information is (node1, node2), (node2, node3), (node3, node4), (node4, node1), the non-duplicate nodes in two mirror pairs selected as the nodes to be injected may be node2 and node4.
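
One way to read "non-duplicate nodes in two mirror pairs" is: merge two pairs that share a node and drop the shared node. The sketch below makes that reading explicit. It assumes the four-node ring of the example and always picks two neighbouring pairs; both the function names and the neighbouring-pair restriction are assumptions, not statements from the patent.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple

MirrorPair = Tuple[str, str]


def non_duplicate_nodes(pair_a: MirrorPair, pair_b: MirrorPair) -> List[str]:
    """Merge two mirror pairs and keep only the nodes that occur once."""
    merged = pair_a + pair_b
    counts = Counter(merged)
    return [node for node in merged if counts[node] == 1]


def pick_two_nodes(
    mirror_pairs: List[MirrorPair],
    select_type: str,
    round_index: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick two nodes that do not mirror each other, from two neighbouring pairs."""
    if select_type == "seq":
        i = round_index % len(mirror_pairs)
    elif select_type == "random":
        i = (rng or random.Random()).randrange(len(mirror_pairs))
    else:
        raise ValueError(f"unknown select_type: {select_type!r}")
    j = (i + 1) % len(mirror_pairs)  # neighbouring pair shares exactly one node
    return non_duplicate_nodes(mirror_pairs[i], mirror_pairs[j])
```

With the four pairs of the example, round 0 yields node1 and node3, and the pair starting at (node4, node1) yields node4 and node2, matching the combinations given in the text.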
  • Determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes:
  • When the type of node to be injected is sequential injection, three nodes are selected in sequence as the nodes to be injected, based on the node order described in the mirror pair information. When the type of node to be injected is random injection, three nodes are randomly selected as the nodes to be injected, based on the mirror pair information.
  • When the type of fault to be injected is the three-node sequential fault type, each fault injection targets three nodes, and the three nodes are injected with faults in sequence.
  • That is, a fault is first injected into one node and then into the remaining nodes one after another.
  • When the type of fault to be injected is the three-node sequential fault type and the type of node to be injected is sequential injection, three nodes can be selected as the nodes to be injected according to the node order described in the mirror pair information. For example, based on the node order [node1, node2, node3, node4], the three nodes selected in order are node1, node2, node3.
  • When the type of fault to be injected is the three-node sequential fault type and the type of node to be injected is random injection, three nodes can be randomly selected as the nodes to be injected based on the mirror pair information.
  • For example, if the mirror pair information is (node1, node2), (node2, node3), (node3, node4), (node4, node1), the three nodes randomly selected as the nodes to be injected may be node1, node3, node4.
  • The first node of each mirror pair is used as a fault injection node and saved in a fault node list [node1, node2, node3, node4].
  • Three nodes are selected sequentially or randomly and injected with faults in sequence, and then all faulty nodes are restored. The next node selection is performed after the nodes have returned to normal.
  • Sequential selection of three nodes starts with one node and then takes the two nodes that follow it, for example [node1, node2, node3], then [node2, node3, node4] the next time, then [node3, node4, node1], and so on in a loop.
  • Random selection picks three nodes from the node list using a random algorithm, for example [node1, node3, node4]. After the nodes are selected, the cluster is logged in to remotely using the cluster login information and the fault injection command is executed to trigger the fault; because every node can be operated through the cluster, there is no need to log in to the node directly.
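
Three-node selection can be sketched as a sliding window that wraps around the end of the fault node list, or as a random sample. `pick_three_nodes`, `round_index` and `rng` are names assumed for this example, and the random branch returns the nodes in sampled order, which is one possible reading of "a random algorithm".

```python
import random
from typing import List, Optional


def pick_three_nodes(
    nodes: List[str],
    select_type: str,
    round_index: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick three nodes: a wrapping window ('seq') or a random sample ('random')."""
    if len(nodes) < 3:
        raise ValueError("need at least three nodes")
    if select_type == "seq":
        start = round_index % len(nodes)
        # Start node plus the two nodes behind it, looping past the list end.
        return [nodes[(start + k) % len(nodes)] for k in range(3)]
    if select_type == "random":
        return (rng or random.Random()).sample(nodes, 3)
    raise ValueError(f"unknown select_type: {select_type!r}")
```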
  • Determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes: when the type of node to be injected is sequential injection, one node is selected in turn as the node to be injected, based on the node order described in the mirror pair information.
  • Determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes: when the type of node to be injected is random injection, one node is randomly selected as the node to be injected, based on the mirror pair information.
  • After a single node is determined as the node to be injected, the method also includes: detecting whether the node meets the fault injection conditions. When the conditions are not met, the method waits in a loop until the node meets them, and then executes the step of injecting the fault into the node.
  • The fault injection condition is a preset condition used to determine whether fault injection can be performed on a single node. It can be preset according to actual business needs, actual product needs or actual application scenarios.
  • That is, it is checked whether this node currently meets the conditions for fault injection. If the fault cannot be injected, the method waits in a loop until the fault can be injected into this node. Forcing the injection would take the front-end service offline.
  • When the fault injection conditions are met, the step of performing fault injection on the node to be injected is executed.
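
The wait-until-injectable check is a simple polling loop. In this sketch `can_inject` is a hypothetical predicate for the fault injection conditions, which the text leaves to be preset per product; `max_polls` is an added safety limit that the patent does not describe.

```python
import time
from typing import Callable, Optional


def wait_until_injectable(
    node: str,
    can_inject: Callable[[str], bool],
    poll_seconds: float = 1.0,
    max_polls: Optional[int] = None,
) -> int:
    """Block until the node meets the fault injection conditions.

    Returns the number of unsuccessful checks made before the node was ready.
    """
    polls = 0
    while not can_inject(node):
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"{node} never met the fault injection conditions")
        time.sleep(poll_seconds)  # wait instead of forcing the injection
    return polls
```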
  • the steps of the fault injection method based on the mirror pair can be briefly and popularly explained through the following steps, which are specifically:
  • the fault injection type option (--inject_type) takes a value from 1 to 4: single-node failure (1, the default), two nodes failing simultaneously (2), three nodes failing in sequence (3), and two nodes failing in sequence (4). The node selection option (--select_type) sets whether the nodes to be injected are chosen sequentially (seq, the default) or randomly (random).
  • when faults are injected into a single node over repeated cycles, the first node of each mirror pair, taken while the cluster is in its stable initial state, is used as a fault injection node and saved in a fault node list [node1, node2, node3, node4].
  • This makes it possible both to inject faults into all nodes of the cluster in turn and to pick nodes at random for fault injection.
  • when faults are injected into two nodes simultaneously, two different nodes from different domains must be selected. The nodes are chosen by selecting two domains, sequentially or randomly, merging their elements, and removing the duplicated ones. For example, if domain0 and domain1 are selected, the two nodes for fault injection are (node1, node3). The next selection is made only after both nodes have recovered. After the nodes are selected, two threads are started, each logging in remotely to one of the nodes, and the fault injection commands are sent to both nodes at the same time; simulating simultaneous faults requires logging in to both nodes and executing the commands concurrently.
  • when faults are injected into several nodes in sequence to simulate the "three of four failed" scenario, the first node of each mirror pair is again used as a fault injection node and saved in a fault node list [node1, node2, node3, node4].
  • Three nodes are selected, sequentially or randomly, and injected with faults one after another; all faulty nodes are then restored, and the next selection is made once they are back to normal.
  • the method of selecting three nodes sequentially is to start with one node and then select the two nodes behind it in sequence. For example, select [node1, node2, node3], and then select [node2, node3, node4] next time.
  • Random selection is to select three nodes from the node list according to a random algorithm, such as [node1, node3, node4]. After selecting the node, remotely log in to the cluster through the cluster login information, and then execute the fault injection command to trigger the fault. Because each node can be operated through the cluster, there is no need to log in to the node directly.
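The two ways of choosing three nodes can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the wrap-around past the end of the list is an assumption based on the description cycling through the fault node list.

```python
import random
from typing import List, Optional, Sequence


def pick_window(nodes: Sequence[str], start: int, k: int = 3) -> List[str]:
    """Sequential selection: the node at `start` plus the k-1 nodes after it.

    Indices wrap around the list (an assumption about how cycling works).
    """
    n = len(nodes)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of nodes")
    return [nodes[(start + i) % n] for i in range(k)]


def pick_random(nodes: Sequence[str], k: int = 3,
                rng: Optional[random.Random] = None) -> List[str]:
    """Random selection: k distinct nodes drawn from the fault node list."""
    chooser = rng if rng is not None else random
    return chooser.sample(list(nodes), k)
```

With the list [node1, node2, node3, node4], `pick_window(nodes, 0)` and `pick_window(nodes, 1)` reproduce the two sequential examples given above. Setting `k=2` would cover the two-nodes-in-sequence scenario in the same way.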
  • a fault injection device 600 based on a mirror pair comprising: an acquisition module 602 , a screening module 604 , an injection module 606 and a determination module 608 , wherein:
  • the acquisition module 602 is used to log in to the cluster and acquire a preset mirror pair record table, the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and associated nodes.
  • the screening module 604 is used to determine the node to be injected with the fault from the multiple nodes according to the mirror pair information in the preset mirror pair record table.
  • the injection module 606 is used to perform fault injection on the node to be injected with the fault, and obtain the current service status corresponding to the front-end IO service.
  • the determination module 608 is used to determine that the fault injection of the node to be injected is successful when the current service situation indicates that the front-end IO service is not interrupted.
  • the mirror pair-based fault injection device 600 obtains recovery information of the fault-injected node; when the recovery information indicates that the node has recovered, control returns to the screening module 604, which again executes the step of determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table. Fault injection stops when the number of fault injections performed reaches the preset fault injection count threshold.
  • the mirror pair-based fault injection device 600 determines each node in the cluster, forms a mirror pair with each node and an adjacent node, forms corresponding mirror pair information, and generates a preset mirror pair record table according to each mirror pair information.
  • the screening module 604 obtains preset fault injection rules, obtains the type of fault to be injected and the type of node to be injected according to the preset fault injection rules, and determines the node to be injected from multiple nodes according to the type of fault to be injected, the type of node to be injected and the mirror pair information.
  • when the type of fault to be injected is the single-node fault type, the screening module 604 determines one node at a time as the node to be injected, following the node order described in the mirror pair information, if the type of node to be injected is sequential injection; if the type of node to be injected is random injection, it randomly selects one of the multiple nodes as the node to be injected according to the mirror pair information.
  • when the type of fault to be injected is the two-node simultaneous fault type, the screening module 604 selects, in sequence, the non-duplicate nodes of two mirror pairs as the nodes to be injected based on the mirror pair information if the type of node to be injected is sequential injection; if the type of node to be injected is random injection, it randomly selects the non-duplicate nodes of two mirror pairs as the nodes to be injected based on the mirror pair information.
  • when the type of fault to be injected is the three-node sequential fault type, the screening module 604 selects three nodes in sequence as the nodes to be injected, following the node order described in the mirror pair information, if the type of node to be injected is sequential injection; if the type of node to be injected is random injection, it randomly selects three nodes as the nodes to be injected based on the mirror pair information.
  • Each module in the above-mentioned fault injection device based on the mirror pair can be implemented in whole or in part by software, hardware and a combination thereof.
  • the above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, or can be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, which may be a server; its internal structure may be as shown in FIG. 7.
  • the computer device includes a processor, a memory, a network interface, and a database connected via a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • the database of the computer device is used to store a preset mirror pair record table.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • FIG. 7 is merely a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, the following steps are implemented: logging in to a cluster and obtaining a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes; determining a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table; performing fault injection on the node to be injected and obtaining the current service status of the front-end IO service; and, when the current service status indicates that the front-end IO service has not been interrupted, determining that the fault injection on the node to be injected succeeded.
  • when the processor executes the computer program, the following steps are also implemented: obtaining recovery information of the fault-injected node and, when the recovery information indicates that the node has recovered, returning to the step of determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table; fault injection stops when the number of fault injections performed reaches the preset fault injection count threshold.
  • when the processor executes the computer program, the following steps are also implemented: determining the nodes in the cluster; forming a mirror pair from each node and its adjacent node to produce the corresponding mirror pair information; and generating the preset mirror pair record table from the mirror pair information.
  • when the processor executes the computer program, the following steps are also implemented: obtaining a preset fault injection rule; obtaining the type of fault to be injected and the type of node to be injected according to the preset fault injection rule; and determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information.
  • when the type of fault to be injected is the single-node fault type, the processor also implements the following steps when executing the computer program: when the type of node to be injected is sequential injection, determining one node at a time as the node to be injected, following the node order described in the mirror pair information; when the type of node to be injected is random injection, randomly selecting one of the multiple nodes as the node to be injected according to the mirror pair information.
  • when the type of fault to be injected is the two-node simultaneous fault type, the processor also implements the following steps when executing the computer program: when the type of node to be injected is sequential injection, selecting, in sequence, the non-duplicate nodes of two mirror pairs as the nodes to be injected based on the mirror pair information; when the type of node to be injected is random injection, randomly selecting the non-duplicate nodes of two mirror pairs as the nodes to be injected based on the mirror pair information.
  • when the type of fault to be injected is the three-node sequential fault type, the processor also implements the following steps when executing the computer program: when the type of node to be injected is sequential injection, selecting three nodes in sequence as the nodes to be injected, following the node order described in the mirror pair information; when the type of node to be injected is random injection, randomly selecting three nodes as the nodes to be injected based on the mirror pair information.
  • when the fault injection type is the single-node fault injection type, the processor also implements the following steps when executing the computer program: when the type of node to be injected is sequential injection, selecting one node at a time as the node to be injected, following the node order described in the mirror pair information.
  • when the fault injection type is the single-node fault injection type, the processor also implements the following steps when executing the computer program: when the type of node to be injected is random injection, randomly selecting one node as the node to be injected based on the mirror pair information.
  • when the processor executes the computer program, the following steps are also implemented: detecting whether the node to be injected meets the fault injection condition; if the condition is not met, entering a wait loop until the node to be injected meets the fault injection condition, and then executing the step of performing fault injection on the node to be injected.
  • when the processor executes the computer program, the following steps are also implemented: when the fault injection condition is met, executing the step of performing fault injection on the node to be injected.
  • when the processor executes the computer program, the following steps are also implemented: when the current service status indicates that the front-end IO service has been interrupted, determining that the fault injection on the node to be injected failed.
  • a non-volatile computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the following steps are implemented: logging in to a cluster and obtaining a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes; determining a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table; performing fault injection on the node to be injected and obtaining the current service status of the front-end IO service; and, when the current service status indicates that the front-end IO service has not been interrupted, determining that the fault injection on the node to be injected succeeded.
  • when the computer program is executed by the processor, the following steps are also implemented: obtaining recovery information of the fault-injected node and, when the recovery information indicates that the node has recovered, returning to the step of determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table; fault injection stops when the number of fault injections performed reaches the preset fault injection count threshold.
  • when the computer program is executed by the processor, the following steps are also implemented: determining the nodes in the cluster; forming a mirror pair from each node and its adjacent node to produce the corresponding mirror pair information; and generating the preset mirror pair record table from the mirror pair information.
  • when the computer program is executed by the processor, the following steps are also implemented: obtaining a preset fault injection rule; obtaining the type of fault to be injected and the type of node to be injected according to the preset fault injection rule; and determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information.
  • when the type of fault to be injected is the single-node fault type, the following steps are also implemented when the computer program is executed by the processor: when the type of node to be injected is sequential injection, determining one node at a time as the node to be injected, following the node order described in the mirror pair information; when the type of node to be injected is random injection, randomly selecting one of the multiple nodes as the node to be injected according to the mirror pair information.
  • when the type of fault to be injected is the two-node simultaneous fault type, the following steps are also implemented when the computer program is executed by the processor: when the type of node to be injected is sequential injection, selecting, in sequence, the non-duplicate nodes of two mirror pairs as the nodes to be injected based on the mirror pair information; when the type of node to be injected is random injection, randomly selecting the non-duplicate nodes of two mirror pairs as the nodes to be injected based on the mirror pair information.
  • when the type of fault to be injected is the three-node sequential fault type, the following steps are also implemented when the computer program is executed by the processor: when the type of node to be injected is sequential injection, selecting three nodes in sequence as the nodes to be injected, following the node order described in the mirror pair information; when the type of node to be injected is random injection, randomly selecting three nodes as the nodes to be injected based on the mirror pair information.
  • when the fault injection type is the single-node fault injection type, the following steps are also implemented when the computer program is executed by the processor: when the type of node to be injected is sequential injection, selecting one node at a time as the node to be injected, following the node order described in the mirror pair information.
  • when the fault injection type is the single-node fault injection type, the following steps are also implemented when the computer program is executed by the processor: when the type of node to be injected is random injection, randomly selecting one node as the node to be injected based on the mirror pair information.
  • when the computer program is executed by the processor, the following steps are also implemented: detecting whether the node to be injected meets the fault injection condition; if the condition is not met, entering a wait loop until the node to be injected meets the fault injection condition, and then executing the step of performing fault injection on the node to be injected.
  • when the computer program is executed by the processor, the following steps are also implemented: when the fault injection condition is met, executing the step of performing fault injection on the node to be injected.
  • when the computer program is executed by the processor, the following steps are also implemented: when the current service status indicates that the front-end IO service has been interrupted, determining that the fault injection on the node to be injected failed.
  • Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

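The present application relates to a fault injection method, apparatus, computer device, and storage medium based on mirror pairs. The method includes: logging in to a cluster and obtaining a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes; determining a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table; performing fault injection on the node to be injected and obtaining the current service status of the front-end IO service; and, when the current service status indicates that the front-end IO service has not been interrupted, determining that the fault injection on the node to be injected succeeded. With this method, at least one node to be injected can be selected based on the mirror pair relationships among the nodes in the cluster, and fault injection can be performed on the selected nodes simultaneously, which saves time and improves efficiency.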

Description

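Fault injection method, apparatus, device, and storage medium based on mirror pairs
This application claims priority to Chinese patent application No. 202211253227.8, entitled "Fault injection method, apparatus, device, and storage medium based on mirror pairs", filed with the China Patent Office on October 13, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a fault injection method, apparatus, computer device, and storage medium based on mirror pairs.
Background
Fault injection is widely used in software testing to verify fault tolerance, robustness, security, and reliability. Fault injection means deliberately creating faults in a target system according to a selected fault model in order to accelerate the occurrence of errors and failures in that system. Analyzing how the target system responds to the injected faults makes it possible to verify properties such as fault tolerance and fail-safety.
However, current products support fault injection on only one node at a time: an automated test script executes a fault injection command against a single node. In practice, several nodes may fail at the same time. Simulating this with the current approach means waiting for one node to finish its fault injection simulation before moving on to the next, which wastes time.
Summary
In view of the above technical problem, it is necessary to provide a fault injection method, apparatus, computer device, and storage medium based on mirror pairs that select multiple nodes to be injected according to pre-established mirror pair relationships and inject faults into those nodes simultaneously, saving time and improving the efficiency of multi-node fault injection.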
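According to a first aspect, a fault injection method based on mirror pairs is provided, the method including:
logging in to a cluster and obtaining a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes;
determining a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table;
performing fault injection on the node to be injected and obtaining the current service status of the front-end IO service; and
when the current service status indicates that the front-end IO service has not been interrupted, determining that the fault injection on the node to be injected succeeded.
In some embodiments, recovery information of the fault-injected node is obtained; when the recovery information indicates that the node has recovered, the method returns to the step of determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table; fault injection stops when the number of fault injections performed reaches the preset fault injection count threshold.
In some embodiments, before logging in to the cluster, the method includes: determining the nodes in the cluster; forming a mirror pair from each node and its adjacent node to produce the corresponding mirror pair information; and generating the preset mirror pair record table from the mirror pair information.
In some embodiments, determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table includes: obtaining a preset fault injection rule; obtaining the type of fault to be injected and the type of node to be injected according to the preset fault injection rule; and determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information.
In some embodiments, when the type of fault to be injected is the single-node fault type, determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes: when the type of node to be injected is sequential injection, determining one node at a time as the node to be injected, following the node order described in the mirror pair information; when the type of node to be injected is random injection, randomly selecting one of the multiple nodes as the node to be injected according to the mirror pair information.
In some embodiments, when the type of fault to be injected is the two-node simultaneous fault type, determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes: when the type of node to be injected is sequential injection, selecting, in sequence, the non-duplicate nodes of two mirror pairs as the nodes to be injected based on the mirror pair information; when the type of node to be injected is random injection, randomly selecting the non-duplicate nodes of two mirror pairs as the nodes to be injected based on the mirror pair information.
In some embodiments, when the type of fault to be injected is the three-node sequential fault type, determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes: when the type of node to be injected is sequential injection, selecting three nodes in sequence as the nodes to be injected, following the node order described in the mirror pair information; when the type of node to be injected is random injection, randomly selecting three nodes as the nodes to be injected based on the mirror pair information.
In some embodiments, when the fault injection type is the single-node fault injection type, determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes: when the type of node to be injected is sequential injection, selecting one node at a time as the node to be injected, following the node order described in the mirror pair information.
In some embodiments, when the fault injection type is the single-node fault injection type, determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes: when the type of node to be injected is random injection, randomly selecting one node as the node to be injected based on the mirror pair information.
In some embodiments, after a single node is determined to be the node to be injected, the method further includes: detecting whether the node to be injected meets the fault injection condition; when the condition is not met, entering a wait loop until the node to be injected meets the fault injection condition, and then executing the step of performing fault injection on the node to be injected.
In some embodiments, the method further includes: when the fault injection condition is met, executing the step of performing fault injection on the node to be injected.
In some embodiments, the method further includes: when the current service status indicates that the front-end IO service has been interrupted, determining that the fault injection on the node to be injected failed.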
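According to a second aspect, a fault injection apparatus based on mirror pairs is provided, the apparatus including:
an acquisition module, configured to log in to a cluster and obtain a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes;
a screening module, configured to determine a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table;
an injection module, configured to perform fault injection on the node to be injected and obtain the current service status of the front-end IO service; and
a determination module, configured to determine that the fault injection on the node to be injected succeeded when the current service status indicates that the front-end IO service has not been interrupted.
According to a third aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
logging in to a cluster and obtaining a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes;
determining a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table;
performing fault injection on the node to be injected and obtaining the current service status of the front-end IO service; and
when the current service status indicates that the front-end IO service has not been interrupted, determining that the fault injection on the node to be injected succeeded.
According to a fourth aspect, a non-volatile computer-readable storage medium is provided, on which a computer program is stored, where the following steps are implemented when the computer program is executed by a processor:
logging in to a cluster and obtaining a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes;
determining a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table;
performing fault injection on the node to be injected and obtaining the current service status of the front-end IO service; and
when the current service status indicates that the front-end IO service has not been interrupted, determining that the fault injection on the node to be injected succeeded.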
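With the above fault injection method, apparatus, computer device, and storage medium based on mirror pairs, at least one node to be injected is selected using the mirror pair information of each node recorded in the cluster's preset mirror pair record table, fault injection is performed on the selected nodes, and the current service status of the front-end IO service during the injection is checked; if no interruption occurred, the fault injection on those nodes succeeded. Because nodes are selected based on the mirror pair relationships among the cluster's nodes and faults are injected into them simultaneously, there is no need to wait for one node's fault injection simulation to finish before starting the next. Fault injection simulations for multiple nodes can run at the same time, which saves time and improves the efficiency of multi-node fault injection. The approach also covers the multi-node failures that can occur in real deployments, and whether the front-end IO is interrupted further indicates whether fault recovery had a problem, which improves the reliability of multi-node fault injection.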
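Brief Description of the Drawings
FIG. 1 is a diagram of the application environment of the mirror-pair-based fault injection method in some embodiments;
FIG. 2 is a schematic flowchart of the mirror-pair-based fault injection method in some embodiments;
FIG. 3 is a schematic flowchart of the mirror-pair-based fault injection method in some embodiments;
FIG. 4 is a schematic flowchart of the mirror-pair-based fault injection method in some embodiments;
FIG. 5 is a schematic flowchart of the step of determining the node to be injected in some embodiments;
FIG. 6 is a structural block diagram of the mirror-pair-based fault injection apparatus in some embodiments;
FIG. 7 is a diagram of the internal structure of the computer device in some embodiments.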
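Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
The mirror-pair-based fault injection method provided by the present application can be applied in the application environment shown in FIG. 1, in which a terminal 102 communicates with a server 104 over a network. The terminal 102 may be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device. The server 104 may be implemented as a standalone server or as a server cluster made up of multiple servers. The server 104 may include multiple controllers, each of which can be regarded as a node.
Specifically, the terminal 102 may send a cluster login request to the server 104. The server 104 logs in to the cluster and obtains a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes. The server determines a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table, performs fault injection on the node to be injected, obtains the current service status of the front-end IO service, and, when the current service status indicates that the front-end IO service has not been interrupted, determines that the fault injection on the node to be injected succeeded.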
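In some embodiments, as shown in FIG. 2, a fault injection method based on mirror pairs is provided. Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:
Step 202: log in to a cluster and obtain a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes.
Here, a cluster is a group made up of multiple nodes, and a node may be a controller. For example, if a chassis holds four controllers, the chassis can be a cluster and each controller a node. Each node can form a mirror pair relationship with its adjacent node, which yields the preset mirror pair record table. In other words, the preset mirror pair record table records the mirror pair information, and each piece of mirror pair information includes a mirror pair and its associated nodes; this can be understood as the cache data of a node and of its adjacent node forming a mirror pair.
For example, the four nodes in a cluster, node1, node2, node3, and node4, form four mirror pairs: (node1, node2), (node2, node3), (node3, node4), and (node4, node1). The information for these four mirror pairs makes up the preset mirror pair record table.
The preset mirror pair record table can be built in advance, establishing the mirror pair information for each node in the cluster according to actual business requirements, product requirements, or application scenarios.
Specifically, a cluster login request is triggered and generated, and the preset mirror pair record table is obtained according to that request; the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes. In other words, the cache data of the two nodes in a pair is mirrored, so even if one node fails or dies, the other node of the mirror pair can continue to provide service.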
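Step 204: determine a node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table.
The preset mirror pair record table records the mirror pair information among the nodes, so the node to be injected can be determined from the cluster's multiple nodes according to the mirror pair information described in the table. The number of nodes to be injected is at least one and can be determined from the preset fault injection rule and the mirror pair information. The preset fault injection rule is set in advance and includes, but is not limited to, the type of fault to be injected and the type of node to be injected: the type of fault to be injected specifies how many nodes are to be injected, and the type of node to be injected specifies whether the nodes are chosen randomly or sequentially. The node to be injected can therefore be determined from the multiple nodes using the preset fault injection rule together with the mirror pair information in the preset mirror pair record table.
Step 206: perform fault injection on the node to be injected and obtain the current service status of the front-end IO service.
Specifically, a fault injection command is sent to the node to be injected; after receiving the command, the node undergoes fault injection according to it. While the fault injection is being performed on the node, the current service status of the front-end IO service is obtained, that is, the status of the front-end IO service at the time the fault is injected. The current service status is either interrupted or not interrupted.
Step 208: when the current service status indicates that the front-end IO service has not been interrupted, determine that the fault injection on the node to be injected succeeded.
Specifically, when the current service status shows that the front-end IO service has not been interrupted, the front-end IO service was not affected by the fault injection, so the fault injection on the node to be injected can be judged successful. Conversely, when the current service status shows that the front-end IO service has been interrupted, the front-end IO service was affected by the fault injection, so the fault injection on the node to be injected can be judged to have failed.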
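The success criterion in step 208 can be sketched as a small predicate. The names below are hypothetical, and sampling the front-end IO several times during the injection is an assumption; the source only states that the status is either interrupted or not interrupted.

```python
from enum import Enum
from typing import Iterable


class IOStatus(Enum):
    UNINTERRUPTED = "uninterrupted"
    INTERRUPTED = "interrupted"


def overall_status(samples: Iterable[IOStatus]) -> IOStatus:
    """Collapse the samples taken during an injection into one status.

    Any interrupted sample marks the whole injection as interrupted
    (an assumption about how repeated observations are combined).
    """
    if any(s is IOStatus.INTERRUPTED for s in samples):
        return IOStatus.INTERRUPTED
    return IOStatus.UNINTERRUPTED


def injection_succeeded(status: IOStatus) -> bool:
    """Step 208: the injection succeeds only if front-end IO kept running."""
    return status is IOStatus.UNINTERRUPTED
```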
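In the above mirror-pair-based fault injection method, at least one node to be injected is selected using the mirror pair information of each node recorded in the cluster's preset mirror pair record table, fault injection is performed on the selected nodes, and the current service status of the front-end IO service during the injection is checked; if no interruption occurred, the fault injection on those nodes succeeded. Because nodes are selected based on the mirror pair relationships among the cluster's nodes and faults are injected into them simultaneously, there is no need to wait for one node's fault injection simulation to finish before starting the next. Fault injection simulations for multiple nodes can run at the same time, which saves time and improves the efficiency of multi-node fault injection. The approach also covers the multi-node failures that can occur in real deployments, and whether the front-end IO is interrupted further indicates whether fault recovery had a problem, which improves the reliability of multi-node fault injection.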
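In some embodiments, as shown in FIG. 3, the above mirror-pair-based fault injection method further includes:
Step 302: obtain recovery information of the fault-injected node; when the recovery information indicates that the node has recovered, return to the step of determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table.
Step 304: stop fault injection when the number of fault injections performed reaches the preset fault injection count threshold.
After fault injection is performed on a node to be injected, that node becomes a fault-injected node. A fault-injected node recovers some time after the injection, which yields its recovery information; that is, the recovery information reflects whether the fault-injected node has recovered.
Specifically, the recovery information corresponding to the fault-injected node is obtained and used to determine whether the node has recovered. If it has, the method can return to the step of determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table and inject a fault into the next node, until the number of fault injections performed reaches the preset fault injection count threshold, at which point fault injection stops. The preset fault injection count threshold can be set according to actual business requirements, product requirements, or application scenarios.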
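The loop formed by steps 302 and 304 can be sketched as follows. All callables (`select`, `inject`, `io_uninterrupted`, `recovered`) are hypothetical stand-ins for operations the source leaves to the test script, and stopping at the first failed round is an assumption taken from step (8) of the specific embodiment ("otherwise fail and exit").

```python
import time
from typing import Callable, List, Tuple


def run_campaign(
    select: Callable[[int], List[str]],      # picks the nodes for round i
    inject: Callable[[List[str]], None],     # performs the fault injection
    io_uninterrupted: Callable[[], bool],    # front-end IO check
    recovered: Callable[[str], bool],        # recovery info for one node
    max_injections: int,                     # preset injection count threshold
    wait: Callable[[float], None] = time.sleep,
    poll_seconds: float = 1.0,
) -> List[Tuple[List[str], bool]]:
    """Repeat select -> inject -> judge -> wait for recovery."""
    results = []
    for round_no in range(max_injections):
        targets = select(round_no)
        inject(targets)
        ok = io_uninterrupted()
        results.append((targets, ok))
        if not ok:
            break  # assumption: a failed round ends the run
        # Only pick the next node(s) once every injected node has recovered.
        while not all(recovered(node) for node in targets):
            wait(poll_seconds)
    return results
```

The returned list records which nodes were injected in each round and whether the front-end IO stayed up, which is enough to report a pass or fail for the run.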
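In some embodiments, as shown in FIG. 4, before logging in to the cluster, the method includes:
Step 402: determine the nodes in the cluster.
Step 404: form a mirror pair from each node and its adjacent node to produce the corresponding mirror pair information.
Step 406: generate the preset mirror pair record table from the mirror pair information.
Specifically, some preprocessing can be done before logging in to the cluster. First, all nodes in the cluster are enumerated. The current node and its adjacent node form a mirror pair, with the cache data of the current node and of the adjacent node mirrored, which produces the corresponding mirror pair information. The mirror pair information for all nodes in the cluster makes up the preset mirror pair record table; in other words, the table records the mirror pair information for every node in the cluster. In some embodiments, determining the nodes in the cluster specifically means enumerating the nodes in the cluster.
In some embodiments, forming a mirror pair from each node and its adjacent node to produce the corresponding mirror pair information includes: determining the current node, forming a mirror pair from the current node and its adjacent node, and mirroring the cache data of the current node with the cache data of the adjacent node to obtain the corresponding mirror pair information.
Here, the current node is any one of the nodes in the cluster, and the adjacent node is the node next to the current node.
For example, a cluster includes four nodes, node1, node2, node3, and node4, which form four mirror pairs: (node1, node2), (node2, node3), (node3, node4), and (node4, node1). If node1 is the current node, node2 is its adjacent node; node1 and node2 form a mirror pair, the cache data of node1 and the cache data of node2 are mirrored, and together they make up the mirror pair information. These four mirror pairs produce the preset mirror pair record table.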
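A minimal sketch of steps 402 to 406 follows. It assumes, based on the four-node example, that "adjacent" means the next node in list order with the last node pairing back to the first; the function name and the list-of-tuples representation are illustrative choices, not taken from the application.

```python
from typing import List, Sequence, Tuple


def build_mirror_pairs(nodes: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair every node with its successor, wrapping around at the end."""
    n = len(nodes)
    if n < 2:
        raise ValueError("a mirror pair needs at least two nodes")
    return [(nodes[i], nodes[(i + 1) % n]) for i in range(n)]
```

For `["node1", "node2", "node3", "node4"]` this yields the four pairs listed in the example, in the same order.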
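In some embodiments, as shown in FIG. 5, determining the node to be injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table includes:
Step 502: obtain a preset fault injection rule.
Step 504: obtain the type of fault to be injected and the type of node to be injected according to the preset fault injection rule.
Step 506: determine the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information.
The preset fault injection rule is an execution rule for node fault injection that is set in advance. It includes the type of fault to be injected and the type of node to be injected: the type of fault to be injected reflects how many nodes need fault injection, and the type of node to be injected reflects the order in which nodes are injected. The rule can be set according to actual business requirements, product requirements, or application scenarios.
For example, the preset fault injection rule can be set through the following steps:
1) Configure the information for the cluster and all nodes in the current test environment in the automation platform or in a configuration file that the automation script can read, so that the basic cluster and node information (mainly user name, password, login port, IP address, and so on) can be read when an automated test case runs, allowing the automation script to log in remotely to the test cluster or its nodes.
2) Pass the type of fault to be injected (--inject_type), the type of node to be injected (--select_type), the number of fault injections (--inject_count), and the interval between fault injections (--inject_interval) as script parameters to define the scenario for this run. The --inject_type option takes a value from 1 to 4: single-node failure (1, the default), two nodes failing simultaneously (2), three nodes failing in sequence (3), and two nodes failing in sequence (4). The --select_type option sets whether the nodes to be injected are chosen randomly or sequentially: seq (the default) for sequential and random for random.
Specifically, the type of fault to be injected and the type of node to be injected are obtained from the preset fault injection rule. The type of fault to be injected determines how many nodes are injected in this run, and the type of node to be injected determines the order in which they are injected; the mirror pair information is also needed to determine at least one node to be injected from the cluster's multiple nodes.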
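The four script parameters could be parsed as in the sketch below. The option names and the defaults for --inject_type and --select_type come from the description; the defaults for --inject_count and --inject_interval, the units of the interval, and the help texts are assumptions, since the source does not state them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Mirror-pair-based fault injection (illustrative sketch)")
    parser.add_argument(
        "--inject_type", type=int, choices=[1, 2, 3, 4], default=1,
        help="1: single node, 2: two nodes simultaneously, "
             "3: three nodes in sequence, 4: two nodes in sequence")
    parser.add_argument(
        "--select_type", choices=["seq", "random"], default="seq",
        help="choose nodes sequentially (seq) or randomly (random)")
    # The two defaults below are assumptions, not values from the source.
    parser.add_argument("--inject_count", type=int, default=1,
                        help="number of fault injections to perform")
    parser.add_argument("--inject_interval", type=float, default=0.0,
                        help="interval between fault injections (unit assumed: seconds)")
    return parser
```

A run such as `script.py --inject_type 2 --select_type random --inject_count 5` would then describe five rounds of simultaneous two-node injection with random selection (`script.py` is a placeholder name).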
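In some embodiments, when the type of fault to be injected is the single-node fault type, determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes:
when the type of node to be injected is sequential injection, determining one node at a time as the node to be injected, following the node order described in the mirror pair information; when the type of node to be injected is random injection, randomly selecting one of the multiple nodes as the node to be injected according to the mirror pair information.
When the type of fault to be injected is the single-node fault type, each fault injection targets one node, so the number of nodes injected each time is one. Specifically, when the type of fault to be injected is the single-node fault type and the type of node to be injected is sequential injection, one node at a time can be determined as the node to be injected, following the node order described in the mirror pair information. When the type of fault to be injected is the single-node fault type and the type of node to be injected is random injection, one of the multiple nodes can be randomly selected as the node to be injected according to the mirror pair information.
For example, when faults are injected into a single node over repeated cycles, the first node of each mirror pair, taken while the cluster is in its stable initial state, is used as a fault injection node and saved in a fault node list [node1, node2, node3, node4]. This makes it possible both to inject faults into all nodes of the cluster in turn and to pick nodes at random for fault injection. After each fault injection, the script waits for the fault to recover before choosing the next node. After a node is selected, the script logs in to the cluster remotely using the cluster login information and executes the fault injection command to trigger the fault; every node can be operated through the cluster, so there is no need to log in to the node directly.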
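Single-node selection can be sketched as below. The helper names are hypothetical; the fault node list is derived from the first element of each mirror pair, as the example describes, and cycling back to the start of the list after the last node is an assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fault_node_list(pairs: Sequence[Tuple[str, str]]) -> List[str]:
    """Take the first node of each mirror pair as a fault injection node."""
    return [first for first, _ in pairs]


def next_single_node(fault_nodes: Sequence[str], round_no: int,
                     select_type: str = "seq",
                     rng: Optional[random.Random] = None) -> str:
    """Pick the node for one round of single-node fault injection."""
    if select_type == "seq":
        # Walk the list in order, wrapping around (assumed behaviour).
        return fault_nodes[round_no % len(fault_nodes)]
    if select_type == "random":
        chooser = rng if rng is not None else random
        return chooser.choice(list(fault_nodes))
    raise ValueError("select_type must be 'seq' or 'random'")
```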
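In some embodiments, when the type of fault to be injected is the two-node simultaneous fault type, determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes:
when the type of node to be injected is sequential injection, selecting, in sequence, the non-duplicate nodes of two mirror pairs as the nodes to be injected based on the mirror pair information; when the type of node to be injected is random injection, randomly selecting the non-duplicate nodes of two mirror pairs as the nodes to be injected based on the mirror pair information.
When the type of fault to be injected is the two-node simultaneous fault type, each fault injection makes two nodes fail at the same time, so the number of nodes injected each time is two. Specifically, when the type of fault to be injected is the two-node simultaneous fault type and the type of node to be injected is sequential injection, the non-duplicate nodes of two mirror pairs can be selected in sequence as the nodes to be injected using the mirror pair information. If a duplicated node shared by the two mirror pairs were selected for simultaneous fault injection, the data cached on the nodes would be lost; therefore, based on the mirror pair information, the non-duplicate nodes of two mirror pairs are selected in sequence and injected with faults simultaneously.
For example, if the mirror pair information is (node1, node2), (node2, node3), (node3, node4), (node4, node1), selecting the non-duplicate nodes of two mirror pairs in sequence gives node1 and node3 as the nodes to be injected.
When the type of fault to be injected is the two-node simultaneous fault type and the type of node to be injected is random injection, the non-duplicate nodes of two mirror pairs can be selected at random as the nodes to be injected using the mirror pair information. For example, if the mirror pair information is (node1, node2), (node2, node3), (node3, node4), (node4, node1), randomly selecting the non-duplicate nodes of two mirror pairs might give node2 and node4 as the nodes to be injected.
When faults are injected into two nodes simultaneously, two different nodes from different mirror pairs must be selected. The nodes are chosen by selecting two mirror pairs, sequentially or randomly, merging their elements, and removing the duplicated ones. For example, if the first mirror pair (node1, node2) and the second mirror pair (node2, node3) are selected, merging and deduplicating gives (node1, node3) as the two nodes to be injected. The next selection is made only after both nodes have recovered. After the nodes are selected, two threads are started, each logging in remotely to one of the nodes, and the fault injection commands are sent to both nodes at the same time; simulating simultaneous faults requires logging in to both nodes and executing the commands concurrently.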
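The merge-and-deduplicate step and the two-thread dispatch can be sketched as follows. This sketch assumes the two selected mirror pairs share one node, as in both examples above; `inject_on_node` is a hypothetical callable standing in for the remote login and fault injection command, and the barrier is one possible way to start both commands at nearly the same moment.

```python
import threading
from typing import Callable, List, Sequence


def non_duplicate_nodes(pair_a: Sequence[str], pair_b: Sequence[str]) -> List[str]:
    """Merge two mirror pairs and keep only the nodes that appear once."""
    merged = list(pair_a) + list(pair_b)
    return [node for node in merged if merged.count(node) == 1]


def inject_simultaneously(targets: Sequence[str],
                          inject_on_node: Callable[[str], None]) -> None:
    """Run inject_on_node for every target, one thread per node."""
    barrier = threading.Barrier(len(targets))

    def worker(node: str) -> None:
        barrier.wait()          # release all threads together
        inject_on_node(node)    # hypothetical: remote login + inject command

    threads = [threading.Thread(target=worker, args=(node,)) for node in targets]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Threads only approximate simultaneity; the actual skew depends on network latency to each node, which this sketch does not measure.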
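In some embodiments, when the type of fault to be injected is the three-node sequential fault type, determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes:
when the type of node to be injected is sequential injection, selecting three nodes in sequence as the nodes to be injected, following the node order described in the mirror pair information; when the type of node to be injected is random injection, randomly selecting three nodes as the nodes to be injected based on the mirror pair information.
When the type of fault to be injected is the three-node sequential fault type, each round of fault injection involves three nodes, and the three nodes are injected one after another: one node is injected first, and then the remaining two in turn. Specifically, when the type of fault to be injected is the three-node sequential fault type and the type of node to be injected is sequential injection, three nodes can be selected in sequence as the nodes to be injected, following the node order described in the mirror pair information. For example, if the node order described in the mirror pair information is [node1, node2, node3, node4], the three nodes selected in sequence are node1, node2, and node3.
Further, when the type of fault to be injected is the three-node sequential fault type and the type of node to be injected is random injection, three nodes can be selected at random as the nodes to be injected based on the mirror pair information. For example, if the mirror pair information is (node1, node2), (node2, node3), (node3, node4), (node4, node1), the three randomly selected nodes might be node1, node3, and node4.
In practice, when faults are injected into several nodes in sequence to simulate the "three of four failed" scenario, the first node of each mirror pair is again used as a fault injection node and saved in a fault node list [node1, node2, node3, node4]. Three nodes are selected, sequentially or randomly, and injected with faults one after another; all faulty nodes are then restored, and the next selection is made once they are back to normal. Sequential selection of three nodes starts from one node and then takes the two nodes that follow it in order: for example, [node1, node2, node3] is selected first, [node2, node3, node4] next, and then [node3, node4, node1, node2], cycling through the list in this way. Random selection picks three nodes from the node list using a random algorithm, for example [node1, node3, node4]. After the nodes are selected, the script logs in to the cluster remotely using the cluster login information and executes the fault injection command to trigger the fault; every node can be operated through the cluster, so there is no need to log in to the nodes directly.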
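In some embodiments, when the fault injection type is the single-node fault injection type, determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes: when the type of node to be injected is sequential injection, selecting one node at a time as the node to be injected, following the node order described in the mirror pair information.
In some embodiments, when the fault injection type is the single-node fault injection type, determining the node to be injected from the multiple nodes according to the type of fault to be injected, the type of node to be injected, and the mirror pair information includes: when the type of node to be injected is random injection, randomly selecting one node as the node to be injected based on the mirror pair information.
In some embodiments, after a single node is determined to be the node to be injected, the method further includes: detecting whether the node to be injected meets the fault injection condition; when the condition is not met, entering a wait loop until the node to be injected meets the fault injection condition, and then executing the step of performing fault injection on the node to be injected.
The fault injection condition is a preset condition used to judge whether fault injection can be performed on a single node, and it can be set in advance according to actual business requirements, product requirements, or application scenarios. In other words, when a fault is injected into a node, it must first be determined whether the node meets the conditions for fault injection so that the front-end IO service is not interrupted. If the fault cannot be injected, the script waits in a loop until the node can be injected. Forcing the injection would take the front-end service offline.
In some embodiments, when the fault injection condition is met, the step of performing fault injection on the node to be injected is executed.
In a specific embodiment, the steps of the mirror-pair-based fault injection method can be explained briefly and in plain terms as follows: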
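(1) Configure the information for the cluster and all nodes in the current test environment in the automation platform or in a configuration file that the automation script can read, so that the basic cluster and node information (mainly user name, password, login port, IP address, and so on) can be read when an automated test case runs, allowing the automation script to log in remotely to the test cluster or its nodes.
(2) Pass the type of fault to be injected (--inject_type), the type of node to be injected (--select_type), the number of fault injections (--inject_count), and the interval between fault injections (--inject_interval) as script parameters to define the scenario for this run. The --inject_type option takes a value from 1 to 4: single-node failure (1, the default), two nodes failing simultaneously (2), three nodes failing in sequence (3), and two nodes failing in sequence (4). The --select_type option sets whether the fault injection nodes are chosen randomly or sequentially: seq (the default) for sequential and random for random.
(3) When faults are injected into a single node over repeated cycles, the first node of each mirror pair, taken while the cluster is in its stable initial state, is used as a fault injection node and saved in a fault node list [node1, node2, node3, node4]. This makes it possible both to inject faults into all nodes of the cluster in turn and to pick nodes at random for fault injection. After each fault injection, the script waits for the fault to recover before choosing the next node. After a node is selected, the script logs in to the cluster remotely using the cluster login information and executes the fault injection command to trigger the fault; every node can be operated through the cluster, so there is no need to log in to the node directly.
(4) When faults are injected into two nodes simultaneously, two different nodes from different domains must be selected. The nodes are chosen by selecting two domains, sequentially or randomly, merging their elements, and removing the duplicated ones. For example, if domain0 and domain1 are selected, the two nodes for fault injection are (node1, node3). The next selection is made only after both nodes have recovered. After the nodes are selected, two threads are started, each logging in remotely to one of the nodes, and the fault injection commands are sent to both nodes at the same time; simulating simultaneous faults requires logging in to both nodes and executing the commands concurrently.
(5) When faults are injected into several nodes in sequence to simulate the "three of four failed" scenario, the first node of each mirror pair is again used as a fault injection node and saved in a fault node list [node1, node2, node3, node4]. Three nodes are selected, sequentially or randomly, and injected with faults one after another; all faulty nodes are then restored, and the next selection is made once they are back to normal. Sequential selection of three nodes starts from one node and then takes the two nodes that follow it in order: for example, [node1, node2, node3] is selected first, [node2, node3, node4] next, and then [node3, node4, node1, node2], cycling through the list in this way. Random selection picks three nodes from the node list using a random algorithm, for example [node1, node3, node4]. After the nodes are selected, the script logs in to the cluster remotely using the cluster login information and executes the fault injection command to trigger the fault; every node can be operated through the cluster, so there is no need to log in to the nodes directly.
(6) Injecting faults into several nodes in sequence to simulate the "two of four failed" scenario follows the same principle and flow as the "three of four failed" scenario, except that two nodes are selected, sequentially or randomly, and injected with faults one after another.
(7) When a fault is injected into a node, it must first be determined whether the node meets the conditions for fault injection so that the front-end IO service is not interrupted. If the fault cannot be injected, the script waits in a loop until the node can be injected. Forcing the injection would take the front-end service offline.
(8) In every fault injection scenario, an uninterrupted front-end IO is taken to mean that fault recovery had no problem: if the IO is not interrupted, the reliability verification for this fault injection passes; otherwise the run fails and exits.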
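It should be understood that although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed one after another, and they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.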
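In some embodiments, as shown in FIG. 6, a mirror-pair-based fault injection apparatus 600 is provided, including an acquisition module 602, a screening module 604, an injection module 606, and a determination module 608, wherein:
The acquisition module 602 is configured to log in to a cluster and obtain a preset mirror pair record table. The preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes.
The screening module 604 is configured to determine a node to be fault-injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table.
The injection module 606 is configured to perform fault injection on the node to be fault-injected and obtain the current service status corresponding to the front-end IO service.
The determination module 608 is configured to determine that fault injection on the node to be fault-injected has succeeded when the current service status indicates that the front-end IO service has not been interrupted.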
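In some embodiments, the mirror-pair-based fault injection apparatus 600 obtains recovery information of the fault-injected node. When the recovery information indicates that the node has recovered, the apparatus returns to the screening module 604 to perform the step of determining a node to be fault-injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table, and stops fault injection once the number of fault injections performed reaches a preset fault injection count threshold.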
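The repeat-until-threshold behaviour can be sketched as a loop over injection rounds. The sketch below is one possible reading, not the apparatus itself: all callbacks (`select`, `inject`, `io_uninterrupted`, `recovered`, `wait`) are hypothetical placeholders for the cluster operations, and raising an exception on IO interruption is an assumption standing in for the "exit with failure" behaviour described in the method steps.

```python
from typing import Callable, List


def run_injection_rounds(
    select: Callable[[int], List[str]],      # picks target nodes for a given round
    inject: Callable[[List[str]], None],     # issues the fault injection command
    io_uninterrupted: Callable[[], bool],    # checks the front-end IO service
    recovered: Callable[[List[str]], bool],  # checks whether the targets have recovered
    max_rounds: int,                         # preset fault injection count threshold
    wait: Callable[[], None] = lambda: None, # pause between recovery polls
) -> int:
    """Repeat select -> inject -> verify -> wait for recovery until the threshold is reached."""
    rounds = 0
    while rounds < max_rounds:
        targets = select(rounds)
        inject(targets)
        rounds += 1
        if not io_uninterrupted():
            # Front-end IO was interrupted: this round of verification failed.
            raise RuntimeError(f"front-end IO interrupted after injecting {targets}")
        while not recovered(targets):
            wait()  # only move on to the next selection after recovery
    return rounds
```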
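In some embodiments, the mirror-pair-based fault injection apparatus 600 determines the nodes in the cluster, forms a mirror pair from each node and its adjacent node to produce the corresponding mirror pair information, and generates the preset mirror pair record table from the mirror pair information.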
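One way to picture the record table is as a mapping from a mirror pair name to its two nodes. The sketch below assumes a ring layout in which each node is paired with the next one and the last node wraps around to the first, and it assumes the pair names domain0, domain1, and so on; the text only says that each node forms a pair with an adjacent node, so both choices are illustrative.

```python
from typing import Dict, List, Tuple


def build_mirror_pair_table(nodes: List[str]) -> Dict[str, Tuple[str, str]]:
    """Pair each node with its neighbour (ring adjacency is an assumption)."""
    n = len(nodes)
    return {f"domain{i}": (nodes[i], nodes[(i + 1) % n]) for i in range(n)}


def first_nodes(table: Dict[str, Tuple[str, str]]) -> List[str]:
    """Collect the first node of every mirror pair as the fault node list."""
    return [pair[0] for pair in table.values()]
```

For four nodes this produces four pairs whose first nodes are node1 through node4, which is consistent with the fault node list [node1,node2,node3,node4] used in the method description.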
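In some embodiments, the screening module 604 obtains a preset fault injection rule, obtains the to-be-injected fault type and the to-be-injected node type according to the preset fault injection rule, and determines the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information.
In some embodiments, when the to-be-injected fault type is the single-node fault type, the screening module 604, when the to-be-injected node type is sequential injection, determines one node at a time as the node to be fault-injected according to the node order described in the mirror pair information, and, when the to-be-injected node type is random injection, randomly selects one node from the multiple nodes as the node to be fault-injected according to the mirror pair information.
In some embodiments, when the to-be-injected fault type is the two-nodes-simultaneous fault type, the screening module 604, when the to-be-injected node type is sequential injection, sequentially selects the non-repeated nodes of two mirror pairs as the nodes to be fault-injected based on the mirror pair information, and, when the to-be-injected node type is random injection, randomly selects the non-repeated nodes of two mirror pairs as the nodes to be fault-injected based on the mirror pair information.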
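The two-node simultaneous case can be sketched in two parts: choosing the non-repeated nodes of two mirror pairs, and issuing the command to both nodes at the same moment from two threads, as the method description mentions. The interpretation of "non-repeated" as "appearing in only one of the two pairs" is an assumption; it reproduces the (node1,node3) example only if domain0 and domain1 share node2. The function names and the `inject` callback are hypothetical.

```python
import threading
from collections import Counter
from typing import Callable, List, Tuple


def non_repeated_nodes(pair_a: Tuple[str, str], pair_b: Tuple[str, str]) -> List[str]:
    """Merge two mirror pairs and keep only nodes that occur exactly once."""
    merged = list(pair_a) + list(pair_b)
    counts = Counter(merged)
    return [node for node in merged if counts[node] == 1]


def inject_simultaneously(nodes: List[str], inject: Callable[[str], None]) -> None:
    """Run inject(node) for every node, releasing all threads at the same time."""
    barrier = threading.Barrier(len(nodes))

    def worker(node: str) -> None:
        barrier.wait()  # all threads proceed together once everyone has arrived
        inject(node)

    threads = [threading.Thread(target=worker, args=(node,)) for node in nodes]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

If the two pairs share no node, `non_repeated_nodes` returns all four nodes, so a real implementation would need an additional rule for that case; the text does not describe one.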
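In some embodiments, when the to-be-injected fault type is the three-nodes-in-sequence fault type, the screening module 604, when the to-be-injected node type is sequential injection, sequentially selects three nodes as the nodes to be fault-injected based on the node order described in the mirror pair information, and, when the to-be-injected node type is random injection, randomly selects three nodes as the nodes to be fault-injected based on the mirror pair information.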
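For specific limitations of the mirror-pair-based fault injection apparatus, reference may be made to the limitations of the mirror-pair-based fault injection method above, which are not repeated here. Each module of the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and perform the operations corresponding to each module.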
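In some embodiments, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store the preset mirror pair record table. The network interface of the computer device is used to communicate with an external terminal over a network connection. When executed by the processor, the computer program implements a mirror-pair-based fault injection method.
Those skilled in the art will understand that the structure shown in FIG. 7 is merely a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.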
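In some embodiments, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the following steps: logging in to a cluster and obtaining a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes; determining a node to be fault-injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table; performing fault injection on the node to be fault-injected and obtaining the current service status corresponding to the front-end IO service; and, when the current service status indicates that the front-end IO service has not been interrupted, determining that fault injection on the node to be fault-injected has succeeded.
In some embodiments, when executing the computer program, the processor further implements the following steps: obtaining recovery information of the fault-injected node; when the recovery information indicates that the node has recovered, returning to the step of determining a node to be fault-injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table; and stopping fault injection once the number of fault injections performed reaches a preset fault injection count threshold.
In some embodiments, when executing the computer program, the processor further implements the following steps: determining the nodes in the cluster, forming a mirror pair from each node and its adjacent node to produce the corresponding mirror pair information, and generating the preset mirror pair record table from the mirror pair information.
In some embodiments, when executing the computer program, the processor further implements the following steps: obtaining a preset fault injection rule, obtaining the to-be-injected fault type and the to-be-injected node type according to the preset fault injection rule, and determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information.
In some embodiments, when the to-be-injected fault type is the single-node fault type, the processor further implements the following steps when executing the computer program: when the to-be-injected node type is sequential injection, determining one node at a time as the node to be fault-injected according to the node order described in the mirror pair information; and, when the to-be-injected node type is random injection, randomly selecting one node from the multiple nodes as the node to be fault-injected according to the mirror pair information.
In some embodiments, when the to-be-injected fault type is the two-nodes-simultaneous fault type, the processor further implements the following steps when executing the computer program: when the to-be-injected node type is sequential injection, sequentially selecting the non-repeated nodes of two mirror pairs as the nodes to be fault-injected based on the mirror pair information; and, when the to-be-injected node type is random injection, randomly selecting the non-repeated nodes of two mirror pairs as the nodes to be fault-injected based on the mirror pair information.
In some embodiments, when the to-be-injected fault type is the three-nodes-in-sequence fault type, the processor further implements the following steps when executing the computer program: when the to-be-injected node type is sequential injection, sequentially selecting three nodes as the nodes to be fault-injected based on the node order described in the mirror pair information; and, when the to-be-injected node type is random injection, randomly selecting three nodes as the nodes to be fault-injected based on the mirror pair information.
In some embodiments, when the fault injection type is the single-node fault injection type, the processor further implements the following step when executing the computer program: when the to-be-injected node type is sequential injection, sequentially selecting one node as the node to be fault-injected based on the node order described in the mirror pair information.
In some embodiments, when the fault injection type is the single-node fault injection type, the processor further implements the following step when executing the computer program: when the to-be-injected node type is random injection, randomly selecting one node as the node to be fault-injected based on the mirror pair information.
In some embodiments, when executing the computer program, the processor further implements the following steps: detecting whether the node to be fault-injected satisfies the fault injection condition; when it does not, entering a wait loop until the node satisfies the fault injection condition, and then performing the step of performing fault injection on the node to be fault-injected.
In some embodiments, when executing the computer program, the processor further implements the following step: when the fault injection condition is satisfied, performing the step of performing fault injection on the node to be fault-injected.
In some embodiments, when executing the computer program, the processor further implements the following step: when the current service status indicates that the front-end IO service has been interrupted, determining that fault injection on the node to be fault-injected has failed.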
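In some embodiments, a non-volatile computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: logging in to a cluster and obtaining a preset mirror pair record table, where the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes; determining a node to be fault-injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table; performing fault injection on the node to be fault-injected and obtaining the current service status corresponding to the front-end IO service; and, when the current service status indicates that the front-end IO service has not been interrupted, determining that fault injection on the node to be fault-injected has succeeded.
In some embodiments, when executing the computer program, the processor further implements the following steps: obtaining recovery information of the fault-injected node; when the recovery information indicates that the node has recovered, returning to the step of determining a node to be fault-injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table; and stopping fault injection once the number of fault injections performed reaches a preset fault injection count threshold.
In some embodiments, when executing the computer program, the processor further implements the following steps: determining the nodes in the cluster, forming a mirror pair from each node and its adjacent node to produce the corresponding mirror pair information, and generating the preset mirror pair record table from the mirror pair information.
In some embodiments, when executing the computer program, the processor further implements the following steps: obtaining a preset fault injection rule, obtaining the to-be-injected fault type and the to-be-injected node type according to the preset fault injection rule, and determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information.
In some embodiments, when the to-be-injected fault type is the single-node fault type, the processor further implements the following steps when executing the computer program: when the to-be-injected node type is sequential injection, determining one node at a time as the node to be fault-injected according to the node order described in the mirror pair information; and, when the to-be-injected node type is random injection, randomly selecting one node from the multiple nodes as the node to be fault-injected according to the mirror pair information.
In some embodiments, when the to-be-injected fault type is the two-nodes-simultaneous fault type, the processor further implements the following steps when executing the computer program: when the to-be-injected node type is sequential injection, sequentially selecting the non-repeated nodes of two mirror pairs as the nodes to be fault-injected based on the mirror pair information; and, when the to-be-injected node type is random injection, randomly selecting the non-repeated nodes of two mirror pairs as the nodes to be fault-injected based on the mirror pair information.
In some embodiments, when the to-be-injected fault type is the three-nodes-in-sequence fault type, the processor further implements the following steps when executing the computer program: when the to-be-injected node type is sequential injection, sequentially selecting three nodes as the nodes to be fault-injected based on the node order described in the mirror pair information; and, when the to-be-injected node type is random injection, randomly selecting three nodes as the nodes to be fault-injected based on the mirror pair information.
In some embodiments, when the fault injection type is the single-node fault injection type, the processor further implements the following step when executing the computer program: when the to-be-injected node type is sequential injection, sequentially selecting one node as the node to be fault-injected based on the node order described in the mirror pair information.
In some embodiments, when the fault injection type is the single-node fault injection type, the processor further implements the following step when executing the computer program: when the to-be-injected node type is random injection, randomly selecting one node as the node to be fault-injected based on the mirror pair information.
In some embodiments, when executing the computer program, the processor further implements the following steps: detecting whether the node to be fault-injected satisfies the fault injection condition; when it does not, entering a wait loop until the node satisfies the fault injection condition, and then performing the step of performing fault injection on the node to be fault-injected.
In some embodiments, when executing the computer program, the processor further implements the following step: when the fault injection condition is satisfied, performing the step of performing fault injection on the node to be fault-injected.
In some embodiments, when executing the computer program, the processor further implements the following step: when the current service status indicates that the front-end IO service has been interrupted, determining that fault injection on the node to be fault-injected has failed.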
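Those of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).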
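The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make a number of variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent application shall be subject to the appended claims.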

Claims (20)

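  1. A mirror-pair-based fault injection method, the method comprising:
    logging in to a cluster and obtaining a preset mirror pair record table, wherein the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes;
    determining a node to be fault-injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table;
    performing fault injection on the node to be fault-injected and obtaining the current service status corresponding to a front-end IO service;
    when the current service status indicates that the front-end IO service has not been interrupted, determining that fault injection on the node to be fault-injected has succeeded.
  2. The method according to claim 1, characterized in that the method further comprises:
    obtaining recovery information of the fault-injected node, and, when the recovery information indicates that the node has recovered, returning to the step of determining a node to be fault-injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table;
    stopping fault injection once the number of fault injections performed reaches a preset fault injection count threshold.
  3. The method according to claim 1, characterized in that, before the logging in to the cluster, the method comprises:
    determining the nodes in the cluster;
    forming a mirror pair from each node and its adjacent node to produce the corresponding mirror pair information;
    generating the preset mirror pair record table from the mirror pair information.
  4. The method according to claim 3, characterized in that the determining the nodes in the cluster comprises:
    counting the nodes in the cluster.
  5. The method according to claim 3, characterized in that the forming a mirror pair from each node and its adjacent node to produce the corresponding mirror pair information comprises:
    determining a current node;
    forming a mirror pair from the current node and its adjacent node, wherein the cached data of the current node and the cached data of the adjacent node form a mirror pair, to obtain the corresponding mirror pair information.
  6. The method according to claim 1, characterized in that the determining a node to be fault-injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table comprises:
    obtaining a preset fault injection rule;
    obtaining a to-be-injected fault type and a to-be-injected node type according to the preset fault injection rule;
    determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information.
  7. The method according to claim 6, characterized in that, when the to-be-injected fault type is the single-node fault type, the determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information comprises: when the to-be-injected node type is sequential injection, determining one node at a time as the node to be fault-injected according to the node order described in the mirror pair information.
  8. The method according to claim 6, characterized in that, when the to-be-injected fault type is the single-node fault type, the determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information comprises: when the to-be-injected node type is random injection, randomly selecting one node from the multiple nodes as the node to be fault-injected according to the mirror pair information.
  9. The method according to claim 6, characterized in that, when the to-be-injected fault type is the two-nodes-simultaneous fault type, the determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information comprises:
    when the to-be-injected node type is sequential injection, sequentially selecting the non-repeated nodes of two mirror pairs as the nodes to be fault-injected based on the mirror pair information.
  10. The method according to claim 6, characterized in that, when the to-be-injected fault type is the two-nodes-simultaneous fault type, the determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information comprises:
    when the to-be-injected node type is random injection, randomly selecting the non-repeated nodes of two mirror pairs as the nodes to be fault-injected based on the mirror pair information.
  11. The method according to claim 6, characterized in that, when the to-be-injected fault type is the three-nodes-in-sequence fault type, the determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information comprises:
    when the to-be-injected node type is sequential injection, sequentially selecting three nodes as the nodes to be fault-injected based on the node order described in the mirror pair information.
  12. The method according to claim 6, characterized in that, when the to-be-injected fault type is the three-nodes-in-sequence fault type, the determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information comprises:
    when the to-be-injected node type is random injection, randomly selecting three nodes as the nodes to be fault-injected based on the mirror pair information.
  13. The method according to claim 6, characterized in that, when the fault injection type is the single-node fault injection type, the determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information comprises:
    when the to-be-injected node type is sequential injection, sequentially selecting one node as the node to be fault-injected based on the node order described in the mirror pair information.
  14. The method according to claim 6, characterized in that, when the fault injection type is the single-node fault injection type, the determining the node to be fault-injected from the multiple nodes according to the to-be-injected fault type, the to-be-injected node type, and the mirror pair information comprises:
    when the to-be-injected node type is random injection, randomly selecting one node as the node to be fault-injected based on the mirror pair information.
  15. The method according to claim 13 or 14, characterized in that, after a single node is determined as the node to be fault-injected, the method further comprises:
    detecting whether the node to be fault-injected satisfies a fault injection condition;
    when the fault injection condition is not satisfied, entering a wait loop until the node to be fault-injected satisfies the fault injection condition, and then performing the step of performing fault injection on the node to be fault-injected.
  16. The method according to claim 15, characterized in that the method further comprises:
    when the fault injection condition is satisfied, performing the step of performing fault injection on the node to be fault-injected.
  17. The method according to claim 1, characterized in that the method further comprises:
    when the current service status indicates that the front-end IO service has been interrupted, determining that fault injection on the node to be fault-injected has failed.
  18. A mirror-pair-based fault injection apparatus, characterized in that the apparatus comprises:
    an acquisition module, configured to log in to a cluster and obtain a preset mirror pair record table, wherein the preset mirror pair record table includes mirror pair information, the cluster includes multiple nodes, and the mirror pair information includes mirror pairs and their associated nodes;
    a screening module, configured to determine a node to be fault-injected from the multiple nodes according to the mirror pair information in the preset mirror pair record table;
    an injection module, configured to perform fault injection on the node to be fault-injected and obtain the current service status corresponding to a front-end IO service;
    a determination module, configured to determine that fault injection on the node to be fault-injected has succeeded when the current service status indicates that the front-end IO service has not been interrupted.
  19. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 17.
  20. A non-volatile computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 17.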
PCT/CN2023/102838 2022-10-13 2023-06-27 基于镜像对的故障注入方法、装置、设备和存储介质 WO2024078015A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211253227.8A CN115328814B (zh) 2022-10-13 2022-10-13 基于镜像对的故障注入方法、装置、设备和存储介质
CN202211253227.8 2022-10-13

Publications (1)

Publication Number Publication Date
WO2024078015A1 true WO2024078015A1 (zh) 2024-04-18

Family

ID=83913760

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/102838 WO2024078015A1 (zh) 2022-10-13 2023-06-27 基于镜像对的故障注入方法、装置、设备和存储介质

Country Status (2)

Country Link
CN (1) CN115328814B (zh)
WO (1) WO2024078015A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115328814B (zh) * 2022-10-13 2023-04-14 苏州浪潮智能科技有限公司 基于镜像对的故障注入方法、装置、设备和存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461865A (zh) * 2014-11-04 2015-03-25 哈尔滨工业大学 云环境下分布式文件系统可靠性测试套件
CN109495297A (zh) * 2018-11-05 2019-03-19 中国电子科技集团公司第二十八研究所 基于启发式强化学习的韧性云环境故障注入方法
CN112596934A (zh) * 2020-12-26 2021-04-02 中国农业银行股份有限公司 一种故障测试方法及装置
WO2022033672A1 (en) * 2020-08-12 2022-02-17 Huawei Technologies Co., Ltd. Apparatus and method for injecting a fault into a distributed system
CN115328814A (zh) * 2022-10-13 2022-11-11 苏州浪潮智能科技有限公司 基于镜像对的故障注入方法、装置、设备和存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331324A (zh) * 2014-11-04 2015-02-04 哈尔滨工业大学 MapReduce故障注入套件
US10922203B1 (en) * 2018-09-21 2021-02-16 Nvidia Corporation Fault injection architecture for resilient GPU computing
CN109857522B (zh) * 2019-03-01 2021-03-02 哈尔滨工业大学 一种面向kvm的虚拟化层故障注入方法

Also Published As

Publication number Publication date
CN115328814B (zh) 2023-04-14
CN115328814A (zh) 2022-11-11
