CN112181844B

CN112181844B - Detection method and device for verifying fault-tolerant mechanism of distributed protocol activity attribute

Info

Publication number: CN112181844B
Application number: CN202011083317.8A
Authority: CN
Inventors: 吴化尧; 陆超逸; 聂长海
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2020-10-12
Filing date: 2020-10-12
Publication date: 2022-02-18
Anticipated expiration: 2040-10-12
Also published as: CN112181844A

Abstract

The invention relates to the field of model checking, and in particular to a detection method and device for verifying the fault-tolerance mechanisms of liveness (activity) properties of distributed protocols. The method comprises the following steps: initializing the system to be verified; acquiring the set T_Set of transitions that the system to be verified can execute in its current state; defining peer nodes; defining identical operations; reducing T_Set based on a peer reduction strategy; obtaining the state of the system to be verified after a transition; checking the safety property of that state to obtain safe states; checking the liveness property of the safe states to obtain live states, injecting faults, and re-checking the safety and liveness properties of the state of the system to be verified; and outputting a counterexample set C. The method verifies more effectively and can equivalently traverse the whole state space while exploring fewer execution sequences.

Description

Detection method and device for verifying fault-tolerant mechanism of distributed protocol activity attribute

Technical Field

The invention relates to the field of model checking, and in particular to a detection method and device for verifying the fault-tolerance mechanisms of liveness (activity) properties of distributed protocols.

Background

Model checking is used for software and hardware systems with high reliability and safety requirements. The whole system is modeled so as to formally describe all of its possible behaviors during operation; these behaviors are traversed to find any that violate the properties specified by the system's designers, and each violating behavior is output as a counterexample. By traversing the system's state space, model checking can find all bugs in the system under verification that violate the specified properties, thereby ensuring high safety and reliability.

Existing model checking techniques fall into implementation-level model checking and abstract model checking. Implementation-level model checking directly verifies the running implementation code of the system; abstract model checking converts the system into an abstract model (an automaton, a Petri net, etc.) and verifies that model using graph theory, set theory and related tools. In both cases, because the entire state space of the system must be traversed, model checking has always been troubled by the state-space explosion problem.

Moreover, model checking on an abstract model can verify only the design of the system and its abstraction; it cannot find defects introduced by the code implementation. Abstract modeling may also produce a model that does not match the system, and model checking an unmatched model is meaningless. Existing implementation-level model checking tools, in turn, give little attention to verifying liveness properties and their fault tolerance.

Existing model checking tools focus on verifying the safety properties of distributed systems and neglect liveness properties and fault-tolerance mechanisms, especially the mechanisms that protect liveness. Although some tools support injecting faults into the system under verification to test its fault tolerance, they usually inject faults randomly or exhaustively (at any point during the system's execution), so the injection lacks focus, which raises many problems and challenges.

Disclosure of Invention

The invention aims to provide a detection method and device for verifying the fault-tolerance mechanisms of liveness properties of distributed protocols.

In order to solve the technical problems, the invention provides the following technical scheme:

The detection method for verifying the fault-tolerance mechanisms of liveness properties of distributed protocols comprises the following steps:

initializing the system to be verified to an initial state;

acquiring the transition set T_Set that the system to be verified can execute in its current state;

defining nodes N1 and N2 as peer nodes at a given moment if and only if conditions (1) and (2) both hold, where the conditions are expressed in terms of the set of functions that the i-th node can perform at that moment;

defining transitions t1 and t2 as performing the same operation if and only if conditions (3) and (4) both hold:

condition (3): t1 and t2 operate on the same object, or on corresponding objects in different nodes;

condition (4): t1 and t2 perform the same behavior on their operands;

reducing T_Set with a peer reduction strategy: any two transitions are taken from T_Set, and if their target nodes are peer nodes in the system to be verified and the two transitions perform the same operation, they are peer-redundant and either one of them is removed from T_Set;
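The peer reduction step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: because the formulas for conditions (1) and (2) are not reproduced here, the sketch assumes, following the later remark that peer nodes have the same roles and functions, that two nodes are peers when they have the same role and the same set of currently executable functions. Objects in different nodes are treated as corresponding when they share a node-relative name. `Node`, `Transition`, `are_peers`, `same_operation` and `peer_reduce` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Node:
    node_id: str
    role: str                   # assumed basis of condition (1)
    functions: FrozenSet[str]   # functions the node can perform at this moment


@dataclass(frozen=True)
class Transition:
    target: Node    # node on which the transition's event is executed
    obj: str        # operated object, named relative to its node (e.g. "log")
    action: str     # behavior performed on that object (e.g. "append")


def are_peers(n1: Node, n2: Node) -> bool:
    # Assumed reading of conditions (1) and (2): same role, same executable functions.
    return n1.role == n2.role and n1.functions == n2.functions


def same_operation(t1: Transition, t2: Transition) -> bool:
    # Condition (3): same object, or the corresponding object on another node.
    # Condition (4): the same behavior is performed on that object.
    return t1.obj == t2.obj and t1.action == t2.action


def peer_reduce(t_set: List[Transition]) -> List[Transition]:
    """Keep one representative of every group of peer-redundant transitions."""
    reduced: List[Transition] = []
    for t in t_set:
        redundant = any(
            are_peers(t.target, kept.target) and same_operation(t, kept)
            for kept in reduced
        )
        if not redundant:
            reduced.append(t)
    return reduced
```

For example, with two followers f1 and f2 that share a role and function set and a leader L, the transitions "append on f1", "append on f2", "append on L" and "ack on f1" reduce to three: "append on f2" is dropped as a peer of "append on f1". Because both predicates here are equalities (hence equivalence relations), keeping the first member of each class gives the same result as repeatedly removing one transition from any redundant pair.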

obtaining the state of the system to be verified after a transition, and re-acquiring the transition set T_Set that the system can execute in its current state;

checking the safety property of the post-transition state to obtain a safe state, where a safety property is one that every state of the system must satisfy, and a safe state is a post-transition state that satisfies the safety property;

checking the liveness property of the safe state to obtain a live state, where a liveness property is one that need not hold in every state of the system but must eventually hold in some future state, and a live state is a safe state that satisfies the liveness property;

injecting faults, and re-checking the safety and liveness properties of the state of the system to be verified;

once every transition in the T_Set of the initial state has been executed, outputting the counterexample set C, which contains the transition sequences that lead the system to a state violating the safety property and the transition sequences after which the system still does not satisfy the liveness property at the maximum execution depth k.

Optionally, obtaining the state of the system to be verified after a transition specifically comprises the following steps:

setting the maximum execution depth k to be explored by the system to be verified, where k is a positive integer;

selecting, by the system to be verified, an unexecuted transition as the target transition, and exploring from depth 0 up to the maximum execution depth k starting with the target transition, which drives the system from the initial state to state S_{n,a}; each exploration step increases the depth of the previous one by one, until the current exploration depth exceeds k. Here the target transition belongs to the reduced set T_Set; n is the index of the target transition in the sequence T_Set, 1 ≤ n ≤ N, where N is the total number of transitions in T_Set; and a is the current exploration depth (the number of exploration steps so far), a positive integer with 1 ≤ a ≤ k.
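The bounded exploration described above amounts to a depth-first search that stops extending a path once it reaches depth k. The sketch below assumes the system can be described by two hypothetical callbacks, `enabled(state)` (the transition set T_Set of a state) and `apply(state, t)` (the successor state), and that states are immutable values, so returning to S_{n,a-1} is simply returning from a recursive call. A real implementation-level checker would instead have to restore or replay the running system.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

State = Hashable
Path = Tuple[str, ...]


def bounded_explore(initial: State,
                    enabled: Callable[[State], Iterable[str]],
                    apply: Callable[[State, str], State],
                    k: int) -> List[Tuple[Path, State]]:
    """Return (transition sequence, reached state) for every path of length <= k."""
    explored: List[Tuple[Path, State]] = []

    def dfs(state: State, path: Path, depth: int) -> None:
        explored.append((path, state))
        if depth == k:            # one more step would exceed the maximum depth k
            return
        for t in enabled(state):  # each unexecuted transition becomes the target in turn
            dfs(apply(state, t), path + (t,), depth + 1)

    dfs(initial, (), 0)
    return explored


# Two independent events that can each fire once, explored to depth 2.
events = ("a", "b")
paths = bounded_explore(frozenset(),
                        lambda s: [e for e in events if e not in s],
                        lambda s, e: s | {e},
                        k=2)
print([p for p, _ in paths])  # [(), ('a',), ('a', 'b'), ('b',), ('b', 'a')]
```

With k=1 the same toy yields only the empty path and the two single-event paths, which shows how k bounds the explored space.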

Optionally, checking the safety property of the state of the system to be verified after a transition comprises the following steps:

acquiring the standard features of the safety property;

obtaining the current safety-related features of the state S_{n,a} to be checked;

comparing the current safety-related features of state S_{n,a} with the standard features of the safety property: if they are the same, S_{n,a} is determined to be a safe state; otherwise S_{n,a} is a bug state, the transition sequence leading to it is put into the counterexample set C, the system is returned to the previous state S_{n,a-1}, and the step of obtaining the post-transition state and re-acquiring the T_Set of the current state is executed again, taking the new state as the state S_{n,a} to be checked, until every transition in T_Set has been executed.
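The safety check with rollback can be sketched by extending the same bounded search. Here the comparison between a state's current safety-related features and the standard features is modeled by a hypothetical `features(state)` function compared for equality with a `standard` value. When they differ, the path is recorded in C and the search backs up to the previous state instead of extending the bug state. The lock-free two-process toy is purely illustrative.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

State = Hashable
Path = Tuple[str, ...]


def explore_safety(initial: State,
                   enabled: Callable[[State], Iterable[str]],
                   apply: Callable[[State, str], State],
                   features: Callable[[State], object],
                   standard: object,
                   k: int) -> Tuple[List[Path], List[Path]]:
    """Bounded DFS that returns (counterexample set C, paths to safe states)."""
    counterexamples: List[Path] = []
    safe: List[Path] = []

    def dfs(state: State, path: Path, depth: int) -> None:
        if features(state) != standard:
            counterexamples.append(path)  # bug state: record the transition sequence
            return                        # back to S_{n,a-1}; caller tries its next transition
        safe.append(path)
        if depth == k:
            return
        for t in enabled(state):
            dfs(apply(state, t), path + (t,), depth + 1)

    dfs(initial, (), 0)
    return counterexamples, safe


# Toy: processes p1 and p2 each enter a critical section with no lock at all.
procs = ("p1", "p2")
C, safe_paths = explore_safety(
    frozenset(),
    lambda s: [p for p in procs if p not in s],
    lambda s, p: s | {p},
    features=lambda s: len(s) <= 1,   # safety feature: at most one process inside
    standard=True,
    k=2)
print(C)  # [('p1', 'p2'), ('p2', 'p1')]
```

Both interleavings that put the two processes in the critical section together are reported, while the single-entry prefixes remain safe states from which exploration continues.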

Optionally, checking the liveness property of the safe state to obtain the live state specifically comprises the following steps:

acquiring the standard features of the liveness property;

obtaining the current liveness-related features of the safe state to be checked;

comparing the current liveness-related features of the safe state with the standard features of the liveness property: if they are the same, the safe state is determined to be a live state, the system is returned to the previous state S_{n,a-1}, and the target transition is defined as the key transition t; if they differ, the step of checking the safety property of the post-transition state is executed to obtain a new safe state, which is taken as the safe state to be checked;

if, when exploration reaches the maximum execution depth k, the current liveness-related features of the safe state to be checked still differ from the standard features of the liveness property, the transition sequence is put into the counterexample set C.
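A bounded liveness check in the spirit of these steps is sketched below. It assumes a hypothetical predicate `is_live(state)` standing in for the comparison with the liveness property's standard features. A path that reaches a live state within k steps yields a key transition: its last transition, paired with the prefix that leads to S_{n,a-1}. A path that reaches depth k without ever reaching a live state is recorded in C. Treating a dead end before depth k the same way is an additional assumption of this sketch.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

State = Hashable
Path = Tuple[str, ...]


def check_liveness(initial: State,
                   enabled: Callable[[State], Iterable[str]],
                   apply: Callable[[State, str], State],
                   is_live: Callable[[State], bool],
                   k: int) -> Tuple[List[Tuple[Path, str]], List[Path]]:
    """Return (key transitions as (prefix, t) pairs, counterexample paths)."""
    keys: List[Tuple[Path, str]] = []
    counterexamples: List[Path] = []

    def dfs(state: State, path: Path, depth: int) -> None:
        if is_live(state):
            if path:                          # the step that made the property hold
                keys.append((path[:-1], path[-1]))
            return
        successors = list(enabled(state))
        if depth == k or not successors:      # liveness not reached within depth k
            counterexamples.append(path)
            return
        for t in successors:
            dfs(apply(state, t), path + (t,), depth + 1)

    dfs(initial, (), 0)
    return keys, counterexamples


# Toy election: a leader is elected once both votes have arrived.
VOTES = ("vote_a", "vote_b")

def pending_votes(s):
    return [v for v in VOTES if v not in s]

def receive(s, v):
    return s | {v}

def leader_elected(s):
    return len(s) == 2

print(check_liveness(frozenset(), pending_votes, receive, leader_elected, k=2))
# ([(('vote_a',), 'vote_b'), (('vote_b',), 'vote_a')], [])
print(check_liveness(frozenset(), pending_votes, receive, leader_elected, k=1))
# ([], [('vote_a',), ('vote_b',)])
```

With k=2 each interleaving reaches the live state and contributes a key transition; with k=1 no path gets there, so both paths end up in C as possible liveness violations.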

Optionally, injecting faults and re-checking the safety and liveness properties of the state of the system to be verified specifically comprises the following steps:

setting the maximum number of faults for the system to be verified;

injecting a fault into the system to be verified to destroy the key transition t, and setting the number of fault-recovery steps to 0;

comparing the number of faults injected so far with the maximum number of faults: if it is smaller, the system, with the fault injected, re-runs the exploration from depth 0 to the maximum execution depth k, and the safety and liveness properties of state S_{n,a-1} are checked; if it equals the maximum, the step of obtaining the post-transition state and re-acquiring the T_Set of the current state is executed;

where checking the safety and liveness properties of state S_{n,a-1} specifically comprises:

checking the safety and liveness properties of S_{n,a-1} as described above: if S_{n,a-1} violates the safety property, or fails to satisfy the liveness property within a fault-recovery depth of k steps, the transition sequence that causes this is put into the counterexample set C; if S_{n,a-1} satisfies both properties, a fault is injected into the system to destroy the key transition t, and the number of fault-recovery steps is set to 0.
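The fault-injection loop can be sketched as follows, assuming a simple fault model in which a fault "destroys" a transition by making it permanently unavailable (for instance, a dropped message). Starting from S_{n,a-1} and its key transition t, each round disables the current key transition, searches at most k recovery steps (counted from 0) for a state satisfying the liveness property, and, if one is found, treats the last recovery transition as the next key transition, until the maximum number of faults is reached. The function names and the acknowledgement toy are hypothetical.

```python
from typing import FrozenSet, Hashable, List, Optional, Tuple

State = Hashable
Path = Tuple[str, ...]


def recover(state: State, enabled, apply, is_safe, is_live, k: int,
            destroyed: FrozenSet[str]) -> Tuple[str, Path]:
    """Search at most k recovery steps (len(path) starts at 0), skipping destroyed transitions."""
    def dfs(s: State, path: Path) -> Optional[Tuple[str, Path]]:
        if not is_safe(s):
            return ("unsafe", path)
        if is_live(s):
            return ("live", path)
        if len(path) == k:          # recovery depth exhausted
            return None
        for t in enabled(s):
            if t in destroyed:      # this transition was broken by an injected fault
                continue
            result = dfs(apply(s, t), path + (t,))
            if result is not None:
                return result
        return None

    return dfs(state, ()) or ("not_live", ())


def inject_faults(pre_state: State, key: str, enabled, apply, is_safe, is_live,
                  k: int, max_faults: int) -> List[Tuple[str, Path, Tuple[str, ...]]]:
    counterexamples = []
    destroyed: FrozenSet[str] = frozenset()
    state, prefix = pre_state, ()
    for _ in range(max_faults):
        destroyed = destroyed | {key}            # inject a fault on the key transition
        outcome, path = recover(state, enabled, apply, is_safe, is_live, k, destroyed)
        if outcome != "live":
            counterexamples.append((outcome, prefix + path, tuple(sorted(destroyed))))
            break
        if not path:                             # already live: no key transition to break
            break
        for t in path[:-1]:                      # advance to the state before the new key
            state = apply(state, t)
        prefix, key = prefix + path[:-1], path[-1]
    return counterexamples


# Toy: a write counts as acknowledged once either replica acknowledges it.
ACKS = ("ack_r1", "ack_r2")

def acks_enabled(s):
    return [a for a in ACKS if a not in s]

def add_ack(s, a):
    return s | {a}

def always_safe(s):
    return True

def acknowledged(s):                             # liveness: eventually acknowledged
    return len(s) >= 1

print(inject_faults(frozenset(), "ack_r1", acks_enabled, add_ack,
                    always_safe, acknowledged, k=2, max_faults=1))  # []
print(inject_faults(frozenset(), "ack_r1", acks_enabled, add_ack,
                    always_safe, acknowledged, k=2, max_faults=2))
# [('not_live', (), ('ack_r1', 'ack_r2'))]
```

With one fault the write is still acknowledged through ack_r2, so nothing is reported; with two faults no recovery path remains and the run is recorded as a liveness counterexample together with the destroyed transitions.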

Further, the present invention also provides a detection device for verifying the fault-tolerance mechanisms of liveness properties of distributed protocols, comprising:

an initialization module, configured to initialize the system to be verified to an initial state;

a transition set acquisition module, configured to acquire the transition set T_Set that the system to be verified can execute in its current state;

a first defining module, configured to define nodes N1 and N2 as peer nodes at a given moment if and only if conditions (5) and (6) both hold, where the conditions are expressed in terms of the set of functions that the i-th node can perform at that moment;

a second defining module, configured to define transitions t1 and t2 as performing the same operation if and only if conditions (7) and (8) both hold:

condition (7): t1 and t2 operate on the same object, or on corresponding objects in different nodes;

condition (8): t1 and t2 perform the same behavior on their operands;

a peer reduction module, configured to reduce T_Set with the peer reduction strategy: any two transitions are taken from T_Set, and if their target nodes are peer nodes in the system to be verified and the two transitions perform the same operation, they are peer-redundant and either one of them is removed from T_Set;

a transition exploration module, configured to obtain the state of the system to be verified after a transition and to re-acquire the transition set T_Set that the system can execute in its current state;

a first checking module, configured to check the safety property of the post-transition state to obtain a safe state, where a safety property is one that every state of the system must satisfy, and a safe state is a post-transition state that satisfies the safety property;

a second checking module, configured to check the liveness property of the safe state to obtain a live state, where a liveness property is one that need not hold in every state of the system but must eventually hold in some future state, and a live state is a safe state that satisfies the liveness property;

a fault injection module, configured to inject faults and re-check the safety and liveness properties of the state of the system to be verified;

and an output module, configured to output the counterexample set C once every transition in the T_Set of the initial state has been executed, where C contains the transition sequences that lead the system to a state violating the safety property and the transition sequences after which the system still does not satisfy the liveness property at the maximum execution depth k.

Optionally, the transition exploration module specifically comprises:

a depth setting unit, configured to set the maximum execution depth k to be explored by the system to be verified, where k is a positive integer;

a transition exploration unit, configured to select an unexecuted transition as the target transition and to explore from depth 0 up to the maximum execution depth k starting with the target transition, which drives the system from the initial state to state S_{n,a}; each exploration step increases the depth of the previous one by one, until the current exploration depth exceeds k. Here the target transition belongs to the reduced set T_Set; n is the index of the target transition in the sequence T_Set, 1 ≤ n ≤ N, where N is the total number of transitions in T_Set; and a is the current exploration depth (the number of exploration steps so far), a positive integer with 1 ≤ a ≤ k.

Optionally, the first checking module specifically comprises:

a first parameter acquisition unit, configured to acquire the standard features of the safety property;

a second parameter acquisition unit, configured to obtain the current safety-related features of the state S_{n,a} to be checked;

a first comparison unit, configured to compare the current safety-related features of state S_{n,a} with the standard features of the safety property: if they are the same, S_{n,a} is determined to be a safe state; otherwise S_{n,a} is a counterexample state, the transition sequence is put into the counterexample set C, the system is returned to the previous state S_{n,a-1}, and the transition exploration module is executed again, taking the new state as the state S_{n,a} to be checked, until every transition in T_Set has been executed.

Optionally, the second checking module specifically comprises:

a third parameter acquisition unit, configured to acquire the standard features of the liveness property;

a fourth parameter acquisition unit, configured to obtain the current liveness-related features of the safe state to be checked;

a second comparison unit, configured to compare the current liveness-related features of the safe state with the standard features of the liveness property: if they are the same, the safe state is determined to be a live state, the system is returned to the previous state S_{n,a-1}, and the target transition is defined as the key transition t; if they differ, the first checking module is executed to obtain a new safe state, which is taken as the safe state to be checked;

and a recording unit, configured to put the transition sequence into the counterexample set C if, when exploration reaches the maximum execution depth k, no safe state whose liveness-related features match the standard features of the liveness property has appeared.

Optionally, the fault injection module specifically comprises:

a fault setting unit, configured to set the maximum number of faults for the system to be verified;

a fault injection unit, configured to inject a fault into the system to be verified to destroy the key transition t and to set the number of fault-recovery steps to 0;

a third comparison unit, configured to compare the number of faults injected so far with the maximum number of faults: if it is smaller, the system, with the fault injected, re-runs the exploration from depth 0 to the maximum execution depth k, and the safety and liveness properties of state S_{n,a-1} are checked; if it equals the maximum, the transition exploration module is executed;

the safety and liveness properties of S_{n,a-1} being checked by the first and second checking modules: if S_{n,a-1} violates the safety property, or fails to satisfy the liveness property within a fault-recovery depth of k steps, the transition sequence that causes this is put into the counterexample set C; if S_{n,a-1} satisfies both properties, the fault injection unit is executed.

Compared with the prior art, the invention has the beneficial effects that:

The application provides a method and device that use implementation-level model checking to verify distributed protocols. The protocol to be verified is modeled as an event-interleaving model, that is, as a set of event-execution sequences starting from an initial state. Running all possible actions yields all possible event-execution sequences of the system, i.e., its complete model, and executing these sequences dynamically verifies the full state space of the distributed system at run time.

The method combines model checking with fault injection, and provides a way of selecting fault-injection points purposefully, together with a peer reduction policy (PRP) for the state space based on the roles of the nodes in the distributed system.

The nondeterminism of the protocol considered by the application comes from the nondeterministic execution order of concurrent events, which include communication between nodes over the network and local read/write operations on each node. The distributed protocol is therefore modeled as a set of event-execution sequences starting from an initial state, in which concurrent events may interleave, and implementation-level model checking completes the verification by exploring and running every such sequence on the model. For a safety property, every state encountered while the protocol executes must be checked. For a liveness property, the appearance of a state satisfying it is bounded by a finite depth k: a depth-first search (DFS) of depth k is run from the initial state of the system, and if the system cannot satisfy the liveness property within depth k, it is considered to possibly violate the liveness property.

The application defines key transitions and uses them as fault-injection points: a fault is injected at a key transition to disrupt its execution, and the checker then detects whether some other execution path still lets the system satisfy the liveness property within depth k.

The method and device verify more effectively and can equivalently traverse the whole state space while exploring fewer execution sequences.

Drawings

FIG. 1 is a schematic flow diagram of the method of the present invention;

FIG. 2 is a schematic flow chart of step S600 of the method of the present invention;

FIG. 3 is a schematic flow chart of step S700 of the method of the present invention;

FIG. 4 is a schematic flow chart of step S800 of the method of the present invention;

FIG. 5 is a schematic flow chart of step S900 of the method of the present invention;

FIG. 6 is a block diagram of the apparatus of the present invention;

FIG. 7 is a block diagram of a specific structure of the apparatus of the present invention;

FIG. 8 is a block diagram of a specific structure of the apparatus of the present invention;

FIG. 9 is a block diagram of a specific structure of the apparatus of the present invention;

FIG. 10 is a block diagram showing a specific structure of the apparatus of the present invention;

FIG. 11 is a diagram of the design architecture for injecting a fault according to the present invention;

FIG. 12 is a diagram of an exemplary simple atomic broadcast protocol in accordance with the present invention;

FIG. 13 is a schematic illustration of the fault injection of FIG. 8 using the present invention;

FIG. 14 is a graph of experimental results comparing, at different scales, the number of execution paths that must be searched to traverse the state space without fault injection, for the DFS search method and for the peer reduction strategy used by the method;

FIG. 15 is a logic diagram of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

With the continuous development of distributed technologies, the services provided by distributed systems have changed traditional software architecture: client applications have become lightweight and software increasingly depends on services, which places strict requirements on the correctness and fault tolerance of distributed systems.

Because distributed system protocols involve a large number of concurrent operations, concurrency introduces considerable nondeterminism. For example, if four events a, b, c and d occur concurrently, they can be processed in 4! = 24 different orders, and these 24 event-execution sequences may lead the system to different results. To verify that the system's behavior satisfies certain properties, the system must be driven into every possible state and each state verified separately. This requires reordering all concurrent events in the system to obtain every possible event-execution sequence; executing these sequences traverses all reachable states of the system, so that each reachable state can be verified.
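The 4! = 24 figure can be checked directly: every ordering of the four concurrent events is a distinct event-execution sequence that an exhaustive checker would have to run. This small sketch only enumerates the orderings of the events a, b, c and d from the example.

```python
from itertools import permutations
from math import factorial

events = ["a", "b", "c", "d"]
orders = list(permutations(events))          # every interleaving of the four events
print(len(orders), factorial(len(events)))   # 24 24
print(orders[:2])                            # [('a', 'b', 'c', 'd'), ('a', 'b', 'd', 'c')]
```

The count grows factorially with the number of concurrent events, which is why the reduction techniques discussed below matter.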

Traditional software testing cannot guarantee the correctness and fault tolerance of distributed systems: because the order of concurrent events is nondeterministic, even running the same test case many times does not ensure that all possible behaviors of the system under that test case are exercised. Model checking, an important branch of formal methods, can verify all possible event-execution sequences of a distributed system's behavior, which overcomes this shortcoming of software testing and systematically finds all bugs in the distributed system or proves their absence.

Simsa J et al. proposed the model checking tool dBug (reference: Simsa J, Bryant R, Gibson G. dBug: systematic evaluation of distributed systems// Proceedings of the 5th International Workshop on Systems Software Verification. Vancouver, BC, Canada: USENIX Association, 2010), which defines event independence using black-box analysis and reduces the state space with a black-box DPOR method. Leesatapornwongsa T et al. proposed the SAMC tool (reference 1: Leesatapornwongsa T, Gunawi H S. SAMC: a fast model checker for finding heisenbugs in distributed systems (demo)// Proceedings of the 2015 International Symposium on Software Testing and Analysis. Baltimore, MD: ACM, 2015: 423-427; reference 2: Leesatapornwongsa T, Hao M, Lukman J F, et al. SAMC: semantic-aware model checking for fast discovery of deep bugs in cloud systems// Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation. Broomfield, CO: USENIX Association, 2014), which analyzes, in a white-box manner, the destination node of network events and how those events are processed there to judge whether two events are independent, and reduces the state space using a white-box DPOR method.
However, these methods do not take into account that many events in a distributed system are peers of one another, a fact that can be exploited to reduce the state space further.

Existing model checking tools focus on verifying the safety properties of distributed systems and neglect liveness properties and fault-tolerance mechanisms, especially the mechanisms that protect liveness. Although some tools support injecting faults into the system under verification to check its fault tolerance, they usually inject faults randomly or exhaustively (at any point during the system's execution), so the injection lacks focus and raises many problems: random injection cannot find all the places where fault-tolerance mechanisms fail to work, while exhaustive injection aggravates state-space explosion and prevents the model checker from scaling.

Referring to fig. 1, the method includes the following steps:

S100) initializing the system to be verified to its initial state.

S200) obtaining the current state, recorded as Current_State, and acquiring the transition set T_Set that the system to be verified can execute in Current_State. A transition t is a transition of the system in state s if and only if the system can execute the event corresponding to t in state s and doing so moves it to a state s' (written s -t-> s'),

where s' is a state of the system, and s' may be the same state as s.
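The definition of T_Set can be illustrated with a toy message-passing model, assumed here only for illustration: a state holds the messages still in flight, each deliverable message is one enabled transition, and executing a transition yields the successor state s'. The `State`, `t_set` and `execute` names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Message = Tuple[str, str]  # (receiver, message id)


@dataclass(frozen=True)
class State:
    in_flight: FrozenSet[Message]   # sent but not yet delivered
    delivered: FrozenSet[Message]


def t_set(s: State) -> List[Message]:
    # t is a transition of s iff its event (delivering message t) can execute in s.
    return sorted(s.in_flight)


def execute(s: State, t: Message) -> State:
    # Executing t in s yields the successor state s'.
    if t not in s.in_flight:
        raise ValueError(f"{t} is not enabled in this state")
    return State(s.in_flight - {t}, s.delivered | {t})


s0 = State(frozenset({("n2", "m1"), ("n3", "m1")}), frozenset())
print(t_set(s0))                     # [('n2', 'm1'), ('n3', 'm1')]
s1 = execute(s0, ("n3", "m1"))
print(t_set(s1))                     # [('n2', 'm1')]
```

Each delivery order of the in-flight messages is a different path through these states, which is the nondeterminism the checker explores.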

S300) Define node N1 and node N2 as peer nodes at a given moment if and only if condition (1) and condition (2) are both satisfied at that moment, where the function-set symbol used in the conditions denotes the set of functions that the i-th node can perform at that moment.

S400) Define migrations t1 and t2 as performing the same operation if and only if condition (3) and condition (4) are both satisfied: condition (3), t1 and t2 operate on the same object, or on corresponding objects in different nodes; condition (4), t1 and t2 perform the same behavior on that object.
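The two definitions above can be sketched as Python predicates. Since the exact formulas of conditions (1) and (2) are not given here, `is_peer` assumes, purely for illustration, that two nodes are peers when they have the same role and the same set of currently executable functions; `same_operation` follows conditions (3) and (4). `NodeView`, `Migration` and the `corresponding` object map are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class NodeView:
    role: str                  # hypothetical: e.g. "leader" or "follower"
    functions: FrozenSet[str]  # functions the node can perform right now


@dataclass(frozen=True)
class Migration:
    src: str
    dst: str     # target node of the migration
    action: str  # behavior performed on the object
    obj: str     # object the migration operates on


def is_peer(n1: str, n2: str, view: Dict[str, NodeView]) -> bool:
    """ASSUMED stand-in for conditions (1) and (2): same role, same function set."""
    a, b = view[n1], view[n2]
    return a.role == b.role and a.functions == b.functions


def same_operation(t1: Migration, t2: Migration,
                   corresponding: Dict[str, str]) -> bool:
    """Conditions (3) and (4): same (or corresponding) object and same behavior.
    `corresponding` is a hypothetical map from an object on one node to its
    counterpart on a peer node."""
    same_obj = (t1.obj == t2.obj
                or corresponding.get(t1.obj) == t2.obj
                or corresponding.get(t2.obj) == t1.obj)
    return same_obj and t1.action == t2.action
```

Keeping the peer test and the operation test separate mirrors S300) and S400), so either one can be swapped for a protocol-specific rule.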

S500) Reduce the migration set T_Set based on the Peer Reduction Policy (PRP). Under this policy, any two migrations are taken from T_Set; if their target nodes are peer nodes in the system to be verified and their operations are the same operation, the two migrations are peer-redundant, and either one of them is removed from T_Set. The rationale is that a distributed system contains many peer nodes whose roles and functions are identical, so performing the same operation on any one of them is equivalent. The method therefore treats as equivalent the sequences produced by performing the same operation on different peer nodes, including sequences that differ only in the order in which the peers are operated on; verifying each peer separately would only add redundant steps and lower efficiency.
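A minimal sketch of the Peer Reduction Policy, assuming the peer predicate and the same-operation predicate are supplied by the caller. Rather than repeatedly removing one element of a redundant pair, it makes a single pass and keeps a migration only if no already-kept migration is peer-redundant with it; when peer redundancy is an equivalence relation, this leaves one representative per class. `peer_reduce` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Migration:
    src: str
    dst: str
    action: str
    obj: str


def peer_reduce(t_set: List[Migration],
                is_peer: Callable[[str, str], bool],
                same_op: Callable[[Migration, Migration], bool]) -> List[Migration]:
    """Drop a migration if an already-kept migration targets a peer node
    with the same operation (peer redundancy); keep everything else."""
    kept: List[Migration] = []
    for t in t_set:
        redundant = any(is_peer(k.dst, t.dst) and same_op(k, t) for k in kept)
        if not redundant:
            kept.append(t)
    return kept


# Usage: B and C are assumed to be peer followers, so delivering the same
# broadcast to C is redundant once the delivery to B is kept.
followers = {"B", "C"}
peer = lambda x, y: x == y or (x in followers and y in followers)
same = lambda t1, t2: t1.action == t2.action and t1.obj == t2.obj
t_set = [Migration("A", "B", "broadcast", "m1"), Migration("A", "C", "broadcast", "m1")]
print(peer_reduce(t_set, peer, same))
```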

S600) Obtain the state of the system to be verified after the migration, and re-obtain the migration set T_Set that the system can execute in the current state. Referring to fig. 2, this specifically includes the following steps:

S601) Set the maximum execution depth k to be explored by the system to be verified, where k is a positive integer.

S602) The system to be verified selects an unexecuted migration as the target migration and, based on it, explores from execution depth 0 up to the maximum execution depth k, driving the system from the initial state to state S_na. Each exploration's depth is the previous exploration's depth plus one, until the depth of the current exploration exceeds k. Here the target migration belongs to the reduced migration set T_Set; n is the index of the target migration in T_Set, a positive integer with n ∈ {1, ..., N}, where N is the total number of migrations in T_Set; and a is the depth of the current exploration (equivalently, the number of explorations so far), a positive integer with a ∈ {1, ..., k}.
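The exploration in S601) and S602) amounts to a depth-bounded depth-first search over execution sequences. The sketch below assumes a toy model in which a state is just the set of in-flight messages `(src, dst, payload)` and executing a migration delivers one of them; `explore` and `prp_followers` are hypothetical names. On the three-node broadcast example described later in this document (leader A sends the same message to followers B and C), plain search yields two sequences and peer reduction yields one.

```python
from typing import Callable, FrozenSet, List, Tuple

Msg = Tuple[str, str, str]   # hypothetical migration: (src, dst, payload)
State = FrozenSet[Msg]       # hypothetical state: the set of in-flight messages


def explore(s0: State, k: int,
            reducer: Callable[[List[Msg]], List[Msg]] = lambda ts: ts
            ) -> List[Tuple[Msg, ...]]:
    """Depth-bounded DFS over execution sequences.

    From each state, every migration in the (reduced) T_Set is tried as the
    target migration; each step goes one level deeper, and a branch stops
    when its depth reaches k or no migration is enabled.
    """
    sequences: List[Tuple[Msg, ...]] = []

    def dfs(state: State, path: Tuple[Msg, ...]) -> None:
        t_set = reducer(sorted(state))       # T_Set in this state, reduced
        if len(path) == k or not t_set:      # depth limit reached or stuck
            sequences.append(path)
            return
        for t in t_set:                      # n = 1..N over the reduced T_Set
            dfs(state - {t}, path + (t,))    # S_na: the state after executing t

    dfs(s0, ())
    return sequences


def prp_followers(t_set: List[Msg]) -> List[Msg]:
    """Hypothetical PRP for the broadcast example: B and C are peers, so
    delivering the same payload to either one is peer-redundant."""
    followers = {"B", "C"}
    kept: List[Msg] = []
    for t in t_set:
        if not any(t[2] == u[2] and t[1] in followers and u[1] in followers
                   for u in kept):
            kept.append(t)
    return kept


s0 = frozenset({("A", "B", "m1"), ("A", "C", "m1")})
print(len(explore(s0, k=3)))                         # 2 without reduction
print(len(explore(s0, k=3, reducer=prp_followers)))  # 1 with PRP
```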

S700) Check the security attribute of the state the system to be verified reaches after the migration, to obtain a security state. The security attribute (a safety property) is an attribute that every state of the system must satisfy, and a security state is a post-migration state that satisfies the security attribute. Referring to fig. 3, this specifically includes the following steps:

S701) Obtain the standard features of the security attribute.

S702) Obtain the current security-related features of the state to be checked, S_na.

S703) Compare the current security-related features of state S_na with the standard features of the security attribute. If they are the same, state S_na is determined to be a security state. If not, state S_na is determined to be a counter-example state: the migration sequence leading to it is put into the counter-example set C, the system to be verified is returned to the previous state S_n(a-1), and step S600) is executed again to obtain the post-migration state and re-obtain the migration set T_Set executable in the current state; the new state is then taken as the state S_na to be checked. This repeats until every migration in T_Set has been executed, i.e., until T_Set is empty.
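The security check of S701) to S703) can be folded into the same depth-bounded search: compare each new state's security-related features with the standard features, record the migration sequence in C on a mismatch, and continue with the next migration from S_n(a-1). The toy model is an assumption for illustration only: a deliberately buggy lock service in which two nodes can hold the lock at once, with mutual exclusion as the security attribute. All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

State = FrozenSet[str]          # hypothetical: the set of nodes holding the lock
Migration = Tuple[str, str]     # hypothetical: (action, node)


def enabled(s: State) -> List[Migration]:
    """Buggy toy lock service: any node may acquire at any time."""
    acquires = [("acquire", n) for n in ("B", "C") if n not in s]
    releases = [("release", n) for n in sorted(s)]
    return acquires + releases


def apply(s: State, t: Migration) -> State:
    action, node = t
    return s | {node} if action == "acquire" else s - {node}


def safety_features(s: State) -> Dict[str, bool]:
    """Current security-related features of a state."""
    return {"mutual_exclusion": len(s) <= 1}


STANDARD = {"mutual_exclusion": True}   # standard features of the security attribute


def check_safety(s0: State, k: int) -> List[Tuple[Migration, ...]]:
    """Depth-bounded DFS that collects the counter-example set C."""
    C: List[Tuple[Migration, ...]] = []

    def dfs(state: State, path: Tuple[Migration, ...]) -> None:
        if len(path) == k:
            return
        for t in enabled(state):
            nxt, seq = apply(state, t), path + (t,)
            if safety_features(nxt) != STANDARD:
                C.append(seq)   # counter-example: stay at S_n(a-1), try next t
                continue
            dfs(nxt, seq)       # security state: keep exploring from S_na

    dfs(s0, ())
    return C


print(check_safety(frozenset(), k=2))
# [(('acquire', 'B'), ('acquire', 'C')), (('acquire', 'C'), ('acquire', 'B'))]
```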

S800) Check the activity attribute of the security state to obtain an activity state. The activity attribute (a liveness property) is an attribute that need not hold in every state of the system but must eventually hold in some future state, and an activity state is a security state that satisfies the activity attribute. Referring to fig. 4, this specifically includes the following steps:

S801) Obtain the standard features of the activity attribute.

S802) Obtain the current activity-related features of the security state to be checked.

S803) Compare the current activity-related features of the security state with the standard features of the activity attribute. If they are the same, the security state is determined to be an activity state, the system to be verified is returned to the previous state S_n(a-1), and the target migration is defined as a key migration t. If they are not the same, step S700) is executed to check the security attribute of the post-migration state, and the new security state obtained is taken as the security state to be checked.

S804) If the maximum execution depth k has been explored and the current activity-related features of the security state to be checked still differ from the standard features of the activity attribute, put the migration sequence into the counter-example set C.
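For S801) to S804), the sketch below searches to depth k for the first state whose activity-related features match the standard ones, records the migration that produced it as the key migration t, and reports sequences that never reach such a state. It reuses the three-node broadcast example from this document with a hypothetical state encoding; treating a state with no enabled migration as a counterexample before depth k is an added assumption.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical broadcast model: state = (in-flight "A-X" messages, nodes that received).
State = Tuple[FrozenSet[str], FrozenSet[str]]


def enabled(s: State) -> List[str]:
    return sorted(s[0])                       # each "A-X" delivery is a migration


def apply(s: State, t: str) -> State:
    in_flight, got = s
    return in_flight - {t}, got | {t.split("-")[1]}


def live(s: State) -> bool:
    """Activity attribute: no message in transit, and B and C both received it."""
    in_flight, got = s
    return not in_flight and {"B", "C"} <= got


def check_liveness(s0: State, k: int
                   ) -> Tuple[List[Tuple[Tuple[str, ...], str]], List[Tuple[str, ...]]]:
    """Return (sequences with their key migration, counter-examples) for depth k."""
    keys: List[Tuple[Tuple[str, ...], str]] = []
    C: List[Tuple[str, ...]] = []

    def dfs(state: State, path: Tuple[str, ...]) -> None:
        for t in enabled(state):
            nxt, seq = apply(state, t), path + (t,)
            if live(nxt):
                keys.append((seq, t))         # t made the state live: key migration
            elif len(seq) == k or not enabled(nxt):
                C.append(seq)                 # never live within depth k (assumed: or stuck)
            else:
                dfs(nxt, seq)

    dfs(s0, ())
    return keys, C


s0 = (frozenset({"A-B", "A-C"}), frozenset())
keys, C = check_liveness(s0, k=3)
print(keys)  # [(('A-B', 'A-C'), 'A-C'), (('A-C', 'A-B'), 'A-B')]
print(C)     # []
```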

S900) Inject a fault and recheck the security attribute and activity attribute of the state of the system to be verified. Referring to fig. 5, this specifically includes the following steps:

S901) Set the maximum number of faults for the system to be verified.

S902) Inject a fault into the system to be verified that breaks the key migration t, and set the fault-recovery step count to 0.

S903) Compare the number of faults injected into the current system to be verified with the maximum number of faults. If it is less than the maximum, the system to be verified, with the fault injected, re-executes the exploration from depth 0 to the maximum execution depth k, and the security attribute and activity attribute of state S_n(a-1) are checked. If it equals the maximum, step S600) is executed to obtain the post-migration state of the system to be verified.

Check the security attribute and activity attribute of S_n(a-1) according to steps S700) and S800). If S_n(a-1) does not satisfy the security attribute, or cannot satisfy the activity attribute within the k-step fault-recovery depth, put the migration sequence that leads to this into the counter-example set C. If S_n(a-1) satisfies both the security attribute and the activity attribute, execute step S902) to inject a new fault into the system to be verified, and increase the number of injected faults by one.
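S901) to S903) can be sketched as follows under strong simplifying assumptions: the only fault modeled is a lost message (like the link-disconnection fault in the example later in this document), the security re-check is omitted, and the k-step fault-recovery depth reuses k. `key_paths`, `recoverable`, `omit` and `fault_check` are hypothetical names. In the broadcast example without retransmission, breaking either key migration leaves one follower without the message, so both traces become counterexamples.

```python
from typing import FrozenSet, Iterator, List, Tuple

State = Tuple[FrozenSet[str], FrozenSet[str]]   # (in-flight "A-X" messages, receivers)


def enabled(s: State) -> List[str]:
    return sorted(s[0])


def apply(s: State, t: str) -> State:
    return s[0] - {t}, s[1] | {t.split("-")[1]}


def omit(s: State, t: str) -> State:
    """Hypothetical fault: message t is lost on the link instead of delivered."""
    return s[0] - {t}, s[1]


def live(s: State) -> bool:
    return not s[0] and {"B", "C"} <= s[1]


def key_paths(s: State, k: int, path: Tuple[str, ...] = ()
              ) -> Iterator[Tuple[Tuple[str, ...], State, str]]:
    """Yield (path, S_n(a-1), key migration t) for every path of at most k
    steps whose last migration first makes the system live."""
    if len(path) == k:
        return
    for t in enabled(s):
        nxt = apply(s, t)
        if live(nxt):
            yield path + (t,), s, t
        else:
            yield from key_paths(nxt, k, path + (t,))


def recoverable(s: State, k: int) -> bool:
    """Can the system become live again within k recovery steps?"""
    if live(s):
        return True
    return k > 0 and any(recoverable(apply(s, t), k - 1) for t in enabled(s))


def fault_check(s0: State, k: int, max_faults: int,
                prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Break each key migration with a fault; collect unrecoverable traces."""
    C: List[Tuple[str, ...]] = []
    for path, before, key in key_paths(s0, k):
        faulty = omit(before, key)                    # destroy key migration t
        trace = prefix + path[:-1] + (f"omit({key})",)
        if not recoverable(faulty, k):
            C.append(trace)                           # counter-example
        elif max_faults > 1:                          # still live: inject another
            C.extend(fault_check(faulty, k, max_faults - 1, trace))
    return C


s0 = (frozenset({"A-B", "A-C"}), frozenset())
print(fault_check(s0, k=3, max_faults=1))
# [('A-B', 'omit(A-C)'), ('A-C', 'omit(A-B)')]
```

A protocol with retransmission would add "resend" migrations to `enabled`, letting `recoverable` succeed so that `fault_check` moves on to the next fault instead of reporting a counterexample.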

S1000) Once every migration in the migration set T_Set of the system to be verified in its initial state has been executed, output the counter-example set C. C is the set of migration sequences after which the state of the system to be verified does not satisfy the security attribute, together with the migration sequences under which the system state does not satisfy the activity attribute within the maximum execution depth k.

Referring to fig. 6, the apparatus includes:

An initialization module, configured to initialize the system to be verified to its initial state.

A migration set obtaining module, configured to obtain the current state, recorded as Current_State, and to obtain the migration set T_Set that the system to be verified can execute in Current_State, where a migration t is a migration of the system in state s if and only if the system can execute the event corresponding to t in state s.

A first defining module, configured to define node N1 and node N2 as peer nodes at a given moment if and only if condition (5) and condition (6) are both satisfied at that moment, where the function-set symbol used in the conditions denotes the set of functions that the i-th node can perform at that moment.

A second defining module, configured to define migrations t1 and t2 as performing the same operation if and only if condition (7) and condition (8) are both satisfied: condition (7), t1 and t2 operate on the same object, or on corresponding objects in different nodes; condition (8), t1 and t2 perform the same behavior on that object.

A peer reduction module, configured to reduce the migration set T_Set based on the Peer Reduction Policy (PRP): any two migrations are taken from T_Set, and if their target nodes are peer nodes in the system to be verified and their operations are the same operation, the two migrations are peer-redundant and either one is removed from T_Set. Because a distributed system contains many peer nodes whose roles and functions are identical, performing the same operation on any one of them is equivalent; sequences that perform the same operation on different peer nodes, or on the same peers in a different order, are therefore equivalent, and verifying each peer separately only adds redundant steps and lowers efficiency. By removing peer migrations, the module reduces T_Set and thereby greatly reduces the number of sequences and the state space to be searched.

An exploration migration module, configured to obtain the state of the system to be verified after the migration and to re-obtain the migration set T_Set executable in the current state. Referring to fig. 7, the exploration migration module specifically includes:

A depth setting unit, configured to set the maximum execution depth k to be explored by the system to be verified, where k is a positive integer.

An exploration migration unit, configured for the system to be verified to select an unexecuted migration as the target migration and, based on it, to explore from execution depth 0 up to the maximum execution depth k, driving the system from the initial state to state S_na; each exploration's depth is the previous depth plus one, until the depth of the current exploration exceeds k. Here the target migration belongs to the reduced migration set T_Set; n is the index of the target migration in T_Set, with n ∈ {1, ..., N}, where N is the total number of migrations in T_Set; and a is the depth of the current exploration (equivalently, the number of explorations so far), a positive integer with a ∈ {1, ..., k}.

A first checking module, configured to check the security attribute of the post-migration state of the system to be verified and obtain a security state, where the security attribute is an attribute that every state of the system must satisfy and a security state is a post-migration state that satisfies it. Referring to fig. 3, the first checking module specifically includes:

A first parameter acquiring unit, configured to acquire the standard features of the security attribute.

A second parameter acquiring unit, configured to acquire the current security-related features of the state to be checked, S_na.

A first comparison unit, configured to compare the current security-related features of state S_na with the standard features of the security attribute. If they are the same, state S_na is determined to be a security state. If not, S_na is determined to be a counter-example state, the migration sequence is put into the counter-example set C, the system to be verified is returned to the previous state S_n(a-1), and the exploration migration module is executed, the new state of the system being taken as the state S_na to be checked, until every migration in T_Set has been executed, i.e., until T_Set is empty.

A second checking module, configured to check the activity attribute of the security state to obtain an activity state, where the activity attribute is an attribute that need not hold in every state of the system but must eventually hold in some future state, and an activity state is a security state that satisfies it. Referring to fig. 4, the second checking module specifically includes:

A third parameter acquiring unit, configured to acquire the standard features of the activity attribute.

A fourth parameter acquiring unit, configured to acquire the current activity-related features of the security state to be checked.

A second comparing unit, configured to compare the current activity-related features of the security state with the standard features of the activity attribute. If they are the same, the security state is determined to be an activity state, the system to be verified is returned to the previous state S_n(a-1), and the target migration is defined as a key migration t. If they are not the same, the first checking module is executed to acquire a new security state, which is taken as the security state to be checked.

A fault injection module, configured to inject a fault and recheck the security attribute and activity attribute of the state of the system to be verified. Referring to fig. 5, the fault injection module specifically includes:

A fault setting unit, configured to set the maximum number of faults for the system to be verified.

A fault injection unit, configured to inject a fault into the system to be verified that breaks the key migration t, and to set the fault-recovery step count to 0.

A third comparing unit, configured to compare the number of faults injected into the current system to be verified with the maximum number of faults. If it is less than the maximum, the system to be verified, with the fault injected, re-executes the exploration from depth 0 to the maximum execution depth k, and the security attribute and activity attribute of state S_n(a-1) are checked. If it equals the maximum, the exploration migration module is executed to obtain the post-migration state of the system to be verified.

The first checking module and the second checking module check the security attribute and activity attribute of S_n(a-1). If S_n(a-1) does not satisfy the security attribute, or cannot satisfy the activity attribute within the k-step fault-recovery depth, the migration sequence that leads to this is put into the counter-example set C. If S_n(a-1) satisfies both the security attribute and the activity attribute, the fault injection unit is executed to inject a new fault into the system to be verified, and the number of injected faults is increased by one.

An output module, configured to output the counter-example set C once every migration in the migration set T_Set of the system to be verified in its initial state has been executed. C is the set of migration sequences after which the state of the system to be verified does not satisfy the security attribute, together with the migration sequences under which the system state does not satisfy the activity attribute within the maximum execution depth k.

Referring to fig. 11, the method and apparatus introduce a control layer on each node of the distributed system (the black part of each node in fig. 6). The control layer captures and blocks the operations that the node can currently execute, and a back-end server communicates with each node's control layer via RMI. Each time, the control layers block the events that can occur on their nodes at that moment (i.e., the concurrent events) and send them to the server. After receiving them, the server reorders the events using the state-space exploration search method and enables the blocked events in the reordered execution order (for example, execution a in fig. 6). The system to be verified thus executes specific events in a specific order to reach the next system state, in which the security attribute and activity attribute are checked; the same procedure is then repeated from the resulting state, so that the system's state space is traversed. The server can use the peer reduction strategy of the invention to reduce the number of event sequences to be executed, and it can also actively identify fault injection points and inject faults into the system to be verified in order to verify the fault tolerance of activity attributes in the distributed system.
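The control-layer architecture can be sketched in-process as follows. This is an assumption-laden illustration rather than the described implementation: plain method calls replace RMI, events are strings of the form "X-Y" delivered at node Y, and `ControlLayer`, `Server` and the `handler` callback are hypothetical names. The search decides the order; `run_order` only enables the blocked events in that order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ControlLayer:
    """Hypothetical per-node control layer: it blocks events instead of running them."""
    node: str
    blocked: List[str] = field(default_factory=list)

    def intercept(self, event: str) -> None:
        self.blocked.append(event)       # hold the event until the server enables it


class Server:
    """Hypothetical back-end scheduler; plain method calls stand in for RMI."""

    def __init__(self, layers: Dict[str, ControlLayer],
                 handler: Callable[[str], List[str]]) -> None:
        self.layers = layers
        self.handler = handler           # runs an event, returns the events it triggers

    def concurrent_events(self) -> List[str]:
        """Events currently blocked on all nodes, i.e. the concurrent events."""
        return sorted(e for layer in self.layers.values() for e in layer.blocked)

    def enable(self, event: str) -> None:
        """Unblock one event, run it, and route new events to their target node."""
        for layer in self.layers.values():
            if event in layer.blocked:
                layer.blocked.remove(event)
                break
        else:
            raise ValueError(f"{event} is not blocked on any node")
        for new_event in self.handler(event):
            target = new_event.split("-")[1]    # "X-Y" is delivered at node Y
            self.layers[target].intercept(new_event)

    def run_order(self, order: List[str]) -> None:
        """Enable the blocked events in the order chosen by the search."""
        for event in order:
            self.enable(event)


layers = {n: ControlLayer(n) for n in ("A", "B", "C")}
layers["B"].intercept("A-B")
layers["C"].intercept("A-C")
acks = {"A-B": ["B-A"], "A-C": ["C-A"]}      # hypothetical: followers ack to A
server = Server(layers, lambda e: acks.get(e, []))
print(server.concurrent_events())            # ['A-B', 'A-C']
server.run_order(["A-C", "A-B"])
print(server.concurrent_events())            # ['B-A', 'C-A']
```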

Referring to FIG. 13, consider a distributed system with three nodes in which node A is the leader and nodes B and C are followers. Under a simple atomic broadcast protocol, node A sends one and the same message to node B and to node C. In this protocol, the security attribute is that the system never deadlocks, overflows memory, or the like, and the activity attribute is that at some point in the future no messages are in transit on any channel of the system and every running node has received the broadcast message. There are two possible key migrations in the protocol shown in fig. 7. In the first case, node C receives the message from node A first and node B receives it second; at that point the activity attribute of the system is satisfied, so A-B is the event corresponding to the key migration. Because the order of A-C and A-B is nondeterministic, in the second case node B receives the message from node A first and node C second, and likewise A-C is the event corresponding to the key migration. Referring to fig. 14, in this scenario the maximum number of injected faults is 1 and the search depth is 3, and the method can break the key migration with two kinds of fault injection: crashing node A (CrashA) and disconnecting the link between A and the corresponding node (OmitAB).

With the peer reduction strategy of the method and apparatus, because nodes B and C play the same role and node A performs the same operation on both, only one event execution sequence, (A-C, A-B), needs to be explored and subjected to fault injection. The injection results are shown in Table 1: both resulting event sequences prevent the three-node distributed system from ever again reaching a state that satisfies the activity attribute, so both are output as counterexamples for the three-node system.

TABLE 1

The method and apparatus were evaluated on three protocols from two widely used distributed systems, ZooKeeper and Cassandra: ZooKeeper's leader election protocol (ZLE) and atomic broadcast protocol (ZAB), and Cassandra's gossiper protocol (GS). With the same node scale and the same number of injected faults, three search methods were compared on the number of event execution sequences that must be explored and on whether a bug is found: depth-first search (DFS) without any state reduction, DPOR, and the peer reduction strategy of the method. Table 2 shows this performance comparison.

TABLE 2

Protocol	Search method	Injected faults	Execution sequences explored	Acceleration (vs. DFS)	Bug found
ZAB	DFS	1	66	-	Yes
ZAB	PRP	1	13	5.1	Yes
ZAB	DPOR	1	18	3.7	Yes
ZLE	DFS	1	3542+	-	Yes
ZLE	PRP	1	264	13.4	Yes
ZLE	DPOR	1	221	16	Yes
GS	DFS	1	580	-	No
GS	PRP	1	3	193.3	No
GS	DPOR	1	40	14.5	No
GS	DFS	2	4161+	-	Yes
GS	PRP	2	7	594.4	Yes
GS	DPOR	2	101	41.2	Yes
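The Acceleration column can be reproduced as the DFS sequence count divided by each method's sequence count, using only the figures in Table 2. Because the DFS counts marked "+" are lower bounds (the DFS run did not finish), the corresponding accelerations are lower bounds as well.

```python
# (DFS count, method count) pairs copied from Table 2; "3542+" and "4161+"
# are taken at face value, so those ratios are lower bounds.
table2 = {
    ("ZAB", "PRP", 1): (66, 13),
    ("ZAB", "DPOR", 1): (66, 18),
    ("ZLE", "PRP", 1): (3542, 264),
    ("ZLE", "DPOR", 1): (3542, 221),
    ("GS", "PRP", 1): (580, 3),
    ("GS", "DPOR", 1): (580, 40),
    ("GS", "PRP", 2): (4161, 7),
    ("GS", "DPOR", 2): (4161, 101),
}

for (protocol, method, faults), (dfs, explored) in table2.items():
    acceleration = dfs / explored     # how many times fewer sequences than DFS
    print(f"{protocol:4} {method:5} faults={faults}: {acceleration:.1f}x")
```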

The scale of the experimental subjects was then increased and, with no faults injected, the number of execution paths that must be searched to traverse the state space was compared at different scales between the DFS search method and the peer reduction strategy of the invention (the PRP search method). The results are shown in fig. 15. At every node scale, the PRP search method explores far fewer paths than the DFS search method. As the number of nodes and of concurrent events in the system to be verified grows, the DFS search takes more than 12 hours at 4 nodes (ZLE) and 5 nodes (ZAB and GS), with more than 5000 execution paths searched, whereas the PRP search method completes within an acceptable time at a scale of 10 nodes. As can be seen from fig. 10, the PRP search method works best on the GS protocol: at a scale of 10 nodes, it equivalently traverses the whole space by searching only one execution sequence.

To study the effect of the number of injected faults on the method, experiments were run on the 10-node GS protocol, adding one fault each time; the results are shown in Table 3. As Table 3 shows, as the number of injected faults increases, both the number of execution paths to be searched and the number of paths that expose bugs increase. Conversely, if too few faults are injected, bugs in the system may not be exposed. When using the method, it is therefore advisable to inject a small number of faults first and, if time permits, inject more so as to expose bugs in the system more thoroughly.

TABLE 3

Protocol	Injected faults	Execution paths	Paths exposing bugs
GS	0	1	0
GS	1	3	0
GS	2	7	1
GS	3	11	3
GS	4	15	6

In conclusion, the beneficial effects of the invention are as follows:

The source of nondeterminism in the protocol to be verified considered in this application is the nondeterministic execution order of concurrent events, where concurrent events include network communication between nodes, local read and write operations on nodes, and the like. The distributed protocol to be verified is therefore modeled as a set of event execution sequences starting from the initial state, in which concurrent events may interleave, yielding an interleaving model. Checking this model requires exploring and running each event execution sequence. For a security attribute, every state reached during execution of the protocol must be checked against it. For an activity attribute, the state satisfying it is required to occur within a bounded depth k: a depth-first search (DFS) of depth k is executed from the initial state of the system, and if the system cannot satisfy the activity attribute within depth k, it is considered possibly unable to satisfy the activity attribute.

This application also defines the key migration and uses it as the fault injection point: a fault is injected at the key migration to break its execution, and the system is then checked for whether other execution paths exist that allow the activity attribute to be satisfied again within depth k.

Claims

1. A detection method for verifying a fault-tolerant mechanism of distributed protocol activity attributes is characterized by comprising the following steps:

condition (1):

condition (2):

wherein

Represents the set of functions that the ith node can perform at that time:

condition (3): migration t1 and migration t2 operate on the same object, or on corresponding objects in different nodes;

condition (4): migration t1 and migration t2 perform the same behavior on the object they operate on;

if every migration in the migration set T_Set of the system to be verified in its initial state has been executed, outputting a counter-example set C, wherein C is the set of migration sequences that cause the state of the system to be verified after migration not to satisfy the security attribute, together with the migration sequences that cause the state of the system not to satisfy the activity attribute within the maximum execution depth k;

wherein obtaining the state of the system to be verified after the migration specifically comprises:

the system to be verified selecting an unexecuted migration as the target migration and, based on it, exploring from execution depth 0 up to the maximum execution depth k, driving the system from the initial state to state S_na, each exploration's depth being the previous exploration's depth plus one, until the depth of the current exploration exceeds k; wherein the target migration belongs to the reduced migration set T_Set, n is the index of the target migration in T_Set, a positive integer with n ∈ {1, ..., N}, N is the total number of migrations in T_Set, and a is the depth of the current exploration (equivalently, the number of explorations so far), a positive integer with a ∈ {1, ..., k}.

2. The detection method for verifying a fault-tolerant mechanism of distributed protocol activity attributes according to claim 1, wherein checking the security attribute of the state of the system to be verified after migration to obtain a security state, the security attribute being an attribute that every state of the system must satisfy and the security state being a post-migration state of the system to be verified that satisfies the security attribute, specifically comprises:

acquiring the standard features of the security attribute;

acquiring the current security-related features of the state to be checked, S_na;

comparing the current security-related features of state S_na with the standard features of the security attribute; if they are the same, determining that state S_na is a security state; if not, determining that state S_na is a counter-example state, putting the migration sequence into the counter-example set C, returning the system to be verified to the previous state S_n(a-1), executing the step of obtaining the state of the system to be verified after the migration and re-obtaining the migration set T_Set executable in the current state, and taking the new state of the system to be verified as the state S_na to be checked, until every migration in T_Set has been executed.

3. The detection method for verifying a fault-tolerant mechanism of distributed protocol activity attributes according to claim 2, wherein checking the activity attribute of the security state to obtain an activity state, the activity attribute being an attribute that need not be satisfied in every state of the system but must eventually be satisfied in some future state and the activity state being a security state that satisfies the activity attribute, specifically comprises:

acquiring the activity-related features of the security state to be checked;

comparing the activity-related features of the security state with the standard features of the activity attribute; if they are the same, determining that the security state is an activity state, returning the system to be verified to the previous state S_n(a-1), and defining the target migration as a key migration t; if they are not the same, executing the step of checking the security attribute of the state of the system to be verified after the migration, acquiring a new security state, and taking the new security state as the security state to be checked;

4. The detection method for verifying a fault-tolerant mechanism of distributed protocol activity attributes according to claim 3, wherein injecting a fault and rechecking the security attribute and the activity attribute of the state of the system to be verified specifically comprises the following steps:

setting the maximum number of faults for the system to be verified;

comparing the number of faults injected into the current system to be verified with the maximum number of faults; if it is less than the maximum, the system to be verified, with the fault injected, re-executing the exploration from depth 0 to the maximum execution depth k, and checking the security attribute and the activity attribute of state S_n(a-1); if it equals the maximum, executing the step of obtaining the state of the system to be verified after migration and re-obtaining the migration set T_Set executable in the current state;

checking the security attribute and the activity attribute of S_n(a-1) according to the steps of claims 2 and 3; if S_n(a-1) does not satisfy the security attribute or cannot satisfy the activity attribute within the k-step fault-recovery depth, putting the migration sequence that leads to this into the counter-example set C; and if S_n(a-1) satisfies both the security attribute and the activity attribute, injecting a fault into the system to be verified to break the key migration t and setting the fault-recovery step count to 0.

5. A detection apparatus for verifying a distributed protocol liveness attribute fault tolerance mechanism, comprising:

condition (5):

condition (6):

wherein

Represents the set of functions that the ith node can perform at that time:

condition (7): migration t1 and migration t2 operate on the same object, or on corresponding objects in different nodes;

condition (8): migration t1 and migration t2 perform the same behavior on the object they operate on;

the output module is used for outputting the counterexample set C after every migration in the migration set T_Set of the system to be verified in the initial state has been executed, wherein the counterexample set C consists of the migration sequences that cause the post-migration state of the system to violate the safety attribute, together with the migration sequences that cause the state of the system to fail the activity attribute within the maximum execution depth k; wherein the exploration migration module specifically comprises:

an exploration migration unit, used for selecting a not-yet-executed migration of the system to be verified as the target migration and, starting from the target migration, exploring from execution depth 0 up to the maximum execution depth k, so that the system is driven from the initial state to the state S_{na}; after each exploration the depth value is incremented by one, until the current exploration depth of the system exceeds the maximum execution depth k. Here the target migration belongs to the reduced migration set T_Set; n is the index of the target migration in the sequence of T_Set, with n ∈ {1, ..., N}, where N is the total number of migrations in T_Set; and a is the current exploration depth (equivalently, the number of explorations performed so far), a positive integer with a ∈ {1, ..., k}.
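The peer reduction of conditions (7) and (8) and the depth-bounded exploration of the exploration migration unit might be sketched in Python as below. Because the formulas of conditions (5) and (6) that define peer nodes are not recoverable here, peer nodes are supplied as a hypothetical `peer_class` mapping, and an object with the same name on two peer nodes is treated as the "equivalent object" of condition (7). The names `Migration`, `reduce_peer_migrations` and `explore` are illustrative only, and just the safety check is shown; an activity check would sit at the same place in `dfs`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


@dataclass(frozen=True)
class Migration:
    node: str    # node executing the migration
    obj: str     # operated object; same name on peer nodes = equivalent object
    action: str  # behavior performed on the object


def reduce_peer_migrations(
    t_set: Sequence[Migration], peer_class: Dict[str, str]
) -> List[Migration]:
    """Keep one migration per (peer class, object, action): migrations that
    satisfy conditions (7) and (8) on peer nodes are treated as identical."""
    seen = set()
    reduced = []
    for m in t_set:
        key = (peer_class.get(m.node, m.node), m.obj, m.action)
        if key not in seen:
            seen.add(key)
            reduced.append(m)
    return reduced


def explore(
    init: Hashable,
    enabled: Callable[[Hashable], Sequence[Migration]],
    step: Callable[[Hashable, Migration], Hashable],
    is_safe: Callable[[Hashable], bool],
    k: int,
    peer_class: Dict[str, str],
) -> List[Tuple[Migration, ...]]:
    """Depth-bounded DFS (depth a = 1..k) returning the counterexample set C."""
    counterexamples: List[Tuple[Migration, ...]] = []

    def dfs(state: Hashable, path: Tuple[Migration, ...], depth: int) -> None:
        if not is_safe(state):
            counterexamples.append(path)  # migration sequence violating safety
            return
        if depth == k:  # maximum execution depth reached
            return
        for m in reduce_peer_migrations(enabled(state), peer_class):
            dfs(step(state, m), path + (m,), depth + 1)

    dfs(init, (), 0)
    return counterexamples
```

With two peer nodes `n1` and `n2` that both offer `inc` on object `x`, the reduction keeps only the `n1` migration, so each state has one successor instead of two. This is where the saving in explored sequences comes from, and it is sound only if the peer nodes are genuinely symmetric.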

6. The detection apparatus for verifying the fault-tolerance mechanism of the distributed protocol activity attribute according to claim 5, wherein the first checking module specifically comprises:

a second parameter acquiring unit, used for acquiring the safety-related features of the state S_{na} to be checked;
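One way to picture the feature comparison performed by the first checking module is the Python sketch below, modelled on the comparison that claim 7 describes for the activity attribute. The toy leader-election state, the feature names and `SAFETY_STANDARD` are hypothetical; the claims do not specify which safety-related features are extracted.

```python
from typing import Dict, Mapping

# Hypothetical standard features of the safety attribute.
SAFETY_STANDARD: Dict[str, bool] = {
    "at_most_one_leader": True,
    "terms_non_negative": True,
}


def safety_features(roles: Mapping[str, str], terms: Mapping[str, int]) -> Dict[str, bool]:
    """Acquire the safety-related features of a state S_na (toy model)."""
    leaders = sum(1 for role in roles.values() if role == "leader")
    return {
        "at_most_one_leader": leaders <= 1,
        "terms_non_negative": all(t >= 0 for t in terms.values()),
    }


def is_safety_state(roles: Mapping[str, str], terms: Mapping[str, int]) -> bool:
    """S_na is a safety state iff its features equal the standard features."""
    return safety_features(roles, terms) == SAFETY_STANDARD
```

A state with one leader and non-negative terms passes; a state with two leaders fails and its migration sequence would go into the counterexample set C.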

7. The detection apparatus for verifying the fault-tolerance mechanism of the distributed protocol activity attribute according to claim 6, wherein the second checking module specifically comprises:

a second comparing unit, used for comparing the current activity-related features of the safety state with the standard features of the activity attribute; if they are the same, the safety state is determined to be an activity state, the system to be verified is returned to the previous state S_{n(a-1)}, and the target migration is defined as the key migration t; if the target migration differs from the key migration t, the first checking module is executed to obtain a new safety state, which becomes the safety state to be checked;
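The bounded activity check and the choice of the key migration t can be sketched as follows. This assumes an existential reading, that is, the activity attribute counts as satisfied if some state meeting it is reachable within k migrations; a real checker might instead require every execution to recover. `live_within`, `key_migration` and the callbacks are hypothetical names.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Optional


def live_within(
    start: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],
    is_live: Callable[[Hashable], bool],
    k: int,
) -> bool:
    """True if a state meeting the activity attribute is reachable from
    `start` in at most k migrations (breadth-first, so each state is
    reached at its minimum depth)."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        state, depth = frontier.popleft()
        if is_live(state):
            return True
        if depth == k:
            continue
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def key_migration(
    prev_state: Hashable,
    target: str,
    step: Callable[[Hashable, str], Hashable],
    is_live: Callable[[Hashable], bool],
) -> Optional[str]:
    """If executing `target` from S_{n(a-1)} reaches an activity state, the
    target becomes the key migration t; checking then resumes from
    prev_state, which this function leaves unchanged."""
    return target if is_live(step(prev_state, target)) else None
```

With successors `s + 1` and `s + 2` and the goal `s >= 5`, state 0 reaches the goal within 3 migrations but not within 2, so the k chosen as fault recovery depth directly decides whether a path is reported.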

8. The detection apparatus for verifying the fault-tolerance mechanism of the distributed protocol activity attribute according to claim 7, wherein the fault injection module specifically comprises:

checking, by means of the first checking module and the second checking module, the safety attribute and the activity attribute of S_{n(a-1)}; if S_{n(a-1)} does not satisfy the safety attribute, or does not satisfy the activity attribute within the k-step fault recovery depth, putting the migration sequence leading to state S_{n(a-1)} into the counterexample set C; if S_{n(a-1)} satisfies both the safety attribute and the activity attribute, executing the fault injection unit.