CN108959098B - System and method for testing deadlock defects of distributed system program

Info

Publication number: CN108959098B
Application number: CN201810799683.XA
Authority: CN
Inventors: 赵靖; 王延斌; 吴卓霏; 鲁华林; 姚念民
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2018-07-20
Filing date: 2018-07-20
Publication date: 2021-11-05
Anticipated expiration: 2038-07-20
Also published as: CN108959098A

Abstract

The invention provides a system and a method for testing deadlock defects of a distributed system program. Network events of the distributed system are simulated and combined, on a single machine, with the lock events executed by the distributed system software to be tested. A priority-based thread scheduling module then converts the parallel logic of the threads into serial logic, and a deadlock detection module judges whether a deadlock has occurred. Deadlock defects of a distributed system can thus be detected on a single machine with a probability guarantee, which reduces the cost of deadlock detection for distributed systems and improves program testing efficiency.

Description

System and method for testing deadlock defects of distributed system program

Technical Field

The invention belongs to the technical field of software testing, and particularly relates to a testing system and method for deadlock defects of a distributed system program.

Background

Software testing is an important technique for ensuring the reliability of a computer system: it exposes defects in a computer program in advance and avoids unexpected losses in a production environment. Deadlock is one of the common defect types in computer systems. When a deadlock occurs, concurrent programs fall into a circular wait, which makes the program's service unavailable or crashes the program. Deadlock defects are hard to detect, hard to handle, and highly harmful.

Deadlock defects exist in computer systems with concurrency. Program locks are used to keep shared data consistent in concurrent programs, and a deadlock forms when a program enters a circular wait on locks. In a multithreaded program, concurrency comes mainly from different threads operating on shared memory; in a distributed system, concurrency includes that of multithreaded programs and also arises from network communication among the nodes of the distributed system.

In the prior art, deadlock defect detection targets only single-machine multithreaded programs and does not consider the influence of network message events on the program to be tested, so many deadlocks are missed when the prior art is applied to a distributed system.

Disclosure of Invention

The main technical problem solved by the invention is to provide a testing system and method for deadlock defects of a distributed system program. The invention lowers the miss rate of deadlock defect testing for distributed systems, reduces test overhead, and helps ensure the reliability of the distributed system.

The technical scheme of the invention is as follows:

a testing system for deadlock defects of a distributed system program comprises the distributed system software to be tested, a custom network event simulator, a binary instrumentation module, a time sequence maintenance module, a priority-based thread scheduling module and a deadlock detection module;

the custom network event simulator simulates network lock events generated on other hosts in the distributed system and sends the generated network lock events to the time sequence maintenance module in JSON format, wherein each network lock event comprises a timestamp, a network event thread ID, a lock ID and a lock event memory address;

the binary instrumentation module dynamically instruments the distributed system software to be tested, acquires all lock events executed by each thread during one run of the software, and sends the lock events to the time sequence maintenance module in chronological order in JSON format, wherein each lock event comprises a timestamp, a thread ID, a lock ID and a lock event memory address;

the time sequence maintenance module receives the JSON-format lock events from the custom network event simulator and the binary instrumentation module, maintains the received lock events by chronological order and thread ID, and then sends all the collected information to the priority-based thread scheduling module in JSON format;

the priority-based thread scheduling module receives the JSON-format data from the time sequence maintenance module, schedules the concurrent threads using a probability guarantee scheduling algorithm and the results fed back by the deadlock detection module, converts the parallel logic of the concurrent threads into serial logic, and then sends the execution result to the deadlock detection module in JSON format, one lock event at a time;

and the deadlock detection module receives the JSON-format lock events from the priority-based thread scheduling module, uses a deadlock detection algorithm to judge whether the current thread is blocked, feeds the result back to the priority-based thread scheduling module, and judges that a deadlock exists if the threads that have not yet finished executing are blocked.
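
As an illustration of the JSON lock event exchanged between the modules, the following minimal Python sketch encodes and decodes one event. The four fields named above (timestamp, thread ID, lock ID, lock event memory address) come from the description; the key names, the `op` field that distinguishes a locking event from a release event, and the `source` field are assumptions made for this example and are not specified by the invention.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LockEvent:
    timestamp: float   # time at which the lock event was executed or simulated
    thread_id: str     # thread ID (network event thread ID for simulated events)
    lock_id: str       # lock ID
    address: str       # lock event memory address
    op: str            # "acquire" or "release" (hypothetical field)
    source: str        # "local" or "network" (hypothetical field)


def to_json(event: LockEvent) -> str:
    """Serialize one lock event for transfer between modules."""
    return json.dumps(asdict(event), sort_keys=True)


def from_json(text: str) -> LockEvent:
    """Rebuild a lock event from its JSON form."""
    return LockEvent(**json.loads(text))


# A simulated network lock event, as the simulator might emit it.
network_event = LockEvent(
    timestamp=1.5, thread_id="net-1", lock_id="L2",
    address="0x7f00", op="acquire", source="network",
)
wire = to_json(network_event)
decoded = from_json(wire)
```

Using one record type for both instrumented and simulated events lets the downstream modules treat the two sources uniformly.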

A testing method for deadlock defects of a distributed system program comprises the following steps:

step 1, the custom network event simulator simulates network lock events generated by software on other hosts in the distributed system and sends the generated network lock events to the time sequence maintenance module in JSON format, wherein each network lock event comprises: a timestamp, a network event thread ID, a lock event memory address, etc.;

step 2, the binary instrumentation module dynamically instruments the distributed system software to be tested, acquires all lock events executed by each thread during one run of the software, and sends the lock events to the time sequence maintenance module in chronological order in JSON format, wherein each lock event comprises: a timestamp, a thread ID, a lock event memory address, etc.;

step 3, the time sequence maintenance module receives the JSON-format lock events from the custom network event simulator and the binary instrumentation module, maintains the received lock events by chronological order and thread ID, and then sends all the collected information to the priority-based thread scheduling module in JSON format;

step 4, the priority-based thread scheduling module receives the JSON-format data from the time sequence maintenance module, schedules the concurrent threads using a probability guarantee scheduling algorithm and the results fed back by the deadlock detection module, converts the parallel logic of the concurrent threads into serial logic, and then sends the execution result to the deadlock detection module in JSON format, one lock event at a time;

step 5, the deadlock detection module receives the JSON-format lock events from the priority-based thread scheduling module, uses a deadlock detection algorithm to judge whether the current thread is blocked, feeds the result back to the priority-based thread scheduling module, and judges that a deadlock exists if the threads that have not yet finished executing are blocked.
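
Step 3 can be pictured with the short Python sketch below, which merges the two JSON event streams, orders them by timestamp and groups them by thread ID. It is only one plausible reading of "maintains the received lock events": the function name `maintain_time_sequence`, the JSON key names and the shape of the output bundle are hypothetical.

```python
import json
from collections import defaultdict


def maintain_time_sequence(instrumented, simulated):
    """Merge JSON lock events from both sources, ordered by time and thread ID."""
    events = [json.loads(text) for text in list(instrumented) + list(simulated)]
    events.sort(key=lambda e: e["timestamp"])   # stable chronological order
    per_thread = defaultdict(list)
    for event in events:
        per_thread[event["thread_id"]].append(event)
    return json.dumps({
        "thread_count": len(per_thread),   # number of threads seen
        "event_count": len(events),        # total number of lock events
        "threads": per_thread,             # thread ID -> events in time order
    })


# Events from the binary instrumentation module (local threads).
local = [
    json.dumps({"timestamp": 1.0, "thread_id": "T1", "lock_id": "A", "op": "acquire"}),
    json.dumps({"timestamp": 3.0, "thread_id": "T1", "lock_id": "B", "op": "acquire"}),
]
# Events from the custom network event simulator (other hosts).
remote = [
    json.dumps({"timestamp": 2.0, "thread_id": "net-1", "lock_id": "B", "op": "acquire"}),
]
bundle = json.loads(maintain_time_sequence(local, remote))
```

In this sketch a simulated network thread is handled exactly like a local thread, which is what allows the scheduler to interleave network lock events with local ones on a single machine.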

The beneficial effects of the invention are as follows: the testing system simulates the network events of the distributed system and combines them, on a single machine, with the lock events executed by the distributed system software to be tested; the priority-based thread scheduling module converts the parallel logic of the threads into serial logic, and the deadlock detection module judges whether a deadlock has occurred. Deadlock defects of a distributed system are thereby detected on a single machine with a probability guarantee, which effectively reduces the cost of deadlock detection for distributed systems.

Drawings

FIG. 1 is a block diagram of a system for testing deadlock defects of a distributed system program according to the present invention.

FIG. 2 is a flowchart of the probability guarantee scheduling algorithm according to an embodiment of the present invention.

FIG. 3 is a flowchart of a deadlock detection algorithm provided by an embodiment of the invention.

Detailed Description

The following description will explain embodiments of the present invention in further detail with reference to the accompanying drawings.

FIG. 1 shows the testing system for deadlock defects of a distributed system program designed by the invention, which in practical application comprises the distributed system software to be tested 11, a binary instrumentation module 12, a custom network event simulator 13, a time sequence maintenance module 14, a priority-based thread scheduling module 15, and a deadlock detection module 16;

the custom network event simulator 13 simulates network lock events generated by software on other hosts in the distributed system and sends the generated network lock events to the time sequence maintenance module 14 in JSON format, where each network lock event includes: a timestamp, a network event thread ID, a lock event memory address, etc.;

the binary instrumentation module 12 dynamically instruments the distributed system software to be tested 11, obtains all lock events executed by each thread of the software during one run, and sends the lock events to the time sequence maintenance module 14 in chronological order in JSON format, where each lock event includes: a timestamp, a thread ID, a lock event memory address, etc.;

the time sequence maintenance module 14 receives the JSON-format lock events from the custom network event simulator 13 and the binary instrumentation module 12, maintains the received lock events by chronological order and thread ID, and then sends all the collected information to the priority-based thread scheduling module 15 in JSON format;

the priority-based thread scheduling module 15 receives the JSON-format data from the time sequence maintenance module 14, schedules the concurrent threads using a probability guarantee scheduling algorithm and the results fed back by the deadlock detection module 16, converts the parallel logic of the concurrent threads into serial logic, and then sends the execution result to the deadlock detection module 16 in JSON format, one lock event at a time;

the deadlock detection module 16 receives the JSON-format lock events from the priority-based thread scheduling module 15, uses a deadlock detection algorithm to determine whether the current thread is blocked, feeds the result back to the priority-based thread scheduling module 15, and determines that a deadlock exists if the threads that have not yet finished executing are blocked.

FIG. 2 shows the flowchart of the probability guarantee scheduling algorithm according to an embodiment of the present invention, detailed as follows:

in S201, the probability guarantee scheduling algorithm is initialized, including the number of threads, the thread priorities and the priority switching points, and data is read starting from the thread with the highest priority;

(1) the number of threads n and the total number of events k correspond to the number of threads and the number of lock events in the information sent by the time sequence maintenance module 14;

(2) setting an estimated value d of the defect depth;

(3) randomly assigning the priorities {d, d+1, ..., d+n-1} to the n threads;

(4) randomly selecting d-1 of the k lock events as priority switching points;

in S202, it is determined whether the current read position is a priority switching point; if not, the process proceeds to step S203, and if so, to step S204;

in S203, the first unread lock event record is read, in chronological order, from the thread with the highest priority;

in S204, a priority from {1, 2, ..., d-1} is randomly assigned to the current thread to update its priority;

in S205, the read lock event data is sent to the deadlock detection module, which feeds back whether to terminate the algorithm and whether to update the priority of the current thread;

in S206, whether to terminate the algorithm is judged from the feedback sent by the deadlock detection module; if not, the process proceeds to step S207, and if so, the algorithm terminates;

in S207, whether the priority of the current thread needs to be updated is further judged from the feedback sent by the deadlock detection module; if so, the process proceeds to step S204, and if not, to step S202;

steps S202 to S207 are repeated until the algorithm terminates.
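
The flow of steps S201 to S207 can be sketched in Python as follows. This is one possible reading of the flowchart, not a definitive implementation: the function name `schedule`, the `detect` callback with its three feedback strings, the `waiting` set that keeps a blocked thread from being reselected until another event has executed, and the fallback priority 0 when d = 1 (where the set {1, ..., d-1} is empty) are all assumptions of this sketch.

```python
import random


def schedule(per_thread, d, detect, seed=None):
    """Serialize per-thread lock events; returns (serial order, deadlock flag)."""
    rng = random.Random(seed)
    threads = sorted(per_thread)
    n = len(threads)
    k = sum(len(events) for events in per_thread.values())
    if not 1 <= d <= k + 1:
        raise ValueError("need 1 <= d <= k + 1")

    # S201 (3): give the n threads the priorities d .. d+n-1 in random order.
    initial = list(range(d, d + n))
    rng.shuffle(initial)
    priority = dict(zip(threads, initial))
    # S201 (4): pick d-1 of the k event positions as priority switching points.
    switch_points = set(rng.sample(range(1, k + 1), d - 1))

    def lower(thread):
        # S204: new priority drawn from {1, ..., d-1}; 0 if that set is empty
        # (the d == 1 fallback is an assumption of this sketch).
        priority[thread] = rng.randint(1, d - 1) if d > 1 else 0

    cursor = {t: 0 for t in threads}   # index of the next unread event per thread
    waiting = set()                    # threads reported blocked (assumption)
    order = []
    while True:
        ready = [t for t in threads
                 if cursor[t] < len(per_thread[t]) and t not in waiting]
        if not ready:
            break
        current = max(ready, key=priority.get)
        position = len(order) + 1
        if position in switch_points:                  # S202 -> S204
            switch_points.discard(position)
            lower(current)
            continue
        event = per_thread[current][cursor[current]]   # S203
        feedback = detect(current, event)              # S205
        if feedback == "deadlock":                     # S206: terminate
            return order, True
        if feedback == "update_priority":              # S207 -> S204
            lower(current)
            waiting.add(current)
            continue
        cursor[current] += 1                           # event executed
        order.append((current, event))
        waiting.clear()                                # blocked threads may retry
    return order, len(order) < k


# Two threads with opaque events and a stub detector that never blocks.
events = {"T1": ["a1", "a2"], "T2": ["b1", "b2", "b3"]}
order, deadlocked = schedule(events, d=2, detect=lambda t, e: "continue", seed=0)
```

Whatever the random choices, the returned order interleaves the threads while preserving each thread's own event order, which is the sense in which parallel logic is converted into serial logic.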

FIG. 3 shows the flowchart of the deadlock detection algorithm according to an embodiment of the present invention, detailed as follows:

in S301, after a lock event is received from the priority-based thread scheduling module, its type is determined; if it is a lock release event, the process proceeds to step S302, and if it is a locking event, to step S303;

in S302, the memory corresponding to the lock event held by the current thread is released;

in S303, it is determined whether the memory requested by the current locking event is already locked; if so, the process proceeds to step S306, and if not, to step S304;

in S304, the memory requested by the current thread is locked;

in S305, the current thread is set to the non-blocked state, and a "no need to update the thread priority" command is sent to the priority-based thread scheduling module;

in S306, the current thread is set to the blocked state;

in S307, it is determined whether any thread is running and not blocked; if so, the process proceeds to step S308, and if not, to step S309;

in S308, an "update current thread priority" command is returned;

in S309, a "terminate algorithm" command is returned and a deadlock is reported.
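
Steps S301 to S309 map naturally onto a small state machine. The Python sketch below is a minimal illustration under stated assumptions: the class name `DeadlockDetector`, the feedback strings, the `finish` method, the "continue" result after a release, and the waking of waiting threads on release are not given in the description and are introduced here only to make the example complete. Reentrant locks are not modeled.

```python
class DeadlockDetector:
    def __init__(self, thread_ids):
        self.running = set(thread_ids)   # threads that still have events to execute
        self.owner = {}                  # lock memory address -> thread holding it
        self.blocked = {}                # blocked thread -> address it waits for

    def finish(self, thread_id):
        """Mark a thread as having executed all of its events (hypothetical)."""
        self.running.discard(thread_id)

    def on_event(self, thread_id, op, address):
        if op == "release":                                    # S301 -> S302
            if self.owner.get(address) == thread_id:
                del self.owner[address]
            # Assumption: threads waiting on this address may run again.
            for waiter in [t for t, a in self.blocked.items() if a == address]:
                del self.blocked[waiter]
            return "continue"
        if address not in self.owner:                          # S303 -> S304
            self.owner[address] = thread_id
            self.blocked.pop(thread_id, None)                  # S305
            return "continue"
        self.blocked[thread_id] = address                      # S306
        if any(t not in self.blocked for t in self.running):   # S307
            return "update_priority"                           # S308
        return "deadlock"                                      # S309


# Classic lock-order inversion: T1 holds 0xA and wants 0xB, T2 the reverse.
detector = DeadlockDetector(["T1", "T2"])
trace = [("T1", "acquire", "0xA"), ("T2", "acquire", "0xB"),
         ("T1", "acquire", "0xB"), ("T2", "acquire", "0xA")]
feedback = [detector.on_event(*step) for step in trace]

# A serial order without contention produces no blocking at all.
safe = DeadlockDetector(["T1", "T2"])
safe_feedback = [safe.on_event(*step) for step in
                 [("T1", "acquire", "0xA"), ("T1", "release", "0xA"),
                  ("T2", "acquire", "0xA")]]
```

In the first trace the third event blocks T1 while T2 can still run, so the scheduler is asked to update the priority; the fourth event blocks T2 as well, no runnable thread remains, and a deadlock is reported.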

Claims

1. A testing system for deadlock defects of a distributed system program, characterized by comprising the distributed system software to be tested, a custom network event simulator, a binary instrumentation module, a time sequence maintenance module, a priority-based thread scheduling module and a deadlock detection module;

2. A testing method for deadlock defects of a distributed system program, characterized by comprising the following steps:

(1) the custom network event simulator simulates network lock events generated on other hosts in the distributed system and sends the generated network lock events to the time sequence maintenance module in JSON format, wherein each network lock event comprises a timestamp, a network event thread ID, a lock ID and a lock event memory address;

(2) the binary instrumentation module dynamically instruments the distributed system software to be tested, acquires all lock events executed by each thread during one run of the software, and sends the lock events to the time sequence maintenance module in chronological order in JSON format, wherein each lock event comprises a timestamp, a thread ID, a lock ID and a lock event memory address;

(3) the time sequence maintenance module receives the JSON-format lock events from the custom network event simulator and the binary instrumentation module, maintains the received lock events by chronological order and thread ID, and then sends all the collected information to the priority-based thread scheduling module in JSON format;

(4) the priority-based thread scheduling module receives the JSON-format data from the time sequence maintenance module, schedules the concurrent threads using a probability guarantee scheduling algorithm and the results fed back by the deadlock detection module, converts the parallel logic of the concurrent threads into serial logic, and then sends the execution result to the deadlock detection module in JSON format, one lock event at a time;

(5) the deadlock detection module receives the JSON-format lock events from the priority-based thread scheduling module, uses a deadlock detection algorithm to judge whether the current thread is blocked, feeds the result back to the priority-based thread scheduling module, and judges that a deadlock exists if the threads that have not yet finished executing are blocked;

the method comprises the following specific steps:

4.1) initializing the probability guarantee scheduling algorithm, including the number of threads, the thread priorities and the priority switching points, and reading data starting from the thread with the highest priority;

4.1.1) the number n of threads and the total number k of events correspond to the number of threads and the number of lock events in the information sent by the time sequence maintenance module;

4.1.2) setting an estimated value d of the defect depth;

4.1.3) randomly assigning the priorities {d, d+1, ..., d+n-1} to the n threads;

4.1.4) randomly selecting d-1 of the k lock events as priority switching points;

4.2) judging whether the current read position is a priority switching point; if not, entering step 4.3), and if so, entering step 4.4);

4.3) reading, in chronological order, the first unread lock event record from the thread with the highest priority;

4.4) randomly assigning a priority from {1, 2, ..., d-1} to the current thread to update its priority;

4.5) sending the read lock event data to a deadlock detection module, and feeding back whether to terminate the current algorithm and whether to update the priority of the current thread by the deadlock detection module;

4.6) judging whether to terminate the algorithm according to the feedback sent by the deadlock detection module; if not, entering step 4.7), and if so, terminating the current algorithm;

4.7) further judging whether the priority of the current thread needs to be updated according to a feedback result sent by the deadlock detection module, if so, entering the step 4.4), and if not, entering the step 4.2);

repeating steps 4.2) to 4.7) until the algorithm terminates;

5.1) after receiving a lock event from the priority-based thread scheduling module, judging the type of the lock event; if it is a lock release event, entering step 5.2), and if it is a locking event, entering step 5.3);

5.2) releasing the memory corresponding to the lock event held by the current thread;

5.3) judging whether the memory requested by the current locking event is already locked; if so, entering step 5.6), and if not, entering step 5.4);

5.4) locking the memory requested by the current thread;

5.5) setting the current thread to the non-blocked state, and sending a "no need to update the thread priority" command to the priority-based thread scheduling module;

5.6) setting the current thread to the blocked state;

5.7) judging whether any thread is running and not blocked; if so, entering step 5.8), and if not, entering step 5.9);

5.8) returning an "update current thread priority" command;

5.9) returning a "terminate algorithm" command and reporting a deadlock.