CN108959098B - System and method for testing deadlock defects of distributed system program - Google Patents


Info

Publication number: CN108959098B
Authority: CN (China)
Legal status: Active
Application number: CN201810799683.XA
Other languages: Chinese (zh)
Other versions: CN108959098A
Inventors: 赵靖, 王延斌, 吴卓霏, 鲁华林, 姚念民
Current assignee: Dalian University of Technology
Original assignee: Dalian University of Technology
Priority date: 2018-07-20
Filing date: 2018-07-20
Publication date: 2021-11-05
2018-07-20: Application CN201810799683.XA filed by Dalian University of Technology (priority date)
2018-12-07: Publication of CN108959098A
Application granted
2021-11-05: Publication of CN108959098B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3644 Software debugging by instrumenting at runtime
    • G06F11/3648 Software debugging using additional hardware
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/524 Deadlock detection or avoidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a system and method for testing deadlock defects in distributed system programs. Network events of the distributed system are simulated and combined, on a single machine, with the lock events executed by the distributed system software under test. A priority-based thread scheduling module then converts the parallel logic of the threads into serial logic, and a deadlock detection module determines whether a deadlock has occurred. The invention thus detects deadlock defects of a distributed system on a single machine with a probabilistic guarantee, effectively reducing the cost of deadlock detection for distributed systems and improving program testing efficiency.

Description

System and method for testing deadlock defects of distributed system program
Technical Field
The invention belongs to the technical field of software testing, and in particular relates to a system and method for testing deadlock defects in distributed system programs.
Background
Software testing is an important technique for ensuring the reliability of computer systems: it exposes defects in a program in advance and so avoids unexpected losses when the program runs in a production environment. Deadlock is one of the most common defect types in computer systems. A deadlock traps concurrent programs in a circular wait, which makes the program's service unavailable or crashes the program; deadlock defects are difficult to detect, difficult to handle, and highly harmful.
Deadlock defects exist in computer systems that have concurrency. Concurrent programs use locks to keep shared data in a consistent state, and a deadlock forms when the program enters a circular wait on those locks. In a multithreaded program, concurrency mainly comes from different threads operating on shared memory; in a distributed system, concurrency includes that of the multithreaded programs but also comes from network communication between the different nodes of the system.
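The circular wait described above can be made concrete with a wait-for graph, a standard textbook device that is not part of the claimed method: each blocked thread points to the thread holding the lock it wants, and a deadlock corresponds to a cycle. The minimal Python sketch below assumes each thread waits for at most one lock at a time; the function name `find_wait_cycle` and its inputs are hypothetical.

```python
from typing import Dict, List, Optional


def find_wait_cycle(holds: Dict[str, str], wants: Dict[str, str]) -> Optional[List[str]]:
    """Return the threads forming a circular wait, or None if there is none.

    Hypothetical illustration only.
    holds: lock -> thread currently holding it
    wants: thread -> lock it is blocked on (assumed: at most one per thread)
    """
    # Wait-for edge: thread t waits for the thread that holds the lock t wants.
    waits_for = {t: holds[lock] for t, lock in wants.items() if lock in holds}
    for start in waits_for:
        path: List[str] = []
        t = start
        # Each thread has at most one outgoing edge, so just follow the chain.
        while t in waits_for and t not in path:
            path.append(t)
            t = waits_for[t]
        if t in path:
            return path[path.index(t):]  # the cycle portion of the chain
    return None
```

For the classic two-lock case, `find_wait_cycle({"A": "T1", "B": "T2"}, {"T1": "B", "T2": "A"})` returns `["T1", "T2"]`: each thread holds one lock and waits for the other's.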
In the prior art, deadlock defect detection is performed only on single-machine multithreaded programs and does not consider the influence of network message events on the program under test, so applying it to distributed systems leads to a large number of missed detections.
Disclosure of Invention
The main technical problem solved by the invention is to provide a system for testing deadlock defects in distributed system programs. It can reduce the miss rate of deadlock defect testing for distributed systems, reduce testing overhead, and help ensure the reliability of distributed systems.
The technical solution of the invention is as follows:
a system for testing deadlock defects of a distributed system program comprises the distributed system software under test, a custom network event simulator, a binary instrumentation module, a time sequence maintenance module, a priority-based thread scheduling module, and a deadlock detection module;
the custom network event simulator simulates the network lock events generated on other hosts of the distributed system and sends them to the time sequence maintenance module in JSON format; each network lock event contains a timestamp, a network event thread ID, a lock ID, and the memory address of the lock event;
the binary instrumentation module dynamically instruments the distributed system software under test, captures every lock event executed by each thread during one run of the software, and sends the lock events in chronological order to the time sequence maintenance module in JSON format; each lock event contains a timestamp, a thread ID, a lock ID, and the memory address of the lock event;
the time sequence maintenance module receives the JSON-formatted lock events from the custom network event simulator and the binary instrumentation module, organizes them by timestamp and thread ID, and then sends all of the collected information to the priority-based thread scheduling module in JSON format;
the priority-based thread scheduling module receives the JSON data from the time sequence maintenance module and schedules the concurrent threads using a probability-guaranteed scheduling algorithm together with the feedback from the deadlock detection module, converting the parallel logic of the concurrent threads into serial logic; it then sends the execution result to the deadlock detection module in JSON format, one lock event at a time;
and the deadlock detection module receives the JSON-formatted lock events from the priority-based thread scheduling module, uses a deadlock detection algorithm to determine whether the current thread is blocked, and feeds the result back to the priority-based thread scheduling module; if the threads that have not yet finished executing are blocked, a deadlock is declared.
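As an illustration of the custom network event simulator, the sketch below generates randomized network lock events and serializes them as JSON. It is a hypothetical example, not the patented implementation: the JSON field names (`timestamp`, `thread_id`, `lock_id`, `address`, `op`) are assumptions, and so is the `op` field distinguishing acquire from release, since the patent lists only the timestamp, network event thread ID, lock ID, and lock memory address (the detection algorithm does, however, need to tell the two event types apart).

```python
import json
import random


def simulate_network_lock_events(num_threads, locks, pairs_per_thread, seed=None):
    """Hypothetical network event simulator.

    Each simulated remote thread repeatedly acquires and then releases a
    randomly chosen lock. `locks` is a list of (lock_id, memory_address)
    pairs. Returns a JSON array of events sorted by timestamp.
    """
    rng = random.Random(seed)
    events = []
    for i in range(num_threads):
        ts = 0.0  # assumption: every simulated remote thread starts at time 0
        for _ in range(pairs_per_thread):
            lock_id, addr = rng.choice(locks)
            for op in ("acquire", "release"):
                ts += rng.uniform(0.001, 0.01)  # strictly increasing per thread
                events.append({
                    "timestamp": ts,
                    "thread_id": f"net-{i}",
                    "lock_id": lock_id,
                    "address": hex(addr),
                    "op": op,  # assumed field: event type
                })
    # Timestamps increase within each thread, so sorting keeps per-thread order.
    events.sort(key=lambda e: e["timestamp"])
    return json.dumps(events)
```

A real simulator would presumably derive its events from the protocol of the system under test rather than from random choices; randomness here only stands in for unknown remote behavior.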
A method for testing deadlock defects of a distributed system program comprises the following steps:
Step 1: the custom network event simulator simulates the network lock events generated by software on other hosts in the distributed system and sends them to the time sequence maintenance module in JSON format; each network lock event contains a timestamp, a network event thread ID, the memory address of the lock event, etc.;
Step 2: the binary instrumentation module dynamically instruments the distributed system software under test, captures every lock event executed by each thread during one run of the software, and sends the lock events in chronological order to the time sequence maintenance module in JSON format; each lock event contains a timestamp, a thread ID, the memory address of the lock event, etc.;
Step 3: the time sequence maintenance module receives the JSON-formatted lock events from the custom network event simulator and the binary instrumentation module, organizes them by timestamp and thread ID, and then sends all of the collected information to the priority-based thread scheduling module in JSON format;
Step 4: the priority-based thread scheduling module receives the JSON data from the time sequence maintenance module and schedules the concurrent threads using the probability-guaranteed scheduling algorithm together with the feedback from the deadlock detection module, converting the parallel logic of the concurrent threads into serial logic; it then sends the execution result to the deadlock detection module in JSON format, one lock event at a time;
Step 5: the deadlock detection module receives the JSON-formatted lock events from the priority-based thread scheduling module, uses a deadlock detection algorithm to determine whether the current thread is blocked, and feeds the result back to the priority-based thread scheduling module; if the threads that have not yet finished executing are blocked, a deadlock is declared.
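Step 2 relies on dynamic binary instrumentation, which in practice would be done with a framework such as Intel Pin or DynamoRIO hooking the lock primitives of the target binary. To show only what such a hook records, the Python sketch below substitutes a source-level wrapper around `threading.Lock`; this is a deliberate simplification, not binary instrumentation, and the class name `TracedLock` and the event field names are assumptions. The lock's `id()` is used as its memory address, which holds in CPython.

```python
import json
import threading
import time


class TracedLock:
    """Hypothetical stand-in for an instrumented mutex that logs lock events."""

    _log = []                      # shared event log for all traced locks
    _log_guard = threading.Lock()  # protects _log; itself not traced

    def __init__(self, lock_id):
        self.lock_id = lock_id
        self._lock = threading.Lock()

    def _record(self, op):
        event = {
            "timestamp": time.monotonic_ns(),
            "thread_id": threading.get_ident(),
            "lock_id": self.lock_id,
            "address": hex(id(self._lock)),  # CPython: id() is the object address
            "op": op,
        }
        with TracedLock._log_guard:
            TracedLock._log.append(event)

    def acquire(self):
        self._lock.acquire()
        self._record("acquire")  # logged only once the lock is actually held

    def release(self):
        self._record("release")  # logged while the lock is still held
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()
        return False

    @classmethod
    def dump_json(cls):
        """Return all recorded events as a JSON array in timestamp order."""
        with cls._log_guard:
            return json.dumps(sorted(cls._log, key=lambda e: e["timestamp"]))
```

Recording the acquire after taking the lock and the release before giving it up keeps the log order consistent with actual ownership, so the sorted trace never shows two holders of the same lock at once.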
Beneficial effects of the invention: the testing system simulates the network events of the distributed system and combines them, on a single machine, with the lock events executed by the distributed system software under test; the priority-based thread scheduling module converts the parallel logic of the threads into serial logic, and the deadlock detection module determines whether a deadlock has occurred. The invention thus detects deadlock defects of a distributed system on a single machine with a probabilistic guarantee and effectively reduces the cost of deadlock detection for distributed systems.
Drawings
Fig. 1 is a block diagram of the system for testing deadlock defects of a distributed system program according to the invention.
Fig. 2 is a flowchart of the probability-guaranteed scheduling algorithm according to an embodiment of the invention.
Fig. 3 is a flowchart of the deadlock detection algorithm according to an embodiment of the invention.
Detailed Description
Embodiments of the invention are described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the testing system for deadlock defects of distributed system programs designed by the invention comprises, in practical application, the distributed system software under test 11, a binary instrumentation module 12, a custom network event simulator 13, a time sequence maintenance module 14, a priority-based thread scheduling module 15, and a deadlock detection module 16;
the custom network event simulator 13 simulates the network lock events generated by software on other hosts in the distributed system and sends them to the time sequence maintenance module 14 in JSON format; each network lock event contains a timestamp, a network event thread ID, the memory address of the lock event, etc.;
the binary instrumentation module 12 dynamically instruments the distributed system software under test 11, captures every lock event executed by each of its threads during one run, and sends the lock events in chronological order to the time sequence maintenance module 14 in JSON format; each lock event contains a timestamp, a thread ID, the memory address of the lock event, etc.;
the time sequence maintenance module 14 receives the JSON-formatted lock events from the custom network event simulator 13 and the binary instrumentation module 12, organizes them by timestamp and thread ID, and then sends all of the collected information to the priority-based thread scheduling module 15 in JSON format;
the priority-based thread scheduling module 15 receives the JSON data from the time sequence maintenance module 14 and schedules the concurrent threads using the probability-guaranteed scheduling algorithm together with the feedback from the deadlock detection module 16, converting the parallel logic of the concurrent threads into serial logic; it then sends the execution result to the deadlock detection module 16 in JSON format, one lock event at a time;
the deadlock detection module 16 receives the JSON-formatted lock events from the priority-based thread scheduling module 15, uses the deadlock detection algorithm to determine whether the current thread is blocked, and feeds the result back to the priority-based thread scheduling module 15; if the threads that have not yet finished executing are blocked, a deadlock is declared.
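The time sequence maintenance module 14 can be pictured as a merge step. The hypothetical sketch below combines the JSON streams from the simulator and the instrumentation module, orders the events by timestamp, and groups them by thread ID; it also reports the thread count n and event count k that the scheduler's initialization step needs. The function name and the output layout (`timeline`, `threads`, `num_threads`, `total_events`) are assumptions.

```python
import json
from collections import defaultdict


def maintain_time_sequence(*json_streams):
    """Hypothetical merge of JSON lock-event arrays into one ordered view.

    Returns a JSON object with the global timeline, the per-thread event
    lists (each in chronological order), and the counts n and k.
    """
    events = []
    for stream in json_streams:
        events.extend(json.loads(stream))
    # Order by timestamp; the sort is stable, so ties keep their input order.
    events.sort(key=lambda e: e["timestamp"])
    per_thread = defaultdict(list)
    for e in events:
        per_thread[str(e["thread_id"])].append(e)
    return json.dumps({
        "timeline": events,
        "threads": per_thread,             # thread ID -> its events in order
        "num_threads": len(per_thread),    # n
        "total_events": len(events),       # k
    })
```

Thread IDs are normalized to strings so that local IDs (integers) and simulated network IDs (strings) can share one JSON object as keys.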
Fig. 2 shows the flowchart of the probability-guaranteed scheduling algorithm according to an embodiment of the invention, detailed as follows:
In S201, initialize the probability-guaranteed scheduling algorithm, i.e. the number of threads, the thread priorities, and the priority change points, and start reading data from the highest-priority thread:
(1) the number of threads n and the total number of lock events k are taken from the information sent by the time sequence maintenance module 14;
(2) set an estimate d of the defect depth;
(3) randomly assign the priorities {d, d+1, ..., d+n-1} to the n threads;
(4) randomly select d-1 of the k lock events as priority change points;
In S202, determine whether the current read position is a priority change point; if not, go to S203; if so, go to S204;
In S203, from the highest-priority thread, read the first lock event record that has not yet been read, in chronological order;
In S204, randomly assign a priority from {1, 2, ..., d-1} to the current thread to update its priority;
In S205, send the lock event data that was read to the deadlock detection module, which replies whether to terminate the current algorithm and whether to update the current thread's priority;
In S206, based on the feedback from the deadlock detection module, determine whether to terminate the algorithm; if not, go to S207; if so, terminate the current algorithm;
In S207, based on the same feedback, determine whether the current thread's priority needs to be updated; if so, go to S204; if not, go to S202;
Steps S202 to S207 are repeated until the algorithm terminates.
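The steps of Fig. 2 closely resemble the PCT randomized scheduler of Burckhardt et al. (ASPLOS 2010), for which a bug of depth d among n threads and k steps is exposed in a single run with probability at least 1/(n·k^(d-1)); because the variant here draws demoted priorities at random from {1, ..., d-1}, that bound is mentioned only for orientation, not claimed for this algorithm. The Python sketch below is a hypothetical reading of S201 to S207. Assumptions: each thread's trace is a list of `(op, address)` pairs; the deadlock detection module is abstracted as a `feedback` callback returning `"continue"`, `"update"`, or `"terminate"`; a thread whose acquire is refused is treated as blocked and retried after the next release; and change points are counted over executed events.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Event = Tuple[str, int]  # (op, lock address); op is "acquire" or "release"


def priority_schedule(
    traces: Dict[int, List[Event]],
    d: int,
    feedback: Callable[[int, Event], str],
    seed: Optional[int] = None,
) -> Tuple[List[Tuple[int, Event]], bool]:
    """Serialize per-thread lock-event traces (hypothetical sketch of Fig. 2).

    d is the estimated defect depth (d >= 1). Returns the executed events
    in serial order and whether a deadlock was found.
    """
    rng = random.Random(seed)
    tids = list(traces)
    n = len(tids)
    k = sum(len(evs) for evs in traces.values())
    # S201 (3): a random permutation of the priorities {d, ..., d+n-1}
    priority = dict(zip(tids, rng.sample(range(d, d + n), n)))
    # S201 (4): d-1 priority change points among the k event positions
    change_points = set(rng.sample(range(k), min(d - 1, k)))
    pos = {t: 0 for t in tids}  # index of the next unread event per thread
    blocked = set()             # threads whose pending acquire was refused
    order: List[Tuple[int, Event]] = []

    def demote(t: int) -> None:  # S204: random lower priority in {1, ..., d-1}
        priority[t] = rng.randint(1, max(d - 1, 1))

    while True:
        ready = [t for t in tids if pos[t] < len(traces[t]) and t not in blocked]
        if not ready:
            # Either every trace is exhausted or the unfinished threads are blocked.
            return order, any(pos[t] < len(traces[t]) for t in tids)
        current = max(ready, key=priority.__getitem__)
        step = len(order)
        if step in change_points:               # S202 -> S204
            change_points.discard(step)
            demote(current)
            continue
        event = traces[current][pos[current]]   # S203
        verdict = feedback(current, event)      # S205
        if verdict == "terminate":              # S206: detector reported deadlock
            return order, True
        if verdict == "update":                 # S207 -> S204: acquire refused
            blocked.add(current)
            demote(current)
            continue
        order.append((current, event))
        pos[current] += 1
        if event[0] == "release":
            blocked.clear()  # assumption: blocked threads retry after any release
```

In a full pipeline, `feedback` would forward each event to the deadlock detection module and translate its reply; keeping it a callback lets the scheduler be exercised on its own.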
Fig. 3 shows the flowchart of the deadlock detection algorithm according to an embodiment of the invention, detailed as follows:
In S301, after a lock event is received from the priority-based thread scheduling module, determine its type: for a lock release event, go to S302; for a lock acquisition event, go to S303;
In S302, release the memory that the current thread holds for this lock event;
In S303, determine whether the memory requested by the current lock event is already locked; if so, go to S306; if not, go to S304;
In S304, lock the memory requested by the current thread;
In S305, set the current thread to the non-blocked state and send the command "no need to update the thread priority" to the priority-based thread scheduling module;
In S306, set the current thread to the blocked state;
In S307, determine whether any thread exists that is not blocked and is still running; if so, go to S308; if not, go to S309;
In S308, return the command "update the current thread priority";
In S309, return the command "terminate the algorithm" and report a deadlock.
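Steps S301 to S309 can be sketched as a small state machine over lock ownership. This Python sketch is hypothetical: locks are keyed by the memory address carried in each event, the return strings mirror the three commands named in the flowchart, and locks are assumed non-reentrant. Two details the flowchart leaves open are filled by assumption: a release wakes the threads waiting on that address, and the caller marks threads that have finished their traces.

```python
from typing import Dict, Iterable, Set

NO_UPDATE = "no need to update thread priority"  # S305
UPDATE = "update current thread priority"        # S308
TERMINATE = "terminate algorithm"                # S309


class DeadlockDetector:
    """Hypothetical sketch of the Fig. 3 deadlock detection procedure."""

    def __init__(self, threads: Iterable[int]):
        self.threads: Set[int] = set(threads)
        self.owner: Dict[int, int] = {}    # lock address -> holding thread
        self.waiting: Dict[int, int] = {}  # blocked thread -> address it waits on
        self.finished: Set[int] = set()    # assumption: maintained by the caller
        self.deadlock_reported = False

    def on_event(self, tid: int, op: str, addr: int) -> str:
        if op == "release":                        # S301 -> S302
            if self.owner.get(addr) == tid:
                del self.owner[addr]
            # Assumption: threads waiting on this address become runnable again.
            for t in [t for t, a in self.waiting.items() if a == addr]:
                del self.waiting[t]
            return NO_UPDATE
        if addr not in self.owner:                 # S303 -> S304
            self.owner[addr] = tid
            self.waiting.pop(tid, None)            # S305: mark as not blocked
            return NO_UPDATE
        self.waiting[tid] = addr                   # S306: blocked (non-reentrant)
        running = self.threads - set(self.waiting) - self.finished  # S307
        if running:
            return UPDATE                          # S308
        self.deadlock_reported = True              # S309
        return TERMINATE
```

In the two-lock case, thread 1 holds A and requests B while thread 2 holds B and requests A; the second refused request leaves no running thread, so the detector answers with the terminate command.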

Claims (2)

1. A system for testing deadlock defects of a distributed system program, characterized by comprising the distributed system software under test, a custom network event simulator, a binary instrumentation module, a time sequence maintenance module, a priority-based thread scheduling module, and a deadlock detection module;
the custom network event simulator simulates the network lock events generated on other hosts of the distributed system and sends them to the time sequence maintenance module in JSON format; each network lock event contains a timestamp, a network event thread ID, a lock ID, and the memory address of the lock event;
the binary instrumentation module dynamically instruments the distributed system software under test, captures every lock event executed by each thread during one run of the software, and sends the lock events in chronological order to the time sequence maintenance module in JSON format; each lock event contains a timestamp, a thread ID, a lock ID, and the memory address of the lock event;
the time sequence maintenance module receives the JSON-formatted lock events from the custom network event simulator and the binary instrumentation module, organizes them by timestamp and thread ID, and then sends all of the collected information to the priority-based thread scheduling module in JSON format;
the priority-based thread scheduling module receives the JSON data from the time sequence maintenance module and schedules the concurrent threads using a probability-guaranteed scheduling algorithm together with the feedback from the deadlock detection module, converting the parallel logic of the concurrent threads into serial logic; it then sends the execution result to the deadlock detection module in JSON format, one lock event at a time;
and the deadlock detection module receives the JSON-formatted lock events from the priority-based thread scheduling module, uses a deadlock detection algorithm to determine whether the current thread is blocked, and feeds the result back to the priority-based thread scheduling module; if the threads that have not yet finished executing are blocked, a deadlock is declared.
2. A method for testing deadlock defects of a distributed system program, characterized by comprising the following steps:
(1) the custom network event simulator simulates the network lock events generated on other hosts of the distributed system and sends them to the time sequence maintenance module in JSON format; each network lock event contains a timestamp, a network event thread ID, a lock ID, and the memory address of the lock event;
(2) the binary instrumentation module dynamically instruments the distributed system software under test, captures every lock event executed by each thread during one run of the software, and sends the lock events in chronological order to the time sequence maintenance module in JSON format; each lock event contains a timestamp, a thread ID, a lock ID, and the memory address of the lock event;
(3) the time sequence maintenance module receives the JSON-formatted lock events from the custom network event simulator and the binary instrumentation module, organizes them by timestamp and thread ID, and then sends all of the collected information to the priority-based thread scheduling module in JSON format;
(4) the priority-based thread scheduling module receives the JSON data from the time sequence maintenance module and schedules the concurrent threads using a probability-guaranteed scheduling algorithm together with the feedback from the deadlock detection module, converting the parallel logic of the concurrent threads into serial logic; it then sends the execution result to the deadlock detection module in JSON format, one lock event at a time;
(5) the deadlock detection module receives the JSON-formatted lock events from the priority-based thread scheduling module, uses a deadlock detection algorithm to determine whether the current thread is blocked, and feeds the result back to the priority-based thread scheduling module; if the threads that have not yet finished executing are blocked, a deadlock is declared;
the specific steps are as follows:
4.1) initialize the probability-guaranteed scheduling algorithm, i.e. the number of threads, the thread priorities, and the priority change points, and start reading data from the highest-priority thread;
4.1.1) the number of threads n and the total number of lock events k are taken from the information sent by the time sequence maintenance module;
4.1.2) set an estimate d of the defect depth;
4.1.3) randomly assign the priorities {d, d+1, ..., d+n-1} to the n threads;
4.1.4) randomly select d-1 of the k lock events as priority change points;
4.2) determine whether the current read position is a priority change point; if not, go to step 4.3); if so, go to step 4.4);
4.3) from the highest-priority thread, read the first lock event record that has not yet been read, in chronological order;
4.4) randomly assign a priority from {1, 2, ..., d-1} to the current thread to update its priority;
4.5) send the lock event data that was read to the deadlock detection module, which replies whether to terminate the current algorithm and whether to update the current thread's priority;
4.6) based on the feedback from the deadlock detection module, determine whether to terminate the algorithm; if not, go to step 4.7); if so, terminate the current algorithm;
4.7) based on the same feedback, determine whether the current thread's priority needs to be updated; if so, go to step 4.4); if not, go to step 4.2);
repeat steps 4.2) to 4.7) until the algorithm terminates;
5.1) after a lock event is received from the priority-based thread scheduling module, determine its type: for a lock release event, go to step 5.2); for a lock acquisition event, go to step 5.3);
5.2) release the memory that the current thread holds for this lock event;
5.3) determine whether the memory requested by the current lock acquisition event is already locked; if so, go to step 5.6); if not, go to step 5.4);
5.4) lock the memory requested by the current thread;
5.5) set the current thread to the non-blocked state and send the command "no need to update the thread priority" to the priority-based thread scheduling module;
5.6) set the current thread to the blocked state;
5.7) determine whether any thread exists that is not blocked and is still running; if so, go to step 5.8); if not, go to step 5.9);
5.8) return the command "update the current thread priority";
5.9) return the command "terminate the algorithm" and report a deadlock.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810799683.XA CN108959098B (en) 2018-07-20 2018-07-20 System and method for testing deadlock defects of distributed system program

Publications (2)

Publication Number Publication Date
CN108959098A CN108959098A (en) 2018-12-07
CN108959098B true CN108959098B (en) 2021-11-05

Family

ID=64497913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810799683.XA Active CN108959098B (en) 2018-07-20 2018-07-20 System and method for testing deadlock defects of distributed system program

Country Status (1)

Country Link
CN (1) CN108959098B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5812824A (en) * 1996-03-22 1998-09-22 Sun Microsystems, Inc. Method and system for preventing device access collision in a distributed simulation executing in one or more computers including concurrent simulated one or more devices controlled by concurrent one or more tests
CN1624664A (en) * 2004-12-20 2005-06-08 华中科技大学 Distribution type software reliability evaluation system having time restraint
EP2037368A2 (en) * 2007-07-30 2009-03-18 Fujitsu Microelectronics Limited Simulation of program execution to detect problem such as deadlock
CN102662840A (en) * 2012-03-31 2012-09-12 天津大学 Automatic detecting system and method for extension behavior of Firefox browser
CN103678122A (en) * 2013-11-29 2014-03-26 华为技术有限公司 Deadlock detecting method, equipment and system
CN104184816A (en) * 2014-08-28 2014-12-03 哈尔滨工程大学 Lookahead dynamic adjustment method based on simulation member event timestamp increment expectation
CN106790694A (en) * 2017-02-21 2017-05-31 广州爱九游信息技术有限公司 The dispatching method of destination object in distributed system and distributed system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
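Adaptive Deadlock Detection and Resolution in Real-Time Distributed Environments; W. Haque, M. Fontaine, A. Vezina; 2017 IEEE 19th International Conference on High Performance Computing and Communications; 2018-02-15; pp. 573-577 *
Radius aware probabilistic testing of deadlocks with guarantees; Yan Cai, Zijiang Yang; 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE); 2016-10-06; pp. 356-367 *
Deadlock detection of concurrent processes in distributed systems based on Petri nets (基于Petri网的分布式系统并发进程的死锁检测); 刘荣峰; Computer Engineering and Design (计算机工程与设计); 2007-11-28; pp. 5353-5355 *
Implementation and performance analysis of the parallel/distributed network simulation system PDNS (并行/分布式网络模拟系统PDNS的实现及其性能分析); 马殿富, 赵路; Journal of System Simulation (系统仿真学报); 2001-12-30; pp. 429-432 *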

Also Published As

Publication number Publication date
CN108959098A (en) 2018-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant