CN116795695A - Automatic construction method of concurrent program defect data set

Info

Publication number: CN116795695A
Application number: CN202310681824.9A
Authority: CN
Inventors: 韩心慧; 梁家硕; 武新逢
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2023-06-09
Filing date: 2023-06-09
Publication date: 2023-09-22

Abstract

The invention provides a method for automatically constructing a concurrent program defect data set and belongs to the technical field of computer applications. The method implants atomicity-violation and order-violation concurrency defects, both caused by erroneous assumptions about program state, into the source code of a given C/C++ multithreaded program, so as to construct a concurrent program defect data set for evaluating the detection capability of concurrency defect detection tools. The invention summarizes several representative concurrency defect code patterns and uses them as templates for automatically generating concurrency defects. Based on the running path of the multithreaded program, it adds the generated defect code at suitable source code positions, ensuring that each implanted defect can be triggered and that, once triggered, it exhibits easily observed program behavior, which makes it convenient to evaluate concurrency defect detection tools.

Description

Automatic construction method of concurrent program defect data set

Technical Field

The invention belongs to the technical field of computer application, and relates to an automatic construction method of a concurrent program defect data set.

Background

1. Multithreading and concurrency defects

With the spread of multi-core processors, multithreaded programs have come into wide use. In the multithreaded programming model, a program contains multiple threads, each with its own local state and each able to execute its own sequence of operations concurrently. Threads communicate through shared memory: they read and write the global program state held there, and the current value of that state in turn affects the results of their read and write operations. The order in which operations from different threads execute depends on the behavior of the multi-core processor and on dynamic scheduling by the operating system, and different thread orders can produce different results. When writing a multithreaded program, the programmer must therefore consider how the threads may interleave and synchronize them with primitives such as mutex locks, so that the program runs correctly.

However, if the programmer does not fully consider the possible thread interleavings, a distinctive class of program defects, namely concurrency defects, may result, including types such as data races and deadlocks. A program has many possible thread orders, and a concurrency defect often manifests only under particular ones, so it is hard to find. Even once found, it is hard to reproduce and debug, which causes great trouble for developers.

2. Concurrent program defect data sets

Some concurrent program defect data sets already exist. They fall into two broad categories: historical defect data sets and synthetic defect data sets.

A historical defect data set consists of concurrency defects that actually occurred in the past and is collected and organized by hand. Building one is time-consuming and labor-intensive, the number of defects is small, the defects belong to different versions of different programs, and those programs often need complex configuration and input methods to run, so such data sets are inconvenient for evaluating concurrency defect detection tools accurately.

A synthetic defect data set is constructed automatically according to certain rules. For example, CCmutator generates concurrency defects by randomly changing calls to multithreading-related APIs in a program, and drinjject adds new code that reads and writes global variables to the original program's code to generate atomicity-violation concurrency defects. However, existing synthetic defect data sets struggle to meet the evaluation needs of the various concurrency defect detection tools: they cover few kinds of concurrency defects, and an added defect is not necessarily triggered at run time, so it cannot be confirmed that the defect really exists.

3. Multithreaded program debugging

Program debugging tools such as GDB can be driven by scripts and support dynamic debugging of multithreaded programs, including single-stepping a thread, inspecting the current source code position, inspecting thread states, and switching threads. The invention uses such a debugging tool to record and replay the running path of a multithreaded program; the recorded path is then used in later steps such as selecting defect implantation positions.

With the spread of multi-core processors, multithreaded programs have come into wide use. However, when writing a multithreaded program, the programmer must consider how the threads may interleave and synchronize them with primitives such as mutex locks to ensure that the program runs correctly; otherwise, concurrent program defects (concurrency defects for short), such as data races or deadlocks, may result.

To mitigate this threat, various concurrency defect detection methods have been proposed in recent years. However, such tools are currently evaluated by manually selecting a small number of programs for testing and then manually verifying the detected defects one by one. This approach is labor-intensive, and because of the inherent nondeterminism of multithreaded programs the detected defects often cannot be reproduced. It is then impossible to confirm whether the reported defects really exist, and difficult to compare the results of different methods fairly.

Disclosure of Invention

The invention aims to provide an automatic construction method for a concurrent program defect data set. The method can construct data sets that meet the requirements for evaluating the detection of various concurrency defects, and thereby addresses the difficulty of evaluating concurrency defect detection tools.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

an automated construction method of concurrent program defect data sets, comprising the steps of:

1) Selecting a target program and setting a concurrency defect code pattern;

2) Dynamically running the target program, randomly scheduling its threads, and recording the running paths of the multiple threads in the program, where a running path consists of the code positions executed by the program and the order in which the threads ran;

3) Automatically analyzing the recorded multithreaded running path and finding competing positions and prologue positions in the target program, where a competing position is the position of a statement that multiple threads can execute concurrently, and a prologue position is a position before the competing positions at which prologue code is placed;

4) Randomly generating defect code using the concurrency defect code pattern as a template, simulating execution of the defect code to verify that the defect is valid, and, once verification passes, inserting the defect code at source code positions recorded in the multithreaded running path to produce program source code containing the concurrency defect; then constructing the concurrent program defect data set from the program source code containing the concurrency defect, the input data that triggers the defect, the thread order, and the defect position information.

Further, the concurrency defect code pattern in step 1) is described in a program state model language, which formally describes the program states and operations in concurrent code. The program state variable State represents the set of shared variables, and assignment operations change the program state. An assume statement expresses the programmer's assumption about the program state, that is, a condition that must hold; if the condition does not hold, the program triggers a concurrency defect.

Further, the concurrency defect code patterns in step 1) include atomicity-violation patterns and order-violation patterns.

Further, in step 1), the atomicity-violation concurrency defect code patterns are classified along two dimensions: the assumption about program state and the way the critical section is used;

wherein the assumptions about program states include the following three:

read-write-hypothesis: after thread 1 reads the program state, it assumes that the state remains unchanged, but thread 2 can modify the state between the read and the assumption, invalidating the assumption;

write-hypothesis: after thread 1 writes the program state, it assumes that the state remains unchanged, but thread 2 can modify the state between the write and the assumption, invalidating the assumption;

write-hypothesis-write: thread 1 performs two consecutive write operations on the program state, and thread 2 assumes that either neither write has started or both have completed; if thread 2 executes between the two writes, it observes an erroneous program state;

the ways of using the critical section include the following four:

no protection: the program has no critical section protection at all, and threads can interleave arbitrarily;

partial protection: only thread 1 is protected by a critical section and thread 2 is not, so thread 2 can still execute between the operations of thread 1;

critical section too short: the thread's critical section does not cover all of the operations that need protection;

separation of critical sections: different operations of the same thread are placed in different critical sections, and another thread can execute between the two critical sections;

the three kinds of program state assumption and the four ways of using the critical section are combined pairwise to obtain the different types of atomicity-violation concurrency defect code patterns.
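The pairwise combination described above can be illustrated with a minimal C++ sketch. The function name `enumerate_atomicity_patterns` and the " / " label format are assumptions made for illustration only; the patent does not prescribe how the combined patterns are represented.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Dimension 1: how the program state is wrongly assumed.
const std::array<std::string, 3> kAssumptions = {
    "read-write-hypothesis", "write-hypothesis", "write-hypothesis-write"};

// Dimension 2: how the critical section is misused.
const std::array<std::string, 4> kCriticalSectionUses = {
    "no protection", "partial protection", "critical section too short",
    "separation of critical sections"};

// Pairs every assumption with every critical-section misuse
// (hypothetical helper, for illustration only).
std::vector<std::string> enumerate_atomicity_patterns() {
    std::vector<std::string> patterns;
    for (const auto& assumption : kAssumptions)
        for (const auto& use : kCriticalSectionUses)
            patterns.push_back(assumption + " / " + use);
    assert(patterns.size() == kAssumptions.size() * kCriticalSectionUses.size());
    return patterns;
}
```

Under these assumptions the two orthogonal dimensions yield 3 × 4 = 12 candidate atomicity-violation patterns.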

Further, the order-violation concurrency defect code patterns in step 1) include the following three types:

out-of-order execution: a missing memory barrier between program instructions allows the actual execution order to be rearranged;

timed waiting: an operation is delayed with a sleep statement on the assumption that the waiting time is long enough, but that assumption does not hold;

unprotected or conditional error: an assumption about the program state is made without any waiting or condition check, or the condition that is checked is wrong.

Further, in step 1), the target program is selected according to the following criteria:

multithreaded: the target program should contain multiple threads that can execute concurrently;

has data input: the target program needs to read input data;

short running time: the program should finish executing within a short time and must not be a daemon or an infinite loop.

Further, in step 2), when the target program is run dynamically, its input data is a malformed input generated from a seed input by randomly mutating the seed's bytes.

Further, the step of recording the running paths of the multiple threads in the program in the step 2) includes:

randomly selecting an unblocked thread;

single-stepping the selected thread by one statement and recording the thread number and the code position after execution;

repeating the above steps until the program exits or a user-specified time limit is exceeded.

Further, in step 3), the competing positions must satisfy a specific dominance relationship that depends on the concurrency defect code pattern, including:

the corresponding concurrent defect code mode is unprotected or conditional error;

forward dominance relationship: the corresponding concurrency defect code pattern is read-write-hypothesis, write-hypothesis, or timed waiting;

backward dominance relationship: the corresponding concurrency defect code pattern is write-hypothesis-write or out-of-order execution.

Further, the defect code generated in step 4) includes the following three types of statements:

variable assignment statement: changes the program state by assigning a value to a shared variable;

condition check statement: checks whether the current program state is the expected state;

program state assumption statement: triggers the concurrency defect if its condition is not satisfied.

Further, the concurrency defect code generated in step 4) consists of a prologue, a precondition, and a defect core. The prologue initializes the program state required by the concurrency defect. The precondition checks whether the current program state is the expected state and, if so, lets the program enter the defect core. The defect core is the defect code generated from a concurrency defect code pattern; it contains an erroneous assumption about the program state and triggers the concurrency defect when run under a particular thread order.

Further, the defect code in step 4) is executed in simulation as follows: the defect code is inserted at source code positions along the multithreaded running path and applied to the current program state, the expected program state is computed for each running path record, and the expected program state values are filled into the condition check statements; the condition check statements are then executed in simulation, and if they execute successfully, the defect is verified as valid.
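As a rough illustration of this simulated execution, the following C++ sketch walks a recorded path, applies inserted assignments to a simulated state, and fills each condition check with the value expected at that point. The types `PathRecord` and `Insertion`, the function `simulate_and_fill`, and the choice to identify positions by line number only are all assumptions made for this sketch; the patent does not specify these data structures.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct PathRecord { int tid; int line; };      // one executed statement <tid, loc>
using State = std::map<std::string, int>;      // simulated shared variables, by name

// Hypothetical description of one generated statement and where it is inserted.
struct Insertion {
    int line;                                  // source line the statement is attached to
    enum Kind { Assign, Check } kind;
    std::string var;                           // shared variable the statement touches
    int value;                                 // Assign: value written; Check: filled in below
};

// Walks the recorded path in order, applies inserted assignments to the
// simulated state, and fills each condition check with the expected value.
// Returns false if a check is reached before its variable was ever assigned,
// in which case the generated defect is rejected as invalid.
bool simulate_and_fill(const std::vector<PathRecord>& path,
                       std::vector<Insertion>& insertions) {
    State state;
    for (const PathRecord& rec : path) {
        assert(rec.line > 0);
        for (Insertion& ins : insertions) {
            if (ins.line != rec.line) continue;
            if (ins.kind == Insertion::Assign) {
                state[ins.var] = ins.value;            // act on the current program state
            } else {
                auto it = state.find(ins.var);
                if (it == state.end()) return false;   // the check could never succeed
                ins.value = it->second;                // expected program state value
            }
        }
    }
    return true;
}
```

In this simplified model a check filled in this way succeeds by construction when the program is replayed along the same path, which is the sense in which the defect is treated as validated here.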

The method of the invention has the following technical effects:

1) Automated construction: the method constructs the concurrent program defect data set automatically, reducing the workload and time cost of creating a data set by hand.

2) Diversity: by dynamically running the target program and randomly scheduling its threads, the method records the running paths of multiple threads and can generate a diverse concurrency defect data set. Each running path represents one possible concurrent execution scenario, covering different race conditions and thread orders, which increases the diversity of the data set.

3) Verification of defect validity: after generating the defect code, the method verifies that the defect is valid by simulated execution. By inserting the defect code at the source code positions recorded in the path and computing the expected program state, it can verify whether the defect causes the program to behave incorrectly. This verification helps developers better understand the nature and impact of the concurrency defect.

4) Systematic classification and description: the method systematically classifies and describes concurrency defect code patterns, covering both atomicity violations and order violations. By considering different program state assumptions and different ways of using critical sections, it covers the various cases of concurrency defects more comprehensively, making the generated data set richer and more fine-grained.

5) Wide applicability: although the method places certain requirements on the choice of target program, it applies to any program that is multithreaded, reads input data, and has a short running time, so it can be used to construct a variety of concurrency defect data sets for testing and research under different scenarios and needs.

The concurrent program defect data set constructed by the method can meet the following requirements:

1) Representativeness: the constructed data set covers the concurrency defects commonly found in practice, including a larger range of defect types, defect causes, and concurrency defect code patterns, which makes it convenient to evaluate comprehensively how well a detection tool performs on real software. The generated concurrency defects are located in real applications rather than in purpose-built small programs, which ensures that the thread context of each defect is representative.

2) Triggerability: there exists a program input and a thread order under which the program reaches the position of the concurrency defect and satisfies the constraints of the defect itself.

3) Easy observability: once a concurrency defect in the constructed data set is triggered, the program exhibits behavior that is easy to observe from outside (such as a crash), so that more kinds of detection tools can be evaluated.

Drawings

FIG. 1 shows the atomicity-violation concurrency defect code patterns used by the invention.

FIG. 2 shows the order-violation concurrency defect code patterns used by the invention.

FIG. 3 shows the flow chart of concurrency defect implantation in the invention.

FIG. 4 shows an example of a concurrency defect implanted by the invention.

Detailed Description

In order to make the technical features, advantages, and technical effects of the invention clearer and easier to understand, a detailed description is given below with reference to the accompanying drawings.

The automatic construction method provided by the invention is based on implanting concurrent program defects along a running path. It can implant atomicity-violation and order-violation concurrency defects, caused by erroneous assumptions about program state, into the source code of a given C/C++ multithreaded program, and the implanted defects are representative, triggerable, and easily observable.

The inventors conducted an empirical study of real-world concurrency defects and summarized several representative concurrency defect code patterns, which serve as templates for automatically generating concurrency defects. These patterns can be described in a program state model language; see FIG. 1 and FIG. 2 for details.

An implanted concurrency defect consists of three parts of code: a prologue, a precondition, and a defect core. The prologue initializes the program state required by the concurrency defect. The precondition checks whether the current program state is the expected state; if so, the program enters the defect core. The defect core is the defect code generated from one of the summarized concurrency defect code patterns; it contains an erroneous assumption about the program state and can trigger the concurrency defect when run under a particular thread order.
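The three parts can be pictured with a small C++ sketch of a read-write-hypothesis defect. Everything here is illustrative: the names `g_state`, `assume`, `thread1_before`, `thread1_after`, `thread2_write`, and `replay` are assumptions, the values 3 and 4 are arbitrary, and the thread order is replayed sequentially in one OS thread so that the outcome is deterministic. A real implant would place these fragments at the selected positions of the target program.

```cpp
#include <cassert>
#include <stdexcept>

struct State { int var_0 = 0; };   // implanted shared variable (hypothetical name)
static State g_state;

// Stand-in for an easily observed failure such as a crash.
void assume(bool cond) {
    if (!cond) throw std::runtime_error("concurrency defect triggered");
}

// Prologue: initialize the program state the defect needs.
void prologue() { g_state.var_0 = 3; }

// Code implanted at thread 1's competing position, split at the point where
// thread 2 may be scheduled. Returns false if the precondition does not hold.
bool thread1_before(int& local) {
    if (g_state.var_0 != 3) return false;   // precondition: is this the expected state?
    local = g_state.var_0;                  // defect core: read
    return true;
}
void thread1_after(int local) { assume(g_state.var_0 == local); }  // defect core: assumption

// Code implanted at thread 2's competing position.
void thread2_write() { g_state.var_0 = 4; }  // defect core: write

// Replays one thread order; returns true if the defect was triggered.
bool replay(bool thread2_in_between) {
    prologue();
    int local = 0;
    const bool entered = thread1_before(local);
    assert(entered);                        // the prologue establishes the precondition
    (void)entered;
    if (thread2_in_between) thread2_write();
    try {
        thread1_after(local);
    } catch (const std::runtime_error&) {
        return true;
    }
    if (!thread2_in_between) thread2_write();
    return false;
}
```

In this sketch `replay(true)` schedules thread 2's write between thread 1's read and assumption and triggers the defect, while `replay(false)` runs thread 2 afterwards and does not.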

To generate and implant concurrency defects, the method of the invention takes the following steps (FIG. 3 is the corresponding flow chart):

1) Target program selection step: a suitable target program is selected manually, and the concurrency defect configuration information, including the concurrency defect code pattern, is set. The configuration information may also cover defect distribution and difficulty. Defect distribution refers to how the newly implanted defect code is distributed over the original source code positions of the program, for example concentrated in a particular file or function. Difficulty refers to how hard the implanted defect is for a defect detection tool to find; it can be set by adjusting the number of thread interleavings required to trigger the defect: in the position selection step, the more threads the selected program positions involve, the harder the generated defect is to trigger.

2) Running path recording step: a script controls the debugger to run the target program dynamically, schedule threads randomly, and record the running paths of the multiple threads in the program, that is, the code positions executed by the program and the order in which the threads ran.

3) Position selection step: the running path is analyzed automatically, and a series of source code positions is selected for later code insertion, including prologue positions, where prologue code is placed, and competing positions, which multiple threads can execute concurrently.

4) Defect code generation step: program state operation statements conforming to the concurrency defect code pattern are generated randomly; after the validity of the defect has been verified by simulated execution, the code is added at the previously selected positions to produce program source code containing the concurrency defect. The concurrent program defect data set is then constructed from the program source code containing the concurrency defect, the input data that triggers the defect, the thread order, and the defect position information.

To make concurrent program defects easier to understand and to generate automatically, the inventors studied real-world concurrent program defects and summarized their common types, causes, and code patterns. They found that most concurrency defects are caused by "erroneous assumptions about program state" and belong to the "atomicity violation" and "order violation" types, which together account for about 69% of all concurrency defects. An atomicity violation means that operations on shared variables in concurrent code are not properly protected, producing unexpected results. An order violation means that operations in concurrent code execute in an order different from the expected one, causing an error.

The following uses a program state model language to formally describe the common concurrency defect code patterns of the "atomicity violation" and "order violation" types caused by "erroneous assumptions about program state". The program state model language is pseudocode whose semantics are essentially those of C/C++, except that a few special symbols need additional explanation. The invention uses a "program state" variable State to represent the set of all shared variables in the program, each of which can be regarded as a member of State (e.g., State.var_0). An assignment to the program state, such as State = S_0, changes the program state to a new state S_0. An assignment to a member of the program state, such as State.var_0 = x, assigns x only to the shared variable var_0 and leaves the values of the other shared variables unchanged. The assume statement expresses the programmer's assumption about the program state, as in assume(cond(State.var_0, State.var_1)), where cond(…) is a Boolean expression giving a condition that the program state must satisfy; if the condition is not satisfied, the program triggers a concurrency defect. In the descriptions of concurrency defect code patterns below, thread 1 is written on the left of each code segment, thread 2 on the right, and the initial value of the program state at the top; code segments with a defect and without a defect are marked with the bug icon without a cross and the bug icon with a cross, respectively.

The atomicity-violation concurrency defect code patterns can be described along two dimensions: the assumption about program state and the way the critical section is used. The two dimensions are orthogonal and can be combined pairwise to yield a rich set of concurrency defect code patterns.

The program state assumption dimension describes how a thread in concurrent code assumes the program state, that is, what the thread expects the program state to be while executing concurrently. Different kinds of assumption lead to different atomicity violations. As shown in FIG. 1 (a), three kinds of erroneous program state assumption can cause an atomicity violation:

i. Read-write-hypothesis: thread 1 first reads the program state and saves it in the local variable state_1. Because thread 1 does not modify the program state afterwards, it assumes that the current program state is still state_1. However, thread 2 may overwrite the program state between the read and the assumption, invalidating the assumption.

ii. Write-hypothesis: thread 1 first modifies the program state to state_1. Because thread 1 does not modify the program state afterwards, it assumes that the current program state is still state_1. However, thread 2 may overwrite the program state between the write and the assumption, invalidating the assumption.

iii. Write-hypothesis-write: thread 1 performs two consecutive write operations on the program state, and thread 2 assumes that either neither write has started or both have completed. If thread 2 executes between the two writes, it observes an erroneous program state.
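The dependence on thread order can be made concrete with a small C++ sketch of the write-hypothesis-write pattern. The fields `lo` and `hi`, the invariant `lo == hi`, and the function `run_write_hypothesis_write` are hypothetical names chosen for illustration; the schedule is replayed sequentially rather than with real threads so that each order gives a deterministic result.

```cpp
#include <cassert>
#include <vector>

// Two shared variables that the programmer intends to keep equal.
struct State { int lo = 0; int hi = 0; };

// Replays thread 1's two writes and thread 2's assumption in the given order.
// `schedule` lists thread numbers, e.g. {1, 2, 1}. Returns true if thread 2's
// assumption (lo == hi) held when it was evaluated.
bool run_write_hypothesis_write(const std::vector<int>& schedule) {
    State state;
    int t1_step = 0;
    bool assumption_held = true;
    for (int tid : schedule) {
        assert(tid == 1 || tid == 2);
        if (tid == 1) {
            if (t1_step == 0) state.lo = 1;        // thread 1: first write
            else if (t1_step == 1) state.hi = 1;   // thread 1: second write
            ++t1_step;
        } else {
            // Thread 2: assume(State.lo == State.hi)
            assumption_held = (state.lo == state.hi);
        }
    }
    return assumption_held;
}
```

With the schedules {2, 1, 1} and {1, 1, 2} the assumption holds; only {1, 2, 1}, where thread 2 runs between the two writes, violates it.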

As for the way the critical section is used: multithreaded programs often use mutex locks to guarantee the atomicity of operations. A program enters a critical section with a lock operation and leaves it with an unlock operation. Once a thread has entered a critical section, other threads can enter a critical section identified by the same lock only after that thread has left. Suppose thread 1 needs to perform the two operations op_x and op_y atomically and does not want them interrupted by thread 2's operation op_z. FIG. 1 (b) shows valid and invalid ways of using critical sections:

i. Effective protection: this is the correct way to use critical sections to guarantee atomicity. Thread 1's op_x and op_y are placed in the same critical section, and thread 2's op_z is placed in another critical section identified by the same lock, so op_z cannot execute between op_x and op_y. Effective protection is the correct form and is not one of the concurrency defect code pattern types.

ii. No protection: the program has no critical section protection at all, and threads can interleave arbitrarily. If thread 2's op_z happens to execute between thread 1's op_x and op_y, atomicity is broken.

iii. Partial protection: only thread 1 is protected by a critical section. Thread 2 does not check the state of the lock before running op_z, so op_z can still execute between op_x and op_y. The case where only thread 2 has critical section protection and thread 1 does not also falls into this category.

iv. Critical section too short: both threads use critical sections, but thread 1's critical section is not large enough, and operation op_y lies outside it, resulting in an atomicity violation. The case where op_y is inside the critical section but op_x is outside also falls into this category.

v. Separation of critical sections: thread 1's op_x and op_y are placed in two separate critical sections, which allows thread 2 to execute after the critical section of op_x ends and before the critical section of op_y begins.
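The five cases can be compared with a deterministic C++ model. This is only a sketch under stated assumptions: `SimLock` is a single-threaded stand-in for a mutex (not a real synchronization primitive), `Usage` and `atomicity_broken` are hypothetical names, and the model replays the single order op_x, op_z, op_y to ask whether op_z is able to run in between.

```cpp
#include <cassert>

enum class Usage { Effective, NoProtection, Partial, TooShort, Separated };

// Single-threaded stand-in for a mutex, so the interleaving can be replayed
// deterministically (hypothetical helper, not a real lock).
struct SimLock {
    int owner = 0;  // 0 means the lock is free
    bool try_acquire(int tid) {
        if (owner != 0) return false;
        owner = tid;
        return true;
    }
    void release() { owner = 0; }
};

// Replays the order op_x, op_z, op_y and reports whether op_z managed to run
// between op_x and op_y, i.e. whether atomicity was broken.
bool atomicity_broken(Usage usage) {
    SimLock lock;
    int shared = 0;
    const bool t1_locks = usage != Usage::NoProtection;
    const bool t2_locks = usage != Usage::NoProtection && usage != Usage::Partial;

    // Thread 1: enter its (first) critical section and run op_x.
    if (t1_locks) {
        const bool acquired = lock.try_acquire(1);
        assert(acquired);
        (void)acquired;
    }
    shared = 1;  // op_x
    // Too short / separated: thread 1 releases the lock before op_y.
    if (usage == Usage::TooShort || usage == Usage::Separated) lock.release();

    // Thread 2 is scheduled here and attempts op_z.
    if (!t2_locks) {
        shared = 2;  // op_z without taking the lock
    } else if (lock.try_acquire(2)) {
        shared = 2;  // op_z inside thread 2's critical section
        lock.release();
    }  // otherwise thread 2 would block until thread 1 unlocks

    // Thread 1: op_y expects to still see the value written by op_x.
    // (In the separated case thread 1 would re-lock first; that does not
    // change what op_y observes.)
    return shared != 1;
}
```

In this model only `Usage::Effective` keeps op_z out of the gap; the other four usages let it through, matching the four defective patterns listed above.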

As shown in FIG. 2, for order violations there are 2 correct ways of constraining thread order and 3 common order violations; the 3 common order violations are types in the concurrency defect code patterns, as follows:

i. Effective waiting: this is the correct way to guarantee thread order by waiting. Thread 1 uses the condition variable cv to wait until the State.complete condition is met; thread 2 first assigns State.vars, then sets State.complete and wakes thread 1 through the condition variable. This guarantees that thread 1's program state assumption is made after thread 2 has assigned State.

ii. Non-blocking check: this is the correct way to guarantee thread order without blocking. In real-world applications, condition checks are often used instead of waiting for reasons of efficiency. When the State.complete condition is satisfied, thread 2 has finished executing; when it is not satisfied, thread 1 can perform other tasks first.

iii. Out-of-order execution: the actual execution order of a program's instructions may be adjusted by the compiler or processor to achieve higher efficiency. If the memory barrier on line 4 of (a2) in FIG. 2 is missing, the actual execution order may be adjusted as shown in (a3) in FIG. 2, so that thread 2's assignment to State.complete is moved before its assignment to State.vars, causing thread 1's program state assumption to fail.

iv. Timed waiting: the programmer erroneously uses sleep instead of a wait primitive to constrain the thread order. However, when the computer is busy, thread 2 may not be scheduled by the operating system for a long time, so that thread 2's assignment to State.vars has not even begun when thread 1's sleep ends.

v. Unprotected or conditional error: thread 1 makes an assumption about the program state without any waiting or condition check (or checks a wrong condition, which is equivalent to not checking). Although such an order violation is structurally simple, in a real program thread 1's erroneous assumption may sit in a rarely used function or after a time-consuming operation, and is therefore not easily triggered.
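For comparison with the defective patterns, the correct "effective waiting" form can be sketched in C++ with `std::condition_variable`. The function name `run_effective_waiting`, the local `State` struct, and the value 42 are assumptions for illustration; the patent's figures use its own pseudocode rather than this code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

struct State { int vars = 0; bool complete = false; };

// Runs thread 1 (waiter) and thread 2 (writer) once and returns the value of
// State.vars that thread 1 observed after waiting.
int run_effective_waiting() {
    State state;
    std::mutex m;
    std::condition_variable cv;
    int observed = -1;

    std::thread t1([&] {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return state.complete; });  // wait for State.complete
        observed = state.vars;  // the assumption is made only after the wait
    });
    std::thread t2([&] {
        {
            std::lock_guard<std::mutex> lk(m);
            state.vars = 42;        // assign State.vars first
            state.complete = true;  // then publish completion
        }
        cv.notify_one();            // wake thread 1
    });
    t1.join();
    t2.join();
    assert(observed != -1);
    return observed;
}
```

Because thread 1 waits on the predicate under the same mutex that protects both fields, it observes the assigned value regardless of which thread the scheduler runs first. Replacing the wait with a fixed sleep would give the timed-waiting defect described in item iv.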

The specific method of generating and implanting concurrent defects is described next.

As a preferred embodiment, because not every program is suitable for concurrency defect implantation, the target program selection step chooses programs according to the following criteria:

(1) Multithreaded: the program should contain multiple threads that can execute concurrently;

(2) Has data input: the program needs to read input data (from a file or standard input). Although a concurrency defect does not necessarily depend on the input data, many concurrency defects appear only under particular malformed inputs, and some concurrency defect detection tools (such as fuzzing tools) explore program paths by varying the input. Programs that require input are therefore selected so that these tools can also be tested with the data set of the invention;

(3) Short running time: the program should finish within a short time and must not be a daemon or an infinite loop, because many concurrency defect detection tools need to run the program under test repeatedly with different inputs and thread orders and cannot be used on long-running programs.

As a preferred embodiment, the running path recording step uses a debugger to monitor the target program dynamically. The debugger can lock the operating system's scheduler so that only one thread runs at any time. The multithreaded running path is obtained as follows:

(1) Randomly select an unblocked thread;

(2) Single-step the selected thread by one statement, and record the thread number and the code position after execution;

(3) Repeat the above steps until the program exits or a user-specified time limit is exceeded.

This approach actively schedules threads at random, so every possible thread interleaving has a chance of being scheduled. Moreover, the recorded path is one that is actually feasible: if the program runs exactly in the recorded thread order, the inserted code is guaranteed to execute in the intended order.
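The recording loop above can be sketched as a small model. This is not a debugger integration: each thread is reduced to a fixed list of code locations, "unblocked" is simplified to "has not finished", and the names record_running_path, toy_threads and max_steps are assumptions made for illustration only.

```python
import random


def record_running_path(threads, seed=0, max_steps=1000):
    """Model of the recording loop.

    threads maps a thread number to the list of code locations (line
    numbers) that the thread executes, one entry per statement.
    """
    rng = random.Random(seed)
    next_stmt = {tid: 0 for tid in threads}  # index of each thread's next statement
    path = []
    while len(path) < max_steps:  # stands in for the manually set time limit
        # (1) randomly select a thread that still has statements to run
        runnable = [tid for tid in threads if next_stmt[tid] < len(threads[tid])]
        if not runnable:
            break  # every thread has finished: the program exits
        tid = rng.choice(runnable)
        # (2) single-step one statement and record the pair <tid, loc>
        loc = threads[tid][next_stmt[tid]]
        next_stmt[tid] += 1
        path.append((tid, loc))
    return path


# Hypothetical program: thread 1 runs lines 10-12, thread 2 runs lines 20-21.
toy_threads = {1: [10, 11, 12], 2: [20, 21]}
path = record_running_path(toy_threads, seed=42)
print(path)
```

Whatever interleaving the random choice produces, the order of statements within each thread is preserved, which is the property that makes the recorded path replayable.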

During path recording, the input data used by the target program is a "malformed input" produced by mutating a "seed input". The seed input can be test data shipped with the target program, or it can be constructed manually. The invention randomly changes bytes of the seed input, and the generated malformed input together with the thread order in the running path record forms the expected answer. The running path record may be represented by an ordered list in which each element represents one executed statement, for example a pair <tid, loc>, where tid is the thread number and loc is the code location of the statement (file name and line number).
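A minimal sketch of the byte mutation and of the resulting "expected answer" is shown below. The source only states that bytes of the seed input are changed at random; the function name mutate_seed, the number of mutated bytes, and the dictionary layout of expected_answer are assumptions for illustration.

```python
import random


def mutate_seed(seed_input, n_mutations, rng):
    """Return a copy of seed_input with up to n_mutations bytes overwritten."""
    data = bytearray(seed_input)
    if not data:
        return bytes(data)
    for _ in range(n_mutations):
        pos = rng.randrange(len(data))   # random byte position
        data[pos] = rng.randrange(256)   # random new byte value
    return bytes(data)


rng = random.Random(7)
seed_input = b"seed input shipped with the target"
malformed_input = mutate_seed(seed_input, 3, rng)

# The running path record is an ordered list of <tid, loc> pairs.
running_path = [(1, 10), (2, 20), (1, 11)]

# Malformed input plus thread order together form the expected answer.
expected_answer = {
    "input": malformed_input,
    "thread_order": [tid for tid, _ in running_path],
}
```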

Fig. 4 shows an example of a concurrent defect implanted by the present invention, with the added code marked by boxes: lines 2 and 5 are prologues, lines 8, 11 and 15 are preconditions, and lines 9, 12, 16 and 17 are defect cores. Table 2 illustrates one running path of the example code of Fig. 4. Since there is only one file, the file name is omitted from loc in the table. The program state in the table is represented by the list of values of the variables in State, assuming that the value of input[5] in the input file is 3. The grey shading in the table indicates the code locations at which the concurrent defect is implanted in the example code of Fig. 4.

TABLE 1

As a preferred embodiment, in the position selection step, an appropriate code position is selected. The code location of a concurrent defect should be reachable under many different thread orders, with the defect triggering only under some particular orders. The present invention therefore adds the defect core code to the source program at a pair of code locations, called competing locations, whose statements can be executed concurrently by two threads. When two threads arrive at the competing locations at the same time, which location's code executes first depends on which thread is scheduled first.

According to the invention, the competing locations in the program are found by automatically analyzing the multithreaded running path record. For each entry <tid_1, loc_1> of the running path, every thread tid_2 different from tid_1 is enumerated and the last location loc_2 that this thread executed before the current entry is looked up; loc_1 and loc_2 can then run concurrently and form a pair of competing locations. Table 2 shows the new competing locations that can be found from each record.
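The enumeration described above can be sketched as a single pass over the running path record. The function name find_competing_locations and the toy path are assumptions; the sketch reports each pair once, at the record entry where it first appears.

```python
def find_competing_locations(path):
    """Return pairs ((tid_1, loc_1), (tid_2, loc_2)) of competing locations."""
    last_loc = {}   # thread number -> last location that thread executed
    seen = set()
    pairs = []
    for tid_1, loc_1 in path:
        # Pair the current entry with the last position of every other thread.
        for tid_2, loc_2 in last_loc.items():
            if tid_2 == tid_1:
                continue
            pair = ((tid_1, loc_1), (tid_2, loc_2))
            key = frozenset(pair)
            if key not in seen:      # report each pair only once
                seen.add(key)
                pairs.append(pair)
        last_loc[tid_1] = loc_1
    return pairs


toy_path = [(1, 10), (2, 20), (1, 11), (2, 21)]
competing = find_competing_locations(toy_path)
for pair in competing:
    print(pair)
```

In this toy path the first entry yields no pair because no other thread has executed yet; each later entry yields one new pair.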

Some concurrent defects occupy more than one pair of competing locations. If one thread needs to perform two operations op_x and op_y and another thread needs to perform op_z, then op_x and op_z must be able to run concurrently, and op_y and op_z must be able to run concurrently, but op_x and op_y are not required to be adjacent, because in a real concurrent defect there may be unrelated operations between op_x and op_y. To make the implanted concurrent defect more realistic, the invention can, starting from the current pair of competing locations, additionally select another pair that shares a record entry with it, and place op_z at the code location of the shared record entry.

If concurrent defects are added directly at competing locations, unintended defect trigger patterns may result. Since only one running path is analyzed, the fact that op_x executes before op_y on the current path does not mean that the same holds on other running paths, so an implanted defect might be triggered without any interleaved multithreaded execution. To avoid this problem, the code locations are analyzed further, and, according to the requirements that the different concurrent defect code patterns place on the execution order of operations, only code locations satisfying a specific dominance relationship are selected, as follows:

i. No dominance relationship. The inserted code does not need to satisfy any dominance relationship. Concurrent defect code patterns applicable to this item include: unprotected or conditional error.

ii. Forward dominance. Every path from the entry to the second operation op_y must pass through the first operation op_x. Concurrent defect code patterns applicable to this item include: read-write-hypothesis, write-hypothesis, waiting in time.

iii. Backward dominance. Every path from the first operation op_x to the exit must pass through the second operation op_y. Concurrent defect code patterns applicable to this item include: write-hypothesis-write, out-of-order execution.
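As defined above, forward dominance matches the standard dominator relation on a control flow graph, and backward dominance matches the post-dominator relation. The following sketch checks both with a simple iterative dominator computation. The source does not say which algorithm the invention uses, so the algorithm, the function names and the toy graph are assumptions.

```python
def all_nodes(cfg):
    return set(cfg) | {s for succs in cfg.values() for s in succs}


def dominators(cfg, entry):
    """Map each node to the set of nodes that lie on every path from entry to it."""
    nodes = all_nodes(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(cfg):
    rev = {n: [] for n in all_nodes(cfg)}
    for n, succs in cfg.items():
        for s in succs:
            rev[s].append(n)
    return rev


def forward_dominates(cfg, entry, op_x, op_y):
    # Every path from the entry to op_y passes through op_x.
    return op_x in dominators(cfg, entry)[op_y]


def backward_dominates(cfg, exit_node, op_x, op_y):
    # Every path from op_x to the exit passes through op_y.
    return op_y in dominators(reverse(cfg), exit_node)[op_x]


# Hypothetical diamond-shaped control flow graph.
cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"],
       "d": ["exit"], "exit": []}
print(forward_dominates(cfg, "entry", "a", "d"))   # a precedes d on every path
print(forward_dominates(cfg, "entry", "b", "d"))   # d is reachable via c, skipping b
print(backward_dominates(cfg, "exit", "a", "d"))   # every path from a reaches d
```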

Multiple prologue locations are also randomly selected before the competing locations, and a piece of prologue code is placed at each. In addition to initializing the program state required by the concurrent defect, the prologue can also be used to adjust the difficulty of finding the defect. Because each piece of prologue code may be located in a different thread, the more prologue locations are selected, the harder it is to execute these pieces of prologue code in the correct order.

As a preferred embodiment, in the defect code generation step, the defect code is randomly generated using the concurrent defect code pattern as a template, and the defect code is woven into the control flow of the target program itself. The generated defect code mainly comprises the following three kinds of statements:

i. Variable assignment statement. It changes the program state by assigning a value to a shared variable.

ii. Conditional check statement. It checks whether the current program state is the expected state.

iii. Program state hypothesis statement. It is a stricter program state check: if the assumed condition is not met, the concurrent defect is triggered.

In the generated defect code, the prologue consists of variable assignment statements; the precondition is the conditional check statement placed before each defect core code segment; and the defect core is obtained by taking a designated concurrent defect code pattern as a template and replacing the variables in it with specific variables, so it may involve all three kinds of statements.

For variable assignment statements, the assigned value is computed by a randomly generated expression. The operators include addition, subtraction, exclusive or, and so on, and the operands are drawn from immediate values, input data, and the values of program state variables. If the expression uses input data, the implanted concurrent defect can only be triggered under specific inputs. A variable may be assigned by several statements in different threads, so the execution order of the threads can affect its value. The invention does not modify the original variables of the program; instead it creates some shared variables to serve as the program state. It can nevertheless make indirect use of the target program's own state: since the defect code is added along a running path, the implanted defect naturally inherits the program state constraints imposed by that path. For example, in Fig. 4 part of the defect code is located under an original if statement of the program, whose condition is satisfied only when the XOR sum of the arr array has a specific value, which indirectly ties the defect to the original control flow and data flow of the program.
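The random generation of assignment statements can be sketched as follows. The tuple representation, the function names, the 32-bit wraparound (MASK) and the C-like rendering with State.<name> and input[i] are all assumptions for illustration; the source only specifies the operators and the three kinds of operands.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "^": operator.xor}
MASK = 0xFFFFFFFF  # assumption: state variables behave like 32-bit unsigned integers


def random_operand(rng, state_vars, input_len):
    kind = rng.choice(["imm", "input", "state"])
    if kind == "imm":
        return ("imm", rng.randrange(256))         # immediate value
    if kind == "input":
        return ("input", rng.randrange(input_len))  # index into the input data
    return ("state", rng.choice(state_vars))        # name of a state variable


def random_assignment(rng, state_vars, input_len):
    """Return (target, operator, left operand, right operand)."""
    return (rng.choice(state_vars), rng.choice(sorted(OPS)),
            random_operand(rng, state_vars, input_len),
            random_operand(rng, state_vars, input_len))


def operand_value(operand, state, data):
    kind, value = operand
    if kind == "imm":
        return value
    if kind == "input":
        return data[value]
    return state[value]


def apply_assignment(stmt, state, data):
    target, op, lhs, rhs = stmt
    result = OPS[op](operand_value(lhs, state, data),
                     operand_value(rhs, state, data))
    state[target] = result & MASK


def to_c(stmt):
    """Render the statement as C-like text (hypothetical naming)."""
    def text(operand):
        kind, value = operand
        if kind == "imm":
            return str(value)
        if kind == "input":
            return f"input[{value}]"
        return f"State.{value}"
    target, op, lhs, rhs = stmt
    return f"State.{target} = {text(lhs)} {op} {text(rhs)};"


# A fixed statement that depends on the input, evaluated with input[5] == 3.
stmt = ("a", "^", ("state", "b"), ("input", 5))
state = {"a": 0, "b": 6}
data = bytes([0, 0, 0, 0, 0, 3])
apply_assignment(stmt, state, data)
print(to_c(stmt), "->", state["a"])

generated = random_assignment(random.Random(1), ["a", "b"], len(data))
```

Because the fixed statement reads input[5], the value it stores changes with the input, which is how an implanted defect becomes input dependent.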

For a conditional check statement, it is not necessary to check every program state variable; it suffices to check whether one variable has the expected value, because the values of the variables can be combined into one variable by variable assignment statements before the check.

For program state hypothesis statements, there are two implementations. The first, simpler one directly calls a bug_trigger function when the assumption does not hold, passing the defect number as the argument. The bug_trigger function records the number of the triggered defect and then deliberately crashes the program. The other implementation saves the erroneous program state into a variable, to be used later by other defect code.
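The two implementations can be modeled as below. In the generated C/C++ code bug_trigger would crash the process; this model replaces the crash with an exception so the behavior can be observed. Apart from bug_trigger, every name here (DefectTriggered, assume_or_crash, assume_or_taint, triggered) is an assumption for illustration.

```python
class DefectTriggered(Exception):
    """Stands in for the deliberate crash of the target program."""


triggered = []  # numbers of the defects triggered so far


def bug_trigger(bug_id):
    triggered.append(bug_id)       # record the triggered defect number
    raise DefectTriggered(bug_id)  # model of the deliberate crash


def assume_or_crash(condition, bug_id):
    # First implementation: call bug_trigger as soon as the assumption fails.
    if not condition:
        bug_trigger(bug_id)


def assume_or_taint(condition, state, var, bad_value):
    # Second implementation: keep the erroneous state in a shared variable
    # so that later defect code can act on it.
    if not condition:
        state[var] = bad_value


state = {"x": 1, "err": 0}
assume_or_crash(state["x"] == 1, 3)       # assumption holds: nothing happens
try:
    assume_or_crash(state["x"] == 2, 3)   # assumption fails: defect 3 fires
except DefectTriggered as exc:
    print("defect triggered:", exc.args[0])
assume_or_taint(state["x"] == 2, state, "err", 99)
```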

After the defect code is generated, its execution is simulated. Along the running path record, each new statement inserted at the current code location is applied to the current program state, the expected program state at each running path record entry is computed, and these expected state values are filled into the conditional check statements. Note that the simulation must not be performed at the same time as code generation, because code generated later may land on a part of the running path that has already been traversed, which would make the simulation results wrong. The present invention therefore traverses the running path twice: the first traversal determines the statements to be generated (the three kinds of statements described above), and the second traversal simulates their execution. The "program state" column in Table 2 shows the simulation results for the example code. After the simulation finishes successfully (that is, the defect passes validity verification), the generated code is inserted at the source code locations corresponding to the running path record, and the program source code with the concurrent defect is output.
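The second traversal can be sketched as a replay of the running path over the statements chosen in the first traversal. The data layout (a dictionary from <tid, loc> to a list of statements), the function name simulate, and the rule that a check revisited with a different value fails verification are assumptions for illustration.

```python
def simulate(path, inserted, initial_state):
    """Replay the running path and compute the expected value of each check.

    inserted maps (tid, loc) to a list of statements, each either
    ("assign", var, update_function) or ("check", var, None).
    """
    state = dict(initial_state)
    expected = {}  # (tid, loc, statement index) -> expected value of the check
    for tid, loc in path:
        for index, (kind, var, update) in enumerate(inserted.get((tid, loc), [])):
            if kind == "assign":
                state[var] = update(state)   # apply the statement to the state
            else:
                key = (tid, loc, index)
                if key in expected and expected[key] != state[var]:
                    # One check cannot hold two different constants.
                    raise ValueError(f"inconsistent expected value at {key}")
                expected[key] = state[var]   # value to fill into the check
    return expected, state


# Statements decided in the first traversal (hypothetical example).
path = [(1, 10), (2, 20), (1, 11), (2, 21)]
inserted = {
    (1, 10): [("assign", "x", lambda s: 1)],
    (2, 20): [("assign", "x", lambda s: s["x"] ^ 3)],
    (1, 11): [("check", "x", None)],
}
expected, final_state = simulate(path, inserted, {"x": 0})
print(expected, final_state)
```

Generating and simulating in one pass would miss the assignment at (2, 20) if it were generated after the check at (1, 11) had already been filled in, which is why the source separates the two traversals.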

Finally, the program source code with the implanted concurrent defects, the input data and thread order that trigger the defects, and the defect location information are stored to form the concurrent program defect data set.
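One possible layout for a data set entry is sketched below. The source lists what is stored but not the storage format, so the JSON encoding, the class name DefectEntry, the field names and the sample values are all assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DefectEntry:
    source_file: str         # program source code with the implanted defect
    trigger_input_hex: str   # input data that triggers the defect, hex encoded
    thread_order: list       # thread numbers in the order that triggers it
    defect_locations: list   # [file name, line number] of each inserted statement


entry = DefectEntry(
    source_file="example_with_defect.c",
    trigger_input_hex=bytes([0, 0, 0, 0, 0, 3]).hex(),
    thread_order=[1, 2, 1, 2],
    defect_locations=[["example_with_defect.c", 9], ["example_with_defect.c", 12]],
)

serialized = json.dumps(asdict(entry), indent=2)
restored = DefectEntry(**json.loads(serialized))
print(serialized)
```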

Although the present invention has been described with reference to the above embodiments, it should be understood that the invention is not limited thereto, and that modifications and equivalents may be made thereto by those skilled in the art, which modifications and equivalents are intended to be included within the scope of the present invention as defined by the appended claims.

Claims

1. An automated construction method of a concurrent program defect dataset, comprising the steps of:

1) Selecting an object program and setting a concurrent defect code mode;

2. The method of claim 1, wherein the concurrent defect code patterns in step 1) are described by a program State model language formally describing program states and operations in concurrent code, using program State variables State to represent shared variable sets, and changing program states by assignment operations; using assume to express a programmer's assumption of program state, i.e., a condition that needs to be met, if the condition is not met, the program will trigger a concurrent defect.

3. The method of claim 1, wherein the concurrent defect code patterns in step 1) include both an atomically violated concurrent defect code pattern and a sequentially violated concurrent defect code pattern;

the concurrent defect code mode of the atomic violation classifies the concurrent defect code mode from two dimensions of an assumption of a program state and a use mode of a critical section;

wherein the assumptions about program states include the following three:

the use modes of the critical sections comprise the following four modes:

combining the three conditions of the assumption of the program state and the four conditions of the use mode of the critical area two by two to obtain concurrent defect code modes of different types of atomic violations;

the concurrent defect code patterns of the sequence violation include the following three types:

4. The method of claim 1, wherein in step 1) the target program is selected according to the following criteria:

there is data input: the target program needs to read the input data;

5. The method of claim 1, wherein the input data used when the target program is dynamically run in step 2) is a malformed input generated from a seed input by randomly mutating its bytes.

6. The method of claim 1, wherein the step of recording the travel paths of the plurality of threads in the program in step 2) comprises:

randomly selecting an unblocked thread;

7. The method of claim 1, wherein the competing locations in step 3) satisfy corresponding specific dominating relationships according to different concurrent defect code patterns, comprising:

8. The method of claim 1, wherein the defect code generated in step 4) includes three types of statements:

9. The method of claim 1, wherein the concurrent defect code generated in step 4) consists of a prologue, a precondition, and a defect kernel; wherein the preamble is used to initialize the program state required for concurrent defects; the precondition is used for checking whether the current program state is the expected state, and if the condition is met, the program enters the defect core; the defect core is a defect code generated according to a concurrent defect code mode, contains error assumptions on program states, and triggers concurrent defects when running under a specific thread order.

10. The method of claim 1, wherein the method of simulating execution of the defect code in step 4) is: inserting a defect code into a source code position along a multithreading running path and acting on the current program state, calculating an expected program state of each running path record item, and filling expected program state values in a condition check statement; and simulating an execution condition check statement, and if the execution is successful, verifying the validity of the defect.