CN112965838B - Concurrent program data competition checking method and device - Google Patents


Info

Publication number
CN112965838B
Authority
CN
China
Prior art keywords
node
thread
data
program
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110280390.2A
Other languages
Chinese (zh)
Other versions
CN112965838A (en
Inventor
周金果
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110280390.2A priority Critical patent/CN112965838B/en
Publication of CN112965838A publication Critical patent/CN112965838A/en
Application granted granted Critical
Publication of CN112965838B publication Critical patent/CN112965838B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

Embodiments of this specification provide a data race checking method and device for concurrent programs. In the data race checking method, a thread flow graph is constructed for each program function of a concurrent program by using bottom-up data flow analysis. When data race checking is performed, a set of candidate checking paths corresponding to the data access operation statements of interest is determined by using bottom-up data flow analysis. Then, the global labels of the start node and the end node of each candidate checking path in the set are determined by using the constructed thread flow graphs of the program functions, and the data race checking result of the concurrent program is determined from these global labels.

Description

Concurrent program data competition checking method and device
Technical Field
Embodiments of the present disclosure relate generally to the field of computer technology, and in particular to a method and an apparatus for checking data races in concurrent programs.
Background
With the popularity of multi-core processors and the development of many-core processors, concurrent programming is increasingly used to improve program performance. Concurrent programming can reduce the running time of a program and improve both its throughput and the utilization of the multi-core processor. Despite these benefits, the concurrency and nondeterminism within concurrent programs still lead to some unavoidable concurrency problems, such as deadlocks, data races, atomicity violations, and order violations.
Among these concurrency problems, a data race occurs when, in a multi-threaded program, two or more threads access the same memory location without any ordering constraint between the accesses and at least one of the threads performs a data write operation. Data races are the root cause of other non-deadlock concurrency defects and are a common problem in concurrent programs. How to check concurrent programs for data races efficiently is therefore a problem to be solved.
Disclosure of Invention
In view of the foregoing, the present embodiments provide a data race check method and a data race check apparatus for concurrent programs. By using the data race check method and the data race check device, the data race check of the concurrent program can be efficiently realized.
According to an aspect of the embodiments of the present disclosure, there is provided a data race checking method for a concurrent program, including: determining a set of candidate checking paths by performing bottom-up data flow analysis on program source code of the concurrent program, where a start node of each candidate checking path in the set represents one data access operation on a memory location, an end node represents another data access operation on the same memory location, and at least one of the start node and the end node represents a data write operation on the memory location; determining global labels of the start node and the end node of each candidate checking path in the set by using a thread flow graph $TFG(N, E, n_{entry}, n_{exit})$ of each program function; and determining a data race checking result of the concurrent program according to the global labels of the start node and the end node of each candidate checking path. The thread flow graph $TFG(N, E, n_{entry}, n_{exit})$ of each program function is constructed by performing bottom-up data flow analysis on the program source code. $N = \{n_1, n_2, \ldots, n_k\}$ represents the set of thread flow graph nodes, and $E = \{\langle n_i, n_j \rangle \mid n_i, n_j \in N\}$ represents the set of directed edges, each reflecting control flow from thread flow graph node $n_i$ to $n_j$. Each program statement in the program function corresponds to one real node in the node set $N$, and $N$ further includes virtual nodes created for thread operations and function calls. $n_{entry}$ and $n_{exit}$ represent the entry node and the exit node of the program function, respectively. Each thread flow graph node has a label reflecting the thread tags and/or path conditions involved in running that node.
Optionally, in one example of the above aspect, the label is an expression composed of operands and operations, the operands including at least one of a constant thread tag, a variable thread tag, and a path condition, and the operations including at least one of an add operation, a subtract operation, and a bind operation.
Optionally, in one example of the above aspect, constructing a corresponding thread flow graph for each program function by performing bottom-up data flow analysis on program source code may include: starting from the bottom-most program function of the concurrent program, performing the following processing based on the bottom-up data flow analysis until the top-most program function has been processed: creating the thread flow graph nodes of the current program function, the nodes including a real node corresponding to each program statement, virtual nodes created for thread creation and/or function calls, an entry node, and an exit node; determining the directed edge relationships between the thread flow graph nodes of the current program function according to data flow analysis of the program source code of the current program function; and determining the labels of the thread flow graph nodes of the current program function by using the label processing rules corresponding to the determined directed edge relationships and the node labels of the lower program functions.
Optionally, in one example of the above aspect, the directed edge includes at least one of: directed edge → flow, directed edge → create, directed edge → join and directed edge → call.
Optionally, in one example of the above aspect, when bottom-up data flow analysis is performed on the program source code of the concurrent program, multiple real nodes between which the data flow direction does not change are merged into a single thread flow graph node.
Optionally, in one example of the above aspect, determining the set of candidate checking paths may include: obtaining a complete set of data flow paths corresponding to data access operations by performing bottom-up data flow analysis on the program source code of the concurrent program, where a start node of each data flow path in the complete set represents one data access operation on a memory location and an end node represents another data access operation on the same memory location; and removing from the complete set those data flow paths whose start node and end node both correspond to non-write operations, to obtain the set of candidate checking paths.
Optionally, in one example of the above aspect, removing from the complete set the data flow paths whose start node and end node both correspond to non-write operations may include: removing from the complete set both the unreachable data flow paths and the data flow paths whose start node and end node both correspond to non-write operations, to obtain the set of candidate checking paths.
Optionally, in one example of the above aspect, determining the data race checking result according to the global labels of the start node and the end node of each candidate checking path may include: for each candidate checking path, determining that the program statements represented by the start node and the end node are in a data race relationship when the global labels of the start node and the end node contain the same thread tag or thread tags that correspond to each other.
Optionally, in one example of the above aspect, the construction of the thread flow graphs is performed in parallel with the determination of the candidate checking paths.
According to another aspect of the embodiments of the present specification, there is provided a data race checking apparatus for a concurrent program, the apparatus including: at least one processor, a memory coupled with the at least one processor, and a computer program stored in the memory, the at least one processor executing the computer program to implement: determining a set of candidate checking paths by performing bottom-up data flow analysis on program source code of the concurrent program, where a start node of each candidate checking path in the set represents one data access operation on a memory location, an end node represents another data access operation on the same memory location, and at least one of the start node and the end node represents a data write operation on the memory location; determining global labels of the start node and the end node of each candidate checking path in the set by using a thread flow graph $TFG(N, E, n_{entry}, n_{exit})$ of each program function; and determining a data race checking result of the concurrent program according to the global labels of the start node and the end node of each candidate checking path. The thread flow graph $TFG(N, E, n_{entry}, n_{exit})$ of each program function is constructed by performing bottom-up data flow analysis on the program source code. $N = \{n_1, n_2, \ldots, n_k\}$ represents the set of thread flow graph nodes, and $E = \{\langle n_i, n_j \rangle \mid n_i, n_j \in N\}$ represents the set of directed edges, each reflecting control flow from thread flow graph node $n_i$ to $n_j$. Each program statement in the program function corresponds to one real node in the node set $N$, and $N$ further includes virtual nodes created for thread operations and function calls. $n_{entry}$ and $n_{exit}$ represent the entry node and the exit node of the program function, respectively. Each thread flow graph node has a label reflecting the thread tags and/or path conditions involved in running that node.
According to another aspect of the embodiments of the present specification, there is provided a computer-readable storage medium storing a computer program that is executed by a processor to implement the data race check method as described above.
According to another aspect of embodiments of the present specification, there is provided a computer program product comprising a computer program that is executed by a processor to implement the data race check method as described above.
Drawings
A further understanding of the nature and advantages of the present description may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
Fig. 1 shows a flowchart of a data race check procedure according to an embodiment of the present specification.
FIG. 2 illustrates an example schematic diagram of program source code of a concurrent program.
FIG. 3 shows a flow diagram of a thread-flow graph construction process according to an embodiment of the present description.
Fig. 4A-4D show example schematic diagrams of thread flow diagrams according to embodiments of the present description.
Fig. 5 shows a flowchart of a candidate inspection path set determination procedure according to an embodiment of the present specification.
Fig. 6 shows a block diagram of a data race check device according to an embodiment of the present specification.
FIG. 7 illustrates a block diagram of one example of an implementation of a thread-flow graph construction unit according to an embodiment of the present description.
Fig. 8 shows a block diagram of one implementation example of the inspection path set determining unit according to an embodiment of the present specification.
Fig. 9 shows a schematic diagram of a computer-implemented data race check-up device according to an embodiment of the present description.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be appreciated that these embodiments are discussed only to enable a person skilled in the art to better understand and thereby practice the subject matter described herein, and are not limiting of the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure as set forth in the specification. Various examples may omit, replace, or add various procedures or components as desired. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. In addition, features described with respect to some examples may be combined in other examples as well.
As used herein, the term "comprising" and variations thereof are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout this specification.
Data race checking schemes may be classified into dynamic analysis schemes and static analysis schemes according to when the check is performed. A dynamic analysis scheme obtains accurate information about variables and aliases through instrumentation. However, because different thread scheduling strategies lead to different execution results, its coverage is incomplete, many races go unreported (false negatives), and the detection cost is high. Compared with a dynamic analysis scheme, a static analysis scheme is fast and checks the program comprehensively.
Static analysis schemes may also be referred to as static analysis tools. Examples include LOCKSMITH, Goblint, RELAY, IteRace, ECHO, and RacerX. LOCKSMITH is a context-sensitive correlation analysis tool for data races. It uses constraint-based techniques to compute correlations that describe which locks protect which lvalues. LOCKSMITH has only been applied to programs of about 100K lines of code. Goblint combines pointer analysis with value analysis, which enables it to handle complex locking situations. RELAY is a scalable static algorithm that uses a bottom-up analysis method; it performs data race checking by means of symbolic execution, pointer analysis, lockset analysis, and guarded access analysis. IteRace is a static data race detection tool for Java parallel loops. It achieves a lower false positive rate through special handling of lambda-style parallel loops, tracking, and summarization, but it can only analyze parallel loops. ECHO can detect data races in real time in the IDE while code is being written. To be usable in an IDE, the pointer analysis employed by ECHO is performed incrementally when the program is modified. However, because of its analysis speed requirements, ECHO achieves neither context sensitivity nor path sensitivity. RacerX detects data races and deadlocks using flow-sensitive interprocedural analysis. However, to achieve scalability, RacerX discards some program information, for example by using the type to represent all lvalues, which leaves its analysis imperfect.
Embodiments of the present specification provide a data race checking scheme for concurrent programs. In this scheme, a thread flow graph reflecting the thread information of a program function is first constructed for each program function of the concurrent program by using bottom-up data flow analysis; each thread flow graph takes into account the thread information introduced by lower program functions. When data race checking is performed, a set of candidate checking paths corresponding to the data access operation statements of interest is determined by using bottom-up data flow analysis. Then, the global labels of the start node and the end node of each candidate checking path in the set are determined by using the constructed thread flow graphs of the program functions, and the determined global labels are used to determine the data race checking result. Because the thread information introduced by lower program functions is already accounted for when the thread flow graph of each program function is constructed, lower program functions need not be revisited when the thread flow graphs are used for data race checking, which improves checking efficiency. In addition, the influence of upper program functions is introduced when the global labels of the start node and the end node are determined, so the global labels reflect the calling context and the data race check is context-sensitive. Furthermore, the constructed thread flow graphs contain flow relationships and path conditions, so the data race checking scheme is both flow-sensitive and path-sensitive.
In addition, in this scheme, data race checking is performed only on a subset of the data flow paths, selected according to the data flow analysis, rather than on all data flow paths of the concurrent program, which further improves checking efficiency.
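The path selection and the label-based decision summarized above can be illustrated with a minimal Python sketch. It assumes that a path is reduced to two endpoints, each carrying a write flag and the set of thread tags found in its global label. The class names, the node names, and the convention of writing a child thread tag in lower case (`t1`) to tell it apart from the main thread tag (`T1`) are assumptions made for this illustration only, not notation taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    node: str          # hypothetical node name
    is_write: bool     # True for a data write operation
    tags: frozenset    # thread tags found in the node's global label


@dataclass(frozen=True)
class DataFlowPath:
    start: Endpoint
    end: Endpoint
    reachable: bool = True


def candidate_paths(paths):
    """Drop unreachable paths and paths whose two ends are both non-writes."""
    return [p for p in paths
            if p.reachable and (p.start.is_write or p.end.is_write)]


def tags_correspond(a, b):
    # Assumed naming: "T1" (main) and "t1" (child) stem from one thread creation.
    return a.lower() == b.lower()


def has_race(path):
    """Flag a race when the two global labels share or pair up a thread tag."""
    return any(tags_correspond(a, b)
               for a in path.start.tags for b in path.end.tags)


w = Endpoint("n1", True, frozenset({"T1", "T3"}))
r = Endpoint("n2", False, frozenset({"t1"}))
r2 = Endpoint("n3", False, frozenset({"t2"}))
paths = [DataFlowPath(w, r), DataFlowPath(r, r2), DataFlowPath(w, r2),
         DataFlowPath(w, r, reachable=False)]
candidates = candidate_paths(paths)
races = [(p.start.node, p.end.node) for p in candidates if has_race(p)]
print(races)
```

Only two of the four hypothetical paths survive the filter, and only the pair whose labels contain `T1` and `t1` is reported.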
A data race check method and a data race check apparatus for concurrent programs according to embodiments of the present specification will be described in detail below with reference to the accompanying drawings.
Fig. 1 shows a flow chart of a data race check process 100 according to an embodiment of the present specification.
As shown in Fig. 1, at 110, a thread flow graph (Thread Flow Graph) $TFG(N, E, n_{entry}, n_{exit})$ is constructed for each program function by performing bottom-up data flow analysis on the program source code of the concurrent program. Recursive calls are not considered in this bottom-up data flow analysis: if a recursive call is encountered, it is not handled as a function call but is treated as an ordinary operation.
In the constructed thread flow graph TFG, $N = \{n_1, n_2, \ldots, n_k\}$ represents the set of thread flow graph nodes, and $E = \{\langle n_i, n_j \rangle \mid n_i, n_j \in N\}$ represents the set of directed edges, each reflecting control flow from thread flow graph node $n_i$ to $n_j$. Each program statement in the program function corresponds to one real node in the node set $N$, and $N$ further includes virtual nodes created for thread operations and function calls. $n_{entry}$ and $n_{exit}$ represent the entry node and the exit node of the program function, respectively. In addition, each thread flow graph node has a label reflecting the thread tags and/or path conditions involved in running that node.
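As a minimal sketch of this definition, the following Python class stores the node set with its labels, the directed edge set, and the entry and exit nodes. The class and method names are assumptions chosen for illustration. The example instance mirrors the thread flow graph that the description gives for Func D (real nodes 25 and 26, exit label -arg2).

```python
from dataclasses import dataclass, field


@dataclass
class ThreadFlowGraph:
    entry: str                                   # n_entry
    exit: str                                    # n_exit
    labels: dict = field(default_factory=dict)   # node in N -> its label
    virtual: set = field(default_factory=set)    # names of virtual nodes
    edges: set = field(default_factory=set)      # E as (n_i, n_j) pairs

    def __post_init__(self):
        self.add_node(self.entry)
        self.add_node(self.exit)

    def add_node(self, name, label="", virtual=False):
        self.labels[name] = label
        if virtual:
            self.virtual.add(name)

    def add_edge(self, src, dst):
        for n in (src, dst):
            if n not in self.labels:
                raise KeyError(f"unknown node: {n}")
        self.edges.add((src, dst))

    def successors(self, name):
        return sorted(dst for src, dst in self.edges if src == name)


# Thread flow graph of Func D as described for Fig. 4A.
tfg_d = ThreadFlowGraph("entry D", "exit D")
tfg_d.add_node("25")
tfg_d.add_node("26")
tfg_d.add_edge("entry D", "25")
tfg_d.add_edge("25", "26")
tfg_d.add_edge("26", "exit D")
tfg_d.labels["exit D"] = "-arg2"
print(tfg_d.successors("25"), tfg_d.labels["exit D"])
```

Storing labels as plain strings keeps the sketch short; a structured label type would be needed to apply the label processing rules mechanically.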
Optionally, in one example, the label may be an expression composed of operands and operations, where the operands include at least one of a constant thread tag, a variable thread tag, and a path condition, and the operations include at least one of an add operation, a subtract operation, and a bind operation. Here, a constant thread tag is a deterministic thread tag, for example the thread tag "T3" in Fig. 4C, which indicates the main thread T3. A variable thread tag is a non-deterministic tag corresponding to a variable operation, for example the thread tag "arg2" of exit node D in Fig. 4A, which indicates the thread corresponding to the parameter arg2; which thread this actually is must be determined from the specific operation. A path condition indicates the condition under which the thread executes, for example "g>0" and "g<=0" in Fig. 4D. The add operation "+" adds a thread, the subtract operation "-" removes a thread, and the bind operation "()" binds two thread tags (or thread tags with operations).
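One way to picture such label expressions is as a small expression tree. The sketch below is an assumption about representation only: the class names `Tag`, `Cond`, `Sub`, `Add`, and `Bind` do not come from the specification. It renders two labels that appear in the description, T3(-arg1) and T1(g<=0)+T3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A constant thread tag (e.g. T3) or a variable thread tag (e.g. arg2)."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Cond:
    """A path condition such as g>0."""
    text: str

    def __str__(self):
        return self.text


@dataclass(frozen=True)
class Sub:
    """Subtract operation: removes a thread."""
    operand: object

    def __str__(self):
        return f"-{self.operand}"


@dataclass(frozen=True)
class Add:
    """Add operation: adds a thread."""
    left: object
    right: object

    def __str__(self):
        return f"{self.left}+{self.right}"


@dataclass(frozen=True)
class Bind:
    """Bind operation: ties an inner expression to an outer tag."""
    outer: object
    inner: object

    def __str__(self):
        return f"{self.outer}({self.inner})"


exit_c = Bind(Tag("T3"), Sub(Tag("arg1")))
node_8 = Add(Bind(Tag("T1"), Cond("g<=0")), Tag("T3"))
print(exit_c, node_8)
```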
In some embodiments, the directed edges may include at least one of: directed edge →flow, directed edge →create, directed edge →join, and directed edge →call. Directed edge →flow indicates the flow of data between internal nodes of a program function. Directed edge →create connects the call site of thread_create to the entry node of the created thread. Directed edge →join connects the exit node of a thread to the call site of the thread_join operation. Directed edge →call connects the call site of any other function call to the entry node of the corresponding callee. Further, directed edges →create and →join must be matched to each other by thread identifier (thread ID); for example, a thread_join operation must be matched to the thread that corresponds to its parameter (i.e., the thread identifier) and was created by a thread_create operation.
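The matching of →create and →join edges by thread identifier can be sketched as follows. The data layout (a dictionary keyed by thread ID, and a list of join sites) and all site and thread names are hypothetical. A join whose thread ID matches no creation simply produces no edge in this sketch.

```python
from enum import Enum


class EdgeKind(Enum):
    FLOW = "flow"
    CREATE = "create"
    JOIN = "join"
    CALL = "call"


def thread_edges(creates, joins):
    """Build create and join edges, matching joins to creations by thread ID.

    creates: {thread_id: (create_site, thread_entry, thread_exit)}
    joins:   [(join_site, thread_id), ...]
    """
    edges = []
    for _tid, (site, entry, _exit) in creates.items():
        # call site of thread_create -> entry node of the thread
        edges.append((site, entry, EdgeKind.CREATE))
    for join_site, tid in joins:
        if tid in creates:
            _site, _entry, exit_node = creates[tid]
            # exit node of the thread -> call site of thread_join
            edges.append((exit_node, join_site, EdgeKind.JOIN))
    return edges


creates = {"tid1": ("5", "entry B", "exit B")}
joins = [("j1", "tid1"), ("j2", "tid9")]   # "tid9" was never created
edges = thread_edges(creates, joins)
print(edges)
```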
FIG. 2 illustrates an example schematic diagram of program source code of a concurrent program. As shown in Fig. 2, the concurrent program includes four program functions: Func A, Func B, Func C, and Func D. The call relations between them are as follows: Func A calls Func B and Func D, Func B calls Func C, and Func C calls Func D. The concurrent program has three variables: a global variable g and two memory addresses x and y. The global variable g is used to illustrate path conditions, and the memory addresses (memory locations) x and y are used to illustrate data race analysis, i.e., the analysis of data access operations on memory addresses x and y. Three kinds of operations are performed in this concurrent program: a data write operation to a memory address (Write), a function call operation (Call), and a thread creation operation (NEWTHREAD, which may also be referred to as thread_create). Note that Fig. 2 is given only as an example; other embodiments may include other operations, such as a thread_join operation, a thread_notify operation, a thread_wait operation, a thread_lock operation, or a thread_unlock operation.
FIG. 3 shows a flow diagram of a thread-flow graph construction process according to an embodiment of the present description.
As shown in fig. 3, the thread-flow graph construction process is performed based on the bottom-up data flow analysis from the lowest program function of the concurrent program until the thread-flow graph construction process of the top-most program function is completed.
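A bottom-up processing order can be obtained with a post-order traversal of the call graph, as in the sketch below. It uses the call relations described for Fig. 2 and skips any call back into a function that is still being visited, in line with the statement that recursive calls are not handled as function calls. The function name and the dictionary encoding of the call graph are assumptions.

```python
def bottom_up_order(call_graph, root):
    """Return functions so that every callee precedes its callers."""
    order, done, on_stack = [], set(), set()

    def visit(func):
        # A function already on the stack means a recursive call: ignore it.
        if func in done or func in on_stack:
            return
        on_stack.add(func)
        for callee in call_graph.get(func, []):
            visit(callee)
        on_stack.discard(func)
        done.add(func)
        order.append(func)

    visit(root)
    return order


call_graph = {
    "Func A": ["Func B", "Func D"],
    "Func B": ["Func C"],
    "Func C": ["Func D"],
    "Func D": [],
}
order = bottom_up_order(call_graph, "Func A")
print(order)
```

For this call graph the traversal yields Func D first and Func A last, which matches the order in which the description processes the four functions.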
Specifically, at 310, the lowest program function is selected as the initial current program function, e.g., the function Func D shown in Fig. 2. Next, at 320, the thread flow graph nodes of the current program function are created. The created nodes include real nodes, virtual nodes, an entry node, and an exit node. Each program statement of the current program function corresponds to a real node, and virtual nodes are created for the program statements that perform thread creation or function calls. For example, for the function Func D, real nodes 25 and 26, entry node D, and exit node D are created. For the function Func C, real nodes 18, 19, and 21, entry node C, and exit node C are created. Further, since node 21 performs thread creation, a corresponding virtual node 21' is created. For the function Func B, real node 14, entry node B, and exit node B are created. In addition, since node 14 performs a function call, a corresponding virtual node 14' is created. For the function Func A, real nodes 4, 5, 6, 7, 8, and 9, entry node A, and exit node A are created. Furthermore, since nodes 5 and 8 perform thread creation, corresponding virtual nodes 5' and 8' are created, and since nodes 7 and 9 perform function calls, corresponding virtual nodes 7' and 9' are created.
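The node creation step can be sketched as a single pass over the statements of a function. In the example below, the statement kinds for nodes 5, 7, 8, and 9 of Func A follow the description. The kinds of nodes 4 and 6 are not stated there and are marked "other" as a placeholder assumption. The function name and the tuple encoding are likewise illustrative.

```python
def create_nodes(func, statements):
    """Create entry, real, virtual, and exit nodes for one program function.

    statements: list of (line_number, kind), kind in {"create", "call", "other"}.
    """
    nodes = [f"entry {func}"]
    for line, kind in statements:
        nodes.append(str(line))              # one real node per statement
        if kind in ("create", "call"):
            nodes.append(f"{line}'")         # virtual node for create/call
    nodes.append(f"exit {func}")
    return nodes


func_a = [(4, "other"), (5, "create"), (6, "other"),
          (7, "call"), (8, "create"), (9, "call")]
nodes_a = create_nodes("A", func_a)
virtual_a = [n for n in nodes_a if n.endswith("'")]
print(nodes_a)
```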
At 330, directed edge relationships between respective thread flow graph nodes of the current program function are determined from a data flow analysis of program source code of the current program function, as shown in fig. 4A-4D.
At 340, the labels of the thread flow graph nodes of the current program function are determined by using the label processing rules corresponding to the determined directed edge relationships and the node labels of the lower program functions.
In embodiments of the present specification, the thread tags may be defined as simple tags, namely a main thread tag T and a child thread tag T, as shown in Figs. 4A-4D.
In the embodiment of the present specification, the directed edge→flow, the directed edge→create, the directed edge→join and the directed edge→call correspond to different labeling processing rules, that is, flow rule, create rule, join rule and Call rule, respectively. In addition, labeling processing rules may also include Wait/Notify rules, lock rules, and UnLock rules.
The Flow rule represents the basic label transformation. For a node $n_m$, if k nodes are connected to $n_m$ by directed edges →flow, the labels of these predecessor nodes (e.g., the branches of a conditional) are merged to form the label of $n_m$. If a node has only one predecessor node, the node has the same label as that predecessor.
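A minimal sketch of the Flow rule follows, assuming that a label is simplified to a set of thread tags and that merging means set union. Both simplifications are assumptions; path conditions and operations are left out.

```python
def flow_label(predecessor_labels):
    """Merge the labels of all flow predecessors into one label."""
    merged = set()
    for label in predecessor_labels:
        merged |= set(label)
    return frozenset(merged)


# Two branches joining at one node: the labels are merged.
joined = flow_label([{"T1"}, {"T1", "T3"}])
# A single predecessor: the node keeps the predecessor's label.
single = flow_label([{"T3"}])
print(sorted(joined), sorted(single))
```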
The Create rule is the transformation rule corresponding to the thread_create operation. Under the Create rule, thread creation introduces a main thread tag T and a corresponding child thread tag T, where the main thread tag T is assigned to the successor node of the node that creates the thread, and the corresponding child thread tag T is assigned to the virtual node of the node that creates the thread. For example, in the function Func C, node 21 creates a thread, introducing a main thread tag T3 and a child thread tag T3. According to the Create rule, the main thread tag T3 is assigned to exit node C and the child thread tag T3 is assigned to virtual node 21' of node 21. In the function Func A, node 5 creates a thread, introducing a main thread tag T1 and a child thread tag T1. According to the Create rule, the main thread tag T1 is assigned to node 6, the successor of node 5, and the child thread tag T1 is assigned to virtual node 5' of node 5. Node 8 creates a thread, introducing a main thread tag T2 and a child thread tag T2. According to the Create rule, the main thread tag T2 is assigned to node 9, the successor of node 8, and the child thread tag T2 is assigned to virtual node 8' of node 8.
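The Create rule can be sketched as a function that returns the label for the successor node and the label for the virtual node. Labels are again simplified to sets of tags. Writing the child thread tag in lower case (`t1`) is an assumed convention to keep it distinct from the main thread tag (`T1`), and carrying the incoming label over to both results is an assumption of this sketch rather than a stated rule.

```python
def create_rule(thread_index, incoming=frozenset()):
    """Return (successor_label, virtual_node_label) for a thread creation."""
    main_tag = f"T{thread_index}"    # goes to the successor of the creating node
    child_tag = f"t{thread_index}"   # goes to the virtual node (assumed spelling)
    incoming = frozenset(incoming)
    return incoming | {main_tag}, incoming | {child_tag}


# Node 5 of Func A creates thread 1 with an empty incoming label.
node_6, virtual_5 = create_rule(1)
print(sorted(node_6), sorted(virtual_5))
```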
The Join rule indicates that if a thread is waited on for termination, a subtract operation is performed on its thread tag. The Wait/Notify rule indicates that an ID (i.e., a unique identifier of the pair of operations) is added to the label sets of all predecessor nodes of thread_notify and all successor nodes of thread_wait (including interprocedural nodes). The Lock rule and the UnLock rule describe lock analysis. When the thread_lock function is called, a label is added to the lock set; correspondingly, when the thread_unlock function is called, the corresponding label is removed from the lock set. These two rules simulate the semantics of locking and unlocking.
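The Lock and UnLock rules amount to maintaining a lock set along a sequence of operations. The sketch below records the lock set held at each access. The operation encoding (`"lock"`, `"unlock"`, `"access"`) and the lock and variable names are hypothetical.

```python
def locksets_at_accesses(ops):
    """Track the lock set and record it at every data access."""
    held = set()
    result = []
    for op, arg in ops:
        if op == "lock":
            held.add(arg)            # Lock rule: add the label to the lock set
        elif op == "unlock":
            held.discard(arg)        # UnLock rule: remove it again
        elif op == "access":
            result.append((arg, frozenset(held)))
    return result


ops = [("lock", "m"), ("access", "x"), ("unlock", "m"), ("access", "x")]
observed = locksets_at_accesses(ops)
print(observed)
```

In this example the first access to `x` is protected by `m` and the second is not, which is the kind of difference a lock-aware check would need to see.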
The Call rule is used for label processing at function calls. When a label reaches a function call, the call site passes the current label to the callee, and the output label of the call site is the merged label of the callee's exit node. In a function call, the callee is treated as an inline function, which means that the label before the call site is the same as the label before the entry node of the callee. Similarly, the label after the call site is the same as the label after the exit node of the callee. The Call rule transforms labels so that the context information of the caller and the callee can be introduced into the data flow analysis.
After the thread flow graph of the current program function has been obtained as described above, at 350 it is determined whether there are program functions for which the thread flow graph has not yet been constructed. If there are none, the thread flow graph construction ends. If there are, then at 360 an upper-level program function of the current program function, or an unprocessed program function at the same level, is selected as the current program function for the next iteration, and the process returns to 320.
The label determination process for each thread flow graph node of a program function will be described below with reference to the program source code shown in fig. 2.
For example, for the program function Func D, since the function involves neither thread creation nor function calls, only real nodes 25 and 26, entry node D, and exit node D are created. The label of entry node D is empty. The directed edge between entry node D and real node 25 and the directed edge between real node 25 and real node 26 are both directed edges →flow, so the labels of real nodes 25 and 26 are empty according to the Flow rule. The directed edge between real node 26 and exit node D is a directed edge →join, and according to the Join rule a subtract operation is performed on the thread tag associated with the second parameter, so exit node D is labeled -arg2. Here, arg2 is a variable thread tag and "-" is the subtract operation. This processing yields the thread flow graph of the program function Func D shown in Fig. 4A.
For the program function Func C, real nodes 18, 19, and 21, entry node C, and exit node C are created. The directed edges between entry node C and node 18, between nodes 18 and 19, between nodes 19 and 21, between nodes 18 and 21, and between node 21 and exit node C are all directed edges →flow, so the labels of real nodes 18, 19, and 21 are empty according to the Flow rule. Further, since real node 21 performs thread creation, a corresponding virtual node 21' is created; in other words, real node 21 is connected to virtual node 21' by a directed edge →create, and the created thread runs the function Func D. Real node 21 performs thread creation (T3), so virtual node 21' is labeled T3. Furthermore, the label of exit node D of Func D is -arg2 (i.e., the node label of the lower program function), and the actual argument passed as arg2 of Func D is T, which is arg1 of Func C, so the label of exit node C, the successor of real node 21, contains -arg1. Since Func D is started by thread creation, -arg1 is bound to the main thread tag T3: if T3 is joined, arg1 is joined at the same time. The label of exit node C is therefore T3(-arg1). This processing yields the thread flow graph of the program function Func C shown in Fig. 4B.
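The derivation of the exit label of Func C can be replayed with simple string manipulation, assuming labels are kept as text. The helper names `instantiate` and `bind` are hypothetical. The first step replaces the callee's formal parameter tag with the caller's argument, and the second applies the bind operation.

```python
import re


def instantiate(label, bindings):
    """Replace variable thread tags (argN) using the caller's arguments."""
    return re.sub(r"arg\d+",
                  lambda m: bindings.get(m.group(0), m.group(0)),
                  label)


def bind(main_tag, inner):
    """Bind operation: attach an inner label to a main thread tag."""
    return f"{main_tag}({inner})"


exit_d = "-arg2"                                   # exit label of Func D
in_func_c = instantiate(exit_d, {"arg2": "arg1"})  # arg2 of D is arg1 of C
exit_c = bind("T3", in_func_c)                     # D is started as thread T3
print(in_func_c, exit_c)
```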
For the program function Func B, a real node 14, an entry node B, and an exit node B are created. In addition, because real node 14 performs a function call, a corresponding virtual node 14' is created. The directed edge between entry node B and node 14 is a →Flow edge, so the label of real node 14 is empty according to the Flow rule. The directed edge between node 14 and node 14' is a →Call edge, and, according to the Call rule, the label of virtual node 14' is the same as that of node 14, so the label of virtual node 14' is also empty. Node 14 calls the program function Func C, which returns the label T3(-arg1). Because arg1 in the program function Func B is Null, i.e., not present, node 14 can pass only the label T3 to its successor, exit node B, so the label of exit node B is T3. This processing yields the thread flow graph of the program function Func B shown in fig. 4C.
For the program function Func A, real nodes 4, 5, 6, 7, 8, and 9, virtual nodes 5', 7', 8', and 9', an entry node A, and an exit node A are created. The directed edges between entry node A and node 4 and between nodes 4 and 5 are →Flow edges and the label of entry node A is empty, so, according to the Flow rule, the labels of real nodes 4 and 5 are empty. The directed edge between nodes 5 and 6 is a →Flow edge, and node 5 invokes the program function Func B by thread creation. According to the Create rule, the label of virtual node 5' is T1, and the successor node 6 is given the label T1; since the program function Func B additionally returns the label T3, the label of node 6 is T1+T3. The directed edge between nodes 6 and 7 is a →Flow edge, so, according to the Flow rule, the label of node 7 is the same as that of node 6, i.e., T1+T3. The directed edge between nodes 7 and 7' is a →Call edge, so node 7' is labeled T1+T3 according to the Call rule. Furthermore, node 7 calls the program function Func D by function call, and Func D returns the label -arg2 (the node label of the lower program function), where arg2 is the thread tag T1/t1 generated by node 5, so the thread tag T1 must be subtracted. Because the path condition under which node 7 executes is g>0, the label of node 8 is T1(g<=0)+T3. In addition, node 8 invokes the program function Func B by thread creation, so, according to the Create rule, virtual node 8' is labeled T2(g>0)+T1(g<=0)+T3. The directed edge between nodes 8 and 9 is a →Flow edge, i.e., node 9 is the successor of node 8. Because node 8 calls Func B by thread creation and Func B returns the label T3, node 8 generates the new label T2(g>0), so the label of node 9 is T1(g<=0)+T2(g>0)+T3.
Node 9 calls the program function Func D by function call, and Func D returns the label -arg2, where arg2 is the thread tag T2/t2 generated by node 8, so this thread tag must be subtracted. Because the path condition of node 8 is g>0, the label of exit node A is T1(g<=0)+T3. This processing yields the thread flow graph of the program function Func A shown in fig. 4D.
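The label arithmetic in the Func A walkthrough above can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the implementation of the embodiment: a label is represented as a mapping from a thread tag to the set of outcomes of the single predicate g > 0 under which the tag is still live, and the helper names `add_tag` and `sub_tag` are hypothetical.

```python
# Hypothetical sketch: a label maps a thread tag to the set of outcomes of
# the predicate "g > 0" (True/False) under which that tag is still live.
ALWAYS = frozenset({True, False})
G_POS = frozenset({True})       # path condition g > 0
G_NONPOS = frozenset({False})   # path condition g <= 0


def add_tag(label, tag, cond=ALWAYS):
    """Add operation: the tag becomes live under `cond`."""
    out = dict(label)
    out[tag] = out.get(tag, frozenset()) | cond
    return out


def sub_tag(label, tag, cond=ALWAYS):
    """Subtract operation: the tag is joined (no longer live) under `cond`."""
    out = dict(label)
    remaining = out.get(tag, frozenset()) - cond
    if remaining:
        out[tag] = remaining
    else:
        out.pop(tag, None)
    return out


# Walk through Func A as described in the text.
node6 = add_tag(add_tag({}, "T1"), "T3")   # node 5 creates T1; Func B returns T3
node8 = sub_tag(node6, "T1", G_POS)        # node 7 joins T1 when g > 0
node9 = add_tag(node8, "T2", G_POS)        # node 8 creates T2 when g > 0
exit_a = sub_tag(node9, "T2", G_POS)       # node 9 joins T2 when g > 0

print(exit_a)
```

In this model the final mapping keeps T1 only for g <= 0 and T3 unconditionally, which corresponds to the label T1(g<=0)+T3 of exit node A. A real analysis would need a general representation of path conditions rather than a single predicate.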
In addition, optionally, thread flow graph nodes may be merged while the thread flow graph is constructed. Specifically, when bottom-up data flow analysis is performed on the program source code of the concurrent program, multiple real nodes between which the data flow direction does not change are merged into a single thread flow graph node. For example, for the program function Func D, nodes 25 and 26 may be merged. For the program function Func C, nodes 18, 19, and 21 may be merged. For the program function Func B, no merging is possible because its only node, node 14, is a function call. For the program function Func A, nodes 5, 7, 8, and 9 are all function calls or thread creations; in addition, there are two edges among node 6, node 7, and exit node A, and node 7 and exit node A cannot be merged, so node 6 cannot be merged with node 7 or exit node A. Therefore, in the program function Func A, only nodes 4 and 5 can be merged.
In the thread flow graph construction process, the thread flow graph of each program function is constructed bottom-up, and the influence of the lower program functions on the node labels of the thread flow graph under construction is taken into account. The constructed thread flow graph therefore incorporates the context influence of the lower program functions and supports context-sensitive analysis. In addition, the constructed thread flow graph has edge relationships that reflect the control flow direction (data flow direction) and labels that contain path conditions, so it also supports flow-sensitive analysis and path-sensitive analysis.
The thread flow graph construction process described above may be performed in advance, with the constructed thread flow graph of each program function stored in a database or storage device for use by the subsequent data race check process. In another embodiment, the thread flow graph construction process may instead be performed in real time during the data race check process.
Returning to fig. 1, at 120, a set of candidate check paths is determined by performing bottom-up data flow analysis on the program source code of the concurrent program. The start node of each candidate check path in the set represents one data access operation on a memory location, the end node represents another data access operation on the same memory location, and at least one of these two data access operations is a data write operation. In addition, each resulting candidate check path contains the function call relationships that exist along the path. In this specification, examples of data access operations may include, but are not limited to, data write operations, data read operations, and data query operations.
Fig. 5 shows a flowchart of a candidate check path set determination process 500 according to an embodiment of the present specification.
As shown in fig. 5, at 510, a set of complete data flow paths corresponding to data access operations is determined by performing bottom-up data flow analysis on the program source code of the concurrent program. The start node of each complete data flow path in the set represents one data access operation on a memory location, and the end node represents another data access operation on the same memory location. The determination of the set of complete data flow paths begins with the data flow analysis of the lowest program function of the concurrent program.
For example, for the example concurrent program shown in fig. 2, the data flow analysis begins with the program function Func D. Analyzing the program function Func D gives the analysis result arg1->25, which is not a complete path and can be substituted upwards when the program function Func D is called.
For the program function Func C, three analysis results are obtained: (1) arg2->19: g<0, which is not a complete path and can be substituted upwards when the program function Func C is called; (2) 19->21(arg1): g<0, into which the analysis result of the program function Func D is substituted to give 19->21(arg1)->25: g<0, which is a complete path; and (3) arg2->21(arg1), into which the analysis result of the program function Func D is substituted to give arg2->21(arg1)->25, which is not a complete path and can be substituted upwards when the program function Func C is called.
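The upward substitution used in these steps can be sketched as follows. This is a simplified illustration, assuming that a path is a pair of a step list and a condition list and that a partial path always starts at a formal parameter named `argN`; the helper names `substitute` and `is_complete` are hypothetical.

```python
# Hypothetical sketch of substituting a callee's partial data flow path into
# a caller's path that ends at a call site.
def substitute(caller_path, callee_partial):
    """Join `caller_path` (ending at a call site) with a callee's partial path.

    Each path is (steps, conditions). The callee's partial path starts at a
    formal parameter such as "arg1"; that placeholder is dropped because the
    call site already names the actual argument.
    """
    caller_steps, caller_conds = caller_path
    callee_steps, callee_conds = callee_partial
    if not callee_steps[0].startswith("arg"):
        raise ValueError("callee path must start at a formal parameter")
    return (caller_steps + callee_steps[1:], caller_conds + callee_conds)


def is_complete(path):
    """A path is complete when it no longer starts at a formal parameter."""
    return not path[0][0].startswith("arg")


func_d = (["arg1", "25"], [])                 # Func D: arg1->25
c_result_2 = (["19", "21(arg1)"], ["g<0"])    # 19->21(arg1): g<0
c_result_3 = (["arg2", "21(arg1)"], [])       # arg2->21(arg1)

path_4 = substitute(c_result_2, func_d)       # becomes a complete path
partial = substitute(c_result_3, func_d)      # still starts at a parameter

print("->".join(path_4[0]), path_4[1], is_complete(path_4))
print("->".join(partial[0]), partial[1], is_complete(partial))
```

The first result corresponds to the complete path 19->21(arg1)->25: g<0, and the second remains a partial path that a caller can extend further.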
For the program function Func B, the analysis result arg1->14(arg2) is obtained. Substituting the analysis results of the program function Func C gives two analysis results: (1) arg1->14(arg2)->19: g<0, which is not a complete path and can be substituted upwards when the program function Func B is called; and (2) arg1->14(arg2)->21(arg1)->25, which is not a complete path and can be substituted upwards when the program function Func B is called.
For the program function Func A, three analysis results are obtained: (1) 5(arg1)->7(arg1): g>0; (2) 5(arg1)->9(arg1): g>0; and (3) 7(arg1)->9(arg1): g>0. For the analysis result 5(arg1)->7(arg1): g>0, substituting the analysis results of the program functions Func B and Func D gives the complete path 19->14(arg2)->5(arg1)->7(arg1)->25: g<0 && g>0 and the complete path 25->21(arg1)->14(arg2)->5(arg1)->7(arg1)->25: g>0. For the analysis result 5(arg1)->9(arg1): g>0, substituting the analysis results of the program functions Func B and Func D gives the complete path 19->14(arg2)->5(arg1)->9(arg1)->25: g<0 && g>0 and the complete path 25->21(arg1)->14(arg2)->5(arg1)->9(arg1)->25: g>0. For the analysis result 7(arg1)->9(arg1): g>0, substituting the analysis result of the program function Func D gives the complete path 25->7(arg1)->9(arg1)->25: g>0.
In summary, the following 6 complete paths are found, 1 from the program function Func C and 5 from the program function Func A:
Complete path 1: 25->7(arg1)->9(arg1)->25: g>0
Complete path 2: 25->21(arg1)->14(arg2)->5(arg1)->7(arg1)->25: g>0
Complete path 3: 25->21(arg1)->14(arg2)->5(arg1)->9(arg1)->25: g>0
Complete path 4: 19->21(arg1)->25: g<0
Complete path 5: 19->14(arg2)->5(arg1)->7(arg1)->25: g<0 && g>0
Complete path 6: 19->14(arg2)->5(arg1)->9(arg1)->25: g<0 && g>0
At 520, data flow paths whose start node and end node both correspond to non-write operations are removed from the set of complete data flow paths, resulting in the set of candidate check paths. The data access operations shown in fig. 2 are all data write operations, so all 6 complete data flow paths above can be considered candidate check paths.
Further, optionally, when the set of candidate check paths is obtained from the set of complete data flow paths, unreachable data flow paths may be removed in addition to the data flow paths whose start node and end node both correspond to non-write operations. The term "unreachable data flow path" refers to a data flow path whose path conditions contradict each other, for example, complete paths 5 and 6 above. After this processing, 4 candidate check paths, i.e., complete paths 1 to 4, are finally obtained.
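The two filtering steps can be illustrated with the six complete paths of the example. The sketch below is a simplification under stated assumptions: path conditions are limited to comparisons of one integer variable g with 0, so a contradiction can be detected by testing the sample values -1, 0, and 1, and the names `reachable` and `candidate_paths` are hypothetical.

```python
# Hypothetical sketch: filter complete data flow paths into candidate check
# paths. Conditions only compare one integer variable g with 0, so testing
# the sample values -1, 0 and 1 is enough to decide satisfiability.
CONDITIONS = {
    "g<0": lambda g: g < 0,
    "g>0": lambda g: g > 0,
    "g<=0": lambda g: g <= 0,
}


def reachable(conds):
    """True if some value of g satisfies every condition on the path."""
    return any(all(CONDITIONS[c](g) for c in conds) for g in (-1, 0, 1))


def candidate_paths(paths, write_nodes):
    """Keep paths with at least one write endpoint and consistent conditions."""
    kept = []
    for name, start, end, conds in paths:
        has_write = start in write_nodes or end in write_nodes
        if has_write and reachable(conds):
            kept.append(name)
    return kept


complete_paths = [
    ("complete path 1", 25, 25, ["g>0"]),
    ("complete path 2", 25, 25, ["g>0"]),
    ("complete path 3", 25, 25, ["g>0"]),
    ("complete path 4", 19, 25, ["g<0"]),
    ("complete path 5", 19, 25, ["g<0", "g>0"]),
    ("complete path 6", 19, 25, ["g<0", "g>0"]),
]

# In the example of fig. 2, every data access operation is a write.
candidates = candidate_paths(complete_paths, write_nodes={19, 25})
print(candidates)
```

Complete paths 5 and 6 are dropped because g<0 and g>0 cannot hold together. A production analysis would typically delegate this satisfiability check to a constraint solver.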
Returning to fig. 1, after the set of candidate check paths is obtained as described above, at 130, the global labels (global annotations) of the start node and the end node of each candidate check path are determined using the thread flow graph of each program function. For example, the thread flow graph of each program function may be used to determine the global labels of the start node and the end node of each candidate check path based on the function call relationships in that path. Here, a global label is the node label obtained after taking into account the label influence introduced by both the upper program functions and the lower program functions (i.e., the complete context). The labeling at 130 may also use the label processing rules described above. The global label determination process is described below with reference to the program source code shown in fig. 2. This process also uses bottom-up analysis.
First, the program function Func D is analyzed, giving the analysis result arg1->25 (the label of node 25 is empty), which can be substituted upwards when the program function Func D is called.
Then, the program function Func C is analyzed. The first analysis result is arg2->19 (the label of node 19 is empty): g<0, which can be substituted upwards when the program function Func C is called. The second analysis result is 19->21(arg1) (the label of node 19 is empty): g<0. Substituting the analysis result of the program function Func D, together with the label t3 generated when node 21 calls the program function Func D, gives 19->21(arg1)->25 (the label of node 19 is empty and the label of node 25 is t3): g<0. This yields the global labels of the start node 19 and the end node 25 of complete path 4: the global label of node 19 is empty and the global label of node 25 is t3. The third analysis result is arg2->21(arg1). Substituting the analysis result of the program function Func D, again with the label t3 generated when node 21 calls the program function Func D, gives arg2->21(arg1)->25 (the label of node 25 is t3), which can be substituted upwards when the program function Func C is called.
Subsequently, the program function Func B is analyzed, giving the analysis result arg1->14(arg2). Substituting the analysis result of the program function Func C (whose label is empty) gives arg1->14(arg2)->19 (the label of node 19 is empty): g<0, which can be substituted upwards when the program function Func B is called. Analyzing the program function Func B also gives the analysis result arg1->14(arg2)->21(arg1)->25 (the label of node 25 is t3), which can likewise be substituted upwards when the program function Func B is called.
Analyzing the program function Func A gives the analysis result 5(arg1)->7(arg1): g>0. Substituting the analysis results of the program functions Func B and Func D (the analysis result of Func B gains +t1, and the analysis result of Func D gains +(T1+T3)) gives the global labels of the start node 25 and the end node 25 of the complete path 25->21(arg1)->14(arg2)->5(arg1)->7(arg1)->25: g>0 (complete path 2): the global label of the start node 25 is t1+t3, and the global label of the end node 25 is T1+T3.
Analyzing the program function Func A also gives the analysis result 5(arg1)->9(arg1): g>0. Substituting the analysis results of the program functions Func B and Func D (the analysis result of Func B gains +t1, and the analysis result of Func D gains +(T1(g<=0)+T2(g>0)+T3)) gives the global labels of the start node 25 and the end node 25 of the complete path 25->21(arg1)->14(arg2)->5(arg1)->9(arg1)->25: g>0 (complete path 3): the global label of the start node 25 is t1+t3, and the global label of the end node 25 is T1(g<=0)+T2(g>0)+T3.
Analyzing the program function Func A further gives the analysis result 7(arg1)->9(arg1): g>0. Substituting the analysis result of the program function Func D (node 7 gains +(T1+T3), and node 9 gains +(T1(g<=0)+T2(g>0)+T3)) gives the global labels of the start node 25 and the end node 25 of the complete path 25->7(arg1)->9(arg1)->25: g>0 (complete path 1): the global label of the start node 25 is T1+T3, and the global label of the end node 25 is T1(g<=0)+T2(g>0)+T3.
With this processing, when a program function is called, its analysis results (those corresponding to incomplete paths) are substituted into the caller, so that the context influence from the caller to the callee is introduced into the data flow analysis. In addition, as described above, because the thread flow graph of each program function is constructed bottom-up, the context influence from the callee to the caller has already been introduced. The data flow analysis therefore takes the complete context into account, which enables fully context-sensitive analysis.
Next, at 140, a data race check result is determined based on the global labels of the start node and the end node of each candidate check path. In one example, for each candidate check path, when the global labels of the start node and the end node contain the same thread tag or thread tags that correspond to each other, it is determined that a data race exists between the program statements represented by the start node and the end node. Here, thread tags that correspond to each other are a main thread tag and the child thread tag that corresponds to it.
For example, consider complete paths 1 to 4 above. In complete path 1, no main thread tag appears together with its corresponding child thread tag, so there is no data race between the start node 25 and the end node 25. In complete path 2, T1/t1: g>0 and T3/t3: g>0 both occur, so the start node 25 and the end node 25 can race through T1/t1 and T3/t3. In complete path 3, T1/t1 carries the contradictory condition g<=0 && g>0 and therefore does not occur, but T3/t3: g>0 does, so the start node 25 and the end node 25 can race through T3/t3. In complete path 4, no main thread tag appears together with its corresponding child thread tag, so there is no data race between the start node 19 and the end node 25. In summary, in the program source code of the concurrent program shown in fig. 2, there is no data race between the data access operations represented by the program statements at line 19 and line 25, but the data access operation represented by the program statement at line 25 races with itself.
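The comparison of global labels for complete paths 1 to 4 can be sketched as below. The sketch rests on several assumptions that go beyond the text: an upper-case tag such as T1 is read as a main thread tag and the lower-case t1 as its corresponding child thread tag, only the main/child correspondence rule is modelled, each condition is the set of outcomes of the predicate g > 0 under which it holds, and the function name `racing_tags` is hypothetical.

```python
# Hypothetical sketch of the final check. Assumptions: "T1" is a main thread
# tag, "t1" is the corresponding child thread tag, and every condition is the
# set of outcomes of the predicate "g > 0" under which it holds.
ALWAYS = frozenset({True, False})
G_POS = frozenset({True})       # g > 0
G_NONPOS = frozenset({False})   # g <= 0 (also used for g < 0)


def racing_tags(start_label, end_label, path_cond=ALWAYS):
    """Return the main thread tags whose child tag appears at the other
    endpoint under a condition consistent with the path condition."""
    found = set()
    for a, b in ((start_label, end_label), (end_label, start_label)):
        for tag, cond in a.items():
            if not tag.isupper():
                continue  # only main thread tags are matched against children
            child_cond = b.get(tag.lower())
            if child_cond is not None and cond & child_cond & path_cond:
                found.add(tag)
    return found


paths = {
    "complete path 1": ({"T1": ALWAYS, "T3": ALWAYS},
                        {"T1": G_NONPOS, "T2": G_POS, "T3": ALWAYS}, G_POS),
    "complete path 2": ({"t1": ALWAYS, "t3": ALWAYS},
                        {"T1": ALWAYS, "T3": ALWAYS}, G_POS),
    "complete path 3": ({"t1": ALWAYS, "t3": ALWAYS},
                        {"T1": G_NONPOS, "T2": G_POS, "T3": ALWAYS}, G_POS),
    "complete path 4": ({}, {"t3": ALWAYS}, G_NONPOS),
}

results = {name: racing_tags(*args) for name, args in paths.items()}
for name, tags in results.items():
    print(name, "race via", sorted(tags) if tags else "none")
```

Under these assumptions the sketch reports no race for complete paths 1 and 4, a race through T1 and T3 for complete path 2, and a race through T3 only for complete path 3, in line with the conclusions stated above.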
In the data race check method described above, the constructed thread flow graph has edge relationships that reflect the control flow direction (data flow direction) and labels that contain path conditions, so it supports flow-sensitive analysis and path-sensitive analysis. In addition, the thread flow graph of each program function is constructed bottom-up, so the constructed thread flow graph incorporates the context influence of the lower program functions. Furthermore, when the global labels are determined, if a program function call exists, the analysis results of the called program function (those corresponding to incomplete paths) are substituted into the caller, so the determined global labels incorporate the context influence from the caller to the callee and thus the complete context. In other words, embodiments of the present specification provide a data race analysis scheme that is flow-sensitive, path-sensitive, and context-sensitive.
In addition, with the data race check method described above, when bottom-up data flow analysis is performed on the program source code of the concurrent program, multiple real nodes between which the data flow direction does not change are merged into a single thread flow graph node. This reduces the number of nodes in the constructed thread flow graph and improves the efficiency of the data race check.
In addition, with the data race check method described above, the set of complete data flow paths corresponding to data access operations is obtained by performing bottom-up data flow analysis on the program source code, and the set of candidate check paths is obtained by removing from it the data flow paths whose start node and end node both correspond to non-write operations. Only some of the paths, rather than all of them, are therefore analyzed, which reduces the workload of the data race check and improves its efficiency. Removing unreachable data flow paths from the set of complete data flow paths further reduces the number of paths to be analyzed, which further reduces the workload and improves efficiency.
Fig. 6 shows a block diagram of a data race check device 600 according to an embodiment of the present specification. As shown in fig. 6, the data race check apparatus 600 includes a thread flow graph construction unit 610, a check path set determination unit 620, a global annotation determination unit 630, and a data race check unit 640.
The thread flow graph construction unit 610 is configured to construct, for each program function, a respective thread flow graph TFG(N, E, n_entry, n_exit) by performing bottom-up data flow analysis on the program source code of the concurrent program. Here, N = {n_1, n_2, ..., n_k} represents the set of thread flow graph nodes, and E = {<n_i, n_j> | n_i, n_j ∈ N} represents the set of directed edges, each reflecting control flow from thread flow graph node n_i to n_j. Each program statement in the program function corresponds to one real node in the thread flow graph node set N, and N further comprises virtual nodes created based on thread operations and function calls. n_entry and n_exit represent the entry node and the exit node of the program function, respectively. Each thread flow graph node has a label reflecting the thread tags and/or path conditions involved when the thread flow graph node runs.
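As an illustration of the tuple TFG(N, E, n_entry, n_exit), the following Python sketch shows one possible in-memory representation. The class name `ThreadFlowGraph`, its field names, and the use of plain strings for labels are assumptions made for this example only; the specification does not prescribe a data layout.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadFlowGraph:
    """Hypothetical container for TFG(N, E, n_entry, n_exit)."""
    entry_node: str
    exit_node: str
    nodes: set = field(default_factory=set)      # N: real, virtual, entry, exit
    edges: dict = field(default_factory=dict)    # E: (src, dst) -> edge kind
    labels: dict = field(default_factory=dict)   # node -> label expression

    def __post_init__(self):
        # The entry and exit nodes always belong to the node set.
        self.nodes |= {self.entry_node, self.exit_node}

    def add_edge(self, src, dst, kind):
        """Record a directed edge such as Flow, Call, Create or Join."""
        self.nodes |= {src, dst}
        self.edges[(src, dst)] = kind

    def successors(self, node):
        """Nodes reachable from `node` over one directed edge."""
        return [dst for (src, dst) in self.edges if src == node]


# Thread flow graph of Func D as described for fig. 4A.
tfg_d = ThreadFlowGraph(entry_node="entry D", exit_node="exit D")
tfg_d.add_edge("entry D", "25", "Flow")
tfg_d.add_edge("25", "26", "Flow")
tfg_d.add_edge("26", "exit D", "Join")
tfg_d.labels.update({"entry D": "", "25": "", "26": "", "exit D": "-arg2"})

print(sorted(tfg_d.nodes), tfg_d.labels["exit D"])
```

Storing the edge kind alongside each edge keeps the label processing rules (Flow, Call, Create, Join) selectable per edge when labels are propagated.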
Optionally, in one example, the label may be an expression composed of operands and operations, where the operands include at least one of a constant thread tag, a variable thread tag, and a path condition, and the operations include at least one of an addition operation, a subtraction operation, and a binding operation.
The check path set determining unit 620 is configured to determine a set of candidate check paths by performing a bottom-up data flow analysis on a program source code of a concurrent program, a start node of each candidate check path in the set of candidate check paths representing one data access operation for a memory location, an end node representing another data access operation for the memory location, wherein at least one of the start node and the end node represents a data write operation for the memory location.
The global annotation determination unit 630 is configured to determine the global labels of the start node and the end node of each candidate check path in the set of candidate check paths using the thread flow graph TFG(N, E, n_entry, n_exit) of each program function.
The data race check unit 640 is configured to determine a data race check result of the concurrent program based on global labels of the start node and the end node of each candidate check path.
Fig. 7 shows a block diagram of one implementation example of a thread-flow graph construction unit 700 according to an embodiment of the present description. As shown in fig. 7, the thread-flow graph construction unit 700 includes a node creation module 710, an edge relationship determination module 720, and a node annotation determination module 730.
The node creation module 710, the edge relationship determination module 720, and the node annotation determination module 730 are configured to perform the thread flow graph construction process based on bottom-up data flow analysis, starting from the lowest program function of the concurrent program and continuing until the thread flow graph of the topmost program function has been constructed. While the topmost program function has not yet been processed, an upper program function of the current program function, or an unprocessed program function at the same level, is selected as the current program function for the next round of processing.
Specifically, for each current program function, the node creation module 710 is configured to create a thread-flow graph node for the current program function that includes real nodes corresponding to each program statement, virtual nodes created from thread creation and/or function calls, ingress nodes, and egress nodes.
The edge relationship determination module 720 is configured to determine a directed edge relationship between various thread flow graph nodes of the current program function based on a data flow analysis of program source code of the current program function.
The node label determining module 730 is configured to determine labels of nodes of each thread flow graph of the current program function using label processing rules corresponding to the determined directed edge relationships and node labels of the lower program function.
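The bottom-up processing order used by these modules can be derived from the call graph with a topological sort, so that every callee is processed before its callers. The sketch below uses the standard-library `graphlib` module (Python 3.9 or later); the call graph literal is transcribed from the example of fig. 2 as described in the text and is otherwise an assumption.

```python
from graphlib import TopologicalSorter

# Call graph of the example: each function maps to the functions it calls
# directly or starts as threads (its lower program functions).
calls = {
    "Func A": {"Func B", "Func D"},
    "Func B": {"Func C"},
    "Func C": {"Func D"},
    "Func D": set(),
}

# TopologicalSorter treats the mapped values as prerequisites, so callees
# are emitted before their callers, i.e. in bottom-up order.
order = list(TopologicalSorter(calls).static_order())
print(order)
```

For this call graph the only valid order is Func D, Func C, Func B, Func A. Recursive call graphs contain cycles, for which `TopologicalSorter` raises `CycleError`; handling recursion would need additional treatment that this sketch does not cover.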
Fig. 8 shows a block diagram of one implementation example of the inspection path set determination unit 800 according to an embodiment of the present specification. As shown in fig. 8, the inspection path set determining unit 800 includes a complete path set acquiring module 810 and an inspection path set determining module 820.
The complete path set obtaining module 810 is configured to obtain a complete data flow path set corresponding to a data access operation by performing a bottom-up data flow analysis on a program source code, wherein a start node of each complete data flow path in the complete data flow path set represents one data access operation for a memory location, and an end node represents another data access operation for the memory location.
The check path set determination module 820 is configured to remove, from the set of complete data flow paths, the data flow paths whose start node and end node both correspond to non-write operations, so as to obtain the set of candidate check paths.
The data race check method and the data race check apparatus according to the embodiments of the present specification have been described above with reference to figs. 1 to 8. The data race check apparatus may be implemented in hardware, in software, or in a combination of hardware and software.
Fig. 9 shows a schematic diagram of a computer-implemented data race check apparatus 900 according to an embodiment of the present specification. As shown in fig. 9, the data race check apparatus 900 may include at least one processor 910, a storage (e.g., a non-volatile storage) 920, a memory 930, and a communication interface 940, which are connected together via a bus 960. The at least one processor 910 executes a computer program (i.e., the elements described above that are implemented in software) stored or encoded in the memory.
In one embodiment, a computer program is stored in the memory that, when executed, causes the at least one processor 910 to: determine a set of candidate check paths by performing bottom-up data flow analysis on the program source code of a concurrent program, where the start node of each candidate check path in the set represents one data access operation on a memory location, the end node represents another data access operation on the same memory location, and at least one of the start node and the end node represents a data write operation on the memory location; determine the global labels of the start node and the end node of each candidate check path in the set using the thread flow graph TFG(N, E, n_entry, n_exit) of each program function; and determine a data race check result of the concurrent program according to the global labels of the start node and the end node of each candidate check path. The thread flow graph TFG(N, E, n_entry, n_exit) of each program function is constructed by performing bottom-up data flow analysis on the program source code, where N = {n_1, n_2, ..., n_k} represents the set of thread flow graph nodes and E = {<n_i, n_j> | n_i, n_j ∈ N} represents the set of directed edges, each reflecting control flow from thread flow graph node n_i to n_j. Each program statement in the program function corresponds to one real node in the thread flow graph node set N, and N further comprises virtual nodes created based on thread operations and function calls. n_entry and n_exit represent the entry node and the exit node of the program function, respectively, and each thread flow graph node has a label reflecting the thread tags and/or path conditions involved when the thread flow graph node runs.
It should be appreciated that the computer programs stored in the memory, when executed, cause the at least one processor 910 to perform the various operations and functions described in connection with fig. 1-8 in various embodiments of the present description.
According to one embodiment, a program product such as a computer-readable medium (e.g., a non-transitory computer-readable medium) is provided. The computer-readable medium may have a computer program (i.e., the elements described above that are implemented in software) that, when executed by a processor, causes the processor to perform the various operations and functions described in connection with figs. 1 to 8 in the various embodiments of the present specification. In particular, a system or apparatus equipped with a readable storage medium may be provided, where the readable storage medium stores software program code that implements the functions of any of the above embodiments, and a computer or processor of the system or apparatus reads out and executes the computer program stored in the readable storage medium.
In this case, the program code itself read from the readable medium can realize the functions of any of the above embodiments, and thus the computer readable code and the readable storage medium storing the computer readable code form part of the present specification.
Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or cloud by a communications network.
According to one embodiment, a computer program product is provided that includes a computer program that, when executed by a processor, causes the processor to perform the various operations and functions described above in connection with fig. 1-8 in various embodiments of the present description.
It will be appreciated by those skilled in the art that various changes and modifications can be made to the embodiments disclosed above without departing from the spirit of the invention. Accordingly, the scope of the invention should be limited only by the attached claims.
It should be noted that not all the steps and units in the above flowcharts and the system configuration diagrams are necessary, and some steps or units may be omitted according to actual needs. The order of execution of the steps is not fixed and may be determined as desired. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by multiple physical entities, or may be implemented jointly by some components in multiple independent devices.
In the above embodiments, the hardware units or modules may be implemented mechanically or electrically. For example, a hardware unit, module or processor may include permanently dedicated circuitry or logic (e.g., a dedicated processor, FPGA or ASIC) to perform the corresponding operations. The hardware unit or processor may also include programmable logic or circuitry (e.g., a general purpose processor or other programmable processor) that may be temporarily configured by software to perform the corresponding operations. The particular implementation (mechanical, or dedicated permanent, or temporarily set) may be determined based on cost and time considerations.
The detailed description set forth above in connection with the appended drawings describes exemplary embodiments, but does not represent all embodiments that may be implemented or that fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous over other embodiments." The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A data race checking method for a concurrent program, comprising:
determining a set of candidate checking paths by performing bottom-up data flow analysis on program source code of the concurrent program, wherein a start node of each candidate checking path in the set of candidate checking paths represents a data access operation for a memory location, and an end node represents another data access operation for the memory location, wherein at least one of the start node and the end node represents a data write operation for the memory location;
determining global labels of the start node and the end node of each candidate checking path in the set of candidate checking paths by using a thread flow graph TFG (N, E, nentry, nexit) of each program function of the concurrent program; and
determining a data race checking result of the concurrent program according to the global labels of the start node and the end node of each candidate checking path,
wherein the thread flow graph TFG (N, E, nentry, nexit) of each program function is constructed by bottom-up data flow analysis of the program source code, N represents a set of thread flow graph nodes, E represents a set of directed edges reflecting control flow between thread flow graph nodes, each program statement in the program function corresponds to a real node in the thread flow graph node set N, the thread flow graph node set N further comprises virtual nodes created based on thread operations and function calls, nentry and nexit represent the entry node and the exit node of the program function, respectively, and each thread flow graph node has a label reflecting the thread tags and/or path conditions involved in the running of the thread flow graph node.
2. The data race checking method of claim 1, wherein the label is an expression composed of operands and operations, the operands including at least one of a constant thread tag, a variable thread tag, and a path condition, and the operations including at least one of an addition operation, a subtraction operation, and a bind operation.
3. The data race checking method of claim 1, wherein the thread flow graph TFG (N, E, nentry, nexit) of each program function is constructed by bottom-up data flow analysis of the program source code as follows:
starting from the bottom-most program function of the concurrent program, the following processing is performed based on the bottom-up data flow analysis until the processing of the top-most program function is completed:
creating thread flow graph nodes of a current program function, wherein the thread flow graph nodes comprise a real node corresponding to each program statement, virtual nodes created according to thread creation and/or function calls, an entry node, and an exit node;
determining directed edge relationships between the thread flow graph nodes of the current program function according to data flow analysis of the program source code of the current program function; and
determining the label of each thread flow graph node of the current program function by using the label processing rules corresponding to the determined directed edge relationships and the node labels of the lower-level program functions.
4. The data race checking method of claim 1, wherein the directed edges include at least one of the following directed edges:
directed edge → flow, directed edge → create, directed edge → join and directed edge → call.
5. The data race checking method of claim 3, wherein, in the bottom-up data flow analysis of the program source code of the concurrent program, a plurality of real nodes between which the data flow direction does not change are merged into a single thread flow graph node.
6. The data race checking method of claim 1, wherein determining the set of candidate checking paths by performing bottom-up data flow analysis on the program source code of the concurrent program comprises:
acquiring a complete data flow path set corresponding to data access operations by performing bottom-up data flow analysis on the program source code of the concurrent program, wherein a start node of each complete data flow path in the complete data flow path set represents one data access operation for a memory location, and an end node represents another data access operation for the memory location; and
removing, from the complete data flow path set, data flow paths whose start node and end node both correspond to non-write operations, to obtain the set of candidate checking paths.
7. The data race checking method of claim 6, wherein removing, from the complete data flow path set, data flow paths whose start node and end node both correspond to non-write operations to obtain the set of candidate checking paths comprises:
removing, from the complete data flow path set, unreachable data flow paths and data flow paths whose start node and end node both correspond to non-write operations, to obtain the set of candidate checking paths.
8. The data race checking method of claim 1, wherein determining the data race checking result according to the global labels of the start node and the end node of each candidate checking path comprises:
for each candidate checking path, determining that the program statements represented by the start node and the end node are in a data race relationship when the global labels of the start node and the end node contain the same thread tag or thread tags having a correspondence.
9. The data race checking method of claim 1, wherein construction of the thread flow graph is performed in parallel with determination of the candidate checking paths.
10. A data race checking apparatus for a concurrent program, the data race checking apparatus comprising:
at least one processor;
a memory coupled to the at least one processor; and
a computer program stored in the memory, the at least one processor executing the computer program to implement:
determining a set of candidate checking paths by performing bottom-up data flow analysis on program source code of the concurrent program, wherein a start node of each candidate checking path in the set of candidate checking paths represents a data access operation for a memory location, and an end node represents another data access operation for the memory location, wherein at least one of the start node and the end node represents a data write operation for the memory location;
determining global labels of the start node and the end node of each candidate checking path in the set of candidate checking paths by using a thread flow graph TFG (N, E, nentry, nexit) of each program function; and
determining a data race checking result of the concurrent program according to the global labels of the start node and the end node of each candidate checking path,
wherein the thread flow graph TFG (N, E, nentry, nexit) of each program function is constructed by bottom-up data flow analysis of the program source code, N represents a set of thread flow graph nodes, E represents a set of directed edges reflecting control flow between thread flow graph nodes, each program statement in the program function corresponds to a real node in the thread flow graph node set N, the thread flow graph node set N further comprises virtual nodes created based on thread operations and function calls, nentry and nexit represent the entry node and the exit node of the program function, respectively, and each thread flow graph node has a label reflecting the thread tags and/or path conditions involved in the running of the thread flow graph node.
11. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the data race checking method of any one of claims 1 to 9.
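As an illustration of the claimed flow, and not the patented implementation, the following Python sketch models a thread flow graph TFG (N, E, nentry, nexit), filters candidate checking paths in the manner of claims 1 and 6, and applies the thread tag comparison worded in claim 8. The names Node, TFG, Access, candidate_paths and is_race are hypothetical. The sketch also assumes that a label can be reduced to a plain set of thread tags, that a candidate checking path can be represented by its start and end accesses only, and that tag correspondences are supplied by the caller as pairs. The label expressions and path conditions of claim 2 are not modeled.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Edge kinds named in claim 4.
EDGE_KINDS = {"flow", "create", "join", "call"}


@dataclass(frozen=True)
class Node:
    name: str
    kind: str = "real"              # "real", "virtual", "entry" or "exit"
    label: frozenset = frozenset()  # assumption: a label is a set of thread tags


@dataclass
class TFG:
    """Thread flow graph of one program function: (N, E, n_entry, n_exit)."""
    entry: Node
    exit_node: Node
    nodes: dict = field(default_factory=dict)  # N, keyed by node name
    edges: set = field(default_factory=set)    # E, as (src, dst, kind) triples

    def __post_init__(self):
        self.add_node(self.entry)
        self.add_node(self.exit_node)

    def add_node(self, node):
        self.nodes[node.name] = node
        return node

    def add_edge(self, src, dst, kind="flow"):
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be nodes of this graph")
        self.edges.add((src, dst, kind))


@dataclass(frozen=True)
class Access:
    node: str                # name of the TFG node performing the access
    location: str            # memory location being accessed
    is_write: bool
    global_label: frozenset  # assumption: global label as a set of thread tags


def candidate_paths(accesses):
    """Keep pairs on the same location where at least one access is a write."""
    return [(a, b) for a, b in combinations(accesses, 2)
            if a.location == b.location and (a.is_write or b.is_write)]


def is_race(start, end, corresponds=frozenset()):
    """Literal reading of claim 8: a shared tag or a pair of corresponding tags."""
    if start.global_label & end.global_label:
        return True
    return any((s, e) in corresponds or (e, s) in corresponds
               for s in start.global_label for e in end.global_label)


# Small usage example with made-up statements s1..s3 and tags t1, t2.
accesses = [
    Access("s1", "x", True, frozenset({"t1"})),
    Access("s2", "x", False, frozenset({"t1"})),
    Access("s3", "x", False, frozenset({"t2"})),
]
races = [(a.node, b.node) for a, b in candidate_paths(accesses) if is_race(a, b)]
print(races)  # [('s1', 's2')]
```

The read/read pair (s2, s3) is dropped before any label comparison, which corresponds to the filtering step of claim 6. How global labels are actually derived from the per-function thread flow graphs is left out of this sketch.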
CN202110280390.2A 2021-03-16 Concurrent program data competition checking method and device Active CN112965838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110280390.2A CN112965838B (en) 2021-03-16 Concurrent program data competition checking method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110280390.2A CN112965838B (en) 2021-03-16 Concurrent program data competition checking method and device

Publications (2)

Publication Number Publication Date
CN112965838A CN112965838A (en) 2021-06-15
CN112965838B CN112965838B (en) 2024-04-19

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101512503A (en) * 2005-04-29 2009-08-19 Microsoft Corporation XML application framework
CN102073589A (en) * 2010-12-29 2011-05-25 Beijing University of Posts and Telecommunications Code static analysis-based data race detecting method and system thereof
JP2013156786A (en) * 2012-01-30 2013-08-15 Hitachi Automotive Systems Ltd Software structure visualization program and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
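Static detection method for data races in multi-threaded parallel programs; 陈俊; 周宽久; 贾敏; Computer Engineering and Design (05); full text *
Concurrency complexity in network programming; 李慧霸; 田甜; 彭宇行; 李东升; 卢锡城; Journal of Software (01); full text *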

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant