CN117093502B - Method and device for detecting parallelism of program codes - Google Patents

Publication number
CN117093502B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311331747.0A
Other languages
Chinese (zh)
Other versions
CN117093502A (en)
Inventor
卫思为
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202311331747.0A
Publication of CN117093502A
Application granted
Publication of CN117093502B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/433 Dependency analysis; Data or control flow analysis

Abstract

The embodiment of the specification provides a method and a device for detecting parallelism of program codes. The method comprises the following steps: splitting the program code into a plurality of program units and determining the dependency relationship among the plurality of program units; constructing a dependency graph according to the dependency relationship, wherein the dependency graph comprises a plurality of nodes corresponding to a plurality of program units; when the dependency graph is a directed acyclic graph, determining respective depth numbers of the plurality of nodes according to the dependency graph; determining respective depth ranges of a plurality of nodes, wherein for any first node in the plurality of nodes, the lower limit value corresponding to the depth range is the depth number of the first node, the upper limit value corresponding to the depth range is determined based on the depth number of the neighbor node, and the neighbor node directly depends on the first node; and determining a program unit set which allows parallel execution according to the respective depth ranges of the plurality of nodes, wherein the program unit set comprises a first program unit and a second program unit, and the depth ranges of the respective corresponding nodes have overlapping areas.

Description

Method and device for detecting parallelism of program codes
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technology, and more particularly, to a method and apparatus for detecting parallelism of program code.
Background
Parallel technology is one of the important means of improving program performance. However, developers may not be familiar with parallel technology, and when writing program code in a high-level programming language they tend to focus on characteristics such as correctness and readability, so opportunities to improve program performance through parallel technology are easily overlooked.
A new solution is desired to efficiently discover opportunities in program code that allow parallel techniques to be employed.
Disclosure of Invention
One or more embodiments of the present specification provide a method and apparatus for detecting parallelism of program codes.
In a first aspect, there is provided a method for detecting parallelism of program code, the method comprising: splitting program code into a plurality of program units and determining dependency relationships among the plurality of program units; constructing a dependency graph according to the dependency relationship, wherein the dependency graph comprises a plurality of nodes corresponding to the plurality of program units; when the dependency graph is a directed acyclic graph, determining respective depth numbers of the plurality of nodes according to the dependency graph; determining respective depth ranges of the plurality of nodes, wherein for any first node in the plurality of nodes, a lower limit value corresponding to the depth range of the first node is a depth number thereof, an upper limit value corresponding to the depth range of the first node is determined based on a depth number of a neighbor node, and the neighbor node directly depends on the first node; and determining a program unit set which allows parallel execution according to the respective depth ranges of the plurality of nodes, wherein the program unit set comprises a first program unit and a second program unit, and the depth ranges of the nodes corresponding to the program unit set have an overlapping area.
In a possible implementation manner, the splitting the program code into a plurality of program units includes: splitting the program code into a plurality of statements; for any first statement in the plurality of statements, determining whether a function call exists in the first statement; if yes, determining a called function from the first statement, and a function call primitive depending on the called function, wherein the called function and the function call primitive are used as a program unit; and if not, taking the first statement as a program unit.
In one possible implementation manner, the determining the dependency relationship between the plurality of program units includes determining the dependency relationship between the plurality of program units according to the data objects accessed by the plurality of program units and the access types of the plurality of program units to the corresponding data objects.
In a possible implementation manner, the determining the respective depth numbers of the plurality of nodes includes: adding to the dependency graph an entry node, an exit node, a directed edge from the entry node to a second node, and a directed edge from a third node to the exit node, to obtain a target directed acyclic graph, wherein the in-degree of the second node in the dependency graph is 0 and the out-degree of the third node in the dependency graph is 0; and, for any node in the target directed acyclic graph, determining a depth number of the node, the depth number being the number of directed edges included in the longest directed path from the entry node to the node.
In one possible embodiment, the upper limit value corresponding to the depth range of the first node is 1 less than the minimum value among the respective depth numbers of the at least one neighbor node directly depending on the first node.
In a possible implementation manner, the determining a set of program units that allow parallel execution according to respective depth ranges of the plurality of nodes includes: determining a depth interval to be queried according to the depth numbers of the entry node and the exit node; for any first depth number located in the depth interval, querying target nodes from the plurality of nodes, wherein the depth range of each target node includes the first depth number; and when the number of the target nodes is not less than 1, adding the program units corresponding to the target nodes to the program unit set.
In one possible embodiment, the method further comprises: and determining a first computing task and a second computing task which are allowed to be executed in parallel according to the program unit set, wherein the first computing task at least comprises the first program unit, and the second computing task at least comprises the second program unit.
In a possible embodiment, the set of program elements further comprises a third program element dependent on the second program element; wherein the second computing task further comprises the third program unit.
In one possible embodiment, the method further comprises: determining resource consumption information corresponding to each of the first computing task and the second computing task; and determining whether to execute the first computing task and the second computing task in parallel according to the resource consumption information.
In one possible embodiment, the method further comprises: when the dependency graph is not a directed acyclic graph, decomposing the dependency graph into a plurality of strongly connected components; determining a trivial strongly connected component from the plurality of strongly connected components; and, for any program unit of the plurality of program units, adding the program unit to the program unit set if the node corresponding to the program unit is located in the trivial strongly connected component.
In one possible implementation, the program code is written in Python.
In a second aspect, there is provided a parallelism detecting apparatus of program code, the apparatus comprising: a code splitting unit configured to split the program code into a plurality of program units; a dependency analysis unit configured to determine a dependency relationship between the plurality of program units; a composition processing unit configured to construct a dependency graph according to the dependency relationship, wherein the dependency graph includes a plurality of nodes corresponding to the plurality of program units; a depth determination unit configured to determine respective depth numbers of the plurality of nodes when the dependency graph is a directed acyclic graph; a range determining unit configured to determine respective depth ranges of the plurality of nodes, wherein, for any first node of the plurality of nodes, a lower limit value corresponding to the depth range is a depth number thereof, and an upper limit value corresponding to the depth range is determined based on a depth number of a neighboring node, the neighboring node directly depends on the first node; and the parallel detection unit is configured to determine a program unit set which allows parallel execution according to the respective depth ranges of the plurality of nodes, wherein the program unit set comprises a first program unit and a second program unit, and the depth ranges of the nodes corresponding to the first program unit and the second program unit have overlapping areas.
In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program/instruction which, when executed in a computing device, implements the method of any of the first aspects.
In a fourth aspect, there is provided a computing device comprising a memory having stored therein a computer program/instruction and a processor executing the computer program/instruction to implement the method of any of the first aspects.
With the method and the apparatus provided in one or more embodiments of the present disclosure, a dependency graph including a plurality of nodes corresponding to a plurality of program units is constructed based on the dependency relationships among the plurality of program units included in the program code. When the dependency graph is a directed acyclic graph, the respective depth numbers of the plurality of nodes are determined, and the respective depth ranges of the plurality of nodes are determined based on those depth numbers, wherein for any first node in the plurality of nodes, the lower limit value corresponding to its depth range is its depth number, and the upper limit value corresponding to its depth range is determined based on the depth number of a neighbor node directly depending on the first node. When the depth ranges of any two nodes overlap, the two program units corresponding to the two nodes are necessarily allowed to be executed in parallel; accordingly, a set of program units that are allowed to be executed in parallel can be determined according to the respective depth ranges of the plurality of nodes. In this way, program units in the program code that allow parallel execution can be found more efficiently and accurately, i.e. opportunities to employ parallel technology can be found more efficiently and accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings required for the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for detecting parallelism of program codes according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating an example of splitting program code into a plurality of program units;
FIG. 3 is a schematic diagram illustrating an example of splitting a statement in program code into program units;
FIG. 4 is a schematic diagram illustrating an example of constructing a dependency graph corresponding to a plurality of program units;
FIG. 5 is a schematic diagram illustrating an example of obtaining a target directed acyclic graph corresponding to a dependency graph;
FIG. 6 is a schematic diagram of an exemplary target directed acyclic graph;
FIG. 7 is a schematic diagram illustrating an example of obtaining a program unit set based on the depth ranges of program units;
FIG. 8 is a schematic diagram of an exemplary dependency graph containing strongly connected components;
FIG. 9 is a schematic structural diagram of a parallelism detecting apparatus for program code according to an embodiment of the present disclosure.
Description of the embodiments
Various non-limiting embodiments provided by the present specification are described in detail below with reference to the attached drawings.
Parallelism detection of program code refers to detecting, in program code that would otherwise be executed serially, sets of program statements that can be executed in parallel or concurrently without changing the semantics of the program code. Appropriate use of parallel technology exploits the multi-core processing capability of modern computing devices to speed up the program code and thereby improve program performance.
Parallel technology is a typical solution for improving program performance. Taking program code written in the high-level programming language Python as an example, the parallel technology frameworks that may be employed include, but are not limited to, built-in modules such as multiprocessing and threading, and distributed frameworks such as Spark and Ray. A parallel technology framework facilitates the parallelization of program code. However, those skilled in the art have mainly focused on how to automatically convert program code that would otherwise be executed serially into a parallel version once it is known which program statements are allowed to be executed in parallel; for example, automatic conversion may be implemented by polyhedral optimization techniques such as autopilel and autophc, and automatic conversion of domain-specific program code may be implemented by quantpoud and HPAT. As for how to find which program statements in serially executed program code are allowed to be executed in parallel, manual labeling by a professional is usually required, which is inefficient and inaccurate.
The embodiments of the present specification provide at least a method and an apparatus for detecting the parallelism of program code. A dependency graph containing a plurality of nodes corresponding to a plurality of program units is constructed based on the dependency relationships among the program units included in the program code. When the dependency graph is a directed acyclic graph, the respective depth numbers of the plurality of nodes are determined according to the dependency graph, and the respective depth ranges of the plurality of nodes are determined based on those depth numbers, wherein for any first node in the plurality of nodes, the lower limit value corresponding to its depth range is its depth number, and the upper limit value corresponding to its depth range is determined based on the depth number of a neighbor node directly depending on the first node. When the depth ranges of any two nodes overlap, the two program units corresponding to the two nodes are necessarily allowed to be executed in parallel; accordingly, a set of program units that are allowed to be executed in parallel can be determined according to the respective depth ranges of the plurality of nodes. In this way, program units in the program code that allow parallel execution can be found more efficiently and accurately, i.e. opportunities to employ parallel technology can be found more efficiently and accurately.
Fig. 1 is a flowchart of a method for detecting parallelism of program codes according to an embodiment of the present disclosure. The method may be performed by any apparatus, platform, device, or cluster of devices having computing/processing capabilities.
Referring to fig. 1, the method may include, but is not limited to, part or all of the following steps S101 to S119.
In step S101, the program code is split into a plurality of program units, and the dependency relationship between the plurality of program units is determined.
The program code to be split is program code that would otherwise need to be executed serially; it may be source code written in any of various programming languages, for example Python or another high-level language.
The program code may first be split at coarse granularity into multiple statements. For any first statement in the multiple statements, it is determined whether the first statement is a coarse-grained statement that needs to be split further; for example, when there is a function call in the first statement, the first statement may be determined to be a coarse-grained statement that needs to be split further. If the first statement is a coarse-grained statement that needs to be split further, the first statement is split into at least two program units according to corresponding splitting rules; for example, when there is a function call in the first statement, a called function and a function call primitive depending on the called function may be determined from the first statement, and both the called function and the function call primitive are used as program units. Accordingly, if the first statement is not a coarse-grained statement that needs to be split further, the first statement may be treated as a separate program unit.
An abstract syntax tree (AST) corresponding to the program code may first be generated, and all statement subtrees may then be extracted from the AST; a single statement subtree corresponds to a single statement in the program code.
The program code may be split into a plurality of program elements by a computer program corresponding to the pseudocode as exemplified below:
Pseudocode corresponding to a computer program for splitting program code
Input: Code // the program code to be split
Output: ProgramUnits // the set of program units obtained by splitting
//***//
AST = parseAST(Code); // generate the AST of the program code
Statements = getStatementSubtrees(AST); // extract all statement subtrees
ProgramUnits = ∅; // initialize the set of program units
For Statement ∈ Statements do // executed for each Statement
    If isCoarseGrained(Statement) then // Statement is a coarse-grained statement
        Units = decomposeStatement(Statement); // extract the program units Units from Statement
        ProgramUnits = ProgramUnits ∪ Units; // add Units to ProgramUnits
    Else // Statement is not a coarse-grained statement that needs further splitting
        ProgramUnits = ProgramUnits ∪ {Statement}; // add Statement to ProgramUnits
    End
End
Return ProgramUnits // output the set of program units obtained by splitting
//***//
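By way of a non-limiting illustration, the pseudocode above could be sketched in Python with the standard `ast` module (Python 3.9 or later for `ast.unparse`). The names `split_into_units`, `_MaskCalls` and `_text` are hypothetical and do not come from the embodiments; the sketch also assumes that only top-level statements are considered and that every call expression counts as a called function, with nested calls shown as `...`.

```python
import ast
import copy


class _MaskCalls(ast.NodeTransformer):
    """Replace each call met during the traversal with a literal `...`."""

    def visit_Call(self, node):
        # Not descending into the call keeps the masking at the outermost calls.
        return ast.Constant(value=Ellipsis)


def _text(node):
    """Source text of `node` with the calls nested inside it shown as `...`."""
    clone = copy.deepcopy(node)
    _MaskCalls().generic_visit(clone)  # masks calls below `clone`, not `clone` itself
    return ast.unparse(clone)


def split_into_units(code):
    """Hypothetical counterpart of the pseudocode, for top-level statements only."""
    program_units = []
    for statement in ast.parse(code).body:  # one subtree per statement
        calls = [n for n in ast.walk(statement) if isinstance(n, ast.Call)]
        if calls:  # coarse-grained statement: it contains a function call
            program_units.extend(_text(call) for call in calls)  # called functions
            program_units.append(_text(statement))  # function call primitive
        else:
            program_units.append(ast.unparse(statement))
    return program_units


units = split_into_units("h = hello()\nw = world()\nr = HW(h, w)")
print(units)
```

For the three statements of the Fig. 2 example this yields six units, with each function call primitive printed as, for example, `h = ...`.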
Referring to fig. 2, the exemplary program code includes the statements "h = hello()", "w = world()" and "r = HW(h, w)", which would otherwise be executed serially, and can correspondingly be split into 6 program units: the called function "hello()", the function call primitive "h = …", the called function "world()", the function call primitive "w = …", the called function "HW(h, w)" and the function call primitive "r = …".
Referring to fig. 3, the exemplary program code includes the statement "r = w × f(f1(a), f2(b)) + g(c)", which can correspondingly be split into: the called function "f1(a)", the called function "f2(b)", the called function "g(c)", the unit "f(…, …)", which is at the same time a function call primitive and a called function, and the function call primitive "r = w × f(…, …) + …".
The execution order of program units between which a dependency relationship exists cannot be changed, so such program units cannot be executed in parallel. The types of dependency relationship between program units include: Read-after-Write, in which a data object is first written and then read; Write-after-Read, in which a data object is first read and then written; Write-after-Write, in which the same data object is written successively; and a dependency relationship produced by splitting a coarse-grained statement (named Compound). The dependency relationship between a called function in a coarse-grained statement containing a function call and the function call primitive depending on that called function belongs to Compound. It should be noted that a function call is essentially a read access, by the function call primitive, to the called function as a data object.
Based on the above dependency relationships, the dependency relationships among the program units can be determined according to the data objects accessed by the program units and the access types (read or write) of the program units to the corresponding data objects.
For example, see fig. 4. The data object accessed, in a logical sense, by the program unit "hello()" is the function body hello(), and the corresponding access type is "read"; the data object accessed by the program unit "h = …" is the parameter h, and the corresponding access type is "write"; the data object accessed by the program unit "world()" is the function body world(), and the corresponding access type is "read"; the data object accessed by the program unit "w = …" is the parameter w, and the corresponding access type is "write"; the data object accessed by the program unit "HW(h, w)" is the function body HW(h, w), and the corresponding access type is "read"; the data object accessed by the program unit "r = …" is the parameter r, and the corresponding access type is "write".
The called function "hello()" is called by the function call primitive "h = …", the called function "world()" is called by the function call primitive "w = …", and the called function "HW(h, w)" is called by the function call primitive "r = …". Thus "h = …" depends on "hello()", "w = …" depends on "world()", and "r = …" depends on "HW(h, w)"; the type of each of these dependency relationships is Compound.
The data objects that the called function "HW(h, w)" needs to read include the parameter h written by the function call primitive "h = …" and the parameter w written by the function call primitive "w = …". Thus "HW(h, w)" depends on "h = …" and "w = …", and the type of these dependency relationships is Read-after-Write.
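The determination described above can be illustrated with a minimal Python sketch. The `Unit` record, its `callee` field and the function `find_dependencies` are hypothetical names introduced only for this illustration, and the read and write sets are entered by hand from the Fig. 4 example rather than derived automatically. The sketch compares every pair of units in program order, so it may also report dependencies that are implied transitively.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    reads: set = field(default_factory=set)   # data objects the unit reads
    writes: set = field(default_factory=set)  # data objects the unit writes
    callee: str = ""                          # called-function unit it was split from, if any


def find_dependencies(units):
    """Return (depended_on, dependent, kind) triples; `units` must be in program order."""
    deps = set()
    for j, later in enumerate(units):
        for earlier in units[:j]:
            if earlier.name == later.callee:
                deps.add((earlier.name, later.name, "Compound"))
            if earlier.writes & later.reads:   # written first, then read
                deps.add((earlier.name, later.name, "Read-after-Write"))
            if earlier.reads & later.writes:   # read first, then written
                deps.add((earlier.name, later.name, "Write-after-Read"))
            if earlier.writes & later.writes:  # written successively
                deps.add((earlier.name, later.name, "Write-after-Write"))
    return deps


units = [
    Unit("hello()", reads={"hello"}),
    Unit("h = …", writes={"h"}, callee="hello()"),
    Unit("world()", reads={"world"}),
    Unit("w = …", writes={"w"}, callee="world()"),
    Unit("HW(h, w)", reads={"HW", "h", "w"}),
    Unit("r = …", writes={"r"}, callee="HW(h, w)"),
]
dependencies = find_dependencies(units)
for dep in sorted(dependencies):
    print(dep)
```

Under these assumed access sets the sketch reports three Compound dependencies and two Read-after-Write dependencies, matching the relationships described for Fig. 4.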
Step S103, a dependency graph is constructed according to the dependency relationship, wherein the dependency graph comprises a plurality of nodes corresponding to a plurality of program units.
The program units themselves may be used directly as the nodes of the dependency graph; alternatively, to reduce the amount of data that needs to be read and written, each program unit may be assigned a node of relatively small data size that represents it, and the dependency graph is then constructed from the nodes corresponding to the program units. It will be appreciated that when one program unit is directly depended on by another program unit, the dependency graph includes a directed edge pointing from the node corresponding to the one program unit to the node corresponding to the other program unit.
Continuing with the previous example, reference is made to fig. 4. According to the dependency relationships among the 6 program units of the foregoing example, namely "hello()", "h = …", "world()", "w = …", "HW(h, w)" and "r = …", a dependency graph having these 6 program units as nodes may be correspondingly constructed.
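As a non-limiting sketch, the dependency graph of this example could be held as a mapping from each node to the nodes that directly depend on it, and checked for cycles with the standard `graphlib` module (Python 3.9 or later). The function names `build_dependency_graph` and `is_directed_acyclic` are hypothetical, and the list of dependency pairs is entered by hand from the example.

```python
from graphlib import CycleError, TopologicalSorter


def build_dependency_graph(units, dependencies):
    """Map each node to the set of nodes that directly depend on it."""
    graph = {unit: set() for unit in units}
    for depended_on, dependent in dependencies:
        graph[depended_on].add(dependent)  # directed edge depended_on -> dependent
    return graph


def is_directed_acyclic(graph):
    """Return True when the graph contains no directed cycle."""
    try:
        # Cycle detection does not depend on edge orientation, so the
        # successor mapping can be passed as it is.
        TopologicalSorter(graph).prepare()
    except CycleError:
        return False
    return True


units = ["hello()", "h = …", "world()", "w = …", "HW(h, w)", "r = …"]
dependencies = [
    ("hello()", "h = …"),
    ("world()", "w = …"),
    ("h = …", "HW(h, w)"),
    ("w = …", "HW(h, w)"),
    ("HW(h, w)", "r = …"),
]
graph = build_dependency_graph(units, dependencies)
print(is_directed_acyclic(graph))
```

Keeping only names in the mapping corresponds to the option of representing each program unit by a node of small data size.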
In step S105, when the dependency graph is a directed acyclic graph, the depth numbers of the nodes are determined.
More specifically, the depth numbers of the nodes are determined according to the dependency graph.
In a possible implementation manner, an entry node, an exit node, a directed edge from the entry node to a second node, and a directed edge from a third node to the exit node may be added to the dependency graph to obtain a target directed acyclic graph, where the in-degree of the second node in the dependency graph is 0 and the out-degree of the third node in the dependency graph is 0; next, for any node in the target directed acyclic graph, a depth number of the node is determined, wherein the depth number of the node is the number of directed edges included in the longest directed path from the entry node to the node.
For example, refer to fig. 5. In the dependency graph, the in-degree of the node "hello()" and of the node "world()" is 0, and the out-degree of the node "r = …" is 0. Accordingly, in addition to the entry node (i.e., node "entry") and the exit node (i.e., node "exit"), two directed edges starting at the node "entry" and ending at the node "hello()" and the node "world()" respectively need to be added, as well as one directed edge starting at the node "r = …" and ending at the node "exit".
Based on the target directed acyclic graph illustrated in fig. 5, it can be determined that the depth number of the node "entry" is 0, the depth numbers of the nodes "hello()" and "world()" are 1, the depth numbers of the nodes "h = …" and "w = …" are 2, the depth number of the node "HW(h, w)" is 3, the depth number of the node "r = …" is 4, and the depth number of the node "exit" is 5.
The same numbering applies to the target directed acyclic graph illustrated in fig. 6, in which the depth number of the node "entry" is 0, the depth numbers of the node A and the node B are 1, the depth number of the node C is 2, the depth number of the node D is 3, and the depth number of the node "exit" is 4.
Based on the same concept as the previous embodiment, the respective depth numbers of the plurality of nodes corresponding to the plurality of program units in the dependency graph may also be determined in other similar ways. For example, in another possible implementation manner, for any first node in the plurality of nodes, all directed paths ending at the first node may be searched for in the dependency graph, the target directed path containing the largest number of nodes is then determined among them, and the number of nodes included in the target directed path is used directly as the depth number of the first node.
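The entry/exit construction and the longest-path numbering can be illustrated with the following minimal Python sketch, applied to the Fig. 5 example. The function names `add_entry_exit` and `depth_numbers` are hypothetical, and the sketch assumes that no program unit is itself named "entry" or "exit". Processing the nodes in topological order is one possible way to obtain the longest path; the embodiments do not prescribe it.

```python
from graphlib import TopologicalSorter


def add_entry_exit(graph):
    """Return a copy of `graph` (node -> direct dependents) with entry and exit nodes."""
    has_predecessor = {v for dependents in graph.values() for v in dependents}
    target = {node: set(dependents) for node, dependents in graph.items()}
    # Entry node points to every node whose in-degree is 0.
    target["entry"] = {node for node in graph if node not in has_predecessor}
    for node, dependents in graph.items():
        if not dependents:  # out-degree 0: connect to the exit node
            target[node].add("exit")
    target["exit"] = set()
    return target


def depth_numbers(target):
    """Number of edges on the longest directed path from "entry" to each node."""
    predecessors = {node: set() for node in target}
    for node, dependents in target.items():
        for dependent in dependents:
            predecessors[dependent].add(node)
    depth = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(predecessors).static_order():
        depth[node] = max((depth[p] + 1 for p in predecessors[node]), default=0)
    return depth


graph = {
    "hello()": {"h = …"},
    "world()": {"w = …"},
    "h = …": {"HW(h, w)"},
    "w = …": {"HW(h, w)"},
    "HW(h, w)": {"r = …"},
    "r = …": set(),
}
depth = depth_numbers(add_entry_exit(graph))
print(depth)
```

For this graph the computed depth numbers agree with those stated for Fig. 5, from 0 for "entry" to 5 for "exit".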
Step S107, determining respective depth ranges of a plurality of nodes, wherein for any first node in the plurality of nodes, a lower limit value corresponding to the depth range of the first node is a depth number thereof, and an upper limit value corresponding to the depth range of the first node is determined based on the depth numbers of neighboring nodes, and the neighboring nodes directly depend on the first node.
In one possible implementation, when a target directed acyclic graph corresponding to the dependency graph exists, for any first node of the plurality of nodes corresponding to the plurality of program units there is at least one neighbor node in the target directed acyclic graph that directly depends on the first node. The upper limit value corresponding to the depth range of the first node is 1 less than the minimum value among the respective depth numbers of the at least one neighbor node; in other words, for any first node, the minimum value may first be determined from the respective depth numbers of the at least one neighbor node directly depending on the first node, and 1 is then subtracted from that minimum value to obtain the upper limit value corresponding to the depth range of the first node.
Based on the same concept as the foregoing embodiment, the upper limit value corresponding to the depth range of any first node of the plurality of nodes may also be determined without obtaining the target directed acyclic graph. For example, in the dependency graph, for any first node having at least one neighbor node, the upper limit value corresponding to the depth range of the first node may be determined according to the foregoing embodiment; for any first node that has no neighbor node, all leaf nodes having no neighbor node may first be determined from the dependency graph, and the maximum of the respective depth numbers of all leaf nodes is used as the upper limit value.
Referring to the target directed acyclic graph shown in fig. 6, the neighbor nodes directly depending on node A include node D; the depth number of node A is 1 and the depth number of node D is 3, so the upper limit value corresponding to the depth range of node A is 2, and the depth range of node A can be determined to be [1,2]. Similarly, the upper limit value and the lower limit value corresponding to the depth range of node B are both 1, and the depth range of node B can be determined to be [1]; the upper limit value and the lower limit value corresponding to the depth range of node C are both 2, and the depth range of node C can be determined to be [2]; the upper limit value and the lower limit value corresponding to the depth range of node D are both 3, and the depth range of node D can be determined to be [3]. It should be noted that a single depth value is to be understood, in a logical sense, as a depth range whose upper and lower limit values are the same.
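The depth ranges of this example can be reproduced with a short Python sketch. The function name `depth_ranges` is hypothetical. Because the drawing is not reproduced here, the edge set below is an assumption chosen to be consistent with the depth numbers and depth ranges stated for Fig. 6 (A and B depend on nothing, C depends on B, D depends on A and C); the actual figure may differ.

```python
def depth_ranges(target, depth):
    """Depth range (lower, upper) of every node except "entry" and "exit"."""
    ranges = {}
    for node, dependents in target.items():
        if node in ("entry", "exit"):
            continue
        # Lower limit: the node's own depth number.
        # Upper limit: 1 less than the smallest depth number among the
        # neighbor nodes that directly depend on the node.
        ranges[node] = (depth[node], min(depth[d] for d in dependents) - 1)
    return ranges


# Assumed edge set for Fig. 6 (node -> nodes that directly depend on it).
target = {
    "entry": {"A", "B"},
    "A": {"D"},
    "B": {"C"},
    "C": {"D"},
    "D": {"exit"},
    "exit": set(),
}
# Depth numbers as stated for Fig. 6.
depth = {"entry": 0, "A": 1, "B": 1, "C": 2, "D": 3, "exit": 4}
ranges = depth_ranges(target, depth)
print(ranges)
```

A range such as (1, 1) in this representation corresponds to the single depth value [1] in the description.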
Step S109, determining a program unit set which allows parallel execution according to the respective depth ranges of the plurality of nodes, wherein the program unit set comprises a first program unit and a second program unit whose corresponding nodes have depth ranges with an overlapping region.
The first program unit and the second program unit belong to a plurality of program units obtained by splitting the program code.
When there is a target directed acyclic graph corresponding to the dependency graph, referring to fig. 7, determining a program unit set that allows parallel execution may be implemented by some or all of steps S1091 to S1095 as follows.
Step S1091, determining a depth interval to be queried according to the depth numbers of the entry node and the exit node.
The lower limit value of the depth interval is the depth number of the entry node plus 1, and the upper limit value of the depth interval is the depth number of the exit node minus 1. Illustratively, in the target directed acyclic graph exemplarily provided in fig. 5, the depth numbers of the entry node and the exit node are 0 and 5, respectively, so the depth interval to be queried is [1,4]; in the target directed acyclic graph exemplarily provided in fig. 6, the depth numbers of the entry node and the exit node are 0 and 4, respectively, so the depth interval to be queried is [1,3].
Step S1093, for each first depth number located in the depth interval, querying target nodes from the plurality of nodes, where the depth range of each target node includes the first depth number.
In step S1095, when the number of target nodes is not less than 1, the program units corresponding to each target node are added to the set of program units allowed to be executed in parallel.
Illustratively, referring to the target directed acyclic graph exemplarily provided in FIG. 5, the depth ranges of node "hello ()" and node "world ()" contain the same depth number 1, so these nodes are determined to be the target nodes corresponding to depth number 1 and their program units are added to the program unit set that allows parallel execution; the depth ranges of node "h= …" and node "w= …" contain the same depth number 2, so these nodes are determined to be the target nodes corresponding to depth number 2 and their program units are added to the program unit set that allows parallel execution. Referring to the target directed acyclic graph exemplarily provided in fig. 6, the depth ranges of node A and node B contain the same depth number 1, and the depth ranges of node A and node C contain the same depth number 2; node A and node B are determined to be the target nodes corresponding to depth number 1, node A and node C are determined to be the target nodes corresponding to depth number 2, and the program units corresponding to node A, node B, and node C are added to the program unit set that allows parallel execution.
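Steps S1091 to S1095 can be sketched as a scan over the depth interval. The sketch below is illustrative only; the function name `parallel_unit_set` and the parameter `min_targets` are hypothetical. One assumption is labeled loudly: the sketch records a depth number only when at least two nodes share it, because that is what the fig. 6 example implies (node D, alone at depth number 3, is not added to the set). Passing `min_targets=1` gives the literal "not less than 1" reading.

```python
def parallel_unit_set(ranges, entry_depth, exit_depth, min_targets=2):
    """Scan each depth number in the interval and collect nodes whose range contains it."""
    # S1091: the interval runs from entry depth + 1 to exit depth - 1.
    lower, upper = entry_depth + 1, exit_depth - 1
    targets_by_depth, unit_set = {}, set()
    for d in range(lower, upper + 1):
        # S1093: target nodes are those whose depth range includes d.
        targets = [n for n, (lo, hi) in ranges.items() if lo <= d <= hi]
        # S1095 (assumed threshold, see lead-in): keep depths shared by several nodes.
        if len(targets) >= min_targets:
            targets_by_depth[d] = sorted(targets)
            unit_set.update(targets)
    return targets_by_depth, unit_set

# Depth ranges quoted in the text for fig. 6 (entry depth 0, exit depth 4).
FIG6_RANGES = {"A": (1, 2), "B": (1, 1), "C": (2, 2), "D": (3, 3)}
by_depth, units = parallel_unit_set(FIG6_RANGES, entry_depth=0, exit_depth=4)
print(by_depth, units)
```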
Alternatively, the program unit set that allows parallel execution may be determined in a manner other than the foregoing steps S1091 to S1095. For example, the following processing is performed on any first node in the dependency graph: determining whether at least one target node exists among the remaining nodes other than the first node, where the depth range of the target node and the depth range of the first node have an overlapping region, that is, they share at least one depth number; if so, the program units corresponding to the first node and the at least one target node are added to the program unit set that allows parallel execution.
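The pairwise alternative can be sketched as an interval-overlap test over every pair of nodes. This is a minimal sketch; the helper names `overlaps` and `parallel_unit_set_pairwise` are hypothetical, and the ranges are the ones quoted in the text for fig. 6.

```python
from itertools import combinations

def overlaps(r1, r2):
    """Two closed integer ranges overlap when they share at least one depth number."""
    return max(r1[0], r2[0]) <= min(r1[1], r2[1])

def parallel_unit_set_pairwise(ranges):
    """Return overlapping node pairs and the union of nodes that appear in any pair."""
    pairs = [(a, b) for a, b in combinations(sorted(ranges), 2)
             if overlaps(ranges[a], ranges[b])]
    unit_set = {n for pair in pairs for n in pair}
    return pairs, unit_set

FIG6_RANGES = {"A": (1, 2), "B": (1, 1), "C": (2, 2), "D": (3, 3)}
pairs, units = parallel_unit_set_pairwise(FIG6_RANGES)
print(pairs, units)
```

The pairwise form compares every pair of nodes, so its cost grows quadratically with the number of nodes, whereas the scan over the depth interval grows with the interval length times the number of nodes.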
Through the steps S101 to S109, the program units in the program code allowed to be executed in parallel can be efficiently found.
In order to facilitate subsequently converting program code that would otherwise be executed serially into a parallel version by a corresponding parallel technique, the following steps S111 and S113 may be further executed on the basis of the foregoing steps S101 to S109.
Step S111, determining a first computing task and a second computing task that allow parallel execution according to the program unit set, where the first computing task includes at least a first program unit, and the second computing task includes at least a second program unit.
For any first program unit and second program unit, if the depth ranges of their respective nodes have an overlapping region, a first computing task and a second computing task may be allocated to the first program unit and the second program unit, respectively, where the first computing task and the second computing task may be executed in two different threads or processes.
For example, refer to the dependency graph provided by way of example in FIG. 5. The depth ranges of program unit hello () and program unit world () contain the same depth number 1, so computing task 1 can be determined for program unit hello (), computing task 2 can be determined for program unit world (), and computing task 1 and computing task 2 are indicated as allowed to execute in parallel; similarly, computing task 3 can be determined for program unit "h= …", computing task 4 can be determined for program unit "w= …", and computing task 3 and computing task 4 are indicated as allowed to execute in parallel. Referring to the example provided in fig. 6, computing task 1, computing task 2, and computing task 3 may be determined for the program units corresponding to node A, node B, and node C, respectively; computing task 1 and computing task 2 are indicated as allowed to execute in parallel, and computing task 1 and computing task 3 are indicated as allowed to execute in parallel.
Among the multiple program units added to the program unit set that allows parallel execution, some program units may depend on one another and therefore need to be executed sequentially. For example, referring to the target directed acyclic graph exemplarily provided in fig. 5, program unit hello () and program unit "h= …" need to be executed sequentially, and program unit world () and program unit "w= …" need to be executed sequentially; referring to the target directed acyclic graph exemplarily provided in fig. 6, the program units corresponding to node B and node C have a dependency and need to be executed sequentially. If a computing task is allocated independently to each program unit, that is, if each program unit is allocated its own thread or process, resource consumption is high. To save resources, program units in the program unit set that have a dependency relationship may be combined into the same computing task for execution. For example, when the program unit set that allows parallel execution includes a second program unit and a third program unit that depends on it, the third program unit does not need to be allocated a computing task of its own; instead, it is allocated to the same computing task as the second program unit.
For example, referring to the target directed acyclic graph exemplarily provided in fig. 5, program unit hello () and program unit "h= …" may be assigned to the same computing task 1, program unit world () and program unit "w= …" may be assigned to the same computing task 2, and computing task 1 and computing task 2 may be executed concurrently/in parallel in different threads or processes. Referring to the target directed acyclic graph exemplarily provided in fig. 6, program units corresponding to node a may be assigned to computing task 1, program units corresponding to node B and node C may each be assigned to the same computing task 2, and computing task 1 and computing task 2 may be executed concurrently/in parallel in different threads or processes.
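One way to combine dependent program units into shared computing tasks is to merge units that are linked by a dependency, for instance with a small union-find structure. The text does not prescribe a grouping algorithm, so this is only a sketch under that assumption; `group_into_tasks` is a hypothetical name, and the dependency pairs are the assumed edges of fig. 6, where `(u, v)` means `v` depends on `u`.

```python
def group_into_tasks(unit_set, dependencies):
    """Merge units of the parallel set that are linked by a dependency into one task."""
    parent = {u: u for u in unit_set}

    def find(u):
        # Follow parent links to the representative, halving the path on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v in dependencies:
        # Only dependencies between two units of the set cause a merge.
        if u in parent and v in parent:
            parent[find(v)] = find(u)

    tasks = {}
    for u in unit_set:
        tasks.setdefault(find(u), []).append(u)
    # Members are listed alphabetically; execution inside a task must still
    # follow the dependency order between its units.
    return sorted(sorted(members) for members in tasks.values())

FIG6_UNITS = {"A", "B", "C"}
FIG6_DEPENDENCIES = [("A", "D"), ("B", "C"), ("C", "D")]
tasks = group_into_tasks(FIG6_UNITS, FIG6_DEPENDENCIES)
print(tasks)
```

This grouping is deliberately conservative: a unit that depends on two otherwise independent units pulls both into the same task, which trades some parallelism for fewer threads or processes.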
Some of the computing tasks that are allowed to execute in parallel require few resources, and applying parallel techniques to such computing tasks may not effectively improve the performance of the program code. In order to avoid using parallel techniques for the program units included in such computing tasks, the following step S113 may also be performed on the basis of the aforementioned step S111.
Step S113, determining resource consumption information corresponding to each of the first computing task and the second computing task, and determining whether to execute the first computing task and the second computing task in parallel according to the resource consumption information.
The resource consumption information refers to the amount of resources that the computing device needs to consume to perform the corresponding computing task, and may be, for example, the time the computing device needs to perform the corresponding computing task. For example, when the resource consumption information of either the first computing task or the second computing task, which are allowed to execute in parallel, falls below a preset threshold value, it may be determined that the first computing task and the second computing task do not need to be executed in parallel.
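The decision in step S113 can be sketched with elapsed time as the resource consumption information. This is a hedged sketch: the helper names `measure_seconds` and `should_run_in_parallel` are hypothetical, the threshold value is arbitrary, and the comparison direction is an assumption, namely that a task whose cost falls below the threshold is too cheap to be worth parallelizing.

```python
import time

def measure_seconds(task):
    """Run a zero-argument callable once and return its elapsed wall-clock time."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start

def should_run_in_parallel(cost_1, cost_2, threshold):
    """Assumed rule: parallelize only if neither task is cheaper than the threshold."""
    return cost_1 >= threshold and cost_2 >= threshold

# Usage with measured costs (values depend on the machine, so none are shown).
cost_a = measure_seconds(lambda: sum(range(100_000)))
cost_b = measure_seconds(lambda: sorted(range(100_000), reverse=True))
decision = should_run_in_parallel(cost_a, cost_b, threshold=0.05)
```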
The foregoing steps S105 to S109 mainly describe how, when the dependency graph is a directed acyclic graph, that is, when there is no loop statement in the program code, to obtain a set of program units that allow parallel execution and to perform subsequent processing on that set. When the program code includes a loop statement, that is, when the dependency graph obtained in step S103 is not a directed acyclic graph, part or all of steps S115 to S119 may be executed on the basis of the foregoing steps S101 and S103 to obtain a program unit set that allows parallel execution.
In step S115, when the dependency graph is not a directed acyclic graph, the dependency graph is decomposed into a plurality of strongly connected components.
The strongly connected component may also be generally referred to as a strongly connected subgraph or a sub-strongly connected graph.
For example, see the dependency graph provided by way of example in FIG. 8, which includes node A, node B, node C, and node D. This dependency graph is not a directed acyclic graph; when it is decomposed into strongly connected components, 3 strongly connected components are obtained, as shown by the areas selected by the dashed boxes in fig. 8.
Step S117, determining a trivial strongly connected component from the plurality of strongly connected components.
For example, see the dependency graph provided by way of example in FIG. 8. The strongly connected component to which node A and node B belong is non-trivial, as is the strongly connected component to which node C belongs; only the strongly connected component to which node D belongs is trivial.
Step S119, for any program unit in the plurality of program units, adding the program unit to the program unit set that allows parallel execution when the node corresponding to the program unit is located in a trivial strongly connected component.
For example, see the dependency graph provided by way of example in FIG. 8. Among node A, node B, node C, and node D, the program units corresponding to node A, node B, and node C will not be added to the program unit set that allows parallel execution, while the program unit corresponding to node D will be added to it.
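Steps S115 to S119 can be sketched with a two-pass (Kosaraju-style) decomposition into strongly connected components. Several things here are assumptions and not taken from the text: the edge list for fig. 8 is hypothetical (chosen so that A and B form a cycle and C has a self-edge), a component is treated as trivial when it is a single node without a self-edge, and the technique is one of several that could be used for the decomposition.

```python
from collections import defaultdict

def strongly_connected_components(nodes, edges):
    """Two-pass DFS: finish order on the graph, then collection on the reversed graph."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    order, seen = [], set()
    def visit(u):  # first pass: record nodes in order of DFS completion
        seen.add(u)
        for v in succ[u]:
            if v not in seen:
                visit(v)
        order.append(u)
    for n in nodes:
        if n not in seen:
            visit(n)

    components, assigned = [], set()
    def collect(u, comp):  # second pass: walk reversed edges
        assigned.add(u)
        comp.append(u)
        for v in pred[u]:
            if v not in assigned:
                collect(v, comp)
    for n in reversed(order):
        if n not in assigned:
            comp = []
            collect(n, comp)
            components.append(sorted(comp))
    return components

def is_trivial(component, edges):
    """Assumed definition: a single node with no edge to itself."""
    return len(component) == 1 and (component[0], component[0]) not in set(edges)

NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "C"), ("C", "D")]  # hypothetical
components = strongly_connected_components(NODES, EDGES)
parallel_units = {c[0] for c in components if is_trivial(c, EDGES)}
print(components, parallel_units)
```

The recursive passes are adequate for small graphs; for program code with many units, an iterative traversal would avoid Python's recursion limit.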
For program code that contains loop statements, after the program unit set that allows parallel execution is obtained, the number of computing tasks to be executed in parallel can be determined for the program units in the set, so that the program units can be executed in the corresponding number of computing tasks.
Whether or not the program code contains loop statements, after the program unit set that allows parallel execution is obtained, or after the computing tasks to be executed in parallel are determined, the program code that would otherwise be executed serially can be automatically converted into a parallel version through various parallel technologies.
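As an illustration of what a converted parallel version might look like, the sketch below runs two independent computing tasks with `concurrent.futures` from the Python standard library. It is not the conversion procedure of the embodiment: the functions `hello` and `world` and the statements inside `task_1` and `task_2` are hypothetical stand-ins for the program units of fig. 5, whose actual bodies are not given in the text.

```python
from concurrent.futures import ThreadPoolExecutor

def hello():
    return "hello"

def world():
    return "world"

def task_1():
    # Units that depend on each other stay in one task and run in order.
    h = hello()
    return h.upper()

def task_2():
    w = world()
    return w.upper()

def run_serial():
    """Original serial version: task 2 starts only after task 1 finishes."""
    return task_1(), task_2()

def run_parallel():
    """Parallel version: the two independent tasks are submitted to a thread pool."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_1 = pool.submit(task_1)
        future_2 = pool.submit(task_2)
        return future_1.result(), future_2.result()

print(run_parallel())
```

In CPython builds that use the global interpreter lock, threads mainly help I/O-bound tasks; for CPU-bound tasks, `ProcessPoolExecutor` from the same module offers the same `submit` interface with separate processes.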
Based on the same concept as the foregoing method embodiment, a program code parallelism detecting apparatus 900 is also provided in the present embodiment. As shown in fig. 9, the apparatus 900 includes: a code splitting unit 901 configured to split the program code into a plurality of program units; a dependency analysis unit 903 configured to determine a dependency relationship between the plurality of program units; a composition processing unit 905 configured to construct, according to the dependency relationship, a dependency graph including a plurality of nodes corresponding to the plurality of program units; a depth determination unit 907 configured to determine respective depth numbers of the plurality of nodes when the dependency graph is a directed acyclic graph; a range determining unit 909 configured to determine respective depth ranges of the plurality of nodes, wherein, for any first node of the plurality of nodes, a lower limit value corresponding to the depth range of the first node is its depth number, and an upper limit value corresponding to the depth range of the first node is determined based on a depth number of a neighbor node that directly depends on the first node; and a parallel detection unit 911 configured to determine, according to the respective depth ranges of the plurality of nodes, a program unit set that allows parallel execution, where the program unit set includes a first program unit and a second program unit whose corresponding nodes have depth ranges with an overlapping region.
Those of skill in the art will appreciate that in one or more of the examples described above, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the computer programs corresponding to these functions may be stored in a computer readable medium or transmitted as one or more instructions/codes on a computer readable medium, so that the computer programs corresponding to these functions are executed by a computer, by which the methods described in any of the embodiments of the present specification are implemented.
There is also provided in embodiments of the present specification a computer readable storage medium having stored thereon a computer program which, when executed in a computing device, performs a method of parallelism detection of program code provided in any one of the embodiments of the present specification.
The embodiment of the specification also provides a computing device, which comprises a memory and a processor, wherein executable codes are stored in the memory, and the processor realizes the parallelism detection method of the program codes provided in any embodiment of the specification when executing the executable codes.
In this specification, the embodiments are described in a progressive manner; for parts that are the same or similar across embodiments, reference may be made from one embodiment to another, and the description of each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly, and reference may be made to the corresponding parts of the description of the method embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing embodiments have been provided to illustrate the general principles of the present invention in further detail. They are not to be construed as limiting the scope of the invention, which is intended to cover any modifications, equivalents, improvements, etc. made on the basis of the teachings of the invention.

Claims (14)

1. A method of parallelism detection of program code, the method comprising:
splitting program code into a plurality of program units and determining dependency relationships among the plurality of program units;
constructing a dependency graph according to the dependency relationship, wherein the dependency graph comprises a plurality of nodes corresponding to the plurality of program units;
when the dependency graph is a directed acyclic graph, determining respective depth numbers of the plurality of nodes;
determining respective depth ranges of the plurality of nodes, wherein for any first node in the plurality of nodes, a lower limit value corresponding to the depth range of the first node is a depth number thereof, an upper limit value corresponding to the depth range of the first node is determined based on a depth number of a neighbor node, and the neighbor node directly depends on the first node;
and determining a program unit set which allows parallel execution according to the respective depth ranges of the plurality of nodes, wherein the program unit set comprises a first program unit and a second program unit, and the depth ranges of the nodes respectively corresponding to the first program unit and the second program unit have an overlapping region.
2. The method of claim 1, the splitting of the program code into a plurality of program units, comprising:
splitting the program code into a plurality of statements;
for any first statement in the plurality of statements, determining whether a function call exists in the first statement;
if yes, determining a called function from the first statement, and a function call primitive depending on the called function, wherein the called function and the function call primitive are used as a program unit;
and if not, taking the first statement as a program unit.
3. The method of claim 1, wherein the determining the dependency relationship between the plurality of program units comprises determining the dependency relationship between the plurality of program units based on the data objects accessed by each of the plurality of program units and the type of access to the corresponding data objects by each of the plurality of program units.
4. The method of claim 1, the determining the respective depth numbers of the plurality of nodes, comprising:
adding, to the dependency graph, an entry node, an exit node, a directed edge from the entry node to a second node, and a directed edge from a third node to the exit node to obtain a target directed acyclic graph, wherein the in-degree of the second node in the dependency graph is 0, and the out-degree of the third node in the dependency graph is 0;
for any node in the target directed acyclic graph, determining a depth number of the node, the depth number being the number of directed edges included in the longest directed path between the entry node and the node.
5. The method of claim 4, wherein the difference between the minimum value of the respective depth numbers of at least one neighboring node of the first node and the upper limit value corresponding to the depth range of the first node is 1.
6. The method of claim 4, the determining a set of program elements that are allowed to execute in parallel based on respective depth ranges of the plurality of nodes, comprising:
determining a depth interval to be queried according to the depth numbers of the entry node and the exit node;
for any first depth number located in the depth interval, querying a target node from the plurality of nodes, wherein the first depth number is included in the depth range of the target node;
and when the number of the target nodes is not less than 1, adding the program units corresponding to the target nodes into the program unit set.
7. The method of claim 1, the method further comprising: and determining a first computing task and a second computing task which are allowed to be executed in parallel according to the program unit set, wherein the first computing task at least comprises the first program unit, and the second computing task at least comprises the second program unit.
8. The method of claim 7, wherein the program unit set further comprises a third program unit that depends on the second program unit, and wherein the second computing task further comprises the third program unit.
9. The method of claim 7, the method further comprising:
determining resource consumption information corresponding to each of the first computing task and the second computing task;
and determining whether to execute the first computing task and the second computing task in parallel according to the resource consumption information.
10. The method of claim 1, the method further comprising:
when the dependency graph is not a directed acyclic graph, decomposing the dependency graph into a plurality of strongly connected components;
determining a trivial strongly connected component from the plurality of strongly connected components;
for any of the plurality of program units, adding the program unit to the program unit set if the node corresponding to the program unit is located in the trivial strongly connected component.
11. The method of any of claims 1-10, the program code being written in Python.
12. A parallelism detection apparatus of program code, the apparatus comprising:
a code splitting unit configured to split the program code into a plurality of program units;
a dependency analysis unit configured to determine a dependency relationship between the plurality of program units;
a composition processing unit configured to construct a dependency graph according to the dependency relationship, wherein the dependency graph includes a plurality of nodes corresponding to the plurality of program units;
a depth determination unit configured to determine respective depth numbers of the plurality of nodes when the dependency graph is a directed acyclic graph;
a range determining unit configured to determine respective depth ranges of the plurality of nodes, wherein, for any first node of the plurality of nodes, a lower limit value corresponding to the depth range is a depth number thereof, and an upper limit value corresponding to the depth range is determined based on a depth number of a neighboring node, the neighboring node directly depends on the first node;
and a parallel detection unit configured to determine a program unit set which allows parallel execution according to the respective depth ranges of the plurality of nodes, wherein the program unit set comprises a first program unit and a second program unit, and the depth ranges of the nodes corresponding to the first program unit and the second program unit have an overlapping region.
13. A computer readable storage medium having stored thereon a computer program which, when executed in a computing device, performs the method of any of claims 1-11.
14. A computing device comprising a memory and a processor, the memory having stored therein a computer program, the processor executing the computer program to implement the method of any of claims 1-11.
CN202311331747.0A 2023-10-13 2023-10-13 Method and device for detecting parallelism of program codes Active CN117093502B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311331747.0A CN117093502B (en) 2023-10-13 2023-10-13 Method and device for detecting parallelism of program codes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311331747.0A CN117093502B (en) 2023-10-13 2023-10-13 Method and device for detecting parallelism of program codes

Publications (2)

Publication Number Publication Date
CN117093502A CN117093502A (en) 2023-11-21
CN117093502B true CN117093502B (en) 2024-01-30

Family

ID=88777160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311331747.0A Active CN117093502B (en) 2023-10-13 2023-10-13 Method and device for detecting parallelism of program codes

Country Status (1)

Country Link
CN (1) CN117093502B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101329638A (en) * 2007-06-18 2008-12-24 国际商业机器公司 Method and system for analyzing parallelism of program code
CN109814986A (en) * 2017-11-20 2019-05-28 上海寒武纪信息科技有限公司 Task method for parallel processing, storage medium, computer equipment, device and system
CN112288249A (en) * 2020-10-20 2021-01-29 杭州鲸算科技有限公司 Business process execution method and device, computer equipment and medium
CN114564297A (en) * 2022-03-04 2022-05-31 中信银行股份有限公司 Task execution sequence calculation method, device and equipment and readable storage medium
CN114625507A (en) * 2022-03-14 2022-06-14 广州经传多赢投资咨询有限公司 Task scheduling method, system, equipment and storage medium based on directed acyclic graph
CN116820962A (en) * 2023-06-28 2023-09-29 浙江极氪智能科技有限公司 Method and device for detecting risk codes

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3707727B2 (en) * 2000-10-30 2005-10-19 インターナショナル・ビジネス・マシーンズ・コーポレーション Program optimization method and compiler using the same
US8667474B2 (en) * 2009-06-19 2014-03-04 Microsoft Corporation Generation of parallel code representations
JP6004818B2 (en) * 2012-08-07 2016-10-12 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Parallelization method, system, and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Automatic Code Parallelization with OpenMP task constructs;Manju M.等;2016 International Conference on Information Science (ICIS);全文 *
基于多核阵列体系结构的嵌套循环并行优化;杨子煜;严明;赵鹏;;计算机工程与科学(第S1期);全文 *

Also Published As

Publication number Publication date
CN117093502A (en) 2023-11-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant