CN112269648A - Parallel task allocation method and device for multi-stage program analysis - Google Patents

Parallel task allocation method and device for multi-stage program analysis

Info

Publication number
CN112269648A
CN112269648A (application CN202011272405.2A)
Authority
CN
China
Prior art keywords
task
tasks
analysis
stage
graph
Prior art date
Legal status
Granted
Application number
CN202011272405.2A
Other languages
Chinese (zh)
Other versions
CN112269648B
Inventor
陈睿
江云松
肖志恒
王峥
贾春鹏
高栋栋
于婷婷
丁戈
朱玉钊
Current Assignee
Beijing Sunwise Information Technology Ltd
Original Assignee
Beijing Sunwise Information Technology Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sunwise Information Technology Ltd
Priority to CN202011272405.2A (granted as CN112269648B)
Publication of CN112269648A
Application granted
Publication of CN112269648B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a parallel task allocation method and device for multi-stage program analysis. The method comprises the following steps: constructing a task relationship graph corresponding to the code to be analyzed according to the dependencies among all tasks in the code; obtaining the analysis tasks to be run on the code; dividing the analysis tasks into stages according to the task relationship graph to obtain stage task sets, each of which comprises at least one task that can be executed in parallel; and running the tasks in each stage task set according to the number of concurrently running tasks and obtaining the task running results. The invention makes fuller use of hardware performance, shortens the overall analysis time, and avoids the problem of all checkers' results accumulating in a single result file that becomes too large to read conveniently when there are many findings.

Description

Parallel task allocation method and device for multi-stage program analysis
Technical Field
The invention relates to the technical field of code analysis, and in particular to a parallel task allocation method and device for multi-stage program analysis.
Background
Code analysis is an important means of ensuring the correctness of software. When high-precision static analysis is applied to modern large-scale, complex software systems (such as the Linux operating system, with tens of millions of lines of code), higher analysis precision usually means longer analysis time. The mutual constraints among precision, efficiency and scalability are a major obstacle to the industrial adoption of static analysis.
Much optimization work has been done to improve analysis efficiency, including single-machine CPU parallelism, distributed execution and GPU implementations. Since the system targeted by the invention runs in a single-machine environment, the invention focuses on single-machine CPU parallelism. Existing parallel analysis algorithms include the parallel points-to analysis based on constraint-graph rewriting proposed by Mendez-Lojo et al. and the actor-model-based parallel data-flow analysis proposed by Rodriguez et al.; both are tailored to a specific class of static analysis problems and are not general. Albarghouthi et al. propose a general framework that parallelizes top-down analysis with a map-reduce strategy, but it depends heavily on memory and becomes limited in what it can compute when memory is insufficient.
When several types of checkers are used to analyze a project at the same time, some checkers have prerequisite tasks. For example, a checker that analyzes global variables first requires program construction and then the computation of the function call relations, so its analysis has prerequisite dependencies and must proceed in multiple stages. If every checker simply waits for its prerequisites to finish and all checkers run one after another, the analysis of the whole project is inefficient.
Disclosure of Invention
The technical problem solved by the invention is to overcome the defects of the prior art by providing a parallel task allocation method and device for multi-stage program analysis.
In order to solve the above technical problem, an embodiment of the present invention provides a parallel task allocation method for multi-stage program analysis, including:
constructing a task relationship graph corresponding to the code to be analyzed according to the dependencies among all tasks in the code to be analyzed;
obtaining the analysis tasks to be run on the code to be analyzed;
dividing the analysis tasks into stages according to the task relationship graph and the analysis tasks to obtain stage task sets, wherein each stage task set comprises at least one task that can be executed in parallel;
and running the tasks in each stage task set according to the number of concurrently running tasks, and obtaining the task running results.
Optionally, constructing the task relationship graph corresponding to the code to be analyzed according to the dependencies among all tasks in the code to be analyzed includes:
taking all the tasks as graph nodes;
and connecting the graph nodes that have dependencies, according to those dependencies, to generate the task relationship graph.
Optionally, dividing the analysis tasks into stages according to the task relationship graph and the analysis tasks to obtain stage task sets includes:
obtaining target graph nodes of the analysis tasks in the task relationship graph;
obtaining, among the target graph nodes, first graph nodes that act as parent nodes, and assigning the first graph nodes to a stage task set;
deleting the first graph nodes from the target graph nodes, and cyclically performing, on the remaining target graph nodes, the steps of obtaining the first graph nodes that act as parent nodes and assigning them to a stage task set, until all analysis tasks have been assigned to stage task sets.
Optionally, running the tasks in each stage task set according to the number of concurrently running tasks and obtaining the task running results includes:
determining the number of concurrently running tasks according to the performance of the running device and the average memory occupied by an analysis task;
and running the analysis tasks in the stage task sets stage by stage, according to the running order of the stage task sets and the number of concurrently running tasks, to obtain the task running results.
Optionally, running the analysis tasks in the stage task sets stage by stage according to the running order of the stage task sets and the number of concurrently running tasks to obtain the task running results includes:
sending the tasks run in the first stage, through a task scheduler, to a task executor for execution;
after all first-stage tasks have been executed, computing, according to the process exit codes returned by the executor, the first tasks in the second stage that do not need to be executed, removing the first tasks, and sending the remaining tasks to the executor for execution;
and after the run is finished, integrating the results of the analysis tasks selected by the user through a result data integration part to obtain the task running results.
In order to solve the above technical problem, an embodiment of the present invention further provides a parallel task allocation apparatus for multi-stage program analysis, including:
a task relationship graph building module, configured to construct a task relationship graph corresponding to the code to be analyzed according to the dependencies among all tasks in the code to be analyzed;
an analysis task obtaining module, configured to obtain the analysis tasks to be run on the code to be analyzed;
a task set obtaining module, configured to divide the analysis tasks into stages according to the task relationship graph and the analysis tasks to obtain stage task sets, wherein each stage task set comprises at least one task that can be executed in parallel;
and a running result obtaining module, configured to run the tasks in each stage task set according to the number of concurrently running tasks and obtain the task running results.
Optionally, the task relationship graph building module includes:
a graph node obtaining unit, configured to take all the tasks as graph nodes;
and a task relationship graph generating unit, configured to connect the graph nodes that have dependencies, according to those dependencies, to generate the task relationship graph.
Optionally, the task set obtaining module includes:
a target graph node obtaining unit, configured to obtain target graph nodes of the analysis tasks in the task relationship graph;
a first graph node obtaining unit, configured to obtain, among the target graph nodes, first graph nodes that act as parent nodes, and assign the first graph nodes to a stage task set;
and a task set obtaining unit, configured to delete the first graph nodes from the target graph nodes and cyclically run the first graph node obtaining unit on the remaining target graph nodes, until all analysis tasks have been assigned to stage task sets.
Optionally, the running result obtaining module includes:
a concurrent task number determining unit, configured to determine the number of concurrently running tasks according to the performance of the running device and the average memory occupied by an analysis task;
and a task running result obtaining unit, configured to run the analysis tasks in the stage task sets stage by stage, according to the running order of the stage task sets and the number of concurrently running tasks, to obtain the task running results.
Optionally, the task running result obtaining unit includes:
a first-stage execution subunit, configured to send the tasks run in the first stage, through the task scheduler, to the task executor for execution;
a remaining-task execution subunit, configured to, after all first-stage tasks have been executed, compute, according to the process exit codes returned by the executor, the first tasks in the second stage that do not need to be executed, remove the first tasks, and send the remaining tasks to the executor for execution;
and a running result obtaining subunit, configured to integrate, after the run is finished, the results of the analysis tasks selected by the user through the result data integration part to obtain the task running results.
Compared with the prior art, the invention has the advantages that:
according to the method, the dependency among tasks is calculated, so that the overall planning is effectively carried out on a plurality of analysis processes, and the analysis tasks are executed in parallel according to the priority stages in sequence, so that the problem that the speed of analyzing the tasks by linearly operating the multi-stage multi-checker is too low is solved; according to the method, the number of processes which can run simultaneously is calculated according to the number of tasks, the size of the memory of the PC and the number of the cores of the PC, and the analysis tasks which run in a single stage are executed concurrently according to the calculated number of the parallel tasks, so that the performance of hardware can be exerted to a greater extent, and the overall analysis time is shortened; the invention adopts a redirection output mode to independently form the analysis result of each checker of a project into a file, thereby effectively solving the problems that the results of all the checkers are accumulated in the same result file, and the result file is too large and inconvenient to read when the checking results are more.
Drawings
FIG. 1 is a flowchart illustrating steps of a method for parallel task allocation for multi-stage program analysis according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a parallel task allocation apparatus for multi-stage program analysis according to an embodiment of the present invention.
Detailed Description
To address the long running time and low efficiency of analyzing large-scale projects with multiple multi-stage checkers, the embodiments of the invention aim to schedule the analysis tasks sensibly, exploit hardware performance as fully as possible, and reduce the total analysis time. The system comprises four parts: analysis task setting, analysis task calculation, analysis task execution and result data integration. Analysis task setting consists of checker setting and parameter setting; analysis task calculation consists of analysis task generation, analysis task dependency calculation and concurrency count calculation; analysis task execution is divided into an analysis task scheduler and an analysis task executor. The invention models the dependencies of the tasks to be executed in each stage of program analysis as a directed graph data structure. Each node of the graph represents a task to be analyzed; the directed edges are the dependencies between tasks, i.e. the parent nodes of each task are the task nodes it depends on. All edges must be unidirectional and must not form a ring: when a task node is traversed upward through the tasks it depends on, the traversal must never reach the task node itself. A node may have multiple parents or multiple children, i.e. a task may depend on several tasks, and several tasks may depend on one task.
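For illustration only, the directed graph described above can be captured by a small data structure. The following Python sketch (the names TaskNode and add_dependency are illustrative and do not come from the patent) keeps, for each task, the tasks it depends on and the tasks that depend on it, and rejects any edge that would close a ring by walking upward through the dependencies:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(eq=False)  # identity-based equality and hashing, so nodes can be kept in sets
class TaskNode:
    """One analysis task in the task relationship graph."""
    name: str
    parents: List["TaskNode"] = field(default_factory=list)   # tasks this task depends on
    children: List["TaskNode"] = field(default_factory=list)  # tasks that depend on this task

def add_dependency(task: TaskNode, depends_on: TaskNode) -> None:
    """Record that `task` depends on `depends_on`, refusing edges that would form a ring."""
    # Walk upward from `depends_on`; if the walk reaches `task`, the new edge would close a cycle.
    stack, seen = [depends_on], set()
    while stack:
        node = stack.pop()
        if node is task:
            raise ValueError(f"dependency cycle: {task.name} -> {depends_on.name}")
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.parents)
    task.parents.append(depends_on)
    depends_on.children.append(task)
```

The later sketches in this description reuse this TaskNode structure.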
Example one
Referring to FIG. 1, which shows a flowchart of the steps of a parallel task allocation method for multi-stage program analysis according to an embodiment of the present invention, the method may specifically include the following steps:
Step 101: construct a task relationship graph corresponding to the code to be analyzed according to the dependencies among all tasks in the code to be analyzed.
In the embodiment of the invention, program analysis is carried out with the help of the task relationship graph corresponding to the code to be analyzed.
The code to be analyzed is the code on which the analysis is to be run.
After the code to be analyzed is obtained, the task relationship graph corresponding to it can be constructed according to the dependencies among all its tasks, as described in detail in the following implementation.
In a specific implementation manner of the present invention, step 101 may include:
Substep A1: take all the tasks as graph nodes;
Substep A2: connect the graph nodes that have dependencies, according to those dependencies, to generate the task relationship graph.
In the embodiment of the present invention, all tasks in the code to be analyzed are used as graph nodes, and graph nodes that have a dependency are connected according to the dependencies among the tasks, producing the task relationship graph. Concretely, a dependency graph of all tasks is generated from the dependencies among the tasks; its nodes represent the analysis tasks the user has selected to run and the tasks on which those tasks depend, so the nodes of the graph necessarily include every task that the user-selected checkers depend on.
Step 102: obtain the analysis tasks to be run on the code to be analyzed.
An analysis task is a task that needs to be executed on the code to be analyzed.
A single analysis may run only some of the checkers' analysis tasks; the set of these tasks is denoted S1, i.e. the set formed by the analysis tasks. Checker setting means configuring the checkers to be applied in the static analysis, normally specified by the user as input. Parameter setting means the additional parameters given to the analysis tool; default settings are normally used, but the user may also specify them as input.
The main function of analysis task generation is to organize the input of the analysis task setting part and produce a command line for executing each analysis task, as illustrated below. Analysis task dependency calculation computes the dependencies among the analysis tasks and, together with analysis task generation, produces the multi-stage analysis tasks.
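As a minimal illustration of the analysis task generation step, the sketch below turns checker and parameter settings into one command line per task. The tool name "analyzer", the flag names and the checker identifiers are placeholders, since the patent does not specify a concrete command format:

```python
from typing import Dict, List, Optional

def build_command_lines(checkers: List[str], source_dir: str,
                        extra_args: Optional[List[str]] = None) -> Dict[str, List[str]]:
    """Turn the checker setting and parameter setting into one command line per analysis task."""
    extra_args = extra_args or []
    commands = {}
    for checker in checkers:
        # Placeholder command format; a real analysis tool defines its own executable and flags.
        commands[checker] = ["analyzer", "--checker", checker,
                            "--source", source_dir, *extra_args]
    return commands

# Example: build_command_lines(["global_variable_check"], "/path/to/project", ["--timeout", "600"])
```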
After the analysis tasks to be run on the code to be analyzed have been obtained, step 103 is executed.
Step 103: divide the analysis tasks into stages according to the task relationship graph and the analysis tasks to obtain stage task sets; each stage task set comprises at least one task that can be executed in parallel.
After the task relationship graph and the analysis tasks have been obtained, the analysis tasks can be divided into stages according to them, yielding stage task sets, each of which contains at least one task that can be executed in parallel.
In another specific implementation manner of the present invention, step 103 may include:
Substep B1: obtain the target graph nodes of the analysis tasks in the task relationship graph;
Substep B2: obtain, among the target graph nodes, the first graph nodes that act as parent nodes, and assign the first graph nodes to a stage task set;
Substep B3: delete the first graph nodes from the target graph nodes and, for the remaining target graph nodes, cyclically perform the steps of obtaining the first graph nodes that act as parent nodes and assigning them to a stage task set, until all analysis tasks have been assigned to stage task sets.
In this embodiment of the present invention, the dependency graph of all tasks is denoted G, so the elements of S1 are nodes of G. The parent nodes of all elements of S1 are found in G, and the set formed by these parent nodes is denoted S2; elements whose parent nodes cannot be found are placed into another set S0 and removed from S1. The same operation is then carried out on the sets S2 through Sn-1 until Sn is an empty set. The result is a number of sets, which are the task sets run in the different stages: S0 is the task set run in the first stage, followed by the sets in the order Sn-1 down to S1.
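The procedure can be written down directly. The sketch below assumes the TaskNode structure from the earlier sketch; it collects parentless tasks into S0 and returns the stages in the order S0, Sn-1, ..., S1. Duplicate appearances of a task in several layers are not handled, since the description above does not address them:

```python
from typing import List

def divide_into_stages(selected_tasks: List[TaskNode]) -> List[List[TaskNode]]:
    """Split the selected analysis tasks (set S1) into stage task sets.

    Layer k+1 holds the parents (dependencies) of the tasks in layer k; tasks with
    no parents are moved into S0, which runs first, followed by the layers from the
    deepest one back down to S1.
    """
    s0: List[TaskNode] = []          # root tasks with no dependencies: the first stage
    layers: List[List[TaskNode]] = []
    current = list(selected_tasks)   # this is S1
    while current:
        kept, parents = [], []
        for task in current:
            if task.parents:
                kept.append(task)
                for p in task.parents:
                    if p not in parents:
                        parents.append(p)
            elif task not in s0:
                s0.append(task)      # no parent found: schedule in the first stage S0
        layers.append(kept)
        current = parents            # next layer: S2, S3, ...
    # First stage is S0, then S_{n-1} down to S1.
    return [s0] + [layer for layer in reversed(layers) if layer]
```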
After the stage task sets have been obtained, step 104 is performed.
Step 104: run the tasks in each stage task set according to the number of concurrently running tasks, and obtain the task running results.
After the stage task sets have been obtained, the stage tasks can be run according to the number of concurrently running tasks and the task running results obtained, as detailed in the following implementation:
in another specific implementation manner of the present invention, the step 104 may include:
substep C1: and determining the number of the concurrent running tasks according to the equipment performance of the running equipment and the average occupied memory corresponding to the analysis task.
Substep C2: and according to the phase running sequence of the phase task set and the number of the concurrent running tasks, running the analysis tasks in the phase task set in stages to obtain a task running result.
In the embodiment of the invention, while one stage of analysis tasks is being executed, the analysis task scheduler hands the tasks one by one to the analysis task executor, judges from the exit code returned by the executor whether each task succeeded, and does not execute the successor tasks of any task that did not succeed.
The analysis task executor creates a system process for each task, redirects its outputs to files, receives the exit code of the task process and passes that exit code to the analysis task scheduler.
Within any one set Sn obtained above, all tasks could in principle run simultaneously, but the performance of the PC executing them generally does not allow this. The number of tasks that can run at the same time therefore has to be calculated from the PC's performance (i.e. its memory size and number of cores) and the average memory occupied by a task.
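The patent names the quantities involved (the PC's memory size and core count, the average memory occupied by a task and the number of tasks) but gives no exact formula; taking the minimum of the three obvious bounds, as in the sketch below, is one plausible reading:

```python
import os

def concurrency_limit(avg_task_memory_mb: int, total_memory_mb: int, num_tasks: int) -> int:
    """Estimate how many analysis processes may run at the same time."""
    cores = os.cpu_count() or 1                                 # number of CPU cores
    by_memory = total_memory_mb // max(1, avg_task_memory_mb)   # how many tasks fit in memory
    return max(1, min(cores, by_memory, num_tasks))

# Example: concurrency_limit(avg_task_memory_mb=800, total_memory_mb=16000, num_tasks=12)
```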
Following the stage order computed above, a process is created for each task, run stage by stage, with redirection targets specified for the process's output stream and error stream; after execution starts, the scheduler waits for the task to finish and records the process exit code. The number of tasks running at the same time must not exceed the calculated concurrency count.
The exit codes of the processes running the current stage's tasks (say, Sn) are then examined; the tasks whose processes did not exit normally form a failed-task set, denoted Sfn. The scheduler determines which tasks of the next stage (namely Sn-1) depend on tasks in Sfn, removes them from Sn-1, and adds them to the next stage's failed-task set Sfn-1.
The next stage's tasks are then run by repeating the same running process.
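The following sketch of the scheduler/executor loop assumes the TaskNode structure and the command lines from the earlier sketches; the file layout and helper names are illustrative. Each task runs in its own process with stdout and stderr redirected to a per-task file, the process exit code decides success, and tasks of the next stage whose prerequisites failed are dropped:

```python
import os
import subprocess
from typing import Dict, List, Set, Tuple

def run_stage(stage_tasks: List[TaskNode], commands: Dict[str, List[str]],
              limit: int, out_dir: str = "results") -> Set[TaskNode]:
    """Run one stage's tasks with at most `limit` task processes alive at a time."""
    os.makedirs(out_dir, exist_ok=True)
    failed: Set[TaskNode] = set()
    pending = list(stage_tasks)
    running = []  # list of (task, process, output file) triples
    while pending or running:
        # Launch tasks until the concurrency limit is reached.
        while pending and len(running) < limit:
            task = pending.pop(0)
            out = open(os.path.join(out_dir, f"{task.name}.out"), "w")
            proc = subprocess.Popen(commands[task.name], stdout=out, stderr=subprocess.STDOUT)
            running.append((task, proc, out))
        # Wait for the oldest running task and record its exit code.
        task, proc, out = running.pop(0)
        proc.wait()
        out.close()
        if proc.returncode != 0:
            failed.add(task)   # abnormal exit: the task joins the failed set Sf
    return failed

def drop_dependents(next_stage: List[TaskNode],
                    failed: Set[TaskNode]) -> Tuple[List[TaskNode], Set[TaskNode]]:
    """Remove from the next stage every task whose prerequisite failed."""
    kept, newly_failed = [], set()
    for task in next_stage:
        if any(p in failed for p in task.parents):
            newly_failed.add(task)   # its prerequisite did not finish normally
        else:
            kept.append(task)
    return kept, newly_failed
```

Waiting on the oldest running process keeps the sketch simple; a real scheduler would wait on whichever process finishes first.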
After all tasks are completed, result integration is performed: for the tasks in S1 that were actually executed, the contents of their output files produced in the previous step are merged into the final result by several threads, comprising at least one read thread and one write thread. If the calculated concurrency count is greater than 2, more read threads or write threads may be created depending on the number and size of the output files; note that if several write threads are started, the conflicts between them must be resolved.
In the embodiment of the invention, the task scheduler first hands the first-stage tasks to the task executor for execution; after all first-stage tasks have run, it uses the process exit codes returned by the executor to determine which second-stage tasks no longer need to run, removes them, and hands the remaining tasks to the executor. After the run finishes, the result data integration part merges the results of the tasks analyzed by the checkers selected by the user: a result-reading thread is started to read the result files of all tasks one by one, and a result-writing thread gathers the read results into the final result file. If the concurrency count of the user-selected tasks is greater than 2, additional result-reading threads may be started and run concurrently, while only one result-writing thread is started, which avoids write conflicts.
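A sketch of the result data integration part, under the same assumptions: reader threads push file contents onto a queue, and the main thread acts as the single writer, so no write conflict can arise (the patent describes a dedicated result-writing thread; letting the main thread do the writing is an equivalent simplification):

```python
import queue
import threading
from typing import List

def integrate_results(result_files: List[str], final_path: str, num_readers: int = 1) -> None:
    """Merge the per-task result files into one final result file."""
    paths: "queue.Queue[str]" = queue.Queue()
    for path in result_files:
        paths.put(path)
    chunks: "queue.Queue[str]" = queue.Queue()

    def reader() -> None:
        # Each reader pulls file paths until none are left, then exits.
        while True:
            try:
                path = paths.get_nowait()
            except queue.Empty:
                return
            with open(path, encoding="utf-8") as f:
                chunks.put(f.read())

    readers = [threading.Thread(target=reader) for _ in range(max(1, num_readers))]
    for t in readers:
        t.start()

    # Single writer: only this code path ever touches the final result file.
    with open(final_path, "w", encoding="utf-8") as out:
        for _ in range(len(result_files)):
            out.write(chunks.get())

    for t in readers:
        t.join()
```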
In the embodiment of the invention, the number of processes that can run simultaneously is calculated from the number of tasks, the PC's memory size and the PC's core count, and the analysis tasks of a single stage are executed concurrently up to that number, so hardware performance is exploited more fully and the overall analysis time is shortened.
Example two
Referring to FIG. 2, which shows a schematic structural diagram of a parallel task allocation apparatus for multi-stage program analysis according to an embodiment of the present invention, the apparatus may specifically include the following modules:
a task relationship graph building module 210, configured to construct a task relationship graph corresponding to the code to be analyzed according to the dependencies among all tasks in the code to be analyzed;
an analysis task obtaining module 220, configured to obtain the analysis tasks to be run on the code to be analyzed;
a task set obtaining module 230, configured to divide the analysis tasks into stages according to the task relationship graph and the analysis tasks to obtain stage task sets, wherein each stage task set comprises at least one task that can be executed in parallel;
and a running result obtaining module 240, configured to run the tasks in each stage task set according to the number of concurrently running tasks and obtain the task running results.
Optionally, the task relationship graph building module 210 includes:
a graph node obtaining unit, configured to take all the tasks as graph nodes;
and a task relationship graph generating unit, configured to connect the graph nodes that have dependencies, according to those dependencies, to generate the task relationship graph.
Optionally, the task set obtaining module 230 includes:
a target graph node obtaining unit, configured to obtain target graph nodes of the analysis tasks in the task relationship graph;
a first graph node obtaining unit, configured to obtain, among the target graph nodes, first graph nodes that act as parent nodes, and assign the first graph nodes to a stage task set;
and a task set obtaining unit, configured to delete the first graph nodes from the target graph nodes and cyclically run the first graph node obtaining unit on the remaining target graph nodes, until all analysis tasks have been assigned to stage task sets.
Optionally, the running result obtaining module 240 includes:
a concurrent task number determining unit, configured to determine the number of concurrently running tasks according to the performance of the running device and the average memory occupied by an analysis task;
and a task running result obtaining unit, configured to run the analysis tasks in the stage task sets stage by stage, according to the running order of the stage task sets and the number of concurrently running tasks, to obtain the task running results.
Optionally, the task running result obtaining unit includes:
a first-stage execution subunit, configured to send the tasks run in the first stage, through the task scheduler, to the task executor for execution;
a remaining-task execution subunit, configured to, after all first-stage tasks have been executed, compute, according to the process exit codes returned by the executor, the first tasks in the second stage that do not need to be executed, remove the first tasks, and send the remaining tasks to the executor for execution;
and a running result obtaining subunit, configured to integrate, after the run is finished, the results of the analysis tasks selected by the user through the result data integration part to obtain the task running results.
The above description is only a preferred embodiment of the present invention; although the invention has been disclosed in terms of preferred implementations, it is not limited to the above description.
Those skilled in the art will appreciate that those matters not described in detail in the present specification are well known in the art.

Claims (10)

1. A parallel task allocation method for multi-stage program analysis, comprising:
constructing a task relationship graph corresponding to code to be analyzed according to the dependencies among all tasks in the code to be analyzed;
obtaining the analysis tasks to be run on the code to be analyzed;
dividing the analysis tasks into stages according to the task relationship graph and the analysis tasks to obtain stage task sets, wherein each stage task set comprises at least one task that can be executed in parallel;
and running the tasks in each stage task set according to the number of concurrently running tasks, and obtaining task running results.
2. The method according to claim 1, wherein constructing the task relationship graph corresponding to the code to be analyzed according to the dependencies among all tasks in the code to be analyzed comprises:
taking all the tasks as graph nodes;
and connecting the graph nodes that have dependencies, according to those dependencies, to generate the task relationship graph.
3. The method according to claim 1, wherein dividing the analysis tasks into stages according to the task relationship graph and the analysis tasks to obtain stage task sets comprises:
obtaining target graph nodes of the analysis tasks in the task relationship graph;
obtaining, among the target graph nodes, first graph nodes that act as parent nodes, and assigning the first graph nodes to a stage task set;
deleting the first graph nodes from the target graph nodes, and cyclically performing, on the remaining target graph nodes, the steps of obtaining the first graph nodes that act as parent nodes and assigning them to a stage task set, until all analysis tasks have been assigned to stage task sets.
4. The method according to claim 1, wherein running the tasks in each stage task set according to the number of concurrently running tasks and obtaining the task running results comprises:
determining the number of concurrently running tasks according to the performance of the running device and the average memory occupied by an analysis task;
and running the analysis tasks in the stage task sets stage by stage, according to the running order of the stage task sets and the number of concurrently running tasks, to obtain the task running results.
5. The method according to claim 4, wherein running the analysis tasks in the stage task sets stage by stage according to the running order of the stage task sets and the number of concurrently running tasks to obtain the task running results comprises:
sending the tasks run in the first stage, through a task scheduler, to a task executor for execution;
after all first-stage tasks have been executed, computing, according to process exit codes returned by the executor, first tasks in the second stage that do not need to be executed, removing the first tasks, and sending the remaining tasks to the executor for execution;
and after the run is finished, integrating the results of the analysis tasks selected by the user through a result data integration part to obtain the task running results.
6. A parallel task allocation apparatus for multi-stage program analysis, comprising:
a task relationship graph building module, configured to construct a task relationship graph corresponding to code to be analyzed according to the dependencies among all tasks in the code to be analyzed;
an analysis task obtaining module, configured to obtain the analysis tasks to be run on the code to be analyzed;
a task set obtaining module, configured to divide the analysis tasks into stages according to the task relationship graph and the analysis tasks to obtain stage task sets, wherein each stage task set comprises at least one task that can be executed in parallel;
and a running result obtaining module, configured to run the tasks in each stage task set according to the number of concurrently running tasks and obtain task running results.
7. The apparatus of claim 6, wherein the task relationship graph building module comprises:
a graph node obtaining unit, configured to take all the tasks as graph nodes;
and a task relationship graph generating unit, configured to connect the graph nodes that have dependencies, according to those dependencies, to generate the task relationship graph.
8. The apparatus of claim 6, wherein the task set obtaining module comprises:
a target graph node obtaining unit, configured to obtain target graph nodes of the analysis tasks in the task relationship graph;
a first graph node obtaining unit, configured to obtain, among the target graph nodes, first graph nodes that act as parent nodes, and assign the first graph nodes to a stage task set;
and a task set obtaining unit, configured to delete the first graph nodes from the target graph nodes and cyclically run the first graph node obtaining unit on the remaining target graph nodes, until all analysis tasks have been assigned to stage task sets.
9. The apparatus of claim 6, wherein the running result obtaining module comprises:
a concurrent task number determining unit, configured to determine the number of concurrently running tasks according to the performance of the running device and the average memory occupied by an analysis task;
and a task running result obtaining unit, configured to run the analysis tasks in the stage task sets stage by stage, according to the running order of the stage task sets and the number of concurrently running tasks, to obtain the task running results.
10. The apparatus according to claim 9, wherein the task running result obtaining unit comprises:
a first-stage execution subunit, configured to send the tasks run in the first stage, through a task scheduler, to a task executor for execution;
a remaining-task execution subunit, configured to, after all first-stage tasks have been executed, compute, according to process exit codes returned by the executor, first tasks in the second stage that do not need to be executed, remove the first tasks, and send the remaining tasks to the executor for execution;
and a running result obtaining subunit, configured to integrate, after the run is finished, the results of the analysis tasks selected by the user through a result data integration part to obtain the task running results.
CN202011272405.2A, filed 2020-11-13: Parallel task allocation method and device for multi-stage program analysis; granted as CN112269648B (Active)

Priority Applications (1)

Application Number | Filing Date | Title
CN202011272405.2A | 2020-11-13 | Parallel task allocation method and device for multi-stage program analysis (granted as CN112269648B)

Applications Claiming Priority (1)

Application Number | Filing Date | Title
CN202011272405.2A | 2020-11-13 | Parallel task allocation method and device for multi-stage program analysis (granted as CN112269648B)

Publications (2)

Publication Number | Publication Date
CN112269648A | 2021-01-26
CN112269648B | 2024-05-31


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130036425A1 (en) * 2011-08-04 2013-02-07 Microsoft Corporation Using stages to handle dependencies in parallel tasks
CN104915260A (en) * 2015-06-19 2015-09-16 北京搜狐新媒体信息技术有限公司 Hadoop cluster management task distributing method and system
CN105718244A (en) * 2016-01-18 2016-06-29 上海交通大学 Streamline data shuffle Spark task scheduling and executing method
CN107329828A (en) * 2017-06-26 2017-11-07 华中科技大学 A kind of data flow programmed method and system towards CPU/GPU isomeric groups
CN107612886A (en) * 2017-08-15 2018-01-19 中国科学院大学 A kind of Spark platforms Shuffle process compresses algorithm decision-making techniques
CN107885587A (en) * 2017-11-17 2018-04-06 清华大学 A kind of executive plan generation method of big data analysis process
CN108268312A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 Method for scheduling task and scheduler
US20180373540A1 (en) * 2017-06-21 2018-12-27 International Business Machines Corporation Cluster graphical processing unit (gpu) resource sharing efficiency by directed acyclic graph (dag) generation
CN110225082A (en) * 2019-04-30 2019-09-10 北京奇艺世纪科技有限公司 Task processing method, device, electronic equipment and computer-readable medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ACHIM LOSCH; MARCO PLATZNER: "MigHEFT: DAG-based Scheduling of Migratable Tasks on Heterogeneous Compute Nodes", 2020 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) *
仇文娟: "Research on a Dynamic Parallel Scheduling Mechanism for Dependent Tasks in Cloud Computing", China Master's Theses Full-text Database, Information Science and Technology Series
陈俊宇; 刘茜萍: "Data-Intensive Workflow Scheduling Based on Stage Partitioning in Cloud Environments", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), no. 04


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant