CN108197027B

CN108197027B - Software performance optimization method, storable medium, computer program

Info

Publication number: CN108197027B
Application number: CN201711499169.6A
Authority: CN
Inventors: 唐华; 蔡智
Original assignee: Guangzhou Ginpie Technology Co ltd
Current assignee: Guangzhou Ginpie Technology Co ltd
Priority date: 2017-12-29
Filing date: 2017-12-29
Publication date: 2021-07-16
Anticipated expiration: 2037-12-29
Also published as: CN108197027A

Abstract

The invention belongs to the technical field of computer software and discloses a software performance optimization method, a storable medium, a computer, and a computer program. Binary code is converted into an intermediate language and the program dependency relationships are extracted; statements that have no dependency relationship with one another are grouped to obtain a plurality of run groups that can execute in parallel; and the different run groups are assigned to different computing components and the binary code is regenerated. The invention can thereby convert a serial binary program into a parallel binary program, making fuller use of the parallel computing capability of high-performance computing equipment and improving program execution efficiency.

Description

Software performance optimization method, storable medium, computer program

Technical Field

The invention belongs to the technical field of computer software, and particularly relates to a software performance optimization method, a storable medium, a computer and a computer program.

Background

There is no relevant patent.

A related class of techniques is loop parallelization. This technique splits a single loop into several loops that can run independently and then runs those loops in parallel on multiple computing components, thereby improving the execution efficiency of the loop.

In summary, the problems of the prior art are as follows: loop parallelization only focuses on loops and does not deal with parts of the program outside loops.

Disclosure of Invention

In view of the problems in the prior art, the present invention provides a software performance optimization method, a storable medium, a computer, and a computer program.

The invention is realized in such a way that a software performance optimization method comprises the following steps:

firstly, converting the binary code into an intermediate language and extracting the program dependency relationships;

secondly, grouping the statements that have no dependency relationship with one another to obtain a plurality of run groups that can execute in parallel;

and thirdly, assigning the different run groups to different computing components and regenerating the binary code.

Further, the software performance optimization method specifically comprises the following steps:

step one, translating the binary code into an intermediate language;

step two, extracting the program dependency relationships from the intermediate language code, wherein the program dependency relationships comprise control flow dependencies and data flow dependencies and form a program dependency graph;

step three, dividing the statements that have no dependency relationship with one another into different groups;

and step four, binary rewriting, wherein the grouped statements are written into parallel execution units to generate a parallel binary file.

Further, the statement grouping of the third step specifically includes:

firstly, searching the program dependency graph to find all leaf nodes;

secondly, following the program dependency graph downward from the nodes of the current run group until a branch point is reached, and placing the traversed nodes in the same run group;

thirdly, at each branch point, determining whether all statements on which the branch point depends have finished executing; if not, waiting; if so, setting the state of the branch point to ready and placing the branch point in a new run group;

and fourthly, determining whether the current node is the root node of the program dependency graph; if not, returning to the second step; if so, finishing the grouping.

Another object of the present invention is to provide a software performance optimization system of the software performance optimization method, the software performance optimization system including:

the translation module is used for translating the binary code into an intermediate language;

the extraction module is used for extracting the program dependency relationship from the intermediate language code;

the grouping module is used for dividing the statements that have no dependency relationship with one another into different groups;

and the rewriting module is used for writing the grouped statements into parallel execution units to generate a parallel binary file.
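The way the four modules hand data to one another can be sketched as a simple pipeline. This is only an illustrative outline: the type aliases, the `optimize` function, and the stand-in stages passed to it are assumptions made for this sketch and are not part of the disclosed system.

```python
from typing import Callable, Dict, List, Set

IR = List[str]                   # intermediate-language statements
DepGraph = Dict[int, Set[int]]   # statement index -> indices it depends on
Groups = List[List[int]]         # run groups of statement indices


def optimize(binary: bytes,
             translate: Callable[[bytes], IR],
             extract: Callable[[IR], DepGraph],
             group: Callable[[DepGraph], Groups],
             rewrite: Callable[[IR, Groups], bytes]) -> bytes:
    """Chain the four modules: translation, extraction, grouping, rewriting."""
    ir = translate(binary)       # binary code -> intermediate language
    graph = extract(ir)          # intermediate language -> dependency graph
    groups = group(graph)        # dependency graph -> run groups
    return rewrite(ir, groups)   # run groups -> parallel binary


# Stand-in stages (hypothetical) so the sketch runs end to end;
# real stages would perform the actual translation and rewriting.
result = optimize(
    b"\x90",
    translate=lambda b: ["nop"] * len(b),
    extract=lambda ir: {i: set() for i in range(len(ir))},
    group=lambda g: [[i] for i in g],
    rewrite=lambda ir, groups: bytes(len(groups)),
)
```

Passing the stages in as callables keeps each module replaceable, for example to swap in a different intermediate language or rewriting tool.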

Another object of the present invention is to provide a computer program for implementing the software performance optimization method.

Another object of the present invention is to provide a computer to which the software performance optimization method is applied.

The invention also aims to provide a storage medium for storing the software performance optimization method, wherein the storage medium is a ROM read-only memory, a RAM read-write memory or an EPROM programmable memory.

The invention can convert a serial binary program into a parallel binary program, making fuller use of the parallel computing capability of high-performance computing equipment and improving program execution efficiency. It thereby improves the parallel execution capability of software and its performance on high-performance hardware.

Drawings

Fig. 1 is a flowchart of a software performance optimization method according to an embodiment of the present invention.

Fig. 2 is a flowchart of an implementation of a software performance optimization method according to an embodiment of the present invention.

Fig. 3 is a program dependency diagram provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Unlike existing loop parallelization techniques, the invention parallelizes the parallelizable statements of the whole program outside of loops, and uses program dependency analysis to determine the control flow dependencies and data flow dependencies among statements.

The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.

As shown in fig. 1, the software performance optimization method provided by the embodiment of the present invention includes the following steps:

S101: converting the binary code of the serial program into an intermediate language;

S102: calculating the dependency relationships among the statements;

S103: constructing a statement dependency graph;

S104: placing mutually dependent statements in the same execution thread;

S105: generating the parallel binary program using a binary rewriting technique.

The software performance optimization method provided by the embodiment of the invention specifically comprises the following steps:

(1) Translation: implementing the translation component first requires selecting or designing a simple, easy-to-use intermediate language. Several intermediate language formats are currently available, such as VEX and LLVM. The embodiment of the invention adopts the LLVM intermediate language.

(2) Program dependency analysis: this covers data flow dependencies and control flow dependencies, and uses the algorithm proposed by Austin et al. in "Dynamic Dependency Analysis of Ordinary Programs" to construct the program dependency graph.

(3) Statement grouping: implemented in C/C++ as an LLVM plug-in, directly on top of LLVM.

(4) Binary rewriting: various binary rewriting tools, such as Dyninst, can be employed. The grouped LLVM intermediate statements are recompiled directly with LLVM to generate the parallel binary file.

The application of the principles of the present invention will now be described in further detail with reference to the accompanying drawings.

As shown in Fig. 2, the input of the software performance optimization method provided by the embodiment of the present invention is the binary file of the software; the source program is not required. The output is the optimized parallel binary file. Running it produces the same results as the original serial binary file, but on a high-performance device it achieves higher performance.

Translation: the binary code is translated into an intermediate language that is easy to process. Binary code is defined by an instruction set; current mainstream instruction sets include x86, x64, ARM, PowerPC, and MIPS. First, instruction formats and semantics differ greatly between instruction sets, so a uniform intermediate language is needed to hide these differences and allow a single optimization device to be implemented. Second, some instruction sets are very complex. Taking x86 as an example, the instruction set includes hundreds of instructions of variable length; an instruction may have 0, 1, or 2 operands, and an operand may be an immediate value, a register, or a memory address. Operating on and analyzing binary code directly is therefore very difficult. For this reason, the invention converts the binary file into a simple intermediate language, similar to a reduced instruction set, that contains only about ten to twenty statement types and has simple semantics, so that it can be analyzed conveniently.
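To illustrate why a small, uniform intermediate form is easier to analyze than raw machine instructions, the following toy sketch lowers a few two-operand x86-style instructions into three-address statements. The instruction subset, the textual syntax, and the `lift` function are assumptions made for illustration only; the output is not LLVM intermediate language, and details such as flags and memory operands are ignored.

```python
# Toy lifter: maps a tiny, assumed subset of x86-style assembly text to
# uniform three-address statements of the form (dest, op, src1, src2).
BINARY_OPS = {"add": "+", "sub": "-", "and": "&", "or": "|", "xor": "^"}


def lift(line):
    """Translate one assembly line into a (dest, op, src1, src2) tuple."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [o.strip() for o in rest.split(",")] if rest else []
    if mnemonic == "mov" and len(operands) == 2:
        dest, src = operands
        return (dest, "copy", src, None)
    if mnemonic in BINARY_OPS and len(operands) == 2:
        dest, src = operands
        # Two-operand form "op dest, src" means dest = dest op src.
        return (dest, BINARY_OPS[mnemonic], dest, src)
    if mnemonic == "inc" and len(operands) == 1:
        return (operands[0], "+", operands[0], "1")
    raise ValueError(f"unsupported instruction: {line!r}")


program = ["mov eax, 5", "add eax, ebx", "inc ecx"]
ir = [lift(line) for line in program]
```

Every statement in `ir` has the same shape, with an explicit destination and explicit sources, which is the property that later dependency analysis relies on.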

Program dependency analysis: the program dependency relationships, comprising control flow dependencies and data flow dependencies, are extracted from the intermediate language code to form a program dependency graph. An example of a program dependency graph is shown in Fig. 3. In this example, the program is translated into a total of 11 intermediate statements, and dependencies are marked by arrows; for example, statement 5 depends on statements 1, 2, and 3.
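As a minimal sketch of the data flow part of this analysis, the code below derives dependencies by tracking which statement last wrote each variable. It assumes each variable is written only once (single-assignment form), so only read-after-write dependencies matter; control flow and memory dependencies are omitted. The statement encoding and the `data_dependencies` function are hypothetical, and this is not the algorithm of Austin et al.

```python
def data_dependencies(statements):
    """Return {statement number: set of statement numbers it depends on}.

    statements is a list of (dest, sources) pairs, numbered from 1.
    Assumes each variable is written once, so only read-after-write
    dependencies need to be tracked.
    """
    last_writer = {}   # variable name -> number of the statement that wrote it
    deps = {}
    for number, (dest, sources) in enumerate(statements, start=1):
        deps[number] = {last_writer[s] for s in sources if s in last_writer}
        last_writer[dest] = number
    return deps


# Hypothetical five-statement fragment arranged so that, as in the example
# above, statement 5 depends on statements 1, 2, and 3.
statements = [
    ("a", []),               # 1: a = constant
    ("b", []),               # 2: b = constant
    ("c", []),               # 3: c = constant
    ("d", []),               # 4: d = constant
    ("e", ["a", "b", "c"]),  # 5: e = f(a, b, c)
]
deps = data_dependencies(statements)
```

Statements whose dependency set is empty are the leaf nodes of the resulting graph.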

Statement grouping: statements that have no dependency relationship with one another are divided into different groups so that they can be handed to high-performance computing hardware and executed in parallel. The statement grouping specifically includes:

The first step is initialization: the program dependency graph is searched to find all leaf nodes, i.e., nodes that do not depend on any other node, such as nodes 1, 2, 3, 4, and 10 in Fig. 3. Each leaf node is placed in its own run group, so under the initial conditions there are 5 run groups in the example. A node has one of two states, not ready or ready; initially, all leaf nodes are in the ready state and all other nodes are marked not ready.

In the second step, the program dependency graph is followed downward from the nodes of the current run group until a branch point is reached, and the nodes traversed are placed in the same run group. Thus, statements 10 and 11 are in the same run group, indicating that they cannot be executed in parallel.

In the third step, at each branch point, it is determined whether all statements on which the branch point depends have finished executing. If not, the branch point waits; if so, its state is set to ready and it is placed in a new run group. In the example of Fig. 3, statement 6 must wait until statements 4 and 11 have finished executing before its state is changed to ready and it is placed in a new run group.
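The grouping steps described above can be sketched as follows. This is one possible reading, not the patented implementation: a branch point is taken to be a node with more than one dependency (or the point where a node has more than one dependent), and "finished executing" is approximated statically by "already placed in a run group". The example graph uses only the relationships stated in the text (leaf nodes 1, 2, 3, 4, 10; statement 5 depends on 1, 2, 3; statement 11 depends on 10; statement 6 depends on 4 and 11); the edges for statements 7, 8, and 9 are hypothetical, since the full figure is not reproduced here.

```python
def group_statements(deps):
    """deps maps each statement to the set of statements it depends on."""
    dependents = {n: set() for n in deps}
    for node, preds in deps.items():
        for p in preds:
            dependents[p].add(node)

    done = set()     # statements already placed in a run group
    groups = []
    # Step 1: leaf nodes (no dependencies) start in the ready state.
    ready = sorted(n for n, preds in deps.items() if not preds)
    while ready:
        for start in ready:
            group = [start]
            done.add(start)
            node = start
            # Step 2: follow the chain downward until a branch point.
            while len(dependents[node]) == 1:
                (succ,) = dependents[node]
                if len(deps[succ]) != 1:
                    break            # succ has several dependencies: branch point
                group.append(succ)
                done.add(succ)
                node = succ
            groups.append(group)
        # Step 3: an ungrouped node becomes ready once everything it
        # depends on has been placed; it then starts a new run group.
        ready = sorted(n for n in deps if n not in done and deps[n] <= done)
    # Step 4: the loop ends once the root has been grouped.
    return groups


deps = {
    1: set(), 2: set(), 3: set(), 4: set(), 10: set(),
    5: {1, 2, 3}, 11: {10}, 6: {4, 11},
    7: {5}, 8: {6}, 9: {7, 8},   # hypothetical edges, not taken from the text
}
groups = group_statements(deps)
```

With this graph the initial pass yields five run groups, with statements 10 and 11 sharing one, and statement 6 is only grouped in the next pass, after statements 4 and 11 have been placed.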

Binary rewriting: the grouped statements are written into parallel execution units (such as threads), and the parallel binary file is finally generated.
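The effect of placing run groups on separate threads can be illustrated with a small simulation. This sketch does not rewrite any binary: it only shows each run group executing on its own thread, with a statement waiting until every statement it depends on has finished. The `run_groups` function, the `execute` callback, and the example groups are assumptions for illustration; in Python the threads show the synchronization structure rather than a real speedup.

```python
import threading


def run_groups(groups, deps, execute):
    """Run each group on its own thread, honoring the dependencies in deps."""
    finished = {n: threading.Event() for g in groups for n in g}

    def worker(group):
        for node in group:
            for dep in deps[node]:
                finished[dep].wait()   # not ready yet: wait for the dependency
            execute(node)              # ready: execute the statement
            finished[node].set()       # signal completion to waiting groups

    threads = [threading.Thread(target=worker, args=(g,)) for g in groups]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Statement 6 waits for statements 4 and 11; statement 11 follows 10.
deps = {4: set(), 10: set(), 11: {10}, 6: {4, 11}}
groups = [[4], [10, 11], [6]]

order = []
lock = threading.Lock()


def execute(node):
    with lock:                         # record execution order safely
        order.append(node)


run_groups(groups, deps, execute)
```

The recorded order may vary between runs, but statement 6 always appears after statements 4 and 11.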

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A software performance optimization method, characterized in that the software performance optimization method parallelizes the parallelizable statements of the whole program outside of loops, and uses program dependency analysis to determine the control flow dependency relationships and data flow dependency relationships among the statements;

the software performance optimization method specifically comprises the following steps:

firstly, converting the binary code into an intermediate language and extracting the program dependency relationships; the binary code is translated into an intermediate language which is easy to process, the binary code is determined by an instruction set, and current mainstream instruction sets include x86, x64, ARM, PowerPC, and MIPS;

thirdly, dividing different operation groups into different operation components and regenerating binary codes;

translating a binary code into an intermediate language;

step four, binary rewriting, writing the grouped statements into parallel execution units to generate a parallel binary file;

the third step specifically comprises:

firstly, searching a program dependency graph to find all leaf nodes;

2. A software performance optimization system of the software performance optimization method of claim 1, wherein the software performance optimization system comprises:

the extraction module is used for extracting the program dependency relationships from the intermediate language code; the binary code is translated into an intermediate language which is easy to process, the binary code is determined by an instruction set, and current mainstream instruction sets include x86, x64, ARM, PowerPC, and MIPS;

the grouping module is used for grouping the statements that have no dependency relationship with one another to obtain a plurality of run groups that execute in parallel, and for dividing statements without dependencies into different groups; specifically:

searching the program dependency graph to find all leaf nodes;

following the program dependency graph downward from the nodes of the current run group until a branch point is reached, and placing the traversed nodes in the same run group;

at each branch point, determining whether all statements on which the branch point depends have finished executing; if not, waiting; if so, setting the state of the branch point to ready and placing the branch point in a new run group;

determining whether the current node is the root node of the program dependency graph; if not, returning to the second step; if so, finishing the grouping;

the rewriting module is used for assigning different run groups to different computing components and regenerating the binary code, and for writing the grouped statements into parallel execution units to generate a parallel binary file.

3. A computer to which the software performance optimization method of claim 1 is applied.

4. A storable medium storing the software performance optimization method of claim 1, wherein the storable medium is a ROM read-only memory, a RAM read-write memory, or an EPROM programmable memory.