CN108197027B - Software performance optimization method, storable medium, computer program - Google Patents


Info

Publication number
CN108197027B
Authority
CN
China
Prior art keywords
program
performance optimization
binary
sentences
software performance
Prior art date
Legal status
Active
Application number
CN201711499169.6A
Other languages
Chinese (zh)
Other versions
CN108197027A (en)
Inventor
唐华
蔡智
Current Assignee
Guangzhou Ginpie Technology Co ltd
Original Assignee
Guangzhou Ginpie Technology Co ltd
Priority date
2017-12-29
Filing date
2017-12-29
Publication date
2021-07-16
Application filed by Guangzhou Ginpie Technology Co ltd
Priority to CN201711499169.6A
Publication of CN108197027A
Application granted
Publication of CN108197027B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3628 Software debugging of optimised code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention belongs to the technical field of computer software and discloses a software performance optimization method, a storable medium, a computer, and a computer program. Binary code is converted into an intermediate language and program dependencies are extracted; statements are classified by their dependencies so that statements with no dependency on one another fall into different groups, yielding a plurality of run groups that can execute in parallel; the different run groups are then assigned to different processing units and the binary code is regenerated. The invention converts a serial binary program into a parallel binary program, making full use of the parallel computing capability of high-performance computing equipment and improving program execution efficiency. It improves the software's ability to run in parallel and its performance on high-performance hardware.

Description

Software performance optimization method, storable medium, computer program
Technical Field
The invention belongs to the technical field of computer software, and particularly relates to a software performance optimization method, a storable medium, a computer and a computer program.
Background
No directly related patent was found.
A related class of techniques is loop parallelization. It splits a single loop into several loops that can run independently and then executes those loops in parallel on multiple processing units, thereby improving loop execution efficiency.
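As a rough illustration of loop parallelization (the prior art described above, not the invention), the Python sketch below splits one loop whose iterations are independent into several sub-loops over disjoint index ranges and runs them on a thread pool. The function names and the chunking scheme are assumptions made for illustration. In CPython the global interpreter lock prevents a real speed-up for this CPU-bound work, so the sketch only shows the structure of the transformation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def scale_serial(a: List[int], k: int) -> List[int]:
    # Original loop: iteration i touches only a[i] and out[i].
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] * k
    return out

def scale_parallel(a: List[int], k: int, workers: int = 4) -> List[int]:
    n = len(a)
    out = [0] * n
    size = max(1, -(-n // workers))  # ceiling division: iterations per sub-loop

    def sub_loop(start: int) -> None:
        # One independent sub-loop over a disjoint slice of the index range.
        for i in range(start, min(start + size, n)):
            out[i] = a[i] * k

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(sub_loop, range(0, n, size)))  # list() re-raises worker errors
    return out

print(scale_parallel(list(range(10)), 3) == scale_serial(list(range(10)), 3))  # True
```

Because every sub-loop writes a disjoint slice of `out`, no synchronization beyond waiting for the pool to finish is needed.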
In summary, the problem with the prior art is that loop parallelization handles only loops and does not deal with the parts of a program outside loops.
Disclosure of Invention
In view of the problems in the prior art, the present invention provides a software performance optimization method, a storable medium, a computer, and a computer program.
The invention is realized as a software performance optimization method comprising the following steps:
first, converting the binary code into an intermediate language and extracting the program dependencies;
second, classifying statements according to their dependencies, so that statements with no dependency on one another fall into different groups, yielding a plurality of run groups that can execute in parallel;
and third, assigning the different run groups to different processing units and regenerating the binary code.
Further, the software performance optimization method specifically comprises the following steps:
first, translating the binary code into an intermediate language;
second, extracting the program dependencies from the intermediate-language code, where the program dependencies comprise control-flow dependencies and data-flow dependencies and form a program dependency graph;
third, dividing statements that have no dependencies between them into different groups;
and fourth, binary rewriting, in which the grouped statements are written into parallel execution units to generate a parallel binary file.
Further, the statement grouping of the third step specifically includes:
first, searching the program dependency graph to find all leaf nodes;
second, tracing downward along the program dependency graph from the nodes of the current run group until a branch point is reached, and placing the traced nodes in the same run group;
third, at each branch point, determining whether all statements on which the branch point depends have finished executing; if not, waiting; if so, setting the state of the branch point to ready and placing the branch point in a new run group;
and fourth, determining whether the current node is the root node of the program dependency graph; if not, returning to the second step; if so, the grouping is finished.
Another object of the present invention is to provide a software performance optimization system implementing the software performance optimization method, the system including:
the translation module is used for translating the binary code into an intermediate language;
the extraction module is used for extracting the program dependency relationship from the intermediate language code;
the grouping module is used for dividing statements that have no dependencies between them into different groups;
and the rewriting module is used for writing the grouped statements into parallel execution units to generate a parallel binary file.
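The four modules can be pictured as a simple pipeline. The sketch below is a hypothetical Python rendering: the class name `PerformanceOptimizer` and the data types (a list of strings for the intermediate language, a dict of dependency sets for the graph, lists of statement indices for run groups) are assumptions, since the patent does not fix any interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

IR = List[str]             # hypothetical: one intermediate-language statement per entry
PDG = Dict[int, Set[int]]  # statement index -> indices it depends on
Groups = List[List[int]]   # run groups of statement indices

@dataclass
class PerformanceOptimizer:
    translate: Callable[[bytes], IR]        # translation module
    extract: Callable[[IR], PDG]            # extraction module
    group: Callable[[PDG], Groups]          # grouping module
    rewrite: Callable[[IR, Groups], bytes]  # rewriting module

    def optimize(self, binary: bytes) -> bytes:
        # Serial binary in, parallel binary out, passing through each module in turn.
        ir = self.translate(binary)
        pdg = self.extract(ir)
        groups = self.group(pdg)
        return self.rewrite(ir, groups)

# Trivial stand-ins that only show the data flow between the modules.
demo = PerformanceOptimizer(
    translate=lambda b: b.decode().split(";"),
    extract=lambda ir: {i: set() for i in range(len(ir))},
    group=lambda pdg: [[i] for i in sorted(pdg)],
    rewrite=lambda ir, gs: "|".join(",".join(ir[i] for i in g) for g in gs).encode(),
)
print(demo.optimize(b"x=1;y=2"))  # b'x=1|y=2'
```

Keeping each module as a replaceable callable mirrors the system description, where translation, extraction, grouping, and rewriting are separate components.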
Another object of the present invention is to provide a computer program for implementing the software performance optimization method.
Another object of the present invention is to provide a computer to which the software performance optimization method is applied.
The invention also aims to provide a storage medium storing the software performance optimization method, wherein the storage medium is a ROM (read-only memory), a RAM (read-write memory), or an EPROM (programmable memory).
The invention converts a serial binary program into a parallel binary program, making full use of the parallel computing capability of high-performance computing equipment and improving program execution efficiency. It improves the software's ability to run in parallel and its performance on high-performance hardware.
Drawings
Fig. 1 is a flowchart of a software performance optimization method according to an embodiment of the present invention.
Fig. 2 is a flowchart of an implementation of a software performance optimization method according to an embodiment of the present invention.
Fig. 3 is a program dependency diagram provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Unlike existing loop parallelization techniques, the invention parallelizes the parallelizable statements throughout the whole program outside of loops, and uses program dependency analysis to determine the control-flow and data-flow dependencies between statements.
The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.
As shown in fig. 1, the software performance optimization method provided by the embodiment of the present invention includes the following steps:
S101: converting the serial program's binary code into an intermediate language;
S102: computing the dependencies between statements;
S103: constructing a statement dependency graph;
S104: placing interdependent statements in the same execution thread;
S105: generating the parallel binary program using binary rewriting.
The software performance optimization method provided by the embodiment of the invention specifically comprises the following steps:
(1) Translation. Implementing the translation component first requires selecting or designing a simple, easy-to-use intermediate language. Several intermediate-language formats are currently available, such as VEX and LLVM IR; this embodiment uses the LLVM intermediate language.
(2) Program dependency analysis covers both data-flow and control-flow dependencies; the program dependency graph is built using the algorithm proposed by Austin et al. in "Dynamic Dependency Analysis of Ordinary Programs".
(3) Statement grouping is programmed in C/C++ as an LLVM plug-in and implemented directly on LLVM.
(4) Binary rewriting. Various binary rewriting tools can be used, such as Dyninst. In this embodiment, the grouped LLVM intermediate statements are recompiled directly with LLVM to generate a parallel binary file.
The application of the principles of the present invention will now be described in further detail with reference to the accompanying drawings.
As shown in Fig. 2, the input to the software performance optimization method of this embodiment is the software's binary file; the software's source code is not required. The output is an optimized parallel binary file that produces the same results as the original serial binary file, but achieves higher performance when run on a high-performance device.
Translation: the binary code is translated into an intermediate language that is easy to process. Binary code is determined by its instruction set; current mainstream instruction sets include x86, x64, ARM, PowerPC, and MIPS. Instruction formats and semantics differ greatly between instruction sets, so a uniform intermediate language is needed to hide those differences and allow a single optimizer to be built. In addition, some instruction sets are very complex, x86 being an example: it contains hundreds of variable-length instructions, which may have 0, 1, or 2 operands, and an operand may be an immediate, a register, or a memory address. This makes it very difficult to analyze and manipulate binary code directly. The invention therefore converts the binary file into a simple intermediate language resembling a reduced instruction set, with only around 10 to 20 statement types whose simple semantics make analysis convenient.
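To make the idea of a reduced, RISC-like intermediate language concrete, the sketch below lowers a toy subset of two-operand x86-style instructions (`mov`, `add`, `sub` with register, immediate, or `[base+disp]` memory operands) into tuples drawn from a handful of statement kinds: address add, load, store, arithmetic, and move. The operand model, the tuple format, and the temporary names `t0`, `t1`, and so on are hypothetical; a real lifter targeting LLVM IR or VEX would also model condition flags, which this sketch ignores.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional, Tuple, Union

@dataclass(frozen=True)
class Mem:
    base: str  # base register, e.g. "rbx"
    disp: int  # displacement in bytes

Operand = Union[str, int, Mem]  # register name, immediate, or memory operand

class Lowerer:
    """Lowers toy two-operand CISC instructions into RISC-like IR tuples."""

    def __init__(self) -> None:
        self._ids = count()
        self.ir: List[Tuple] = []

    def _tmp(self) -> str:
        return f"t{next(self._ids)}"  # fresh temporary

    def _addr(self, m: Mem) -> str:
        # Address arithmetic becomes its own explicit statement.
        addr = self._tmp()
        self.ir.append(("add", addr, m.base, m.disp))
        return addr

    def _read(self, op: Operand) -> Union[str, int]:
        # Memory source operands become an address computation plus a LOAD.
        if isinstance(op, Mem):
            addr = self._addr(op)
            val = self._tmp()
            self.ir.append(("load", val, addr))
            return val
        return op

    def lower(self, mnemonic: str, dst: Operand, src: Operand) -> List[Tuple]:
        """Append IR for one instruction and return the statements it produced."""
        start = len(self.ir)
        rhs = self._read(src)
        dst_addr: Optional[str] = self._addr(dst) if isinstance(dst, Mem) else None
        if mnemonic == "mov":
            result = rhs
        elif mnemonic in ("add", "sub"):
            lhs = dst
            if dst_addr is not None:
                lhs = self._tmp()
                self.ir.append(("load", lhs, dst_addr))
            result = self._tmp()
            self.ir.append((mnemonic, result, lhs, rhs))
        else:
            raise NotImplementedError(mnemonic)
        # Write-back: memory destinations become an explicit STORE.
        if dst_addr is not None:
            self.ir.append(("store", dst_addr, result))
        else:
            self.ir.append(("mov", dst, result))
        return self.ir[start:]

# x86-style "add [rbx+8], eax" becomes four simple IR statements.
for stmt in Lowerer().lower("add", Mem("rbx", 8), "eax"):
    print(stmt)
```

Each output statement does exactly one thing (compute an address, load, operate, or store), which is what makes the later dependency analysis tractable.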
Program dependency analysis: program dependencies, comprising control-flow dependencies and data-flow dependencies, are extracted from the intermediate-language code to form a program dependency graph. An example program dependency graph is shown in Fig. 3. In this example, the program is translated into a total of 11 intermediate statements, and dependencies are marked by arrows; for example, statement 5 depends on statements 1, 2, and 3.
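The sketch below shows one plausible way to derive such a graph from straight-line intermediate statements. Each statement records the registers or variables it writes (`defs`) and reads (`uses`); a later statement is made data-dependent on an earlier one on any read-after-write, write-after-read, or write-after-write overlap, and control-dependent on an optional `guard` statement. The `Stmt` type, the `guard` field, and the statement contents are assumptions; only the relation "statement 5 depends on statements 1, 2, 3" mirrors the text. This is a conservative, quadratic analysis, not the Austin et al. algorithm cited in the embodiment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass
class Stmt:
    sid: int                     # statement number
    defs: Set[str]               # registers/variables written
    uses: Set[str]               # registers/variables read
    guard: Optional[int] = None  # hypothetical: branch statement controlling this one

def build_pdg(stmts: List[Stmt]) -> Dict[int, Set[int]]:
    """Map each statement to the statements it depends on (stmts in program order)."""
    deps: Dict[int, Set[int]] = {s.sid: set() for s in stmts}
    for j, later in enumerate(stmts):
        for earlier in stmts[:j]:
            true_dep = earlier.defs & later.uses    # read-after-write
            anti_dep = earlier.uses & later.defs    # write-after-read
            output_dep = earlier.defs & later.defs  # write-after-write
            if true_dep or anti_dep or output_dep:
                deps[later.sid].add(earlier.sid)    # data-flow dependency
        if later.guard is not None:
            deps[later.sid].add(later.guard)        # control-flow dependency
    return deps

# Hypothetical statements; only "5 depends on 1, 2, 3" mirrors the text.
example = [
    Stmt(1, {"a"}, {"x"}),
    Stmt(2, {"b"}, {"y"}),
    Stmt(3, {"c"}, {"z"}),
    Stmt(4, {"d"}, {"w"}),
    Stmt(5, {"e"}, {"a", "b", "c"}),
]
print(build_pdg(example)[5])  # {1, 2, 3}
```

Recording an edge to every conflicting earlier statement over-approximates the true dependencies; extra edges cost some parallelism but never allow an unsafe reordering.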
Statement grouping: statements with no dependencies between them are placed in different groups, so that they can be handed to high-performance computing hardware and executed in parallel. Statement grouping specifically includes:
First, initialization: the program dependency graph is searched for all leaf nodes, i.e., nodes that do not depend on any other node, such as statements 1, 2, 3, 4, and 10 in Fig. 3. Each leaf node is placed in its own run group, so the example starts with 5 run groups. Each node is in one of two states, not ready or ready; initially, all leaf nodes are ready and all other nodes are marked not ready.
Second, starting from the nodes of the current run group, the program dependency graph is traced downward until a branch point is reached, and the traced nodes are placed in the same run group. Thus statements 10 and 11 are in the same run group, meaning they cannot be executed in parallel.
Third, at each branch point, it is determined whether all statements on which the branch point depends have finished executing; if not, the method waits; if so, the state of the branch point is set to ready and the branch point is placed in a new run group. In the example of Fig. 3, statement 6 must wait until statements 4 and 11 have finished executing before its state is changed to ready and it is placed in a new run group.
Fourth, it is determined whether the current node is the root node of the program dependency graph; if not, the method returns to the second step; if so, grouping is finished.
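One possible reading of these four steps is sketched below in Python. It treats a node as a branch point when it has several dependents (a fork) or several dependencies (a join): straight chains are kept in one run group, and a join node starts a new run group only once everything it depends on has been grouped. Looping until no node is ready stands in for the explicit root-node check. The example graph reproduces the facts stated for Fig. 3 (leaves 1, 2, 3, 4, 10; statement 5 depends on 1, 2, 3; 11 follows 10; 6 waits on 4 and 11), while the edges for statements 7 to 9 are invented to complete it.

```python
from collections import defaultdict
from typing import Dict, List, Set

def group_statements(deps: Dict[int, Set[int]]) -> List[List[int]]:
    """Split a program dependency graph into run groups.

    deps maps every statement to the set of statements it depends on.
    """
    users: Dict[int, Set[int]] = defaultdict(set)  # reverse edges: node -> dependents
    for node, preds in deps.items():
        for pred in preds:
            users[pred].add(node)

    # Step 1: every leaf node (depends on nothing) is ready and starts a run group.
    ready = sorted(node for node, preds in deps.items() if not preds)
    done: Set[int] = set()
    groups: List[List[int]] = []

    while ready:  # ends once the last (root) node has been grouped
        node = ready.pop(0)
        group = [node]
        # Step 2: follow the graph downward while there is exactly one dependent (no fork).
        while len(users[node]) == 1:
            (nxt,) = users[node]
            if len(deps[nxt]) != 1:  # nxt is a join branch point: stop the chain
                break
            group.append(nxt)
            node = nxt
        groups.append(group)
        done.update(group)
        # Step 3: a branch point becomes ready only when all its dependencies are done.
        for succ in sorted(users[node]):
            if succ not in ready and deps[succ] <= done:
                ready.append(succ)
    return groups

example_deps = {
    1: set(), 2: set(), 3: set(), 4: set(), 10: set(),
    5: {1, 2, 3},
    11: {10},
    6: {4, 11},
    7: {5, 6},  # invented edge
    8: {7},     # invented edge
    9: {8},     # invented edge
}
print(group_statements(example_deps))
# [[1], [2], [3], [4], [10, 11], [5], [6], [7, 8, 9]]
```

The result matches the two outcomes the text spells out: statements 10 and 11 share a run group, and statement 6 starts its own group only after 4 and 11 are done.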
Binary rewriting: the grouped statements are written into parallel execution units (such as threads), finally generating a parallel binary file.
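The runtime behaviour of the rewritten binary can be approximated with threads: each run group runs on its own thread and blocks at a branch point until the statements it depends on in other groups have finished. In the sketch below, `run_groups_in_parallel`, `make_program`, and the five-statement program are hypothetical, and Python callables stand in for machine code. Comparing the result with a serial run illustrates the requirement that the parallel binary produce the same results as the original serial one. If an action raised an exception, threads waiting on it would block forever, which a real implementation would have to handle.

```python
import threading
from typing import Callable, Dict, List, Set

def run_groups_in_parallel(groups: List[List[int]],
                           deps: Dict[int, Set[int]],
                           actions: Dict[int, Callable[[], None]]) -> None:
    """Execute each run group on its own thread, waiting at cross-group dependencies."""
    finished = {sid: threading.Event() for sid in actions}

    def worker(group: List[int]) -> None:
        for sid in group:
            for dep in deps[sid]:
                finished[dep].wait()  # branch point: block until the dependency has run
            actions[sid]()
            finished[sid].set()

    threads = [threading.Thread(target=worker, args=(g,)) for g in groups]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def make_program(env: Dict[str, int]) -> Dict[int, Callable[[], None]]:
    # Hypothetical five-statement program writing into a shared environment.
    return {
        1: lambda: env.update(a=2),
        2: lambda: env.update(b=3),
        3: lambda: env.update(c=env["a"] + env["b"]),
        4: lambda: env.update(d=10),
        5: lambda: env.update(e=env["c"] * env["d"]),
    }

DEPS = {1: set(), 2: set(), 3: {1, 2}, 4: set(), 5: {3, 4}}
GROUPS = [[1, 3, 5], [2], [4]]  # hand-written grouping; order inside a group respects DEPS

serial_env: Dict[str, int] = {}
serial = make_program(serial_env)
for sid in sorted(serial):
    serial[sid]()

parallel_env: Dict[str, int] = {}
run_groups_in_parallel(GROUPS, DEPS, make_program(parallel_env))
print(parallel_env == serial_env)  # True
```

The executor accepts any grouping in which each group's internal order respects the dependencies, so it can run the output of a grouping step as well as the hand-written groups used here.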
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (4)

1. A software performance optimization method, characterized in that the method parallelizes the parallelizable statements of the whole program outside loops, and uses program dependency analysis to determine the control-flow and data-flow dependencies between statements;
the software performance optimization method specifically comprises the following steps:
first, converting binary code into an intermediate language and extracting program dependencies, wherein the binary code is translated into an intermediate language that is easy to process, the binary code being determined by an instruction set, and current mainstream instruction sets include x86, x64, ARM, PowerPC, and MIPS;
second, classifying the statements without dependency relationships to obtain a plurality of run groups that execute in parallel;
third, assigning different run groups to different processing units and regenerating the binary code;
the software performance optimization method specifically comprises the following steps:
first, translating binary code into an intermediate language;
second, extracting program dependencies from the intermediate-language code, the program dependencies comprising control-flow dependencies and data-flow dependencies and forming a program dependency graph;
third, dividing the statements without dependency relationships into different groups;
fourth, binary rewriting: writing the grouped statements into parallel execution units to generate a parallel binary file;
the third step specifically comprises:
first, searching the program dependency graph to find all leaf nodes;
second, tracing downward along the program dependency graph from the nodes of the current run group until a branch point is reached, and placing the traced nodes in the same run group;
third, at each branch point, determining whether all statements on which the branch point depends have finished executing; if not, waiting; if so, setting the state of the branch point to ready and placing the branch point in a new run group;
and fourth, determining whether the current node is the root node of the program dependency graph; if not, returning to the second step; if so, finishing the grouping.
2. A software performance optimization system for the software performance optimization method of claim 1, wherein the software performance optimization system comprises:
the translation module is used for translating the binary code into an intermediate language;
the extraction module is used for extracting program dependencies from the intermediate-language code, wherein the binary code is translated into an intermediate language that is easy to process, the binary code being determined by an instruction set, and current mainstream instruction sets include x86, x64, ARM, PowerPC, and MIPS;
the grouping module is used for classifying the statements without dependency relationships to obtain a plurality of run groups that execute in parallel, dividing statements without dependencies into different groups, specifically by:
searching the program dependency graph to find all leaf nodes;
tracing downward along the program dependency graph from the nodes of the current run group until a branch point is reached, and placing the traced nodes in the same run group;
at each branch point, determining whether all statements on which the branch point depends have finished executing; if not, waiting; if so, setting the state of the branch point to ready and placing the branch point in a new run group;
determining whether the current node is the root node of the program dependency graph; if not, returning to the tracing step; if so, finishing the grouping;
the rewriting module is used for assigning different run groups to different processing units, regenerating the binary code, and writing the grouped statements into parallel execution units to generate a parallel binary file.
3. A computer to which the software performance optimization method of claim 1 is applied.
4. A storable medium storing the software performance optimization method of claim 1, wherein the storable medium is a ROM (read-only memory), a RAM (read-write memory), or an EPROM (programmable memory).
CN201711499169.6A 2017-12-29 2017-12-29 Software performance optimization method, storable medium, computer program Active CN108197027B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711499169.6A CN108197027B (en) 2017-12-29 2017-12-29 Software performance optimization method, storable medium, computer program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711499169.6A CN108197027B (en) 2017-12-29 2017-12-29 Software performance optimization method, storable medium, computer program

Publications (2)

Publication Number Publication Date
CN108197027A CN108197027A (en) 2018-06-22
CN108197027B true CN108197027B (en) 2021-07-16

Family

ID=62587845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711499169.6A Active CN108197027B (en) 2017-12-29 2017-12-29 Software performance optimization method, storable medium, computer program

Country Status (1)

Country Link
CN (1) CN108197027B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144521A (en) * 2018-09-28 2019-01-04 五八有限公司 Method and apparatus for generating a static library, computer device, and readable storage medium
CN109445881A (en) * 2018-11-02 2019-03-08 拉卡拉支付股份有限公司 Script operation method, device, electronic equipment and storage medium
CN111752821B (en) * 2019-03-29 2024-06-04 上海哔哩哔哩科技有限公司 Method, device, computer equipment and readable storage medium for packet pressure measurement
CN116775277A (en) * 2019-09-10 2023-09-19 华为技术有限公司 Method and device for optimizing tensor calculation performance
CN114647464B (en) * 2022-05-19 2022-09-06 恒生电子股份有限公司 Application parallel starting processing method and device and electronic equipment
CN116610325B (en) * 2023-07-20 2023-11-10 龙芯中科技术股份有限公司 Binary translation method, binary translation device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814053A (en) * 2010-03-29 2010-08-25 中国人民解放军信息工程大学 Method for discovering binary code vulnerability based on function model
CN102043659A (en) * 2010-12-08 2011-05-04 上海交通大学 Compiling device for eliminating memory access conflict and implementation method thereof

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8479161B2 (en) * 2009-03-18 2013-07-02 Oracle International Corporation System and method for performing software due diligence using a binary scan engine and parallel pattern matching
CN101963918B (en) * 2010-10-26 2013-05-01 上海交通大学 Method for realizing virtual execution environment of central processing unit (CPU)/graphics processing unit (GPU) heterogeneous platform
US9183020B1 (en) * 2014-11-10 2015-11-10 Xamarin Inc. Multi-sized data types for managed code
CN105242929B (en) * 2015-10-13 2018-07-17 西安交通大学 Design method for automatic parallelization of binary programs on multi-core platforms
CN105550120B (en) * 2016-01-29 2018-02-16 中国人民解放军信息工程大学 Multi-source multi-target approach testing method based on parallel symbolic execution

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
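XML query parallelization based on a functional intermediate language; 陈荣鑫 (Chen Rongxin); Journal of Chongqing University of Technology (Natural Science); 2012-04-21 (No. 07); pp. 81-86 *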

Also Published As

Publication number Publication date
CN108197027A (en) 2018-06-22

Similar Documents

Publication Publication Date Title
CN108197027B (en) Software performance optimization method, storable medium, computer program
US9081586B2 (en) Systems and methods for customizing optimization/transformation/ processing strategies
WO2021258692A1 (en) Multi-chip compatible compiling method and device
US20190220535A1 (en) Database system based on jit compilation, query processing method thereof, and stored procedure optimization method thereof
Ahmad et al. Leveraging parallel data processing frameworks with verified lifting
Ward et al. A practical program transformation system for reverse engineering
CN110347588B (en) Software verification method, device, computer equipment and storage medium
US11593076B2 (en) Method for merging architecture data
US9182960B2 (en) Loop distribution detection program and loop distribution detection method
US20150020051A1 (en) Method and apparatus for automated conversion of software applications
US20180253287A1 (en) Method for translation of assembler computer language to validated object-oriented programming language
Namjoshi et al. A self-certifying compilation framework for webassembly
Li et al. J2M: a Java to MapReduce translator for cloud computing
CN112527304A (en) Self-adaptive node fusion compiling optimization method based on heterogeneous platform
Wenzel et al. Declarative programming for microcontrollers-Datalog on Arduino
Cordeiro et al. Intrinsics-hmc: An automatic trace generator for simulations of processing-in-memory instructions
CN115983378A (en) Automatic compiling method for kernel of machine learning operating system
JP5775386B2 (en) Parallelization method, system, and program
KR102117165B1 (en) Method and apparatus for testing intermediate language for binary analysis
CN115705250A (en) Monitoring stack usage to optimize programs
CN112114817B (en) COBOL language-based data dictionary field information acquisition method and device
Nobre et al. Impact of compiler phase ordering when targeting GPUs
Yang et al. M2Coder: A Fully Automated Translator from Matlab M-functions to C/C++ Codes for ACS Motion Controllers
Escalada et al. An adaptable infrastructure to generate training datasets for decompilation issues
Jiang et al. S2N: Model Transformation from SPIN to NuSMV: (Tool Paper)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant