CN114995820A - Code translation method, device, equipment, storage medium and program product - Google Patents


Info

Publication number
CN114995820A
Authority
CN
China
Prior art keywords
source
basic block
instruction
target
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210568118.9A
Other languages
Chinese (zh)
Inventor
任丰原
李玮
骆晓徽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202210568118.9A
Publication of CN114995820A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G06F8/447 Target code generation
    • G06F8/70 Software maintenance or management
    • G06F8/76 Adapting program code to run in a different environment; Porting

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The present application relates to a code translation method, apparatus, device, storage medium and program product. The method comprises: obtaining a source machine code that comprises a plurality of source basic blocks; and, following the order in which the source basic blocks execute when the source machine code runs, performing a target operation repeatedly on the source basic blocks until the target operation has been completed on the last source basic block in that execution order. The ith target operation comprises: detecting whether the candidate source basic block corresponding to the ith target operation has already been translated; if so, obtaining the target basic block produced by translating that candidate source basic block and running the target basic block; if not, translating the candidate source basic block with the LLVM compiler to obtain the target basic block and then running it. The method improves the efficiency of machine code translation.

Description

Code translation method, device, equipment, storage medium and program product
Technical Field
The present application relates to the field of binary software migration technologies, and in particular, to a code translation method, apparatus, device, storage medium, and program product.
Background
Running a program compiled for a processor of architecture B (such as x86) directly on a processor of architecture A (such as ARMv8) requires dynamic binary translation.
In the related art, the open-source software QEMU (Quick Emulator) implements dynamic binary translation and supports a variety of architectures. Specifically, QEMU parses the source machine code and translates it into the required target machine code through its built-in TCG (Tiny Code Generator) compiler. However, the TCG compiler only provides translation and cannot directly run the translated target machine code, and it offers only very limited optimization during translation, so the target machine code it generates is inefficient.
Disclosure of Invention
In view of the above, it is necessary to provide a code translation method, apparatus, device, storage medium, and program product capable of improving machine code translation efficiency.
In a first aspect, the present application provides a method of code translation. The method comprises the following steps:
obtaining a source machine code, wherein the source machine code comprises a plurality of source basic blocks; and, following the execution order of the source basic blocks when the source machine code runs, performing target operations repeatedly on the source basic blocks until the target operation has been completed on the last source basic block in the execution order; wherein the ith target operation comprises: detecting whether the candidate source basic block corresponding to the ith target operation has already been translated; if so, obtaining the target basic block produced by translating the candidate source basic block and running the target basic block; if not, translating the candidate source basic block with the LLVM compiler to obtain the target basic block and running the target basic block.
In one embodiment, translating the candidate source basic block with the LLVM compiler to obtain the target basic block comprises: creating an LLVM function corresponding to the candidate source basic block; obtaining the entry address of the candidate source basic block, fetching source instructions from the candidate source basic block one by one starting at the entry address, obtaining the intermediate instruction corresponding to each source instruction as it is fetched, and adding each intermediate instruction to the LLVM function as it is obtained; and, once the intermediate instructions for all source instructions in the candidate source basic block have been added to the LLVM function, running the LLVM function to translate the candidate source basic block into the target basic block.
In one embodiment, obtaining the entry address of the candidate source basic block comprises: if i is 1, using the entry address of the source machine code as the entry address of the candidate source basic block; and if i is greater than 1, determining the entry address of the candidate source basic block from the last source instruction of the source basic block corresponding to the (i-1)th target operation.
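The entry-address rule above can be sketched as follows. This is a minimal illustration under assumptions the patent does not state: instructions are modeled by a hypothetical `SourceInsn` record, and the last instruction of the previous block is assumed to be a control flow instruction whose outcome (taken target or fall-through) was already resolved when the previous target basic block ran.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceInsn:
    """Hypothetical model of one decoded source instruction."""
    address: int                  # memory address of the instruction
    length: int                   # encoded size in bytes
    mnemonic: str                 # e.g. "jmp", "jne", "add"
    target: Optional[int] = None  # branch target (resolved at run time if indirect)
    taken: bool = False           # whether the branch was taken when the block ran


def next_entry_address(i: int, program_entry: int,
                       prev_last: Optional[SourceInsn]) -> int:
    """Entry address of the candidate source basic block for the ith target operation."""
    if i == 1:
        # First target operation: start at the entry point of the source machine code.
        return program_entry
    if prev_last is None:
        raise ValueError("i > 1 requires the last instruction of block i-1")
    # Later operations: follow the control flow instruction that ended block i-1.
    if prev_last.taken and prev_last.target is not None:
        return prev_last.target
    # Branch not taken: fall through to the next sequential instruction.
    return prev_last.address + prev_last.length
```

For example, a taken `jne` at 0x401010 with target 0x401040 yields 0x401040 as the next entry address, while the same branch not taken yields 0x401012.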
In one embodiment, fetching source instructions one by one from the source basic block starting at the entry address comprises: performing instruction fetch operations in sequence, starting from the entry address, until the fetched source instruction is a control flow instruction; wherein the ith instruction fetch operation comprises: determining, from the entry address, the byte identifier of the candidate source instruction corresponding to the ith instruction fetch operation; and fetching the candidate source instruction according to the byte identifier.
In one embodiment, obtaining the intermediate instruction corresponding to a source instruction comprises: querying a preset instruction database with the source instruction to obtain its corresponding intermediate instruction, wherein the instruction database stores correspondences between a plurality of preset source instructions and preset intermediate instructions.
In one embodiment, constructing the instruction database comprises: obtaining the instruction set specification corresponding to the source machine code, the specification comprising a plurality of preset source instructions; simulating the preset source instructions in a preset programming language and then translating them to obtain the preset intermediate instruction corresponding to each preset source instruction; and constructing the instruction database from the preset source instructions and their preset intermediate instructions.
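A minimal sketch of building and querying such an instruction database, assuming a toy source instruction set. The mnemonics and LLVM-IR-like templates are hypothetical stand-ins for the output of "simulate in a preset language, then translate"; they are not the patent's actual tables.

```python
from typing import Callable, Dict, List

# Hypothetical instruction set specification for a toy source ISA.
SPEC: List[str] = ["add", "sub", "xor"]

# Hypothetical IR templates standing in for the result of simulating each
# preset source instruction in a preset language and translating it.
IR_TEMPLATES: Dict[str, str] = {
    "add": "%r = add i64 %{0}, %{1}",
    "sub": "%r = sub i64 %{0}, %{1}",
    "xor": "%r = xor i64 %{0}, %{1}",
}


def build_instruction_db(spec: List[str],
                         translate: Callable[[str], str]) -> Dict[str, str]:
    """Map every preset source instruction to its preset intermediate instruction."""
    return {mnemonic: translate(mnemonic) for mnemonic in spec}


def lookup(db: Dict[str, str], mnemonic: str, operands: List[str]) -> str:
    """Query the database for one source instruction and fill in its operands."""
    if mnemonic not in db:
        raise NotImplementedError(f"no intermediate instruction for {mnemonic!r}")
    return db[mnemonic].format(*operands)


db = build_instruction_db(SPEC, IR_TEMPLATES.__getitem__)
print(lookup(db, "add", ["rax", "rbx"]))  # %r = add i64 %rax, %rbx
```

Building the table once, ahead of time, keeps the per-instruction cost during translation to a single dictionary lookup.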
In one embodiment, after the target basic block is obtained, the method further comprises: storing the target basic block in a preset cache space, using the entry address of the candidate source basic block as the identifier of the target basic block.
In one embodiment, detecting whether the candidate source basic block corresponding to the ith target operation has been translated comprises: checking, according to the entry address of the candidate source basic block, whether the preset cache space contains a target identifier corresponding to that entry address; if it does, the candidate source basic block corresponding to the ith target operation has already been translated. Correspondingly, obtaining the target basic block produced by translating the candidate source basic block comprises: retrieving the target basic block corresponding to the target identifier from the preset cache space as the target basic block obtained by translating the candidate source basic block.
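The preset cache space described above behaves like a map keyed by the entry address of the source basic block. The sketch below assumes, purely for illustration, that a target basic block is a Python callable returning the next entry address; a real implementation would store pointers to generated machine code.

```python
from typing import Callable, Dict, Optional

# Assumed representation: a target basic block returns the next entry address.
TargetBlock = Callable[[], int]


class TranslationCache:
    """Preset cache space: target basic blocks keyed by source entry address."""

    def __init__(self) -> None:
        self._blocks: Dict[int, TargetBlock] = {}

    def store(self, entry_address: int, target_block: TargetBlock) -> None:
        # The entry address of the candidate source basic block is the identifier.
        self._blocks[entry_address] = target_block

    def lookup(self, entry_address: int) -> Optional[TargetBlock]:
        # A hit means the candidate source basic block was already translated.
        return self._blocks.get(entry_address)

    def __contains__(self, entry_address: int) -> bool:
        return entry_address in self._blocks
```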
In one embodiment, performing target operations repeatedly on the plurality of source basic blocks until the target operation has been completed on the last source basic block in the execution order comprises: performing target operations repeatedly on the plurality of source basic blocks until an exit instruction in the source machine code is detected.
In a second aspect, the present application further provides a code translation apparatus. The apparatus includes:
an acquisition module, configured to obtain a source machine code, the source machine code comprising a plurality of source basic blocks; and
an execution module, configured to perform target operations repeatedly on the plurality of source basic blocks, following the execution order of the source basic blocks when the source machine code runs, until the target operation has been completed on the last source basic block in the execution order; wherein the ith target operation comprises: detecting whether the candidate source basic block corresponding to the ith target operation has already been translated; if so, obtaining the target basic block produced by translating the candidate source basic block and running the target basic block; if not, translating the candidate source basic block with the LLVM compiler to obtain the target basic block and running the target basic block.
In one embodiment, the execution module is specifically configured to: create an LLVM function corresponding to the candidate source basic block; obtain the entry address of the candidate source basic block, fetch source instructions from the candidate source basic block one by one starting at the entry address, obtain the intermediate instruction corresponding to each source instruction as it is fetched, and add each intermediate instruction to the LLVM function as it is obtained; and, once the intermediate instructions for all source instructions in the candidate source basic block have been added to the LLVM function, run the LLVM function to translate the candidate source basic block into the target basic block.
In one embodiment, the execution module is specifically configured to: if i is 1, use the entry address of the source machine code as the entry address of the candidate source basic block; and if i is greater than 1, determine the entry address of the candidate source basic block from the last source instruction of the source basic block corresponding to the (i-1)th target operation.
In one embodiment, the execution module is specifically configured to: perform instruction fetch operations in sequence, starting from the entry address, until the fetched source instruction is a control flow instruction; wherein the ith instruction fetch operation comprises: determining, from the entry address, the byte identifier of the candidate source instruction corresponding to the ith instruction fetch operation; and fetching the candidate source instruction according to the byte identifier.
In one embodiment, the execution module is specifically configured to: query a preset instruction database with the source instruction to obtain its corresponding intermediate instruction, wherein the instruction database stores correspondences between a plurality of preset source instructions and preset intermediate instructions.
In one embodiment, constructing the instruction database comprises: obtaining the instruction set specification corresponding to the source machine code, the specification comprising a plurality of preset source instructions; simulating the preset source instructions in a preset programming language and then translating them to obtain the preset intermediate instruction corresponding to each preset source instruction; and constructing the instruction database from the preset source instructions and their preset intermediate instructions.
In one embodiment, the apparatus further comprises:
a storage module, configured to store the target basic block in a preset cache space, using the entry address of the candidate source basic block as the identifier of the target basic block.
In one embodiment, the execution module is specifically configured to: check, according to the entry address of the candidate source basic block, whether the preset cache space contains a target identifier corresponding to that entry address, and if it does, determine that the candidate source basic block corresponding to the ith target operation has already been translated; correspondingly, the execution module is specifically configured to retrieve the target basic block corresponding to the target identifier from the preset cache space as the target basic block obtained by translating the candidate source basic block.
In one embodiment, the execution module is specifically configured to: perform target operations repeatedly on the plurality of source basic blocks until an exit instruction in the source machine code is detected.
In a third aspect, the present application further provides a computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any embodiment of the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any embodiment of the first aspect.
In a fifth aspect, the present application further provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to any embodiment of the first aspect.
With the above code translation method, apparatus, device, storage medium and program product, a source machine code comprising a plurality of source basic blocks is obtained, and target operations are performed repeatedly on the source basic blocks, following their execution order when the source machine code runs, until the target operation has been completed on the last source basic block in that order. Specifically, the ith target operation comprises: detecting whether the candidate source basic block corresponding to the ith target operation has already been translated; if so, obtaining the target basic block produced by translating it and running that target basic block; if not, translating the candidate source basic block with the LLVM compiler to obtain the target basic block and running it. Because the LLVM compiler is invoked whenever the candidate source basic block has not yet been translated, its optimization mechanisms can be fully exploited to remove redundant instructions, merge instructions and apply similar optimizations to the candidate source basic block. This improves machine code translation efficiency and makes the resulting target basic blocks run more efficiently on the processor.
In addition, when the candidate source basic block has already been translated, its target basic block is run directly without retranslating it. The source machine code can therefore be translated and executed directly, so machine code compiled for a processor of one architecture can run on a processor of another architecture, while the efficiency of both translating and executing the source machine code is effectively improved.
Drawings
FIG. 1 is a flow diagram illustrating a code translation method in one embodiment;
FIG. 2 is a schematic flow chart illustrating obtaining a target basic block according to an embodiment;
FIG. 3 is a flow diagram that illustrates the instruction database build process in one embodiment;
FIG. 4 is a block diagram illustrating a dynamic binary translation method according to an embodiment;
FIG. 5 is a block diagram showing the structure of a code translation apparatus according to one embodiment;
FIG. 6 is a block diagram showing the structure of a code translation apparatus according to another embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present application and are not intended to limit it.
Running a program compiled for a processor of architecture B (such as x86) directly on a processor of architecture A (such as ARMv8) requires dynamic binary translation. The mainstream dynamic binary translation software today is the open-source project QEMU, which supports machine code translation between many different architectures. QEMU parses the source machine code and translates it into the required target machine code through its built-in TCG compiler. However, QEMU's translation is inefficient, and translated programs run far below their native performance. This is mainly due to two shortcomings of the TCG compiler: (1) complex instructions such as vector instructions can only be emulated, so the processor's hardware acceleration for vector instructions cannot be fully used; and (2) the TCG compiler provides only very limited optimization, so the target machine code it generates is not efficient enough. Furthermore, the TCG compiler only provides translation and cannot directly run the translated target machine code.
In view of this, the embodiments of the present application provide a code translation method, which can improve the translation efficiency of machine codes.
In one embodiment, as shown in FIG. 1, a code translation method is provided. This embodiment is illustrated by applying the method to a terminal that includes a processor of architecture A and can translate and execute a program compiled for a processor of architecture B. It can be understood that the method can also be applied to a server, or to a system comprising a terminal and a server, in which case it is implemented through interaction between the terminal and the server. The terminal may be, but is not limited to, a personal computer, notebook computer, smartphone or tablet computer, and the server may be implemented as an independent server or as a server cluster composed of multiple servers.
In the embodiment of the application, the code translation method comprises the following steps:
Step 101: obtain a source machine code, the source machine code comprising a plurality of source basic blocks.
The processor architecture may be IA-32, IA-64, x86-32, x86-64, ARMv8 or the like. A program compiled for a processor of one architecture cannot run directly on a processor of another architecture; it must either be rewritten or be translated.
When a program compiled for a processor of architecture B is to run directly on a processor of architecture A, that program is referred to as the source code (the source machine code). To use the functions implemented by the source code without recompiling the program, the source code is translated so that it can run directly on the processor of architecture A.
A basic block is a sequence of instructions executed in order. It has exactly one entry and one exit: the entry is its first instruction and the exit is its last instruction, and execution can only enter a basic block through its entry and leave through its exit. The source code may include a plurality of basic blocks, referred to as source basic blocks, and the source code is translated by translating each instruction in these source basic blocks.
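The single-exit shape of a basic block can be illustrated with a small check. This sketch models instructions as hypothetical `(address, length_in_bytes, mnemonic)` tuples and uses an assumed set of control flow mnemonics. It only verifies that instructions are contiguous and that no instruction other than the last transfers control; the single-entry property (no branch landing mid-block) needs whole-program knowledge and is not checked here.

```python
from typing import List, Tuple

# Hypothetical set of mnemonics treated as control flow instructions.
CONTROL_FLOW = {"jmp", "je", "jne", "call", "ret", "loop"}

# Each instruction is modeled as (address, length_in_bytes, mnemonic).
Insn = Tuple[int, int, str]


def is_basic_block(insns: List[Insn]) -> bool:
    """Check the straight-line, single-exit shape of a candidate basic block."""
    if not insns:
        return False
    for (addr, length, mnem), (next_addr, _, _) in zip(insns, insns[1:]):
        # Instructions must be laid out back to back in memory ...
        if addr + length != next_addr:
            return False
        # ... and only the last (exit) instruction may transfer control.
        if mnem in CONTROL_FLOW:
            return False
    return True
```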
Step 102: following the execution order of the source basic blocks when the source machine code runs, perform target operations repeatedly on the source basic blocks until the target operation has been completed on the last source basic block in the execution order. The ith target operation comprises: detecting whether the candidate source basic block corresponding to the ith target operation has already been translated; if so, obtaining the target basic block produced by translating the candidate source basic block and running the target basic block; if not, translating the candidate source basic block with the LLVM compiler to obtain the target basic block and running the target basic block.
The source machine code comprises a plurality of source basic blocks and runs by executing them in their execution order. Different source basic blocks may implement different functions when run, and the same source basic block may appear several times at different positions in the execution order.
During translation, a target operation is performed on each source basic block in turn, following the execution order of the source basic blocks when the source machine code runs, until the target operation has been completed on the last source basic block in that order; at that point the source machine code has been fully translated and executed.
Specifically, the source basic block processed by a target operation is that operation's candidate source basic block. When a target operation is performed on a source basic block, the block is first obtained and taken as the candidate source basic block, and the method then detects whether the candidate source basic block has already been translated. If it has not, the target operation translates the candidate source basic block with the LLVM (Low Level Virtual Machine) compiler to obtain a target basic block. The target basic block is the counterpart of the candidate source basic block that can run on a processor of architecture A, so it can be run immediately after translation to implement the functions of the candidate source basic block. If the candidate source basic block has already been translated, it is not appearing for the first time during translation of the source machine code, so the target operation directly obtains the target basic block produced earlier and runs it. The candidate source basic block is not translated again, which improves the efficiency of translating the source machine code.
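The translate-or-reuse loop described above can be sketched as a simple dispatcher. This is a minimal model, not the patent's implementation: a target basic block is assumed to be a Python callable that updates a register dictionary and returns the next entry address (or `None` when the exit instruction runs), and the hypothetical `toy_translate` stands in for the LLVM-based translation step without generating any machine code.

```python
from typing import Callable, Dict, List, Optional

# Assumed model: a target basic block updates the guest registers and returns
# the next entry address, or None once the exit instruction has executed.
TargetBlock = Callable[[Dict[str, int]], Optional[int]]


def run_translated(entry: int,
                   translate: Callable[[int], TargetBlock],
                   regs: Dict[str, int]) -> List[int]:
    """Perform target operations in execution order; return visited entry addresses."""
    cache: Dict[int, TargetBlock] = {}  # translated blocks, keyed by entry address
    trace: List[int] = []
    pc: Optional[int] = entry
    while pc is not None:               # stop once the exit instruction has run
        block = cache.get(pc)
        if block is None:               # not translated yet: translate once and cache
            block = translate(pc)
            cache[pc] = block
        trace.append(pc)
        pc = block(regs)                # run the target basic block
    return trace


# Hypothetical guest program: the block at 0x100 decrements rcx and loops back
# to itself while rcx != 0; the block at 0x200 executes the exit instruction.
translated: List[int] = []


def toy_translate(pc: int) -> TargetBlock:
    """Stand-in for LLVM-based translation; records which blocks were translated."""
    translated.append(pc)
    if pc == 0x100:
        def loop_block(regs: Dict[str, int]) -> Optional[int]:
            regs["rcx"] -= 1
            return 0x100 if regs["rcx"] != 0 else 0x200
        return loop_block
    if pc == 0x200:
        return lambda regs: None
    raise ValueError(f"no source basic block at {pc:#x}")


regs = {"rcx": 3}
trace = run_translated(0x100, toy_translate, regs)
print([hex(a) for a in trace], [hex(a) for a in translated])
# ['0x100', '0x100', '0x100', '0x200'] ['0x100', '0x200']
```

The loop block runs three times but is translated only once, which is the saving the reuse branch of the target operation is meant to provide.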
LLVM is an open-source compiler framework with a powerful optimization mechanism: it can vectorize parts of the code, merge redundant instructions, and apply redundancy elimination, inlining and other optimizations. Calling LLVM to translate source basic blocks therefore produces efficient machine code and improves translation efficiency.
According to the code translation method of this embodiment, a source machine code comprising a plurality of source basic blocks is obtained, and target operations are performed repeatedly on the source basic blocks, following their execution order when the source machine code runs, until the target operation has been completed on the last source basic block in that order. The ith target operation detects whether its candidate source basic block has already been translated; if so, it obtains and runs the previously produced target basic block, and if not, it translates the candidate source basic block with the LLVM compiler and runs the resulting target basic block. Invoking LLVM for untranslated blocks lets its optimization mechanisms remove redundant instructions, merge instructions and apply similar optimizations, which improves machine code translation efficiency and makes the resulting target basic blocks run more efficiently on the processor.
In addition, a candidate source basic block that has already been translated is not translated again; its target basic block is run directly. The source machine code can therefore be translated and executed directly, so machine code compiled for a processor of one architecture can run on a processor of another architecture, while the efficiency of both translating and executing the source machine code is effectively improved.
In one embodiment, FIG. 2 shows a schematic flow chart of obtaining a target basic block according to an embodiment of the present application. Translating the candidate source basic block with the LLVM compiler to obtain the target basic block comprises the following steps:
Step 201: create an LLVM function corresponding to the candidate source basic block.
When each candidate source basic block is translated, an LLVM function corresponding to it is created first, so that every instruction in that candidate source basic block can be translated and executed with the LLVM compiler.
Step 202: obtain the entry address of the candidate source basic block, fetch source instructions from the candidate source basic block one by one starting at the entry address, and obtain the intermediate instruction corresponding to each source instruction as it is fetched.
As described above, a basic block has exactly one entry and one exit: the entry is its first instruction and the exit is its last instruction, and execution can only enter through the entry and leave through the exit. Each instruction in a basic block corresponds to a memory address. All instructions in the candidate source basic block are source instructions, so translating and executing the candidate source basic block means translating and executing all of its source instructions.
An intermediate instruction is an instruction that LLVM can directly recognize and translate. Each source instruction therefore has a corresponding intermediate instruction, which is obtained and translated in its place; this is how the source instruction is translated.
Specifically, the entry address of the candidate source basic block is the memory address of its entry source instruction. Once the entry address is known, each source instruction in the basic block can be determined, and the intermediate instruction corresponding to each source instruction can then be obtained for translation.
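Steps 201 and 202 (create one LLVM function per candidate source basic block, then append intermediate instructions in source order) can be sketched by emitting textual LLVM IR. The conventions here are assumptions for illustration only: the function is named after the entry address, the guest register file is passed as a pointer `%cpu`, and the function returns the next guest entry address. A real translator would more likely build the function through LLVM's in-process API; textual IR is used only to keep the sketch dependency-free.

```python
from typing import List


def build_llvm_function(entry_address: int, intermediate: List[str]) -> str:
    """Emit textual LLVM IR for one candidate source basic block.

    Hypothetical conventions (not from the patent): the guest register file is
    passed as `ptr %cpu`, and the function returns the next guest entry address.
    """
    name = f"@block_{entry_address:x}"
    lines = [f"define i64 {name}(ptr %cpu) {{", "entry:"]
    for insn in intermediate:  # intermediate instructions appended in source order
        lines.append(f"  {insn}")
    return "\n".join(lines + ["}"])


ir = build_llvm_function(0x401000, [
    "%rax = load i64, ptr %cpu",
    "%sum = add i64 %rax, 1",
    "store i64 %sum, ptr %cpu",
    "ret i64 4198416",  # next guest entry address, 0x401010
])
print(ir)
```

The printed text is valid LLVM IR under opaque pointers, so it could, for example, be passed to `llc` for inspection; how the patent actually compiles and runs the LLVM function is not specified here.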
In one embodiment, sequentially fetching source instructions from the source basic block according to the entry address includes: starting from the entry address, performing instruction acquisition operations one after another until the acquired source instruction is a control flow instruction, where the ith instruction acquisition operation includes: determining, according to the entry address, the byte identifier of the candidate source instruction for the ith instruction acquisition operation; and acquiring the candidate source instruction according to the byte identifier.
The source instructions in the candidate source basic block are acquired in sequence starting from the entry address; the candidate source instruction is the source instruction that a given instruction acquisition operation is to acquire. Each candidate source instruction has a byte identifier, which can be determined from the memory address of that instruction, and the candidate source instruction is then acquired based on its byte identifier. The sequential fetch therefore proceeds as follows:
For example, the entry address is the memory address of the first candidate source instruction in the source basic block. For the first instruction acquisition operation, the byte identifier of the first candidate source instruction is determined from the entry address, and the first candidate source instruction is acquired according to that byte identifier. Next, the byte identifier of the first candidate source instruction is added to the entry address to obtain an updated entry address, namely the memory address of the second candidate source instruction. For the second instruction acquisition operation, the byte identifier of the second candidate source instruction is determined from this address, and the second candidate source instruction is acquired according to it. In general, adding the byte identifier of the (i-1)th candidate source instruction to its memory address gives the memory address of the ith candidate source instruction, from which the byte identifier of the ith candidate source instruction is obtained and the ith candidate source instruction is acquired. Fetching stops once the acquired candidate source instruction is a control flow instruction, at which point all source instructions in the candidate source basic block have been acquired. A control flow instruction is an instruction that changes the program's execution flow, such as a jump, loop, or conditional branch instruction.
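The fetch loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the toy opcode table (`OPCODES`, `CONTROL_FLOW`) and the function `fetch_block` are hypothetical, and the byte identifier is assumed to be the instruction's length in bytes, which is what adding it to the current address to reach the next instruction implies.

```python
from typing import List, Tuple

# Hypothetical toy source ISA: opcode -> (mnemonic, length in bytes).
# The length plays the role of the "byte identifier" in this sketch.
OPCODES = {
    0x01: ("LOADI", 3),  # LOADI reg, imm8
    0x02: ("ADD", 3),    # ADD dst, src
    0x03: ("JMP", 2),    # JMP addr8       (control flow)
    0x04: ("JNZ", 3),    # JNZ reg, addr8  (control flow)
    0xFF: ("EXIT", 1),   # EXIT            (control flow)
}
CONTROL_FLOW = {"JMP", "JNZ", "EXIT"}

Instr = Tuple[int, str, bytes]  # (address, mnemonic, operand bytes)

def fetch_block(code: bytes, entry: int) -> List[Instr]:
    """Fetch source instructions from `entry` until a control flow instruction."""
    block: List[Instr] = []
    addr = entry
    while True:
        mnemonic, length = OPCODES[code[addr]]  # byte identifier of this instruction
        block.append((addr, mnemonic, code[addr + 1:addr + length]))
        if mnemonic in CONTROL_FLOW:
            return block  # the control flow instruction ends the basic block
        addr += length    # address of the next candidate source instruction
```

For example, `fetch_block(bytes([0x01, 0, 5, 0x02, 0, 0, 0xFF]), 0)` yields three instructions (LOADI at address 0, ADD at 3, EXIT at 6) and stops at EXIT because it is a control flow instruction.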
In one embodiment, obtaining the entry address of the candidate source basic block includes: if i is 1, taking the entry address of the source machine code as the entry address of the candidate source basic block; and if i is greater than 1, determining the entry address of the candidate source basic block according to the last source instruction in the source basic block corresponding to the (i-1)th target operation.
Here, i counts how many times the target operation has been executed. When i is 1, that is, when the target operation is executed for the first time, the entry address of the source machine code is used directly as the entry address of the candidate source basic block. The entry address of the source machine code is the entry address of the basic block that executes first in the source machine code, and the source instructions of the candidate source basic block can then be acquired from this address.
When i is greater than 1, the entry address of the candidate source basic block is determined from the last source instruction in the source basic block corresponding to the (i-1)th target operation. For example, adding the byte identifier of that last source instruction to its memory address gives the entry address of the source basic block for the ith target operation currently being executed.
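The rule for choosing the entry address of each candidate source basic block can be written as a small helper. It is a hypothetical sketch (`candidate_entry_address` is an illustrative name) that follows the paragraph literally: for i greater than 1 it returns the address just past the last source instruction of the previous block, again assuming the byte identifier is the instruction length in bytes. How a taken jump would change this address is not covered by the paragraph and is not modelled here.

```python
from typing import Optional, Tuple

def candidate_entry_address(i: int, code_entry: int,
                            prev_last_instr: Optional[Tuple[int, int]] = None) -> int:
    """Entry address of the candidate source basic block for the i-th target operation.

    prev_last_instr is (address, byte_identifier) of the last source instruction
    in the block handled by the (i-1)-th target operation; the byte identifier
    is assumed to be that instruction's length in bytes.
    """
    if i == 1:
        return code_entry  # first target operation: entry of the source machine code
    if prev_last_instr is None:
        raise ValueError("i > 1 requires the last instruction of the previous block")
    addr, byte_id = prev_last_instr
    return addr + byte_id  # address immediately after the previous block
```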
During the fetch process described above, the intermediate instruction corresponding to each source instruction is acquired as soon as the source instruction is obtained, so that the intermediate instructions can be translated directly by the LLVM compiler. Acquisition of the intermediate instruction corresponding to a source instruction is described next.
In one embodiment, acquiring the intermediate instruction corresponding to the source instruction includes: querying a preset instruction database according to the source instruction to obtain the corresponding intermediate instruction, where the instruction database stores multiple pairs of preset source instructions and their corresponding preset intermediate instructions.
The terminal is provided in advance with this instruction database, which holds multiple pairs of preset source instructions and preset intermediate instructions, with a preset source instruction for each kind of source instruction that can appear in the source machine code. After a source instruction is obtained, its matching preset source instruction is located in the instruction database, and the preset intermediate instruction paired with it is taken as the intermediate instruction for that source instruction.
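A minimal sketch of the database query, assuming the instruction database can be held in memory as a Python dictionary keyed by source mnemonic. The names `INSTRUCTION_DB` and `lookup_intermediate` and the intermediate operation names are hypothetical; in the described method the stored preset intermediate instructions would be LLVM intermediate code rather than the small tuples used here.

```python
from typing import Dict, Tuple

# Hypothetical instruction database: preset source instruction (mnemonic) ->
# preset intermediate instruction. Real entries would hold LLVM intermediate
# code; here an intermediate instruction is an (operation, operands) tuple.
INSTRUCTION_DB: Dict[str, str] = {
    "LOADI": "set_reg",
    "ADD": "add_reg",
    "JMP": "branch",
    "JNZ": "branch_if_nonzero",
    "EXIT": "exit",
}

IRInstr = Tuple[str, Tuple[int, ...]]

def lookup_intermediate(mnemonic: str, operands: bytes) -> IRInstr:
    """Query the instruction database for the intermediate form of one source instruction."""
    if mnemonic not in INSTRUCTION_DB:
        raise NotImplementedError(f"no preset intermediate instruction for {mnemonic!r}")
    return (INSTRUCTION_DB[mnemonic], tuple(operands))
```

A source instruction with no matching preset source instruction raises an error in this sketch, since the text assumes every source instruction has an entry in the database.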
Step 203: each time an intermediate instruction is acquired, add it to the LLVM function.
For each source instruction, once its intermediate instruction has been acquired, that intermediate instruction is added to the LLVM function. In this way, the intermediate instructions for all source instructions in the candidate source basic block are added to the LLVM function in execution order, so that the LLVM compiler can later be invoked to translate and execute them through the LLVM function.
Step 204: after the intermediate instructions corresponding to all source instructions in the candidate source basic block have been added to the LLVM function, run the LLVM function to translate the candidate source basic block into the target basic block.
As described above, fetching continues until the acquired source instruction is a control flow instruction, at which point the intermediate instructions for all source instructions in the candidate source basic block have been added to the LLVM function. The LLVM function therefore contains the intermediate instructions for the whole candidate source basic block, and when it is run through the LLVM compiler, each of these intermediate instructions is translated into the required target machine code instructions, which together form the target basic block.
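Steps 203 and 204 can be illustrated with a stand-in for the LLVM function. The sketch below is hypothetical and deliberately swaps LLVM's optimizer and code generator for a Python closure: `IntermediateFunction` collects intermediate instructions in execution order (step 203), and `translate` turns the finished collection into one callable "target basic block" that updates a register file and returns the address to execute next (step 204). The intermediate operation names and the `fallthrough` field are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

IRInstr = Tuple[str, Tuple[int, ...]]   # (operation, operands), hypothetical format
Regs = Dict[int, int]                    # register number -> value

@dataclass
class IntermediateFunction:
    """Stand-in for the LLVM function created for one candidate source basic block."""
    entry: int                 # entry address of the candidate source basic block
    fallthrough: int           # address just after the block's last instruction
    body: List[IRInstr] = field(default_factory=list)

    def add(self, instr: IRInstr) -> None:
        # Step 203: append each intermediate instruction as soon as it is acquired.
        self.body.append(instr)

def translate(func: IntermediateFunction) -> Callable[[Regs], int]:
    """Step 204 stand-in: build a callable 'target basic block' from the function body."""
    body = list(func.body)
    fallthrough = func.fallthrough

    def target_block(regs: Regs) -> int:
        for op, args in body:
            if op == "set_reg":
                regs[args[0]] = args[1]
            elif op == "add_reg":
                regs[args[0]] = regs.get(args[0], 0) + regs.get(args[1], 0)
            elif op == "branch":
                return args[0]
            elif op == "branch_if_nonzero":
                return args[1] if regs.get(args[0], 0) != 0 else fallthrough
            elif op == "exit":
                return -1          # sentinel meaning "exit instruction reached"
            else:
                raise ValueError(f"unknown intermediate operation {op!r}")
        return fallthrough         # block ended without an explicit transfer
    return target_block
```

Because `translate` copies the body when it is called, instructions added to the function afterwards do not affect an already translated block, mirroring the fact that translation happens only once the whole basic block has been collected.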
In one embodiment, after the target basic block is obtained, the method further includes: using the entry address of the candidate source basic block as the identifier of the target basic block, and storing the target basic block in a preset cache space.
After the LLVM function is run to translate the candidate source basic block into a target basic block, the target basic block is labeled with the entry address of the candidate source basic block and stored in the preset cache space, that is, on the terminal. Because every target basic block produced through the LLVM function is stored under such an identifier, whenever a candidate source basic block that has already been translated is encountered again while processing the source machine code, the corresponding target basic block can be fetched from the preset cache space by its identifier and run without being translated again. This saves computing resources and improves the efficiency of translating and executing the source machine code.
In one embodiment, detecting whether the candidate source basic block corresponding to the ith target operation has already been translated includes: checking, according to the entry address of the candidate source basic block, whether the preset cache space contains a target identifier corresponding to that entry address; if it does, the candidate source basic block corresponding to the ith target operation has already been translated. Correspondingly, acquiring the target basic block obtained by translating the candidate source basic block includes: retrieving the target basic block corresponding to the target identifier from the preset cache space as the target basic block obtained by translating the candidate source basic block.
As mentioned above, each translated target basic block is stored in the preset cache space under the entry address of its candidate source basic block. Therefore, for the candidate source basic block of the ith target operation, the translator checks, using that block's entry address, whether the preset cache space holds a target basic block identified by this address, that is, whether a target identifier corresponding to the entry address exists. If it does, the candidate source basic block has been translated before; the target basic block with that target identifier is retrieved directly using the entry address, used as the translation result for the ith target operation, and run immediately.
In this embodiment of the application, the preset cache space is checked, according to the entry address of the candidate source basic block, for a target identifier corresponding to that entry address. When such an identifier exists, the corresponding target basic block is obtained and executed directly without translating it again, which improves the efficiency of translating the source machine code and saves hardware resources.
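The caching behaviour described above, keyed by the entry address of the candidate source basic block, might look like the following sketch. `TranslationCache` and its `get_or_translate` method are hypothetical names; the `translate` argument stands for whatever routine turns a candidate source basic block into a target basic block, and the counter exists only to show that each block is translated once.

```python
from typing import Any, Callable, Dict

class TranslationCache:
    """Preset cache space: target basic blocks keyed by source entry address."""

    def __init__(self) -> None:
        self._blocks: Dict[int, Any] = {}
        self.translations = 0   # how many times translation actually ran

    def get_or_translate(self, entry: int, translate: Callable[[int], Any]) -> Any:
        # The entry address serves as the target identifier.
        if entry not in self._blocks:
            self._blocks[entry] = translate(entry)  # translate once, then store
            self.translations += 1
        return self._blocks[entry]                  # cached block is reused as-is
```

Checking membership with `in` rather than testing the stored value means a translated block is reused even if it happens to be a falsy object.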
In one embodiment, fig. 3 is a flow diagram of the instruction database construction process provided by an embodiment of the present application. The construction process of the instruction database includes the following steps:
Step 301: obtain the instruction set specification corresponding to the source machine code, where the instruction set specification includes a plurality of preset source instructions.
The source machine code corresponds to an instruction set specification that defines a number of different preset source instructions; every instruction contained in the source machine code can be found among these preset source instructions. The specification also describes the semantics, function, and other properties of each preset source instruction, so the instruction database is built from the preset source instructions in the instruction set specification.
Step 302: implement each preset source instruction in a preset programming language as a simulation, and then translate the implementation to obtain the preset intermediate instruction corresponding to each preset source instruction.
To obtain the intermediate instruction for each source instruction while translating the source machine code, the corresponding preset intermediate instructions must be determined in advance from the preset source instructions. Specifically, each preset source instruction in the instruction set specification is simulated in a preset programming language, and the simulated implementation is translated to obtain the preset intermediate instruction for that preset source instruction.
Optionally, the preset programming language may be a high-level language such as C, C++, or Java.
Optionally, after the preset source instructions are simulated in the preset programming language, the simulated implementations may be translated with software that works with the LLVM compiler (for example, an LLVM front end for that language) to obtain the corresponding preset intermediate instructions.
Step 303: construct the instruction database from the preset source instructions and the preset intermediate instructions.
The resulting instruction database contains the preset source instructions and preset intermediate instructions, with one preset intermediate instruction for each preset source instruction, so that when a candidate source basic block is translated, the intermediate instruction for each candidate source instruction can be obtained from the database. This makes obtaining intermediate instructions more efficient and more reliable.
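Steps 301 to 303 can be sketched as a small generator script. This is a hypothetical illustration, assuming C as the preset programming language: each preset source instruction from a made-up specification excerpt (`SPEC`) is emulated as a C function, and the database records, for each instruction, the C source and the name of the function whose LLVM intermediate code would represent it. The actual translation to LLVM intermediate code, for example with Clang's `-S -emit-llvm` options, is only noted in a comment and not run.

```python
import json
from typing import Dict

# Hypothetical excerpt of an instruction set specification (step 301):
# mnemonic -> C statement describing its semantics on a register array `r`.
SPEC: Dict[str, str] = {
    "LOADI": "r[a] = b;",
    "ADD": "r[a] = r[a] + r[b];",
}

def emulate_in_c(mnemonic: str, semantics: str) -> str:
    """Step 302, first half: emulate one preset source instruction in C."""
    return (f"void emu_{mnemonic.lower()}(long *r, int a, int b) {{\n"
            f"    {semantics}\n}}\n")

def build_database(spec: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Step 303: pair each preset source instruction with its intermediate form."""
    db: Dict[str, Dict[str, str]] = {}
    for mnemonic, semantics in spec.items():
        db[mnemonic] = {
            "c_source": emulate_in_c(mnemonic, semantics),
            # Step 302, second half (not run here): an LLVM front end would turn
            # the C source into LLVM IR, e.g. `clang -S -emit-llvm emu.c -o emu.ll`.
            "ir_function": f"emu_{mnemonic.lower()}",
        }
    return db

def save_database(db: Dict[str, Dict[str, str]], path: str) -> None:
    """Persist the database, playing the role of a file such as "file f"."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(db, fh, indent=2)
```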
In one embodiment, executing the target operation multiple times for the plurality of source basic blocks until the target operation is completed for the last source basic block in the execution order includes: executing the target operation multiple times for the plurality of source basic blocks until an exit instruction in the source machine code is detected.
The source machine code contains an exit instruction, which may, for example, be in the last source basic block in the execution order. The target operation is therefore executed repeatedly for the source basic blocks, following their execution order when the source machine code runs, until the exit instruction is detected. Detecting the exit instruction indicates that translation and execution of the source machine code have finished and that the target operation has been performed for every source basic block in the source machine code. At that point, the source machine code of a program compiled for a processor of architecture B has been executed on a terminal whose processor has architecture A.
In one embodiment, fig. 4 is an architectural diagram of a dynamic binary translation method provided in an embodiment of the present application. With this dynamic binary translation method, a program written for a processor of architecture B can be run directly on a processor of architecture A. The process includes the following steps:
S1: read the binary file, extract the source machine code, obtain the entry address x of the source machine code, and set the program counter PC to x;
S2: create an LLVM intermediate function func and label it with the value of the PC;
S3: according to the value of the PC, determine whether the basic block starting at that address has already been translated and cached in the preset cache space; if so, go to step S7; otherwise, perform step S2 and then step S4;
S4: according to the value of the PC, fetch the source instruction the PC points to and advance the PC by that instruction's byte identifier. Look up the intermediate instruction corresponding to the source instruction in the file f and add it to the intermediate function func. If the fetched source instruction is a control flow instruction, go to step S5; otherwise, repeat step S4;
The file f is constructed as follows: according to the instruction set architecture specification corresponding to the source machine code, each source instruction in the specification is simulated in a preset programming language, compiled into intermediate instructions that LLVM can recognize and translate, and stored in the file f.
S5: use the LLVM compiler to optimize the intermediate instructions in the intermediate function func and translate them into target machine code, then go to step S6;
S6: cache the target machine code of the intermediate function func per basic block, labeled with the PC value of the first instruction in func, then go to step S8;
S7: according to the value of the PC, retrieve the cached target machine code of the corresponding intermediate function func, then go to step S8;
S8: execute the target machine code of the intermediate function func; if an exit instruction is detected, the translation process of the source machine code ends; otherwise, return to step S3.
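Steps S1 to S8 can be tied together in a minimal dispatch loop. This sketch is hypothetical throughout: it uses a toy source ISA, a dictionary (`FILE_F`) in place of the file f, and Python closures in place of LLVM-generated target machine code. Consistent with the rule that target operations repeat until an exit instruction is detected, the loop stops at the exit instruction and otherwise goes back to the cache check. How the PC is set after a jump is not spelled out in the text; here the executed block simply returns the next PC.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy source ISA: opcode -> (mnemonic, length in bytes).
OPCODES = {0x01: ("LOADI", 3), 0x02: ("ADD", 3), 0x05: ("DEC", 2),
           0x04: ("JNZ", 3), 0xFF: ("EXIT", 1)}
CONTROL_FLOW = {"JNZ", "EXIT"}
# Stand-in for "file f": source mnemonic -> intermediate operation.
FILE_F = {"LOADI": "set", "ADD": "add", "DEC": "dec", "JNZ": "bnz", "EXIT": "exit"}
EXIT_PC = -1  # sentinel returned when the exit instruction runs

Regs = Dict[int, int]
Block = Callable[[Regs], int]

def translate_block(code: bytes, pc: int) -> Block:
    func: List[Tuple[str, bytes]] = []            # S2: intermediate function "func"
    while True:                                   # S4: fetch, look up, append
        mnemonic, length = OPCODES[code[pc]]
        func.append((FILE_F[mnemonic], code[pc + 1:pc + length]))
        pc += length                              # advance PC by the byte identifier
        if mnemonic in CONTROL_FLOW:
            break
    fallthrough = pc

    def target_block(regs: Regs) -> int:          # S5: stand-in for LLVM codegen
        for op, a in func:
            if op == "set":
                regs[a[0]] = a[1]
            elif op == "add":
                regs[a[0]] += regs[a[1]]
            elif op == "dec":
                regs[a[0]] -= 1
            elif op == "bnz":
                return a[1] if regs[a[0]] != 0 else fallthrough
            elif op == "exit":
                return EXIT_PC
        return fallthrough
    return target_block

def run(code: bytes, entry: int) -> Tuple[Regs, int]:
    regs: Regs = {i: 0 for i in range(4)}
    cache: Dict[int, Block] = {}                  # preset cache space
    translations = 0
    pc = entry                                    # S1
    while True:
        block = cache.get(pc)                     # S3 / S7
        if block is None:
            block = translate_block(code, pc)     # S2, S4, S5
            cache[pc] = block                     # S6: keyed by the entry PC
            translations += 1
        pc = block(regs)                          # S8: execute
        if pc == EXIT_PC:                         # exit instruction detected
            return regs, translations

# Sample program: r0 = 3; r2 = 5; loop: r1 += r2; r0 -= 1; if r0 != 0 goto loop; exit
PROGRAM = bytes([0x01, 0, 3,   # 0:  LOADI r0, 3
                 0x01, 2, 5,   # 3:  LOADI r2, 5
                 0x02, 1, 2,   # 6:  ADD   r1, r2
                 0x05, 0,      # 9:  DEC   r0
                 0x04, 0, 6,   # 11: JNZ   r0, 6
                 0xFF])        # 14: EXIT
```

Running `run(PROGRAM, 0)` leaves 15 in register 1. The blocks starting at addresses 0, 6, and 14 are each translated once, and the block at address 6 is taken from the cache on the final loop iteration instead of being translated again.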
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment and may be performed at different times; nor are these sub-steps or stages necessarily performed in sequence, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a code translation apparatus for implementing the code translation method described above. The apparatus solves the problem in the same way as the method, so for the specific limitations in the one or more embodiments of the code translation apparatus provided below, refer to the limitations on the code translation method above; they are not repeated here.
In one embodiment, as shown in FIG. 5, a code translation apparatus is provided. The code translation apparatus 500 includes: an obtaining module 501 and an executing module 502, wherein:
an obtaining module 501, configured to obtain a source machine code, where the source machine code includes a plurality of source basic blocks;
an executing module 502, configured to execute the target operation multiple times for the source basic blocks, following the execution order of the source basic blocks when the source machine code runs, until the target operation is completed for the last source basic block in the execution order; where the ith target operation includes: detecting whether the candidate source basic block corresponding to the ith target operation has already been translated; if so, acquiring the target basic block obtained by translating the candidate source basic block and running the target basic block; if not, translating the candidate source basic block with the LLVM compiler to obtain the target basic block and running the target basic block.
In an embodiment, the executing module 502 is specifically configured to: create an LLVM function corresponding to the candidate source basic block; acquire the entry address of the candidate source basic block and sequentially fetch source instructions from the source basic block according to the entry address; each time a source instruction is fetched, acquire the corresponding intermediate instruction, and each time an intermediate instruction is acquired, add it to the LLVM function; and after the intermediate instructions corresponding to all source instructions in the candidate source basic block have been added to the LLVM function, run the LLVM function to translate the candidate source basic block into the target basic block.
In an embodiment, the executing module 502 is specifically configured to: if i is 1, use the entry address of the source machine code as the entry address of the candidate source basic block; and if i is greater than 1, determine the entry address of the candidate source basic block according to the last source instruction in the source basic block corresponding to the (i-1)th target operation.
In an embodiment, the executing module 502 is specifically configured to: starting from the entry address, perform instruction acquisition operations one after another until the acquired source instruction is a control flow instruction, where the ith instruction acquisition operation includes: determining, according to the entry address, the byte identifier of the candidate source instruction for the ith instruction acquisition operation; and acquiring the candidate source instruction according to the byte identifier.
In an embodiment, the executing module 502 is specifically configured to: query a preset instruction database according to the source instruction to obtain the corresponding intermediate instruction, where the instruction database stores multiple pairs of preset source instructions and their corresponding preset intermediate instructions.
In one embodiment, the construction process of the instruction database includes: acquiring the instruction set specification corresponding to the source machine code, where the instruction set specification includes a plurality of preset source instructions; simulating each preset source instruction in a preset programming language and translating the simulated implementation to obtain the preset intermediate instruction corresponding to each preset source instruction; and constructing the instruction database from the preset source instructions and the preset intermediate instructions.
In one embodiment, as shown in fig. 6, the code translation apparatus 500 further includes:
the storage module 503 is configured to use the entry address of the candidate source basic block as the identifier of the target basic block, and store the target basic block in a preset cache space.
In an embodiment, the executing module 502 is specifically configured to: check, according to the entry address of the candidate source basic block, whether the preset cache space contains a target identifier corresponding to that entry address, and if it does, determine that the candidate source basic block corresponding to the ith target operation has already been translated; correspondingly, the executing module 502 is specifically configured to: retrieve the target basic block corresponding to the target identifier from the preset cache space as the target basic block obtained by translating the candidate source basic block.
In an embodiment, the executing module 502 is specifically configured to: execute the target operation multiple times for the plurality of source basic blocks until an exit instruction in the source machine code is detected.
The modules in the code translation apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. Each module may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke it and perform the operations corresponding to the module.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store the XX data. The network interface of the computer device is used to communicate with an external terminal over a network connection. The computer program, when executed by the processor, implements a code translation method.
Those skilled in the art will appreciate that the structure shown in fig. 7 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
obtaining source machine code, where the source machine code includes a plurality of source basic blocks; and executing the target operation multiple times for the source basic blocks, following the execution order of the source basic blocks when the source machine code runs, until the target operation is completed for the last source basic block in the execution order; where the ith target operation includes: detecting whether the candidate source basic block corresponding to the ith target operation has already been translated; if so, acquiring the target basic block obtained by translating the candidate source basic block and running the target basic block; if not, translating the candidate source basic block with the LLVM compiler to obtain the target basic block and running the target basic block.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
creating an LLVM function corresponding to the candidate source basic block; acquiring the entry address of the candidate source basic block and sequentially fetching source instructions from the source basic block according to the entry address; each time a source instruction is fetched, acquiring the corresponding intermediate instruction, and each time an intermediate instruction is acquired, adding it to the LLVM function; and after the intermediate instructions corresponding to all source instructions in the candidate source basic block have been added to the LLVM function, running the LLVM function to translate the candidate source basic block into the target basic block.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
if i is 1, taking the entry address of the source machine code as the entry address of the candidate source basic block; and if i is greater than 1, determining the entry address of the candidate source basic block according to the last source instruction in the source basic block corresponding to the (i-1)th target operation.
In one embodiment, the processor when executing the computer program further performs the steps of:
starting from the entry address, performing instruction acquisition operations one after another until the acquired source instruction is a control flow instruction, where the ith instruction acquisition operation includes: determining, according to the entry address, the byte identifier of the candidate source instruction for the ith instruction acquisition operation; and acquiring the candidate source instruction according to the byte identifier.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
querying a preset instruction database according to the source instruction to obtain the corresponding intermediate instruction, where the instruction database stores multiple pairs of preset source instructions and their corresponding preset intermediate instructions.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring the instruction set specification corresponding to the source machine code, where the instruction set specification includes a plurality of preset source instructions; simulating each preset source instruction in a preset programming language and translating the simulated implementation to obtain the preset intermediate instruction corresponding to each preset source instruction; and constructing the instruction database from the preset source instructions and the preset intermediate instructions.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
using the entry address of the candidate source basic block as the identifier of the target basic block, and storing the target basic block in a preset cache space.
In one embodiment, the processor when executing the computer program further performs the steps of:
checking, according to the entry address of the candidate source basic block, whether the preset cache space contains a target identifier corresponding to that entry address, and if it does, determining that the candidate source basic block corresponding to the ith target operation has already been translated; correspondingly, acquiring the target basic block obtained by translating the candidate source basic block includes: retrieving the target basic block corresponding to the target identifier from the preset cache space as the target basic block obtained by translating the candidate source basic block.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
executing the target operation multiple times for the plurality of source basic blocks until an exit instruction in the source machine code is detected.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
obtaining source machine code, where the source machine code includes a plurality of source basic blocks; and executing the target operation multiple times for the source basic blocks, following the execution order of the source basic blocks when the source machine code runs, until the target operation is completed for the last source basic block in the execution order; where the ith target operation includes: detecting whether the candidate source basic block corresponding to the ith target operation has already been translated; if so, acquiring the target basic block obtained by translating the candidate source basic block and running the target basic block; if not, translating the candidate source basic block with the LLVM compiler to obtain the target basic block and running the target basic block.
In one embodiment, the computer program when executed by the processor further performs the steps of:
creating an LLVM function corresponding to the candidate source basic block; obtaining the entry address of the candidate source basic block, fetching source instructions from the candidate source basic block one after another starting from the entry address, obtaining the intermediate instruction corresponding to each source instruction as soon as that source instruction is fetched, and adding each intermediate instruction to the LLVM function as soon as it is obtained; and, once the intermediate instructions corresponding to all source instructions in the candidate source basic block have been added to the LLVM function, running the LLVM function to translate the candidate source basic block into the target basic block.
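A minimal sketch of the per-block function construction, under stated assumptions: the "LLVM function" is represented as textual LLVM IR (opaque-pointer syntax) rather than built through the LLVM API, guest registers are assumed to be `i64` slots behind a `%regs` pointer, and the block function returns the next entry address. `BlockFunction`, `build_block_function` and `demo_lookup` are hypothetical names; JIT-compiling and running the finished function is left out.

```python
from typing import Callable, List, Sequence, Tuple

Instr = Tuple  # (opcode, *operands); one instruction per address here
CONTROL_FLOW = {"jmp", "jnz", "exit"}


class BlockFunction:
    """Hypothetical stand-in for the LLVM function created for one source
    basic block. It only collects textual LLVM IR; a real translator would
    build the function through the LLVM API and then JIT-compile it."""

    def __init__(self, entry: int) -> None:
        self.name = f"bb_{entry:x}"
        self.body: List[str] = []

    def append(self, ir: Sequence[str]) -> None:
        self.body.extend(ir)

    def finish(self) -> str:
        lines = "\n".join(f"  {line}" for line in self.body)
        return f"define i64 @{self.name}(ptr %regs) {{\nentry:\n{lines}\n}}\n"


def build_block_function(code: Sequence[Instr], entry: int,
                         lookup: Callable[[Instr, int], List[str]]) -> str:
    fn = BlockFunction(entry)           # 1. create the block's function
    pc, n = entry, 0
    while True:
        instr = code[pc]                # 2. fetch the next source instruction
        fn.append(lookup(instr, n))     # 3. get its IR and append it at once
        pc, n = pc + 1, n + 1
        if instr[0] in CONTROL_FLOW:    # the block ends at control flow
            return fn.finish()          # 4. complete function, ready to compile


def demo_lookup(instr: Instr, n: int) -> List[str]:
    """Tiny illustrative lookup; guest registers are i64 slots at %regs."""
    op, *args = instr
    if op == "li":
        return [f"%p{n} = getelementptr i64, ptr %regs, i64 {args[0]}",
                f"store i64 {args[1]}, ptr %p{n}"]
    if op == "jmp":
        return [f"ret i64 {args[0]}"]  # next entry address
    raise NotImplementedError(op)


if __name__ == "__main__":
    print(build_block_function([("li", 0, 7), ("li", 1, 3), ("jmp", 16)], 0, demo_lookup))
```

Appending each instruction's IR immediately, as the step above describes, means the function is complete exactly when the control-flow instruction is reached, so nothing after the end of the block is ever fetched.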
In one embodiment, the computer program when executed by the processor further performs the steps of:
if i is equal to 1, using the entry address of the source machine code as the entry address of the candidate source basic block; and if i is greater than 1, determining the entry address of the candidate source basic block according to the last source instruction in the source basic block corresponding to the (i-1)th target operation.
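How the entry address might be derived from the previous block's last instruction is sketched below. The assumptions are illustrative only: a fixed 4-byte instruction encoding (so the fall-through address is the branch address plus 4), and hypothetical control-flow opcodes `jmp` (direct), `jnz` (conditional), `jr` (indirect via a register) and `exit`. `candidate_entry` is not a name from the source.

```python
from typing import Dict, Optional, Tuple

Instr = Tuple  # (opcode, *operands)
INSTR_SIZE = 4  # assumption: fixed 4-byte encoding, so fall-through = addr + 4


def candidate_entry(i: int, program_entry: int,
                    prev_last: Optional[Instr] = None,
                    prev_last_addr: int = 0,
                    regs: Optional[Dict[int, int]] = None) -> Optional[int]:
    """Entry address for the i-th target operation (1-based).

    For i > 1, `prev_last` is the control-flow instruction that ended the
    block of the (i-1)th operation, located at `prev_last_addr`, and `regs`
    is the register state after that block ran. None means the program exits."""
    if i == 1:
        return program_entry
    if prev_last is None:
        raise ValueError("i > 1 requires the previous block's last instruction")
    regs = regs or {}
    op, *args = prev_last
    fallthrough = prev_last_addr + INSTR_SIZE
    if op == "jmp":      # direct jump: target encoded in the instruction
        return args[0]
    if op == "jnz":      # conditional branch: taken target or fall-through
        return args[1] if regs[args[0]] != 0 else fallthrough
    if op == "jr":       # indirect jump: target only known at run time
        return regs[args[0]]
    if op == "exit":
        return None
    raise ValueError(f"{op!r} does not end a basic block")
```

The conditional and indirect cases are why the entry address generally cannot be fixed before the previous block has run: it depends on register values produced at run time.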
In one embodiment, the computer program when executed by the processor further performs the steps of:
starting from the entry address, performing a plurality of instruction fetch operations in sequence until the fetched source instruction is a control flow instruction; wherein the ith instruction fetch operation of the plurality of instruction fetch operations comprises: determining, according to the entry address, the byte identifier of the candidate source instruction corresponding to the ith instruction fetch operation; and fetching the candidate source instruction according to the byte identifier.
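The fetch loop can be illustrated with a toy variable-length encoding. This sketch interprets the "byte identifier" as the byte position of the candidate instruction within the loaded code (its address minus the load base); that reading, the opcode table and the name `fetch_block` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical variable-length toy encoding: opcode -> (mnemonic, total
# length in bytes, whether it is a control-flow instruction).
OPCODES: Dict[int, Tuple[str, int, bool]] = {
    0x01: ("li", 3, False),    # li   rd, imm8
    0x02: ("add", 3, False),   # add  rd, rs
    0x10: ("jmp", 2, True),    # jmp  target8
    0x11: ("jnz", 3, True),    # jnz  r, target8
    0xFF: ("exit", 1, True),
}

Fetched = Tuple[int, str, Tuple[int, ...]]  # (address, mnemonic, operand bytes)


def fetch_block(code: bytes, base: int, entry: int) -> List[Fetched]:
    """Fetch instructions from `entry` until a control-flow instruction.

    `base` is the address at which `code` is loaded, so each candidate
    instruction's byte position is its address minus `base`."""
    fetched: List[Fetched] = []
    addr = entry
    while True:
        offset = addr - base                  # byte position of the candidate
        if not 0 <= offset < len(code):
            raise IndexError(f"address {addr:#x} is outside the code segment")
        opcode = code[offset]
        if opcode not in OPCODES:
            raise ValueError(f"undefined opcode {opcode:#04x} at {addr:#x}")
        name, length, is_control_flow = OPCODES[opcode]
        operands = tuple(code[offset + 1:offset + length])
        if len(operands) != length - 1:
            raise ValueError(f"truncated instruction at {addr:#x}")
        fetched.append((addr, name, operands))
        addr += length                        # next candidate instruction
        if is_control_flow:
            return fetched
```

With `CODE = bytes([0x01, 0, 7, 0x02, 0, 1, 0x11, 0, 0x00, 0xFF])` loaded at `0x100`, fetching from `0x100` stops after the `jnz` at `0x106`, and a second fetch from `0x109` yields only the `exit` instruction.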
In one embodiment, the computer program when executed by the processor further performs the steps of:
querying a preset instruction database according to the source instruction to obtain the intermediate instruction corresponding to the source instruction, wherein the instruction database stores correspondences between a plurality of preset source instructions and preset intermediate instructions.
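One way to realize the instruction database lookup is a table of LLVM IR templates keyed by source opcode, sketched below. The template format, the `%regs` register-file layout, the `{n}` counter used to keep SSA names unique within one block function, and the names `INSTR_DB`, `OPERANDS` and `lookup` are all assumptions of this sketch rather than details given in the source.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction database: source opcode -> LLVM IR template.
# Guest registers are assumed to live in an i64 array at %regs. {n} is a
# per-instruction counter that keeps SSA value names unique in one function.
INSTR_DB: Dict[str, List[str]] = {
    "li": [
        "%d{n} = getelementptr i64, ptr %regs, i64 {rd}",
        "store i64 {imm}, ptr %d{n}",
    ],
    "add": [
        "%d{n} = getelementptr i64, ptr %regs, i64 {rd}",
        "%s{n} = getelementptr i64, ptr %regs, i64 {rs}",
        "%a{n} = load i64, ptr %d{n}",
        "%b{n} = load i64, ptr %s{n}",
        "%c{n} = add i64 %a{n}, %b{n}",
        "store i64 %c{n}, ptr %d{n}",
    ],
    "jmp": ["ret i64 {target}"],  # return the next entry address
}
OPERANDS: Dict[str, Tuple[str, ...]] = {
    "li": ("rd", "imm"), "add": ("rd", "rs"), "jmp": ("target",),
}


def lookup(instr: Tuple, n: int) -> List[str]:
    """Return the intermediate (LLVM IR) instructions for one source instruction."""
    op, *operands = instr
    if op not in INSTR_DB:
        raise NotImplementedError(f"no database entry for opcode {op!r}")
    fields = dict(zip(OPERANDS[op], operands), n=n)
    return [line.format(**fields) for line in INSTR_DB[op]]
```

For example, `lookup(("jmp", 64), 0)` returns `["ret i64 64"]`. An opcode missing from the database raises an error instead of silently producing incomplete IR.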
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining the instruction set specification corresponding to the source machine code, wherein the instruction set specification comprises a plurality of preset source instructions; implementing each preset source instruction as a simulation in a preset programming language, and then translating each simulated preset source instruction to obtain its corresponding preset intermediate instruction; and constructing the instruction database from the preset source instructions and the preset intermediate instructions.
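A hedged sketch of building the instruction database, assuming C as the preset programming language and clang as the translator from simulation to LLVM IR; the source does not name either. The two C simulations are invented examples, and `build_instruction_db` accepts any `translate` callable so the construction step can be exercised without clang installed.

```python
import subprocess
from typing import Callable, Dict

# Hypothetical excerpt of an instruction set specification: each preset source
# instruction is simulated by a C function acting on a guest register file.
SIMULATIONS: Dict[str, str] = {
    "li": "void sim_li(long *r, int rd, long imm) { r[rd] = imm; }",
    "add": "void sim_add(long *r, int rd, int rs) { r[rd] += r[rs]; }",
}


def clang_to_ir(c_source: str) -> str:
    """Translate one C simulation into textual LLVM IR (requires clang on PATH)."""
    result = subprocess.run(
        ["clang", "-x", "c", "-S", "-emit-llvm", "-O2", "-o", "-", "-"],
        input=c_source, capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_instruction_db(simulations: Dict[str, str],
                         translate: Callable[[str], str] = clang_to_ir) -> Dict[str, str]:
    """Map each preset source instruction to the IR obtained from its simulation."""
    return {mnemonic: translate(src) for mnemonic, src in simulations.items()}
```

Called as `build_instruction_db(SIMULATIONS)`, this invokes clang once per instruction. The resulting `sim_*` IR functions could then be called or inlined from each block's LLVM function; how the patented method stores and applies them is not specified in this section.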
In one embodiment, the computer program when executed by the processor further performs the steps of:
using the entry address of the candidate source basic block as the identifier of the target basic block, and storing the target basic block in a preset cache space.
In one embodiment, the computer program when executed by the processor further performs the steps of:
detecting, according to the entry address of the candidate source basic block, whether the preset cache space contains a target identifier corresponding to that entry address, and if so, determining that the candidate source basic block corresponding to the ith target operation has already been translated; correspondingly, obtaining the target basic block produced by translating the candidate source basic block comprises: searching the preset cache space for the target basic block corresponding to the target identifier, and using it as the target basic block obtained after the candidate source basic block is translated.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing the target operation multiple times on the plurality of source basic blocks until an exit instruction in the source machine code is detected.
In one embodiment, a computer program product is provided, comprising a computer program which when executed by a processor performs the steps of:
obtaining source machine code, wherein the source machine code comprises a plurality of source basic blocks; performing a target operation multiple times on the plurality of source basic blocks, following the order in which the source basic blocks execute when the source machine code runs, until the target operation has been completed on the last source basic block in that execution order; wherein the ith target operation of the plurality of target operations comprises: detecting whether the candidate source basic block corresponding to the ith target operation has already been translated; if so, obtaining the target basic block produced by translating the candidate source basic block, and running the target basic block; if not, translating the candidate source basic block with the LLVM compiler to obtain the target basic block, and running the target basic block.
In one embodiment, the computer program when executed by the processor further performs the steps of:
creating an LLVM function corresponding to the candidate source basic block; obtaining the entry address of the candidate source basic block, fetching source instructions from the candidate source basic block one after another starting from the entry address, obtaining the intermediate instruction corresponding to each source instruction as soon as that source instruction is fetched, and adding each intermediate instruction to the LLVM function as soon as it is obtained; and, once the intermediate instructions corresponding to all source instructions in the candidate source basic block have been added to the LLVM function, running the LLVM function to translate the candidate source basic block into the target basic block.
In one embodiment, the computer program when executed by the processor further performs the steps of:
if i is equal to 1, using the entry address of the source machine code as the entry address of the candidate source basic block; and if i is greater than 1, determining the entry address of the candidate source basic block according to the last source instruction in the source basic block corresponding to the (i-1)th target operation.
In one embodiment, the computer program when executed by the processor further performs the steps of:
starting from the entry address, performing a plurality of instruction fetch operations in sequence until the fetched source instruction is a control flow instruction; wherein the ith instruction fetch operation of the plurality of instruction fetch operations comprises: determining, according to the entry address, the byte identifier of the candidate source instruction corresponding to the ith instruction fetch operation; and fetching the candidate source instruction according to the byte identifier.
In one embodiment, the computer program when executed by the processor further performs the steps of:
querying a preset instruction database according to the source instruction to obtain the intermediate instruction corresponding to the source instruction, wherein the instruction database stores correspondences between a plurality of preset source instructions and preset intermediate instructions.
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining the instruction set specification corresponding to the source machine code, wherein the instruction set specification comprises a plurality of preset source instructions; implementing each preset source instruction as a simulation in a preset programming language, and then translating each simulated preset source instruction to obtain its corresponding preset intermediate instruction; and constructing the instruction database from the preset source instructions and the preset intermediate instructions.
In one embodiment, the computer program when executed by the processor further performs the steps of:
using the entry address of the candidate source basic block as the identifier of the target basic block, and storing the target basic block in a preset cache space.
In one embodiment, the computer program when executed by the processor further performs the steps of:
detecting, according to the entry address of the candidate source basic block, whether the preset cache space contains a target identifier corresponding to that entry address, and if so, determining that the candidate source basic block corresponding to the ith target operation has already been translated; correspondingly, obtaining the target basic block produced by translating the candidate source basic block comprises: searching the preset cache space for the target basic block corresponding to the target identifier, and using it as the target basic block obtained after the candidate source basic block is translated.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing the target operation multiple times on the plurality of source basic blocks until an exit instruction in the source machine code is detected.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disks, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum-computing-based data processing logic devices, and the like.
For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, any combination of these technical features should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments express only several implementations of the present application and are described in specific detail, but they shall not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art may make several variations and improvements without departing from the concept of the present application, and all of these fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the appended claims.

Claims (13)

1. A method of code translation, the method comprising:
obtaining a source machine code, wherein the source machine code comprises a plurality of source basic blocks;
performing a target operation multiple times on the plurality of source basic blocks, following the order in which the source basic blocks execute when the source machine code runs, until the target operation has been completed on the last source basic block in that execution order;
wherein an ith target operation of the plurality of target operations comprises:
detecting whether the candidate source basic block corresponding to the ith target operation has already been translated;
if so, obtaining the target basic block produced by translating the candidate source basic block, and running the target basic block;
if not, translating the candidate source basic block with an LLVM compiler to obtain the target basic block, and running the target basic block.
2. The method of claim 1, wherein translating the candidate source basic block based on the LLVM compiler to obtain the target basic block comprises:
creating an LLVM function corresponding to the candidate source basic block;
obtaining the entry address of the candidate source basic block, fetching source instructions from the candidate source basic block one after another starting from the entry address, obtaining the intermediate instruction corresponding to each source instruction as soon as that source instruction is fetched, and adding each intermediate instruction to the LLVM function as soon as it is obtained; and
once the intermediate instructions corresponding to all source instructions in the candidate source basic block have been added to the LLVM function, running the LLVM function to translate the candidate source basic block into the target basic block.
3. The method of claim 2, wherein obtaining the entry address of the candidate source basic block comprises:
if i is equal to 1, using the entry address of the source machine code as the entry address of the candidate source basic block; and
if i is greater than 1, determining the entry address of the candidate source basic block according to the last source instruction in the source basic block corresponding to the (i-1)th target operation.
4. The method of claim 2 or 3, wherein fetching source instructions from the candidate source basic block one after another starting from the entry address comprises:
starting from the entry address, performing a plurality of instruction fetch operations in sequence until the fetched source instruction is a control flow instruction;
wherein the ith instruction fetch operation of the plurality of instruction fetch operations comprises:
determining, according to the entry address, the byte identifier of the candidate source instruction corresponding to the ith instruction fetch operation; and
fetching the candidate source instruction according to the byte identifier.
5. The method of claim 2, wherein obtaining the intermediate instruction corresponding to the source instruction comprises:
querying a preset instruction database according to the source instruction to obtain the intermediate instruction corresponding to the source instruction, wherein the instruction database stores correspondences between a plurality of preset source instructions and preset intermediate instructions.
6. The method of claim 5, wherein constructing the instruction database comprises:
obtaining the instruction set specification corresponding to the source machine code, wherein the instruction set specification comprises a plurality of preset source instructions;
implementing each preset source instruction as a simulation in a preset programming language, and then translating each simulated preset source instruction to obtain its corresponding preset intermediate instruction; and
constructing the instruction database from the preset source instructions and the preset intermediate instructions.
7. The method of claim 2, wherein after obtaining the target basic block, the method further comprises:
using the entry address of the candidate source basic block as the identifier of the target basic block, and storing the target basic block in a preset cache space.
8. The method of claim 7, wherein detecting whether the candidate source basic block corresponding to the ith target operation has already been translated comprises:
detecting, according to the entry address of the candidate source basic block, whether the preset cache space contains a target identifier corresponding to that entry address, and if so, determining that the candidate source basic block corresponding to the ith target operation has already been translated;
correspondingly, obtaining the target basic block produced by translating the candidate source basic block comprises:
searching the preset cache space for the target basic block corresponding to the target identifier, and using it as the target basic block obtained after the candidate source basic block is translated.
9. The method of claim 1, wherein performing the target operation multiple times on the plurality of source basic blocks until the target operation has been completed on the last source basic block in the execution order comprises:
performing the target operation multiple times on the plurality of source basic blocks until an exit instruction in the source machine code is detected.
10. An apparatus for code translation, the apparatus comprising:
an acquisition module, configured to acquire source machine code, wherein the source machine code comprises a plurality of source basic blocks; and
an execution module, configured to perform a target operation multiple times on the plurality of source basic blocks, following the order in which the source basic blocks execute when the source machine code runs, until the target operation has been completed on the last source basic block in that execution order;
wherein an ith target operation of the plurality of target operations comprises:
detecting whether the candidate source basic block corresponding to the ith target operation has already been translated;
if so, obtaining the target basic block produced by translating the candidate source basic block, and running the target basic block;
if not, translating the candidate source basic block with an LLVM compiler to obtain the target basic block, and running the target basic block.
11. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 9.
12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 9.
13. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 9.
CN202210568118.9A 2022-05-24 2022-05-24 Code translation method, device, equipment, storage medium and program product Pending CN114995820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210568118.9A CN114995820A (en) 2022-05-24 2022-05-24 Code translation method, device, equipment, storage medium and program product

Publications (1)

Publication Number Publication Date
CN114995820A true CN114995820A (en) 2022-09-02

Family

ID=83026454

Country Status (1)

Country Link
CN (1) CN114995820A (en)

Similar Documents

Publication Publication Date Title
US11893386B1 (en) Optimizing source code from binary files
US8843944B2 (en) Accelerated class check
US8561045B2 (en) Constructing runtime state for inlined code
US20180275909A1 (en) Memory management for sparse matrix multiplication
US9336125B2 (en) Systems and methods for hardware-assisted type checking
US10346310B2 (en) Cache block apparatus and methods
CN104798075A (en) Application randomization
KR20200031150A (en) Coefficients of components in data items of data processing unit
TWI785064B (en) Apparatus, method, computer program, and computer-readable storage medium for testing bit values inside vector elements
US20110302394A1 (en) System and method for processing regular expressions using simd and parallel streams
CN113296786B (en) Data processing method, device, electronic equipment and storage medium
US20100058299A1 (en) Method of array interception using data-flow analysis
US8769498B2 (en) Warning of register and storage area assignment errors
KR102594770B1 (en) Matching continuous values in data processing devices
US9703578B2 (en) Providing class loading for JAVA™ applications
CN107729118A (en) Towards the method for the modification Java Virtual Machine of many-core processor
US10235167B2 (en) Microprocessor with supplementary commands for binary search and associated search method
CN114995820A (en) Code translation method, device, equipment, storage medium and program product
US9342319B1 (en) Accelerated class check
CN115904486A (en) Code similarity detection method and device
Gamaarachchi et al. Cache friendly optimisation of de bruijn graph based local re-assembly in variant calling
CN112287306B (en) Protection method and device for application program installation package and computer readable storage medium
Hai et al. Multi-threaded on-the-fly model generation of malware with hash compaction
Mišić et al. Comparison of parallel central processing unit‐and graphics processing unit‐based implementations of greedy string tiling algorithm for source code plagiarism detection
CN117273112A (en) Deep learning framework adjustment method, device, equipment, medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination