CN118012506B - Binary translation optimization method and device, electronic equipment and storage medium - Google Patents

Binary translation optimization method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN118012506B
Authority
CN
China
Prior art keywords
basic block
type
order data
current basic
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410417400.6A
Other languages
Chinese (zh)
Other versions
CN118012506A
Inventor
胡起
兰彦志
杨兆鑫
曾露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Loongson Technology Corp Ltd
Original Assignee
Loongson Technology Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Loongson Technology Corp Ltd
Priority to CN202410417400.6A
Publication of CN118012506A
Application granted
Publication of CN118012506B
Active legal status: Current
Anticipated expiration legal status


Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The embodiments of the present invention provide a binary translation optimization method and device, an electronic device, and a storage medium, wherein the binary translation optimization method includes the following steps: identifying the types of the instructions contained in a current basic block to be translated, the types of instructions including vector instructions, which in turn include vector operation instructions and/or scalar operation instructions; analyzing the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block to obtain a high-order data analysis result for the current basic block; determining the type of the current basic block according to the types of the instructions contained in the current basic block and the high-order data analysis result of the current basic block, the type of the current basic block being used to indicate whether the scalar operation instructions present in the current basic block can be optimized; and optimizing the current basic block according to its type. The embodiments of the present invention can reduce the redundant instructions for saving and restoring high-order data that are generated when translating scalar operation instructions, and improve the performance of the translated program.

Description

Binary translation optimization method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a binary translation optimization method, a binary translation optimization device, an electronic device, and a storage medium.
Background
Binary translation may convert a source program running on one architecture platform to a target program running on another architecture platform to solve the problem of application running across instruction set architectures at the binary level. Binary translation is one of the research hotspots in the field of computer technology in recent years.
In the x86 architecture, SSE (Streaming SIMD Extensions) is a SIMD (Single Instruction, Multiple Data) extended instruction set. The x86 SSE instruction set includes scalar operation instructions intended to replace the native x87 floating-point instructions. When these scalar operation instructions are executed, the low-order bits of the vector register hold the operation result while the high-order bits retain their original value. In the instruction sets of other architectures, such as RISC architectures, the corresponding instruction computes only the low-order result and stores it in the destination register; the high-order data of the destination register is not guaranteed to keep its original value.
Therefore, in order to guarantee the correctness of the translation semantics, a save-restore operation needs to be performed on the high-order bits of the vector register each time a scalar operation instruction is translated, which introduces significant overhead and reduces the performance of the translated program.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide a binary translation optimization method that overcomes, or at least partially solves, the foregoing problems, and that can reduce the unnecessary "save-restore" operation instructions for high-order data generated when translating scalar operation instructions, thereby improving the performance of the translated program.
Correspondingly, the embodiment of the invention also provides a binary translation optimizing device, electronic equipment and a storage medium, which are used for ensuring the realization and application of the method.
In a first aspect, an embodiment of the present invention discloses a binary translation optimization method, where the method includes:
Identifying the type of an instruction contained in a current basic block to be translated; the type of instruction includes a vector instruction including at least one of a vector operation instruction and a scalar operation instruction;
Analyzing the state of high-order data of a vector register corresponding to each vector instruction in the current basic block to obtain a high-order data analysis result of the current basic block; the high-order data analysis result of the current basic block comprises the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block; the state of the high-order data of the vector register is used for indicating whether the high-order data is a preset value and the source of the high-order data;
Determining the type of the current basic block according to the type of the instruction contained in the current basic block and the high-order data analysis result of the current basic block; the type of the current basic block is used for indicating whether scalar operation instructions existing in the current basic block can be optimized;
And optimizing the translated instruction of the current basic block according to the type of the current basic block.
In a second aspect, an embodiment of the present invention discloses a binary translation optimizing apparatus, the apparatus including:
the instruction stream analysis module is used for identifying the type of the instruction contained in the current basic block to be translated; the type of instruction includes a vector instruction including at least one of a vector operation instruction and a scalar operation instruction;
The data flow analysis module is used for analyzing the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block to obtain a high-order data analysis result of the current basic block; the high-order data analysis result of the current basic block comprises the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block; the state of the high-order data of the vector register is used for indicating whether the high-order data is a preset value and the source of the high-order data;
the type determining module is used for determining the type of the current basic block according to the type of the instruction contained in the current basic block and the high bit data analysis result of the current basic block; the type of the current basic block is used for indicating whether scalar operation instructions existing in the current basic block can be optimized;
And the instruction optimization module is used for optimizing the translated instruction of the current basic block according to the type of the current basic block.
In a third aspect, an embodiment of the present invention discloses an electronic device, including: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is configured to store at least one executable instruction that causes the processor to perform the steps of the binary translation optimization method as described in any one of the preceding claims.
In a fourth aspect, an embodiment of the present invention discloses a readable storage medium, where a program or an instruction is stored, where the program or the instruction can implement the binary translation optimization method according to any one of the embodiments of the present invention when executed by a processor.
The binary translation optimization method, the binary translation optimization device, the electronic equipment and the storage medium provided by the embodiment of the invention have the following advantages:
In the binary translation process, the embodiment of the invention classifies the current basic block, so that whether scalar operation instructions existing in the current basic block can be optimized or not is determined according to the type of the current basic block, for example, unnecessary 'save-restore' operation instructions generated by translating the scalar operation instructions are deleted, and the performance of a translated program is improved.
Further, the embodiments of the present invention classify the current basic block according to both the types of the instructions contained in the current basic block and the high-order data analysis result of the current basic block, rather than according to the instruction types alone. From the high-order data analysis result of the current basic block, it can be known for which vector instructions the high-order data of the corresponding vector registers is meaningful actual data that will be used, so that the save-restore operation still needs to be executed; and for which vector instructions the high-order data of the corresponding vector registers will not be used, so that no save-restore operation needs to be executed for such high-order data. Therefore, the embodiments of the present invention identify the type of the current basic block more accurately, can further reduce redundant "save-restore" operation instructions, and further improve the translation optimization effect.
Drawings
FIG. 1 is a flow chart of steps of an embodiment of a binary translation optimization method of the present invention;
FIG. 2 is a schematic diagram of obtaining high-order data analysis results in one example of the invention;
FIG. 3 is a schematic diagram of determining the type of basic block in one example of the invention;
FIG. 4 is a schematic representation of the translation of a scalar operation instruction in one example of the invention;
FIG. 5 is a schematic diagram of linking different types of basic blocks in one example of the invention;
FIG. 6 is a block diagram of an embodiment of a binary translation optimizing apparatus of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
The terms "first", "second" and the like in the description and in the claims are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. It should be understood that the terms so used may be interchanged where appropriate, so that embodiments of the present invention may be implemented in orders other than those illustrated or described herein; objects identified by "first", "second", etc. are generally of one class, and the number of objects is not limited, e.g., the first object may be one or more. Furthermore, the term "and/or" in the description and claims describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The term "plurality" in the embodiments of the present invention means two or more, and other quantifiers are interpreted similarly.
Binary translation may be used to solve the problem of running applications across instruction set architectures (Instruction Set Architecture, ISA) at the binary level. Specifically, binary translation techniques can translate an instruction sequence of one instruction set architecture into an instruction sequence of another instruction set architecture. For example, binary translation may translate an instruction sequence of the x86 instruction set architecture into an instruction sequence of an arm instruction set architecture (e.g., arm64), and vice versa. Binary translation includes static translation and dynamic translation. Static translation translates a binary file A of the source platform into a binary file B of the target platform before A is executed; based on static translation, the result of one translation can be reused many times. Dynamic translation translates the fragments that are actually executed while the binary file is running. The binary translation optimization method of the embodiments of the present invention can be applied to static translation or dynamic translation, and translation is performed in units of basic blocks (Translation Block, TB). A basic block typically ends with a control-flow-changing instruction (e.g., a jump, a function call, etc.).
In the embodiment of the present invention, the source program refers to a program to be binary translated, and the source program may be any type of application program. A source platform refers to a machine platform that can run a source program. In some embodiments, the source platform may also be referred to as a client. The processor of the source platform may be a processor based on a first instruction set architecture. The first instruction set architecture may be, for example, an x86 instruction set architecture or an arm instruction set architecture (e.g., arm 64). The source program is a program developed based on the first instruction set architecture, and thus, the source program can be normally run on the source platform. The target platform is a machine platform where migration of source programs is desired. In some embodiments, the target platform may also be referred to as a host. The processor of the target platform may be a processor based on a second instruction set architecture. The second instruction set architecture is a different instruction set architecture than the first instruction set architecture. For example, the first instruction set architecture is the x86 instruction set architecture and the second instruction set architecture is the arm instruction set architecture. As another example, the first instruction set architecture is an arm instruction set architecture and the second instruction set architecture is an x86 instruction set architecture. For another example, the first instruction set architecture is the x86 instruction set architecture and the second instruction set architecture is LoongArch (Dragon architecture). The source program may run on the source platform using source binary code. The source binary code is code based on a first instruction set architecture. If it is desired to run the source program on the target platform, the source binary needs to be translated into target binary. The target binary is code based on the second instruction set architecture such that the target binary can be run on the target platform.
FIG. 1 shows a flow chart of steps of an embodiment of a binary translation optimization method of the present invention, which may include the steps of:
step 101, identifying the type of an instruction contained in a current basic block to be translated; the type of instruction includes a vector instruction including at least one of a vector operation instruction and a scalar operation instruction;
102, analyzing the state of high-order data of a vector register corresponding to each vector instruction in the current basic block to obtain a high-order data analysis result of the current basic block; the high-order data analysis result of the current basic block comprises the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block; the state of the high-order data of the vector register is used for indicating whether the high-order data is a preset value and the source of the high-order data;
Step 103, determining the type of the current basic block according to the type of the instruction contained in the current basic block and the high-order data analysis result of the current basic block; the type of the current basic block is used for indicating whether scalar operation instructions existing in the current basic block can be optimized;
Step 104, optimizing the translated instruction of the current basic block according to the type of the current basic block.
The binary translation optimization method provided by the embodiments of the present invention can be applied to static or dynamic translation scenarios, in particular to a binary translation system that translates source code into target code, where the source code is the binary code of the client platform and the target code is the binary code of the host platform. It should be noted that the embodiments of the present invention do not limit the host platform or the client platform. The solution of the present invention can be used whenever the vector instructions of the client platform involve preserving or zeroing the high-order bits.
In the embodiments of the present invention, a vector instruction refers to an instruction that uses vector registers. Taking the x86 instruction set as an example, vector instructions refer to x86 SSE vector instructions. Vector instructions may include vector operation instructions and scalar operation instructions. A vector operation instruction is an instruction that uses both the high-order and low-order bits of the vector register, i.e., its high-order data participates in the operation; the ADDPS instruction is an example. A scalar operation instruction is an instruction that uses only the low-order bits of the vector register; its high-order data is preserved and does not participate in the operation, as with the ADDSS instruction.
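The following is a minimal sketch in C (not taken from the patent) that models a 128-bit xmm register as four 32-bit lanes, purely to illustrate the semantic difference stated above: a packed operation such as ADDPS computes all lanes, while a scalar operation such as ADDSS computes only the lowest lane and preserves the upper lanes.

#include <stdio.h>

typedef struct { float lane[4]; } xmm_t;   /* simplified 128-bit register model */

/* ADDPS-like: all four lanes (low and high bits) participate in the operation. */
static xmm_t addps(xmm_t dst, xmm_t src) {
    for (int i = 0; i < 4; i++) dst.lane[i] += src.lane[i];
    return dst;
}

/* ADDSS-like: only the lowest lane is computed; the upper lanes of the
 * destination keep their original values. */
static xmm_t addss(xmm_t dst, xmm_t src) {
    dst.lane[0] += src.lane[0];             /* low bits hold the result */
    return dst;                             /* high bits are preserved  */
}

int main(void) {
    xmm_t a = {{1, 2, 3, 4}}, b = {{10, 20, 30, 40}};
    xmm_t p = addps(a, b), s = addss(a, b);
    printf("packed: %g %g %g %g\n", p.lane[0], p.lane[1], p.lane[2], p.lane[3]);
    printf("scalar: %g %g %g %g\n", s.lane[0], s.lane[1], s.lane[2], s.lane[3]);
    return 0;
}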
In the binary translation process, the embodiments of the present invention identify the types of the instructions contained in the current basic block to be translated and analyze the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block to obtain the high-order data analysis result of the current basic block; the type of the current basic block can be distinguished according to the types of the instructions it contains and its high-order data analysis result; it is thus possible to determine, based on the type of the current basic block, whether the scalar operation instructions present in the current basic block can be optimized, for example, whether the "save-restore" operation instructions for the high-order data of vector registers generated by translating the scalar operation instructions can be eliminated.
Optionally, identifying the types of the instructions contained in the current basic block to be translated may include: disassembling the instructions contained in the current basic block, and determining whether each instruction is a vector operation instruction or a scalar operation instruction based on the operation code (Opcode) of the instruction in the disassembly result. For example, the opcode obtained after disassembling an instruction may be looked up in a first list and a second list; if the opcode is present in the first list, the instruction is determined to be a vector operation instruction; if the opcode is present in the second list, the instruction is determined to be a scalar operation instruction. The first list contains the opcodes corresponding to vector operation instructions, and the second list contains the opcodes corresponding to scalar operation instructions.
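A sketch in C of the two-list opcode lookup described above; the opcode values in the tables are illustrative placeholders rather than real x86 encodings, and the table and function names are assumptions, not the patent's own identifiers.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { INSN_OTHER, INSN_VECTOR_OP, INSN_SCALAR_OP } insn_kind_t;

/* First list: opcodes of packed (vector) operation instructions.
 * Second list: opcodes of scalar operation instructions.
 * The values below are placeholders for illustration only.          */
static const uint16_t vector_opcodes[] = { 0x0058 /* ADDPS */, 0x0059 /* MULPS */ };
static const uint16_t scalar_opcodes[] = { 0x0158 /* ADDSS */, 0x0159 /* MULSS */ };

static bool in_list(const uint16_t *list, size_t n, uint16_t op) {
    for (size_t i = 0; i < n; i++)
        if (list[i] == op) return true;
    return false;
}

insn_kind_t classify_opcode(uint16_t opcode) {
    if (in_list(vector_opcodes, sizeof vector_opcodes / sizeof *vector_opcodes, opcode))
        return INSN_VECTOR_OP;       /* present in the first list  */
    if (in_list(scalar_opcodes, sizeof scalar_opcodes / sizeof *scalar_opcodes, opcode))
        return INSN_SCALAR_OP;       /* present in the second list */
    return INSN_OTHER;               /* non-vector instruction     */
}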
In the embodiments of the present invention, the high-order data analysis result of the current basic block is obtained by analyzing the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block; this result includes the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block. The state of the high-order data of a vector register may be used to indicate whether the instructions using the vector register use its high-order bits. For example, if the state of the high-order data of a certain vector register is the preset value state, it indicates that the instructions using the vector register do not use its high-order data, and the high-order data of the vector register is a preset value. The preset value may be a predefined value indicating that the high-order bits are meaningless data; for example, the preset value may be 0, null, or another predefined value. In the embodiments of the present invention the preset value is taken to be 0, so the 0 state indicates that the high-order bits are 0. For another example, if the state of the high-order data of a certain vector register is the transfer state, it indicates that an instruction using the vector register uses its high-order data, and the high-order data of the vector register is meaningful actual data written by some instruction. In this case the source of the high-order data of the vector register may also be recorded, for example, from which other register the high-order data was transferred.
The vector register corresponding to the vector instruction may include a source register or a destination register corresponding to the vector instruction. Since the instruction carries a register number, the vector register corresponding to the vector instruction refers to the register corresponding to the register number carried by the vector instruction.
Therefore, from the high-order data analysis result of the current basic block, it can be known for which vector instructions the high-order data of the corresponding vector registers is meaningful actual data that will be used, so that the save-restore operation still needs to be executed; and for which vector instructions the high-order data of the corresponding vector registers will not be used, so that no save-restore operation needs to be executed for such high-order data.
In the embodiments of the present invention, the type of the current basic block is determined according to the types of the instructions it contains and its high-order data analysis result; that is, when the current basic block contains scalar operation instructions, whether they can be further optimized is identified. The translated instructions of the current basic block can then be optimized according to the classification result, for example by deleting unnecessary "save-restore" operations on high-order data, thereby improving the performance of the translated program.
The translation optimization method of the embodiments of the present invention can optimize the "save-restore" operation instructions for the high-order data of vector registers generated by translating scalar operation instructions, and in particular can optimize the translation of the scalar operation instructions among the x86 SSE vector instructions. Because modern x86 programs commonly use the scalar operation instructions of the SSE instruction set in place of the x87 floating-point instructions, the translation optimization method of the embodiments of the present invention can significantly improve the floating-point performance of modern x86 programs.
In an optional embodiment of the present invention, the analyzing the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block to obtain the high-order data analysis result of the current basic block may include:
Step S11, according to the order of the instruction flow in the current basic block, sequentially simulating the high-order data of the vector register corresponding to each vector instruction, and determining the source of the high-order data of each vector register;
Step S12, determining and recording the state of the high-order data of each vector register according to the source of the high-order data of each vector register; the state of the high-order data comprises one of a preset value state, a transmission state and an undefined state; the preset value state indicates that the high-order data is a preset value; the transfer state indicates that the high-order data is not a preset value, and the source of the high-order data is other registers; the undefined state indicates that the source of the high-order data is unknown.
The embodiment of the invention analyzes the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block to obtain the high-order data analysis result of the current basic block. The high-order data analysis result of the current basic block can be stored in the information domain of the current basic block and can be used for type identification and subsequent optimization of the current basic block.
The embodiments of the present invention perform data flow analysis on the current basic block to analyze the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block. Specifically, following the order of the instruction stream in the current basic block and according to the instruction semantics, the high-order data of the vector register corresponding to each vector instruction is simulated one by one, and the source of the high-order data is recorded. In order to improve translation performance, the embodiments of the present invention simplify the state of the high-order data of a vector register into two states: the preset value state (such as the 0 state) and the transfer state. The 0 state indicates that the instructions that have historically used the vector register do not use its high-order bits, whose data is 0. The transfer state indicates that an instruction that has historically used the vector register uses its high-order bits; the high-order data in the vector register is not 0, and its source is another register from which it was transferred. Further, for the case where the source of the data cannot be confirmed, the state of the high-order data of the vector register may be set to the undefined state; for example, high-order data loaded into a vector register from memory is set to the undefined state.
Referring to fig. 2, a schematic diagram of acquiring a high-level data analysis result in one example of the embodiment of the present invention is shown. As shown in fig. 2, the current basic block is basic block 1, and according to the order of the instruction flow in basic block 1, the high-order data of the vector register corresponding to each vector instruction is simulated in turn, and the source of the high-order data of each vector register is determined; depending on the source of the high order data of each vector register, the state of the high order data of each vector register is determined and recorded, which information can be recorded into the information field of the current basic block (basic block 1). As shown in fig. 2 ①, the information field of basic block 1 records the state and source of the high-order data of each vector register in basic block 1 (the source may be recorded when it is not in the 0 state). Further, as shown in fig. 2 ①, the information field of the current basic block (basic block 1) has recorded the following two vectors: a and b. Where b is a flag vector, also referred to as a status word in the embodiment of the present invention. The status word is used for recording the status of the high-order data of the vector register corresponding to each vector instruction in the current basic block, wherein 0 represents a 0 state, and 1 represents a transfer state or an undefined state. Vector a is used to record the source of the high order bits. As shown in fig. 2 ①, the high-order data analysis result of the basic block 1 may include a vector a and a vector b.
After the high-order data analysis of a basic block is completed, the state of the high-order data of each vector register in the basic block can be determined independently, and the high-order data analysis result of the basic block can be recorded in the information field of the basic block for the type identification and subsequent optimization process of the basic block.
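A minimal sketch in C, under assumed names, of the per-basic-block record and analysis walk described above: one state per xmm register plus the source register of any transferred high-order data, corresponding to the status word (vector b) and source vector (vector a) stored in the block's information field. The instruction model and update rules are greatly simplified for illustration.

#define NUM_XMM 16

typedef enum {
    HI_ZERO,      /* preset value state: high-order bits are the preset value (0) */
    HI_TRANSFER,  /* transfer state: high-order bits carried from another register */
    HI_UNDEF      /* undefined state: the source of the high-order bits is unknown */
} hi_state_t;

typedef struct {
    hi_state_t state[NUM_XMM];   /* vector b in Fig. 2: the status word         */
    int        source[NUM_XMM];  /* vector a in Fig. 2: source register index,  */
                                 /* meaningful only when state == HI_TRANSFER   */
} hi_info_t;

typedef struct {
    int is_scalar;   /* 1 for a scalar operation, 0 for a packed (vector) operation */
    int dst, src;    /* xmm register indices carried by the instruction             */
} vec_insn_t;

/* Walk the vector instructions of the block in stream order and update the
 * record according to simplified semantics: a scalar operation leaves the
 * high-order state of dst unchanged, while a packed move-like operation
 * makes dst's high-order bits come from src.                                */
void analyze_block_hi(hi_info_t *info, const vec_insn_t *insns, int n) {
    for (int i = 0; i < n; i++) {
        if (insns[i].is_scalar)
            continue;                               /* high bits of dst preserved */
        if (info->state[insns[i].src] == HI_ZERO) {
            info->state[insns[i].dst] = HI_ZERO;    /* propagate the 0 state      */
        } else {
            info->state[insns[i].dst]  = HI_TRANSFER;
            info->source[insns[i].dst] = insns[i].src;
        }
    }
}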
In an alternative embodiment of the present invention, the method may further include:
And transmitting the high-order data analysis result of the current basic block to a next basic block to serve as high-order data initial information of the next basic block, so that the next basic block calculates the high-order data analysis result of the next basic block based on the high-order data initial information.
In the embodiment of the invention, the high-order data analysis can be carried out not only on the inside of the basic blocks, but also the high-order data analysis of the vector registers crossing the basic blocks, namely the high-order data analysis result of the vector registers of the preceding basic blocks can be used as the high-order data initial information of the vector registers of the following basic blocks. It should be noted that the leading basic block and the following basic block refer to two basic blocks that are executed in succession. As in the example of fig. 2, the leading basic block is basic block 1 and the following basic block is basic block 2.
As shown in fig. 2 ③, the high-order data analysis result of the current basic block (basic block 1) may be transferred to the next basic block (basic block 2) as the high-order data initial information of the next basic block. The basic block 2 may calculate its own high bit data analysis result based on the high bit data initial information.
When the next basic block (basic block 2) is translated, the high-order data analysis result of basic block 1 transferred from the preceding basic block (basic block 1) may be read (see fig. 2 ③). That is, basic block 2 receives the status word of basic block 1, maintains it, and computes its own status word (its own high-order data analysis result).
Specifically, when initializing the high order bits of the vector register of the basic block 2, the high order bit data analysis result transferred from the basic block 1 may be used as the high order bit data initial information of the basic block 2, and the high order bits of the vector register of the basic block 2 may be initialized. If the state word of the high-order data of a certain vector register is in the 0 state when the basic block 1 exits, the high-order data of the vector register is initialized to the 0 state when the basic block 2 is initialized. As shown in fig. 2 ⑧, the 0 th bit of the status word transferred from the basic block 1 marks the state of the high-order data of the xmm0 register, which indicates that the high-order data of the xmm0 register is in the 0 state when the basic block 1 exits. Therefore, the high order bit of the xmm0 register is initialized to the 0 state at the time of initialization of basic block 2. Taking the x86 instruction set as an example, the vector operation instruction movpd in basic block 2 (the instruction is an x86 instruction) will then carry the high-order data in the xmm0 register to the high-order bits of the xmm1 register, so the high-order data in the xmm1 register will also be recorded as a 0 state at this time. After the basic block 2 completes the analysis of the high-order data, the high-order data of the xmm1 register may be obtained as 0 (as shown in fig. 2 ⑤), where the source of the high-order data is the high-order bits of the xmm0 register.
In the embodiment of the invention, the current basic block transmits the high-order data analysis result to the next basic block, and the next basic block analyzes and maintains the high-order data analysis result besides the high-order data analysis result transmitted by the preamble basic block. The status word (high-order data analysis result) in fig. 2 ⑧ is transferred from the basic block 1, and after the basic block 2 performs the analysis work on itself as described in step 102, the status word (high-order data analysis result) of the basic block 2 is shown in fig. 2 ⑤.
Further, the embodiments of the present invention transfer the high-order data analysis result of the current basic block to the next basic block. The high-order data analysis result of the current basic block not only contains the state of the high-order data of each vector register but also records its source, so the transfer direction of the high-order data is known. Therefore, at the end of the current basic block, the embodiments of the present invention can carry the high-order data of the corresponding vector register according to the recorded source of that data. For example, if a vector register transfers high-order data, a carry instruction may be generated at the end of the current basic block to complete the carry operation for the high-order data (see fig. 2 ②). In this example, carrying the high-order bits of the xmm1 register to the high-order bits of the xmm0 register is accomplished by such a carry instruction.
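A sketch of the end-of-block carry step just described, continuing the hi_info_t record from the earlier sketch. emit_move_hi() is a hypothetical emitter helper, not a real API; it is assumed to copy only the high-order bits between the host registers mapped to the two xmm registers, with the actual direction and mapping depending on the translator.

/* Hypothetical emitter helper (assumption, not a real API). */
extern void emit_move_hi(int dst, int src);

/* At the end of the block, emit a carry for every register whose high-order
 * data was recorded as transferred from another register, so the following
 * block sees that data in place.                                            */
void emit_block_tail_carries(const hi_info_t *info) {
    for (int r = 0; r < NUM_XMM; r++) {
        if (info->state[r] == HI_TRANSFER && info->source[r] != r)
            emit_move_hi(r, info->source[r]);
    }
}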
In an alternative embodiment of the present invention, the method may further include:
Step S21, after the translation of the current basic block is completed, when the current basic block looks up and links the next basic block, judging whether the high-order data analysis result of the current basic block matches the high-order data initial information of the next basic block;
step S22, if the high-order data analysis result of the current basic block is matched with the high-order data initial information of the next basic block, linking and executing the next basic block; otherwise, the next basic block is obtained through re-translation.
The high-order data analysis result of the current basic block (basic block 1) may be transferred to the next basic block (basic block 2) as the high-order data initial information of the next basic block. When the next basic block (basic block 2) is translated, the high-order data analysis result of the basic block 1 transferred from the previous basic block (basic block 1) may be read (see fig. 2 ③).
After the translation of the current basic block (basic block 1) is completed, the next basic block is looked up and linked, with the high-order data initial information of the next basic block participating in the comparison as a key value, to determine whether the high-order data analysis result of the current basic block (basic block 1) matches the high-order data initial information of the next basic block. Only a next basic block (e.g., basic block 2) for which the two match can be found and executed (see fig. 2 ⑦); if they do not match (see fig. 2 ⑥), the high-order data analysis result of the current basic block is taken as the high-order data initial information of the next basic block, and the next basic block is obtained by re-translation (see fig. 2 ⑨).
As shown in fig. 2, basic block 2' and basic block 2 may be translated from the same guest instruction, except that different translation modes are employed due to the different state words. Basic block 2' is translated earlier than basic block 2, but basic block 2 is re-translated due to the status word mismatch.
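A sketch in C of the lookup behaviour described above, with assumed structure and function names: the translated-block cache is keyed not only by the guest address but also by the incoming status word (the high-order data initial information), so a cached block is reused only when both match, and otherwise the guest code is retranslated under the new status word.

#include <stdint.h>
#include <stddef.h>

typedef struct tb {
    uint64_t   guest_pc;      /* guest address the block was translated from      */
    uint32_t   hi_status_in;  /* status word this translation assumed on entry    */
    void      *host_code;     /* translated host code                             */
    struct tb *next;          /* hash-chain link                                   */
} tb_t;

tb_t *tb_lookup(tb_t *bucket, uint64_t pc, uint32_t hi_status_in) {
    for (tb_t *tb = bucket; tb != NULL; tb = tb->next)
        if (tb->guest_pc == pc && tb->hi_status_in == hi_status_in)
            return tb;        /* match: link to and execute this block            */
    return NULL;              /* no match: retranslate under hi_status_in         */
}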
In an optional embodiment of the present invention, the determining the type of the current basic block according to the type of the instruction included in the current basic block and the high-order data analysis result of the current basic block may include:
If the current basic block only contains a scalar operation instruction, determining that the type of the current basic block is a scalar S type; or alternatively
If the current basic block contains a vector operation instruction and a scalar operation instruction, judging whether the current basic block meets a first condition or not; if the first condition is met, determining that the type of the current basic block is a vector P type; if the first condition is not met, judging whether the current basic block meets a second condition or not; if the second condition is met, determining that the type of the current basic block is a vector P type; otherwise, determining the type of the current basic block as a scalar S type;
Wherein the first condition includes: the current basic block contains at least one vector operation instruction for which the high-order data of a vector register it reads is in the undefined state and that high-order data affects the low-order data of the vector register it writes; the second condition includes: the high-order data read and written by the vector operation instruction at the end of the current basic block is in the undefined state.
The embodiment of the invention performs instruction stream analysis (step 101) and data stream analysis (step 102) on the current basic block, and determines the type of the current basic block according to the instruction stream analysis result (the type of the instruction contained in the current basic block) and the data stream analysis result (the high-order data analysis result of the current basic block). The type of the current basic block is used to indicate whether optimizations can be made for scalar operation instructions present in the current basic block.
Illustratively, embodiments of the present invention may divide basic blocks into three types: s (Scalar ) type, P (Packed, vector) type, and N (None, normal) type. Wherein scalar operation instructions in basic blocks of scalar type (S-type) can be optimized. Scalar operation instructions in basic blocks of vector type (P-type) cannot be optimized. The basic block of the normal type (N type) does not contain an x86 SSE vector instruction, namely does not contain a scalar operation instruction, and optimization is not needed.
Specifically, the present invention may determine the type of the current basic block according to the following steps:
(1) If the current basic block does not contain vector instructions, that is, only non-vector instructions are contained in the current basic block, the type of the current basic block is determined to be N type (the current basic block is a common basic block).
(2) If only the scalar operation instruction is included in the current basic block, the type of the current basic block is determined to be S type (the current basic block is a scalar basic block).
(3) If the current basic block contains both scalar operation instructions and vector operation instructions, data flow analysis is carried out on the current basic block, and whether the type of the current basic block is S type or P type is further judged. Specifically:
a) Judging whether the current basic block meets a first condition; and if the first condition is met, determining that the type of the current basic block is P type. The first condition includes: the current basic block contains at least one vector operation instruction for which the high-order data of a vector register it reads is in the undefined state, and that high-order data affects the low-order data of the vector register it writes.
B) Judging whether the current basic block meets a second condition or not; and if the second condition is met, determining that the type of the current basic block is P type. The second condition includes: the vector operation instruction at the end of the current basic block reads and writes the high-order data of the vector register in an undefined state. Specifically, the operation may be performed on the high-order data in the current basic block according to the instruction semantics, and if the high-order data of the vector register obtained by analysis is in an undefined state (not in a 0 state or a transfer state) at the end time of the current basic block, the type of the current basic block is considered to be a P type.
It should be noted that, the vector operation instruction at the end of the current basic block refers to the last vector instruction in the current basic block, but is not necessarily the last instruction of the current basic block.
Case a) indicates that there is a vector instruction in the current basic block that actually uses the high-order data of a vector register. Case b) indicates that it cannot be determined whether the high-order data of the vector registers in the current basic block will be used by subsequent instructions. Both cases require a "save-restore" of the high-order data in the vector registers to avoid losing it; therefore, both cases are classified as P type.
C) The remaining cases are all determined to be S-type.
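These classification rules can be condensed into a small decision function; the following C sketch uses hypothetical flags assumed to have been computed by the instruction-stream and data-flow analyses above, and is an illustration rather than the patent's implementation.

typedef enum { BLOCK_N, BLOCK_S, BLOCK_P } block_type_t;

block_type_t classify_block(int has_vector_op, int has_scalar_op,
                            int undef_hi_affects_low,   /* first condition  */
                            int undef_hi_at_block_end)  /* second condition */
{
    if (!has_vector_op && !has_scalar_op)
        return BLOCK_N;                      /* no SSE vector instructions at all  */
    if (!has_vector_op)
        return BLOCK_S;                      /* only scalar operation instructions */
    if (undef_hi_affects_low || undef_hi_at_block_end)
        return BLOCK_P;                      /* high-order data must be preserved  */
    return BLOCK_S;                          /* remaining cases: optimizable       */
}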
Taking the x86 instruction set as an example, referring to FIG. 3, a schematic diagram of determining the type of a basic block in one example of an embodiment of the invention is shown. As shown in FIG. 3 ①, the current basic block contains the vector operation instructions PXOR and ANDPS. The high-order data of the xmm1 register is cleared by the PXOR instruction; the high-order data of the xmm0 register operated on by the ANDPS instruction is in the undefined state, but after the operation is performed according to its instruction semantics, the high-order data of the xmm0 register is a definite value of 0, so condition b) is not satisfied. At the same time, the undefined high-order data (of xmm0) does not affect the low-order value, so condition a) is not satisfied either. The basic block therefore falls under case c), and its type can be determined to be S type.
As shown in fig. 3 ②, the current basic block contains the vector operation instructions PXOR and MOVHLPS. Since the state of the high-order data of the xmm1 register is undefined and that data is carried to the low-order bits of the xmm0 register by the MOVHLPS instruction, the low-order data of the written vector register is affected; condition a) is therefore satisfied, and the type of the basic block is P type.
As shown in FIG. 3 ③, the vector operation instruction at the end of the current basic block is ANDPS. ANDPS reads the xmm0 register and writes the xmm1 register, whose high-order data are both uncertain (in undefined states), so the high-order data of the operation result is also an uncertain value. Condition b) is satisfied, and the type of the basic block is therefore P type.
The embodiments of the present invention classify the current basic block by combining the types of the instructions it contains with its high-order data analysis result, so the type of the basic block is more accurate. If the current basic block were classified only by the types of the instructions it contains, a large number of P-type blocks would be obtained; some of these blocks could in fact be classified as S type, and marking them as P type means they cannot be optimized, which weakens the optimization effect. The embodiments of the present invention can achieve a more accurate classification and thus further improve the translation optimization effect.
In an optional embodiment of the present invention, the optimizing the translated instruction of the current basic block according to the type of the current basic block may include:
If the type of the current basic block is determined to be S type, then for the scalar operation instructions present in the current basic block, the save-restore operation instructions for the high-order data of the vector registers generated by translating those scalar operation instructions are deleted.
Referring to FIG. 4, a schematic diagram of the translation of a scalar operation instruction in one example of an embodiment of the invention is shown. As shown in fig. 4, the instruction to be translated in the current basic block is the scalar operation instruction ADDSD, whose semantics are to add the low-order bits of the vector registers, place the result in the low-order bits of the target register, and retain the original high-order data of the target register. Since the corresponding LoongArch add instruction destroys the high-order data of the destination register, a "save-restore" operation instruction pair is executed to protect the original high-order data in the destination register xmm0. First, the high-order bits of xmm0 are saved to a temporary register temp (see FIG. 4 ①); then, the add instruction is executed (see FIG. 4 ②), which corrupts the high-order data in xmm0 (see FIG. 4 ③); finally, the original high-order data of xmm0 stored in temp is used to restore the high-order bits of xmm0 (see fig. 4 ④), so that the data stored in the high-order bits of xmm0 is restored to its original state (see fig. 4 ⑤). The "save-restore" operation instructions are the operations of fig. 4 ① and fig. 4 ④. The "save-restore" operation instruction pair includes a save instruction and a restore instruction. The save instruction saves the high-order bits of the vector register to be protected (e.g., xmm0 in fig. 4) into a temporary register (e.g., temp in fig. 4). The embodiments of the present invention do not limit the temporary register; for example, the temporary register may be a vector register in memory. The restore instruction restores the high-order bits held in the temporary register to the high-order bits of the vector register that needs protection (e.g., xmm0). The save instruction and the restore instruction typically occur in pairs, which in the embodiments of the present invention are simply referred to as "save-restore" operation instructions.
If the current basic block is S-type, optimizing the translation result of scalar operation instruction ADDSD may be achieved by eliminating the save-restore operation instruction, and only the add instruction (e.g., fig. 4 ②) is reserved, i.e., only the core operation is reserved, eliminating redundant save-restore operation instructions. The target register refers to a vector register to which the scalar operation instruction ADDSD needs to write.
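A sketch of the two translation paths for the ADDSD example of Fig. 4, using hypothetical emitter helpers (emit_save_hi, emit_fadd_low, emit_restore_hi) and an assumed scratch register number; none of these names come from the patent or from a real LoongArch backend API.

typedef enum { BLOCK_N, BLOCK_S, BLOCK_P } block_type_t;

/* Hypothetical emitter helpers (assumptions, not a real API). */
extern void emit_save_hi(int tmp, int xmm);        /* save xmm's high bits to tmp      */
extern void emit_restore_hi(int xmm, int tmp);     /* restore xmm's high bits from tmp */
extern void emit_fadd_low(int dst, int a, int b);  /* low-order double add             */
#define TMP_REG 31                                 /* assumed scratch register number  */

void translate_addsd(block_type_t type, int dst, int src) {
    if (type == BLOCK_P) {
        emit_save_hi(TMP_REG, dst);    /* Fig. 4 ①: save the original high bits         */
        emit_fadd_low(dst, dst, src);  /* Fig. 4 ②: core add (may clobber the high bits) */
        emit_restore_hi(dst, TMP_REG); /* Fig. 4 ④: restore the original high bits       */
    } else {
        emit_fadd_low(dst, dst, src);  /* S type: redundant save-restore removed,        */
    }                                  /* only the core operation is kept                */
}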
According to the embodiment of the invention, the basic blocks can be divided into three types of P/S/N, and the basic blocks of different types can be optimized according to different optimization modes.
Specifically, an S-type basic block is defined as a basic block that does not need to guarantee the high-order data of the vector registers, so optimization can be performed on it: for the scalar operation instructions present in it, the "save-restore" operation instructions for the high-order data of the vector registers that their translation would generate can be deleted; for any vector operation instructions present in it, the translation can follow the original translation scheme for vector operation instructions. In an S-type basic block there may be vector instructions that affect the high-order data of the vector registers, but according to the data flow analysis described above, the data affected by these instructions in an S-type basic block is itself meaningless, so the correctness of the basic block is not affected and the high-order data can be disregarded. For example, if a certain vector instruction in a basic block causes the high-order bits of a certain vector register to be written, but the high-order data in that vector register is in fact set to 0 after the instruction completes, then the high-order data written by that vector instruction has no meaning (no influence on the correctness of the program), and the save-restore operation instructions can be eliminated. Of course, this is merely illustrative and is not intended to limit S-type basic blocks.
For basic blocks of the P type, which are defined as basic blocks that need to guarantee the high-order data of the vector register, the basic blocks of the type do not perform an optimization action, and for scalar operation instructions present therein, the resulting "save-restore" operation instructions for the high-order data of the vector register are still preserved.
For basic blocks of type N, there is no x86 SSE vector instruction in the basic block of type, and therefore there is no "save-restore" operation instruction to the high-order data of the vector register, i.e., there is no need to eliminate the operation instruction.
In an alternative embodiment of the present invention, the method may further include:
step S31, after the translation of the current basic block is completed, before the current basic block is linked to a next basic block, determining a link position according to the type of the current basic block and the type of the next basic block;
step S32, the current basic block is linked to the linking position.
After the basic blocks of the source code have been translated, target code blocks are obtained; direct and indirect jump links between the target code blocks can then be completed according to the control flow of the source code, and the target code blocks are executed in sequence. It should be noted that, after a basic block has been translated, it is, for convenience of description, still referred to in the embodiments of the present invention as a basic block (it may also be called a target code block).
Basic block linking is also an optimization operation in binary translation, and basic blocks on the same control flow can be connected together through basic block linking, so that the cost of context switching and searching for the next basic block can be reduced. The embodiment of the invention distinguishes the types of the basic blocks, so that when the basic blocks are linked, corresponding link positions are dynamically determined for the basic blocks of different types.
In an optional embodiment of the invention, the determining the link position according to the type of the current basic block and the type of the next basic block may include: and if the type of the current basic block is the same as the type of the next basic block, determining the link position as the translated code of the next basic block.
Referring to FIG. 5, a schematic diagram of linking basic blocks of different types in one example of an embodiment of the invention is shown. As shown in fig. 5, since the effect on the high-order data of the vector register is the same for the same type of basic block, it is not necessary to perform special processing, and it is only necessary to perform the linking according to the normal basic block. For example, assuming that the type of the current basic block (e.g., basic block 1) is P-type and the type of the next basic block (e.g., basic block 2) is P-type, it is sufficient to determine that the link position is the post-translation code of basic block 2 and link basic block 1 to the post-translation code of basic block 2. For another example, assuming that the type of the current basic block (e.g., basic block 1) is S type and the type of the next basic block (e.g., basic block 2) is also S type, it is sufficient to determine that the link position is the post-translation code of basic block 2, and link basic block 1 to the post-translation code of basic block 2.
In an alternative embodiment of the present invention, the type of the instruction may further include a non-vector instruction, and the type of the current basic block may further include an N type, where the N type indicates that the current basic block includes only the non-vector instruction;
The determining a link position according to the type of the current basic block and the type of the next basic block may include: and if the type of the next basic block is N type, modifying the type of the next basic block into the type of the current basic block, and determining the link position as the translated code of the next basic block.
In particular implementations, since no x86 SSE vector instruction is used in an N-type basic block, there is no concern about the high-order data of the vector registers, and thus an N-type basic block may be considered an arbitrary type of basic block.
In order to reduce the overhead of special processing when linking basic blocks, when the current basic block is linked to the next basic block, if the type of the next basic block is N type, the embodiments of the present invention modify the type of the next basic block into the type of the current basic block, so that the N-type basic block inherits the type of the basic block linking to it, i.e., the type of its predecessor basic block, and the linking can then be performed in the same way as linking basic blocks of the same type.
For example, assuming that the type of the current basic block is a P type and the type of the next basic block is an N type, the N type of the next basic block may be modified to be a P type and linked according to the basic blocks of the same type (see fig. 5 ⑦).
In an alternative embodiment of the present invention, the method may further include: if the type of the current basic block is S type, adding a high-order preservation processing head before the translated code of the current basic block for preserving the high-order data of the appointed vector register;
The determining a link position according to the type of the current basic block and the type of the next basic block may include: if the type of the current basic block is P type and the type of the next basic block is S type, determining the link position as the high-order storage processing head of the next basic block.
In the embodiments of the present invention, an S-type basic block does not concern itself with the high-order data of the vector registers. However, if a P-type basic block is linked directly to an S-type basic block, i.e., if the type of the current basic block is P type and the type of the next basic block is S type, the direct link may cause the high-order data of the vector registers to be lost when the S-type basic block is executed. To avoid this problem, after an S-type basic block is translated, embodiments of the present invention add a high-order save processing head before its translated code to save the high-order data of the specified (e.g., all) vector registers in the S-type basic block (see fig. 5 ⑤). Illustratively, assuming basic block 1 is an S-type basic block, after the translation of basic block 1 is completed, a high-order save processing head is added before the translated code of basic block 1 to save the high-order data of the specified (e.g., all) vector registers in basic block 1.
When the basic block of the P type is linked to the basic block of the S type, the link position is determined to be the high-order storage processing head of the basic block of the S type. That is, the basic block of the P type is linked to the basic block of the S type at the high-order save processing head. Thus, the operation of the high-level save processing head is performed first to save the high-level data of the specified (e.g., all) vector registers in the S-type basic block into the memory, and then to run the translated code of the S-type basic block (e.g., fig. 5 ④).
It will be appreciated that if the type of the current basic block is S type and the type of the next basic block is also S type, the linking may be performed in the same manner as linking basic blocks of the same type; in this case the link may skip the high-order save processing head of the next basic block and go directly to the translated code of the next basic block.
By adding a special processing head (the high-order save processing head) at the head of an S-type basic block, the embodiment of the invention can save the high-order data of the vector registers, so as to ensure the correctness of the high-order data when basic blocks of different types are linked.
In fig. 5, a low-order load processing head, a low-order save processing head, an all-load processing head and an all-save processing head are also shown. The low-order load processing head is used for loading the low-order data of the vector registers specified in the basic block. The low-order save processing head is used for saving the low-order data of the vector registers specified in the basic block. The all-load processing head is used for loading all data of the vector registers specified in the basic block. The all-save processing head is used for saving all data of the vector registers specified in the basic block.
For a single basic block of vector type, an all-load processing head and an all-save processing head are added to the basic block. For a single basic block of scalar type, a low-order load processing head and a low-order save processing head are added to the basic block.
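As a rough illustration of which processing heads accompany a translated block, the following sketch (reusing the hypothetical BlockType enumeration from the sketch above) records the heads per block type as flags; the structure and field names are assumptions, not the patent's actual layout.

typedef struct HeadSet {
    int all_load, all_save;   /* vector (P-type) blocks                          */
    int low_load, low_save;   /* scalar (S-type) blocks                          */
    int high_load;            /* P-type: high-order loading head, described next */
    int high_save;            /* S-type: high-order save head described above    */
} HeadSet;

static HeadSet heads_for(BlockType type)
{
    HeadSet h = {0};
    if (type == BLOCK_P) {
        h.all_load = h.all_save = 1;   /* all bits of the specified vector registers are live */
        h.high_load = 1;               /* added before the translated code                    */
    } else if (type == BLOCK_S) {
        h.low_load = h.low_save = 1;   /* only the low-order bits are used                    */
        h.high_save = 1;               /* added before the translated code                    */
    }
    return h;                          /* N-type blocks need no processing heads              */
}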
In an alternative embodiment of the present invention, the method may further include: if the type of the current basic block is P type, adding a high-order loading processing head before the translated code of the current basic block, for loading the high-order data of the specified vector registers;
The determining a link position according to the type of the current basic block and the type of the next basic block may include: and if the type of the current basic block is S type and the type of the next basic block is P type, determining the link position as the position of the high-order loading processing head of the next basic block.
In the embodiment of the present invention, the processing procedure of linking the S-type basic block to the P-type basic block is similar to the processing procedure of linking the P-type basic block to the S-type basic block. The difference is that when a basic block of the P type is linked to a basic block of the S type, the high-order save processing head of the basic block of the S type saves the high-order bits of the vector registers specified (e.g., all) in its basic block to the memory. Thus, when the basic block of the S type is linked to the basic block of the P type, high-order data needs to be loaded from the memory into the vector register.
Thus, after the translation of a P-type basic block is completed, the embodiment of the present invention adds a high-order loading processing head before the translated code of the P-type basic block, where the high-order loading processing head is used to load the high-order data of the specified (e.g., all) vector registers in the P-type basic block (see fig. 5 ③). Illustratively, assuming basic block 1 is a P-type basic block, after the translation of basic block 1 is completed, a high-order loading processing head for loading the high-order data of the specified (e.g., all) vector registers in basic block 1 is added before the translated code of basic block 1.
When an S-type basic block is linked to a P-type basic block, the link position is determined to be the high-order loading processing head of the P-type basic block. That is, the S-type basic block is linked to the P-type basic block at its high-order loading processing head. Thus, at run time the high-order loading processing head first loads the high-order data from memory back into the high-order bits of the vector registers, and then the translated code of the P-type basic block runs (see fig. 5 ②).
It will be appreciated that if the current basic block is P-type and the next basic block is P-type, the linking may be performed in the same type of basic block linking manner, where the linking may skip the high-order loading processing head of the next basic block and directly link to the translated code of the next basic block (see fig. 5 ⑥).
The embodiment of the invention can realize the recovery of the high-order data of the vector register by adding the special processing head (high-order loading processing head) at the head of the P-type basic block so as to ensure the correctness of the high-order data when the basic blocks of different types are linked.
The embodiment of the invention can save and restore the high-order data of the vector registers by adding special processing heads at the head of a basic block, thereby ensuring the correctness of the high-order data when basic blocks of different types are linked. In addition, in the process of linking the basic blocks, the link position can be dynamically selected according to whether the type of the predecessor basic block is the same as the type of the successor basic block, so that the use of the special processing heads is optimized and the overhead caused by the special processing is reduced.
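Putting the rules of this section together, a possible link-position selection might look like the following sketch. It is an illustration only: it reuses the hypothetical BlockType enumeration from the earlier sketch and assumes two extra fields, high_save_head and high_load_head, holding the entry addresses of the processing heads added before the translated code of S-type and P-type blocks respectively.

/* Assumed extension of the earlier BasicBlock structure. */
typedef struct LinkableBlock {
    BlockType type;
    void *translated_code;
    void *high_save_head;    /* S-type blocks only */
    void *high_load_head;    /* P-type blocks only */
} LinkableBlock;

static void *determine_link_position(LinkableBlock *cur, LinkableBlock *next)
{
    if (next->type == BLOCK_N)               /* N-type inherits the predecessor's type   */
        next->type = cur->type;

    if (next->type == cur->type)             /* same type: skip any processing head      */
        return next->translated_code;

    if (cur->type == BLOCK_P && next->type == BLOCK_S)
        return next->high_save_head;         /* spill high-order data before running S   */

    if (cur->type == BLOCK_S && next->type == BLOCK_P)
        return next->high_load_head;         /* reload high-order data before running P  */

    return next->translated_code;            /* remaining cases need no special head     */
}

Linking to the processing head rather than to the translated code is what allows same-type links to skip the head entirely, which is the source of the overhead reduction described above.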
In an alternative embodiment of the present invention, the storing the high-order data of the specified vector register may include:
Step S41, acquiring a high-order data analysis result of the current basic block;
Step S42, determining a first vector register according to the high-order data analysis result, wherein the state of the high-order data of the first vector register is a transfer state or an undefined state;
Step S43, saving the high-order data of the first vector register.
In the process of linking basic blocks, when linking a P-type basic block to an S-type basic block, the high-order save processing head of the S-type basic block may save the high-order data of all vector registers in the basic block.
Further, if the high-order save processing head of the S-type basic block saves the high-order data of all vector registers, it may save some meaningless high-order data, such as high-order data in the 0 state. For the case where a P-type basic block is linked to an S-type basic block, if the high-order data of a certain vector register is in the 0 state, the high-order data of that vector register is in the 0 state after the execution of the basic block is completed, and even if it were saved it would be covered by the execution result of the basic block, so the save operation is not required for that register.
Therefore, the embodiment of the invention further optimizes the save operation of the high-order save processing head when a P-type basic block is linked to an S-type basic block, so as to reduce the save operations actually executed. Specifically, when the current basic block is linked to the next basic block, if the current basic block is of the P type and the next basic block is of the S type, the high-order data analysis result of the current basic block is acquired; a first vector register is determined according to the high-order data analysis result of the current basic block, wherein the state of the high-order data of the first vector register is a non-0 state (such as a transfer state or an undefined state); the first vector register is taken as the specified vector register, and only the high-order data of the first vector register is saved, instead of the high-order data of all vector registers.
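A minimal sketch of this filtered save is given below, assuming the analysis result is available as a per-register state array; the state names, the register count and the spill helper are hypothetical.

typedef enum { HIGH_ZERO, HIGH_PASS, HIGH_UNDEF } HighState;  /* 0 / transfer / undefined */

#define NUM_VREGS 16                     /* assumed number of guest vector registers */

void spill_high_order(int vreg);         /* hypothetical: store the high bits to memory */

/* Executed by the high-order save processing head on a P -> S link. */
void save_high_order(const HighState analysis[NUM_VREGS])
{
    for (int r = 0; r < NUM_VREGS; r++) {
        /* Registers whose high-order data is in the 0 state hold only the
         * preset value; saving them would be redundant, so skip them. */
        if (analysis[r] == HIGH_PASS || analysis[r] == HIGH_UNDEF)
            spill_high_order(r);
    }
}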
In an alternative embodiment of the present invention, the loading the high-order data of the specified vector register may include:
Step S51, acquiring the high-order data initial information of the current basic block;
Step S52, determining a second vector register according to the high-order data initial information, wherein the state of the high-order data of the second vector register is the 0 state;
Step S53, delaying the loading of the high-order data of the second vector register.
In the process of linking basic blocks, when linking an S-type basic block to a P-type basic block, the high-order loading processing head of the P-type basic block may load the high-order data of all vector registers in the basic block immediately after linking.
Further, the embodiment of the invention further optimizes the loading operation of the high-order loading processing head when an S-type basic block is linked to a P-type basic block. Specifically, when the current basic block is linked to the next basic block, if the current basic block is of the S type and the next basic block is of the P type, the high-order data initial information of the current basic block (namely, the high-order data analysis result transmitted by the predecessor basic block of the current basic block) is acquired; a second vector register is determined according to the high-order data initial information of the current basic block, wherein the state of the high-order data of the second vector register is the 0 state; and the loading of the high-order data of the second vector register is delayed, instead of loading the high-order data of all vector registers immediately after linking.
For the case that the S-type basic block is linked to the P-type basic block, if the high-order data of a certain vector register in the P-type basic block is in a 0 state, the high-order data of the vector register is loaded, that is, a zero clearing operation is performed on the high-order data of the vector register. Embodiments of the present invention may delay the zeroing operation of the second vector register, e.g., loading the high-order data of the second vector register when the second vector register needs to be used.
Further, in the case where an S-type basic block is linked to a P-type basic block, in order to ensure the correctness of the high-order data of the vector registers, the embodiment of the present invention may further add a clear operation instruction at the end of the P-type basic block, where the clear operation instruction is used to clear the high-order data of any vector register in the P-type basic block that was not cleared because its clear operation was delayed.
In the process of basic block linking, the embodiment of the invention further optimizes the operation of the special processing heads (the high-order save processing head and the high-order loading processing head) according to the high-order data analysis result of the predecessor basic block and the high-order data initial information of the current basic block, which can reduce the save and restore operations on the high-order data of the vector registers when basic blocks of different types are linked, and thus speeds up the floating-point operations of the translated program.
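The delayed load and the final clear can be sketched as follows, reusing the hypothetical HighState array and NUM_VREGS from the previous sketch; the pending-clear bitmask and the helper names are assumptions for illustration.

void reload_high_order(int vreg);        /* hypothetical: load the high bits from memory */
void clear_high_order(int vreg);         /* hypothetical: zero the high bits             */

/* Executed by the high-order loading processing head on an S -> P link. */
void load_high_order(const HighState init_info[NUM_VREGS], unsigned *pending_clear)
{
    *pending_clear = 0;
    for (int r = 0; r < NUM_VREGS; r++) {
        if (init_info[r] == HIGH_ZERO) {
            /* Loading a 0-state register amounts to a clear; defer it until
             * the register is actually used inside the P-type block. */
            *pending_clear |= 1u << r;
        } else {
            reload_high_order(r);
        }
    }
}

/* Emitted at the end of the P-type block: clear whatever was deferred and
 * never cleared inside the block. */
void flush_pending_clears(unsigned pending_clear)
{
    for (int r = 0; r < NUM_VREGS; r++)
        if (pending_clear & (1u << r))
            clear_high_order(r);
}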
In summary, in the binary translation process, the embodiment of the invention classifies the current basic block, so that whether the scalar operation instructions existing in the current basic block can be optimized is determined according to the type of the current basic block, for example by deleting the unnecessary "save-restore" operation instructions for high-order data that are generated by translating the scalar operation instructions, thereby improving the performance of the translated program. Further, the embodiment of the present invention classifies the current basic block by combining the type of the instructions contained in the current basic block with the high-order data analysis result of the current basic block, instead of classifying the current basic block only according to the type of the instructions it contains. From the high-order data analysis result of the current basic block, it can be determined for which vector instructions in the current basic block the high-order data of the corresponding vector registers is meaningful actual data that will be used later, so that the save and restore operations still need to be executed for them, and for which vector instructions the high-order data of the corresponding vector registers will not be used, so that no save and restore operations need to be executed for such high-order data. Therefore, the embodiment of the invention identifies the type of the current basic block more accurately, can further reduce redundant "save-restore" operation instructions, and further improves the translation optimization effect.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to FIG. 6, there is shown a block diagram of an embodiment of a binary translation optimization device of the present invention, the device comprising:
an instruction stream analysis module 601, configured to identify a type of an instruction contained in a current basic block to be translated; the type of instruction includes a vector instruction including at least one of a vector operation instruction and a scalar operation instruction;
The data stream analysis module 602 is configured to analyze a state of high-order data of a vector register corresponding to each vector instruction in the current basic block, so as to obtain a high-order data analysis result of the current basic block; the high-order data analysis result of the current basic block comprises the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block; the state of the high-order data of the vector register is used for indicating whether the high-order data is a preset value and the source of the high-order data;
A type determining module 603, configured to determine a type of the current basic block according to a type of an instruction included in the current basic block and a high-order data analysis result of the current basic block; the type of the current basic block is used for indicating whether scalar operation instructions existing in the current basic block can be optimized;
the instruction optimizing module 604 is configured to optimize the translated instruction of the current basic block according to the type of the current basic block.
Optionally, the data stream analysis module includes:
The source determining submodule is used for sequentially simulating the high-order data of the vector register corresponding to each vector instruction according to the instruction flow sequence in the current basic block and determining the source of the high-order data of each vector register;
The state recording submodule is used for determining and recording the state of the high-order data of each vector register according to the source of the high-order data of each vector register; the state of the high-order data comprises one of a preset value state, a transmission state and an undefined state; the preset value state indicates that the high-order data is a preset value; the transfer state indicates that the high-order data is not a preset value, and the source of the high-order data is other registers; the undefined state indicates that the source of the high-order data is unknown.
Optionally, the type determining module includes:
The first determining submodule is used for determining that the type of the current basic block is a scalar type if the current basic block only contains scalar operation instructions;
The second determining submodule is used for judging whether the current basic block meets a first condition or not if the current basic block contains a vector operation instruction and a scalar operation instruction; if the first condition is met, determining that the type of the current basic block is a vector type; if the first condition is not met, judging whether the current basic block meets a second condition or not; if the second condition is met, determining that the type of the current basic block is a vector type; otherwise, determining that the type of the current basic block is a scalar type;
Wherein the first condition includes: the current basic block comprises at least one vector operation instruction, wherein the high-order data of a vector register read by the current basic block is in an undefined state, and the high-order data influences the low-order data of a written vector register; the second condition includes: the high-order data of the vector operation instruction read and written at the end of the current basic block are undefined states.
Optionally, the instruction optimization module is specifically configured to: if the type of the current basic block is determined to be S type, deleting, for the scalar operation instructions existing in the current basic block, the save and restore operation instructions for the high-order data of the vector registers that are generated by translating the scalar operation instructions.
Optionally, the apparatus further comprises:
And the information transmission module is used for transmitting the high-order data analysis result of the current basic block to a next basic block to be used as the high-order data initial information of the next basic block, so that the next basic block calculates the high-order data analysis result of the next basic block based on the high-order data initial information.
Optionally, the apparatus further comprises:
The information matching module is used for judging whether the high-order data analysis result of the current basic block is matched with the high-order data initial information of the next basic block or not when the current basic block searches and links the next basic block after the translation of the current basic block is finished;
The execution or translation module is used for linking and executing the next basic block if the high-order data analysis result of the current basic block is matched with the high-order data initial information of the next basic block; otherwise, the next basic block is obtained through re-translation.
Optionally, the apparatus further comprises:
A position determining module, configured to determine, after translation of the current basic block is completed, a link position according to a type of the current basic block and a type of a next basic block before linking the current basic block to the next basic block;
and the basic block link module is used for linking the current basic block to the link position.
Optionally, the apparatus further comprises:
The high-order save processing head processing module is used for adding a high-order save processing head before the translated code of the current basic block if the type of the current basic block is S type, the high-order save processing head being used for saving the high-order data of the specified vector registers;
the position determining module is specifically configured to:
If the type of the current basic block is P type and the type of the next basic block is S type, determining the link position as the high-order storage processing head of the next basic block.
Optionally, the apparatus further comprises:
The high-order loading processing head processing module is used for adding a high-order loading processing head before the translated code of the current basic block if the type of the current basic block is P type, the high-order loading processing head being used for loading the high-order data of the specified vector registers;
the position determining module is specifically configured to:
And if the type of the current basic block is S type and the type of the next basic block is P type, determining the link position as the position of the high-order loading processing head of the next basic block.
Optionally, the location determining module is specifically configured to:
And if the type of the current basic block is the same as the type of the next basic block, determining the link position as the translated code of the next basic block.
Optionally, the type of the instruction further includes a non-vector instruction, and the type of the current basic block further includes an N type, where the N type indicates that the current basic block includes only the non-vector instruction;
the position determining module is specifically configured to:
and if the type of the next basic block is N type, modifying the type of the next basic block into the type of the current basic block, and determining the link position as the translated code of the next basic block.
Optionally, the high-order save processing head processing module is specifically configured to: acquiring a high-order data analysis result of the current basic block; determining a first vector register according to the high-order data analysis result, wherein the state of the high-order data of the first vector register is a transfer state or an undefined state; and saving the high-order data of the first vector register.
Optionally, the high-order loading processing head processing module is specifically configured to: acquiring the high-order data initial information of the current basic block; determining a second vector register according to the high-order data initial information, wherein the state of the high-order data of the second vector register is a preset value state; and delaying the loading of the high-order data of the second vector register.
The binary translation optimization device of the embodiment of the invention classifies the current basic block in the binary translation process, so that whether the scalar operation instructions existing in the current basic block can be optimized is determined according to the type of the current basic block, for example by deleting the unnecessary "save-restore" operation instructions for high-order data that are generated by translating the scalar operation instructions, thereby improving the performance of the translated program. Further, the embodiment of the present invention classifies the current basic block by combining the type of the instructions contained in the current basic block with the high-order data analysis result of the current basic block, instead of classifying the current basic block only according to the type of the instructions it contains. From the high-order data analysis result of the current basic block, it can be determined for which vector instructions in the current basic block the high-order data of the corresponding vector registers is meaningful actual data that will be used later, so that the save and restore operations still need to be executed for them, and for which vector instructions the high-order data of the corresponding vector registers will not be used, so that no save and restore operations need to be executed for such high-order data. Therefore, the embodiment of the invention identifies the type of the current basic block more accurately, can further reduce redundant "save-restore" operation instructions, and further improves the translation optimization effect.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
Referring to fig. 7, a schematic structural diagram of an electronic device according to an embodiment of the present invention is shown. As shown in fig. 7, the electronic device includes: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is configured to store at least one executable instruction that causes the processor to perform the steps of the binary translation optimization method of the foregoing embodiment.
An embodiment of the present invention provides a non-transitory computer-readable storage medium; when the program or instructions stored in the storage medium are executed by a processor of a terminal, the terminal is enabled to perform the steps of the binary translation optimization method of the foregoing embodiments.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or terminal device that comprises the element.
The principles and embodiments of the present invention have been described herein with reference to specific examples, which are intended only to assist in understanding the method of the present invention and its core ideas; meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application in accordance with the ideas of the present invention. In view of the above, the content of this description should not be construed as limiting the present invention.

Claims (26)

1. A binary translation optimization method, the method comprising:
Identifying the type of an instruction contained in a current basic block to be translated; the type of instruction includes a vector instruction including at least one of a vector operation instruction and a scalar operation instruction;
Analyzing the state of high-order data of a vector register corresponding to each vector instruction in the current basic block to obtain a high-order data analysis result of the current basic block; the high-order data analysis result of the current basic block comprises the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block; the state of the high-order data of the vector register is used for indicating whether the high-order data is a preset value and the source of the high-order data;
Determining the type of the current basic block according to the type of the instruction contained in the current basic block and the high-order data analysis result of the current basic block; the type of the current basic block is used for indicating whether scalar operation instructions existing in the current basic block can be optimized;
optimizing the translated instruction of the current basic block according to the type of the current basic block;
The analyzing the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block to obtain the high-order data analysis result of the current basic block comprises the following steps:
according to the instruction flow sequence in the current basic block, sequentially simulating the high-order data of the vector register corresponding to each vector instruction, and determining the source of the high-order data of each vector register;
Determining and recording the state of the high-order data of each vector register according to the source of the high-order data of each vector register; the state of the high-order data comprises one of a preset value state, a transmission state and an undefined state; the preset value state indicates that the high-order data is a preset value; the transfer state indicates that the high-order data is not a preset value, and the source of the high-order data is other registers; the undefined state indicates that the source of the high-order data is unknown.
2. The method according to claim 1, wherein the determining the type of the current basic block according to the type of the instruction contained in the current basic block and the high-order data analysis result of the current basic block includes:
If the current basic block only contains a scalar operation instruction, determining that the type of the current basic block is a scalar type;
if the current basic block contains a vector operation instruction and a scalar operation instruction, judging whether the current basic block meets a first condition or not; if the first condition is met, determining that the type of the current basic block is a vector type; if the first condition is not met, judging whether the current basic block meets a second condition or not; if the second condition is met, determining that the type of the current basic block is a vector type; otherwise, determining that the type of the current basic block is a scalar type;
Wherein the first condition includes: the current basic block comprises at least one vector operation instruction, wherein the high-order data of a vector register read by the current basic block is in an undefined state, and the high-order data influences the low-order data of a written vector register; the second condition includes: the high-order data of the vector operation instruction read and written at the end of the current basic block are undefined states.
3. The method of claim 1, wherein optimizing the translated instructions of the current basic block according to the type of the current basic block comprises:
if the type of the current basic block is determined to be a scalar type, deleting a save and restore operation instruction for high-order data of a vector register generated by translating the scalar operation instruction for the scalar operation instruction existing in the current basic block.
4. A method according to any one of claims 1 to 3, further comprising:
And transmitting the high-order data analysis result of the current basic block to a next basic block to serve as high-order data initial information of the next basic block, so that the next basic block calculates the high-order data analysis result of the next basic block based on the high-order data initial information.
5. The method according to claim 4, wherein the method further comprises:
After the translation of the current basic block is completed, when the current basic block searches and links with the next basic block, judging whether the high-order data analysis result of the current basic block is matched with the high-order data initial information of the next basic block or not;
If the high-order data analysis result of the current basic block is matched with the high-order data initial information of the next basic block, linking and executing the next basic block; otherwise, the next basic block is obtained through re-translation.
6. A method according to any one of claims 1 to 3, further comprising:
After the translation of the current basic block is completed, before the current basic block is linked to a next basic block, determining a link position according to the type of the current basic block and the type of the next basic block;
And linking the current basic block to the linking position.
7. The method of claim 6, wherein the method further comprises:
if the type of the current basic block is a scalar type, adding a high-order save processing head before the translated code of the current basic block for saving the high-order data of a specified vector register;
The determining a link position according to the type of the current basic block and the type of the next basic block includes:
If the type of the current basic block is a vector type and the type of the next basic block is a scalar type, determining that the link position is at the high-order save processing head of the next basic block.
8. The method of claim 6, wherein the method further comprises:
if the type of the current basic block is a vector type, adding a high-order loading processing head before the translated code of the current basic block for loading high-order data of a specified vector register;
The determining a link position according to the type of the current basic block and the type of the next basic block includes:
And if the type of the current basic block is a scalar type and the type of the next basic block is a vector type, determining that the link position is at a high-order loading processing head of the next basic block.
9. The method of claim 6, wherein determining a link position based on the type of the current basic block and the type of the next basic block comprises:
And if the type of the current basic block is the same as the type of the next basic block, determining the link position as the translated code of the next basic block.
10. The method of claim 6, wherein the type of instruction further comprises a non-vector instruction, and wherein the type of the current basic block further comprises a normal type, the normal type indicating that only non-vector instructions are included in the current basic block;
The determining a link position according to the type of the current basic block and the type of the next basic block includes:
and if the type of the next basic block is the common type, modifying the type of the next basic block into the type of the current basic block, and determining the link position as the translated code of the next basic block.
11. The method of claim 7, wherein the saving high-order data of the specified vector register comprises:
Acquiring a high-order data analysis result of the current basic block;
determining a first vector register according to the high-order data analysis result, wherein the state of the high-order data of the first vector register is a transfer state or an undefined state;
high-order data of the first vector register is saved.
12. The method of claim 8, wherein loading the high-order data of the specified vector register comprises:
Acquiring the high-order data initial information of the current basic block;
Determining a second vector register according to the high-order data initial information, wherein the state of the high-order data of the second vector register is a preset value state;
and delaying loading high-order data of the second vector register.
13. A binary translation optimization apparatus, the apparatus comprising:
the instruction stream analysis module is used for identifying the type of the instruction contained in the current basic block to be translated; the type of instruction includes a vector instruction including at least one of a vector operation instruction and a scalar operation instruction;
The data flow analysis module is used for analyzing the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block to obtain a high-order data analysis result of the current basic block; the high-order data analysis result of the current basic block comprises the state of the high-order data of the vector register corresponding to each vector instruction in the current basic block; the state of the high-order data of the vector register is used for indicating whether the high-order data is a preset value and the source of the high-order data;
the type determining module is used for determining the type of the current basic block according to the type of the instruction contained in the current basic block and the high bit data analysis result of the current basic block; the type of the current basic block is used for indicating whether scalar operation instructions existing in the current basic block can be optimized;
the instruction optimization module is used for optimizing the translated instruction of the current basic block according to the type of the current basic block;
The data stream analysis module comprises:
The source determining submodule is used for sequentially simulating the high-order data of the vector register corresponding to each vector instruction according to the instruction flow sequence in the current basic block and determining the source of the high-order data of each vector register;
The state recording submodule is used for determining and recording the state of the high-order data of each vector register according to the source of the high-order data of each vector register; the state of the high-order data comprises one of a preset value state, a transmission state and an undefined state; the preset value state indicates that the high-order data is a preset value; the transfer state indicates that the high-order data is not a preset value, and the source of the high-order data is other registers; the undefined state indicates that the source of the high-order data is unknown.
14. The apparatus of claim 13, wherein the type determination module comprises:
The first determining submodule is used for determining that the type of the current basic block is a scalar type if the current basic block only contains scalar operation instructions;
The second determining submodule is used for judging whether the current basic block meets a first condition or not if the current basic block contains a vector operation instruction and a scalar operation instruction; if the first condition is met, determining that the type of the current basic block is a vector type; if the first condition is not met, judging whether the current basic block meets a second condition or not; if the second condition is met, determining that the type of the current basic block is a vector type; otherwise, determining that the type of the current basic block is a scalar type;
Wherein the first condition includes: the current basic block comprises at least one vector operation instruction, wherein the high-order data of a vector register read by the current basic block is in an undefined state, and the high-order data influences the low-order data of a written vector register; the second condition includes: the high-order data of the vector operation instruction read and written at the end of the current basic block are undefined states.
15. The apparatus of claim 13, wherein the instruction optimization module is specifically configured to: if the type of the current basic block is determined to be a scalar type, deleting a save and restore operation instruction for high-order data of a vector register generated by translating the scalar operation instruction for the scalar operation instruction existing in the current basic block.
16. The apparatus according to any one of claims 13 to 15, further comprising:
And the information transmission module is used for transmitting the high-order data analysis result of the current basic block to a next basic block to be used as the high-order data initial information of the next basic block, so that the next basic block calculates the high-order data analysis result of the next basic block based on the high-order data initial information.
17. The apparatus of claim 16, wherein the apparatus further comprises:
The information matching module is used for judging whether the high-order data analysis result of the current basic block is matched with the high-order data initial information of the next basic block or not when the current basic block searches and links the next basic block after the translation of the current basic block is finished;
The execution or translation module is used for linking and executing the next basic block if the high-order data analysis result of the current basic block is matched with the high-order data initial information of the next basic block; otherwise, the next basic block is obtained through re-translation.
18. The apparatus according to any one of claims 13 to 15, further comprising:
A position determining module, configured to determine, after translation of the current basic block is completed, a link position according to a type of the current basic block and a type of a next basic block before linking the current basic block to the next basic block;
and the basic block link module is used for linking the current basic block to the link position.
19. The apparatus of claim 18, wherein the apparatus further comprises:
The high-order save processing head processing module is used for adding a high-order save processing head before the translated code of the current basic block if the type of the current basic block is a scalar type, the high-order save processing head being used for saving high-order data of a specified vector register;
the position determining module is specifically configured to:
If the type of the current basic block is a vector type and the type of the next basic block is a scalar type, determining that the link position is at the high-order save processing head of the next basic block.
20. The apparatus of claim 18, wherein the apparatus further comprises:
A high-order loading processing head processing module for adding a high-order loading processing head before the translated code of the current basic block if the type of the current basic block is a vector type, and loading high-order data of a specified vector register;
The position determining module is specifically configured to:
And if the type of the current basic block is a scalar type and the type of the next basic block is a vector type, determining that the link position is at a high-order loading processing head of the next basic block.
21. The apparatus according to claim 18, wherein the position determination module is specifically configured to:
And if the type of the current basic block is the same as the type of the next basic block, determining the link position as the translated code of the next basic block.
22. The apparatus of claim 18, wherein the type of instruction further comprises a non-vector instruction, and wherein the type of the current basic block further comprises a normal type, the normal type indicating that only non-vector instructions are included in the current basic block;
the position determining module is specifically configured to:
and if the type of the next basic block is the common type, modifying the type of the next basic block into the type of the current basic block, and determining the link position as the translated code of the next basic block.
23. The apparatus of claim 19, wherein the high-order save processing head processing module is specifically configured to: acquiring a high-order data analysis result of the current basic block; determining a first vector register according to the high-order data analysis result, wherein the state of the high-order data of the first vector register is a transfer state or an undefined state; and saving the high-order data of the first vector register.
24. The apparatus of claim 20, wherein the high-order loading processing head processing module is specifically configured to: acquiring the high-order data initial information of the current basic block; determining a second vector register according to the high-order data initial information, wherein the state of the high-order data of the second vector register is a preset value state; and delaying the loading of the high-order data of the second vector register.
25. An electronic device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is configured to hold at least one executable instruction that causes the processor to perform the steps of the binary translation optimization method according to any one of claims 1 to 12.
26. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the binary translation optimization method according to any of claims 1 to 12.
CN202410417400.6A 2024-04-08 2024-04-08 Binary translation optimization method and device, electronic equipment and storage medium Active CN118012506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410417400.6A CN118012506B (en) 2024-04-08 2024-04-08 Binary translation optimization method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410417400.6A CN118012506B (en) 2024-04-08 2024-04-08 Binary translation optimization method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN118012506A CN118012506A (en) 2024-05-10
CN118012506B true CN118012506B (en) 2024-06-07

Family

ID=90954647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410417400.6A Active CN118012506B (en) 2024-04-08 2024-04-08 Binary translation optimization method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118012506B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7260815B1 (en) * 2003-06-30 2007-08-21 Vmware, Inc. Method and apparatus for managing registers in a binary translator
DE102015006670A1 (en) * 2014-06-26 2015-12-31 Intel Corporation COMMANDS AND LOGIC FOR PROVIDING CRYPTOGRAPHIC ALL-PURPOSE GF (256) SIMPLE ARITHMETIC FUNCTIONALITY
CN105242907A (en) * 2015-09-10 2016-01-13 西安交通大学 NEON vectorization conversion method for ARM (Advanced RISC Machine) binary code
WO2019016507A1 (en) * 2017-07-20 2019-01-24 Arm Limited Register-based complex number processing
GB202209637D0 (en) * 2022-06-30 2022-08-17 Advanced Risc Mach Ltd Vector extract and merge instruction
CN117591174A (en) * 2023-11-20 2024-02-23 北京交通大学 AVX2SVE code transplanting and optimizing method based on compiler expansion

Also Published As

Publication number Publication date
CN118012506A (en) 2024-05-10

Similar Documents

Publication Publication Date Title
US20200073712A1 (en) Method, apparatus, device and medium for processing topological relation of tasks
US7739673B2 (en) Multiple stage program recompiler using information flow determination
US8645669B2 (en) Cracking destructively overlapping operands in variable length instructions
KR101081090B1 (en) Register-based instruction optimization for facilitating efficient emulation of an instruction stream
US11029924B2 (en) Program optimization by converting code portions to directly reference internal data representations
US6347383B1 (en) Method and system for address trace compression through loop detection and reduction
US20130262808A1 (en) Compression and decompression system, compression apparatus, decompression apparatus and compression and decompression method
CN114610325A (en) Full-system dynamic binary translation method based on translation rules
CN118012506B (en) Binary translation optimization method and device, electronic equipment and storage medium
US7516061B1 (en) Method and system for using stored data items in native data formats in emulated e-mode programs
CN116661808B (en) Binary translation method, binary translation device, electronic equipment and storage medium
CN115480871B (en) General TMS320C3X processor instruction set virtualization simulation method
US20170161041A1 (en) Controlling execution of binary code
CN115421861B (en) General TMS320C55x processor instruction set virtualization simulation method
CN117008975A (en) Instruction fusion segmentation method, processor core and processor
US20120017070A1 (en) Compile system, compile method, and storage medium storing compile program
US7219336B2 (en) Tracking format of registers having multiple content formats in binary translation
US10915547B2 (en) Optimizing data conversion using pattern frequency
CN114281339A (en) Program compiling method, compiler, device, and storage medium
US10169043B2 (en) Efficient emulation of guest architecture instructions
JP2015197821A (en) Assembly code conversion device and conversion method thereof
US20050160407A1 (en) Memory management method for dynamic conversion type emulator
US11327758B2 (en) Non-transitory computer-readable recording medium, assembly instruction conversion method and information processing apparatus
US10108528B2 (en) High-performance processor instruction tracing
CN117950675A (en) Binary translation method, binary translation device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant