WO2022057743A1 - Code processing method, apparatus, and storage medium - Google Patents
- Publication number
- WO2022057743A1 (international application PCT/CN2021/117845)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- code
- platform
- target platform
- module
- instruction
Classifications
- All entries fall under G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F8/00 (Arrangements for software engineering).
- G06F8/52: Binary to binary (under G06F8/40 Transformation of program code)
- G06F8/51: Source to source (under G06F8/40 Transformation of program code)
- G06F8/433: Dependency analysis; Data or control flow analysis (under G06F8/40 Transformation of program code > G06F8/41 Compilation > G06F8/43 Checking; Contextual analysis)
- G06F8/53: Decompilation; Disassembly (under G06F8/40 Transformation of program code)
- G06F8/76: Adapting program code to run in a different environment; Porting (under G06F8/70 Software maintenance or management)
- G06F8/41: Compilation (under G06F8/40 Transformation of program code)
Definitions
- The embodiments of the present application relate to the technical field of code processing, and in particular to a code processing method, apparatus, and storage medium.
- The x86 platform is a general-purpose computing platform developed by Intel Corporation. The term refers to a family of backward-compatible central processing unit instruction set architectures based on the Intel 8086. Since its introduction in 1978, the x86 platform has accumulated a large and complex instruction set.
- When building the software ecosystem of the ARMv8 platform or other platforms, developers usually write software code according to the platform's code rules (such as its instruction set) so that the software they develop can run on that platform.
- However, developing new software for such a platform is usually difficult and slow, which makes building a software ecosystem for the ARMv8 platform or other platforms harder. There is therefore an urgent need for a method that reduces the difficulty of building a platform's software ecosystem.
- Embodiments of the present application provide a code processing method, apparatus, and storage medium to reduce the difficulty of building a platform's software ecosystem.
- According to a first aspect, the embodiments of the present application provide a code processing method that reduces the difficulty of building a platform's software ecosystem by porting the software code of a source platform to a first target platform. The source platform and the first target platform are different platforms; specifically, they can have different instruction sets.
- First, first code that is written in a low-level language and applies to the source platform is obtained. The first code may be, for example, assembly-language or machine-language code that the source platform can recognize.
- Next, the first code is decompiled to obtain a first intermediate representation (IR). The first IR may be specific to the first target platform or independent of it.
- Finally, the first IR is compiled to obtain low-level-language code that applies to the first target platform. The first target platform can recognize and run this code, so the software code of the source platform is ported to the first target platform.
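The decompile-then-compile flow can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy: a three-instruction subset of x86-64 Intel-syntax assembly, a tuple-based IR (`IROp`), and a fixed register mapping to AArch64 (`REG_MAP`). A real decompiler and compiler would also have to handle control flow, memory operands, flags, and calling conventions, and none of that is modeled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IROp:
    """One platform-neutral IR operation: dst = srcs[0] <op> srcs[1]."""
    op: str
    dst: str
    srcs: tuple

# Hypothetical semantics of a tiny x86-64 subset (two-operand form: dst = dst op src).
X86_TO_IR = {"add": "add", "sub": "sub", "imul": "mul"}
# Hypothetical IR -> AArch64 mnemonic table (three-operand form).
IR_TO_A64 = {"add": "add", "sub": "sub", "mul": "mul"}
# Hypothetical fixed register assignment from x86-64 to AArch64.
REG_MAP = {"rax": "x0", "rbx": "x1", "rcx": "x2"}

def decompile(x86_lines):
    """Decompile x86-style assembly lines (the 'first code') into a list of IROp."""
    ir = []
    for line in x86_lines:
        mnemonic, operands = line.split(maxsplit=1)
        dst, src = (o.strip() for o in operands.split(","))
        ir.append(IROp(X86_TO_IR[mnemonic], dst, (dst, src)))
    return ir

def compile_ir(ir):
    """Compile the IR into AArch64-style assembly lines for the target platform."""
    return [
        f"{IR_TO_A64[i.op]} {REG_MAP[i.dst]}, {REG_MAP[i.srcs[0]]}, {REG_MAP[i.srcs[1]]}"
        for i in ir
    ]

if __name__ == "__main__":
    first_code = ["add rax, rbx", "imul rcx, rax"]
    print(compile_ir(decompile(first_code)))  # ['add x0, x0, x1', 'mul x2, x2, x0']
```

Keeping the IR free of source-platform register names is one possible design. It lets the same `decompile` output feed a compiler for any target that has a register table.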
- This decompilation and compilation process does not require developer involvement. This isolates developers from the software code and reduces the chance that they access it.
- In addition, the code ported to the first target platform can be optimized and further developed, which makes it easier for the software operator to maintain.
- For example, the source platform may be an x86 platform and the first target platform may be an ARM platform, specifically an ARMv8 platform.
- In general, the source platform can be any platform, and the first target platform can be any platform different from it.
- The method can run locally or in the cloud. Locally, it may run on a local terminal or server. In the cloud, it may be offered to users as a cloud service.
- The first code of the source platform can be ported to any target platform. Take a first and a second target platform as an example. Besides obtaining low-level-language code for the first target platform through the decompilation and compilation above, the first code can also be decompiled into an IR corresponding to the second target platform.
- The IR corresponding to the second target platform differs from the first IR, because the two IRs apply to different target platforms.
- The first target platform and the second target platform have different instruction sets, and the second target platform's instruction set also differs from the source platform's. In other words, the same decompilation and compilation process can port software code from the source platform to any platform.
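Porting one piece of first code to two target platforms, with a target-specific IR for each, can be sketched as follows. The register tables, the tuple IR layout, and the choice of AArch64 and RISC-V (RV64 with the M extension, which provides `mul`) as the two targets are illustrative assumptions. The patent does not name a second target.

```python
# Hypothetical per-target register tables. The same x86 register maps to a
# different register on each target, so the decompiled IRs differ per target.
TARGET_REGS = {
    "aarch64": {"rax": "x0", "rbx": "x1", "rcx": "x2"},
    "riscv64": {"rax": "a0", "rbx": "a1", "rcx": "a2"},  # assumes the RISC-V M extension
}
X86_SEMANTICS = {"add": "add", "sub": "sub", "imul": "mul"}

def decompile_for(target, x86_lines):
    """Decompile into a target-specific IR of (op, dst, src1, src2) tuples."""
    regs = TARGET_REGS[target]
    ir = []
    for line in x86_lines:
        mnemonic, operands = line.split(maxsplit=1)
        dst, src = (regs[o.strip()] for o in operands.split(","))
        ir.append((X86_SEMANTICS[mnemonic], dst, dst, src))
    return ir

def compile_for(ir):
    """Both example targets happen to share the 'op rd, rs1, rs2' assembly form."""
    return [f"{op} {d}, {s1}, {s2}" for op, d, s1, s2 in ir]

first_code = ["add rax, rbx", "imul rcx, rax"]
for target in TARGET_REGS:
    print(target, compile_for(decompile_for(target, first_code)))
```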
- A target platform selection interface may be presented to the user, listing multiple candidate target platforms. In response to the user's selection on this interface, the selected target platform, for example the first target platform, is determined from the candidates.
- Annotation information for the first code may be obtained first. It may include, for example, any one or more of the types of parameters in the first code, the number of parameters, and the types of jump addresses.
- The first code can then be decompiled according to the annotation information to obtain the first IR.
- For example, suppose a parameter in the first code is 64 bits long. With the annotation information, the decompiled parameter stays 64 bits (for example, a double-precision floating-point type) instead of being changed to 32 bits (for example, an integer type). This keeps parameter types consistent before and after decompilation and reduces the chance of logic errors in the first IR.
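Here is a minimal sketch of annotation-driven parameter typing. It assumes a hypothetical `ParamAnnotation` record with a kind and a bit width, and it emits LLVM-IR-style function signatures. The `"i32"` fallback used when no annotation exists stands in for the kind of wrong guess that annotations are meant to prevent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class ParamAnnotation:
    kind: str  # "int" or "float"
    bits: int  # 32 or 64

# LLVM-IR-style type names for each (kind, bits) pair.
IR_TYPES = {("int", 32): "i32", ("int", 64): "i64",
            ("float", 32): "float", ("float", 64): "double"}

def ir_type(annotation: Optional[ParamAnnotation]) -> str:
    if annotation is None:
        # Hypothetical fallback guess without annotation information; this is
        # exactly how a 64-bit double could be mistyped as a 32-bit integer.
        return "i32"
    return IR_TYPES[(annotation.kind, annotation.bits)]

def decompile_signature(name: str, params: List[str],
                        annotations: Dict[str, ParamAnnotation]) -> str:
    """Build a function signature whose parameter types follow the annotations."""
    args = ", ".join(f"{ir_type(annotations.get(p))} %{p}" for p in params)
    return f"define void @{name}({args})"

print(decompile_signature("scale", ["factor", "count"],
                          {"factor": ParamAnnotation("float", 64)}))
# define void @scale(double %factor, i32 %count)
```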
- The initially obtained IR may also be optimized. Specifically, decompiling the first code may first yield a second IR, which may leave room for optimization.
- For example, suppose the second IR reads 64 bits of data at a time while the first target platform can read 128 bits at a time. The data reads in the second IR can then be optimized to read 128 bits at a time.
- More generally, the second IR may be optimized according to the first target platform, for example according to its hardware or logic capabilities, to obtain the first IR. Compiling this first IR later yields more efficient code.
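The 64-bit to 128-bit read optimization can be sketched as a peephole pass over a toy IR of `("load", width_in_bits, byte_offset)` tuples. This is an assumption-laden illustration. It merges two adjacent, contiguous 64-bit loads only when the target can load 128 bits at once (on AArch64, for example, a single `ldr q` register load), and it deliberately ignores alignment, aliasing, and intervening stores, all of which a real pass must check.

```python
from typing import List, Tuple

Load = Tuple[str, int, int]  # ("load", width_in_bits, byte_offset)

def widen_loads(ir: List[Load], max_load_bits: int) -> List[Load]:
    """Merge two contiguous 64-bit loads into one 128-bit load when the target allows it."""
    out: List[Load] = []
    i = 0
    while i < len(ir):
        cur = ir[i]
        nxt = ir[i + 1] if i + 1 < len(ir) else None
        if (max_load_bits >= 128 and nxt is not None
                and cur[0] == nxt[0] == "load"
                and cur[1] == nxt[1] == 64
                and nxt[2] == cur[2] + 8):       # second load starts right after the first
            out.append(("load", 128, cur[2]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out

second_ir = [("load", 64, 0), ("load", 64, 8), ("load", 64, 16)]
print(widen_loads(second_ir, max_load_bits=128))  # [('load', 128, 0), ('load', 64, 16)]
print(widen_loads(second_ir, max_load_bits=64))   # unchanged: target cannot load 128 bits
```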
- Prompt information may also be generated and presented to flag items to be checked. These items are generated based on differences between the first target platform and the source platform.
- Specifically, the prompt information may highlight a source-platform instruction together with the first-target-platform instruction that has the same semantics. The user can then judge whether the translation from the source-platform instruction to the first-target-platform instruction is accurate.
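Generating check items from platform differences could look like the following sketch. The review set is a hypothetical example. It rests on one real difference: x86 is strongly ordered while ARMv8 has a weaker memory model, so fences and atomic read-modify-write instructions are natural candidates for manual review. The `>>> CHECK` marker stands in for on-screen highlighting.

```python
from typing import List, Tuple

# Hypothetical set of source mnemonics whose translation depends on a platform
# difference (here: x86's stronger memory ordering versus ARMv8's weaker model).
NEEDS_REVIEW = {"mfence", "lock", "xchg"}

def check_items(pairs: List[Tuple[str, str]]) -> List[str]:
    """pairs: (source-platform instruction, same-semantics target instruction)."""
    items = []
    for src, dst in pairs:
        if src.split()[0] in NEEDS_REVIEW:
            items.append(f">>> CHECK  {src}  =>  {dst}")  # "highlighted" pair
    return items

translated = [("add rax, rbx", "add x0, x0, x1"), ("mfence", "dmb ish")]
for item in check_items(translated):
    print(item)  # >>> CHECK  mfence  =>  dmb ish
```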
- When the method runs in the cloud, the user may provide the first code to the cloud. For example, the user may send a code processing request containing the first code from a terminal or client. After decompiling and compiling the received first code, the cloud sends the resulting low-level-language code for the first target platform back to the user, so the user obtains the code it needs.
- The first code may be decompiled according to an instruction semantic library corresponding to the source platform.
- The instruction semantic library may contain the semantics of multiple source-platform instructions.
- During decompilation, each instruction in the first code is traversed and its semantics are identified using the instruction semantic library. The first code can then be decompiled into the corresponding compiler IR according to the identified instruction semantics.
- The user can also modify the instruction semantic library. For example, the user can add single instruction, multiple data (SIMD) instructions to the library so that SIMD instructions in the first code are identified. The decompilation process can then handle those SIMD instructions accordingly, so that the first IR contains instructions with vectorized semantics.
- Accordingly, the terminal or server modifies the instruction semantic library in response to the user's modification operation on it.
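Here is a minimal sketch of an instruction semantic library that the user can extend. The class, its semantics labels (such as `"vec_add_i32x4"`), and the tuple IR are hypothetical. The example adds the SSE2 instruction `paddd`, which adds four packed 32-bit integers, so that it is recognized with vectorized semantics.

```python
from typing import Dict, List, Optional, Tuple

class InstructionSemanticLibrary:
    """Hypothetical mnemonic -> semantics table for the source platform."""

    def __init__(self) -> None:
        self._semantics: Dict[str, str] = {
            "add": "int_add", "sub": "int_sub", "imul": "int_mul",
        }

    def add(self, mnemonic: str, semantics: str) -> None:
        """User modification: register (or override) an instruction's semantics."""
        self._semantics[mnemonic] = semantics

    def lookup(self, mnemonic: str) -> Optional[str]:
        return self._semantics.get(mnemonic)

def decompile(lines: List[str],
              lib: InstructionSemanticLibrary) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Traverse every instruction; return (recognized IR ops, unrecognized mnemonics)."""
    ir, unknown = [], []
    for line in lines:
        mnemonic = line.split()[0]
        semantics = lib.lookup(mnemonic)
        if semantics is None:
            unknown.append(mnemonic)
        else:
            ir.append((semantics, line))
    return ir, unknown

lib = InstructionSemanticLibrary()
code = ["add rax, rbx", "paddd xmm0, xmm1"]
print(decompile(code, lib)[1])      # ['paddd']  (SIMD instruction not yet recognized)
lib.add("paddd", "vec_add_i32x4")   # user adds the SIMD instruction's semantics
print(decompile(code, lib)[0][1])   # ('vec_add_i32x4', 'paddd xmm0, xmm1')
```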
- The software code to be ported may contain both low-level-language first code and high-level-language variables. In that case, the terminal or server obtains the high-level-language variables along with the first code. During decompilation, the first code can then be translated into a first IR containing a function whose formal parameters take those variables as actual arguments. That is, the high-level-language variables are passed as arguments to the function's formal parameters.
- Specifically, the semantics of each instruction string in the first code may be determined. The function corresponding to each instruction string's semantics is then determined from a correspondence between semantics and functions, and the first IR containing those functions is generated.
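Here is a hypothetical sketch of how an inline-assembly statement could become a function whose formal parameters receive the high-level variables as actual arguments. The output is rendered as C-like text for readability rather than as real compiler IR. The semantics table, the `long` parameter types, and the rule that operand 0 is an output passed by pointer are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical semantics-to-function table: mnemonic -> body over formal parameters.
# Operand 0 is assumed to be an output (passed by pointer); the rest are inputs.
SEMANTICS: Dict[str, str] = {"add": "*{0} += {1};", "sub": "*{0} -= {1};", "imul": "*{0} *= {1};"}

def lift_asm_block(name: str, asm: str, variables: List[str],
                   semantics: Dict[str, str]) -> Tuple[str, str]:
    """Turn one inline-assembly statement into (function definition, call site).

    asm uses %i placeholders, where %i is bound to variables[i] in the high-level code.
    """
    mnemonic = asm.split()[0]
    formals = [f"p{i}" for i in range(len(variables))]
    params = ", ".join([f"long *{formals[0]}"] + [f"long {p}" for p in formals[1:]])
    body = semantics[mnemonic].format(*formals)
    definition = f"void {name}({params}) {{ {body} }}"
    # The high-level variables become the actual arguments of the call.
    actuals = ", ".join([f"&{variables[0]}"] + variables[1:])
    return definition, f"{name}({actuals});"

definition, call = lift_asm_block("asm_block_0", "add %0, %1", ["sum", "x"], SEMANTICS)
print(definition)  # void asm_block_0(long *p0, long p1) { *p0 += p1; }
print(call)        # asm_block_0(&sum, x);
```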
- The variables in the first code may also be relocated, so that each variable in the first IR obtained by decompilation has its own logical address.
- For example, a preset first logical address may be assigned to a first variable in the first code, and a preset second logical address to a second variable. The first and second logical addresses are different, and both may be abstract logical addresses.
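Variable relocation can be sketched as giving each named variable a distinct abstract address on first use. This sketch makes several assumptions: variables appear as `[name]` memory operands, the set of variable names is known, and `base` and `stride` are hypothetical parameters. Register-indirect operands such as `[rax]` are left untouched.

```python
import re
from typing import Dict, List, Set, Tuple

def relocate(lines: List[str], variables: Set[str],
             base: int = 0x1000, stride: int = 8) -> Tuple[List[str], Dict[str, int]]:
    """Give each named variable its own abstract logical address and rewrite [name] operands."""
    table: Dict[str, int] = {}

    def assign(match) -> str:
        name = match.group(1)
        if name not in variables:      # e.g. register-indirect [rax]: leave unchanged
            return match.group(0)
        if name not in table:          # first use: allocate the next free abstract address
            table[name] = base + stride * len(table)
        return f"[{table[name]:#x}]"

    return [re.sub(r"\[(\w+)\]", assign, line) for line in lines], table

code = ["mov rax, [counter]", "add rax, [step]", "mov [counter], rax", "mov rbx, [rax]"]
relocated, table = relocate(code, {"counter", "step"})
print(relocated)  # ['mov rax, [0x1000]', 'add rax, [0x1008]', 'mov [0x1000], rax', 'mov rbx, [rax]']
```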
- Decompilation may also take into account differences in function calling conventions or SIMD instructions between the first target platform and the source platform.
- When the first code includes SIMD instructions, it may be decompiled by direct vectorization. Specifically, third code in a low-level language of the target platform is generated first to describe the vectorized semantics of the SIMD instructions in the first code; the third code may, for example, contain intrinsic functions. The third code is then decompiled to obtain a first IR that corresponds to the first target platform's SIMD instructions and carries a vectorization flag. In this case, the first IR is associated with the first target platform.
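Generating the "third code" for direct vectorization might look like this sketch. It emits C that uses ARM NEON intrinsics from `arm_neon.h`, each matching the lane semantics of an SSE instruction. The four-entry mapping table is a hypothetical subset, and reusing `xmm` register names as C variables of a NEON vector type is purely illustrative.

```python
# Hypothetical SSE -> ARM NEON intrinsic table with matching lane semantics.
SSE_TO_NEON = {
    "paddd": "vaddq_s32",  # 4 x int32 add
    "psubd": "vsubq_s32",  # 4 x int32 subtract
    "addps": "vaddq_f32",  # 4 x float32 add
    "mulps": "vmulq_f32",  # 4 x float32 multiply
}

def emit_third_code(sse_lines):
    """Emit C source using NEON intrinsics that describes the SIMD semantics.

    xmm register names are reused as C variable names of a NEON vector type.
    """
    out = ["#include <arm_neon.h>"]
    for line in sse_lines:
        mnemonic, operands = line.split(maxsplit=1)
        dst, src = (o.strip() for o in operands.split(","))
        out.append(f"{dst} = {SSE_TO_NEON[mnemonic]}({dst}, {src});")
    return "\n".join(out)

print(emit_third_code(["paddd xmm0, xmm1", "mulps xmm2, xmm3"]))
```

Because each intrinsic already names a target SIMD operation, the IR obtained from this code is tied to the first target platform, as the text describes.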
- Alternatively, when the first code includes SIMD instructions, it may be decompiled by indirect vectorization.
- Specifically, fourth code in a high-level language is generated first to describe the vectorized semantics of the SIMD instructions in the first code. The fourth code is then compiled to obtain a first IR carrying a vectorization flag.
- The flagged IR can be automatically vectorized during compilation to generate low-level-language code for the first target platform, and this code may contain SIMD instructions of the first target platform. This achieves indirect vectorization of the SIMD instructions in the first code.
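For indirect vectorization, the "fourth code" can be a plain high-level loop that states each lane's computation and leaves vectorization to the compiler. This sketch emits such a C loop from a hypothetical per-instruction table of element type, lane count, and operator. An optimizing compiler's loop vectorizer (for example Clang at `-O2` targeting AArch64) may then turn the loop back into target SIMD instructions. The vectorization flag described in the text is not modeled here.

```python
# Hypothetical scalar description of each SSE instruction: (C element type, lanes, operator).
SSE_SCALAR = {
    "paddd": ("int32_t", 4, "+"),
    "psubd": ("int32_t", 4, "-"),
    "addps": ("float", 4, "+"),
    "mulps": ("float", 4, "*"),
}

def emit_fourth_code(line):
    """Emit a plain C loop that states the per-lane semantics of one SSE instruction."""
    mnemonic, operands = line.split(maxsplit=1)
    dst, src = (o.strip() for o in operands.split(","))
    ctype, lanes, op = SSE_SCALAR[mnemonic]
    return (f"/* {ctype} {dst}[{lanes}], {src}[{lanes}] */\n"
            f"for (int i = 0; i < {lanes}; ++i) {dst}[i] = {dst}[i] {op} {src}[i];")

print(emit_fourth_code("addps xmm0, xmm1"))
```

Compared with the direct route, this keeps the generated code target-neutral and relies on the compiler to pick the target's SIMD instructions.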
- According to a second aspect, the embodiments of the present application also provide a code processing method.
- In this method, first code in a low-level language for the source platform is obtained, and second code is then output based on the first code.
- The second code is low-level-language code that applies to the first target platform. It is obtained by processing the first code, for example by decompiling and then compiling it. The source platform and the first target platform have different instruction sets.
- In this way, second code for the first target platform is obtained from the first code of the source platform. The first target platform can recognize and run the second code, so the software code is ported from the source platform to the first target platform.
- As in the first aspect, processing the first code need not involve developers, which isolates them from the software code. The ported code can also be optimized and further developed, which makes it easier for the software operator to maintain.
- For example, the source platform may be an x86 platform and the first target platform an ARM platform, specifically an ARMv8 platform. In general, the source platform can be any platform, and the first target platform any platform different from it.
- When output, the second code may be presented on a code display interface. There the user can view the translated code and carry out secondary development, optimization, and similar work on it.
- The processing of the first code may run in the cloud. In that case, obtaining the first code may mean receiving it from the user. For example, the user may use a terminal or client to send the cloud a code processing request that carries the first code.
- The user may also provide the first code to the cloud in other ways. After the cloud processes the first code into the second code, outputting the second code may mean sending it to the user, for example for display in a code display interface on the user's terminal.
- A target platform selection interface may also be presented, listing multiple candidate target platforms. In response to the user's selection on this interface, the selected target platform, for example the first target platform, is determined from the candidates.
- An instruction semantic library corresponding to the source platform may first be obtained and used to process the first code.
- The library may contain the semantics of multiple source-platform instructions. When the first code is processed, each of its instructions is traversed and its semantics are identified from the library, so the first code can be decompiled into the corresponding compiler IR.
- The user can also modify the library, for example by adding SIMD instructions so that SIMD instructions in the first code can be identified. The library is modified in response to the user's modification operation, and the first code is then decompiled according to the modified library.
- Prompt information may also be generated and presented to flag items to be checked. These items are generated based on differences between the first target platform and the source platform.
- Specifically, the prompt information may highlight a source-platform instruction together with the first-target-platform instruction that has the same semantics. The user can then judge whether the translation is accurate.
- Besides the second code, the first IR obtained by decompiling the first code may also be presented, so that the user can debug, inspect, and analyze it.
- A second IR may also be presented. The second IR is obtained by first decompiling the first code, and it is then optimized to obtain the first IR, which can execute more efficiently.
- For example, suppose the second IR reads 64 bits of data at a time while the first target platform can read 128 bits at a time. The second IR can then be optimized to read 128 bits at a time, so reading the same amount of data no longer takes two read operations.
- The user may also modify the output first IR.
- The terminal or server then obtains the modified first IR and compiles it into a better-optimized third code. The third code is low-level-language code for the first target platform and can be presented to the user.
- When low-level-language code for a second target platform is generated from the first code, the first code may be decompiled into a third IR. The third IR is then compiled into low-level-language code for the second target platform.
- Because the IRs for different target platforms may differ, different IRs and code can be generated for different target platforms.
- The user may also modify the output second code, for example for secondary development and optimization.
- The modified second code is obtained and presented to the user in real time for review.
- When the first code is obtained, high-level-language variables may be obtained along with it. The actual arguments of the function in the first IR then include those high-level-language variables.
- The embodiments of the present application provide a computing apparatus that has functions corresponding to the implementations of the first aspect. These functions can be implemented by hardware, or by hardware executing corresponding software.
- The hardware or software includes one or more modules corresponding to these functions.
- The embodiments of the present application also provide another computing apparatus that has functions corresponding to the implementations of the second aspect. These functions can be implemented by hardware, or by hardware executing corresponding software.
- The hardware or software includes one or more modules corresponding to these functions.
- An embodiment of the present application provides a computing device that includes a processor and a memory. The memory stores instructions. When the computing device runs, the processor executes these instructions so that the device performs the code processing method of the first aspect or any implementation of the first aspect.
- The memory may be integrated in the processor or may be separate from it.
- The device may also include a bus through which the processor connects to the memory.
- The memory may include read-only memory and random access memory.
- An embodiment of the present application provides another computing device that also includes a processor and a memory. The memory stores instructions. When the computing device runs, the processor executes these instructions so that the device performs the code processing method of the second aspect or any implementation of the second aspect.
- As above, the memory may be integrated in the processor or separate from it, the device may include a bus through which the processor connects to the memory, and the memory may include read-only memory and random access memory.
- An embodiment of the present application further provides a readable storage medium that stores a program or instructions. When run on a computer, they cause the code processing method of the first aspect or any implementation of the first aspect to be performed.
- An embodiment of the present application further provides another readable storage medium that stores a program or instructions. When run on a computer, they cause the code processing method of the second aspect or any implementation of the second aspect to be performed.
- Embodiments of the present application further provide a computer program product containing instructions. When run on a computer, the instructions cause the computer to perform the code processing method of the first aspect or any implementation of the first aspect.
- Embodiments of the present application further provide another computer program product containing instructions. When run on a computer, the instructions cause the computer to perform the code processing method of the second aspect or any implementation of the second aspect.
- FIG. 1 is a schematic diagram of an exemplary system architecture in an embodiment of the present application.
- FIG. 2 is a schematic flowchart of a code processing method in an embodiment of the present application.
- FIG. 3 is a schematic diagram of code in an embedded-assembly scenario in an embodiment of the present application.
- FIG. 5 is a schematic diagram of adjusting the parameters of a function in an embodiment of the present application.
- FIG. 6 is a schematic diagram of IR before and after optimization in an embodiment of the present application.
- FIG. 7 is a schematic diagram of a target platform selection interface in an embodiment of the present application.
- FIG. 8 is a schematic diagram of interaction between a user 801 and a computing device 802 in an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a computing device in an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of another computing device in an embodiment of the present application.
- FIG. 11 is a schematic diagram of a hardware structure of a computing device in an embodiment of the present application.
- FIG. 12 is a schematic diagram of a hardware structure of another computing device according to an embodiment of the present application.
- The embodiments of the present application provide a code processing method that ports software from other platforms to a given platform. This enriches the software that can run on that platform and thereby reduces the difficulty of building its software ecosystem.
- Specifically, software code of the source platform (one of the other platforms mentioned above) is decompiled into the compiler's intermediate representation (IR). The IR is then compiled into low-level-language code for the target platform (the given platform mentioned above).
- This low-level-language code can run on the target platform, so the software of the source platform is ported to the target platform.
- The source platform and the target platform are different platforms; at a minimum, they have different instruction sets.
- This decompilation and compilation process does not require developer involvement, which isolates developers from the software code and reduces the chance that they access it.
- The code ported to the target platform can also be optimized and further developed, which makes it easier for the software operator to maintain.
- The code processing method above can be applied to the system architecture shown in FIG. 1.
- The system architecture 100 includes a decompilation module 101 and a compilation module 102.
- After receiving low-level-language code 1 of the source platform, the decompilation module 101 decompiles it into IR and passes the IR to the compilation module 102. The compilation module 102 compiles the IR into low-level-language code 2 for the target platform 104, so that code 2 can run on the target platform 104.
- The decompilation module 101 may be a software-based functional module, or it may be implemented by a device with a decompilation function, such as a decompiler.
- The compilation module 102 may be implemented by a device with a compilation function, such as a compiler.
- The decompilation module 101 and the compilation module 102 may be deployed on the target platform 104, on the terminal 105, on the server 106, or elsewhere.
- The two modules may also be split across the terminal 105 and the server 106. For example, the decompilation module 101 may be deployed on the server 106 and the compilation module 102 on the terminal 105.
- Alternatively, the decompilation module 101 and the compilation module 102 can both be deployed on a server 106 in the cloud.
- In that case, the terminal 105 can send the software code to the cloud (specifically, to the server 106 in the cloud), for example in a code processing request.
- The code processing request contains the software code to be ported. After the cloud (server 106) performs the decompilation and compilation described above, it obtains low-level-language code for the target platform and sends this code to the terminal 105.
- The terminal 105 thus obtains software code that runs on the target platform, which completes the software port.
- The server 106 may also be a local server.
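A server-side handler for the code processing request can be sketched as a JSON-in, JSON-out function. The request fields (`code`, `source_platform`, `target_platform`) are hypothetical, since the text does not specify a wire format. The `decompile` and `compile_` callables stand in for the decompilation module 101 and the compilation module 102.

```python
import json
from typing import Callable

def handle_code_processing_request(body: str,
                                   decompile: Callable[[str, str], str],
                                   compile_: Callable[[str, str], str]) -> str:
    """Decompile the submitted code, compile it for the target, and return JSON.

    decompile(code, source_platform) -> IR   (role of decompilation module 101)
    compile_(ir, target_platform)   -> code (role of compilation module 102)
    """
    request = json.loads(body)
    ir = decompile(request["code"], request["source_platform"])
    target_code = compile_(ir, request["target_platform"])
    return json.dumps({"target_platform": request["target_platform"], "code": target_code})

# Usage with stub modules standing in for a real decompiler and compiler.
request_body = json.dumps({"code": "add rax, rbx", "source_platform": "x86",
                           "target_platform": "armv8"})
print(handle_code_processing_request(request_body,
                                     lambda code, src: f"IR({code})",
                                     lambda ir, tgt: f"{tgt}:{ir}"))
# {"target_platform": "armv8", "code": "armv8:IR(add rax, rbx)"}
```

Passing the two modules in as parameters mirrors the flexible deployment described above: the same handler works whether both modules run on the server or are swapped for other implementations.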
- the method can be applied to the above-mentioned target platform 104, or can be applied to a computing device, and the computing device can be, for example, the above-mentioned terminal 105 or a server 106, and the computing device may include a decompilation module 101 and a compilation module 102, and the method may specifically include:
- S201 Obtain the first code based on the low-level language applied to the source platform.
- the program codes of the software are usually compiled into codes in a low-level language that can be directly recognized by the source platform.
- the low-level language refers to a program language or instruction code that can be directly recognized by a machine, and may specifically be an assembly language or a machine language.
- Machine language is a language represented by binary code. Machine language is the only language a computer can recognize and execute.
- Assembly language uses easy-to-understand and easy-to-remember names and symbols to represent the opcodes of machine instructions, overcoming the shortcoming that machine language is difficult to understand and remember. Because assembly language replaces the binary code of machine language with symbols, it is essentially a symbolic language.
- In contrast to low-level languages, there are also high-level languages, which are machine-independent and procedure-oriented or object-oriented.
- High-level programming languages are usually close to natural language and can use mathematical expressions, so they have stronger expressive power, can easily express data operations and program control structures, and can better describe various algorithms; examples include C, C++, and Java.
- high-level languages can be applied to different platforms, such as a platform based on the x86 instruction set architecture (a general-purpose computing platform developed by Intel Corporation, hereinafter the x86 platform), a platform based on the advanced RISC machine (ARM) architecture (hereinafter the ARM platform), or a platform based on a reduced instruction set architecture such as PowerPC (Performance Optimization With Enhanced RISC – Performance Computing).
- a developer can compile a high-level language code file with a compiler so that it can be recognized and executed by a machine.
- the acquired first code may be a code based on a low-level language, and the code can be applied to the source platform.
- the acquired first code may be a file in ".obj" format obtained by assembling assembly language code corresponding to the source platform, or code in a file of another format.
- the variables based on the high-level language may be acquired at the same time.
- for example, in embedded assembly code, the first code obtained is assembly language code such as "movdqa %%xmm4, 16(%0)", and the high-level language variables are "&ff_inverse[0]" and "&ff_inverse[8]" in the operands "r"(&ff_inverse[0]) and "r"(&ff_inverse[8]).
- embedded assembly is usually a coding method used to improve code execution efficiency and to execute processor-specific instructions. Then, after assembling the embedded assembly code, the first code such as the ".obj" format file can be obtained.
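As a rough illustration of acquiring the high-level language variables together with the assembly, the Python sketch below pulls the C expressions out of a GCC-style extended-asm statement. It assumes the `asm("template" : outputs : inputs : clobbers)` layout with each operand written as `"constraint"(expression)`; the function names are hypothetical and this is not the patent's own parser.

```python
import re

# One operand: a constraint string literal followed by "(expression)".
# The expression may contain one level of nested parentheses, e.g. f(x).
_OPERAND = re.compile(r'"[^"]*"\s*\(([^()]*(?:\([^()]*\)[^()]*)*)\)')


def _operand_section(asm_stmt: str) -> str:
    """Return the text after the first ':' that is not inside a string literal."""
    in_string = False
    for i, ch in enumerate(asm_stmt):
        if ch == '"' and (i == 0 or asm_stmt[i - 1] != "\\"):
            in_string = not in_string
        elif ch == ":" and not in_string:
            return asm_stmt[i + 1:]
    return ""  # basic asm: no operands at all


def extract_asm_operands(asm_stmt: str) -> list[str]:
    """Return the high-level (C) expressions bound to extended-asm operands, in order."""
    section = _operand_section(asm_stmt)
    return [m.group(1).strip() for m in _OPERAND.finditer(section)]


stmt = 'asm volatile("movdqa %%xmm4, 16(%0)" :: "r"(&ff_inverse[0]), "r"(&ff_inverse[8]) : "memory");'
print(extract_asm_operands(stmt))  # ['&ff_inverse[0]', '&ff_inverse[8]']
```

Only the text after the first top-level colon is scanned, so the template is never mistaken for an operand, and clobber strings such as "memory", which have no parenthesised expression, are skipped.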
- the first code in this embodiment may be code in ".obj" format obtained by assembling the entire assembly language program code of the software on the source platform, or obtained by assembling only part of that program code, such as one of the software's multiple code files; that is, in this embodiment, the software code can also be decompiled partially.
- the source platform to which the first code is applied and the target platform to which the first code is transplanted are different, and may specifically have different instruction sets.
- the source platform and the target platform can be different types of platforms; for example, the source platform can be an x86 platform and the target platform can be an ARMv8 platform (a processor architecture released by ARM that supports a 64-bit instruction set). Alternatively, the source platform and the target platform can be two different platforms of the same type: for example, the source platform can be a Pentium II under the x86 platform and the target platform a Pentium III under the x86 platform (which introduced the new SSE instruction set), or the source platform can be a 32-bit x86 platform and the target platform an x86-64 platform that supports 64-bit computing.
- in the process of porting the first code applied to the source platform so that it runs on the target platform, the first code may be decompiled by the decompilation module 101: lexical analysis, syntax analysis, and semantic analysis are performed on the input first code, which is then converted into a compiler IR; this compiler IR is the decompilation result corresponding to the first code.
- the compiler IR can characterize the semantic and syntactic structure of the first code, which can be considered as another high-level language.
- the decompilation module 101 may decompile the first code into a compiler IR.
- the variable based on the high-level language may also be acquired while acquiring the first code based on the low-level language.
- when the decompilation module 101 decompiles the first code, parameterized translation is performed on the high-level language variables in the first code, so as to avoid losing the high-level language variable information it contains.
- the decompilation module 101 can translate the instruction string into a first IR of the compiler that includes a function chosen according to the semantics of the instruction string, such that the semantics expressed by the function match the semantics of the instruction string. For example, when the instruction string is "ADD %x %y", the function included in the compiler IR obtained by translating it may be a summation function.
- the decompilation module 101 may obtain the correspondence between instruction semantics and functions in advance, for example by storing it in the decompilation module 101 beforehand. The decompilation module 101 can then determine the semantics of the low-level language-based instruction string in the mixed code block; the instruction string can include one or more instructions, and the semantics of each instruction is determined. Next, for each instruction in the instruction string, the decompilation module 101 looks up the correspondence between semantics and functions to find the function matching that instruction's semantics, and then generates a first IR containing the determined functions.
- the decompilation module 101 can determine the semantics of each instruction in the instruction string by searching the instruction semantic library corresponding to the source platform (for example, a technician semantically annotates the instructions of the source platform in advance and imports the instructions and their semantics into the decompilation module 101).
- the instruction semantic library includes multiple instructions of the source platform, each of which has been semantically annotated in advance.
- for example, if instruction A in the first code corresponds to instruction a in the instruction semantic library, the semantics of instruction a in the library is taken as the semantics of instruction A.
- the semantics of each instruction may also be determined in other manners, which is not limited in this embodiment.
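The lookup-based translation described above can be sketched in Python as follows. The library contents, the IR function names (`sum`, `difference`, `copy`), and the `IRCall` type are hypothetical placeholders; a real instruction semantic library would hold one annotated entry per source-platform instruction.

```python
from dataclasses import dataclass

# Hypothetical instruction semantic library for the source platform:
# each (pre-annotated) mnemonic maps to an IR function with matching semantics.
SEMANTIC_LIBRARY = {
    "ADD": "sum",
    "SUB": "difference",
    "MOV": "copy",
}


@dataclass
class IRCall:
    function: str        # IR function whose semantics match the instruction
    operands: list[str]  # operands, later bound to the function's formal parameters


def translate(instruction_string: str) -> list[IRCall]:
    """Translate a ';'-separated instruction string into IR function calls."""
    ir = []
    for instr in instruction_string.split(";"):
        tokens = instr.replace(",", " ").split()
        if not tokens:
            continue  # skip empty statements
        mnemonic, operands = tokens[0].upper(), tokens[1:]
        if mnemonic not in SEMANTIC_LIBRARY:
            raise ValueError(f"no semantics recorded for {mnemonic!r}")
        ir.append(IRCall(SEMANTIC_LIBRARY[mnemonic], operands))
    return ir


print(translate("ADD %x %y"))  # [IRCall(function='sum', operands=['%x', '%y'])]
```

Failing loudly on an unannotated mnemonic mirrors the idea that every source instruction must be covered by the semantic library before translation can proceed.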
- the decompilation module 101 references, in the function determined during translation, the virtual register created above, and uses that virtual register as a formal parameter of the function.
- the decompilation module 101 establishes the correspondence between the variables in the high-level language and the formal parameters in the function, so as to pass the variables based on the high-level language in the mixed code as actual parameters to the formal parameters in the semantic function.
- the decompilation module 101 can collect the high-level language variables in the mixed code block to obtain a list of multiple high-level language variables, and then, according to the positions of the formal parameters in the IR and the calling convention of the compiler, map the variables in the list to the formal parameters of the function. For example, the first variable in the list can be mapped to the first parameter of the first function in the IR, the second variable to the second parameter of that function, and so on.
- variable information of the high-level language can be retained in the function as actual parameters without information loss, so that in the embedded assembly scenario, the code information before and after the decompilation process is consistent.
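A minimal sketch of the positional mapping, assuming the formal parameters are virtual registers named `%v0`, `%v1`, ... and that position alone decides the binding (a real compiler calling convention may add further constraints). The names `bind_actual_parameters`, `render_call`, and `store_xmm4` are hypothetical.

```python
def bind_actual_parameters(variables: list[str], formal_parameters: list[str]) -> dict[str, str]:
    """Map the i-th collected high-level variable to the i-th formal parameter."""
    if len(variables) != len(formal_parameters):
        raise ValueError(
            f"{len(variables)} variables but {len(formal_parameters)} formal parameters"
        )
    return dict(zip(formal_parameters, variables))


def render_call(function: str, formal_parameters: list[str], binding: dict[str, str]) -> str:
    """Print an IR call with each formal parameter replaced by its actual argument."""
    return f"{function}({', '.join(binding[p] for p in formal_parameters)})"


formals = ["%v0", "%v1"]  # virtual registers used as formal parameters (hypothetical names)
binding = bind_actual_parameters(["&ff_inverse[0]", "&ff_inverse[8]"], formals)
print(render_call("store_xmm4", formals, binding))  # store_xmm4(&ff_inverse[0], &ff_inverse[8])
```

Because the C expressions survive as actual arguments, the code information before and after decompilation stays consistent, which is the property the paragraph above describes.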
- the decompilation module 101 may first distinguish whether the code to be decompiled only includes the first code based on the low-level language, or includes both the first code and the high-level language variable. In a possible implementation manner, the decompilation module 101 may first detect the compilation command of the code to be decompiled and the file type of the code, so as to determine whether the first code is a code based on a high-level language or a code based on a low-level language .
- the decompilation module 101 may further determine, according to the grammar of the code, whether the code consists entirely of low-level language first code, or is a mixed code block that includes both the low-level language first code and high-level language variables, such as embedded assembly code composed of C/C++ and assembly language (a low-level language).
- when decompiling the first code, the decompilation module 101 can perform the above parameterized translation on the high-level language variables in the first code.
- the first code usually includes at least one function call.
- the called function is essentially a variable. For example, a function y is itself a variable, and the variables a, b, and c are also associated with it.
- that is, the variables associated with a function include the function itself and the variables associated with the function body; for the function y these are y, a, b, and c. In the first code, the addresses of the variables associated with a function are relative addresses, that is, the address of each variable is not yet determined.
- the instructions in the first code are translated into the compiler's instructions based on the instruction semantic library corresponding to the source platform.
- the decompilation module 101 may further relocate the variables in the IR (different from the variables of the high-level language), so as to determine the absolute addresses of the variables in the first code.
- for example, a preset first logical address may be configured for the first variable in the first code, and a preset second logical address for the second variable in the first code; the first logical address and the second logical address may be different.
- the decompilation module can access the relocation table, which stores logical address information of multiple variables (including the first variable and the second variable), determine from that information the logical addresses corresponding to the first variable and the second variable (that is, the absolute addresses of the variables), and associate each logical address with the symbol of its variable.
- the address information configured for the first variable and the second variable may be false logical address information.
- the decompilation module 101 can use the instruction semantic library corresponding to the source platform to decompile the first code whose variable addresses have been configured, obtaining the first IR of the compiler. In the subsequent compilation phase, the logical addresses of the variables are recompiled into relocation information of the target platform; that is, during compilation, each variable in the first IR of the compiler points to a specific logical address.
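The address-configuration step can be sketched as below, assuming the relocation table is a list of `(offset, symbol)` pairs and that the preset (fake) logical addresses start at an arbitrary base of `0x1000` with 16-byte spacing. Both the layout and the helper names are illustrative assumptions, not the patent's data structures.

```python
from itertools import count


def configure_logical_addresses(symbols, base=0x1000, stride=0x10):
    """Give every distinct variable symbol its own preset (fake) logical address."""
    next_address = count(base, stride)
    # dict.fromkeys de-duplicates while keeping first-seen order.
    return {sym: next(next_address) for sym in dict.fromkeys(symbols)}


def resolve_relocations(relocation_table, symbol_addresses):
    """Attach the configured logical address to each (offset, symbol) relocation entry."""
    return [(offset, sym, symbol_addresses[sym]) for offset, sym in relocation_table]


addresses = configure_logical_addresses(["y", "a", "b", "a", "c"])
print({s: hex(a) for s, a in addresses.items()})
# {'y': '0x1000', 'a': '0x1010', 'b': '0x1020', 'c': '0x1030'}
```

The actual values do not matter; what matters is that each symbol gets a distinct, fixed address that the compilation phase can later re-emit as target-platform relocation information.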
- the decompilation module 101 may first determine the difference between the source platform and the target platform, and decompile the first code according to the difference.
- the difference between the source platform and the target platform may be determined in advance by technical personnel comparing the function calling conventions or SIMD instructions of the two platforms and then imported into the decompilation module 101; this embodiment does not limit how the difference is determined.
- the decompilation module 101 may also decompile the first code directly according to the function calling convention of the target platform or the SIMD instruction, etc., which is not limited in this embodiment. In this way, after the above differential processing or decompilation directly based on the information of the target platform, the obtained function calls or SIMD instructions in the IR can conform to the function calling conventions and SIMD instructions of the target platform.
- the following takes as an example adjusting the parameters in the first code according to the difference in function calling conventions between the source platform and the target platform.
- the parameters of the function are stored in registers or on the stack in memory. Parameters are stored differently in different platforms.
- the decompilation module 101 can adjust the register information or stack information of the functions in the source code block according to the difference of the function calling rules of the source platform and the target platform. For example, the decompilation module 101 can adjust the parameters stored in the registers and store them in the stack; or adjust the parameters stored in the stack and store them in the registers.
- the decompilation module 101 may first use a decoding tool, such as Intel XED, to decode the source code block and obtain the instruction control flow. It can then run a data flow analysis algorithm over the instruction control flow to analyze live registers and the stack, thereby obtaining the types and number of parameters of the functions in the source code block.
- the parameter type is mainly used to indicate that the parameter is stored in the register or in the stack.
- this embodiment defines the following data sets:
- Variables: the registers corresponding to variables.
- In[n] and Out[n] represent the set of registers corresponding to input and output, respectively, and Def[n] and Use[n] represent the set of registers that are defined and used, respectively.
- the decompilation module 101 can traverse the blocks in the source code block, and construct the use set and the def set of each block.
- the specific construction process is as follows:
- the decompilation module 101 can establish a data flow analysis equation according to the above set, as shown below:
- n represents a block
- the symbol indicates that the set on the right side of the symbol is a subset of the set on the left side of the symbol.
- the decompilation module 101 can solve the above equation through the fixed point algorithm, as follows:
- the intersection of the In set of the function entry with the input parameter registers specified by the calling convention gives the input parameter registers; the intersection of the Out set of the function exit with the output parameter registers specified by the calling convention gives the possible return value registers.
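Because the equations themselves are not reproduced here, the sketch below assumes the standard textbook backward-liveness form, Out[n] = ∪ In[s] over successors s and In[n] = Use[n] ∪ (Out[n] − Def[n]), solved by fixed-point iteration. It then intersects the entry block's In set with the x86-64 System V integer argument registers. The three-block function and the modelling of `ret` as a use of `rax` are illustrative assumptions.

```python
def liveness(blocks, successors):
    """Backward live-register analysis solved by fixed-point iteration.

    blocks:     {block: (use_set, def_set)}
    successors: {block: [successor blocks]}
    Assumed equations:
        Out[n] = union of In[s] over s in succ[n]
        In[n]  = Use[n] | (Out[n] - Def[n])
    """
    live_in = {n: set() for n in blocks}
    live_out = {n: set() for n in blocks}
    changed = True
    while changed:  # repeat until no set changes: the fixed point
        changed = False
        for n, (use, defs) in blocks.items():
            out = set().union(*(live_in[s] for s in successors.get(n, ())))
            new_in = use | (out - defs)
            if out != live_out[n] or new_in != live_in[n]:
                live_out[n], live_in[n] = out, new_in
                changed = True
    return live_in, live_out


# Integer argument registers of the x86-64 System V calling convention.
SYSV_INT_ARGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"}


def input_parameter_registers(live_in, entry, calling_convention_args):
    """Registers live at function entry that the calling convention uses for arguments."""
    return live_in[entry] & calling_convention_args


# B0: mov rax, rdi  |  B1: add rax, rsi  |  B2: ret (modelled as a use of rax)
blocks = {"B0": ({"rdi"}, {"rax"}), "B1": ({"rax", "rsi"}, {"rax"}), "B2": ({"rax"}, set())}
succ = {"B0": ["B1"], "B1": ["B2"]}
live_in, live_out = liveness(blocks, succ)
print(sorted(input_parameter_registers(live_in, "B0", SYSV_INT_ARGS)))  # ['rdi', 'rsi']
```

The result shows why liveness is useful here: `rdx`, `rcx`, `r8`, and `r9` are never read before being written, so they are not treated as parameters and need no cross-platform conversion.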
- when the decompilation module 101 performs stack analysis, it can analyze the instruction control flow using an algorithm based on the extended stack pointer register (rex-extended stack pointer, RSP) or an algorithm based on the extended base pointer register (rex-extended base pointer, RBP).
- the process that the decompilation module 101 uses the RSP-based algorithm to analyze may specifically include the following steps:
- the decompilation module 101 may determine the offset through the sub instruction or the push instruction. The decompilation module 101 also records the registers associated with the RSP.
- if the other register operand of the same instruction is an integer register (RXX) and the instruction is not a floating-point-to-integer type conversion instruction, the stack parameter is judged to be of integer type; if the other register operand of the same instruction is a floating-point register (XMM) and the instruction is not an integer-to-floating-point type conversion instruction, the stack parameter is judged to be of floating-point type.
- the process that the decompilation module 101 uses the RBP-based algorithm to analyze may specifically include the following steps:
- if the other register operand of the same instruction is an integer register (RXX) and the instruction is not a floating-point-to-integer type conversion instruction, the stack parameter is judged to be of integer type; if the other register operand of the same instruction is a floating-point register (XMM) and the instruction is not an integer-to-floating-point type conversion instruction, the stack parameter is judged to be of floating-point type.
- the decompilation module 101 may also run both algorithms at the same time and take the larger of the two totals S of parameters they determine.
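The two typing rules and the "take the larger count" combination might look like the Python sketch below. It assumes x86-64 with 8-byte stack slots, a first stack parameter at `[rsp + frame_bytes + 8]` (just past the return address) or at `[rbp + 16]` after a standard `push rbp; mov rbp, rsp` prologue, and it only counts stack-passed parameters. The conversion mnemonic sets are real SSE instructions, but which ones the patent checks is an assumption.

```python
# 64-bit general-purpose (integer) registers of x86-64.
GPRS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"} | {f"r{i}" for i in range(8, 16)}
FLOAT_TO_INT = {"cvttsd2si", "cvtsd2si", "cvttss2si", "cvtss2si"}
INT_TO_FLOAT = {"cvtsi2sd", "cvtsi2ss"}


def classify_stack_parameter(mnemonic, other_register):
    """Type a stack parameter from the other register operand of the accessing instruction."""
    m, reg = mnemonic.lower(), other_register.lower()
    if reg in GPRS and m not in FLOAT_TO_INT:
        return "int"
    if reg.startswith("xmm") and m not in INT_TO_FLOAT:
        return "float"
    return None  # not decided by these two rules


def _slots(displacements, first_param_offset):
    """Count 8-byte stack parameter slots up to the highest one accessed."""
    slots = {(d - first_param_offset) // 8 for d in displacements if d >= first_param_offset}
    return max(slots) + 1 if slots else 0


def rsp_stack_params(rsp_displacements, frame_bytes):
    """frame_bytes: bytes reserved by sub/push; the +8 skips the return address."""
    return _slots(rsp_displacements, frame_bytes + 8)


def rbp_stack_params(rbp_displacements):
    """After push rbp; mov rbp, rsp the first stack parameter is at [rbp+16]."""
    return _slots(rbp_displacements, 16)


def stack_param_count(rsp_estimate, rbp_estimate):
    """Combine both analyses by taking the larger count."""
    return max(rsp_estimate, rbp_estimate)


# e.g. sub rsp, 0x20 followed by reads of [rsp+0x28], [rsp+0x38] and a local at [rsp+0x10]
print(rsp_stack_params([0x28, 0x38, 0x10], frame_bytes=0x20))  # 3
```

Counting up to the highest accessed slot, rather than only the slots actually read, reflects that a parameter can be passed but unused while later ones are still read.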
- the decompilation module 101 can adjust the storage location of the parameters according to the difference in function calling rules. Specifically, the decompilation module 101 performs cross-platform processing on the input parameter register and the stack according to the difference in the function calling rules, such as pushing several parameters in the register into the stack, switching the stack pointer, etc.
- for example, the function test includes a total of 10 parameters, i0 to i9.
- on the source platform, parameters i0 to i5 are stored in registers and i6 to i9 are stored on the stack.
- on the target platform, parameters i0 to i7 are stored in registers and i8 to i9 are stored on the stack.
- the decompilation module 101 can push the parameters i6 and i7 onto the stack and switch the stack pointer, so that at runtime the input parameter registers and the stack space are seen consistently on the different platforms.
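The register counts in this example (6 and 8) match the integer argument registers of the x86-64 System V ABI (`rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`) and of AArch64 (`x0` to `x7`). The hedged sketch below computes which parameters need moving, assuming every parameter is an integer occupying one register or one stack slot; the function names are hypothetical.

```python
def split_parameters(params, register_slots):
    """Return (in_registers, on_stack) for a convention with register_slots arg registers."""
    return params[:register_slots], params[register_slots:]


def calling_convention_adjustment(params, source_slots, target_slots):
    """Which parameters must move so target-side code sees the source-side layout."""
    lo, hi = sorted((source_slots, target_slots))
    moved = params[lo:hi]
    if target_slots > source_slots:
        # Target passes these in registers, source code expects them on the stack.
        return {"push_to_stack": moved, "load_from_stack": []}
    # Target passes these on the stack, source code expects them in registers.
    return {"push_to_stack": [], "load_from_stack": moved}


params = [f"i{k}" for k in range(10)]                 # function test(i0, ..., i9)
print(split_parameters(params, 6)[1])                 # ['i6', 'i7', 'i8', 'i9']
print(calling_convention_adjustment(params, 6, 8))
# {'push_to_stack': ['i6', 'i7'], 'load_from_stack': []}
```

Only the parameters in the band between the two register counts are affected; i0 to i5 and i8 to i9 already sit in the same kind of location on both platforms.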
- by using the compiler's live-register analysis and stack analysis, the method obtains the exact input parameter registers and stack input parameters, and reduces unnecessary register conversions between function calling conventions.
- before decompiling the first code, the decompilation module 101 can also obtain annotation information corresponding to the first code. For example, the annotation information may include any one or more of the types and number of parameters in the first code and the types of jump addresses (internal or external jumps in assembly code, etc.), so that when decompiling the first code, the decompilation module 101 can determine the types and number of parameters and the jump address types in the compiler IR according to the annotation information.
- the annotation information in the first code may be generated in the process of compiling the code in the assembly language, and is used to carry the relevant information of the assembly language.
- taking the parameter type in the annotation information as an example, assuming the length of a parameter in the first code is 64 bits, then in the first IR obtained by decompiling the first code, the length of that parameter remains 64 bits (such as a floating-point type) rather than being changed to 32 bits (such as an integer type). Decompiling the first code according to the annotation information therefore keeps parameter types consistent before and after decompilation and reduces the possibility of logical errors in the first IR.
- in some cases, the decompilation module 101 may not need to consider the annotation information when decompiling the first code. For example, when the two platforms are highly similar and differ little, the decompilation module 101 does not use the annotation information of the first code and decompiles the first code directly.
- the instruction semantic library used to translate instruction strings may also include vectorized instruction semantics, such as SIMD instruction semantics (a SIMD instruction performs the same operation on each item in a group of data simultaneously, achieving spatial parallelism).
- the vectorized instruction semantics can be used to perform vectorized translation on part of the instructions in the first code, so as to obtain the vectorized IR corresponding to the instruction.
- for example, the first code includes a SIMD instruction (for summing multiple data). If the decompilation module 101 does not vectorize the SIMD instruction, then in the first IR obtained after decompilation the corresponding code execution process is a serial operation that reads data one by one from a set of data and sums them. After the SIMD instruction is vectorized, in the first IR obtained after decompilation the corresponding code execution process reads all the data from the set at once and sums all the data of the set in parallel.
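To make the serial versus vectorized contrast concrete, the toy sketch below uses x86 `PADDD` (a packed, lane-wise 32-bit add) as a stand-in SIMD instruction. The `IRInstr` type, the `vadd`/`add` op names, and the boolean standing in for the vectorization flag are all hypothetical, and 32-bit wrap-around is ignored.

```python
from dataclasses import dataclass


@dataclass
class IRInstr:
    op: str
    args: tuple
    vectorized: bool = False  # stands in for the "vectorization flag"


def translate_paddd(dst, src, lanes=4, vectorize=True):
    """Translate a packed 32-bit add into IR, with or without vectorization."""
    if vectorize:
        return [IRInstr("vadd", (dst, src), vectorized=True)]
    # Serial form: one scalar add per lane.
    return [IRInstr("add", ((dst, i), (src, i))) for i in range(lanes)]


def execute(ir, regs):
    """Tiny interpreter for the two IR forms (32-bit wrap-around ignored)."""
    regs = {name: list(vals) for name, vals in regs.items()}
    for ins in ir:
        if ins.op == "vadd":  # all lanes in one operation
            dst, src = ins.args
            regs[dst] = [a + b for a, b in zip(regs[dst], regs[src])]
        elif ins.op == "add":  # a single lane
            (dst, lane), (src, _) = ins.args
            regs[dst][lane] += regs[src][lane]
        else:
            raise ValueError(f"unknown IR op {ins.op!r}")
    return regs


regs = {"xmm0": [1, 2, 3, 4], "xmm1": [10, 20, 30, 40]}
vector_ir = translate_paddd("xmm0", "xmm1")
serial_ir = translate_paddd("xmm0", "xmm1", vectorize=False)
print(len(vector_ir), len(serial_ir))                       # 1 4
print(execute(vector_ir, regs) == execute(serial_ir, regs))  # True
```

Both forms compute the same values; the difference is that the flagged single IR operation lets the back end emit one target SIMD instruction instead of four scalar ones.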
- when the decompilation module 101 translates the first code, other instructions in the first code can be translated into IR containing the corresponding functions according to the semantics of each instruction, while the SIMD instructions in the first code can be vectorized and translated to obtain a first IR with a vectorization flag.
- the vectorization flag may be, for example, a special symbol in the first IR, such as "!".
- the technician can add the semantics of the SIMD instructions of the source platform to the imported instruction semantic library in advance, so that during decompilation the decompilation module 101 can identify the semantics of the SIMD instructions in the first code according to the instruction semantic library.
- the decompilation module 101 can directly or indirectly translate the SIMD instruction in the first code into an IR with a vectorization flag.
- the decompilation module 101 may generate a third code based on a low-level language that is applied to the target platform.
- the third code may be a low-level language code that includes intrinsic functions corresponding to the target platform.
- the intrinsic functions can package language extensions or platform-related capabilities and are defined in high-level language header files such as C/C++ headers, so that the generated third code related to the target platform can describe the vectorized semantics of the SIMD instructions of the source platform.
- the decompilation module 101 may decompile the third code to obtain the first IR corresponding to the SIMD instruction of the target platform with the vectorization flag. In this way, after the subsequent compiler compiles the first IR, the SIMD instruction of the target platform can be obtained.
- the decompilation module 101 may generate a fourth code based on a high-level language independent of the target platform, and the fourth code may be used to describe the vectorization semantics of SIMD instructions. Then, the decompilation module 101 may decompile the fourth code based on the high-level language to obtain a platform-independent first IR with a vectorization flag. In this way, subsequent compilers can automatically vectorize the first IR to generate SIMD instructions of the target platform.
- S203 Compile the first IR obtained by decompilation into a second code based on a low-level language applied to a target platform, and the source platform and the target platform have different instruction sets.
- the obtained first IR can be compiled by the compiling module 102 to obtain a second code that can run on the target platform.
- the second code is low-level language code supported by the target platform, such as assembly code corresponding to the target platform.
- the IR obtained by decompiling the first code may have a certain room for optimization.
- the decompilation module 101 can decompile the first code based on the minimum capabilities shared by multiple platforms, which allows it to then optimize the obtained IR according to the higher capabilities of the target platform, so that the target platform can use those capabilities when executing the code.
- the capabilities of the platform may include a data read speed, a data access method, and the like supported by the platform.
- for example, based on the target platform's ability to read 128 bits at a time, the decompilation module 101 can optimize the code shown in the upper part of FIG. 6 into the code shown in the lower part of FIG. 6, so that when the target platform reads data based on that code, it reads 128 bits of data each time.
- the decompilation module 101 may decompile the first code to generate a second IR, and then optimize the second IR according to the first target platform to obtain the first IR. For example, after obtaining the second IR, the decompilation module 101 can determine the semantics of each instruction string (statement) in the second IR and, according to the correspondence between semantics and compilation optimization rules, determine the optimization rule corresponding to the semantics of each instruction string. It then optimizes the second IR based on the determined rules to obtain the first IR, so that the compilation module 102 compiles the optimized first IR.
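A minimal sketch of rule-driven optimization, assuming IR instructions are `(op, base, offset)` tuples and that the only rule, keyed by a hypothetical "memory_read" semantic, fuses runs of consecutive 32-bit loads into one load as wide as the target can read (128 bits in the FIG. 6 example). The tables and names are illustrative, not the patent's rule set, and alignment requirements are ignored.

```python
def merge_wide_loads(ir, read_bits):
    """Fuse runs of consecutive 32-bit loads from one base into a single wider load."""
    per_load = read_bits // 32
    out, i = [], 0
    while i < len(ir):
        run = ir[i:i + per_load]
        if (
            per_load > 1
            and len(run) == per_load
            and all(op == "load32" and base == run[0][1] and off == run[0][2] + 4 * k
                    for k, (op, base, off) in enumerate(run))
        ):
            out.append((f"load{read_bits}", run[0][1], run[0][2]))
            i += per_load
        else:
            out.append(ir[i])
            i += 1
    return out


# Hypothetical correspondence tables: IR op -> semantics -> optimization rules.
SEMANTICS = {"load32": "memory_read"}
OPTIMIZATION_RULES = {"memory_read": [merge_wide_loads]}


def optimize(second_ir, target_read_bits):
    """Apply the rules matching the semantics found in the second IR to get the first IR."""
    found = {SEMANTICS[op] for op, *_ in second_ir if op in SEMANTICS}
    first_ir = second_ir
    for semantic in sorted(found):  # deterministic rule order
        for rule in OPTIMIZATION_RULES.get(semantic, []):
            first_ir = rule(first_ir, target_read_bits)
    return first_ir


second_ir = [("load32", "a", 0), ("load32", "a", 4), ("load32", "a", 8),
             ("load32", "a", 12), ("load32", "b", 0)]
print(optimize(second_ir, 128))  # [('load128', 'a', 0), ('load32', 'b', 0)]
```

Passing the target's read width as a parameter keeps the second IR platform-neutral; the same IR optimized for a 64-bit reader would yield two `load64` operations instead.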
- after the compilation module 102 compiles the first IR, a binary file in ".obj" or another format can be obtained, from which assembly language code applied to the target platform can be generated.
- the assembly language code obtained through the above process can be adapted to the target platform, so that it can be successfully run on the target platform, and the code of the source platform can be transplanted to the target platform.
- the target platform can be any platform different from the source platform, that is, the decompilation module 101 and the compilation module 102 can transform the first code into a low-level language-based code suitable for any platform.
- the above target platform may be referred to below as the first target platform. Based on the above process, the decompilation module 101 and the compilation module 102 can not only transform the first code of the source platform into low-level language code suitable for the first target platform, but can also, through a similar process, transform it into low-level language code suitable for a second target platform, where the second target platform, the first target platform, and the source platform differ from one another, specifically in having different instruction sets.
- a target platform selection interface may be presented to the user; for example, the target platform selection interface shown in FIG. 7 may be presented on the display screen of the terminal. The interface may provide a plurality of different candidate target platforms, such as target platform 1, target platform 2, ..., target platform N (N is a positive integer greater than 1) shown in FIG. 7.
- the user can perform a selection operation on the presented candidate target platforms according to actual needs, thereby determining the above first target platform or second target platform from the plurality of target platforms, so that the first code can be transformed, through the similar process described above, into low-level language-based code suitable for the first target platform or the second target platform. For example, the user may click the drop-down menu button with the mouse to present multiple target platforms, move the cursor to the desired target platform, and select it by clicking, thereby determining the first target platform.
- relevant information about the target platform can also be presented on the target platform selection interface. For example, as shown in FIG. 7, for the target platform selected by the user, its data processing capabilities, applicable hardware types, required hardware environment, and the like can also be presented.
- corresponding prompt information can also be generated for the code, indicating the differences between the source platform and the first target platform, and the prompt information can then be presented on the interface where the user selects the target platform. For example, the user may be prompted with the correspondence between the instructions in the first code and the instructions in the code for the first target platform, such as by highlighting the corresponding instructions of the two platforms in a specific color.
- the prompt information may also be presented on other interfaces, and is not limited to the above-mentioned target platform selection interface. This embodiment does not limit how to present the prompt information and the specific implementation manner of presenting the prompt information.
- the computing device 802 may specifically be a cloud device, such as a server in the cloud, or a local terminal/server. As shown in Figure 8, the process may specifically include:
- the computing device 802 presents a target platform selection interface to the user 801, where multiple candidate target platforms are provided in the target platform selection interface.
- the computing device 802 can support porting software code to a variety of different target platforms. The computing device 802 can therefore first present a target platform selection interface to the user, on which the target platforms that the computing device 802 supports porting to are presented for the user to choose from. Different platforms have at least different instruction sets.
- the computing device 802 determines a first target platform from a plurality of target platforms according to the user's selection operation on the target platform.
- S803 The user 801 sends the low-level language-based first code applied to the source platform to the computing device 802.
- the user 801 may specifically send the first code to the computing device 802 through a medium such as a terminal or a client.
- the first code may be a binary file based on the ".obj" format, or a binary file based on other formats.
- the user may also send assembly language code of the source platform to the computing device 802. In this case, after receiving the assembly language code, the computing device 802 may first assemble it to obtain a ".obj" file or a binary file in another format.
- the computing device 802 presents the second code on the code display interface, where the second code is a low-level language-based code applied to the first target platform.
- the second code is obtained by processing the first code.
- the second code may be obtained by decompiling and compiling the first code by the computing device 802.
- for the specific implementation of how the computing device 802 processes the first code, refer to the relevant descriptions in the foregoing embodiments. Details are not repeated here.
- the computing device 802 may present the second code on the corresponding code display interface, so that the user 801 can view the details of the second code obtained by processing.
- the second code may be a binary file in the ".obj" format or another format.
- the second code may also be code in another language suitable for the first target platform, for example assembly language code for the first target platform; that assembly language code may be obtained by converting the ".obj" file generated by decompiling and then compiling the first code.
- S805 The computing device 802 presents prompt information.
- the computing device 802 may prompt the user 801, for example about the correspondence between the instructions in the first code and the instructions in the second code, or about possible problems in the code porting process.
- the content and specific implementation of the prompt information presented by the computing device 802 are not limited.
- S806 The user 801 modifies the second code presented by the computing device 802.
- the user 801 can modify the second code. For example, when a logic bug is found in the second code through the prompt information or by viewing the second code, the user can modify the second code to fix the bug.
- the user 801 may continue to further modify the modified second code until the code meets the user's expectation.
- the computing device 802 presents the first IR or the second IR.
- the computing device 802 may first decompile the first code to obtain the second IR, and then further optimize the second IR to obtain the first IR. The computing device 802 can then present the first IR or the second IR obtained during decompilation to the user 801.
- the user 801 can view and debug the first IR or the second IR presented by the computing device 802, so that the computing device 802 can compile the debugged first IR or second IR into the corresponding low-level-language code applied to the first target platform.
- this embodiment may also include:
- the computing device 802 usually decompiles the first code according to the instruction semantic library corresponding to the source platform. By viewing the second code, the prompt information, or the IR, the user can adjust the instruction semantic library used in decompilation, for example by adding the semantics of the source platform's SIMD instructions to it, so that when the first code is decompiled based on the adjusted library, the resulting IR yields better code for the first target platform.
- an embodiment of the present application further provides a computing device, which can implement the functions of the computing device in the embodiment shown in FIG. 2 above.
- the apparatus 900 may include:
- an obtaining module 901 configured to obtain the first code based on the low-level language applied to the source platform
- a decompilation module 902 configured to decompile the first code to obtain a first intermediate representation IR
- a compiling module 903 configured to compile the first IR into a second code based on a low-level language applied to a first target platform, where the source platform and the first target platform have different instruction sets.
- the decompilation module 902 is further configured to decompile the first code to obtain an IR corresponding to a second target platform, where the first IR is different from the IR corresponding to the second target platform, and the first target platform and the second target platform have different instruction sets.
- the apparatus 900 further includes:
- a presentation module 904 configured to present a target platform selection interface
- the determining module 905 is configured to determine the first target platform from a plurality of target platforms in response to the selection operation for the target platform.
- the decompilation module 902 is specifically used for:
- obtain annotation information of the first code, where the annotation information includes any one or more of the types of parameters, the number of parameters, and the types of jump addresses in the first code;
- decompile the first code according to the annotation information to obtain the first IR.
- the decompilation module 902 is specifically used for:
- decompile the first code to obtain a second IR, and optimize the second IR according to the first target platform to obtain the first IR.
- the apparatus 900 further includes:
- a generating module 906, configured to generate prompt information, where the prompt information is used to prompt an item to be checked, and the item to be checked is generated based on the difference between the first target platform and the source platform;
- the presentation module 904 is configured to present the prompt information.
- the apparatus is applied to the cloud, and the acquiring module 901 is specifically configured to receive a first code from a user;
- the apparatus 900 further includes: a communication module 907, configured to send the second code based on the low-level language applied to the first target platform to the user.
- the decompilation module 902 is specifically configured to decompile the first code according to an instruction semantic library corresponding to the source platform.
- the apparatus 900 further includes:
- the modification module 908 is configured to modify the instruction semantic library in response to the modification operation on the instruction semantic library.
- the obtaining module 901 is specifically configured to obtain the first code based on the low-level language and the variables based on the high-level language applied to the source platform;
- the decompilation module 902 is specifically configured to translate the first code into a first IR including a function, the function includes a formal parameter, and the actual parameter corresponding to the formal parameter is the variable.
- the first IR includes a first variable and a second variable, the first variable has a first logical address, the second variable has a second logical address, and the first logical address is different from the second logical address.
- the decompilation module 902 is specifically configured to decompile the first code according to the function calling convention of the first target platform or the single instruction multiple data (SIMD) instructions of the first target platform.
- the computing device 900 in this embodiment corresponds to the code processing method shown in FIG. 2 . Therefore, for the specific implementation of each functional module in the computing device 900 in this embodiment and the technical effects it has, reference may be made to the method shown in FIG. 2 . The description of the relevant parts in the embodiment is not repeated here.
- the embodiment of the present application further provides another computing apparatus, and the apparatus can implement the function of the computing device 802 in the embodiment shown in FIG. 8 above.
- the apparatus 1000 may include:
- an acquisition module 1001 configured to acquire the first code based on the low-level language applied to the source platform
- An output module 1002 configured to output a second code, where the second code is a low-level language-based code applied to a first target platform, the second code is obtained by processing the first code, the source The platform and the first target platform have different instruction sets.
- the output module 1002 is specifically configured to present the second code through a code display interface.
- the apparatus is applied to the cloud, and the obtaining module 1001 is specifically configured to receive a first code from a user;
- the output module 1002 is specifically configured to output the second code to the user.
- the apparatus 1000 further includes:
- the determining module 1004 is configured to determine the first target platform from a plurality of target platforms in response to the selection operation for the target platform.
- the obtaining module 1001 is further configured to obtain an instruction semantic library corresponding to the source platform, where the instruction semantic library is used to process the first code.
- the apparatus 1000 further includes:
- the modification module 1005 is configured to modify the instruction semantic library in response to the modification operation on the instruction semantic library.
- the apparatus 1000 further includes:
- a generating module 1006 configured to generate prompt information, where the prompt information is used to prompt an item to be checked, and the item to be checked is generated based on the difference between the first target platform and the source platform;
- the presentation module 1003 is configured to present the prompt information.
- the apparatus 1000 further includes:
- a presentation module 1003 configured to present a first intermediate representation IR, where the first IR is obtained by decompiling the first code, and the second code is obtained by compiling the first IR.
- the presentation module 1003 is further configured to present a second IR, where the second IR is obtained by decompiling the first code, and the first IR is obtained by optimizing the second IR based on the first target platform.
- the apparatus 1000 further includes:
- a modification module 1005, configured to obtain a modified first IR in response to a modification operation on the first IR;
- the presentation module 1003 is further configured to present a third code, where the third code is low-level-language code applied to the first target platform, and the third code is obtained by compiling the modified first IR.
- the presentation module 1003 is further configured to present a third IR, where the third IR is obtained by decompiling the first code and is used to generate low-level-language code applied to a second target platform, and the third IR is different from the first IR.
- the apparatus 1000 further includes:
- a modification module 1005, configured to obtain a modified second code in response to a modification operation on the second code
- the output module 1002 is further configured to output the modified second code.
- the obtaining module 1001 is specifically configured to obtain the first code based on the low-level language and the variables based on the high-level language applied to the source platform.
- the computing device 1000 in this embodiment corresponds to the code processing method shown in FIG. 8 . Therefore, for the specific implementation of each functional module in the computing device 1000 in this embodiment and the technical effects it has, please refer to the method shown in FIG. 8 . The description of the relevant parts in the embodiment is not repeated here.
- an embodiment of the present application further provides a computing device.
- the device 1100 may include a communication interface 1110 and a processor 1120 .
- the apparatus 1100 may further include a memory 1130 .
- the memory 1130 may be disposed inside the device 1100 or outside the device 1100 .
- each action in the above-mentioned embodiment shown in FIG. 2 may be implemented by the processor 1120 .
- the processor 1120 may obtain the first code applied to the source platform through the communication interface 1110, and use it to implement any method executed in FIG. 2 .
- the steps of the processing flow may be implemented by the hardware integrated logic circuit in the processor 1120 or the instructions in the form of software to complete the method executed in FIG. 2 .
- the program codes executed by the processor 1120 for implementing the above method may be stored in the memory 1130 .
- the memory 1130 is connected to the processor 1120, such as a coupling connection and the like.
- Some features of the embodiments of the present application may be implemented/supported by the processor 1120 executing program instructions or software code in the memory 1130.
- the software components loaded in the memory 1130 can be summarized functionally or logically as, for example, the acquisition module 901, the decompilation module 902, the compilation module 903, the presentation module 904, the determination module 905, the generation module 906, and the modification module 908 shown in FIG. 9.
- the functions of the communication module 907 can be implemented by the communication interface 1110 .
- Any communication interface involved in the embodiments of this application may be a circuit, a bus, a transceiver, or any other device that can be used for information interaction.
- the communication interface 1110 in the apparatus 1100 may be used to exchange information with another apparatus; for example, the other apparatus may be a device connected to the apparatus 1100, such as a user terminal that provides the first code.
- an embodiment of the present application further provides a computing device.
- the device 1200 may include a communication interface 1210 and a processor 1220 .
- the apparatus 1200 may further include a memory 1230 .
- the memory 1230 may be disposed inside the device 1200 or outside the device 1200 .
- each action in the above-mentioned embodiment shown in FIG. 8 may be implemented by the processor 1220 .
- the processor 1220 can obtain the first code applied to the source platform through the communication interface 1210, and use it to implement any method executed in FIG. 8 .
- each step of the processing flow may be implemented by the hardware integrated logic circuit in the processor 1220 or the instructions in the form of software to complete the method executed in FIG. 8 .
- the program codes executed by the processor 1220 for implementing the above method may be stored in the memory 1230 .
- the memory 1230 is connected to the processor 1220, such as a coupling connection and the like.
- Some features of the embodiments of the present application may be implemented/supported by the processor 1220 executing program instructions or software codes in the memory 1230 .
- the software components loaded on the memory 1230 can be functionally or logically summarized, for example, the acquisition module 1001 , the output module 1002 , the presentation module 1003 , the determination module 1004 , the modification module 1005 , and the generation module 1006 shown in FIG. 10 .
- Any communication interface involved in the embodiments of this application may be a circuit, a bus, a transceiver, or any other device that can be used for information interaction.
- the communication interface 1210 in the apparatus 1200 may be used to exchange information with another apparatus; for example, the other apparatus may be a device connected to the apparatus 1200, such as a user terminal that provides the first code.
- the processors involved in the embodiments of the present application may be general-purpose processors, digital signal processors, application-specific integrated circuits, field programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application.
- a general purpose processor may be a microprocessor or any conventional processor or the like.
- the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
- the coupling in the embodiments of the present application is an indirect coupling or communication connection between apparatuses, units, or modules, which may be electrical, mechanical, or in other forms, and is used for information exchange between the apparatuses, units, or modules.
- the processor may cooperate with the memory.
- the memory may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random access memory (RAM).
- the memory may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
- the specific connection medium among the communication interface, the processor, and the memory is not limited in the embodiments of the present application.
- the memory, the processor and the communication interface can be connected by a bus.
- the bus can be divided into an address bus, a data bus, a control bus, and the like.
- the embodiments of the present application further provide a computer storage medium storing a software program. When the software program is read and executed by one or more processors, it can implement the method performed by the proxy edge-side device, the edge-side device, or the cloud center provided in any one or more of the foregoing embodiments.
- the computer storage medium may include: a U disk, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk and other mediums that can store program codes.
- an embodiment of the present application further provides a chip. The chip includes a processor configured to implement the function of the proxy edge device, the function of the edge device, or the function of the cloud center involved in the foregoing embodiments, for example the method performed by the proxy edge device, the edge device, or the cloud center in FIGS. 3 to 4.
- the chip further includes a memory, which is configured to store the program instructions and data necessary for the processor.
- the chip system may consist of a chip, or may include a chip and other discrete devices.
- the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
- These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing apparatus to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
Abstract
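This application discloses a code processing method, apparatus, and storage medium. The method includes: obtaining first code that is based on a low-level language and applied to a source platform; decompiling the obtained first code to obtain an intermediate representation (IR); and then compiling the IR into second code that is based on a low-level language and applied to a first target platform, where the source platform and the target platform have different instruction sets. For example, code suitable for the x86 platform can be converted into code suitable for the ARM platform, without technicians having to migrate the software code across platforms by manually writing program code. In this way, software code of the source platform can be ported to run on the target platform, which reduces the difficulty of building a software ecosystem for the first target platform.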
Description
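This application claims priority to Chinese Patent Application No. 202010996387.6, filed with the China National Intellectual Property Administration on September 21, 2020 and entitled "Code processing method, apparatus, and storage medium", and to Chinese Patent Application No. 202011066288.4, filed on September 30, 2020 and entitled "Code processing method, apparatus, and storage medium", both of which are incorporated herein by reference in their entireties.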
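Embodiments of this application relate to the field of code processing technologies, and in particular, to a code processing method, apparatus, and storage medium.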
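The x86 platform is a general-purpose computing platform whose development was led by Intel; the term refers generally to a family of backward-compatible central processing unit instruction set architectures based on the Intel 8086. Since its introduction in 1978, the x86 platform has accumulated a large and complex instruction set over many years of evolution.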
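In practice, developers usually build all kinds of software for the x86 platform, which has produced a huge software ecosystem. As a result, much application software currently runs only on the x86 platform and not on other platforms, such as the ARMv8 platform (a processor architecture released by ARM that supports a 64-bit instruction set).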
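When building a software ecosystem for the ARMv8 platform or another platform, technicians usually write software code according to that platform's code rules (such as its instruction set) so that the resulting software can run on the platform. However, developing new software for a platform in this way is usually difficult and slow, which makes building a software ecosystem for the ARMv8 platform or other platforms difficult. A method that reduces the difficulty of building a platform's software ecosystem is therefore urgently needed.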
Summary
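Embodiments of this application provide a code processing method, apparatus, and storage medium, to reduce the difficulty of building a software ecosystem for a platform.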
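According to a first aspect, an embodiment of this application provides a code processing method that reduces the difficulty of building a platform's software ecosystem by porting software code of a source platform to a first target platform. The source platform and the first target platform are different platforms; specifically, they may have different instruction sets. In a specific implementation, first code that is based on a low-level language and applied to the source platform may first be obtained; the first code may be, for example, assembly-language or machine-language code that the source platform can recognize. The obtained first code may then be decompiled to obtain a first intermediate representation (IR), which may be either related to or independent of the first target platform. Next, the first IR may be compiled to obtain low-level-language code applied to the first target platform. The resulting code can be recognized and run by the first target platform, so that the software code on the source platform is ported to the first target platform.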
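In addition, the decompilation and compilation of the software code require no developer involvement, which isolates developers from the software code and reduces the likelihood that developers come into contact with it. A software operator, in turn, can optimize and further develop the software code ported to the first target platform, which makes it easier for the operator to maintain that code.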
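The source platform may be, for example, an x86 platform, and the first target platform may be an ARM platform, specifically an ARMv8 platform. Of course, in practice, the source platform may be any platform, and the first target platform may be any platform different from the source platform.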
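The method may be applied locally or in the cloud. When applied locally, it may run on a local terminal or server; when applied in the cloud, it may be offered to users as a cloud service.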
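In a possible implementation, the first code of the source platform may be ported to any target platform. Taking porting to both a first target platform and a second target platform as an example: in addition to obtaining low-level-language code applied to the first target platform through the foregoing decompilation and compilation, an IR corresponding to the second target platform may also be obtained from the first code during decompilation. The IR corresponding to the second target platform differs from the first IR and applies to a different target platform; the first target platform and the second target platform have different instruction sets, and the second target platform also has an instruction set different from that of the source platform. In other words, software code on the source platform can be ported to any platform using the foregoing decompilation and compilation process.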
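In a possible implementation, a target platform selection interface may be presented to the user. The interface may present multiple candidate target platforms for the user to choose from. Then, in response to the user's selection operation, the target platform selected by the user, such as the first target platform, may be determined from the multiple target platforms according to the selection operation performed on the target platform selection interface.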
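In a possible implementation, when decompiling the first code, annotation information of the first code may first be obtained. The annotation information may include, for example, any one or more of the types of parameters in the first code, the number of parameters, and the types of jump addresses. The first code can then be decompiled according to the annotation information to obtain the first IR. Taking parameter types in the annotation information as an example, if a parameter in the first code is 64 bits long, the parameter in the first IR obtained by decompilation can remain 64 bits long (for example, a 64-bit floating-point type) instead of being changed to 32 bits (for example, a 32-bit integer type). In this way, parameter types stay consistent before and after decompilation, which reduces the likelihood of logic errors in the first IR.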
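As a minimal sketch of how annotation information might keep parameter widths stable, the Python below assumes a hypothetical annotation format (a mapping from parameter name to a `(bits, kind)` pair) and a hypothetical fallback that treats unannotated parameters as 32-bit integers. The names `IRType`, `DEFAULT_TYPE`, and `infer_param_types` are illustrative only and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRType:
    bits: int
    kind: str  # "int" or "float"


# Hypothetical fallback a naive decompiler might apply to an unannotated parameter.
DEFAULT_TYPE = IRType(32, "int")


def infer_param_types(params, annotations):
    """Pick an IR type for each parameter, preferring the annotation information.

    `annotations` maps a parameter name to a (bits, kind) pair; this format is
    an assumption made for the sketch.
    """
    types = {}
    for name in params:
        if name in annotations:
            bits, kind = annotations[name]
            types[name] = IRType(bits, kind)
        else:
            types[name] = DEFAULT_TYPE
    return types


# p0 is annotated as a 64-bit float, so it is not narrowed to the 32-bit default.
types = infer_param_types(["p0", "p1"], {"p0": (64, "float")})
```

Only annotated parameters keep their recorded width here; a real decompiler would combine this with other type-recovery evidence.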
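In a possible implementation, the initially obtained IR may also be optimized while the first code is decompiled. Specifically, decompiling the first code yields a second IR, which usually leaves some room for optimization. For example, the generated second IR may read data 64 bits at a time, whereas the first target platform can read 128 bits at once; the data reads in the second IR can then be optimized to read 128 bits at a time. In a specific implementation, the second IR may be optimized according to the first target platform, for example according to its hardware or logical capabilities, to obtain the first IR. More efficient code is then obtained when the first IR is subsequently compiled.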
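The read-widening optimization described above can be sketched as a small peephole pass. This assumes a hypothetical toy IR in which each `Load` records a destination, a base symbol, a byte offset, and a width; the pass merges two adjacent 64-bit loads of consecutive bytes into one 128-bit load when the target can load 128 bits at once. Alignment and aliasing checks that a real optimizer would need are deliberately omitted.

```python
from typing import NamedTuple


class Load(NamedTuple):
    dest: str
    base: str
    offset: int  # byte offset from base
    bits: int


def widen_loads(ir, max_load_bits):
    """Merge pairs of adjacent 64-bit loads into single 128-bit loads.

    A pair is merged only when both read from the same base and the second
    load starts exactly 8 bytes after the first. Alignment and aliasing are
    not checked in this sketch.
    """
    if max_load_bits < 128:
        return list(ir)
    out, i = [], 0
    while i < len(ir):
        cur = ir[i]
        nxt = ir[i + 1] if i + 1 < len(ir) else None
        if (nxt is not None and cur.bits == 64 and nxt.bits == 64
                and cur.base == nxt.base and nxt.offset == cur.offset + 8):
            # "r0:r1" is a hypothetical notation for a register pair.
            out.append(Load(f"{cur.dest}:{nxt.dest}", cur.base, cur.offset, 128))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


second_ir = [Load("r0", "buf", 0, 64), Load("r1", "buf", 8, 64), Load("r2", "buf", 32, 64)]
first_ir = widen_loads(second_ir, max_load_bits=128)
```

Reading the first 16 bytes now takes one load instead of two, which mirrors the motivation given in the text.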
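In a possible implementation, prompt information may further be generated and presented. The prompt information prompts the user about items to be checked, which may be generated based on differences between the first target platform and the source platform. For example, the prompt information may highlight an instruction of the source platform together with the instruction of the first target platform that has the same semantics, so that the user can determine, based on the prompt, whether the first-target-platform instruction translated from the source-platform instruction is accurate.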
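One way to derive such prompts is sketched below, under the assumption that each platform has a table mapping mnemonics to hypothetical semantic tags. Source and target instructions with the same tag are paired for highlighting; a source instruction with no same-semantics counterpart becomes an item to be checked. The function name and tags are illustrative, not the patent's.

```python
def build_prompts(source_insns, target_insns, src_semantics, tgt_semantics):
    """Pair source and target instructions that have the same semantics.

    Returns (pairs, to_check). Each pair would be highlighted together; a
    source instruction with no same-semantics target instruction becomes an
    item for the user to check. The semantic tags are hypothetical.
    """
    remaining = list(target_insns)
    pairs, to_check = [], []
    for src in source_insns:
        tag = src_semantics.get(src.split()[0])
        match = None
        if tag is not None:
            match = next(
                (t for t in remaining if tgt_semantics.get(t.split()[0]) == tag),
                None,
            )
        if match is None:
            to_check.append(src)
        else:
            pairs.append((src, match))
            remaining.remove(match)  # each target instruction is paired at most once
    return pairs, to_check


pairs, to_check = build_prompts(
    ["addl %esi, %edi", "rdtsc"],
    ["add w0, w0, w1"],
    {"addl": "add32", "rdtsc": "read_timestamp"},
    {"add": "add32"},
)
```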
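In a possible implementation, when the method is applied in the cloud, the user may provide the first code to the cloud. For example, the user may send, through a terminal or client, a code processing request containing the first code, so that the cloud obtains the first code. Correspondingly, after decompiling and compiling the received first code, the cloud may send the resulting low-level-language code applied to the first target platform to the user, so that the user obtains the code needed.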
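In a possible implementation, the first code may be decompiled according to an instruction semantic library corresponding to the source platform. For example, the instruction semantic library may contain the semantics of multiple instructions of the source platform. During decompilation, each instruction in the first code can be traversed and its semantics identified from the instruction semantic library, so that the first code can be decompiled into the corresponding compiler IR based on the identified semantics.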
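In a possible implementation, the instruction semantic library may also be modified by the user. For example, the user may add single instruction multiple data (SIMD) instructions to the library so that SIMD instructions in the first code can be identified. The SIMD instructions in the first code can then be decompiled accordingly, so that the first IR contains instructions with vectorized semantics. Correspondingly, the terminal or server may modify the instruction semantic library in response to the user's modification operation on it.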
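The effect of a user adding SIMD semantics to the library can be sketched as follows. The `SemanticLibrary` class, its initial entries, and the LLVM-like IR text such as `add <4 x i32>` are all assumptions made for illustration; the only real-world fact relied on is that x86 `paddd` adds four packed 32-bit integers.

```python
class SemanticLibrary:
    """Toy instruction semantic library for the source platform (hypothetical)."""

    def __init__(self):
        # mnemonic -> IR operation; only scalar instructions are known initially
        self._entries = {"addl": "add i32", "movl": "mov i32"}

    def add(self, mnemonic, ir_op):
        """Modification operation: add or override one instruction's semantics."""
        self._entries[mnemonic] = ir_op

    def lookup(self, mnemonic):
        return self._entries.get(mnemonic)


def decompile(lines, lib):
    """Translate each source instruction into IR text using the library."""
    ir = []
    for line in lines:
        mnemonic, _, operands = line.strip().partition(" ")
        op = lib.lookup(mnemonic)
        if op is None:
            ir.append(f"; unrecognized: {line.strip()}")
        else:
            ir.append(f"{op} {operands.strip()}")
    return ir


lib = SemanticLibrary()
before = decompile(["paddd %xmm1, %xmm0"], lib)
# The user adds SIMD semantics: paddd adds four packed 32-bit integers.
lib.add("paddd", "add <4 x i32>")
after = decompile(["paddd %xmm1, %xmm0"], lib)
```

Before the modification the SIMD instruction is not recognized; afterwards it decompiles to an IR operation with vector semantics.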
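In a possible implementation, in an inline assembly scenario, the software code to be ported may contain both first code based on a low-level language and variables based on a high-level language. Therefore, when obtaining the first code, the terminal or server also obtains the high-level-language variables. When decompiling the first code, it can then translate the first code into a first IR containing a function, where the function includes formal parameters and the actual parameters corresponding to those formal parameters may be the variables. That is, during decompilation, the high-level-language variables may be passed as actual parameters to the formal parameters of the function.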
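In a possible implementation, when translating the first code into the first IR containing the function, the semantics of each instruction string in the first code may be determined. The function corresponding to the semantics of each instruction string can then be determined from a correspondence between semantics and functions, and a first IR containing that function can be generated.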
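In a possible implementation, variables in the first code may also be relocated before the first code is decompiled, so that after decompilation the variables in the resulting first IR have different logical addresses. Taking a first variable and a second variable contained in the first IR as an example, the first variable may have a first logical address and the second variable a second logical address, and the two logical addresses are different. In a specific implementation, before the first code is decompiled, a preset first logical address may be configured for the first variable in the first code and a preset second logical address for the second variable, where both logical addresses may be abstract logical addresses.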
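In a possible implementation, because the source platform and the first target platform may differ in function calling conventions or SIMD instructions, the first code may be decompiled according to the function calling convention or the SIMD instructions of the first target platform, so that the resulting code meets the first target platform's requirements for function calling conventions or SIMD instructions. In practice, the decompilation may be performed based on the differences between the first target platform and the source platform in function calling conventions or SIMD instructions.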
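The patent does not name specific calling conventions. Assuming, for illustration only, that the source is x86-64 Linux (System V ABI) and the target is AArch64 (AAPCS64), the integer argument registers of the two conventions can be remapped by position. The register orders below are the real ones for those ABIs; the function `remap_arg_registers` is a hypothetical helper, and floating-point, stack-passed, and aggregate arguments are out of scope.

```python
# Integer/pointer argument registers, in order, for two real ABIs:
# x86-64 System V (Linux) and AArch64 AAPCS64.
SYSV_X86_64_INT_ARGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
AAPCS64_INT_ARGS = ["x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"]


def remap_arg_registers(used_regs):
    """Map source argument registers to target argument registers by position.

    Only integer/pointer arguments passed in registers are handled; a register
    that is not a System V argument register raises ValueError.
    """
    mapping = {}
    for reg in used_regs:
        idx = SYSV_X86_64_INT_ARGS.index(reg)
        mapping[reg] = AAPCS64_INT_ARGS[idx]
    return mapping


mapping = remap_arg_registers(["rdi", "rdx"])
```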
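In a possible implementation, when the first code includes SIMD instructions, the first code may be decompiled by direct vectorization. Specifically, third code of the target platform based on a low-level language may first be generated; the third code describes the vectorized semantics of the SIMD instructions in the first code and may be, for example, code that includes intrinsic functions. The third code may then be decompiled to obtain a first IR corresponding to the SIMD instructions of the first target platform and carrying vectorization marks. In this case, the first IR is associated with the first target platform.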
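In a possible implementation, when the first code includes SIMD instructions, the first code may be decompiled by indirect vectorization. In a specific implementation, fourth code based on a high-level language may first be generated; the fourth code describes the vectorized semantics of the SIMD instructions in the first code. The fourth code may then be compiled to obtain a first IR with vectorization marks. Later, in the compilation stage, automatic vectorization can be applied to the IR with vectorization marks to generate low-level-language code applied to the first target platform, which may include SIMD instructions of the first target platform. In this way, the SIMD instructions in the first code are vectorized indirectly.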
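A minimal sketch of producing the "fourth code" for indirect vectorization: the hypothetical generator below emits C source whose lane-wise loop has the semantics of x86 `paddd` (four packed 32-bit additions). `uint32_t` is chosen because `paddd` wraps on overflow, whereas signed overflow is undefined behaviour in C. Whether a given compiler actually auto-vectorizes this loop into target SIMD instructions is not guaranteed; the function name and output layout are assumptions.

```python
def describe_packed_add(dst, src, lanes=4, elem="uint32_t"):
    """Emit C source (the "fourth code") with the semantics of `paddd %src, %dst`.

    uint32_t is used because paddd wraps on overflow, while signed overflow is
    undefined behaviour in C. A simple lane-wise loop like this is the kind of
    pattern an auto-vectorizing compiler may map back onto target SIMD
    instructions.
    """
    return (
        "#include <stdint.h>\n"
        f"/* semantics of: paddd %{src}, %{dst} */\n"
        f"static inline void packed_add({elem} *{dst}, const {elem} *{src}) {{\n"
        f"    for (int i = 0; i < {lanes}; ++i) {{\n"
        f"        {dst}[i] += {src}[i];\n"
        "    }\n"
        "}\n"
    )


c_source = describe_packed_add("xmm0", "xmm1")
```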
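According to a second aspect, an embodiment of this application further provides a code processing method. During porting of software code, first code that is based on a low-level language and applied to a source platform may be obtained, and second code may then be output according to the first code. The second code is low-level-language code that can be applied to a first target platform and is obtained by processing the first code, for example by first decompiling and then compiling it; the source platform and the first target platform have different instruction sets. In this way, second code applied to the first target platform can be obtained from the first code of the source platform, and the second code can be recognized and run by the first target platform, so that the software code on the source platform is ported to the first target platform.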
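In addition, the processing of the first code may require no developer involvement, which isolates developers from the software code and reduces the likelihood that developers come into contact with it. A software operator, in turn, can optimize and further develop the software code ported to the first target platform, which makes it easier for the operator to maintain that code.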
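The source platform may be, for example, an x86 platform, and the first target platform may be an ARM platform, specifically an ARMv8 platform. Of course, in practice, the source platform may be any platform, and the first target platform may be any platform different from the source platform.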
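In a possible implementation, the second code may be output by presenting it on a code display interface, so that the user can view the translated code there and perform secondary development, optimization, and other operations based on it.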
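In a possible implementation, the processing of the first code may be performed in the cloud, in which case obtaining the first code may mean receiving it from the user. For example, the user may send a code processing request to the cloud through a terminal or client, and the request may carry the first code; of course, the user may also provide the first code to the cloud in other ways. After processing the first code to obtain the second code, the cloud may output the second code to the user, for example by presenting it on the code display interface of the terminal used by the user.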
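In a possible implementation, before the first code is processed, a target platform selection interface may also be presented; the interface may present multiple candidate target platforms for the user to choose from. Then, in response to the user's selection operation, the target platform selected by the user, such as the first target platform, may be determined from the multiple target platforms according to the selection operation performed on the target platform selection interface.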
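In a possible implementation, when processing the first code, an instruction semantic library corresponding to the source platform may first be obtained and used to process the first code. For example, the instruction semantic library may contain the semantics of multiple instructions of the source platform; when the first code is processed, each instruction in it can be traversed and its semantics identified from the library, so that the first code can be decompiled into the corresponding compiler IR based on the identified semantics.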
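In a possible implementation, the user may also modify the instruction semantic library corresponding to the source platform, for example by adding SIMD instructions to it so that SIMD instructions in the first code can be identified. The library is then modified in response to the modification operation performed by the user, and the first code can be decompiled according to the modified library.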
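In a possible implementation, after the second code is obtained from the first code, prompt information may further be generated and presented. The prompt information prompts the user about items to be checked, which are generated based on differences between the first target platform and the source platform. For example, the prompt information may highlight an instruction of the source platform together with the instruction of the first target platform that has the same semantics, so that the user can determine, based on the prompt, whether the first-target-platform instruction translated from the source-platform instruction is accurate.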
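In a possible implementation, in addition to outputting the second code, the first IR obtained by decompiling the first code may be presented; correspondingly, the output second code is obtained by compiling the first IR. The user can then debug and inspect the presented first IR and analyze it accordingly.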
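In a possible implementation, a second IR may also be presented; the second IR is obtained by decompiling the first code. Specifically, during decompilation, the second IR may first be obtained from the first code and then optimized to obtain the first IR, so that the code produced from the optimized first IR executes more efficiently. For example, the generated second IR may read data 64 bits at a time, whereas the first target platform can read 128 bits at once; the data reads in the second IR can be optimized to read 128 bits at a time, so that reading the same amount of data no longer requires two read operations.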
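In a possible implementation, the output first IR may also be modified by the user, for example when the user determines that it contains a logic error or code that can be optimized. In response to the modification operation on the first IR, the terminal or server obtains a modified first IR and compiles it to obtain a relatively better third code, which is low-level-language code applied to the first target platform and can be presented to the user.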
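In a possible implementation, when low-level-language code applied to a second target platform is generated from the first code, the first code may be decompiled into a third IR, which can subsequently be compiled into low-level-language code applied to the second target platform. Different target platforms may correspond to different IRs, so that different IRs and code can be generated for different target platforms.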
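In a possible implementation, the user may also modify the output second code, for example to carry out secondary development and optimization on it. Correspondingly, a modified second code is obtained in response to the user's modification operation on the second code, and the modified second code can be presented to the user in real time for review.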
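In a possible implementation, when the first code is obtained, variables based on a high-level language may be obtained together with the low-level first code applied to the source platform. For example, an inline assembly scenario includes not only assembly-language code (a low-level language) but also variables of a high-level language, such as C/C++ variables.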
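According to a third aspect, based on the same inventive concept as the method embodiments of the first aspect, an embodiment of this application provides a computing apparatus. The apparatus has functions for implementing the implementations of the first aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions.
According to a fourth aspect, based on the same inventive concept as the method embodiments of the second aspect, an embodiment of this application provides a computing apparatus. The apparatus has functions for implementing the implementations of the second aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions.
According to a fifth aspect, an embodiment of this application provides a computing apparatus, including a processor and a memory. The memory is configured to store instructions; when the computing apparatus runs, the processor executes the instructions stored in the memory, so that the apparatus performs the code processing method according to the first aspect or any implementation of the first aspect. It should be noted that the memory may be integrated into the processor or may be independent of it. The apparatus may further include a bus through which the processor is connected to the memory. The memory may include a readable memory and a random access memory.
According to a sixth aspect, an embodiment of this application provides a computing apparatus, including a processor and a memory. The memory is configured to store instructions; when the computing apparatus runs, the processor executes the instructions stored in the memory, so that the apparatus performs the code processing method according to the second aspect or any implementation of the second aspect. It should be noted that the memory may be integrated into the processor or may be independent of it. The apparatus may further include a bus through which the processor is connected to the memory. The memory may include a readable memory and a random access memory.
According to a seventh aspect, an embodiment of this application further provides a readable storage medium storing a program or instructions that, when run on a computer, cause the code processing method according to the first aspect or any implementation of the first aspect to be performed.
According to an eighth aspect, an embodiment of this application further provides a readable storage medium storing a program or instructions that, when run on a computer, cause the code processing method according to the second aspect or any implementation of the second aspect to be performed.
According to a ninth aspect, an embodiment of this application further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform any code processing method according to the first aspect or any implementation of the first aspect.
According to a tenth aspect, an embodiment of this application further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform any code processing method according to the second aspect or any implementation of the second aspect.
In addition, for the technical effects brought by any implementation of the third to tenth aspects, refer to the technical effects brought by the different implementations of the first aspect or of the second aspect. Details are not described herein again.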
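To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings.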
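FIG. 1 is a schematic diagram of an example system architecture according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a code processing method according to an embodiment of this application;
FIG. 3 is a schematic diagram of code in an inline assembly scenario according to an embodiment of this application;
FIG. 4 is a schematic flowchart of parameterized translation according to an embodiment of this application;
FIG. 5 is a schematic diagram of adjusting the parameters of a function according to an embodiment of this application;
FIG. 6 is a schematic diagram of an IR before and after optimization according to an embodiment of this application;
FIG. 7 is a schematic diagram of a target platform selection interface according to an embodiment of this application;
FIG. 8 is a schematic diagram of interaction between a user 801 and a computing device 802 according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a computing apparatus according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of another computing apparatus according to an embodiment of this application;
FIG. 11 is a schematic diagram of the hardware structure of a computing apparatus according to an embodiment of this application;
FIG. 12 is a schematic diagram of the hardware structure of yet another computing apparatus according to an embodiment of this application.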
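When a platform's software ecosystem is built, technicians usually write software that can run on that platform. However, manually writing program code in this way is not only inefficient but also prone to all kinds of programming errors, which makes software development difficult and, in turn, makes building the platform's software ecosystem difficult.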
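To this end, an embodiment of this application provides a code processing method that ports software from other platforms to this platform, enriching the software that can run successfully on this platform and thereby reducing the difficulty of building its software ecosystem. In a specific implementation, software code of a source platform (the other platform) may be decompiled to obtain a compiler intermediate representation (IR), and the IR is then compiled into low-level-language code applied to the target platform (this platform), so that the code can run successfully on the target platform. In this way, software of the source platform is ported to run on the target platform. The source platform and the target platform are different platforms; at a minimum, they have different instruction sets.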
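Moreover, the decompilation and compilation of the software code require no developer involvement, which isolates developers from the software code and reduces the likelihood that developers come into contact with it. A software operator, in turn, can optimize and further develop the software code ported to the target platform, which makes it easier for the operator to maintain that code.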
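As an example, the code processing method may be applied to the system architecture shown in FIG. 1. As shown in FIG. 1, the system architecture 100 includes a decompilation module 101 and a compilation module 102. For code 1 applied to the source platform 103, the decompilation module 101 decompiles it to obtain an IR and passes the IR to the compilation module 102, which compiles the IR into low-level-language code 2 applied to the target platform 104. The resulting code 2 can run on the target platform 104.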
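The two-stage structure of FIG. 1 can be illustrated with a deliberately tiny pipeline. The sketch assumes a source subset of two x86 instructions in AT&T syntax (`addl`, `movl` with register operands only), a tuple-based IR, and a hypothetical register assignment `REG_MAP`; the compile stage prints AArch64 assembly. None of these names come from the patent, and real decompilers and compilers are far more involved.

```python
# Hypothetical register assignment used only by this sketch.
REG_MAP = {"%edi": "w0", "%esi": "w1", "%eax": "w2"}


def decompile_module(x86_lines):
    """Code 1 (x86, AT&T syntax, register operands only) -> IR tuples."""
    ir = []
    for line in x86_lines:
        mnemonic, operands = line.split(None, 1)
        src, dst = [o.strip() for o in operands.split(",")]
        if mnemonic == "addl":
            ir.append(("add", dst, dst, src))  # AT&T: dst = dst + src
        elif mnemonic == "movl":
            ir.append(("mov", dst, src))       # AT&T: dst = src
        else:
            raise NotImplementedError(mnemonic)
    return ir


def compile_module(ir):
    """IR tuples -> code 2 (AArch64 assembly text)."""
    return [f"{op} " + ", ".join(REG_MAP[r] for r in regs) for op, *regs in ir]


code2 = compile_module(decompile_module(["addl %esi, %edi", "movl %edi, %eax"]))
```

Keeping the IR independent of both instruction sets is what lets the two modules be deployed separately, for example on a server and a terminal as described next.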
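In practice, the decompilation module 101 may be a software-based functional module, or may be implemented by a device with a decompilation function, such as a decompiler. Similarly, the compilation module 102 may be implemented by a device with a compilation function, such as a compiler. The decompilation module 101 and the compilation module 102 may be deployed on the target platform 104, or on a terminal 105, a server 106, or the like. For example, when they are deployed on the terminal 105 and the server 106, the decompilation module 101 may be deployed on the server 106 and the compilation module 102 on the terminal 105. Alternatively, both may be deployed on a server 106 in the cloud. In that case, during software porting, the terminal 105 may send the software code to the cloud (specifically, the server 106 in the cloud), for example by sending the cloud a code processing request that includes the software code to be ported. After performing the decompilation and compilation, the cloud (server 106) obtains low-level-language code applied to the target platform and sends it to the terminal 105. The terminal 105 thus obtains software code applicable to the target platform, completing the software porting. Of course, in practice, the server 106 may also be a local server.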
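To make the foregoing objectives, features, and advantages of this application clearer and easier to understand, the following describes various non-limiting implementations of the embodiments of this application by way of example with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative efforts shall fall within the protection scope of this application.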
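FIG. 2 is a schematic flowchart of a code processing method according to an embodiment of this application. The method may be applied to the foregoing target platform 104, or to a computing device such as the foregoing terminal 105 or server 106 that includes the decompilation module 101 and the compilation module 102. The method may specifically include the following steps: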
S201: Obtain first code that is based on a low-level language and applied to a source platform.
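In practice, when software runs on the source platform, its program code is usually compiled into low-level-language code that the source platform can recognize directly. A low-level language is a programming language or instruction code that a machine can recognize directly, such as assembly language or machine language. Machine language is represented in binary code and is the only language a computer can directly recognize and execute. Assembly language was introduced to address the fact that machine language is hard to understand and memorize: it uses easy-to-understand and easy-to-remember names and symbols to represent the opcodes in machine instructions. Because assembly language replaces the binary code of machine language with symbols, it is essentially a symbolic language.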
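In contrast to low-level languages, there are machine-independent, procedure-oriented or object-oriented high-level languages, such as C, C++, and Java. High-level programming languages are usually close to natural language and can use mathematical expressions, so they are more expressive: they can conveniently express data operations and program control structures and describe various algorithms well. High-level languages can be used on different platforms, such as platforms based on the x86 instruction set architecture (a general-purpose computing platform whose development was led by Intel, hereinafter the x86 platform), platforms based on the advanced RISC machines (ARM) architecture (hereinafter the ARM platform), and platforms based on the PowerPC reduced instruction set architecture (Performance Optimization With Enhanced RISC–Performance Computing). Usually, high-level programming languages cannot be recognized and executed directly by a machine. Developers therefore compile code files written in a high-level language with a compiling program, such as a compiler, so that the code can be recognized and executed by the machine.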
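In this embodiment, the obtained first code may be low-level-language code applicable to the source platform. For example, the obtained first code may be the code of a ".obj" file, or of a file in another format, obtained by assembling code written in the assembly language of the source platform.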
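Alternatively, in other possible implementations, variables based on a high-level language may be obtained together with the first code. For example, in the inline assembly scenario shown in FIG. 3, the obtained first code is assembly-language code such as "movdqa %%xmm4, 16(%0)", and the high-level-language variables are "&ff_inverse[0]" and "&ff_inverse[8]" in ""r"(&ff_inverse[0]), "r"(&ff_inverse[8])". Inline assembly is a coding approach typically used to improve code execution efficiency and to execute processor-specific instructions. Assembling the inline assembly code yields the first code, for example as a ".obj" file.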
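Collecting the high-level-language variables from an inline assembly statement can be sketched with a regular expression over a GCC-style operand list, where each operand is a quoted constraint (such as `"r"`) followed by a parenthesised C expression. This is a simplified, hypothetical parser: it expects only the operand-list portion of the statement and does not handle C expressions that themselves contain parentheses.

```python
import re

# A quoted constraint such as "r" or "=m", followed by a parenthesised C expression.
_OPERAND = re.compile(r'"([^"]*)"\s*\(([^()]*)\)')


def extract_operands(operand_list):
    """Return (constraint, C expression) pairs from an inline-asm operand list.

    Sketch only: expressions containing parentheses are not supported.
    """
    return [(c, expr.strip()) for c, expr in _OPERAND.findall(operand_list)]


operands = extract_operands('"r"(&ff_inverse[0]),"r"(&ff_inverse[8])')
```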
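By way of example, the first code in this embodiment may be ".obj" code obtained by assembling the entire assembly-language program code of software on the source platform; or ".obj" code obtained by assembling part of the program code, such as one of several code files of the software; or a segment of code within a single code file, such as the code obtained by compiling the code block shown in FIG. 3. That is, in this embodiment, software code may be partially decompiled.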
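In this embodiment, the source platform to which the first code applies and the target platform to which the first code is ported are different; specifically, they may have different instruction sets. The source platform and the target platform may be platforms of different types. For example, the source platform may be an x86 platform and the target platform an ARMv8 platform (a processor architecture released by ARM that supports a 64-bit instruction set). Alternatively, they may be two different platforms of the same type: for example, the source platform may be the Pentium II of the x86 family and the target platform the Pentium III of the x86 family (which introduced the new SSE instruction set), or the source platform may be a 32-bit x86 platform and the target platform a 64-bit x86-64 platform.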
S202: Decompile the obtained first code to obtain a first IR.
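In this embodiment, when the first code applied to the source platform is ported to run on the target platform, the decompilation module 101 may first decompile the first code. Specifically, it may perform lexical analysis, syntax analysis, and semantic analysis on the input first code and convert it into a compiler IR, which is the decompilation result of the first code. The compiler IR represents the semantics and syntactic structure of the first code and can be regarded as another high-level language.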
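As an example, only the low-level first code may be obtained, in which case the decompilation module 101 decompiles the first code into a compiler IR. In other examples, high-level-language variables may be obtained together with the low-level first code. In this case, when decompiling the first code, the decompilation module 101 also needs to perform parameterized translation on the high-level-language variables, to avoid losing the information about those variables in the first code.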
具体的,参见如图4所示的参数化翻译流程:
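S401: For the low-level instruction string in a mixed code block, the decompilation module 101 translates the instruction string, according to its semantics, into a first compiler IR containing functions whose semantics match those of the instruction string. For example, when the instruction string is "ADD %x %y", the function in the resulting compiler IR may be a summation function.
In an exemplary implementation of translating the instruction string, the decompilation module 101 obtains in advance a correspondence between instruction semantics and functions, for example one stored in the decompilation module 101. The decompilation module 101 then determines the semantics of the low-level instruction string in the mixed code block; because the instruction string may contain one or more instructions, it determines the semantics of each instruction. Next, the decompilation module 101 looks up the semantics-to-function correspondence to find the function corresponding to each instruction's semantics, and generates a first IR containing those functions.
As an example, the decompilation module 101 may determine the semantics of each instruction in the instruction string by looking it up in an instruction semantics library for the source platform (for example, technical personnel annotate the semantics of the source platform's instructions in advance and import the instructions and their semantics into the decompilation module 101). The library contains multiple instructions of the source platform, each pre-annotated with its semantics. When instruction A in the first code has the same syntactic structure as instruction a in the library, the semantics of instruction a are taken as the semantics of instruction A. In practice, the semantics of each instruction may also be determined in other ways; this embodiment imposes no limitation.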
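The lookup described above (match an instruction against a pre-annotated semantics library, then emit the IR function recorded for that semantics) can be sketched as follows. This is a minimal illustration under assumptions: the library entries, the use of regular expressions as the "syntactic structure" test, the AT&T-style operand spelling, and the IR function names `ir_add`, `ir_sub`, and `ir_move` are all hypothetical and do not come from the application.

```python
import re
from dataclasses import dataclass

# Hypothetical instruction semantics library for the source platform.
# Each entry pairs a syntax pattern with the IR function that expresses
# the same semantics (names are illustrative only).
SEMANTICS_LIBRARY = [
    (re.compile(r"^add\s+(%\w+),?\s*(%\w+)$", re.IGNORECASE), "ir_add"),
    (re.compile(r"^sub\s+(%\w+),?\s*(%\w+)$", re.IGNORECASE), "ir_sub"),
    (re.compile(r"^mov\s+(%\w+),?\s*(%\w+)$", re.IGNORECASE), "ir_move"),
]


@dataclass(frozen=True)
class IRCall:
    func: str        # IR function chosen from the library
    operands: tuple  # operands copied from the source instruction


def translate_instruction(instr: str) -> IRCall:
    """Find a library entry with the same syntax and reuse its semantics."""
    text = instr.strip()
    for pattern, func in SEMANTICS_LIBRARY:
        match = pattern.match(text)
        if match:
            return IRCall(func, match.groups())
    raise ValueError(f"no semantics recorded for {instr!r}")


def translate_string(instrs):
    """Translate an instruction string (one or more instructions) in order."""
    return [translate_instruction(i) for i in instrs]
```

For example, `translate_instruction("ADD %x %y")` yields `IRCall("ir_add", ("%x", "%y"))`; an instruction with no library entry raises `ValueError`, which is where a user-extended library (S809) would come in.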
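S402: When determining the function corresponding to the instruction string, the decompilation module 101 creates virtual registers to serve as the function's parameters.
S403: The decompilation module 101 makes the functions determined during translation reference the created virtual registers, using them as the functions' formal parameters.
S404: The decompilation module 101 establishes a correspondence between the high-level variables and the functions' formal parameters, so that the high-level variables in the mixed code are passed as actual arguments to the formal parameters of the semantic functions.
For example, the decompilation module 101 may collect the high-level variables in the mixed code block into a list, and then match the variables in the list to the formal parameters according to the positions of the formal parameters in the IR and the compiler's calling convention. For instance, the first variable in the list may correspond to the first formal parameter of the first function in the IR, the second variable to the second formal parameter of that function, and so on.
In this way, the high-level variable information is retained in the functions as actual arguments during decompilation and is not lost, so the code information stays consistent before and after decompilation in the inline assembly scenario.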
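Steps S402 to S404 can be sketched for a GCC-style inline-assembly template as follows. This is a minimal illustration under assumptions: operand placeholders are taken to be `%0`, `%1`, ... (with `%%` escaping a literal register), virtual registers are named `%v0`, `%v1`, ..., and binding is purely positional; `IRFunction` and `parameterize` are hypothetical names, not the application's data structures.

```python
import re
from dataclasses import dataclass, field

# Operand placeholders such as %0 and %1 in a GCC-style inline-asm template.
# The lookbehind skips "%%", which escapes a literal register like %%xmm4.
PLACEHOLDER = re.compile(r"(?<!%)%(\d+)")


@dataclass
class IRFunction:
    name: str
    params: list                      # formal parameters = virtual registers
    body: list = field(default_factory=list)


def parameterize(template: str, hl_vars: list, name: str = "asm_block"):
    """Create virtual registers (S402), use them as formals (S403), bind actuals (S404)."""
    indices = sorted({int(i) for i in PLACEHOLDER.findall(template)})
    params = [f"%v{i}" for i in indices]
    body = PLACEHOLDER.sub(lambda m: f"%v{m.group(1)}", template)
    if len(hl_vars) != len(params):
        raise ValueError("operand count does not match placeholder count")
    # The k-th high-level variable becomes the actual argument of the k-th formal.
    call_site = list(zip(params, hl_vars))
    return IRFunction(name, params, body.splitlines()), call_site
```

Applied to the FIG. 3 style operands, `parameterize("movdqa %%xmm4, 16(%0)\nmovdqa %%xmm5, 16(%1)", ["&ff_inverse[0]", "&ff_inverse[8]"])` keeps both variables as actual arguments bound to `%v0` and `%v1`.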
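In practice, the decompilation module 101 may first determine whether the code to be decompiled contains only low-level first code or contains both the first code and high-level variables. In one possible implementation, the decompilation module 101 first checks the compilation command and the file type of the code to determine whether it is high-level or low-level code. When the compilation command and file type indicate low-level code, the decompilation module 101 further examines the syntax of the code to determine whether it consists entirely of low-level first code or is a mixed code block containing both low-level code and high-level variables, such as inline assembly code combining C/C++ with assembly language (a low-level language).
Further, if the first code is a mixed code block containing both low-level code and high-level variables, the decompilation module 101 performs the parameterized translation described above on the high-level variables while decompiling the first code.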
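A rough version of this classification can be sketched as follows. It is a heuristic under stated assumptions: only the file suffix and a regular-expression search for a GCC/Clang `asm`/`__asm__` statement are used, the compilation-command check mentioned above is omitted, and the suffix sets and `classify_code` name are hypothetical.

```python
import re
from pathlib import PurePosixPath

ASSEMBLY_SUFFIXES = {".s", ".S", ".asm"}
C_FAMILY_SUFFIXES = {".c", ".cc", ".cpp", ".cxx", ".h", ".hpp"}
# GCC/Clang inline-assembly statement: asm or __asm__, optional volatile, then "(".
INLINE_ASM = re.compile(r"\b(?:__asm__|asm)\s*(?:__volatile__|volatile)?\s*\(")


def classify_code(filename: str, source: str) -> str:
    """Return "assembly", "mixed" (inline asm plus high-level variables) or "high-level"."""
    suffix = PurePosixPath(filename).suffix
    if suffix in ASSEMBLY_SUFFIXES:
        return "assembly"      # decompile directly into IR
    if suffix in C_FAMILY_SUFFIXES and INLINE_ASM.search(source):
        return "mixed"         # needs parameterized translation
    return "high-level"        # not a decompilation candidate
```

A production tool would parse the compile command as well, since a suffix alone does not reveal, for example, a `.c` file compiled with assembler-specific options.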
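In this embodiment, the first code usually contains at least one function call. A called function is essentially a variable: for example, in the function y = a + b * c, y is essentially a variable. In some implementations, other variables are associated with the function body; for example, y is also associated with a, b, and c. Accordingly, the variables associated with a function include the function itself and the variables referenced in its body; for y these are y, a, b, and c. Because the addresses of these variables in the first code are relative, that is, not yet determined, the decompilation module 101 may relocate the variables in the IR (as distinct from the high-level variables discussed above) to determine the absolute address of each variable in the first code, before it translates the instructions of the first code into the first compiler IR using the source platform's instruction semantics library.
For example, suppose the first code contains a first variable and a second variable. Before decompiling, the decompilation module 101 assigns a preset first logical address to the first variable and a preset second logical address to the second variable, and the two addresses may differ. Specifically, the decompilation module 101 accesses a relocation table that stores logical address information for multiple variables (including the first and second variables), determines from it the logical address of each variable (that is, the variable's absolute address), and associates each logical address with the variable's symbol. The addresses assigned to the first and second variables may be fake (placeholder) logical addresses. The decompilation module 101 then decompiles the address-assigned first code using the source platform's instruction semantics library to obtain the first compiler IR. In the subsequent compilation stage, these logical addresses are compiled back into relocation information for the target platform, so that the variables in the first IR point to concrete logical addresses.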
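The placeholder-address idea can be sketched as a round trip: give each relocated symbol a distinct fake address before decompilation, then map addresses back to (symbol, offset) pairs when emitting target relocations. The base address, slot size, operand spelling `symbol+offset`, and all function names are hypothetical choices for illustration.

```python
PLACEHOLDER_BASE = 0x1000_0000   # hypothetical, deliberately fake address range
SLOT_SIZE = 0x100                # room for small offsets within one symbol


def assign_placeholder_addresses(relocation_symbols):
    """Give each relocated symbol a distinct fake logical address."""
    table = {}
    for sym in relocation_symbols:
        if sym not in table:
            table[sym] = PLACEHOLDER_BASE + len(table) * SLOT_SIZE
    return table


def resolve(operand: str, table) -> int:
    """Turn a relocated operand such as "ff_inverse+16" into an absolute address."""
    sym, _, offset = operand.partition("+")
    return table[sym] + (int(offset, 0) if offset else 0)


def back_to_relocation(address: int, table):
    """Compilation side: recover (symbol, offset) so a target relocation can be emitted."""
    for sym, base in table.items():
        if base <= address < base + SLOT_SIZE:
            return sym, address - base
    raise KeyError(hex(address))
```

Keeping every symbol in its own slot guarantees that the first and second variables get different logical addresses and that each address maps back to exactly one symbol.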
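In practice, besides instruction sets, the source and target platforms may differ in other respects, such as function calling conventions or single instruction, multiple data (SIMD) instructions. Therefore, in some implementations, the decompilation module 101 first determines the differences between the source and target platforms and decompiles the first code according to those differences. The differences may, for example, be determined in advance by technical personnel who compare the two platforms' calling conventions or SIMD instructions and then import the result into the decompilation module 101; this embodiment imposes no limitation.
In other possible implementations, the decompilation module 101 may decompile the first code directly according to the target platform's calling convention or SIMD instructions; this embodiment imposes no limitation. Either way, through difference-based processing or through decompilation based directly on target platform information, the function calls and SIMD instructions in the resulting IR conform to the target platform's calling convention and SIMD instructions.
For ease of understanding, the following example shows how the parameters in the first code are adjusted according to the difference in function calling conventions between the target platform and the source platform.
Specifically, function parameters are stored either in registers or on the stack in memory, and different platforms store them differently. The decompilation module 101 may adjust the register information or stack information of functions in the source code block according to the differences between the calling conventions of the source and target platforms. For example, it may move parameters stored in registers onto the stack, or move parameters stored on the stack into registers.
Before adjusting the register or stack information, the decompilation module 101 may decode the source code block with a decoding tool such as Intel XED to obtain the instruction control flow. It then runs a dataflow analysis algorithm on the instruction control flow to analyze live registers and the stack, thereby obtaining the types and number of parameters of the functions in the source code block. Here, the parameter type mainly indicates whether a parameter is stored in a register or on the stack.
For ease of understanding, register analysis and stack analysis are described in detail below.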
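First, this embodiment defines the following data sets:
Use[n]: set of variables used by n;
Def[n]: set of variables defined by n;
In[n]: variables live on entry to n;
Out[n]: variables live on exit from n;
Here, variables are the registers corresponding to the variables. In[n] and Out[n] are the sets of registers live on entry to and on exit from n, and Def[n] and Use[n] are the sets of registers defined and used by n.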
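The decompilation module 101 traverses the blocks in the source code block and constructs the use set and def set of each block as follows:
a) Traverse the instructions in the block in execution order;
b) If an operand of an instruction has type Register and action kActionRead, add the register to the use set;
c) If an operand of an instruction has type Register and action kActionWrite, add the register to the def set;
d) If an operand of an instruction has type Address, add its base_reg and index_reg to the use set;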
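Rules a) to d) can be sketched with a small operand model. Assumptions: the `Operand` fields only mirror the names used in the text (type, action, base_reg, index_reg), `kActionReadWrite` is a hypothetical label for operands that are both read and written, and the sketch applies the usual refinement that a register read after being written earlier in the same block is not added to the use set, since such a read does not depend on the value entering the block.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operand:
    type: str                       # "Register" or "Address"
    action: str = ""                # "kActionRead", "kActionWrite", "kActionReadWrite"
    reg: Optional[str] = None
    base_reg: Optional[str] = None
    index_reg: Optional[str] = None


def build_use_def(block):
    """block: list of instructions, each a list of Operand objects."""
    use, defs = set(), set()
    for instr in block:                                   # a) execution order
        reads, writes = set(), set()
        for op in instr:
            if op.type == "Register":
                if "Read" in op.action:                   # b)
                    reads.add(op.reg)
                if "Write" in op.action:                  # c)
                    writes.add(op.reg)
            elif op.type == "Address":                    # d)
                reads.update(r for r in (op.base_reg, op.index_reg) if r)
        use |= reads - defs   # only reads not preceded by a write in this block
        defs |= writes
    return use, defs
```

For `mov rax, [rdi+rsi]` followed by `add rax, rdx`, the use set is {rdi, rsi, rdx} and the def set is {rax}; the read of rax in the second instruction is satisfied inside the block.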
Based on these sets, the decompilation module 101 establishes the following dataflow constraints:
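$$\mathrm{In}[n] \supseteq \mathrm{Use}[n], \qquad \mathrm{In}[n] \supseteq \mathrm{Out}[n] - \mathrm{Def}[n], \qquad \mathrm{Out}[n] \supseteq \mathrm{In}[s] \ \text{ for every } s \in \mathrm{succ}[n] \tag{1}$$
Here, n denotes a block and succ[n] the set of its successor blocks; the symbol ⊇ indicates that the set on its right is a subset of the set on its left; and Out[n] - Def[n] denotes the registers that remain live through block n.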
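The decompilation module 101 solves these constraints with a fixed-point algorithm: starting from empty sets, it repeatedly applies the following equations to every block until no In or Out set changes:
$$\mathrm{In}[n] = \mathrm{Use}[n] \cup \bigl(\mathrm{Out}[n] - \mathrm{Def}[n]\bigr), \qquad \mathrm{Out}[n] = \bigcup_{s \in \mathrm{succ}[n]} \mathrm{In}[s] \tag{2}$$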
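After the fixed-point iteration converges, the intersection of the In set at the function entry with the argument registers specified by the calling convention gives the input-parameter registers, and the intersection of the Out set at the function exit with the return registers specified by the calling convention gives the possible return-value registers.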
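Equation (2) and the entry-side intersection can be sketched as a round-robin fixed-point solver. Assumptions: blocks are given as precomputed (Use, Def) pairs, the convention is taken to be the System V AMD64 ABI (its six integer argument registers, in order), and only the argument-register side is shown; the return-register side needs the exit block's Out set seeded by the convention, which is omitted here.

```python
# System V AMD64 ABI integer argument registers, in order (assumed convention).
SYSV_INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def solve_liveness(blocks, successors):
    """blocks: name -> (use, def); successors: name -> list of block names."""
    live_in = {n: set() for n in blocks}
    live_out = {n: set() for n in blocks}
    changed = True
    while changed:                      # iterate equation (2) until a fixed point
        changed = False
        for n, (use, defs) in blocks.items():
            out = set().union(*(live_in[s] for s in successors.get(n, [])))
            inn = use | (out - defs)
            if inn != live_in[n] or out != live_out[n]:
                live_in[n], live_out[n] = inn, out
                changed = True
    return live_in, live_out


def argument_registers(live_in, entry):
    """Intersect the entry block's In set with the convention's argument registers."""
    return [r for r in SYSV_INT_ARG_REGS if r in live_in[entry]]
```

Because every update only adds registers and the register set is finite, the loop terminates; keeping the convention's order in `argument_registers` also tells which parameter position each live register carries.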
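For stack analysis, the decompilation module 101 may analyze the instruction control flow with an algorithm based on the extended stack pointer register (RSP) or one based on the extended base pointer register (RBP).
The RSP-based analysis performed by the decompilation module 101 may include the following steps:
a. In the function prologue (the entry basic block), check whether RSP is offset, and record the offset value off;
The decompilation module 101 can detect the offset from sub or push instructions. It also records the registers associated with RSP.
b. Traverse all instructions in all blocks and find uses in which an operand has Operand Type kTypeAddress, action kActionRead, base_reg = RSP (or an associated register), and a positive displacement (dis). Such an operand is the ((dis - off)/8)-th stack parameter. Then count the total number of stack parameters, S.
c. For operands matching rule b, further determine the parameter type:
If the other register operand of the same instruction is an integer register (RXX) and the instruction is not a floating-point-to-integer conversion, the stack parameter is an integer. If the other register operand of the same instruction is a floating-point register (XMM) and the instruction is not an integer-to-floating-point conversion, the stack parameter is a floating-point value.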
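Steps a to c can be sketched on a toy instruction model. Assumptions: instructions are simplified `Ins`/`Mem` records rather than decoder output, slots are numbered from 1 as in the ((dis - off)/8) formula, the conversion mnemonic sets are a small sample, and registers copied from RSP are assumed to be copied after the prologue adjustment. The sketch also tightens rule b: a read is treated as a parameter only when dis - off is at least 8, because displacements below that point at locals or the return address even though they are positive.

```python
from dataclasses import dataclass

FLOAT_TO_INT = {"cvttsd2si", "cvtsd2si", "cvttss2si", "cvtss2si"}
INT_TO_FLOAT = {"cvtsi2sd", "cvtsi2ss"}


@dataclass(frozen=True)
class Mem:
    base: str        # base_reg
    disp: int        # displacement ("dis")
    read: bool = True


@dataclass(frozen=True)
class Ins:
    mnemonic: str
    ops: tuple       # register names (str), immediates (int) or Mem


def param_type(ins):
    """Step c: type from the other register operand of the same instruction."""
    regs = [o for o in ins.ops if isinstance(o, str)]
    if any(r.startswith(("r", "e")) for r in regs) and ins.mnemonic not in FLOAT_TO_INT:
        return "int"
    if any(r.startswith("xmm") for r in regs) and ins.mnemonic not in INT_TO_FLOAT:
        return "float"
    return "unknown"


def rsp_stack_params(blocks, entry="entry"):
    """Return {1-based stack-parameter slot: type}; S is the number of slots found."""
    off, rsp_aliases = 0, {"rsp"}
    for ins in blocks[entry]:                  # step a: prologue offset
        if ins.mnemonic == "sub" and ins.ops[0] == "rsp":
            off += ins.ops[1]
        elif ins.mnemonic == "push":
            off += 8
        elif ins.mnemonic == "mov" and ins.ops[1] == "rsp":
            rsp_aliases.add(ins.ops[0])        # register associated with RSP
    params = {}
    for block in blocks.values():              # step b: scan every block
        for ins in block:
            for op in ins.ops:
                if (isinstance(op, Mem) and op.read and op.base in rsp_aliases
                        and op.disp - off >= 8):
                    params[(op.disp - off) // 8] = param_type(ins)
    return params
```

With `push rbx; sub rsp, 32`, off is 40, so `[rsp+48]` is stack parameter 1 and `[rsp+56]` is stack parameter 2, while `[rsp+8]` is a local.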
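The RBP-based analysis performed by the decompilation module 101 may include the following steps:
a. Traverse all instructions in all blocks and find uses in which an operand has Operand Type kTypeAddress, action kActionRead, base_reg = RBP, and a positive dis. Such an operand is the ((dis - 8)/8)-th stack parameter. Count the total number of stack parameters, X.
b. For operands matching rule a, further determine the parameter type:
If the other register operand of the same instruction is an integer register (RXX) and the instruction is not a floating-point-to-integer conversion, the stack parameter is an integer. If the other register operand of the same instruction is a floating-point register (XMM) and the instruction is not an integer-to-floating-point conversion, the stack parameter is a floating-point value.
In some possible implementations, the decompilation module 101 runs both algorithms and takes the larger of the two parameter totals, max(S, X).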
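The RBP-based variant and the final max(S, X) can be sketched as follows. Assumptions: memory reads are pre-extracted as hypothetical `(base_reg, disp, other_reg, mnemonic)` tuples, the function uses the standard `push rbp; mov rbp, rsp` frame (so `[rbp+8]` is the return address and parameters start at `[rbp+16]`), and, as a design choice not stated in the text, each algorithm's total is taken as the highest slot it saw, so a parameter the body never reads still counts when a later one is read.

```python
FLOAT_TO_INT = {"cvttsd2si", "cvtsd2si", "cvttss2si", "cvtss2si"}
INT_TO_FLOAT = {"cvtsi2sd", "cvtsi2ss"}


def slot_type(other_reg, mnemonic):
    """Type rule shared by both algorithms (rule b above)."""
    if other_reg.startswith(("r", "e")) and mnemonic not in FLOAT_TO_INT:
        return "int"
    if other_reg.startswith("xmm") and mnemonic not in INT_TO_FLOAT:
        return "float"
    return "unknown"


def rbp_stack_params(mem_reads):
    """mem_reads: (base_reg, disp, other_reg, mnemonic) for every memory read."""
    params = {}
    for base, disp, other, mnemonic in mem_reads:
        # [rbp+8] holds the return address, so parameters start at [rbp+16],
        # which is slot (16 - 8) / 8 = 1.
        if base == "rbp" and disp >= 16:
            params[(disp - 8) // 8] = slot_type(other, mnemonic)
    return params


def stack_param_count(rsp_params, rbp_params):
    """max(S, X), with each total taken as the highest slot index seen."""
    return max(max(rsp_params, default=0), max(rbp_params, default=0))
```

For reads at `[rbp+16]` and `[rbp+32]`, slots 1 and 3 are found, so the function is assumed to take at least three stack parameters.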
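After obtaining the number and types of parameters, the decompilation module 101 adjusts where the parameters are stored according to the differences between the calling conventions. Specifically, it performs cross-platform processing of the input-parameter registers and the stack, for example pushing some register parameters onto the stack or switching the stack pointer, so that the runtime view of the input-parameter registers and stack space is the same on both platforms.
For ease of understanding, a specific example follows.
FIG. 5 is a schematic diagram of adjusting function parameters. In this example, function test has 10 parameters, i0 to i9. In the x86 runtime, parameters i0 to i5 are stored in registers and i6 to i9 on the stack. In the ARM runtime, parameters i0 to i7 are stored in registers and i8 to i9 on the stack. The decompilation module 101 can push parameters i6 and i7 onto the stack and switch the stack pointer, so that the runtime view of the input-parameter registers and stack space is the same on both platforms.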
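The FIG. 5 adjustment can be reproduced by comparing the two conventions' integer argument registers. Assumptions: all ten parameters are integers, the source convention is the System V AMD64 ABI (six argument registers) and the target is AAPCS64 (x0 to x7), only the direction in which the target has more argument registers is handled, and the return-address slot that x86 code expects is left to the stack-pointer switch. All function names are hypothetical.

```python
X86_64_INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]   # System V AMD64
AARCH64_INT_ARG_REGS = [f"x{i}" for i in range(8)]               # AAPCS64


def locations(n_params, arg_regs):
    """Where each integer parameter lives under a convention: register or stack slot."""
    return [arg_regs[i] if i < len(arg_regs) else f"stack[{i - len(arg_regs)}]"
            for i in range(n_params)]


def params_to_push(n_params, src_regs, dst_regs):
    """Parameters the target passes in registers but source code reads from the stack."""
    return [i for i in range(n_params) if len(src_regs) <= i < len(dst_regs)]


def source_stack_view(n_params, src_regs, dst_regs):
    """Stack contents, lowest address first, after pushing in reverse order."""
    pushed = [f"i{i}" for i in params_to_push(n_params, src_regs, dst_regs)]
    already_on_stack = [f"i{i}" for i in range(len(dst_regs), n_params)]
    return pushed + already_on_stack
```

For function test, `params_to_push(10, ...)` gives i6 and i7; pushing i7 then i6 in front of the caller-placed i8 and i9 yields the stack layout the x86 code expects.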
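By using the compiler's live-register analysis and stack analysis, this method obtains the exact input-parameter registers and stack parameters, reducing unnecessary register conversions between calling conventions.
Further, when the source and target platforms are of different types, such as an x86 platform and an ARM platform, the decompilation module 101 may obtain annotation information of the first code before decompiling it. The annotation information may include any one or more of the types and number of parameters in the first code and the kinds of jump addresses (jumps inside or outside the assembly code, and so on). When decompiling the first code, the decompilation module 101 can then determine the types and number of parameters and the jump address types in the compiler IR according to the annotation information. The annotation information may be generated when the assembly-language code is compiled and carries information about the assembly code. Taking parameter types as an example: if a parameter in the first code is 64 bits long, the parameter in the first IR obtained by decompilation can remain 64 bits (for example, a floating-point type) rather than being changed to 32 bits (for example, an integer type). Decompiling according to the annotation information thus keeps parameter types consistent before and after decompilation and reduces the likelihood of logic errors in the first IR.
Of course, in other possible implementations, the decompilation module 101 need not consider annotation information. For example, when the source and target platforms are of the same type, they are highly similar and differ little; for assembly instructions with the same semantics, the two platforms may differ only in instruction format. In that case, the decompilation module 101 can decompile the first code directly without its annotation information.
Further, the instruction semantics library used to translate instruction strings may also contain vectorized instruction semantics, such as SIMD instruction semantics (a SIMD instruction performs the same operation on each element of a group of data simultaneously, achieving spatial parallelism). These vectorized semantics can be used to translate some instructions of the first code in vectorized form, producing a vectorized IR for those instructions.
Vectorized code (instructions) can usually replace loop structures, which makes program code more concise and more efficient to execute. For example, suppose the first code contains a SIMD instruction that sums multiple data items. If the decompilation module 101 does not vectorize this SIMD instruction, the corresponding execution in the resulting first IR is a serial operation that reads the data items one by one and accumulates them. If it does vectorize the SIMD instruction, the corresponding execution in the first IR reads all data items of the group and sums them in parallel.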
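The difference between the serial IR and the vectorized IR can be illustrated in plain Python. This is only a model of the data flow, not real SIMD: Python runs both versions one element at a time. The 4-lane width and the function names are assumptions; `lane_sum` mimics what a vector add followed by a horizontal reduction computes.

```python
def serial_sum(data):
    """What the non-vectorized IR does: read and accumulate one item at a time."""
    total = 0
    for x in data:
        total += x
    return total


def lane_sum(data, lanes=4):
    """What the vectorized IR expresses: one add updates all lanes at once."""
    acc = [0] * lanes
    n = len(data) - len(data) % lanes
    for i in range(0, n, lanes):
        # One "vector add": every lane applies the same operation to its element.
        acc = [a + b for a, b in zip(acc, data[i:i + lanes])]
    # Horizontal reduction of the lanes, plus a scalar tail for leftover items.
    return sum(acc) + sum(data[n:])
```

Both functions return the same total (45 for `list(range(10))`); on SIMD hardware the vectorized form performs about a quarter as many add instructions for 4 lanes.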
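Taking first code that contains SIMD instructions as an example, when translating the first code, the decompilation module 101 translates the other instructions into IR containing the corresponding functions according to their semantics, and translates the SIMD instructions in vectorized form to obtain a first IR with vectorization markers. In practice, the vectorization marker may be a special symbol in the first IR, such as "^", "!", or "<". For example, technical personnel may add the semantics of the source platform's SIMD instructions to the imported instruction semantics library in advance, so that during decompilation the decompilation module 101 can identify the semantics of the SIMD instructions in the first code from the library.
The decompilation module 101 may translate the SIMD instructions in the first code into IR with vectorization markers either directly or indirectly.
In a direct vectorization implementation, the decompilation module 101 generates third code that is based on a low-level language and applies to the target platform, for example low-level code containing the target platform's intrinsic functions. Intrinsic functions wrap language extensions or platform-specific capabilities and are declared in header files of high-level languages such as C/C++, so the generated target-specific third code can describe the vectorized semantics of the source platform's SIMD instructions. The decompilation module 101 then decompiles the third code to obtain a first IR, with vectorization markers, that corresponds to the target platform's SIMD instructions. When the compiler later compiles this first IR, it produces the target platform's SIMD instructions.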
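The direct path can be sketched as a table that rewrites an x86 SSE instruction into a statement using an AArch64 NEON intrinsic with the same per-lane semantics. Assumptions: the target is ARMv8 with NEON, the table holds only four sample entries, operands use the Intel-syntax two-operand form (`dst = dst op src`), and `emit_third_code` is a hypothetical helper; the intrinsic names themselves are standard NEON intrinsics.

```python
# Sample mapping: SSE instruction -> NEON intrinsic with the same lane semantics
# (4 x 32-bit lanes in a 128-bit register). Far from complete.
SSE_TO_NEON = {
    "addps": "vaddq_f32",   # 4 x float32 add
    "subps": "vsubq_f32",   # 4 x float32 subtract (dst - src)
    "mulps": "vmulq_f32",   # 4 x float32 multiply
    "paddd": "vaddq_s32",   # 4 x int32 add
}


def emit_third_code(mnemonic, dst, src):
    """Emit one C statement using a target intrinsic for "mnemonic dst, src"."""
    intrinsic = SSE_TO_NEON.get(mnemonic.lower())
    if intrinsic is None:
        raise KeyError(f"no vector mapping for {mnemonic}")
    return f"{dst} = {intrinsic}({dst}, {src});"
```

In a real flow, the emitted statements would sit in a C file that includes `<arm_neon.h>`, with the operands declared as NEON vector types such as `float32x4_t`; loads, stores, and shuffles would need their own entries.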
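In an indirect vectorization implementation, the decompilation module 101 generates fourth code that is based on a high-level language and independent of the target platform, describing the vectorized semantics of the SIMD instructions. The decompilation module 101 then decompiles the high-level fourth code to obtain a platform-independent first IR with vectorization markers. The compiler can later auto-vectorize this first IR when compiling it, generating the target platform's SIMD instructions.
S203: Compile the first IR obtained by decompilation into second code that is based on a low-level language and applies to the target platform, where the source platform and the target platform have different instruction sets.
Specifically, the compilation module 102 compiles the first IR into second code that can run on the target platform. The second code is low-level code supported by the target platform, such as the target platform's assembly code.
The IR obtained by decompiling the first code usually leaves room for optimization. For example, if the decompilation module 101 does not decompile the first code based on the target platform's capabilities, then, because platform capabilities usually differ, it may decompile based on the lowest capabilities shared by multiple platforms so that the resulting IR applies to all of them. The decompilation module 101 can then optimize the IR according to the higher capabilities of the target platform, so that the code corresponding to the IR executes more efficiently on the target platform. Platform capabilities may include the supported data read speed, memory access modes, and so on.
For example, suppose some existing platforms can read 64 bits of data at a time and others can read 128 bits at a time. The IR obtained by the decompilation module 101 may then be the code shown in the upper part of FIG. 6. If the target platform executes this code reading only 64 bits per access, even though it can actually read 128 bits at once, it needs two reads to fetch 128 bits of data, which lowers execution efficiency. The decompilation module 101 can therefore use the target platform's ability to read 128 bits at once to optimize the code in the upper part of FIG. 6 into the code shown in the lower part, so that the target platform reads 128 bits per access.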
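The FIG. 6 optimization can be sketched as a peephole pass over a toy IR. Assumptions: the IR is a list of hypothetical `Load` records (bit width, base register, byte offset, destinations), the pass only fuses two consecutive 64-bit loads from the same base at offsets o and o + 8, and alignment is ignored, which presumes the target tolerates unaligned 128-bit loads.

```python
from typing import NamedTuple


class Load(NamedTuple):
    bits: int        # width of one access
    base: str        # base register / pointer
    offset: int      # byte offset from base
    dest: tuple      # IR values defined by this load


def widen_loads(ir, max_load_bits):
    """Fuse adjacent 64-bit loads into one 128-bit load when the target allows it."""
    if max_load_bits < 128:
        return list(ir)                      # lowest-common-capability IR stays as-is
    out, i = [], 0
    while i < len(ir):
        a = ir[i]
        b = ir[i + 1] if i + 1 < len(ir) else None
        if (isinstance(a, Load) and isinstance(b, Load)
                and a.bits == b.bits == 64 and a.base == b.base
                and b.offset == a.offset + 8):
            out.append(Load(128, a.base, a.offset, a.dest + b.dest))
            i += 2
        else:
            out.append(a)
            i += 1
    return out
```

Passing the target's maximum load width as a parameter mirrors the idea that the same platform-neutral IR is optimized differently per target.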
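Accordingly, in a further possible implementation, the decompilation module 101 decompiles the first code into a second IR and then optimizes the second IR for the target platform to obtain the first IR. For example, after obtaining the second IR, the decompilation module 101 determines the semantics of each instruction string (statement) in it, finds the compilation optimization rule for each semantics from a correspondence between semantics and compilation optimization rules, and optimizes the second IR according to those rules to obtain the first IR. The compilation module 102 then compiles the optimized first IR.
Compiling the first IR produces a binary file in ".obj" or another format, from which assembly code for the target platform can be generated. The assembly code obtained through this process applies to the target platform and can run on it successfully, completing the port of the source platform's code to the target platform.
In this embodiment, the target platform may be any platform other than the source platform; that is, the decompilation module 101 and compilation module 102 can transform the first code into low-level code for any platform. For ease of description, the target platform above is hereinafter called the first target platform. Through the process above, the decompilation module 101 and compilation module 102 can transform the first code of the source platform into low-level code for the first target platform and, through a similar process, transform it into low-level code for a second target platform, where the second target platform, the first target platform, and the source platform are all different, specifically having different instruction sets.
In practice, the target platform to which the software code is ported may be determined according to user requirements. In an exemplary implementation, a target platform selection interface, such as the one shown in FIG. 7, may be presented to the user on a terminal's display. The interface offers multiple candidate target platforms, shown in FIG. 7 as target platform 1, target platform 2, ..., target platform N (N is a positive integer greater than 1). On this interface, the user selects a target platform from the candidates according to actual needs, which determines the first or second target platform; the first code is then transformed into low-level code for the first or second target platform through the process above. For example, the user may click a drop-down menu button to display the target platforms, move the cursor to the desired one, and click to select it, thereby determining the first target platform.
Further, for the target platform the user selects, the selection interface may also present related information about it, such as, as shown in FIG. 7, its data processing capability, applicable hardware types, and required hardware environment.
In addition, after the low-level code for the first target platform is obtained through decompilation and compilation, prompt information indicating the differences between the source platform and the first target platform may be generated for the code and presented on the interface where the user selects the target platform. For example, the user may be shown the correspondence between instructions in the first code and instructions in the code for the first target platform, with corresponding instructions of the two platforms highlighted in a specific color. In practice, the prompt information may also be presented on other interfaces; this embodiment does not limit how or where it is presented.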
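For ease of understanding, the technical solution of the embodiments is described below from the perspective of human-computer interaction. FIG. 8 is a schematic flowchart of interaction between a user 801 and a computing device 802. The computing device 802 may be a cloud device, such as a cloud server, or a local terminal or server. As shown in FIG. 8, the process may include the following steps:
S801: The computing device 802 presents a target platform selection interface to the user 801, offering multiple candidate target platforms.
In this embodiment, the computing device 802 can port software code to multiple different target platforms. It first presents the target platform selection interface to the user, listing the target platforms it supports, for the user to choose from. Different platforms have at least different instruction sets.
S802: The computing device 802 determines the first target platform from the multiple target platforms according to the user's selection operation.
S803: The user 801 sends the computing device 802 the first code, which is based on a low-level language and applies to the source platform.
The user 801 may send the first code to the computing device 802 through a medium such as a terminal or a client.
In this embodiment, the first code may be a binary file in ".obj" format or another format. In other embodiments, the user may instead send the source platform's assembly code; after receiving it, the computing device 802 first assembles it into a binary file in ".obj" or another format.
S804: The computing device 802 presents second code on a code display interface, the second code being low-level code that applies to the first target platform.
The second code is obtained by processing the first code; specifically, the computing device 802 obtains it by decompiling and compiling the first code. For details, see the related descriptions in the foregoing embodiments; they are not repeated here.
After obtaining the second code, the computing device 802 presents it on the corresponding code display interface so that the user 801 can inspect the resulting second code.
In this embodiment, the second code may be a binary file in ".obj" or another format. In other possible embodiments, the second code may be code in another language applicable to the first target platform, for example assembly code for the first target platform, obtained by converting the ".obj" file generated by decompiling and compiling the first code.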
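S805: The computing device 802 presents prompt information.
In this embodiment, the computing device 802 may prompt the user 801, for example with the correspondence between instructions in the first code and instructions in the second code, or with problems that may arise during code porting. This embodiment does not limit the content or implementation of the prompt information.
S806: The user 801 modifies the second code presented by the computing device 802.
For example, after reviewing the second code or the prompt information, the user 801 may change the second code; if the prompt information or the review reveals a logic flaw in the second code, the user can modify the second code to fix it.
S807: The computing device 802 presents the modified second code.
In practice, the user 801 may continue to modify the modified second code until it meets the user's expectations.
S808: The computing device 802 presents the first IR or the second IR.
While decompiling the first code, the computing device 802 may first obtain a second IR and then further optimize it to obtain the first IR. The computing device 802 can present either the first IR or the second IR obtained during decompilation to the user 801.
The user 801 can then inspect and debug the presented first IR or second IR, and the computing device 802 obtains the corresponding low-level code for the first target platform from the debugged IR.
In a further possible implementation, this embodiment may also include:
S809: The user 801 modifies the instruction semantics library for the source platform.
During decompilation, the computing device 802 usually decompiles the first code according to the source platform's instruction semantics library. After reviewing the second code, the prompt information, or the IR, the user can adjust the library used for decompilation, for example by adding the semantics of the source platform's SIMD instructions, so that decompiling the first code with the adjusted library yields better IR or better code for the first target platform.
Note that for the decompilation, compilation, and related processing of the first code by the computing device 802 in this embodiment, see the related descriptions in the foregoing embodiments; they are not repeated here.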
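The code processing method provided in this application has been described in detail above with reference to FIG. 1 to FIG. 8. The computing apparatus provided in this application is described below with reference to FIG. 9 and FIG. 10.
Based on the same inventive concept as the method above, an embodiment of this application further provides a computing apparatus that can implement the functions of the computing device in the embodiment shown in FIG. 2. As shown in FIG. 9, the apparatus 900 may include:
an obtaining module 901, configured to obtain first code that is based on a low-level language and applies to a source platform;
a decompilation module 902, configured to decompile the first code to obtain a first intermediate representation (IR); and
a compilation module 903, configured to compile the first IR into second code that is based on a low-level language and applies to a first target platform, where the source platform and the first target platform have different instruction sets.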
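In a possible implementation, the decompilation module 902 is further configured to decompile the first code to obtain an IR corresponding to a second target platform, where the first IR differs from the IR corresponding to the second target platform, and the first target platform and the second target platform have different instruction sets.
In a possible implementation, the apparatus 900 further includes:
a presentation module 904, configured to present a target platform selection interface; and
a determining module 905, configured to determine the first target platform from multiple target platforms in response to a selection operation on a target platform.
In a possible implementation, the decompilation module 902 is specifically configured to:
obtain annotation information of the first code, the annotation information including any one or more of the types of parameters in the first code, the number of parameters, and the kinds of jump addresses; and
decompile the first code according to the annotation information to obtain the first IR.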
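In a possible implementation, the decompilation module 902 is specifically configured to:
decompile the first code to obtain a second IR; and
optimize the second IR according to the first target platform to obtain the first IR.
In a possible implementation, the apparatus 900 further includes:
a generation module 906, configured to generate prompt information, the prompt information being used to prompt about items to be checked, where the items to be checked are generated based on differences between the first target platform and the source platform; and
the presentation module 904, configured to present the prompt information.
In a possible implementation, the apparatus is deployed in the cloud, and the obtaining module 901 is specifically configured to receive the first code from a user;
the apparatus 900 further includes a communication module 907, configured to send the user the second code, which is based on a low-level language and applies to the first target platform.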
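In a possible implementation, the decompilation module 902 is specifically configured to decompile the first code according to an instruction semantics library corresponding to the source platform.
In a possible implementation, the apparatus 900 further includes:
a modification module 908, configured to modify the instruction semantics library in response to a modification operation on the instruction semantics library.
In a possible implementation, the obtaining module 901 is specifically configured to obtain the first code, which is based on a low-level language and applies to the source platform, together with variables based on a high-level language;
the decompilation module 902 is specifically configured to translate the first code into a first IR that includes a function, where the function includes formal parameters whose actual arguments are the variables.
In a possible implementation, the first IR includes a first variable and a second variable, the first variable having a first logical address and the second variable having a second logical address different from the first logical address.
In a possible implementation, the decompilation module 902 is specifically configured to decompile the first code according to the function calling convention or the single instruction, multiple data (SIMD) instructions of the target platform.
The computing apparatus 900 in this embodiment corresponds to the code processing method shown in FIG. 2. For the specific implementation and technical effects of its functional modules, see the related descriptions in the embodiment shown in FIG. 2; they are not repeated here.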
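In addition, an embodiment of this application provides another computing apparatus that can implement the functions of the computing device 802 in the embodiment shown in FIG. 8. As shown in FIG. 10, the apparatus 1000 may include:
an obtaining module 1001, configured to obtain first code that is based on a low-level language and applies to a source platform; and
an output module 1002, configured to output second code, the second code being low-level code that applies to a first target platform and is obtained by processing the first code, where the source platform and the first target platform have different instruction sets.
In a possible implementation, the output module 1002 is specifically configured to present the second code on a code display interface.
In a possible implementation, the apparatus is deployed in the cloud, and the obtaining module 1001 is specifically configured to receive the first code from a user;
the output module 1002 is specifically configured to output the second code to the user.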
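In a possible implementation, the apparatus 1000 further includes:
a presentation module 1003, configured to present a target platform selection interface; and
a determining module 1004, configured to determine the first target platform from multiple target platforms in response to a selection operation on a target platform.
In a possible implementation, the obtaining module 1001 is further configured to obtain an instruction semantics library corresponding to the source platform, the instruction semantics library being used to process the first code.
In a possible implementation, the apparatus 1000 further includes:
a modification module 1005, configured to modify the instruction semantics library in response to a modification operation on the instruction semantics library.
In a possible implementation, the apparatus 1000 further includes:
a generation module 1006, configured to generate prompt information, the prompt information being used to prompt about items to be checked, where the items to be checked are generated based on differences between the first target platform and the source platform; and
the presentation module 1003, configured to present the prompt information.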
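In a possible implementation, the apparatus 1000 further includes:
the presentation module 1003, configured to present a first intermediate representation (IR), where the first IR is obtained by decompiling the first code and the second code is obtained by compiling the first IR.
In a possible implementation, the presentation module 1003 is further configured to present a second IR, where the second IR is obtained by decompiling the first code and the first IR is obtained by optimizing the second IR according to the first target platform.
In a possible implementation, the apparatus 1000 further includes:
the modification module 1005, configured to obtain a modified first IR in response to a modification operation on the first IR;
the presentation module 1003 is further configured to present third code, the third code being low-level code that applies to the first target platform and is obtained by compiling the modified first IR.
In a possible implementation, the presentation module 1003 is further configured to present a third IR, where the third IR is obtained by decompiling the first code and is used to generate low-level code for a second target platform, and the third IR differs from the first IR.
In a possible implementation, the apparatus 1000 further includes:
the modification module 1005, configured to obtain modified second code in response to a modification operation on the second code;
the output module 1002 is further configured to output the modified second code.
In a possible implementation, the obtaining module 1001 is specifically configured to obtain the first code, which is based on a low-level language and applies to the source platform, together with variables based on a high-level language.
The computing apparatus 1000 in this embodiment corresponds to the code processing method shown in FIG. 8. For the specific implementation and technical effects of its functional modules, see the related descriptions in the embodiment shown in FIG. 8; they are not repeated here.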
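In addition, an embodiment of this application provides a computing apparatus. As shown in FIG. 11, the apparatus 1100 may include a communication interface 1110 and a processor 1120, and optionally a memory 1130, which may be inside or outside the apparatus 1100. For example, each action in the embodiment shown in FIG. 2 may be implemented by the processor 1120. The processor 1120 may obtain the first code for the source platform through the communication interface 1110 and implement any method performed in FIG. 2. During implementation, the steps of the processing flow may be completed by integrated logic circuits of hardware in the processor 1120 or by instructions in the form of software to perform the method in FIG. 2. For brevity, details are not repeated here. The program code executed by the processor 1120 to implement the method may be stored in the memory 1130, which is connected to the processor 1120, for example by coupling.
Some features of the embodiments of this application may be implemented or supported by the processor 1120 executing program instructions or software code in the memory 1130. The software components loaded in the memory 1130 can be summarized functionally or logically as, for example, the obtaining module 901, decompilation module 902, compilation module 903, presentation module 904, determining module 905, generation module 906, and modification module 908 shown in FIG. 9. The function of the communication module 907 may be implemented by the communication interface 1110.
Any communication interface in the embodiments of this application may be a circuit, a bus, a transceiver, or any other apparatus usable for information exchange. For example, the communication interface 1110 in the apparatus 1100 exchanges information with another apparatus, which may be a device connected to the apparatus 1100, such as a user terminal that provides the first code.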
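In addition, an embodiment of this application provides a computing apparatus. As shown in FIG. 12, the apparatus 1200 may include a communication interface 1210 and a processor 1220, and optionally a memory 1230, which may be inside or outside the apparatus 1200. For example, each action in the embodiment shown in FIG. 8 may be implemented by the processor 1220. The processor 1220 may obtain the first code for the source platform through the communication interface 1210 and implement any method performed in FIG. 8. During implementation, the steps of the processing flow may be completed by integrated logic circuits of hardware in the processor 1220 or by instructions in the form of software to perform the method in FIG. 8. For brevity, details are not repeated here. The program code executed by the processor 1220 to implement the method may be stored in the memory 1230, which is connected to the processor 1220, for example by coupling.
Some features of the embodiments of this application may be implemented or supported by the processor 1220 executing program instructions or software code in the memory 1230. The software components loaded in the memory 1230 can be summarized functionally or logically as, for example, the obtaining module 1001, output module 1002, presentation module 1003, determining module 1004, modification module 1005, and generation module 1006 shown in FIG. 10.
Any communication interface in the embodiments of this application may be a circuit, a bus, a transceiver, or any other apparatus usable for information exchange. For example, the communication interface 1210 in the apparatus 1200 exchanges information with another apparatus, which may be a device connected to the apparatus 1200, such as a user terminal that provides the first code.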
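The processor in the embodiments of this application may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware processor or by a combination of hardware and software modules in the processor.
Coupling in the embodiments of this application is an indirect coupling or communication connection between apparatuses or modules, which may be electrical, mechanical, or in another form, and is used for information exchange between the apparatuses or modules.
The processor may operate in cooperation with the memory. The memory may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as random-access memory (RAM). The memory may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The embodiments of this application do not limit the specific connection medium among the communication interface, the processor, and the memory. For example, the memory, the processor, and the communication interface may be connected through a bus, which may be classified into an address bus, a data bus, a control bus, and so on.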
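Based on the foregoing embodiments, an embodiment of this application further provides a computer storage medium storing a software program. When read and executed by one or more processors, the software program can implement the method performed by the proxy edge device, the edge device, or the cloud center provided in any one or more of the foregoing embodiments. The computer storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
Based on the foregoing embodiments, an embodiment of this application further provides a chip. The chip includes a processor configured to implement the functions of the proxy edge device, the edge device, or the cloud center in the foregoing embodiments, for example the methods performed by the proxy edge device, the edge device, or the cloud center in FIG. 3 to FIG. 4. Optionally, the chip further includes a memory for storing the program instructions and data required by the processor. The chip may consist of a chip alone, or may include a chip and other discrete components.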
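A person skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of a hardware-only embodiment, a software-only embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This application is described with reference to the flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of this application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; they are merely a way of distinguishing objects with the same attributes when describing the embodiments of this application.
Clearly, a person skilled in the art can make various modifications and variations to the embodiments of this application without departing from their scope. This application is intended to cover these modifications and variations provided that they fall within the scope of the claims of this application and their equivalent technologies.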
Claims (56)
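- A code processing method, wherein the method comprises: obtaining first code that is based on a low-level language and applies to a source platform; decompiling the first code to obtain a first intermediate representation (IR); and compiling the first IR into second code that is based on a low-level language and applies to a first target platform, wherein the source platform and the first target platform have different instruction sets.
- The method according to claim 1, wherein the method further comprises: decompiling the first code to obtain an IR corresponding to a second target platform, wherein the first IR is different from the IR corresponding to the second target platform, and the first target platform and the second target platform have different instruction sets.
- The method according to claim 1 or 2, wherein the method further comprises: presenting a target platform selection interface; and determining the first target platform from multiple target platforms in response to a selection operation on a target platform.
- The method according to any one of claims 1 to 3, wherein the decompiling the first code to obtain a first intermediate representation IR comprises: obtaining annotation information of the first code, wherein the annotation information comprises any one or more of types of parameters in the first code, a number of parameters in the first code, and kinds of jump addresses; and decompiling the first code according to the annotation information to obtain the first IR.
- The method according to any one of claims 1 to 4, wherein the decompiling the first code to obtain a first intermediate representation IR comprises: decompiling the first code to obtain a second IR; and optimizing the second IR according to the first target platform to obtain the first IR.
- The method according to any one of claims 1 to 5, wherein the method further comprises: generating prompt information, wherein the prompt information is used to prompt about items to be checked, and the items to be checked are generated based on differences between the first target platform and the source platform; and presenting the prompt information.
- The method according to any one of claims 1 to 6, wherein the method is applied to a cloud, and the obtaining first code comprises: receiving the first code from a user; and the method further comprises: sending the second code to the user.
- The method according to any one of claims 1 to 7, wherein the decompiling the first code comprises: decompiling the first code according to an instruction semantics library corresponding to the source platform.
- The method according to claim 8, wherein the method further comprises: modifying the instruction semantics library in response to a modification operation on the instruction semantics library.
- The method according to any one of claims 1 to 9, wherein the obtaining first code that is based on a low-level language and applies to a source platform comprises: obtaining the first code that is based on a low-level language and applies to the source platform, together with variables based on a high-level language; and the decompiling the first code comprises: translating the first code into a first IR that comprises a function, wherein the function comprises formal parameters, and the actual arguments corresponding to the formal parameters are the variables.
- The method according to any one of claims 1 to 10, wherein the first IR comprises a first variable and a second variable, the first variable has a first logical address, the second variable has a second logical address, and the first logical address is different from the second logical address.
- The method according to any one of claims 1 to 11, wherein the decompiling the first code comprises: decompiling the first code according to a function calling convention or single instruction, multiple data (SIMD) instructions of the target platform.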
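- A code processing method, wherein the method comprises: obtaining first code that is based on a low-level language and applies to a source platform; and outputting second code, wherein the second code is code that is based on a low-level language and applies to a first target platform, the second code is obtained by processing the first code, and the source platform and the first target platform have different instruction sets.
- The method according to claim 13, wherein the outputting second code comprises: presenting the second code on a code display interface.
- The method according to claim 13 or 14, wherein the method is applied to a cloud, the obtaining first code comprises: receiving the first code from a user; and the outputting the second code comprises: outputting the second code to the user.
- The method according to any one of claims 13 to 15, wherein the method further comprises: presenting a target platform selection interface; and determining the first target platform from multiple target platforms in response to a selection operation on a target platform.
- The method according to any one of claims 13 to 16, wherein the method further comprises: obtaining an instruction semantics library corresponding to the source platform, wherein the instruction semantics library is used to process the first code.
- The method according to claim 17, wherein the method further comprises: modifying the instruction semantics library in response to a modification operation on the instruction semantics library.
- The method according to any one of claims 13 to 18, wherein the method further comprises: generating prompt information, wherein the prompt information is used to prompt about items to be checked, and the items to be checked are generated based on differences between the first target platform and the source platform; and presenting the prompt information.
- The method according to any one of claims 13 to 19, wherein the method further comprises: presenting a first intermediate representation (IR), wherein the first IR is obtained by decompiling the first code, and the second code is obtained by compiling the first IR.
- The method according to claim 20, wherein the method further comprises: presenting a second IR, wherein the second IR is obtained by decompiling the first code, and the first IR is obtained by optimizing the second IR according to the first target platform.
- The method according to claim 20 or 21, wherein the method further comprises: obtaining a modified first IR in response to a modification operation on the first IR; and presenting third code, wherein the third code is code that is based on a low-level language and applies to the first target platform, and the third code is obtained by compiling the modified first IR.
- The method according to any one of claims 20 to 22, wherein the method further comprises: presenting a third IR, wherein the third IR is obtained by decompiling the first code, the third IR is used to generate code that is based on a low-level language and applies to a second target platform, and the third IR is different from the first IR.
- The method according to any one of claims 13 to 23, wherein the method further comprises: obtaining modified second code in response to a modification operation on the second code; and outputting the modified second code.
- The method according to any one of claims 13 to 24, wherein the obtaining first code that is based on a low-level language and applies to a source platform comprises: obtaining the first code that is based on a low-level language and applies to the source platform, together with variables based on a high-level language.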
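- A code processing apparatus, wherein the apparatus comprises: an obtaining module, configured to obtain first code that is based on a low-level language and applies to a source platform; a decompilation module, configured to decompile the first code to obtain a first intermediate representation (IR); and a compilation module, configured to compile the first IR into second code that is based on a low-level language and applies to a first target platform, wherein the source platform and the first target platform have different instruction sets.
- The apparatus according to claim 26, wherein the decompilation module is further configured to decompile the first code to obtain an IR corresponding to a second target platform, wherein the first IR is different from the IR corresponding to the second target platform, and the first target platform and the second target platform have different instruction sets.
- The apparatus according to claim 26 or 27, wherein the apparatus further comprises: a presentation module, configured to present a target platform selection interface; and a determining module, configured to determine the first target platform from multiple target platforms in response to a selection operation on a target platform.
- The apparatus according to any one of claims 26 to 28, wherein the decompilation module is specifically configured to: obtain annotation information of the first code, wherein the annotation information comprises any one or more of types of parameters in the first code, a number of parameters in the first code, and kinds of jump addresses; and decompile the first code according to the annotation information to obtain the first IR.
- The apparatus according to any one of claims 26 to 29, wherein the decompilation module is specifically configured to: decompile the first code to obtain a second IR; and optimize the second IR according to the first target platform to obtain the first IR.
- The apparatus according to any one of claims 26 to 30, wherein the apparatus further comprises: a generation module, configured to generate prompt information, wherein the prompt information is used to prompt about items to be checked, and the items to be checked are generated based on differences between the first target platform and the source platform; and a presentation module, configured to present the prompt information.
- The apparatus according to any one of claims 26 to 31, wherein the apparatus is applied to a cloud, and the obtaining module is specifically configured to receive the first code from a user; and the apparatus further comprises: a communication module, configured to send the user the second code that is based on a low-level language and applies to the first target platform.
- The apparatus according to any one of claims 26 to 32, wherein the decompilation module is specifically configured to decompile the first code according to an instruction semantics library corresponding to the source platform.
- The apparatus according to claim 33, wherein the apparatus further comprises: a modification module, configured to modify the instruction semantics library in response to a modification operation on the instruction semantics library.
- The apparatus according to any one of claims 26 to 34, wherein the obtaining module is specifically configured to obtain the first code that is based on a low-level language and applies to the source platform, together with variables based on a high-level language; and the decompilation module is specifically configured to translate the first code into a first IR that comprises a function, wherein the function comprises formal parameters, and the actual arguments corresponding to the formal parameters are the variables.
- The apparatus according to any one of claims 26 to 35, wherein the first IR comprises a first variable and a second variable, the first variable has a first logical address, the second variable has a second logical address, and the first logical address is different from the second logical address.
- The apparatus according to any one of claims 26 to 36, wherein the decompilation module is specifically configured to decompile the first code according to a function calling convention or single instruction, multiple data (SIMD) instructions of the target platform.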
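- A code processing apparatus, wherein the apparatus comprises: an obtaining module, configured to obtain first code that is based on a low-level language and applies to a source platform; and an output module, configured to output second code, wherein the second code is code that is based on a low-level language and applies to a first target platform, the second code is obtained by processing the first code, and the source platform and the first target platform have different instruction sets.
- The apparatus according to claim 38, wherein the output module is specifically configured to present the second code on a code display interface.
- The apparatus according to claim 38 or 39, wherein the apparatus is applied to a cloud, the obtaining module is specifically configured to receive the first code from a user; and the output module is specifically configured to output the second code to the user.
- The apparatus according to any one of claims 38 to 40, wherein the apparatus further comprises: a presentation module, configured to present a target platform selection interface; and a determining module, configured to determine the first target platform from multiple target platforms in response to a selection operation on a target platform.
- The apparatus according to any one of claims 38 to 41, wherein the obtaining module is further configured to obtain an instruction semantics library corresponding to the source platform, wherein the instruction semantics library is used to process the first code.
- The apparatus according to claim 42, wherein the apparatus further comprises: a modification module, configured to modify the instruction semantics library in response to a modification operation on the instruction semantics library.
- The apparatus according to any one of claims 38 to 43, wherein the apparatus further comprises: a generation module, configured to generate prompt information, wherein the prompt information is used to prompt about items to be checked, and the items to be checked are generated based on differences between the first target platform and the source platform; and a presentation module, configured to present the prompt information.
- The apparatus according to any one of claims 38 to 44, wherein the apparatus further comprises: a presentation module, configured to present a first intermediate representation (IR), wherein the first IR is obtained by decompiling the first code, and the second code is obtained by compiling the first IR.
- The apparatus according to claim 45, wherein the presentation module is further configured to present a second IR, wherein the second IR is obtained by decompiling the first code, and the first IR is obtained by optimizing the second IR according to the first target platform.
- The apparatus according to claim 45 or 46, wherein the apparatus further comprises: a modification module, configured to obtain a modified first IR in response to a modification operation on the first IR; and the presentation module is further configured to present third code, wherein the third code is code that is based on a low-level language and applies to the first target platform, and the third code is obtained by compiling the modified first IR.
- The apparatus according to any one of claims 45 to 47, wherein the presentation module is further configured to present a third IR, wherein the third IR is obtained by decompiling the first code, the third IR is used to generate code that is based on a low-level language and applies to a second target platform, and the third IR is different from the first IR.
- The apparatus according to any one of claims 38 to 48, wherein the apparatus further comprises: a modification module, configured to obtain modified second code in response to a modification operation on the second code; and the output module is further configured to output the modified second code.
- The apparatus according to any one of claims 38 to 49, wherein the obtaining module is specifically configured to obtain the first code that is based on a low-level language and applies to the source platform, together with variables based on a high-level language.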
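- A computing apparatus, wherein the apparatus comprises a memory and a processor, the memory is configured to store software instructions, and the processor invokes the software instructions stored in the memory to perform the method according to any one of claims 1 to 12.
- A computing apparatus, wherein the apparatus comprises a memory and a processor, the memory is configured to store software instructions, and the processor invokes the software instructions stored in the memory to perform the method according to any one of claims 13 to 25.
- A computer-readable storage medium, comprising instructions, wherein the instructions are used to implement the method according to any one of claims 1 to 12.
- A computer-readable storage medium, comprising instructions, wherein the instructions are used to implement the method according to any one of claims 13 to 25.
- A computer program product, wherein when the computer program product runs on a computer, the computer is enabled to perform the method according to any one of claims 1 to 12.
- A computer program product, wherein when the computer program product runs on a computer, the computer is enabled to perform the method according to any one of claims 13 to 25.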
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21868568.3A EP4206904A4 (en) | 2020-09-21 | 2021-09-11 | CODE PROCESSING METHOD AND DEVICE AND STORAGE MEDIUM |
US18/185,894 US20230229410A1 (en) | 2020-09-21 | 2023-03-17 | Code processing method and apparatus, and storage medium |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010996387.6 | 2020-09-21 | ||
CN202010996387 | 2020-09-21 | ||
CN202011066288.4A CN114253554A (zh) | 2020-09-21 | 2020-09-30 | Code processing method and apparatus, and storage medium |
CN202011066288.4 | 2020-09-30 |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/185,894 Continuation US20230229410A1 (en) | 2020-09-21 | 2023-03-17 | Code processing method and apparatus, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022057743A1 (zh) | 2022-03-24 |
Family
ID: 80775905
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/117845 WO2022057743A1 (zh) | 2020-09-21 | 2021-09-11 | Code processing method and apparatus, and storage medium |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230229410A1 (zh) |
EP (1) | EP4206904A4 (zh) |
WO (1) | WO2022057743A1 (zh) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
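CN103235724A (zh) * | 2013-05-10 | 2013-08-07 | Information Engineering University of the Chinese People's Liberation Army | Integrated translation method for multi-source binary code based on atomic-operation semantic description |
CN103885770A (zh) * | 2014-01-21 | 2014-06-25 | Tang Daocheng | Method for recovering an assembly file from an executable file for a single-chip microcomputer |
US20170206068A1 (en) * | 2016-01-19 | 2017-07-20 | Cray Inc. | Program optimization based on directives for intermediate code |
CN110825386A (zh) * | 2019-11-01 | 2020-02-21 | Tencent Technology (Shenzhen) Co., Ltd. | Code compilation method and apparatus, and storage medium |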
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
PT3427148T (pt) * | 2016-03-11 | 2022-03-23 | Lzlabs Gmbh | Load module compiler |
- 2021-09-11: EP21868568.3A, patent EP4206904A4 (en), active, pending
- 2021-09-11: PCT/CN2021/117845, patent WO2022057743A1 (zh), status unknown
- 2023-03-17: US18/185,894, patent US20230229410A1 (en), active, pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4206904A4 |
Also Published As
Publication number | Publication date |
---|---|
EP4206904A4 (en) | 2024-01-24 |
EP4206904A1 (en) | 2023-07-05 |
US20230229410A1 (en) | 2023-07-20 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21868568; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2021868568; Country of ref document: EP; Effective date: 20230329 |
NENP | Non-entry into the national phase | Ref country code: DE |