KR20180057317A - Apparatus and method for intermediate language transformation of binary data - Google Patents
Apparatus and method for intermediate language transformation of binary data
- Publication number: KR20180057317A
- Application number: KR1020160155803A
- Authority: KR (South Korea)
- Prior art keywords: intermediate language, language, generated, preprocessing, information
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
- Devices For Executing Special Programs (AREA)
Description
The present invention relates to a stepwise intermediate language conversion apparatus and method.
An intermediate language is a representation that a compiler generates while translating source code into binary code. In binary analysis, it also serves to represent binaries of various platforms in one unified form.
However, unlike compilation, converting binary code into an intermediate language is a very difficult problem, because abstract information present in the source code, such as type information and variable names, is lost during compilation. This lost abstract information must therefore be restored during intermediate language conversion.
Converting binary code into an intermediate language is the most important underlying technology for binary analysis. Without conversion to an intermediate language, existing program analysis techniques cannot be applied. In addition, for reverse engineering, analysis needs to be conducted on a high-level language rather than on low-level code.
Most existing intermediate language conversion is done in a single step: the binary code is disassembled and expressed in one intermediate language. This approach is well suited to expressing low-level machine language, but it is inefficient for expressing a high-level language.
For example, a For statement that exists in a high-level language can be expressed in a low-level intermediate language only by linking several intermediate language statements. In addition, a low-level Abstract Syntax Tree (AST) cannot hold the type information that exists in a high-level language.
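The cost of a single low-level intermediate language can be illustrated with a small sketch. The `ForLoop` node, the `lower_for` function, and the textual statement format are hypothetical names chosen for this illustration; they are not taken from the patent. The sketch only shows that one high-level loop turns into several linked low-level statements.

```python
from dataclasses import dataclass


@dataclass
class ForLoop:
    """Hypothetical high-level node: for var in range(start, stop): body."""
    var: str
    start: int
    stop: int
    body: list[str]


def lower_for(loop: ForLoop) -> list[str]:
    """Lower one high-level loop into linked low-level statements."""
    return [
        f"{loop.var} = {loop.start}",                   # initialise the counter
        "L_head:",                                      # loop entry label
        f"if {loop.var} >= {loop.stop} goto L_exit",    # exit test
        *loop.body,                                     # loop body statements
        f"{loop.var} = {loop.var} + 1",                 # increment
        "goto L_head",                                  # back edge
        "L_exit:",                                      # loop exit label
    ]


loop = ForLoop(var="i", start=0, stop=4, body=["sum = sum + i"])
low_level = lower_for(loop)
for statement in low_level:
    print(statement)
```

A single `ForLoop` node becomes seven low-level statements here, and an analysis working only on the low-level form has to rediscover that they belong together.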
Therefore, the present invention provides a stepwise intermediate language conversion apparatus and method that construct the intermediate language in stages to enable efficient binary analysis.
According to an aspect of the present invention, there is provided an intermediate language conversion apparatus including:
a pre-processing unit that receives assembly language disassembled from a binary file and generates a preprocessing intermediate language from the assembly language; and a post-processing unit that generates a first-stage intermediate language from the preprocessing intermediate language, receives the first-stage intermediate language as input to generate a second-stage intermediate language, and outputs, as a final intermediate language, the stepwise intermediate language generated at the stage matching a preset number of stages.
The post-processing unit may include: a first post-processor that receives the preprocessing intermediate language as input and generates the first-stage intermediate language from it; and a second post-processor that receives the first-stage intermediate language as input and generates the second-stage intermediate language from it.
The level of the first-stage intermediate language may be higher than that of the preprocessing intermediate language, and the level of the second-stage intermediate language may be higher than that of the first-stage intermediate language.
The pre-processing unit may receive binary information, which is information on the environment in which the binary file was generated, and, when generating the preprocessing intermediate language, may first derive abstract information based on the binary information.
According to another aspect of the present invention, there is provided a method for converting an assembly language into an intermediate language, the method including:
generating a preprocessing intermediate language from the assembly language; generating a first-stage intermediate language from the generated preprocessing intermediate language; generating a second-stage intermediate language from the first-stage intermediate language; and setting the second-stage intermediate language as a final intermediate language if the generated second-stage intermediate language is the intermediate language of the last of the preset stages.
Generating the preprocessing intermediate language may include: receiving binary information, which is information on the environment in which the binary file for the assembly language was generated; and deriving abstract information from the assembly language based on the binary information.
Generating the second-stage intermediate language may include receiving the second-stage intermediate language as input and generating a third-stage intermediate language if the second-stage intermediate language is not the intermediate language of the last of the preset stages.
The first-stage, second-stage, and third-stage intermediate languages are each generated in units of preset blocks, and a block unit corresponds to either a function or a module constituting the binary file.
According to the present invention, a binary code can be expressed in a language ranging from a low level to a high level.
In addition, a high-level intermediate language can improve the efficiency of reverse engineering, and various abstract information that does not exist in the binary can be derived.
FIG. 1 is a diagram illustrating an exemplary program generation process.
FIG. 2 is an exemplary diagram of an environment including an intermediate language conversion apparatus according to an embodiment of the present invention.
FIG. 3 is a structural diagram of an intermediate language conversion apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of restoring a program through an intermediate language conversion method according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry out the present invention. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In order to clearly illustrate the present invention, parts not related to the description are omitted, and similar parts are denoted by like reference characters throughout the specification.
Throughout the specification, when a part is said to "comprise" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise.
Hereinafter, an intermediate language conversion apparatus and method according to an embodiment of the present invention will be described with reference to the drawings. Before describing an embodiment of the present invention, a general program creation process will be described with reference to FIG. 1.
As shown in FIG. 1, in a general program generation process, a program developer writes source code based on an idea for a program (S10), and the input source code is translated through an intermediate language into assembly language (S20, S30). The generated assembly language is then converted into binary code that the computer can understand, and the program runs (S40).
Thus, in creating a program, only one intermediate language generation step is needed to convert source code into assembly language. General reverse engineering likewise uses a single intermediate language generation step, which is advantageous for representing low-level machine language. However, it is inefficient for expressing the various loops and conditional statements of a high-level language, because these can be expressed only by connecting multiple intermediate language statements.
Therefore, an intermediate language conversion apparatus and method will be described with reference to FIGS. 2 to 4. By constructing a stepwise intermediate language in the reverse engineering process, they provide a basis for efficient binary analysis; by generating an intermediate language that can express binary code across multiple platforms, they ease representation of and extension to various platforms; and they enable extension with high-level code information.
As shown in FIG. 2, binary information is obtained based on various types of binary codes, operating system (OS) information of an environment in which binary codes are written, and instruction set architecture (ISA) information. The assembly language generated by converting the binary code is input to the intermediate
Here, an example of identifying binary information will be described. When a binary file is stored in the PE format, among the various binary file formats (for example, PE, ELF, and Mach-O), a PE viewer tool can easily identify the operating system information and the instruction set architecture information. There are various methods of identifying the binary information, and the present invention is not limited to any one method.
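As a rough sketch of how such binary information might be read without a viewer tool, the following Python function inspects the well-known magic numbers and machine fields of the PE, ELF, and Mach-O headers. The function name `identify_binary`, the `os_hint` values, and the small machine tables are assumptions made for this illustration; the patent does not prescribe this method, and a real tool would cover many more cases.

```python
import struct

# Subsets of the machine codes defined by the ELF and PE/COFF formats.
ELF_MACHINES = {0x03: "x86", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}
PE_MACHINES = {0x014C: "x86", 0x8664: "x86-64", 0xAA64: "AArch64"}
# Mach-O magic numbers as stored on disk (32/64-bit, both byte orders).
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",
}


def identify_binary(data: bytes) -> dict:
    """Return file format, an OS hint, and the ISA read from the header."""
    if data[:4] == b"\x7fELF" and len(data) >= 20:
        # e_ident[5] gives the byte order; e_machine sits at offset 18.
        order = "<" if data[5] == 1 else ">"
        (machine,) = struct.unpack_from(order + "H", data, 18)
        return {"format": "ELF", "os_hint": "Unix-like",
                "isa": ELF_MACHINES.get(machine, "unknown")}
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header stores the offset of the "PE\0\0" signature at 0x3C;
        # the COFF Machine field follows the signature.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\0\0" and len(data) >= pe_offset + 6:
            (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
            return {"format": "PE", "os_hint": "Windows",
                    "isa": PE_MACHINES.get(machine, "unknown")}
    if data[:4] in MACHO_MAGICS:
        return {"format": "Mach-O", "os_hint": "Apple", "isa": "unknown"}
    return {"format": "unknown", "os_hint": "unknown", "isa": "unknown"}


# A synthetic 20-byte ELF header prefix: 64-bit, little-endian, x86-64.
elf_header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 0x3E)
print(identify_binary(elf_header))
```

The OS hint is only inferred from the file format, so it is an approximation of the operating system information the text refers to.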
The intermediate
In the above-described environment, the structure of the intermediate
As shown in FIG. 3, the intermediate
The preprocessing
That is, the preprocessing
When the preprocessing
Therefore, in order to preprocess the assembly language, which is a machine language converted from the binary code, the
The
The
Here, the final intermediate language is the intermediate language at the stage just before a source file is generated, and it is equivalent in expression to code written in a high-level programming language. Source code can be generated from the final intermediate language by taking language features (e.g., grammar) into account.
The reference block unit is a block delimited by a reference point such as a branch instruction, and it is the basic unit of the intermediate language expression at each stage according to the embodiment of the present invention. When a binary file is converted to assembly language, the flow of the binary file is difficult to understand from the generated assembly language alone. Therefore, so that the overall structure of the binary file can be easily understood, the stepwise intermediate language is expressed in units of the functions or modules constituting the binary file, which allows a high-level language to be expressed in the intermediate language without connecting several intermediate language statements.
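Splitting at branch instructions can be sketched as follows. This is a simplified illustration under stated assumptions: the mnemonic set is a small x86 subset chosen for the example, the function name `split_into_blocks` is hypothetical, and the sketch closes a block only after a branch instruction without also splitting at branch targets, which a full basic-block analysis would do.

```python
# Assumed subset of x86 branch mnemonics used as block reference points.
BRANCH_MNEMONICS = {"jmp", "je", "jne", "jl", "jle", "jg", "jge", "ret"}


def split_into_blocks(instructions: list[str]) -> list[list[str]]:
    """Group a non-empty-line assembly listing into blocks ending at branches."""
    blocks: list[list[str]] = []
    current: list[str] = []
    for instruction in instructions:
        current.append(instruction)
        mnemonic = instruction.split()[0].lower()
        if mnemonic in BRANCH_MNEMONICS:
            blocks.append(current)   # a branch closes the current block
            current = []
    if current:                      # trailing instructions without a branch
        blocks.append(current)
    return blocks


listing = [
    "mov eax, 0",
    "cmp eax, 4",
    "jge done",
    "add ebx, eax",
    "inc eax",
    "jmp head",
    "ret",
]
for block in split_into_blocks(listing):
    print(block)
```

Grouping such blocks further into functions or modules, as the text describes, would need call and symbol information that this sketch does not model.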
The first-stage intermediate language, created by the first-stage post-processor 120-1 translating the preprocessing intermediate language, is at a higher level than the preprocessing intermediate language. The second-stage post-processor 120-2 in turn translates the first-stage intermediate language to generate the second-stage intermediate language at a higher level than the first-stage intermediate language.
The
To this end, in the embodiment of the present invention, a plurality of post-processing units 120-1 to 120-n form the
An intermediate language grammar for generating a step-by-step intermediate language is defined in each of the post processors 120-1 to 120-n. Here, the method of defining the intermediate language grammatical form or intermediate language grammars stored in the post-processors 120-1 to 120-n is not limited to any one method.
In addition, as many post-processors 120-1 to 120-n may be created as there are stages once the number of stages for generating the stepwise intermediate language is defined, or only some of a set of post-processors may be used; the present invention is not limited to either method.
The stepwise intermediate language can be expressed in various ways depending on its intended use. For example, it can be expressed with characteristics suited to the purpose of the reverse engineering, such as vulnerability analysis or malicious code analysis. The stepwise intermediate language can be expressed by various methods, and a detailed description thereof is omitted in the embodiment of the present invention.
A method for converting an assembly language into an intermediate language using the above-described intermediate
As shown in FIG. 4, when binary code is generated on any one of various systems having different instruction set architectures or operating systems (S100), the generated binary code is converted into assembly language by a disassembler (not shown) (S110). How the binary code is generated and how the disassembler converts it into assembly language are already known, and a detailed description thereof is omitted in the embodiment of the present invention.
When the intermediate
In step S120, when the
The
Here, the abstraction information includes data abstraction information and flow abstraction information. The data abstraction information is information for defining an intermediate language that restores data starting from unit-sized data and proceeding to complex data, in the order of simple data, continuous data, complex data, and data confidentiality. The flow abstraction information is information from which changes in program flow, such as simple logical values, repetition, branches, and functions, can be restored.
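One way flow abstraction information could be derived is sketched below. The heuristic, that a jump to the same or an earlier address indicates repetition while a forward jump indicates a branch, is an assumption introduced for this illustration and is not stated in the patent; the `Jump` type and `classify_flow` function are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Jump:
    """Hypothetical record of one jump instruction in the assembly listing."""
    address: int   # address of the jump instruction itself
    target: int    # address to which control is transferred


def classify_flow(jump: Jump) -> str:
    """Label a jump as 'repetition' (backward) or 'branch' (forward)."""
    # Assumed heuristic: loops are compiled with a jump back to the loop head.
    if jump.target <= jump.address:
        return "repetition"
    return "branch"


print(classify_flow(Jump(address=0x40, target=0x10)))
print(classify_flow(Jump(address=0x10, target=0x40)))
```

A higher-stage intermediate language could use such labels to replace linked jump statements with a single loop or conditional construct.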
In step S150, the
The final intermediate language output in step S160 is then generated as source code by code printing (S170). The matters relating to code printing are already known, and a detailed description thereof will be omitted in the embodiment of the present invention.
If it is determined in step S150 that the generated intermediate language does not correspond to the preset final stage, the procedure from step S130 onward, in which the stepwise intermediate language is interpreted to generate the next stepwise intermediate language, is repeated.
For example, if a stepwise intermediate language is set to be generated in all three steps by a user's input or other input, the
Finally, the second-stage intermediate language is interpreted once more to create a third-stage intermediate language at a higher level than the second-stage intermediate language. Here, the third-stage intermediate language becomes the final intermediate language. Thus, by replacing the conventional single intermediate language generation step with multiple generation stages, a high-level language becomes easy to express. In addition, since each stage takes a lower-level intermediate language as input and generates a higher-level one, the intermediate language generated at each stage can be verified.
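The staged generation and the per-stage check can be sketched as a loop. The `IR` container, its integer `level` field, the placeholder `raise_level` post-processor, and the `run_stages` function are hypothetical names for this illustration; a real post-processor would apply the intermediate language grammar defined for its stage. The verification shown is only one possible check, namely that each stage raises the level.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IR:
    """Hypothetical intermediate language container."""
    level: int              # 0 = preprocessing IL; higher means more abstract
    statements: list[str]


PostProcessor = Callable[[IR], IR]


def raise_level(ir: IR) -> IR:
    """Placeholder post-processor that only raises the level by one."""
    return IR(level=ir.level + 1, statements=list(ir.statements))


def run_stages(preprocessed: IR, post_processors: list[PostProcessor],
               final_stage: int) -> IR:
    """Apply post-processors in order until the preset final stage is reached."""
    current = preprocessed
    for stage, post_process in enumerate(post_processors, start=1):
        produced = post_process(current)          # generate next stage (S130)
        if produced.level <= current.level:       # per-stage verification
            raise ValueError(f"stage {stage} did not raise the level")
        current = produced
        if stage == final_stage:                  # preset stage reached (S150)
            return current                        # final intermediate language (S160)
    raise ValueError("fewer post-processors than the preset number of stages")


final_il = run_stages(IR(level=0, statements=["mov eax, 0"]),
                      [raise_level, raise_level, raise_level], final_stage=3)
print(final_il.level)
```

With three stages configured, the third post-processor's output is returned as the final intermediate language, mirroring the three-stage example in the text.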
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments; modifications and improvements based on them also belong to the scope of the invention.
Claims (12)
An intermediate language conversion apparatus comprising:
a pre-processing unit that receives assembly language disassembled from a binary file and generates a preprocessing intermediate language from the assembly language; and
a post-processing unit that generates a first-stage intermediate language from the preprocessing intermediate language, generates a second-stage intermediate language by receiving the first-stage intermediate language as input, and generates, as a final intermediate language, the stepwise intermediate language generated at the stage matching a preset number of stages.
The intermediate language conversion apparatus, wherein the post-processing unit comprises:
a first post-processor that receives the preprocessing intermediate language as input and generates the first-stage intermediate language from the preprocessing intermediate language; and
a second post-processor that receives the first-stage intermediate language as input and generates the second-stage intermediate language from the first-stage intermediate language.
The intermediate language conversion apparatus, wherein the post-processing unit includes a preset number of post-processors to generate the stepwise intermediate language.
The intermediate language conversion apparatus, wherein the level of the first-stage intermediate language is higher than the level of the preprocessing intermediate language and the level of the second-stage intermediate language is higher than the level of the first-stage intermediate language.
The intermediate language conversion apparatus, wherein the pre-processing unit receives binary information, which is information on the environment in which the binary file was generated, and, when generating the preprocessing intermediate language, derives abstract information based on the binary information and then generates the preprocessing intermediate language.
The intermediate language conversion apparatus, wherein the binary information includes information on the operating system on which the binary file was generated and instruction set architecture information for recognizing the binary file.
A method for converting an assembly language into an intermediate language, the method comprising:
generating a preprocessing intermediate language from the assembly language;
generating a first-stage intermediate language from the generated preprocessing intermediate language;
generating a second-stage intermediate language from the first-stage intermediate language; and
setting the second-stage intermediate language as a final intermediate language if the generated second-stage intermediate language is the intermediate language of the last of the preset stages.
The method, wherein generating the preprocessing intermediate language comprises:
receiving binary information, which is information on the environment in which a binary file for the assembly language was generated; and
deriving abstract information from the assembly language based on the binary information.
The method, wherein the binary information includes information on the operating system on which the binary file was generated and instruction set architecture information for recognizing the binary file.
The method, wherein the level of the first-stage intermediate language is higher than the level of the preprocessing intermediate language and the level of the second-stage intermediate language is higher than the level of the first-stage intermediate language.
The method, wherein generating the second-stage intermediate language further comprises generating a third-stage intermediate language by receiving the second-stage intermediate language as input if the second-stage intermediate language is not the intermediate language of the last of the preset stages.
The method, wherein the first-stage, second-stage, and third-stage intermediate languages are each generated in units of preset blocks, and a block unit corresponds to a function or a module constituting the binary file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160155803A KR20180057317A (en) | 2016-11-22 | 2016-11-22 | Apparatus and method for intermediate language transformation of binary data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160155803A KR20180057317A (en) | 2016-11-22 | 2016-11-22 | Apparatus and method for intermediate language transformation of binary data |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020180100655A Division KR20180098213A (en) | 2018-08-27 | 2018-08-27 | Apparatus and method for intermediate language transformation of binary data |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20180057317A (en) | 2018-05-30 |
Family
ID=62300110
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160155803A KR20180057317A (en) | 2016-11-22 | 2016-11-22 | Apparatus and method for intermediate language transformation of binary data |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20180057317A (en) |
-
2016
- 2016-11-22 KR KR1020160155803A patent/KR20180057317A/en active Application Filing
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
A302 | Request for accelerated examination | ||
E902 | Notification of reason for refusal | ||
AMND | Amendment | ||
E601 | Decision to refuse application | ||
AMND | Amendment | ||
A107 | Divisional application of patent |