CN103049265A - Method for processing zone bits in reverse decompilation system - Google Patents
- Publication number
- CN103049265A
- Authority
- CN
- China
- Prior art keywords
- file
- zone bit
- processing
- statement
- array
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Stored Programmes (AREA)
Abstract
The invention relates to a method for processing flag bits in a reverse decompiler. The method comprises the following steps: first, a debugger connected to a computer reads out the binary code of the target microprocessor; second, the code is disassembled, with processing specific to the particular processor and with reference to its assembly format, to generate that processor's assembly code; and third, high-level language is generated in reverse from the assembly language. Most current general-purpose microprocessors apply some degree of encryption. Before the protection fuse is blown, code can be read out and debugged in this way, but after encryption the binary code generally cannot be obtained by this method, and methods for extracting the binary code from a microprocessor are outside the scope of the invention. The method improves the accuracy of decompilation, is simple to operate, and is easy to understand.
Description
Technical field
The present invention relates to methods for processing flag bits, and in particular to a method for processing flag bits in a reverse decompiler.
Background art
Science and technology are developing rapidly, and embedded systems are widely used in mobile phones and many kinds of portable devices. Microprocessors are also widely used in vehicles such as aircraft and automobiles, and even in military installations, deep-sea exploration, and space science. This raises serious problems of system maintainability, malicious-code analysis, and system security and reliability, and the need to reuse legacy software has likewise driven research into software reverse engineering. Decompiling microprocessor code has therefore become imperative.
Whether reverse engineering is legal, and whether it can be protected, has been the subject of legal disputes among software developers. The disputes over software reverse engineering in the 1990s were eventually settled. Taking the definition in the United States, home of the major software companies, as a reference: under federal law, reverse-engineering operations such as disassembly performed on copyrighted software are legal as long as they are not used to compete with that software unfairly or to obtain illegitimate benefit in developing new products [Pamela Samuelson 1990]. Reverse engineering is therefore legal as long as it is not carried out for economic gain or to compete.
Embedded microprocessors differ from the Intel and AMD CPUs common in personal computers: each family generally uses its own assembly language to control the whole system, and the Texas Instruments and Renesas families are among the most common. When the disassembled output of a microprocessor is decompiled, the handling of flag bits is always involved. Current embedded microprocessor families include X86, Am186/88, ARM, MIPS, PowerPC, and 68K.
Summary of the invention
The technical problem described above is mainly solved by the following technical solution:
A method for processing flag bits in a reverse decompiler, characterized by comprising the following steps:
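Step 1: an initialization module reads in the input assembly-language file;
Step 2: a flag-bit identification module defines the flag bits corresponding to the specific format of the input file and stores them in flag-bit file A;
Step 3: an identification processing module reads and processes the input according to the specified assembly language;
Step 4: after the whole assembly file has been processed as in Step 3, the various loop-control structures and arrays are processed to obtain a high-level-language file B that is relatively easy to understand; adding file A to the head of file B then produces a high-level-language program with relatively complete meaning.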
The invention covers the decompilation flow for embedded microprocessors, which mainly comprises the following steps. In the first step, a debugger connected to a computer reads out the binary code of the target microprocessor. Most current microprocessors apply some degree of encryption: before the protection fuse is blown, code can be read out and debugged this way, but after encryption the binary code generally cannot be obtained like this, and methods for extracting the binary code from a microprocessor are outside the scope of this document. The second step is disassembly: with reference to the assembly format of the specific processor, the binary code is converted into that processor's assembly code. The third step generates high-level language in reverse from the assembly language, and the purpose of this patent is a method for processing assembly-language flag bits during that generation.
Flag-bit processing steps in the reverse compiler. In general, decompiling code for an embedded microprocessor involves many steps. Overall, the assembly code is processed one step at a time, that is, one meaningful assembly statement at a time. When a statement that may affect the flag bits is encountered during this processing, four flag bits can be generated and defined accordingly; in addition, a judgment condition is added to every statement that may change the flag bits and, where necessary, the flag bits are updated.
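The per-statement treatment above first needs a way to tell which statements touch the flags. A minimal sketch, assuming MSP430 mnemonics: the instruction list is a subset of the MSP430 core instructions that update the status bits, taken from the MSP430 instruction-set documentation rather than from the patent's Table 1, and `affects_flags` is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

/* Subset of MSP430 core instructions that update the status bits.
   MOV, BIS, BIC, SWPB, PUSH and CALL leave N, Z, C and V unchanged,
   so they are deliberately absent. */
static const char *const FLAG_WRITERS[] = {
    "add", "addc", "sub", "subc", "cmp", "dadd",
    "and", "xor", "bit", "rra", "rrc", "sxt"
};

/* Return 1 if the mnemonic (with or without a .b/.w suffix) updates the
   flags, so the decompiler must emit flag-update code after translating
   the statement itself; return 0 otherwise. */
int affects_flags(const char *mnemonic)
{
    char base[16];
    size_t i = 0;

    /* Copy the mnemonic up to the size suffix, folding it to lower case. */
    while (mnemonic[i] != '\0' && mnemonic[i] != '.' && i < sizeof base - 1) {
        base[i] = (char)tolower((unsigned char)mnemonic[i]);
        i++;
    }
    base[i] = '\0';

    for (size_t k = 0; k < sizeof FLAG_WRITERS / sizeof FLAG_WRITERS[0]; k++)
        if (strcmp(base, FLAG_WRITERS[k]) == 0)
            return 1;
    return 0;
}
```

For `add.w R14, R15` the check returns 1, so the translated statement is followed by flag-update code; for `mov.w` nothing extra is emitted.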
High-level language generation: decompiling code for an embedded microprocessor ultimately produces several files, which are then combined according to the corresponding rules. The flag-bit definitions form one file, and the flag bits are generally defined as type int. This file of flag-bit definitions, which can be named a.flag, is merged with the other files generated during decompilation to produce relatively high-level code.
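A minimal sketch of the merge described above, assuming the flag file and the decompiled body are held in memory rather than on disk. `merge_flag_header` is a hypothetical helper, and the flag names N, Z and C follow the MSP430 example in the embodiment.

```c
#include <stdio.h>

/* Write "int X = 0;" for each flag name into buf (the contents of the
   flag file A), then append the decompiled body (file B) after it.
   Returns the number of characters written, or -1 if buf is too small. */
int merge_flag_header(const char *const flags[], size_t nflags,
                      const char *body, char *buf, size_t size)
{
    size_t used = 0;
    int n;

    for (size_t i = 0; i < nflags; i++) {
        n = snprintf(buf + used, size - used, "int %s = 0;\n", flags[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;                 /* would not fit: report failure */
        used += (size_t)n;
    }
    n = snprintf(buf + used, size - used, "%s", body);
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

Placing the definitions first mirrors "adding file A to the head of file B": every flag is declared before any translated statement assigns it.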
In the above method, step 3 specifically comprises: with reference to the actual meaning of the assembly statement and according to its effect on the flag bits, expressing it in a correspondingly suitable high-level language.
In the above method, step 4 is divided into five sub-steps: array processing; switch statement processing; variable processing; loop-control statement processing; and overall processing.
The invention therefore improves the accuracy of decompilation, is simple to operate, and is easy to understand.
Description of the drawings
Fig. 1 is the general decompilation flowchart for embedded microprocessors, which mainly comprises three stages: obtaining the machine code, disassembly, and decompilation.
Fig. 2 shows flag-bit processing for embedded microprocessors, a step within decompilation, performed with reference to Table 1.
Fig. 3a is a schematic of the input source file (C language) in the embedded microprocessor program example.
Fig. 3b is a schematic of the assembly code produced by compiling the C code in the example.
Fig. 3c is a schematic of the high-level language output in the example.
Fig. 4 shows how a switch...case statement is expressed in assembly.
Fig. 5 is a schematic of case statement processing.
Fig. 6 is a schematic of detailed case statement information.
Fig. 7 is a schematic of the assembly form of a while loop.
Fig. 8 is a schematic of the file data read in.
Embodiment
The technical solution of the invention is described in further detail below by way of an embodiment, with reference to the drawings.
Embodiment:
1. Read in the input assembly-language file: referring to Fig. 2, after the program starts, it reads in the input assembly code.
2. According to the specific format of the input file (for example MSP430 or M16C), define the corresponding flag bits and store them in flag-bit file A. For the MSP430 (overflow is not handled in this decompilation process, so the overflow bit is not defined), this gives:
int N = 0;
int Z = 0;
int C = 0;
In Fig. 2, if a statement that is read may affect the flag bits, it is processed in two steps: first the statement itself is translated, and then the flag bits it may affect are processed.
3. Read and process the code one meaningful assembly statement at a time. A lone MOV token is not a meaningful statement, whereas MOV A, B is. Referring to Table 1, if a meaningful statement affects the flag bits, it is processed according to Table 1, for example:
add.w R14, R15
Following Table 1, it can be translated as follows (overflow is not handled in this decompilation process, so the overflow bit is not processed):
int R14, R15;
R15 = R15 + R14;
if (R15 < 0)
    N = 1;
else
    N = 0;
if (R15 == 0)
    Z = 1;
else
    Z = 0;
Because embedded microprocessors differ in internal structure and in data word length, the way the carry is judged also differs between them. In general the data word length is the criterion: for example, with 8-bit data, no carry or borrow occurs if the result stays within the 8-bit range, and otherwise a carry or borrow occurs. For the MSP430 the word length is 16 bits:
if (R15 < -32768 || R15 > 32767)   /* result falls outside the 16-bit range */
    C = 1;
else
    C = 0;
Overflow can likewise be judged from the format of the stored data. Table 1 shows which MSP430 statements affect the flag bits and which statements need the flag bits in order to be decompiled.
4. After the whole assembly file has been processed as in step 3, and after further analysis of other necessary data structures, a high-level-language file B that is relatively easy to understand is obtained; adding file A to the head of file B produces a high-level-language program with relatively complete meaning. In Fig. 3, (a) shows the original C file, (b) the assembly file produced from it for the MSP430, and (c) the file after decompilation. The processing is divided into the following sub-steps:
4.1. Array processing.
If a mova instruction is encountered in the assembly code, an array is present, and the corresponding array can be extracted using the arrayInfo structure described below.
We first define a structure that describes the details of an array in the assembly code:
Here arrayInfo is the name of the structure, which contains four fields. The first, name[length], holds the array name: when an array is found while processing the assembly file, it is processed and its name is stored in this field. The next, char iniAddress[length], holds the starting address at which the array was encountered.
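The structure definition itself is not reproduced in the text. A minimal sketch, assuming Length = 40 and showing only the two fields the text describes (the other two members are unknown and omitted); `is_mova` and `record_array` are hypothetical helpers.

```c
#include <ctype.h>
#include <string.h>

#define Length 40   /* assumed buffer size; the patent gives no value */

/* Only the two fields the text describes are shown. */
struct arrayInfo {
    char name[Length];        /* array name found in the assembly file */
    char iniAddress[Length];  /* address at which the array was first met */
};

/* Return 1 if the token is the MOVA instruction (case-insensitive). */
int is_mova(const char *token)
{
    const char *m = "mova";
    for (; *m != '\0'; m++, token++)
        if (tolower((unsigned char)*token) != *m)
            return 0;
    return *token == '\0';
}

/* Store the name and start address of an array detected at a MOVA,
   truncating safely and always NUL-terminating both fields. */
void record_array(struct arrayInfo *info, const char *name, const char *addr)
{
    strncpy(info->name, name, Length - 1);
    info->name[Length - 1] = '\0';
    strncpy(info->iniAddress, addr, Length - 1);
    info->iniAddress[Length - 1] = '\0';
}
```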
4.2. Switch statement processing.
After the array processing of the first sub-step, the second sub-step processes switch statements. Arrays are processed before switch statements because an array may appear inside a switch statement, so this order must be followed.
First consider how a switch ... case statement is represented in assembly language; see Fig. 4. Based on this form, the processing is implemented in several steps:
4.21. Define two file arrays:
char fileArray[Max][Length];      // file storage array
char swQueue[switchMax][Length];  // case array
The swQueue array stores the options of the switch, and switchMax is the maximum number of case branches in a switch case statement. The macro definitions used here can also be found in Table 3-6. The file array can be filled with a do ... while loop: the file is read, and each space-separated string is saved as a separate entry in the array. The specific code is as follows:
The condition in the code above calls the function IsKeyWord(), whose purpose is to decide whether the characters read from the file are meaningful. Its description is as follows:
As can be seen, it returns true if the input character can occur in assembly language, and false otherwise.
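The patent names IsKeyWord but does not reproduce its body. A minimal sketch, assuming it takes one character as returned by fgetc and returns 1 for characters that can appear inside an assembly token; the accepted character set is an assumption based on MSP430/M16C syntax, and commas are treated as separators rather than token characters.

```c
#include <ctype.h>

/* Return 1 if c can appear inside an assembly token (assumed set:
   letters, digits and the punctuation used in MSP430/M16C operands),
   0 for whitespace, separators and anything else. */
int IsKeyWord(int c)
{
    if (isalnum(c))
        return 1;
    switch (c) {
    case '.': case '#': case ':': case '@':
    case '_': case '+': case '-': case '&': case '(': case ')':
        return 1;
    default:
        return 0;
    }
}
```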
In summary, a do ... while loop reads the file one character at a time; whenever the character read is a space, we advance to the next entry of the character array and continue reading. This is how the assembly file is read into the character array.
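A minimal sketch of the do ... while reader summarised above, assuming values for Max and Length (the patent does not give them) and treating commas as separators as well as whitespace, which is what the separate #1 and R1 entries for Fig. 8 suggest. `read_tokens` is a hypothetical name.

```c
#include <stdio.h>

#define Max    64   /* assumed maximum number of tokens */
#define Length 40   /* assumed maximum token length */

char fileArray[Max][Length];

/* Read fp one character at a time with a do ... while loop, splitting on
   whitespace and commas, and store each token in fileArray padded with
   '$'. Returns the number of tokens stored. */
int read_tokens(FILE *fp)
{
    int row = 0, col = 0, c;

    for (int i = 0; i < Max; i++)           /* '$' marks unused cells */
        for (int j = 0; j < Length; j++)
            fileArray[i][j] = '$';

    do {
        c = fgetc(fp);
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
            c == ',' || c == EOF) {
            if (col > 0) {                   /* close the current token */
                row++;
                col = 0;
            }
        } else if (row < Max && col < Length) {
            fileArray[row][col++] = (char)c; /* overlong tokens are cut */
        }
    } while (c != EOF && row < Max);

    return row;
}
```

For an illustrative input line such as `0A0BE8 MOV.W:Q #1,R1`, this fills fileArray[0] to fileArray[3] with 0A0BE8, MOV.W:Q, #1 and R1.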
Suppose the file read in has only two lines, as shown in Fig. 8.
After the processing above, the fileArray array holds 8 entries. Each entry holds the characters of one token, padded with '$' up to Length characters:
fileArray[0] = 0A0BE8
fileArray[1] = MOV.W:Q
fileArray[2] = #1
fileArray[3] = R1
fileArray[4] = MOV.W:Q
fileArray[5] = #2
fileArray[6] = #2
fileArray[7] = A10BE8
A brief explanation of the data above: the arrays are initialized with the dollar sign '$' because it is one of the few ASCII characters with no meaning in the C language, so the padding can always be told apart from the meaningful characters.
4.22. Define four character arrays:
char JSR_A[Length];
char DB[Length];
char JMP_B[Length];
char JMP_S[Length];
These arrays are matched against the data in the file character array fileArray. The four assembly instructions they hold indicate a switch case statement, so when a match succeeds, the assembly file is known to contain a switch case statement.
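A minimal sketch of the matching step, assuming the four arrays hold the mnemonics JSR.A, DB, JMP.B and JMP.S (spellings inferred from the array names, not given in the text) and that tokens are '$'-padded rows as in fileArray. `token_equals` and `has_switch` are hypothetical names.

```c
#include <string.h>

#define Length 40   /* assumed token width */

/* Compare a '$'-padded token with a C string. */
static int token_equals(const char *tok, const char *s)
{
    size_t n = strlen(s);
    if (n > Length || memcmp(tok, s, n) != 0)
        return 0;
    return n == Length || tok[n] == '$';   /* no extra characters follow */
}

/* Return 1 if all four switch markers occur among the ntok tokens. */
int has_switch(char tokens[][Length], int ntok)
{
    /* Assumed spellings for the JSR_A, DB, JMP_B and JMP_S arrays. */
    static const char *const markers[4] = { "JSR.A", "DB", "JMP.B", "JMP.S" };
    int seen[4] = { 0, 0, 0, 0 };

    for (int i = 0; i < ntok; i++)
        for (int k = 0; k < 4; k++)
            if (token_equals(tokens[i], markers[k]))
                seen[k] = 1;
    return seen[0] && seen[1] && seen[2] && seen[3];
}
```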
4.23. Concrete implementation. It is described in pseudocode. Before the description, we explain the function ArrayLength() that we define, which is used below. As its name suggests, ArrayLength returns the length of an array, but it differs from the usual length function. Its definition is:
The function is very simple to write: it returns the number of characters in the array that are not equal to '$', or 0 if there are none.
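The definition of ArrayLength is not reproduced. A minimal sketch, assuming each token is a '$'-padded row of Length characters (Length = 40 is an assumed value), as in the fileArray listing. Because the padding is trailing, counting the characters before the first '$' gives the number of non-'$' characters.

```c
#define Length 40   /* assumed token width */

/* Count the meaningful characters of a '$'-padded token: the characters
   before the first '$'. An unused cell (all '$') yields 0. */
int ArrayLength(const char *tok)
{
    int n = 0;
    while (n < Length && tok[n] != '$')
        n++;
    return n;
}
```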
Switch statement processing module: take Fig. 5 as an example, in which the case statements are marked in detail with black boxes:
The following information can be extracted from the figure:
case 1, case 3, case 5, case 8, case 0x16.
Referring also to the information that follows, shown in Fig. 6, the case labels clearly correspond one-to-one with the JMP statements, which makes the case statements easy to resolve. The switch ... case statement in this example can therefore be translated as:
In fact, switch(R0) needs no conversion: in this assembly code, every switch statement begins with switch(R0).
All return even.
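A minimal sketch of emitting the translated statement once the case values have been extracted, assuming the values are already parsed into integers. `emit_switch` is a hypothetical name, and the case bodies are left empty because the translated listing is not reproduced.

```c
#include <stdio.h>

/* Write a C switch on R0 with one case label per extracted case value.
   Case bodies are left empty; a full decompiler would fill them from the
   code reached through the matching JMP.
   Returns the number of characters written, or -1 if buf is too small. */
int emit_switch(const int *cases, int ncase, char *buf, size_t size)
{
    size_t used;
    int n;

    n = snprintf(buf, size, "switch (R0) {\n");
    if (n < 0 || (size_t)n >= size)
        return -1;
    used = (size_t)n;

    for (int i = 0; i < ncase; i++) {
        n = snprintf(buf + used, size - used, "case %d:\n    break;\n", cases[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, size - used, "}\n");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

For the Fig. 5 values {1, 3, 5, 8, 0x16}, `%d` prints the last label as `case 22:`; using `%#x` instead would keep hexadecimal labels.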
4.3. Variable processing.
The purpose of variable processing is mainly to eliminate variables that were redefined in the previous steps. The idea is simple: each time a variable is read, all later definitions of it are removed, and processing ends when the end of file (EOF) is reached.
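A minimal sketch of the first-definition-wins rule, assuming the previous steps have already collected the declared variable names in order. `remove_redefinitions` is a hypothetical name, and it works on the name list rather than on the output file.

```c
#include <string.h>

/* Remove repeated declarations in place, keeping the first occurrence
   of each name and preserving order; returns the new count. */
int remove_redefinitions(const char *names[], int n)
{
    int kept = 0;

    for (int i = 0; i < n; i++) {
        int dup = 0;
        for (int j = 0; j < kept; j++)       /* already declared earlier? */
            if (strcmp(names[i], names[j]) == 0) {
                dup = 1;
                break;
            }
        if (!dup)
            names[kept++] = names[i];
    }
    return kept;
}
```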
4.4. While statement processing (processing of loop-control statements).
Processing while statements is rather tedious; the calls involved in the main program file are as follows:
The third function processes while statements. If it returns 1, a while statement was processed in that call; if it returns 0, no while statement was processed, and execution continues.
To process while statements, first look at how a while statement is expressed in assembly:
This assembly code represents a while loop:
As Fig. 7 shows, for a while statement we first examine the assembly in the decision (test) block.
The test requires JMP, CMP, and JNZ. When these statements occur together and their addresses refer to one another, the code can be judged to be a while statement. It can therefore be translated as:
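The while-detection code is not reproduced in the text. The sketch below assumes one common layout consistent with the description: an unconditional JMP first jumps forward to the test block, and the test block's CMP is followed by a JNZ back to the start of the loop body. `Insn` and `find_while` are hypothetical names.

```c
#include <string.h>

/* One decoded instruction: address, mnemonic and (for jumps) target. */
typedef struct {
    unsigned addr;
    const char *op;
    unsigned target;
} Insn;

/* Return the index of the JMP that starts a while loop laid out as
       JMP cond / body: ... / cond: CMP ... / JNZ body
   or -1 if no such pattern exists in the n instructions. */
int find_while(const Insn *code, int n)
{
    for (int i = 0; i + 1 < n; i++) {
        if (strcmp(code[i].op, "JMP") != 0)
            continue;
        for (int j = i + 1; j + 1 < n; j++) {
            if (code[j].addr == code[i].target &&          /* JMP lands on the CMP */
                strcmp(code[j].op, "CMP") == 0 &&
                strcmp(code[j + 1].op, "JNZ") == 0 &&
                code[j + 1].target == code[i + 1].addr)    /* JNZ returns to body */
                return i;
        }
    }
    return -1;
}
```

When a match is found, the instructions between the JMP and the CMP form the loop body and the CMP/JNZ pair forms the loop condition, giving `while (condition) { body }`.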
4.5. If statement processing (overall processing).
An if statement is mainly identified by a CMP followed by a conditional jump; if both are present, the code can be judged to be an if control statement. The appendix code also contains the if control statements generated during flag-bit processing.
An example of an if statement follows:
In general, if statements also have relatively fixed judgment conditions.
A JLT, JZ, or similar statement follows the CMP. The above can therefore be translated as:
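A minimal sketch of the CMP-plus-conditional-jump rule, assuming MSP430 semantics (CMP src, dst computes dst - src) and assuming the jump skips the then-block, so the if condition is the negation of the jump condition. `if_operator` is a hypothetical name; the table covers only MSP430 jumps, and other instruction sets (for example the JLT mentioned above) would need their own entries.

```c
#include <string.h>

/* For "CMP src, dst" followed by a conditional jump that skips the
   then-block, return the C operator OP such that the block is guarded
   by "if (dst OP src)", or NULL for an unknown jump mnemonic. */
const char *if_operator(const char *jump)
{
    static const struct { const char *jump, *op; } table[] = {
        { "JZ",  "!=" }, { "JEQ", "!=" },  /* skip when dst == src */
        { "JNZ", "==" }, { "JNE", "==" },  /* skip when dst != src */
        { "JL",  ">=" },                   /* skip when dst <  src (signed) */
        { "JGE", "<"  },                   /* skip when dst >= src (signed) */
        { "JHS", "<"  }, { "JC",  "<"  },  /* unsigned: compare as unsigned */
        { "JLO", ">=" }, { "JNC", ">=" },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(jump, table[i].jump) == 0)
            return table[i].op;
    return NULL;
}
```

For `cmp R14, R15` followed by `jz`, the block is guarded by `if (R15 != R14)`.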
The following is Table 1: flag-bit processing for embedded microprocessors, taking the Texas Instruments MSP430 as an example. The two operands of a two-operand instruction are denoted RA and RB, the operand of a single-operand instruction is RC, and V, N, Z, and C are the four flag bits, representing overflow, negative, zero, and carry respectively.
Here, in the status-bit columns, "*" means the bit is affected, "-" means it is not affected, "0" means it is cleared, and "1" means it is set. Instructions with .B are single-byte operations, and instructions with [.W] are double-byte (word) operations, where the .W suffix can be omitted.
The specific embodiment described herein merely illustrates the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiment, or substitute similar approaches, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.
Claims (3)
1. A method for processing flag bits in a reverse decompiler, characterized by comprising the following steps:
Step 1: an initialization module reads in the input assembly-language file;
Step 2: a flag-bit identification module defines the flag bits corresponding to the specific format of the input file and stores them in flag-bit file A;
Step 3: an identification processing module reads and processes the input according to the specified assembly language;
Step 4: after the whole assembly file has been processed as in Step 3, the various loop-control structures and arrays are processed to obtain a high-level-language file B that is relatively easy to understand; adding file A to the head of file B then produces a high-level-language program with relatively complete meaning.
2. The method for processing flag bits in a reverse decompiler according to claim 1, characterized in that step 3 specifically comprises: with reference to the actual meaning of the assembly statement and according to its effect on the flag bits, expressing it in a correspondingly suitable high-level language.
3. The method for processing flag bits in a reverse decompiler according to claim 1, characterized in that step 4 is divided into five sub-steps: array processing; switch statement processing; variable processing; loop-control statement processing; and overall processing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210546092.4A CN103049265B (en) | 2012-12-14 | 2012-12-14 | Method for processing flag bits in a reverse decompiler |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103049265A | 2013-04-17 |
CN103049265B CN103049265B (en) | 2016-12-28 |
Family ID: 48061917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210546092.4A Expired - Fee Related CN103049265B (en) | Method for processing flag bits in a reverse decompiler | 2012-12-14 | 2012-12-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103049265B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101364253A (en) * | 2007-08-06 | 2009-02-11 | 电子科技大学 | Covert debug engine and method for anti-worm |
CN101714118A (en) * | 2009-11-20 | 2010-05-26 | 北京邮电大学 | Detector for binary-code buffer-zone overflow bugs, and detection method thereof |
Non-Patent Citations (3)
Title |
---|
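殷文建: "Research on key technologies of code reverse analysis for the ARM architecture", China Master's Theses Full-text Database * |
秦青文 et al.: "Software reverse analysis method based on IDA-Pro", Computer Engineering * |
韩小琨 et al.: "Design and optimization techniques for a reusable instruction-set simulator", Computer Engineering * |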
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108958739A (en) * | 2018-06-06 | 2018-12-07 | 北京大学 | Array data structure restoration methods and system in a kind of binary system decompiling |
CN108958739B (en) * | 2018-06-06 | 2020-11-10 | 北京大学 | Method and system for recovering array data structure in binary decompilation |
CN111935622A (en) * | 2020-08-03 | 2020-11-13 | 深圳创维-Rgb电子有限公司 | Debugging method, device, equipment and storage medium for electronic equipment with digital power amplifier |
CN111935622B (en) * | 2020-08-03 | 2022-02-11 | 深圳创维-Rgb电子有限公司 | Debugging method, device, equipment and storage medium for electronic equipment with digital power amplifier |
Also Published As
Publication number | Publication date |
---|---|
CN103049265B (en) | 2016-12-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20161228 Termination date: 20181214 |