CN103049265A - Method for processing zone bits in reverse decompilation system - Google Patents

Method for processing zone bits in reverse decompilation system

Info

Publication number: CN103049265A
Authority: CN (China)
Prior art keywords: file, zone bit, processing, statement, array
Legal status: Granted; currently Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN2012105460924A
Other languages: Chinese (zh)
Other versions: CN103049265B (en)
Inventors: 刘金硕, 郑稳, 章喻龙, 刘源, 刘天晓, 栗鹏, 曾秋梅, 邹斌, 张智
Current Assignee: Wuhan University WHU (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Wuhan University WHU
Priority date: 2012-12-14 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2012-12-14
Publication date: 2013-04-17

Application filed by Wuhan University WHU
Priority to CN201210546092.4A
Publication of CN103049265A
Application granted
Publication of CN103049265B
Expired - Fee Related

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention relates to a method for processing flag bits in a decompilation system. Decompiling the code of an embedded microprocessor involves three steps: first, a debugger is connected to a computer to read out the binary code of the target microprocessor; second, the binary code is disassembled according to the assembly format of that specific processor, producing processor-specific assembly code; third, a high-level language is generated from the assembly language. Most current microprocessors apply some form of protection: before the fuse is blown, the code can be read out and debugged in this way, but afterwards the binary code generally cannot be obtained, and methods for extracting the binary code from a microprocessor are outside the scope of the invention. The method gives decompilation a higher accuracy while remaining simple to operate and easy to understand.

Description

A method for processing flag bits in a decompiler
Technical field
The present invention relates to the processing of flag bits, and in particular to a method for processing flag bits in a decompiler.
Background
Science and technology are advancing rapidly, and embedded systems are now widely used in mobile phones and all kinds of portable devices. Microprocessors of many kinds are also used extensively in vehicles such as aircraft and automobiles, and even in military installations, deep-sea exploration, and space technology. This brings considerable shortcomings in system maintainability, malicious code analysis, and system security and reliability; together with the need to reuse legacy software, these have driven research into software reverse engineering. Decompiling the code of microprocessors has therefore become a necessity.
Whether reverse engineering is legal, and whether it can be restricted, was the subject of legal disputes among software developers. The dispute over software reverse engineering was finally settled in the 1990s. Taking the United States, a leading software nation, as a reference: under federal law, reverse-engineering operations such as disassembly performed on copyrighted software are legal, provided they are not used to develop a competing product or to obtain unlawful gain [Pamela Samuelson 1990]. In short, reverse engineering is legal as long as it is not carried out for economic gain or to compete.
Embedded microprocessors differ from the Intel and AMD CPUs common in personal computers. An embedded microprocessor usually controls the whole system with its own assembly language; common families come from vendors such as Texas Instruments and Renesas. Once the disassembly of a microprocessor's code has been obtained, decompiling it always involves the processing of flag bits. The main embedded microprocessor families at present include x86, Am186/88, ARM, MIPS, PowerPC, and 68K.
Summary of the invention
The above technical problem is solved mainly by the following technical solution:
A method for processing flag bits in a decompiler, characterized in that it comprises the following steps:
Step 1: an initialization module reads in the input assembly language file.
Step 2: a flag bit identification module defines the flag bits that correspond to the specific format of the input file and stores them in a flag bit file A.
Step 3: a flag processing module reads and processes the file in units of the defined assembly statements.
Step 4: after the whole assembly file has been processed as in step 3, the control and loop structures and the arrays are processed, which yields a high-level language file B that is relatively easy to understand. File A is then added to the head of file B to produce a high-level language program with reasonably complete meaning.
The invention covers the decompilation flow of an embedded microprocessor, which consists of three main steps. In the first step, a debugger is connected to a computer to read out the binary code of the target microprocessor. Most current microprocessors apply some form of protection: before the fuse is blown, the code can be read out and debugged in this way, but afterwards the binary code generally cannot be obtained this way. Methods for extracting the binary code from a microprocessor are outside the scope of this document. The second step is disassembly, which processes the binary code according to the assembly format of the specific processor and produces assembly code for that processor. The third step generates a high-level language from the assembly language. The purpose of this patent is a method for handling the flag bits of the assembly language while the high-level language is being generated.
Processing of flag bits in the decompiler: the decompilation of embedded microprocessor code involves many steps. Overall, the assembly language is processed one step at a time, that is, one meaningful assembly statement at a time. When a statement is encountered that may affect the flag bits, up to four flag bits are produced, and each is defined accordingly. During processing, a condition check is added after every statement that may change a flag bit, and the flag bit is updated if necessary.
High-level language generation: the decompilation of embedded microprocessor code produces several files, which are finally combined into one file according to the corresponding rules. The flag bit definitions form one file; the flag bits are normally defined as type int. The flag definition file can be named a.flag. Joining it with the other files generated during decompilation produces the high-level language program.
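The merging step described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function name prepend_flags is hypothetical, and the sketch works on in-memory strings, whereas a real tool would read the a.flag file and the decompiled body from disk.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the flag definitions (file A) followed by
 * the decompiled body (file B) into out.
 * Returns 0 on success, -1 if out is too small to hold both parts. */
int prepend_flags(const char *flags, const char *body,
                  char *out, size_t out_size)
{
    int written = snprintf(out, out_size, "%s%s", flags, body);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Keeping the flag definitions in their own file means the body can be regenerated without touching the definitions, which is presumably why the two are only joined at the end.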
In the above method, step 3 is specifically: with reference to the actual meaning of each assembly statement and its effect on the flag bits, express the statement in the corresponding high-level language.
In the above method, step 4 is divided into five sub-steps: array processing; switch statement processing; variable processing; control and loop statement processing; and comprehensive processing.
The invention therefore gives decompilation a higher accuracy, while being simple to operate and easy to understand.
Description of drawings
Fig. 1 is the general decompilation flow chart for an embedded microprocessor, which comprises three main stages: obtaining the machine code, disassembly, and decompilation.
Fig. 2 shows the processing of embedded microprocessor flag bits, one of the steps of decompilation; the processing follows Table 1.
Fig. 3a is a schematic of the input source file (C language) in the embedded microprocessor program example.
Fig. 3b is a schematic of the assembly produced by compiling the C source in the example.
Fig. 3c is a schematic of the high-level language output in the example.
Fig. 4 shows the form a switch...case statement takes in assembly.
Fig. 5 is a schematic of case statement processing.
Fig. 6 is a schematic of the detailed case statement information.
Fig. 7 is a schematic of the assembly form of a while statement.
Fig. 8 is a schematic of the file data that is read in.
Detailed description
The technical solution of the invention is described in further detail below through an embodiment and with reference to the drawings.
Embodiment:
1. Read in the input assembly language file. Referring to Fig. 2, once the program starts, it reads in the input assembly language.
2. According to the specific format of the input file, for example MSP430 or M16C, define the corresponding flag bits and store them in the flag bit file A. Taking MSP430 as an example (the overflow bit is not involved in the decompilation process, so it is not handled), the definitions are:
int N = 0;
int Z = 0;
int C = 0;
In Fig. 2, if the statement that was read may affect the flag bits, it is processed in two steps: first the statement itself is translated, then the flag bits it may affect are processed.
3. Read and process one meaningful assembly statement at a time. A bare MOV, for example, is not a meaningful statement, whereas MOV A, B is. Refer to Table 1: if the statement affects the flag bits, process it as Table 1 prescribes, for example:
add.w R14,R15
According to Table 1, this can be translated as follows (the overflow bit is not involved in the decompilation process, so it is not handled):
int R14, R15;
R15 = R15 + R14;
if (R15 < 0)
    N = 1;
else
    N = 0;
if (R15 == 0)
    Z = 1;
else
    Z = 0;
The carry test depends on the architecture of the particular embedded microprocessor and on its data word length, so different processors call for different tests. In general the data word length is the criterion: for example, if the data are 8 bits long and the result lies between -255 and 255, there is no carry or borrow; otherwise there is. Taking MSP430 as the example, the word length is 16 bits:
if (-32767 <= R15 && R15 <= 32768)
    C = 0;
else
    C = 1;
Overflow can likewise be determined from the format of the stored data. Table 1 shows which MSP430 statements affect the flag bits and which statements need the flag bits in order to be decompiled.
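As a cross-check on the translation above, the same add.w statement can be modelled with fixed-width integers. This is a minimal sketch, not the patent's method: the helper name add_w is hypothetical, and instead of the range test it assumes that the carry is the carry out of bit 15 of a 16-bit addition, computed by doing the sum in 32 bits.

```c
#include <stdint.h>

/* Flag variables, named as in the flag bit file A. */
int N = 0, Z = 0, C = 0;

/* Hypothetical helper that mirrors "add.w src,dst" on 16-bit words.
 * The sum is formed in 32 bits so that the carry out of bit 15 is kept. */
uint16_t add_w(uint16_t src, uint16_t dst)
{
    uint32_t wide = (uint32_t)src + (uint32_t)dst;
    uint16_t result = (uint16_t)wide;   /* truncate to the word length */

    N = (result & 0x8000u) != 0;        /* bit 15 set: negative if signed */
    Z = (result == 0);                  /* all 16 bits are zero */
    C = (wide > 0xFFFFu);               /* carry out of bit 15 */
    return result;
}
```

For instance, adding 0xFFFF and 1 wraps to 0, so Z and C are both set while N stays clear. Using unsigned fixed-width types avoids relying on the width of int, which the plain int translation leaves to the host compiler.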
4. Process the whole assembly file in the manner of step 3, then apply the other necessary data structure analyses, to obtain a high-level language file B that is relatively easy to understand. Adding file A to the head of file B then produces a high-level language program with reasonably complete meaning. In Fig. 3, panel a shows the original C file, panel b shows the assembly file generated for MSP430, and panel c shows the decompiled file. The step is divided into the following sub-steps:
4.1. Array processing.
If a mova assembly statement is encountered during processing, it indicates an array; the corresponding array can then be extracted with reference to the data structure described here.
First, a structure must be defined to hold the details of an array found in the assembly:
[Figure: definition of the arrayInfo structure]
Here arrayInfo is the name of the structure, which holds four items of information. The first is the array-name field, name[length]: when an array is found while the assembly file is being processed, the array is processed and the resulting array name is stored in this field. The next field, char iniAddress[length], stores the start address at which the array was encountered.
4.2. Switch statement processing.
After the array processing of the first sub-step comes the switch statement processing of the second. Arrays are processed before switch statements because an array may appear inside a switch statement, so this order must be followed.
First consider how a switch...case statement appears in assembly language; see Fig. 4. With reference to that form, the processing can be implemented in several steps:
4.21. Define two arrays:
char fileArray[Max][Length]; // file storage array
char swQueue[switchMax][Length]; // case array
The swQueue array stores the options of the switch, and switchMax is the maximum number of case branches in a switch...case statement. The macro definitions can also be found in Table 3-6. The file array can be filled with a do...while loop: each time a space is encountered in the file, the next character string is stored in a new array element. The specific code is as follows:
[Figure: code listing for reading the file into fileArray]
The condition in the code above calls the function IsKeyWord(), which determines whether a character read from the file is meaningful. The function is described as follows:
As can be seen, the function returns true if the input character is one that may occur in assembly language, and false otherwise.
In summary, a do...while loop controls reading from the file one character at a time. When the character read is a space, the character array advances by one element and reading continues. This is how the assembly file is read into the character array.
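The reading loop summarised above can be sketched as follows. The sketch makes several assumptions that are not in the original: it reads from a string rather than a file, it omits the IsKeyWord() filter, it treats newlines and tabs like spaces, and the names tokenize, MAX_TOKENS, and LENGTH are hypothetical stand-ins for the original code and its Max and Length macros.

```c
#include <string.h>

#define MAX_TOKENS 16   /* stands in for the Max macro */
#define LENGTH     40   /* stands in for the Length macro */

/* Splits text on whitespace into the rows of fileArray.
 * Every row is first filled with '$', so unused positions keep the filler.
 * Returns the number of tokens stored. */
int tokenize(const char *text, char fileArray[MAX_TOKENS][LENGTH])
{
    int row = 0, col = 0;

    memset(fileArray, '$', (size_t)MAX_TOKENS * LENGTH);
    for (const char *p = text; *p != '\0' && row < MAX_TOKENS; ++p) {
        if (*p == ' ' || *p == '\n' || *p == '\t') {
            if (col > 0) {          /* a token just ended: move to next row */
                ++row;
                col = 0;
            }
        } else if (col < LENGTH) {  /* characters beyond LENGTH are dropped */
            fileArray[row][col++] = *p;
        }
    }
    return col > 0 ? row + 1 : row;
}
```

Because the rows are padded rather than NUL-terminated, they must be compared with memcmp or a filler-aware helper instead of strcmp.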
Suppose the file being read has only two lines, as shown in Fig. 8.
After the processing above, the fileArrays array holds 8 entries:
fileArrays[0][Length]=
{'0','A','0','B','E','8','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$'};
fileArrays[1][Length]=
{'M','O','V','.','W',':','Q','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$'};
fileArrays[2][Length]=
{'#','1','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$'};
fileArrays[3][Length]=
{'R','1','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$'};
fileArrays[4][Length]=
{'M','O','V','.','W',':','Q','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$'};
fileArrays[5][Length]=
{'#','2','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$'};
fileArrays[6][Length]=
{'#','2','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$'};
fileArrays[7][Length]=
{'A','1','0','B','E','8','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$','$'};
A brief note on the data above: the dollar sign "$" is used to initialize the arrays because, among the ASCII characters, it is the only symbol without much practical meaning in C programming. Initializing the arrays with the dollar sign is therefore a natural choice.
4.22. Define four character arrays:
char JSR_A[Length];
char DB[Length];
char JMP_B[Length];
char JMP_S[Length];
These arrays are matched against the data in the file character array fileArray. The four assembly instructions they hold mark a switch...case statement, so a successful match shows that the assembly file contains a switch...case statement.
4.23. Implementation. The method is described in pseudocode. Before the description, we introduce the function ArrayLength(), which the description below may use. As the name suggests, ArrayLength returns the length of an array, but it differs from the usual length functions. See the definition:
[Figure: definition of the ArrayLength function]
The function is very simple: if the array contains characters other than "$", it returns the number of such characters; otherwise it returns 0.
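Based on that description alone, ArrayLength might look like the sketch below. The original listing survives only as an image, so the signature and the LENGTH macro (standing in for Length) are assumptions.

```c
#define LENGTH 40   /* stands in for the Length macro */

/* Counts the entries of one row that are not the '$' filler.
 * A row that holds only filler therefore has length 0. */
int ArrayLength(const char row[LENGTH])
{
    int count = 0;
    for (int i = 0; i < LENGTH; ++i) {
        if (row[i] != '$')
            ++count;
    }
    return count;
}
```

This differs from strlen in that it does not look for a terminating NUL; it relies on the '$' padding instead.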
[Figure: pseudocode for switch statement processing]
Switch statement processing module. Take Fig. 5 as the example, in which the case statements are marked with black boxes.
The following information can be extracted from the figure:
case 1, case 3, case 5, case 8, case 0x16.
Now consider the information that follows, shown in Fig. 6. The case statements clearly correspond one-to-one with the JMP statements, which makes the case statements easy to resolve. The switch...case statement in this example can therefore be translated as follows:
[Figure: translated switch...case statement]
In practice switch(R0) does not need to change: in assembly language every switch statement begins with switch(R0).
All return even.
4.3. Variable processing.
Variable processing mainly eliminates variables that were defined more than once in the preceding steps. The idea is simple: each time a variable definition is read, all later definitions of the same variable are removed; processing is complete when the end of the file is reached.
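A minimal sketch of this elimination pass is shown below. It assumes that the declarations have already been collected into an array of strings and that two declarations of the same variable are textually identical; the function name remove_redefinitions is hypothetical.

```c
#include <string.h>

/* Removes repeated declarations in place, keeping the first occurrence
 * of each and preserving the original order.
 * Returns the number of declarations that remain. */
int remove_redefinitions(const char *decls[], int count)
{
    int kept = 0;
    for (int i = 0; i < count; ++i) {
        int seen = 0;
        for (int j = 0; j < kept; ++j) {
            if (strcmp(decls[j], decls[i]) == 0) {
                seen = 1;           /* already declared earlier */
                break;
            }
        }
        if (!seen)
            decls[kept++] = decls[i];
    }
    return kept;
}
```

The quadratic scan is adequate for the handful of registers a small embedded program declares; a larger input would call for a hash set.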
4.4. While statement processing (control and loop statement processing).
While statement processing is more involved. The call relationship in the main program file is:
[Figure: call relationship in the main program file]
The Third function handles while statements. A return value of 1 means that this call processed a while statement; a return value of 0 means that no while statement was processed, and execution continues.
To process a while statement, first look at how a while statement is expressed in assembly.
The following code shows the assembly for a while statement:
[Figure: assembly code of a while statement]
The general shape can be seen in Fig. 7. For a while statement, the assembly of the condition block is examined first.
The condition block needs JMP, CMP, and JNZ. When these statements appear together and the addresses that follow them refer to one another, the code can be identified as a while statement and translated into one.
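One way to express the "addresses refer to one another" test is sketched below. The layout it looks for is an assumption, since Fig. 7 is not reproduced in the text: a JMP that targets a CMP, with the CMP immediately followed by a JNZ that jumps back to the instruction right after the JMP (the start of the loop body). The Insn type and find_while function are hypothetical.

```c
#include <string.h>

typedef struct {
    unsigned addr;          /* address of this instruction */
    const char *mnemonic;   /* for example "JMP", "CMP", "JNZ" */
    unsigned target;        /* jump target, 0 if not a jump */
} Insn;

/* Returns the index of the JMP that opens a while loop, or -1 if the
 * assumed JMP / CMP / JNZ pattern does not occur in code[0..n-1]. */
int find_while(const Insn *code, int n)
{
    for (int i = 0; i < n; ++i) {
        if (strcmp(code[i].mnemonic, "JMP") != 0)
            continue;
        for (int j = i + 1; j + 1 < n; ++j) {
            if (code[j].addr == code[i].target &&          /* JMP -> CMP */
                strcmp(code[j].mnemonic, "CMP") == 0 &&
                strcmp(code[j + 1].mnemonic, "JNZ") == 0 &&
                code[j + 1].target == code[i + 1].addr)    /* JNZ -> body */
                return i;
        }
    }
    return -1;
}
```

The instructions between the JMP and the CMP then form the loop body, and the CMP operands supply the loop condition.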
4.5. If statement processing (comprehensive processing).
An if statement is identified mainly by checking whether a CMP is present and whether a conditional jump follows it; if so, the code can be identified as an if control statement. The code in the appendix also handles the if control statements that are generated while the flag bits are processed.
An example of an if statement follows:
[Figure: assembly code of an if statement]
In general, the if statement also has a fairly fixed condition pattern:
a CMP followed by a statement such as JLT or JZ. The code above can therefore be translated as:
[Figure: translated if statement]
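The CMP-plus-conditional-jump check can be sketched as below. The names find_if and is_conditional_jump are hypothetical, and the jump list contains only the mnemonics the text itself mentions (JLT, JZ, JNZ); a real implementation would cover the target processor's full set.

```c
#include <string.h>

/* Returns 1 if the mnemonic is one of the conditional jumps named in
 * the text. The list is illustrative, not complete. */
static int is_conditional_jump(const char *m)
{
    static const char *const jumps[] = { "JLT", "JZ", "JNZ" };
    for (size_t i = 0; i < sizeof jumps / sizeof jumps[0]; ++i) {
        if (strcmp(m, jumps[i]) == 0)
            return 1;
    }
    return 0;
}

/* Returns the index of the first CMP that is immediately followed by a
 * conditional jump, or -1 if there is none. */
int find_if(const char *const mnemonics[], int n)
{
    for (int i = 0; i + 1 < n; ++i) {
        if (strcmp(mnemonics[i], "CMP") == 0 &&
            is_conditional_jump(mnemonics[i + 1]))
            return i;
    }
    return -1;
}
```

A CMP followed by JNZ also appears in the while pattern, which is consistent with the text handling while statements in the earlier sub-step so that only the remaining matches are treated as if statements.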
Table 1: processing of embedded microprocessor flag bits (taking the Texas Instruments MSP430 as the example; the operands of a two-operand instruction are denoted RA and RB, and the operand of a one-operand instruction RC; V, N, Z, and C are the four flag bits, representing the overflow, negative, zero, and carry bits respectively)
[Figure: Table 1, flag bit processing for MSP430 instructions]
In the table, "*" means the status bit is affected, "-" means it is not affected, "0" means it is cleared, and "1" means it is set. Instructions with .B are byte instructions, and instructions with [.W] are word (two-byte) instructions; the suffix may be omitted.
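The legend above lends itself to a table-driven encoding, sketched here under stated assumptions. The FlagEffect type and both functions are hypothetical, and the four sample rows reflect the MSP430 instruction set as it is commonly documented rather than the patent's own Table 1, whose contents survive only as images.

```c
#include <string.h>

typedef struct {
    const char *mnemonic;
    char v, n, z, c;    /* one legend symbol per flag: '*', '-', '0', '1' */
} FlagEffect;

/* Illustrative rows only; not copied from Table 1. */
static const FlagEffect effects[] = {
    { "ADD",  '*', '*', '*', '*' },
    { "MOV",  '-', '-', '-', '-' },
    { "CLRC", '-', '-', '-', '0' },
    { "SETC", '-', '-', '-', '1' },
};

/* Looks up an instruction; returns NULL if it is not in the table. */
const FlagEffect *lookup_effect(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof effects / sizeof effects[0]; ++i) {
        if (strcmp(effects[i].mnemonic, mnemonic) == 0)
            return &effects[i];
    }
    return NULL;
}

/* Applies one legend symbol to a flag. computed is the value derived
 * from the instruction's result and is used only for '*'. */
int apply_effect(char effect, int old_value, int computed)
{
    switch (effect) {
    case '*': return computed;
    case '0': return 0;
    case '1': return 1;
    default:  return old_value;   /* '-': flag is left unchanged */
    }
}
```

With such a table, the translator only needs to emit a flag update after a statement when the corresponding symbol is not "-".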
The specific embodiment described here merely illustrates the spirit of the invention. Those skilled in the art may modify or supplement the described embodiment, or substitute similar approaches, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (3)

1. A method for processing flag bits in a decompiler, characterized in that it comprises the following steps:
Step 1: an initialization module reads in the input assembly language file.
Step 2: a flag bit identification module defines the flag bits that correspond to the specific format of the input file and stores them in a flag bit file A.
Step 3: a flag processing module reads and processes the file in units of the defined assembly statements.
Step 4: after the whole assembly file has been processed as in step 3, the control and loop structures and the arrays are processed, which yields a high-level language file B that is relatively easy to understand. File A is then added to the head of file B to produce a high-level language program with reasonably complete meaning.
2. The method for processing flag bits in a decompiler according to claim 1, characterized in that step 3 is specifically: with reference to the actual meaning of each assembly statement and its effect on the flag bits, express the statement in the corresponding high-level language.
3. The method for processing flag bits in a decompiler according to claim 1, characterized in that step 4 is divided into five sub-steps: array processing; switch statement processing; variable processing; control and loop statement processing; and comprehensive processing.
CN201210546092.4A 2012-12-14 2012-12-14 Method for processing flag bits in a decompiler Expired - Fee Related CN103049265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210546092.4A CN103049265B (en) 2012-12-14 2012-12-14 Method for processing flag bits in a decompiler

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210546092.4A CN103049265B (en) 2012-12-14 2012-12-14 Method for processing flag bits in a decompiler

Publications (2)

Publication Number Publication Date
CN103049265A true CN103049265A (en) 2013-04-17
CN103049265B CN103049265B (en) 2016-12-28

Family

ID=48061917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210546092.4A Expired - Fee Related CN103049265B (en) 2012-12-14 2012-12-14 Method for processing flag bits in a decompiler

Country Status (1)

Country Link
CN (1) CN103049265B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101364253A (en) * 2007-08-06 2009-02-11 电子科技大学 Covert debug engine and method for anti-worm
CN101714118A (en) * 2009-11-20 2010-05-26 北京邮电大学 Detector for binary-code buffer-zone overflow bugs, and detection method thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
殷文建: "面向ARM体系结构的代码逆向分析关键技术研究", 《中国优秀硕士学位论文全文数据库》 *
秦青文等: "基于IDA-Pro的软件逆向分析方法", 《计算机工程》 *
韩小琨等: "可重用的指令集模拟器的设计与优化技术", 《计算机工程》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108958739A (en) * 2018-06-06 2018-12-07 北京大学 Array data structure restoration methods and system in a kind of binary system decompiling
CN108958739B (en) * 2018-06-06 2020-11-10 北京大学 Method and system for recovering array data structure in binary decompilation
CN111935622A (en) * 2020-08-03 2020-11-13 深圳创维-Rgb电子有限公司 Debugging method, device, equipment and storage medium for electronic equipment with digital power amplifier
CN111935622B (en) * 2020-08-03 2022-02-11 深圳创维-Rgb电子有限公司 Debugging method, device, equipment and storage medium for electronic equipment with digital power amplifier

Also Published As

Publication number Publication date
CN103049265B (en) 2016-12-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20161228

Termination date: 20181214