CN1818863A - Static library decompiling recognition of built-in software - Google Patents

Static library decompiling recognition of built-in software

Info

Publication number
CN1818863A
Authority
CN
China
Prior art keywords
function
code
static
address
function module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200610049803
Other languages
Chinese (zh)
Inventor
陈天洲
胡威
谢斌
赵懿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN 200610049803 priority Critical patent/CN1818863A/en
Publication of CN1818863A publication Critical patent/CN1818863A/en
Pending legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)
  • Devices For Executing Special Programs (AREA)
  • Stored Programmes (AREA)

Abstract

A method for identifying static library functions during embedded software decompilation. Drawing on the characteristics of the decompilation process as a whole, the invention matches the user's function modules against the library functions in the compiler's static library, and replaces the function entry address in each function-call instruction of the intermediate code with the name of the corresponding library function, thereby identifying the static library functions. The advantage is that it assists decompilation, that is, the conversion of object code into an equivalent high-level language form.

Description

Method for identifying static library functions in embedded software decompilation
Technical field
The present invention relates to the field of embedded software decompilation, and in particular to a method for identifying static library functions during embedded software decompilation.
Background art
Reverse engineering is the analysis of an object to see how it works, with the aim of duplicating or enhancing that object. It is a practice carried over from older industries and is now frequently applied to computer hardware and software. For instance, in the automobile industry, a manufacturer may buy a competitor's vehicle, take it apart, and examine the welds, seals, and other components in order to improve its own vehicles with similar parts.
Software reverse engineering involves turning a program's machine code (the string of 0s and 1s sent to the logic processor) back into the source code it was written in, expressed in programming-language statements. Software reverse engineering may be undertaken to: recover a program's source code because the original was lost; study how a program performs a particular operation; improve a program's performance; fix a bug (correcting an error in the program when the source code is unavailable); identify malicious content in a program, such as a virus; or adapt a program written for one microprocessor to a microprocessor of a different design. Reverse engineering for the sole purpose of copying a program constitutes copyright infringement and is illegal.
In some cases, the software license expressly prohibits reverse engineering. Someone doing software reverse engineering may use several tools to take a program apart. One tool is the hexadecimal dumper, which prints or displays a program's binary numbers in hexadecimal format (which is easier to read than binary). By knowing the bit patterns that represent processor instructions, as well as the instruction lengths, the reverse engineer can identify specific parts of a program and see how they work.
Another common tool is the disassembler. A disassembler reads the binary code and then displays each executable instruction in text form. A disassembler cannot tell the difference between executable instructions and the data used by the program, so a debugger is used, which allows the disassembler to avoid disassembling the data portions of a program. These tools may be used by a cracker to modify code in order to gain entry to a computer system or cause other damage.
Hardware reverse engineering involves taking a device apart to see how it works. For instance, if a processor manufacturer wants to see how a competitor's processor works, it can buy the competitor's processor, take it apart, and then make a similar processor. However, this process is illegal in many countries. In general, hardware reverse engineering requires a great deal of expertise and is quite expensive. Another kind of reverse engineering involves producing a 3-D model of an existing part (when its blueprint is unavailable) so that the part can be manufactured again. To reverse engineer a part, the part is measured with a coordinate measuring machine (CMM). As it is measured, a 3-D wire-frame image is generated and displayed on a monitor. After measurement is complete, the wire-frame image is dimensioned. Any part can be reverse engineered using these methods.
Dynamic trace debugging can trace a program and execute it one step at a time. In practice this relies on the single-step interrupt and the breakpoint interrupt; most current trace debuggers make use of these two interrupts.
The single-step interrupt (INT1) is an interrupt caused by the internal state of the machine. When the TF flag (the single-step trap flag) in the system flags register is set, a single-step interrupt is generated automatically, so that the CPU stops after executing one instruction and the contents of each register are displayed.
The breakpoint interrupt (INT3) is a software interrupt; a software interrupt is also called a trap instruction. When the CPU executes the trap instruction, it enters the breakpoint interrupt service routine, which displays the contents of each register at the breakpoint.
Decompilation is a technique for converting object code into equivalent high-level language code, and it is an important part of software reverse engineering. Decompilation is divided into decompilation of executable programs and decompilation of virtual machine instructions. Identifying library functions is an important part of the whole decompilation process.
Summary of the invention
The object of the present invention is to provide a method for identifying static library functions during embedded software decompilation.
The technical solution adopted by the invention to solve its technical problem is as follows:
1) Intermediate code identification
The assembly code of the executable program is generated using the correspondence between assembly instructions and machine instructions;
2) Extract function modules
A function module ultimately always appears at the start address A named in an instruction of the form "CALL address A", so identifying function modules means identifying the module that begins at address A in "CALL address A"; a function module is determined by finding its start address and end address;
3) Identify the compiler used
Different compilers ship with differently named dynamic libraries; these names, together with other compiler-specific information produced during compilation, serve well to identify the compiler used to build the executable;
4) Extract the corresponding static library
The static library of each compiler can be located by consulting its user manual; by analyzing the file format, the function name, function code, and other information for every function in the library file can be obtained, and a static database of the static library functions is built;
5) Identify library functions
For a given section of function module code, find a function in the database built from the static library whose instruction length and opcode sequence are equal to those of the function module code.
Compared with the background art, the invention has the following beneficial effects:
The invention is a method for identifying static library functions during embedded software decompilation. Drawing on the characteristics of the decompilation process as a whole, it matches the user's function modules against the library functions in the compiler's static library, and replaces the function entry address in each function-call instruction of the intermediate code with the name of the corresponding library function, thereby identifying the static library functions.
(1) Efficiency. The method identifies static library functions during embedded software decompilation, which speeds up decompilation work and improves its efficiency.
(2) Practicality. Matching the user's function modules against the library functions in the compiler's static library assists the work of converting object code into an equivalent high-level language form.
Description of drawings
The accompanying drawing is a schematic diagram of the implementation process of the invention.
Embodiment
The invention is a method for identifying static library functions during embedded software decompilation; its implementation is described below with reference to the accompanying drawing.
1) intermediate code identification
Converting the executable into the corresponding intermediate code is a key step in library function identification; this step is also the disassembly stage of decompilation. The invention uses the correspondence between assembly instructions and machine instructions to generate the assembly code of the executable program. The disassembled code is shown in the following listing:
01006CB2:50 push eax
01006CB3:FF 75 08 push dword ptr[ebp+8]
01006CB6:FF 15 D0 10 00 01 call dword ptr ds:[010010D0h]
01006CBC:59 pop ecx
01006CBD:59 pop ecx
01006CC1:FF 75 08 push dword ptr[ebp+8]
01006CC4:FF 15 FC 10 00 01 call dword ptr ds:[010010FCh]
In the listing above, the first column, of the form "01006CBC", is the load address of the code; the second column, of the form "FF 75 08", is the binary encoding of the instruction; the "push" that follows is the opcode of the corresponding assembly instruction; and the last item is the operand of the instruction.
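The column layout just described can be illustrated with a minimal Python sketch that splits one listing line into its four parts. The names `Instruction`, `LINE_RE`, and `parse_line` are hypothetical and are not part of the invention; the sketch assumes that byte values are written as uppercase hexadecimal pairs and mnemonics in lowercase, as in the listing above.

```python
import re
from typing import NamedTuple


class Instruction(NamedTuple):
    address: int    # load address of the instruction
    raw: bytes      # binary encoding of the instruction
    opcode: str     # mnemonic, e.g. "push"
    operands: str   # operand text, empty for instructions such as "ret"


# Assumed layout: 8 hex digits, a colon, uppercase hex byte pairs,
# then a lowercase mnemonic and the remaining operand text.
LINE_RE = re.compile(
    r"^(?P<addr>[0-9A-F]{8}):\s*"
    r"(?P<raw>(?:[0-9A-F]{2}\s+)+)"
    r"(?P<opcode>[a-z]+)\s*(?P<operands>.*)$"
)


def parse_line(line: str) -> Instruction:
    """Split one disassembly line into address, bytes, opcode, operands."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized listing line: {line!r}")
    hex_digits = "".join(m.group("raw").split())
    return Instruction(
        address=int(m.group("addr"), 16),
        raw=bytes.fromhex(hex_digits),
        opcode=m.group("opcode"),
        operands=m.group("operands").strip(),
    )


ins = parse_line("01006CB3:FF 75 08 push dword ptr[ebp+8]")
```

Keeping the opcode separate from the operand text is what later allows a signature to be built from opcodes alone.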
2) extract function module
Extracting function modules means identifying the function modules that occur in the intermediate code. In intermediate code, a function module ultimately always appears at the start address A named in an instruction of the form "CALL address A", so identifying function modules means identifying the module that begins at address A in "CALL address A". By the definition of a function and the way functions are compiled, every function in intermediate code form ends by returning with "retn". Therefore, by following the control flow of the program itself and using the "retn" return as the criterion, the function modules referenced by all CALL instructions in the intermediate code can be identified.
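One possible reading of this step is sketched below in Python. It is a simplification under stated assumptions, not the procedure of the invention: only direct calls with a plain hexadecimal target are considered, and a module is taken to run from its entry address to the last "ret" or "retn" found before the next entry address, rather than being traced along the real control flow. The names `Ins`, `call_targets`, and `extract_modules` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class Ins(NamedTuple):
    address: int
    opcode: str
    operand: str = ""


RETURNS = {"ret", "retn"}


def call_targets(code: List[Ins]) -> Set[int]:
    """Collect the target addresses of direct calls such as 'call 00002000'."""
    targets = set()
    for ins in code:
        if ins.opcode == "call":
            try:
                targets.add(int(ins.operand, 16))
            except ValueError:
                pass  # indirect call (through memory or a register); skipped
    return targets


def extract_modules(code: List[Ins]) -> Dict[int, List[Ins]]:
    """Map each called entry address to the instructions of its module."""
    ordered = sorted(code, key=lambda i: i.address)
    known = {i.address for i in ordered}
    starts = sorted(t for t in call_targets(ordered) if t in known)
    modules = {}
    for n, start in enumerate(starts):
        limit = starts[n + 1] if n + 1 < len(starts) else None
        body = [i for i in ordered
                if i.address >= start and (limit is None or i.address < limit)]
        # Assumption: the module ends at the last return before the next entry.
        last = max((k for k, i in enumerate(body) if i.opcode in RETURNS),
                   default=None)
        if last is not None:
            modules[start] = body[:last + 1]
    return modules
```

Cutting at the last return instead of the first keeps functions with several return points intact, at the cost of possibly absorbing code that is never reached through a direct call.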
3) discern used compiler
Different compilers ship with differently named dynamic libraries; these names, together with other compiler-specific information produced during compilation, serve well to identify the compiler used to build the executable. Given the characteristics of real executables, if the executable references a dynamic library, the corresponding compiler can be identified from that library.
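As a rough illustration, the lookup from dynamic library name to compiler could be written as follows in Python. The table `RUNTIME_TO_COMPILER` and the function `identify_compiler` are hypothetical; the two entries are illustrative examples chosen for this sketch and do not come from the invention, and a real table would need to be built per compiler and version.

```python
from typing import Iterable, Optional

# Illustrative entries only (assumption): runtime library name -> compiler.
RUNTIME_TO_COMPILER = {
    "msvcrt.dll": "Microsoft Visual C++",
    "cygwin1.dll": "GCC (Cygwin)",
}


def identify_compiler(imported_libraries: Iterable[str]) -> Optional[str]:
    """Return the compiler whose runtime library the executable references."""
    for name in imported_libraries:
        compiler = RUNTIME_TO_COMPILER.get(name.lower())
        if compiler is not None:
            return compiler
    return None  # no known runtime library was referenced


compiler = identify_compiler(["KERNEL32.dll", "MSVCRT.dll"])
```

The list of referenced libraries is assumed to have been read from the executable beforehand; how it is obtained depends on the executable file format.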
4) extract the corresponding static storehouse
The static library of each compiler can be located by consulting its user manual. By analyzing the file format, the function name, function code, and other information for every function in the library file can be obtained. For the function "fopen", for example, this method yields the following listing:
-fopen:
00000000:6A 40 push 40h
00000002:FF 74 24 0C push dword ptr[esp+0Ch]
00000006:FF 74 24 0C push dword ptr[esp+0Ch]
0000000A:E8 00 00 00 00 call 0000000F
0000000F:83 C4 0C add esp,0Ch
00000012:C3 ret
From the function names and function code obtained for all library functions, a static database of the static library functions can be built. The format of the database is shown in the following table:
Function name    Instruction length    Opcode sequence
fopen            17                    push push push call add ret
isalpha          46                    cmp jle push push call pop pop ret mov mov mov and ret
Because relocation is performed at link time, the final operand of instructions such as "call 0000000F" above may change, so operands are not recorded when the database is built.
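The reason for leaving operands out can be shown with a small Python sketch of how one database record might be built. The names `Signature` and `make_signature` are hypothetical, and "instruction length" is assumed here to mean the total size of the instruction encodings in bytes, which the text does not define. The relocated byte values in the second call are invented for the illustration.

```python
from typing import NamedTuple, Sequence, Tuple


class Signature(NamedTuple):
    name: str
    length: int                # assumed: total size of the encodings, in bytes
    opcodes: Tuple[str, ...]   # mnemonics only; operands are left out


def make_signature(name: str,
                   instructions: Sequence[Tuple[bytes, str]]) -> Signature:
    """Build a record from (raw_bytes, opcode) pairs of one library function."""
    length = sum(len(raw) for raw, _ in instructions)
    opcodes = tuple(opcode for _, opcode in instructions)
    return Signature(name, length, opcodes)


# The same two instructions before and after linking: only the call
# displacement differs (the relocated value is made up for this sketch).
before = [(bytes.fromhex("E800000000"), "call"), (bytes.fromhex("C3"), "ret")]
after = [(bytes.fromhex("E8125A0000"), "call"), (bytes.fromhex("C3"), "ret")]

# Neither the length nor the opcode sequence depends on the operand value.
assert make_signature("f", before) == make_signature("f", after)
```

A record built this way stays valid whatever address the linker finally assigns to the callee, which is the property the paragraph above relies on.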
5) identification built-in function
For a given section of function module code, find a function in the database built from the static library whose instruction length and opcode sequence are equal to those of the function module code.
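The matching step, followed by the replacement of call addresses with function names, can be sketched in Python as below. The two database entries are taken from the table above; the names `DATABASE`, `identify`, and `rename_calls`, the (opcode, operand) representation of the intermediate code, and the address 00401000 are assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Key: (instruction length, opcode sequence); value: library function name.
DATABASE: Dict[Tuple[int, Tuple[str, ...]], str] = {
    (17, ("push", "push", "push", "call", "add", "ret")): "fopen",
    (46, ("cmp", "jle", "push", "push", "call", "pop", "pop", "ret",
          "mov", "mov", "mov", "and", "ret")): "isalpha",
}


def identify(length: int, opcodes: Sequence[str]) -> Optional[str]:
    """Return the library function with equal length and opcode sequence."""
    return DATABASE.get((length, tuple(opcodes)))


def rename_calls(code: List[Tuple[str, str]],
                 names: Dict[int, str]) -> List[Tuple[str, str]]:
    """Replace direct call targets with the identified function names."""
    result = []
    for opcode, operand in code:
        if opcode == "call":
            try:
                operand = names.get(int(operand, 16), operand)
            except ValueError:
                pass  # indirect call: the operand is not a plain address
        result.append((opcode, operand))
    return result


# Hypothetical entry address of a module that matched the "fopen" record.
names = {0x00401000: identify(17, ["push", "push", "push", "call", "add", "ret"])}
renamed = rename_calls([("push", "eax"), ("call", "00401000")], names)
```

Using the pair of length and opcode sequence as a dictionary key makes the lookup an exact-equality test, as the step requires; two different library functions with identical pairs would collide, which this sketch does not handle.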

Claims (1)

1. A method for identifying static library functions in embedded software decompilation, characterized in that:
1) Intermediate code identification
The assembly code of the executable program is generated using the correspondence between assembly instructions and machine instructions;
2) Extract function modules
A function module ultimately always appears at the start address A named in an instruction of the form "CALL address A", so identifying function modules means identifying the module that begins at address A in "CALL address A"; a function module is determined by finding its start address and end address;
3) Identify the compiler used
Different compilers ship with differently named dynamic libraries; these names, together with other compiler-specific information produced during compilation, serve well to identify the compiler used to build the executable;
4) Extract the corresponding static library
The static library of each compiler can be located by consulting its user manual; by analyzing the file format, the function name, function code, and other information for every function in the library file can be obtained, and a static database of the static library functions is built;
5) Identify library functions
For a given section of function module code, find a function in the database built from the static library whose instruction length and opcode sequence are equal to those of the function module code.
CN 200610049803 2006-03-13 2006-03-13 Static library decompiling recognition of built-in software Pending CN1818863A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610049803 CN1818863A (en) 2006-03-13 2006-03-13 Static library decompiling recognition of built-in software

Publications (1)

Publication Number Publication Date
CN1818863A true CN1818863A (en) 2006-08-16

Family

ID=36918894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610049803 Pending CN1818863A (en) 2006-03-13 2006-03-13 Static library decompiling recognition of built-in software

Country Status (1)

Country Link
CN (1) CN1818863A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100451969C (en) * 2006-12-27 2009-01-14 北京大学 Method for recognizing composite conditional branch structure
CN101393521B (en) * 2008-11-13 2012-04-25 上海交通大学 Extracting system for internal curing data of windows application program
CN102402479A (en) * 2010-09-28 2012-04-04 微软公司 Intermediate representation construction for static analysis
US8930913B2 (en) 2010-09-28 2015-01-06 Microsoft Corporation Intermediate representation construction for static analysis
US9563535B2 (en) 2010-09-28 2017-02-07 Microsoft Technology Licensing, Llc Intermediate representation construction for static analysis
CN102402479B (en) * 2010-09-28 2015-09-16 微软技术许可有限责任公司 For the intermediate representation structure of static analysis
CN103150438B (en) * 2013-03-12 2016-01-06 青岛中星微电子有限公司 A kind of circuit compiler method and device
CN103150438A (en) * 2013-03-12 2013-06-12 青岛中星微电子有限公司 Method and device for compiling circuit
CN103577728A (en) * 2013-11-16 2014-02-12 哈尔滨工业大学 Method for identifying library functions by using shrinkage executing dependence graphs
CN103577728B (en) * 2013-11-16 2016-03-30 哈尔滨工业大学 A kind of method using contraction to perform dependency graph identification built-in function
CN104679495A (en) * 2013-12-02 2015-06-03 贝壳网际(北京)安全技术有限公司 Method and device for recognizing software
CN104679495B (en) * 2013-12-02 2018-04-27 北京猎豹移动科技有限公司 software identification method and device
CN104915211A (en) * 2015-06-18 2015-09-16 西安交通大学 Intrinsic function recognition method based on sub-graph isomorphism matching algorithm in decompilation
CN104915211B (en) * 2015-06-18 2018-04-17 西安交通大学 Intrinsic function recognition methods based on Subgraph Isomorphism matching algorithm in decompiling
CN105044653A (en) * 2015-06-30 2015-11-11 武汉大学 Software conformance detection method for smart electric meters
CN109739506A (en) * 2018-12-27 2019-05-10 郑州云海信息技术有限公司 The processing method and system of library function missing in a kind of compiling of performance application
CN109739506B (en) * 2018-12-27 2022-02-18 郑州云海信息技术有限公司 Method and system for processing library function missing in high-performance application compilation
CN109918950A (en) * 2019-03-24 2019-06-21 哈尔滨理工大学 A kind of application method identifying binary function in embedded device
WO2022068559A1 (en) * 2020-09-30 2022-04-07 华为技术有限公司 Code processing method and apparatus, and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication