CN100377089C - Identifying method of multiple target branch statement through jump list in binary translation - Google Patents

Publication number
CN100377089C
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2005100855091A
Other languages
Chinese (zh)
Other versions
CN1900910A (en)
Inventor
陈龙
唐锋
谢海斌
杨浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

A method for identifying multi-target branch statements implemented via a jump table in binary translation comprises the following steps: 1) converting the semantics to be identified into a semantic graph that serves as the template graph; 2) selecting a code snippet whose semantics are to be identified; 3) building a semantic graph for the code snippet according to its semantics; 4) matching the semantic graph built in step 3) against the template graph obtained in step 1), proceeding to the next step if the match succeeds and reporting an identification failure otherwise; and 5) obtaining the address and size of the jump table, then obtaining the branch target addresses and marking the code there as valid code. The invention extends the coverage of binary translation, improves code execution efficiency, applies to a wide range of platforms, and uses a simple algorithm that is easy to implement.

Description

Method for identifying multi-target branch statements implemented via jump tables in binary translation
Technical field
The present invention relates to translation technology in the computer field, and in particular to the identification of multi-target branch statements implemented via jump tables in static binary translation.
Background art
Binary translation is currently one of the main research topics for solving the software porting problem. It allows existing software to run on newly developed processors, which matters both for advancing processor architectures and for promoting domestically developed microprocessors.
In a binary translation system that works statically, or that combines static and dynamic translation, the static translation stage must analyze and translate the binary file being translated. Not all data in the file is meaningful, however; the file may contain padding introduced to satisfy page-alignment requirements. The static stage needs to identify as much valid code as possible and translate and optimize it, producing well-performing binary code for the target machine (static native code). But in the static stage only code that is known to be reachable counts as valid: for example, the instruction at each function's entry address (recorded in the symbol table of the ELF file), the instruction at the target address of a jump instruction, and, for a branch instruction, both the next instruction (fall through) and the instruction at the target address (target). All of these can be determined statically. For indirect jump and indirect call instructions (such as jmp *%eax and call *%eax), however, the target address is fetched from a register or memory location at run time and cannot be obtained in the static stage, so the valid code at their target addresses cannot be translated.
An indirect jump instruction such as jmp *%eax represents a multi-target branch statement: the jump has several possible targets, and which one is taken is decided at run time. The typical case behind an indirect jump is the switch-case statement of high-level languages. switch-case statements are common in real programs, so identifying and translating the code at all jump targets in the static translation stage helps improve code execution efficiency.
Taking the C language as an example, a typical switch-case statement can be written as:
switch(expr)
{
case value1:
code1;break;
case value2:
code2;break;
case value3:
code3;break;
case......
default:
code_default;
}
Here <expr> is called the expression to be matched, and <value1>, <value2> and <value3> are the candidate values. When the value of <expr> equals one of the candidate values, the code at the corresponding target (<code1>, <code2> or <code3>) is executed.
Compilers implement the switch-case statement mainly in the following ways:
1. Compare the value of the expression with each candidate value in turn; if it equals one of them, transfer to the entry of the corresponding branch. This approach is usually used when there are few candidate values. The compiled object code has the structure:
cmp value1,expr
je code1_addr
cmp value2,expr
je code2_addr
cmp value3,expr
je code3_addr
jmp code_default_addr
code1_addr:code1
code2_addr:code2
code3_addr:code3
code_default_addr:code_default
2. Organize the candidate values into a search tree to speed up the lookup (for example, binary search). This is usually used when the candidate values are numerous and sparsely distributed. For the example above, if value1 < value2 < value3, <expr> is first compared with <value2>, and then with <value1> or <value3>.
3. Use a jump table. Each entry of the jump table holds the branch entry address for one candidate value. When a jump table is used, the value of the expression <expr> is converted into an index into the table, so the corresponding entry can be found immediately. This is usually used when the candidate values are numerous and densely distributed. The object code has the structure:
index ← f(expr) ; convert the value of <expr> into a jump table index
jmp *Table_Base(index * 4) ; Table_Base is the start address of the jump table and each entry occupies 4 bytes, so Table_Base + index * 4 is the address of entry number index
Table 1 below shows an example of a jump table.
Table 1
[Table 1 appears in the source as image C20051008550900061 and is not reproduced here.]
As the table shows, the entries of a jump table are not necessarily laid out in the order of the branch entry addresses, and some entries in between may hold the address of the code_default code, because the candidate values are not necessarily contiguous; in the table, <value2> and <value3> are not contiguous.
4. Use a hash table. Each entry of the hash table also holds the branch entry address for one candidate value. Computing the hash of the expression <expr> likewise leads quickly to the corresponding entry. This method suits numerous, sparsely distributed candidate values, but it is rarely used in practice.
For the first two ways, the jump target addresses are easy to obtain from the instructions, so code1, code2, code3 and code_default can be identified and translated. For the third way, however, the corresponding table entry is generally reached through an indirect jump instruction, so the target addresses are hard to obtain directly in the static translation stage, which hurts translation efficiency. The fourth way is rarely used, and the present invention does not handle it.
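As a minimal sketch of the third way, the C fragment below dispatches through a table of function pointers after a single bounds check. The names dispatch, table_base, MIN_VALUE and NUM, and the candidate values 10, 11 and 13, are assumptions made for illustration; a compiler emits the table and the indirect jump directly rather than calling functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical branch bodies; each returns a tag so the dispatch can be checked. */
static int code1(void) { return 1; }
static int code2(void) { return 2; }
static int code3(void) { return 3; }
static int code_default(void) { return 0; }

typedef int (*branch_fn)(void);

/* Candidate values 10, 11 and 13. Value 12 has no case, so its slot holds
 * code_default: the candidate values are not contiguous. */
enum { MIN_VALUE = 10, NUM = 3 };   /* NUM is the highest valid index */
static const branch_fn table_base[NUM + 1] = { code1, code2, code_default, code3 };

int dispatch(int expr)
{
    /* index <- f(expr). Unsigned arithmetic makes values below MIN_VALUE wrap
     * to very large indices, so one unsigned comparison covers both bounds. */
    unsigned index = (unsigned)expr - (unsigned)MIN_VALUE;
    if (index > (unsigned)NUM)      /* cmp $num, index ; ja code_default */
        return code_default();
    return table_base[index]();     /* jmp *Table_Base(index * 4) */
}
```

The unsigned comparison mirrors the single cmp/ja pair that guards the table in compiled code.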
Summary of the invention
The object of the present invention is to overcome the shortcoming of the prior art that instruction target addresses are hard to obtain directly in the static translation stage, and to provide a method for identifying multi-target branch statements implemented via jump tables in binary translation. The method analyzes the location and contents of the jump table in the static translation stage and thereby obtains the target address of each branch.
To achieve this object, the invention provides a method for identifying multi-target branch statements implemented via jump tables in binary translation, comprising:
1) converting the semantics to be identified into a semantic graph, which serves as the template graph;
2) selecting the code snippet whose semantics are to be identified: first search the code for an indirect jump instruction; when one is found, examine a certain number of instructions preceding it and look for a compare instruction among them; if a compare instruction is found, take the code between the compare instruction and the indirect jump instruction as the code snippet to be identified; if none is found, do nothing;
3) building a semantic graph, according to its semantics, for the code snippet selected in step 2);
4) matching the semantic graph built in step 3) against the template graph obtained in step 1); if the match succeeds, the semantics are identified and the next step is carried out; otherwise semantic identification of the selected code snippet fails;
5) obtaining the address and size of the jump table, then obtaining the branch target addresses recorded in its entries, and marking the code at those branch target addresses as valid code.
In the above technical solution, building the semantic graph in step 3) requires a queue that records operands. The queue records every operand that has appeared during graph construction, together with the node in the semantic graph that corresponds to the latest value of that operand.
In the above technical solution, in step 4) the semantic graph under test and the template graph are traversed synchronously, checking whether the nodes at corresponding positions in the two graphs match; the two graphs are considered to match if all nodes at corresponding positions match.
In the above technical solution, in step 4) the nodes of the semantic graph and the template graph are divided into three types: branch nodes, operator nodes and atomic operand nodes. The branch node is treated as the root of the whole semantic graph, and matching starts from it. Nodes of different types are matched by different rules:
4-1. Branch node: check in turn whether each of its three subtrees matches the subtree at the corresponding position in the template graph.
4-2. Operator node: first check whether its content is identical to that of the node at the corresponding position in the template graph; if so, recursively check its left and right subtrees.
4-3. Atomic operand node: to make it easy to check whether two atomic operand nodes match, matching-rule information is added to the template graph, namely any match, strict match and non-strict match. Any match means only that a node must exist at this position in the graph under test; its content may be anything. Strict match means the type and value of the node at this position must both be identical to those of the node at the corresponding position in the template graph. Non-strict match means only the type of the node at this position must be identical to that of the node at the corresponding position in the template graph; the value may differ.
The advantages of the invention are:
1. The method extends the translation coverage of the static translation stage in binary translation and improves code execution efficiency.
2. The method can be applied to the analysis of binary files on many platforms and therefore has a wide range of application.
3. The algorithm used to identify multi-target branch statements implemented via jump tables is concise and easy to implement.
Description of drawings
Fig. 1 is the semantic graph of semantic 1 described in an embodiment of the method of the present invention;
Fig. 2 is the semantic graph of semantic 2 described in an embodiment of the method of the present invention;
Fig. 3 is the semantic graph built from the code in Example 1;
Fig. 4 is the semantic graph built from the code in Example 3;
Fig. 5 is the semantic graph of semantic 1 with matching rules attached;
Fig. 6 is the semantic graph of semantic 2 with matching rules attached;
Fig. 7 is the flowchart of the method of the present invention for identifying multi-target branch statements implemented via jump tables in binary translation.
Detailed description
The present invention is further described below with reference to the drawings and specific embodiments.
In one embodiment, binary files on the Linux/x86 platform are the objects of translation, but the method can be extended to binary translation systems on other platforms.
The objects of static binary translation are executable files and shared library files. In executable files, the code of multi-way branch statements implemented with jump tables essentially follows one of three patterns:
● Pattern 1:
cmp $num, oprand ; <$num> is an immediate that gives the number of jump table entries. <oprand> is an operand, either a register operand or a memory operand, that holds the index of the jump table entry corresponding to the value of <expr>, where <expr> is the expression to be matched in the switch-case statement.
ja code_default ; <code_default> is a label marking the entry of the default code of the switch-case statement. If <oprand> is greater than <$num>, the index lies outside the range covered by the jump table, so control jumps to code_default.
mov oprand, %reg ; put the index into a register
mov Table_Base(, %reg, 4), %eax ; <Table_Base> is the start address of the jump table. Table_Base + %reg * 4 is the address of the matching jump table entry (each entry occupies 4 bytes); the content of that entry is loaded into register %eax.
jmp *%eax ; transfer control to the target code address taken from the table entry
● Pattern 2:
cmpl $num, oprand ; same as pattern 1
ja code_default ; same as pattern 1
mov oprand, %eax ; same as pattern 1
shl $0x2, %eax ; compute %eax * 4 by shifting left by 2 bits
mov Table_Base(%eax), %eax ; same as pattern 1; here Table_Base + %eax is the address of the jump table entry
jmp *%eax ; same as pattern 1
● Pattern 3: a more optimized form
cmp $num, %reg ; %reg holds the index
ja code_default ; same as pattern 1
jmp *Table_Base(, %reg, 4) ; Table_Base + %reg * 4 is the address of the matching jump table entry; the branch target address stored there is fetched and control is transferred to it
One difference between pattern 3 and patterns 1 and 2 is that pattern 3 does not perform a register-indirect jump through %eax but uses a memory-indirect jump.
Example 1. The following code fragment is disassembled from eon in SPEC2000 (compiled with gcc -O0); it matches pattern 2:
80ab3bd: 83 7d fc 06          cmpl $0x6,0xfffffffc(%ebp)
80ab3c1: 77 65                ja 80ab428 <category_to_name+0x74>
80ab3c3: 8b 45 fc             mov 0xfffffffc(%ebp),%eax
80ab3c6: 89 c0                mov %eax,%eax
80ab3c8: c1 e0 02             shl $0x2,%eax
80ab3cb: 8b 80 b0 27 13 08    mov 0x81327b0(%eax),%eax
80ab3d1: ff e0                jmp *%eax
The patterns and the example above show that, although all of them are multi-target branch statements implemented via jump tables, their form in the binary code of an executable file varies. The addressing mode differs (register-indirect jump or memory-indirect jump), the multiplication by 4 is implemented differently (a left shift by two bits or an operand using SIB addressing), and real code may also contain irrelevant instructions such as mov %eax,%eax. Deciding whether code fits a pattern by simply matching instructions is therefore very complicated. Further analysis shows, however, that these patterns share a common semantic, namely:
if(index>$num)
goto code_default;
else
jmp *(Table_Base + index * 4);    (semantic 1)
Identifying this shared semantic in real code segments is the most accurate and reasonable approach.
In shared library files, the code compiled for multi-target branch statements implemented via jump tables follows another semantic:
if(index>$num)
goto code_default;
else
jmp *(%ebx - *(%ebx + disp + index * 4));    (semantic 2)
Example 2. A code fragment from libc.so.6:
1cb0a: 83 fa 08                cmp $0x8,%edx
1cb0d: 77 43                   ja 1cb52 <iconv+0xde>
1cb0f: 8b 8c 93 c8 69 ee ff    mov 0xffee69c8(%ebx,%edx,4),%ecx
1cb16: 89 d8                   mov %ebx,%eax
1cb18: 29 c8                   sub %ecx,%eax
1cb1a: ff e0                   jmp *%eax
For shared library files (.so files), %ebx is a rather special register: it holds a fairly fixed value (the start address of the GOT of libc.so), and code in the library often uses this value as a base for locating other addresses, which makes such code independent of the load address of the file. In the code snippet above, the base address of the jump table is %ebx + 0xffee69c8 (in effect %ebx minus an offset), so the address of the jump table in memory is GOT_Addr + Disp = (File_Base + GOT_Offset) + Disp. Here File_Base is the base address at which the shared library file is loaded into memory, which can be determined when the file is loaded; GOT_Offset is the offset of the GOT from the start of the file, which can be read from the Section Headers of the file (see the ELF manual for details); and Disp is the offset of the jump table address from the address held in %ebx, taken from the instruction operand. Therefore, once a code snippet is identified as conforming to semantic 2, the address of the jump table can be found and the code at each branch target can be discovered.
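The address arithmetic above can be sketched in C as follows. The function names and the sample values used to exercise them are hypothetical; only the formula GOT_Addr + Disp and semantic 2 come from the text. Unsigned 32-bit arithmetic is used so that a displacement such as 0xffee69c8 acts as a negative offset, as it does in 32-bit x86 address calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Jump table address for semantic 2: GOT_Addr + Disp, where
 * GOT_Addr = File_Base + GOT_Offset. Results wrap modulo 2^32. */
uint32_t jump_table_address(uint32_t file_base, uint32_t got_offset, uint32_t disp)
{
    uint32_t got_addr = file_base + got_offset;   /* value held in %ebx */
    return got_addr + disp;
}

/* Branch target for semantic 2: %ebx - *(%ebx + disp + index * 4).
 * `entry` is the 32-bit value read from the jump table entry. */
uint32_t branch_target(uint32_t got_addr, uint32_t entry)
{
    return got_addr - entry;
}
```

For instance, with an assumed load base of 0x40000000 and an assumed GOT offset of 0x12a000, the displacement 0xffee69c8 from Example 2 places the table at 0x400109c8, that is, 0x119638 bytes below the GOT.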
As shown in Fig. 7, the method of the present invention for identifying multi-target branch statements implemented via jump tables in binary translation comprises:
Step 10. Convert the semantics to be identified into semantic graphs; the resulting graphs serve as templates. Semantic 1 above is shown as a semantic graph in Fig. 1, and semantic 2 in Fig. 2. In these figures, diamond nodes represent conditional transfers, box nodes represent operators, and elliptical nodes represent atomic operands. index is drawn in a dashed box to indicate that this position may hold a single node or a subtree made up of several nodes. Converting semantics into a semantic graph is mature prior art, so this embodiment does not describe its implementation. The representation of the semantic graph may also vary between implementations; Fig. 1 and Fig. 2 show one form. When reproducing the invention, a person skilled in the art can construct the semantic graph according to the specific semantics and the characteristics of the code to be identified, as long as the graph expresses the required semantics completely and as concisely as possible.
Step 20. Select the code snippet whose semantics are to be identified. Analysis of binary files shows that code satisfying semantic 1 or semantic 2 has one key feature: the snippet contains at least one compare instruction (cmp) and one indirect jump instruction (jmp *). A selected snippet must therefore first have this feature. The snippet is selected as follows.
Step 21. Search the code for an indirect jump instruction (jmp *). When one is found, examine a certain number of preceding instructions; the number can be chosen according to the actual situation, for example 10 instructions.
Step 22. Look for a compare instruction (cmp) among the preceding instructions. If one is found, take the code between the compare instruction and the indirect jump instruction as the snippet to be identified and go to step 30. If none is found, these instructions are considered to lack the basic feature of the required semantics and are not processed.
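Steps 21 and 22 can be sketched in C as a backward scan from an indirect jump. The types insn_t and opcode_t and the function select_snippet are hypothetical, and returning the nearest preceding cmp is an assumption; the text does not say which cmp to prefer when several fall inside the window.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded-instruction record; a real translator would also
 * carry the operands. */
typedef enum { OP_OTHER, OP_CMP, OP_JMP_INDIRECT } opcode_t;
typedef struct { opcode_t op; } insn_t;

enum { LOOKBACK = 10 };   /* "a certain number" of instructions; 10 as in the text */

/* If insns[jmp_idx] is an indirect jump, scan backwards over at most
 * LOOKBACK instructions and return the index of the nearest cmp, so that
 * insns[result..jmp_idx] is the snippet to identify. Returns -1 when
 * insns[jmp_idx] is not an indirect jump or no cmp is found. */
long select_snippet(const insn_t *insns, size_t jmp_idx)
{
    assert(insns != NULL);
    if (insns[jmp_idx].op != OP_JMP_INDIRECT)
        return -1;
    for (size_t back = 1; back <= (size_t)LOOKBACK && back <= jmp_idx; back++) {
        if (insns[jmp_idx - back].op == OP_CMP)
            return (long)(jmp_idx - back);
    }
    return -1;
}
```

A caller would pass the returned range to graph construction (step 30) and skip the jump when the result is -1.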
Step 30. Build a semantic graph for the selected code snippet. The purpose of the semantic graph is to express the semantics of an instruction or code snippet. Building it requires a queue that records operands: the queue records every operand that has appeared during graph construction, together with the node in the semantic graph that corresponds to the latest value of that operand. For example, if instruction i involves operand a, a node n1 is created for a in the semantic graph and a is recorded in the operand queue together with n1. If a later instruction j assigns a new value to a, so that a corresponds to a new node n2 in the graph, the node recorded for a in the operand queue is changed to n2. A concrete semantic graph also needs to include some instruction features, such as opcodes and operand addressing modes, but the semantic graph can readily be used under many different instruction set architectures (ISAs). The construction of the semantic graph is described in detail below using the code of Example 1.
The instructions are processed one by one, and the semantic graph is built from their operands and opcodes.
a) Process cmpl $0x6,0xfffffffc(%ebp) in the code segment. Here 0xfffffffc(%ebp) means: add 0xfffffffc to the content of register %ebp, use the result as a memory address, and take the content of the memory location at that address as the operand. This creates the operand queue shown in Table 2 and the semantic graph shown in Fig. 3(a). Table 2 records the nodes corresponding to the three operands $0x6, %ebp and 0xfffffffc(%ebp). Boldface in the tables marks the entries modified in the queue; the same applies to the tables below.
Table 2
Operand  $0x6  %ebp  0xfffffffc(%ebp)
Node     n3    n7    n4
In Fig. 3(a), the diamond node represents a conditional transfer, box nodes represent operators, and elliptical nodes represent operands. The Boolean expression of the transfer node is drawn outside the diamond. The Boolean operator is provisionally set to ">", but the order of its left and right operands is not yet fixed; it is settled later, when a JA or JLE instruction is encountered. The true and false exits of the transfer node are also still to be filled in.
b) Process the code line ja 80ab428 <category_to_name+0x74>. The jump is taken if the comparison in a) evaluates to true. Operand 80ab428 is added to the operand queue; the updated queue is shown in Table 3.
Table 3
Operand  $6  %ebp  0xfffffffc(%ebp)  $80ab428
Node     n3  n7    n4                n8
The semantic graph is modified accordingly; the result is shown in Fig. 3(b). Compared with Fig. 3(a), the order of the Boolean operands and the target address taken when the condition is true are now determined.
c) Process the code line mov 0xfffffffc(%ebp),%eax. This instruction copies the value of 0xfffffffc(%ebp) into %eax, so operand %eax is added to the operand queue, bound to the same node that represents 0xfffffffc(%ebp). The updated queue is shown in Table 4.
Table 4
Operand  $6  %ebp  0xfffffffc(%ebp)  $80ab428  %eax
Node     n3  n7    n4                n8        n4
The semantic graph is extended accordingly, as shown in Fig. 3(c). Fig. 3(b) already contains a node representing operand 0xfffffffc(%ebp), so %eax is attached to that node.
d) Process the code line mov %eax,%eax. This instruction attaches %eax to node n4, where %eax already resides, so neither the operand queue nor the semantic graph changes.
e) Process the code line shl $0x2,%eax. This instruction can be read as mul $4,%eax. The value of operand %eax changes, so its node changes as well. $4 is a new operand, and a new node is created for it. The updated queue is shown in Table 5.
Table 5
Operand  $6  %ebp  0xfffffffc(%ebp)  $80ab428  %eax  $4
Node     n3  n7    n4                n8        n10   n9
In the semantic graph, shown in Fig. 3(d), the node holding %eax (n4) is located first and made the left operand of a multiplication operator; an immediate node is then created for the right operand 4; finally %eax is attached to the multiplication node and removed from its original node.
f) Process the code line mov 0x81327b0(%eax),%eax. This instruction changes the value of %eax, so its node changes accordingly. The updated queue is shown in Table 6.
Table 6
Operand  $6  %ebp  0xfffffffc(%ebp)  $80ab428  %eax  $4
Node     n3  n7    n4                n8        n13   n9
As shown in Fig. 3(e), a new node n13 is created for operand %eax.
g) Process the code line jmp *%eax. This instruction is reached if the comparison in a) evaluates to false. The operand queue does not change in this step and remains as in Table 6. The semantic graph changes as shown in Fig. 3(f): the part inside the dashed box is the node corresponding to index, and, following the characteristics of the pattern to be identified, the jmp * node is attached to the false edge of the transfer node.
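The operand queue used in steps a) to g) can be sketched in C as a small table that maps each operand to the node holding its latest value. The type operand_queue_t, the functions queue_lookup, queue_bind and replay_example1, and the fixed capacity are assumptions for illustration; the node numbers replayed are those listed in Tables 2 to 6.

```c
#include <assert.h>
#include <string.h>

enum { MAX_OPERANDS = 32, NO_NODE = -1 };

/* Operand queue: each operand seen so far, with the graph node that
 * currently holds its latest value. */
typedef struct {
    const char *operand[MAX_OPERANDS];
    int node[MAX_OPERANDS];
    int count;
} operand_queue_t;

/* Return the node currently bound to `operand`, or NO_NODE. */
int queue_lookup(const operand_queue_t *q, const char *operand)
{
    assert(q != NULL && operand != NULL);
    for (int i = 0; i < q->count; i++)
        if (strcmp(q->operand[i], operand) == 0)
            return q->node[i];
    return NO_NODE;
}

/* Bind `operand` to `node`, replacing any earlier binding.
 * Returns 0 on success, -1 if the queue is full. */
int queue_bind(operand_queue_t *q, const char *operand, int node)
{
    assert(q != NULL && operand != NULL);
    for (int i = 0; i < q->count; i++) {
        if (strcmp(q->operand[i], operand) == 0) {
            q->node[i] = node;
            return 0;
        }
    }
    if (q->count == MAX_OPERANDS)
        return -1;
    q->operand[q->count] = operand;
    q->node[q->count] = node;
    q->count++;
    return 0;
}

/* Replays the bindings recorded in Tables 2 to 6 for Example 1. */
void replay_example1(operand_queue_t *q)
{
    q->count = 0;
    queue_bind(q, "$6", 3);                 /* a) cmpl */
    queue_bind(q, "%ebp", 7);
    queue_bind(q, "0xfffffffc(%ebp)", 4);
    queue_bind(q, "$80ab428", 8);           /* b) ja */
    queue_bind(q, "%eax", 4);               /* c) mov: %eax shares node n4 */
    queue_bind(q, "$4", 9);                 /* e) shl read as mul $4 */
    queue_bind(q, "%eax", 10);              /*    %eax is now the product */
    queue_bind(q, "%eax", 13);              /* f) mov 0x81327b0(%eax),%eax */
}
```

A linear scan is enough here because a snippet spans only a handful of instructions; a real implementation might key the table on a decoded operand structure rather than on its text.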
Example 3. A disassembly fragment taken from crafty in SPEC2000, compiled with gcc -O3:
806b0b1: 83 f8 06                cmp $0x6,%eax
806b0b4: 89 15 a4 e3 07 08       mov %edx,0x807e3a4
806b0ba: 0f 87 1e 05 00 00       ja 806b5de <UnMakeMove+0x121e>
806b0c0: ff 24 85 f0 fd 06 08    jmp *0x806fdf0(,%eax,4)
Following the steps above, a semantic graph can be built for the disassembly in Example 3; it is shown in Fig. 4. The graph contains an isolated node, which shows that this node is unrelated to the data flow to be identified. Using semantic graphs can therefore effectively filter out irrelevant information.
Step 40. Check whether the semantic graph built in step 30 matches the template graph created in step 10. The template graph is Fig. 1 or Fig. 2; a match shows that the semantics of the selected code segment conform to semantic 1 or semantic 2. A semantic graph is a directed acyclic graph (DAG), but apart from the branch node it can be treated as a binary tree: if several edges point to the same node, each edge is regarded as pointing to a separate node of its own. The branch node is handled specially and is regarded as having three subtrees. The graph under test and the template graph can therefore be traversed synchronously in pre-order by a recursive matching algorithm: first check whether the root of the graph under test matches the root of the template graph; if it does, check whether the left subtree, the right subtree and the third subtree (if present) of the node match the corresponding subtrees of the template, and so on recursively. The match succeeds only if every check succeeds.
The nodes of a semantic graph are of three types: branch nodes, operator nodes and atomic operand nodes, drawn as diamonds, rectangles and ellipses respectively in the figures. The branch node is the root of the whole semantic graph, and matching starts from it.
During the recursion, the three node types are matched by the following rules:
1. Branch node: check in turn whether each of its three subtrees matches the template.
2. Operator node: first check whether its content is identical to that of the corresponding template node. This check is strict: if the template node is a "+" node, the node under test must also be a "+" node. If the check succeeds, recursively check the left and right subtrees.
3. Atomic operand node: an atomic operand is always a leaf of the semantic graph and is one of two types, register or immediate (an offset is also treated as an immediate). The value of a register can be %eax, %ebx and so on, whereas the value of an immediate is not fixed. To make it easy to check whether two leaf nodes match, matching-rule information is added to the template graph: any match (ANY), strict match (RESTRICT) and non-strict match (NOT_RESTRICT). Any match means only that a node must exist at this position in the graph under test; its content may be anything. Strict match means the type and value of the node at this position must both be identical to those of the corresponding template node. Non-strict match means only the type of the node at this position must be identical to that of the corresponding template node; the value may differ. Fig. 5 shows the template graph of semantic 1 annotated with matching rules, and Fig. 6 the template graph of semantic 2.
With this matching-rule information attached, whether a leaf node meets the requirements can easily be decided from the information on its corresponding template node.
The graph matching algorithm above is easy to implement in C or another high-level language.
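A minimal C sketch of such a recursive matcher is given below. The struct layout, the field names and the function match are assumptions, not the patent's implementation. In particular, the sketch lets a template leaf marked ANY accept a whole subtree, on the reading that index may be a single node or a subtree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { BRANCH, OPERATOR, ATOM } kind_t;
typedef enum { REG, IMM } atom_type_t;
typedef enum { ANY, RESTRICT, NOT_RESTRICT } rule_t;   /* set on template atoms only */

typedef struct node {
    kind_t kind;
    const char *text;               /* operator symbol, or atom value such as "%ebx" */
    atom_type_t atom_type;          /* meaningful for ATOM nodes */
    rule_t rule;                    /* meaningful for ATOM nodes of the template */
    const struct node *child[3];    /* BRANCH uses 3 children, OPERATOR up to 2 */
} node_t;

/* Synchronous pre-order traversal: returns 1 when graph g matches template t. */
int match(const node_t *g, const node_t *t)
{
    if (t == NULL || g == NULL)
        return t == NULL && g == NULL;      /* both positions must be empty */
    if (t->kind == ATOM) {
        if (t->rule == ANY)
            return 1;                       /* any node or subtree is accepted */
        if (g->kind != ATOM || g->atom_type != t->atom_type)
            return 0;                       /* type must agree for both other rules */
        return t->rule == NOT_RESTRICT || strcmp(g->text, t->text) == 0;
    }
    if (g->kind != t->kind)
        return 0;
    if (t->kind == OPERATOR && strcmp(g->text, t->text) != 0)
        return 0;                           /* operator content must be identical */
    for (int i = 0; i < 3; i++)
        if (!match(g->child[i], t->child[i]))
            return 0;
    return 1;
}
```

With a template for Table_Base + index * 4 in which Table_Base is NOT_RESTRICT, index is ANY and 4 is RESTRICT, a graph for 0x806fdf0 + %eax * 4 matches, while the same graph with a scale of 8 does not.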
Step 50. Obtain the address of the jump table and its number of entries, then obtain the branch target addresses recorded in the entries and mark the code there as valid code. Once the semantic graph has been matched, the address and number of entries of the jump table are easy to obtain. For semantic 1, the jump table address is at the "Table_Base" node and the number of entries at the "$num" node. For semantic 2, first take the offset of the jump table relative to %ebx from the "Disp" node and add the address of the GOT to obtain the jump table address; the number of entries is again taken from the "$num" node.
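For semantic 1, step 50 can be sketched in C as reading consecutive 4-byte entries from the loaded image. The function collect_targets and its image parameters are hypothetical. The sketch reads entries 0 through num, on the reading that ja leaves the table path only when the index is greater than $num; whether $num is the entry count or the highest index is an assumption to check against the binary being translated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one little-endian 32-bit value, as stored in an x86 binary. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Collect the branch targets of a jump table identified via semantic 1.
 * image, image_base and image_size describe the loaded section holding the
 * table; table_addr and num come from the matched Table_Base and $num nodes.
 * Returns the number of targets written to `targets`, or 0 if the table does
 * not fit inside the image or inside `targets`. */
size_t collect_targets(const uint8_t *image, uint32_t image_base, size_t image_size,
                       uint32_t table_addr, uint32_t num,
                       uint32_t *targets, size_t max_targets)
{
    assert(image != NULL && targets != NULL);
    size_t entries = (size_t)num + 1;
    if (table_addr < image_base || entries > max_targets)
        return 0;
    size_t offset = table_addr - image_base;
    if (offset > image_size || entries > (image_size - offset) / 4)
        return 0;
    for (size_t i = 0; i < entries; i++)
        targets[i] = read_le32(image + offset + 4 * i);   /* caller marks each as valid code */
    return entries;
}
```

The bounds checks matter because a false match would otherwise make the translator read unrelated data as addresses.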
The foregoing description has illustrated the identification that can realize via the multiple target branch statement of jump list under the Linux/x86 platform, under other operating platforms, method of the present invention is suitable for too.

Claims (4)

  1. A method for identifying multi-target branch statements implemented via a jump table in binary translation, comprising:
    1) converting the semantics to be identified into a semantic graph, and using the resulting semantic graph as the template graph;
    2) selecting the code snippet whose semantics are to be identified; wherein the code is first searched for an indirect jump instruction; when such an instruction is found, a number of the instructions preceding it are examined and searched for a compare instruction; if a compare instruction is found, the code between the compare instruction and the indirect jump instruction is taken as the code snippet to be identified; if no compare instruction is found, no processing is performed;
    3) building a semantic graph from the code snippet selected in step 2) according to its semantics;
    4) matching the semantic graph built in step 3) against the template graph obtained in step 1); if the match succeeds, the semantics are identified and the next step is carried out; otherwise, semantic identification of the selected code snippet fails;
    5) obtaining the address and the size of the jump table, then obtaining the branch target addresses recorded in its entries, and marking the code at said branch target addresses as valid code.
  2. The method for identifying multi-target branch statements implemented via a jump table in binary translation according to claim 1, characterized in that, in said step 3), a queue recording operands is introduced when the semantic graph is built; the queue records every operand that has appeared during construction of the semantic graph, together with the node in the semantic graph that corresponds to the most recent value of that operand.
  3. The method for identifying multi-target branch statements implemented via a jump table in binary translation according to claim 1, characterized in that, in said step 4), the semantic graph under test and the template graph are traversed synchronously, and the nodes at corresponding positions in the two graphs are checked for a match; the two graphs are considered to match if the nodes at all corresponding positions match.
  4. The method for identifying multi-target branch statements implemented via a jump table in binary translation according to claim 1, characterized in that, in said step 4), the nodes of the semantic graph and of the template graph are divided into three types: branch nodes, operator nodes and atomic operand nodes; the branch node is treated as the root node of the whole semantic graph, and matching of the semantic graph starts from this node; nodes of different types are matched according to different rules:
    4-1, branch node: check in turn whether each of its three subtrees matches the subtree at the corresponding position in the template graph;
    4-2, operator node: first check whether its content is identical to that of the node at the corresponding position in the template graph; if so, recursively check its left and right subtrees;
    4-3, atomic operand node: to make it easy to check whether two atomic operand nodes match, matching-rule information is added to the template graph, namely: arbitrary match, strict match and non-strict match; arbitrary match means that the graph under test only needs to have a node at this position, and the content of that node may be anything; strict match means that the node at this position must have the same type and the same value as the node at the corresponding position in the template graph; non-strict match means that the node at this position only needs to have the same type as the node at the corresponding position in the template graph, and its value may differ.
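
The code snippet selection of step 2) in claim 1 can be sketched as a backward scan from each indirect jump. This Python sketch is illustrative only and rests on assumptions not stated in the claims: instructions are modeled as (mnemonic, operands) pairs in AT&T syntax, an indirect jump is recognized by a `*` prefix on the operand of `jmp`, and the look-back distance `window` is an arbitrary value chosen here because the claim only says that a number of preceding instructions are examined.

```python
from typing import List, Tuple

Instr = Tuple[str, str]   # (mnemonic, operands), a hypothetical representation


def is_indirect_jump(ins: Instr) -> bool:
    """AT&T syntax writes indirect jump operands with a leading '*'."""
    mnemonic, operands = ins
    return mnemonic == "jmp" and operands.startswith("*")


def select_snippets(instrs: List[Instr], window: int = 8) -> List[List[Instr]]:
    """Return the compare-to-indirect-jump code snippets found in instrs."""
    snippets = []
    for i, ins in enumerate(instrs):
        if not is_indirect_jump(ins):
            continue
        lowest = max(0, i - window)
        # Scan backwards for the nearest compare instruction inside the window.
        for j in range(i - 1, lowest - 1, -1):
            if instrs[j][0].startswith("cmp"):
                snippets.append(instrs[j:i + 1])
                break
        # If no compare instruction is found, the jump is left unprocessed.
    return snippets
```

Each returned snippet would then be turned into a semantic graph and matched against the template graphs as described in steps 3) and 4).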
CNB2005100855091A 2005-07-22 2005-07-22 Identifying method of multiple target branch statement through jump list in binary translation Active CN100377089C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100855091A CN100377089C (en) 2005-07-22 2005-07-22 Identifying method of multiple target branch statement through jump list in binary translation

Publications (2)

Publication Number Publication Date
CN1900910A CN1900910A (en) 2007-01-24
CN100377089C true CN100377089C (en) 2008-03-26

Family

ID=37656799

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100855091A Active CN100377089C (en) 2005-07-22 2005-07-22 Identifying method of multiple target branch statement through jump list in binary translation

Country Status (1)

Country Link
CN (1) CN100377089C (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271398B (en) * 2007-03-23 2010-06-09 北京大学 Recognition method of multi-path branch structure
CN102855139B (en) * 2012-08-10 2015-04-22 浙江省电力公司电力科学研究院 Method and system for clearing register in decompiling data flow analysis
CN103617049B (en) * 2013-12-19 2017-03-29 中国科学院声学研究所 code moving method based on complementary predicate
CN113296833B (en) * 2021-04-30 2024-03-05 中国科学院信息工程研究所 Identification method and device for legal instructions in binary file
CN113312082B (en) * 2021-04-30 2024-03-08 中国科学院信息工程研究所 Identification method and device for data mixed in instructions in binary file
CN114625844B (en) * 2022-05-16 2022-08-09 湖南汇视威智能科技有限公司 Code searching method, device and equipment
CN116126350B (en) * 2023-04-17 2023-09-12 龙芯中科技术股份有限公司 Binary translation method, binary translator and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6195792B1 (en) * 1998-02-19 2001-02-27 Nortel Networks Limited Software upgrades by conversion automation
CN1311881A (en) * 1998-06-04 2001-09-05 松下电器产业株式会社 Language conversion rule preparing device, language conversion device and program recording medium
US20040025151A1 (en) * 2002-07-31 2004-02-05 Shan-Chyun Ku Method for improving instruction selection efficiency in a DSP/RISC compiler
JP2005149269A (en) * 2003-11-18 2005-06-09 Hitachi Systems & Services Ltd System for processing structured document

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A semantic object matching algorithm based on lexical rules. Li Yi, He Weiguo, Li Juanzi. Computer Engineering (计算机工程), Vol. 31, No. 4. 2005 *

Also Published As

Publication number Publication date
CN1900910A (en) 2007-01-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant