Summary of the invention
To solve the problems of the prior art, the invention provides a semantics-based deobfuscation method for binary code, the method comprising:
Building an inverse-transformation template library, the template library comprising semantics-based instruction transformation policies;
Importing a target program, and determining whether the target program is an executable file by inspecting its binary code;
When the target program is an executable file, executing the target program and extracting its execution information;
Performing taint analysis on the execution information to obtain taint-marked instruction sequences, and building a control flow graph (CFG) from the taint-marked instruction sequences, the CFG comprising at least two data basic blocks, the data basic blocks forming an explicit path;
Supplementing the CFG according to control dependences, to obtain the control basic blocks that have a control dependence on the data basic blocks;
Obtaining, according to the jump relations in the CFG, the implicit paths in the CFG other than the explicit path, and combining the explicit path and the implicit paths into the multiple execution paths of the CFG, to obtain a complete CFG;
Pruning the instructions of the target program according to the complete CFG in combination with the inverse-transformation template library, to obtain a concise CFG corresponding to the pruned instructions, and obtaining a function call graph from the execution information of the target program.
Optionally, building the inverse-transformation template library, which comprises semantics-based instruction transformation policies, comprises:
Obtaining known semantics-based instruction transformation policies, each policy comprising an original instruction before transformation and a deobfuscated instruction after transformation;
Combining the instruction transformation policies to obtain the inverse-transformation template library.
Optionally, extracting the execution information of the target program when the target program is an executable file comprises:
When the target program is determined to be an executable file, executing the target program and performing dynamic analysis on all of its execution data;
Extracting the execution information from all the execution data, the execution information comprising all instruction information, register information, and memory data information produced while the target program executes; when function calls occur during execution, function call information is also extracted.
Optionally, performing taint analysis on the execution information to obtain taint-marked instruction sequences, and building the CFG from the taint-marked instruction sequences, the CFG comprising at least two data basic blocks that form an explicit path, comprises:
Determining the taint information of the target program, and taint-marking every instruction in the execution information that contains the taint information, to obtain the taint-marked instruction sequences;
Building the CFG according to the storage locations of the taint-marked instruction sequences and the jump relations between them, the CFG comprising at least two data basic blocks, the data basic blocks forming an explicit path.
Optionally, determining the taint information of the target program comprises:
Using content that matches a preset format as the taint information; or
When no preset format exists, using the input content as the taint information.
Optionally, supplementing the CFG according to control dependences, to obtain the control basic blocks that have a control dependence on the data basic blocks, comprises:
Obtaining all data basic blocks in the CFG;
Obtaining, according to control dependences, the control basic blocks in the CFG that have a control dependence on the data basic blocks, and adding those control basic blocks to the CFG.
Optionally, obtaining the implicit paths in the CFG other than the explicit path according to the jump relations in the CFG, and combining the explicit path and the implicit paths into the multiple execution paths of the CFG to obtain the complete CFG, comprises:
Determining, according to the jump relations in the CFG, the data basic blocks at which a jump occurs while the target program executes;
For each data basic block at which a jump occurs, searching for paths from that block with a depth-first path algorithm, according to the jump relations of that block, to obtain the hidden paths other than the explicit path, the hidden paths forming the implicit paths;
Forming multiple paths from the explicit path and the implicit paths and, on the basis of the CFG, obtaining the complete CFG.
Optionally, pruning the instructions of the target program according to the complete CFG in combination with the inverse-transformation template library, to obtain the concise CFG corresponding to the pruned instructions, and obtaining the function call graph from the execution information of the target program, comprises:
First, discarding the instruction sequences in the CFG that carry no taint mark, whether they lie inside the data basic blocks of the complete CFG, between the control basic blocks and the data basic blocks, or inside the control basic blocks;
Second, when an instruction sequence is detected that matches an original (pre-transformation) instruction in the inverse-transformation template library, pruning that instruction sequence; saving the instruction sequences that remain in the complete CFG as the concise CFG; and extracting the function call relations from the execution information of the target program to obtain the function call graph.
The beneficial effects of the technical solution provided by the invention are as follows:
After the target program is determined to be an executable file, it is executed and all data produced during execution is collected. Dynamic taint analysis and control dependence analysis are used to mark the critical data and the instruction sequences related to that data, and the marked instruction sequences are used to build a CFG. The unmarked instruction sequences are then deleted from the CFG to obtain a concise CFG, and a function call graph is obtained from the collected execution data. Deobfuscating malware with the concise CFG together with the function call graph avoids the poor generality of prior-art deobfuscation, broadens the range of targets to which deobfuscation applies, and also reduces system overhead to a certain extent.
Embodiment
To make the structure and advantages of the invention clearer, the invention is further described below with reference to the accompanying drawings.
Because the binary code deobfuscation method of the invention is implemented in a virtual environment, the hardware platform on which the virtual environment is built is described first, to aid understanding of the method. Some of the accompanying drawings are screenshots taken in the virtual environment; this is noted here to avoid misunderstanding.
Experimental environment and test case
Experimental environment: Windows 7 operating system, 3.0 GHz processor, 4 GB memory, Microsoft Visual C++ 6.0 development environment.
Experimental subject: Code Virtualizer V1.0.1.0 is used for the tests here.
Test case: the test case used here is the assembly instructions shown in Table 1. They are protected with the virtualizer's default protection strength, generating the test file tesecas_cv_1.0.exe.
Table 1 Assembly instructions used in this embodiment
Embodiment one
The invention provides a semantics-based deobfuscation method for binary code. As shown in Figure 1, the method comprises:
101. Build an inverse-transformation template library, the template library comprising semantics-based instruction transformation policies.
102. Import a target program, and determine whether the target program is an executable file by inspecting its binary code.
103. When the target program is an executable file, execute the target program and extract its execution information.
104. Perform taint analysis on the execution information to obtain taint-marked instruction sequences, and build a control flow graph (CFG) from the taint-marked instruction sequences, the CFG comprising at least two data basic blocks, the data basic blocks forming an explicit path.
105. Supplement the CFG according to control dependences, to obtain the control basic blocks that have a control dependence on the data basic blocks.
106. According to the jump relations in the CFG, obtain the implicit paths in the CFG other than the explicit path, and combine the explicit path and the implicit paths into the multiple execution paths of the CFG, to obtain a complete CFG.
107. According to the complete CFG, in combination with the inverse-transformation template library, prune the instructions of the target program to obtain a concise CFG corresponding to the pruned instructions, and obtain a function call graph from the execution information of the target program.
In implementation, to deobfuscate malware, an inverse-transformation template library is built first; it contains semantics-based instruction transformation policies. Next, the target program is imported; if and only if it is an executable file, it is executed and the execution information generated during execution is collected. Taint analysis is then performed on the execution information to obtain taint-marked instruction sequences, from which a CFG is built.
It is worth noting why a CFG is built here: the CFG reveals where the key information in the malware actually flows, so the parts unrelated to that flow can be deleted, which simplifies the malware.
After the CFG is built, the implicit paths that cannot be obtained directly, unlike the explicit path, are obtained, and then the control basic blocks that have a control dependence on the data basic blocks in the CFG are obtained, so that the CFG is supplemented into a complete CFG.
Finally, on the basis of the complete CFG and the previously built inverse-transformation template library, the target program is pruned to obtain the concise CFG and the function call graph. These make it easier to identify the malware and to remove it later.
In the semantics-based deobfuscation method for binary code proposed in this embodiment, after the target program is determined to be an executable file, it is executed and all data produced during execution is collected. The critical data and the instruction sequences related to it are marked, and the marked instruction sequences are used to build a CFG. The unmarked instruction sequences are then deleted from the CFG to obtain a concise CFG, and a function call graph is obtained from the collected execution data. With the concise CFG and the function call graph together, malware can be deobfuscated. This avoids the poor generality of prior-art deobfuscation, broadens the range of targets to which deobfuscation applies, and also reduces system overhead to a certain extent.
Optionally, step 101, building the inverse-transformation template library, which comprises semantics-based instruction transformation policies, comprises:
Obtaining known semantics-based instruction transformation policies, each policy comprising an original instruction before transformation and a deobfuscated instruction after transformation;
Combining the instruction transformation policies to obtain the inverse-transformation template library.
In implementation, the commonly used instruction transformation policies that are currently known are obtained first. These policies are semantics-based, which ensures that after the original instructions are simplified according to a policy, the remaining instructions still carry their normal meaning, so deobfuscation does not interfere with the subsequent identification of the malware.
The specific instruction transformation policies are shown in Table 2.
Table 2 Instruction transformation policies
Table 2 has two columns. The left column holds the original instructions before transformation, that is, commonly used known obfuscated instruction statements; the right column holds the deobfuscated instructions after transformation, that is, the result of deobfuscating the obfuscated instructions in the same row. As Table 2 shows, each group of instruction statements generally spans 2 to 4 lines before deobfuscation and is reduced to a single statement afterwards, so applying the instruction transformation policies alone shrinks such a group to 25% to 50% of its original length. With the help of an inverse-transformation template library formed from multiple such policies, the obfuscation behavior of malware can be countered to a certain extent.
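A template library of this kind can be pictured as a list of (obfuscated pattern, replacement) pairs applied repeatedly until nothing matches. The following is only a minimal sketch: the three templates are hypothetical examples chosen for illustration and are not the contents of Table 2, and instructions are compared as plain strings rather than by operand-aware matching.

```python
# Hypothetical templates (NOT taken from Table 2): each entry maps an
# obfuscated instruction pattern to its shorter, semantically equivalent form.
TEMPLATES = [
    (("push eax", "pop ebx"), ("mov ebx, eax",)),
    (("not eax", "not eax"), ()),        # double negation cancels out
    (("add eax, 1", "sub eax, 1"), ()),  # add followed by inverse sub
]

def apply_templates(instrs, templates=TEMPLATES):
    """Rewrite every occurrence of a pattern until no template matches.

    Terminates because every replacement is shorter than its pattern.
    """
    out = list(instrs)
    changed = True
    while changed:
        changed = False
        for pattern, replacement in templates:
            n = len(pattern)
            i = 0
            while i + n <= len(out):
                if tuple(out[i:i + n]) == pattern:
                    out[i:i + n] = replacement  # splice in the short form
                    changed = True
                else:
                    i += 1
    return out
```

A real template library would likely match on operand classes (any register, any immediate) rather than literal text; literal matching is used here only to keep the sketch short.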
Optionally, step 103, extracting the execution information of the target program when the target program is an executable file, as shown in Figure 2, specifically comprises:
1031. When the target program is determined to be an executable file, execute the target program;
1032. Extract the execution information from all the execution data, the execution information comprising all instruction information, register information, and memory data information produced while the target program executes; when function calls occur during execution, function call information is also extracted.
In implementation, the target program is opened in the compiler, and whether it is an executable file is determined by checking the signature in its file header.
Here, an executable file means a Portable Executable (PE) file. All 32-bit and 64-bit executable files under Windows use the PE file format, including DLL, EXE, FON, OCX, LIB, and some SYS files. PE is one of the executable file formats on Windows (others include NE and LE); it was designed by Microsoft and approved by the TIS (Tool Interface Standard) committee.
Once this check has determined that the target program is of an executable type, the target program is executed.
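The header check can be sketched as follows. This is an illustrative sketch, not the procedure used in the embodiment: it relies only on the well-known PE layout (an "MZ" DOS header, a 4-byte little-endian offset at 0x3C, and the "PE\0\0" signature at that offset), and the function name `is_pe_file` is an assumption.

```python
import struct

def is_pe_file(data: bytes) -> bool:
    """Return True if the byte string starts with a plausible PE header."""
    # A DOS header is at least 0x40 bytes long and begins with "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

In practice the bytes would come from reading the target file in binary mode; a stricter check would go on to validate the COFF and optional headers.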
While the target program executes, all of its execution data is extracted. The execution data specifically comprises instruction information, register information, and memory data information; when function calls occur during execution, function call information is also extracted.
For ease of understanding, Figure 3 shows an example of the execution data: part of the information extracted while the program tesecase_cv_1.0.exe executes dynamically. In Figure 3, rows 1 to 9 give the values of the registers eax, ecx, edx, ebx, esp, ebp, esi, edi, and eip respectively; row 10 is the corresponding assembly instruction; and rows 11 to 14 give the memory data corresponding to the register values in rows 1 to 9. Because this example makes no function calls during execution, Figure 3 contains no function call data. If function calls do occur while a target program executes, the function call data must also be collected.
Optionally, step 104, performing taint analysis on the execution information to obtain taint-marked instruction sequences, and building the CFG from the taint-marked instruction sequences, the CFG comprising at least two data basic blocks that form an explicit path, as shown in Figure 4, comprises:
1041. Determine the taint information of the target program, and taint-mark every instruction in the execution information that contains the taint information, to obtain the taint-marked instruction sequences;
1042. Build the CFG according to the storage locations of the taint-marked instruction sequences and the jump relations between them, the CFG comprising at least two data basic blocks, the data basic blocks forming an explicit path.
In implementation, the taint information for the target program is determined from the collected execution data. Here, sensitive user data or untrusted input data is marked as taint; the taint source serves as the taint information, and its propagation is tracked. Every instruction in the execution data that contains taint information is taint-marked, which yields the taint-marked instruction sequences. As shown in Figure 5, the taint mark is "tainted": every line in Figure 5 that carries the suffix "tainted" is a taint-marked instruction sequence.
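The marking step can be illustrated with a simple forward propagation over an execution trace. This is a sketch under assumptions: each trace entry is reduced to a hypothetical `(text, dest, sources)` triple, and the propagation rule (the destination becomes tainted if any source is tainted, and is cleared otherwise) is a common textbook rule rather than one stated in the text.

```python
def mark_tainted(trace, taint_sources):
    """Return [(instruction_text, is_tainted), ...] for an execution trace.

    trace: list of (text, dest, sources) triples (hypothetical format).
    taint_sources: operands initially treated as taint, e.g. input values.
    """
    tainted = set(taint_sources)
    marked = []
    for text, dest, sources in trace:
        hit = any(src in tainted for src in sources)
        if hit:
            tainted.add(dest)      # taint flows into the destination
        else:
            tainted.discard(dest)  # destination overwritten with clean data
        marked.append((text, hit))
    return marked
```

Instructions for which the flag is true correspond to the lines carrying the "tainted" suffix described above.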
After the taint-marked instruction sequences are determined, the CFG is built from the storage locations of these instruction sequences and the jump relations between them.
A control flow graph (CFG) is an abstract representation of a procedure or program, usually stored as a linked data structure. Each node of the graph represents a basic block, that is, a straight-line piece of code containing no jumps and no jump targets; a jump target starts a block, and a jump ends a block. Directed edges represent jumps in the control flow.
The specific steps for building the CFG are as follows:
(1) Mark all leaders (first instructions):
The first instruction of a function is a leader;
The target of any jump instruction is a leader;
The instruction immediately following a conditional branch instruction is a leader.
(2) Each basic block consists of a leader and all instructions after it up to, but not including, the next leader.
(3) If the target of the jump instruction at the end of basic block A is basic block B, or B immediately follows A, add an edge A->B.
The nodes of a CFG are called basic blocks. A basic block is a sequence of instructions executed in order, and it usually ends with a conditional branch instruction. A CFG represents a superset of all executable paths.
As an example of building the CFG for the taint-marked instruction sequences, consider Figure 6: 42c290 and 42c2aa in row 2 are the start address and end address of basic block 0 (The 0 Fragment), and the information in row 3 shows that the successors of basic block 0 are basic block 1 and basic block 2. A simplified form of the CFG built from this information is shown in Figure 7. As can be seen, the CFG contains multiple data basic blocks, which form an explicit path in their order of succession.
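The three construction steps above can be sketched as follows. The instruction format `(addr, kind, target)` is a hypothetical simplification, with `kind` being `"seq"`, `"jmp"` (unconditional) or `"jcc"` (conditional), and the input is assumed to be sorted by address. One refinement not spelled out in the text is assumed: no fall-through edge is added after an unconditional jump.

```python
def build_cfg(instrs):
    """Split instructions into basic blocks and connect them with edges.

    Returns (blocks, edges): blocks maps a leader address to its
    instructions, edges is a set of (from_leader, to_leader) pairs.
    """
    addrs = [addr for addr, _, _ in instrs]
    # Step 1: mark leaders.
    leaders = {addrs[0]}
    for i, (_, kind, target) in enumerate(instrs):
        if kind in ("jmp", "jcc"):
            leaders.add(target)              # jump target starts a block
        if kind == "jcc" and i + 1 < len(instrs):
            leaders.add(addrs[i + 1])        # instruction after a branch
    # Step 2: a block runs from a leader up to the next leader.
    blocks, current = {}, None
    for ins in instrs:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)
    # Step 3: add jump edges and fall-through edges.
    starts = sorted(blocks)
    edges = set()
    for idx, start in enumerate(starts):
        _, kind, target = blocks[start][-1]
        if kind in ("jmp", "jcc") and target in blocks:
            edges.add((start, target))
        if kind != "jmp" and idx + 1 < len(starts):
            edges.add((start, starts[idx + 1]))  # B immediately follows A
    return blocks, edges
```

Blocks are keyed by the address of their leader, which mirrors how Figure 6 identifies a block by its start address.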
Optionally, determining the taint information of the target program in step 1041 above comprises:
Using content that matches a preset format as the taint information; or
When no preset format exists, using the input content as the taint information.
In implementation, the taint-marked instruction sequences of the previous step are obtained from the taint information, and only then can the CFG be built, so determining the taint information correctly is especially important.
In practice, taint sources are specified in two ways: by system default or by the user. By default, data read from the program's standard input or from the network is the taint source; users can also specify taint sources, or add to the default ones, according to their own needs. In that case the user stores in advance the content format that is to be treated as taint information, so that when the target program runs, any content matching that format is immediately treated as taint information and the subsequent processing steps proceed.
Specifically, when no preset format has been stored, the content input while the target program runs is used as the taint information by default. For example, the "2" in "0x2" in row 1 and the "3" in "0x3" in row 3 of Figure 5 are inputs supplied while the target program runs, so "2" and "3" are taken as the taint information, and the instruction sequences in rows 1 and 2 that contain them are taint-marked.
With taint marking, the key content and the instructions that move or compute on that content become the taint-marked instruction sequences. This makes it possible to track the key information and all instructions related to it with priority, while instructions unrelated to the key information are handled at low priority. As a result, system overhead is reduced effectively while the program under analysis is still deobfuscated, and the risk that deobfuscation stops unexpectedly because of excessive system overhead is also lowered to a certain extent.
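The two ways of choosing taint information can be sketched in a few lines. The function name `select_taint_sources` and the use of a regular expression to express the "preset format" are assumptions made for illustration; the text does not say how the format is encoded.

```python
import re

def select_taint_sources(inputs, preset_pattern=None):
    """Pick taint information from the values seen at run time.

    With a preset format (here a regular expression, an assumption),
    only matching content is tainted; without one, all input is tainted.
    """
    if preset_pattern is not None:
        return [v for v in inputs if re.fullmatch(preset_pattern, v)]
    return list(inputs)
```

For instance, with no preset format, the run-time inputs "2" and "3" from the Figure 5 example would both be selected.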
Optionally, step 105, supplementing the CFG according to control dependences, to obtain the control basic blocks that have a control dependence on the data basic blocks, comprises:
Obtaining all data basic blocks in the CFG;
Obtaining, according to control dependences, the control basic blocks in the CFG that have a control dependence on the data basic blocks, and adding those control basic blocks to the CFG.
In implementation, all data basic blocks are obtained from the previously built CFG; each of these data basic blocks contains a sequence of instructions executed in order.
It should be noted that taint analysis mainly captures explicit information flow, so the CFG obtained so far is clearly not complete enough. To capture the implicit information flow of the program during execution, control dependence analysis is also required.
Table 3 Partial pseudocode of the control dependence algorithm
Consider two nodes w and v in the program's CFG. Node w is a postdominator of v if every path from v to the stop node contains w; a node cannot be its own postdominator. Formally: node w is a postdominator of node v if w is the unique immediate successor of v, or if v has multiple immediate successors and w is a postdominator of every successor u of v.
Consider two instructions i and j in the program. Node j is control dependent on node i if the result of node i determines whether node j executes. Formally, node j is control dependent on node i when two conditions hold: (1) there is a non-empty path between i and j; (2) j is a postdominator of every node on this path other than i.
The control dependences give the execution order among the basic blocks in the CFG, so the control basic blocks can be derived directly from the control dependences.
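The postdominator and control dependence definitions can be made concrete with a small sketch. It is not the algorithm of Table 3, which is not reproduced in the text: it uses the standard iterative set-intersection method for postdominators and the usual formulation of control dependence (j postdominates some successor of i but does not postdominate i). The graph format `succ` (a dict mapping every node to its successor list) is an assumption, and every node is assumed to reach the exit node.

```python
def post_dominators(succ, exit_node):
    """Return {node: set of strict postdominators} for a CFG."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}   # start from "everything"
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = set(nodes)
            for s in succ[n]:
                new &= pdom[s]              # must postdominate every successor
            new |= {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    # A node is not its own postdominator, so drop it from its set.
    return {n: pdom[n] - {n} for n in nodes}

def control_dependences(succ, exit_node):
    """Return the set of (i, j) pairs where j is control dependent on i."""
    spdom = post_dominators(succ, exit_node)
    deps = set()
    for i in succ:
        for s in succ[i]:
            for j in spdom[s] | {s}:
                if j not in spdom[i]:       # j does not postdominate i
                    deps.add((i, j))
    return deps
```

In a diamond-shaped graph where A branches to B or C and both rejoin at D, this reports B and C as control dependent on A, while D is not, since D executes whichever branch is taken.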
Optionally, step 106, obtaining the implicit paths in the CFG other than the explicit path according to the jump relations in the CFG, and combining the explicit path and the implicit paths into the multiple execution paths of the CFG to obtain the complete CFG, as shown in Figure 8, comprises:
1061. According to the jump relations in the CFG, determine the data basic blocks at which a jump occurs while the target program executes.
1062. For each data basic block at which a jump occurs, search for paths from that block with a depth-first path algorithm, according to the jump relations of that block, to obtain the hidden paths other than the explicit path, the hidden paths forming the implicit paths.
1063. Form multiple paths from the explicit path and the implicit paths and, on the basis of the CFG, obtain the complete CFG.
In implementation, the CFG in Figure 7 shows jumps between several data basic blocks from top to bottom. Each jump has a predecessor data basic block and a successor data basic block corresponding to the jump relation. A path search is carried out from the predecessor data basic block of each jump relation. To find the other paths through that block, a depth-first traversal (Depth-First Traversal, DFS) path algorithm is typically used, which ensures that every path related to that data basic block is found.
The depth-first search yields all paths related to the predecessor data basic blocks at which jumps occur. These paths were not obtained directly when the CFG was first built; compared with the "explicit" paths obtained then, they are "hidden" in nature. The hidden paths found by the traversal in this step are therefore called "implicit" paths.
After all implicit paths in the CFG are obtained, they are merged with the explicit path obtained earlier into multiple paths, and these are added to the earlier CFG to give the complete CFG containing all paths. Figure 9 shows the complete CFG obtained after the control dependence analysis of step 105 and the multipath supplement of step 106.
All paths related to the basic blocks are obtained because a CFG built only from the taint-marked instruction sequences covers only part of the execution paths. Building multiple execution paths avoids two problems: a single execution path reflects only part of a program's behavior, and symbolic analysis incurs high performance overhead and is prone to state-space explosion. System overhead is thereby reduced.
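The depth-first path search can be sketched as follows. The sketch makes several assumptions beyond the text: the graph is given as a dict `succ` of successor lists, paths are enumerated between a chosen start block and end block, and a block already on the current path is not revisited, so loops are cut and each path is simple.

```python
def enumerate_paths(succ, start, end):
    """Return all simple paths from start to end, found depth-first."""
    paths = []

    def dfs(node, path):
        if node == end:
            paths.append(path)
            return
        for nxt in succ.get(node, []):
            if nxt not in path:          # skip back edges to avoid cycles
                dfs(nxt, path + [nxt])

    dfs(start, [start])
    return paths

def implicit_paths(succ, start, end, explicit):
    """Paths found by the search that are not among the explicit paths."""
    seen = {tuple(p) for p in explicit}
    return [p for p in enumerate_paths(succ, start, end)
            if tuple(p) not in seen]
```

The union of the explicit paths and the paths returned by `implicit_paths` corresponds to the multiple execution paths that are merged into the complete CFG.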
Optionally, step 107, namely according to described complete controlling stream graph, carries out instruction tailoring in conjunction with described reversible deformation template base to described target program, obtains the succinct controlling stream graph that instruction after cutting is corresponding, according to the implementation acquisition of information function call relationship graph of described target program, comprising:
First, the described data fundamental block in described complete controlling stream graph and between described control fundamental block and described data fundamental block or described control fundamental block inner, the instruction sequence marked without described stain in described controlling stream graph is abandoned;
Secondly, detect be out of shape described in reversible deformation template base before the presumptive instruction instruction sequence of mating, then described instruction sequence is carried out cutting, remaining instruction sequence in described complete controlling stream graph is saved as succinct controlling stream graph, extract the function calling relationship in the implementation information of described target program, obtain function call relationship graph.
In implementation, as the foregoing shows, the tainted information is essentially the most important information in the malware. The instruction sequences corresponding to the processing steps that involve tainted information, that is, the taint-marked instruction sequences, are the important steps, and only they are necessary for the malware to realize its complete functionality. Everything else can be regarded as protective code that the malware generates to avoid detection and removal by antivirus software. It is precisely this protective code that keeps the malware's features hidden and prevents them from being correctly identified. This step therefore uses the complete control flow graph to discard the instruction sequences without taint marks and retains only the part that best reflects the malware's characteristics, namely the taint-marked instruction sequences.
Based on the above reasoning and the complete control flow graph, the instruction sequences without taint marks that lie between the data basic blocks and the control basic blocks of the complete control flow graph are deleted.
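The discarding of unmarked instructions can be sketched as a simple filter. The Instruction record, its field names, and the sample trace are assumptions made for illustration; the embodiment does not specify how the taint mark is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    address: int   # hypothetical address of the instruction
    text: str      # disassembled instruction text
    tainted: bool  # True if taint analysis marked this instruction


def drop_untainted(instructions):
    """Keep only the instructions that carry a taint mark."""
    return [ins for ins in instructions if ins.tainted]


# Hypothetical trace: the unmarked nop stands in for protective filler code.
trace = [
    Instruction(0x401000, "mov eax, [esi]", True),
    Instruction(0x401002, "nop", False),
    Instruction(0x401003, "add eax, 4", True),
]
kept = drop_untainted(trace)
```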
Then, if an instruction sequence matching an original (pre-transformation) instruction in the inverse-transformation template library is detected inside a data basic block or a control basic block of the complete control flow graph, that instruction sequence is trimmed, and the result is the concise control flow graph. If function call information is present in the execution information collected while executing the target program, the function call graph is also obtained from the function call relations. Instruction trimming of the malware can then be carried out in combination with the concise control flow graph.
For the same reason and purpose as obtaining the concise control flow graph above, inside the data basic blocks and the control basic blocks of the complete control flow graph, original instruction content that matches an instruction transformation strategy in the inverse-transformation template library built in step 101 is converted into deobfuscated instructions. The complex, useless statements that shield the malware thereby become concise statements, which simplifies the malware's instructions.
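A minimal sketch of template-based trimming inside a basic block follows. The two templates are hypothetical examples and are not taken from the template library of step 101; instructions are compared as plain strings, whereas a real matcher would have to compare semantics and account for side effects such as stack memory and flags.

```python
# Hypothetical templates: (obfuscated pattern, simplified replacement).
TEMPLATES = [
    (("push eax", "pop ebx"), ("mov ebx, eax",)),  # register copy via the stack
    (("push eax", "pop eax"), ()),                 # self-cancelling pair
]


def apply_templates(instructions, templates=TEMPLATES):
    """Scan a basic block once, replacing each matched pattern."""
    result = []
    i = 0
    while i < len(instructions):
        for pattern, replacement in templates:
            window = tuple(instructions[i:i + len(pattern)])
            if window == pattern:
                result.extend(replacement)
                i += len(pattern)
                break
        else:
            # No template matched at this position: keep the instruction.
            result.append(instructions[i])
            i += 1
    return result


block = ["push eax", "pop ebx", "push eax", "pop eax", "add ebx, 1"]
simplified = apply_templates(block)
```

The sketch makes a single left-to-right pass and tries templates in list order. Patterns that only become visible after an earlier replacement would need repeated passes until no template matches.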
Code analysis generally follows three steps:
1. Local analysis, that is, analyzing each basic block.
2. Global analysis (also called intraprocedural analysis), that is, analyzing the control flow graph of a function.
3. Interprocedural analysis, that is, analyzing the call relations between functions. The call relations between functions are represented by a function call graph.
In practice, the function call graph and the control flow graph are generally treated as separate, independent entities. Therefore, in step 108, the concise control flow graph and the function call graph are obtained separately. Each plays a different role during instruction trimming.
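Obtaining the function call graph from the execution information can be sketched as follows, assuming the execution trace has already been reduced to (caller, callee) pairs. The event format and the function names in the example are hypothetical.

```python
def build_call_graph(call_events):
    """Build a call graph: function name -> set of functions it calls."""
    graph = {}
    for caller, callee in call_events:
        graph.setdefault(caller, set()).add(callee)
        # Register the callee as a node even if it calls nothing itself.
        graph.setdefault(callee, set())
    return graph


# Hypothetical call events recorded while the target program ran.
events = [
    ("main", "decrypt"),
    ("main", "send"),
    ("decrypt", "xor_key"),
    ("main", "decrypt"),  # repeated calls collapse into a single edge
]
call_graph = build_call_graph(events)
```

The call graph is kept as a separate structure from the control flow graph, in line with the practice of treating the two as independent entities.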
In the semantics-based binary code deobfuscation method proposed in this embodiment, the target program is executed once it has been determined to be an executable file, and all data produced during execution are collected. Dynamic taint analysis and control dependence analysis are used to mark the critical data and the instruction sequences related to them, and the marked instruction sequences are used to build the control flow graph. The unmarked instruction sequences are then deleted from the control flow graph to obtain the concise control flow graph, and the function call graph is obtained from the data collected during execution. The malware is deobfuscated using the concise control flow graph together with the function call graph. This avoids the poor generality of prior-art deobfuscation of malware, broadens the range of targets to which deobfuscation can be applied, and also reduces system overhead to some extent.
It should be noted that the above embodiment, in which the deobfuscation method is applied to malware, serves only to illustrate one practical application of the method. The method may also be used in other application scenarios as actual needs require; the specific implementation is similar to the above embodiment and is not repeated here.
The sequence numbers in the above embodiments are for description only and do not represent the order in which components are assembled or used.
The foregoing is merely an embodiment of the invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.