CN104834837A - Binary code anti-obfuscation method based on semanteme - Google Patents

Info

Publication number
CN104834837A
Authority
CN
China
Prior art keywords
controlling stream
stream graph
target program
instruction
stain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510158163.7A
Other languages
Chinese (zh)
Other versions
CN104834837B (en)
Inventor
王蕾
郭军
汤战勇
房鼎益
陈晓江
李光辉
郝朝辉
王�华
张恒
叶贵鑫
周祥
陈锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University filed Critical Northwest University
Priority to CN201510158163.7A priority Critical patent/CN104834837B/en
Publication of CN104834837A publication Critical patent/CN104834837A/en
Application granted granted Critical
Publication of CN104834837B publication Critical patent/CN104834837B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The present invention discloses a binary code anti-obfuscation method based on semanteme, belonging to the field of software security. The method comprises: extracting execution process information of a target program; conducting a taint analysis on the execution process information to obtain a control flow graph; and carrying out instruction tailoring on the target program according to the control flow graph to obtain a concise control flow graph. According to the method, once the target program has been determined to be an executable file, it is executed and all data produced during execution are acquired. The key data among them, and the instruction sequences related to the key data, are labeled; the control flow graph is built from the labeled instruction sequences; and finally the unlabeled instruction sequences are deleted from the control flow graph to obtain a concise control flow graph. Malicious software can then be anti-obfuscated according to the concise control flow graph, which addresses the poor universality of the prior art, widens the range of targets to which anti-obfuscation processing applies, and reduces system overhead to some extent.

Description

A semantics-based binary code anti-obfuscation method
Technical field
The invention belongs to the field of software security, and in particular relates to a semantics-based binary code anti-obfuscation method.
Background
As people depend more and more on computer programs, using security software to protect programs has become an important means of protection. To let their malicious programs escape detection and removal by security software, however, malware authors use obfuscation to increase the complexity of the malware and change the original code signature of the malicious program. This raises the cost for security vendors of reverse-analysing the program, and so lets the malware avoid removal. Countering this requires anti-obfuscation, which recovers the real signature of the malware once its camouflage has been stripped away, so that the malware can then be detected and removed.
Existing anti-obfuscation techniques, such as static pattern matching and anti-obfuscation algorithms aimed at identifier renaming, all operate statically or on source code. Research on anti-obfuscation for virtualization-based obfuscation is comparatively rare.
In the course of making the present invention, the inventors found that the prior art has at least the following problem:
Existing anti-obfuscation techniques for virtualization-based obfuscation first reverse-analyse the structure of the virtual interpreter, then use that information to decode all the bytecode, and finally recover the original code embedded in the interpreter. This approach is effective when the interpreter being handled has the particular structure the technique expects, but it generalizes poorly and cannot be applied to anti-obfuscation processing at scale.
Summary of the invention
To solve the problem of the prior art, the invention provides a semantics-based binary code anti-obfuscation method, the method comprising:
Building an inverse-transformation template library, the inverse-transformation template library comprising semantics-based instruction transformation strategies;
Importing a target program, and determining whether the target program is an executable file by examining the binary code of the target program;
When the target program is an executable file, executing the target program and extracting execution process information of the target program;
Performing taint analysis on the execution process information to obtain taint-marked instruction sequences, and building a control flow graph from the taint-marked instruction sequences, the control flow graph comprising at least two data basic blocks, a plurality of the data basic blocks forming an explicit path;
Supplementing the control flow graph with information according to control dependence, to obtain the control basic blocks that have a control dependence relation with the data basic blocks;
Obtaining, according to the jump relations in the control flow graph, the implicit paths in the control flow graph other than the explicit path, the explicit path and the implicit paths together forming the multiple execution paths in the control flow graph, to obtain a complete control flow graph;
Performing instruction tailoring on the target program according to the complete control flow graph in combination with the inverse-transformation template library, to obtain the concise control flow graph corresponding to the tailored instructions, and obtaining a function call graph from the execution process information of the target program.
Optionally, building the inverse-transformation template library, the inverse-transformation template library comprising semantics-based instruction transformation strategies, comprises:
Obtaining known semantics-based instruction transformation strategies, each instruction transformation strategy comprising the original instructions before transformation and the anti-obfuscated instructions after transformation;
Combining the instruction transformation strategies to obtain the inverse-transformation template library.
Optionally, extracting the execution process information of the target program when the target program is an executable file comprises:
When the target program is determined to be an executable file, executing the target program and performing dynamic analysis on all execution data of the target program;
Extracting the execution process information from all the execution data, the execution process information comprising all instruction information, register information and memory data information during execution of the target program; when function calls occur during execution of the target program, function call information must also be extracted.
Optionally, performing taint analysis on the execution process information to obtain taint-marked instruction sequences, and building a control flow graph from the taint-marked instruction sequences, the control flow graph comprising at least two data basic blocks, a plurality of the data basic blocks forming an explicit path, comprises:
Determining the taint information of the target program, and taint-marking every instruction in the execution process information that contains the taint information, to obtain the taint-marked instruction sequences;
Building the control flow graph according to the memory locations of the taint-marked instruction sequences and the jump relations between them, the control flow graph comprising at least two data basic blocks, a plurality of the data basic blocks forming an explicit path.
Optionally, determining the taint information of the target program comprises:
Taking content that matches a preset format as the taint information; or
When no preset format exists, taking the input content as the taint information.
Optionally, supplementing the control flow graph with information according to control dependence, to obtain the control basic blocks that have a control dependence relation with the data basic blocks, comprises:
Obtaining all data basic blocks in the control flow graph;
Obtaining, according to control dependence, the control basic blocks in the control flow graph that have a control dependence relation with the data basic blocks, and adding the control basic blocks to the control flow graph.
Optionally, obtaining, according to the jump relations in the control flow graph, the implicit paths in the control flow graph other than the explicit path, the explicit path and the implicit paths together forming the multiple execution paths in the control flow graph, to obtain a complete control flow graph, comprises:
Determining, according to the jump relations in the control flow graph, the data basic blocks in which a jump occurs during execution of the target program;
For each data basic block in which a jump occurs, performing a path search from that block with a depth-first path-finding algorithm, following the jump relations of the block, to obtain the hidden paths other than the explicit path, the hidden paths forming the implicit paths;
Forming multiple paths from the explicit path and the implicit paths and, on the basis of the control flow graph, obtaining the complete control flow graph.
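The depth-first path search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the graph representation (a dict mapping a block id to its successor ids), the function names `enumerate_paths` and `implicit_paths`, and the example graph are all hypothetical, and loops are handled simply by never revisiting a block already on the current path.

```python
from typing import Dict, List


def enumerate_paths(cfg: Dict[int, List[int]], entry: int, exit_block: int) -> List[List[int]]:
    """Depth-first enumeration of all acyclic paths from entry to exit_block."""
    paths: List[List[int]] = []

    def dfs(block: int, path: List[int]) -> None:
        path.append(block)
        if block == exit_block:
            paths.append(list(path))
        else:
            for succ in cfg.get(block, []):
                if succ not in path:  # skip back edges so loops terminate
                    dfs(succ, path)
        path.pop()

    dfs(entry, [])
    return paths


def implicit_paths(cfg: Dict[int, List[int]], entry: int, exit_block: int,
                   explicit_path: List[int]) -> List[List[int]]:
    """Paths that exist in the graph but were not taken in the recorded run."""
    return [p for p in enumerate_paths(cfg, entry, exit_block) if p != explicit_path]


# Hypothetical graph: block 0 branches to 1 or 2, and both rejoin at block 3.
cfg = {0: [1, 2], 1: [3], 2: [3], 3: []}
explicit = [0, 1, 3]  # the path observed during execution
hidden = implicit_paths(cfg, 0, 3, explicit)
print(hidden)
```

The explicit path and the paths returned by `implicit_paths` together give the multiple execution paths of the complete graph.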
Optionally, performing instruction tailoring on the target program according to the complete control flow graph in combination with the inverse-transformation template library, to obtain the concise control flow graph corresponding to the tailored instructions, and obtaining a function call graph from the execution process information of the target program, comprises:
First, within the data basic blocks of the complete control flow graph, between the control basic blocks and the data basic blocks, and within the control basic blocks, discarding the instruction sequences in the control flow graph that carry no taint mark;
Second, detecting the instruction sequences that match the original (pre-transformation) instructions in the inverse-transformation template library and tailoring those instruction sequences; saving the instruction sequences remaining in the complete control flow graph as the concise control flow graph; and extracting the function call relations from the execution process information of the target program to obtain the function call graph.
The beneficial effects of the technical solution provided by the invention are as follows:
Once the target program has been determined to be an executable file, it is executed and all data produced during execution are obtained. Dynamic taint analysis and control dependence analysis are used to mark the key data among them and the instruction sequences related to the key data. The control flow graph is built from the marked instruction sequences, and the unmarked instruction sequences are finally deleted from it to obtain the concise control flow graph. The function call graph is obtained from the data gathered during execution. Malware is then anti-obfuscated using the concise control flow graph together with the function call graph. This avoids the poor universality of prior-art anti-obfuscation of malware, widens the range of targets to which anti-obfuscation processing applies, and also reduces system overhead to some extent.
Brief description of the drawings
To illustrate the technical solution of the invention more clearly, the drawings needed in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of the semantics-based binary code anti-obfuscation method provided by the invention;
Fig. 2 is a schematic flow chart of extracting the execution process information in the semantics-based binary code anti-obfuscation method provided by the invention;
Fig. 3 is a schematic diagram of the concrete data extracted while executing the target program, as provided by the invention;
Fig. 4 is a schematic flow chart of building the control flow graph in the semantics-based binary code anti-obfuscation method provided by the invention;
Fig. 5 is a schematic diagram of the result of taint marking in the semantics-based binary code anti-obfuscation method provided by the invention;
Fig. 6 is a schematic diagram of the result of taint marking in the obtained instruction sequences, as provided by the invention;
Fig. 7 is a schematic structural diagram of the control flow graph built as provided by the invention;
Fig. 8 is a schematic flow chart of building the complete control flow graph in the semantics-based binary code anti-obfuscation method provided by the invention;
Fig. 9 is a schematic structural diagram of the complete control flow graph built as provided by the invention.
Detailed description
To make the structure and advantages of the invention clearer, the invention is further described below with reference to the drawings.
Because the binary code anti-obfuscation method of the invention is implemented in a virtual environment, the hardware platform on which the virtual environment is built is described first, to aid understanding of the method. Some of the content in the accompanying drawings consists of screenshots taken in the virtual environment; this is noted here to avoid misunderstanding.
Experimental environment and test case
Experimental environment: Windows 7 operating system, 3.0 GHz processor, 4 GB memory, Microsoft Visual C++ 6.0 development environment.
Experimental subject: Code Virtualizer V1.0.1.0 is used for the experiments.
Test case: the test case used here is the assembly instructions shown in Table 1. They are protected with the default protection strength of this virtualization tool, generating the test file tesecas_cv_1.0.exe.
Table 1: Assembly instructions used in this embodiment
Embodiment one
The invention provides a semantics-based binary code anti-obfuscation method which, as shown in Fig. 1, comprises:
101, building an inverse-transformation template library, the inverse-transformation template library comprising semantics-based instruction transformation strategies.
102, importing a target program, and determining whether the target program is an executable file by examining the binary code of the target program.
103, when the target program is an executable file, executing the target program and extracting execution process information of the target program.
104, performing taint analysis on the execution process information to obtain taint-marked instruction sequences, and building a control flow graph from the taint-marked instruction sequences, the control flow graph comprising at least two data basic blocks, a plurality of the data basic blocks forming an explicit path.
105, supplementing the control flow graph with information according to control dependence, to obtain the control basic blocks that have a control dependence relation with the data basic blocks.
106, obtaining, according to the jump relations in the control flow graph, the implicit paths in the control flow graph other than the explicit path, the explicit path and the implicit paths together forming the multiple execution paths in the control flow graph, to obtain a complete control flow graph.
107, performing instruction tailoring on the target program according to the complete control flow graph in combination with the inverse-transformation template library, to obtain the concise control flow graph corresponding to the tailored instructions, and obtaining a function call graph from the execution process information of the target program.
In implementation, to anti-obfuscate malware, an inverse-transformation template library is built first; it contains semantics-based instruction transformation strategies. The target program is then imported and, if and only if it is an executable file, it is executed and the execution process information generated during execution is obtained. Taint analysis is then performed on the execution process information to obtain taint-marked instruction sequences, from which the control flow graph is built.
It is worth mentioning why the control flow graph is built: it reveals the concrete flow of key information through the malware, so the parts unrelated to that flow can be deleted, which simplifies the malware.
After the control flow graph is built, the implicit paths, which unlike the explicit path cannot be obtained directly, are obtained, and the control basic blocks that have a control dependence relation with the data basic blocks are further obtained, so that the control flow graph is supplemented into a complete control flow graph.
Finally, on the basis of the complete control flow graph and the inverse-transformation template library built earlier, instruction tailoring is performed on the target program to obtain the concise control flow graph and the function call graph, which make it easier to identify the malware and carry out the later removal process.
In the semantics-based binary code anti-obfuscation method proposed in this embodiment, once the target program has been determined to be an executable file, it is executed and all data produced during execution are obtained. The key data among them and the instruction sequences related to the key data are marked, the control flow graph is built from the marked instruction sequences, and the unmarked instruction sequences are finally deleted from the control flow graph to obtain the concise control flow graph. The function call graph is obtained from the data gathered during execution. Using the concise control flow graph together with the function call graph, malware can be anti-obfuscated. This avoids the poor universality of prior-art anti-obfuscation of malware, widens the range of targets to which anti-obfuscation processing applies, and also reduces system overhead to some extent.
Optionally, step 101, building the inverse-transformation template library comprising semantics-based instruction transformation strategies, comprises:
Obtaining known semantics-based instruction transformation strategies, each instruction transformation strategy comprising the original instructions before transformation and the anti-obfuscated instructions after transformation;
Combining the instruction transformation strategies to obtain the inverse-transformation template library.
In implementation, the currently known, commonly used instruction transformation strategies are obtained first. These strategies are semantics-based, which guarantees that after the original instructions are simplified according to a strategy, the remaining instructions still carry the same meaning, so that identification of the malware after anti-obfuscation processing is not affected.
The concrete instruction transformation strategies are shown in Table 2.
Table 2: Instruction transformation strategies
Table 2 has two columns. The left column holds the original instructions before transformation, that is, some known and commonly used obfuscated instruction sequences. The right column holds the anti-obfuscated instruction after transformation, that is, the result of anti-obfuscating the obfuscated instructions in the same row. As Table 2 shows, each instruction sequence before anti-obfuscation is generally 2 to 4 lines long and is reduced to a single statement afterwards, so applying the instruction transformation strategies alone shrinks such a sequence to 25% to 50% of its original length. With the help of an inverse-transformation template library composed of many such strategies, the obfuscation behaviour of malware can be contained to a certain extent.
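A template library of this kind can be sketched as a list of pattern and replacement pairs applied over a sliding window of instructions. The sketch below is only an illustration: Table 2 is not reproduced in this text, so the two templates (`push`/`pop` collapsing to `mov`, and a double `not` cancelling out) are hypothetical entries chosen for the example, as are the names `TEMPLATES` and `simplify`.

```python
import re
from typing import List

# Hypothetical template entries: (window width, pattern over the joined
# window, replacement). An empty replacement means the sequence is deleted.
TEMPLATES = [
    (2, re.compile(r"push (?P<src>\w+); pop (?P<dst>\w+)"), "mov {dst}, {src}"),
    (2, re.compile(r"not (?P<r>\w+); not (?P=r)"), ""),
]


def simplify(instructions: List[str]) -> List[str]:
    """Replace every window that matches a template with its simplified form."""
    out: List[str] = []
    i = 0
    while i < len(instructions):
        for width, pattern, replacement in TEMPLATES:
            window = "; ".join(instructions[i:i + width])
            match = pattern.fullmatch(window)
            if match:
                if replacement:
                    out.append(replacement.format(**match.groupdict()))
                i += width  # consume the whole obfuscated sequence
                break
        else:
            out.append(instructions[i])  # no template applies, keep as-is
            i += 1
    return out


obfuscated = ["push eax", "pop ebx", "not ecx", "not ecx", "add ebx, 0x2"]
print(simplify(obfuscated))
```

Joining the window into one string keeps each template a single regular expression, which makes it easy to require that the same register appears in both instructions.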
Optionally, step 103, extracting the execution process information of the target program when the target program is an executable file, as shown in Fig. 2, specifically comprises:
1031, when the target program is determined to be an executable file, executing the target program;
1032, extracting the execution process information from all the execution data, the execution process information comprising all instruction information, register information and memory data information during execution of the target program; when function calls occur during execution of the target program, function call information must also be extracted.
In implementation, the target program is opened in the compiler, and whether it is an executable file is judged from the signature in the target program's file header.
The executable file here is a Portable Executable (PE) file. All 32-bit and 64-bit executables under Windows use the PE file format, including DLL, EXE, FON, OCX, LIB and some SYS files. PE is one of the executable file formats under Windows (the others being NE and LE); it was designed by Microsoft and approved by the TIS (Tool Interface Standard) committee.
After this check has determined that the target program is of an executable type, the target program is executed.
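A header-signature check for the PE format can be sketched as below. It relies on two well-known facts about the format: the file starts with the DOS magic `MZ`, and the 4-byte little-endian value at offset 0x3C points to the `PE\0\0` signature. The function name `is_pe_file` and the synthetic test header are assumptions made for the illustration; the source does not say how its own check is implemented.

```python
import struct


def is_pe_file(data: bytes) -> bool:
    """Return True if data begins with a DOS header that points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


# Synthetic minimal header: "MZ", padding up to 0x3C, e_lfanew = 0x40, then "PE\0\0".
fake_pe = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40) + b"PE\x00\x00"
print(is_pe_file(fake_pe))
print(is_pe_file(b"\x7fELF" + b"\x00" * 0x40))
```

In practice the bytes would come from reading the start of the target program, for example with `open(path, "rb").read()`.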
While the target program executes, all of its execution data are extracted. The execution data specifically comprise instruction information, register information and memory data information; when function calls occur during execution, function call information must also be extracted.
For ease of understanding, the concrete execution data are illustrated in Fig. 3, which shows part of the information extracted during dynamic execution of the program tesecase_cv_1.0.exe. In Fig. 3, lines 1 to 9 give the values of the registers eax, ecx, edx, ebx, esp, ebp, esi, edi and eip; line 10 is the corresponding assembly instruction; and lines 11 to 14 are the data in memory corresponding to the register values in lines 1 to 9. Because this example makes no function calls during execution, Fig. 3 contains no function call data. If function calls do occur while the target program executes, the function call data must be obtained as well.
Optionally, step 104, performing taint analysis on the execution process information to obtain taint-marked instruction sequences, and building a control flow graph from the taint-marked instruction sequences, the control flow graph comprising at least two data basic blocks, a plurality of the data basic blocks forming an explicit path, as shown in Fig. 4, comprises:
1041, determining the taint information of the target program, and taint-marking every instruction in the execution process information that contains the taint information, to obtain the taint-marked instruction sequences;
1042, building the control flow graph according to the memory locations of the taint-marked instruction sequences and the jump relations between them, the control flow graph comprising at least two data basic blocks, a plurality of the data basic blocks forming an explicit path.
In implementation, the taint information for the target program is determined within the execution data obtained. Sensitive user data or untrusted input data are marked as tainted; these taint sources serve as the taint information, and their propagation is tracked. Every instruction in the execution data that involves taint information is taint-marked, which yields the taint-marked instruction sequences. As shown in Fig. 5, the taint mark is "tainted": every line in Fig. 5 that carries the suffix "tainted" is a taint-marked instruction.
Once the taint-marked instruction sequences have been determined, the control flow graph is built from the memory locations of these instruction sequences and the jump relations between them.
A control flow graph (CFG) is an abstract representation of a procedure or program, and is usually stored as a linked data structure. Each node in the graph represents a basic block, that is, a straight-line piece of code without any jumps or jump targets: a jump target starts a block, and a jump ends a block. Directed edges represent jumps in the control flow.
The concrete steps for building the control flow graph are as follows:
(1) Mark all leaders (the first instruction of each basic block):
The first instruction of a function is a leader;
The target of any jump instruction is a leader;
The instruction immediately following a conditional branch instruction is a leader.
(2) Each basic block consists of a leader and all instructions that follow it up to, but not including, the next leader.
(3) If the target of the jump instruction at the end of basic block A is basic block B, or B immediately follows A, add an edge A->B.
The nodes of the control flow graph are called basic blocks. A basic block is a sequence of instructions executed in order, and it usually ends with a conditional branch instruction. The control flow graph represents a superset of all feasible execution paths.
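The three construction steps can be sketched as follows. The instruction representation (a tuple of address, mnemonic and optional jump target), the function name `build_cfg` and the sample listing are assumptions made for the illustration. The sketch also departs from the text in one labeled way: it starts a new block after every jump, conditional or not, and adds no fall-through edge after `jmp` or `ret`, which is the usual textbook convention.

```python
from typing import Dict, List, Optional, Set, Tuple

# (address, mnemonic, jump target or None); assumed sorted by address.
Instr = Tuple[int, str, Optional[int]]


def build_cfg(instrs: List[Instr]) -> Tuple[Dict[int, List[Instr]], Set[Tuple[int, int]]]:
    addrs = [addr for addr, _, _ in instrs]

    # Step 1: mark leaders.
    leaders = {addrs[0]}                      # first instruction
    for idx, (_, _, target) in enumerate(instrs):
        if target is not None:
            leaders.add(target)               # jump target
            if idx + 1 < len(instrs):
                leaders.add(addrs[idx + 1])   # instruction after a jump

    # Step 2: a block runs from a leader up to (not including) the next leader.
    blocks: Dict[int, List[Instr]] = {}
    current = addrs[0]
    for ins in instrs:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)

    # Step 3: add jump edges and fall-through edges.
    starts = sorted(blocks)
    edges: Set[Tuple[int, int]] = set()
    for i, start in enumerate(starts):
        _, mnemonic, target = blocks[start][-1]
        if target is not None and target in blocks:
            edges.add((start, target))
        if mnemonic not in ("jmp", "ret") and i + 1 < len(starts):
            edges.add((start, starts[i + 1]))
    return blocks, edges


# Hypothetical listing: an if/else that rejoins at 0x0A.
listing = [
    (0x00, "cmp", None), (0x02, "je", 0x08),
    (0x04, "mov", None), (0x06, "jmp", 0x0A),
    (0x08, "add", None), (0x0A, "ret", None),
]
blocks, edges = build_cfg(listing)
print(sorted(blocks), sorted(edges))
```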
When building the control flow graph for the taint-marked instruction sequences, as in the example of Fig. 6, 42c290 and 42c2aa in line 2 are the start address and end address of basic block 0 (The 0 Fragment), and the information in line 3 shows that the successors of basic block 0 are basic block 1 and basic block 2. From this information the simplified control flow graph shown in Fig. 7 is built. As can be seen, the control flow graph comprises multiple data basic blocks, which form the explicit path in their order of succession.
Optionally, determining the taint information of the target program in step 1041 above comprises:
Taking content that matches a preset format as the taint information; or
When no preset format exists, taking the input content as the taint information.
In implementation, the taint-marked instruction sequences of the previous step are obtained from the taint information, and only then can the control flow graph be built, so identifying the taint information correctly is especially important.
In practice, taint sources are specified in two ways: by system default and by user definition. By default, data read from the program's standard input or from the network are taint sources. Users can also specify taint sources according to their own needs, or add sources on top of the system defaults. In the user-defined case, the user stores in advance the content format that is to count as taint information, so that when the target program is executed, any content matching that format is immediately treated as taint information and the subsequent processing steps are carried out.
Specifically, when no preset format has been stored, the content input during execution of the target program is taken as the taint information by default. For example, the "2" in "0x2" in line 1 of Fig. 5 and the "3" in "0x3" in line 3 are inputs given when the target program was executed, so "2" and "3" are taken as taint information, and the instruction sequences in line 1 and line 2 that contain them are taint-marked.
Taint marking turns the key content, together with the instructions that move or compute on it, into taint-marked instruction sequences. Key information and all instructions related to it can then be tracked with priority, while instructions unrelated to the key information are handled at low priority. This reduces system overhead while still ensuring that the program under analysis is anti-obfuscated, and makes it less likely that anti-obfuscation processing is aborted because the overhead grows too large.
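Taint marking over a recorded trace can be sketched as a single forward pass that keeps a set of tainted locations. The trace format (instruction text, destination, source operands), the function name `mark_taint` and the sample trace are assumptions made for the illustration; the inputs `0x2` and `0x3` follow the example above, but the instructions around them are hypothetical.

```python
from typing import List, Set, Tuple

# (instruction text, destination operand, source operands)
TraceEntry = Tuple[str, str, List[str]]


def mark_taint(trace: List[TraceEntry], sources: Set[str]) -> List[Tuple[str, bool]]:
    """Mark each instruction as tainted if any of its sources is tainted."""
    tainted = set(sources)
    marked: List[Tuple[str, bool]] = []
    for text, dest, srcs in trace:
        is_tainted = any(src in tainted for src in srcs)
        if is_tainted:
            tainted.add(dest)        # taint propagates to the destination
        else:
            tainted.discard(dest)    # destination overwritten with clean data
        marked.append((text, is_tainted))
    return marked


# Hypothetical trace; the program inputs 0x2 and 0x3 are the taint sources.
trace = [
    ("mov eax, 0x2", "eax", ["0x2"]),
    ("mov ecx, 0x3", "ecx", ["0x3"]),
    ("mov edx, 0x10", "edx", ["0x10"]),
    ("add eax, ecx", "eax", ["eax", "ecx"]),
    ("mov ebx, edx", "ebx", ["edx"]),
]
for text, flag in mark_taint(trace, {"0x2", "0x3"}):
    print(text + ("  tainted" if flag else ""))
```

Instructions left unmarked by this pass are the candidates for removal during instruction tailoring.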
Optionally, step 105, namely supplementing the control flow graph with information according to control dependence to obtain the control basic blocks that have a control dependence on the data basic blocks, comprises:
Obtaining all data basic blocks in the control flow graph;
According to control dependence, obtaining the control basic blocks in the control flow graph that have a control dependence on the data basic blocks, and adding the control basic blocks to the control flow graph.
In implementation, all data basic blocks are obtained from the previously built control flow graph; each of these data basic blocks contains a series of instructions in sequential order.
It should be noted that taint analysis mainly captures explicit information flow, so the resulting control flow graph is clearly not complete enough. To also capture the program's implicit information flow during execution, control dependence analysis is required as well.
Table 3: Partial pseudocode of the control dependence algorithm
For two nodes w and v in the program's control flow graph, node w is a postdominator of v if every path from v to the exit node (stop) contains w; a node cannot be its own postdominator. Formally: node w is a postdominator of node v if w is the unique immediate successor of v, or if v has multiple immediate successors and w is a postdominator of every successor u of v.
For two instructions i and j in the program, node j is control dependent on node i if the outcome of node i determines whether node j executes. Formally, node j is control dependent on node i when the following two conditions hold: (1) there is a non-empty path from i to j; (2) j is a postdominator of every node on this path other than i.
Control dependence yields part of the execution order among the basic blocks of the control flow graph, so that part of the graph can be drawn directly from the control dependences.
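The postdominator and control dependence definitions above can be sketched in Python as follows. This is not the pseudocode of Table 3, which is not reproduced in this text. It follows the common textbook formulation, which additionally requires that j not postdominate i; that extra condition, the dictionary-based graph representation, and the node names are assumptions of this sketch.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def strict_postdominators(cfg: Graph, exit_node: str) -> Dict[str, Set[str]]:
    """Iteratively solve pdom(n) = {n} | intersection of pdom(s) over successors s."""
    nodes = set(cfg)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            succ_sets = [pdom[s] for s in cfg[n]]
            new = {n} | (set.intersection(*succ_sets) if succ_sets else set())
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    # A node is not its own postdominator, so drop the reflexive entry.
    return {n: pdom[n] - {n} for n in nodes}


def control_dependences(cfg: Graph, exit_node: str) -> Dict[str, Set[str]]:
    """Map each node j to the set of branch nodes i it is control dependent on."""
    spdom = strict_postdominators(cfg, exit_node)
    deps: Dict[str, Set[str]] = {n: set() for n in cfg}
    for i, succs in cfg.items():
        for s in succs:
            # Candidates: s itself and every node that postdominates s.
            for j in {s} | spdom[s]:
                if j not in spdom[i]:      # j must not postdominate the branch i
                    deps[j].add(i)
    return deps


# Diamond-shaped example: A branches to B or C, and both rejoin at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
spdom = strict_postdominators(cfg, "D")
deps = control_dependences(cfg, "D")
```

In the diamond example, D postdominates A, B, and C, so D is not control dependent on anything, while B and C are each control dependent on the branch at A.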
Optionally, step 106, namely obtaining, according to the jump relations in the control flow graph, the implicit paths in the control flow graph other than the explicit paths, forming the multiple execution paths of the control flow graph from the explicit paths and the implicit paths, and obtaining the complete control flow graph, as shown in Fig. 8, comprises:
1061. According to the jump relations in the control flow graph, determine the data basic blocks at which a jump occurs during execution of the target program.
1062. In each data basic block at which a jump occurs, perform a path search on that data basic block with a depth-first path-finding algorithm according to its jump relations, and obtain the hidden paths other than the explicit paths; the hidden paths constitute the implicit paths.
1063. Form multiple paths from the explicit paths and the implicit paths, and obtain the complete control flow graph on the basis of the control flow graph.
In implementation, the control flow graph shown in Fig. 7 shows that jumps occur between several data basic blocks from top to bottom. Each jump therefore has an upper-level data basic block and a lower-level data basic block associated with its jump relation. A path search is performed from the upper-level data basic block of each jump relation. To discover the other paths through this data basic block, a depth-first traversal (DFS) path-finding algorithm is typically used, which ensures that all paths related to the data basic block are found.
The depth-first search obtains all paths related to the upper-level data basic block at which a jump occurs. Because these paths were not obtained directly when the control flow graph was built earlier, they are "hidden" in contrast to the "explicit" paths obtained before; the hidden paths found by traversal with the path-finding algorithm in this step are therefore called "implicit" paths.
After all implicit paths in the control flow graph have been obtained, they are merged with the explicit paths obtained earlier to form multiple paths. Adding these paths to the earlier control flow graph yields the complete control flow graph containing all paths. Fig. 9 shows the complete control flow graph obtained after the control dependence analysis of step 105 and the multipath supplementation of step 106.
The reason for obtaining all paths related to the basic blocks is that a control flow graph built only from taint-marked instruction sequences captures only part of the execution paths. Constructing multiple execution paths avoids both the limitation that a single execution path reflects only part of a program's behavior and the high performance overhead and space explosion that symbolic analysis tends to cause, thereby saving system overhead.
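The path search of steps 1061 to 1063 might be sketched in Python as below. This is a simplified illustration under stated assumptions: the jump relations are given as a successor dictionary, only simple paths (no repeated block) are enumerated so that loops terminate, and the block names `B1` to `B4` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]
Path = Tuple[str, ...]


def all_paths(cfg: Graph, start: str, end: str) -> Set[Path]:
    """Enumerate every simple path from start to end by depth-first search."""
    found: Set[Path] = set()
    stack: List[Tuple[str, Path]] = [(start, (start,))]
    while stack:
        node, path = stack.pop()
        if node == end:
            found.add(path)
            continue
        for nxt in cfg.get(node, []):
            if nxt not in path:            # skip blocks already on this path
                stack.append((nxt, path + (nxt,)))
    return found


def implicit_paths(cfg: Graph, explicit: Set[Path],
                   start: str, end: str) -> Set[Path]:
    """Hidden paths are those found by DFS but absent from the explicit set."""
    return all_paths(cfg, start, end) - explicit


# Jump relations between basic blocks; B1 ends in a conditional jump.
cfg = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
explicit = {("B1", "B2", "B4")}            # the path actually executed
implicit = implicit_paths(cfg, explicit, "B1", "B4")
complete = explicit | implicit             # multiple execution paths
```

The branch not taken during the observed run, `B1 -> B3 -> B4`, is recovered as an implicit path and merged with the explicit path to give the full set of execution paths.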
Optionally, step 107, namely performing instruction tailoring on the target program according to the complete control flow graph in combination with the inverse-morphing template library to obtain the concise control flow graph corresponding to the tailored instructions, and obtaining the function call graph from the execution process information of the target program, comprises:
First, between the data basic blocks and the control basic blocks in the complete control flow graph, and inside the data basic blocks or the control basic blocks, discarding the instruction sequences in the control flow graph that carry no taint mark;
Second, if an instruction sequence is detected that matches a pre-morphing original instruction in the inverse-morphing template library, tailoring that instruction sequence, saving the remaining instruction sequences in the complete control flow graph as the concise control flow graph, and extracting the function call relations from the execution process information of the target program to obtain the function call graph.
In implementation, it follows from the foregoing that the taint information is essentially the most important information in the malware. The instruction sequences corresponding to the processing steps related to the taint information, that is, the taint-marked instruction sequences, are the important steps, and they alone are what the malware needs to perform its full function. Everything else can be regarded as protective code that the malware generates to shield itself from detection and removal by antivirus software. It is precisely this protective code that keeps the malware's features hidden so that they cannot be correctly identified. Based on the complete control flow graph obtained, this step discards the instruction sequences without taint marks and retains only the part that best reflects the malware's characteristics, namely the taint-marked instruction sequences.
Based on the above reasoning and the complete control flow graph, the instruction sequences without taint marks between the data basic blocks and the control basic blocks of the complete control flow graph are deleted.
Then, inside the data basic blocks or the control basic blocks of the complete control flow graph, if an instruction sequence is detected that matches a pre-morphing original instruction in the inverse-morphing template library, that instruction sequence is tailored, and the result is the concise control flow graph. If function call information was obtained in the execution process information while the target program was executed, the function call graph is also obtained from the function call relations, so that instruction tailoring can be performed on the malware in combination with the concise control flow graph.
For the same reason and purpose as obtaining the concise control flow graph above, inside the data basic blocks and the control basic blocks of the complete control flow graph, original instruction content that matches an instruction morphing strategy in the inverse-morphing template library of step 101 is converted into the corresponding anti-obfuscation instruction. The complex, useless statements that shield the malware thus become brief statements, which simplifies the malware's instructions.
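The two tailoring passes described above, discarding instructions without taint marks and then rewriting sequences that match a template, might look roughly like the following Python sketch. The contents of `TEMPLATES` are hypothetical examples chosen for illustration; the actual inverse-morphing template library is not listed in this text, and the text-based matching is a simplification.

```python
from typing import Dict, List, Tuple

# Hypothetical inverse-morphing templates: morphed sequence -> simpler equivalent.
TEMPLATES: Dict[Tuple[str, ...], Tuple[str, ...]] = {
    ("push eax", "pop ebx"): ("mov ebx, eax",),
    ("not ebx", "not ebx"): (),            # double NOT cancels out
}


def tailor(block: List[Tuple[str, bool]]) -> List[str]:
    """Drop untainted instructions, then rewrite sequences matching a template.

    block -- (instruction text, carries taint mark) pairs of one basic block
    """
    kept = [text for text, tainted in block if tainted]
    out: List[str] = []
    i = 0
    while i < len(kept):
        for pattern, replacement in TEMPLATES.items():
            if tuple(kept[i:i + len(pattern)]) == pattern:
                out.extend(replacement)    # emit the simplified form
                i += len(pattern)
                break
        else:
            out.append(kept[i])            # no template matched: keep as is
            i += 1
    return out


block = [
    ("push eax", True),
    ("nop", False),                        # junk instruction without a taint mark
    ("pop ebx", True),
    ("not ebx", True),
    ("not ebx", True),
    ("add ebx, ecx", True),
]
concise = tailor(block)
```

For this block, the unmarked `nop` is discarded, the `push`/`pop` pair collapses to a single `mov`, and the cancelling `not` pair disappears, leaving two instructions.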
Code analysis generally follows three steps:
1. Local analysis, that is, analysis of each basic block.
2. Global analysis (also called intraprocedural analysis), that is, analysis of the control flow graph of a function.
3. Interprocedural analysis, that is, analysis of the call relations between functions. The call relations between functions are represented by a function call graph.
In practice, the function call graph and the control flow graph are generally treated as separate, independent entities. Therefore, in step 108, the concise control flow graph and the function call graph are obtained separately. The two play different roles in the instruction tailoring process.
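As a rough illustration of how a function call graph could be derived from the function call information recorded during execution, consider the Python sketch below. The event format of ordered `("call", name)` and `("ret", "")` pairs, the `entry` default of `main`, and the function names are assumptions; the original text does not specify how the call information is stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_call_graph(events: Iterable[Tuple[str, str]],
                     entry: str = "main") -> Dict[str, Set[str]]:
    """Build caller -> callees edges from ordered call/return events."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    stack: List[str] = [entry]             # simulated call stack
    for kind, name in events:
        if kind == "call":
            graph[stack[-1]].add(name)     # edge from current function to callee
            stack.append(name)
        elif kind == "ret" and len(stack) > 1:
            stack.pop()
    return dict(graph)


events = [("call", "decrypt"), ("call", "xor_key"), ("ret", ""),
          ("ret", ""), ("call", "send")]
call_graph = build_call_graph(events)
```

The resulting graph is kept separate from the control flow graph, in line with the treatment of the two as independent entities.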
The semantics-based binary code anti-obfuscation method proposed in this embodiment first determines that the target program is an executable file, then executes the target program and collects all data produced during execution. Dynamic taint analysis and control dependence analysis are used to mark the key data among all the data and the instruction sequences related to the key data, and the control flow graph is built from the marked instruction sequences. Finally, the unmarked instruction sequences are deleted from the control flow graph to obtain the concise control flow graph, and the function call graph is obtained from the data collected during execution. Anti-obfuscation is then applied to the malware according to the concise control flow graph together with the function call graph. This avoids the poor generality of prior-art anti-obfuscation of malware, widens the range of targets to which anti-obfuscation applies, and also reduces system overhead to some extent.
It should be noted that the embodiment above, in which the anti-obfuscation method is applied to malware, merely illustrates a practical application of the method. The method can also be used in other application scenarios as needed; the specific implementation is similar to the embodiment above and is not repeated here.
The sequence numbers in the above embodiment are for description only and do not represent an order in the assembly or use of the components.
The foregoing are merely embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A semantics-based binary code anti-obfuscation method, characterized in that the method comprises:
Building an inverse-morphing template library, the inverse-morphing template library comprising semantics-based instruction morphing strategies;
Importing a target program, and determining whether the target program is an executable file by examining the binary code of the target program;
When the target program is the executable file, executing the target program and extracting execution process information of the target program;
Performing taint analysis on the execution process information to obtain taint-marked instruction sequences, and building a control flow graph from the taint-marked instruction sequences, the control flow graph comprising at least two basic blocks, a plurality of the basic blocks forming an explicit path;
Supplementing the control flow graph with information according to control dependence to obtain control basic blocks that have a control dependence on the data basic blocks;
Obtaining, according to the jump relations in the control flow graph, the implicit paths in the control flow graph other than the explicit path, forming multiple execution paths in the control flow graph from the explicit path and the implicit paths, and obtaining a complete control flow graph;
Performing instruction tailoring on the target program according to the complete control flow graph in combination with the inverse-morphing template library to obtain a concise control flow graph corresponding to the tailored instructions, and obtaining a function call graph from the execution process information of the target program.
2. The semantics-based binary code anti-obfuscation method according to claim 1, characterized in that building the inverse-morphing template library, the inverse-morphing template library comprising semantics-based instruction morphing strategies, comprises:
Obtaining known semantics-based instruction morphing strategies, each instruction morphing strategy comprising the original instruction before morphing and the anti-obfuscation instruction after morphing;
Combining the instruction morphing strategies to obtain the inverse-morphing template library.
3. The semantics-based binary code anti-obfuscation method according to claim 1, characterized in that executing the target program and extracting the execution process information of the target program when the target program is the executable file comprises:
When the target program is determined to be the executable file, executing the target program;
Extracting the execution process information from all the execution data, the execution process information comprising all instruction information, register information, and memory data information during execution of the target program; when function calls occur during execution of the target program, function call information is also extracted.
4. The semantics-based binary code anti-obfuscation method according to claim 1, characterized in that performing taint analysis on the execution process information to obtain taint-marked instruction sequences, and building the control flow graph from the taint-marked instruction sequences, the control flow graph comprising at least two data basic blocks, a plurality of the data basic blocks forming an explicit path, comprises:
Determining the taint information of the target program, and applying a taint mark to every instruction in the execution process information that contains the taint information, to obtain the taint-marked instruction sequences;
Building the control flow graph according to the storage locations of the taint-marked instruction sequences and the jump relations before and after them, the control flow graph comprising at least two data basic blocks, a plurality of the data basic blocks forming an explicit path.
5. The semantics-based binary code anti-obfuscation method according to claim 4, characterized in that determining the taint information of the target program comprises:
Using content that matches a preset format as the taint information; or
When no preset format exists, using the input content as the taint information.
6. The semantics-based binary code anti-obfuscation method according to claim 1, characterized in that supplementing the control flow graph with information according to control dependence to obtain the control basic blocks that have a control dependence on the data basic blocks comprises:
Obtaining all data basic blocks in the control flow graph;
According to control dependence, obtaining the control basic blocks in the control flow graph that have a control dependence on the data basic blocks, and adding the control basic blocks to the control flow graph.
7. The semantics-based binary code anti-obfuscation method according to claim 1, characterized in that obtaining, according to the jump relations in the control flow graph, the implicit paths in the control flow graph other than the explicit path, forming multiple execution paths in the control flow graph from the explicit path and the implicit paths, and obtaining the complete control flow graph, comprises:
According to the jump relations in the control flow graph, determining the data basic blocks at which a jump occurs during execution of the target program;
In each data basic block at which a jump occurs, performing a path search on that data basic block with a depth-first path-finding algorithm according to its jump relations, and obtaining the hidden paths other than the explicit path, the hidden paths constituting the implicit paths;
Forming multiple paths from the explicit path and the implicit paths, and obtaining the complete control flow graph on the basis of the control flow graph.
8. The semantics-based binary code anti-obfuscation method according to claim 1, characterized in that performing instruction tailoring on the target program according to the complete control flow graph in combination with the inverse-morphing template library to obtain the concise control flow graph corresponding to the tailored instructions, and obtaining the function call graph from the execution process information of the target program, comprises:
First, between the data basic blocks and the control basic blocks in the complete control flow graph, and inside the data basic blocks or the control basic blocks, discarding the instruction sequences in the control flow graph that carry no taint mark;
Second, if an instruction sequence is detected that matches a pre-morphing original instruction in the inverse-morphing template library, tailoring that instruction sequence, saving the remaining instruction sequences in the complete control flow graph as the concise control flow graph, and extracting the function call relations from the execution process information of the target program to obtain the function call graph.
CN201510158163.7A 2015-04-03 2015-04-03 A kind of antialiasing method of binary code based on semanteme Expired - Fee Related CN104834837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510158163.7A CN104834837B (en) 2015-04-03 2015-04-03 A kind of antialiasing method of binary code based on semanteme

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510158163.7A CN104834837B (en) 2015-04-03 2015-04-03 A kind of antialiasing method of binary code based on semanteme

Publications (2)

Publication Number Publication Date
CN104834837A true CN104834837A (en) 2015-08-12
CN104834837B CN104834837B (en) 2017-10-31

Family

ID=53812720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510158163.7A Expired - Fee Related CN104834837B (en) 2015-04-03 2015-04-03 A kind of antialiasing method of binary code based on semanteme

Country Status (1)

Country Link
CN (1) CN104834837B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101359352A (en) * 2008-09-25 2009-02-04 中国人民解放军信息工程大学 API use action discovering and malice deciding method after confusion of multi-tier synergism
CN101814053A (en) * 2010-03-29 2010-08-25 中国人民解放军信息工程大学 Method for discovering binary code vulnerability based on function model
CN101968766A (en) * 2010-10-21 2011-02-09 上海交通大学 System for detecting software bug triggered during practical running of computer program
CN102012987A (en) * 2010-12-02 2011-04-13 李清宝 Automatic behavioural analysis system for binary malicious codes
CN102789419A (en) * 2012-07-20 2012-11-21 中国人民解放军信息工程大学 Software fault analysis method based on multi-sample difference comparison
CN103778355A (en) * 2014-01-15 2014-05-07 西北大学 Code morphing-based binary code obfuscation method

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105844160A (en) * 2016-06-21 2016-08-10 北京金山安全软件有限公司 Driver hiding method, device and equipment
CN106951744A (en) * 2017-03-15 2017-07-14 北京深思数盾科技股份有限公司 The guard method of executable program and device
CN106951744B (en) * 2017-03-15 2019-12-13 北京深思数盾科技股份有限公司 protection method and device for executable program
CN107194252A (en) * 2017-05-09 2017-09-22 华中科技大学 The program control flow completeness protection method and system of a kind of complete context-sensitive
CN107194252B (en) * 2017-05-09 2019-11-22 华中科技大学 A kind of the program control flow completeness protection method and system of complete context-sensitive
CN107229848A (en) * 2017-06-12 2017-10-03 北京洋浦伟业科技发展有限公司 A kind of code reinforcement means and device
CN110832488A (en) * 2017-06-29 2020-02-21 爱维士软件有限责任公司 Normalizing entry point instructions in executable program files
CN108153518A (en) * 2017-12-25 2018-06-12 厦门市美亚柏科信息股份有限公司 A kind of antialiasing method of JAVA programs and terminal
CN108446536A (en) * 2018-02-12 2018-08-24 北京梆梆安全科技有限公司 A kind of source code reinforcement means and device based on semiology analysis and single-point logic
CN108537012A (en) * 2018-02-12 2018-09-14 北京梆梆安全科技有限公司 Source code based on variable and code execution sequence obscures method and device
CN108537012B (en) * 2018-02-12 2021-11-16 北京梆梆安全科技有限公司 Source code obfuscation method and device based on variables and code execution sequence
CN108446541A (en) * 2018-02-12 2018-08-24 北京梆梆安全科技有限公司 Source code reinforcement means and device based on finite state machine and semiology analysis
CN108733990A (en) * 2018-05-22 2018-11-02 深圳壹账通智能科技有限公司 A kind of document protection method and terminal device based on block chain
CN109101816A (en) * 2018-08-10 2018-12-28 北京理工大学 A kind of malicious code homology analysis method for calling controlling stream graph based on system
CN109871681A (en) * 2019-02-28 2019-06-11 天津大学 Android malware detection method is loaded towards dynamic code based on hybrid analysis
CN109871681B (en) * 2019-02-28 2023-04-18 天津大学 Detection method for android malicious software loaded on basis of hybrid analysis and oriented to dynamic codes
CN111814120B (en) * 2020-07-10 2021-04-23 北京嘀嘀无限科技发展有限公司 Program anti-aliasing processing method, device, equipment and storage medium
CN111814120A (en) * 2020-07-10 2020-10-23 北京嘀嘀无限科技发展有限公司 Program anti-aliasing processing method, device, equipment and storage medium
CN112612480A (en) * 2020-12-28 2021-04-06 苏州浪潮智能科技有限公司 Confusion removing method and device for decompiled original code
CN114357389A (en) * 2021-12-31 2022-04-15 北京大学 Instruction flower adding confusion method and device based on LLVM
CN114357389B (en) * 2021-12-31 2024-04-16 北京大学 LLVM (logical Low level virtual machine) -based instruction flower adding confusion method and device
CN114417355A (en) * 2022-01-07 2022-04-29 上海交通大学 Lightweight safety detection system and method for industrial control system
CN114417355B (en) * 2022-01-07 2022-11-08 上海交通大学 Lightweight safety detection system and method for industrial control system

Also Published As

Publication number Publication date
CN104834837B (en) 2017-10-31

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Guo Jun, Ye Guixin, Zhou Xiang, Chen Feng, Wang Lei, Tang Zhanyong, Fang Dingyi, Chen Xiaojiang, Li Guanghui, Hao Chaohui, Wang Hua, Zhang Heng

Inventor before: Wang Lei, Ye Guixin, Zhou Xiang, Chen Feng, Guo Jun, Tang Zhanyong, Fang Dingyi, Chen Xiaojiang, Li Guanghui, Hao Chaohui, Wang Hua, Zhang Heng

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171031

Termination date: 20200403