CN102930210A

CN102930210A - System and method for automatically analyzing, detecting and classifying malicious program behavior

Info

Publication number: CN102930210A
Application number: CN2012104083589A
Authority: CN
Inventors: 邹艳; 刘建港; 苗启广; 曹莹; 谢国胜; 黄有成; 刘家辰; 郑春阳
Original assignee: JIANGSU JINLING TECHNOLOGY GROUP CORP
Current assignee: JIANGSU JINLING TECHNOLOGY GROUP CORP
Priority date: 2012-10-14
Filing date: 2012-10-14
Publication date: 2013-02-13
Anticipated expiration: 2032-10-14
Also published as: CN102930210B

Abstract

The invention discloses a system and a method for automatically analyzing, detecting and classifying a malicious program behavior. The system comprises a static analysis module, a sandbox dispatching management module, a sandbox monitoring module, a behavior abstraction module and a detection and classification module. Compared with the prior art, the system has the advantages that 1, the system is based on a behavior monitoring technology in an instruction set simulation environment; and 2, a virtual Internet is established in a sandbox through means of environment configuration, server program modification and the like, and a common network service is simulated, so that operations such as domain name server (DNS) resolution, http access, file download, Email login and mailing initiated by a malicious program can be successfully executed, the malicious program is inveigled to generate a malicious network behavior, the network behaviors are prevented from damaging a host machine and a real network, and the defects that the malicious program network behavior cannot be fully expressed during dynamic behavior analysis of a malicious program and the like are overcome.

Description

System and method for automated analysis, detection and classification of malicious program behavior

Technical field

The invention belongs to the field of system security and network security, and further relates to a method for the automated analysis of the dynamic behavior of malicious programs. The invention is used to establish dynamic behavior rules for known malicious programs and to judge the dynamic behavior of unknown malicious programs with high accuracy.

Background technology

In the field of malicious program analysis, automated dynamic behavior analysis is used to obtain the behavioral characteristics of malicious programs more accurately, more comprehensively and more quickly.

The patent application "Malicious program dynamic behavior automatic analysis system and method" of the University of Electronic Science and Technology (publication number: CN101154258, filing date: 2007.08.14) discloses a method for the dynamic analysis of malicious programs. The steps of this dynamic analysis are: (1) an initialization component starts the target binary program; (2) the initialization component loads a virtual execution component and a behavior monitoring component; (3) a disassembly component obtains the assembly instructions of the target program's binary code stream; (4) the virtual execution component generates the corresponding execution blocks; (5) a behavior control component judges, against a rule base, whether a basic block contains malicious behavior; (6) if malicious behavior exists, control is passed to a behavior analysis component, which records the malicious behavior; (7) each instruction in the basic block is virtually executed; (8) after the analysis stops, the behavior analysis component submits a malicious behavior analysis report. This method provides an automated system for analyzing the dynamic behavior of malicious programs and can be used for coarse-grained classification of the dynamic behavior of unknown malicious programs. However, because the system lacks static analysis of the malicious program, host event simulation, host environment simulation and simulation of common network environments, its capture of the dynamic behavior of malicious programs is far from comprehensive. Moreover, the system can analyze only binary executable files and cannot analyze files in other formats such as service programs, DLL files or non-PE files, which severely limits its use. Finally, the system provides no method for abstracting behavior once the behavioral characteristics of a malicious program have been obtained, nor for analyzing and classifying the malicious behavior of unknown binary programs. In sum, these deficiencies reduce the practicality, accuracy and classification efficiency of the system.

Summary of the invention

To address the deficiencies of existing techniques for analyzing, detecting and classifying malicious programs, the invention proposes an automated method that combines static analysis, dynamic analysis and network analysis, and that combines behavior abstraction with ensemble learning. The goal is a more practical system and method for automated analysis, detection and classification of malicious programs. The system supports loading and running PE files and common non-PE files, and supports comprehensive monitoring of malicious programs. During loading and execution it monitors host behaviors such as process injection, registry operations, memory operations and file operations, and network behaviors such as network redirection, DNS addressing, ftp connection, http access, and email login and sending. It reports malicious access to all kinds of system resources, including processes, memory, files, the registry, the host environment and the network, and it simulates host events such as USB flash drive insertion and CD insertion. In addition, based on the reports generated by the automated analysis, the behavior of each malicious program is abstracted in a systematic, rule-based way to form a library of malicious program behavioral features. An ensemble learning method is used to analyze and quantify these behavioral features and to build a classification model, which effectively improves the accuracy of classifying unknown sample files.

Through sandbox monitoring implemented with virtualization technology, acquisition of the static information of malicious programs, capture of their behavioral characteristics, and detection and classification based on those behavioral characteristics, the system as a whole achieves automated analysis and accurate classification of malicious programs. It addresses shortcomings of the prior art such as the difficulty of extracting signatures, the difficulty of handling complex packing, polymorphism and metamorphism techniques, incomplete capture of malicious program behavior, and ill-defined methods for behavior abstraction, detection and classification, and it improves the detection rate and classification accuracy for malicious programs.

The automated malicious program analysis, detection and classification system provided by the invention comprises the following modules:

1. Static analysis module: before a sample file undergoes dynamic analysis in the sandbox, the structure of the executable file (PE file) is analyzed statically to obtain as much information related to the sample as possible. A static analysis report for the sample file is produced from this information and, together with the reports generated later, becomes the raw data source of the behavior abstraction module.

2. Sandbox dispatching management module: the invention includes multiple sandboxes, so an independent sandbox dispatching management module is needed to manage each sandbox, coordinate the transfer of samples and data, and control the flow of automated sample analysis. The sandbox dispatching management module controls the startup and shutdown of each sandbox, exchanges messages and transfers files with each sandbox, and controls sample execution and host environment simulation. In short, it is an important auxiliary module that helps the sandbox monitoring module complete its functions automatically.

3. Sandbox monitoring module: the main goal of the sandbox monitoring module is to capture the API calls initiated by a specific process together with their parameters, while also extracting the modules loaded by that process and the related kernel data that the operating system maintains for it.

The invention uses the open-source emulator Qemu as the underlying virtual machine software and modifies the core code of the instruction interpretation and execution part of its CPU emulation in order to monitor the host behavior of a specific process. This behavior monitoring technique, based on an instruction set emulation environment, starts at the instruction level and reconstructs kernel-level concepts such as system calls and processes from the bottom up to capture behavior during the dynamic execution of a malicious program. It also isolates the host machine from the sandbox environment in which the malicious program runs, which largely prevents the malicious program from affecting the host machine during execution.

Traditional API interception methods have several shortcomings: modifying the code of the monitored program can make the malicious program run unstably; the malicious program can easily detect the presence of the analysis tool and evade monitoring; and collecting operating system kernel data requires a driver, which is technically difficult. To overcome these shortcomings, the sandbox monitoring module aims to monitor the execution of the test program silently and to collect as much usable information as possible without modifying the target program. The monitored program runs in the guest operating system, while behavior monitoring is implemented in the Qemu monitor unit, which has a higher privilege level than the guest operating system. Because behavior monitoring is implemented at a higher privilege level, the test program can hardly evade analysis, and its code need not be modified.

4. Behavior abstraction module: after the sandbox monitoring module has executed the malicious program and captured its API calls, a report of the API functions used by the sample during execution and their parameters is available. Using this API report directly for malicious program classification poses some obstacles, so the behavior exhibited by the sample must be abstracted from the API sequence. This process of abstracting a sample's API sequence into sample behavior is called "behavior abstraction".

Sandbox analysis of a malicious program sample yields its API call sequence. Although this call sequence contains much information about the behavior of the malicious program, its level of abstraction is too low both for processing by the subsequent classification algorithm and for generating reports that people can readily understand. Rules therefore need to be defined that abstract the API call sequence into a data form that algorithms can process easily, and further into a representation that people can readily understand.

5. Detection and classification module: malicious program detection is a standard multi-class classification task. To judge whether a file submitted by the user for analysis is a malicious program and, if so, which kind of malicious program it is, a classification model must be built.

The system builds its classification model using ensemble learning. The idea of ensemble learning is either to use different strategies to divide a large problem into several smaller problems that are solved separately, or to generate multiple learners that solve the same problem, and then to combine the outputs of the different sub-classifiers through an integration strategy into a single final output. Generating multiple classifiers and letting them vote can effectively improve the accuracy of a classification problem, and is the core of the algorithm design in this system.

An ensemble learning algorithm has two key stages: sub-classifier generation and classifier integration. The system selects AdaBoost, the classic boosting algorithm, as the integration framework, and the decision tree algorithm C4.5 as the sub-classifier generation algorithm.
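The boosting loop can be illustrated with a minimal sketch. It is not the patented implementation: it handles only binary labels (+1 and -1), and it swaps the C4.5 trees for one-level decision stumps to stay short. The function names and the feature layout are assumptions made for illustration.

```python
import math

def train_stump(X, y, w):
    """Pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                pred = [pol if row[f] <= thr else -pol for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(stump, row):
    _, f, thr, pol = stump
    return pol if row[f] <= thr else -pol

def adaboost(X, y, rounds=5):
    """Binary AdaBoost over decision stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # avoid division by zero
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all sub-classifiers."""
    score = sum(a * stump_predict(s, row) for a, s in ensemble)
    return 1 if score >= 0 else -1
```

Each row of `X` would be a numeric feature vector derived from the abstracted behavior data, for example counts of particular behaviors. A multi-class version with C4.5 sub-classifiers, as the text describes, would need a multi-class boosting variant and a full tree learner.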

Compared with the prior art, the invention has the following advantages:

First, the invention is based on behavior monitoring in an instruction set emulation environment. Starting at the instruction level, it reconstructs kernel-level concepts such as system calls and processes from the bottom up to capture behavior during the dynamic execution of a malicious program. Because behavior monitoring is implemented at a higher privilege level, the test program can hardly evade analysis and its code need not be modified. This overcomes shortcomings of traditional API interception methods: unstable execution of the malicious program caused by modifying the monitored program's code, easy detection of the analysis tool followed by evasion of monitoring, and the need for a driver, with the technical difficulty that entails, when collecting operating system kernel data.

Second, the invention builds a virtual Internet inside the sandbox by means such as environment configuration and modification of server programs, and simulates common network services, so that operations initiated by a malicious program, such as DNS resolution, http access, file download, Email login and mail sending, can execute successfully. This induces the malicious program to produce malicious network behavior while ensuring that this behavior cannot damage the host machine or the real network, and overcomes the shortcoming that the network behavior of malicious programs cannot be fully exhibited during dynamic behavior analysis.

Third, the invention simulates a variety of host events and host environments in the sandbox by means such as environment configuration and programs, so that a malicious program that is sensitive to events such as USB flash drive insertion, CD insertion or connection of a network shared folder, or to environmental features such as a microphone or camera, can still exhibit its follow-up behavior. This induces the malicious program to produce more behavior and overcomes the shortcoming that, in dynamic behavior analysis, a malicious program sensitive to host events or the host environment cannot fully exhibit its behavior.

Fourth, the invention implements an abstraction algorithm for malicious program behavior. By analyzing and processing the raw behavior data obtained from the sandbox, it produces abstract behavior data that is uniformly formatted and has little redundancy. The behavior abstraction algorithm quickly yields data usable by the subsequent behavior detection and classification algorithms and gives them a good data basis, overcoming shortcomings of traditional behavior abstraction algorithms such as slow abstraction, complex representation and poor generality.

Fifth, the invention uses the advanced ensemble learning algorithm AdaBoost in the detection and classification of malicious program behavior. Regarded as one of the top ten algorithms of modern machine learning, AdaBoost combines several weak classifiers through an adaptive linear combination to obtain a stronger classifier, while implicitly optimizing the classification boundary and helping to avoid the negative effects of overfitting. Using AdaBoost in the detection and classification of malicious program behavior can effectively improve classification accuracy, especially generalization accuracy on new samples. This overcomes shortcomings of traditional detection and classification of malicious program behavior, such as unsatisfactory classification results and poor generalization caused by a tendency to overfit.

Description of drawings

Fig. 1 is a flow chart of the system and method of the invention for automated analysis, detection and classification of malicious program behavior;

Fig. 2 is an architecture diagram of the sandbox monitoring module of the system and method of the invention;

Fig. 3 is a flow chart of the interaction between the sandbox monitoring module and the sandbox dispatching management module of the system and method of the invention;

Fig. 4 is a working diagram of the Qemu monitor unit of the system and method of the invention;

Fig. 5 is a schematic diagram of behavior abstraction in the system and method of the invention;

Fig. 6 is a flow chart of behavior abstraction in the system and method of the invention.

Embodiment

The invention is described in detail below with reference to specific embodiments.

With reference to Fig. 1, in step 1 the static analysis module first performs static analysis on the structure of the executable sample file and obtains the sample's compiler version, build time, multi-language information, PE section information, PE import table, whether the PE file is packed, the packer type, and so on. The information about the malicious program obtained by the static analysis module, combined with the dynamic analysis information obtained by the sandbox monitoring module, provides richer data for the final ensemble classification algorithm.

In step 2, after static analysis is finished, the sample file enters the automated dynamic analysis process, which is managed automatically by the sandbox dispatching management module. The sandbox dispatching management module starts the sandbox, uploads the sample file to the Guest OS unit, and runs the sample there. The Qemu monitor unit in the sandbox monitoring module monitors the execution or loading of the sample and produces the API sequence report of the sample file. The network packet monitoring function of the Guest OS unit in the sandbox monitoring module monitors the network packets produced in the Guest OS unit and produces the network packet report of the sample file. After the sample terminates normally or times out, if it is a non-EXE sample file, snapshots of the registry and the file system are compared to produce registry and file snapshot comparison reports. These reports, together with the files generated during sample execution, are transferred to the sandbox dispatching management module; the reports are the raw data for behavior abstraction of a malicious program sample.

Fig. 2 is the architecture diagram of the sandbox monitoring module. It shows the architecture of the sandbox monitoring module in detail, as well as how the sandbox monitoring module interacts with the host machine through the sandbox dispatching management module. The sandbox monitoring module comprises a Guest OS unit, which serves as the virtual execution environment for malicious programs, and a Qemu monitor unit, which is a modified version of the full-system emulator Qemu. The Guest OS unit provides functions such as network packet monitoring, snapshot comparison and host event simulation. The Qemu monitor unit provides functions such as process identification, multi-process monitoring, API monitoring, API dependency analysis and redundant data filtering. Sections 2a-2b below describe the functions and workflow of each unit in detail.

2a) The Guest OS (guest operating system) unit is the environment in which the malicious program sample runs; Windows XP is selected as the Guest OS. The Guest OS unit is connected to the host machine through a virtual network, and the sandbox dispatching management module is responsible for the interaction between them.

Fig. 3 shows in detail the interaction between the Guest OS unit and the sandbox dispatching management module, and the workflow by which the Guest OS unit runs a malicious program sample. The workflow is as follows.

i) The sandbox dispatching management module starts the sandbox and uploads the sample file from the host machine to the Guest OS unit through the virtual network.

ii) The Guest OS starts host-based network packet monitoring and begins executing the sample file.

iii) After the sample file terminates normally or times out, the Qemu monitor unit sends a message to the sandbox dispatching management module indicating that sample analysis has finished.

iv) If the sample file is not an executable file, the registry and file system snapshots are compared and a snapshot comparison report is generated; otherwise the next step is carried out directly.

v) If the sample file released other files during execution, these files are passed to the host machine through the virtual network; otherwise the next step is carried out directly.

vi) The Guest OS unit passes the network packet monitoring report to the host machine.

vii) The sandbox dispatching management module shuts down the sandbox.
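The snapshot comparison in step iv can be sketched as a dictionary diff. This is a minimal illustration that assumes each snapshot has already been flattened into a mapping from a path (a file path or a registry key) to a content hash or value; the patent does not specify the snapshot format, and the function name is hypothetical.

```python
def diff_snapshots(before, after):
    """Compare two {path: value} snapshots taken before and after the sample ran."""
    added = sorted(k for k in after if k not in before)
    removed = sorted(k for k in before if k not in after)
    modified = sorted(k for k in after
                      if k in before and after[k] != before[k])
    return {"added": added, "removed": removed, "modified": modified}
```

The three lists correspond to entries the sample created, deleted and changed, which is the kind of information a snapshot comparison report would carry.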

2b) The Qemu monitor unit has a higher privilege level than the Guest OS unit and is used to monitor the behavior of the target program. It uses the open-source emulator Qemu as the underlying virtual machine software, with the core code of the instruction interpretation and execution part of its CPU emulation modified in order to monitor the host behavior of a specific process.

Fig. 4 shows the working process of the Qemu monitor unit, which is described in detail below.

i) The Qemu monitor unit identifies whether the currently running process is the target process. If it is not, execution is let through directly; otherwise the next step is carried out.

ii) If execution is at API entry code, the return address is saved to the return-address stack and the front-end callback function is called. The front-end callback reads the values of in parameters and the input values of in_out parameters when the API entry code executes.

iii) If execution is not at API entry code, the current execution address is compared with the top element of the return-address stack. If they are not equal, the API currently being called is a nested API call; such nested calls reflect the internal implementation of the operating system rather than the real behavior of the program, so execution is let through without monitoring. Otherwise, if they are equal, the back-end callback function is called. It reads the return value and the values of out parameters when the call returns, and the return-address stack is then updated accordingly.

The Qemu monitor unit uses the open-source emulator Qemu as the underlying virtual machine software, with the core code of the instruction interpretation and execution part of the CPU emulation modified. Modifying Qemu involves several technical difficulties; sections 2c-2g below describe each difficulty and the solution adopted by the invention.

2c) Process identification in the Qemu monitor unit: an unmodified virtual machine only emulates the computer hardware strictly, emulating how the CPU executes each instruction, and has no notion of the operating-system-level concept of a "process". To monitor, from the Qemu monitor upward, a target process running in the guest operating system, all processes currently running in the guest operating system must first be reconstructed in the Qemu monitor, and behavioral data is captured only when the target process is scheduled for execution.

The system performs process identification in the Qemu monitor unit as follows. Before each translation block starts executing, the sandbox monitoring module uses the virtual memory read/write functions and, with the kernel data structure KPCR (Kernel Processor Control Region) as the clue, finds the start address of the EPROCESS structure of the process currently executing in the system. It then uses the process name stored in the EPROCESS (executive process block) structure to judge whether the currently executing process is the target process. If it is, the page directory base value that the operating system assigned to this process is read from the structure. This value is then compared with the value stored in the virtual CR3 register to judge whether the monitored process is executing. Behavioral data is collected only while the target process is executing.
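The lookup chain from KPCR to EPROCESS to the page directory base can be sketched against a toy model of guest memory. This is an illustration only: the `GuestMemory` class, the field offsets and the single-hop path from KPCR to EPROCESS are all hypothetical simplifications and do not reflect the real Windows XP structure layouts.

```python
class GuestMemory:
    """Toy stand-in for the virtual memory read function (hypothetical)."""
    def __init__(self, cells):
        self.cells = dict(cells)      # {guest address: stored value}

    def read(self, addr):
        return self.cells[addr]

# Hypothetical offsets chosen for the sketch, not real kernel offsets.
OFF_CURRENT_EPROCESS = 0x10   # KPCR field leading to the current EPROCESS
OFF_IMAGE_NAME = 0x20         # EPROCESS field holding the process name
OFF_DIR_BASE = 0x30           # EPROCESS field holding the page directory base

def is_target_running(mem, kpcr_addr, cr3, target_name):
    """Return True only if the current process is the target and owns CR3."""
    eprocess = mem.read(kpcr_addr + OFF_CURRENT_EPROCESS)
    if mem.read(eprocess + OFF_IMAGE_NAME) != target_name:
        return False
    dir_base = mem.read(eprocess + OFF_DIR_BASE)
    return dir_base == cr3
```

A check of this kind would run before each translation block, and data collection would proceed only when it returns `True`.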

2d) API call analysis framework and parameter-reading callback functions: Qemu emulates instructions in units of basic blocks, and each code block ends with a jump instruction. The code at the entry point of any API is therefore located at the beginning of a translation block. By comparing API entry addresses at the beginning of each translation block, the Qemu monitor unit can silently monitor the API calls initiated by a specific process. Each monitored API has a corresponding callback function that is responsible for reading from virtual memory the call parameters passed to that API. API monitoring is implemented by modifying Qemu's instruction translation routine and inserting a callback invocation framework into it; before the API entry code executes, the framework decides whether to invoke the corresponding callback function to obtain the parameter information.

Obtaining API call parameters is the core of behavioral data collection; the name of the called API alone is not enough to analyze malicious program behavior. When execution reaches the API entry code, the virtual memory read function can read the return address from the address held in the virtual ESP register, the first parameter from the address ESP+4, the second parameter from the address ESP+8, and so on. In an API call, a parameter larger than 32 bits (such as a string or a structure) is passed as a pointer to the parameter instead of its actual value. For such parameters, a single virtual memory read yields only the parameter's memory address, which is of no use for analysis; virtual memory must be read repeatedly until the actual value of the parameter is obtained.
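The stack layout described here can be sketched with a byte buffer standing in for guest virtual memory. This is a minimal illustration under stated assumptions: 32-bit little-endian values, NUL-terminated ASCII strings (wide-character APIs would need UTF-16 decoding instead), and hypothetical helper names.

```python
import struct

def read_u32(mem, addr):
    """Read one 32-bit little-endian value from the simulated guest memory."""
    return struct.unpack_from("<I", mem, addr)[0]

def read_cstring(mem, addr, limit=260):
    """Follow a pointer parameter and read the NUL-terminated string it refers to."""
    end = mem.index(b"\x00", addr, addr + limit)
    return mem[addr:end].decode("ascii")

def read_call(mem, esp, nargs):
    """At API entry: [ESP] is the return address, [ESP+4] the first parameter, and so on."""
    ret = read_u32(mem, esp)
    args = [read_u32(mem, esp + 4 * (i + 1)) for i in range(nargs)]
    return ret, args
```

For a pointer parameter, `read_call` returns only the address; a second read such as `read_cstring(mem, args[0])` is what recovers the value that matters for analysis.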

API monitoring is carried out by a two-part callback function. The front-end callback reads the values of in parameters (input parameters) and the input values of in_out parameters (input/output parameters) when the API entry code executes; the back-end callback reads the return value and the values of out parameters (output parameters) when the call returns. The front-end and back-end callbacks communicate through a shared buffer and work together.

Another important problem is that the API entry address comparison method inevitably also captures nested API calls. The operating system may call other APIs when implementing a given API; for example, CopyFile indirectly calls CreateFile and WriteFile to do its work. These nested API calls reflect the internal implementation of the operating system rather than the real behavior of the program. To filter them out, the callback invocation framework maintains a return-address stack, and a record is output only when the stack depth is 1.
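The return-address stack can be sketched as follows. This is a simplified model rather than Qemu code: the `ApiMonitor` class is hypothetical, the return address is passed in directly instead of being read from the guest stack, and the callbacks are reduced to appending the API name to a list.

```python
class ApiMonitor:
    """Records only top-level API calls, filtering nested ones (sketch)."""

    def __init__(self, api_entries):
        self.api_entries = api_entries   # {entry address: API name}
        self.stack = []                  # pending (return address, API name)
        self.records = []                # names of top-level API calls

    def on_block(self, pc, return_addr=None):
        """Called at the start of every translation block of the target process."""
        if pc in self.api_entries:
            # API entry: remember where this call will return to.
            self.stack.append((return_addr, self.api_entries[pc]))
        elif self.stack and pc == self.stack[-1][0]:
            # Execution reached the pending return address: the call is finished.
            _, name = self.stack.pop()
            if not self.stack:           # stack depth was 1, so not nested
                self.records.append(name)
```

If `CopyFile` internally enters `CreateFile` and `WriteFile`, those two calls return while `CopyFile` is still on the stack, so only `CopyFile` is recorded.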

2e) Page fault handling: the guest's virtual memory is emulated in the heap space of the Qemu process, and all the information required for data collection resides there. Locating the guest's data within the local host and reading it directly, bypassing Qemu's virtual memory emulation routines, is the core technique by which the Qemu monitor unit silently collects the behavioral data of the target process. However, Windows virtual memory management follows a "lazy" strategy: if the data to be read is not in virtual memory but on the virtual hard disk, forcing a read of virtual memory causes a page fault, and the analyzed process running in the guest operating system terminates abnormally. This disrupts the normal execution of the monitored process and is unacceptable.

To keep page faults triggered while extracting behavioral data from virtual memory from disrupting the normal execution of the analyzed program, the sandbox monitoring module uses a "three-step method".

The procedure is as follows. First, before reading, test whether a page fault would occur. If so, wait for the page to be brought into virtual memory. If waiting does not succeed, the analysis program forcibly reads the data in this address space, which triggers the guest operating system's page fault handling routine and brings the missing page into virtual memory; the read is then attempted again.
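The three steps can be sketched against a toy paging model. Everything here is an assumption made for illustration: the `PagedMemory` class, the 4096-byte page size, and the two callbacks that stand in for waiting on the guest and for forcing the guest's page fault handler to run.

```python
class PagedMemory:
    """Toy guest memory in which only some pages are resident (hypothetical)."""
    PAGE = 4096

    def __init__(self, data, resident):
        self.data = data                 # {address: value}
        self.resident = set(resident)    # page numbers currently in memory

    def is_resident(self, addr):
        return addr // self.PAGE in self.resident

    def read(self, addr):
        if not self.is_resident(addr):
            raise LookupError("page not resident")
        return self.data[addr]

def careful_read(mem, addr, wait_for_page, force_read):
    """Read addr without faulting, following the three-step method."""
    # Step 1: test for a missing page before reading.
    if not mem.is_resident(addr):
        # Step 2: wait for the page to be brought in.
        if not wait_for_page(mem, addr):
            # Step 3: force a read so that the guest's own page fault
            # handler loads the page.
            force_read(mem, addr)
    # Finally, attempt the read again.
    return mem.read(addr)
```

In the sketch, `wait_for_page` returns whether the page arrived and `force_read` is expected to leave the page resident; both behaviors would be provided by the modified emulator in a real system.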

To improve execution efficiency, the sandbox monitoring module does not apply the "three-step method" to every virtual memory access, but applies the page fault avoidance strategy above only where a page fault is most likely. On Windows, data read directly from the stack (32-bit parameters) does not cause page faults, and string and structure parameters are generally small and do not cause page faults either, so neither needs a page fault test. Page faults are likely only when I/O operations or reads and writes of large buffers are involved.

2f) API dependency analysis and redundant data filtering: if the return value or an out parameter of one API call is an in parameter of another, a call dependency is said to exist between the two APIs. Even after the API calls initiated by a specific process have been intercepted successfully, analysis of the API call sequence still faces three difficulties. First, to resist API frequency statistics and API time-series analysis, the authors of some malicious programs insert redundant API calls and reorder API calls, so that the temporal call sequence hardly characterizes the behavior of the malicious program. Second, dynamic analysis collects the behavioral data of the monitored program at run time; loops and searches in the program cause some APIs to be called repeatedly, which places a heavy data burden on subsequent behavior analysis. Third, Windows APIs are ambiguous: CreateFile, for example, can open a file, create a file, open a named pipe or create a named pipe, so an API call is not truly equivalent to a program behavior.

Although API call frequency and call order can change, the call dependencies between APIs are relatively stable, and dependencies exist between APIs that operate repeatedly on the same object. On this basis, the API dependency analysis and redundant data filtering function in the Qemu monitor unit handles the following three cases to extract the characteristic behavior of a malicious program. First, on the Windows platform a handle represents a system resource, so repeated operations on the same resource are merged into one, using the handle as the key. Second, process injection events are monitored. Third, the ambiguity of the Create family of APIs is eliminated through dependency analysis.
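Handle-based merging, the first of the three cases, can be sketched as follows. The record format (an API name paired with a handle value) and the function name are assumptions for illustration, and the sketch ignores the fact that handle values can be reused after a handle is closed.

```python
def merge_by_handle(calls):
    """Collapse repeated calls of the same API on the same handle into one record."""
    seen = {}       # (api, handle) -> merged record
    merged = []     # merged records in order of first appearance
    for api, handle in calls:
        key = (api, handle)
        if key in seen:
            seen[key]["count"] += 1
        else:
            seen[key] = {"api": api, "handle": handle, "count": 1}
            merged.append(seen[key])
    return merged
```

A loop that writes to the same file a thousand times then contributes a single record with a count, which keeps the data volume for later behavior analysis manageable.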

2g). Multi-process monitoring: the multi-process monitoring function in the Qemu monitor unit monitors the child processes created by the main process and the behavior of processes into which code has been injected. The difficulty of multi-process monitoring lies in automatically identifying and adding new target processes to be monitored while the main process is running. Because the sandbox monitoring module is built around capturing API calls, multi-process monitoring is likewise implemented from the API perspective.

The first step of multi-process monitoring is to obtain the name of the process to be monitored; when the operating system initializes the process, the process name is used as the clue to find the page directory base address that the operating system assigns to the process. The creation of child processes can be monitored through the core API NtCreateProcess: the front-end callback function for NtCreateProcess is modified to extract the name of the created process from the call parameters, find that process's page directory base address using the run-time memory analysis method introduced above, and pass it to the API call management framework, thereby extending monitoring to the child process. In the second step, the process identification function in the Qemu monitor unit maintains a list of sensitive page directory base values; before each translation block is executed, the list is compared with the value stored in the virtual CR3 register, and when the virtual CR3 register switches to any sensitive page directory base value, the API monitoring function in the Qemu monitor unit starts working.
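The list of sensitive page directory base values can be sketched as follows. This is a simplified model, not Qemu code: `ProcessMonitor` and its methods are hypothetical, and `lookup_base` stands in for the run-time memory analysis that maps a process name to its page directory base address.

```python
class ProcessMonitor:
    """Toy model of CR3-based process filtering (hypothetical names throughout)."""

    def __init__(self):
        self.sensitive_bases = set()  # page directory bases of monitored processes
        self.log = []                 # (cr3, api_name) records that were captured

    def watch(self, page_dir_base):
        self.sensitive_bases.add(page_dir_base)

    def is_monitored(self, virtual_cr3):
        # Checked before each translation block in the scheme described above.
        return virtual_cr3 in self.sensitive_bases

    def on_api_call(self, virtual_cr3, api_name):
        if self.is_monitored(virtual_cr3):
            self.log.append((virtual_cr3, api_name))

    def on_create_process(self, virtual_cr3, child_name, lookup_base):
        # Callback for the process-creation API: a child created by a
        # monitored process becomes a monitoring target itself.
        if self.is_monitored(virtual_cr3):
            self.watch(lookup_base(child_name))
```

Only calls made while the virtual CR3 value is in the sensitive set are logged, so unrelated processes running in the guest add no data.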

Monitoring process injection is divided into two steps: identifying the injection behavior and extracting the name of the injected process. Both involve analyzing dependencies between multiple APIs at run time. An implementation of process injection usually begins with process enumeration. Because every enumerated process is a potential injection target, the process identification function maintains a global set of process injection event templates: when it detects a call to an API used for process enumeration, such as EnumProcess, Process32First or Process32Next, it fills in one process injection event template for each process found, recording information such as the process name, process ID and process handle. The core APIs used to implement process injection include OpenProcess, VirtualAllocEx and WriteProcessMemory. The front-end callback functions for these APIs are modified so that, when one of them is called, the corresponding process injection event template is looked up by the call parameters and updated, until a successful call to WriteProcessMemory signals that a process injection event has occurred. At that point the name of the injected process is read from the template, the page directory base address of that process is found and passed to the process identification function, and the injected process is thereby added as a monitoring target. The API call management framework can subsequently analyze the behavior initiated by the injected process automatically.
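The template bookkeeping described above can be sketched as a small state tracker. The class and field names are hypothetical, and the callbacks receive already-decoded arguments rather than raw guest memory; the sketch only shows how the template is indexed first by process ID and then by handle.

```python
from dataclasses import dataclass


@dataclass
class InjectionTemplate:
    """One process injection event template (hypothetical layout)."""
    process_name: str
    process_id: int
    handle: object = None
    memory_allocated: bool = False
    injected: bool = False


class InjectionTracker:
    def __init__(self):
        self.by_pid = {}     # filled during process enumeration
        self.by_handle = {}  # filled when OpenProcess returns a handle

    def on_enumerated(self, name, pid):
        self.by_pid.setdefault(pid, InjectionTemplate(name, pid))

    def on_open_process(self, pid, returned_handle):
        tpl = self.by_pid.get(pid)
        if tpl is not None:
            tpl.handle = returned_handle
            self.by_handle[returned_handle] = tpl

    def on_virtual_alloc_ex(self, handle):
        tpl = self.by_handle.get(handle)
        if tpl is not None:
            tpl.memory_allocated = True

    def on_write_process_memory(self, handle, succeeded):
        """Return the injected process name once the write succeeds, else None."""
        tpl = self.by_handle.get(handle)
        if tpl is None or not succeeded:
            return None
        tpl.injected = True
        return tpl.process_name
```

The name returned by `on_write_process_memory` is what would then be resolved to a page directory base address and added to the monitoring targets.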

Step 3: after dynamic analysis of the malicious program sample finishes, a series of reports is obtained; these reports are processed by the behavior abstraction module to obtain the sample's behavior.

With reference to the behavior abstraction schematic in accompanying drawing 5, the main steps of the behavior abstraction module are raw data cleaning, behavior abstraction, and behavior storage. With reference to the behavior abstraction flowchart in accompanying drawing 6, each step of the behavior abstraction module is refined into several detailed rules, which are described below.

3a). Raw data cleaning: the original API sequence contains some invalid and redundant API function call records. To prevent these records from affecting the subsequent behavior abstraction step, this step cleans the original API time-sequence record file.

The API function calls to be cleaned fall into the following categories.

<i>. N consecutive API calls with identical API names and call parameters: only the first is kept, and the following N-1 calls are removed.

Calling the same API function N consecutive times with the same parameters does not reveal any additional behavior; on the contrary, it imposes extra computational burden on the subsequent behavior abstraction process.

<ii>. Invalid handle parameters.

The raw data cleaning logic maintains a global handle information table. Any handle passed into a valid function should be a handle parameter or return value passed out by an earlier function. If a function is found to use, as an input parameter, a handle that does not appear in the global handle information table, the function call can be considered invalid.

<iii>. Use of invalid handle values.

Some handle values denote invalid handles; using such handles is meaningless, so the function call is considered invalid.
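The three cleaning rules can be sketched in a single pass over the records. The record layout (dictionaries with `name`, `args`, `handle_in`, `handle_out`) is an assumption, as is the set of invalid handle values, which here contains only NULL and the 32-bit `INVALID_HANDLE_VALUE`; the source does not list which values it treats as invalid.

```python
# Assumed invalid handle values: NULL and 32-bit INVALID_HANDLE_VALUE.
INVALID_HANDLE_VALUES = {0, 0xFFFFFFFF}


def clean_records(records):
    """Apply cleaning rules <i>, <ii> and <iii> to a list of API call records."""
    known_handles = set()  # plays the role of the global handle information table
    cleaned = []
    previous = None
    for rec in records:
        h_in = rec.get("handle_in")
        key = (rec["name"], rec["args"], h_in)
        if key == previous:
            continue  # rule <i>: consecutive identical call
        previous = key
        if h_in is not None:
            if h_in in INVALID_HANDLE_VALUES:
                continue  # rule <iii>: invalid handle value
            if h_in not in known_handles:
                continue  # rule <ii>: handle never produced by an earlier call
        h_out = rec.get("handle_out")
        if h_out is not None and h_out not in INVALID_HANDLE_VALUES:
            known_handles.add(h_out)
        cleaned.append(rec)
    return cleaned


records = [
    {"name": "CreateFile", "args": ("a.txt",), "handle_out": 0x10},
    {"name": "ReadFile", "args": (16,), "handle_in": 0x10},
    {"name": "ReadFile", "args": (16,), "handle_in": 0x10},   # duplicate
    {"name": "ReadFile", "args": (16,), "handle_in": 0x99},   # unknown handle
    {"name": "CloseHandle", "args": (), "handle_in": 0xFFFFFFFF},  # invalid value
    {"name": "CloseHandle", "args": (), "handle_in": 0x10},
]
```

With this input only the `CreateFile`, the first `ReadFile` and the last `CloseHandle` survive cleaning.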

3b). Behavior abstraction: this step is the core of the whole behavior abstraction flow. The predefined behavior abstraction rules are first read from the database; the cleaned API sequence record file is then parsed according to these rules to obtain the behavior information of the sample.

Because the captured API call records are stored as text, behavior abstraction is a process of reading and parsing a text file. After the API sequence record file is opened, every API call record in the file is analyzed one by one. For each captured API function, one of the following situations may occur:

<i>. The function is irrelevant to behavior abstraction.

That is, the function is not a key function and does not perform any operation on the core parts of the system; examples are Sleep and GetSystemTime. Functions of this kind can be skipped directly.

<ii>. The function can form an auxiliary behavior.

If the function is a key function and can form an auxiliary behavior, its parameters must be obtained and processed, for example by string conversion and concatenation, and the resulting auxiliary behavior is then stored temporarily in the database.

<iii>. The function can form an abstract behavior.

If the function is a key function and can form an abstract behavior, its parameters must be obtained and processed, for example by string conversion and concatenation, and the resulting abstract behavior is then stored in the database. After the whole file has been analyzed, these abstract behaviors are expanded into a decision vector according to predetermined expansion rules.
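The three-way handling of captured functions can be sketched as a simple dispatch. The rule tables below are invented for illustration (the patent reads its rules from a database and does not list them), and the expansion into a decision vector is assumed here to be a presence flag per behavior type, which is only one possible expansion rule.

```python
# Hypothetical rule tables; the real rules are stored in a database.
AUXILIARY_FUNCTIONS = {"lstrcat"}
ABSTRACT_RULES = {"CreateFile": "file_create", "RegSetValue": "registry_write"}
BEHAVIOR_ORDER = ["file_create", "registry_write", "process_inject"]


def abstract_behaviors(records, rules=ABSTRACT_RULES, auxiliary=AUXILIARY_FUNCTIONS):
    """Split (name, args) records into auxiliary and abstract behaviors."""
    aux_store, behaviors = [], []
    for name, args in records:
        if name in rules:
            behaviors.append((rules[name], args))  # case <iii>
        elif name in auxiliary:
            aux_store.append((name, args))         # case <ii>
        # case <i>: not a key function, skipped
    return aux_store, behaviors


def to_decision_vector(behaviors, order=BEHAVIOR_ORDER):
    """Assumed expansion rule: 1 if the behavior type occurred, else 0."""
    present = {behavior for behavior, _ in behaviors}
    return [1 if behavior in present else 0 for behavior in order]


records = [
    ("Sleep", (1000,)),
    ("lstrcat", ("C:\\", "a.txt")),
    ("CreateFile", ("C:\\a.txt",)),
    ("GetSystemTime", ()),
]
```

A fixed-length vector of this kind is what lets samples with API traces of very different lengths be fed to the same classifier.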

3c). Behavior storage: to facilitate processing by the subsequent classification algorithm, the data produced during behavior abstraction, including the abstract behaviors and decision vectors, are stored in the database. In addition, during sample analysis the behavior abstraction rules may be adjusted to some extent according to the actual samples, so as to fit the characteristics of specific sample classes.

Step 4: after behavior abstraction, the behavior information of the sample is obtained and stored in the database; as the number of training samples grows, the database accumulates a large amount of sample behavior information. To judge whether a file submitted by a user for analysis is a malicious program, and to which class of malicious program it belongs, a classification model must be built. The system uses ensemble learning to build the classification model from the behavior information of the training samples: multiple sub-classifiers are trained and vote on the classification of the same sample, which improves classification accuracy in the multi-class case.

An ensemble learning algorithm has two key stages: sub-classifier generation and classifier integration. In terms of the data the algorithm must handle, the data collected by sandbox behavior monitoring and static analysis include API sequences, static file features, network packets and so on. These data come from different sources and are discrete, categorical data. In terms of the detection task, the classification algorithm must handle multi-class problems rather than simple binary classification. Taking these requirements together, the system selects the classical decision tree algorithm C4.5 as the sub-classifier algorithm.

The system uses the decision tree algorithm as the sub-classifier algorithm and the AdaBoost algorithm as the ensemble algorithm, to strengthen the malicious program detection and classification results.

Step 5: output the malicious program behavior report and the detection and classification results.
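The boosting idea can be illustrated with a deliberately reduced sketch. It departs from the system described above in two labelled ways: the sub-classifiers are one-level decision stumps rather than C4.5 trees, and the task is binary (labels +1 and -1) rather than multi-class. All function names are hypothetical; the inputs are 0/1 decision vectors.

```python
import math


def train_stump(X, y, w):
    """Pick the (feature, polarity) stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for polarity in (1, -1):
            preds = [polarity if row[f] else -polarity for row in X]
            err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, f, polarity)
    return best


def stump_predict(stump, row):
    _, f, polarity = stump
    return polarity if row[f] else -polarity


def adaboost(X, y, rounds=3):
    """Train a weighted ensemble of stumps with binary AdaBoost."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        if err >= 0.5:
            break  # no stump is better than chance on the current weights
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all sub-classifiers."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy data: a sample is "malicious" (+1) when at least two of three behaviors occur.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(row) >= 2 else -1 for row in X]
```

On this toy data no single stump separates the classes, but the weighted vote of three stumps does, which is the effect the ensemble step relies on.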

It should be understood that those of ordinary skill in the art can make improvements or modifications based on the above description, and all such improvements and modifications shall fall within the scope of protection of the appended claims of the present invention.

Claims

1. A system for automatically analyzing, detecting and classifying malicious program behavior, characterized in that it comprises the following modules:

(1). Static analysis module: before sandbox dynamic analysis is performed on a sample file, static analysis is performed on the structure of the executable file to obtain as much information about the sample as possible; a static analysis report of the sample file is produced from this information, and the various reports subsequently become the original data source of the behavior abstraction module;

(2). Sandbox scheduling management module: the sandbox scheduling management module manages each sandbox, coordinates the transfer of samples and data, and controls the automated sample analysis flow; it controls the startup and exit of each sandbox, exchanges messages and transfers files with each sandbox, and controls sample execution and the simulation of the host environment;

(3). Sandbox monitoring module: the main goal of the sandbox monitoring module is to capture the API calls initiated by a specific process, together with their parameters, while also extracting the modules loaded by that process and the related kernel data that the operating system maintains for it. The present invention uses the open-source emulator Qemu as the underlying virtual machine software and modifies part of the core code of the instruction interpretation and execution portion of its CPU emulation, in order to monitor the host behavior of a specific process. This behavior monitoring technique, based on an instruction set emulation environment, works bottom-up from the instruction level, reconstructing kernel-level information such as system calls and processes, to obtain the behavior of the malicious program during dynamic execution; it also isolates the sandbox environment in which the malicious program executes from the host machine, largely avoiding any impact of the executing malicious program on the host machine;

(4). Behavior abstraction module: after the sandbox monitoring module has finished executing the malicious program and capturing its API calls, a report of the API functions used by the sample program at run time, and their parameters, is obtained; however, using this API report directly for malicious program classification presents some obstacles, so the behaviors exhibited by the sample need to be abstracted from the API sequence;

(5). Detection and classification module: malicious program detection is a standard multi-class classification task. To judge whether a file submitted by a user for analysis is a malicious program and, if so, to further judge which class of malicious program it belongs to, a classification model must be built; the classification model is built using ensemble learning, in which different strategies are used to divide a large problem into several smaller problems that are solved separately, or multiple learners are generated to solve the same problem, and the outputs of the different sub-classifiers are then combined by an integration strategy into a single final output.

2. The system for automatically analyzing, detecting and classifying malicious program behavior according to claim 1, characterized in that the sandbox monitoring module comprises: a Guest OS unit serving as the virtual execution environment of the malicious program; and a Qemu monitor unit, which is a modified full-system emulator; the Guest OS unit includes functions such as network packet monitoring, snapshot comparison and host event simulation, and the Qemu monitor unit includes process identification and multi-process monitoring, API monitoring, and API dependency analysis and redundant data filtering functions.

3. The system for automatically analyzing, detecting and classifying malicious program behavior according to claim 2, characterized in that the Guest OS unit is the environment in which the malicious program sample runs, and Windows XP is selected as the Guest OS; the Guest OS unit and the host machine are connected by a virtual network, and the sandbox scheduling management module is responsible for the interaction between them.

4. The system for automatically analyzing, detecting and classifying malicious program behavior according to claim 2, characterized in that the Qemu monitor unit of the sandbox monitoring module has a higher privilege level than the Guest OS unit and is used to monitor the behavior of the target program; the Qemu monitor unit uses the open-source emulator Qemu as the underlying virtual machine software, but modifies part of the core code of the instruction interpretation and execution portion of its CPU emulation, in order to monitor the host behavior of a specific process.

5. The system for automatically analyzing, detecting and classifying malicious program behavior according to claim 4, characterized in that the Qemu monitor unit performs process identification as follows: before each translation block begins executing, the sandbox monitoring module uses the virtual memory read/write functions and, taking the kernel data structure KPCR as the clue, finds the start address of the EPROCESS structure of the process currently executing in the system; it then judges, from the process name stored in the EPROCESS structure, whether the currently executing process is the target process and, if so, reads from the structure the page directory base value assigned to the process by the operating system; afterwards, this value is compared with the value stored in the virtual CR3 register to judge whether the monitored process is executing; behavioral data are collected only while the target process is executing.

6. The system for automatically analyzing, detecting and classifying malicious program behavior according to any one of claims 1-5, characterized in that the sandbox monitoring module uses a "three-step method" to solve the problem of page faults when reading virtual memory; the detailed process is as follows: before reading, first test whether a page fault would occur; if so, wait for the page to be brought into virtual memory; if waiting is unsuccessful, have the analysis routine forcibly read the data in that address space, triggering the guest operating system's page fault handling routine so that the missing page is brought into virtual memory, and then attempt the read again;

To improve execution efficiency, the sandbox monitoring module does not apply the "three-step method" to every virtual memory read or write, but carries out the above page fault avoidance strategy only where a page fault is most likely; in Windows, data read directly from the stack does not cause page faults, and string and structure parameters generally carry little data and do not cause page faults either, so none of these needs a page fault test; a page fault is likely only when an I/O operation or a read or write of a large buffer is involved.

7. The system for automatically analyzing, detecting and classifying malicious program behavior according to claim 4, characterized in that the multi-process monitoring function in the Qemu monitor unit is used to monitor the child processes created by the main process and the behavior of processes into which code has been injected; the system performs multi-process monitoring as follows: in the first step, the name of the process to be monitored is obtained; when the operating system initializes the process, the process name is used as the clue to find the page directory base address that the operating system assigns to the process. The creation of child processes can be monitored through the core API NtCreateProcess: the front-end callback function for NtCreateProcess is modified to extract the name of the created process from the call parameters, find that process's page directory base address using the run-time memory analysis method introduced above, and pass it to the API call management framework, thereby extending monitoring to the child process. In the second step, the process identification function in the Qemu monitor unit maintains a list of sensitive page directory base values; before each translation block is executed, the list is compared with the value stored in the virtual CR3 register, and when the virtual CR3 register switches to any sensitive page directory base value, the API monitoring function in the Qemu monitor unit starts working.

8. The system for automatically analyzing, detecting and classifying malicious program behavior according to claim 7, characterized in that the system monitors process injection behavior as follows:

First step, identifying the process injection behavior: an implementation of process injection usually begins with process enumeration; because every enumerated process is a potential injection target, the process identification function in the Qemu monitor unit maintains a global set of process injection event templates; when it detects a call to an API used for process enumeration, such as EnumProcess, Process32First or Process32Next, it fills in one process injection event template for each process found, recording information such as the process name, process ID and process handle; the core APIs used to implement process injection include OpenProcess, VirtualAllocEx and WriteProcessMemory; the front-end callback functions for these APIs are modified so that, when one of them is called, the corresponding process injection event template is looked up by the call parameters and updated, until a successful call to WriteProcessMemory signals that a process injection event has occurred;

Second step, extracting the name of the injected process: the name of the injected process is read from the template, the page directory base address of that process is found and passed to the process identification function, and the injected process is thereby added as a monitoring target; the API call management framework can subsequently analyze the behavior initiated by the injected process automatically.

9. A method for automatically analyzing, detecting and classifying malicious program behavior, characterized in that the steps are as follows:

Step (1): the static analysis module first performs static analysis on the structure of the executable sample file to obtain the static information of the executable sample file;

Step (2): after static analysis is complete, the sample file enters the automated dynamic analysis flow: the dynamic analysis of the sample file is managed automatically by the sandbox scheduling management module, which starts a sandbox, uploads the sample file to the Guest OS unit, and runs the sample in the Guest OS unit; the sandbox monitoring module monitors the execution or loading of the sample and produces the API sequence report of the sample file; the network packet monitoring program monitors the network packets produced by the Guest OS unit and produces the network packet report of the sample file; after sample execution ends normally or by timeout, if the sample is a non-EXE file, a snapshot comparison of the registry and file system is performed, producing registry and file snapshot comparison reports; the reports and the files generated during sample execution are transferred to the sandbox scheduling management module, and the reports serve as the raw data for behavior abstraction of the malicious program sample;

Step (3): after dynamic analysis of the malicious program sample finishes, a series of reports is obtained, and the reports are processed by the behavior abstraction module to obtain the sample's behavior;

Step (4): after behavior abstraction, the behavior information of the sample is obtained and stored in the database; as the number of training samples grows, the database accumulates a large amount of sample behavior information; to judge whether a file submitted by a user for analysis is a malicious program, and to which class of malicious program it belongs, a classification model must be built; ensemble learning is used to build the classification model from the behavior information of the training samples, with multiple sub-classifiers trained to vote on the classification of the same sample, so as to improve classification accuracy in the multi-class case;

Step (5): output the malicious program behavior report and the detection and classification results.

10. The method for automatically analyzing, detecting and classifying malicious program behavior according to claim 2, characterized in that the main steps of behavior abstraction are:

(1) raw data cleaning;

(2) behavior is abstract;

(3) behavior storage;

The API function calls removed by raw data cleaning fall into the following categories:

(1) N consecutive API calls with identical API names and call parameters: only the first is kept, and the following N-1 calls are removed;

(2) if a function is found to use, as an input parameter, a handle that does not appear in the global handle information table, the function call can be considered invalid;

(3) some handle values denote invalid handles; using such handles is meaningless, so the function call is considered invalid;

Behavior abstraction is the process of analyzing all API call records in the file one by one; for each captured API call, one of the following situations may occur:

(1) the function is irrelevant to behavior abstraction:

That is, the function is not a key function and does not perform any operation on the core parts of the system; functions of this kind can be skipped directly;

(2) the function can form an auxiliary behavior:

If the function is a key function and can form an auxiliary behavior, its parameters must be obtained and processed, and the resulting auxiliary behavior is then stored temporarily in the database;

(3) the function can form an abstract behavior:

If the function is a key function and can form an abstract behavior, its parameters must be obtained and processed, and the resulting abstract behavior is then stored in the database; after the whole file has been analyzed, these abstract behaviors are expanded into a decision vector according to predetermined expansion rules.