CN102930210A - System and method for automatically analyzing, detecting and classifying malicious program behavior - Google Patents

System and method for automatically analyzing, detecting and classifying malicious program behavior

Info

Publication number
CN102930210A
Authority
CN
Grant status
Application
Patent type
Prior art keywords
process
behavior
sandbox
api
analysis
Prior art date
Application number
CN 201210408358
Other languages
Chinese (zh)
Other versions
CN102930210B (en )
Inventor
邹艳
刘建港
苗启广
曹莹
谢国胜
黄有成
刘家辰
郑春阳
Original Assignee
江苏金陵科技集团公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date


Abstract

The invention discloses a system and method for automatically analyzing, detecting and classifying malicious program behavior. The system comprises a static analysis module, a sandbox scheduling management module, a sandbox monitoring module, a behavior abstraction module, and a detection and classification module. Compared with the prior art, the system has two main advantages. First, it monitors behavior inside an instruction set emulation environment. Second, it builds a virtual Internet inside the sandbox through environment configuration, modified server programs and similar means, and simulates common network services, so that domain name server (DNS) resolution, HTTP access, file download, e-mail login and mail sending initiated by a malicious program all succeed. The malicious program is thereby lured into exhibiting its malicious network behavior while that behavior is kept from harming the host machine or the real network, which overcomes the problem that network behavior cannot be fully expressed during dynamic behavior analysis of malicious programs.

Description

Automated malware behavior analysis, detection and classification system and method

Technical Field

[0001] The present invention belongs to the field of system security and network security, and more specifically relates to a method for automated analysis of the dynamic behavior of malicious programs. The invention is used to build dynamic behavior rules for known malicious programs and to judge the dynamic behavior of unknown malicious programs with high accuracy.

Background

[0002] In the field of malware analysis, automated dynamic behavior analysis is used to obtain the behavioral characteristics of malicious programs more accurately, more comprehensively and more quickly.

[0003] The patent application "Automated analysis system and method for the dynamic behavior of malicious programs" by the University of Electronic Science and Technology of China (publication number CN101154258, filing date 2007.08.14) discloses a method for dynamic analysis of malicious programs. Its steps are: (1) an initialization component starts the target binary program; (2) the initialization component loads a virtual execution component and a behavior monitoring component; (3) a disassembly component obtains the assembly instructions of the target program's binary code stream; (4) the virtual execution component slices the stream into execution blocks; (5) the behavior control component determines whether a basic block contains a malicious behavior listed in the rule base; (6) if so, control passes to the behavior analysis component, which records the malicious behavior; (7) each instruction in the basic block is virtually executed; (8) after analysis stops, the behavior analysis component submits a malicious behavior analysis report.
Although this method provides an automated system for analyzing the dynamic behavior of malicious programs and can be used for coarse-grained classification of unknown malicious programs, it has several shortcomings. Because the system lacks static analysis of the malicious program, host event simulation, host environment simulation and simulation of common network environments, it captures dynamic behavior far from completely. It can analyze only binary executable files, not other formats such as service programs, DLL files or non-PE files, which severely limits its use. It also gives no method for abstracting behavior once behavioral features have been obtained, or for analyzing and classifying the malicious behavior of unknown binary programs. These shortcomings reduce the system's practicality, accuracy and classification efficiency.

Summary

[0004] To address the shortcomings of existing techniques for analyzing, detecting and classifying malicious programs, the present invention proposes an automated method that combines static analysis, dynamic analysis and network analysis, and combines behavior abstraction with ensemble learning. The goal is a practical system and method for automated malware analysis, detection and classification. It supports loading and running PE files and common non-PE files, and fully monitors the execution of a malicious program. During loading and execution it monitors host behavior such as process injection, registry operations, memory operations and file operations, as well as network behavior such as network redirection, DNS lookups, FTP connections, HTTP access, and e-mail login and sending. It reports malicious access to system resources of all kinds, including processes, memory, files, the registry, the host environment and the network, and it simulates host events such as insertion of a USB flash drive or an optical disc. In addition, the reports produced by automated analysis are used to abstract each malicious program's behavior in a systematic, rule-based way, forming a library of malicious behavior features. An ensemble learning method analyzes and quantifies these features and builds a classification model, which effectively improves classification accuracy on unknown sample files.

[0005] The invention achieves automated analysis and accurate classification of malicious programs through a systematic pipeline: sandbox monitoring built on virtualization, collection of static information and capture of behavioral features, and behavior-based detection and classification. This addresses shortcomings of the prior art, namely that signature extraction is difficult, that malicious programs using complex packing, polymorphism and metamorphism are hard to handle, that behavior capture is incomplete, and that behavior abstraction and detection and classification methods are poorly defined. It thereby improves the detection rate and classification accuracy for malicious programs.

[0006] The automated malware analysis, detection and classification system provided by the present invention comprises the following modules:

[0007] 1. Static analysis module: before a sample file undergoes dynamic analysis in the sandbox, the structure of the executable file (PE file) can be analyzed statically to obtain as much sample-related information as possible. This information forms the static analysis report of the sample file, which, together with the reports produced later, is the rawest data source for the behavior abstraction module.

[0008] 2. Sandbox scheduling management module: the invention includes multiple sandboxes, so an independent sandbox scheduling management module is needed to manage each sandbox, coordinate the transfer of samples and data, and control the automated analysis workflow. The module controls the startup and shutdown of every sandbox, exchanges information and transfers files with each sandbox, and controls sample execution and host environment simulation. In short, it is an important auxiliary module that helps the sandbox monitoring module carry out its functions automatically.

[0009] 3. Sandbox monitoring module: the main goal of the sandbox monitoring module is to capture the API calls made by a specific process, together with their parameters. It also extracts the modules loaded by that process and the related kernel data that the operating system maintains for it.

[0010] The invention uses the open-source emulator Qemu as its underlying virtual machine software and modifies part of the core code that interprets and executes instructions in Qemu's CPU emulation, so that the host behavior of a specific process can be monitored. This behavior monitoring technique, based on an instruction set emulation environment, works bottom-up from the instruction level: it reconstructs kernel-level constructs such as system calls and processes in order to obtain the behavior of a malicious program during dynamic execution. The host machine is isolated from the sandbox environment in which the malicious program runs, which largely prevents the malicious program from affecting the host during execution.

[0011] Traditional API interception has several drawbacks: modifying the source code of the monitored program can destabilize its execution, the analysis tool is easily detected and therefore evaded, and collecting operating system kernel data requires a driver, which is technically difficult. To overcome these, the sandbox monitoring module aims to leave the target program unmodified, monitor the execution of the program under test silently, and collect many kinds of usable information. The monitored program runs in the guest operating system, while behavior monitoring is implemented in the Qemu monitor unit, which has a higher privilege level than the guest operating system. Because monitoring runs at a higher privilege level, the program under test can hardly escape analysis, and its source code does not need to be modified.

[0012] 4. Behavior abstraction module: after the sandbox monitoring module has executed the malicious program and captured its API calls, a report of the API functions and parameters used while the sample ran is available. Using this API report directly for malware classification is problematic, so the behavior exhibited by the sample must be abstracted from the API sequence. This process of abstracting a sample's API sequence into sample behavior is called "behavior abstraction".

[0013] Sandbox analysis of a malware sample yields its API call sequence. Although the call sequence contains a good deal of information about the program's behavior, its level of abstraction is too low both for the subsequent classification algorithm and for generating reports that people can easily understand. Rules are therefore defined to abstract the API call sequence into a data form that algorithms can process easily, and further into a form of expression that people can easily understand.
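
By way of non-limiting illustration, rule-based abstraction of an API call sequence might be sketched as follows. The rule table, the behavior labels and the assumption that handle arguments have already been resolved to resource names are all hypothetical; the actual rules of the invention are not given in this passage.

```python
from collections import OrderedDict

# Hypothetical rules: API name -> (abstract behavior label, index of the
# argument that names the affected resource). Not taken from the patent.
RULES = {
    "CreateFileW": ("file_create", 0),
    "WriteFile": ("file_write", 0),
    "RegSetValueExW": ("registry_write", 1),
    "connect": ("network_connect", 1),
}


def abstract_behaviors(api_calls):
    """Collapse raw (api_name, args) records into (behavior, resource, count)."""
    seen = OrderedDict()
    for name, args in api_calls:
        rule = RULES.get(name)
        if rule is None:
            continue  # no rule covers this API, so it is dropped
        behavior, arg_index = rule
        resource = args[arg_index] if arg_index < len(args) else None
        key = (behavior, resource)
        seen[key] = seen.get(key, 0) + 1  # repeated calls become one entry
    return [(b, r, n) for (b, r), n in seen.items()]
```

Repeated calls on the same resource collapse into one entry with a count, which gives the regular, low-redundancy form that a classifier or a human-readable report can consume.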

[0014] 5. Detection and classification module: malware detection is a standard multi-class classification task. To decide whether a file submitted by a user for analysis is a malicious program and, if so, which kind of malicious program it is, a classification model must first be built.

[0015] The system builds its classification model using ensemble learning. Ensemble learning either uses different strategies to divide a large problem into several small problems that are solved separately, or generates multiple learners to solve the same problem; a combination strategy then merges the outputs of the sub-classifiers into a single final output. Generating multiple classifiers and letting them vote can effectively improve classification accuracy, and is the core of the algorithm design in this system.

[0016] An ensemble learning algorithm has two key stages: sub-classifier generation and classifier combination. The system uses the classic boosting algorithm AdaBoost as the ensemble framework and the decision tree algorithm C4.5 to generate the sub-classifiers.
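
By way of non-limiting illustration, the two stages can be sketched in a few lines of Python. This is a simplified sketch, not the classifier of the invention: one-level decision stumps stand in for C4.5 trees, labels are binary (+1 or -1) rather than multi-class, and all function names are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Sub-classifier generation: best single-feature threshold test."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[f] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best  # (weighted error, feature index, threshold, sign)


def stump_predict(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] <= thr else -sign


def adaboost_train(X, y, rounds=10):
    """Classifier combination: weighted vote of stumps (binary AdaBoost)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        if stump[0] >= 0.5:
            break  # no better than chance, stop adding learners
        err = max(stump[0], 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

On a one-dimensional toy set whose positive samples lie on both sides of the negative ones, no single stump separates the classes, but a weighted vote of three stumps does, which is the effect the ensemble framework relies on.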

[0017] Compared with the prior art, the present invention has the following advantages:

[0018] First, the invention's behavior monitoring technique, based on an instruction set emulation environment, works bottom-up from the instruction level, reconstructing kernel-level constructs such as system calls and processes to obtain the behavior of a malicious program during dynamic execution. Because monitoring runs at a higher privilege level, the program under test can hardly escape analysis and its source code need not be modified. This overcomes the drawbacks of traditional API interception: destabilized execution caused by modifying the monitored program's source code, easy detection and evasion of the analysis tool, and the need for a technically difficult driver to collect operating system kernel data.

[0019] Second, the invention builds a virtual Internet inside the sandbox through environment configuration, modified server programs and similar means, and simulates common network services, so that DNS resolution, HTTP access, file download, e-mail login, mail sending and similar operations initiated by a malicious program succeed. This lures the malicious program into exhibiting its malicious network behavior while ensuring that the behavior does not harm the host machine or the real network, and overcomes the problem that network behavior cannot be fully expressed during dynamic behavior analysis.

[0020] Third, the invention simulates a variety of host events and host environments inside the sandbox through environment configuration, programs and similar means. A malicious program that is sensitive to events such as insertion of a USB flash drive or optical disc or connection of a network shared folder, or to environment features such as a microphone or camera, can then go on to exhibit its subsequent behavior. The malicious program is lured into producing more behavior, which overcomes the problem that programs sensitive to host events or the host environment do not fully reveal their behavior during dynamic analysis.

[0021] Fourth, the invention implements an abstraction algorithm for malicious program behavior. By analyzing and processing the raw behavior data obtained from the sandbox, it produces abstract behavior data that is regular in form and low in redundancy. The algorithm quickly yields data usable by the subsequent detection and classification algorithm and gives it a sound data foundation, overcoming the slow abstraction, complex representation and poor generality of traditional behavior abstraction algorithms.

[0022] Fifth, the invention uses the ensemble learning algorithm AdaBoost for malicious behavior detection and classification. Regarded as one of the top ten algorithms of modern machine learning, AdaBoost combines several weak classifiers through an adaptive linear combination into a strong classifier, while implicitly optimizing the classification boundary and limiting the negative effects of overfitting. Using AdaBoost for behavior detection and classification can effectively improve classification accuracy, especially generalization accuracy on new samples. This overcomes the unsatisfactory classification results and the poor generalization caused by overfitting in traditional malicious behavior detection and classification.

Brief Description of the Drawings

[0023] Fig. 1 is a flowchart of the automated malware behavior analysis, detection and classification system and method of the present invention;

[0024] Fig. 2 is an architecture diagram of the sandbox monitoring module;

[0025] Fig. 3 is a flowchart of the interaction between the sandbox monitoring module and the sandbox scheduling management module;

[0026] Fig. 4 is a working diagram of the Qemu monitor unit;

[0027] Fig. 5 is a schematic diagram of behavior abstraction;

[0028] Fig. 6 is a flowchart of behavior abstraction.

Detailed Description

[0029] The present invention is described in detail below with reference to specific embodiments.

[0030] Referring to Fig. 1, in step 1 the static analysis module first statically analyzes the structure of the executable sample file to obtain the sample's compiler version, build time, language information, PE section information, PE import table, whether the PE file is packed, and the packer type. The information the static analysis module obtains about the malicious program is combined with the dynamic analysis information from the sandbox monitoring module to give the final ensemble classification algorithm richer data.
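
By way of non-limiting illustration, a few of the static fields named above (build time and section information) can be read from the PE headers with the standard library alone. The function name, the returned dictionary layout and the section-name packer hint are assumptions made for this sketch; import table parsing and real packer identification are omitted.

```python
import struct


def parse_pe_header(data):
    """Read machine type, build timestamp and section names from a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew field
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # COFF file header: 20 bytes immediately after the 4-byte signature.
    machine, n_sections, timestamp, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    table = pe_offset + 24 + opt_size  # section table follows optional header
    sections = []
    for i in range(n_sections):
        raw = data[table + 40 * i: table + 40 * (i + 1)]  # 40-byte entries
        name = raw[:8].rstrip(b"\0").decode("ascii", "replace")
        virtual_size, _virtual_addr, raw_size = struct.unpack_from("<III", raw, 8)
        sections.append({"name": name, "virtual_size": virtual_size,
                         "raw_size": raw_size})
    # Crude heuristic (an assumption of this sketch): UPX-packed files
    # usually keep section names that start with "UPX".
    packed_hint = any(s["name"].upper().startswith("UPX") for s in sections)
    return {"machine": machine, "build_time": timestamp,
            "sections": sections, "packed_hint": packed_hint}
```

The `build_time` value is the raw TimeDateStamp field (seconds since the Unix epoch); a static analysis report would typically convert it to a readable date.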

[0031] In step 2, once static analysis is complete, the sample file enters the automated dynamic analysis process, which is managed automatically by the sandbox scheduling management module. The module starts a sandbox, uploads the sample file to the Guest OS unit, and runs the sample there. The Qemu monitor unit of the sandbox monitoring module monitors the execution or loading of the sample and produces a report of the sample's API sequence. The network packet monitoring function of the Guest OS unit monitors the network packets generated in the Guest OS unit and produces a network packet report. After the sample finishes normally or times out, if it is a non-EXE sample file, snapshots of the registry and file system are compared to produce a registry and file snapshot comparison report. These reports, together with any files generated while the sample ran, are transferred to the sandbox scheduling management module and serve as the raw data for abstracting the behavior of a malware sample.
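
By way of non-limiting illustration, the snapshot comparison mentioned above can be sketched as a difference between two mappings taken before and after the sample runs. Representing a snapshot as a dictionary from registry key or file path to a value or content hash is an assumption of this sketch, as is the function name.

```python
def diff_snapshots(before, after):
    """Compare two {key_or_path: value_or_hash} snapshots."""
    created = sorted(k for k in after if k not in before)
    deleted = sorted(k for k in before if k not in after)
    modified = sorted(k for k in after
                      if k in before and after[k] != before[k])
    return {"created": created, "deleted": deleted, "modified": modified}
```

The three lists correspond to the kinds of entries a registry and file snapshot comparison report would be expected to contain.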

[0032] Fig. 2 is the architecture diagram of the sandbox monitoring module. It shows the module's architecture in detail and how the module interacts with the host machine through the sandbox scheduling management module. The sandbox monitoring module comprises the Guest OS unit, which is the virtual execution environment for the malicious program, and the Qemu monitor unit, a modified version of the full-system emulator Qemu. The Guest OS unit provides network packet monitoring, snapshot comparison and host event simulation; the Qemu monitor unit provides process identification and multi-process monitoring, API monitoring, API dependency analysis and redundant data filtering. Sections 2a and 2b below describe the functions and workflow of each unit in detail.

[0033] 2a). The Guest OS (guest operating system) unit is the environment in which malware samples run. Windows XP is chosen as the Guest OS. The Guest OS unit is connected to the host machine through a virtual network, and the sandbox scheduling management module handles the interaction between them.

[0034] Fig. 3 shows in detail how the Guest OS unit interacts with the sandbox scheduling management module and how the Guest OS unit handles a malware sample. The workflow is as follows.

[0035] i> The sandbox scheduling management module starts the sandbox and uploads the sample file from the host machine to the Guest OS unit over the virtual network.

[0036] ii> The Guest OS starts host-based packet monitoring and begins executing the sample file.

[0037] iii> After the sample file finishes normally or times out, the Qemu monitor unit sends the sandbox scheduling management module a message that sample analysis has ended.

[0038] iv> If the sample file is not an executable file, registry and file system snapshots are compared and a snapshot comparison report is generated; otherwise proceed directly to the next step.

[0039] v> If the sample file dropped other files during execution, those files are sent to the host machine over the virtual network; otherwise proceed directly to the next step.

[0040] vi> The Guest OS unit sends the network packet monitoring report to the host machine.

[0041] vii> The sandbox scheduling management module shuts down the sandbox.

[0042] 2b). The Qemu monitor unit has a higher privilege level than the Guest OS unit and is used to monitor the behavior of the target program. It uses the open-source emulator Qemu as its underlying virtual machine software, with modifications to part of the core code that interprets and executes instructions in the CPU emulation, so that the host behavior of a specific process can be monitored.

[0043] Fig. 4 shows the working process of the Qemu monitor unit, which is described in detail below.

[0044] i> The Qemu monitor unit checks whether the currently running process is the target process. If it is not, execution is allowed to proceed unmonitored; otherwise go to the next step.

[0045] ii> If execution reaches the code at an API entry point, the return address is saved on the return address stack and the front-end callback function is called. The front-end callback reads the values of the in parameters and the input values of the in_out parameters as the code at the API entry point executes.

[0046] iii> If the code being executed is not at an API entry point, its address is compared with the element at the top of the return address stack. If they are not equal, the API currently being called is a nested API call; such nested calls reflect the operating system's internal implementation rather than the program's real behavior, so execution proceeds without monitoring. If they are equal, the back-end callback function is called; it reads the return value and the values of the out parameters when the call returns, and the return address stack is then updated accordingly.

[0047] The Qemu monitor unit uses the open-source emulator Qemu as its underlying virtual machine software, with modifications to part of the core code that interprets and executes instructions in the CPU emulation. Modifying Qemu involves several technical difficulties; sections 2c to 2g below describe each difficulty and the solution adopted by the invention.

[0048] 2c). Process identification in the Qemu monitor unit: an unmodified virtual machine strictly emulates only the computer hardware, that is, the CPU executing each instruction, and has no notion of the operating-system-level concept of a process. To monitor, from the Qemu monitor, a target process running in the guest operating system, the monitor must first reconstruct all processes currently running in the guest operating system, and capture behavior data only when the target process is scheduled to execute.

[0049] The system identifies processes in the Qemu monitor unit as follows. Before each translation block begins executing, the sandbox monitoring module uses the virtual memory read/write functions and follows the kernel data structure KPCR (Kernel Processor Control Region) to find the start address of the EPROCESS structure of the process currently executing in the system. It then uses the process name stored in the EPROCESS (executive process block) structure to decide whether the currently executing process is the target process; if so, it reads from the structure the page directory base address that the operating system assigned to the process. That value is then compared with the value stored in the virtual CR3 register to determine whether the monitored process is executing. Behavior data is collected only while the target process is executing.
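
By way of non-limiting illustration, the lookup chain from KPCR to EPROCESS to the page directory base can be sketched against a simulated guest memory. The `GuestMemory` class stands in for the emulator's virtual memory read function, and the base address and structure offsets are illustrative values for a 32-bit Windows XP guest; they are assumptions of this sketch, vary between Windows builds, and are not stated in the patent.

```python
import struct

# Illustrative layout values (assumptions; verify for the actual guest build).
KPCR_BASE = 0xFFDFF000
OFF_CURRENT_THREAD = 0x124   # KPCR -> pointer to current KTHREAD
OFF_THREAD_PROCESS = 0x44    # KTHREAD -> pointer to owning EPROCESS
OFF_IMAGE_NAME = 0x174       # EPROCESS -> 16-byte image file name
OFF_DIR_TABLE_BASE = 0x18    # EPROCESS -> page directory base


class GuestMemory:
    """Hypothetical stand-in for the emulator's guest virtual memory reader."""

    def __init__(self):
        self._bytes = {}

    def write(self, addr, data):
        for i, b in enumerate(data):
            self._bytes[addr + i] = b

    def read(self, addr, size):
        return bytes(self._bytes.get(addr + i, 0) for i in range(size))

    def read_u32(self, addr):
        return struct.unpack("<I", self.read(addr, 4))[0]


def current_process(mem):
    """Follow KPCR -> KTHREAD -> EPROCESS and return (image name, dir base)."""
    thread = mem.read_u32(KPCR_BASE + OFF_CURRENT_THREAD)
    eprocess = mem.read_u32(thread + OFF_THREAD_PROCESS)
    raw_name = mem.read(eprocess + OFF_IMAGE_NAME, 16)
    name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
    dir_base = mem.read_u32(eprocess + OFF_DIR_TABLE_BASE)
    return name, dir_base


def should_monitor(mem, virtual_cr3, target_name):
    """True only when the target process is the one currently scheduled."""
    name, dir_base = current_process(mem)
    return name.lower() == target_name.lower() and dir_base == virtual_cr3
```

In an emulator, a check of this kind would run before each translation block, so that data collection is skipped whenever another process holds the CPU.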

[0050] 2d). API call analysis framework and callback functions for reading parameters: Qemu emulates instructions one basic block at a time, and every code block ends with a jump instruction. The code at the entry point of any API is therefore always at the start of a translation block. By comparing addresses against API entry addresses at the start of each translation block, the Qemu monitor unit can silently monitor the API calls made by a specific process. Each monitored API has a corresponding callback function that reads the call parameters passed to that API from virtual memory. API monitoring is implemented by modifying Qemu's instruction translation routine to insert a callback invocation framework; before the code at an API entry point executes, the framework decides whether the corresponding callback function needs to be called to obtain the parameter information.

[0051] Obtaining API call parameters is the core of behavior data collection; the API name alone is not enough to analyze malicious program behavior. When execution reaches the code at an API entry point, the virtual memory read function can read the return address from the address held in the virtual ESP register, the first parameter from ESP+4, the second parameter from ESP+8, and so on. In an API call, a parameter wider than 32 bits (such as a string or a structure) is passed as a pointer to the parameter rather than as its actual value. For such parameters, a single read of virtual memory yields only the address at which the parameter is stored, which is of no use for analysis; virtual memory must be read repeatedly until the actual value of the parameter has been retrieved.
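The stack layout described above can be sketched as follows. This is a minimal model that assumes a flat byte array standing in for guest virtual memory and 32-bit little-endian values; `GuestMemory` and `read_stdcall_args` are hypothetical names introduced for illustration.

```python
import struct


class GuestMemory:
    """Toy stand-in for guest virtual memory: a flat bytearray starting at address 0."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def write(self, addr: int, payload: bytes) -> None:
        self.data[addr:addr + len(payload)] = payload

    def read_u32(self, addr: int) -> int:
        # 32-bit little-endian read, as on x86.
        return struct.unpack_from("<I", self.data, addr)[0]

    def read_cstring(self, addr: int, max_len: int = 260) -> str:
        # Second read: follow a pointer argument to the bytes it points at.
        end = self.data.find(b"\x00", addr, addr + max_len)
        if end == -1:
            end = addr + max_len
        return self.data[addr:end].decode("ascii", errors="replace")


def read_stdcall_args(mem: GuestMemory, esp: int, count: int):
    """Return (return_address, [arg1, arg2, ...]) as seen at an API entry point."""
    return_addr = mem.read_u32(esp)                       # [ESP]   -> return address
    args = [mem.read_u32(esp + 4 * (i + 1)) for i in range(count)]  # [ESP+4], [ESP+8], ...
    return return_addr, args
```

A string argument therefore takes two reads: one for the pointer on the stack and one (via `read_cstring`) for the characters it refers to.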

[0052] API monitoring is performed by a two-stage callback. The front-end callback reads the in parameters (input parameters) and the input values of the in_out parameters (input/output parameters) when the code at the API entry point executes; the back-end callback reads the return value and the values of the out parameters (output parameters) when the call returns. The front-end and back-end callbacks communicate and cooperate through a shared buffer.

[0053] Another important issue is that entry-address comparison inevitably also catches nested API calls. When the operating system implements an API, it may call other APIs; for example, CopyFile indirectly calls CreateFile and WriteFile to do its work. Such nested calls do not represent real behavior of the program but the internal implementation of the operating system. To filter them out, the callback invocation framework maintains a return address stack and outputs a record only when the stack depth is 1.
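The two-stage callback and the return address stack can be sketched together. The following is a simplified model under stated assumptions: a call is considered returned when execution reaches its saved return address (a real monitor would probably also compare the stack pointer), and all names are hypothetical.

```python
class NestedCallFilter:
    """Tracks in-flight API calls by return address and logs only top-level ones."""

    def __init__(self):
        self._stack = []     # shared buffer: one frame per API call still executing
        self.records = []    # behavior records emitted for top-level calls only

    def on_api_entry(self, api, return_addr, in_args):
        # Front-end callback: runs at the API entry point and saves the "in" values.
        self._stack.append({"api": api, "ret": return_addr, "in": in_args})

    def on_block_start(self, pc, return_value=None, out_args=None):
        # Back-end callback: the innermost call has returned once execution
        # reaches the return address saved by the front end.
        if not self._stack or self._stack[-1]["ret"] != pc:
            return
        depth = len(self._stack)
        frame = self._stack.pop()
        if depth == 1:  # depth > 1 means an OS-internal nested call, so skip it
            self.records.append({"api": frame["api"], "in": frame["in"],
                                 "ret": return_value, "out": out_args})
```

With this model, a CopyFile call that internally triggers CreateFile and WriteFile produces a single CopyFile record.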

[0054] 2e). Page fault handling: guest virtual memory is emulated in the heap space of the Qemu process, and all information needed for data collection resides there. Locating guest data in host memory and reading it directly, bypassing Qemu's virtual memory emulation routines, is the core technique that lets the Qemu monitor unit silently collect behavior data of the target process. However, Windows virtual memory management uses a "lazy" policy: if the data to be read is not in virtual memory but on the virtual hard disk, forcibly reading virtual memory causes a page fault, and the analyzed process running in the guest operating system terminates abnormally. This disrupts the normal execution of the monitored process and is unacceptable.

[0055] To prevent page faults raised while the analyst extracts behavior data from virtual memory from disrupting the normal execution of the analyzed program, the sandbox monitoring module uses a "three-step method".

[0056] The procedure is as follows: before reading, first test whether the page is missing; if it is, wait for the page to be brought into virtual memory; if waiting does not succeed, the analysis program forcibly reads the data in that address space, which triggers the guest operating system's page fault handler and brings the missing page into virtual memory; the read is then attempted again.

[0057] To improve efficiency, the sandbox monitoring module does not apply the "three-step method" to every virtual memory access; it applies this page fault avoidance strategy only where a page fault is most likely. On Windows, data read directly from the stack (32-bit parameters) does not cause page faults, and string and structure parameters are usually small and do not cause them either, so neither needs a page-presence test. A page fault can occur only when I/O operations or large buffer reads and writes are involved.
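The "three-step method" can be illustrated with a toy memory model. This sketch assumes reads stay within one page and simulates the guest with simple dictionaries; `ToyGuestMemory`, `tick`, and `force_fault` are hypothetical stand-ins for the real presence test, the waiting period, and the forced guest access.

```python
PAGE_SIZE = 4096


class ToyGuestMemory:
    def __init__(self):
        self.resident = {}    # page number -> bytes currently in guest RAM
        self.on_disk = {}     # page number -> bytes paged out to the virtual disk
        self.arrives_in = {}  # page number -> ticks until the guest loads it by itself
        self.forced_faults = 0

    def is_present(self, addr):
        return addr // PAGE_SIZE in self.resident

    def tick(self):
        # Let the guest run for a while; pages it was already loading may arrive.
        for page in list(self.arrives_in):
            self.arrives_in[page] -= 1
            if self.arrives_in[page] <= 0:
                self.resident[page] = self.on_disk.pop(page)
                del self.arrives_in[page]

    def force_fault(self, addr):
        # Model of a forced access that makes the guest's own handler page the data in.
        page = addr // PAGE_SIZE
        if page in self.on_disk:
            self.resident[page] = self.on_disk.pop(page)
            self.arrives_in.pop(page, None)
            self.forced_faults += 1

    def read(self, addr, length):
        page, offset = divmod(addr, PAGE_SIZE)
        return self.resident[page][offset:offset + length]


def safe_read(mem, addr, length, wait_rounds=3):
    if not mem.is_present(addr):          # step 1: test for a missing page
        for _ in range(wait_rounds):      # step 2: wait for the page to arrive
            mem.tick()
            if mem.is_present(addr):
                break
        else:
            mem.force_fault(addr)         # step 3: trigger the guest's fault handler
    if not mem.is_present(addr):
        return None                       # still unavailable; caller must cope
    return mem.read(addr, length)         # retry the read
```

In line with the efficiency note above, a monitor would call something like `safe_read` only for large buffers or I/O-related data and use a plain read for stack values.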

[0058] 2f). API dependency analysis and redundant data filtering: if the return value or an out parameter of one API is an in parameter of another API, the two APIs are said to have a call dependency. Even after the API calls issued by a specific process have been intercepted, analysis of the API call sequence still faces three difficulties. First, to defeat API frequency statistics and API timing analysis, some malicious programs are written with redundant API insertion and API reordering, so that the time-ordered call sequence hardly characterizes the program's behavior. Second, dynamic analysis collects behavior data at run time, and loops and searches in the program cause some APIs to be called repeatedly, which places a heavy data burden on subsequent behavior analysis. Third, Windows APIs are ambiguous: CreateFile, for example, may open a file, create a file, open a named pipe, or create a named pipe, so an API call is not truly equivalent to a program behavior.

[0059] Although API call frequency and call order may change, the call dependencies between APIs are relatively stable, and APIs that repeatedly operate on the same object are linked by dependencies. On this basis, the API dependency analysis and redundant data filtering function of the Qemu monitor unit handles the following three cases to extract the characteristic behavior of a malicious program. First, on the Windows platform a handle represents a system resource, so repeated operations on the same resource are merged into one on the basis of the handle. Second, process injection events are monitored. Third, the ambiguity of the Create family of APIs is eliminated through dependency analysis.
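A minimal sketch of the first idea, assuming each captured call is a dictionary whose `in` and `out` lists hold handle values (a hypothetical record layout, not the one used by the disclosed system). `dependency_edges` links the call that produced a handle to every call that consumes it, and `merge_repeats` collapses repeated operations on the same handle.

```python
def dependency_edges(calls):
    """Return (producer_index, consumer_index) pairs linked by a shared handle."""
    producer = {}  # handle -> index of the call that returned or output it
    edges = []
    for index, call in enumerate(calls):
        for handle in call.get("in", []):
            if handle in producer:
                edges.append((producer[handle], index))
        for handle in call.get("out", []):
            producer[handle] = index
    return edges


def merge_repeats(calls):
    """Keep one record per (API, input handles) pair; calls without handles are kept."""
    seen = set()
    merged = []
    for call in calls:
        key = (call["api"], tuple(call.get("in", [])))
        if call.get("in") and key in seen:
            continue  # same API on the same resource was already recorded
        seen.add(key)
        merged.append(call)
    return merged
```

Because the edges depend on handle flow rather than on call order or count, inserting redundant calls or reordering independent ones leaves them largely unchanged.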

[0060] 2g). Multi-process monitoring: the multi-process monitoring function of the Qemu monitor unit monitors the behavior of child processes created by the main process and of processes into which code has been injected. The difficulty lies in automatically identifying and adding new target processes while the main process is running. Because the sandbox monitoring module is built around capturing API calls, multi-process monitoring is likewise approached from the API side.

[0061] The first step of multi-process monitoring is to obtain the name of the process to be monitored and, when the operating system initializes the process, to use the process name as a clue to find the page directory base address the operating system has assigned to it. Child process creation can be detected by monitoring the kernel API NtCreateProcess. The front-end callback for NtCreateProcess is modified to extract the name of the created process from the call parameters, find the page directory base address of that process using the run-time memory analysis method described above, and pass it to the API call management framework, thereby extending monitoring to the child process. In the second step, the process identification function of the Qemu monitor unit maintains a list of sensitive page directory base values and compares it with the value stored in the virtual CR3 register before each translation block executes; when the virtual CR3 register switches to any sensitive page directory base value, the API monitoring function of the Qemu monitor unit starts working.
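The watch list of sensitive page directory base values can be sketched as a small lookup structure. This is an illustration under assumptions: `find_page_dir_base` is a hypothetical callable standing in for the run-time memory analysis that maps a process name to its page directory base, and the values used in the test are made up.

```python
class ProcessWatchList:
    """Set of page directory base (CR3) values belonging to monitored processes."""

    def __init__(self):
        self._watched = {}  # page directory base -> process name

    def add(self, name, page_dir_base):
        self._watched[page_dir_base] = name

    def on_child_created(self, child_name, find_page_dir_base):
        # Called from the (hypothetical) NtCreateProcess front-end callback.
        base = find_page_dir_base(child_name)
        if base is not None:
            self.add(child_name, base)
        return base

    def should_monitor(self, virtual_cr3):
        # Checked before each translation block; True enables API monitoring.
        return virtual_cr3 in self._watched
```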

[0062] Monitoring process injection consists of two steps, identifying the injection behavior and extracting the name of the injected process, and both involve analyzing dependencies among multiple APIs at run time. Process injection usually begins with process enumeration. Because every enumerated process is a potential injection target, the process identification function maintains global process injection event templates: when process enumeration APIs such as EnumProcess, Process32First, and Process32Next are called, a process injection event template is filled in for each process found, recording the process name, process ID, process handle, and similar information. The core APIs used to carry out process injection include OpenProcess, VirtualAllocEx, and WriteProcessMemory. The front-end callbacks for these APIs are modified so that, when the APIs are called, the corresponding injection event template is looked up through the call parameters and updated, until a successful call to WriteProcessMemory marks the occurrence of a process injection event. The name of the injected process is then read from the template, its page directory base address is found and passed to the process identification function, and the injected process is thereby added as a monitoring target. The API call management framework subsequently analyzes the behavior of the injected process automatically.
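The injection event template can be sketched as a small state tracker keyed by process ID and process handle. The method names and template fields below are hypothetical; only the API names they correspond to come from the text.

```python
class InjectionTracker:
    def __init__(self):
        self.templates = {}   # pid -> injection event template
        self.by_handle = {}   # process handle -> pid
        self.injected = []    # names of processes confirmed as injection targets

    def on_process_enumerated(self, pid, name):
        # Process32First / Process32Next style enumeration: every process found
        # is a potential target, so it gets a template.
        self.templates.setdefault(
            pid, {"name": name, "pid": pid, "handle": None, "allocated": False})

    def on_open_process(self, pid, handle):
        template = self.templates.get(pid)
        if template is not None:
            template["handle"] = handle
            self.by_handle[handle] = pid

    def on_virtual_alloc_ex(self, handle):
        pid = self.by_handle.get(handle)
        if pid is not None:
            self.templates[pid]["allocated"] = True

    def on_write_process_memory(self, handle, success):
        # A successful write into another process marks the injection event.
        pid = self.by_handle.get(handle)
        if pid is None or not success:
            return None
        name = self.templates[pid]["name"]
        self.injected.append(name)
        return name
```

The returned process name is what a monitor would then resolve to a page directory base and add to its watch list.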

[0063] Step 3: after dynamic analysis of the malicious program sample has finished, a series of reports is available; these reports are processed by the behavior abstraction module to obtain the sample's behavior.

[0064] With reference to the behavior abstraction schematic in Figure 5, the main steps of the behavior abstraction module are raw data cleaning, behavior abstraction, and behavior storage. With reference to the behavior abstraction flowchart in Figure 6, each step of the module is refined into several sub-steps, which are described in detail below.

[0065] 3a). Raw data cleaning: the raw API sequence contains some invalid and redundant API call records. To keep these records from affecting the subsequent behavior abstraction step, the raw API sequence record file is cleaned in this step.

[0066] The API calls to be cleaned fall into the following categories.

[0067] <i>. N consecutive API calls with identical API names and identical call parameters: only the first is kept, and the following N-1 calls are removed.

[0068] Calling the same API function N times in a row with the same parameters does not reveal any additional behavior; it only adds computational burden to the subsequent behavior abstraction process.

[0069] <ii>. Invalid handle parameters.

[0070] The raw data cleaning logic maintains a global handle information table. Every handle passed into a valid function call should be a handle that an earlier function produced as an output parameter or return value. If a function is found to use, as an input parameter, a handle that does not appear in the global handle information table, the function call can be considered invalid.

[0071] <iii>. Use of an invalid handle value.

[0072] Some handle values denote invalid handles. Using such handles is meaningless, so the function call is considered invalid.
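The three cleaning rules can be sketched in a single pass over the API records. The record layout (`api`, `args`, `handles_in`, `handles_out`) is a hypothetical one chosen for illustration, and the set of invalid handle values is an assumption (NULL and the 32-bit `INVALID_HANDLE_VALUE`).

```python
# Assumption: NULL and INVALID_HANDLE_VALUE ((HANDLE)-1 on 32-bit Windows).
INVALID_HANDLE_VALUES = {0, 0xFFFFFFFF}


def clean_api_sequence(records):
    known_handles = set()  # global handle table: handles produced by earlier calls
    cleaned = []
    previous_key = None
    for record in records:
        key = (record["api"], tuple(record.get("args", ())))
        if key == previous_key:
            continue  # rule i: consecutive identical call, keep only the first
        previous_key = key
        handles_in = record.get("handles_in", ())
        if any(h in INVALID_HANDLE_VALUES for h in handles_in):
            continue  # rule iii: the call uses an invalid handle value
        if any(h not in known_handles for h in handles_in):
            continue  # rule ii: handle never produced by an earlier call
        known_handles.update(
            h for h in record.get("handles_out", ())
            if h not in INVALID_HANDLE_VALUES)
        cleaned.append(record)
    return cleaned
```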

[0073] 3b). Behavior abstraction: this step is the core of the whole behavior abstraction workflow. The predefined behavior abstraction rules are first read from the database, and the cleaned API sequence record file is then parsed according to these rules to obtain the behavior information of the sample.

[0074] Because the captured API call records are stored as a text file, behavior abstraction is a process of reading and parsing that file. After the API sequence record file has been opened, every API call record in it is analyzed one by one. For each captured API function, the following cases can arise:

[0075] <i>. The function is irrelevant to behavior abstraction.

[0076] That is, the function is not a key function. Such functions typically perform no operation on critical parts of the system, for example Sleep and GetSystemTime, and can be skipped directly.

[0077] <ii>. The function can form an auxiliary behavior.

[0078] If the function is a key function and can form an auxiliary behavior, its parameters must be obtained and processed (for example, string conversion and composition), and the resulting auxiliary behavior is stored temporarily in the database.

[0079] <iii>. The function can form an abstract behavior.

[0080] If the function is a key function and can form an abstract behavior, its parameters must be obtained and processed (for example, string conversion and composition), and the resulting abstract behavior is stored in the database. After the whole file has been analyzed, these abstract behaviors are expanded into a decision vector according to predetermined expansion rules.
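The three cases above can be sketched as a rule lookup followed by an expansion step. Everything specific here is an assumption made for illustration: the rule table contents, the behavior labels, and the choice of a binary presence vector as the expansion rule, none of which are specified in the text.

```python
# Hypothetical rule table; the described system loads its rules from a database.
RULES = {
    "Sleep": ("skip", None),
    "GetSystemTime": ("skip", None),
    "CreateFile": ("auxiliary", "file_opened"),
    "WriteFile": ("abstract", "write_file"),
    "RegSetValueEx": ("abstract", "modify_registry"),
}

# Hypothetical fixed vocabulary defining the positions of the decision vector.
BEHAVIORS = ["write_file", "modify_registry", "inject_process"]


def abstract_behaviors(records, rules=RULES):
    """Split cleaned API records into auxiliary and abstract behaviors."""
    auxiliary, abstract = [], []
    for record in records:
        kind, label = rules.get(record["api"], ("skip", None))
        if kind == "auxiliary":
            auxiliary.append((label, record.get("args", ())))
        elif kind == "abstract":
            abstract.append((label, record.get("args", ())))
        # "skip": the function is not a key function and is ignored
    return auxiliary, abstract


def to_decision_vector(abstract, vocabulary=BEHAVIORS):
    """Assumed expansion rule: 1 if the behavior was observed, else 0."""
    present = {label for label, _ in abstract}
    return [1 if behavior in present else 0 for behavior in vocabulary]
```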

[0081] 3c). Behavior storage: to simplify processing by the subsequent classification algorithm, the data produced during behavior abstraction, including the abstract behaviors and the decision vectors, is stored in the database. During sample analysis, the behavior abstraction rules may also be modified to some extent in light of the actual samples, so as to fit the characteristics of specific sample categories.

[0082] Step 4: after behavior abstraction, the behavior information of the sample is obtained and stored in the database; as the number of training samples grows, the database accumulates a large amount of sample behavior information. To determine whether a file submitted by a user for analysis is a malicious program, and which kind of malicious program it is, a classification model must first be built. The system builds the classification model from the behavior information of the training samples using ensemble learning: several sub-classifiers are trained and vote on the classification of the same sample, which improves classification accuracy in the multi-class setting.

[0083] An ensemble learning algorithm has two key stages: sub-classifier generation and classifier combination. As to the data the algorithm must process, the data collected by sandbox behavior monitoring and static analysis includes API sequences, static file features, network packets, and so on; it comes from different data sources and is discrete, non-numeric data. As to the detection task, the classification algorithm must handle multi-class problems rather than simple binary classification. Given these requirements, the system selects the classic decision tree algorithm C4.5 as the sub-classifier algorithm.

[0084] The system uses the decision tree algorithm as the sub-classifier algorithm and the AdaBoost algorithm as the ensemble algorithm to strengthen malicious program detection and classification results.
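To make the boosting-and-voting idea concrete, the following is a small pure-Python sketch. It deliberately substitutes one-feature decision stumps for C4.5 trees and uses the SAMME multi-class variant of AdaBoost; neither choice is stated in the text, and the function names are hypothetical. Feature vectors are assumed to be lists of discrete values such as the 0/1 decision vectors.

```python
import math
from collections import defaultdict


def fit_stump(X, y, w):
    """Best one-feature rule: per feature value, predict the weighted-majority class."""
    best = None
    for j in range(len(X[0])):
        votes = defaultdict(lambda: defaultdict(float))  # value -> class -> weight
        for xi, yi, wi in zip(X, y, w):
            votes[xi[j]][yi] += wi
        table = {value: max(c, key=c.get) for value, c in votes.items()}
        err = sum(wi for xi, yi, wi in zip(X, y, w) if table[xi[j]] != yi)
        if best is None or err < best[0]:
            best = (err, j, table)
    return best


def train_samme(X, y, rounds=10):
    classes = sorted(set(y))
    k = len(classes)
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        err, j, table = fit_stump(X, y, w)
        if err >= 1.0 - 1.0 / k:
            break                      # no better than random guessing: stop
        err = max(err, 1e-10)          # guard against log of zero for a perfect stump
        alpha = math.log((1.0 - err) / err) + math.log(k - 1)
        ensemble.append((alpha, j, table))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(alpha) if table[xi[j]] != yi else wi
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return classes, ensemble


def predict(model, x):
    classes, ensemble = model
    score = {c: 0.0 for c in classes}
    for alpha, j, table in ensemble:   # weighted vote of the sub-classifiers
        label = table.get(x[j])
        if label is not None:
            score[label] += alpha
    return max(classes, key=lambda c: score[c])
```

Each round reweights the samples the previous stump got wrong, and the final label is the class with the largest sum of sub-classifier weights, which is the voting step described above.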

[0085] Step 5: output the malicious program behavior report and the detection and classification results.

[0086] It should be understood that a person of ordinary skill in the art may make improvements or modifications based on the above description, and all such improvements and modifications shall fall within the scope of protection of the appended claims of the invention.

Claims (10)

  1. A system for automated analysis, detection, and classification of malicious program behavior, characterized in that it comprises the following modules: (1) a static analysis module: before sandbox-based dynamic analysis of a sample file, the structure of the executable file can be analyzed statically to obtain as much sample-related information as possible; this information yields the static analysis report of the sample file, which, together with the reports produced afterwards, forms the rawest data source for the behavior abstraction module; (2) a sandbox scheduling management module: the sandbox scheduling management module manages each sandbox, coordinates the transfer of samples and data, and controls the automated sample analysis workflow; it controls the startup and exit of each sandbox, exchanges information and transfers files with each sandbox, and controls sample execution and host environment simulation; (3) a sandbox monitoring module: the main goal of the sandbox monitoring module is to capture the API calls issued by a specific process together with their parameters, while also extracting the modules loaded by that process and the related kernel data that the operating system maintains for it. The invention uses the open-source emulator Qemu as the underlying virtual machine software and modifies part of the core code that interprets and executes instructions in its CPU emulation, in order to monitor the host behavior of a specific process. This behavior monitoring technique, based on an instruction-set emulation environment, can reconstruct kernel-level constructs such as system calls and processes bottom-up from the instruction level to capture the behavior of a malicious program during dynamic execution; the host machine is isolated from the sandbox environment in which the malicious program executes, which largely prevents the malicious program from affecting the host during execution; (4) a behavior abstraction module: after the sandbox monitoring module has executed the malicious program and captured its APIs, a report of the API functions and parameters used while the sample program ran is available; however, using this API report directly for malicious program classification meets some obstacles, so the behavior exhibited by the sample must be abstracted from the API sequence; (5) a detection and classification module: malicious program detection is a standard multi-class classification task. To determine whether a file submitted by a user for analysis is a malicious program and, if so, which kind of malicious program it is, a classification model must first be built; the classification model is built using ensemble learning, which either divides a large problem into several smaller problems that are solved separately using different strategies, or generates multiple learners to solve the same problem, and then combines the outputs of the different sub-classifiers through an ensemble strategy into a single final output.
  2. The system for automated analysis, detection, and classification of malicious program behavior according to claim 1, characterized in that the sandbox monitoring module comprises: a Guest OS unit serving as the virtual execution environment of the malicious program; and a Qemu monitor unit, which is a modified full-system emulator; the Guest OS unit includes network packet monitoring, snapshot comparison, and host event simulation functions, and the Qemu monitor unit includes process identification and multi-process monitoring, API monitoring, and API dependency analysis and redundant data filtering functions.
  3. The system for automated analysis, detection, and classification of malicious program behavior according to claim 2, characterized in that the Guest OS unit is the environment in which the malicious program sample runs, with the Windows XP operating system chosen as the Guest OS; the Guest OS unit is connected to the host machine through a virtual network, and the sandbox scheduling management module is responsible for the interaction.
  4. The system for automated analysis, detection, and classification of malicious program behavior according to claim 2, characterized in that the Qemu monitor unit of the sandbox monitoring module has a higher privilege level than the Guest OS unit and is used to monitor the behavior of the target program; the Qemu monitor unit uses the open-source emulator Qemu as the underlying virtual machine software, but modifies part of the core code that interprets and executes instructions in its CPU emulation, in order to monitor the host behavior of a specific process.
  5. The system for automated analysis, detection, and classification of malicious program behavior according to claim 4, characterized in that the Qemu monitor unit identifies processes as follows: before each translation block starts executing, the sandbox monitoring module uses the virtual memory read/write functions and, taking the kernel data structure KPCR as a clue, finds the start address of the EPROCESS structure of the process currently executing in the system; next, it uses the process name stored in the EPROCESS structure to determine whether the currently executing process is the target process and, if so, reads from the structure the page directory base value that the operating system has assigned to the process; it then compares that value with the value stored in the virtual CR3 register to determine whether the monitored process is executing; behavior data is collected only while the target process is executing.
  6. The system for automated analysis, detection, and classification of malicious program behavior according to any one of claims 1-5, characterized in that the sandbox monitoring module uses a "three-step method" to address the page fault problem; the procedure is as follows: before reading, first test whether the page is missing; if it is, wait for the page to be brought into virtual memory; if waiting does not succeed, the analysis program forcibly reads the data in that address space, which triggers the guest operating system's page fault handler and brings the missing page into virtual memory; the read is then attempted again; to improve efficiency, the sandbox monitoring module does not apply the "three-step method" to every virtual memory access, but applies this page fault avoidance strategy only where a page fault is most likely; on Windows, data read directly from the stack does not cause page faults, and string and structure parameters are usually small and do not cause them either, so neither needs a page-presence test; a page fault can occur only when I/O operations or large buffer reads and writes are involved.
  7. The system for automated analysis, detection, and classification of malicious program behavior according to claim 4, characterized in that the multi-process monitoring function of the Qemu monitor unit monitors the behavior of child processes created by the main process and of processes into which code has been injected; the system performs multi-process monitoring as follows: in the first step, the name of the process to be monitored is obtained and, when the operating system initializes the process, the process name is used as a clue to find the page directory base address that the operating system has assigned to it. Child process creation can be detected by monitoring the kernel API NtCreateProcess. The front-end callback for NtCreateProcess is modified to extract the name of the created process from the call parameters, find the page directory base address of that process using the run-time memory analysis method described above, and pass it to the API call management framework, thereby extending monitoring to the child process. In the second step, the process identification function of the Qemu monitor unit maintains a list of sensitive page directory base values and compares it with the value stored in the virtual CR3 register before each translation block executes; when the virtual CR3 register switches to any sensitive page directory base value, the API monitoring function of the Qemu monitor unit starts working.
  8. The system for automated analysis, detection, and classification of malicious program behavior according to claim 7, characterized in that the system monitors process injection as follows: in the first step, the injection behavior is identified: process injection usually begins with process enumeration; because every enumerated process is a potential injection target, the process identification function of the Qemu monitor unit maintains global process injection event templates; when process enumeration APIs such as EnumProcess, Process32First, and Process32Next are called, a process injection event template is filled in for each process found, recording the process name, process ID, process handle, and similar information; the core APIs used to carry out process injection include OpenProcess, VirtualAllocEx, and WriteProcessMemory; the front-end callbacks for these APIs are modified so that, when the APIs are called, the corresponding injection event template is looked up through the call parameters and updated, until a successful call to WriteProcessMemory marks the occurrence of a process injection event; in the second step, the name of the injected process is extracted: the name of the injected process is read from the template, its page directory base address is found and passed to the process identification function, and the injected process is thereby added as a monitoring target; the API call management framework subsequently analyzes the behavior of the injected process automatically.
  9. A method for automated analysis, detection, and classification of malicious program behavior, characterized in that it comprises the following steps: step (1), the static analysis module first analyzes the structure of the executable sample file statically to obtain static information about it; step (2), after static analysis is complete, the sample file enters the automated dynamic analysis process: dynamic analysis of the sample file is managed automatically by the sandbox scheduling management module, which starts the sandbox, uploads the sample file to the Guest OS unit, and runs the sample in the Guest OS unit; the sandbox monitoring module monitors the execution or loading of the sample and produces the API sequence report of the sample file; the network packet monitoring program monitors the network packets generated by the Guest OS unit and produces the network packet report of the sample file; after the sample finishes normally or is ended by a timeout, if the sample is a non-EXE file, snapshots of the registry and the file system are compared to produce registry and file snapshot comparison reports; these reports are transferred to the sandbox scheduling management module together with the files generated while the sample executed, and they constitute the raw data for behavior abstraction of a malicious program sample; step (3), after dynamic analysis of the malicious program sample has finished, a series of reports is available, and these reports are processed by the behavior abstraction module to obtain the sample's behavior; step (4), after behavior abstraction, the behavior information of the sample is obtained and stored in the database; as the number of training samples grows, the database accumulates a large amount of sample behavior information; to determine whether a file submitted by a user for analysis is a malicious program, and which kind of malicious program it is, a classification model must first be built; the classification model is built from the behavior information of the training samples using ensemble learning, in which several sub-classifiers are trained and vote on the classification of the same sample to improve classification accuracy in the multi-class setting; step (5), the malicious program behavior report and the detection and classification results are output.
  10. The method for the automated analysis, detection and classification of malicious program behavior according to claim 2, characterized in that behavior abstraction comprises three main steps: (1) raw data cleaning; (2) behavior abstraction; (3) behavior storage.
     The API function calls removed during raw data cleaning fall into the following categories: (1) of N consecutive API calls whose call names and call parameters are identical, only the first is retained and the following N-1 API calls are removed; (2) if a function is found to use, as an input parameter, a handle that does not appear in the global handle information table, the function call is considered invalid; (3) certain handle values denote invalid handles, and using such handles is meaningless, so the function call is considered invalid.
     Behavior abstraction analyzes all API call records in the file one by one. For each captured API call, the following cases can arise: (1) the function is irrelevant to behavior abstraction, that is, it is not a key function; such functions typically perform no operation on any critical part of the system and can be skipped directly; (2) the function can form an auxiliary behavior: if the function is a key function and can form an auxiliary behavior, its parameters are obtained and processed, and the resulting auxiliary behavior is temporarily stored in the database; (3) the function can form an abstract behavior: if the function is a key function and can form an abstract behavior, its parameters are obtained and processed, and the resulting abstract behavior is stored in the database.
     After the whole file has been analyzed, these abstract behaviors are expanded into decision vectors according to predetermined expansion rules.
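
The three raw data cleaning rules in claim 10 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `ApiCall` record layout, the `clean_calls` function, the use of a plain set as the global handle information table, and the two values in `INVALID_HANDLES` are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional

# Assumption: handle values treated as invalid (NULL and a 32-bit INVALID_HANDLE_VALUE).
INVALID_HANDLES = {0, 0xFFFFFFFF}


class ApiCall(NamedTuple):
    """Hypothetical record for one captured API call."""
    name: str
    args: tuple
    handle_in: Optional[int] = None   # handle passed in as a parameter, if any
    handle_out: Optional[int] = None  # handle returned by the call, if any


def clean_calls(calls):
    """Apply the three cleaning rules to a list of captured API calls."""
    known_handles = set()  # stands in for the global handle information table
    cleaned = []
    previous_key = None
    for call in calls:
        key = (call.name, call.args)
        is_repeat = key == previous_key
        previous_key = key
        # Rule 1: of N consecutive identical calls, keep only the first.
        if is_repeat:
            continue
        if call.handle_in is not None:
            # Rule 3: the handle value itself denotes an invalid handle.
            if call.handle_in in INVALID_HANDLES:
                continue
            # Rule 2: the handle never appeared in the handle table.
            if call.handle_in not in known_handles:
                continue
        # Record handles produced by valid calls so later calls can use them.
        if call.handle_out is not None and call.handle_out not in INVALID_HANDLES:
            known_handles.add(call.handle_out)
        cleaned.append(call)
    return cleaned


trace = [
    ApiCall("CreateFileW", ("a.txt",), handle_out=0x10),
    ApiCall("WriteFile", (0x10, "data"), handle_in=0x10),
    ApiCall("WriteFile", (0x10, "data"), handle_in=0x10),   # removed by rule 1
    ApiCall("WriteFile", (0x10, "data"), handle_in=0x10),   # removed by rule 1
    ApiCall("ReadFile", (0x99,), handle_in=0x99),           # removed by rule 2
    ApiCall("CloseHandle", (0xFFFFFFFF,), handle_in=0xFFFFFFFF),  # removed by rule 3
]
print([c.name for c in clean_calls(trace)])  # ['CreateFileW', 'WriteFile']
```

The sketch compares each call with the immediately preceding call in the raw trace, so only consecutive repeats are dropped; an identical call that reappears later in the trace is kept.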
CN 201210408358 2012-10-14 2012-10-14 Automated malware behavior analysis, detection and classification system and method CN102930210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201210408358 CN102930210B (en) 2012-10-14 2012-10-14 Automated malware behavior analysis, detection and classification system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201210408358 CN102930210B (en) 2012-10-14 2012-10-14 Automated malware behavior analysis, detection and classification system and method

Publications (2)

Publication Number Publication Date
CN102930210A (en) 2013-02-13
CN102930210B (en) 2015-11-25

Family

ID=47645007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201210408358 CN102930210B (en) 2012-10-14 2012-10-14 Automated malware behavior analysis, detection and classification system and method

Country Status (1)

Country Link
CN (1) CN102930210B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103152224A (en) * 2013-03-21 2013-06-12 中国科学院信息工程研究所 Method and system for dynamically monitoring analog network in real time
CN103150509A (en) * 2013-03-15 2013-06-12 长沙文盾信息技术有限公司 Virus detection system based on virtual execution
CN103368965A (en) * 2013-07-18 2013-10-23 北京随方信息技术有限公司 Working method for mapping network safety norms to attribution requirements corresponding to network
CN103679032A (en) * 2013-12-13 2014-03-26 北京奇虎科技有限公司 Method and device for preventing malicious software
CN103927484A (en) * 2014-04-21 2014-07-16 西安电子科技大学宁波信息技术研究院 Malicious program behavior capture method based on Qemu
CN104252594A (en) * 2013-06-27 2014-12-31 贝壳网际(北京)安全技术有限公司 Virus detection method and device
CN104252447A (en) * 2013-06-27 2014-12-31 贝壳网际(北京)安全技术有限公司 Method and device for analyzing file behavior
CN104715190A (en) * 2015-02-03 2015-06-17 中国科学院计算技术研究所 Method and system for monitoring program execution path on basis of deep learning
CN105488414A (en) * 2015-09-25 2016-04-13 深圳市安之天信息技术有限公司 Method and system for preventing malicious codes from detecting virtual environments
WO2016078323A1 (en) * 2014-11-20 2016-05-26 华为技术有限公司 Malware detection method and apparatus
WO2016127037A1 (en) * 2015-02-06 2016-08-11 Alibaba Group Holding Limited Method and device for identifying computer virus variants
CN105893848A (en) * 2016-04-27 2016-08-24 南京邮电大学 Precaution method for Android malicious application program based on code behavior similarity matching
CN106161344A (en) * 2014-09-30 2016-11-23 瞻博网络公司 Identifying Evasive Malicious Object Based On Behavior Delta
CN106384047A (en) * 2016-08-26 2017-02-08 青岛天龙安全科技有限公司 APP detection unknown pattern collection and judging method
US9769189B2 (en) 2014-02-21 2017-09-19 Verisign, Inc. Systems and methods for behavior-based automated malware analysis and classification
WO2018036321A1 (en) * 2016-08-24 2018-03-01 中兴通讯股份有限公司 Email viewing method, and user terminal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030200464A1 (en) * 2002-04-17 2003-10-23 Computer Associates Think, Inc. Detecting and countering malicious code in enterprise networks
CN101226570A (en) * 2007-09-05 2008-07-23 江启煜 Method for monitoring and eliminating generalized unknown virus
CN101458630A (en) * 2008-12-30 2009-06-17 中国科学院软件研究所 Self-modifying code identification method based on hardware emulator
CN101782954A (en) * 2009-01-20 2010-07-21 联想(北京)有限公司 Computer and abnormal progress detection method
CN102254111A (en) * 2010-05-17 2011-11-23 北京知道创宇信息技术有限公司 Malicious site detection method and device
CN102521206A (en) * 2011-12-16 2012-06-27 天津大学 Lead optimization method for SVM-RFE (support vector machine-recursive feature elimination) based on ensemble learning thought

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150509A (en) * 2013-03-15 2013-06-12 长沙文盾信息技术有限公司 Virus detection system based on virtual execution
CN103150509B (en) * 2013-03-15 2015-10-28 长沙文盾信息技术有限公司 A virus detection system based on virtual execution
CN103152224B (en) * 2013-03-21 2015-12-02 中国科学院信息工程研究所 A real-time dynamic monitoring of network simulation method and system
CN103152224A (en) * 2013-03-21 2013-06-12 中国科学院信息工程研究所 Method and system for dynamically monitoring analog network in real time
CN104252447A (en) * 2013-06-27 2014-12-31 贝壳网际(北京)安全技术有限公司 Method and device for analyzing file behavior
CN104252594A (en) * 2013-06-27 2014-12-31 贝壳网际(北京)安全技术有限公司 Virus detection method and device
CN103368965A (en) * 2013-07-18 2013-10-23 北京随方信息技术有限公司 Working method for mapping network safety norms to attribution requirements corresponding to network
CN103679032B (en) * 2013-12-13 2017-05-17 北京奇虎科技有限公司 Method and apparatus against malicious software
CN103679032A (en) * 2013-12-13 2014-03-26 北京奇虎科技有限公司 Method and device for preventing malicious software
US9769189B2 (en) 2014-02-21 2017-09-19 Verisign, Inc. Systems and methods for behavior-based automated malware analysis and classification
CN103927484A (en) * 2014-04-21 2014-07-16 西安电子科技大学宁波信息技术研究院 Malicious program behavior capture method based on Qemu
CN106161344B (en) * 2014-09-30 2018-03-30 瞻博网络公司 Based on the behavior of malicious objects incremental identity escape
US9922193B2 (en) 2014-09-30 2018-03-20 Juniper Networks, Inc. Identifying an evasive malicious object based on a behavior delta
CN106161344A (en) * 2014-09-30 2016-11-23 瞻博网络公司 Identifying Evasive Malicious Object Based On Behavior Delta
WO2016078323A1 (en) * 2014-11-20 2016-05-26 华为技术有限公司 Malware detection method and apparatus
CN105678164A (en) * 2014-11-20 2016-06-15 华为技术有限公司 Method and device for detecting malicious software
CN105678164B (en) * 2014-11-20 2018-08-14 华为技术有限公司 A method and apparatus for detecting malware
CN104715190A (en) * 2015-02-03 2015-06-17 中国科学院计算技术研究所 Method and system for monitoring program execution path on basis of deep learning
CN104715190B (en) * 2015-02-03 2018-02-06 中国科学院计算技术研究所 Method and system for monitoring program execution path based on the depth of learning
WO2016127037A1 (en) * 2015-02-06 2016-08-11 Alibaba Group Holding Limited Method and device for identifying computer virus variants
CN105488414A (en) * 2015-09-25 2016-04-13 深圳市安之天信息技术有限公司 Method and system for preventing malicious codes from detecting virtual environments
CN105893848A (en) * 2016-04-27 2016-08-24 南京邮电大学 Precaution method for Android malicious application program based on code behavior similarity matching
WO2018036321A1 (en) * 2016-08-24 2018-03-01 中兴通讯股份有限公司 Email viewing method, and user terminal
CN106384047A (en) * 2016-08-26 2017-02-08 青岛天龙安全科技有限公司 APP detection unknown pattern collection and judging method

Also Published As

Publication number Publication date Type
CN102930210B (en) 2015-11-25 grant

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C53 Correction of patent for invention or patent application
COR Change of bibliographic data

Free format text: CORRECT: APPLICANT; FROM: JIANGSU JINLING SCIENCE + TECHNOLOGY GROUP CORPORATION TO: JIANGSU JINLING SCIENCE + TECHNOLOGY GROUP CO., LTD.

Free format text: CORRECT: INVENTOR; FROM: ZOU YAN LIU JIANGANG MIAO QIGUANG CAO YING XIE GUOSHENG HUANG YOUCHENG LIUJIACHEN ZHENG CHUNYANG TO: ZOU YAN LIU JIANGANG MIAO QIGUANG SONG JIANFENG XIE GUOSHENG CAO YING HUANG YOUCHENG LIU JIACHEN ZHENG CHUNYANG

C14 Grant of patent or utility model