CN100504903C - A Malicious Code Automatic Identification Method - Google Patents
- Publication number
- CN100504903C (application numbers CNB2007101219336A, CN200710121933A)
- Authority
- CN
- China
- Prior art keywords
- function
- component
- malicious
- functions
- components
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
Description
Technical Field
The invention belongs to the field of automatic malicious code analysis. Specifically, it relates to a method that uses reverse engineering and code similarity comparison to speed up the analysis of malicious code.
Background Art
Malicious code is now ubiquitous on the Internet and seriously threatens network security. Implementing malicious code used to be relatively difficult, but as the Internet has spread, more and more websites discuss malicious code implementation techniques, and the source code of basic malicious functions can be obtained directly online. This has fueled a proliferation of malicious code variants. Code reuse across variants is pronounced: many new samples adopt the implementation techniques of earlier malicious code or reuse existing source code outright, and only a few add new functions or new implementations. The number of malicious programs and their variants has grown explosively, and traditional manual analysis can no longer meet the demand for rapid analysis.
There are currently two main classes of automatic malicious code analysis: dynamic analysis and static analysis. Dynamic analysis executes the program under analysis in a safe environment and observes its behavior and results. It can uncover some of the behaviors in malicious code, but it often cannot exercise every path, because the environment may not meet the code's runtime requirements or the trigger conditions of a malicious behavior. The monitoring results also need further analysis and aggregation. Static analysis mainly consists of pattern matching and semantic analysis: it checks whether a program contains a given behavioral or semantic pattern in order to decide whether the program contains a specific malicious behavior. This establishes a correspondence between malicious behaviors and program fragments, but it is hard to build models that are both robust and highly discriminative.
With this in mind, the invention proposes a static method for analyzing malicious behaviors in malicious code that places relatively low demands on modeling. Based on the characteristics of binary programs and of malicious code implementations, it splits a program into components and then analyzes and matches them. This automatic method works particularly well on malicious programs such as botnet programs, and can quickly discover malicious behaviors in the samples under analysis.
A botnet is a collection of compromised machines, known as "bots". A botnet usually includes a command-and-control framework. Bots compromise hosts on the network through worms, Trojans, backdoors, and similar tools, lie dormant there, and then accept remote control from the source of the botnet, the bot herder, through a remote control module. Botnet programs usually also include propagation modules that scan the network, either automatically or on the herder's command, break into other vulnerable hosts, and leave copies of themselves. Bot herders use bots for malicious goals such as stealing users' bank card information or launching denial-of-service attacks against specific servers. Functional modules commonly found in botnet programs include a command-and-control module, a replication/propagation module, a host control module, a file download/upload module, an information-theft module, and an anti-detection/anti-analysis module; the structure is shown in Figure 1. (For the structure and mechanisms of botnets, see P. Barford and V. Yegneswaran, "An Inside Look at Botnets", Special Workshop on Malware Detection, Advances in Information Security, 2006.)
Summary of the Invention
The purpose of the invention is to provide a method for automatically identifying malicious code that uses reverse engineering and code similarity comparison to speed up analysis, improving both the efficiency and the coverage of malicious code analysis.
This purpose is achieved by the following technical solution:
A method for automatically identifying malicious code, comprising the following steps:
1) Parse the executable program sample to be analyzed to obtain the function nodes and function call information of the program;
2) Using these function nodes and function call information, extract each component head function together with all functions it calls directly or indirectly, yielding the components to be analyzed, each identified by its head function;
3) Compare each component to be analyzed, in turn, with each known malicious behavior component in the known malicious behavior component library, until a component to be analyzed is found to be similar to a known malicious behavior component or all components to be analyzed have been compared;
4) If a component to be analyzed is similar to a known malicious behavior component in the library, determine that the program contains malicious behavior.
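Step 2 amounts to a reachability query on the call graph. The following is a minimal sketch, assuming the call graph is already available as a dictionary mapping each caller to the set of functions it calls; the function names in the example graph are hypothetical and do not come from the patent.

```python
from collections import deque


def extract_component(call_graph, head):
    """Return the head function plus every function it calls directly or indirectly."""
    seen = {head}
    queue = deque([head])
    while queue:
        fn = queue.popleft()
        # Functions without an entry (e.g. system APIs) are treated as leaves.
        for callee in call_graph.get(fn, ()):
            if callee not in seen:  # also guards against recursive calls
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph: caller -> set of callees.
call_graph = {
    "main": {"parse_cmd"},
    "parse_cmd": {"ddos_start", "keylog_start"},
    "ddos_start": {"build_packet", "send"},
    "build_packet": {"build_packet"},  # recursive call
    "keylog_start": {"GetAsyncKeyState"},
}

component = extract_component(call_graph, "ddos_start")
```

Here `component` holds the cluster identified by the head function `ddos_start`; the `seen` set keeps the traversal from looping on recursive calls.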
Further, drawing on domain knowledge of malicious code, the invention proposes a way to represent and extract components in binary executables. Based on the characteristics of binary malicious code, a malicious behavior component is represented as a function call cluster; the function in the cluster that directly or indirectly calls all other functions in the cluster is called the component head function.
The invention uses three methods to identify component head functions:
[1] Component identification based on program dispatch: identify the component head functions invoked by the dispatcher.
Program dispatch is the process of mapping a message to a specific sequence of code. Malicious code, and botnet programs in particular, usually invokes functional components according to the commands it receives. Once the dispatching function has been identified, the malicious behavior components it calls can be extracted easily.
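As a rough illustration of dispatch-based identification, the sketch below treats every function called by a known dispatcher as a candidate head function, after filtering out library helpers. The dispatcher name `irc_parseline` is the one the patent cites for rbot and sdbot samples; the callee names and the `excluded` set are hypothetical.

```python
def heads_from_dispatcher(call_graph, dispatcher, excluded=frozenset()):
    """Candidate head functions: direct callees of the dispatcher, minus excluded names."""
    return sorted(
        callee
        for callee in call_graph.get(dispatcher, ())
        if callee != dispatcher and callee not in excluded
    )


# Hypothetical call graph around a direct-dispatch function.
call_graph = {
    "irc_parseline": {"cmd_scan", "cmd_keylog", "strcmp"},
    "cmd_scan": {"connect"},
    "cmd_keylog": {"GetAsyncKeyState"},
}

heads = heads_from_dispatcher(call_graph, "irc_parseline", excluded={"strcmp"})
```

For a registration-style dispatcher, the same idea would apply to the set of functions stored in the dispatch table rather than to direct callees.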
[2] Component identification based on key APIs: identify the component head functions that directly or indirectly call a set of key APIs.
A functional component in malicious code usually has to carry out some malicious task, such as killing antivirus processes, logging keystrokes, or attacking a victim website. To do so, it typically has to call a set of key system APIs, directly or indirectly. Conversely, potential components can be extracted by looking for functions in which these APIs converge. The API sets are therefore important for extracting component functions; they can be defined in a library, or collected from known malicious components obtained by other methods.
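One way to read "functions in which these APIs converge" is sketched below: a function is a candidate head if every API in a key set is reachable from it, and none of its direct callees already covers the whole set. The keystroke-logging API set is the one listed later in the patent; the lowest-covering-function rule and the function names in the graph are assumptions made for illustration.

```python
# API set the patent associates with keystroke logging.
KEYLOG_APIS = {"GetAsyncKeyState", "GetForegroundWindow", "GetKeyState"}


def reachable(call_graph, start):
    """All functions called directly or indirectly from start."""
    seen, stack = set(), [start]
    while stack:
        fn = stack.pop()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def heads_by_key_apis(call_graph, key_apis):
    """Lowest functions in the graph from which every key API is reachable."""
    covering = {fn for fn in call_graph if key_apis <= reachable(call_graph, fn)}
    return sorted(
        fn
        for fn in covering
        # Assumption: drop a function if one of its callees already covers the set.
        if not any(c in covering for c in call_graph[fn] if c != fn)
    )


# Hypothetical call graph.
call_graph = {
    "main": {"init", "keylog"},
    "init": {"CreateFileA"},
    "keylog": {"poll_keys", "GetForegroundWindow"},
    "poll_keys": {"GetAsyncKeyState", "GetKeyState"},
}

heads = heads_by_key_apis(call_graph, KEYLOG_APIS)
```

In this graph both `main` and `keylog` reach all three APIs, but only `keylog` is reported, because `main` merely inherits the coverage through it.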
[3] Component identification based on the multiple-call principle: identify the component head functions that are called repeatedly from different functions in the program.
A function cluster that is called repeatedly, and is neither a library function nor a system API, is generally a user-written module that performs an independent function. This property can also be used to extract malicious components. The information required is the set of calling and called functions and the corresponding function call graph. If function A calls functions B and C, and B and C both call function D, then D can be marked as a potentially reusable module because it is called more than once, as shown in Figure 2. To avoid mistaking library functions and system APIs for reusable modules, exclusion rules must be defined in advance so that repeated calls to those functions are ignored.
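The A/B/C/D example above can be expressed as a count of distinct callers per function. This is a minimal sketch under the assumption that the exclusion rules are given as a plain set of library and API names; the parameter `min_callers` is hypothetical and not part of the patent.

```python
from collections import defaultdict


def repeatedly_called(call_graph, excluded=frozenset(), min_callers=2):
    """Functions called from at least min_callers distinct callers, minus excluded names."""
    callers = defaultdict(set)
    for caller, callees in call_graph.items():
        for callee in callees:
            if callee != caller:  # ignore direct recursion
                callers[callee].add(caller)
    return sorted(
        fn
        for fn, who in callers.items()
        if len(who) >= min_callers and fn not in excluded
    )


# The example from the text, plus a library function called from two places.
call_graph = {
    "A": {"B", "C"},
    "B": {"D", "strlen"},
    "C": {"D", "strlen"},
    "D": set(),
}

candidates = repeatedly_called(call_graph, excluded={"strlen"})
```

Without the exclusion set, `strlen` would also be reported, which is exactly the misjudgment the exclusion rules are meant to prevent.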
Further, the invention proposes a "weight-threshold" method for measuring the similarity of binary program components by inexact matching of function calls. It detects known malicious components by comparing the component function clusters extracted from a new sample with the function clusters in the known malicious component library.
Further, the invention proposes a method for building the known malicious behavior component library, comprising the following steps:
1) Parse known malicious program samples to obtain the function nodes and function call information of each program;
2) Based on the characteristics of known malicious programs, extract from the functions obtained in step 1) each component head function together with all functions it calls directly or indirectly, yielding components identified by their head functions;
3) Analyze these components, identify the malicious behavior components, and store them in the known malicious behavior component library in a predefined format.
The invention has the following advantages:
1. The method is a static malicious code analysis method. Compared with dynamic analysis, it covers more of the program. Compared with model checking and semantic analysis, it does not require a precise model of malicious behavior; it builds a malicious component library instead. Known malicious components are extracted largely automatically from the structural characteristics of malicious programs, which is much easier than building a malicious behavior model;
2. For a sample in which malicious components are detected, the components obtained by splitting it can be divided into "known" and "unknown", which speeds up analysis. For malicious components that have already been analyzed, the system can generate the analysis report automatically and avoid repeated work. A component that has no match in the known malicious component library probably contains new malicious behavior; such components can be singled out for focused analysis, and the results can be used to extend the known malicious component library;
3. The similarity relations between malicious components can be used to classify analyzed samples, to trace the origin of new samples, and to build a genealogy of malicious code samples based on component similarity.
Brief Description of the Drawings
The invention is described in further detail below with reference to the drawings and specific embodiments:
Figure 1: schematic diagram of the program structure of a botnet program
Figure 2: schematic diagram of a "component" as defined by the invention
Figure 3: flowchart of a specific embodiment
Detailed Description
In the invention, a component is defined as a program module made up of a cluster of related functions that together perform a specific function. The cluster contains one head function, the component head, which directly or indirectly calls all other functions in the cluster. Figure 2 is a schematic diagram of a component as defined here.
The invention uses component extraction and binary code similarity comparison techniques from the field of reverse engineering. Component extraction is an important research topic in software engineering whose main goal is to identify reusable components in legacy code. Methods for extracting and evaluating components (see Luo Jing, Zhang Lu, Sun Jiasu, "A Survey of Component Extraction Techniques", Computer Science, Vol. 32, December 2005) draw on domain knowledge, structure, and component metrics, and apply equally to malicious code analysis. Binary code similarity comparison (see E. Carrera and G. Erdelyi, "Digital genome mapping: Advanced binary malware analysis", Proceedings of the 15th Virus Bulletin International Conference (VB2004), pp. 187-197, 2004) is widely studied in software engineering and is increasingly applied to malicious code research. API sequences, function call graphs, control flow graphs, and program dependence graphs have all been used for binary code similarity comparison.
Figure 3 is a flowchart of a specific embodiment of the invention. The procedure is as follows:
1. Build the malicious component library
[1] Parse a set of known malicious program samples with a disassembler and, using the function signature information of the program, extract its function nodes and function call information (for the extraction method, see Kris Kaspersky, "Hacker Disassembling Uncovered", translated by Tan Mingjin, Publishing House of Electronics Industry, 2005, p. 85);
[2] Extract the head functions of malicious components based on the dispatch characteristics of known malicious code. Program dispatch is commonly implemented in two ways. The first is direct dispatch: the dispatch function parses the received message and then calls the corresponding function directly. The second is registered dispatch: each component registers itself in a global dispatch table, and the dispatcher looks up the table according to the parsed command and calls the corresponding function. Here, the known malicious component head functions are taken from the functions called by the dispatch function irc_parseline() in the rbot and sdbot samples, and from the functions registered through g_cMainCtrl.m_cCommands in the agobot samples;
[3] Extract the known malicious components using the head functions and the relations between the function nodes in the program. After annotating each known malicious component with its sample origin and a description of its function, save it to the known malicious component library.
2. Extract candidate components from the executable sample to be analyzed
[1] Parse the unknown sample with a disassembler and, using the function signature information of the program, extract its function nodes and function call information;
[2] Extract the candidate components, locating malicious component head functions by key API convergence and the multiple-call principle. For example, killing an antivirus program usually requires calling an API set such as (AdjustTokenPrivileges, CreateToolhelp32Snapshot, LookupPrivilegeValueA, Module32First, OpenProcess, Process32First, Process32Next, TerminateProcess); logging keystrokes usually requires calling an API set such as (GetAsyncKeyState, GetForegroundWindow, GetKeyState);
3. Compare the candidate components with the components in the known malicious behavior component library. If a candidate matches a known malicious component, the sample is judged to be malicious code that contains a known malicious behavior component of a particular family with a particular purpose, and an analysis report for the sample is generated automatically from this information. Suspected malicious behavior components are also singled out for in-depth analysis, and the results are used to extend the known malicious behavior component library.
Component similarity comparison identifies known malicious components by comparing the component function clusters extracted from the new sample with the function clusters in the known malicious component library. If the candidate components of the new sample contain a function whose similarity to the head function of a known malicious component exceeds a preset threshold (set to 0.8 here), the new sample is considered to contain that malicious component. Component detection thus reduces to function comparison. The invention proposes a "weight-threshold" method for inexact matching of function call graphs, consisting of the following four steps:
1. Topologically sort the function call graph of the target sample and the function call graph of the known malicious component (see Xu Zhuoqun, Yang Dongqing, Tang Shiwei, Zhang Ming, "Data Structures and Algorithms", Higher Education Press, Chapter 6, p. 183, 2004). The function nodes of each call graph are arranged in an ordered sequence such that no function calls a function node that appears after it in the sequence, recursive calls excepted;
2. Compute the node weight W(F): following the topological order, compute the node weight of every function in the target sample and in the known malicious component. If a function node calls no other function, its weight is 1; if it calls other functions (recursive calls excluded), its weight is the total weight of the functions it calls, plus 1;
3. Compute the weight- and threshold-based similarity S(F, G): following the topological order, compute the similarity between every function F in the sample under analysis and every function G inside the known component. There are three cases:
a) If both functions are APIs, the similarity is 1 if they are the same API and 0 otherwise;
b) If one function is an API and the other is not, the similarity is 0;
c) If neither function is an API, the similarity is defined as the maximum of the weighted sum of similarities over pairings of the functions that the two call; if this maximum is below a preset threshold (for example 0.8), the similarity of the two functions is defined as 0. The calculation is as follows:
i. Assume F has the larger node weight of the two, and denote that weight by T;
ii. Let the set of functions called by F be $\{f_0, f_1, \ldots, f_m\}$ and the set of functions called by G be $\{g_0, g_1, \ldots, g_n\}$, and let $r = \min(m, n)$. Because similarities are computed along the topological order, the similarity between every $f_i$ and $g_j$ has already been computed, i.e. $S(f_i, g_j)$ is known;
iii. For a sequence $G_k = \{g_{k_0}, g_{k_1}, \ldots, g_{k_r}\}$ (where $0 \le k_i \le r$, and $k_i \ne k_j$ if $i \ne j$), the candidate weighted similarity is $S' = \left(1 + \sum_{i=0}^{r} W(f_i)\, S(f_i, g_{k_i})\right) / T$;
iv. Let $G_k' = \{g_{k_0}', g_{k_1}', \ldots, g_{k_r}'\}$ be a sequence that maximizes $S'$, and denote this maximum by $S_{max}$;
v. If $S_{max}$ is greater than the preset threshold (for example 0.8), the similarity of F and G is $S_{max}$; otherwise the similarity of F and G is 0.
4. For the head function F′ of a known malicious component, if the target sample contains a function G′ such that S(F′, G′) exceeds the preset threshold (for example 0.8), the target sample is considered to contain that malicious component. In experiments, a threshold of 0.8 gave the best results.
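The four steps above can be sketched end to end as follows. This is an illustrative reading rather than the patent's implementation: call graphs are assumed to be dictionaries mapping a function name to a list of distinct callees, the weight of a function is taken as 1 plus the sum of its callees' weights, the pairing search simply tries every one-to-one pairing of callees (factorial cost, acceptable only for small fan-out), and all function names in the example are hypothetical.

```python
from itertools import permutations


def topo_order(graph):
    """Callees-first order (DFS post-order). Recursive back edges end up 'later'."""
    order, visited = [], set()

    def visit(fn):
        visited.add(fn)
        for callee in graph.get(fn, ()):
            if callee not in visited:
                visit(callee)
        order.append(fn)

    for fn in graph:
        if fn not in visited:
            visit(fn)
    return order


def callees(graph, pos, fn):
    """Callees of fn, skipping recursive calls (targets not earlier in the order)."""
    return [c for c in graph.get(fn, ()) if pos[c] < pos[fn]]


def node_weights(graph, order, pos):
    w = {}
    for fn in order:  # callees are always weighted before their callers
        w[fn] = 1 + sum(w[c] for c in callees(graph, pos, fn))
    return w


def best_pairing(heavy, light, w, score):
    """Max over one-to-one pairings of sum(w[h] * score(h, l))."""
    if len(heavy) <= len(light):
        return max(sum(w[h] * score(h, l) for h, l in zip(heavy, p))
                   for p in permutations(light, len(heavy)))
    return max(sum(w[h] * score(h, l) for h, l in zip(p, light))
               for p in permutations(heavy, len(light)))


def similarity_table(sample, known, apis, threshold=0.8):
    """sim[(f, g)] for every sample function f and known-component function g."""
    so, ko = topo_order(sample), topo_order(known)
    sp = {f: i for i, f in enumerate(so)}
    kp = {g: i for i, g in enumerate(ko)}
    sw, kw = node_weights(sample, so, sp), node_weights(known, ko, kp)
    sim = {}
    for f in so:
        for g in ko:
            if f in apis or g in apis:  # cases a) and b)
                sim[f, g] = 1.0 if f == g else 0.0
                continue
            fs, gs = callees(sample, sp, f), callees(known, kp, g)
            if sw[f] >= kw[g]:  # case c): weights come from the heavier function
                total, t = best_pairing(fs, gs, sw, lambda a, b: sim[a, b]), sw[f]
            else:
                total, t = best_pairing(gs, fs, kw, lambda a, b: sim[b, a]), kw[g]
            s = (1 + total) / t
            sim[f, g] = s if s > threshold else 0.0
    return sim


def contains_component(sample, known, head, apis, threshold=0.8):
    """Step 4: does any sample function match the known head function?"""
    sim = similarity_table(sample, known, apis, threshold)
    return any(s > threshold for (f, g), s in sim.items() if g == head)


# Hypothetical data: a known keylogger component and a stripped sample.
apis = {"GetForegroundWindow", "GetAsyncKeyState", "GetKeyState", "ExitProcess"}
known = {"keylog": ["poll", "GetForegroundWindow"],
         "poll": ["GetAsyncKeyState", "GetKeyState"]}
sample = {"main": ["sub_401000", "ExitProcess"],
          "sub_401000": ["sub_401100", "GetForegroundWindow"],
          "sub_401100": ["GetAsyncKeyState", "GetKeyState"]}

found = contains_component(sample, known, "keylog", apis)
```

In this example `sub_401000` has the same call structure as `keylog`, so it scores (1 + 3·1 + 1·1) / 5 = 1 and the component is reported as present even though the symbol names differ. A production version would likely replace the brute-force pairing with an assignment algorithm.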
As described above, the invention exploits the reuse of functional code in malicious programs and proposes a new method, based on binary component analysis, for automatically identifying malicious code. The method has been applied to botnet program samples captured by the Peking University honeynet project group of the China Honeynet Alliance (http://www.icst.pku.edu.cn/honeynetweb/index.htm), where it greatly sped up the analysis of malicious code samples and achieved the purpose of the invention. The invention is practical and has good prospects for wider application.
Although specific embodiments and drawings are disclosed to illustrate the invention and to help readers understand and implement it, those skilled in the art will appreciate that various substitutions, changes, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to what is disclosed in the preferred embodiments and drawings; the scope of protection is defined by the claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2007101219336A CN100504903C (en) | 2007-09-18 | 2007-09-18 | A Malicious Code Automatic Identification Method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2007101219336A CN100504903C (en) | 2007-09-18 | 2007-09-18 | A Malicious Code Automatic Identification Method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101140611A CN101140611A (en) | 2008-03-12 |
CN100504903C true CN100504903C (en) | 2009-06-24 |
Family
ID=39192559
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2007101219336A Expired - Fee Related CN100504903C (en) | 2007-09-18 | 2007-09-18 | A Malicious Code Automatic Identification Method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100504903C (en) |
Families Citing this family (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101388056B (en) * | 2008-10-20 | 2010-06-02 | 成都市华为赛门铁克科技有限公司 | A method, system and device for preventing malicious programs |
CN102035793B (en) * | 2009-09-28 | 2014-05-07 | 成都市华为赛门铁克科技有限公司 | Botnet detecting method, device and network security protective equipment |
CN102054149B (en) * | 2009-11-06 | 2013-02-13 | 中国科学院研究生院 | Method for extracting malicious code behavior characteristic |
US9449175B2 (en) | 2010-06-03 | 2016-09-20 | Nokia Technologies Oy | Method and apparatus for analyzing and detecting malicious software |
CN102034042B (en) * | 2010-12-13 | 2012-10-03 | 四川大学 | Novel unwanted code detecting method based on characteristics of function call relationship graph |
CN101984450B (en) * | 2010-12-15 | 2012-10-24 | 北京安天电子设备有限公司 | Malicious code detection method and system |
CN102291397A (en) * | 2011-08-04 | 2011-12-21 | 中国科学院计算技术研究所 | Bot network tracking method |
WO2013088461A1 (en) * | 2011-12-12 | 2013-06-20 | 株式会社 日立製作所 | Software analysis program and software analysis system |
CN103177214B (en) * | 2011-12-23 | 2016-02-10 | 宇龙计算机通信科技(深圳)有限公司 | The detection method of Malware, system and communication terminal |
CN103778371A (en) * | 2012-10-22 | 2014-05-07 | 腾讯科技(深圳)有限公司 | Plug-in installation monitoring method and terminal |
CN104252594B (en) * | 2013-06-27 | 2019-04-02 | 贝壳网际(北京)安全技术有限公司 | virus detection method and device |
CN103473507B (en) * | 2013-09-25 | 2016-03-30 | 西安交通大学 | A kind of Android malicious code detecting method |
CN103914657B (en) * | 2014-04-16 | 2016-10-19 | 南京大学 | A malicious program detection method based on function features |
CN104021343B (en) * | 2014-05-06 | 2016-08-24 | 南京大学 | A kind of rogue program based on heap access module monitoring method and system |
CN104021346B (en) * | 2014-06-06 | 2017-02-22 | 东南大学 | Method for detecting Android malicious software based on program flow chart |
JP6106340B2 (en) * | 2014-06-06 | 2017-03-29 | 日本電信電話株式会社 | Log analysis device, attack detection device, attack detection method and program |
CN105488409B (en) * | 2014-12-31 | 2018-04-24 | 哈尔滨安天科技股份有限公司 | A kind of method and system for detecting malicious code family's mutation and new family |
CN104715190B (en) * | 2015-02-03 | 2018-02-06 | 中国科学院计算技术研究所 | A kind of monitoring method and system of the program execution path based on deep learning |
CN104636665B (en) * | 2015-02-03 | 2018-01-05 | 南京理工大学 | A kind of description of Android application programs and matching process |
RU2618947C2 (en) * | 2015-06-30 | 2017-05-11 | Закрытое акционерное общество "Лаборатория Касперского" | Method of preventing program operation comprising functional undesirable for user |
CN105046152B (en) * | 2015-07-24 | 2018-01-26 | 四川大学 | Malware Detection Method Based on Function Call Graph Fingerprint |
CN106557695B (en) * | 2015-09-25 | 2019-05-10 | 卓望数码技术(深圳)有限公司 | A kind of malicious application detection method and system |
CN105740711B (en) * | 2016-01-29 | 2018-08-31 | 哈尔滨工业大学深圳研究生院 | A kind of malicious code detecting method and system based on kernel objects behavior ontology |
CN106341282A (en) * | 2016-11-10 | 2017-01-18 | 广东电网有限责任公司电力科学研究院 | Malicious code behavior analyzer |
CN106709350B (en) * | 2016-12-30 | 2020-01-14 | 腾讯科技(深圳)有限公司 | Virus detection method and device |
WO2019091028A1 (en) * | 2017-11-10 | 2019-05-16 | 华为技术有限公司 | Method and terminal for application software malicious behavior dynamic alarm |
CN108040064A (en) * | 2017-12-22 | 2018-05-15 | 北京知道创宇信息技术有限公司 | Data transmission method, device, electronic equipment and storage medium |
CN108182364B (en) * | 2017-12-29 | 2022-07-15 | 安天科技集团股份有限公司 | Method and system for identifying attack homology based on call dependency relationship |
CN110555305A (en) * | 2018-05-31 | 2019-12-10 | 武汉安天信息技术有限责任公司 | Malicious application tracing method based on deep learning and related device |
CN109753796B (en) * | 2018-12-07 | 2021-06-08 | 广东技术师范学院天河学院 | A kind of big data computer network security protection device and using method |
CN109683888A (en) * | 2018-12-19 | 2019-04-26 | 睿驰达新能源汽车科技(北京)有限公司 | A kind of multiplexing method and reusable business module of business module |
CN110765457A (en) * | 2018-12-24 | 2020-02-07 | 哈尔滨安天科技集团股份有限公司 | Method and device for identifying homologous attack based on program logic and storage device |
CN111241544B (en) * | 2020-01-08 | 2023-05-02 | 北京梆梆安全科技有限公司 | Malicious program identification method and device, electronic equipment and storage medium |
CN114817924B (en) * | 2022-05-19 | 2023-04-07 | 电子科技大学 | AST (AST) and cross-layer analysis based android malicious software detection method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1581084A (en) * | 2004-05-20 | 2005-02-16 | 北京大学 | Binary system software member and its manufacturing method |
CN1845120A (en) * | 2006-05-16 | 2006-10-11 | 北京启明星辰信息技术有限公司 | Automatic analysis system and method for malicious code |
CN101013461A (en) * | 2007-02-14 | 2007-08-08 | 白杰 | Method of computer protection based on program behavior analysis |
Non-Patent Citations (2)
Title |
---|
Intrusion Detection via Static Analysis. David Wagner, Drew Dean. Security and Privacy, 2001 |
Intrusion Detection via Static Analysis. David Wagner, Drew Dean. Security and Privacy, 2001 * |
Also Published As
Publication number | Publication date |
---|---|
CN101140611A (en) | 2008-03-12 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN100504903C (en) | A Malicious Code Automatic Identification Method | |
Shijo et al. | Integrated static and dynamic analysis for malware detection | |
CN106790186B (en) | Multi-step attack detection method based on multi-source abnormal event correlation analysis | |
CN105208037B (en) | A kind of DoS/DDoS attack detectings and filter method based on lightweight intrusion detection | |
JP6348656B2 (en) | Malware-infected terminal detection device, malware-infected terminal detection system, malware-infected terminal detection method, and malware-infected terminal detection program | |
Azab et al. | Mining malware to detect variants | |
US10516671B2 (en) | Black list generating device, black list generating system, method of generating black list, and program of generating black list | |
US8769692B1 (en) | System and method for detecting malware by transforming objects and analyzing different views of objects | |
CN103679030B (en) | Malicious code analysis and detection method based on dynamic semantic features | |
CN113221109B (en) | An intelligent analysis method of malicious files based on generative adversarial network | |
CN104933364B (en) | A kind of malicious code based on the behavior of calling automates homologous determination method and system | |
CN105245495A (en) | A fast detection method for malicious shellcode based on similarity matching | |
Thunga et al. | Identifying metamorphic virus using n-grams and hidden markov model | |
CN116015861A (en) | Data detection method and device, electronic equipment and storage medium | |
CN107368740B (en) | Detection method and system for executable codes in data file | |
Sahu et al. | A review of malware detection based on pattern matching technique | |
US11321453B2 (en) | Method and system for detecting and classifying malware based on families | |
Lin et al. | Using graph neural network to ransomware detection for cyber threats | |
CN110650157A (en) | Fast-flux domain name detection method based on ensemble learning | |
Vignesh et al. | Malware detection using ensemble learning and file monitoring | |
Almarshad et al. | Detecting zero-day polymorphic worms with jaccard similarity algorithm | |
Guo et al. | A malware detection algorithm based on multi-view fusion | |
Yin et al. | APT Attack Detection Method Based on Traceability Graph [J] | |
Wang et al. | MrKIP: Rootkit Recognition With Kernel Function Invocation Pattern. | |
Qi et al. | A design of network behavior-based malware detection system for android |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20090624 Termination date: 20140918 |
EXPY | Termination of patent right or utility model | ||