CN103679034A - Ontology-based computer virus analysis system and virus feature extraction method - Google Patents
- Publication number: CN103679034A
- Application number: CN201310750929.1A
- Authority: CN (China)
- Prior art keywords: virus, sample, rule, ontology, module
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/561—Virus type analysis
Abstract
The invention provides an ontology-based computer virus analysis system and a virus feature extraction method. Key system calls and memory information are obtained on the Pin platform; data dependencies and control dependencies are extracted on the basis of existing knowledge; and a behavior dependency graph is constructed to represent the behavioral features that describe virus semantics. A computer virus ontology system is established on this basis, and adaptive feature learning and ontology construction are achieved as the number of virus samples grows. By extracting the features of computer viruses and constructing an ontology, the invention discovers the relationship between virus behaviors and instructions at a fine granularity and describes computer viruses, so that they can be analyzed and judged accurately.
Description
Technical field
The invention belongs to the field of computer virus analysis, and in particular relates to an ontology-based computer virus analysis system and a feature extraction method thereof.
Background art
A computer virus is a set of computer instructions or program code that its author inserts into a computer program to damage computer functions or data, that affects the use of the computer, and that can replicate itself. Unlike a medical "virus", a computer virus does not occur naturally; it is a set of instructions or program code written by someone who exploits the inherent vulnerabilities of computer software and hardware. It can lie dormant in the computer's storage media (or programs) and is activated when certain conditions are met. By modifying other programs it places an exact copy of itself, or a possibly evolved form, into them, thereby infecting those programs and damaging computer resources.
At present, the commonly used virus detection method is the signature code method, which is the simplest and lowest-overhead way to detect known viruses. It works by collecting known virus samples and building a virus database. When detection starts, the file under inspection is opened and searched to check whether it contains any virus signature code from the database. Because signature codes correspond one-to-one with viruses, finding a signature code in the file identifies which virus the file carries.
Computer virus analysis and detection tools are now in practical use, especially tools that work by analyzing virus samples and extracting signature codes and sample properties. These tools use statistical analysis, fuzzy recognition and machine learning to find the feature values of samples, and combine virtual machine technology with heuristic scanning to detect the presence of signature codes. They apply methods such as graphical similarity and/or secondary detection to classify viruses into families according to the similarity between programs, that is, the similarity of their features. Because some traditional viruses have distinct signature codes and change little, and because certain virus forms are well understood, these tools work well when the signature code is distinct or when the feature values describe the virus and its variants fairly completely.
However, as intelligent technology develops, virus writing and virus detection remain two sides of the same coin. New viruses and virus variants keep appearing, and virus metamorphism techniques are in use. When the signature code is not distinct, or the feature values cannot fully describe the virus and its variants, existing computer virus detection is prone to failure.
Summary of the invention
To solve the above problems and address the deficiencies of the prior art, the inventors, after repeated design and research, provide an ontology-based computer virus analysis system and a feature extraction method thereof. The invention can adapt to virus variants and analyze and judge computer viruses fairly accurately.
According to a first aspect of the invention, an ontology-based computer virus analysis system is provided. It obtains key system calls and memory information on the Pin platform, extracts data dependencies and control dependencies according to existing knowledge, and constructs a behavior dependency graph to represent the behavioral features that describe virus semantics. A computer virus ontology system is established on this basis, and adaptive feature learning and ontology construction are achieved as virus samples increase.
Preferably, the sample to be detected is run on the Pin platform to obtain a trace file containing its key system calls and memory information, and the trace file is analyzed against the established rule base, which describes typical behaviors, to extract data dependencies and control dependencies.
Further, a directed graph is constructed to represent the behavioral features that describe virus semantics, and it is matched against the rules to obtain the degree to which each rule is exhibited.
More preferably, the computer virus ontology is established from the resulting degrees to which the rules are exhibited; for a sample under test, a similarity calculation determines its position in the virus ontology knowledge tree, and the result of the system's analysis is given.
Specifically, the ontology-based computer virus analysis system includes the following modules:
(1) A Pin platform processing module, which processes a computer virus sample with a program written on the Pin platform and outputs a trace file containing the sample's key system call flow and memory information;
(2) A rule base module with an automatic update function, which uses empirical knowledge, obtained by studying how typical computer virus behaviors are implemented in code, and extracts data dependencies and control dependencies to represent the typical behaviors of known computer viruses;
(3) A rule matching module, which analyzes the sample trace file output by the Pin platform line by line, derives the order and dependencies of all functions and data in the trace file, matches them against the rules in the rule base, outputs the detailed matching results, and processes and classifies those results using ontology knowledge;
(4) An ontology management module with construction and query functions; the ontology it builds is stored as an OWL file and has the generality of an ordinary ontology; for known viruses, the ontology is constructed manually from known features, using ontology knowledge, through the Protégé API;
(5) An adaptive ontology learning module, which uses a clustering algorithm to add newly appearing virus features and virus types to the virus ontology knowledge tree as virus samples keep increasing;
(6) An ontology similarity calculation module, which computes attribute similarity for a virus sample whose rule matching results are available, gives its position in the virus ontology knowledge tree, and obtains the final result of the virus analysis.
Preferably, the ontology management module supports manual addition, deletion and modification of classes, properties and instances, and provides a virus query function.
According to a second aspect of the invention, a computer virus feature extraction method based on the above computer virus analysis system is provided, comprising the following steps:
1) The rules in the rule base module describe how the various typical key virus behaviors are implemented. Each rule is a coordinated combination of key system calls, written as a sequence that expresses the order in which API functions appear, the equality of parameters between API functions, and their causal logical relations;
2) A Pintool is written with the API provided by Pin to extract features of the running code; its output is a sample trace file containing the sample's key system calls, in chronological order, and memory information;
3) The rule matching module scans the sample trace file output by the Pin platform line by line, represents the adjacency relations in the rule base as a matrix, and, according to whether each relation in the matrix appears, uses a directed graph to represent the order and dependencies of the key system functions and data found in the sample;
4) The directed graph is matched against the rules in the rule base to obtain the matching pattern for each rule, which expresses the order and degree in which behaviors appear; the matching results for all behaviors are recorded in a feature file.
Preferably, writing a Pintool with the API provided by Pin to extract features of the running code means processing an unknown file sample with the Pin platform processing module.
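Step 1) describes a rule as an ordered sequence of API calls tied together by equal parameters. The following Python sketch shows one possible encoding of such a rule and a check against a trace; it is an illustration under stated assumptions, not the patent's implementation. The `Call` and `Rule` types, the `"ret"` key for a return value, and the example rule are hypothetical; only the WinAPI function and parameter names are real.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One traced API call: function name plus named argument/return values."""
    name: str
    values: dict


@dataclass(frozen=True)
class Rule:
    """API names in required order, plus equality links between steps.

    A link (i, key_i, j, key_j) requires values[key_i] of step i to equal
    values[key_j] of step j, which models a data dependency.
    """
    name: str
    steps: tuple
    links: tuple


def _linked(chosen, link):
    i, key_i, j, key_j = link
    left = chosen[i].values.get(key_i)
    right = chosen[j].values.get(key_j)
    return left is not None and left == right


def matches(rule, trace):
    """True if the trace contains the rule's steps as an ordered subsequence
    whose linked values are equal (simple backtracking search)."""
    def search(step, start, chosen):
        if step == len(rule.steps):
            return True
        for pos in range(start, len(trace)):
            if trace[pos].name != rule.steps[step]:
                continue
            candidate = chosen + [trace[pos]]
            # Only check links whose later endpoint is the step just chosen.
            due = [l for l in rule.links if max(l[0], l[2]) == step]
            if all(_linked(candidate, l) for l in due):
                if search(step + 1, pos + 1, candidate):
                    return True
        return False
    return search(0, 0, [])


# Hypothetical rule: a file is created, written through the same handle, closed.
WRITE_RULE = Rule(
    name="write_to_created_file",
    steps=("CreateFileA", "WriteFile", "CloseHandle"),
    links=((0, "ret", 1, "hFile"), (1, "hFile", 2, "hObject")),
)

trace = [
    Call("CreateFileA", {"lpFileName": "a.exe", "ret": 0x40}),
    Call("ReadFile", {"hFile": 0x10}),
    Call("WriteFile", {"hFile": 0x40}),
    Call("CloseHandle", {"hObject": 0x40}),
]
other = [
    Call("CreateFileA", {"lpFileName": "a.exe", "ret": 0x40}),
    Call("WriteFile", {"hFile": 0x99}),
    Call("CloseHandle", {"hObject": 0x99}),
]
print(matches(WRITE_RULE, trace), matches(WRITE_RULE, other))
```

The second trace has the same call order but writes through a different handle, so the parameter-equality link fails and the rule does not match.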
The ontology-based computer virus analysis system and virus feature extraction method provided by the invention obtain key system calls and memory information on the Pin platform, extract data dependencies and control dependencies according to existing knowledge, and construct a behavior dependency graph to represent the behavioral features that describe virus semantics, thereby establishing a computer virus ontology system. Adaptive feature learning and an ontology clustering construction algorithm are thus realized as virus samples increase, so that the system adapts to virus variants and analyzes and judges computer viruses fairly accurately.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the modules of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention. In addition, the scope of protection should not be limited to the specific modules or specific parameters described below.
The ontology-based computer virus analysis system includes the following modules:
(1) A Pin platform processing module, which processes a computer virus sample with a program written on the Pin platform and outputs a trace file containing the sample's key system call flow and memory information.
(2) A rule base module with an automatic update function, which uses empirical knowledge, obtained by studying how typical computer virus behaviors are implemented in code, and uses extracted data dependencies and control dependencies to represent the typical behaviors of known computer viruses.
(3) A rule matching module, which analyzes the sample trace file output by the Pin platform line by line, derives the order and dependencies of all functions and data in the file, matches them against the rules in the rule base, and outputs the detailed matching results. These results are then processed and classified using ontology knowledge by the three ontology-related modules that follow.
(4) An ontology management module with construction and query functions. The ontology it builds is stored as an OWL file and has the generality of an ordinary ontology. For known viruses, the ontology is constructed manually from known features, using ontology knowledge, through the Protégé API. The module supports manual addition, deletion and modification of classes, properties and instances, and implements virus queries on that basis.
(5) An adaptive ontology learning module, which uses a clustering algorithm to add newly appearing virus features and virus types to the virus ontology knowledge tree as virus samples keep increasing.
(6) An ontology similarity calculation module, which computes attribute similarity for a virus sample whose rule matching results are available, gives its position in the virus ontology knowledge tree, and obtains the final result of the virus analysis.
The virus feature extraction method of the ontology-based computer virus analysis system comprises the following steps:
1) The rules in the rule base module describe how the various typical key virus behaviors are implemented. Each rule is a coordinated combination of key system calls, written as a sequence that expresses the order in which API functions appear, the equality of certain parameters between API functions, and their causal logical relations.
2) A Pintool is written with the API provided by Pin to extract features of the running code; that is, the Pin platform processing module processes an unknown file sample and outputs a trace file containing the sample's key system calls, in chronological order, and memory information.
3) The rule matching module scans the sample trace file output by the Pin platform line by line, represents the adjacency relations in the rule base as a matrix, and, according to whether each relation in the matrix appears, uses a directed graph to represent the order and dependencies of the key system functions and data found in the sample.
4) The directed graph is matched against the rules in the rule base to obtain the matching pattern for each rule, which expresses the order and degree in which behaviors appear; the matching results for all behaviors are recorded in a feature file.
To make the purpose, technical solutions and advantages of the invention clearer, the embodiments of the invention are described in further detail below.
To adapt to new viruses and virus variants, to analyze and judge computer viruses fairly accurately, and to raise the detection accuracy when the signature code is not distinct or the feature values cannot fully describe the virus and its variants, the embodiments of the invention provide an ontology-based computer virus analysis system and a computer virus feature extraction method, described in detail below:
1. Pin platform processing module. The key step in implementing this module is to collect the WinAPI functions related to virus behavior and to write instrumentation functions according to the number and types of the parameters in their prototypes, so that Pin can locate the valid function objects relevant to virus analysis.
Pin is a program instrumentation platform provided by Intel. It supports Linux and Windows executables on the IA-32, Intel(R) 64 and IA64 architectures; its website is pintool.org/. Pin can insert arbitrary code written in C or C++ at any point in an executable, and it can also be attached to a running process. Specific instrumentation tasks are carried out by defining a Pintool.
This module uses the API provided by Pin to write a Pintool that extracts features of the running virus code. Writing it involves the following steps:
1) Initialization: call PIN_InitSymbols first, then call PIN_Init to initialize the Pin system. Open an output file stream for later output of results.
2) Registering the callback: use IMG_AddInstrumentFunction to register a custom callback. For each entry in the list of virus-behavior-related functions collected by this method, the callback looks up the valid function object and instruments it; the instrumentation functions are written according to the number and types of the parameters in the WinAPI function prototypes.
3) Use PIN_StartProgram() to start the instrumented code; output goes to the result file.
The Pin tool comes with its own user manual, and its general use is well known to those skilled in the art, so it is not described further here.
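The patent does not specify the layout of the trace file that the Pintool writes. Purely as an illustration of how later modules could consume it, the Python sketch below assumes a hypothetical one-call-per-line format, `index|API name|key=value,key=value`, and parses it into chronologically ordered events. The format, the `ret` key and both function names are assumptions made for this example.

```python
def parse_trace_line(line):
    """Parse one 'index|API|key=value,key=value' line (hypothetical format)."""
    index, api, raw = line.rstrip("\n").split("|", 2)
    values = {}
    for item in filter(None, raw.split(",")):
        key, _, value = item.partition("=")
        values[key] = value
    return int(index), api, values


def parse_trace(text):
    """Return all events of a trace, sorted by their sequence index."""
    events = [parse_trace_line(l) for l in text.splitlines() if l.strip()]
    return sorted(events, key=lambda event: event[0])


SAMPLE = """\
2|WriteFile|hFile=0x40,nNumberOfBytesToWrite=512
1|CreateFileA|lpFileName=a.exe,ret=0x40
3|CloseHandle|hObject=0x40
"""

events = parse_trace(SAMPLE)
print([api for _, api, _ in events])
```

A real trace would also need an escaping scheme for values that contain the separator characters, which this sketch leaves out.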
2. Rule base module with an automatic update function. It uses empirical knowledge, obtained by studying how typical computer virus behaviors are implemented in code, and uses extracted data dependencies and control dependencies to represent the typical behaviors of known computer viruses. The rules describe how the various typical key virus behaviors are implemented. Each rule is a coordinated combination of key system calls, written as a sequence that expresses the order in which API functions appear, the equality of certain parameters between API functions, and their causal logical relations.
If, while a large number of samples are processed in step 3 below, a new API combination appears repeatedly, a threshold is set according to its importance and its distance from the existing rules, so that the new behavior is added to the rule base.
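The update criterion is only outlined in the text: a recurring new API combination is admitted once it passes a threshold based on importance and distance from the existing rules. The Python sketch below is one way to read that, under assumptions the patent does not state: importance is approximated by how often the combination recurs, distance is the Jaccard distance between API name sets, and `min_count` and `min_distance` are hypothetical thresholds.

```python
from collections import Counter


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the API names of two non-empty sequences."""
    set_a, set_b = set(a), set(b)
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def update_rule_base(rules, observed, min_count=3, min_distance=0.5):
    """Return the rules plus every observed API combination that recurs at
    least min_count times and lies at least min_distance from every rule."""
    counts = Counter(tuple(seq) for seq in observed if seq)
    updated = list(rules)
    for seq, count in counts.items():
        if count < min_count or seq in updated:
            continue
        if all(jaccard_distance(seq, rule) >= min_distance for rule in updated):
            updated.append(seq)
    return updated


rules = [("CreateFileA", "WriteFile", "CloseHandle")]
observed = (
    [("RegOpenKeyExA", "RegSetValueExA", "RegCloseKey")] * 3   # frequent, far
    + [("CreateFileA", "WriteFile")] * 5                       # frequent, near
    + [("OpenProcess", "WriteProcessMemory")] * 2              # too rare
)
new_rules = update_rule_base(rules, observed)
print(new_rules[-1])
```

Only the registry sequence is added here: the file-writing pair is too close to an existing rule, and the process-injection pair has not yet been seen often enough.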
3. Rule matching module. This module scans the sample trace file output by the Pin platform line by line, represents the adjacency relations in the rule base as a matrix, and, according to whether each relation in the matrix appears, uses a directed graph to represent the order and dependencies of the key system functions and data found in the sample. The directed graph is matched against the rules in the rule base, that is, the sequences described in section 2, to obtain the matching pattern for each rule, which expresses the order and degree in which behaviors appear. The matching results for all behaviors are recorded in a feature file, which is the module's output. That output is then processed and classified using ontology knowledge by the three ontology-related modules that follow.
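The matrix and directed-graph step can be sketched as follows. This Python example is a simplified interpretation, not the patented procedure: it ignores parameter dependencies, treats "adjacent" as "consecutive among the key API calls", and defines the degree of match as the fraction of a rule's adjacent pairs found in the graph. The rule names and traces are hypothetical.

```python
def build_adjacency(rules):
    """Index the rule-base APIs and mark every adjacent pair that a rule uses."""
    apis = sorted({api for steps in rules.values() for api in steps})
    index = {api: i for i, api in enumerate(apis)}
    matrix = [[0] * len(apis) for _ in apis]
    for steps in rules.values():
        for a, b in zip(steps, steps[1:]):
            matrix[index[a]][index[b]] = 1
    return index, matrix


def build_graph(trace, index, matrix):
    """Keep only key APIs, then add an edge a -> b for each consecutive pair
    of key calls whose relation is present in the rule matrix."""
    key_calls = [api for api in trace if api in index]
    graph = {}
    for a, b in zip(key_calls, key_calls[1:]):
        if matrix[index[a]][index[b]]:
            graph.setdefault(a, set()).add(b)
    return graph


def match_degree(steps, graph):
    """Fraction of the rule's adjacent pairs that appear as graph edges."""
    pairs = list(zip(steps, steps[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if b in graph.get(a, ()))
    return hits / len(pairs)


rules = {
    "file_write": ("CreateFileA", "WriteFile", "CloseHandle"),
    "registry_change": ("RegOpenKeyExA", "RegSetValueExA", "RegCloseKey"),
}
trace = ["CreateFileA", "GetTickCount", "WriteFile",
         "RegOpenKeyExA", "RegSetValueExA", "CloseHandle"]

index, matrix = build_adjacency(rules)
graph = build_graph(trace, index, matrix)
features = {name: match_degree(steps, graph) for name, steps in rules.items()}
print(features)
```

The resulting `features` dictionary plays the role of the feature file: one degree of match per rule, ready to be passed to the ontology modules.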
The three ontology-related modules are the ontology management module, the adaptive ontology learning module and the ontology similarity calculation module. They are all written in Java on the NetBeans platform. The algorithms described below were designed to compute degrees of similarity, and the ontology is manipulated through the Protégé API to implement ontology construction, query and management.
4. Ontology management module. This module has construction and query functions. The ontology it builds is stored as an OWL file and has the generality of an ordinary ontology. For known viruses, the ontology is constructed manually from known features, using ontology knowledge, through the Protégé API. The module supports manual addition, deletion and modification of classes, properties and instances, and implements virus knowledge queries on that basis.
A virus query looks up knowledge about a specific virus, mainly by keyword. Two kinds of keyword are used: name keywords and function keywords.
1) When the keyword is a virus name, it is compared with the virus names in the virus knowledge tree to find the desired virus, and details such as the virus's parent node, child nodes and properties are displayed.
2) When the keyword is a function name, it is compared with the object properties and data properties in the virus ontology knowledge tree; the names of the matching properties are displayed, and the query result is enriched by displaying their domain and range.
Through the above steps, the virus knowledge already present in the ontology can be queried as required.
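The two query modes can be sketched over a small in-memory model. The Python example below does not use the Protégé API or OWL files; it only illustrates the lookup logic with plain dictionaries. All class names, property names, domains and ranges are hypothetical, and substring matching is an assumption about how keywords are compared.

```python
# Hypothetical knowledge tree: class name -> parent and property names.
CLASSES = {
    "Virus": {"parent": None, "properties": []},
    "Worm": {"parent": "Virus", "properties": ["spreadsVia"]},
    "MailWorm": {"parent": "Worm",
                 "properties": ["spreadsVia", "modifiesRegistry"]},
}
# Hypothetical properties with their domain and range.
PROPERTIES = {
    "spreadsVia": {"domain": "Worm", "range": "Channel"},
    "modifiesRegistry": {"domain": "Virus", "range": "RegistryKey"},
}


def query_by_name(keyword):
    """Name keyword: return parent, children and properties of each match."""
    results = []
    for name, info in CLASSES.items():
        if keyword.lower() in name.lower():
            children = [c for c, i in CLASSES.items() if i["parent"] == name]
            results.append({"name": name, "parent": info["parent"],
                            "children": children,
                            "properties": info["properties"]})
    return results


def query_by_function(keyword):
    """Function keyword: return matching properties with domain and range."""
    return [{"property": name, **info}
            for name, info in PROPERTIES.items()
            if keyword.lower() in name.lower()]


by_name = query_by_name("worm")
by_function = query_by_function("registry")
print(by_name[0]["children"], by_function[0]["range"])
```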
5. Adaptive ontology learning module. As virus samples keep increasing, this module uses a clustering algorithm to add newly appearing virus features and virus types to the virus ontology knowledge tree, making the virus ontology more complete. There are two main cases: if the instances within a class form distinct clusters, a new class is emerging; if the distance between the instances of different (sibling) classes becomes small, the instances are re-clustered, which may produce a new class. The main steps of the adaptive learning algorithm designed for the virus ontology are as follows:
1) Set the thresholds s, a and b.
2) When the number of instances in a class reaches s, cluster those instances and compute the distance between the cluster centers; if this distance is greater than a, split the instances and add the new virus class to the tree.
3) Compute the similarity between the instances of two adjacent classes (sibling nodes); if the similarity is greater than the threshold b and greater than the similarity among the instances of their original class, reposition the instances, producing a new class.
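Step 2) can be sketched as below. The patent names neither the clustering algorithm nor the instance representation, so this Python example makes assumptions: instances are numeric feature vectors (for example, per-rule degrees of match), clustering is a small deterministic 2-means, and distance is Euclidean. The thresholds `s` and `a` follow the text; everything else is illustrative.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def two_means(points, rounds=10):
    """Deterministic 2-means, seeded with the two points farthest apart."""
    seeds = max(((p, q) for p in points for q in points),
                key=lambda pair: math.dist(pair[0], pair[1]))
    centers = [list(seeds[0]), list(seeds[1])]
    groups = [list(points), []]
    for _ in range(rounds):
        groups = [[], []]
        for p in points:
            closer_to_second = math.dist(p, centers[1]) < math.dist(p, centers[0])
            groups[1 if closer_to_second else 0].append(p)
        if not groups[0] or not groups[1]:
            break
        centers = [mean(groups[0]), mean(groups[1])]
    return centers, groups


def maybe_split(instances, s, a):
    """Split a class into two when it holds at least s instances and the
    two cluster centers are more than a apart; otherwise keep it whole."""
    if len(instances) < s:
        return [instances]
    centers, groups = two_means(instances)
    if groups[0] and groups[1] and math.dist(centers[0], centers[1]) > a:
        return groups
    return [instances]


instances = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
             [1.0, 0.9], [0.9, 1.0], [1.0, 1.0]]
parts = maybe_split(instances, s=6, a=0.5)
print([len(part) for part in parts])
```

Raising `s` above the number of instances, or `a` above the center distance, leaves the class unsplit, which mirrors the two threshold checks in step 2).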
6、本体相似度计算模块,对给出规则匹配结果的病毒检测样本,进行属性的相似度计算,给出病毒本体知识树中位置,得出病毒分析的最终结果。6. Ontology similarity calculation module, which calculates the attribute similarity of the virus detection samples with rule matching results, gives the position in the virus ontology knowledge tree, and obtains the final result of virus analysis.
其中使用样本与病毒属性的相似度计算方法。病毒首先具有粗细粒度行为特征,其次具有完成这次行为特征它需要调用的API序列。依据已有知识对典型病毒建立包含行为间层次、逻辑、时序关系以及行为的API等内容的行为特征树。特征树的上层节点为大粒度行为,子节点是组成父节点的小粒度行为,叶节点是为完成这个行为调用的API方法,叶节点之间与子节点之间有与或关系,调用顺序等时序关系。由未知样本的规则匹配结果,根据规则时序关系与API信息,建立样本特征树,比较样本特征树在病毒特征树中的覆盖程度,计算样本特征树与病毒特征树的相似程度,所述具体步骤如下:The similarity calculation method between the sample and virus attributes is used. A virus first has coarse-grained behavioral characteristics, and secondly has the API sequence it needs to call to complete this behavioral characteristic. Based on the existing knowledge, a behavioral feature tree is established for typical viruses, including the level, logic, timing relationship between behaviors, and APIs of behaviors. The upper nodes of the feature tree are large-grained behaviors, and the child nodes are the small-grained behaviors that make up the parent node. The leaf nodes are the API methods called to complete this behavior. There is an AND or relationship between the leaf nodes and the child nodes, and the order of calls, etc. timing relationship. From the rule matching results of unknown samples, according to the rule timing relationship and API information, establish a sample feature tree, compare the coverage of the sample feature tree in the virus feature tree, and calculate the similarity between the sample feature tree and the virus feature tree, the specific steps as follows:
(1) Set two integers m and n, which count the identical nodes and the different nodes respectively.
(2) Starting from the root node, traverse the trees depth-first. The two nodes compared at each step occupy the same position in their trees, that is, they have the same parent value and the same depth. If a node exists in only one of the trees, it is treated as a different node and n is incremented by 1. Otherwise, compare the values of the two nodes: if they differ, n is incremented by 1; if they are the same, go to (3).
(3) For the child nodes of the two nodes, check that the node values are equal, compare the temporal relations, and verify the AND/OR relations. If all of these are the same, m is incremented by 1; otherwise, n is incremented by 1.
(4) The similarity between the two is Sim(V1, V2) = m/(m + n), where V1 is the sample to be tested and V2 is the virus.
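The tree comparison above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class `FeatureNode`, the pairing of children by index (so that list order stands in for the temporal relation), and the decision to stop descending once two node values differ are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import zip_longest
from typing import List, Optional, Tuple


@dataclass
class FeatureNode:
    value: str             # behavior name, or API name at a leaf
    relation: str = "AND"  # AND/OR relation among this node's children
    children: List["FeatureNode"] = field(default_factory=list)  # in call order


def count_nodes(a: Optional[FeatureNode], b: Optional[FeatureNode]) -> Tuple[int, int]:
    """Depth-first comparison; returns (m, n) = (identical, different) counts."""
    if a is None or b is None:
        return 0, 1  # node present in only one tree
    if a.value != b.value:
        return 0, 1  # assumption: a mismatched subtree counts once
    m, n = (1, 0) if a.relation == b.relation else (0, 1)
    # Children are paired by position, so a reordered call sequence mismatches.
    for child_a, child_b in zip_longest(a.children, b.children):
        cm, cn = count_nodes(child_a, child_b)
        m, n = m + cm, n + cn
    return m, n


def similarity(sample: FeatureNode, virus: FeatureNode) -> float:
    """Sim(V1, V2) = m / (m + n)."""
    m, n = count_nodes(sample, virus)
    return m / (m + n)


virus = FeatureNode("self_copy", "AND", [
    FeatureNode("CreateFileA"), FeatureNode("ReadFile"), FeatureNode("WriteFile"),
])
sample = FeatureNode("self_copy", "AND", [
    FeatureNode("CreateFileA"), FeatureNode("WriteFile"),
])
print(similarity(sample, virus))  # 0.5 (m = 2, n = 2)
```

In the example, the root and the first API call match (m = 2), while the second position holds different APIs and the third exists only in the virus tree (n = 2). The behavior and API names are illustrative only.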
The above steps compare the new sample against every virus in the virus ontology knowledge tree and yield a set of similarity values. The position in the virus ontology tree of the virus with the greatest similarity is then determined, and the classification result and the identified features are given.
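Selecting the best match from the resulting similarity values can be sketched as below. The function name `classify` and the use of path strings to denote positions in the ontology tree are hypothetical, chosen only for illustration.

```python
from typing import Dict, Tuple


def classify(similarities: Dict[str, float]) -> Tuple[str, float]:
    """Return the ontology position with the highest similarity and its score.

    `similarities` maps a position in the virus ontology tree (written here
    as a path string, an assumption) to Sim(sample, virus). On ties, the
    first entry encountered wins.
    """
    if not similarities:
        raise ValueError("no similarity values to compare")
    best = max(similarities, key=similarities.get)
    return best, similarities[best]
```

For instance, `classify({"Virus/Worm/A": 0.4, "Virus/Trojan/B": 0.75})` returns `("Virus/Trojan/B", 0.75)`.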
In summary, the embodiments of the present invention provide an ontology-based computer virus analysis system and a computer virus feature extraction method. The sample to be tested is run on the Pin platform to obtain a trace file containing key system calls and memory information. Guided by the established rule base describing typical behaviors, the trace file is analyzed to extract data dependencies and control dependencies, and a directed graph is constructed to represent the behavior features that describe the semantics of the virus. The graph is matched against the rules to obtain the degree to which each rule is exhibited. The computer virus ontology is built on the same basis; similarity calculation then determines the position of the sample in the virus ontology knowledge tree and gives the result of the system's analysis. As the number of virus samples grows, adaptive feature learning and an ontology clustering construction algorithm allow the system to adapt to virus variants and to analyze and identify computer viruses relatively accurately.
The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Those of ordinary skill in the art will understand that various modifications in form and detail may be made without departing from the spirit and scope of the present invention as defined by the appended claims.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310750929.1A CN103679034B (en) | 2013-12-26 | 2013-12-26 | An ontology-based computer virus analysis system and feature extraction method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103679034A (en) | 2014-03-26 |
CN103679034B (en) | 2016-04-13 |
Family
ID=50316544
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310750929.1A, CN103679034B (en), Expired - Fee Related | An ontology-based computer virus analysis system and feature extraction method thereof | 2013-12-26 | 2013-12-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103679034B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101162485A (en) * | 2006-10-11 | 2008-04-16 | 飞塔信息科技(北京)有限公司 | Method and system for processing computer malicious code |
CN101853200A (en) * | 2010-05-07 | 2010-10-06 | 北京大学 | An Efficient Dynamic Software Vulnerability Mining Method |
US20130174257A1 (en) * | 2010-08-18 | 2013-07-04 | Qizhi Software (Beijing) Company Limited | Active Defense Method on The Basis of Cloud Security |
US20120144488A1 (en) * | 2010-12-01 | 2012-06-07 | Symantec Corporation | Computer virus detection systems and methods |
CN103440201A (en) * | 2013-09-05 | 2013-12-11 | 北京邮电大学 | Dynamic taint analysis device and application thereof to document format reverse analysis |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105830060A (en) * | 2014-02-06 | 2016-08-03 | 富士施乐株式会社 | Information processing device, information processing program, storage medium, and information processing method |
CN105740711A (en) * | 2016-01-29 | 2016-07-06 | 哈尔滨工业大学深圳研究生院 | Malicious code detection method and system based on kernel object behavior body |
CN105740711B (en) * | 2016-01-29 | 2018-08-31 | 哈尔滨工业大学深圳研究生院 | A kind of malicious code detecting method and system based on kernel objects behavior ontology |
CN107038380A (en) * | 2017-04-14 | 2017-08-11 | 华中科技大学 | A kind of leak detection method and system based on performance of program tree |
CN107038380B (en) * | 2017-04-14 | 2019-07-05 | 华中科技大学 | A kind of leak detection method and system based on performance of program tree |
CN109145601A (en) * | 2017-06-27 | 2019-01-04 | 英特尔公司 | Malware detection system attack prevents |
CN110457903A (en) * | 2019-07-24 | 2019-11-15 | 腾讯科技(深圳)有限公司 | A virus analysis method, device, equipment and medium |
CN110457903B (en) * | 2019-07-24 | 2024-11-12 | 腾讯科技(深圳)有限公司 | A virus analysis method, device, equipment and medium |
CN111143848A (en) * | 2019-12-31 | 2020-05-12 | 成都科来软件有限公司 | System for recording sample behaviors and formulating virus rules |
CN112767135A (en) * | 2021-01-26 | 2021-05-07 | 北京健康之家科技有限公司 | Rule engine configuration method and device, storage medium and computer equipment |
CN112767135B (en) * | 2021-01-26 | 2024-02-27 | 北京水滴科技集团有限公司 | Configuration method and device of rule engine, storage medium and computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN103679034B (en) | 2016-04-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20160413 |