CN103294954A - Compound document malicious code detecting technique and system based on spectral analysis - Google Patents

Publication number
CN103294954A
Authority
CN
China
Prior art keywords
document
malicious code
compound document
phase
spectrum
Prior art date
Application number
CN2013102245691A
Other languages
Chinese (zh)
Other versions
CN103294954B (en)
Inventor
方勇
贾鹏
左政
Original Assignee
四川大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 四川大学
Priority to CN201310224569.1A
Publication of CN103294954A
Application granted
Publication of CN103294954B

Abstract

The invention relates to the fields of computer malicious code detection and spectral analysis and aims to provide a compound document malicious code detection technique and system based on spectral analysis. The technique uses a detection scheme based on analysis of a compound document's phase spectrum. It includes a method for converting the binary data of a static compound document into a phase spectrum; a method for automatically extracting phase spectrum features such as uniformity, phase values, and spectral width; a method for building a large number of comparison groups through designed comparison experiments and deriving judgment criteria from the general differences found; and a method for detecting compound document malicious code by analyzing phase spectrum features. The detection is highly targeted, does not require running the document, and can detect unknown malicious code. The technique and system offer a new solution for detecting malicious code in compound documents.

Description

A Compound Document Malicious Code Detection Technique and System Based on Spectral Analysis

Technical Field

[0001] The present invention relates to the fields of computer malicious code detection and spectral analysis, and in particular to a technique and system for detecting malicious code in compound documents based on spectral analysis.

Background

[0002] With the continued development of electronic office work, electronic documents, typified by compound documents, are used ever more widely. At the same time, compound documents have become a target for malicious code: by binding itself to a compound document, malicious code can conveniently launch and hide itself. Roughly 5% of the malicious files detected each year involve compound documents. Between 6% and 10% of the files that users download or receive through browsers and instant-messaging software carry malicious code, and a large share of these again involve compound documents. The security of compound documents is under serious threat, and targeted detection of compound document malicious code is urgently needed.

[0003] Traditional malicious code detection falls into two main categories: signature-based detection and non-signature-based detection. Signature-based detection is a static method that detects malicious code by extracting its signature. Non-signature-based techniques, also called dynamic detection, mainly include behavior-analysis-based detection, heuristic-analysis-based detection, and sandboxing. When applied to compound document malicious code, these traditional techniques have the following problems.

[0004] First, static detection cannot detect unknown malicious code.

[0005] Second, dynamic detection is inefficient and costly, and its accuracy is low.

[0006] Third, there is no detection method dedicated to compound document malicious code; such code is still detected with traditional techniques, which do not take full advantage of the structural characteristics of compound documents.

[0007] Moreover, because compound document malicious code is highly deceptive and relatively simple to implement, the amount of malicious code spread through compound documents will keep growing. Detecting it has therefore become a pressing research problem. A new method is needed that takes the characteristics of compound documents into account, overcomes the three problems above, and, unlike traditional detection methods, detects compound document malicious code effectively.

Summary of the Invention

[0008] "A compound document malicious code detection technique and system based on spectral analysis" is an invention that addresses problems of the existing art identified during research on malicious code detection. One objective of the invention is to remedy the poor targeting of existing detection methods by providing a spectral-analysis-based technique that specifically detects compound documents to which malicious code has been bound. The binary data of a compound document can, to some extent, be regarded as a signal, and as a signal it exhibits certain characteristics in the frequency domain; transforming the document's data from the time domain to the frequency domain yields those characteristics. The detection method of the invention offers a new approach: it does not rely on the traditional static or behavioral features of malicious code, does not need to open the compound document, and does not need to monitor changes in system state. Instead, it detects based on the features of the document's phase spectrum. The method effectively protects the system and user data during detection, and because the scheme is highly targeted, its accuracy is also higher than that of traditional detection techniques.
[0009] To achieve this objective, the invention provides a compound document malicious code detection system based on spectral analysis. The system extracts a real-number sequence from the binary data of a compound document, transforms it with an appropriate algorithm to obtain the document's phase spectrum, and then makes a judgment based on the features of the phase spectrum. The system comprises: a management end, which manages the detection process and detection results and maintains a feature database; a data extractor, which separates out the data of a fixed portion of the compound document's binary data, converts the binary data into a real-number sequence, and then samples the sequence according to the sampling rate; a phase spectrum generator, which receives data from the data extraction module and applies the corresponding transform algorithm to generate the document's phase spectrum; and a spectrum analyzer, which analyzes the features of the generated phase spectrum and gives the judgment result.

Brief Description of the Drawings

[0010] The objectives, implementation, advantages, and features of the invention will be understood more clearly from the following detailed description taken together with the accompanying drawings, in which:

[0011] FIG. 1 is an architecture diagram of the spectral-analysis-based compound document malicious code detection system of the invention.

[0012] FIG. 2 is a block diagram of the internal units of the management end of the detection system.

[0013] FIG. 3 is a schematic diagram illustrating how the data extractor of the detection system extracts data from a compound document.

[0014] FIG. 4 is a schematic diagram illustrating how the phase spectrum generator of the detection system generates a phase spectrum.

[0015] FIG. 5 is a block diagram of the internal units of the spectrum analyzer of the detection system.

[0016] FIG. 6 is a flowchart illustrating the complete workflow of the detection system.

Detailed Description

[0017] Many types of malicious code are used to build bound samples while the judgment criteria of the invention are being formulated. A binding tool can bind such code into a normal compound document. The main types are as follows.

[0018] 1) Downloader code.

[0019] 2) Keylogger code.

[0020] 3) Registry-modifying code.

[0021] 4) Password-sending code.

[0022] 5) Data-uploading code.

[0023] 6) Pop-up window code.

[0024] The invention is further described below with reference to the drawings. The invention aims to provide a system for detecting compound document malicious code that efficiently and accurately detects malicious code contained in compound documents of common formats, such as Office-series documents and PDF documents, and protects the security of system data and user data.

[0025] FIG. 1 is an architecture diagram describing the composition of the system.

[0026] As shown in FIG. 1, the core of the detection system is the management end, which exchanges data with the other modules; the other modules operate according to the configuration information held by the management end. The management end also maintains a feature database that stores the phase spectrum feature information and judgment criteria for compound documents of various formats, with different judgment criteria for different types of malicious code. The management end is also used to manage and modify configuration information, including the sampling frequency and the transform algorithm. A compound document enters the detection system through the data converter, passes through a series of transforms and judgments, and the result is output by the management end.

[0027] FIG. 2 is a schematic diagram of the internal units of the management end.

[0028] As shown in FIG. 2, the management end consists of a logic control unit, a database management unit, a policy configuration unit, and a user interface unit. The logic control unit controls the operating logic of the whole system and implements detection by invoking the other parts of the system. The database management unit manages and maintains the database, which mainly stores judgment criteria and detection results. A detection result mainly comprises the file type, the MD5 value, and the verdict for the document. Before each file is examined, its MD5 value is computed and matched against the database; on a match, the stored result is returned directly, avoiding repeated detection. The policy configuration unit manages the policy information used during detection, and the logic control unit directs the system's components according to these policies. The user interface unit is the platform through which the system interacts with the user, who can use it to view detection results and change system configuration.
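
The MD5 lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `DetectionResult` and `ResultCache`, the in-memory dictionary standing in for the database, and the `detect` callback are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DetectionResult:
    """One stored detection result (fields follow paragraph [0028])."""
    file_type: str  # e.g. "pdf" or "doc" (labels are hypothetical)
    md5: str        # hex MD5 digest of the whole file
    verdict: str    # e.g. "clean" or "infected" (labels are hypothetical)


class ResultCache:
    """In-memory stand-in for the result table of the feature database."""

    def __init__(self):
        self._by_md5 = {}

    def check(self, data, file_type, detect):
        """Return the stored result for `data`, running `detect` only on a miss."""
        digest = hashlib.md5(data).hexdigest()
        hit = self._by_md5.get(digest)
        if hit is not None:
            return hit  # already examined: reuse the stored verdict
        result = DetectionResult(file_type, digest, detect(data))
        self._by_md5[digest] = result
        return result
```

Here `detect` is any callable that takes the file bytes and returns a verdict string; a second call with the same bytes returns the cached record without invoking it again.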

[0029] FIG. 3 is a schematic diagram showing how the data extractor extracts data from a compound document.

[0030] As shown in FIG. 3, the data reader reads the compound document's binary data into memory. It first determines the document type from fixed bytes and then, according to the document's storage type, separates out the data of the fixed portion, which reduces the amount of data for later computation and improves the system's efficiency. It then converts the binary data into a real-number sequence. Finally, the data reader samples the real-number sequence at the sampling rate configured by the management end and passes the result to the phase spectrum generator. FIG. 3 uses 4 bits as one unit for the binary-to-real conversion; other group sizes, such as 6 or 8 bits, may also be used. Likewise, several sampling rates are available for the converted sequence: sampling every value, every second value, every third value, and so on. The bit-group size and the sampling rate are chosen according to the characteristics of the compound document and the type of malicious code bound to it.
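
The extraction steps above can be sketched as follows, under stated assumptions. The function names are hypothetical, bits are read most significant first, and trailing bits that do not fill a group are dropped; the patent specifies none of these details. The type check uses the well-known leading bytes of PDF, OLE2 compound files, and ZIP containers. The step that separates out the fixed portion is omitted because the text does not say which bytes it covers.

```python
# Leading "magic" bytes of common compound document containers.
MAGIC = [
    (b"%PDF", "pdf"),                               # PDF header
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "ole2"),  # legacy .doc/.xls/.ppt
    (b"PK\x03\x04", "zip"),                         # .docx/.xlsx/.pptx container
]


def sniff_type(data):
    """Guess the container type from the first bytes of the file."""
    for magic, name in MAGIC:
        if data.startswith(magic):
            return name
    return "unknown"


def bits_to_reals(data, bits_per_unit=4):
    """Turn bytes into a list of floats, one per group of `bits_per_unit` bits."""
    if bits_per_unit < 1:
        raise ValueError("bits_per_unit must be positive")
    mask = (1 << bits_per_unit) - 1
    buffer = 0     # bits read but not yet emitted
    buffered = 0   # number of bits currently held in `buffer`
    values = []
    for byte in data:
        buffer = (buffer << 8) | byte
        buffered += 8
        while buffered >= bits_per_unit:
            buffered -= bits_per_unit
            values.append(float((buffer >> buffered) & mask))
        buffer &= (1 << buffered) - 1  # keep only the bits not yet emitted
    return values


def subsample(values, skip=0):
    """Keep one value, then skip `skip` values (0 keeps every value)."""
    return values[:: skip + 1]
```

With 4-bit groups each byte yields two values in the range 0 to 15; `subsample(values, 1)` corresponds to sampling every second value.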

[0031] FIG. 4 is a schematic diagram showing how the phase spectrum generator generates a phase spectrum.

[0032] As shown in FIG. 4, once the real-number sequence enters the phase spectrum generator, the generator treats the sequence values as the dependent variable of a function and the position of each value in the document as the time variable, constructing a function in the time domain. It then applies an FFT to this function (other transforms, such as the wavelet transform, may be used instead) and finally plots the phase spectrum. This is the phase spectrum of the compound document after extraction and transformation.
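
As a rough illustration of this step, the sketch below computes the phase of each frequency bin with a textbook recursive radix-2 FFT written in plain Python. The zero-padding to a power of two, the use of degrees, and the function names are assumptions for the example; the patent does not describe its FFT implementation, and in practice a library routine such as `numpy.fft.fft` would likely be used instead. Plotting is left out.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def phase_spectrum_degrees(samples):
    """Phase (in degrees) of each frequency bin of the sampled sequence."""
    n = 1
    while n < len(samples):
        n *= 2
    # Assumption: pad with zeros up to the next power of two.
    padded = list(samples) + [0.0] * (n - len(samples))
    return [math.degrees(cmath.phase(c)) for c in fft(padded)]
```

The index of each value in `samples` plays the role of the time variable, so the sequence is passed in document order.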

[0033] FIG. 5 is a schematic diagram of the internal units of the spectrum analyzer.

[0034] As shown in FIG. 5, the generated phase spectrum is passed to the spectrum analyzer for analysis. The invention can detect compound documents bound with malicious code because normal documents and infected documents differ in the uniformity, phase values, and spectral width of their phase spectra. The feature extraction unit extracts spectral features from the phase spectrum, and the judgment unit then decides, according to the judgment criteria in the database, whether malicious code has been attached to the compound document. The most important part of the detection system is the formulation of the judgment criteria, which involves generating and testing a large number of samples, computing their phase spectra, and analyzing the phase spectra of the comparison groups. Because the binary data of each kind of compound document is organized differently, their phase spectrum features also differ greatly, so judgment criteria must be set separately for each kind of compound document.

[0035] The phase spectrum features used to formulate judgment criteria in the detection system mainly include one or more of the following.

[0036] 1) Whether the phase is distributed uniformly over the whole frequency range. The phase spectrum of a compound document bound with malicious code generally shows pulses at its upper and lower edges, which make the phase distribution non-uniform.

[0037] 2) The phase values of the pulses at the upper and lower edges of the phase spectrum. For example, the upper edge of the phase spectrum of a PDF document bound with malicious code shows a sharp pulse with a phase greater than 0°, whereas the upper edge of the corresponding normal document is relatively flat, with phases essentially all below -30°.

[0038] 3) The overall range of the phase distribution. For example, the phase distribution of a Word document containing malicious code spans roughly twice the range of the corresponding normal document.
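
The three features listed above could be computed along the following lines. This is only a sketch: the patent gives no formulas, so the pulse measure (values more than `k` standard deviations from the mean), the dictionary keys, and both function names are assumptions. The final helper encodes only the PDF example from paragraph [0037], not a full judgment criterion.

```python
import statistics


def extract_features(phases_deg, k=3.0):
    """Summarise a phase spectrum given as a list of phases in degrees."""
    if len(phases_deg) < 2:
        raise ValueError("need at least two phase values")
    mean = statistics.fmean(phases_deg)
    spread = statistics.pstdev(phases_deg)
    # Assumed stand-in for "uniformity": count values that stick out as pulses.
    pulses = sum(
        1 for p in phases_deg if spread > 0 and abs(p - mean) > k * spread
    )
    upper, lower = max(phases_deg), min(phases_deg)
    return {
        "upper_edge": upper,           # highest phase value
        "lower_edge": lower,           # lowest phase value
        "phase_range": upper - lower,  # overall span of the distribution
        "pulse_count": pulses,
    }


def pdf_upper_edge_suspicious(features):
    """Illustrates the PDF example only: an upper-edge pulse above 0 degrees."""
    return features["upper_edge"] > 0.0
```

A spectrum that sits flat near -40° with a single spike at 10° yields a `pulse_count` of 1 and is flagged by the PDF helper, while a flat spectrum is not.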

[0039] The judgment criteria are obtained mainly by comparing, for each document type, the general differences between the phase spectra of normal documents and sample documents. Finding such general differences requires a large number of comparison groups. First, a large number of normal documents of each type are constructed; then a binding tool for the corresponding document type is used to bind each type of malicious code into the normal documents. Each normal document and its corresponding malicious-code-bound sample document form one comparison group. Every sample document must be verified. The phase spectrum differences within each group are analyzed, and the general differences are then compiled statistically into judgment criteria. While the criteria are being formulated, each comparison group is additionally processed with different bit-group sizes, sampling frequencies, and algorithms; the results are compared against one another to find the combination of bit-group size, sampling frequency, and algorithm that makes the feature difference within the group most pronounced.
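
The search over bit-group sizes, sampling intervals, and algorithms can be pictured as a plain grid search. The sketch below is a hypothetical illustration: `score_gap` is a caller-supplied function that measures how strongly a normal document and its bound sample differ under one configuration, and averaging the score over all comparison groups is an assumption. The candidate bit widths are those listed in claim 3.

```python
from itertools import product


def best_configuration(
    pairs,
    score_gap,
    bit_widths=(2, 3, 4, 5, 6, 7, 8, 16),
    skips=(0, 1, 2),
    transforms=("fft", "wavelet"),
):
    """Return (mean_gap, bits, skip, transform) with the largest mean gap.

    `pairs` is a non-empty list of (normal_bytes, bound_bytes) comparison
    groups; `score_gap(normal, bound, bits, skip, transform)` returns a number
    that grows with the phase spectrum difference between the two documents.
    """
    best = None
    for bits, skip, transform in product(bit_widths, skips, transforms):
        gaps = [score_gap(n, b, bits, skip, transform) for n, b in pairs]
        mean_gap = sum(gaps) / len(gaps)
        if best is None or mean_gap > best[0]:
            best = (mean_gap, bits, skip, transform)
    return best
```

Running the search separately for each document type and malicious code type would give one preferred configuration per criterion, matching the per-case criteria described in the text.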

[0040] FIG. 6 is a flowchart showing the complete workflow of the detection system.

[0041] As shown in FIG. 6, when a compound document is input, its MD5 value is computed first to determine whether the document has already been examined, preventing repeated detection of the same document. If it has not, the data extractor extracts the data, converts the binary data into a real-number sequence, and samples the sequence at the sampling rate; the phase spectrum generator then generates the document's phase spectrum. On receiving the phase spectrum, the spectrum analyzer first extracts features, then analyzes the spectral characteristics of the phase spectrum from the extracted feature information, and finally makes the final judgment according to the judgment criteria stored in the database for that type of compound document, returning the result to the management end.

[0042] As described above, the invention detects bound malicious code by generating and extracting the spectral features of a compound document. Its advantages are: 1. The detection target is highly specific, and the judgment criteria for each kind of document are formulated through independent sample analysis, so the system's detection accuracy is much higher than that of traditional detection techniques. 2. The detection technique does not need to parse the document during detection, which effectively protects the security of system data and user data. 3. The judgment criteria are based on differences in the phase spectra of binary data and are derived from extensive data comparison and statistics, so the system can effectively detect code designed to bypass certain protection mechanisms, lowering the false-negative rate. 4. The detection method differs from traditional static and dynamic detection; by analyzing the spectrum, it can detect unknown malicious code.

[0043] Although preferred embodiments of the invention have been described for purposes of illustration, those skilled in the art will understand that various modifications, additions, and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the appended claims.

Claims (9)

1. A compound document malicious code detection technique based on spectral analysis, characterized in that the method comprises the following steps: A. comparing and analyzing the phase spectra of a large number of normal compound documents and their corresponding sample documents to derive general differences and formulate judgment criteria; B. a data extractor converting the binary data of the compound document into a real-number sequence; C. a phase spectrum generator applying a Fourier or similar transform to the compound document's data and plotting the compound document's phase spectrum; D. a spectrum analyzer extracting the features of the phase spectrum; E. the spectrum analyzer performing detection according to the judgment criteria and the extracted phase spectrum features of the compound document.
2. The compound document malicious code detection technique based on spectral analysis according to claim 1, characterized in that step A further comprises the following steps: A1. designing comparison experiments, with different experimental combinations designed according to the type of bound malicious code, the type of compound document, the type of transform algorithm, the binary-to-real-sequence conversion method, the sampling frequency, and other variables; A2. transforming the samples in each combination accordingly to generate their phase spectra; A3. comparing and analyzing the sample phase spectra generated in step A2 to find the general differences between normal samples and trojan-bound samples, and then formulating judgment criteria for each case.
3. The compound document malicious code detection technique based on spectral analysis according to claim 2, characterized in that the different cases in step A1 specifically are: the types of malicious code include downloader code, keylogger code, registry-modifying code, password-sending code, data-uploading code, pop-up window code, and the like; the types of compound document include Office-series documents (Word, Excel, PowerPoint), PDF documents, and the like; the transform algorithms include the fast Fourier transform (FFT) algorithm, wavelet transform algorithms, and the like; the binary-to-real-sequence conversion methods use 2, 3, 4, 5, 6, 7, 8, or 16 bits, respectively, as one unit for conversion into the real-number sequence; the sampling frequency means the interval, in elements, at which one sample is taken from the complete generated real-number sequence; within each experimental combination, the only difference is whether the document is clean or bound with a given type of malicious code, while all other conditions, such as the transform algorithm, document format, and sampling rate, are identical.
4. The compound document malicious code detection technique based on spectral analysis according to claim 2, characterized in that the judgment criteria in step A3 specifically are: features of the phase spectrum that determine whether a compound document is bound with malicious code. Each judgment criterion specifies in detail the object it applies to, for example a criterion for Word documents bound with keylogger malicious code, and each criterion carries further qualifying conditions, for example that it applies to phase spectra generated with 4 bits as one unit for binary-to-real conversion, full sampling, and the FFT algorithm.
5. The compound document malicious code detection technique based on spectral analysis according to claim 1, characterized in that step B further comprises the following steps: B1. reading the compound document into memory and determining the document type from flag bits at fixed positions; B2. separating the document's data according to its type and extracting the data portion of fixed fields; B3. converting the extracted data portion, grouping the binary data as described in claim 3 and converting it into a real-number sequence.
6. The compound document malicious code detection technique based on spectral analysis according to claim 1, characterized in that step C further comprises the following steps, performed after the phase spectrum generator receives the converted real-number sequence from step B: C1. constructing a function in the time domain with the real-number sequence as the dependent variable and the position of each real number in the file as the time variable; C2. applying a fast Fourier transform or wavelet transform to the constructed function to obtain its frequency-domain representation; C3. plotting the compound document's phase spectrum from the transform result, visualizing the data.
7. The compound document malicious code detection technique based on spectral analysis according to claim 1, characterized in that the phase spectrum features in step D specifically comprise: D1. features of the uniformity of the phase spectrum; D2. features of the interval over which the phases of the phase spectrum are distributed; D3. features of the frequency-domain range over which the phase spectrum is distributed.
8. A compound document malicious code detection system based on spectral analysis, characterized in that the system comprises: F. a management end; G. a data extractor; H. a phase spectrum generator; I. a spectrum analyzer; and characterized in that the system further comprises: F1. a logic control unit for controlling the operating logic of the whole system and coordinating its parts; F2. a database management unit for managing and maintaining the database; F3. a policy configuration unit for managing the policy configuration information used during detection and supporting the logic control unit's control of the whole system; F4. a user interface unit for interaction between the system and the user, through which the user views detection results and changes system configuration; I1. a feature extraction unit for extracting feature information from the generated compound document phase spectrum; I2. a judgment criteria maintenance unit for maintaining the system's judgment criteria; I3. a judgment unit for judging the compound document according to the extracted feature information and the judgment criteria, determining whether it is bound with malicious code and which type of malicious code is bound.
9. A compound document malicious code detection system based on spectral analysis, characterized in that the management end F computes an MD5 value for each examined document and records its detection result; when a new document is to be examined, its MD5 value is computed first to determine whether it has already been examined, avoiding repeated detection.
CN201310224569.1A 2013-06-07 2013-06-07 Compound document malicious code detection method and system based on spectrum analysis CN103294954B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310224569.1A CN103294954B (en) 2013-06-07 2013-06-07 Compound document malicious code detection method and system based on spectrum analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310224569.1A CN103294954B (en) 2013-06-07 2013-06-07 Compound document malicious code detection method and system based on spectrum analysis

Publications (2)

Publication Number Publication Date
CN103294954A true CN103294954A (en) 2013-09-11
CN103294954B CN103294954B (en) 2015-12-02

Family

ID=49095796

Family Applications (1)

Application Number Priority Date Filing Date Title
CN201310224569.1A CN103294954B (en) 2013-06-07 2013-06-07 Compound document malicious code detection method and system based on spectrum analysis

Country Status (1)

Country Link
CN (1) CN103294954B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488408A (en) * 2014-12-31 2016-04-13 中国信息安全认证中心 Identification method and system of malicious sample type on the basis of characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007100916A2 (en) * 2006-02-28 2007-09-07 The Trustees Of Columbia University In The City Of New York Systems, methods, and media for outputting a dataset based upon anomaly detection
CN101459445A (en) * 2008-12-29 2009-06-17 浙江大学 Cooperative spectrum sensing method in cognitive radio system
US8370938B1 (en) * 2009-04-25 2013-02-05 Dasient, Inc. Mitigating malware

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007100916A2 (en) * 2006-02-28 2007-09-07 The Trustees Of Columbia University In The City Of New York Systems, methods, and media for outputting a dataset based upon anomaly detection
WO2007100916A3 (en) * 2006-02-28 2008-04-24 Univ Columbia Systems, methods, and media for outputting a dataset based upon anomaly detection
CN101459445A (en) * 2008-12-29 2009-06-17 浙江大学 Cooperative spectrum sensing method in cognitive radio system
US8370938B1 (en) * 2009-04-25 2013-02-05 Dasient, Inc. Mitigating malware

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
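白金荣 et al.: "Malware detection method based on ELF static structural features", 《四川大学学报(工程科学版)》 (Journal of Sichuan University, Engineering Science Edition) *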

Also Published As

Publication number Publication date
CN103294954B (en) 2015-12-02

Similar Documents

Publication Publication Date Title
Beebe Digital forensic research: The good, the bad and the unaddressed
US8555385B1 (en) Techniques for behavior based malware analysis
US8713681B2 (en) System and method for detecting executable machine instructions in a data stream
CN100461132C (en) Software safety code analyzer based on static analysis of source code and testing method therefor
US8219588B2 (en) Methods for searching forensic data
US8881271B2 (en) System and method for forensic identification of elements within a computer system
US20100192222A1 (en) Malware detection using multiple classifiers
US9690935B2 (en) Identification of obfuscated computer items using visual algorithms
US8656095B2 (en) Digital forensic acquisition kit and methods of use thereof
US20120158625A1 (en) Creating and Processing a Data Rule
Quick et al. Forensic collection of cloud storage data: Does the act of collection result in changes to the data or its metadata?
US20120317421A1 (en) Fingerprinting Executable Code
CN103918222A (en) System and method for detection of denial of service attacks
KR20090051956A (en) The method and apparatus for judging dll inserted by malicious code in an operation system
Kenneally et al. Risk sensitive digital evidence collection
CN102314561B (en) Automatic analysis method and system of malicious codes based on API (application program interface) HOOK
CN103839003B (en) Method and apparatus for detecting a malicious file
McKemmish When is digital evidence forensically sound?
US20130074198A1 (en) Methods and systems to fingerprint textual information using word runs
USRE42382E1 (en) Volume mount authentication
US20070239993A1 (en) System and method for comparing similarity of computer programs
Ademu et al. A new approach of digital forensic model for digital forensic investigation
Roussev Hashing and data fingerprinting in digital forensics
CN101140611A (en) Malevolence code automatic recognition method
US7941386B2 (en) Forensic systems and methods using search packs that can be edited for enterprise-wide data identification, data sharing, and management

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model