WO2017114290A1 - Sample correlation detection method, system, electronic device, and storage medium - Google Patents

Sample correlation detection method, system, electronic device, and storage medium

Info

Publication number
WO2017114290A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
samples
correlation
detected
features
Prior art date
Application number
PCT/CN2016/111566
Other languages
English (en)
French (fr)
Inventor
张路
潘宣辰
Original Assignee
武汉安天信息技术有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 武汉安天信息技术有限责任公司
Publication of WO2017114290A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action

Definitions

  • the present invention relates to the field of information technology, and in particular, to a sample correlation detection method, system, electronic device, and storage medium.
  • the present invention provides a sample correlation detection method and system that yield more accurate sample association relationships with stronger heuristic value, and that can be widely applied in fields such as malicious code detection and malicious code analysis.
  • a sample correlation detection method includes:
  • samples and features are taken as nodes, and the connections between associated samples and features are taken as edges, to construct the association network graph;
  • calculating the degree of association between two samples in the sample set may be: traversing the code of each sample to obtain its class names and method names, and comparing the class names of the two samples; if a class name is the same, further counting the method names that the two samples share within that class; summing these intersection counts over all shared class names; and dividing the sum by the size of the union of all method names of the two samples, which gives the degree of association between the two samples.
  • the correlation degrees between the associated features are the same.
  • feature extraction is performed on the sample set in multiple dimensions, and the features include at least the static and dynamic information that can be extracted from a sample, as well as other information obtained by processing the static and dynamic information; these features may be further refined into a sample source dimension, a sample identification dimension, and a sample name dimension. Specifically:
  • the sample source dimension may include: ip, sp, email, url or whois information of the domain name;
  • the sample identification dimension may include: a hash value of a sample resource file or icon.
  • the hash algorithms here include not only algorithms used as unique identifiers, such as MD5, SHA1, and crc32, but also fuzzy hashes, locality-sensitive hash algorithms, and the like;
  • the sample name dimension may include: a sample hash, a package name, a program name, a file signature, or a certificate.
  • the sample hash algorithms here include not only algorithms used as unique identifiers, such as MD5, SHA1, and crc32, but also fuzzy hashes, locality-sensitive hash algorithms, and the like.
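  • after the samples corresponding to the features on the corresponding connections are output, the sample correlation detection method further includes: determining, according to the output samples, whether the sample to be detected is a malicious sample.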
  • a sample correlation detection system includes:
  • a sample collection module for collecting known white sample files and black sample files to form a sample set
  • a feature extraction module configured to perform feature extraction on a plurality of dimensions of the sample set
  • the sample correlation degree calculation module is configured to respectively calculate the degree of association between two samples in the sample set. If the degree of association is greater than the first preset value, the two samples have an association, otherwise the two samples have no correlation;
  • a feature judging module configured to determine, for the samples in the sample set, whether their features are the same in each dimension; if so, the features of the samples in the corresponding dimension are considered associated, and an association degree value between the associated features is given; otherwise, the samples have no association in the corresponding dimension;
  • an association network graph construction module configured to construct an association network graph according to the association between samples and the association of the samples' features in the corresponding dimensions, taking samples and features as nodes and the connections between associated samples and features as edges;
  • a to-be-detected sample association module configured to acquire the features of the sample to be detected in each dimension, calculate the degree of association between the sample to be detected and the samples in the sample set, embed the sample to be detected and its features in the dimensions into the constructed association network graph, and connect them to form a new association network graph;
  • a result output module configured to calculate the product of the association degree values between the sample to be detected and the samples on each connection in the new association network graph, and to determine whether the product exceeds a second preset value; if it does, the samples corresponding to the features on the corresponding connection are output to the user.
  • calculating the degree of association between two samples in the sample set is specifically: traversing the code of each sample to obtain its class names and method names, and comparing the class names of the two samples; if a class name is the same, further counting the method names that the two samples share within that class; summing these intersection counts over all shared class names; and dividing the sum by the size of the union of all method names of the two samples, which gives the degree of association between the two samples.
  • feature extraction is performed on the sample set in multiple dimensions, and the features include at least the static and dynamic information that can be extracted from a sample, as well as other information obtained by processing the static and dynamic information; these features may be further refined into a sample source dimension, a sample identification dimension, and a sample name dimension. Specifically:
  • the sample source dimension may include: ip, sp, email, url or whois information of the domain name;
  • the sample identification dimension may include: a hash value of a sample resource file or icon.
  • the hash algorithms here include not only algorithms used as unique identifiers, such as MD5, SHA1, and crc32, but also fuzzy hashes and locality-sensitive hash algorithms;
  • the sample name dimension may include: a sample hash, a package name, a program name, a file signature, or a certificate.
  • the sample hash algorithms here include not only algorithms used as unique identifiers, such as MD5, SHA1, and crc32, but also fuzzy hashes and locality-sensitive hash algorithms.
  • the sample correlation detection system further includes:
  • a malicious sample judging module configured to determine, after the result output module outputs the samples corresponding to the features on the corresponding connection, whether the sample to be detected is a malicious sample according to those output samples.
  • the present invention also provides an electronic device comprising: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
  • samples and features are taken as nodes, and the connections between associated samples and features are taken as edges, to construct an association network graph;
  • the sample to be detected and its features in each dimension are embedded into the association network graph and connected to form a new association network graph
  • the present invention also proposes a storage medium for storing an application for performing the sample relevance detecting method of the present invention at runtime.
  • the sample correlation detection method and system proposed by the present invention include: acquiring a sample set and calculating the features and degrees of association of the sample set in each dimension; taking samples and sample features as nodes and the connections between associated samples and nodes as edges to construct an association network graph; acquiring the features of the sample to be detected and embedding them into the association network graph; and calculating the product of the association degree values between the sample to be detected and the samples on each connection in the new association network graph, and, if the product is greater than the second preset value, outputting the samples on the corresponding connection.
  • by detecting the association of non-code features, the invention provides more accurate association relationships and stronger heuristic value, and can be widely applied in fields such as malicious code detection and malicious code analysis.
  • FIG. 1 is a flow chart of a sample correlation detection method according to the present invention.
  • FIG. 2 is a schematic diagram of constructing an associated network diagram in accordance with the method of the present invention.
  • FIG. 3 is a schematic diagram of constructing a new associated network in accordance with the method of the present invention.
  • FIG. 4 is a schematic structural diagram of a sample correlation detection system according to the present invention.
  • the invention provides a sample correlation detection method, system, electronic device and storage medium. By constructing an association network graph between samples and calculating association weights between samples, the association between the sample to be detected and known samples is obtained, thereby providing auxiliary information for malicious code judgment and analysis.
  • a sample correlation detection method as shown in FIG. 1, includes:
  • S101 collects known sample files to form a sample set.
  • in the field of malicious code detection, known samples can be white samples (officially released normal samples without malicious code) or black samples (samples containing malicious code); however, to improve the accuracy of malicious code detection, that is, to fully show how the sample to be detected is associated with white samples and with black samples respectively, the sample set preferably includes both white sample files and black sample files.
  • S102 performs feature extraction on the sample set in multiple dimensions.
  • as an example, feature extraction is performed on the sample set in multiple dimensions, and the features include at least the static and dynamic information that can be extracted from a sample, as well as other information obtained by processing the static and dynamic information; these features may be further refined into a sample source dimension, a sample identification dimension, a sample name dimension, and so on:
  • the sample source dimension may include: ip, sp, email, url or whois information of the domain name;
  • the sample identification dimension may include: a hash value of a sample resource file or icon.
  • the hash algorithms here include not only algorithms used as unique identifiers, such as MD5, SHA1, and crc32, but also fuzzy hashes, locality-sensitive hash algorithms, and the like;
  • the sample name dimension may include: a sample hash, a package name, a program name, a file signature or certificate, etc.
  • the sample hash algorithms here include not only algorithms used as unique identifiers, such as MD5, SHA1, and crc32, but also fuzzy hashes and locality-sensitive hash algorithms.
  • S103 respectively calculates the degree of association between two samples in the sample set. If the degree of association is greater than the first preset value, it is determined that there is an association between the two samples.
  • there are various methods for calculating the degree of association between two samples in the sample set; this embodiment shows one that is convenient to use and efficient. Specifically: traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples. If the two samples share a class name, count the method names that both samples have within that class, and sum these counts over all shared class names to obtain the total method name intersection count.
  • the total method name intersection count is then divided by the size of the union of all method names of the two samples, and the resulting value is the degree of association between the two samples.
  • in this embodiment, the first preset value is set to 0.5. For example, the situation of an existing white sample 1 is shown in Table 1:
  • step S102 can also be performed after step S103.
  • heuristics at the code level work mainly in scenarios where there is functional or code-development homology; in real-world confrontation, as attackers' means become richer and advanced targeted threats increase, more and more malicious code, in order to better bypass detection and countermeasures, deliberately adopts targeted development strategies to avoid code-level association detection.
  • S104 determines, for the samples in the sample set, whether their features are the same in the corresponding dimensions; if two samples have the same feature in at least one dimension, for example the same ip, email, or url, the same resource file, or the same icon, the features of the samples in the corresponding dimension are considered associated, and an association degree value between the associated features is given; otherwise, the features of the samples in the corresponding dimension are determined not to be associated.
  • the association degree values between associated features in each dimension can generally be set according to empirical values. In this embodiment, except for the degree of association between samples, which needs to be calculated separately, the association degree values between all other associated features are the same; for example, they can all be set to 0.5.
  • S105 constructs an association network graph according to the association between samples and the association of the samples' features in the corresponding dimensions, taking samples and features as nodes and the connections between associated samples and features as edges;
  • for example, suppose that after calculation the degree of association between sample 1 and sample 4 is 0.7 and that between sample 2 and sample 3 is 0.85; sample 1 has the features icon 1 and ip1, sample 2 has the features sp1 and icon 1, sample 3 has the features package name 1 and icon 1, and sample 4 has icon 1. The association network graph is then constructed, with an association degree value of 0.5 between associated features.
  • S106 acquires the features of the sample to be detected in each dimension and calculates the degree of association between the sample to be detected and the samples in the sample set. If the requirement is met, the sample to be detected and its features in the dimensions are embedded into the constructed association network graph and connected to form a new association network graph;
  • meeting the requirement in this step can be understood as: the sample to be detected is associated with a sample in the sample set, that is, the degree of association between them is greater than the first preset value.
  • for example, if the degree of association between the sample to be detected and sample 2 is 0.95, its degree of association with the other samples is less than 0.5, and the sample to be detected has the feature sp1, then it is embedded into the constructed association network graph and connected to form a new association network graph.
  • S107 calculates the product of the association degree values between the sample to be detected and the samples on each connection in the new association network graph, and determines whether the product exceeds a second preset value, such as 0.2; if it does, the samples corresponding to the features on the corresponding connection are output; otherwise, the samples on that connection are discarded.
  • the present invention can be applied to detecting whether a sample is malicious. If the maliciousness of a sample is unknown, its degree of association with every sample in the graph (that is, the new association network graph formed above) can be calculated separately, the samples whose degree of association is greater than the second preset value are selected, and the maliciousness of the unknown sample may be predicted from the sample with the largest degree of association.
  • the sample correlation detection method may further include: determining, according to the output samples corresponding to the features on the corresponding connection, whether the sample to be detected is a malicious sample.
  • the sample with the largest degree of association with the sample to be detected may first be found among the output samples, and whether the sample to be detected is malicious is then determined according to the type of that sample (such as malicious or normal). For example, for a sample X to be detected, the samples in the graph whose degree of association is greater than the second preset value are A, B, C, D, and E, of which C has the largest degree of association; C is known to be a malicious sample, so X is also a malicious sample.
  • voting is performed over all samples that satisfy the second preset value (i.e., the samples output above). For example, the output samples are A, B, C, D, and E, where A, C, D, and E are malicious samples and B is a normal sample. Since most of the samples associated with the sample X to be detected are malicious, X is predicted to be a malicious sample.
  • the present invention may also not directly give a maliciousness determination for the sample to be detected, but instead push the output samples A, B, C, D, and E to an analyst, who can then judge the sample to be detected more efficiently and accurately on the basis of this small sample set.
  • determining whether the sample to be detected is a counterfeit file according to the output associated samples also assists the detection of malicious code.
  • the invention has the advantages that the association between the sample to be detected and the known samples is derived from multiple kinds of information, such as samples and features, and is provided to the user for further determining whether the sample to be detected is a malicious or counterfeit sample; and if a large number of malicious samples are found to share the same feature during the association process, adding that feature to the rule base of the anti-virus engine may be considered.
  • corresponding to the sample correlation detection method provided by the foregoing embodiments, an embodiment of the present invention further provides a sample correlation detection system. Because the system corresponds to the method, the embodiments of the foregoing sample correlation detection method also apply to the sample correlation detection system provided in this embodiment and are not described in detail here.
  • FIG. 4 is a schematic structural diagram of a sample correlation detection system according to the present invention. As shown in FIG. 4, it includes:
  • the sample collection module 401 is configured to collect a known white sample file and a black sample file to form a sample set.
  • the feature extraction module 402 is configured to perform feature extraction on the plurality of dimensions of the sample set.
  • feature extraction is performed on the sample set in multiple dimensions, including at least: a sample source dimension, a sample identification dimension, and a sample name dimension; the sample source dimension includes: ip, sp, email, url, or the whois information of the domain name; the sample identification dimension includes: an MD5 value of the sample resource file or icon; the sample name dimension includes: a sample package name, a program name, a file signature, or a certificate.
  • the sample relevance calculation module 403 is configured to calculate the degree of association between two samples in the sample set; if the degree of association is greater than the first preset value, it determines that the two samples are associated; otherwise, it determines that the two samples are not associated.
  • the specific process by which the sample relevance calculation module 403 calculates the degree of association between two samples in the sample set may be as follows: traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples; if a class name is the same, count the method names that the two samples share within that class, sum these intersection counts over all shared class names, and divide the sum by the size of the union of all method names of the two samples, which gives the degree of association between the two samples.
  • the feature judging module 404 is configured to determine, for the samples in the sample set, whether their features are the same in each dimension; if so, the features of the samples in the corresponding dimension are considered associated, and an association degree value between the associated features is given; otherwise, the features of the samples in the corresponding dimension are determined not to be associated. In the system, the association degree values between the associated features are the same.
  • the association network graph construction module 405 is configured to construct an association network graph according to the association between samples and the association of the samples' features in the corresponding dimensions, taking samples and features as nodes and the connections between associated samples and features as edges.
  • the to-be-detected sample association module 406 is configured to acquire the features of the sample to be detected in each dimension, calculate the degree of association between the sample to be detected and the samples in the sample set, embed the sample to be detected and its features in the dimensions into the constructed association network graph, and connect them to form a new association network graph.
  • the result output module 407 is configured to calculate the product of the association degree values between the sample to be detected and the samples on each connection in the new association network graph, and to determine whether the product exceeds a second preset value; if it does, the samples corresponding to the features on the corresponding connection are output to the user.
  • the sample relevance detection system may further include: a malicious sample determination module.
  • the malicious sample judging module is configured to determine, after the result output module 407 outputs the sample corresponding to the feature on the corresponding connection line, whether the sample to be detected is a malicious sample according to the sample corresponding to the feature on the corresponding connection line.
  • the present invention also provides an electronic device comprising: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
  • S104': determining, for the samples in the sample set, whether their features are the same in each dimension; if so, considering the features of the samples in the corresponding dimension to be associated and giving an association degree value between the associated features; otherwise, determining that the features of the samples in the corresponding dimension are not associated;
  • S106': acquiring the features of the sample to be detected in the respective dimensions, calculating the degree of association between the sample to be detected and the samples in the sample set, embedding the sample to be detected and its features in the respective dimensions into the association network graph, and connecting them to form a new association network graph;
  • the present invention also provides a storage medium for storing an application for performing the sample relevance detecting method according to any of the above embodiments of the present invention at runtime.
  • the invention has the advantages that the association between the sample to be detected and the known samples is derived from multiple kinds of information, such as samples and features, and is provided to the user for further determining whether the sample to be detected is a malicious or counterfeit sample; and if a large number of malicious samples are found to share the same feature during the association process, adding that feature to the rule base of the anti-virus engine may be considered.
  • the invention provides a sample correlation detection method and system, comprising: acquiring a sample set and calculating the features and degrees of association of the sample set in each dimension; taking samples and sample features as nodes and the connections between associated samples and nodes as edges to construct an association network graph; acquiring the features of the sample to be detected and embedding them into the association network graph; and calculating the product of the association degree values between the sample to be detected and the samples on each connection in the new association network graph, and, if the product is greater than the second preset value, outputting the samples on the corresponding connection.
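Steps S106 and S107 above can be sketched in Python as follows. The text does not spell out exactly how the "product of the association degree values on each connection" is formed, so this sketch assumes one plausible reading: for every known sample, take the largest product of edge weights along any path from the sample to be detected, and output the samples whose product exceeds the second preset value. The names `add_edge`, `embed`, and `best_products`, the string node labels, and the 0.3 degrees given to samples 1, 3, and 4 are all assumptions made for illustration; the thresholds and the other weights come from the example in the text.

```python
import heapq
from collections import defaultdict

FIRST_PRESET = 0.5    # sample-to-sample threshold used in S103 and S106
SECOND_PRESET = 0.2   # product threshold used in S107
FEATURE_WEIGHT = 0.5  # association degree value given to every feature edge


def add_edge(graph, a, b, weight):
    """Add an undirected weighted edge."""
    graph[a][b] = weight
    graph[b][a] = weight


def embed(graph, name, degrees, features):
    """S106: link the new sample to associated samples and to its features."""
    for sample, degree in degrees.items():
        if degree > FIRST_PRESET:
            add_edge(graph, name, sample, degree)
    for feature in features:
        add_edge(graph, name, feature, FEATURE_WEIGHT)


def best_products(graph, start):
    """Largest product of edge weights on any path from start (weights <= 1)."""
    best = {start: 1.0}
    heap = [(-1.0, start)]
    while heap:
        neg, node = heapq.heappop(heap)
        product = -neg
        if product < best[node]:
            continue  # stale heap entry
        for neighbour, weight in graph[node].items():
            candidate = product * weight
            if candidate > best.get(neighbour, 0.0):
                best[neighbour] = candidate
                heapq.heappush(heap, (-candidate, neighbour))
    return best


# Association network graph from the S105 example.
known_samples = {"sample1", "sample2", "sample3", "sample4"}
graph = defaultdict(dict)
add_edge(graph, "sample1", "sample4", 0.7)
add_edge(graph, "sample2", "sample3", 0.85)
for sample, features in {
    "sample1": ["icon1", "ip1"],
    "sample2": ["sp1", "icon1"],
    "sample3": ["package_name1", "icon1"],
    "sample4": ["icon1"],
}.items():
    for feature in features:
        add_edge(graph, sample, feature, FEATURE_WEIGHT)

# Sample to be detected: 0.95 with sample 2, below 0.5 with the others
# (0.3 is a hypothetical value), and it carries the feature sp1.
degrees = {"sample1": 0.3, "sample2": 0.95, "sample3": 0.3, "sample4": 0.3}
embed(graph, "X", degrees, ["sp1"])

# S107: keep known samples whose best path product exceeds the threshold.
products = best_products(graph, "X")
output = {s: p for s, p in products.items()
          if s in known_samples and p > SECOND_PRESET}
```

Under this reading, sample 3 is reached through sample 2, and samples 1 and 4 are reached through sample 2 and the shared icon 1. A stricter reading that only multiplies along direct connections would return fewer samples.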

Landscapes

  • Engineering & Computer Science
  • Computer Security & Cryptography
  • Software Systems
  • Theoretical Computer Science
  • Computer Hardware Design
  • Physics & Mathematics
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Data Exchanges In Wide-Area Networks
  • Computer And Data Communications
  • Information Retrieval, Db Structures And Fs Structures Therefor

Abstract

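A sample correlation detection method, system, electronic device, and storage medium, comprising: acquiring a sample set and calculating the features and degrees of association of the sample set in each dimension; taking samples and sample features as nodes, and the connections between associated samples and nodes as edges, to construct an association network graph; acquiring the features of a sample to be detected and embedding them into the association network graph; and calculating the product of the association degree values between the sample to be detected and the samples on each connection in the new association network graph, and, if the product is greater than a second preset value, outputting the samples on the corresponding connection. Both code and sample attributes can be used to bring more information into the judgment, so the association relationships are more accurate and have stronger heuristic value. Associated samples can be output effectively, and the approach can be widely applied in fields such as malicious code detection and malicious code analysis.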

Description

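Sample correlation detection method, system, electronic device, and storage medium
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. "201511015286.1", filed on December 31, 2015 by 武汉安天信息技术有限责任公司 and entitled "A sample correlation detection method and system based on label propagation".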
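Technical field
The present invention relates to the field of information technology, and in particular to a sample correlation detection method, system, electronic device, and storage medium.
Background
At present, sample correlation detection is mostly performed by analysis at the code level. However, code-level heuristics work mainly in scenarios where there is functional or code-development homology. In real-world confrontation, as attackers' means become richer and advanced targeted threats increase, more and more malicious code deliberately adopts targeted development strategies to avoid code-level association detection, in order to better bypass detection and countermeasures. Through extensive analysis of the prior art, we found that although malicious code is developed by different means, its non-code features and human-perceptible aspects, such as external deception techniques, disguise techniques, and behavior, are strongly associated. It is therefore necessary to develop a new sample correlation detection technique to remedy the shortcomings of the prior art.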
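Summary of the invention
In view of this, the present invention proposes a sample correlation detection method and system that provide more accurate sample association relationships with stronger heuristic value, and that can be widely applied in fields such as malicious code detection and malicious code analysis.
A sample correlation detection method, comprising:
collecting known sample files to form a sample set;
performing feature extraction on the sample set in multiple dimensions;
calculating the degree of association between each pair of samples in the sample set; if the degree of association is greater than a first preset value, the two samples are associated; otherwise, the two samples are not associated;
determining, for the samples in the sample set, whether their features are the same in each dimension; if so, the features of the samples in the corresponding dimension are considered associated, and an association degree value between the associated features is given; otherwise, the samples have no association in the corresponding dimension;
constructing an association network graph according to the association between samples and the association of the samples' features in the corresponding dimensions, taking samples and features as nodes and the connections between associated samples and features as edges;
acquiring the features of a sample to be detected in each dimension, calculating the degree of association between the sample to be detected and the samples in the sample set, embedding the sample to be detected and its features in the dimensions into the constructed association network graph, and connecting them to form a new association network graph;
calculating the product of the association degree values between the sample to be detected and the samples on each connection in the new association network graph, and determining whether the product exceeds a second preset value; if it does, outputting to the user the samples corresponding to the features on the corresponding connection.
In the method, calculating the degree of association between two samples in the sample set may be: traversing the code of each sample to obtain its class names and method names, and comparing the class names of the two samples; if a class name is the same, further counting the method names that the two samples share within that class; summing these intersection counts over all shared class names; and dividing the sum by the size of the union of all method names of the two samples, which gives the degree of association between the two samples.
In the method, the association degree values between the associated features are the same.
In the method, feature extraction is performed on the sample set in multiple dimensions, and the features include at least the static and dynamic information that can be extracted from a sample, as well as other information obtained by processing the static and dynamic information; these features may be further refined into a sample source dimension, a sample identification dimension, and a sample name dimension. Specifically:
the sample source dimension may include: ip, sp, email, url, or whois information of a domain name, etc.;
the sample identification dimension may include: a hash value of a sample resource file or icon; note that the hash algorithms here include not only algorithms used as unique identifiers, such as MD5, SHA1, and crc32, but also fuzzy hashes, locality-sensitive hash algorithms, and the like;
the sample name dimension may include: a sample hash, a package name, a program name, a file signature, or a certificate; note that the sample hash algorithms here include not only algorithms used as unique identifiers, such as MD5, SHA1, and crc32, but also fuzzy hashes, locality-sensitive hash algorithms, and the like.
After the samples corresponding to the features on the corresponding connection are output, the sample correlation detection method further comprises:
determining, according to the output samples corresponding to the features on the corresponding connection, whether the sample to be detected is a malicious sample.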
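A sample correlation detection system, comprising:
a sample collection module, configured to collect known white sample files and black sample files to form a sample set;
a feature extraction module, configured to perform feature extraction on the sample set in multiple dimensions;
a sample association degree calculation module, configured to calculate the degree of association between each pair of samples in the sample set; if the degree of association is greater than a first preset value, the two samples are associated; otherwise, the two samples are not associated;
a feature judging module, configured to determine, for the samples in the sample set, whether their features are the same in each dimension; if so, the features of the samples in the corresponding dimension are considered associated, and an association degree value between the associated features is given; otherwise, the samples have no association in the corresponding dimension;
an association network graph construction module, configured to construct an association network graph according to the association between samples and the association of the samples' features in the corresponding dimensions, taking samples and features as nodes and the connections between associated samples and features as edges;
a to-be-detected sample association module, configured to acquire the features of a sample to be detected in each dimension, calculate the degree of association between the sample to be detected and the samples in the sample set, embed the sample to be detected and its features in the dimensions into the constructed association network graph, and connect them to form a new association network graph;
a result output module, configured to calculate the product of the association degree values between the sample to be detected and the samples on each connection in the new association network graph, and to determine whether the product exceeds a second preset value; if it does, the samples corresponding to the features on the corresponding connection are output to the user.
In the system, calculating the degree of association between two samples in the sample set is specifically: traversing the code of each sample to obtain its class names and method names, and comparing the class names of the two samples; if a class name is the same, further counting the method names that the two samples share within that class; summing these intersection counts over all shared class names; and dividing the sum by the size of the union of all method names of the two samples, which gives the degree of association between the two samples.
In the system, the association degree values between the associated features are the same.
In the system, feature extraction is performed on the sample set in multiple dimensions, and the features include at least the static and dynamic information that can be extracted from a sample, as well as other information obtained by processing the static and dynamic information; these features may be further refined into a sample source dimension, a sample identification dimension, and a sample name dimension. Specifically:
the sample source dimension may include: ip, sp, email, url, or whois information of a domain name;
the sample identification dimension may include: a hash value of a sample resource file or icon; note that the hash algorithms here include not only algorithms used as unique identifiers, such as MD5, SHA1, and crc32, but also fuzzy hashes and locality-sensitive hash algorithms;
the sample name dimension may include: a sample hash, a package name, a program name, a file signature, or a certificate; note that the sample hash algorithms here include not only algorithms used as unique identifiers, such as MD5, SHA1, and crc32, but also fuzzy hashes and locality-sensitive hash algorithms.
The sample correlation detection system further comprises:
a malicious sample judging module, configured to determine, after the result output module outputs the samples corresponding to the features on the corresponding connection, whether the sample to be detected is a malicious sample according to those output samples.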
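The present invention also proposes an electronic device, comprising: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
collecting known white sample files and black sample files to form a sample set;
performing feature extraction on the sample set in multiple dimensions;
calculating the degree of association between each pair of samples in the sample set; if the degree of association is greater than a first preset value, determining that the two samples are associated; otherwise, determining that the two samples are not associated;
determining, for the samples in the sample set, whether their features are the same in each dimension; if so, considering the features of the samples in the corresponding dimension to be associated and giving an association degree value between the associated features; otherwise, determining that the features of the samples in the corresponding dimension are not associated;
constructing an association network graph according to the association between the samples and the association of the samples' features in the corresponding dimensions, taking samples and features as nodes and the connections between associated samples and features as edges;
acquiring the features of a sample to be detected in the respective dimensions, calculating the degree of association between the sample to be detected and the samples in the sample set, embedding the sample to be detected and its features in the respective dimensions into the association network graph, and connecting them to form a new association network graph;
calculating the product of the association degree values between the sample to be detected and the samples on each connection in the new association network graph, and determining whether the product exceeds a second preset value; if it does, outputting the samples corresponding to the features on the corresponding connection.
The present invention also proposes a storage medium for storing an application program, the application program being used to perform, at runtime, the sample correlation detection method of the present invention.
In response to the current situation in which developers of malicious applications deliberately bypass detection and countermeasures at the code level, the sample correlation detection method and system proposed by the present invention include: acquiring a sample set and calculating the features and degrees of association of the sample set in each dimension; taking samples and sample features as nodes and the connections between associated samples and nodes as edges to construct an association network graph; acquiring the features of the sample to be detected and embedding them into the association network graph; and calculating the product of the association degree values between the sample to be detected and the samples on each connection in the new association network graph, and, if the product is greater than the second preset value, outputting the samples on the corresponding connection. By detecting the association of non-code features, the present invention provides more accurate association relationships and stronger heuristic value, and can be widely applied in fields such as malicious code detection and malicious code analysis.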
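Brief description of the drawings
To explain the technical solutions of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a sample correlation detection method according to the present invention;
FIG. 2 is a schematic diagram of constructing an association network graph according to the method of the present invention;
FIG. 3 is a schematic diagram of constructing a new association network graph according to the method of the present invention;
FIG. 4 is a schematic structural diagram of a sample correlation detection system according to the present invention.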
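Detailed Description
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features, and advantages of the present invention clearer, the technical solutions of the present invention are described in further detail below with reference to the drawings.
The present invention provides a sample correlation detection method, system, electronic device, and storage medium. By constructing a correlation network graph among samples and calculating correlation weights between samples, it obtains the correlation between a sample to be detected and known samples, thereby providing auxiliary information for judging and analyzing malicious code.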
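In some embodiments, a sample correlation detection method, as shown in Fig. 1, includes:
S101: Collect known sample files to form a sample set.
In the field of malicious code detection, a known sample may be a white sample (a normal, officially released sample containing no malicious code) or a black sample (a sample containing malicious code). To improve the accuracy of malicious code detection, that is, to fully show how the sample to be detected correlates with white samples and with black samples respectively, the sample set preferably contains both white sample files and black sample files.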
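S102: Perform feature extraction on the sample set in multiple dimensions.
As an example, the feature extraction on the sample set in multiple dimensions uses as features at least the static and dynamic information that can be extracted from a sample, as well as other information obtained by processing that static and dynamic information. These features can be further refined into a sample source dimension, a sample identification dimension, a sample name dimension, and so on:
The sample source dimension may include: ip, sp, email, url, or the whois information of a domain name, and so on;
The sample identification dimension may include: the hash value of a sample resource file or icon. It should be noted that the hash algorithms here include not only uniquely identifying algorithms such as MD5, SHA1, and crc32, but also fuzzy hash and locality-sensitive hash algorithms, and so on;
The sample name dimension may include: the sample hash, package name, program name, file signature, or certificate, and so on. It should be noted that the sample hash algorithms here include not only uniquely identifying algorithms such as MD5, SHA1, and crc32, but also fuzzy hash and locality-sensitive hash algorithms.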
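As a rough illustration of the three dimensions listed above, the following sketch gathers exact-match features into (dimension, kind, value) triples. The record fields (`ip`, `sp`, `email`, `url`, `icon`, `package`, `program`) and both function names are assumptions made for this example and are not part of the described method. Only the exact-match hashes (MD5, SHA1, crc32) are shown; fuzzy and locality-sensitive hashes would require a separate library and are left out.

```python
import hashlib
import zlib


def identity_hashes(data: bytes) -> dict:
    """Exact-match hashes for a resource file or icon."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
        "crc32": format(zlib.crc32(data), "08x"),
    }


def extract_features(sample: dict) -> set:
    """Collect (dimension, kind, value) triples from a hypothetical sample record."""
    features = set()
    for kind in ("ip", "sp", "email", "url"):          # sample source dimension
        if kind in sample:
            features.add(("source", kind, sample[kind]))
    if "icon" in sample:                               # sample identification dimension
        icon_md5 = identity_hashes(sample["icon"])["md5"]
        features.add(("identification", "icon_md5", icon_md5))
    for kind in ("package", "program"):                # sample name dimension
        if kind in sample:
            features.add(("name", kind, sample[kind]))
    return features
```

Representing each feature as a hashable triple means that two samples share a feature exactly when the triple appears in both sets, so the later "same feature" check reduces to a set intersection.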
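S103: Calculate the correlation degree between each pair of samples in the sample set; if the correlation degree is greater than the first preset value, determine that the two samples are correlated.
It can be understood that there are many ways to calculate the correlation degree between two samples in the sample set; this embodiment shows one that is convenient to use and efficient. Specifically: traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples. For each class name that the two samples share, calculate the number of method names in the intersection of the two samples under that class name; accumulate these counts over all shared class names to obtain the total method-name intersection count; then divide that total by the number of method names in the union of all method names of the two samples. The resulting value is the correlation degree between the two samples. In this embodiment, the first preset value is set to 0.5. For example, white sample 1 is as shown in Table 1:
Table 1
Figure PCTCN2016111566-appb-000001
Sample 2 is as shown in Table 2:
Table 2
Figure PCTCN2016111566-appb-000002
As the tables show, white sample 1 and black sample 2 share two class names, class name 2 and class name 3. The method-name intersection under class name 2 and class name 3 totals 5 (method 201, method 202, method 203, method 301, and method 302), and the union of all method names of the two samples numbers 10 (method 101, method 102, method 103, method 201, method 202, method 203, method 301, method 302, method 303, and method 304), so the correlation degree between the two samples is 5/10 = 0.5.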
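The calculation described above can be sketched as follows. Each sample is modeled as a mapping from class name to the set of method names in that class. Because Tables 1 and 2 are only available as images, the split of methods between the two samples below is a hypothetical assignment chosen to match the stated totals (5 shared methods, 10 methods in the union); the function names are likewise assumptions.

```python
def correlation_degree(a: dict, b: dict) -> float:
    """a and b map class name -> set of method names in that class."""
    shared_classes = a.keys() & b.keys()
    # Sum the method-name intersections over every shared class name.
    intersection = sum(len(a[c] & b[c]) for c in shared_classes)
    # Union of all method names appearing in either sample.
    union = len(set().union(*a.values(), *b.values()))
    return intersection / union if union else 0.0


def is_correlated(degree: float, first_preset: float = 0.5) -> bool:
    """Samples are correlated only when the degree is strictly greater."""
    return degree > first_preset


# Hypothetical method layout consistent with the totals given in the text.
sample1 = {
    "class1": {"m101", "m102", "m103"},
    "class2": {"m201", "m202", "m203"},
    "class3": {"m301", "m302", "m303"},
}
sample2 = {
    "class2": {"m201", "m202", "m203"},
    "class3": {"m301", "m302", "m304"},
}
degree = correlation_degree(sample1, sample2)
```

With this layout `degree` is 5/10 = 0.5. Since the comparison is strict, a degree of exactly 0.5 does not exceed a first preset value of 0.5, so `is_correlated(degree)` is false for this pair.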
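It can be understood that, in the embodiments of the present invention, if the correlation degree is less than or equal to the first preset value, the two samples can be determined not to be correlated.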
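It should be pointed out again that code-level heuristics mostly apply to scenarios in which samples share a common origin in functionality or code development. In real-world confrontation, as attackers' techniques grow richer and advanced targeted threats increase, more and more malicious code, in order to better evade detection and countermeasures, adopts targeted development strategies to avoid code-level correlation detection. For example, many counterfeit applications only reuse the resource files of the genuine application and do not copy it at the code level. Judging sample correlation from code-level content alone is therefore unreliable, and sample correlation also needs to be judged from multiple other dimensions. It can be understood that step S102 may also be performed after step S103.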
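S104: Determine whether the features of the samples in the sample set are the same in the corresponding dimensions. If two samples have the same feature in at least one dimension (for example the same ip, email, or url, the same resource file, or the same icon), the features of the samples in the corresponding dimension are regarded as correlated and a correlation degree value between the correlated features is assigned; otherwise the features of the samples in the corresponding dimension are determined not to be correlated.
It can be understood that the correlation degree values between correlated features in each dimension can generally be set from empirical values. In this embodiment, apart from the sample-to-sample correlation degree, which must be calculated separately, the correlation degree values between all other correlated features are the same; for example, they may all be set to 0.5.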
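A minimal sketch of the per-dimension comparison in S104, assuming each sample's features are held in a mapping from dimension name to value. The function name and the dimension keys are hypothetical; the fixed value 0.5 is the one used in this embodiment.

```python
FEATURE_CORRELATION = 0.5  # same value for every correlated feature in this embodiment


def correlated_features(fa: dict, fb: dict, value: float = FEATURE_CORRELATION) -> dict:
    """Return {dimension: correlation value} for dimensions where both samples
    have a feature and the two feature values are identical."""
    return {dim: value for dim in fa.keys() & fb.keys() if fa[dim] == fb[dim]}
```

For instance, `correlated_features({"icon": "icon1", "ip": "ip1"}, {"icon": "icon1", "sp": "sp1"})` reports a correlation only in the `icon` dimension. A dimension present in just one sample, or present in both with different values, contributes nothing.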
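S105: Construct a correlation network graph according to the correlation between samples and the correlation of the features of the samples in the corresponding dimensions, with samples and features as nodes and the connecting lines between correlated samples and features as edges.
For example, as shown in Fig. 2, suppose that after calculation the correlation degree between sample 1 and sample 4 is 0.7 and the correlation degree between sample 2 and sample 3 is 0.85; sample 1 has the features icon 1 and ip1, sample 2 has the features sp1 and icon 1, sample 3 has the features package name 1 and icon 1, and sample 4 has icon 1. The correlation network graph is then constructed, with a correlation degree value of 0.5 between the correlated features.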
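The Fig. 2 example can be sketched as a weighted undirected graph stored as nested dictionaries. This is one possible representation, not the one prescribed by the source. It assumes that every feature becomes a node linked to the samples that have it (with weight 0.5), that sample names and feature names do not collide, and that only sample pairs whose degree exceeds the first preset value get a direct edge. The function and variable names are hypothetical.

```python
def build_graph(sample_degrees, sample_features, first_preset=0.5, feature_value=0.5):
    """Return {node: {neighbor: weight}} with samples and features as nodes."""
    graph = {}

    def connect(u, v, weight):
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight

    # Sample-to-sample edges: kept only when the degree exceeds the threshold.
    for (a, b), degree in sample_degrees.items():
        if degree > first_preset:
            connect(a, b, degree)
    # Sample-to-feature edges: one edge per feature a sample has.
    for sample, features in sample_features.items():
        graph.setdefault(sample, {})
        for feature in features:
            connect(sample, feature, feature_value)
    return graph


fig2 = build_graph(
    {("sample1", "sample4"): 0.7, ("sample2", "sample3"): 0.85},
    {
        "sample1": {"icon1", "ip1"},
        "sample2": {"sp1", "icon1"},
        "sample3": {"package1", "icon1"},
        "sample4": {"icon1"},
    },
)
```

In the resulting graph the node `icon1` is adjacent to all four samples, which is how a shared feature links samples that have no direct code-level edge.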
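At this point, steps S101 to S105 above have completed the construction of the correlation network graph. Next, this graph is used to detect the sample correlation of a sample to be detected.
S106: Obtain the features of the sample to be detected in each dimension and calculate the correlation degree between the sample to be detected and the samples in the sample set; if the requirement is met, embed the sample to be detected and its features in each dimension into the constructed correlation network graph, the connecting lines forming a new correlation network graph.
It should be noted that "the requirement is met" in this step can be understood as follows: the sample to be detected is correlated with a sample in the sample set, that is, the correlation degree between the sample to be detected and a sample in the sample set is greater than the first preset value.
For example, as shown in Fig. 3, suppose that after calculation the correlation degree between the sample to be detected and sample 2 is 0.95, its correlation degree with every other sample is less than 0.5, and the sample to be detected has the feature sp1. After it is embedded into the constructed correlation network graph, the connecting lines form the new correlation network graph.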
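The embedding step can be sketched as below, assuming the graph is stored as `{node: {neighbor: weight}}`. The function name, the parameter names, and the decision to return an unchanged copy when no correlation degree exceeds the first preset value are assumptions for illustration.

```python
def embed(graph, name, features, degrees, first_preset=0.5, feature_value=0.5):
    """Return a new graph with the sample to be detected added.

    degrees maps each known sample to its correlation degree with the new sample.
    The input graph is not modified.
    """
    new_graph = {node: dict(neighbors) for node, neighbors in graph.items()}
    related = {s: d for s, d in degrees.items() if d > first_preset}
    if not related:
        return new_graph  # requirement not met: nothing is embedded

    new_graph[name] = {}
    for sample, degree in related.items():
        new_graph[name][sample] = degree
        new_graph.setdefault(sample, {})[name] = degree
    for feature in features:
        new_graph[name][feature] = feature_value
        new_graph.setdefault(feature, {})[name] = feature_value
    return new_graph
```

Applied to the Fig. 3 example, the new node gains an edge of weight 0.95 to sample 2 and an edge of weight 0.5 to the existing feature node sp1, while samples with a degree below the threshold receive no direct edge.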
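S107: Calculate the product of the correlation degree values between the sample to be detected and the samples on each connecting line in the new correlation network graph, and determine whether the product exceeds the second preset value, for example 0.2. If the product exceeds the second preset value, output the samples corresponding to the features on the corresponding connecting line; otherwise discard the samples on that connecting line.
For example, taking the new correlation network graph shown in Fig. 3, the calculated weight product between the sample to be detected and sample 3 is 0.95*0.85=0.8075, which is greater than 0.2, so the sample to be detected is correlated with both sample 2 and sample 3. Sample 2 and sample 3 are output to the user.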
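The source does not spell out how connecting lines are enumerated, so the sketch below makes an assumption: for every node it takes the largest product of edge weights over any path from the sample to be detected, using a Dijkstra-style search. This relies on all weights lying in (0, 1], so that a product can never grow as a path gets longer. The function names are hypothetical, and the example graph contains only the sample-to-sample edges of Fig. 3.

```python
import heapq


def best_products(graph, start):
    """Largest product of edge weights from start to every reachable node.

    Assumes every weight is in (0, 1].
    """
    best = {start: 1.0}
    heap = [(-1.0, start)]  # max-heap via negated products
    while heap:
        negative, node = heapq.heappop(heap)
        product = -negative
        if product < best.get(node, 0.0):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = product * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    del best[start]
    return best


def nodes_above_threshold(graph, start, second_preset=0.2):
    """Nodes whose best product with start exceeds the second preset value."""
    return {n: p for n, p in best_products(graph, start).items() if p > second_preset}


fig3_samples_only = {
    "test": {"sample2": 0.95},
    "sample2": {"test": 0.95, "sample3": 0.85},
    "sample3": {"sample2": 0.85},
    "sample1": {"sample4": 0.7},
    "sample4": {"sample1": 0.7},
}
related = nodes_above_threshold(fig3_samples_only, "test")
```

Here `related` contains sample 2 and sample 3, matching the example above. If the 0.5-weighted feature edges of Fig. 2 were traversed as well, sample 1 would be reached through icon 1 with product 0.95 × 0.5 × 0.5 = 0.2375, which also exceeds 0.2; the source's example lists only samples 2 and 3, so how feature edges enter the product is left as an open assumption in this sketch.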
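It can be understood that the present invention can be applied to detecting whether a sample is malicious. If the maliciousness of a sample is unknown, its correlation degree with every sample in the graph (that is, the new correlation network graph formed above) can be calculated, the samples whose correlation degree is greater than the second preset value can be selected, and the maliciousness of the unknown sample can be predicted from the sample with the highest correlation degree.
Further, in one embodiment of the present invention, after the samples corresponding to the features on the corresponding connecting line are output, the sample correlation detection method may further include: determining whether the sample to be detected is a malicious sample according to those output samples.
There are many ways to determine from the output samples whether the sample to be detected is malicious. Three different examples are given below:
As one example, the sample with the highest correlation degree to the sample to be detected is first found among the output samples, and the type of that sample (malicious or normal) is then used to determine whether the sample to be detected is malicious. For example, given a sample to be detected X, suppose the samples in the graph whose correlation degree with X is greater than the second preset value are A, B, C, D, and E, and the one with the highest correlation degree is C. If C is known to be malicious, X is predicted to be malicious as well.
As another example, all samples that satisfy the second preset value (that is, the output samples above) cast votes. For example, suppose the output samples are A, B, C, D, and E, where A, C, D, and E are malicious and B is normal. Because most of the samples correlated with the sample to be detected X are malicious, X is predicted to be malicious.
As yet another example, in some scenarios the present invention need not give a maliciousness verdict for the sample to be detected directly. Instead, it pushes the output samples A, B, C, D, and E to an analyst, who can then judge the sample to be detected more efficiently and precisely on the basis of this small sample set.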
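The first two strategies above can be sketched as two small functions. The function names and the numeric correlation values are hypothetical, and the tie rule in the vote (a strict majority is required) is an assumption, since the source does not say how ties are resolved.

```python
def verdict_by_nearest(outputs: dict, is_malicious: dict) -> bool:
    """Follow the label of the output sample with the highest correlation degree."""
    nearest = max(outputs, key=outputs.get)
    return is_malicious[nearest]


def verdict_by_vote(outputs: dict, is_malicious: dict) -> bool:
    """Malicious only if strictly more output samples are malicious than normal."""
    malicious_votes = sum(1 for sample in outputs if is_malicious[sample])
    return malicious_votes > len(outputs) - malicious_votes


# Hypothetical correlation degrees for the output samples A to E; C is the highest.
outputs = {"A": 0.30, "B": 0.40, "C": 0.90, "D": 0.25, "E": 0.50}
labels = {"A": True, "B": False, "C": True, "D": True, "E": True}
```

With these labels both functions predict that X is malicious. The two strategies can disagree: if only C were malicious, the nearest-sample rule would still flag X while the vote would not, which is one reason the third option of handing the output samples to an analyst may be preferable in borderline cases.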
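Thus, using the correlated samples that are output to determine whether the sample to be detected is a counterfeit file assists the detection of malicious code.
The advantage of the present invention is that, by correlating multiple kinds of information such as samples and features, it gives the correlation between the sample to be detected and each known sample and provides it to the user for further judging whether the sample to be detected is a malicious or counterfeit sample. In addition, if a large number of malicious samples are found to share the same feature during the correlation process, adding that feature to the rule base of an anti-virus engine can be considered.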
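One way to surface features shared by many malicious samples is a simple frequency count, sketched below. The source does not quantify "a large number", so the `min_count` parameter, like the function name, is an assumption.

```python
from collections import Counter


def rule_candidates(sample_features: dict, malicious: set, min_count: int = 3) -> list:
    """Features that appear in at least min_count malicious samples.

    sample_features maps sample name -> set of features, so each feature is
    counted at most once per sample.
    """
    counts = Counter(
        feature
        for sample in malicious
        for feature in sample_features.get(sample, ())
    )
    return sorted(feature for feature, n in counts.items() if n >= min_count)
```

A feature returned here is only a candidate; whether it is specific enough for a rule base (for example, not also common among white samples) would still need to be checked separately.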
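Corresponding to the sample correlation detection methods provided by the above embodiments, an embodiment of the present invention further provides a sample correlation detection system. Because this system corresponds to the methods provided by the above embodiments, the implementations of the foregoing sample correlation detection method also apply to the system provided in this embodiment and are not described in detail again here. Fig. 4 is a schematic structural diagram of a sample correlation detection system of the present invention. As shown in Fig. 4, the system includes:
A sample collection module 401, configured to collect known white sample files and black sample files to form a sample set.
A feature extraction module 402, configured to perform feature extraction on the sample set in multiple dimensions. In the system, the feature extraction on the sample set in multiple dimensions covers at least a sample source dimension, a sample identification dimension, and a sample name dimension, where the sample source dimension includes: ip, sp, email, url, or the whois information of a domain name; the sample identification dimension includes: the MD5 value of a sample resource file or icon; and the sample name dimension includes: the sample package name, program name, file signature, or certificate.
A sample correlation degree calculation module 403, configured to calculate the correlation degree between each pair of samples in the sample set; if the correlation degree is greater than the first preset value, the two samples are determined to be correlated, and otherwise the two samples are determined not to be correlated.
In the system, the sample correlation degree calculation module 403 may calculate the correlation degree between two samples in the sample set as follows: traverse the code of each sample to obtain its class names and method names; compare the class names of the two samples; for each class name that the two samples share, calculate the number of method names in the intersection of the two samples under that class name; accumulate these intersection counts over all shared class names; and divide the total by the number of method names in the union of all method names of the two samples. The result is the correlation degree between the two samples.
A feature judgment module 404, configured to determine whether the features of the samples in the sample set are the same in each dimension; if so, the features of the samples in the corresponding dimension are regarded as correlated and a correlation degree value between the correlated features is assigned; otherwise the features of the samples in the corresponding dimension are determined not to be correlated. In the system, the correlation degree values between the correlated features are the same.
A correlation network graph construction module 405, configured to construct a correlation network graph according to the correlation between samples and the correlation of the features of the samples in the corresponding dimensions, with samples and features as nodes and the connecting lines between correlated samples and features as edges.
A to-be-detected sample correlation module 406, configured to obtain the features of a sample to be detected in each dimension, calculate the correlation degree between the sample to be detected and the samples in the sample set, and embed the sample to be detected and its features in each dimension into the constructed correlation network graph, the connecting lines forming a new correlation network graph.
A result output module 407, configured to calculate the product of the correlation degree values between the sample to be detected and the samples on each connecting line in the new correlation network graph, and determine whether the product exceeds the second preset value; if the product exceeds the second preset value, the samples corresponding to the features on the corresponding connecting line are output to the user.
To improve the usability and feasibility of the present invention and to assist the detection of malicious code, optionally, in one embodiment of the present invention, the sample correlation detection system may further include a malicious sample judgment module. The malicious sample judgment module may be configured to determine, after the result output module 407 outputs the samples corresponding to the features on the corresponding connecting line, whether the sample to be detected is a malicious sample according to those output samples.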
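To implement the above embodiments, the present invention further provides an electronic device, including: one or more processors; a memory; and one or more programs, the one or more programs being stored in the memory and, when executed by the one or more processors, performing the following operations:
S101': collecting known white sample files and black sample files to form a sample set;
S102': performing feature extraction on the sample set in multiple dimensions;
S103': calculating the correlation degree between each pair of samples in the sample set; if the correlation degree is greater than a first preset value, determining that the two samples are correlated, and otherwise determining that the two samples are not correlated;
S104': determining whether the features of the samples in the sample set are the same in each dimension; if so, regarding the features of the samples in the corresponding dimension as correlated and assigning a correlation degree value between the correlated features, and otherwise determining that the features of the samples in the corresponding dimension are not correlated;
S105': constructing a correlation network graph according to the correlation between the samples and the correlation of the features of the samples in the corresponding dimensions, with samples and features as nodes and the connecting lines between correlated samples and features as edges;
S106': obtaining the features of a sample to be detected in each of the dimensions, calculating the correlation degree between the sample to be detected and the samples in the sample set, and embedding the sample to be detected and its features in each of the dimensions into the correlation network graph, the connecting lines forming a new correlation network graph;
S107': calculating the product of the correlation degree values between the sample to be detected and the samples on each connecting line in the new correlation network graph, and determining whether the product exceeds a second preset value; if the product exceeds the second preset value, outputting the samples corresponding to the features on the corresponding connecting line.
To implement the above embodiments, the present invention further provides a storage medium for storing an application program, the application program being used to execute, at runtime, the sample correlation detection method of any one of the above embodiments of the present invention.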
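The present invention provides a sample correlation detection method and system, including: obtaining a sample set and calculating the features and correlation degrees of the sample set in each dimension; constructing a correlation network graph with samples and sample features as nodes and the connecting lines between correlated samples and nodes as edges; obtaining the features of a sample to be detected and embedding it into the correlation network graph; calculating the product of the correlation degree values between the sample to be detected and the samples on each connecting line in the new correlation network graph; and, if the product is greater than the second preset value, outputting the samples on the corresponding connecting line. The method of the present invention can draw on both code and sample attributes to judge with more information, giving more accurate correlations and stronger heuristics. It can effectively output correlated samples for further judging whether the sample to be detected is a counterfeit file, which assists the detection of malicious code.
Although the present invention has been described through embodiments, those of ordinary skill in the art know that the present invention has many variations and changes that do not depart from its spirit, and it is intended that the appended claims cover such variations and changes without departing from the spirit of the present invention.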

Claims (11)

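  1. A sample correlation detection method, characterized by comprising:
    collecting known sample files to form a sample set;
    performing feature extraction on the sample set in multiple dimensions;
    calculating the correlation degree between each pair of samples in the sample set; if the correlation degree is greater than a first preset value, determining that the two samples are correlated, and otherwise determining that the two samples are not correlated;
    determining whether the features of the samples in the sample set are the same in each dimension; if so, regarding the features of the samples in the corresponding dimension as correlated and assigning a correlation degree value between the correlated features, and otherwise determining that the features of the samples in the corresponding dimension are not correlated;
    constructing a correlation network graph according to the correlation between the samples and the correlation of the features of the samples in the corresponding dimensions, with samples and features as nodes and the connecting lines between correlated samples and features as edges;
    obtaining the features of a sample to be detected in each of the dimensions, calculating the correlation degree between the sample to be detected and the samples in the sample set, and embedding the sample to be detected and its features in each of the dimensions into the correlation network graph, the connecting lines forming a new correlation network graph;
    calculating the product of the correlation degree values between the sample to be detected and the samples on each connecting line in the new correlation network graph, and determining whether the product exceeds a second preset value; if the product exceeds the second preset value, outputting the samples corresponding to the features on the corresponding connecting line.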
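  2. The method according to claim 1, characterized in that calculating the correlation degree between each pair of samples in the sample set specifically comprises:
    traversing the code of each sample to obtain its class names and method names, and comparing the class names of the two samples;
    if a class name is the same in the two samples, further calculating the number of method names in the intersection of the two samples under that class name;
    accumulating the method-name intersection counts over all shared class names and dividing the total by the number of method names in the union of all method names of the two samples, the result being the correlation degree between the two samples.
  3. The method according to claim 1 or 2, characterized in that the feature extraction on the sample set in multiple dimensions includes static information and dynamic information extracted from a sample, and other information obtained by processing the static or dynamic information.
  4. The method according to any one of claims 1 to 3, characterized in that, after outputting the samples corresponding to the features on the corresponding connecting line, the method further comprises:
    determining, according to a preset method, whether the sample to be detected is a malicious sample.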
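  5. A sample correlation detection system, characterized by comprising:
    a sample collection module, configured to collect sample files to form a sample set;
    a feature extraction module, configured to perform feature extraction on the sample set in multiple dimensions;
    a sample correlation degree calculation module, configured to calculate the correlation degree between each pair of samples in the sample set; if the correlation degree is greater than a first preset value, the two samples are determined to be correlated, and otherwise the two samples are determined not to be correlated;
    a feature judgment module, configured to determine whether the features of the samples in the sample set are the same in each dimension; if so, the features of the samples in the corresponding dimension are regarded as correlated and a correlation degree value between the correlated features is assigned, and otherwise the features of the samples in the corresponding dimension are determined not to be correlated;
    a correlation network graph construction module, configured to construct a correlation network graph according to the correlation between the samples and the correlation of the features of the samples in the corresponding dimensions, with samples and features as nodes and the connecting lines between correlated samples and features as edges;
    a to-be-detected sample correlation module, configured to obtain the features of a sample to be detected in each of the dimensions, calculate the correlation degree between the sample to be detected and the samples in the sample set, and embed the sample to be detected and its features in each of the dimensions into the correlation network graph, the connecting lines forming a new correlation network graph;
    a result output module, configured to calculate the product of the correlation degree values between the sample to be detected and the samples on each connecting line in the new correlation network graph, and determine whether the product exceeds a second preset value; if the product exceeds the second preset value, the samples corresponding to the features on the corresponding connecting line are output.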
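  6. The system according to claim 5, characterized in that the sample correlation degree calculation module is specifically configured to:
    traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples;
    when a class name is the same in the two samples, further calculate the number of method names in the intersection of the two samples under that class name;
    accumulate the method-name intersection counts over all shared class names and divide the total by the number of method names in the union of all method names of the two samples, the result being the correlation degree between the two samples.
  7. The system according to claim 5 or 6, characterized in that the correlation degree values between the correlated features are the same.
  8. The system according to any one of claims 5 to 7, characterized in that the feature extraction on the sample set in multiple dimensions includes static information and dynamic information that can be extracted from a sample, and other information obtained by processing the static or dynamic information.
  9. The system according to any one of claims 5 to 8, characterized in that the system further comprises:
    a malicious sample judgment module, configured to determine, according to a preset method, whether the sample to be detected is a malicious sample after the result output module outputs the samples corresponding to the features on the corresponding connecting line.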
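  10. An electronic device, characterized by comprising:
    one or more processors;
    a memory;
    one or more programs, the one or more programs being stored in the memory and, when executed by the one or more processors, performing the following operations:
    collecting known sample files to form a sample set;
    performing feature extraction on the sample set in multiple dimensions;
    calculating the correlation degree between each pair of samples in the sample set; if the correlation degree is greater than a first preset value, determining that the two samples are correlated, and otherwise determining that the two samples are not correlated;
    determining whether the features of the samples in the sample set are the same in each dimension; if so, regarding the features of the samples in the corresponding dimension as correlated and assigning a correlation degree value between the correlated features, and otherwise determining that the features of the samples in the corresponding dimension are not correlated;
    constructing a correlation network graph according to the correlation between the samples and the correlation of the features of the samples in the corresponding dimensions, with samples and features as nodes and the connecting lines between correlated samples and features as edges;
    obtaining the features of a sample to be detected in each of the dimensions, calculating the correlation degree between the sample to be detected and the samples in the sample set, and embedding the sample to be detected and its features in each of the dimensions into the correlation network graph, the connecting lines forming a new correlation network graph;
    calculating the product of the correlation degree values between the sample to be detected and the samples on each connecting line in the new correlation network graph, and determining whether the product exceeds a second preset value; if the product exceeds the second preset value, outputting the samples corresponding to the features on the corresponding connecting line.
  11. A storage medium, characterized in that it is used to store an application program, the application program being used to execute, at runtime, the sample correlation detection method according to any one of claims 1 to 4.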
PCT/CN2016/111566 2015-12-31 2016-12-22 Sample correlation detection method, system, electronic device, and storage medium WO2017114290A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511015286.1 2015-12-31
CN201511015286.1A CN105975852A (zh) 2015-12-31 2015-12-31 Sample correlation detection method and system based on label propagation

Publications (1)

Publication Number Publication Date
WO2017114290A1 true WO2017114290A1 (zh) 2017-07-06

Family

ID=56988207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/111566 WO2017114290A1 (zh) 2015-12-31 2016-12-22 Sample correlation detection method, system, electronic device, and storage medium

Country Status (2)

Country Link
CN (2) CN105975852A (zh)
WO (1) WO2017114290A1 (zh)

Also Published As

Publication number Publication date
CN106815521A (zh) 2017-06-09
CN106815521B (zh) 2019-07-23
CN105975852A (zh) 2016-09-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16881079

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23/11/2018)

122 Ep: pct application non-entry in european phase

Ref document number: 16881079

Country of ref document: EP

Kind code of ref document: A1