CN106815521A - A kind of sample relevance detection method, system and electronic equipment - Google Patents

Info

Publication number
CN106815521A
Authority
CN
China
Prior art keywords
sample
feature
relevance
association
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611199242.3A
Other languages
Chinese (zh)
Other versions
CN106815521B (en)
Inventor
潘宣辰
张路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Antian Information Technology Co Ltd
Original Assignee
Wuhan Antian Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Antian Information Technology Co Ltd
Publication of CN106815521A
Application granted
Publication of CN106815521B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a sample relevance detection method, system, and electronic device. The method includes: obtaining a sample set and computing, in each dimension, the features of the sample set and the degrees of association; building an association network graph whose nodes are samples and sample features and whose edges are the connections between associated samples and nodes; obtaining the features of a sample to be detected and embedding them into the association network graph; computing, in the new association network graph, the product of the association values between the sample to be detected and the samples on each connection; and, if the product is greater than a second preset value, outputting the samples on the corresponding connection. Because the method uses sample attributes in addition to code, it draws on more information, so its heuristics are stronger and the association relationships it finds are more accurate. It can effectively output associated samples and is widely applicable to fields such as malicious code detection and malicious code analysis.

Description

A kind of sample relevance detection method, system and electronic equipment
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201511015286.1, entitled "A sample relevance detection method and system based on label propagation", filed on December 31, 2015 by Wuhan Antian Information Technology Co., Ltd.
Technical field
The present invention relates to the field of information technology, and in particular to a sample relevance detection method, system, and electronic device.
Background
At present, sample relevance detection is mostly performed at the code level. Code-level heuristics, however, work mainly in scenarios where samples share functionality or a common code-development origin. In real-world confrontation, as attackers' techniques grow richer and advanced targeted threats increase, more and more malicious code deliberately adopts targeted development strategies to evade code-level relevance detection, so as to better bypass detection and countermeasures. Extensive analysis of the prior art shows that, although malicious code is developed with different tools, many non-code characteristics or perceivable parts, such as outward deception techniques, disguise techniques, and behavior, are strongly related. It is therefore necessary to develop a new sample relevance detection technique that remedies the deficiencies of the prior art.
Summary of the invention
In view of this, the present invention provides a sample relevance detection method and system that yield more accurate sample association relationships and stronger heuristics, and that are widely applicable to fields such as malicious code detection and malicious code analysis.
A sample relevance detection method, including:
collecting known sample files to form a sample set;
performing feature extraction on the sample set in multiple dimensions;
computing the degree of association between each pair of samples in the sample set; if the degree of association is greater than a first preset value, the two samples are associated; otherwise the two samples are not associated;
determining whether the features of the samples in the sample set are identical in each dimension; if so, the samples' features in the corresponding dimension are considered associated, and an association value is assigned between the associated features; otherwise the samples are not associated in the corresponding dimension;
building an association network graph according to the associations between samples and the associations of the samples' features in the corresponding dimensions, with samples and features as nodes and the connections between associated samples and features as edges;
obtaining the features of a sample to be detected in each dimension, computing the degree of association between the sample to be detected and the samples in the sample set, embedding the sample to be detected and its features in each dimension into the constructed association network graph, and connecting them to form a new association network graph;
computing, in the new association network graph, the product of the association values between the sample to be detected and the samples on each connection, and determining whether the product exceeds a second preset value; if it does, outputting to the user the samples corresponding to the features on the corresponding connection.
In the method, the degree of association between two samples in the sample set may be computed as follows: traverse the code of each sample to obtain its class names and method names; compare the class names of the two samples; where a class name is the same, count the method names that the two samples have in common in that class; sum these counts over all shared class names; and divide the sum by the number of method names in the union of all method names of the two samples. The result is the degree of association between the two samples.
In the method, the association values between the associated features are identical.
In the method, feature extraction on the sample set in multiple dimensions uses as features at least the static and dynamic information that can be extracted from the samples, together with other information obtained by processing that static and dynamic information. These features can be further divided into a sample source dimension, a sample identification dimension, and a sample name dimension. Specifically:
The sample source dimension may include: ip, sp, email, url, or the whois information of a domain name, and the like;
The sample identification dimension may include: hash values of the sample's resource files or icons. Note that the hash algorithms here include not only algorithms that uniquely identify content, such as MD5, SHA1, and crc32, but also fuzzy hash and locality-sensitive hash algorithms;
The sample name dimension may include: sample hash, package name, program name, file signature, or certificate. Note that the sample hash algorithms here likewise include not only algorithms that uniquely identify content, such as MD5, SHA1, and crc32, but also fuzzy hash and locality-sensitive hash algorithms.
After the samples corresponding to the features on the corresponding connection are output, the sample relevance detection method further includes:
determining, according to the output samples corresponding to the features on the corresponding connection, whether the sample to be detected is a malicious sample.
A sample relevance detection system, including:
a sample collection module, configured to collect known white sample files and black sample files to form a sample set;
a feature extraction module, configured to perform feature extraction on the sample set in multiple dimensions;
a sample association degree calculation module, configured to compute the degree of association between each pair of samples in the sample set; if the degree of association is greater than a first preset value, the two samples are associated; otherwise the two samples are not associated;
a feature judgment module, configured to determine whether the features of the samples in the sample set are identical in each dimension; if so, the samples' features in the corresponding dimension are considered associated, and an association value is assigned between the associated features; otherwise the samples are not associated in the corresponding dimension;
an association network graph construction module, configured to build an association network graph according to the associations between samples and the associations of the samples' features in the corresponding dimensions, with samples and features as nodes and the connections between associated samples and features as edges;
a to-be-detected sample association module, configured to obtain the features of a sample to be detected in each dimension, compute the degree of association between the sample to be detected and the samples in the sample set, embed the sample to be detected and its features in each dimension into the constructed association network graph, and connect them to form a new association network graph;
a result output module, configured to compute, in the new association network graph, the product of the association values between the sample to be detected and the samples on each connection, and to determine whether the product exceeds a second preset value; if it does, the samples corresponding to the features on the corresponding connection are output to the user.
In the system, the degree of association between two samples in the sample set is computed as follows: traverse the code of each sample to obtain its class names and method names; compare the class names of the two samples; where a class name is the same, count the method names that the two samples have in common in that class; sum these counts over all shared class names; and divide the sum by the number of method names in the union of all method names of the two samples. The result is the degree of association between the two samples.
In the system, the association values between the associated features are identical.
In the system, feature extraction on the sample set in multiple dimensions uses as features at least the static and dynamic information that can be extracted from the samples, together with other information obtained by processing that static and dynamic information. These features can be further divided into a sample source dimension, a sample identification dimension, and a sample name dimension. Specifically:
The sample source dimension may include: ip, sp, email, url, or the whois information of a domain name;
The sample identification dimension may include: hash values of the sample's resource files or icons. Note that the hash algorithms here include not only algorithms that uniquely identify content, such as MD5, SHA1, and crc32, but also fuzzy hash and locality-sensitive hash algorithms;
The sample name dimension may include: sample hash, package name, program name, file signature, or certificate. Note that the sample hash algorithms here likewise include not only algorithms that uniquely identify content, such as MD5, SHA1, and crc32, but also fuzzy hash and locality-sensitive hash algorithms.
The sample relevance detection system further includes:
a malicious sample judgment module, configured to determine, after the result output module outputs the samples corresponding to the features on the corresponding connection, whether the sample to be detected is a malicious sample according to those output samples.
The present invention also provides an electronic device, including: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
collecting known white sample files and black sample files to form a sample set;
performing feature extraction on the sample set in multiple dimensions;
computing the degree of association between each pair of samples in the sample set; if the degree of association is greater than a first preset value, determining that the two samples are associated; otherwise determining that the two samples are not associated;
determining whether the features of the samples in the sample set are identical in each dimension; if so, the samples' features in the corresponding dimension are considered associated, and an association value is assigned between the associated features; otherwise the samples' features in the corresponding dimension are determined not to be associated;
building an association network graph according to the associations between the samples and the associations of the samples' features in the corresponding dimensions, with samples and features as nodes and the connections between associated samples and features as edges;
obtaining the features of a sample to be detected in each dimension, computing the degree of association between the sample to be detected and the samples in the sample set, embedding the sample to be detected and its features in each dimension into the association network graph, and connecting them to form a new association network graph;
computing, in the new association network graph, the product of the association values between the sample to be detected and the samples on each connection, and determining whether the product exceeds a second preset value; if it does, outputting the samples corresponding to the features on the corresponding connection.
The present invention also provides a storage medium for storing an application program which, when run, performs the sample relevance detection method of the present invention.
Developers of malicious applications now deliberately bypass detection and countermeasures at the code level. To address this, the sample relevance detection method and system provided by the present invention include: obtaining a sample set and computing, in each dimension, the features of the sample set and the degrees of association; building an association network graph whose nodes are samples and sample features and whose edges are the connections between associated samples and nodes; obtaining the features of a sample to be detected and embedding them into the association network graph; computing, in the new association network graph, the product of the association values between the sample to be detected and the samples on each connection; and, if the product is greater than a second preset value, outputting the samples on the corresponding connection. By detecting relevance through non-code characteristics, the invention gives more accurate association relationships and stronger heuristics, and is widely applicable to fields such as malicious code detection and malicious code analysis.
Brief description of the drawings
To describe the technical solutions of the invention or of the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a sample relevance detection method of the invention;
Fig. 2 is a schematic diagram of an association network graph built according to the method of the invention;
Fig. 3 is a schematic diagram of a new association network graph built according to the method of the invention;
Fig. 4 is a schematic structural diagram of a sample relevance detection system of the invention.
Detailed description
To help those skilled in the art better understand the technical solutions in the embodiments of the invention, and to make the above objects, features, and advantages of the invention clearer, the technical solutions of the invention are described in further detail below with reference to the drawings.
The present invention provides a sample relevance detection method, system, and electronic device. By building an association network between samples and computing the association weights between samples, it obtains the relevance between a sample to be detected and known samples, and thereby provides auxiliary information for judging and analyzing malicious code.
In some embodiments, a sample relevance detection method, as shown in Fig. 1, includes:
S101: Collect known sample files to form a sample set.
In the field of malicious code detection, a known sample may be a white sample (a normal sample that is officially released and free of malicious code) or a black sample (a sample containing malicious code). To improve the accuracy of malicious code detection, that is, to fully show how the sample to be detected relates to both white and black samples, the sample set preferably includes both white sample files and black sample files.
S102: Perform feature extraction on the sample set in multiple dimensions.
As an example, feature extraction on the sample set in multiple dimensions uses as features at least the static and dynamic information that can be extracted from the samples, together with other information obtained by processing that static and dynamic information. These features can be further divided into a sample source dimension, a sample identification dimension, a sample name dimension, and so on:
The sample source dimension may include: ip, sp, email, url, or the whois information of a domain name, and the like;
The sample identification dimension may include: hash values of the sample's resource files or icons. Note that the hash algorithms here include not only algorithms that uniquely identify content, such as MD5, SHA1, and crc32, but also fuzzy hash and locality-sensitive hash algorithms;
The sample name dimension may include: sample hash, package name, program name, file signature, certificate, and the like. Note that the sample hash algorithms here likewise include not only algorithms that uniquely identify content, such as MD5, SHA1, and crc32, but also fuzzy hash and locality-sensitive hash algorithms.
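As a rough illustration of the sample identification dimension, the sketch below computes the MD5, SHA1, and crc32 values of a resource file's bytes using only the Python standard library. The function name `resource_hashes` and the icon bytes are illustrative assumptions, not part of the patent text; fuzzy and locality-sensitive hashes require third-party libraries and are left out.

```python
import hashlib
import zlib


def resource_hashes(data: bytes) -> dict:
    """Return exact-match hash features for one resource file or icon."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        # zlib.crc32 returns an unsigned 32-bit integer; print it as 8 hex digits.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }


# Hypothetical icon bytes carried by two different samples.
icon_a = b"\x89PNG fake icon bytes"
icon_b = b"\x89PNG fake icon bytes"

# Identical bytes give identical hash features, so the two samples
# would share a feature node in the identification dimension.
same_icon = resource_hashes(icon_a) == resource_hashes(icon_b)
```

Exact hashes such as these only match byte-identical resources, which is presumably why the text also allows fuzzy and locality-sensitive hashes for near-duplicates.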
S103: Compute the degree of association between each pair of samples in the sample set; if the degree of association is greater than a first preset value, determine that the two samples are associated.
It should be understood that there are many ways to compute the degree of association between two samples in the sample set; this embodiment shows one that is easy to use and efficient. Specifically: traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples. Where a class name is the same in both samples, count the method names that the two samples have in common in that class. Sum these counts over all shared class names to obtain the total number of common method names, then divide that total by the number of method names in the union of all method names of the two samples. The resulting value is the degree of association between the two samples. In this embodiment, the first preset value is set to 0.5. For example, white sample 1 is shown in Table 1:
Table 1
Black sample 2 is shown in Table 2:
Table 2
As the tables show, white sample 1 and black sample 2 share two class names, class name 2 and class name 3. The total number of method names they have in common in class name 2 and class name 3 is 5 (method 201, method 202, method 203, method 301, method 302), and the union of all method names of the two samples contains 10 (method 101, method 102, method 103, method 201, method 202, method 203, method 301, method 302, method 303, method 304), so the degree of association between the two samples is 5/10 = 0.5.
It will be appreciated that, in embodiments of the invention, if the degree of association is less than or equal to the first preset value, the two samples may be determined not to be associated.
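The calculation above can be sketched in a few lines of Python. The contents of Table 1 and Table 2 are not available in this text, so the assignment of methods to each sample below is a hypothetical layout chosen only to be consistent with the stated counts (5 common methods, 10 in the union); the function name `association_degree` is likewise an assumption.

```python
def association_degree(sample_a: dict, sample_b: dict) -> float:
    """Each sample maps a class name to the set of method names in that class."""
    # Only classes present in both samples contribute common methods.
    shared_classes = sample_a.keys() & sample_b.keys()
    common = sum(len(sample_a[c] & sample_b[c]) for c in shared_classes)
    # The denominator is the union of all method names of both samples.
    all_methods = set().union(*sample_a.values(), *sample_b.values())
    return common / len(all_methods) if all_methods else 0.0


# Hypothetical layout consistent with the worked example in the text.
white_sample_1 = {
    "class1": {"method101", "method102", "method103"},
    "class2": {"method201", "method202", "method203"},
    "class3": {"method301", "method302", "method303"},
}
black_sample_2 = {
    "class2": {"method201", "method202", "method203"},
    "class3": {"method301", "method302", "method304"},
}

FIRST_PRESET = 0.5
degree = association_degree(white_sample_1, black_sample_2)
associated = degree > FIRST_PRESET  # strictly greater, as in step S103
```

With these inputs the degree is 5/10 = 0.5, which does not exceed the first preset value of 0.5, so under the strict comparison in S103 the two samples would not be treated as associated at the code level.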
It should be emphasized again that existing techniques increasingly adopt targeted development strategies to evade code-level relevance detection. For example, many counterfeit applications imitate a legitimate application using only its resource files, without imitating it at the code level. Judging sample relevance from code-level content alone is therefore unreliable, and relevance must also be judged in several other dimensions. It should be understood that step S102 may also be performed after step S103.
S104: Determine whether the features of the samples in the sample set are identical in the respective dimensions. If two samples have the same feature in at least one dimension, for example the same ip, email, or url, or the same resource file or icon, the samples' features in that dimension are considered associated, and an association value is assigned between the associated features; otherwise the samples' features in that dimension are determined not to be associated.
It will be appreciated that the association values between associated features in each dimension can generally be set empirically. In this embodiment, apart from the degree of association between samples, which must be computed separately, the association values between all other associated features are identical; for example, they may all be set to 0.5.
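A minimal sketch of the per-dimension comparison in S104 follows. The dictionary layout (dimension name mapped to a set of feature values), the dimension names, and the helper `shared_features` are assumptions made for illustration; only the fixed weight of 0.5 comes from the text.

```python
FEATURE_WEIGHT = 0.5  # empirical association value used in this embodiment


def shared_features(features_a: dict, features_b: dict) -> dict:
    """Return, per dimension, the feature values that both samples have."""
    shared = {}
    for dimension in features_a.keys() & features_b.keys():
        common = features_a[dimension] & features_b[dimension]
        if common:
            shared[dimension] = common
    return shared


# Hypothetical feature sets: the samples differ in source but share an icon.
sample_1 = {"source": {"ip1"}, "identification": {"icon1"}}
sample_2 = {"source": {"sp1"}, "identification": {"icon1"}}

shared = shared_features(sample_1, sample_2)
# Every shared feature gets the same association value.
feature_edges = [
    (feature, FEATURE_WEIGHT)
    for values in shared.values()
    for feature in sorted(values)
]
```

Here the two samples are associated only in the identification dimension, through the shared icon.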
S105: Build an association network graph according to the associations between samples and the associations of the samples' features in the corresponding dimensions, with samples and features as nodes and the connections between associated samples and features as edges.
For example, as shown in Fig. 2, suppose the computation gives a degree of association of 0.7 between sample 1 and sample 4 and 0.85 between sample 2 and sample 3; sample 1 has the features icon 1 and ip1, sample 2 has the features sp1 and icon 1, sample 3 has the features package name 1 and icon 1, and sample 4 has icon 1. The association network graph is then built, with an association value of 0.5 between the associated features.
This completes the construction of the association network graph in steps S101 to S105. The graph is used next to perform relevance detection on the sample to be detected.
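The graph from the Fig. 2 example can be represented as a weighted adjacency map. This is a sketch under assumptions: the node tagging scheme (`("sample", id)` and `("feature", name)`), the function `build_graph`, and the choice to attach every feature of a sample (not only shared ones) are illustrative and are not specified in the text.

```python
from collections import defaultdict

FEATURE_WEIGHT = 0.5  # association value between associated features


def build_graph(sample_links: dict, sample_features: dict) -> dict:
    """Build an undirected weighted graph with sample and feature nodes."""
    graph = defaultdict(dict)
    # Sample-to-sample edges carry the computed degree of association.
    for (a, b), degree in sample_links.items():
        graph[("sample", a)][("sample", b)] = degree
        graph[("sample", b)][("sample", a)] = degree
    # Sample-to-feature edges all carry the same fixed value.
    for sample, features in sample_features.items():
        for feature in features:
            graph[("sample", sample)][("feature", feature)] = FEATURE_WEIGHT
            graph[("feature", feature)][("sample", sample)] = FEATURE_WEIGHT
    return graph


# Values taken from the Fig. 2 example in the text.
sample_links = {(1, 4): 0.7, (2, 3): 0.85}
sample_features = {
    1: {"icon1", "ip1"},
    2: {"sp1", "icon1"},
    3: {"package_name1", "icon1"},
    4: {"icon1"},
}
graph = build_graph(sample_links, sample_features)
```

In this representation the shared feature icon 1 becomes a hub that links all four samples, which is how samples with unrelated code can still end up connected.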
S106: Obtain the features of the sample to be detected in each dimension and compute the degree of association between the sample to be detected and the samples in the sample set; if the requirement is met, embed the sample to be detected and its features in each dimension into the constructed association network graph and connect them to form a new association network graph.
Note that meeting the requirement in this step can be understood as follows: the sample to be detected is associated with a sample in the sample set, that is, the degree of association between them is greater than the first preset value.
For example, as shown in Fig. 3, suppose the computation gives a degree of association of 0.95 between the sample to be detected and sample 2 and less than 0.5 with every other sample, and the sample to be detected has the feature sp1. After it is embedded into the constructed association network graph and connected, the new association network graph is formed.
S107: Compute, in the new association network graph, the product of the association values between the sample to be detected and the samples on each connection, and determine whether the product is greater than a second preset value, such as 0.2. If the product is greater than the second preset value, output the samples corresponding to the features on the corresponding connection; otherwise discard the samples on that connection.
For example, taking the new association network graph shown in Fig. 3, the computation gives a weight product of 0.95 × 0.85 = 0.8075 between the sample to be detected and sample 3, which is greater than 0.2, so the sample to be detected is associated with sample 2 and sample 3. Sample 2 and sample 3 are output to the user.
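The product test in S107 can be sketched as a search over paths that start at the sample to be detected. The text does not spell out how paths through feature nodes are scored, so this sketch makes a simplifying assumption and follows only sample-to-sample edges, which reproduces the worked example; the function name `associated_samples` and the node labels are hypothetical.

```python
def associated_samples(sample_edges: dict, start: str, threshold: float) -> dict:
    """Return {sample: best product} for samples whose product exceeds threshold."""
    # Undirected adjacency map built from the weighted sample-to-sample edges.
    adjacency = {}
    for (a, b), weight in sample_edges.items():
        adjacency.setdefault(a, {})[b] = weight
        adjacency.setdefault(b, {})[a] = weight

    best = {}
    stack = [(start, 1.0, {start})]
    while stack:
        node, product, seen = stack.pop()
        for neighbor, weight in adjacency.get(node, {}).items():
            if neighbor in seen:
                continue
            candidate = product * weight
            # Weights are at most 1, so products only shrink along a path;
            # a path at or below the threshold can be dropped immediately.
            if candidate > threshold and candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                stack.append((neighbor, candidate, seen | {neighbor}))
    return best


# Values from the Fig. 3 example; "X" is the sample to be detected.
edges = {("X", "sample2"): 0.95, ("sample2", "sample3"): 0.85,
         ("sample1", "sample4"): 0.7}
SECOND_PRESET = 0.2
result = associated_samples(edges, "X", SECOND_PRESET)
```

With these inputs, sample 2 (product 0.95) and sample 3 (product 0.95 × 0.85) pass the threshold, while sample 1 and sample 4 are not reachable over sample-to-sample edges and are not output.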
It should be understood that the invention can be applied to detecting whether a sample is malicious. If the maliciousness of a sample is unknown, the degree of association between that sample and every sample in the graph (that is, the new association network graph formed above) can be computed, the samples whose degree of association is greater than the second preset value selected, and the maliciousness of the unknown sample predicted from the sample with the greatest degree of association.
Further, in one embodiment of the invention, after the samples corresponding to the features on the corresponding connection are output, the sample relevance detection method may also include: determining, according to the output samples corresponding to the features on the corresponding connection, whether the sample to be detected is a malicious sample.
There are many ways to determine from the output samples whether the sample to be detected is malicious. Three different examples are given below:
In one example, the sample with the greatest degree of association with the sample to be detected is first found among the output samples, and the sample to be detected is then judged malicious or not according to that sample's type (malicious or normal). For example, for a sample X to be detected, the samples in the graph whose degree of association exceeds the second preset value are A, B, C, D, and E, of which C has the greatest degree of association. C is known to be malicious, so X is predicted to be malicious as well.
In another example, all samples that satisfy the second preset value (that is, the output samples described above) vote. For example, the output samples are A, B, C, D, and E, of which A, C, D, and E are malicious and B is normal. Malicious samples are the majority among the samples associated with X, so the sample X to be detected is predicted to be malicious.
In yet another example, in some scenarios the invention need not directly give a maliciousness verdict for the sample to be detected. Instead, the output samples A, B, C, D, and E are pushed to analysts, who can judge the sample to be detected more efficiently and accurately on the basis of this small sample set.
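The first two strategies above can be sketched as follows. The degrees of association assigned to A through E are hypothetical (the text gives only their labels), and the function names `verdict_by_max` and `verdict_by_vote` are assumptions; the third strategy hands the decision to analysts and has no automated counterpart.

```python
from collections import Counter


def verdict_by_max(outputs: list) -> bool:
    """Adopt the label of the output sample with the greatest degree."""
    _, _, is_malicious = max(outputs, key=lambda item: item[1])
    return is_malicious


def verdict_by_vote(outputs: list) -> bool:
    """Majority vote over all output samples; a tie counts as not malicious."""
    votes = Counter(is_malicious for _, _, is_malicious in outputs)
    return votes[True] > votes[False]


# (name, degree of association with X, known to be malicious?)
# Degrees are made up; labels follow the examples in the text.
outputs = [
    ("A", 0.60, True),
    ("B", 0.40, False),
    ("C", 0.90, True),
    ("D", 0.50, True),
    ("E", 0.30, True),
]

x_malicious_by_max = verdict_by_max(outputs)    # C has the greatest degree
x_malicious_by_vote = verdict_by_vote(outputs)  # 4 malicious vs 1 normal
```

The two strategies can disagree: a single strongly associated normal sample would clear X under the first rule even if most weaker associations are malicious, so which rule fits is a policy choice.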
Thus, whether the sample to be detected is a counterfeit file can be judged from the output associated samples, which assists the detection of malicious code.
An advantage of the invention is that, by associating multiple kinds of information such as samples and features, it gives the relevance between the sample to be detected and each known sample and provides it to the user for further determining whether the sample to be detected is malicious or counterfeit. Moreover, if a large number of malicious samples are found to share the same feature during association, that feature can be considered for addition to the rule base of an antivirus engine.
Corresponding to the sample relevance detection methods provided by the embodiments above, an embodiment of the invention also provides a sample relevance detection system. Because the system corresponds to those methods, the implementations of the foregoing methods also apply to the system provided in this embodiment and are not described again in detail. Fig. 4 is a schematic structural diagram of a sample relevance detection system of the invention. As shown in Fig. 4, the system includes:
Sample collection module 401, for collecting known white sample file and black sample file, constitutes sample set.
Characteristic extracting module 402, for carrying out feature extraction in multiple dimensions to sample set.It is described in described system Feature extraction is carried out in multiple dimensions to sample set, is at least included:Samples sources dimension, sample identification dimension and sample names dimension Degree;Wherein, the samples sources dimension includes:The whois information of ip, sp, email, url or domain name;The sample identification dimension Degree includes:The MD5 values of sample resource file or icon;The sample names dimension includes:Sample bag name, program name, file label Name or certificate.
Sample association degree calculation module 403, for calculating the association degree between every two samples in the sample set; if the association degree is greater than a first preset value, the two samples are judged to be associated; otherwise they are judged not to be associated.
In the system, the sample association degree calculation module 403 may compute the association degree between two samples as follows: traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples. For each class name that the two samples have in common, count the method names in that class that both samples contain (the intersection). Sum these intersection counts over all common class names and divide by the size of the union of all method names of the two samples; the result is the association degree between the two samples.
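The computation described for module 403 can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the function names are hypothetical, each sample is assumed to be already parsed into a mapping from class name to its set of method names, and the union is assumed to be taken over (class name, method name) pairs, which the text does not state explicitly.

```python
from typing import Dict, Set

ClassMethods = Dict[str, Set[str]]  # class name -> method names in that class


def association_degree(a: ClassMethods, b: ClassMethods) -> float:
    """Sum of per-class method-name intersections divided by the method union."""
    shared = 0
    # Only classes whose names appear in both samples contribute to the numerator.
    for cls in a.keys() & b.keys():
        shared += len(a[cls] & b[cls])
    # Assumption: methods are identified by (class, method) pairs in the union.
    union = {(c, m) for c, ms in a.items() for m in ms}
    union |= {(c, m) for c, ms in b.items() for m in ms}
    return shared / len(union) if union else 0.0


def is_associated(a: ClassMethods, b: ClassMethods, first_preset: float) -> bool:
    """Two samples are associated when the degree exceeds the first preset value."""
    return association_degree(a, b) > first_preset
```

For example, if one sample has class `Main` with methods `onCreate` and `send` plus class `Util` with `enc`, and another has `Main` with `onCreate`, `send`, `pay` plus class `Net` with `post`, the shared count is 2 and the union has 5 entries, giving an association degree of 0.4.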
Feature judgment module 404, for judging whether the features of the samples in the sample set are identical in each dimension. If so, the features of those samples in the corresponding dimension are considered associated, and an association degree value is assigned between the associated features; otherwise the features of the samples in the corresponding dimension are judged not to be associated. In the system, the association degree values between associated features are all the same.
Association network graph construction module 405, for building an association network graph according to the associations between samples and the associations between the samples' features in the corresponding dimensions, with samples and features as nodes and with the lines connecting associated samples and features as edges.
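One way to picture the graph built by modules 404 and 405 is as a weighted adjacency map. The sketch below is an interpretation under stated assumptions rather than the patent's own data structure: a shared feature is modelled as a node connected to every sample that carries it, all sample-to-feature edges receive the same weight (`feature_weight`, a hypothetical parameter reflecting the statement that associated features share one value), and sample-to-sample edges carry the pairwise association degree.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]
Feature = Tuple[str, str]  # (dimension, value)


def build_graph(sample_features: Dict[str, Set[Feature]],
                pair_degrees: Dict[Tuple[str, str], float],
                first_preset: float,
                feature_weight: float = 1.0) -> Graph:
    """Build an undirected weighted graph of sample and feature nodes."""
    graph: Graph = defaultdict(dict)

    def add_edge(u: Hashable, v: Hashable, w: float) -> None:
        graph[u][v] = w
        graph[v][u] = w

    # Sample-to-sample edges: kept only when the degree exceeds the first preset value.
    for (s1, s2), degree in pair_degrees.items():
        if degree > first_preset:
            add_edge(("sample", s1), ("sample", s2), degree)

    # Feature nodes: a feature is associated when at least two samples share it.
    owners: Dict[Feature, Set[str]] = defaultdict(set)
    for sid, feats in sample_features.items():
        for feat in feats:
            owners[feat].add(sid)
    for feat, sids in owners.items():
        if len(sids) >= 2:
            for sid in sids:
                add_edge(("sample", sid), ("feature", feat), feature_weight)
    return graph
```

Tagging nodes as `("sample", id)` or `("feature", (dimension, value))` keeps the two node kinds distinguishable in a single map, so later traversal can tell whether a line passes through a feature or ends at a sample.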
To-be-detected sample association module 406, for obtaining the features of the sample to be detected in each dimension, calculating the association degree between the sample to be detected and the samples in the sample set, and embedding the sample to be detected and its features in each dimension into the constructed association network graph, connecting them to form a new association network graph.
Result output module 407, for calculating, in the new association network graph, the product of the association degree values along each line between the sample to be detected and a sample, and judging whether the product exceeds a second preset value; if it does, the sample corresponding to the feature on that line is output to the user.
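The scoring performed by module 407 can be illustrated with a short Python sketch. Several points here are assumptions, not statements from the patent: the graph is a plain adjacency map whose nodes are tagged `("sample", id)` or `("feature", ...)`, a "line" is taken to be either a direct sample-to-sample edge or a two-edge path through one shared feature node, and when several lines reach the same known sample the largest product is kept.

```python
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]


def related_samples(graph: Graph, target: Hashable, second_preset: float) -> Dict[str, float]:
    """Return known samples whose path product from `target` exceeds the threshold."""
    best: Dict[Hashable, float] = {}
    for mid, w1 in graph.get(target, {}).items():
        if mid[0] == "sample":
            # Direct line: the product is the single edge weight.
            best[mid] = max(best.get(mid, 0.0), w1)
        else:
            # Line through a feature node: multiply both edge weights.
            for end, w2 in graph.get(mid, {}).items():
                if end != target and end[0] == "sample":
                    best[end] = max(best.get(end, 0.0), w1 * w2)
    return {node[1]: score for node, score in best.items() if score > second_preset}


# Usage: T is the sample to be detected; A is linked directly, B through a shared feature.
T, A, B = ("sample", "T"), ("sample", "A"), ("sample", "B")
F = ("feature", ("name", "pkg:x"))
demo: Graph = {
    T: {A: 0.9, F: 0.8},
    A: {T: 0.9},
    F: {T: 0.8, B: 0.8},
    B: {F: 0.8},
}
hits = related_samples(demo, T, 0.7)  # only A passes; B's product is about 0.64
```

Because each edge weight is at most 1 under these assumptions, multiplying along a line can only lower the score, so longer or weaker chains of association are filtered out by the second preset value.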
To improve the usability and feasibility of the invention and to assist the detection of malicious code, optionally, in one embodiment of the invention, the sample relevance detection system may further include a malicious sample judgment module. After the result output module 407 outputs the samples corresponding to the features on the corresponding lines, the malicious sample judgment module may judge, according to those output samples, whether the sample to be detected is a malicious sample.
To implement the above embodiments, the present invention also proposes an electronic device, including: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
S101', collecting known white sample files and black sample files to form a sample set;
S102', performing feature extraction on the sample set in multiple dimensions;
S103', calculating the association degree between every two samples in the sample set; if the association degree is greater than a first preset value, judging that the two samples are associated, and otherwise judging that the two samples are not associated;
S104', judging whether the features of the samples in the sample set are identical in each dimension; if so, considering the features of the samples in the corresponding dimension to be associated and assigning an association degree value between the associated features, and otherwise judging that the features of the samples in the corresponding dimension are not associated;
S105', building an association network graph according to the associations between the samples and the associations between the samples' features in the corresponding dimensions, with samples and features as nodes and with the lines connecting associated samples and features as edges;
S106', obtaining the features of the sample to be detected in each dimension, calculating the association degree between the sample to be detected and the samples in the sample set, and embedding the sample to be detected and its features in each dimension into the association network graph, connecting them to form a new association network graph;
S107', calculating, in the new association network graph, the product of the association degree values along each line between the sample to be detected and a sample, and judging whether the product exceeds a second preset value; if the product exceeds the second preset value, outputting the sample corresponding to the feature on the corresponding line.
To implement the above embodiments, the present invention also proposes a storage medium for storing an application program, the application program being used to perform, at run time, the sample relevance detection method described in any of the above embodiments of the present invention.
The present invention proposes a sample relevance detection method and system, including: obtaining a sample set and computing the features of the sample set in each dimension and the association degrees; building an association network graph with samples and sample features as nodes and with the lines between samples and nodes, weighted by association degree, as edges; obtaining the features of the sample to be detected and embedding them into the association network graph; computing the product of the association degree values along each line between the sample to be detected and a sample in the new association network graph; and, if the product is greater than the second preset value, outputting the sample on the corresponding line. With the method of the invention, code and sample attributes can be used together so that the judgment draws on more information, making the association relationships more accurate and more heuristic. Associated samples can be output effectively for further determining whether the sample to be detected is a counterfeit file, which assists the detection of malicious code.
Although the present invention has been described by way of embodiments, those of ordinary skill in the art will appreciate that the invention admits many variations and changes without departing from its spirit, and it is intended that the appended claims cover these variations and changes without departing from the spirit of the invention.

Claims (10)

1. A sample relevance detection method, characterized by comprising:
collecting known sample files to form a sample set;
performing feature extraction on the sample set in multiple dimensions;
calculating the association degree between every two samples in the sample set; if the association degree is greater than a first preset value, judging that the two samples are associated, and otherwise judging that the two samples are not associated;
judging whether the features of the samples in the sample set are identical in each dimension; if so, considering the features of the samples in the corresponding dimension to be associated and assigning an association degree value between the associated features, and otherwise judging that the features of the samples in the corresponding dimension are not associated;
building an association network graph according to the associations between the samples and the associations between the samples' features in the corresponding dimensions, with samples and features as nodes and with the lines connecting associated samples and features as edges;
obtaining the features of a sample to be detected in each dimension, calculating the association degree between the sample to be detected and the samples in the sample set, and embedding the sample to be detected and its features in each dimension into the association network graph, connecting them to form a new association network graph;
calculating, in the new association network graph, the product of the association degree values along each line between the sample to be detected and a sample, and judging whether the product exceeds a second preset value; if the product exceeds the second preset value, outputting the sample corresponding to the feature on the corresponding line.
2. The method of claim 1, characterized in that calculating the association degree between every two samples in the sample set specifically comprises:
traversing the code of each sample to obtain its class names and method names, and comparing the class names of the two samples;
if a class name is identical between the two samples, further counting the method names in that class that the two samples have in common (the intersection);
summing the method-name intersection counts over all identical class names and dividing by the size of the union of all method names of the two samples, the result being the association degree between the two samples.
3. The method of claim 1, characterized in that performing feature extraction on the sample set in multiple dimensions includes extracting static information extractable from the samples, dynamic information, and other information obtained by processing the static or dynamic information.
4. The method of any one of claims 1 to 3, characterized in that, after outputting the sample corresponding to the feature on the corresponding line, the method further comprises:
judging whether the sample to be detected is a malicious sample according to a preset method.
5. A sample relevance detection system, characterized by comprising:
a sample collection module, for collecting sample files to form a sample set;
a feature extraction module, for performing feature extraction on the sample set in multiple dimensions;
a sample association degree calculation module, for calculating the association degree between every two samples in the sample set; if the association degree is greater than a first preset value, the two samples are judged to be associated, and otherwise the two samples are judged not to be associated;
a feature judgment module, for judging whether the features of the samples in the sample set are identical in each dimension; if so, the features of the samples in the corresponding dimension are considered associated and an association degree value is assigned between the associated features, and otherwise the features of the samples in the corresponding dimension are judged not to be associated;
an association network graph construction module, for building an association network graph according to the associations between the samples and the associations between the samples' features in the corresponding dimensions, with samples and features as nodes and with the lines connecting associated samples and features as edges;
a to-be-detected sample association module, for obtaining the features of a sample to be detected in each dimension, calculating the association degree between the sample to be detected and the samples in the sample set, and embedding the sample to be detected and its features in each dimension into the association network graph, connecting them to form a new association network graph;
a result output module, for calculating, in the new association network graph, the product of the association degree values along each line between the sample to be detected and a sample, and judging whether the product exceeds a second preset value; if the product exceeds the second preset value, the sample corresponding to the feature on the corresponding line is output.
6. The system of claim 5, characterized in that the sample association degree calculation module is specifically configured to:
traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples;
if a class name is identical between the two samples, further count the method names in that class that the two samples have in common (the intersection);
sum the method-name intersection counts over all identical class names and divide by the size of the union of all method names of the two samples, the result being the association degree between the two samples.
7. The system of claim 5, characterized in that the association degree values between the associated features are all the same.
8. The system of any one of claims 5 to 7, characterized in that performing feature extraction on the sample set in multiple dimensions includes extracting static information extractable from the samples, dynamic information, and other information obtained by processing the static or dynamic information.
9. The system of any one of claims 5 to 8, characterized in that the system further comprises:
a malicious sample judgment module, for judging, after the result output module outputs the sample corresponding to the feature on the corresponding line, whether the sample to be detected is a malicious sample according to a preset method.
10. An electronic device, characterized by comprising:
one or more processors;
a memory;
one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
collecting known sample files to form a sample set;
performing feature extraction on the sample set in multiple dimensions;
calculating the association degree between every two samples in the sample set; if the association degree is greater than a first preset value, judging that the two samples are associated, and otherwise judging that the two samples are not associated;
judging whether the features of the samples in the sample set are identical in each dimension; if so, considering the features of the samples in the corresponding dimension to be associated and assigning an association degree value between the associated features, and otherwise judging that the features of the samples in the corresponding dimension are not associated;
building an association network graph according to the associations between the samples and the associations between the samples' features in the corresponding dimensions, with samples and features as nodes and with the lines connecting associated samples and features as edges;
obtaining the features of a sample to be detected in each dimension, calculating the association degree between the sample to be detected and the samples in the sample set, and embedding the sample to be detected and its features in each dimension into the association network graph, connecting them to form a new association network graph;
calculating, in the new association network graph, the product of the association degree values along each line between the sample to be detected and a sample, and judging whether the product exceeds a second preset value; if the product exceeds the second preset value, outputting the sample corresponding to the feature on the corresponding line.
CN201611199242.3A 2015-12-31 2016-12-22 A kind of sample relevance detection method, system and electronic equipment Active CN106815521B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511015286.1A CN105975852A (en) 2015-12-31 2015-12-31 Method and system for detecting sample relevance based on label propagation
CN2015110152861 2015-12-31

Publications (2)

Publication Number Publication Date
CN106815521A true CN106815521A (en) 2017-06-09
CN106815521B CN106815521B (en) 2019-07-23

Family

ID=56988207

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201511015286.1A Pending CN105975852A (en) 2015-12-31 2015-12-31 Method and system for detecting sample relevance based on label propagation
CN201611199242.3A Active CN106815521B (en) 2015-12-31 2016-12-22 A kind of sample relevance detection method, system and electronic equipment

Country Status (2)

Country Link
CN (2) CN105975852A (en)
WO (1) WO2017114290A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609400A (en) * 2017-09-28 2018-01-19 深信服科技股份有限公司 Computer virus classification method, system, equipment and computer-readable recording medium
CN109325280A (en) * 2018-09-13 2019-02-12 广西科技大学 A kind of inertia test table module partition method
CN109995605A (en) * 2018-01-02 2019-07-09 中国移动通信有限公司研究院 A kind of method for recognizing flux and device and computer readable storage medium
CN110264333A (en) * 2019-05-09 2019-09-20 阿里巴巴集团控股有限公司 A kind of risk rule determines method and apparatus
CN110458394A (en) * 2019-07-05 2019-11-15 阿里巴巴集团控股有限公司 A kind of index measuring and calculating method and device based on Object related degree

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975852A (en) * 2015-12-31 2016-09-28 武汉安天信息技术有限责任公司 Method and system for detecting sample relevance based on label propagation
CN106446687B (en) * 2016-10-14 2020-11-03 北京奇虎科技有限公司 Malicious sample detection method and device
CN108268772B (en) * 2016-12-30 2021-10-22 武汉安天信息技术有限责任公司 Method and system for screening malicious samples
CN108537654B (en) * 2018-03-09 2021-04-30 平安普惠企业管理有限公司 Rendering method and device of customer relationship network graph, terminal equipment and medium
CN110457359B (en) * 2018-05-04 2024-03-08 拉萨经济技术开发区凯航科技开发有限公司 Correlation analysis method
CN109033834A (en) * 2018-07-17 2018-12-18 南京邮电大学盐城大数据研究院有限公司 A kind of malware detection method based on file association relationship
CN110336838B (en) * 2019-08-07 2022-07-08 腾讯科技(武汉)有限公司 Account abnormity detection method, device, terminal and storage medium
CN112487421B (en) * 2020-10-26 2024-06-11 中国科学院信息工程研究所 Android malicious application detection method and system based on heterogeneous network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1262836A (en) * 1997-02-03 2000-08-09 Mci通讯公司 Communication system architecture
US7769851B1 (en) * 2005-01-27 2010-08-03 Juniper Networks, Inc. Application-layer monitoring and profiling network traffic
CN101933290A (en) * 2007-12-18 2010-12-29 太阳风环球有限责任公司 Method for configuring acls on network device based on flow information
CN102034042A (en) * 2010-12-13 2011-04-27 四川大学 Novel unwanted code detecting method based on characteristics of function call relationship graph
CN102821002A (en) * 2011-06-09 2012-12-12 中国移动通信集团河南有限公司信阳分公司 Method and system for network flow anomaly detection
CN103984920A (en) * 2014-04-25 2014-08-13 同济大学 Three-dimensional face identification method based on sparse representation and multiple feature points
CN105205397A (en) * 2015-10-13 2015-12-30 北京奇虎科技有限公司 Rogue program sample classification method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530367B (en) * 2013-10-12 2017-07-18 深圳先进技术研究院 A kind of fishing website identification system and method
CN104537303B (en) * 2014-12-30 2017-10-24 中国科学院深圳先进技术研究院 A kind of fishing website identification system and discrimination method
CN104899253B (en) * 2015-05-13 2018-06-26 复旦大学 Towards the society image across modality images-label degree of correlation learning method
CN105975852A (en) * 2015-12-31 2016-09-28 武汉安天信息技术有限责任公司 Method and system for detecting sample relevance based on label propagation

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609400A (en) * 2017-09-28 2018-01-19 深信服科技股份有限公司 Computer virus classification method, system, equipment and computer-readable recording medium
CN109995605A (en) * 2018-01-02 2019-07-09 中国移动通信有限公司研究院 A kind of method for recognizing flux and device and computer readable storage medium
CN109325280A (en) * 2018-09-13 2019-02-12 广西科技大学 A kind of inertia test table module partition method
CN110264333A (en) * 2019-05-09 2019-09-20 阿里巴巴集团控股有限公司 A kind of risk rule determines method and apparatus
CN110264333B (en) * 2019-05-09 2023-12-08 创新先进技术有限公司 Risk rule determining method and apparatus
CN110458394A (en) * 2019-07-05 2019-11-15 阿里巴巴集团控股有限公司 A kind of index measuring and calculating method and device based on Object related degree
CN110458394B (en) * 2019-07-05 2023-08-22 创新先进技术有限公司 Index measuring and calculating method and device based on object association degree

Also Published As

Publication number Publication date
WO2017114290A1 (en) 2017-07-06
CN106815521B (en) 2019-07-23
CN105975852A (en) 2016-09-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant