CN106815521A - A kind of sample relevance detection method, system and electronic equipment - Google Patents
Sample relevance detection method, system and electronic device
- Publication number
- CN106815521A (application CN201611199242.3A)
- Authority
- CN
- China
- Prior art keywords
- sample
- feature
- relevance
- association
- dimension
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Computer And Data Communications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention proposes a sample relevance detection method, system and electronic device. The method includes: obtaining a sample set and calculating the features and degrees of association of the sample set in each dimension; building an association network graph with samples and sample features as nodes and with the lines between associated samples and nodes as edges; obtaining the features of a sample to be detected and embedding them into the association network graph; calculating, in the new association network graph, the product of the association values between the sample to be detected and the samples on each line; and, if the product is greater than a second preset value, outputting the samples on the corresponding line. Because the method draws on both code and sample attributes, it bases its judgment on more information, is more heuristic and yields more accurate association relationships. It can effectively output samples that have relevance and can be widely used in fields such as malicious code detection and malicious code analysis.
Description
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. "201511015286.1", entitled "A sample relevance detection method and system based on label propagation", filed on December 31, 2015 by Wuhan An Tian Information Technology Co., Ltd.
Technical field
The present invention relates to the field of information technology, and more particularly to a sample relevance detection method, system and electronic device.
Background
At present, sample relevance is mostly detected at the code level. Code-level heuristics, however, apply mainly to scenarios of functional homology or code-development homology. In real-world confrontation, as attackers' means grow richer and advanced targeted threats increase, more and more malicious code deliberately adopts targeted development strategies to evade code-level relevance detection, so as to better bypass detection and countermeasures. Extensive analysis of the prior art shows that, although the development tools of malicious code differ, many non-code characteristics or perceptible aspects, such as external deception techniques, camouflage techniques and behavioral manifestations, are in fact well correlated. It is therefore necessary to study a new sample relevance detection technique to remedy the deficiencies of the prior art.
Summary of the invention
In view of this, the present invention proposes a sample relevance detection method and system that provide more accurate and more heuristic sample association relationships and can be widely used in fields such as malicious code detection and malicious code analysis.
A sample relevance detection method, including:
collecting known sample files to form a sample set;
performing feature extraction on the sample set in multiple dimensions;
calculating the degree of association between every two samples in the sample set, where the two samples have relevance if the degree of association is greater than a first preset value, and otherwise have no relevance;
judging whether the features of the samples in the sample set are identical in each dimension; if so, considering that the features of the samples in the corresponding dimension have relevance and assigning an association value to each pair of associated features; otherwise the samples have no relevance in the corresponding dimension;
building an association network graph according to the relevance between samples and the relevance of the sample features in the corresponding dimensions, with samples and features as nodes and with the lines between associated samples and features as edges;
obtaining the features of a sample to be detected in each dimension, calculating the degree of association between the sample to be detected and the samples in the sample set, embedding the sample to be detected and its features in each dimension into the association network graph already built, and connecting the lines to form a new association network graph;
calculating, in the new association network graph, the product of the association values between the sample to be detected and the samples on each line, and judging whether the product exceeds a second preset value; if it does, outputting to the user the samples corresponding to the features on the corresponding line.
In the method, the degree of association between two samples in the sample set can be calculated as follows: traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples; if a class name is shared, count the method names that the two samples have in common under that class name; sum these counts over all shared class names, and divide the sum by the number of method names in the union of all method names of the two samples. The result is the degree of association between the two samples.
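The calculation just described can be written compactly as a formula. The symbols are introduced here only for illustration and do not appear in the original text: C_a is the set of class names in sample a, M_a(c) is the set of method names of class c in sample a, and M_a is the set of all method names in sample a.

```latex
R(a,b) \;=\; \frac{\displaystyle\sum_{c \,\in\, C_a \cap C_b} \bigl|\, M_a(c) \cap M_b(c) \,\bigr|}{\bigl|\, M_a \cup M_b \,\bigr|}
```

Under this reading, two samples are treated as associated when R(a,b) is greater than the first preset value.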
In the method, the association values between the associated features are identical.
In the method, the feature extraction performed on the sample set in multiple dimensions takes as features at least the static and dynamic information that can be extracted from a sample, together with other information obtained by processing that static and dynamic information. These features can be further refined into a sample source dimension, a sample identification dimension and a sample name dimension. Specifically:
The sample source dimension can include: ip, sp, email, url, the whois information of a domain name, and the like;
The sample identification dimension can include: the hash values of sample resource files or icons. It should be noted that the hash algorithms here include not only algorithms that identify content uniquely, such as MD5, SHA1 and crc32, but also fuzzy hash algorithms, locality-sensitive hash algorithms and the like;
The sample name dimension can include: sample hash, package name, program name, file signature or certificate. It should be noted that the sample hash algorithms here include not only algorithms that identify content uniquely, such as MD5, SHA1 and crc32, but also fuzzy hash algorithms, locality-sensitive hash algorithms and the like.
After outputting the samples corresponding to the features on the corresponding line, the sample relevance detection method further includes:
judging, according to the output samples corresponding to the features on the corresponding line, whether the sample to be detected is a malicious sample.
A sample relevance detection system, including:
a sample collection module, configured to collect known white sample files and black sample files to form a sample set;
a feature extraction module, configured to perform feature extraction on the sample set in multiple dimensions;
a sample association degree calculation module, configured to calculate the degree of association between every two samples in the sample set, where the two samples have relevance if the degree of association is greater than a first preset value, and otherwise have no relevance;
a feature judgment module, configured to judge whether the features of the samples in the sample set are identical in each dimension; if so, the features of the samples in the corresponding dimension are considered to have relevance and an association value is assigned to each pair of associated features; otherwise the samples have no relevance in the corresponding dimension;
an association network graph building module, configured to build an association network graph according to the relevance between samples and the relevance of the sample features in the corresponding dimensions, with samples and features as nodes and with the lines between associated samples and features as edges;
a to-be-detected sample association module, configured to obtain the features of a sample to be detected in each dimension, calculate the degree of association between the sample to be detected and the samples in the sample set, embed the sample to be detected and its features in each dimension into the association network graph already built, and connect the lines to form a new association network graph;
a result output module, configured to calculate, in the new association network graph, the product of the association values between the sample to be detected and the samples on each line, and to judge whether the product exceeds a second preset value; if it does, the samples corresponding to the features on the corresponding line are output to the user.
In the system, the degree of association between two samples in the sample set is calculated as follows: traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples; if a class name is shared, count the method names that the two samples have in common under that class name; sum these counts over all shared class names, and divide the sum by the number of method names in the union of all method names of the two samples. The result is the degree of association between the two samples.
In the system, the association values between the associated features are identical.
In the system, the feature extraction performed on the sample set in multiple dimensions takes as features at least the static and dynamic information that can be extracted from a sample, together with other information obtained by processing that static and dynamic information. These features can be further refined into a sample source dimension, a sample identification dimension and a sample name dimension. Specifically:
The sample source dimension can include: ip, sp, email, url or the whois information of a domain name;
The sample identification dimension can include: the hash values of sample resource files or icons. It should be noted that the hash algorithms here include not only algorithms that identify content uniquely, such as MD5, SHA1 and crc32, but also fuzzy hash and locality-sensitive hash algorithms;
The sample name dimension can include: sample hash, package name, program name, file signature or certificate. It should be noted that the sample hash algorithms here include not only algorithms that identify content uniquely, such as MD5, SHA1 and crc32, but also fuzzy hash and locality-sensitive hash algorithms.
The sample relevance detection system further includes:
a malicious sample judgment module, configured to judge, after the result output module outputs the samples corresponding to the features on the corresponding line, whether the sample to be detected is a malicious sample according to those output samples.
The present invention also proposes an electronic device, including: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
collecting known white sample files and black sample files to form a sample set;
performing feature extraction on the sample set in multiple dimensions;
calculating the degree of association between every two samples in the sample set; if the degree of association is greater than a first preset value, determining that the two samples have relevance, and otherwise determining that the two samples have no relevance;
judging whether the features of the samples in the sample set are identical in each dimension; if so, considering that the features of the samples in the corresponding dimension have relevance and assigning an association value to each pair of associated features; otherwise determining that the features of the samples in the corresponding dimension have no relevance;
building an association network graph according to the relevance between the samples and the relevance of the sample features in the corresponding dimensions, with samples and features as nodes and with the lines between associated samples and features as edges;
obtaining the features of a sample to be detected in each dimension, calculating the degree of association between the sample to be detected and the samples in the sample set, embedding the sample to be detected and its features in each dimension into the association network graph, and connecting the lines to form a new association network graph;
calculating, in the new association network graph, the product of the association values between the sample to be detected and the samples on each line, and judging whether the product exceeds a second preset value; if it does, outputting the samples corresponding to the features on the corresponding line.
The present invention also proposes a storage medium for storing an application program, where the application program, when run, performs the sample relevance detection method of the present invention.
In response to the current situation in which developers of malicious applications deliberately bypass detection and countermeasures at the code level, the sample relevance detection method and system proposed by the present invention include: obtaining a sample set and calculating the features and degrees of association of the sample set in each dimension; building an association network graph with samples and sample features as nodes and with the lines between associated samples and nodes as edges; obtaining the features of a sample to be detected and embedding them into the association network graph; calculating, in the new association network graph, the product of the association values between the sample to be detected and the samples on each line; and, if the product is greater than the second preset value, outputting the samples on the corresponding line. By detecting relevance through non-code characteristics, the present invention yields more accurate and more heuristic association relationships and can be widely used in fields such as malicious code detection and malicious code analysis.
Brief description of the drawings
To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some of the embodiments recorded in the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a sample relevance detection method of the present invention;
Fig. 2 is a schematic diagram of an association network graph built according to the method of the present invention;
Fig. 3 is a schematic diagram of a new association network graph built according to the method of the present invention;
Fig. 4 is a schematic structural diagram of a sample relevance detection system of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features and advantages of the present invention more apparent, the technical solutions of the present invention are described in further detail below with reference to the drawings.
The present invention proposes a sample relevance detection method, system and electronic device. By building an association network between samples and calculating the relevance weights between samples, it obtains the relevance between a sample to be detected and known samples, thereby providing auxiliary information for judging and analyzing malicious code.
In some embodiments, a sample relevance detection method, as shown in Fig. 1, includes:
S101: collect known sample files to form a sample set.
In the field of malicious code detection, a known sample can be a white sample (a normal sample, officially released and free of malicious code) or a black sample (a sample containing malicious code). To improve the accuracy of malicious code detection, that is, to show fully the relevance of the sample to be detected to both white samples and black samples, the sample set preferably contains both white sample files and black sample files.
S102: perform feature extraction on the sample set in multiple dimensions.
As an example, the feature extraction performed on the sample set in multiple dimensions takes as features at least the static and dynamic information that can be extracted from a sample, together with other information obtained by processing that static and dynamic information. These features can be further refined into a sample source dimension, a sample identification dimension, a sample name dimension and the like:
The sample source dimension can include: ip, sp, email, url, the whois information of a domain name, and the like;
The sample identification dimension can include: the hash values of sample resource files or icons. It should be noted that the hash algorithms here include not only algorithms that identify content uniquely, such as MD5, SHA1 and crc32, but also fuzzy hash algorithms, locality-sensitive hash algorithms and the like;
The sample name dimension can include: sample hash, package name, program name, file signature or certificate, and the like. It should be noted that the sample hash algorithms here include not only algorithms that identify content uniquely, such as MD5, SHA1 and crc32, but also fuzzy hash and locality-sensitive hash algorithms.
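As an illustration of the exact-match hashes named above, the following minimal Python sketch computes the MD5, SHA1 and crc32 values of a resource file or icon from its raw bytes. The function name `identity_hashes` and the returned dictionary layout are assumptions made for this example, not part of the patent. Fuzzy and locality-sensitive hashes are not shown because they require third-party libraries.

```python
import hashlib
import zlib


def identity_hashes(data: bytes) -> dict[str, str]:
    """Return exact-match digests of a resource file or icon (hypothetical helper)."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        # zlib.crc32 returns an unsigned 32-bit integer; format it as 8 hex digits.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }
```

Two samples whose icons produce the same digest would then share one feature value in the sample identification dimension.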
S103: calculate the degree of association between every two samples in the sample set; if the degree of association is greater than the first preset value, determine that the two samples have relevance.
It should be understood that there are various methods for calculating the degree of association between two samples in the sample set; this embodiment presents one that is easy to use and efficient. Specifically: traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples. If the two samples share a class name, count the method names that the two samples have in common under that class name. Sum these counts over all shared class names to obtain the total number of common method names, and divide that total by the number of method names in the union of all method names of the two samples. The resulting value is the degree of association between the two samples. In this embodiment the first preset value is set to 0.5. For example, white sample 1 is as shown in Table 1:
Table 1
Black sample 2 is as shown in Table 2:
Table 2
As the tables show, white sample 1 and black sample 2 share two class names, class name 2 and class name 3. The total number of method names they have in common under class name 2 and class name 3 is 5 (method 201, method 202, method 203, method 301 and method 302), and the union of all method names of the two samples contains 10 method names (method 101, method 102, method 103, method 201, method 202, method 203, method 301, method 302, method 303 and method 304), so the degree of association between the two samples is 5/10 = 0.5.
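The calculation in this example can be sketched in Python as follows. Because Tables 1 and 2 are not reproduced in this text, the assignment of classes and methods to each sample below is a hypothetical layout chosen only to match the counts stated in the example; the function name `association_degree` and the dictionary representation are likewise assumptions.

```python
def association_degree(a: dict[str, set[str]], b: dict[str, set[str]]) -> float:
    """Degree of association between two samples.

    Each sample is a mapping: class name -> set of method names.
    """
    # Count common method names under every class name the samples share.
    common = sum(len(a[c] & b[c]) for c in a.keys() & b.keys())
    # Union of all method names of both samples, regardless of class.
    union = set().union(*a.values(), *b.values())
    return common / len(union) if union else 0.0


# Hypothetical layout (Tables 1 and 2 are missing from this text).
sample_1 = {
    "class1": {"method101", "method102", "method103"},
    "class2": {"method201", "method202", "method203"},
    "class3": {"method301", "method302"},
}
sample_2 = {
    "class2": {"method201", "method202", "method203"},
    "class3": {"method301", "method302", "method303", "method304"},
}

degree = association_degree(sample_1, sample_2)  # 5 common / 10 in union
```

With the first preset value of 0.5 and a strict "greater than" comparison, a degree of exactly 0.5 would not count as an association.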
It can be understood that, in an embodiment of the present invention, if the degree of association is less than or equal to the first preset value, it can be determined that the two samples have no relevance.
It should be emphasized again that malicious code increasingly adopts targeted development strategies to evade code-level relevance detection. Many counterfeit applications, for example, imitate only the resource files of a legitimate application and do not imitate it at the code level. Judging sample relevance from code-level content alone is therefore unreliable, and relevance must also be judged in several other dimensions. It should be understood that step S102 may also be performed after step S103. Code-level heuristics apply mainly to scenarios of functional homology or code-development homology, whereas in real-world confrontation, as attackers' means grow richer and advanced targeted threats increase, more and more malicious code is developed so as to better bypass detection and countermeasures.
Whether S104 difference judgement samples concentrate feature of each sample in respective dimensions identical;If two samples are at least
Such as ip, email, url, identical with the mutually equal feature of same asset file or icon in one dimension, then it is assumed that sample exists
Feature in correspondence dimension has relevance, and provides the association angle value between each linked character;Otherwise judge that sample is tieed up in correspondence
Feature on degree does not have relevance.
It is appreciated that the association angle value between the linked character in each dimension can be typically set based on experience value, at this
In embodiment, in addition to the degree of association between sample and sample needs individually to calculate, the association angle value phase between other each linked characters
Together, for example, 0.5 can be disposed as.
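A minimal sketch of the feature comparison in step S104 follows, assuming each sample's features are held as a mapping from dimension name to a set of values. The function name `shared_features`, the dimension labels and the data layout are hypothetical; only the idea of matching identical values within one dimension comes from the text.

```python
from collections import defaultdict


def shared_features(
    features: dict[str, dict[str, set[str]]]
) -> dict[tuple[str, str], set[str]]:
    """Map each (dimension, value) held by at least two samples to those samples.

    `features` maps sample name -> dimension name -> set of feature values.
    """
    owners: dict[tuple[str, str], set[str]] = defaultdict(set)
    for sample, dimensions in features.items():
        for dimension, values in dimensions.items():
            for value in values:
                owners[(dimension, value)].add(sample)
    # A feature links samples only when two or more samples hold the same value.
    return {key: names for key, names in owners.items() if len(names) >= 2}


example = {
    "sample1": {"identification": {"icon1"}, "source": {"ip1"}},
    "sample2": {"identification": {"icon1"}, "source": {"sp1"}},
}
links = shared_features(example)
```

Each entry of `links` would then receive the common association value, 0.5 in this embodiment.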
S105: build an association network graph according to the relevance between samples and the relevance of the sample features in the corresponding dimensions, with samples and features as nodes and with the lines between associated samples and features as edges.
For example, as shown in Fig. 2, suppose the calculation gives a degree of association of 0.7 between sample 1 and sample 4 and of 0.85 between sample 2 and sample 3; sample 1 has the features icon 1 and ip1, sample 2 has the features sp1 and icon 1, sample 3 has the features package name 1 and icon 1, and sample 4 has icon 1. The association network graph is then built, with an association value of 0.5 between each pair of associated features.
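The graph of this example can be sketched with a plain adjacency dictionary. The representation, the helper `add_edge` and the node names are assumptions for illustration. Fig. 2 is not reproduced here, so drawing an edge to every listed feature, including features held by a single sample, is also an assumption.

```python
def add_edge(graph: dict[str, dict[str, float]], u: str, v: str, weight: float) -> None:
    """Add an undirected weighted edge between nodes u and v."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


graph: dict[str, dict[str, float]] = {}

# Sample-to-sample edges carry the calculated degree of association.
add_edge(graph, "sample1", "sample4", 0.7)
add_edge(graph, "sample2", "sample3", 0.85)

# Sample-to-feature edges all carry the same association value, 0.5.
sample_features = {
    "sample1": ["icon1", "ip1"],
    "sample2": ["sp1", "icon1"],
    "sample3": ["package_name1", "icon1"],
    "sample4": ["icon1"],
}
for sample, feature_list in sample_features.items():
    for feature in feature_list:
        add_edge(graph, sample, feature, 0.5)
```

In this layout `graph["icon1"]` lists all four samples, which is how a shared icon connects samples that have no code-level association.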
At this point, steps S101 to S105 have completed the building of the association network graph. The graph is used next to perform sample relevance detection on the sample to be detected.
S106: obtain the features of the sample to be detected in each dimension and calculate the degree of association between the sample to be detected and the samples in the sample set; if the requirement is met, embed the sample to be detected and its features in each dimension into the association network graph already built and connect the lines to form a new association network graph.
It should be noted that meeting the requirement in this step can be understood as follows: the sample to be detected has relevance with a sample in the sample set, that is, the degree of association between the sample to be detected and a sample in the sample set is greater than the first preset value.
For example, as shown in Fig. 3, suppose the calculation gives a degree of association of 0.95 between the sample to be detected and sample 2 and degrees of association below 0.5 with all other samples, and the sample to be detected has the feature sp1. After it is embedded into the association network graph already built, the lines form the new association network graph.
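The embedding step can be sketched as follows, assuming the graph is an adjacency dictionary mapping each node to its neighbors and edge weights. The function name `embed_sample`, its parameters and the node name `target` are hypothetical. The sketch embeds the sample only when at least one degree of association exceeds the first preset value, which is one reading of "the requirement is met".

```python
def embed_sample(
    graph: dict[str, dict[str, float]],
    name: str,
    degrees: dict[str, float],
    features: list[str],
    first_preset: float = 0.5,
    feature_weight: float = 0.5,
) -> dict[str, dict[str, float]]:
    """Return a new graph with the sample to be detected linked in."""
    new_graph = {node: dict(neighbors) for node, neighbors in graph.items()}
    associated = {s: d for s, d in degrees.items() if d > first_preset}
    if not associated:
        return new_graph  # requirement not met: nothing is embedded

    def link(u: str, v: str, weight: float) -> None:
        new_graph.setdefault(u, {})[v] = weight
        new_graph.setdefault(v, {})[u] = weight

    for sample, degree in associated.items():
        link(name, sample, degree)
    for feature in features:
        link(name, feature, feature_weight)
    return new_graph


base = {
    "sample2": {"sample3": 0.85, "sp1": 0.5},
    "sample3": {"sample2": 0.85},
    "sp1": {"sample2": 0.5},
}
new_graph = embed_sample(base, "target", {"sample2": 0.95, "sample3": 0.3}, ["sp1"])
```

The original graph is copied rather than modified, so the same base graph can be reused for the next sample to be detected.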
S107: calculate, in the new association network graph, the product of the association values between the sample to be detected and the samples on each line, and judge whether the product is greater than the second preset value, for example 0.2. If it is, output the samples corresponding to the features on the corresponding line; otherwise discard the samples on that line.
For example, taking the new association network graph shown in Fig. 3, the calculated product of the weights between the sample to be detected and sample 3 is 0.95*0.85 = 0.8075, which is greater than 0.2, so the sample to be detected is associated with sample 2 and sample 3. Sample 2 and sample 3 are output to the user.
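The product along one line can be sketched as below. The helper `path_product`, the adjacency-dictionary representation and the node name `target` are assumptions; the text does not specify how candidate lines are enumerated, so the sketch takes the line as given and only reproduces the multiplication and the threshold test.

```python
import math


def path_product(graph: dict[str, dict[str, float]], path: list[str]) -> float:
    """Multiply the edge weights along consecutive nodes of `path`."""
    return math.prod(graph[u][v] for u, v in zip(path, path[1:]))


graph = {
    "target": {"sample2": 0.95},
    "sample2": {"target": 0.95, "sample3": 0.85},
    "sample3": {"sample2": 0.85},
}

SECOND_PRESET = 0.2  # example threshold from this embodiment
product = path_product(graph, ["target", "sample2", "sample3"])
is_associated = product > SECOND_PRESET
```

Because every weight lies between 0 and 1, the product shrinks as a line grows longer, so distant samples drop below the second preset value naturally.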
It should be understood that the present invention can be applied to detecting whether a sample is malicious. If the maliciousness of a sample is unknown, its degree of association with every sample in the graph (that is, the new association network graph formed above) can be calculated, the samples whose degree of association is greater than the second preset value can be selected, and the maliciousness of the unknown sample can be predicted from the sample with the greatest degree of association.
Further, in one embodiment of the present invention, after the samples corresponding to the features on the corresponding line are output, the sample relevance detection method may also include: judging, according to the output samples corresponding to the features on the corresponding line, whether the sample to be detected is a malicious sample.
There are many ways to judge from the output samples whether the sample to be detected is a malicious sample. Three different examples are given below:
As one example, the sample with the greatest degree of association with the sample to be detected is first found among the output samples, and whether the sample to be detected is a malicious sample is then judged from the type of that sample (for example, malicious or normal). For example, for a sample X to be detected, the samples in the graph whose degree of association with X exceeds the second preset value are A, B, C, D and E, of which C has the greatest degree of association. C is known to be a malicious sample, so X is predicted to be a malicious sample as well.
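This first strategy can be sketched as follows. The function name `judge_by_closest`, the tuple layout and all numeric association values are hypothetical; the text states only that C has the greatest degree of association and is malicious.

```python
def judge_by_closest(outputs: dict[str, tuple[float, bool]]) -> bool:
    """Predict maliciousness from the output sample with the greatest association.

    `outputs` maps sample name -> (association value, is_malicious).
    """
    closest = max(outputs, key=lambda name: outputs[name][0])
    return outputs[closest][1]


# Hypothetical association values; only the ranking of C as highest is from the text.
outputs = {
    "A": (0.40, True),
    "B": (0.30, False),
    "C": (0.90, True),
    "D": (0.50, True),
    "E": (0.25, True),
}
x_is_malicious = judge_by_closest(outputs)
```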
As another example, a vote is taken among all samples that satisfy the second preset value (that is, the output samples above). For example, the output samples are A, B, C, D and E, of which A, C, D and E are malicious samples and B is a normal sample. Malicious samples form the majority of the samples associated with the sample X to be detected, so X is predicted to be a malicious sample.
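The voting strategy can be sketched as a simple majority count. The function name `judge_by_vote` is hypothetical, and treating a tie as "not malicious" is an assumption of this sketch, since the text does not say how ties are resolved.

```python
def judge_by_vote(labels: dict[str, bool]) -> bool:
    """Predict maliciousness by majority vote over the output samples.

    `labels` maps sample name -> True if the sample is malicious.
    """
    malicious = sum(labels.values())
    normal = len(labels) - malicious
    return malicious > normal  # assumption: a tie is not treated as malicious


labels = {"A": True, "B": False, "C": True, "D": True, "E": True}
x_is_malicious = judge_by_vote(labels)
```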
As yet another example, in certain scenarios the present invention may refrain from giving a maliciousness verdict on the sample to be detected directly and instead push the output samples A, B, C, D and E to analysts, who can then judge the sample to be detected more efficiently and accurately on the basis of this small sample set.
Judging in this way, from the output samples that have relevance, whether the sample to be detected is a counterfeit file assists the detection of malicious code.
An advantage of the present invention is that, by associating multiple kinds of information such as samples and features, it gives the relevance between the sample to be detected and each known sample and provides it to the user for further determining whether the sample to be detected is a malicious or counterfeit sample. In addition, if a large number of malicious samples are found to share the same feature during association, that feature can be considered for addition to the rule base of an anti-virus engine.
Corresponding to the sample relevance detection method provided by the above embodiments, an embodiment of the present invention also provides a sample relevance detection system. Because the system corresponds to the method provided by the above embodiments, the implementations described for the method also apply to the system provided by this embodiment and are not described in detail again here. Fig. 4 is a schematic structural diagram of a sample relevance detection system of the present invention. As shown in Fig. 4, the system includes:
A sample collection module 401, configured to collect known white sample files and black sample files to form a sample set.
A feature extraction module 402, configured to perform feature extraction on the sample set in multiple dimensions. In the system, the feature extraction performed on the sample set in multiple dimensions covers at least a sample source dimension, a sample identification dimension and a sample name dimension, where the sample source dimension includes ip, sp, email, url or the whois information of a domain name; the sample identification dimension includes the MD5 values of sample resource files or icons; and the sample name dimension includes sample package name, program name, file signature or certificate.
A sample association degree calculation module 403, configured to calculate the degree of association between every two samples in the sample set; if the degree of association is greater than the first preset value, the two samples are determined to have relevance, and otherwise they are determined to have no relevance.
In the system, the sample association degree calculation module 403 can calculate the degree of association between two samples in the sample set as follows: traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples; if a class name is shared, count the method names that the two samples have in common under that class name; sum these counts over all shared class names, and divide the sum by the number of method names in the union of all method names of the two samples. The result is the degree of association between the two samples.
A feature judging module 404, configured to judge, for each dimension, whether the features of the samples in the sample set are identical. If they are, the features of those samples in the corresponding dimension are considered to have relevance, and an association value is assigned between the associated features; otherwise the features of the samples in the corresponding dimension are judged to have no relevance. In the system, the association value is the same for all associated features.
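As a rough sketch of the feature judgment in module 404, the Python below groups samples by identical feature values in each dimension and keeps only features shared by at least two samples. The dimension names, sample identifiers, and the constant `FEATURE_ASSOCIATION_VALUE` are hypothetical; the patent states only that the association value is the same for all associated features.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical constant: one association value used for every associated feature.
FEATURE_ASSOCIATION_VALUE = 0.9


def shared_features(features: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Map (dimension, feature value) to the samples that share that exact feature."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for sample_id, dimensions in features.items():
        for dimension, value in dimensions.items():
            groups[(dimension, value)].append(sample_id)
    # A feature links samples only when at least two samples carry the same value.
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) >= 2}


features = {
    "s1": {"package_name": "com.x", "icon_md5": "aa"},
    "s2": {"package_name": "com.x", "icon_md5": "bb"},
    "s3": {"package_name": "com.y", "icon_md5": "bb"},
}
linked = shared_features(features)
```

Here `s1` and `s2` are linked through the package name, and `s2` and `s3` through the icon MD5; values held by a single sample produce no association.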
An association network graph building module 405, configured to build an association network graph according to the relevance between samples and the relevance of sample features in the corresponding dimensions, with samples and features as nodes and the connecting lines between samples and features that have relevance as edges.
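One way to picture the graph built by module 405 is as a weighted adjacency map. The sketch below is an assumption about the data layout, not the patent's implementation: the node tuples `("sample", id)` and `("feature", dimension, value)`, the function names, and the parameter names are all illustrative.

```python
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]


def add_edge(graph: Graph, u: Hashable, v: Hashable, value: float) -> None:
    """Store an undirected edge carrying an association value."""
    graph.setdefault(u, {})[v] = value
    graph.setdefault(v, {})[u] = value


def build_association_graph(
    sample_degrees: Dict[Tuple[str, str], float],
    shared_features: Dict[Tuple[str, str], List[str]],
    first_preset_value: float,
    feature_value: float,
) -> Graph:
    graph: Graph = {}
    # Sample-to-sample edges exist only when the degree exceeds the first preset value.
    for (a, b), degree in sample_degrees.items():
        if degree > first_preset_value:
            add_edge(graph, ("sample", a), ("sample", b), degree)
    # Each shared feature becomes a node connected to every sample that carries it.
    for (dimension, value), sample_ids in shared_features.items():
        for sample_id in sample_ids:
            add_edge(graph, ("feature", dimension, value), ("sample", sample_id), feature_value)
    return graph


graph = build_association_graph(
    {("s1", "s2"): 0.6, ("s1", "s3"): 0.05},
    {("package_name", "com.x"): ["s1", "s2"]},
    first_preset_value=0.1,
    feature_value=0.9,
)
```

A plain dictionary keeps the sketch dependency-free; a graph library could equally be used.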
A to-be-detected sample association module 406, configured to obtain the features of the sample to be detected in each dimension, calculate the degree of association between the sample to be detected and the samples in the sample set, embed the sample to be detected and its features in each dimension into the association network graph that has been built, and connect them to form a new association network graph.
A result output module 407, configured to calculate the product of the association values on each connecting line between the sample to be detected and the samples in the new association network graph, and to judge whether the product exceeds a second preset value. If it does, the samples corresponding to the features on that connecting line are output to the user.
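The product test in module 407 can be sketched in Python as follows. The patent does not define how long a connecting line may be, so this sketch assumes a line is either a direct sample-to-sample edge or a two-edge path through one feature node. The node tuples, the function name `related_samples`, and all numeric values are hypothetical.

```python
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]


def related_samples(graph: Graph, target: Hashable, second_preset_value: float) -> Dict[str, float]:
    """Return known samples whose association-value product with the target exceeds the threshold."""
    results: Dict[str, float] = {}
    for middle, first_value in graph.get(target, {}).items():
        if middle[0] == "sample":
            # Direct sample-to-sample line: the product is the single edge value.
            candidates = [(middle, first_value)]
        else:
            # Line through a feature node: multiply the two edge values.
            candidates = [
                (end, first_value * second_value)
                for end, second_value in graph[middle].items()
                if end != target and end[0] == "sample"
            ]
        for end, product in candidates:
            if product > second_preset_value:
                results[end[1]] = max(product, results.get(end[1], 0.0))
    return results


feature = ("feature", "package_name", "com.x")
graph: Graph = {
    ("sample", "t"): {feature: 0.9, ("sample", "s3"): 0.4},
    feature: {("sample", "t"): 0.9, ("sample", "s1"): 0.9},
    ("sample", "s1"): {feature: 0.9},
    ("sample", "s3"): {("sample", "t"): 0.4},
}
hits = related_samples(graph, ("sample", "t"), second_preset_value=0.5)
```

With these values only `s1` is reported: its line through the shared package name has a product of about 0.81, while the direct line to `s3` carries 0.4 and stays below the assumed threshold.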
To improve the usability and feasibility of the invention and to assist the detection of malicious code, the sample relevance detection system may, in one embodiment of the invention, optionally further include a malicious sample judging module. After the result output module 407 outputs the samples corresponding to the features on the corresponding connecting lines, the malicious sample judging module may judge, according to those output samples, whether the sample to be detected is a malicious sample.
To implement the above embodiments, the present invention further proposes an electronic device, including: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
S101', collecting known white sample files and black sample files to form a sample set;
S102', extracting features from the sample set in multiple dimensions;
S103', calculating the degree of association between every two samples in the sample set; if the degree of association is greater than a first preset value, judging that the two samples have relevance, and otherwise judging that the two samples have no relevance;
S104', judging, for each dimension, whether the features of the samples in the sample set are identical; if so, considering the features of the samples in the corresponding dimension to have relevance and assigning an association value between the associated features, and otherwise judging that the features of the samples in the corresponding dimension have no relevance;
S105', building an association network graph according to the relevance between the samples and the relevance of the features of the samples in the corresponding dimensions, with samples and features as nodes and the connecting lines between samples and features that have relevance as edges;
S106', obtaining the features of a sample to be detected in each dimension, calculating the degree of association between the sample to be detected and the samples in the sample set, embedding the sample to be detected and its features in each dimension into the association network graph, and connecting them to form a new association network graph;
S107', calculating the product of the association values on each connecting line between the sample to be detected and the samples in the new association network graph, and judging whether the product exceeds a second preset value; if the product exceeds the second preset value, outputting the samples corresponding to the features on the corresponding connecting line.
To implement the above embodiments, the present invention further proposes a storage medium for storing an application program, the application program being configured to perform, when run, the sample relevance detection method described in any of the above embodiments of the invention.
An advantage of the present invention is that, by associating multiple kinds of information such as samples and features, it gives the relevance between the sample to be detected and each known sample and provides it to the user for further determining whether the sample to be detected is a malicious or counterfeit sample. In addition, if a large number of malicious samples are found to share the same feature during the association process, that feature may be considered for addition to the rule base of an anti-virus engine.
The present invention proposes a sample relevance detection method and system, including: obtaining a sample set, and calculating the features of the sample set in each dimension and the degrees of association; building an association network graph with samples and sample features as nodes and the connecting lines between samples and nodes that have a degree of association as edges; obtaining the features of a sample to be detected and embedding them into the association network graph; and calculating the product of the association values on each connecting line between the sample to be detected and the samples in the new association network graph, and, if the product is greater than the second preset value, outputting the samples on the corresponding connecting line. With the method of the present invention, more information from both code and sample attributes can be used in the judgment, so the association relationships are more accurate and have greater heuristic value. Samples that have relevance can be output effectively and used to further determine whether the sample to be detected is a counterfeit file, which assists the detection of malicious code.
Although the present invention has been described by way of embodiments, those of ordinary skill in the art will appreciate that the invention admits many variations and changes without departing from its spirit, and it is intended that the appended claims cover such variations and changes without departing from the spirit of the invention.
Claims (10)
1. A sample relevance detection method, characterized by including:
collecting known sample files to form a sample set;
extracting features from the sample set in multiple dimensions;
calculating the degree of association between every two samples in the sample set; if the degree of association is greater than a first preset value, judging that the two samples have relevance, and otherwise judging that the two samples have no relevance;
judging, for each dimension, whether the features of the samples in the sample set are identical; if so, considering the features of the samples in the corresponding dimension to have relevance and assigning an association value between the associated features, and otherwise judging that the features of the samples in the corresponding dimension have no relevance;
building an association network graph according to the relevance between the samples and the relevance of the features of the samples in the corresponding dimensions, with samples and features as nodes and the connecting lines between samples and features that have relevance as edges;
obtaining the features of a sample to be detected in each dimension, calculating the degree of association between the sample to be detected and the samples in the sample set, embedding the sample to be detected and its features in each dimension into the association network graph, and connecting them to form a new association network graph;
calculating the product of the association values on each connecting line between the sample to be detected and the samples in the new association network graph, and judging whether the product exceeds a second preset value; if the product exceeds the second preset value, outputting the samples corresponding to the features on the corresponding connecting line.
2. The method of claim 1, characterized in that calculating the degree of association between every two samples in the sample set specifically includes:
traversing the code of each sample to obtain its class names and method names, and comparing the class names of the two samples;
if a class name is identical between the two samples, further counting the intersection of the method names of the two samples under that class name;
accumulating the method-name intersection counts over all identical class names and dividing the total by the size of the union of all method names of the two samples, the result being the degree of association between the two samples.
3. The method of claim 1, characterized in that the features extracted from the sample set in multiple dimensions include static information extractable from a sample, dynamic information, and other information obtained by processing the static or dynamic information.
4. The method of any one of claims 1 to 3, characterized in that, after outputting the samples corresponding to the features on the corresponding connecting line, the method further includes:
judging, according to a preset method, whether the sample to be detected is a malicious sample.
5. A sample relevance detection system, characterized by including:
a sample collection module, configured to collect sample files to form a sample set;
a feature extraction module, configured to extract features from the sample set in multiple dimensions;
a sample association degree calculation module, configured to calculate the degree of association between every two samples in the sample set, and, if the degree of association is greater than a first preset value, to judge that the two samples have relevance, and otherwise to judge that the two samples have no relevance;
a feature judging module, configured to judge, for each dimension, whether the features of the samples in the sample set are identical, and, if so, to consider the features of the samples in the corresponding dimension to have relevance and assign an association value between the associated features, and otherwise to judge that the features of the samples in the corresponding dimension have no relevance;
an association network graph building module, configured to build an association network graph according to the relevance between the samples and the relevance of the features of the samples in the corresponding dimensions, with samples and features as nodes and the connecting lines between samples and features that have relevance as edges;
a to-be-detected sample association module, configured to obtain the features of a sample to be detected in each dimension, calculate the degree of association between the sample to be detected and the samples in the sample set, embed the sample to be detected and its features in each dimension into the association network graph, and connect them to form a new association network graph;
a result output module, configured to calculate the product of the association values on each connecting line between the sample to be detected and the samples in the new association network graph, to judge whether the product exceeds a second preset value, and, if the product exceeds the second preset value, to output the samples corresponding to the features on the corresponding connecting line.
6. The system of claim 5, characterized in that the sample association degree calculation module is specifically configured to:
traverse the code of each sample to obtain its class names and method names, and compare the class names of the two samples;
if a class name is identical between the two samples, further count the intersection of the method names of the two samples under that class name;
accumulate the method-name intersection counts over all identical class names and divide the total by the size of the union of all method names of the two samples, the result being the degree of association between the two samples.
7. The system of claim 5, characterized in that the association value is the same for all associated features.
8. The system of any one of claims 5 to 7, characterized in that the features extracted from the sample set in multiple dimensions include static information extractable from a sample, dynamic information, and other information obtained by processing the static or dynamic information.
9. The system of any one of claims 5 to 8, characterized in that the system further includes:
a malicious sample judging module, configured to judge, according to a preset method, whether the sample to be detected is a malicious sample after the result output module outputs the samples corresponding to the features on the corresponding connecting line.
10. An electronic device, characterized by including:
one or more processors;
a memory;
one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
collecting known sample files to form a sample set;
extracting features from the sample set in multiple dimensions;
calculating the degree of association between every two samples in the sample set; if the degree of association is greater than a first preset value, judging that the two samples have relevance, and otherwise judging that the two samples have no relevance;
judging, for each dimension, whether the features of the samples in the sample set are identical; if so, considering the features of the samples in the corresponding dimension to have relevance and assigning an association value between the associated features, and otherwise judging that the features of the samples in the corresponding dimension have no relevance;
building an association network graph according to the relevance between the samples and the relevance of the features of the samples in the corresponding dimensions, with samples and features as nodes and the connecting lines between samples and features that have relevance as edges;
obtaining the features of a sample to be detected in each dimension, calculating the degree of association between the sample to be detected and the samples in the sample set, embedding the sample to be detected and its features in each dimension into the association network graph, and connecting them to form a new association network graph;
calculating the product of the association values on each connecting line between the sample to be detected and the samples in the new association network graph, and judging whether the product exceeds a second preset value; if the product exceeds the second preset value, outputting the samples corresponding to the features on the corresponding connecting line.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511015286.1A CN105975852A (en) | 2015-12-31 | 2015-12-31 | Method and system for detecting sample relevance based on label propagation |
CN2015110152861 | 2015-12-31 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106815521A true CN106815521A (en) | 2017-06-09 |
CN106815521B CN106815521B (en) | 2019-07-23 |
Family
ID=56988207
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201511015286.1A Pending CN105975852A (en) | 2015-12-31 | 2015-12-31 | Method and system for detecting sample relevance based on label propagation |
CN201611199242.3A Active CN106815521B (en) | 2015-12-31 | 2016-12-22 | A kind of sample relevance detection method, system and electronic equipment |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201511015286.1A Pending CN105975852A (en) | 2015-12-31 | 2015-12-31 | Method and system for detecting sample relevance based on label propagation |
Country Status (2)
Country | Link |
---|---|
CN (2) | CN105975852A (en) |
WO (1) | WO2017114290A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107609400A (en) * | 2017-09-28 | 2018-01-19 | 深信服科技股份有限公司 | Computer virus classification method, system, equipment and computer-readable recording medium |
CN109325280A (en) * | 2018-09-13 | 2019-02-12 | 广西科技大学 | A kind of inertia test table module partition method |
CN109995605A (en) * | 2018-01-02 | 2019-07-09 | 中国移动通信有限公司研究院 | A kind of method for recognizing flux and device and computer readable storage medium |
CN110264333A (en) * | 2019-05-09 | 2019-09-20 | 阿里巴巴集团控股有限公司 | A kind of risk rule determines method and apparatus |
CN110458394A (en) * | 2019-07-05 | 2019-11-15 | 阿里巴巴集团控股有限公司 | A kind of index measuring and calculating method and device based on Object related degree |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105975852A (en) * | 2015-12-31 | 2016-09-28 | 武汉安天信息技术有限责任公司 | Method and system for detecting sample relevance based on label propagation |
CN106446687B (en) * | 2016-10-14 | 2020-11-03 | 北京奇虎科技有限公司 | Malicious sample detection method and device |
CN108268772B (en) * | 2016-12-30 | 2021-10-22 | 武汉安天信息技术有限责任公司 | Method and system for screening malicious samples |
CN108537654B (en) * | 2018-03-09 | 2021-04-30 | 平安普惠企业管理有限公司 | Rendering method and device of customer relationship network graph, terminal equipment and medium |
CN110457359B (en) * | 2018-05-04 | 2024-03-08 | 拉萨经济技术开发区凯航科技开发有限公司 | Correlation analysis method |
CN109033834A (en) * | 2018-07-17 | 2018-12-18 | 南京邮电大学盐城大数据研究院有限公司 | A kind of malware detection method based on file association relationship |
CN110336838B (en) * | 2019-08-07 | 2022-07-08 | 腾讯科技(武汉)有限公司 | Account abnormity detection method, device, terminal and storage medium |
CN112487421B (en) * | 2020-10-26 | 2024-06-11 | 中国科学院信息工程研究所 | Android malicious application detection method and system based on heterogeneous network |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1262836A (en) * | 1997-02-03 | 2000-08-09 | Mci通讯公司 | Communication system architecture |
US7769851B1 (en) * | 2005-01-27 | 2010-08-03 | Juniper Networks, Inc. | Application-layer monitoring and profiling network traffic |
CN101933290A (en) * | 2007-12-18 | 2010-12-29 | 太阳风环球有限责任公司 | Method for configuring acls on network device based on flow information |
CN102034042A (en) * | 2010-12-13 | 2011-04-27 | 四川大学 | Novel unwanted code detecting method based on characteristics of function call relationship graph |
CN102821002A (en) * | 2011-06-09 | 2012-12-12 | 中国移动通信集团河南有限公司信阳分公司 | Method and system for network flow anomaly detection |
CN103984920A (en) * | 2014-04-25 | 2014-08-13 | 同济大学 | Three-dimensional face identification method based on sparse representation and multiple feature points |
CN105205397A (en) * | 2015-10-13 | 2015-12-30 | 北京奇虎科技有限公司 | Rogue program sample classification method and device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103530367B (en) * | 2013-10-12 | 2017-07-18 | 深圳先进技术研究院 | A kind of fishing website identification system and method |
CN104537303B (en) * | 2014-12-30 | 2017-10-24 | 中国科学院深圳先进技术研究院 | A kind of fishing website identification system and discrimination method |
CN104899253B (en) * | 2015-05-13 | 2018-06-26 | 复旦大学 | Towards the society image across modality images-label degree of correlation learning method |
CN105975852A (en) * | 2015-12-31 | 2016-09-28 | 武汉安天信息技术有限责任公司 | Method and system for detecting sample relevance based on label propagation |
- 2015-12-31: CN201511015286.1A (CN105975852A), Pending
- 2016-12-22: CN201611199242.3A (CN106815521B), Active
- 2016-12-22: PCT/CN2016/111566 (WO2017114290A1), Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107609400A (en) * | 2017-09-28 | 2018-01-19 | 深信服科技股份有限公司 | Computer virus classification method, system, equipment and computer-readable recording medium |
CN109995605A (en) * | 2018-01-02 | 2019-07-09 | 中国移动通信有限公司研究院 | A kind of method for recognizing flux and device and computer readable storage medium |
CN109325280A (en) * | 2018-09-13 | 2019-02-12 | 广西科技大学 | A kind of inertia test table module partition method |
CN110264333A (en) * | 2019-05-09 | 2019-09-20 | 阿里巴巴集团控股有限公司 | A kind of risk rule determines method and apparatus |
CN110264333B (en) * | 2019-05-09 | 2023-12-08 | 创新先进技术有限公司 | Risk rule determining method and apparatus |
CN110458394A (en) * | 2019-07-05 | 2019-11-15 | 阿里巴巴集团控股有限公司 | A kind of index measuring and calculating method and device based on Object related degree |
CN110458394B (en) * | 2019-07-05 | 2023-08-22 | 创新先进技术有限公司 | Index measuring and calculating method and device based on object association degree |
Also Published As
Publication number | Publication date |
---|---|
WO2017114290A1 (en) | 2017-07-06 |
CN106815521B (en) | 2019-07-23 |
CN105975852A (en) | 2016-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106815521A (en) | A kind of sample relevance detection method, system and electronic equipment | |
CN105825138B (en) | A kind of method and apparatus of sensitive data identification | |
Gelernter et al. | Cross-site search attacks | |
CN107180192A (en) | Android malicious application detection method and system based on multi-feature fusion | |
CN106713303A (en) | Malicious domain name detection method and system | |
CN105224600B (en) | A kind of detection method and device of Sample Similarity | |
WO2016201938A1 (en) | Multi-stage phishing website detection method and system | |
Bailey et al. | Statistics on password re-use and adaptive strength for financial accounts | |
Huang et al. | Mitigate web phishing using site signatures | |
CN110881050A (en) | Security threat detection method and related product | |
CN107247902A (en) | Malware categorizing system and method | |
CN106599688A (en) | Application category-based Android malicious software detection method | |
Su et al. | Suspicious URL filtering based on logistic regression with multi-view analysis | |
Torres et al. | Malicious PDF documents detection using machine learning techniques | |
Cui et al. | A password strength evaluation algorithm based on sensitive personal information | |
CN113792298A (en) | Method and device for detecting vehicle safety risk | |
CN107085684A (en) | The detection method and device of performance of program | |
Orunsolu et al. | An Anti-Phishing Kit Scheme for Secure Web Transactions. | |
CN106911635A (en) | A kind of method and device of detection website with the presence or absence of backdoor programs | |
EP4137976A1 (en) | Learning device, detection device, learning method, detection method, learning program, and detection program | |
CN107995167B (en) | Equipment identification method and server | |
CN115599345A (en) | Application security requirement analysis recommendation method based on knowledge graph | |
CN107239704A (en) | Malicious web pages find method and device | |
Hao et al. | JavaScript malicious codes analysis based on naive bayes classification | |
Yu et al. | HoneyGAN: creating indistinguishable honeywords with improved generative adversarial networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||