CN105488406A - Similar malicious sample file matching method and system based on feature vector - Google Patents

Similar malicious sample file matching method and system based on feature vector

Info

Publication number
CN105488406A
Authority
CN
China
Prior art keywords
sample file
behavioural characteristic
behavior
vector
malice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410827237.7A
Other languages
Chinese (zh)
Other versions
CN105488406B (en)
Inventor
张洋
康学斌
董晓齐
孙晋超
肖新光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Antiy Technology Group Co Ltd
Original Assignee
Harbin Antiy Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Antiy Technology Co Ltd
Priority to CN201410827237.7A
Publication of CN105488406A
Application granted
Publication of CN105488406B
Legal status: Active
Anticipated expiration

Landscapes

  • Sampling And Sample Adjustment (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a feature-vector-based method and system for matching similar malicious sample files. The matching method comprises the following steps: extracting the behavior features of each malicious sample file in a massive malicious sample file library; filtering the behavior features, calculating the hash value of each behavior feature remaining after filtering, and generating a behavior feature vector group for each malicious sample file; obtaining the query feature vector group of a query sample file; calculating the sample similarity between the query feature vector group and each malicious sample file, obtaining the behavior feature vector groups of the malicious sample files whose sample similarity is greater than or equal to a target similarity, and finding the corresponding malicious sample files according to those behavior feature vector groups, these malicious sample files being similar to the query sample file. The technical scheme can quickly discover the commonalities among malicious sample files in a massive collection of samples, query the required similar samples, and generate a report for relevant personnel to analyze.

Description

A feature-vector-based method and system for matching similar malicious samples
Technical field
The present invention relates to the field of information security technology, and in particular to a feature-vector-based method and system for matching similar malicious samples.
Background
With the explosive growth of data in recent years, the variety and number of malicious programs in the field of network security have also grown rapidly. Finding the commonalities among this massive volume of malicious code would greatly help the development of antivirus technology.
Summary of the invention
To address the above technical problem, the present invention provides a feature-vector-based method and system for matching similar malicious samples, which can quickly find the commonalities among malicious sample files in a massive collection of malicious sample files, query the required similar samples, and generate a report for relevant personnel to analyze. This solves technical problems of traditional methods such as slow retrieval and long processing time.
The present invention is implemented by the following method. A feature-vector-based similar malicious sample matching method comprises:
Extracting the behavior features of each malicious sample file in a massive malicious sample file library;
Filtering the behavior features, calculating the hash value of each behavior feature remaining after filtering, and generating a behavior feature vector group for each malicious sample file, wherein the behavior feature vector group comprises m behavior feature vectors, and the m behavior feature vectors correspond to the m classes of behavior features of each malicious sample file;
The structure of a behavior feature vector is: behavior feature type: [behavior component 1, behavior component 2, ..., behavior component n];
Obtaining the query feature vector group of the query sample file;
Calculating the sample similarity between the query sample file and each malicious sample file, obtaining the behavior feature vector groups of the malicious sample files whose sample similarity is greater than or equal to a target similarity, and finding the corresponding malicious sample files according to those behavior feature vector groups, these malicious sample files being the ones similar to the query sample file;
The sample similarity is calculated as follows:
Comparing each behavior feature vector of the query feature vector group with the corresponding behavior feature vector of each malicious sample file and, for each behavior feature type, counting the identical behavior components contained in both;
Dividing the number of identical behavior components by the total number of behavior components in that behavior feature vector, then multiplying by the preset weight of that behavior feature vector to obtain its intermediate weight value; calculating the intermediate weight values of the remaining behavior feature vectors in the same way; and summing all intermediate weight values to obtain the sample similarity;
The weights of all behavior feature vectors sum to 1.
Further, before the sample similarity is calculated, the method also comprises: filtering the massive malicious sample file library based on a preset filter condition.
Further, the preset filter condition is:
Choosing any one behavior feature vector of the query feature vector group as the single variable and assuming that all remaining behavior feature vectors match completely, then obtaining the minimum number of matching components for the corresponding behavior feature vector from the relation among the number of behavior components, the target similarity, and the weight of the behavior feature vector;
Calculating the sub-weight of each behavior component and sorting the sub-weights from largest to smallest to form a descending sub-weight list;
Accumulating the sub-weights in the list one by one until the cumulative sum is greater than the preset target weight; the number of behavior components accumulated from the list is the minimum total match count;
Further filtering the massive malicious sample file library based on the obtained minimum number of matching components for each behavior feature vector and the minimum total match count.
The present invention is implemented by the following system. A feature-vector-based similar malicious sample matching system comprises:
A malicious sample file library processing module, configured to extract the behavior features of each malicious sample file in a massive malicious sample file library;
and to filter the behavior features, calculate the hash value of each behavior feature remaining after filtering, and generate a behavior feature vector group for each malicious sample file, wherein the behavior feature vector group comprises m behavior feature vectors, and the m behavior feature vectors correspond to the m classes of behavior features of each malicious sample file;
The structure of a behavior feature vector is: behavior feature type: [behavior component 1, behavior component 2, ..., behavior component n];
A query sample file processing module, configured to obtain the query feature vector group of the query sample file;
A matching calculation module, configured to calculate the sample similarity between the query sample file and each malicious sample file, obtain the behavior feature vector groups of the malicious sample files whose sample similarity is greater than or equal to a target similarity, and find the corresponding malicious sample files according to those behavior feature vector groups, these malicious sample files being the ones similar to the query sample file;
The sample similarity is calculated as follows:
Comparing each behavior feature vector of the query feature vector group with the corresponding behavior feature vector of each malicious sample file and, for each behavior feature type, counting the identical behavior components contained in both;
Dividing the number of identical behavior components by the total number of behavior components in that behavior feature vector, then multiplying by the preset weight of that behavior feature vector to obtain its intermediate weight value; calculating the intermediate weight values of the remaining behavior feature vectors in the same way; and summing all intermediate weight values to obtain the sample similarity;
The weights of all behavior feature vectors sum to 1.
Further, before the sample similarity is calculated, the system also performs: filtering the massive malicious sample files based on a preset filter condition.
Further, the preset filter condition is:
Choosing any one behavior feature vector of the query feature vector group as the single variable and assuming that all remaining behavior feature vectors match completely, then obtaining the minimum number of matching components for the corresponding behavior feature vector from the relation among the number of behavior components, the target similarity, and the weight of the behavior feature vector;
Calculating the sub-weight of each behavior component and sorting the sub-weights from largest to smallest to form a descending sub-weight list;
Accumulating the sub-weights in the list one by one until the cumulative sum is greater than the preset target weight; the number of behavior components accumulated from the list is the minimum total match count;
Further filtering the massive malicious sample file library based on the obtained minimum number of matching components for each behavior feature vector and the minimum total match count.
In summary, the technical scheme of the present invention first extracts the behavior features of each malicious sample file in a massive malicious sample file library; filters the behavior features, calculates the hash value of each behavior feature remaining after filtering, and generates a behavior feature vector group for each malicious sample file; obtains the query feature vector group of the query sample file; and calculates the sample similarity between the query sample file and each malicious sample file, obtains the behavior feature vector groups of the malicious sample files whose sample similarity is greater than or equal to the target similarity, and finds the corresponding malicious sample files according to those behavior feature vector groups, these malicious sample files being the ones similar to the query sample file. The technical scheme can quickly find the commonalities among malicious sample files in a massive collection of malicious sample files, query the required similar samples, and generate a report for relevant personnel to analyze.
The beneficial effect of the present invention is as follows: based on the behavior feature vectors of malicious sample files, the invention uses an effective similar-sample matching algorithm that can quickly find the commonalities among malicious sample files in a massive collection of malicious sample files, query the required similar samples, and generate a report for relevant personnel to analyze. This solves technical problems of traditional methods such as slow retrieval and long processing time.
Brief description of the drawings
To illustrate the technical scheme of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the feature-vector-based similar malicious sample matching method provided by the present invention;
Fig. 2 is a structural diagram of an embodiment of the feature-vector-based similar malicious sample matching system provided by the present invention.
Detailed description of the embodiments
The present invention provides a feature-vector-based similar malicious sample matching method and system. To help those skilled in the art better understand the technical scheme in the embodiments, and to make the above objects, features, and advantages of the invention clearer, the technical scheme is described in further detail below with reference to the drawings:
The present invention first provides an embodiment of the feature-vector-based similar malicious sample matching method which, as shown in Fig. 1, comprises:
S101: extracting the behavior features of each malicious sample file in a massive malicious sample file library;
Here, the behavior features include information such as the URLs, IP addresses, and domain names accessed by the malicious sample file;
S102: filtering the behavior features, calculating the hash value of each behavior feature remaining after filtering, and generating a behavior feature vector group for each malicious sample file;
The behavior feature vector group comprises m behavior feature vectors, and the m behavior feature vectors correspond to the m classes of behavior features of each malicious sample file;
The structure of a behavior feature vector is: behavior feature type: [behavior component 1, behavior component 2, ..., behavior component n];
Here, the behavior features are filtered based on the original malicious sample library.
A concrete example follows:
Suppose the malicious sample file whose MD5 value is E13A8763AE6F65DF4C72D130B6696056 has the following behavior feature vector group:
URL: [http://46.211.87.16/mod2/safpro1.exe, http://89.149.101.121/mod1/safpro1.exe, http://188.0.133.161/mod1/safpro1.exe];
Domain name: ["188.0.133.161", "www.baidu.com", "www.google.com"]
IP: [39.119.165.76, 178.151.173.178, 46.148.53.253, 86.100.8.75, 89.149.101.121, 188.0.133.161, 74.82.216.5, 95.141.42.87, 65.98.83.117]
Because www.baidu.com and www.google.com are common public domain names and clearly are not behavior features of a malicious sample file, they are filtered out.
The hash values of the behavior features remaining after filtering are as follows:
URL: [c62f6e80, a097745c, 4b865ed5]
Domain name: [11b3c408]
IP: [180a97dd, 5ecdccfe, 12d99ac4, 918f478b, 8c870f31, 11b3c408, fbab2cfa, 62f10103, d65c4fb7]
As shown above, the URL, domain name, and IP behavior features comprise 3, 1, and 9 behavior components respectively.
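The filtering and hashing step in this example could be sketched in Python roughly as follows. This is a minimal sketch under assumptions: the text does not name the hash function, so CRC32 from `zlib` stands in for it and will not necessarily reproduce the hash values listed above. `COMMON_DOMAINS`, `short_hash`, and `build_vector_group` are hypothetical names introduced for illustration.

```python
import zlib

# Hypothetical whitelist of common benign domains; the text only says that
# filtering is based on the original malicious sample library.
COMMON_DOMAINS = {"www.baidu.com", "www.google.com"}


def short_hash(value: str) -> str:
    """Return an 8-hex-digit hash. CRC32 is an assumption, not the patent's function."""
    return format(zlib.crc32(value.encode("utf-8")) & 0xFFFFFFFF, "08x")


def build_vector_group(raw: dict) -> dict:
    """Drop whitelisted components, then hash what remains, per feature type."""
    group = {}
    for feature_type, components in raw.items():
        kept = [c for c in components if c not in COMMON_DOMAINS]
        group[feature_type] = [short_hash(c) for c in kept]
    return group


raw_features = {
    "URL": [
        "http://46.211.87.16/mod2/safpro1.exe",
        "http://89.149.101.121/mod1/safpro1.exe",
        "http://188.0.133.161/mod1/safpro1.exe",
    ],
    "domain": ["188.0.133.161", "www.baidu.com", "www.google.com"],
    "IP": [
        "39.119.165.76", "178.151.173.178", "46.148.53.253",
        "86.100.8.75", "89.149.101.121", "188.0.133.161",
        "74.82.216.5", "95.141.42.87", "65.98.83.117",
    ],
}

group = build_vector_group(raw_features)
print({k: len(v) for k, v in group.items()})  # {'URL': 3, 'domain': 1, 'IP': 9}
```

Because the same string is hashed the same way regardless of feature type, the remaining domain entry and the sixth IP entry share one hash here, which is consistent with 11b3c408 appearing in both lists of the example.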
An inverted index table is then generated from the behavior feature vector groups of the massive malicious samples. The structure of the inverted index table is: behavior component i: [MD5 value of sample file 1, MD5 value of sample file 2, ..., MD5 value of sample file P], where 1<=i<=n, and the MD5 values of sample files 1 to P are the MD5 values of the malicious sample files in the massive library whose feature vector groups contain behavior component i;
The index table for the URL behavior feature vector is as follows:
c62f6e80:[E13A8763AE6F65DF4C72D130B6696056,…]
a097745c:[E13A8763AE6F65DF4C72D130B6696056,…]
4b865ed5:[E13A8763AE6F65DF4C72D130B6696056,…]
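The inverted index table above can be sketched as a nested mapping from feature type to component hash to the set of sample MD5 values. This is an illustrative sketch only: the function name `build_inverted_index`, the nested-dictionary layout, and the second sample entry are assumptions that do not come from the text.

```python
from collections import defaultdict


def build_inverted_index(library):
    """Map feature type -> component hash -> set of sample MD5 values.

    `library` is assumed to look like {md5: {feature_type: [component_hash, ...]}}.
    """
    index = defaultdict(lambda: defaultdict(set))
    for md5, vector_group in library.items():
        for feature_type, components in vector_group.items():
            for component in components:
                index[feature_type][component].add(md5)
    return index


library = {
    "E13A8763AE6F65DF4C72D130B6696056": {
        "URL": ["c62f6e80", "a097745c", "4b865ed5"],
        "domain": ["11b3c408"],
    },
    # Hypothetical second sample, added only to show a shared component.
    "HYPOTHETICAL_MD5_2": {"URL": ["c62f6e80"]},
}

index = build_inverted_index(library)
print(sorted(index["URL"]["c62f6e80"]))
```

Looking up a component hash then returns every sample that exhibits it, which is what lets candidate samples be found without scanning the whole library.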
S103: obtaining the query feature vector group of the query sample file;
S104: calculating the sample similarity between the query sample file and each malicious sample file, obtaining the behavior feature vector groups of the malicious sample files whose sample similarity is greater than or equal to a target similarity, and finding the corresponding malicious sample files according to those behavior feature vector groups, these malicious sample files being the ones similar to the query sample file;
The sample similarity is calculated as follows:
Comparing each behavior feature vector of the query feature vector group with the corresponding behavior feature vector of each malicious sample file and, for each behavior feature type, counting the identical behavior components contained in both;
Dividing the number of identical behavior components by the total number of behavior components in that behavior feature vector, then multiplying by the preset weight of that behavior feature vector to obtain its intermediate weight value; calculating the intermediate weight values of the remaining behavior feature vectors in the same way; and summing all intermediate weight values to obtain the sample similarity;
The weights of all behavior feature vectors sum to 1.
The target similarity is a manually set value. Here, assume that the target similarity is 90% and that the preset weights of the URL, domain name, and IP behavior feature vectors are 20%, 20%, and 60% respectively.
The calculation formula is as follows:
$$f = \sum_{n \in I} \mathrm{weight}(n) \cdot \frac{|in(n) \cap ord(n)|}{|in(n)|}$$
where I is the query feature vector group, weight(n) is the weight of the n-th class of behavior feature vector within the query feature vector group, in(n) and ord(n) denote the behavior feature vector of the query feature vector group and the behavior feature vector of each malicious sample file respectively, and f is the sample similarity between the query sample file and each malicious sample file. The calculation proceeds as in the following example:
Suppose that malicious sample files contain only three classes of behavior features: URL, domain name, and IP;
Then the sum of all intermediate weight values is Weight = Weight(URL) × |in(URL) ∩ ord(URL)| / |in(URL)| + Weight(domain name) × |in(domain name) ∩ ord(domain name)| / |in(domain name)| + Weight(IP) × |in(IP) ∩ ord(IP)| / |in(IP)| = 0.2 × 2/2 + 0.2 × 1/1 + 0.6 × 8/9 ≈ 93%. The behavior feature vector groups of the malicious sample files whose sample similarity is greater than or equal to the target similarity can then be obtained, and the corresponding malicious sample files found according to those behavior feature vector groups; these malicious sample files are the ones similar to the query sample file.
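The weighted similarity calculation in this example can be sketched in Python as follows. This is a minimal sketch, assuming each behavior feature vector is held as a list of component hashes; `sample_similarity` and the placeholder component labels (`u1`, `d1`, `ip1`, ...) are hypothetical and stand in for real hash values.

```python
def sample_similarity(query, candidate, weights):
    """Sum, over feature types, weight * |query ∩ candidate| / |query|."""
    total = 0.0
    for feature_type, weight in weights.items():
        q = set(query.get(feature_type, ()))
        if not q:
            continue  # no query components of this type: contributes nothing
        c = set(candidate.get(feature_type, ()))
        total += weight * len(q & c) / len(q)
    return total


# Preset weights from the example: URL 20%, domain name 20%, IP 60%.
weights = {"URL": 0.2, "domain": 0.2, "IP": 0.6}

# Query sample with 2 URL, 1 domain name, and 9 IP components (placeholder labels).
query = {
    "URL": ["u1", "u2"],
    "domain": ["d1"],
    "IP": [f"ip{i}" for i in range(1, 10)],
}
# Candidate sharing 2 URLs, 1 domain name, and 8 of the 9 IPs.
candidate = {
    "URL": ["u1", "u2", "u3"],
    "domain": ["d1"],
    "IP": [f"ip{i}" for i in range(1, 9)],
}

similarity = sample_similarity(query, candidate, weights)
print(round(similarity, 4))  # 0.9333
print(similarity >= 0.9)     # True
```

The denominator is the size of the query vector, matching |in(n)| in the formula, so extra components in the candidate (such as `u3`) do not lower the score.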
Preferably, before the sample similarity is calculated, the method also comprises: filtering the massive malicious sample file library based on a preset filter condition.
Preferably, the preset filter condition is:
Choosing any one behavior feature vector of the query feature vector group as the single variable and assuming that all remaining behavior feature vectors match completely, then obtaining the minimum number of matching components for the corresponding behavior feature vector from the relation among the number of behavior components, the target similarity, and the weight of the behavior feature vector;
Calculating the sub-weight of each behavior component and sorting the sub-weights from largest to smallest to form a descending sub-weight list;
Accumulating the sub-weights in the list one by one until the cumulative sum is greater than the preset target weight; the number of behavior components accumulated from the list is the minimum total match count;
Further filtering the massive malicious sample file library based on the obtained minimum number of matching components for each behavior feature vector and the minimum total match count.
The calculation formula is: x ≥ n − (1 − Wd) × n / Wc, with x rounded up to the nearest integer. Here Wd is the target similarity, Wc is the preset weight of the behavior feature vector of the behavior feature type in question, x is the minimum number of components of that behavior feature type that must match, and n is the total number of behavior components under that behavior feature type. The bound follows from requiring (1 − Wc) + Wc × x / n ≥ Wd when all other behavior feature vectors match completely.
The calculation then proceeds as follows:
The minimum number of matching URLs is 2 − (1 − 90%) × 2 / 20% = 1;
The minimum number of matching domain names is 1 − (1 − 90%) × 1 / 20% = 0.5, rounded up to 1;
The minimum number of matching IPs is 9 − (1 − 90%) × 9 / 60% = 7.5, rounded up to 8.
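The per-type minimum match counts above can be reproduced with a short Python sketch. It assumes the bound n − (1 − Wd) × n / Wc is rounded up to the next integer, which is the reading that yields the values 1, 1, and 8 in the example. `min_matches` is a hypothetical helper, and `Fraction` is used so that binary floating-point error does not push an exact bound such as 1 up to 2.

```python
from fractions import Fraction
from math import ceil


def min_matches(n, type_weight, target):
    """Smallest integer x with (1 - Wc) + Wc * x / n >= Wd, i.e. ceil(n - (1 - Wd) * n / Wc)."""
    bound = n - (1 - target) * n / type_weight
    return max(0, ceil(bound))


target = Fraction(9, 10)  # Wd = 90%
weights = {"URL": Fraction(1, 5), "domain": Fraction(1, 5), "IP": Fraction(3, 5)}
counts = {"URL": 2, "domain": 1, "IP": 9}  # n per feature type in the query

minimums = {t: min_matches(counts[t], weights[t], target) for t in counts}
print(minimums)  # {'URL': 1, 'domain': 1, 'IP': 8}
```

A candidate that matches fewer components of any one type than this minimum cannot reach the target similarity even if every other type matches completely, so it can be discarded before the full similarity is computed.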
The sub-weights of the behavior components are calculated as follows:
The sub-weight of each URL behavior component is 20% / 2 = 10%;
The sub-weight of each domain name behavior component is 20% / 1 = 20%;
The sub-weight of each IP behavior component is 60% / 9 ≈ 6.67%.
Sorting these from largest to smallest gives the descending sub-weight list [domain name behavior component 1, URL behavior component 1, URL behavior component 2, IP behavior component 1, ..., IP behavior component 9], where the order of behavior components within the same behavior feature type does not matter.
The sub-weights in the list are accumulated one by one until the cumulative sum is greater than the preset target weight; the number of behavior components accumulated from the list is the minimum total match count;
The minimum total match count is therefore 1 + 2 + 8 = 11 (20% + 2 × 10% + 8 × 6.67% ≈ 93.3%, the first cumulative sum to exceed the 90% target).
The preset filter condition is thus: the minimum numbers of matching URL, domain name, and IP components are 1, 1, and 8 respectively, and the total match count must not be less than 11.
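The minimum total match count can be sketched as a greedy accumulation over the descending sub-weight list. This is an illustrative sketch: `min_total_matches` is a hypothetical name, the strict "greater than" comparison follows the wording of the text, and `Fraction` is used to keep the sums exact.

```python
from fractions import Fraction


def min_total_matches(counts, weights, target):
    """Accumulate sub-weights from largest to smallest until the sum exceeds target."""
    sub_weights = []
    for feature_type, n in counts.items():
        # Each component of a type carries an equal share of that type's weight.
        sub_weights.extend([weights[feature_type] / n] * n)
    sub_weights.sort(reverse=True)

    running = Fraction(0)
    for used, sub_weight in enumerate(sub_weights, start=1):
        running += sub_weight
        if running > target:
            return used
    return len(sub_weights)  # target is never exceeded: every component must match


target = Fraction(9, 10)
weights = {"URL": Fraction(1, 5), "domain": Fraction(1, 5), "IP": Fraction(3, 5)}
counts = {"URL": 2, "domain": 1, "IP": 9}

print(min_total_matches(counts, weights, target))  # 11
```

Taking the heaviest components first gives the fewest matches that could possibly reach the target, so any candidate with fewer total matches than this can be excluded safely.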
The present invention next provides an embodiment of the feature-vector-based similar malicious sample matching system which, as shown in Fig. 2, comprises:
A malicious sample file library processing module 201, configured to extract the behavior features of each malicious sample file in a massive malicious sample file library;
and to filter the behavior features, calculate the hash value of each behavior feature remaining after filtering, and generate a behavior feature vector group for each malicious sample file, wherein the behavior feature vector group comprises m behavior feature vectors, and the m behavior feature vectors correspond to the m classes of behavior features of each malicious sample file;
The structure of a behavior feature vector is: behavior feature type: [behavior component 1, behavior component 2, ..., behavior component n];
A query sample file processing module 202, configured to obtain the query feature vector group of the query sample file;
A matching calculation module 203, configured to calculate the sample similarity between the query sample file and each malicious sample file, obtain the behavior feature vector groups of the malicious sample files whose sample similarity is greater than or equal to a target similarity, and find the corresponding malicious sample files according to those behavior feature vector groups, these malicious sample files being the ones similar to the query sample file;
The sample similarity is calculated as follows:
Comparing each behavior feature vector of the query feature vector group with the corresponding behavior feature vector of each malicious sample file and, for each behavior feature type, counting the identical behavior components contained in both;
Dividing the number of identical behavior components by the total number of behavior components in that behavior feature vector, then multiplying by the preset weight of that behavior feature vector to obtain its intermediate weight value; calculating the intermediate weight values of the remaining behavior feature vectors in the same way; and summing all intermediate weight values to obtain the sample similarity;
The weights of all behavior feature vectors sum to 1.
Preferably, before the sample similarity is calculated, the system also performs: filtering the massive malicious sample file library based on a preset filter condition.
Preferably, the preset filter condition is:
Choosing any one behavior feature vector of the query feature vector group as the single variable and assuming that all remaining behavior feature vectors match completely, then obtaining the minimum number of matching components for the corresponding behavior feature vector from the relation among the number of behavior components, the target similarity, and the weight of the behavior feature vector;
Calculating the sub-weight of each behavior component and sorting the sub-weights from largest to smallest to form a descending sub-weight list;
Accumulating the sub-weights in the list one by one until the cumulative sum is greater than the preset target weight; the number of behavior components accumulated from the list is the minimum total match count;
Further filtering the massive malicious sample file library based on the obtained minimum number of matching components for each behavior feature vector and the minimum total match count.
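One way the preset filter condition might be applied is to count, per candidate sample, how many query components it shares in each feature type, and keep only candidates that meet both the per-type minimums and the minimum total. The sketch below is an assumption about how these pieces fit together rather than the patent's stated implementation: `prefilter`, the nested-dictionary index layout, and the sample identifiers `A`, `B`, `C` are all hypothetical.

```python
from collections import Counter


def prefilter(query, index, min_per_type, min_total):
    """Return sample IDs that meet the per-type and total minimum match counts.

    `index` is assumed to map feature type -> component hash -> set of sample IDs.
    """
    hits = {}
    for feature_type, components in query.items():
        counter = Counter()
        for component in set(components):
            for sample_id in index.get(feature_type, {}).get(component, ()):
                counter[sample_id] += 1
        hits[feature_type] = counter

    candidates = set()
    for counter in hits.values():
        candidates.update(counter)

    kept = set()
    for sample_id in candidates:
        per_type = {t: hits[t][sample_id] for t in query}  # missing keys count as 0
        if sum(per_type.values()) < min_total:
            continue
        if all(per_type[t] >= min_per_type.get(t, 0) for t in per_type):
            kept.add(sample_id)
    return kept


# Tiny hypothetical index: sample A shares everything, B and C only one component.
index = {
    "URL": {"u1": {"A", "B"}, "u2": {"A"}},
    "domain": {"d1": {"A", "C"}},
}
query = {"URL": ["u1", "u2"], "domain": ["d1"]}

print(prefilter(query, index, {"URL": 1, "domain": 1}, min_total=3))  # {'A'}
```

Only the samples that survive this cheap counting step would then need the full weighted similarity calculation.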
In summary, the technical scheme of the present invention first extracts the behavior features of each malicious sample file in a massive malicious sample file library; filters the behavior features, calculates the hash value of each behavior feature remaining after filtering, and generates a behavior feature vector group for each malicious sample file; obtains the query feature vector group of the query sample file; and calculates the sample similarity between the query sample file and each malicious sample file, obtains the behavior feature vector groups of the malicious sample files whose sample similarity is greater than or equal to the target similarity, and finds the corresponding malicious sample files according to those behavior feature vector groups, these malicious sample files being the ones similar to the query sample file. The technical scheme can quickly find the commonalities among malicious sample files in a massive collection of malicious sample files, query the required similar samples, and generate a report for relevant personnel to analyze.
The beneficial effect of the present invention is as follows: based on the behavior feature vectors of malicious sample files, the invention uses an effective similar-sample matching algorithm that can quickly find the commonalities among malicious sample files in a massive collection of malicious sample files, query the required similar samples, and generate a report for relevant personnel to analyze. It can effectively solve technical problems of traditional methods such as slow retrieval and long processing time.
The above embodiments are intended to illustrate, not to limit, the technical scheme of the present invention. Any modification or partial replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.

Claims (6)

1. A feature-vector-based similar malicious sample file matching method, characterized by comprising:
Extracting the behavior features of each malicious sample file in a massive malicious sample file library;
Filtering the behavior features, calculating the hash value of each behavior feature remaining after filtering, and generating a behavior feature vector group for each malicious sample file, wherein the behavior feature vector group comprises m behavior feature vectors, and the m behavior feature vectors correspond to the m classes of behavior features of each malicious sample file;
The structure of a behavior feature vector is: behavior feature type: [behavior component 1, behavior component 2, ..., behavior component n];
Obtaining the query feature vector group of the query sample file;
Calculating the sample similarity between the query sample file and each malicious sample file, obtaining the behavior feature vector groups of the malicious sample files whose sample similarity is greater than or equal to a target similarity, and finding the corresponding malicious sample files according to those behavior feature vector groups, these malicious sample files being the ones similar to the query sample file;
The sample similarity is calculated as follows:
Comparing each behavior feature vector of the query feature vector group with the corresponding behavior feature vector of each malicious sample file and, for each behavior feature type, counting the identical behavior components contained in both;
Dividing the number of identical behavior components by the total number of behavior components in that behavior feature vector, then multiplying by the preset weight of that behavior feature vector to obtain its intermediate weight value; calculating the intermediate weight values of the remaining behavior feature vectors in the same way; and summing all intermediate weight values to obtain the sample similarity;
The weights of all behavior feature vectors sum to 1.
2. the method for claim 1, is characterized in that, before asking for Sample Similarity, also comprises: based on default filtercondition, filters magnanimity malice sample file storehouse.
3. method as claimed in claim 2, it is characterized in that, described default filtercondition is:
Choose arbitrary behavioural characteristic vector of proper vector group to be checked as unitary variant, suppose that all the other behavior proper vectors are mated completely, according to the operation relation between the weight of behavior component number, target similarity and behavioural characteristic vector, obtain the minimum coupling number of components of corresponding each behavior component;
The sub-weight of calculating behavior component, arranges from big to small, forms the sub-weighted list of flashback;
Accumulated list neutron weight one by one, until cumulative sum is greater than goal-selling weight, the number of behavior component accumulated in list is minimum coupling sum;
Based on minimum coupling number of components and the minimum coupling sum filtration magnanimity malice sample file storehouse further of each behavior component obtained.
4. A similar malicious sample file matching system based on feature vectors, characterized by comprising:
A malicious sample file library processing module, for extracting the behavior features of each malicious sample file in the massive malicious sample file library;
filtering the behavior features, calculating the hash value of each filtered behavior feature, and generating a behavior feature vector group for each malicious sample file, wherein the behavior feature vector group comprises m behavior feature vectors, and the m behavior feature vectors correspond to the m classes of behavior features of each malicious sample file;
The structure of a behavior feature vector is: behavior feature type: [behavior component 1, behavior component 2, ..., behavior component n];
A to-be-queried sample file processing module, for obtaining the feature vector group to be queried of the sample file to be queried;
A matching calculation module, for computing the sample similarity between the sample file to be queried and each malicious sample file, obtaining the behavior feature vector groups of the malicious sample files whose sample similarity is greater than or equal to the target similarity, and finding the corresponding malicious sample files according to those behavior feature vector groups, these malicious sample files being the malicious sample files similar to the sample file to be queried;
The specific method for computing the sample similarity is:
Compare the behavior feature vectors of the feature vector group to be queried with the behavior feature vectors of each malicious sample file and, for any given behavior feature type, obtain the number of identical behavior components contained in both;
Compute the ratio of the number of identical behavior components to the total number of behavior components of that behavior feature vector, then multiply by the preset weight of that behavior feature vector to obtain the intermediate weight value of that behavior feature vector; compute the intermediate weight values of the remaining behavior feature vectors in the same way, and sum all intermediate weight values to obtain the sample similarity;
The weights of all behavior feature vectors sum to 1.
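As a rough illustration of the vector structure and the similarity calculation recited in this claim, the following Python sketch builds behavior feature vector groups and scores one library sample against a query. Several details are assumptions rather than part of the claim: MD5 as the hash function, the three feature types and their weights, the function names, and the use of the query vector's component count as the denominator of the ratio. The behavior-filtering step is omitted for brevity.

```python
import hashlib
import math

def build_vector_group(behaviors_by_type):
    """Map each behavior feature type to a set of hashed behavior components."""
    return {
        ftype: {hashlib.md5(b.encode("utf-8")).hexdigest() for b in behaviors}
        for ftype, behaviors in behaviors_by_type.items()
    }

def sample_similarity(query, candidate, weights):
    """Weighted sum over feature types of (shared components / query components)."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("behavior feature vector weights must sum to 1")
    similarity = 0.0
    for ftype, weight in weights.items():
        q = query.get(ftype, set())
        if not q:
            continue  # assumption: an empty query vector contributes nothing
        shared = len(q & candidate.get(ftype, set()))
        similarity += weight * shared / len(q)  # intermediate weight value
    return similarity

# Hypothetical feature types, weights, and behaviors.
WEIGHTS = {"file": 0.5, "registry": 0.3, "network": 0.2}
query = build_vector_group({
    "file": ["create a.exe", "create b.dll", "delete c.tmp", "write d.log"],
    "registry": ["set Run\\x", "set Run\\y"],
    "network": ["connect 10.0.0.1:80"],
})
candidate = build_vector_group({
    "file": ["create a.exe", "create b.dll", "create z.sys"],
    "registry": ["set Run\\x", "set Run\\y"],
    "network": [],
})
score = sample_similarity(query, candidate, WEIGHTS)
print(round(score, 2))  # 0.55
```

Here the candidate shares 2 of 4 file components and 2 of 2 registry components with the query, giving 0.5 * 2/4 + 0.3 * 2/2 + 0.2 * 0 = 0.55; it would be reported as similar only if the target similarity were 0.55 or lower.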
5. The system of claim 4, characterized in that, before the sample similarity is computed, the system further performs: filtering the massive malicious sample files based on a preset filter condition.
6. The system of claim 5, characterized in that the preset filter condition is:
Select any one behavior feature vector of the feature vector group to be queried as the single variable and assume that all other behavior feature vectors match completely; from the relationship among the number of behavior components, the target similarity, and the weight of the behavior feature vector, obtain the corresponding minimum number of matching components for each behavior feature vector;
Calculate the sub-weight of each behavior component and sort the sub-weights from largest to smallest to form a descending sub-weight list;
Accumulate the sub-weights in the list one by one until the cumulative sum is greater than the preset target weight; the number of behavior components accumulated from the list is the minimum total number of matches;
Further filter the massive malicious sample file library based on the obtained minimum number of matching components for each behavior feature vector and the minimum total number of matches.
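The two thresholds in the filter condition can be written out explicitly. The symbols below are introduced only for illustration and do not appear in the claims: $w_i$ is the preset weight of the $i$-th behavior feature vector, $n_i$ its number of behavior components, $k_i$ the number of components that match, and $T$ the target similarity (assumed here to also serve as the preset target weight). The sub-weight of a component is assumed to be $w_i / n_i$.

```latex
% Sample similarity as a weighted sum of match ratios, with weights summing to 1
S = \sum_{j=1}^{m} w_j \frac{k_j}{n_j}, \qquad \sum_{j=1}^{m} w_j = 1

% Single-variable bound: assume k_j = n_j for every j \neq i
S = (1 - w_i) + w_i \frac{k_i}{n_i} \ge T
\;\Longrightarrow\;
k_i \ge \frac{n_i \,\bigl(T - (1 - w_i)\bigr)}{w_i},
\qquad
k_i^{\min} = \max\!\left(0,\ \left\lceil \frac{n_i \,\bigl(T - (1 - w_i)\bigr)}{w_i} \right\rceil\right)

% Minimum total number of matches: sort the sub-weights s_{(1)} \ge s_{(2)} \ge \dots
M = \min\Bigl\{\, r : \sum_{t=1}^{r} s_{(t)} > T \Bigr\}
```

Because the $r$ largest sub-weights bound the contribution of any $r$ matched components from above, a library sample with fewer than $M$ matching components in total cannot have a similarity that exceeds $T$, which is why it can be discarded without computing $S$.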
CN201410827237.7A 2014-12-29 2014-12-29 Similar malicious sample matching method and system based on feature vector Active CN105488406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410827237.7A CN105488406B (en) 2014-12-29 2014-12-29 A kind of similar malice sample matches method and system based on feature vector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410827237.7A CN105488406B (en) 2014-12-29 2014-12-29 A kind of similar malice sample matches method and system based on feature vector

Publications (2)

Publication Number Publication Date
CN105488406A true CN105488406A (en) 2016-04-13
CN105488406B CN105488406B (en) 2019-02-26

Family

ID=55675380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410827237.7A Active CN105488406B (en) 2014-12-29 2014-12-29 Similar malicious sample matching method and system based on feature vector

Country Status (1)

Country Link
CN (1) CN105488406B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326746A (en) * 2016-08-26 2017-01-11 成都科来软件有限公司 Malicious program behavior feature library construction method and device
CN106788962A (en) * 2016-12-13 2017-05-31 电子科技大学 Vector similarity judgment method under privacy protection
CN109284610A (en) * 2018-09-11 2019-01-29 腾讯科技(深圳)有限公司 Virus program detection method and device and detection server
CN110210213A (en) * 2019-04-26 2019-09-06 北京奇安信科技有限公司 Method and device for filtering malicious sample, storage medium and electronic device
CN111027994A (en) * 2018-10-09 2020-04-17 百度在线网络技术(北京)有限公司 Similar object determination method, device, equipment and medium
CN111444961A (en) * 2020-03-26 2020-07-24 国家计算机网络与信息安全管理中心黑龙江分中心 Method for judging internet website affiliation through clustering algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102082792A (en) * 2010-12-31 2011-06-01 成都市华为赛门铁克科技有限公司 Phishing webpage detection method and device
CN102663284A (en) * 2012-03-21 2012-09-12 南京邮电大学 Malicious code identification method based on cloud computing
CN103226583A (en) * 2013-04-08 2013-07-31 北京奇虎科技有限公司 Method and device for recognizing advertisement plugin
CN103324888A (en) * 2012-03-19 2013-09-25 哈尔滨安天科技股份有限公司 Method and system for automatically extracting virus characteristics based on family samples
CN103679019A (en) * 2012-09-10 2014-03-26 腾讯科技(深圳)有限公司 Malicious file identifying method and device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326746A (en) * 2016-08-26 2017-01-11 成都科来软件有限公司 Malicious program behavior feature library construction method and device
CN106326746B (en) * 2016-08-26 2019-02-19 成都科来软件有限公司 A kind of rogue program behavioural characteristic base construction method and device
CN106788962A (en) * 2016-12-13 2017-05-31 电子科技大学 Vector similarity judgment method under privacy protection
CN106788962B (en) * 2016-12-13 2020-04-14 电子科技大学 Vector similarity judgment method under privacy protection
CN109284610A (en) * 2018-09-11 2019-01-29 腾讯科技(深圳)有限公司 Virus program detection method and device and detection server
CN109284610B (en) * 2018-09-11 2023-02-28 腾讯科技(深圳)有限公司 Virus program detection method and device and detection server
CN111027994A (en) * 2018-10-09 2020-04-17 百度在线网络技术(北京)有限公司 Similar object determination method, device, equipment and medium
CN110210213A (en) * 2019-04-26 2019-09-06 北京奇安信科技有限公司 Method and device for filtering malicious sample, storage medium and electronic device
CN110210213B (en) * 2019-04-26 2021-04-27 奇安信科技集团股份有限公司 Method and device for filtering malicious sample, storage medium and electronic device
CN111444961A (en) * 2020-03-26 2020-07-24 国家计算机网络与信息安全管理中心黑龙江分中心 Method for judging internet website affiliation through clustering algorithm
CN111444961B (en) * 2020-03-26 2023-08-18 国家计算机网络与信息安全管理中心黑龙江分中心 Method for judging attribution of Internet website through clustering algorithm

Also Published As

Publication number Publication date
CN105488406B (en) 2019-02-26

Similar Documents

Publication Publication Date Title
CN105488406A (en) Similar malicious sample file matching method and system based on feature vector
Davis et al. Exploring power and parameter estimation of the BiSSE method for analyzing species diversification
Yeo et al. Flow-based malware detection using convolutional neural network
Chen et al. Automatic mobile application traffic identification by convolutional neural networks
CN105205397A (en) Rogue program sample classification method and device
CN107368592B (en) Text feature model modeling method and device for network security report
CN106254321A (en) Network-wide abnormal data stream classification method
CN110414238A (en) Search method and device for homologous binary code
CN110414236A (en) Detection method and device for malicious process
CN104036187A (en) Method and system for determining computer virus types
CN102446254A (en) Similar loophole inquiry method based on text mining
CN105354228B (en) Similar diagram searching method and device
Kim et al. Behavior-based anomaly detection on big data
CN106572486B (en) Handheld terminal flow identification method and system based on machine learning
Krenn et al. Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network
IT201600091521A1 (en) METHOD FOR THE EXPLORATION OF PASSIVE TRAFFIC TRACKS AND GROUPING OF SIMILAR URLS.
CN115604032B (en) Method and system for detecting complex multi-step attack of power system
WO2015074493A1 (en) Method and apparatus for filtering out low-frequency click, computer program, and computer readable medium
Bui et al. A clustering-based shrink autoencoder for detecting anomalies in intrusion detection systems
CN114511330B (en) Ethereum Ponzi scheme fraudster detection method and system based on improved CNN-RF
CN106101086A (en) Cloud detection method and system for program files, client, and cloud server
CN104331507A (en) Method and device for automatically finding and classifying machine data categories
US20210192296A1 (en) Data de-identification method and apparatus
Sun et al. Automatically identifying apps in mobile traffic
CN103986606A (en) Method for parallel recognition and statistics of webpage URLs based on MapReduce algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 150028 Building 7, Innovation Plaza, Science and Technology Innovation City, Harbin High-tech Industrial Development Zone, Heilongjiang Province (838 Shikun Road)

Patentee after: Harbin Antiy Technology Group Co., Ltd.

Address before: 150090 Room 506, No. 162 Hongqi Street, Nangang District, Harbin Development Zone, Heilongjiang, China

Patentee before: Harbin Antiy Technology Co., Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Similar malicious sample file matching method and system based on feature vector

Effective date of registration: 20190828

Granted publication date: 20190226

Pledgee: Longjiang Bank Co., Ltd., Harbin Limin Branch

Pledgor: Harbin Antiy Technology Group Co., Ltd.

Registration number: Y2019230000002

CP01 Change in the name or title of a patent holder

Address after: 150028 Building 7, Innovation Plaza, Science and Technology Innovation City, Harbin High-tech Industrial Development Zone, Heilongjiang Province (838 Shikun Road)

Patentee after: Antiy Technology Group Co., Ltd.

Address before: 150028 Building 7, Innovation Plaza, Science and Technology Innovation City, Harbin High-tech Industrial Development Zone, Heilongjiang Province (838 Shikun Road)

Patentee before: Harbin Antiy Technology Group Co., Ltd.

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20211119

Granted publication date: 20190226

Pledgee: Longjiang Bank Co., Ltd., Harbin Limin Branch

Pledgor: Harbin Antiy Technology Group Co., Ltd.

Registration number: Y2019230000002
