CN103679019A - Malicious file identifying method and device - Google Patents


Info

Publication number
CN103679019A
Authority
CN
China
Prior art keywords
black
weights
white
sample
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210332168.3A
Other languages
Chinese (zh)
Other versions
CN103679019B (en)
Inventor
王健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201210332168.3A
Publication of CN103679019A
Application granted
Publication of CN103679019B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a malicious file identification method and device. The method comprises: extracting a weak feature set from a sample file to be identified; looking up pre-established black and white weight tables according to the weak feature set to obtain a black weight and a white weight; calculating a black weight coefficient from the black weight and the white weight using a predetermined algorithm; and identifying the black or white attribute of the sample file according to the black weight coefficient. Because the black weight coefficient, that is, the degree of black suspicion, is computed from weights looked up for the weak feature set, the judgment combines multiple feature combinations of the sample file. The weight of each feature combination can be obtained from manual experience and data statistics, which improves the accuracy of judging malicious samples.

Description

Malicious file identification method and device
Technical Field
The present invention relates to the field of Internet technology, and in particular to a malicious file identification method and device.
Background
The features of a sample file comprise strong features and weak features. A strong feature is generally unique, and a strong feature alone can identify whether a sample file is a black (malicious) sample file or a white (benign) sample file. In contrast, a single weak feature, or even a group of weak features, can hardly determine the black or white attribute of a sample file. Examples of weak features include the file's version information, compilation information, sensitive strings (such as URLs or process names that appear in the sample), the file icon and the file path.
At present, the contest between detection and evasion concentrates on strong features, so existing malicious file identification usually adopts an identification method based on strong-feature statistics. This method takes multiple features at fixed positions and uses statistics to build a black feature table and a white feature table; the attribute of an unknown file is then obtained directly by querying these tables.
However, the existing identification method has the following shortcomings:
1. The features used by the existing statistical method are unique, and each is treated as either black or white. Although the sample detection rate is high, the false-positive rate is also high.
2. The positions from which features are extracted are relatively fixed. Evasion techniques generally deform or modify the features at certain fixed positions; once a modified position coincides with a position from which the identifier takes its features, the detection is easily bypassed.
Summary of the Invention
The main purpose of the present invention is to provide a malicious file identification method and device that improve the accuracy of malicious file identification.
To achieve this purpose, the present invention proposes a malicious file identification method, comprising:
extracting a weak feature set of a sample file to be identified;
looking up pre-established black and white weight tables according to the weak feature set to obtain a black weight and a white weight;
calculating a black weight coefficient according to the black weight, the white weight and a predetermined algorithm;
identifying the black or white attribute of the sample file according to the black weight coefficient.
The present invention also proposes a malicious file identification device, comprising:
an extraction module, for extracting a weak feature set of a sample file to be identified;
a lookup module, for looking up pre-established black and white weight tables according to the weak feature set to obtain a black weight and a white weight;
a computing module, for calculating a black weight coefficient according to the black weight, the white weight and a predetermined algorithm;
an identification module, for identifying the black or white attribute of the sample file according to the black weight coefficient.
In the malicious file identification method and device proposed by the present invention, the weak feature set of the sample file to be identified is extracted, the pre-established black and white weight tables are looked up to obtain a black weight and a white weight, and a black weight coefficient, that is, the degree of black suspicion, is calculated using a predetermined algorithm. The black or white attribute of the sample file is thus identified by a comprehensive judgment over multiple feature combinations of the sample file. Because the weight of each feature combination can be obtained from manual experience and data statistics, the accuracy of judging malicious samples is improved.
Brief Description of the Drawings
Fig. 1 is a schematic flow chart of a first embodiment of the malicious file identification method of the present invention;
Fig. 2 is a schematic flow chart of a second embodiment of the malicious file identification method of the present invention;
Fig. 3 is a schematic structural diagram of a first embodiment of the malicious file identification device of the present invention;
Fig. 4 is a schematic structural diagram of a second embodiment of the malicious file identification device of the present invention.
To make the technical solution of the present invention clearer, it is described in further detail below with reference to the accompanying drawings.
Detailed Description
The solution of the embodiments of the present invention is mainly as follows: extract the weak feature set of a sample file to be identified; look up the pre-established black and white weight tables to obtain a black weight and a white weight; calculate a black weight coefficient, that is, the degree of black suspicion, using a predetermined algorithm; and identify the black or white attribute of the sample file from that coefficient. Because the weight of each feature or feature combination in the pre-established black and white weight tables can be obtained from manual experience and data statistics, the accuracy of judging malicious samples can be improved.
The technical terms used in the present invention include:
One-dimensional feature: a special case of a multi-dimensional feature. It is a single independent feature extracted from a file and not combined with any other feature. For example, feature A alone forms a one-dimensional feature. Concretely: features T1, T2, T3, ..., Tn.
Multi-dimensional feature: a combination of two or more features extracted from a file. For example, feature A and feature B together form a two-dimensional feature. Specifically:
Two-dimensional features, for example: T1T2, T1T3, ..., T1Tn, T2Tn, T2T3, ..., TmTn
Three-dimensional features, for example: T1T2T3, T1T3T4, ..., T2T3Tn, T2TmTn, ..., TiTmTn
......
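The one-, two- and three-dimensional feature groups described above can be enumerated mechanically. The following is a minimal sketch, not taken from the patent: the function name `feature_combinations`, the `max_dim` parameter and the choice to represent each group as a sorted tuple are all assumptions made for illustration.

```python
from itertools import combinations


def feature_combinations(features, max_dim=3):
    """Return every feature group of dimension 1..max_dim as a sorted tuple.

    Hypothetical helper: the patent only names the groups (T1, T1T2, T1T2T3, ...),
    it does not prescribe how they are generated or stored.
    """
    ordered = sorted(set(features))  # canonical order so T1T2 and T2T1 are one group
    groups = []
    for dim in range(1, max_dim + 1):
        groups.extend(combinations(ordered, dim))
    return groups


groups = feature_combinations(["T1", "T2", "T3"])
```

With three features this yields three one-dimensional groups, three two-dimensional groups and one three-dimensional group.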
As shown in Fig. 1, the first embodiment of the present invention proposes a malicious file identification method, comprising:
Step S101: extract the weak feature set of a sample file to be identified.
This embodiment takes into account that evasion efforts generally concentrate on strong features, whereas the weak features of variant samples within the same family often change little. The weak features of a sample file are therefore also an effective means of identifying new virus variants or unknown viruses.
In this embodiment, black and white weight tables are established in advance. These tables contain the correspondence between the weak feature sets extracted from a known black sample file set and a known white sample file set and the weights assigned to them. The weight of each weak feature set can be set automatically from statistics or set manually.
This embodiment makes a comprehensive judgment on whether a sample file is malicious based on its features or feature combinations. The weak feature set can be a set of one-dimensional features or a set of multi-dimensional feature combinations such as two-dimensional or three-dimensional ones. Accordingly, in the pre-established black and white weight tables, every kind of feature combination (one-dimensional features, two-dimensional features and so on) has a corresponding weight.
When a sample file needs to be identified, the weak feature set is first extracted from it.
When the weak feature set is extracted, the positions from which features are taken need not be fixed and there is a certain range of choice. The features can include the file's intrinsic features and its peripheral information. The intrinsic features can be the sample file's version information, compilation information, sensitive strings (such as URLs or process names that appear in the sample) and file icon. The peripheral information can be the path where the file is stored on the user's machine, the file name and so on.
Step S102: look up the pre-established black and white weight tables according to the weak feature set to obtain a black weight and a white weight.
As described above, the pre-established black and white weight tables contain the correspondence between weak feature sets and their assigned weights. After the weak feature set of the sample file to be identified has been extracted, the black and white weight tables are looked up for this weak feature set to obtain the corresponding black weight and white weight.
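The lookup in step S102 can be sketched as two dictionary queries. This is only an illustration under stated assumptions: the patent does not specify how the tables are stored or what happens when a feature group is missing, so the dictionary layout, the sorted-tuple key, the function name `lookup_weights` and the default weight of 0.0 are all hypothetical.

```python
def lookup_weights(feature_group, black_table, white_table, default=0.0):
    """Return (black_weight, white_weight) for one weak feature group.

    Assumptions: tables are dicts keyed by a sorted tuple of feature strings,
    and a group absent from a table gets the weight `default`.
    """
    key = tuple(sorted(feature_group))
    return black_table.get(key, default), white_table.get(key, default)


# Illustrative tables with made-up feature names and weights.
black_table = {("icon=folder", "path=temp"): 0.6}
white_table = {("icon=folder", "path=temp"): 0.1}

weights = lookup_weights(["path=temp", "icon=folder"], black_table, white_table)
```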
Step S103: calculate a black weight coefficient according to the black weight, the white weight and a predetermined algorithm.
Specifically, this embodiment applies Bayes' theorem to the black weight and white weight obtained in step S102 to calculate the black weight coefficient. The black weight coefficient is the degree of black suspicion of the sample file, and the black or white attribute of the sample file can be judged from it.
The formula for calculating the black weight coefficient with Bayes' theorem is:
$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$
Where: P(A|B) is the black weight coefficient, that is, the degree of black suspicion of the sample file; P(B|A) is the black weight; P(B) = black weight + white weight; and if the sample file to be identified is assumed to be equally likely to be black or white, P(A) = 50%.
Thus, for an unknown sample file, its black and white weights are obtained by querying the black and white weight tables, and Bayes' theorem then gives the Bayesian weight coefficient of the sample file being black, that is, the degree of black suspicion of the sample.
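A small numeric sketch of this calculation follows. One caveat is worth stating plainly: read literally, P(B) = black weight + white weight with P(A) = 50% caps the result at 0.5, which would never exceed a 70% threshold. The sketch therefore assumes the law-of-total-probability form P(B) = P(A)·black + (1 − P(A))·white, which reduces to black / (black + white) for equal priors. That reading, the function name `black_coefficient`, and the fallback to the prior when both weights are zero are assumptions, not statements from the patent.

```python
def black_coefficient(black_weight, white_weight, prior_black=0.5):
    """Posterior probability that a sample is black, via Bayes' theorem.

    Assumption: P(B) = P(A) * black_weight + (1 - P(A)) * white_weight.
    If both weights are zero there is no evidence, so the prior is returned.
    """
    evidence = prior_black * black_weight + (1.0 - prior_black) * white_weight
    if evidence == 0:
        return prior_black
    return prior_black * black_weight / evidence


coefficient = black_coefficient(0.8, 0.2)  # 0.4 / 0.5
```

With equal priors, a black weight of 0.8 and a white weight of 0.2 give a coefficient of about 0.8.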
Step S104: identify the black or white attribute of the sample file according to the black weight coefficient.
After the black weight coefficient of the sample file has been calculated, it is compared with preset thresholds to judge the black or white attribute of the sample file.
This embodiment uses two thresholds, a first preset threshold and a second preset threshold, where the second preset threshold is smaller than the first. For example, the first and second preset thresholds can be 70% and 50% respectively.
The second preset threshold is used to judge whether the sample file is a suspected black sample file, and the first preset threshold is used to judge whether it is a black file.
Specifically, the calculated black weight coefficient is compared with the first preset threshold. If the black weight coefficient is greater than the first preset threshold, the sample file is identified as a black sample file.
When the first preset threshold indicates that the sample file is not a black sample file, that is, when the black weight coefficient is less than or equal to the first preset threshold, the black weight coefficient is compared with the second preset threshold to judge how likely the sample file is to be black or white. If the black weight coefficient is greater than the second preset threshold and less than the first preset threshold, the sample file is identified as a suspected black sample file. If the black weight coefficient is less than the second preset threshold, the sample file can be judged to be a suspected white sample file or a white sample file.
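The two-threshold decision can be written as a short function. This is a sketch using the example thresholds of 70% and 50% from the text. The text does not say how a coefficient exactly equal to either threshold is classified, so treating "equal to the first threshold" as suspected black and "equal to the second threshold" as white-side is an assumption, as are the function name `classify` and the label strings.

```python
def classify(coefficient, first_threshold=0.70, second_threshold=0.50):
    """Map a black weight coefficient to a label using two thresholds.

    Boundary handling (values exactly on a threshold) is an assumption.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be smaller than the first")
    if coefficient > first_threshold:
        return "black"
    if coefficient > second_threshold:
        return "suspected black"
    return "white or suspected white"


labels = [classify(c) for c in (0.8, 0.6, 0.3)]
```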
With the above scheme, this embodiment makes a comprehensive judgment based on the features or feature combinations of the sample file and produces a black weight coefficient. Because the weight of each feature group can be obtained from manual experience and data statistics, the accuracy of sample judgment can be improved.
It should be noted that, when the weak feature set extracted from the sample file is used to query the black and white weight tables and calculate the black weight coefficient, the one-dimensional features of the sample file can be used first: the weak feature set is built from them, the black and white weights are looked up, and the black weight coefficient is calculated. If the calculated coefficient does not reach the required threshold (the file is not judged to be a black sample file), the two-dimensional features of the sample file can then be used to build the weak feature set, query the black and white weight tables, calculate the black weight coefficient and judge the black or white attribute of the sample file.
Further, following the same principle of increasing the dimension, the scheme can be extended to higher-dimensional feature combinations such as three-dimensional features, thereby improving the accuracy of the sample file judgment.
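The escalation from one-dimensional to higher-dimensional features can be sketched as a loop that stops as soon as some feature group crosses the black threshold. Several details here are assumptions rather than the patent's stated procedure: the function name `escalate`, scoring each group separately, skipping groups absent from both tables, and using black / (black + white) as the coefficient (the equal-prior form of Bayes' theorem).

```python
from itertools import combinations


def escalate(features, black_table, white_table, threshold=0.70, max_dim=3):
    """Try 1-D groups first, then 2-D, then 3-D; return the first black hit.

    Returns (dimension, group, coefficient), or None if no group of any
    dimension up to max_dim exceeds the threshold.
    """
    ordered = sorted(set(features))
    for dim in range(1, max_dim + 1):
        for group in combinations(ordered, dim):
            black = black_table.get(group, 0.0)
            white = white_table.get(group, 0.0)
            if black + white == 0:
                continue  # group unseen in training: no evidence either way
            coefficient = black / (black + white)
            if coefficient > threshold:
                return dim, group, coefficient
    return None


# Made-up weights: A and B are ambiguous alone, but black in combination.
black_table = {("A",): 0.5, ("B",): 0.4, ("A", "B"): 0.9}
white_table = {("A",): 0.5, ("B",): 0.6, ("A", "B"): 0.1}
hit = escalate(["A", "B"], black_table, white_table)
```

In this example neither one-dimensional feature is conclusive, and the decision is reached only at dimension two.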
Of course, as the feature dimension increases, the number of feature combinations grows as a power of the dimension, which greatly increases the computational complexity. In practical applications the dimension can therefore be limited to three.
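The growth in the number of feature groups can be checked with binomial coefficients. This is a small illustrative calculation; the figure of 20 features is made up, and `group_count` is a hypothetical helper name.

```python
from math import comb


def group_count(n_features, max_dim):
    """Number of feature groups of dimension 1..max_dim over n_features features."""
    return sum(comb(n_features, dim) for dim in range(1, max_dim + 1))


up_to_three = group_count(20, 3)  # 20 + 190 + 1140
up_to_five = group_count(20, 5)   # adds 4845 + 15504
```

For 20 features, stopping at dimension three gives 1350 groups, while going to dimension five gives 21699, which illustrates why capping the dimension keeps the tables and lookups manageable.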
In addition, because the present invention does not fix the positions from which features are extracted, the efficiency of feature extraction also decreases as the dimension increases.
With the above scheme, this embodiment starts from the weak features of a sample file, which are easily overlooked, and strengthens the ability to recognize Trojan viruses. For example, the prior art cannot identify a sample from information such as file version information or an icon when each is taken as a single feature. With the technical means of this embodiment, such a weak feature is combined with other weak features and the black weight coefficient is assessed comprehensively, so that weak features are in effect converted into a strong feature that identifies the sample effectively. The accuracy of sample file judgment is thereby improved.
As shown in Fig. 2, the second embodiment of the present invention proposes a malicious file identification method which, on the basis of the first embodiment, further comprises before step S101:
Step S90: select a known black sample set and a known white sample set for training, and extract the weak feature set of each sample in them.
Step S100: set a weight for each weak feature set and establish the black and white weight tables.
This embodiment differs from the previous embodiment in that it also includes the scheme for establishing the black and white weight tables.
Specifically, to establish the black and white weight tables, a batch of known black samples and known white samples is first collected for training, and the weak feature set of each sample in the black sample set and the white sample set is extracted. The weak feature sets are then weighted, that is, a weight is set for each weak feature set, which yields a white weight library and a black weight library. Finally, the black weight table and the white weight table are established from the black weight library and the white weight library.
The weight of each weak feature set can be set automatically from statistics or set according to manual experience.
When weights are set according to manual experience:
For one-dimensional feature weights, if a one-dimensional feature A is sufficient to judge a sample as black, this feature can be manually given a higher weight.
For two-dimensional feature weights, if a sample that contains both feature A and feature B is a virus file, its two-dimensional feature AB can be given a high weight according to manual experience.
When weights are set from statistics, they are calculated as follows:
The given black and white sample sets are counted, and the frequency with which a feature group occurs in a given sample set is taken as the weight of that feature group in the corresponding sample set.
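The statistical weighting can be sketched as a frequency count over a labeled sample set. The sketch interprets "frequency" as the fraction of samples in the set that contain the feature group, which is an assumption, as are the function name `build_weight_table`, the `max_dim` parameter and the made-up feature names.

```python
from collections import Counter
from itertools import combinations


def build_weight_table(samples, max_dim=2):
    """Build {feature_group: weight} from a list of per-sample feature sets.

    Weight = fraction of samples containing the group (assumed reading of
    "frequency"). Call once on the black set and once on the white set.
    """
    counts = Counter()
    for features in samples:
        ordered = sorted(set(features))
        for dim in range(1, max_dim + 1):
            counts.update(combinations(ordered, dim))
    total = len(samples)
    if total == 0:
        return {}
    return {group: count / total for group, count in counts.items()}


black_samples = [{"A", "B"}, {"A"}, {"A", "B", "C"}, {"B"}]
black_table = build_weight_table(black_samples)
```

Running the same function on the white sample set would give the white weight table; manually assigned weights could then override individual entries.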
With the above scheme, this embodiment extracts features from sample files whose black or white attribute is known and establishes the black and white weight tables. The black or white attribute of a sample file is then identified by a comprehensive judgment over multiple feature combinations of the sample file. Because the weight of each feature combination can be obtained from manual experience and data statistics, the accuracy of judging malicious samples is improved.
As shown in Fig. 3, the first embodiment of the present invention proposes a malicious file identification device, comprising an extraction module 401, a lookup module 402, a computing module 403 and an identification module 404, wherein:
the extraction module 401 is used for extracting the weak feature set of a sample file to be identified;
the lookup module 402 is used for looking up the pre-established black and white weight tables according to the weak feature set to obtain a black weight and a white weight;
the computing module 403 is used for calculating a black weight coefficient according to the black weight, the white weight and a predetermined algorithm;
the identification module 404 is used for identifying the black or white attribute of the sample file according to the black weight coefficient.
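The four modules can be pictured as four methods of one object. The sketch below is only an illustration of that structure: the class name `MaliciousFileIdentifier`, the representation of a file as a dictionary of already-parsed attributes, the `key=value` feature strings, the equal-prior coefficient black / (black + white), and the 70% and 50% thresholds are all assumptions or examples, not the patented implementation.

```python
class MaliciousFileIdentifier:
    """Hypothetical composition of modules 401-404 from the device embodiment."""

    def __init__(self, black_table, white_table, first=0.70, second=0.50):
        self.black_table = black_table
        self.white_table = white_table
        self.first = first
        self.second = second

    def extract(self, attributes):
        # Extraction module 401: turn parsed attributes into a weak feature set.
        return tuple(sorted(f"{key}={value}" for key, value in attributes.items()))

    def look_up(self, feature_set):
        # Lookup module 402: unseen feature sets get weight 0.0 (assumption).
        return (self.black_table.get(feature_set, 0.0),
                self.white_table.get(feature_set, 0.0))

    def compute(self, black_weight, white_weight):
        # Computing module 403: equal-prior Bayes; no evidence returns the prior.
        total = black_weight + white_weight
        return 0.5 if total == 0 else black_weight / total

    def identify(self, attributes):
        # Identification module 404: two-threshold decision.
        coefficient = self.compute(*self.look_up(self.extract(attributes)))
        if coefficient > self.first:
            return "black"
        if coefficient > self.second:
            return "suspected black"
        return "white or suspected white"


key = ("icon=folder", "name=svchost.exe")
identifier = MaliciousFileIdentifier({key: 0.9}, {key: 0.1})
verdict = identifier.identify({"name": "svchost.exe", "icon": "folder"})
```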
This embodiment takes into account that evasion efforts generally concentrate on strong features, whereas the weak features of variant samples within the same family often change little. The weak features of a sample file are therefore also an effective means of identifying new virus variants or unknown viruses.
In this embodiment, black and white weight tables are established in advance. These tables contain the correspondence between the weak feature sets extracted from a known black sample file set and a known white sample file set and the weights assigned to them. The weight of each weak feature set can be set automatically from statistics or set manually.
This embodiment makes a comprehensive judgment on whether a sample file is malicious based on its features or feature combinations. The weak feature set can be a set of one-dimensional features or a set of multi-dimensional feature combinations such as two-dimensional or three-dimensional ones. Accordingly, in the pre-established black and white weight tables, every kind of feature combination (one-dimensional features, two-dimensional features and so on) has a corresponding weight.
When a sample file needs to be identified, the extraction module 401 first extracts the weak feature set from it.
When the weak feature set is extracted, the positions from which features are taken need not be fixed and there is a certain range of choice. The features can include the file's intrinsic features and its peripheral information. The intrinsic features can be the sample file's version information, compilation information, sensitive strings (such as URLs or process names that appear in the sample) and file icon. The peripheral information can be the path where the file is stored on the user's machine, the file name and so on.
As described above, the pre-established black and white weight tables contain the correspondence between weak feature sets and their assigned weights. After the weak feature set of the sample file to be identified has been extracted, the lookup module 402 looks up the black and white weight tables for this weak feature set to obtain the corresponding black weight and white weight.
After the black weight and white weight have been obtained, the computing module 403 uses Bayes' theorem to calculate the black weight coefficient. The black weight coefficient is the degree of black suspicion of the sample file, and the black or white attribute of the sample file can be judged from it.
The formula for calculating the black weight coefficient with Bayes' theorem is:
$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$
Where: P(A|B) is the black weight coefficient, that is, the degree of black suspicion of the sample file; P(B|A) is the black weight; P(B) = black weight + white weight; and if the sample file to be identified is assumed to be equally likely to be black or white, P(A) = 50%.
Thus, for an unknown sample file, its black and white weights are obtained by querying the black and white weight tables, and Bayes' theorem then gives the Bayesian weight coefficient of the sample file being black, that is, the degree of black suspicion of the sample.
After the black weight coefficient of the sample file has been calculated, the identification module 404 compares it with preset thresholds to judge the black or white attribute of the sample file.
This embodiment uses two thresholds, a first preset threshold and a second preset threshold, where the second preset threshold is smaller than the first. For example, the first and second preset thresholds can be 70% and 50% respectively.
The second preset threshold is used to judge whether the sample file is a suspected black sample file, and the first preset threshold is used to judge whether it is a black file.
Specifically, the calculated black weight coefficient is compared with the first preset threshold. If the black weight coefficient is greater than the first preset threshold, the sample file is identified as a black sample file.
When the first preset threshold indicates that the sample file is not a black sample file, that is, when the black weight coefficient is less than or equal to the first preset threshold, the black weight coefficient is compared with the second preset threshold to judge how likely the sample file is to be black or white. If the black weight coefficient is greater than the second preset threshold and less than the first preset threshold, the sample file is identified as a suspected black sample file. If the black weight coefficient is less than the second preset threshold, the sample file can be judged to be a suspected white sample file or a white sample file.
With the above scheme, this embodiment makes a comprehensive judgment based on the features or feature combinations of the sample file and produces a black weight coefficient. Because the weight of each feature group can be obtained from manual experience and data statistics, the accuracy of sample judgment can be improved.
It should be noted that, when the weak feature set extracted from the sample file is used to query the black and white weight tables and calculate the black weight coefficient, the one-dimensional features of the sample file can be used first: the weak feature set is built from them, the black and white weights are looked up, and the black weight coefficient is calculated. If the calculated coefficient does not reach the required threshold (the file is not judged to be a black sample file), the two-dimensional features of the sample file can then be used to build the weak feature set, query the black and white weight tables, calculate the black weight coefficient and judge the black or white attribute of the sample file.
Further, following the same principle of increasing the dimension, the scheme can be extended to higher-dimensional feature combinations such as three-dimensional features, thereby improving the accuracy of the sample file judgment.
Of course, as the feature dimension increases, the number of feature combinations grows as a power of the dimension, which greatly increases the computational complexity. In practical applications the dimension can therefore be limited to three.
In addition, because the present invention does not fix the positions from which features are extracted, the efficiency of feature extraction also decreases as the dimension increases.
With the above scheme, this embodiment starts from the weak features of a sample file, which are easily overlooked, and strengthens the ability to recognize Trojan viruses. For example, the prior art cannot identify a sample from information such as file version information or an icon when each is taken as a single feature. With the technical means of this embodiment, such a weak feature is combined with other weak features and the black weight coefficient is assessed comprehensively, so that weak features are in effect converted into a strong feature that identifies the sample effectively. The accuracy of sample file judgment is thereby improved.
As shown in Fig. 4, the second embodiment of the present invention also proposes a malicious file identification device which, on the basis of the previous embodiment, further comprises:
a set-up module 400, used for selecting a known black sample set and a known white sample set for training and extracting the weak feature set of each sample in them, and for setting a weight for each weak feature set and establishing the black and white weight tables.
This embodiment differs from the previous embodiment in that it also includes the scheme for establishing the black and white weight tables.
Specifically, to establish the black and white weight tables, a batch of known black samples and known white samples is first collected for training, and the weak feature set of each sample in the black sample set and the white sample set is extracted. The weak feature sets are then weighted, that is, a weight is set for each weak feature set, which yields a white weight library and a black weight library. Finally, the black weight table and the white weight table are established from the black weight library and the white weight library.
The weight of each weak feature set can be set automatically from statistics or set according to manual experience.
When weights are set according to manual experience:
For one-dimensional feature weights, if a one-dimensional feature A is sufficient to judge a sample as black, this feature can be manually given a higher weight.
For two-dimensional feature weights, if a sample that contains both feature A and feature B is a virus file, its two-dimensional feature AB can be given a high weight according to manual experience.
When weights are set from statistics, they are calculated as follows:
The given black and white sample sets are counted, and the frequency with which a feature group occurs in a given sample set is taken as the weight of that feature group in the corresponding sample set.
With the above scheme, this embodiment extracts features from sample files whose black or white attribute is known and establishes the black and white weight tables. The black or white attribute of a sample file is then identified by a comprehensive judgment over multiple feature combinations of the sample file. Because the weight of each feature combination can be obtained from manual experience and data statistics, the accuracy of judging malicious samples is improved.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (11)

1. A malicious file recognition method, characterized by comprising:
extracting a weak feature set of a sample file to be identified;
looking up pre-established black and white weight tables according to the weak feature set to obtain a black weight and a white weight;
calculating a black weight coefficient according to the black weight, the white weight, and a predetermined algorithm;
identifying the black/white attribute of the sample file according to the black weight coefficient.
2. The method according to claim 1, characterized in that the step of identifying the black/white attribute of the sample file according to the black weight coefficient comprises:
comparing the black weight coefficient with a first preset threshold;
if the black weight coefficient is greater than the first preset threshold, identifying the sample file as a black sample file.
3. The method according to claim 2, characterized in that the step of identifying the black/white attribute of the sample file according to the black weight coefficient further comprises:
comparing the black weight coefficient with a second preset threshold, the second preset threshold being less than the first preset threshold;
if the black weight coefficient is greater than the second preset threshold and less than the first preset threshold, identifying the sample file as a suspicious black sample file.
4. The method according to claim 1, 2 or 3, characterized in that, before the step of extracting the weak feature set of the sample file to be identified, the method further comprises:
selecting a known black sample set and a known white sample set for training, and extracting the weak feature set of each sample therein;
setting a weight for each weak feature set, and establishing the black and white weight tables.
5. The method according to claim 4, characterized in that the weak feature set comprises a set of one-dimensional features or a set of multi-dimensional feature combinations.
6. The method according to claim 4, characterized in that, when the weak feature set is extracted, the range of extracted features includes intrinsic features and peripheral information of the file; the intrinsic features include at least one of the following: version information of the sample file, compilation information, sensitive character strings, and the file icon; the peripheral information includes at least one of the following: the path where the file is stored on the user's machine, and the file name.
7. A malicious file recognition device, characterized by comprising:
an extraction module, configured to extract a weak feature set of a sample file to be identified;
a lookup module, configured to look up pre-established black and white weight tables according to the weak feature set to obtain a black weight and a white weight;
a calculation module, configured to calculate a black weight coefficient according to the black weight, the white weight, and a predetermined algorithm;
an identification module, configured to identify the black/white attribute of the sample file according to the black weight coefficient.
8. The device according to claim 7, characterized in that the identification module is further configured to compare the black weight coefficient with a first preset threshold and, if the black weight coefficient is greater than the first preset threshold, to identify the sample file as a black sample file.
9. The device according to claim 8, characterized in that the identification module is further configured to compare the black weight coefficient with a second preset threshold, the second preset threshold being less than the first preset threshold, and, if the black weight coefficient is greater than the second preset threshold and less than the first preset threshold, to identify the sample file as a suspicious black sample file.
10. The device according to claim 7, 8 or 9, characterized by further comprising:
an establishing module, configured to select a known black sample set and a known white sample set for training, and to extract the weak feature set of each sample therein; and to set a weight for each weak feature set and establish the black and white weight tables.
11. The device according to claim 10, characterized in that the weak feature set comprises a set of one-dimensional features or a set of multi-dimensional feature combinations.
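The threshold comparison recited in claims 2 and 3 (and mirrored in claims 8 and 9) can be sketched as below. The threshold values 0.8 and 0.5, the label strings, and the function name `classify` are hypothetical. The claims do not state what happens when the coefficient equals a threshold or falls at or below the second threshold, so the sketch returns a separate "undetermined" label in those cases.

```python
def classify(coefficient, first_threshold=0.8, second_threshold=0.5):
    """Map a black weight coefficient to a label using two preset thresholds."""
    if second_threshold >= first_threshold:
        raise ValueError("the second threshold must be less than the first threshold")
    if coefficient > first_threshold:
        return "black"
    if second_threshold < coefficient < first_threshold:
        return "suspicious black"
    # Cases not covered by the claims (assumption: left undetermined here).
    return "undetermined"
```

A deployment would choose the two thresholds empirically; a higher first threshold trades fewer false positives for more files landing in the suspicious band.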
CN201210332168.3A 2012-09-10 2012-09-10 Malicious file recognition method and device Active CN103679019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210332168.3A CN103679019B (en) 2012-09-10 2012-09-10 Malicious file recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210332168.3A CN103679019B (en) 2012-09-10 2012-09-10 Malicious file recognition method and device

Publications (2)

Publication Number Publication Date
CN103679019A true CN103679019A (en) 2014-03-26
CN103679019B CN103679019B (en) 2017-03-08

Family

ID=50316529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210332168.3A Active CN103679019B (en) 2012-09-10 2012-09-10 Malicious file recognition method and device

Country Status (1)

Country Link
CN (1) CN103679019B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095752A (en) * 2014-05-07 2015-11-25 腾讯科技(深圳)有限公司 Identification method, apparatus and system of virus packet
CN105488406A (en) * 2014-12-29 2016-04-13 哈尔滨安天科技股份有限公司 Similar malicious sample file matching method and system based on feature vector
CN105512555A (en) * 2014-12-12 2016-04-20 哈尔滨安天科技股份有限公司 Homologous family dividing and mutation method and system based on file string cluster
CN106789844A (en) * 2015-11-23 2017-05-31 阿里巴巴集团控股有限公司 A kind of malicious user recognition methods and device
CN108171054A (en) * 2016-12-05 2018-06-15 中国科学院软件研究所 The detection method and system of a kind of malicious code for social deception
WO2020014916A1 (en) * 2018-07-19 2020-01-23 华为技术有限公司 Method for identifying user and related device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101984450A (en) * 2010-12-15 2011-03-09 北京安天电子设备有限公司 Malicious code detection method and system
US20110247072A1 (en) * 2008-11-03 2011-10-06 Stuart Gresley Staniford Systems and Methods for Detecting Malicious PDF Network Content
CN102254120A (en) * 2011-08-09 2011-11-23 成都市华为赛门铁克科技有限公司 Method, system and relevant device for detecting malicious codes
CN102479298A (en) * 2010-11-29 2012-05-30 北京奇虎科技有限公司 Program identification method and device based on machine learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110247072A1 (en) * 2008-11-03 2011-10-06 Stuart Gresley Staniford Systems and Methods for Detecting Malicious PDF Network Content
CN102479298A (en) * 2010-11-29 2012-05-30 北京奇虎科技有限公司 Program identification method and device based on machine learning
CN101984450A (en) * 2010-12-15 2011-03-09 北京安天电子设备有限公司 Malicious code detection method and system
CN102254120A (en) * 2011-08-09 2011-11-23 成都市华为赛门铁克科技有限公司 Method, system and relevant device for detecting malicious codes

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095752A (en) * 2014-05-07 2015-11-25 腾讯科技(深圳)有限公司 Identification method, apparatus and system of virus packet
CN105512555A (en) * 2014-12-12 2016-04-20 哈尔滨安天科技股份有限公司 Homologous family dividing and mutation method and system based on file string cluster
CN105512555B (en) * 2014-12-12 2018-05-25 哈尔滨安天科技股份有限公司 Based on the homologous family of division of file character string cluster and the method and system of mutation
CN105488406A (en) * 2014-12-29 2016-04-13 哈尔滨安天科技股份有限公司 Similar malicious sample file matching method and system based on feature vector
CN105488406B (en) * 2014-12-29 2019-02-26 哈尔滨安天科技股份有限公司 A kind of similar malice sample matches method and system based on feature vector
CN106789844A (en) * 2015-11-23 2017-05-31 阿里巴巴集团控股有限公司 A kind of malicious user recognition methods and device
CN106789844B (en) * 2015-11-23 2020-06-16 阿里巴巴集团控股有限公司 Malicious user identification method and device
CN111629010A (en) * 2015-11-23 2020-09-04 阿里巴巴集团控股有限公司 Malicious user identification method and device
CN111629010B (en) * 2015-11-23 2023-03-10 创新先进技术有限公司 Malicious user identification method and device
CN108171054A (en) * 2016-12-05 2018-06-15 中国科学院软件研究所 The detection method and system of a kind of malicious code for social deception
WO2020014916A1 (en) * 2018-07-19 2020-01-23 华为技术有限公司 Method for identifying user and related device

Also Published As

Publication number Publication date
CN103679019B (en) 2017-03-08

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant