CN103761477A - Method and equipment for acquiring virus program samples - Google Patents


Publication number
CN103761477A
Authority
CN
China
Prior art keywords
file
feature extraction
extraction condition
classification
vector
Legal status
Pending
Application number
CN201410006827.3A
Other languages
Chinese (zh)
Inventor
唐海
陈卓
Current Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201410006827.3A
Publication of CN103761477A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/561 Virus type analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method and equipment for acquiring virus program samples. The method includes: determining at least one feature extraction condition according to file type, and arbitrarily selecting N of the feature extraction conditions, where N is a positive integer; performing feature extraction on each file with each of the N feature extraction conditions, and determining, from the extraction results, a vector value for the feature extracted under each condition; for each file, generating a corresponding classification vector from the vector values of the N feature extraction conditions; arbitrarily taking one classification vector as a classification base point, and calculating the classification distance between the base point and each of the other classification vectors; and classifying the files according to the calculated classification distances. The method and equipment reduce the time spent on classifying files and allow files to be classified accurately and efficiently.

Description

Method and equipment for acquiring virus program samples
Technical field
The present invention relates to the field of Internet applications, and in particular to a method and equipment for acquiring virus program samples.
Background
With the development of the information society, terminals (including computers, mobile phones, and other devices) play an increasingly important role in people's lives. People increasingly rely on terminals to store personal information, such as account credentials, private chat logs, and even photos. If a terminal system is threatened by a malicious file (such as a malicious URL or a computer virus), personal information can easily leak and cause the user incalculable damage. It is therefore very important to classify and handle malicious files so that terminal systems are protected from them and their security is ensured.
In the prior art, a file can be classified according to only one feature at a time, which is inefficient and wastes a great deal of time. Classifying virus files with the prior art therefore consumes considerable time and manpower. Moreover, when a large number of virus files are classified, the combination of one feature per classification pass and the sheer number of files makes the classification inaccurate. Inaccurate classification of virus files in turn makes the terminal system's detection and handling of malicious files, which relies on that classification, inaccurate as well, so the security of the terminal system cannot be guaranteed and the user's information security is threatened.
Summary of the invention
In view of the above problems, the present invention provides a method and corresponding equipment for acquiring virus program samples that overcome, or at least partially solve, the above problems.
According to one aspect of the present invention, a method for acquiring virus program samples is provided, comprising: determining at least one feature extraction condition according to file type, and arbitrarily selecting N feature extraction conditions from them, where N is a positive integer; performing feature extraction on each file with each of the N feature extraction conditions, and determining, from the extraction results, the vector value of the feature extracted under each condition; for each file, generating a corresponding classification vector from the vector values of the features of the N feature extraction conditions; arbitrarily taking one classification vector as a classification base point, and calculating the classification distance between each of the other classification vectors and the classification base point; and classifying the files according to the calculated classification distances.
Optionally, the method further comprises: after the current classification pass, determining whether any unused feature extraction conditions remain; if so, selecting M further feature extraction conditions from the unused ones and performing the classification operation on the files with those M conditions; and repeating the classification operation until all feature extraction conditions have been used.
Optionally, classifying the files according to the classification distances comprises: grouping the classification distances according to their magnitude; and assigning the files to different categories according to the different groups.
Optionally, grouping the classification distances according to their magnitude comprises: judging whether each classification distance is not greater than a classification predetermined threshold; if so, placing the classification distance in one group; if not, placing it in another group. Assigning the files to different categories according to the different groups comprises: assigning the files corresponding to classification distances in the same group to the same category.
Optionally, the classification predetermined threshold is the mean of the classification distances.
Optionally, the files include virus files.
According to another aspect of the present invention, equipment for acquiring virus program samples is also provided, comprising: a getter, configured to determine at least one feature extraction condition according to file type and to arbitrarily select N feature extraction conditions from them, where N is a positive integer; a vector generator, configured to perform feature extraction on each file with each of the N feature extraction conditions, to determine from the extraction results the vector value of the feature extracted under each condition, and, for each file, to generate a corresponding classification vector from the vector values of the features of the N feature extraction conditions; a distance calculator, configured to arbitrarily take one classification vector as a classification base point and to calculate the classification distance between each of the other classification vectors and the classification base point; and a sorter, configured to classify the files according to the calculated classification distances.
Optionally, in the equipment, the vector generator is further configured to determine, after the current classification pass, whether any unused feature extraction conditions remain; if so, to select M further feature extraction conditions from the unused ones and trigger the distance calculator and the sorter to perform the classification operation on the files with those M conditions; and to repeat the classification operation until all feature extraction conditions have been used.
Optionally, the sorter is further configured to group the classification distances according to their magnitude and to assign the files to different categories according to the different groups.
Optionally, the equipment further comprises a determining device, configured to judge whether each classification distance is not greater than the classification predetermined threshold.
Optionally, the sorter is further configured to: place a classification distance in one group if the determining device finds that it is not greater than the classification predetermined threshold; place it in another group if the determining device finds that it is greater than the threshold; and assign the files corresponding to classification distances in the same group to the same category.
Optionally, the classification predetermined threshold is the mean of the classification distances.
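The four components of the equipment (getter, vector generator, distance calculator, sorter) can be sketched as small classes. This is a minimal illustration, not the patented implementation: the `Condition` interface (a function from raw bytes to one number), the seeded random choices, and the use of the Euclidean distance with the mean distance as threshold (the preferred options described in the embodiments) are all assumptions made for the example.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a condition maps a file's raw bytes to one vector value.
Condition = Callable[[bytes], float]
Vector = Tuple[float, ...]


class Getter:
    """Holds the conditions determined per file type and picks N of them at random."""

    def __init__(self, conditions_by_type: Dict[str, List[Condition]], seed: int = 0) -> None:
        self.conditions_by_type = conditions_by_type
        self.rng = random.Random(seed)

    def pick(self, file_type: str, n: int) -> List[Condition]:
        return self.rng.sample(self.conditions_by_type[file_type], n)


class VectorGenerator:
    """Applies every selected condition to every file: one classification vector per file."""

    def vectors(self, files: Sequence[bytes], conditions: Sequence[Condition]) -> List[Vector]:
        return [tuple(cond(data) for cond in conditions) for data in files]


class DistanceCalculator:
    """Takes a random vector as the base point; Euclidean distance to every other vector."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def distances(self, vectors: Sequence[Vector]) -> Tuple[int, Dict[int, float]]:
        base = self.rng.randrange(len(vectors))
        dists = {i: math.dist(v, vectors[base]) for i, v in enumerate(vectors) if i != base}
        return base, dists


class Sorter:
    """Splits the non-base files into two groups, using the mean distance as threshold."""

    def classify(self, dists: Dict[int, float]) -> Tuple[List[int], List[int]]:
        threshold = sum(dists.values()) / len(dists)
        near = sorted(i for i, d in dists.items() if d <= threshold)
        far = sorted(i for i, d in dists.items() if d > threshold)
        return near, far
```

One pass chains the components: pick conditions with `Getter.pick`, build vectors, compute distances, then call `Sorter.classify`. Keeping the random choices behind seeded `random.Random` instances makes a run reproducible, which is convenient when comparing passes. At least two files are needed, otherwise there is no distance to average.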
In the embodiments of the present invention, feature extraction conditions are determined according to file type, feature extraction is performed on the files according to those conditions, and classification vectors are generated. Because files of different types have distinct features, determining the feature extraction conditions per file type allows files to be classified in a specific and more meaningful way beyond a basic classification by type. For example, for a file of the DES (executable file) type, a feature extraction condition may be whether the DES file contains decompilation instructions; for a file in PE (Portable Executable) format, the feature extraction conditions may be whether the file contains an import table, the number of export tables it contains, and so on. After the classification vectors corresponding to the feature extraction conditions are generated for the files, one classification vector is arbitrarily taken as the classification base point, the classification distances between the base point and the other classification vectors are calculated, and the files are classified according to those distances. This solves the prior-art problem that each classification pass can classify files according to only one feature, which makes classification time-consuming, inefficient, and inaccurate. Because the embodiments generate each file's vector from the determined feature extraction conditions and use multiple feature extraction conditions simultaneously, they save classification time and improve classification accuracy. For example, classifying virus files with the method provided by the embodiments allows virus files to be classified accurately, with virus files of the same category placed in one group. Virus files of the same category can then be analyzed and processed more efficiently, so that unknown samples collected in real time can be assessed for security, the security of the terminal system is ensured more efficiently, and the user experience is improved.
The above description is only an overview of the technical solution of the present invention. To make the technical means of the invention better understood so that it can be implemented according to this specification, and to make the above and other objects, features, and advantages of the invention more readily apparent, specific embodiments of the invention are set out below.
From the following detailed description of specific embodiments in conjunction with the accompanying drawings, those skilled in the art will better understand the above and other objects, advantages, and features of the present invention.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting. Throughout the drawings, identical parts are denoted by identical reference symbols. In the drawings:
Fig. 1 shows a processing flowchart of the method for acquiring virus program samples according to an embodiment of the invention;
Fig. 2 shows a list of the attribute values in the file header of a dex-format file, obtained according to an embodiment of the invention;
Fig. 3 shows a processing flowchart of the method for acquiring virus program samples according to a preferred embodiment of the invention; and
Fig. 4 shows a schematic structural diagram of the equipment for acquiring virus program samples according to an embodiment of the invention.
Embodiments
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other equipment. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such systems is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
As mentioned in the related art, a file can be classified according to only one feature at a time, which is inefficient and wastes a great deal of time. Classifying virus files with the prior art therefore consumes considerable time and manpower. In addition, when a large number of virus files are classified, the combination of one feature per classification pass and the sheer number of files makes the classification inaccurate.
To solve the above technical problems, an embodiment of the present invention proposes a method for acquiring virus program samples. Fig. 1 shows a processing flowchart of the method according to an embodiment of the invention. Referring to Fig. 1, the flow comprises at least steps S102 to S110.
Step S102: determine at least one feature extraction condition according to file type, and arbitrarily select N of the feature extraction conditions, where N is a positive integer.
Step S104: perform feature extraction on each file with each of the N feature extraction conditions, and determine from the extraction results the vector value of the feature extracted under each condition.
Step S106: for each file, generate a corresponding classification vector from the vector values of the features of the N feature extraction conditions.
Step S108: arbitrarily take one classification vector as the classification base point, and calculate the classification distance between each of the other classification vectors and the classification base point.
Step S110: classify the files according to the calculated classification distances.
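Steps S102 to S110 can be summarized in notation. The symbols are introduced here for illustration and do not appear in the patent: C is the set of K conditions determined for the file type, S the N conditions selected from it, f_1 to f_F the files, v_r the randomly chosen base point, and T the classification predetermined threshold. The Euclidean distance and the mean threshold shown are the preferred options described in the embodiments; step S108 itself allows any distance measure.

```latex
\begin{aligned}
S &= \{c_{s_1}, \dots, c_{s_N}\} \subseteq C = \{c_1, \dots, c_K\}, && 1 \le N \le K \\
\mathbf{v}_j &= \bigl(c_{s_1}(f_j), \dots, c_{s_N}(f_j)\bigr), && j = 1, \dots, F \\
d_j &= \lVert \mathbf{v}_j - \mathbf{v}_r \rVert_2 = \sqrt{\textstyle\sum_{i=1}^{N} (v_{j,i} - v_{r,i})^2}, && j \ne r \\
T &= \frac{1}{F-1} \sum_{j \ne r} d_j \\
G_{\le} &= \{\, j \ne r : d_j \le T \,\}, \qquad G_{>} = \{\, j \ne r : d_j > T \,\}
\end{aligned}
```

The two sets G correspond to the two groups of step S110; every file other than the base point falls into exactly one of them.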
It should be noted that the method provided by the embodiments of the present invention obtains virus program samples by classifying them and then collecting each category separately, so the embodiments can be used to classify virus program samples. The method can also classify files other than virus program samples. To describe the method more clearly and concisely, virus program samples are simply referred to as files in the embodiments and preferred embodiments of the invention.
As shown in step S102 of Fig. 1, at least one feature extraction condition is determined according to file type, and the file may be of any type. For example, when the file is of the DES (executable file) type, the feature extraction conditions may be whether the file contains a given character string, whether the file size exceeds a fixed value, the number of methods in the file, the offsets of the methods in the file, whether the file contains decompilation instructions, and so on. As another example, when the file is in Portable Executable (PE) format, the feature extraction conditions may be the value of a field contained in the PE file, the number of import tables in the PE file, the number of export tables in the PE file, and so on. In addition, when the files classified in the embodiments are virus files, they may be known virus files, unknown virus files, zero-day (0day) virus files (that is, newly emerging virus files), and so on.
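For the PE case, whether a file has an import table or an export table can be read from the data directory array of the PE optional header, where entry 0 is the export table and entry 1 the import table according to Microsoft's PE format specification. The following is a minimal standard-library sketch that only checks presence and returns 1/0 vector values; counting imported modules, as the "number of import tables" condition would require, additionally needs the directory RVA to be mapped to a file offset through the section table, which is not shown. The function name and the output keys are hypothetical.

```python
import struct
from typing import Dict


def pe_table_features(data: bytes) -> Dict[str, int]:
    """Return 1/0 flags for whether a PE image declares an export and an import table.

    Only headers are read: e_lfanew at 0x3C, the "PE\\0\\0" signature, the 20-byte
    COFF header, and the data directory array at the end of the optional header.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20  # optional header starts right after the COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32
        count_off, dir_off = opt + 92, opt + 96
    elif magic == 0x20B:  # PE32+
        count_off, dir_off = opt + 108, opt + 112
    else:
        raise ValueError("unknown optional header magic")
    (num_dirs,) = struct.unpack_from("<I", data, count_off)

    def directory_size(index: int) -> int:
        # Each directory entry is (RVA, Size); a zero size means the table is absent.
        if index >= num_dirs:
            return 0
        _rva, size = struct.unpack_from("<II", data, dir_off + 8 * index)
        return size

    return {
        "has_export_table": int(directory_size(0) > 0),
        "has_import_table": int(directory_size(1) > 0),
    }
```

Reading only the headers keeps the condition cheap enough to run over a large sample set; a malformed or truncated file raises an exception, which a caller might map to a separate vector value.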
In the embodiments, the K-means clustering algorithm (a data mining algorithm) may also be used to select the classification base point and classify the files. K-means clusters objects around k center points in the space, assigning each object to the center nearest to it, and iteratively updates the value of each cluster center until the best clustering result is obtained. For example, when K-means is used, the word-frequency information of an html-format file may serve as its feature extraction conditions, while for a dex-format file the feature strings and/or the attribute values in the dex file header may serve as its feature extraction conditions, such as Filesize, Method off and size, Stringid off and size, Classdef off and size, and Prototype off and size (all attribute-value names). Fig. 2 shows, according to an embodiment of the invention, the list of attribute values obtained from the header of a dex-format file.
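The dex header attributes listed in Fig. 2 correspond to fields of the fixed 0x70-byte header described in the Android dex format documentation. A minimal sketch, assuming an uncompressed, little-endian dex file; the helper names are hypothetical, and the mapping from the Fig. 2 names (Filesize, Method off and size, and so on) onto the official field names is our reading of the figure, not something the patent states.

```python
import struct
from typing import Dict, Tuple

# The twenty 32-bit fields that follow magic (8 bytes), checksum (4) and SHA-1 signature (20).
HEADER_FIELDS = (
    "file_size", "header_size", "endian_tag", "link_size", "link_off", "map_off",
    "string_ids_size", "string_ids_off", "type_ids_size", "type_ids_off",
    "proto_ids_size", "proto_ids_off", "field_ids_size", "field_ids_off",
    "method_ids_size", "method_ids_off", "class_defs_size", "class_defs_off",
    "data_size", "data_off",
)

# Assumed mapping of the Fig. 2 attribute names onto the official header fields.
FIG2_FIELDS = (
    "file_size",                          # Filesize
    "method_ids_off", "method_ids_size",  # Method off and size
    "string_ids_off", "string_ids_size",  # Stringid off and size
    "class_defs_off", "class_defs_size",  # Classdef off and size
    "proto_ids_off", "proto_ids_size",    # Prototype off and size
)


def parse_dex_header(data: bytes) -> Dict[str, int]:
    """Decode the fixed-size dex header into a field-name -> value dict."""
    if len(data) < 0x70 or not data.startswith(b"dex\n"):
        raise ValueError("not a dex file")
    values = struct.unpack_from("<20I", data, 32)
    return dict(zip(HEADER_FIELDS, values))


def fig2_vector(data: bytes) -> Tuple[int, ...]:
    """Classification-vector components taken from the Fig. 2 attributes."""
    header = parse_dex_header(data)
    return tuple(header[name] for name in FIG2_FIELDS)
```

Each header value is used directly as a vector value here; in practice raw offsets and sizes may need scaling before Euclidean distances are meaningful, which the patent leaves open.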
After the feature extraction conditions are determined, N of them are arbitrarily selected, where N is a positive integer whose value can be set freely. For example, when 100 feature extraction conditions have been determined, N may be 5 (arbitrarily selecting 5 conditions), 30 (arbitrarily selecting 30 conditions), or 100 (directly selecting all determined conditions); the embodiments are not limited in this respect. Feature extraction is then performed on each file with each of the N feature extraction conditions, and the vector value of the feature extracted under each condition is determined from the extraction results. For example, suppose 2 feature extraction conditions are arbitrarily selected from those determined according to the types of three files: the first is whether the file contains an import table, and the second is whether the file size exceeds 20 kilobytes (hereinafter k). Under the first condition, the first file has no import table, while the second and third files do. The vector value of the first condition is set to 1 for the second and third files, which have an import table, and to 0 for the first file, which does not. Under the second condition, the first and third files exceed 20k, while the second file is smaller than 20k. The vector value of the second condition is set to 2 for the first and third files, which exceed 20k, and to 3 for the second file, which is smaller than 20k. It should be noted that these vector values are only examples; in practice, vector values can be set differently for different feature extraction conditions, and the embodiments are not limited in this respect.
After the vector value of the feature extracted under each feature extraction condition is determined, a classification vector is generated for each file from the vector values of the features of the N conditions. Continuing the previous example, in which 2 feature extraction conditions were selected according to the types of three files, the classification vector of the first file is (0, 2), that of the second file is (1, 3), and that of the third file is (1, 2).
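The three-file example can be reproduced directly. The file sizes below are hypothetical (the text only says that the first and third files exceed 20k and the second does not), and 20k is taken as 20 × 1024 bytes; the value encodings 1/0 and 2/3 are the ones given in the example.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the three example files: import-table presence and size in bytes.
FILES: List[Dict[str, object]] = [
    {"has_import_table": False, "size": 30 * 1024},  # first file
    {"has_import_table": True, "size": 12 * 1024},   # second file
    {"has_import_table": True, "size": 45 * 1024},   # third file
]


def import_table_value(f: Dict[str, object]) -> int:
    return 1 if f["has_import_table"] else 0  # 1 = import table present, 0 = absent


def size_value(f: Dict[str, object], limit: int = 20 * 1024) -> int:
    return 2 if f["size"] > limit else 3  # 2 = exceeds 20k, 3 = does not


def classification_vectors(files: List[Dict[str, object]]) -> List[Tuple[int, int]]:
    # One component per selected feature extraction condition, in selection order.
    return [(import_table_value(f), size_value(f)) for f in files]


print(classification_vectors(FILES))  # [(0, 2), (1, 3), (1, 2)]
```

The output matches the classification vectors (0, 2), (1, 3), and (1, 2) stated in the text.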
After the classification vectors are generated, one of them is arbitrarily taken as the classification base point, and the classification distance between each of the other classification vectors and the base point is calculated. Any distance calculation method may be used for this, and the embodiments are not limited in this respect. Preferably, the Euclidean distance is used to calculate the distance between each classification vector and the base point; it is a common, widely applicable distance measure with high calculation accuracy. Once the classification distances between all classification vectors and the base point have been calculated, they are grouped according to a preset classification predetermined threshold: each distance is checked against the threshold, and if it is not greater than the threshold it is placed in one group, otherwise in another group. The files corresponding to the distances are then assigned to different categories according to the groups; that is, the files whose classification distances fall in the same group are assigned to the same category.
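Using the three classification vectors from the example in the text, (0, 2), (1, 3), and (1, 2), the distance and grouping steps look like this. The choice of the first file as base point and the fixed threshold of 1.2 are assumptions for illustration; the patent picks the base point arbitrarily and leaves the threshold open.

```python
import math
from typing import Dict, List, Sequence, Tuple


def classification_distances(vectors: Sequence[Sequence[float]], base_index: int) -> Dict[int, float]:
    """Euclidean distance from the base point to every other classification vector."""
    base = vectors[base_index]
    return {i: math.dist(v, base) for i, v in enumerate(vectors) if i != base_index}


def group_by_threshold(dists: Dict[int, float], threshold: float) -> Tuple[List[int], List[int]]:
    """Indices whose distance is not greater than the threshold, then the rest."""
    not_greater = [i for i, d in dists.items() if d <= threshold]
    greater = [i for i, d in dists.items() if d > threshold]
    return not_greater, greater


vectors = [(0, 2), (1, 3), (1, 2)]
dists = classification_distances(vectors, base_index=0)
print({i: round(d, 4) for i, d in dists.items()})  # {1: 1.4142, 2: 1.0}
print(group_by_threshold(dists, 1.2))              # ([2], [1])
```

With the first file as base point, the third file (distance 1.0) falls in the "not greater" group and the second file (distance about 1.414) in the other. The text does not say which group the base point itself joins, so the sketch leaves it out.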
In the embodiments, the classification predetermined threshold may be set manually based on the calculated classification distances, or it may be a value that allows the classification distances to be grouped reasonably, obtained by performing an appropriate computation on all of them; the embodiments are not limited in this respect. Preferably, the mean of the classification distances is calculated and used as the classification predetermined threshold. In the embodiments, N feature extraction conditions are used to classify each file, where N is a positive integer greater than or equal to 1. When N is greater than 1, multiple feature extraction conditions must be considered when classifying each file, yet each classification pass divides all files into only two classes. Using the mean classification distance as the threshold gives the average distance of all classification vectors from the base point, so that each file represented by a classification vector can be divided accurately into one of two classes with all of its feature extraction conditions taken into account together.
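A minimal sketch of the preferred mean threshold, assuming Euclidean distances. The six three-component vectors are made-up data chosen so that two files lie close to the base point and three lie far from it; they are not from the patent.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def split_by_mean(vectors: Sequence[Sequence[float]], base_index: int) -> Tuple[float, List[int], List[int]]:
    """Two-way split of the non-base files, using the mean distance as the threshold."""
    base = vectors[base_index]
    dists = {i: math.dist(v, base) for i, v in enumerate(vectors) if i != base_index}
    threshold = mean(list(dists.values()))  # classification predetermined threshold
    near = [i for i, d in dists.items() if d <= threshold]
    far = [i for i, d in dists.items() if d > threshold]
    return threshold, near, far


# Hypothetical data: N = 3 conditions, six files, file 0 used as the base point.
vectors = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (5, 5, 5), (6, 5, 5), (5, 6, 5)]
threshold, near, far = split_by_mean(vectors, base_index=0)
print(near, far)  # [1, 2] [3, 4, 5]
```

Because the mean adapts to the spread of the current distances, the same code needs no hand-tuned threshold when the number or scale of conditions changes between passes.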
After the files have been classified with the N arbitrarily selected feature extraction conditions, if unused conditions remain among those determined, M further feature extraction conditions are selected from the unused ones. The classification operation described above is repeated on the files with the newly selected M conditions to classify them again, until all determined feature extraction conditions have been used. The sum of M and N should not exceed the total number of determined feature extraction conditions. The value of M may be equal to or different from N and can be set according to the situation in practical applications; the embodiments are not limited in this respect.
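The repeated passes amount to drawing disjoint batches of condition indices until none are left. A sketch of just the selection schedule, assuming that the last pass simply takes whatever conditions remain when fewer than M are unused (the patent only says M + N must not exceed the total); each returned batch would then go through the vector, distance, and grouping steps described above.

```python
import random
from typing import List


def selection_rounds(num_conditions: int, n: int, m: int, seed: int = 0) -> List[List[int]]:
    """Condition indices used in each pass: N in the first pass, then M per pass from the unused ones."""
    if n < 1 or m < 1 or n > num_conditions:
        raise ValueError("need 1 <= n <= num_conditions and m >= 1")
    rng = random.Random(seed)
    unused = list(range(num_conditions))
    rounds: List[List[int]] = []
    size = n
    while unused:
        size = min(size, len(unused))  # assumption: the final pass takes what is left
        picked = rng.sample(unused, size)
        rounds.append(sorted(picked))
        picked_set = set(picked)
        unused = [c for c in unused if c not in picked_set]
        size = m  # every pass after the first selects M conditions
    return rounds
```

For example, `selection_rounds(300, 30, 30)` yields ten disjoint passes of 30 conditions each, covering all 300 conditions exactly once.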
Any file can be classified with the method for acquiring virus program samples provided by the embodiments. After classification, each category of files can be tagged with an identifier (ID), from which the features of each category can be understood intuitively. Preferably, the files classified in the embodiments are virus files. Virus files are classified according to the feature extraction conditions of different types of virus files, and different virus files and their features are then distinguished according to the classification. From the features and categories of the virus files, other unknown virus files can be distinguished and further unknown virus files discovered. For example, after the files are classified, whether a file is malicious or non-malicious can be judged from the features of the different categories. Using information such as the distinguishing features of the malicious and non-malicious files so identified, other files can further be judged malicious or non-malicious, so that virus files are found more efficiently and comprehensively and the user terminal is protected.
As another example, after the acquired files are classified, each category of files is subjected to a security scan, in which the files of the category are matched one by one against the files stored in a feature database of malicious and non-malicious files. If most files in the scanned category (for example, more than 80% of them) are malicious, the remaining files (for example, the remaining 20%) are also judged to be malicious; otherwise, the scanned category is judged to be non-malicious. After the security of a category has been judged (for example, whether its files are malicious or non-malicious), the files are labeled malicious or non-malicious according to the result and added to the feature database, so that files whose security attributes are unknown can be judged later.
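The majority rule for a scanned category can be sketched as follows. The patent does not say how a file is "matched" against the feature database; exact SHA-256 hash lookup in two in-memory sets is an assumption made here for illustration, as is writing back only the files that were not already known.

```python
import hashlib
from typing import List, Set


def judge_cluster(cluster: List[bytes], malicious_db: Set[str], benign_db: Set[str],
                  ratio: float = 0.8) -> str:
    """Label a whole category by majority and store its previously unknown members."""
    hashes = [hashlib.sha256(data).hexdigest() for data in cluster]
    known_malicious = sum(h in malicious_db for h in hashes)
    # More than `ratio` of the category matches known malicious files -> whole category malicious.
    verdict = "malicious" if known_malicious > ratio * len(hashes) else "non-malicious"
    # Assumption: only files unknown to both databases are written back with the new label.
    unknown = [h for h in hashes if h not in malicious_db and h not in benign_db]
    (malicious_db if verdict == "malicious" else benign_db).update(unknown)
    return verdict
```

In a category of ten files where nine already match the malicious set, the tenth is labeled malicious and added to that set, so later scans recognize it directly.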
The method for acquiring virus samples of the present invention is now described with reference to a specific embodiment.
Embodiment 1
Fig. 3 shows a flowchart of a method for acquiring virus samples according to a preferred embodiment of the present invention. It supports any of the methods for acquiring virus samples described above and sets them out more clearly. Referring to Fig. 3, the flow comprises at least steps S302 to S318. For clarity, this preferred embodiment classifies 100,000 files.
Step S302: determine 300 feature extraction conditions.
The 100,000 files are to be classified, and 300 feature extraction conditions are determined from the files to be classified.
Step S304: arbitrarily select 3 feature extraction conditions.
Three feature extraction conditions are arbitrarily selected from the 300 determined in step S302: whether the file size exceeds 100k, the value of the strings contained in the file, and whether the file contains a decompilation instruction.
Step S306: perform feature extraction using the selected feature extraction conditions.
Feature extraction is performed on the 100,000 files according to the 3 conditions selected in step S304; that is, for each file it is determined whether its size exceeds 100k, what value the strings it contains have, and whether it contains a decompilation instruction.
Step S308: determine the class vector of each file from the extraction results.
From the results of extracting the 3 features from the 100,000 files, the vector value of the feature extracted by each condition is determined. For the feature "whether the file size exceeds 100k", the vector value is set to 1 if the file is larger than 100k and to 0 otherwise. For the feature "value of the strings contained in the file", the value of the strings can be used directly as the vector value, or the vector value can be derived from the order of magnitude of that value or from some other rule. For the feature "whether the file contains a decompilation instruction", the vector value is set to 2 for a file that contains one and to 3 for a file that does not.
From these vector values, a class vector is generated for each of the 100,000 files.
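Step S308 for the three example conditions might look like the minimal Python sketch below. It assumes the raw extraction results are already available (the `RawFeatures` record and its field names are hypothetical), reads "100k" as 100 KiB, and for the string condition shows both options mentioned in the text: using the value directly or its decimal order of magnitude. The values 1/0 and 2/3 are the ones given above.

```python
from dataclasses import dataclass
from typing import Tuple

SIZE_LIMIT = 100 * 1024  # "100k" read as 100 KiB (assumption)


@dataclass
class RawFeatures:
    """Raw results of the three extraction conditions for one file (hypothetical)."""
    size_bytes: int            # file size in bytes
    string_value: int          # numeric "value of the strings" (assumed precomputed)
    has_decompile_instr: bool  # whether a decompilation instruction was found


def class_vector(f: RawFeatures, by_magnitude: bool = False) -> Tuple[int, int, int]:
    # Condition 1: size above the limit -> 1, otherwise 0.
    size_value = 1 if f.size_bytes > SIZE_LIMIT else 0
    # Condition 2: the string value itself, or its decimal order of magnitude
    # (number of digits minus one); non-positive values pass through unchanged.
    if by_magnitude and f.string_value > 0:
        string_component = len(str(f.string_value)) - 1
    else:
        string_component = f.string_value
    # Condition 3: instruction present -> 2, absent -> 3 (values from the example).
    instr_value = 2 if f.has_decompile_instr else 3
    return (size_value, string_component, instr_value)
```

Applying `class_vector` to every file yields the 100,000 class vectors; the order-of-magnitude option keeps one large string value from dominating later distance calculations.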
Step S310: randomly select one of the class vectors as the base point.
Step S312: calculate the classifying distance between each of the other class vectors and the base point selected in step S310.
Step S314: classify the files according to the classifying distances.
After the classifying distances are calculated in step S312, their mean value is computed and used as the predetermined classification threshold. Each classifying distance is checked against this threshold: distances not greater than the threshold are placed in one group, and the remaining distances in another group. The files are then divided into classes by group; that is, files whose classifying distances fall in the same group are placed in the same class.
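Steps S310 to S314 amount to one two-way split around a random base point. The Python sketch below uses the Euclidean distance (the measure preferred for distance calculator 430 in the device description) and the mean distance as the threshold. `split_once` and its optional `base_name` parameter (added so a split can be reproduced) are hypothetical names, and placing the base point's own file in the near group is an assumption that the text does not settle.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple


def split_once(
    vectors: Dict[str, Sequence[float]],
    rng: random.Random,
    base_name: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """One pass of steps S310-S314; returns (near group, far group)."""
    names = list(vectors)
    if base_name is None:
        base_name = rng.choice(names)           # S310: random base point
    base = vectors[base_name]
    # S312: Euclidean distance from every other class vector to the base point.
    dist = {n: math.dist(vectors[n], base) for n in names if n != base_name}
    if not dist:
        return [base_name], []
    threshold = sum(dist.values()) / len(dist)  # S314: mean distance as threshold
    near = [n for n, d in dist.items() if d <= threshold]
    far = [n for n, d in dist.items() if d > threshold]
    # The base point's own file (distance 0) is put in the near group here.
    return [base_name] + near, far
```

With vectors `{"a": (0, 2), "b": (1, 3), "c": (1, 2)}` and `base_name="a"`, the distances are about 1.414 and 1.0, the mean is about 1.207, and the split is `(["a", "c"], ["b"])`.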
Step S316: determine whether any of the 300 determined feature extraction conditions remain unused. If not, the flow ends; if so, step S318 is performed.
Step S318: when step S316 finds unused conditions among the 300 determined, M conditions are arbitrarily selected from the unused ones and steps S308 to S316 are executed again, until all 300 conditions have been used to classify the files. The number of conditions selected in step S318 is not fixed; that is, in this example, whenever conditions are randomly selected from the 300 to classify the files, the number selected can be set freely. For example, the 300 conditions can be spread evenly over ten classification passes of 30 conditions each, a different number of conditions can be selected in each pass, or all 300 conditions can be used at once in the first pass.
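The loop formed by steps S316 and S318 can be sketched as a generator that keeps drawing random batches from the unused conditions until none remain. The function name and the `batch_sizes` schedule are illustrative assumptions, and condition names are assumed unique; the schedule covers the options in the text (ten passes of 30, varying sizes, or all 300 at once). Each yielded batch would then drive one round of feature extraction, vector generation and splitting.

```python
import random
from typing import Iterator, List, Sequence


def condition_batches(
    conditions: Sequence[str],
    batch_sizes: Sequence[int],
    rng: random.Random,
) -> Iterator[List[str]]:
    """Yield randomly chosen batches of unused conditions until none remain."""
    unused = list(conditions)
    sizes = iter(batch_sizes)
    while unused:                               # S316: any unused conditions left?
        m = next(sizes, len(unused))            # schedule exhausted -> take the rest
        m = max(1, min(m, len(unused)))         # never exceed what is still unused
        batch = rng.sample(unused, m)           # S304 / S318: arbitrary selection
        chosen = set(batch)
        unused = [c for c in unused if c not in chosen]
        yield batch                             # each batch drives one classification pass
```

For instance, `condition_batches(conds, [30] * 10, random.Random(7))` over 300 condition names yields ten batches of 30 that together use every condition exactly once, while a schedule of `[300]` uses them all in the first pass.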
Based on the methods for acquiring virus samples provided in the preferred embodiments above, and on the same inventive concept, the embodiments of the present invention also provide a device for acquiring virus samples that implements the method described above.
Fig. 4 shows a schematic structural diagram of a device for acquiring virus samples according to an embodiment of the present invention. Referring to Fig. 4, the device comprises at least: a getter 410, a vector generator 420, a distance calculator 430 and a sorter 450.
The functions of the components of the device for acquiring virus samples, and the connections between them, are now described:
Getter 410 is configured to determine at least one feature extraction condition according to the file type and to arbitrarily select N feature extraction conditions from them, where N is a positive integer.
Vector generator 420, coupled to getter 410, is configured to perform feature extraction on each file using each of the N feature extraction conditions and to determine, from the extraction results, the vector value of the feature extracted by each condition; and
to generate, for each file, a corresponding class vector from the vector values of the features of the N feature extraction conditions.
Distance calculator 430, coupled to vector generator 420, is configured to arbitrarily select one class vector as the classification base point and to calculate the classifying distance between each of the other class vectors and the classification base point.
Sorter 450, coupled to distance calculator 430, is configured to classify the files according to the calculated classifying distances.
It should be noted that when the device provided by the embodiments is used to acquire virus samples, the samples are acquired class by class after they have been classified; that is, the embodiments can classify virus samples. The device can also classify files other than virus samples. To keep the description of the device clear and concise, virus samples are referred to simply as files in the embodiments and their preferred embodiments.
In the embodiments of the present invention, feature extraction conditions are determined according to the file type, feature extraction is performed on the files according to those conditions, and class vectors are generated. Because files of different types have distinct features, determining the extraction conditions separately for each file type ensures that, beyond a basic classification by type, the files receive a concrete and more meaningful classification. For example, for files of the DES type (a type of executable file), the extraction condition may be whether the DES file contains a decompilation instruction; for files in PE (Portable Executable) format, the conditions may be whether the file contains an import table, the number of export tables, and so on. After the class vectors are generated from the conditions, one class vector is arbitrarily selected as the classification base point, the classifying distances between the base point and the other class vectors are calculated, and the files are classified according to those distances. This solves the prior-art problem that each classification pass can use only a single feature, which makes classification time-consuming, inefficient and inaccurate. Because the embodiments generate each file's vector from the determined conditions, several conditions are used at once in a single pass, which saves classification time and improves accuracy. For example, when the device for acquiring virus samples provided by the embodiments is used to classify virus files, the virus files are classified accurately and virus files of the same class are grouped together. Virus files of the same class can then be analyzed and processed more efficiently, unknown samples collected in real time can be judged for security, the security of the terminal system is ensured more efficiently, and the user experience is improved.
As shown in Fig. 4, getter 410 determines at least one feature extraction condition according to the file type, where the files may be of any type. For example, when the file type is DES (a type of executable file), the conditions determined by getter 410 may be whether the file contains strings, whether the file size exceeds a fixed value, the number of methods in the file, the offsets of the methods in the file, whether the file contains a decompilation instruction, and so on. As another example, when the file is in PE format, the conditions may be the value of a field in the PE file, the number of import tables in the PE file, the number of export tables in the PE file, and so on. In addition, when the files classified in the embodiments are virus files, they may be known virus files, unknown virus files, 0-day virus files (virus files that appear in real time), or any other virus files.
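Getter 410's two duties, choosing conditions by file type and then picking N of them at random, could look like the Python sketch below. The catalogue merely paraphrases the DES and PE examples as strings; the identifiers, `select_conditions` and the error handling are hypothetical, and no actual parsing of DES or PE files is shown.

```python
import random
from typing import Dict, List

# Hypothetical catalogue of conditions per file type, paraphrasing the
# examples in the text; the identifiers are illustrative only.
CONDITIONS_BY_TYPE: Dict[str, List[str]] = {
    "DES": [
        "contains_strings",
        "size_exceeds_fixed_value",
        "method_count",
        "method_offsets",
        "contains_decompilation_instruction",
    ],
    "PE": ["field_value", "import_table_count", "export_table_count"],
}


def select_conditions(file_type: str, n: int, rng: random.Random) -> List[str]:
    """Determine the conditions for a file type, then pick N of them at random."""
    available = CONDITIONS_BY_TYPE.get(file_type)
    if not available:
        raise KeyError(f"no feature extraction conditions for type {file_type!r}")
    if not 1 <= n <= len(available):
        raise ValueError(f"N must be between 1 and {len(available)}")
    return rng.sample(available, n)
```

Keeping the catalogue keyed by type mirrors the idea that conditions are defined per file type, so a new type only needs a new entry rather than changes to the selection logic.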
After getter 410 determines the feature extraction conditions, vector generator 420 arbitrarily selects N of them, where N is a positive integer whose value can be set freely. For example, when 100 conditions have been determined, N may be 5 (5 conditions are selected arbitrarily), 30 (30 conditions are selected arbitrarily) or 100 (all determined conditions are selected directly); the embodiments do not limit this. Vector generator 420 performs feature extraction on each file using each of the N conditions and determines, from the extraction results, the vector value of the feature extracted by each condition. For example, suppose 2 conditions are arbitrarily selected from the conditions determined for the types of three files: the first condition is whether the file has an import table, and the second is whether the file size exceeds 20k. Under the first condition, the first file has no import table, while the second and third files do. The vector value of the first condition is therefore set to 1 for the second and third files and to 0 for the first file. Under the second condition, the first and third files are larger than 20k and the second file is smaller than 20k. The vector value of the second condition is therefore set to 2 for the first and third files and to 3 for the second file. It should be noted that these vector values are given only as examples; in practice, vector values can be set differently for different feature extraction conditions, and the embodiments do not limit this.
After the vector values of the features extracted by each condition are determined, vector generator 420 generates, for each file, a class vector from the vector values of the features of the N conditions. Continuing the previous example, in which 2 conditions were arbitrarily selected for three files, the class vector of the first file is (0, 2), that of the second file is (1, 3), and that of the third file is (1, 2).
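The three-file example can be reproduced with a few lines of Python. The file sizes below are hypothetical (the text only says which files exceed 20k), and treating a size of exactly 20k as "not exceeding" is an assumption.

```python
from typing import Dict, Tuple

SIZE_LIMIT_KB = 20

# (has import table, size in KB); the sizes are hypothetical and only need to
# lie on the correct side of 20k, as in the example.
FILES: Dict[str, Tuple[bool, int]] = {
    "file1": (False, 30),
    "file2": (True, 12),
    "file3": (True, 45),
}


def example_vector(has_import_table: bool, size_kb: int) -> Tuple[int, int]:
    first = 1 if has_import_table else 0           # condition 1: import table present
    second = 2 if size_kb > SIZE_LIMIT_KB else 3   # condition 2: size above 20k
    return (first, second)


vectors = {name: example_vector(*feats) for name, feats in FILES.items()}
print(vectors)  # {'file1': (0, 2), 'file2': (1, 3), 'file3': (1, 2)}
```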
After vector generator 420 has generated the class vector of each file, distance calculator 430, which is coupled to vector generator 420, selects one class vector as the classification base point and calculates the classifying distance between each of the other class vectors and the base point. Any distance computation method may be used for this, and the embodiments do not limit it. Preferably, distance calculator 430 uses the Euclidean distance between each class vector and the base point; the Euclidean distance is a commonly used measure that is widely applicable and gives accurate results. Once distance calculator 430 has calculated the classifying distance between each class vector and the base point, sorter 450 groups the distances according to a preset classification threshold. First, determining device 440 judges whether each classifying distance is not greater than the predetermined classification threshold. If so, sorter 450 places the distance in one group; if not, sorter 450 places it in another group. Sorter 450 then divides the files corresponding to the distances into classes by group; that is, files whose classifying distances are in the same group are placed in the same class.
In the embodiments, the predetermined classification threshold may be set manually based on the calculated classifying distances, or it may be a value obtained by applying a suitable computation to all the classifying distances so that they are grouped reasonably; the embodiments do not limit this. Preferably, the mean of the classifying distances is computed and used as the predetermined classification threshold. In the embodiments, N feature extraction conditions are selected to classify the files, where N is a positive integer greater than or equal to 1. When N is greater than 1, several conditions must be taken into account when each file is classified, yet each pass divides all the files into only two classes. Using the mean classifying distance as the threshold therefore captures the average position of all class vectors relative to the base point, so that the files represented by the class vectors are split accurately into two classes with all the corresponding conditions taken into account together.
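Written out for the preferred choices above (Euclidean distance and mean threshold), with notation introduced here for illustration: $\mathbf{v}_i$ is the class vector of file $i$, $i_0$ is the file chosen as base point so that $\mathbf{b} = \mathbf{v}_{i_0}$, and $K$ is the number of files. File $i$ goes into the first group when $d_i \le T$ and into the other group otherwise.

$$
d_i = \lVert \mathbf{v}_i - \mathbf{b} \rVert_2 = \sqrt{\sum_{j=1}^{N} \left(v_{i,j} - b_j\right)^2},
\qquad
T = \frac{1}{K-1} \sum_{i \neq i_0} d_i
$$

Taking, for illustration, the first file's vector $(0, 2)$ from the three-file example above as the base point:

$$
d_2 = \sqrt{(1-0)^2 + (3-2)^2} = \sqrt{2} \approx 1.414,
\qquad
d_3 = \sqrt{(1-0)^2 + (2-2)^2} = 1,
\qquad
T = \frac{\sqrt{2} + 1}{2} \approx 1.207
$$

Since $d_3 \le T < d_2$, the third file falls in one group and the second file in the other.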
After sorter 450 has classified the files using the N conditions arbitrarily selected from the determined feature extraction conditions, if unused conditions remain, getter 410 selects M further conditions from the unused ones. The device shown in Fig. 4 then repeats the classification operation described above on the files using the M newly selected conditions, reclassifying the files, until all determined conditions have been used. The sum of M and N must not exceed the total number of determined conditions. In addition, M may or may not equal N; it can be set according to the situation in practical applications, and the embodiments do not limit this.
In the embodiments of the present invention, files of any kind can be classified by the device for acquiring virus samples provided herein. Preferably, the files classified in the embodiments are virus files. Virus files are classified according to the feature extraction conditions of the different types of virus files, and the resulting classes then distinguish different virus files and their features. Based on these features and classes, other unknown virus files can be identified, so that new virus files are discovered. For example, after the files are classified, whether a file is malicious or non-malicious can be determined from the features of the different classes. Using information such as the distinguishing features of the identified malicious and non-malicious files, other files can in turn be judged malicious or non-malicious, so that virus files are found more efficiently and comprehensively and the user terminal is protected.
As another example, after the acquired files are classified, the files in the same class are security-scanned. The scan matches the files of that class, one by one, against the files stored in a feature database of malicious and non-malicious files. If most of the scanned files in the class (for example, more than 80%) are malicious, the remaining files (for example, the remaining 20%) are also judged malicious; otherwise, the scanned class can be judged non-malicious. After the security of the class has been determined (that is, whether its files are malicious or non-malicious), the files are labeled malicious or non-malicious according to the result and added to the feature database, where they are used to judge files whose security attribute is unknown.
With any one of the preferred embodiments above, or a combination of several of them, the embodiments of the present invention can achieve the following beneficial effects:
In the embodiments of the present invention, feature extraction conditions are determined according to the file type, feature extraction is performed on the files according to those conditions, and class vectors are generated. Because files of different types have distinct features, determining the extraction conditions separately for each file type ensures that, beyond a basic classification by type, the files receive a concrete and more meaningful classification. For example, for files of the DES type (a type of executable file), the extraction condition may be whether the DES file contains a decompilation instruction; for files in PE (Portable Executable) format, the conditions may be whether the file contains an import table, the number of export tables, and so on. After the class vectors are generated from the conditions, one class vector is arbitrarily selected as the classification base point, the classifying distances between the base point and the other class vectors are calculated, and the files are classified according to those distances. This solves the prior-art problem that each classification pass can use only a single feature, which makes classification time-consuming, inefficient and inaccurate. Because the embodiments generate each file's vector from the determined conditions, several conditions are used at once in a single pass, which saves classification time and improves accuracy. For example, when the method for acquiring virus samples provided by the embodiments is used to classify virus files, the virus files are classified accurately and virus files of the same class are grouped together. Virus files of the same class can then be analyzed and processed more efficiently, unknown samples collected in real time can be judged for security, the security of the terminal system is ensured more efficiently, and the user experience is improved.
Numerous specific details are set forth in the description provided here. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment can be combined into one module, unit or component, and can also be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The component embodiments of the present invention may be implemented in hardware, as software modules running on one or more processors, or as a combination of the two. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device for acquiring virus samples according to the embodiments of the present invention. The present invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the method described herein. Such programs implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any order; these words may be interpreted as names.
Those skilled in the art will now recognize that, although multiple exemplary embodiments of the present invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the invention can still be directly determined or derived from the present disclosure without departing from its spirit and scope. Therefore, the scope of the present invention should be understood and deemed to cover all such other variations or modifications.
The embodiments of the present invention also provide the following:
A1. A method for acquiring virus samples, comprising:
determining at least one feature extraction condition according to file type, and arbitrarily selecting N feature extraction conditions from the feature extraction conditions, where N is a positive integer;
performing feature extraction on each file using each of the N feature extraction conditions, and determining, according to the extraction results, the vector value of the feature extracted by each feature extraction condition;
for each file, generating a corresponding class vector according to the vector values of the features of the N feature extraction conditions;
arbitrarily selecting one of the class vectors as a classification base point, and calculating the classifying distance between each class vector other than the one serving as the classification base point and the classification base point;
classifying the files according to the calculated classifying distances.
A2. The method according to A1, further comprising:
after the current classification, determining whether any unused feature extraction conditions remain;
if so, selecting M feature extraction conditions from the unused feature extraction conditions, and performing the classification operation on the files using the M feature extraction conditions;
repeating the classification operation on the files until all the feature extraction conditions have been used.
A3. The method according to A1 or A2, wherein classifying the files according to the classifying distances comprises:
grouping the classifying distances according to their length;
dividing the files into different classes according to the groups.
A4. The method according to A3, wherein grouping the classifying distances according to their length comprises:
determining whether each classifying distance is not greater than a predetermined classification threshold;
if so, placing the classifying distance in one group;
if not, placing the classifying distance in another group;
and dividing the files into different classes according to the groups comprises:
placing the files corresponding to the classifying distances of the same group into the same class.
A5. The method according to A4, wherein the predetermined classification threshold is the mean value of the classifying distances.
A6. The method according to any one of A1 to A5, wherein the files comprise virus files.
The embodiments of the present invention also provide the following:
B7. A device for acquiring virus samples, comprising:
a getter, configured to determine at least one feature extraction condition according to file type and to arbitrarily select N feature extraction conditions from the feature extraction conditions, where N is a positive integer;
a vector generator, configured to perform feature extraction on each file using each of the N feature extraction conditions and to determine, according to the extraction results, the vector value of the feature extracted by each feature extraction condition; and
to generate, for each file, a corresponding class vector according to the vector values of the features of the N feature extraction conditions;
a distance calculator, configured to arbitrarily select one of the class vectors as a classification base point and to calculate the classifying distance between each class vector other than the one serving as the classification base point and the classification base point;
a sorter, configured to classify the files according to the calculated classifying distances.
B8. The device according to B7, wherein:
the vector generator is further configured to determine, after the current classification, whether any unused feature extraction conditions remain;
if so, to select M feature extraction conditions from the unused feature extraction conditions, and to trigger the distance calculator and the sorter to perform the classification operation on the files using the M feature extraction conditions;
and to repeat the classification operation on the files until all the feature extraction conditions have been used.
B9. The apparatus according to B7 or B8, wherein the classifier is further configured to:
group the classification distances according to their length; and
divide the files into different classes according to the different groups.
B10. The apparatus according to B9, further comprising:
a determiner, configured to determine whether each classification distance is not greater than a classification predetermined threshold.
B11. The apparatus according to B10, wherein the classifier is further configured to:
if the determiner finds that a classification distance is not greater than the classification predetermined threshold, place that classification distance into one group;
if the determiner finds that a classification distance is greater than the classification predetermined threshold, place that classification distance into another group;
and wherein dividing the files into different classes according to the different groups comprises:
classifying the files whose classification distances fall in the same group as files of the same class.
B12. The apparatus according to B10 or B11, wherein the classification predetermined threshold is the mean value of the classification distances.

Claims (10)

1. A method for acquiring virus program samples, comprising:
determining at least one feature extraction condition according to the file type, and arbitrarily selecting N feature extraction conditions from the determined conditions, where N is a positive integer;
performing feature extraction on each file using each of the N feature extraction conditions, and determining, from the extraction results, a vector value for the feature extracted under each condition;
generating, for each file, a corresponding classification vector from the vector values of the features under the N feature extraction conditions;
arbitrarily selecting one of the classification vectors as a classification base point, and calculating the classification distance between the base point and each of the other classification vectors; and
classifying the files according to the calculated classification distances.
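Claim 1 leaves the concrete features and the distance metric open. The sketch below fills those gaps with illustrative choices that are not taken from the patent. It uses two hypothetical feature extraction conditions, `byte_length` and `zero_byte_count`, and measures Euclidean distance with `math.dist` (Python 3.8+). It covers the steps from building the classification vectors through computing the distances to a randomly chosen base point.

```python
import math
import random


# Hypothetical feature extraction conditions: each maps raw file bytes to a number.
def byte_length(data):
    return float(len(data))


def zero_byte_count(data):
    return float(data.count(0))


def class_vector(data, conditions):
    """One vector value per feature extraction condition, in a fixed order."""
    return [condition(data) for condition in conditions]


def classification_distances(files, conditions, rng=None):
    """Pick one file's vector as the base point; measure the others against it."""
    rng = rng or random.Random()
    vectors = {name: class_vector(data, conditions) for name, data in files.items()}
    base = rng.choice(sorted(vectors))  # arbitrary base point
    distances = {
        name: math.dist(vec, vectors[base])  # Euclidean distance (assumed metric)
        for name, vec in vectors.items()
        if name != base
    }
    return base, distances


files = {"a.exe": b"\x00\x00ab", "b.exe": b"\x00\x00ab", "c.exe": b"abcdefgh"}
base, distances = classification_distances(files, [byte_length, zero_byte_count], random.Random(0))
```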
2. The method according to claim 1, further comprising:
after the current round of classification, determining whether any unused feature extraction conditions remain;
if so, selecting M feature extraction conditions from the unused feature extraction conditions, and performing the classification operation on the files using the M feature extraction conditions; and
repeating the classification operation on the files until all of the feature extraction conditions have been used.
3. The method according to claim 1 or 2, wherein classifying the files according to the classification distances comprises:
grouping the classification distances according to their length; and
dividing the files into different classes according to the different groups.
4. The method according to claim 3, wherein grouping the classification distances according to their length comprises:
determining whether each classification distance is not greater than a classification predetermined threshold;
if so, placing that classification distance into one group;
if not, placing that classification distance into another group;
and wherein dividing the files into different classes according to the different groups comprises:
classifying the files whose classification distances fall in the same group as files of the same class.
5. The method according to claim 4, wherein the classification predetermined threshold is the mean value of the classification distances.
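Claims 3 to 5 reduce to a two-way split around a threshold that defaults to the mean distance. The minimal Python sketch below assumes the distances arrive as a mapping from file name to classification distance, such as the output of the distance step in claim 1. The function name `group_by_threshold` is hypothetical.

```python
from statistics import mean


def group_by_threshold(distances, threshold=None):
    """Split files into two classes around a threshold (default: mean distance).

    Distances not greater than the threshold form one group; the rest form the other.
    """
    if threshold is None:
        threshold = mean(distances.values())
    near = sorted(name for name, d in distances.items() if d <= threshold)
    far = sorted(name for name, d in distances.items() if d > threshold)
    return near, far


print(group_by_threshold({"b": 1.0, "c": 2.0, "d": 9.0}))  # (['b', 'c'], ['d'])
```

In this example the mean threshold is 4.0. The single outlying distance, 9.0, therefore ends up in its own class, apart from the two nearby files.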
6. The method according to any one of claims 1 to 5, wherein the files comprise virus files.
7. An apparatus for acquiring virus program samples, comprising:
an acquirer, configured to determine at least one feature extraction condition according to the file type, and to arbitrarily select N feature extraction conditions from the determined conditions, where N is a positive integer;
a vector generator, configured to perform feature extraction on each file using each of the N feature extraction conditions, to determine, from the extraction results, a vector value for the feature extracted under each condition, and
to generate, for each file, a corresponding classification vector from the vector values of the features under the N feature extraction conditions;
a distance calculator, configured to arbitrarily select one of the classification vectors as a classification base point, and to calculate the classification distance between the base point and each of the other classification vectors; and
a classifier, configured to classify the files according to the calculated classification distances.
8. The apparatus according to claim 7, wherein:
the vector generator is further configured to determine, after the current round of classification, whether any unused feature extraction conditions remain;
if so, to select M feature extraction conditions from the unused feature extraction conditions and to trigger the distance calculator and the classifier to perform the classification operation on the files using the M feature extraction conditions; and
to repeat the classification operation on the files until all of the feature extraction conditions have been used.
9. The apparatus according to claim 7 or 8, wherein the classifier is further configured to:
group the classification distances according to their length; and
divide the files into different classes according to the different groups.
10. The apparatus according to claim 9, further comprising:
a determiner, configured to determine whether each classification distance is not greater than a classification predetermined threshold.
CN201410006827.3A 2014-01-07 2014-01-07 Method and equipment for acquiring virus program samples Pending CN103761477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410006827.3A CN103761477A (en) 2014-01-07 2014-01-07 Method and equipment for acquiring virus program samples

Publications (1)

Publication Number Publication Date
CN103761477A (en) 2014-04-30

Family

ID=50528713

Country Status (1)

Country Link
CN (1) CN103761477A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1388947A (en) * 2000-08-31 2003-01-01 惠普公司 Character recognition system
CN101609450A (en) * 2009-04-10 2009-12-23 南京邮电大学 Web page classification method based on training set
CN101620616A (en) * 2009-05-07 2010-01-06 北京理工大学 Chinese similar web page de-emphasis method based on microcosmic characteristic
CN102930296A (en) * 2012-11-01 2013-02-13 长沙纳特微视网络科技有限公司 Image identifying method and device
US20130185305A1 (en) * 2010-09-07 2013-07-18 Olympus Corporation Keyword assignment apparatus and recording medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960153A (en) * 2016-01-12 2017-07-18 阿里巴巴集团控股有限公司 The kind identification method and device of virus
CN106960153B (en) * 2016-01-12 2021-01-29 阿里巴巴集团控股有限公司 Virus type identification method and device
US20220417260A1 (en) * 2021-06-29 2022-12-29 Juniper Networks, Inc. Detecting and blocking a malicious file early in transit on a network
US11895129B2 (en) * 2021-06-29 2024-02-06 Juniper Networks, Inc. Detecting and blocking a malicious file early in transit on a network

Similar Documents

Publication Publication Date Title
Saxe et al. eXpose: A character-level convolutional neural network with embeddings for detecting malicious URLs, file paths and registry keys
CN103473506A (en) Method and device of recognizing malicious APK files
CN103226583A (en) Method and device for recognizing advertisement plugin
CN104915327A (en) Text information processing method and device
CN109271788B (en) Android malicious software detection method based on deep learning
CN105653984B (en) File fingerprint method of calibration and device
Yang et al. Detecting android malware by applying classification techniques on images patterns
US11580222B2 (en) Automated malware analysis that automatically clusters sandbox reports of similar malware samples
CN104134041A (en) Anti-detecting method and device of terminal simulator system
KR101582601B1 (en) Method for detecting malignant code of android by activity string analysis
CN104143008A (en) Method and device for detecting phishing webpage based on picture matching
Kedziora et al. Malware detection using machine learning algorithms and reverse engineering of android java code
CN104462985A (en) Detecting method and device of bat loopholes
CN111651768B (en) Method and device for identifying link library function name of computer binary program
CN110704841A (en) Convolutional neural network-based large-scale android malicious application detection system and method
CN104504334A (en) System and method used for evaluating selectivity of classification rules
KR20180079434A (en) Virus database acquisition methods and devices, equipment, servers and systems
CN108229168B (en) Heuristic detection method, system and storage medium for nested files
US10248789B2 (en) File clustering using filters working over file attributes
CN103761477A (en) Method and equipment for acquiring virus program samples
CN104933096A (en) Abnormal key recognition method of database, abnormal key recognition device of database and data system
CN108491718B (en) Method and device for realizing information classification
Kedziora et al. Android malware detection using machine learning and reverse engineering
KR101907443B1 (en) Component-based malicious file similarity analysis device and method
Han et al. Instruction frequency-based malware classification method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140430