Background Art
With the development of computer technology and the Internet, computers are used ever more widely, and the harm caused by computer viruses is growing accordingly. Antivirus technology has developed in step with this trend, and the anti-virus engine is one of its principal embodiments. A virus database is the set of virus samples that have already been found. One of the main functions of a traditional anti-virus engine is to compare every program or file on a machine against the samples in the virus database: a file that matches one of these samples is a virus; otherwise it is not necessarily a virus.
Traditional anti-virus engines mainly use signature-based static scanning: they check whether a file contains a matching signature, and if it does, the file is judged to be infected by a particular virus. For a mature anti-virus engine, however, relying on only one kind of signature (for example, a full-file or partial-file hash) is very difficult, because a single kind can hardly satisfy the requirement that a general-purpose engine offer both high coverage and high detection speed. Anti-virus engine vendors therefore usually extract several classes of signatures from the samples in their virus databases and choose among them for different application scenarios. This raises the following problems:
How to evaluate the application scenarios of different signatures:
Because they differ in extraction position and extraction algorithm, different signatures have different characteristics and suit different scenarios. For example, some signatures have an extremely low false positive rate, but a single signature detects only a few virus samples; for others, a single signature detects many virus samples, but the false positive rate is correspondingly high. How to evaluate different signatures systematically, so that anti-virus engine developers can choose according to the characteristics of the engine being released, is the problem this patent focuses on solving.
How to build a complete signature evaluation system:
As the number of samples in the virus database grows, anti-virus vendors, in order to keep retrieval efficient in each released engine, select a subset of the samples as the engine's base sample library. The conclusions drawn by the signature evaluation method described above may differ from one base sample library to another, so a system is needed that covers the whole process from generating the base sample library to evaluation.
Summary of the invention
To address the above technical problems, the present invention provides a statistics-based method and system for evaluating anti-virus engine signatures. Through quantitative statistics, the method derives various index values for the signatures concerned, and the developer finally selects the required signatures according to the application scenario and similar considerations.
The present invention is implemented by the following method: a statistics-based anti-virus engine signature evaluation method, comprising:
Selecting the required virus samples from a virus database to form a base sample library;
Extracting signatures from the base sample library, the signatures being of one class or multiple classes;
For each class of signatures, setting n check points, each check point taking a value c_i with a corresponding weight w_i, where the values of n and c_i depend on the base sample library and the signatures, and the value of w_i adjusts the relative weight of each check point;
Calculating the index value using the following formula:
where i ∈ [1, n];
c_i is a number of detected virus samples, N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of virus samples detected by this signature class that meet a preset condition is greater than c_i;
Selecting the required signatures with reference to the index value and according to the specific application scenario requirements of the anti-virus engine.
The purpose of setting multiple check points is to capture the characteristics of each class of signatures more accurately by observing its behavior at different check points.
The value of w_i is set by the developer according to the actual situation, depending on which check point's behavior matters most. For example, if the developer cares more about how this class of signatures performs when detecting more than 20 virus samples, the weight w_i corresponding to c_i = 20 is increased.
Further, the index value is the virus sample detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of virus samples detected by this signature class is greater than c_i. The larger the index value, the more this class of signatures tends to detect many virus samples; the smaller it is, the more this class tends to detect few.
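The counting step behind this index can be sketched as follows. The data layout (a mapping from a feature identifier to the set of base-library samples it detects) and all names are hypothetical, chosen only to make the "greater than c_i" count concrete.

```python
from typing import Dict, List, Sequence, Set


def count_features_above(detections: Dict[str, Set[str]], check_point: int) -> int:
    """Number of features whose detected-sample count is greater than check_point."""
    return sum(1 for samples in detections.values() if len(samples) > check_point)


def fractions_at_check_points(
    detections: Dict[str, Set[str]], check_points: Sequence[int]
) -> List[float]:
    """Fraction of all features that exceed each check point."""
    total = len(detections)  # plays the role of N_numberOfAllSignature
    if total == 0:
        raise ValueError("the signature class has no features")
    return [count_features_above(detections, c) / total for c in check_points]


# Hypothetical data: feature id -> ids of the samples it detects.
detections = {
    "sig_a": {"s1", "s2", "s3"},
    "sig_b": {"s4"},
    "sig_c": {"s1", "s5", "s6", "s7", "s8", "s9"},
    "sig_d": set(),
}
fractions = fractions_at_check_points(detections, [1, 2, 5])
print(fractions)
```

Because the comparison is strict, a feature that detects exactly c_i samples is not counted at check point c_i.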
Further, the index value is the single-virus-family detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single virus family, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single virus family is greater than c_i. The larger the index value, the more this class of signatures tends to detect virus samples of a single virus family; the smaller it is, the more this class tends to detect virus samples of multiple virus families.
Further, the index value is the single-virus-type detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single virus type, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single virus type is greater than c_i. The larger the index value, the more this class of signatures tends to detect virus samples of a single virus type; the smaller it is, the more this class tends to detect virus samples of multiple virus types.
Further, the index value is the single-operating-platform detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single operating platform, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single operating platform is greater than c_i. The larger the index value, the more this class of signatures tends to detect virus samples of a single operating platform; the smaller it is, the more this class tends to detect virus samples of multiple operating platforms.
Further, the index value is the single-file-format detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single file format, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single file format is greater than c_i. The larger the index value, the more this class of signatures tends to detect virus samples of a single file format; the smaller it is, the more this class tends to detect virus samples of multiple file formats.
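The four "single attribute" indices differ from the plain detection rate only in which samples are counted per feature. The sketch below takes one possible reading, stated here as an assumption: for each feature, count the largest group of detected samples that share one attribute value (family, type, platform or format). The helper names and sample metadata are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping, Set


def single_group_count(samples: Set[str], attribute_of: Mapping[str, str]) -> int:
    """Largest number of detected samples that share one attribute value."""
    groups = Counter(attribute_of[s] for s in samples)
    return max(groups.values(), default=0)


def count_features_above_single(
    detections: Dict[str, Set[str]],
    attribute_of: Mapping[str, str],
    check_point: int,
) -> int:
    """Features whose single-attribute sample count is greater than check_point."""
    return sum(
        1
        for samples in detections.values()
        if single_group_count(samples, attribute_of) > check_point
    )


# Hypothetical metadata: sample id -> virus family.
family_of = {"s1": "FamA", "s2": "FamA", "s3": "FamA", "s4": "FamB", "s5": "FamC"}
detections = {
    "sig_a": {"s1", "s2", "s3"},  # three samples, all from one family
    "sig_b": {"s1", "s4", "s5"},  # three samples spread over three families
}
single_family = count_features_above_single(detections, family_of, check_point=2)
print(single_family)
```

Both features detect three samples, yet only the first exceeds check point 2 under the single-family count. Passing a mapping from sample to virus type, operating platform or file format instead of `family_of` would give the other three counts under the same assumption.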
As can be seen from the above, a general-purpose anti-virus engine should usually select signatures with a high virus sample detection rate and a high single-virus-family detection rate, because such signatures both detect more viruses with fewer signatures and avoid more false positives. Special-purpose anti-virus engines need case-by-case consideration. For example, some anti-virus vendors provide a file format filtering engine; in that case it is more meaningful to select signatures with a high virus sample detection rate and a high single-file-format detection rate.
Beyond the index values above, the idea of this method can yield further index values that can be quantitatively computed, giving developers more options for selecting effective virus signatures.
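How a developer might turn a table of index values into a choice can be sketched as below. The ranking rule (order classes by their worst score among the indices that matter for the scenario) is an assumption made for illustration, since the text leaves the final choice to the developer; the class names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def rank_signature_classes(
    indices: Dict[str, Dict[str, float]], wanted: Sequence[str]
) -> List[str]:
    """Order signature classes by their lowest score among the wanted indices."""
    return sorted(
        indices,
        key=lambda name: min(indices[name][k] for k in wanted),
        reverse=True,
    )


# Hypothetical index values per signature class.
indices = {
    "full_file_hash": {"sample": 0.10, "single_family": 0.90, "single_format": 0.95},
    "section_hash": {"sample": 0.60, "single_family": 0.70, "single_format": 0.80},
    "string_pattern": {"sample": 0.80, "single_family": 0.30, "single_format": 0.75},
}

# General-purpose engine: sample detection rate and single-family rate.
general = rank_signature_classes(indices, ["sample", "single_family"])
# File format filtering engine: sample detection rate and single-format rate.
format_filter = rank_signature_classes(indices, ["sample", "single_format"])
print(general[0], format_filter[0])
```

With these made-up numbers the two scenarios prefer different signature classes, which is the point of evaluating each class against several indices rather than one.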
The present invention is implemented by the following system: a statistics-based anti-virus engine signature evaluation system, comprising:
A base sample library preparation module, configured to select the required virus samples from a virus database to form a base sample library;
A signature extraction module, configured to extract signatures from the base sample library, the signatures being of one class or multiple classes;
A statistical calculation module, configured to set, for each class of signatures, n check points, each check point taking a value c_i with a corresponding weight w_i, where the values of n and c_i depend on the base sample library and the signatures, and the value of w_i adjusts the relative weight of each check point;
and to calculate the index value using the following formula:
where i ∈ [1, n];
c_i is a number of detected virus samples, N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of virus samples detected by this signature class that meet a preset condition is greater than c_i;
A signature selection module, configured to select the required signatures with reference to the index value and according to the specific application scenario requirements of the anti-virus engine.
Further, the index value is the virus sample detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of virus samples detected by this signature class is greater than c_i.
Further, the index value is the single-virus-family detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single virus family, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single virus family is greater than c_i.
Further, the index value is the single-virus-type detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single virus type, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single virus type is greater than c_i.
Further, the index value is the single-operating-platform detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single operating platform, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single operating platform is greater than c_i.
Further, the index value is the single-file-format detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single file format, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single file format is greater than c_i.
In summary, the present invention provides a statistics-based method and system for evaluating anti-virus engine signatures. Each time an anti-virus vendor releases an engine, it must screen virus samples from the virus database to build a base sample library. The invention extracts signatures afresh from each newly selected base sample library, so as to accommodate the addition of new signatures and the removal of others. It then performs the statistical calculation for each class of signatures using the index value calculation method provided by the invention, and the resulting index values, together with the requirements of the specific application scenario, are used to select suitable signatures.
Detailed Description of the Embodiments
The present invention provides a statistics-based method and system for evaluating anti-virus engine signatures. To help those skilled in the art better understand the technical solutions in the embodiments of the invention, and to make the above objects, features and advantages of the invention more apparent, the technical solutions are described in further detail below with reference to the accompanying drawings:
The present invention first provides a statistics-based anti-virus engine signature evaluation method which, as shown in Figure 1, comprises:
S101: selecting the required virus samples from a virus database to form a base sample library;
S102: extracting signatures from the base sample library, the signatures being of one class or multiple classes;
S103: for each class of signatures, setting n check points, each check point taking a value c_i with a corresponding weight w_i. For example, when n = 5, c_i ∈ [1, 2, 5, 10, 20] and w_i ∈ [1, 1, 1, 1, 5], five check points are set, and the detection behavior of this class of signatures at check point 20 receives the most attention (because the weight at 20 is larger);
The values of n and c_i depend on the base sample library and the signatures, and the value of w_i adjusts the relative weight of each check point;
S104: calculating the index value using the following formula:
where i ∈ [1, n];
c_i is a number of detected virus samples, N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of virus samples detected by this signature class that meet a preset condition is greater than c_i;
For example, suppose a base sample library contains 1000 virus samples and a certain class of signatures has 100 features, so that N_numberOfAllSignature = 100. When c_i = 20, the count term is the number of features that each detect more than 20 virus samples;
S105: selecting the required signatures with reference to the index value and according to the specific application scenario requirements of the anti-virus engine.
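Steps S103 and S104 can be tried out numerically with the example values above (check points 1, 2, 5, 10, 20; weights 1, 1, 1, 1, 5; 100 features). The sketch assumes the index is a weight-normalized sum of the per-check-point fractions, which is an assumption because the formula is not reproduced in this text; the per-check-point counts are hypothetical.

```python
from typing import Sequence


def weighted_index(
    counts_above: Sequence[int], total_features: int, weights: Sequence[float]
) -> float:
    """Assumed form: sum(w_i * count_i / total) / sum(w_i)."""
    if total_features <= 0:
        raise ValueError("total_features must be positive")
    if len(counts_above) != len(weights):
        raise ValueError("one weight is needed per check point")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * n / total_features for w, n in zip(weights, counts_above))
    return weighted / weight_sum


check_points = [1, 2, 5, 10, 20]
weights = [1, 1, 1, 1, 5]
total_features = 100  # N_numberOfAllSignature in the example
# Hypothetical counts of features detecting more than c_i samples.
counts_above = [90, 70, 40, 20, 10]

for c, n in zip(check_points, counts_above):
    print(f"check point {c}: {n} of {total_features} features")

emphasised = weighted_index(counts_above, total_features, weights)
uniform = weighted_index(counts_above, total_features, [1, 1, 1, 1, 1])
print(emphasised, uniform)
```

With these made-up counts, weighting check point 20 five times as heavily pulls the index toward the small fraction of features that detect more than 20 samples, so the emphasised value comes out lower than the uniformly weighted one.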
Preferably, described desired value index is condition code virus sample recall rate, and the size of described desired value index is directly proportional to the viral sample Detection capability of condition code, wherein, and described N
numberOfAllSignaturefor the feature total quantity of such condition code in basic herbarium; Described
the viral sample quantity that such condition code of serving as reasons detects is greater than c
itime feature quantity.
Preferably, described desired value index is the single virus family recall rate of condition code, and the size of described desired value index is directly proportional to the viral sample Detection capability of the single virus family of condition code, wherein, and described N
numberOfAllSignaturefor the feature total quantity of such condition code in basic herbarium, described in
the viral sample quantity of what such condition code of serving as reasons detected belong to single virus family is greater than c
itime feature quantity.
Preferably, described desired value index is the single Virus Type recall rate of condition code, and the size of described desired value index is directly proportional to the viral sample Detection capability of the single Virus Type of condition code, wherein, and described N
numberOfAllSignaturefor the feature total quantity of such condition code in basic herbarium, described in
the viral sample quantity of what such condition code of serving as reasons detected belong to single Virus Type is greater than c
itime feature quantity.
Preferably, described desired value index is the single operation platform recall rate of condition code, and the size of described desired value index is directly proportional to the viral sample Detection capability of the single operation platform of condition code, wherein, and described N
numberOfAllSignaturefor the feature total quantity of such condition code in basic herbarium, described in
the viral sample quantity of what such condition code of serving as reasons detected belong to single operation platform is greater than c
itime feature quantity.
Preferably, described desired value index is the single file layout recall rate of condition code, and the size of described desired value index is directly proportional to the viral sample Detection capability of the single file layout of condition code, wherein, and described N
numberOfAllSignaturefor the feature total quantity of such condition code in basic herbarium, described in
the viral sample quantity of single file layout that what such condition code of serving as reasons detected belong to is greater than c
itime feature quantity.
The present invention also provides a statistics-based anti-virus engine signature evaluation system which, as shown in Figure 2, comprises:
A base sample library preparation module 201, configured to select the required virus samples from a virus database to form a base sample library;
A signature extraction module 202, configured to extract signatures from the base sample library, the signatures being of one class or multiple classes;
A statistical calculation module 203, configured to set, for each class of signatures, n check points, each check point taking a value c_i with a corresponding weight w_i, where the values of n and c_i depend on the base sample library and the signatures, and the value of w_i adjusts the relative weight of each check point;
and to calculate the index value using the following formula:
where i ∈ [1, n];
c_i is a number of detected virus samples, N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of virus samples detected by this signature class that meet a preset condition is greater than c_i;
A signature selection module 204, configured to select the required signatures with reference to the index value and according to the specific application scenario requirements of the anti-virus engine.
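The way the four modules hand data to one another can be sketched as a small pipeline. This is only an illustration of the data flow in Figure 2: the class name, field names, callable signatures and the toy behaviors plugged in below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Sample = str
Detections = Dict[str, Set[Sample]]  # feature id -> samples it detects


@dataclass
class EvaluationSystem:
    prepare_base_library: Callable[[List[Sample]], List[Sample]]         # module 201
    extract_signatures: Callable[[List[Sample]], Dict[str, Detections]]  # module 202
    compute_index: Callable[[Detections], float]                         # module 203
    choose: Callable[[Dict[str, float]], List[str]]                      # module 204

    def run(self, virus_database: List[Sample]) -> List[str]:
        base = self.prepare_base_library(virus_database)
        per_class = self.extract_signatures(base)
        indices = {name: self.compute_index(d) for name, d in per_class.items()}
        return self.choose(indices)


# Toy stand-ins for the four modules.
system = EvaluationSystem(
    prepare_base_library=lambda db: db[:4],
    extract_signatures=lambda base: {
        "per_sample": {s: {s} for s in base},  # one feature per sample
        "shared": {"sig_0": set(base)},        # one feature for all samples
    },
    # Fraction of features detecting more than one sample (single check point).
    compute_index=lambda d: sum(len(v) > 1 for v in d.values()) / len(d),
    choose=lambda idx: sorted(idx, key=idx.get, reverse=True),
)
chosen = system.run(["s1", "s2", "s3", "s4", "s5"])
print(chosen)
```

Keeping each module behind a plain callable means the base sample library can be rebuilt for every engine release and the rest of the pipeline rerun unchanged.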
Preferably, the index value is the virus sample detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of virus samples detected by this signature class is greater than c_i.
Preferably, the index value is the single-virus-family detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single virus family, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single virus family is greater than c_i.
Preferably, the index value is the single-virus-type detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single virus type, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single virus type is greater than c_i.
Preferably, the index value is the single-operating-platform detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single operating platform, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single operating platform is greater than c_i.
Preferably, the index value is the single-file-format detection rate of the signatures, and its magnitude is proportional to the signatures' ability to detect virus samples of a single file format, where N_numberOfAllSignature is the total number of features of this signature class in the base sample library, and the count term for check point c_i is the number of features for which the number of detected virus samples belonging to a single file format is greater than c_i.
As described above, the present invention provides specific embodiments of a statistics-based method and system for evaluating anti-virus engine signatures. It differs from traditional methods in that the traditional way of selecting signatures does not suit every situation, and traditional signature evaluation methods do not necessarily suit each different base sample library. The method and system of the invention establish a signature evaluation system built on the base sample library: for each base sample library that is built, signatures are extracted again, multiple check points are chosen and a weight is set for each as needed, and the index values are calculated with the given calculation method. These index values make it possible to evaluate the scenarios to which each class of signatures is suited, which fits the release process of every new version of the anti-virus engine.
The above embodiments are intended to illustrate, not to limit, the technical solutions of the present invention. Any modification or partial replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the invention.