CN110210218B - Virus detection method and related device - Google Patents

Virus detection method and related device

Publication number
CN110210218B
Authority
CN
China
Prior art keywords
sample detection
sample
file
virus
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810402154.1A
Other languages
Chinese (zh)
Other versions
CN110210218A (en)
Inventor
雷经纬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810402154.1A
Publication of CN110210218A
Application granted
Publication of CN110210218B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a virus detection method comprising: acquiring a target feature vector of a file to be detected; matching the target feature vector against a sample detection rule set to generate a target matching result, wherein the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the first sample detection rule represents the correspondence between a security type and path information, the second sample detection rule represents the correspondence between a virus type and path information, and the path information indicates the occurrence probability of a behavior identifier; and determining a virus detection result of the file to be detected according to the target matching result. An embodiment of the invention also provides a virus detection device. The embodiments remove the need to extract feature codes manually and allow the type of the file to be detected to be identified accurately, which helps improve the security of the scheme.

Description

Virus detection method and related device
Technical Field
The present invention relates to the field of information security technologies, and in particular, to a method and a related apparatus for virus detection.
Background
With the development of computer and network technology, the variety of viruses keeps growing, and highly destructive, well-concealed viruses have persisted for a long time. A virus is a program or a piece of executable code that, like a biological virus, is characterized by self-reproduction, mutual infection, and activation and regeneration. Viruses can attach themselves to various types of files and spread as those files are copied or transferred from one user to another.
At present, virus detection is usually performed as follows: manually labeled virus samples are analyzed, a binary segment is extracted from each virus sample as a feature code, and a file to be detected that hits the feature code is deemed to carry the virus.
However, this approach has a problem: because the feature codes are determined in advance, a newly emerged virus is difficult to detect. In other words, the existing scheme cannot detect unknown viruses, which is detrimental to information security.
Disclosure of Invention
The embodiment of the invention provides a virus detection method and a related device, which can save the process of manually extracting feature codes on one hand, and can accurately sense the type of a file to be detected on the other hand, thereby being beneficial to improving the safety of a scheme.
In view of the above, a first aspect of the present invention provides a method for detecting a virus, including:
acquiring a target characteristic vector of a file to be detected;
matching the target characteristic vectors by adopting a sample detection rule set to generate a target matching result, wherein the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the first sample detection rule is used for representing the corresponding relation between the security type and the path information, the second sample detection rule is used for representing the corresponding relation between the virus type and the path information, and the path information is used for indicating the occurrence probability of the behavior identifier;
and determining the virus detection result of the file to be detected according to the target matching result.
In a second aspect, the present invention provides a virus detection apparatus, including:
the acquisition module is used for acquiring a target characteristic vector of the file to be detected;
a generating module, configured to match the target feature vector obtained by the obtaining module by using a sample detection rule set to generate a target matching result, where the sample detection rule set includes a first sample detection rule and a second sample detection rule, the first sample detection rule is used to indicate a correspondence between a security type and path information, the second sample detection rule is used to indicate a correspondence between a virus type and the path information, and the path information is used to indicate an occurrence probability of a behavior identifier;
and the determining module is used for determining the virus detection result of the file to be detected according to the target matching result generated by the generating module.
A third aspect of the present invention provides a virus detection apparatus, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory, and includes the steps of:
acquiring a target characteristic vector of a file to be detected;
matching the target characteristic vectors by adopting a sample detection rule set to generate a target matching result, wherein the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the first sample detection rule is used for representing the corresponding relation between the security type and the path information, the second sample detection rule is used for representing the corresponding relation between the virus type and the path information, and the path information is used for indicating the occurrence probability of the behavior identifier;
determining a virus detection result of the file to be detected according to the target matching result;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the above-described aspects.
According to the technical scheme, the embodiment of the invention has the following advantages:
the embodiment of the invention provides a virus detection method, which comprises the steps of firstly obtaining a target characteristic vector of a file to be detected, matching the target characteristic vector by adopting a sample detection rule set to generate a target matching result, wherein the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the first sample detection rule is used for expressing the corresponding relation between a security type and path information, the second sample detection rule is used for expressing the corresponding relation between a virus type and the path information, the path information is used for indicating the occurrence probability of a behavior identifier, and finally determining the virus detection result of the file to be detected according to the target matching result. Through the mode, on one hand, the process of manually extracting the feature codes can be saved, the matching result of the file to be detected can be obtained by directly matching the sample detection rule set, the matching result can represent the safety of the file to be detected, on the other hand, the sample detection rule set at least comprises the rules for detecting the safety type and the virus type, the type of the file to be detected can be accurately sensed, and the scheme safety is favorably improved.
Drawings
FIG. 1 is a schematic diagram of an architecture of a virus detection system according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a call relationship of a virus detection system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a method for virus detection according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of obtaining a target feature vector according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of generating a decision tree model file according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart illustrating the generation of sample detection rules according to an embodiment of the present invention;
FIG. 7 is a diagram of a decision tree model according to an embodiment of the present invention;
FIG. 8 is a schematic flowchart illustrating an exemplary process of checking a file to be checked according to an embodiment of the present invention;
FIG. 9 is a schematic flow chart of virus detection in an application scenario of the present invention;
FIG. 10 is a schematic diagram of an embodiment of a virus detection apparatus in an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a virus detection apparatus according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a virus detection method and a related device, which can save the process of manually extracting feature codes on one hand, and can accurately sense the type of a file to be detected on the other hand, thereby being beneficial to improving the safety of a scheme.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that the present invention is mainly applicable to the detection of Android viruses and may also be applied to the detection of other types of viruses, such as computer viruses, Apple system (iOS) viruses, and Microsoft system (Windows) viruses; Android virus detection is used as the example in this description. Android is released with a set of core application packages, including a client, a Short Message Service (SMS) program, a calendar, a map, a browser, and a contact management program.
Meanwhile, the Android system is also attacked by Android viruses, such as the "hundred-bug trojan" (which infects promoted applications), the "lizard tail trojan" (which infects system library files, replaces system files, injects into system processes, steals information, and obtains calls and messages), and the "authority killer" (which resists security software, obtains messages, pops up advertisements, and carries out promotion and traffic inflation). The present scheme can detect known Android viruses as well as other, unknown Android viruses.
Referring to fig. 1, fig. 1 is a schematic diagram of an architecture of a virus detection system according to an embodiment of the present invention, and as shown in the drawing, a virus detection device in the present disclosure may be deployed in a server, and after the server obtains a virus detection result, the virus detection result is sent to a terminal device, so that a user can know the virus detection result of a file to be detected through a display interface of the terminal device. Optionally, the virus detection device in the present solution may also be deployed in a terminal device, and the terminal device directly detects the file to be detected and displays the virus detection result on a display interface at the front end.
The virus detection device of the present invention may comprise four logic modules, each implementing a corresponding function. Referring to fig. 2, fig. 2 is a schematic diagram of a call relationship of a virus detection system according to an embodiment of the present invention. As shown in the figure, the four logic modules are a behavior data extraction module S1, a decision tree model training module S2, a detection flow control module S3, and a rule extraction module S4. The behavior data extraction module S1 is called by both the decision tree model training module S2 and the detection flow control module S3. It can be understood that the behavior data extraction module S1 may be an independent module, or may be integrated into each of the two modules that call it. A batch of Android virus samples and Android security samples is input to the decision tree model training module S2, which calls the behavior data extraction module S1 to obtain vector representations of the training samples. The rule extraction module S4 then selects decision tree paths and sample types from the output of the decision tree model according to a preset rule generation condition, thereby generating the sample detection rule set. The detection flow control module S3 calls the behavior data extraction module S1 to obtain the vector representation of the sample to be detected, and finally applies the sample detection rule set generated by the rule extraction module S4 to obtain the security state of the sample to be detected.
Referring to fig. 3, a method for detecting a virus according to the present invention will be described below from the perspective of a virus detection apparatus, wherein an embodiment of the method for detecting a virus according to the present invention includes:
101. Acquiring a target feature vector of a file to be detected;
In this embodiment, the virus detection apparatus first receives a virus detection instruction. The instruction carries an identifier of the file to be detected, by which the file can be determined. The apparatus then extracts the behavior identifiers of the file to be detected and generates a target feature vector from the extraction result.
102. Matching the target characteristic vectors by adopting a sample detection rule set to generate a target matching result, wherein the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the first sample detection rule is used for expressing the corresponding relation between the security type and the path information, the second sample detection rule is used for expressing the corresponding relation between the virus type and the path information, and the path information is used for indicating the occurrence probability of the behavior identifier;
In this embodiment, the virus detection apparatus matches the target feature vector against at least one rule in the sample detection rule set. Specifically, the sample detection rule set includes a first sample detection rule and a second sample detection rule. The first sample detection rule represents the correspondence between the security type and path information; for example, the path information corresponding to the security type is "contains behavior identifier 1, does not contain behavior identifier 2, contains behavior identifier 4, and contains behavior identifier 5". It should be understood that this correspondence is only an illustration and should not be construed as a limitation on the present application. The second sample detection rule represents the correspondence between the virus type and path information; for example, the path information corresponding to the virus type is "does not contain behavior identifier 1, contains behavior identifier 3, does not contain behavior identifier 5, and does not contain behavior identifier 6". Likewise, this correspondence is only an illustration and should not be construed as a limitation on the present application.
The path information includes the occurrence probability of each behavior identifier, where "1" may indicate that a behavior identifier occurs and "0" may indicate that it does not occur.
The behavior identifiers contained in the target feature vector are matched against the behavior identifiers indicated by a sample detection rule (the first sample detection rule or the second sample detection rule). Specifically, assume that the first sample detection rule is "security type: contains behavior identifier 1, does not contain behavior identifier 2, contains behavior identifier 4, and contains behavior identifier 5", where the path information is "contains behavior identifier 1, does not contain behavior identifier 2, contains behavior identifier 4, and contains behavior identifier 5". Assume that the target feature vector is [10011]; for ease of understanding, the behavior identifiers of the target feature vector are described in Table 1.
TABLE 1
Figure GDA0004094932110000061
As shown in Table 1, comparing the behavior identifiers in the target feature vector with those defined in the first sample detection rule shows that the target feature vector contains behavior identifier 1, does not contain behavior identifier 2, and contains behavior identifiers 4 and 5. The target feature vector is therefore considered to match the first sample detection rule, and a corresponding target matching result can be generated.
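The comparison in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary encoding of a rule, the 1-based position of each behavior identifier in the vector, and the names `FIRST_RULE` and `matches` are hypothetical and are not taken from the embodiment.

```python
from typing import Dict, List

# Hypothetical encoding: a rule maps a 1-based behavior identifier to the value
# its path requires (1 = must occur, 0 = must not occur). Identifiers that the
# rule does not mention are left unconstrained.
FIRST_RULE: Dict[int, int] = {1: 1, 2: 0, 4: 1, 5: 1}


def matches(rule: Dict[int, int], vector: List[int]) -> bool:
    """Return True if every condition on the rule's path holds for the vector."""
    return all(
        ident <= len(vector) and vector[ident - 1] == required
        for ident, required in rule.items()
    )


target_vector = [1, 0, 0, 1, 1]  # the vector [10011] from the example
print(matches(FIRST_RULE, target_vector))  # True
```

Behavior identifier 3 is not mentioned by the rule, so its value in the vector does not affect the outcome in this sketch.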
103. And determining the virus detection result of the file to be detected according to the target matching result.
In this embodiment, the virus detection device determines the virus detection result of the file to be detected according to the target matching result, and can send the virus detection result to the client, so that the user can know whether the file to be detected is safe through the client.
The virus detection result may be one of three types: a security type, when the target feature vector matches the first sample detection rule; a virus type, when it matches the second sample detection rule; and an unknown type, when it matches neither the first sample detection rule nor the second sample detection rule.
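A minimal sketch of this three-way decision follows. The rule encoding, the function names, and the order in which the two rule lists are checked are assumptions made for illustration; the text does not say how a vector matching both kinds of rule would be handled. The two rules reuse the illustrative path information given for step 102.

```python
from typing import Dict, List

Rule = Dict[int, int]  # hypothetical: 1-based behavior identifier -> required 0/1


def rule_matches(rule: Rule, vector: List[int]) -> bool:
    """True if the vector satisfies every condition on the rule's path."""
    return all(
        ident <= len(vector) and vector[ident - 1] == required
        for ident, required in rule.items()
    )


def detect(vector: List[int], safe_rules: List[Rule], virus_rules: List[Rule]) -> str:
    """Map a target feature vector to 'safe', 'virus', or 'unknown'."""
    # Assumption: security rules are checked first; the source does not
    # specify a precedence between the two rule types.
    if any(rule_matches(r, vector) for r in safe_rules):
        return "safe"
    if any(rule_matches(r, vector) for r in virus_rules):
        return "virus"
    return "unknown"


SAFE_RULES = [{1: 1, 2: 0, 4: 1, 5: 1}]   # illustrative first sample detection rule
VIRUS_RULES = [{1: 0, 3: 1, 5: 0, 6: 0}]  # illustrative second sample detection rule

print(detect([1, 0, 0, 1, 1, 0], SAFE_RULES, VIRUS_RULES))  # safe
print(detect([0, 0, 1, 0, 0, 0], SAFE_RULES, VIRUS_RULES))  # virus
print(detect([1, 1, 1, 1, 1, 1], SAFE_RULES, VIRUS_RULES))  # unknown
```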
The embodiment of the invention provides a virus detection method, which comprises the steps of firstly obtaining a target characteristic vector of a file to be detected, matching the target characteristic vector by adopting a sample detection rule set to generate a target matching result, wherein the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the first sample detection rule is used for representing the corresponding relation between a security type and path information, the second sample detection rule is used for representing the corresponding relation between a virus type and the path information, the path information is used for indicating the occurrence probability of a behavior identifier, and finally determining the virus detection result of the file to be detected according to the target matching result. Through the mode, on one hand, the process of manually extracting the feature codes can be saved, the matching result of the file to be detected can be obtained by directly matching the sample detection rule set, the matching result can represent the safety of the file to be detected, on the other hand, the sample detection rule set at least comprises the rules for detecting the safety type and the virus type, the type of the file to be detected can be accurately sensed, and the scheme safety is favorably improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in a first optional embodiment of the method for detecting a virus according to the embodiment of the present invention, the obtaining a target feature vector of a file to be detected may include:
acquiring log information of a file to be detected, wherein the log information comprises N behavior identifiers and N trigger times, and N is an integer greater than or equal to 1;
counting the occurrence probability of each behavior identifier in the N behavior identifiers;
and generating a target characteristic vector of the file to be detected according to the N triggering times and the occurrence probability of each behavior identifier.
This embodiment describes how to obtain the target feature vector of a file to be detected. The target feature vector is obtained by arranging the behavior identifiers in order of trigger time, from earliest to latest.
Specifically, referring to fig. 4, fig. 4 is a schematic flowchart of obtaining a target feature vector according to an embodiment of the present invention. In step 201, a file to be detected is obtained; the file may be a picture file, a video file, a document file, an audio file, an application program, or the like. In step 202, the file to be detected is sent to a simulator to be run. The simulator may be an Android simulator; it is an operating environment in which a log recording function is executed. When the file to be detected runs in the simulator and triggers the execution of a certain function, a piece of log information may be output, and the log information includes two fields: a behavior identifier field and a trigger time field. In step 203, the virus detection apparatus extracts the log information produced by the run in the simulator. Finally, in step 204, the log information is converted into a target feature vector.
How to convert the log information into the target feature vector will be described below with reference to table 2.
TABLE 2
Figure GDA0004094932110000081
Taking Table 2 as an example, 11 behavior identifiers (that is, N is 11) are counted together with the occurrence probability of each behavior identifier. If a behavior identifier occurs, it is marked as 1; otherwise, it is marked as 0. Arranging the marks from front to back gives a feature vector, and the target feature vector of the file to be detected indicated by Table 2 is [ 11 011 011 0].
It should be noted that, in the present solution, the above manner may be adopted to generate the feature vector of the security sample and the feature vector of the virus sample, which is not described herein again.
Secondly, in this embodiment of the invention, the virus detection device acquires the log information of the file to be detected, counts the occurrence probability of each of the N behavior identifiers, and generates the target feature vector of the file to be detected according to the N trigger times and the occurrence probability of each behavior identifier. In this way, the feature vector is generated from the occurrence probability and trigger time of the behavior identifiers, so that the feature vector is associated with the behavior identifiers. This provides a reliable basis for subsequent rule matching and improves the feasibility of the scheme.
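The conversion from log information to a 0/1 feature vector might look like the following sketch. The record format, the behavior names, and the use of a fixed, ordered list of known behavior identifiers to define the vector positions are all assumptions; the text does not spell out how trigger time determines the ordering, so this sketch only reads the time field and does not use it for positioning.

```python
from typing import List, Tuple

# Hypothetical log record: (behavior identifier, trigger time in seconds).
LogRecord = Tuple[str, float]


def to_feature_vector(log: List[LogRecord], known_behaviors: List[str]) -> List[int]:
    """Mark each known behavior identifier 1 if it occurs in the log, else 0."""
    seen = {behavior for behavior, _trigger_time in log}
    return [1 if behavior in seen else 0 for behavior in known_behaviors]


# Hypothetical behavior names; the real identifiers are not given in the text.
known = ["send_sms", "read_contacts", "open_socket", "load_dex"]
log = [("open_socket", 0.8), ("send_sms", 1.5), ("send_sms", 2.1)]

print(to_feature_vector(log, known))  # [1, 0, 1, 0]
```

Using one fixed identifier order for every file keeps the vectors of different files comparable, which the later rule matching relies on.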
Optionally, on the basis of the embodiment corresponding to fig. 3, in a second optional embodiment of the method for virus detection provided in the embodiment of the present invention, before matching the target feature vector by using the sample detection rule set, the method may further include:
acquiring a feature vector of a security sample and a feature vector of a virus sample;
obtaining a first sample detection rule corresponding to a feature vector of a safety sample through a decision tree model, wherein the decision tree model is used for outputting path information and sample types, and the sample types comprise a safety type and a virus type;
and obtaining a second sample detection rule corresponding to the feature vector of the virus sample through the decision tree model.
This embodiment describes how the sample detection rules (including the first sample detection rule and the second sample detection rule) are generated, with reference to fig. 5, which is a schematic flowchart of generating the decision tree model file in an embodiment of the present invention. In step 301, a batch of positive samples and negative samples is obtained, where the positive samples are assumed to be security samples and the negative samples are assumed to be virus samples. Then, in step 302, a feature vector of each security sample and of each virus sample is generated using the feature vector generation method provided in the first optional embodiment corresponding to fig. 3.
In step 303, a decision tree model is trained on the feature vectors of the security samples and the feature vectors of the virus samples. Each sample has a set of attributes and a class, and the classes are determined in advance, so learning yields a classifier that can correctly classify newly appearing objects. The decision tree model includes judgment conditions, path information, and decision results. In step 304, a decision tree model library file (e.g., "sample label 1-path information 1; sample label 2-path information 2") is generated according to the path information. The decision tree model library file is used to generate the sample detection rules: if the input is a security sample, the output according to the decision tree model library file may be a first sample detection rule; if the input is a virus sample, the output may be a second sample detection rule.
Secondly, in this embodiment of the present invention, the virus detection apparatus may generate the first sample detection rule and the second sample detection rule by first obtaining the feature vectors of the security samples and of the virus samples, then inputting both into the decision tree model, and determining the path information according to the output decision result, thereby generating the sample detection rules. Using a decision tree model in this way has the following advantages. First, a decision tree is easy to understand and implement and directly reflects the characteristics of the data. Second, a decision tree can produce feasible and effective results on large amounts of data in a relatively short time. Third, the model is easily evaluated by static testing, and the model's confidence can be determined.
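To illustrate how "sample label, path information" entries could be read off a decision tree, the sketch below walks a small hand-built tree and lists every root-to-leaf path. The tuple representation of a node, the tree itself, and the name `iter_paths` are hypothetical; training is not shown, and a real model file would come from whatever decision tree implementation is used.

```python
from typing import Iterator, List, Tuple

# Hypothetical node format: a leaf is a sample label string; an internal node
# is (behavior identifier, subtree if absent, subtree if present).
Path = List[Tuple[int, int]]  # list of (behavior identifier, required 0/1)


def iter_paths(node, prefix: Tuple[Tuple[int, int], ...] = ()) -> Iterator[Tuple[str, Path]]:
    """Yield (sample label, path information) for each root-to-leaf path."""
    if isinstance(node, str):
        yield node, list(prefix)
        return
    ident, if_absent, if_present = node
    yield from iter_paths(if_absent, prefix + ((ident, 0),))
    yield from iter_paths(if_present, prefix + ((ident, 1),))


# A tiny, hand-built tree standing in for a trained model.
TREE = (1, (3, "safe", "virus"), (2, "safe", "virus"))

for label, path in iter_paths(TREE):
    print(label, path)
# safe [(1, 0), (3, 0)]
# virus [(1, 0), (3, 1)]
# safe [(1, 1), (2, 0)]
# virus [(1, 1), (2, 1)]
```

Paths ending in a "safe" leaf would correspond to candidates for the first sample detection rule, and paths ending in a "virus" leaf to candidates for the second.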
Optionally, on the basis of the second embodiment corresponding to fig. 3, in a third optional embodiment of the method for virus detection provided in the embodiment of the present invention, obtaining, by using the decision tree model, the first sample detection rule corresponding to the feature vector of the security sample may include:
inputting the feature vector of the security sample into a decision tree model to obtain X first candidate sample detection rules, wherein X is an integer greater than or equal to 1;
selecting, from the X first candidate sample detection rules, Y first sample detection rules meeting a preset rule generation condition, wherein Y is an integer greater than or equal to 1 and less than or equal to X;
obtaining, by the decision tree model, a second sample detection rule corresponding to the feature vector of the virus sample may include:
inputting the feature vector of the virus sample into the decision tree model to obtain Q second candidate sample detection rules, wherein Q is an integer greater than or equal to 1;
and selecting, from the Q second candidate sample detection rules, P second sample detection rules meeting the preset rule generation condition, wherein P is an integer greater than or equal to 1 and less than or equal to Q.
This embodiment describes how to generate the first sample detection rule and the second sample detection rule. Specifically, the feature vectors of all the security samples are input to the decision tree model, which may output X first candidate sample detection rules. These X candidate rules are not necessarily all applicable; for example, the path information in some of them is very short, or the effective node proportion is very low. In this case, the first candidate sample detection rules that satisfy the preset rule generation condition need to be selected from the X candidates, and the candidates that satisfy the condition are the Y first sample detection rules.
Similarly, the feature vectors of all the virus samples are input to the decision tree model, and Q second candidate sample detection rules may be output, but the Q second candidate sample detection rules are not necessarily all applicable, for example, path information included in some second candidate sample detection rules is very short, or the effective node proportion is very low, and the like, in this case, a second candidate sample detection rule meeting a preset rule generation condition needs to be selected from the Q second candidate sample detection rules, and the second candidate sample detection rules meeting the condition are P second sample detection rules.
For convenience of introduction, please refer to fig. 6, where fig. 6 is a schematic flow chart illustrating the generation of the sample detection rule according to the embodiment of the present invention, and as shown in the figure, specifically:
in step 401, a decision tree model file is obtained;
in step 402, the decision tree model file may be filtered according to two conditions, where the first condition is the path length: a path that is too short is regarded as not satisfying the rule generation condition;
in step 403, the second condition is the forward node proportion: a path whose forward node proportion is too low is regarded as not satisfying the rule generation condition;
in step 404, a sample detection rule set (including a first sample detection rule and a second sample detection rule) is generated by using the filtered decision tree model file (including the sample label and the path information).
In the embodiment of the present invention, it is considered that not all path information is suitable for constructing the sample detection rule, and therefore some "thresholds" need to be set to generate the sample detection rule. By the method, the reliability of the sample detection rule can be improved, so that the type of the file to be detected can be accurately sensed, and the safety of the scheme is improved.
Optionally, on the basis of the third embodiment corresponding to fig. 3, in a fourth optional embodiment of the method for virus detection provided in the embodiment of the present invention, selecting Y first sample detection rules that satisfy the preset rule generation condition from among X first to-be-selected sample detection rules may include:
selecting Y first sample detection rules with path length larger than a preset length threshold from the X first sample detection rules to be selected;
selecting P second sample detection rules meeting the preset rule generation condition from the Q second candidate sample detection rules, wherein the P second sample detection rules comprise:
and selecting P second sample detection rules with the path length larger than a preset length threshold from the Q second sample detection rules to be selected.
In this embodiment, a method for selecting a sample detection rule will be described. For convenience of description, please refer to fig. 7, which is a schematic diagram of a decision tree model in an embodiment of the present invention. As shown in fig. 7, the decision tree model has a depth of 6, each path runs from the vertex to a sample label (virus or security), and there are 10 paths in total. Taking the first sample detection rule (i.e. the security label path shown by the shaded portion) as an example, the path from the vertex to the corresponding label passes through 6 judgment nodes (i.e. path information), which are respectively behavior flag [4] > 0.35, behavior flag [2] > 0.235, behavior flag [1] > 0.35, behavior flag [7] > 0.76, behavior flag [5] > 0.65, and behavior flag [72] > 0.75. Since the feature vector is not 0 or 1, the situation of unknown security does not occur.
The first sample detection rule generated according to the 6 judgment nodes is as follows: presence of behavior identifier 4, presence of behavior identifier 2, presence of behavior identifier 1, presence of behavior identifier 7, presence of behavior identifier 5, absence of behavior identifier 72.
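As a purely illustrative sketch, and not a limitation of the present invention, the conversion of the judgment nodes on one path into a presence/absence rule might look as follows. The names `DecisionNode` and `path_to_rule` are hypothetical, and the assumption that the branch taken at each node decides between "presence" and "absence" (with the last node following the "not greater than" branch) is made only to reproduce the rule stated above.

```python
from typing import List, NamedTuple


class DecisionNode(NamedTuple):
    behavior_id: int    # index of the behavior identifier tested at this node
    threshold: float    # split threshold on the occurrence probability
    branch_taken: bool  # True if the path follows the "> threshold" branch (assumption)


def path_to_rule(path: List[DecisionNode]) -> List[str]:
    """Turn each judgment node on a path into a presence/absence condition."""
    return [
        f"{'presence' if node.branch_taken else 'absence'} of behavior identifier {node.behavior_id}"
        for node in path
    ]


# The six judgment nodes of the shaded path; the False on the last node is an
# assumption chosen so that the result contains "absence of behavior identifier 72".
shaded_path = [
    DecisionNode(4, 0.35, True),
    DecisionNode(2, 0.235, True),
    DecisionNode(1, 0.35, True),
    DecisionNode(7, 0.76, True),
    DecisionNode(5, 0.65, True),
    DecisionNode(72, 0.75, False),
]
rule = path_to_rule(shaded_path)
```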
When filtering by path length, the path length may be required to be greater than or equal to 2/3 of the depth of the tree. For example, if the depth of the decision tree model is 30, the preset length threshold selected by the scheme is 30 × 2/3 = 20. It should be noted that the ratio used to derive the preset length threshold may be 2/3 or another reasonable value; this is only an illustration and should not be construed as a limitation to the present invention.
Further, in the embodiment of the present invention, a sample detection rule meeting requirements may be selected from the candidate sample detection rules according to the path length, that is, a first sample detection rule and a second sample detection rule are generated. In this way, the generated sample detection rule set has better reliability: the path length is required to be greater than the preset length threshold, and otherwise the path information is regarded as unqualified and no corresponding sample detection rule is generated, which improves the feasibility and practicability of the scheme.
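The arithmetic of the path length filter can be sketched as follows, by way of example only. The function names are hypothetical, the 2/3 ratio is the illustrative value given above, and the comparison is written as "greater than or equal to" to follow this paragraph.

```python
from fractions import Fraction


def length_threshold(tree_depth: int, ratio: Fraction = Fraction(2, 3)) -> Fraction:
    """Preset length threshold derived from the depth of the decision tree."""
    return tree_depth * ratio


def passes_length_filter(path_length: int, tree_depth: int,
                         ratio: Fraction = Fraction(2, 3)) -> bool:
    """A path qualifies when its length reaches the threshold."""
    return path_length >= length_threshold(tree_depth, ratio)
```

With a tree depth of 30 the threshold evaluates to 20, so a path of length 20 qualifies and a path of length 19 does not.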
Optionally, on the basis of the third or fourth embodiment corresponding to fig. 3, in a fifth optional embodiment of the method for virus detection provided in the embodiment of the present invention, selecting Y first sample detection rules that satisfy the preset rule generation condition from X first to-be-selected sample detection rules may include:
selecting Y first sample detection rules with the forward node proportion larger than a preset proportion threshold from X first sample detection rules to be selected, wherein the forward node proportion represents the proportion of the forward node quantity in the total node quantity, and the forward node represents a node containing a behavior identifier;
selecting P second sample detection rules meeting the preset rule generation condition from the Q second candidate sample detection rules, wherein the P second sample detection rules comprise:
and selecting P second sample detection rules of which the forward node proportion is greater than a preset proportion threshold from the Q second sample detection rules to be selected.
In this embodiment, based on the third optional embodiment corresponding to fig. 3, a method for selecting a sample detection rule is further provided. Specifically, for the decision tree, the path information is composed of a plurality of nodes, some of which include a behavior identifier and some of which do not, and a forward node is a node that includes a behavior identifier. The scheme requires that the forward node proportion be greater than or equal to a preset proportion threshold. For example, if the length of a certain path is 20 and the preset proportion threshold is 4/5, the minimum number of forward nodes is 20 × 4/5 = 16.
It should be noted that the preset ratio threshold may be 4/5, or may be other reasonable values, which is only an illustration here and should not be construed as a limitation to the present invention.
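The forward node proportion check can likewise be sketched as follows, for illustration only. The function names are hypothetical and the 4/5 threshold is the example value given above; the path length is assumed to be at least 1.

```python
from fractions import Fraction
from math import ceil


def min_forward_nodes(path_length: int, ratio: Fraction = Fraction(4, 5)) -> int:
    """Smallest number of forward nodes that satisfies the proportion threshold."""
    return ceil(path_length * ratio)


def passes_forward_filter(forward_nodes: int, path_length: int,
                          ratio: Fraction = Fraction(4, 5)) -> bool:
    """A path qualifies when forward_nodes / path_length reaches the threshold.

    path_length must be at least 1; a zero length raises ZeroDivisionError.
    """
    return Fraction(forward_nodes, path_length) >= ratio
```

For a path of length 20 this gives a minimum of 16 forward nodes, matching the figure stated in the description.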
Still further, in the embodiment of the present invention, a sample detection rule meeting requirements may be selected from the candidate sample detection rules according to the forward node proportion, that is, a first sample detection rule and a second sample detection rule are generated. In this way, the generated sample detection rule set has better reliability: the forward node proportion is required to be greater than the preset proportion threshold, and otherwise the path information is regarded as unqualified and no corresponding sample detection rule is generated, which improves the feasibility and practicability of the scheme.
Optionally, on the basis of the embodiment corresponding to fig. 3, in a sixth optional embodiment of the method for virus detection provided in the embodiment of the present invention, matching the target feature vector by using the sample detection rule set to generate a target matching result, may include:
judging whether the target characteristic vector meets a first sample detection rule, and if so, generating a first matching result;
if the target characteristic vector does not meet the first sample detection rule, judging whether the target characteristic vector meets a second sample detection rule, and if the target characteristic vector meets the second sample detection rule, generating a second matching result;
and if the target characteristic vector does not meet the second sample detection rule, generating a third matching result.
In this embodiment, the virus detection apparatus matches the target feature vector against the sample detection rules in sequence. Assuming that the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the virus detection apparatus first judges whether the target feature vector meets the first sample detection rule. If so, the first matching result is generated directly; otherwise, the virus detection apparatus goes on to judge whether the next rule is met, namely, whether the target feature vector meets the second sample detection rule, and if so, generates a second matching result. If the target feature vector satisfies neither the first sample detection rule nor the second sample detection rule, a third matching result is generated.
It is understood that, in practical applications, the matching order of the sample detection rules in the selected sample detection rule set is not limited, and the second sample detection rule may be matched first, and then the first sample detection rule may be matched, or vice versa.
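One possible way to express the sequential matching described above is sketched below, by way of example only. The representation of a rule as a list of `(behavior id, threshold, must_exceed)` conditions, the treatment of a missing behavior identifier as probability 0.0, and the names `MatchResult`, `satisfies` and `match` are all assumptions of this sketch rather than features of the embodiment.

```python
from enum import Enum
from typing import Dict, List, Tuple

# (behavior id, threshold, must_exceed): hypothetical encoding of one judgment node
Condition = Tuple[int, float, bool]
Rule = List[Condition]


class MatchResult(Enum):
    FIRST = 1   # the first (security) sample detection rule is satisfied
    SECOND = 2  # the second (virus) sample detection rule is satisfied
    THIRD = 3   # neither rule is satisfied


def satisfies(vector: Dict[int, float], rule: Rule) -> bool:
    """True when every condition of the rule holds for the feature vector."""
    return all((vector.get(bid, 0.0) > thr) == must_exceed
               for bid, thr, must_exceed in rule)


def match(vector: Dict[int, float], first_rule: Rule, second_rule: Rule) -> MatchResult:
    """Try the first rule, then the second, and fall through to the third result."""
    if satisfies(vector, first_rule):
        return MatchResult.FIRST
    if satisfies(vector, second_rule):
        return MatchResult.SECOND
    return MatchResult.THIRD
```

Swapping the two `if` statements gives the alternative matching order mentioned above.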
Referring to fig. 8, a flow of detecting a type of a file to be detected will be described below, and fig. 8 is a schematic flow chart of detecting the file to be detected in the embodiment of the present invention, as shown in the figure, specifically:
in step 501, a batch of positive samples and negative samples are obtained, where the positive samples may refer to safety samples, and the negative samples may refer to virus samples. It should be noted that, in practical applications, the positive sample may also be set as a virus sample, and the negative sample may be set as a safety sample, which depends on the setting of the positive and negative samples by the user in advance;
in step 502, respectively sending the security sample and the virus sample into a simulator, and using the log information output by the simulator to generate a feature vector of the security sample and a feature vector of the virus sample;
in step 503, inputting the feature vector of the safety sample and the feature vector of the virus sample into a decision tree model for training;
in step 504, after model training, a model library file, that is, a decision tree model file, may be obtained, where the decision tree model file may be stored and copied for subsequent detection and invocation, and the decision tree model file may be understood as a configuration file;
in step 505, filtering each piece of path information according to the decision tree model file, where the filtering method may be: if the path length is greater than or equal to 2/3 of the decision tree depth and the number of forward nodes is greater than or equal to 4/5 of the path length, the path information may be determined to be a sample detection rule;
in step 506, classifying the sample detection rule generated in step 505 to obtain a sample detection rule set;
in step 507, acquiring a file to be detected;
in step 508, extracting the target characteristic vector of the file to be detected, and matching the target characteristic vector of the file to be detected with each sample detection rule contained in the sample detection rule set;
in step 509, it is determined whether the target feature vector matches a virus sample detection rule included in the sample detection rule set; if the target feature vector matches the virus sample detection rule, step 511 is performed, and if the target feature vector does not match the virus sample detection rule, step 510 is performed;
in step 510, it is determined whether the target feature vector matches a safe sample detection rule included in the sample detection rule set; if the target feature vector matches the safe sample detection rule, step 513 is performed, and if the target feature vector does not match the safe sample detection rule, step 512 is performed;
in step 511, the file to be detected is determined as a virus file;
in step 512, the security condition of the file to be detected cannot be determined, or the file to be detected is considered as a secure file;
in step 513, it is determined that the file to be detected is a security file.
Secondly, in the embodiment of the present invention, the virus detection apparatus matches the target feature vector with the rules in the sample detection rule set, and if a certain rule is not matched, the virus detection apparatus continues to match the next rule until a result is matched, or determines that the result is not matched with all rules. By the method, the matching result of the file to be detected can be accurately obtained, so that the reliability of the scheme is improved.
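Steps 505 to 513 of fig. 8 can be sketched end to end as follows, for illustration only. The `Path` structure, the function names, the encoding of a node as `(behavior id, threshold, forward)`, and the reading of a forward node as one requiring the behavior identifier to be present are assumptions of this sketch. The 2/3 and 4/5 values are the example thresholds of step 505.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Node = Tuple[int, float, bool]  # (behavior id, threshold, forward node?) - hypothetical


@dataclass
class Path:
    label: str         # "virus" or "safe"
    nodes: List[Node]  # judgment nodes from the vertex to the label


def build_rule_set(paths: List[Path], tree_depth: int) -> Dict[str, List[List[Node]]]:
    """Steps 505-506: keep qualifying paths and group them by sample label."""
    rules: Dict[str, List[List[Node]]] = {"virus": [], "safe": []}
    for p in paths:
        length = len(p.nodes)
        forward = sum(1 for _, _, is_forward in p.nodes if is_forward)
        # length >= 2/3 * depth and forward >= 4/5 * length, in integer arithmetic
        if 3 * length >= 2 * tree_depth and 5 * forward >= 4 * length:
            rules[p.label].append(p.nodes)
    return rules


def detect(vector: Dict[int, float], rules: Dict[str, List[List[Node]]]) -> str:
    """Steps 509-513: virus rules first, then safe rules, otherwise unknown."""
    def hit(rule: List[Node]) -> bool:
        return all((vector.get(b, 0.0) > t) == f for b, t, f in rule)

    if any(hit(r) for r in rules["virus"]):
        return "virus"
    if any(hit(r) for r in rules["safe"]):
        return "safe"
    return "unknown"
```

Returning "unknown" corresponds to the first alternative of step 512; an implementation that treats unmatched files as secure would return "safe" there instead.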
Optionally, on the basis of the sixth embodiment corresponding to fig. 3, in a seventh optional embodiment of the method for virus detection provided in the embodiment of the present invention, determining a virus detection result of the file to be detected according to the target matching result may include:
if the target matching result is the first matching result, determining that the file to be detected belongs to the security file;
if the target matching result is the second matching result, determining that the file to be detected belongs to the virus file;
and if the target matching result is the third matching result, determining that the file to be detected belongs to the unknown security file.
In this embodiment, the virus detection device may obtain the type of the file to be detected according to the virus detection result generated by the sample detection rule set.
Specifically, if the target feature vector of the file to be detected matches the first sample detection rule, the target matching result is determined to be the first matching result, and the file to be detected can be determined to be a safe file and passed on for subsequent operation. If the target feature vector of the file to be detected matches the second sample detection rule, the target matching result is determined to be the second matching result, and the file to be detected can be determined to be a virus file, which usually needs to be isolated. If the target feature vector of the file to be detected matches neither the first sample detection rule nor the second sample detection rule, the file to be detected is regarded as a file of unknown security, that is, as a suspicious file.
In the embodiment of the present invention, the virus detection apparatus determines the type of the file to be detected according to the target matching result, that is, the first matching result is used to indicate that the file to be detected is a secure file, the second matching result is used to indicate that the file to be detected is a virus file, and the third matching result is used to indicate that the file to be detected is an unknown security file. By the method, the type of the file to be detected can be accurately known: the virus type can be determined, and the security type and the unknown security type can also be distinguished, which improves the practicability and safety of the scheme.
For convenience of understanding, the following will describe the virus detection process with reference to fig. 9, please refer to fig. 9, and fig. 9 is a schematic flow chart of virus detection in the application scenario of the present invention, and as shown in the figure, specifically:
in step 601, virus detection is started;
in step 602, a batch of security samples and virus samples for generating a sample detection rule set (a first sample detection rule and a second sample detection rule) are selected;
in step 603, selecting a file to be detected;
step 604 specifically includes four sub-steps: in step 6041, the security sample, the virus sample, and the file to be detected are obtained; in step 6042, the security sample, the virus sample, and the file to be detected are sent to a simulator to run; in step 6043, the log information of the security sample, the log information of the virus sample, and the log information of the file to be detected are respectively extracted from the simulator; and finally, in step 6044, the log information of the security sample is converted into feature information of the security sample, the log information of the virus sample is converted into feature information of the virus sample, and the log information of the file to be detected is converted into a target feature vector of the file to be detected;
in step 605, inputting the characteristic information of the virus sample and the characteristic information of the safety sample into a decision tree model for training;
in step 606, the decision tree model library file obtained by training is filtered, and the purpose of filtering is mainly to screen out the sample detection rules meeting the rule generation conditions. The decision tree model library file can be stored and copied for subsequent virus detection and calling, and the decision tree model library file can be understood as a configuration file;
in step 607, extracting a target feature vector of the file to be detected;
in step 608, matching the target characteristic vector of the file to be detected with the sample detection rule;
in step 609, judging whether the target feature vector matches a virus sample detection rule contained in the sample detection rule set; if so, entering step 611, and if not, entering step 610;
in step 610, judging whether the target feature vector matches a safe sample detection rule contained in the sample detection rule set; if the target feature vector matches the safe sample detection rule, entering step 613, and if the target feature vector does not match the safe sample detection rule, entering step 612;
in step 611, the file to be detected is determined to be a virus file;
in step 612, the security condition of the file to be detected cannot be judged;
in step 613, the file to be detected is determined to be a security file.
Referring to fig. 10, the virus detection apparatus of the present invention is described in detail below, where fig. 10 is a schematic view of an embodiment of the virus detection apparatus of the present invention, and the virus detection apparatus 70 includes:
an obtaining module 701, configured to obtain a target feature vector of a file to be detected;
a generating module 702, configured to match the target feature vector obtained by the obtaining module 701 with a sample detection rule set to generate a target matching result, where the sample detection rule set includes a first sample detection rule and a second sample detection rule, the first sample detection rule is used to indicate a correspondence between a security type and path information, the second sample detection rule is used to indicate a correspondence between a virus type and the path information, and the path information is used to indicate an occurrence probability of a behavior identifier;
a determining module 703, configured to determine a virus detection result of the file to be detected according to the target matching result generated by the generating module 702.
In this embodiment, an obtaining module 701 obtains a target feature vector of a file to be detected, a generating module 702 matches the target feature vector obtained by the obtaining module 701 by using a sample detection rule set to generate a target matching result, where the sample detection rule set includes a first sample detection rule and a second sample detection rule, the first sample detection rule is used to represent a correspondence between a security type and path information, the second sample detection rule is used to represent a correspondence between a virus type and the path information, the path information is used to indicate an occurrence probability of a behavior identifier, and a determining module 703 determines the virus detection result of the file to be detected according to the target matching result generated by the generating module 702.
The embodiment of the invention provides a virus detection device, which comprises the steps of firstly obtaining a target characteristic vector of a file to be detected, matching the target characteristic vector by adopting a sample detection rule set to generate a target matching result, wherein the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the first sample detection rule is used for representing the corresponding relation between a security type and path information, the second sample detection rule is used for representing the corresponding relation between a virus type and the path information, the path information is used for indicating the occurrence probability of a behavior identifier, and finally determining the virus detection result of the file to be detected according to the target matching result. Through the mode, on one hand, the process of manually extracting the feature codes can be saved, the matching result of the file to be detected can be obtained by directly matching the sample detection rule set, the matching result can represent the safety of the file to be detected, on the other hand, the sample detection rule set at least comprises the rules for detecting the safety type and the virus type, the type of the file to be detected can be accurately sensed, and the scheme safety is favorably improved.
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the virus detection apparatus provided in the embodiment of the present invention,
the obtaining module 701 is specifically configured to obtain log information of the file to be detected, where the log information includes N behavior identifiers and N trigger times, and N is an integer greater than or equal to 1;
counting the occurrence probability of each behavior identifier in the N behavior identifiers;
and generating the target characteristic vector of the file to be detected according to the N triggering times and the occurrence probability of each behavior identifier.
Secondly, in the embodiment of the invention, the virus detection device can acquire the log information of the file to be detected, then count the occurrence probability of each behavior identifier in the N behavior identifiers, and finally generate the target characteristic vector of the file to be detected according to the N triggering times and the occurrence probability of each behavior identifier. In this way, the characteristic vector is generated from the occurrence probability and the trigger time of the behavior identifiers, so that the characteristic vector is associated with the behavior identifiers, which provides a reliable basis for subsequent rule matching and improves the feasibility of the scheme.
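The counting of occurrence probabilities might, for example, be sketched as follows. The log format of `(trigger time, behavior identifier)` pairs, the definition of the probability as the count of an identifier divided by N, and the function name are assumptions of this sketch. How the trigger times enter the final feature vector is not detailed in this passage, so the sketch carries them in the log but leaves them unused.

```python
from collections import Counter
from typing import Dict, List, Tuple

LogEntry = Tuple[float, int]  # (trigger time, behavior identifier) - hypothetical layout


def occurrence_probabilities(log: List[LogEntry]) -> Dict[int, float]:
    """Share of the N log entries contributed by each behavior identifier."""
    counts = Counter(behavior_id for _, behavior_id in log)
    total = len(log)
    # An empty log yields an empty mapping, so no division by zero occurs.
    return {behavior_id: n / total for behavior_id, n in counts.items()}


log = [(0.1, 4), (0.2, 4), (0.3, 2), (0.4, 7)]
probabilities = occurrence_probabilities(log)
```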
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the virus detection apparatus provided in the embodiment of the present invention,
the obtaining module 701 is further configured to obtain a feature vector of a security sample and a feature vector of a virus sample before the generating module 702 matches the target feature vector with a sample detection rule set;
obtaining the first sample detection rule corresponding to the feature vector of the security sample through a decision tree model, wherein the decision tree model is used for outputting the path information and a sample type, and the sample type comprises the security type and the virus type;
and obtaining the second sample detection rule corresponding to the feature vector of the virus sample through the decision tree model.
In the embodiment of the present invention, the virus detection apparatus may generate the first sample detection rule and the second sample detection rule as follows: the feature vector of the security sample and the feature vector of the virus sample are obtained first, the obtained feature vectors are then input to the decision tree model, and the path information is determined according to the output decision result, thereby generating the sample detection rules. Using a decision tree model in this way has the following advantages. First, a decision tree is easy to understand and implement, and can directly reflect the characteristics of the data. Second, a decision tree can produce feasible and effective results on large amounts of data in a relatively short time. Third, the model is easily evaluated by static testing, and the model confidence can be determined.
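To illustrate how path information and a sample type may be read from a trained decision tree, the following sketch enumerates every path from the vertex to a label in a small hand-built tree. The `Leaf` and `Split` structures, the function name, and the example tree are hypothetical; training itself is outside the sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str  # sample type: "safe" or "virus"


@dataclass
class Split:
    behavior_id: int
    threshold: float
    below: Union["Split", Leaf]  # branch taken when the value is <= threshold
    above: Union["Split", Leaf]  # branch taken when the value is > threshold


Condition = Tuple[int, float, bool]  # (behavior id, threshold, value exceeds threshold?)


def enumerate_paths(node: Union[Split, Leaf],
                    prefix: Tuple[Condition, ...] = ()) -> List[Tuple[str, List[Condition]]]:
    """Return (sample type, path information) for every vertex-to-label path."""
    if isinstance(node, Leaf):
        return [(node.label, list(prefix))]
    below = prefix + ((node.behavior_id, node.threshold, False),)
    above = prefix + ((node.behavior_id, node.threshold, True),)
    return enumerate_paths(node.below, below) + enumerate_paths(node.above, above)


# A two-level example tree (values chosen only for illustration).
tree = Split(4, 0.35, Leaf("virus"), Split(2, 0.235, Leaf("virus"), Leaf("safe")))
paths = enumerate_paths(tree)
```

Each returned pair is a candidate sample detection rule, which can then be screened against the preset rule generation condition.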
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the virus detection apparatus provided in the embodiment of the present invention,
the obtaining module 701 is specifically configured to input the feature vector of the safety sample to the decision tree model to obtain X first detection rules to be selected, where X is an integer greater than or equal to 1;
selecting Y first sample detection rules meeting preset rule generation conditions from the X first sample detection rules to be selected, wherein Y is an integer which is greater than or equal to 1 and less than or equal to X;
inputting the feature vector of the virus sample into the decision tree model to obtain Q second candidate sample detection rules, wherein Q is an integer greater than or equal to 1;
and selecting P second sample detection rules meeting preset rule generation conditions from the Q second sample detection rules to be selected, wherein P is an integer which is greater than or equal to 1 and less than or equal to Q.
In the embodiment of the present invention, it is considered that not all path information is suitable for constructing the sample detection rule, and therefore some "thresholds" need to be set to generate the sample detection rule. By the method, the reliability of the sample detection rule can be improved, so that the type of the file to be detected can be accurately sensed, and the safety of the scheme is improved.
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the virus detection apparatus provided in the embodiment of the present invention,
the obtaining module 701 is specifically configured to select, from the X first to-be-selected sample detection rules, Y first sample detection rules whose path lengths are greater than a preset length threshold;
and selecting P second sample detection rules with path lengths larger than the preset length threshold from the Q second sample detection rules to be selected.
Further, in the embodiment of the present invention, a sample detection rule meeting requirements may be selected from the candidate sample detection rules according to the path length, that is, a first sample detection rule and a second sample detection rule are generated. In this way, the generated sample detection rule set has better reliability: the path length is required to be greater than the preset length threshold, and otherwise the path information is regarded as unqualified and no corresponding sample detection rule is generated, which improves the feasibility and practicability of the scheme.
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the virus detection apparatus provided in the embodiment of the present invention,
the obtaining module 701 is specifically configured to select, from the X first to-be-selected sample detection rules, the Y first sample detection rules whose forward node proportion is greater than a preset proportion threshold, where the forward node proportion indicates a proportion of a forward node number in a total node number, and the forward node indicates a node including a behavior identifier;
and selecting the P second sample detection rules of which the forward node proportion is greater than a preset proportion threshold from the Q second sample detection rules to be selected.
Still further, in the embodiment of the present invention, a sample detection rule meeting requirements may be selected from the candidate sample detection rules according to the forward node proportion, that is, a first sample detection rule and a second sample detection rule are generated. In this way, the generated sample detection rule set has better reliability: the forward node proportion is required to be greater than the preset proportion threshold, and otherwise the path information is regarded as unqualified and no corresponding sample detection rule is generated, which improves the feasibility and practicability of the scheme.
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the virus detection apparatus provided in the embodiment of the present invention,
the generating module 702 is configured to determine whether the target feature vector satisfies the first sample detection rule, and if the target feature vector satisfies the first sample detection rule, generate a first matching result;
if the target characteristic vector does not meet the first sample detection rule, judging whether the target characteristic vector meets the second sample detection rule, and if the target characteristic vector meets the second sample detection rule, generating a second matching result;
and if the target characteristic vector does not meet the second sample detection rule, generating a third matching result.
Secondly, in the embodiment of the present invention, the virus detection apparatus matches the target feature vector with the rules in the sample detection rule set, and if a certain rule is not matched, the virus detection apparatus continues to match the next rule until a result is matched, or determines that the result is not matched with all rules. By the method, the matching result of the file to be detected can be accurately obtained, so that the reliability of the scheme is improved.
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the virus detection apparatus provided in the embodiment of the present invention,
the determining module 703 is configured to determine that the to-be-detected file belongs to a security file if the target matching result is the first matching result;
if the target matching result is the second matching result, determining that the file to be detected belongs to a virus file;
and if the target matching result is the third matching result, determining that the file to be detected belongs to an unknown security file.
In the embodiment of the present invention, the virus detection apparatus determines the type of the file to be detected according to the target matching result, that is, the first matching result is used to indicate that the file to be detected is a secure file, the second matching result is used to indicate that the file to be detected is a virus file, and the third matching result is used to indicate that the file to be detected is an unknown security file. By the method, the type of the file to be detected can be accurately known: the virus type can be determined, and the security type and the unknown security type can also be distinguished, which improves the practicability and safety of the scheme.
Fig. 11 is a schematic structural diagram of a server 800 according to an embodiment of the present invention. The server 800 may include one or more Central Processing Units (CPUs) 822 (e.g., one or more processors), a memory 832, and one or more storage media 830 (e.g., one or more mass storage devices) storing applications 842 or data 844. The memory 832 and the storage medium 830 may provide transient or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processor 822 may be configured to communicate with the storage medium 830 and to execute, on the server 800, the series of instruction operations in the storage medium 830.
The server 800 may also include one or more power supplies 826, one or more wired or wireless network interfaces 850, one or more input-output interfaces 858, and/or one or more operating systems 841, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 11.
CPU 822 is operative to perform the following steps:
acquiring a target characteristic vector of a file to be detected;
matching the target feature vector against a sample detection rule set to generate a target matching result, wherein the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the first sample detection rule expresses the correspondence between a security type and path information, the second sample detection rule expresses the correspondence between a virus type and path information, and the path information indicates the occurrence probability of a behavior identifier;
and determining the virus detection result of the file to be detected according to the target matching result.
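One way to picture a sample detection rule is as a sample type paired with path information, where each step on the path tests the occurrence probability of one behavior identifier. The sketch below is only an illustration under that reading: the class names, the threshold-comparison encoding and the identifier `WRITE_REGISTRY` are assumptions, and the text does not prescribe this representation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Condition:
    """One step of the path information (assumed encoding)."""
    behavior_id: str   # hypothetical behavior identifier
    threshold: float   # hypothetical split value on the occurrence probability
    greater: bool      # True: probability > threshold; False: probability <= threshold

    def holds(self, vector: Dict[str, float]) -> bool:
        # A behavior absent from the vector is treated as probability 0.
        p = vector.get(self.behavior_id, 0.0)
        return p > self.threshold if self.greater else p <= self.threshold


@dataclass(frozen=True)
class DetectionRule:
    """Correspondence between a sample type and path information."""
    sample_type: str               # "security" or "virus"
    path: Tuple[Condition, ...]    # path information

    def matches(self, vector: Dict[str, float]) -> bool:
        # The rule is met only if every step on the path holds.
        return all(condition.holds(vector) for condition in self.path)
```

For example, `DetectionRule("virus", (Condition("WRITE_REGISTRY", 0.3, True),))` would be met by any feature vector in which `WRITE_REGISTRY` has an occurrence probability above 0.3.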
Optionally, CPU 822 is specifically configured to perform the following steps:
acquiring log information of the file to be detected, wherein the log information comprises N behavior identifiers and N trigger times, and N is an integer greater than or equal to 1;
counting the occurrence probability of each behavior identifier in the N behavior identifiers;
and generating the target characteristic vector of the file to be detected according to the N triggering times and the occurrence probability of each behavior identifier.
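The three steps above can be sketched as follows. The exact layout of the target feature vector is not given in this passage, so the sketch makes explicit assumptions: each log entry is a (behavior identifier, trigger time) pair, the occurrence probability of an identifier is its count divided by N, and the trigger times are used only to order the vector's entries by first occurrence. The function name and sample identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

LogEntry = Tuple[str, float]  # (behavior identifier, trigger time), assumed format


def build_feature_vector(log: List[LogEntry]) -> Dict[str, float]:
    """Map each behavior identifier to its occurrence probability."""
    if not log:
        raise ValueError("log information must contain at least one entry (N >= 1)")
    # Order the entries by trigger time so the vector follows the behavior sequence.
    ordered = sorted(log, key=lambda entry: entry[1])
    # Count how often each identifier occurs; Counter keeps first-seen order.
    counts = Counter(behavior_id for behavior_id, _ in ordered)
    total = len(ordered)
    return {behavior_id: n / total for behavior_id, n in counts.items()}
```

With the log `[("OPEN_FILE", 2.0), ("WRITE_REGISTRY", 1.0), ("OPEN_FILE", 3.0), ("OPEN_FILE", 4.0)]`, the sketch yields 0.25 for `WRITE_REGISTRY` and 0.75 for `OPEN_FILE`, with `WRITE_REGISTRY` listed first because it was triggered earliest.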
Optionally, CPU 822 is further configured to perform the steps of:
acquiring a feature vector of a security sample and a feature vector of a virus sample;
obtaining the first sample detection rule corresponding to the feature vector of the security sample through a decision tree model, wherein the decision tree model is used for outputting the path information and a sample type, and the sample type comprises the security type and the virus type;
and obtaining the second sample detection rule corresponding to the feature vector of the virus sample through the decision tree model.
Optionally, CPU 822 is specifically configured to perform the following steps:
inputting the feature vector of the security sample into the decision tree model to obtain X first candidate sample detection rules, wherein X is an integer greater than or equal to 1;
selecting, from the X first candidate sample detection rules, Y first sample detection rules that meet a preset rule generation condition, wherein Y is an integer greater than or equal to 1 and less than or equal to X;
inputting the feature vector of the virus sample into the decision tree model to obtain Q second candidate sample detection rules, wherein Q is an integer greater than or equal to 1;
and selecting, from the Q second candidate sample detection rules, P second sample detection rules that meet the preset rule generation condition, wherein P is an integer greater than or equal to 1 and less than or equal to Q.
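A rough sketch of how candidate rules could be read off a decision tree and then narrowed down is given below. It assumes one plausible reading, namely that each root-to-leaf path of an already trained tree is a candidate rule whose sample type is the leaf label. The tree classes, the step encoding and the function names are hypothetical, and the tree here is built by hand rather than trained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

Step = Tuple[str, str, float]                 # (behavior identifier, "<=" or ">", threshold)
CandidateRule = Tuple[Tuple[Step, ...], str]  # (path information, sample type)


@dataclass
class Leaf:
    sample_type: str   # "security" or "virus"


@dataclass
class Split:
    behavior_id: str
    threshold: float
    low: "Node"        # branch taken when probability <= threshold
    high: "Node"       # branch taken when probability > threshold


Node = Union[Leaf, Split]


def candidate_rules(node: Node, prefix: Tuple[Step, ...] = ()) -> List[CandidateRule]:
    """Enumerate every root-to-leaf path as a candidate sample detection rule."""
    if isinstance(node, Leaf):
        return [(prefix, node.sample_type)]
    low_step = (node.behavior_id, "<=", node.threshold)
    high_step = (node.behavior_id, ">", node.threshold)
    return (candidate_rules(node.low, prefix + (low_step,))
            + candidate_rules(node.high, prefix + (high_step,)))


def select_rules(candidates: List[CandidateRule], sample_type: str,
                 condition: Callable[[CandidateRule], bool]) -> List[CandidateRule]:
    """Keep candidates of one sample type that meet the rule generation condition."""
    return [rule for rule in candidates if rule[1] == sample_type and condition(rule)]
```

Here `select_rules` stands in for the preset rule generation condition as an arbitrary predicate, so the same function covers both the security candidates (X down to Y) and the virus candidates (Q down to P).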
Optionally, CPU 822 is specifically configured to perform the following steps:
selecting, from the X first candidate sample detection rules, Y first sample detection rules whose path length is greater than a preset length threshold;
and selecting, from the Q second candidate sample detection rules, P second sample detection rules whose path length is greater than the preset length threshold.
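The path-length condition can be sketched as a simple filter. The sketch assumes that a candidate rule's path is a sequence of steps and that its length is the number of steps; both the step encoding and the function name are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Step = Tuple[str, str, float]  # (behavior identifier, comparison, threshold), assumed encoding


def select_by_path_length(candidates: Sequence[Sequence[Step]],
                          length_threshold: int) -> List[Sequence[Step]]:
    """Keep only candidate paths strictly longer than the preset length threshold."""
    return [path for path in candidates if len(path) > length_threshold]
```

The comparison is strict, matching the wording "greater than": a path whose length equals the threshold is discarded.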
Optionally, CPU 822 is specifically configured to perform the following steps:
selecting, from the X first candidate sample detection rules, Y first sample detection rules whose forward node proportion is greater than a preset proportion threshold, wherein the forward node proportion represents the ratio of the number of forward nodes to the total number of nodes, and a forward node is a node containing a behavior identifier;
and selecting, from the Q second candidate sample detection rules, P second sample detection rules whose forward node proportion is greater than the preset proportion threshold.
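The forward node proportion is the number of forward nodes on a path divided by the total number of nodes on that path. The sketch below computes it and applies the threshold. How a node is judged to contain a behavior identifier is not detailed in this passage, so the sketch simply assumes each node carries a boolean flag; `PathNode` and the function names are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class PathNode(NamedTuple):
    behavior_id: str
    contains_behavior: bool  # assumed flag: True marks a forward node


def forward_node_proportion(path: Sequence[PathNode]) -> float:
    """Ratio of forward nodes to all nodes on the path (0.0 for an empty path)."""
    if not path:
        return 0.0
    forward = sum(1 for node in path if node.contains_behavior)
    return forward / len(path)


def select_by_forward_proportion(candidates: Sequence[Sequence[PathNode]],
                                 proportion_threshold: float) -> List[Sequence[PathNode]]:
    """Keep candidate paths whose forward node proportion exceeds the threshold."""
    return [path for path in candidates
            if forward_node_proportion(path) > proportion_threshold]
```

For a path with two forward nodes out of three, the proportion is 2/3, so the path is kept under a threshold of 0.5.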
Optionally, CPU 822 is specifically configured to perform the following steps:
judging whether the target characteristic vector meets the first sample detection rule, and if the target characteristic vector meets the first sample detection rule, generating a first matching result;
if the target characteristic vector does not meet the first sample detection rule, judging whether the target characteristic vector meets the second sample detection rule, and if the target characteristic vector meets the second sample detection rule, generating a second matching result;
and if the target characteristic vector does not meet the second sample detection rule, generating a third matching result.
Optionally, CPU 822 is specifically configured to perform the following steps:
if the target matching result is the first matching result, determining that the file to be detected belongs to a security file;
if the target matching result is the second matching result, determining that the file to be detected belongs to a virus file;
and if the target matching result is the third matching result, determining that the file to be detected belongs to an unknown security file.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (15)

1. A method for virus detection, comprising:
acquiring a target characteristic vector of a file to be detected;
acquiring a feature vector of a security sample and a feature vector of a virus sample;
inputting the feature vector of the safety sample into a decision tree model to obtain X first to-be-selected sample detection rules, wherein X is an integer greater than or equal to 1;
selecting Y first sample detection rules meeting preset rule generation conditions from the X first sample detection rules to be selected, wherein Y is an integer which is greater than or equal to 1 and less than or equal to X, the decision tree model is used for outputting path information and sample types, and the sample types comprise security types and virus types;
inputting the feature vector of the virus sample into the decision tree model to obtain Q second candidate sample detection rules, wherein Q is an integer greater than or equal to 1;
selecting P second sample detection rules meeting preset rule generation conditions from the Q second sample detection rules to be selected, wherein P is an integer which is greater than or equal to 1 and less than or equal to Q;
matching the target feature vectors by adopting a sample detection rule set to generate a target matching result, wherein the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the first sample detection rule is used for expressing the corresponding relation between the security type and the path information, the second sample detection rule is used for expressing the corresponding relation between the virus type and the path information, and the path information is used for indicating the occurrence probability of a behavior identifier;
and determining the virus detection result of the file to be detected according to the target matching result.
2. The method according to claim 1, wherein the obtaining the target feature vector of the file to be detected comprises:
acquiring log information of the file to be detected, wherein the log information comprises N behavior identifiers and N trigger times, and N is an integer greater than or equal to 1;
counting the occurrence probability of each behavior identifier in the N behavior identifiers;
and generating the target characteristic vector of the file to be detected according to the N triggering times and the occurrence probability of each behavior identifier.
3. The method according to claim 1, wherein the selecting Y first sample detection rules satisfying a preset rule generation condition from the X first candidate sample detection rules comprises:
selecting Y first sample detection rules with path length larger than a preset length threshold from the X first sample detection rules to be selected;
the selecting, from the Q second candidate sample detection rules, P second sample detection rules that satisfy a preset rule generation condition includes:
and selecting P second sample detection rules with path length larger than the preset length threshold from the Q second to-be-selected sample detection rules.
4. The method according to claim 1, wherein the selecting Y first sample detection rules satisfying a preset rule generation condition from the X first candidate sample detection rules comprises:
selecting Y first sample detection rules with forward node proportion larger than a preset proportion threshold from the X first sample detection rules to be selected, wherein the forward node proportion represents the proportion of the forward node quantity in the total node quantity, and the forward node represents a node containing a behavior identifier;
the selecting, from the Q second candidate sample detection rules, P second sample detection rules that satisfy a preset rule generation condition includes:
and selecting the P second sample detection rules of which the forward node proportion is greater than a preset proportion threshold from the Q second sample detection rules to be selected.
5. The method of claim 1, wherein matching the target feature vector using a set of sample detection rules to generate a target matching result comprises:
if the target feature vector meets the first sample detection rule, generating a first matching result;
if the target characteristic vector does not meet the first sample detection rule, judging whether the target characteristic vector meets the second sample detection rule, and if the target characteristic vector meets the second sample detection rule, generating a second matching result;
and if the target characteristic vector does not meet the second sample detection rule, generating a third matching result.
6. The method according to claim 5, wherein the determining the virus detection result of the file to be detected according to the target matching result comprises:
if the target matching result is the first matching result, determining that the to-be-detected file belongs to a safe file;
if the target matching result is the second matching result, determining that the file to be detected belongs to a virus file;
and if the target matching result is the third matching result, determining that the file to be detected belongs to an unknown security file.
7. A virus detection device, comprising:
the acquisition module is used for acquiring a target characteristic vector of the file to be detected;
the acquisition module is also used for acquiring the characteristic vector of the security sample and the characteristic vector of the virus sample; inputting the feature vector of the safety sample into a decision tree model to obtain X first to-be-selected sample detection rules, wherein X is an integer greater than or equal to 1; selecting Y first sample detection rules meeting preset rule generation conditions from the X first sample detection rules to be selected, wherein Y is an integer which is greater than or equal to 1 and less than or equal to X, the decision tree model is used for outputting path information and sample types, and the sample types comprise security types and virus types; inputting the feature vector of the virus sample into the decision tree model to obtain Q second candidate sample detection rules, wherein Q is an integer greater than or equal to 1; selecting P second sample detection rules meeting preset rule generation conditions from the Q second sample detection rules to be selected, wherein P is an integer which is greater than or equal to 1 and less than or equal to Q;
a generating module, configured to match the target feature vector obtained by the obtaining module by using a sample detection rule set to generate a target matching result, where the sample detection rule set includes the first sample detection rule and the second sample detection rule, the first sample detection rule is used to indicate a correspondence between the security type and the path information, the second sample detection rule is used to indicate a correspondence between the virus type and the path information, and the path information is used to indicate an occurrence probability of a behavior identifier;
and the determining module is used for determining the virus detection result of the file to be detected according to the target matching result generated by the generating module.
8. The virus detection apparatus according to claim 7,
the acquisition module is specifically configured to acquire log information of the file to be detected, where the log information includes N behavior identifiers and N trigger times, and N is an integer greater than or equal to 1;
counting the occurrence probability of each behavior identifier in the N behavior identifiers;
and generating the target characteristic vector of the file to be detected according to the N triggering times and the occurrence probability of each behavior identifier.
9. The virus detection apparatus according to claim 7,
the acquisition module is specifically configured to select Y first sample detection rules, of which path lengths are greater than a preset length threshold, from the X first to-be-selected sample detection rules;
and selecting P second sample detection rules with path length larger than the preset length threshold from the Q second to-be-selected sample detection rules.
10. The virus detection apparatus according to claim 7,
the obtaining module is specifically configured to select the Y first sample detection rules, of which a forward node ratio is greater than a preset ratio threshold, from the X first sample detection rules to be selected, where the forward node ratio indicates a ratio of a forward node number to a total node number, and the forward node indicates a node including a behavior identifier;
and selecting the P second sample detection rules of which the forward node proportion is greater than a preset proportion threshold from the Q second sample detection rules to be selected.
11. The virus detection apparatus of claim 7, wherein the generation module is configured to:
if the target feature vector meets the first sample detection rule, generating a first matching result;
if the target characteristic vector does not meet the first sample detection rule, judging whether the target characteristic vector meets the second sample detection rule, and if the target characteristic vector meets the second sample detection rule, generating a second matching result;
and if the target characteristic vector does not meet the second sample detection rule, generating a third matching result.
12. The virus detection apparatus of claim 11, wherein the determining module is configured to:
if the target matching result is the first matching result, determining that the to-be-detected file belongs to a safe file;
if the target matching result is the second matching result, determining that the file to be detected belongs to a virus file;
and if the target matching result is the third matching result, determining that the file to be detected belongs to an unknown security file.
13. A virus detection apparatus, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is used for executing the program in the memory and comprises the following steps:
acquiring a target characteristic vector of a file to be detected;
acquiring a feature vector of a security sample and a feature vector of a virus sample;
inputting the feature vector of the safety sample into a decision tree model to obtain X first to-be-selected sample detection rules, wherein X is an integer greater than or equal to 1;
selecting Y first sample detection rules meeting preset rule generation conditions from the X first sample detection rules to be selected, wherein Y is an integer which is greater than or equal to 1 and less than or equal to X, the decision tree model is used for outputting path information and sample types, and the sample types comprise security types and virus types;
inputting the feature vector of the virus sample into the decision tree model to obtain Q second candidate sample detection rules, wherein Q is an integer greater than or equal to 1;
selecting P second sample detection rules meeting preset rule generation conditions from the Q second sample detection rules to be selected, wherein P is an integer which is greater than or equal to 1 and less than or equal to Q;
matching the target feature vectors by adopting a sample detection rule set to generate a target matching result, wherein the sample detection rule set comprises a first sample detection rule and a second sample detection rule, the first sample detection rule is used for expressing the corresponding relation between the security type and the path information, the second sample detection rule is used for expressing the corresponding relation between the virus type and the path information, and the path information is used for indicating the occurrence probability of a behavior identifier;
determining a virus detection result of the file to be detected according to the target matching result;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
14. The virus detection apparatus of claim 13, wherein the processor is specifically configured to perform the steps of:
acquiring log information of the file to be detected, wherein the log information comprises N behavior identifiers and N trigger times, and N is an integer greater than or equal to 1;
counting the occurrence probability of each behavior identifier in the N behavior identifiers;
and generating the target characteristic vector of the file to be detected according to the N triggering times and the occurrence probability of each behavior identifier.
15. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 6.
CN201810402154.1A 2018-04-28 2018-04-28 Virus detection method and related device Active CN110210218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810402154.1A CN110210218B (en) 2018-04-28 2018-04-28 Virus detection method and related device

Publications (2)

Publication Number Publication Date
CN110210218A CN110210218A (en) 2019-09-06
CN110210218B true CN110210218B (en) 2023-04-14

Family

ID=67778796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810402154.1A Active CN110210218B (en) 2018-04-28 2018-04-28 Virus detection method and related device

Country Status (1)

Country Link
CN (1) CN110210218B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795638A (en) * 2019-11-13 2020-02-14 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN111753290B (en) * 2020-05-26 2024-05-28 郑州启明星辰信息安全技术有限公司 Software type detection method and related equipment
CN113032742B (en) * 2021-01-26 2022-02-22 北京安华金和科技有限公司 Data desensitization method and device, storage medium and electronic device
CN117152260B (en) * 2023-11-01 2024-02-06 张家港长三角生物安全研究中心 Method and system for detecting residues of disinfection apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8401982B1 (en) * 2010-01-14 2013-03-19 Symantec Corporation Using sequencing and timing information of behavior events in machine learning to detect malware
CN103150509A (en) * 2013-03-15 2013-06-12 长沙文盾信息技术有限公司 Virus detection system based on virtual execution
CN103577756A (en) * 2013-11-05 2014-02-12 北京奇虎科技有限公司 Virus detection method and device based on script type judgment
CN103839003A (en) * 2012-11-22 2014-06-04 腾讯科技(深圳)有限公司 Malicious file detection method and device

Also Published As

Publication number Publication date
CN110210218A (en) 2019-09-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant