CN110210216B - Virus detection method and related device - Google Patents

Info

Publication number
CN110210216B
Authority
CN
China
Prior art keywords
sample
file
operation code
target
code vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810332378.XA
Other languages
Chinese (zh)
Other versions
CN110210216A (en
Inventor
雷经纬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810332378.XA priority Critical patent/CN110210216B/en
Publication of CN110210216A publication Critical patent/CN110210216A/en
Application granted granted Critical
Publication of CN110210216B publication Critical patent/CN110210216B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562: Static detection
    • G06F21/563: Static detection by source code analysis

Abstract

An embodiment of the invention discloses a virus detection method, comprising: acquiring a target operation code vector of a file to be detected, wherein the target operation code vector is generated according to at least one instruction; acquiring, through a virus detection model, a target sample label corresponding to the target operation code vector, wherein the virus detection model is trained on a positive sample operation code vector set and a negative sample operation code vector set and represents the relationship between operation code vectors and sample labels; and determining a virus detection result of the file to be detected according to the target sample label. An embodiment of the invention also provides a virus detection device. The embodiments eliminate the manual extraction of feature codes and can also detect unknown viruses, which helps improve the security of the scheme.

Description

Virus detection method and related device
Technical Field
The present invention relates to the field of information security technologies, and in particular, to a method and a related apparatus for virus detection.
Background
With the development of computer and network technology, the variety of viruses keeps growing, and highly destructive, well-concealed viruses persist for long periods. A virus is a program or a piece of executable code that, like a biological virus, can replicate itself, infect other hosts, and reactivate. Viruses can attach themselves to various types of files, and when those files are copied or transferred from one user to another, the viruses spread along with them.
At present, virus detection is usually performed as follows: a manually labeled virus sample is analyzed, a binary segment is extracted from the sample as a feature code, and if a file to be detected matches the feature code, the file is considered to carry the virus.
However, this way of determining whether a file carries a virus has a problem: because the feature codes are determined in advance, a newly emerging virus is difficult to detect. In other words, the existing scheme cannot detect unknown viruses, which is detrimental to information security.
Disclosure of Invention
The embodiments of the invention provide a virus detection method and a related device that eliminate the manual extraction of feature codes and can also detect unknown viruses, which helps improve the security of the scheme.
In a first aspect of the present invention, there is provided a method for detecting a virus, comprising:
acquiring a target operation code vector of a file to be detected, wherein the target operation code vector is generated according to at least one instruction;
acquiring a target sample label corresponding to the target operation code vector through a virus detection model, wherein the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set, and the virus detection model is used for representing the relation between the operation code vector and the sample label;
and determining the virus detection result of the file to be detected according to the target sample label.
In a second aspect of the present invention, there is provided a virus detection apparatus comprising:
an acquisition module, configured to acquire a target operation code vector of a file to be detected, wherein the target operation code vector is generated according to at least one instruction;
the acquisition module is further configured to acquire a target sample label corresponding to the target operation code vector through a virus detection model, where the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set, and the virus detection model is used to represent a relationship between the operation code vector and the sample label;
and a determining module, configured to determine the virus detection result of the file to be detected according to the target sample label acquired by the acquisition module.
A third aspect of the present invention provides a virus detection apparatus, including: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory, which includes the following steps:
acquiring a target operation code vector of a file to be detected, wherein the target operation code vector is generated according to at least one instruction;
acquiring a target sample label corresponding to the target operation code vector through a virus detection model, wherein the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set, and the virus detection model is used for representing the relation between the operation code vector and the sample label;
determining a virus detection result of the file to be detected according to the target sample label;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the above-described aspects.
According to the technical scheme, the embodiment of the invention has the following advantages:
An embodiment of the invention provides a virus detection method. First, a target operation code vector of a file to be detected is acquired, wherein the target operation code vector is generated according to at least one instruction. Then, a target sample label corresponding to the target operation code vector is acquired through a virus detection model, wherein the virus detection model is trained on a positive sample operation code vector set and a negative sample operation code vector set and represents the relationship between operation code vectors and sample labels. Finally, the virus detection result of the file to be detected is determined according to the target sample label. In this way, on the one hand, manual extraction of feature codes is no longer needed: the sample label of the file to be detected is obtained directly from the virus detection model, and the label indicates whether the file carries a virus. On the other hand, because the virus detection model is trained on a large number of positive and negative samples, it has good predictive capability and can therefore detect unknown viruses, which helps improve the security of the scheme.
Drawings
FIG. 1 is a schematic diagram of an architecture of a virus detection system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a call relationship of a virus detection system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a method for virus detection according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating an embodiment of obtaining a target opcode vector;
FIG. 5 is a schematic diagram of an internal structure of a payload file according to an embodiment of the invention;
FIG. 6 is a diagram illustrating a format of an instruction according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart of training a virus detection model according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an embodiment of a decision tree in an embodiment of the present invention;
FIG. 9 is a schematic flowchart illustrating an exemplary process of checking a file to be checked according to an embodiment of the present invention;
FIG. 10 is a schematic flow chart of virus detection in an application scenario of the present invention;
FIG. 11 is a schematic diagram of an embodiment of a virus detection apparatus in an embodiment of the present invention;
FIG. 12 is a schematic diagram of another embodiment of a virus detection apparatus according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of a virus detection apparatus according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a virus detection method and a related device that eliminate the manual extraction of feature codes and can also detect unknown viruses, which helps improve the security of the scheme.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that the present invention is mainly applicable to the detection of Android viruses, and may also be applied to other types of virus detection, such as computer virus detection, Apple system (iOS) virus detection, and Microsoft system (Windows) virus detection; Android virus detection is taken as the example in this description. Android ships with a set of core application packages, including clients, Short Message Service (SMS) programs, calendars, maps, browsers, and contact management programs.
Meanwhile, the Android system is also harmed by Android viruses, such as the "hundred-bug trojan" (which infects popularization applications), the "lizard tail trojan" (which infects system library files, replaces system files, injects into system processes, steals user information, and monitors calls and messages), and the "authority killer" (which can counter security software, monitor messages, pop up advertisements, and promote applications and inflate traffic). The scheme can be used to detect known Android viruses as well as unknown ones.
Referring to fig. 1, fig. 1 is a schematic diagram of an architecture of a virus detection system according to an embodiment of the present invention, and as shown in the drawing, a virus detection device in the present disclosure may be deployed in a server, and after the server obtains a virus detection result, the virus detection result is sent to a terminal device, so that a user can know the virus detection result of a file to be detected through a display interface of the terminal device. Optionally, the virus detection device in the present solution may also be deployed in a terminal device, and the terminal device directly detects the file to be detected and displays the virus detection result on a display interface at the front end.
The virus detection device of the present invention may comprise three logic modules, each implementing a corresponding function. Referring to fig. 2, fig. 2 is a schematic diagram of a call relationship of a virus detection system according to an embodiment of the present invention. As shown in the figure, the three logic modules are a vector generation module, a random forest model training module, and a detection flow control module. The vector generation module is an independent module that is called by the other two. A batch of Android virus samples and Android safety samples is input to the random forest model training module, which calls the vector generation module to obtain the operation code vectors of the positive and negative samples and then inputs those vectors into an Artificial Intelligence (AI) model (specifically, a random forest model) to obtain a model file. The detection flow control module calls the vector generation module to obtain the target operation code vector of the file to be detected and finally feeds that vector to the AI model (such as a random forest model) to obtain the safety state of the sample to be detected.
Referring to fig. 3, a method for detecting a virus according to an embodiment of the present invention is described below with reference to a virus detection apparatus, where the method for detecting a virus according to an embodiment of the present invention includes:
101. Acquiring a target operation code vector of a file to be detected, wherein the target operation code vector is generated according to at least one instruction;
In this embodiment, the virus detection apparatus first receives a virus detection instruction that carries an identifier of a file to be detected; the file to be detected is determined from this identifier. The apparatus then extracts the operation code vector of the file to be detected to obtain the target operation code vector.
The target operation code vector is generated according to at least one instruction, and the target operation code vector comprises at least one vector element, and each vector element corresponds to one instruction.
102. Acquiring a target sample label corresponding to a target operation code vector through a virus detection model, wherein the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set, and the virus detection model is used for expressing the relation between the operation code vector and the sample label;
In this embodiment, the virus detection apparatus inputs the target operation code vector into a pre-trained virus detection model, and the model outputs the target sample label of the file to be detected.
The virus detection model is trained on a large number of positive sample operation code vectors (the positive sample operation code vector set) and a large number of negative sample operation code vectors (the negative sample operation code vector set). Both sets are fed into the AI model for training to obtain an AI model library file, which uses sample labels to indicate whether a virus is present (for example, "1" indicates a virus sample and "0" indicates a safety sample). When the target operation code vector of the file to be detected is passed through the virus detection model, the corresponding target sample label is output.
103. And determining the virus detection result of the file to be detected according to the target sample label.
In this embodiment, the virus detection device determines the virus detection result of the file to be detected according to the target sample label, and can send the virus detection result to the client, so that the user can know whether the file to be detected is safe through the client.
An embodiment of the invention provides a virus detection method. First, a target operation code vector of a file to be detected is acquired, wherein the target operation code vector is generated according to at least one instruction. Then, a target sample label corresponding to the target operation code vector is acquired through a virus detection model, wherein the virus detection model is trained on a positive sample operation code vector set and a negative sample operation code vector set and represents the relationship between operation code vectors and sample labels. Finally, the virus detection result of the file to be detected is determined according to the target sample label. In this way, on the one hand, manual extraction of feature codes is no longer needed: the sample label of the file to be detected is obtained directly from the virus detection model, and the label indicates whether the file carries a virus. On the other hand, because the virus detection model is trained on a large number of positive and negative samples, it has good predictive capability and can therefore detect unknown viruses, which helps improve the security of the scheme.
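Steps 101 to 103 can be sketched as follows. This is only a minimal illustration, not the patented implementation: the function name `detect`, the result strings, and the stand-in `toy_model` are hypothetical, and the label values follow the "1" (virus) and "0" (safety) example given above.

```python
from typing import Callable, Sequence

VIRUS_LABEL = 1  # "1" indicates a virus sample in the example above
SAFE_LABEL = 0   # "0" indicates a safety sample in the example above


def detect(target_vector: Sequence[int],
           model: Callable[[Sequence[int]], int]) -> str:
    """Map a target operation code vector (step 101) to a detection result."""
    label = model(target_vector)  # step 102: the model returns the sample label
    # Step 103: turn the sample label into a virus detection result.
    if label == VIRUS_LABEL:
        return "virus"
    if label == SAFE_LABEL:
        return "safe"
    raise ValueError(f"unexpected sample label: {label!r}")


def toy_model(vector: Sequence[int]) -> int:
    """Hypothetical stand-in for a trained model: flags many 'invoke' opcodes."""
    return VIRUS_LABEL if vector[2] > 10 else SAFE_LABEL
```

Any trained classifier that maps a vector to a label could be passed as `model`; the toy rule here exists only so that the sketch runs end to end.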
Optionally, on the basis of the embodiment corresponding to fig. 3, in a first optional embodiment of the method for detecting a virus according to the embodiment of the present invention, the obtaining a target opcode vector of a file to be detected may include:
if the file to be detected comprises a payload file, acquiring an instruction set of the file to be detected, wherein the instruction set comprises at least one instruction;
and numbering each instruction in the instruction set according to an instruction numbering rule to generate a target operation code vector of the file to be detected.
In this embodiment, the virus detection apparatus first determines whether a payload file exists in the file to be detected. If so, the apparatus obtains an instruction set from the file to be detected, where the instruction set generally includes at least one instruction, numbers each instruction in the instruction set according to an instruction numbering rule, and finally generates the target opcode vector of the file to be detected.
It will be appreciated that viruses typically perform some harmful or malicious action, and the part of the virus code that performs this function is called the payload. The payload determines what the program can do once it runs in the victim's environment, including but not limited to corrupting files, deleting files, sending sensitive information to the author of the virus or any other recipient, and opening backdoors on infected computers.
How to extract the payload file is described below, taking an Android file as an example. An Android installation package (Android Package, APK) is an executable program that may be implemented as a compressed archive. Table 1 illustrates the file structure inside the archive.
TABLE 1
File or directory      Function
META-INF/              Directory holding the package description (manifest) information, carried over from the Java jar file format
res/                   Directory for storing resource files
libs/                  Stores the .so libraries compiled with the NDK, if any
AndroidManifest.xml    Global configuration file of the program
classes.dex            The finally generated Dalvik bytecode
resources.arsc         Compiled binary resource file
As can be seen from Table 1, files with the suffix ".dex" can be found after opening "res/". A file with the suffix ".dex" is a payload file, with the exception of "classes.dex"; in other words, ".dex" files other than "classes.dex" are payload files. If the file to be detected contains no payload file, it is outside the scope of virus detection, and the security detection of the file can exit early.
Specifically, referring to fig. 4, fig. 4 is a schematic flowchart of obtaining a target opcode vector according to an embodiment of the present invention. In step 201, a file to be detected is obtained; it may be an installation package file, a picture file, a video file, a document file, an audio file, or an application program, which is not limited herein. In step 202, the payload files are extracted from the file to be detected; a payload file has the suffix ".dex" but is not "classes.dex". In step 203, an instruction set is extracted from each payload file, where an instruction set typically includes a plurality of instructions. In step 204, the distribution of operation codes is counted for each instruction set. Finally, in step 205, the target operation code vector for the instruction set is generated from that distribution.
Secondly, in the embodiment of the present invention, if the file to be detected contains a payload file, the virus detection apparatus may obtain an instruction set of the file to be detected, and number each instruction in the instruction set according to an instruction number rule, so as to generate a target opcode vector of the file to be detected. By the method, the instruction set can be extracted and processed for the file to be detected containing the payload file, the payload file can reflect whether harmful or malignant operation is possible to occur on the file to be detected, and virus detection is not needed if no payload file exists, so that the success rate of virus detection is improved.
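The payload check in step 202 can be sketched with Python's standard `zipfile` module, since an installation package may be a compressed archive. This is a hedged illustration: the function name `find_payload_entries` is hypothetical, and scanning the whole archive (rather than only "res/") and treating every ".dex" entry other than "classes.dex" as a payload file are assumptions drawn from the description above.

```python
import io
import zipfile


def find_payload_entries(apk_bytes: bytes) -> list:
    """Return the archive entries assumed to be payload files.

    Assumption: a payload file is any ".dex" entry whose file name
    is not "classes.dex".
    """
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith(".dex")
            and name.rsplit("/", 1)[-1] != "classes.dex"
        )


def should_scan(apk_bytes: bytes) -> bool:
    """Detection can exit early when no payload file is present."""
    return bool(find_payload_entries(apk_bytes))
```

Returning early when `should_scan` is false mirrors the early exit described above for files without a payload.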
Optionally, on the basis of the first embodiment corresponding to fig. 3, in a second optional embodiment of the method for virus detection provided in the embodiment of the present invention, the target opcode vector includes at least one vector element;
numbering each instruction in the instruction set according to an instruction numbering rule to generate a target operation code vector of a file to be detected, which may include:
counting the occurrence times of each operation code in the instruction set;
generating vector elements according to the instruction numbering rule and the occurrence number of each operation code;
and generating a target operation code vector of the file to be detected according to the vector elements.
In this embodiment, the target opcode vector corresponds to a method code segment. For ease of understanding, refer to fig. 5, which is a schematic diagram of the internal structure of a payload file according to an embodiment of the present invention: a class includes a plurality of method code segments, and a method code segment corresponds to a plurality of instructions. In brief, a class represents a type of thing and can be instantiated into an object through a constructor. Java code generally accomplishes its functions by calling methods on objects of a class, so the instructions of a method characterize its function and are the key objects of detection. A method is the code in a class that performs a certain function.
Specifically, referring to fig. 6, fig. 6 is a schematic diagram of a format of an instruction according to an embodiment of the present invention, where, as shown, an instruction at least includes two parts, namely an operation code and data, the operation code is used to indicate a type of an operation, and the data is used to indicate content of the operation.
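The two-part instruction format can be illustrated with a small parser. This is a sketch under assumptions: it treats instructions as text lines of the form "opcode data" (as in the example listing used in this description), whereas real bytecode is binary; the names `Instruction` and `parse_instruction` are hypothetical.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    opcode: str  # indicates the type of the operation
    data: str    # indicates the content of the operation (may be empty)


def parse_instruction(line: str) -> Instruction:
    """Split a textual instruction into its opcode and data parts."""
    opcode, _, data = line.strip().partition(" ")
    return Instruction(opcode, data.strip())
```

For instance, `parse_instruction("add va,vb")` yields the opcode `add` and the data `va,vb`; only the opcode part is used when building the vector.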
How to generate the target opcode vector for the file to be detected is described below. A file to be detected may include multiple payload files, a payload file may include multiple classes, a class includes multiple methods, and a method includes multiple instructions. The target opcode vector may be generated from the instructions corresponding to any one of the methods in the file to be detected.
Since the types of opcodes are predetermined, each instruction may be assigned a number (e.g., starting from 1), which forms the instruction numbering rule: for example, the "mov" instruction corresponds to number 1, the "add" instruction to number 2, and so on. Table 2 illustrates the instruction numbering rule.
TABLE 2
Instruction                               Number
mov (move instruction)                    1
add (add instruction)                     2
invoke (call instruction)                 3
nop (no-operation instruction)            4
move-wide (wide move instruction)         5
return (return instruction)               6
return-wide (wide return instruction)     7
Firstly, the virus detection apparatus extracts a method code segment of the payload file, and extracts an operation code of an instruction, assuming that the instruction of a certain method code segment is as follows:
mov va,1
add va,vb
invoke xxx
The opcode sequence is therefore "mov, add, invoke". The opcodes of all method code segments of the payload file are collected and the number of occurrences of each opcode is counted. Arranging these counts in the order given by the instruction numbering rule yields a string of numbers, which is the opcode vector of the sample. If a certain opcode does not appear in the payload file, its count is recorded as 0.
For example, if the "mov" instruction occurs 3 times, the "add" instruction occurs 1 time, the "invoke" instruction occurs 2 times, the "nop" instruction occurs 5 times, the "move-wide" instruction occurs 0 times, the "return" instruction occurs 1 time, and the "return-wide" instruction occurs 0 times, the opcode vector may be expressed as [3,1,2,5,0,1,0].
It should be noted that the above example is only an illustration, and in practical applications, the target opcode vector, the negative sample opcode vector, and the positive sample opcode vector of the file to be detected can be obtained in the above manner.
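The counting procedure can be sketched as follows, using the numbering of Table 2. This is an illustration only: the function name `opcode_vector` is hypothetical, instructions are assumed to be text lines whose first token is the opcode, and silently ignoring opcodes outside the numbering rule is an assumption.

```python
from collections import Counter

# Instruction numbering rule taken from Table 2.
INSTRUCTION_NUMBERS = {
    "mov": 1, "add": 2, "invoke": 3, "nop": 4,
    "move-wide": 5, "return": 6, "return-wide": 7,
}


def opcode_vector(instructions, numbering=INSTRUCTION_NUMBERS):
    """Count opcode occurrences and order the counts by instruction number."""
    # The first whitespace-separated token of each line is taken as the opcode.
    counts = Counter(line.split()[0] for line in instructions if line.strip())
    ordered_opcodes = sorted(numbering, key=numbering.get)
    # Opcodes that never occur get a count of 0; unknown opcodes are ignored.
    return [counts[opcode] for opcode in ordered_opcodes]
```

Applied to the three-instruction method code segment shown earlier, `opcode_vector(["mov va,1", "add va,vb", "invoke xxx"])` returns `[1, 1, 1, 0, 0, 0, 0]`.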
Thirdly, in the embodiment of the present invention, the virus detection apparatus processes the occurrence frequency of each instruction by using the instruction numbering rule, obtains the vector element, and determines the required target opcode vector according to the generated at least one vector element. Through the mode, the target operation code vector of the file to be detected can be automatically generated by utilizing the relation between the instruction indicated by the instruction numbering rule and the serial number, and the operation code vector does not need to be manually extracted, so that the efficiency of file safety detection is favorably improved, the labor cost is reduced, and the practicability of the scheme is improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in a third optional embodiment of the method for detecting a virus according to the embodiment of the present invention, before obtaining, by using a virus detection model, a target sample tag corresponding to a target operation code vector, the method may further include:
acquiring a positive sample operation code vector set and a negative sample operation code vector set, wherein the positive sample operation code vector set comprises at least one positive sample operation code vector, the negative sample operation code vector set comprises at least one negative sample operation code vector, the positive sample operation code vector is generated according to at least one instruction in a safety sample, and the negative sample operation code vector is generated according to at least one instruction in a virus sample;
and training the positive sample operation code vector set and the negative sample operation code vector set to obtain a random forest model.
In this embodiment, the virus detection model may be specifically a random forest model, and in combination with the second embodiment corresponding to fig. 3, the operation code vectors of the positive samples and the operation code vectors of the negative samples may be generated in a similar manner. Wherein positive samples can be considered as safe samples and negative samples are virus samples.
How the virus detection model is trained is described below with reference to fig. 7, which is a schematic flowchart of training the virus detection model in an embodiment of the present invention. In step 301, a batch of positive samples and negative samples is obtained, where the positive samples are safety samples and the negative samples are virus samples. In step 302, the opcode vector of each positive sample is extracted to generate the positive sample opcode vector set, and the opcode vector of each negative sample is extracted to generate the negative sample opcode vector set. In step 303, a random forest model is trained on the positive sample opcode vector set and the negative sample opcode vector set.
The random forest model is a classifier that trains on and predicts samples by using a plurality of trees. In machine learning, a random forest is a classifier that contains multiple decision trees, and its output class is the mode of the classes output by the individual trees.
It is understood that, in practical applications, the virus detection model may also be a recurrent neural network (RNN) model or a deep neural network (DNN) model, which is not limited herein.
Secondly, in the embodiment of the present invention, the virus detection apparatus may obtain a positive sample operation code vector set and a negative sample operation code vector set in advance, and then train on the positive sample operation code vector set and the negative sample operation code vector set to obtain the random forest model. In this way, the random forest model serves as the virus detection model. A random forest model can produce highly accurate results, can process a large amount of data, and can balance errors when the classes are imbalanced, which improves the practicability and operability of the scheme.
Optionally, on the basis of the third embodiment corresponding to fig. 3, in a fourth optional embodiment of the method for virus detection provided in the embodiment of the present invention, obtaining, by using a virus detection model, a target sample tag corresponding to a target opcode vector may include:
inputting the target operation code vector into a random forest model to obtain a sample judgment result corresponding to the target operation code vector, wherein the random forest model comprises a plurality of decision trees, and each decision tree is used for outputting a positive result or a negative result;
and determining a target sample label according to the sample judgment result.
In this embodiment, the virus detection apparatus may determine the target sample label through the random forest model. Firstly, a target operation code vector is required to be input into a random forest model to obtain a sample judgment result corresponding to the target operation code vector, wherein the random forest model comprises a plurality of decision trees, each decision tree is used for outputting a positive result or a negative result of a sample, the positive result can represent a safety sample, the negative result can represent a virus sample, and finally a target sample label is determined according to the sample result output by each decision tree.
For convenience of description, refer to fig. 8, which is a schematic diagram of an embodiment of a decision tree according to an embodiment of the present invention. The random forest model in this embodiment is composed of a plurality of decision trees such as the one shown in fig. 8. Each decision tree produces one judgment result (safe sample or virus sample), and the result produced by the larger number of trees is taken as the final sample judgment result. If both results are produced by the same number of trees, the sample is considered a virus sample. The depth of a decision tree is the maximum number of judgments needed to reach a result. The decision tree shown in fig. 8 needs 2 or 3 judgments to reach a result, so its depth is 3.
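The notion of depth can be made concrete with a small Python sketch. The tree below is not the tree of fig. 8. Its conditions on vector elements are invented for illustration, and only its shape (every path needs 2 or 3 judgments) follows the description above.

```python
# A node is either a leaf label (str) or a tuple (condition, if_true, if_false).
# The conditions are made-up tests on opcode counts, not those of fig. 8.
TREE = (
    lambda v: v[0] > 10,
    (lambda v: v[1] > 3, "virus", "safe"),
    (lambda v: v[2] > 5, "safe", (lambda v: v[3] > 0, "virus", "safe")),
)


def depth(node):
    """Maximum number of judgments on any path from the root to a leaf."""
    if isinstance(node, str):
        return 0
    _, if_true, if_false = node
    return 1 + max(depth(if_true), depth(if_false))


def judge(node, vector):
    """Walk the tree and return (result, number of judgments made)."""
    judgments = 0
    while not isinstance(node, str):
        condition, if_true, if_false = node
        node = if_true if condition(vector) else if_false
        judgments += 1
    return node, judgments


print(depth(TREE))                  # 3
print(judge(TREE, [12, 5, 0, 0]))   # ('virus', 2)
print(judge(TREE, [1, 0, 2, 7]))    # ('virus', 3)
```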
It can be understood that the parameters of the random forest model need to be configured in advance. Based on long-term experience and experiments, the parameter configuration in this scheme may be as follows: the number of decision trees in the random forest model is between 50 and 80, the depth of each tree is between 20 and 40, and the minimum number of samples in a leaf node is less than or equal to 5. In practical applications, these parameters may also be adjusted according to the situation; the values given here are only illustrative and should not be construed as a limitation of the present scheme.
Thirdly, in the embodiment of the invention, the virus detection device inputs the target operation code vector to the random forest model to obtain a sample judgment result corresponding to the target operation code vector, and then determines the target sample label according to the sample judgment result. In this way, the target sample label of the file to be detected can be determined by using the output results of the plurality of decision trees in the random forest model. Decision trees are basic classifiers that are highly readable and classify quickly, so the operability and the detection efficiency of the scheme can be improved.
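The suggested ranges can be captured in a small configuration object. The following Python sketch is only one way to write them down: the class and field names are hypothetical, the ranges are treated as inclusive, and the lower bound of 1 on the leaf size is an assumption (a leaf must hold at least one sample).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForestConfig:
    n_trees: int           # number of decision trees in the forest
    max_depth: int         # depth of each tree
    min_samples_leaf: int  # minimum number of samples in a leaf node

    def within_suggested_ranges(self) -> bool:
        # Ranges from the text, treated as inclusive; the lower bound of 1
        # on min_samples_leaf is an assumption of this sketch.
        return (50 <= self.n_trees <= 80
                and 20 <= self.max_depth <= 40
                and 1 <= self.min_samples_leaf <= 5)


print(ForestConfig(n_trees=60, max_depth=30, min_samples_leaf=5).within_suggested_ranges())
print(ForestConfig(n_trees=100, max_depth=30, min_samples_leaf=5).within_suggested_ranges())
```

Because the text allows the parameters to be adjusted in practice, a check like this is better suited to warning than to rejecting a configuration.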
Optionally, on the basis of the fourth embodiment corresponding to fig. 3, in a fifth optional embodiment of the method for detecting a virus according to the embodiment of the present invention, obtaining a sample judgment result corresponding to the target opcode vector may include:
acquiring the quantity of positive results and the quantity of negative results corresponding to the target operation code vector according to the output result of each decision tree in the random forest model;
determining a target sample label according to the sample judgment result may include:
and if the number of the negative results is less than the number of the positive results, determining that the target sample label is a positive label.
In this embodiment, after the virus detection apparatus inputs the target operation code vector of the file to be detected to the random forest model, the random forest model outputs a sample judgment result. Assume that there are 50 trees in the random forest model and that each tree outputs one result. If 22 of the 50 trees output negative results, the remaining 28 trees output positive results. The number of negative results is then less than the number of positive results (22 is less than 28), and the target sample label is determined to be a positive label.
It can be understood that the positive label can be regarded as a security label. If the target sample labels of the file to be detected are all security labels, the file to be detected is considered a safe file, or the security of the file to be detected is determined to be in an unknown state.
Further, in the embodiment of the present invention, the virus detection apparatus may first obtain, according to an output result of each decision tree in the random forest model, a positive result number and a negative result number corresponding to the target operation code vector, and determine that the target sample label is a positive label if the negative result number is smaller than the positive result number. In this way, the target sample label corresponding to the file to be detected is generated by majority rule (the minority defers to the majority), and the target sample label can be determined to be a positive label when the number of negative results is smaller than the number of positive results, which improves the feasibility and practicability of the scheme.
Optionally, on the basis of the fourth embodiment corresponding to fig. 3, in a sixth optional embodiment of the method for virus detection provided in the embodiment of the present invention, obtaining the sample judgment result corresponding to the target opcode vector may include:
acquiring the number of positive results and the number of negative results corresponding to the target operation code vector according to the output result of each decision tree in the random forest model;
determining the target sample label according to the sample judgment result may include:
and if the number of the negative results is larger than or equal to the number of the positive results, determining that the target sample label is a negative label.
In this embodiment, after the virus detection apparatus inputs the target operation code vector of the file to be detected to the random forest model, the random forest model outputs a sample judgment result. Assume that there are 50 trees in the random forest model and that each tree outputs one result. If 35 of the 50 trees output negative results, the remaining 15 trees output positive results. The number of negative results is then greater than the number of positive results (35 is greater than 15), and the target sample label is determined to be a negative label.
It is understood that the negative label can be regarded as a virus label, and if the target sample label of the file to be detected includes a virus label, it means that the file to be detected is a file carrying virus.
Further, in the embodiment of the present invention, the virus detection apparatus may first obtain the number of positive results and the number of negative results corresponding to the target operation code vector according to the output result of each decision tree in the random forest model, and determine that the target sample label is a negative label if the number of negative results is greater than or equal to the number of positive results. In this way, the target sample label corresponding to the file to be detected is generated by majority rule (the minority defers to the majority), and the target sample label can be determined to be a negative label when the number of negative results is greater than or equal to the number of positive results, which improves the feasibility and practicability of the scheme.
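The counting rule of the fifth and sixth embodiments fits in a few lines of Python. The function name and the string encoding of the tree outputs are assumptions of this sketch; the three cases are the 22/28 and 35/15 examples from the text plus an even split.

```python
def target_sample_label(tree_outputs):
    """Return "positive" or "negative" from the per-tree results."""
    negative = sum(1 for result in tree_outputs if result == "negative")
    positive = sum(1 for result in tree_outputs if result == "positive")
    # A tie counts as negative: "greater than or equal to" in the sixth embodiment.
    return "negative" if negative >= positive else "positive"


print(target_sample_label(["negative"] * 22 + ["positive"] * 28))  # positive
print(target_sample_label(["negative"] * 35 + ["positive"] * 15))  # negative
print(target_sample_label(["negative"] * 25 + ["positive"] * 25))  # negative
```

Resolving ties as negative means that an evenly split forest flags the file rather than passing it.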
Optionally, on the basis of the fifth or sixth embodiment corresponding to fig. 3, in a seventh optional embodiment of the method for detecting a virus provided in the embodiment of the present invention, determining a virus detection result of a file to be detected according to a target sample tag may include:
if the target sample label is a positive label, determining that the file to be detected is a safe file;
and if the target sample label is a negative label, determining that the file to be detected is a virus file.
In this embodiment, the virus detection model may be a random forest model obtained by training on a large number of positive sample operation code vectors and a large number of negative sample operation code vectors. The positive sample operation code vectors and the negative sample operation code vectors are input into the random forest model for training to obtain a model library file, and the model library file uses a sample label to indicate whether a virus exists. After the target operation code vector of the file to be detected passes through the random forest model, the corresponding target sample label is output.
If the output target sample label is a positive label, the file to be detected is a safe file or a file of unknown security; if the output target sample label is a negative label, the file to be detected is a virus file. The positive case is qualified in this way because, although the virus detection model can predict some unknown viruses, it is difficult to guarantee that all viruses are detected; a file to be detected that carries a positive label can therefore be temporarily regarded as a safe file.
It should be noted that the positive label represents a security label and may also be represented as "0", the negative label represents a temporary virus label and may be represented as "1", and in practical applications, the positive label and the negative label may also be represented in other forms, which are not limited herein.
The flow of detecting a file to be detected is described below with reference to fig. 9, which is a schematic flow chart of detecting a file to be detected according to an embodiment of the present invention. As shown in the figure, specifically:
in step 401, a batch of positive samples and negative samples are obtained, where the positive samples may refer to safety samples, and the negative samples may refer to virus samples. It should be noted that, in practical applications, the positive sample may also be set as a virus sample, and the negative sample may be set as a safety sample, which depends on the setting of the positive and negative samples by the user in advance;
in step 402, respectively extracting the opcode vector of the positive sample and the opcode vector of the negative sample;
in step 403, the operation code vectors of the positive samples and the operation code vectors of the negative samples are input into an AI model for training, wherein the AI model may be a random forest model;
in step 404, a model library file may be obtained after model training, and the model library file may be stored and copied for subsequent virus detection and call, where the model library file may be understood as a configuration file;
in step 405, a file to be detected is obtained;
in step 406, extracting a target operation code vector of the file to be detected, inputting the target operation code vector of the file to be detected into the model, and outputting a sample label of the file to be detected;
in step 407, judging whether the sample label of the file to be detected is consistent with the virus label; if so, entering step 408; otherwise, jumping to step 409;
in step 408, determining the file to be detected as a virus file;
in step 409, the security condition of the file to be detected cannot be determined, or the file to be detected is considered to be a safe file.
Secondly, in the embodiment of the invention, the virus detection device determines the security of the file to be detected according to the target sample label: if the target sample label is a negative label, the file to be detected is determined to be a virus file; otherwise, if the target sample label is a positive label, the file to be detected is determined to be a safe file. In this way, the virus detection model predicts the target sample label of the file to be detected, and when the obtained target sample label is consistent with the virus label, the file to be detected can be determined to carry a virus. The scheme thus has the capability of predicting viruses and can perceive unknown viruses, which improves the security of the scheme.
For ease of understanding, the flow of virus detection is described below with reference to fig. 10, which is a schematic flow chart of virus detection in an application scenario of the present invention. As shown in the figure, specifically:
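Steps 404 to 409 can be sketched as storing the trained model in a file, loading it again, and comparing the predicted label with the virus label. The text does not specify the format of the model library file, so the JSON layout below, the file name and the three single-split trees are purely illustrative assumptions. The 0/1 label encoding follows the representation mentioned above.

```python
import json
import os
import tempfile

SAFE, VIRUS = 0, 1  # positive label "0", negative label "1"

# Hypothetical trained model: each tree is a single split on one vector element.
model = {
    "trees": [
        {"feature": 0, "threshold": 3, "if_le": SAFE, "if_gt": VIRUS},
        {"feature": 1, "threshold": 1, "if_le": SAFE, "if_gt": VIRUS},
        {"feature": 2, "threshold": 0, "if_le": VIRUS, "if_gt": SAFE},
    ]
}


def save_model(model, path):
    """Step 404: store the model library file for later detection calls."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(model, handle)


def load_model(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def predict(model, vector):
    """Step 406: output the sample label for a target opcode vector."""
    votes = [tree["if_le"] if vector[tree["feature"]] <= tree["threshold"] else tree["if_gt"]
             for tree in model["trees"]]
    negative = votes.count(VIRUS)
    return VIRUS if negative >= len(votes) - negative else SAFE


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "model_library.json")
    save_model(model, path)
    reloaded = load_model(path)

# Steps 407 to 409: a label equal to the virus label means a virus file;
# any other label means safe or undetermined.
for vector in ([5, 2, 4], [1, 0, 2]):
    label = predict(reloaded, vector)
    print(vector, "virus file" if label == VIRUS else "safe or undetermined")
```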
in step 501, virus detection is started;
in step 502, a batch of positive samples and negative samples for virus detection model training are selected;
in step 503, selecting a file to be detected;
in step 504, it is determined whether the file to be detected obtained in step 503 contains a payload file; if so, step 505 is entered; otherwise, the flow jumps to step 513;
step 505 specifically includes four sub-steps: in step 5051, the positive samples, the negative samples and the file to be detected are obtained; in step 5052, the payload files in the positive samples, the negative samples and the file to be detected are extracted; in step 5053, the instruction set corresponding to each payload file is obtained, and the distribution of operation codes is counted according to the instruction set, where one instruction set comprises at least one instruction; finally, in step 5054, the operation code vector of the positive sample, the operation code vector of the negative sample and the operation code vector of the file to be detected are respectively generated from these distributions;
in step 506, the operation code vectors of the positive samples and the operation code vectors of the negative samples are input into a random forest model for training;
in step 507, a model library file can be obtained after model training, the model library file can be stored and copied for subsequent virus detection and calling, and the model library file can be understood as a configuration file;
in step 508, extracting the operation code vector of the file to be detected, and inputting the operation code vector of the file to be detected into the random forest model;
in step 509, the sample label of the file to be detected is obtained by using the trained random forest model stored in the model library file, that is, the virus detection model;
in step 510, judging whether the sample label of the file to be detected is consistent with the virus label; if so, entering step 511; otherwise, jumping to step 512;
in step 511, the file to be detected is determined as a virus file;
in step 512, the security condition of the file to be detected cannot be judged;
in step 513, the security detection of the file to be detected is ended.
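The branching in steps 504 and 510 to 513 can be summarized as a small function with three outcomes. This is a sketch under stated assumptions: the dictionary layout of a file, the `extract_vector` and `model` stand-ins, and the `Outcome` names are all hypothetical, and training (steps 502, 506 and 507) is taken as already done.

```python
from enum import Enum

VIRUS_LABEL = 1  # assumed encoding of the virus label


class Outcome(Enum):
    NOT_SCANNED = "no payload file; detection ends (step 513)"
    VIRUS = "virus file (step 511)"
    UNDETERMINED = "security cannot be judged (step 512)"


def detect(file_info, extract_vector, model):
    # Step 504: only files that contain a payload file are analysed.
    payload = file_info.get("payload")
    if not payload:
        return Outcome.NOT_SCANNED
    # Steps 505 and 508: build the opcode vector of the file to be detected.
    vector = extract_vector(payload)
    # Steps 509 and 510: predict the sample label and compare it with the virus label.
    label = model(vector)
    return Outcome.VIRUS if label == VIRUS_LABEL else Outcome.UNDETERMINED


# Stand-ins for the real extractor and the trained model (illustration only).
def extract_vector(payload):
    return [payload.count(opcode) for opcode in ("invoke", "move")]


def model(vector):
    return 1 if vector[0] > 2 else 0


print(detect({"name": "a.bin"}, extract_vector, model))
print(detect({"name": "b.bin", "payload": ["invoke"] * 4 + ["move"]}, extract_vector, model))
print(detect({"name": "c.bin", "payload": ["move", "invoke"]}, extract_vector, model))
```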
Referring to fig. 11, fig. 11 is a schematic view of an embodiment of a virus detection apparatus according to an embodiment of the present invention, in which a virus detection apparatus 60 includes:
an obtaining module 601, configured to obtain a target operation code vector of a file to be detected, where the target operation code vector is generated according to at least one instruction;
the obtaining module 601 is further configured to obtain a target sample tag corresponding to the target operation code vector through a virus detection model, where the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set, and the virus detection model is used to represent a relationship between an operation code vector and a sample tag;
a determining module 602, configured to determine a virus detection result of the file to be detected according to the target sample label obtained by the obtaining module.
In this embodiment, the obtaining module 601 obtains a target operation code vector of a file to be detected, where the target operation code vector is generated according to at least one instruction, the obtaining module 601 obtains a target sample tag corresponding to the target operation code vector through a virus detection model, where the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set, the virus detection model is used to represent a relationship between an operation code vector and a sample tag, and the determining module 602 determines a virus detection result of the file to be detected according to the target sample tag obtained by the obtaining module.
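One way to picture the division of work between the obtaining module 601 and the determining module 602 is the following Python sketch. The class and method names are hypothetical, and the extractor and model are injected as plain callables so that the example stays independent of any particular feature extractor or classifier.

```python
from typing import Callable, List, Sequence


class ObtainingModule:
    """Sketch of module 601: obtains the target opcode vector and its sample label."""

    def __init__(self,
                 extract: Callable[[Sequence[str]], List[int]],
                 model: Callable[[List[int]], str]):
        self._extract = extract
        self._model = model

    def target_vector(self, instructions: Sequence[str]) -> List[int]:
        return self._extract(instructions)

    def target_label(self, vector: List[int]) -> str:
        return self._model(vector)


class DeterminingModule:
    """Sketch of module 602: maps the target sample label to a detection result."""

    def result(self, label: str) -> str:
        return "virus file" if label == "negative" else "safe file"


class VirusDetectionApparatus:
    """Sketch of apparatus 60: wires the two modules together."""

    def __init__(self, obtaining: ObtainingModule, determining: DeterminingModule):
        self.obtaining = obtaining
        self.determining = determining

    def detect(self, instructions: Sequence[str]) -> str:
        vector = self.obtaining.target_vector(instructions)
        return self.determining.result(self.obtaining.target_label(vector))


# Toy stand-ins for the extractor and the trained model (illustration only).
apparatus = VirusDetectionApparatus(
    ObtainingModule(
        extract=lambda instructions: [len(instructions)],
        model=lambda vector: "negative" if vector[0] > 3 else "positive",
    ),
    DeterminingModule(),
)
print(apparatus.detect(["move", "invoke", "invoke", "goto", "return"]))
print(apparatus.detect(["move", "return"]))
```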
The embodiment of the invention provides a virus detection device. The device first obtains a target operation code vector of a file to be detected, where the target operation code vector is generated according to at least one instruction. It then obtains a target sample label corresponding to the target operation code vector through a virus detection model, where the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set and is used for expressing the relation between the operation code vector and the sample label. Finally, it determines the virus detection result of the file to be detected according to the target sample label. In this way, on the one hand, manual extraction of feature codes is no longer needed: the sample label of the file to be detected is obtained directly through the virus detection model, and the sample label indicates whether the file to be detected carries a virus. On the other hand, the virus detection model is trained on a large number of positive and negative samples and therefore has good virus prediction capability, so that unknown viruses can be perceived and the security of the scheme is improved.
Optionally, on the basis of the embodiment corresponding to fig. 11, in another embodiment of the virus detection apparatus 60 provided in the embodiment of the present invention,
the obtaining module 601 is specifically configured to obtain an instruction set of the file to be detected if the file to be detected includes a payload file, where the instruction set includes the at least one instruction;
and numbering each instruction in the instruction set according to an instruction numbering rule to generate the target operation code vector of the file to be detected.
Secondly, in the embodiment of the present invention, if the file to be detected contains a payload file, the virus detection apparatus may obtain an instruction set of the file to be detected, and number each instruction in the instruction set according to an instruction numbering rule, so as to generate a target operation code vector of the file to be detected. In this way, the instruction set of a file to be detected that contains a payload file can be extracted and processed. The payload file can reflect whether the file to be detected may involve harmful or malicious operations, and virus detection is not needed if no payload file exists, which improves the success rate of virus detection.
Optionally, on the basis of the embodiment corresponding to fig. 11, in another embodiment of the virus detection apparatus 60 provided in the embodiment of the present invention, the target opcode vector includes at least one vector element;
the obtaining module 601 is specifically configured to count the occurrence number of each operation code in the instruction set;
generating vector elements according to the instruction numbering rule and the occurrence number of each operation code;
and generating the target operation code vector of the file to be detected according to the vector elements.
Thirdly, in the embodiment of the present invention, the virus detection apparatus processes the number of occurrences of each instruction by using the instruction numbering rule to obtain vector elements, and determines the required target operation code vector from the at least one generated vector element. In this way, the target operation code vector of the file to be detected can be generated automatically by using the correspondence between instructions and numbers indicated by the instruction numbering rule, and the operation code vector does not need to be extracted manually, which improves the efficiency of file security detection, reduces labor cost, and improves the practicability of the scheme.
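A minimal Python sketch of this counting step follows. The numbering rule and the instruction strings are invented for the example, and the sketch assumes that the number assigned to an operation code is the index of its vector element and that the element value is the number of occurrences.

```python
from collections import Counter

# Hypothetical instruction numbering rule: opcode name -> index in the vector.
NUMBERING_RULE = {"nop": 0, "move": 1, "invoke": 2, "return": 3, "goto": 4}


def opcode_vector(instruction_set, numbering_rule=NUMBERING_RULE):
    """Count each opcode and place the count at the index given by the rule."""
    # The opcode is taken to be the first token of each (non-empty) instruction.
    counts = Counter(instruction.split()[0] for instruction in instruction_set)
    vector = [0] * len(numbering_rule)
    for opcode, count in counts.items():
        if opcode in numbering_rule:  # opcodes outside the rule are ignored here
            vector[numbering_rule[opcode]] = count
    return vector


instructions = [
    "move v0, v1",
    "invoke v0",
    "move v2, v0",
    "return v2",
    "invoke v1",
    "move v1, v2",
]
print(opcode_vector(instructions))  # [0, 3, 2, 1, 0]
```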
Optionally, on the basis of the embodiment corresponding to fig. 11, referring to fig. 12, in another embodiment of the virus detection apparatus 60 provided in the embodiment of the present invention, the virus detection apparatus 60 further includes a training module 603;
the obtaining module 601 is further configured to obtain the set of positive sample opcode vectors and the set of negative sample opcode vectors before obtaining a target sample tag corresponding to the target opcode vector through a virus detection model, where the set of positive sample opcode vectors includes at least one positive sample opcode vector, the set of negative sample opcode vectors includes at least one negative sample opcode vector, the positive sample opcode vector is generated according to at least one instruction in a security sample, and the negative sample opcode vector is generated according to at least one instruction in a virus sample;
the training module 603 is configured to train the positive sample operation code vector set and the negative sample operation code vector set obtained by the obtaining module 601, so as to obtain a random forest model.
Secondly, in the embodiment of the present invention, the virus detection apparatus may obtain a positive sample operation code vector set and a negative sample operation code vector set in advance, and then train on the positive sample operation code vector set and the negative sample operation code vector set to obtain the random forest model. In this way, the random forest model serves as the virus detection model. A random forest model can produce highly accurate results, can process a large amount of data, and can balance errors when the classes are imbalanced, which improves the practicability and operability of the scheme.
Optionally, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the virus detection apparatus 60 provided in the embodiment of the present invention,
the obtaining module 601 is specifically configured to input the target opcode vector to the random forest model to obtain a sample determination result corresponding to the target opcode vector, where the random forest model includes multiple decision trees, and each decision tree is used to output a positive result or a negative result;
and determining the target sample label according to the sample judgment result.
Thirdly, in the embodiment of the invention, the virus detection device inputs the target operation code vector to the random forest model to obtain a sample judgment result corresponding to the target operation code vector, and then determines the target sample label according to the sample judgment result. In this way, the target sample label of the file to be detected can be determined by using the output results of the plurality of decision trees in the random forest model. Decision trees are basic classifiers that are highly readable and classify quickly, so the operability and the detection efficiency of the scheme can be improved.
Optionally, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the virus detection apparatus 60 provided in the embodiment of the present invention,
the obtaining module 601 is specifically configured to obtain, according to an output result of each decision tree in the random forest model, a positive result number and a negative result number corresponding to the target opcode vector;
and if the negative result quantity is smaller than the positive result quantity, determining that the target sample label is a positive label.
Further, in the embodiment of the present invention, the virus detection apparatus may first obtain, according to an output result of each decision tree in the random forest model, a positive result number and a negative result number corresponding to the target operation code vector, and determine that the target sample label is a positive label if the negative result number is smaller than the positive result number. In this way, the target sample label corresponding to the file to be detected is generated by majority rule (the minority defers to the majority), and the target sample label can be determined to be a positive label when the number of negative results is smaller than the number of positive results, which improves the feasibility and practicability of the scheme.
Optionally, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the virus detection apparatus 60 provided in the embodiment of the present invention,
the obtaining module 601 is specifically configured to obtain, according to an output result of each decision tree in the random forest model, a positive result number and a negative result number corresponding to the target opcode vector;
and if the number of the negative results is greater than or equal to the number of the positive results, determining that the target sample label is a negative label.
Further, in the embodiment of the present invention, the virus detection apparatus may first obtain, according to an output result of each decision tree in the random forest model, a positive result number and a negative result number corresponding to the target operation code vector, and determine that the target sample label is a negative label if the negative result number is greater than or equal to the positive result number. In this way, the target sample label corresponding to the file to be detected is generated by majority rule (the minority defers to the majority), and the target sample label can be determined to be a negative label when the number of negative results is greater than or equal to the number of positive results, which improves the feasibility and practicability of the scheme.
Optionally, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the virus detection apparatus 60 provided in the embodiment of the present invention,
the determining module 602 is specifically configured to determine that the file to be detected is a secure file if the target sample label is the positive label;
and if the target sample label is the negative label, determining that the file to be detected is a virus file.
Secondly, in the embodiment of the invention, the virus detection device determines the security of the file to be detected according to the target sample label: if the target sample label is a negative label, the file to be detected is determined to be a virus file; otherwise, if the target sample label is a positive label, the file to be detected is determined to be a safe file. In this way, the virus detection model predicts the target sample label of the file to be detected, and when the obtained target sample label is consistent with the virus label, the file to be detected can be determined to carry a virus. The scheme thus has the capability of predicting viruses and can perceive unknown viruses, which improves the security of the scheme.
Fig. 13 is a schematic structural diagram of a server 700 according to an embodiment of the present invention. The server 700 may vary considerably depending on its configuration or performance, and may include one or more Central Processing Units (CPUs) 722 (e.g., one or more processors), a memory 732, and one or more storage media 730 (e.g., one or more mass storage devices) for storing applications 742 or data 744. The memory 732 and the storage medium 730 may be transient storage or persistent storage. The program stored on the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processor 722 may be configured to communicate with the storage medium 730 and to execute, on the server 700, the series of instruction operations in the storage medium 730.
The server 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input-output interfaces 758, and/or one or more operating systems 741, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 13.
CPU 722 is configured to perform the following steps:
acquiring a target operation code vector of a file to be detected, wherein the target operation code vector is generated according to at least one instruction;
acquiring a target sample label corresponding to the target operation code vector through a virus detection model, wherein the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set, and the virus detection model is used for representing the relation between the operation code vector and the sample label;
and determining the virus detection result of the file to be detected according to the target sample label.
Optionally, the CPU 722 is specifically configured to perform the following steps:
if the file to be detected comprises a payload file, acquiring an instruction set of the file to be detected, wherein the instruction set comprises the at least one instruction;
and numbering each instruction in the instruction set according to an instruction numbering rule to generate the target operation code vector of the file to be detected.
Optionally, the CPU 722 is specifically configured to perform the following steps:
numbering each instruction in the instruction set according to an instruction numbering rule to generate the target operation code vector of the file to be detected, wherein the numbering comprises the following steps:
counting the occurrence times of each operation code in the instruction set;
generating vector elements according to the instruction numbering rule and the occurrence times of each operation code;
and generating the target operation code vector of the file to be detected according to the vector elements.
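The counting steps above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the `OPCODE_INDEX` table stands in for the instruction numbering rule, the Dalvik-like mnemonics are made up for the example, and the assumption that the operation code is the first token of each instruction is ours.

```python
from collections import Counter

# Hypothetical instruction numbering rule: opcode mnemonic -> vector position.
# The mnemonics are illustrative only and are not taken from the patent.
OPCODE_INDEX = {
    "nop": 0,
    "move": 1,
    "const": 2,
    "invoke-virtual": 3,
    "return-void": 4,
}

def opcode_vector(instructions, opcode_index=OPCODE_INDEX):
    """Count opcode occurrences and store each count at its numbered position."""
    # Assumption: the opcode is the first whitespace-separated token.
    counts = Counter(line.split()[0] for line in instructions if line.strip())
    vector = [0] * len(opcode_index)
    for opcode, occurrences in counts.items():
        position = opcode_index.get(opcode)
        if position is not None:  # opcodes outside the rule are ignored here
            vector[position] = occurrences
    return vector

instructions = [
    "const v0, 0x1",
    "move v1, v0",
    "invoke-virtual {v1}, Lfoo;->bar()V",
    "const v2, 0x2",
    "return-void",
]
print(opcode_vector(instructions))  # [0, 1, 2, 1, 1]
```

Each vector element is the occurrence count of one operation code, so files of different lengths map to vectors of the same dimension.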
Optionally, the CPU 722 is further configured to perform the following steps:
obtaining a set of positive sample opcode vectors and a set of negative sample opcode vectors, wherein the set of positive sample opcode vectors includes at least one positive sample opcode vector, the set of negative sample opcode vectors includes at least one negative sample opcode vector, the positive sample opcode vector is generated according to at least one instruction in a security sample, and the negative sample opcode vector is generated according to at least one instruction in a virus sample;
and training on the positive sample operation code vector set and the negative sample operation code vector set to obtain a random forest model.
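The training step can be illustrated with a heavily simplified stand-in for a random forest. The sketch below bags depth-1 trees (decision stumps), each trained on a bootstrap sample and a random feature subset; it keeps the two randomization ideas of a random forest but is not the model described in the patent, which would typically grow deeper trees. The function names, the default number of trees, and the encoding of labels (1 for positive/safe, 0 for negative/virus) are assumptions made for the example.

```python
import random

def predict_stump(stump, x):
    """Return 1 (positive) or 0 (negative) for one opcode vector."""
    feature, threshold, polarity = stump
    return 1 if (x[feature] <= threshold) == (polarity == 1) else 0

def train_stump(X, y, features):
    """Pick the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for feature in features:
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                errors = sum(predict_stump(stump, row) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, stump)
    return best[1]

def train_forest(positive, negative, n_trees=15, seed=0):
    """Train bagged stumps on positive (safe) and negative (virus) vectors."""
    rng = random.Random(seed)
    X = positive + negative
    y = [1] * len(positive) + [0] * len(negative)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # size of each random feature subset
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(X)) for _ in X]   # bootstrap sample
        features = rng.sample(range(n_features), k)   # random feature subset
        forest.append(train_stump([X[i] for i in sample],
                                  [y[i] for i in sample], features))
    return forest

# Toy opcode vectors, invented for illustration.
safe_vectors = [[5, 0, 1], [6, 1, 0], [7, 0, 2]]
virus_vectors = [[0, 9, 8], [1, 8, 9], [0, 7, 7]]
forest = train_forest(safe_vectors, virus_vectors)
print(len(forest))  # 15
```

In practice a library implementation with full decision trees would replace `train_stump`; the sketch only shows how the two sample sets feed a set of independently trained trees.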
Optionally, the CPU 722 is specifically configured to perform the following steps:
inputting the target operation code vector into the random forest model to obtain a sample judgment result corresponding to the target operation code vector, wherein the random forest model comprises a plurality of decision trees, and each decision tree is used for outputting a positive result or a negative result;
and determining the target sample label according to the sample judgment result.
Optionally, the CPU 722 is specifically configured to perform the following steps:
acquiring the number of positive results and the number of negative results corresponding to the target operation code vector according to the output result of each decision tree in the random forest model;
and if the number of negative results is less than the number of positive results, determining that the target sample label is a positive label.
Optionally, the CPU 722 is specifically configured to perform the following steps:
acquiring the number of positive results and the number of negative results corresponding to the target operation code vector according to the output result of each decision tree in the random forest model;
and if the number of the negative results is greater than or equal to the number of the positive results, determining that the target sample label is a negative label.
Optionally, the CPU 722 is specifically configured to perform the following steps:
if the target sample label is the positive label, determining that the file to be detected is a safe file;
and if the target sample label is the negative label, determining that the file to be detected is a virus file.
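The voting and labeling rules above can be sketched as follows. The function names and the string values used for results and labels are assumptions for illustration; the comparison itself follows the text: the label is positive only when negative results are strictly fewer than positive results, so a tie yields a negative label.

```python
def target_sample_label(tree_outputs):
    """Derive the target sample label from per-tree results."""
    positives = sum(1 for result in tree_outputs if result == "positive")
    negatives = sum(1 for result in tree_outputs if result == "negative")
    # Positive label only if negatives are strictly fewer; ties go negative.
    return "positive" if negatives < positives else "negative"

def detection_result(label):
    """Map the target sample label to the virus detection result."""
    return "safe file" if label == "positive" else "virus file"

votes = ["positive", "negative", "positive"]
print(detection_result(target_sample_label(votes)))  # safe file

tied = ["positive", "negative"]
print(detection_result(target_sample_label(tied)))  # virus file
```

Resolving ties toward the negative label errs on the side of flagging a file, which matches the "greater than or equal to" condition in the text.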
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A method for virus detection, comprising:
acquiring a target operation code vector of a file to be detected, wherein the target operation code vector is generated according to at least one instruction;
acquiring a target sample label corresponding to the target operation code vector through a virus detection model, wherein the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set, and the virus detection model is used for representing the relation between the operation code vector and the sample label;
determining a virus detection result of the file to be detected according to the target sample label;
the acquiring of the target operation code vector of the file to be detected comprises the following steps:
if the file to be detected comprises an effective load file, acquiring an instruction set of the file to be detected, wherein the instruction set comprises the at least one instruction;
and numbering each instruction in the instruction set according to an instruction numbering rule to generate the target operation code vector of the file to be detected.
2. The method of claim 1, wherein the target opcode vector includes at least one vector element;
numbering each instruction in the instruction set according to an instruction numbering rule to generate the target operation code vector of the file to be detected, wherein the numbering comprises the following steps:
counting the occurrence times of each operation code in the instruction set;
generating vector elements according to the instruction numbering rule and the occurrence number of each operation code;
and generating the target operation code vector of the file to be detected according to the vector elements.
3. The method of claim 1, wherein before the obtaining, by the virus detection model, the target sample label corresponding to the target opcode vector, the method further comprises:
obtaining the set of positive sample opcode vectors and the set of negative sample opcode vectors, wherein the set of positive sample opcode vectors includes at least one positive sample opcode vector, the set of negative sample opcode vectors includes at least one negative sample opcode vector, the positive sample opcode vector is generated based on at least one instruction in a security sample, and the negative sample opcode vector is generated based on at least one instruction in a virus sample;
and training the positive sample operation code vector set and the negative sample operation code vector set to obtain a random forest model.
4. The method of claim 3, wherein obtaining, by the virus detection model, the target sample label corresponding to the target opcode vector comprises:
inputting the target operation code vector into the random forest model to obtain a sample judgment result corresponding to the target operation code vector, wherein the random forest model comprises a plurality of decision trees, and each decision tree is used for outputting a positive result or a negative result;
and determining the target sample label according to the sample judgment result.
5. The method of claim 4, wherein obtaining the sample judgment result corresponding to the target opcode vector comprises:
acquiring the quantity of positive results and the quantity of negative results corresponding to the target operation code vector according to the output result of each decision tree in the random forest model;
the determining the target sample label according to the sample judgment result includes:
and if the number of negative results is less than the number of positive results, determining that the target sample label is a positive label.
6. The method of claim 4, wherein obtaining the sample judgment result corresponding to the target opcode vector comprises:
acquiring the number of positive results and the number of negative results corresponding to the target operation code vector according to the output result of each decision tree in the random forest model;
the determining the target sample label according to the sample judgment result comprises:
and if the number of the negative results is greater than or equal to the number of the positive results, determining that the target sample label is a negative label.
7. The method according to claim 5 or 6, wherein the determining the virus detection result of the file to be detected according to the target sample label comprises:
if the target sample label is a positive label, determining that the file to be detected is a safe file;
and if the target sample label is a negative label, determining that the file to be detected is a virus file.
8. A virus detection device, comprising:
an obtaining module, configured to obtain a target operation code vector of a file to be detected, wherein the target operation code vector is generated according to at least one instruction;
the obtaining module is further configured to obtain a target sample label corresponding to the target operation code vector through a virus detection model, where the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set, and the virus detection model is used to represent a relationship between the operation code vector and the sample label;
a determining module, configured to determine a virus detection result of the file to be detected according to the target sample label obtained by the obtaining module;
wherein the obtaining module is specifically configured to:
if the file to be detected comprises an effective load file, acquiring an instruction set of the file to be detected, wherein the instruction set comprises the at least one instruction;
and numbering each instruction in the instruction set according to an instruction numbering rule to generate the target operation code vector of the file to be detected.
9. The virus detection apparatus of claim 8, further comprising a training module;
the obtaining module is further configured to obtain, before obtaining a target sample label corresponding to the target opcode vector through a virus detection model, a set of positive sample opcode vectors and a set of negative sample opcode vectors, where the set of positive sample opcode vectors includes at least one positive sample opcode vector, the set of negative sample opcode vectors includes at least one negative sample opcode vector, the positive sample opcode vector is generated according to at least one instruction in a security sample, and the negative sample opcode vector is generated according to at least one instruction in a virus sample;
the training module is configured to train the positive sample operation code vector set and the negative sample operation code vector set obtained by the obtaining module to obtain a random forest model.
10. A virus detection apparatus, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory to perform the following steps:
acquiring a target operation code vector of a file to be detected, wherein the target operation code vector is generated according to at least one instruction;
acquiring a target sample label corresponding to the target operation code vector through a virus detection model, wherein the virus detection model is obtained by training according to a positive sample operation code vector set and a negative sample operation code vector set, and the virus detection model is used for representing the relation between the operation code vector and the sample label;
determining a virus detection result of the file to be detected according to the target sample label;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate;
the processor is specifically configured to execute the following steps when acquiring a target operation code vector of a file to be detected:
if the file to be detected comprises an effective load file, acquiring an instruction set of the file to be detected, wherein the instruction set comprises the at least one instruction;
and numbering each instruction in the instruction set according to an instruction numbering rule to generate the target operation code vector of the file to be detected.
11. The virus detection apparatus of claim 10, wherein the processor is further configured to perform the steps of:
obtaining a set of positive sample opcode vectors and a set of negative sample opcode vectors, wherein the set of positive sample opcode vectors includes at least one positive sample opcode vector, the set of negative sample opcode vectors includes at least one negative sample opcode vector, the positive sample opcode vector is generated according to at least one instruction in a security sample, and the negative sample opcode vector is generated according to at least one instruction in a virus sample;
and training the positive sample operation code vector set and the negative sample operation code vector set to obtain a random forest model.
12. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810332378.XA CN110210216B (en) 2018-04-13 2018-04-13 Virus detection method and related device

Publications (2)

Publication Number Publication Date
CN110210216A CN110210216A (en) 2019-09-06
CN110210216B true CN110210216B (en) 2023-03-17

Family

ID=67779047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810332378.XA Active CN110210216B (en) 2018-04-13 2018-04-13 Virus detection method and related device

Country Status (1)

Country Link
CN (1) CN110210216B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110611675A (en) * 2019-09-20 2019-12-24 哈尔滨安天科技集团股份有限公司 Vector magnitude detection rule generation method and device, electronic equipment and storage medium
CN112948829B (en) * 2021-03-03 2023-11-03 深信服科技股份有限公司 File searching and killing method, system, equipment and storage medium
CN113257426B (en) * 2021-06-30 2021-09-21 杭州华网信息技术有限公司 Aggregated group flu prediction system, storage medium and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205396A (en) * 2015-10-15 2015-12-30 上海交通大学 Detecting system for Android malicious code based on deep learning and method thereof
CN106845240A (en) * 2017-03-10 2017-06-13 西京学院 Android malware static detection method based on random forest
CN106919841A (en) * 2017-03-10 2017-07-04 西京学院 Efficient Android malware detection model DroidDet based on rotation forest
CN107392019A (en) * 2017-07-05 2017-11-24 北京金睛云华科技有限公司 Training and detection method and device for malicious code families
CN107463844A (en) * 2016-06-06 2017-12-12 国家计算机网络与信息安全管理中心 WEB Trojan detecting methods and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant