CN110210216A - Method and related apparatus for virus detection - Google Patents

Publication number
CN110210216A
Authority
CN
China
Prior art keywords
code vector
sample
file
operation code
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810332378.XA
Other languages
Chinese (zh)
Other versions
CN110210216B (en)
Inventor
雷经纬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810332378.XA priority Critical patent/CN110210216B/en
Publication of CN110210216A publication Critical patent/CN110210216A/en
Application granted granted Critical
Publication of CN110210216B publication Critical patent/CN110210216B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • G06F21/563Static detection by source code analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

An embodiment of the invention discloses a virus detection method, comprising: obtaining a target opcode vector of a file to be detected, wherein the target opcode vector is generated from at least one instruction; obtaining, by means of a virus detection model, a target sample label corresponding to the target opcode vector, wherein the virus detection model is trained on a positive-sample opcode vector set and a negative-sample opcode vector set and represents the relationship between opcode vectors and sample labels; and determining a virus detection result for the file to be detected according to the target sample label. An embodiment of the invention also provides a virus detection apparatus. The embodiments remove the need to extract signatures manually and can also detect unknown viruses, which improves the security of the solution.

Description

Method and related apparatus for virus detection
Technical field
The present invention relates to the field of information security, and in particular to a virus detection method and a related apparatus.
Background
With the development of computer and network technology, the variety of viruses keeps growing, and highly destructive, well-concealed viruses persist over the long term. A virus is a program or a piece of executable code that, like a biological virus, can replicate itself, spread from host to host, and reactivate. A virus can attach itself to files of many types; as soon as such a file is copied or passed from one user to another, the virus spreads along with it.
At present, viruses are generally detected as follows: manually labeled virus samples are analyzed, and binary fragments are extracted from them as signatures; if a file to be detected matches a signature, the file is considered to carry a virus.
However, this way of judging whether a file carries a virus has the following problem: because the signatures are determined in advance, a newly emerged virus is difficult to detect. In other words, the existing approach cannot detect unknown viruses, which is detrimental to information security.
Summary of the invention
Embodiments of the invention provide a virus detection method and a related apparatus that, on the one hand, remove the need to extract signatures manually and, on the other hand, can detect unknown viruses, which improves the security of the solution.
A first aspect of the present invention provides a virus detection method, comprising:
obtaining a target opcode vector of a file to be detected, wherein the target opcode vector is generated from at least one instruction;
obtaining, by means of a virus detection model, a target sample label corresponding to the target opcode vector, wherein the virus detection model is trained on a positive-sample opcode vector set and a negative-sample opcode vector set, and the virus detection model represents the relationship between opcode vectors and sample labels;
determining a virus detection result for the file to be detected according to the target sample label.
A second aspect of the present invention provides a virus detection apparatus, comprising:
an obtaining module, configured to obtain a target opcode vector of a file to be detected, wherein the target opcode vector is generated from at least one instruction;
the obtaining module being further configured to obtain, by means of a virus detection model, a target sample label corresponding to the target opcode vector, wherein the virus detection model is trained on a positive-sample opcode vector set and a negative-sample opcode vector set, and the virus detection model represents the relationship between opcode vectors and sample labels;
a determining module, configured to determine a virus detection result for the file to be detected according to the target sample label obtained by the obtaining module.
A third aspect of the present invention provides a virus detection apparatus, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is configured to store a program;
the processor is configured to execute the program in the memory, including the following steps:
obtaining a target opcode vector of a file to be detected, wherein the target opcode vector is generated from at least one instruction;
obtaining, by means of a virus detection model, a target sample label corresponding to the target opcode vector, wherein the virus detection model is trained on a positive-sample opcode vector set and a negative-sample opcode vector set, and the virus detection model represents the relationship between opcode vectors and sample labels;
determining a virus detection result for the file to be detected according to the target sample label;
the bus system is configured to connect the memory and the processor so that the memory and the processor can communicate.
A fourth aspect of the present invention provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods described in the above aspects.
As can be seen from the above technical solutions, the embodiments of the invention have the following advantages:
An embodiment of the invention provides a virus detection method. A target opcode vector of a file to be detected is obtained first, the target opcode vector being generated from at least one instruction. A target sample label corresponding to the target opcode vector is then obtained by means of a virus detection model, which is trained on a positive-sample opcode vector set and a negative-sample opcode vector set and represents the relationship between opcode vectors and sample labels. Finally, the virus detection result for the file to be detected is determined according to the target sample label. In this way, on the one hand, the manual extraction of signatures is no longer needed: the virus detection model directly yields the sample label of the file to be detected, and that label indicates whether the file carries a virus. On the other hand, because the virus detection model is trained on a large number of positive and negative samples, it has good predictive ability and can therefore detect unknown viruses, which improves the security of the solution.
Brief description of the drawings
Fig. 1 is an architecture diagram of the virus detection system in an embodiment of the invention;
Fig. 2 is a schematic diagram of the call relationships in the virus detection system in an embodiment of the invention;
Fig. 3 is a schematic diagram of an embodiment of the virus detection method in an embodiment of the invention;
Fig. 4 is a flow diagram of obtaining the target opcode vector in an embodiment of the invention;
Fig. 5 is a schematic diagram of the internal structure of a payload file in an embodiment of the invention;
Fig. 6 is a schematic diagram of the form of an instruction in an embodiment of the invention;
Fig. 7 is a flow diagram of training the virus detection model in an embodiment of the invention;
Fig. 8 is a schematic diagram of an embodiment of a decision tree in an embodiment of the invention;
Fig. 9 is a flow diagram of testing a file to be detected in an embodiment of the invention;
Fig. 10 is a flow diagram of virus detection in an application scenario of the invention;
Fig. 11 is a schematic diagram of an embodiment of the virus detection apparatus in an embodiment of the invention;
Fig. 12 is a schematic diagram of another embodiment of the virus detection apparatus in an embodiment of the invention;
Fig. 13 is a structural diagram of the virus detection apparatus in an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention provide a virus detection method and a related apparatus that, on the one hand, remove the need to extract signatures manually and, on the other hand, can detect unknown viruses, which improves the security of the solution.
The terms "first", "second", "third", "fourth" and the like (if present) in the description, the claims, and the above drawings are used to distinguish similar objects and do not describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can, for example, be implemented in an order other than the one illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
It should be understood that the present invention is mainly applicable to the detection of Android viruses, and can also be applied to other kinds of virus detection, such as computer virus detection, virus detection on Apple's system (iPhone operating system, iOS), and virus detection on Microsoft's system (Windows). This solution is described using Android virus detection as the example. Android is released together with a set of core applications, including a client, a Short Message Service (SMS) program, a calendar, maps, a browser, and a contacts manager.
At the same time, the Android system is exposed to Android viruses, such as the "Hundred-Brain Worm" Trojan (which can infect promotion-type applications), the "Lizard's Tail" Trojan (which can infect system library files, replace system files, inject into system processes, steal user information, and monitor calls and text messages), and "Permission Killer" (which can counter security software, monitor text messages, display advertisements, push promotions, and inflate traffic). This solution can detect not only these known Android viruses but also other, unknown Android viruses.
Referring to Fig. 1, which is an architecture diagram of the virus detection system in an embodiment of the invention: as shown, the virus detection apparatus in this solution can be deployed on a server. After the server obtains a virus detection result, it sends the result to a terminal device, so that the user can view the virus detection result for the file to be detected on the terminal device's display. Optionally, the virus detection apparatus can instead be deployed on the terminal device, which then examines the file to be detected directly and shows the virus detection result on the front-end display.
The virus detection apparatus in the present invention may include three logic modules, each implementing a corresponding function. Referring to Fig. 2, which is a schematic diagram of the call relationships in the virus detection system in an embodiment of the invention: as shown, the three logic modules are a vector generation module, a random forest model training module, and a detection flow control module. The vector generation module is an independent module that is called by the other two. The input to the random forest model training module is a batch of Android virus samples and Android safe samples. The training module calls the vector generation module to obtain the opcode vectors of the positive and negative samples, and then feeds them into an artificial intelligence (AI) model (specifically, this can be a random forest model) to obtain a model file. The detection flow control module calls the vector generation module to obtain the target opcode vector of the file to be detected, and finally feeds that vector into the AI model (for example, the random forest model) to obtain the security status of the sample under test.
The virus detection method of the present invention is described below from the perspective of the virus detection apparatus. Referring to Fig. 3, an embodiment of the virus detection method in an embodiment of the invention includes:
101. Obtain a target opcode vector of a file to be detected, wherein the target opcode vector is generated from at least one instruction;
In this embodiment, the virus detection apparatus first receives a virus detection instruction that carries the identifier of the file to be detected; the file to be detected can be determined from this identifier. The apparatus then extracts the opcode vector from the file to be detected and obtains the target opcode vector.
The target opcode vector is generated from at least one instruction. It contains at least one vector element, and each vector element corresponds to one instruction.
102. Obtain, by means of a virus detection model, a target sample label corresponding to the target opcode vector, wherein the virus detection model is trained on a positive-sample opcode vector set and a negative-sample opcode vector set, and the virus detection model represents the relationship between opcode vectors and sample labels;
In this embodiment, the virus detection apparatus inputs the target opcode vector into the virus detection model obtained by training in advance, and the model outputs the target sample label of the file to be detected.
The virus detection model is trained on a large number of positive-sample opcode vectors (the positive-sample opcode vector set) and a large number of negative-sample opcode vectors (the negative-sample opcode vector set). Feeding these two sets into an AI model for training yields AI model library files, and these model library files use sample labels to indicate whether a virus is present (for example, "1" denotes a virus sample and "0" denotes a safe sample). After the target opcode vector of the file to be detected passes through the virus detection model, a corresponding target sample label is likewise output.
103. Determine a virus detection result for the file to be detected according to the target sample label.
In this embodiment, the virus detection apparatus determines the virus detection result for the file to be detected according to the target sample label and can send the result to a client, through which the user can learn whether the file to be detected is safe.
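Steps 101 to 103 can be sketched as a short pipeline. This is a minimal illustration only: `detect_file`, `extract_opcode_vector`, and `toy_model` are hypothetical names that do not appear in the patent, and the stub extractor and the threshold rule merely stand in for real vector extraction and a trained virus detection model. The label convention (1 for a virus sample, 0 for a safe sample) follows the example given for step 102.

```python
from typing import Callable, List

VIRUS_LABEL, SAFE_LABEL = 1, 0  # label convention from the example in step 102


def detect_file(path: str,
                extract_vector: Callable[[str], List[int]],
                predict_label: Callable[[List[int]], int]) -> str:
    """Run steps 101-103 on one file and return a detection result."""
    vector = extract_vector(path)   # step 101: target opcode vector
    label = predict_label(vector)   # step 102: target sample label from the model
    return "virus" if label == VIRUS_LABEL else "safe"  # step 103


# Hypothetical stand-ins so that the sketch runs on its own.
def extract_opcode_vector(path: str) -> List[int]:
    return [3, 1, 2, 5, 0, 1, 0]    # fixed vector instead of real extraction


def toy_model(vector: List[int]) -> int:
    # Arbitrary rule for illustration, not a trained model.
    return VIRUS_LABEL if vector[3] >= 5 else SAFE_LABEL


result = detect_file("sample.apk", extract_opcode_vector, toy_model)
```

Passing the extractor and the model in as functions mirrors the module split described for Fig. 2, where vector generation is independent of the detection flow.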
An embodiment of the invention thus provides a virus detection method. A target opcode vector of a file to be detected is obtained first, the target opcode vector being generated from at least one instruction. A target sample label corresponding to the target opcode vector is then obtained by means of a virus detection model, which is trained on a positive-sample opcode vector set and a negative-sample opcode vector set and represents the relationship between opcode vectors and sample labels. Finally, the virus detection result for the file to be detected is determined according to the target sample label. In this way, on the one hand, the manual extraction of signatures is no longer needed: the virus detection model directly yields the sample label of the file to be detected, and that label indicates whether the file carries a virus. On the other hand, because the virus detection model is trained on a large number of positive and negative samples, it has good predictive ability and can therefore detect unknown viruses, which improves the security of the solution.
Optionally, on the basis of the embodiment corresponding to Fig. 3, in a first optional embodiment of the virus detection method provided by the embodiments of the invention, obtaining the target opcode vector of the file to be detected may include:
if the file to be detected contains a payload file, obtaining the instruction set of the file to be detected, wherein the instruction set contains at least one instruction;
numbering every instruction in the instruction set according to an instruction numbering rule, to generate the target opcode vector of the file to be detected.
In this embodiment, the virus detection apparatus first determines whether a payload file exists in the file to be detected. If one exists, the apparatus obtains the instruction set from the file to be detected; an instruction set generally contains at least one instruction. Each instruction in the instruction set is then numbered according to the instruction numbering rule, and the target opcode vector of the file to be detected is finally generated.
It can be understood that a virus usually performs some harmful or malicious action, and the part of the virus code that implements this function is called the payload file. A payload file can do whatever a program running in the victim's environment can do; the actions it can perform include, but are not limited to, damaging files, deleting files, sending sensitive information to the virus author or to an arbitrary recipient, and providing a backdoor into the infected computer.
How payload files are extracted is explained below, using an Android file as the example. An Android package (APK) is an executable program that is implemented as a compressed archive. See Table 1, which shows an example of the file structure inside the archive.
Table 1
File or directory        Purpose
META-INF/                Directory holding the manifest and other package description information, as in a Java JAR file
res/                     Directory that stores resource files
libs/                    If present, stores the .so libraries compiled with the NDK
AndroidManifest.xml      Global configuration file of the program
classes.dex              The Dalvik bytecode that is finally generated
resources.arsc           Compiled binary resource file
With reference to Table 1, files with the suffix ".dex" can be found after opening the "res/" folder. Files with the suffix ".dex" are payload files, but the "classes.dex" file is not a payload file. In other words, the dex files other than "classes.dex" are the payload files referred to here. If the file to be detected contains no payload file, it is outside the scope of virus detection, and the security check of that file can be exited early.
Specifically, refer to Fig. 4, which is a flow diagram of obtaining the target opcode vector in an embodiment of the invention. As shown, in step 201 the file to be detected is obtained; it may be an installation package file, a picture file, a video file, a document file, an audio file, an application program, and so on, without limitation here. In step 202, the payload files are extracted from the file to be detected. Payload files have the suffix ".dex", but the "classes.dex" file is not a payload file and therefore does not need to be extracted. In step 203, an instruction set is extracted from each payload file; an instruction set usually contains multiple instructions. In step 204, the opcode distribution corresponding to each instruction set is counted. Finally, in step 205, the target opcode vector associated with the instruction set is generated from the opcode distribution.
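Step 202 can be illustrated with Python's standard `zipfile` module, since an APK is a compressed archive. The sketch below applies the rule stated above literally (every ".dex" entry other than "classes.dex" counts as a payload file). The function names and the in-memory demo archive are hypothetical and exist only to make the example self-contained.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import List


def find_payload_dex(apk_bytes: bytes) -> List[str]:
    """Return the archive entries that end in .dex, except classes.dex."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return sorted(
            name for name in apk.namelist()
            if name.lower().endswith(".dex")
            and PurePosixPath(name).name != "classes.dex"
        )


def build_demo_apk() -> bytes:
    """Build a tiny in-memory archive that imitates an APK layout."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"<manifest/>")
        archive.writestr("classes.dex", b"main bytecode")
        archive.writestr("res/raw/hidden.dex", b"extra bytecode")
    return buffer.getvalue()


payloads = find_payload_dex(build_demo_apk())
# An empty list would mean the file is outside the detection scope,
# so the check could exit early.
```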
Furthermore, in this embodiment of the invention, if the file to be detected contains a payload file, the virus detection apparatus can obtain the instruction set of the file to be detected and then number every instruction in the instruction set according to the instruction numbering rule, to generate the target opcode vector of the file to be detected. In this way, instruction sets are extracted and processed only for files that contain a payload file. The payload file reflects whether the file to be detected may perform harmful or malicious operations; if there is no payload file, virus detection is unnecessary, which improves the success rate of virus detection.
Optionally, on the basis of the first embodiment corresponding to Fig. 3, in a second optional embodiment of the virus detection method provided by the embodiments of the invention, the target opcode vector contains at least one vector element;
numbering every instruction in the instruction set according to the instruction numbering rule, to generate the target opcode vector of the file to be detected, may include:
counting the number of occurrences of each opcode in the instruction set;
generating vector elements according to the instruction numbering rule and the number of occurrences of each opcode;
generating the target opcode vector of the file to be detected from the vector elements.
In this embodiment, the target opcode vector corresponds to method code fragments. For ease of understanding, refer to Fig. 5, which is a schematic diagram of the internal structure of a payload file in an embodiment of the invention. As shown, a class contains multiple method code fragments, and each method code fragment corresponds to multiple instructions. Put simply, a class represents one type of thing, and an object can be instantiated from a class through a constructor. Java code normally works with objects of a class and performs its functions by calling methods, so the instructions of a method characterize the functionality it has and are the main object of detection here. A method is the code in a class that performs a particular function.
Specifically, refer to Fig. 6, which is a schematic diagram of the form of an instruction in an embodiment of the invention. As shown, an instruction contains at least two parts, an opcode and data. The opcode indicates the type of an operation, and the data indicates the content of the operation.
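The opcode and data parts of an instruction can be separated with a few lines of code. The sketch below assumes a textual form in which the opcode comes first and comma-separated operands follow, as in the document's own examples such as "mov va,1". The function name `split_instruction` is hypothetical, and real Dalvik bytecode is binary and would need a proper disassembler.

```python
from typing import List, Tuple


def split_instruction(line: str) -> Tuple[str, List[str]]:
    """Split one textual instruction into (opcode, data operands)."""
    parts = line.strip().split(None, 1)  # opcode, then the rest of the line
    if not parts:
        raise ValueError("empty instruction")
    opcode = parts[0]
    operands = [p.strip() for p in parts[1].split(",")] if len(parts) > 1 else []
    return opcode, operands


opcode, data = split_instruction("mov va,1")
```

Only the opcode is used when the vector is built; the data part is discarded.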
How the target opcode vector of the file to be detected is generated is explained below. A file to be detected may contain multiple payload files, a payload file may contain multiple classes, a class contains multiple methods, and a method contains multiple instructions. The target opcode vector may correspond to the instructions of any method in the file to be detected.
Because the types of opcodes in instructions are predetermined, a number can be assigned to each instruction (for example, starting from 1) to form the instruction numbering rule. For example, the "mov" instruction corresponds to number 1, the "add" instruction corresponds to number 2, and so on. See Table 2, which shows an example of the instruction numbering rule.
Table 2
Instruction                               Number
mov (move instruction)                    1
add (addition instruction)                2
invoke (call instruction)                 3
nop (no-operation instruction)            4
move-wide (wide move instruction)         5
return (return instruction)               6
return-wide (wide return instruction)     7
First, the virus detection apparatus extracts a method code fragment from the payload file and extracts the opcodes of its instructions. Suppose the instructions of a method code fragment are as follows:
mov va,1
add va,vb
invoke xxx
Its opcode sequence is then "mov, add, invoke". The opcodes of all method code fragments in the payload are collected, the number of occurrences of each opcode is counted, and the counts are arranged in the order given by the instruction numbering rule to form a sequence of numbers. This sequence of numbers is the opcode vector of the sample. It can be understood that if an opcode does not occur in the payload file, its count is recorded as 0.
For example, if the "mov" instruction occurs 3 times, the "add" instruction occurs 1 time, the "invoke" instruction occurs 2 times, the "nop" instruction occurs 5 times, the "move-wide" instruction occurs 0 times, the "return" instruction occurs 1 time, and the "return-wide" instruction occurs 0 times, the opcode vector can be expressed as [3,1,2,5,0,1,0].
It should be noted that the above is only an example. In practice, the same approach can be used to obtain the target opcode vector of a file to be detected, negative-sample opcode vectors, and positive-sample opcode vectors.
Furthermore, in this embodiment of the invention, the virus detection apparatus processes the number of occurrences of each instruction using the instruction numbering rule to obtain vector elements, and then determines the required target opcode vector from the at least one generated vector element. In this way, the relationship between instructions and numbers given by the instruction numbering rule allows the target opcode vector of the file to be detected to be generated automatically, with no need to extract the opcode vector manually. This improves the efficiency of file security checks, reduces labor costs, and makes the solution more practical.
Optionally, on the basis of the embodiment corresponding to Fig. 3, in a third optional embodiment of the virus detection method provided by the embodiments of the invention, before the target sample label corresponding to the target opcode vector is obtained by means of the virus detection model, the method may further include:
obtaining a positive-sample opcode vector set and a negative-sample opcode vector set, wherein the positive-sample opcode vector set contains at least one positive-sample opcode vector, the negative-sample opcode vector set contains at least one negative-sample opcode vector, a positive-sample opcode vector is generated from at least one instruction in a safe sample, and a negative-sample opcode vector is generated from at least one instruction in a virus sample;
training on the positive-sample opcode vector set and the negative-sample opcode vector set to obtain a random forest model.
In this embodiment, the virus detection model may specifically be a random forest model. The opcode vectors of the positive samples and of the negative samples can be generated in a manner similar to the second embodiment corresponding to Fig. 3. Here, positive samples are safe samples, and negative samples are virus samples.
Specifically, how the virus detection model is trained is described below with reference to Fig. 7, which is a flow diagram of training the virus detection model in an embodiment of the invention. As shown, in step 301 a batch of positive samples and negative samples is obtained, where positive samples are safe samples and negative samples are virus samples. In step 302, the opcode vector of each positive sample is extracted to generate the positive-sample opcode vector set, and the opcode vector of each negative sample is extracted to generate the negative-sample opcode vector set. In step 303, a random forest model is trained on the positive-sample opcode vector set and the negative-sample opcode vector set.
A random forest model is a classifier that uses multiple trees to train on samples and make predictions. In machine learning, a random forest is a classifier containing multiple decision trees, and the class it outputs is the mode of the classes output by the individual trees.
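To make the training flow of steps 301 to 303 concrete, the following toy sketch trains a small forest in pure Python. It is an illustration under strong simplifying assumptions, not the patent's implementation: each tree is only a depth-1 tree (a stump) on one randomly chosen vector element, fitted on a bootstrap sample, and all names and training vectors are hypothetical. Labels follow the example convention given earlier (0 for safe or positive samples, 1 for virus or negative samples). A real system would normally rely on an established machine learning library.

```python
import random
from collections import Counter


def train_stump(X, y, rng):
    """Fit a depth-1 tree on one randomly chosen vector element."""
    feature = rng.randrange(len(X[0]))
    best = None
    for threshold in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def train_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the training vectors."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict(forest, vector):
    """Output the mode of the classes predicted by the individual trees."""
    votes = [left if vector[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Step 301/302: hypothetical opcode vectors for safe and virus samples.
safe_vectors = [[0, 1, 0], [1, 0, 1], [1, 1, 0], [0, 0, 1]]
virus_vectors = [[5, 6, 7], [6, 7, 5], [7, 5, 6], [6, 6, 6]]
X = safe_vectors + virus_vectors
y = [0] * len(safe_vectors) + [1] * len(virus_vectors)

forest = train_forest(X, y)  # step 303
```

After training, `predict(forest, [9, 9, 9])` classifies an unseen vector by majority vote of the trees, which matches the description of the output class as the mode of the individual tree outputs.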
It is understood that in practical applications, viral diagnosis model is also possible to Recognition with Recurrent Neural Network (Recurrent Neural Networks, RNN) model or deep neural network (Deep Neural Network, DNN) model, herein not It limits.
Secondly, in the embodiment of the present invention, viral diagnosis device can obtain in advance positive sample operation code vector set and Then negative sample operation code vector set instructs positive sample operation code vector set and negative sample operation code vector set Practice, obtains Random Forest model.By the above-mentioned means, using Random Forest model as viral diagnosis model, and random forest Model can be generated high accuracy as a result, and be capable of handling a large amount of data, for unbalanced classification situation, with Machine forest model can also balance error, thus the practicability and operability of lifting scheme.
Optionally, on the basis of the third embodiment corresponding to Fig. 3 above, in a fourth optional embodiment of the viral diagnosis method provided in the embodiments of the present invention, obtaining, by means of the viral diagnosis model, the target sample label corresponding to the target operation code vector may include:
Inputting the target operation code vector into the Random Forest model to obtain the sample judgment result corresponding to the target operation code vector, wherein the Random Forest model includes multiple decision trees, and each decision tree is used to output a positive result or a negative result;
Determining the target sample label according to the sample judgment result.
In this embodiment, the viral diagnosis device can determine the target sample label by means of the Random Forest model. The target operation code vector is first input into the Random Forest model to obtain the corresponding sample judgment result. The Random Forest model includes multiple decision trees, and each decision tree outputs a positive result or a negative result for the sample; a positive result can indicate a safe sample, and a negative result can indicate a virus sample. The target sample label is finally determined from the results output by all of the decision trees.
For ease of description, refer to Fig. 8, which is a schematic diagram of one embodiment of a decision tree in an embodiment of the present invention. The Random Forest model in this solution is composed of multiple decision trees like the one shown in Fig. 8. Each decision tree produces one conclusion (safe sample or virus sample), and the conclusion reached by more trees is taken as the final sample judgment result. If both conclusions are reached by the same number of trees, the sample is considered a virus sample. The depth of the decision tree shown in Fig. 8 is 3. The depth of a decision tree is the maximum number of judgments needed to reach a result; the decision tree in Fig. 8 needs 2 or 3 judgments to reach a result, so its depth is 3.
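To make the notion of tree depth concrete, here is a small sketch of a decision tree whose paths need 2 or 3 judgments, matching the description of Fig. 8. The tree shape, the feature indices and the thresholds are all invented for illustration; Fig. 8 itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import Sequence, Union

SAFE, VIRUS = "safe", "virus"


@dataclass
class Leaf:
    conclusion: str  # SAFE or VIRUS


@dataclass
class Node:
    feature_index: int  # which vector element is tested (hypothetical)
    threshold: int      # go left if element <= threshold, otherwise right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, vector: Sequence[int]) -> str:
    """Walk from the root to a leaf, making one judgment per internal node."""
    while isinstance(tree, Node):
        tree = tree.left if vector[tree.feature_index] <= tree.threshold else tree.right
    return tree.conclusion


def depth(tree: Tree) -> int:
    """Largest number of judgments needed before a conclusion is reached."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(tree.left), depth(tree.right))


# Some paths end after 2 judgments, the longest after 3, so the depth is 3.
toy_tree = Node(
    0, 10,
    left=Node(1, 5, Leaf(SAFE), Leaf(VIRUS)),
    right=Node(2, 3, Leaf(SAFE), Node(1, 8, Leaf(SAFE), Leaf(VIRUS))),
)

print(depth(toy_tree))
print(classify(toy_tree, [4, 9, 0]))
print(classify(toy_tree, [20, 2, 7]))
```

The first vector reaches a conclusion after 2 judgments and the second after 3, which is why the depth is reported as the maximum rather than the average.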
It can be understood that the Random Forest model also needs to be configured with parameters in advance. Based on long-term accumulated experience and experiments, the parameters in this solution may be configured as follows: the number of decision trees in the Random Forest model is between 50 and 80, the depth of each tree is between 20 and 40, and the minimum number of samples at a leaf node is less than or equal to 5. In practical applications, however, these parameters can be adjusted as the situation requires; the values here are only an illustration and should not be construed as a limitation on this solution.
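The suggested ranges can be captured in a small configuration object. This is only a sketch: the class name `ForestConfig`, its field names and its default values are assumptions, and the ranges are the illustrative ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForestConfig:
    """Hypothetical container for the three parameters named in the text."""
    n_trees: int = 50           # number of decision trees
    max_depth: int = 20         # depth of each tree
    min_samples_leaf: int = 5   # minimum number of samples at a leaf node

    def in_suggested_range(self) -> bool:
        """True if every parameter lies in the illustrative range from the text."""
        return (
            50 <= self.n_trees <= 80
            and 20 <= self.max_depth <= 40
            and 1 <= self.min_samples_leaf <= 5
        )


print(ForestConfig().in_suggested_range())
print(ForestConfig(n_trees=200).in_suggested_range())
```

If a library such as scikit-learn were used (the text does not name one), these fields would roughly correspond to the `n_estimators`, `max_depth` and `min_samples_leaf` arguments of `RandomForestClassifier`.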
Thirdly, in the embodiment of the present invention, the viral diagnosis device first inputs the target operation code vector into the Random Forest model to obtain the sample judgment result corresponding to the target operation code vector, and then determines the target sample label according to the sample judgment result. In this way, the outputs of the multiple decision trees in the Random Forest model are used to determine the target sample label of the file to be detected. A decision tree is a basic classifier that is highly readable and classifies quickly, which helps improve the operability and detection efficiency of the solution.
Optionally, on the basis of the fourth embodiment corresponding to Fig. 3 above, in a fifth optional embodiment of the viral diagnosis method provided in the embodiments of the present invention, obtaining the sample judgment result corresponding to the target operation code vector may include:
Obtaining, according to the output result of each decision tree in the Random Forest model, the number of positive results and the number of negative results corresponding to the target operation code vector;
Determining the target sample label according to the sample judgment result may include:
If the number of negative results is less than the number of positive results, determining that the target sample label is a positive label.
In this embodiment, after the viral diagnosis device inputs the target operation code vector of the file to be detected into the Random Forest model, the Random Forest model outputs the sample judgment result. Suppose the Random Forest model contains 50 trees in total and each tree outputs one result. If 22 of the 50 trees output a negative result and the remaining 28 trees output a positive result, the number of negative results is less than the number of positive results (22 is less than 28), so the target sample label is considered a positive label.
It can be understood that a positive label can be regarded as a safety label. If all target sample labels of the file to be detected are safety labels, the file to be detected is regarded as a safe file, or the safety of the file to be detected is regarded as unknown.
Further, in the embodiment of the present invention, the viral diagnosis device can first obtain, according to the output result of each decision tree in the Random Forest model, the number of positive results and the number of negative results corresponding to the target operation code vector; if the number of negative results is less than the number of positive results, the target sample label is determined to be a positive label. In this way, the target sample label of the file to be detected is generated on the principle that "the minority is subordinate to the majority": when the number of negative results is less than the number of positive results, the target sample label can be judged to be a positive label, which improves the feasibility and practicability of the solution.
Optionally, on the basis of the fourth embodiment corresponding to Fig. 3 above, in a sixth optional embodiment of the viral diagnosis method provided in the embodiments of the present invention, obtaining the sample judgment result corresponding to the target operation code vector may include:
Obtaining, according to the output result of each decision tree in the Random Forest model, the number of positive results and the number of negative results corresponding to the target operation code vector;
Determining the target sample label according to the sample judgment result may include:
If the number of negative results is greater than or equal to the number of positive results, determining that the target sample label is a negative label.
In this embodiment, after the viral diagnosis device inputs the target operation code vector of the file to be detected into the Random Forest model, the Random Forest model outputs the sample judgment result. Suppose the Random Forest model contains 50 trees in total and each tree outputs one result. If 35 of the 50 trees output a negative result and the remaining 15 trees output a positive result, the number of negative results is greater than the number of positive results (35 is greater than 15), so the target sample label is considered a negative label.
It can be understood that a negative label can be regarded as a virus label. If the target sample labels of the file to be detected include a virus label, the file to be detected is a file carrying a virus.
Further, in the embodiment of the present invention, the viral diagnosis device can first obtain, according to the output result of each decision tree in the Random Forest model, the number of positive results and the number of negative results corresponding to the target operation code vector; if the number of negative results is greater than or equal to the number of positive results, the target sample label is determined to be a negative label. In this way, the target sample label of the file to be detected is generated on the principle that "the minority is subordinate to the majority": when the number of negative results is greater than or equal to the number of positive results, the target sample label can be judged to be a negative label, which improves the feasibility and practicability of the solution.
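The two counting rules of the fifth and sixth optional embodiments combine into a single vote. The sketch below counts per-tree outputs and applies both rules, including the tie case; the function name and the string constants are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable

POSITIVE, NEGATIVE = "positive", "negative"


def target_sample_label(tree_outputs: Iterable[str]) -> str:
    """Return the positive label only if negative results are strictly fewer."""
    counts = Counter(tree_outputs)
    positive_count = counts[POSITIVE]
    negative_count = counts[NEGATIVE]
    # A tie falls through to the negative (virus) label.
    return POSITIVE if negative_count < positive_count else NEGATIVE


# The two 50-tree examples from the text, plus a tie.
print(target_sample_label([NEGATIVE] * 22 + [POSITIVE] * 28))  # positive
print(target_sample_label([NEGATIVE] * 35 + [POSITIVE] * 15))  # negative
print(target_sample_label([NEGATIVE] * 25 + [POSITIVE] * 25))  # negative
```

Resolving ties toward the negative label errs on the side of flagging a file, which matches the rule that an equal number of conclusions is treated as a virus sample.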
Optionally, on the basis of the fifth or sixth embodiment corresponding to Fig. 3 above, in a seventh optional embodiment of the viral diagnosis method provided in the embodiments of the present invention, determining the viral diagnosis result of the file to be detected according to the target sample label may include:
If the target sample label is a positive label, determining that the file to be detected is a safe file;
If the target sample label is a negative label, determining that the file to be detected is a virus file.
In this embodiment, the viral diagnosis model may specifically be a Random Forest model, and the Random Forest model is obtained by training on a large number of positive sample operation code vectors and a large number of negative sample operation code vectors. Feeding these positive and negative sample operation code vectors into the Random Forest model for training yields a model library file, and the model library file uses sample labels to indicate whether a virus is present. After the target operation code vector of the file to be detected passes through the Random Forest model, a corresponding target sample label is output.
If the output target sample label is a positive label, the file to be detected is a safe file or a file of unknown safety; conversely, if the output target sample label is a negative label, the file to be detected is a virus file. This is because, although the viral diagnosis model can predict some unknown viruses, it is difficult to guarantee that every virus will be detected, so a file to be detected with a positive label can only be regarded as safe for the time being.
It should be noted that a positive label represents the (provisional) safety label and can also be expressed as "0", while a negative label represents the virus label and can be expressed as "1". In practical applications, the positive label and the negative label can also be expressed in other forms, which is not limited here.
A process for detecting a file to be detected is described below with reference to Fig. 9. Fig. 9 is a schematic flowchart of checking a file to be detected in an embodiment of the present invention. As shown in the figure, specifically:
In step 401, a batch of positive samples and negative samples is obtained, where a positive sample can be a safe sample and a negative sample can be a virus sample. It should be noted that, in practical applications, positive samples can also be defined as virus samples and negative samples as safe samples, depending on how the user defines positive and negative samples in advance;
In step 402, the operation code vectors of the positive samples and the operation code vectors of the negative samples are extracted respectively;
In step 403, the operation code vectors of the positive samples and of the negative samples are input into an AI model for training, where the AI model may specifically be a Random Forest model;
In step 404, a model library file is obtained after model training. The model library file can be stored and copied so that it can be called in subsequent viral diagnosis; the model library file here can be understood as a kind of configuration file;
In step 405, the file to be detected is obtained;
In step 406, the target operation code vector of the file to be detected is extracted and input into the model, and the sample label of the file to be detected is output;
In step 407, it is judged whether the sample label of the file to be detected is consistent with the virus label. If it is consistent, the process proceeds to step 408; otherwise, the process jumps to step 409;
In step 408, the file to be detected is determined to be a virus file;
In step 409, the security status of the file to be detected cannot be determined, or the file to be detected is considered a safe file.
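The role of the model library file in steps 404 to 409 can be sketched as a save and reload round trip. To stay self-contained, the example below swaps the Random Forest for a much simpler stand-in learner (one mean vector per class, nearest mean wins); the JSON file layout, the function names and the sample vectors are likewise assumptions and not the patent's format.

```python
import json
import os
import tempfile
from typing import Dict, List, Sequence

# JSON object keys must be strings, so the "0"/"1" labels are kept as text.
SAFE_LABEL, VIRUS_LABEL = "0", "1"
Model = Dict[str, List[float]]


def train(positive: Sequence[Sequence[float]], negative: Sequence[Sequence[float]]) -> Model:
    """Steps 401-403 with a stand-in learner (NOT a Random Forest): class means."""
    def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
        return [sum(column) / len(vectors) for column in zip(*vectors)]
    return {SAFE_LABEL: centroid(positive), VIRUS_LABEL: centroid(negative)}


def save_model(model: Model, path: str) -> None:
    """Step 404: persist the trained parameters as a model library file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(model, handle)


def load_model(path: str) -> Model:
    """Reload the model library file for a later detection run."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def predict(model: Model, vector: Sequence[float]) -> str:
    """Step 406: output the label whose class mean is closest to the vector."""
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: squared_distance(model[label], vector))


def detect(model: Model, vector: Sequence[float]) -> str:
    """Steps 407-409: compare the predicted label with the virus label."""
    if predict(model, vector) == VIRUS_LABEL:
        return "virus file"
    return "safe or undetermined"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_library.json")
    save_model(train([[3, 0, 1], [2, 1, 0]], [[0, 5, 4], [1, 6, 5]]), path)
    reloaded = load_model(path)

print(detect(reloaded, [0, 6, 4]))
print(detect(reloaded, [3, 0, 0]))
```

Because detection only needs the reloaded file, training (steps 401 to 404) and detection (steps 405 to 409) can run at different times or on different machines.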
Secondly, in the embodiment of the present invention, the viral diagnosis device determines the safety of the file to be detected according to the viral diagnosis result: if the target sample label is a negative label, the file to be detected is determined to be a virus file; conversely, if the target sample label is a positive label, the file to be detected is determined to be a safe file. In this way, the viral diagnosis model is used to predict the target sample label of the file to be detected, and the file is determined to carry a virus only when the obtained target sample label is consistent with the virus label. This provides a virus prediction capability for the file to be detected, so that unknown viruses can be perceived, which helps improve the security of the solution.
For ease of understanding, the process of viral diagnosis is described below with reference to Figure 10. Figure 10 is a schematic flowchart of viral diagnosis in an application scenario of the present invention. As shown in the figure, specifically:
In step 501, viral diagnosis starts;
In step 502, a batch of positive samples and negative samples is selected for training the viral diagnosis model;
In step 503, a file to be detected is selected;
In step 504, it is judged whether the file to be detected obtained in step 503 includes a payload file. If so, the process proceeds to step 505; conversely, if the file to be detected does not include a payload file, the process jumps to step 513;
Step 505 can be divided into four sub-steps. In step 5051, the positive samples, the negative samples and the file to be detected are obtained. In step 5052, the payload files in the positive samples, the negative samples and the file to be detected are extracted. In step 5053, the instruction set corresponding to each payload file is obtained and the distribution of operation codes is counted according to the instruction set, where one instruction set includes at least one instruction. Finally, in step 5054, the operation code vectors of the positive samples, the operation code vectors of the negative samples and the operation code vector of the file to be detected are generated respectively;
In step 506, the operation code vectors of the positive samples and of the negative samples are input into the Random Forest model for training;
In step 507, a model library file is obtained after model training. The model library file can be stored and copied so that it can be called in subsequent viral diagnosis; the model library file here can be understood as a kind of configuration file;
In step 508, the operation code vector of the file to be detected is extracted and input into the Random Forest model;
In step 509, the Random Forest model obtained through training and stored in the model library file, i.e. the viral diagnosis model, is used to obtain the sample label of the file to be detected;
In step 510, it is judged whether the sample label of the file to be detected is consistent with the virus label. If it is consistent, the process proceeds to step 511; otherwise, the process jumps to step 512;
In step 511, the file to be detected is determined to be a virus file;
In step 512, the security status of the file to be detected cannot be determined;
In step 513, the safety detection of the file to be detected ends.
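The branching in steps 504 and 510 to 513 can be sketched as a short function. Everything named here is hypothetical: the payload is modelled as an optional list of instruction strings, and the vector extractor and the model are passed in as plain callables so that the control flow stands on its own.

```python
from typing import Callable, List, Optional, Sequence


def run_detection(
    payload_instructions: Optional[Sequence[str]],
    extract_vector: Callable[[Sequence[str]], List[int]],
    predict_label: Callable[[List[int]], int],
    virus_label: int = 1,
) -> str:
    """Follow steps 504-513 for a single file to be detected."""
    if not payload_instructions:
        # Step 504 -> 513: no payload file, so detection ends immediately.
        return "skipped: no payload file"
    vector = extract_vector(payload_instructions)  # steps 505 and 508
    label = predict_label(vector)                  # step 509
    if label == virus_label:                       # step 510
        return "virus file"                        # step 511
    return "undetermined"                          # step 512


# Toy stand-ins: the "vector" is just the instruction count, and any file
# with more than two instructions is labelled 1 (virus).
def toy_extract(instructions: Sequence[str]) -> List[int]:
    return [len(instructions)]


def toy_predict(vector: List[int]) -> int:
    return 1 if vector[0] > 2 else 0


print(run_detection(None, toy_extract, toy_predict))
print(run_detection(["a", "b", "c"], toy_extract, toy_predict))
print(run_detection(["a"], toy_extract, toy_predict))
```

Checking for the payload first means that files without one never reach vector extraction or the model.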
The viral diagnosis device in the present invention is described in detail below. Referring to Figure 11, Figure 11 is a schematic diagram of one embodiment of the viral diagnosis device in an embodiment of the present invention. The viral diagnosis device 60 includes:
An obtaining module 601, configured to obtain the target operation code vector of the file to be detected, wherein the target operation code vector is generated according to at least one instruction;
The obtaining module 601 is further configured to obtain, by means of the viral diagnosis model, the target sample label corresponding to the target operation code vector, wherein the viral diagnosis model is obtained by training on the positive sample operation code vector set and the negative sample operation code vector set, and the viral diagnosis model is used to indicate the relationship between operation code vectors and sample labels;
A determining module 602, configured to determine the viral diagnosis result of the file to be detected according to the target sample label obtained by the obtaining module.
In this embodiment, the obtaining module 601 obtains the target operation code vector of the file to be detected, wherein the target operation code vector is generated according to at least one instruction. The obtaining module 601 obtains, by means of the viral diagnosis model, the target sample label corresponding to the target operation code vector, wherein the viral diagnosis model is obtained by training on the positive sample operation code vector set and the negative sample operation code vector set and is used to indicate the relationship between operation code vectors and sample labels. The determining module 602 determines the viral diagnosis result of the file to be detected according to the target sample label obtained by the obtaining module.
In the embodiment of the present invention, a viral diagnosis device is provided. It first obtains the target operation code vector of the file to be detected, wherein the target operation code vector is generated according to at least one instruction. It then obtains, by means of the viral diagnosis model, the target sample label corresponding to the target operation code vector, wherein the viral diagnosis model is obtained by training on the positive sample operation code vector set and the negative sample operation code vector set and is used to indicate the relationship between operation code vectors and sample labels. Finally, it determines the viral diagnosis result of the file to be detected according to the target sample label. In this way, on the one hand, the manual extraction of feature codes can be omitted: the sample label of the file to be detected is obtained directly through analysis by the viral diagnosis model, and the sample label indicates whether the file to be detected carries a virus. On the other hand, the viral diagnosis model is trained on a large number of positive and negative samples and has good virus prediction capability, so that unknown viruses can be perceived, which helps improve the security of the solution.
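The division of work between the obtaining module 601 and the determining module 602 can be sketched as two small classes composed into a device. The class and method names, and the toy vectorizer and model used to exercise them, are assumptions for illustration only.

```python
from typing import Callable, List, Sequence


class ObtainModule:
    """Sketch of module 601: obtains the target vector and the target label."""

    def __init__(
        self,
        vectorize: Callable[[Sequence[str]], List[int]],
        model: Callable[[List[int]], str],
    ) -> None:
        self._vectorize = vectorize
        self._model = model

    def target_vector(self, instructions: Sequence[str]) -> List[int]:
        return self._vectorize(instructions)

    def target_label(self, vector: List[int]) -> str:
        return self._model(vector)


class DeterminingModule:
    """Sketch of module 602: maps the target sample label to a result."""

    def detection_result(self, label: str) -> str:
        return "virus file" if label == "negative" else "safe file"


class ViralDiagnosisDevice:
    """Sketch of device 60: wires module 601 and module 602 together."""

    def __init__(self, obtain: ObtainModule, determine: DeterminingModule) -> None:
        self.obtain = obtain
        self.determine = determine

    def detect(self, instructions: Sequence[str]) -> str:
        vector = self.obtain.target_vector(instructions)
        label = self.obtain.target_label(vector)
        return self.determine.detection_result(label)


# Toy stand-ins: count "invoke" instructions; two or more is labelled negative.
def toy_vectorize(instructions: Sequence[str]) -> List[int]:
    return [sum(1 for item in instructions if item.startswith("invoke"))]


def toy_model(vector: List[int]) -> str:
    return "negative" if vector[0] >= 2 else "positive"


device = ViralDiagnosisDevice(ObtainModule(toy_vectorize, toy_model), DeterminingModule())
print(device.detect(["invoke a", "invoke b"]))
print(device.detect(["move a"]))
```

Passing the model into the obtaining module as a callable keeps the sketch independent of whether a Random Forest, RNN or DNN model is used.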
Optionally, on the basis of the embodiment corresponding to Figure 11 above, in another embodiment of the viral diagnosis device 60 provided in the embodiments of the present invention,
The obtaining module 601 is specifically configured to: if the file to be detected includes a payload file, obtain the instruction set of the file to be detected, wherein the instruction set includes the at least one instruction;
Number every instruction in the instruction set according to an instruction numbering rule, to generate the target operation code vector of the file to be detected.
Secondly, in the embodiment of the present invention, if the file to be detected includes a payload file, the viral diagnosis device can obtain the instruction set of the file to be detected and then number every instruction in the instruction set according to the instruction numbering rule, to generate the target operation code vector of the file to be detected. In this way, the instruction set can be extracted and processed for a file to be detected that contains a payload file. The payload file reflects whether the file to be detected may perform harmful or malicious operations; if there is no payload file, there is no need to perform viral diagnosis, which improves the success rate of viral diagnosis.
Optionally, on the basis of the embodiment corresponding to Figure 11 above, in another embodiment of the viral diagnosis device 60 provided in the embodiments of the present invention, the target operation code vector includes at least one vector element;
The obtaining module 601 is specifically configured to count the number of occurrences of each operation code in the instruction set;
Generate vector elements according to the instruction numbering rule and the number of occurrences of each operation code;
Generate the target operation code vector of the file to be detected according to the vector elements.
Thirdly, in the embodiment of the present invention, the viral diagnosis device uses the instruction numbering rule to process the number of occurrences of each instruction and obtain vector elements, and then determines the required target operation code vector from the at least one generated vector element. In this way, the relationship between instructions and numbers indicated by the instruction numbering rule allows the target operation code vector of the file to be detected to be generated automatically, without manual extraction of the operation code vector, which helps improve the efficiency of file safety detection, reduce labor cost and improve the practicability of the solution.
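The counting step can be sketched as follows. The numbering rule here is a made-up four-entry table mapping an operation code to its position in the vector, and each instruction is assumed to be a text line that starts with its operation code; the real rule and instruction format are not given in this section.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical instruction numbering rule: operation code -> vector position.
NUMBERING_RULE: Dict[str, int] = {"nop": 0, "move": 1, "invoke": 2, "return": 3}


def opcode_vector(instructions: Sequence[str], rule: Dict[str, int] = NUMBERING_RULE) -> List[int]:
    """Count each operation code and place the count at its numbered position."""
    counts = Counter(
        instruction.split()[0] for instruction in instructions if instruction.strip()
    )
    vector = [0] * len(rule)
    for opcode, index in rule.items():
        # Operation codes missing from the rule are ignored in this sketch.
        vector[index] = counts[opcode]
    return vector


print(opcode_vector(["move v0, v1", "invoke foo", "move v2, v3", "return"]))
```

Each vector element is the frequency of one numbered operation code, so files of different lengths still map to vectors of the same size.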
Optionally, on the basis of the embodiment corresponding to Figure 11 above, referring to Figure 12, in another embodiment of the viral diagnosis device 60 provided in the embodiments of the present invention, the viral diagnosis device 60 further includes a training module 603;
The obtaining module 601 is further configured to obtain the positive sample operation code vector set and the negative sample operation code vector set before obtaining, by means of the viral diagnosis model, the target sample label corresponding to the target operation code vector, wherein the positive sample operation code vector set includes at least one positive sample operation code vector, the negative sample operation code vector set includes at least one negative sample operation code vector, each positive sample operation code vector is generated according to at least one instruction in a safe sample, and each negative sample operation code vector is generated according to at least one instruction in a virus sample;
The training module 603 is configured to train on the positive sample operation code vector set and the negative sample operation code vector set obtained by the obtaining module 601, to obtain a Random Forest model.
Secondly, in the embodiment of the present invention, the viral diagnosis device can obtain the positive sample operation code vector set and the negative sample operation code vector set in advance, and then train on the two sets to obtain a Random Forest model. In this way, the Random Forest model serves as the viral diagnosis model. A Random Forest model can produce highly accurate results and can handle large amounts of data, and for imbalanced classification data it can also balance the error, which improves the practicability and operability of the solution.
Optionally, on the basis of the embodiment corresponding to Figure 12 above, in another embodiment of the viral diagnosis device 60 provided in the embodiments of the present invention,
The obtaining module 601 is specifically configured to input the target operation code vector into the Random Forest model to obtain the sample judgment result corresponding to the target operation code vector, wherein the Random Forest model includes multiple decision trees, and each decision tree is used to output a positive result or a negative result;
Determine the target sample label according to the sample judgment result.
Thirdly, in the embodiment of the present invention, the viral diagnosis device first inputs the target operation code vector into the Random Forest model to obtain the sample judgment result corresponding to the target operation code vector, and then determines the target sample label according to the sample judgment result. In this way, the outputs of the multiple decision trees in the Random Forest model are used to determine the target sample label of the file to be detected. A decision tree is a basic classifier that is highly readable and classifies quickly, which helps improve the operability and detection efficiency of the solution.
Optionally, on the basis of the embodiment corresponding to Figure 12 above, in another embodiment of the viral diagnosis device 60 provided in the embodiments of the present invention,
The obtaining module 601 is specifically configured to obtain, according to the output result of each decision tree in the Random Forest model, the number of positive results and the number of negative results corresponding to the target operation code vector;
If the number of negative results is less than the number of positive results, determine that the target sample label is a positive label.
Further, in the embodiment of the present invention, the viral diagnosis device can first obtain, according to the output result of each decision tree in the Random Forest model, the number of positive results and the number of negative results corresponding to the target operation code vector; if the number of negative results is less than the number of positive results, the target sample label is determined to be a positive label. In this way, the target sample label of the file to be detected is generated on the principle that "the minority is subordinate to the majority": when the number of negative results is less than the number of positive results, the target sample label can be judged to be a positive label, which improves the feasibility and practicability of the solution.
Optionally, on the basis of the embodiment corresponding to Figure 12 above, in another embodiment of the viral diagnosis device 60 provided in the embodiments of the present invention,
The obtaining module 601 is specifically configured to obtain, according to the output result of each decision tree in the Random Forest model, the number of positive results and the number of negative results corresponding to the target operation code vector;
If the number of negative results is greater than or equal to the number of positive results, determine that the target sample label is a negative label.
Further, in the embodiment of the present invention, the viral diagnosis device can first obtain, according to the output result of each decision tree in the Random Forest model, the number of positive results and the number of negative results corresponding to the target operation code vector; if the number of negative results is greater than or equal to the number of positive results, the target sample label is determined to be a negative label. In this way, the target sample label of the file to be detected is generated on the principle that "the minority is subordinate to the majority": when the number of negative results is greater than or equal to the number of positive results, the target sample label can be judged to be a negative label, which improves the feasibility and practicability of the solution.
Optionally, on the basis of the embodiment corresponding to Figure 12 above, in another embodiment of the viral diagnosis device 60 provided in the embodiments of the present invention,
The determining module 602 is specifically configured to: if the target sample label is the positive label, determine that the file to be detected is a safe file;
If the target sample label is the negative label, determine that the file to be detected is a virus file.
Secondly, in the embodiment of the present invention, the viral diagnosis device determines the safety of the file to be detected according to the viral diagnosis result: if the target sample label is a negative label, the file to be detected is determined to be a virus file; conversely, if the target sample label is a positive label, the file to be detected is determined to be a safe file. In this way, the viral diagnosis model is used to predict the target sample label of the file to be detected, and the file is determined to carry a virus only when the obtained target sample label is consistent with the virus label. This provides a virus prediction capability for the file to be detected, so that unknown viruses can be perceived, which helps improve the security of the solution.
Figure 13 is a schematic structural diagram of a server provided in an embodiment of the present invention. The server 700 may vary considerably depending on configuration or performance, and may include one or more central processing units (central processing units, CPU) 722 (for example, one or more processors), a memory 732, and one or more storage media 730 (for example, one or more mass storage devices) storing application programs 742 or data 744. The memory 732 and the storage medium 730 may provide transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 722 may be configured to communicate with the storage medium 730 and to execute, on the server 700, the series of instruction operations in the storage medium 730.
The server 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and so on.
The steps performed by the server in the above embodiments may be based on the server structure shown in Figure 13.
CPU 722 is for executing following steps:
Obtain the object run code vector of file to be detected, wherein according to the object run code vector at least one What instruction generated;
Target sample label corresponding to the object run code vector is obtained by viral diagnosis model, wherein described Viral diagnosis model is obtained according to positive sample operation code vector set and the training of negative sample operation code vector set, described Viral diagnosis model is used to indicate the relationship between operation code vector and sample label;
The viral diagnosis result of the file to be detected is determined according to the target sample label.
Optionally, CPU 722 is specifically used for executing following steps:
If the file to be detected includes effective payload file, the instruction set of the file to be detected is obtained, wherein Described instruction set includes at least one described instruction;
Every instruction in described instruction set is numbered according to order number rule, to generate the text to be detected The object run code vector of part.
Optionally, CPU 722 is specifically configured to perform the following steps:
Numbering each instruction in the instruction set according to the instruction numbering rule to generate the target opcode vector of the file to be detected comprises:
Count the number of occurrences of each opcode in the instruction set;
Generate vector elements according to the instruction numbering rule and the number of occurrences of each opcode;
Generate the target opcode vector of the file to be detected from the vector elements.
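The counting steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the numbering rule `OPCODE_INDEX`, the mnemonics in it, and the helper names are hypothetical, and the sketch assumes that the opcode is the first token of each instruction. The description does not specify any of these details.

```python
from collections import Counter

# Hypothetical instruction numbering rule: each known opcode is assigned
# a fixed position in the vector. The real rule is not given in the text.
OPCODE_INDEX = {"mov": 0, "push": 1, "call": 2, "jmp": 3, "xor": 4}


def opcode_of(instruction):
    """Return the opcode, assumed here to be the first token of the instruction."""
    parts = instruction.split()
    return parts[0].lower() if parts else ""


def build_opcode_vector(instructions, opcode_index=OPCODE_INDEX):
    """Count each opcode and store the count at the position the rule assigns."""
    counts = Counter(opcode_of(ins) for ins in instructions)
    vector = [0] * len(opcode_index)
    for opcode, position in opcode_index.items():
        vector[position] = counts.get(opcode, 0)
    # Opcodes that the rule does not number are ignored in this sketch.
    return vector


instruction_set = ["mov eax, 1", "push eax", "call foo", "mov ebx, eax", "xor eax, eax"]
target_vector = build_opcode_vector(instruction_set)
print(target_vector)  # [2, 1, 1, 0, 1]
```

A fixed-length count vector of this kind gives every file the same feature layout, which is what allows one model to compare files with different numbers of instructions.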
Optionally, CPU 722 is further configured to perform the following steps:
Obtain the positive-sample opcode vector set and the negative-sample opcode vector set, wherein the positive-sample opcode vector set includes at least one positive-sample opcode vector, the negative-sample opcode vector set includes at least one negative-sample opcode vector, each positive-sample opcode vector is generated according to at least one instruction in a safe sample, and each negative-sample opcode vector is generated according to at least one instruction in a virus sample;
Train on the positive-sample opcode vector set and the negative-sample opcode vector set to obtain a random forest model.
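As a rough illustration of the training step, the sketch below builds a small forest in pure Python. It makes several simplifying assumptions that are not taken from the description: each "tree" is a one-split decision stump on a randomly chosen feature, each stump is fit on a bootstrap resample of the training data, and the names `train_stump`, `train_forest`, and `tree_predict` are hypothetical. A practical system would more likely rely on an existing library implementation of random forests.

```python
import random

POSITIVE, NEGATIVE = 1, 0  # positive = safe sample, negative = virus sample


def train_stump(vectors, labels, rng):
    """Fit a one-split tree on one randomly chosen feature (a deliberately tiny 'tree')."""
    feature = rng.randrange(len(vectors[0]))
    best = None
    for threshold in sorted({v[feature] for v in vectors}):
        for below_label in (POSITIVE, NEGATIVE):
            above_label = POSITIVE if below_label == NEGATIVE else NEGATIVE
            # Number of training vectors this candidate split would misclassify.
            errors = sum(
                (below_label if v[feature] <= threshold else above_label) != y
                for v, y in zip(vectors, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, below_label, above_label)
    return best[1:]


def train_forest(positive_vectors, negative_vectors, n_trees=25, seed=0):
    """Train every stump on a bootstrap resample of the combined labeled set."""
    rng = random.Random(seed)
    vectors = list(positive_vectors) + list(negative_vectors)
    labels = [POSITIVE] * len(positive_vectors) + [NEGATIVE] * len(negative_vectors)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(vectors)) for _ in vectors]
        forest.append(
            train_stump([vectors[i] for i in picks], [labels[i] for i in picks], rng)
        )
    return forest


def tree_predict(tree, vector):
    """Return POSITIVE or NEGATIVE for one opcode vector."""
    feature, threshold, below_label, above_label = tree
    return below_label if vector[feature] <= threshold else above_label


# Toy opcode vectors, invented purely for illustration.
safe_vectors = [[1, 1], [2, 1], [1, 2]]
virus_vectors = [[8, 9], [9, 8], [9, 9]]
forest = train_forest(safe_vectors, virus_vectors)
print(len(forest), "trees trained")
```

Bootstrap resampling and random feature choice are what make the individual trees differ from one another, so that a later vote across them is more robust than any single tree.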
Optionally, CPU 722 is specifically configured to perform the following steps:
Input the target opcode vector into the random forest model to obtain a sample judgment result corresponding to the target opcode vector, wherein the random forest model includes multiple decision trees, and each decision tree outputs either a positive result or a negative result;
Determine the target sample label according to the sample judgment result.
Optionally, CPU 722 is specifically configured to perform the following steps:
According to the output result of each decision tree in the random forest model, obtain the number of positive results and the number of negative results corresponding to the target opcode vector;
If the number of negative results is less than the number of positive results, determine that the target sample label is the positive label.
Optionally, CPU 722 is specifically configured to perform the following steps:
According to the output result of each decision tree in the random forest model, obtain the number of positive results and the number of negative results corresponding to the target opcode vector;
If the number of negative results is greater than or equal to the number of positive results, determine that the target sample label is the negative label.
Optionally, CPU 722 is specifically configured to perform the following steps:
If the target sample label is the positive label, determine that the file to be detected is a secure file;
If the target sample label is the negative label, determine that the file to be detected is a virus file.
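The voting rule and the final mapping from label to detection result can be sketched as below. The function names and the string constants are hypothetical; the logic follows the steps above, including the detail that a tie between positive and negative results yields the negative label.

```python
POSITIVE_LABEL = "positive"  # corresponds to a secure file
NEGATIVE_LABEL = "negative"  # corresponds to a virus file


def target_sample_label(tree_outputs):
    """Count the outputs of all decision trees; a tie goes to the negative label."""
    positives = sum(1 for out in tree_outputs if out == POSITIVE_LABEL)
    negatives = sum(1 for out in tree_outputs if out == NEGATIVE_LABEL)
    return POSITIVE_LABEL if negatives < positives else NEGATIVE_LABEL


def detection_result(label):
    """Map the target sample label to the virus detection result."""
    return "secure file" if label == POSITIVE_LABEL else "virus file"


votes = ["positive", "negative", "positive"]
print(detection_result(target_sample_label(votes)))  # secure file

tie = ["positive", "negative"]
print(detection_result(target_sample_label(tie)))  # virus file
```

Resolving ties toward the negative label means an evenly split forest reports a virus, which errs on the side of caution.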
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (15)

1. A virus detection method, characterized by comprising:
Obtaining a target opcode vector of a file to be detected, wherein the target opcode vector is generated according to at least one instruction;
Obtaining, through a virus detection model, a target sample label corresponding to the target opcode vector, wherein the virus detection model is trained on a positive-sample opcode vector set and a negative-sample opcode vector set, and the virus detection model indicates the relationship between opcode vectors and sample labels;
Determining a virus detection result of the file to be detected according to the target sample label.
2. The method according to claim 1, characterized in that obtaining the target opcode vector of the file to be detected comprises:
If the file to be detected includes a payload file, obtaining the instruction set of the file to be detected, wherein the instruction set includes the at least one instruction;
Numbering each instruction in the instruction set according to an instruction numbering rule to generate the target opcode vector of the file to be detected.
3. The method according to claim 2, characterized in that the target opcode vector includes at least one vector element;
Numbering each instruction in the instruction set according to the instruction numbering rule to generate the target opcode vector of the file to be detected comprises:
Counting the number of occurrences of each opcode in the instruction set;
Generating vector elements according to the instruction numbering rule and the number of occurrences of each opcode;
Generating the target opcode vector of the file to be detected from the vector elements.
4. The method according to claim 1, characterized in that, before obtaining, through the virus detection model, the target sample label corresponding to the target opcode vector, the method further comprises:
Obtaining the positive-sample opcode vector set and the negative-sample opcode vector set, wherein the positive-sample opcode vector set includes at least one positive-sample opcode vector, the negative-sample opcode vector set includes at least one negative-sample opcode vector, each positive-sample opcode vector is generated according to at least one instruction in a safe sample, and each negative-sample opcode vector is generated according to at least one instruction in a virus sample;
Training on the positive-sample opcode vector set and the negative-sample opcode vector set to obtain a random forest model.
5. The method according to claim 4, characterized in that obtaining, through the virus detection model, the target sample label corresponding to the target opcode vector comprises:
Inputting the target opcode vector into the random forest model to obtain a sample judgment result corresponding to the target opcode vector, wherein the random forest model includes multiple decision trees, and each decision tree outputs either a positive result or a negative result;
Determining the target sample label according to the sample judgment result.
6. The method according to claim 5, characterized in that obtaining the sample judgment result corresponding to the target opcode vector comprises:
According to the output result of each decision tree in the random forest model, obtaining the number of positive results and the number of negative results corresponding to the target opcode vector;
Determining the target sample label according to the sample judgment result comprises:
If the number of negative results is less than the number of positive results, determining that the target sample label is the positive label.
7. The method according to claim 5, characterized in that obtaining the sample judgment result corresponding to the target opcode vector comprises:
According to the output result of each decision tree in the random forest model, obtaining the number of positive results and the number of negative results corresponding to the target opcode vector;
Determining the target sample label according to the sample judgment result comprises:
If the number of negative results is greater than or equal to the number of positive results, determining that the target sample label is the negative label.
8. The method according to claim 6 or 7, characterized in that determining the virus detection result of the file to be detected according to the target sample label comprises:
If the target sample label is the positive label, determining that the file to be detected is a secure file;
If the target sample label is the negative label, determining that the file to be detected is a virus file.
9. A virus detection device, characterized by comprising:
An obtaining module, configured to obtain a target opcode vector of a file to be detected, wherein the target opcode vector is generated according to at least one instruction;
The obtaining module is further configured to obtain, through a virus detection model, a target sample label corresponding to the target opcode vector, wherein the virus detection model is trained on a positive-sample opcode vector set and a negative-sample opcode vector set, and the virus detection model indicates the relationship between opcode vectors and sample labels;
A determining module, configured to determine a virus detection result of the file to be detected according to the target sample label obtained by the obtaining module.
10. The virus detection device according to claim 9, characterized in that
The obtaining module is specifically configured to: if the file to be detected includes a payload file, obtain the instruction set of the file to be detected, wherein the instruction set includes the at least one instruction;
Number each instruction in the instruction set according to an instruction numbering rule to generate the target opcode vector of the file to be detected.
11. The virus detection device according to claim 9, characterized in that the virus detection device further comprises a training module;
The obtaining module is further configured to obtain, before the target sample label corresponding to the target opcode vector is obtained through the virus detection model, the positive-sample opcode vector set and the negative-sample opcode vector set, wherein the positive-sample opcode vector set includes at least one positive-sample opcode vector, the negative-sample opcode vector set includes at least one negative-sample opcode vector, each positive-sample opcode vector is generated according to at least one instruction in a safe sample, and each negative-sample opcode vector is generated according to at least one instruction in a virus sample;
The training module is configured to train on the positive-sample opcode vector set and the negative-sample opcode vector set obtained by the obtaining module to obtain a random forest model.
12. A virus detection device, characterized in that the virus detection device comprises a memory, a transceiver, a processor, and a bus system;
Wherein the memory is configured to store a program;
The processor is configured to execute the program in the memory, including the following steps:
Obtaining a target opcode vector of a file to be detected, wherein the target opcode vector is generated according to at least one instruction;
Obtaining, through a virus detection model, a target sample label corresponding to the target opcode vector, wherein the virus detection model is trained on a positive-sample opcode vector set and a negative-sample opcode vector set, and the virus detection model indicates the relationship between opcode vectors and sample labels;
Determining a virus detection result of the file to be detected according to the target sample label;
The bus system is configured to connect the memory and the processor so that the memory and the processor can communicate.
13. The virus detection device according to claim 12, characterized in that the processor is specifically configured to perform the following steps:
If the file to be detected includes a payload file, obtaining the instruction set of the file to be detected, wherein the instruction set includes the at least one instruction;
Numbering each instruction in the instruction set according to an instruction numbering rule to generate the target opcode vector of the file to be detected.
14. The virus detection device according to claim 12, characterized in that the processor is further configured to perform the following steps:
Obtaining the positive-sample opcode vector set and the negative-sample opcode vector set, wherein the positive-sample opcode vector set includes at least one positive-sample opcode vector, the negative-sample opcode vector set includes at least one negative-sample opcode vector, each positive-sample opcode vector is generated according to at least one instruction in a safe sample, and each negative-sample opcode vector is generated according to at least one instruction in a virus sample;
Training on the positive-sample opcode vector set and the negative-sample opcode vector set to obtain a random forest model.
15. A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 8.
CN201810332378.XA 2018-04-13 2018-04-13 Virus detection method and related device Active CN110210216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810332378.XA CN110210216B (en) 2018-04-13 2018-04-13 Virus detection method and related device

Publications (2)

Publication Number Publication Date
CN110210216A (en) 2019-09-06
CN110210216B (en) 2023-03-17

Family

ID=67779047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810332378.XA Active CN110210216B (en) 2018-04-13 2018-04-13 Virus detection method and related device

Country Status (1)

Country Link
CN (1) CN110210216B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205396A (en) * 2015-10-15 2015-12-30 上海交通大学 Detecting system for Android malicious code based on deep learning and method thereof
CN106845240A (en) * 2017-03-10 2017-06-13 西京学院 A kind of Android malware static detection method based on random forest
CN106919841A (en) * 2017-03-10 2017-07-04 西京学院 A kind of efficient Android malware detection model DroidDet based on rotation forest
CN107392019A (en) * 2017-07-05 2017-11-24 北京金睛云华科技有限公司 A kind of training of malicious code family and detection method and device
CN107463844A (en) * 2016-06-06 2017-12-12 国家计算机网络与信息安全管理中心 WEB Trojan detecting methods and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110611675A (en) * 2019-09-20 2019-12-24 哈尔滨安天科技集团股份有限公司 Vector magnitude detection rule generation method and device, electronic equipment and storage medium
CN112948829A (en) * 2021-03-03 2021-06-11 深信服科技股份有限公司 File searching and killing method, system, equipment and storage medium
CN112948829B (en) * 2021-03-03 2023-11-03 深信服科技股份有限公司 File searching and killing method, system, equipment and storage medium
CN113257426A (en) * 2021-06-30 2021-08-13 杭州华网信息技术有限公司 Aggregated group flu prediction system, storage medium and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant