CN104680065A - Virus detection method, virus detection device and virus detection equipment - Google Patents


Info

Publication number
CN104680065A
Authority
CN
China
Prior art keywords
feature
classification
file
detected
virus detection
Legal status
Pending
Application number
CN201510038792.6A
Other languages
Chinese (zh)
Inventor
陈治宇
周吉文
周杰
李伟
郭疆
Current Assignee
Anyi Hengtong Beijing Technology Co Ltd
Original Assignee
Anyi Hengtong Beijing Technology Co Ltd
Application filed by Anyi Hengtong Beijing Technology Co Ltd
Priority to CN201510038792.6A
Publication of CN104680065A
Priority to BR102015026015A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Abstract

The invention provides a virus detection method, a virus detection device and virus detection equipment. The method includes: recognizing the format of a file to be detected so as to determine the category of the file; extracting a feature vector of the file that corresponds to that category; and inputting the feature vector into a virus detection model corresponding to that category so as to detect whether the file is a virus file. Because only the features relevant to the file's category are extracted, the method, device and equipment simplify the virus detection operation and increase detection speed; and because files of different categories are checked by models trained for those categories, they help maintain the detection rate across file categories and support timely discovery of unknown viruses.

Description

Virus detection method, device and equipment
Technical field
The present invention relates to the computer field, and in particular to a virus detection method, device and equipment.
Background art
With the development of Internet technology, both the variety and the number of viruses keep increasing. Trojan horses in particular, because of their distinctive harmfulness, have led many ill-intentioned program developers to write large numbers of intrusive programs that steal from and monitor other people's computers, so that Trojans have become rampant.
Traditional Trojan detection and removal mostly relies on manual analysis and matching, or on manually written detection rules. This approach has at least the following drawbacks:
Labor cost is high, and analysts must be highly skilled;
Detection speed is hard to guarantee, so detection efficiency is low;
Unknown viruses are difficult to discover promptly and accurately.
Summary of the invention
One technical problem solved by the present invention is to provide a virus detection method, device and equipment that improve the efficiency and accuracy of virus detection while reducing labor cost.
According to an embodiment of one aspect of the present invention, a virus detection method is provided, comprising:
identifying the file format of a file to be detected, so as to determine the category to which the file belongs;
extracting a feature vector of the file to be detected that corresponds to that category;
inputting the feature vector into the virus detection model of the corresponding category, to detect whether the file to be detected is a virus file.
Optionally, determining the category of the file to be detected comprises:
determining that the file to be detected belongs to any one of the designated categories; or
if the file to be detected belongs to none of the designated categories, determining that it belongs to the other category.
Optionally, the designated categories include at least one of the following:
a self-extracting archive category, an installer package category, a compiler category, and a packed (shell) category.
Optionally, the feature vector corresponding to the self-extracting archive category includes at least one of the following:
directory structure features, file checksum features, file data content features, compression algorithm features, and execution script features.
Optionally, the feature vector corresponding to the installer package category includes at least one of the following:
directory structure features, installation script features, installer version features, installation file checksum features, installation plug-in features, file data features, and interface image features.
Optionally, the feature vector corresponding to the compiler category includes at least one of the following:
function features, class features, variable features, reference relationship features, string features, data block features, logical meaning features, and data content features.
Optionally, the feature vector corresponding to the packed (shell) category includes at least one of the following:
shell features, internal program features, and compression or encryption features.
Optionally, the feature vector corresponding to the other category includes at least one of the following:
program structure features, import and export data features, text string features, numerical features, entry point and function features, version information features, and icon image features.
Optionally, the virus detection model of the corresponding category is:
a virus detection model obtained by training a machine learning algorithm on the feature vectors corresponding to that category.
According to an embodiment of another aspect of the present invention, a virus detection device is provided, comprising:
a unit for identifying the file format of a file to be detected, so as to determine the category to which the file belongs;
a unit for extracting a feature vector of the file to be detected that corresponds to that category;
a unit for inputting the feature vector into the virus detection model of the corresponding category, to detect whether the file to be detected is a virus file.
Optionally, the category of the file to be detected comprises:
any one of the designated categories; or the other category, outside the designated categories.
Optionally, the designated categories include at least one of the following:
a self-extracting archive category, an installer package category, a compiler category, and a packed (shell) category.
Optionally, the feature vector corresponding to the self-extracting archive category includes at least one of the following:
directory structure features, file checksum features, file data content features, compression algorithm features, and execution script features.
Optionally, the feature vector corresponding to the installer package category includes at least one of the following:
directory structure features, installation script features, installer version features, installation file checksum features, installation plug-in features, file data features, and interface image features.
Optionally, the feature vector corresponding to the compiler category includes at least one of the following:
function features, class features, variable features, reference relationship features, string features, data block features, logical meaning features, and data content features.
Optionally, the feature vector corresponding to the packed (shell) category includes at least one of the following:
shell features, internal program features, and compression or encryption features.
Optionally, the feature vector corresponding to the other category includes at least one of the following:
program structure features, import and export data features, text string features, numerical features, entry point and function features, version information features, and icon image features.
Optionally, the virus detection model of the corresponding category is:
a virus detection model obtained by training a machine learning algorithm on the feature vectors corresponding to that category.
According to an embodiment of yet another aspect of the present invention, a computer device is further provided, comprising the aforementioned virus detection device.
By identifying the format of a file to be detected, the present application determines the file's category, extracts the feature vector corresponding to that category, and inputs it into the virus detection model for that category. The process does not extract every feature of the file, only those corresponding to its category, so the virus detection operation is simplified and detection speed is increased. Because files of each category are checked by a model trained for that category, the detection rate for every file category is maintained, unknown viruses can be discovered in time, and labor cost is effectively reduced.
Those of ordinary skill in the art will understand that, although the following detailed description refers to the illustrated embodiments and drawings, the present invention is not limited to these embodiments. Rather, the scope of the present invention is broad and is intended to be defined only by the appended claims.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the following drawings:
Fig. 1 is a flowchart of a virus detection method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a virus detection model training method according to an embodiment of the present invention.
Fig. 3 is a block diagram showing how the virus detection method is implemented according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a virus detection device according to an embodiment of the present invention.
In the drawings, identical or similar reference numerals denote identical or similar components.
Embodiment
A Trojan horse, also simply called a Trojan, is a program used to control another computer through a specific program (the Trojan program). A Trojan usually consists of two executable programs: a controlling end and a controlled end. Trojans are currently a widespread kind of malicious file. Unlike ordinary viruses, a Trojan does not replicate itself and does not deliberately infect other files; instead it disguises itself to entice users to download and run it. Once running, it opens a back door on the infected host for the person who planted it, allowing that person to damage or steal the victim's files at will, or even to control the infected host remotely.
A Trojan is characterized by obtaining the right to use a computer without the permission of the computer's user. The program is small and lightweight and consumes few resources while running, so it is hard to notice without antivirus software and hard to stop once it is running. After it runs, it immediately registers itself in the system boot area so that it starts automatically every time Windows loads; it may also rename itself, hide itself, or copy itself into other folders, and it can even perform actions that the user cannot perform.
The harm caused by Trojans: a Trojan openly monitors other people and steals their passwords and data. It may, for example, steal administrator or subnet passwords in order to cause damage, or, for fun or for profit, steal online passwords, game accounts, stock trading accounts and even online banking accounts, so as to spy on others' privacy and obtain economic benefits. Compared with early computer viruses, Trojans therefore serve their authors' purposes more directly, which has led many ill-intentioned developers to write large numbers of intrusive programs for stealing from and monitoring other people's computers; this is why Trojans are so rampant online. Because of this serious harm, and because Trojans behave differently from early viruses, Trojans, although a kind of virus, are usually singled out from the other virus types and referred to separately as "Trojan programs".
The virus detection method, device and equipment provided by the embodiments of the present application can be used to detect Trojans, but are not limited to that use.
The present invention is described in further detail below with reference to the drawings.
Fig. 1 is a flowchart of a virus detection method according to an embodiment of the present invention. The method is mainly carried out by the operating system or a processing controller in a computer device; the operating system or processing controller is referred to as the virus detection device. The computer device includes, but is not limited to, at least one of the following: a user device and a network device. User devices include, but are not limited to, computers, smartphones and PDAs. Network devices include, but are not limited to, a single network server, a server group made up of multiple network servers, or a cloud based on cloud computing and made up of a large number of computers or network servers, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms one super virtual computer.
As shown in Fig. 1, the virus detection method mainly comprises the following steps:
S100: identify the file format of the file to be detected, so as to determine the category to which the file belongs;
S110: extract the feature vector of the file to be detected that corresponds to that category;
S120: input the feature vector into the virus detection model of the corresponding category, to detect whether the file to be detected is a virus file.
Before these steps are described, note that the virus detection method of the embodiments of the present application relies on virus detection models trained in advance, so the models must be trained before detection is performed. Because training does not have to be repeated every time detection is performed, model training is not a required step of virus detection itself. The training method for the virus detection models is introduced first. Fig. 2 is a flowchart of the virus detection model training method provided by an embodiment of the present application; the training method may comprise the following steps:
S200: obtain training samples for the virus detection model;
The embodiments of the present application place no specific restriction on how the samples are obtained or how many are used; it will be understood, however, that the more samples are obtained, the more accurately the trained model generally identifies viruses.
It should also be noted that the training method provided by the embodiments may train only on virus samples, that is, files infected by viruses, so that training uses only black files; or it may train on virus samples and uninfected samples in a 1:1 ratio, so that training uses black files and white files in a 1:1 ratio. Here a black file is a file infected by a virus, and a white file is a normal file not infected by a virus.
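The two sampling options above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name build_training_set, the label convention (1 for a black file, 0 for a white file) and the choice to downsample the larger side to reach an exact 1:1 ratio are all assumptions.

```python
import random


def build_training_set(black_files, white_files, mode="balanced", seed=0):
    """Return shuffled (sample, label) pairs; label 1 = black file, 0 = white file.

    mode="black_only": use only black (virus-infected) files.
    mode="balanced":   use black and white files in a 1:1 ratio.
    """
    rng = random.Random(seed)
    if mode == "black_only":
        return [(f, 1) for f in black_files]
    if mode != "balanced":
        raise ValueError(f"unknown mode: {mode!r}")
    # Downsample both sides to the smaller count so the ratio is exactly 1:1.
    n = min(len(black_files), len(white_files))
    pairs = [(f, 1) for f in rng.sample(list(black_files), n)]
    pairs += [(f, 0) for f in rng.sample(list(white_files), n)]
    rng.shuffle(pairs)
    return pairs
```

With mode="black_only" the data contain a single label, so an ordinary two-class classifier cannot be fitted to it directly; the patent does not say how that case is handled.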
The embodiments of the present application train a separate virus detection model for each file category. For example, the file categories described in the embodiments include, but are not limited to, at least one of the following designated categories:
self-extracting archive category, installer package category, compiler category, and packed (shell) category.
A self-extracting archive in the present embodiment is an archive compressed in self-extracting form. Self-extracting (SFX, Self-eXtracting) files are a kind of compressed file: they can be decompressed without any compression tool, simply by double-clicking the file, which then decompresses itself; hence the name. A self-extracting file is larger than an ordinary compressed file because it contains a built-in extraction program, but its advantage is that it can be opened on a machine that has no compression software installed. Self-extracting archives come in three types: zip, rar and 7z.
An installer package in the present embodiment is a software installation package that contains all the files needed to install a piece of software. Running the installer package (an executable file) extracts all of the software's files to the hard disk and performs tasks such as editing the registry, modifying operating system settings and creating shortcuts. The installer package category in the present embodiment includes, but is not limited to, files of types such as inno, nsis, msi and cab.
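One way to tell these four installer types apart is by well-known file signatures. The sketch below is a heuristic only, and the function name detect_installer_type is hypothetical: .msi files use the OLE compound-file header (which legacy Word and Excel documents share, so a real check must look further), .cab files begin with "MSCF", and NSIS and Inno Setup installers are assumed here to be recognisable by the marker strings "NullsoftInst" and "Inno Setup Setup Data" embedded in their data.

```python
# Header signatures checked at offset 0.
HEADER_SIGNATURES = (
    ("msi", bytes.fromhex("D0CF11E0A1B11AE1")),  # OLE compound file (also used by .doc/.xls)
    ("cab", b"MSCF"),                             # Microsoft Cabinet archive
)
# Marker strings searched anywhere in the file (assumed heuristics).
EMBEDDED_MARKERS = (
    ("nsis", b"NullsoftInst"),
    ("inno", b"Inno Setup Setup Data"),
)


def detect_installer_type(data: bytes):
    """Return 'msi', 'cab', 'nsis', 'inno', or None if no signature matches."""
    for name, magic in HEADER_SIGNATURES:
        if data.startswith(magic):
            return name
    for name, marker in EMBEDDED_MARKERS:
        if marker in data:
            return name
    return None
```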
A compiler in the present embodiment is a program that translates one language into another.
The packed (shell) category covers programs that have been packed ("shelled"). A packed program can run directly, but the binary data of the original program cannot be viewed until the program is unpacked. Packing uses special algorithms to compress and encrypt the resources in an EXE or DLL file, much like WINZIP, except that the packed file can still run on its own and decompression happens entirely in memory, hidden from view. The packing code is attached to the original program; after the Windows loader loads the program into memory, the packing code runs before the original program and takes control. It decrypts and restores the original program, then hands control back to it so that the original code section executes. After packing, the original program code usually exists on disk only in encrypted form and is restored only in memory at execution time, which more effectively prevents crackers from illegally modifying the program file and also prevents static decompilation of the program.
It will be understood that file categories are not limited to those listed above. Categories outside the designated categories are collectively referred to as the other category. In addition to training a virus detection model for each designated category, the embodiments of the present application may also train a separate virus detection model for the other category.
Because a separate model is trained for each designated category, the training samples must be virus samples of the corresponding category. For the self-extracting archive category, self-extracting archive files infected by viruses are obtained as virus samples; for the installer package category, infected installer package files are obtained as virus samples; and so on. For the other category, files outside the designated categories that are infected by viruses are obtained as virus samples.
S210: extract the feature vectors corresponding to the samples of each category;
The embodiments of the present application extract, for the samples of each category, the feature vector corresponding to that category, so different categories yield different feature vectors. The designated categories listed above illustrate how the feature vectors obtained differ from category to category.
The feature vector to be obtained for the self-extracting archive category may include at least one of the following:
directory structure features, file checksum features, file data content features, compression algorithm features, and execution script features.
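For a zip-based self-extracting archive, several of these features can be read from the archive's central directory with Python's standard zipfile module, which also accepts an archive preceded by an executable stub. The feature names, the list of script extensions, and the use of sorted CRC-32 values as the checksum feature are assumptions for illustration; rar and 7z payloads would need other parsers.

```python
import io
import zipfile

# Assumed list of extensions treated as "execution scripts".
SCRIPT_EXTENSIONS = (".bat", ".cmd", ".vbs", ".js", ".ps1")


def sfx_zip_features(sfx_bytes: bytes) -> dict:
    """Extract simple features from a zip-based self-extracting archive."""
    with zipfile.ZipFile(io.BytesIO(sfx_bytes)) as zf:
        infos = zf.infolist()
    names = [info.filename for info in infos]
    return {
        # Directory structure: entry count and deepest nesting level.
        "entry_count": len(infos),
        "max_dir_depth": max((n.rstrip("/").count("/") for n in names), default=0),
        # Checksums: the CRC-32 stored for each entry.
        "crc32s": sorted(info.CRC for info in infos),
        # File data content: total uncompressed size.
        "total_uncompressed": sum(info.file_size for info in infos),
        # Compression algorithm: set of method codes (8 = deflate).
        "compress_types": sorted({info.compress_type for info in infos}),
        # Execution scripts packed inside the archive.
        "script_count": sum(n.lower().endswith(SCRIPT_EXTENSIONS) for n in names),
    }
```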
The feature vector to be obtained for the installer package category may include at least one of the following:
directory structure features, installation script features, installer version features, installation file checksum features, installation plug-in features, file data features, and interface image features.
The feature vector to be obtained for the compiler category may include at least one of the following:
function features, class features, variable features, reference relationship features, string features, data block features, logical meaning features, and data content features.
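String features such as those listed for the compiler category are often derived from runs of printable characters in the binary, in the manner of the Unix strings tool. The sketch below assumes a minimum run length of 4 and a small hypothetical list of indicator tokens; it is one plausible encoding of a string feature, not the one the patent specifies.

```python
import re

MIN_LEN = 4
# Hypothetical indicator tokens; a real feature set would be learned or curated.
INDICATOR_TOKENS = ("http://", "https://", "HKEY_", "cmd.exe")
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)


def extract_strings(data: bytes):
    """Return runs of at least MIN_LEN printable ASCII bytes, like the Unix strings tool."""
    return [m.decode("ascii") for m in PRINTABLE_RUN.findall(data)]


def string_features(data: bytes) -> dict:
    """Summarise the extracted strings as a small feature dictionary."""
    strings = extract_strings(data)
    feats = {
        "string_count": len(strings),
        "avg_string_len": sum(map(len, strings)) / len(strings) if strings else 0.0,
    }
    for token in INDICATOR_TOKENS:
        feats["has_" + token] = int(any(token in s for s in strings))
    return feats
```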
The feature vector to be obtained for the packed (shell) category may include at least one of the following:
shell features, internal program features, and compression or encryption features.
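A common way to quantify a compression or encryption feature is the Shannon entropy of a section's bytes: compressed or encrypted data approaches the 8 bits per byte of random data, while ordinary code and text sit well below that. The sketch below computes this entropy; the 7.2 threshold in the hypothetical helper looks_packed is an assumed heuristic, not a value from the patent.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_packed(section_data: bytes, threshold: float = 7.2) -> bool:
    """Flag a section as likely compressed or encrypted (assumed threshold)."""
    return shannon_entropy(section_data) >= threshold
```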
It will be understood that the feature vectors listed above for each category are only specific examples given by the inventors; in practice other feature vectors of the corresponding category may also be used. They cannot all be listed here, and all fall within the protection scope of the present application.
In addition, the feature vector to be obtained for samples of the other category may include at least one of the following:
program structure features, import and export data features, text string features, numerical features, entry point and function features, version information features, and icon image features.
The above are the feature vectors to be obtained for each category. After each feature vector is obtained, the embodiments of the present application may also process it further. This processing includes, but is not limited to:
quantizing non-numeric features, performing feature selection on high-dimensional features, and normalizing all features.
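These three processing steps can be illustrated in plain Python. In this sketch, which is an assumption rather than the applicant's method, non-numeric values are quantized by one-hot encoding, feature selection is reduced to dropping zero-variance columns, and normalization is min-max scaling to [0, 1]. All function names are hypothetical.

```python
def quantize(rows):
    """Map dict rows to numeric vectors; string values become one-hot columns."""
    def column(key, value):
        return key if isinstance(value, (int, float)) else f"{key}={value}"

    columns = sorted({column(k, v) for row in rows for k, v in row.items()})
    matrix = []
    for row in rows:
        vec = dict.fromkeys(columns, 0.0)
        for k, v in row.items():
            vec[column(k, v)] = float(v) if isinstance(v, (int, float)) else 1.0
        matrix.append([vec[c] for c in columns])
    return columns, matrix


def select_by_variance(columns, matrix, min_var=0.0):
    """Feature-selection stand-in: keep columns whose variance exceeds min_var."""
    n = len(matrix)
    keep = []
    for j in range(len(columns)):
        col = [r[j] for r in matrix]
        mean = sum(col) / n
        if sum((x - mean) ** 2 for x in col) / n > min_var:
            keep.append(j)
    return [columns[j] for j in keep], [[r[j] for j in keep] for r in matrix]


def min_max_normalize(matrix):
    """Scale every column to [0, 1]; constant columns become 0."""
    lo = [min(c) for c in zip(*matrix)]
    hi = [max(c) for c in zip(*matrix)]
    return [[(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(r, lo, hi)]
            for r in matrix]
```

At detection time the same column list and the per-column minima and maxima learned from the training data would have to be reused, so that the features fed to a model match those it was trained on.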
S220: compute with a preset machine learning classification algorithm to obtain the virus detection model.
In this step, the samples obtained above and their extracted feature vectors are input into a machine learning classification algorithm to obtain the virus detection model. If the feature vectors were processed as described above, the processed feature vectors are the ones used for machine learning.
The embodiments of the present application place no specific restriction on the classification algorithm; any existing classification algorithm may be used, such as a decision tree algorithm or an SVM (Support Vector Machine) algorithm.
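To make the decision-tree option concrete, the sketch below fits a depth-1 decision tree (a decision stump) by exhaustive search over features and thresholds. It is a deliberately minimal stand-in for a full decision tree or SVM library, chosen only so the example runs without third-party packages; labels are assumed to be 1 for a virus file and 0 otherwise, and the function names are hypothetical.

```python
def train_stump(X, y):
    """Fit a decision stump: the single (feature, threshold, polarity) split with fewest errors.

    X is a list of numeric feature vectors, y a list of 0/1 labels.
    polarity is the label predicted when x[feature] > threshold.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
        for t in thresholds:
            for polarity in (1, 0):
                errors = sum(
                    (polarity if row[j] > t else 1 - polarity) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    _, j, t, polarity = best
    return {"feature": j, "threshold": t, "polarity": polarity}


def predict_stump(model, row):
    """Return 1 (virus) or 0 (clean) for one feature vector."""
    if row[model["feature"]] > model["threshold"]:
        return model["polarity"]
    return 1 - model["polarity"]
```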
The training method above yields virus detection models for detecting viruses in files of the designated categories. For example, it can separately yield a model for self-extracting archive files, a model for installer package files, a model for compiler category files, a model for packed (shell) files, and a model for files of the other category outside the designated categories.
Each step of the virus detection method described above is now described in further detail.
S100 identifies the file format of the file to be detected and thereby determines the category to which it belongs; that is, based on the file format, this step determines whether the file belongs to one of the designated categories or to the other category outside them.
The recognition method differs for each designated category, and the embodiments of the present application place no specific restriction on it; any existing recognition method may be used. For example, for the self-extracting archive category, whether a file is a self-extracting archive can be determined by examining the first two to four bytes of the appended data (overlay), or by checking whether an archive keyword appears in the data block at the start of the appended data or in the resources. For the compiler category, a file can be identified by the instruction-call characteristics at its entry point and by the functions it uses. For the packed (shell) category, a file can be distinguished by matching code characteristics at the entry point and by information about certain sections. Any file not recognized as belonging to one of the designated categories is classified as belonging to the other category.
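The self-extracting and packed checks just described can be sketched as a simple rule cascade. The sketch assumes the overlay bytes and PE section names have already been parsed out of the file, and the function and category names are hypothetical. The archive magic values are the standard zip, rar and 7z headers; the packer section names are illustrative (UPX, for instance, names its sections UPX0 and UPX1). Compiler-category detection, which needs entry-point disassembly, is left out.

```python
# Standard archive header bytes for zip, rar and 7z.
ARCHIVE_MAGICS = (b"PK\x03\x04", b"Rar!", b"7z\xbc\xaf")
# Illustrative packer section names only; real tables are much larger.
PACKER_SECTION_NAMES = {"UPX0", "UPX1"}


def identify_category(overlay: bytes, section_names) -> str:
    """Return 'sfx_archive', 'shell' or 'other' for a pre-parsed PE file."""
    # Self-extracting archive: archive magic in the first bytes of the overlay...
    if any(overlay.startswith(m) for m in ARCHIVE_MAGICS):
        return "sfx_archive"
    # ...or an archive keyword near the start of the overlay.
    if any(m in overlay[:4096] for m in ARCHIVE_MAGICS):
        return "sfx_archive"
    # Packed (shell) file: a known packer section name is present.
    if any(name in PACKER_SECTION_NAMES for name in section_names):
        return "shell"
    # Not recognised as any designated category.
    return "other"
```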
The purpose of identifying the file format and determining the category is to decide which feature vector to extract, because files of different categories require different feature vectors. Step S110 extracts, according to the determined category, the feature vector of the file that corresponds to that category; the feature vectors required for each category are as described above for model training and are not repeated here. Note that the feature vector extracted for each category must be consistent with the feature vector extracted when the corresponding model was trained.
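One simple way to keep detection-time features consistent with training-time features is to fix, per category, an ordered schema recorded at training time and to build every vector from it. The schema contents and the function name to_vector below are hypothetical; raising on unknown feature names, rather than silently dropping them, is a design choice of this sketch.

```python
# Hypothetical per-category schemas, recorded when each model was trained.
FEATURE_SCHEMAS = {
    "shell": ("packer_signature", "inner_program_size", "entropy"),
    "other": ("program_structure", "import_count", "export_count", "string_count"),
}


def to_vector(category: str, features: dict) -> list:
    """Order features by the category's training-time schema; absent features become 0.0."""
    schema = FEATURE_SCHEMAS[category]
    unknown = set(features) - set(schema)
    if unknown:
        raise ValueError(f"features not seen in training for {category!r}: {sorted(unknown)}")
    return [float(features.get(name, 0.0)) for name in schema]
```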
Step S120 inputs the feature vector extracted in step S110 into the virus detection model of the corresponding category to detect whether the file to be detected is a virus file.
For example, for the self-extracting archive category, the feature vector obtained comprises: directory structure features, file checksum features, file data content features, compression algorithm features and execution script features. This feature vector is input into the model for detecting self-extracting archive files, which performs the virus detection.
For the compiler category, the feature vector obtained comprises: function features, class features, variable features, reference relationship features, string features, data block features, logical meaning features and data content features. This feature vector is input into the model for detecting compiler category files, which performs the virus detection.
Fig. 3 is a block diagram showing how the virus detection method of the present embodiment is implemented. As Fig. 3 shows, a file to be detected first undergoes format identification to determine whether it belongs to category 1, category 2, category 3 or category 4. The feature vector of the file corresponding to that category is then extracted: feature vector 1 for category 1, feature vector 2 for category 2, feature vector 3 for category 3, feature vector 4 for category 4, and so on. Each extracted feature vector is then input into the corresponding virus detection model: model 1 for category 1, model 2 for category 2, model 3 for category 3, model 4 for category 4, and so on.
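The flow in Fig. 3 amounts to a dispatch on the file's category: identify it, run only that category's feature extractor, and apply only that category's model. A minimal sketch follows, in which the function detect, the category names, the toy extractors and the threshold models are hypothetical placeholders rather than anything the patent specifies.

```python
from typing import Callable, Dict, List

Vector = List[float]


def detect(data: bytes,
           identify: Callable[[bytes], str],
           extractors: Dict[str, Callable[[bytes], Vector]],
           models: Dict[str, Callable[[Vector], bool]]) -> bool:
    """Return True if the category-specific model flags the file as a virus."""
    category = identify(data)            # S100: determine the category from the format
    vector = extractors[category](data)  # S110: extract only this category's features
    return models[category](vector)      # S120: apply this category's model


# Toy placeholders, for illustration only.
def toy_identify(data: bytes) -> str:
    return "sfx_archive" if data.startswith(b"PK") else "other"


TOY_EXTRACTORS = {
    "sfx_archive": lambda d: [float(len(d))],
    "other": lambda d: [d.count(0) / max(len(d), 1)],
}
TOY_MODELS = {
    "sfx_archive": lambda v: v[0] > 1000,
    "other": lambda v: v[0] < 0.1,
}
```

Adding a category then means adding one extractor and one model under the same key, which mirrors how Fig. 3 pairs category n with feature vector n and model n.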
Through the above steps, files of different categories have their corresponding feature vectors extracted and input into the virus detection models of the corresponding categories. The method does not need to extract every feature of the file, only those corresponding to its category, so the virus detection operation is simplified and detection speed is increased. Because files of each category are checked by the model for that category, the detection rate for every file category is maintained, unknown viruses can be discovered in time, and labor cost can be effectively reduced.
Based on the thinking that said method is same, the embodiment of the present application also provides a kind of Viral diagnosis device, and as shown in Figure 4, this device mainly comprises the structural representation of this Viral diagnosis device:
For identifying the file layout of file to be detected, thus determine the unit 400 of described file generic to be detected, hereinafter referred to as classification determination unit 400;
For extract described file to be detected with the unit 410 of described classification characteristic of correspondence vector, hereinafter referred to as feature extraction unit 410;
For being input in the Viral diagnosis model of corresponding classification by described proper vector, detect the unit 420 whether described file to be detected is virus document, hereinafter referred to as virus detection element 420.
Each of these units is described in further detail below.
The category determination unit 400 identifies the file format of the file to be detected and thereby determines the category to which the file belongs; that is, based on the file format, it determines whether the file belongs to one of the specified categories or to some other category outside them.
The specified categories in the embodiments of the present application include at least one of the following:
A self-extracting archive category, an installer package category, a compiler category, and a shell (packer) category.
A self-extracting archive, as used in this embodiment, is a file package compressed in self-extracting form. A self-extracting (SFX, Self-eXtracting) file is a kind of compressed file that can be decompressed without any separate compression tool: double-clicking the file starts decompression automatically, hence the name. A self-extracting archive is larger than an ordinary compressed file because it embeds its own extraction program, but its advantage is that it can be opened on a machine with no compression software installed. Self-extracting archives come in three types: zip, rar and 7z.
An installer package, as used in this embodiment, is a software installation package containing all the files needed to install the software. Running the installer (an executable file) releases all of the software's files onto the hard disk and performs tasks such as editing the registry, modifying operating system settings and creating shortcuts. The installer package category in this embodiment includes, but is not limited to, files of types such as Inno Setup, NSIS, MSI and CAB.
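One way to tell the listed installer types apart is by well-known byte markers. The sketch below is an assumption, not the patent's method: the markers (`D0 CF 11 E0 A1 B1 1A E1` for OLE compound files such as MSI, `MSCF` for CAB, `NullsoftInst` for NSIS, `Inno Setup Setup Data` for Inno Setup) are common heuristics, and `installer_type` is a hypothetical helper name.

```python
from typing import Optional

# Assumed marker heuristics (not from the patent) for the installer families named above.
_OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")   # OLE compound file header, used by .msi
_CAB_MAGIC = b"MSCF"                              # Microsoft Cabinet header
_NSIS_MARKER = b"NullsoftInst"                    # NSIS first-header signature string
_INNO_MARKER = b"Inno Setup Setup Data"           # Inno Setup data block ID string


def installer_type(data: bytes) -> Optional[str]:
    """Return 'msi', 'cab', 'nsis', 'inno', or None when no marker is found."""
    if data.startswith(_OLE_MAGIC):
        # Caveat: other OLE documents (e.g. legacy .doc) share this magic,
        # so a real implementation would also inspect the stream names.
        return "msi"
    if data.startswith(_CAB_MAGIC):
        return "cab"
    if _NSIS_MARKER in data:      # NSIS/Inno markers live inside the executable, not at offset 0
        return "nsis"
    if _INNO_MARKER in data:
        return "inno"
    return None
```

Substring searches over a whole file are simple but slow for large inputs; a practical version would restrict the search to the overlay or a bounded window.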
A compiler, as used in this embodiment, is a program that translates one language into another.
The shell category refers to packed programs, that is, programs to which a shell (packer) has been added. A packed program can still be run directly, but its original binary data cannot be inspected until it is unpacked. Packing uses a special algorithm to compress and encrypt the resources in EXE and DLL files. The effect is similar to WinZip, except that the packed file can still run on its own and the decompression is completely hidden, taking place entirely in memory. The packer code is attached to the original program; after the Windows loader loads the program into memory, this code runs before the original program and takes control. It decrypts and restores the original program and, once restoration is complete, hands control back to it so that the original code section executes. After packing, the original program code generally exists in the disk file only in encrypted form and is restored only in memory at execution time. This helps prevent crackers from illegally modifying the program file and also prevents the program from being statically decompiled.
It will be understood that file categories are not limited to those listed above; the embodiments of the present application refer to any category outside the specified categories collectively as other categories.
The category determination unit 400 identifies each specified category in a different way. The embodiments do not restrict the specific identification method, and any existing method may be used. For example, for the self-extracting archive category, whether a file is a self-extracting archive can be determined by checking the first two to four bytes of its appended data (overlay), or by checking whether a compressed-archive keyword appears in the data block at the start of the appended data or in the resources. For the compiler category, identification can rely on the instruction-call characteristics at the entry point and on the functions used. For the shell category, identification can rely on matching the code at the entry point and on the characteristics of certain section information. Files identified as belonging to none of the specified categories are classified as other categories.
The purpose of identifying the file format and determining the category of the file to be detected is to decide which feature vector needs to be extracted from it, because the embodiments of the present application extract a category-specific feature vector for each category of file.
The feature extraction unit 410 extracts, from the file to be detected, the feature vector corresponding to the category determined by the category determination unit 400. The feature vectors to be obtained for each of the specified categories listed above are described below.
The feature vector to be obtained for the self-extracting archive category may include at least one of the following:
Directory structure features, file checksum features, file data content features, compression algorithm features, and execution script features.
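For a ZIP-based self-extracting archive, several of these features can be read with Python's standard `zipfile` module. The feature names, the XOR-of-CRC summary and the script extension list below are illustrative assumptions, not definitions from the patent; rar and 7z archives would need other parsers.

```python
import io
import zipfile
from functools import reduce
from operator import xor
from typing import Dict

# Assumed set of script extensions for the "execution script" feature.
SCRIPT_EXTENSIONS = (".bat", ".cmd", ".vbs", ".js", ".ps1")


def zip_sfx_features(data: bytes) -> Dict[str, float]:
    """Hypothetical feature set for a ZIP-based self-extracting archive."""
    # zipfile locates the central directory from the end of the data, so a
    # stub executable prepended to the archive (the SFX layout) is tolerated.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
    names = [i.filename for i in infos]
    files = [i for i in infos if not i.is_dir()]
    return {
        # directory structure: entry count and deepest path
        "entry_count": float(len(infos)),
        "max_depth": float(max((n.rstrip("/").count("/") for n in names), default=0)),
        # file checksums: XOR of the members' CRC-32 values as one compact summary
        "crc_xor": float(reduce(xor, (i.CRC for i in files), 0)),
        # compression algorithm: share of members stored with DEFLATE
        "deflate_ratio": (sum(i.compress_type == zipfile.ZIP_DEFLATED for i in files) / len(files)
                          if files else 0.0),
        # execution scripts: members whose names look like scripts
        "script_count": float(sum(n.lower().endswith(SCRIPT_EXTENSIONS) for n in names)),
    }
```

Only archive metadata is read here, so no member is decompressed; file data content features would additionally require reading member bytes.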
The feature vector to be obtained for the installer package category may include at least one of the following:
Directory structure features, installation script features, installer version features, installation file checksum features, installation plug-in features, file data features, and interface image features.
The feature vector to be obtained for the compiler category may include at least one of the following:
Function features, class features, variable features, reference relationship features, string features, data block features, logic features, and data content features.
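String features are among the easiest of these to compute. The sketch below extracts runs of at least four printable ASCII bytes, similar to what the Unix `strings` tool reports, and counts a few keyword groups. The keyword groups and feature names are hypothetical examples, not a list from the patent.

```python
import re
from typing import Dict, List

# Runs of >= 4 printable ASCII bytes.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Hypothetical keyword groups; a real system would curate or learn these.
KEYWORDS = {
    "url": re.compile(rb"https?://"),
    "registry": re.compile(rb"HKEY_|\\Software\\Microsoft\\Windows\\CurrentVersion\\Run", re.I),
    "process_api": re.compile(rb"CreateRemoteThread|WriteProcessMemory|VirtualAllocEx"),
}


def extract_strings(data: bytes) -> List[bytes]:
    return _PRINTABLE_RUN.findall(data)


def string_features(data: bytes) -> Dict[str, float]:
    strings = extract_strings(data)
    feats = {
        "string_count": float(len(strings)),
        "mean_string_len": (sum(map(len, strings)) / len(strings)) if strings else 0.0,
    }
    # One count per keyword group: how many extracted strings match it.
    for name, pattern in KEYWORDS.items():
        feats[name + "_hits"] = float(sum(1 for s in strings if pattern.search(s)))
    return feats
```

Only ASCII strings are covered; compiled Windows programs often also contain UTF-16LE strings, which would need a second pattern.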
The feature vector to be obtained for the shell category may include at least one of the following:
Shell features, internal program features, and compression or encryption features.
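A common way to quantify a compression or encryption feature is byte entropy, because compressed or encrypted data is close to uniformly distributed. The sketch below computes Shannon entropy in bits per byte; the 7.0 threshold in `looks_packed` is an assumed value for illustration, not one given in the patent.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_packed(section: bytes, threshold: float = 7.0) -> bool:
    # Assumed heuristic: section data with entropy near 8 bits/byte is
    # probably compressed or encrypted, as in a packed (shell) program.
    return shannon_entropy(section) >= threshold
```

In practice the entropy would be computed per section and fed to the model as a numeric feature rather than used as a hard yes/no rule.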
It will be understood that the feature vectors listed above for each category are only specific examples given by the inventors. In practice, other feature vectors of the corresponding category may also be included; they cannot all be listed here, and all of them fall within the scope of protection of the present application.
In addition, for the other categories described above, the feature vector to be obtained may include at least one of the following:
Program structure features, import and export data features, text string features, numerical features, entry point and function features, version information features, and icon image features.
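For Windows executables in the other category, some program structure features can be read straight from the PE/COFF file header with the standard `struct` module. The choice of fields below is an illustrative assumption; the offsets follow the PE format (the `e_lfanew` pointer at 0x3C, then the `PE\0\0` signature and the 20-byte COFF header).

```python
import struct
from typing import Dict


def pe_structure_features(data: bytes) -> Dict[str, int]:
    """Read a few PE/COFF header fields as illustrative program structure features."""
    if data[:2] != b"MZ" or len(data) < 0x40:
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)          # offset of the PE signature
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes total).
    (machine, n_sections, timestamp, _sym_ptr, _n_syms,
     opt_header_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": machine,
        "section_count": n_sections,
        "timestamp": timestamp,
        "optional_header_size": opt_header_size,
        "is_dll": int(bool(characteristics & 0x2000)),          # IMAGE_FILE_DLL flag
    }
```

Import and export data, version information and icon features live deeper in the optional header's data directories and resources, so they would need a fuller PE parser than this sketch.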
The above are the feature vectors to be obtained for each category. After each feature vector is obtained, the embodiments of the present application may further apply corresponding processing to it, including but not limited to:
Quantizing non-numeric features, performing feature selection on high-dimensional features, and normalizing all features.
It will be understood that whether this processing is applied to an obtained feature vector depends on whether the same processing was applied to the extracted feature vectors when the corresponding virus detection model was trained. In other words, the operations applied to the feature vector extracted during detection are kept consistent with those applied to feature vectors in the model's training stage.
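The consistency requirement can be made concrete with a fit/transform split: the processing state is learned once from training vectors and then reused unchanged on vectors extracted at detection time. This is a sketch under assumptions: the hashing trick stands in for quantization, a variance threshold stands in for feature selection, and min-max scaling stands in for normalization; the patent does not name specific techniques, and `hash_feature` and `Preprocessor` are hypothetical names.

```python
import hashlib
from typing import List, Sequence


def hash_feature(value: str, buckets: int = 16) -> List[float]:
    """Quantize a non-numeric feature (e.g. a packer name) via the hashing trick."""
    vec = [0.0] * buckets
    digest = hashlib.sha256(value.encode("utf-8")).digest()   # stable across runs, unlike hash()
    vec[int.from_bytes(digest[:4], "little") % buckets] = 1.0
    return vec


class Preprocessor:
    """Fit once on training vectors; reuse the same fitted state at detection time."""

    def __init__(self, min_variance: float = 1e-12) -> None:
        self.min_variance = min_variance
        self.keep: List[int] = []
        self.lo: List[float] = []
        self.hi: List[float] = []

    def fit(self, rows: Sequence[Sequence[float]]) -> "Preprocessor":
        n = len(rows)
        cols = list(zip(*rows))
        # Feature selection (assumed criterion): drop near-constant columns.
        self.keep = []
        for j, col in enumerate(cols):
            mean = sum(col) / n
            if sum((x - mean) ** 2 for x in col) / n > self.min_variance:
                self.keep.append(j)
        # Min-max bounds for the kept columns, learned from training data only.
        self.lo = [min(cols[j]) for j in self.keep]
        self.hi = [max(cols[j]) for j in self.keep]
        return self

    def transform(self, row: Sequence[float]) -> List[float]:
        # Kept columns have variance > 0, so hi > lo and the division is safe.
        return [(row[j] - lo) / (hi - lo) for j, lo, hi in zip(self.keep, self.lo, self.hi)]
```

Values seen at detection time may fall outside the training range and so outside [0, 1]; whether to clip them is a design choice that, per the text above, must match what was done in training.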
The virus detection unit 420 inputs the feature vector extracted by the feature extraction unit 410 into the virus detection model of the corresponding category and detects whether the file to be detected is a virus file.
That is, before performing virus detection, the embodiments of the present application need to train a virus detection model for each category, including a model for detecting self-extracting archive files, a model for detecting installer package files, a model for detecting compiler category files, a model for detecting shell category files, and a model for detecting files of the other (non-specified) categories.
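Training one model per category can be sketched as below. The patent only specifies "a machine learning algorithm", so the nearest-centroid classifier here is a swapped-in stand-in chosen for brevity, and `NearestCentroid` and `train_per_category` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], bool]    # (feature vector, is_virus label)


class NearestCentroid:
    """Stand-in learner: predicts the label whose class centroid is closest."""

    def fit(self, samples: Sequence[Sample]) -> "NearestCentroid":
        self.centroids: Dict[bool, List[float]] = {}
        for label in (True, False):
            vecs = [v for v, y in samples if y == label]
            if not vecs:
                raise ValueError("each category needs both virus and clean samples")
            self.centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
        return self

    def predict(self, vec: Sequence[float]) -> bool:
        def dist2(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(vec, c))
        return min(self.centroids, key=lambda label: dist2(self.centroids[label]))


def train_per_category(data: Dict[str, Sequence[Sample]]) -> Dict[str, NearestCentroid]:
    """Train one independent model per file category (SFX, installer, ..., other)."""
    return {category: NearestCentroid().fit(samples) for category, samples in data.items()}
```

Because each model sees only its own category's feature vectors, the vectors of different categories may have different lengths and meanings, which is what allows each category to use its own feature list.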
The virus detection unit 420 then inputs the feature vector of the file to be detected into the model of the corresponding category for virus detection. For example, for the self-extracting archive category, the obtained feature vector includes directory structure features, file checksum features, file data content features, compression algorithm features, and execution script features; the virus detection unit 420 inputs this feature vector into the model for detecting self-extracting archive files.
For the compiler category, the obtained feature vector includes function features, class features, variable features, reference relationship features, string features, data block features, logic features, and data content features; the virus detection unit 420 inputs this feature vector into the model for detecting compiler category files.
Like the method, the device described in the embodiments of the present application extracts from each file only the feature vector corresponding to its category and inputs it into the virus detection model of that category, rather than extracting every feature vector of the file. This simplifies the detection process and speeds up detection; and because each category of file is examined by the model of its own category, it helps maintain the virus detection rate for every file category, supports timely discovery of unknown viruses, and can reduce labor costs.
It should be noted that the present invention may be implemented in software and/or a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to carry out the steps or functions described above. Likewise, the software program of the present invention (including associated data structures) may be stored on a computer-readable recording medium, such as RAM, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present invention may be implemented in hardware, for example as circuitry that cooperates with a processor to perform each step or function.
In addition, part of the present invention may be embodied as a computer program product, such as computer program instructions that, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored on a fixed or removable recording medium, and/or transmitted via a data stream in broadcast or other signal-bearing media, and/or stored in the working memory of a computer device that runs according to those program instructions. Here, one embodiment of the present invention comprises a device that includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the method and/or technical solution according to the foregoing embodiments of the present invention.
It will be obvious to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments above and can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and not restrictive; the scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced by the invention. Any reference sign in the claims should not be construed as limiting the claim concerned. Furthermore, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" denote names and do not denote any particular order.

Claims (19)

1. A virus detection method, comprising:
Identifying the file format of a file to be detected, thereby determining the category to which the file to be detected belongs;
Extracting, from the file to be detected, the feature vector corresponding to the category;
Inputting the feature vector into the virus detection model of the corresponding category to detect whether the file to be detected is a virus file.
2. The method according to claim 1, wherein determining the category to which the file to be detected belongs comprises:
Determining that the file to be detected belongs to any one of the specified categories; or
If the file to be detected belongs to none of the specified categories, determining that the file to be detected belongs to an other category.
3. The method according to claim 2, wherein the specified categories include at least one of the following:
A self-extracting archive category, an installer package category, a compiler category, and a shell category.
4. The method according to claim 3, wherein the feature vector corresponding to the self-extracting archive category includes at least one of the following:
Directory structure features, file checksum features, file data content features, compression algorithm features, and execution script features.
5. The method according to claim 3, wherein the feature vector corresponding to the installer package category includes at least one of the following:
Directory structure features, installation script features, installer version features, installation file checksum features, installation plug-in features, file data features, and interface image features.
6. The method according to claim 3, wherein the feature vector corresponding to the compiler category includes at least one of the following:
Function features, class features, variable features, reference relationship features, string features, data block features, logic features, and data content features.
7. The method according to claim 3, wherein the feature vector corresponding to the shell category includes at least one of the following:
Shell features, internal program features, and compression or encryption features.
8. The method according to claim 2, wherein the feature vector corresponding to the other category includes at least one of the following:
Program structure features, import and export data features, text string features, numerical features, entry point and function features, version information features, and icon image features.
9. The method according to claim 1, wherein the virus detection model of the corresponding category is:
A virus detection model obtained by training, based on a machine learning algorithm, with the feature vectors corresponding to that category.
10. A virus detection device, comprising:
A unit for identifying the file format of a file to be detected, thereby determining the category to which the file to be detected belongs;
A unit for extracting, from the file to be detected, the feature vector corresponding to the category;
A unit for inputting the feature vector into the virus detection model of the corresponding category to detect whether the file to be detected is a virus file.
11. The device according to claim 10, wherein the category to which the file to be detected belongs comprises:
Any one of the specified categories; or an other category outside the specified categories.
12. The device according to claim 11, wherein the specified categories include at least one of the following:
A self-extracting archive category, an installer package category, a compiler category, and a shell category.
13. The device according to claim 12, wherein the feature vector corresponding to the self-extracting archive category includes at least one of the following:
Directory structure features, file checksum features, file data content features, compression algorithm features, and execution script features.
14. The device according to claim 12, wherein the feature vector corresponding to the installer package category includes at least one of the following:
Directory structure features, installation script features, installer version features, installation file checksum features, installation plug-in features, file data features, and interface image features.
15. The device according to claim 12, wherein the feature vector corresponding to the compiler category includes at least one of the following:
Function features, class features, variable features, reference relationship features, string features, data block features, logic features, and data content features.
16. The device according to claim 12, wherein the feature vector corresponding to the shell category includes at least one of the following:
Shell features, internal program features, and compression or encryption features.
17. The device according to claim 11, wherein the feature vector corresponding to the other category includes at least one of the following:
Program structure features, import and export data features, text string features, numerical features, entry point and function features, version information features, and icon image features.
18. The device according to claim 10, wherein the virus detection model of the corresponding category is:
A virus detection model obtained by training, based on a machine learning algorithm, with the feature vectors corresponding to that category.
19. Virus detection equipment, comprising the virus detection device according to any one of claims 10 to 18.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510038792.6A CN104680065A (en) 2015-01-26 2015-01-26 Virus detection method, virus detection device and virus detection equipment
BR102015026015A BR102015026015A2 (en) 2015-01-26 2015-10-13 virus detection method, virus detection apparatus, virus detection device, non-transient computer readable storage medium, and computer program product

Publications (1)

Publication Number Publication Date
CN104680065A true CN104680065A (en) 2015-06-03

Family

ID=53315095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510038792.6A Pending CN104680065A (en) 2015-01-26 2015-01-26 Virus detection method, virus detection device and virus detection equipment

Country Status (2)

Country Link
CN (1) CN104680065A (en)
BR (1) BR102015026015A2 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090113128A1 (en) * 2007-10-24 2009-04-30 Sumwintek Corp. Method and system for preventing virus infections via the use of a removable storage device
CN101685483A (en) * 2008-09-22 2010-03-31 成都市华为赛门铁克科技有限公司 Method and device for extracting virus feature code
CN102479298A (en) * 2010-11-29 2012-05-30 北京奇虎科技有限公司 Program identification method and device based on machine learning
CN102867038A (en) * 2012-08-30 2013-01-09 北京奇虎科技有限公司 Method and device for determining type of file
CN103473506A (en) * 2013-08-30 2013-12-25 北京奇虎科技有限公司 Method and device of recognizing malicious APK files

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315955A (en) * 2016-04-27 2017-11-03 百度在线网络技术(北京)有限公司 File security identification method and device
CN107437088A (en) * 2016-05-27 2017-12-05 百度在线网络技术(北京)有限公司 File identification method and device
CN107437088B (en) * 2016-05-27 2020-12-08 百度在线网络技术(北京)有限公司 File identification method and device
CN108629182A (en) * 2017-03-21 2018-10-09 腾讯科技(深圳)有限公司 Vulnerability detection method and vulnerability detection device
CN108629182B (en) * 2017-03-21 2022-11-04 腾讯科技(深圳)有限公司 Vulnerability detection method and vulnerability detection device
CN107577943B (en) * 2017-09-08 2021-07-13 北京奇虎科技有限公司 Sample prediction method and device based on machine learning and server
CN107577943A (en) * 2017-09-08 2018-01-12 北京奇虎科技有限公司 Sample prediction method, apparatus and server based on machine learning
CN109299609A (en) * 2018-08-08 2019-02-01 北京奇虎科技有限公司 ELF file detection method and device
CN113378162A (en) * 2020-02-25 2021-09-10 深信服科技股份有限公司 Method and device for checking executable and linkable format files and storage medium
CN113378162B (en) * 2020-02-25 2023-11-07 深信服科技股份有限公司 Method, device and storage medium for checking executable and linkable format files
CN111914257A (en) * 2020-08-04 2020-11-10 中国信息安全测评中心 Document detection method, device, equipment and computer storage medium
CN111881450A (en) * 2020-08-04 2020-11-03 深信服科技股份有限公司 Virus detection method, device, system, equipment and medium for terminal file
CN111881450B (en) * 2020-08-04 2023-12-29 深信服科技股份有限公司 Virus detection method, device, system, equipment and medium for terminal file

Also Published As

Publication number Publication date
BR102015026015A2 (en) 2016-08-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20150603)