CN106778241A - Method and device for identifying malicious files
- Publication number: CN106778241A
- Application number: CN201611067380.6A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/52—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
- G06F21/53—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a method and device for identifying malicious files, relates to the field of computer security technology, and aims to improve the accuracy of malicious file identification. The main technical scheme is: obtain the dynamic feature vector and static feature vector of a target file; input the dynamic feature vector and static feature vector of the target file into a preset classifier to calculate the file content malicious probability of the target file; and identify whether the target file is a malicious file according to the file content malicious probability and the file source malicious probability of the target file, where the file source malicious probability is determined from the source information of the target file. The invention is mainly used for identifying malicious files.
Description
Technical field
The present invention relates to the field of computer security technology, and in particular to a method and device for identifying malicious files.
Background art
With the continuous development of computer and Internet technology, malicious files have grown explosively, and their attack and camouflage techniques keep becoming more diverse and complex. In addition, the underground industry chain of computer crime continues to mature, and its degree of industrialization and scale keeps rising, which makes countering malicious files a very challenging problem today.
At present, malicious code is mainly identified by static detection or dynamic detection. Static detection preprocesses the target file and then performs signature matching, that is, matching against a virus database. Dynamic detection identifies the target file mainly from certain behavioral features, for example modifying specific registry entries or opening specific ports.
However, static detection lacks the ability to detect variants of malicious files and new malicious files, and dynamic detection lacks the ability to identify malware that shows no obvious behavioral features as well as new malware. Moreover, current approaches are limited to a single static or dynamic detection technique, which makes it easier for malicious files to hide by using general evasion techniques, so the identification accuracy for malicious files is relatively low.
Summary of the invention
In view of this, the present invention provides a method and device for identifying malicious files, with the main purpose of improving the identification accuracy of malicious files.
According to one aspect of the invention, a method for identifying malicious files is provided, including:
Obtaining the dynamic feature vector and static feature vector of a target file;
Inputting the dynamic feature vector and static feature vector of the target file into a preset classifier to calculate the file content malicious probability of the target file;
Identifying whether the target file is a malicious file according to the file content malicious probability of the target file and the file source malicious probability of the target file, where the file source malicious probability of the target file is determined according to the source information of the target file.
Further, before identifying whether the target file is a malicious file according to the file content malicious probability and the file source malicious probability of the target file, the method also includes:
Obtaining the source information of the target file;
Determining the file source malicious probability of the target file by matching the source information of the target file against the malicious source data in a preset malicious source database.
Specifically, obtaining the dynamic feature vector of the target file includes:
Placing the target file into a network sandbox system for execution and obtaining the behavior log of the target file, where the network sandbox system consists of a virtual switched network formed by a group of virtual machines;
Obtaining the dynamic feature vector of the target file from the behavior log.
Further, the method also includes:
Training the preset classifier with malicious file samples and added noise.
Specifically, training the preset classifier with malicious file samples and added noise includes:
Placing the malicious file sample, together with first noise added according to a preset noise knowledge base, into the network sandbox system for execution to obtain the behavior log of the malicious file sample;
Obtaining the dynamic feature vector of the malicious file sample from the behavior log of the malicious file sample and second noise added according to the preset noise knowledge base;
Obtaining the static feature vector of the malicious file sample from the malicious file sample and third noise added according to the preset noise knowledge base;
Obtaining the noise feature vector of the malicious file sample from the static feature vector and dynamic feature vector of the malicious file sample;
Training the preset classifier with the feature vectors of malicious file samples without added noise and the noise feature vectors of malicious file samples with added noise.
Specifically, determining whether the target file is a malicious file according to the file content malicious probability and the file source malicious probability of the target file includes:
Substituting the file content malicious probability of the target file given that the target file is a malicious file, the file content malicious probability of the target file, and the file source malicious probability of the target file into the Bayesian formula to calculate the malicious file probability of the target file;
Determining whether the target file is a malicious file according to the malicious file probability of the target file.
According to another aspect of the invention, a device for identifying malicious files is provided, including:
An acquiring unit, for obtaining the dynamic feature vector and static feature vector of a target file;
A computing unit, for inputting the dynamic feature vector and static feature vector of the target file into a preset classifier to calculate the file content malicious probability of the target file;
A recognition unit, for identifying whether the target file is a malicious file according to the file content malicious probability of the target file and the file source malicious probability of the target file, where the file source malicious probability of the target file is determined according to the source information of the target file.
Further, the device also includes:
The acquiring unit, which is also used to obtain the source information of the target file;
A determining unit, for determining the file source malicious probability of the target file by matching the source information of the target file against the malicious source data in a preset malicious source database.
Specifically, the acquiring unit includes:
An execution module, for placing the target file into a network sandbox system for execution and obtaining the behavior log of the target file, where the network sandbox system consists of a virtual switched network formed by a group of virtual machines;
An acquisition module, for obtaining the dynamic feature vector of the target file from the behavior log.
Further, the device also includes:
A training unit, for training the preset classifier with malicious file samples and added noise.
Specifically, the training unit includes:
An acquisition module, for placing the malicious file sample, together with first noise added according to a preset noise knowledge base, into the network sandbox system for execution to obtain the behavior log of the malicious file sample;
The acquisition module, also for obtaining the dynamic feature vector of the malicious file sample from the behavior log of the malicious file sample and second noise added according to the preset noise knowledge base;
The acquisition module, also for obtaining the static feature vector of the malicious file sample from the malicious file sample and third noise added according to the preset noise knowledge base;
The acquisition module, also for obtaining the noise feature vector of the malicious file sample from the static feature vector and dynamic feature vector of the malicious file sample;
A training module, for training the preset classifier with the feature vectors of malicious file samples without added noise and the noise feature vectors of malicious file samples with added noise.
Specifically, the determining unit includes:
A computing module, for substituting the file content malicious probability of the target file given that the target file is a malicious file, the file content malicious probability of the target file, and the file source malicious probability of the target file into the Bayesian formula to calculate the malicious file probability of the target file;
A determining module, for determining whether the target file is a malicious file according to the malicious file probability of the target file.
With the above technical solution, the technical solution provided by the embodiments of the present invention has at least the following advantages:
The method and device for identifying malicious files provided by the embodiments of the present invention first obtain the dynamic feature vector and static feature vector of the target file, then input them into a preset classifier to calculate the file content malicious probability of the target file, and finally identify whether the target file is a malicious file according to the file content malicious probability and the file source malicious probability of the target file. Compared with current approaches that mainly identify malicious code by static detection or dynamic detection alone, the embodiments of the present invention use a deep learning method that identifies the target file by combining its dynamic information, static information and environmental information. This addresses two problems: obfuscated or packed malicious files easily escape static inspection, and malicious files with long dormancy and strict trigger conditions easily escape dynamic analysis. The embodiments of the present invention thereby improve the ability to identify malicious files.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flow chart of a method for identifying malicious files provided by an embodiment of the present invention;
Fig. 2 shows a schematic diagram of adding noise provided by an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the overall malicious file identification process provided by an embodiment of the present invention;
Fig. 4 shows a structural block diagram of a device for identifying malicious files provided by an embodiment of the present invention;
Fig. 5 shows a structural block diagram of another device for identifying malicious files provided by an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings of this application are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. In addition, the terms "including" and "having" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or device.
According to an embodiment of the present application, a method embodiment for identifying malicious files is provided. It should be noted that the steps shown in the flow chart of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flow chart, in some cases the steps shown or described may be executed in a different order.
In order to provide an implementation that improves the identification accuracy of malicious files, the embodiments of the present invention provide a method and device for identifying malicious files. The preferred embodiments of the invention are described below with reference to the accompanying drawings.
An embodiment of the present invention provides a method for identifying malicious files. As shown in Fig. 1, the method includes:
101. Obtain the dynamic feature vector and static feature vector of the target file.
In step 101, the dynamic feature vector and static feature vector are obtained from the target file in binary form. The analysis of the target file is divided into two parts, dynamic analysis and static analysis. Dynamic analysis uses the virtual execution capability of a sandbox system to analyze the behavior of the target file at run time, and the dynamic feature vector of the target file is obtained from the analysis result. Static analysis extracts and analyzes features directly from the binary data of the target file, and the static feature vector of the target file is obtained from them.
In the embodiments of the present invention, both the dynamic feature vector and the static feature vector of the target file are obtained, that is, dynamic analysis and static analysis are applied to the target file at the same time. The main reason is that the two methods are complementary: malicious files that are obfuscated or packed easily escape static inspection, while malicious files with long dormancy and strict trigger conditions easily escape dynamic analysis. Combining the two methods gives a better detection effect in practice.
It should be noted that a sandbox system is a system with virtual execution capability that can collect and analyze related behavior information. A sandbox system is usually implemented with a group of managed virtual machines. The target file is automatically imported into the virtual machine environment and executed, or opened by a corresponding program (such as an Office program), and an information collection agent running inside the virtual machine records and outputs the run-time behavior of the target program. In the embodiments of the present invention, after the target file is input into the sandbox system, the behavior record of the target file is output, generally in the form of log information. The log information recorded and output in the embodiments includes, but is not limited to: network access information, call information for other application programs, file system access information, system registry access information, system call information, memory access information, and any other information of the virtual machine. The embodiments of the present invention impose no specific limitation.
After the log corresponding to the target file is obtained through the sandbox system, the log is converted into a dynamic feature vector that can be used for machine learning. In the embodiments of the present invention, the conversion from the log to the dynamic feature vector is divided into three steps: log normalization, feature extraction and dimensionality reduction.
The main function of log normalization is to remove special characters from the log, convert uppercase characters to lowercase, replace timestamp labels with a uniform format, and replace numbers with a uniform form. In the feature extraction step, features are extracted from the log using different methods and combinations of them, including but not limited to: extraction of statistical information about timestamps; sequence token extraction from the document (N-gram); and algorithms based on term frequency or term importance (TF, TF-IDF). The embodiments of the present invention impose no specific limitation.
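The normalization and N-gram term-frequency steps described above can be sketched as follows. This is a minimal illustration only: the timestamp pattern, the placeholder tokens `<ts>` and `<num>`, and the function names are assumptions made for the example and do not come from the patent, and a real sandbox log format will differ.

```python
import re
from collections import Counter

# Hypothetical patterns; real sandbox log formats will differ.
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[ t]\d{2}:\d{2}:\d{2}")
_NUMBER = re.compile(r"\d+")
_SPECIAL = re.compile(r"[^a-z0-9<>_ ]+")


def normalize_line(line):
    """Lowercase a log line and unify timestamps, numbers and special characters."""
    line = line.lower()
    line = _TIMESTAMP.sub("<ts>", line)   # unify timestamp labels
    line = _NUMBER.sub("<num>", line)     # unify remaining numbers
    line = _SPECIAL.sub(" ", line)        # drop special characters
    return " ".join(line.split())         # collapse repeated whitespace


def ngram_counts(lines, n=2):
    """Term frequency (TF) of token N-grams over all normalized log lines."""
    counts = Counter()
    for line in lines:
        tokens = normalize_line(line).split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts
```

The resulting counts can be mapped to a fixed vocabulary to form a sparse vector before dimensionality reduction.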
It should be noted that the purpose of dimensionality reduction is to reduce a high-dimensional feature vector to a lower dimension, in order to improve the computational efficiency of the subsequent machine learning algorithm and to optimize storage space. The dimensionality reduction methods that the embodiments of the present invention can use include, but are not limited to: the PCA (Principal Component Analysis) algorithm; the LDA (Latent Dirichlet Allocation, a topic model) algorithm; and the LLE (Locally Linear Embedding) algorithm. The embodiments of the present invention impose no specific limitation.
In the embodiments of the present invention, static features are features extracted directly from the binary target file and output in the form of a feature vector. When static features are extracted from the target file, they fall into two classes: binary features and disassembly features. Binary feature extraction includes, but is not limited to: sequence feature extraction (N-gram), extraction of file metadata, extraction of the information entropy of the file, the image representation of the file, and the length distribution of character strings in the file. Feature extraction based on disassembly includes, but is not limited to: metadata information, symbol information, operator information, register information, API usage information, section structure information and data definition information. The embodiments of the present invention impose no specific limitation.
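Three of the binary features listed above (information entropy, byte N-grams and string length distribution) can be computed directly from the raw bytes. The sketch below is illustrative only; the function names, the default N-gram size and the minimum string length of 4 are assumptions, not values given in the patent.

```python
import math
import re
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Frequency of every overlapping byte N-gram in the file."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def string_lengths(data: bytes, min_len: int = 4) -> list:
    """Lengths of printable ASCII runs of at least `min_len` characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [len(match) for match in re.findall(pattern, data)]
```

Packed or encrypted files tend to show entropy close to 8 bits per byte, which is one reason entropy is a common static feature.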
102. Input the dynamic feature vector and static feature vector of the target file into the preset classifier, and calculate the file content malicious probability of the target file.
In the embodiments of the present invention, the preset classifier first combines the dynamic feature vector and static feature vector of the target file into a single higher-dimensional feature vector, which is then input into a deep neural network for classification to obtain the file content malicious probability of the target file. The file content malicious probability represents the probability that the target file contains malicious content. The preset classifier is trained as an SDA (Stacked Denoising Autoencoder, in which each layer is a denoising autoencoder). SDA is a kind of deep neural network. Its network structure is a multi-layer auto-encoder neural network followed by a multi-layer fully connected network with dropout, and the final output layer outputs the file content malicious probability of the target file through a sigmoid function.
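The inference path described above (concatenate the two vectors, pass them through the network, read a probability from a sigmoid output) can be sketched in a few lines. This is a deliberately reduced illustration: a single hidden layer stands in for the stacked encoder and fully connected layers, dropout is omitted because it is only active during training, and all names and weight shapes are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, bias):
    """One fully connected layer with sigmoid activation."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def content_malice_probability(dynamic_vec, static_vec,
                               hidden_w, hidden_b, out_w, out_b):
    """Concatenate both feature vectors and return a probability in (0, 1)."""
    x = list(dynamic_vec) + list(static_vec)   # higher-dimensional joint vector
    hidden = dense(x, hidden_w, hidden_b)      # stand-in for the encoder stack
    return dense(hidden, [out_w], [out_b])[0]  # sigmoid output layer
```

With all weights set to zero the output is exactly 0.5, which makes a convenient sanity check before loading trained parameters.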
The training process of SDA is divided into two stages: the pre-training stage and the fine-tuning stage. The pre-training stage is an unsupervised learning process whose purpose is to train the initial parameters of the auto-encoder layers one layer at a time, that is, the auto-encoder of the n-th layer can be pre-trained only after the first n-1 layers have been determined. When the auto-encoder of each layer is pre-trained, its parameters are determined by training a three-layer encoder-decoder neural network: data with noise is input into the three-layer encoder-decoder network, the comparison target is the original data without noise, and the error between the network output and the original data is iteratively minimized by backpropagation, which finally gives the parameters of the encoder. The fine-tuning stage is carried out after all layers have been pre-trained. It is a supervised learning process whose method is exactly the same as the backpropagation process of a classical BP neural network, and it is used to make the final fine adjustment to the parameters of each layer.
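The pre-training of a single denoising auto-encoder layer can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions: the units are linear rather than sigmoid, the noise is random masking rather than noise drawn from a noise knowledge base, and the hyperparameters and function names are chosen only for the example. The key point it shows is that the network sees the noisy input while the error is measured against the clean input.

```python
import random


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def corrupt(x, rate, rng):
    """Masking noise: zero each component with probability `rate`."""
    return [0.0 if rng.random() < rate else xi for xi in x]


def reconstruction_loss(w_enc, w_dec, data):
    """Mean squared reconstruction error on clean inputs."""
    total = 0.0
    for x in data:
        y = matvec(w_dec, matvec(w_enc, x))
        total += 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)


def pretrain_layer(data, hidden, epochs=400, lr=0.05, rate=0.25, seed=0):
    """Train one encoder-decoder pair; return (encoder, loss_before, loss_after)."""
    rng = random.Random(seed)
    dim = len(data[0])
    w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(dim)]
    before = reconstruction_loss(w_enc, w_dec, data)
    for _ in range(epochs):
        for x in data:
            noisy = corrupt(x, rate, rng)
            h = matvec(w_enc, noisy)                    # encode the noisy input
            y = matvec(w_dec, h)                        # decode
            err = [yi - xi for yi, xi in zip(y, x)]     # compare with the CLEAN input
            # Backpropagate the error to the hidden layer before updating weights.
            grad_h = [sum(w_dec[i][j] * err[i] for i in range(dim))
                      for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    w_dec[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                for k in range(dim):
                    w_enc[j][k] -= lr * grad_h[j] * noisy[k]
    return w_enc, before, reconstruction_loss(w_enc, w_dec, data)
```

In a stacked setup, the trained encoder's outputs would become the training data for the next layer, and the decoder would be discarded before fine-tuning.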
103. Identify whether the target file is a malicious file according to the file content malicious probability of the target file and the file source malicious probability of the target file.
The file source malicious probability of the target file is determined according to the source information of the target file, and represents the probability that the source of the target file is a malicious source. The source information of the target file includes the URL (Uniform Resource Locator) it came from, the IP (Internet Protocol) address, the e-mail sender and so on. The embodiments of the present invention impose no specific limitation.
It should be noted that the file content malicious probability output by the preset classifier is obtained by considering only the content of the target file. Although in theory whether a file is malicious is determined entirely by its content, in practice it is difficult to achieve high-precision identification from content alone, so environmental factors are needed as an aid, for example the credibility of the source website or of the source e-mail sender. Fusing these environmental factors works very well in practice. Therefore, the embodiments of the present invention identify the target file according to both its file content malicious probability and its file source malicious probability, which can improve the identification accuracy of malicious files.
In the embodiments of the present invention, whether the target file is a malicious file can specifically be identified with the Bayesian method, that is, the file content malicious probability and the file source malicious probability of the target file are fused according to the Bayesian formula to obtain the result of whether the target file is a malicious file. The basic logic is based on Bayes' theorem:
$$P(m \mid s) = \frac{P(s \mid m)\,P(m)}{P(s)}$$
where P(m | s) is the probability, in a specific environment, that the target file is a malicious file when the file content malicious probability output by the preset classifier is s; P(s | m) is the probability that the preset classifier outputs s when the target file is a malicious file; P(m) is the probability, in the specific environment, that the source of the target file is a malicious source, that is, the file source malicious probability of the target file; and P(s) is the probability, in the specific environment, that the file content malicious probability output by the preset classifier for a target file equals s.
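The Bayesian fusion described above, P(m | s) = P(s | m) P(m) / P(s), can be sketched as a small function. The text does not say how P(s) is estimated; the sketch assumes it is expanded with the law of total probability, which requires an extra quantity, the probability that the classifier outputs s for a benign file. That parameter and the function name are assumptions made for the example.

```python
def malicious_posterior(p_s_given_malicious, p_malicious, p_s_given_benign):
    """P(m | s) by Bayes' theorem.

    p_s_given_malicious: P(s | m), likelihood of classifier output s for malicious files
    p_malicious:         P(m), the file source malicious probability used as the prior
    p_s_given_benign:    assumed extra input, used only to expand P(s)
    """
    # Assumption: P(s) = P(s | m) P(m) + P(s | not m) (1 - P(m))
    evidence = (p_s_given_malicious * p_malicious
                + p_s_given_benign * (1.0 - p_malicious))
    if evidence == 0.0:
        raise ValueError("P(s) is zero; the posterior is undefined")
    return p_s_given_malicious * p_malicious / evidence
```

For example, with P(s | m) = 0.8, a source prior of 0.1 and an assumed P(s | benign) = 0.2, the posterior is 0.08 / 0.26, roughly 0.31, and the same classifier output from a source with prior 0.5 yields 0.8. The final decision would compare the posterior against a threshold, which the text does not specify.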
The embodiment of the present invention provides a method for identifying malicious files. The method first obtains the dynamic feature vector and static feature vector of the target file, then inputs them into the preset classifier to obtain the file content malicious probability of the target file, and finally identifies whether the target file is a malicious file according to the file content malicious probability and the file source malicious probability of the target file. Because the file content malicious probability output by the preset classifier is obtained without considering the source of the target file, its information is incomplete. The embodiment of the present invention therefore treats the file content malicious probability only as an intermediate result, and identifies whether the target file is a malicious file from this intermediate result together with the file source malicious probability of the target file, which improves the identification accuracy of malicious files.
In order to better explain the method for identifying malicious files provided by the embodiments of the present invention, the following embodiments refine and extend each of the above steps.
In the embodiments of the present invention, the file source malicious probability of the target file is obtained as follows: obtain the source information of the target file; then determine the file source malicious probability of the target file by matching the source information of the target file against the malicious source data in a preset malicious source database.
The preset malicious source database stores the source environmental factors of target files, namely the probability that a file is malicious given a certain specific condition. These specific conditions include, but are not limited to: IP information, URL information and e-mail sender information. The preset malicious source database is generated by an independent external system, and can be obtained by building it in-house, by purchasing a commercial malicious source database, or by participating in the sharing schemes of security alliances. The embodiments of the present invention impose no specific limitation. When the environmental information of a target file matches entries of several types in the preset malicious source database, the matched probabilities need to be combined. For example, if the source IP and the sender of a target file both match the preset malicious source database with output probabilities a and b respectively, the database needs to combine the two probabilities into one output, that is, the file source malicious probability of the target file is 1 - (1 - a)(1 - b).
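The combination rule 1 - (1 - a)(1 - b) generalizes to any number of matched entries: the source is treated as benign only if every matched indicator is benign. A minimal sketch follows; the dictionary layout keyed by (type, value) pairs and the function names are assumptions for illustration, and the addresses are reserved documentation examples.

```python
def combine_source_probabilities(probabilities):
    """Generalize 1 - (1 - a)(1 - b) to any number of matched entries."""
    benign = 1.0
    for p in probabilities:
        benign *= (1.0 - p)
    return 1.0 - benign


def source_malice_probability(source_info, malicious_source_db):
    """Look up each source attribute and combine the matched probabilities.

    source_info:         e.g. {"ip": "...", "url": "...", "sender": "..."}
    malicious_source_db: hypothetical mapping {(type, value): probability}
    """
    matched = [malicious_source_db[(kind, value)]
               for kind, value in source_info.items()
               if (kind, value) in malicious_source_db]
    return combine_source_probabilities(matched)
```

With two matches of 0.5 each the combined probability is 0.75. Note that this sketch returns 0.0 when nothing matches; a deployed system would likely substitute a small baseline prior so that the Bayesian step does not collapse to zero.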
It should be noted that a considerable share of malicious files use some hosts only as a springboard to penetrate other hosts, and show their real malicious behavior only on the latter. This springboard attack pattern is very common in modern advanced persistent threats (APT, Advanced Persistent Threat). Existing sandbox systems emphasize only the emulation of the host environment of the target file and ignore the emulation of its network environment, which leaves them unable to identify this kind of malware.
To solve this problem, the embodiments of the present invention obtain the dynamic feature vector of the target file through a network sandbox system: the target file is placed into the network sandbox system for execution, and the behavior log of the target file is obtained; the network sandbox system consists of a virtual switched network formed by a group of virtual machines; and the dynamic feature vector of the target file is obtained from the behavior log.
The network sandbox system in the embodiment of the present invention closely simulates a real network environment. The network sandbox system is composed of a virtual switched network formed by a group of virtual machines. Common enterprise-level services and systems (such as Windows update server, Oracle, Exchange, etc.) are deployed on different virtual machines in this network, and an information collection agent runs on each virtual machine in the network sandbox system. When a target file on its host virtual machine penetrates another virtual machine in the sandbox (for example through a remote overflow attack), the information collection agent running on the latter records its abnormal behavior.
Because the network sandbox system in the embodiment of the present invention replaces the original single-virtual-machine sandbox system with a network formed by n virtual machines, it improves the identification capability but consumes n times the resources. In order to solve this problem, in the training period the embodiment of the present invention uses, for each sandbox, a clean environment composed of n virtual machines, so as to improve the identification accuracy for malicious files; in the identification period, the n virtual machines process n files at the same time, for example one file per virtual machine, so as to improve the identification efficiency for malicious files. It should be noted that, because most target files in practice are not malicious files, processing n target files simultaneously on the n virtual machines improves identification efficiency. If a malicious file is found among the n virtual machines, the system processes the n target files again, one at a time, in a clean environment to determine which of them is actually the malicious file; if no malicious file is found among the n virtual machines, the n virtual machines continue with the next batch of target files.
For example, if there are 20 target files that need to be checked for malicious content and the network sandbox system includes 10 virtual machines, the first 10 target files are distributed evenly over the 10 virtual machines and executed. If no problem is found, the remaining 10 target files are distributed evenly over the 10 virtual machines and executed. If a problematic target file is found at this point, the latter 10 target files are processed again in a clean environment to determine which of them is actually the malicious file.
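The batch-then-recheck procedure in the example above can be sketched as follows. This is only an illustration under stated assumptions: `run_shared` and `run_clean` are hypothetical callbacks standing in for "execute a batch of up to n files on the shared n-machine network and report whether any anomaly was recorded" and "execute a single file in a clean n-machine environment"; neither name comes from the original text.

```python
def identify_malicious(files, n, run_shared, run_clean):
    """Return the files judged malicious.

    files      -- list of target files to examine
    n          -- number of virtual machines in the network sandbox
    run_shared -- hypothetical callback: batch -> True if any anomaly was seen
    run_clean  -- hypothetical callback: file -> True if the file is malicious
    """
    malicious = []
    for start in range(0, len(files), n):
        batch = files[start:start + n]  # one file per virtual machine
        if run_shared(batch):
            # An anomaly was recorded somewhere in the shared network, so
            # each file of this batch is re-run alone in a clean environment.
            malicious.extend(f for f in batch if run_clean(f))
    return malicious
```

With 20 files, 10 virtual machines and one malicious file in the second batch, only the 10 files of the second batch are re-run in a clean environment.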
Because existing abnormal patterns are trained from real malicious file samples and lack a generalization process for malicious files, the ability to identify variants of malicious files is limited. In order to solve this problem, the embodiment of the present invention adds specific noise while training the preset classifier, so that the classifier identifies variants well; that is, the preset classifier is trained with malicious file samples and the added noise.
Specifically, the embodiment of the present invention trains the preset classifier with the SDA algorithm. An important reason for choosing SDA is that noise can be added artificially in the pre-training stage to improve the noise resistance of the system, and this ability is very effective for identifying variants and evasion techniques of malware. The denoising process is as follows: from an input vector x, a noisy vector x̃ is generated by a noise-adding process; z is generated from x̃ by an encoder and a decoder; and the error function is defined as the difference between x and z. Minimizing the error function iteratively gives the encoder its noise resistance.
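The denoising process described above can be written compactly as follows. This is a sketch in common denoising-autoencoder notation: the symbol \tilde{x} for the noisy copy of x, q for the noise-adding process, f and g for the encoder and decoder, \theta and \theta' for their parameters, and the choice of squared error for L are all assumptions made for illustration; the text only states that the error function measures the difference between x and z.

```latex
\tilde{x} \sim q(\tilde{x} \mid x), \qquad
z = g_{\theta'}\bigl(f_{\theta}(\tilde{x})\bigr), \qquad
L(x, z) = \lVert x - z \rVert^{2}, \qquad
(\theta^{*}, \theta'^{*}) = \arg\min_{\theta,\,\theta'} \sum_{x} L(x, z)
```

Because z is reconstructed from the noisy input but compared with the clean input x, the encoder is pushed toward representations that do not change under the added noise.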
In the embodiment of the present invention, x is the direct input of the preset classifier, i.e. the feature vector extracted from the static and dynamic features of the target file, and x̃ is the noisy feature vector produced in a specific way. The noise is generated by a noise factor and is added at three points: before the malicious file sample enters the network sandbox system, before static feature extraction, and after the network sandbox system.
As shown in Fig. 2, the specific process of obtaining the noise feature vector x̃ of a malicious file sample is: putting the malicious file sample, together with first noise added according to a preset noise knowledge base, into the network sandbox system for execution to obtain the behavior log of the malicious file sample; obtaining the dynamic feature vector of the malicious file sample from the behavior log of the malicious file sample and second noise added according to the preset noise knowledge base; obtaining the static feature vector of the malicious file sample from the malicious file sample and third noise added according to the preset noise knowledge base; and obtaining the noise feature vector x̃ of the malicious file sample from the static feature vector and the dynamic feature vector of the malicious file sample.
It should be noted that the noise factor adds noise according to predefined rules in the preset noise knowledge base. The rules are a group of actions formulated from the experience of malicious file analysts; they are actions that produce different feature results but do not change the malicious nature of the software. Some simple rules are, for example: "malware is still a virus after being compressed or packed", "a malware sandbox log into which logs of normal software have been inserted is still a malware log", and so on; the embodiment of the present invention does not specifically limit this. These rules do not need to be absolutely correct; they only need to be correct with high probability, and the small amount of error produced by the rules can be eliminated by the subsequent neural network. The function of the noise factor is to select one or more rules from the knowledge base and apply them to the target data. The noise system works only in the training period of the system and plays no role in the operation period of the classifier.
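The role of the noise factor, selecting one or more rules from the knowledge base and applying them to the target data, can be illustrated with the sketch below. Everything in it is hypothetical: the two rules are stand-ins modeled on the two quoted examples (compression of the sample, insertion of benign log lines), `zlib.compress` merely stands in for "compression or packing", and the names `KNOWLEDGE_BASE`, `BENIGN_LOG_LINES` and `apply_noise` are not from the original text.

```python
import random
import zlib

# Hypothetical log lines produced by normal software.
BENIGN_LOG_LINES = ["open notepad.exe", "read settings.ini"]

def compress_rule(sample, rng):
    """Stand-in for 'compressed or packed malware is still malware'."""
    return zlib.compress(sample)

def insert_benign_log_rule(log, rng):
    """Insert one benign line at a random position of a behavior log."""
    noisy = list(log)  # copy, so the original log is left untouched
    noisy.insert(rng.randrange(len(noisy) + 1), rng.choice(BENIGN_LOG_LINES))
    return noisy

# Hypothetical knowledge base: rules grouped by the kind of data they act on.
KNOWLEDGE_BASE = {
    "sample": [compress_rule],
    "behavior_log": [insert_benign_log_rule],
}

def apply_noise(data, kind, rng, max_rules=1):
    """Select up to max_rules rules for this kind of data and apply them."""
    rules = KNOWLEDGE_BASE[kind]
    for rule in rng.sample(rules, k=min(max_rules, len(rules))):
        data = rule(data, rng)
    return data
```

Calling `apply_noise` several times with different random states on the same malicious sample yields several noisy variants of it, and this step would be used only while training.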
For the embodiment of the present invention, after the noise feature vectors corresponding to the file samples are obtained, the preset classifier is trained with the feature vectors of the malicious file samples without added noise and the noise feature vectors of the malicious file samples with added noise. The feature vector of a malicious file sample without added noise is obtained in the same way as the noise feature vector of a malicious file sample with added noise, and the process is not repeated here. After the whole training stage is completed, the parameters of each layer of the classifier are determined and the preset classifier can enter the operation stage, in which classification and identification are performed by the preset classifier. The operation stage is a typical neural network forward propagation process, and the file content malicious probability of the target file is finally output through a sigmoid function.
It should be explained that noise is added to the malicious file samples, and one malicious file sample can generate several feature vectors with noise. The reasons are as follows. First, samples of malicious files are harder to obtain than samples of normal files, and this practice helps to alleviate the problem of sample imbalance. Second, compared with malicious files, normal files usually show no evasion behavior, so adding noise to them does not noticeably improve the final identification result. Therefore, by introducing the noise system, the embodiment of the present invention greatly improves the generalization ability of the preset classifier in identifying malicious files, and significantly improves the identification of both variants and evasion.
It should be explained in detail that determining, according to the file content malicious probability and the file source malicious probability of the target file, whether the target file is a malicious file includes: substituting the probability of the classifier output given that the target file is a malicious file, the file content malicious probability of the target file and the file source malicious probability of the target file into the Bayesian formula to calculate the malicious file probability of the target file; and determining, according to the malicious file probability of the target file, whether the target file is a malicious file. That is, the basic logic of the embodiment of the present invention is based on Bayes' theorem:

P(m|s) = P(s|m) P(m) / P(s)

which gives the malicious file probability of the target file. Here P(m|s) is the probability that the result is a malicious file when, in a specific environment, the file content malicious probability output by the preset classifier is s; P(s|m) is the probability that the preset classifier outputs s when the target file is a malicious file; P(m) is the probability that, in the specific environment, the source of the target file is a malicious source, i.e. the file source malicious probability of the target file; and P(s) is the probability that, in the specific environment, the file content malicious probability output by the preset classifier for the target file has the value s. Expanding P(s) over the two cases gives:

P(m|s) = P(s|m) P(m) / (P(s|m) P(m) + P(s|b) P(b))

Further, since P(b) = 1 - P(m):

P(m|s) = P(s|m) P(m) / (P(s|m) P(m) + P(s|b) (1 - P(m)))
Wherein, b denotes a non-malicious file. P(m) in the above formula is the file source malicious probability of the target file output by the preset malicious source library. To obtain P(m|s) it is also necessary to know P(s|m) and P(s|b); these two probabilities are estimated by a probability density estimation method, which may specifically be a technique such as the histogram method or the kernel method. P(m|s) is thus obtained as the malicious file probability of the target file. By fusing the file content probability obtained by the preset classifier with the probability derived from environmental factors, the embodiment of the present invention improves the identification accuracy for malicious files.
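The fusion step can be sketched as follows, with P(s|m) and P(s|b) estimated by the histogram method from classifier scores observed on labeled samples. This is a minimal sketch under stated assumptions: the function names, the number of bins and the add-one smoothing (used here so that no bin has zero density) are illustrative choices and are not specified in the original text.

```python
def histogram_density(scores, bins=10):
    """Estimate a density over [0, 1] from classifier scores by histogram."""
    counts = [1] * bins  # add-one smoothing (assumption): no empty bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    total = sum(counts)
    width = 1.0 / bins

    def density(s):
        return counts[min(int(s * bins), bins - 1)] / (total * width)

    return density

def posterior_malicious(s, p_m, density_m, density_b):
    """P(m|s) = P(s|m)P(m) / (P(s|m)P(m) + P(s|b)(1 - P(m))).

    s         -- file content malicious probability output by the classifier
    p_m       -- file source malicious probability from the source library
    density_m -- estimated P(s|m), from scores of known malicious files
    density_b -- estimated P(s|b), from scores of known non-malicious files
    """
    numerator = density_m(s) * p_m
    return numerator / (numerator + density_b(s) * (1.0 - p_m))
```

When the two densities coincide at s, the classifier output carries no information and the result equals the source probability P(m); the more the densities differ, the further the classifier output moves the result away from P(m).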
For the embodiment of the present invention, an applicable scenario is shown in Fig. 3, although the embodiment is not limited to it. The input information in this application scenario is divided into two parts: one part is the target file, i.e. the binary form of the target file, and the other part is the source information of the target file, including its source URL, IP, e-mail sender and so on. The analysis of the target file is in turn divided into dynamic analysis and static analysis. Dynamic analysis uses the virtual execution capability of the network sandbox system to analyze the behavior of the target file at run time and obtains the dynamic feature vector of the target file from the analysis result, while static analysis extracts the static feature vector directly from the binary data of the target file. The feature vectors extracted by dynamic analysis and static analysis are input to the preset classifier for classification to obtain the file content malicious probability of the target file; the file source malicious probability of the target file is determined by matching the source information of the target file against the malicious source data in the preset malicious source library; finally, the file content malicious probability and the file source malicious probability of the target file are combined through the Bayesian formula to calculate the final malicious probability of the target file.
In this application scenario, the target file is analyzed with both dynamic analysis and static analysis, mainly because the two methods are complementary to a certain degree: some malicious programs that are obfuscated or packed evade static inspection more easily, while some malicious programs with long dormancy and strict trigger conditions evade dynamic analysis more easily, and the combination of the two methods gives a better inspection result in practice. In addition, the probability output by the preset classifier is obtained without considering the source of the file, so its information is not sufficient, and in this application scenario it is used only as an intermediate result. Whether the target file is a malicious file is determined from this intermediate result and the probability provided by the preset malicious source library, thereby improving the identification accuracy for malicious files.
Further, the embodiment of the present invention provides a device for identifying malicious files. As shown in Fig. 4, the device includes: an acquiring unit 21, a computing unit 22 and a recognition unit 23.
The acquiring unit 21 is configured to obtain the dynamic feature vector and the static feature vector of a target file;
Wherein, the acquiring unit 21 obtains the dynamic feature vector and the static feature vector from the target file in binary form. The analysis of the target file is divided into dynamic analysis and static analysis. Dynamic analysis uses the virtual execution capability of a sandbox system to analyze the behavior of the target file at run time and obtains the dynamic feature vector of the target file from the analysis result, while static analysis extracts features directly from the binary data of the target file, analyzes them, and obtains the static feature vector of the target file from them.
For the embodiment of the present invention, the acquiring unit 21 obtains the dynamic feature vector and the static feature vector of the target file. The target file is analyzed with both dynamic analysis and static analysis, mainly because the two methods are complementary to a certain degree: some malicious files that are obfuscated or packed evade static inspection more easily, while some malicious files with long dormancy and strict trigger conditions evade dynamic analysis more easily, and the combination of the two methods gives a better inspection result in practice.
It should be noted that a sandbox system is a system with virtual execution capability that can collect and analyze related behavior information. A sandbox system is usually implemented on a group of managed virtual machines: the target file is imported automatically into the virtual machine environment and executed, or opened by a corresponding program (such as an Office program), and an information collection agent running inside the virtual machine records and outputs the run-time behavior of the target program. In the embodiment of the present invention, after the target file is input to the sandbox system, the behavior record of the target file is output, generally in the form of log information. The log information recorded and output in the embodiment of the present invention includes, but is not limited to: network access information, information on calls to other application programs, access information for the file system, access information for the system registry, system call information, memory access information and all other information of the virtual machine; the embodiment of the present invention does not specifically limit this.
For the embodiment of the present invention, static features are features extracted directly from the binary target file and output in the form of a feature vector. When static features are extracted from the target file, the features fall into two classes: binary features and disassembly features. The binary features include, but are not limited to: sequence features (N-gram), file metadata, the entropy of the file, an image representation of the file, the length distribution of the character strings in the file, and so on. The disassembly-based features include, but are not limited to: metadata information, symbol information, operator information, register information, API usage information, segment structure information, data definition information, and so on; the embodiment of the present invention does not specifically limit this.
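Two of the binary features listed above, byte N-grams and the information entropy of the file, can be computed directly from the raw bytes. The sketch below is only an illustration: the function names and the default n = 2 are assumptions, and the original text does not specify how these features are encoded into the static feature vector.

```python
import math
from collections import Counter

def byte_ngrams(data, n=2):
    """Count every run of n consecutive bytes in the file content."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def shannon_entropy(data):
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(data).values())
```

A file made of a single repeated byte has entropy 0, while compressed or encrypted content approaches 8 bits per byte, which is one reason entropy is a common indicator of packing.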
The computing unit 22 is configured to input the dynamic feature vector and the static feature vector of the target file into a preset classifier and calculate the file content malicious probability of the target file;
Through the preset classifier, the computing unit 22 first combines the dynamic feature vector and the static feature vector of the target file into a feature vector of higher dimension, which is then input to a deep neural network for classification to obtain the file content malicious probability of the target file. The file content malicious probability represents the probability that the target file contains malicious content. The preset classifier is trained with SDA (Stacked Denoising Autoencoder, in which every layer is a denoising autoencoder). SDA is a kind of deep neural network; its network structure is a multi-layer auto-encoder neural network followed by a multi-layer fully connected network with dropout, and the final output layer outputs the file content malicious probability of the target file through a sigmoid function.
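The operation-stage computation, concatenating the two feature vectors and propagating them forward to a sigmoid output, can be sketched in a few lines. This is not the trained SDA network: the weights are supplied by the caller, the use of sigmoid in the hidden layers is an assumption, and dropout is omitted because it is normally inactive during forward propagation at inference time. The names `dense` and `classify` are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense(inputs, weights, biases):
    """One fully connected layer with sigmoid activation.

    weights holds one row of input weights per output neuron.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def classify(dynamic_vec, static_vec, layers):
    """Concatenate both feature vectors and run a forward pass.

    layers is a list of (weights, biases) pairs; the last layer must have a
    single neuron, whose output is the file content malicious probability.
    """
    hidden = list(dynamic_vec) + list(static_vec)
    for weights, biases in layers:
        hidden = dense(hidden, weights, biases)
    return hidden[0]
```

Because the last layer applies a sigmoid, the returned value always lies strictly between 0 and 1 and can be read as a probability.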
The recognition unit 23 is configured to identify whether the target file is a malicious file according to the file content malicious probability of the target file and the file source malicious probability of the target file, the file source malicious probability of the target file being determined according to the source information of the target file.
Wherein, the file source malicious probability of the target file is determined according to the source information of the target file, and represents the probability that the source of the target file is a malicious source. The source information of the target file includes its source URL (Uniform Resource Locator), IP (Internet Protocol address), e-mail sender and so on; the embodiment of the present invention does not specifically limit this.
It should be noted that the file content malicious probability of the target file output by the preset classifier is obtained by considering only the content of the target file. Although in theory whether a file is malicious is determined entirely by its content, in practice it is difficult to achieve high-precision identification based on the content alone, and environmental factors are needed as an aid, for example the credibility of the source website and the credibility of the source e-mail sender; fusing these environmental factors works very well in practice. Therefore, the embodiment of the present invention identifies the target file according to both the file content malicious probability and the file source malicious probability of the target file, which improves the identification accuracy for malicious files.
For the embodiment of the present invention, whether the target file is a malicious file may specifically be identified by the Bayesian method, i.e. the file content malicious probability and the file source malicious probability of the target file are fused according to the Bayesian formula to obtain the result of whether the target file is a malicious file. The basic logic is based on Bayes' theorem:

P(m|s) = P(s|m) P(m) / P(s)

Wherein, P(m|s) is the probability that the result is a malicious file when, in a specific environment, the file content malicious probability output by the preset classifier is s; P(s|m) is the probability that the preset classifier outputs s when the target file is a malicious file; P(m) is the probability that, in the specific environment, the source of the target file is a malicious source, i.e. the file source malicious probability of the target file; and P(s) is the probability that, in the specific environment, the file content malicious probability output by the preset classifier for the target file has the value s.
The embodiment of the present invention provides a device for identifying malicious files. The device first obtains the dynamic feature vector and the static feature vector of the target file, then inputs them into the preset classifier to obtain the file content malicious probability of the target file, and finally identifies whether the target file is a malicious file according to the file content malicious probability and the file source malicious probability of the target file. Because the file content malicious probability output by the preset classifier is obtained without considering the source of the target file, its information is not sufficient. The embodiment of the present invention therefore uses the file content malicious probability only as an intermediate result, and identifies whether the target file is a malicious file from this intermediate result and the file source malicious probability of the target file, thereby improving the identification accuracy for malicious files.
Further, as shown in Fig. 5, the device also includes:
The acquiring unit 21 is further configured to obtain the source information of the target file;
A determining unit 24, configured to determine the file source malicious probability of the target file by matching the source information of the target file against the malicious source data in the preset malicious source library.
Wherein, the preset malicious source library stores the environmental factors relating to the source of a target file, that is, the probability that a file is a malicious file under a specific condition. These conditions include, but are not limited to: IP information, URL information and e-mail sender information. The preset malicious source library is generated by an independent external system; it may be self-built, purchased as a commercial malicious source library, or obtained by participating in the sharing arrangements of a security alliance, and the embodiment of the present invention does not specifically limit this. When the environmental information of a target file matches entries of several types in the preset malicious source library, the matched probabilities need to be combined. For example, if the source IP and the sender of a target file both match the preset malicious source library, and the output probabilities are a and b respectively, the preset malicious source library combines the two probabilities into one output, i.e. the file source malicious probability of the target file is 1 - (1 - a)(1 - b).
It should be noted that a considerable proportion of malicious files use some hosts only as a springboard to penetrate other hosts, and show their real malicious behavior only on the latter. This springboard attack pattern is very common in modern advanced persistent threats (APT, Advanced Persistent Threat). Existing sandbox systems emphasize only the emulation of the host environment of the target file and neglect the emulation of its network environment, so that such systems lack the ability to identify this kind of malware.
In order to solve this problem, the embodiment of the present invention obtains the dynamic feature vector of the target file through a network sandbox system. As shown in Fig. 5, the acquiring unit 21 includes: an executing module 211, configured to put the target file into the network sandbox system for execution and obtain the behavior log of the target file, the network sandbox system being composed of a virtual switched network formed by a group of virtual machines; and an obtaining module 212, configured to obtain the dynamic feature vector of the target file from the behavior log.
The network sandbox system in the embodiment of the present invention closely simulates a real network environment. The network sandbox system is composed of a virtual switched network formed by a group of virtual machines. Common enterprise-level services and systems (such as Windows update server, Oracle, Exchange, etc.) are deployed on different virtual machines in this network, and an information collection agent runs on each virtual machine in the network sandbox system. When a target file on its host virtual machine penetrates another virtual machine in the sandbox (for example through a remote overflow attack), the information collection agent running on the latter records its abnormal behavior.
Because the network sandbox system in the embodiment of the present invention replaces the original single-virtual-machine sandbox system with a network formed by n virtual machines, it improves the identification capability but consumes n times the resources. In order to solve this problem, in the training period the embodiment of the present invention uses, for each sandbox, a clean environment composed of n virtual machines, so as to improve the identification accuracy for malicious files; in the identification period, the n virtual machines process n files at the same time, for example one file per virtual machine, so as to improve the identification efficiency for malicious files. It should be noted that, because most target files in practice are not malicious files, processing n target files simultaneously on the n virtual machines improves identification efficiency. If a malicious file is found among the n virtual machines, the system processes the n target files again, one at a time, in a clean environment to determine which of them is actually the malicious file; if no malicious file is found among the n virtual machines, the n virtual machines continue with the next batch of target files.
Because existing abnormal patterns are trained from real malicious file samples and lack a generalization process for malicious files, the ability to identify variants of malicious files is limited. In order to solve this problem, the embodiment of the present invention adds specific noise while training the preset classifier, so that the classifier identifies variants well; that is, the preset classifier is trained by a training unit 25. The training unit 25 is configured to train the preset classifier with malicious file samples and the added noise.
Specifically, as shown in Fig. 5, the training unit 25 includes:
An obtaining module 251, configured to put the malicious file sample, together with first noise added according to a preset noise knowledge base, into the network sandbox system for execution and obtain the behavior log of the malicious file sample;
The obtaining module 251 is further configured to obtain the dynamic feature vector of the malicious file sample from the behavior log of the malicious file sample and second noise added according to the preset noise knowledge base;
The obtaining module 251 is further configured to obtain the static feature vector of the malicious file sample from the malicious file sample and third noise added according to the preset noise knowledge base;
The obtaining module 251 is further configured to obtain the noise feature vector of the malicious file sample from the static feature vector and the dynamic feature vector of the malicious file sample;
A training module 252, configured to train the preset classifier with the feature vectors of the malicious file samples without added noise and the noise feature vectors of the malicious file samples with added noise.
It should be noted that the noise factor adds noise according to predefined rules in the preset noise knowledge base. The rules are a group of actions formulated from the experience of malicious file analysts; they are actions that produce different feature results but do not change the malicious nature of the software. Some simple rules are, for example: "malware is still a virus after being compressed or packed", "a malware sandbox log into which logs of normal software have been inserted is still a malware log", and so on; the embodiment of the present invention does not specifically limit this. These rules do not need to be absolutely correct; they only need to be correct with high probability, and the small amount of error produced by the rules can be eliminated by the subsequent neural network. The function of the noise factor is to select one or more rules from the knowledge base and apply them to the target data. The noise system works only in the training period of the system and plays no role in the operation period of the classifier.
For the embodiment of the present invention, after the noise feature vectors corresponding to the file samples are obtained, the preset classifier is trained with the feature vectors of the malicious file samples without added noise and the noise feature vectors of the malicious file samples with added noise. The feature vector of a malicious file sample without added noise is obtained in the same way as the noise feature vector of a malicious file sample with added noise, and the process is not repeated here. After the whole training stage is completed, the parameters of each layer of the classifier are determined and the preset classifier can enter the operation stage, in which classification and identification are performed by the preset classifier. The operation stage is a typical neural network forward propagation process, and the file content malicious probability of the target file is finally output through a sigmoid function.
Specifically, as shown in Fig. 5, the determining unit 24 includes:
A computing module 241, configured to substitute the probability of the classifier output given that the target file is a malicious file, the file content malicious probability of the target file and the file source malicious probability of the target file into the Bayesian formula to calculate the malicious file probability of the target file;
A determining module 242, configured to determine, according to the malicious file probability of the target file, whether the target file is a malicious file.
It should be explained in detail that determining, according to the file content malicious probability and the file source malicious probability of the target file, whether the target file is a malicious file includes: substituting the probability of the classifier output given that the target file is a malicious file, the file content malicious probability of the target file and the file source malicious probability of the target file into the Bayesian formula to calculate the malicious file probability of the target file; and determining, according to the malicious file probability of the target file, whether the target file is a malicious file. That is, the basic logic of the embodiment of the present invention is based on Bayes' theorem:

P(m|s) = P(s|m) P(m) / P(s)

which gives the malicious file probability of the target file. Here P(m|s) is the probability that the result is a malicious file when, in a specific environment, the file content malicious probability output by the preset classifier is s; P(s|m) is the probability that the preset classifier outputs s when the target file is a malicious file; P(m) is the probability that, in the specific environment, the source of the target file is a malicious source, i.e. the file source malicious probability of the target file; and P(s) is the probability that, in the specific environment, the file content malicious probability output by the preset classifier for the target file has the value s. Expanding P(s) over the two cases gives:

P(m|s) = P(s|m) P(m) / (P(s|m) P(m) + P(s|b) P(b))

Further, since P(b) = 1 - P(m):

P(m|s) = P(s|m) P(m) / (P(s|m) P(m) + P(s|b) (1 - P(m)))
Wherein, b represents non-malicious file.P (m) is the text of the file destination exported according to preset malicious origin storehouse in above formula
Part is originated malice probability, and to expect that P (m | s) also needs to know P (s | m) and P (s | b), the two probability we by generally
Rate density estimation method is estimated that Multilayer networks method is specifically as follows the technologies such as histogram method and kernel method.Enter
And P (m | s) has been obtained as the malicious file probability of target software, so as to merge preset grader by the embodiment of the present invention
The file content probability and environmental factor that obtain and the probability for producing, improve the accuracy of identification of malicious file.
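The Bayesian fusion described above can be sketched as follows. This is a minimal illustration under stated assumptions: P(s|m) and P(s|b) are estimated with an equal-width histogram over [0, 1] with add-one smoothing, P(b) is taken as 1 - P(m), and the sample scores and the function names `histogram_density` and `malicious_file_probability` are hypothetical.

```python
def histogram_density(scores, bins=10):
    """Estimate a density on [0, 1] from classifier scores with equal-width bins.

    Add-one smoothing (an assumption) keeps every bin strictly positive.
    Returns a function mapping a score s to its estimated density.
    """
    counts = [1.0] * bins
    for score in scores:
        counts[min(int(score * bins), bins - 1)] += 1.0
    total = sum(counts)
    width = 1.0 / bins
    densities = [c / (total * width) for c in counts]

    def density(s):
        return densities[min(int(s * bins), bins - 1)]

    return density


def malicious_file_probability(s, p_m, p_s_given_m, p_s_given_b):
    """Bayes' theorem: P(m|s) = P(s|m)P(m) / (P(s|m)P(m) + P(s|b)(1 - P(m)))."""
    numerator = p_s_given_m(s) * p_m
    denominator = numerator + p_s_given_b(s) * (1.0 - p_m)
    return numerator / denominator


# Hypothetical classifier outputs on labelled validation samples.
malicious_scores = [0.9, 0.95, 0.85, 0.7, 0.99]
benign_scores = [0.1, 0.05, 0.2, 0.3, 0.15]
p_s_m = histogram_density(malicious_scores)
p_s_b = histogram_density(benign_scores)

# The same content score s = 0.9 combined with two different source priors P(m).
p_trusted_source = malicious_file_probability(0.9, 0.01, p_s_m, p_s_b)
p_suspect_source = malicious_file_probability(0.9, 0.5, p_s_m, p_s_b)
```

With an identical classifier output, the file from the source with the higher prior P(m) receives the higher fused probability, which is the effect of combining content evidence with source evidence. A kernel density estimate could replace the histogram without changing `malicious_file_probability`.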
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
It can be understood that related features of the above method and device may be cross-referenced. In addition, "first", "second", and so on in the above embodiments are used to distinguish the embodiments and do not indicate that any embodiment is superior to another.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the invention described herein may be implemented in a variety of programming languages, and the description of a specific language above is given to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include some features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the method and device for identifying malicious files according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order. These words may be interpreted as names.
Claims (10)
1. A method for identifying a malicious file, characterized by comprising:
obtaining a dynamic feature vector and a static feature vector of a target file;
inputting the dynamic feature vector and the static feature vector of the target file into a preset classifier, and calculating a file content malicious probability of the target file;
identifying whether the target file is a malicious file according to the file content malicious probability of the target file and a file source malicious probability of the target file, wherein the file source malicious probability of the target file is determined according to source information of the target file.
2. The method according to claim 1, characterized in that, before determining whether the target file is a malicious file according to the file content malicious probability of the target file and the file source malicious probability of the target file, the method further comprises:
obtaining the source information of the target file;
determining the file source malicious probability of the target file by matching the source information of the target file against malicious source data in a preset malicious source library.
3. The method according to claim 1, characterized in that obtaining the dynamic feature vector of the target file comprises:
placing the target file into a network sandbox system for execution, and obtaining a behavior log of the target file, wherein the network sandbox system consists of a virtual switched network formed by a group of virtual machines;
obtaining the dynamic feature vector of the target file from the behavior log.
4. The method according to claim 1, characterized in that the method further comprises:
training the preset classifier with malicious file samples and added noise.
5. The method according to claim 4, characterized in that training the preset classifier with malicious file samples and added noise comprises:
placing the malicious file sample, together with a first noise added according to a preset noise knowledge base, into a network sandbox system for execution, and obtaining a behavior log of the malicious file sample;
obtaining a dynamic feature vector of the malicious file sample from the behavior log of the malicious file sample and a second noise added according to the preset noise knowledge base;
obtaining a static feature vector of the malicious file sample from the malicious file sample and a third noise added according to the preset noise knowledge base;
obtaining a noise feature vector of the malicious file sample according to the static feature vector and the dynamic feature vector of the malicious file sample;
training the preset classifier with the feature vectors corresponding to malicious file samples without added noise and the noise feature vectors corresponding to malicious file samples with added noise.
6. The method according to claim 1, characterized in that determining whether the target file is a malicious file according to the file content malicious probability and the file source malicious probability of the target file comprises:
substituting the file content malicious probability of the target file conditional on the target file being a malicious file, the file content malicious probability of the target file, and the file source malicious probability of the target file into Bayes' formula to calculate a malicious file probability of the target file;
determining whether the target file is a malicious file according to the malicious file probability of the target file.
7. A device for identifying a malicious file, characterized by comprising:
an acquiring unit, configured to obtain a dynamic feature vector and a static feature vector of a target file;
a computing unit, configured to input the dynamic feature vector and the static feature vector of the target file into a preset classifier to obtain a file content malicious probability of the target file;
a recognition unit, configured to identify whether the target file is a malicious file according to the file content malicious probability of the target file and a file source malicious probability of the target file, wherein the file source malicious probability of the target file is determined according to source information of the target file.
8. The device according to claim 7, characterized in that:
the acquiring unit is further configured to obtain the source information of the target file;
the device further comprises a determining unit, configured to determine the file source malicious probability of the target file by matching the source information of the target file against malicious source data in a preset malicious source library.
9. The device according to claim 7, characterized in that the acquiring unit comprises:
an execution module, configured to place the target file into a network sandbox system for execution and to obtain a behavior log of the target file, wherein the network sandbox system consists of a virtual switched network formed by a group of virtual machines;
an acquisition module, configured to obtain the dynamic feature vector of the target file from the behavior log.
10. The device according to claim 7, characterized in that the device further comprises:
a training unit, configured to train the preset classifier with malicious file samples and added noise.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611067380.6A CN106778241B (en) | 2016-11-28 | 2016-11-28 | Malicious file identification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611067380.6A CN106778241B (en) | 2016-11-28 | 2016-11-28 | Malicious file identification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106778241A true CN106778241A (en) | 2017-05-31 |
CN106778241B CN106778241B (en) | 2020-12-25 |
Family
ID=58902338
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611067380.6A Active CN106778241B (en) | 2016-11-28 | 2016-11-28 | Malicious file identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106778241B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN106778241B (en) | 2020-12-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||