A malicious script detection method and system based on Ragel state machines
Technical field
The present invention relates to the field of computer network security, and more particularly to a malicious script detection method and system based on Ragel state machines.
Background art
Known malicious scripts are generally written in JavaScript, Visual Basic Script, HTML, Python, Java and similar languages. Such malicious code is often encrypted or obfuscated; once the real script is recovered, it usually performs operations such as downloading Trojans. Malicious scripts differ from traditional viruses: they are easy to mutate, they hide themselves behind diverse obfuscation mechanisms, and they can dynamically create embedded links and encode the linked content.
Ragel is a finite-state machine compiler. It compiles state machines defined by regular expressions (Ragel regular expressions resemble ordinary regular expressions, with minor differences in syntax) into parsers written in conventional languages such as C, C++, D, Java and Ruby. Ragel is not limited to parsing byte streams: it can parse any content that can be expressed as a regular expression, and the generated parsing code is easy to embed in programs written in conventional languages.
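To illustrate what compiling a regular expression into a state machine means in practice, the hand-written C function below is a minimal sketch of the kind of single-pass matcher Ragel generates for an expression such as `any* 'eval' [ \t]* '('`. It is not actual Ragel output; the pattern and the name `dfa_match_eval_call` are hypothetical. Each input byte causes exactly one state transition, and no byte is ever re-read.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written DFA equivalent to the (hypothetical) Ragel expression
 *     main := any* 'eval' [ \t]* '(';
 * Ragel would generate comparable transition code automatically. */
enum { S_START, S_E, S_EV, S_EVA, S_EVAL };

/* Returns 1 if buf[0..size) contains "eval", optional spaces/tabs, then "(". */
int dfa_match_eval_call(const char *buf, size_t size)
{
    int state = S_START;
    assert(buf != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        char c = buf[i];
        /* "eval" has no repeated prefix, so on a mismatch the only way to
         * restart is when the current byte is 'e'. */
        switch (state) {
        case S_START: state = (c == 'e') ? S_E : S_START; break;
        case S_E:     state = (c == 'v') ? S_EV  : (c == 'e' ? S_E : S_START); break;
        case S_EV:    state = (c == 'a') ? S_EVA : (c == 'e' ? S_E : S_START); break;
        case S_EVA:   state = (c == 'l') ? S_EVAL : (c == 'e' ? S_E : S_START); break;
        case S_EVAL:
            if (c == '(') return 1;              /* final state reached */
            if (c == ' ' || c == '\t') break;    /* optional whitespace: stay */
            state = (c == 'e') ? S_E : S_START;
            break;
        }
    }
    return 0;
}
```

A real feature library would union many such expressions into one machine, so the scan still visits each byte once; the growth is in the number of states, not in the number of passes.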
Existing methods for detecting malicious scripts mainly rely either on signature (feature) matching or on machine learning.
Signature-based detection uses code features taken from the program code of a sample library of known malicious scripts to decide whether an unknown script is malicious. The script is parsed, malicious code features are extracted from it, and these features are compared with a curated feature set: if the code features match, the script is judged malicious; otherwise it is an ordinary script. The drawback is that the extracted signature is usually the hash of a particular character string in the script, or a single regular-expression feature. Because scripts are text, matching each feature generally requires a full-text search, so detection takes longer and longer as the number of features grows.
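For contrast, the prior-art signature matching described above can be sketched as follows. This is a minimal illustration, not any particular product's implementation: the name `naive_feature_scan` is hypothetical, and plain substrings stand in for hash or regular-expression signatures. Each feature triggers its own full-text search, so the work grows roughly with the number of features multiplied by the script length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prior-art style scan: every feature is searched over the whole script.
 * Returns the index of the first matching feature, or -1 if none matches.
 * `script` must be NUL-terminated because strstr is used. */
int naive_feature_scan(const char *script, const char *const *features, size_t n_features)
{
    assert(script != NULL && (features != NULL || n_features == 0));
    for (size_t i = 0; i < n_features; i++) {
        if (strstr(script, features[i]) != NULL)   /* one full-text pass per feature */
            return (int)i;
    }
    return -1;
}
```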
The drawback of machine-learning detection is that it must learn from a large number of malicious scripts through continuous iteration, and its false-positive rate is relatively high.
Summary of the invention
The present invention proposes a malicious script detection method and system based on Ragel state machines, which overcomes the low efficiency and high false-positive rate of existing script detection methods and enables fast detection of script files.
The present invention first provides a malicious script detection method based on Ragel state machines, comprising:
Parsing known malicious script files and extracting Ragel regular expressions for the malicious character strings;
Composing the extracted malicious-string Ragel regular expressions into a detection feature library source file;
Compiling the detection feature library source file into the target language of the detection module for script files to be detected;
Integrating the compiled detection feature library source file into the detection module for script files to be detected;
Obtaining and loading a script file to be detected;
Scanning the script file to be detected with the Ragel regular-expression state machine;
Judging whether the script file to be detected matches the malicious-string Ragel regular expressions in the detection feature library source file; if so, the script file is malicious, otherwise it is an ordinary file.
In the method, parsing known malicious script files and extracting the malicious-string Ragel regular expressions specifically comprises: parsing the malicious script files by malicious code family or variant type, removing the non-malicious code sections and retaining the malicious code part; and extracting one or more Ragel regular expressions from the malicious code parts that a family or its variants have in common.
In the method, composing the extracted malicious-string Ragel regular expressions into a detection feature library source file specifically comprises constructing the detection feature library source file in Ragel state machine syntax.
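The following is a minimal sketch of how a library generation step might emit such a source file, under the assumption that each extracted signature is already a valid Ragel expression. The names `signature_t`, `write_ragel_feature_library`, the machine name `script_features` and the host variable `hit` are all hypothetical. The emitted text uses standard Ragel constructs (a `machine` statement, named sub-machines, `any*`, union `|`, finishing actions `@{ ... }` and `fbreak`); the host code that declares `hit` and contains the `write data` / `write exec` statements is assumed to live in the detection module.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one extracted signature: a virus name and an
 * expression already written in Ragel syntax by the parsing step. */
typedef struct {
    const char *virus_name;
    const char *ragel_expr;
} signature_t;

/* Emits a Ragel machine specification that unions all signatures.
 * When signature i matches, its finishing action stores i in the host
 * variable `hit` and leaves the scan loop with fbreak.
 * Returns 0 on success, -1 on invalid input or write error. */
int write_ragel_feature_library(FILE *out, const signature_t *sigs, size_t n)
{
    assert(out != NULL);
    if (sigs == NULL || n == 0)
        return -1;                          /* an empty union is not valid Ragel */
    for (size_t i = 0; i < n; i++)
        if (sigs[i].ragel_expr == NULL || strlen(sigs[i].ragel_expr) == 0)
            return -1;

    fprintf(out, "%%%%{\n    machine script_features;\n\n");   /* prints "%%{" */
    /* One named sub-machine per signature, preceded by a comment with its name. */
    for (size_t i = 0; i < n; i++)
        fprintf(out, "    # %s\n    sig_%zu = %s;\n",
                sigs[i].virus_name ? sigs[i].virus_name : "unnamed", i, sigs[i].ragel_expr);

    /* any* lets a match start anywhere; | unions the signatures into one machine. */
    fprintf(out, "\n    main := any* (\n");
    for (size_t i = 0; i < n; i++)
        fprintf(out, "        %s sig_%zu @{ hit = %zu; fbreak; }\n", i == 0 ? " " : "|", i, i);
    fprintf(out, "    );\n}%%%%\n");                              /* prints "}%%" */
    return ferror(out) ? -1 : 0;
}
```

The resulting `.rl` file is what the compilation step would pass to Ragel to produce, for example, C scanning code.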
In the method, the detection interface of the detection feature library source file takes as input the buffer (buf) and size of the script file to be detected, and outputs the detection result and the virus name.
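A minimal sketch of such an interface in C is shown below, assuming the compiled feature library exposes one combined scan function that returns the index of the matched feature. The names `detect_script`, `compiled_scan` and `feature_names` are hypothetical, and `compiled_scan` is a trivial substring check standing in for Ragel-generated code; only the shape of the interface (buf and size in, result and virus name out) follows the text.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the Ragel-generated scanner: a real one would run the single
 * combined state machine over buf. Returns the index of the matched feature,
 * or -1 if nothing matches. Here it only looks for one literal pattern. */
static int compiled_scan(const char *buf, size_t size)
{
    static const char pat[] = "<iframe width=0";
    size_t m = sizeof pat - 1;
    for (size_t i = 0; i + m <= size; i++) {
        size_t j = 0;
        while (j < m && buf[i + j] == pat[j]) j++;
        if (j == m) return 0;
    }
    return -1;
}

/* Hypothetical virus names, indexed like the features in the library. */
static const char *const feature_names[] = { "HTML.HiddenIframe.Hypo" };

/* Detection interface: input is the script buffer and its size; output is the
 * result (1 = malicious, 0 = ordinary) and, on a hit, the virus name. */
int detect_script(const char *buf, size_t size, const char **virus_name)
{
    assert(virus_name != NULL && (buf != NULL || size == 0));
    int hit = compiled_scan(buf, size);
    if (hit >= 0) {
        *virus_name = feature_names[hit];
        return 1;
    }
    *virus_name = NULL;
    return 0;
}
```

Passing an explicit size rather than relying on a NUL terminator lets the same interface scan binary-safe buffers loaded straight from disk.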
In the method, judging whether the script file to be detected matches the Ragel regular expressions in the detection feature library source file specifically comprises judging whether the script file to be detected matches one or more of those Ragel regular expressions; if it does, it is a match.
The present invention also provides a malicious script detection system based on Ragel state machines, comprising:
A parsing module, configured to parse known malicious script files and extract malicious-string Ragel regular expressions;
A library generation module, configured to compose the extracted malicious-string Ragel regular expressions into a detection feature library source file;
A compilation module, configured to compile the detection feature library source file into the target language of the detection module for script files to be detected;
An integration module, configured to integrate the compiled detection feature library source file into the detection module for script files to be detected;
An acquisition module, configured to obtain and load a script file to be detected;
A detection module, configured to scan the script file to be detected with the Ragel regular-expression state machine;
A judgment module, configured to judge whether the script file to be detected matches the malicious-string Ragel regular expressions in the detection feature library source file; if so, the script file is malicious, otherwise it is an ordinary file.
In the system, parsing known malicious script files and extracting the malicious-string Ragel regular expressions specifically comprises: parsing the malicious script files by malicious code family or variant type, removing the non-malicious code sections and retaining the malicious code part; and extracting one or more Ragel regular expressions from the malicious code parts that a family or its variants have in common.
In the system, composing the extracted malicious-string Ragel regular expressions into a detection feature library source file specifically comprises constructing the detection feature library source file in Ragel state machine syntax.
In the system, the detection interface of the detection feature library source file takes as input the buffer (buf) and size of the script file to be detected, and outputs the detection result and the virus name.
In the system, judging whether the script file to be detected matches the Ragel regular expressions in the detection feature library source file specifically comprises judging whether the script file to be detected matches one or more of those Ragel regular expressions; if it does, it is a match.
The key of the invention is to extract Ragel regular expressions for different families of malicious scripts, build a detection feature library composed of these expressions, and use the Ragel finite-state machine compiler to compile the malicious-script detection feature library into the target language; once the library is integrated, script files can be detected with finite-state machines. The advantages of the invention are as follows. Because detection is based on a regular-expression state machine, there is no need to run a full-text search of the script file for every feature, nor to train clustering models on large numbers of malicious scripts, so detection is much faster than with general signature-matching or machine-learning methods. Moreover, compared with other script detection methods, using the Ragel state machine compiler allows malicious-script detection source files to be generated in different target languages, which can be integrated into script detection modules written in any of those languages, so the method is more widely applicable.
The present invention proposes extracting Ragel regular-expression signatures of malicious scripts and compiling them with the Ragel state machine compiler into malicious-script detection source files in whatever programming language is required, which can then be used for fast detection. Its detection speed is far higher than that of methods that rely solely on string matching or regular-expression matching of signatures against the raw malicious script. At the same time, the method can publish Ragel regular-expression signatures quickly, and its false-positive rate for malicious script detection is relatively low. In addition, when extracting the Ragel features of malicious scripts, the method is not limited to a single regular-expression feature: multiple Ragel regular expressions can be extracted for the same family.
Brief description of the drawings
To describe the technical solutions of the present invention or of the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the malicious script detection method based on Ragel state machines according to the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of the malicious script detection system based on Ragel state machines according to the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features and advantages of the invention more apparent, the technical solutions of the invention are described in further detail below with reference to the drawings.
The present invention proposes a malicious script detection method and system based on Ragel state machines, which overcomes the low efficiency and high false-positive rate of existing script detection methods and enables fast detection of script files.
The present invention first provides a malicious script detection method based on Ragel state machines which, as shown in Fig. 1, comprises:
S101: Parse known malicious script files and extract malicious-string Ragel regular expressions;
S102: Compose the extracted malicious-string Ragel regular expressions into a detection feature library source file;
S103: Compile the detection feature library source file into the target language of the detection module for script files to be detected, for example into a detection feature library source file written in C; other conventional languages such as C++, D, Java, Ruby and Python are of course also possible;
S104: Integrate the compiled detection feature library source file into the detection module for script files to be detected;
S105: Obtain and load the script file to be detected, that is, load it into memory;
S106: Scan the script file to be detected with the Ragel regular-expression state machine: the detection module into which the detection feature library source file has been integrated uses the state machine to quickly scan the script file loaded in memory;
S107: Judge whether the script file to be detected matches the malicious-string Ragel regular expressions in the detection feature library source file; if so, the script file is malicious, otherwise it is an ordinary file.
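Steps S105 and S106 assume the whole script is in memory before the state machine runs. The loader below is a minimal sketch of S105 using only standard C I/O; the names `load_script_stream` and `load_script_file` are hypothetical. It reads in growing chunks, so it does not depend on knowing the file size in advance, and the returned buffer and size are exactly the buf and size arguments a detection interface of the kind described above would take.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads an entire stream into a heap buffer and stores its length in
 * *out_size. Returns NULL on allocation or read error; caller frees. */
char *load_script_stream(FILE *fp, size_t *out_size)
{
    assert(fp != NULL && out_size != NULL);
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;
    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap) {                    /* short read: end of file or error */
            if (ferror(fp)) { free(buf); return NULL; }
            break;
        }
        char *bigger = realloc(buf, cap * 2);   /* buffer full: grow and keep reading */
        if (bigger == NULL) { free(buf); return NULL; }
        buf = bigger;
        cap *= 2;
    }
    *out_size = len;
    return buf;
}

/* Opens the script file at `path` in binary mode and loads it into memory. */
char *load_script_file(const char *path, size_t *out_size)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;
    char *buf = load_script_stream(fp, out_size);
    fclose(fp);
    return buf;
}
```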
In the method, parsing known malicious script files and extracting the malicious-string Ragel regular expressions specifically comprises: parsing the malicious script files by malicious code family or variant type, removing the non-malicious code sections and retaining the malicious code part, such as the encrypted or obfuscated malicious code; and extracting one or more Ragel regular expressions from the malicious code parts that a family or its variants have in common. Extracting multiple Ragel regular expressions mainly serves to prevent false positives: several regular expressions can form a regular-expression feature group, and a script file is determined to be malicious only when it matches all of the expressions in the group.
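The feature-group rule can be sketched as a bitmask check: the scan records which members of a group have matched, and the group fires only when every member's bit is set. In this minimal sketch the names `feature_group_t`, `scan_member_hits` and `group_matches`, and the limit `MAX_GROUP`, are hypothetical, and plain substrings stand in for the compiled Ragel expressions. The buffer is walked once, with every member tested at each position, which is simpler than but behaves like a combined state machine that reports each member's hit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUP 8   /* hypothetical upper bound on members per group */

/* Hypothetical feature group: several expressions extracted from one family. */
typedef struct {
    const char *virus_name;
    size_t      n_members;
    const char *members[MAX_GROUP];
} feature_group_t;

/* Walks the buffer once and sets bit k when member k occurs at some position. */
static unsigned scan_member_hits(const feature_group_t *g, const char *buf, size_t size)
{
    unsigned hits = 0;
    for (size_t i = 0; i < size; i++)
        for (size_t k = 0; k < g->n_members; k++) {
            size_t m = strlen(g->members[k]);
            if (m <= size - i && memcmp(buf + i, g->members[k], m) == 0)
                hits |= 1u << k;
        }
    return hits;
}

/* Returns 1 only when every member of the group has matched. */
int group_matches(const feature_group_t *g, const char *buf, size_t size)
{
    assert(g != NULL && g->n_members > 0 && g->n_members <= MAX_GROUP);
    unsigned all = (1u << g->n_members) - 1u;
    return (scan_member_hits(g, buf, size) & all) == all;
}
```

Requiring all members to match is what trades a little sensitivity for fewer false positives: a benign page that happens to call `unescape` alone is not flagged.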
In the method, composing the extracted malicious-string Ragel regular expressions into a detection feature library source file specifically comprises constructing the detection feature library source file in Ragel state machine syntax.
In the method, the detection interface of the detection feature library source file takes as input the buffer (buf) and size of the script file to be detected, and outputs the detection result and the virus name.
In the method, judging whether the script file to be detected matches the Ragel regular expressions in the detection feature library source file specifically comprises judging whether the script file to be detected matches one or more of those Ragel regular expressions; if it does, it is a match.
The present invention also provides a malicious script detection system based on Ragel state machines which, as shown in Fig. 2, comprises:
A parsing module 201, configured to parse known malicious script files and extract malicious-string Ragel regular expressions;
A library generation module 202, configured to compose the extracted malicious-string Ragel regular expressions into a detection feature library source file;
A compilation module 203, configured to compile the detection feature library source file into the target language of the detection module for script files to be detected;
An integration module 204, configured to integrate the compiled detection feature library source file into the detection module for script files to be detected;
An acquisition module 205, configured to obtain and load a script file to be detected;
A detection module 206, configured to scan the script file to be detected with the Ragel regular-expression state machine;
A judgment module 207, configured to judge whether the script file to be detected matches the malicious-string Ragel regular expressions in the detection feature library source file; if so, the script file is malicious, otherwise it is an ordinary file.
In the system, parsing known malicious script files and extracting the malicious-string Ragel regular expressions specifically comprises: parsing the malicious script files by malicious code family or variant type, removing the non-malicious code sections and retaining the malicious code part; and extracting one or more Ragel regular expressions from the malicious code parts that a family or its variants have in common.
In the system, composing the extracted malicious-string Ragel regular expressions into a detection feature library source file specifically comprises constructing the detection feature library source file in Ragel state machine syntax.
In the system, the detection interface of the detection feature library source file takes as input the buffer (buf) and size of the script file to be detected, and outputs the detection result and the virus name.
In the system, judging whether the script file to be detected matches the Ragel regular expressions in the detection feature library source file specifically comprises judging whether the script file to be detected matches one or more of those Ragel regular expressions; if it does, it is a match.
The key of the invention is to extract Ragel regular expressions for different families of malicious scripts, build a detection feature library composed of these expressions, and use the Ragel finite-state machine compiler to compile the malicious-script detection feature library into the target language; once the library is integrated, script files can be detected with finite-state machines. The advantages of the invention are as follows. Because detection is based on a regular-expression state machine, there is no need to run a full-text search of the script file for every feature, nor to train clustering models on large numbers of malicious scripts, so detection is much faster than with general signature-matching or machine-learning methods. Moreover, compared with other script detection methods, using the Ragel state machine compiler allows malicious-script detection source files to be generated in different target languages, which can be integrated into script detection modules written in any of those languages. The method is therefore more widely applicable and covers script types such as JavaScript, Visual Basic Script, HTML, Python and Java.
The present invention proposes extracting Ragel regular-expression signatures of malicious scripts and compiling them with the Ragel state machine compiler into malicious-script detection source files in whatever programming language is required, which can then be used for fast detection. Its detection speed is far higher than that of methods that rely solely on string matching or regular-expression matching of signatures against the raw malicious script. At the same time, the method can publish Ragel regular-expression signatures quickly, and its false-positive rate for malicious script detection is relatively low. In addition, when extracting the Ragel features of malicious scripts, the method is not limited to a single regular-expression feature: multiple Ragel regular expressions can be extracted for the same family.
Although the present invention has been described by way of embodiments, those of ordinary skill in the art will appreciate that the invention admits many variations and changes without departing from its spirit, and the appended claims are intended to cover such variations and changes without departing from the spirit of the invention.