CN101201836A - Method for matching in speedup regular expression based on finite automaton containing memorization determination - Google Patents

Method for matching in speedup regular expression based on finite automaton containing memorization determination

Info

Publication number
CN101201836A
CN101201836A CNA2007100710710A CN200710071071A CN101201836A CN 101201836 A CN101201836 A CN 101201836A CN A2007100710710 A CNA2007100710710 A CN A2007100710710A CN 200710071071 A CN200710071071 A CN 200710071071A CN 101201836 A CN101201836 A CN 101201836A
Authority
CN
China
Prior art keywords
state machine
finite
regular expression
deterministic
compiler
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2007100710710A
Other languages
Chinese (zh)
Other versions
CN101201836B (en)
Inventor
王继民
平玲娣
潘雪增
陈小平
陈健
陆魁军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN2007100710710A
Publication of CN101201836A
Application granted
Publication of CN101201836B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a matching and accelerating method of a regular expression based on a deterministic finite automaton with memory, including a rule compiler of the regular expression and a pattern matching engine; the rule compiler of the regular expression firstly transforms the regular expression into an analytic tree, and then transforms the analytic tree into a nondeterministic finite automaton with memory and the deterministic finite automaton with memory respectively; the pattern matching engine can accelerate pattern matching by using the deterministic finite automaton with memory generated by the rule compiler. The invention has the advantages that: 1) by directly supporting repeat operators, the compiler does not need to unfold the repeat expression, thus the difficulty of the development of the compiler is greatly reduced and the memory occupation and the compile time of the compiler are decreased as well; 2) for the same reason, the volume of a rules database generated by the compiler can be reduced, so the cost and complexity of the pattern matching engine can be lowered.

Description

Method for accelerating regular expression matching based on a deterministic finite automaton with memory
Technical field
The present invention relates to the field of information processing, and in particular to a method for accelerating regular expression matching based on a deterministic finite automaton with memory.
Background
String matching is one of the basic operations of information processing and underlies many information-processing applications. String matching is the process of finding, in an input string (hereinafter the target string), the substrings that bear a particular relationship to a given string (hereinafter the feature string). It can be divided into exact matching and fuzzy matching: the former finds substrings of the target string that are identical to the feature string, while the latter finds substrings that are similar to it (for example, substrings of the target string that differ from the feature string by one or several added, removed, or modified characters). Exact string matching is especially widely used.
A regular expression is a text pattern composed of ordinary characters (for example, the characters a to z) and special characters (called metacharacters); such a pattern describes a class of strings that share a particular feature. For example, ".*abc[0-9]" describes the class of strings that begin with any number of arbitrary characters and end with abc followed by one digit. Regular expressions are widely used in string matching to describe feature strings. For example, content-filtering programs deployed at gateways use regular-expression pattern matching to find and filter network data with improper content; antivirus programs scan for viruses using signature databases based on regular expressions; and network intrusion detection and prevention software uses regular-expression pattern matching to identify intrusion attempts.
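As a quick illustration of the example pattern, the following sketch tests a few strings against ".*abc[0-9]". It uses Python's built-in `re` module purely for convenience; that choice, and the helper name `describes`, are assumptions of this sketch and not part of the invention, which targets POSIX syntax and a hardware engine.

```python
import re

# The example pattern: any characters, then "abc", then one digit.
PATTERN = re.compile(r".*abc[0-9]")


def describes(text: str) -> bool:
    """Return True if the whole string belongs to the class the pattern describes."""
    return PATTERN.fullmatch(text) is not None


for sample in ["xyzabc7", "abc0", "abcx", "abc"]:
    print(sample, describes(sample))
```

Only the first two samples end with abc followed by a digit, so only they belong to the described class.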
Pattern matching based on regular expressions is a computation-intensive task that requires a large amount of CPU time. For example, the signature database of the antivirus program ClamAV currently holds more than 110,000 rules, yet the program must decide in as short a time as possible whether a given file contains any signature in the database. Another typical application is a content-filtering program deployed at a gateway, which must inspect the passing data stream in real time and take the corresponding action according to its content. At present, regular expression matching in software reaches a throughput of several hundred Mbps to 1 Gbps, and in real applications this figure can drop by a factor of several to several tens, which falls far short of the needs of real-time scanning.
Hardware-accelerated regular expression matching is one way to resolve this conflict. A regular expression matching engine implemented with an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) performs the same pattern matching but can be specially optimized as required, improving performance many times over.
Hardware acceleration of regular expressions is usually implemented with automata.
The repetition operator "(A){n,m}" is a common regular expression operator. It has three forms:
◆ (A){n,m}: the subexpression (A) is repeated at least n times and at most m times.
◆ (A){n,}: the subexpression (A) is repeated at least n times.
◆ (A){n}: the subexpression (A) is repeated exactly n times.
Ordinary pattern-matching software does not support the "(A){n,m}" operator directly; it usually converts the operator into concatenation and alternation operators. The first form, "(A){n,m}", can be converted into:
$$\underbrace{(A)(A)\cdots(A)}_{n} \;\mid\; \underbrace{(A)(A)\cdots(A)(A)}_{n+1} \;\mid\; \cdots \;\mid\; \underbrace{(A)(A)\cdots(A)\cdots(A)}_{m}$$
The second and third forms are simpler and expand, respectively, into:
$$\underbrace{(A)(A)\cdots(A)}_{n}\cdot(A)^{*}$$
and
$$\underbrace{(A)(A)\cdots(A)}_{n}$$
Although this conversion supports the repetition operator, when the repetition count is high and the range is wide, the number of states of the regular expression increases sharply, which places a heavy burden on the rule compiler and the pattern matching engine. For example, expanding "a{1,3072}" yields 1+2+…+3072 = 4,720,128 states. When the repeated subexpression is complex, state explosion can result. To address this problem, the present invention proposes using a state machine with memory to support the repetition operator directly.
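The growth described above can be checked with a short sketch. The function names `expand` and `expanded_copies` are hypothetical; the sketch simply unrolls a bounded repetition into alternatives and counts the copies of the subexpression that the unrolled form contains (one state per copy when the subexpression is a single character such as `a`).

```python
def expand(sub: str, n: int, m: int) -> str:
    """Unroll (sub){n,m} using only concatenation and alternation."""
    return "|".join(f"({sub})" * k for k in range(n, m + 1))


def expanded_copies(n: int, m: int) -> int:
    """Number of copies of the subexpression in the unrolled form: n + (n+1) + ... + m."""
    return sum(range(n, m + 1))


print(expand("a", 1, 3))          # (a)|(a)(a)|(a)(a)(a)
print(expanded_copies(1, 3072))   # 4720128
```

The count grows quadratically with the upper bound, which is why keeping the range as a parameter instead of unrolling it saves so much space.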
For the definitions and properties of deterministic and nondeterministic finite automata, the Thompson algorithm for constructing a nondeterministic finite automaton from a regular expression, and the ε-closure algorithm for constructing a deterministic finite automaton from a nondeterministic one, see "Compilers: Principles, Techniques, and Tools" by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. For regular expression operators and their meanings, see "Mastering Regular Expressions" by Jeffrey E. F. Friedl. For the use of the Lex and Yacc software, see "lex & yacc" by John Levine, Tony Mason, and Doug Brown.
Summary of the invention
The purpose of the present invention is to provide a method for accelerating regular expression matching based on a deterministic finite automaton with memory.
The method comprises a regular expression rule compiler and a pattern matching engine. The rule compiler first converts a regular expression into a parse tree and then converts the parse tree into a nondeterministic finite automaton with memory and a deterministic finite automaton with memory, respectively. The pattern matching engine uses the deterministic finite automaton with memory generated by the compiler to accelerate pattern matching.
The rule compiler performs these conversions as follows. A context-free grammar for regular expressions is implemented with the Lex and Yacc software; the rule syntax is parsed, a subtree of the corresponding node type is built for each grammar production matched during parsing, and a complete parse tree is finally formed. On top of the operators supported by the Thompson algorithm, a repetition operator ({n,m}) is added: its nondeterministic automaton is identical to that of the repeated expression, but a repetition-range parameter is attached. This extended algorithm converts the parse tree into a nondeterministic finite automaton with memory. A nondeterministic finite automaton that contains no repetition operator is converted into a deterministic finite automaton by the ε-closure algorithm. For one that contains a repetition operator, the repetition operator is first replaced by a single specially marked character and the result is converted into a deterministic finite automaton by the ε-closure algorithm; the replaced part is separately converted into another deterministic finite automaton by the same algorithm. The parse tree is generated by parsing the corresponding regular expression and is a tree in which each node has at most two branches. Non-leaf nodes are operators, and leaf nodes are characters or sets. The operators comprise the concatenation operator ("·"), the alternation operator ("|"), the repetition operator ("{n,m}"), and the Kleene closure operator ("*"). A leaf node is a single character, a set, the complement of a set, a character that stands for a set, or a character that stands for the complement of a set; the escape sequences for sets and complements conform to the IEEE POSIX 1003.2 standard.
The compiler takes a regular expression rule file as input and outputs a regular expression database. It first checks the syntax of the rule file and converts it into nondeterministic finite automata with memory, then converts these into deterministic finite automata with memory, and finally stores the deterministic finite automata with memory in the rule database.
The pattern matching engine uses the deterministic finite automata with memory generated by the compiler to accelerate pattern matching as follows. The engine reads in the automata generated by the compiler and the input string, matches the input string against each automaton, and stores the match position in a matching context. When matching against one automaton finishes, the next action is determined by the type of that automaton. If the automaton contains no repetition operator and has no successor automaton, a match is reported. If the automaton contains no repetition operator but has a successor automaton, the greedy-mode option determines whether a match is reported, and a matching context for the successor automaton is generated. If the automaton contains a repetition operator, the match count is incremented and the next action is determined by the count: if the count is below the lower bound of the repetition range, the match position is reset to the initial state of the automaton and matching continues; if the count is above the upper bound of the repetition range, a matching context for the successor automaton is generated or a match is reported; if the count lies within the repetition range, the match position is reset to the initial state of the automaton and matching continues. The pattern matching engine can be implemented with a field-programmable gate array or an application-specific integrated circuit. It implements the deterministic finite automata with memory held in the regular expression database, accepts input data, and determines whether the data match any regular expression in the database.
The advantages of the present invention are: 1) because the repetition operator is supported directly, the compiler does not need to expand repeated expressions, which greatly reduces the difficulty of developing the compiler and also reduces its memory footprint and compilation time; 2) for the same reason, the rule database generated by the compiler is much smaller, which lowers the cost and complexity of the pattern matching engine.
Brief description of the drawings
Fig. 1 shows the basic operators and their corresponding parse trees;
Fig. 2 shows the parse tree for the rule "(a|([0-9])){1,30}c*d";
Fig. 3 shows the nondeterministic finite automaton for a leaf node (single character);
Fig. 4 shows the nondeterministic finite automaton for a leaf node (set);
Fig. 5 shows the nondeterministic finite automaton for the concatenation operator;
Fig. 6 shows the nondeterministic finite automaton for the alternation operator;
Fig. 7 shows the nondeterministic finite automaton for the Kleene closure operator;
Fig. 8 shows the nondeterministic finite automaton obtained from the rule "(a|([0-9])){1,30}c*d" by the Thompson algorithm;
Fig. 9 shows the nondeterministic finite automaton obtained from the parse tree of the rule "(a|([0-9])){1,30}c*d" by the improved algorithm;
Fig. 10 shows the deterministic finite automaton obtained from the nondeterministic finite automaton of the rule "(a|([0-9])){1,30}c*d" by the classical ε-closure algorithm;
Fig. 11 shows the deterministic finite automaton obtained from the nondeterministic finite automaton of the rule "(a|([0-9])){1,30}c*d" by the improved algorithm;
Fig. 12 is a structural diagram of the pattern matching engine of the present invention;
Fig. 13 is a flowchart of context scheduling in the pattern matching engine of the present invention;
Fig. 14 is the main execution flowchart of the pattern matching engine of the present invention.
Detailed description
The core of the method is to use a deterministic finite automaton with memory to support the repetition operator (A{n,m}) directly. With essentially no loss of matching performance, this significantly reduces the size of the generated deterministic finite automata (DFA) for rule sets that contain many repetitions, and so lowers the storage cost. It also simplifies the design of the compiler and significantly shortens rule processing time. To use deterministic finite automata with memory, both the rule compiler and the pattern matching engine must support them.
Rule description file: the rule description file contains the feature strings (regular expressions) to be searched for. It may contain any number of rules, and each rule consists of the following parts: a unique identifier, which distinguishes the rule from other rules; a rule body, which describes a regular expression; and rule options, which specify options for matching the rule, such as whether to ignore letter case.
Rule compiler: the rule compiler converts rules into a rule database. After reading the rule description file, it checks the syntax of the rules, converts the rules that conform to the grammar into parse trees, and carries out preprocessing and the necessary optimization. It then converts the parse trees into nondeterministic and deterministic finite automata with memory and writes all deterministic finite automata to the rule database for use by the pattern matching engine.
Pattern matching engine: the pattern matching engine performs the regular expression pattern matching. It reads in the rule database generated by the compiler, takes the data to be matched as input, and outputs the matching result when matching is complete. The result comprises the following data: whether there is a match and, if so, the identifier of the matched feature string, the start (end) position of the match, and the actual match length.
The concrete steps of the method for accelerating regular expression matching based on a deterministic finite automaton with memory are as follows:
1) Prepare the regular expression rule file.
The regular expression rule file contains the regular expression rules and other related information. Each line describes one regular expression rule and its related information, for example:
Id=1001,Rule="abc[0-9]$",Option=""
Id=1002,Rule="(http://[0-9a-z]+\.google\.[a-z]+[\-/%.0-9a-z]*/images\?)(.*)(&?)(safe=[^&]*)",Option="i"
The above is an example of a rule file containing two rules. Each rule consists of three parts: the first is the rule identifier, given by Id=xx; the second is the rule body, given by Rule="..."; the third is the related options, given by Option="...".
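The three-part line format can be read with a few lines of code. The following is a minimal sketch, assuming straight double quotes and exactly the field order shown in the example; the name `parse_rule` and the returned tuple layout are hypothetical and not taken from the patent.

```python
import re

# One rule per line: Id=<number>,Rule="<regular expression>",Option="<options>"
RULE_LINE = re.compile(r'Id=(\d+),\s*Rule="(.*)",\s*Option="([^"]*)"\s*$')


def parse_rule(line: str):
    """Split one rule line into (identifier, rule body, options)."""
    match = RULE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"malformed rule line: {line!r}")
    return int(match.group(1)), match.group(2), match.group(3)


print(parse_rule('Id=1001,Rule="abc[0-9]$",Option=""'))
```

The rule body is captured greedily up to the last `",Option="` on the line, so a body that itself contains double quotes is still read in full.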
Rule body has partly been described the feature regular expression, and its grammer meets IEEE POSIX 1003.2 standards.
2) Compile the rule file into a rule database with the rule compiler.
The rule compiler converts the rule file into the rule database. This work is done in four steps:
A. Check the syntax of the rule file and generate the parse tree at the same time.
The syntax of regular expressions can be described by a context-free grammar, so the Lex and Yacc compiler tools can be used to generate a parser that checks the syntax of a regular expression. For example, the top-level grammar of regular expressions can be expressed for Yacc as:
<regEx> ::= <regxSpecial>              // special match (such as ' ')
          | <regxOneChar>              // single character
          | <regxPredef>               // predefined character set
          | <regxCharClass>            // user-defined character set
          | <regEx> <regEx>            // concatenation operator
          | <regEx> '|' <regEx>        // alternation operator
          | <regEx> <repeatSpec>       // repetition (such as '*')
          | '(' <regEx> ')'            // grouping
During the syntax check, the action part of the Yacc definitions can also be used to generate the parse tree. The parse tree is a tree in which each node has at most two branches; non-leaf nodes are operators, and leaf nodes are characters or sets. The operators comprise the concatenation operator ("·"), the alternation operator ("|"), the repetition operator ("{n,m}"), and the Kleene closure operator ("*"). A leaf node can be a single character (such as "a"), a set (such as "[a-z]" or "[abc123]"), the complement of a set (such as "[^a-z]" or "[^abc123]"), a character that stands for a set (such as "\d", which stands for "[0-9]"), or a character that stands for the complement of a set (such as "\D", which stands for "[^0-9]"). The escape sequences for sets and complements conform to the IEEE POSIX 1003.2 standard. Fig. 1 shows how the parse tree is generated for the concatenation operator ("·"), the alternation operator ("|"), the repetition operator ("{n,m}"), and the Kleene closure operator ("*").
Fig. 2 shows the parse tree for the rule "(a|([0-9])){1,30}c*d". In the figure, a "{}" node represents a "{n,m}" repetition node and a "[]" node represents a set node. The concrete content (n and m for a repetition node, the elements for a set node) is given by the node's parameters: the "{}" node has the parameter (1,30), and the "[]" node has the parameter (0,1,2,3,4,5,6,7,8,9).
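The parse tree for this rule can be written down as a small data structure. The sketch below is one possible in-memory layout, not the layout used by the patent: the `Node` class, the kind names, and the left-leaning nesting of the two concatenations are assumptions, since the exact shape of Fig. 2 is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    kind: str                                   # "cat", "alt", "repeat", "star", "char" or "set"
    children: List["Node"] = field(default_factory=list)
    param: Any = None                           # (n, m) for "repeat", elements for "set", the character for "char"


def leaf(kind: str, param: Any) -> Node:
    return Node(kind, [], param)


# (a|([0-9])){1,30}c*d
tree = Node("cat", [
    Node("cat", [
        Node("repeat",
             [Node("alt", [leaf("char", "a"), leaf("set", tuple("0123456789"))])],
             param=(1, 30)),
        Node("star", [leaf("char", "c")]),
    ]),
    leaf("char", "d"),
])


def max_branching(node: Node) -> int:
    """Largest number of children of any node in the tree."""
    return max([len(node.children)] + [max_branching(c) for c in node.children])


print(max_branching(tree))   # 2
```

The repetition range lives in the node parameter, so the tree has one "repeat" node regardless of how large the range is.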
B. Preprocess and optimize the parse tree as necessary.
At this point the parse tree still contains some elements that the later steps do not support directly, so they must be processed into elements that are supported. The main preprocessing operations are:
◆ Processing of the "+" operator. The "+" operator is not supported directly in the later steps and must be converted to "*"; for example, "a+" is converted to "aa*". This preprocessing is done by adding nodes to the parse tree and restructuring it.
◆ Processing of predefined character sets. Character sets defined by escape sequences are not supported directly; their elements must be extracted and specified explicitly. For example, "\d" is converted to "[0-9]", "\D" is converted to "[^0-9]", and "[:alphanum:]" is converted to "[0-9a-zA-Z]". This preprocessing is done by modifying the node parameters directly.
◆ Splitting of top-level alternation operators. If the root of the parse tree is an alternation node "|", the parse tree can be split into two parse trees without affecting the matching result. This process is repeated until the root node is no longer "|". For example, "A|B|C" becomes three parse trees after processing: "A", "B", and "C".
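The three preprocessing operations can be sketched on a parse tree encoded as nested tuples. The tuple encoding, the node kinds "plus" and "class", and the function names are assumptions of this sketch; only the rewrites themselves ("a+" to "aa*", predefined sets to explicit sets, splitting a top-level "|") come from the text.

```python
# Explicit replacements for the predefined character sets named in the text.
PREDEFINED = {r"\d": "[0-9]", r"\D": "[^0-9]", "[:alphanum:]": "[0-9a-zA-Z]"}


def preprocess(node):
    """Rewrite "+" nodes and predefined character sets throughout the tree."""
    kind = node[0]
    if kind == "plus":                       # x+  ->  x x*
        child = preprocess(node[1])
        return ("cat", child, ("star", child))
    if kind == "class":                      # \d  ->  [0-9], and so on
        return ("set", PREDEFINED[node[1]])
    # Any other node: keep it, but preprocess its subtrees (tuples) recursively.
    return tuple(preprocess(p) if isinstance(p, tuple) else p for p in node)


def split_top_alternation(node):
    """Split a tree whose root is "|" into independent trees, repeatedly."""
    if node[0] == "alt":
        return split_top_alternation(node[1]) + split_top_alternation(node[2])
    return [node]


a, b, c = ("char", "A"), ("char", "B"), ("char", "C")
print(preprocess(("plus", ("char", "a"))))
print(split_top_alternation(("alt", ("alt", a, b), c)))
```

Splitting only looks at the root, so an alternation nested below a concatenation or repetition is left in place.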
Besides preprocessing, optimization is also required. The purpose of optimization is to reduce the storage or matching cost of the expression without changing the semantics of the parse tree.
C. Convert the parse tree into a nondeterministic finite automaton with memory.
The parse tree is converted into a nondeterministic finite automaton by an improved Thompson algorithm. In addition to the operators originally supported by the Thompson algorithm, support for the repetition operator is added. A single post-order traversal of the parse tree generates the nondeterministic finite automaton.
For each type of node, the state machine is derived from the state machines of its subtrees.
◆ When the node is a leaf node (single character), the corresponding nondeterministic finite automaton is as shown in Fig. 3.
◆ When the node is a leaf node (set), the corresponding nondeterministic finite automaton is as shown in Fig. 4.
◆ When the node is a concatenation operator, let the initial states of the left and right subtrees be I1 and I2 and their final states be F1 and F2; the corresponding nondeterministic finite automaton is as shown in Fig. 5.
◆ When the node is an alternation operator, let the initial states of the left and right subtrees be I1 and I2 and their final states be F1 and F2; the corresponding nondeterministic finite automaton is as shown in Fig. 6.
◆ When the node is a Kleene closure operator, let the initial state of the subtree be I1 and its final state be F1; the corresponding nondeterministic finite automaton is as shown in Fig. 7.
◆ When the node is a repetition operator, the generated nondeterministic finite automaton is identical to that of the subtree, but it is given a special mark and the repetition range is stored with it.
In addition, each time a new repetition node is processed during the traversal, a new nondeterministic finite automaton is generated, and the connections between the related nondeterministic finite automata are recorded.
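The node-by-node construction above can be sketched as a post-order traversal. This is a simplified illustration under stated assumptions: the tuple encoding of the parse tree and the `NfaBuilder` class are hypothetical, the fragments for concatenation, alternation, and closure follow the textbook Thompson construction rather than Figs. 5 to 7, and a repetition node is only marked inside one shared edge list, whereas the text generates a separate automaton per repetition node and records the connections.

```python
class NfaBuilder:
    """Thompson-style construction extended with a marked repetition node."""

    def __init__(self):
        self.state_count = 0
        self.edges = []      # (source, label, target); label None is an ε edge
        self.repeats = []    # (start, final, n, m) for each repetition fragment

    def new_state(self) -> int:
        self.state_count += 1
        return self.state_count - 1

    def eps(self, src: int, dst: int) -> None:
        self.edges.append((src, None, dst))

    def build(self, node):
        """Post-order traversal; returns (start, final) of the fragment for node."""
        kind = node[0]
        if kind in ("char", "set"):
            start, final = self.new_state(), self.new_state()
            self.edges.append((start, node[1], final))
        elif kind == "cat":
            start, left_final = self.build(node[1])
            right_start, final = self.build(node[2])
            self.eps(left_final, right_start)
        elif kind == "alt":
            s1, f1 = self.build(node[1])
            s2, f2 = self.build(node[2])
            start, final = self.new_state(), self.new_state()
            for s in (s1, s2):
                self.eps(start, s)
            for f in (f1, f2):
                self.eps(f, final)
        elif kind == "star":
            s1, f1 = self.build(node[1])
            start, final = self.new_state(), self.new_state()
            self.eps(start, s1)
            self.eps(f1, final)
            self.eps(start, final)   # zero repetitions
            self.eps(f1, s1)         # one more repetition
        elif kind == "repeat":
            # Same fragment as the subtree: no states are added, only a mark and the range.
            start, final = self.build(node[1])
            self.repeats.append((start, final, node[2], node[3]))
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return start, final


# (a|([0-9])){1,30}c*d
RULE = ("cat",
        ("cat",
         ("repeat", ("alt", ("char", "a"), ("set", "0123456789")), 1, 30),
         ("star", ("char", "c"))),
        ("char", "d"))

builder = NfaBuilder()
builder.build(RULE)
print(builder.state_count, builder.repeats)
```

Because the repetition node adds no states of its own, the size of the result does not depend on the values of n and m.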
Fig. 8 shows the nondeterministic finite automaton obtained from the parse tree of the rule "(a|([0-9])){1,30}c*d" by the Thompson algorithm; Fig. 9 shows the nondeterministic finite automaton obtained from the same rule by the improved algorithm. In the figures, "EP" denotes an ε edge, elliptical and circular nodes denote states, edges with arrows denote state transitions, and the character attached to an edge is the transition character. If the arrowhead is round, the transition condition is a set whose elements are given by the associated parameter.
D. Convert the nondeterministic finite automaton with memory into a deterministic finite automaton with memory.
The nondeterministic finite automaton is converted into a deterministic finite automaton by an improved ε-closure method. It differs from the classical ε-closure method as follows. A parameter is added to each generated automaton that records its type, that is, whether it needs to be repeated; if so, its repetition range is recorded as well. The generated state machine is a "full state machine": every node has 256 transition edges, one for each input character value from 0 to 255. If a particular transition does not exist in the nondeterministic finite automaton, that transition is added to the deterministic finite automaton and leads to a newly added special state, REJECT (an invalid state indicating that matching has failed). In addition, because the previous step divided the parse tree into several parts at the repetition operators, converted each part into a nondeterministic finite automaton, and recorded the relations between the automata, this step converts each nondeterministic finite automaton into a deterministic finite automaton by the ε-closure method and records the connections between the automata at the same time.
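The conversion to a "full state machine" can be sketched with the classical ε-closure (subset) construction plus a REJECT default. The dictionary encoding of the automaton, the function names, and the `repeat` field are assumptions of this sketch; in particular, REJECT is represented here by the sentinel value -1 rather than by an extra table row.

```python
from collections import deque

EPS = None      # label used for ε edges
REJECT = -1     # sentinel for the special invalid state


def eps_closure(nfa, states):
    """All states reachable from `states` through ε edges only."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for target in nfa.get(state, {}).get(EPS, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return frozenset(seen)


def to_full_dfa(nfa, start, accept, repeat=None):
    """Subset construction; every row has 256 entries, missing ones are REJECT."""
    start_set = eps_closure(nfa, {start})
    ids = {start_set: 0}
    table, accepting = [], set()
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        row = [REJECT] * 256
        for byte in range(256):
            targets = set()
            for state in current:
                targets.update(nfa.get(state, {}).get(byte, ()))
            if targets:
                nxt = eps_closure(nfa, targets)
                if nxt not in ids:
                    ids[nxt] = len(ids)
                    queue.append(nxt)
                row[byte] = ids[nxt]
        table.append(row)
        if accept in current:
            accepting.add(ids[current])
    # `repeat` carries the (n, m) range when the automaton is rooted at a repetition.
    return {"table": table, "accepting": accepting, "repeat": repeat}


def accepts(dfa, data: bytes) -> bool:
    state = 0
    for byte in data:
        state = dfa["table"][state][byte]
        if state == REJECT:
            return False
    return state in dfa["accepting"]


# Nondeterministic automaton for c*d: 0 -ε-> 1, 1 -c-> 1, 1 -d-> 2 (accepting).
NFA = {0: {EPS: [1]}, 1: {ord("c"): [1], ord("d"): [2]}}
dfa = to_full_dfa(NFA, start=0, accept=2)
print(len(dfa["table"]), accepts(dfa, b"ccd"), accepts(dfa, b"cc"))
```

A fixed 256-entry row per state means a lookup needs no search, which suits a table-driven hardware engine, at the cost of storing many REJECT entries.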
Fig. 10 shows the deterministic finite automaton obtained from the nondeterministic finite automaton of the rule "(a|([0-9])){1,30}c*d" by the classical ε-closure algorithm; Fig. 11 shows the deterministic finite automaton obtained from the same rule by the improved algorithm. In the figures, elliptical and circular nodes denote states, edges with arrows denote state transitions, and the character attached to an edge is the transition character.
3) Match the input string with the pattern matching engine.
The pattern matching engine matches the input string against the feature strings (the state machines generated from the regular expressions). The pattern matching engine can be implemented with a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). It implements the deterministic finite automata with memory held in the regular expression database, accepts input data, and determines whether the data match any regular expression in the database. The structure of the pattern matching engine is shown in Fig. 12, in which:
(1) Rule base: the form in which the rule database is held inside the pattern matching engine. It records the compiled deterministic finite automata, the connections between the automata, and the repetition counts of the deterministic finite automata rooted at a repetition operator.
(2) Execution engine: the execution unit of pattern matching. It reads in the data to be matched, performs fast classification, creates matching contexts, and handles the jumps of matching contexts between state machines.
(3) Fast classification engine: it quickly determines the relevant deterministic finite automata from the input data, and the execution engine then creates the corresponding matching contexts.
(4) Matching context: it records the state machine currently being matched and the current state within it. If the state machine is rooted at a repetition operator, the number of completed matches is recorded as well.
(5) Rule database: the binary file that the rule compiler produces by compiling the rules and that the pattern matching engine can read. It stores the states of each deterministic finite automaton and the transitions between the states. For a state machine rooted at a repetition operator, the repetition-range parameter is recorded as well.
(6) Input/output data: the information exchanged between the pattern matching engine and the outside. The input data are mainly the data stream to be matched, and may also include some data for configuring the pattern matching engine. The output data are the matching results of the pattern matching engine, including whether the match succeeded, the match position, and the match count.
The execution flow of pattern matching is shown in Fig. 13 and Fig. 14: Fig. 13 is the context scheduling flow, and Fig. 14 is the main execution flow of the matching engine.
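As the flowcharts of Fig. 13 and Fig. 14 are not reproduced here, the following software sketch shows one plausible reading of how matching contexts and the repetition counter could cooperate for the rule "(a|([0-9])){1,30}c*d". Everything in it is an assumption made for illustration: the `Context` fields, the two hand-written tables, anchoring the match at the first input byte, and the choice to create a successor context as soon as the count reaches the lower bound. It is not the engine of the patent, which is intended for an FPGA or ASIC.

```python
import string
from dataclasses import dataclass

REJECT = -1


def full_table(n_states, edges):
    """Full state machine: 256 transitions per state, REJECT where none is defined."""
    table = [[REJECT] * 256 for _ in range(n_states)]
    for src, chars, dst in edges:
        for ch in chars:
            table[src][ord(ch)] = dst
    return table


REP = full_table(2, [(0, "a" + string.digits, 1)])   # one pass of (a|[0-9]); state 1 accepts
TAIL = full_table(2, [(0, "c", 0), (0, "d", 1)])     # c*d; state 1 accepts
LOWER, UPPER = 1, 30                                  # repetition range stored with REP


@dataclass(frozen=True)
class Context:
    machine: str    # "rep" or "tail": the automaton currently being matched
    state: int      # current state inside that automaton
    count: int = 0  # completed passes through the repetition automaton


def matches(data: bytes) -> bool:
    """True if some prefix of data matches the rule, starting at the first byte."""
    active = {Context("rep", 0)}
    for byte in data:
        following = set()
        for ctx in active:
            if ctx.machine == "rep":
                state = REP[ctx.state][byte]
                if state == REJECT:
                    continue
                if state == 1:                         # one pass finished
                    count = ctx.count + 1
                    if count < UPPER:                  # room for another pass
                        following.add(Context("rep", 0, count))
                    if count >= LOWER:                 # hand over to the successor
                        following.add(Context("tail", 0))
                else:
                    following.add(Context("rep", state, ctx.count))
            else:
                state = TAIL[ctx.state][byte]
                if state == 1:
                    return True                        # report a match
                if state != REJECT:
                    following.add(Context("tail", state))
        active = following
        if not active:
            return False
    return False


print(matches(b"a1cd"), matches(b"cd"), matches(b"a" * 31 + b"d"))
```

The two tables stay the same size for any repetition range; only the counter in the context changes, which is the memory that replaces the unrolled states.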

Claims (4)

1. A method for accelerating regular expression matching based on a deterministic finite automaton with memory, characterized in that it comprises a regular expression rule compiler and a pattern matching engine, wherein the rule compiler first converts a regular expression into a parse tree and then converts the parse tree into a nondeterministic finite automaton with memory and a deterministic finite automaton with memory, respectively, and the pattern matching engine uses the deterministic finite automaton with memory generated by the compiler to accelerate pattern matching.
2. The regular expression matching acceleration method based on a deterministic finite automaton with memory according to claim 1, characterized in that the regular expression rule compiler first converts the regular expression into a parse tree and then converts the parse tree into a nondeterministic finite automaton with memory and a deterministic finite automaton with memory, respectively: a context-free grammar for regular expressions is implemented with the Lex and Yacc software, the regular expression rule syntax is parsed, and during parsing a subtree of the corresponding node type is built for each grammar rule matched, finally forming a complete parse tree; on top of the operators supported by the Thompson algorithm, a repeat operator ("{n,m}") is added, whose nondeterministic automaton is identical to that of the repeated expression but carries an additional repetition-range parameter, and this extended algorithm is used to convert the parse tree into a nondeterministic finite automaton with memory; a nondeterministic finite automaton that contains no repeat operator is converted into a deterministic finite automaton by the ε-closure algorithm, whereas for a nondeterministic finite automaton that contains a repeat operator, the repeat operator is first replaced by a simple character carrying a special mark, the result is converted into a deterministic finite automaton by the ε-closure algorithm, and the replaced part is separately converted into another deterministic finite automaton by the ε-closure algorithm; the parse tree is generated by parsing the corresponding regular expression and is a tree in which each node has at most two branches, the non-leaf nodes being operators and the leaf nodes being characters or sets; the operators comprise the concatenation operator (" "), the alternation operator ("|"), the repeat operator ("{n,m}"), and the Kleene closure operator ("*"); a leaf node is a single character, a set, the complement of a set, a character representing a set, or a character representing the complement of a set, and the escape sequences for sets and their complements conform to the IEEE POSIX 1003.2 standard.
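The parse tree described in claim 2 can be illustrated with a minimal Python sketch. The `Node` layout, the kind names, and both helper functions are assumptions made for illustration and do not come from the patent. The sketch builds the tree for `a(b|c){2,3}` by hand and compares the number of leaves stored when the repeat operator is kept as a node with the number that would be needed if `{n,m}` were expanded into m copies of its operand.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Parse-tree node with at most two children (hypothetical layout)."""
    kind: str                                  # "char", "set", "nset", "cat", "alt", "star", "repeat"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[str] = None                # character or set contents, leaves only
    bounds: Optional[Tuple[int, int]] = None   # (n, m), repeat nodes only


def leaf_count(node: Optional[Node]) -> int:
    """Number of leaves actually stored in the tree."""
    if node is None:
        return 0
    if node.left is None and node.right is None:
        return 1
    return leaf_count(node.left) + leaf_count(node.right)


def unrolled_leaf_count(node: Optional[Node]) -> int:
    """Leaves needed if every {n,m} were expanded into m copies of its operand."""
    if node is None:
        return 0
    if node.kind == "repeat":
        return node.bounds[1] * unrolled_leaf_count(node.left)
    if node.left is None and node.right is None:
        return 1
    return unrolled_leaf_count(node.left) + unrolled_leaf_count(node.right)


# Hand-built tree for a(b|c){2,3}: the repeat node keeps its range as a parameter.
tree = Node("cat",
            left=Node("char", value="a"),
            right=Node("repeat",
                       left=Node("alt",
                                 left=Node("char", value="b"),
                                 right=Node("char", value="c")),
                       bounds=(2, 3)))

print(leaf_count(tree), unrolled_leaf_count(tree))
```

Keeping the range on the node means the tree size does not depend on n or m, which is the property the abstract credits for the smaller rule database.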
3. The regular expression matching acceleration method based on a deterministic finite automaton with memory according to claim 1, characterized in that the compiler takes a regular expression rule file as input and outputs a regular expression database: it first performs a syntax check on the rule file and converts it into nondeterministic finite automata with memory, then converts these into deterministic finite automata with memory, which are stored in the rule database.
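The conversion from a nondeterministic to a deterministic automaton named in the claims relies on the ε-closure algorithm. The following is a minimal sketch of the textbook subset construction, not the patent's compiler. The dictionary encoding of the automaton, the use of `None` as the ε label, and all function names are assumptions of this sketch, and the example automaton for `a(b|c)` contains no repeat operator.

```python
from collections import deque

EPS = None  # label used for epsilon transitions in this sketch (an assumption)


def eps_closure(nfa, states):
    """All states reachable from `states` using only epsilon transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(nfa, start, accepting):
    """Build a DFA whose states are epsilon-closed sets of NFA states."""
    start_set = eps_closure(nfa, {start})
    dfa, queue = {}, deque([start_set])
    while queue:
        cur = queue.popleft()
        if cur in dfa:
            continue
        dfa[cur] = {}
        symbols = {sym for s in cur for sym in nfa.get(s, {}) if sym is not EPS}
        for sym in sorted(symbols):
            move = {t for s in cur for t in nfa.get(s, {}).get(sym, ())}
            nxt = eps_closure(nfa, move)
            dfa[cur][sym] = nxt
            queue.append(nxt)
    dfa_accepting = {state for state in dfa if state & accepting}
    return start_set, dfa, dfa_accepting


def dfa_accepts(start, dfa, accepting, text):
    """Run the DFA over the whole of `text`."""
    state = start
    for ch in text:
        state = dfa[state].get(ch)
        if state is None:
            return False
    return state in accepting


# Thompson-style NFA for a(b|c); state 6 is accepting.
nfa = {
    0: {"a": {1}},
    1: {EPS: {2, 4}},
    2: {"b": {3}},
    4: {"c": {5}},
    3: {EPS: {6}},
    5: {EPS: {6}},
}
start, dfa, acc = subset_construction(nfa, 0, {6})
print(len(dfa), dfa_accepts(start, dfa, acc, "ab"))
```

For an automaton that does contain a repeat operator, claim 2 runs this kind of construction twice: once on the outer automaton with the repeated part replaced by a marked character, and once on the replaced part alone.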
4. The regular expression matching acceleration method based on a deterministic finite automaton with memory according to claim 1, characterized in that the pattern matching engine uses the deterministic finite automata with memory generated by the compiler to accelerate pattern matching: the pattern matching engine reads in the deterministic finite automata with memory generated by the compiler and the input string, matches the input string against each finite automaton, and saves the match position in a match context; when a finite automaton completes a match, the next action is determined by the type of that automaton: if the automaton contains no repeat operator and has no successor automaton, a match is reported; if the automaton contains no repeat operator but has a successor finite automaton, the greedy-mode option determines whether a match is reported, and a match context for the successor finite automaton is generated; if the automaton contains a repeat operator, the match count is incremented and the next action is determined by the match count: if the match count is below the lower limit of the repetition range, the match position is reset to the initial state of the automaton and matching continues; if the match count is above the upper limit of the repetition range, a match context for the successor automaton is generated or a match is reported; if the match count lies within the repetition range, the match position is reset to the initial state of the automaton and matching continues; the pattern matching engine can be implemented by a field-programmable gate array or an application-specific integrated circuit that implements the deterministic finite automata with memory in the regular expression database, accepts input data, and determines whether it matches any regular expression in the database.
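The counting behaviour in claim 4 can be sketched in software as a chain of sub-automata, each carrying a repetition range and a match counter. This is a simplified illustration under several assumptions: the `SubAutomaton` class and both functions are hypothetical, matching is anchored at the start of the input, repetition is greedy with no backtracking, every sub-automaton consumes at least one character, and reaching the upper bound is treated as the point where control passes to the successor, which is only one reading of the claim.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class SubAutomaton:
    """One DFA in the chain, with its repetition range (hypothetical layout)."""
    transitions: Dict[int, Dict[str, int]]
    start: int
    accepting: Set[int]
    lo: int = 1   # (1, 1) stands for "no repeat operator"
    hi: int = 1


def run_once(sub: SubAutomaton, text: str, pos: int) -> Optional[int]:
    """Run the DFA from its initial state; return the position after one match."""
    state = sub.start
    while pos < len(text):
        state = sub.transitions.get(state, {}).get(text[pos])
        if state is None:
            return None
        pos += 1
        if state in sub.accepting:
            return pos
    return None


def match_prefix(chain, text: str) -> Optional[int]:
    """Return the end position of a match anchored at 0, or None."""
    pos = 0
    for sub in chain:
        count = 0                      # the "memory": matches completed so far
        while count < sub.hi:          # below the upper bound: restart from the initial state
            nxt = run_once(sub, text, pos)
            if nxt is None:
                break
            pos, count = nxt, count + 1
        if count < sub.lo:             # lower bound not reached: the chain fails
            return None
    return pos                         # last automaton finished: report a match


# Chain for a(b|c){2,3}d, written out by hand.
chain = [
    SubAutomaton({0: {"a": 1}}, 0, {1}),
    SubAutomaton({0: {"b": 1, "c": 1}}, 0, {1}, lo=2, hi=3),
    SubAutomaton({0: {"d": 1}}, 0, {1}),
]
print(match_prefix(chain, "abcd"), match_prefix(chain, "abd"))
```

The repeated part `(b|c)` is stored once and reused, so the tables handed to the engine stay the same size whatever the values of n and m are.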
CN2007100710710A 2007-09-04 2007-09-04 Method for matching in speedup regular expression based on finite automaton containing memorization determination Expired - Fee Related CN101201836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2007100710710A CN101201836B (en) 2007-09-04 2007-09-04 Method for matching in speedup regular expression based on finite automaton containing memorization determination

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2007100710710A CN101201836B (en) 2007-09-04 2007-09-04 Method for matching in speedup regular expression based on finite automaton containing memorization determination

Publications (2)

Publication Number Publication Date
CN101201836A true CN101201836A (en) 2008-06-18
CN101201836B CN101201836B (en) 2010-04-14

Family

ID=39517005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007100710710A Expired - Fee Related CN101201836B (en) 2007-09-04 2007-09-04 Method for matching in speedup regular expression based on finite automaton containing memorization determination

Country Status (1)

Country Link
CN (1) CN101201836B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9426166B2 (en) * 2013-08-30 2016-08-23 Cavium, Inc. Method and apparatus for processing finite automata

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1877531A (en) * 2006-06-30 2006-12-13 浙江大学 Embedded compiled system scanner accomplishing method
CN101013441A (en) * 2007-02-12 2007-08-08 杭州华为三康技术有限公司 Method and apparatus for generating deterministic finite automaton and indexing method and directory system

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101634940B (en) * 2008-07-25 2012-07-04 苏州蜗牛数字科技股份有限公司 Method for developing computer games through scripts
CN101645069B (en) * 2008-08-04 2013-09-11 中国科学院计算机网络信息中心 Regular expression storage compacting method in multi-mode matching
CN101630323B (en) * 2009-08-20 2012-01-25 中国科学院计算技术研究所 Method for compressing space of deterministic automaton
CN101794295B (en) * 2010-01-06 2013-06-05 哈尔滨工程大学 Regular expression-oriented multi-mode matching hardware engine and generating method
US11488378B2 (en) 2010-06-10 2022-11-01 Micron Technology, Inc. Analyzing data using a hierarchical structure
CN101916259A (en) * 2010-07-06 2010-12-15 中国科学院计算技术研究所 Space compression method of state transition table of deterministic automaton
CN101916259B (en) * 2010-07-06 2012-07-11 中国科学院计算技术研究所 Space compression method of state transition table of deterministic automaton
CN103547998B (en) * 2011-01-25 2016-11-09 美光科技公司 For compiling the method and apparatus of regular expression
CN103547998A (en) * 2011-01-25 2014-01-29 美光科技公司 Method and apparatus for compiling regular expressions
US9916145B2 (en) 2011-01-25 2018-03-13 Micron Technology, Inc. Utilizing special purpose elements to implement a FSM
US10089086B2 (en) 2011-01-25 2018-10-02 Micron Technologies, Inc. Method and apparatus for compiling regular expressions
US9792097B2 (en) 2011-01-25 2017-10-17 Micron Technology, Inc. Method and apparatus for compiling regular expressions
CN102163221A (en) * 2011-04-02 2011-08-24 华为技术有限公司 Pattern matching method and device thereof
US9762544B2 (en) 2011-11-23 2017-09-12 Cavium, Inc. Reverse NFA generation and processing
CN104426911A (en) * 2013-08-30 2015-03-18 凯为公司 Method and apparatus for compilation of finite automata
US10466964B2 (en) 2013-08-30 2019-11-05 Cavium, Llc Engine architecture for processing finite automata
CN104426911B (en) * 2013-08-30 2018-03-23 凯为公司 Method and apparatus for compiling finite automata
US9785403B2 (en) 2013-08-30 2017-10-10 Cavium, Inc. Engine architecture for processing finite automata
US9823895B2 (en) 2013-08-30 2017-11-21 Cavium, Inc. Memory management for finite automata processing
CN104753916A (en) * 2013-12-30 2015-07-01 凯为公司 Method and apparatus for processing of finite automata
CN104753916B (en) * 2013-12-30 2018-06-05 凯为公司 For handling the method and apparatus of finite automata
CN104820666B (en) * 2014-01-31 2018-09-25 凯为公司 Finite automata processing based on stack top (TOS) memory
US9904630B2 (en) 2014-01-31 2018-02-27 Cavium, Inc. Finite automata processing based on a top of stack (TOS) memory
CN104820666A (en) * 2014-01-31 2015-08-05 凯为公司 Finite Automata Processing Based on a Top of Stack (TOS) Memory
US10002326B2 (en) 2014-04-14 2018-06-19 Cavium, Inc. Compilation of finite automata based on memory hierarchy
US10110558B2 (en) 2014-04-14 2018-10-23 Cavium, Inc. Processing of finite automata based on memory hierarchy
CN106919622A (en) * 2015-12-28 2017-07-04 伊姆西公司 For the method and apparatus of distributed data processing
CN105895091B (en) * 2016-04-06 2020-01-03 普强信息技术(北京)有限公司 ESWFST construction method
CN105895091A (en) * 2016-04-06 2016-08-24 普强信息技术(北京)有限公司 ESWFST construction method
CN107193776A (en) * 2017-05-24 2017-09-22 南京大学 A kind of new transfer algorithm for matching regular expressions
CN107729001A (en) * 2017-09-08 2018-02-23 北京京东尚科信息技术有限公司 A kind of expression processing method and apparatus
CN109977298A (en) * 2019-02-15 2019-07-05 中国科学院信息工程研究所 A method of extracting the accurate substring of longest from regular expression
CN110083626A (en) * 2019-03-29 2019-08-02 北京奇安信科技有限公司 Streaming events sequences match method and device
CN110865970A (en) * 2019-10-08 2020-03-06 西安交通大学 Compression flow pattern matching engine and pattern matching method based on FPGA platform
CN110865970B (en) * 2019-10-08 2021-06-29 西安交通大学 Compression flow pattern matching engine and pattern matching method based on FPGA platform
CN114492399A (en) * 2021-12-29 2022-05-13 国网天津市电力公司 Contract information extraction system and method based on regular expression
CN115292558A (en) * 2022-08-12 2022-11-04 苏州浪潮智能科技有限公司 Regular expression-based pattern matching method, system, storage medium and equipment
CN115292558B (en) * 2022-08-12 2024-01-26 苏州浪潮智能科技有限公司 Regular expression-based pattern matching method, system, storage medium and equipment

Also Published As

Publication number Publication date
CN101201836B (en) 2010-04-14

Similar Documents

Publication Publication Date Title
CN101201836B (en) Method for matching in speedup regular expression based on finite automaton containing memorization determination
CN109445834B (en) Program code similarity rapid comparison method based on abstract syntax tree
CN102857493B (en) Content filtering method and device
CN107292170B (en) Method, device and system for detecting SQL injection attack
Tu et al. Efficient building and placing of gating functions
US20190317879A1 (en) Deep learning for software defect identification
US11816493B2 (en) Methods and systems for representing processing resources
CN109522225B (en) Automatic test assertion method and device, test platform and storage medium
CN109376866B (en) Method and device for recording metadata and method and device for running quantum program
US20200065160A1 (en) Automated api evaluation based on api parameter resolution
CN101980546B (en) Intelligent network platform, service execution method and method for analyzing service abnormality
Medeiros et al. From regexes to parsing expression grammars
US10891117B2 (en) Method and system for using subroutine graphs for formal language processing
CN111427940A (en) Self-adaptive database conversion method and device
CN112148343A (en) Rule issuing method and device and terminal equipment
Huang et al. Answering regular path queries on workflow provenance
KR101985309B1 (en) Method of creating the balanced parse tree having optimized height
CN116382815B (en) Contract parallelization method based on DAG model
Kazmierczak et al. An optimal distributed ear decomposition algorithm with applications to biconnectivity and outerplanarity testing
CN112799673B (en) Network protocol data checking method and device
CN107193623A (en) The hardware circuit Compilation Method and compiler of a kind of new quick regular expression are realized
CN113849781A (en) Go language source code obfuscation method, system, terminal and storage medium
Borsotti et al. Fast deterministic parsers for transition networks
Chen et al. On distributed computing systems reliability analysis under program execution constraints
Haider An ECMAScript 2015-Compliant Automata-based Regular Expression Engine for Graal.js

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100414

Termination date: 20120904