CN109977275A - Regular expression DFA space compression method and system - Google Patents
- Publication number
- CN109977275A (application number CN201910134200.9A)
- Authority
- CN
- China
- Prior art keywords
- state
- similarity
- weight
- dfa
- regular expression
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The present invention relates to a regular expression DFA space compression method and system. The method comprises: scanning the state transition edges, reading in the transition matrix, and storing the values of each column of the transition matrix as a character string; merging the alphabet elements whose character strings are equal, to obtain the hash tables of the equivalence classes; calculating, in the state machine of the regular expression, the similarity between each pair of states and the weight corresponding to that similarity, and constructing a similarity matrix from the similarities; updating the similarity matrix according to the state depth of each state point in the state machine; traversing from each state node in the directed graph corresponding to the similarity matrix and selecting the maximum-similarity transition edge of that state node to complete the construction of the default paths, and saving the default paths as a maximum spanning tree; and using the weight corresponding to the similarity and the corresponding hash table to find each default path, deleting the corresponding state transitions in the DFA before compression, and adding to the state transition path before deletion a default path whose weight is the default path weight.
Description
Technical field
The present invention relates to the field of network security, and in particular to a regular expression DFA space compression method and system.
Background
With the rapid development and wide use of computer networks, Internet security problems have inevitably come to the fore. Network traffic and services are increasingly diverse, and new intrusion methods and attack patterns keep emerging. Deep packet inspection is therefore widely used in all kinds of network services: it analyzes the content of network packets and matches the content carried in a packet (the content analysis result) against a preset pattern set that characterizes virus signatures, to determine whether the packet carries illegal content. DFA-based regular expression matching is widely used because it can process data streams in real time at high speed and performs well on large-scale pattern matching.
A DFA (deterministic finite automaton) is a mathematical model made up of five elements (S, Σ, δ, S0, F). In DFA-based regular expression matching, the number of DFA states is closely tied to the rule set, so the huge and ever-growing number of rule-set expressions makes the number of DFA states grow sharply, leading to state explosion. In addition, the wildcards and length restrictions that appear in regular expressions further increase the space occupied by the DFA, and current hardware cannot meet the resulting storage requirements. Finally, DFA-based matching needs to keep track of all possible matches, which in turn challenges the speed and practical performance of the matching process.
DFA space compression reduces the occupied space to an acceptable level and avoids state explosion, so that high pattern matching speed is achieved with a small amount of memory; it is the key technology that makes DFAs suitable for fast regular expression pattern matching in network applications. Because regular expression matching in network applications generally uses the extended ASCII character set as the alphabet Σ, whose size is 256, every node has 256 outgoing edges. Large state transition diagrams therefore contain a large amount of duplicate information, i.e., the DFA contains many redundant edges. Existing state transition compression methods struggle to balance two requirements: bounding the default path length to guarantee the worst-case matching performance of the algorithm, and avoiding the loss of matching speed and performance caused by the lack of any specific relationship between states after compression.
Summary of the invention
To overcome the problem that DFA state explosion during DFA-based regular expression matching makes the required memory unmanageable, the invention proposes a regular expression DFA space compression method that works on two fronts, alphabet compression to eliminate redundant edges and DFA state transition edge compression, and thereby solves the problem of DFA memory space explosion in regular expression matching. Every redundant edge is a state transition edge, but not every state transition edge is redundant. Removing redundant edges does not change how the DFA represents the regular expression, so they can be removed to reduce the space occupied by the DFA and thereby the time needed to match regular expressions.
Specifically, the invention discloses a regular expression DFA space compression method, characterized by comprising:
Step 1: scanning the state transition edges of the regular expression, reading in the transition matrix, and storing the values of each column of the transition matrix as a character string;
Step 2: merging the alphabet elements whose character strings in the two-dimensional array (the transition matrix) are equal, to obtain a hash table of multiple equivalence classes;
Step 3: using the state transition edge compression algorithm (given as pseudocode), calculating the similarity between each pair of states in the state machine of the regular expression and the weight corresponding to that similarity, and constructing a similarity matrix from the similarities;
Step 4: updating the similarity matrix according to the state depth of each state point in the state machine;
Step 5: calculating the character set corresponding to the similarity between each pair of states;
Step 6: traversing from each state node in the directed graph corresponding to the similarity matrix, selecting the maximum-similarity transition edge of each state node to complete the construction of the default paths, and saving the default paths as a maximum spanning tree;
Step 7: using the maximum spanning tree, finding the weight corresponding to the similarity of each default path and the corresponding hash table, deleting the corresponding state transitions in the DFA before compression, and adding to the state transition path before deletion a default path whose weight is the default path weight, thereby simplifying and compressing the storage of the original DFA.
In the regular expression DFA space compression method, step 4 includes: step 41, updating to -1 the similarity value from each low-depth state point to each high-depth state point in the state machine.
In the regular expression DFA space compression method, step 1 includes: step 11, creating a two-dimensional array for storing state transition edge information according to the alphabet of the regular expression's state machine and the number of states of the state machine.
In the regular expression DFA space compression method, step 7 includes: step 71, according to the weight corresponding to the similarity of each default path and the corresponding alphabet, finding a default path from state node A to state node B, where the weight of the edge from A to B is resembleDegree[i][j]; deleting, starting from state node A, the transition edges for the characters in the character set linkedAlphabet[i][j] that corresponds to the similarity weight resembleDegree[i][j] of that default path; and then adding to the original state transition path a default path whose weight is defaultWeight[i][j], the weight corresponding to the default path similarity resembleDegree[i][j].
In the regular expression DFA space compression method, the state transition path includes multiple state transition edges.
The invention also discloses a regular expression DFA space compression system, comprising:
Module 1: scans the state transition edges of the regular expression, reads in the transition matrix, and stores the values of each column of the transition matrix as a character string;
Module 2: merges the alphabet elements whose character strings in the two-dimensional array (the transition matrix) are equal, to obtain a hash table of multiple equivalence classes;
Module 3: uses the state transition edge compression algorithm (given as pseudocode) to calculate the similarity between each pair of states in the state machine of the regular expression and the weight corresponding to that similarity, and constructs a similarity matrix from the similarities;
Module 4: updates the similarity matrix according to the state depth of each state point in the state machine;
Module 5: calculates the character set corresponding to the similarity between each pair of states;
Module 6: traverses from each state node in the directed graph corresponding to the similarity matrix, selects the maximum-similarity transition edge of each state node to complete the construction of the default paths, and saves the default paths as a maximum spanning tree;
Module 7: uses the maximum spanning tree to find the weight corresponding to the similarity of each default path and the corresponding hash table, deletes the corresponding state transitions in the DFA before compression, and adds to the state transition path before deletion a default path whose weight is the default path weight, thereby simplifying and compressing the storage of the original DFA.
In the regular expression DFA space compression system, module 4 includes: module 41, which updates to -1 the similarity value from each low-depth state point to each high-depth state point in the state machine.
In the regular expression DFA space compression system, module 1 includes: module 11, which creates a two-dimensional array for storing state transition edge information according to the alphabet of the regular expression's state machine and the number of states of the state machine.
In the regular expression DFA space compression system, module 7 includes: module 71, which, according to the weight corresponding to the similarity of each default path and the corresponding alphabet, finds a default path from state node A to state node B, where the weight of the edge from A to B is resembleDegree[i][j]; deletes, starting from state node A, the transition edges for the characters in the character set linkedAlphabet[i][j] that corresponds to the similarity weight resembleDegree[i][j] of that default path; and then adds to the original state transition path a default path whose weight is defaultWeight[i][j], the weight corresponding to the default path similarity resembleDegree[i][j].
In the regular expression DFA space compression system, the state transition path includes multiple state transition edges.
Beneficial effects of the invention: the DFA memory space compression method proposed by the invention uses the equivalence class method to split the alphabet into smaller alphabet sets, and constructs default paths to compress the information on the state transition edges. This reduces the space consumed when storing and matching the DFA, so that the memory footprint drops sharply and efficiency is greatly improved. After the DFA has been compressed in groups, the invention can still quickly determine whether traffic carries a virus, because virus packets often correspond to fixed DFA expressions; the main purpose of the invention is to speed up detection with DFA expressions.
Description of the drawings
Fig. 1 is a flowchart of one embodiment of the DFA memory space compression method based on alphabet compression and state transition edge compression according to the present invention;
Fig. 2 is a schematic diagram of state depth;
Fig. 3 is a structure diagram of the DFA before transition edge compression;
Fig. 4 is a structure diagram of the DDFA after transition edge compression.
Detailed description of embodiments
To overcome the problem that DFA state explosion during DFA-based regular expression matching makes the required memory unmanageable, the invention proposes a DFA compression method that solves the DFA memory explosion problem while preserving the correctness of DFA pattern matching. It consists mainly of two submethods: DFA alphabet partitioning and DFA state transition edge compression.
A regular expression state machine records its state transition rules in a two-dimensional array whose two dimensions are the alphabet Σ and the number of states of the state machine. The DFA alphabet partitioning submethod scans the state transition edges of the regular expression once (each regular expression can be represented as a regular expression state machine) and divides the elements of the alphabet Σ into groups, where all characters in the same group are represented by one identical character. Finding all equivalence classes of Σ in this way compresses the DFA and reduces the memory needed to store it. It includes the following steps:
Step S110: create an M*N two-dimensional array to store information about the state transition edges (where M is the number of alphabet elements of the state machine and N is the number of DFA states);
Step S120: scan the state transition edges once, i.e., read in the transition matrix, and store the values of each column as a character string;
Step S130: read the character string corresponding to each letter of the alphabet in turn, compare the strings, and mark whether the letters can be merged. If two input characters yield identical character strings, i.e., every state in the transition matrix moves to the same state on either character, the characters are equivalent and can be placed in one equivalence class; this is the hashing process;
Step S140: merge the alphabet elements that carry the same mergeable mark to obtain the partitioned character table, i.e., a hash table that maps the 256 ASCII characters onto multiple equivalence classes. Hashing is a mapping technique, and dividing the alphabet into multiple equivalence classes is exactly such a hashing process.
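Steps S110 to S140 can be sketched in Python as follows. This is a minimal illustration, assuming the DFA is given as a list of rows `delta[s][c]` (the next state of state `s` on character code `c`); the function name `partition_alphabet` and the comma-joined column key are illustrative choices rather than details from the patent, and a Python dict stands in for the hash table.

```python
def partition_alphabet(delta):
    """Split the alphabet into equivalence classes (sketch of steps S110-S140).

    delta[s][c] is the next state of state s on character code c.
    Returns (char_to_class, compressed), where compressed[s][k] is the next
    state of s on any character of class k.
    """
    num_states = len(delta)
    alphabet_size = len(delta[0])
    class_of_column = {}              # hash table: column string -> class id
    char_to_class = [0] * alphabet_size
    for c in range(alphabet_size):
        # S120: the column of character c, stored as a string such as "0,2,0".
        column = ",".join(str(delta[s][c]) for s in range(num_states))
        # S130/S140: equal strings mean equivalent characters, so they share a class.
        if column not in class_of_column:
            class_of_column[column] = len(class_of_column)
        char_to_class[c] = class_of_column[column]
    # One column per equivalence class instead of one per character.
    num_classes = len(class_of_column)
    compressed = [[0] * num_classes for _ in range(num_states)]
    for c in range(alphabet_size):
        for s in range(num_states):
            compressed[s][char_to_class[c]] = delta[s][c]
    return char_to_class, compressed


# Toy DFA for the substring "ab" over 256 character codes (state 2 = match).
A, B = ord("a"), ord("b")
delta = [[0] * 256 for _ in range(3)]
for s in range(3):
    delta[s][A] = 1                   # every state moves to 1 on 'a'
delta[1][B] = 2                       # 'b' right after 'a' reaches state 2
char_to_class, table = partition_alphabet(delta)
print(len(table[0]))                  # 3 classes: {'a'}, {'b'}, all other codes
```

At match time a character is first translated through `char_to_class` and then looked up in the narrower table, so each state stores one entry per class instead of 256.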
The alphabet compression steps above yield the equivalence classes into which the alphabet Σ of the regular expression state machine is divided, which reduces the out-degree of each node from 256 to the number of equivalence classes; the hash table above serves as the input to the DFA state transition edge compression submethod. The equivalence class partition of the alphabet set, set[alpha], is the concrete form of the hash table in the alphabet partitioning algorithm.
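A rough entry count shows what this out-degree reduction buys. Assuming one table cell per transition and ignoring the width of each cell, with N states and k equivalence classes the full table is replaced by the 256-entry character-to-class hash table plus a table with one column per class:

$$
\underbrace{256\,N}_{\text{original table}} \;\longrightarrow\; \underbrace{256}_{\text{character-to-class table}} \;+\; \underbrace{k\,N}_{\text{compressed table}}
$$

For instance, with the hypothetical values N = 1000 and k = 20, the count drops from 256 000 to 256 + 20 000 = 20 256 entries.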
The DFA state transition edge compression submethod constructs default transition paths from the maximum spanning tree of a directed graph, reducing the number of state transition edges and therefore the memory space of the DFA. The specific steps are as follows:
Step S210: using the state transition edge compression algorithm (given as pseudocode), calculate the similarity between every pair of DFA states and the weight corresponding to that similarity. (The similarity between two states indicates how many edges could be saved by merging them and prepares for building the spanning tree below; the weight prepares for the weight of a possible default path.)
Step S220: preprocess the similarity matrix according to state depth. To construct a directed graph in which higher-depth nodes point to lower-depth nodes, set to -1 the similarity value from each low-depth state point (node) to each high-depth state point in the similarity matrix. This ensures that every default path points from a state far from the initial state to a state closer to it, so that the worst-case matching performance of the algorithm is guaranteed regardless of default path length. Depth here means the depth of a node in the state transition diagram, a term customary in the art: as shown in Fig. 2, the initial state S6 has depth 1, S2 has depth 2, and the final state S7 has depth 4.
Step S230: calculate the alphabet set corresponding to the similarity between each pair of states;
Step S240: compute the maximum spanning tree of the directed graph corresponding to the similarity matrix. Using a greedy approach, traverse from each state node of that directed graph and select its maximum transition edge, i.e., the transition edge with the largest similarity for that state node, to complete the construction of the default paths. The maximum spanning tree is made up of default paths; several candidate default paths may exist, and any one of them can be chosen as the result.
Step S250: using the maximum spanning tree of the directed graph obtained in step S240, find the weight corresponding to the similarity of each selected default path and the corresponding alphabet, delete the corresponding state transitions in the DFA before compression, and add to the original state transition path a default path whose weight is the default path weight (similarity), thereby simplifying and compressing the storage of the original DFA.
The result of the above steps is the compressed DFA state storage; at this point one layer of DFA memory space compression is complete.
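The depth preprocessing of step S220 can be sketched in Python as below. This is a minimal sketch under stated assumptions: depth is taken as breadth-first distance from the initial state plus one (which reproduces the numbering of Fig. 2), every state is assumed reachable, and `state_depths` and `mask_by_depth` are hypothetical helper names. `resemble[i][j]` plays the role of the similarity of a default edge from state i to state j.

```python
from collections import deque

def state_depths(delta, start=0):
    """Breadth-first depth of each state; the start state has depth 1.

    Assumes every state is reachable from start.
    """
    depth = [0] * len(delta)          # 0 marks "not yet visited"
    depth[start] = 1
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in delta[s]:            # successors of s on every input character
            if depth[t] == 0:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth

def mask_by_depth(resemble, depth):
    """Step S220: forbid default edges from a shallower state to a deeper one."""
    n = len(resemble)
    for i in range(n):
        for j in range(n):
            if depth[i] < depth[j]:
                resemble[i][j] = -1
    return resemble


# Chain 0 -> 1 -> 2 over a two-character alphabet; the other input returns to 0.
delta = [[1, 0], [2, 0], [2, 0]]
depth = state_depths(delta)
print(depth)                          # [1, 2, 3]
resemble = mask_by_depth([[5] * 3 for _ in range(3)], depth)
print(resemble[0][1], resemble[1][0]) # -1 5
```

After masking, a default edge can only lead to a state of equal or smaller depth, which is what keeps default chains pointed toward the initial state.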
To make the above features and effects of the invention clearer and easier to understand, an embodiment is described in detail below with reference to the accompanying drawings.
Before describing the embodiment, we first define the concepts it uses. These definitions serve only to explain one embodiment of the invention and do not limit the invention or this embodiment:
(1) Equivalent elements: let $C_i$ and $C_j$ be two different elements of the alphabet $\Sigma$. If $\delta(u, C_i) = \delta(u, C_j)$ for every state $u \in Q$, the input characters $C_i$ and $C_j$ are called equivalent elements. Here $Q$ is the state set of the finite automaton five-tuple and $\delta$ is the state transition function; in other words, from every state $u$, input $C_i$ and input $C_j$ lead to the same state.
(2) Equivalence class: if $\delta(u, C_i) = \delta(u, C_j)$ for all characters $C_i, C_j \in \Sigma_i$ and all states $u \in Q$, the set $\Sigma_i \subseteq \Sigma$ is called an equivalence class of the alphabet $\Sigma$.
(3) Similar: for any two different states $u, v$ in the state set and any two different characters $a, b$ in the alphabet, if there exists an integer $W$ such that $\delta(u, a) = \delta(v, a) + W$ and $\delta(u, b) = \delta(v, b) + W$, the input characters $a$ and $b$ are called similar at states $u$ and $v$.
(4) Similar partition: for two different states $u, v$ in the state set, if there exist subsets $\Sigma_1, \Sigma_2, \ldots, \Sigma_n$ of the alphabet $\Sigma$ with $\bigcup_i \Sigma_i = \Sigma$ and $\Sigma_i \cap \Sigma_j = \emptyset$ for all $i \neq j$, such that $\delta(u, a) = \delta(v, a) + W_i$ for every $a \in \Sigma_i$, and $W_i \neq W_j$ for all $i \neq j$, then $\Sigma_1, \Sigma_2, \ldots, \Sigma_n$ is called the similar partition of $\Sigma$ at states $u$ and $v$.
(5) Similarity: the number of elements in the largest subset of the similar partition of states $u$ and $v$, i.e. $\max_i |\Sigma_i|$, is the similarity between states $u$ and $v$.
(6) Default transition and default transition weight: if the state transitions of state $u$ on the characters in $\Sigma_i$ are removed and one state transition from $u$ to state $v$ is added, the newly added transition is called the default transition from $u$ to $v$. $W_i$ is called the default transition weight from $u$ to $v$, or simply the weight from $u$ to $v$, denoted $W_{uv}$.
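Definitions (4) to (6) translate directly into a grouping by offset. The Python sketch below is illustrative only: `row_u` and `row_v` are assumed to be the transition rows of two states, and `similar_partition` and `similarity` are hypothetical names. Grouping characters by $W = \delta(u,a) - \delta(v,a)$ gives disjoint subsets that cover $\Sigma$ and have pairwise different offsets, which is exactly the similar partition; the size of the largest group, its offset and its characters correspond to the per-pair entries resembleDegree, defaultWeight and linkedAlphabet used in Table 2. Ties between equally large groups go to the first group found, which is an assumption of this sketch.

```python
from collections import defaultdict

def similar_partition(row_u, row_v):
    """Definition (4): group characters a by the offset W = delta(u,a) - delta(v,a)."""
    groups = defaultdict(list)
    for a, (tu, tv) in enumerate(zip(row_u, row_v)):
        groups[tu - tv].append(a)
    return dict(groups)               # offset W_i -> subset Sigma_i

def similarity(row_u, row_v):
    """Definitions (5)-(6): size of the largest subset, its weight, and its characters."""
    groups = similar_partition(row_u, row_v)
    weight, chars = max(groups.items(), key=lambda item: len(item[1]))
    return len(chars), weight, chars


# Rows of two hypothetical states over a four-character alphabet {0, 1, 2, 3}.
row_u = [3, 3, 5, 7]
row_v = [1, 1, 3, 2]
print(similar_partition(row_u, row_v))  # {2: [0, 1, 2], 5: [3]}
print(similarity(row_u, row_v))         # (3, 2, [0, 1, 2])
```

Here a default transition from u to v with weight 2 would replace the three explicit transitions of u on characters 0, 1 and 2.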
To overcome the space explosion caused by the DFA during regular expression matching, the invention proposes a DFA memory space compression method based on alphabet compression and state transition edge compression, which greatly reduces memory space and addresses DFA matching performance while preserving the correctness of DFA pattern matching. Fig. 1 is a flowchart of one embodiment of this method based on alphabet and state transition edge compression; it consists mainly of two submethods: alphabet compression and DFA state transition edge compression.
The alphabet compression submethod assigns the mutually equivalent elements of the alphabet of the DFA used for pattern matching to the same equivalence class, eliminating redundancy in the alphabet; this reduces memory consumption when storing the state machine and alleviates DFA memory explosion to a certain extent. The method includes the steps shown in Table 1:
Step S110: create a 256*N two-dimensional array to store information about the state transition edges (where N is the number of DFA states);
Step S120: scan the state transition edges once, i.e., read in the transition matrix, and store the values of each column as a character string;
Step S130: read the character string corresponding to each letter of the alphabet in turn, compare the strings, and mark whether the letters can be merged;
Step S140: merge the alphabet elements that carry the same mergeable mark to obtain the partitioned character table, i.e., a hash table that maps the 256 ASCII characters onto multiple equivalence classes.
Table 1: Alphabet compression method
After the above steps, the equivalence class partition set[alpha] of the state machine's alphabet is obtained, which eliminates the redundancy in the character dimension of the transition matrix.
The DFA state transition edge compression submethod constructs default transition paths from the maximum spanning tree of a directed graph, reducing the number of state transition edges and therefore the memory space of the DFA. The submethod includes the steps shown in Table 2:
Step S210: calculate, over the state set, the similarity matrix resembleDegree[n][n] between each pair of states and the weight matrix defaultWeight[n][n] corresponding to those similarities.
Step S220: in resembleDegree[n][n], set to -1 the similarity value from each low-depth state node to each high-depth state node;
Step S230: calculate the alphabet set (character set) linkedAlphabet[n][n] corresponding to the similarity between each pair of states;
Step S240: compute the maximum spanning tree of the directed graph corresponding to the similarity matrix. Using a greedy approach, traverse from each state node of that directed graph and select its maximum transition edge, i.e., the transition edge with the largest similarity for that state node, to complete the construction of the default paths.
Step S250: using the maximum spanning tree of the directed graph obtained in step S240, find the weight corresponding to the similarity of each selected default path and the corresponding alphabet. Suppose there is a default path from state A to state B, and the weight of the edge from A to B is resembleDegree[i][j]. Delete from the DFA before compression the transition edges that start at state A for the characters in linkedAlphabet[i][j], the character set corresponding to the similarity weight resembleDegree[i][j] of that default path; then add to the original state transition path a default path whose weight is defaultWeight[i][j], the weight corresponding to the default path similarity resembleDegree[i][j]. This simplifies and compresses the storage of the original DFA.
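Putting steps S210 to S250 together gives the compact Python sketch below. It assumes the DFA is a list of rows `delta[s][a]` with state 0 as the initial state; `compress` and `lookup` are hypothetical names. Two choices are ours, not the patent's: ties go to the first candidate found, and an edge between two states of equal depth (which step S220 leaves in place) is accepted only toward the lower-numbered state, so default chains cannot form a cycle. With that rule no cycle ever needs breaking, so letting every state pick its best allowed outgoing edge independently already yields a maximum-weight spanning forest over the allowed edges.

```python
from collections import defaultdict, deque

def compress(delta, start=0):
    """Sketch of steps S210-S250; delta[s][a] is the next state of s on character a."""
    n = len(delta)
    # State depth by BFS from the start state (depth 1); assumes all states reachable.
    depth = [0] * n
    depth[start] = 1
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in delta[s]:
            if depth[t] == 0:
                depth[t] = depth[s] + 1
                queue.append(t)
    # S210 and S230: resembleDegree, defaultWeight and linkedAlphabet for each pair.
    resemble = [[-1] * n for _ in range(n)]
    weight = [[0] * n for _ in range(n)]
    linked = [[[] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            groups = defaultdict(list)
            for a, (ti, tj) in enumerate(zip(delta[i], delta[j])):
                groups[ti - tj].append(a)
            w, chars = max(groups.items(), key=lambda kv: len(kv[1]))
            resemble[i][j], weight[i][j], linked[i][j] = len(chars), w, chars
    # S220: no default edge from a shallower state to a deeper one.
    for i in range(n):
        for j in range(n):
            if depth[i] < depth[j]:
                resemble[i][j] = -1
    # S240: each state keeps its most similar allowed target as its default.
    # Assumption: an equal-depth edge may only point to a lower-numbered state,
    # so following defaults strictly decreases (depth, index) and cannot cycle.
    default = [None] * n
    for i in range(n):
        best_j, best_sim = None, 0
        for j in range(n):
            if depth[j] == depth[i] and j > i:
                continue
            if resemble[i][j] > best_sim:
                best_j, best_sim = j, resemble[i][j]
        if best_j is not None:
            default[i] = (best_j, weight[i][best_j])
    # S250: drop the transitions covered by the default edge, keep the rest.
    explicit = []
    for i in range(n):
        covered = set(linked[i][default[i][0]]) if default[i] else set()
        explicit.append({a: t for a, t in enumerate(delta[i]) if a not in covered})
    return explicit, default

def lookup(explicit, default, state, a):
    """Recover delta(state, a) via delta(u, a) = delta(v, a) + W_uv along defaults."""
    offset = 0
    while a not in explicit[state]:
        state, w = default[state]
        offset += w
    return explicit[state][a] + offset


# Toy DFA for the substring "ab" over 256 character codes (state 2 = match).
delta = [[0] * 256 for _ in range(3)]
for s in range(3):
    delta[s][ord("a")] = 1
delta[1][ord("b")] = 2
explicit, default = compress(delta)
print(sum(len(e) for e in explicit))  # 257 explicit edges instead of 768
```

Default edges consume no input: `lookup` keeps the current character, hops along defaults and adds each edge's weight, following definition (6).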
Fig. 3 shows the DFA structure before transition edge compression. It has three states, U, V, and W; states U and V both move to the same state W after receiving the same input characters. Using the concept of a "default path", the state transition path between U and W is removed and a "default path" is added between U and V, as shown in Fig. 4, where the DDFA diagram shows the constructed default path. This path carries no input character, meaning that it is taken when the input character matches none of the labels on the other edges. For example, when U receives input a or b, it finds no edge labeled a or b, so it can only take the default path to V, where it matches the required a or b character again and reaches W.
Table 2: DFA state transition edge compression method
The result of the above steps is the compressed DFA state storage; at this point one layer of DFA memory space compression is complete.
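The Fig. 3 and Fig. 4 walk-through can be mirrored in a few lines of Python. The exact edges of the figures are not reproduced in the text, so the transition tables below are hypothetical, and weights are omitted (taken as 0) because the example only involves states that share a target state.

```python
# Hypothetical compressed DFA in the spirit of Fig. 4 (states U, V, W; inputs a, b, c).
explicit = {
    "U": {"c": "U"},                   # only the edge where U differs from V is kept
    "V": {"a": "W", "b": "W", "c": "V"},
    "W": {"a": "W", "b": "W", "c": "W"},
}
default = {"U": "V"}                   # the unlabeled default path from U to V

def step(state, ch):
    """Take default paths (consuming no input) until an explicit edge matches ch."""
    while ch not in explicit[state]:
        state = default[state]
    return explicit[state][ch]

print(step("U", "a"), step("U", "c"))  # W U
```

On input a, U has no explicit edge, so it follows the default path to V and then matches a to reach W, exactly as described for Fig. 4.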
The following is a system embodiment corresponding to the method embodiment above; the two embodiments can be implemented in cooperation with each other. Technical details mentioned in the method embodiment remain valid in this embodiment and are not repeated here; likewise, technical details mentioned in this embodiment also apply to the method embodiment.
The invention also discloses a regular expression DFA space compression system, comprising:
Module 1: scans the state transition edges of the regular expression, reads in the transition matrix, and stores the values of each column of the transition matrix as a character string;
Module 2: merges the alphabet elements whose character strings in the two-dimensional array (the transition matrix) are equal, to obtain a hash table of multiple equivalence classes;
Module 3: uses the state transition edge compression algorithm (given as pseudocode) to calculate the similarity between each pair of states in the state machine of the regular expression and the weight corresponding to that similarity, and constructs a similarity matrix from the similarities;
Module 4: updates the similarity matrix according to the state depth of each state point in the state machine;
Module 5: calculates the character set corresponding to the similarity between each pair of states;
Module 6: traverses from each state node in the directed graph corresponding to the similarity matrix, selects the maximum-similarity transition edge of each state node to complete the construction of the default paths, and saves the default paths as a maximum spanning tree;
Module 7: uses the maximum spanning tree to find the weight corresponding to the similarity of each default path and the corresponding hash table, deletes the corresponding state transitions in the DFA before compression, and adds to the state transition path before deletion a default path whose weight is the default path weight, thereby simplifying and compressing the storage of the original DFA.
In the regular expression DFA space compression system, module 4 includes: module 41, which updates to -1 the similarity value from each low-depth state point to each high-depth state point in the state machine.
In the regular expression DFA space compression system, module 1 includes: module 11, which creates a two-dimensional array for storing state transition edge information according to the alphabet of the regular expression's state machine and the number of states of the state machine.
In the regular expression DFA space compression system, module 7 includes: module 71, which, according to the weight corresponding to the similarity of each default path and the corresponding alphabet, finds a default path from state node A to state node B, where the weight of the edge from A to B is resembleDegree[i][j]; deletes, starting from state node A, the transition edges for the characters in the character set linkedAlphabet[i][j] that corresponds to the similarity weight resembleDegree[i][j] of that default path; and then adds to the original state transition path a default path whose weight is defaultWeight[i][j], the weight corresponding to the default path similarity resembleDegree[i][j].
In the regular expression DFA space compression system, the state transition path includes multiple state transition edges.
Claims (10)
1. a kind of regular expression DFA space compression method characterized by comprising
Step 1 is scanned by the state transfer side to regular expression, reads in transfer matrix, will be every in the transfer matrix
The value of one column is stored with character string forms;
Step 2 merges the corresponding alphabet element of character string equal in the two-dimensional array, obtains the Kazakhstan of multiple equivalence classes
Uncommon table;
Step 3 shifts side compression algorithm by pseudocode state, calculates in the state machine of the regular expression between state two-by-two
Similarity weight corresponding with the similarity constructs similarity matrix by the similarity;
Step 4, according to the state depth of state point each in the state machine, update the similarity matrix;
Step 5, the corresponding character set of the similarity being calculated between state two-by-two;
Step 6 is begun stepping through from each state node in the corresponding digraph of the similarity matrix, chooses the state node pair
The maximum similarity transfer side answered saves the default path as maximum spanning tree to complete the construction of default path;
Step 7, using the maximum spanning tree, find the corresponding weight of similarity and the corresponding Hash of the default path
Table is deleted in the state transfer before compression in DFA, and it is silent for increasing a weight in the state transition path before deletion
The default path of routine weight value is recognized, to complete to carry out original DFA storage to simplify compression.
2. The regular expression DFA space compression method according to claim 1, characterized in that step 4 includes: step 41, updating to -1 the similarity value from a lower-depth state to a higher-depth state in the state machine.
3. The regular expression DFA space compression method according to claim 1, characterized in that step 1 includes: step 11, creating, according to the alphabet of the regular expression and the number of states of the state machine, a two-dimensional array for storing the state transition edge information.
4. The regular expression DFA space compression method according to claim 1, characterized in that step 7 includes: step 71, finding, according to the similarity weight of the default path and the corresponding alphabet, the default path along which state node A points to state node B, the weight of the edge from state node A to state node B being resembleDegree[i][j]; deleting, starting from state node A, the transition edges for the characters in the character set linkedAlphabet[i][j] that corresponds to the similarity weight resembleDegree[i][j] of that default path; and then adding to the original state transition path a default path whose weight is the corresponding weight defaultWeight[i][j] under the default path similarity resembleDegree[i][j].
5. The regular expression DFA space compression method according to claim 4, characterized in that the state transition path includes multiple state transition edges.
6. A regular expression DFA space compression system, characterized by comprising:
Module 1: scanning the state transition edges of the regular expression, reading in a transition matrix, and storing the values of each column of the transition matrix as a string;
Module 2: merging the alphabet elements whose strings in the two-dimensional array are equal, to obtain a hash table of multiple equivalence classes;
Module 3: using the state transition edge compression algorithm given in pseudocode, calculating the similarity between every pair of states in the state machine of the regular expression and the weight corresponding to that similarity, and constructing a similarity matrix from the similarities;
Module 4: updating the similarity matrix according to the depth of each state in the state machine;
Module 5: calculating the character set corresponding to the similarity between every pair of states;
Module 6: traversing, starting from each state node, the directed graph corresponding to the similarity matrix, selecting the maximum-similarity transition edge for that state node, and saving the default paths as a maximum spanning tree, to complete the construction of the default paths;
Module 7: using the maximum spanning tree, finding the similarity weights of the default paths and the corresponding hash table, deleting the corresponding state transitions of the DFA before compression, and adding to the state transition path before deletion a default path carrying the default weight, so as to simplify and compress the storage of the original DFA.
7. The regular expression DFA space compression system according to claim 6, characterized in that module 4 includes: module 41, updating to -1 the similarity value from a lower-depth state to a higher-depth state in the state machine.
8. The regular expression DFA space compression system according to claim 6, characterized in that module 1 includes: module 11, creating, according to the alphabet of the regular expression and the number of states of the state machine, a two-dimensional array for storing the state transition edge information.
9. The regular expression DFA space compression system according to claim 6, characterized in that module 7 includes: module 71, finding, according to the similarity weight of the default path and the corresponding alphabet, the default path along which state node A points to state node B, the weight of the edge from state node A to state node B being resembleDegree[i][j]; deleting, starting from state node A, the transition edges for the characters in the character set linkedAlphabet[i][j] that corresponds to the similarity weight resembleDegree[i][j] of that default path; and then adding to the original state transition path a default path whose weight is the corresponding weight defaultWeight[i][j] under the default path similarity resembleDegree[i][j].
10. The regular expression DFA space compression system according to claim 9, characterized in that the state transition path includes multiple state transition edges.
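Steps 1 and 2 of claim 1 (with step 11 of claim 3) describe what is usually called alphabet compression: characters whose columns in the transition matrix are identical can share a single column. The Python sketch below is an illustration under assumptions not stated in the claims: the two-dimensional array is a list of rows indexed by state with one column per alphabet character, and a column's "string" is formed by joining its next-state numbers with commas. The name `compress_alphabet` is hypothetical.

```python
from collections import defaultdict


def compress_alphabet(delta, alphabet):
    """Group characters whose transition-matrix columns are identical.

    delta[s][k] -- next state of state s on alphabet[k] (the 2-D array of step 11)
    """
    # Step 1 (sketch): each column becomes one string key, e.g. "1,1,1".
    # Step 2 (sketch): a hash table maps each key to its equivalence class.
    classes = defaultdict(list)
    for k, ch in enumerate(alphabet):
        key = ",".join(str(row[k]) for row in delta)
        classes[key].append(ch)
    class_list = list(classes.values())  # dicts keep insertion order
    char_to_class = {ch: n for n, chars in enumerate(class_list) for ch in chars}
    # One representative column per class yields the narrower transition table.
    reps = [alphabet.index(chars[0]) for chars in class_list]
    compressed = [[row[k] for k in reps] for row in delta]
    return class_list, char_to_class, compressed


classes, char_to_class, compressed = compress_alphabet(
    [[1, 0, 0], [1, 2, 2], [1, 0, 0]], "abc")
print(classes)     # [['a'], ['b', 'c']]
print(compressed)  # [[1, 0], [1, 2], [1, 0]]
```

Lookups then pass through `char_to_class` first, so the table needs one column per equivalence class instead of one per character.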
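Steps 3 to 6, together with step 41 of claim 2, can be illustrated with a short Python sketch. Every modeling choice here is an assumption rather than something the claims confirm: the similarity of states i and j is the number of characters on which they move to the same next state (so resembleDegree[i][j] is the size of linkedAlphabet[i][j]); depth is the breadth-first distance from start state 0; step 41 is read as forbidding a default edge from a shallower state to a deeper one; and the "maximum spanning tree" is approximated by letting each state greedily pick its most similar allowed target while rejecting any pick that would close a cycle. The classic D²FA construction builds a maximum-weight spanning tree instead, so this is a simplified stand-in, and all function names are hypothetical.

```python
from collections import deque

INF = float("inf")


def state_depths(delta, start=0):
    """Depth = fewest transitions needed to reach a state from the start (BFS)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in delta[s]:
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


def similarity_matrix(delta):
    """resemble[i][j] = |linked[i][j]|, the characters on which i and j agree."""
    n = len(delta)
    resemble = [[0] * n for _ in range(n)]
    linked = [[set() for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                linked[i][j] = {c for c, t in enumerate(delta[i]) if delta[j][c] == t}
                resemble[i][j] = len(linked[i][j])
    return resemble, linked


def apply_depth_rule(resemble, depth):
    """Assumed reading of step 41: no default edge from a shallower to a deeper state."""
    n = len(resemble)
    for i in range(n):
        for j in range(n):
            if depth.get(i, INF) < depth.get(j, INF):
                resemble[i][j] = -1
    return resemble


def choose_default_edges(resemble):
    """Greedy: each state takes its most similar allowed target, skipping cycles."""
    n = len(resemble)
    default = [None] * n
    for i in range(n):
        candidates = sorted((j for j in range(n) if j != i and resemble[i][j] > 0),
                            key=lambda j: -resemble[i][j])
        for j in candidates:
            k = j
            while k is not None and k != i:  # walk j's existing default chain
                k = default[k]
            if k is None:  # chain ended without returning to i: no cycle
                default[i] = j
                break
    return default


delta = [[1, 0, 0], [1, 2, 0], [1, 0, 3], [1, 2, 0]]
resemble, linked = similarity_matrix(delta)
depth = state_depths(delta)
apply_depth_rule(resemble, depth)
print(choose_default_edges(resemble))  # [None, 0, 0, 1]
```

In this toy table state 3 behaves exactly like state 1, so it takes state 1 as its default target; the step 71 deletion would then leave state 3 with no labeled transitions of its own.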
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910134200.9A CN109977275A (en) | 2019-02-22 | 2019-02-22 | A kind of regular expression DFA space compression method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910134200.9A CN109977275A (en) | 2019-02-22 | 2019-02-22 | A kind of regular expression DFA space compression method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109977275A true CN109977275A (en) | 2019-07-05 |
Family
ID=67077302
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910134200.9A Pending CN109977275A (en) | 2019-02-22 | 2019-02-22 | A kind of regular expression DFA space compression method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977275A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111262589A (en) * | 2020-01-13 | 2020-06-09 | 奇安信科技集团股份有限公司 | DFA space compression method and device |
CN113381986A (en) * | 2021-05-30 | 2021-09-10 | 北京亚鸿世纪科技发展有限公司 | Reduction method and device for network security scanning rule set |
CN117709298A (en) * | 2024-02-05 | 2024-03-15 | 中国电子信息产业集团有限公司第六研究所 | Double character stream scanning method, electronic equipment, storage medium and system |
2019-02-22: Application CN201910134200.9A filed in CN (publication CN109977275A), status: Active, Pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111262589A (en) * | 2020-01-13 | 2020-06-09 | 奇安信科技集团股份有限公司 | DFA space compression method and device |
CN111262589B (en) * | 2020-01-13 | 2024-04-12 | 奇安信科技集团股份有限公司 | DFA space compression method and device |
CN113381986A (en) * | 2021-05-30 | 2021-09-10 | 北京亚鸿世纪科技发展有限公司 | Reduction method and device for network security scanning rule set |
CN117709298A (en) * | 2024-02-05 | 2024-03-15 | 中国电子信息产业集团有限公司第六研究所 | Double character stream scanning method, electronic equipment, storage medium and system |
CN117709298B (en) * | 2024-02-05 | 2024-05-07 | 中国电子信息产业集团有限公司第六研究所 | Double character stream scanning method, electronic equipment, storage medium and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106797446B (en) | Historical search based on memory | |
CN109977275A (en) | A kind of regular expression DFA space compression method and system | |
CN107463666B (en) | sensitive word filtering method based on text content | |
US20050251509A1 (en) | System and method of paralled pattern matching | |
Gawrychowski et al. | Order-preserving pattern matching with k mismatches | |
Kida et al. | Multiple pattern matching in LZW compressed text | |
CN110069726B (en) | Prediction method of anchor link relation between document networks suitable for DBLP and arXiv | |
Hayes et al. | The forgiving graph: a distributed data structure for low stretch under adversarial attack | |
CN110309368A (en) | Determination method, apparatus, storage medium and the electronic device of data address | |
Wang et al. | Memory-based architecture for multicharacter Aho–Corasick string matching | |
CN110719106A (en) | Social network graph compression method and system based on node classification and sorting | |
WO2010020101A1 (en) | Method for monitoring pictures or multimedia video pictures in communication system | |
CN100530194C (en) | Key words matching method and system | |
CN106599097A (en) | Massive feature string sets matching method and apparatus | |
US6976025B2 (en) | Database and method for storing a searchable set of keywords | |
CN113360501A (en) | Block chain-based distributed data storage method and system | |
CN106708532B (en) | Multilevel regular expression matching method based on TCAM | |
CN107124410A (en) | Network safety situation feature clustering method based on machine deep learning | |
JP4108337B2 (en) | Electronic filing system and search index creation method thereof | |
CN113065419B (en) | Pattern matching algorithm and system based on flow high-frequency content | |
CN115149962A (en) | Deterministic finite automata compression method, device, equipment and storage medium | |
CN111884659B (en) | Compression method and device of FST data | |
KR100992440B1 (en) | A Multiple Pattern Matching Method using Multiple Consecutive Sub-patterns | |
Tripp | A parallel “String Matching Engine” for use in high speed network intrusion detection systems | |
CN112365247A (en) | Block chain storage optimization method based on remainder system and RAFT algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190705 |