CN109977275A - Regular expression DFA space compression method and system - Google Patents


Info

Publication number
CN109977275A
Authority
CN
China
Prior art keywords
state
similarity
weight
dfa
regular expression
Prior art date
Legal status
Pending
Application number
CN201910134200.9A
Other languages
Chinese (zh)
Inventor
高曌
孙毅
张志强
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201910134200.9A
Publication of CN109977275A
Legal status: Pending (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to a regular expression DFA space compression method and system. The method proceeds as follows:
scanning the state transition edges, reading in the transition matrix, and storing the values of each column of the transition matrix as a character string;
merging the alphabet elements whose character strings are equal, which yields multiple hash tables;
calculating, in the state machine of the regular expression, the pairwise similarity between states and the weight corresponding to each similarity, and constructing a similarity matrix from the similarities;
updating the similarity matrix according to the depth of each state in the state machine;
traversing the directed graph corresponding to the similarity matrix from each state node and selecting, for each node, the transition edge with the maximum similarity, which constructs the default paths; the default paths are saved as a maximum spanning tree;
using the similarity weights of the default paths and the corresponding hash tables, deleting the corresponding state transitions from the DFA before compression and, in place of the deleted state transition paths, adding a default path whose weight is the default path weight.

Description

Regular expression DFA space compression method and system
Technical field
The present invention relates to the field of network security, and in particular to a regular expression DFA space compression method and system.
Background art
With the rapid development and widespread use of computer networks, Internet security problems have inevitably come to the fore. Network traffic and application services are increasingly diverse, and new intrusion methods and attack patterns keep emerging.

Deep packet inspection is therefore widely used in network services. It analyzes the content of network packets and matches the carried content (the content analysis result) against a preset pattern set that characterizes virus signatures, thereby determining whether a packet carries illegal content. DFA-based regular expression matching is widely used because it can process the data stream under inspection in real time at high speed and performs well in large-scale pattern matching.
A DFA (deterministic finite automaton) is a mathematical model consisting of five elements (S, Σ, δ, S0, F): the state set, the input alphabet, the transition function, the initial state and the set of accepting states. When DFA-based pattern matching is used for regular expression matching, the number of DFA states is closely tied to the rule set. The massive and ever-growing number of rule-set expressions therefore makes the number of DFA states grow sharply, leading to state explosion. In addition, wildcards and length restrictions in regular expressions further aggravate the DFA's space usage, and current hardware cannot satisfy the resulting space requirement. Finally, DFA-based regular expression matching must keep track of all possible matches, which challenges the speed and practical performance of the matching process.
DFA space compression reduces the space a DFA occupies to an acceptable level and avoids state explosion, so that high pattern matching speed is obtained with little memory. It is the key technique that makes DFAs usable for high-speed regular expression pattern matching in network applications.

Regular expression matching in network applications generally uses the extended ASCII character set as the alphabet Σ, of size 256, so every node has 256 outgoing edges. A large state transition diagram therefore contains a great deal of repeated information; that is, the DFA has many redundant edges. Existing state transition compression methods struggle to balance two requirements:
bounding the default path length so as to guarantee worst-case matching performance;
matching after compression without depending on specific relationships between states.
The result is poor matching speed and performance.
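As a rough illustration of why a dense DFA is fast but large, the following Python sketch (not taken from the patent) builds a table-driven automaton for one literal pattern using a standard KMP-style construction. The names `substring_dfa` and `count_matches` are hypothetical. Each input byte costs exactly one table lookup, but each state stores 256 transitions, and most of them point to the same target.

```python
from typing import List, Set

ALPHABET_SIZE = 256  # extended ASCII alphabet, as described in the text


def substring_dfa(pattern: bytes) -> List[List[int]]:
    """Dense DFA (KMP construction) that enters state len(pattern) on each occurrence."""
    m = len(pattern)
    delta = [[0] * ALPHABET_SIZE for _ in range(m + 1)]
    delta[0][pattern[0]] = 1
    restart = 0  # state the automaton would be in after dropping the first character
    for j in range(1, m):
        delta[j] = list(delta[restart])   # on a mismatch, behave like the restart state
        delta[j][pattern[j]] = j + 1      # on a match, advance
        restart = delta[restart][pattern[j]]
    delta[m] = list(delta[restart])       # keep scanning after a full match
    return delta


def count_matches(delta: List[List[int]], accepting: Set[int], data: bytes) -> int:
    state, hits = 0, 0
    for byte in data:
        state = delta[state][byte]  # exactly one table lookup per input byte
        if state in accepting:
            hits += 1
    return hits


dfa = substring_dfa(b"ab")
print(count_matches(dfa, {2}, b"xabyab"))  # 2
print(len(dfa) * ALPHABET_SIZE)            # 768 stored transitions for only 3 states
print(len(set(dfa[0])))                    # 2 distinct targets among state 0's 256 edges
```

The last line shows the redundancy the text refers to: 254 of state 0's 256 edges lead to the same state.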
Summary of the invention
DFA state explosion makes the memory requirement of DFA-based regular expression matching impossible to satisfy. To overcome this problem, the invention proposes a regular expression DFA space compression method. It works on two fronts: alphabet compression, which eliminates redundant edges, and compression of DFA state transition edges. Together these solve the DFA memory explosion problem in regular expression matching. Every redundant edge is a state transition edge, but not every state transition edge is redundant. Removing redundant edges does not change the regular expression that the DFA represents. They can therefore be removed to reduce the space the DFA occupies, and with it the time needed to match regular expressions.
Specifically, the invention discloses a regular expression DFA space compression method, characterized by comprising:
Step 1: scan the state transition edges of the regular expression, read in the transition matrix, and store the values of each column of the transition matrix as a character string;
Step 2: merge the alphabet elements whose character strings in the two-dimensional array are equal, obtaining hash tables of multiple equivalence classes;
Step 3: using the state transition edge compression algorithm given in pseudocode, calculate the pairwise similarity between states in the state machine of the regular expression and the weight corresponding to each similarity, and construct a similarity matrix from the similarities;
Step 4: update the similarity matrix according to the depth of each state in the state machine;
Step 5: calculate the character set corresponding to the similarity between each pair of states;
Step 6: starting from each state node in the directed graph corresponding to the similarity matrix, select the transition edge with the maximum similarity for that node to construct the default paths, and save the default paths as a maximum spanning tree;
Step 7: using the maximum spanning tree, find the weight corresponding to the similarity of each default path and the corresponding hash table. Delete the corresponding state transitions in the DFA before compression and, in place of the deleted state transition path, add a default path whose weight is the default path weight. This simplifies and compresses the storage of the original DFA.
In the regular expression DFA space compression method, step 4 includes step 41: updating to -1 the similarity value from each lower-depth state to each higher-depth state in the state machine.
In the regular expression DFA space compression method, step 1 includes step 11: creating a two-dimensional array for storing state transition edge information, sized according to the alphabet of the regular expression's state machine and the number of states of the state machine.
In the regular expression DFA space compression method, step 7 includes step 71, which proceeds as follows.
First, according to the weight corresponding to the similarity of each default path and the corresponding alphabet, find the default path from state node A to state node B, where the weight of the edge from A to B is resembleDegree[i][j].
Then delete the transition edges that start at state node A and are labeled with characters in the character set linkedAlphabet[i][j]. This character set corresponds to the similarity weight resembleDegree[i][j] of that default path.
Finally, add to the original state transition path a default path whose weight is defaultWeight[i][j], the weight corresponding to the default path similarity resembleDegree[i][j].
In the regular expression DFA space compression method, the state transition path includes multiple state transition edges.
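Step 71 can be sketched in Python as below. The sketch assumes the default-path choice, the weight matrix defaultWeight and the character sets linkedAlphabet have already been computed; they are passed in as plain nested lists under snake_case names. The function name `apply_default_paths` and the representation of the compressed DFA are assumptions: each state keeps a dictionary of explicit edges plus at most one (target, weight) default.

```python
from typing import Dict, List, Optional, Tuple


def apply_default_paths(
    delta: List[List[int]],
    default_to: List[Optional[int]],
    default_weight: List[List[int]],
    linked_alphabet: List[List[List[int]]],
) -> Tuple[List[Dict[int, int]], List[Optional[Tuple[int, int]]]]:
    """For each default path A -> B, drop A's edges on linked_alphabet[A][B]
    and record one default transition (B, default_weight[A][B]) instead."""
    explicit = [dict(enumerate(row)) for row in delta]  # char -> next state, per state
    defaults: List[Optional[Tuple[int, int]]] = [None] * len(delta)
    for a_state, b_state in enumerate(default_to):
        if b_state is None:
            continue  # this state keeps all of its edges
        for ch in linked_alphabet[a_state][b_state]:
            del explicit[a_state][ch]
        defaults[a_state] = (b_state, default_weight[a_state][b_state])
    return explicit, defaults


# Toy in the style of Fig. 3: U = 0, V = 1, W = 2; U and V both go to W on a (0) and b (1).
delta = [[2, 2, 0], [2, 2, 1], [2, 2, 2]]
n = len(delta)
weight = [[0] * n for _ in range(n)]
linked = [[[] for _ in range(n)] for _ in range(n)]
linked[0][1] = [0, 1]
explicit, defaults = apply_default_paths(delta, [1, None, None], weight, linked)
print(explicit[0], defaults[0])  # {2: 0} (1, 0)
```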
The invention also discloses a regular expression DFA space compression system, comprising:
Module 1, which scans the state transition edges of the regular expression, reads in the transition matrix, and stores the values of each column of the transition matrix as a character string;
Module 2, which merges the alphabet elements whose character strings in the two-dimensional array are equal, obtaining hash tables of multiple equivalence classes;
Module 3, which, using the state transition edge compression algorithm given in pseudocode, calculates the pairwise similarity between states in the state machine of the regular expression and the weight corresponding to each similarity, and constructs a similarity matrix from the similarities;
Module 4, which updates the similarity matrix according to the depth of each state in the state machine;
Module 5, which calculates the character set corresponding to the similarity between each pair of states;
Module 6, which, starting from each state node in the directed graph corresponding to the similarity matrix, selects the transition edge with the maximum similarity for that node to construct the default paths, and saves the default paths as a maximum spanning tree;
Module 7, which, using the maximum spanning tree, finds the weight corresponding to the similarity of each default path and the corresponding hash table. It deletes the corresponding state transitions in the DFA before compression and, in place of the deleted state transition path, adds a default path whose weight is the default path weight. This simplifies and compresses the storage of the original DFA.
In the regular expression DFA space compression system, module 4 includes module 41, which updates to -1 the similarity value from each lower-depth state to each higher-depth state in the state machine.
In the regular expression DFA space compression system, module 1 includes module 11, which creates a two-dimensional array for storing state transition edge information, sized according to the alphabet of the regular expression's state machine and the number of states of the state machine.
In the regular expression DFA space compression system, module 7 includes module 71, which works as follows.
First, according to the weight corresponding to the similarity of each default path and the corresponding alphabet, it finds the default path from state node A to state node B, where the weight of the edge from A to B is resembleDegree[i][j].
It then deletes the transition edges that start at state node A and are labeled with characters in the character set linkedAlphabet[i][j]. This character set corresponds to the similarity weight resembleDegree[i][j] of that default path.
Finally, it adds to the original state transition path a default path whose weight is defaultWeight[i][j], the weight corresponding to the default path similarity resembleDegree[i][j].
In the regular expression DFA space compression system, the state transition path includes multiple state transition edges.
Beneficial effects of the invention: the DFA memory compression method proposed by the invention uses the equivalence class method to split the alphabet set into smaller alphabet sets. It also constructs default paths to compress the information on state transition edges. Together these reduce the space consumed when storing and matching the DFA, so the memory requirement drops substantially and efficiency improves greatly. After this grouped compression, the DFA can quickly determine whether a packet is malicious, because malicious packets often correspond to fixed DFA expressions. The main aim of the invention is to speed up detection with DFA expressions.
Brief description of the drawings
Fig. 1 is a flowchart of one embodiment of the DFA memory compression method based on alphabet compression and state transition edge compression according to the invention;
Fig. 2 is a schematic diagram of state depth;
Fig. 3 is a diagram of the DFA structure before state transition edge compression;
Fig. 4 is a diagram of the DDFA structure after state transition edge compression.
Detailed description of embodiments
DFA state explosion makes the memory requirement of DFA-based regular expression matching impossible to satisfy. To overcome this problem, the invention proposes a DFA compression method that solves the DFA memory explosion problem while guaranteeing the correctness of DFA pattern matching. It mainly consists of two sub-methods: DFA alphabet partitioning and DFA state transition edge compression.
A regular expression state machine records its state transition rules in a two-dimensional array whose two dimensions are the alphabet Σ and the states of the state machine. (Each regular expression can be represented as a regular expression state machine.)

The DFA alphabet partitioning sub-method scans the state transition edges of the regular expression once and assigns the elements of the alphabet Σ to different groups. The letters in one group are then represented by the same character. In this way it finds all equivalence classes of Σ and compresses the DFA, reducing the memory needed to store it. It includes the following steps:
Step S110: create an M*N two-dimensional array to store the state transition edge information, where M is the number of alphabet elements and N is the number of DFA states;
Step S120: scan the state transition edges once, that is, read in the transition matrix, and store the values of each column as a character string;
Step S130: read the character string corresponding to each letter of the alphabet in turn, check whether the strings are equal, and mark whether the letters can be merged. Two characters can be merged when, from every state in the transition matrix, either input leads to the same next state, so that their strings are identical. Such characters are equivalent and can be placed in one equivalence class; this is the hashing process;
Step S140: merge the alphabet elements that carry the same mergeable mark to obtain the partitioned character table. The result is a hash table that maps the 256 ASCII characters to multiple equivalence classes. "Hashing" here means exactly this division of the alphabet into equivalence classes.
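A minimal Python sketch of steps S110 to S140 follows. It assumes the transition table is held as `delta[state][char]`; the patent describes an M*N array, so this orientation is an assumption, as is the function name `partition_alphabet`. Each character's column is turned into a string key, and characters with identical keys share an equivalence class. The dictionary plays the role of the hash table.

```python
from typing import Dict, List, Tuple


def partition_alphabet(delta: List[List[int]]) -> Tuple[List[int], int]:
    """Map every input character to an equivalence class.

    delta[state][char] is the next state. Two characters are equivalent when their
    columns (the next state from every state) are identical.
    """
    num_chars = len(delta[0])
    class_of_key: Dict[str, int] = {}  # column string -> class id (the hash table)
    class_of_char = [0] * num_chars
    for c in range(num_chars):
        # Step S120: store column c as a character string.
        key = ",".join(str(row[c]) for row in delta)
        # Steps S130/S140: identical strings are merged into one class.
        if key not in class_of_key:
            class_of_key[key] = len(class_of_key)
        class_of_char[c] = class_of_key[key]
    return class_of_char, len(class_of_key)


# Hypothetical 3-state DFA over a 4-character alphabet.
delta = [[1, 1, 0, 2],
         [1, 1, 0, 2],
         [2, 2, 0, 0]]
classes, n = partition_alphabet(delta)
print(classes, n)  # [0, 0, 1, 2] 3
```

Using a string key mirrors the text. A tuple of next states would work equally well as a dictionary key and avoids the string formatting.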
The alphabet compression steps above divide the alphabet Σ of the regular expression state machine into multiple equivalence classes. This compresses the out-degree of each node from 256 to the number of equivalence classes. The resulting hash table is the input to the DFA state transition edge compression sub-method. In the alphabet partitioning algorithm, the set set[alpha] that holds the equivalence classes of the alphabet is the concrete form of this hash table.
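The following sketch shows how the class map might be used once computed. It assumes the map is stored as a list `class_of_char` indexed by character; `compress_table` and `run` are hypothetical names. Each state's row shrinks from one entry per character to one entry per equivalence class, and matching adds one class lookup per input symbol.

```python
from typing import Iterable, List


def compress_table(delta: List[List[int]], class_of_char: List[int],
                   num_classes: int) -> List[List[int]]:
    """Re-index each state's row by equivalence class instead of by character."""
    compressed = [[-1] * num_classes for _ in delta]
    for s, row in enumerate(delta):
        for c, nxt in enumerate(row):
            k = class_of_char[c]
            # All characters in one class must share the same target.
            if compressed[s][k] not in (-1, nxt):
                raise ValueError("class map is not a valid equivalence partition")
            compressed[s][k] = nxt
    return compressed


def run(compressed: List[List[int]], class_of_char: List[int],
        data: Iterable[int], start: int = 0) -> int:
    state = start
    for ch in data:
        state = compressed[state][class_of_char[ch]]  # class lookup, then transition
    return state


delta = [[1, 1, 0, 2], [1, 1, 0, 2], [2, 2, 0, 0]]
class_of_char = [0, 0, 1, 2]  # output of the partitioning step for this DFA
table = compress_table(delta, class_of_char, 3)
print(table)                              # [[1, 0, 2], [1, 0, 2], [2, 0, 0]]
print(run(table, class_of_char, [0, 3]))  # 2
```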
The DFA state transition edge compression sub-method reduces the number of state transition edges, and hence the memory needed by the DFA. It does so by constructing default transition paths from the maximum spanning tree of a directed graph. The specific steps are as follows:
Step S210: using the state transition edge compression algorithm given in pseudocode, calculate the pairwise similarity between all DFA states and the weight corresponding to each similarity. The similarity between two states indicates how many edges could be eliminated by merging them, and is used to build the spanning tree below. The weight is the weight of a candidate default path;
Step S220: preprocess the similarity matrix according to state depth. The goal is a directed graph in which higher-depth nodes point to lower-depth nodes, so set to -1 the similarity value from each lower-depth state (node) to each higher-depth state in the similarity matrix. Every default path then points from a state farther from the initial state to a state closer to it. This guarantees the algorithm's worst-case matching performance without having to account for default path length. Depth here is the usual notion of node depth in the state transition diagram. As shown in Fig. 2, the initial state S6 has depth 1, S2 has depth 2, and the final state S7 has depth 4;
Step S230: calculate the alphabet set corresponding to the similarity of each pair of states;
Step S240: compute the maximum spanning tree of the directed graph corresponding to the similarity matrix. Using a greedy strategy, start from each state node in the directed graph and select the outgoing transition edge with the maximum similarity for that node, which completes the construction of the default paths. The maximum spanning tree consists of default paths; if several candidate default paths tie, any one of them may be chosen;
Step S250: using the maximum spanning tree of the directed graph obtained in step S240, find the weight corresponding to the similarity of each selected default path and the corresponding alphabet. Delete the corresponding state transitions from the DFA before compression, and add to the original state transition path a default path carrying the default path weight (similarity). This completes the simplified compression of the original DFA storage.
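Steps S210 and S230 can be sketched together in Python, following the definitions of similar division, similarity and default weight given further on in this description. Characters are grouped by the offset W between the two states' targets. The largest group gives the similarity, its offset gives the default weight, and its members form the linked character set. Several details are assumptions: the function names, the -1 placed on the diagonal, and the tie-breaking rule (the first offset encountered wins). The patent's own pseudocode is not reproduced here.

```python
from typing import Dict, List


def similar_division(row_u: List[int], row_v: List[int]) -> Dict[int, List[int]]:
    """Group characters a by the offset W with delta(u, a) = delta(v, a) + W."""
    groups: Dict[int, List[int]] = {}
    for a, (nu, nv) in enumerate(zip(row_u, row_v)):
        groups.setdefault(nu - nv, []).append(a)
    return groups


def pairwise_matrices(delta: List[List[int]]):
    """Return (resembleDegree, defaultWeight, linkedAlphabet) as n x n nested lists."""
    n = len(delta)
    resemble = [[-1] * n for _ in range(n)]  # resembleDegree; -1 = no default path
    weight = [[0] * n for _ in range(n)]     # defaultWeight
    linked: List[List[List[int]]] = [[[] for _ in range(n)] for _ in range(n)]  # linkedAlphabet
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            groups = similar_division(delta[i], delta[j])
            # Largest subset of the similar division; ties go to the first offset seen.
            w = max(groups, key=lambda off: len(groups[off]))
            resemble[i][j], weight[i][j], linked[i][j] = len(groups[w]), w, groups[w]
    return resemble, weight, linked


# Toy in the style of Fig. 3: U = 0, V = 1, W = 2; U and V both go to W on a (0) and b (1).
delta = [[2, 2, 0], [2, 2, 1], [2, 2, 2]]
r, w, l = pairwise_matrices(delta)
print(r[0][1], w[0][1], l[0][1])                       # 2 0 [0, 1]
print(similar_division([3, 3, 5, 7], [1, 1, 3, 7]))  # {2: [0, 1, 2], 0: [3]}
```

The second call shows a non-zero weight. Three characters share the offset 2, so a default path from the first state to the second would carry weight 2.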
The result of these steps is the compressed DFA state storage; at this point one layer of DFA memory compression is complete.
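Steps S220 and S240 might look like the Python sketch below. Two points are assumptions not stated in the patent. First, depth is taken as the breadth-first distance from the initial state, counting the initial state as depth 1 as in Fig. 2. Second, a default path is kept only when it removes at least two edges (`min_similarity = 2`), since a default path that removes one edge and adds one saves nothing. The greedy choice also skips any target whose default chain already leads back to the current state. States of equal depth are not separated by the -1 mask, so without this guard two such states could point their default paths at each other and a lookup could loop forever.

```python
from collections import deque
from typing import List, Optional


def state_depths(delta: List[List[int]], start: int = 0) -> List[int]:
    """BFS depth in the transition graph; the initial state has depth 1 (0 = unreachable)."""
    depth = [0] * len(delta)
    depth[start] = 1
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in delta[s]:
            if depth[t] == 0:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


def mask_by_depth(resemble: List[List[int]], depth: List[int]) -> None:
    """Step S220: forbid default paths from a shallower state to a deeper one (in place)."""
    for i in range(len(resemble)):
        for j in range(len(resemble)):
            if depth[i] < depth[j]:
                resemble[i][j] = -1


def _reaches(default_to: List[Optional[int]], node: Optional[int], target: int) -> bool:
    while node is not None:
        if node == target:
            return True
        node = default_to[node]
    return False


def choose_default_paths(resemble: List[List[int]],
                         min_similarity: int = 2) -> List[Optional[int]]:
    """Step S240: per state, greedily keep the most similar target (None = no default)."""
    n = len(resemble)
    default_to: List[Optional[int]] = [None] * n
    for i in range(n):
        best, best_sim = None, min_similarity - 1
        for j in range(n):
            # Skip j if its default chain already leads back to i (would close a cycle).
            if j != i and resemble[i][j] > best_sim and not _reaches(default_to, j, i):
                best, best_sim = j, resemble[i][j]
        default_to[i] = best
    return default_to


delta = [[1, 0, 0], [1, 2, 0], [1, 0, 0]]  # hypothetical DFA for "ab" over {a, b, c}
depth = state_depths(delta)                # [1, 2, 3]
resemble = [[-1, 2, 3], [2, -1, 2], [3, 2, -1]]
mask_by_depth(resemble, depth)
print(resemble)                            # [[-1, -1, -1], [2, -1, -1], [3, 2, -1]]
print(choose_default_paths(resemble))      # [None, 0, 0]
```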
To make the features and effects of the invention clearer, embodiments are described in detail below with reference to the accompanying drawings.
Before describing the embodiments, we define the concepts they use. These definitions serve only to explain one embodiment of the invention and do not limit the invention or this embodiment:
(1) Equivalent elements: let $C_i$ and $C_j$ be two different elements of the alphabet $\Sigma$. If $\delta(u, C_i) = \delta(u, C_j)$ for every state $u \in Q$, the input characters $C_i$ and $C_j$ are called equivalent elements. Here $Q$ is the state set of the finite automaton five-tuple and $\delta$ is the state transition function. In other words, from every state $u$, input $C_i$ and input $C_j$ lead to the same state.
(2) Equivalence class: if $\delta(u, C_i) = \delta(u, C_j)$ for any characters $C_i, C_j \in \Sigma_i \subseteq \Sigma$ and any state $u \in Q$, then the set $\Sigma_i$ is an equivalence class of the alphabet $\Sigma$.
(3) Similar: let $u, v$ be any two different states in the state set and $a, b$ any two different characters in the alphabet. If there exists an integer $W$ such that $\delta(u, a) = \delta(v, a) + W$ and $\delta(u, b) = \delta(v, b) + W$, the input characters $a$ and $b$ are said to be similar at states $u$ and $v$.
(4) Similar division: let $u, v$ be any two different states in the state set, and let $\Sigma_1, \Sigma_2, \ldots, \Sigma_n$ be subsets of the character table $\Sigma$ such that:
$\bigcup_i \Sigma_i = \Sigma$, and $\Sigma_i \cap \Sigma_j = \emptyset$ for $i \neq j$;
$\delta(u, a) = \delta(v, a) + W_i$ for every $a \in \Sigma_i$;
$W_i \neq W_j$ for $i \neq j$.
If such subsets exist, $\Sigma_1, \Sigma_2, \ldots, \Sigma_n$ is called the similar division of $\Sigma$ at states $u$ and $v$.
(5) Similarity: the number of elements in the largest subset of the similar division of states $u$ and $v$ is the similarity between $u$ and $v$.
(6) Default transition and default transition weight: suppose the state transitions of state $u$ on the characters in $\Sigma_i$ are removed and a state transition from $u$ to $v$ is added. The newly added transition is called the default transition from $u$ to $v$. In addition, $W_i$ is called the default transition weight from $u$ to $v$, or simply the weight from $u$ to $v$, denoted $W_{uv}$.
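A small worked example of definitions (3) to (6), using hypothetical state numbers. States are identified by integers, which is what gives the offset $W$ its meaning:

```latex
\begin{aligned}
&\Sigma = \{a, b, c, d\}, \qquad
  \big(\delta(u,a), \delta(u,b), \delta(u,c), \delta(u,d)\big) = (3, 3, 5, 7), \\
&\big(\delta(v,a), \delta(v,b), \delta(v,c), \delta(v,d)\big) = (1, 1, 3, 7), \\
&\delta(u,x) = \delta(v,x) + 2 \ \text{for } x \in \Sigma_1 = \{a, b, c\}, \qquad
  \delta(u,d) = \delta(v,d) + 0, \ \Sigma_2 = \{d\}, \\
&\text{similarity}(u, v) = \max\big(|\Sigma_1|, |\Sigma_2|\big) = 3, \qquad W_{uv} = W_1 = 2, \\
&\text{after the default transition } u \to v: \quad
  \delta(u,b) = \delta(v,b) + W_{uv} = 1 + 2 = 3.
\end{aligned}
```

The default transition replaces the three labeled edges of $u$ on $a$, $b$ and $c$ with a single unlabeled one, while the edge on $d$ stays explicit.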
To overcome the space explosion caused by DFAs in regular expression matching, the invention proposes a DFA memory compression method based on alphabet compression and state transition edge compression. It guarantees the correctness of DFA pattern matching while preserving matching performance and greatly reducing memory usage. Fig. 1 is a flowchart of one embodiment of this method. The method mainly consists of two sub-methods: alphabet compression and DFA state transition edge compression.
The alphabet compression sub-method works on the alphabet of the DFA used for pattern matching. It assigns mutually equivalent elements of that alphabet to the same equivalence class, which eliminates redundancy in the alphabet and reduces memory consumption when the state machine is stored. This goes some way toward solving the DFA memory explosion problem. The sub-method includes the steps shown in Table 1:
Step S110: create a 256*N two-dimensional array to store the state transition edge information, where N is the number of DFA states;
Step S120: scan the state transition edges once, that is, read in the transition matrix, and store the values of each column as a character string;
Step S130: read the character string corresponding to each letter of the alphabet in turn, check whether the strings are equal, and mark whether the letters can be merged;
Step S140: merge the alphabet elements that carry the same mergeable mark to obtain the partitioned character table, i.e., a hash table that maps the 256 ASCII characters to multiple equivalence classes.
Table 1 Alphabet compression method
After these steps, the equivalence class partition set[alpha] of the state machine's alphabet is obtained, and the redundancy in the character matrix is eliminated.
The DFA state transition edge compression sub-method reduces the number of state transition edges, and hence the memory needed by the DFA. It does so by constructing default transition paths from the maximum spanning tree of a directed graph. The sub-method includes the steps shown in Table 2:
Step S210: calculate the pairwise similarity matrix resembleDegree[n][n] between the states in the state set and the corresponding weight matrix defaultWeight[n][n].
Step S220: in the similarity matrix resembleDegree[n][n], set to -1 the similarity value from each lower-depth state node to each higher-depth state node;
Step S230: calculate the alphabet set (character set) linkedAlphabet[n][n] corresponding to the similarity of each pair of states;
Step S240: compute the maximum spanning tree of the directed graph corresponding to the similarity matrix. Using a greedy strategy, start from each state node in the directed graph and select the outgoing transition edge with the maximum similarity for that node, which completes the construction of the default paths.
Step S250: using the maximum spanning tree of the directed graph obtained in step S240, find the weight corresponding to the similarity of each selected default path and the corresponding alphabet. Suppose there is a default path from state A to state B, and the weight of the edge from A to B is resembleDegree[i][j]. Delete from the DFA before compression the state transition edges that start at A and are labeled with characters in the character set linkedAlphabet[i][j]; this set is associated with the similarity weight resembleDegree[i][j]. Then add to the original state transition path a default path whose weight is defaultWeight[i][j], the weight corresponding to the default path similarity resembleDegree[i][j]. This completes the simplified compression of the original DFA storage.
Fig. 3 shows the DFA structure before transition edge compression. It contains three states U, V and W; on the same input characters, states U and V both move to the same state W. Using the concept of a default path, the state transition path between U and W is removed and a default path is added between U and V, as shown in Fig. 4. DDFA denotes the resulting structure with default paths.

The default path carries no input character: it is taken when the input character matches none of the labels on the state's other edges. For example, when U receives input a or b, it finds no edge labeled a or b. It can therefore only follow the default path to state V, where it then matches the a or b character to reach state W.
Table 2 DFA state transition edge compression method
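Steps S210 to S250 can be combined into the following end-to-end Python sketch. It compresses a small dense DFA and checks that lookups through default paths reproduce every original transition. The matrix names mirror resembleDegree, defaultWeight and linkedAlphabet from the text. Everything else is an illustrative assumption rather than the patent's reference implementation: depth is breadth-first distance with the initial state at depth 1, the default-path choice is greedy per state with a guard against default cycles, and only default paths that remove at least two edges are kept. The lookup follows the rule illustrated by Figs. 3 and 4: a default path consumes no input, and, per definition (6), its weight is added to the state reached from the target state on the same character.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def _reaches(default_to: List[Optional[int]], node: Optional[int], target: int) -> bool:
    """True if following default paths from node arrives at target."""
    while node is not None:
        if node == target:
            return True
        node = default_to[node]
    return False


def compress(delta: List[List[int]], start: int = 0, min_similarity: int = 2):
    """Steps S210-S250 on a dense DFA delta[state][char]; returns (explicit, defaults)."""
    n, k = len(delta), len(delta[0])
    # S210: similarity = size of the largest group of the similar division, weight = its offset.
    resembleDegree = [[-1] * n for _ in range(n)]
    defaultWeight = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sizes: Dict[int, int] = {}
            for a in range(k):
                off = delta[i][a] - delta[j][a]
                sizes[off] = sizes.get(off, 0) + 1
            best = max(sizes, key=sizes.__getitem__)  # first maximal offset wins ties
            resembleDegree[i][j], defaultWeight[i][j] = sizes[best], best
    # S220: BFS depth (initial state = 1); forbid defaults from shallower to deeper states.
    depth = [0] * n
    depth[start] = 1
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in delta[s]:
            if depth[t] == 0:
                depth[t] = depth[s] + 1
                queue.append(t)
    for i in range(n):
        for j in range(n):
            if depth[i] < depth[j]:
                resembleDegree[i][j] = -1
    # S230: linkedAlphabet[i][j] = characters whose offset equals defaultWeight[i][j].
    linkedAlphabet = [[[a for a in range(k) if delta[i][a] - delta[j][a] == defaultWeight[i][j]]
                       for j in range(n)] for i in range(n)]
    # S240: greedy per-state choice of the most similar target, refusing default cycles.
    default_to: List[Optional[int]] = [None] * n
    for i in range(n):
        best_j, best_sim = None, min_similarity - 1
        for j in range(n):
            if j != i and resembleDegree[i][j] > best_sim and not _reaches(default_to, j, i):
                best_j, best_sim = j, resembleDegree[i][j]
        default_to[i] = best_j
    # S250: delete the linked transitions and add one weighted default path per state.
    explicit = [dict(enumerate(row)) for row in delta]
    defaults: List[Optional[Tuple[int, int]]] = [None] * n
    for i, j in enumerate(default_to):
        if j is not None:
            for a in linkedAlphabet[i][j]:
                del explicit[i][a]
            defaults[i] = (j, defaultWeight[i][j])
    return explicit, defaults


def lookup(explicit, defaults, state: int, a: int) -> int:
    """Next state on character a: follow default paths (no input consumed), adding weights."""
    offset = 0
    while a not in explicit[state]:
        state, w = defaults[state]
        offset += w
    return explicit[state][a] + offset


delta = [[1, 0, 0], [1, 2, 0], [1, 0, 0]]  # hypothetical DFA for "ab" over {a, b, c}
explicit, defaults = compress(delta)
print(explicit)  # [{0: 1, 1: 0, 2: 0}, {1: 2}, {}]
print(defaults)  # [None, (0, 0), (0, 0)]
```

Here 9 stored edges shrink to 4 explicit edges plus 2 default paths. The cycle guard in step S240 is what guarantees that the `while` loop in `lookup` terminates.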
The result of these steps is the compressed DFA state storage; at this point one layer of DFA memory compression is complete.
The following is a system embodiment corresponding to the method embodiment above; the two embodiments can be implemented in cooperation with each other. The technical details mentioned in the method embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the technical details mentioned in this embodiment also apply to the method embodiment.
The invention also discloses a regular expression DFA space compression system, comprising the following modules.
Module 1 scans the state transition edges of the regular expression, reads in the transition matrix, and stores the values of each column of the transition matrix as a character string.
Module 2 merges the alphabet elements whose character strings in the two-dimensional array are equal, obtaining hash tables of multiple equivalence classes.
Module 3, using the state transition edge compression algorithm given in pseudocode, calculates the pairwise similarity between states in the state machine of the regular expression and the weight corresponding to each similarity, and constructs a similarity matrix from the similarities.
Module 4 updates the similarity matrix according to the depth of each state in the state machine.
Module 5 calculates the character set corresponding to the similarity between each pair of states.
Module 6, starting from each state node in the directed graph corresponding to the similarity matrix, selects the transition edge with the maximum similarity for that node to construct the default paths, and saves the default paths as a maximum spanning tree.
Module 7, using the maximum spanning tree, finds the weight corresponding to the similarity of each default path and the corresponding hash table. It deletes the corresponding state transitions in the DFA before compression and, in place of the deleted state transition path, adds a default path whose weight is the default path weight. This simplifies and compresses the storage of the original DFA.
In the regular expression DFA space compression system, module 4 includes module 41, which updates to -1 the similarity value from each lower-depth state to each higher-depth state in the state machine.
In the regular expression DFA space compression system, module 1 includes module 11, which creates a two-dimensional array for storing state transition edge information, sized according to the alphabet of the regular expression's state machine and the number of states of the state machine.
In the regular expression DFA space compression system, module 7 includes module 71, which works as follows.
First, according to the weight corresponding to the similarity of each default path and the corresponding alphabet, it finds the default path from state node A to state node B, where the weight of the edge from A to B is resembleDegree[i][j].
It then deletes the transition edges that start at state node A and are labeled with characters in the character set linkedAlphabet[i][j]. This character set corresponds to the similarity weight resembleDegree[i][j] of that default path.
Finally, it adds to the original state transition path a default path whose weight is defaultWeight[i][j], the weight corresponding to the default path similarity resembleDegree[i][j].
In the regular expression DFA space compression system, the state transition path includes multiple state transition edges.

Claims (10)

1. A regular expression DFA space compression method, characterized by comprising:
Step 1: scan the state transition edges of the regular expression, read in the transition matrix, and store the values of each column of the transition matrix as a character string;
Step 2: merge the alphabet elements whose character strings in the two-dimensional array are equal, obtaining hash tables of multiple equivalence classes;
Step 3: using the state transition edge compression algorithm given in pseudocode, calculate the pairwise similarity between states in the state machine of the regular expression and the weight corresponding to each similarity, and construct a similarity matrix from the similarities;
Step 4: update the similarity matrix according to the depth of each state in the state machine;
Step 5: calculate the character set corresponding to the similarity between each pair of states;
Step 6: starting from each state node in the directed graph corresponding to the similarity matrix, select the transition edge with the maximum similarity for that node to construct the default paths, and save the default paths as a maximum spanning tree;
Step 7: using the maximum spanning tree, find the weight corresponding to the similarity of each default path and the corresponding hash table. Delete the corresponding state transitions in the DFA before compression and, in place of the deleted state transition path, add a default path whose weight is the default path weight. This simplifies and compresses the storage of the original DFA.
2. The regular expression DFA space compression method according to claim 1, characterized in that step 4 includes step 41: setting the similarity value from a state of lower depth to a state of higher depth in the state machine to -1.
3. The regular expression DFA space compression method according to claim 1, characterized in that step 1 includes step 11: creating, according to the alphabet of the regular expression's state machine and the number of states in the state machine, a two-dimensional array for storing state transition edge information.
4. The regular expression DFA space compression method according to claim 1, characterized in that step 7 includes step 71: finding, according to the similarity weight of the default path and the corresponding alphabet, the default path along which state node A points to state node B, the weight of the edge from A to B being resembleDegree[i][j]; deleting the transition edges starting at state node A on the characters in the character set linkedAlphabet[i][j] associated with that default path's similarity weight resembleDegree[i][j]; and then adding to the original state transition path a default path whose weight is defaultWeight[i][j], the weight corresponding to the default path similarity resembleDegree[i][j].
5. The regular expression DFA space compression method according to claim 4, characterized in that the state transition path comprises multiple state transition edges.
6. A regular expression DFA space compression system, characterized by comprising:
Module 1, which scans the state transition edges of the regular expression's state machine, reads in the transition matrix, and stores the values in each column of the transition matrix as a character string;
Module 2, which merges the alphabet elements whose character strings in the two-dimensional array are equal, to obtain a hash table of equivalence classes;
Module 3, which uses the state-transition-edge compression algorithm given in pseudocode to calculate the similarity between every pair of states in the state machine of the regular expression and the similarity weight corresponding to that similarity, and constructs a similarity matrix from the similarities;
Module 4, which updates the similarity matrix according to the depth of each state in the state machine;
Module 5, which calculates the character set corresponding to the similarity between each pair of states;
Module 6, which traverses the directed graph corresponding to the similarity matrix starting from each state node, selects the transition edge with the maximum similarity for that node to construct the default path, and saves the default paths as a maximum spanning tree;
Module 7, which uses the maximum spanning tree to find the similarity weight and the corresponding hash table of each default path, deletes the corresponding state transitions from the pre-compression DFA, and adds to the original state transition path a default path whose weight is the default path weight, thereby simplifying and compressing the storage of the original DFA.
7. The regular expression DFA space compression system according to claim 6, characterized in that module 4 includes module 41, which sets the similarity value from a state of lower depth to a state of higher depth in the state machine to -1.
8. The regular expression DFA space compression system according to claim 6, characterized in that module 1 includes module 11, which creates, according to the alphabet of the regular expression's state machine and the number of states in the state machine, a two-dimensional array for storing state transition edge information.
9. The regular expression DFA space compression system according to claim 6, characterized in that module 7 includes module 71, which finds, according to the similarity weight of the default path and the corresponding alphabet, the default path along which state node A points to state node B, the weight of the edge from A to B being resembleDegree[i][j]; deletes the transition edges starting at state node A on the characters in the character set linkedAlphabet[i][j] associated with that default path's similarity weight resembleDegree[i][j]; and then adds to the original state transition path a default path whose weight is defaultWeight[i][j], the weight corresponding to the default path similarity resembleDegree[i][j].
10. The regular expression DFA space compression system according to claim 9, characterized in that the state transition path comprises multiple state transition edges.
CN201910134200.9A 2019-02-22 2019-02-22 A kind of regular expression DFA space compression method and system Pending CN109977275A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910134200.9A CN109977275A (en) 2019-02-22 2019-02-22 A kind of regular expression DFA space compression method and system

Publications (1)

Publication Number Publication Date
CN109977275A 2019-07-05

Family

ID=67077302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910134200.9A Pending CN109977275A (en) 2019-02-22 2019-02-22 A kind of regular expression DFA space compression method and system

Country Status (1)

Country Link
CN (1) CN109977275A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111262589A (en) * 2020-01-13 2020-06-09 奇安信科技集团股份有限公司 DFA space compression method and device
CN111262589B (en) * 2020-01-13 2024-04-12 奇安信科技集团股份有限公司 DFA space compression method and device
CN113381986A (en) * 2021-05-30 2021-09-10 北京亚鸿世纪科技发展有限公司 Reduction method and device for network security scanning rule set
CN117709298A (en) * 2024-02-05 2024-03-15 中国电子信息产业集团有限公司第六研究所 Double character stream scanning method, electronic equipment, storage medium and system
CN117709298B (en) * 2024-02-05 2024-05-07 中国电子信息产业集团有限公司第六研究所 Double character stream scanning method, electronic equipment, storage medium and system

Similar Documents

Publication Publication Date Title
CN106797446B (en) Historical search based on memory
CN109977275A (en) A kind of regular expression DFA space compression method and system
CN107463666B (en) sensitive word filtering method based on text content
US20050251509A1 (en) System and method of paralled pattern matching
Gawrychowski et al. Order-preserving pattern matching with k mismatches
Kida et al. Multiple pattern matching in LZW compressed text
CN110069726B (en) Prediction method of anchor link relation between document networks suitable for DBLP and arXiv
Hayes et al. The forgiving graph: a distributed data structure for low stretch under adversarial attack
CN110309368A (en) Determination method, apparatus, storage medium and the electronic device of data address
Wang et al. Memory-based architecture for multicharacter Aho–Corasick string matching
CN110719106A (en) Social network graph compression method and system based on node classification and sorting
WO2010020101A1 (en) Method for monitoring pictures or multimedia video pictures in communication system
CN100530194C (en) Key words matching method and system
CN106599097A (en) Massive feature string sets matching method and apparatus
US6976025B2 (en) Database and method for storing a searchable set of keywords
CN113360501A (en) Block chain-based distributed data storage method and system
CN106708532B (en) Multilevel regular expression matching method based on TCAM
CN107124410A (en) Network safety situation feature clustering method based on machine deep learning
JP4108337B2 (en) Electronic filing system and search index creation method thereof
CN113065419B (en) Pattern matching algorithm and system based on flow high-frequency content
CN115149962A (en) Deterministic finite automata compression method, device, equipment and storage medium
CN111884659B (en) Compression method and device of FST data
KR100992440B1 (en) A Multiple Pattern Matching Method using Multiple Consecutive Sub-patterns
Tripp A parallel “String Matching Engine” for use in high speed network intrusion detection systems
CN112365247A (en) Block chain storage optimization method based on remainder system and RAFT algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190705