CN115149962A - Deterministic finite automata compression method, device, equipment and storage medium - Google Patents

Info

Publication number
CN115149962A
Authority
CN
China
Prior art keywords
state
deterministic finite
finite automaton
path
default
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210918797.8A
Other languages
Chinese (zh)
Inventor
黄昆
游芊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory
Priority to CN202210918797.8A
Publication of CN115149962A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Abstract

The application discloses a deterministic finite automaton compression method, apparatus, device and storage medium. The method comprises: reading characteristic characters from a characteristic character string set; constructing a deterministic finite automaton based on the characteristic characters; and compressing the deterministic finite automaton based on the failure paths and default paths of its states to obtain a TCAM table, the TCAM table being used for string matching. Because the deterministic finite automaton is compressed using the failure paths and default paths of its states, no common suffix tree is required and the state code length is not bound to the character string length, which improves the compression ratio of the DFA.

Description

Deterministic finite automata compression method, device, equipment and storage medium
Technical Field
The present application relates to the field of computers, and in particular, to a deterministic finite automata compression method, apparatus, device, and storage medium.
Background
String matching algorithms are the core functions of content-based network applications such as firewalls, intrusion detection and prevention, application layer protocol identification, and the like, and typically use Deterministic Finite Automata (DFA) to represent a given set of characteristic string rules. In order to improve string matching performance, the DFA is compressed based on the TCAM.
At present, the TCAM-based DFA compression method is mainly a compact DFA method. The compact DFA method adopts a DFA state coding algorithm and a DFA transition edge fusion algorithm based on a common suffix tree, but the DFA state coding length is the longest character string length, so that the compression ratio of the DFA is low.
Disclosure of Invention
In view of the above, the present application provides a deterministic finite automata compression method, apparatus, device and storage medium, which are intended to improve the compression ratio of DFA.
To achieve the above object, the present application provides a deterministic finite automaton compression method, comprising:
reading characteristic characters from the characteristic character string set;
constructing a deterministic finite automaton based on the characteristic characters;
based on the failure path and the default path of the state of the deterministic finite automaton, compressing the deterministic finite automaton to obtain a TCAM table; the TCAM table is used for string matching.
Illustratively, the compressing the deterministic finite automaton based on the failure path and the default path of the state of the deterministic finite automaton to obtain the TCAM table includes:
performing state coding on the deterministic finite automaton based on a failure path of the states of the deterministic finite automaton to obtain source codes and target codes corresponding to a plurality of states of the deterministic finite automaton;
and based on the source code, the target code and the default path of each state, compressing the migration edge of the deterministic finite automaton to obtain an item in the TCAM.
Illustratively, the state encoding of the deterministic finite automaton based on the failure path of the states of the deterministic finite automaton to obtain source codes and destination codes corresponding to a plurality of states of the deterministic finite automaton includes:
determining a default migration edge of each state based on a failure path of the states of the deterministic finite automaton;
constructing a coding tree which takes each state as a state node and the default migration edge as an edge;
and calculating the source code and the destination code corresponding to each state based on the coding tree.
Illustratively, the determining a default migration edge of each state based on the failure path of the states of the deterministic finite automaton comprises:
traversing each state, and calculating the weight of the traversed state and the plurality of failure states; the failure state is a state on a failure path corresponding to the state; the weight is the number of the public migration edges;
selecting the failure state corresponding to the highest weight as a target failure state;
and generating a default migration edge of each state and the target failure state corresponding to each state.
Illustratively, the calculating the source code and the destination code corresponding to each state based on the coding tree includes:
calculating mask lengths of a plurality of state nodes based on the coding tree;
based on the mask length, performing descending sequencing on each state node to obtain a sequence of the state nodes;
and traversing the sequence, and calculating the source code and the destination code corresponding to each state.
Illustratively, the compressing the migration edge of the deterministic finite automata based on the source code, the destination code, and the default path of each state to obtain an entry in the TCAM table includes:
generating reverse default paths corresponding to the root state node and the plurality of child state nodes based on the default paths;
traversing the state nodes, and eliminating the common migration edges of the current state nodes and the state nodes in the corresponding reverse default path to obtain one table entry in the TCAM; the entry contains the source code and the destination code.
For example, after the compressing the deterministic finite automaton based on the failure path and the default path of the state of the deterministic finite automaton to obtain the TCAM table, the method includes:
reading input characters in a character string to be processed;
traversing the table entries in the TCAM table, and determining the table entry with the highest matching degree with the key value as a target table entry;
updating the TCAM table based on the target table entry.
Illustratively, to achieve the above object, the present application further provides a deterministic finite automaton compression method apparatus, including:
the first reading module is used for reading the characteristic characters from the characteristic character string set;
the construction module is used for constructing a deterministic finite automaton based on the characteristic characters;
the compression module is used for compressing the deterministic finite automaton based on the failure path and the default path of the state of the deterministic finite automaton to obtain a TCAM table; the TCAM table is used for string matching.
Illustratively, to achieve the above object, the present application further provides a deterministic finite automaton compression method device comprising a memory, a processor, and a deterministic finite automaton compression method program stored on the memory and executable on the processor, the deterministic finite automaton compression method program implementing the steps of the deterministic finite automaton compression method as described above when executed by the processor.
Illustratively, to achieve the above object, the present application also provides a computer-readable storage medium having stored thereon a deterministic finite automata compression method program, which when executed by a processor implements the steps of the deterministic finite automata compression method as described above.
In the prior art, the DFA is compressed by the compact DFA method, which uses a DFA state coding algorithm and a DFA migration edge fusion algorithm based on a common suffix tree; the DFA state code length equals the longest character string length, so the compression ratio of the DFA is low. By contrast, the present application reads characteristic characters from a characteristic character string set; constructs a deterministic finite automaton based on the characteristic characters; and compresses the deterministic finite automaton based on the failure paths and default paths of its states to obtain a TCAM table, the TCAM table being used for string matching. Because the DFA is compressed using the failure paths and default paths of its states, no common suffix tree is required and the state code length is not bound to the character string length, which improves the compression ratio of the DFA.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flow chart diagram of a first embodiment of a deterministic finite automata compression method according to the present application;
FIG. 2 is a diagram of DFA and state transition representation of a string set according to a first embodiment of the deterministic finite automata compression method of the present application;
FIG. 3 is a schematic diagram of a DFA state encoding flow of a first embodiment of the deterministic finite automata compression method of the present application;
FIG. 4 is a diagram of a failure transition edge of a DFA state according to a first embodiment of the deterministic finite automata compression method of the present application;
FIG. 5 is a diagram illustrating a failure path and default migration edges of a first embodiment of the deterministic finite automata compression method according to the present application;
FIG. 6 is a schematic diagram of a code tree of a first embodiment of the deterministic finite automata compression method according to the present application;
FIG. 7 is a diagram illustrating source and destination encodings for DFA states in a first embodiment of a deterministic finite automata compression method according to the present application;
FIG. 8 is a schematic diagram illustrating a TCAM table lookup process according to a first embodiment of the deterministic finite automata compression method of the present application;
FIG. 9 is a schematic diagram of TCAM entry numbers before and after code compression according to a second embodiment of the deterministic finite automata compression method of the present application;
FIG. 10 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The present application provides a deterministic finite automaton compression method, and referring to fig. 1, fig. 1 is a schematic flow diagram of a first embodiment of the deterministic finite automaton compression method of the present application.
Although a logical order is shown in the flow diagram, in some cases the steps shown or described may be performed in a different order. For convenience of description, the executing subject is omitted below when describing the steps of the deterministic finite automaton compression method, which includes:
Step S10, reading characteristic characters from the characteristic character string set.
Step S20, constructing the deterministic finite automaton based on the characteristic characters.
Step S30, based on the failure path and the default path of the state of the deterministic finite automaton, compressing the deterministic finite automaton to obtain a TCAM table; the TCAM table is used for string matching.
The method comprises the following specific steps:
Step S10, reading characteristic characters from the characteristic character string set.
In this embodiment, the characteristic character string set is a set of characteristic character strings, and a characteristic character string is a string of characters such as digits, letters and underscores. A characteristic character is a glyph-like unit or symbol, including letters, digits, operators, punctuation marks and other symbols. Character is the general term for the letters, digits and symbols used in computers or radio communication, and a character is the smallest unit of data access in a data structure. For example, the characteristic characters are a, b and c; the characteristic character strings are abc and cab; the characteristic character string set is { cabab, bab, babc, abc }.
Step S20, constructing the deterministic finite automaton based on the characteristic characters.
In this embodiment, the deterministic finite automaton is an automaton that implements state transitions: for a given state of the automaton and a given character of the automaton's alphabet, it moves to the next state according to a predetermined transition function. It is abbreviated as DFA (deterministic finite automaton) in the following description.
Illustratively, a DFA contains a set of states, a set of migration edges, and an alphabet. Wherein the state set includes: an initial state (source state), an acceptance state (destination state) and an intermediate state, the acceptance state being a state matching the characteristic string; wherein, the migration edge is a directed edge which is migrated from the source state to the destination state when a character is input; wherein the alphabet is the valid character set for DFA processing.
Illustratively, for a given set of characteristic strings, the Aho-Corasick (AC) algorithm is used to construct the DFA and the failure transition edge of each state. FIG. 2 shows the DFA and its state transition table for the string set { cabab, bab, babc, abc }, where the DFA has 15 states in total (i.e. states 0, 1, …), the alphabet contains 3 valid characters (i.e. characters a, b, c), and the DFA has 60 transition edges in total. A failure transition edge is a directed edge along which the automaton moves from a source state to a destination state when matching fails.
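The construction described above can be sketched as follows. This is a minimal illustration of the standard Aho-Corasick construction, not the implementation of the application: the function name `build_ac_dfa`, the pattern set and the alphabet are assumptions chosen for the example, and the state numbers it produces follow trie insertion order, so they will not match the numbering used in the figures.

```python
from collections import deque


def build_ac_dfa(patterns, alphabet):
    """Build a complete Aho-Corasick DFA; returns (goto, fail, accepting)."""
    goto = [{}]          # goto[s][ch] -> next state; holds trie edges at first
    accepting = [set()]  # patterns recognised on entering each state
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                accepting.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        accepting[s].add(pat)

    fail = [0] * len(goto)  # failure transition edge of every state
    queue = deque()
    for ch in alphabet:
        if ch in goto[0]:
            queue.append(goto[0][ch])  # depth-1 states fail to the root
        else:
            goto[0][ch] = 0            # the root loops on unmatched characters

    # Breadth-first order guarantees fail[s] is complete before s is handled.
    while queue:
        s = queue.popleft()
        accepting[s] |= accepting[fail[s]]
        for ch in alphabet:
            if ch in goto[s]:                    # existing trie edge
                t = goto[s][ch]
                fail[t] = goto[fail[s]][ch]
                queue.append(t)
            else:                                # fill in the missing DFA edge
                goto[s][ch] = goto[fail[s]][ch]
    return goto, fail, accepting


# Hypothetical pattern set, used only to exercise the sketch.
goto, fail, accepting = build_ac_dfa(["he", "she", "his", "hers"], "ehirs")
```

Every state ends up with exactly one outgoing edge per alphabet character, which is the uncompressed transition table that the later steps shrink.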
Step S30, based on the failure path and the default path of the state of the deterministic finite automaton, compressing the deterministic finite automaton to obtain a TCAM table; the TCAM table is used for string matching.
In this embodiment, the DFA compression flow includes DFA state coding and DFA migration edge fusion. A failure path, i.e. the path from the DFA state to the initial state, is constructed for each DFA state from the failure transition edges of the DFA states. The default path is the path from a state node to the root state node in the coding tree whose state nodes are the DFA states.
Illustratively, the compressing the deterministic finite automaton based on the fail path and the default path of the state of the deterministic finite automaton to obtain the TCAM table includes:
Step S31, based on the failure path of the state of the deterministic finite automaton, performing state coding on the deterministic finite automaton to obtain source codes and destination codes corresponding to a plurality of states of the deterministic finite automaton.
In this embodiment, the source code is the code of the source state of a DFA migration edge, and the destination code is the code of the destination state of a DFA migration edge.
Specifically, a failure path, i.e. the path from the DFA state to the initial state, is constructed for each DFA state from the failure transition edges. The default migration edge of each DFA state is then computed from its failure path: the number of common migration edges between the DFA state and each state on its failure path is counted, and the directed edge with the largest number of common migration edges and the smallest destination state depth is selected as the default migration edge. A coding tree covering all DFA states is constructed from the default transition edges of the DFA states; the mask length of each state node is calculated bottom-up, and the source code and the destination code of each state node are calculated top-down. FIG. 3 shows the DFA state encoding flow.
Illustratively, the state encoding of the deterministic finite automaton based on the failure path of the state of the deterministic finite automaton to obtain source codes and destination codes corresponding to a plurality of states of the deterministic finite automaton includes:
step S311, determining a default migration edge of each state based on the failure path of the state of the deterministic finite automaton.
In this embodiment, the default migration edge is a directed edge with the most common migration edges and the least depth of the destination state on the failure path of each DFA state. The common transition edge is a transition edge with the same input character and the same destination state between the two DFA states.
Illustratively, the determining a default migration edge of each state based on the failure path of the states of the deterministic finite automaton includes:
Step S3111, traversing each state, and calculating the weight between the traversed state and each of its failure states; a failure state is a state on the failure path corresponding to the state; the weight is the number of common migration edges.
In this embodiment, the weight is the number of common migration edges, and a failure state is a destination state on the failure path corresponding to the state. The weight between each DFA state and every failure state on its failure path is calculated. FIG. 4 shows the transition edges and failure transition edges of the DFA states. For example, as shown in FIG. 5, the failure path 11 → 14 → 1 → 0 of state 11 consists of 3 failure transition edges: 11 → 14, 14 → 1 and 1 → 0. The weight of the directed edge 11 → 14 is 256, the weight of the directed edge 11 → 1 is 256, and the weight of the directed edge 11 → 0 is 255.
Step S3112, selecting the failure state corresponding to the highest weight as the target failure state.
In this embodiment, the weights of the failure states are compared, and the failure state with the highest weight is selected. As shown in FIG. 5, the directional edge 11 → 14 and the directional edge 11 → 1 have a weight of 256, failure states 14 and 1 are selected, and the target failure state is determined by continuing to compare the destination state depths.
Step S3113, generating a default transition edge of each state and the target failure state corresponding to each state.
In this embodiment, the directed edge with the largest weight and the smallest destination state depth is selected as the default migration edge of the DFA state. As shown in FIG. 4, the depth of failure state 14 is 3, the depth of failure state 0 is 0, and the depth of failure state 1 is 1. Thus, state 11 selects as its default transition edge the directed edge 11 → 1, which has the largest weight (i.e., 256) and, among the edges with that weight, the smallest destination state depth (i.e., 1).
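The selection rule of steps S3111 to S3113 can be sketched as below. The helper names and the four-state transition table are hypothetical and only mirror the shape of the state 11 example (two failure states tie on weight, and the shallower one wins); they are not taken from the figures.

```python
def failure_path(state, fail):
    """Failure states reached from `state` by following failure edges to root 0."""
    path = []
    while state != 0:
        state = fail[state]
        path.append(state)
    return path


def common_edges(goto, s, t):
    """Weight: number of input characters on which s and t reach the same state."""
    return sum(1 for ch, dst in goto[s].items() if goto[t].get(ch) == dst)


def default_edge(state, goto, fail, depth):
    """Target failure state: highest weight first, then smallest depth."""
    candidates = failure_path(state, fail)
    if not candidates:
        return None  # the initial state has no default migration edge
    return max(candidates, key=lambda t: (common_edges(goto, state, t), -depth[t]))


# Hypothetical toy automaton over the alphabet {a, b}.
GOTO = [
    {"a": 1, "b": 0},  # state 0 (initial state)
    {"a": 1, "b": 2},  # state 1
    {"a": 1, "b": 2},  # state 2
    {"a": 1, "b": 2},  # state 3
]
FAIL = [0, 0, 1, 2]    # failure path of state 3 is 3 -> 2 -> 1 -> 0
DEPTH = [0, 1, 2, 3]

target = default_edge(3, GOTO, FAIL, DEPTH)
```

Here states 2 and 1 both share two edges with state 3 while state 0 shares only one, so the tie is broken in favour of state 1, the shallower of the two.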
Step S312, a coding tree is constructed with each state as a state node and the default migration edge as an edge.
In this embodiment, the state node takes the DFA state as a node. Wherein the state nodes include root state nodes and child state nodes. And constructing a coding tree which takes the DFA state as a state node and takes the default transition edge of the DFA state as an edge. As shown in fig. 6, a coding tree covering all 15 DFA states with initial state 0 as the root node is constructed. For example, the default transition edge of state 5 points to state 10, the default transition edge of state 10 points to state 6, and the default transition edge of state 6 points to initial state 0.
Step S313, based on the coding tree, calculating the source code and the destination code corresponding to each state.
Illustratively, the calculating the source code and the destination code corresponding to each state based on the coding tree includes:
Step S3131, calculating mask lengths of a plurality of state nodes based on the coding tree.
In this embodiment, a mask is a binary string that is bitwise ANDed with a target field to mask out selected input bits, and the mask length is the number of masked bits.
Specifically, the mask length of each state code in the coding tree is calculated bottom-up, starting from the leaf state nodes of the coding tree and ending at the root state node (i.e. the initial state). A leaf state node is a child state node that has no child state nodes of its own. As shown in FIG. 6, states 11, 14, 2, 5, 3, 9, 4, 8 and 12 are leaf state nodes with a mask length of 0, which means that the source code and the destination code of a leaf state node are the same exact value. The mask lengths of the non-leaf state nodes of the coding tree are calculated bottom-up as follows.
$$\mathrm{mask\_length}_j = \left\lceil \log_2\left(1 + \sum_{i \in \mathrm{children}(j)} 2^{\mathrm{mask\_length}_i}\right) \right\rceil$$
where mask_length_j is the mask length of state node j; state node i is a child state node of state node j, with mask length mask_length_i; the sum $\sum_{i} 2^{\mathrm{mask\_length}_i}$ is the total number of code values of all child state nodes i; and the 1 is the single code value of the parent node j itself. According to the above formula, the mask lengths of the non-leaf nodes in FIG. 6 are calculated: for example, the mask length of state nodes 1 and 13 is 2, the mask length of state nodes 10 and 7 is 1, the mask length of state node 6 is 3, and the mask length of root state node 0 is 5 (indicating that the code length of all state nodes in the coding tree is 5 bits).
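A short sketch of the bottom-up calculation follows, assuming that each node needs one code value for itself plus 2^mask_length code values for every child, rounded up to a whole number of bits. The child-to-parent map is partly hypothetical: the text states the edges 5 → 10, 10 → 6, 6 → 0 and 11 → 1 and the layer 1 nodes, but not the parents of leaves 14, 3, 4 and 9, which are placed here only so that the quoted mask lengths are reproduced.

```python
def mask_lengths(parent, root=0):
    """Bottom-up mask lengths for a coding tree given as a child -> parent map."""
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)

    lengths = {}

    def visit(j):
        # One code value for j itself plus 2**mask_length for every child.
        total = 1 + sum(2 ** visit(i) for i in children.get(j, []))
        # (total - 1).bit_length() equals ceil(log2(total)) for total >= 1.
        lengths[j] = (total - 1).bit_length()
        return lengths[j]

    visit(root)
    return lengths


# Child -> parent map; the parents of 14, 3, 4 and 9 are assumptions.
FIG6_LIKE_TREE = {
    6: 0, 1: 0, 7: 0, 2: 0, 8: 0, 12: 0,   # layer 1
    13: 6, 10: 6, 11: 1, 14: 1, 9: 7,      # layer 2
    5: 10, 3: 13, 4: 13,                   # layer 3
}

lengths = mask_lengths(FIG6_LIKE_TREE)
print(lengths[0], lengths[6], lengths[1], lengths[13], lengths[10], lengths[7])
# prints: 5 3 2 2 1 1
```

The integer `bit_length` form avoids floating-point logarithms; for the root it evaluates 1 + 8 + 4 + 2 + 1 + 1 + 1 = 18 code values, which needs 5 bits.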
Step S3132, sorting the state nodes in descending order of mask length to obtain a sequence of the state nodes.
In this embodiment, before the source codes and destination codes of the state nodes are calculated, the state nodes in each layer of the coding tree are sorted in descending order of mask length: the longer the mask length, the earlier the node appears in the sequence. For example, sorted in descending order of mask length, the layer 1 state nodes in FIG. 6 are state nodes 6, 1, 7, 2, 8 and 12.
Step S3133, traversing the sequence, and calculating a source code and a destination code corresponding to each state.
In this embodiment, a sequence of each layer is traversed from a root state node to a leaf state node of the coding tree in a top-down manner, and a source code and a destination code of each state node in the coding tree are calculated.
Specifically, as shown in FIG. 7, since the mask length of root state node 0 is 5, all 5 bits of its source code are set to the wildcard x, and its destination code is the lower limit 00000 of the coverage of the source code. The source code of layer 1 state node 6 (mask length 3) is the mask value 11 of the coverage upper limit 11111 of parent node 0, and its destination code is the coverage lower limit 11000 of the source code. The source code of layer 1 state node 1 (mask length 2) is the mask value 101 of the remaining coverage upper limit 10111 of parent node 0, and its destination code is the coverage lower limit 10100 of the source code. The source code of layer 1 state node 7 (mask length 1) is the mask value 1001 of the remaining coverage upper limit 10011 of parent node 0, and its destination code is the coverage lower limit 10010 of the source code. The source code of layer 1 state node 2 (mask length 0) is the mask value 10001 of the remaining coverage upper limit 10001 of parent node 0, and its destination code is the coverage lower limit 10001 of the source code, i.e. the source code and the destination code of state node 2 are the same. Similarly, the source code and the destination code of layer 1 state node 8 (mask length 0) are both 10000, and the source code and the destination code of layer 1 state node 12 (mask length 0) are both 01111. The source code of layer 2 state node 13 (mask length 2) is the mask value 111 of the coverage upper limit 11111 of parent node 6, and its destination code is the coverage lower limit 11100 of the source code. The source code of layer 2 state node 10 (mask length 1) is the mask value 1101 of the remaining coverage upper limit 11011 of parent node 6, and its destination code is the coverage lower limit 11010 of the source code.
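The top-down assignment walked through above can be sketched as a range allocation: each node owns a block of 2^mask_length consecutive code values, its destination code is the lowest value of that block, and its children, sorted by descending mask length, take sub-blocks from the top of the block downwards. The function name, the use of `*` for a wildcard bit, the tie-break by state number and the parents assigned to leaves 14, 3, 4 and 9 are assumptions made for this illustration.

```python
def assign_codes(parent, mask_len, root=0):
    """Return {state: (source_code, destination_code)} as bit strings."""
    width = mask_len[root]
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)

    codes = {}

    def visit(node, hi):
        m = mask_len[node]
        lo = hi - (1 << m) + 1                        # coverage lower limit
        prefix = format(hi, f"0{width}b")[: width - m]
        codes[node] = (prefix + "*" * m, format(lo, f"0{width}b"))
        next_hi = hi                                  # remaining coverage upper limit
        ordered = sorted(children.get(node, []), key=lambda k: (-mask_len[k], k))
        for kid in ordered:
            visit(kid, next_hi)
            next_hi -= 1 << mask_len[kid]

    visit(root, (1 << width) - 1)
    return codes


# Child -> parent map; the parents of 14, 3, 4 and 9 are assumptions.
PARENT = {6: 0, 1: 0, 7: 0, 2: 0, 8: 0, 12: 0, 13: 6, 10: 6,
          11: 1, 14: 1, 9: 7, 5: 10, 3: 13, 4: 13}
# Mask lengths as quoted in the text; every other node is a leaf with length 0.
MASK_LEN = {n: 0 for n in range(15)}
MASK_LEN.update({0: 5, 6: 3, 1: 2, 13: 2, 10: 1, 7: 1})

codes = assign_codes(PARENT, MASK_LEN)
print(codes[6], codes[1], codes[7])
# prints: ('11***', '11000') ('101**', '10100') ('1001*', '10010')
```

Because the blocks are powers of two handed out largest first from an aligned upper limit, every block stays aligned, which is what allows a source code to be written as a fixed prefix followed by wildcard bits.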
Illustratively, while outputting the source code and the destination code of all DFA states, the default path of each state node in the coding tree, i.e., one default path from the state node to the root state node, is output. As shown in FIG. 7, the default path for state 5 is 5 → 10 → 6 → 0.
In this embodiment, two dependency paths of each DFA state, namely its failure path and its default path, are used to implement DFA state coding and DFA migration edge fusion efficiently. Compared with existing TCAM-based DFA compression methods, the DFA compression algorithm of the invention reduces the DFA compression time. Experimental results on the Snort attack signature string rule set show that the DFA compression time of the method of the invention is reduced to 1/9.6 of that of the compact DFA (compact deterministic finite automaton) method, to 1/13362 of that of the CSE (Covered State Encoding) method, and to 1/3049 of that of the SSE (Shadow State Encoding) method, so the compression efficiency is greatly improved.
Illustratively, after the compressing the deterministic finite automaton based on the failure path and the default path of the state of the deterministic finite automaton to obtain the TCAM table, the method further includes:
Step a, reading input characters in a character string to be processed.
In this embodiment, as shown in fig. 8, a TCAM (ternary content addressable memory) table lookup procedure is performed, that is, an input character is read from a character string to be processed in a data packet.
Step b, traversing the table items in the TCAM table, and determining the table item with the highest matching degree with the key value as a target table item.
In this embodiment, the key value consists of the source state and the input character, and the current source state is initialized to the initial state. The TCAM table is searched with the source state and the input character as the key value, the matching TCAM table entry with the highest priority (i.e. the smallest index value) is selected, and its destination state is output. When the destination state is an accepting state, the matched string rule is output.
Step c, updating the TCAM table based on the target table entry.
In this embodiment, the current source state is updated to the destination state, the next input character is read, and the TCAM table is searched again with the source state and the input character as the key value, until the input character string has been completely processed.
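The lookup loop of steps a to c can be simulated in software as below. This is only a sketch: a real TCAM compares all entries in parallel, whereas the list scan here just reproduces the smallest-index-wins priority. The entry layout `(source_code, input_char, destination_code)`, the use of `None` as the input-character wildcard, and the three-entry table for the single pattern `ab` are all assumptions for illustration.

```python
def ternary_match(pattern, bits):
    """True if every pattern bit is the wildcard '*' or equals the key bit."""
    return len(pattern) == len(bits) and all(
        p in ("*", b) for p, b in zip(pattern, bits)
    )


def tcam_lookup(table, state_code, ch):
    """Destination code of the first (highest-priority) matching entry."""
    for src, in_ch, dst in table:
        if ternary_match(src, state_code) and in_ch in (None, ch):
            return dst
    raise LookupError("no entry matched; the table lacks a default entry")


def scan(table, accepting, text, initial_code):
    """Feed `text` through the table; return (position, rule) for every match."""
    state, hits = initial_code, []
    for pos, ch in enumerate(text):
        state = tcam_lookup(table, state, ch)   # destination becomes new source
        if state in accepting:
            hits.append((pos, accepting[state]))
    return hits


# Hypothetical compressed table for the single pattern "ab" with 2-bit codes.
TABLE = [
    ("11", "b", "10"),    # state "a" on b -> accepting state "ab"
    ("**", "a", "11"),    # any state on a -> state "a"
    ("**", None, "00"),   # default entry: anything else -> initial state
]
ACCEPTING = {"10": "ab"}

print(scan(TABLE, ACCEPTING, "xabab", "00"))
# prints: [(2, 'ab'), (4, 'ab')]
```

Three entries stand in for the nine transitions of the uncompressed three-state automaton, and the tail entry plays the role of the default TCAM table entry.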
In the prior art, the DFA is compressed by the compact DFA method, which uses a DFA state coding algorithm and a DFA migration edge fusion algorithm based on a common suffix tree; the DFA state code length equals the longest character string length, so the compression ratio of the DFA is low. By contrast, the present application reads characteristic characters from a characteristic character string set; constructs a deterministic finite automaton based on the characteristic characters; and compresses the deterministic finite automaton based on the failure paths and default paths of its states to obtain a TCAM table, the TCAM table being used for string matching. Because the DFA is compressed using the failure paths and default paths of its states, no common suffix tree is required and the state code length is not bound to the character string length, which improves the compression ratio of the DFA.
Illustratively, based on the first embodiment of the deterministic finite automata compression method of the present application described above, a second embodiment is proposed, where the method further includes:
Step S32, compressing the migration edges of the deterministic finite automaton based on the source code, the destination code and the default path of each state to obtain the entries in the TCAM table.
In this embodiment, the source codes, destination codes and default paths of the DFA states are used to fuse and compress multiple DFA migration edges into a single TCAM table entry; the TCAM table contains multiple entries. The process is as follows: the source state and destination state of each DFA migration edge are replaced by the source code and destination code of the corresponding DFA states; then, using the default path of each DFA state in the coding tree, i.e. the path from the DFA state to the root state (the initial state), multiple migration edges with the same input character and the same destination code are fused and compressed into one TCAM table entry. Finally, a default TCAM table entry is appended at the tail of the TCAM table, whose source code is the wildcard …, whose input character is the wildcard …, and whose destination code is that of the initial state, 00 ….
Illustratively, the compressing the migration edge of the deterministic finite automaton based on the source code, the destination code, and the default path of each state to obtain an entry in a TCAM table includes:
Step S321: generating, based on the default paths, the reverse default paths from the root state node to the plurality of child state nodes.
In this embodiment, the default path is a path from a child state node to a root state node, and the reverse default path is a path from the root state node to the child state node.
Step S322: traversing the state nodes and eliminating the migration edges that the current state node has in common with the state nodes on its corresponding reverse default path, to obtain the entries in the TCAM table; each entry contains the source code and the destination code.
In this embodiment, starting from the root state node of the coding tree, the migration edges that a state node shares (same input character and same destination code) with the state nodes on its reverse default path are eliminated. Specifically, state node j in the coding tree is compared, edge by edge, with each state node i on its reverse default path, and every migration edge of state node i that has the same input character and the same destination code as a migration edge of state node j is eliminated. Finally, the remaining migration edges of each state node in the coding tree are stored in the TCAM in ascending order of source code mask length; that is, the shorter the source code mask length of a migration edge, the higher its priority and the smaller its index value in the TCAM.
Illustratively, the default TCAM entry is inserted at the tail of the TCAM table. When no other migration edge matches the source state and the input character, the default TCAM entry outputs the initial state as the destination state. The source code of the default TCAM entry is the wildcard …, its input character is the wildcard …, and its destination code is the code 00 … of the initial state. The default TCAM entry has the lowest priority and the largest index value in the TCAM table.
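The elimination and ordering steps can be illustrated with a small sketch. It rests on several assumptions that are not taken from the text: the state codes and ternary source patterns are supplied as inputs rather than computed; each state is compared only with its parent in the coding tree, using the parent's complete transition row (a simplification of the path-wide comparison described above); the mask length is taken to be the number of wildcard positions in the source pattern; and edges of the root that lead back to the root are left to the default entry. All names (`eliminate_common_edges`, `build_entries`, `ROWS`, `PARENT`, `PATTERN`, `CODE`, `WILDCARDS`) and the `^` symbol for "any other character" are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]  # input character -> destination state


def eliminate_common_edges(
    rows: Dict[int, Row], parent: Dict[int, Optional[int]]
) -> Dict[int, Row]:
    """Keep only the edges of each state that differ from its parent's full row."""
    kept: Dict[int, Row] = {}
    for state, row in rows.items():
        up = parent[state]
        if up is None:  # root state: nothing to inherit from
            kept[state] = dict(row)
        else:
            kept[state] = {c: d for c, d in row.items() if rows[up][c] != d}
    return kept


def build_entries(
    kept: Dict[int, Row],
    pattern: Dict[int, str],
    code: Dict[int, str],
    wildcards: Dict[int, int],
    root: int,
) -> List[Tuple[str, str, str]]:
    """Emit (source pattern, character, destination code), fewest wildcards first."""
    entries = []
    for state in sorted(kept, key=lambda s: wildcards[s]):
        for ch, dst in kept[state].items():
            if state == root and dst == root:
                continue  # assumed to be covered by the default entry below
            entries.append((pattern[state], ch, code[dst]))
    entries.append(("*" * len(code[root]), "*", code[root]))  # default entry at the tail
    return entries


# Hypothetical 3-state DFA for the rule "ab" over the characters a, b and "^".
ROWS = {0: {"a": 1, "b": 0, "^": 0}, 1: {"a": 1, "b": 2, "^": 0}, 2: {"a": 1, "b": 0, "^": 0}}
PARENT = {0: None, 1: 0, 2: 0}            # default edges of the coding tree
CODE = {0: "00", 1: "01", 2: "10"}        # exact code of each state (assumed)
PATTERN = {0: "**", 1: "01", 2: "10"}     # pattern covering each state's subtree (assumed)
WILDCARDS = {0: 2, 1: 0, 2: 0}

entries = build_entries(eliminate_common_edges(ROWS, PARENT), PATTERN, CODE, WILDCARDS, 0)
print(entries)  # [('01', 'b', '10'), ('**', 'a', '01'), ('**', '*', '00')]
```

In this toy case the nine migration edges of the uncompressed DFA collapse to three entries, because the root's entry with a fully wildcarded source pattern also serves every state below it.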
Illustratively, the compressed TCAM table is output. As shown in fig. 9, the TCAM table before code compression contains 60 TCAM entries: the DFA has 15 states, and the character table contains 3 valid characters and 1 invalid character (namely [^abc], denoting any character other than a, b and c). The TCAM table after code compression contains 15 TCAM entries, in which the source state of each entry is a source code and the destination state is a destination code, so the storage space is only 1/4 of the original. In terms of TCAM storage cost, the DFA compression method of the invention achieves a reduction of 89.4% compared with the compactDFA method and of 57.9% compared with the CSE method, improving storage economy.
In this embodiment, a conventional TCAM table stores the DFA state numbers, all valid characters, one invalid character, and the destination state of each DFA state number under each character; it therefore occupies a large storage space, and the large number of DFA migration edges causes high TCAM storage overhead. In the present application, DFA migration edges are fused so that multiple DFA migration edges are compressed into a single TCAM entry, yielding a space-efficient compressed TCAM table that reduces the storage space and the TCAM storage overhead and improves storage economy.
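The entry counts quoted for fig. 9 can be checked with a few lines of arithmetic. The variable names are illustrative, and the check counts entries only, so it assumes every entry occupies the same width before and after compression.

```python
states = 15
symbols = 3 + 1  # three valid characters plus one "other" character

entries_before = states * symbols  # one entry per (state, character) pair
entries_after = 15                 # entry count stated for the compressed table
ratio = entries_after / entries_before

print(entries_before, ratio)  # 60 0.25
```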
Illustratively, the present application further provides a deterministic finite automaton compression apparatus, which includes:
the first reading module is used for reading the characteristic characters from the characteristic character string set;
the construction module is used for constructing a deterministic finite automaton based on the characteristic characters;
the compression module is used for compressing the deterministic finite automaton based on the failure path and the default path of the state of the deterministic finite automaton to obtain a TCAM table; the TCAM table is used for string matching.
Illustratively, the compression module includes:
the encoding sub-module is used for carrying out state encoding on the deterministic finite automaton based on a failure path of the states of the deterministic finite automaton to obtain source codes and destination codes corresponding to a plurality of states of the deterministic finite automaton;
and the compression sub-module is used for compressing the migration edge of the deterministic finite automaton based on the source code, the destination code and the default path of each state to obtain an entry in the TCAM table.
Illustratively, the encoding submodule includes:
a determining unit, configured to determine a default migration edge of each state based on a failure path of the state of the deterministic finite automaton;
a constructing unit, configured to construct a coding tree in which each state is a state node and the default migration edge is an edge;
and the calculating unit is used for calculating the source code and the destination code corresponding to each state based on the coding tree.
Illustratively, the determining unit includes:
the first traversal subunit is used for traversing each state and calculating the weight between the traversed state and each of its failure states; a failure state is a state on the failure path corresponding to the state; the weight is the number of common migration edges;
the selecting subunit is used for selecting the failure state corresponding to the highest weight as a target failure state;
and the generating subunit is used for generating the default migration edge from each state to the target failure state corresponding to that state.
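The selection performed by the determining unit can be sketched as follows. The sketch assumes that each state has a complete transition row and a single failure link, that the root is its own failure state, and that ties in weight are broken in favour of the failure state nearest to the current state; the tie-breaking rule in particular is an assumption, since the text does not state one. The names `failure_path`, `common_edges`, `default_edges`, `ROWS` and `FAIL` are hypothetical, and `^` stands for "any other character".

```python
from typing import Dict, List

Row = Dict[str, int]  # input character -> destination state


def failure_path(state: int, fail: Dict[int, int], root: int = 0) -> List[int]:
    """Failure states reached by following failure links from state to the root."""
    path = []
    while state != root:
        state = fail[state]
        path.append(state)
    return path


def common_edges(row_a: Row, row_b: Row) -> int:
    """Weight: number of edges with the same input character and destination."""
    return sum(1 for ch, dst in row_a.items() if row_b.get(ch) == dst)


def default_edges(rows: Dict[int, Row], fail: Dict[int, int], root: int = 0) -> Dict[int, int]:
    """Map each non-root state to the failure state with the highest weight."""
    targets = {}
    for state in rows:
        path = failure_path(state, fail, root)
        if path:  # max() keeps the first (nearest) candidate on ties
            targets[state] = max(path, key=lambda f: common_edges(rows[state], rows[f]))
    return targets


# Hypothetical DFA for the rule "aab": states 0 (root), 1 ("a"), 2 ("aa"), 3 ("aab").
ROWS = {
    0: {"a": 1, "b": 0, "^": 0},
    1: {"a": 2, "b": 0, "^": 0},
    2: {"a": 2, "b": 3, "^": 0},
    3: {"a": 1, "b": 0, "^": 0},
}
FAIL = {0: 0, 1: 0, 2: 1, 3: 0}

print(default_edges(ROWS, FAIL))  # {1: 0, 2: 1, 3: 0}
```

State 2 has two candidates on its failure path: it shares two edges with state 1 and only one with the root, so its default migration edge points to state 1.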
Illustratively, the computing unit includes:
a calculating subunit, configured to calculate mask lengths of a plurality of state nodes based on the coding tree;
a sorting subunit, configured to perform descending sorting on each state node based on the mask length to obtain a sequence of the state nodes;
and the second traversal subunit is used for traversing the sequence and calculating the source code and the destination code corresponding to each state.
Illustratively, the compression sub-module includes:
the generating unit is used for generating reverse default paths corresponding to the plurality of sub-state nodes from the root state node based on the default paths;
the elimination unit is used for traversing the state nodes and eliminating the migration edges that the current state node has in common with the state nodes on its corresponding reverse default path, to obtain the entries in the TCAM table; each entry contains the source code and the destination code.
Illustratively, the deterministic finite automaton compression apparatus further includes:
the second reading module is used for reading the input characters in the character string to be processed;
the traversal module is used for traversing the table entries in the TCAM table and determining the table entry with the highest matching degree with the key value as a target table entry;
and the updating module is used for updating the TCAM table based on the target table entry.
The specific implementation of the deterministic finite automaton compression apparatus of the present application is substantially the same as the embodiments of the deterministic finite automaton compression method described above, and is not described herein again.
In addition, the present application also provides a deterministic finite automaton compression device. Fig. 10 is a schematic structural diagram of an exemplary hardware operating environment of the deterministic finite automaton compression device according to an embodiment of the present application.
As shown in fig. 10, the deterministic finite automaton compression device may include a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 communicate with each other through the communication bus 1004; the memory 1003 is used for storing a computer program; and the processor 1001 is configured to implement the steps of the deterministic finite automaton compression method when executing the program stored in the memory 1003.
The communication bus 1004 of the deterministic finite automaton compression device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 1004 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 1002 is used for communication between the deterministic finite automata compression method apparatus described above and other apparatuses.
The memory 1003 may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory 1003 may also be at least one storage device located remotely from the processor 1001.
The processor 1001 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The specific implementation of the deterministic finite automata compression method device of the present application is substantially the same as the embodiments of the deterministic finite automata compression method described above, and is not described herein again.
Furthermore, an embodiment of the present application also provides a computer-readable storage medium, on which a deterministic finite automata compression method program is stored, and when executed by a processor, the deterministic finite automata compression method program implements the steps of the deterministic finite automata compression method as described above.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as the embodiments of the deterministic finite automata compression method, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, a device, or a network device) to execute the method described in the embodiments of the present application.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (10)

1. A method of deterministic finite automata compression, the method comprising:
reading characteristic characters from the characteristic character string set;
constructing a deterministic finite automaton based on the characteristic characters;
based on the failure path and the default path of the state of the deterministic finite automaton, compressing the deterministic finite automaton to obtain a TCAM table; the TCAM table is used for string matching.
2. The method of claim 1, wherein compressing the deterministic finite automaton based on the failure path and the default path of the states of the deterministic finite automaton to obtain a TCAM table comprises:
performing state coding on the deterministic finite automaton based on a failure path of the states of the deterministic finite automaton to obtain source codes and destination codes corresponding to a plurality of states of the deterministic finite automaton;
and compressing the migration edges of the deterministic finite automaton based on the source code, the destination code and the default path of each state to obtain the entries in the TCAM table.
3. The method of claim 2, wherein the state encoding of the deterministic finite automaton based on the failure path of the states of the deterministic finite automaton to obtain source and destination encodings corresponding to a plurality of states of the deterministic finite automaton comprises:
determining a default migration edge of each state based on a failure path of the states of the deterministic finite automaton;
constructing a coding tree with each state as a state node and the default migration edge as an edge;
and calculating the source code and the destination code corresponding to each state based on the coding tree.
4. The method of claim 3, wherein determining the default migration edge for each state based on the failure path for the states of the deterministic finite automaton comprises:
traversing each state, and calculating the weight between the traversed state and each of its failure states; a failure state is a state on the failure path corresponding to the state; the weight is the number of common migration edges;
selecting the failure state corresponding to the highest weight as a target failure state;
and generating a default migration edge from each state to the target failure state corresponding to that state.
5. The method of claim 3, wherein said calculating the source code and the destination code corresponding to each state based on the coding tree comprises:
calculating mask lengths of a plurality of state nodes based on the coding tree;
based on the mask length, sequencing each state node in a descending order to obtain a sequence of the state nodes;
and traversing the sequence, and calculating the source code and the destination code corresponding to each state.
6. The method of claim 2, wherein compressing the migration edge of the deterministic finite automaton based on the source code, the destination code, and the default path for each state to obtain an entry in a TCAM table comprises:
generating, based on the default paths, reverse default paths from the root state node to the plurality of child state nodes;
traversing the state nodes, and eliminating the migration edges that the current state node has in common with the state nodes on its corresponding reverse default path, to obtain the entries in the TCAM table; each entry contains the source code and the destination code.
7. The method of claim 1, wherein after the compressing of the deterministic finite automaton based on the failure path and the default path of the states of the deterministic finite automaton to obtain the TCAM table, the method further comprises:
reading input characters in a character string to be processed;
traversing the table entries in the TCAM table, and determining the table entry with the highest matching degree with the key value as a target table entry;
updating the TCAM table based on the target table entry.
8. An apparatus for deterministic finite automata compression, the apparatus comprising:
the first reading module is used for reading the characteristic characters from the characteristic character string set;
the construction module is used for constructing a deterministic finite automaton based on the characteristic characters;
the compression module is used for compressing the deterministic finite automaton based on the failure path and the default path of the state of the deterministic finite automaton to obtain a TCAM table; the TCAM table is used for string matching.
9. A deterministic finite automata compression method device, characterized in that it comprises a memory, a processor and a deterministic finite automata compression method program stored on said memory and executable on said processor, said deterministic finite automata compression method program realizing the steps of the deterministic finite automata compression method according to any of claims 1 to 7 when executed by said processor.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a deterministic finite automata compression method program which, when executed by a processor, implements the steps of the deterministic finite automata compression method according to any of claims 1 to 7.
CN202210918797.8A 2022-08-01 2022-08-01 Deterministic finite automata compression method, device, equipment and storage medium Pending CN115149962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210918797.8A CN115149962A (en) 2022-08-01 2022-08-01 Deterministic finite automata compression method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115149962A true CN115149962A (en) 2022-10-04

Family

ID=83414770

Country Status (1)

Country Link
CN (1) CN115149962A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115801020A (en) * 2023-02-13 2023-03-14 Peng Cheng Laboratory Definite finite state automaton compression method, matching method, device and medium
CN115801020B (en) * 2023-02-13 2023-04-11 Peng Cheng Laboratory Definite finite state automaton compression method, matching method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination