WO2008141519A1 - Method and chip structure for matching character strings - Google Patents


Info

Publication number
WO2008141519A1
Authority
WO
WIPO (PCT)
Prior art keywords
state
post
current
input
input character
Application number
PCT/CN2008/000293
Other languages
English (en)
Chinese (zh)
Inventor
Tian Song
Original Assignee
Beijing Zhean Technology Corporation
Application filed by Beijing Zhean Technology Corporation
Publication of WO2008141519A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/90335: Query processing
    • G06F16/90344: Query processing by using string matching techniques

Definitions

  • The present invention relates to a method and chip structure for information processing, in particular to a multi-string matching method and chip structure.

Background art

  • Multi-string matching technology, also known as multi-keyword matching technology, is mature and widely used in fields such as text processing and content filtering.
  • The technique finds occurrences of one or more strings from a predefined string set in one-dimensional content. It exploits the features of the string set by preprocessing it, then matches the content against the resulting intermediate data structure, so that all predefined strings are matched in parallel.
  • Applications of multi-string matching include network intrusion detection and prevention systems, spam filtering, virus scanning and filtering, malicious code scanning and filtering, and content filtering.
  • A typical use in such applications is to capture packets from the network, reassemble them into the data of a specific network layer, and match that data against predefined rule sets (e.g., intrusion rules, virus rules, spam rules). In most cases, this matching relies on multi-string matching techniques.
  • Among practical multi-string matching schemes, one (hereinafter scheme A) is favored because of the following characteristics:
  • Matching performance is independent of the size of the rule base, independent of the minimum rule length in the rule base, and independent of the relationship between the rule base and the text to be matched.
  • Scheme A preprocesses the pattern set P and constructs a deterministic finite automaton (DFA), as shown in Figure 1 (circles denote states; lines denote conversion rules).
  • DFA: deterministic finite automaton
  • The automaton reads one character at a time and, following the conversion rules of the structure above, advances one state per character; reaching state S3 or S5 indicates a valid match.
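The one-character-per-step walk can be sketched in Python. The text does not name the algorithm behind scheme A. This sketch assumes it is an Aho-Corasick-style automaton whose failure and restart transitions are folded into a complete DFA, so each input character costs exactly one table lookup. `build_dfa` and `scan` are illustrative names, and the state numbers they produce will not match the S3/S5 labels of Figure 1.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Delta = Dict[Tuple[int, str], int]  # (state, character) -> next state


def build_dfa(patterns: List[str]) -> Tuple[Delta, List[Set[str]]]:
    """Build a complete DFA: Aho-Corasick goto trie with failure links folded in."""
    goto: List[Dict[str, int]] = [{}]
    out: List[Set[str]] = [set()]
    for p in patterns:  # 1) trie of all patterns; state 0 is the initial state
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)

    alphabet = sorted({ch for p in patterns for ch in p})
    fail = [0] * len(goto)
    delta: Delta = {}
    queue: deque = deque()
    for ch in alphabet:  # 2) depth-1 states fail back to the initial state
        nxt = goto[0].get(ch, 0)
        delta[(0, ch)] = nxt
        if nxt:
            queue.append(nxt)
    while queue:  # 3) breadth-first, so shallower states are finished first
        s = queue.popleft()
        out[s] |= out[fail[s]]  # inherit matches that end at the same position
        for ch in alphabet:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = delta[(fail[s], ch)]
                delta[(s, ch)] = t
                queue.append(t)
            else:  # failure/restart transition folded into the table
                delta[(s, ch)] = delta[(fail[s], ch)]
    return delta, out


def scan(text: str, delta: Delta, out: List[Set[str]]) -> List[Tuple[int, str]]:
    """Advance one state per character; report (start index, pattern) matches."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        s = delta.get((s, ch), 0)  # characters outside every pattern restart at 0
        for p in sorted(out[s]):
            hits.append((i - len(p) + 1, p))
    return hits
```

For example, `scan("ushers", *build_dfa(["he", "she", "his", "hers"]))` reports "he" and "she" ending at index 3 and "hers" ending at index 5. The table size is what grows with the rule set, which is the storage problem discussed next.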
  • Paper 1 adopts scheme A and proposes a prioritized conversion-rule storage method that merges all the failure conversion rules and all the restart conversion rules in Figure 1 into at most 256 rules, greatly reducing the number of conversion rules in practical applications.
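The text does not give Paper 1's actual encoding, so the following is only one way to picture the priority idea. It keeps a low-priority default table with one entry per input character, assumed here to be the transition out of the initial state on that character. A high-priority rule is stored only where a state's transition differs from that default. With 8-bit input, the default table has at most 256 entries. `compress` and `lookup` are hypothetical names.

```python
from typing import Dict, Iterable, Tuple

Delta = Dict[Tuple[int, str], int]  # (state, character) -> next state


def compress(dfa: Delta, alphabet: Iterable[str],
             initial: int = 0) -> Tuple[Delta, Dict[str, int]]:
    """Split a complete DFA into high-priority rules plus per-character defaults."""
    # Low priority: one rule per character, taken from the initial state's row.
    default = {c: dfa[(initial, c)] for c in alphabet}
    # High priority: keep only the transitions that the default would get wrong.
    specific = {k: v for k, v in dfa.items() if v != default[k[1]]}
    return specific, default


def lookup(state: int, ch: str, specific: Delta, default: Dict[str, int]) -> int:
    """A matching high-priority rule wins; otherwise fall back to the default."""
    return specific.get((state, ch), default[ch])
```

For the complete DFA of the single pattern "ab" over the alphabet {a, b}, six transitions collapse into one high-priority rule, (1, b) -> 2, plus two defaults.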
  • However, Paper 1 does not fully solve the problem that storage space grows as the number of rules increases; matching large-scale feature sets still incurs a large space cost.
  • A state machine consists of states and conversion rules. Implementing it as a chip structure means storing its conversion rules in dedicated memory and accessing them as needed.
  • Each conversion rule contains a pre-state, an input character, and a post-state.
  • The pre-state is the current state of the state machine.
  • A conversion rule describes how the state machine, in its pre-state, receives a character and jumps to a post-state. For each (pre-state, input character) pair, the state machine has exactly one corresponding conversion rule.
  • TCAM: ternary content-addressable memory
  • The main object of the present invention is to provide a multi-string matching method and chip structure that achieve high matching speed and can handle large-scale rule sets, making them well suited to practical use.
  • The cache state machine includes: a status register for holding the current state; a cache status register for holding the cache state; and a conversion rule module that stores and accesses the state conversion rule base. The module looks up the next state from three inputs: the character received by the interface module, the current state held in the status register, and the cache state held in the cache status register. It outputs that next state to the status register and assigns the cache status register according to a specific cache rule.
  • A multi-string matching method comprises the following steps: take characters in order from the received input character stream as input characters; for each input character, look up the post-state in the state transition rule base according to the current input character, the current state, and the cache state; jump to the post-state; perform state caching according to a specific cache rule; then take the post-state as the current state, the cached state as the cache state, and the next input character as the current input character, and repeat these steps until every character in the stream has been processed.
  • The post-state lookup step is as follows. First, determine whether the basic conversion rules or the n-step cross-conversion rules contain a rule by which the current state accepts the current input character; if so, that rule's post-state is the lookup result. If not, determine whether they contain such a rule for the cache state; if so, its post-state is the lookup result. If not, determine whether they contain such a rule for the initial state; if so, its post-state is the lookup result; otherwise, the initial state itself is the lookup result.
  • State caching according to the specific cache rule is as follows: if the basic conversion rules give the initial state a post-state for the current input character, that post-state is cached; otherwise, the initial state is cached.
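The lookup order and cache rule above can be modeled in a few lines of Python. This is a sketch under stated assumptions. Rules are plain dictionaries keyed by `(state, character)`, state 0 is the initial state, and the function names are hypothetical. How the n-step cross-conversion rules are generated is not modeled; the single cross rule in the test is written by hand.

```python
from typing import Dict, List, Set, Tuple

Rules = Dict[Tuple[int, str], int]  # (pre-state, input character) -> post-state
INITIAL = 0


def find_post_state(cur: int, cache: int, ch: str, basic: Rules, cross: Rules) -> int:
    """Try the current state, then the cache state, then the initial state."""
    for state in (cur, cache, INITIAL):
        for table in (basic, cross):
            if (state, ch) in table:
                return table[(state, ch)]
    return INITIAL  # no rule anywhere: the initial state is the result


def next_cache(ch: str, basic: Rules) -> int:
    """Cache rule: the initial state's basic post-state for ch, else the initial state."""
    return basic.get((INITIAL, ch), INITIAL)


def run(text: str, basic: Rules, cross: Rules, accepting: Set[int]) -> List[Tuple[int, int]]:
    """Return (index of last character, accepting state) for every match."""
    cur = cache = INITIAL
    hits = []
    for i, ch in enumerate(text):
        post = find_post_state(cur, cache, ch, basic, cross)  # look up the post-state
        cache = next_cache(ch, basic)  # state caching
        cur = post  # the post-state becomes the current state
        if cur in accepting:
            hits.append((i, cur))
    return hits
```

With the patterns "ab" and "bc" stored as basic rules only, scanning "abc" ends in state 2 ("ab") after the second character. The current state has no rule for 'c', but the cached state (reached from the initial state on 'b') does, so "bc" is still found. Failures deeper than one character, such as "bce" inside "abce", need an explicit cross-conversion rule.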
  • Alternatively, the post-state lookup step first determines the type of the current state. If it is a converged state or a general state, the post-state is looked up in the state transition rule set according to the current input character and the current state. If it is a detached state, the post-state is looked up in the detached-state transition rule set according to the current input character, the current state, and the cache state.
  • The detached-state transition rule set takes three inputs (the current input character, the current state, and the cache state) and outputs the corresponding post-state.
  • In this variant, the cache rule is: if the current state is a converged state, the current state is cached.
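The typed-state variant can be sketched the same way. The state kinds, rule tables, and example paths below are hypothetical. The sketch also assumes two things the text does not specify: a missing rule falls back to the initial state, and the cache keeps its value whenever the current state is not a converged state.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Kind(Enum):
    GENERAL = "general"
    CONVERGED = "converged"
    DETACHED = "detached"


INITIAL = 0
Rules = Dict[Tuple[int, str], int]               # (current state, char) -> post-state
DetachedRules = Dict[Tuple[str, int, int], int]  # (char, current, cache) -> post-state


def step(cur: int, cache: int, ch: str, kind: Dict[int, Kind],
         rules: Rules, detached: DetachedRules) -> Tuple[int, int]:
    """Return (post-state, new cache state) for one input character."""
    if kind.get(cur, Kind.GENERAL) is Kind.DETACHED:
        post = detached.get((ch, cur, cache), INITIAL)  # three-input rule set
    else:
        post = rules.get((cur, ch), INITIAL)  # converged or general state
    # Cache rule: cache a converged current state (assumed: otherwise keep the cache).
    new_cache = cur if kind.get(cur) is Kind.CONVERGED else cache
    return post, new_cache


def run(text: str, kind: Dict[int, Kind], rules: Rules,
        detached: DetachedRules, accepting: Set[int]) -> List[Tuple[int, int]]:
    cur = cache = INITIAL
    hits = []
    for i, ch in enumerate(text):
        cur, cache = step(cur, cache, ch, kind, rules, detached)
        if cur in accepting:
            hits.append((i, cur))
    return hits
```

In the test, the paths "pqr" and "sqt" share the merged state 3 for 'q'. Caching the converged state (1 or 2) lets the detached state 3 tell the paths apart, so "pqt" does not match.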
  • The present invention also provides a computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to receive input characters and, for each input character, to: look up the post-state in the state transition rule base according to the current input character, the current state, and the cache state; jump to the post-state; perform state caching according to a specific cache rule; and take the post-state as the current state, the cached state as the cache state, and the next input character as the current input character, repeating these steps until all characters in the stream have been processed.
  • The present invention also provides a system comprising: a processor; a bus coupled to the processor for transferring data between parts of the system; a communication interface coupled to the bus for receiving a character data stream; and a main memory coupled to the bus. The main memory stores instructions that, when executed by the processor, cause it to take characters in order from the received character data stream as input characters and, for each input character, to: look up the post-state in the state transition rule base according to the current input character, the current state, and the cache state; jump to the post-state; perform state caching according to a specific cache rule; and take the post-state as the current state, the cached state as the cache state, and the next input character as the current input character, repeating these steps until all characters in the stream have been processed.
  • The post-state lookup method includes: computing a possible post-state from the current state and the input character using the input translation table; looking up the rule storage table with the possible post-state to obtain its corresponding input character; and comparing the actual input character with the character obtained from the rule storage table. If they match, the state transitions to the possible post-state; otherwise, the state is reset to zero.
  • The state numbering rule is: if the current state has exactly one outgoing conversion rule, the post-state of that rule is numbered as the current state's number plus one.
  • A possible post-state is computed as follows. For a given rule set, if the current state has exactly one outgoing conversion rule, add one to the current state's number to obtain the possible post-state. If the current state has several outgoing conversion rules, look up the input translation table with the current state's color and the input character. This gives the difference between the possible post-state's number and the current state's number; add that difference to the current state's number to obtain the possible post-state's number.
  • The rule storage table takes a post-state as input and outputs the color of that post-state together with the input character associated with it.
  • The input translation table takes the current state's color and the input character as input and outputs the difference between the possible post-state's number and the current state's number.
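A software model of this lookup, under hypothetical conventions not stated in the text: color 0 marks a state with exactly one outgoing rule, every other color selects an ITT row, and a state with no outgoing rules gets a color whose ITT row is empty. As described above, the rule storage table is indexed by state number and holds (color, input character) pairs. The initial state 0 has no entering character, so it stores `None`.

```python
from typing import Dict, List, Optional, Tuple

SINGLE_RULE_COLOR = 0  # hypothetical: "exactly one outgoing rule", so the delta is 1
RuleStorage = List[Tuple[int, Optional[str]]]  # state number -> (color, input character)
Itt = Dict[Tuple[int, str], int]  # (color, character) -> post-state number minus current


def post_state(state: int, ch: str, rst: RuleStorage, itt: Itt) -> int:
    color = rst[state][0]
    if color == SINGLE_RULE_COLOR:
        delta = 1  # numbering rule: the single successor is numbered state + 1
    else:
        delta = itt.get((color, ch))
        if delta is None:  # empty ITT entry: no candidate post-state at all
            return 0
    candidate = state + delta
    # Verify against the rule storage table: the candidate must be entered on ch.
    if 0 <= candidate < len(rst) and rst[candidate][1] == ch:
        return candidate
    return 0  # inconsistent: reset the state to zero
```

For the rules 0 -a-> 1, 1 -b-> 2, and 1 -c-> 3, state 0 needs no ITT entry at all. State 1 needs two, with deltas 1 and 2. A wrong character such as 'b' in state 0 yields candidate 1, which the rule storage table rejects because state 1 is entered on 'a'.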
  • The post-state lookup method above further includes merging entries of the input translation table, in which each row corresponds to a current state and each column corresponds to an input character. Entry merging checks for resource conflicts and overlay conflicts by examining each column of the two rows to be merged. For the k-th column, if exactly one of the two columns is empty, determine whether applying the non-empty column's value to the state of the empty column yields a post-state whose associated input character is k. If so, this is an overlay conflict: the two rows cannot be merged, and the check exits. If not, continue. If both columns are empty or both are non-empty, determine whether their values are the same.
  • A resource conflict means that the corresponding columns of the two ITT entries are both non-empty but have different values. An overlay conflict arises when a non-empty value in a column covers an empty value. This is equivalent to adding an extra conversion rule to the original state, and the conflict occurs when that extra rule collides with the state's original conversion rules. The rows are merged only after every column of the two rows has been checked and no resource conflict or overlay conflict has been found; non-empty values then cover empty ones.
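The two conflict checks can be sketched as follows, using a hypothetical representation. Each ITT row is a dictionary from input character to delta, where a missing key is an empty entry. `owners` lists the state numbers that currently share each row, that is, the states with that color. `rst_chars[n]` is the input character stored for state `n` in the rule storage table.

```python
from typing import Dict, List, Optional

Row = Dict[str, int]  # input character -> delta; a missing key is an empty entry


def merge_conflict(rows: Dict[str, Row], owners: Dict[str, List[int]],
                   a: str, b: str, rst_chars: List[Optional[str]]) -> Optional[str]:
    """Return None if rows a and b can be merged, else 'resource' or 'overlay'."""
    for k in sorted(set(rows[a]) | set(rows[b])):  # columns empty in both are skipped
        va, vb = rows[a].get(k), rows[b].get(k)
        if va is not None and vb is not None:
            if va != vb:
                return "resource"  # both non-empty but different
            continue
        # Exactly one side is empty: its states would inherit the other side's delta.
        delta = va if va is not None else vb
        empty_side = b if va is not None else a
        for state in owners[empty_side]:
            cand = state + delta
            if 0 <= cand < len(rst_chars) and rst_chars[cand] == k:
                return "overlay"  # the inherited entry would pass verification on k
    return None


def merge_rows(rows: Dict[str, Row], owners: Dict[str, List[int]], a: str, b: str) -> None:
    """Merge row b into row a in place; non-empty values cover empty ones."""
    rows[a] = {**rows[b], **rows[a]}
    owners[a] = owners[a] + owners.pop(b)
    del rows[b]
```

An overlay conflict is therefore decided by the rule storage table. Inheriting a delta is harmless as long as the candidate state it points to is entered on a different character, because the verification step will reject it.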
  • The post-state lookup method above further includes N-way set-associative (group-associative) optimization of the input translation table, which checks for resource conflicts as follows. Divide each ITT row into 256/N groups. For each group, count the valid values that the two rows contain in that group; if the count exceeds N, the group has a resource conflict, otherwise check the next group. If none of the 256/N groups has a resource conflict, the two rows are merged.
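A sketch of this capacity check. It assumes that each group covers N adjacent columns; the text does not say how columns are assigned to groups. As the text states, the valid values of both rows are counted together against the N slots of a group. The function name is hypothetical.

```python
from typing import Dict


def set_assoc_conflict(row_a: Dict[int, int], row_b: Dict[int, int],
                       n_ways: int, width: int = 256) -> bool:
    """True if storing both rows' valid entries overflows some N-way group."""
    if width % n_ways:
        raise ValueError("n_ways must divide the row width")
    counts = [0] * (width // n_ways)  # 256/N groups
    for row in (row_a, row_b):
        for col in row:  # col: input byte 0..255; row[col]: the ITT value
            counts[col // n_ways] += 1  # assumption: a group is N adjacent columns
    return any(c > n_ways for c in counts)
```

With N = 2, rows holding columns {0, 1} and {2} fit, but rows holding {0, 1} and {1} put three valid values into the first group and conflict.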
  • The present invention also provides a computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to: compute a possible post-state from the current state and the input character using the input translation table; look up the rule storage table with the possible post-state to obtain its corresponding input character; and compare the actual input character with the character obtained from the rule storage table. If they match, the state transitions to the possible post-state; otherwise, the state is reset to zero.
  • The state numbering rule is: if the current state has exactly one outgoing conversion rule, the post-state of that rule is numbered as the current state's number plus one.
  • A possible post-state is computed as follows. For a given rule set, if the current state has exactly one outgoing conversion rule, add one to the current state's number to obtain the possible post-state's number. If the current state has several outgoing conversion rules, look up the input translation table with the current state's color and the input character. This gives the difference between the possible post-state's number and the current state's number; add that difference to the current state's number to obtain the possible post-state's number.
  • The rule storage table takes a post-state as input and outputs the color of that post-state together with the input character associated with it.
  • The input translation table takes the current state's color and the input character as input and outputs the difference between the possible post-state's number and the current state's number.
  • Each row of the input translation table corresponds to a current state and each column to an input character. The table is compacted by entry merging, which is performed as follows:
  • Each column of the two rows is examined. For the k-th column, if exactly one of the two columns is empty, determine whether applying the non-empty column's value to the state of the empty column yields a post-state whose associated input character is k. If so, this is an overlay conflict: the two rows cannot be merged, and the check exits.
  • If both columns are empty or both are non-empty, determine whether their values are the same. If not, this is a resource conflict: the two rows cannot be merged, and the check exits. If so, move on to the next column. Once all columns of the two rows have been checked without a resource conflict or an overlay conflict, the rows are merged, with non-empty values covering empty ones.
  • The input translation table is optimized by N-way set association (group association), which checks for resource conflicts as follows. Divide each ITT row into 256/N groups. For each group, count the valid values that the two rows contain in that group; if the count exceeds N, the group has a resource conflict, otherwise check the next group. If none of the 256/N groups has a resource conflict, the two rows are merged.
  • The present invention also provides a system comprising a main processor that organizes the input data stream and a coprocessor unit connected to the main processor. The coprocessor unit computes a possible post-state from the current state and the input character using the input translation table; looks up the rule storage table with the possible post-state to obtain its corresponding input character; and compares the actual input character with the character obtained from the rule storage table. If they match, the state transitions to the possible post-state; otherwise, the state is reset to zero.
  • The state numbering rule is: if the current state has exactly one outgoing conversion rule, the post-state of that rule is numbered as the current state's number plus one.
  • A possible post-state is computed as follows. For a given rule set, if the current state has exactly one outgoing conversion rule, add one to the current state's number to obtain the possible post-state's number. If the current state has several outgoing conversion rules, look up the input translation table with the current state's color and the input character. This gives the difference between the possible post-state's number and the current state's number; add that difference to the current state's number to obtain the possible post-state's number.
  • The rule storage table takes a post-state as input and outputs the color of that post-state together with the input character associated with it.
  • The input translation table takes the current state's color and the input character as input and outputs the difference between the possible post-state's number and the current state's number.
  • Each row of the input translation table corresponds to a current state and each column to an input character. The table is compacted by entry merging, which is performed as follows:
  • Each column of the two rows is examined. For the k-th column, if exactly one of the two columns is empty, determine whether applying the non-empty column's value to the state of the empty column yields a post-state whose associated input character is k. If so, this is an overlay conflict: the two rows cannot be merged, and the check exits.
  • If both columns are empty or both are non-empty, determine whether their values are the same. If not, this is a resource conflict: the two rows cannot be merged, and the check exits. If so, move on to the next column. Once all columns of the two rows have been checked without a resource conflict or an overlay conflict, the rows are merged, with non-empty values covering empty ones.
  • The input translation table is optimized by N-way set association (group association), which checks for resource conflicts as follows. Divide each ITT row into 256/N groups. For each group, count the valid values that the two rows contain in that group; if the count exceeds N, the group has a resource conflict, otherwise check the next group. If none of the 256/N groups has a resource conflict, the two rows are merged.
  • The object of the present invention and the solution of its technical problems are further achieved by the following technical solutions.
  • A post-state lookup structure includes the following components:
    - a main memory that stores the basic conversion rules and the cross-conversion rules. Its input is the possible post-state computed from the current state and the input character using the input translation table. Its output, according to the stored conversion rules, is the color of that possible post-state and the input character associated with it;
    - a secondary memory that stores the failure conversion rules and the restart conversion rules. Its input is the actual input character, and its output, according to the stored conversion rules, is the post-state corresponding to that character and its color;
    - an input translation table whose input is the color of the current state and the actual input character, and whose output is the difference between the possible post-state's number and the current state's number;
    - a two-way gate (selector) that acts on the comparison between the character output by the main memory and the actual input character: if they are equal, the current state transitions to the computed possible post-state, and the color of the current state becomes the color of the possible post-state output by the main memory; otherwise, the current state and
  • The post-state lookup structure further includes a comparator for comparing the character output by the main memory with the actual input character.
  • The post-state lookup structure further includes a status register for storing the current state and a color register for storing the color of the current state.
  • The post-state lookup structure further includes a gate (selector) that, according to the value of the color register, outputs either the output value of the input translation table or the value 1.
  • The post-state lookup structure further includes an adder that adds the number of the current state to the output value of the gate to compute the possible post-state.
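The datapath above can be modeled in software, assuming the same hypothetical conventions as a plain lookup table model: color 0 means "exactly one outgoing rule", main memory holds (color, input character) per state number, and the secondary memory maps an input character to a (post-state, color) pair. The text does not say what happens when the secondary memory holds no rule for a character. This sketch assumes a return to the initial state 0.

```python
from typing import Dict, List, Optional, Tuple

SINGLE_RULE_COLOR = 0  # hypothetical color: "exactly one outgoing rule"

MainMemory = List[Tuple[int, Optional[str]]]  # state -> (color, input character)
Itt = Dict[Tuple[int, str], int]              # (color, character) -> number difference
Secondary = Dict[str, Tuple[int, int]]        # character -> (post-state, color)


def lookup_step(state: int, color: int, ch: str, main: MainMemory, itt: Itt,
                secondary: Secondary, root_color: int) -> Tuple[int, int]:
    """Return the new (status register, color register) contents."""
    # Gate: the color register chooses between the ITT output and the constant 1.
    delta = 1 if color == SINGLE_RULE_COLOR else itt.get((color, ch))
    if delta is not None:
        cand = state + delta  # adder
        if 0 <= cand < len(main) and main[cand][1] == ch:  # comparator
            return cand, main[cand][0]
    # Mismatch: the selector takes the secondary memory's failure/restart rule.
    return secondary.get(ch, (0, root_color))  # assumed default: initial state
```

In the test, the second 'a' of "aab" finds no ITT entry in state 1. The secondary memory's restart rule for 'a' therefore brings the machine back to state 1, and 'b' then reaches state 2.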
  • A multi-string matching structure comprises: a status register for storing the current state; a color register for storing the color of the current state; a state buffer for storing the cache state; a color buffer for storing the color of the cache state; and a main memory that stores the basic conversion rules and the n-step cross-conversion rules. The main memory's first input is the first possible post-state, computed from the current state and the input character using the input translation table. Its corresponding first output, according to the stored conversion rules, is the color of the first possible post-state and the input character associated with it. Its second input is the second possible post-state, computed from the cache state and the input character using the input translation table, and the corresponding second output is the color of the second
  • first-path character is the same as the actual input character, the status register is overwritten with the first possible post-state and the color register with the color of the first possible post-state. If the first-path character differs from the actual input character but the second-path character matches it, the status register is overwritten with the second possible post-state and the color register with the color of the second possible post-state. Otherwise, the status register and the color register are overwritten with the post-state output and its color, respectively.
  • The multi-string matching structure further includes a first comparator that compares the first-path character output by the main memory with the actual input character, and a second comparator that compares the second-path character output by the main memory with the actual input character.
  • The multi-string matching structure further includes a first gate (selector) that, according to the value of the color register, outputs either the output value of the input translation table or the value 1, and a second gate that makes the same selection according to the value of the color buffer.
  • The multi-string matching structure further includes a first adder that adds the number of the current state to the output value of the first gate to compute the first possible post-state, and a second adder that adds the number of the cache state to the output value of the second gate to compute the second possible post-state.
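The two-path selection can be modeled as one step function. The conventions are hypothetical: color 0 means "exactly one outgoing rule", main memory holds (color, input character) per state number, and the secondary memory maps a character to a (post-state, color) pair, defaulting to state 0. Hardware would read both paths from the dual-ported main memory in the same cycle; here they are simply tried in priority order. How the state buffer and color buffer are updated is not modeled.

```python
from typing import Dict, List, Optional, Tuple

SINGLE_RULE_COLOR = 0  # hypothetical color: "exactly one outgoing rule"

MainMemory = List[Tuple[int, Optional[str]]]  # state -> (color, input character)
Itt = Dict[Tuple[int, str], int]              # (color, character) -> number difference
Secondary = Dict[str, Tuple[int, int]]        # character -> (post-state, color)


def dual_path_step(state: int, color: int, cache: int, cache_color: int, ch: str,
                   main: MainMemory, itt: Itt, secondary: Secondary,
                   root_color: int) -> Tuple[int, int]:
    """Return the new (status register, color register) contents."""
    # Path 1 starts from the current state, path 2 from the cache state; path 1 wins.
    for base, base_color in ((state, color), (cache, cache_color)):
        delta = 1 if base_color == SINGLE_RULE_COLOR else itt.get((base_color, ch))
        if delta is None:
            continue
        cand = base + delta  # first or second adder
        if 0 <= cand < len(main) and main[cand][1] == ch:  # first or second comparator
            return cand, main[cand][0]
    return secondary.get(ch, (0, root_color))  # neither path matched
```

For example, in leaf state 2 with state 1 cached, input 'c' fails on the first path but succeeds on the second, moving to state 3.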
  • A multi-regular-expression matching method comprises the following steps: take characters in order from the received input character stream as input characters; for each input character, look up the post-state in the state transition rule base according to the current input character, the current state, and the cache state; jump to the post-state; perform state caching according to a specific cache rule; then take the post-state as the current state, the cached state as the cache state, and the next input character as the current input character, and repeat these steps until every character in the stream has been processed.
  • The post-state lookup step is as follows. First, determine whether the basic conversion rules or the n-step cross-conversion rules contain a rule by which the current state accepts the current input character; if so, that rule's post-state is the lookup result. If not, determine whether they contain such a rule for the cache state; if so, its post-state is the lookup result. If not, determine whether they contain such a rule for the initial state; if so, its post-state is the lookup result; otherwise, the initial state itself is the lookup result.
  • State caching according to the specific cache rule is as follows: if the basic conversion rules give the initial state a post-state for the current input character, that post-state is cached; otherwise, the initial state is cached.
  • Alternatively, the post-state lookup step first determines the type of the current state. If it is a converged state or a general state, the post-state is looked up in the state transition rule set according to the current input character and the current state. If it is a detached state, the post-state is looked up in the detached-state transition rule set according to the current input character, the current state, and the cache state. The detached-state transition rule set takes three inputs (the current input character, the current state, and the cache state) and outputs the corresponding post-state. In this variant, the cache rule is: if the current state is a converged state, the current state is cached.
  • Compared with the prior art, the present invention has significant advantages and beneficial effects.
  • The multi-string matching method based on the cache state machine and the chip structure based on post-state lookup provide at least the following advantages:
  • Matching performance is independent of the size of the rule base.
  • Matching performance is independent of the minimum rule length in the rule base.
  • Matching performance is independent of the relationship between the rule base and the text to be matched.
  • Large-scale rule sets are supported: storage space grows sublinearly as the number of rules increases, which effectively reduces space requirements.
  • The conversion rules of the state machine can be stored and accessed efficiently.
  • Figure 1: The finite state automaton constructed in existing multi-string matching scheme A.
  • Figure 2: A finite state automaton constructed according to the scheme of prior art 1, in which the conversion rules are assigned different priorities.
  • Figure 3: State machine model.
  • Figure 4: Cache state machine model.
  • Figure 5: A finite state automaton constructed according to scheme A, with the restart conversion rules and failure conversion rules removed.
  • Figure 6: Cache state machine constructed to implement dynamic cross-conversion loading.
  • Figure 7: Flow chart of the dynamic cross-conversion loading method.
  • Figure 8: A finite state automaton constructed according to scheme A, containing isomorphic paths.
  • Figure 9: The ideal framework for feature set ⁇ betters, pattern ⁇ optimization.
  • Figure 10: Isomorphic path merging based on the cache state machine.
  • Figure 11: The three kinds of states in the isomorphic path merging method.
  • Figure 12: Conversion functions for the three kinds of states in the isomorphic path merging method.
  • Figure 13: Two observations based on the post-state lookup structure.
  • Figure 14: Post-state lookup framework.
  • Figure 15: Detailed structure of the post-state lookup structure.
  • Figure 16: Input translation table (ITT) structure.
  • Figure 17: Schematic diagram of ITT entry merging.
  • Figure 19: ITT table optimization 1: entry merging method.
  • ITT table optimization 2: 2-way set-associative ITT table structure.
  • ITT table optimization 2: N-way set-associative ITT table optimization method.
  • Figure 22: Chip structure ACC-NSA implementing multi-string matching based on the cache state machine.
  • Figure 24: Effect of applying the dynamic cross-conversion loading method to eliminate cross-conversion rules (Snort rules).
  • Figure 25: Effect of applying the isomorphic path merging method to reduce the basic conversion rules.
  • DFA (deterministic finite automaton): one representation of a DFA is shown in Figure 3.
  • Each DFA has a current state, held in the status register. Based on the input character and the current state, it applies the conversion rule that accepts that character and moves to the next state. When the next character arrives, the "next state" becomes the "current state".
  • Driven by input characters, a DFA performs state transitions over an internal data structure such as the one shown in Figure 1. Its main feature is that its next state is determined solely by the current state and the current input character.
  • DFAs and NFAs are simplified forms of the Turing machine model. For both the deterministic finite automaton (DFA) and the nondeterministic finite automaton (NFA), the next state is determined only by the current state and the current input, as shown in Figure 3. Any NFA can be converted into an equivalent DFA.
  • a finite state set, denoted K, is the set of all states;
  • an alphabet, denoted Σ, is the set of characters the state machine accepts;
  • an accepting (receive) state set, denoted F;
  • the accepting state set is a subset of the finite state set;
  • the state transition function is a binary function that determines the next state from the current state of the state machine and the received character.
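The components listed above can be sketched as a small Python class. This is an illustrative sketch, not code from the patent: the class name `DFA`, its fields and the example machine (strings over {a, b} that end in "ab") are hypothetical, and an initial state is included because a runnable machine needs one, even though it is not among the components listed above.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """Hypothetical DFA container: K, Sigma, delta, an initial state and F."""
    states: set      # K: finite state set
    alphabet: set    # Sigma: characters the machine accepts
    delta: dict      # (state, char) -> next state; total on K x Sigma
    start: int       # initial state (assumed; not in the list above)
    accepting: set   # F: accepting (receive) states, a subset of K

    def run(self, text):
        # The next state depends only on the current state and the current character.
        state = self.start
        for ch in text:
            state = self.delta[(state, ch)]
        return state

    def accepts(self, text):
        return self.run(text) in self.accepting


# Example machine: accepts strings over {a, b} that end in "ab".
ends_with_ab = DFA(
    states={0, 1, 2},
    alphabet={"a", "b"},
    delta={(0, "a"): 1, (0, "b"): 0,
           (1, "a"): 1, (1, "b"): 2,
           (2, "a"): 1, (2, "b"): 0},
    start=0,
    accepting={2},
)
print(ends_with_ab.accepts("aab"), ends_with_ab.accepts("aba"))  # True False
```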
  • The CDFA (Cached Deterministic Finite Automaton) is proposed by the present invention; one of its forms is shown in Figure 4. Compared with a DFA, a CDFA holds a cache state (in the state buffer) in addition to the current state. In the cache state machine, the next state is determined by the current state, the current input character and the cache state. The next cache state is determined by the internal mechanism of the cache state machine and requires no external input, so it can be customized flexibly to specific needs.
  • The cache state machine breaks the traditional rule that "the next state is determined only by the current state and the current input". By recording history information, it enriches what the state machine can take into account when determining the next state.
  • The cache state machine achieves this design goal by adding a state-caching function to the state machine, as shown in Figure 4. Seen from its external interface, the cache state machine, like a traditional state machine, only receives input characters and outputs the state machine's result. The difference is that an internal state buffer (Cache) is added, which caches states according to a given policy.
  • The cache state machine can be defined as a seven-tuple whose components include:
  • a finite state set, denoted K, i.e., the set of all states;
  • an alphabet, denoted Σ, i.e., the set of characters the state machine accepts;
  • N, the number of caches contained in the state machine;
  • the cache policy function, which determines the state to be cached from the current state and the current input;
  • the state transition function, which determines the next state from the current state, the cached state and the input character.
  • This new state machine model is named the cache state machine model.
  • The cache policy function can remember historical information that the state machine has experienced, and can also "remember" other state information in a defined way.
  • The cache state machine has the following structure:
  • Status register: registers the current state;
  • Cache status register: registers the cache state;
  • the number of states it can register is N, with N ≥ 1;
  • Conversion rule module: stores the state conversion rule base and looks up the next state from the character received by the interface module, the current state registered in the status register, and the cache state registered in the cache status register;
  • Interface module: receives input characters;
  • Control module: controls the interface module to receive input characters, the status register to update the current state, the cache status register to update the cache state, and the conversion rule module to look up the next state.
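A software model of this structure might look like the sketch below. It is hypothetical: the class name, the `transition` and `cache_policy` callables and the toy rules in the usage example are illustrations, not the patent's implementation. The control step computes the next state and the new cache contents from the old register values and then updates both registers together, as a clocked circuit would.

```python
from typing import Callable, Hashable, Sequence, Tuple

State = Hashable
Transition = Callable[[State, Tuple[State, ...], str], State]
CachePolicy = Callable[[State, str], Sequence[State]]


class CacheStateMachine:
    """Hypothetical model: status register, N cache status registers,
    conversion rule module (transition) and control step."""

    def __init__(self, start: State, n_caches: int,
                 transition: Transition, cache_policy: CachePolicy):
        if n_caches < 1:
            raise ValueError("need at least one cache status register")
        self.status = start                  # status register: current state
        self.caches = (start,) * n_caches    # cache status registers
        self.transition = transition         # conversion rule module lookup
        self.cache_policy = cache_policy     # what to cache, from (state, input)
        self.n_caches = n_caches

    def step(self, ch: str) -> State:
        # Control module: both lookups use the *old* register contents ...
        nxt = self.transition(self.status, self.caches, ch)
        new_caches = tuple(self.cache_policy(self.status, ch))
        if len(new_caches) != self.n_caches:
            raise ValueError("cache policy must return one state per cache")
        # ... then the status register and the cache registers update together.
        self.status, self.caches = nxt, new_caches
        return nxt


# Toy machine: "x" advances; any other character returns to the cached state.
m = CacheStateMachine(
    start=0, n_caches=1,
    transition=lambda s, caches, ch: s + 1 if ch == "x" else caches[0],
    cache_policy=lambda s, ch: (s,),   # remember the state being left
)
trace = [m.step(ch) for ch in "xxy"]
print(trace)  # [1, 2, 1]
```

In the toy run, the third step cannot be decided from the current state and input alone: the machine returns to state 1 only because state 1 is held in the cache register.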
  • The priority-based approach used in paper 1 can reduce the number of restart conversion rules and failure conversion rules to at most 256. In the present invention, both types of conversion rules can be handled in the same way.
  • The invention uses the principle of the cache state machine mainly to eliminate nearly all cross-conversion rules, thereby completely solving the space explosion problem.
  • The invention also uses the principle of the cache state machine to reduce the number of basic conversion rules, so that storage space grows sub-linearly with the number of rules.
  • The implementation is as follows.
  • Method 1, "dynamic cross-conversion loading", applies the principle of the cache state machine to eliminate more than 95%, and in some cases all, of the cross-conversion rules. This method is named ACC.
  • The eliminated cross-conversion rules are replaced by a cache space.
  • Any cross-conversion rule arises as follows: in the current state S3, the character s is received; while the machine switches to state S4, another path starting from S0 (the path on which S6 lies) is opened.
  • If the next input does not satisfy a basic conversion rule of the current path (e.g., in state S4, input character s leads to state S5) but does satisfy a basic conversion rule of the other path (e.g., in state S6, input character 1 leads to state S7), a cross-conversion rule is generated: if the next input character is 1, the state jumps from S4 to S7.
  • The cache state machine operates as follows. In the current state S3, the character s is received; by the principle of cross-conversion rule generation, S6 is cached, and at the same time the state machine enters the next state S4. In state S4, the character 1 is received. The next state is determined by the current state (S4), the input character (1) and the cached state (S6): on the basic conversion rule paths, S4 does not accept the character 1 but S6 does, so S7 is taken as the next state.
  • Dynamic cross-conversion loading uses the CDFA principle to generate at run time the cross-conversion rules that a DFA would store explicitly, thereby greatly reducing the number of stored conversion rules.
  • The state transition function δ can be divided into the following two categories:
  • S_ncross is the state transition function of the n-step cross-conversion rules, S_ncross: K × Σ → K; its definition in the ACC method is the same as in scheme A.
  • The state transition function δ is defined with prioritized cases A to D, where:
  • priority is the priority identifier;
  • A is the highest priority;
  • D is the lowest priority. If a higher-priority result is valid (not empty), it is taken; if it is invalid, the next lower-priority result is used.
  • An invalid result means that, for a given state Si, neither S_basic nor S_ncross contains a rule that accepts the character c.
  • The meaning of δ is as follows. For a state transition of the CDFA in ACC, first check whether the basic conversion rules or the n-step cross-conversion rules contain a rule by which the current state Si accepts the current character c. If so, apply that rule and jump to the next state. If not, take out the cached state Sk and, treating Sk as the current state, search the basic conversion rules and the n-step cross-conversion rules for a rule that accepts c; if one exists, jump to the corresponding next state. Otherwise, check whether the initial state S0 accepts c: if it does, jump to the corresponding state; otherwise jump to the initial state S0.
  • the cache policy function is defined as
  • The cache policy function means that the CDFA in ACC has a single cache, which is updated in every cycle: if the initial state S0 has a conversion rule that accepts the current input character c, the next state reached from S0 on c is cached; if S0 has no such rule, the initial state S0 itself is cached. The cache policy function is therefore independent of the current state Si.
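The prioritized cases and the single-entry cache policy described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: conversion rules are held in hypothetical dictionaries keyed by `(state, character)`, state 0 stands for S0, and the toy rule set (a trie for the patterns "ab" and "bc") is invented for the example; it is not the automaton of Figure 5 or Figure 6.

```python
def acc_next(cur, cached, ch, basic, ncross, start=0):
    """ACC transition: priority A (current state), B (cached state), C/D (S0)."""
    for src in (cur, cached):
        # A / B: a basic or n-step cross-conversion rule of `src` accepting ch.
        nxt = basic.get((src, ch), ncross.get((src, ch)))
        if nxt is not None:
            return nxt
    # C: S0 accepts ch; D: otherwise return to S0.
    return basic.get((start, ch), start)


def acc_cache(ch, basic, start=0):
    """Cache policy: S0's successor on ch, or S0; independent of the current state."""
    return basic.get((start, ch), start)


def acc_match(text, basic, ncross, accepting, start=0):
    cur = cached = start
    hits = []
    for pos, ch in enumerate(text):
        # Both new values are computed from the old (cur, cached) pair.
        cur, cached = (acc_next(cur, cached, ch, basic, ncross, start),
                       acc_cache(ch, basic, start))
        if cur in accepting:
            hits.append((pos, accepting[cur]))
    return hits


# Toy trie for the patterns "ab" and "bc": 0-a->1-b->2 and 0-b->3-c->4.
BASIC = {(0, "a"): 1, (1, "b"): 2, (0, "b"): 3, (3, "c"): 4}
ACCEPTING = {2: "ab", 4: "bc"}
# The cross-conversion rule 2 -c-> 4 is not stored; the cache regenerates it.
print(acc_match("abc", BASIC, {}, ACCEPTING))  # [(1, 'ab'), (2, 'bc')]
```

When the machine is in state 2 and reads "c", state 2 has no rule for it, but the cache holds state 3 (S0's successor on the previous character "b"), and state 3 accepts "c". This is the case-B path that replaces an explicitly stored cross-conversion rule.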
  • The ACC method is based on the cache state machine above.
  • The method consists of two steps: preprocessing and matching.
  • The preprocessing stage reads in the feature set and constructs the cache state machine;
  • the matching stage reads in the text to be matched, performs state machine transitions, and reports a match when specific states are reached.
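The two phases can be sketched end to end. This is a hedged illustration, not the patent's procedure. It assumes the basic conversion rules form a goto trie, computes the full Aho-Corasick transition function with failure links (what a plain DFA would store), and keeps as n-step cross-conversion rules only those transitions that the single cache cannot regenerate at run time. The `cached` computation relies on one property: while the machine is in a non-initial state entered on character x, the cache holds S0's successor on x, or S0. All function names are hypothetical.

```python
from collections import deque


def build_acc(patterns):
    """Preprocessing sketch: basic rules (trie), outputs, residual cross rules."""
    basic, outputs, last_char = {}, {}, {}
    n_states = 1                                   # state 0 is the initial state S0
    for p in patterns:
        s = 0
        for ch in p:
            if (s, ch) not in basic:
                basic[(s, ch)] = n_states          # basic conversion rule
                last_char[n_states] = ch           # each state has one entry character
                n_states += 1
            s = basic[(s, ch)]
        outputs.setdefault(s, set()).add(p)

    alphabet = sorted({ch for p in patterns for ch in p})
    # Full Aho-Corasick transition function, built in BFS order with failure links.
    full = {(0, ch): basic.get((0, ch), 0) for ch in alphabet}
    fail, order, queue = {}, [], deque()
    for ch in alphabet:
        if (0, ch) in basic:
            fail[basic[(0, ch)]] = 0
            queue.append(basic[(0, ch)])
    while queue:
        s = queue.popleft()
        order.append(s)
        outputs.setdefault(s, set()).update(outputs.get(fail[s], set()))
        for ch in alphabet:
            if (s, ch) in basic:
                t = basic[(s, ch)]
                fail[t] = full[(fail[s], ch)]
                full[(s, ch)] = t
                queue.append(t)
            else:
                full[(s, ch)] = full[(fail[s], ch)]

    # Keep only the cross transitions the cache cannot regenerate.
    ncross = {}
    for s in order:
        cached = basic.get((0, last_char[s]), 0)   # cache content while in state s
        for ch in alphabet:
            if (s, ch) in basic:
                continue
            # Cached states have depth <= 1 and never need stored cross rules.
            dynamic = basic.get((cached, ch), basic.get((0, ch), 0))
            if dynamic != full[(s, ch)]:
                ncross[(s, ch)] = full[(s, ch)]
    return basic, ncross, outputs


def acc_match(text, basic, ncross, outputs):
    """Matching sketch: priorities A-D with a single cache."""
    cur = cached = 0
    hits = []
    for pos, ch in enumerate(text):
        nxt = None
        for src in (cur, cached):                  # A: current state, B: cached state
            nxt = basic.get((src, ch), ncross.get((src, ch)))
            if nxt is not None:
                break
        if nxt is None:
            nxt = basic.get((0, ch), 0)            # C: S0 accepts ch, else D: S0
        cur, cached = nxt, basic.get((0, ch), 0)   # cache policy
        if outputs.get(cur):
            hits.append((pos, sorted(outputs[cur])))
    return hits


basic, ncross, outputs = build_acc(["he", "she", "his", "hers"])
print(ncross)                                      # {(5, 'r'): 8}
print(acc_match("ushers", basic, ncross, outputs))
# [(3, ['he', 'she']), (5, ['hers'])]
```

For this small feature set, only one of the full DFA's cross transitions ("she" followed by "r" leads to "her") has to be stored. The others are regenerated from the cache at run time.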
  • The idea of the ACS method is to merge isomorphic paths in the state machine, reducing the number of states and basic conversion rules.
  • The ACS method uses the cache state machine model.
  • Because the cache state machine can remember state transition history, isomorphic paths can be merged while still guaranteeing correct matching. Taking the feature set {pattern, betters} as an example, isomorphic path merging based on the cache state machine is shown in Figure 10.
  • The idea of isomorphic path merging based on the cache state machine is to store the source state of a path dynamically when paths are merged (in Figure 10, the source state, such as S8, is stored in the cache of the cache state machine). When the received characters drive the machine to the position where the isomorphic paths split (state S6), the cached state is read out to determine, according to the source of the path, which state to jump to. Thus, if the input text is "patters", the state cached at the start of the isomorphic path is the source state of the "pattern" path; when it is read out in state S6, the path is found not to originate from S8, so even when the input character is "s", the machine does not jump to state S9.
  • Each state in the CDFA is assigned one of three colors. The colors distinguish the three kinds of states in isomorphic path merging, as shown in Figure 11.
  • Convergence states (yellow, hatched): defined as the last state before entering an isomorphic path. Such a state represents the history of the state machine before the isomorphic path and triggers caching of itself. This set of states is denoted K_cv.
  • The state transition function δ of the cache state machine CDFA falls into the following two categories. For convergence states and general states, δ is a binary function, δ: K × Σ → K, whose definition in the ACS method is the same as in scheme A.
  • For separation states, δ is a ternary function, δ: K × K × Σ → K.
  • In the ACS method, the conversion function of a separation state takes the current state, the cached state and the current character as its inputs.
  • The state transition function δ is defined as
  • A transition rule of a separation state differs from a traditional transition rule: it has three inputs and one output. Among the three inputs is the convergence state from which the path originated before the isomorphic paths were merged, as shown in Figure 12.
  • The ordinary conversion rule set has two inputs:
  • it is looked up with the current input and the current state to obtain the next state.
  • For a separation state, in addition to the current input and the current state, the cached state is used to look up the separation-state transition rule set (distinct from the ordinary conversion rule set, it has three inputs) to obtain the next state.
  • the cache policy function ⁇ is defined as
  • The cache policy function means that the CDFA in ACS has a single cache: when the current state is a convergence state, that state is written to the cache; otherwise the cache is left unchanged.
  • The type of the current state is determined first, and the corresponding action is taken. For a convergence state, the next state is obtained by looking up the conversion rule set with the current input and the current state, and the current state is written to the cache. For a general state, the next state is obtained by looking up the conversion rule set with the current input and the current state. For a separation state, the next state is obtained by looking up the separation-state transition rule set with the current input, the current state and the cached state.
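The three-way dispatch on state type can be sketched as follows. This is a hypothetical Python sketch. The state numbering, the rule tables and the helper names are invented for a toy version of the {pattern, betters} example and do not reproduce the numbering of Figure 10. Failure and restart rules are omitted, so a missing rule simply returns to S0.

```python
S0 = 0
# Toy merged automaton: "pa" and "be" converge onto one shared path "tter".
RULES = {(0, "p"): 1, (1, "a"): 2,            # p, a
         (0, "b"): 3, (3, "e"): 4,            # b, e
         (2, "t"): 5, (4, "t"): 5,            # both convergence states enter the shared path
         (5, "t"): 6, (6, "e"): 7, (7, "r"): 8}
SEPARATION_RULES = {(8, 2, "n"): 9,           # leave via "n" only if entered from state 2
                    (8, 4, "s"): 10}          # leave via "s" only if entered from state 4
CONVERGENCE = {2, 4}                          # last states before the isomorphic path
SEPARATION = {8}                              # state where the merged path splits again
ACCEPTING = {9: "pattern", 10: "betters"}


def acs_step(cur, cached, ch):
    """Return (next_state, new_cache) according to the type of the current state."""
    if cur in SEPARATION:                     # three inputs: current, cached, character
        return SEPARATION_RULES.get((cur, cached, ch), S0), cached
    nxt = RULES.get((cur, ch), S0)            # two inputs: convergence or general state
    if cur in CONVERGENCE:
        cached = cur                          # cache policy: remember the convergence state
    return nxt, cached


def acs_run(text):
    cur, cached, found = S0, S0, []
    for ch in text:
        cur, cached = acs_step(cur, cached, ch)
        if cur in ACCEPTING:
            found.append(ACCEPTING[cur])
    return found


print(acs_run("pattern"), acs_run("patters"), acs_run("betters"))
# ['pattern'] [] ['betters']
```

"patters" reaches the shared state 8 just like "betters", but the cache still holds state 2. The separation rule for "s" therefore does not fire, which mirrors the correctness argument given for Figure 10.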
  • The merged CDFA removes 5 states and 4 basic conversion rules, which saves further space.
  • The only overhead is the storage for one state, used as the cache.
  • A regular expression is a string consisting of a series of special characters.
  • For details on regular expressions, refer to the relevant literature.
  • The traditional AC algorithm can solve multi-regular-expression matching by converting the regular expressions into a DFA and matching with that DFA.
  • Likewise, the regular expressions can be converted into a CDFA, and the CDFA then receives the input characters for matching.
  • The specific matching methods include eliminating one-step cross-conversion rules, merging isomorphic paths, and so on.
  • The technical difficulty of a hardware implementation lies in storing the conversion rule base efficiently in memory and locating a conversion rule Tr efficiently.
  • In a conversion rule Tr, Si is called the "input state", c the "input character", and the resulting next state the "output state".
  • Observation 1 (linear tries): the state machine, especially the cache state machine generated by scheme A, contains a large number of linear trie structures.
  • A "linear trie" is a part of the state machine in which each state has only one conversion rule pointing to a next state, so that the states form a linear, one-dimensional chain. Because linear tries are so common, state numbers can be assigned incrementally along them, and the number of the next state can then be computed from the current state; this computed state is the predicted state.
  • Observation 2: the character on which a state is entered is fixed, regardless of the type of conversion rule. For example, whether state S7 is entered through a basic conversion rule or a cross-conversion rule, the character it receives is always "i". Therefore, once a post state (output state) has been obtained, the character it accepts is uniquely determined, and comparing that character with the actual input character verifies whether the computed post state is the real post state.
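One way to obtain the incremental numbering described above, together with a fixed entry character for every state, is to number the trie states in depth-first preorder, so that the first child of every state gets its parent's number plus one. The sketch below is an assumption about how such a layout could be produced, not a procedure stated in the patent. `number_trie` is a hypothetical name, and the recursion is only suitable for short patterns.

```python
def number_trie(patterns):
    """Number trie states in DFS preorder; return entry characters and children."""
    trie = {}
    for p in patterns:                       # nested-dict trie
        node = trie
        for ch in p:
            node = node.setdefault(ch, {})

    entry_char, children = {}, {}
    counter = 0

    def visit(node, ch):
        nonlocal counter
        me = counter
        counter += 1
        entry_char[me] = ch                  # one fixed entry character per state
        children[me] = {}
        for c in sorted(node):               # first child gets me + 1 (preorder)
            children[me][c] = visit(node[c], c)
        return me

    visit(trie, None)                        # state 0 is S0; it has no entry character
    return entry_char, children


entry_char, children = number_trie(["abc", "abd", "xy"])
# States whose only successor is state + 1 lie on linear tries and need no ITT row.
linear = [s for s, out in children.items() if list(out.values()) == [s + 1]]
print(children[2], linear)                   # {'c': 3, 'd': 4} [1, 5]
```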
  • The post-state lookup structure uses a "prediction and verification" approach, as shown in Figure 14.
  • A possible post state is computed either through an Input Translation Table (ITT) or directly, and that post state is used as an address to index the rule storage table and read out the information stored for that state.
  • ITT Input Translation Table
  • The rule storage table can be held in inexpensive memory such as SRAM or DDR; the conversion rules are laid out compactly in it, with no gaps.
  • The efficiency of post-state lookup comes from how the ITT table is used and optimized. By Observation 1, because the state machine contains many linear tries, the post state of each state in a linear trie can be obtained by simple incrementing, without looking up the ITT table. Only the relatively few states with multiple outgoing conversion rules need an ITT lookup to obtain the difference between state numbers. Optimizing the ITT table further reduces its storage use.
  • The detailed design of the NSA structure has two parts: first, the storage of conversion rules in the input translation table ITT and in the rule storage table; second, the design of the access path for the conversion rules.
  • The overall structure of the NSA is shown in Figure 15. It includes the main conversion rule storage space "TRM-1" (Transition Rule Memory-1) and the storage space "TRM-0" (Transition Rule Memory-0), which handles the failure conversion rules and restart conversion rules.
  • TRM-1 Transition Rule Memory - 1
  • TRM-0 Transition Rule Memory -0
  • A multiplexer (MUX) selects, according to the value of the color register, either the output of the ITT table (the difference between state numbers) or the constant 1. If the color register value is 0, the current state is considered uncolored: the MUX outputs 1, the current state number is incremented by 1 to obtain the post-state number, and that post state is used to access TRM-1 and obtain the corresponding value.
  • This value contains the color of the next state and a character.
  • If the color register value is not 0, the current state is considered colored: the color and the current input character are used together to look up the ITT table, the MUX selects the ITT output, the post-state number is obtained by adding this state difference to the current state number, and the post state is used to access TRM-1 and obtain the corresponding value.
  • The input character is also used to access TRM-0, which outputs a next state and a color value.
  • A comparator CMP compares the character output by TRM-1 with the current input character, and a two-way selector acts on the result. If they are equal, the color register is overwritten with the next-state color output by TRM-1, and the status register is overwritten with the computed TRM-1 access address (the post state), which realizes the state transition when verification succeeds. Otherwise, the status register is overwritten with the state output by TRM-0 and the color register with the color output by TRM-0, which realizes the fallback when verification fails.
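The predict-and-verify data path described above can be sketched in software. The sketch is hypothetical. The table contents and names are invented. Leaf states are assumed to share a color whose ITT row is empty, so that their prediction always fails verification. The n-step cross-conversion rules, which the real structure also stores, are left out.

```python
NULL = 0      # empty ITT entry
WHITE = 0     # color 0: uncolored state on a linear trie

# Hypothetical tables for the patterns "abc", "abd", "xy" numbered 0..6 in DFS preorder.
# TRM-1: (entry character, color) of each state, indexed by state number.
TRM1 = [(None, WHITE), ("a", WHITE), ("b", 1), ("c", 2), ("d", 2), ("x", WHITE), ("y", 2)]
ITT = {1: {"c": 1, "d": 2},   # color 1: state 2 goes to 2+1 on "c" and to 2+2 on "d"
       2: {}}                 # color 2: shared by leaf states, every column empty
TRM0 = {"a": (1, WHITE), "x": (5, WHITE)}   # S0's successors; other characters -> S0


def nsa_step(state, color, ch):
    """Predict the post state, verify it against TRM-1, otherwise fall back to TRM-0."""
    if color == WHITE:
        candidate = state + 1                      # MUX selects the constant 1
    else:
        diff = ITT[color].get(ch, NULL)            # MUX selects the ITT difference
        candidate = state + diff if diff != NULL else None
    if candidate is not None and candidate < len(TRM1):
        entry, next_color = TRM1[candidate]
        if entry == ch:                            # CMP: prediction verified
            return candidate, next_color
    return TRM0.get(ch, (0, WHITE))                # failure / restart conversion rule


def nsa_run(text):
    state, color, trace = 0, WHITE, []
    for ch in text:
        state, color = nsa_step(state, color, ch)
        trace.append(state)
    return trace


print(nsa_run("abd"), nsa_run("abcd"), nsa_run("axy"))
# [1, 2, 4] [1, 2, 3, 0] [1, 5, 6]
```

In "axy", the prediction from state 1 (state 2, entered on "b") fails verification against "x", so TRM-0 supplies S0's successor on "x". This is the fallback path described above.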
  • Using the priority policy, failure conversion rules and restart conversion rules can be combined into at most 256 rules.
  • TRM-0 is indexed by the input character: according to the input character, it outputs either the initial state S0 or a post state of the initial state.
  • TRM-0 stores these two types of conversion rules addressed by character. For each input character, if the initial state has a conversion rule for that character, the corresponding post state of the initial state is stored at that position; otherwise, the initial state is stored there. Since there are at most 256 input characters, TRM-0 contains 256 entries.
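The character-addressed layout of TRM-0 can be sketched as a 256-entry table. The function name, argument shapes and example values are hypothetical. The sketch assumes byte-valued input characters and assumes that each entry also carries the color of its target state, as described for TRM-0's output earlier.

```python
def build_trm0(s0_successors, color_of, s0_color=0):
    """One TRM-0 entry per input byte: (next state, color of that state)."""
    trm0 = []
    for byte in range(256):
        nxt = s0_successors.get(byte)
        if nxt is None:
            trm0.append((0, s0_color))         # no rule from S0: return to S0
        else:
            trm0.append((nxt, color_of[nxt]))  # S0's successor on this byte
    return trm0


trm0 = build_trm0({ord("a"): 1, ord("x"): 5}, {1: 0, 5: 0})
print(len(trm0), trm0[ord("a")], trm0[ord("q")])   # 256 (1, 0) (0, 0)
```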
  • The character accepted by each state is stored in the main conversion rule memory TRM-1, indexed by state number; this part of the memory is compact.
  • Each state can be assigned any color.
  • The input translation table is indexed by color.
  • Suppose the current state Si of the state machine is the input state of k conversion rules, i.e., there are k characters that cause Si to jump to a new state (failure conversion rules and restart conversion rules are not considered here).
  • Suppose both state Si and state Sk have two outgoing conversion rules. To predict the next state, each of the two states is associated with a new row of the ITT table, and a different color is used to index each row.
  • Each color corresponds to 256 values; each value is the state-number difference when the state receives the character of the corresponding column and jumps to a new state.
  • For example, if state Si receives the character 0x01 and jumps to state Sk, the 0x01 column of color 1 in the ITT table stores the difference between Sk and Si, k - i. A value of 0 denotes an empty (null) entry.
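Assigning colors and filling ITT rows as described can be sketched as follows. Assumptions: the function name and the input format are hypothetical, rows are 256-entry lists indexed by byte value, and states without outgoing rules (leaves) also receive a color with an all-empty row so that their prediction never verifies. The text above only requires a new color for states with several outgoing rules. The many mostly empty rows this produces illustrate why table-entry merging is useful.

```python
def build_itt(children):
    """Assign colors and ITT rows; children maps state -> {byte: next state}."""
    color_of, itt = {}, {}
    for state in sorted(children):
        out = children[state]
        if list(out.values()) == [state + 1]:
            color_of[state] = 0                    # white: the successor is state + 1
            continue
        color = len(itt) + 1                       # a new color, i.e. a new ITT row
        row = [0] * 256                            # 0 denotes an empty (null) entry
        for byte, nxt in out.items():
            row[byte] = nxt - state                # state-number difference
        itt[color] = row
        color_of[state] = color
    return color_of, itt


# States 1..6 of a trie for "abc", "abd", "xy" (S0 is handled by TRM-0).
children = {1: {ord("b"): 2}, 2: {ord("c"): 3, ord("d"): 4},
            3: {}, 4: {}, 5: {ord("y"): 6}, 6: {}}
color_of, itt = build_itt(children)
print(color_of)                               # {1: 0, 2: 1, 3: 2, 4: 3, 5: 0, 6: 4}
print(itt[1][ord("c")], itt[1][ord("d")])     # 1 2
```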
  • If the color of the current state Si is white (uncolored), the possible post state is S(i+1); if the color is not white, the ITT table is accessed with the color and the current input to obtain the state difference d, and the post state S(i+d) is computed.
  • Although computing the post state uses the current state and possibly the current input character, this is not enough to determine the real post state. Therefore the character c' read from TRM-1 for the computed post state must be compared with the current character c. If the two characters are the same, the computed post state is the real post state and the machine jumps to it; if they differ, the machine jumps to the state obtained from TRM-0, i.e., it applies a failure conversion rule or a restart conversion rule.
  • Each state with multiple outgoing conversion rules is assigned a new color, i.e., a row of the ITT table is allocated for computing its post states. Most colors have only a few post states, so each ITT row contains many null values (0). To use ITT space efficiently, an optimization of the ITT table is given here: table-entry merging.
  • Merging ITT entries combines several rows of the ITT table into one so that space is used efficiently; in terms of the state machine, the merged states share one color.
  • Figure 17 shows the merging of ITT entries.
  • The state machine on the left uses 4 colors; after merging, the state machine on the right uses only 2.
  • Resource conflict: the values of the same column in the two ITT rows are both non-empty and different, as for color 2 and color 4 in Figure 18.
  • Coverage conflict: when a non-null value in a column of one ITT row overwrites a null value in the other, an extra (virtual) conversion rule is added to the original state. This extra rule must not conflict with the original conversion rules, i.e., the post state obtained through the virtual rule must not accept the character of that column.
  • ITT table entries are merged as follows; the related method is shown in Figure 19.
  • Figure 19 shows how to judge whether two ITT table entries can be merged.
  • The method checks the two rows to be merged column by column.
  • For the k-th column: if exactly one of the two columns is empty, check whether, after merging, the state whose column is empty would, through the non-empty value, reach a post state that accepts character k. If so, there is a coverage conflict, the rows cannot be merged, and the check exits; if not, the check continues. If both columns are empty or both are non-empty, check whether their values are the same. If not, there is a resource conflict, the rows cannot be merged, and the check exits; if so, the next column is checked. Once all columns of the two rows have been checked with no resource conflict or coverage conflict, the rows are merged, with non-null values overwriting null values.
  • Rows are checked pairwise: as shown in Figure 18, color 2 and color 1 are checked for merging first, then color 3 and color 1, and so on, until all mergeable colors have been merged.
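The column-by-column test and the pairwise merge can be sketched as below. Assumptions: rows are sparse dictionaries from column (here a character, standing in for a byte value) to state difference. `entry_char` plays the role of the TRM-1 contents, i.e. the character on which each state is entered. The example data are invented. A coverage conflict is flagged when filling an empty column would send a state to a post state whose stored character equals that column's character, because that false prediction would pass verification.

```python
def can_merge(row_a, states_a, row_b, states_b, entry_char):
    """Column-by-column check for resource and coverage conflicts."""
    for k in set(row_a) | set(row_b):
        a, b = row_a.get(k, 0), row_b.get(k, 0)
        if a and b and a != b:
            return False                              # resource conflict
        # Column k empty in one row: that row's states gain a virtual rule.
        if a and not b and any(entry_char.get(s + a) == k for s in states_b):
            return False                              # coverage conflict
        if b and not a and any(entry_char.get(s + b) == k for s in states_a):
            return False                              # coverage conflict
    return True


def merge_rows(rows, entry_char):
    """Pairwise greedy merge; rows maps color -> (row dict, set of states)."""
    merged = []                                       # list of [row, states]
    for color in sorted(rows):
        row, states = rows[color]
        for target in merged:
            if can_merge(target[0], target[1], row, states, entry_char):
                target[0] = {**target[0], **row}      # non-null values fill null columns
                target[1] = target[1] | states
                break
        else:
            merged.append([dict(row), set(states)])
    return merged


entry_char = {3: "c", 4: "d", 11: "e", 12: "f", 23: "c", 41: "z", 42: "d"}
rows = {1: ({"c": 1, "d": 2}, {2}), 2: ({"e": 1, "f": 2}, {10}),
        3: ({"c": 3}, {20}), 4: ({"z": 1}, {40})}
for row, states in merge_rows(rows, entry_char):
    print(sorted(row.items()), sorted(states))
# [('c', 1), ('d', 2), ('e', 1), ('f', 2)] [2, 10]
# [('c', 3), ('z', 1)] [20, 40]
```

Color 3 cannot join the first merged row because of a resource conflict in column "c". Color 4 cannot join it because of a coverage conflict: state 40 plus the difference 2 in column "d" is state 42, which is entered on "d".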
  • The set-associative optimization strategy is similar to the set-associative caches used in computer storage systems.
  • The idea is to use set association to break the column boundaries of the ITT table, so that data from the same column can be stored in different columns.
  • The 2-way set-associative ITT table structure is shown in Figure 20. With this structure, color 4 in Figure 18 can be merged with color 2.
  • For N-way set association, the ITT table is divided into 256/N groups.
  • The method for judging whether two rows can be merged using the set-associative strategy is shown in Figure 21.
  • Two ITT table entries can be merged by set association if and only if there is no conflict within any group. The only conflict considered here is a resource conflict, i.e., a group that would contain more than N non-empty elements.
  • The method of Figure 21 judges whether two ITT table entries conflict. For a group p, it counts the valid values that the two rows contain in that group. If the total exceeds N, there is a resource conflict in the group and the two rows cannot be merged. Otherwise the next group is checked; once no resource conflict has been found in any of the 256/N groups, the two rows can be merged.
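The per-group count can be sketched as follows. Assumptions: rows are 256-entry lists with 0 as the empty value, and group p is taken to be the N consecutive columns pN to pN + N - 1. The text does not say how columns are assigned to groups. The tag bits that let a merged group tell its entries apart are not modeled. Rows `row_a` and `row_b` below would have a resource conflict under plain table-entry merging (same column, different values), but they fit together in a 2-way set-associative table.

```python
def can_merge_set_assoc(row_a, row_b, n_ways, width=256):
    """True if no group would hold more than n_ways valid (non-zero) values."""
    if width % n_ways:
        raise ValueError("width must be a multiple of n_ways")
    for p in range(width // n_ways):                   # 256/N groups
        cols = range(p * n_ways, (p + 1) * n_ways)     # assumed: N consecutive columns
        valid = sum(1 for k in cols if row_a[k]) + sum(1 for k in cols if row_b[k])
        if valid > n_ways:
            return False                               # resource conflict in group p
    return True


row_a = [0] * 256
row_a[0] = 1                 # column 0 used
row_b = [0] * 256
row_b[0] = 2                 # same column, different value
row_c = [0] * 256
row_c[0], row_c[1] = 3, 4    # two columns of group 0 used
print(can_merge_set_assoc(row_a, row_b, 2), can_merge_set_assoc(row_a, row_c, 2))
# True False
```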
  • A set-associative ITT table needs an additional tag (Tag) to distinguish the contents of each entry.
  • The tag has two fields, an input tag field and a color tag field, which together distinguish the different rows and columns from before the merge.
  • The NSA is an efficient hardware state machine implementation. Its efficiency stems from exact memory accesses, the absence of conflicting entries that must be resolved, and the use of inexpensive memories such as SRAM and DDR. Although the NSA's ITT table contains some storage gaps, these can be effectively controlled through table-entry merging and the set-associative optimization.
  • Corresponding chip structure: to implement multi-string matching based on the cache state machine at high speed, the present invention designs a corresponding chip structure, whose overall structure is shown in Figure 22.
  • The structure is a feature-matching structure for string matching that combines the ACC method with the NSA structure.
  • The ACC-NSA structure of Figure 22 includes a conversion rule module, a status register module and a cache status register module.
  • The NSA structure implements a state machine efficiently in hardware, and the ACC method is based on the principle of the cache state machine.
  • The main problem solved by the combined ACC-NSA structure is how to use the NSA structure to implement the cache state machine required by the ACC method.
  • Figure 22 shows the ACC-NSA framework. It builds on the post-state lookup structure of Figure 15 by adding the paths related to the "state buffer" and the "color buffer". The two sets of paths share one ITT table and one TRM-1 memory.
  • The ITT table and TRM-1 are implemented in dual-port memory, which supports parallel access from the registers and the cache. (If parallel access is not required, single-port memory can also be used.)
  • TriMUX three-state gate
  • (state, color, "1"): the post-state value and its color computed from the state in the register;
  • (state, color, "2"): the post-state value and its color computed from the state in the cache;
  • middle_priority means: if the high-priority condition is not met, but the character obtained by accessing TRM-1 with the post state computed from the state in the cache matches the actual input character, the TriMUX outputs the post-state value computed from the cached state and its color.
  • The high-priority output is the post-state value computed from the state in the register and its color, selected when its verification succeeds. If neither the high-priority nor the medium-priority condition is met, the third input is selected, i.e., the post-state value and color output by TRM-0 through the failure and restart conversion rules.
  • The registers (status register and color register) and the cache (state buffer and color buffer) both access the ITT table to compute possible post-state values, and access TRM-1 to fetch the characters corresponding to the conversion rules.
  • TRM-0 supplies the state values corresponding to the failure conversion rules and restart conversion rules.
  • The three results are fed into the TriMUX module.
  • In the TriMUX module, comparison with the input character determines which transition actually occurs, and the TriMUX is controlled to select the correct result, which overwrites the registers.
  • The cache is updated with the result of TRM-0. Together, these steps form one state transition.
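The three-way selection can be sketched in software as one clock cycle. This is a hypothetical model, not the chip's logic. The table contents and names are invented, and leaf states are assumed to share a color whose ITT row is empty. The cache path is refreshed in each cycle from the TRM-0 output, which holds S0's successor on the current character (or S0). This matches the ACC cache policy.

```python
WHITE = 0
# Hypothetical tables for the patterns "ab" and "bc" (states 0..4 in DFS preorder).
TRM1 = [(None, WHITE), ("a", WHITE), ("b", 1), ("b", WHITE), ("c", 1)]
ITT = {1: {}}                               # color 1: leaf states, empty row
TRM0 = {"a": (1, WHITE), "b": (3, WHITE)}   # S0's successors; other characters -> S0


def predict(state, color, ch):
    """Post-state prediction, shared by the register path and the cache path."""
    if color == WHITE:
        return state + 1
    diff = ITT[color].get(ch, 0)
    return state + diff if diff else None


def verified(candidate, ch):
    return candidate is not None and candidate < len(TRM1) and TRM1[candidate][0] == ch


def acc_nsa_step(reg, cache, ch):
    """One cycle; reg and cache are (state, color) pairs."""
    cand1 = predict(*reg, ch)               # (state, color, "1"): from the registers
    cand2 = predict(*cache, ch)             # (state, color, "2"): from the cache
    fallback = TRM0.get(ch, (0, WHITE))     # failure / restart conversion rules
    if verified(cand1, ch):                 # high priority
        new_reg = (cand1, TRM1[cand1][1])
    elif verified(cand2, ch):               # middle priority
        new_reg = (cand2, TRM1[cand2][1])
    else:                                   # low priority
        new_reg = fallback
    return new_reg, fallback                # the cache is refreshed from TRM-0


def run(text):
    reg = cache = (0, WHITE)
    trace = []
    for ch in text:
        reg, cache = acc_nsa_step(reg, cache, ch)
        trace.append(reg[0])
    return trace


print(run("abc"), run("xbc"))               # [1, 2, 4] [0, 3, 4]
```

On the third character of "abc", the register path (leaf state 2) cannot predict a successor. The cache path, holding state 3 from TRM-0, predicts state 4, which verifies against "c" and is selected at middle priority.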
  • One-step cross-conversion rules can be eliminated by the method of the present invention. The multi-string matching method based on the cache state machine's "dynamic cross-conversion loading" reduces the space to 4.1% (ClamAV rules) and 20.8% (Snort rules) of the original.
  • The dotted line shows the data for the traditional DFA-based method;
  • the solid line shows the data for the method using the cache state machine CDFA. The CDFA-based method of the present invention reduces the number of basic conversion rules by up to 21.4% (Snort rules).
  • The ACC-NSA chip structure achieves a maximum matching speed of 11.7 Gbps (in a 0.18 µm process), faster than other methods.
  • The multi-string matching method based on the cache state machine and the chip structure based on "post-state lookup" have at least the following advantages and beneficial effects:
  • matching performance is independent of the size of the rule base;
  • matching performance is independent of the minimum rule length in the rule base;
  • matching performance is independent of the relationship between the rule base and the text to be matched;
  • large-scale rule sets are supported, with storage space growing sub-linearly as the number of rules increases, which effectively reduces space requirements;
  • the conversion rules of the state machine can be stored and accessed efficiently.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a string matching method based on a cache state machine (cached finite automaton) and to a chip structure for post-state lookup. The matching method looks up the next state in a state conversion rule base on the basis of the input character, the current state and the cached states, performs the jump, and caches states according to specific caching rules. The chip structure comprises: a main memory containing basic conversion rules and n-step cross-conversion rules; an input translation table, shared by two paths (a state register and a color register on one path, a state cache and a color cache on the other), for computing the possible next state and acquiring the corresponding input character; an auxiliary memory containing failure and restart conversion rules, for acquiring the next state corresponding to the current input and updating the state cache and the color cache; and a three-state selector that performs multi-way selection of the next state based on the currently input character and the character corresponding to the possible next state, in order to update the state register and the color register.
PCT/CN2008/000293 2007-05-18 2008-02-03 Method and chip structure for character string matching WO2008141519A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNB200710099389XA CN100495407C (zh) 2007-05-18 2007-05-18 Multi-string matching method and chip
CN200710099389.X 2007-05-18

Publications (1)

Publication Number Publication Date
WO2008141519A1 (fr) 2008-11-27

Family

ID=38782733

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/000293 WO2008141519A1 (fr) 2007-05-18 2008-02-03 Method and chip structure for character string matching

Country Status (2)

Country Link
CN (1) CN100495407C (fr)
WO (1) WO2008141519A1 (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
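CN100495407C (zh) * 2007-05-18 2009-06-03 北京哲安科技有限公司 Multi-string matching method and chip
CN101901257B (zh) * 2010-07-21 2012-07-04 北京理工大学 Multi-string matching method for a search engine
CN104714951A (zh) * 2013-12-13 2015-06-17 世纪禾光科技发展(北京)有限公司 Parallel multi-pattern matching method and system
CN104361097A (zh) * 2014-11-21 2015-02-18 国家电网公司 Real-time detection method for sensitive e-mail in the power sector based on multi-pattern matching
CN107967219B (zh) * 2017-11-27 2021-08-06 北京理工大学 TCAM-based high-speed lookup method for large-scale strings
CN108133052A (zh) * 2018-01-18 2018-06-08 广州汇智通信技术有限公司 Multi-keyword search method, system, medium and device
CN110222143B (zh) * 2019-05-31 2022-11-04 北京小米移动软件有限公司 String matching method and apparatus, storage medium and electronic device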

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4241402A (en) * 1978-10-12 1980-12-23 Operating Systems, Inc. Finite state automaton with multiple state types
JP2002297681A (ja) * 2001-03-29 2002-10-11 Kddi Corp Finite state automaton creation device
US6961693B2 (en) * 2000-04-03 2005-11-01 Xerox Corporation Method and apparatus for factoring ambiguous finite state transducers
CN1801152A (zh) * 2006-01-13 2006-07-12 清华大学 Multi-keyword matching method for text or network content analysis
CN101051321A (zh) * 2007-05-18 2007-10-10 北京哲安科技有限公司 Multi-string matching method and chip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AHO A.V. AND CORASICK M.J.: "Efficient String Matching: An Aid to Bibliographic Search", COMMUNICATIONS OF THE ACM, vol. 18, no. 6, June 1975 (1975-06-01), pages 333 - 340, XP001152117 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106445891A (zh) * 2016-08-09 2017-02-22 中国科学院计算技术研究所 Acceleration method and device for a string matching algorithm
CN111078963A (zh) * 2019-12-31 2020-04-28 奇安信科技集团股份有限公司 NFA-to-DFA conversion method and device
CN111078963B (zh) * 2019-12-31 2023-08-15 奇安信科技集团股份有限公司 NFA-to-DFA conversion method and device

Also Published As

Publication number Publication date
CN101051321A (zh) 2007-10-10
CN100495407C (zh) 2009-06-03

Similar Documents

Publication Publication Date Title
WO2008141519A1 (fr) Method and chip structure for character string matching
CN109921996B (zh) High-performance OpenFlow virtual flow table lookup method
US7539032B2 (en) Regular expression searching of packet contents using dedicated search circuits
US7539031B2 (en) Inexact pattern searching using bitmap contained in a bitcheck command
US7624105B2 (en) Search engine having multiple co-processors for performing inexact pattern search operations
US7644080B2 (en) Method and apparatus for managing multiple data flows in a content search system
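JP3935880B2 (ja) Hybrid search memory for network processors and computer systems
CN105224692B (zh) System and method for parallel lookup of SDN multi-level flow tables supporting multi-core processors
JP4091604B2 (ja) Bit string matching method and device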
US20080192754A1 (en) Routing system and method for managing rule entries of ternary content addressable memory in the same
US10110492B2 (en) Exact match lookup with variable key sizes
US8966152B2 (en) On-chip memory (OCM) physical bank parallelism
US20080071780A1 (en) Search Circuit having individually selectable search engines
EP2215563B1 (fr) Method and apparatus for traversing a compressed deterministic finite automaton (DFA) graph
US8560475B2 (en) Content search mechanism that uses a deterministic finite automata (DFA) graph, a DFA state machine, and a walker process
US9871727B2 (en) Routing lookup method and device and method for constructing B-tree structure
US20090274154A1 (en) Double-hash lookup mechanism for searching addresses in a network device
CN101309216B (zh) IP packet classification method and device
US20070171911A1 (en) Routing system and method for managing rule entry thereof
JP2005513895A5 (fr)
US20030191740A1 (en) Multi-dimensional associative search engine
US9465860B2 (en) Storage medium, trie tree generation method, and trie tree generation device
EP1678619B1 (fr) Associative memory comprising entry groups and skip operations
US6629195B2 (en) Implementing semaphores in a content addressable memory
US20140114995A1 (en) Scalable high speed relational processor for databases and networks

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08714815

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26/03/2010)

122 EP: PCT application non-entry in European phase

Ref document number: 08714815

Country of ref document: EP

Kind code of ref document: A1