WO2009116646A1 - Finite automaton generation system for character string matching for multibyte processing - Google Patents
- Publication number
- WO2009116646A1 PCT/JP2009/055515 JP2009055515W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nfa
- multibyte
- character string
- processing
- finite automaton
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/02—Comparing digital values
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/02—Indexing scheme relating to groups G06F7/02 - G06F7/026
- G06F2207/025—String search, i.e. pattern matching, e.g. find identical word or best match in a string
Definitions
- The present invention relates to a finite automaton generation system for character string matching for multibyte processing, an automaton circuit generation system, a generation method thereof, a circuit generation method, a generation program, a circuit generation program, a pattern matching apparatus using the same, and a finite automaton circuit for character string matching for multibyte processing.
- A finite automaton (FA) is either a non-deterministic finite automaton (NFA), which may have a plurality of state transition destinations for an input character in a given state, or a deterministic finite automaton (DFA), which has only one state transition destination.
- NFA non-deterministic finite automaton
- DFA Deterministic finite automaton
- An NFA can be generated by constructing a syntax tree from a search condition such as a given regular expression and then building the NFA from that syntax tree.
- A DFA can be generated from the NFA obtained by the above procedure, but its number of states may increase to a maximum of about 2^n, where n is the number of states of the NFA (Non-Patent Document 2).
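The state growth described above can be observed with a small experiment. The following is a minimal sketch, assuming the standard subset construction and a hypothetical textbook NFA (strings over {a, b} whose k-th symbol from the end is 'a'); neither the NFA nor the function names come from the patent.

```python
from itertools import chain

def subset_construction(delta, start, alphabet):
    """Return all reachable DFA states, each a frozenset of NFA states."""
    start_set = frozenset([start])
    seen = {start_set}
    work = [start_set]
    while work:
        current = work.pop()
        for ch in alphabet:
            # Union of the NFA destinations of every state in the subset.
            nxt = frozenset(chain.from_iterable(
                delta.get((s, ch), ()) for s in current))
            if nxt and nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

def kth_from_end_nfa(k):
    """Hypothetical NFA with k + 1 states (0..k); state k is the end state."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, k):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta

for k in (2, 3, 4):
    dfa_states = subset_construction(kth_from_end_nfa(k), 0, "ab")
    print(k + 1, "NFA states ->", len(dfa_states), "DFA states")
```

For this family the number of DFA states doubles each time one NFA state is added, which is the kind of exponential growth the text refers to.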
- state transition information is stored in a memory as a state transition table, and pattern matching is performed while transitioning states one by one with reference to the above table.
- every time a state transition occurs it is necessary to access the memory to acquire transition information, and this memory access becomes a bottleneck for speeding up.
- With the method of storing the NFA state transition table in memory as described above, only one of the plurality of state transition destinations can be selected and processed at a time. If matching fails at the selected destination, a process called "backtracking" is required to return to the branching point and test another candidate, and this backtracking also hinders speed-up.
- With a DFA, the number of states may increase explosively, so a large-capacity memory is required.
- These issues are bottlenecks for both speed-up and implementation.
- To address this, methods have been proposed in which an NFA is converted directly into a circuit and incorporated into a reconfigurable device such as an FPGA (Field Programmable Gate Array) to perform high-speed pattern matching.
- An NFA may include a special transition called an ε-transition, which allows a move to the next state without reading an input.
- A pattern matching circuit that directly incorporates an NFA as described above (hereinafter referred to as an NFA circuit) must use an NFA that does not include ε-transitions.
- The operation for removing ε-transitions from an NFA is called ε-closure (Non-Patent Document 1, Non-Patent Document 2).
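As an illustration of removing ε-transitions, the sketch below uses the textbook approach: compute each state's ε-closure, then copy onto that state every labeled transition that leaves a state in its closure. This is a minimal sketch under assumptions; the dictionary layout and function names are hypothetical and are not taken from the patent or its cited documents.

```python
def epsilon_closure(state, eps):
    """All states reachable from `state` using only epsilon-transitions."""
    seen = {state}
    stack = [state]
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def remove_epsilon(states, delta, eps, finals):
    """Return (new_delta, new_finals) for an equivalent NFA without epsilon.

    delta: {state: [(label, dst), ...]}  labeled transitions
    eps:   {state: [dst, ...]}           epsilon-transitions
    """
    new_delta = {}
    new_finals = set()
    for s in states:
        closure = epsilon_closure(s, eps)
        # A state becomes an end state if it can reach one without input.
        if closure & set(finals):
            new_finals.add(s)
        for c in closure:
            for label, dst in delta.get(c, ()):
                new_delta.setdefault(s, set()).add((label, dst))
    return new_delta, new_finals

# Hypothetical 3-state NFA: 0 -a-> 1, 1 -eps-> 2, end state 2.
new_delta, new_finals = remove_epsilon(
    [0, 1, 2], {0: [("a", 1)]}, {1: [2]}, {2})
print(new_delta, new_finals)
```

States that become unreachable after the removal are left in place here; pruning them is a separate clean-up step.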
- As methods of directly converting an NFA into a circuit as described above, there are a method of generating an NFA circuit from a regular expression via a syntax tree (Syntax Tree), and a method of first converting the regular expression into an NFA.
- FIG. 24 a syntax tree
- '*' included in the regular expression is a metacharacter representing zero or more repetitions
- '|' is a metacharacter representing OR
- In FIG. 24, the white arrow indicates the initial state, and the state drawn with a double circle indicates an end state.
- states 0 to 4 of the original NFA (FIG.
- The NFA circuit has a configuration in which registers that represent the states and comparators that detect that a transition condition has been input are connected in accordance with the state transitions of the NFA. Because it processes one character (1 byte) at a time, it has a search throughput proportional to the operating frequency.
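The register-and-comparator structure can be mimicked in software. The following is a minimal sketch, assuming an exact-match pattern, one input byte per clock, and an initial state that is always active; `run_nfa_circuit` is a hypothetical name and not part of the patent.

```python
def run_nfa_circuit(pattern, data):
    """Simulate a 1-byte NFA circuit for an exact-match pattern.

    regs[i] plays the role of the register for state i; the comparison
    `byte == pattern[i]` plays the role of the comparator on the edge
    from state i to state i + 1. Returns the 1-based clock cycles at
    which the end state register is set.
    """
    n = len(pattern)
    regs = [False] * (n + 1)
    regs[0] = True
    hits = []
    for clock, byte in enumerate(data, start=1):
        nxt = [False] * (n + 1)
        nxt[0] = True  # assumption: a match may start at any input byte
        for i in range(n):
            if regs[i] and byte == pattern[i]:
                nxt[i + 1] = True
        regs = nxt  # all registers update together, as on a clock edge
        if regs[n]:
            hits.append(clock)
    return hits

print(run_nfa_circuit("abcde", "xxabcdeabcde"))  # [7, 12]
```

Every active state is advanced in the same cycle, so no backtracking is needed, and exactly one byte is consumed per cycle regardless of the pattern.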
- the NFA for one-character (1-byte) processing of the “abcde” pattern shown in FIG. 26 (hereinafter, such an NFA for 1-byte processing is referred to as a “1-byte NFA”)
- NFA processing 4 characters (4 bytes)
- Multibyte NFA NFA with a processing byte count of k bytes
- As shown in FIG. 27, four NFAs are generated and converted into hardware circuits. In FIGS. 26 and 27, the white arrow indicates the initial state, the state indicated by a double circle indicates the end state, and the symbol ' ⁇ ' indicates an arbitrary character.
- In this method, the number of characters in the transition condition is expanded to a plurality, and an NFA is generated for each offset position at which the target pattern may start.
- Because of this, which end state has been reached determines at which position of the input character string the pattern matched. However, one pattern requires as many NFAs as the number of processing bytes, so the number of states may increase as the number of processing bytes is increased. Furthermore, only exact match (Exact Match) is illustrated, so reducing the number of states and supporting regular expressions remain problems.
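The per-offset expansion can be sketched for an exact-match pattern as follows. This is a minimal sketch under assumptions: the letter "X" stands for an arbitrary character (so the pattern itself is assumed not to contain a literal "X"), and `offset_chunks` is a hypothetical helper, not code from the cited documents.

```python
def offset_chunks(pattern, k, any_char="X"):
    """For each start offset 0..k-1, split the pattern into k-byte labels.

    Positions before the pattern starts and after it ends are padded
    with `any_char`, so every label is exactly k characters long.
    """
    result = []
    for offset in range(k):
        padded = any_char * offset + pattern
        padded += any_char * (-len(padded) % k)  # pad up to a multiple of k
        result.append([padded[i:i + k] for i in range(0, len(padded), k)])
    return result

for offset, labels in enumerate(offset_chunks("abcde", 4)):
    print(offset, labels)
# 0 ['abcd', 'eXXX']
# 1 ['Xabc', 'deXX']
# 2 ['XXab', 'cdeX']
# 3 ['XXXa', 'bcde']
```

Each list corresponds to one chain of transitions, so a k-byte design built this way carries k chains per pattern, which is the growth the text points out.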
- In Non-Patent Document 5 and Patent Document 1, states having the same transition condition are shared in order to limit the increase in the number of states, although issues remain.
- Non-Patent Document 6 expands the 1-byte NFA for the regular expression pattern of “a (bc) * (d
- One NFA as shown in FIG. 28 is generated, and this is converted into a hardware circuit. As in FIGS. 26 and 27, the white arrow in the figure indicates the initial state, the state indicated by the double circle indicates the end state, and the symbol ' ⁇ ' indicates an arbitrary character.
- With this method, the number of processing bytes per clock cycle can be increased without increasing the number of states, by extending the transition condition of the NFA generated from one regular expression pattern to a plurality of bytes. High-speed pattern matching using regular expressions can therefore be expected, but this NFA circuit alone has the drawback that it cannot determine at which position of the input character string the pattern matched.
- Patent Document 2 proposes a method that applies the finite automaton method to the character string search of an information retrieval system and performs state transitions in units of multiple characters, thereby improving detection speed and reducing the time needed to create the state transition table.
- Patent Documents 3 to 6 have been proposed as applying the finite automaton method to character matching.
- Patent Document 3 is a system that generates a finite state automaton or a finite state transducer representing a context free grammar from a context free grammar.
- A finite state automaton checks each character in an input character string to determine whether its 2-byte representation is within the valid range, with the aim of performing this check efficiently in a small memory space.
- Patent Document 5 proposes a method in which an automaton generation unit searches a character string by generating a finite state automaton using a derived type as a transition condition from a set of regular expressions and search sound frequency ranges.
- Patent Document 6 discloses, in its embodiment, a document processing system that performs a character string search using a DFA (Deterministic Finite State Automaton) based on a search condition expressed as a regular expression.
- DFA Deterministic Finite State Automaton
- the technique for constructing an NFA circuit from Multibyte NFA in which the number of characters of the transition condition is expanded to a plurality has the following problems.
- The first problem is that an NFA circuit constructed from a multibyte NFA, obtained by extending the transition condition of the 1-byte NFA generated from one regular expression pattern to multiple bytes, cannot by itself determine at which position of the input character string a match occurred; a separate circuit is required for that purpose.
- The second problem concerns an NFA circuit constructed from multiple multibyte NFAs that take into account the offset position at which the target pattern starts: which end state has been reached determines the position of the input character string at which the pattern matched, but, as noted above, only exact match (Exact Match) is covered and one NFA per processing byte is needed for each pattern.
- An object of the present invention is to provide a finite automaton circuit generation system for character string matching for multibyte processing that extends the NFA transition condition itself to a plurality of bytes and can independently determine at which position in an input character string a match occurred, together with a generation method, a generation program, and a pattern matching apparatus using them.
- Another object of the present invention is to provide a finite automaton circuit generation system for character string matching for multibyte processing corresponding to a regular expression pattern, a generation method thereof, a generation program, and a pattern matching apparatus using the same.
- Still another object of the present invention is to generate an NFA circuit according to the purpose by making it possible to select whether or not to generate a finite automaton circuit that can independently determine at which position in the input character string it matches.
- The first and second finite automaton circuit generation systems for character string matching for multibyte processing of the present invention include: a multibyte NFA conversion unit (reference numeral 22 in FIG. 1 and reference numeral 25 in FIG. 17) that converts a 1-byte NFA without ε-transitions, converted from a regular expression, into a multibyte NFA that processes a specified number of bytes and that, in accordance with the operation mode, can independently determine the position that matches the pattern; an NFA storage unit (reference numeral 32 in FIGS. 1 and 17) that stores the converted multibyte NFA represented as a data structure; and a unit that describes the Multibyte NFA as a hardware circuit while referring to the states and the state transition structure of the converted Multibyte NFA.
- the number of processing bytes of the Multibyte NFA to be converted can be specified by a power of 2.
- The pattern matching apparatus using the fourth finite automaton circuit for character string matching for multibyte processing of the present invention includes, in addition to the first and second generation systems, a configuration data conversion unit (reference numeral 26 in FIG. 23) that generates, from the generated HDL, configuration data that is the configuration information of a reconfigurable hardware device such as an FPGA, and a pattern matching unit (reference numeral 122 in FIG. 23) configured on the hardware device by the configuration data.
- the first effect is that, even with an NFA in which the NFA transition condition itself processed with 1 byte is expanded to a plurality of bytes, it is possible to independently determine at which position of the input character string the pattern matches.
- The second effect is that it can be selected according to the purpose whether to generate an NFA that can independently determine at which position of the input character string the pattern matches, or an NFA that cannot determine this independently.
- the third effect is that the NFA processed with a plurality of bytes generated according to the present invention can also support regular expressions.
- The fourth effect is that it is possible to configure a high-speed pattern matching device that supports regular expressions and can independently determine the matching position in the input character string.
- FIG. 5 of the first embodiment of the present invention (at the end of step B3) It is a transition diagram.
- FIG. 24 is a diagram showing an example in the middle of 2-byte NFA conversion (before starting step B16) in which the matching position can be determined using the 1-byte NFA of FIG. 24 according to the flowchart of FIG. 5 of the first embodiment of the present invention. It is a transition diagram.
- FIG. 25 is a state transition diagram of an example in which 2-byte NFA conversion is performed using the 1-byte NFA of FIG. 24 so that the matching position can be determined according to the flowchart of FIG. 5 according to the first embodiment of the present invention.
- FIG. 5 is a flowchart of the first embodiment of the present invention showing an example (at the end of step B3) of performing 4-byte NFA conversion in which the matching position can be determined using the 2-byte NFA of FIG. It is a transition diagram.
- FIG. 10 is a state transition diagram of an example in which 4-byte NFA conversion in which a matching position can be determined using 2-byte NFA of FIG. 9 is performed according to the flowchart of FIG. 5 of the first embodiment of the present invention. According to the flowchart of FIG.
- a state transition diagram in the middle of performing 2-byte NFA conversion (at the end of step B4) where the matching position cannot be determined using the 1-byte NFA of FIG. is there.
- a 2-byte NFA conversion in which the matching position cannot be determined using the 1-byte NFA of FIG. State 0, state 1 is selected, and step B15 is confirmed).
- FIG. 5 of the first embodiment of the present invention in the state transition diagram during the 2-byte NFA conversion (before the start of step B16) in which the matching position cannot be determined using the 1-byte NFA of FIG. is there.
- FIG. 25 is a state transition diagram of an example in which 2-byte NFA conversion in which a matching position cannot be determined using the 1-byte NFA of FIG. 24 is performed according to the flowchart of FIG. 5 of the first embodiment of the present invention.
- the state transition diagram during the 4-byte NFA conversion in which the matching position cannot be determined using the 2-byte NFA of FIG. is there.
- FIG. 9 is a state transition diagram of 1-byte NFA that is not generated by the 1-byte NFA converting unit 24 according to the second embodiment of this invention.
- A state transition diagram of a 1-byte NFA generated by the 1-byte NFA converting unit 24 of the second embodiment of the present invention.
- A flowchart showing step A10 in the second embodiment of the present invention.
- A block diagram showing the structure of the third embodiment of the present invention.
- A block diagram showing the structure of the fourth embodiment of the present invention.
- A state transition diagram of an NFA for 1-character (1-byte) processing for a regular expression pattern of “a (bc) * (d
- FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.
- The first embodiment of the present invention includes an input device 1 such as a keyboard, a data processing device 2 that operates under program control, a storage device 3 that stores information, and an output device 4 such as a display device or a printing device.
- the storage device 3 includes a regular expression storage unit 31, an NFA storage unit 32, and an HDL storage unit 33.
- the regular expression storage unit 31 stores one or more regular expressions input from the input device to the 1-byte NFA conversion unit 21.
- the NFA storage unit 32 stores the multibyte NFA converted from the 1-byte NFA in the multibyte NFA conversion unit 22 in the form of a data structure such as a list structure or a matrix format.
- The HDL storage unit 33 stores the HDL, such as Verilog HDL or VHDL (Very High Speed Integrated Circuit HDL), that the HDL conversion unit 23 generates to describe the NFA circuit of the Multibyte NFA stored in the NFA storage unit 32.
- the data processing device 2 includes a 1-byte NFA conversion unit 21, a multibyte NFA conversion unit 22, and an HDL conversion unit 23.
- the 1-byte NFA converting unit 21 reads one regular expression or a list of a plurality of regular expressions input from the input device 1 and stores them in the regular expression storage unit 31.
- The 1-byte NFA conversion unit 21 converts the regular expression read from the regular expression storage unit 31 into a 1-byte NFA having no ε-transition, using a conventional method such as the one described in Non-Patent Document 1. It then outputs the data structure representing the generated NFA to the Multibyte NFA conversion unit 22 and starts converting the next regular expression.
- When the last regular expression has been converted, a signal indicating that all regular expressions have been converted is output to the Multibyte NFA conversion unit 22 together with the data structure representing the generated NFA.
- the multibyte NFA conversion unit 22 reads the number of processing bytes and the operation mode (mode) input from the input device 1.
- the number of processing bytes is the number of processing bytes of the multibyte NFA to be generated, and the operation mode specifies the type of the multibyte NFA to be generated.
- The Multibyte NFA conversion unit 22 receives data structures representing 1-byte NFAs having no ε-transition from the 1-byte NFA conversion unit 21 and converts them one by one into multibyte NFAs having the desired number of processing bytes, according to the operation mode.
- The Multibyte NFA conversion unit 22 stores the data structure representing each converted Multibyte NFA in the NFA storage unit 32 and, if another NFA has been received from the 1-byte NFA conversion unit 21, converts it next.
- After the converted NFAs have been stored in the NFA storage unit 32, the data structure representing each NFA is read from the NFA storage unit 32 and output to the HDL conversion unit 23. The data structure of the last NFA is output together with a signal indicating that it is the last NFA.
- The HDL conversion unit 23 analyzes information such as the states of the NFA, the transitions between states, and the transition conditions from the data structure of the Multibyte NFA received from the Multibyte NFA conversion unit 22, converts each state into a register and each transition condition (character string) into a comparator, connects the registers in accordance with the transitions between states, and converts the result into HDL, such as Verilog HDL or VHDL, describing the circuit.
- the HDL conversion unit 23 stores the converted HDL in the HDL storage unit 33, and when the conversion to HDL is completed, reads the HDL from the HDL storage unit 33 and outputs it to the output device 4.
- A regular expression, or a list of regular expressions, input from the input device 1 is supplied to the 1-byte NFA conversion unit 21, while the number of processing bytes of the Multibyte NFA to be generated and the operation mode (mode) specifying its type are supplied to the Multibyte NFA conversion unit 22.
- The 1-byte NFA conversion unit 21 stores the received regular expressions in the regular expression storage unit 31, reads them out one by one, and converts each into a 1-byte NFA without ε-transitions using a known method such as the one described in Non-Patent Document 1 (step A1). When a conversion is completed, the 1-byte NFA conversion unit 21 transmits the converted NFA to the Multibyte NFA conversion unit 22, reads the next regular expression from the regular expression storage unit 31, and starts converting it into a 1-byte NFA without ε-transitions. When the conversion of the last regular expression stored in the regular expression storage unit 31 is completed, a signal indicating that all regular expressions have been converted is output to the Multibyte NFA conversion unit 22 together with the converted NFA.
- the NFA sent from the 1-byte NFA converting unit 21 to the Multibyte NFA converting unit 22 is a data structure having state transition information of NFA.
- The state transition information needed for a given state of the NFA is the state number of each transition destination and the label serving as the transition condition.
- The data structure output here may be any data structure from which, given a state, the next transition state and the transition condition (label) for it can be obtained.
- As a data structure representing such an NFA, for example, there is one that uses structures managed by a one-dimensional array and linked lists, as shown in FIG
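One way to picture an array-plus-linked-list layout is shown below. This is a minimal sketch, assuming the array is indexed by state number and each entry heads a singly linked list of (label, destination) records; the class and field names are hypothetical, since the figure itself is not reproduced in the text.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

@dataclass
class Edge:
    """One outgoing transition: its label and destination state."""
    label: str
    dst: int
    next: Optional["Edge"] = None  # next edge leaving the same state

class NfaTable:
    """One-dimensional array of list heads, indexed by state number."""

    def __init__(self, num_states: int) -> None:
        self.heads: List[Optional[Edge]] = [None] * num_states

    def add(self, src: int, label: str, dst: int) -> None:
        # Prepend to the linked list of state `src`.
        self.heads[src] = Edge(label, dst, self.heads[src])

    def transitions(self, state: int) -> Iterator[Tuple[str, int]]:
        """Yield (label, destination) for every edge leaving `state`."""
        edge = self.heads[state]
        while edge is not None:
            yield edge.label, edge.dst
            edge = edge.next

table = NfaTable(3)
table.add(0, "a", 1)
table.add(0, "b", 2)
print(sorted(table.transitions(0)))  # [('a', 1), ('b', 2)]
```

Given a state, this layout returns exactly the two pieces of information the text asks for: the destination states and the labels that lead to them.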
- The Multibyte NFA conversion unit 22 sets the current processing byte count B of the NFA received from the 1-byte NFA conversion unit 21 to 1, and sets the target processing byte count B_T to the number of processing bytes of the Multibyte NFA received from the input device 1 (step A2).
- Only a power of 2 can be specified as the target number of processing bytes of the Multibyte NFA, that is, as the number of bytes input from the input device 1. If any other number is given, the Multibyte NFA conversion unit 22 performs error processing and ends the processing (step A3).
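The power-of-2 restriction follows from the fact that each conversion pass doubles the processing byte count. A minimal sketch of the check in step A3, assuming an integer input and a hypothetical function name `check_target_bytes`:

```python
def check_target_bytes(b_t):
    """Validate the target byte count B_T and return the number of
    doubling passes needed to reach it from B = 1.

    Raises ValueError when B_T is not a power of 2 (the error case of
    step A3).
    """
    if b_t < 1 or (b_t & (b_t - 1)) != 0:
        raise ValueError(f"processing byte count {b_t} is not a power of 2")
    return b_t.bit_length() - 1  # log2(B_T)

print(check_target_bytes(4))  # 2
```

A return value of 0 for B_T = 1 corresponds to the case where no conversion is needed.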
- The Multibyte NFA conversion unit 22 sets the operation mode (mode), input from the input device 1, that designates the type of Multibyte NFA to be generated (step A4).
- When the above settings are complete, unless the target processing byte count B_T is 1 byte (step A5), the Multibyte NFA conversion unit 22 converts the 1-byte NFA having no ε-transition received from the 1-byte NFA conversion unit 21 into a multibyte NFA with a processing byte count of B_T (step A6).
- FIG. 5 is a flowchart for explaining the detailed operation of step A6. Further, as an example, a description will be given by taking a conversion example of 1-byte NFA without an ε-transition generated from the regular expression “a (bc) * (d
- a transition with an arbitrary character is generated from the initial state to the initial state (step B1).
- a symbol indicating an arbitrary character is ‘X’, and the generated transition is referred to as “self-edge-initial”.
- Next, the operation mode (mode) is checked (step B2). The mode "match" is the operation mode in which it is possible to independently determine at which position of the input character string the pattern matches.
- When the operation mode is "match", the Multibyte NFA conversion unit 22 generates one new end state for each existing end state and generates a transition on an arbitrary character (label 'X') from the original end state to the new end state (step B3). The transition generated here is called edge-final.
- FIG. 6 shows an example of an NFA that has finished up to step B3 with respect to the NFA in FIG.
- Next, the Multibyte NFA conversion unit 22 selects from the current NFA one state that has not been selected so far as state n, selects one not-yet-selected state that has a transition to state n as state i, and selects one not-yet-selected state that has a transition from state n as state j (steps B5, B6, and B7). At this point, the label from state i to state n is 'L_in', and the label from state n to state j is 'L_nj'.
- The Multibyte NFA conversion unit 22 checks whether 'L_nj' is the self-edge-initial label (step B8). If it is, the unit checks whether any not-yet-selected candidate for state j remains (step B13). If it is not, the unit checks the operation mode (mode) again (step B2). When the operation mode is "match", the unit generates a transition from state i to state j (step B10), generates a label 'L_ij' by concatenating the label 'L_in' from state i to state n with the label 'L_nj' from state n to state j (step B11), and sets this label 'L_ij' as the transition condition from state i to state j (step B12).
- For example, when the labels L_in and L_nj are “a” and “b”, respectively, a transition with label “ab” is generated from state 2 to state 2.
- The Multibyte NFA conversion unit 22 checks whether there is a candidate for state j that has not yet been selected (step B13) and, if there is, repeats from step B7. If there is none, it checks whether there is a candidate for state i that has not yet been selected (step B14). Similarly, if there is still a candidate for state i, it repeats from step B6; if there is none, it checks whether there is a candidate for state n that has not yet been selected (step B15) and, if there is, repeats from step B5. For example, for the NFA of FIG. 6, once a state has been selected as state n and step B15 has been checked, the NFA is as shown in FIG. 7. Here, the state transitions indicated by dotted lines are the original NFA transitions (FIG. 24), the self-edge-initial transition added at step B1, and the edge-final transitions added at step B3; the other transitions shown are those newly generated by this processing.
- When no candidates remain, the Multibyte NFA conversion unit 22 deletes the state transitions of the original NFA (the transitions whose transition condition has B processing bytes) and the self-edge-initial and edge-final transitions added in steps B1 and B3 (steps B16 and B17), and doubles the current processing byte count B of the NFA (step B18).
- For example, the NFA immediately before entering step B16 is as shown in FIG. 8, and the NFA immediately after performing step B17 is as shown in FIG. 9. In this way, an NFA is generated that has twice the number of processing bytes of the original NFA and can determine at which position of the input character string the pattern matches.
- The Multibyte NFA conversion unit 22 compares the processing byte count B of the converted NFA with the specified target processing byte count B_T. If B is smaller than B_T, that is, if the target number of processing bytes has not been reached, the process is repeated from step B1 (step B19). When the target number of processing bytes is reached, the process ends.
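The "match" mode flow of steps B1 to B19 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: an NFA is a set of `(src, label, dst)` tuples with integer states, the letter "X" stands for one arbitrary byte (so a B-byte arbitrary label is written as "X" repeated B times and patterns are assumed not to contain a literal "X"), and the example is a hypothetical 1-byte NFA for the exact pattern "ab" rather than the regular expression used in the figures.

```python
ANY = "X"  # assumption: stands for one arbitrary byte

def double_match(transitions, initial, finals, b):
    """One doubling pass in "match" mode; returns (transitions, finals, 2*b)."""
    trans = set(transitions)
    self_edge_initial = (initial, ANY * b, initial)      # step B1
    trans.add(self_edge_initial)
    used = {initial, *finals}
    for s, _, d in trans:
        used.update((s, d))
    next_state = max(used) + 1
    new_finals = list(finals)
    for f in finals:                                     # step B3: edge-final
        trans.add((f, ANY * b, next_state))
        new_finals.append(next_state)
        next_state += 1
    doubled = set()
    for i, l_in, n in trans:                             # steps B5 to B15
        for second in trans:
            n2, l_nj, j = second
            if n2 != n:
                continue
            if second == self_edge_initial:              # step B8
                continue
            doubled.add((i, l_in + l_nj, j))             # steps B10 to B12
    # Returning only the concatenated transitions drops the B-byte
    # originals and the helper edges (steps B16, B17); B doubles (B18).
    return doubled, new_finals, 2 * b

def to_multibyte_match(transitions, initial, finals, b_t):
    """Repeat the doubling pass until B reaches B_T (step B19)."""
    b = 1
    while b < b_t:
        transitions, finals, b = double_match(transitions, initial, finals, b)
    return transitions, finals

one_byte = {(0, "a", 1), (1, "b", 2)}   # hypothetical NFA for "ab"
trans4, finals4 = to_multibyte_match(one_byte, 0, [2], 4)
for t in sorted(trans4):
    print(t)
print(finals4)  # [2, 3, 4, 5]
```

In this sketch the 4-byte result has four end states, and each is reachable only by labels in which the pattern ends at one particular byte of the 4-byte input (for instance state 2 via "XXab" and state 4 via "abXX"), which is how the end state reached identifies the matching position.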
- FIG. 10 shows an example of NFA immediately after step B3 and
- FIG. 11 shows an example of conversion of 4-byte NFA as an example of NFA when step B1 is further performed on FIG.
- In the other operation mode, step B4 is performed instead of step B3, and step B9 is performed after step B8.
- When "non-match" is selected as the operation mode, the Multibyte NFA conversion unit 22, after performing steps B1 and B2, generates a transition with label 'X' from each end state to itself; this transition is called self-edge-final (step B4).
- FIG. 12 shows an example of the NFA that has completed the above steps.
- Next, the Multibyte NFA conversion unit 22 performs steps B5 to B8 and, if the label 'L_nj' is the self-edge-initial label, checks whether any not-yet-selected candidate for state j remains (step B13).
- If it is not, the Multibyte NFA conversion unit 22 checks the operation mode (step B2) and then checks whether the label 'L_in' is the self-edge-final label (step B9). If it is, the process proceeds to step B13; if it is not, the process proceeds to step B10 and continues.
- The check in step B9 of whether 'L_in' is the self-edge-final label prevents arbitrary characters from entering the middle of the pattern, as would happen, for example, with the label “Xa” when L_in and L_nj are “X” and “a”, respectively.
- the NFA of FIG. 12 when the state n is selected as the state n and the step B15 is confirmed, the NFA is as shown in FIG.
- Here, the state transitions indicated by dotted lines are the transitions of the original NFA (FIG. 24) and the self-edge-initial and self-edge-final transitions added in steps B1 and B4; the other transitions shown are those newly generated by this processing.
- The NFA immediately before entering step B16 is as shown in FIG. 14, and the NFA immediately after performing step B17 is as shown in FIG. 15. An NFA is thus generated that has twice the number of processing bytes of the original NFA but cannot determine at which position of the input character string the pattern matches.
- Finally, step B19 is performed, and the processing ends when the target number of processing bytes is reached.
- For example, when step B1 is further performed on FIG. 15, an NFA as shown in FIG. 16 is obtained immediately after step B4, and finally a 4-byte NFA as shown in FIG. 28 is generated.
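The "non-match" variant differs only in steps B4 and B9, which the sketch below makes explicit. As before, this is a minimal sketch under assumptions rather than the patent's implementation: transitions are `(src, label, dst)` tuples, "X" stands for one arbitrary byte, and the input is a hypothetical 1-byte NFA for the exact pattern "ab".

```python
ANY = "X"  # assumption: stands for one arbitrary byte

def double_non_match(transitions, initial, finals, b):
    """One doubling pass in "non-match" mode; returns (transitions, 2*b)."""
    trans = set(transitions)
    self_edge_initial = (initial, ANY * b, initial)       # step B1
    self_edge_final = {(f, ANY * b, f) for f in finals}   # step B4
    trans.add(self_edge_initial)
    trans |= self_edge_final
    doubled = set()
    for first in trans:                                   # steps B5 to B15
        i, l_in, n = first
        for second in trans:
            n2, l_nj, j = second
            if n2 != n:
                continue
            if second == self_edge_initial:               # step B8
                continue
            if first in self_edge_final:                  # step B9
                continue
            doubled.add((i, l_in + l_nj, j))              # steps B10 to B12
    # Only the concatenated transitions are kept (steps B16, B17).
    return doubled, 2 * b                                 # step B18

nfa = {(0, "a", 1), (1, "b", 2)}   # hypothetical NFA for "ab"
b = 1
while b < 4:                        # step B19 with B_T = 4
    nfa, b = double_non_match(nfa, 0, [2], b)
for t in sorted(nfa):
    print(t)
```

No new states are created in this mode: every completed match lands in the single end state 2, so the state count stays at three while the end state no longer tells where in the 4-byte input the pattern ended.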
- When step A6 as described above is completed, the Multibyte NFA conversion unit 22 stores the generated multibyte NFA in the NFA storage unit 32. If another NFA has been received from the 1-byte NFA conversion unit 21, it starts converting it; if not, it waits until the next NFA is received.
- After the converted NFAs have been stored in the NFA storage unit 32, the data structure representing each NFA is read from the NFA storage unit 32 and output to the HDL conversion unit 23. The Multibyte NFA conversion unit 22 outputs the last NFA together with a signal indicating that it is the last NFA (step A6).
- The HDL conversion unit 23 analyzes information such as the states of each NFA, the transitions between states, and the transition conditions from the data structure of the Multibyte NFA received from the Multibyte NFA conversion unit 22, converts each state into a register and each transition condition (character string) into a comparator, connects the registers according to the transitions between states, converts the result into HDL, such as Verilog HDL or VHDL, describing the circuit, and stores the converted HDL in the HDL storage unit 33 (step A7).
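The state-to-register and condition-to-comparator mapping can be illustrated with a small generator that emits Verilog text. This is a minimal sketch under several assumptions that are not from the patent: it handles only 1-byte labels, treats the initial state as always active, assumes the initial state is not an end state, and uses hypothetical module and signal names (`nfa_1byte`, `din`, `match`).

```python
def nfa_to_verilog(transitions, initial, finals, name="nfa_1byte"):
    """Emit Verilog for a 1-byte NFA given as (src, char, dst) tuples."""
    states = sorted({s for s, _, _ in transitions} |
                    {d for _, _, d in transitions})
    regs = [s for s in states if s != initial]
    lines = [f"module {name}(input clk, input rst, "
             f"input [7:0] din, output match);"]
    for s in regs:
        lines.append(f"  reg s{s};")          # one register per state
    lines.append("  always @(posedge clk) begin")
    for s in regs:
        terms = []
        for src, ch, dst in sorted(transitions):
            if dst != s:
                continue
            # Assumption: the initial state is treated as always active.
            active = "1'b1" if src == initial else f"s{src}"
            # One comparator per transition condition.
            terms.append(f"({active} & (din == 8'h{ord(ch):02x}))")
        expr = " | ".join(terms) or "1'b0"
        lines.append(f"    s{s} <= rst ? 1'b0 : {expr};")
    lines.append("  end")
    match_expr = " | ".join(f"s{f}" for f in finals) or "1'b0"
    lines.append(f"  assign match = {match_expr};")
    lines.append("endmodule")
    return "\n".join(lines)

print(nfa_to_verilog({(0, "a", 1), (1, "b", 2)}, 0, [2]))
```

Extending the sketch to multibyte labels would mean widening `din` and comparing each non-arbitrary byte of the label against the corresponding slice of the input.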
- the HDL conversion unit 23 reads the generated HDL from the HDL storage unit 33 and outputs it to the output device 4 when requested (Step A8).
- by inputting the regular expression itself, the 1-byte NFA is converted to a Multibyte NFA that transitions on the specified number of processing bytes, and an HDL describing the NFA circuit can be generated.
- the Multibyte NFA generated by this embodiment supports not only exact match (Exact Match) but also regular expressions themselves, and by specifying the operation mode it is possible to generate an NFA circuit using a Multibyte NFA that can independently determine at which position of the input character string the pattern matched.
- the NFA circuit constructed from Multibyte NFA in which the transition condition itself of 1-byte NFA generated from one regular expression pattern is expanded to multiple bytes.
- an end state is newly generated according to the number of processing bytes, and depending on which of these end states is reached, it is possible to independently determine at which position of the input character string the pattern matched. For example, when a 4-byte NFA is generated that can determine at which position in the input character string the pattern matches the regular expression “a (bc) * (d
- the NFA circuit can independently determine at which position of the input character string the pattern matched, according to which of the end states 4, 5, 6, and 7 is reached.
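The idea of one end state per byte position can be illustrated with a simplified construction. The sketch below is not the Step B1 to B19 procedure of the embodiment: it enumerates paths of up to `k` bytes through a 1-byte NFA and tags each end state with the byte offset at which the pattern finished. The names `build_multibyte_nfa` and `find_match_ends`, the use of `"."` as a don't-care byte, the zero-byte padding of the last word, and the restriction that accepting states have no outgoing transitions are all assumptions made for this example.

```python
def build_multibyte_nfa(nfa, start, accepts, k):
    """Return stride-k transitions as a set of (src, label, dst).

    `nfa` maps a state to a list of (char, next_state) pairs.
    "." in a label is a don't-care byte; dst ("end", j) means the
    pattern ended at byte j (1-based) of the k-byte word.
    Assumes accepting states have no outgoing transitions and the
    pattern itself contains no "." character.
    """
    out = set()

    def expand(src, state, label):
        if state in accepts:
            out.add((src, label.ljust(k, "."), ("end", len(label))))
            return
        if len(label) == k:
            out.add((src, label, state))
            return
        for ch, nxt in nfa.get(state, []):
            expand(src, nxt, label + ch)

    for src in [s for s in nfa if s not in accepts]:
        expand(src, src, "")
    # The pattern may also begin in the middle of a word.
    for pad in range(1, k):
        expand(start, start, "." * pad)
    return out


def find_match_ends(transitions, start, k, text):
    """Return the sorted 1-based byte positions at which a match ends."""
    padded = text + "\0" * (-len(text) % k)
    active, ends = {start}, set()
    for base in range(0, len(padded), k):
        word = padded[base:base + k]
        following = {start}  # start state stays active (unanchored search)
        for src, label, dst in transitions:
            if src in active and all(l in (".", c) for l, c in zip(label, word)):
                if isinstance(dst, tuple):
                    ends.add(base + dst[1])
                else:
                    following.add(dst)
        active = following
    return sorted(ends)


# Hypothetical 1-byte NFA for the literal "abc": 0 -a-> 1 -b-> 2 -c-> 3
abc_nfa = {0: [("a", 1)], 1: [("b", 2)], 2: [("c", 3)]}
abc_4byte = build_multibyte_nfa(abc_nfa, 0, {3}, 4)
end_states = {dst for _, _, dst in abc_4byte if isinstance(dst, tuple)}

print(sorted(end_states))                             # four end states, offsets 1 to 4
print(find_match_ends(abc_4byte, 0, 4, "xxabcabc"))   # [5, 8]
```

With four processing bytes the construction yields four distinct end states, and the end state that fires identifies the byte within the word at which the match finished.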
- the 1-byte NFA without ε transitions converted by the 1-byte NFA conversion unit 21 is sent to the Multibyte NFA conversion unit 22 every time a conversion is completed. Alternatively, the 1-byte NFA conversion unit 21 may store it directly in the NFA storage unit 32 and send only a signal indicating that the conversion to a 1-byte NFA without ε transitions is completed to the Multibyte NFA conversion unit 22, and the Multibyte NFA conversion unit 22 may perform the conversion to the Multibyte NFA while reading out the 1-byte NFA without ε transitions stored in the NFA storage unit 32.
- the Multibyte NFA conversion unit 22 stores the converted Multibyte NFA in the NFA storage unit 32, and after the conversion for all regular expressions is completed, reads out all the Multibyte NFA from the NFA storage unit 32 and sends it to the HDL conversion unit 23.
- the Multibyte NFA conversion unit 22 may notify the HDL conversion unit 23 that the conversion has been completed, and the HDL conversion unit 23 may perform the HDL conversion process while reading the Multibyte NFA from the NFA storage unit 32.
- the Multibyte NFA conversion unit 22 may send the converted Multibyte NFA to the HDL conversion unit 23 instead of storing it in the NFA storage unit 32 every time one conversion ends, and the HDL conversion unit 23 may then start the HDL conversion process.
- the input device 1 can create a new regular expression without waiting for the processing of the 1-byte NFA conversion unit 21 to end.
- the 1-byte NFA conversion unit 21 does not wait for the Multibyte NFA conversion unit 22 to finish processing, and if new regular expression data exists in the regular expression storage unit 31, the next 1-byte NFA conversion process can be started.
- the Multibyte NFA conversion unit 22 can start the next Multibyte NFA conversion process if a new 1-byte NFA without ε transitions exists in the NFA storage unit 32.
- if the HDL conversion unit 23 can read the Multibyte NFA directly from the NFA storage unit 32, it can start the HDL conversion process whenever a new Multibyte NFA exists in the NFA storage unit 32.
- by exchanging data through the storage device 3 in this way, the HDL describing the NFA circuit can be generated efficiently.
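The decoupling described above, where each unit starts its next job as soon as input is available, resembles a three-stage pipeline connected by queues. The following is a minimal sketch under assumptions: the queues stand in for the regular expression, NFA, and HDL storage units, and the names `stage`, `run_pipeline`, and the three conversion callables are hypothetical placeholders rather than the units' actual interfaces.

```python
import queue
import threading

DONE = object()  # sentinel: "this was the last item"


def stage(work, inbox, outbox):
    """Take items from `inbox`, convert them, and pass results to `outbox`."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # forward the end-of-input signal
            return
        outbox.put(work(item))


def run_pipeline(regexes, to_one_byte, to_multibyte, to_hdl):
    """Run the three conversions concurrently, one thread per stage."""
    q_regex, q_nfa1, q_nfa_mb, q_hdl = (queue.Queue() for _ in range(4))
    stages = [
        (to_one_byte, q_regex, q_nfa1),
        (to_multibyte, q_nfa1, q_nfa_mb),
        (to_hdl, q_nfa_mb, q_hdl),
    ]
    threads = [threading.Thread(target=stage, args=s) for s in stages]
    for thread in threads:
        thread.start()
    for regex in regexes:  # the input side never waits for later stages
        q_regex.put(regex)
    q_regex.put(DONE)
    results = list(iter(q_hdl.get, DONE))
    for thread in threads:
        thread.join()
    return results


# Placeholder conversions that only tag their input.
hdl_list = run_pipeline(
    ["ab", "cd"],
    to_one_byte=lambda regex: ("nfa1", regex),
    to_multibyte=lambda nfa: ("nfa_mb", nfa),
    to_hdl=lambda nfa: ("hdl", nfa),
)
print(hdl_list)
```

Because each stage is a single thread reading a FIFO queue, results come out in the order the regular expressions were submitted.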
- if the HDL conversion unit 23 and the HDL storage unit 33 are removed from the above embodiment and the data structure of the generated Multibyte NFA is output directly from the Multibyte NFA conversion unit 22 to the output device 4, a finite automaton for character string matching for multibyte processing can be generated rather than an NFA circuit.
- the same processing is not limited to NFAs and can also be applied to DFAs, so it is possible to generate a DFA for multibyte processing that can determine at which position of the input character string the pattern matched.
- FIG. 17 is a block diagram showing a configuration of the second exemplary embodiment of the present invention.
- the data processing device 5 includes a 1-byte NFA conversion unit 24, a Multibyte NFA conversion unit 25, and an HDL conversion unit 23.
- the 1-byte NFA conversion unit 21 and the Multibyte NFA conversion unit 22 of the data processing device 2 of the first embodiment shown in FIG. 1 are replaced with the 1-byte NFA conversion unit 24 and the Multibyte NFA conversion unit 25, respectively. The rest is the same as in the first embodiment.
- the 1-byte NFA conversion unit 24 generates a 1-byte NFA without ε transitions from the regular expression in the same manner as the 1-byte NFA conversion unit 21 in the first embodiment, but imposes a constraint on this NFA. The rest is the same as the 1-byte NFA conversion unit 21 in the first embodiment.
- the Multibyte NFA conversion unit 25 performs Multibyte NFA conversion using a procedure specialized for 1-byte NFAs that satisfy the constraint imposed by the 1-byte NFA conversion unit 24; otherwise it is the same as the Multibyte NFA conversion unit 22 of the first embodiment.
- the regular expressions input from the input device 1, singly or as a list, are supplied to the 1-byte NFA conversion unit 24, while the number of processing bytes of the Multibyte NFA to be generated and the operation mode (mode) specifying the type of Multibyte NFA to be generated are supplied to the Multibyte NFA conversion unit 25.
- the 1-byte NFA conversion unit 24 stores the received regular expressions in the regular expression storage unit 31, reads them out one by one, and, using a well-known technique such as that described in Non-Patent Document 1, converts each regular expression to a 1-byte NFA without ε transitions to which a certain constraint is added (Step A9).
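As a reminder of what "without ε transitions" means, the following sketch applies the textbook ε-closure method to remove ε transitions from an NFA. It is not taken from Non-Patent Document 1; the function name `remove_epsilon` and the use of `None` as the ε label are assumptions for illustration, and the additional constraint of this embodiment is not applied here.

```python
def remove_epsilon(transitions, start, accepts):
    """Return (transitions, accepts) of an equivalent NFA without ε moves.

    `transitions` is a set of (src, label, dst); label None denotes ε.
    """
    states = {start}
    for src, _, dst in transitions:
        states.update((src, dst))

    def closure(state):
        """All states reachable from `state` using only ε transitions."""
        seen, stack = {state}, [state]
        while stack:
            current = stack.pop()
            for src, label, dst in transitions:
                if src == current and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    new_transitions, new_accepts = set(), set()
    for state in states:
        reachable = closure(state)
        if reachable & accepts:
            new_accepts.add(state)
        for src, label, dst in transitions:
            if src in reachable and label is not None:
                new_transitions.add((state, label, dst))
    return new_transitions, new_accepts


# Hypothetical NFA for "ab*": 0 -a-> 1 -ε-> 2, 2 -b-> 2, accepting state 2
with_eps = {(0, "a", 1), (1, None, 2), (2, "b", 2)}
eps_free, eps_free_accepts = remove_epsilon(with_eps, 0, {2})

print(sorted(eps_free))          # [(0, 'a', 1), (1, 'b', 2), (2, 'b', 2)]
print(sorted(eps_free_accepts))  # [1, 2]
```

Every remaining transition consumes exactly one byte, which is the form the Multibyte NFA conversion takes as input.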
- the target multibyte NFA is converted for either NFA.
- the 1-byte NFA conversion unit 24 generates an NFA in which there is no transition from the end state to any state, including the end state itself, as shown in FIG.
- the target multibyte NFA cannot be converted unless it is a 1-byte NFA to which such a constraint condition is added.
- with this restriction, there is an advantage that the conversion processing in the Multibyte NFA conversion unit 25 can be partially simplified.
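The constraint that no transition leaves an end state can be checked, and if necessary enforced, with a standard construction: add a fresh end state that has no outgoing transitions and duplicate every transition that enters an old end state so that it also enters the new one. The sketch below is illustrative only; the names `satisfies_end_state_constraint`, `enforce_end_state_constraint`, and `accepts_text` are hypothetical, and the construction assumes the start state is not itself an end state.

```python
def satisfies_end_state_constraint(transitions, accepts):
    """True if no transition starts from an accepting (end) state."""
    return all(src not in accepts for src, _, _ in transitions)


def enforce_end_state_constraint(transitions, accepts, new_accept):
    """Move acceptance to the fresh sink state `new_accept`.

    `new_accept` must not already be used as a state. The old accepting
    states keep their transitions but no longer accept.
    """
    result = set(transitions)
    for src, label, dst in transitions:
        if dst in accepts:
            result.add((src, label, new_accept))
    return result, {new_accept}


def accepts_text(transitions, start, accepts, text):
    """Simulate a 1-byte NFA without ε transitions on `text`."""
    active = {start}
    for ch in text:
        active = {dst for src, label, dst in transitions
                  if src in active and label == ch}
    return bool(active & accepts)


# Hypothetical NFA for "ab*": 0 -a-> 1, 1 -b-> 1, accepting state 1
loop_nfa = {(0, "a", 1), (1, "b", 1)}
fixed_nfa, fixed_accepts = enforce_end_state_constraint(loop_nfa, {1}, new_accept=2)

print(satisfies_end_state_constraint(loop_nfa, {1}))             # False
print(satisfies_end_state_constraint(fixed_nfa, fixed_accepts))  # True
```

The rewritten NFA accepts the same non-empty strings as before, while its single end state is a pure sink, which is the property the specialized conversion procedure relies on.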
- the Multibyte NFA conversion unit 25 performs Steps A2 to A4, and if the target number of processing bytes B T of the Multibyte NFA is not 1 byte (Step A5), it converts the 1-byte NFA without ε transitions received from the 1-byte NFA conversion unit 24 into a Multibyte NFA with B T processing bytes (Step A10).
- Steps A2 to A5 are the same as those in the first embodiment, and thus detailed description thereof is omitted.
- FIG. 21 is a flowchart for explaining more detailed operations in Step A10.
- Steps B1 to B8 and Steps B10 to B19 are the same as in the first embodiment, so their detailed description is omitted. The operation of Step A10 (FIG. 21) differs from the operation of Step A6 of the first embodiment (FIG. 5) in the following respect: after state n, state i, and state j are selected (Steps B5, B6, B7), it is checked whether the label 'L nj ' is a self-edge-initial label (Step B8), and if it is not, Step B10, which generates a transition from state i to state j, is performed immediately.
- the operation after Step A10 is completed (Step A7, Step A8) is the same as the operation after Step A6 is completed in the first embodiment, so its detailed description is omitted.
- the 1-byte NFA is converted to a Multibyte NFA that transitions on the specified number of processing bytes, and an HDL describing the NFA circuit can be generated.
- the Multibyte NFA generated by this embodiment supports not only exact match (Exact Match) but also regular expressions themselves, and by specifying the operation mode it is possible to generate an NFA circuit using a Multibyte NFA that can independently determine at which position of the input character string the pattern matched.
- step B8 is step B2
- step B9 is further performed depending on the operation mode
- step B10 is performed.
- the Multibyte NFA conversion unit 25 can immediately perform Step B10 if the label 'L nj ' is not a self-edge-initial label in Step B8.
- both a Multibyte NFA circuit that can independently determine at which position of the input character string the pattern matched and a Multibyte NFA circuit that cannot determine the matching position by itself can be generated, and an effective NFA circuit can be generated according to the purpose of use.
- the 1-byte NFA without ε transitions converted by the 1-byte NFA conversion unit 24 is sent to the Multibyte NFA conversion unit 25 every time a conversion is completed. Alternatively, it may be stored directly in the NFA storage unit 32, with only a signal indicating that the conversion to a 1-byte NFA without ε transitions is completed being sent to the Multibyte NFA conversion unit 25, and the Multibyte NFA conversion unit 25 may perform the conversion to the Multibyte NFA while reading out the 1-byte NFA without ε transitions stored in the NFA storage unit 32.
- the Multibyte NFA conversion unit 25 stores the converted Multibyte NFA in the NFA storage unit 32, and after the conversion for all the regular expressions is completed, reads out all the Multibyte NFA from the NFA storage unit 32 and sends it to the HDL conversion unit 23.
- the Multibyte NFA conversion unit 25 may notify the HDL conversion unit 23 that the conversion has been completed, and the HDL conversion unit 23 may perform the HDL conversion process while reading the Multibyte NFA from the NFA storage unit 32.
- the Multibyte NFA conversion unit 25 may send the data to the HDL conversion unit 23 instead of storing it in the NFA storage unit 32 every time one conversion is completed, and the HDL conversion unit 23 may start the HDL conversion process.
- the input device 1 can create a new regular expression without waiting for the processing of the 1-byte NFA conversion unit 24 to end.
- the 1-byte NFA conversion unit 24 does not wait for the processing of the multibyte NFA conversion unit 25 to end, and if new regular expression data exists in the regular expression storage unit 31, the next 1-byte NFA conversion process can be started.
- the Multibyte NFA conversion unit 25 can start the next Multibyte NFA conversion process if a new 1-byte NFA without ε transitions exists in the NFA storage unit 32.
- if the HDL conversion unit 23 can read the Multibyte NFA directly from the NFA storage unit 32, it can start the HDL conversion process whenever a new Multibyte NFA exists in the NFA storage unit 32.
- by exchanging data through the storage device 3 in this way, the HDL describing the NFA circuit can be generated efficiently.
- if the HDL conversion unit 23 and the HDL storage unit 33 are removed from the above embodiment and the data structure of the generated Multibyte NFA is output directly from the Multibyte NFA conversion unit 25 to the output device 4, a finite automaton for character string matching for multibyte processing can be generated rather than an NFA circuit.
- the same processing is not limited to NFAs and can also be applied to DFAs, so it is possible to generate a DFA for multibyte processing that can determine at which position of the input character string the pattern matched.
- FIG. 22 is a block diagram showing the configuration of the third exemplary embodiment of the present invention.
- the third embodiment of the present invention includes an input device 1, a data processing device 6, a storage device 3, and an output device 4, as in the first and second embodiments.
- the processing of the 1-byte NFA conversion unit 24, the Multibyte NFA conversion unit 25, and the HDL conversion unit 23 of the data processing device 5 is realized by a regular expression-HDL conversion program 7 executed by the data processing device 6.
- the regular expression-HDL conversion program 7 is read by the data processing device 6 and controls the operation of the data processing device 6 to generate a regular expression storage unit 31, an NFA storage unit 32, and an HDL storage unit 33 in the storage device 3.
- the data processing device 6 executes the same processing as the processing by the data processing devices 2 and 5 in the first and second embodiments under the control of the regular expression-HDL conversion program.
- a multibyte NFA that performs a transition with a specified number of processing bytes is realized by inputting a regular expression itself.
- An HDL describing the NFA circuit can be generated.
- the Multibyte NFA generated by this embodiment supports not only exact match (Exact Match) but also regular expressions themselves, and by specifying the operation mode it is possible to generate an NFA circuit using a Multibyte NFA that can independently determine at which position of the input character string the pattern matched.
- by specifying the operation mode, it is possible to select between generating a Multibyte NFA circuit that can independently determine at which position of the input character string the pattern matched and a Multibyte NFA circuit that cannot determine the matching position by itself, so an efficient NFA circuit can be generated according to the purpose.
- the regular expression-HDL conversion program 7 is read by the data processing device 6 to control the operation of the data processing device 6, and only the regular expression storage unit 31 and the NFA storage unit 32 are generated in the storage device 3.
- the same processing can be performed not only for NFA but also for DFA.
- FIG. 23 is a block diagram showing a configuration of the fourth exemplary embodiment of the present invention.
- an input device 1 such as a keyboard, a data processing device 8 that operates under program control, a storage device 9 that stores information, a configuration device 10 such as a cable for configuring a reconfigurable hardware device such as an FPGA, a data input device 11 for inputting the data to be searched by pattern matching to the pattern matching device, a pattern matching device 12 having a reconfigurable hardware device such as an FPGA, and a result output device 13 such as a display device or a printing device for outputting the result of pattern matching are provided.
- the CPU 102 controls the data processing device 8 and the storage device 9, and the CPU 102 is operated by a program in each unit in the data processing device 8.
- the pattern matching device 12 includes a reconfigurable hardware device such as an FPGA.
- the storage device 9 is obtained by adding a configuration data storage unit 34 to the storage device 3 of the first embodiment shown in FIG. Others are the same as those in the first embodiment.
- the configuration data storage unit 34 stores configuration data, which is configuration information of the target device, generated from the HDL describing the Multibyte NFA circuit read from the HDL storage unit 33 in the configuration data conversion unit 26.
- the data processing device 8 is obtained by adding a configuration data conversion unit 26 to the data processing device 2 of the first embodiment shown in FIG. Others are the same as those in the first embodiment.
- on receiving a signal from the HDL conversion unit 23 indicating that the conversion to HDL has been completed, or a signal from the input device 1 indicating the start of Configuration data generation, the Configuration data conversion unit 26 reads the designated HDL describing the Multibyte NFA circuit stored in the HDL storage unit 33, converts it from HDL to Configuration data, which is the configuration information of the target device, and stores the result in the Configuration data storage unit 34 when the conversion ends.
- the conversion from HDL to Configuration data is performed, for example in the case of an FPGA, using a development tool provided by the vendor, so its detailed description is omitted.
- the pattern matching device 12 includes a data input unit 121, a pattern matching unit 122, and a result output unit 123, which are configured on separate reconfigurable hardware devices.
- the data input unit 121 shapes the pattern matching target data (referred to as data to be searched), such as packet data or text data input from the data input device 11, parallelizes it to the number of processing bytes of the Multibyte NFA circuit generated by the data processing device 8, and inputs it to the pattern matching unit 122.
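The parallelization performed by the data input unit amounts to cutting the byte stream into words whose width equals the number of processing bytes. A minimal sketch follows; the function name `parallelize` and the zero-byte padding of the final word are assumptions, since the text does not state how a trailing partial word is handled.

```python
def parallelize(data: bytes, width: int, pad: bytes = b"\x00"):
    """Yield `width`-byte words from `data`; the last word is padded."""
    for base in range(0, len(data), width):
        yield data[base:base + width].ljust(width, pad)


words = list(parallelize(b"abcdefghij", 4))
print(words)  # [b'abcd', b'efgh', b'ij\x00\x00']
```

Each yielded word corresponds to the input presented to the Multibyte NFA circuit in one clock cycle.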
- the pattern matching unit 122 is a circuit configured by configuration data generated by the data processing device 8 input via the configuration device 10, and is a multibyte NFA circuit itself generated by the data processing device 8.
- in the NFA circuit configured in the pattern matching unit 122, when the state transitions caused by the data to be searched input from the data input unit 121 result in a pattern match, the output signal of the register that constitutes the end state is output to the result output unit 123.
- the result output unit 123 receives the signal indicating a pattern match from the pattern matching unit 122. If the NFA circuit configured in the pattern matching unit 122 can determine at which position of the input character string the pattern matched, the result output unit 123 determines, from which end state the signal was received, at which position of the data to be searched which pattern matched. If the NFA circuit cannot determine the matching position, the result output unit 123 determines which input character string matched which pattern. It then processes this information and outputs it to the result output device 13. The matched pattern may be reported by, for example, a pattern number defined in advance.
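The decoding step in the result output unit can be sketched as a table lookup from the end-state register that fired to a pattern number and a byte offset within the word. The table contents, the function name `decode_matches`, and the assumption that end states 4 to 7 correspond to offsets 1 to 4 of a 4-byte word are hypothetical; the source only states that the end state identifies the position.

```python
def decode_matches(word_index, fired_end_states, end_state_table, width):
    """Translate fired end states into (pattern_number, end_position) pairs.

    `end_state_table` maps an end state to (pattern_number, offset), where
    offset is the 1-based byte within the word at which the match ended.
    """
    results = []
    for state in sorted(fired_end_states):
        pattern_number, offset = end_state_table[state]
        results.append((pattern_number, word_index * width + offset))
    return results


# Assumed table for one pattern (number 0) in a 4-byte NFA circuit.
table = {4: (0, 1), 5: (0, 2), 6: (0, 3), 7: (0, 4)}

# End state 5 fired while the third word (index 2) was being processed.
print(decode_matches(2, {5}, table, width=4))  # [(0, 10)]
```

For an NFA circuit that cannot identify the position, the table would carry only the pattern number and the position field would be dropped.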
- the 1-byte NFA is converted to a Multibyte NFA that transitions on the specified number of processing bytes, the Multibyte NFA is converted to an HDL describing the NFA circuit, and the NFA circuit is configured on a hardware device, so a pattern matching apparatus using the NFA circuit can be realized.
- the Multibyte NFA generated by this embodiment supports not only exact match (Exact Match) but also regular expressions themselves, and by specifying the operation mode it is possible to realize a pattern matching apparatus using an NFA circuit based on a Multibyte NFA that can independently determine at which position of the input character string the pattern matched.
- a Multibyte NFA circuit that cannot independently determine the position of the input character string at which the pattern matched can also be used, and an efficient pattern matching device using an NFA circuit can be realized according to the purpose.
- the data processing device 8 and the storage device 9 in the above embodiment are configured by adding the configuration data conversion unit 26 and the configuration data storage unit 34 to the data processing device 2 and the storage device 3 in the first embodiment.
- the configuration data conversion unit 26 and the configuration data storage unit 34 may be added to the data processing device 5 and the storage device 3 in the second embodiment.
- configuration data may be generated from HDL generated from the regular expression-HDL conversion program 7 in the third embodiment.
- the data input unit 121, the pattern matching unit 122, and the result output unit 123 are configured on separate reconfigurable hardware devices in the above description, but they may be configured on the same reconfigurable hardware device.
- alternatively, the data input unit 121 and the result output unit 123 may be configured on the same reconfigurable hardware device, and the pattern matching unit 122 may be configured on another reconfigurable hardware device.
- the generated HDL may be used to configure a hardware device that cannot be reconfigured, such as an ASIC (Application Specific Integrated Circuit).
- this can be handled by having the Configuration data conversion unit 26 read not only the HDL describing the NFA circuit generated by the HDL conversion unit 23 but also the HDL describing these circuits, and generate the Configuration data.
- the same processing can be performed not only for NFA but also for DFA.
- the present invention can be applied to applications such as an HDL generation system and a generation program that describe an NFA circuit for performing pattern matching processing using regular expressions.
- by configuring an NFA circuit from the HDL generated using the present invention, it can be applied to applications such as a pattern matching device for performing high-speed pattern matching processing using regular expressions.
- by adding a packet processing circuit to the pattern matching device, it can also be applied to a network intrusion detection system (NIDS: Network Intrusion Detection System) and a network intrusion prevention system (NIPS: Network Intrusion Prevention System).
- the present invention can also be applied to an NFA circuit generation system and generation program for a hardware accelerator that is installed in personal computers and workstations as an alternative to software-based pattern matching processing, and to a regular expression search hardware accelerator device and the like.
- the present invention relates to a finite automaton generation system for character string matching for multibyte processing, an automaton circuit generation system, a generation method thereof, a circuit generation method, a generation program, a circuit generation program, a pattern matching apparatus using the same, and a finite automaton circuit for character string matching for multibyte processing.
- the present invention can be applied to any circuit, and there is no limitation on the possibility of use.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
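Next, embodiments of the present invention will be described in detail with reference to the drawings.
Next, the operation of the first embodiment of the present invention will be described in detail with reference to FIG. 1 and the flowchart of FIG. 2.
Next, a second embodiment of the present invention will be described in detail with reference to the drawings.
Next, the operation of the second embodiment of the present invention will be described in detail with reference to FIG. 17 and FIG. 18.
Next, a third embodiment of the present invention will be described in detail with reference to the drawings.
Next, a fourth embodiment of the present invention will be described in detail with reference to the drawings.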
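2 Data processing device
3 Storage device
4 Output device
5 Data processing device
6 Data processing device
7 Regular expression-HDL conversion program
8 Data processing device
9 Storage device
10 Configuration device
11 Data input device
12 Pattern matching device
13 Result output device
21 1-byte NFA conversion unit
22 Multibyte NFA conversion unit
23 HDL conversion unit
24 1-byte NFA conversion unit
25 Multibyte NFA conversion unit
26 Configuration data conversion unit
31 Regular expression storage unit
32 NFA storage unit
33 HDL storage unit
34 Configuration data storage unit
101, 102 CPU
121 Data input unit
122 Pattern matching unit
123 Result output unit
200 to 204 Registers
300 to 304 Comparators that compare each character
400 to 403 AND gates
500 to 502 OR gates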
Claims (40)
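- A finite automaton generation device for character string matching for multibyte processing, comprising means for generating, from a pattern using a regular expression, an NFA that has transition conditions each consisting of a plurality of characters and that can by itself determine, from the end state reached, at which position of the input character string the pattern matched.
- The finite automaton generation device for character string matching for multibyte processing according to claim 1, wherein it is possible to select and generate either a finite automaton that can by itself determine, from the end state reached, at which position of the input character string the pattern matched, or a finite automaton that cannot by itself determine at which position the pattern matched but has fewer states than the former finite automaton and allows the circuit scale to be reduced.
- A finite automaton generation device for character string matching for multibyte processing, comprising:
regular expression storage means for storing an input regular expression;
1-byte NFA conversion means for converting the regular expression stored in the regular expression storage means into an NFA (Non-deterministic Finite Automaton) that transitions on 1 byte and has no ε transitions;
Multibyte NFA conversion means for converting the NFA that transitions on 1 byte and has no ε transitions into an NFA that transitions on a specified number of processing bytes; and
NFA storage means for storing the NFA converted by the Multibyte NFA conversion means.
- The finite automaton generation device for character string matching for multibyte processing according to claim 3, wherein the Multibyte NFA conversion means comprises the means for generating an NFA according to claim 1 or claim 2.
- The finite automaton generation device for character string matching for multibyte processing according to claim 3 or claim 4, wherein the Multibyte NFA conversion means can select, according to the purpose of use and by means of a specified operation mode, whether to generate an NFA that can by itself determine at which position of the input character string the pattern matched or an NFA that cannot by itself determine at which position of the input character string the pattern matched.
- The finite automaton generation device for character string matching for multibyte processing according to any one of claims 3 to 5, wherein the conversion processing in the Multibyte NFA conversion means can be simplified by the 1-byte NFA conversion means imposing, on the NFA that transitions on 1 byte and has no ε transitions converted from the regular expression, the constraint that there is no transition from an end state to any state, including the end state itself.
- A finite automaton circuit generation device for character string matching for multibyte processing, comprising NFA generation means for generating, from a pattern using a regular expression, an NFA that has transition conditions each consisting of a plurality of characters and that can by itself determine, from the end state reached, at which position of the input character string the pattern matched.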
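- The finite automaton circuit generation device for character string matching for multibyte processing according to claim 7, wherein it is possible to select and generate either a finite automaton that can by itself determine, from the end state reached, at which position of the input character string the pattern matched, or a finite automaton that cannot by itself determine at which position the pattern matched but has fewer states than the former finite automaton and allows the circuit scale to be reduced.
- A finite automaton circuit generation device for character string matching for multibyte processing, comprising:
regular expression storage means for storing an input regular expression;
1-byte NFA conversion means for converting the regular expression stored in the regular expression storage means into an NFA (Non-deterministic Finite Automaton) that transitions on 1 byte and has no ε transitions;
Multibyte NFA conversion means for converting the NFA that transitions on 1 byte and has no ε transitions into an NFA that transitions on a specified number of processing bytes;
NFA storage means for storing the NFA converted by the Multibyte NFA conversion means;
HDL conversion means for generating, from the NFA converted by the Multibyte NFA conversion means, a hardware description language describing its hardware circuit; and
HDL storage means for storing the hardware description language converted by the HDL conversion means.
- The finite automaton circuit generation device for character string matching for multibyte processing according to claim 9, wherein the Multibyte NFA conversion means has the NFA generation means.
- The finite automaton circuit generation device for character string matching for multibyte processing according to claim 9 or claim 10, wherein the Multibyte NFA conversion means can select, according to the purpose of use and by means of a specified operation mode, whether to generate an NFA that can by itself determine at which position of the input character string the pattern matched or an NFA that cannot by itself determine at which position of the input character string the pattern matched.
- The finite automaton circuit generation device for character string matching for multibyte processing according to any one of claims 9 to 11, wherein the conversion processing in the Multibyte NFA conversion means can be simplified by the 1-byte NFA conversion means imposing, on the NFA that transitions on 1 byte and has no ε transitions converted from the regular expression, the constraint that there is no transition from an end state to any state, including the end state itself.
- A finite automaton generation method for character string matching for multibyte processing, comprising an NFA generation process of generating, from a pattern using a regular expression, an NFA that has transition conditions each consisting of a plurality of characters and that can by itself determine, from the end state reached, at which position of the input character string the pattern matched.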
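- The finite automaton generation method for character string matching for multibyte processing according to claim 13, wherein it is possible to select and generate either a finite automaton that can by itself determine, from the end state reached, at which position of the input character string the pattern matched, or a finite automaton that cannot by itself determine at which position the pattern matched but has fewer states than the former finite automaton and allows the circuit scale to be reduced.
- A finite automaton generation method for character string matching for multibyte processing, comprising:
storing an input regular expression;
converting the stored regular expression into an NFA that transitions on 1 byte and has no ε transitions;
converting the NFA that transitions on 1 byte and has no ε transitions into an NFA that transitions on a specified number of processing bytes; and
storing the converted NFA.
- The finite automaton generation method for character string matching for multibyte processing according to claim 15, wherein, further, the conversion from the NFA that transitions on 1 byte and has no ε transitions into the NFA that transitions on the specified number of processing bytes is the NFA generation process.
- The finite automaton generation method for character string matching for multibyte processing according to claim 15 or claim 16, wherein the conversion from the NFA that transitions on 1 byte and has no ε transitions into the NFA that transitions on the specified number of processing bytes can select, according to the purpose of use and by means of a specified operation mode, whether to generate an NFA that can by itself determine at which position of the input character string the pattern matched or an NFA that cannot by itself determine at which position of the input character string the pattern matched.
- The finite automaton generation method for character string matching for multibyte processing according to any one of claims 15 to 17, wherein imposing, on the NFA that transitions on 1 byte and has no ε transitions converted from the regular expression, the constraint that there is no transition from an end state to any state, including the end state itself, facilitates the conversion from the 1-byte NFA into the Multibyte NFA that transitions on the specified number of processing bytes.
- A finite automaton circuit generation method for character string matching for multibyte processing, comprising a second NFA generation process of generating, from a pattern using a regular expression, an NFA that has transition conditions each consisting of a plurality of characters and that can by itself determine, from the end state reached, at which position of the input character string the pattern matched.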
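- The finite automaton circuit generation method for character string matching for multibyte processing according to claim 19, wherein it is possible to select and generate either a finite automaton that can by itself determine, from the end state reached, at which position of the input character string the pattern matched, or a finite automaton that cannot by itself determine at which position the pattern matched but has fewer states than the former finite automaton and allows the circuit scale to be reduced.
- A finite automaton circuit generation method for character string matching for multibyte processing, comprising:
storing an input regular expression;
converting the stored regular expression into an NFA that transitions on 1 byte and has no ε transitions;
converting the NFA that transitions on 1 byte and has no ε transitions into an NFA that transitions on a specified number of processing bytes;
storing the converted NFA;
generating, from the stored NFA, a hardware description language describing its hardware circuit; and
storing the hardware description language.
- The finite automaton circuit generation method for character string matching for multibyte processing according to claim 21, wherein, further, the conversion from the NFA that transitions on 1 byte and has no ε transitions into the NFA that transitions on the specified number of processing bytes is the second NFA generation process.
- The finite automaton circuit generation method for character string matching for multibyte processing according to claim 21 or claim 22, wherein the conversion from the NFA that transitions on 1 byte and has no ε transitions into the NFA that transitions on the specified number of processing bytes can select, according to the purpose of use and by means of a specified operation mode, between a process of generating an NFA that can by itself determine at which position of the input character string the pattern matched and a process of generating an NFA that cannot by itself determine at which position of the input character string the pattern matched.
- The finite automaton circuit generation method for character string matching for multibyte processing according to any one of claims 21 to 23, wherein imposing, on the NFA that transitions on 1 byte and has no ε transitions converted from the regular expression, the constraint that there is no transition from an end state to any state, including the end state itself, facilitates the conversion from the NFA that transitions on 1 byte and has no ε transitions into the NFA that transitions on the specified number of processing bytes.
- A finite automaton generation program for character string matching for multibyte processing, causing a computer to execute a third NFA generation process of generating, from a pattern using a regular expression, an NFA that has transition conditions each consisting of a plurality of characters and that can by itself determine, from the end state reached, at which position of the input character string the pattern matched.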
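- The finite automaton generation program for character string matching for multibyte processing according to claim 25, causing the computer to execute a process of selecting and generating either a finite automaton that can by itself determine, from the end state reached, at which position of the input character string the pattern matched, or a finite automaton that cannot by itself determine at which position the pattern matched but has fewer states than the former finite automaton and allows the circuit scale to be reduced.
- A finite automaton generation program for character string matching for multibyte processing, causing a computer to execute:
a regular expression storage process of storing an input regular expression;
a 1-byte NFA conversion process of converting the stored regular expression into an NFA that transitions on 1 byte and has no ε transitions;
a Multibyte NFA conversion process of converting the NFA that transitions on 1 byte and has no ε transitions into an NFA that transitions on a specified number of processing bytes; and
an NFA storage process of storing the converted NFA.
- The finite automaton generation program for character string matching for multibyte processing according to claim 27, wherein the Multibyte NFA conversion process causes the computer to execute the third NFA generation process.
- The finite automaton generation program for character string matching for multibyte processing according to claim 27 or claim 28, wherein the Multibyte NFA conversion process can select, according to the purpose of use and by means of a specified operation mode, whether to generate an NFA that can by itself determine at which position of the input character string the pattern matched or an NFA that cannot by itself determine at which position of the input character string the pattern matched.
- The finite automaton generation program for character string matching for multibyte processing according to any one of claims 27 to 29, wherein the Multibyte NFA conversion process can be simplified by imposing, in the 1-byte NFA conversion process, on the NFA that transitions on 1 byte and has no ε transitions converted from the regular expression, the constraint that there is no transition from an end state to any state, including the end state itself.
- A finite automaton circuit generation program for character string matching for multibyte processing, causing a computer to execute a fourth NFA generation process of generating, from a pattern using a regular expression, an NFA that has transition conditions each consisting of a plurality of characters and that can by itself determine, from the end state reached, at which position of the input character string the pattern matched.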
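- The finite automaton circuit generation program for character string matching for multibyte processing according to claim 31, causing the computer to execute a process of selecting and generating either a finite automaton that can by itself determine, from the end state reached, at which position of the input character string the pattern matched, or a finite automaton that cannot by itself determine at which position the pattern matched but has fewer states than the former finite automaton and allows the circuit scale to be reduced.
- A finite automaton circuit generation program for character string matching for multibyte processing, causing a computer to execute:
a regular expression storage process of storing an input regular expression;
a 1-byte NFA conversion process of converting the regular expression stored in the regular expression storage means into an NFA that transitions on 1 byte and has no ε transitions;
a Multibyte NFA conversion process of converting the NFA that transitions on 1 byte and has no ε transitions into an NFA that transitions on a specified number of processing bytes;
an NFA storage process of storing the NFA converted by the Multibyte NFA conversion process;
an HDL conversion process of generating, from the NFA converted by the Multibyte NFA conversion process, a hardware description language describing its hardware circuit; and
an HDL storage process of storing the hardware description language converted by the HDL conversion process.
- The finite automaton circuit generation program for character string matching for multibyte processing according to claim 33, wherein the Multibyte NFA conversion process causes the computer to execute the fourth NFA generation process.
- The finite automaton circuit generation program for character string matching for multibyte processing according to claim 33 or claim 34, wherein the Multibyte NFA conversion process can select, according to the purpose of use and by means of a specified operation mode, whether to generate an NFA that can by itself determine at which position of the input character string the pattern matched or an NFA that cannot by itself determine at which position of the input character string the pattern matched.
- The finite automaton circuit generation program for character string matching for multibyte processing according to any one of claims 33 to 35, wherein the Multibyte NFA conversion process can be simplified by imposing, in the 1-byte NFA conversion process, on the NFA that transitions on 1 byte and has no ε transitions converted from the regular expression, the constraint that there is no transition from an end state to any state, including the end state itself.
- A pattern matching device that uses the finite automaton circuit on a reconfigurable hardware device by means of a hardware description language generated using the finite automaton circuit generation device according to claims 7 to 12, the finite automaton circuit generation method according to claims 19 to 24, or the finite automaton circuit generation program according to claims 31 to 36.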
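- A pattern matching device comprising, in addition to the finite automaton circuit generation device according to claims 7 to 12,
Configuration data conversion means for generating, from the hardware description language generated by the finite automaton circuit generation device, Configuration data that is the configuration information of a reconfigurable hardware device,
wherein the finite automaton circuit is used on a reconfigurable hardware device by means of the generated Configuration data.
- A finite automaton circuit for character string matching for multibyte processing on a reconfigurable hardware device, configured using a hardware description language generated using the finite automaton circuit generation device according to claims 7 to 12, the finite automaton circuit generation method according to claims 19 to 24, or the finite automaton circuit generation program according to claims 31 to 36.
- A finite automaton circuit for character string matching for multibyte processing on a reconfigurable hardware device, comprising, in addition to the finite automaton circuit generation device according to claims 7 to 12,
Configuration data conversion means for generating, from the hardware description language generated by the finite automaton circuit generation device, Configuration data that is the configuration information of a reconfigurable hardware device,
the circuit being configured by means of the generated Configuration data.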
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/933,504 US20110022617A1 (en) | 2008-03-19 | 2009-03-19 | Finite automaton generation system for string matching for multi-byte processing |
JP2010503940A JPWO2009116646A1 (ja) | 2008-03-19 | 2009-03-19 | Finite automaton generation system for string matching for multi-byte processing |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2008-071951 | 2008-03-19 | ||
JP2008071951 | 2008-03-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009116646A1 (ja) | 2009-09-24 |
Family
ID=41091044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2009/055515 WO2009116646A1 (ja) | 2008-03-19 | 2009-03-19 | Finite automaton generation system for string matching for multi-byte processing |
Country Status (3)
Country | Link |
---|---|
US (1) | US20110022617A1 (ja) |
JP (1) | JPWO2009116646A1 (ja) |
WO (1) | WO2009116646A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011070502A (ja) * | 2009-09-28 | 2011-04-07 | Mitsubishi Electric Corp | Sequence control circuit and control circuit |
US20140101155A1 (en) * | 2012-10-10 | 2014-04-10 | H. Jonathan Chao | Generating a tunable finite automaton for regular expression matching |
US20140101187A1 (en) * | 2012-10-10 | 2014-04-10 | H. Jonathan Chao | Using a tunable finite automaton for regular expression matching |
US11566687B2 (en) | 2017-09-16 | 2023-01-31 | Genesis Advanced Technology Inc. | Differential planetary gearbox |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120254211A1 (en) * | 2011-04-02 | 2012-10-04 | Huawei Technologies Co., Ltd. | Method and apparatus for mode matching |
US8688608B2 (en) * | 2011-06-28 | 2014-04-01 | International Business Machines Corporation | Verifying correctness of regular expression transformations that use a post-processor |
US9268881B2 (en) | 2012-10-19 | 2016-02-23 | Intel Corporation | Child state pre-fetch in NFAs |
US9117170B2 (en) | 2012-11-19 | 2015-08-25 | Intel Corporation | Complex NFA state matching method that matches input symbols against character classes (CCLs), and compares sequence CCLs in parallel |
US9665664B2 (en) * | 2012-11-26 | 2017-05-30 | Intel Corporation | DFA-NFA hybrid |
US9304768B2 (en) | 2012-12-18 | 2016-04-05 | Intel Corporation | Cache prefetch for deterministic finite automaton instructions |
US9251440B2 (en) * | 2012-12-18 | 2016-02-02 | Intel Corporation | Multiple step non-deterministic finite automaton matching |
US9268570B2 (en) | 2013-01-23 | 2016-02-23 | Intel Corporation | DFA compression and execution |
WO2015084360A1 (en) * | 2013-12-05 | 2015-06-11 | Hewlett-Packard Development Company, L.P. | Regular expression matching |
US9729353B2 (en) * | 2014-01-09 | 2017-08-08 | Netronome Systems, Inc. | Command-driven NFA hardware engine that encodes multiple automatons |
CN107193776A (zh) * | 2017-05-24 | 2017-09-22 | Nanjing University | A novel conversion algorithm for regular expression matching |
US10481881B2 (en) * | 2017-06-22 | 2019-11-19 | Archeo Futurus, Inc. | Mapping a computer code to wires and gates |
US9996328B1 (en) * | 2017-06-22 | 2018-06-12 | Archeo Futurus, Inc. | Compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code |
CN113703737A (zh) * | 2021-08-31 | 2021-11-26 | Sangfor Technologies Inc. | Register-transfer-level code generation method, apparatus, device, and medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5140644A (en) * | 1990-07-23 | 1992-08-18 | Hitachi, Ltd. | Character string retrieving system and method |
US5995963A (en) * | 1996-06-27 | 1999-11-30 | Fujitsu Limited | Apparatus and method of multi-string matching based on sparse state transition list |
US6581191B1 (en) * | 1999-11-30 | 2003-06-17 | Synplicity, Inc. | Hardware debugging in a hardware description language |
GB2367917A (en) * | 2000-10-12 | 2002-04-17 | Qas Systems Ltd | Retrieving data representing a postal address from a database of postal addresses using a trie structure |
US7225188B1 (en) * | 2002-02-13 | 2007-05-29 | Cisco Technology, Inc. | System and method for performing regular expression matching with high parallelism |
US7359895B2 (en) * | 2004-11-18 | 2008-04-15 | Industrial Technology Research Institute | Spiral string matching method |
FR2891075B1 (fr) * | 2005-09-21 | 2008-04-04 | St Microelectronics Sa | Memory circuit for an Aho-Corasick type character recognition automaton and method of storing data in such a circuit |
- 2009
  - 2009-03-19 US US12/933,504 patent/US20110022617A1/en not_active Abandoned
  - 2009-03-19 WO PCT/JP2009/055515 patent/WO2009116646A1/ja active Application Filing
  - 2009-03-19 JP JP2010503940A patent/JPWO2009116646A1/ja active Pending
Non-Patent Citations (2)
Title |
---|
MASATO ONO: "Network IDS Muke no Koritsuteki na Pattern Matching Kairo no Kenkyu", UNIVERSITY OF TSUKUBA DAIGAKUIN HAKASE KATEI SYSTEM JOHO KOGAKU KENKYUKA SHUSHI RONBUN, 2006, pages 1 - 47, Retrieved from the Internet <URL:http://www.cs.tsukuba.ac.jp/list_H17.html> * |
NORIO YAMAGAKI ET AL.: "NFA Umekomi Gata Pattern Matching Kairo ni Okeru Multibyte Shorika ni Kansuru Kento", IEICE TECHNICAL REPORT, THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. 107, no. 225, 13 September 2007 (2007-09-13), pages 65 - 70 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011070502A (ja) * | 2009-09-28 | 2011-04-07 | Mitsubishi Electric Corp | Sequence control circuit and control circuit |
US20140101155A1 (en) * | 2012-10-10 | 2014-04-10 | H. Jonathan Chao | Generating a tunable finite automaton for regular expression matching |
US20140101187A1 (en) * | 2012-10-10 | 2014-04-10 | H. Jonathan Chao | Using a tunable finite automaton for regular expression matching |
US8938454B2 (en) * | 2012-10-10 | 2015-01-20 | Polytechnic Institute Of New York University | Using a tunable finite automaton for regular expression matching |
US8943063B2 (en) * | 2012-10-10 | 2015-01-27 | Polytechnic Institute Of New York University | Generating a tunable finite automaton for regular expression matching |
US11566687B2 (en) | 2017-09-16 | 2023-01-31 | Genesis Advanced Technology Inc. | Differential planetary gearbox |
Also Published As
Publication number | Publication date |
---|---|
US20110022617A1 (en) | 2011-01-27 |
JPWO2009116646A1 (ja) | 2011-07-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
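WO2009116646A1 (ja) | Finite automaton generation system for string matching for multi-byte processing | |
JP5381710B2 (ja) | System, method, and program for generating a non-deterministic finite automaton containing no ε-transitions | |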
Abboud et al. | If the current clique algorithms are optimal, so is Valiant's parser | |
EP2668574B1 (en) | Utilizing special purpose elements to implement a fsm | |
JP5857072B2 (ja) | Unrolling of quantifiers to control the in-degree and/or out-degree of an automaton | |
JP5763783B2 (ja) | Method and apparatus for compiling regular expressions | |
US20110153641A1 (en) | System and method for regular expression matching with multi-strings and intervals | |
JP5321589B2 (ja) | Finite automaton generation device, pattern matching device, finite automaton circuit generation method, and program | |
JP2014506693A5 (ja) | ||
US20210365253A1 (en) | Heterogeneity-agnostic and topology-agnostic data plane programming | |
US11816493B2 (en) | Methods and systems for representing processing resources | |
KR20140005258A (ko) | State grouping for element utilization | |
Geffert et al. | More concise representation of regular languages by automata and regular expressions | |
US6944588B2 (en) | Method and apparatus for factoring unambiguous finite state transducers | |
US20030004705A1 (en) | Method and apparatus for factoring ambiguous finite state transducers | |
US6920583B1 (en) | System and method for compiling temporal expressions | |
US7107205B2 (en) | Method and apparatus for aligning ambiguity in finite state transducers | |
US20030033135A1 (en) | Method and apparatus for extracting infinite ambiguity when factoring finite state transducers | |
US6965858B2 (en) | Method and apparatus for reducing the intermediate alphabet occurring between cascaded finite state transducers | |
US6760636B2 (en) | Method and apparatus for extracting short runs of ambiguity from finite state transducers | |
US6959273B2 (en) | Method and apparatus for factoring finite state transducers with unknown symbols | |
US20180011833A1 (en) | Syntax analyzing device, learning device, machine translation device and storage medium | |
KR20200094977A (ko) | Apparatus and method for automata-based incremental infix probability computation | |
US20110023008A1 (en) | Method for optimizing an architectural model of a microprocessor | |
KR102271489B1 (ko) | Apparatus and method for constructing Aho-Corasick automata for detecting regular expression patterns |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 09722478; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | WIPO information: entry into national phase | Ref document number: 2010503940; Country of ref document: JP |
| WWE | WIPO information: entry into national phase | Ref document number: 12933504; Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 09722478; Country of ref document: EP; Kind code of ref document: A1 |