CN101650718A - Method and device for matching character strings - Google Patents


Info

Publication number
CN101650718A
CN101650718A CN200810147432A CN200810147432A CN101650718A CN 101650718 A CN101650718 A CN 101650718A CN 200810147432 A CN200810147432 A CN 200810147432A CN 200810147432 A CN200810147432 A CN 200810147432A CN 101650718 A CN101650718 A CN 101650718A
Authority
CN
China
Prior art keywords
status information
information
character
state
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200810147432A
Other languages
Chinese (zh)
Inventor
王浩
赵玉超
王勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN200810147432A
Publication of CN101650718A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a method and a device for matching character strings. The method comprises: classifying state information according to a preset state attribute threshold; storing each class of state information using a different storage mode, where the storage modes differ in space efficiency; and performing a matching operation on a received character string using the stored state information. Because the state information is classified and different states are stored in different ways, the storage space of the state information as a whole is compressed so that it fits within the capacity limits of current storage devices, while a high processing speed is maintained during string matching.

Description

Character string matching method and device
Technical field
Embodiments of the invention relate to the field of communication technology, and in particular to a character string matching method and device.
Background art
With the introduction of concepts such as all-IP (Internet Protocol) networks, fixed-mobile convergence and multi-play, the traditional IP network is evolving into a unified multi-service bearer network that integrates data, voice and video. However, the inherent data mode and essentially open nature of IP networks cannot meet the needs of carrier-class services well, and there is much room for improvement in network security, manageability, and the assurance of QoS (Quality of Service) and QoE (Quality of Experience) for key services. To identify and control the traffic of certain key services accurately, it is necessary not only to analyze fields such as the five-tuple in the packet header in the conventional way, but also to inspect the packet payload. DPI (Deep Packet Inspection) emerged as a flexible and effective traffic identification technology.
Today, DPI increasingly uses regular expressions instead of character strings to describe packet features. A regular expression is a formal language composed of constants and operators, which denote sets of characters and operations on those sets, respectively. Compared with character strings, regular expressions can describe a wide variety of features flexibly, simply and effectively, give the feature string a dynamic character, and suit many kinds of dynamic search. For example, the series of strings b, ab, aab, aaab, aaaab, ... can be expressed simply by the regular expression a*b. A character string can be regarded as a special case of a regular expression; hereinafter both are referred to collectively as character strings.
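As a small illustration of the a*b example, the following sketch uses Python's standard `re` module to check which sample strings are described by the expression. The sample list and the use of whole-string matching are choices made for this illustration only; they are not part of the patent.

```python
import re

# The pattern a*b: zero or more 'a' characters followed by a single 'b'.
pattern = re.compile(r"a*b")

# Sample inputs chosen for illustration; the first five come from the text.
samples = ["b", "ab", "aab", "aaab", "aaaab", "ba", "a"]

# fullmatch requires the whole string to be described by the expression.
results = {s: pattern.fullmatch(s) is not None for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {'match' if ok else 'no match'}")
```

One expression covers the whole family of feature strings, which is why the text treats a plain character string as a special case.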
The operation of determining whether input content contains a pattern expressed by a character string is called string matching. One string matching method provided by the prior art is based on a DFA (Deterministic Finite Automaton, a deterministic finite state machine).
String matching is now used more and more in network devices, for example to inspect data packets. One trend is to implement DFA-based string matching systems with an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit); these systems rely on various memories to hold the DFA data. Commonly used memories include BRAM (Block Random Access Memory, on-chip block RAM), DRAM (Dynamic Random Access Memory) and SRAM (Static Random Access Memory). With a complex set of string rules, BRAM and SRAM cannot hold all of the DFA information. DRAM can meet the capacity requirement, but its access speed is low and does not satisfy line-rate processing, and frequent DRAM accesses become the biggest bottleneck of the whole system.
One prior-art approach stores the state information as a state transition table and uses the stored state information for string matching. The state transition table is a non-compact storage mode: the current state and the input character serve as the two dimensions of a table, and together these two parameters uniquely determine a destination state.
However, a state transition table must reserve storage for every possible input character. Even when a character is not a valid input in a given state, it still needs a storage location. This storage method therefore takes up a great deal of space, and the DFA cannot be stored entirely in fast memory.
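The space cost of the transition table can be seen in a minimal sketch. It assumes a 256-character alphabet and a hand-built two-state DFA for a*b; the names `build_table`, `DEAD` and the slot counters are hypothetical and exist only for this illustration.

```python
ALPHABET_SIZE = 256   # assumption: one table column per byte value
DEAD = -1             # marker for "no valid transition"

def build_table(num_states, edges):
    """Build a full state x character table from (state, char, dest) edges."""
    table = [[DEAD] * ALPHABET_SIZE for _ in range(num_states)]
    for state, char, dest in edges:
        table[state][ord(char)] = dest
    return table

# Hand-built DFA for a*b: state 0 is the start state, state 1 is accepting.
edges = [(0, "a", 0), (0, "b", 1)]
table = build_table(2, edges)
accepting = {1}

def matches(text):
    """Whole-string match: one indexed table read per input character."""
    state = 0
    for ch in text:
        code = ord(ch)
        state = table[state][code] if code < ALPHABET_SIZE else DEAD
        if state == DEAD:
            return False
    return state in accepting

# Every (state, character) pair occupies a slot, valid or not.
total_slots = sum(len(row) for row in table)
used_slots = sum(1 for row in table for dest in row if dest != DEAD)
print(total_slots, used_slots)
```

Lookup is a single indexed read per character, but in this toy DFA only 2 of the 512 slots carry a real transition.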
Another prior-art approach stores the state information as an adjacency list and uses the stored state information for string matching. The adjacency list is a compact storage mode: for a given current state, only the transition edges that correspond to valid characters are stored.
However, the characters stored in an adjacency list are not contiguous, so a character comparison must start from the first stored character and compare the stored characters one by one. This prevents high-speed string matching and cannot meet the performance requirement of line-rate processing, so the adjacency list is seldom used for string matching in high-speed systems even though it saves storage space. In addition, when a state has many transition edges, the space saved by the adjacency list is very limited.
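The trade-off of the adjacency list can be sketched in the same style. The data layout and the comparison counter below are assumptions made for illustration; the counter only makes the sequential scan visible.

```python
# Adjacency list for the a*b DFA: each state stores only its valid edges.
adjacency = {0: [("a", 0), ("b", 1)], 1: []}
accepting = {1}

def step(state, ch):
    """Scan the stored edges one by one; return (destination, comparisons)."""
    comparisons = 0
    for edge_char, dest in adjacency[state]:
        comparisons += 1
        if edge_char == ch:
            return dest, comparisons
    return None, comparisons

def matches(text):
    """Whole-string match; also report how many comparisons were needed."""
    state, total = 0, 0
    for ch in text:
        state, used = step(state, ch)
        total += used
        if state is None:
            return False, total
    return state in accepting, total

print(matches("aab"))
```

Only two edges are stored for state 0, but finding the edge for 'b' already takes two comparisons, and the cost grows with the number of edges per state.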
The prior art has also proposed a hierarchical memory structure for storing state machine information, which is then used for string matching. The basic idea is to organize BRAM, SRAM and DRAM into levels by access speed, from high to low; the higher the level, the faster the memory and the smaller its capacity. The state machines corresponding to the rules are then placed in different memories in descending order of rule usage frequency, with frequently used rules placed preferentially in higher-level memory.
However, this method still uses a single storage mode for all rules, and at any given time all states of one state machine are stored in the same memory. When the rules themselves are complex, the number of state machines that fit in BRAM depends on the complexity of the rules. This method is therefore unsuitable for string matching when there are many rules or the rules themselves are complex.
Summary of the invention
Embodiments of the invention provide a character string matching method and device that store different state information in different storage modes and use the stored state information for string matching, thereby improving string matching performance.
To achieve the above object, in one aspect an embodiment of the invention provides a character string matching method, comprising:
classifying state information according to a preset state attribute threshold; storing each class of state information using a different storage mode, where the storage modes differ in space efficiency; and performing a matching operation on a received character string using the stored state information.
In another aspect, an embodiment of the invention provides a character string matching device, comprising: a classification module, configured to classify state information according to a preset state attribute threshold; a storage module, configured to store each class of state information classified by the classification module using a different storage mode, where the storage modes differ in space efficiency; and a matching module, configured to perform a matching operation on a received character string using the state information stored by the storage module.
In a further aspect, an embodiment of the invention provides a method for compiling state information, comprising:
converting a regular expression rule into a state machine;
scanning each item of state information of the state machine and classifying the state information according to a preset state attribute threshold; and
storing each class of state information using a different storage mode, where the storage modes differ in space efficiency.
In yet another aspect, an embodiment of the invention provides a compilation tool for state information, comprising:
a conversion module, configured to convert a regular expression rule into a state machine;
a classification module, configured to scan each item of state information of the state machine produced by the conversion module and classify the state information according to a preset state attribute threshold; and
a storage module, configured to store each class of state information using a different storage mode, where the storage modes differ in space efficiency.
Compared with the prior art, the embodiments of the invention have the following advantages: the state information is classified, different classes of state information are stored in different ways, and the stored state information is used for string matching. This compresses the storage space of the state information as a whole so that it fits within the capacity limits of current storage devices, while maintaining high processing performance during string matching.
Brief description of the drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a character string matching method proposed by an embodiment of the invention;
Fig. 2 is a structural diagram of a character string matching device proposed by an embodiment of the invention;
Fig. 3 is a structural diagram of another character string matching device proposed by an embodiment of the invention;
Fig. 4 is a schematic workflow diagram of the character string matching device proposed by an embodiment of the invention;
Fig. 5 is a flowchart of the method for compiling state information according to an embodiment of the invention;
Fig. 6 is a structural diagram of the compilation tool for state information according to an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the invention.
Fig. 1 is a flowchart of a character string matching method proposed by an embodiment of the invention. The method comprises:
Step S101: classify state information according to a preset state attribute threshold.
The state attribute may be one or more of: the number of transition edges, the access frequency of the state information, and the distance between the current state and the initial state.
In one implementation, a transition edge threshold is set in advance, and the number of transition edges of each DFA state is compared with this threshold to divide the state information into non-compact state information and compact state information. For example, state information with fewer transition edges than the threshold can be classified as compact state information, the first class of state information; state information with more transition edges than the threshold is classified as non-compact state information, the second class of state information.
The embodiments are of course not limited to this; how the classification is done and how many classes there are do not affect the implementation of the embodiments.
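The classification in step S101 can be sketched as follows. The threshold value, the `classify` function and the dictionary layout are hypothetical. The text does not say how a state with exactly the threshold number of edges is treated; this sketch assumes it is non-compact.

```python
EDGE_THRESHOLD = 8  # hypothetical preset transition edge threshold

def classify(dfa, threshold=EDGE_THRESHOLD):
    """Split states into compact and non-compact by transition edge count.

    dfa maps a state number to a dict {input character: destination state}.
    Assumption: a state with exactly `threshold` edges counts as non-compact.
    """
    compact, non_compact = [], []
    for state in sorted(dfa):
        if len(dfa[state]) < threshold:
            compact.append(state)
        else:
            non_compact.append(state)
    return compact, non_compact

# Toy state machine: state 0 has 2 edges, state 1 has 10, state 2 has 8.
dfa = {
    0: {"a": 1, "b": 2},
    1: {d: 1 for d in "0123456789"},
    2: {c: 2 for c in "abcdefgh"},
}
compact, non_compact = classify(dfa)
print(compact, non_compact)
```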
Step S102: store each class of state information using a different storage mode, where the storage modes differ in space efficiency.
In this embodiment, compact state information, that is, the first class of state information, is stored in DRAM, with part of the compact state information also stored in BRAM; non-compact state information, that is, the second class of state information, is stored in SRAM. The storage mode used in DRAM and BRAM is more space-efficient than the storage mode used in SRAM.
Step S103: perform a matching operation on the received character string using the stored state information.
Performing the matching operation on the received character string using the stored state information may specifically be:
The state information corresponding to the expression rule to be matched is obtained according to that rule information. When the current character of the received character string matches a character of the current state information, the storage address of the destination state is determined from that character and the current state. The storage location of the destination state information is then determined from the storage address of the destination state, and the destination state information is read from that location to match the next character.
Reading the destination state information from the storage location to match the next character may specifically be:
when the destination state information is stored in on-chip memory, reading all information of the destination state from the on-chip memory to match the next character; or
when the destination state information is stored in off-chip memory, reading a single item of the destination state information from the off-chip memory to match the next character.
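The two read paths can be sketched with a toy state machine for the pattern x[0-9]*! (an example chosen here, not taken from the patent). The address split at `OFF_CHIP_BASE`, the dictionary layouts and the function names are all assumptions; real hardware would compare the on-chip edges in parallel rather than in a loop.

```python
OFF_CHIP_BASE = 1000  # assumption: addresses at or above this are off-chip

# On-chip (compact) states: address -> (match flag, list of (char, dest)).
on_chip = {
    0: (False, [("x", 1000)]),
    1: (True, []),
}
# Off-chip (non-compact) states: address -> (match flag, char-indexed edges).
off_chip = {
    1000: (False, {**{d: 1000 for d in "0123456789"}, "!": 1}),
}

def next_state(addr, ch):
    """Return the destination address for ch, or None if no edge matches."""
    if addr < OFF_CHIP_BASE:
        _, edges = on_chip[addr]        # one read returns every edge
        for edge_char, dest in edges:   # hardware would compare in parallel
            if edge_char == ch:
                return dest
        return None
    _, indexed = off_chip[addr]
    return indexed.get(ch)              # one read, indexed by the character

def is_match(addr):
    store = on_chip if addr < OFF_CHIP_BASE else off_chip
    return store[addr][0]

def matches(text):
    """Whole-string match starting from the state at address 0."""
    addr = 0
    for ch in text:
        addr = next_state(addr, ch)
        if addr is None:
            return False
    return is_match(addr)

print(matches("x42!"), matches("x4a!"))
```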
The above character string matching method classifies the state information, stores different classes of state information in different ways, and uses the stored state information for string matching. This compresses the storage space of the state information as a whole so that it fits within the capacity limits of current storage devices, while maintaining high processing performance during string matching.
Fig. 2 is a structural diagram of a character string matching device proposed by an embodiment of the invention. The device comprises:
Classification module 21, configured to classify state information according to a preset state attribute threshold.
Storage module 22, configured to store each class of state information classified by classification module 21 using a different storage mode, where the storage modes differ in space efficiency.
Matching module 23, configured to perform a matching operation on a received character string using the state information stored by storage module 22.
Matching module 23 may comprise:
State information acquisition submodule 231, configured to obtain, according to the expression rule information to be matched, the first class of state information corresponding to that rule;
Storage address determination submodule 232, configured to determine the storage address of the destination state from the current character and the current state of the state information when the current character of the received character string matches a character of the current state information;
Information reading submodule 233, configured to determine the storage location of the destination state information from the storage address determined by storage address determination submodule 232, and to read the destination state information from that location;
Character comparison submodule 234, configured to match the next character using the destination state information read by information reading submodule 233.
In the above device, classification module 21 classifies the state information and instructs storage module 22 to use different storage modes for different state information; matching module 23 can then use the state information stored by storage module 22 to match the received character string. This both compresses the storage space of the state information as a whole, so that it fits within the capacity limits of current storage devices, and improves the processing speed of string matching.
Fig. 3 is a structural diagram of another character string matching device proposed by an embodiment of the invention. This embodiment is described using a data packet as the example character string.
In this embodiment, a transition edge threshold is set in advance. Classification module 21 compares the number of transition edges of each DFA state with this threshold and divides the state information into non-compact state information and compact state information. For example, state information with fewer transition edges than the threshold can be classified as compact state information, and state information with more transition edges than the threshold as non-compact state information.
The embodiments are of course not limited to this; how the classification is done and how many classes there are do not affect the implementation of the embodiments.
Storage module 22 can be implemented on BRAM 31, DRAM 32 and SRAM 33. BRAM 31 stores part of the DFA-related data and comprises state machine cache module 311 and state machine information storage module 312. State machine cache module 311 holds the most recently accessed DFA information; it may also hold frequently accessed DFA information according to access frequency. On a cache hit, reading the DFA from DRAM is avoided, which improves the performance of the device.
State machine information storage module 312 holds the start storage address and size of each DFA in DRAM, which are used each time the compact states of a DFA need to be loaded.
DRAM 32 stores compact state information.
SRAM 33 stores non-compact state information.
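The role of the state machine cache can be sketched as a small cache in front of a slow store. The least-recently-used eviction policy, the capacity parameter and the class name are assumptions; the text only says that recently or frequently accessed DFA information is kept.

```python
from collections import OrderedDict

class StateMachineCache:
    """Toy cache of DFA compact states keyed by rule number (LRU, assumed)."""

    def __init__(self, capacity, dram):
        self.capacity = capacity
        self.dram = dram              # rule number -> compact state data
        self.entries = OrderedDict()  # cached DFAs, oldest first
        self.dram_reads = 0           # counts slow reads, for illustration

    def get(self, rule_no):
        if rule_no in self.entries:
            # Cache hit: no DRAM access is needed.
            self.entries.move_to_end(rule_no)
            return self.entries[rule_no]
        # Cache miss: load the whole DFA from DRAM into the cache.
        self.dram_reads += 1
        states = self.dram[rule_no]
        self.entries[rule_no] = states
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return states

dram = {1: "dfa-1", 2: "dfa-2", 3: "dfa-3"}
cache = StateMachineCache(capacity=2, dram=dram)
for rule in (1, 2, 1, 3, 1):
    cache.get(rule)
print(cache.dram_reads)
```

In this run rule 1 is requested three times but read from DRAM only once, which is the saving a cache hit provides.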
In this embodiment, matching module 23 can be implemented on engine 34. Engine 34 is the core and control module of the whole string matching device and comprises packet buffer module 341, comparator 342 and controller 343.
Packet buffer module 341 receives the data packets on which the DFA matching operation is to be performed, for subsequent character comparison.
Comparator 342 takes single characters from packet buffer module 341 and compares each with the characters on the transition edges of the DFA state to determine whether the state is a matching state, implementing the function of character comparison submodule 234; it also determines the storage address of the destination state reachable via the current character, implementing the function of storage address determination submodule 232.
Controller 343 mainly has the following functions:
(1) on receiving a new packet and the number of the rule to be matched, it looks up all compact states of the DFA corresponding to that rule in state machine cache module 311 or in DRAM 32 according to the rule number, and loads the compact states that are not in the state machine cache from DRAM into state machine cache module 311;
(2) during matching, according to the storage address of the destination state information produced by comparator 342, it obtains the transition edge information to be compared from the current state machine stored in state machine cache module 311 or from SRAM 33, implementing the function of information reading submodule 233;
(3) it sends control signals, such as read and write signals, to the control modules of peripheral modules (such as the Ethernet interface and off-chip memory).
BRAM 31 and engine 34 can be integrated in one programmable logic device, for example an FPGA; DRAM 32 and SRAM 33 can be implemented as off-chip memory.
With the string matching device provided by this embodiment, for example when a DFA-based regular expression matching device is implemented with a programmable logic device such as an FPGA, all DFAs are stored effectively and high-speed packet matching is still ensured.
The operation of a specific implementation is described in detail below, taking the number of transition edges as the state attribute.
A transition edge threshold is first set, and the number of transition edges of each DFA state is compared with this threshold to divide the state information into non-compact state information and compact state information. When the device is initialized, the compact state information is stored in DRAM 32 and the non-compact state information is stored in SRAM 33.
Suppose the transition edge threshold is set to 8. The number of embedded memory pool units in BRAM 31 can then be set to 8: the information of the state itself occupies one memory pool unit, and the remaining seven units store the transition edges of that state. For example, a state with 4 transition edges occupies 5 contiguous memory pool units. Starting from the memory pool unit that holds the state information, 8 contiguous units are sent to comparator 342 together and compared at the same time, and comparator 342 decides which transition edge is valid. For a state with 249 to 255 transition edges, the edge set can first be negated, which reduces the number of stored edges to between 1 and 7. For example, a regular expression such as [^abc] produces 253 transition edges; storing only the edges for the three characters a, b and c and marking them "valid when unequal" saves a large amount of storage space.
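The negated encoding for a state such as [^abc] can be sketched as below. The sketch assumes that all edges of the state lead to one shared destination, which is what makes the complement form possible here; the field names and the `pool_units` helper are hypothetical.

```python
ALPHABET_SIZE = 256  # assumption: input characters are single bytes

def encode_negated(edges):
    """Store the complement of the edge set plus a 'valid when unequal' flag.

    edges maps a character to its destination. This sketch assumes that
    every edge of the state leads to the same destination.
    """
    destinations = set(edges.values())
    if len(destinations) != 1:
        raise ValueError("negated form assumes a single shared destination")
    all_chars = {chr(code) for code in range(ALPHABET_SIZE)}
    excluded = sorted(all_chars - set(edges))
    return {"valid_when_unequal": True, "chars": excluded,
            "dest": destinations.pop()}

def lookup(encoded, ch):
    """Return the destination if the edge is valid for ch, else None."""
    hit = ch in encoded["chars"]
    if encoded["valid_when_unequal"]:
        hit = not hit
    return encoded["dest"] if hit else None

def pool_units(encoded):
    """One unit for the state itself plus one unit per stored edge."""
    return 1 + len(encoded["chars"])

# [^abc]: every byte except a, b and c leads to (hypothetical) state 7.
edges = {chr(code): 7 for code in range(ALPHABET_SIZE)
         if chr(code) not in "abc"}
encoded = encode_negated(edges)
print(len(edges), encoded["chars"], pool_units(encoded))
```

The 253 edges shrink to three stored characters, so the state fits in 4 memory pool units under the layout described above.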
When the transition edge information is stored, that is, when the compilation tool generates the DFA, the destination state number is replaced directly with the storage address of the destination state. This direct addressing simplifies the process by which the engine obtains state information.
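Replacing state numbers with storage addresses can be sketched as a two-pass linking step. The layout (one unit for the state plus one unit per edge, packed contiguously) follows the memory pool example in the text; the function names and the base address are assumptions.

```python
def assign_addresses(dfa, base=0):
    """Give each state a start address, packing states contiguously.

    dfa maps a state number to a list of (char, destination state number).
    Each state takes one unit for itself plus one unit per transition edge.
    """
    addresses = {}
    next_free = base
    for state in sorted(dfa):
        addresses[state] = next_free
        next_free += 1 + len(dfa[state])
    return addresses

def link(dfa, addresses):
    """Rewrite every edge so that it carries the destination's address."""
    return {
        addresses[state]: [(ch, addresses[dest]) for ch, dest in edges]
        for state, edges in dfa.items()
    }

# a*b again: state 0 loops on 'a' and moves to accepting state 1 on 'b'.
dfa = {0: [("a", 0), ("b", 1)], 1: []}
addresses = assign_addresses(dfa)
linked = link(dfa, addresses)
print(addresses, linked)
```

After linking, the engine can follow an edge with a single memory read, without a separate lookup from state number to address.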
Before a matching operation, engine 34 reads the information stored in DRAM 32 into BRAM 31 in one pass according to the rule number. During matching, engine 34 can read all transition edges of a compact state from BRAM 31 in a single access and compare them; for a non-compact state, engine 34 uses the specific character as an index to read the corresponding transition edge directly and compare it.
BRAM 31, DRAM 32 and SRAM 33 in Fig. 3 represent only one way of implementing the storage, and the embodiments are not limited to it. The embodiments do not restrict which storage devices are used, or whether a single kind of storage device stores both classes of states; using different storage devices does not affect the implementation of the embodiments. Likewise, using different devices (such as an ASIC or other storage devices), adding modules that implement other functions, or changing the names of the modules and the information exchanged between them does not affect the implementation of the embodiments.
The workflow of the above string matching device is shown in Fig. 4 and comprises:
Step S401: engine 34 is initialized. The non-compact state information of all DFAs is stored in SRAM 33, and the compact state information of all DFAs is stored in DRAM 32.
Step S402: engine 34 receives a data packet and the number of the rule to be matched through the network interface control module (omitted in Fig. 3), stores the data packet in packet buffer module 341, and passes the rule number to controller 343.
Step S403: controller 343 queries state machine information storage module 312 with the rule number and obtains the start address and size in DRAM 32 of the compact state information of the DFA corresponding to that rule number.
Step S404: controller 343 checks whether the state information at the start address is in state machine cache module 311. If it is, step S406 is executed; if it is not, step S405 is executed.
Step S405: controller 343 uses the start address and size of the compact state information in DRAM 32 to locate the compact state information of the DFA in DRAM 32, loads all compact state information of this DFA into state machine cache module 311, and executes step S406.
Step S406: controller 343 sets the start address as the destination address, reads the initial state information from state machine cache module 311 and passes it to comparator 342 as the current state. At the same time, controller 343 notifies comparator 342 to fetch the data packet from packet buffer module 341 and prepare for comparison. Step S407 is executed.
Step S407: comparator 342 checks the match flag of the current state. If it is a matching state, the comparison result is set to matched and the flow jumps to step S413; otherwise step S408 is executed.
Step S408: comparator 342 checks whether the current character of the data packet matches a character in the transition edge information of the current state.
If the current state information comes from state machine cache module 311, comparator 342 compares the current character of the data packet with the characters of all transition edges of that state at the same time; if the current state information comes from SRAM 33, comparator 342 compares the current character of the data packet with the character of the transition edge that was read. If the current character of the data packet matches no transition edge character, the comparison result is set to not matched and the flow jumps to step S413. If it does match, the storage address of the destination state is determined from the character in that transition edge and the current state, the storage address is sent to controller 343, and step S409 is executed.
Step S409: controller 343 determines from the storage address of the destination state whether the destination state information is stored in state machine cache module 311. If it is stored in state machine cache module 311, step S410 is executed; if it is stored in SRAM 33, step S411 is executed.
Step S410: controller 343 reads the match flag and all transition edge information of the destination state from state machine cache module 311 in a single access as the current state, and executes step S412.
Step S411: controller 343 reads the match flag of the destination state and the one transition edge corresponding to the current character from SRAM 33 as the current state, and executes step S412.
Step S412: if the current character is the last character of the data packet, the comparison result is set to not matched and the flow jumps to step S413; otherwise the next character of the data packet becomes the current character and the flow jumps to step S407.
Step S413: the flow ends and the comparison result is output. The result is either that the received data packet matches the rule corresponding to the input rule number, or that it does not.
The above steps are only one embodiment illustrating the feasibility of string matching with classified state information. In a specific implementation the steps may be adjusted, for example by adding or deleting steps or by changing the work done in some steps.
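The flow of steps S403 to S413 can be sketched in software as follows, using a toy rule x[0-9]*! that is not taken from the patent. All names, addresses and data layouts are assumptions. One deliberate deviation is labeled in the code: the sketch checks the match flag of a newly loaded state before testing for the end of the packet, whereas the step list as written tests for the last character first.

```python
SRAM_BASE = 1000                # assumption: addresses >= 1000 live in SRAM
stats = {"dram_loads": 0}       # counts DRAM loads, for illustration only

# DRAM: rule number -> compact states {address: (match flag, [(char, dest)])}.
dram = {7: {0: (False, [("x", 1000)]), 1: (True, [])}}
# SRAM: non-compact states {address: (match flag, {char: dest})}.
sram = {1000: (False, {**{d: 1000 for d in "0123456789"}, "!": 1})}
# State machine information storage: rule number -> start address.
rule_info = {7: 0}
# State machine cache: address -> compact state.
cache = {}

def load_rule(rule_no):
    """S403 to S405: find the start address, load compact states on a miss."""
    start = rule_info[rule_no]
    if start not in cache:
        cache.update(dram[rule_no])
        stats["dram_loads"] += 1
    return start

def read_state(addr):
    """S409 to S411: read the state from the cache or from SRAM."""
    return sram[addr] if addr >= SRAM_BASE else cache[addr]

def find_edge(edges, ch):
    """S408: compact states scan all edges; non-compact states index by ch."""
    if isinstance(edges, dict):
        return edges.get(ch)
    for edge_char, dest in edges:
        if edge_char == ch:
            return dest
    return None

def match(rule_no, packet):
    addr = load_rule(rule_no)                  # S403 to S406
    matched, edges = read_state(addr)
    if matched:                                # S407 on the initial state
        return True
    for ch in packet:
        dest = find_edge(edges, ch)            # S408
        if dest is None:
            return False                       # not matched, go to S413
        matched, edges = read_state(dest)      # S409 to S411
        if matched:                            # S407 (assumed order, see above)
            return True
    return False                               # S412: packet exhausted

print(match(7, "x12!"), match(7, "x12"))
```

Matching stops as soon as a matching state is reached, so a packet such as "x!tail" also counts as matched in this sketch.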
Above-mentioned character string matching method and device, state to state machine carries out statistical observation, adopt different storage meanss at different states, both compressed whole message package space, enable to satisfy the capacity limit of current storage part, guaranteed that again whole device can have the high processing performance when carrying out the message matching operation.
As shown in Figure 5, the flowchart of the method for compiling state information according to an embodiment of the invention comprises:
Step S501: converting the regular expression rules into a state machine.
Step S502: scanning each item of state information of the state machine and classifying the state information according to a preset state attribute threshold.
Step S503: storing each class of classified state information using a different storage mode, the different storage modes having different space efficiencies.
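Steps S502 and S503 can be illustrated with a short Python sketch. It assumes that step S501 has already produced a state machine as a dictionary, that the classifying attribute is the number of transition edges, and that states at or below the threshold are the compact class; the threshold value, the function names, and both storage layouts are hypothetical choices for the example.

```python
EDGE_THRESHOLD = 8  # hypothetical preset state attribute threshold

def classify(states, threshold=EDGE_THRESHOLD):
    """S502: split state ids into compact (few edges) and dense classes."""
    compact, dense = [], []
    for sid, (_is_match, edges) in states.items():
        (compact if len(edges) <= threshold else dense).append(sid)
    return compact, dense

def store(states, compact, dense):
    """S503: adjacency lists for compact states, full tables for dense ones."""
    adjacency = {
        sid: (states[sid][0], sorted(states[sid][1].items())) for sid in compact
    }
    tables = {}
    for sid in dense:
        is_match, edges = states[sid]
        table = [None] * 256              # one slot per possible byte value
        for byte, dest in edges.items():
            table[byte] = dest
        tables[sid] = (is_match, table)
    return adjacency, tables

# State machine for the pattern "a.b": id -> (match flag, {byte: destination}).
states = {
    0: (False, {ord("a"): 1}),
    1: (False, {b: 2 for b in range(256)}),   # "." accepts every byte
    2: (False, {ord("b"): 3}),
    3: (True, {}),
}
compact, dense = classify(states)
adjacency, tables = store(states, compact, dense)
print(compact, dense)
```

Here only state 1 exceeds the threshold, so it alone pays for a 256-entry table; the other three states are stored as short edge lists.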
As shown in Figure 6, the structure of the tool for compiling state information according to an embodiment of the invention comprises:
A conversion module 61, configured to convert the regular expression rules into a state machine;
A classification module 62, configured to scan each item of state information of the state machine produced by the conversion module 61 and to classify the state information according to a preset state attribute threshold;
A storage module 63, configured to store each class of classified state information using a different storage mode, the different storage modes having different space efficiencies.
The embodiment of the invention uses the number of transition edges as the state attribute for distinguishing states. In a practical implementation, other state attributes, such as the access frequency of a state or the distance between a state and the start state, can also serve as the basis for distinguishing states. Multiple state attributes can also be combined as the basis. Using different state attributes, or a different number of state attributes, does not affect the implementation of the embodiment.
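As an illustration of combining several state attributes, the following Python sketch classifies states by edge count, access frequency, and distance from the start state. The combination rule (few edges, and either frequently accessed or close to the start state), the thresholds, and all names are assumptions made for the example; the source does not prescribe how attributes are combined.

```python
from collections import deque

def distances_from_start(edges, start):
    """Breadth-first distance (in transitions) of each reachable state."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for dest in edges.get(state, {}).values():
            if dest not in dist:
                dist[dest] = dist[state] + 1
                queue.append(dest)
    return dist

def classify(edges, start, hits, max_edges=8, min_hits=100, max_depth=1):
    """Split states into two classes using three attributes together."""
    dist = distances_from_start(edges, start)
    first, second = set(), set()
    for state, out in edges.items():
        few_edges = len(out) <= max_edges
        hot = hits.get(state, 0) >= min_hits            # access frequency
        near = dist.get(state, float("inf")) <= max_depth
        (first if few_edges and (hot or near) else second).add(state)
    return first, second

# Hypothetical chain 0 -a-> 1 -b-> 2 -c-> 3 with measured access counts.
edges = {0: {97: 1}, 1: {98: 2}, 2: {99: 3}, 3: {}}
hits = {3: 500}
first, second = classify(edges, 0, hits)
print(sorted(first), sorted(second))
```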
In practical applications, a state contains not only the transition edge attribute field but also other attribute fields, such as a flag field indicating whether the state is a matching state. This information is required for a complete state and may differ between applications. Because it is unrelated to the state attributes that the invention uses for classification, it is omitted from the description.
The embodiment of the invention divides states into two classes according to the number of transition edges, but is not limited to this; a different number of classes does not affect the implementation of the embodiment.
The embodiment of the invention stores the information of different states using different storage modes. Compact state information is stored using a more space-efficient storage mode in order to save storage space. In the embodiment, compact state information can be stored as an adjacency list. Other compression methods can also be used to store compact state information, such as replacing the character classes in a regular expression with specific coded characters, or splitting a state machine that compares characters into multiple state machines that compare bits or combinations of bits, to reduce the required storage space. Differences in storage format caused by using other storage methods do not affect the implementation of the embodiment.
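The space saving from an adjacency list can be estimated with simple arithmetic, sketched here in Python. The sizes are assumptions chosen for illustration (a 256-character alphabet, 4-byte destination addresses, 1-byte characters) and are not figures from the embodiment.

```python
import math

def table_bytes(addr_bytes=4, alphabet=256):
    """Full transition table: one destination address per possible character."""
    return alphabet * addr_bytes

def adjacency_bytes(num_edges, addr_bytes=4, char_bytes=1):
    """Adjacency list: one (character, destination address) pair per edge."""
    return num_edges * (char_bytes + addr_bytes)

def break_even_edges(addr_bytes=4, char_bytes=1, alphabet=256):
    """Smallest edge count at which the list is no smaller than the table."""
    return math.ceil(table_bytes(addr_bytes, alphabet) / (char_bytes + addr_bytes))

print(table_bytes())        # bytes for any state stored as a full table
print(adjacency_bytes(3))   # bytes for a state with three edges
print(break_even_edges())
```

Under these assumptions a state with three transition edges needs 15 bytes as an adjacency list against 1024 bytes as a full table, and the list stops paying off only at about 205 edges, which suggests why a threshold on the number of transition edges is a natural classifier.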
The string matching method proposed by the embodiment of the invention is applicable to high-speed matching of data packets in a network environment. In other scenarios that use state machines, such as search engines, database retrieval, and natural language understanding, the state machine generally does not compare characters, but its states can likewise be divided into a "current state" and a "destination state", and a transition from the "current state" to the "destination state" likewise occurs when a certain condition is satisfied. The technical solution of the embodiment is therefore equally applicable to those scenarios.
From the above description of the embodiments, those skilled in the art can clearly understand that the invention can be implemented in hardware, or in software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the invention can be embodied as a software product. The software product can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) and includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the embodiments of the invention.
Those skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required to implement the invention.
Those skilled in the art will appreciate that the modules of the device in an embodiment can be distributed in the device as described in the embodiment, or can be changed accordingly and located in one or more devices different from that of the embodiment. The modules of the above embodiments can be merged into one module or further split into multiple submodules.
The sequence numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
What is disclosed above is only several specific embodiments of the invention. The invention is not limited to them, however, and any variation that those skilled in the art can conceive of shall fall within the protection scope of the invention.

Claims (9)

1. A string matching method, characterized by comprising:
Classifying state information according to a preset state attribute threshold;
Storing each class of classified state information using a different storage mode, the different storage modes having different space efficiencies;
Performing a matching operation on a received string using the stored state information.
2. The string matching method according to claim 1, characterized in that performing the matching operation on the received string using the stored state information comprises:
Obtaining, according to the expression rule information for the matching operation, the state information corresponding to the expression rule information;
When the current character of the string matches the current character of the state information, determining the storage address of a destination state according to the current character and the current state of the state information;
Determining the storage location of the destination state information according to the storage address of the destination state, and reading the destination state information according to the storage location to match the next character.
3. The string matching method according to claim 2, characterized in that reading the destination state information according to the storage location to match the next character specifically comprises:
When the destination state information is stored in on-chip memory, reading all information of the destination state information from the on-chip memory to match the next character; or,
When the destination state information is stored in off-chip memory, reading one item of information of the destination state information from the off-chip memory to match the next character.
4. The string matching method according to claim 1, characterized in that the state attribute is specifically one or more of: the number of transition edges, the access frequency of the state information, and the distance between the current state and the start state.
5. A string matching device, characterized by comprising:
A classification module, configured to classify state information according to a preset state attribute threshold;
A storage module, configured to store each class of state information classified by the classification module using a different storage mode, the different storage modes having different space efficiencies;
A matching module, configured to perform a matching operation on a received string using the state information stored by the storage module.
6. The string matching device according to claim 5, characterized in that the matching module comprises:
A state information obtaining submodule, configured to obtain, according to the expression rule information for the matching operation, the first-type state information corresponding to the expression rule information;
A storage address determining submodule, configured to determine, when the current character of the received string matches the current character of the state information, the storage address of a destination state according to the current character and the current state of the state information;
An information reading submodule, configured to determine the storage location of the destination state information according to the storage address of the destination state determined by the storage address determining submodule, and to read the destination state information according to the storage location;
A character comparison submodule, configured to match the next character according to the destination state information read by the information reading submodule.
7. A method for compiling state information, characterized by comprising:
Converting regular expression rules into a state machine;
Scanning each item of state information of the state machine and classifying the state information according to a preset state attribute threshold;
Storing each class of classified state information using a different storage mode, the different storage modes having different space efficiencies.
8. The method for compiling state information according to claim 7, characterized in that the state attribute is specifically one or more of: the number of transition edges, the access frequency of the state information, and the distance between the current state and the start state.
9. A tool for compiling state information, characterized by comprising:
A conversion module, configured to convert regular expression rules into a state machine;
A classification module, configured to scan each item of state information of the state machine produced by the conversion module and to classify the state information according to a preset state attribute threshold;
A storage module, configured to store each class of classified state information using a different storage mode, the different storage modes having different space efficiencies.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810147432A CN101650718A (en) 2008-08-15 2008-08-15 Method and device for matching character strings

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810147432A CN101650718A (en) 2008-08-15 2008-08-15 Method and device for matching character strings

Publications (1)

Publication Number Publication Date
CN101650718A true CN101650718A (en) 2010-02-17

Family

ID=41672957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810147432A Pending CN101650718A (en) 2008-08-15 2008-08-15 Method and device for matching character strings

Country Status (1)

Country Link
CN (1) CN101650718A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544142A (en) * 2012-07-17 2014-01-29 安凯(广州)微电子技术有限公司 State machine
US10466964B2 (en) 2013-08-30 2019-11-05 Cavium, Llc Engine architecture for processing finite automata
CN103685280B (en) * 2013-12-18 2017-04-26 华为技术有限公司 Message matching method, state machine compiling method and equipment
CN104980418A (en) * 2014-04-14 2015-10-14 凯为公司 Compilation Of Finite Automata Based On Memory Hierarchy
US10002326B2 (en) 2014-04-14 2018-06-19 Cavium, Inc. Compilation of finite automata based on memory hierarchy
US10110558B2 (en) 2014-04-14 2018-10-23 Cavium, Inc. Processing of finite automata based on memory hierarchy
CN104980418B (en) * 2014-04-14 2018-11-13 凯为公司 The compiling of finite automata based on memory hierarchy
CN106156061A (en) * 2015-03-30 2016-11-23 北大方正集团有限公司 A kind of method and device improving efficiency data query
CN106445472A (en) * 2016-08-16 2017-02-22 中国科学院计算技术研究所 Character operation acceleration method and apparatus, chip, and processor
CN106445472B (en) * 2016-08-16 2019-01-11 中国科学院计算技术研究所 A kind of character manipulation accelerated method, device, chip, processor
CN110758975A (en) * 2019-01-24 2020-02-07 中船第九设计研究院工程有限公司 Mathematical modeling method for steel plate inventory information of intelligent steel stock yard of shipyard

Similar Documents

Publication Publication Date Title
CN101650718A (en) Method and device for matching character strings
CN102648468B (en) Table search device, table search method, and table search system
US6775737B1 (en) Method and apparatus for allocating and using range identifiers as input values to content-addressable memories
CN103238145A (en) Method and apparatus for high performance, updatable, and deterministic hash table for network equipment
US20070100919A1 (en) Garbage collection unit and method thereof
CN101546342A (en) Method and system for implementing search service
CN101155182A (en) Garbage information filtering method and apparatus based on network
CN103392169B (en) Sort method and system
CN102754394B (en) Method for hash table storage, method for hash table lookup, and devices thereof
CN106708956B (en) A kind of HTTP data matching method based on more URL rule sets
CN108446399B (en) Dynamic storage optimization method for structured massive real-time data
CN109165096B (en) Cache utilization system and method for web cluster
CN108932271A (en) A kind of file management method and device
CN114943287A (en) Computer big data acquisition and processing system, method, equipment and medium
CN100397816C (en) Method for classifying received data pocket in network apparatus
CN106484815B (en) A kind of automatic identification optimization method based on mass data class SQL retrieval scene
CN101645062B (en) Report form generation method and system
CN104317955B (en) File scanning method and device in a kind of mobile terminal memory space
CN111752941A (en) Data storage method, data access method, data storage device, data access device, server and storage medium
CN107436848B (en) Method and device for realizing conversion between user data and compressed data
CN110471764A (en) A kind of processing method and processing device of memory cleaning
CN110442696A (en) Inquiry processing method and device
CN105791124B (en) Message detecting method and device
CN113382075A (en) Enterprise information management platform, management method, electronic device and storage medium
CN107295485A (en) Multimedia message accessory management method, device and communication system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20100217