CN103577598B - Matching method and device for pattern string and text string - Google Patents
Matching method and device for pattern string and text string
- Publication number: CN103577598B
- Authority: CN (China)
- Prior art keywords: string, text string, pattern, array, displacement value
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
Abstract
The invention discloses a method and a device for matching a pattern string against a text string. The method comprises matching the pattern string against the text string and, when the pattern string and the text string mismatch, determining the skip mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays constructed in advance from the pattern string. Because the skip mode after a character mismatch is determined from the multiple preprocessing arrays constructed in advance, the pattern string is guaranteed not to backtrack after the mismatch while still skipping by the maximum positive displacement value, which improves pattern matching speed and the security of network management.
Description
Technical field
The present invention relates to the field of computers, and in particular to a method and a device for matching a pattern string against a text string.
Background art
Pattern matching algorithms are core algorithms of network security. Their task is to find a specified character string (the pattern string) in a text string (also called the target string) and to return the position at which the pattern string occurs in the text string. The efficiency of the pattern matching algorithm directly affects the efficiency of network security management. The original brute-force search proceeds by exhaustion; its drawback is that matching is slow and time-consuming. The classic single-pattern matching algorithms that appeared afterwards include the Knuth-Morris-Pratt string search algorithm (hereinafter KMP), the Boyer-Moore string search algorithm (hereinafter BM), and optimized variants of BM such as Boyer-Moore-Horspool (BMH), Boyer-Moore-Horspool-Sunday (BMHS) and Boyer-Moore-Horspool-two-Chinese (hereinafter BMH2C). These classic algorithms mainly rely on jumping after a character mismatch to increase matching speed.
The complexity of the KMP algorithm is O(n), and it guarantees that the pattern string does not backtrack after a character mismatch, so KMP performs well in the worst case (i.e., when the pattern string is located at the end of the text string). Although the jump distance (also called the displacement value) of KMP is always positive, it is small, so the average performance of KMP is lower than that of the BM algorithm. The BM algorithm has a larger jump distance and better average performance, but in some cases it yields a negative displacement value, i.e., it cannot guarantee that the text string does not backtrack after a character mismatch. In the worst and near-worst cases (i.e., when the pattern string is located at or near the end of the text string), BM performs poorly, and its complexity is O(mn).
For the problem in the related art that, after a mismatch, single-pattern matching algorithms make only small jumps in the text string or backtrack, resulting in low matching efficiency, no effective solution has yet been proposed.
Summary of the invention
In view of the problem in the related art that, after a mismatch, single-pattern matching algorithms make only small jumps in the text string or backtrack, resulting in low matching efficiency, the present invention proposes a method for matching a pattern string against a text string. The method ensures that the text string does not backtrack after a character of the pattern string mismatches the text string while still jumping the text string by the maximum positive displacement value, which improves pattern matching speed and the security of network management.
The technical solution of the present invention is implemented as follows:
According to one aspect of the present invention, a method for matching a pattern string against a text string is provided.
The matching method includes: matching the pattern string against the text string;
when the pattern string and the text string mismatch, determining the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays constructed in advance from the pattern string.
Preferably, the multiple preprocessing arrays include a Skip array and/or a Next array.
Moreover, the jump mode of the text string is preferentially determined according to the displacement value obtained from the Skip array.
Optionally, determining the jump mode of the text string according to the maximum positive displacement value obtained from the multiple preprocessing arrays constructed in advance from the pattern string includes:
when the Skip array yields a positive displacement value, determining the jump mode of the text string according to that positive displacement value;
when the Skip array yields a negative displacement value, obtaining a positive displacement value from the Next array and determining the jump mode of the text string according to that positive displacement value.
Further, before the pattern string is matched against the text string, the pattern string and the text string are left-aligned.
Matching the pattern string against the text string includes: when the match succeeds, returning the position in the text string of the characters identical to the pattern string.
According to another aspect of the present invention, a device for matching a pattern string against a text string is provided.
The matching device includes:
a matching module, configured to match the pattern string against the text string;
a determining module, configured to, when the pattern string and the text string mismatch, determine the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays constructed in advance from the pattern string.
Preferably, the multiple preprocessing arrays include a Skip array and/or a Next array.
Moreover, the determining module is further configured to preferentially determine the jump mode of the text string according to the displacement value obtained from the Skip array.
Optionally, the determining module is further configured to:
when the Skip array yields a positive displacement value, determine the jump mode of the text string according to that positive displacement value;
when the Skip array yields a negative displacement value, obtain a positive displacement value from the Next array and determine the jump mode of the text string according to that positive displacement value.
In the present invention, after a character of the pattern string mismatches the text string, the jump mode of the text string is determined using the multiple preprocessing arrays constructed in advance. This ensures that the text string does not backtrack after the mismatch while still jumping by the maximum positive displacement value, which improves pattern matching speed and the security of network management.
Brief description of the drawings
Fig. 1 is a flow chart of the method for matching a pattern string against a text string according to an embodiment of the present invention;
Fig. 2 is a flow chart of the BMH2C algorithm in the prior art;
Fig. 3 is a flow chart of a matching algorithm according to an embodiment of the present invention;
Fig. 4 is a block diagram of the device for matching a pattern string against a text string according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention fall within the scope of protection of the present invention.
According to an embodiment of the present invention, a method for matching a pattern string against a text string is provided.
As shown in Fig. 1, the method for matching a pattern string against a text string according to an embodiment of the present invention includes:
Step S101: the pattern string is matched against the text string. Further, before the matching, the pattern string and the text string are left-aligned; when the match succeeds, the position in the text string of the characters identical to the pattern string is returned;
Step S103: when the pattern string and the text string mismatch, the jump mode of the text string is determined according to the maximum positive displacement value obtained from multiple preprocessing arrays constructed in advance from the pattern string. This avoids the defect in the prior art that, on a mismatch, the displacement value is obtained from a single preprocessing array only and may therefore be negative or small. The text string is thus guaranteed not to backtrack after a character mismatch while still jumping by the maximum positive displacement value, which improves pattern matching speed and the security of network management.
Preferably, the multiple preprocessing arrays include a Skip array and/or a Next array, where the Skip array may be the preprocessing array of the BMH2C algorithm and the Next array may be the preprocessing array of the KMP algorithm. The KMP algorithm and the BMH2C algorithm are briefly introduced and compared below. Here, the text string S is the text to be searched, the pattern string P is the character string to be found in the text, i is the cursor (also called the subscript) of the text string, j is the cursor of the pattern string, n is the length of the text string, and m is the length of the pattern string.
KMP algorithm: to determine the position of j for the next comparison after a character mismatch, the KMP algorithm introduces a preprocessing array next[]. The value of next[j] is the length of the longest proper prefix of P[0..j-1] that is also a suffix of it (the longest "suffix equals prefix").
The next[] array is defined as follows:
(Formula 1) next[j] = -1 when j = 0, i.e., the preprocessing value of the first character of any character string is defined as -1;
(Formula 2) next[j] = k_max, the largest k with 0 < k < j such that P[0..k-1] = P[j-k..j-1];
(Formula 3) next[j] = 0 in all other cases.
As an example, Table 1 shows the next[j] array for a concrete pattern string:
Table 1
P | a | b | a | b | a |
j | 0 | 1 | 2 | 3 | 4 |
next[j] | -1 | 0 | 0 | 1 | 2 |
For j=0, Formula 1 applies, so next[0] = -1;
For j=1 and j=2, Formula 3 applies, so next[j] = 0;
For j=3, Formula 2 applies: before j=3, only P[2] is identical to P[0], so the longest match has length 1 and next[3] = 1;
For j=4, Formula 2 applies: before j=4, P[2] is identical to P[0] and P[3] is identical to P[1], so the longest match has length 2 and next[4] = 2.
That is, for j=3 and j=4, next[j] = k > 0 means that P[0..k-1] = P[j-k..j-1].
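The definition of the next[] array can be sketched in Python as follows. This is a minimal illustration and not code from the patent: the function name `build_next` is an assumption introduced here, and the standard linear-time KMP construction is used instead of evaluating Formulas 1 to 3 literally (both give the same values).

```python
def build_next(pattern: str) -> list[int]:
    """Build the KMP next[] array with next[0] = -1 (hypothetical helper)."""
    m = len(pattern)
    nxt = [0] * m
    if m == 0:
        return nxt
    nxt[0] = -1          # Formula 1
    j, k = 0, -1         # k is the current "suffix equals prefix" length
    while j < m - 1:
        if k == -1 or pattern[j] == pattern[k]:
            j += 1
            k += 1
            nxt[j] = k   # Formula 2 (or Formula 3 when k == 0)
        else:
            k = nxt[k]   # fall back to the next shorter candidate
    return nxt


print(build_next("ababa"))  # the pattern string of Table 1
```

For the pattern string of Table 1 this reproduces the row next[j] = -1, 0, 0, 1, 2.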
The idea of the KMP algorithm: when a mismatch occurs during matching, if next[j] >= 0, the cursor i of the text string stays unchanged and the cursor j of the pattern string moves to position next[j] to continue matching; if next[j] = -1, i moves right by 1 and j is set to 0, and the comparison continues. The complexity is O(n).
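The matching loop just described can be sketched as below. The function name `kmp_search` and the convention of returning -1 when nothing is found are assumptions made for illustration; the next[] array is passed in as an argument, and the example uses the values of Table 1.

```python
def kmp_search(text: str, pattern: str, nxt: list[int]) -> int:
    """Return the first position of pattern in text, or -1 (hypothetical helper)."""
    n, m = len(text), len(pattern)
    i = j = 0
    while i < n and j < m:
        if text[i] == pattern[j]:
            i += 1           # characters match: advance both cursors
            j += 1
        elif nxt[j] >= 0:
            j = nxt[j]       # mismatch: i stays, j moves to next[j]
        else:
            i += 1           # next[j] == -1: move i right by 1, reset j
            j = 0
    return i - m if j == m else -1


table1_next = [-1, 0, 0, 1, 2]          # next[] for "ababa" (Table 1)
print(kmp_search("abacababa", "ababa", table1_next))
```

In this run the text cursor i never moves backwards; only j is reassigned after a mismatch.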
BMH2C algorithm: in the BMH2C algorithm, the text character aligned with the last character of the pattern string and the text character immediately after it are treated as a substring (i.e., a substring of two characters). If this substring occurs in the pattern string, the pattern string is moved right so that the rightmost occurrence of the substring in the pattern string is aligned with it; otherwise, the pattern string skips the substring entirely, i.e., the shift to the right is m+1.
Since the shift is determined by two characters, the offset array can be represented as a two-dimensional array skip[char1][char2]. As shown in Fig. 2, the flow of the BMH2C algorithm is as follows:
Step S201: when the array is initialized, every value of the two-dimensional array is set to m+1, i.e., skip[i][j] = m+1;
Step S203: the values of the skip array are then corrected during initialization by setting skip[i][P[0]] to m, i.e., skip[i][P[0]] = m. This covers a special case: when the second character S[i+1] of the substring S[i]S[i+1] is identical to the first character P[0] of the pattern, a shift of m+1 could miss a match even though the substring S[i]S[i+1] does not occur in the pattern, so the shift should only be m, which aligns the first character P[0] of the pattern with the second character S[i+1] of the substring;
Step S205: the third initialization step sets the corresponding shift for every substring that occurs in the pattern: skip[P[k]][P[k+1]] = m-k-1.
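Steps S201 to S205 can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: the names `build_skip` and `bmh2c_search` are hypothetical, a sparse dictionary plus two default rules stands in for the full two-dimensional array, and the window is compared with a plain slice comparison because the source does not say in which order BMH2C compares characters.

```python
def build_skip(pattern: str):
    """Return a lookup skip(c1, c2) equivalent to the 2-D skip array."""
    m = len(pattern)
    pairs = {}
    # Step S205: pairs occurring in the pattern. Later (more rightward)
    # occurrences overwrite earlier ones, leaving the smallest shift.
    for k in range(m - 1):
        pairs[(pattern[k], pattern[k + 1])] = m - k - 1

    def skip(c1: str, c2: str) -> int:
        if (c1, c2) in pairs:
            return pairs[(c1, c2)]
        # Step S203: second character equals P[0], so shift by m.
        # Step S201: otherwise the default shift is m + 1.
        return m if c2 == pattern[0] else m + 1

    return skip


def bmh2c_search(text: str, pattern: str) -> int:
    """Return the first position of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    skip = build_skip(pattern)
    start = 0                                  # left edge of the window
    while start + m <= n:
        if text[start:start + m] == pattern:
            return start
        if start + m == n:                     # no character after the window
            break
        # Pair: text character under P[m-1] and the character after it.
        start += skip(text[start + m - 1], text[start + m])
    return -1


print(bmh2c_search("xxxxabc", "abc"))
```

In the example the first window "xxx" fails, the pair "xx" does not occur in the pattern and does not end in P[0], so the window jumps by m+1 = 4 positions at once.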
On a mismatch, the jump mode of the text string is preferentially determined according to the displacement value obtained from the Skip array. When the Skip array yields a positive displacement value, the jump mode of the text string is determined according to that positive displacement value; when the Skip array yields a negative displacement value, a positive displacement value is obtained from the Next array, and the jump mode of the text string is determined according to that positive displacement value.
That is, the displacement value of the Next array is the effective part of the KMP algorithm, which reduces the complexity of pattern matching, and the displacement value of the Skip array is the effective part of the BMH2C algorithm, which provides large jumps. Combining the effective parts of the KMP and BMH2C algorithms retains the advantages of both: faster pattern matching in the general case and lower complexity in the poor cases. This is the core of the present invention.
According to an embodiment of the present invention, a matching algorithm is provided. Fig. 3 is a flow chart of the matching algorithm according to the embodiment of the present invention.
Step S301: start; the task is initialized, and the Skip array and the Next array are built from the pattern string;
Step S303: the text string S and the pattern string P are left-aligned;
Step S305: i is defined as the cursor of the text string S and j as the cursor of the pattern string P, so that step S303 can be expressed as aligning position zero of i in S with position zero of j in P;
Step S307: it is judged whether the cursor i of the text string S is less than slen (i.e., the length of the text string S). If i is less than slen, the loop continues and step S309 is executed; if i is not less than slen, step S321 is executed;
Step S309: it is judged whether the character at the current cursor i of the text string S is identical to the character at the current cursor j of the pattern string P, i.e., whether S[i] = P[j]. If they are equal, matching continues and step S311 is executed; if they differ, a mismatch has occurred and step S313 is executed;
Step S311: the cursors of the pattern string and the text string each advance by one character position in preparation for the next comparison;
Step S313: after a mismatch, the displacement value is computed from skip, i.e., the skip array is queried and the displacement value it yields is returned;
Step S315: the sign of the displacement value returned by the skip array is judged. If the displacement is positive, step S319 is executed; if it is negative, step S317 is executed;
Step S317: the displacement value returned by the skip array is negative, so i is forced not to backtrack and the next array is queried instead: the cursor i of the text string stays unchanged, and the cursor j of the pattern string is set to the value returned by the next array, i.e., j = next[j];
Step S319: the displacement value returned by the skip array is positive, so the jump is taken: the cursor i of the text string advances by the displacement value, and the cursor j is reset, i.e., the pattern string is reset to its first character;
Step S321: it is judged whether the value of j equals plen (the length of the pattern string P), i.e., whether matching is complete and the whole pattern string has been covered. If so, step S323 is executed;
Step S323: pattern matching succeeds, and the position of the string is returned;
Step S325: failure; no match was found, and an invalid value is returned;
Step S327: the task terminates.
The matching algorithm according to the embodiment of the present invention combines the respective advantages of the KMP and BMH2C algorithms. The skip and next arrays are built during initialization. During matching, the text string S and the pattern string P are first left-aligned and characters are then compared one by one; on a match, the matching position in the text string is returned. After a mismatch, the skip array is looked up to obtain a displacement, and the sign of the displacement value is judged: if the displacement is positive, the text string jumps by this positive displacement value; if it is negative, the next array is queried to obtain a new displacement. This continues until the match succeeds or the end of the text string is reached. Because the displacement of the next array is always positive, the algorithm is guaranteed never to backtrack while jumping by as large a displacement as possible, and its complexity is O(n).
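One possible reading of the flow of Fig. 3 is sketched below. Several details are assumptions made for this sketch because the source does not spell them out: the skip value is looked up for the two text characters at the end of the current window and is measured from the window start, so that relative to the cursor i the displacement is skip - j (which is how it can become non-positive); a displacement of zero is handled like a negative one; and the search stops once no character follows the window. All function names are hypothetical, and compact versions of the two array builders are repeated so that the block runs on its own.

```python
def build_next(pattern):
    # KMP next[] array with next[0] = -1 (pattern must be non-empty).
    nxt = [0] * len(pattern)
    nxt[0] = -1
    j, k = 0, -1
    while j < len(pattern) - 1:
        if k == -1 or pattern[j] == pattern[k]:
            j, k = j + 1, k + 1
            nxt[j] = k
        else:
            k = nxt[k]
    return nxt


def build_skip(pattern):
    # BMH2C skip table as a sparse lookup; defaults are m (second
    # character equals P[0]) and m + 1 (everything else).
    m = len(pattern)
    pairs = {(pattern[k], pattern[k + 1]): m - k - 1 for k in range(m - 1)}
    return lambda c1, c2: pairs.get((c1, c2), m if c2 == pattern[0] else m + 1)


def hybrid_search(text, pattern):
    """Return the first position of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    nxt, skip = build_next(pattern), build_skip(pattern)   # S301
    i = j = 0                                              # S303, S305
    while i < n and j < m:                                 # S307
        if text[i] == pattern[j]:                          # S309
            i, j = i + 1, j + 1                            # S311
            continue
        start = i - j                  # left edge of the current window
        if start + m >= n:
            return -1                  # no room for a further alignment
        # S313: displacement of i implied by the skip table (assumption).
        shift = skip(text[start + m - 1], text[start + m]) - j
        if shift > 0:                                      # S315
            i, j = i + shift, 0                            # S319
        else:
            j = nxt[j]                 # S317: i stays, j = next[j]
    return i - m if j == m else -1                         # S321 to S325


print(hybrid_search("abacababa", "ababa"))
```

Under these assumptions the cursor i never decreases: a positive displacement moves it forward, and otherwise only j is reassigned. In the fallback branch j is at least 1, so next[j] is never -1 there.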
According to an embodiment of the present invention, a device for matching a pattern string against a text string is also provided.
As shown in Fig. 4, the device for matching a pattern string against a text string according to the embodiment of the present invention includes:
a matching module 41, configured to match the pattern string against the text string;
a determining module 42, configured to, when the pattern string and the text string mismatch, determine the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays constructed in advance from the pattern string.
The device according to the embodiment of the present invention further includes:
an initialization module (not shown), configured to left-align the pattern string and the text string before the pattern string is matched against the text string;
a position returning module (not shown), configured to, when the match succeeds, return the position in the text string of the characters identical to the pattern string.
The multiple preprocessing arrays include a Skip array and/or a Next array, and the determining module 42 preferentially determines the jump mode of the text string according to the displacement value obtained from the Skip array, where the Skip array may be the preprocessing array of the BMH2C algorithm and the Next array may be the preprocessing array of the KMP algorithm.
When the Skip array yields a positive displacement value, the determining module 42 determines the jump mode of the text string according to that positive displacement value; when the Skip array yields a negative displacement value, the determining module 42 obtains a positive displacement value from the Next array and determines the jump mode of the text string according to that positive displacement value.
In summary, with the above technical solution of the present invention, after a character of the pattern string mismatches the text string, the jump mode of the text string is determined using multiple preprocessing arrays constructed in advance, more specifically the Skip array of BMH2C and the Next array of KMP. This ensures that the text string does not backtrack after the mismatch while still jumping by the maximum positive displacement value, combines the advantages of the KMP and BMH2C algorithms, and improves pattern matching speed and network security efficiency.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (4)
1. A method for matching a pattern string against a text string, characterized by including:
matching the pattern string against the text string;
when the pattern string and the text string mismatch, determining the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays constructed in advance from the pattern string;
wherein the multiple preprocessing arrays include a Skip array and/or a Next array;
the jump mode of the text string is preferentially determined according to the displacement value obtained from the Skip array;
determining the jump mode of the text string according to the maximum positive displacement value obtained from the multiple preprocessing arrays constructed in advance from the pattern string includes:
when the Skip array yields a positive displacement value, determining the jump mode of the text string according to that positive displacement value;
when the Skip array yields a negative displacement value, obtaining a positive displacement value from the Next array and determining the jump mode of the text string according to that positive displacement value.
2. The matching method according to claim 1, characterized in that, before the pattern string is matched against the text string, the method further includes:
left-aligning the pattern string and the text string.
3. The matching method according to claim 1, characterized in that matching the pattern string against the text string includes:
when the match succeeds, returning the position in the text string of the characters identical to the pattern string.
4. A device for matching a pattern string against a text string, characterized by including:
a matching module, configured to match the pattern string against the text string;
a determining module, configured to, when the pattern string and the text string mismatch, determine the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays constructed in advance from the pattern string, wherein the multiple preprocessing arrays include a Skip array and/or a Next array;
the determining module is further configured to preferentially determine the jump mode of the text string according to the displacement value obtained from the Skip array;
the determining module is further configured to:
when the Skip array yields a positive displacement value, determine the jump mode of the text string according to that positive displacement value;
when the Skip array yields a negative displacement value, obtain a positive displacement value from the Next array and determine the jump mode of the text string according to that positive displacement value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310576313.7A CN103577598B (en) | 2013-11-15 | 2013-11-15 | Matching method and device for pattern string and text string |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310576313.7A CN103577598B (en) | 2013-11-15 | 2013-11-15 | Matching method and device for pattern string and text string |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103577598A CN103577598A (en) | 2014-02-12 |
CN103577598B true CN103577598B (en) | 2017-02-15 |
Family
ID=50049374
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310576313.7A Active CN103577598B (en) | 2013-11-15 | 2013-11-15 | Matching method and device for pattern string and text string |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103577598B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104519056B (en) * | 2014-12-15 | 2017-09-08 | 广东科学技术职业学院 | A kind of single pattern matching method jumped based on double jump |
CN107341224A (en) * | 2017-06-30 | 2017-11-10 | 北方工业大学 | The matching process and device of a kind of character string |
CN110599028B (en) * | 2019-09-09 | 2022-05-17 | 深圳前海微众银行股份有限公司 | Text positioning method, device, equipment and storage medium |
CN112069303B (en) * | 2020-09-17 | 2022-08-16 | 四川长虹电器股份有限公司 | Matching search method and device for character strings and terminal |
CN113590895B (en) * | 2021-07-28 | 2023-04-25 | 西华大学 | Character string retrieval method |
CN113836367B (en) * | 2021-09-26 | 2023-04-28 | 杭州迪普科技股份有限公司 | Method and device for character reverse matching |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101409623A (en) * | 2008-11-26 | 2009-04-15 | 湖南大学 | Mode matching method facing to high speed network |
CN102750379A (en) * | 2012-06-25 | 2012-10-24 | 华南理工大学 | Fast character string matching method based on filtering type |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7539031B2 (en) * | 2006-09-19 | 2009-05-26 | Netlogic Microsystems, Inc. | Inexact pattern searching using bitmap contained in a bitcheck command |
Non-Patent Citations (1)
Title |
---|
"A fast single-pattern matching algorithm"; Yang Zijiang et al.; Journal of South China Normal University (Natural Science Edition); 20130930; pp. 31-35 * |
Also Published As
Publication number | Publication date |
---|---|
CN103577598A (en) | 2014-02-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103577598B (en) | Matching method and device for pattern string and text string | |
CN103425739B (en) | A kind of character string matching method | |
EP2940557B1 (en) | Method and device used for providing input candidate item corresponding to input character string | |
CN102135814B (en) | A kind of character and word input method and system | |
CN102722709B (en) | Method and device for identifying garbage pictures | |
CN106371624B (en) | It is a kind of for provide input candidate item method, apparatus and input equipment | |
CN110532381A (en) | A kind of text vector acquisition methods, device, computer equipment and storage medium | |
CN102867049B (en) | Chinese PINYIN quick word segmentation method based on word search tree | |
CN102163234A (en) | Equipment and method for error correction of query sequence based on degree of error correction association | |
CN104281275B (en) | The input method of a kind of English and device | |
CN108734110A (en) | Text fragment identification control methods based on longest common subsequence and system | |
CN106484730A (en) | Character string matching method and device | |
CN103646029A (en) | Similarity calculation method for blog articles | |
CN103581224A (en) | Method and device for pushing information | |
Miller et al. | Tradeoffs between cost and information for rendezvous and treasure hunt | |
CN106777920A (en) | The method and apparatus for determining longest common subsequence | |
CN101901257A (en) | Multi-string matching method | |
CN108628907A (en) | A method of being used for the Trie tree multiple-fault diagnosis based on Aho-Corasick | |
CN107220333B (en) | character search method based on Sunday algorithm | |
CN102609450B (en) | Method for multi-mode string matching according to word length | |
CN104572872B (en) | A kind of data deduplication method of partition based on extreme value | |
CN103076894A (en) | Method and equipment for building input entries for object identity information according to object identity information | |
CN103218452A (en) | Method and device for recognizing valid interlinkage in Hub webpage | |
CN107341224A (en) | The matching process and device of a kind of character string | |
CN109376362A (en) | A kind of the determination method and relevant device of corrected text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20220726 Address after: 100089 building 36, courtyard 8, Dongbeiwang West Road, Haidian District, Beijing Patentee after: Dawning Information Industry (Beijing) Co.,Ltd. Patentee after: DAWNING INFORMATION INDUSTRY Co.,Ltd. Address before: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing Patentee before: Dawning Information Industry (Beijing) Co.,Ltd. |