CN103577598B - Matching method and device for pattern string and text string - Google Patents

Publication number
CN103577598B
Authority
CN
China
Prior art keywords
string
text string
pattern
array
displacement value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310576313.7A
Other languages
Chinese (zh)
Other versions
CN103577598A (en)
Inventor
李开科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd
Priority to CN201310576313.7A
Publication of CN103577598A
Application granted
Publication of CN103577598B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Abstract

The invention discloses a matching method and device for a pattern string and a text string. The matching method comprises matching the pattern string against the text string and, when the pattern string and the text string mismatch, determining the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays constructed in advance from the pattern string. Because the jump mode is determined from multiple pre-built preprocessing arrays after a character mismatch, the method ensures that no backtracking occurs after the mismatch while the jump uses the maximum positive displacement value, thereby improving pattern matching speed and the security of network management.

Description

Matching method and device for pattern string and text string
Technical field
The present invention relates to the computer field, and in particular to a matching method and device for a pattern string and a text string.
Background art
A pattern matching algorithm is a core algorithm of network security. Its task is to find a specified character string (the pattern string) in a text string (also called the target string) and to return the position at which the pattern string occurs in the text string. The efficiency of the pattern matching algorithm directly affects the efficiency of network security management. The earliest approach, brute-force search, works by exhaustive comparison; its drawback is that matching is slow and time-consuming. The classic single-pattern matching algorithms that followed include the Knuth-Morris-Pratt string search algorithm (hereinafter KMP), the Boyer-Moore string search algorithm (hereinafter BM), and optimizations of BM such as Boyer-Moore-Horspool (BMH), Boyer-Moore-Horspool-Sunday (BMHS), and Boyer-Moore-Horspool-two-char (hereinafter BMH2C). These classic algorithms mainly speed up matching by jumping after a character mismatch.
The complexity of the KMP algorithm is O(n), and it guarantees that no backtracking occurs after a character mismatch. In the worst case (i.e., the pattern string is located at the end of the text string), KMP performs well. Although the jump distance of KMP (also called the displacement value) is always positive, the jump distance is small, so the average performance of KMP is lower than that of BM. The BM algorithm has a larger jump distance and better average performance. However, BM can yield a negative displacement value in some cases, that is, it cannot guarantee that the text string does not backtrack after a character mismatch. In the worst and near-worst cases (i.e., the pattern string is located at or near the end of the text string), BM performs poorly, and its complexity is O(mn).
For the problem in the related art that, after a mismatch, single-pattern matching algorithms make only small jumps in the text string or backtrack, which leads to low matching efficiency, no effective solution has yet been proposed.
Summary of the invention
In view of the problem in the related art that, after a mismatch, single-pattern matching algorithms make only small jumps in the text string or backtrack, which leads to low matching efficiency, the present invention proposes a matching method for a pattern string and a text string. While ensuring that the text string does not backtrack after a character mismatch between the pattern string and the text string, the method ensures that the text string jumps by the maximum positive displacement value, improving pattern matching speed and the security of network management.
The technical solution of the present invention is realized as follows:
According to one aspect of the present invention, a matching method for a pattern string and a text string is provided.
The matching method includes: matching the pattern string against the text string;
when the pattern string and the text string mismatch, determining the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays built in advance from the pattern string.
Preferably, the multiple preprocessing arrays include a Skip array and/or a Next array.
Moreover, the jump mode of the text string is preferentially determined according to the displacement value obtained from the Skip array.
Optionally, determining the jump mode of the text string according to the maximum positive displacement value obtained from the multiple preprocessing arrays built in advance from the pattern string includes:
when the Skip array yields a positive displacement value, determining the jump mode of the text string according to that positive displacement value;
when the Skip array yields a negative displacement value, obtaining a positive displacement value from the Next array and determining the jump mode of the text string according to that positive displacement value.
Further, before the pattern string is matched against the text string, the pattern string and the text string are left-aligned.
Matching the pattern string against the text string includes: when the match succeeds, returning the position in the text string of the characters identical to the pattern string.
According to another aspect of the present invention, a matching device for a pattern string and a text string is provided.
The matching device includes:
a matching module, configured to match the pattern string against the text string;
a determining module, configured to determine, when the pattern string and the text string mismatch, the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays built in advance from the pattern string.
Preferably, the multiple preprocessing arrays include a Skip array and/or a Next array.
Moreover, the determining module is further configured to preferentially determine the jump mode of the text string according to the displacement value obtained from the Skip array.
Optionally, the determining module is further configured to:
when the Skip array yields a positive displacement value, determine the jump mode of the text string according to that positive displacement value;
when the Skip array yields a negative displacement value, obtain a positive displacement value from the Next array and determine the jump mode of the text string according to that positive displacement value.
By using multiple pre-built preprocessing arrays to determine the jump mode of the text string after a character mismatch between the pattern string and the text string, the present invention ensures that the text string does not backtrack after the mismatch and that the text string jumps by the maximum positive displacement value, improving pattern matching speed and the security of network management.
Brief description of the drawings
Fig. 1 is a flowchart of the matching method for a pattern string and a text string according to an embodiment of the present invention;
Fig. 2 is a flowchart of the BMH2C algorithm in the prior art;
Fig. 3 is a flowchart of the matching algorithm according to an embodiment of the present invention;
Fig. 4 is a block diagram of the matching device for a pattern string and a text string according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention fall within the scope of protection of the present invention.
According to an embodiment of the present invention, a matching method for a pattern string and a text string is provided.
As shown in Fig. 1, the matching method for a pattern string and a text string according to an embodiment of the present invention includes:
Step S101, the pattern string is matched against the text string. Further, before the pattern string is matched against the text string, the pattern string and the text string are left-aligned. When the match succeeds, the position in the text string of the characters identical to the pattern string is returned;
Step S103, when the pattern string and the text string mismatch, the jump mode of the text string is determined according to the maximum positive displacement value obtained from multiple preprocessing arrays built in advance from the pattern string. This avoids the defect of the prior art in which, on a mismatch, the displacement value is obtained from only a single preprocessing array and may therefore be negative or small. As a result, the text string does not backtrack after a character mismatch between the pattern string and the text string, and the text string jumps by the maximum positive displacement value, improving pattern matching speed and the security of network management.
Preferably, the multiple preprocessing arrays include a Skip array and/or a Next array, where the Skip array can be the preprocessing array of the BMH2C algorithm and the Next array can be the preprocessing array of the KMP algorithm. The KMP and BMH2C algorithms are briefly introduced and compared below. Here, the text string S is the text to be searched, the pattern string P is the character string to be found in the text, i is the text string cursor (also called the index), j is the pattern string cursor, n is the text string length, and m is the pattern string length.
KMP algorithm: in the KMP algorithm, a preprocessing array next[] is introduced to determine the position of j for the next comparison after a character mismatch. The value of next[j] is the length of the longest proper prefix of P[0..j-1] that is also a suffix of P[0..j-1].
The next[] array is defined as follows:
(Formula 1) next[0] = -1 when j = 0, i.e., the preprocessing value of the first character of any string is defined as -1;
(Formula 2) next[j] = k_max, the largest k with 0 < k < j such that P[0..k-1] = P[j-k..j-1];
(Formula 3) next[j] = 0 in all other cases.
For example, Table 1 shows the next[j] array for a concrete pattern string:
Table 1
P          a    b    a    b    a
j          0    1    2    3    4
next[j]   -1    0    0    1    2
When j = 0, (Formula 1) applies, so next[0] = -1;
When j = 1 or j = 2, (Formula 3) applies, so next[j] = 0;
When j = 3, (Formula 2) applies: among the characters before position 3, only P[2] matches P[0], so the longest matching prefix has length 1 and next[3] = 1;
When j = 4, (Formula 2) applies: among the characters before position 4, P[2] matches P[0] and P[3] matches P[1], so the longest matching prefix has length 2 and next[4] = 2.
That is, for j = 3 and j = 4, next[j] = k > 0 means that P[0..k-1] = P[j-k..j-1].
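The definition above can be sketched in Python as follows. This is a minimal illustration rather than code from the patent: the function name `build_next` is an assumption, and the construction used is the standard linear-time one for this definition of next[].

```python
def build_next(p):
    """Return the next[] array for pattern p (next[0] = -1 by Formula 1)."""
    nxt = [-1] * len(p)
    j, k = 0, -1  # k is the length of the current matching prefix, or -1
    while j < len(p) - 1:
        if k == -1 or p[j] == p[k]:
            # The matching prefix extends by one character (Formula 2),
            # or restarts from length 0 (Formula 3).
            j += 1
            k += 1
            nxt[j] = k
        else:
            # Fall back to the next shorter prefix that is also a suffix.
            k = nxt[k]
    return nxt


print(build_next("ababa"))  # [-1, 0, 0, 1, 2], the values in Table 1
```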
The idea of the KMP algorithm: during matching, when a mismatch occurs, if next[j] >= 0, the text string cursor i stays where it is and the pattern string cursor j moves to position next[j] to continue matching; if next[j] = -1, i moves right by 1, j is set to 0, and the comparison continues. The complexity is O(n).
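The rule in the preceding paragraph can be sketched as a small search routine. The names `build_next` and `kmp_search` are assumptions made for this illustration, and returning -1 for "not found" is a convention borrowed from Python's `str.find`, not something the source specifies.

```python
def build_next(p):
    """next[] array as defined by Formulas 1 to 3."""
    nxt = [-1] * len(p)
    j, k = 0, -1
    while j < len(p) - 1:
        if k == -1 or p[j] == p[k]:
            j += 1
            k += 1
            nxt[j] = k
        else:
            k = nxt[k]
    return nxt


def kmp_search(s, p):
    """Return the index of the first occurrence of p in s, or -1."""
    if not p:
        return 0
    nxt = build_next(p)
    i = j = 0
    while i < len(s) and j < len(p):
        if s[i] == p[j]:
            i += 1
            j += 1
        elif nxt[j] >= 0:
            j = nxt[j]   # i stays put, j moves to next[j]
        else:
            i += 1       # next[j] == -1: advance i, restart the pattern
            j = 0
    return i - j if j == len(p) else -1


print(kmp_search("abababc", "ababc"))  # 2
```

Note that the text cursor i never decreases, which is the "no backtracking" property the description relies on.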
BMH2C algorithm: in the BMH2C algorithm, the text character aligned with the last character of the pattern string and the text character that follows it are taken together as a substring (that is, the substring has two characters). If this substring occurs in the pattern string, the pattern string is shifted right so that the rightmost occurrence of the substring in the pattern string is aligned with it; otherwise, the pattern string skips over the substring when shifting right, i.e., the shift amount is m+1.
Because two characters determine the shift amount, the offset array can be represented as a two-dimensional array skip[char1][char2]. Fig. 2 is a flowchart of the BMH2C algorithm:
Step S201, when the array is initialized, every value of the two-dimensional array is set to m+1, i.e., Skip[i][j] = m+1 (here i and j index characters, not cursor positions);
Step S203, the values of the skip array can be further corrected during initialization: Skip[i][P[0]] is set to m for every character i. This handles a special case: when the second character S[i+1] of the substring S[i]S[i+1] equals the first character P[0] of the pattern, shifting right by m+1 could miss a match even though the substring S[i]S[i+1] does not occur in the pattern. The pattern should therefore shift right by only m, so that its first character P[0] is aligned with the second character S[i+1] of the substring;
Step S205, the third initialization step sets the corresponding shift amount for every two-character substring that occurs in the pattern: Skip[P[k]][P[k+1]] = m-k-1, for 0 <= k <= m-2.
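Steps S201 to S205 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: instead of a full two-dimensional array, the table is a dictionary holding only the pairs set in step S205, and the defaults of steps S201 and S203 are applied at lookup time. The names `build_skip`, `skip_shift`, and `bmh2c_search` are hypothetical.

```python
def build_skip(p):
    """Step S205: shift m-k-1 for each two-character substring of p.

    Later (more rightward) occurrences overwrite earlier ones, so the
    stored value always corresponds to the rightmost occurrence.
    """
    m = len(p)
    return {(p[k], p[k + 1]): m - k - 1 for k in range(m - 1)}


def skip_shift(table, p, c1, c2):
    """Look up the shift for the text pair (c1, c2)."""
    if (c1, c2) in table:
        return table[(c1, c2)]
    if c2 == p[0]:
        return len(p)        # step S203: align P[0] with c2
    return len(p) + 1        # step S201: default, skip past the pair


def bmh2c_search(s, p):
    """Return the index of the first occurrence of p in s, or -1."""
    n, m = len(s), len(p)
    if m == 0:
        return 0
    table = build_skip(p)
    start = 0
    while start + m <= n:
        if s[start:start + m] == p:
            return start
        if start + m >= n:   # no character after the window to pair with
            break
        start += skip_shift(table, p, s[start + m - 1], s[start + m])
    return -1


print(build_skip("abcab"))  # {('a', 'b'): 1, ('b', 'c'): 3, ('c', 'a'): 2}
```

Measured from the start of the current window, every shift produced here is at least 1.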
On a mismatch, the jump mode of the text string is preferentially determined according to the displacement value obtained from the Skip array. When the Skip array yields a positive displacement value, the jump mode of the text string is determined according to that positive displacement value; when the Skip array yields a negative displacement value, a positive displacement value is obtained from the Next array, and the jump mode of the text string is determined according to that positive displacement value.
That is, the displacement value of the Skip array is the effective part of the BMH2C algorithm that provides large jumps, and the displacement value of the Next array is the effective part of the KMP algorithm that reduces matching complexity. Combining the effective parts of the KMP and BMH2C algorithms retains the advantages of both: faster expected pattern matching in the general case and lower algorithmic complexity in the poor cases. This is the core of the present invention.
According to an embodiment of the present invention, a matching algorithm is provided. Fig. 3 is a flowchart of the matching algorithm according to an embodiment of the present invention.
Step S301, start: the task is initialized, and the Skip array and the Next array are built from the pattern string;
Step S303, the text string S and the pattern string P are left-aligned;
Step S305, i is defined as the cursor of the text string S and j as the cursor of the pattern string P, so step S303 can be expressed as aligning position zero of i in the text string S with position zero of j in the pattern string P;
Step S307, it is judged whether the cursor i of the text string S is less than slen (i.e., the length of the text string S). When i is less than slen, the loop continues and step S309 is executed; when i is greater than or equal to slen, step S321 is executed;
Step S309, it is judged whether the character at the current cursor i of the text string S is the same as the character at the current cursor j of the pattern string P. If S[i] = P[j], matching continues and step S311 is executed; if they differ, a mismatch has occurred and step S313 is executed;
Step S311, the cursors of the pattern string and the text string each advance by one character position in preparation for the next comparison;
Step S313, after the mismatch, the displacement value in skip is calculated, i.e., the skip array is queried and its displacement value for the mismatch is returned;
Step S315, the sign of the displacement value returned by the skip array is judged. If the displacement is positive, step S319 is executed; if the displacement is negative, step S317 is executed;
Step S317, the displacement value returned by the skip array is negative, so i is forced not to backtrack and the next array is queried instead: the cursor i of the text string stays unchanged, and the cursor j of the pattern string is set to the value returned by the next array, i.e., j = next[j];
Step S319, the displacement value returned by the skip array is positive, so the jump is used: the cursor i of the text string advances by the displacement value, and the cursor j is reset, i.e., the pattern string is reset to its first character;
Step S321, it is judged whether the value of j equals plen (the length of the pattern string P), i.e., whether matching is complete and the whole pattern string has been covered. If so, step S323 is executed; otherwise, step S325 is executed;
Step S323, pattern matching succeeds, and the position of the string is returned;
Step S325, failure: no match was found, and an invalid value is returned;
Step S327, the task terminates.
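One possible reading of steps S301 to S327 is sketched below. The source does not say how a Skip displacement becomes negative; this sketch assumes the displacement is measured relative to the text cursor i, i.e., the BMH2C window shift minus the number of pattern characters already matched (j). Under that assumption a non-positive value would move i backwards or not at all, so the sketch falls back to next[]. All function names are hypothetical, and the handling of next[j] = -1 follows the KMP rule stated earlier in the description rather than step S317 alone.

```python
def build_next(p):
    """KMP next[] array (next[0] = -1)."""
    nxt = [-1] * len(p)
    j, k = 0, -1
    while j < len(p) - 1:
        if k == -1 or p[j] == p[k]:
            j += 1
            k += 1
            nxt[j] = k
        else:
            k = nxt[k]
    return nxt


def build_skip(p):
    """BMH2C shifts for the two-character substrings of p."""
    m = len(p)
    return {(p[k], p[k + 1]): m - k - 1 for k in range(m - 1)}


def skip_shift(table, p, c1, c2):
    if (c1, c2) in table:
        return table[(c1, c2)]
    return len(p) if c2 == p[0] else len(p) + 1


def hybrid_search(s, p):
    """Return the index of the first occurrence of p in s, or -1."""
    n, m = len(s), len(p)
    if m == 0:
        return 0
    nxt, skip = build_next(p), build_skip(p)      # S301
    i = j = 0                                     # S303, S305
    while i < n and j < m:                        # S307
        if s[i] == p[j]:                          # S309
            i += 1                                # S311
            j += 1
            continue
        last = i - j + m - 1    # text index under the last pattern char
        if last >= n:
            return -1           # the pattern no longer fits in the text
        disp = -1               # assumed: treated as "negative" if no pair
        if last + 1 < n:                          # S313
            disp = skip_shift(skip, p, s[last], s[last + 1]) - j
        if disp > 0:                              # S315 -> S319
            i += disp
            j = 0
        elif nxt[j] >= 0:                         # S315 -> S317
            j = nxt[j]
        else:                   # next[j] == -1: advance i by one
            i += 1
            j = 0
    return i - j if j == m else -1                # S321 to S325


print(hybrid_search("xxxxxabc", "abc"))  # 5
```

In this reading the text cursor i never decreases: it either advances by a positive Skip displacement or stays in place while j falls back through next[].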
The matching algorithm according to an embodiment of the present invention combines the respective advantages of the KMP and BMH2C algorithms. Initialization builds the skip and next arrays. During matching, the text string S and the pattern string P are first left-aligned, and characters are then compared one by one; when the match succeeds, the position of the match in the text string is returned. After a mismatch, the skip array is looked up to obtain a displacement, and the sign of the displacement value is judged. If the displacement is positive, the text string jumps by this positive displacement value. If the displacement value is negative, the next array is queried to obtain a new displacement. This continues until the match succeeds or the end of the text string is reached. Precisely because the displacement from the next array is always positive, the algorithm never backtracks, jumps by the largest available displacement, and has complexity O(n).
According to an embodiment of the present invention, a matching device for a pattern string and a text string is also provided.
As shown in Fig. 4, the matching device for a pattern string and a text string according to an embodiment of the present invention includes:
a matching module 41, configured to match the pattern string against the text string;
a determining module 42, configured to determine, when the pattern string and the text string mismatch, the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays built in advance from the pattern string.
The matching device for a pattern string and a text string according to an embodiment of the present invention further includes:
an initialization module (not shown), configured to left-align the pattern string and the text string before the pattern string is matched against the text string;
a position return module (not shown), configured to return, when the match succeeds, the position in the text string of the characters identical to the pattern string.
The multiple preprocessing arrays include a Skip array and/or a Next array, and the determining module 42 preferentially determines the jump mode of the text string according to the displacement value obtained from the Skip array, where the Skip array can be the preprocessing array of the BMH2C algorithm and the Next array can be the preprocessing array of the KMP algorithm.
When the Skip array yields a positive displacement value, the determining module 42 determines the jump mode of the text string according to that positive displacement value; when the Skip array yields a negative displacement value, the determining module 42 obtains a positive displacement value from the Next array and determines the jump mode of the text string according to that positive displacement value.
In summary, with the above technical solution of the present invention, the jump mode of the text string is determined from multiple pre-built preprocessing arrays after a character mismatch between the pattern string and the text string. Further, by using the Skip array of BMH2C and the Next array of KMP, the text string does not backtrack after the mismatch and jumps by the maximum positive displacement value. The solution thus combines the advantages of the KMP and BMH2C algorithms and improves pattern matching speed and network security efficiency.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (4)

1. A matching method for a pattern string and a text string, characterized by comprising:
matching the pattern string against the text string;
when the pattern string and the text string mismatch, determining the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays built in advance from the pattern string;
wherein the multiple preprocessing arrays include a Skip array and/or a Next array;
the jump mode of the text string is preferentially determined according to the displacement value obtained from the Skip array;
determining the jump mode of the text string according to the maximum positive displacement value obtained from the multiple preprocessing arrays built in advance from the pattern string comprises:
when the Skip array yields a positive displacement value, determining the jump mode of the text string according to that positive displacement value;
when the Skip array yields a negative displacement value, obtaining a positive displacement value from the Next array and determining the jump mode of the text string according to that positive displacement value.
2. The matching method according to claim 1, characterized in that, before the pattern string is matched against the text string, the method further comprises:
left-aligning the pattern string and the text string.
3. The matching method according to claim 1, characterized in that matching the pattern string against the text string comprises:
when the match succeeds, returning the position in the text string of the characters identical to the pattern string.
4. A matching device for a pattern string and a text string, characterized by comprising:
a matching module, configured to match the pattern string against the text string;
a determining module, configured to determine, when the pattern string and the text string mismatch, the jump mode of the text string according to the maximum positive displacement value obtained from multiple preprocessing arrays built in advance from the pattern string, wherein the multiple preprocessing arrays include a Skip array and/or a Next array;
the determining module is further configured to preferentially determine the jump mode of the text string according to the displacement value obtained from the Skip array;
the determining module is further configured to:
when the Skip array yields a positive displacement value, determine the jump mode of the text string according to that positive displacement value;
when the Skip array yields a negative displacement value, obtain a positive displacement value from the Next array and determine the jump mode of the text string according to that positive displacement value.
CN201310576313.7A 2013-11-15 2013-11-15 Matching method and device for pattern string and text string Active CN103577598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310576313.7A CN103577598B (en) 2013-11-15 2013-11-15 Matching method and device for pattern string and text string

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310576313.7A CN103577598B (en) 2013-11-15 2013-11-15 Matching method and device for pattern string and text string

Publications (2)

Publication Number Publication Date
CN103577598A CN103577598A (en) 2014-02-12
CN103577598B true CN103577598B (en) 2017-02-15

Family

ID=50049374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310576313.7A Active CN103577598B (en) 2013-11-15 2013-11-15 Matching method and device for pattern string and text string

Country Status (1)

Country Link
CN (1) CN103577598B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104519056B (en) * 2014-12-15 2017-09-08 广东科学技术职业学院 A kind of single pattern matching method jumped based on double jump
CN107341224A (en) * 2017-06-30 2017-11-10 北方工业大学 The matching process and device of a kind of character string
CN110599028B (en) * 2019-09-09 2022-05-17 深圳前海微众银行股份有限公司 Text positioning method, device, equipment and storage medium
CN112069303B (en) * 2020-09-17 2022-08-16 四川长虹电器股份有限公司 Matching search method and device for character strings and terminal
CN113590895B (en) * 2021-07-28 2023-04-25 西华大学 Character string retrieval method
CN113836367B (en) * 2021-09-26 2023-04-28 杭州迪普科技股份有限公司 Method and device for character reverse matching

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101409623A (en) * 2008-11-26 2009-04-15 湖南大学 Mode matching method facing to high speed network
CN102750379A (en) * 2012-06-25 2012-10-24 华南理工大学 Fast character string matching method based on filtering type

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7539031B2 (en) * 2006-09-19 2009-05-26 Netlogic Microsystems, Inc. Inexact pattern searching using bitmap contained in a bitcheck command

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
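"A fast single-pattern matching algorithm" (一种快速的单模式匹配算法); Yang Zijiang et al.; Journal of South China Normal University (Natural Science Edition); 20130930; pp. 31-35 *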

Also Published As

Publication number Publication date
CN103577598A (en) 2014-02-12

Similar Documents

Publication Publication Date Title
CN103577598B (en) Matching method and device for pattern string and text string
CN103425739B (en) A kind of character string matching method
EP2940557B1 (en) Method and device used for providing input candidate item corresponding to input character string
CN102135814B (en) A kind of character and word input method and system
CN102722709B (en) Method and device for identifying garbage pictures
CN106371624B (en) It is a kind of for provide input candidate item method, apparatus and input equipment
CN110532381A (en) A kind of text vector acquisition methods, device, computer equipment and storage medium
CN102867049B (en) Chinese PINYIN quick word segmentation method based on word search tree
CN102163234A (en) Equipment and method for error correction of query sequence based on degree of error correction association
CN104281275B (en) The input method of a kind of English and device
CN108734110A (en) Text fragment identification control methods based on longest common subsequence and system
CN106484730A (en) Character string matching method and device
CN103646029A (en) Similarity calculation method for blog articles
CN103581224A (en) Method and device for pushing information
Miller et al. Tradeoffs between cost and information for rendezvous and treasure hunt
CN106777920A (en) The method and apparatus for determining longest common subsequence
CN101901257A (en) Multi-string matching method
CN108628907A (en) A method of being used for the Trie tree multiple-fault diagnosis based on Aho-Corasick
CN107220333B (en) character search method based on Sunday algorithm
CN102609450B (en) Method for multi-mode string matching according to word length
CN104572872B (en) A kind of data deduplication method of partition based on extreme value
CN103076894A (en) Method and equipment for building input entries for object identity information according to object identity information
CN103218452A (en) Method and device for recognizing valid interlinkage in Hub webpage
CN107341224A (en) The matching process and device of a kind of character string
CN109376362A (en) A kind of the determination method and relevant device of corrected text

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220726

Address after: 100089 building 36, courtyard 8, Dongbeiwang West Road, Haidian District, Beijing

Patentee after: Dawning Information Industry (Beijing) Co.,Ltd.

Patentee after: DAWNING INFORMATION INDUSTRY Co.,Ltd.

Address before: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing

Patentee before: Dawning Information Industry (Beijing) Co.,Ltd.

TR01 Transfer of patent right