CN107341224A - String matching method and device - Google Patents

A string matching method and device

Info

Publication number
CN107341224A
Authority
CN
China
Prior art keywords
character
string
pattern string
matching
pattern
Prior art date
Legal status
Pending
Application number
CN201710523458.9A
Other languages
Chinese (zh)
Inventor
王辉柏
袁闻
李晋宏
Current Assignee
North China University of Technology
Original Assignee
North China University of Technology
Priority date
Filing date
Publication date
Application filed by North China University of Technology
Priority to CN201710523458.9A
Publication of CN107341224A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of computing, and in particular to a string matching method and device. Its purpose is to improve string matching efficiency and reduce resource consumption. The method works as follows. A pattern string is matched against a text string. Character T is the character immediately following the substring of the text string that is currently aligned with the pattern string. When character T is found to occur in the pattern string, the jump mode of the pattern string is further determined from the value of the character to the left or right of character T. Because the decision examines the left or right neighbor of the character following the currently matched substring, it has a clear advantage over comparing a single character. The decision condition is more flexible and performs better on strings with many repeated characters. This effectively improves string matching efficiency and reduces resource consumption.

Description

A string matching method and device
Technical field
The present invention relates to the field of computing, and in particular to a string matching method and device.
Background
Network information is developing rapidly. Information security and information retrieval have therefore become important problems for today's information society, and string pattern matching is key to solving them. String pattern matching is indispensable both for finding illegal information among large amounts of content and for helping users retrieve content of interest from massive data. Its applications cover many areas, including search engines, document management, and virus detection.
The sheer volume of network information places higher demands on the efficiency and accuracy of string matching. Typically, a user enters a custom keyword. The system then matches the keyword against the subtitle library text, checks whether any matching subtitles exist, and returns the retrieval result. In short, the string pattern matching problem is to determine, as quickly as possible, whether a pattern string (i.e., the keyword) occurs in a given text and, if so, where it occurs.
At present, the Sunday algorithm is usually used to locate a pattern string. Sunday is an efficient, fast string matching algorithm that is considerably faster than classical algorithms such as Knuth-Morris-Pratt (KMP). It does not prescribe the order of comparison. Comparison may start from the left or the right end of the character string corresponding to the text to be matched (hereinafter the text string). The key point is that Sunday jumps a larger distance when a mismatch occurs. It can therefore skip more useless characters, which improves matching efficiency.
When a match attempt fails, the Sunday algorithm examines the character immediately following the last text-string character that took part in the comparison. It then checks whether that character occurs in the pattern string. If it does not, the pattern string is moved by the pattern length plus one character. If it does, the pattern string is moved by the distance from the character's rightmost occurrence in the pattern string to the last position, plus one, so that the character becomes aligned with that rightmost occurrence. Sunday precomputes these distances in a next array. Every character of the pattern string is stored in the array with its own distance value, calculated from the character's rightmost position to the last position. A character that does not appear in the pattern string is assigned the pattern length as its distance value.
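The next array can be sketched in Python as a dictionary that maps each character to a shift. This is a minimal illustration under our own assumptions. The names `build_next` and `sunday_shift` are hypothetical. The stored value is already the shift (the distance to the last position plus one), and characters absent from the pattern fall back to M + 1.

```python
def build_next(pattern: str) -> dict[str, int]:
    """Sunday 'next array' as a dict: character -> shift when it is the next character."""
    m = len(pattern)
    table: dict[str, int] = {}
    for k, c in enumerate(pattern):
        # Distance from position k to the last position (m - 1 - k), plus one.
        # Later (further-right) occurrences overwrite earlier ones,
        # so each character keeps the shift of its rightmost occurrence.
        table[c] = m - k
    return table


def sunday_shift(table: dict[str, int], m: int, next_char: str) -> int:
    """Shift after a failed window; characters absent from the pattern jump M + 1."""
    return table.get(next_char, m + 1)


table = build_next("abcd")
print(table)                        # {'a': 4, 'b': 3, 'c': 2, 'd': 1}
print(sunday_shift(table, 4, "e"))  # 5
print(sunday_shift(table, 4, "d"))  # 1
```

These values match the Fig. 1 walk-through: "e" gives a jump of 5 and "d" gives a jump of 1.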
Fig. 1 illustrates the matching process of the Sunday algorithm.
In the second row, the pattern string "abcd" does not match "bcda" in the text string. The next character "e" does not occur in the pattern string, so the jump distance is 4 + 1 = 5.
In the third row, the pattern string "abcd" does not match "fecd" in the text string. The next character "d" occurs in the pattern string, so the pattern jumps by 1 according to the next array. That is, the pattern string moves one character to the right, so that the next character "d" aligns with the rightmost "d" in the pattern string.
……
Jumps continue according to these rules until a match is found. In total, 6 match attempts are made.
However, when the Sunday algorithm scans the text string from front to back, it jumps according to the next array only after each complete match attempt has failed. Only then does it move on to the next round, which makes it inefficient.
Several improved algorithms exist. One reduces the number of full-pattern comparisons by using suffix comparison, but suffix comparison itself consumes considerable resources. Another compares the first substring of the pattern string. This reduces the number of full matches, but the number of single-character comparisons remains large.
It is therefore necessary to design a new matching algorithm that overcomes these drawbacks.
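The Fig. 1 walk-through can be reproduced with a short sketch of the Sunday search. The function name `sunday_search` is hypothetical. Returning the list of window start offsets is our own addition, so that the attempts can be counted. Each window is compared by slice equality rather than character by character.

```python
def sunday_search(text: str, pattern: str) -> tuple[int, list[int]]:
    """Return (offset of the first match or -1, window offsets tried)."""
    m, n = len(pattern), len(text)
    # Next array: shift = distance from rightmost occurrence to last position + 1.
    shift = {c: m - k for k, c in enumerate(pattern)}
    attempts: list[int] = []
    i = 0
    while i + m <= n:
        attempts.append(i)
        if text[i:i + m] == pattern:      # full comparison of the current window
            return i, attempts
        if i + m >= n:                    # no next character: nothing left to try
            break
        # Characters absent from the pattern jump M + 1.
        i += shift.get(text[i + m], m + 1)
    return -1, attempts


print(sunday_search("bcdaefecdddceabcd", "abcd"))
# (13, [0, 5, 6, 7, 9, 13])
```

For the Fig. 1 example, windows are tried at offsets 0, 5, 6, 7, 9 and 13. That is 6 match attempts, with the pattern found at offset 13.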
Summary of the invention
Embodiments of the present invention provide a string matching method and device, to improve string matching efficiency and reduce resource consumption.
The specific technical solutions provided by the embodiments are as follows:
A string matching method, comprising:
determining a text string and the pattern string to be searched for in it;
determining whether character T occurs in the pattern string, where character T is the character immediately following the substring of the text string currently aligned with the pattern string;
when character T occurs in the pattern string, determining the jump mode of the pattern string further according to the value of the character to the left or right of character T.
Optionally, when character T occurs in the pattern string, determining the jump mode of the pattern string further according to the value of the character to the left or right of character T comprises:
when character T occurs in the pattern string and is the last character of the pattern string, further determining whether the character to the left of character T in the text string is identical to the character to the left of character T in the pattern string;
if so, aligning character T in the pattern string with character T in the text string;
otherwise, aligning character T in the pattern string with character T in the text string, and then moving the pattern string one further character to the right.
Optionally, when character T occurs in the pattern string, determining the jump mode of the pattern string further according to the value of the character to the left or right of character T comprises:
when character T occurs in the pattern string but is not the last character of the pattern string, further determining whether the character to the right of character T in the text string is identical to the character to the right of character T in the pattern string;
if so, aligning character T in the pattern string with character T in the text string;
otherwise, determining the jump mode of the pattern string according to whether the character to the right of character T in the text string occurs in the pattern string.
Optionally, determining the jump mode of the pattern string according to whether the character to the right of character T in the text string occurs in the pattern string comprises:
determining whether the character to the right of character T in the text string occurs in the pattern string;
if so, aligning character T in the text string with character T in the pattern string, and then moving the pattern string one further character to the right;
otherwise, jumping the pattern string by M + 2 characters, where M is the number of characters in the pattern string.
Optionally, the method further comprises:
if character T does not occur in the pattern string, jumping the pattern string along the text string by a step of M + 1, where M is the number of characters in the pattern string.
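The jump rules in the optional features above can be collected into a single function. This is a hedged sketch, not the claimed implementation. `wsunday_shift` is a hypothetical name. "Character T in the pattern string" is assumed to mean its rightmost occurrence, as in the Sunday next array. The handling of a missing right neighbor at the end of the text, and of a one-character pattern, is our own choice, because the text does not specify it.

```python
def wsunday_shift(text: str, i: int, pattern: str) -> int:
    """Jump distance after the window text[i:i+m] failed to match.

    Assumes i + m < len(text), i.e. character T exists.
    """
    m = len(pattern)
    t = text[i + m]                  # character T
    k = pattern.rfind(t)             # assumption: rightmost occurrence of T
    if k == -1:
        return m + 1                 # T not in the pattern: jump M + 1
    step = m - k                     # align T in the pattern with T in the text
    if k == m - 1:
        # T is the last pattern character: compare the characters to its left.
        if m >= 2 and text[i + m - 1] != pattern[m - 2]:
            step += 1                # align, then move one more character
        return step
    # T is not the last pattern character: compare the characters to its right.
    if i + m + 1 >= len(text):
        return step                  # assumption: no right neighbor, plain Sunday shift
    right = text[i + m + 1]
    if right == pattern[k + 1]:
        return step                  # align only
    if right not in pattern:
        return m + 2                 # jump M + 2
    return step + 1                  # align, then move one more character


text = "bcdaefecdddceabcd"
print([wsunday_shift(text, i, "abcd") for i in (0, 5, 7)])  # [5, 2, 6]
```

The three printed values are the jumps taken from the "bcda", "fecd" and "cddd" windows of the Fig. 3 example.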
A string matching device, comprising:
a determining unit, configured to determine a text string and the pattern string to be searched for in it;
a judging unit, configured to determine whether character T occurs in the pattern string, where character T is the character immediately following the substring of the text string currently aligned with the pattern string;
a processing unit, configured to determine, when character T occurs in the pattern string, the jump mode of the pattern string further according to the value of the character to the left or right of character T.
Optionally, when character T occurs in the pattern string and the jump mode is determined further according to the value of the character to the left or right of character T, the processing unit is configured to:
when character T occurs in the pattern string and is the last character of the pattern string, further determine whether the character to the left of character T in the text string is identical to the character to the left of character T in the pattern string;
if so, align character T in the pattern string with character T in the text string;
otherwise, align character T in the pattern string with character T in the text string, and then move the pattern string one further character to the right.
Optionally, when character T occurs in the pattern string and the jump mode is determined further according to the value of the character to the left or right of character T, the processing unit is configured to:
when character T occurs in the pattern string but is not the last character of the pattern string, further determine whether the character to the right of character T in the text string is identical to the character to the right of character T in the pattern string;
if so, align character T in the pattern string with character T in the text string;
otherwise, determine the jump mode of the pattern string according to whether the character to the right of character T in the text string occurs in the pattern string.
Optionally, when determining the jump mode of the pattern string according to whether the character to the right of character T in the text string occurs in the pattern string, the processing unit is configured to:
determine whether the character to the right of character T in the text string occurs in the pattern string;
if so, align character T in the text string with character T in the pattern string, and then move the pattern string one further character to the right;
otherwise, jump the pattern string by M + 2 characters, where M is the number of characters in the pattern string.
Optionally, the processing unit is further configured to:
if character T does not occur in the pattern string, jump the pattern string along the text string by a step of M + 1, where M is the number of characters in the pattern string.
The beneficial effects of the present invention are as follows:
In the embodiments of the present invention, the pattern string is matched against the text string. Character T is the character immediately following the substring currently aligned with the pattern string. When character T is found to occur in the pattern string, the jump mode of the pattern string is further determined according to the value of the character to the left or right of character T. Because the decision examines the left or right neighbor of the character following the currently matched substring, it has a clear advantage over comparing a single character. The decision condition is more flexible and performs better on strings with many repeated characters. This effectively improves string matching efficiency and reduces resource consumption.
Brief description of the drawings
Fig. 1 is a schematic diagram of the matching process of the traditional Sunday algorithm in the prior art;
Fig. 2 is a flowchart of the WSunday matching process in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the WSunday matching process in an embodiment of the present invention;
Fig. 4 is a schematic functional structure diagram of the matching device in an embodiment of the present invention.
Detailed description of the embodiments
To improve string matching efficiency and reduce resource consumption, the embodiments of the present invention provide a new matching scheme. Based on an analysis of the Sunday matching algorithm, the invention proposes an improved Sunday algorithm, called the WSunday algorithm. When a mismatch occurs but the following character is not a bad character (that is, it does occur in the pattern string), WSunday performs one additional character comparison. It uses the result to increase the jump distance of the pattern string, which improves matching efficiency.
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As in the Sunday algorithm, the preprocessing stage of WSunday computes the next array for the pattern string. That is, it precomputes each distance by which the pattern string can jump to the right. In WSunday, however, the jump distance depends on two things. The first is the character following the last character of the substring of the text string currently aligned with the pattern string. The second is the character adjacent to that following character, and whether it occurs in the pattern string.
Specifically, as shown in Fig. 2, the WSunday matching process in an embodiment of the present invention comprises the following steps:
Step 200: Determine the text string and the pattern string to be searched for in it.
In this embodiment, assume the text string is "bcdaefecdddceabcd" and the pattern string, denoted P, is "abcd".
The WSunday matching process is described below using the search for P in this text string as an example.
Step 201: Compare the pattern string with the substring of the text string currently aligned with it, and determine whether they match. If so, end the current process; otherwise, go to step 202.
Step 202: Determine whether the character immediately following this substring (hereinafter character T) occurs in the pattern string. If not, go to step 203; otherwise, go to step 204.
Step 203: Jump the pattern string along the text string by M + 1 characters, so that it is aligned with a new substring of the text string, where M is the number of characters in the pattern string. Then go to step 213.
Step 204: Determine whether character T is the last character of the pattern string. If so, go to step 205; otherwise, go to step 208.
If character T occurs in the pattern string, the Sunday algorithm directly aligns character T in the pattern string with character T in the text string. The WSunday algorithm, by contrast, first determines where character T occurs in the pattern string and applies a different jump mode depending on that position.
Step 205: Determine whether the character to the left of character T in the text string is identical to the character to the left of the last character of the pattern string. If so, go to step 206; otherwise, go to step 207.
Step 206: Jump the pattern string as follows: align character T in the pattern string with character T in the text string. Then go to step 213.
Step 207: Jump the pattern string as follows: align character T in the pattern string with character T in the text string, then move the pattern string one further character to the right. Then go to step 213.
Step 208: Determine whether the character to the right of character T in the text string is identical to the character to the right of character T in the pattern string. If so, go to step 209; otherwise, go to step 210.
Step 209: Jump the pattern string as follows: align character T in the pattern string with character T in the text string. Then go to step 213.
Step 210: Determine whether the character to the right of character T in the text string occurs in the pattern string. If not, go to step 211; if so, go to step 212.
Step 211: Jump the pattern string along the text string by M + 2 characters, where M is the number of characters in the pattern string. Then go to step 213.
Step 212: Jump the pattern string as follows: align character T in the text string with character T in the pattern string, then move the pattern string one further character to the right. Then go to step 213.
Step 213: Determine whether the text string still contains characters that have not been matched. If so, return to step 201; otherwise, end the current process.
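Steps 200 to 213 can be combined into a complete search loop. The sketch below is an illustration under stated assumptions. The function name `wsunday_search` is hypothetical. It uses the rightmost occurrence of character T, and falls back to the plain Sunday shift when character T has no right neighbor in the text. It returns the first match offset together with the window offsets tried, so the attempt count can be compared with Fig. 1 and Fig. 3.

```python
def wsunday_search(text: str, pattern: str) -> tuple[int, list[int]]:
    """Return (offset of the first match or -1, window offsets tried)."""
    m, n = len(pattern), len(text)
    last = {c: k for k, c in enumerate(pattern)}  # rightmost index of each character
    attempts: list[int] = []
    i = 0
    while i + m <= n:                        # step 213: text left to examine
        attempts.append(i)
        if text[i:i + m] == pattern:         # step 201
            return i, attempts
        if i + m >= n:                       # no character T after the window
            break
        k = last.get(text[i + m], -1)        # step 202: where T occurs, if at all
        if k == -1:                          # step 203: jump M + 1
            i += m + 1
            continue
        step = m - k                         # steps 206/209: align the two T's
        if k == m - 1:                       # steps 204-207: T is the last character
            if m >= 2 and text[i + m - 1] != pattern[m - 2]:
                step += 1                    # step 207
        elif i + m + 1 < n:                  # steps 208-212: compare right neighbors
            right = text[i + m + 1]
            if right != pattern[k + 1]:
                # step 211 (jump M + 2) or step 212 (align, then one more)
                step = m + 2 if right not in last else step + 1
        i += step
    return -1, attempts


print(wsunday_search("bcdaefecdddceabcd", "abcd"))
# (13, [0, 5, 7, 13])
```

For the Fig. 3 example, windows are tried at offsets 0, 5, 7 and 13. That is 4 attempts instead of the 6 made by Sunday. Each extra rule skips only alignments that the additional comparison has already ruled out, so the result should agree with a naive search such as Python's `str.find`.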
In short, this embodiment examines the character that follows the last character of the substring currently aligned with the pattern string. It additionally checks the character to the left or right of that following character, and determines the jump distance from these character values. This effectively increases the jump distance and thereby improves matching efficiency.
The above embodiment is further described below using a specific application scenario.
As shown in Fig. 3, in row 2, the pattern string "abcd" is aligned with "bcda" in the text string and the match fails. The character following "bcda", namely "e", does not occur in "abcd". The pattern string therefore jumps M + 1 = 5 characters, reaching the position in row 3.
In row 3, the pattern string "abcd" is aligned with "fecd" in the text string and the match fails. The character following "fecd", namely "d", is the last character of "abcd". The character to its left in the text string, "d", differs from the character to the left of the last character "d" in "abcd", namely "c".
In this case the Sunday algorithm would move the pattern string "abcd" one character to the right. With the WSunday algorithm of this embodiment, the pattern string instead jumps as follows. The "d" in the pattern string is aligned with the "d" following "fecd" in the text string, and the pattern string is then moved one further character to the right, reaching the position in row 4. Compared with the Sunday algorithm, the pattern string therefore moves one extra character to the right.
In row 4, the pattern string "abcd" is aligned with "cddd" in the text string and the match fails. The character following "cddd", namely "c", is not the last character of "abcd", and the character to its right, "e", does not occur in "abcd". The pattern string therefore jumps M + 2 = 6 characters, reaching the position in row 5. In row 5, the pattern string "abcd" is aligned with "abcd" in the text string, the match succeeds, and the matching process ends.
Compared with the Sunday algorithm in Fig. 1, the WSunday algorithm in Fig. 3 jumps four characters further. As a result, WSunday needs only 4 match attempts, two fewer than the 6 attempts of the Sunday algorithm.
Consider a specific experimental scenario. The tests ran on Windows 7 with an Intel(R) Core(TM) Quad CPU and 4 GB of memory. An English corpus of 500 KB (about 500,000 characters) was selected. The BF (brute-force) algorithm, the KMP algorithm, the BM (Boyer-Moore) algorithm, the Sunday algorithm, the SundayNew algorithm [3], and the WSunday algorithm were each tested 500 times.
To make the tests more accurate, two groups of data were compared, one with pattern lengths below 10 and one with pattern lengths above 10. The experimental data are shown in Table 1 and Table 2.
Table 1 (experimental result when pattern string length is less than 10)
Table 2 (experimental result when pattern string length is more than 10)
The experimental results show that WSunday outperforms both the traditional Sunday algorithm and the SundayNew algorithm in the number of characters compared. The extra cost of WSunday lies mainly in the additional character comparison. This cost is offset by the increased jump distance, so the total number of match attempts is clearly lower than with the traditional Sunday algorithm. The speed advantage gained from the longer jumps is especially pronounced when the pattern length is below 10.
Based on the above embodiment, as shown in Fig. 4, the device for string matching in this embodiment (which may be called the matching device) comprises at least a determining unit 40, a judging unit 41, and a processing unit 42, wherein:
determining unit 40 is configured to determine a text string and the pattern string to be searched for in it;
judging unit 41 is configured to determine whether character T occurs in the pattern string, where character T is the character immediately following the substring of the text string currently aligned with the pattern string;
processing unit 42 is configured to determine, when character T occurs in the pattern string, the jump mode of the pattern string further according to the value of the character to the left or right of character T.
When character T occurs in the pattern string and the jump mode is determined further according to the value of the character to the left or right of character T, processing unit 42 is configured to:
when character T occurs in the pattern string and is the last character of the pattern string, further determine whether the character to the left of character T in the text string is identical to the character to the left of character T in the pattern string;
if so, align character T in the pattern string with character T in the text string;
otherwise, align character T in the pattern string with character T in the text string, and then move the pattern string one further character to the right.
When character T occurs in the pattern string and the jump mode is determined further according to the value of the character to the left or right of character T, processing unit 42 is configured to:
when character T occurs in the pattern string but is not the last character of the pattern string, further determine whether the character to the right of character T in the text string is identical to the character to the right of character T in the pattern string;
if so, align character T in the pattern string with character T in the text string;
otherwise, determine the jump mode of the pattern string according to whether the character to the right of character T in the text string occurs in the pattern string.
When determining the jump mode of the pattern string according to whether the character to the right of character T in the text string occurs in the pattern string, processing unit 42 is configured to:
determine whether the character to the right of character T in the text string occurs in the pattern string;
if so, align character T in the text string with character T in the pattern string, and then move the pattern string one further character to the right;
otherwise, jump the pattern string by M + 2 characters, where M is the number of characters in the pattern string.
Processing unit 42 is further configured to:
if character T does not occur in the pattern string, jump the pattern string along the text string by a step of M + 1, where M is the number of characters in the pattern string.
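The division of labor among units 40, 41 and 42 can be pictured as methods of a single class. This is a hypothetical sketch: the class and method names are ours, not the patent's. It makes the same assumptions as a plain WSunday search, namely the rightmost occurrence of character T and a Sunday-style fallback when character T has no right neighbor.

```python
class StringMatchingDevice:
    """Hypothetical sketch: units 40, 41 and 42 modelled as methods of one class."""

    def __init__(self, text: str, pattern: str) -> None:
        # Determining unit 40: fix the text string and the pattern string.
        self.text = text
        self.pattern = pattern
        self.m = len(pattern)
        # Rightmost index of each pattern character (Sunday-style preprocessing).
        self.last = {c: k for k, c in enumerate(pattern)}

    def judge(self, i: int) -> int:
        """Judging unit 41: index of character T in the pattern, or -1 if absent."""
        return self.last.get(self.text[i + self.m], -1)

    def jump(self, i: int, k: int) -> int:
        """Processing unit 42: jump distance for window start i, given k = judge(i)."""
        m, text, p = self.m, self.text, self.pattern
        if k == -1:
            return m + 1                    # T not in the pattern
        step = m - k                        # align the two occurrences of T
        if k == m - 1:                      # T is the last pattern character
            if m >= 2 and text[i + m - 1] != p[m - 2]:
                step += 1
        elif i + m + 1 < len(text):         # T is not the last pattern character
            right = text[i + m + 1]
            if right != p[k + 1]:
                step = m + 2 if right not in self.last else step + 1
        return step

    def find(self) -> int:
        """Offset of the first occurrence of the pattern in the text, or -1."""
        i, n = 0, len(self.text)
        while i + self.m <= n:
            if self.text[i:i + self.m] == self.pattern:
                return i
            if i + self.m >= n:             # no character T after the window
                break
            i += self.jump(i, self.judge(i))
        return -1


device = StringMatchingDevice("bcdaefecdddceabcd", "abcd")
print(device.find())  # 13
```

Keeping the judging and processing steps as separate methods mirrors Fig. 4. The search loop itself is identical to a plain WSunday search.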
In the embodiments of the present invention, the pattern string is matched against the text string. Character T is the character immediately following the substring currently aligned with the pattern string. When character T is found to occur in the pattern string, the jump mode of the pattern string is further determined according to the value of the character to the left or right of character T. Because the decision examines the left or right neighbor of the character following the currently matched substring, it has a clear advantage over comparing a single character. The decision condition is more flexible and performs better on strings with many repeated characters. This effectively improves string matching efficiency and reduces resource consumption.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code. Such media include, but are not limited to, disk storage, CD-ROM, and optical storage.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. Each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine. The instructions executed by that processor then produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner. The instructions stored in that memory then produce an article of manufacture that includes an instruction device. The instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device. A series of operational steps is then performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art who learn the basic inventive concept may make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the invention.
Those skilled in the art may make various changes and modifications to the embodiments of the present invention without departing from their spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them.

Claims (10)

  1. A character string matching method, characterized by comprising:
    determining a matching string and a pattern string to be searched for in it;
    judging whether a character T of the matching string, namely the character immediately following the part of the matching string currently aligned with the pattern string, appears in the pattern string;
    when it is determined that the character T appears in the pattern string, determining the jump mode of the pattern string according to the value of the character to the left or to the right of the character T.
  2. The method according to claim 1, characterized in that, when it is determined that the character T appears in the pattern string, determining the jump mode of the pattern string according to the value of the character to the left or to the right of the character T comprises:
    when it is determined that the character T appears in the pattern string and is the last character of the pattern string, further judging whether the character to the left of T in the matching string is the same as the character to the left of T in the pattern string;
    if so, aligning the character T in the pattern string with the character T in the matching string;
    otherwise, aligning the character T in the pattern string with the character T in the matching string and then moving the pattern string one more character to the right.
  3. The method according to claim 1, characterized in that, when it is determined that the character T appears in the pattern string, determining the jump mode of the pattern string according to the value of the character to the left or to the right of the character T comprises:
    when it is determined that the character T appears in the pattern string and is the last character of the pattern string, further judging whether the character to the right of T in the matching string is the same as the character to the right of T in the pattern string;
    if so, aligning the character T in the pattern string with the character T in the matching string;
    otherwise, determining the jump mode of the pattern string according to whether the character to the right of T in the matching string appears in the pattern string.
  4. The method according to claim 3, characterized in that determining the jump mode of the pattern string according to whether the character to the right of T in the matching string appears in the pattern string comprises:
    judging whether the character to the right of T in the matching string appears in the pattern string;
    if so, aligning the character T in the matching string with the character T in the pattern string and then moving the pattern string one more character to the right;
    otherwise, making the pattern string jump by a step of M+2 characters, where M is the number of characters in the pattern string.
  5. The method according to any one of claims 1 to 4, characterized by further comprising:
    if the character T does not appear in the pattern string, making the pattern string jump, relative to the matching string, by a step of M+1 characters, where M is the number of characters in the pattern string.
  6. A character string matching device, characterized by comprising:
    a determining unit, configured to determine a matching string and a pattern string to be searched for in it;
    a judging unit, configured to judge whether a character T of the matching string, namely the character immediately following the part of the matching string currently aligned with the pattern string, appears in the pattern string;
    a processing unit, configured to, when it is determined that the character T appears in the pattern string, determine the jump mode of the pattern string according to the value of the character to the left or to the right of the character T.
  7. The device according to claim 6, characterized in that, when determining the jump mode of the pattern string according to the value of the character to the left or to the right of the character T after it is determined that T appears in the pattern string, the processing unit is configured to:
    when it is determined that the character T appears in the pattern string and is the last character of the pattern string, further judge whether the character to the left of T in the matching string is the same as the character to the left of T in the pattern string;
    if so, align the character T in the pattern string with the character T in the matching string;
    otherwise, align the character T in the pattern string with the character T in the matching string and then move the pattern string one more character to the right.
  8. The device according to claim 6, characterized in that, when determining the jump mode of the pattern string according to the value of the character to the left or to the right of the character T after it is determined that T appears in the pattern string, the processing unit is configured to:
    when it is determined that the character T appears in the pattern string and is the last character of the pattern string, further judge whether the character to the right of T in the matching string is the same as the character to the right of T in the pattern string;
    if so, align the character T in the pattern string with the character T in the matching string;
    otherwise, determine the jump mode of the pattern string according to whether the character to the right of T in the matching string appears in the pattern string.
  9. The device according to claim 8, characterized in that, when determining the jump mode of the pattern string according to whether the character to the right of T in the matching string appears in the pattern string, the processing unit is configured to:
    judge whether the character to the right of T in the matching string appears in the pattern string;
    if so, align the character T in the matching string with the character T in the pattern string and then move the pattern string one more character to the right;
    otherwise, make the pattern string jump by a step of M+2 characters, where M is the number of characters in the pattern string.
  10. The device according to any one of claims 6 to 9, characterized in that the processing unit is further configured to:
    if the character T does not appear in the pattern string, make the pattern string jump, relative to the matching string, by a step of M+1 characters, where M is the number of characters in the pattern string.
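Claims 1 to 5 read as refinements of the shift rule of the Sunday string-matching algorithm, which is also the subject of both non-patent citations. The Python function below is a minimal sketch of one way those rules could be implemented; it is written for illustration and is not taken from the application. The name `improved_sunday_search` and the following details are assumptions. The "matching string" is treated as the text being searched, and M is the pattern length `m`. When T appears in the pattern, its rightmost occurrence (index `k`) is aligned with T, as in the standard Sunday algorithm. The translated claims call T the "last" character in both claim 2 and claim 3, but a right-hand neighbour of T only exists in the pattern when T is not at its end, so this sketch applies the left-neighbour test of claim 2 when `k == m - 1` and the right-neighbour tests of claims 3 and 4 when T occurs only at index 0. Every other case falls back to the plain Sunday shift `m - k`. Device claims 6 to 10 describe the same logic split across units.

```python
def improved_sunday_search(text: str, pattern: str) -> list[int]:
    """Return every start index at which `pattern` occurs in `text`.

    Hypothetical sketch of the shift rules in claims 1-5, under the
    assumed reading of claims 2-4 described in the accompanying text.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Rightmost index of each pattern character (later indices overwrite earlier ones).
    last = {ch: idx for idx, ch in enumerate(pattern)}
    matches: list[int] = []
    i = 0  # start of the current alignment window text[i:i + m]
    while i <= n - m:
        if text[i:i + m] == pattern:
            matches.append(i)
        if i + m >= n:  # no character T after the window, so nothing is left to try
            break
        t = text[i + m]  # character T immediately after the aligned window
        k = last.get(t)
        if k is None:
            # Claim 5: T is absent from the pattern, so jump past it (step M+1).
            shift = m + 1
        elif m >= 2 and k == m - 1:
            # Claim 2 (assumed reading): T is the pattern's last character.
            # Aligning it gives shift 1; if the left neighbours differ, that
            # alignment cannot match, so move one character further.
            shift = 1 if text[i + m - 1] == pattern[m - 2] else 2
        elif m >= 2 and k == 0:
            # Claims 3-4 (assumed reading): T occurs only as the pattern's first character.
            if i + m + 1 >= n:
                shift = m  # T is the final text character; any later window overruns the text
            elif text[i + m + 1] == pattern[1]:
                shift = m  # right neighbours agree: align T with T
            elif text[i + m + 1] in last:
                shift = m + 1  # align T, then move one more character
            else:
                shift = m + 2  # right neighbour absent from the pattern: step M+2
        else:
            # T elsewhere in the pattern: ordinary Sunday shift (not covered by the claims).
            shift = m - k
        i += shift
    return matches
```

Each refined step skips only alignments that have already been shown to fail, so the output should agree with a brute-force scan; for example, `improved_sunday_search("abracadabra", "abra")` returns `[0, 7]`.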

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710523458.9A CN107341224A (en) 2017-06-30 2017-06-30 The matching process and device of a kind of character string

Publications (1)

Publication Number Publication Date
CN107341224A true CN107341224A (en) 2017-11-10

Family

ID=60219365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710523458.9A Pending CN107341224A (en) 2017-06-30 2017-06-30 The matching process and device of a kind of character string

Country Status (1)

Country Link
CN (1) CN107341224A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577598A (en) * 2013-11-15 2014-02-12 曙光信息产业(北京)有限公司 Matching method and device for pattern string and text string
CN104750683A (en) * 2013-12-25 2015-07-01 中国移动通信集团公司 Character string matching method and device
CN106557553A (en) * 2016-10-27 2017-04-05 东软集团股份有限公司 The method and device of Data Matching

Non-Patent Citations (2)

Title
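Wu Xihong et al.: "Design and implementation of an improved Sunday pattern matching algorithm", Journal of Harbin University of Science and Technology *
Zhu Ninghong: "An improvement of the Sunday string matching algorithm", Journal of Xi'an University of Science and Technology *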

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN108920483A (en) * 2018-04-28 2018-11-30 南京搜文信息技术有限公司 Character string fast matching method based on Suffix array clustering
CN109344301A (en) * 2018-09-26 2019-02-15 长沙学院 Method, computer data processing system, the information management system of construction ballot mark table
CN112069303A (en) * 2020-09-17 2020-12-11 四川长虹电器股份有限公司 Matching search method and device for character strings and terminal
CN113836367A (en) * 2021-09-26 2021-12-24 杭州迪普科技股份有限公司 Character reverse matching method and device
CN113836367B (en) * 2021-09-26 2023-04-28 杭州迪普科技股份有限公司 Method and device for character reverse matching

Similar Documents

Publication Publication Date Title
US10796244B2 (en) Method and apparatus for labeling training samples
CN107341224A (en) The matching process and device of a kind of character string
CN105550170B (en) A kind of Chinese word cutting method and device
US20140229473A1 (en) Determining documents that match a query
CN104636349B (en) A kind of index data compression and the method and apparatus of index data search
US20190179818A1 (en) Merge join system and method
CN105468588A (en) Character string matching method and apparatus
CN110347782A (en) Article duplicate checking method, apparatus and electronic equipment
Chen et al. Bit-parallel algorithms for exact circular string matching
CN106469186A (en) A kind of method and device of character string comparison
Xu et al. Bit-parallel multiple approximate string matching based on GPU
CN105359142A (en) Hash join method, device and database management system
US20130179419A1 (en) Retrieval of prefix completions by way of walking nodes of a trie data structure
Faro et al. An efficient skip-search approach to swap matching
CN100527134C (en) Multiple modes search method and system
CN105892995A (en) Minus searching method and device as well as processor
Nishimura et al. Accelerating the Smith-waterman algorithm using bitwise parallel bulk computation technique on GPU
Zhao et al. FARGO: Fast maximum inner product search via global multi-probing
Sachan et al. A generalized links and text properties based forum crawler
CN108304467A (en) For matched method between text
AbdulRazzaq et al. Parallel implementation of maximum-shift algorithm using OpenMp
CN105264522A (en) Method and apparatus for constructing suffix array
DK178764B1 (en) A computer-implemented method for carrying out a search without the use of signatures
CN107169313A (en) The read method and computer-readable recording medium of DNA data files
CN109241124A (en) A kind of method and system of quick-searching similar character string

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171110