CN107341224A - Character string matching method and device - Google Patents
- Publication number
- CN107341224A (application number CN201710523458.9A)
- Authority
- CN
- China
- Prior art keywords
- character
- string
- pattern string
- matching
- pattern
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of computers, and in particular to a character string matching method and device, intended to improve string-matching efficiency and reduce resource consumption. The method is as follows: when a pattern string is matched against a string to be matched, once it is determined that the character T immediately following the partial string currently aligned with the pattern string appears in the pattern string, the shift of the pattern string is further determined according to the value of the character to the left or to the right of character T. Because the left or right neighbour of the character following the currently matched partial string is also examined, the decision is more informative than a single-character comparison, the decision conditions are more flexible, and the advantage is greater when matching strings that contain many repeated characters. Matching efficiency is thereby effectively improved and resource consumption reduced.
Description
Technical field
The present invention relates to the field of computers, and in particular to a character string matching method and device.
Background art
With the rapid development of networked information, information security and information retrieval have become important problems for today's information society, and string pattern matching is key to solving them. Whether detecting illegal information among large amounts of content or helping users retrieve content of interest from massive data, string pattern matching is indispensable. Its applications span many areas, including search engines, document management and virus detection.
The enormous volume of network information places higher demands on the efficiency and accuracy of string matching. Typically, after a user enters a custom keyword, the system matches the keyword against the text of a subtitle library, determines whether matching subtitles exist, and returns the retrieval result. In short, the string pattern matching problem is to determine, in as short a time as possible, whether a pattern string (i.e., the keyword) occurs in a given text and, if it does, the position at which it occurs.
At present, the Sunday algorithm is commonly used to locate pattern strings. The Sunday algorithm is an efficient, fast string-matching algorithm that is noticeably faster than classic algorithms such as the Knuth-Morris-Pratt (KMP) algorithm. It places no strict requirement on the comparison order: comparison may start from either the left or the right end of the string corresponding to the text to be matched (hereinafter, the string to be matched). When a mismatch occurs, its jump distance is large, so more characters can be skipped, which improves matching efficiency.
When a match attempt fails, the Sunday algorithm examines the character immediately after the last character of the string to be matched that took part in the comparison, and checks whether that character occurs in the pattern string. If it does not, the pattern string is shifted by the pattern length plus one character. If it does, the pattern string is shifted by the distance from the rightmost occurrence of that character in the pattern string to the last character of the pattern, plus one, so that the character is aligned with its rightmost occurrence in the pattern string. The Sunday algorithm stores these distances in a next array: every character of the pattern string is recorded in the next array together with a distance value, computed from the position of its rightmost occurrence to the last character of the pattern. If a character does not appear in the pattern string, its distance value is the length of the pattern string.
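The next array described above can be sketched in Python as follows. This is a minimal illustration, not code from the patent: the names `build_next` and `sunday_shift` are hypothetical, and a dictionary stands in for the array so that any alphabet works. Each character's distance is counted from its rightmost occurrence to the last position of the pattern; absent characters get the pattern length, and the shift is the distance plus one.

```python
from __future__ import annotations


def build_next(pattern: str) -> dict[str, int]:
    """Hypothetical Sunday 'next' table: distance from each character's
    rightmost occurrence in the pattern to the pattern's last position."""
    m = len(pattern)
    table: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        # Later (further right) occurrences overwrite earlier ones,
        # so each entry ends up describing the rightmost position.
        table[ch] = m - 1 - i
    return table


def sunday_shift(table: dict[str, int], m: int, ch: str) -> int:
    """Shift after a failed attempt, given the character just past the
    window; characters absent from the pattern use distance m (shift m + 1)."""
    return table.get(ch, m) + 1
```

For the pattern "abcd" the table holds a=3, b=2, c=1, d=0, so a following "e" yields a shift of 5 and a following "d" a shift of 1, the two cases walked through for Fig. 1.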
For example, Fig. 1 illustrates the matching process of the Sunday algorithm.
In the second row, the pattern string "abcd" does not match "bcda" in the string to be matched, and the next character "e" does not occur in the pattern string; the jump distance is therefore 4+1=5.
In the third row, the pattern string "abcd" does not match "fecd" in the string to be matched, and the next character "d" occurs in the pattern string, so the pattern is shifted by 1 according to the next array (i.e., the pattern string moves right by one character so that the next character "d" is aligned with the rightmost "d" in the pattern string).
……
Matching and shifting continue in the same way according to the above rules until the match succeeds; in total, 6 match attempts are made.
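A compact left-to-right Sunday search that also records every window position it compares can be sketched as follows. The function name `sunday_search` and the attempt log are illustrative additions, not part of the patent. Run on the string to be matched used in the embodiment, "bcdaefecdddceabcd", with the pattern "abcd", it visits windows 0, 5, 6, 7, 9 and 13, i.e. six attempts, assuming Fig. 1 uses that same string (its windows "bcda" and "fecd" suggest it does).

```python
from __future__ import annotations


def sunday_search(text: str, pattern: str) -> tuple[int, list[int]]:
    """Left-to-right Sunday search (illustrative sketch).

    Returns (index of the first match or -1, window start positions compared).
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0, []
    # next array: distance from each character's rightmost occurrence
    # to the last position (later keys overwrite earlier ones)
    nxt = {ch: m - 1 - i for i, ch in enumerate(pattern)}
    attempts: list[int] = []
    pos = 0
    while pos + m <= n:
        attempts.append(pos)
        if text[pos:pos + m] == pattern:
            return pos, attempts
        if pos + m >= n:              # no character after the window: stop
            break
        # absent characters get distance m, i.e. a shift of m + 1
        pos += nxt.get(text[pos + m], m) + 1
    return -1, attempts
```

`sunday_search("bcdaefecdddceabcd", "abcd")` returns `(13, [0, 5, 6, 7, 9, 13])`.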
However, when the Sunday algorithm scans the string to be matched from front to back, it can shift according to the next array, and move on to the next round of matching, only after each match attempt has failed, so its efficiency is limited.
Some improved algorithms already exist. For example, one reduces the number of full-pattern comparisons by comparing suffixes, but the suffix comparison itself also consumes considerable resources. Another compares the first substring of the pattern string; although this reduces the number of full comparisons, the number of single-character comparisons remains large.
In view of this, a new matching algorithm needs to be designed to overcome the above drawbacks.
Summary of the invention
Embodiments of the present invention provide a character string matching method and device, so as to improve string-matching efficiency and reduce resource consumption.
The specific technical solutions provided by the embodiments of the present invention are as follows:
A character string matching method, comprising:
determining a string to be matched and a pattern string serving as the matching object;
determining whether the character T immediately following the partial string of the string to be matched that is currently aligned with the pattern string appears in the pattern string;
when it is determined that character T appears in the pattern string, further determining the shift of the pattern string according to the value of the character to the left or to the right of character T.
Optionally, when it is determined that character T appears in the pattern string, further determining the shift of the pattern string according to the value of the character to the left or right of character T comprises:
when it is determined that character T appears in the pattern string and is the last character of the pattern string, further determining whether the character to the left of character T in the string to be matched is identical to the character to the left of character T in the pattern string;
if so, aligning character T in the pattern string with character T in the string to be matched;
otherwise, aligning character T in the pattern string with character T in the string to be matched and then shifting the pattern string right by one more character.
Optionally, when it is determined that character T appears in the pattern string, further determining the shift of the pattern string according to the value of the character to the left or right of character T comprises:
when it is determined that character T appears in the pattern string and is not the last character of the pattern string, further determining whether the character to the right of character T in the string to be matched is identical to the character to the right of character T in the pattern string;
if so, aligning character T in the pattern string with character T in the string to be matched;
otherwise, determining the shift of the pattern string according to whether the character to the right of character T in the string to be matched occurs in the pattern string.
Optionally, determining the shift of the pattern string according to whether the character to the right of character T in the string to be matched occurs in the pattern string comprises:
determining whether the character to the right of character T in the string to be matched occurs in the pattern string;
if so, aligning character T in the string to be matched with character T in the pattern string and then shifting the pattern string right by one more character;
otherwise, shifting the pattern string by M+2 characters, where M is the number of characters in the pattern string.
Optionally, the method further comprises:
if character T does not appear in the pattern string, shifting the pattern string relative to the string to be matched by a step of M+1, where M is the number of characters in the pattern string.
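The optional shift rules above can be collected into one decision function. This is a sketch under stated assumptions, not the patent's code: the name `wsunday_shift` is hypothetical; "character T appears in the pattern string" is read as its rightmost occurrence (as in the Sunday next array), so "aligning character T" means the plain Sunday shift; and when no right-hand character exists at the end of the string to be matched, the plain Sunday shift is used.

```python
def wsunday_shift(text: str, pos: int, pattern: str) -> int:
    """Shift for the window text[pos:pos+m] after a failed comparison.

    Assumes text[pos + m] (character T) exists. Hypothetical helper that
    applies the rules summarised above.
    """
    m = len(pattern)
    t = text[pos + m]                 # character T, just past the window
    j = pattern.rfind(t)              # rightmost occurrence of T, -1 if absent
    if j == -1:
        return m + 1                  # T not in the pattern: shift by M + 1
    if m == 1:
        return 1                      # degenerate pattern: plain Sunday shift
    if j == m - 1:
        # T is the last pattern character: compare the left-hand characters
        if text[pos + m - 1] == pattern[m - 2]:
            return 1                  # align the two T characters
        return 2                      # align, then move right one more
    sunday = m - j                    # shift that aligns the two T characters
    if pos + m + 1 >= len(text):
        return sunday                 # no right-hand character to inspect
    right = text[pos + m + 1]
    if right == pattern[j + 1]:
        return sunday                 # right-hand characters agree
    if right in pattern:
        return sunday + 1             # align, then move right one more
    return m + 2                      # right-hand character absent: M + 2
```

With the string "bcdaefecdddceabcd" and the pattern "abcd", windows at positions 0, 5 and 7 yield shifts of 5, 2 and 6.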
A character string matching device, comprising:
a determining unit, configured to determine a string to be matched and a pattern string serving as the matching object;
a judging unit, configured to determine whether the character T immediately following the partial string of the string to be matched that is currently aligned with the pattern string appears in the pattern string;
a processing unit, configured to, when it is determined that character T appears in the pattern string, further determine the shift of the pattern string according to the value of the character to the left or to the right of character T.
Optionally, when, upon determining that character T appears in the pattern string, the shift of the pattern string is further determined according to the value of the character to the left or right of character T, the processing unit is configured to:
when it is determined that character T appears in the pattern string and is the last character of the pattern string, further determine whether the character to the left of character T in the string to be matched is identical to the character to the left of character T in the pattern string;
if so, align character T in the pattern string with character T in the string to be matched;
otherwise, align character T in the pattern string with character T in the string to be matched and then shift the pattern string right by one more character.
Optionally, when, upon determining that character T appears in the pattern string, the shift of the pattern string is further determined according to the value of the character to the left or right of character T, the processing unit is configured to:
when it is determined that character T appears in the pattern string and is not the last character of the pattern string, further determine whether the character to the right of character T in the string to be matched is identical to the character to the right of character T in the pattern string;
if so, align character T in the pattern string with character T in the string to be matched;
otherwise, determine the shift of the pattern string according to whether the character to the right of character T in the string to be matched occurs in the pattern string.
Optionally, when determining the shift of the pattern string according to whether the character to the right of character T in the string to be matched occurs in the pattern string, the processing unit is configured to:
determine whether the character to the right of character T in the string to be matched occurs in the pattern string;
if so, align character T in the string to be matched with character T in the pattern string and then shift the pattern string right by one more character;
otherwise, shift the pattern string by M+2 characters, where M is the number of characters in the pattern string.
Optionally, the processing unit is further configured to:
if character T does not appear in the pattern string, shift the pattern string relative to the string to be matched by a step of M+1, where M is the number of characters in the pattern string.
The beneficial effects of the present invention are as follows:
In the embodiments of the present invention, when a pattern string is matched against a string to be matched, once it is determined that the character T immediately following the partial string currently aligned with the pattern string appears in the pattern string, the shift of the pattern string is further determined according to the value of the character to the left or right of character T. Examining the left or right neighbour of the character following the currently matched partial string is clearly more informative than a single-character comparison, the decision conditions are more flexible, and the advantage is greater when matching strings that contain many repeated characters. Matching efficiency is thereby effectively improved and resource consumption reduced.
Brief description of the drawings
Fig. 1 is a schematic diagram of the matching process of the conventional Sunday algorithm in the prior art;
Fig. 2 is a flowchart of the WSunday algorithm matching process in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the WSunday algorithm matching process in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the functional structure of the matching device in an embodiment of the present invention.
Detailed description of the embodiments
To improve string-matching efficiency and reduce resource consumption, the embodiments of the present invention provide a new matching scheme. Specifically, on the basis of an analysis of the Sunday matching algorithm, the present invention proposes an improved Sunday algorithm, called the WSunday algorithm: when characters mismatch but the following character is not a bad character (that is, it does occur in the pattern string), one additional character comparison is made, and its result is used to increase the jump distance of the pattern string, improving matching efficiency.
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Like the Sunday algorithm, the WSunday algorithm begins with a preprocessing stage that computes the next array for the pattern string, i.e., it precomputes each jump distance by which the pattern string can move right. In the WSunday algorithm, however, the jump distance of the pattern string depends not only on the character that follows the last character of the segment of the string to be matched currently aligned with the pattern string, but also on whether the neighbouring character of that following character occurs in the pattern string.
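Under a rightmost-occurrence reading of the rules (an interpretation, not notation from the patent), the shift after a failed comparison at window start $pos$ can be written as one case analysis. Here $S$ denotes the string to be matched (renamed to avoid a clash with character T), $P$ the pattern of length $M$, $c = S[pos+M]$ the character T, and $j$ the 0-based index of the rightmost occurrence of $c$ in $P$:

```latex
\mathrm{shift}(pos) =
\begin{cases}
M + 1,   & c \notin P \\
1,       & j = M-1,\ S[pos+M-1] = P[M-2] \\
2,       & j = M-1,\ S[pos+M-1] \neq P[M-2] \\
M - j,   & j < M-1,\ S[pos+M+1] = P[j+1] \\
M - j + 1, & j < M-1,\ S[pos+M+1] \neq P[j+1],\ S[pos+M+1] \in P \\
M + 2,   & j < M-1,\ S[pos+M+1] \notin P
\end{cases}
```

Each case only skips alignments that cannot succeed: for instance, when $j = M-1$ and the left-hand characters differ, the Sunday alignment (shift 1) is already known to fail, so the next candidate is shift 2. With $S$ = "bcdaefecdddceabcd" and $P$ = "abcd", the window at position 5 falls into the third case (shift 2) and the window at position 7 into the last case (shift $M+2 = 6$).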
Specifically, as shown in Fig. 2, in an embodiment of the present invention, the WSunday algorithm matching process comprises the following steps:
Step 200: Determine the string to be matched and the pattern string serving as the matching object.
In this embodiment of the present invention, assume that the string to be matched is denoted T, with T = "bcdaefecdddceabcd", and the pattern string is denoted P, with P = "abcd".
The matching process of the WSunday algorithm is described below, taking the search for P in T as an example.
Step 201: Compare the pattern string with the partial string of the string to be matched that is currently aligned with it, and determine whether the match succeeds; if so, end the current process; otherwise, perform step 202.
Step 202: Determine whether the character immediately following the partial string (hereinafter, character T) appears in the pattern string; if it does not, perform step 203; otherwise, perform step 204.
Step 203: Shift the pattern string relative to the string to be matched by M+1 characters, so that it is aligned with a new partial string of the string to be matched, where M is the number of characters in the pattern string; then perform step 213.
Step 204: Determine whether character T is the last character of the pattern string; if so, perform step 205; otherwise, perform step 208.
If character T appears in the pattern string, the Sunday algorithm directly aligns character T in the pattern string with character T in the string to be matched. The WSunday algorithm, by contrast, first determines where character T occurs in the pattern string and applies a different shift depending on that position.
Step 205: Determine whether the character to the left of character T in the string to be matched is identical to the character to the left of the last character of the pattern string; if so, perform step 206; otherwise, perform step 207.
Step 206: Shift the pattern string as follows: align character T in the pattern string with character T in the string to be matched; then perform step 213.
Step 207: Shift the pattern string as follows: align character T in the pattern string with character T in the string to be matched, then shift the pattern string right by one more character; then perform step 213.
Step 208: Determine whether the character to the right of character T in the string to be matched is identical to the character to the right of character T in the pattern string; if so, perform step 209; otherwise, perform step 210.
Step 209: Shift the pattern string as follows: align character T in the pattern string with character T in the string to be matched; then perform step 213.
Step 210: Determine whether the character to the right of character T in the string to be matched occurs in the pattern string; if it does not, perform step 211; if it does, perform step 212.
Step 211: Shift the pattern string relative to the string to be matched by M+2 characters, where M is the number of characters in the pattern string; then perform step 213.
Step 212: Shift the pattern string as follows: align character T in the string to be matched with character T in the pattern string, then shift the pattern string right by one more character; then perform step 213.
Step 213: Determine whether any characters of the string to be matched remain unmatched; if so, return to step 201; otherwise, end the current process.
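Steps 200 to 213 can be sketched as a single loop. This is an illustrative Python reading of the flowchart, not the patent's implementation: the function name `wsunday_search` and the attempt log are additions for inspection, and reading "character T appears in the pattern string" as its rightmost occurrence is an assumption. Step numbers in the comments refer to Fig. 2.

```python
from __future__ import annotations


def wsunday_search(text: str, pattern: str) -> tuple[int, list[int]]:
    """Illustrative WSunday search: returns the first match index (or -1)
    and the window start positions that were compared."""
    n, m = len(text), len(pattern)
    attempts: list[int] = []
    if m == 0:
        return 0, attempts
    pos = 0
    while pos + m <= n:                                  # step 213
        attempts.append(pos)
        if text[pos:pos + m] == pattern:                 # step 201
            return pos, attempts
        if pos + m >= n:                                 # no character T left
            break
        j = pattern.rfind(text[pos + m])                 # step 202
        if j == -1:
            pos += m + 1                                 # step 203
        elif j == m - 1:                                 # step 204: last character
            if m == 1 or text[pos + m - 1] == pattern[m - 2]:  # step 205
                pos += 1                                 # step 206
            else:
                pos += 2                                 # step 207
        else:
            shift = m - j                                # aligns the two T characters
            right = text[pos + m + 1] if pos + m + 1 < n else None
            if right is None or right == pattern[j + 1]:   # step 208
                pos += shift                             # step 209
            elif right not in pattern:                   # step 210
                pos += m + 2                             # step 211
            else:
                pos += shift + 1                         # step 212
    return -1, attempts
```

For T = "bcdaefecdddceabcd" and P = "abcd" this returns `(13, [0, 5, 7, 13])`. Because each shift only skips alignments already known to fail, the match index agrees with Python's `str.find` on the same inputs.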
In short, in the embodiments of the present invention, when the character following the last character of the partial string of the string to be matched that is currently aligned with the pattern string is examined, an additional check of the character to the left or right of that following character is made, so that the jump distance is chosen according to the character values found. This effectively increases the jump distance and thereby improves matching efficiency.
The above embodiment is further described below using a specific application scenario.
As shown in Fig. 3, in row 2, when the pattern string "abcd" is aligned with "bcda" in the string to be matched, the match fails; the next character after "bcda", "e", does not appear in the pattern string "abcd", so "abcd" is shifted by M+1=5, reaching the position in row 3.
In row 3, when "abcd" is aligned with "fecd" in the string to be matched, the match fails. The next character after "fecd", "d", is found to be the last character of "abcd", and the character to the left of that "d" (which is "d") does not match the character to the left of the last character "d" of "abcd" (which is "c").
At this point the Sunday algorithm would shift "abcd" right by one character. In this embodiment of the present invention, the WSunday algorithm instead shifts "abcd" as follows: the character "d" in the pattern string is aligned with the "d" following "fecd" in the string to be matched, and the pattern is then shifted right by one more character, reaching the position in row 4. Compared with the Sunday algorithm, the pattern string "abcd" thus moves one character further to the right.
In row 4, when "abcd" is aligned with "cddd" in the string to be matched, the match fails. The next character after "cddd", "c", is not the last character of "abcd", and the character to its right, "e", does not appear in "abcd", so "abcd" is shifted by M+2=6, reaching the position in row 5. In row 5, when "abcd" is aligned with "abcd" in the string to be matched, the match succeeds and the matching process ends.
Compared with the Sunday algorithm shown in Fig. 1, the WSunday algorithm shown in Fig. 3 moves the pattern four characters further, so the number of match attempts falls from 6 with the Sunday algorithm to 4 with the WSunday algorithm, a reduction of 2.
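The 6-versus-4 comparison can be reproduced with one loop that switches the WSunday refinements on or off. This is a sketch only: `count_attempts` and its `improved` flag are hypothetical names, and the same rightmost-occurrence reading of character T is assumed.

```python
from __future__ import annotations


def count_attempts(text: str, pattern: str, improved: bool) -> list[int]:
    """Window start positions compared by plain Sunday (improved=False) or
    with the WSunday refinements (improved=True); stops at the first match."""
    n, m = len(text), len(pattern)
    positions: list[int] = []
    if m == 0:
        return positions
    pos = 0
    while pos + m <= n:
        positions.append(pos)
        if text[pos:pos + m] == pattern or pos + m >= n:
            break                              # match found, or no character T
        j = pattern.rfind(text[pos + m])       # rightmost occurrence of T
        if j == -1:
            pos += m + 1                       # T absent from the pattern
            continue
        shift = m - j                          # plain Sunday: align the two Ts
        if improved and m > 1:
            if j == m - 1:
                if text[pos + m - 1] != pattern[m - 2]:
                    shift += 1                 # left-hand characters differ
            elif pos + m + 1 < n and text[pos + m + 1] != pattern[j + 1]:
                right = text[pos + m + 1]      # right-hand characters differ
                shift = shift + 1 if right in pattern else m + 2
        pos += shift
    return positions
```

For "bcdaefecdddceabcd" and "abcd", `improved=False` gives `[0, 5, 6, 7, 9, 13]` and `improved=True` gives `[0, 5, 7, 13]`. Counting windows this way does not capture the extra single-character comparisons that WSunday performs.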
Consider a specific experimental scenario. On a Windows 7 platform with an Intel® Core™ Quad CPU and 4 GB of memory, an English corpus of 500 KB (about 500,000 characters) was selected, and the BF algorithm, KMP algorithm, BM algorithm, Sunday algorithm, SundayNew algorithm [3] and WSunday algorithm were each tested 500 times.
To make the test more reliable, two groups of data were compared, with pattern-string lengths below 10 and above 10 respectively; the experimental data are shown in Table 1 and Table 2.
Table 1 (experimental results for pattern strings shorter than 10 characters)
Table 2 (experimental results for pattern strings longer than 10 characters)
The experimental results show that the WSunday algorithm outperforms both the traditional Sunday algorithm and the SundayNew algorithm in the number of characters compared. The cost of the additional character comparisons made by the WSunday algorithm is offset by the increased jump distance, and the total number of match attempts is markedly lower than with the traditional Sunday algorithm. In particular, when the pattern-string length is below 10, the speed advantage gained from the longer jump distance is more pronounced.
Based on the above embodiment, as shown in Fig. 4, in an embodiment of the present invention, the device for performing string matching (which may be called the matching device) comprises at least a determining unit 40, a judging unit 41 and a processing unit 42, wherein:
the determining unit 40 is configured to determine a string to be matched and a pattern string serving as the matching object;
the judging unit 41 is configured to determine whether the character T immediately following the partial string of the string to be matched that is currently aligned with the pattern string appears in the pattern string;
the processing unit 42 is configured to, when it is determined that character T appears in the pattern string, further determine the shift of the pattern string according to the value of the character to the left or to the right of character T.
When, upon determining that character T appears in the pattern string, the shift of the pattern string is further determined according to the value of the character to the left or right of character T, the processing unit 42 is configured to:
when it is determined that character T appears in the pattern string and is the last character of the pattern string, further determine whether the character to the left of character T in the string to be matched is identical to the character to the left of character T in the pattern string;
if so, align character T in the pattern string with character T in the string to be matched;
otherwise, align character T in the pattern string with character T in the string to be matched and then shift the pattern string right by one more character.
When, upon determining that character T appears in the pattern string, the shift of the pattern string is further determined according to the value of the character to the left or right of character T, the processing unit 42 is configured to:
when it is determined that character T appears in the pattern string and is not the last character of the pattern string, further determine whether the character to the right of character T in the string to be matched is identical to the character to the right of character T in the pattern string;
if so, align character T in the pattern string with character T in the string to be matched;
otherwise, determine the shift of the pattern string according to whether the character to the right of character T in the string to be matched occurs in the pattern string.
When determining the shift of the pattern string according to whether the character to the right of character T in the string to be matched occurs in the pattern string, the processing unit 42 is configured to:
determine whether the character to the right of character T in the string to be matched occurs in the pattern string;
if so, align character T in the string to be matched with character T in the pattern string and then shift the pattern string right by one more character;
otherwise, shift the pattern string by M+2 characters, where M is the number of characters in the pattern string.
The processing unit 42 is further configured to:
if character T does not appear in the pattern string, shift the pattern string relative to the string to be matched by a step of M+1, where M is the number of characters in the pattern string.
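One way to map the three units of Fig. 4 onto software is sketched below. The class and method names are hypothetical, the units are modelled as plain Python classes, and the shift logic restates the rules listed above under the rightmost-occurrence assumption for character T.

```python
from __future__ import annotations


class DeterminingUnit:
    """Unit 40: fixes the string to be matched and the pattern string."""

    def determine(self, text: str, pattern: str) -> tuple[str, str]:
        if not pattern:
            raise ValueError("pattern string must not be empty")
        return text, pattern


class JudgingUnit:
    """Unit 41: rightmost index of character T in the pattern, or -1."""

    def locate_t(self, text: str, pos: int, pattern: str) -> int:
        return pattern.rfind(text[pos + len(pattern)])


class ProcessingUnit:
    """Unit 42: chooses the shift from character T's left/right neighbour."""

    def shift(self, text: str, pos: int, pattern: str, j: int) -> int:
        m = len(pattern)
        if j == -1:
            return m + 1                      # T absent: M + 1
        if j == m - 1:                        # T is the last pattern character
            if m == 1 or text[pos + m - 1] == pattern[m - 2]:
                return 1
            return 2
        right = pos + m + 1
        if right >= len(text) or text[right] == pattern[j + 1]:
            return m - j                      # plain alignment of the two Ts
        return m - j + 1 if text[right] in pattern else m + 2


class MatchingDevice:
    """Hypothetical wiring of units 40, 41 and 42 into a first-match search."""

    def __init__(self) -> None:
        self.determining = DeterminingUnit()
        self.judging = JudgingUnit()
        self.processing = ProcessingUnit()

    def find(self, text: str, pattern: str) -> int:
        text, pattern = self.determining.determine(text, pattern)
        n, m = len(text), len(pattern)
        pos = 0
        while pos + m <= n:
            if text[pos:pos + m] == pattern:
                return pos
            if pos + m >= n:                  # no character T left
                break
            j = self.judging.locate_t(text, pos, pattern)
            pos += self.processing.shift(text, pos, pattern, j)
        return -1
```

`MatchingDevice().find("bcdaefecdddceabcd", "abcd")` returns 13. Keeping the judging and processing logic in separate objects simply mirrors the unit boundaries of Fig. 4.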
In the embodiments of the present invention, when a pattern string is matched against a string to be matched, once it is determined that the character T immediately following the partial string currently aligned with the pattern string appears in the pattern string, the shift of the pattern string is further determined according to the value of the character to the left or right of character T. Examining the left or right neighbour of the character following the currently matched partial string is clearly more informative than a single-character comparison, the decision conditions are more flexible, and the advantage is greater when matching strings that contain many repeated characters. Matching efficiency is thereby effectively improved and resource consumption reduced.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art may make various changes and modifications to the embodiments of the present invention without departing from the spirit and scope of those embodiments. Thus, if such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass them.
Claims (10)
- 1. A character string matching method, characterized by comprising: determining a string to be matched and a pattern string serving as the matching object; determining whether the character T immediately following the partial string of the string to be matched that is currently aligned with the pattern string appears in the pattern string; and, when it is determined that character T appears in the pattern string, further determining the shift of the pattern string according to the value of the character to the left or to the right of character T.
- 2. the method as described in claim 1, it is characterised in that when determining that the character T is appeared in the pattern string, enter one The mode that redirects for according to the left side character of the character T or the value of right side character, determining the pattern string is walked, including:Determine that the character T is appeared in the pattern string, and the character T be the pattern string in ultimate character when, enter One step judge character T in the matching string left side character whether the left side character phase with character T in the pattern string Together;If so, the character T in pattern string is alignd with the character T in matching string;Otherwise, after the character T in pattern string is alignd with the character T in matching string, then pattern string moved to right into a word Symbol.
- 3. the method as described in claim 1, it is characterised in that when determining that the character T is appeared in the pattern string, enter one The mode that redirects for according to the left side character of the character T or the value condition of right side character, determining the pattern string is walked, including:Determine that the character T is appeared in the pattern string, and the character T be the pattern string in ultimate character when, Determine whether character T in the matching string right side character whether the right side character with character T in the pattern string It is identicalIf so, then the character T in pattern string is alignd with the character T in matching string;Otherwise, the appearance situation according to the right side character of character T in the matching string in pattern string, determines the mould Formula string redirects mode.
- 4. method as claimed in claim 3, it is characterised in that according to the right side character of character T in the matching string Appearance situation in pattern string, the mode that redirects of the pattern string is determined, including:If judging whether the right side character of character T in the matching string occurs in pattern string;If so, after then the character T in matching string is alignd with the character T in pattern string, then pattern string moved to right one Character;Otherwise, then pattern string is redirected according to character M+2, wherein, M is the number of characters of pattern string.
- 5. the method as described in claim any one of 1-4, it is characterised in that further comprise:If the character T is not appeared in the pattern string, matching string is compareed, pattern string is entered according to step-length M+1 Row redirects, wherein, M is the number of characters of pattern string.
- 6. A character string matching device, characterized by comprising: a determining unit, configured to determine a matching string and a pattern string serving as the matching object; a judging unit, configured to judge whether character T, the next character in the matching string immediately after the partial string currently aligned with the pattern string, appears in the pattern string; and a processing unit, configured to, when it is determined that character T appears in the pattern string, further determine the jump mode of the pattern string according to the value of the left-side character or the right-side character of character T.
- 7. The device according to claim 6, characterized in that, when it is determined that character T appears in the pattern string and the jump mode of the pattern string is further determined according to the value of the left-side character or the right-side character of character T, the processing unit is configured to: when it is determined that character T appears in the pattern string and character T is the last character in the pattern string, further judge whether the left-side character of character T in the matching string is the same as the left-side character of character T in the pattern string; if so, align character T in the pattern string with character T in the matching string; otherwise, align character T in the pattern string with character T in the matching string and then shift the pattern string one more character to the right.
- 8. The device according to claim 6, characterized in that, when it is determined that character T appears in the pattern string and the jump mode of the pattern string is further determined according to the value of the left-side character or the right-side character of character T, the processing unit is configured to: when it is determined that character T appears in the pattern string and character T is the last character in the pattern string, further judge whether the right-side character of character T in the matching string is the same as the right-side character of character T in the pattern string; if so, align character T in the pattern string with character T in the matching string; otherwise, determine the jump mode of the pattern string according to whether the right-side character of character T in the matching string appears in the pattern string.
- 9. The device according to claim 8, characterized in that, when determining the jump mode of the pattern string according to whether the right-side character of character T in the matching string appears in the pattern string, the processing unit is configured to: judge whether the right-side character of character T in the matching string appears in the pattern string; if so, align character T in the matching string with character T in the pattern string and then shift the pattern string one more character to the right; otherwise, make the pattern string jump by M + 2 characters, where M is the number of characters in the pattern string.
- 10. The device according to any one of claims 6-9, characterized in that the processing unit is further configured to: if character T does not appear in the pattern string, make the pattern string jump relative to the matching string with a step length of M + 1, where M is the number of characters in the pattern string.
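Claims 1 and 5 correspond closely to the shift rule of Sunday's quick-search algorithm, the algorithm named in both non-patent citations: the character T just after the current window decides the jump, and a T that is absent from the pattern lets the pattern jump M + 1 characters. The Python sketch below illustrates this under stated assumptions and is not the applicant's implementation. The function name `sunday_search` is hypothetical; aligning the *rightmost* T in the pattern is the standard Sunday rule, assumed here because the claims do not say which occurrence is used; and the left-neighbour check is one possible reading of claim 2, applied only when T is the pattern's last character. The right-neighbour rules of claims 3 and 4 are not modelled.

```python
from __future__ import annotations


def sunday_search(text: str, pattern: str) -> list[int]:
    """Return every start index at which `pattern` occurs in `text`.

    Base rule: Sunday's quick search (align the rightmost T in the
    pattern with T, or jump M + 1 when T is absent, as in claim 5).
    Extra rule: a left-neighbour check following an assumed reading
    of claim 2. The right-neighbour rules of claims 3 and 4 are omitted.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Rightmost index of every character that occurs in the pattern.
    last = {ch: idx for idx, ch in enumerate(pattern)}
    hits = []
    i = 0  # start of the window currently aligned with the pattern
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        if i + m >= n:
            break  # no character T follows the window, so no later match
        t = text[i + m]  # character T: first character right of the window
        if t not in last:
            i += m + 1  # T absent from the pattern: jump M + 1
            continue
        shift = m - last[t]  # align the rightmost T in the pattern with T
        # Assumed claim-2 reading: T is the pattern's last character
        # (shift == 1). If the character left of T in the text differs
        # from pattern[m - 2], the one-step alignment cannot match, so
        # move one character further.
        if shift == 1 and m >= 2 and text[i + m - 1] != pattern[m - 2]:
            shift = 2
        i += shift
    return hits


print(sunday_search("abxcabc", "abc"))  # [4]
```

In the usage line, T = `c` is the pattern's last character, but `x` (left of T in the text) differs from `b`, so the sketch jumps two positions instead of one. The extra check only skips a start position it has just shown cannot match, so the result equals a brute-force scan; any speed-up depends on the input.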
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710523458.9A CN107341224A (en) | 2017-06-30 | 2017-06-30 | The matching process and device of a kind of character string |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107341224A (en) | 2017-11-10 |
Family
ID=60219365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710523458.9A Pending CN107341224A (en) | 2017-06-30 | 2017-06-30 | The matching process and device of a kind of character string |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107341224A (en) |
2017-06-30: application CN201710523458.9A filed in CN (publication CN107341224A), status: active, pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577598A (en) * | 2013-11-15 | 2014-02-12 | 曙光信息产业(北京)有限公司 | Matching method and device for pattern string and text string |
CN104750683A (en) * | 2013-12-25 | 2015-07-01 | 中国移动通信集团公司 | Character string matching method and device |
CN106557553A (en) * | 2016-10-27 | 2017-04-05 | 东软集团股份有限公司 | The method and device of Data Matching |
Non-Patent Citations (2)
Title |
---|
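巫喜红 (Wu Xihong) et al.: "Design and implementation of an improved Sunday pattern matching algorithm", 《哈尔滨理工大学学报》 (Journal of Harbin University of Science and Technology) * |
朱宁洪 (Zhu Ninghong): "An improvement of the Sunday string matching algorithm", 《西安科技大学学报》 (Journal of Xi'an University of Science and Technology) * |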
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920483A (en) * | 2018-04-28 | 2018-11-30 | 南京搜文信息技术有限公司 | Character string fast matching method based on Suffix array clustering |
CN109344301A (en) * | 2018-09-26 | 2019-02-15 | 长沙学院 | Method, computer data processing system, the information management system of construction ballot mark table |
CN112069303A (en) * | 2020-09-17 | 2020-12-11 | 四川长虹电器股份有限公司 | Matching search method and device for character strings and terminal |
CN113836367A (en) * | 2021-09-26 | 2021-12-24 | 杭州迪普科技股份有限公司 | Character reverse matching method and device |
CN113836367B (en) * | 2021-09-26 | 2023-04-28 | 杭州迪普科技股份有限公司 | Method and device for character reverse matching |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10796244B2 (en) | Method and apparatus for labeling training samples | |
CN107341224A (en) | The matching process and device of a kind of character string | |
CN105550170B (en) | A kind of Chinese word cutting method and device | |
US20140229473A1 (en) | Determining documents that match a query | |
CN104636349B (en) | A kind of index data compression and the method and apparatus of index data search | |
US20190179818A1 (en) | Merge join system and method | |
CN105468588A (en) | Character string matching method and apparatus | |
CN110347782A (en) | Article duplicate checking method, apparatus and electronic equipment | |
Chen et al. | Bit-parallel algorithms for exact circular string matching | |
CN106469186A (en) | A kind of method and device of character string comparison | |
Xu et al. | Bit-parallel multiple approximate string matching based on GPU | |
CN105359142A (en) | Hash join method, device and database management system | |
US20130179419A1 (en) | Retrieval of prefix completions by way of walking nodes of a trie data structure | |
Faro et al. | An efficient skip-search approach to swap matching | |
CN100527134C (en) | Multiple modes search method and system | |
CN105892995A (en) | Minus searching method and device as well as processor | |
Nishimura et al. | Accelerating the Smith-waterman algorithm using bitwise parallel bulk computation technique on GPU | |
Zhao et al. | FARGO: Fast maximum inner product search via global multi-probing | |
Sachan et al. | A generalized links and text properties based forum crawler | |
CN108304467A (en) | For matched method between text | |
AbdulRazzaq et al. | Parallel implementation of maximum-shift algorithm using OpenMp | |
CN105264522A (en) | Method and apparatus for constructing suffix array | |
DK178764B1 (en) | A computer-implemented method for carrying out a search without the use of signatures | |
CN107169313A (en) | The read method and computer-readable recording medium of DNA data files | |
CN109241124A (en) | A kind of method and system of quick-searching similar character string |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171110 |