CN106484730A - Character string matching method and device - Google Patents
Character string matching method and device
- Publication number
- CN106484730A (application number CN201510549622.4A)
- Authority
- CN
- China
- Prior art keywords
- string
- cryptographic hash
- character
- matching
- substring
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a character string matching method and device. The method includes: obtaining a target string and calculating the hash value of the target string; obtaining a string to be matched; calculating the hash value of each substring of the string to be matched whose length is the same as that of the target string; comparing whether the hash value of a substring is the same as the hash value of the target string; and, when the hash value of the substring is the same as the hash value of the target string, determining that the string to be matched matches the target string. The application addresses the technical problem in the prior art that string matching is slow.
Description
Technical field
The present application relates to the field of data processing, and in particular to a character string matching method and device.
Background
Programming frequently requires string matching operations, such as determining whether two strings are equal or whether one string contains another. Normally this can be done directly with interfaces provided by the programming language itself (such as IndexOf, Contains, etc.), but when matching long strings, or even very long strings, these interfaces are too slow to meet speed requirements.
This is because these interfaces work by comparing character by character, and this approach performs worse as the strings get longer. For example, to determine whether string B (acd) occurs in string A (abcdefghijk), the first three characters of A, "abc", are taken and compared one by one with the characters of B; this fails, so the second to fourth characters of A, "bcd", are taken and compared one by one with the characters of B; this also fails, and the process continues until the end of string A is reached.
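As an illustration of the character-by-character approach described above, the following Python sketch tries each starting position in turn and compares characters one at a time. It is a generic naive search written for this explanation, not the actual implementation of IndexOf or Contains in any particular language.

```python
def naive_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Each candidate position is checked by comparing characters one by one,
    moving on to the next position as soon as a mismatch is found.
    """
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):          # every position where pattern could fit
        k = 0
        while k < m and text[start + k] == pattern[k]:
            k += 1                          # characters agree so far
        if k == m:                          # all m characters matched
            return start
    return -1


print(naive_find("abcdefghijk", "acd"))     # prints -1: "acd" does not occur in A
```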
The drawback of this approach is that every match requires a character-by-character comparison, which is slow and seriously affects the overall running speed of the program.
No effective solution to this problem has yet been proposed.
Summary
The embodiments of the present application provide a character string matching method and device, so as to at least solve the technical problem in the prior art that string matching is slow.
According to one aspect of the embodiments of the present application, a character string matching method is provided, including: obtaining a target string and calculating the hash value of the target string; obtaining a string to be matched; calculating the hash value of a substring of the string to be matched whose length is the same as that of the target string; comparing whether the hash value of the substring is the same as the hash value of the target string; and, when the hash value of the substring is the same as the hash value of the target string, determining that the string to be matched matches the target string.
Further, comparing whether the hash value of the target string is the same as the hash value of the substring includes: taking the substring of the string to be matched that starts at the first character as the current substring; calculating the hash value of the current substring and comparing it with the hash value of the target string; when the hash value of the current substring is the same as the hash value of the target string, stopping the comparison; and, when the hash value of the current substring differs from the hash value of the target string, taking the next substring of the string to be matched as the current substring and returning to the step of calculating the hash value of the current substring, until the length of the next substring is less than the length of the target string, where two adjacent substrings of the string to be matched are offset from each other by one character.
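The window sequence described above (start at the first character, shift by one character, stop once fewer characters remain than the target's length) can be sketched as a small Python generator. The name `windows` is ours, chosen for illustration.

```python
from typing import Iterator, Tuple


def windows(s: str, m: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, substring) for each length-m substring of s, left to right.

    Adjacent windows start one character apart. The last window starts at
    len(s) - m; after that, fewer than m characters remain and iteration stops.
    """
    for start in range(len(s) - m + 1):
        yield start, s[start:start + m]


for start, sub in windows("hoabcdefguafkdoieum", 7):
    print(start, sub)   # 0 hoabcde, 1 oabcdef, ..., 12 kdoieum
```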
Further, before calculating the hash value of a substring of the string to be matched whose length is the same as that of the target string, the method further includes: extracting characters in the string to be matched that satisfy a preset condition, and obtaining the type of the extracted characters; obtaining the type of the string to be matched; and judging whether the type of the extracted characters is consistent with the type of the string to be matched and, if not, modifying the extracted characters so that the type of the modified characters is consistent with the type of the string to be matched.
Further, before calculating the hash value of the target string, the method further includes: obtaining the lengths of the string to be matched and of the target string respectively; judging whether the length of the string to be matched is greater than or equal to the length of the target string; and, if so, determining that the hash value of the target string is to be calculated.
Further, the method further includes: adding a mark to the string to be matched, where the mark indicates that the string to be matched has already been matched.
According to another aspect of the embodiments of the present application, a character string matching device is further provided, including: a first acquisition unit, configured to obtain a target string and calculate the hash value of the target string; a second acquisition unit, configured to obtain a string to be matched; a computing unit, configured to calculate the hash value of a substring of the string to be matched whose length is the same as that of the target string; a comparing unit, configured to compare whether the hash value of the substring is the same as the hash value of the target string; and a first determining unit, configured to determine, when the hash value of the target string is the same as the hash value of the substring, that the string to be matched matches the target string.
Further, the comparing unit includes: a determining module, configured to take the substring of the string to be matched that starts at the first character as the current substring; a computing module, configured to calculate the hash value of the current substring and compare it with the hash value of the target string; a stopping module, configured to stop the comparison when the hash value of the current substring is the same as the hash value of the target string; and an acquisition module, configured to, when the hash value of the current substring differs from the hash value of the target string, take the next substring of the string to be matched as the current substring and return to the computing module to calculate the hash value of the current substring, until the length of the next substring is less than the length of the target string, where two adjacent substrings of the string to be matched are offset from each other by one character.
Further, the device further includes: an extraction unit, configured to, before the hash value of a substring of the string to be matched whose length is the same as that of the target string is calculated, extract characters in the string to be matched that satisfy a preset condition and obtain the type of the extracted characters; a third acquisition unit, configured to obtain the type of the string to be matched; a first judging unit, configured to judge whether the type of the extracted characters is consistent with the type of the string to be matched; and a modifying unit, configured to, when the first judging unit judges that they are inconsistent, modify the extracted characters so that the type of the modified characters is consistent with the type of the string to be matched.
Further, the device further includes: a fourth acquisition unit, configured to obtain the lengths of the string to be matched and of the target string respectively before the hash value of the target string is calculated; a second judging unit, configured to judge whether the length of the string to be matched is greater than or equal to the length of the target string; and a second determining unit, configured to determine, when the second judging unit judges yes, that the hash value of the target string is to be calculated.
Further, the device further includes: an adding unit, configured to add a mark to the string to be matched, where the mark indicates that the string to be matched has already been matched.
In the embodiments of the present application, a target string is obtained and its hash value is calculated; a string to be matched is obtained; the hash values of the substrings of the string to be matched that have the same length as the target string are calculated; each substring's hash value is compared with the target string's hash value; and when they are the same, the string to be matched is determined to match the target string. Matching thus reduces to comparing the hash values of two strings of equal length, and strings with identical hash values are treated as identical strings. Because calculating a hash value is faster than comparing characters one by one, the matching speed of two strings is improved, which solves the prior-art problem of slow string matching. In particular, when matching long strings, the number of character-by-character comparisons is reduced, which can significantly improve matching speed.
Brief description of the drawings
The drawings described here are provided for a further understanding of the present application and constitute part of the application. The schematic embodiments of the application and their description are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a character string matching method according to an embodiment of the present application;
Fig. 2 is a flowchart of an optional character string matching method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a character string matching device according to an embodiment of the present application.
Detailed description
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second", and so on in the description, claims, and drawings of the present application are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. In addition, the terms "comprise" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
Terminology:
Hash: in Chinese the term is either translated or directly transliterated. A hash function transforms an input of arbitrary length (also called a pre-image) into an output of fixed length by means of a hashing algorithm; this output is the hash value. The transformation is a compression mapping: the space of hash values is usually much smaller than the space of inputs, different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value. Put simply, it is a function that compresses a message of arbitrary length into a message digest of fixed length.
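The compression property, and the resulting possibility of collisions, can be seen with a toy hash. The sketch uses a polynomial hash reduced modulo 97, which is our choice for illustration; the patent does not name a hash function. Because there are only 97 possible outputs, any 98 distinct inputs must contain two that collide, and the search finds such a pair.

```python
from itertools import product
from typing import Dict, Optional, Tuple


def poly_hash(s: str, base: int = 31, mod: int = 97) -> int:
    """Compress a string of any length into an integer in [0, mod)."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h


def find_collision(mod: int = 97) -> Optional[Tuple[str, str]]:
    """Return two distinct strings with the same poly_hash value, if one is found."""
    seen: Dict[int, str] = {}
    for chars in product("abcdefghij", repeat=3):   # 1000 distinct inputs, 97 outputs
        s = "".join(chars)
        h = poly_hash(s, mod=mod)
        if h in seen:
            return seen[h], s
        seen[h] = s
    return None


a, b = find_collision()
print(a, b, poly_hash(a), poly_hash(b))   # two different strings, equal hash values
```

This is why hash-based matchers commonly confirm a hash hit with a direct string comparison before reporting a match.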
According to an embodiment of the present application, a method embodiment of a character string matching method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
Fig. 1 is a flowchart of the character string matching method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain a target string and calculate the hash value of the target string.
Step S104: obtain a string to be matched.
Step S106: calculate the hash value of a substring of the string to be matched whose length is the same as that of the target string.
Step S108: compare whether the hash value of the substring is the same as the hash value of the target string.
Step S110: when the hash value of the substring is the same as the hash value of the target string, determine that the string to be matched matches the target string.
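Steps S102 to S110 can be sketched in Python as follows. SHA-256 from the standard hashlib module is used only as a stand-in hash function (an assumption; the patent does not specify one), and the hash of each substring is computed from scratch, as the steps describe.

```python
import hashlib


def hash_value(s: str) -> bytes:
    # Stand-in hash function (assumption): SHA-256 of the UTF-8 encoding.
    return hashlib.sha256(s.encode("utf-8")).digest()


def matches(source: str, target: str) -> bool:
    """Return True if some substring of source has the same hash as target."""
    target_hash = hash_value(target)                       # S102: hash the target string
    m = len(target)                                        # S104: source is the string to be matched
    for start in range(len(source) - m + 1):
        sub_hash = hash_value(source[start:start + m])     # S106: hash an equal-length substring
        if sub_hash == target_hash:                        # S108: compare hash values
            return True                                    # S110: report a match
    return False


print(matches("hoabcdefguafkdoieum", "abcdefg"))  # prints True
```

In this literal form each window's digest still reads all of that window's characters, so how much time is saved depends on how cheaply each window's hash can be obtained.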
The character string matching method of this embodiment can be applied in scenarios where it is necessary to determine whether two strings are the same, or whether one string is contained in another, for example searching for a sentence in an article, or searching for a keyword in a sentence. The comparison differs from the prior art: the prior art compares characters one by one, and strings whose characters are all the same are identical strings, whereas this embodiment compares the hash values of two strings of equal length, and strings with identical hash values are treated as identical strings. Because calculating a hash value is faster than comparing characters one by one, the matching speed of two strings is improved, which solves the prior-art problem of slow string matching. Especially when matching long strings, the number of character-by-character comparisons is reduced, which can significantly improve matching speed.
Specifically, suppose the string to be matched is hoabcdefguafkdoieum and the target string is abcdefg. The hash value of the target string is calculated as H(T). An equal-length substring, hoabcde, is extracted from the string to be matched, and its hash value is H1(S); since H(T) is not equal to H1(S), abcdefg is determined to be different from hoabcde. Later, the equal-length substring abcdefg is extracted from the string to be matched, and its hash value is H2(S); since H(T) equals H2(S), the string to be matched is determined to contain the target string. In this embodiment the target string contains 7 characters. With the prior-art character-by-character method, each extracted substring requires up to 7 character comparisons, whereas the matching method of this embodiment needs only one hash comparison per substring. This improves string matching speed and solves the prior-art problem of slow string matching.
Moreover, this embodiment shows that the more characters the target string contains, the more pronounced the advantage of this string matching method over existing methods, and the faster the matching.
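The advantage claimed above depends on each window's hash being cheap to obtain. One well-known way to achieve that, which we substitute here because the patent does not specify a hash function, is the Rabin-Karp rolling hash: the hash of the next window is derived from the previous one in constant time. Because different strings can share a hash value, the sketch confirms each hash hit with a direct string comparison; the base and modulus are illustrative choices.

```python
def rabin_karp_find(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, mod)       # weight of the leftmost character in a window
    p_hash = w_hash = 0
    for k in range(m):                 # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        w_hash = (w_hash * base + ord(text[k])) % mod
    for i in range(n - m + 1):
        # Equal hashes are only a candidate; confirm to rule out a collision.
        if w_hash == p_hash and text[i:i + m] == pattern:
            return i
        if i + m < n:                  # roll: drop text[i], append text[i + m]
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1


print(rabin_karp_find("hoabcdefguafkdoieum", "abcdefg"))  # prints 2
```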
Optionally, to improve matching accuracy, the target string is matched against every qualifying substring of the string to be matched, where a qualifying substring is one whose length equals that of the target string. That is, the substring of the string to be matched that starts at the first character is taken as the current substring; the hash value of the current substring is calculated and compared with the hash value of the target string; when the hash value of the current substring is the same as that of the target string, the comparison stops; when they differ, the next substring of the string to be matched is taken as the current substring and the method returns to the step of calculating the hash value of the current substring, until the length of the next substring is less than the length of the target string, where two adjacent substrings of the string to be matched are offset from each other by one character.
The hash values of the equal-length substrings of the string to be matched are compared with the hash value of the target string one at a time. Once the string to be matched is found to contain the target string, the remaining substrings of the string to be matched are not compared. If the current substring differs from the target string, the next substring is compared, until all characters of the string to be matched have been covered.
For example, with the string to be matched hoabcdefguafkdoieum and the target string abcdefg, hoabcde is extracted first and its hash value is calculated and compared with that of the target string. The hash values differ, so the next substring, oabcdef, is extracted and its hash value is calculated and compared with that of the target string. The hash values differ again, so the next substring, abcdefg, is extracted as the current substring and its hash value is compared with that of the target string. The hash values are the same, so no further substrings are extracted or hashed. Every extracted substring has the same length as the target string. If no substring with an identical hash value is found before the substring kdoieum, the hash value of kdoieum is compared with that of the target string; if it too differs, no further substring is extracted, and the string to be matched is determined not to contain the target string. Because every substring whose length equals that of the target string is extracted in turn and used to judge whether the string to be matched contains the target string, no character of the string to be matched is skipped, which achieves accurate matching and ensures the accuracy of character matching.
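The walk-through above can be reproduced by recording which windows are hashed before the search stops. This sketch again uses SHA-256 as an assumed stand-in hash; `trace_windows` is a hypothetical helper name.

```python
import hashlib
from typing import List, Tuple


def hash_value(s: str) -> bytes:
    # Stand-in hash function (assumption): SHA-256 of the UTF-8 encoding.
    return hashlib.sha256(s.encode("utf-8")).digest()


def trace_windows(source: str, target: str) -> Tuple[List[str], bool]:
    """Return the windows examined, in order, and whether a hash match was found."""
    m = len(target)
    target_hash = hash_value(target)
    examined: List[str] = []
    for start in range(len(source) - m + 1):
        window = source[start:start + m]
        examined.append(window)
        if hash_value(window) == target_hash:
            return examined, True        # stop at the first equal hash value
    return examined, False               # the last window was examined without a match


print(trace_windows("hoabcdefguafkdoieum", "abcdefg"))
# (['hoabcde', 'oabcdef', 'abcdefg'], True)
```

With a target that does not occur, such as zzzzzzz, all 13 windows are examined and the last one is kdoieum, matching the description.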
Optionally, to avoid matching errors and improve matching accuracy, the type of the string to be matched is corrected before matching so that the types are consistent. That is, before calculating the hash value of a substring of the string to be matched whose length is the same as that of the target string, the method further includes: extracting characters in the string to be matched that satisfy a preset condition, and obtaining the type of the extracted characters; obtaining the type of the string to be matched; and judging whether the type of the extracted characters is consistent with the type of the string to be matched and, if not, modifying the extracted characters so that the type of the modified characters is consistent with the type of the string to be matched.
Characters satisfying the preset condition may be, for example, punctuation marks. Suppose the string to be matched contains both Chinese and English punctuation: the string to be matched is hoa，bcdefguafkdoieum and the target string is a,b, i.e., the punctuation in the target string is English while the punctuation in the string to be matched is Chinese (the full-width comma ，). If the punctuation of the string to be matched were not corrected, "a，b" in the string to be matched and the target string "a,b" would be treated as different strings during matching, although they are essentially the same string. To ensure matching accuracy, the Chinese punctuation in the string to be matched is changed to English punctuation, i.e., to the same type as the other characters in the string to be matched, which are English.
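A minimal sketch of the punctuation correction, under two assumptions of ours: a string's type is "zh" if it contains any CJK Unified Ideograph (U+4E00 to U+9FFF) and "en" otherwise, and only a handful of common full-width marks are mapped. The function and table names are hypothetical.

```python
# Hypothetical mapping tables: a few common full-width (Chinese) marks and their ASCII counterparts.
ZH_TO_EN = str.maketrans({"，": ",", "。": ".", "；": ";", "：": ":",
                          "！": "!", "？": "?", "（": "(", "）": ")"})
EN_TO_ZH = str.maketrans({",": "，", ";": "；", ":": "：",
                          "!": "！", "?": "？", "(": "（", ")": "）"})


def string_type(s: str) -> str:
    """Assumed rule: "zh" if s contains a CJK Unified Ideograph, otherwise "en"."""
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in s) else "en"


def normalize_punctuation(s: str) -> str:
    """Rewrite punctuation so its type agrees with the type of the string itself."""
    table = ZH_TO_EN if string_type(s) == "en" else EN_TO_ZH
    return s.translate(table)


source = normalize_punctuation("hoa，bcdefguafkdoieum")
print(source)            # prints hoa,bcdefguafkdoieum
print("a,b" in source)   # prints True
```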
Optionally, if two strings are identical, their lengths are also identical, so only strings of equal length can be found to be identical when compared. If the string to be matched is too short for a substring of the target string's length to be extracted from it, no substring is obtained. That is, before calculating the hash value of the target string, the method further includes: obtaining the lengths of the string to be matched and of the target string respectively; judging whether the length of the string to be matched is greater than or equal to the length of the target string; and, if so, determining that the hash value of the target string is to be calculated.
For example, with the string to be matched hoabcdefguafkdoieum and the target string abcdefg, once the hash value of the substring kdoieum has been found to differ from that of the target string abcdefg, the next substring would have to start at the d in doieum. However, the number of remaining characters is less than the length of the target string, so no substring of the target string's length can be obtained. It is therefore determined that no further substrings of the string to be matched are obtained, and matching ends.
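The length pre-check and the stopping point can be made visible by logging every string that gets hashed. In this sketch (SHA-256 as an assumed stand-in hash; `contains` and the `log` parameter are illustrative names), a string to be matched that is shorter than the target triggers no hashing at all, and for the example above the last string hashed is kdoieum.

```python
import hashlib
from typing import List


def hash_value(s: str, log: List[str]) -> bytes:
    log.append(s)                               # record every string that gets hashed
    # Stand-in hash function (assumption): SHA-256 of the UTF-8 encoding.
    return hashlib.sha256(s.encode("utf-8")).digest()


def contains(source: str, target: str, log: List[str]) -> bool:
    m = len(target)
    if len(source) < m:
        # Length pre-check failed: no substring of length m exists,
        # so the target's hash is never computed.
        return False
    target_hash = hash_value(target, log)
    for start in range(len(source) - m + 1):    # last start index is len(source) - m
        if hash_value(source[start:start + m], log) == target_hash:
            return True
    return False


log: List[str] = []
print(contains("hoabcdefguafkdoieum", "zzzzzzz", log), log[-1])  # prints False kdoieum
```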
Optionally, to avoid repeatedly matching the same string to be matched against the same target string, the method further includes: adding a mark to the string to be matched, where the mark indicates that the string to be matched has already been matched. A marked string has already been matched and, whether or not it matched the target string, does not need to be matched again. The mark may be added after a string has been matched, or strings that clearly do not need to be matched may be screened out and marked before matching; this is not limited here. The mark may indicate that the string does not need to be matched again against a particular target string. For example, if the string to be matched hoabcdefguafkdoieum has been matched against the target string abcdefg, the mark a1 is added to hoabcdefguafkdoieum, and when the target string abcdefg is matched again, hoabcdefguafkdoieum does not need to be matched. However, the string to be matched bearing mark a1 can still be matched against the target string opq; different target strings can be matched against the same string to be matched.
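One way to keep such marks is a mapping from each string to be matched to the set of target strings it has already been matched against; a mark such as a1 then corresponds to one entry in that set. This is a hypothetical sketch: `MatchMarks` and `match_once` are illustrative names, and Python's `in` operator stands in for the hash-based matcher.

```python
from typing import Callable, Dict, Optional, Set


class MatchMarks:
    """Remembers which target strings each string to be matched has been matched against."""

    def __init__(self) -> None:
        self._marks: Dict[str, Set[str]] = {}

    def is_marked(self, source: str, target: str) -> bool:
        return target in self._marks.get(source, set())

    def mark(self, source: str, target: str) -> None:
        self._marks.setdefault(source, set()).add(target)


def match_once(source: str, target: str, marks: MatchMarks,
               matcher: Callable[[str, str], bool]) -> Optional[bool]:
    """Match source against target unless already marked; return None if skipped."""
    if marks.is_marked(source, target):
        return None                      # already matched against this target
    result = matcher(source, target)
    marks.mark(source, target)           # mark regardless of the outcome
    return result


def simple_contains(s: str, t: str) -> bool:
    return t in s                        # stand-in matcher for the demo


marks = MatchMarks()
print(match_once("hoabcdefguafkdoieum", "abcdefg", marks, simple_contains))  # True
print(match_once("hoabcdefguafkdoieum", "abcdefg", marks, simple_contains))  # None (skipped)
print(match_once("hoabcdefguafkdoieum", "opq", marks, simple_contains))      # False
```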
Embodiments of the present application are described below with reference to Fig. 2.
Step S201: obtain the source string S and its length L(S), i.e., obtain the string to be matched S and its length.
Step S202: initialize the variable i = 0, where i is the index of a character in the string to be matched.
Step S203: judge whether L(S) >= i + L(T), i.e., whether the number of characters from character i to the last character of the string to be matched is at least the length of the target string. If so, go to step S204; if not, go to step S209.
Step S204: extract characters i through i + L(T) - 1 of S as the substring S1.
Step S205: obtain the hash value H(S1) of S1.
Step S206: judge whether H(S1) equals H(T). If so, go to step S207; if not, go to step S208.
Step S207: the match succeeds, i.e., identical hash values give a successful match.
Step S208: set i = i + 1, so that a new substring is extracted starting from the next character of the string to be matched, then go back to step S203 to judge whether the remaining length of the string to be matched still meets the length requirement of the target string.
Step S209: the match fails.
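The Fig. 2 flow can be transcribed step by step into Python. Variable names mirror the figure rather than PEP 8 style. The hash function is a parameter and defaults to SHA-256 as an assumed stand-in; computing H(T) once up front is our reading, since the steps use H(T) without showing where it is computed.

```python
import hashlib
from typing import Callable


def sha256_hash(s: str) -> bytes:
    # Stand-in hash function (assumption); the patent does not name one.
    return hashlib.sha256(s.encode("utf-8")).digest()


def fig2_match(S: str, T: str, H: Callable[[str], object] = sha256_hash) -> bool:
    """Follow the Fig. 2 flow: True on a hash match (S207), False otherwise (S209)."""
    L_S, L_T = len(S), len(T)      # S201: source string length (and target length)
    H_T = H(T)                     # hash of the target string, computed once
    i = 0                          # S202
    while L_S >= i + L_T:          # S203: enough characters left for a full window?
        S1 = S[i:i + L_T]          # S204: characters i .. i + L(T) - 1
        if H(S1) == H_T:           # S205, S206
            return True            # S207: match succeeds
        i += 1                     # S208: shift one character, then re-check S203
    return False                   # S209: match fails


print(fig2_match("hoabcdefguafkdoieum", "abcdefg"))  # prints True
```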
Through the above steps, matching reduces to comparing the hash values of two strings of equal length, and strings with identical hash values are treated as identical strings. Because calculating a hash value is faster than comparing characters one by one, the matching speed of two strings is improved, which solves the prior-art problem of slow string matching. Especially when matching long strings, the number of character-by-character comparisons is reduced, which can significantly improve matching speed.
An embodiment of the present application further provides a character string matching device. This device can execute the character string matching method described above, and that method can likewise be executed by this device.
Fig. 3 is a schematic diagram of the character string matching device according to an embodiment of the present application. As shown in Fig. 3, the device includes a first acquisition unit 10, a second acquisition unit 30, a computing unit 50, a comparing unit 70, and a first determining unit 90, where:
The first acquisition unit 10 is configured to obtain a target string and calculate the hash value of the target string.
The second acquisition unit 30 is configured to obtain a string to be matched.
The computing unit 50 is configured to calculate the hash value of a substring of the string to be matched whose length is the same as that of the target string, obtaining the hash value of the substring.
The comparing unit 70 is configured to compare whether the hash value of the target string is the same as the hash value of the substring.
The first determining unit 90 is configured to determine, when the hash value of the target string is the same as the hash value of the substring, that the string to be matched matches the target string.
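The five units of Fig. 3 can be pictured as one Python class with a method per unit. This is a hypothetical sketch: the class and method names are ours, SHA-256 is an assumed stand-in hash, and the computing unit yields hashes lazily so that the determining unit stops at the first equal hash value.

```python
import hashlib
from typing import Iterator


class StringMatchingDevice:
    """Hypothetical sketch of the Fig. 3 device: one method per unit."""

    def first_acquisition_unit(self, target: str) -> bytes:             # unit 10
        return self._hash(target)

    def second_acquisition_unit(self, source: str) -> str:              # unit 30
        return source

    def computing_unit(self, source: str, m: int) -> Iterator[bytes]:   # unit 50
        # Lazily yield the hash of each length-m substring, left to right.
        for start in range(len(source) - m + 1):
            yield self._hash(source[start:start + m])

    def comparing_unit(self, target_hash: bytes, sub_hash: bytes) -> bool:  # unit 70
        return target_hash == sub_hash

    def first_determining_unit(self, target_hash: bytes,
                               sub_hashes: Iterator[bytes]) -> bool:     # unit 90
        # any() stops at the first equal hash, so later substrings are never hashed.
        return any(self.comparing_unit(target_hash, h) for h in sub_hashes)

    def match(self, source: str, target: str) -> bool:
        target_hash = self.first_acquisition_unit(target)
        source = self.second_acquisition_unit(source)
        sub_hashes = self.computing_unit(source, len(target))
        return self.first_determining_unit(target_hash, sub_hashes)

    @staticmethod
    def _hash(s: str) -> bytes:
        # Stand-in hash function (assumption): SHA-256 of the UTF-8 encoding.
        return hashlib.sha256(s.encode("utf-8")).digest()


print(StringMatchingDevice().match("hoabcdefguafkdoieum", "abcdefg"))  # prints True
```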
The character string matching method of this embodiment can be applied in scenarios where it is necessary to determine whether two strings are the same, or whether one string is contained in another, for example searching for a sentence in an article, or searching for a keyword in a sentence. The comparison differs from the prior art: the prior art compares characters one by one, and strings whose characters are all the same are identical strings, whereas this embodiment compares the hash values of two strings of equal length, and strings with identical hash values are treated as identical strings. Because calculating a hash value is faster than comparing characters one by one, the matching speed of two strings is improved, which solves the prior-art problem of slow string matching. Especially when matching long strings, the number of character-by-character comparisons is reduced, which can significantly improve matching speed.
Specifically, suppose the string to be matched is hoabcdefguafkdoieum and the target string is abcdefg. The hash value of the target string is calculated as H(T). An equal-length substring, hoabcde, is extracted from the string to be matched, and its hash value is H1(S); since H(T) is not equal to H1(S), abcdefg is determined to be different from hoabcde. Later, the equal-length substring abcdefg is extracted from the string to be matched, and its hash value is H2(S); since H(T) equals H2(S), the string to be matched is determined to contain the target string. In this embodiment the target string contains 7 characters. With the prior-art character-by-character method, each extracted substring requires up to 7 character comparisons, whereas the matching method of this embodiment needs only one hash comparison per substring. This improves string matching speed and solves the prior-art problem of slow string matching.
Moreover, this embodiment shows that the more characters the target string contains, the more pronounced the advantage of the character string matching method performed by the string matching device of this embodiment over existing methods, and the faster the matching.
Optionally, to improve matching accuracy, every qualifying substring of the string to be matched is matched against the target string, a qualifying substring being one whose length equals that of the target string. That is, the comparing unit includes: a determining module, configured to take the substring starting from the first character of the string to be matched as the current substring; a computing module, configured to calculate the hash value of the current substring and compare it with the hash value of the target string; a stopping module, configured to stop comparing when the hash value of the current substring is identical to the hash value of the target string; and an acquisition module, configured to, when the hash value of the current substring differs from the hash value of the target string, take the next substring of the string to be matched as the current substring and return to the computing module to calculate the hash value of the current substring, until the length of the next substring is less than the length of the target string, where two adjacent substrings in the string to be matched are offset from each other by one character.
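The loop performed by the determining, computing, stopping, and acquisition modules might look like the Python sketch below. It recomputes each window's hash from scratch, as the description implies. The names `poly_hash` and `find_by_hash` are hypothetical, the polynomial hash is an assumed choice, and returning the start offset (or -1) is also an assumption; the patent only says that comparison stops.

```python
BASE = 257
MOD = (1 << 61) - 1  # assumed hash parameters; the patent specifies none


def poly_hash(s: str) -> int:
    """Polynomial hash of s (illustrative choice)."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def find_by_hash(text: str, target: str) -> int:
    """Return the start of the first window whose hash equals the target's, else -1."""
    m = len(target)
    target_hash = poly_hash(target)
    start = 0  # determining module: begin at the first character
    # Stop once the next window would be shorter than the target string.
    while len(text) - start >= m:
        current = text[start:start + m]
        if poly_hash(current) == target_hash:  # computing module
            return start  # stopping module: stop on the first equal hash
        start += 1  # acquisition module: adjacent windows differ by one character
    return -1


print(find_by_hash("hoabcdefguafkdoieum", "abcdefg"))  # 2
```

Like the description, the sketch treats equal hashes as a match, so a hash collision would be reported as a match; a practical implementation would normally confirm a hash hit with a direct string comparison.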
The hash values of the equal-length substrings of the string to be matched are compared one by one with the hash value of the target string; once the string to be matched is found to contain the target string, the remaining characters of the string to be matched are not compared. If the current substring differs from the target string, comparison continues with the next substring, until all characters of the string to be matched have been covered.
For example, the string to be matched is hoabcdefguafkdoieum and the target string is abcdefg. First, hoabcde is taken, and its hash value is calculated and compared with the hash value of the target string. The hash values differ, so the next substring, oabcdef, is taken, and its hash value is calculated and compared with that of the target string. The hash values again differ, so the next substring, abcdefg, is taken as the current substring, and its hash value is compared with that of the target string. The hash values are identical, so no further substrings are taken. Every substring taken has the same length as the target string. If no substring with an identical hash value had been found before the substring kdoieum, the hash value of kdoieum would be compared with that of the target string; if they also differed, no further substring would be taken, and it would be determined that the string to be matched does not contain the target string. Because every substring whose length equals that of the target string is taken in turn to judge whether the string to be matched contains the target string, no character of the string to be matched is missed, which achieves exact matching and ensures the accuracy of character matching.
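The patent recomputes each window's hash. A common refinement, swapped in here and not described in the patent, is the Rabin-Karp rolling hash, which derives each window's hash from the previous one in constant time and confirms every hash hit with a direct comparison to rule out collisions. The sketch below (hypothetical name `rabin_karp`, assumed polynomial hash parameters) also records each window it examines so that it reproduces the walk-through above; that recording costs O(m) per step and is there only for tracing.

```python
BASE = 257
MOD = (1 << 61) - 1  # assumed modulus; the patent specifies no hash function


def rabin_karp(text: str, target: str) -> tuple[int, list[str]]:
    """Rabin-Karp search. Returns (first match offset or -1, windows examined)."""
    n, m = len(text), len(target)
    examined: list[str] = []
    if m == 0:
        return 0, examined  # an empty target matches at offset 0
    if n < m:
        return -1, examined  # too short: no window of the target's length
    high = pow(BASE, m - 1, MOD)  # weight of the character leaving the window
    t_hash = w_hash = 0
    for i in range(m):
        t_hash = (t_hash * BASE + ord(target[i])) % MOD
        w_hash = (w_hash * BASE + ord(text[i])) % MOD
    for start in range(n - m + 1):
        window = text[start:start + m]
        examined.append(window)  # kept only to trace the example
        # Confirm a hash hit character by character to rule out collisions.
        if w_hash == t_hash and window == target:
            return start, examined
        if start + m < n:
            # Roll the hash: drop text[start], shift, append text[start + m].
            w_hash = ((w_hash - ord(text[start]) * high) * BASE
                      + ord(text[start + m])) % MOD
    return -1, examined


idx, windows = rabin_karp("hoabcdefguafkdoieum", "abcdefg")
print(idx)      # 2
print(windows)  # ['hoabcde', 'oabcdef', 'abcdefg']
```

For a target that does not occur in hoabcdefguafkdoieum, the same function examines all 13 windows and the last one it examines is kdoieum.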
Optionally, to avoid matching errors and improve matching accuracy, the type of the string to be matched is corrected before matching so that its character types are consistent. That is, the device further includes: an extraction unit, configured to, before the hash values of the substrings of the string to be matched whose length equals that of the target string are calculated, extract the characters in the string to be matched that meet a preset condition and obtain the type of the extracted characters; a third acquiring unit, configured to obtain the type of the string to be matched; a first judging unit, configured to judge whether the type of the extracted characters is consistent with the type of the string to be matched; and an amending unit, configured to, when they are inconsistent, amend the extracted characters so that the type of the amended characters is consistent with the type of the string to be matched.
The characters that meet the preset condition may be punctuation marks and similar characters. For example, Chinese and English punctuation are mixed: the string to be matched is "hoa，bcdefguafkdoieum" (with a Chinese comma) and the target string is "a,b" (with an English comma); that is, the punctuation in the target string is English punctuation, while the punctuation in the string to be matched is Chinese punctuation. If the punctuation of the string to be matched is not corrected, the "a，b" in the string to be matched and the target string "a,b" are treated as different strings during matching, although they are in substance the same string. To ensure matching accuracy, the Chinese punctuation in the string to be matched is changed to English punctuation, so that it is consistent with the type of the other characters in the string to be matched, which are English.
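A minimal sketch of the punctuation correction, under stated assumptions: the mapping table `ZH_TO_EN_PUNCT`, the rule in `string_type` (a string whose letters are all ASCII is treated as English), and the function names are hypothetical, since the patent defines neither the preset condition precisely nor how a string's type is determined.

```python
# Hypothetical table of Chinese (full-width) punctuation and English equivalents.
ZH_TO_EN_PUNCT = {
    "，": ",", "。": ".", "；": ";", "：": ":", "？": "?", "！": "!",
    "（": "(", "）": ")", "“": '"', "”": '"', "‘": "'", "’": "'",
}


def string_type(s: str) -> str:
    """Assumed rule: a string whose letters are all ASCII has English type."""
    return "en" if all(ch.isascii() for ch in s if ch.isalpha()) else "zh"


def normalize_punctuation(s: str) -> str:
    """Replace Chinese punctuation with English punctuation in an English-type string."""
    # Characters meeting the (assumed) preset condition: Chinese punctuation.
    extracted = [ch for ch in s if ch in ZH_TO_EN_PUNCT]
    if not extracted or string_type(s) != "en":
        return s  # nothing to amend, or the types already agree
    return s.translate(str.maketrans(ZH_TO_EN_PUNCT))


fixed = normalize_punctuation("hoa，bcdefguafkdoieum")
print(fixed)           # hoa,bcdefguafkdoieum
print("a,b" in fixed)  # True
```

Normalizing before hashing matters because any differing code point changes the hash value, so "a，b" and "a,b" could never produce equal hashes otherwise.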
Optionally, two identical strings necessarily have the same length, so only a comparison between strings of equal length can conclude that two strings are identical. If the lengths differ, or the string to be matched is too short for a substring of the target string's length to be taken from it, no substring is obtained. That is, the device further includes: a fourth acquiring unit, configured to obtain the lengths of the string to be matched and of the target string before the hash value of the target string is calculated; a second judging unit, configured to judge whether the length of the string to be matched is greater than or equal to the length of the target string; and a second determining unit, configured to determine, when the judgment is yes, that the hash value of the target string is to be calculated.
For example, the string to be matched is hoabcdefguafkdoieum and the target string is abcdefg. Supposing no earlier substring had matched, after the hash value of the substring kdoieum is judged to differ from that of the target string abcdefg, the next substring would have to start at the d of doieum. However, the number of remaining characters is less than the length of the target string, so no substring of the target string's length can be obtained. It is therefore determined that no further substrings of the string to be matched are obtained, and matching ends.
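The length pre-check and the end-of-scan condition can be sketched as follows; `can_match` and `window_starts` are hypothetical helper names. A string of length n yields n - m + 1 windows of length m, so the 19-character example yields 13 windows, the last of which is kdoieum.

```python
def can_match(text: str, target: str) -> bool:
    """Pre-check: the target hash is worth computing only if text is long enough."""
    return len(text) >= len(target)


def window_starts(text: str, target: str) -> range:
    """Start offsets of all substrings of text with the target's length."""
    if not can_match(text, target):
        return range(0)  # too short: no substring can be taken
    # The scan ends once fewer than len(target) characters remain.
    return range(len(text) - len(target) + 1)


text = "hoabcdefguafkdoieum"
target = "abcdefg"
starts = window_starts(text, target)
last = starts[-1]
print(len(starts))                    # 13
print(text[last:last + len(target)])  # kdoieum
print(len(text[last + 1:]))           # 6, fewer than 7
```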
Optionally, to avoid matching the same string to be matched against the same target string repeatedly, the device further includes: an adding unit, configured to add a mark to the string to be matched, where the mark indicates that the string to be matched has already been matched. A marked string is one that has already been matched; regardless of whether it matched the target string, it does not need to be matched again. The mark may be added after a string has been matched, or strings that obviously do not need matching may be filtered out and marked before matching; this is not limited herein. The mark may indicate that the string does not need to be matched against a particular target string again. For example, if the string to be matched hoabcdefguafkdoieum has been matched against the target string abcdefg, the mark a1 is added to hoabcdefguafkdoieum; then, when the target string abcdefg is matched again, hoabcdefguafkdoieum does not need to be matched. However, the string to be matched carrying the mark a1 can still be matched against the target string opq; that is, different target strings can be matched against the same string to be matched.
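One way to realize target-specific marks is sketched below. The class `MatchMarks`, the function `match_once`, and the labelling scheme (a1, a2, ... assigned to targets in first-use order) are hypothetical; the plain `in` test stands in for the hash-based comparison described in the patent.

```python
from typing import Optional


class MatchMarks:
    """Hypothetical registry recording which targets each string was matched against."""

    def __init__(self) -> None:
        self._labels: dict[str, str] = {}       # target -> label such as "a1"
        self._marked: dict[str, set[str]] = {}  # string to be matched -> labels carried

    def label_for(self, target: str) -> str:
        """Assign labels a1, a2, ... to targets in first-use order."""
        if target not in self._labels:
            self._labels[target] = f"a{len(self._labels) + 1}"
        return self._labels[target]

    def mark(self, text: str, target: str) -> None:
        self._marked.setdefault(text, set()).add(self.label_for(target))

    def needs_matching(self, text: str, target: str) -> bool:
        label = self._labels.get(target)
        return label is None or label not in self._marked.get(text, set())


def match_once(marks: MatchMarks, text: str, target: str) -> Optional[bool]:
    """Match text against target unless already done; None means skipped."""
    if not marks.needs_matching(text, target):
        return None
    found = target in text  # stand-in for the hash-based comparison
    marks.mark(text, target)  # mark whether or not a match was found
    return found


marks = MatchMarks()
s = "hoabcdefguafkdoieum"
print(match_once(marks, s, "abcdefg"))  # True, and s now carries label a1
print(match_once(marks, s, "abcdefg"))  # None: skipped, already matched
print(match_once(marks, s, "opq"))      # False: a different target is still matched
```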
With the above embodiments, the hash values of two strings of equal length can be compared, and strings with identical hash values are regarded as identical strings. Because calculating a hash value is faster than comparing the characters one to one, the matching speed of two strings is increased, solving the prior-art technical problem of slow string matching. In particular, when long strings are matched, the number of one-to-one character comparisons is reduced, which can significantly increase matching speed.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present application, the description of each embodiment has its own emphasis; for any part not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
It should be understood that the technical content disclosed in the several embodiments provided in this application may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units may be merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect coupling or communication connections through some interfaces, units, or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a portable hard drive, a magnetic disk, or an optical disc.
The above are only preferred implementations of this application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of this application, and these improvements and refinements shall also fall within the protection scope of this application.
Claims (10)
1. A string matching method, characterized by comprising:
obtaining a target string and calculating a hash value of the target string;
obtaining a string to be matched;
calculating hash values of substrings of the string to be matched that have the same length as the target string;
comparing whether the hash value of a substring is identical to the hash value of the target string;
when the hash value of the substring is identical to the hash value of the target string, determining that the string to be matched matches the target string.
2. The method according to claim 1, characterized in that comparing whether the hash value of the target string is identical to the hash value of the substring comprises:
taking the substring that starts from the first character of the string to be matched as the current substring;
calculating the hash value of the current substring and comparing it with the hash value of the target string;
when the hash value of the current substring is identical to the hash value of the target string, stopping the comparison;
when the hash value of the current substring differs from the hash value of the target string, taking the next substring of the string to be matched as the current substring and returning to the step of calculating the hash value of the current substring, until the length of the next substring is less than the length of the target string, wherein two adjacent substrings in the string to be matched are offset from each other by one character.
3. The method according to claim 1 or 2, characterized in that, before the calculating of hash values of substrings of the string to be matched that have the same length as the target string, the method further comprises:
extracting characters in the string to be matched that meet a preset condition, and obtaining the type of the extracted characters;
obtaining the type of the string to be matched;
judging whether the type of the extracted characters is consistent with the type of the string to be matched, and when they are inconsistent, amending the extracted characters so that the type of the amended characters is consistent with the type of the string to be matched.
4. The method according to claim 1, characterized in that, before the calculating of the hash value of the target string, the method further comprises:
obtaining the lengths of the string to be matched and of the target string respectively;
judging whether the length of the string to be matched is greater than or equal to the length of the target string;
when the judgment is yes, determining to calculate the hash value of the target string.
5. The method according to claim 1, characterized in that the method further comprises:
adding a mark to the string to be matched, wherein the mark indicates that the string to be matched has been matched.
6. A string matching device, characterized by comprising:
a first acquisition unit, configured to obtain a target string and calculate a hash value of the target string;
a second acquisition unit, configured to obtain a string to be matched;
a computing unit, configured to calculate hash values of substrings of the string to be matched that have the same length as the target string;
a comparing unit, configured to compare whether the hash value of a substring is identical to the hash value of the target string;
a first determining unit, configured to determine, when the hash value of the target string is identical to the hash value of the substring, that the string to be matched matches the target string.
7. The device according to claim 6, characterized in that the comparing unit comprises:
a determining module, configured to take the substring that starts from the first character of the string to be matched as the current substring;
a computing module, configured to calculate the hash value of the current substring and compare it with the hash value of the target string;
a stopping module, configured to stop the comparison when the hash value of the current substring is identical to the hash value of the target string;
an acquisition module, configured to, when the hash value of the current substring differs from the hash value of the target string, take the next substring of the string to be matched as the current substring and return to the computing module to calculate the hash value of the current substring, until the length of the next substring is less than the length of the target string, wherein two adjacent substrings in the string to be matched are offset from each other by one character.
8. The device according to claim 6 or 7, characterized in that the device further comprises:
an extraction unit, configured to, before the hash values of the substrings of the string to be matched that have the same length as the target string are calculated, extract characters in the string to be matched that meet a preset condition and obtain the type of the extracted characters;
a third acquiring unit, configured to obtain the type of the string to be matched;
a first judging unit, configured to judge whether the type of the extracted characters is consistent with the type of the string to be matched;
an amending unit, configured to, when the first judging unit judges that they are inconsistent, amend the extracted characters so that the type of the amended characters is consistent with the type of the string to be matched.
9. The device according to claim 6, characterized in that the device further comprises:
a fourth acquiring unit, configured to obtain the lengths of the string to be matched and of the target string respectively before the hash value of the target string is calculated;
a second judging unit, configured to judge whether the length of the string to be matched is greater than or equal to the length of the target string;
a second determining unit, configured to determine, when the second judging unit judges yes, to calculate the hash value of the target string.
10. The device according to claim 6, characterized in that the device further comprises:
an adding unit, configured to add a mark to the string to be matched, wherein the mark indicates that the string to be matched has been matched.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510549622.4A CN106484730A (en) | 2015-08-31 | 2015-08-31 | Character string matching method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106484730A (en) | 2017-03-08 |
Family ID: 58235459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510549622.4A Pending CN106484730A (en) | 2015-08-31 | 2015-08-31 | Character string matching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106484730A (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101901257A (en) * | 2010-07-21 | 2010-12-01 | 北京理工大学 | Multi-string matching method |
CN102184245A (en) * | 2011-05-18 | 2011-09-14 | 华北电力大学 | Method for fast searching massive text data keywords |
CN103455753A (en) * | 2012-05-30 | 2013-12-18 | 北京金山安全软件有限公司 | Sample file analysis method and device |
CN103186669A (en) * | 2013-03-21 | 2013-07-03 | 厦门雅迅网络股份有限公司 | Method for rapidly filtering key word |
CN103425739A (en) * | 2013-07-09 | 2013-12-04 | 国云科技股份有限公司 | Character string matching algorithm |
CN104246663A (en) * | 2013-12-31 | 2014-12-24 | 华为终端有限公司 | Character string input control method and device |
CN104462322A (en) * | 2014-12-01 | 2015-03-25 | 北京国双科技有限公司 | Method and device for contrasting character strings |
CN104850784A (en) * | 2015-04-30 | 2015-08-19 | 中国人民解放军国防科学技术大学 | Method and system for cloud detection of malicious software based on Hash characteristic vector |
Non-Patent Citations (1)
Title |
---|
Taobao Search Technology Blog (搜索技术博客-淘宝): "All About String Matching (Part 1)" (字符串匹配那些事(一)), HTTPS://KB.CNBLOGS.COM/PAGE/107856/ * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108628817A (en) * | 2017-03-15 | 2018-10-09 | 腾讯科技(深圳)有限公司 | A kind of data processing method and device |
CN108628817B (en) * | 2017-03-15 | 2022-07-26 | 腾讯科技(深圳)有限公司 | Data processing method and device |
CN109408681B (en) * | 2018-10-11 | 2021-11-26 | 广东工业大学 | Character string matching method, device and equipment and readable storage medium |
CN109408681A (en) * | 2018-10-11 | 2019-03-01 | 广东工业大学 | A kind of character string matching method, device, equipment and readable storage medium storing program for executing |
CN111090982A (en) * | 2018-10-24 | 2020-05-01 | 迈普通信技术股份有限公司 | Text comparison method and device, electronic equipment and computer readable storage medium |
CN111191087A (en) * | 2019-12-31 | 2020-05-22 | 歌尔股份有限公司 | Character matching method, terminal device and computer-readable storage medium |
CN111191087B (en) * | 2019-12-31 | 2023-11-07 | 歌尔股份有限公司 | Character matching method, terminal device and computer readable storage medium |
CN111627536A (en) * | 2020-05-14 | 2020-09-04 | 广元市中心医院 | Adverse event management system and method for hospital |
CN111475690A (en) * | 2020-06-19 | 2020-07-31 | 支付宝(杭州)信息技术有限公司 | Character string matching method and device, data detection method and server |
CN111475690B (en) * | 2020-06-19 | 2020-12-25 | 支付宝(杭州)信息技术有限公司 | Character string matching method and device, data detection method and server |
CN111797285A (en) * | 2020-06-30 | 2020-10-20 | 深圳壹账通智能科技有限公司 | Character string fuzzy matching method, device, equipment and readable storage medium |
CN112528101A (en) * | 2020-12-22 | 2021-03-19 | 杭州趣链科技有限公司 | Character string matching method, device, equipment and storage medium |
CN112784125A (en) * | 2021-01-14 | 2021-05-11 | 辽宁工程技术大学 | Mode identification method and device for input information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106484730A (en) | Character string matching method and device | |
TWI729472B (en) | Method, device and server for determining feature words | |
CN107193921B (en) | Method and system for correcting error of Chinese-English mixed query facing search engine | |
CN103123618B (en) | Text similarity acquisition methods and device | |
US10552462B1 (en) | Systems and methods for tokenizing user-annotated names | |
CN103440252B (en) | Information extracting method arranged side by side and device in a kind of Chinese sentence | |
CN105589894B (en) | Document index establishing method and device and document retrieval method and device | |
CN106528647B (en) | One kind carrying out the matched method of term based on cedar even numbers group dictionary tree algorithm | |
US9984064B2 (en) | Reduction of memory usage in feature generation | |
CN104008093A (en) | Method and system for chinese name transliteration | |
CN102867049B (en) | Chinese PINYIN quick word segmentation method based on word search tree | |
US10248646B1 (en) | Token matching in large document corpora | |
CN108734110A (en) | Text fragment identification control methods based on longest common subsequence and system | |
CN104281275B (en) | The input method of a kind of English and device | |
CN103631938A (en) | Method and device for automatically expanding segmentation dictionary | |
CN107797995A (en) | A kind of Chinese and English fragment language material generation method | |
US9965546B2 (en) | Fast substring fulltext search | |
US20180011836A1 (en) | Tibetan Character Constituent Analysis Method, Tibetan Sorting Method And Corresponding Devices | |
CN104933030A (en) | Uygur language spelling examination method and device | |
CN110795617A (en) | Error correction method and related device for search terms | |
CN106569986A (en) | Character string replacement method and device | |
CN104750846A (en) | Method and device for finding substring | |
CN103544139A (en) | Forward word segmentation method and device based on Chinese retrieval | |
Tang et al. | An optimization algorithm of Chinese word segmentation based on dictionary | |
Theeramunkong et al. | Pattern-based features vs. statistical-based features in decision trees for word segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing. Applicant after: Beijing Guoshuang Technology Co.,Ltd. Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing. Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170308 |