CN106484730A - Character string matching method and device - Google Patents


Info

Publication number
CN106484730A
Authority
CN
China
Prior art keywords
string
hash value
character
matching
substring
Legal status
Pending
Application number
CN201510549622.4A
Other languages
Chinese (zh)
Inventor
李新国
Current Assignee
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201510549622.4A
Publication of CN106484730A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a character string matching method and device. The method includes: obtaining a target string and calculating the hash value of the target string; obtaining a string to be matched; calculating the hash value of a substring of the string to be matched whose length is the same as that of the target string; comparing whether the hash value of the substring is the same as the hash value of the target string; and, when the two hash values are the same, determining that the string to be matched matches the target string. The application addresses the technical problem that string matching in the prior art is slow.

Description

Character string matching method and device
Technical field
The application relates to the field of data processing, and in particular to a character string matching method and device.
Background
Programming frequently requires string matching operations, such as determining whether two strings are equal or whether one string contains another. Normally the interfaces provided by the programming language itself (such as IndexOf and Contains) are sufficient, but when long or very long strings are matched they are too slow to meet speed requirements.
This is because these interfaces work by matching character by character, and this approach performs worse as the strings grow longer. For example, to determine whether string A (abcdefghijk) contains string B (acd), the first three characters of A, "abc", are first compared one by one with the characters of B; if that fails, the second to fourth characters of A, "bcd", are compared one by one with the characters of B; and so on, until the end of string A is reached.
The drawback of this scheme is that every match attempt compares characters one at a time, which is slow and seriously affects the overall running speed of the program.
No effective solution to this problem has yet been proposed.
Summary of the invention
Embodiments of this application provide a character string matching method and device, to at least solve the technical problem that string matching in the prior art is slow.
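For illustration only, the character-by-character scheme described above can be sketched in Python (the language is chosen here for illustration; the source names none). `naive_find` is a hypothetical helper, not the implementation behind IndexOf or Contains: it tries each window of A in turn and compares it with B one character at a time, stopping at the first mismatch.

```python
def naive_find(text: str, pattern: str) -> int:
    """Return the first index where `pattern` occurs in `text`, or -1."""
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        # Compare the current window with the pattern one character at a time.
        k = 0
        while k < m and text[start + k] == pattern[k]:
            k += 1
        if k == m:  # every character matched
            return start
    return -1


print(naive_find("abcdefghijk", "acd"))  # -1: "acd" never occurs in A
print(naive_find("abcdefghijk", "bcd"))  # 1: found at the second window
```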
According to one aspect of the embodiments of this application, a character string matching method is provided, including: obtaining a target string and calculating the hash value of the target string; obtaining a string to be matched; calculating the hash value of a substring of the string to be matched whose length is the same as that of the target string; comparing whether the hash value of the substring is the same as the hash value of the target string; and, when the hash value of the substring is the same as the hash value of the target string, determining that the string to be matched matches the target string.
Further, comparing whether the hash value of the target string is the same as the hash value of the substring includes: taking the substring that starts at the first character of the string to be matched as the current substring; calculating the hash value of the current substring and comparing it with the hash value of the target string; stopping the comparison when the hash value of the current substring is the same as that of the target string; and, when they differ, taking the next substring of the string to be matched as the current substring and returning to the step of calculating the hash value of the current substring, until the length of the next substring would be less than the length of the target string. Two adjacent substrings of the string to be matched are offset from each other by one character.
Further, before the hash value of the substring of the string to be matched whose length is the same as that of the target string is calculated, the method also includes: extracting the characters in the string to be matched that satisfy a preset condition, and obtaining the type of the extracted characters; obtaining the type of the string to be matched; determining whether the type of the extracted characters is consistent with the type of the string to be matched; and, if not, correcting the extracted characters so that the type of the corrected characters is consistent with the type of the string to be matched.
Further, before the hash value of the target string is calculated, the method also includes: obtaining the lengths of the string to be matched and of the target string; determining whether the length of the string to be matched is greater than or equal to the length of the target string; and, if so, proceeding to calculate the hash value of the target string.
Further, the method also includes: adding a mark to the string to be matched, where the mark indicates that the string to be matched has already been matched.
According to another aspect of the embodiments of this application, a character string matching device is also provided, including: a first acquisition unit, configured to obtain the target string and calculate the hash value of the target string; a second acquisition unit, configured to obtain the string to be matched; a computing unit, configured to calculate the hash value of a substring of the string to be matched whose length is the same as that of the target string; a comparing unit, configured to compare whether the hash value of the substring is the same as the hash value of the target string; and a first determining unit, configured to determine, when the hash value of the target string is the same as the hash value of the substring, that the string to be matched matches the target string.
Further, the comparing unit includes: a determining module, configured to take the substring that starts at the first character of the string to be matched as the current substring; a computing module, configured to calculate the hash value of the current substring and compare it with the hash value of the target string; a stopping module, configured to stop the comparison when the hash value of the current substring is the same as the hash value of the target string; and an acquisition module, configured to, when the hash value of the current substring differs from the hash value of the target string, take the next substring of the string to be matched as the current substring and return to the computing module to calculate the hash value of the current substring, until the length of the next substring would be less than the length of the target string. Two adjacent substrings of the string to be matched are offset from each other by one character.
Further, the device also includes: an extraction unit, configured to, before the hash value of the substring of the string to be matched whose length is the same as that of the target string is calculated, extract the characters in the string to be matched that satisfy a preset condition and obtain the type of the extracted characters; a third acquisition unit, configured to obtain the type of the string to be matched; a first judging unit, configured to determine whether the type of the extracted characters is consistent with the type of the string to be matched; and a correcting unit, configured to, when the first judging unit determines that they are inconsistent, correct the extracted characters so that the type of the corrected characters is consistent with the type of the string to be matched.
Further, the device also includes: a fourth acquisition unit, configured to obtain the lengths of the string to be matched and of the target string before the hash value of the target string is calculated; a second judging unit, configured to determine whether the length of the string to be matched is greater than or equal to the length of the target string; and a second determining unit, configured to determine, when the second judging unit's result is yes, that the hash value of the target string is to be calculated.
Further, the device also includes: an adding unit, configured to add a mark to the string to be matched, where the mark indicates that the string to be matched has already been matched.
In the embodiments of this application, a target string is obtained and its hash value is calculated; a string to be matched is obtained; the hash value of each substring of the string to be matched that has the same length as the target string is calculated; the hash value of the substring is compared with the hash value of the target string; and, when the two are the same, the string to be matched is determined to match the target string. Two strings of equal length are thus compared by their hash values, and strings with the same hash value are treated as identical. Because calculating a hash value is faster than comparing characters one by one, the matching speed of strings is improved, which solves the prior-art technical problem of slow string matching. For long strings in particular, the number of one-by-one comparisons is reduced, which can significantly increase matching speed.
Brief description of the drawings
The drawings described here are provided for a further understanding of this application and constitute part of it. The illustrative embodiments of this application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a character string matching method according to an embodiment of this application;
Fig. 2 is a flowchart of an optional character string matching method according to an embodiment of this application;
Fig. 3 is a schematic diagram of a character string matching device according to an embodiment of this application.
Detailed description of embodiments
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and so on in the description, claims, and drawings of this application are used to distinguish similar objects, not to describe a specific order or sequence. Data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion: for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Terminology:
Hash: also translated as "hashing", or transliterated as "Hash". A hashing algorithm transforms an input of arbitrary length (also called the pre-image) into an output of fixed length; this output is the hash value. The transformation is a compressing map: the space of hash values is usually much smaller than the space of inputs, so different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value. Simply put, a hash is a function that compresses a message of any length into a message digest of a fixed length.
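The definition can be illustrated with a short Python sketch. SHA-256 from the standard `hashlib` module yields a fixed 64-hex-digit output for any input length; `tiny_hash` is a hypothetical toy hash with only 16 possible outputs, chosen so that the compression, and the resulting collisions between different inputs, are easy to see.

```python
import hashlib


def digest(data: str) -> str:
    # SHA-256: any input length maps to a fixed 32-byte (64 hex digit) output.
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def tiny_hash(data: str, buckets: int = 16) -> int:
    # Hypothetical toy hash with only `buckets` possible outputs, so distinct
    # inputs must eventually share a hash value.
    return sum(ord(c) for c in data) % buckets


print(len(digest("a")), len(digest("a" * 1_000_000)))  # 64 64
print(tiny_hash("ab"), tiny_hash("ba"))                # 3 3
```

Here "ab" and "ba" are different inputs with the same toy hash value, which is why a hash value alone cannot uniquely determine its input.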
According to an embodiment of this application, an embodiment of a character string matching method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Fig. 1 is a flowchart of the character string matching method according to an embodiment of this application. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain the target string and calculate the hash value of the target string.
Step S104: obtain the string to be matched.
Step S106: calculate the hash value of a substring of the string to be matched whose length is the same as that of the target string.
Step S108: compare whether the hash value of the substring is the same as the hash value of the target string.
Step S110: when the hash value of the substring is the same as the hash value of the target string, determine that the string to be matched matches the target string.
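A minimal Python sketch of steps S102 to S110, assuming SHA-256 from the standard `hashlib` module as the hash function (the source does not name one); `hash_value` and `matches` are hypothetical names. Because, as the definition of a hash notes, different inputs may share a hash value, the sketch confirms a hash match with a direct comparison; this safeguard is an addition, not part of the described steps.

```python
import hashlib


def hash_value(s: str) -> str:
    # Assumed hash function: SHA-256 over the UTF-8 encoding of the string.
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def matches(target: str, text: str) -> bool:
    """Return True if `text` (the string to be matched) contains `target`."""
    target_hash = hash_value(target)           # S102: hash the target string
    # S104: `text` is the string to be matched, supplied by the caller.
    m = len(target)
    for start in range(len(text) - m + 1):
        window = text[start:start + m]         # S106: equal-length substring
        if hash_value(window) == target_hash:  # S108: compare hash values
            # S110: equal hashes; confirm directly because different inputs
            # can share a hash value (an added safeguard).
            if window == target:
                return True
    return False
```

For example, `matches("abcdefg", "hoabcdefguafkdoieum")` returns True, while any target longer than the string to be matched yields no substrings to compare and returns False.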
The character string matching method of this embodiment can be applied where two strings must be compared for equality, or where it must be determined whether one string is contained in another, for example when searching for a sentence in an article or for a keyword in a sentence. The comparison differs from the prior art: the prior art compares characters one by one and treats strings whose characters are all identical as identical, whereas this embodiment compares the hash values of two strings of equal length and treats strings with identical hash values as identical. Because calculating a hash value is faster than comparing characters one by one, the matching speed of two strings is improved, which solves the prior-art technical problem of slow string matching. For long strings in particular, the number of one-by-one comparisons is reduced, which can significantly increase matching speed.
Specifically, suppose the string to be matched is hoabcdefguafkdoieum and the target string is abcdefg. The hash value of the target string is calculated as H(T). A substring of equal length, hoabcde, is taken from the string to be matched, and its hash value is H1(S). Since H(T) is not equal to H1(S), abcdefg is determined to differ from hoabcde. When the equal-length substring abcdefg is taken from the string to be matched, its hash value is H2(S); since H(T) equals H2(S), the string to be matched is determined to contain the target string. In this embodiment the target string contains 7 characters, so the prior-art one-by-one method needs up to 7 character comparisons for each substring taken, whereas the method of this embodiment needs only one comparison. This increases the matching speed of strings and solves the prior-art technical problem of slow string matching. The example also shows that the more characters the target string contains, the greater the advantage of the character string matching method of this embodiment over existing string matching methods, and the faster the matching.
Optionally, to improve matching accuracy, the target string is matched against every qualifying substring of the string to be matched, a qualifying substring being one whose length is the same as that of the target string. The substring that starts at the first character of the string to be matched is taken as the current substring; the hash value of the current substring is calculated and compared with the hash value of the target string; when the two are the same, the comparison stops; when they differ, the next substring of the string to be matched is taken as the current substring and the method returns to the step of calculating the hash value of the current substring, until the length of the next substring would be less than the length of the target string. Two adjacent substrings of the string to be matched are offset from each other by one character.
The hash values of the equal-length substrings of the string to be matched are compared one by one with the hash value of the target string. Once the string to be matched is found to contain the target string, the remaining substrings of the string to be matched are not compared. If the current substring differs from the target string, the next substring is compared, until all characters of the string to be matched have been covered.
For example, with the string to be matched hoabcdefguafkdoieum and the target string abcdefg, first hoabcde is taken and its hash value is calculated and compared with the hash value of the target string. The hash values differ, so the next substring oabcdef is taken and its hash value is calculated and compared; they differ again, so the next substring abcdefg is taken as the current substring. Its hash value is the same as that of the target string, so no further substrings are taken. Each substring taken has the same length as the target string. If no substring before kdoieum has the same hash value as the target string, the hash value of kdoieum is compared with that of the target string; if it also differs, no further substring is taken, and the string to be matched is determined not to contain the target string. By taking, one by one, every substring with the same length as the target string, the method determines whether the string to be matched contains the target string without skipping any character of the string to be matched, which makes the matching exact and ensures its accuracy.
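The procedure above recomputes each substring's hash from scratch, which still reads all L(T) characters of every substring. Because adjacent substrings are offset by only one character, a well-known technique not described in the source, the Rabin-Karp rolling hash, can update the hash of the next substring in constant time by removing the leading character and appending the new one. The sketch below is one such implementation under assumed parameters (`base` = 257 and the prime modulus 2^61 - 1 are arbitrary choices); `rabin_karp_find` is a hypothetical name, and hash matches are confirmed by direct comparison to rule out collisions.

```python
def rabin_karp_find(text: str, pattern: str,
                    base: int = 257, mod: int = (1 << 61) - 1) -> int:
    """Return the first index of `pattern` in `text`, or -1 (Rabin-Karp sketch)."""
    n, m = len(text), len(pattern)
    if m > n:
        return -1  # no substring of the required length exists
    if m == 0:
        return 0
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = w_hash = 0
    for k in range(m):
        # Polynomial hash of the pattern and of the first window.
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        w_hash = (w_hash * base + ord(text[k])) % mod
    for start in range(n - m + 1):
        # Equal hashes are confirmed character by character (collisions).
        if w_hash == p_hash and text[start:start + m] == pattern:
            return start
        if start + m < n:
            # Slide by one character: drop text[start], append text[start + m].
            w_hash = (w_hash - ord(text[start]) * high) % mod
            w_hash = (w_hash * base + ord(text[start + m])) % mod
    return -1
```

With this update rule a scan costs roughly O(L(S) + L(T)) arithmetic steps on average, instead of O(L(T)) per substring when every hash is recomputed.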
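The sequence of substrings in this example can be reproduced with a short sketch. `trace_windows` is a hypothetical helper that records each substring whose hash value is compared, using SHA-256 as an assumed hash function, and stops at the first equal hash.

```python
import hashlib


def hash_value(s: str) -> str:
    # Assumed hash function: SHA-256 over the UTF-8 encoding of the string.
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def trace_windows(text: str, target: str) -> list[str]:
    """Return, in order, the substrings whose hashes are compared."""
    target_hash = hash_value(target)
    m = len(target)
    compared = []
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        compared.append(window)
        if hash_value(window) == target_hash:
            break  # stop comparing once the hash values are the same
    return compared


print(trace_windows("hoabcdefguafkdoieum", "abcdefg"))
# ['hoabcde', 'oabcdef', 'abcdefg']
```

For a target that does not occur, such as zzzzzzz, the helper returns all 13 substrings of length 7, the last being kdoieum.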
Optionally, to avoid matching errors and improve matching accuracy, the character types in the string to be matched are corrected before matching so that they are consistent. That is, before the hash value of the substring of the string to be matched whose length is the same as that of the target string is calculated, the method also includes: extracting the characters in the string to be matched that satisfy a preset condition, and obtaining the type of the extracted characters; obtaining the type of the string to be matched; determining whether the type of the extracted characters is consistent with the type of the string to be matched; and, if not, correcting the extracted characters so that the type of the corrected characters is consistent with the type of the string to be matched.
A character that satisfies the preset condition may be, for example, a punctuation mark. Suppose the string to be matched contains both Chinese and English punctuation: the string to be matched is hoa，bcdefguafkdoieum and the target string is a,b, so the punctuation in the target string is an English comma while the punctuation in the string to be matched is a Chinese comma. If the punctuation in the string to be matched is not corrected, "a，b" in the string to be matched and the target string "a,b" are treated as different strings during matching, although they are essentially the same string. For accurate matching, the Chinese punctuation in the string to be matched is therefore changed to English punctuation, consistent with the type of the other characters in the string to be matched, which are English.
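A minimal sketch of the punctuation correction, assuming the string to be matched is of English type, so that full-width Chinese punctuation is mapped to its ASCII counterpart. The mapping table `FULLWIDTH_TO_ASCII` and the function `normalize_punctuation` are hypothetical and cover only a few common marks; `str.maketrans` and `str.translate` are standard Python methods.

```python
# Hypothetical mapping from Chinese (full-width) punctuation to English punctuation.
FULLWIDTH_TO_ASCII = str.maketrans({
    "，": ",",
    "。": ".",
    "：": ":",
    "；": ";",
    "？": "?",
    "！": "!",
    "（": "(",
    "）": ")",
})


def normalize_punctuation(text: str) -> str:
    """Rewrite Chinese punctuation as English punctuation before hashing."""
    return text.translate(FULLWIDTH_TO_ASCII)


print(normalize_punctuation("hoa，bcdefguafkdoieum"))  # hoa,bcdefguafkdoieum
```

After this correction, the substring "a,b" of the string to be matched has the same characters, and therefore the same hash value, as the target string "a,b".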
Optionally: if two strings are identical, their lengths are also identical, so only strings of equal length can be found identical when compared. If the string to be matched is shorter than the target string, or the remaining part of it is too short to yield a substring with the same length as the target string, no further substrings are taken. That is, before the hash value of the target string is calculated, the method also includes: obtaining the lengths of the string to be matched and of the target string; determining whether the length of the string to be matched is greater than or equal to the length of the target string; and, if so, proceeding to calculate the hash value of the target string.
For example, with the string to be matched hoabcdefguafkdoieum and the target string abcdefg, after the hash value of the substring kdoieum is found to differ from the hash value of the target string abcdefg, the next substring would start at the d in doieum. However, the number of remaining characters is less than the length of the target string, so no substring with the same length as the target string can be obtained. No further substrings of the string to be matched are taken, and matching ends.
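The length check reduces to counting how many equal-length substrings exist. `window_count` is a hypothetical helper: a string of length L(S) has L(S) - L(T) + 1 substrings of length L(T) when L(S) >= L(T), and none otherwise, in which case the hash value of the target string need not be calculated.

```python
def window_count(text: str, target: str) -> int:
    """Number of substrings of `text` with the same length as `target`.

    Returns 0 when `text` is shorter than `target`, so no target hash is needed.
    """
    return max(len(text) - len(target) + 1, 0)


print(window_count("hoabcdefguafkdoieum", "abcdefg"))  # 13
```

In the example, 19 - 7 + 1 = 13 substrings exist; the last starts at index 12 (kdoieum), and from the d at index 13 only 6 characters remain.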
Optionally, to avoid matching the same string to be matched against the same target string repeatedly, the method also includes: adding a mark to the string to be matched, where the mark indicates that the string to be matched has already been matched. A marked string has already been matched, whether or not it matched the target string, and does not need to be matched again. The mark may be added after a string has been matched, or strings that clearly do not need matching may be identified and marked beforehand; this is not limited here. The mark may indicate that a particular target string need not be matched again. For example, if the string to be matched hoabcdefguafkdoieum has been matched against the target string abcdefg, a mark a1 is added to hoabcdefguafkdoieum, and when abcdefg is matched again, hoabcdefguafkdoieum need not be matched. However, the string to be matched carrying mark a1 can still be matched against a different target string, such as opq: different target strings can be matched against the same string to be matched.
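One way to keep such marks is a set keyed by (target string, string to be matched) pairs, so that a mark such as a1 applies to one target string only. `MatchRegistry` and `match_once` are hypothetical names, and Python's `in` operator stands in for the hash-based matcher to keep the sketch short.

```python
from typing import Optional


class MatchRegistry:
    """Hypothetical record of (target, string to be matched) pairs already matched."""

    def __init__(self) -> None:
        self._marks: set[tuple[str, str]] = set()

    def already_matched(self, target: str, text: str) -> bool:
        return (target, text) in self._marks

    def mark(self, target: str, text: str) -> None:
        self._marks.add((target, text))


def match_once(registry: MatchRegistry, target: str, text: str) -> Optional[bool]:
    """Match `target` against `text` unless this pair is already marked."""
    if registry.already_matched(target, text):
        return None  # skipped: marked as matched, regardless of the earlier result
    result = target in text  # stand-in for the hash-based matcher
    registry.mark(target, text)
    return result
```

Matching abcdefg against hoabcdefguafkdoieum twice performs the match only once, while matching opq against the same string is still carried out because the mark is per target string.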
An embodiment of this application is described below with reference to Fig. 2.
Step S201: obtain the source string S and its length L(S), that is, obtain the string to be matched S and its length.
Step S202: initialize the variable i = 0, where i is the position of a character in the string to be matched, counting from 0.
Step S203: determine whether L(S) >= i + L(T), that is, whether the number of characters from position i to the last character of the string to be matched is greater than or equal to the length of the target string. If so, go to step S204; if not, go to step S209.
Step S204: take characters i through i + L(T) - 1 of S as the substring S1.
Step S205: obtain the hash value H(S1) of S1.
Step S206: determine whether H(S1) is equal to H(T). If so, go to step S207; if not, go to step S208.
Step S207: matching succeeds, that is, the hash values are the same and the match is successful.
Step S208: set i = i + 1, so that a new substring is taken starting at the next character of the string to be matched, then go to step S203 to check whether the remaining length of the string to be matched still meets the requirement of the target string.
Step S209: matching fails.
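Steps S201 to S209 map directly onto a loop over i. In this hypothetical sketch, `fig2_match` returns the position i at which matching succeeded, or -1 when matching fails; SHA-256 is an assumed choice for H, and the variable names follow the symbols S, T, L(S), L(T), and S1 used in the steps.

```python
import hashlib


def H(s: str) -> str:
    # Hash function H(.); SHA-256 is an assumed choice, the source names none.
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def fig2_match(S: str, T: str) -> int:
    """Follow steps S201-S209; return the match position i, or -1 on failure."""
    L_S, L_T = len(S), len(T)   # S201: string to be matched S and its length L(S)
    H_T = H(T)                  # hash value H(T) of the target string T
    i = 0                       # S202: initialize i = 0
    while L_S >= i + L_T:       # S203: at least L(T) characters left from position i?
        S1 = S[i:i + L_T]       # S204: characters i .. i + L(T) - 1 form S1
        if H(S1) == H_T:        # S205/S206: compute H(S1) and compare it with H(T)
            return i            # S207: matching succeeds
        i += 1                  # S208: move to the next character, back to S203
    return -1                   # S209: matching fails
```

For the earlier example, `fig2_match("hoabcdefguafkdoieum", "abcdefg")` succeeds at i = 2.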
Through the above steps, the hash values of two strings of equal length are compared, and strings with the same hash value are treated as identical. Because calculating a hash value is faster than comparing characters one by one, the matching speed of two strings is improved, which solves the prior-art technical problem of slow string matching. For long strings in particular, the number of one-by-one comparisons is reduced, which can significantly increase matching speed.
An embodiment of this application also provides a character string matching device. The device can execute the character string matching method described above; that is, the method can be executed by this device.
Fig. 3 is a schematic diagram of the character string matching device according to an embodiment of this application. As shown in Fig. 3, the device includes a first acquisition unit 10, a second acquisition unit 30, a computing unit 50, a comparing unit 70, and a first determining unit 90, where:
The first acquisition unit 10 is configured to obtain the target string and calculate the hash value of the target string.
The second acquisition unit 30 is configured to obtain the string to be matched.
The computing unit 50 is configured to calculate the hash value of each substring of the string to be matched whose length is the same as that of the target string, obtaining the hash values of the substrings.
The comparing unit 70 is configured to compare whether the hash value of the target string is the same as the hash value of the substring.
The first determining unit 90 is configured to determine, when the hash value of the target string is the same as the hash value of the substring, that the string to be matched matches the target string.
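The units of Fig. 3 can be pictured as methods of a single class. This is a hypothetical software layout, not a structure prescribed by the source, and SHA-256 is again an assumed hash function. The computing unit is written as a generator, so the first determining unit stops as soon as one substring hash equals the target hash, in line with the stopping behavior described for the comparing unit.

```python
import hashlib
from collections.abc import Iterator


class StringMatchingDevice:
    """Hypothetical software layout of the units shown in Fig. 3."""

    def __init__(self) -> None:
        self.target = ""
        self.target_hash = ""
        self.text = ""

    @staticmethod
    def _hash(s: str) -> str:
        # Assumed hash function: SHA-256 over the UTF-8 encoding.
        return hashlib.sha256(s.encode("utf-8")).hexdigest()

    def first_acquisition_unit(self, target: str) -> None:
        # Unit 10: obtain the target string and calculate its hash value.
        self.target = target
        self.target_hash = self._hash(target)

    def second_acquisition_unit(self, text: str) -> None:
        # Unit 30: obtain the string to be matched.
        self.text = text

    def computing_unit(self) -> Iterator[str]:
        # Unit 50: lazily hash each substring with the target's length.
        m = len(self.target)
        for i in range(len(self.text) - m + 1):
            yield self._hash(self.text[i:i + m])

    def comparing_unit(self, substring_hash: str) -> bool:
        # Unit 70: compare a substring hash with the target hash.
        return substring_hash == self.target_hash

    def first_determining_unit(self) -> bool:
        # Unit 90: matched if any substring hash equals the target hash.
        return any(self.comparing_unit(h) for h in self.computing_unit())
```

Typical use: call `first_acquisition_unit("abcdefg")`, then `second_acquisition_unit("hoabcdefguafkdoieum")`, then `first_determining_unit()`, which returns True.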
The character string matching method of this embodiment can be applied where two strings must be compared for equality, or where it must be determined whether one string is contained in another, for example when searching for a sentence in an article or for a keyword in a sentence. The comparison differs from the prior art: the prior art compares characters one by one and treats strings whose characters are all identical as identical, whereas this embodiment compares the hash values of two strings of equal length and treats strings with identical hash values as identical. Because calculating a hash value is faster than comparing characters one by one, the matching speed of two strings is improved, which solves the prior-art technical problem of slow string matching. For long strings in particular, the number of one-by-one comparisons is reduced, which can significantly increase matching speed.
Specifically, suppose the string to be matched is hoabcdefguafkdoieum and the target string is abcdefg. The hash value of the target string is calculated as H(T). A substring of equal length, hoabcde, is taken from the string to be matched, and its hash value is H1(S). Since H(T) is not equal to H1(S), abcdefg is determined to differ from hoabcde. When the equal-length substring abcdefg is taken from the string to be matched, its hash value is H2(S); since H(T) equals H2(S), the string to be matched is determined to contain the target string. In this embodiment the target string contains 7 characters, so the prior-art one-by-one method needs up to 7 character comparisons for each substring taken, whereas the method of this embodiment needs only one comparison. This increases the matching speed of strings and solves the prior-art technical problem of slow string matching. The example also shows that the more characters the target string contains, the greater the advantage of the character string matching method performed by the character string matching device of this embodiment over existing string matching methods, and the faster the matching.
Optionally, to improve matching accuracy, the target string is compared with every qualifying substring of the string to be matched, where a qualifying substring is one whose length equals the length of the target string. That is, the comparing unit includes: a determining module, configured to take the substring of the string to be matched that starts at the first character as the current substring; a computing module, configured to compute the hash value of the current substring and compare it with the hash value of the target string; a stopping module, configured to stop comparing when the hash value of the current substring is identical to the hash value of the target string; and an acquisition module, configured to, when the two hash values differ, take the next substring of the string to be matched as the current substring and return to the computing module to compute its hash value, until the length of the next substring is less than the length of the target string. Two adjacent substrings of the string to be matched are offset by one character.
The hash values of the equal-length substrings of the string to be matched are compared with the hash value of the target string one by one. Once the string to be matched is found to contain the target string, the remaining substrings are not compared. If the current substring differs from the target string, the next substring is compared, until all characters of the string to be matched have been covered.
For example, the string to be matched is hoabcdefguafkdoieum and the target string is abcdefg. First, hoabcde is taken, and its hash value is computed and compared with the hash value of the target string; the hash values differ, so the next substring, oabcdef, is taken, and its hash value is computed and compared with that of the target string; they differ again, so the next substring, abcdefg, is taken as the current substring, and its hash value is computed and compared with that of the target string; the hash values are identical, so no further substrings are taken. Every extracted substring has the same length as the target string. If no substring with an identical hash value is found before kdoieum is reached, the hash value of kdoieum is compared with that of the target string; if they still differ, no further substring is taken, and the string to be matched is determined not to contain the target string. Because every substring with the same length as the target string is examined in turn, no character of the string to be matched is skipped, which ensures accurate matching.
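The window-by-window procedure above (determining, computing, stopping, and acquisition modules) can be sketched in Python as follows. SHA-256 is again an assumed hash function and `find_by_window_hash` is a hypothetical name; the function returns the start offset of the first substring whose hash value equals the target's, or -1 when there is none.

```python
import hashlib


def digest(s: str) -> bytes:
    # Assumed hash function (SHA-256 over UTF-8); the application leaves this open.
    return hashlib.sha256(s.encode("utf-8")).digest()


def find_by_window_hash(text: str, target: str) -> int:
    m = len(target)
    target_hash = digest(target)  # H(T), computed once
    start = 0  # determining module: begin at the first character
    # Continue while a full window of the target's length remains.
    while start + m <= len(text):
        window = text[start:start + m]  # current substring
        if digest(window) == target_hash:  # computing module
            return start  # stopping module: equal hashes, stop comparing
        start += 1  # acquisition module: adjacent windows differ by one character
    return -1  # fewer characters remain than the target's length


print(find_by_window_hash("hoabcdefguafkdoieum", "abcdefg"))  # 2
```

For the example string, the windows hoabcde and oabcdef are rejected before abcdefg matches at offset 2.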
Optionally, to avoid matching errors and improve matching accuracy, the character type of the string to be matched is corrected before matching so that it is consistent. That is, the device further includes: an extraction unit, configured to, before the hash values of the substrings of the string to be matched that have the same length as the target string are computed, extract the characters of the string to be matched that meet a preset condition and obtain the type of the extracted characters; a third acquiring unit, configured to obtain the type of the string to be matched; a first judging unit, configured to judge whether the type of the extracted characters is consistent with the type of the string to be matched; and an amending unit, configured to, when they are inconsistent, revise the extracted characters so that the type of the revised characters is consistent with the type of the string to be matched.
Characters meeting the preset condition may be, for example, punctuation marks. Suppose the string to be matched mixes Chinese and English punctuation: the string to be matched is hoa，bcdefguafkdoieum and the target string is a,b, so the punctuation in the target string is English while the punctuation in the string to be matched is Chinese (a full-width comma). If the punctuation of the string to be matched is not corrected, matching treats "a，b" in the string to be matched and the target string "a,b" as different strings, even though they are essentially the same string. To match accurately, the Chinese punctuation in the string to be matched is revised to English punctuation, consistent with the type (English) of its other characters.
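A sketch of this punctuation correction in Python, under two assumptions not stated in the application: the "type" of a string is English when it contains no CJK ideographs (U+4E00 to U+9FFF), and only a handful of full-width Chinese punctuation marks need mapping. The table `CN_TO_EN_PUNCT` and the functions `is_cjk` and `normalize_punctuation` are hypothetical names.

```python
# Illustrative mapping from Chinese (full-width) punctuation to English punctuation.
CN_TO_EN_PUNCT = str.maketrans({
    "，": ",",
    "。": ".",
    "：": ":",
    "；": ";",
    "！": "!",
    "？": "?",
    "（": "(",
    "）": ")",
})


def is_cjk(ch: str) -> bool:
    # CJK Unified Ideographs block only (a simplifying assumption).
    return "\u4e00" <= ch <= "\u9fff"


def normalize_punctuation(text: str) -> str:
    # A string with Chinese characters is left unchanged; otherwise its type is
    # taken as English and Chinese punctuation is revised to match.
    if any(is_cjk(ch) for ch in text):
        return text
    return text.translate(CN_TO_EN_PUNCT)


print(normalize_punctuation("hoa，bcdefguafkdoieum"))  # hoa,bcdefguafkdoieum
```

After this correction, the target string a,b is found in the string to be matched by any exact-match procedure.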
Optionally, two identical strings necessarily have the same length, so only strings of equal length can be found identical when compared. If the lengths differ, or the remaining part of the string to be matched is too short to yield a substring as long as the target string, no further substring is obtained. That is, the device further includes: a fourth acquiring unit, configured to obtain the lengths of the string to be matched and of the target string before the hash value of the target string is computed; a second judging unit, configured to judge whether the length of the string to be matched is greater than or equal to the length of the target string; and a second determining unit, configured to determine, when the judgment is yes, that the hash value of the target string is to be computed.
For example, the string to be matched is hoabcdefguafkdoieum and the target string is abcdefg. After the hash value of the substring kdoieum is found to differ from that of the target string abcdefg, the next substring would start at the d of doieum, but the number of remaining characters is less than the length of the target string, so no substring with the same length as the target string can be obtained. It is therefore determined that no further substring of the string to be matched is obtained, and matching ends.
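The length check and the stopping point can be made concrete with a small helper (hypothetical name `window_starts`) that lists the start offsets of every substring as long as the target string. It yields no offsets when the string to be matched is shorter than the target, which corresponds to not computing the target's hash value at all.

```python
def window_starts(text: str, target: str) -> range:
    # Length pre-check: a shorter string cannot contain the target.
    if len(text) < len(target):
        return range(0)
    # The last full window starts len(text) - len(target) characters in.
    return range(len(text) - len(target) + 1)


text, target = "hoabcdefguafkdoieum", "abcdefg"
starts = window_starts(text, target)
last = starts[-1]
print(last, text[last:last + len(target)])  # 12 kdoieum
print(len(text[last + 1:]))  # 6: "doieum" is shorter than the target
```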
Optionally, to avoid matching the same string to be matched against the same target string repeatedly, the device further includes: an adding unit, configured to add a mark to the string to be matched, where the mark indicates that the string to be matched has already been matched. A string carrying a mark has already been matched and, whether or not the target string was found in it, does not need to be matched again. The mark may be added after a string has been matched, or strings that clearly need no matching may be filtered out and marked beforehand; this is not limited here. The mark can indicate that a particular target string need not be matched again. For example, if the string to be matched hoabcdefguafkdoieum has been matched against the target string abcdefg, the mark a1 is added to hoabcdefguafkdoieum; when abcdefg is to be matched again, hoabcdefguafkdoieum need not be matched. However, the string carrying mark a1 can still be matched against the target string opq; that is, different target strings can be matched against the same string to be matched.
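One way to keep such marks is a registry, keyed by the string to be matched, that records which target strings it has already been matched against. This is a sketch under that assumption; the class name `MatchRegistry` is hypothetical, and a mark such as a1 is modeled simply as the target string it refers to.

```python
class MatchRegistry:
    """Records which target strings each string to be matched has been matched against."""

    def __init__(self) -> None:
        self._marks: dict[str, set[str]] = {}

    def mark(self, text: str, target: str) -> None:
        # Add a mark after matching, whether or not the target was found.
        self._marks.setdefault(text, set()).add(target)

    def already_matched(self, text: str, target: str) -> bool:
        # True only if this exact target has been matched against this text.
        return target in self._marks.get(text, set())


registry = MatchRegistry()
registry.mark("hoabcdefguafkdoieum", "abcdefg")
print(registry.already_matched("hoabcdefguafkdoieum", "abcdefg"))  # True
print(registry.already_matched("hoabcdefguafkdoieum", "opq"))  # False
```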
With the above embodiments, the hash values of two strings of equal length are compared, and strings with identical hash values are treated as identical. Because computing a hash value is faster than a one-to-one character comparison, matching two strings becomes faster, which solves the prior-art problem of slow string matching. For long strings in particular, the number of one-to-one comparisons is reduced, which can significantly improve matching speed.
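The windowed procedure in these embodiments recomputes each substring's hash value from all of its characters. A widely used refinement, not described in this application, is the Rabin-Karp rolling hash, which derives the next window's hash in constant time by removing the outgoing character and adding the incoming one. The sketch below uses a polynomial hash modulo a prime (the `base` and `mod` values are arbitrary choices) and, because such a hash can collide, confirms each hash hit with a direct comparison.

```python
def rabin_karp_find(text: str, target: str, base: int = 131, mod: int = 1_000_000_007) -> int:
    n, m = len(text), len(target)
    if m == 0:
        return 0
    if n < m:  # length pre-check
        return -1
    high = pow(base, m - 1, mod)  # weight of the outgoing (leftmost) character
    target_hash = window_hash = 0
    for i in range(m):
        target_hash = (target_hash * base + ord(target[i])) % mod
        window_hash = (window_hash * base + ord(text[i])) % mod
    for start in range(n - m + 1):
        # Verify on a hash hit, since different strings can share a modular hash.
        if window_hash == target_hash and text[start:start + m] == target:
            return start
        if start + m < n:
            # Roll: drop text[start], shift, and append text[start + m].
            window_hash = (window_hash - ord(text[start]) * high) % mod
            window_hash = (window_hash * base + ord(text[start + m])) % mod
    return -1


print(rabin_karp_find("hoabcdefguafkdoieum", "abcdefg"))  # 2
```

Because of the verification step, the result does not rely on hash values being unique, unlike the rule in this application that identical hash values imply identical strings.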
The serial numbers of the above embodiments of the present application are for description only and do not indicate that any embodiment is better than another.
In the above embodiments of the present application, the description of each embodiment has its own emphasis; for any part not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
It should be understood that the technical content disclosed in the several embodiments provided in the present application may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units, or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected as needed to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The foregoing is merely a preferred implementation of the present application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present application, and such improvements and refinements shall also fall within the protection scope of the present application.

Claims (10)

1. A string matching method, characterized by comprising:
obtaining a target string and computing a hash value of the target string;
obtaining a string to be matched;
computing hash values of substrings of the string to be matched whose length is the same as the length of the target string;
comparing whether the hash value of a substring is identical to the hash value of the target string;
when the hash value of the substring is identical to the hash value of the target string, determining that the string to be matched matches the target string.
2. The method according to claim 1, characterized in that comparing whether the hash value of the target string is identical to the hash value of the substring comprises:
taking the substring of the string to be matched that starts from the first character as the current substring;
computing the hash value of the current substring and comparing it with the hash value of the target string;
when the hash value of the current substring is identical to the hash value of the target string, stopping the comparison;
when the hash value of the current substring differs from the hash value of the target string, taking the next substring of the string to be matched as the current substring and returning to the step of computing the hash value of the current substring, until the length of the next substring is less than the length of the target string, wherein two adjacent substrings of the string to be matched are offset by one character.
3. The method according to claim 1 or 2, characterized in that, before computing the hash values of the substrings of the string to be matched whose length is the same as the length of the target string, the method further comprises:
extracting characters of the string to be matched that meet a preset condition, and obtaining the type of the extracted characters;
obtaining the type of the string to be matched;
judging whether the type of the extracted characters is consistent with the type of the string to be matched, and when they are inconsistent, revising the extracted characters so that the type of the revised characters is consistent with the type of the string to be matched.
4. The method according to claim 1, characterized in that, before computing the hash value of the target string, the method further comprises:
obtaining the lengths of the string to be matched and of the target string respectively;
judging whether the length of the string to be matched is greater than or equal to the length of the target string;
when the judgment is yes, determining that the hash value of the target string is to be computed.
5. The method according to claim 1, characterized in that the method further comprises:
adding a mark to the string to be matched, wherein the mark indicates that the string to be matched has been matched.
6. A string matching device, characterized by comprising:
a first acquiring unit, configured to obtain a target string and compute a hash value of the target string;
a second acquiring unit, configured to obtain a string to be matched;
a computing unit, configured to compute hash values of substrings of the string to be matched whose length is the same as the length of the target string;
a comparing unit, configured to compare whether the hash value of a substring is identical to the hash value of the target string;
a first determining unit, configured to determine, when the hash value of the target string is identical to the hash value of the substring, that the string to be matched matches the target string.
7. The device according to claim 6, characterized in that the comparing unit comprises:
a determining module, configured to take the substring of the string to be matched that starts from the first character as the current substring;
a computing module, configured to compute the hash value of the current substring and compare it with the hash value of the target string;
a stopping module, configured to stop comparing when the hash value of the current substring is identical to the hash value of the target string;
an acquisition module, configured to, when the hash value of the current substring differs from the hash value of the target string, take the next substring of the string to be matched as the current substring and return to the computing module to compute the hash value of the current substring, until the length of the next substring is less than the length of the target string, wherein two adjacent substrings of the string to be matched are offset by one character.
8. The device according to claim 6 or 7, characterized in that the device further comprises:
an extraction unit, configured to, before the hash values of the substrings of the string to be matched whose length is the same as the length of the target string are computed, extract characters of the string to be matched that meet a preset condition and obtain the type of the extracted characters;
a third acquiring unit, configured to obtain the type of the string to be matched;
a first judging unit, configured to judge whether the type of the extracted characters is consistent with the type of the string to be matched;
an amending unit, configured to, when the first judging unit judges that they are inconsistent, revise the extracted characters so that the type of the revised characters is consistent with the type of the string to be matched.
9. The device according to claim 6, characterized in that the device further comprises:
a fourth acquiring unit, configured to obtain the lengths of the string to be matched and of the target string respectively before the hash value of the target string is computed;
a second judging unit, configured to judge whether the length of the string to be matched is greater than or equal to the length of the target string;
a second determining unit, configured to determine, when the second judging unit judges yes, that the hash value of the target string is to be computed.
10. The device according to claim 6, characterized in that the device further comprises:
an adding unit, configured to add a mark to the string to be matched, wherein the mark indicates that the string to be matched has been matched.
CN201510549622.4A 2015-08-31 2015-08-31 Character string matching method and device Pending CN106484730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510549622.4A CN106484730A (en) 2015-08-31 2015-08-31 Character string matching method and device

Publications (1)

Publication Number Publication Date
CN106484730A 2017-03-08

Family

ID=58235459

Country Status (1)

Country Link
CN (1) CN106484730A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20170308
