WO2007101391A1

WO2007101391A1 - A discrete substring matching method for information searching and information inputting

Info

Publication number: WO2007101391A1
Application number: PCT/CN2007/000392
Authority: WO
Inventors: Guangyao Ding
Original assignee: Guangyao Ding
Priority date: 2006-03-07
Filing date: 2007-02-05
Publication date: 2007-09-13

Abstract

A discrete substring pattern matching method for information retrieval and information input is disclosed. A discrete substring is a character string $S_{g_1} S_{g_2} \ldots S_{g_m}$ ($1 \le g_1 < g_2 < \ldots < g_m \le n$) formed by one or more characters of the text $S = S_1 S_2 \ldots S_n$. Discrete substring pattern matching determines whether the pattern $P = P_1 P_2 P_3 \ldots P_m$ ($1 \le m \le n$) is a discrete substring $S_{g_1} S_{g_2} \ldots S_{g_m}$ of the text S. The method also provides the detailed steps of discrete substring pattern matching. The discrete substring extends the scope of the substring concept, and the pattern matching method solves the problem of omitted results when searching text with discrete features. Functionally, it improves the completeness and accuracy of searching and makes positioning easy. In application, it makes information retrieval and information input simple, flexible and fast.

Description

Discrete substring pattern matching method for information retrieval and information input

Technical field

The invention relates to a discrete substring pattern matching method for information retrieval and information input.

Background technique

The existing fields of information retrieval and information input rely on substrings to perform pattern matching on text. For example, in information retrieval, the search term that is entered is used as a substring and matched against stored text such as a database or a web page. If the search term is a substring of a stored text, that text is output as a retrieved text; otherwise the text is discarded. If none of the stored texts matches the search term, no text is retrieved. In information input, a character string entered on an input device such as a keyboard is used as a pattern substring and matched against the texts in a text library stored by an information processing device such as a computer; if the pattern substring matches a text, that text is selected and subsequent processing is performed. Obviously, the speed, recall rate and precision of the substring pattern matching method are crucial for information retrieval and information input.

In the existing fields of information retrieval and information input, a substring is defined as follows: over a finite character set, given a text string of length $n$, $S = S_1 S_2 \ldots S_n$, and a pattern string of length $m$, $P = P_1 P_2 \ldots P_m$, if $S_i S_{i+1} \ldots S_{i+m-1} = P_1 P_2 \ldots P_m$, then P is a substring of S and P occurs in S at position $i$. That is, an existing substring must consist of consecutive characters of the text string S; a character string composed of non-consecutive characters of S is not a substring of S. Substring pattern matching means determining whether the text S contains a substring equal to the pattern P. Some applications require, in addition to this determination, that the matching degree and the position of occurrence be output.

This is the simplest and most classic substring pattern matching problem. The earliest method for it is the brute-force method (the simple substring pattern matching method), which has a worst-case time complexity of $O(m \cdot n)$. In 1970, S. A. Cook proved theoretically that the substring pattern matching problem can be solved in $O(m+n)$ time. In the same year, Morris and Pratt constructed an algorithm following Cook's proof, but its time complexity did not reach $O(m+n)$. Knuth later improved the algorithm, and in 1976 the first algorithm to solve substring pattern matching in $O(m+n)$ time was born. This algorithm is abbreviated as KMP (Knuth, Morris, Pratt), and it reduces the time complexity significantly. In 1977, Boyer and Moore proposed another algorithm with linear time complexity $O(m+n)$ (the BM algorithm). The BM algorithm matches from right to left and skips many useless characters during actual pattern matching, so it achieves high efficiency; the effect is especially significant when matching substrings over a large character set, and the algorithm is widely used. Since then, some more efficient algorithms have been proposed, mostly based on the KMP or BM algorithm.

All of the above pattern matching algorithms take the pattern P and search the text string S for a contiguous substring that matches P, and successive improvements have centered on raising the matching speed.

With substring matching of this kind, the fields of information retrieval, information input and the like have long suffered from the problem of relevant texts being missed.

Let the text string S contain the substring $S_i S_{i+1} \ldots S_{i+m-1}$, which is equal to (matches) the pattern P. Then the discrete characteristics of this substring within the text S are reflected mainly in three aspects:

a) Contiguous in the middle: it consists of $m$ consecutive characters starting from the $i$-th character of S;

b) Omitted at the back: the last $n-m-i+1$ characters of S are omitted as one contiguous run;

c) Omitted at the front: the first $i-1$ characters of S are omitted as one contiguous run.

The following examples show the omission of discretely related text that results from matching with the above substrings:

Example 1: S= "Chinese Pinyin, Stroke, Tone Combination Input Method"; P = "Pinyin, Stroke, Tone".

P matches S; P is a substring of S.

Example 2: S= "Chinese Pinyin, Stroke, Tone Combination Input Method"; P = "Pinyin Stroke Tone". The typical feature here is that P appears discretely in S.

P does not match S; P is not a substring of S.

Obviously, in information retrieval and information input one would like P to count as matching S in the second case as well, but this contradicts the definition of a substring. The existing substring matching method cannot match P to S in the second case.
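Examples 1 and 2 can be reproduced with an ordinary contiguous substring test. The following is only an illustrative sketch: the helper name `is_substring` is hypothetical, and Python's built-in `in` operator stands in for any of the classical substring matching algorithms discussed above.

```python
# Classical (contiguous) substring test, using Python's built-in `in`
# operator as a stand-in for KMP, BM or brute-force matching.
S = "Chinese Pinyin, Stroke, Tone Combination Input Method"


def is_substring(pattern: str, text: str) -> bool:
    """Return True if `pattern` occurs in `text` as consecutive characters."""
    return pattern in text


example_1 = is_substring("Pinyin, Stroke, Tone", S)  # commas included
example_2 = is_substring("Pinyin Stroke Tone", S)    # commas left out
print(example_1, example_2)
```

Leaving out the two commas is enough to make the classical test fail, even though every character of the second pattern still appears in S in the same order.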

This problem arises because the existing substring concept requires the characters of the substring to be adjacent. As a concept of relevance, the substring does not reflect discrete features and is therefore not a complete one. By definition it misses relevant text with discrete characteristics, which causes many difficulties in applications and increases the complexity of solving the problem.

The following examples further show the inherent problem of missing discretely related text throughout information retrieval systems based on the substring concept.

Example 3: File Search

Suppose the file "my-working-daily-plan.doc" exists on the hard disk. With existing substring pattern matching, the file cannot be retrieved with the search string "mwdp", even though "mwdp" appears discretely in "my-working-daily-plan.doc".

Example 4: When spelling the English word "procedure", it is easy to remember the string "prcde", made up of the leading letters of the syllables and the final letter. "prcde" does not satisfy the definition of a substring of "procedure", yet it appears discretely in "procedure". If "prcde" is entered, existing substring matching cannot retrieve the English word.

Example 5: Input of Chinese characters

Assume that the pinyin-plus-stroke encoding of the Chinese character for "bed" is "chuangdhp" and that it is stored in the Chinese character encoding library. Since the encoding is long, can "bed" be entered with abbreviated inputs that omit characters arbitrarily, such as "cugdh", "cdhl" or "cugd", thereby reducing the input code length? Note that "cugdh", "cdhl", and "cugd" appear discretely in "chuangdhp". This feature cannot be provided by existing substring pattern matching.

Another type of pattern matching is inexact matching, which determines whether the pattern P is similar to the text S while allowing a limited number of errors under a similarity constraint, and returns the determination result and the position of the match. It is applied in information retrieval, information processing, DNA matching in biotechnology, and many other fields. The main error factors that prevent an exact match include insertion errors, transposition errors, deletion errors, substitution errors, reversal errors, and so on. Because the error factors are so varied, inexact pattern matching methods consider several error factors together and, from different application angles and with various techniques, have produced solutions whose time complexity ranges from linear to that of NP-complete (NPC) problems. These methods try to solve the matching problem with a limited number of allowed errors, and their effect is limited by how comprehensively errors are considered and by the number of errors allowed.

For example, when inexact matching is applied to the problem in Example 4 above, the ED (Edit Distance) method treats "prcde" as containing four deletion errors. Since the ED method also considers insertion errors and substitution errors, allowing matches with four errors and handling all three error types together would match a large number of words in the English lexicon that satisfy the four-error constraint, making the matching result meaningless. Wildcard matching is one option for this type of problem, such as entering "pr*c*d*e", but the user must consider where to add wildcards and how many; for the general public, this solution is still difficult to operate. Maximum matching can also solve the problem, but it is equivalent to considering insertion and deletion errors together, so the method itself is more complex, its time complexity is higher, and it increases the number of candidate words in lexicon retrieval. Hamming distance considers only substitution errors. Similarity matching still considers the three kinds of error factors together and seeks the similarity between the pattern and the text. The retrieval effect of the above inexact methods is limited by the number of allowed errors and by how comprehensively errors are considered. Therefore, existing inexact matching does not solve these typical discrete relevance problems well.

As networked information continues to spread and deepen, information acquisition and information input by the general public have become an information bottleneck, and string pattern matching has become central to both. With information acquisition and information input based on existing string pattern matching methods, the above problem of missing discretely related text causes considerable inconvenience to the general public and urgently needs to be solved.

Summary of the invention

The object of the present invention is to solve the above problems by providing a discrete substring pattern matching method for information retrieval and information input that has a high recall rate, high accuracy and easy positioning, and that makes information retrieval and information input simple, flexible and fast.

The technical solution adopted by the present invention to solve its technical problem is as follows: a discrete substring pattern matching method for information retrieval and information input, characterized in that a discrete substring is a character string $S_{g_1} S_{g_2} \ldots S_{g_m}$ ($1 \le g_1 < g_2 < \ldots < g_m \le n$) composed of any one or more characters of the text $S = S_1 S_2 \ldots S_n$; discrete substring pattern matching determines whether the pattern $P = P_1 P_2 P_3 \ldots P_m$ ($1 \le m \le n$) is a discrete substring $S_{g_1} S_{g_2} \ldots S_{g_m}$ of the text S and outputs the determination result. The specific steps are as follows:

Step a: Take the first character of the text S as the compared character, and take the first character of the pattern P as the comparison character;

Step b: If the compared character or the comparison character is the end flag, go to step d;

Step c: If the compared character is equal to the comparison character, take the next character of the text S as the compared character and the next character of the pattern P as the comparison character, and go to step b; otherwise, take the next character of the text S as the compared character, leave the comparison character unchanged, and go to step b;

Step d: If the comparison character is the end flag, the pattern P is determined to be a discrete substring of the text S, data representing the determination result "present" is output, and the matching ends; otherwise, no discrete substring of the pattern P exists in the text S, data representing the determination result "not present" is output, and the matching ends.
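Steps a to d can be sketched in a few lines. This is a minimal illustration rather than the patent's reference implementation: the function name `is_discrete_substring` is hypothetical, Python string lengths play the role of the end flags, and a boolean stands in for the "present" / "not present" output data.

```python
def is_discrete_substring(pattern: str, text: str) -> bool:
    """Return True if `pattern` appears in `text` in order, possibly with gaps."""
    i = 0  # index of the compared character in the text S (step a)
    j = 0  # index of the comparison character in the pattern P (step a)
    # Step b: stop as soon as either string is exhausted (the "end flag").
    while i < len(text) and j < len(pattern):
        # Step c: on a match advance both; otherwise advance only the text.
        if text[i] == pattern[j]:
            j += 1
        i += 1
    # Step d: P is a discrete substring exactly when P was fully consumed.
    return j == len(pattern)


print(is_discrete_substring("mwdp", "my-working-daily-plan.doc"))
print(is_discrete_substring("prcde", "procedure"))
print(is_discrete_substring("cugdh", "chuangdhp"))
```

The patent assumes $1 \le m \le n$; as written, the sketch returns True for an empty pattern, which a caller may prefer to reject.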

Compared with the prior art, the beneficial effects of the present invention are:

1. The discrete substring of the present invention is a character string consisting of any one or more characters of the text S. It extends the concept of the substring: the characters of the substring are not required to be consecutive characters of the text S, and the existing substring is only a special case of the discrete substring of the present invention. Whenever the pattern P is a discrete substring of the text S, the method reports that P matches S. Therefore, the recall rate of the present invention is significantly improved, and the problem of missing discretely related text in existing substring pattern matching is solved. The discrete substring can also be conveniently located by the further pattern matching methods described below.

2. The method of the present invention judges the text S to be related to the pattern P only when all characters of the pattern P appear in the text S completely and in order (by position), possibly discretely. Therefore, the matching accuracy is guaranteed.

3. Theoretical analysis shows that when a discrete substring exists, the time complexity of the determination is $O(n)$ and the number of character comparisons is $f(n) \le n$; when no discrete substring exists, the time complexity is $O(n)$ and the number of character comparisons is $f(n) = n$. This determination method skips irrelevant text faster than the complexity $O(m+n)$ of the existing substring pattern matching methods (S. A. Cook's theoretical result). Therefore, the method of the present invention is fast and effective.

4. Since a discrete substring is a string consisting of any one or more characters of the text $S = S_1 S_2 \ldots S_n$, the search term used for information retrieval or information input may be composed of characters taken from the text in order, and those characters may be discrete. The choice of search terms is therefore very simple and flexible; it can reduce the input code length and effectively avoid spelling errors or dialect errors.

5. Compared with existing research on inexact string matching, which is based on computing distances over error factors, discrete substring pattern matching takes a completely different approach based on discrete characteristics. Error factors are phenomena exhibited by the string matching problem, whereas the discrete property is an inherent law of the string matching problem, and the discrete property is not equivalent to any error factor. For example, there is a conceptual difference between a deletion error and a discrete feature. In text-to-pattern matching, a deletion error can exist anywhere in the text, while the discrete feature concerns only the discrete number of the characters in $S_{g_1} S_{g_2} \ldots$

This matching method is the basic discrete substring pattern matching method. When it determines that a discrete substring exists, it outputs data representing the determination result "present"; otherwise it outputs data representing the determination result "not present". It is suitable for information input decisions over short texts and large-capacity string sets.

The above discrete substring pattern matching method for information retrieval and information input can be slightly modified to form a pattern matching method that outputs a simple matching degree. In this method, steps a, b and c of the above basic matching method are unchanged, and step d is modified to:

Step d: If the comparison character is the end flag, the pattern P is determined to be a discrete substring of the text S; the length $n$ of the text S and the length $m$ of the pattern P are obtained, the simple matching degree of the discrete substring, $\mathrm{Round}(100 \times m \div n)$, is output, and the matching ends; otherwise, it is determined that no discrete substring of the pattern P exists in the text S, the determination result "-1" is output, and the matching ends.

When a discrete substring exists, this method outputs the simple matching degree $\mathrm{Round}(100 \times m \div n)$ between the text S and the pattern P. It is suitable for information input decisions over short texts and large-capacity string sets. The retrieved texts can be output in descending order of simple matching degree, so that the user can select the texts with a high matching degree first.
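The simple matching degree and the descending-order ranking can be sketched as follows. The function names and the candidate word list are hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) is assumed to be an acceptable stand-in for the unspecified `Round`.

```python
from typing import List, Tuple


def simple_matching_degree(pattern: str, text: str) -> int:
    """Return Round(100 * m / n) if `pattern` is a discrete substring of `text`, else -1."""
    m, n = len(pattern), len(text)
    i = j = 0
    while i < n and j < m:          # steps b and c of the basic method
        if text[i] == pattern[j]:
            j += 1
        i += 1
    if m == 0 or j < m:             # modified step d: no discrete substring
        return -1
    return round(100 * m / n)       # modified step d: simple matching degree


def rank_by_simple_degree(pattern: str, texts: List[str]) -> List[Tuple[int, str]]:
    """Keep only matching texts and sort them by descending matching degree."""
    scored = [(simple_matching_degree(pattern, t), t) for t in texts]
    return sorted((s for s in scored if s[0] >= 0), key=lambda s: -s[0])


# Hypothetical candidate list, for illustration only.
print(rank_by_simple_degree("prcde", ["procedure", "precede", "proceed", "procedures"]))
```

Shorter texts score higher for the same pattern, which is why "precede" ranks ahead of "procedure" in this run.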

For this method, when a discrete substring exists the time complexity is $O(n)$ and the number of character comparisons is $f(n) \le n$; when no discrete substring exists the time complexity is $O(n)$ and the number of character comparisons is $f(n) = n$, the same as for the basic pattern matching method above.

The above discrete substring pattern matching method for information retrieval and information input can be slightly modified to form a pattern matching method that outputs an exact matching degree. In this method, steps a and b of the above basic matching method are unchanged, and steps c and d are modified to:

Step c: If the compared character is equal to the comparison character, store the position of the compared character in the text S in the position array pos[ ], at the index equal to the position of the comparison character in the pattern P; take the next character of the text S as the compared character and the next character of the pattern P as the comparison character, and go to step b; otherwise, take the next character of the text S as the compared character, leave the comparison character unchanged, and go to step b;

Step d: If the comparison character is not the end flag, it is determined that no discrete substring of the pattern P exists in the text S, the determination result "-1" is output, and the matching ends; otherwise, the pattern P is determined to be a discrete substring of the text S; the length $n$ of the text S and the length $m$ of the pattern P are obtained, together with the positions in the text S of the first and last characters of the discrete substring, $g_1$ = the first value in pos[ ] and $g_m$ = the last value in pos[ ]; the exact matching degree, which reflects how dispersed the discrete substring is, is output as $\mathrm{Round}(100 \times (m - (g_m - g_1 - m + 1) \div n) \div n)$, and the matching ends.

When a discrete substring exists, this method outputs the exact matching degree $\mathrm{Round}(100 \times (m - (g_m - g_1 - m + 1) \div n) \div n)$ between the text S and the pattern P. The exact matching degree takes into account not only the lengths of the text S and the pattern P but also the effect of the discrete number of the retrieved discrete substring on the matching degree. It is likewise suitable for information input decisions over short texts and large-capacity string sets. Outputting the retrieved texts in descending order of exact matching degree makes it more convenient for the user to select the texts with a high matching degree first.
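A minimal sketch of modified steps c and d follows. It assumes 1-based positions as in the text, a Python list in place of pos[ ], and Python's built-in `round` for `Round`; the function name is hypothetical.

```python
def exact_matching_degree(pattern: str, text: str) -> int:
    """Return Round(100 * (m - D / n) / n) with D = g_m - g_1 - m + 1, or -1 if no match."""
    m, n = len(pattern), len(text)
    pos = []                            # pos[k] = 1-based position in S of P's (k+1)-th character
    i = 0
    while i < n and len(pos) < m:       # step b
        if text[i] == pattern[len(pos)]:
            pos.append(i + 1)           # modified step c: record the matched position
        i += 1
    if m == 0 or len(pos) < m:          # modified step d: no discrete substring
        return -1
    g1, gm = pos[0], pos[-1]
    d = gm - g1 - m + 1                 # number of skipped characters inside the match
    return round(100 * (m - d / n) / n)


print(exact_matching_degree("proce", "procedure"))  # contiguous occurrence, D = 0
print(exact_matching_degree("prcde", "procedure"))  # scattered occurrence, D = 4
```

Both patterns have the same simple matching degree (m = 5, n = 9), but the scattered one scores lower here because its four skipped characters are subtracted, scaled by 1/n.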

For this method, when a discrete substring exists the time complexity is $O(n)$ and the number of character comparisons is $f(n) \le n$; when no discrete substring exists the time complexity is $O(n)$ and the number of character comparisons is $f(n) = n$, the same as for the basic pattern matching method above.

The above discrete substring pattern matching method that outputs the exact matching degree can be modified to form a pattern matching method that outputs the discrete number and positions of the discrete substring. In this method, steps a, b and c of the above exact-matching-degree method are unchanged, and step d is modified to:

Step d: If the comparison character is not the end flag, it is determined that no discrete substring of the pattern P exists in the text S, the determination result "-1" is output, and the matching ends; otherwise, the pattern P is a discrete substring $S_{g_1} S_{g_2} \ldots S_{g_m}$ of the text S; the length $m$ of the pattern P is obtained, together with the positions in the text S of the first and last characters of the discrete substring, $g_1$ = the first value in pos[ ] and $g_m$ = the last value in pos[ ]; the discrete number of the discrete substring, $D = g_m - g_1 - m + 1$, and the position array pos[ ] are output, and the matching ends.

After determining that a discrete substring exists, the method outputs the discrete number of the discrete substring and the position in the text S of each of its characters. The discrete number reflects the degree of dispersion, so the retrieved texts can be sorted in ascending order of discrete number and combined with the positioning information, which makes the subsequent processing of information retrieval more accurate and effective. This method is suitable for location retrieval in short texts, such as network information search and database information retrieval.
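The discrete number and position output can be sketched as below. Names are hypothetical, positions are 1-based as in the text, and `None` is used here in place of the "-1" result purely for convenience in Python.

```python
from typing import List, Optional, Tuple


def discrete_number_and_positions(pattern: str, text: str) -> Optional[Tuple[int, List[int]]]:
    """Return (D, pos) for the first discrete substring found, or None if there is none."""
    m, n = len(pattern), len(text)
    pos: List[int] = []
    i = 0
    while i < n and len(pos) < m:
        if text[i] == pattern[len(pos)]:
            pos.append(i + 1)            # 1-based position of the matched character
        i += 1
    if m == 0 or len(pos) < m:
        return None                      # stands in for the "-1" output
    d = pos[-1] - pos[0] - m + 1         # D = g_m - g_1 - m + 1
    return d, pos


print(discrete_number_and_positions("mwdp", "my-working-daily-plan.doc"))
print(discrete_number_and_positions("prcde", "procedure"))
```

The returned positions are enough to highlight the matched characters, and the discrete number can serve as an ascending sort key across retrieved texts.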

Time complexity analysis of the method: when the first discrete substring is found, the time complexity is $O(n)$ and the number of character comparisons is $f(n) \le n$; when no discrete substring is found, the time complexity is $O(n)$ and the number of character comparisons is $f(n) = n$. As the analysis shows, the time complexity of the determination is independent of the pattern P. Only $n$ comparisons are needed to skip an irrelevant text S, and the worst case for finding the first discrete substring is $n$ character comparisons.

The above discrete substring pattern matching method that outputs the discrete number and positions of the discrete substring can be modified to form a pattern matching method that outputs the discrete number and positions of a discrete substring based on a given discrete number. In this method, steps a, b and c of the above method are unchanged, and step d is modified to:

Step d: If the comparison character is not the end flag, it is determined that no discrete substring of the pattern P exists in the text S, the determination result "-1" is output, and the matching ends; otherwise, the pattern P is a discrete substring $S_{g_1} S_{g_2} \ldots S_{g_m}$ of the text S, and the following are obtained: the length $m$ of the pattern P; the positions in the text S of the first and last characters of the discrete substring, $g_1$ = the first value in pos[ ] and $g_m$ = the last value in pos[ ]; and the discrete number of the discrete substring, $D = g_m - g_1 - m + 1$.

If $D \le D_0$, where $D_0$ is the predetermined discrete number, the pattern P is determined to be the first discrete substring of the text S that satisfies the discrete number $D_0$; the discrete number $D$ and the position array pos[ ] are output, and the matching ends.

If $D > D_0$, the matching is restarted for the next discrete substring: the position of the compared character in the text S is set to (the current compared character position) $- m - D_0$, where $m$ is the length of the pattern P and $D_0$ is the predetermined discrete number, and the character at that position is taken as the compared character; the position of the comparison character in the pattern P is reset to the first character of the pattern P, and the character at that position is taken as the comparison character; then go to step b.

This discrete substring pattern matching method based on a given discrete number $D_0$ can be applied to location retrieval in both long and short texts, such as network information search and database information retrieval.

By adjusting the discrete number $D_0$, this method changes the behavior of discrete substring pattern matching and searches for discrete substrings and positions that satisfy the given discrete number. The smaller the discrete number, the more precise the search positioning but the weaker the search capability, and some relevant texts containing discrete substrings may be skipped; the larger the discrete number, the less precise the positioning but the stronger the search capability, and the more texts that satisfy discrete substring matching are found.

Therefore, the given discrete number can be determined by the user. By changing the discrete number, a balance is sought among recall rate, precision rate and positioning accuracy, thereby meeting the different requirements that users have for these three measures under different conditions. When the given discrete number $D_0 = 0$, the method reduces to existing substring pattern matching: the retrieval function is equivalent to that of the substring, which achieves compatibility with substring pattern matching. It can be seen that the discrete number $D_0$ plays an important role in the discrete substring pattern matching method.
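The restart rule of the modified step d can be sketched as follows. This is an illustrative reading of the steps, not a verified reference implementation: the function name and the parameter `d0` (standing for $D_0$) are hypothetical, positions are 1-based as in the text, and `None` replaces the "-1" output.

```python
from typing import List, Optional, Tuple


def match_within_discrete_number(pattern: str, text: str, d0: int) -> Optional[Tuple[int, List[int]]]:
    """Return (D, pos) for the first discrete substring found with D <= d0, else None."""
    m, n = len(pattern), len(text)
    if m == 0 or d0 < 0:
        return None
    i = 0                                   # 0-based index of the compared character
    while True:
        pos: List[int] = []
        while i < n and len(pos) < m:       # steps b and c
            if text[i] == pattern[len(pos)]:
                pos.append(i + 1)
            i += 1
        if len(pos) < m:
            return None                     # end of text reached without a full match
        d = pos[-1] - pos[0] - m + 1
        if d <= d0:
            return d, pos
        # Restart: the current compared position (1-based) is i + 1, so the new
        # 1-based position is (i + 1) - m - d0, which is 0-based index i - m - d0.
        # Because D > d0, this index always lies after the previous first match.
        i = i - m - d0


print(match_within_discrete_number("mwdp", "my-working-daily-plan.doc", 14))
print(match_within_discrete_number("mwdp", "my-working-daily-plan.doc", 3))
print(match_within_discrete_number("ab", "aab", 0))
```

With `d0 = 0` the third call returns the contiguous occurrence at positions 2 and 3, which illustrates the reduction to ordinary substring matching described above.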

Time complexity analysis: when a discrete substring satisfying the discrete number $D_0$ is found, the time complexity is $O(n + k(m + D_0))$ and the number of character comparisons is $f(n) \le n + (k-1)(m + D_0)$; when no discrete substring satisfying the discrete number $D_0$ is found, the time complexity is $O(n + k(m + D_0))$ and the number of character comparisons is $f(n) = n + k(m + D_0)$. Here $k$ is the number of discrete substrings found and $D_0$ is the given discrete number.

The above discrete substring pattern matching method that outputs the exact matching degree can be slightly modified and extended to form a pattern matching method that outputs the discrete prime substring matching degree. In this method, steps a, b and c of the above exact-matching-degree method are unchanged and a discrete substring is found first; the following steps d, e and f then find the discrete prime substring, that is, the discrete substring with the smallest discrete number within the range of the discrete substring found; steps g and h then make the determination and output the discrete prime substring matching degree:

Step d: If the comparison character is not the end flag, go to step h; otherwise, move the position of the compared character in the text S back (toward the start) by 2 character positions and take the character at that position as the compared character, and move the position of the comparison character in the pattern P back by 2 character positions and take the character at that position as the comparison character.

Then, proceed to the following steps e, f, g, and h:

Step e: If the first character of the pattern P has already been compared, go to step g;

Step f: If the compared character is equal to the comparison character, store the position of the compared character in the text S in the position array pos[ ], at the index equal to the position of the comparison character in the pattern P; take the previous character of the text S as the compared character and the previous character of the pattern P as the comparison character, and go to step e; otherwise, take the previous character of the text S as the compared character, leave the comparison character unchanged, and go to step e;

Step g: The pattern P is determined to be a discrete prime substring of the text S; the length $n$ of the text S and the length $m$ of the pattern P are obtained, together with the positions in the text S of the first and last characters of the discrete prime substring, $g_1$ = the first value in pos[ ] and $g_m$ = the last value in pos[ ]; the discrete prime substring matching degree $\mathrm{Round}(100 \times (m - (g_m - g_1 - m + 1) \div n) \div n)$ is output, and the matching ends;

Step h: It is determined that no discrete prime substring of the pattern P exists in the text S, the determination result "-1" is output, and the matching ends.

The pattern matching method that outputs the discrete prime substring matching degree determines whether a discrete prime substring exists and outputs the matching degree $\mathrm{Round}(100 \times (m - (g_m - g_1 - m + 1) \div n) \div n)$ between the text S and the discrete prime substring of the pattern P. The discrete prime substring reflects a better matching position than the discrete substring; therefore, the discrete prime substring matching degree reflects the degree of matching better.
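Steps d to h can be sketched as a forward scan followed by a backward scan. This is a minimal sketch under stated assumptions: the function name is hypothetical, positions are 1-based, Python's built-in `round` stands in for `Round`, and the sample text "data database" is made up for illustration.

```python
def prime_matching_degree(pattern: str, text: str) -> int:
    """Matching degree of the discrete prime substring, or -1 if P does not match S."""
    m, n = len(pattern), len(text)
    pos = [0] * m
    i = j = 0
    while i < n and j < m:               # steps a to c: forward scan
        if text[i] == pattern[j]:
            pos[j] = i + 1               # 1-based position
            j += 1
        i += 1
    if m == 0 or j < m:
        return -1                        # step h
    i -= 2                               # step d: step back 2 in the text ...
    j -= 2                               # ... and 2 in the pattern (last match is kept)
    while j >= 0:                        # step e: stop once P's first character is placed
        if text[i] == pattern[j]:        # step f: backward scan, latest possible positions
            pos[j] = i + 1
            j -= 1
        i -= 1
    d = pos[-1] - pos[0] - m + 1         # step g
    return round(100 * (m - d / n) / n)


print(prime_matching_degree("dbase", "data database"))
```

In this run the forward scan matches "d" at position 1, while the backward scan moves it to position 6, shrinking the discrete number from 8 to 3 and raising the matching degree accordingly.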

This method is suitable for information input decisions over short texts and large-capacity string sets. Using the discrete prime substring matching degree, all retrieved texts can be output in descending order, so that the user processes the texts with a high matching degree first and retrieval efficiency is improved.

When the first discrete prime substring is found, the time complexity of the method is $O(n)$ and the number of character comparisons is $f(n) < n + (m + D_f)$, which is at most $2n - 1$, where $D_f$ is the discrete number of the first discrete prime substring found; when no discrete prime substring is found, the time complexity is $O(n)$ and the number of character comparisons is $f(n) = n$. As the analysis shows, when no discrete prime substring can be found, only $n$ comparisons are needed to skip the irrelevant text. The worst case for finding the first discrete prime substring is $2n - 1$ character comparisons.

The above discrete substring pattern matching method that outputs the discrete prime substring matching degree can be slightly modified to form a pattern matching method that outputs the discrete number and positions of the discrete prime substring. In this method, steps a to f and step h of the above method are unchanged, and step g is modified to:

Step g: The pattern P is determined to be a discrete prime substring of the text S; the length $m$ of the pattern P is obtained, together with the positions in the text S of the first and last characters of the discrete prime substring, $g_1$ = the first value in pos[ ] and $g_m$ = the last value in pos[ ]; the discrete number $D = g_m - g_1 - m + 1$ and the position array pos[ ] are output, and the matching ends.

After determining that a discrete prime substring exists, the method outputs the discrete number of the discrete prime substring and the position in the text S of each of its characters. Positioning by discrete prime substring is better than positioning by discrete substring pattern matching. This is because wherever a discrete substring exists a discrete prime substring must also exist, and the discrete number of the discrete prime substring must be less than or equal to that of the discrete substring. Since the discrete number reflects the degree of dispersion, the retrieved texts can be sorted in ascending order of discrete number and combined with the positioning information, which makes the subsequent processing of information retrieval more accurate and effective. This method is suitable for location retrieval in short texts, such as network information search and database information retrieval.
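The claim that the discrete prime substring never has a larger discrete number than the discrete substring found first can be checked side by side. The helper names below are hypothetical, positions are 1-based, and the sample strings are made up for illustration.

```python
from typing import List, Optional


def forward_positions(pattern: str, text: str) -> Optional[List[int]]:
    """Positions of the first discrete substring found by the forward scan."""
    pos: List[int] = []
    for i, ch in enumerate(text, start=1):
        if len(pos) == len(pattern):
            break
        if ch == pattern[len(pos)]:
            pos.append(i)
    return pos if pattern and len(pos) == len(pattern) else None


def prime_positions(pattern: str, text: str) -> Optional[List[int]]:
    """Positions of the discrete prime substring: forward scan, then backward scan."""
    pos = forward_positions(pattern, text)
    if pos is None:
        return None
    i = pos[-1] - 2                      # 0-based index just before the last matched character
    j = len(pattern) - 2                 # second-to-last pattern character
    while j >= 0:
        if text[i] == pattern[j]:
            pos[j] = i + 1
            j -= 1
        i -= 1
    return pos


def discrete_number(pos: List[int]) -> int:
    """D = g_m - g_1 - m + 1 for a list of matched positions."""
    return pos[-1] - pos[0] - len(pos) + 1


for p, s in [("dbase", "data database"), ("ab", "a-a-ab"), ("prcde", "procedure")]:
    fwd, prime = forward_positions(p, s), prime_positions(p, s)
    print(p, fwd, discrete_number(fwd), prime, discrete_number(prime))
```

For "prcde" in "procedure" both scans give the same positions, so the backward pass only helps when an earlier, looser occurrence of the leading characters exists.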

Time complexity analysis of the method: when the first discrete element substring is found, the time complexity is O(n) and the number of character comparisons satisfies f(n) < 2n - 1; when no discrete substring is found, the time complexity is O(n) and the number of character comparisons is f(n) = n. Thus, when the text S contains no discrete substring of the pattern, only n comparisons are needed before the irrelevant text S is skipped, and in the worst case finding the first discrete element substring requires 2n - 1 character comparisons.

The discrete substring pattern matching method for information retrieval and information input that outputs the discrete number and positions of the discrete element substring can in turn be modified to form a pattern matching method that outputs the discrete number and positions of a discrete element substring satisfying a given discrete number. In this method, steps a-f and step h of the pattern matching method for outputting the discrete number and positions of the discrete element substring are unchanged, and step g is modified to:

Step g: It is determined that the pattern P is a discrete element substring of the text S. Obtain the length m of the pattern P and the positions in the text S of the first and last characters of the discrete element substring: g_1 = the first value in pos[ ], g_m = the last value in pos[ ]; the discrete number is D = g_m - g_1 - m + 1.

If the discrete number D ≤ the given discrete number D_0, it is determined that the pattern P is the first discrete element substring of the text S that satisfies the given discrete number D_0; output the discrete number D, output the position array pos[ ], and end the match.

If D > D_0, restart the matching of the next discrete substring: change the position of the compared character in the text S to Max(the value at the second position of pos[ ], g_m + 1 - m - D_0), and take the character at that position as the compared character; change the position of the comparison character in the pattern P to the first character position of the pattern P, and take the character at that position as the comparison character; then go to step b.

By adjusting the given discrete number D_0, this method changes the behavior of discrete element substring pattern matching so that it searches for discrete element substrings, and their positions, that satisfy the discrete number requirement. The smaller the given discrete number, the more accurate the positioning but the lower the recall, since some relevant texts that contain a discrete substring may be skipped. The larger the given discrete number, the less accurate the positioning but the higher the recall, since more texts satisfying discrete substring matching are found. The present invention can therefore meet different requirements for positioning accuracy and recall in different information retrieval situations by adjusting the predetermined discrete number.

The method can be applied to the location retrieval of long and short texts such as network information search and database information retrieval.
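The restart rule for a given discrete number can be sketched as follows. This is a minimal Python sketch under the same assumed reading as above (forward scan followed by backward tightening); `tightened_match`, `first_within`, and the parameter `d0` (standing for D_0) are hypothetical names introduced for illustration.

```python
def tightened_match(text, pattern, start):
    """Forward scan from 0-based index start, then backward tightening.

    Returns the 1-based positions of the tightened match, or None.
    """
    m = len(pattern)
    if m == 0:
        return None
    pos = []
    for i in range(start, len(text)):
        if text[i] == pattern[len(pos)]:
            pos.append(i + 1)
            if len(pos) == m:
                break
    if len(pos) < m:
        return None
    i = pos[-1] - 2  # 0-based index just left of the last matched character
    for j in range(m - 2, -1, -1):
        while text[i] != pattern[j]:
            i -= 1
        pos[j] = i + 1
        i -= 1
    return pos


def first_within(text, pattern, d0):
    """Return (D, pos) for the first tightened match with D <= d0, else None."""
    if d0 < 0:
        raise ValueError("d0 must be non-negative")
    m = len(pattern)
    start = 0
    while True:
        pos = tightened_match(text, pattern, start)
        if pos is None:
            return None
        d = pos[-1] - pos[0] - m + 1
        if d <= d0:
            return d, pos
        # Restart rule: Max(second value of pos[ ], g_m + 1 - m - D_0).
        # Both terms are 1-based positions; subtract 1 for a 0-based index.
        start = max(pos[1], pos[-1] + 1 - m - d0) - 1
```

With the text "aXXbXaXb" and the pattern "ab", a threshold of 1 skips the first match (D = 2) and returns the second one at positions 6 and 8, while a threshold of 0 finds nothing.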

Time complexity analysis: when the first discrete element substring satisfying the given discrete number D_0 is found, the time complexity is O(n + k(m + Da)) and the number of character comparisons satisfies f(n) ≤ n + 2(k - 1)(m + Da - 1); when no discrete element substring satisfying the given discrete number D_0 is found, the time complexity is O(n + k(m + Da)) and the number of character comparisons is f(n) = n + 2k(m + Da - 1). Here k is the number of discrete element substrings found, and Da is the average discrete number of the discrete element substrings found.

The discrete substring pattern matching method for information retrieval and information input that outputs the discrete element substring matching degree can also be modified to form a pattern matching method that outputs the minimum discrete element substring matching degree. In this method, steps a-f of the above matching-degree method are unchanged, steps g and h are modified, and a step i is added:

Step g: It is determined that the pattern P is a discrete element substring of the text S. If the positions y_1 and y_m in the text S of the first and last characters of the current minimum discrete element substring have not been assigned, let y_1 = the first value in pos[ ], y_m = the last value in pos[ ], and go to step i. Otherwise, obtain the positions in the text S of the first and last characters of the discrete element substring: g_1 = the first value in pos[ ], g_m = the last value in pos[ ]. If (g_m - g_1) < (y_m - y_1), let y_1 = g_1, y_m = g_m, and go to step i; if (g_m - g_1) > (y_m - y_1), go directly to step i;

Step h: If the positions y_1 and y_m in the text S of the first and last characters of the current minimum discrete element substring have not been assigned, it is determined that there is no discrete element substring of the pattern P in the text S; output the determination result "-1" and end the match. Otherwise, obtain the length n of the text S and the length m of the pattern P, output the minimum discrete element substring matching degree = Round(100 × (m - (y_m - y_1 - m + 1) ÷ n) ÷ n), and end the match;

Step i: If (y_m - y_1 - m + 1) = 0, go to step h; otherwise, restart the matching of the next discrete substring: change the position of the compared character in the text S to the value at the second position of pos[ ], and take the character at that position as the compared character; change the position of the comparison character in the pattern P to the first character position of the pattern P, and take the character at that position as the comparison character; then go to step b.

After determining the minimum discrete element substring in the text, this pattern matching method outputs the minimum discrete element substring matching degree of the text S and the pattern P, Round(100 × (m - (y_m - y_1 - m + 1) ÷ n) ÷ n). The minimum discrete element substring is the discrete element substring with the smallest dispersion in the text, so its matching degree reflects the degree of matching most accurately.

This method is more suitable for information input judgment of short text and large-capacity string sets. By using the matching degree, all the retrieved texts can be output in descending order, and the user first processes the text with high matching degree, which further improves the processing efficiency of information retrieval and input.
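Steps g, h, and i of the minimum matching degree method can be sketched in Python as follows. The sketch assumes the same reading of the discrete element substring as a forward scan followed by backward tightening, and it assumes that Round means rounding half up; `tightened_match` and `min_element_degree` are hypothetical names.

```python
import math


def tightened_match(text, pattern, start):
    """Forward scan from 0-based index start, then backward tightening.

    Returns the 1-based positions of the tightened match, or None.
    """
    m = len(pattern)
    if m == 0:
        return None
    pos = []
    for i in range(start, len(text)):
        if text[i] == pattern[len(pos)]:
            pos.append(i + 1)
            if len(pos) == m:
                break
    if len(pos) < m:
        return None
    i = pos[-1] - 2  # 0-based index just left of the last matched character
    for j in range(m - 2, -1, -1):
        while text[i] != pattern[j]:
            i -= 1
        pos[j] = i + 1
        i -= 1
    return pos


def min_element_degree(text, pattern):
    """Return the matching degree of the tightest match found, or -1."""
    n, m = len(text), len(pattern)
    best = None
    start = 0
    while True:
        pos = tightened_match(text, pattern, start)
        if pos is None:
            break
        if best is None or pos[-1] - pos[0] < best[-1] - best[0]:
            best = pos  # step g: keep the narrower span
        if best[-1] - best[0] - m + 1 == 0:
            break  # step i: a discrete number of 0 cannot be improved
        start = pos[1] - 1  # step i: restart at the second value of pos[ ]
    if best is None:
        return -1  # step h: no discrete element substring
    d = best[-1] - best[0] - m + 1
    # Assumption: Round is round-half-up (Python's round() rounds half to even).
    return math.floor(100 * (m - d / n) / n + 0.5)
```

For the text "aXXbXaXb" (n = 8) and the pattern "ab" (m = 2), the tightest match has discrete number 1, so the degree is Round(100 × (2 - 1 ÷ 8) ÷ 8) = Round(23.4375) = 23.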

When the minimum discrete element substring is found, this method has a time complexity of O(n + k(m + Da)), and the number of character comparisons satisfies f(n) < n + 2k(m + Da - 1), where k is the number of discrete element substrings found and Da is their average discrete number. When no minimum discrete element substring is found, the time complexity is O(n) and the number of character comparisons is f(n) = n. The analysis shows that when the text S is unrelated to the pattern P, that is, when a discrete substring of P is unlikely to occur in S, the method skips the irrelevant text after only n character comparisons.

The discrete substring pattern matching method for information retrieval and information input that outputs the minimum discrete element substring matching degree can be slightly modified to form a pattern matching method that outputs the discrete number and positions of the minimum discrete element substring. In this method, steps a-f of the above minimum discrete element substring matching-degree method are unchanged, and steps g, h, and i are modified to:

Step g: A discrete element substring has been found. If the position array min[ ] of the current minimum discrete element substring has not been assigned, let min[ ] = pos[ ] and go to step i. Otherwise, obtain the positions in the text S of the first and last characters of the discrete element substring: g_1 = the first value in pos[ ], g_m = the last value in pos[ ]; and the positions in the text S of the first and last characters of the current minimum discrete element substring: y_1 = the first value in min[ ], y_m = the last value in min[ ]. If (g_m - g_1) < (y_m - y_1), let min[ ] = pos[ ] and go to step i; if (g_m - g_1) > (y_m - y_1), go directly to step i;

Step h: If the position array min[ ] of the current minimum discrete element substring has not been assigned, it is determined that there is no discrete element substring of the pattern P in the text S; output the determination result "-1" and end the match. Otherwise, obtain the length m of the pattern P and the positions in the text S of the first and last characters of the current minimum discrete element substring: y_1 = the first value in min[ ], y_m = the last value in min[ ]; output the discrete number D = y_m - y_1 - m + 1, output the position array min[ ], and end the match;

Step i: Obtain the positions in the text S of the first and last characters of the current minimum discrete element substring: y_1 = the first value in min[ ], y_m = the last value in min[ ]. If (y_m - y_1 - m + 1) = 0, go to step h; otherwise, restart the matching of the next discrete substring: change the position of the compared character in the text S to the value at the second position of pos[ ], and take the character at that position as the compared character; change the position of the comparison character in the pattern P to the first character position of the pattern P, and take the character at that position as the comparison character; then go to step b.

This method further improves the positioning accuracy of discrete element substring pattern matching. If a discrete element substring exists in the text S, a discrete element substring with the smallest discrete number must also exist. Finding the discrete element substring with the smallest discrete number over the whole text is an optimal positioning scheme, which can effectively improve the efficiency and accuracy of information retrieval.

Time complexity analysis of the method: when the minimum discrete element substring is found, the time complexity is O(n + k(m + Da)) and the number of character comparisons satisfies f(n) ≤ n + 2(k - 1)(m + Da - 1); when no minimum discrete element substring is found, the time complexity is O(n + k(m + Da)) and the number of character comparisons is f(n) = n + 2k(m + Da - 1). Here k is the number of discrete element substrings found, and Da is the average discrete number of the discrete element substrings found.

The discrete substring pattern matching method for information retrieval and information input that outputs the discrete number and positions of the minimum discrete element substring can be slightly modified to form a pattern matching method that outputs the discrete number and positions of the minimum discrete element substring subject to a given discrete number. In this method, steps a-g and step i of the pattern matching method for outputting the discrete number and positions of the minimum discrete element substring are unchanged, and step h is modified to:

Step h: If the position array min[ ] of the current minimum discrete element substring has not been assigned, it is determined that there is no discrete element substring of the pattern P in the text S; output the determination result "-1" and end the match. Otherwise, obtain the length m of the pattern P and the positions in the text S of the first and last characters of the current minimum discrete element substring: y_1 = the first value in min[ ], y_m = the last value in min[ ]; compute the discrete number D = y_m - y_1 - m + 1. If the discrete number D > the predetermined discrete number D_0, it is determined that the text S contains no minimum discrete element substring satisfying the predetermined discrete number D_0; output the determination result "-1" and end the match.

If the discrete number D ≤ the predetermined discrete number D_0, it is determined that the pattern P is the minimum discrete element substring of the text S satisfying the given discrete number D_0; output the discrete number D, output the position array min[ ], and end the match.

This method finds in the text S the minimum discrete element substring that satisfies the given discrete number D_0. It refines discrete element substring pattern matching by filtering out texts whose minimum discrete element substring has too large a discrete number, which improves the accuracy of information retrieval.

The method can be applied to the positioning and retrieval of long and short texts such as network information search and database information retrieval.

The time complexity of this method is the same as the minimum discrete element substring pattern matching method.

The above discrete substring pattern matching method for information retrieval and information input can be extended to form a two-dimensional discrete substring pattern matching method. The concepts of discrete substring and text are first extended to those of two-dimensional discrete substring and two-dimensional text; steps A, B, C, and D are then defined to correspond to steps a, b, c, and d of the discrete substring pattern matching method, and step C invokes steps a, b, c, and d, namely:

When there are a plurality of texts S, the texts S^1, S^2, ..., S^n constitute a two-dimensional text Ds = "S^1 S^2 ... S^n". A text string "S^G1' S^G2' ... S^Gm'" (where 1 ≤ G_1 < G_2 < ... < G_m ≤ n), composed of discrete substrings S^Gi' of any one or more texts in the two-dimensional text Ds, is a two-dimensional discrete substring. Two-dimensional discrete substring pattern matching determines whether the two-dimensional pattern Dp = "P^1 P^2 ... P^m" (1 ≤ m ≤ n) is a two-dimensional discrete substring of the two-dimensional text Ds. The specific steps of this two-dimensional discrete substring pattern matching method are as follows:

Step A: Take the first text S^1 of the two-dimensional text Ds as the compared text, and take the first pattern P^1 of the two-dimensional pattern Dp as the comparison text;

Step B: If the compared text or the comparison text is the end mark, go to step D;

Step C: Apply steps a, b, c, and d of the basic discrete substring pattern matching method to the compared text and the comparison text. If the result of step d is that a discrete substring exists, take the next text of the two-dimensional text Ds as the compared text, take the next pattern of the two-dimensional pattern Dp as the comparison text, and go to step B; otherwise, take the next text of the two-dimensional text Ds as the compared text, leave the comparison text unchanged, and go to step B;

Step D: If the comparison text is the end mark, it is determined that the two-dimensional pattern Dp is a two-dimensional discrete substring of the two-dimensional text Ds; obtain the number n of texts in the two-dimensional text Ds and the number m of patterns in the two-dimensional pattern Dp, output the two-dimensional discrete substring simple matching degree = Round(100 × m ÷ n), and end the match. Otherwise, there is no two-dimensional discrete substring of the two-dimensional pattern Dp in the two-dimensional text Ds; output the determination result "-1" and end the match.
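Steps A to D can be sketched in Python as follows. This is a minimal sketch that represents the two-dimensional text and pattern as lists of strings and assumes that Round means rounding half up; the names `is_discrete_substring` and `match_2d_simple` and the pinyin sample data are illustrative only.

```python
import math


def is_discrete_substring(text, pattern):
    """True if the characters of pattern occur in text in order (steps a-d)."""
    it = iter(text)
    return all(ch in it for ch in pattern)


def match_2d_simple(texts, patterns):
    """Return the simple matching degree Round(100 * m / n), or -1."""
    n, m = len(texts), len(patterns)
    j = 0  # index of the current comparison text within the pattern list
    for s in texts:  # the compared text advances on every pass of step C
        if j == m:  # step B: the pattern list is exhausted
            break
        if is_discrete_substring(s, patterns[j]):
            j += 1  # step C: matched, take the next pattern
    if m == 0 or j < m:
        return -1  # step D: no two-dimensional discrete substring
    # Assumption: Round is round-half-up (Python's round() rounds half to even).
    return math.floor(100 * m / n + 0.5)
```

For the two-dimensional text ["zhong", "hua", "ren", "min"], the abbreviated input ["zh", "rn"] matches the first and third texts and gives a simple matching degree of 50, while the reordered input ["rn", "zh"] gives -1.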

The two-dimensional discrete substring pattern matching method extends the discrete substring to realize two-dimensional discrete substring pattern matching. The discrete substring is only a special case of the two-dimensional discrete substring: when m = 1 in the two-dimensional discrete substring "S^G1' S^G2' ... S^Gm'" above, the two-dimensional discrete substring reduces to a discrete substring.

The two-dimensional discrete substring pattern matching method applies to two-dimensional text. For example, in keyboard input the pinyin of a single Chinese character or a single English word can be regarded as a one-dimensional string, while the pinyin of a Chinese phrase or an English phrase can be regarded as a two-dimensional string. The two-dimensional discrete substring has all the characteristics of the discrete substring and also contains discrete substrings. The present invention can therefore omit arbitrary characters in input and retrieval at the level of one-dimensional text, and omit arbitrary one-dimensional texts at the level of two-dimensional text, while still finding the relevant text, which makes information retrieval and information input simpler and more flexible.

After determining that a two-dimensional discrete substring exists, this method outputs the simple matching degree Round(100 × m ÷ n) of the two-dimensional discrete substring. By using the matching degree, all the retrieved two-dimensional texts can be output in descending order, so that the user first processes the two-dimensional texts with a high matching degree, which improves the efficiency of retrieval processing. The method is suitable for retrieval judgment on short dictionary texts and large-capacity two-dimensional string sets.

Time complexity analysis: assume that the lengths of the text strings in the two-dimensional text Ds are L_1, L_2, ..., L_n, and let L = L_1 + L_2 + ... + L_n. When a two-dimensional discrete substring is determined to exist, the time complexity is O(L) and the number of character comparisons satisfies f(L) < L; when it is determined that no two-dimensional discrete substring exists, the time complexity is O(L) and the number of character comparisons is f(L) = L. This is independent of the length of the two-dimensional pattern Dp, so the method skips irrelevant text as quickly as possible.

The above two-dimensional discrete substring pattern matching method for information retrieval and information input can be slightly modified to form a pattern matching method that outputs an exact matching degree. In this method, steps A and B of the above two-dimensional discrete substring pattern matching method are unchanged, and steps C and D are modified to:

Step C: Apply steps a, b, c, and d of the basic discrete substring pattern matching method to the compared text and the comparison text. If the result of step d is that a discrete substring exists, store the position of the compared text in the two-dimensional text Ds in the position array pos[ ], at the index equal to the position of the comparison text in the two-dimensional pattern Dp; then take the next text of the two-dimensional text Ds as the compared text, take the next pattern of the two-dimensional pattern Dp as the comparison text, and go to step B. Otherwise, take the next text of the two-dimensional text Ds as the compared text, leave the comparison text unchanged, and go to step B;

Step D: If the comparison text is not the end mark, it is determined that there is no two-dimensional discrete substring of the two-dimensional pattern Dp in the two-dimensional text Ds; output the determination result "-1" and end the match. Otherwise, it is determined that the two-dimensional pattern Dp is a two-dimensional discrete substring of the two-dimensional text Ds; obtain the number n of texts in the two-dimensional text Ds, the number m of patterns in the two-dimensional pattern Dp, and the positions in the two-dimensional text Ds of the first and last text strings of the two-dimensional discrete substring: G_1 = the first value in pos[ ], G_m = the last value in pos[ ]; output the exact matching degree of the two-dimensional discrete substring = Round(100 × (m - (G_m - G_1 - m + 1) ÷ n) ÷ n), and end the match.

When a two-dimensional discrete substring exists, this method outputs its exact matching degree Round(100 × (m - (G_m - G_1 - m + 1) ÷ n) ÷ n). The exact matching degree takes into account not only the numbers of one-dimensional texts in the two-dimensional text Ds and in the two-dimensional pattern Dp, but also the influence of the discrete number of the retrieved two-dimensional discrete substring. By using the matching degree, all the retrieved two-dimensional texts can be output in descending order, so that the user first processes the two-dimensional texts with a high matching degree, which further improves the efficiency of retrieval processing in two-dimensional space. The method is likewise suitable for retrieval judgment on short dictionary texts and large-capacity two-dimensional string sets.
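The exact matching degree for two-dimensional text can be worked through with a short Python sketch. It records, for each pattern, the 1-based index of the text that matched it, and assumes that Round means rounding half up; `match_2d_exact` and the sample data are hypothetical.

```python
import math


def is_discrete_substring(text, pattern):
    """True if the characters of pattern occur in text in order."""
    it = iter(text)
    return all(ch in it for ch in pattern)


def match_2d_exact(texts, patterns):
    """Return Round(100 * (m - D / n) / n) with D = G_m - G_1 - m + 1, or -1."""
    n, m = len(texts), len(patterns)
    pos = []  # pos[j] = 1-based index of the text matched by patterns[j]
    for i, s in enumerate(texts):
        if len(pos) == m:
            break
        if is_discrete_substring(s, patterns[len(pos)]):
            pos.append(i + 1)
    if m == 0 or len(pos) < m:
        return -1
    d = pos[-1] - pos[0] - m + 1  # number of skipped texts inside the match
    # Assumption: Round is round-half-up (Python's round() rounds half to even).
    return math.floor(100 * (m - d / n) / n + 0.5)
```

For ["zhong", "hua", "ren", "min"] the input ["zh", "hu"] matches adjacent texts (D = 0) and scores 50, whereas ["zh", "rn"] skips one text (D = 1) and scores Round(43.75) = 44, so the more scattered match ranks lower.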

Time complexity analysis: when a two-dimensional discrete substring is determined to exist, the time complexity is O(L) and the number of character comparisons satisfies f(L) < L; when it is determined that no two-dimensional discrete substring exists, the time complexity is O(L) and the number of character comparisons is f(L) = L. This is independent of the length of the two-dimensional pattern Dp, so the method skips irrelevant text as quickly as possible.

The present invention will be further described in detail below in conjunction with specific embodiments.

Detailed description

Embodiment 1

A first embodiment of the present invention is a discrete substring pattern matching method for information retrieval and information input, characterized in that a discrete substring is a string "S_g1 S_g2 ... S_gm" (1 ≤ g_1 < g_2 < ... < g_m ≤ n) consisting of any one or more characters of the text S = "S_1 S_2 ... S_n". Discrete substring pattern matching determines whether the pattern P = "P_1 P_2 P_3 ... P_m" (1 ≤ m ≤ n) is a discrete substring "S_g1 S_g2 ... S_gm" of the text S, and outputs a determination result. The specific steps are as follows: Step a: take the first character of the text S as the compared character, and take the first character of the pattern P as the comparison character; Step b: if the compared character or the comparison character is the end flag, go to step d;

Embodiment 2

The method of this example is a pattern matching method that outputs a simple matching degree, formed by slightly modifying the basic matching method. Steps a, b, and c of the method of Embodiment 1 are unchanged, and step d is modified to:

Step d: If the comparison character is the end flag, it is determined that the pattern P is a discrete substring of the text S; obtain the length n of the text S and the length m of the pattern P, output the simple matching degree of the discrete substring = Round(100 × m ÷ n), and end the match. Otherwise, it is determined that there is no discrete substring of the pattern P in the text S; output the determination result "-1" and end the match.

Round in the present invention is a rounding function, that is, it rounds its argument to the nearest integer.
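The modified step d can be sketched in Python with two explicit indices standing in for the compared and comparison characters. The sketch assumes that Round rounds halves upward, which differs from Python's built-in `round` (which rounds halves to the nearest even integer); `round_half_up` and `simple_matching_degree` are hypothetical names.

```python
import math


def round_half_up(x):
    """Round a non-negative value to the nearest integer, halves upward."""
    return math.floor(x + 0.5)


def simple_matching_degree(text, pattern):
    """Return Round(100 * m / n) if pattern is a discrete substring, else -1."""
    n, m = len(text), len(pattern)
    i = j = 0  # step a: start at the first characters of S and P
    while i < n and j < m:  # step b: stop when either string is exhausted
        if text[i] == pattern[j]:
            j += 1  # step c: characters equal, advance in the pattern too
        i += 1  # step c: always advance in the text
    if m == 0 or j < m:
        return -1  # step d: the pattern was not exhausted
    return round_half_up(100 * m / n)
```

For the text "information" (n = 11) and the input "ifo" (m = 3), the sketch returns Round(27.27) = 27; for the text "abcdefgh" and the input "a", 100 × 1 ÷ 8 = 12.5 rounds up to 13 under the half-up assumption.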

Embodiment 3

The method of this example is likewise formed by slightly modifying the basic matching method, and is a pattern matching method that outputs an exact matching degree. Steps a and b of the method of Embodiment 1 are unchanged, and steps c and d are modified to:

Step c: If the compared character is equal to the comparison character, store the position of the compared character in the text S in the position array pos[ ], at the index equal to the position of the comparison character in the pattern P; then take the next character of the text S as the compared character, take the next character of the pattern P as the comparison character, and go to step b. Otherwise, take the next character of the text S as the compared character, leave the comparison character unchanged, and go to step b;

Step d: If the comparison character is not the end flag, it is determined that there is no discrete substring of the pattern P in the text S; output the determination result "-1" and end the match. Otherwise, it is determined that the pattern P is a discrete substring of the text S; obtain the length n of the text S, the length m of the pattern P, and the positions in the text S of the first and last characters of the discrete substring: g_1 = the first value in pos[ ], g_m = the last value in pos[ ]; output the exact matching degree, which reflects the degree of dispersion of the discrete substring, = Round(100 × (m - (g_m - g_1 - m + 1) ÷ n) ÷ n), and end the match.
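Steps c and d of this embodiment can be sketched in Python as follows. The sketch stores 1-based positions in `pos`, takes the match produced by the forward scan as it is, and assumes that Round means rounding half up; `exact_matching_degree` is a hypothetical name.

```python
import math


def exact_matching_degree(text, pattern):
    """Return Round(100 * (m - D / n) / n) with D = g_m - g_1 - m + 1, or -1."""
    n, m = len(text), len(pattern)
    pos = []  # pos[j] = 1-based position in text of pattern[j]
    i = 0
    while i < n and len(pos) < m:
        if text[i] == pattern[len(pos)]:
            pos.append(i + 1)  # step c: record the position of the match
        i += 1
    if m == 0 or len(pos) < m:
        return -1  # step d: no discrete substring
    d = pos[-1] - pos[0] - m + 1  # discrete number of the match
    # Assumption: Round is round-half-up (Python's round() rounds half to even).
    return math.floor(100 * (m - d / n) / n + 0.5)
```

For the text "information", the contiguous input "inf" scores Round(27.27) = 27, while "ifo" skips one character (D = 1) and scores Round(26.45) = 26, so the exact matching degree separates two inputs that the simple matching degree would rank equally.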

Embodiment 4

The method of this example is a pattern matching method that outputs the discrete number and positions of the discrete substring, formed by slightly modifying the pattern matching method of Embodiment 3 for outputting the exact matching degree. Steps a, b, and c of the method of Embodiment 3 are unchanged, and step d is modified to:

Step d: If the comparison character is not the end flag, it is determined that there is no discrete substring of the pattern P in the text S; output the determination result "-1" and end the match. Otherwise, it is determined that the pattern P is the discrete substring "S_g1 S_g2 ... S_gm" of the text S; obtain the length m of the pattern P and the positions in the text S of the first and last characters of the discrete substring: g_1 = the first value in pos[ ], g_m = the last value in pos[ ]; output the discrete number of the discrete substring D = g_m - g_1 - m + 1, output the position array pos[ ], and end the match.

Embodiment 5

The method of this example is a pattern matching method that outputs the discrete number and positions of a discrete substring satisfying a given discrete number, formed on the basis of the method of Embodiment 4 for outputting the discrete number and positions of the discrete substring. Steps a, b, and c of the method of Embodiment 4 are unchanged, and step d is modified to:

Step d: If the comparison character is not the end flag, it is determined that there is no discrete substring of the pattern P in the text S; output the determination result "-1" and end the match. Otherwise, it is determined that the pattern P is the discrete substring "S_g1 S_g2 ... S_gm" of the text S; obtain the length m of the pattern P, the positions in the text S of the first and last characters of the discrete substring: g_1 = the first value in pos[ ], g_m = the last value in pos[ ], and the discrete number of the discrete substring D = g_m - g_1 - m + 1;

If the discrete number D ≤ the predetermined discrete number D_0, it is determined that the pattern P is the first discrete substring of the text S satisfying the discrete number D_0; output the discrete number D, output the position array pos[ ], and end the match.

If D > D_0, restart the matching of the next discrete substring: change the position of the compared character in the text S to (the position of the currently compared character - the length m of the pattern P - the predetermined discrete number D_0), and take the character at that position as the compared character; change the position of the comparison character in the pattern P to the first character position of the pattern P, and take the character at that position as the comparison character; then go to step b.

Embodiment 6

The method of this example is a pattern matching method that outputs the discrete element substring matching degree, formed by slightly modifying and extending the pattern matching method of Embodiment 3 for outputting the exact matching degree. Steps a, b, and c of the method of Embodiment 3 are unchanged; a discrete substring is found first, the discrete element substring is then found through the following steps d, e, and f, and the matching degree of the discrete element substring is determined through steps g and h:

Step d: If the comparison character is not the end flag, go to step h; otherwise, move the position of the compared character in the text S by 1 character position and take the character at that position as the compared character, and move the position of the comparison character in the pattern P forward by 2 character positions and take the character at that position as the comparison character.

Then carry out steps e, f, g, and h below in turn:

Step e: If the first character of the pattern P has been compared, go to step g;

Step g: It is determined that the pattern P is a discrete element substring of the text S. Obtain the length n of the text S, the length m of the pattern P, and the positions in the text S of the first and last characters of the discrete element substring: g_1 = the first value in pos[ ], g_m = the last value in pos[ ]; output the discrete element substring matching degree = Round(100 × (m - (g_m - g_1 - m + 1) ÷ n) ÷ n), and end the match;

Step h: It is determined that there is no discrete element substring of the pattern P in the text S; output the determination result "-1" and end the match.

Embodiment 7

The method of this example is a pattern matching method that outputs the discrete number and positions of the discrete element substring, formed by slightly modifying the pattern matching method of Embodiment 6 for outputting the discrete element substring matching degree. Steps a-f and step h of the method of Embodiment 6 are unchanged, and step g is modified to:

Step g: It is determined that the pattern P is a discrete element substring of the text S. Obtain the length m of the pattern P and the positions in the text S of the first and last characters of the discrete element substring: g_1 = the first value in pos[ ], g_m = the last value in pos[ ]; output the discrete number D = g_m - g_1 - m + 1, output the position array pos[ ], and end the match.

Embodiment 8

The method of this example is a pattern matching method that outputs the discrete number and positions of a discrete element substring satisfying a given discrete number, formed by slightly modifying the pattern matching method of Embodiment 7 for outputting the discrete number and positions of the discrete element substring. Steps a-f and step h of the method of Embodiment 7 are unchanged, and step g is modified to:

Step g: It is determined that the pattern P is a discrete element substring of the text S. Obtain the length m of the pattern P and the positions in the text S of the first and last characters of the discrete element substring: g_1 = the first value in pos[ ], g_m = the last value in pos[ ]; the discrete number is D = g_m - g_1 - m + 1.

If the discrete number D ≤ the given discrete number D_0, it is determined that the pattern P is the first discrete element substring of the text S that satisfies the given discrete number D_0; output the discrete number D, output the position array pos[ ], and end the match.

If D > D_0, restart the matching of the next discrete substring: change the position of the compared character in the text S to Max(the value at the second position of pos[ ], g_m + 1 - m - D_0), where Max is the maximum operation, that is, the larger of the two values is taken; take the character at that position as the compared character; change the position of the comparison character in the pattern P to the first character position of the pattern P, and take the character at that position as the comparison character; then go to step b.

Embodiment 9

The method of this example is a pattern matching method that outputs the minimum discrete element substring matching degree, formed by slightly modifying the pattern matching method of Embodiment 6 for outputting the discrete element substring matching degree. Steps a-f of the method of Embodiment 6 are unchanged, steps g and h are modified, and a step i is added:

Step g: It is determined that the pattern P is a discrete element substring of the text S. If the positions y_1 and y_m in the text S of the first and last characters of the current minimum discrete element substring have not been assigned, let y_1 = the first value in pos[ ], y_m = the last value in pos[ ], and go to step i. Otherwise, obtain the positions in the text S of the first and last characters of the discrete element substring: g_1 = the first value in pos[ ], g_m = the last value in pos[ ]. If (g_m - g_1) < (y_m - y_1), let y_1 = g_1, y_m = g_m, and go to step i; if (g_m - g_1) > (y_m - y_1), go directly to step i;

Step h If the positions _yi and y ₂ of the first character and the last character of the current minimum discrete element substring are not assigned in the text S, it is determined that there is no discrete element substring of the pattern P in the text S, and the determination result "-1" is output. End the match; otherwise, find the length n of the text S, the length m of the pattern P, and output the minimum discrete element substring match = Round (100 (m- (y _m -y, -m+l) ÷ n) ÷ n) End matching;

If i step (y yr m+l) =0, go to h step; otherwise, restart the matching of the next discrete substring, and change the position of the compared character in text S to the value of the second position of pos [ ] And take the character of the position as the compared character; modify the position of the comparison character in the mode P to the first character position of the mode P, and take the character of the position as the comparison character, and turn.
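A compact sketch of steps g, h, and i is given below under stated assumptions: the helper names are hypothetical, positions are 0-based, `tighten` is an assumed reading of the backward pass of the sixth embodiment (steps e and f, not reproduced here), and Round is taken to mean rounding half up, which the text does not specify.

```python
import math


def forward_scan(text, pattern, start=0):
    """Steps a-d: left-to-right scan from `start`; matched positions or None."""
    pos, j = [], 0
    for i in range(start, len(text)):
        if j == len(pattern):
            break
        if text[i] == pattern[j]:
            pos.append(i)
            j += 1
    return pos if j == len(pattern) else None


def tighten(text, pattern, end):
    """Assumed steps e-f: backward scan from `end` to the latest positions."""
    pos = [0] * len(pattern)
    i, j = end, len(pattern) - 1
    while j >= 0:
        if text[i] == pattern[j]:
            pos[j] = i
            j -= 1
        i -= 1
    return pos


def min_element_match_degree(text, pattern):
    """Matching degree of the minimum discrete element substring, or -1.
    Assumes a non-empty pattern."""
    n, m = len(text), len(pattern)
    best = None  # (y_1, y_m) of the current minimum element substring
    start = 0
    while True:
        found = forward_scan(text, pattern, start)
        if found is None:
            break
        pos = tighten(text, pattern, found[-1])
        # Step g: keep the narrower of the stored and the new element substring.
        if best is None or pos[-1] - pos[0] < best[1] - best[0]:
            best = (pos[0], pos[-1])
        # Step i: a discrete number of 0 cannot be improved; otherwise restart
        # from the second value of pos[].
        if best[1] - best[0] - m + 1 == 0:
            break
        start = pos[1]
    # Step h
    if best is None:
        return -1
    d = best[1] - best[0] - m + 1
    return math.floor(100 * (m - d / n) / n + 0.5)  # assumed round-half-up
```

With the pattern "abc", the text "a-b--c" (n = 6, D = 3) gives 100 × (3 − 3/6) / 6 ≈ 41.7, which rounds to 42, while "a-b--c-abc" (n = 10) contains a contiguous occurrence and gives 30.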

Example ten

This example is a pattern matching method that outputs the discrete number and position of the minimum discrete element substring. It is formed by slightly modifying the pattern matching method of the ninth embodiment, which outputs the minimum discrete element substring matching degree. Steps a to f of the ninth embodiment are unchanged, and steps g, h, and i are modified to:

Step g: A discrete element substring has been found. If the position array min[ ] of the current minimum discrete element substring has not been assigned, let min[ ] = pos[ ] and go to step i; otherwise, obtain the positions in the text S of the first and last characters of the discrete element substring: $g_1$ = the first value in pos[ ], $g_m$ = the last value in pos[ ]; obtain the positions in the text S of the first and last characters of the current minimum discrete element substring: $y_1$ = the first value in min[ ], $y_m$ = the last value in min[ ]; if $(g_m - g_1) < (y_m - y_1)$, let min[ ] = pos[ ] and go to step i; if $(g_m - g_1) \ge (y_m - y_1)$, go directly to step i;

Step h: If the position array min[ ] of the current minimum discrete element substring has not been assigned, determine that no discrete element substring of the pattern P exists in the text S, output the determination result "-1", and end the match; otherwise, obtain the length m of the pattern P and the positions in the text S of the first and last characters of the current minimum discrete element substring: $y_1$ = the first value in min[ ], $y_m$ = the last value in min[ ]; output the discrete number $D = y_m - y_1 - m + 1$, output the position array min[ ], and end the match;

Step i: Obtain the positions in the text S of the first and last characters of the current minimum discrete element substring: $y_1$ = the first value in min[ ], $y_m$ = the last value in min[ ]. If $(y_m - y_1 - m + 1) = 0$, go to step h; otherwise, restart the matching of the next discrete element substring: change the position of the compared character in the text S to the second value in pos[ ], and take the character at that position as the compared character; change the position of the comparison character in the pattern P to the first character position of the pattern P, take the character at that position as the comparison character, and go to step b.

Embodiment 11

This example is a pattern matching method that outputs the discrete number and position of the minimum discrete element substring subject to a given discrete number. It is based on the pattern matching method of the tenth embodiment, which outputs the discrete number and position of the minimum discrete element substring. Steps a to g and step i of the tenth embodiment are unchanged, and step h is modified to:

Step h: If the position array min[ ] of the current minimum discrete element substring has not been assigned, determine that no discrete element substring of the pattern P exists in the text S, output the determination result "-1", and end the match; otherwise, obtain the length m of the pattern P and the positions in the text S of the first and last characters of the current minimum discrete element substring: $y_1$ = the first value in min[ ], $y_m$ = the last value in min[ ]; compute the discrete number $D = y_m - y_1 - m + 1$.

If the discrete number $D >$ the predetermined discrete number $D_0$, determine that no minimum discrete element substring meeting the predetermined discrete number $D_0$ exists in the text S, output the determination result "-1", and end the match.

If the discrete number $D \le$ the predetermined discrete number $D_0$, determine that the pattern P is the minimum discrete element substring of the text S that meets the predetermined discrete number $D_0$, output the discrete number D and the position array min[ ], and end the match.
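The tenth and eleventh embodiments differ only in the final threshold test, so one sketch can cover both. Everything named here is hypothetical (`forward_scan`, `tighten`, `min_element_substring`, and the optional `d0` argument); positions are 0-based, and the backward pass is an assumed reading of steps e and f of the sixth embodiment.

```python
def forward_scan(text, pattern, start=0):
    """Steps a-d: left-to-right scan from `start`; matched positions or None."""
    pos, j = [], 0
    for i in range(start, len(text)):
        if j == len(pattern):
            break
        if text[i] == pattern[j]:
            pos.append(i)
            j += 1
    return pos if j == len(pattern) else None


def tighten(text, pattern, end):
    """Assumed steps e-f: backward scan from `end` to the latest positions."""
    pos = [0] * len(pattern)
    i, j = end, len(pattern) - 1
    while j >= 0:
        if text[i] == pattern[j]:
            pos[j] = i
            j -= 1
        i -= 1
    return pos


def min_element_substring(text, pattern, d0=None):
    """Return (D, min_pos) for the minimum discrete element substring.
    With d0 set (eleventh embodiment), return None when D > d0.
    None stands for the "-1" result. Assumes a non-empty pattern."""
    m = len(pattern)
    min_pos = None
    start = 0
    while True:
        found = forward_scan(text, pattern, start)
        if found is None:
            break
        pos = tighten(text, pattern, found[-1])
        # Step g: replace min[] only when the new span is strictly narrower.
        if min_pos is None or pos[-1] - pos[0] < min_pos[-1] - min_pos[0]:
            min_pos = pos
        # Step i: stop at discrete number 0, else restart at pos[] second value.
        if min_pos[-1] - min_pos[0] - m + 1 == 0:
            break
        start = pos[1]
    # Step h
    if min_pos is None:
        return None
    d = min_pos[-1] - min_pos[0] - m + 1
    if d0 is not None and d > d0:
        return None
    return d, min_pos
```

For the text "a-b--c-abxc" and the pattern "abc", the first element substring has D = 3 and the second has D = 1, so the call without a threshold returns (1, [7, 8, 10]) and a threshold of 0 returns None.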

Example twelve

In this example, the discrete substring pattern matching method for information retrieval and information input of the first embodiment is extended to form a two-dimensional discrete substring pattern matching method. The concepts of discrete substring and text are first extended to those of two-dimensional discrete substring and two-dimensional text; four steps A, B, C, and D are then defined, analogous to the four steps a, b, c, and d of the discrete substring pattern matching method, with step C invoking steps a, b, c, and d. Namely:

A discrete substring pattern matching method for information retrieval and information input, wherein a plurality of texts $S^1, S^2, \ldots, S^n$ constitute a two-dimensional text Ds = "$S^1 S^2 \ldots S^n$"; a text string "$S^{G_1\prime} S^{G_2\prime} \ldots S^{G_m\prime}$" (where $1 \le G_1 < G_2 < \cdots < G_m \le n$), formed from the discrete substrings $S^{G_i\prime}$ of one or more texts $S^{G_i}$ of the two-dimensional text Ds, is a two-dimensional discrete substring; two-dimensional discrete substring pattern matching determines whether the two-dimensional pattern Dp = "$P^1 P^2 \ldots P^m$" ($1 \le m \le n$) is a two-dimensional discrete substring of the two-dimensional text Ds. The specific steps of this two-dimensional discrete substring pattern matching method are as follows:

Step B: If the compared text or the comparison text is the end mark, go to step D;

Step C: Apply steps a, b, c, and d of the discrete substring pattern matching method to the compared text and the comparison text. If the result of step d is that a discrete substring exists, take the next text of the two-dimensional text Ds as the compared text, take the next pattern of the two-dimensional pattern Dp as the comparison text, and go to step B; otherwise, take the next text of the two-dimensional text Ds as the compared text, leave the comparison text unchanged, and go to step B;

Step D: If the comparison text is the end mark, determine that the two-dimensional pattern Dp is a two-dimensional discrete substring of the two-dimensional text Ds, obtain the number n of texts in the two-dimensional text Ds and the number m of patterns in the two-dimensional pattern Dp, output the simple matching degree of the two-dimensional discrete substring $= \mathrm{Round}(100 \times m \div n)$, and end the match; otherwise, determine that no two-dimensional discrete substring of the two-dimensional pattern Dp exists in the two-dimensional text Ds, output the determination result "-1", and end the match.
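Steps B to D can be sketched as an outer scan over texts that calls the one-dimensional test for each pair. The names `is_discrete_substring` and `simple_match_2d` are hypothetical, the starting point (first text paired with first pattern) is an assumption because step A is not reproduced above, and Round is assumed to round half up.

```python
import math


def is_discrete_substring(text, pattern):
    """One-dimensional steps a-d: True if the pattern's characters all occur
    in the text, in order (not necessarily adjacent)."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to and including the match,
    # so later characters must be found after earlier ones.
    return all(ch in remaining for ch in pattern)


def simple_match_2d(texts, patterns):
    """Steps B-D: simple matching degree of the 2D pattern in the 2D text,
    or -1 if the pattern is not a two-dimensional discrete substring."""
    n, m = len(texts), len(patterns)
    j = 0  # index of the current comparison text within Dp
    for text in texts:
        if j == m:  # step B: the comparison text has reached the end mark
            break
        if is_discrete_substring(text, patterns[j]):  # step C
            j += 1
    if j < m:  # step D: pattern not exhausted
        return -1
    return math.floor(100 * m / n + 0.5)  # assumed round-half-up
```

As an illustration with made-up data, the seven syllables ["zhong", "hua", "ren", "min", "gong", "he", "guo"] matched against the abbreviated pattern ["zh", "r", "g"] give 100 × 3 / 7 ≈ 42.9, which rounds to 43.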

Example thirteen

This example is a two-dimensional discrete substring pattern matching method that outputs the exact matching degree. It is formed by slightly modifying the two-dimensional discrete substring pattern matching method for information retrieval and information input of the twelfth embodiment. Steps A and B of the twelfth embodiment are unchanged, and steps C and D are changed to:

Step C: Apply steps a, b, c, and d of the discrete substring pattern matching method to the compared text and the comparison text. If the result of step d is that a discrete substring exists, store the position of the compared text in the two-dimensional text Ds in the position array pos[ ], at the same index as the position of the comparison text in the two-dimensional pattern Dp; take the next text of the two-dimensional text Ds as the compared text, take the next pattern of the two-dimensional pattern Dp as the comparison text, and go to step B; otherwise, take the next text of the two-dimensional text Ds as the compared text, leave the comparison text unchanged, and go to step B;

Step D: If the comparison text is not the end mark, determine that no two-dimensional discrete substring of the two-dimensional pattern Dp exists in the two-dimensional text Ds, output the determination result "-1", and end the match; otherwise, determine that the two-dimensional pattern Dp is a two-dimensional discrete substring of the two-dimensional text Ds, obtain the number n of texts in the two-dimensional text Ds and the number m of patterns in the two-dimensional pattern Dp, and obtain the positions in the two-dimensional text Ds of the first and last text strings of the two-dimensional discrete substring: $G_1$ = the first value in pos[ ], $G_m$ = the last value in pos[ ]; output the exact matching degree of the two-dimensional discrete substring $= \mathrm{Round}(100 \times (m - (G_m - G_1 - m + 1) \div n) \div n)$, and end the match.
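The exact matching degree differs from the simple one only by recording text positions and penalising the gaps between them. The sketch below uses hypothetical names, 0-based positions, an assumed starting point (first text with first pattern), and an assumed round-half-up for Round.

```python
import math


def is_discrete_substring(text, pattern):
    """One-dimensional steps a-d: ordered, possibly non-adjacent containment."""
    remaining = iter(text)
    return all(ch in remaining for ch in pattern)


def exact_match_2d(texts, patterns):
    """Exact matching degree of the 2D pattern in the 2D text, or -1.
    Assumes a non-empty list of patterns."""
    n, m = len(texts), len(patterns)
    pos = []  # pos[j] = index in `texts` of the text that matched patterns[j]
    for i, text in enumerate(texts):
        if len(pos) == m:  # step B: every pattern has been matched
            break
        if is_discrete_substring(text, patterns[len(pos)]):  # step C
            pos.append(i)
    if len(pos) < m:  # step D: pattern not exhausted
        return -1
    d = pos[-1] - pos[0] - m + 1  # two-dimensional discrete number
    return math.floor(100 * (m - d / n) / n + 0.5)  # assumed round-half-up
```

With the made-up syllable list ["zhong", "hua", "ren", "min", "gong", "he", "guo"], the pattern ["zh", "r", "g"] matches texts 0, 2, and 4, so D = 2 and the degree is 100 × (3 − 2/7) / 7 ≈ 38.8, which rounds to 39; the simple matching degree for the same input would be 43.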

In the following, the above embodiments are applied to example patterns and texts, and the pattern matching results are presented together with a comprehensive analysis.

The pattern matching methods of the first, second, third, sixth, and ninth embodiments determine whether a discrete substring exists in the text and calculate the matching degree, but do not perform positioning; they are mainly used in the field of information input. Table 1 shows the output of these pattern matching methods for specific pattern matches.

Table 1: Comparison of output results of the pattern matching methods of the first, second, third, sixth and ninth embodiments

Note: The underlined characters indicate the characters that pattern P matches in the text.

It can be seen from Table 1 that the pattern matching method of the first embodiment only determines whether the pattern P exists in the text S, and cannot sort the retrieved texts for output. The simple matching degree pattern matching method of the second embodiment can sort according to the simple matching degree between the pattern P and the discrete substring of the text S, but cannot reflect the influence of the discrete number of the discrete substring on the matching degree, whereas the smaller the discrete number, the larger the matching degree should be. The exact matching degree pattern matching method of the third embodiment reflects the influence of the discrete number on the matching degree, so that different discrete numbers yield different matching degrees, but the result it determines is not necessarily the better matching position. The discrete element substring matching degree pattern matching method of the sixth embodiment can find the position of a discrete element substring; since a discrete element substring is the discrete substring with the smaller discrete number within the range of the corresponding discrete substring, it gives a more accurate matching position, so the output is more accurate. The pattern matching method of the minimum discrete element substring matching degree of the ninth embodiment can find the position of the minimum discrete element substring in the text, so its output matching degree is the most accurate and the sorting result is optimal.

Table 2 lists the time complexity of the discrete substring pattern matching methods of the first, second, third, sixth, and ninth embodiments described above.

Table 2: Time complexity analysis of the first, second, third, sixth and ninth methods

(where k is the number of discrete element substrings found in the search, and Da is the average discrete number of the discrete element substrings found)

Table 1 and Table 2 show that the minimum discrete element substring pattern matching method of the ninth embodiment, with output sorted in descending order of matching degree, reflects the degree of matching of the texts most accurately, but its time complexity is the highest, and the method itself is also the most complex. The appropriate method should therefore be selected for search and determination according to the requirements of the practical problem, with all factors considered together.

The pattern matching methods of the fourth, fifth, seventh, eighth, tenth, and eleventh embodiments, in addition to determining whether a discrete substring exists in the text, calculate the discrete number to reflect the degree of correlation between the pattern and the text, and give the position in the text of each character of the pattern, that is, they perform positioning. They are mainly used in the field of information retrieval, to retrieve relevant texts more simply and efficiently and to indicate the specific position in the text S of each character of the pattern P. The positioning results output by these pattern matching methods for specific pattern matches are listed below.

The output results (examples) of several discrete substring pattern matching methods that support positioning are as follows; the search term is the pattern.

Existing substring positioning. Search term = "spell". Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method". Search omission.

Example 4. Search term = "spell". Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method". † † †

Example 5. Search term = "spelling", given discrete number = 25. Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method". † †

Example 5. Search term = "scrape", given discrete number = 10. Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method". † † †

Example 5. Search term = "spelling", given discrete number = 5. Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, Input Method".

Example 5. Search term = "spell", given discrete number = 1. Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method". Search omission (the given discrete number of 1 is not met).

Example 7. Search term = "spell". Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method". † † †

Example 8. Search term = "scrape", given discrete number = 25. Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method". † † †

Search term = "spell", given discrete number = 5. Text combination input method". †

Example 8. Search term = "spell", given discrete number = 1. Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated as spell, pen, and sound combination input method". Search omission (the given discrete number of 1 is not met).

Example 10. Search term = "spell". Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method". † †

Example 11. Search term = "scrape", given discrete number = 25. Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method". † †

Example 11. Search term = "spelling", given discrete number = 5. Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method".

Example 11. Search term = "spell", given discrete number = 1. Text = "Pinyin-based Chinese Pinyin, Stroke, Tone Combination Input Method, abbreviated spell, pen, and sound combination input method". Search omission (the given discrete number of 1 is not met).

The fourth embodiment is a pattern matching method that outputs the discrete number and position of a discrete substring. It locates the first discrete substring appearing in the text, imposes no limit on the discrete number, and is suitable for information retrieval in short texts. The fifth embodiment is a pattern matching method that outputs the discrete number and position of a discrete substring subject to a given discrete number. It improves on the method of the fourth embodiment by locating the first discrete substring in the text that satisfies the predetermined discrete number $D_0$, and is suitable for retrieval in both long and short texts.

The seventh embodiment is a pattern matching method that outputs the discrete number and position of a discrete element substring. It locates the first discrete element substring appearing in the text; since a discrete element substring is the discrete substring with the smaller discrete number within the range of the corresponding discrete substring, the positioning is more precise. It imposes no limit on the discrete number and is suitable for information retrieval in short texts. The eighth embodiment is a pattern matching method that outputs the discrete number and position of a discrete element substring subject to a given discrete number. It improves on the method of the seventh embodiment by locating the first discrete element substring in the text that satisfies the predetermined discrete number $D_0$, and is suitable for information retrieval in both long and short texts.

The tenth embodiment is a pattern matching method that outputs the discrete number and position of the minimum discrete element substring. It locates the minimum discrete element substring in the text, which is the discrete substring with the smallest discrete number within the whole text, so the positioning is the most accurate. This method imposes no limit on the discrete number and is suitable for information retrieval in short texts. The eleventh embodiment is a pattern matching method that outputs the discrete number and position of the minimum discrete element substring subject to a given discrete number. It improves on the method of the tenth embodiment by locating the minimum discrete element substring that satisfies the predetermined discrete number $D_0$, and is suitable for information retrieval in both long and short texts.

As the above examples show, pattern matching and positioning based on conventional substrings can miss discretely related texts. Among the discrete substring pattern matching methods of the present invention, only the methods of the fifth, eighth, and eleventh embodiments can miss discretely related texts, and only when the given discrete number is too small; the other discrete substring pattern matching methods do not miss discretely related texts. The methods of the fifth, eighth, and eleventh embodiments are the most flexible, and can balance recall, precision, and positioning precision by adjusting the predetermined discrete number; under the same given discrete number, their recall and precision for related texts are the same, but the positioning precision of the last-named method is the best.

Table 3 lists the time complexity of the above discrete substring pattern matching methods that support positioning, and the scope of application of each method.

Table 3: Comprehensive analysis of several discrete substring pattern matching methods that can be located

(where $D_0$ is the predetermined discrete number, k is the number of discrete element substrings found in the search, and Da is the average discrete number of the discrete element substrings found)

The above embodiments one, two, three, six, and nine are one-dimensional discrete substring pattern matching methods. The pattern matching methods of the twelfth and thirteenth embodiments are two-dimensional discrete substring pattern matching methods formed by extending and modifying the above one-dimensional methods, and are suitable for pattern matching of short texts in two-dimensional space against large-capacity lexicons. For example, in keyboard input, the pinyin of a single Chinese character or an English word is a one-dimensional string, while the pinyin of a Chinese phrase or an English phrase can be regarded as a two-dimensional string.

Table 4 below shows the results of the method of Embodiments 12 and 13 specifically for pattern matching in two-dimensional text.

Table 4: Results of the 2D Discrete Substring Pattern Matching Method (Example)

Note: The underlined Chinese characters indicate the Chinese characters that the two-dimensional pattern Dp matches in the two-dimensional text.

The pattern and text content in Table 4 are actually the pinyin of Chinese characters; to show the two-dimensional discrete characteristics more clearly, the pinyin is represented by the corresponding Chinese characters. The pinyin of each Chinese character can itself be searched as a one-dimensional discrete substring, with characters arbitrarily omitted.

The two-dimensional discrete substring pattern matching method of the twelfth embodiment can sort according to the simple matching degree between the two-dimensional pattern Dp and the two-dimensional discrete substring of the two-dimensional text Ds, but cannot reflect the influence of the discrete number of the two-dimensional discrete substring on the matching degree, whereas the smaller the discrete number, the larger the matching degree should be. The two-dimensional discrete substring pattern matching method of the thirteenth embodiment outputs an exact matching degree that reflects the influence of the discrete number, so that different discrete numbers yield different matching degrees and the sorting result is more reasonable.

Relative to substrings, the discrete substring proposed by the present invention conceptually enlarges the range of related texts considerably; relative to existing inexact matching based on error-distance calculation, the present invention proposes a line of string matching research based on discrete characteristics. The pattern matching method based on discrete substrings requires that the characters of the pattern appear in the text completely, in order, and discretely (three characteristics). When the discrete number is zero, the method reduces to exact substring pattern matching. Discrete substrings include substrings; a substring is a special case of a discrete substring.

In application, the discrete characteristic of discrete substrings matches the way the public chooses search terms: users can flexibly and simply choose search terms that are ordered and discrete. In retrieval, because the discrete substring pattern matching method satisfies completeness and order and can discriminate the degree of correlation of the retrieved text through the discrete number, the recall rate is high, the precision is guaranteed, and positioning is reasonable.

The discrete substring pattern matching method solves the problem of missed discretely related texts that has been inherent in information retrieval over the past 40 years, and has important application value. It is applicable to the following areas of information retrieval and information input: database retrieval of various texts, network information search, intra-site retrieval, information inquiry, keyboard input, electronic dictionaries, operating system file retrieval, and so on.

The output result "-1" in each pattern matching method of the present invention represents "non-existence"; any other specified data may be chosen as the non-existence flag to output.

Claims


1. A discrete substring pattern matching method for information retrieval and information input, characterized in that: a discrete substring is a character string "$S_{g_1} S_{g_2} \ldots S_{g_m}$" ($1 \le g_1 < g_2 < \cdots < g_m \le n$) formed from one or more characters of the text S = "$S_1 S_2 \ldots S_n$"; discrete substring pattern matching determines whether the pattern P = "$P_1 P_2 \ldots P_m$" ($1 \le m \le n$) is a discrete substring "$S_{g_1} S_{g_2} \ldots S_{g_m}$" of the text S; and the specific steps for outputting the determination result are as follows:

2. The discrete substring pattern matching method for information retrieval and information input according to claim 1, wherein:

If the comparison character is the end mark, determine that the pattern P is a discrete substring of the text S, obtain the length n of the text S and the length m of the pattern P, output the simple matching degree of the discrete substring $= \mathrm{Round}(100 \times m \div n)$, and end the match; otherwise, determine that no discrete substring of the pattern P exists in the text S, output the determination result, and end the match.

3. A discrete substring pattern matching method for information retrieval and information input according to claim 1, wherein:

If the compared character is equal to the comparison character, store the position of the compared character in the text S in the position array pos[ ], at the same index as the position of the comparison character in the pattern P; take the next character of the text S as the compared character, take the next character of the pattern P as the comparison character, and go to step b; otherwise, take the next character of the text S as the compared character, leave the comparison character unchanged, and go to step b;

If the comparison character is not the end mark, determine that no discrete substring of the pattern P exists in the text S, output the determination result "-1", and end the match; otherwise, determine that the pattern P is a discrete substring of the text S, obtain the length n of the text S and the length m of the pattern P, and obtain the positions in the text S of the first and last characters of the discrete substring: $g_1$ = the first value in pos[ ], $g_m$ = the last value in pos[ ]; output the exact matching degree, which reflects the degree of dispersion of the discrete substring, $= \mathrm{Round}(100 \times (m - (g_m - g_1 - m + 1) \div n) \div n)$, and end the match.
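The two steps of this claim amount to a single left-to-right scan that records positions, followed by the matching degree formula. The sketch below is illustrative only: `exact_match_degree` is a hypothetical name, positions are 0-based, and Round is assumed to round half up.

```python
import math


def exact_match_degree(text, pattern):
    """Exact matching degree of `pattern` as a discrete substring of `text`,
    or -1 if it is not one. Assumes a non-empty pattern."""
    n, m = len(text), len(pattern)
    pos = []  # pos[j] = position in text matched by pattern[j]
    for i, ch in enumerate(text):
        if len(pos) == m:  # comparison character has reached the end mark
            break
        if ch == pattern[len(pos)]:
            pos.append(i)
    if len(pos) < m:
        return -1
    d = pos[-1] - pos[0] - m + 1  # discrete number: unmatched characters in the span
    return math.floor(100 * (m - d / n) / n + 0.5)  # assumed round-half-up
```

A pattern equal to the whole text scores 100; "ac" in "abc" has one unmatched character inside its span, so it scores 100 × (2 − 1/3) / 3 ≈ 55.6, which rounds to 56, slightly below the simple matching degree of 67.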

4. A discrete substring pattern matching method for information retrieval and information input according to claim 3, wherein:

If the comparison character is not the end mark, determine that no discrete substring of the pattern P exists in the text S, output the determination result "-1", and end the match; otherwise, determine that the pattern P is the discrete substring "$S_{g_1} S_{g_2} \ldots S_{g_m}$" of the text S, obtain the length m of the pattern P, and obtain the positions in the text S of the first and last characters of the discrete substring: $g_1$ = the first value in pos[ ], $g_m$ = the last value in pos[ ]; output the discrete number $D = g_m - g_1 - m + 1$ of the discrete substring, output the position array pos[ ], and end the match.

5. A discrete substring pattern matching method for information retrieval and information input according to claim 4, wherein:

If the comparison character is not the end mark, determine that no discrete substring of the pattern P exists in the text S, output the determination result "-1", and end the match; otherwise, determine that the pattern P is the discrete substring "$S_{g_1} S_{g_2} \ldots S_{g_m}$" of the text S, obtain the length m of the pattern P, and obtain the positions in the text S of the first and last characters of the discrete substring: $g_1$ = the first value in pos[ ], $g_m$ = the last value in pos[ ]; compute the discrete number of the discrete substring $D = g_m - g_1 - m + 1$;

If the discrete number $D \le$ the predetermined discrete number $D_0$, determine that the pattern P is the first discrete substring of the text S that meets the discrete number $D_0$ requirement, output the discrete number D, output the position array pos[ ], and end the match;

If $D > D_0$, restart the matching of the next discrete substring: change the position of the compared character in the text S to (the currently compared character position − the length m of the pattern P − the predetermined discrete number $D_0$), and take the character at that position as the compared character; change the position of the comparison character in the pattern P to the first character position of the pattern P, take the character at that position as the comparison character, and go to step b.
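The restart position in this claim can be checked with a short sketch. It assumes that "the currently compared character position" is the position just after the last matched character, which is where the scan stands when the pattern is exhausted; the function name `first_substring_within` is hypothetical and positions are 0-based.

```python
def first_substring_within(text, pattern, d0):
    """Return (D, pos) for the first discrete substring with D <= d0,
    or None (the "-1" result). Assumes a non-empty pattern and d0 >= 0."""
    m = len(pattern)
    start = 0
    while True:
        pos, j, i = [], 0, start
        # Forward scan from `start` (steps b and c).
        while i < len(text) and j < m:
            if text[i] == pattern[j]:
                pos.append(i)
                j += 1
            i += 1
        if j < m:
            return None  # pattern not exhausted: no discrete substring
        d = pos[-1] - pos[0] - m + 1
        if d <= d0:
            return d, pos
        # i is one past the last match (the currently compared position).
        # Because D > D0, i - m - d0 is strictly greater than pos[0],
        # so every restart moves forward and the loop terminates.
        start = i - m - d0
```

For example, in "aab" the first scan for "ab" with a given discrete number of 0 matches positions [0, 2] (D = 1), the restart position is 3 − 2 − 0 = 1, and the second scan finds the contiguous occurrence at [1, 2].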

6. A discrete substring pattern matching method for information retrieval and information input according to claim 3, wherein:

Step d: If the comparison character is not the end mark, go to step h; otherwise, move the position of the compared character in the text S 2 character positions toward the beginning of the text, and take the character at that position as the compared character; move the position of the comparison character in the pattern P 2 character positions toward the beginning of the pattern, and take the character at that position as the comparison character;

Then, proceed to the following steps e, f, g, and h:

Step e: If the first character of the pattern P has been compared, go to step g;

Step g: Determine that the pattern P is a discrete element substring of the text S, obtain the length n of the text S and the length m of the pattern P, and obtain the positions in the text S of the first and last characters of the discrete element substring: $g_1$ = the first value in pos[ ], $g_m$ = the last value in pos[ ]; output the discrete element substring matching degree $= \mathrm{Round}(100 \times (m - (g_m - g_1 - m + 1) \div n) \div n)$, and end the match;

Step h: Determine that no discrete element substring of the pattern P exists in the text S, output the determination result "-1", and end the match.

7. A discrete substring pattern matching method for information retrieval and information input according to claim 6, wherein:

Step g: Determine that the pattern P is a discrete element substring of the text S, obtain the length m of the pattern P, and obtain the positions in the text S of the first and last characters of the discrete element substring: $g_1$ = the first value in pos[ ], $g_m$ = the last value in pos[ ]; output the discrete number $D = g_m - g_1 - m + 1$, output the position array pos[ ], and end the match;

8. A discrete substring pattern matching method for information retrieval and information input according to claim 7, wherein:

Step g: Determine that the pattern P is a discrete element substring of the text S, obtain the length m of the pattern P, and obtain the positions in the text S of the first and last characters of the discrete element substring: $g_1$ = the first value in pos[ ], $g_m$ = the last value in pos[ ]; compute the discrete number $D = g_m - g_1 - m + 1$;

If the discrete number $D \le$ the predetermined discrete number $D_0$, determine that the pattern P is the first discrete element substring of the text S that meets the discrete number $D_0$ requirement, output the discrete number D, output the position array pos[ ], and end the match;

If $D > D_0$, restart the matching of the next discrete element substring: change the position of the compared character in the text S to $\mathrm{Max}(\text{the second value in pos[ ]},\ g_m + 1 - m - D_0)$, and take the character at that position as the compared character; change the position of the comparison character in the pattern P to the first character position of the pattern P, take the character at that position as the comparison character, and go to step b;

9. A discrete substring pattern matching method for information retrieval and information input according to claim 6, wherein:

The first step g determination mode P is a discrete element substring text S, and if the current smallest discrete element of the substring first character and the last character position y in the text in S ,, y _m is not assigned, then let _yi = pos [] in A value, y _ffl = pos [ ] the last value, turn i step; otherwise, find the first character of the discrete substring and the position of the last character in the text S: _gl = the first value in the pos gate , g _m = pos [ the last value in [ ], if (g _m - g < (y _m - _yi ), then = ^ y _m = g _ra , turn i step; if (g«- g!) > ( y _m -Yi) , then turn directly to i step;

Step h: if the positions y_1, y_m of the first and last characters of the current minimum discrete element substring in the text S have not been assigned, it is determined that no discrete element substring of the pattern P exists in the text S, the determination result "-1" is output, and the matching ends; otherwise, the length n of the text S and the length m of the pattern P are obtained, the minimum discrete element substring matching degree = Round(100 × (m - (y_m - y_1 - m + 1) ÷ n) ÷ n) is output, and the matching ends;

Step i: if (y_m - y_1 - m + 1) = 0, go to step h; otherwise, the matching of the next discrete substring is restarted: the position of the compared character in the text S is changed to the value at the second position of pos[ ], and the character at that position is taken as the compared character; the position of the comparison character in the pattern P is changed to the first character position of the pattern P, the character at that position is taken as the comparison character, and step b is performed.
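The loop formed by steps g, h and i can be sketched in Python as below. This is a hedged illustration, not the claimed procedure itself: the greedy scan stands in for steps a to f, the function names are hypothetical, positions are 1-based, and each restart begins one position after the previous first match (a conservative assumption rather than the restart position stated in step i). Python's built-in `round` rounds halves to the nearest even integer, which may differ from the Round intended in the claim.

```python
def greedy_match(text, pattern, start):
    """Assumed stand-in for steps a-f; returns 1-based positions or None."""
    pos, i = [], start
    for ch in pattern:
        i = text.find(ch, i)
        if i == -1:
            return None
        pos.append(i + 1)
        i += 1
    return pos


def minimum_span(text, pattern):
    """Track (y_1, y_m) of the shortest match seen so far (steps g and i)."""
    m = len(pattern)
    best = None  # (y_1, y_m); None means "not assigned"
    start = 0
    while True:
        pos = greedy_match(text, pattern, start)
        if pos is None:
            return best
        g1, gm = pos[0], pos[-1]
        if best is None or (gm - g1) < (best[1] - best[0]):
            best = (g1, gm)
        if best[1] - best[0] - m + 1 == 0:  # contiguous, cannot improve
            return best
        start = g1  # 0-based index of 1-based position g1 + 1


def minimum_matching_degree(text, pattern):
    """Step h: -1 if no match, else Round(100 * (m - D / n) / n)."""
    best = minimum_span(text, pattern)
    if best is None:
        return -1
    n, m = len(text), len(pattern)
    d = best[1] - best[0] - m + 1
    return round(100 * (m - d / n) / n)


print(minimum_span("axbab", "ab"), minimum_matching_degree("axbab", "ab"))
```

For the text "axbab" and pattern "ab" the sketch first records the span (1, 3) and then replaces it with the contiguous span (4, 5), at which point the loop stops early because D = 0.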

10. A discrete substring pattern matching method for information retrieval and information input according to claim 9, wherein:

Step g: a discrete element substring has been found. If the position array min[ ] of the current minimum discrete element substring has not been assigned, let min[ ] = pos[ ] and go to step i; otherwise, obtain the positions of the first and last characters of the discrete element substring in the text S: g_1 = the first value in pos[ ], g_m = the last value in pos[ ]; obtain the positions of the first and last characters of the current minimum discrete element substring in the text S: y_1 = the first value in min[ ], y_m = the last value in min[ ]; if (g_m - g_1) < (y_m - y_1), let min[ ] = pos[ ] and go to step i; if (g_m - g_1) ≥ (y_m - y_1), go directly to step i;

Step h: if the position array min[ ] of the current minimum discrete element substring has not been assigned, it is determined that no discrete element substring of the pattern P exists in the text S, the determination result "-1" is output, and the matching ends; otherwise, the length m of the pattern P is obtained, and the positions of the first and last characters of the current minimum discrete element substring in the text S are obtained: y_1 = the first value in min[ ], y_m = the last value in min[ ]; the discrete number D = y_m - y_1 - m + 1 is output, the position array min[ ] is output, and the matching ends;

Step i: obtain the positions of the first and last characters of the current minimum discrete element substring in the text S: y_1 = the first value in min[ ], y_m = the last value in min[ ]; if (y_m - y_1 - m + 1) = 0, go to step h; otherwise, the matching of the next discrete substring is restarted: the position of the compared character in the text S is changed to the value at the second position of pos[ ], and the character at that position is taken as the compared character; the position of the comparison character in the pattern P is changed to the first character position of the pattern P, the character at that position is taken as the comparison character, and step b is performed.

11. A discrete substring pattern matching method for information retrieval and information input according to claim 10, wherein:

Step h: if the position array min[ ] of the current minimum discrete element substring has not been assigned, it is determined that no discrete element substring of the pattern P exists in the text S, the determination result "-1" is output, and the matching ends; otherwise, the length m of the pattern P is obtained, the positions of the first and last characters of the current minimum discrete element substring in the text S are obtained: y_1 = the first value in min[ ], y_m = the last value in min[ ], and the discrete number D = y_m - y_1 - m + 1 is calculated;

If the discrete number D > the predetermined discrete number D_0, it is determined that no minimum discrete element substring of the pattern P meeting the predetermined discrete number D_0 exists in the text S, the determination result "-1" is output, and the matching ends;

If the discrete number D ≤ the predetermined discrete number D_0, the pattern P is determined to be the minimum discrete element substring of the text S that meets the predetermined discrete number D_0; the discrete number D is output, the position array min[ ] is output, and the matching ends.
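A minimal Python sketch of the position array min[ ] combined with the threshold test on D_0 is given below. The greedy scan standing in for steps a to f, the function names, the 1-based positions, and the choice to restart one position after the previous first match are all assumptions made for illustration.

```python
def greedy_match(text, pattern, start):
    """Assumed stand-in for steps a-f; returns 1-based positions or None."""
    pos, i = [], start
    for ch in pattern:
        i = text.find(ch, i)
        if i == -1:
            return None
        pos.append(i + 1)
        i += 1
    return pos


def minimum_positions(text, pattern):
    """Return the position array of the shortest match, or None."""
    m = len(pattern)
    best = None  # plays the role of min[ ]; None means "not assigned"
    start = 0
    while True:
        pos = greedy_match(text, pattern, start)
        if pos is None:
            return best
        if best is None or (pos[-1] - pos[0]) < (best[-1] - best[0]):
            best = pos
        if best[-1] - best[0] - m + 1 == 0:
            return best
        start = pos[0]  # 0-based index just after the previous first match


def minimum_within(text, pattern, d0):
    """Return (D, min_array) if the minimum match has D <= d0, else -1."""
    best = minimum_positions(text, pattern)
    if best is None:
        return -1
    d = best[-1] - best[0] - len(pattern) + 1
    if d > d0:
        return -1
    return d, best


print(minimum_within("axbab", "ab", 0))  # (0, [4, 5])
```

Keeping the whole array rather than only its two end positions allows every matched character to be located afterwards, for example for highlighting.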

12. The discrete substring pattern matching method for information retrieval and information input according to claim 1, wherein: there are a plurality of texts S, and the plurality of texts S^1, S^2, ..., S^n form a two-dimensional text Ds = "S^1 S^2 ... S^n"; a text string "S^G1' S^G2' ... S^Gm'" (where 1 ≤ G_1 < G_2 < ... < G_m ≤ n), in which each S^Gk' is a text S^Gk of the two-dimensional text Ds or a discrete substring of that text, is a two-dimensional discrete substring; two-dimensional discrete substring pattern matching determines whether the two-dimensional pattern Dp = "P^1 P^2 ... P^m" (1 ≤ m ≤ n) is a two-dimensional discrete substring of the two-dimensional text Ds; the specific steps are as follows:

Step D: if the comparison text is the end mark, the two-dimensional pattern Dp is determined to be a two-dimensional discrete substring of the two-dimensional text Ds; the number n of texts of the two-dimensional text Ds and the number m of patterns of the two-dimensional pattern Dp are obtained, the two-dimensional discrete substring single matching degree = Round(100 × m ÷ n) is output, and the matching ends; otherwise, no two-dimensional discrete substring of the two-dimensional pattern Dp exists in the two-dimensional text Ds, the determination result "-1" is output, and the matching ends.
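The two-dimensional case can be illustrated with the Python sketch below. The steps preceding step D are not reproduced in this section, so the approach shown (take each pattern in order and advance through the texts until one contains it as a discrete substring) is an assumption, as are the function names and the 1-based text indices.

```python
def is_discrete_substring(text, pattern):
    """True if the characters of pattern occur in text in order."""
    i = 0
    for ch in pattern:
        i = text.find(ch, i)
        if i == -1:
            return False
        i += 1
    return True


def match_2d(texts, patterns):
    """Return the 1-based indices G_1 < ... < G_m of the texts matched by
    each pattern in turn, or None if some pattern cannot be placed."""
    pos = []
    t = 0  # 0-based index of the next text to try
    for p in patterns:
        while t < len(texts) and not is_discrete_substring(texts[t], p):
            t += 1
        if t == len(texts):
            return None
        pos.append(t + 1)
        t += 1
    return pos


def single_matching_degree(texts, patterns):
    """Round(100 * m / n), or -1 when there is no two-dimensional match."""
    if match_2d(texts, patterns) is None:
        return -1
    return round(100 * len(patterns) / len(texts))


texts = ["hello world", "foo bar", "discrete match", "end"]
print(match_2d(texts, ["hlo", "dsm"]))                # [1, 3]
print(single_matching_degree(texts, ["hlo", "dsm"]))  # 50
```

Here "hlo" is a discrete substring of the first text and "dsm" of the third, so two patterns out of four texts give a degree of 50.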

13. The discrete substring pattern matching method for information retrieval and information input according to claim 12, wherein:

In step D, if the comparison text is not the end mark, no two-dimensional discrete substring of the two-dimensional pattern Dp exists in the two-dimensional text Ds, the determination result "-1" is output, and the matching ends; otherwise, the two-dimensional pattern Dp is determined to be a two-dimensional discrete substring of the two-dimensional text Ds; the number n of texts of the two-dimensional text Ds and the number m of patterns of the two-dimensional pattern Dp are obtained, and the positions of the first and last text strings of the two-dimensional discrete substring in the two-dimensional text Ds are obtained: G_1 = the first value in pos[ ], G_m = the last value in pos[ ]; the two-dimensional discrete substring exact matching degree = Round(100 × (m - (G_m - G_1 - m + 1) ÷ n) ÷ n) is output, and the matching ends.
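The exact matching degree can be worked through with a short Python sketch. It assumes that pos[ ] already holds the 1-based indices of the matched texts; the function name is hypothetical, and Python's built-in `round` (which rounds halves to the nearest even integer) is used as a stand-in for Round.

```python
def exact_matching_degree(pos, n):
    """Round(100 * (m - (G_m - G_1 - m + 1) / n) / n) for a match whose
    1-based text indices are listed in pos, within n texts."""
    m = len(pos)
    g1, gm = pos[0], pos[-1]
    skipped = gm - g1 - m + 1  # texts inside the span that matched nothing
    return round(100 * (m - skipped / n) / n)


print(exact_matching_degree([1, 3], 4))  # 44
print(exact_matching_degree([2, 3], 4))  # 50
```

With n = 4 and two matched texts, adjacent matches score 50, while a match that skips one text in between scores 44, so the exact degree penalizes scattered matches that the single matching degree treats alike.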