GB2440560A - A method of searching for patterns in a text using Boyer-Moore methodology - Google Patents


Publication number
GB2440560A
Authority
GB
United Kingdom
Prior art keywords
ngram
text
patterns
character
pattern
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0614986A
Other versions
GB0614986D0 (en
Inventor
Neil Duxbury
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Roke Manor Research Ltd
Original Assignee
Roke Manor Research Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Roke Manor Research Ltd filed Critical Roke Manor Research Ltd
Priority to GB0614986A priority Critical patent/GB2440560A/en
Publication of GB0614986D0 publication Critical patent/GB0614986D0/en
Priority to EP07108874A priority patent/EP1883023A1/en
Priority to US11/812,535 priority patent/US20080027934A1/en
Priority to CA002593937A priority patent/CA2593937A1/en
Publication of GB2440560A publication Critical patent/GB2440560A/en
Withdrawn legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • G06F16/90344Query processing by using string matching techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/20
    • G06F17/21
    • G06F17/2765
    • G06F17/28
    • G06F17/3061
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06K9/00993
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/96Management of image or video recognition tasks

Abstract

A method of searching for one or more patterns in a text using Boyer-Moore methodology, including the step wherein, once a match with an ngram (a portion of a target text string or pattern) is determined, a routine is applied which jumps forward so as to compare more initial characters, thereby providing faster rejection. The object of the invention is to provide a faster algorithm based on pattern skipping, so as to allow a fast reject mechanism followed by exhaustive matching that collectively provide enhanced throughput over the current approaches.

Description

<p>Improved Method for Searching for Patterns in Text</p>
<p>In many information retrieval applications it is necessary to be able to locate quickly some or all occurrences of user-specified patterns in data. The classical solution to this problem involves the use of the Commentz-Walter methodology. The Commentz-Walter string matching algorithm is described in the Proceedings of the 6th International Colloquium on Automata, Languages and Programming, number 71 in Lecture Notes in Computer Science, pages 118-132, Springer-Verlag, 1979. The performance of the Commentz-Walter algorithm is provided by its ability to identify a set of patterns whilst only examining a sublinear portion of the data. This capability is provided via the generalisation of the Boyer-Moore methodology to a set of patterns (R. S. Boyer and J. S. Moore, "A fast string searching algorithm", Communications of the ACM, 20(10):762-772, 1977). The Boyer-Moore approach uses a pattern skipping technique that is based on the characters appearing in the pattern set.</p>
<p>The algorithm of Boyer and Moore defines a number of skip heuristics that allow the instances of a search pattern to be found within a text whilst only examining a subset of the characters within the text. The Boyer Moore algorithm compares a pattern with a text from right to left.</p>
<p>Example 1 (prior art)</p>
<p>The following example illustrates this situation:</p>
<p>Table 1</p>
<p>POSITION 0123456789...</p>
<p>TEXT babacabacba</p>
<p>PATTERN babac</p>
<p>In this case the search starts at position 4; the characters of the pattern are then matched in the order 4, 3, 2, 1, 0. If the search reaches the start of the pattern then an occurrence of the pattern in the text has been found. If a mismatch occurs between one of the characters of the pattern and one of the characters of the text, a mismatch heuristic is applied to determine the position of the next match attempt.</p>
<p>The full Boyer-Moore approach makes use of the heuristics described as follows. If the text symbol that is compared with the rightmost pattern symbol does not occur in the pattern at all, then the pattern can be skipped by m positions beyond this text symbol, where m is equal to the length of the search pattern. The following example illustrates this situation.</p>
<p>Table 2</p>
<p>POSITION 0123456789...</p>
<p>TEXT babadabacba</p>
<p>PATTERN babac (first attempt, starting at position 0)</p>
<p>PATTERN babac (second attempt, starting at position 5)</p>
<p>The first comparison at position 4 produces a mismatch. The text symbol d does not occur in the pattern. Therefore, the pattern cannot match at any of the positions 0...4. Thus, the start of the pattern can be skipped to position 5, and position 9 is then tested. This will be referred to in the following as the mismatch rule.</p>
<p>If the text symbol that causes a mismatch is contained within the pattern, then the pattern can be skipped so that the rightmost occurrence of that text symbol in the pattern is aligned to this text symbol. The following example illustrates this situation.</p>
<p>Table 3</p>
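<p>POSITION 0123456789</p>
<p>TEXT abbababacba</p>
<p>PATTERN babac (first attempt, starting at position 0)</p>
<p>PATTERN babac (second attempt, starting at position 2, so that the rightmost b of the pattern is aligned to the mismatched b at position 4)</p>
<p>This heuristic is generally referred to as the bad character heuristic or bad character rule.</p>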
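<p>The mismatch rule and the bad character rule described above can be sketched as follows. This is a minimal illustration only and not the full Boyer-Moore algorithm (the good suffix heuristic is omitted); the function name bad_character_search and the variable names are assumptions introduced for this sketch.</p>

```python
def bad_character_search(text, pattern):
    """Return the start positions of pattern in text.

    Sketch only: compares right to left and shifts using the
    bad character rule (and the mismatch rule when the text
    symbol is absent from the pattern).
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # Rightmost index of each character within the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    hits = []
    start = 0
    while start <= len(text) - m:
        j = m - 1
        # Compare the pattern with the text from right to left.
        while j >= 0 and pattern[j] == text[start + j]:
            j -= 1
        if j < 0:
            hits.append(start)
            start += 1
        else:
            # Align the rightmost occurrence of the mismatched text
            # symbol with the text; if the symbol is absent (index -1)
            # the pattern moves completely past it.
            shift = j - last.get(text[start + j], -1)
            start += max(1, shift)
    return hits


print(bad_character_search("babacabacba", "babac"))  # text of Table 1
print(bad_character_search("babadabacba", "babac"))  # text of Table 2
print(bad_character_search("abbababacba", "babac"))  # text of Table 3
```

<p>For the text of Table 2 the first mismatch is on "d", which is absent from the pattern, so the sketch moves the start of the pattern to position 5, as in the description.</p>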
<p>The Commentz-Walter algorithm is a natural extension of the Boyer-Moore algorithm to cover the case where a search is performed for multiple patterns simultaneously. The Commentz-Walter algorithm represents the pattern set using a trie of the reversed patterns. A position pos is slid along the text, beginning at position lmin (where lmin is the shortest pattern length). For each position in the text, the longest suffix of the text that is also a suffix of one of the patterns is read backwards. If an occurrence is found it is marked. Then the position of the search is skipped to the right using the Boyer-Moore skip heuristics extended to a set of patterns. To avoid skipping any occurrence when skipping the position pos, it is necessary to bound the maximal possible skip to lmin.</p>
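<p>The multi-pattern search described above can be sketched as follows. This is a simplified variant and not the full Commentz-Walter algorithm: it uses a trie of the reversed patterns and a single per-character skip bounded by the shortest pattern length. The names build_reversed_trie and multi_search, and the use of None as the end-of-pattern key, are assumptions made for this sketch.</p>

```python
def build_reversed_trie(patterns):
    """Trie of the reversed patterns; the key None marks a pattern end."""
    root = {}
    for p in patterns:
        node = root
        for ch in reversed(p):
            node = node.setdefault(ch, {})
        node[None] = p
    return root


def multi_search(text, patterns):
    """Return (start, pattern) pairs for every occurrence in text."""
    lmin = min(len(p) for p in patterns)
    trie = build_reversed_trie(patterns)
    # Skip for a character: its smallest distance from the right-hand
    # end of any pattern (last position excluded), bounded by lmin.
    skip = {}
    for p in patterns:
        for i, ch in enumerate(p[:-1]):
            skip[ch] = min(skip.get(ch, lmin), len(p) - 1 - i)
    hits = []
    pos = lmin - 1  # text index aligned with the right end of the patterns
    while pos < len(text):
        node, i = trie, pos
        # Read the text backwards for as long as it follows the trie.
        while i >= 0 and text[i] in node:
            node = node[text[i]]
            i -= 1
            if None in node:
                hits.append((i + 1, node[None]))
        pos += skip.get(text[pos], lmin)
    return hits


print(multi_search("abcdefghijklmn", ["abbad", "abef", "ghi"]))
```

<p>Bounding every skip by the shortest pattern length is what prevents an occurrence of the shortest pattern from being skipped over.</p>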
<p>Example 2 (prior art)</p>
<p>The following is another example of the prior art, in which three patterns are searched for: abbad, abef and ghi. The text to be searched is shown at the top and comprises the letters of the alphabet in order.</p>
<p>TEXT a b c d e f g h i j k l m n</p>
<p>[Figure: the pattern set abbad, abef, ghi shown aligned against the text at three successive positions]</p>
<p>For each character of each pattern (or just the shortest one) a skip value is computed beforehand (see table). The set of three patterns is aligned in the first attempt as shown, at position 1. No match (with "e") is found. Furthermore, "e" is not present in any of the patterns, so the patterns are each skipped by 3 places (equal to the length of the shortest search string). At position 2 the end (rightmost) character of each pattern does not match the "h" in the text, but "h" is found in "ghi". "h" has a skip value of 1, so the pattern set is skipped by 1 to position 3, where a match is found.</p>
<p>Extension to ngrams</p>
<p>An ngram is a sequence of one or more characters, where n denotes the number of characters in the gram, e.g. a monogram contains one character, a digram contains two characters, etc. For large dictionaries the sizes of the skips generated by the bad character and mismatch rules get progressively smaller. This is due in part to the fact that most of the characters in the skip table appear close to or at the right hand edge of one of the patterns within the pattern set. Consequently, the size of the skip that can be obtained is small compared to the length of the pattern. In this scenario the performance of the algorithm is compromised, as the effort spent in calculating the skip value is not compensated by the skips available.</p>
<p>A method of extending the utility of the approach is to base the skipping on ngrams rather than monograms. In this instance the probability of an ngram appearing gets progressively smaller as the length of the ngram is increased. Thus, useful skip distances can be achieved and the performance of the algorithm can be maintained. In order to use ngram skipping an extra heuristic must be used to ensure that patterns are not missed. In this case the largest possible skip distance for ngrams whose last character is equal to the first character of the patterns whose length is equal to lmin is lmin - 1.</p>
<p>An initialisation phase is used to create a master ngram skip table from the set of patterns as follows. Each pattern is decomposed into its set of ngrams. For each pattern a skip value for each of the ngrams is calculated. The skip value is defined as the number of character positions that the algorithm skips forward in the event of finding the ngram in the text. The minimum skip value for each ngram taken over all the patterns is then stored in a skip database. Once the skip values have been computed the maximal skip criteria are applied. In this step each entry in the database is checked to ensure that the skip value does not exceed lmin. In the event that the skip value exceeds lmin it is reset to lmin. If a particular ngram is not present in the set of patterns then the skip distance associated with that ngram is lmin. Then, for each of the ngrams whose last character matches the first character of any pattern in the set of patterns whose length is equal to lmin, the skip value is set to lmin - 1.</p>
<p>Example 3 (prior art)</p>
<p>For example, using a digram skip database, the digrams and skips of the patterns 'pebble' and 'pebbles' are as follows:</p>
<p>Table 4</p>
<p>Digram | Skip</p>
<p>pe | 4</p>
<p>eb | 3</p>
<p>bb | 2</p>
<p>bl | 1</p>
<p>le | 0</p>
<p>es | 0</p>
<p>ANY OTHER DIGRAM | 6 (lmin)</p>
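<p>The initialisation phase for the ngram skip table can be sketched as follows. This is an illustration under stated assumptions: the function names build_ngram_skip_table and ngram_skip are hypothetical, and the lmin - 1 rule is applied here as an upper bound, which is one possible reading of the description.</p>

```python
def build_ngram_skip_table(patterns, n=2):
    """Return (table, lmin, first_chars) for the given pattern set.

    table maps each ngram occurring in a pattern to the minimum skip
    over all patterns, bounded by lmin (the shortest pattern length).
    """
    lmin = min(len(p) for p in patterns)
    table = {}
    for p in patterns:
        for i in range(len(p) - n + 1):
            gram = p[i:i + n]
            # Distance from the end of this ngram to the end of the pattern.
            skip = len(p) - n - i
            table[gram] = min(table.get(gram, lmin), skip)
    # First characters of the patterns whose length equals lmin.
    first_chars = {p[0] for p in patterns if len(p) == lmin}
    return table, lmin, first_chars


def ngram_skip(gram, table, lmin, first_chars):
    """Skip distance for an ngram read from the text."""
    skip = table.get(gram, lmin)  # ngrams absent from the patterns skip lmin
    if gram[-1] in first_chars:
        # The ngram may overlap the start of a shortest pattern.
        skip = min(skip, lmin - 1)
    return skip


table, lmin, first_chars = build_ngram_skip_table(["pebble", "pebbles"])
print(table)
print(ngram_skip("zz", table, lmin, first_chars))
print(ngram_skip("xp", table, lmin, first_chars))
```

<p>For the patterns 'pebble' and 'pebbles' the computed digram skips agree with Table 4, and a digram ending in "p" is limited to lmin - 1 because "p" is the first character of the shortest pattern.</p>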
<p>The performance of the algorithm can be significantly improved by providing a fast reject mechanism to prevent unnecessary searching of the pattern trie. A simplistic method to achieve this would be to use a suffix of a pattern as an index into a flat look-up table. However, due to current memory constraints the number of characters that can be represented by a single look-up table is limited to a few characters. Indeed, the address space required to represent a flat look-up table quickly escalates as the number of characters increases, according to $2^{8m}$, where m is the number of characters. Clearly the memory costs of this approach are unworkable. However, the drawback with using a small number of characters is that it limits the effectiveness of the fast reject mechanism. One of the drawbacks of this approach is that as the size of the pattern set increases the utility of the skipping technique decreases, resulting in poor performance. A second drawback is that in general these types of algorithms cannot be updated without recompiling their core data structures. For large pattern sets the cost of recompilation can be significant.</p>
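<p>The growth of the flat look-up table can be made concrete with a short calculation. This sketch assumes 8-bit characters, as implied by the exponent 8m; the function name flat_table_entries is hypothetical.</p>

```python
def flat_table_entries(m, bits_per_char=8):
    """Number of entries in a flat table indexed by m characters."""
    return 2 ** (bits_per_char * m)


for m in range(1, 5):
    print(m, flat_table_entries(m))
```

<p>Indexing by four 8-bit characters already requires 2 to the power 32 entries, which illustrates why only a few characters can be used in this way.</p>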
<p>Invention</p>
<p>It is an object of the invention to provide a faster algorithm based on pattern skipping, so as to allow a fast reject mechanism followed by exhaustive matching that collectively provide enhanced throughput over the current approaches.</p>
<p>The invention comprises a method of searching for one or more patterns in a text using Boyer-Moore methodology, wherein once a match of an ngram is determined, entering into a routine which jumps forward so as to compare more initial characters so as to provide faster rejection.</p>
<p>This preferably includes comparing the first character (or ngram) of the search pattern.</p>
<p>Also wherein if the search text section which is to be compared with the search patterns includes a pre-designated character, searching for this character in the appropriate position in the search patterns.</p>
<p>The invention also comprises a method of searching for one or more patterns in a text using Boyer-Moore methodology, including the steps of forming a skip value for each ngram; comparing the current ngram with the skip value; if a zero skip is determined, skipping over the right hand most ngram, to another ngram, so that this right-hand most ngram is not compared with the current ngram of the text.</p>
<p>Preferably the first ngram to be compared is the last ngram of the search pattern but one. In an alternative embodiment, there is included the step of formulating for each character a "next node" identifier, so that the node to be jumped to is given in addition to the skip value.</p>
<p>Within the current algorithm these memory issues are avoided, whilst still providing a high degree of rejection, by encoding each pattern's characters within a keyword trie. Within the keyword trie each node can have as many edges as are required to represent the patterns contained in the pattern set.</p>
<p>The addition of a skip value to each node of the keyword trie also allows the characters of each pattern to be visited in non- sequential order. This modification improves the mismatch performance of the algorithm as it allows the characters of a search pattern to be compared to the text in non-sequential order. This allows the algorithm to only examine the minimum number of characters necessary to determine that a mismatch has occurred.</p>
<p>Example 3</p>
<p>The figure shows the word "spade", which is a pattern to be searched for in text. In prior art methodology, when using an ngram of 1, the word would be aligned at the appropriate position in the text and the rightmost character "e" would be compared with that in the text. If "e" was present, the next rightmost letter, "d", would be compared, and if this matched the process would continue. This, however, is inefficient. For example, if the text aligned to "spade" was "ipade", the process would continue all the way to the last character before being rejected, because that character is "i" and not "s". Under the invention, once a match has been made the process jumps into a routine which allows faster rejection. For example, after the "e" is matched the routine may preferably jump straight to the first character to see if it is an "s". If it is not, the pattern is rejected at once, which may save a lot of time. Although this example relates to single characters (i.e. an ngram of 1), it is equally applicable to ngrams of any suitable length and to multiple patterns.</p>
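<p>The "spade" example can be sketched as follows. The sketch compares one pattern against an aligned section of text in a chosen order and counts the comparisons made; the names compare_in_order, right_to_left_order and fast_reject_order are assumptions introduced for illustration.</p>

```python
def compare_in_order(section, pattern, order):
    """Compare pattern with an equally long text section.

    order lists the pattern indices in the sequence in which they are
    tested. Returns (matched, number_of_comparisons).
    """
    comparisons = 0
    for j in order:
        comparisons += 1
        if section[j] != pattern[j]:
            return False, comparisons
    return True, comparisons


def right_to_left_order(m):
    """Prior art order: last character first, then leftwards."""
    return list(range(m - 1, -1, -1))


def fast_reject_order(m):
    """Last character, then the first character, then the rest."""
    if m < 2:
        return list(range(m))
    return [m - 1, 0] + list(range(m - 2, 0, -1))


pattern = "spade"
m = len(pattern)
print(compare_in_order("ipade", pattern, right_to_left_order(m)))
print(compare_in_order("ipade", pattern, fast_reject_order(m)))
print(compare_in_order("spade", pattern, fast_reject_order(m)))
```

<p>For the section "ipade" the right-to-left order needs five comparisons before rejecting, whereas the order that jumps to the first character rejects after two.</p>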
<p>Example 4</p>
<p>In another example, if the search pattern contains a rare character, e.g. "x" in the English language, the routine may test the corresponding character in the text straightaway. As the match will be negative most of the time, the reject mechanism is faster.</p>
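<p>Testing a rare character first can be sketched as follows. The frequency values are rough, illustrative figures for a few English letters and are an assumption of this sketch, as are the names APPROX_FREQ, rarest_first_order and probe_in_order and the pattern "extra".</p>

```python
# Rough, illustrative relative frequencies (percent) of a few English
# letters; these values are assumptions made for this sketch only.
APPROX_FREQ = {"e": 12.7, "t": 9.1, "a": 8.2, "r": 6.0, "x": 0.15}


def rarest_first_order(pattern, freq, default=1.0):
    """Pattern indices sorted so that the rarest characters come first."""
    return sorted(range(len(pattern)),
                  key=lambda j: freq.get(pattern[j], default))


def probe_in_order(section, pattern, order):
    """Return (matched, number_of_comparisons) for the given test order."""
    comparisons = 0
    for j in order:
        comparisons += 1
        if section[j] != pattern[j]:
            return False, comparisons
    return True, comparisons


pattern = "extra"
order = rarest_first_order(pattern, APPROX_FREQ)
print(order)
print(probe_in_order("entra", pattern, order))
print(probe_in_order("entra", pattern, [4, 3, 2, 1, 0]))
```

<p>With these assumed frequencies the position of "x" is tested first, so the section "entra" is rejected after a single comparison instead of four.</p>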
<p>Example 5</p>
<p>The following example relates to an improved embodiment of the invention. In the following example the text comprises the characters of the English alphabet in order. The search patterns are "defg" and "abcd".</p>
<p>POSITION 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14</p>
<p>TEXT a b c d e f g h i j k l m n o</p>
<p>PATTERNS d e f g and a b c d</p>
<p>The following is a skip value table as used in the conventional Boyer-Moore technique:</p>
<p>Table 5</p>
<p>Character | Skip</p>
<p>a | 3</p>
<p>b | 2</p>
<p>c | 1</p>
<p>d | 0</p>
<p>e | 2</p>
<p>f | 1</p>
<p>g | 0</p>
<p>In the context of matching multiple patterns within the standard Commentz-Walter approach, once an ngram in the text has been aligned to a suffix of a pattern in the search set, an exhaustive match on a keyword trie of reversed patterns is performed, starting at the rightmost character of the potential alignment in the text. Each character in the search pattern / text will have a skip value as defined and determined above.</p>
<p>Once the initial alignment has been made against the suffix 'd' of 'abcd', the algorithm must traverse the keyword trie from the root, using the characters of the search text taken in reverse order, to discover the correct path through the tree to the sentinel marked 'abcd'. During this traversal it is necessary to reprocess the characters that have already been matched during the initial alignment phase, i.e. the character "d" is processed twice. In the standard Boyer-Moore technique, once the text is aligned, the algorithm looks at the rightmost character of each pattern (in this instance "d") and compares it with the text; the same "d" is then analysed again during the traversal of the trie.</p>
<p>The invention removes the extra step by allowing the algorithm to jump straight to the next appropriate character for comparison, i.e. the character "c". Accordingly, an extra column called "NEXT NODE" needs to be determined in the skip table. This is shown in figure 1, where the nodes are numbered for the above example. Although this is an extra computational step, it is only calculated once and saves computing resources, especially where there is a large pattern set. The table below shows the make-up of the skip table according to the invention, where only the skip and next node values for "d" are shown. The next node value is "2", which is the numbered node to be jumped to.</p>
<p>This 'next node' column allows the algorithm to move directly to the correct location in the keyword trie without the additional comparisons. This methodology is equally applicable to ngrams of any length, as the skip table will contain the same number of entries as there are branches exiting the root of the keyword trie. In this case the characters of the text are used to index the skip table. Then, when the skip is found to be zero, the location of the appropriate path in the keyword trie is simply looked up in the next node column.</p>
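<p>This is shown in the table below (for character "d" only).</p>
<p>Table 6</p>
<p>Character Code | Skip | Next Node</p>
<p>a | ... | ...</p>
<p>b | ... | ...</p>
<p>c | ... | ...</p>
<p>d | 0 | 2</p>
<p>e | ... | ...</p>
<p>f | ... | ...</p>
<p>g | ... | ...</p>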
<p>This can be visualised with respect to a tree which is shown in figure which shows the node numbered "2" as the node with the character "c".</p>
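<p>The "next node" column can be sketched as follows for the patterns "abcd" and "defg". This is a minimal illustration under stated assumptions: the names Node, build_trie, build_skip_table and search are hypothetical, nodes are numbered in order of insertion (which need not agree with the numbering used in the figures), and when a root branch does not lead to exactly one next node the sketch falls back to an ordinary walk from the root.</p>

```python
class Node:
    def __init__(self, ident, char):
        self.ident = ident      # node number
        self.char = char        # character held by this node
        self.children = {}
        self.pattern = None     # set when a pattern ends at this node


def build_trie(patterns):
    """Keyword trie of the reversed patterns; returns the node list."""
    nodes = [Node(0, None)]
    for p in patterns:
        node = nodes[0]
        for ch in reversed(p):
            if ch not in node.children:
                node.children[ch] = Node(len(nodes), ch)
                nodes.append(node.children[ch])
            node = node.children[ch]
        node.pattern = p
    return nodes


def build_skip_table(patterns, nodes):
    """Map each character to (skip, next node number or None)."""
    lmin = min(len(p) for p in patterns)
    skips = {}
    for p in patterns:
        for i, ch in enumerate(p):
            skips[ch] = min(skips.get(ch, lmin), len(p) - 1 - i)
    table = {}
    for ch, skip in skips.items():
        next_id = None
        if skip == 0:
            entry = nodes[0].children[ch]
            if entry.pattern is None and len(entry.children) == 1:
                next_id = next(iter(entry.children.values())).ident
        table[ch] = (skip, next_id)
    return table


def search(text, patterns):
    lmin = min(len(p) for p in patterns)
    nodes = build_trie(patterns)
    table = build_skip_table(patterns, nodes)
    hits = []
    pos = lmin - 1
    while pos < len(text):
        skip, next_id = table.get(text[pos], (lmin, None))
        if skip > 0:
            pos += skip
            continue
        if next_id is None:
            node, i = nodes[0], pos          # ordinary walk from the root
        elif pos >= 1 and text[pos - 1] == nodes[next_id].char:
            node, i = nodes[next_id], pos - 2  # text[pos] is not re-read
        else:
            pos += 1
            continue
        while True:
            if node.pattern is not None:
                hits.append((i + 1, node.pattern))
            if i < 0 or text[i] not in node.children:
                break
            node = node.children[text[i]]
            i -= 1
        pos += 1
    return hits


nodes = build_trie(["abcd", "defg"])
print(build_skip_table(["abcd", "defg"], nodes)["d"])
print(search("abcdefghijklmno", ["abcd", "defg"]))
```

<p>With this insertion order the entry for "d" has a skip of 0 and a next node of 2, the node holding "c", so the character "d" is not compared a second time.</p>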
<p>Further enhancement</p>
<p>This further enhancement enables the algorithm to skip forward to test characters (or ngrams) further up the pattern, i.e. characters further to the left, which again saves time. This is because if, for example, the algorithm skips to the first letter of a pattern and finds that this letter does not match, the pattern can be abandoned immediately. This provides a short cut and saves (if the pattern is thus rejected) having to go through each character in turn. This principle is also used in conjunction with the second invention.</p>
<p>Where there are multiple patterns there may well be search patterns with common suffixes, e.g. "abcd" and "bbcd". If one visualises this as a tree (see figure), one has to be careful not to jump further than a junction node, otherwise patterns with different prefixes but a common suffix may be missed. This is illustrated in figure 2, which shows the additional pattern "bbcd" in the search. The skip table shows the skip value as before, but the next node is designated 8/6, which is the junction node. Another column in the table, "back skip", indicates how far the algorithm has jumped forward and therefore how far it needs to skip back. This allows the algorithm to know how far to move back in the search text.</p>
<p>Once the jump is completed, the two paths sharing the suffix 'bcd' can be differentiated by comparing the character before the 'bcd' part. The remainder of the pattern can be matched exhaustively, or the remaining vertices can be visited in any order until the pattern has either been matched or a mismatch has occurred.</p>
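<p>Table 7</p>
<p>Character (ngram) code | Skip Value | Next node (junction) | Back Step</p>
<p>a | ... | ... | ...</p>
<p>b | ... | ... | ...</p>
<p>c | ... | ... | ...</p>
<p>d | 0 | 6 | 2</p>
<p>e | ... | ... | ...</p>
<p>f | ... | ... | ...</p>
<p>g | ... | ... | ...</p>
<p>h | ... | ... | ...</p>
<p>j | ... | ... | ...</p>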
<p>The above methodology can be extended to cover the use of the fast reject mechanism described previously by adding a further column to the skip table that encodes the distance to be moved back through the search text to make the next comparison; at this point the remainder of the pattern can be matched exhaustively or the remaining vertices can be visited in any order until the pattern has either been matched or a mismatch has occurred. In the latter case each subsequent node must also contain a next node reference and a skip value to tell the algorithm which node and search text character to compare next.</p>
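<p>The junction node and back step can be sketched as follows for the patterns "abcd" and "bbcd". This is one possible reading of the description and not a definitive implementation: the names Node, build_trie, junction_table and search are hypothetical, the node numbers follow insertion order and so differ from those in figure 2, and the characters jumped over are verified after the junction test.</p>

```python
class Node:
    def __init__(self, ident, char):
        self.ident, self.char = ident, char
        self.children, self.pattern = {}, None


def build_trie(patterns):
    """Keyword trie of the reversed patterns; returns the node list."""
    nodes = [Node(0, None)]
    for p in patterns:
        node = nodes[0]
        for ch in reversed(p):
            if ch not in node.children:
                node.children[ch] = Node(len(nodes), ch)
                nodes.append(node.children[ch])
            node = node.children[ch]
        node.pattern = p
    return nodes


def junction_table(nodes):
    """Map each final character to (junction node, back step, chain).

    The chain is followed only while there is a single way forward, so
    the jump never goes further than a junction node.
    """
    table = {}
    for ch, node in nodes[0].children.items():
        chain = [node.char]
        while node.pattern is None and len(node.children) == 1:
            node = next(iter(node.children.values()))
            chain.append(node.char)
        table[ch] = (node, len(chain) - 1, chain)
    return table


def search(text, patterns):
    lmin = min(len(p) for p in patterns)
    nodes = build_trie(patterns)
    jumps = junction_table(nodes)
    skips = {}
    for p in patterns:
        for i, ch in enumerate(p):
            skips[ch] = min(skips.get(ch, lmin), len(p) - 1 - i)
    hits = []
    pos = lmin - 1
    while pos < len(text):
        skip = skips.get(text[pos], lmin)
        if skip > 0:
            pos += skip
            continue
        node, back, chain = jumps[text[pos]]
        j = pos - back  # move back through the text to the junction
        ok = j >= 0 and text[j] == chain[-1]            # fast reject test
        # Verify the characters that the jump passed over.
        ok = ok and all(text[pos - k] == chain[k] for k in range(1, back))
        i = j - 1
        while ok:
            if node.pattern is not None:
                hits.append((i + 1, node.pattern))
            if i < 0 or text[i] not in node.children:
                break
            node = node.children[text[i]]
            i -= 1
        pos += 1
    return hits


nodes = build_trie(["abcd", "bbcd"])
junction, back_step, chain = junction_table(nodes)["d"]
print(junction.char, back_step)
print(search("xxbbcdabcd", ["abcd", "bbcd"]))
```

<p>For the final character "d" the sketch finds the shared "b" as the junction with a back step of 2, and the character before the 'bcd' part then selects between the two patterns.</p>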
<p>Although this example relates to single characters (i.e. an ngram of 1), it is equally applicable to ngrams of any suitable length and to multiple patterns.</p>

Claims (1)

  1. <p>Claims 1. A method of searching for one or more patterns in a text
    using Boyer-Moore methodology, including the step of wherein once a match of an ngram is determined, entering into a routine which jumps forward so as to compare more initial characters so as to provide faster rejection.</p>
    <p>2. A method as claimed in claim 1 wherein the routine entered into includes comparing the first character of the search pattern.</p>
    <p>3. A method as claimed in claim 1 wherein if the search text section which is to be compared with the search patterns includes a pre-designated character, searching for this character in the appropriate position in the search patterns.</p>
    <p>4. A method of searching for one or more patterns in a text using Boyer-Moore methodology, including the initial step of a) forming a skip value for each ngram; b) comparing the current ngram with the skip value; c) if a zero skip is determined, skipping over the right hand most ngram, to another ngram, so that this right-hand most ngram is not compared with the current ngram of the text.</p>
    <p>5. A method as claimed in claim 4 wherein said first ngram to be compared is the last ngram of the search pattern but 1.</p>
    <p>6. A method as claimed in claim 5 or 6 including the step of formulating for each character a "next node" identifier, identifying which node to be jumped to is given in addition to the skip value.</p>
    <p>7. A method as claimed in claim 4, 5 or 6 wherein in step c) the skipping step is such that where any search patterns have common suffixes, said skipping step does not move to an ngram which has a character which is not part of a common suffix.</p>
GB0614986A 2006-07-28 2006-07-28 A method of searching for patterns in a text using Boyer-Moore methodology Withdrawn GB2440560A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
GB0614986A GB2440560A (en) 2006-07-28 2006-07-28 A method of searching for patterns in a text using Boyer-Moore methodology
EP07108874A EP1883023A1 (en) 2006-07-28 2007-05-24 Improved method for searching for patterns in text
US11/812,535 US20080027934A1 (en) 2006-07-28 2007-06-19 Method for searching for patterns in text
CA002593937A CA2593937A1 (en) 2006-07-28 2007-07-17 Improved method for searching for patterns in text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0614986A GB2440560A (en) 2006-07-28 2006-07-28 A method of searching for patterns in a text using Boyer-Moore methodology

Publications (2)

Publication Number Publication Date
GB0614986D0 GB0614986D0 (en) 2006-09-06
GB2440560A true GB2440560A (en) 2008-02-06

Family

ID=37006309

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0614986A Withdrawn GB2440560A (en) 2006-07-28 2006-07-28 A method of searching for patterns in a text using Boyer-Moore methodology

Country Status (4)

Country Link
US (1) US20080027934A1 (en)
EP (1) EP1883023A1 (en)
CA (1) CA2593937A1 (en)
GB (1) GB2440560A (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8560475B2 (en) 2004-09-10 2013-10-15 Cavium, Inc. Content search mechanism that uses a deterministic finite automata (DFA) graph, a DFA state machine, and a walker process
US8819217B2 (en) * 2007-11-01 2014-08-26 Cavium, Inc. Intelligent graph walking
US8473523B2 (en) 2008-10-31 2013-06-25 Cavium, Inc. Deterministic finite automata graph traversal with nodal bit mapping
CN103548022B (en) 2011-03-28 2017-05-31 思杰系统有限公司 The pattern matching systems of UTF 8 and method
US9483463B2 (en) * 2012-09-10 2016-11-01 Xerox Corporation Method and system for motif extraction in electronic documents
US9268567B2 (en) 2012-09-30 2016-02-23 Intel Corporation Instruction and logic for boyer-moore search of text strings
CN103425739B (en) * 2013-07-09 2016-09-14 国云科技股份有限公司 A kind of character string matching method
US9454695B2 (en) * 2014-10-22 2016-09-27 Xerox Corporation System and method for multi-view pattern matching
US9805099B2 (en) 2014-10-30 2017-10-31 The Johns Hopkins University Apparatus and method for efficient identification of code similarity
US9864956B1 (en) * 2017-05-01 2018-01-09 SparkCognition, Inc. Generation and use of trained file classifiers for malware detection
US10305923B2 (en) 2017-06-30 2019-05-28 SparkCognition, Inc. Server-supported malware detection and protection
US10616252B2 (en) 2017-06-30 2020-04-07 SparkCognition, Inc. Automated detection of malware using trained neural network-based file classifiers and machine learning
US10956669B2 (en) 2018-07-10 2021-03-23 Beijing Didi Infinity Technology And Development Co., Ltd. Expression recognition using character skipping
US11163948B2 (en) 2018-07-10 2021-11-02 Beijing Didi Infinity Technology And Development Co., Ltd. File fingerprint generation
US11557141B2 (en) 2019-12-19 2023-01-17 Beijing Didi Infinity Technology And Development Co., Ltd. Text document categorization using rules and document fingerprints
US11250131B2 (en) 2019-12-19 2022-02-15 Beijing Didi Infinity Technology And Development Co., Ltd. Multi-purpose agent for endpoint scanning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6311183B1 (en) * 1998-08-07 2001-10-30 The United States Of America As Represented By The Director Of National Security Agency Method for finding large numbers of keywords in continuous text streams
US20020188926A1 (en) * 2001-05-15 2002-12-12 Hearnden Stephen Owen Searching for sequences of character data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5966709A (en) * 1997-09-26 1999-10-12 Triada, Ltd. Method of optimizing an N-gram memory structure

Also Published As

Publication number Publication date
CA2593937A1 (en) 2008-01-28
EP1883023A1 (en) 2008-01-30
GB0614986D0 (en) 2006-09-06
US20080027934A1 (en) 2008-01-31

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)