CN101960469A - Fast signature scan - Google Patents

Fast signature scan

Info

Publication number
CN101960469A
Authority
CN
China
Prior art keywords
condition code
fingerprint
character string
fixed length
scanning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200880127748.0A
Other languages
Chinese (zh)
Other versions
CN101960469B (en)
Inventor
王强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wang Ying
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201711339378.4A (CN108197470A)
Priority to CN201410055830.4A (CN103793522B)
Publication of CN101960469A
Application granted
Publication of CN101960469B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0225 Avoiding frauds

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Collating Specific Patterns (AREA)

Abstract

Systems and methods for scanning signatures in a string field. In one implementation, the invention provides a method for signature scanning. The method includes processing one or more signatures into one or more formats that include one or more fingerprints and one or more follow-on search data structures for each fixed-size signature or signature substring such that the number of fingerprints for each fixed-size signature or signature substring is equal to a step size for a signature scanning operation and the particular fixed-size signature or signature substring is identifiable at any location within any string fields to be scanned, receiving a particular string field, identifying any signatures included in the particular string field including scanning for the fingerprints for each scan step size and searching for the follow-on search data structures at the locations where one or more fingerprints are found, and outputting any identified signatures.

Description

Fast signature scan
Technical field
The present invention relates to scanning for signatures in string fields.
Background
A digital content object (such as a file, program, web page, email, Internet data packet, or digital image) can contain one or more string fields. A string field is a string of data values that usually represents text or executable code. For example, an Internet data packet can include a network address, a host name, Hypertext Transfer Protocol (HTTP) headers, an HTTP message, an email attachment, email headers, and email content. The size of a string field can range from a few bytes to more than several million bytes. A string signature can be a fully specified string of particular data values, or an expression over particular data values (such as a particular regular expression), that is used to identify a string object (such as a particular computer virus or a particular gene sequence). Signatures can be stored in a signature database, and a signature database can contain multiple signatures. The size of a string signature can range from a few bytes to several thousand bytes.
String signatures and string fields are both bit strings made up of many basic units. A basic unit is the smallest unit that carries meaning, so it is usually used as the scanning unit in signature scanning techniques. The size of a basic unit depends on the application. For example, the basic unit of an English string is usually 8 bits (one byte), and the basic unit of a computer virus signature is usually a byte or a nibble.
Each basic unit of a signature can be specified as equal to or not equal to a particular value, or as falling within a particular range (such as the digits 0 to 9 or the English letters a to z). Each basic unit can be case-insensitive or case-sensitive, and each basic unit can support simple logical operations (such as NOT). In addition, each signature can include wildcards, for example "*" (a variable-length wildcard) or "?" (a fixed-length wildcard), where "*" represents zero or more arbitrary basic units and "?" represents exactly one arbitrary basic unit. For each variable-length wildcard, its range of lengths can be further specified. When a signature contains a variable-length wildcard, the length of the signature is variable; if it contains none, its length is fixed.
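The distinction between fixed-length and variable-length signatures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature is written as a text pattern in which "*" and "?" are always wildcards (never literal data), and the helper names is_fixed_length, split_signature and match_fixed are hypothetical, not taken from the patent.

```python
def is_fixed_length(pattern: str) -> bool:
    """A signature is fixed-length when it has no variable-length wildcard."""
    return "*" not in pattern


def split_signature(pattern: str) -> list[str]:
    """Split a variable-length signature into its fixed-length substrings.

    '?' stays inside a substring because it always covers exactly one unit.
    """
    return [part for part in pattern.split("*") if part]


def match_fixed(part: str, field: str, pos: int) -> bool:
    """Compare one fixed-length substring against the field at position pos."""
    if pos < 0 or pos + len(part) > len(field):
        return False
    window = field[pos:pos + len(part)]
    return all(p == "?" or p == c for p, c in zip(part, window))
```

For example, split_signature("abc*de?g") yields the two fixed-length substrings "abc" and "de?g", each of which can be located independently before the pieces are recombined.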
A typical signature scanning process compares the string field against the corresponding signatures in the signature database at every possible position within the string field. Scanning speed is usually limited by the size and complexity of the signatures. Scanning speed is also limited by the need to be able to update signatures one at a time.
Summary of the invention
Embodiments of the invention provide methods and systems for scanning for signatures in a string field. In general, in one aspect, an embodiment provides a string signature scanning method. The method includes processing one or more signatures into one or more formats that include one or more fingerprints and one or more follow-on search data structures for each fixed-length signature, or for each fixed-length signature substring of a variable-length signature. The fingerprints include a J-th fingerprint of a particular fixed-length signature or signature substring, where the position, in the scanning direction, of the first basic unit of the J-th fingerprint within that signature or substring equals the remainder of J divided by the step size of the signature scanning operation. As a result, the number of fingerprints equals the scanning step size, and the particular fixed-length signature or signature substring can be identified at any position in any scanned string field. Each fingerprint includes one or more fragments of the particular fixed-length signature or signature substring, and each fragment has a specific position anywhere within it.
The method further includes receiving a particular string field made up of data values; identifying any signatures included in the particular string field, which includes scanning the string field at positions spaced one scanning step apart to find one or more fingerprints of one or more signatures and, at positions where one or more fingerprints match, searching the string field for the one or more follow-on search data structures; and outputting any signatures identified in the particular string field. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
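The step-size argument above can be made concrete with a minimal Python sketch. It is an illustration, not the patented implementation, and it rests on several assumptions: every fingerprint is a single contiguous fragment of one common length fp_len, fingerprint J starts at offset J within the signature, the fingerprint table is a plain dictionary, and the follow-on search is a direct comparison of the whole signature. The names build_fingerprint_table and scan_field are hypothetical.

```python
def build_fingerprint_table(signatures, step, fp_len):
    """Map each fingerprint to the (signature, offset) pairs it came from.

    Fingerprint j of a signature is the window that starts at offset j,
    for j = 0 .. step-1, so each signature contributes `step` fingerprints.
    """
    table = {}
    for sig in signatures:
        if len(sig) < fp_len + step - 1:
            raise ValueError("signature too short for this step size")
        for j in range(step):
            table.setdefault(sig[j:j + fp_len], []).append((sig, j))
    return table


def scan_field(field, table, step, fp_len):
    """Probe the field only at positions 0, step, 2*step, ...

    Wherever the signature starts, exactly one of its `step` fingerprints
    lands on a probed position, so no occurrence is skipped.
    """
    found = set()
    for pos in range(0, len(field) - fp_len + 1, step):
        for sig, j in table.get(field[pos:pos + fp_len], ()):
            start = pos - j
            # Follow-on search: confirm the whole signature, not just the window.
            if start >= 0 and field.startswith(sig, start):
                found.add((start, sig))
    return sorted(found)
```

With step 4 and fp_len 4, scanning b"xxEVILCODEyy" for b"EVILCODE" probes positions 0, 4 and 8. The probe at position 4 reads b"ILCO", which is fingerprint 2 of the signature, and the follow-on comparison confirms a match starting at position 2. The scanner does a quarter of the table lookups of a position-by-position scan, at the cost of storing four fingerprints per signature.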
These and other embodiments optionally include one or more of the following features. Each fixed-length signature or signature substring can have multiple fingerprints, and scanning the string field at positions spaced one scanning step apart to find the fingerprints of one or more signatures can include searching for two or more fingerprints in parallel. Each fingerprint of a particular signature can be fully specified either in the original space or after being projected into one or more shadow spaces. A shadow space is a broader form of the original space: it introduces some ambiguity into the original space, so that one fingerprint shadow in a particular shadow space corresponds to one or more fingerprints in the original space.
The string signature scanning method can further include one or both of the following: scanning the particular string field in the original space, at positions spaced one scanning step apart, to find one or more fingerprints; and first scanning the particular string field, at positions spaced one scanning step apart, in each of one or more shadow spaces of the fingerprints, and then, at positions where one or more fingerprints have been identified in at least one of the shadow spaces, verifying the identified fingerprints in the original space. Introducing some ambiguity into the original space can include one or more of: mapping the uppercase and lowercase forms of a letter in the original space to a single case-insensitive letter; mapping all digits from 0 to 9 in the original space to one identical digit; and mapping the space character and "-" in the original space to either a space or "-".
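A shadow-space projection of the kind just described might look like the following Python sketch. It assumes ASCII text, maps every digit to "0" and "-" to a space (the choice of representative is arbitrary), and, for brevity, checks every position rather than stepping through the field. The names project and find_with_shadow are hypothetical.

```python
import string

# Length-preserving projection: fold case, collapse digits, merge '-' into ' '.
_SHADOW_TABLE = str.maketrans(
    string.ascii_uppercase + string.digits + "-",
    string.ascii_lowercase + "0" * 10 + " ",
)


def project(text: str) -> str:
    """Project text from the original space into the shadow space."""
    return text.translate(_SHADOW_TABLE)


def find_with_shadow(field: str, fingerprint: str):
    """Find shadow-space hits, then verify each one in the original space.

    Returns (position, verified) pairs; verified is False for hits that
    exist only because of the ambiguity the shadow space introduces.
    """
    shadow_field, shadow_fp = project(field), project(fingerprint)
    hits = []
    pos = shadow_field.find(shadow_fp)
    while pos != -1:
        verified = field[pos:pos + len(fingerprint)] == fingerprint
        hits.append((pos, verified))
        pos = shadow_field.find(shadow_fp, pos + 1)
    return hits
```

Because the projection preserves length, a position found in the shadow field is also the position to verify in the original field. Many original-space fingerprints share one shadow, so a single shadow lookup can stand in for all of them.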
Scanning to find the fingerprints of the signatures can further include scanning with one or more of: one or more hash tables and one or more Bloom filters. Scanning to find the fingerprints can further include scanning with one or more of: a hash value demultiplexer and a fingerprint length demultiplexer. The number of distinct fingerprint lengths across multiple signatures can be smaller than the number of distinct signature lengths across those signatures, and scanning the particular string field at positions spaced one scanning step apart to find the fingerprints of the signatures can include searching in parallel for two or more fingerprints of the same length. The fingerprints can be selected so that their lengths are restricted to a set of lengths, each of which covers one or more lengths within one or more length ranges, to provide multi-resolution fingerprint scanning. The length of each fingerprint can be an integer multiple of the step size of the signature scanning operation. The string signature scanning method can further include scanning with one or more of: one or more content-addressable memories (CAMs) and one or more finite automata (FAs), to find the fingerprints of the signatures.
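As an illustration of the Bloom filter option, the sketch below uses a small filter as a cheap pre-check in front of an exact fingerprint table. It is a generic textbook Bloom filter, not a structure specified by the patent. The class name, the default sizes, and the use of SHA-256 with a one-byte seed as the hash family are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item: bytes):
        # Derive num_hashes bit positions by prefixing a one-byte seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: bytes) -> None:
        for index in self._indexes(item):
            self.bits[index // 8] |= 1 << (index % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[index // 8] & (1 << (index % 8))
            for index in self._indexes(item)
        )
```

A scanner would test each probed window with `window in bloom` and consult the exact fingerprint table only on a hit. Most windows of an ordinary string field are rejected by the filter, which is typically much smaller than the table it guards.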
The string signature scanning method can further include decomposing each of multiple fingerprints of multiple signatures into one or more fingerprint segments, so that the number of distinct fingerprint segment lengths is smaller than the number of distinct fingerprint lengths; scanning the particular string field, at positions spaced one scanning step apart, to find the fingerprint segments, including searching for two or more fingerprint segments in parallel; and synthesizing the identified fingerprint segments into any fingerprint matches. All fingerprint segments can have the same length, and the scan for fingerprint segments can use a scanning step size that is an integer multiple of the fingerprint segment length. A fingerprint segment bitmap that describes the one or more possible positions of a particular fingerprint segment within any fingerprint can be stored together with that segment, for use in synthesizing the identified fingerprint segments into any fingerprint matches. Fingerprint length information that describes one or more possible lengths can be stored together with the first segment, or with every segment, of each fingerprint, for the same purpose. One or more finite automata (FAs) can be used to synthesize the identified fingerprint segments into any fingerprint matches.
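The segment decomposition can be sketched as follows, assuming all segments share one length seg_len and every fingerprint length is a multiple of it. Instead of the segment bitmap and length information described above, this simplified version stores explicit (fingerprint, segment index) pairs with each segment. The function names are hypothetical.

```python
def split_segments(fingerprint: str, seg_len: int) -> list[str]:
    """Cut a fingerprint into equal-length segments."""
    if len(fingerprint) % seg_len != 0:
        raise ValueError("fingerprint length must be a multiple of seg_len")
    return [fingerprint[i:i + seg_len] for i in range(0, len(fingerprint), seg_len)]


def build_segment_index(fingerprints, seg_len):
    """Map each segment to the (fingerprint, segment number) pairs it can be."""
    index = {}
    for fp in fingerprints:
        for k, seg in enumerate(split_segments(fp, seg_len)):
            index.setdefault(seg, set()).add((fp, k))
    return index


def match_fingerprints_at(field, pos, index, seg_len):
    """Synthesize segment hits starting at pos into whole-fingerprint matches."""
    first = index.get(field[pos:pos + seg_len], ())
    candidates = {fp for fp, k in first if k == 0}
    matched = []
    for fp in candidates:
        count = len(fp) // seg_len
        # Every later segment must be found at its expected offset and role.
        if all(
            (fp, k) in index.get(field[pos + k * seg_len:pos + (k + 1) * seg_len], ())
            for k in range(count)
        ):
            matched.append(fp)
    return sorted(matched)
```

Because every lookup uses the same key length, fingerprints of many different lengths share a single table, which is the point of reducing the number of distinct lengths.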
The string signature scanning method can further include storing a false-positive match probability for each fingerprint; checking the corresponding false-positive match probability at each position where one or more fingerprints match; and, when one of those probabilities is not sufficiently low, searching the particular string field for the one or more follow-on search data structures. The method can further include building a differential search data structure from one or more distinguishing basic units of the multiple fixed-length signatures or signature substrings that correspond to a fingerprint, in which case searching the string field for the follow-on search data structures includes differentially searching the multiple fixed-length signatures or signature substrings that correspond to an identified fingerprint.
The method can further include encoding each fixed-length signature or signature substring with one or more mask bits. The mask bits can specify one or more of the following match conditions: no match required, case sensitivity, logical NOT, a predefined range, a logical operation, and an arbitrary range. The mask bits can include one or more of: mask bits for one or more basic units or sub-basic units, mask bits for one or more signature substring segments, and mask bits for one or more fixed-length signatures or signature substrings. Searching the string field for the follow-on search data structures then includes searching for the fixed-length signatures or signature substrings encoded with mask bits. The method can further include normalizing the particular string field, using one or more of: decoding an encoded string field, decompressing a compressed string field, and deleting unneeded string data.
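Per-unit mask bits can be illustrated with a small Python sketch. The particular flags, their bit values, and the choice of "decimal digit" as the only predefined range are assumptions made for the example; the patent text does not fix an encoding. Mask, unit_matches and signature_matches are hypothetical names.

```python
from enum import IntFlag


class Mask(IntFlag):
    """Hypothetical per-unit mask bits; Mask(0) means an exact match."""
    ANY = 1      # no match required (fixed-length wildcard)
    NOCASE = 2   # compare case-insensitively
    NOT = 4      # invert the result
    DIGIT = 8    # predefined range: any decimal digit


def unit_matches(expected: str, mask: Mask, actual: str) -> bool:
    """Test one basic unit of the field against one masked signature unit."""
    if mask & Mask.ANY:
        return True
    if mask & Mask.DIGIT:
        ok = "0" <= actual <= "9"
    elif mask & Mask.NOCASE:
        ok = actual.lower() == expected.lower()
    else:
        ok = actual == expected
    return not ok if mask & Mask.NOT else ok


def signature_matches(units, field: str, pos: int) -> bool:
    """Follow-on search of a mask-encoded fixed-length signature at pos."""
    if pos < 0 or pos + len(units) > len(field):
        return False
    return all(
        unit_matches(expected, mask, field[pos + i])
        for i, (expected, mask) in enumerate(units)
    )
```

A signature is then a list of (value, mask) pairs, for example [("a", Mask.NOCASE), ("0", Mask.DIGIT), ("x", Mask.NOT)], which accepts "A7q" and rejects "A7x".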
In general, in one aspect, an embodiment provides a string signature scanning method. The method includes decomposing each of multiple signatures into one or more signature segments; receiving a particular string field made up of data values; scanning the particular string field to find the signature segments of the signatures, including searching for two or more signature segments in parallel; synthesizing the identified signature segments into any signature matches; and outputting any signatures identified in the particular string field. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
Implementations can include one or more of the following features. Scanning to find the signature segments can further include scanning with one or more of: one or more hash tables; one or more Bloom filters; one or more hash tables with hash value multiplexing, length multiplexing, or both; and one or more Bloom filters with hash value multiplexing, length multiplexing, or both. A signature segment bitmap that describes the one or more possible positions of a particular signature segment within any signature can be stored together with that segment, for use in synthesizing the identified signature segments into any signature matches. Signature length information that describes one or more possible signature lengths can be stored together with the first segment, or with every segment, of each signature, for the same purpose. One or more finite automata (FAs) can be used to synthesize the identified signature segments into any signature matches.
In general, in one aspect, an embodiment provides a string signature scanning method. The method includes processing one or more signatures into one or more formats, including decomposing each variable-length signature into multiple fixed-length signature substrings and one or more variable-length signature substrings; receiving a particular string field made up of data values; identifying any signatures included in the particular string field, including scanning the string field to find the fixed-length signatures or signature substrings and, at positions where one or more fixed-length signature substrings are identified, synthesizing the identified fixed-length signature substrings into any variable-length signatures; and outputting any signatures identified in the particular string field.
Processing the signatures into formats can further include storing position information for each fixed-length signature substring in a static signature synthesis rule database. The position information includes an order and a distance range to the next fixed-length signature substring, and may or may not include a description of the variable-length signature substring between each pair of fixed-length signature substrings. Synthesizing the identified fixed-length signature substrings into any variable-length signature match can further include checking the position information of each identified fixed-length signature substring, optionally verifying the variable-length signature substring between each pair of adjacent fixed-length signature substrings, and updating a dynamic signature synthesis state table. One or more finite automata (FAs) can be used to synthesize the identified fixed-length signature substrings into any variable-length signatures. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
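The synthesis of fixed-length pieces into a variable-length match can be sketched in Python as follows. The rule format is an assumption: each entry is (literal, min_gap, max_gap), where the gap is measured from the end of the previous piece to the start of this one and max_gap None means unbounded. The set of reachable end positions plays the role of a very small dynamic state table. The pieces are located with str.find here for brevity, where a real scanner would reuse its fingerprint hits. find_all and match_rule are hypothetical names.

```python
def find_all(field: str, literal: str) -> list[int]:
    """Return every start position of literal in field (overlaps allowed)."""
    starts, pos = [], field.find(literal)
    while pos != -1:
        starts.append(pos)
        pos = field.find(literal, pos + 1)
    return starts


def match_rule(field: str, rule) -> bool:
    """Check order and distance ranges between fixed-length pieces.

    rule is a list of (literal, min_gap, max_gap); the gap of the first
    entry is ignored.
    """
    ends = None  # end positions reachable after the pieces matched so far
    for literal, min_gap, max_gap in rule:
        starts = find_all(field, literal)
        if ends is not None:
            starts = [
                s for s in starts
                if any(
                    s - e >= min_gap and (max_gap is None or s - e <= max_gap)
                    for e in ends
                )
            ]
        if not starts:
            return False
        ends = {s + len(literal) for s in starts}
    return True
```

Under these assumptions the signature "abc*def?ghi" becomes the rule [("abc", 0, 0), ("def", 0, None), ("ghi", 1, 1)]: "def" may follow "abc" at any distance, and "ghi" must start exactly one unit after "def" ends.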
In general, in one aspect, an embodiment provides a string signature scanning method. The method includes selecting multiple fixed-length signatures for each of one or more string objects. The fixed-length signatures of a particular string object include a J-th fixed-length signature whose first basic unit is located, in the scanning direction within the string object, at a position equal to the remainder of J divided by the step size of the signature scanning operation. As a result, the number of fixed-length signatures of the particular string object equals the step size of the signature scanning operation, and the string object can be identified at any position in any scanned string field. The method further includes receiving a particular string field made up of data values; identifying any string objects included in the particular string field, including scanning the string field at positions spaced one scanning step apart to find the fixed-length signatures of the string objects, with two or more fixed-length signatures scanned in parallel; and outputting any string objects identified in the particular string field.
The method can further include selecting, for each string object, multiple variable-length signatures based on multiple non-overlapping, ordered groups of fixed-length signatures. Each variable-length signature includes one fixed-length signature from each group, together with variable-length signature segments that connect each pair of adjacent fixed-length signatures, so that the number of signatures of each string object equals S^n, where S is the scanning step size (that is, the number of signatures in each group of fixed-length signatures) and n is the number of groups. Identifying any string objects in the particular string field then further includes scanning the string field to find the fixed-length signature substrings of the variable-length signatures of the string objects and, at positions where one or more fixed-length signature substrings match, synthesizing the identified fixed-length signature substrings into any variable-length signatures.
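The count of variable-length signatures per string object can be checked with a short Python sketch. It assumes each group is simply a list of S fixed-length signatures and represents a variable-length signature as the ordered list of its fixed-length parts, leaving the connecting variable-length segments implicit. The function name is hypothetical.

```python
from itertools import product


def variable_length_signatures(groups):
    """Pick one fixed-length signature from each ordered group.

    With n groups of S signatures each, this yields S**n combinations.
    """
    return [list(choice) for choice in product(*groups)]
```

For a scanning step size S of 2 and n equal to 3 groups, this gives 2^3 = 8 variable-length signatures, so the signature count grows quickly with the number of groups.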
In general, in one aspect, an embodiment provides a string signature scanning method. The method includes selecting one or more fixed-length signatures for each of one or more string objects, and processing those fixed-length signatures into one or more formats that include one or more fingerprints and one or more follow-on search data structures for each fixed-length signature. The fingerprints of a particular string object include a J-th fingerprint whose first basic unit is located, in the scanning direction within the string object, at a position equal to the remainder of J divided by the step size of the signature scanning operation. As a result, the number of fingerprints of the particular string object equals the step size of the signature scanning operation, and the string object can be identified at any position in any scanned string field. Each fingerprint includes one or more fragments of a particular fixed-length signature of the string object, and each fragment has a specific position anywhere within that fixed-length signature. The method further includes receiving a particular string field made up of data values; identifying any string objects in the particular string field, including scanning the string field at positions spaced one scanning step apart to find the fingerprints of the string objects, with two or more fingerprints searched in parallel, and, at positions where one or more fingerprints match, searching the string field for the follow-on search data structures; and outputting any string objects identified in the particular string field.
The method can further include selecting, for each string object, multiple variable-length signatures based on multiple non-overlapping, ordered groups of fixed-length signatures. Each variable-length signature includes one fixed-length signature from each group, together with variable-length signature segments that connect each pair of adjacent fixed-length signatures, so that the number of variable-length signatures of each string object equals the product of the numbers of fixed-length signatures in the groups. Identifying any string objects included in the particular string field then includes scanning the string field to find the fingerprints of the string objects, with two or more fingerprints searched in parallel, and, at positions where one or more fingerprints match, searching the string field for the follow-on search data structures. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
In general, one aspect of the invention provides a string signature scanning system. The system includes a machine-readable storage device that contains a computer program product, and one or more processors operable to execute the computer program product and run one or more of the following modules. A signature preprocessing module can process one or more signatures into one or more formats that include one or more fingerprints and one or more follow-on search data structures for each fixed-length signature, or for each fixed-length signature substring of a variable-length signature. The fingerprints include a J-th fingerprint of a particular fixed-length signature or signature substring, where the position, in the scanning direction, of the first basic unit of the J-th fingerprint within that signature or substring equals the remainder of J divided by the step size of the signature scanning operation. As a result, the number of fingerprints equals the scanning step size, and the fixed-length signature or signature substring can be identified at any position in any scanned string field. Each fingerprint includes one or more fragments of the particular fixed-length signature or signature substring, and each fragment has a specific position anywhere within it. A scan preprocessing engine can process an input string field made up of data values into one or more formats required for scanning. A fingerprint scan engine can identify one or more fingerprints of one or more signatures in the input string field, which includes scanning the input string field at positions spaced one scanning step apart to find the fingerprints.
The system can further include a fixed-length signature search engine that can identify the fixed-length signature, or the fixed-length signature substring of a variable-length signature, that corresponds to an identified fingerprint. The system can further include a variable-length signature search engine that synthesizes the identified fixed-length signature substrings of variable-length signatures into any variable-length signatures. Other embodiments of this aspect include corresponding methods, apparatus, and computer program products.
Implementations can include one or more of the following features. The signature preprocessing module can select one or more shadow spaces and project one or more fingerprints into those shadow spaces for scanning. The signature preprocessing module can decompose each of one or more fingerprints into one or more fingerprint segments of one or more lengths, and store fingerprint synthesis information for each fingerprint segment in a fingerprint database. The fingerprint scan engine can then identify the fingerprints of one or more signatures in the input string field by scanning the input string field, at positions spaced one scanning step apart, to find the fingerprint segments and, at positions where one or more fingerprint segments are identified, synthesizing the identified fingerprint segments into any fingerprint matches.
The signature preprocessing module can encode one or more signature segments of a signature with one or more mask bits, and store the mask bits together with those signature segments. The signature preprocessing module can build a differential search data structure from one or more distinguishing basic units of multiple signatures. The signature preprocessing module can build a signature database that includes a fingerprint database, a fixed-length signature database and, when the signature scanning system has at least one variable-length signature, a signature rule database.
The scan preprocessing engine can further include a scan feeder, a projector, a character string field memory, and a shadow field memory. The scan preprocessing engine can process one or more character string fields block by block, where the processing includes one or more of feeding, decoding, normalizing, and transforming. Each character string block includes a fingerprint scan area that is used for both fingerprint scanning and condition code search, a preceding condition code search area located before the fingerprint scan area, and a following condition code search area located after the fingerprint scan area; the latter two areas are used for condition code search only. Each of the three areas of a character string block can be stored in one or more memory blocks of equal size. All memory blocks of the three areas, alone or together with one or more additional memory blocks, can form a ring that starts at the first memory block of the current preceding condition code search area, which reduces data movement in memory.
The fingerprint scan engine can detect one or more fingerprints with one or more of hash tables and Bloom filters. The fingerprint scan engine can further include a fingerprint scan controller, a fingerprint hash calculator, a fingerprint finder, a fingerprint compositor, and a fingerprint database. The fingerprint hash calculator can compute the hash values of a plurality of hash keys in order by applying a sequential hash function to the mutually non-overlapping prefix fragments of the hash keys. The fingerprint scan engine can include one or more fingerprint compositors that synthesize a plurality of fingerprint segments, in parallel or serially, into any fingerprint match using one or more of a fingerprint segment bitmap and fingerprint length information. A fingerprint compositor can further include one or more finite automata (FA).
One or more fingerprints of one or more lengths can be decomposed into a plurality of fingerprint segments of identical size and scanned by one or more fingerprint scan engines that have the same scanning step. The engines can cover interleaved, non-overlapping positions of the input character string field, in which case the total scanning step of the engines equals the number of engines multiplied by the scanning step of a single engine. Alternatively, the engines can cover interleaved, partially overlapping positions, in which case the total scanning step lies between the scanning step of a single engine and the number of engines multiplied by that step. The number of fingerprint scan engines whose product of scanning step and memory speed is smaller can exceed the number of fingerprint scan engines whose product is larger.
The one or more memories used by fingerprint scan engines that cover shorter fingerprint segments can be as fast as or faster than the memories used by fingerprint scan engines that cover longer fingerprint segments. Likewise, the memories used by fixed length condition code search engines that correspond to fingerprints of shorter average length can be as fast as or faster than the memories used by those that correspond to fingerprints of longer average length. One or more fingerprint scan engines that cover fingerprints shorter than a particular length, alone or together with the first part of the corresponding fixed length condition code search engine, can use one or more of the fastest memories in the scanning system. A plurality of fingerprint scan engines that scan the same one or more fingerprints can share a multi-port memory. The fingerprint scan engine can further include one or more content-addressable memories (CAM).
The fixed length condition code search engine can further include a condition code finder, a condition code verifier, and a fixed length condition code database. The condition code finder and the condition code verifier can compare one or more masked fragments of a condition code using a condition code unit comparator and a condition code segment comparator to identify one or more fixed length condition codes. The condition code finder can search differentially for one or more fixed length condition codes or feature subcodes. The random length condition code search engine can further include a condition code synthesis rule finder, a condition code state verifier, a condition code synthesis rule database, and a condition code state table. The random length condition code search engine can include a finite automaton (FA). One or more of the engines can include one or more of content-addressable memories (CAM) and finite automata (FA).
The particular embodiments described in this specification can be implemented to realize one or more of the following advantages. The invention provides a character string scanning system that scans against a condition code database. The system can be updated flexibly and easily. A character string condition code scanning engine can sustain very high scanning speeds (for example, 100 Gbps) even when scanning a large number of condition codes (for example, several hundred thousand), complex condition codes (for example, condition codes several kilobytes long, or condition codes with the wildcards "*" and "?", ranges, case insensitivity, or logical NOT), and a dynamic condition code database. The scanning speed of the system scales with the size and complexity of the condition code database. In addition, the system requires little memory and memory bandwidth. The system can be implemented in software, in a field-programmable gate array (FPGA), or in an application-specific integrated circuit (ASIC). The system is also cost-effective, which makes it suitable for both high-end and low-cost products.
Details of one or more embodiments of the invention are set forth in the description below and in the accompanying drawings. Other features and advantages of the invention will be apparent from the description, the drawings, and the claims.
Description of drawings
Fig. 1A is a block diagram of an exemplary fast character string condition code scanning system;
Fig. 1B is an exemplary flowchart for building the character string condition code database;
Fig. 1C is an exemplary flowchart of a character string condition code scan;
Figs. 2A-2C show exemplary data structures of a fingerprint database;
Figs. 2D-2E show an exemplary data structure of a hash entry block and an embodiment of a fingerprint compositor;
Figs. 2F-2G show an exemplary data structure of a hash entry block and an embodiment of the corresponding fingerprint compositor;
Figs. 2H-2I show an exemplary data structure of a hash entry block and an embodiment of the corresponding parallel fingerprint compositor;
Figs. 3A-3B show exemplary data structures of a condition code group linked list and a condition code linked list used for fixed length condition code search;
Figs. 4A-4B are block diagrams of an exemplary condition code unit comparator, which supports predefined global unit ranges for a character string field, and of a condition code segment comparator;
Fig. 4C is a block diagram of an exemplary condition code unit comparator that supports local condition code unit ranges;
Figs. 5A-5C show exemplary data structures of a selected unit tree and a condition code family linked list used for fixed length condition code search;
Fig. 6 shows an exemplary data structure of a condition code rule linked list used for random length condition code search;
Fig. 7 shows an exemplary data structure of a condition code state linked list for a particular character string field;
Fig. 8 shows an exemplary data structure of a hash entry block indicated by a condition code state Bloom filter or hash table;
Fig. 9 shows an exemplary computer system.
Like reference numerals indicate like elements in the various drawings.
Detailed description
Overview
The present invention provides a method and system for scanning a character string field against a character string condition code database. In one embodiment, a "divide and conquer" scanning method scans in multiple pipeline stages. Each random length condition code is first decomposed into a plurality of fixed length feature subcodes for scanning, and each fixed length condition code, or each fixed length feature subcode of a random length condition code, is further decomposed into a plurality of condition code segments for scanning. In one embodiment, a "coarse scan first, fine scan later" method searches for character string condition codes in multiple pipelined scanning stages. At each scanning position, one or more fingerprints of a fixed length condition code or fixed length feature subcode are scanned first. Further checks are needed only at positions with one or more fingerprint matches. In addition, the fingerprint scan first scans fingerprint shadows at each scanning position (shadows and their shadow spaces are described in detail below). Only at positions with one or more matching fingerprint shadows does the fingerprint need to be fully checked.
The fingerprint shadows can in turn be scanned in segments at each scanning position, after which the fingerprint shadow segments are synthesized by checking them only at the possible positions and for the possible fingerprint lengths of any fingerprint. A full check of a fingerprint shadow is needed only at positions with one or more matching synthesized fingerprint shadows. Moreover, scanning for fingerprint shadow segments only requires scanning their hash values at each scanning position; a fingerprint shadow segment is checked further only at positions with one or more matching hash values. In one embodiment, the character string condition codes used to scan the character string field are preprocessed before scanning, and the preprocessed condition codes are stored in a condition code database for use by the multiple pipelined scanning stages.
Fig. 1A shows a fast character string condition code scanning engine 100. The scanning engine includes a condition code preprocessing module 90, a scan preprocessing engine 120, a fingerprint scan engine 140, a fixed length condition code search engine 160, and a random length condition code search engine 180. The scanning engine 100 quickly scans a character string field against one or more character string condition code databases to identify particular condition codes, and can return the identifier 190 of each matched condition code together with its position in the character string field. In one embodiment, the condition code database includes a fingerprint database 148, a fixed length condition code database 166, and a condition code rule database 186.
Fig. 1B shows the procedure 91 for preprocessing each character string condition code. In one embodiment, a random length condition code is first decomposed into a plurality of fixed length feature subcodes and random length feature subcodes, and information about the relationships between the fixed length feature subcodes is stored in the condition code rule database 186 (step 92).
For fast scanning, a fixed length condition code or feature subcode output by step 92 can be further decomposed into a plurality of fragments that can be checked in an optimal order (step 94). In one embodiment, the first fragment, or the first few fragments, are particularly important and can serve as the fingerprint of the fixed length condition code or feature subcode. The fingerprint of a character string condition code should be quick to scan and should also keep the probability of false negative and false positive matches low. In one embodiment, the probability of a false negative match is zero. In one embodiment, the number of distinct fingerprint lengths across a plurality of condition codes is smaller than the number of distinct condition code lengths, which speeds up scanning methods that must scan patterns of different lengths separately. In one embodiment, when the scanning step is larger than one basic unit, a plurality of fingerprints can be used for one character string condition code, where the first basic unit of each fingerprint is shifted by one or more basic units in the scanning direction relative to the first basic unit of the previous fingerprint. The scanning direction is the direction in which the scanning position moves through an input character string field.
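The use of several shifted fingerprints with a scanning step larger than one basic unit can be sketched as follows. This is only an illustration under simplifying assumptions: fingerprints are exact substrings, a plain dictionary stands in for the fingerprint database, and the names `shifted_fingerprints` and `scan_with_step` are hypothetical rather than taken from the patent.

```python
def shifted_fingerprints(code, fp_len, step):
    """Map each of `step` fingerprints to its offsets in the condition code.

    Fingerprint k starts k basic units after the start of the condition code,
    so whatever the alignment of the code in the input, one fingerprint
    begins on a scanned position.
    """
    if len(code) < fp_len + step - 1:
        raise ValueError("condition code too short for this step and length")
    table = {}
    for offset in range(step):
        table.setdefault(code[offset:offset + fp_len], []).append(offset)
    return table


def scan_with_step(text, code, fp_len, step):
    """Scan only every `step`-th position, then verify the full condition code."""
    table = shifted_fingerprints(code, fp_len, step)
    hits = []
    for pos in range(0, len(text) - fp_len + 1, step):
        for offset in table.get(text[pos:pos + fp_len], ()):
            start = pos - offset
            if start >= 0 and text.startswith(code, start):
                hits.append(start)
    return hits


print(scan_with_step("xxHELLOWORLDyy", "HELLOWORLD", fp_len=4, step=3))
```

With a step of 3, only one position in three is hashed, at the cost of storing three fingerprints per condition code.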
The fingerprint of a fixed length condition code or feature subcode can be further decomposed into fingerprint segments, projected into one or more shadow spaces as required, and then added to the fingerprint database 148 in the condition code database (step 96). Because fingerprints of different lengths may have to be scanned independently, a fingerprint can be further decomposed into a plurality of fingerprint segments that are scanned in parallel or serially. In some embodiments, fingerprints are decomposed into fingerprint segments so that the number of distinct fingerprint segment lengths is one, or is much smaller than the number of distinct fingerprint lengths, which speeds up scanning methods that must scan patterns of different lengths separately. The scan results for the fingerprint segments can subsequently be synthesized to detect fingerprints.
In one embodiment, to further improve scanning efficiency and the ability to scan complex condition codes, fingerprints and other condition code segments can first be projected into one or more shadow spaces and scanned there, and then verified in the original space. A shadow space can be chosen to simplify and speed up the scanning process while still covering all possible forms of a fingerprint or fingerprint segment. One shadow space can cover the forms of many fingerprints or condition code segments. For example, to support both case-sensitive matching and per-character case-insensitive matching, the shadow space can be an all-lowercase or an all-uppercase space. As a special case, the shadow space can be the original space itself.
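A minimal sketch of scanning in a lowercase shadow space and then verifying in the original space is shown below. It assumes ASCII input (so that lowercasing preserves length) and uses a hypothetical per-character flag list, `case_sensitive`, to mark which characters of the pattern must match exactly; neither detail comes from the patent.

```python
def scan_mixed_case(text, pattern, case_sensitive):
    """Find `pattern` in `text`; case_sensitive[i] is True when pattern[i]
    must match exactly and False when its case is ignored."""
    shadow_text = text.lower()         # project the input into the shadow space
    shadow_pattern = pattern.lower()   # project the pattern the same way
    hits = []
    pos = shadow_text.find(shadow_pattern)
    while pos != -1:
        window = text[pos:pos + len(pattern)]
        # Verify in the original space: only the case-sensitive characters
        # still need checking, the rest already matched in the shadow space.
        if all(w == p
               for w, p, exact in zip(window, pattern, case_sensitive)
               if exact):
            hits.append(pos)
        pos = shadow_text.find(shadow_pattern, pos + 1)
    return hits


flags = [True, True, True, False, False, False, False]   # "Get" exact, "File" any case
print(scan_mixed_case("getfile GETFILE GetFILE", "GetFile", flags))
```

A single pass over the lowercase shadow finds every candidate, so case-sensitive and case-insensitive patterns can share one scan.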
The shadow of a fully specified fingerprint, or of any other fully specified condition code segment, is still fully specified in any shadow space. A fully specified fingerprint can therefore always be scanned in any shadow space as well as in the original space. In one embodiment, to reduce the number of spaces that must be scanned, all fully specified fingerprints are scanned in one of the one or more shadow spaces, so that no fingerprint is scanned in the original space. In another embodiment, a condition code database has only one shadow space, and all fingerprints of that database are scanned in that single shadow space.
In one embodiment, the fingerprint database includes one or more Bloom filters or hash tables. In another embodiment, when the original hash key is too long or too expensive to compare during the fingerprint scan, the fingerprint database includes one or more modified Bloom filters or hash tables that carry additional hash value bits, a fingerprint length, or both, to further reduce the probability of false positive matches and hash collisions.
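For readers unfamiliar with Bloom filters, the following sketch shows a standard one used as a fingerprint membership test. It is not the modified filter described above (it stores no extra hash bits or fingerprint lengths), and the class name, sizes, and choice of BLAKE2 as the hash function are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive independent hash functions by salting BLAKE2b differently.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # True may be a false positive; False is always correct.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


fingerprints = BloomFilter()
fingerprints.add(b"virus_ab")
print(fingerprints.might_contain(b"virus_ab"))
```

Because a negative answer is always correct, a miss lets the scanner skip a position immediately, while a positive answer still has to be verified by a later stage.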
Finally, the fragments of every fixed length condition code, and of every fixed length feature subcode of a random length condition code, are encoded and stored in the fixed length condition code database 166 so that fixed length condition codes or feature subcodes can be searched (step 98). In one embodiment, the fragments can be encoded with unit masks or sub-unit masks to support particular matching conditions for character string condition codes (such as "don't care", "equal", "not equal", "case-insensitive", "case-sensitive", "in a range", and "outside a range"). In one embodiment, the masked fragments can then be compiled into a linked list or any other search structure (such as a tree). In another embodiment, the fragment masks, or the masks of fixed length condition codes or feature subcodes, can be compiled into the search structure to save storage space. In another embodiment, a group of character string condition codes can be differentially encoded using the units that differ between them to form a differential data structure that can be searched quickly (such as a differential tree).
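The matching conditions listed above can be illustrated with a small masked comparison. The encoding below, one `(condition, operand)` pair per basic unit, is a hypothetical representation chosen for readability; the patent does not specify this layout, and "case-sensitive" is modeled simply as `EQUAL`.

```python
from enum import Enum


class Cond(Enum):
    ANY = 0           # "don't care"
    EQUAL = 1         # exact, case-sensitive match
    NOT_EQUAL = 2
    NOCASE = 3        # case-insensitive match
    IN_RANGE = 4
    NOT_IN_RANGE = 5


def unit_matches(cond, operand, unit):
    """Compare one basic unit of the input against one masked pattern unit."""
    if cond is Cond.ANY:
        return True
    if cond is Cond.EQUAL:
        return unit == operand
    if cond is Cond.NOT_EQUAL:
        return unit != operand
    if cond is Cond.NOCASE:
        return unit.lower() == operand.lower()
    low, high = operand                      # range conditions carry a pair
    inside = low <= unit <= high
    return inside if cond is Cond.IN_RANGE else not inside


def segment_matches(segment, text, pos):
    """Compare a whole masked segment against text starting at `pos`."""
    if pos < 0 or pos + len(segment) > len(text):
        return False
    return all(unit_matches(cond, operand, text[pos + i])
               for i, (cond, operand) in enumerate(segment))


segment = [(Cond.EQUAL, "G"), (Cond.NOCASE, "e"), (Cond.ANY, None),
           (Cond.IN_RANGE, ("0", "9")), (Cond.NOT_EQUAL, "x")]
print(segment_matches(segment, "..GEZ5y", 2))
```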
Fig. 1C shows the procedure 101 for scanning against character string condition codes. A character string field to be scanned is first decoded and transformed (for example, by the scan preprocessing engine 120) into the one or more formats required by the subsequent scanning stages (step 102). The character string field is first scanned in shadow space (for example, by the fingerprint scan engine 140) by comparing its shadow with the shadows of the fingerprints of one or more character string condition codes, and any identified fingerprint shadow is then verified in the original fingerprint space (step 104). Step 106 checks whether there is a fingerprint match.
After the scan, the absence or presence of a match produces, respectively, an output indicating that no condition code matches or an output indicating that a few condition codes may match. In one embodiment, the fingerprint scan engine 140 provides zero false negative matches and a sufficiently small false positive match probability. If there is no fingerprint match, scanning of the current position is complete and the scan can move to the next scanning position (step 108). If there is a fingerprint match, the few candidate condition codes are searched further (for example, by the fixed length condition code search engine 160) to identify more precisely a fixed length condition code or a fixed length feature subcode of a random length condition code (step 110).
Step 112 checks whether a fixed length condition code or feature subcode matches. If there is no match, scanning of the current position is complete and the scan can move to the next scanning position (step 108). If one or more fixed length condition codes match, the identifier of each matching fixed length condition code is output and scanning of the current position is finished (step 118). If a fixed length feature subcode is identified as part of one or more random length condition codes, the matched feature subcodes are dynamically synthesized (for example, by the random length condition code search engine 180) to detect one or more random length condition codes (step 114). Step 116 checks whether a random length condition code matches. If there is no match, scanning of the current position is complete and the scan can move to the next scanning position (step 108). If there are one or more matches, the identifier of each matching random length condition code is output and scanning of the current position is finished (step 118).
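The control flow of steps 104 to 118 for fixed length condition codes can be sketched as a two-stage loop. The sketch assumes that the fingerprint is simply the first `FP_LEN` characters of each condition code and that a dictionary replaces the fingerprint database; the random length stage (steps 114 and 116) is omitted. All names are hypothetical.

```python
FP_LEN = 4  # assumed fingerprint length; every condition code must be at least this long


def build_fingerprint_table(codes):
    """Map each fingerprint to the identifiers of the condition codes that share it."""
    table = {}
    for code_id, code in codes.items():
        table.setdefault(code[:FP_LEN], []).append(code_id)
    return table


def scan_pipeline(text, codes):
    """Return (identifier, position) for every fixed length condition code found."""
    table = build_fingerprint_table(codes)
    matches = []
    for pos in range(len(text) - FP_LEN + 1):
        candidates = table.get(text[pos:pos + FP_LEN])    # steps 104 and 106
        if not candidates:
            continue                                      # step 108: next position
        for code_id in candidates:                        # step 110: full search
            if text.startswith(codes[code_id], pos):      # step 112
                matches.append((code_id, pos))            # step 118: output identifier
    return matches


codes = {1: "virus_abc", 2: "virus_xyz", 3: "trojan"}
print(scan_pipeline("..virus_xyz..trojan", codes))
```

Most positions fail the cheap fingerprint test, so the more expensive full comparison runs only on the few candidates that share a fingerprint.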
In one embodiment, the false positive match probability of each fingerprint is stored in the fingerprint database 148 during the condition code preprocessing procedure 91. If step 106 finds a fingerprint match, the false positive match probability of the matched fingerprint is checked. If that probability is low enough (for example, below a specified threshold), scanning of the current position is complete and the scan can move to the next scanning position (step 108). In another embodiment, the false positive match probabilities of all fingerprints in the database are low enough that they need not be stored or checked. Scanning of the current position is then complete after the fingerprint scan, and the scan can move to the next scanning position (step 108).
In step 102, the scan preprocessing engine 120 first decodes and normalizes the character string field and transforms it into the same format as the condition codes in the condition code database. In one embodiment, the character string condition code scan runs over the entire character string field. In another embodiment, the storage limits or low-latency requirements of some systems make it impossible to store the entire character string field. In that case, step 102 can split the character string field into several character string blocks of a predetermined size, and the scan runs over each block.
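Block-wise scanning has to handle condition codes that straddle a block boundary. The sketch below shows one simple way to do this, by giving each block a look-ahead tail; it models only a search area after the scan area, uses a single exact-match condition code, and its function names are hypothetical rather than the patent's.

```python
def iter_blocks(data, block_size, max_code_len):
    """Yield (start, window) pairs; each window is one block plus a tail of
    max_code_len - 1 units so matches that cross the boundary stay visible."""
    overlap = max_code_len - 1
    for start in range(0, len(data), block_size):
        yield start, data[start:start + block_size + overlap]


def scan_in_blocks(data, code, block_size):
    """Report every start position of `code`, scanning one block at a time."""
    hits = []
    for start, window in iter_blocks(data, block_size, len(code)):
        pos = window.find(code)
        # Only positions inside the scan area of this block may start a match;
        # positions in the tail belong to the next block.
        while pos != -1 and pos < block_size:
            hits.append(start + pos)
            pos = window.find(code, pos + 1)
    return hits


print(scan_in_blocks("aaaaNEEDLEbbbbNEEDLE", "NEEDLE", block_size=6))
```

Each position belongs to exactly one scan area, so no match is reported twice even though neighbouring windows overlap.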
After a character string block is loaded, it is decoded, normalized, and transformed into the different formats required by the later scanning stages. In one embodiment, the decoding and normalization process can support different compression formats (such as LZS, PKZip, and gzip) and different encoding standards (such as UU encoding, MIME encoding, HTML, and XML), and can delete random "anti-scanning" junk data.
In one embodiment, the decoded character string field is further projected into the one or more shadow spaces required by the condition code database in order to support complex condition codes. For example, the decoded character string field is converted to all lowercase (one shadow space) to support case-insensitive character string condition code scanning. The scan can first run over the all-lowercase decoded character string field and then be verified against both the original case-sensitive decoded field and the all-lowercase decoded field.
In step 104, finger scan can at first be discerned the fingerprint of its shadow for indicating fully.In order to scan a large amount of and complex features sign indicating number fast, in one embodiment, finger scan engine 140 can scan a plurality of elementary cells simultaneously with one or more hash tables or Bloom filter.In one embodiment, finger scan engine 140 can improve the service efficiency of reservoir and reduce the false positive coupling and the probability of fingerprint collision with fingerprint length is multiplexing with hashed value is multiplexing.Hashed value is multiplexing and fingerprint length is multiplexing can further reduce the probability of false positive coupling (being Characteristics of Fault sign indicating number coupling) when guaranteeing zero false negative coupling (promptly missing a condition code coupling).
Step 110 scans for fixed length condition codes. Fixed length condition codes, and fixed length feature subcodes of random length condition codes, are identified in this stage. The fixed length condition code scan is needed only when the fingerprint scan has produced at least one fingerprint match. The fixed length condition codes or feature subcodes that correspond to a matched fingerprint can be matched one by one or through another search structure (such as a tree). In one embodiment, masked comparison at the level of basic units or sub-units can be used to support particular matching conditions for character string condition codes (such as "don't care", "equal", "not equal", "case-insensitive", "case-sensitive", "in a range", and "outside a range").
Step 114 scans for random length condition codes. In one embodiment, this scan is needed only when the condition codes being scanned include random length condition codes that contain one or more random length characters. The fixed length feature subcodes of a random length condition code are identified in the fixed length condition code scanning stage, and the identified subcodes are dynamically chained together in the random length condition code scanning stage to synthesize one or more of the original random length condition codes. The synthesis can be implemented with a static synthesis rule table and a dynamic synthesis state table. The synthesis rule table specifies the rules by which fixed length feature subcodes are synthesized into random length condition codes, and the synthesis state table maintains the current synthesis state according to the rule table.
Preprocessing of the condition code database
In one embodiment, to improve scanning speed and memory efficiency, the condition code preprocessing module 90 preprocesses the condition codes before the character string field is scanned. Before condition codes are stored in the condition code database, the module 90 can decompose, transform, and encode them into one or more formats. In one embodiment, the module 90 builds and maintains a fingerprint database 148, a fixed length condition code database 166, and a condition code rule database 186.
When the condition code database has one or more condition codes that contain one or more random length subcodes (such as "*", which stands for zero or more arbitrary basic units, or "(bc){3-6}", which stands for "bc" repeated 3 to 6 times), each such random length condition code can first be decomposed into a plurality of fixed length feature subcodes separated by random length feature subcodes. For example, if a condition code is "subcode 1*subcode 2*subcode 3", where subcode 1, subcode 2, and subcode 3 are fixed length subcodes that contain no random length characters, the condition code can be decomposed into subcode 1, subcode 2, and subcode 3. Each "*" in "subcode 1*subcode 2*subcode 3" can be replaced by any random length subcode. In one embodiment, each fixed length feature subcode is first scanned independently, and the subcodes are then synthesized into the original random length condition code.
In one embodiment, the condition code rule database 186 can be built from the position information of each fixed length feature subcode (such as its order, a last-subcode flag, and the distance or distance range to the next fixed length subcode) for use in synthesizing the fixed length feature subcodes. In one embodiment, when the random length subcode between two consecutive fixed length feature subcodes is not a "don't care", the condition code rule database 186 can further contain a description of that random length subcode for use in synthesis. In another embodiment, the fixed length feature subcodes and random length feature subcodes can be used to build one or more finite automata (FA) for synthesizing the fixed length feature subcodes, where each fixed length feature subcode as a whole is one input symbol.
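Decomposition at "*" and the later synthesis of the matched subcodes can be sketched as below. The sketch assumes that "*" is the only random length subcode, that it means "zero or more arbitrary units", and that subcodes contain no literal "*". The rule entries (`order`, `last`) loosely mirror the position information mentioned above, but the layout and all names are hypothetical.

```python
def build_rules(code):
    """Static rule table: one entry per fixed length subcode, in order."""
    subcodes = [s for s in code.split("*") if s]
    return [{"subcode": s, "order": i, "last": i == len(subcodes) - 1}
            for i, s in enumerate(subcodes)]


def find_subcodes(text, rules):
    """Stand-in for the fixed length stage: (position, order) of every subcode hit."""
    events = []
    for rule in rules:
        pos = text.find(rule["subcode"])
        while pos != -1:
            events.append((pos, rule["order"]))
            pos = text.find(rule["subcode"], pos + 1)
    return sorted(events)


def matches_random_length(text, code):
    """Synthesize subcode hits in order, keeping a small dynamic state."""
    rules = build_rules(code)
    if not rules:
        return True                              # pattern was only "*"
    state = {"next_order": 0, "earliest": 0}     # dynamic synthesis state
    for pos, order in find_subcodes(text, rules):
        if order != state["next_order"] or pos < state["earliest"]:
            continue                             # out of order or overlapping
        if rules[order]["last"]:
            return True
        state["next_order"] += 1
        state["earliest"] = pos + len(rules[order]["subcode"])
    return False


print(matches_random_length("GET /a/b.exe HTTP/1.1", "GET *.exe*HTTP"))
```

Taking the earliest acceptable occurrence of each subcode is sufficient here, because an earlier match never rules out a later subcode that a later match would allow.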
In one embodiment, a fixed length condition code or feature subcode is further decomposed into a plurality of fragments that can be searched in an optimal order (these fragments include the fingerprint of the condition code or feature subcode). The fragments can have different sizes or the same size. To prevent false negative matches, that is, to avoid missing any condition code, the union of all the fragments equals the original fixed length condition code. During condition code scanning, the false positive match probability decreases (that is, confidence in the match increases) as the number of matched fragments grows. The scan ends at the first fragment that does not match or at the last fragment that matches. In one embodiment, the fragments of a condition code are chosen so that a non-match terminates the scan as early as possible, or so that a match with zero false positive probability is identified as early as possible.
In one embodiment, the fingerprint includes a plurality of fragments. In another embodiment, the fingerprint has only one fragment and is encoded as the triple {fragment, length, offset}, where fragment is the first fragment of the character string condition code to be scanned (that is, its fingerprint), length is the length of the fingerprint, and offset is the position of the fingerprint within a fixed length condition code or feature subcode. A particular fingerprint is a particular fragment of a fixed length condition code or of a fixed length feature subcode of a random length condition code.
In one embodiment, the shadow space can be chosen to simplify and speed up condition code scanning while still covering the various forms of fingerprints or condition code fragments. Ideally, the shadow value can be used directly as a hash key. For example, to support both case-sensitive matching and per-unit case-insensitive matching, the shadow space can be an all-lowercase or all-uppercase space. As another example, to scan for driver's license numbers that consist of one letter followed by 7 digits, where each letter and digit can be further restricted to an arbitrary range of letters or digits, the shadow space can replace every letter with one code or one arbitrary letter (such as "a") and every digit with another code or one arbitrary digit (such as "0"). As a further example, to scan for Social Security numbers (SSN) that consist of three groups of digits separated by spaces or "-", where each digit can be further restricted to an arbitrary range of digits, the shadow space can replace every digit with one code or one arbitrary digit (such as "0") and replace spaces and "-" with another code, a space, or "-". As a special case, the shadow space can be the original space itself.
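The driver's license example can be made concrete with a character-class projection. In this sketch every ASCII letter is projected to "a" and every digit to "0", the shadow fingerprint "a0000000" is scanned in that space, and the letter range is then verified in the original space. The range A to F and the function names are illustrative assumptions, not values from the patent.

```python
import string


def project(text):
    """Project text into a shadow space: letters become 'a', digits become '0'."""
    out = []
    for ch in text:
        if ch in string.ascii_letters:
            out.append("a")
        elif ch in string.digits:
            out.append("0")
        else:
            out.append(ch)
    return "".join(out)


def scan_license(text, letter_range=("A", "F")):
    """Find one letter in `letter_range` followed by 7 digits."""
    shadow_fingerprint = "a" + "0" * 7     # fully specified in the shadow space
    shadow_text = project(text)
    low, high = letter_range
    hits = []
    pos = shadow_text.find(shadow_fingerprint)
    while pos != -1:
        if low <= text[pos] <= high:       # verify the range in the original space
            hits.append(pos)
        pos = shadow_text.find(shadow_fingerprint, pos + 1)
    return hits


print(project("B1234567"))
print(scan_license("id: B1234567, Z7654321"))
```

After projection, the shadow fingerprint is an ordinary fixed string, so it can serve directly as a hash key even though the original pattern contains ranges.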
In one embodiment, after a fingerprint has been scanned in shadow space, it can be verified in the original space immediately after its shadow is detected. In another embodiment, the verification can take place after some or all of the other fragments have been verified. If the fingerprint is completely covered by other fragments, it does not need to be verified separately.
The fingerprint should be selected to speed up the fingerprint scan while minimizing the probability of a false positive match after the fingerprint scan. In one embodiment, the fingerprint can have any size and be at any position in the condition code. In another embodiment, the size of the fingerprint or its position in the condition code is restricted to meet system requirements. For example, to meet a system's latency requirement, the offset of the fingerprint may not exceed a particular value.
In one embodiment, a fingerprint can be selected according to one or more of the following conditions: 1) the shadow of the fingerprint contains no wildcards or ranges, so that it can be scanned quickly; 2) the probability that the fingerprint occurs in the string fields to be scanned is very small; 3) the number of fingerprints shared by multiple signatures is as small as possible; and 4) the number of fingerprints that have the same fingerprint segment is as small as possible.
Other selection conditions can be added according to system requirements. In common network and non-network applications, all or most string signatures usually contain at least one fairly long fragment that has no wildcards or ranges once it is projected into a selected shadow space. In one embodiment, the first selection condition is a necessary condition. In another embodiment, the first selection condition can be further restricted so that every fingerprint is fully specified in at least one shadow space. A signature that contains no fingerprint meeting the selection conditions can be expanded into multiple signatures that do, or can be scanned with a different scanning method that does not require signature expansion.
A fingerprint can be selected by examining all fragments of a signature that meet the first selection condition. Other fingerprint parameters can also be considered during selection. Under the second selection condition, choosing as the fingerprint a signature segment that has a very small probability of occurring in the string fields to be scanned reduces the probability of false positive fingerprint matches. In addition, keeping the number of fingerprints that share the same fingerprint segment as small as possible further reduces that probability. Although a fingerprint can be shorter than 8 basic units or longer than 32 basic units, its length is usually between 8 and 32 basic units.
Because a signature can be very long (for example, hundreds or thousands of basic units), there may also be many possible fingerprint lengths. However, in one embodiment, fingerprints of different lengths are scanned separately, which slows the scan. In one embodiment, to reduce scanning complexity, the number of distinct fingerprint lengths can be limited according to the particular system requirements and system architecture, for example to fewer than 16. In one embodiment, the fingerprint length can be selected from a predetermined table of lengths. In addition, depending on system requirements and system architecture, the fingerprint lengths can follow an exponential progression (such as 2, 4, 8, 16, and 32), a linear progression (such as multiples of 4: 4, 8, 12, 16, 20, 24, 28, and 32), or another progression (such as 2, 3, 5, 8, 13, 21, and 34).
In one embodiment, the fingerprint of a signature can be selected with an algorithm. For example, the following algorithm can be used to select the fingerprint of a fixed-length signature or sub-signature (assuming that the scan step size is 1, that the fingerprint lengths from shortest to longest are fixed as l_0, l_1, l_2, ..., l_(m-1), l_m, that the fingerprint scan is performed segment by segment, and that the shadow space for the fingerprint scan is given):
1. Find all segments of the signature that are fully specified in the shadow space.
2. For each such segment that is at least as long as l_m, find all sub-segments of length l_m.
3. For each sub-segment of length l_m, record the number of times N_c that the sub-segment has already been chosen as a fingerprint across all signatures and the number of times N_s that it shares the same first fingerprint segment with other fingerprints, and compute a cost value with a cost function based on N_c, N_s, and l_m.
4. Repeat steps 2 and 3 with sub-segments of lengths l_(m-1), ..., l_2, l_1, and l_0.
5. From the sub-segments found in steps 2 to 4, select the one with the minimum cost as the fingerprint.
The result of the above steps depends on the order in which the signatures are processed. Several random processing orders can be tried to find different fingerprints as needed. In one embodiment, the cost value is the concatenation, from most significant to least significant, of (m-i), N_c, and N_s, where i = 0, 1, 2, ..., m is the index of the fingerprint length l_i. In one embodiment, once a fingerprint of a particular length has been found, the shorter sub-segments need not be examined.
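The selection steps and the concatenated cost value can be sketched as follows. This is an illustrative sketch under stated assumptions: the `WILDCARD` marker, the function names, the two counters (`chosen` for N_c, `first_segments` for N_s), and the `seg_len` parameter that defines the "first fingerprint segment" are all hypothetical. Comparing the tuple (m-i, N_c, N_s) lexicographically is equivalent to comparing the three fields concatenated from most significant to least significant.

```python
from collections import Counter

WILDCARD = None  # hypothetical marker: position is a wildcard or range in shadow space

def specified_runs(shadow):
    """Yield (start, end) of maximal fully specified runs (end exclusive)."""
    start = None
    for pos, unit in enumerate(shadow):
        if unit is WILDCARD:
            if start is not None:
                yield start, pos
                start = None
        elif start is None:
            start = pos
    if start is not None:
        yield start, len(shadow)

def select_fingerprint(shadow, lengths, chosen, first_segments, seg_len):
    """Return (cost, offset, text) of the cheapest candidate, or None.

    shadow:         list of units, WILDCARD where not fully specified
    lengths:        allowed fingerprint lengths l_0 < l_1 < ... < l_m
    chosen:         Counter of sub-segments already used as fingerprints (N_c)
    first_segments: Counter of first fingerprint segments already in use (N_s)
    """
    m = len(lengths) - 1
    best = None
    for i in range(m, -1, -1):              # longest length first
        size = lengths[i]
        for start, end in specified_runs(shadow):
            for off in range(start, end - size + 1):
                text = "".join(shadow[off:off + size])
                cost = (m - i, chosen[text], first_segments[text[:seg_len]])
                if best is None or cost < best[0]:
                    best = (cost, off, text)
    return best
```

Because (m-i) is the most significant field, a longer fingerprint always beats a shorter one in this sketch, and reuse counts only break ties among candidates of the same length.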
In another embodiment, the fingerprint scan engine 140 advances one scan step at a time. At each scan position, the fingerprint scan engine 140 scans fingerprints of the different lengths serially or in parallel. The scan speed is therefore proportional to the scan step size (the number of basic units between two consecutive scan positions). In one embodiment, to increase the scan speed, the fingerprint scan advances by multiple basic units at a time rather than by one. To guarantee zero false negatives, each string signature then needs multiple fingerprints, and the number of fingerprints equals the scan step size. In other words, a fixed-length signature or sub-signature is inserted into the signature database with multiple fingerprints, as many as the scan step size. The first basic unit of the J-th fingerprint of a particular signature is the (J+k*S)-th basic unit of that signature in the scan direction, where S is the scan step size, k is a non-negative integer, and J = 0, 1, 2, ..., S-1. The signature can then be found at any position in the string field to be scanned. For example, if the signature is "[Rr][Ee][Aa][Dd][Mm][Ee]123.exe", the scan step size is 4, and the fingerprint lengths are 4, 8, and 12, the following four fingerprints can be used: "[Rr][Ee][Aa][Dd][Mm][Ee]123.ex", "[Ee][Aa][Dd][Mm][Ee]123.exe", "[Aa][Dd][Mm][Ee]123." and "[Dd][Mm][Ee]123.e", where [Rr], [Ee], [Aa], [Dd], and [Mm] denote the case-insensitive letters r, e, a, d, and m, respectively. The signature "[Rr][Ee][Aa][Dd][Mm][Ee]123.exe" is inserted into the fingerprint database four times, once with each of the four fingerprints. When the scan step size is 1, only one fingerprint is needed and the signature is inserted only once.
The scan for the multiple fingerprints can start at any position within the first S units of the input string field (that is, within the first scan step). For example, in one embodiment, the scan starts at position 0 and probes the positions k*S, where k is a non-negative integer. In any input string field, a signature starting at position (k*S) is covered by fingerprint 0, one starting at position (k*S+1) is covered by fingerprint (S-1), one starting at position (k*S+2) is covered by fingerprint (S-2), ..., and one starting at position (k*S+S-1) is covered by fingerprint 1. In another embodiment, the scan starts at position (S-1) and probes the positions (k*S+S-1), where k is a non-negative integer. In any input string field, a signature starting at position (k*S) is covered by fingerprint (S-1), one starting at position (k*S+1) is covered by fingerprint (S-2), one starting at position (k*S+2) is covered by fingerprint (S-3), ..., one starting at position (k*S+S-2) is covered by fingerprint 1, and one starting at position (k*S+S-1) is covered by fingerprint 0.
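The coverage argument can be checked with a small Python sketch. It uses the lowercase shadow "readme123.exe" of the example signature, a scan step size of 4, and fingerprint lengths 4, 8, and 12. The function names are hypothetical, and picking the longest fingerprint that fits at each offset is an assumption made for illustration (it happens to reproduce the four fingerprints of the example); the text selects fingerprints by cost.

```python
def fingerprints_for_step(signature, step, lengths):
    """For each J in 0..step-1, take the longest allowed fingerprint at offset J."""
    result = []
    for j in range(step):
        fits = [n for n in lengths if j + n <= len(signature)]
        if not fits:
            raise ValueError("signature too short for offset %d" % j)
        size = max(fits)
        result.append((j, signature[j:j + size]))
    return result

def scan(text, fingerprints, step):
    """Probe only positions 0, step, 2*step, ... and return the implied
    start positions of the signature for every fingerprint hit."""
    starts = set()
    for pos in range(0, len(text), step):
        for j, fp in fingerprints:
            if text.startswith(fp, pos):
                starts.add(pos - j)  # fingerprint J begins J units into the signature
    return starts
```

Wherever the signature begins, exactly one of the S offsets J makes (start + J) a multiple of S, so the fingerprint with that offset lands on a probed position and the signature is not missed.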
To raise the scan step size to S basic units, in one embodiment the algorithm that selects one fingerprint for each fixed-length signature or sub-signature when the scan step size is 1 can be modified as follows to select the S fingerprints of each fixed-length signature or sub-signature: steps 1 to 4 are unchanged, and step 5 is changed to: from all the sub-segments found in steps 2 to 4 whose offset within the fixed-length signature or sub-signature in the scan direction is (J+k*S), select the one with the minimum cost value as the J-th of the S fingerprints, where J = 0, 1, 2, ..., S-1 and k is a non-negative integer.
Usually each string object has only one signature, and to support a scan step size of S basic units, each fixed-length signature, or each fixed-length sub-signature of a variable-length signature, needs S fingerprints. In another embodiment, to support a scan step size of S basic units, S fixed-length signatures can be used to identify one string object, such that the offset within the string object of the first basic unit (in the scan direction) of the J-th fixed-length signature is (J+k*S), where J = 0, 1, 2, ..., (S-1) and k is a non-negative integer. The string object can then be identified at any position in any string field to be scanned. In one embodiment, the S fixed-length signatures can be scanned without fingerprints. In another embodiment, each of the S fixed-length signatures can be scanned using one fingerprint.
In one embodiment, multiple variable-length signatures built from several non-overlapping, ordered groups of fixed-length signatures can further be chosen to identify one string object. Each group has S fixed-length signatures, where the offset within the string object of the first basic unit (in the scan direction) of the J-th fixed-length signature is (J+k*S), with J = 0, 1, 2, ..., (S-1) and k a non-negative integer. This yields S^n variable-length signatures, where n is the number of non-overlapping, ordered groups of S fixed-length signatures. One fixed-length signature is chosen from each group, and these fixed-length signatures are combined with one or more variable-length string segments to form one of the S^n variable-length signatures, so that the string object can be identified at any position in any string field to be scanned. After the combination, each original fixed-length signature in the n groups of S fixed-length signatures becomes a fixed-length sub-signature of several of the combined variable-length signatures. In one embodiment, the variable-length signatures can be scanned without fingerprints. In another embodiment, each fixed-length signature in a variable-length signature can be scanned using one fingerprint.
In one embodiment, to support a scan step size of S basic units, a string object can use P fixed-length signatures, each of which can further use one or more fingerprints, such that the total number of fingerprints of the string object equals S. The offset within the string object of the first basic unit (in the scan direction) of the J-th of the S fingerprints is (J+k*S), where J = 0, 1, 2, ..., (S-1) and k is a non-negative integer, so that the string object can be identified at any position in any string field to be scanned.
In another embodiment, multiple variable-length signatures built from several non-overlapping, ordered groups of fixed-length signatures can further be chosen to identify one string object. The i-th non-overlapping, ordered group has P_i fixed-length signatures, each of which further has one or more fingerprints, such that the total number of fingerprints of the i-th group equals S, where i = 0, 1, 2, ..., n-1 and n is the number of non-overlapping, ordered groups of fixed-length signatures. The offset within the string object of the first basic unit (in the scan direction) of the J-th of the S fingerprints of each group is (J+k*S), where J = 0, 1, 2, ..., (S-1) and k is a non-negative integer.
One fixed-length signature is chosen from each group, and these fixed-length signatures are combined with one or more variable-length strings to form one of the P_0 * P_1 * P_2 * ... * P_(n-1) variable-length signatures, so that the string object can be identified at any position in any string field to be scanned. Each original fixed-length signature in the n groups of fixed-length signatures becomes a fixed-length sub-signature of several of the combined variable-length signatures.
Once the signatures, fingerprints, and shadow space of each string object in a scanning system have been determined, the shadow of a fingerprint can be scanned as a whole. Fingerprint shadows of different lengths can be scanned in parallel or serially. In one embodiment, the shadows of fingerprints of different lengths can be scanned serially as a whole. In one embodiment, the following pseudocode can be used to insert a string signature into the signature database:
    for (i = 0; i < S; i++) {
        {fingerprint_shadow_i, h_i} = FingerprintSelect(signature);
        k_i = h_i / S;
        previous_hash = IV;
        for (j = 0; j < k_i; j++) {
            hashed_length = j * S;
            current_string = fingerprint_shadow_i[hashed_length, hashed_length + S - 1];
            current_hash = hash(current_string, previous_hash);
            previous_hash = current_hash;
        }
        if (i == 0)
            signature_search_pointer = SignatureInsert(signature);
        FingerprintInsert(current_hash, signature_search_pointer);
    }
where S is the scan step size, h_i is the length of the i-th fingerprint of the string signature, IV is the initial hash value, and hash() is a sequential hash function. FingerprintSelect() selects the best fingerprint for each shift position, SignatureInsert() inserts the signature into the databases of the fixed-length signature search engine 160 and the variable-length signature search engine 180, and FingerprintInsert() inserts the fingerprint into the fingerprint database 148. When the scan step size is greater than 1, FingerprintInsert() is called once for every fingerprint of a signature. However, because all the fingerprints of a signature point to the same signature search data structure, SignatureInsert() needs to be called only once for all the fingerprints of a signature.
In one embodiment, the following pseudocode can be used to delete a string signature from the signature database:
    for (i = 0; i < S; i++) {
        k_i = h_i / S;
        previous_hash = IV;
        for (j = 0; j < k_i; j++) {
            hashed_length = j * S;
            current_string = fingerprint_shadow_i[hashed_length, hashed_length + S - 1];
            current_hash = hash(current_string, previous_hash);
            previous_hash = current_hash;
        }
        signature_search_pointer = FingerprintDelete(current_hash);
        if (i == 0)
            SignatureDelete(signature_search_pointer, current_string);
    }
where S is the scan step size, h_i is the length of the i-th fingerprint of the string signature, IV is the initial hash value, and hash() is a sequential hash function. FingerprintDelete() deletes a fingerprint from the fingerprint database 148, and SignatureDelete() deletes the signature search data structure of the signature from the databases of the fixed-length signature search engine 160 and the variable-length signature search engine 180. When the scan step size is greater than 1, a signature has multiple fingerprints, so FingerprintDelete() is called multiple times to delete one signature from the signature database. However, SignatureDelete() needs to be called only once for all the fingerprints of a particular signature.
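The insertion and deletion procedures can be sketched in Python as follows. This is a simplified model under stated assumptions: `zlib.crc32` with its running-value argument stands in for the unspecified sequential hash function, `IV = 0` is an assumed initial value, and the class, method, and attribute names are hypothetical. Each fingerprint shadow is hashed in segments of S units, with every segment hash seeded by the previous one.

```python
import zlib

IV = 0  # assumed initial hash value

def chained_hash(shadow_fp: bytes, step: int) -> int:
    """Hash a fingerprint shadow in segments of `step` units, chaining the values."""
    value = IV
    for off in range(0, len(shadow_fp) - step + 1, step):
        value = zlib.crc32(shadow_fp[off:off + step], value)
    return value

class SignatureDatabase:
    def __init__(self, step: int):
        self.step = step
        self.signatures = {}    # signature id -> signature (stored once)
        self.fingerprints = {}  # chained hash -> set of signature ids

    def insert(self, sig_id, signature, fingerprint_shadows):
        self.signatures[sig_id] = signature       # one call per signature
        for shadow in fingerprint_shadows:        # one entry per fingerprint
            key = chained_hash(shadow, self.step)
            self.fingerprints.setdefault(key, set()).add(sig_id)

    def delete(self, sig_id, fingerprint_shadows):
        for shadow in fingerprint_shadows:
            key = chained_hash(shadow, self.step)
            owners = self.fingerprints.get(key, set())
            owners.discard(sig_id)
            if not owners:
                self.fingerprints.pop(key, None)
        self.signatures.pop(sig_id, None)         # removed once
```

As in the pseudocode, every fingerprint produces its own fingerprint-database entry, while the signature record itself is stored and removed a single time because all of its fingerprints refer to the same record.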
Because the number of distinct fingerprint segment lengths is usually smaller than the number of distinct fingerprint lengths, in one embodiment a fingerprint can be further decomposed into multiple fingerprint segments to improve scan efficiency. All the segments of a fingerprint can first be scanned in parallel or serially, and the segment scan results are then combined, in parallel or serially, to detect the fingerprint. The segment lengths can be identical or different, depending on the fingerprint lengths supported by the particular scan engine.
In one embodiment, the number and length of the fingerprint segments can be determined from the fingerprint length and other scan parameters of the particular scan engine. In another embodiment, the fingerprint lengths follow a linear relationship and all fingerprint segments have the same length. Usually, the length of a fingerprint segment equals one or more scan steps. In another embodiment, the scan step size is an integer multiple of the length of a fingerprint segment, and multiple fingerprint segments are combined in parallel.
In one embodiment, the fingerprint database can include one or more Bloom filters, one or more hash tables, or both. The choice between a Bloom filter and a hash table is usually a trade-off between memory size and memory bandwidth. When the signatures all fit in on-chip memory with sufficiently high bandwidth, a Bloom filter, or even multiple hash tables, may be preferable. Conversely, when there are so many signatures that off-chip memory must be used, memory bandwidth becomes the main constraint, and a hash table may be preferable to a Bloom filter.
In one embodiment, extra bits of the hash value are stored in a Bloom filter or hash table to implement hash value multiplexing. In another embodiment, the fingerprint length is stored in a Bloom filter or hash table to implement fingerprint length multiplexing. When the original hash key is too long or too expensive to compare during the fingerprint scan, hash value multiplexing and fingerprint length multiplexing can further reduce the probability of false positive matches and fingerprint collisions.
In one embodiment, once the shadow space, fingerprints, fingerprint segments, and fingerprint data structures have been determined, a fixed-length signature, or a fixed-length sub-signature of a variable-length signature, can be inserted in shadow space, as a whole or in segments, into the fingerprint database 148 and the signature database.
In one embodiment, the fragments of a fixed-length signature, or of a fixed-length sub-signature of a variable-length signature, other than its fingerprint can be encoded and stored in the fixed-length signature database 166, for use in scanning the whole fixed-length signature or sub-signature after a fingerprint match. In one embodiment, one or more fragments of a fixed-length signature, or of a sub-signature of a variable-length signature, can be encoded into multiple masked fragments, using masks at the level of one or more basic units or sub-basic units, to support specific matching conditions for string signatures (such as "don't care", "equal", "not equal", "case-insensitive", "case-sensitive", "within a range", and "outside a range"). In one embodiment, to improve storage efficiency, one or more signature fragment masks, or one or more signature or sub-signature masks, can be used alone or together with one or more basic-unit or sub-basic-unit masks.
In one embodiment, the masked signature segments can be linked together, in order or out of order, in a linked list, or preprocessed into another search structure (such as a tree). In one embodiment, the masked signature segments can all have the same length or can have different lengths. The length of the masked signature segments can be chosen optimally for the particular memory architecture.
In another embodiment, so that false positive matches of string signatures converge to zero as quickly as possible, and the set of signatures that can still match converges to zero or one as quickly as possible, a group of string signatures can be further differentially encoded to build a differential data structure that can be searched quickly (such as a differential tree). The differential tree can be built from the units that differ between signatures.
In one embodiment, the signature database is preprocessed only when it is initially created, when one or more new signatures are added, or when one or more existing signatures are deleted. In another embodiment, the signature database can be updated dynamically during signature scanning.
Scan preprocessing of string fields
The scan preprocessing engine 120 preprocesses string fields into different forms according to the string signature database, to simplify and accelerate the subsequent stages of the scan pipeline. In one embodiment, the signatures in the signature database are stored in unencoded form. The scan preprocessing engine 120 therefore decodes encoded string fields so that they are consistent with the decoded format of the signatures in the signature database. As shown in Figure 1A, the scan preprocessing engine 120 comprises a scan feeder 122, a string field memory 124, a format decoder 126, a decoded field memory 128, a projector 130, and a shadow field memory 132. The scan feeder 122 first loads the data to be scanned from the string field memory 124 into the format decoder 126, which then decodes, parses, and decompresses the original field. This can include MIME decoding, UU decoding, foreign-language decoding, removal of meaningless strings (such as extra whitespace) and of anti-scanning junk data (such as injected anti-scanning junk), HTML parsing, XML parsing, deflate decompression, LZS decompression, PKZip decompression, and gzip decompression. The format decoder 126 can also normalize the string field as required by the particular system. The format decoder 126 then stores the decoded and normalized string field in the decoded field memory 128.
The projector 130 projects the decoded field into one or more shadow spaces and stores the shadow data in the shadow field memory 132. For example, a signature database can contain case-insensitive signatures. To support them, the projector 130 converts the data in the decoded field memory 128 to all lowercase and stores the all-lowercase string field in the shadow field memory 132. The fingerprint scan engine 140 can then use the all-lowercase string field for the fingerprint scan. Case-sensitive and case-insensitive signatures can thus be scanned at the same time. A match of a case-sensitive signature can be verified against the case-sensitive decoded field after its shadow has matched.
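The two-stage flow (match in the lowercase shadow, then verify case-sensitive signatures against the decoded field) can be sketched as follows. This is an illustration only: the function name and the `(pattern, case_sensitive)` representation are hypothetical, plain substring search stands in for the fingerprint scan, and ASCII input is assumed so that lowercasing preserves positions.

```python
def scan_with_shadow(field, signatures):
    """Return (pattern, start) for every verified match.

    signatures: list of (pattern, case_sensitive) pairs.
    """
    shadow = field.lower()                      # projector: all-lowercase shadow
    matches = []
    for pattern, case_sensitive in signatures:
        shadow_pattern = pattern.lower()
        start = shadow.find(shadow_pattern)
        while start != -1:
            original = field[start:start + len(pattern)]
            # Case-insensitive signatures are done after the shadow match;
            # case-sensitive ones are verified in the original space.
            if not case_sensitive or original == pattern:
                matches.append((pattern, start))
            start = shadow.find(shadow_pattern, start + 1)
    return matches
```

A single pass over the shadow field therefore serves both kinds of signature, and the extra comparison in the original space is paid only at positions where the shadow has already matched.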
In one embodiment, the string signature scan engine 100 includes computing resources or a network device that can support fast string signature scanning of a whole string field. In another embodiment, however, the string signature scan engine 100 includes computing resources or a network device that cannot store a whole string field, for example because of limited storage space or the low-latency requirement of the system. The string field can therefore be broken into multiple string blocks of a predetermined size, and the string signature scan is performed on each block.
In one embodiment, the size of a string block depends on the longest string signature. A string block can be further divided into three regions: a fingerprint scan region, which covers all signature scan positions; a region before the fingerprint scan region, which provides the reference data that precedes or surrounds a fingerprint; and a region after the fingerprint scan region, which provides the reference data that follows or surrounds a fingerprint. Together, the fingerprint scan regions of a string field cover every possible fingerprint start position in that field. The three regions can have the same or different sizes. In one embodiment, the minimum size of the region before the fingerprint scan region is the maximum fingerprint offset over all signatures in the signature database, the minimum size of the region after the fingerprint scan region is the maximum difference between signature length and fingerprint offset over all signatures in the signature database, and the minimum size of the fingerprint scan region is one scan step.
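The minimum region sizes stated above follow directly from the signature lengths and fingerprint offsets. A short sketch, with a hypothetical function name and a hypothetical `(length, fingerprint_offset)` representation of each signature:

```python
def minimum_region_sizes(signatures, step):
    """Return (front, scan, back) minimum sizes in basic units.

    signatures: list of (length, fingerprint_offset) pairs, one per signature.
    """
    front = max(offset for _, offset in signatures)            # data before the fingerprint
    back = max(length - offset for length, offset in signatures)  # data from the fingerprint on
    return front, step, back
```

For instance, a database holding signatures of lengths 13, 40, and 20 with fingerprint offsets 0, 10, and 16 and a scan step size of 4 needs at least 16 units before the scan region, 4 units of scan region, and 30 units after it.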
In one embodiment, the size of the scan string block can be chosen according to parameters of the scanning system, such as the length of the longest signature, the memory architecture, and the scan speed. When the longest signature is long, the scan string block can be a few times the length of the longest signature, for example 2 to 4 times. When the longest signature is short, the scan string block can be a larger multiple of its length.
In one embodiment, the three regions of a scan string block have the same size, equal to the length of the longest signature. The three regions can be stored in a ring formed from three memories, to reduce data movement in memory. In another embodiment, the three regions differ in size: the fingerprint scan region is smaller than the other two regions, which can be the same size as each other or different.
In one embodiment, each of the three regions of a scan block can be stored in one or more memory blocks of equal size. All the memory blocks of the three regions form a ring that starts with the first memory block of the region before the fingerprint scan region and ends with the last memory block of the region after the fingerprint scan region, to reduce data movement in memory. After the first memory block of the fingerprint scan region has been scanned, the first memory block of the ring leaves the ring and can be loaded with new data. Once loaded, that memory block rejoins the ring as its last memory block. In one embodiment, one or more additional memory blocks can be appended to the tail of the ring for loading new data.
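The ring behaviour can be modelled with a double-ended queue. This sketch only models the ordering of blocks: in hardware the retired block's storage is reused in place, whereas here a new `bytes` object is appended. The class and function names are hypothetical, and a ring of three blocks (one per region) is assumed.

```python
from collections import deque

class BlockRing:
    """Ring of equally sized memory blocks covering the three regions."""

    def __init__(self, blocks):
        self.ring = deque(blocks)

    def rotate_in(self, new_data: bytes) -> None:
        """Retire the first block and let the reloaded block rejoin at the tail."""
        self.ring.popleft()
        self.ring.append(new_data)

    def window(self) -> bytes:
        """Current contents of the ring, from first block to last."""
        return b"".join(self.ring)

def stream_blocks(field: bytes, block_size: int, ring_blocks: int = 3):
    """Yield the ring contents after each block of `field` has been loaded."""
    chunks = [field[i:i + block_size] for i in range(0, len(field), block_size)]
    ring = BlockRing(chunks[:ring_blocks])
    yield ring.window()
    for chunk in chunks[ring_blocks:]:
        ring.rotate_in(chunk)
        yield ring.window()
```

Only one block changes per step, so the data already in the other blocks never has to be copied; the roles of the blocks (front, scan, back) simply shift by one position.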
In one embodiment, the string field memory 124 contains only the last memory block, so its size equals the size of one memory block. The decoded field memory 128 and the shadow field memory 132 contain all the memory blocks of all three regions, and the size of each is the sum of the sizes of all those memory blocks.
Because of boundary conditions, the first and last blocks are special cases. In one embodiment, a run of "impossible" basic units as long as the longest signature can be padded before the first block, in the reference region before the fingerprint scan region, and after the last block, in the reference region after the fingerprint scan region. "Impossible" basic units are basic units with which no signature begins or ends, so the padded region cannot become part of an actual signature. If the fingerprint scan engine 140, the fixed-length signature search engine 160, and the variable-length signature search engine 180 have a scan range checking mechanism, the padding is not needed. A scan range checking mechanism prevents the scan from going beyond the boundaries of the string field.
Fingerprint scan
In one embodiment, the fingerprint scan engine 140 can include one or more content-addressable memories (CAMs), one or more finite automata (FAs), or both. In another embodiment, when the original fingerprint, or its shadow in a shadow space, is fully specified, the fingerprint scan engine 140 can be a hash-based engine. As shown in Figure 1A, in one embodiment the fingerprint scan engine 140 comprises a fingerprint scan controller 142, a fingerprint hash calculator 144, a fingerprint searcher 146, a fingerprint database 148, and a fingerprint synthesizer 150. In one embodiment, a fingerprint is scanned as a whole, so the fingerprint synthesizer 150 is an optional component.
In another embodiment, each fingerprint is broken into multiple segments that are scanned separately. All the fingerprint segments can be scanned first, and the segment scan results are then combined serially or in parallel (for example, by the fingerprint synthesizer 150) to produce the fingerprint scan result. In one embodiment, the fingerprint scan controller 142 controls the whole scanning process.
The fingerprint scan engine 140 outputs either no match or a match against an entry in the fingerprint database. A match can correspond to one or more string signatures, which are subsequently searched by the fixed-length signature search engine 160 and the variable-length signature search engine 180. If there is no match, the scanning process is complete.
In one embodiment, the fingerprint hash calculator 144 includes multiple independent universal hash functions, h_0, h_1, ..., h_I, to support a Bloom filter. A Bloom filter can be used, for example, when the system is constrained by memory size rather than memory bandwidth. For example, when the signature database of a scanning system is small enough to fit entirely in on-chip memory, the system is constrained by memory size. In another embodiment, the fingerprint hash calculator 144 includes multiple independent universal hash functions, h_0, h_1, ..., h_I, to support multiple hash tables. Multiple hash tables can be used, for example, when the false positive rate must be very low (such as 10^-3 or less) and the memory has sufficient bandwidth and capacity. For example, a very low false positive rate is desirable when DRAM or other slower off-chip memory stores the data structures of the subsequent scan stages and the on-chip memory is large enough to hold multiple hash tables.
In another embodiment, described fingerprint hash counter 144 only comprises a hash function h 0For example, when the limited size system of the bandwidth of internal memory rather than internal memory, can use a hash table.For example, when the condition code database of scanning system enough big, thereby must use chip external memory the time, the limited bandwidth system of internal memory.
The fingerprint hash calculator 144 reads n bytes of data from the shadow field memory 132 and computes their hash values. The data can be used alone, or together with a random initial value or a previous hash value, to compute the current hash value of every hash function. In one embodiment, the hash value of one hash function (for example, the first hash function h0) has more bits than the hash values of the other hash functions, and the hash values of the other hash functions all have the same number of bits.
The hash values output by the hash functions can be used by the fingerprint searcher 146 to search the fingerprint database 148. In one embodiment, the fingerprint searcher 146 comprises a Bloom filter and a hash value demultiplexer. The Bloom filter checks the valid flags addressed by the hash values. If all of those valid flags are set, the hash value demultiplexer uses the additional bits of a hash value to locate the corresponding hash block. Hash value demultiplexing means checking the additional bits of a particular hash value; it can be performed alone or together with checks of other fingerprint-related information, such as the fingerprint length. Hash value demultiplexing can further reduce false positive condition code matches. In one embodiment, the Bloom filter can be reduced to a single hash table. In another embodiment, the Bloom filter can be expanded into multiple hash tables.
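The two-step lookup described above (check every valid flag, then compare additional hash bits) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the table size, the number of additional bits, the use of salted BLAKE2b digests as stand-ins for h0..hI, and the names `hash_values` and `FingerprintSearcher` are all illustrative.

```python
import hashlib

TABLE_BITS = 12   # assumed: 2**12 valid flags in the Bloom filter table
TAG_BITS = 8      # assumed: additional bits of hash value 0 kept for demultiplexing


def hash_values(data: bytes, count: int = 3) -> list:
    """Stand-ins for h0..hI: salted BLAKE2b digests, one per hash function."""
    values = []
    for i in range(count):
        digest = hashlib.blake2b(data, digest_size=8, salt=bytes([i])).digest()
        values.append(int.from_bytes(digest, "big"))
    return values


class FingerprintSearcher:
    def __init__(self):
        self.valid = [False] * (1 << TABLE_BITS)
        self.hash_blocks = {}  # index from h0 -> [(additional bits, fingerprint id)]

    def _split(self, data: bytes):
        values = hash_values(data)
        indexes = [v % (1 << TABLE_BITS) for v in values]
        tag = (values[0] >> TABLE_BITS) % (1 << TAG_BITS)
        return indexes, tag

    def add(self, data: bytes, fingerprint_id: int) -> None:
        indexes, tag = self._split(data)
        for index in indexes:
            self.valid[index] = True
        self.hash_blocks.setdefault(indexes[0], []).append((tag, fingerprint_id))

    def search(self, data: bytes) -> list:
        indexes, tag = self._split(data)
        if not all(self.valid[index] for index in indexes):
            return []  # Bloom filter step: at least one valid flag is clear
        # Demultiplexing step: keep entries whose additional bits agree.
        return [fid for t, fid in self.hash_blocks.get(indexes[0], []) if t == tag]
```

A returned identifier is still only a candidate: both steps can yield false positives, which the later condition code search stages are expected to remove.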
Fingerprints of different lengths can be scanned in parallel or serially. In one embodiment, multiple fingerprint searchers 146 are used to scan in parallel. For example, one fingerprint searcher 146 can be used for each different fingerprint length.
In another embodiment, one fingerprint searcher 146 can scan fingerprints of different lengths serially. For example, the fingerprint length can be an integer multiple of the scan step. Each fast character string condition code scan engine 100 supports a set of fingerprint lengths {S, 2*S, 3*S, ..., m*S}, where S is the scan step and m*S is the length of the longest fingerprint. At each scan position, the fingerprints of different lengths are scanned serially. A sequential hash function that takes S basic units as input can be used to perform the scan. In one embodiment, the serial fingerprint scan of the fingerprint scan engine 140 can be described with the following pseudocode:
k = [t/S];
for (i = 0; i <= k-1; i++) {
    scan position = i*S;
    previous hash value = IV;
    for (j = 0; j <= m-1; j++) {
        hashed length = j*S;
        if (scan position + hashed length < t) {
            current hash input = character string field[scan position + hashed length, scan position + hashed length + S-1];
            current hash value = hash function(current hash input, previous hash value);
            condition code search pointer = fingerprint search(current hash value);
            condition code number = condition code search(condition code search pointer);
            previous hash value = current hash value;
        }
    }
}
where S is the scan step, m*S is the longest fingerprint length, t is the total length of the character string field to be scanned, IV is the initial hash value, and hash function() is a sequential hash function. Fingerprint search() is implemented by the fingerprint searcher 146 and the fingerprint compositor 150, and condition code search() is implemented by the fixed length condition code search engine 160 and the random length condition code search engine 180.
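The serial scan above can be made concrete with a short Python sketch. It is illustrative only: CRC-32 from the standard `zlib` module stands in for the sequential hash function, a dictionary keyed by (length, hash value) stands in for the fingerprint database, and the function names are assumptions rather than names from the source.

```python
import zlib


def sequential_hash(chunk: bytes, previous: int) -> int:
    """Fold one S-unit chunk into the running hash (CRC-32 as a stand-in)."""
    return zlib.crc32(chunk, previous)


def fingerprint_hash(fingerprint: bytes, step: int, iv: int = 0) -> int:
    """Hash a whole fingerprint the same way the scanner will, chunk by chunk."""
    value = iv
    for offset in range(0, len(fingerprint), step):
        value = sequential_hash(fingerprint[offset:offset + step], value)
    return value


def serial_scan(field: bytes, table: dict, step: int, max_units: int, iv: int = 0):
    """Return (scan position, fingerprint length, id) for every table hit."""
    hits = []
    t = len(field)
    for position in range(0, t - step + 1, step):
        previous = iv
        for j in range(max_units):
            start = position + j * step          # scan position + hashed length
            if start + step > t:
                break
            current = sequential_hash(field[start:start + step], previous)
            length = (j + 1) * step
            fingerprint_id = table.get((length, current))
            if fingerprint_id is not None:
                hits.append((position, length, fingerprint_id))
            previous = current                   # reuse work for the next length
    return hits


S = 4
table = {
    (4, fingerprint_hash(b"ABCD", S)): 1,
    (8, fingerprint_hash(b"ABCDEFGH", S)): 2,
}
hits = serial_scan(b"xxxxABCDEFGHyyyy", table, step=S, max_units=4)
```

Because the hash is sequential, the hash of an 8-unit fingerprint is obtained by extending the hash of its first 4 units, so each scan position costs at most m hash steps for all m lengths. Like the pseudocode, the sketch only examines scan positions that are multiples of S.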
Fig. 2A-C show the data structures of the fingerprint database 148 when fingerprints of different lengths are each scanned as a whole: a Bloom filter table 200, a fingerprint hash block linked list 250, and a hash entry block 256. Using the set of hash values supplied by the fingerprint hash calculator 144, {hash value 0A, hash value 1, ..., hash value I}, the fingerprint searcher 146 reads the Bloom filter table 200. Each entry of the Bloom filter table 200 comprises a valid flag 202 and a hash block list pointer 204. The valid flag 202 is set when at least one condition code maps to the entry.
In one embodiment, the hash block list pointer 204 is addressed only by hash value 0A and points to the first entry of the fingerprint hash block linked list 250. If the valid flags 202 addressed by all of the hash values are set, the fingerprint scan continues on the fingerprint hash block linked list 250 indicated by the hash block list pointer 204. In another embodiment, the hash block list pointer 204 equals hash value 0A and can be omitted to shrink the Bloom filter table or hash table, at the cost of larger data structures in the later scan stages.
In one embodiment, hash value 0A is used to add a fingerprint to the Bloom filter table 200, and the other hash values are used to reduce false positive matches. In addition, so that character string condition codes are not deleted by mistake, each entry of the Bloom filter table 200 uses a counter to track the number of condition codes that map to it. In another embodiment, the Bloom filter table 200 can be reduced to a single hash table, which needs only one hash value (such as hash value 0A). In another embodiment, the Bloom filter table 200 can be expanded into multiple hash tables.
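The role of the per-entry counter can be illustrated with a small counting Bloom filter in Python. This is a sketch under assumptions: the class name, the table size, and the idea of passing the hash functions in as callables are illustrative and not taken from the source.

```python
class CountingBloomTable:
    """Bloom filter table whose entries count how many fingerprints map to them."""

    def __init__(self, size: int, hash_functions):
        self.counters = [0] * size
        self.hash_functions = hash_functions

    def _indexes(self, fingerprint: bytes):
        return [h(fingerprint) % len(self.counters) for h in self.hash_functions]

    def add(self, fingerprint: bytes) -> None:
        for index in self._indexes(fingerprint):
            self.counters[index] += 1

    def remove(self, fingerprint: bytes) -> None:
        # Only call this for a fingerprint that was added earlier.
        for index in self._indexes(fingerprint):
            if self.counters[index] > 0:
                self.counters[index] -= 1

    def may_contain(self, fingerprint: bytes) -> bool:
        # An entry is "valid" while its counter is non-zero.
        return all(self.counters[index] > 0 for index in self._indexes(fingerprint))


# Two toy hash functions chosen so that b"AB" and b"BA" share one entry.
toy_hashes = [lambda b: sum(b), lambda b: 31 * len(b) + b[0]]
table = CountingBloomTable(64, toy_hashes)
table.add(b"AB")
table.add(b"BA")
table.remove(b"AB")
```

With a plain one-bit valid flag, removing b"AB" would clear the entry it shares with b"BA" and silently delete b"BA" as well; the counter keeps the shared entry valid until the last fingerprint that uses it is removed.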
In one embodiment, the fingerprint hash block linked list 250 is a linked list. Each entry of the fingerprint hash block linked list 250 comprises a last-block flag 252, a next block pointer 254, and a hash entry block 256. The next block pointer 254 points to the next entry of the fingerprint hash block linked list 250. When an entry is the tail of the fingerprint hash block linked list 250, its last-block flag 252 is set. Because the tail can also be recognized by checking whether the next block pointer 254 is a null pointer, the last-block flag 252 is an optional field that allows the tail to be detected quickly. Each hash entry block 256 holds at most n fingerprints, where n is any integer greater than zero. In one embodiment, an optimal n can be chosen according to the memory architecture of the scanning system. For example, n can equal 1 if the memory is SRAM, and n > 1 if the memory is DRAM.
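The effect of the block capacity n on the number of memory reads can be sketched as follows. The class and function names are hypothetical, and the sketch models only the chaining, not the memory technology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HashBlock:
    entries: list                          # at most n fingerprint entries
    next_block: Optional["HashBlock"] = None

    @property
    def last_block(self) -> bool:
        # Redundant with next_block being None, kept as a quick tail test.
        return self.next_block is None


def build_chain(entries: list, n: int) -> Optional[HashBlock]:
    """Pack entries into blocks of up to n and link them, building from the tail."""
    head = None
    for start in reversed(range(0, len(entries), n)):
        head = HashBlock(entries[start:start + n], head)
    return head


def walk(head: Optional[HashBlock]):
    """Return (all entries in order, number of blocks read)."""
    found, reads = [], 0
    block = head
    while block is not None:
        reads += 1
        found.extend(block.entries)
        block = block.next_block
    return found, reads


entries = ["fp0", "fp1", "fp2", "fp3", "fp4"]
_, reads_n1 = walk(build_chain(entries, 1))   # one entry per block
_, reads_n4 = walk(build_chain(entries, 4))   # four entries per block
```

With n = 1 every entry costs one pointer-chasing read, which suits low-latency SRAM; with n = 4 the same five entries are fetched in two larger reads, which suits burst-oriented DRAM.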
In one embodiment, as shown in Fig. 2C, each fingerprint entry of the hash entry block 256 comprises a hash value (hash value 0B) 260, a fingerprint length 262, a type 264, a feature code group pointer 266 or a condition code pointer 268, and an offset 270. Hash value 0B 260 and the fingerprint length 262 are used to implement hash value multiplexing and fingerprint length multiplexing, respectively. When the type 264 is zero, a feature code group pointer 266 is output for a feature code group; otherwise, a condition code pointer 268 is output for a single condition code. The offset 270 is the offset from the first basic unit of the fingerprint to the first basic unit of the next subsection to be compared. If there is no next data structure, the offset 270 is not needed and can be set to zero. In another embodiment, when n > 1, a valid flag can be added to each fingerprint entry. When that valid flag is set, hash value 0B 260 and the fingerprint length 262 are compared.
In one embodiment, each fingerprint entry of the hash entry block 256 stores a fingerprint. In another embodiment, however, the original fingerprint need not be stored in each fingerprint entry, because fingerprints can be long and of differing sizes and because further condition code search stages follow the fingerprint database lookup. A matching fingerprint entry may therefore correspond to zero fingerprints, because of a false positive, or to several fingerprints, because of a fingerprint collision. When fingerprints are scanned with a simple hash table rather than a Bloom filter table and (k/2^m) << 1, the probability of a false positive match is on the order of (k/2^m) and the probability of a fingerprint collision is on the order of (k^2/2^(2m)), where k is the total number of fingerprints and m is the total number of bits of hash value 0, comprising hash value 0A and hash value 0B.
With a Bloom filter table, both probabilities drop substantially. With multiple hash tables, they drop further. To minimize the later-stage condition code searches, the probabilities of false positive matches and fingerprint collisions can be reduced to nearly zero. In one embodiment, a sufficiently large m and a sufficient number of hash functions are used to reduce the probabilities of false positive matches and fingerprint collisions.
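A quick numeric check of these orders of magnitude, written in Python. The value k = 128K is the condition code count used as an example elsewhere in this description; m = 32 bits and the 10^-3 target are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def false_positive_order(k: int, m: int) -> float:
    """Order of magnitude of the false positive probability: k / 2^m."""
    return k / 2 ** m


def collision_order(k: int, m: int) -> float:
    """Order of magnitude of the fingerprint collision probability: k^2 / 2^(2m)."""
    return k ** 2 / 2 ** (2 * m)


def bits_needed(k: int, target: float) -> int:
    """Smallest m for which k / 2^m does not exceed the target probability."""
    return math.ceil(math.log2(k / target))


k = 128 * 1024                     # 2**17 fingerprints
fp = false_positive_order(k, 32)   # 2**-15, about 3.1e-5
fc = collision_order(k, 32)        # 2**-30, about 9.3e-10
m_min = bits_needed(k, 1e-3)       # 27 bits for a 10^-3 false positive rate
```

The collision estimate is the square of the false positive estimate, so every extra bit of hash value halves the false positive rate and quarters the collision rate.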
To reduce storage space, in one embodiment multiple hash values can be multiplexed onto one fingerprint entry, which reduces the probability of empty entries. Each m-bit hash value can be divided into two parts: hash value 1 of m1 bits and hash value 2 of (m - m1) bits. Hash value 1 is used as the address into the fingerprint database, and hash value 2 (such as hash value 0B 260) can be used to resolve hash collisions and false positive matches. The smaller m1 is, the less storage is required, but the longer the fingerprint hash block linked list 250 becomes. In one embodiment, when 2^m1 is at least twice k, the average length of the fingerprint hash block list is less than 1.
To save storage space and reduce the complexity of managing the tables, in another embodiment the fingerprints of all lengths can be multiplexed into one fingerprint database. The fingerprint length 262 can then further reduce the probabilities of false positive matches and fingerprint collisions.
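The split of an m-bit hash value into an address part and a check part can be sketched in Python. Which bits serve as the address is not stated in the source; taking the low m1 bits is an assumption, as are the function names and the uniform-hashing estimate of the average list length.

```python
def split_hash(value: int, m: int, m1: int):
    """Split an m-bit hash into (hash value 1: m1 address bits,
    hash value 2: the remaining m - m1 check bits)."""
    if not 0 <= value < (1 << m):
        raise ValueError("value does not fit in m bits")
    address = value & ((1 << m1) - 1)   # assumed: low bits form the address
    check = value >> m1                 # stored in the entry, compared on lookup
    return address, check


def average_entries_per_address(k: int, m1: int) -> float:
    """Expected fingerprints per address, assuming uniformly distributed hashes."""
    return k / 2 ** m1


address, check = split_hash(0xABCD, m=16, m1=8)
load = average_entries_per_address(1000, 11)   # 2**11 = 2048 is more than 2 * 1000
```

Only the check bits need to be stored, since the address bits are implied by where the entry lives; that is where the storage saving comes from.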
The lookup data structures shown in Fig. 2A-C can be implemented in several different ways. The particular implementation can depend on the size of the feature code table and on the size and type of the available memory, such as on-chip SRAM, on-chip DRAM, off-chip SRAM, and off-chip DRAM. For example, in one embodiment, if the number of condition codes is 128K, the valid flags 202 can be kept in on-chip SRAM for fast access, while the hash block list pointers 204 are kept in off-chip SRAM. The valid flags 202 are accessed with all of the hash values, whereas the hash block list pointer 204 is accessed only with hash value 0A. The hash block list pointer 204 needs to be accessed only when the valid flags 202 of all hash values are set. The last-block flag 252 and the next block pointer 254 can be kept in off-chip SRAM; hash value 0B 260 and the fingerprint length 262 can be kept in off-chip SRAM or DRAM; and the type 264, the feature code group pointer 266 or condition code pointer 268, and the offset 270 can be kept in off-chip DRAM. The type 264, the feature code group pointer 266 or condition code pointer 268, and the offset 270 need to be accessed only when hash value 0B 260 and the fingerprint length 262 match.
In one embodiment, to improve scan efficiency, the fingerprint of a character string condition code can be broken into multiple fingerprint sections. The fingerprint sections are first scanned in parallel or serially, and the matched fingerprint sections are then combined by the fingerprint compositor 150 into fingerprint scan results. Because the number of distinct fingerprint section lengths is generally much smaller than the number of distinct fingerprint lengths, scanning by fingerprint section speeds up the fingerprint scan. Scanning by fingerprint section also makes it possible to support longer fingerprints that contain more fingerprint sections, which reduces false positive matches.
In one embodiment, the synthesis of fingerprint sections is exact, or complete, and produces no false positive matches. In another embodiment, to speed up the fingerprint scan, the synthesis of fingerprint sections is "coarse", or partial, and can produce false positive matches. To reduce the probability of false positive matches, information about one or more possible positions of each fingerprint section and about the possible fingerprint lengths can be stored and used to combine the fingerprint sections, in parallel or serially, into any fingerprint match.
When multiple fingerprint sections are scanned serially, embodiments of an "at least one fingerprint match" hash entry block and its corresponding fingerprint compositor are shown in Fig. 2D-E. In one embodiment, a match is reported as soon as at least one fingerprint matches, but no information is given about how many fingerprints matched or about the lengths of the matched fingerprints.
As shown in Fig. 2D, in one embodiment each entry of the hash entry block 257 comprises a hash value 0B 260 of a particular fingerprint section, a fingerprint section map 272, an at-least-one-fingerprint synthesis information 274, a type 264, a feature code group pointer 266 or condition code pointer 268, and an offset 270. The fingerprint section map 272 is a bitmap of valid flags whose number of bits equals the number of fingerprint sections supported by the fingerprint compositor 151. When a fingerprint section is the i-th fingerprint section of a fingerprint, bit i of the fingerprint section map 272 is set. The fingerprint section map 272 thus gives all possible positions of the fingerprint section in all fingerprints. The at-least-one-fingerprint synthesis information 274 gives the minimum number of steps of the fingerprints that contain the fingerprint section and is used to synthesize a fingerprint when at least one fingerprint section matches. In one embodiment, the at-least-one-fingerprint synthesis information 274 is stored in the first fingerprint section of the fingerprint. In another embodiment, the at-least-one-fingerprint synthesis information 274, or any other fingerprint length information, is stored in every fingerprint section of the fingerprint. In another embodiment, the at-least-one-fingerprint synthesis information 274 is omitted.
In one embodiment, hash value 0B 260 is the same as hash value 0B 260 in Fig. 2C. The type 264, the feature code group pointer 266 or condition code pointer 268, and the offset 270 are also the same as the corresponding values in Fig. 2C; they are stored in the first fingerprint section of the fingerprint and are used after a delay of (number of fingerprint sections minus 1) clock cycles. In one embodiment, because the type 264, the feature code group pointer 266 or condition code pointer 268, and the offset 270 are stored in the first fingerprint section of a fingerprint, all fingerprints that share the same first fingerprint section are stored together. In one embodiment, fingerprints are selected so that the probability of multiple fingerprints sharing the same first fingerprint section is minimized.
Fig. 2E shows an embodiment of a fingerprint compositor 151 for the case in which the at-least-one-fingerprint synthesis information 274 is stored only in the first fingerprint section of each fingerprint. In one embodiment, the fingerprint section size is chosen to be the same as the scan step. For example, the fingerprint section size and the scan step are both 4, so that the fingerprint lengths are 4, 8, 12, and 16. The fingerprint compositor 151 comprises twelve D flip-flops 280, three 2-input AND gates 282, and one 4-input multiplexer (MUX) 284. The inputs of the fingerprint compositor 151 are the fingerprint section map 272 and the at-least-one-fingerprint synthesis information 274. If a fingerprint is found, the fingerprint compositor 151 outputs a match 290. The match 290 is valid only when the first fingerprint section of the synthesized fingerprint is the first fingerprint section of a fingerprint.
In one embodiment, to validate the match 290, the 4-input MUX 284 can be replaced by a 5-input MUX whose added input is tied to zero and is selected when the fingerprint section is not a first fingerprint section. In one embodiment, the fingerprint compositor 151 can be extended, by adding logic gates at different delay stages, to support fingerprint length information in fingerprint sections other than the first fingerprint section of a fingerprint. In another embodiment, when the at-least-one-fingerprint synthesis information 274 is not stored in any fingerprint section, the fingerprint compositor 151 can be simplified by removing the MUX 284 and all logic gates that are not used for the shortest fingerprint in the condition code database. In another embodiment, the fingerprint compositor 151 can easily be modified to use other scan steps, numbers of fingerprint sections, and fingerprint lengths. In one embodiment, because multiple fingerprint matches are merged into a single "at least one fingerprint match" result, the fingerprint compositor 151 can output one fingerprint scan result per clock cycle.
When multiple fingerprint sections are scanned serially, another embodiment of the hash entry block and its corresponding fingerprint compositor, which reports "all fingerprint matches", is shown in Fig. 2F-G. When fingerprints of several different lengths are matched as one condition code match, one or more later scan stages may be needed.
As shown in Fig. 2F, each entry of the hash entry block 258 comprises a hash value 0B 260 of a particular fingerprint section, a fingerprint section map 272, an all-fingerprints synthesis information 276, a type 264, a feature code group pointer 266 or condition code pointer 268, and an offset 270. The all-fingerprints synthesis information 276 is the synthesis information for all matches, and its number of bits equals the number of fingerprint sections supported by the fingerprint compositor 152. When a fingerprint section is a section of a fingerprint that has i fingerprint sections, bit i of the all-fingerprints synthesis information 276 is set. The other fields of the hash entry block 258 are the same as the corresponding fields in Fig. 2D. In one embodiment, the all-fingerprints synthesis information 276 is stored only in the first fingerprint section of a fingerprint. In another embodiment, the all-fingerprints synthesis information 276, or any other fingerprint length information, is stored in every fingerprint section of a fingerprint. In another embodiment, the all-fingerprints synthesis information 276 is omitted.
Fig. 2G shows an embodiment of a fingerprint compositor 152 that supports multiple fingerprint lengths for the case in which the all-fingerprints synthesis information 276 is stored only in the first fingerprint section of each fingerprint. The fingerprint compositor 152 uses the same scan step, fingerprint section size, and fingerprint lengths as Fig. 2E. The fingerprint compositor 152 comprises sixteen D flip-flops 280, five 2-input AND gates 282, and one 3-input AND gate 286. The fingerprint compositor 152 outputs a match 0 292, a match 1 294, a match 2 296, and a match 3 298, which correspond to matches of fingerprints that are one, two, three, and four fingerprint sections long, respectively.
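The following Python sketch models, in software, the kind of decision a serial "all fingerprint matches" compositor makes. It is not a model of the circuit in Fig. 2G: the gate wiring is not reproduced, bit numbering is assumed to start at 0 (bit n of the length information meaning "a fingerprint with n + 1 sections starts with this section"), and the function name and input format are hypothetical.

```python
MAX_SECTIONS = 4  # matches the four fingerprint lengths of the example above


def synthesize_all(step_results):
    """step_results[t] is None when no fingerprint section matched at step t,
    otherwise (section_map, length_info) read from the matching entry.

    Returns (start step, number of sections) for every synthesized fingerprint.
    """
    matches = []
    for start, first in enumerate(step_results):
        # A fingerprint can only begin where a first section (bit 0) was seen.
        if first is None or not first[0] & 1:
            continue
        length_info = first[1]  # assumed to be stored in the first section only
        for n in range(MAX_SECTIONS):
            t = start + n
            if t >= len(step_results) or step_results[t] is None:
                break
            if not (step_results[t][0] >> n) & 1:
                break           # this section cannot sit at position n
            if (length_info >> n) & 1:
                matches.append((start, n + 1))
    return matches


# Sections of a two-section fingerprint "ABCD" + "EFGH":
abcd = (0b0001, 0b0010)  # is a first section; fingerprint has two sections
efgh = (0b0010, 0b0000)  # is a second section
result = synthesize_all([None, abcd, efgh, None])
```

The synthesis is "coarse" in the sense described earlier: it checks only that each section may occupy its position, so sections belonging to different fingerprints can combine into a false positive that later stages must reject.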
In one embodiment, the fingerprint compositor 152 can be extended, by adding logic gates (such as AND gates) at different delay stages or by adding inputs to the existing AND gates, to support additional fingerprint length information in fingerprint sections other than the first fingerprint section of a fingerprint. In another embodiment, when the all-fingerprints synthesis information 276 is not stored in any fingerprint section, the AND-gate inputs that take the all-fingerprints synthesis information 276 and all logic gates related to fingerprint length can be removed to simplify the fingerprint compositor 152. In one embodiment, the hash entry block 258 in Fig. 2F can be extended to include multiple sets of the type 264, the feature code group pointer 266 or condition code pointer 268, and the offset 270, with one set of fields for each fingerprint length. In addition, in one embodiment, information about the exact length of a fingerprint can also be used in the subsequent scan stages.
The fingerprint compositor 152 can scan fingerprints of all lengths simultaneously. However, because the type 264, the feature code group pointer 266 or condition code pointer 268, and the offset 270 are stored in only one fingerprint section, multiple fingerprints that share that fingerprint section are stored together. A suitable selection of the fingerprints of the condition codes can minimize this effect. To eliminate the effect altogether, the type 264, the feature code group pointer 266 or condition code pointer 268, and the offset 270 of the fingerprints of all matched fingerprint sections can be stored in a separate table indexed by the matched fingerprint sections. The addresses of all matched fingerprint sections can be used to look up entries in that table.
In another embodiment, in addition to the at-least-one-fingerprint synthesis information 274 or the all-fingerprints synthesis information 276, the fingerprint section map 272 can also be omitted, so that no fingerprint synthesis information is stored in any fingerprint section. The fingerprint sections can then be synthesized according to the possible formats of all fingerprints in the condition code database. If the fingerprint sections satisfy any one of the formats of the fingerprints, a fingerprint match is declared. As a special case, if the fingerprint sections satisfy the minimum requirement of all fingerprint formats, a fingerprint match is declared.
When multiple fingerprint sections are scanned in parallel, embodiments of the hash entry block and its corresponding fingerprint compositor for "at least one fingerprint match" and "all fingerprint matches" are shown in Fig. 2H-I. As shown in Fig. 2H, each entry of the hash entry block 259 comprises a hash value 0B 260, a fingerprint section match 278, a type 264, a feature code group pointer 266 or condition code pointer 268, and an offset 270. The fingerprint section match 278 is the match bit of the fingerprint section, and the other fields are the same as the corresponding fields in Fig. 2D. The fingerprint section match 278 has a specific meaning for a particular section matcher (section matcher 0, section matcher 1, section matcher 2, or section matcher 3) of the parallel fingerprint compositor shown in Fig. 2I. When a fingerprint section is the i-th fingerprint section of any fingerprint, the fingerprint section match 278 of the i-th section matcher is set. In one embodiment, the i-th section matcher stores only the i-th fingerprint sections of fingerprints, so the fingerprint section match 278 is always set and can be omitted. In one embodiment, to further reduce false positive synthesis, fingerprint length information similar to the at-least-one-fingerprint synthesis information 274 or the all-fingerprints synthesis information 276 can be stored in the first fingerprint section or in all fingerprint sections of a fingerprint.
Fig. 2I shows an embodiment of a parallel fingerprint compositor 153 without fingerprint length information. The fingerprint compositor 153 comprises one 2-input AND gate 282, one 3-input AND gate 286, one 4-input AND gate 288, and one 4-input OR gate 285. The fingerprint compositor 153 outputs a match 0 292, a match 1 294, a match 2 296, and a match 3 298 for all fingerprint matches, and a match 290 for at least one fingerprint match. In one embodiment, a global fingerprint length filter can be applied to match 0 292, match 1 294, match 2 296, match 3 298, and match 290 to filter out impossible fingerprint lengths. In one embodiment, the fingerprint compositor 153 can be extended, by adding logic gates, to support fingerprint length information stored in the first fingerprint section or in all fingerprint sections of each fingerprint. In one embodiment, parallel fingerprint section scanning can use a long scan step equal to several times the fingerprint section length to speed up scanning.
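The gate list of the parallel compositor suggests a simple combinational function, sketched below in Python. The wiring is inferred from the gate counts (one 2-input, one 3-input, and one 4-input AND gate plus a 4-input OR gate) and is an assumption, as is passing the output of section matcher 0 straight through as match 0; the function name is hypothetical.

```python
def synthesize_parallel(section_matches):
    """section_matches holds the outputs of section matchers 0..3 for one
    scan position. Returns (match 0, match 1, match 2, match 3, match)."""
    s0, s1, s2, s3 = (bool(bit) for bit in section_matches)
    match0 = s0                        # one-section fingerprint (assumed pass-through)
    match1 = s0 and s1                 # 2-input AND gate
    match2 = s0 and s1 and s2          # 3-input AND gate
    match3 = s0 and s1 and s2 and s3   # 4-input AND gate
    at_least_one = match0 or match1 or match2 or match3  # 4-input OR gate
    return match0, match1, match2, match3, at_least_one


outputs = synthesize_parallel((1, 1, 0, 1))
```

In this example the third section is missing, so only the one-section and two-section outputs are raised, even though the fourth section matcher fired. Without length information every prefix of a longer match is also reported, which is what the global fingerprint length filter mentioned above would prune.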
In general, in one embodiment, each fixed length condition code can be broken into multiple fragments that are scanned independently. As with scanning and synthesizing fingerprint sections, all fragments of a condition code can be scanned first and then combined serially or in parallel. The number of distinct condition code section lengths is usually much smaller than the number of distinct condition code lengths, and may even be one, which speeds up scan methods that must scan each distinct length separately. In one embodiment, two or more condition code sections of the same length are scanned in parallel.
In another embodiment, one or more of one or more hash tables and one or more Bloom filters are used to scan fully specified condition code sections. In one embodiment, when there is at least one condition code match, the fingerprint compositor 150 can be used to combine the identified condition code sections, serially or in parallel, into any condition code match. The data structures and embodiments shown in Fig. 2D-I can be used to synthesize condition code sections. In another embodiment, one or more of one or more finite automata (FAs) and one or more content-addressable memories (CAMs) are used to combine the matched condition code sections into any condition code match.
Fixed length condition code scanning
As shown in Figure 1, the fixed length condition code search engine 160 comprises a condition code search engine 162, a condition code verifier 164, and a fixed length condition code database 166. The condition code search engine 162 can identify a possible fixed length condition code, a fixed length feature subcode of a possible random length condition code, or a condition code family that contains multiple possible fixed length condition codes or fixed length feature subcodes. The possible fixed length condition codes or feature subcodes identified by the condition code search engine 162 are verified by the condition code verifier 164. The fixed length condition code database 166 is the database used by the condition code search engine 162 and the condition code verifier 164.
The fixed length condition code database 166 can be implemented with a variety of data structures. In one embodiment, as shown in Fig. 3A-B, the fixed length condition code database 166 is a two-dimensional linked list in which multiple condition code linked lists 350 are linked to form a feature code group linked list 300. Each entry of the feature code group linked list 300 comprises a next code pointer 302, an offset 304, and a condition code pointer 306. The next code pointer 302 points to the next entry of the feature code group linked list 300, and the condition code pointer 306 points to a particular condition code linked list 350. The offset 304 is the offset from the first unit of a particular fingerprint to the first unit of the particular character string condition code indicated by the condition code pointer 306.
Each fixed length condition code or feature subcode can be broken into multiple condition code sections 352 to form a condition code linked list 350. The condition code sections 352 can be linked in scan order, starting from the first basic unit of the fixed length condition code or feature subcode. In one embodiment, the condition code sections differ in size. In another embodiment, all condition code sections have the same size, which can be chosen optimally for the system architecture. Each entry of the condition code linked list 350 comprises a condition code section 352, a condition code section mask 354, a last-section flag 356, a next section pointer 358, a type 360, and a number 362. The next section pointer 358 points to the next section, and the last-section flag 356 marks the last section. When the type 360 is 0, the number 362 is a fixed length feature subcode number 364; otherwise, the number 362 is a condition code number 366. The condition code section mask 354 specifies the matching condition for each basic unit, or even for each sub-unit of a basic unit, including "need not match", "equal", "not equal", "within a range", "outside a range", "case insensitive", and "case sensitive". The matching conditions can be implemented by selecting the input source and output source of a unit comparator. If the length of a fixed length condition code or feature subcode is not an integer multiple of the size of the condition code section 352, up to (size of the condition code section 352 minus one) zeros or other values can be appended to the fixed length condition code or feature subcode, and the masks of the corresponding padding units are set to "need not match".
In one embodiment, the condition code section mask 354 of each basic unit is 3 bits. The first bit is set to 0 for "case insensitive" and to 1 for "case sensitive". The last two bits are set to 0 for "equal", to 1 for "not equal", and to 2 for "need not match"; the value 3 is reserved. More mask bits can be used as needed to implement other matching conditions, such as predefined ranges (numeric characters or alphabetic characters), symbol classes, or arbitrary ranges. In another embodiment, to improve storage efficiency, a mask for one or more condition code sections, or for a whole fixed length condition code or feature subcode, can be used alone or together with the masks of the basic units or sub-units.
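The 3-bit per-unit mask can be decoded as in the following Python sketch. The bit layout (case bit in the most significant position, condition in the two low bits), the ASCII-only case folding, and all names are assumptions made for illustration.

```python
EQUAL, NOT_EQUAL, NEED_NOT_MATCH = 0, 1, 2   # values of the last two bits
CASE_SENSITIVE = 0b100                       # assumed position of the first bit


def fold_case(unit: int) -> int:
    """Map ASCII 'A'..'Z' to 'a'..'z'; leave every other byte unchanged."""
    return unit + 32 if 65 <= unit <= 90 else unit


def unit_matches(data_unit: int, code_unit: int, mask: int) -> bool:
    condition = mask & 0b011
    if condition == NEED_NOT_MATCH:
        return True
    if condition == 3:
        raise ValueError("mask value 3 is reserved")
    if not mask & CASE_SENSITIVE:
        data_unit, code_unit = fold_case(data_unit), fold_case(code_unit)
    equal = data_unit == code_unit
    return equal if condition == EQUAL else not equal


def section_matches(data: bytes, code: bytes, masks) -> bool:
    """Compare one section unit by unit; every unit must satisfy its mask."""
    return len(data) == len(code) == len(masks) and all(
        unit_matches(d, c, m) for d, c, m in zip(data, code, masks)
    )


# "get" compared case-insensitively, followed by one padding unit.
ok = section_matches(b"GeT ", b"get\x00", [EQUAL, EQUAL, EQUAL, NEED_NOT_MATCH])
```

The last unit shows how padding works: a padded position carries the "need not match" condition, so whatever follows the three meaningful units in the scanned data is ignored.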
In one embodiment, described condition code search engine 162 can be searched the tail item of feature code group chained list 300 up to described chained list, and promptly next yard pointer 302 is a null pointer.Each of feature code group chained list 300 is all returned a condition code pointer 306 that points to a condition code chained list 350.
Described condition code is examined device 164 and can be verified as each condition code chained list 350 and carry out condition code checking.Condition code is examined device 164 first section from condition code chained list 350, by each condition code section of scanning sequency paragraph by paragraph examination.If do not match, condition code is examined device 164 will stop the hypomere inspection; Otherwise condition code is examined device 164 will check full feature sign indicating number chained list 350 up to its tail, and promptly latter end 356 is 1.If find a coupling, when type is 0, mate fixed length subcode for the random length condition code, condition code is examined device 164 will export a fixed length feature subcode numbering 364; When type is 1, mates and be that fixed length condition code, condition code examine device 164 and will export fixed length condition code numbering 366.
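The section-by-section verification with early exit can be sketched in Python as follows. The sketch simplifies the mask to a per-unit "must match" flag, pads the last section with zeros marked "need not match", and uses hypothetical names throughout; it is one possible reading of the procedure, not the source's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Section:
    units: bytes
    must_match: list                  # simplified mask: False = "need not match"
    kind: int = 1                     # 0 = feature subcode, 1 = condition code
    number: int = 0
    next_section: Optional["Section"] = None

    @property
    def last(self) -> bool:
        return self.next_section is None


def build_list(code: bytes, section_size: int, kind: int, number: int):
    """Split a code into equal sections, padding the tail with ignored zeros."""
    pad = -len(code) % section_size
    units = code + bytes(pad)
    must = [True] * len(code) + [False] * pad
    head = None
    for start in reversed(range(0, len(units), section_size)):
        end = start + section_size
        head = Section(units[start:end], must[start:end], kind, number, head)
    return head


def verify(head: Optional[Section], field: bytes, position: int):
    """Return (kind, number) on a full match, or None at the first mismatch."""
    section = head
    while section is not None:
        for i, (unit, must) in enumerate(zip(section.units, section.must_match)):
            if must and (position + i >= len(field) or field[position + i] != unit):
                return None           # stop: later sections are never read
        if section.last:
            return section.kind, section.number
        position += len(section.units)
        section = section.next_section
    return None


chain = build_list(b"malware", section_size=4, kind=1, number=42)
hit = verify(chain, b"xxmalwarezz", 2)
miss = verify(chain, b"xxmalwarXzz", 2)
```

Stopping at the first mismatching section is what keeps verification cheap for false positives from the fingerprint stage: most of them are rejected after reading a single section.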
In another embodiment, so that the false positive match rate drops quickly to zero as the number of segments grows, the condition code segments 352 can be linked together in an optimal order. An offset from the current condition code segment to the next condition code segment is then added to each entry of the condition code linked list 350. Although the condition code segments can differ in length, a fixed segment length can be chosen.
Figs. 4A-B show block diagrams of one embodiment of a condition code unit comparator 400 and a condition code segment comparator 450. The condition code unit comparator 400 performs the unit comparison during fixed length condition code lookup. The condition code unit comparator 400 comprises a mask decoder 402, a 2-input multiplexer (MUX) 404, an equality comparator 406, two NOT gates 408, a 4-input MUX 410, a 2-input OR gate 412 and a range comparator 414. The mask decoder 402 decodes the mask bits into control signals for the inputs and outputs of the equality comparator 406 and the range comparator 414. In one embodiment, the range comparator 414 is optionally used to support a predefined global range for each character string field or a global range for each condition code. In one embodiment, m range comparators 414 can be used to support m predefined global ranges. In another embodiment, the unit match 416 can be output after a logical OR with the "no match required" bit.
A plurality of condition code unit comparators 400 and a multi-input AND gate 452 can be used to build the condition code segment comparator 450. The data unit of the condition code segment comparator 450 is usually a byte, but can also be four bits or any other size. In another embodiment, the condition code unit comparator 400 can be replaced by the condition code unit comparator 480 shown in Fig. 4C, to support local ranges of condition code units. Each masked unit in the condition code segment 352 can then be extended to a masked unit range, that is, a masked pair of units giving the upper and lower bounds of the condition code unit.
In one embodiment, the fixed length condition code search engine 160 shown in Fig. 1A may need to search multiple condition code linked lists. However, the probability of having to search multiple condition code linked lists is usually very low. When multiple condition code linked lists do need to be searched, differential encoding can be used, in which the condition codes of a group are encoded relative to one another.
For example, in one embodiment, the selected string unit tree 500 shown in Fig. 5A can be used as the lookup data structure of the condition code search engine 162 shown in Fig. 1A. The selected string unit tree 500 comprises nodes 520a-520e. Each node 520 has two branches: a match branch indicated by pointer 1 530, and a mismatch branch indicated by pointer 2 532. As shown in Fig. 5B, the tree has two kinds of node 520: leaves, whose type 528 is 1, and non-leaves, whose type 528 is 0. The match branch of a non-leaf always points to another node 520, whereas its mismatch branch points to another node 520 or to a null node. The match branch of a leaf always points to a condition code family linked list 550 as shown in Fig. 5C, whereas its mismatch branch points to another node 520 on the tree or to a null node.
In one embodiment, as shown in Fig. 5B, each node 520 of the selected string unit tree 500 comprises an offset 522, a unit 524, a unit mask 526, a type 528, a pointer 1 530 and a pointer 2 532. The type 528 is leaf or non-leaf as described above. The selected unit 524 can be at any position in the condition code. When a node 520 is not the root, the position of its unit is given by the offset 522 of the previous node 520 (for example, the previous node of node 520b is node 520a). When a node 520 is the root (for example, node 520a, which has no previous node), the position is given by the offset 270 of the matching entry in the fingerprint hash block linked list 250 (see Fig. 2B). Each node also contains a unit mask 526 corresponding to the condition code segment mask 354 in the condition code linked list 350 (see Fig. 3B).
Any two character string condition codes differ in at least one elementary unit, provided that neither is a sub-segment of the other. Therefore, a unit 524 can be selected to distinguish at least two character string condition codes, so that after that unit 524 is compared, at least one character string condition code is eliminated. By searching the selected string unit tree 500, condition codes that differ in at least one elementary unit are distinguished. In one embodiment, the selected string unit tree 500 shown is a binary tree. In another embodiment, a corresponding k-ary selected string unit tree can be built. The k elementary units that k condition codes of a condition code group have at the same unit position can serve as a node of the k-ary tree, although each condition code can also contribute multiple elementary units to each node of the k-ary tree.
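A binary lookup of this kind can be sketched as below. It is a simplified illustration, not the patented structure itself: the unit mask 526 is left out (plain equality), the family linked list a leaf points to is represented by a string label, one offset per node is applied on both branches, and the names `Node` and `lookup` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    unit: int                              # selected unit 524
    offset: int                            # offset 522 to the next node's unit
    leaf: bool                             # type 528 (True: leaf)
    on_match: Union["Node", str, None]     # pointer 1 (530)
    on_mismatch: Optional["Node"] = None   # pointer 2 (532)


def lookup(field: bytes, root_pos: int, root: Node):
    """Walk the tree from the root; return the matched leaf target or None."""
    node, pos = root, root_pos
    while node is not None:
        if not 0 <= pos < len(field):
            return None
        if field[pos] == node.unit:
            if node.leaf:
                return node.on_match       # candidate family list
            nxt = node.on_match
        else:
            nxt = node.on_mismatch
        pos += node.offset                 # position of the next node's unit
        node = nxt
    return None


# Three candidates "cat", "car", "dog": compare the third unit first,
# then fall back to the first unit to separate "dog".
n3 = Node(ord("d"), 0, True, "dog")
n2 = Node(ord("r"), -2, True, "car", n3)
n1 = Node(ord("t"), 0, True, "cat", n2)
```

The lookup only narrows the candidates; the selected candidate still has to be checked in full by the verifier, since only a few units are compared on the way down.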
In one embodiment, one character string condition code can be part of another character string condition code. Condition codes with such a parent-child relationship cannot be distinguished by the selected string unit tree 500. In one embodiment, no further search is needed once any one of these condition codes is found, so there is no need to distinguish condition codes that have a parent-child relationship, and only the shortest of them needs to be scanned. In another embodiment, however, all condition codes need to be distinguished, or the longest condition code needs to be identified.
To support condition code families with parent-child relationships, in one embodiment the condition code family linked list 550 shown in Fig. 5C can be the lookup data structure of the condition code verifier 164 (see Fig. 1A). Each entry of the condition code family linked list 550 comprises a type 552, an offset 554, a condition code segment 556, a condition code segment mask 558, a next-entry pointer 560, a number type 562 and a number 564. In one embodiment, to support condition code families, the condition code family linked list 550 has two kinds of entry: lookup entries, whose type 552 is 0, and result entries, whose type 552 is 1. When a lookup entry is checked, the condition code segment 556 is compared according to the condition code segment mask 558. The condition code segment 556 and the condition code segment mask 558 are the same as the corresponding fields in the condition code linked list 350 (see Fig. 3B). A result entry does not require any condition code segment comparison.
In one embodiment, the system searches for all matching condition codes and outputs the number 564 of each matching condition code. The search therefore always proceeds to the tail of the condition code family linked list 550, that is, until the next-entry pointer 560 is a null pointer, in order to find all parent condition codes. There are two kinds of number 564: feature subcode numbers 566 and condition code numbers 568. The kind is determined by the number type 562. When the number type 562 is 0, the number 564 is a feature subcode number 566 and represents a fixed length feature subcode of a random length condition code; otherwise the number 564 is a condition code number 568 and represents a fixed length condition code.
In one embodiment, the condition code family linked list 550 is linked from the youngest generation, or shortest subcode, to the oldest generation, or longest subcode. The offset 554 specifies the offset from the first unit of the current condition code segment 556 to the first unit of the next condition code segment 556. If the current condition code segment does not match, the search of the condition code family linked list 550 can stop. Allowing an arbitrary offset 554 between feature subcodes makes early termination on a mismatch possible.
In one embodiment, each generation of the condition code family linked list 550 can support only one condition code. If a particular generation contains several condition codes, several condition code family linked lists 550 can be used, one for each condition code of that generation. These condition code family linked lists 550 can be distinguished with the selected string unit tree 500.
In another embodiment, masked condition codes can be stored and scanned using a content addressable memory (CAM) together with one or more memories.
Scanning of random length condition codes
As shown in Fig. 1, the fixed length condition code search engine 160 outputs, in scanning order, the number, the size and the position within the character string field of every matched fixed length feature subcode of a random length condition code. Using this information, the random length condition code search engine 180 synthesizes the identified fixed length feature subcodes into random length condition codes. In one embodiment, one or more finite automata (FA) are used to synthesize the fixed length feature subcodes. In another embodiment, the random length condition code search engine 180 comprises a condition code rule lookup device 182, a condition code state verifier 184, a condition code rule database 186 and a condition code state table 188. The condition code rule database 186 provides the static rules for synthesizing the fixed length feature subcodes of a random length condition code into that condition code. The condition code state table 188 dynamically stores the state of the synthesis process for an input character string field.
The condition code rule lookup device 182 retrieves from the condition code rule database 186 the rules associated with the number of the matched fixed length feature subcode, and passes these rules to the condition code state verifier 184. The condition code state verifier 184 synthesizes the matched fixed length feature subcodes into random length condition codes according to the synthesis rules, and updates the condition code state table 188.
The condition code rule database 186 and the condition code state table 188 can use various data structures. In one embodiment, the condition code rule database 186 can be a condition code rule linked list 600 as shown in Fig. 6. The condition code rule linked list 600 is pointed to by the number of the feature subcode found by the fixed length condition code search engine 160. Several random length condition codes can contain the same fixed length feature subcode. The condition code rule linked list 600 links together all random length condition codes that contain the fixed length feature subcode represented by a particular feature subcode number.
In one embodiment, each entry of the condition code rule linked list 600 corresponds to one random length condition code. Each entry comprises a condition code number 602, an order 604, a last-code flag 606, a next-code pointer 608 and distance range information 610. The condition code number 602 identifies a particular random length condition code. The order 604 is the position of the fixed length feature subcode represented by the feature subcode number among all fixed length feature subcodes of that random length condition code. The last-code flag 606 indicates whether the fixed length feature subcode is the last fixed length feature subcode of the random length condition code, and thus marks the end of scanning for that condition code. The next-code pointer 608 points to the next entry of the condition code rule linked list 600. The distance range information 610 is an optional field giving the distance range between this fixed length feature subcode and the following one, that is, the minimum and maximum number of units between the two fixed length feature subcodes. For example, when the range, the longest distance or the shortest distance is given in advance or is infinite, the distance range information 610 can be omitted or reduced to just the shortest or the longest distance.
In another embodiment, each entry of the condition code rule linked list 600 can comprise one or more additional fields describing one or more random length feature subcodes between two or more fixed length feature subcodes. For example, a "pattern" or a "pointer to a pattern" can be added to each entry of the condition code rule linked list 600 to describe a random length feature subcode as a repeated pattern that fills the distance given by the distance range information 610.
As shown in Fig. 7, in one embodiment the condition code state table 188 can be implemented with one or more condition code state linked lists 700. Each condition code state linked list 700 dynamically stores, for one character string field of a particular connection, the condition code state of all fixed length feature subcodes identified so far of all random length condition codes. Each entry of the condition code state linked list 700 comprises a condition code number 702, a last subcode order 704 (lorder), a next subcode position 706 (nloc) and a next-code pointer 708 (nptr). The condition code number 702 is the number of a particular random length condition code. The last subcode order 704 is the order of the most recently matched fixed length feature subcode of that random length condition code. The next subcode position 706 is the range of valid positions for the next fixed length feature subcode of that random length condition code.
In one embodiment, each character string field of each connection has one condition code state linked list 700. Usually, at any moment only one character string field of each connection is being scanned, so each connection has only one condition code state linked list 700. The condition code state linked list 700 can contain the entire valid history of all matched subcodes of all random length condition codes for one character string field of a particular connection.
The condition code state linked list 700 can be dynamic. In one embodiment, if the fixed length feature subcode indicated by the subcode number is the first fixed length feature subcode of a random length condition code, that is, its order 604 is 1, and there is not yet an entry for that random length condition code, a new entry can be added to the condition code state linked list 700 of the character string field of the particular connection. An entry of the condition code state linked list 700 can be deleted if the position of the first unit of the currently matched fixed length feature subcode is outside the valid range given by the next subcode position 706, or if a timeout occurs. An entry can also be deleted after a matching random length condition code has been found based on that entry. In one embodiment, all entries of the condition code state linked list 700 of a particular connection's character string field can be deleted at the end of the character string field.
As shown in Fig. 1 and Fig. 6, in one embodiment the condition code rule lookup device 182 receives the number of a fixed length feature subcode from the fixed length condition code search engine 160. The condition code rule lookup device 182 walks the entire condition code rule linked list 600 indicated by that number. For each entry, it sends the information given by the entry (for example {condition code number 602, order 604, last-code flag 606, distance range information 610}), together with the information given by the fixed length condition code search engine 160 (for example {position of the first unit of the fixed length feature subcode, length of the fixed length feature subcode, connection number, character string field number}), to the condition code state verifier 184, until it reaches the tail of the condition code rule linked list 600 (that is, the next-code pointer 608 is a null pointer).
The condition code state verifier 184 uses the information of each entry given by the condition code rule lookup device 182 to search the condition code state linked list 700 indicated by the connection number. An entry of the condition code state linked list 700 is a match if the condition code number 602 is the same as the condition code number 702, the order 604 equals the last subcode order 704 plus one, and the position of the first elementary unit of the fixed length feature subcode lies within the valid position range given by the next subcode position 706. For each matching entry, if the last-code flag 606 is 1, the condition code number 602 is output and the entry is deleted from the condition code state linked list 700; otherwise the entry is updated: the last subcode order 704 is set to the order 604, and the next subcode position 706 is set to the sum of the position of the first unit of the fixed length feature subcode, the length of the fixed length feature subcode, and the value of the distance range information 610. Nothing needs to be done for entries that do not match.
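The match, output and update steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions: the state linked list of one character string field is modeled as a Python list, the distance range 610 is modeled as a (min, max) pair of units, timeouts are ignored, and the names `Rule`, `State` and `on_subcode` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:                # one entry of the rule linked list 600
    sig: int               # condition code number 602
    order: int             # order 604 (1-based)
    last: bool             # last-code flag 606
    gap: tuple = (0, 0)    # distance range 610 as (min, max) units


@dataclass
class State:               # one entry of the state linked list 700
    sig: int               # condition code number 702
    last_order: int        # last subcode order 704
    next_range: tuple      # next subcode position 706 as (lowest, highest)


def on_subcode(states: list, rules: list, pos: int, length: int) -> list:
    """Process one matched fixed length subcode; return completed numbers."""
    done = []
    for r in rules:
        nxt = (pos + length + r.gap[0], pos + length + r.gap[1])
        hit = next(
            (s for s in states
             if s.sig == r.sig
             and r.order == s.last_order + 1
             and s.next_range[0] <= pos <= s.next_range[1]),
            None,
        )
        if hit is not None:
            if r.last:
                done.append(r.sig)       # whole condition code recognized
                states.remove(hit)
            else:
                hit.last_order, hit.next_range = r.order, nxt
        elif r.order == 1 and all(s.sig != r.sig for s in states):
            if r.last:
                done.append(r.sig)       # single-subcode condition code
            else:
                states.append(State(r.sig, 1, nxt))
    return done
```

For example, a condition code made of a 4-unit subcode followed, after 0 to 10 units, by a 3-unit subcode is reported only when the second subcode starts inside the window opened by the first.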
In one embodiment, when the condition code state linked list 700 is short and a particular connection only needs to scan one character string field at a time, the condition code state linked list 700 can be scanned directly. But if the condition code state linked list 700 is long, or a particular connection needs to scan several character string fields at the same time, other lookup data structures can be used for the condition code state table 188. In one embodiment, the condition code state table 188 can be a condition code state Bloom filter or a condition code state hash table similar to the data structures in Figs. 2A-C. The hash key of the condition code state Bloom filter or condition code state hash table can be the 3-tuple {connection number, character string field number, condition code number}. In one embodiment, when each connection has only one character string field being scanned at any moment, the character string field number need not be used.
As shown in Fig. 8, in one embodiment the hash entry block 256 in Figs. 2A-C can be replaced by a hash entry block 856. Each entry of the hash entry block 856 comprises a condition code number 860, a connection number 862, a character string field number 864, a last subcode order 866 and a next subcode position 868. The last subcode order 866 and the next subcode position 868 are defined in the same way as the last subcode order 704 and the next subcode position 706 in Fig. 7. The hash key 3-tuple {condition code number 860, connection number 862, character string field number 864} can be stored and used to resolve hash collisions. An entry of the hash entry block 856 is a match if the original hash key is the same, the order 604 equals the last subcode order 866 plus one, and the position of the first unit of the fixed length feature subcode lies within the valid position range given by the next subcode position 868. For each matching entry, if the last-code flag 606 is 1, the condition code number 602 is output and the entry is deleted from the hash entry block 856; if the last-code flag 606 is 0, the entry is updated: the last subcode order 866 is set to the order 604, and the next subcode position 868 is set to the sum of the position of the first unit of the fixed length feature subcode, the length of the fixed length feature subcode, and the distance range information 610. Entries that do not match remain unchanged.
In one embodiment, when the order and distance range of two consecutive fixed length feature subcodes satisfy one or more entries of the condition code rule linked list 600, the condition code state verifier 184 further checks whether the string segment between the two fixed length feature subcodes matches the one or more random length feature subcodes indicated by those entries. When the string between the two consecutive fixed length feature subcodes in the character string field matches a random length feature subcode indicated by one or more entries of the condition code rule linked list 600, the condition code state linked list 700 is updated with the new fixed length feature subcode.
Design and performance of the scanning system
In one embodiment, the speed of the fast character string condition code scanning engine 100 is limited by the speed of the fingerprint scanning engine 140, for example when the false positive match rate is small enough and the subsequent scanning stages are properly designed. When fingerprints are scanned whole, one fingerprint length at a time, the speed of the fingerprint scanning engine 140 depends on the combination of scanning step, number of fingerprint lengths and clock speed. In one embodiment, the speed of the scanning engine 100 is (s/m) * R, where s is the scanning step, m is the number of fingerprint lengths and R is the clock rate. For example, if the scanning step is 8 bytes, the fingerprint lengths are 4, 8, 16 and 32 bytes (m = 4) and the clock frequency is 500 MHz, the speed of the scanning engine 100 is (8/4) * 500 MB/s = 1000 MB/s, that is, 8 Gbit/s.
In another embodiment, if the fingerprints are first split into segments that are scanned in parallel and then synthesized serially on an "at least one fingerprint matches" basis, and the fingerprint segment length equals the scanning step, the speed of a single scanning engine 100 is s * R. In one embodiment, with the same s and R as in the previous example, the scanning engine 100 can scan a character string field at 32 Gbps. In addition, in another embodiment, fingerprints can first be scanned in segments and then synthesized in parallel, so that the scanning step, and hence the scanning speed, increases further by a factor of n, where n is the number of fingerprint segments scanned and synthesized in parallel. With the same fingerprint segment length and R as in the previous example, if n is 4, that is, 4 fingerprint segments are scanned and synthesized in parallel, the scanning step is 32 bytes and the scanning engine 100 can scan a character string field at 128 Gbps.
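The three throughput figures quoted for a single engine (8, 32 and 128 Gbit/s) follow from simple arithmetic, reproduced in the sketch below. The function name and parameter names are hypothetical, and one scanning step per clock cycle is assumed, as the formulas (s/m) * R and s * R imply.

```python
def scan_speed_gbps(segment_bytes, clock_hz, fingerprint_lengths=1, parallel_segments=1):
    """Throughput in Gbit/s, assuming one scanning step per clock cycle.

    segment_bytes       -- scanning step s (or fingerprint segment length) in bytes
    clock_hz            -- clock rate R in Hz
    fingerprint_lengths -- m, number of fingerprint lengths scanned one by one
    parallel_segments   -- n, number of segments scanned and synthesized in parallel
    """
    bytes_per_second = segment_bytes * parallel_segments / fingerprint_lengths * clock_hz
    return bytes_per_second * 8 / 1e9


whole = scan_speed_gbps(8, 500e6, fingerprint_lengths=4)     # (s/m) * R
serial = scan_speed_gbps(8, 500e6)                           # s * R
parallel = scan_speed_gbps(8, 500e6, parallel_segments=4)    # n * s * R
```

With an 8-byte step and a 500 MHz clock these evaluate to 8, 32 and 128 Gbit/s respectively, matching the values given in the text.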
The scanning speeds discussed above are those of a single condition code scanning engine. In one embodiment, when several condition code scanning engines scan in parallel, the condition code scanning speed can be increased several times further.
In one embodiment, the overall architecture and the parameters of a condition code scanning system can be selected according to one or more of the following factors: the required scanning speed of character string condition codes, the sizes of the fixed length condition codes or of the fixed length feature subcodes of random length condition codes, the similarity among condition codes or feature subcodes, and the size of the condition code database, so as to ensure that the fingerprint scanning engine 140, the fixed length condition code search engine 160 and the random length condition code search engine 180 can meet the requirements of the particular scanning system. For example, the scanning step can be selected according to system requirements. As shown in Table 1, the longer the scanning step, the faster the fast character string condition code scanning engine 100. However, the minimum size of the fixed length condition codes or feature subcodes must then be larger, and more insertions and deletions are needed. A large scanning step also limits the choice of fingerprint for each condition code and increases the probability of fingerprint collisions and false positive fingerprint matches.
Table 1: Selection of scanning step
[Table 1 appears only as an image (Figure BPA00001213012300401) in the original publication.]
In addition, the scanning step, and hence the scanning speed, can be limited in particular by the size of the shortest fixed length condition code or feature subcode. In one embodiment, to avoid scanning short condition codes separately, the scanning step can be selected from Table 1 according to the minimum size of the fixed length condition codes or feature subcodes.
Table 1 assumes that every elementary unit of all fixed length condition codes and feature subcodes can be used in a fingerprint. In one embodiment, every fixed length condition code and feature subcode is fully specified in at least one shadow space. In another embodiment, the scanning step, and hence the scanning speed, is further limited by the minimum size of the fully specified shadows of all fixed length condition codes and feature subcodes. The heading of the third column of Table 1 can therefore be changed to "minimum size of the fully specified shadows of all fixed length condition codes or feature subcodes".
In another embodiment, a larger scanning step can be selected to increase the scanning speed. Condition codes too short to be scanned with that scanning step can be scanned separately, for example with the scanning method described above or with any other scanning method. Increasing the scanning step is very effective when only a few fixed length condition codes or feature subcodes are short.
In another embodiment, the number of engines can differ between the stages of the scanning pipeline. The numbers can be selected according to the requirements of the particular system. For example, a particular system can use a scan preprocessing engine 120, four fingerprint scanning engines 140, one fixed length condition code search engine 160 and two random length condition code search engines 180.
In one embodiment, several fingerprint scanning engines 140 can be used, each covering one group of fingerprint lengths, to provide multi-resolution fingerprint scanning. In one embodiment, all fingerprints are decomposed into fingerprint segments of equal length, and all fingerprint segments are scanned with the same scanning step. The number of fingerprint scanning engines used for each group of fingerprint lengths can be the same for all groups.
In another embodiment, fingerprints are decomposed into fingerprint segments of different lengths, which are scanned with scanning steps of different lengths according to the average length of each group of fingerprint lengths: a group of fingerprints with a longer average length uses longer fingerprint segments and a larger scanning step, and a group with a shorter average length uses shorter fingerprint segments and a smaller scanning step. For example, a fingerprint segment and scanning step of 8 elementary units can be used for fingerprints of length 8, 16, 24, 32 and 40 elementary units, and a fingerprint segment and scanning step of 2 elementary units can be used for fingerprints of length 2, 4 and 6 elementary units.
In one embodiment, to balance the speed of fingerprint scanning engines 140 that scan fingerprint segments of different lengths with different scanning steps, more fingerprint scanning engines 140 can be used to scan shorter fingerprint segments with a shorter scanning step than to scan longer fingerprint segments with a longer scanning step. In another embodiment, to balance the speed of fingerprint scanning engines 140 that use memories of different speeds, more fingerprint scanning engines 140 can be used with slower memory than with faster memory. More generally, in another embodiment, the number of fingerprint scanning engines 140 can be determined by the product of scanning step and memory speed: more fingerprint scanning engines 140 can be used where this product is smaller than where it is larger.
In one embodiment, several fingerprint scanning engines 140 that have the same scanning step and scan the same group of fingerprints cover non-overlapping interleaved positions in the input character string field, so that the total scanning step of these engines is the product of the number of engines and the scanning step of a single engine. For example, to provide the same scanning speed, the number of fingerprint scanning engines with a scanning step of 2 elementary units can be 4 times the number of fingerprint scanning engines with a scanning step of 8 elementary units. In another embodiment, such engines cover partially overlapping interleaved positions in the input character string field, so that their total scanning step is greater than the scanning step of a single engine but smaller than the product of the number of engines and the scanning step of a single engine.
In one embodiment, the fingerprint databases 148 for different fingerprint segment lengths can be stored in memories of different speeds, such that the memory used for shorter fingerprint segments is faster than the memory used for longer fingerprint segments. In one embodiment, the fixed length condition code databases 166 corresponding to different fingerprint length groups can be stored in memories of different speeds, such that the memory used for a group of fingerprints with a shorter average length is faster than the memory used for a group with a longer average length.
In one embodiment, the fingerprint database 148 for fingerprints shorter than a particular length (for example, 9 elementary units) can be stored in one of the fastest memories of the scanning system (for example, on-chip memory or a CPU cache). In one embodiment, all or part of the fixed length condition code database 166 corresponding to fingerprints shorter than that length can also be stored in one of the fastest memories of the scanning system, together with the fingerprint database 148. In another embodiment, several fingerprint scanning engines of the same fingerprint group can share a fingerprint database 148 stored in a multi-port memory.
In one embodiment, one or more engines in the pipeline stages discussed above can be replaced by any other scanning method. For example, in one embodiment a content addressable memory (CAM) can be used as the fingerprint scanning engine 140 to scan the shadows of fingerprints, while the fixed length condition code search engine 160 and the random length condition code search engine 180 are still used to scan the condition codes further. In another embodiment, a CAM can be used as the fingerprint scanning engine 140 to scan one or more fingerprints in the original space. In one embodiment, a deterministic or non-deterministic finite automaton (DFA or NFA) can be used as the fingerprint synthesizer 150 to synthesize fingerprint segments. In another embodiment, a DFA or NFA can be used as the random length condition code search engine 180 to synthesize fixed length feature subcodes into random length condition codes.
Other embodiments of the invention can scan other string data. For example, in a biological system, a genetic code sequence can be treated as a string field. Signatures describing a specific gene can be used to identify that gene's sequence within a string field composed of genetic data. For example, the scanner described here can identify specific genes by their specific signatures.
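A toy Python sketch of this use case follows. The gene labels and nucleotide strings are invented for illustration and do not correspond to real genes, and the plain `str.find` search is a stand-in for the scanner, not the fingerprint pipeline itself.

```python
def scan_genes(sequence, signatures):
    """Return sorted (position, label) pairs for every signature hit.

    sequence   -- string field made of nucleotide letters
    signatures -- dict mapping a hypothetical gene label to its signature
    """
    hits = []
    for label, sig in signatures.items():
        start = sequence.find(sig)
        while start != -1:
            hits.append((start, label))
            start = sequence.find(sig, start + 1)  # allow overlapping hits
    return sorted(hits)
```

Calling `scan_genes("ATGCGTACGTTAGC", {"geneA": "CGTA", "geneB": "TTAG"})` reports `geneA` at position 3 and `geneB` at position 9.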
The invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and their structural equivalents, or in combinations of them. The invention can be implemented as one or more computer program products, that is, one or more computer programs stored in an information carrier, such as a machine-readable storage device or a propagated signal, for execution by, or to control the operation of, data processing apparatus such as a programmable processor, a computer, or multiple computers. A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file. A program can be stored in a file together with other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (for example, files that store one or more modules, subroutines, or portions of code). A computer program can be deployed to be executed on one computer, on multiple computers at one site, or on multiple computers distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification, including the method steps of the invention, can be performed by one or more programmable processors executing one or more computer programs to perform functions of the invention by operating on input data and generating output. Special-purpose logic circuitry, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), can also be used to perform the processes and logic flows and to implement the apparatus of the invention.
Processors suitable for the execution of a computer program include, by way of example, both general-purpose and special-purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data (for example, magnetic disks, magneto-optical disks, or optical disks), to receive data from them, transfer data to them, or both. Information carriers suitable for storing computer program instructions and data include all forms of non-volatile memory, including semiconductor memory devices such as EPROM, EEPROM, and flash memory; magnetic disks such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.
To provide for interaction with a user, the invention can be implemented on a computer having a display device, a keyboard, and a pointing device. The display device, such as a cathode ray tube (CRT) or liquid crystal display (LCD) monitor, displays information to the user; the keyboard and the pointing device, such as a mouse or a trackball, let the user provide input to the computer. Other kinds of devices can also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback, such as visual, auditory, or tactile feedback, and input from the user can be received in any form, including acoustic, speech, or tactile input.
The invention can be implemented in a computing system that includes a back-end component, such as a data server; or a middleware component, such as an application server; or a front-end component, such as a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the invention; or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), such as the Internet.
The computing system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises from computer programs that run on the respective computers and have a client-server relationship to each other.
Fig. 9 shows an example of such a computer: a block diagram of a programmable processing system (system) 910 that can be used to implement or perform the apparatus or methods of the invention. The system 910 includes a processor 920, a random access memory (RAM) 921, a read-only memory 922 (for example, a writable read-only memory (ROM) such as a flash ROM), a hard drive controller 923, a video controller 931, and an input/output (I/O) controller 924, coupled by a processor (CPU) bus 925. The system 910 can be preprogrammed, for example in ROM, or it can be programmed (and reprogrammed) by loading a program from another source (for example, a floppy disk, a CD-ROM, or another computer).
The hard drive controller 923 is coupled to a hard disk 930, which can be used to store executable computer programs.
The I/O controller 924 is connected to an I/O interface 927 by an I/O bus 926. The I/O interface 927 receives and transmits analog or digital data (for example, a set of stills, pictures, movies, and animations) over communication links such as a serial link, a local area network, a wireless link, or a parallel link.
A display 928 and a keyboard 929 are also connected to the I/O bus 926. Alternatively, separate connections (separate buses) can be used for the I/O interface 927, the display 928, and the keyboard 929.
The invention has been described in terms of particular embodiments. Other embodiments are within the scope of the appended claims. For example, the steps of the invention can be performed in a different order and still achieve the desired results.

Claims (6)

1. A method for scanning a string for signatures, the method comprising:
Processing one or more signatures into one or more formats, the formats comprising one or more fingerprints and one or more follow-on search data structures for each fixed-length signature or each fixed-length sub-signature of a variable-length signature, the one or more fingerprints including a J-th fingerprint of a particular fixed-length signature or sub-signature, wherein the position, in the scanning direction, of the first basic unit of the J-th fingerprint within the particular fixed-length signature or sub-signature equals the remainder of J divided by the step size of the signature scan operation, so that the number of fingerprints equals the step size of the signature scan and the particular fixed-length signature or sub-signature can be identified at any position in any string field being scanned, wherein each fingerprint comprises one or more fragments of the particular fixed-length signature or sub-signature, and the one or more fragments are located at particular positions anywhere within the particular fixed-length signature or sub-signature;
Receiving a particular string field composed of data values;
Identifying any signatures included in the particular string field, including scanning the particular string field at positions spaced apart by the scan step size to search for one or more of the fingerprints of the one or more signatures, and, at each position where one or more fingerprints match, searching the particular string field using the one or more follow-on search data structures; and
Outputting any signatures identified in the particular string field.
2. A method for scanning a string for signatures, the method comprising:
Decomposing each signature of a plurality of signatures into one or more signature segments;
Receiving a particular string field composed of data values;
Scanning the particular string field to search for the signature segments of the plurality of signatures, including searching for two or more of the signature segments in parallel;
Synthesizing the identified signature segments into a match of any signature; and
Outputting any signatures identified in the particular string field.
3. A method for scanning a string for signatures, the method comprising:
Processing one or more signatures into one or more formats, including decomposing each variable-length signature of the one or more signatures into a plurality of fixed-length sub-signatures and one or more variable-length sub-signatures;
Receiving a particular string field composed of data values;
Identifying any signatures included in the particular string field, including scanning the particular string field to search for a plurality of the fixed-length signatures or sub-signatures, and, at each position where one or more of the fixed-length sub-signatures are identified, synthesizing the identified fixed-length sub-signatures into any variable-length signature; and
Outputting any signatures identified in the particular string field.
4. A method for scanning a string for signatures, the method comprising:
Selecting a plurality of fixed-length signatures for each string object of one or more string objects, the plurality of fixed-length signatures of a particular string object including a J-th fixed-length signature, wherein the position, in the scanning direction, of the first basic unit of the J-th fixed-length signature within the particular string object equals the remainder of J divided by the step size of the signature scan operation, so that the number of fixed-length signatures of the particular string object equals the step size of the signature scan operation and the particular string object can be identified at any position in any string field being scanned;
Receiving a particular string field composed of data values;
Identifying any string objects included in the particular string field, including scanning the particular string field at positions spaced apart by the scan step size to search for the plurality of fixed-length signatures of the one or more string objects, including scanning for two or more fixed-length signatures in parallel; and
Outputting any string objects identified in the particular string field.
5. A method for scanning a string for signatures, the method comprising:
Selecting one or more fixed-length signatures for each string object of one or more string objects;
Processing the one or more fixed-length signatures of the one or more string objects into one or more formats, the formats comprising one or more fingerprints and one or more follow-on search data structures for each fixed-length signature, the plurality of fingerprints of a particular string object including a J-th fingerprint, wherein the position, in the scanning direction, of the first basic unit of the J-th fingerprint within the particular string object equals the remainder of J divided by the step size of the signature scan operation, so that the number of fingerprints of the particular string object equals the step size of the signature scan operation and the particular string object can be identified at any position in any string field being scanned, wherein each fingerprint comprises one or more fragments of a particular fixed-length signature among the one or more fixed-length signatures of the particular string object, and the one or more fragments are located at particular positions anywhere within the particular fixed-length signature;
Receiving a particular string field composed of data values;
Identifying any string objects in the particular string field, including scanning the particular string field at positions spaced apart by the scan step size to search for a plurality of the fingerprints of the one or more string objects, including searching for two or more of the fingerprints in parallel, and, at each position where one or more fingerprints match, searching the particular string field using the follow-on search data structures; and
Outputting any string objects identified in the particular string field.
6. A system for scanning a string for signatures, the system comprising:
A machine-readable storage device including a computer program product; and
One or more processors operable to execute the computer program product and to run one or more of the following modules:
A signature preprocessing module operable to process one or more signatures into one or more formats comprising one or more fingerprints and one or more follow-on search data structures for each fixed-length signature or each fixed-length sub-signature of a variable-length signature, the one or more fingerprints including a J-th fingerprint of a particular fixed-length signature or sub-signature, wherein the position, in the scanning direction, of the first basic unit of the J-th fingerprint within the fixed-length signature or sub-signature equals the remainder of J divided by the step size of the signature scan operation, so that the number of fingerprints equals the step size of the signature scan and the fixed-length signature or sub-signature can be identified at any position in any string field being scanned, wherein each fingerprint comprises one or more fragments of the particular fixed-length signature or sub-signature, and the one or more fragments are located at particular positions anywhere within the particular fixed-length signature or sub-signature;
A scan preprocessing engine operable to process an input string field composed of data values into one or more formats required for scanning; and
A fingerprint scan engine operable to identify one or more fingerprints of one or more signatures in the input string field, the identifying including scanning the input string field at positions spaced apart by the scan step size to search for the one or more fingerprints of the one or more signatures.
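The step-based fingerprint arrangement recited in the claims can be illustrated with a small Python sketch. It is a simplified reading, not the patented implementation: each fingerprint is taken to be one contiguous fragment of length `fp_len`, the J-th fingerprint starts at offset J within the signature (for J from 0 to step minus 1, so J mod step equals J), and a direct byte comparison stands in for the follow-on search data structure. The names `build_fingerprint_table` and `scan` are hypothetical.

```python
from collections import defaultdict


def build_fingerprint_table(signatures, step, fp_len):
    """Create `step` fingerprints per signature, one for each offset
    0..step-1, so that any alignment of the signature is covered."""
    table = defaultdict(list)
    for sig in signatures:
        if len(sig) < fp_len + step - 1:
            raise ValueError("signature too short for this step/fingerprint length")
        for j in range(step):
            table[sig[j:j + fp_len]].append((sig, j))
    return table


def scan(field, table, step, fp_len):
    """Probe the field only at positions 0, step, 2*step, ... and verify
    each fingerprint hit against the full signature."""
    found = set()
    for pos in range(0, len(field) - fp_len + 1, step):
        for sig, offset in table.get(field[pos:pos + fp_len], ()):
            start = pos - offset  # where the signature would have to begin
            if start >= 0 and field[start:start + len(sig)] == sig:
                found.add((start, sig))
    return sorted(found)
```

Because exactly one of the `step` offsets lines up with a probed position wherever the signature starts, the scanner examines only one window per `step` input units while still finding the signature at every alignment. The length check in `build_fingerprint_table` guarantees that every fingerprint fits inside its signature.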
CN200880127748.0A 2008-10-20 2008-10-20 Fast signature scan Expired - Fee Related CN101960469B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711339378.4A CN108197470A (en) 2008-10-20 2008-10-20 Fast signature scan
CN201410055830.4A CN103793522B (en) 2008-10-20 2008-10-20 Fast signature scan

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2008/080457 WO2010047683A1 (en) 2008-10-20 2008-10-20 Fast signature scan

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN201410055830.4A Division CN103793522B (en) 2008-10-20 2008-10-20 Fast signature scan
CN201711339378.4A Division CN108197470A (en) 2008-10-20 2008-10-20 Fast signature scan

Publications (2)

Publication Number Publication Date
CN101960469A true CN101960469A (en) 2011-01-26
CN101960469B CN101960469B (en) 2014-03-26

Family

ID=42119542

Family Applications (2)

Application Number Title Priority Date Filing Date
CN200880127748.0A Expired - Fee Related CN101960469B (en) 2008-10-20 2008-10-20 Fast signature scan
CN201711339378.4A Pending CN108197470A (en) 2008-10-20 2008-10-20 Fast signature scan

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201711339378.4A Pending CN108197470A (en) 2008-10-20 2008-10-20 Fast signature scan

Country Status (2)

Country Link
CN (2) CN101960469B (en)
WO (1) WO2010047683A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095367B (en) * 2015-06-26 2018-12-28 北京奇虎科技有限公司 A kind of acquisition method and device of client data
US20230104304A1 (en) * 2021-09-28 2023-04-06 Rakuten Mobile, Inc. Logic-gate based non-deterministic finite automata tree structure application apparatus and method

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04306785A (en) * 1991-04-03 1992-10-29 Mitsubishi Electric Corp Pattern recognition system
EP0772929B1 (en) * 1994-07-26 2006-09-06 Siemens Energy and Automation, Inc. Methods and systems for creating and authenticating unalterable self-verifying articles
US7607018B2 (en) * 2001-05-08 2009-10-20 Ip.Com, Inc. Method and apparatus for collecting electronic signatures
KR20040024870A (en) * 2001-07-20 2004-03-22 그레이스노트 아이엔씨 Automatic identification of sound recordings
CN1459761B (en) * 2002-05-24 2010-04-21 清华大学 Character identification technique based on Gabor filter set
CN1567174A (en) * 2003-06-09 2005-01-19 吴胜远 Method for expressing and processing object and apparatus thereof
US7444515B2 (en) * 2003-08-14 2008-10-28 Washington University Method and apparatus for detecting predefined signatures in packet payload using Bloom filters
CN1300982C (en) * 2003-12-05 2007-02-14 中国科学技术大学 Hierarchical cooperated network virus and malice code recognition method
US20060106769A1 (en) * 2004-11-12 2006-05-18 Gibbs Kevin A Method and system for autocompletion for languages having ideographs and phonetic characters
CN100354863C (en) * 2005-02-03 2007-12-12 中国科学院计算技术研究所 Method and system for large scale keyboard matching
US7400271B2 (en) * 2005-06-21 2008-07-15 International Characters, Inc. Method and apparatus for processing character streams
CN1972292B (en) * 2005-10-17 2012-09-26 飞塔公司 Systems and methods for processing electronic data
US20080005578A1 (en) * 2006-06-29 2008-01-03 Innovya Research & Development Ltd. System and method for traceless biometric identification
US7747078B2 (en) * 2006-07-06 2010-06-29 Intel Corporation Substring detection system and method
CN1997011B (en) * 2006-07-26 2011-01-12 白杰 Data partition method and data partition device
CN100530182C (en) * 2006-10-17 2009-08-19 中兴通讯股份有限公司 Character string matching information processing method in communication system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104380308A (en) * 2012-05-31 2015-02-25 Opto电子有限公司 Read-in device, read-in result output method, and program
CN107567621A (en) * 2015-05-06 2018-01-09 厄尔扬·韦斯特哥特科技公司 For performing the method, system and computer program product of numeric search
CN107567621B (en) * 2015-05-06 2022-04-26 厄尔扬·韦斯特哥特科技公司 Method, system and computer program product for performing a digital search
WO2020051895A1 (en) * 2018-09-14 2020-03-19 西门子股份公司 Data compression method, data restoration method and device

Also Published As

Publication number Publication date
CN101960469B (en) 2014-03-26
CN108197470A (en) 2018-06-22
WO2010047683A1 (en) 2010-04-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: WANG YING

Free format text: FORMER OWNER: WANG QIANG

Effective date: 20141115

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; TO: 200000 YANGPU DISTRICT, SHANGHAI

TR01 Transfer of patent right

Effective date of registration: 20141115

Address after: 902 room 83, Lane 289, 200000 souvenir Road, Shanghai

Patentee after: Wang Ying

Address before: California, USA

Patentee before: Wang Qiang

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140326