CN101960469B - Fast signature scan - Google Patents

Info

Publication number
CN101960469B
Authority
CN
China
Prior art keywords
fingerprint
condition code
shadow
fixed length
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200880127748.0A
Other languages
Chinese (zh)
Other versions
CN101960469A (en
Inventor
王强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wang Ying
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201711339378.4A priority Critical patent/CN108197470A/en
Priority to CN201410055830.4A priority patent/CN103793522B/en
Publication of CN101960469A publication Critical patent/CN101960469A/en
Application granted granted Critical
Publication of CN101960469B publication Critical patent/CN101960469B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0225 Avoiding frauds

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Collating Specific Patterns (AREA)

Abstract

Systems and methods for scanning signatures in a string field. In one implementation, the invention provides a method for signature scanning. The method includes processing one or more signatures into one or more formats that include one or more fingerprints and one or more follow-on search data structures for each fixed-size signature or signature substring such that the number of fingerprints for each fixed-size signature or signature substring is equal to a step size for a signature scanning operation and the particular fixed-size signature or signature substring is identifiable at any location within any string fields to be scanned, receiving a particular string field, identifying any signatures included in the particular string field including scanning for the fingerprints for each scan step size and searching for the follow-on search data structures at the locations where one or more fingerprints are found, and outputting any identified signatures.

Description

Fast signature scan
Technical field
The present invention relates to scanning for signatures in string fields.
Background technology
An object of digital content (such as a file, a program, a web page, an e-mail, an Internet data packet, or a digital image) can contain one or more string fields. A string field is a string of data values that typically represents text or executable code. For example, an Internet data packet can contain a network address, a host name, a Hypertext Transfer Protocol (HTTP) header, an HTTP message, an e-mail attachment, an e-mail header, and e-mail content. The size of a string field can range from a few bytes to millions of bytes or more. A string signature can be a fully specified string of data values, or an expression over data values (such as a regular expression), whose purpose is to identify a string object (such as a specific computer virus or a specific gene sequence). Signatures can be stored in a signature database, and a signature database can contain a plurality of signatures. The size of a string signature can range from a few bytes to several thousand bytes.
String signatures and string fields are both bit strings made up of many basic units. A basic unit is the smallest unit that carries meaning, and is therefore usually used as the scanning unit in signature scanning techniques. The size of a basic unit is determined by the application. For example, the basic unit of an English string is usually 8 bits (one byte), and the basic unit of a computer virus signature is usually a byte or a nibble.
Each basic unit of a signature can be specified as equal to or not equal to a particular value, or as lying within a particular range (such as the digits 0 to 9 or the English letters a to z). Each basic unit can be case-insensitive or case-sensitive, and can support simple logical operations (such as NOT). In addition, a signature can contain wildcards, for example "*" (a variable-length wildcard) or "?" (a fixed-length wildcard), where "*" stands for zero or more arbitrary basic units and "?" stands for exactly one arbitrary basic unit. For each variable-length wildcard, the range of its length can be further specified. When a signature contains a variable-length wildcard, the length of the signature is variable. If a signature contains no variable-length wildcard, its length is fixed.
A typical signature scanning process compares the string field against the signatures in a signature database at every possible position in the string field. Scan speed is usually limited by the size and complexity of the signatures. In addition, scan speed is limited by the need to support incremental updates of the signatures.
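As a baseline for the rest of the description, the conventional process just described can be sketched as follows. This is only an illustration, assuming plain byte-string signatures without wildcards; the function name `naive_scan` is hypothetical and does not appear in the patent.

```python
def naive_scan(field: bytes, signatures: list) -> list:
    """Compare every signature against every position of the field."""
    hits = []
    for pos in range(len(field)):
        for sig in signatures:
            # bytes.startswith accepts a start offset, so no slice copy is needed
            if field.startswith(sig, pos):
                hits.append((pos, sig))
    return hits
```

The cost grows with both the field length and the number of signatures, which is the limitation the fingerprint-based method is meant to address.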
Summary of the invention
Embodiments of the present invention provide methods and systems for scanning for signatures in a string field. In general, in one aspect, the invention provides a string signature scanning method. The method includes processing one or more signatures into one or more formats, the formats including one or more fingerprints and one or more follow-on search data structures for each fixed-length signature or each fixed-length signature substring of a variable-length signature. The one or more fingerprints include a J-th fingerprint of a particular fixed-length signature or signature substring, where the position, in the scan direction, of the first basic unit of the J-th fingerprint within the particular fixed-length signature or signature substring equals J modulo the step size of the signature scanning operation, so that the number of fingerprints equals the scan step size and the particular fixed-length signature or signature substring is identifiable at any location in any string field to be scanned. Each fingerprint includes one or more fragments of the particular fixed-length signature or signature substring, and the fragments have specific positions anywhere within it. The method further includes receiving a particular string field made up of data values; identifying any signatures included in the particular string field, which includes scanning the string field at positions spaced one scan step size apart for the one or more fingerprints of the one or more signatures and, at positions where one or more fingerprints match, searching the string field for the one or more follow-on search data structures; and outputting any signatures identified in the particular string field. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
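The fingerprint arrangement of this aspect can be illustrated with a minimal sketch. It assumes plain byte-string signatures, a single fingerprint length, a Python dict standing in for the fingerprint lookup, and a direct comparison standing in for the follow-on search data structure; the names `build_fingerprints` and `scan_with_step` are hypothetical. The J-th fingerprint is taken at offset J for J from 0 to step - 1, so that whatever the alignment of a signature in the field, exactly one of its fingerprints falls on a scanned position.

```python
from collections import defaultdict

def build_fingerprints(signatures, step, fp_len):
    """Map each fingerprint to the (signature, offset) pairs it was cut from."""
    table = defaultdict(list)
    for sig in signatures:
        if len(sig) < fp_len + step - 1:
            raise ValueError("signature too short for this step and fingerprint length")
        for j in range(step):  # one fingerprint per residue modulo the step size
            table[sig[j:j + fp_len]].append((sig, j))
    return table

def scan_with_step(field, table, step, fp_len):
    """Probe only every step-th position, then verify candidates in full."""
    found = set()
    for pos in range(0, len(field) - fp_len + 1, step):
        for sig, offset in table.get(field[pos:pos + fp_len], ()):
            start = pos - offset
            # stand-in for the follow-on search: compare the whole signature
            if start >= 0 and field.startswith(sig, start):
                found.add((start, sig))
    return sorted(found)
```

With a step size of 3, the field is probed at one position in three, at the price of storing three fingerprints per signature.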
These and other embodiments can optionally include one or more of the following features. Each fixed-length signature or signature substring can have a plurality of fingerprints, and scanning the string field at positions spaced one scan step size apart for the fingerprints of one or more signatures can include searching for two or more fingerprints in parallel. Each of the one or more fingerprints of a particular signature can be fully specified in the original space or after being projected into one or more shadow spaces. A shadow space is a coarser form of the original space, obtained by introducing some ambiguity into the original space, so that one fingerprint shadow in a particular shadow space corresponds to one or more fingerprints in the original space.
The string signature scanning method can further include scanning the particular string field in the original space, at positions spaced one scan step size apart, for one or more fingerprints; and first scanning the particular string field, at positions spaced one scan step size apart, in each of one or more shadow spaces of the one or more fingerprints, and then, at positions where one or more fingerprints have been identified in at least one of the shadow spaces, verifying one or more of the identified fingerprints in the original space. Introducing some ambiguity into the original space can include one or more of the following: mapping the upper-case and lower-case letters of the original space to a single case-insensitive letter, mapping all digits from 0 to 9 of the original space to one and the same digit, and mapping the space character and "-" of the original space to either a space or "-".
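The shadow-space idea can be sketched as a two-stage check. This is an illustration only, assuming ASCII data and the three ambiguities listed above (case folding, all digits mapped to "0", space and "-" both mapped to "-"); the names `to_shadow` and `shadow_then_verify` are hypothetical, and for brevity the sketch probes every position rather than every step-th one.

```python
def to_shadow(data: bytes) -> bytes:
    """Project bytes into a coarser space by introducing ambiguity."""
    out = bytearray()
    for b in data.lower():            # upper and lower case collapse to one letter
        if 0x30 <= b <= 0x39:         # every digit 0-9 becomes '0'
            out.append(0x30)
        elif b in (0x20, 0x2D):       # space and '-' both become '-'
            out.append(0x2D)
        else:
            out.append(b)
    return bytes(out)

def shadow_then_verify(field: bytes, fingerprint: bytes):
    """Find candidates in the shadow space, then confirm them in the original space."""
    n = len(fingerprint)
    shadow_field, shadow_fp = to_shadow(field), to_shadow(fingerprint)
    candidates = [i for i in range(len(field) - n + 1)
                  if shadow_field[i:i + n] == shadow_fp]
    confirmed = [i for i in candidates if field[i:i + n] == fingerprint]
    return candidates, confirmed
```

A single shadow fingerprint such as "000-0000" stands for many original fingerprints, so one shadow lookup can replace many original-space lookups; the verification step removes the extra candidates that the ambiguity lets through.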
Scanning can further include using one or more of one or more hash tables and one or more Bloom filters to scan for the one or more fingerprints of the one or more signatures. Scanning can further include using one or more of a hash value de-multiplexer and a fingerprint length de-multiplexer to scan for the one or more fingerprints of the one or more signatures. The number of distinct fingerprint lengths of a plurality of signatures can be smaller than the number of distinct signature lengths of those signatures, and the scanning can further include scanning the particular string field, at positions spaced one scan step size apart, for a plurality of fingerprints of the plurality of signatures, including searching in parallel for two or more fingerprints of the same length. The one or more fingerprints can be selected so that their lengths are restricted to a set of lengths covering one or more lengths in one or more length ranges, providing multi-resolution fingerprint scanning. The length of each fingerprint can be an integer multiple of the step size of the signature scanning operation. The string signature scanning method can further include using one or more of one or more content-addressable memories (CAMs) and one or more finite automata (FAs) to scan for the one or more fingerprints of the one or more signatures.
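A Bloom filter reports either "possibly present" or "definitely absent", which suits a fingerprint pre-check because every hit is confirmed later by the follow-on search. The sketch below is a generic Bloom filter, not the patent's specific design; the class name `FingerprintBloom`, the default of 1024 bits, and the use of three salted BLAKE2b hashes are all assumptions made for illustration.

```python
import hashlib

class FingerprintBloom:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, fingerprint: bytes):
        # derive independent-looking hashes by salting BLAKE2b with the hash index
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(fingerprint, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, fingerprint: bytes) -> None:
        for p in self._positions(fingerprint):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        # False means definitely absent; True may be a false positive
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(fingerprint))
```

Because a filter never yields a false negative, a negative answer lets the scanner skip a position without touching the larger signature tables.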
The string signature scanning method can further include decomposing each of a plurality of fingerprints of a plurality of signatures into one or more fingerprint segments, so that the number of distinct fingerprint segment lengths of the signatures is smaller than the number of distinct fingerprint lengths of the signatures; scanning the particular string field, at positions spaced one scan step size apart, for the plurality of fingerprint segments, including searching for two or more fingerprint segments in parallel; and synthesizing the identified fingerprint segments into any fingerprint matches. All fingerprint segments can have the same length, and the scanning of the particular string field for the fingerprint segments can use a signature scanning step size that is an integer multiple of the fingerprint segment length. A fingerprint segment bitmap, which specifies the one or more possible positions of a particular fingerprint segment within any fingerprint, can be stored together with that fingerprint segment for use in synthesizing the identified fingerprint segments into any fingerprint matches. Fingerprint length information, which specifies one or more possible fingerprint lengths, can be stored together with the first segment or with each segment of each fingerprint for the same purpose. One or more finite automata (FAs) can be used to synthesize the identified fingerprint segments into any fingerprint matches.
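Segment decomposition and bitmap-based synthesis can be sketched as follows. The layout is an assumption made for illustration: all segments are `SEG_LEN` bytes long, each table entry holds a position bitmap (`pos_bits`, bit k set if the segment occurs as the k-th segment of some fingerprint) and, for first segments, a length bitmap (`len_bits`, bit n set if some fingerprint starting with this segment has n segments). The names `build_segment_table` and `candidate_lengths` are hypothetical.

```python
SEG_LEN = 2  # assumed common segment length, in bytes

def build_segment_table(fingerprints):
    table = {}
    for fp in fingerprints:
        if len(fp) % SEG_LEN:
            raise ValueError("fingerprint length must be a multiple of SEG_LEN")
        count = len(fp) // SEG_LEN
        for k in range(count):
            seg = fp[k * SEG_LEN:(k + 1) * SEG_LEN]
            entry = table.setdefault(seg, {"pos_bits": 0, "len_bits": 0})
            entry["pos_bits"] |= 1 << k          # segment may appear at index k
            if k == 0:
                entry["len_bits"] |= 1 << count  # a fingerprint of this many segments starts here
    return table

def candidate_lengths(field, pos, table):
    """Return fingerprint lengths (in bytes) that may match at pos."""
    first = table.get(field[pos:pos + SEG_LEN])
    if first is None:
        return []
    lengths, k = [], 0
    while True:
        entry = table.get(field[pos + k * SEG_LEN:pos + (k + 1) * SEG_LEN])
        if entry is None or not (entry["pos_bits"] >> k) & 1:
            break
        k += 1
        if (first["len_bits"] >> k) & 1:
            lengths.append(k * SEG_LEN)
    return lengths
```

Because the bitmaps pool information from all fingerprints, synthesis can report a combination that is not a stored fingerprint: with fingerprints "abcd", "abcdef" and "cdxy", the string "abxy" passes. Such candidates are acceptable only because a later verification stage rejects them.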
The string signature scanning method can further include storing a false-positive match probability for each fingerprint; at positions where one or more fingerprints match, checking the corresponding false-positive match probabilities; and, when at least one of those probabilities is sufficiently low, searching the particular string field for the one or more follow-on search data structures. The method can further include building a differential search data structure from one or more differing basic units of a plurality of fixed-length signatures or signature substrings that correspond to the same fingerprint, in which case searching the particular string field for the follow-on search data structures includes differentially searching the plurality of fixed-length signatures or signature substrings corresponding to an identified fingerprint. The method can further include encoding each fixed-length signature or signature substring with one or more mask bits. The mask bits include mask bits that specify one or more of the following match conditions: don't care, case sensitivity, logical NOT, predefined range, logical operation, and arbitrary range. The mask bits include one or more of the following: mask bits for one or more basic units or sub-basic units, mask bits for one or more signature substring segments, and mask bits for one or more fixed-length signatures or signature substrings. Searching the particular string field for the follow-on search data structures then includes searching for the fixed-length signatures or signature substrings encoded with the mask bits. The method can further include normalizing the particular string field, which includes one or more of the following: decoding an encoded string field, decompressing a compressed string field, and removing unneeded string data.
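Per-unit mask bits can be illustrated with a small matcher. The bit assignments below (`ANY`, `NOCASE`, `NOT`) are assumptions chosen for the sketch and cover only three of the listed match conditions; the patent does not fix a particular encoding, and the names `unit_matches` and `masked_match` are hypothetical.

```python
EXACT, ANY, NOCASE, NOT = 0, 1, 2, 4  # assumed mask bit values, one mask per basic unit

def unit_matches(value: int, expected: int, mask: int) -> bool:
    if mask & ANY:                      # "don't care": any basic unit matches
        return True
    if mask & NOCASE:
        same = bytes([value]).lower() == bytes([expected]).lower()
    else:
        same = value == expected
    return (not same) if mask & NOT else same

def masked_match(field: bytes, pos: int, units: bytes, masks: list) -> bool:
    """Check a mask-encoded fixed-length signature against field at pos."""
    if pos < 0 or pos + len(units) > len(field):
        return False
    return all(unit_matches(field[pos + i], u, m)
               for i, (u, m) in enumerate(zip(units, masks)))
```

For instance, the units `b"Ab?9"` with masks `[NOCASE, EXACT, ANY, NOT]` describe "a or A, then b, then anything, then anything but 9".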
In general, in one aspect, the invention provides a string signature scanning method. The method includes decomposing each of a plurality of signatures into one or more signature segments; receiving a particular string field made up of data values; scanning the particular string field for the plurality of signature segments of the plurality of signatures, including searching for two or more signature segments in parallel; synthesizing the identified signature segments into any signature matches; and outputting any signatures identified in the particular string field. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
Implementations can include one or more of the following features. Scanning for the plurality of signature segments can further include using one or more of the following: one or more hash tables; one or more Bloom filters; one or more hash tables with hash value multiplexing, length multiplexing, or both; and one or more Bloom filters with hash value multiplexing, length multiplexing, or both. A signature segment bitmap, which specifies the one or more possible positions of a particular signature segment within any signature, can be stored together with that signature segment for use in synthesizing the identified signature segments into any signature matches. Signature length information, which specifies one or more possible signature lengths, can further be stored together with the first segment or with each segment of each signature for the same purpose. One or more finite automata (FAs) can be used to synthesize the identified signature segments into any signature matches.
In general, in one aspect, the invention provides a string signature scanning method. The method includes processing one or more signatures into one or more formats, which includes decomposing each variable-length signature among the one or more signatures into a plurality of fixed-length signature substrings and one or more variable-length signature substrings; receiving a particular string field made up of data values; identifying any signatures included in the particular string field, which includes scanning the string field for a plurality of the fixed-length signatures or signature substrings and, at positions where one or more fixed-length signature substrings are identified, synthesizing the identified fixed-length signature substrings into any variable-length signatures; and outputting any signatures identified in the particular string field. Processing the one or more signatures into one or more formats can further include storing position information for each fixed-length signature substring in a static signature synthesis rule database, the position information including an order and a distance range to the next fixed-length signature substring, with or without a description of the variable-length signature substring between each pair of fixed-length signature substrings. Synthesizing the identified fixed-length signature substrings into any variable-length signature match can further include checking the position information of each identified fixed-length signature substring, with or without verifying the variable-length signature substring between each pair of adjacent fixed-length signature substrings, and updating a dynamic signature synthesis state table. One or more finite automata (FAs) can be used to synthesize the identified fixed-length signature substrings into any variable-length signatures. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
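The synthesis of fixed-length substrings into a variable-length signature can be sketched with a static rule list and a dynamic state table. The representation is an assumption made for illustration: each `Rule` holds one fixed-length substring and the allowed gap to the next one, the rule list order is the required order, and the state table maps a rule index to the end positions of partial matches. The names `Rule`, `find_hits`, and `synthesize` are hypothetical, and the variable-length parts between substrings are not verified here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    sub: bytes               # fixed-length signature substring
    min_gap: int = 0         # minimum distance to the next substring
    max_gap: Optional[int] = None  # maximum distance; None means unbounded

def find_hits(field: bytes, rules):
    """Stand-in for the fixed-length scan: (position, rule index) in scan order."""
    return sorted((p, i) for i, r in enumerate(rules)
                  for p in range(len(field)) if field.startswith(r.sub, p))

def synthesize(hits, rules) -> bool:
    state = {}  # dynamic state table: rule index -> end positions of partial matches
    for pos, idx in hits:
        if idx == 0:
            ok = True
        else:
            prev = rules[idx - 1]
            ok = any(prev.min_gap <= pos - end
                     and (prev.max_gap is None or pos - end <= prev.max_gap)
                     for end in state.get(idx - 1, ()))
        if ok:
            if idx == len(rules) - 1:
                return True  # every substring seen, in order and within range
            state.setdefault(idx, []).append(pos + len(rules[idx].sub))
    return False
```

A signature such as "GET " followed by at most 16 arbitrary units and then "passwd" becomes `[Rule(b"GET ", 0, 16), Rule(b"passwd")]`.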
In general, in one aspect, the invention provides a string signature scanning method. The method includes selecting a plurality of fixed-length signatures for each of one or more string objects. The plurality of fixed-length signatures of a particular string object includes a J-th fixed-length signature, where the position, in the scan direction, of the first basic unit of the J-th fixed-length signature within the particular string object equals J modulo the step size of the signature scanning operation, so that the number of fixed-length signatures of the particular string object equals the step size of the signature scanning operation and the particular string object is identifiable at any location in any string field to be scanned. The method further includes receiving a particular string field made up of data values; identifying any string objects included in the particular string field, which includes scanning the string field, at positions spaced one scan step size apart, for the plurality of fixed-length signatures of the one or more string objects, including scanning for two or more fixed-length signatures in parallel; and outputting any string objects identified in the particular string field. The method can further include selecting, for each string object, a plurality of variable-length signatures based on multiple non-overlapping ordered groups of fixed-length signatures. Each of the variable-length signatures includes one fixed-length signature from each group and a variable-length signature segment connecting each pair of adjacent fixed-length signatures, so that the number of signatures of each string object equals S^n, where S is the scan step size, or the number of signatures in each group of fixed-length signatures, and n is the number of groups of fixed-length signatures. Identifying any string objects in the particular string field then further includes scanning the string field for a plurality of fixed-length signature substrings of the variable-length signatures of the one or more string objects and, at positions where one or more fixed-length signature substrings match, synthesizing the identified fixed-length signature substrings into any variable-length signatures.
In general, in one aspect, the invention provides a string signature scanning method. The method includes selecting one or more fixed-length signatures for each of one or more string objects, and processing the one or more fixed-length signatures of the one or more string objects into one or more formats, the formats including one or more fingerprints and one or more follow-on search data structures for each fixed-length signature. The plurality of fingerprints of a particular string object includes a J-th fingerprint, where the position, in the scan direction, of the first basic unit of the J-th fingerprint within the particular string object equals J modulo the step size of the signature scanning operation, so that the number of fingerprints of the particular string object equals the step size of the signature scanning operation and the particular string object is identifiable at any location in any string field to be scanned. Each fingerprint includes one or more fragments of a particular fixed-length signature among the one or more fixed-length signatures of the particular string object, and the fragments have specific positions anywhere within that fixed-length signature. The method further includes receiving a particular string field made up of data values; identifying any string objects in the particular string field, which includes scanning the string field, at positions spaced one scan step size apart, for a plurality of the fingerprints of the one or more string objects, including searching for two or more fingerprints in parallel, and, at positions where one or more fingerprints match, searching the string field for the follow-on search data structures; and outputting any string objects identified in the particular string field. The method can further include selecting, for each string object, a plurality of variable-length signatures based on multiple non-overlapping ordered groups of fixed-length signatures. Each of the variable-length signatures includes one fixed-length signature from each group and a variable-length signature segment connecting each pair of adjacent fixed-length signatures, so that the number of variable-length signatures of each string object equals the product of the numbers of fixed-length signatures in the respective groups. Identifying any string objects included in the particular string field then includes scanning the string field for a plurality of fingerprints of the one or more string objects, including searching for two or more fingerprints in parallel, and, at positions where one or more fingerprints match, searching the string field for the follow-on search data structures. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
In general, in one aspect, the invention provides a string signature scanning system. The system includes a machine-readable storage device containing a computer program product, and one or more processors operable to execute the computer program product and to run one or more of the following modules. A signature pre-processing module can process one or more signatures into one or more formats that include one or more fingerprints and one or more follow-on search data structures for each fixed-length signature or each fixed-length signature substring of a variable-length signature. The one or more fingerprints include a J-th fingerprint of a particular fixed-length signature or signature substring, where the position, in the scan direction, of the first basic unit of the J-th fingerprint within the fixed-length signature or signature substring equals J modulo the step size of the signature scanning operation, so that the number of fingerprints equals the scan step size and the fixed-length signature or signature substring is identifiable at any location in any string field to be scanned. Each fingerprint includes one or more fragments of the particular fixed-length signature or signature substring, and the fragments have specific positions anywhere within it. A scan pre-processing engine can process an input string field made up of data values into one or more formats required for scanning. A fingerprint scan engine can identify one or more fingerprints of one or more signatures in the input string field, the identifying including scanning the input string field, at positions spaced one scan step size apart, for the one or more fingerprints of the one or more signatures. The system can further include a fixed-length signature search engine that can identify the fixed-length signature, or the fixed-length signature substring of a variable-length signature, corresponding to an identified fingerprint. The system can further include a variable-length signature search engine that can synthesize the identified fixed-length signature substrings of variable-length signatures into any variable-length signature. Other embodiments of this aspect include corresponding methods, apparatus, and computer program products.
Implementations can include one or more of the following features. The signature pre-processing module can select one or more shadow spaces and project one or more fingerprints into the one or more shadow spaces for scanning. The signature pre-processing module can decompose each of one or more fingerprints into one or more fingerprint segments of one or more lengths and store fingerprint synthesis information for each fingerprint segment in a fingerprint database, and the fingerprint scan engine can identify a plurality of fingerprints of the one or more signatures in the input string field, the identifying including scanning the input string field, at positions spaced one scan step size apart, for a plurality of fingerprint segments and, at positions where one or more fingerprint segments are identified, synthesizing the identified fingerprint segments into any fingerprint matches.
The signature pre-processing module can encode one or more signature segments of a signature with one or more mask bits and store the mask bits together with the signature segments. The signature pre-processing module can build a differential search data structure from one or more differing basic units of a plurality of signatures. The signature pre-processing module can build a signature database that includes a fingerprint database, a fixed-length signature database, and, when the signature scanning system has at least one variable-length signature, a signature rule database.
The scan pre-processing engine can further include a scan feeder, a projector, a string field memory, and a shadow field memory. The scan pre-processing engine can process one or more string fields block by block, the processing including one or more of feeding, decoding, normalizing, and converting. Each of the one or more string blocks includes a fingerprint scan region used for fingerprint scanning and signature search, a front signature search region located before the fingerprint scan region and used for signature search, and a rear signature search region located after the fingerprint scan region and used for signature search. Each of the three regions of a string block can be stored in one or more equally sized memory blocks, and all memory blocks of the three regions, alone or together with one or more additional memory blocks, can form a ring that begins with the first memory block of the current front signature search region, to reduce data movement in memory.
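The ring of memory blocks can be sketched as below. This is a simplified illustration that assumes exactly one memory block per region and no additional blocks; the class name `BlockRing` and its methods are hypothetical. Advancing the scan overwrites only the oldest block and moves the head index, so the data of the other two regions stays where it is.

```python
class BlockRing:
    """Three equally sized blocks: front search, fingerprint scan, rear search."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.blocks = [bytearray(block_size) for _ in range(3)]
        self.head = 0  # index of the block holding the current front search region

    def push(self, data: bytes) -> None:
        """Load the next block of the field into the slot of the oldest block."""
        if len(data) != self.block_size:
            raise ValueError("data must fill exactly one block")
        self.blocks[self.head][:] = data       # old front region is overwritten
        self.head = (self.head + 1) % 3        # old scan region becomes the front

    def regions(self):
        """Return (front, scan, rear) contents; copied here only for inspection."""
        return tuple(bytes(self.blocks[(self.head + i) % 3]) for i in range(3))
```

After a push, the former scan region serves as the front search region and the former rear region becomes the scan region, without either being copied.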
The finger scan engine can detect one or more fingerprints with one or more of: one or more hash tables and one or more Bloom filters. The finger scan engine can further comprise a finger scan controller, a fingerprint hash calculator, a fingerprint finder, a fingerprint compositor, and a fingerprint database. The fingerprint hash calculator can compute the hash values of a plurality of hash keys in order, applying a sequential hash function to the mutually non-overlapping prefix fragments of the hash keys. The finger scan engine can comprise one or more fingerprint compositors that synthesize a plurality of fingerprint sections, in parallel or serially, into a match of any fingerprint, using a fingerprint section bitmap and fingerprint length information. The finger scan engine can comprise a fingerprint compositor that further comprises one or more finite automata (FA).
One or more fingerprints of one or more lengths can be decomposed into a plurality of fingerprint sections of the same size and scanned by one or more finger scan engines that have the same scanning step. The finger scan engines can each cover one or more non-overlapping, interleaved positions of the input character string field, so that their total scanning step equals the product of the number of finger scan engines and the original scanning step of a single finger scan engine. Alternatively, they can cover one or more partly overlapping, interleaved positions, so that their total scanning step lies between the original scanning step of a single finger scan engine and that product. The number of finger scan engines whose product of scanning step and memory speed is smaller can be greater than the number of finger scan engines whose product of scanning step and memory speed is larger.
The one or more memories used by the finger scan engines that cover a shorter fingerprint section can be as fast as or faster than the memories used by the finger scan engines that cover a longer fingerprint section, and the memories used by the fixed length condition code Lookup engines that correspond to fingerprints of shorter average length can be as fast as or faster than the memories used by the fixed length condition code Lookup engines that correspond to fingerprints of longer average length. One or more finger scan engines that cover fingerprints shorter than a specific length can, alone or together with the first part of the corresponding fixed length condition code Lookup engines, use the fastest memories available in the scanning system. A plurality of finger scan engines that scan the same one or more fingerprints can share a multiport memory. The finger scan engine can further comprise one or more content-addressable memories (CAM).
The fixed length condition code Lookup engine can further comprise a condition code search engine, a condition code verifier, and a fixed length condition code database. The condition code search engine and the condition code verifier can use a condition code unit comparer and a condition code section comparer to compare one or more masked fragments of a condition code, in order to identify one or more fixed length condition codes. The condition code search engine can search for one or more fixed length condition codes or feature subcodes differentially. The random length condition code Lookup engine can further comprise a condition code composition rule finder, a condition code state verifier, a condition code composition rule database, and a condition code state table. The random length condition code Lookup engine can comprise a finite automaton (FA). One or more of the engines can comprise one or more of: one or more content-addressable memories (CAM) and one or more finite automata (FA).
Particular embodiments described in this specification can be implemented to realize one or more of the following advantages. The invention provides a character string scanning system that scans for the condition codes of a condition code database. The character string scanning system can be updated flexibly and easily. A character string signature scan engine can still provide a very high scanning speed (e.g., 100 Gbps) even when it scans a large number of condition codes (e.g., several hundred thousand), complex condition codes (e.g., up to several kilobytes long, or with the wildcards "*" and "?", ranges, case insensitivity, and logical NOT), and a dynamic condition code database. The scanning speed of the character string scanning system scales with the size and complexity of the condition code database. In addition, the character string scanning system requires little memory and memory bandwidth. The character string scanning system can be implemented in software, in a field programmable gate array (FPGA), or in an application-specific integrated circuit (ASIC). Moreover, the character string scanning system is cost-effective and suits both high-end and low-priced products.
One or more embodiments of the invention are set forth in the following description and the accompanying drawings. Other features and advantages of the invention will be apparent from the description, the drawings, and the claims.
Brief description of the drawings
Fig. 1A shows a block diagram of an exemplary fast character string signature scan system;
Fig. 1B shows an exemplary flow chart for building a character string condition code database;
Fig. 1C shows an exemplary flow chart of a character string signature scan;
Fig. 2A-2C show exemplary data structures of a fingerprint database;
Fig. 2D-2E show an exemplary data structure of a Hash-entry block and an embodiment of a fingerprint compositor;
Fig. 2F-2G show an exemplary data structure of a Hash-entry block and an embodiment of a corresponding fingerprint compositor;
Fig. 2H-2I show an exemplary data structure of a Hash-entry block and an embodiment of a corresponding parallel fingerprint compositor;
Fig. 3A-3B show exemplary data structures of a condition code group linked list and a condition code linked list for searching fixed length condition codes;
Fig. 4A-4B show block diagrams of an exemplary condition code unit comparer and a condition code section comparer that support predefined global unit ranges of a character string field;
Fig. 4C shows a block diagram of an exemplary condition code unit comparer that supports local condition code unit ranges;
Fig. 5A-5C show exemplary data structures of a selected-unit tree and a condition code family linked list for searching fixed length condition codes;
Fig. 6 shows an exemplary data structure of a condition code rule linked list for searching random length condition codes;
Fig. 7 shows an exemplary data structure of a condition code state linked list of a specific character string field;
Fig. 8 shows an exemplary data structure of a Hash-entry block indicated by a condition code state Bloom filter or hash table;
Fig. 9 shows an exemplary computer system.
Like reference numerals and symbols in the various drawings indicate like elements.
Detailed description of the embodiments
Overview
The present invention provides methods and systems for scanning a character string field against a character string condition code database. In one embodiment, a "divide and conquer" scan method scans in a plurality of pipeline stages. Each random length condition code is first decomposed into a plurality of fixed length feature subcodes for scanning, and each fixed length condition code, or each fixed length feature subcode of a random length condition code, is further decomposed into a plurality of condition code sections for scanning. In one embodiment, a method of "coarse scanning" first and "fine scanning" afterwards is used with multiple pipelined scan stages to find character string condition codes. At each scanning position, one or more fingerprints of the fixed length condition codes or fixed length feature subcodes are scanned first. Further checking is needed only at positions where one or more fingerprints match. In addition, the finger scan first scans fingerprint shadows at each scanning position (shadows and shadow spaces are described in detail below). A fingerprint needs to be fully checked only at positions where one or more fingerprint shadows match.
The fingerprint shadow is further scanned section by section at each scanning position, and the fingerprint shadow sections are then synthesized by checking only the possible positions of a fingerprint shadow section within any fingerprint and the possible fingerprint lengths. The full check of a fingerprint shadow needs to be carried out only at positions where one or more synthesized fingerprint shadows match. In addition, scanning a fingerprint shadow section only requires scanning its hash value at each scanning position. A fingerprint shadow section needs to be checked further only at positions where one or more hash values match. In one embodiment, the character string condition codes used to scan the character string field are pre-processed before scanning, and the pre-processed condition codes are stored in the condition code database for use by the multiple pipelined scan stages.
Fig. 1A shows a fast character string signature scan engine 100. The scanning engine comprises a condition code pretreatment module 90, a scanning pre-service engine 120, a finger scan engine 140, a fixed length condition code Lookup engine 160, and a random length condition code Lookup engine 180. Character string signature scan engine 100 scans a character string field against one or more character string condition code databases and can return the numbers 190 of the matched condition codes and their positions in the character string field, so that specific condition codes are identified quickly. In one embodiment, the condition code database comprises a fingerprint database 148, a fixed length condition code database 166, and a condition code rule database 186.
Fig. 1B shows the procedure 91 for pre-processing each character string condition code. In one embodiment, a random length condition code is first decomposed into a plurality of fixed length feature subcodes and random length feature subcodes, and information on the relations between the fixed length feature subcodes is stored in condition code rule database 186 (step 92).
For fast scanning, a fixed length condition code or feature subcode output by step 92 can be further decomposed into a plurality of fragments that can be checked in an optimal order (step 94). In one embodiment, the first fragment or the first few fragments are particularly important and can serve as the fingerprint of the fixed length condition code or feature subcode. The fingerprint of a character string condition code can be scanned quickly and can also reduce the probability of false negative or false positive matches. In one embodiment, the probability of a false negative match is zero. In one embodiment, the number of distinct fingerprint lengths of a plurality of condition codes is smaller than the number of distinct condition code lengths, which speeds up scan methods that must scan patterns of different lengths separately. In one embodiment, when the scanning step is larger than one elementary unit, a plurality of fingerprints can be used for one character string condition code, where the first elementary unit of each fingerprint is shifted by one or more elementary units in the scanning direction relative to the first elementary unit of the previous fingerprint. The scanning direction is the direction in which the scanning position moves through an input character string field.
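The use of several shifted fingerprints with a scanning step larger than one unit can be illustrated with a minimal Python sketch. The function names, the fixed fingerprint length `fp_len`, and the choice of exactly `step` fingerprints shifted by one unit each are assumptions made for illustration, not details from the specification. The sketch also assumes the condition code is at least `fp_len + step - 1` units long.

```python
def shifted_fingerprints(code: str, fp_len: int, step: int) -> list[tuple[str, int]]:
    """Return (fingerprint, offset) pairs, one for each shift 0..step-1 (hypothetical scheme)."""
    return [(code[k:k + fp_len], k) for k in range(step) if k + fp_len <= len(code)]


def scan_with_step(field: str, code: str, fp_len: int, step: int) -> list[int]:
    """Scan only every `step`-th position, then verify the whole code at each candidate."""
    fingerprints = shifted_fingerprints(code, fp_len, step)
    hits = []
    for pos in range(0, len(field) - fp_len + 1, step):
        window = field[pos:pos + fp_len]
        for fingerprint, offset in fingerprints:
            start = pos - offset  # where the full code would have to begin
            if window == fingerprint and start >= 0 and field.startswith(code, start):
                hits.append(start)
    return sorted(hits)
```

Because one of the `step` shifted fingerprints always lines up with a scanned position, an occurrence that starts between two scanned positions is still found.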
The fingerprint of a fixed length condition code or feature subcode can be further decomposed into fingerprint sections, projected into one or more shadow spaces as required, and then added to fingerprint database 148 and thus to the condition code database (step 96). Because fingerprints of different lengths may have to be scanned independently, a fingerprint can be further decomposed into a plurality of fingerprint sections that are scanned in parallel or serially. In some embodiments, fingerprints are decomposed into fingerprint sections so that the number of distinct fingerprint section lengths is one, or is much smaller than the number of distinct fingerprint lengths, which speeds up scan methods that must scan patterns of different lengths separately. The scan results of the fingerprint sections can subsequently be synthesized to detect a fingerprint.
In one embodiment, to further improve scan efficiency and the ability to scan complex condition codes, fingerprints and other condition code sections can first be projected into one or more shadow spaces and scanned there, and then verified in the original space. A shadow space can be selected to simplify and accelerate the scan process while still covering all possible forms of a fingerprint or fingerprint section. One shadow space can cover the forms of a plurality of fingerprints or condition code sections. For example, to support case-sensitive and per-character case-insensitive matching at the same time, the shadow space can be the space of lowercase-only or uppercase-only characters. As a special case, the shadow space can be the original space itself.
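A minimal Python sketch of scanning in a lowercase shadow space and verifying in the original space follows. The function names and the use of `str.find` as the scanner are assumptions for illustration. The sketch assumes ASCII text, so that lowercasing never changes the length of the field.

```python
def to_shadow(text: str) -> str:
    """Project text into an all-lowercase shadow space (assumes ASCII input)."""
    return text.lower()


def scan_with_shadow(field: str, fingerprint: str, case_sensitive: bool) -> list[int]:
    """Find candidates in shadow space, then verify case-sensitive ones in the original space."""
    shadow_field = to_shadow(field)
    shadow_fp = to_shadow(fingerprint)
    hits = []
    start = shadow_field.find(shadow_fp)
    while start != -1:
        candidate = field[start:start + len(fingerprint)]
        # A case-insensitive fingerprint is already confirmed by the shadow match.
        if not case_sensitive or candidate == fingerprint:
            hits.append(start)
        start = shadow_field.find(shadow_fp, start + 1)
    return hits
```

A single pass over the shadow field serves both kinds of fingerprint, and only the case-sensitive ones pay for the extra comparison in the original space.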
A fully specified fingerprint, or any other fully specified condition code section, remains fully specified in its shadow in any shadow space. Therefore, a fully specified fingerprint can always be scanned in any shadow space as well as in the original space. In one embodiment, to reduce the number of spaces that must be scanned, all fully specified fingerprints are scanned in one of the one or more shadow spaces, so that no fingerprint is scanned in the original space. In another embodiment, a condition code database has only one shadow space, and all fingerprints of the condition code database are scanned in that single shadow space.
In one embodiment, the fingerprint database comprises one or more Bloom filters or hash tables. In another embodiment, to further reduce the probability of false positive matches and hash collisions when the original hash key is too long or too expensive to compare during the finger scan, the fingerprint database comprises one or more modified Bloom filters or hash tables that carry additional hash value bits, a fingerprint length, or both.
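A hash table whose entries carry additional hash value bits and the fingerprint length might look like the following Python sketch. The table size, the 16-bit tag width, the use of BLAKE2b as the hash function, and all names are assumptions chosen for illustration. The specification does not prescribe them.

```python
import hashlib

TABLE_BITS = 10  # hypothetical table size: 2**10 buckets
TAG_BITS = 16    # hypothetical number of additional hash value bits kept per entry


def _hash(key: bytes) -> int:
    """64-bit hash of the key (BLAKE2b is an arbitrary choice for this sketch)."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def _index_and_tag(key: bytes) -> tuple[int, int]:
    h = _hash(key)
    index = h & ((1 << TABLE_BITS) - 1)              # bucket index from the low bits
    tag = (h >> TABLE_BITS) & ((1 << TAG_BITS) - 1)  # additional hash value bits
    return index, tag


def build_table(fingerprints: list[bytes]) -> dict[int, list[tuple[int, int]]]:
    """Store (tag, fingerprint length) per bucket instead of the full fingerprint."""
    table: dict[int, list[tuple[int, int]]] = {}
    for fp in fingerprints:
        index, tag = _index_and_tag(fp)
        table.setdefault(index, []).append((tag, len(fp)))
    return table


def may_match(table: dict[int, list[tuple[int, int]]], window: bytes) -> bool:
    """True if the window may be a stored fingerprint; never False for a stored one."""
    index, tag = _index_and_tag(window)
    return (tag, len(window)) in table.get(index, [])
```

Storing a short tag and a length keeps entries small while still rejecting most non-matching windows. A stored fingerprint is never missed, so false negatives cannot occur, and the remaining false positives are left to the later verification stages.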
Finally, the fragments of all fixed length condition codes and of the fixed length feature subcodes of random length condition codes are encoded and stored in fixed length condition code database 166, for searching fixed length condition codes or feature subcodes (step 98). In one embodiment, the fragments can be encoded with unit masks or sub-unit masks to support specific conditions of character string condition code matching (e.g., "don't care", "equal", "not equal", "case-insensitive", "case-sensitive", "within a range", "outside a range"). In one embodiment, the masked fragments can subsequently be compiled into a linked list or any other search structure (e.g., a tree). In another embodiment, the fragment masks, or the masks of a fixed length condition code or feature subcode, can be compiled into the search structure to save storage space. In another embodiment, a group of character string condition codes can further be differentially encoded using the units that differ between the condition codes, to form a differential data structure that can be searched quickly (e.g., a differential tree).
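The masked comparison of units can be sketched in Python as follows. The `Mask` enumeration, the tuple layout `(mask, reference, upper bound)`, and the function names are hypothetical. They merely illustrate how one mask per unit could select among the matching conditions listed above.

```python
from enum import Enum


class Mask(Enum):
    ANY = 0            # "don't care"
    EQUAL = 1          # case-sensitive equality
    NOT_EQUAL = 2
    NOCASE = 3         # case-insensitive equality
    IN_RANGE = 4
    NOT_IN_RANGE = 5


def unit_matches(mask: Mask, unit: str, ref, hi=None) -> bool:
    """Compare one input unit against one masked condition code unit."""
    if mask is Mask.ANY:
        return True
    if mask is Mask.EQUAL:
        return unit == ref
    if mask is Mask.NOT_EQUAL:
        return unit != ref
    if mask is Mask.NOCASE:
        return unit.lower() == ref.lower()
    in_range = ref <= unit <= hi  # ref and hi bound the range for the two range masks
    return in_range if mask is Mask.IN_RANGE else not in_range


def segment_matches(segment: list[tuple], text: str) -> bool:
    """A segment matches when every unit satisfies its own mask."""
    return len(text) == len(segment) and all(
        unit_matches(mask, unit, ref, hi) for (mask, ref, hi), unit in zip(segment, text)
    )
```

For example, a segment that requires "A", then "b" in either case, then a digit, then any unit accepts "AB7x" but rejects "aB7x".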
Fig. 1C shows the scanning procedure 101 for character string condition codes. A character string field to be scanned is first decoded and converted (e.g., by scanning pre-service engine 120) into the formats required by one or more subsequent scan stages (step 102). The character string field is first scanned (e.g., by finger scan engine 140) in shadow space by comparing its shadow with the fingerprint shadows of one or more character string condition codes, and any fingerprint shadow identified is then verified in the original fingerprint space (step 104). Step 106 checks whether there is a fingerprint match.
After the scan, the absence or presence of a match produces, respectively, an output indicating that no condition code matches or an output indicating that at least one condition code matches. In one embodiment, finger scan engine 140 can provide zero false negative matches and a sufficiently small probability of false positive matches. If there is no fingerprint match, scanning of the current scanning position is complete and the scan can move to the next scanning position (step 108). If there is a fingerprint match, the at least one matching condition code is searched further (e.g., by fixed length condition code Lookup engine 160) to identify it definitively as a fixed length condition code or as a fixed length feature subcode of a random length condition code (step 110).
Step 112 checks whether a fixed length condition code or feature subcode matches. If there is no match, scanning of the current scanning position is complete and the scan can move to the next scanning position (step 108). If one or more fixed length condition codes match, the number of each matching fixed length condition code is output and scanning of the current scanning position ends (step 118). If a fixed length feature subcode that is a component of one or more random length condition codes has been identified, the matched feature subcodes are synthesized dynamically (e.g., by random length condition code Lookup engine 180) to detect one or more random length condition codes (step 114). Step 116 checks whether a random length condition code matches. If there is no match, scanning of the current scanning position is complete and the scan can move to the next scanning position (step 108). If there are one or more matches, the number of each matching random length condition code is output and scanning of the current scanning position ends (step 118).
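The coarse-then-fine flow of steps 104 to 118 can be sketched for fixed length condition codes in a few lines of Python. This is a simplified illustration under stated assumptions: every fingerprint is the first `FP_LEN` units of its condition code, a plain dictionary stands in for the fingerprint database, and the random length stage (steps 114 and 116) is omitted. All names are hypothetical.

```python
FP_LEN = 4  # hypothetical single fingerprint length


def build_fingerprint_db(codes: dict[int, str]) -> dict[str, list[int]]:
    """Map each fingerprint (here: the first FP_LEN units) to the codes that share it."""
    db: dict[str, list[int]] = {}
    for code_id, code in codes.items():
        db.setdefault(code[:FP_LEN], []).append(code_id)
    return db


def scan(field: str, codes: dict[int, str]) -> list[tuple[int, int]]:
    """Return (code number, position) for every fixed length condition code found."""
    db = build_fingerprint_db(codes)
    matches = []
    for pos in range(len(field) - FP_LEN + 1):
        candidates = db.get(field[pos:pos + FP_LEN])  # coarse scan (steps 104/106)
        if not candidates:
            continue                                  # no fingerprint: next position (step 108)
        for code_id in candidates:                    # fine scan (steps 110/112)
            if field.startswith(codes[code_id], pos):
                matches.append((code_id, pos))        # output the number (step 118)
    return matches
```

Most positions are dismissed by the cheap fingerprint lookup, so the full comparison runs only at the few positions where a fingerprint matches.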
In one embodiment, the probability of a false positive match of each fingerprint is stored in fingerprint database 148 during the condition code pre-processing procedure 91. If step 106 finds a fingerprint match, the false positive probability of the matched fingerprint is checked. If that probability is low enough (e.g., below a specified threshold), scanning of the current scanning position is complete and the scan can move to the next scanning position (step 108). In another embodiment, the false positive probability of every fingerprint in the database is low enough, so the false positive probabilities need not be stored or checked. Scanning of the current scanning position is then complete after the finger scan, and the scan can move to the next scanning position (step 108).
In step 102, scanning pre-service engine 120 first decodes and normalizes the character string field and transforms it into the same format as the condition codes in the condition code database. In one embodiment, the character string signature scan is carried out on the whole character string field. In another embodiment, however, storing the whole character string field is impossible because of the limited storage space or the low latency requirement of some systems. In that case, the character string field can be decomposed in step 102 into several string chunks of a predetermined size, and the character string signature scan is carried out on each string chunk.
After loading a string chunk, described string chunk, by decoded, is standardized, and the needed different-format of sweep phase after being transformed to.In one embodiment, decoding and normalization process can be supported different compressed format (as LZS, PKZip, and gzip), different coding standard (as UU coding, MIME coding, HTML, and XML), and delete random " counter-scanning " junk data.
In one embodiment, decoded character string field is by further being projected to the shadow space of one or more condition code Database Requirements, to support complicated condition code.For example, decoded character string field is converted into full small letter (as a shadow space), to support the character string signature scan of case-insensitive.Character string signature scan can be first carry out on character string field decoded in full small letter, and then with former case sensitive decoded character string field and full the small letter character string field of having decoded verify.
In step 104, the finger scan can first identify the fingerprints whose shadows are fully specified. To scan a large number of complex condition codes quickly, in one embodiment finger scan engine 140 can scan a plurality of elementary units simultaneously with one or more hash tables or Bloom filters. In one embodiment, finger scan engine 140 can use fingerprint length multiplexing and hash value multiplexing to improve memory utilization and to reduce the probability of false positive matches and fingerprint collisions. Hash value multiplexing and fingerprint length multiplexing can further reduce the probability of false positive matches (i.e., erroneous condition code matches) while guaranteeing zero false negative matches (i.e., missed condition code matches).
Step 110 scans for fixed length condition codes. Fixed length condition codes and the fixed length feature subcodes of random length condition codes can be identified in this fixed length signature scan stage. The fixed length signature scan needs to be carried out only when at least one fingerprint matches during the finger scan. The fixed length condition codes or feature subcodes that correspond to a matched fingerprint can be matched one by one or with another search structure (e.g., a tree). In one embodiment, masked comparison of elementary units or sub-units can be used to support specific conditions of character string condition code matching (e.g., "don't care", "equal", "not equal", "case-insensitive", "case-sensitive", "within a range", "outside a range").
Step 114 scans for random length condition codes. In one embodiment, the random length signature scan needs to be carried out only when the condition codes being scanned include random length condition codes with one or more random length characters. The fixed length feature subcodes of a random length condition code are identified in the fixed length signature scan stage, and the identified fixed length feature subcodes can be linked together dynamically in the random length signature scan stage to synthesize one or more original random length condition codes. The synthesis can be realized with a static composition rule table and a dynamic synthesis state table. The composition rule table specifies the rules for synthesizing fixed length feature subcodes into random length condition codes, and the synthesis state table maintains the current synthesis state according to the composition rule table.
Pre-processing of the condition code database
In one embodiment, to improve scanning speed and memory efficiency, condition code pretreatment module 90 pre-processes the condition codes before the character string field is scanned. Before storing the condition codes in the condition code database, condition code pretreatment module 90 can first decompose, transform, and encode them into one or more formats. In one embodiment, condition code pretreatment module 90 can build and maintain a fingerprint database 148, a fixed length condition code database 166, and a condition code rule database 186.
When the condition code database has one or more condition codes that contain one or more random length subcodes (such as "*", which represents zero or more arbitrary elementary units, or "(bc){3-6}", which represents "bc" repeated 3 to 6 times), each such random length condition code can first be decomposed at its random length feature subcodes into a plurality of fixed length feature subcodes. For example, if a condition code is "subcode 1*subcode 2*subcode 3", where subcode 1, subcode 2, and subcode 3 are all fixed length subcodes that contain no random length characters, the condition code can be decomposed into subcode 1, subcode 2, and subcode 3. Each "*" in "subcode 1*subcode 2*subcode 3" can be replaced by any random length subcode. In one embodiment, each fixed length feature subcode can first be scanned independently and then synthesized into the original random length condition code.
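The decomposition in the "subcode 1*subcode 2*subcode 3" example can be sketched in Python as follows. The sketch handles only the "*" wildcard and assumes that "*" is never escaped. The rule fields `order` and `last` are hypothetical names for the order and last-subcode flag that a rule database might record.

```python
def decompose(code: str) -> list[dict]:
    """Split a random length condition code at '*' into fixed length feature subcodes."""
    subcodes = [part for part in code.split("*") if part]
    return [
        {"subcode": part, "order": i, "last": i == len(subcodes) - 1}
        for i, part in enumerate(subcodes)
    ]
```

Each returned entry can be scanned as an ordinary fixed length subcode, while the order and last-subcode flag keep the information needed to reassemble the original condition code.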
In one embodiment, condition code rule database 186 can be built from the positional information of each fixed length feature subcode (e.g., its order, a last-subcode flag, and the distance or distance range to the next fixed length subcode), for synthesizing the fixed length feature subcodes. In one embodiment, when the random length subcode between two consecutive fixed length feature subcodes is not a "don't care", condition code rule database 186 can further contain a description of that random length subcode, for synthesizing the fixed length feature subcodes. In another embodiment, the fixed length feature subcodes and random length feature subcodes can be used to build one or more finite automata (FA) for synthesizing the fixed length feature subcodes, where each fixed length feature subcode as a whole is one input symbol.
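Synthesis of matched fixed length subcodes can be sketched in Python under a strong simplifying assumption: the gaps between consecutive subcodes are all unbounded "*" gaps, so distance ranges and described random length subcodes are not modeled. The variable `state` plays the role of a one-entry synthesis state table, and all names are hypothetical.

```python
def match_random_length(field: str, subcodes: list[str]) -> bool:
    """True if the subcodes occur in order, without overlapping, anywhere in the field."""
    state = 0     # index of the next expected subcode (dynamic synthesis state)
    earliest = 0  # first position at which the next subcode may start
    for pos in range(len(field)):
        if state == len(subcodes):
            break  # the last subcode has been matched: synthesis is complete
        if pos >= earliest and field.startswith(subcodes[state], pos):
            earliest = pos + len(subcodes[state])  # the next subcode must follow this one
            state += 1
    return state == len(subcodes)
```

Taking the earliest occurrence of each subcode is sufficient here because an unbounded gap never penalizes an early match. With bounded distance ranges this greedy choice would no longer be safe, and several candidate states would have to be tracked.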
In one embodiment, a fixed length condition code or feature subcode is further decomposed into a plurality of fragments that can be searched in an optimal order (these fragments include the fingerprint of the condition code or feature subcode). The fragments can have different sizes or the same size. To prevent false negative matches, that is, to avoid missing any condition code, the union of all the fragments equals the original fixed length condition code. During the signature scan, as the number of matched fragments increases, the probability of a false positive match decreases (the confidence in the match increases). The scan ends at the first unmatched fragment or at the last matched fragment. In one embodiment, the fragments of a condition code can be selected so that the scan either terminates on a non-matching condition code or identifies a match with zero false positives as early as possible.
In one embodiment, the fingerprint comprises a plurality of fragments. In another embodiment, the fingerprint has only one fragment and is encoded as a triple {fragment, length, offset}, where fragment is the first fragment of the character string condition code to be scanned, that is, the fingerprint of the condition code; length is the length of the fingerprint; and offset is the offset of the fingerprint within a fixed length condition code or feature subcode. A particular fingerprint is a specific fragment of a fixed length condition code or of a fixed length feature subcode of a random length condition code.
In one embodiment, the shadow space can be selected to simplify and accelerate the signature scan while covering the multiple forms of fingerprints or condition code fragments. Ideally, the shadow value can be used directly as a hash key. For example, to support case-sensitive and per-unit case-insensitive matching at the same time, the shadow space can be the space of all-lowercase or all-uppercase characters. As another example, to scan for driver's license numbers that consist of one letter followed by 7 digits, where each letter and digit can further be restricted to an arbitrary range of letters or digits, the shadow space can replace all letters with one code or any one letter (e.g., "a") and all digits with another code or any one digit (e.g., "0"). As a further example, to scan for Social Security Numbers (SSN) that consist of three groups of digits separated by spaces or "-", where each digit can further be restricted to an arbitrary digit range, the shadow space can replace all digits with one code or any one digit (e.g., "0") and replace spaces and "-" with another code, a space, or "-". As a special case, the shadow space can be the original space itself.
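The driver's license example can be sketched in Python as follows. The projection maps every ASCII letter to "a" and every ASCII digit to "0", as the text suggests. The function names and the `first`/`last` parameters that restrict the leading letter are hypothetical illustrations of an original-space range check.

```python
def shadow(text: str) -> str:
    """Project text into a shadow space: ASCII letters become 'a', ASCII digits become '0'."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            out.append("a")
        elif ch.isascii() and ch.isdigit():
            out.append("0")
        else:
            out.append(ch)
    return "".join(out)


LICENSE_SHADOW = "a" + "0" * 7  # shadow of "one letter followed by 7 digits"


def find_licenses(field: str, first: str = "A", last: str = "Z") -> list[int]:
    """Scan in shadow space, then verify the letter range in the original space."""
    shadow_field = shadow(field)
    hits = []
    pos = shadow_field.find(LICENSE_SHADOW)
    while pos != -1:
        if first <= field[pos] <= last:  # original-space check of the leading letter
            hits.append(pos)
        pos = shadow_field.find(LICENSE_SHADOW, pos + 1)
    return hits
```

In shadow space the whole family of license numbers collapses to the single fully specified string "a0000000", which can be used directly as a hash key. The per-letter and per-digit ranges are checked only at the few positions where that shadow matches.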
In one embodiment, after a fingerprint has been scanned in shadow space, it can be verified in the original space immediately after its fingerprint shadow is detected. In another embodiment, the verification can be carried out after some or all of the other fragments have been verified. If the fingerprint is completely covered by other fragments, no separate verification is needed.
The fingerprint should be selected so as to speed up the finger scan while also giving the lowest probability of a false positive match after the finger scan. In one embodiment, a fingerprint can be of any size and at any position within the condition code. In another embodiment, the size of the fingerprint or its position within the condition code is restricted to meet system requirements. For example, to meet the latency requirement of a system, the offset of the fingerprint cannot exceed a particular value.
In one embodiment, a fingerprint can be selected according to one or more of the following conditions: 1) the shadow of the fingerprint has no wildcards or ranges, so that it can be scanned quickly; 2) the probability that the fingerprint occurs in the character string field to be scanned is very small; 3) the number of fingerprints shared by a plurality of condition codes is as small as possible; and 4) the number of fingerprints that have an identical fingerprint section is as small as possible.
Other selection conditions can be added according to system requirements. In common network and non-network applications, all or most character string condition codes usually contain at least one fairly long fragment that has no wildcards or ranges after projection into a selected shadow space. In one embodiment, the first selection condition is a necessary condition. In another embodiment, the first selection condition can be further restricted so that every fingerprint is fully specified in at least one shadow space. A condition code that does not contain a fingerprint meeting the selection conditions can either be expanded into a plurality of condition codes that do contain such fingerprints, or be scanned with a different scanning method that does not require condition code expansion.
The fingerprint can be selected by examining all fragments of a condition code that meet the first selection condition. Other fingerprint parameters can also be taken into account when selecting a fingerprint. Choosing as the fingerprint, under the second selection condition, a condition code section that is very unlikely to occur in the character string field to be scanned reduces the probability of false positive fingerprint matches. In addition, keeping the number of fingerprints that share an identical fingerprint section as small as possible further reduces that probability. Although a fingerprint can be shorter than 8 elementary units or longer than 32, its length is usually between 8 and 32 elementary units.
Because a signature can be very long (for example, hundreds or thousands of basic units), there can also be many different fingerprint lengths. However, in one embodiment, fingerprints of different lengths are scanned separately, so the scan becomes slower as the number of lengths grows. In one embodiment, to reduce scan complexity, the number of fingerprint lengths can be limited according to the requirements and architecture of the particular system, for example to fewer than 16. In one embodiment, the fingerprint length can be chosen from a predetermined table of lengths. In addition, depending on system requirements and architecture, the fingerprint lengths can follow an exponential progression (such as 2, 4, 8, 16, and 32), a linear progression (such as multiples of 4: 4, 8, 12, 16, 20, 24, 28, and 32), or another progression (such as 2, 3, 5, 8, 13, 21, 34).
In one embodiment, the fingerprint of a signature can be selected with an algorithm. For example, the following algorithm can be used to select the fingerprint of a fixed-length signature or sub-signature (assuming that the scan step equals 1, that the fingerprint lengths from shortest to longest are fixed as l_0, l_1, l_2, ..., l_(m-1), l_m, that the fingerprint is scanned in segments, and that the shadow space used for the fingerprint scan is given):
1. Find all sub-pieces of the signature that are fully specified in the shadow space.
2. For each such sub-piece longer than l_m, find all sub-pieces of length l_m.
3. For each sub-piece of length l_m, record the number of times N_c that the sub-piece has been selected as a fingerprint across all signatures and the number of times N_s that it shares the same first fingerprint segment with other fingerprints, and compute a cost value with a cost function based on N_c, N_s, and l_m.
4. Repeat steps 2 and 3 for sub-pieces of length l_(m-1), ..., l_2, l_1, and l_0.
5. Among the sub-pieces found in steps 2 to 4, select the one with the minimum cost as the fingerprint.
The result of the above steps depends on the order in which the signatures are processed. Several random processing orders can be tried as needed to find different fingerprints. In one embodiment, the cost value is the concatenation, from most significant to least significant, of (m-i), N_c, and N_s, where i = 0, 1, 2, ..., m is the index of the fingerprint length l_i. In one embodiment, once a fingerprint of a particular length has been found, the shorter sub-pieces need not be examined.
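By way of illustration only, the selection steps above can be sketched in Python as follows. The function name `select_fingerprint`, its parameters, and the use of a tuple comparison to model the concatenated cost value (m-i, N_c, N_s) are assumptions made for this sketch and are not taken from the description; the counters N_c and N_s are supplied by the caller.

```python
from collections import Counter

def select_fingerprint(pieces, lengths, chosen_count, first_segment_count, segment_len):
    """Return the minimum-cost sub-piece for a scan step of 1, or None.

    pieces: fully specified shadow pieces of one signature (step 1).
    lengths: allowed fingerprint lengths l_0 < l_1 < ... < l_m.
    chosen_count: Counter, how often a sub-piece is already a fingerprint (N_c).
    first_segment_count: Counter, how often a first segment is already used (N_s).
    segment_len: length of the first fingerprint segment (hypothetical parameter).
    """
    m = len(lengths) - 1
    best = None
    for i in range(m, -1, -1):                 # steps 2 to 4: longest length first
        size = lengths[i]
        for piece in pieces:
            for start in range(len(piece) - size + 1):
                sub = piece[start:start + size]
                n_c = chosen_count[sub]
                n_s = first_segment_count[sub[:segment_len]]
                # Tuples compare left to right, which models concatenating
                # (m-i), N_c, N_s from most to least significant.
                cost = (m - i, n_c, n_s)
                if best is None or cost < best[0]:
                    best = (cost, sub)
    return None if best is None else best[1]   # step 5: minimum cost wins
```

With empty counters, `select_fingerprint(["readme123.exe"], [4, 8, 12], Counter(), Counter(), 4)` picks the first 12-unit sub-piece; once that sub-piece has already been used as a fingerprint, the next 12-unit sub-piece becomes cheaper and is chosen instead.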
In another embodiment, the fingerprint scan engine 140 advances one scan step at a time. At each scan position, the fingerprint scan engine 140 scans the fingerprints of different lengths serially or in parallel. The scan speed is therefore proportional to the scan step (that is, the number of basic units between two consecutive scan positions). In one embodiment, to increase scan speed, the fingerprint scan can advance by multiple basic units at a time instead of one. To guarantee zero false negatives, each string signature then uses multiple fingerprints, and the number of fingerprints equals the scan step. In other words, a fixed-length signature or sub-signature is inserted into the signature database with multiple fingerprints, the number of which equals the scan step. The first basic unit of the J-th fingerprint of a particular signature is the (J+k*S)-th basic unit of that signature in the scan direction, where S is the scan step, k is a non-negative integer, and J = 0, 1, 2, ..., S-1. The signature can then be found at any position in the string field to be scanned. For example, if the signature is "[Rr][Ee][Aa][Dd][Mm][Ee]123.exe", the scan step is 4, and the fingerprint lengths are 4, 8, and 12, the following four fingerprints can be selected: "[Rr][Ee][Aa][Dd][Mm][Ee]123.ex", "[Ee][Aa][Dd][Mm][Ee]123.exe", "[Aa][Dd][Mm][Ee]123.", and "[Dd][Mm][Ee]123.e", where [Rr], [Ee], [Aa], [Dd], and [Mm] denote the case-insensitive letters r, e, a, d, and m, respectively. The signature "[Rr][Ee][Aa][Dd][Mm][Ee]123.exe" is inserted into the fingerprint database four times, once with each of the four fingerprints. When the scan step is 1, only one fingerprint is needed and the signature is inserted only once.
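The example above can be reproduced with a short Python sketch. The function name `fingerprints_for_step` is hypothetical, the signature is represented by its lowercase shadow, and a simple "longest length that fits" rule is assumed in place of the cost-based selection described elsewhere in this description.

```python
def fingerprints_for_step(shadow, step, lengths):
    """Pick one fingerprint per residue J = 0..step-1.

    The J-th fingerprint starts at an offset of the form J + k*step inside
    the signature shadow. Assumed rule: prefer the longest allowed length
    that still fits, and the smallest offset among equally long candidates.
    """
    result = []
    for j in range(step):
        best = None
        for offset in range(j, len(shadow), step):          # offsets J + k*S
            for size in sorted(lengths, reverse=True):
                if offset + size <= len(shadow):
                    if best is None or size > len(best[1]):
                        best = (offset, shadow[offset:offset + size])
                    break                                   # longest fit found
        if best is None:
            raise ValueError(f"no fingerprint fits at residue {j}")
        result.append(best)
    return result

# Lowercase shadow of "[Rr][Ee][Aa][Dd][Mm][Ee]123.exe", scan step 4.
for offset, fingerprint in fingerprints_for_step("readme123.exe", 4, (4, 8, 12)):
    print(offset, fingerprint)
```

Whatever the alignment of the signature within the scanned field, exactly one of the four fingerprints begins on a scan position, which is why the number of fingerprints equals the scan step.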
Scanning for the multiple fingerprints can start at any position within the first S units (that is, within the first scan step) of the input string field. For example, in one embodiment, the scan starts at position 0 and visits positions (k*S), where k is a non-negative integer. In any input string field, position (k*S) is covered by the 0th fingerprint, position (k*S+1) by the (S-1)-th fingerprint, position (k*S+2) by the (S-2)-th fingerprint, ..., and position (k*S+S-1) by the 1st fingerprint. That is, a signature occurrence that begins at position (k*S+1) is detected through its (S-1)-th fingerprint, whose first unit falls on the scan position ((k+1)*S). In another embodiment, the scan starts at position (S-1) and visits positions (k*S+S-1), where k is a non-negative integer. In any input string field, position (k*S) is covered by the (S-1)-th fingerprint, position (k*S+1) by the (S-2)-th fingerprint, position (k*S+2) by the (S-3)-th fingerprint, ..., position (k*S+S-2) by the 1st fingerprint, and position (k*S+S-1) by the 0th fingerprint.
To increase the scan step to S basic units, in one embodiment the algorithm used to select one fingerprint for each fixed-length signature or sub-signature when the scan step is 1 can be modified as follows to select the S fingerprints of each fixed-length signature or sub-signature: steps 1 to 4 are unchanged, and step 5 becomes: among all sub-pieces found in steps 2 to 4 whose offset within the fixed-length signature or sub-signature in the scan direction is (J+k*S), select the sub-piece with the minimum cost value as the J-th of the S fingerprints, where J = 0, 1, 2, ..., S-1 and k is a non-negative integer.
Usually each string object has only one signature, and to support a scan step of S basic units, each fixed-length signature, or each fixed-length sub-signature of a variable-length signature, needs S fingerprints. In another embodiment, to support a scan step of S basic units, S fixed-length signatures can be used to identify a string object, such that the offset within the string object of the first basic unit, in the scan direction, of the J-th fixed-length signature is (J+k*S), where J = 0, 1, 2, ..., (S-1) and k is a non-negative integer. The string object can then be identified at any position of any string field to be scanned. In one embodiment, the S fixed-length signatures can be scanned without fingerprints. In another embodiment, each of the S fixed-length signatures can be scanned using one fingerprint.
In one embodiment, multiple variable-length signatures built from several groups of non-overlapping, ordered fixed-length signatures can further be selected to identify a string object. Each group has S fixed-length signatures, where the offset within the string object of the first basic unit, in the scan direction, of the J-th fixed-length signature is (J+k*S), with J = 0, 1, 2, ..., (S-1) and k a non-negative integer, thereby forming S^n variable-length signatures, where n is the number of non-overlapping, ordered groups of S fixed-length signatures. One fixed-length signature is selected from each group, and these fixed-length signatures are combined with one or more variable-length string pieces into one of the S^n variable-length signatures, so that the string object can be identified at any position of any string field to be scanned. After synthesis, each original fixed-length signature in the n groups of S fixed-length signatures becomes a fixed-length sub-signature of several of the synthesized variable-length signatures. In one embodiment, the variable-length signatures can be scanned without fingerprints. In another embodiment, each fixed-length sub-signature of the variable-length signatures can be scanned using one selected fingerprint.
In one embodiment, to support a scan step of S basic units, P fixed-length signatures can be selected for a string object, and one or more fingerprints can further be selected for each fixed-length signature, such that the total number of fingerprints of the string object equals S. The offset within the string object of the first basic unit, in the scan direction, of the J-th of the S fingerprints of the string object is (J+k*S), where J = 0, 1, 2, ..., (S-1) and k is a non-negative integer, so that the string object can be identified at any position of any string field to be scanned.
In another embodiment, multiple variable-length signatures built from several groups of non-overlapping, ordered fixed-length signatures can further be selected to identify a string object. The i-th group of non-overlapping, ordered fixed-length signatures has P_i fixed-length signatures, each of which further has one or more fingerprints, such that the total number of fingerprints in the i-th group equals S, where i = 0, 1, 2, ..., n-1 and n is the number of groups of non-overlapping, ordered fixed-length signatures. The offset within the string object of the first basic unit, in the scan direction, of the J-th of the S fingerprints of each group is (J+k*S), where J = 0, 1, 2, ..., (S-1) and k is a non-negative integer.
One fixed-length signature is selected from each group, and these fixed-length signatures are combined with one or more variable-length strings into one of the P_0 * P_1 * P_2 * ... * P_(n-1) variable-length signatures, so that the string object can be identified at any position of any string field to be scanned. Each original fixed-length signature in the n groups of fixed-length signatures becomes a fixed-length sub-signature of several of the synthesized variable-length signatures.
Once the signatures, fingerprints, and shadow spaces of each string object of a scan system have been determined, the shadow of a fingerprint can be scanned as a whole. Fingerprint shadows of different lengths can be scanned in parallel or serially. In one embodiment, the shadows of fingerprints of different lengths are scanned serially, each as a whole. In one embodiment, a string signature can be inserted into the signature database using the following pseudocode:

for (i = 0; i <= S-1; i++) {
    {FingerprintShadow_i, h_i} = FingerprintSelect(StringSignature);
    k_i = h_i / S;
    PreviousHash = IV;
    for (j = 0; j <= k_i - 1; j++) {
        HashLength = j * S;
        CurrentHashString = FingerprintShadow_i[HashLength, HashLength + S - 1];
        CurrentHash = Hash(CurrentHashString, PreviousHash);
        PreviousHash = CurrentHash;
    }
    if (i == 0)
        SignatureLookupPointer = SignatureInsert(StringSignature);
    FingerprintInsert(CurrentHash, SignatureLookupPointer);
}

where S is the scan step, h_i is the length of the i-th fingerprint of the string signature, IV is the initial hash value, and Hash() is a sequential hash function. FingerprintSelect() selects the best fingerprint for each offset, SignatureInsert() inserts the signature into the databases of the fixed-length signature lookup engine 160 and the variable-length signature lookup engine 180, and FingerprintInsert() inserts the fingerprint into the fingerprint database 148. When the scan step is greater than 1, FingerprintInsert() is called once for each fingerprint of a signature. However, because all fingerprints of a signature point to the same signature lookup data structure, SignatureInsert() is called only once for all the fingerprints of a signature.
In one embodiment, a string signature can be deleted from the signature database using the following pseudocode:

for (i = 0; i <= S-1; i++) {
    k_i = h_i / S;
    PreviousHash = IV;
    for (j = 0; j <= k_i - 1; j++) {
        HashLength = j * S;
        CurrentHashString = FingerprintShadow_i[HashLength, HashLength + S - 1];
        CurrentHash = Hash(CurrentHashString, PreviousHash);
        PreviousHash = CurrentHash;
    }
    SignatureLookupPointer = FingerprintDelete(CurrentHash);
    if (i == 0)
        SignatureDelete(SignatureLookupPointer, CurrentHashString);
}

where S is the scan step, h_i is the length of the i-th fingerprint of the string signature, IV is the initial hash value, Hash() is a sequential hash function, FingerprintDelete() deletes the fingerprint from the fingerprint database 148, and SignatureDelete() deletes the signature lookup data structure of the signature from the databases of the fixed-length signature lookup engine 160 and the variable-length signature lookup engine 180. When the scan step is greater than 1, a signature has multiple fingerprints, so FingerprintDelete() is called multiple times to delete one signature from the signature database. However, SignatureDelete() is called only once for all the fingerprints of a particular signature.
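The insertion and deletion pseudocode above can be sketched in runnable form as follows. This is a minimal illustration under stated assumptions: `zlib.crc32` with a running value stands in for the sequential hash function, Python dictionaries stand in for the fingerprint database 148 and the signature databases, and the names `chained_hash` and `SignatureStore` are hypothetical. Fingerprint lengths are assumed to be multiples of the scan step.

```python
import zlib

IV = 0  # initial hash value (assumed)

def chained_hash(shadow: bytes, step: int) -> int:
    """Hash a fingerprint shadow S units at a time, feeding each result forward."""
    value = IV
    for pos in range(0, len(shadow), step):
        value = zlib.crc32(shadow[pos:pos + step], value)
    return value

class SignatureStore:
    def __init__(self, step: int):
        self.step = step
        self.fingerprints = {}   # hash value -> set of signature ids
        self.signatures = {}     # signature id -> (signature, fingerprint shadows)

    def insert(self, sig_id, signature, fingerprint_shadows):
        if len(fingerprint_shadows) != self.step:
            raise ValueError("one fingerprint per scan offset is required")
        # The signature itself is stored once ...
        self.signatures[sig_id] = (signature, list(fingerprint_shadows))
        # ... while one fingerprint entry is stored per fingerprint.
        for shadow in fingerprint_shadows:
            key = chained_hash(shadow, self.step)
            self.fingerprints.setdefault(key, set()).add(sig_id)

    def delete(self, sig_id):
        _, shadows = self.signatures.pop(sig_id)      # signature removed once
        for shadow in shadows:                        # fingerprints removed S times
            key = chained_hash(shadow, self.step)
            ids = self.fingerprints.get(key)
            if ids is None:
                continue
            ids.discard(sig_id)
            if not ids:
                del self.fingerprints[key]
```

For a scan step of 4, inserting the signature of the earlier example with its four fingerprint shadows creates one signature entry and up to four fingerprint entries that all refer to it; deleting the signature removes all of them again.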
Because the number of distinct fingerprint segment lengths is usually much smaller than the number of distinct fingerprint lengths, in one embodiment a fingerprint can be further decomposed into multiple fingerprint segments to improve scan efficiency. All fingerprint segments of a fingerprint can first be scanned in parallel or serially, and the segment scan results can then be synthesized in parallel or serially to detect the fingerprint. The fingerprint segments can have the same or different lengths, depending on the fingerprint lengths supported by the particular scan engine.
In one embodiment, the number and lengths of the fingerprint segments can be determined from the fingerprint length and other scan parameters of the particular scan engine. In another embodiment, the fingerprint lengths are linearly related and all fingerprint segments have the same length. Typically, the length of a fingerprint segment equals one or more scan steps. In another embodiment, the scan step is an integer multiple of the fingerprint segment length, and multiple fingerprint segments are synthesized in parallel.
In one embodiment, the fingerprint database can comprise one or more Bloom filters, one or more hash tables, or both. The choice between a Bloom filter and a hash table usually trades memory size against memory bandwidth. When the signatures are few enough to fit entirely in high-bandwidth on-chip memory, a Bloom filter or even multiple hash tables may be preferable. Conversely, when the signatures are so numerous that off-chip memory must be used, memory bandwidth becomes the main constraint, and a hash table rather than a Bloom filter may be preferable.
In one embodiment, extra bits of the hash value are stored in a Bloom filter or a hash table to implement hash value multiplexing. In another embodiment, the fingerprint length is stored in a Bloom filter or a hash table to implement fingerprint length multiplexing. When the original hash key is too long or too expensive to compare during fingerprint scanning, hash value multiplexing and fingerprint length multiplexing can further reduce the probability of false positive matches and fingerprint collisions.
In one embodiment, once the shadow spaces, fingerprints, fingerprint segments, and fingerprint data structures have been determined, a fixed-length signature, or a fixed-length sub-signature of a variable-length signature, can be inserted in the shadow space into the fingerprint database 148 and the signature database, either as a whole or in segments.
In one embodiment, the pieces of a fixed-length signature, or of a fixed-length sub-signature of a variable-length signature, other than its fingerprint can be encoded and stored in the fixed-length signature database 166, to be used for checking the whole fixed-length signature or sub-signature after a fingerprint match. In one embodiment, one or more pieces of a fixed-length signature, or of a sub-signature of a variable-length signature, can be encoded with masks for one or more basic units or sub-basic units into multiple masked pieces, to support specific string signature matching conditions (such as "don't care", "equal", "not equal", "case-insensitive", "case-sensitive", "within a range", and "outside a range"). In one embodiment, to improve storage efficiency, one or more signature-piece masks, or one or more signature or sub-signature masks, can be used alone or together with one or more basic-unit or sub-basic-unit masks.
In one embodiment, the masked signature segments can be linked together, in order or out of order, with a linked list, or preprocessed into another lookup structure (such as a tree). In one embodiment, the masked signature segments can all have the same length or can have different lengths. The length of the masked signature segments can be chosen to best suit the particular memory architecture.
In another embodiment, to make the number of false positive string signature matches converge to zero, and the number of candidate matching signatures converge to zero or one, as quickly as possible, a group of string signatures can further be differentially encoded to build a differential data structure that can be searched quickly (such as a differential tree). The differential tree can be built from the units in which the signatures differ.
In one embodiment, signature database preprocessing is performed only when the signature database is initially created, when one or more new signatures are added, or when one or more existing signatures are deleted. In another embodiment, the signature database can be updated dynamically while signatures are being scanned.
Scan preprocessing of the string field
The scan preprocessing engine 120 preprocesses the string field into different formats according to the string signature database, to simplify and speed up the subsequent scan pipeline stages. In one embodiment, the signatures in the signature database are stored in unencoded form. The scan preprocessing engine 120 therefore decodes encoded string fields so that they are consistent with the decoded format of the signatures in the signature database. As shown in Figure 1A, the scan preprocessing engine 120 includes a scan feeder 122, a string field memory 124, a format decoder 126, a decoded field memory 128, a projector 130, and a shadow field memory 132. The scan feeder 122 first loads the data to be scanned from the string field memory 124 into the format decoder 126, which then decodes, parses, and decompresses the raw field. This can include MIME decoding, UU decoding, foreign-language decoding, removal of meaningless string data (such as extra white space) and anti-scan junk string data (such as injected anti-scan junk data), HTML parsing, XML parsing, deflate decompression, LZS decompression, PKZip decompression, and gzip decompression. The format decoder 126 can also normalize the string field according to the needs of the particular system. The format decoder 126 then stores the decoded and normalized string field in the decoded field memory 128.
The projector 130 projects the decoded field into one or more shadow spaces and stores the shadow data in the shadow field memory 132. For example, the signature database can contain case-insensitive signatures. To support them, the projector 130 converts the data in the decoded field memory 128 to all lowercase and stores the all-lowercase string field in the shadow field memory 132. The fingerprint scan engine 140 can then use the all-lowercase string field for fingerprint scanning, so that case-sensitive and case-insensitive signatures can be scanned at the same time. A match of a case-sensitive signature can be verified against the case-preserving decoded field after its shadow has matched.
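As a minimal sketch of the lowercase shadow space described above, the following Python fragment matches in the shadow first and verifies case-sensitive signatures against the decoded field afterwards. The function names are hypothetical, and a plain substring search stands in for the fingerprint scan and signature lookup stages.

```python
def project_lowercase(decoded: bytes) -> bytes:
    """Project the decoded field into an all-lowercase shadow space."""
    return decoded.lower()

def find_matches(decoded: bytes, signature: bytes, case_sensitive: bool):
    """Return start positions where the signature matches.

    Matching is done in the shadow; a case-sensitive signature is then
    verified against the original, case-preserving decoded field.
    """
    shadow = project_lowercase(decoded)
    needle = project_lowercase(signature)
    hits = []
    pos = shadow.find(needle)
    while pos != -1:
        original = decoded[pos:pos + len(signature)]
        if not case_sensitive or original == signature:
            hits.append(pos)
        pos = shadow.find(needle, pos + 1)
    return hits

field = b"get ReadMe.exe and README.EXE"
print(find_matches(field, b"ReadMe.exe", case_sensitive=False))
print(find_matches(field, b"ReadMe.exe", case_sensitive=True))
```

Both kinds of signature share a single pass over the shadow data; only the final verification step differs.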
In one embodiment, the string signature scan engine 100 comprises computing resources or network devices that can hold the whole string field for fast string signature scanning. In another embodiment, however, the string signature scan engine 100 comprises computing resources or network devices that cannot store the whole string field, for example because of limited storage space or low-latency system requirements. The string field can therefore be broken into multiple string blocks of a predetermined size, and string signature scanning is performed on each block.
In one embodiment, the size of a string block depends on the longest string signature. A string block can be further divided into three regions: a fingerprint scan region that covers all scan positions, a region before the fingerprint scan region that provides reference data located before or within a fingerprint, and a region after the fingerprint scan region that provides reference data located after or within a fingerprint. The union of all fingerprint scan regions of a string field covers every possible fingerprint start position in that field. The three regions can have the same or different sizes. In one embodiment, the minimum size of the region before the fingerprint scan region is the maximum fingerprint offset over all signatures in the signature database, the minimum size of the region after the fingerprint scan region is the maximum difference between signature length and fingerprint offset over all signatures in the signature database, and the minimum size of the fingerprint scan region is the scan step.
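The minimum region sizes stated above can be worked out with a few lines of Python. The function name and the representation of each signature as a (length, fingerprint offset) pair are assumptions made for this illustration.

```python
def minimum_region_sizes(signatures, step):
    """Return (before, scan, after) minimum region sizes in basic units.

    signatures: list of (signature_length, fingerprint_offset) pairs, where
    the offset is the position of the fingerprint inside the signature.
    """
    before = max(offset for _, offset in signatures)           # data preceding a fingerprint
    after = max(length - offset for length, offset in signatures)  # data from the fingerprint onward
    return before, step, after

# Three hypothetical signatures and a scan step of 4 basic units.
print(minimum_region_sizes([(13, 0), (13, 3), (40, 16)], step=4))
```

Here the region before the scan region must hold at least 16 units, because one fingerprint starts 16 units into its signature, and the region after it must hold at least 24 units, the largest remainder of any signature measured from its fingerprint.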
In one embodiment, the size of the scan string block can be chosen according to parameters of the scan system, such as the length of the longest signature, the memory architecture, and the scan speed. When the longest signature is long, the scan string block can be a few times its length, for example 2 to 4 times. When the longest signature is short, the scan string block can be a larger multiple of its length.
In one embodiment, the three regions of a scan string block have the same size, each equal to the length of the longest signature. The three regions can be stored in a ring formed from three memories, to reduce data movement in memory. In another embodiment, the three regions differ in size: the fingerprint scan region is smaller than the other two regions, which can themselves have the same or different sizes.
In one embodiment, each of the three regions of a scan block can be stored in one or more memory blocks of equal size. All memory blocks of the three regions form a ring that runs from the first memory block of the region before the fingerprint scan region to the last memory block of the region after the fingerprint scan region, to reduce data movement in memory. After the first memory block of the fingerprint scan region has been scanned, the first memory block of the ring leaves the ring and becomes available for loading new data. After new data has been loaded, the memory block rejoins the ring as its last memory block. In one embodiment, one or more additional memory blocks can be appended to the tail of the ring for loading new data.
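The ring of memory blocks described above can be sketched with a double-ended queue. The class name `BlockRing` and its method are hypothetical; the point of the sketch is that the oldest block is reused in place, so only new data is written and no existing block is copied.

```python
from collections import deque

class BlockRing:
    """Ring of equal-size memory blocks covering the three regions."""

    def __init__(self, blocks):
        self.ring = deque(blocks)          # head = oldest reference data

    def advance(self, new_data: bytes) -> bytearray:
        """Retire the head block, reload it, and append it as the tail."""
        block = self.ring.popleft()        # block leaves the ring
        block[:] = new_data                # buffer is reused for the new data
        self.ring.append(block)            # block rejoins as the last block
        return block

ring = BlockRing([bytearray(b"aaaa"), bytearray(b"bbbb"), bytearray(b"cccc")])
ring.advance(b"dddd")
print([bytes(block) for block in ring.ring])
```

After one advance the block that held the oldest data now holds the newest data at the tail, while the other two blocks have not moved in memory.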
In one embodiment, the string field memory 124 contains only the last memory block, so its size equals the size of one memory block. The decoded field memory 128 and the shadow field memory 132 contain all memory blocks of all three regions, and the size of each is the sum of all memory block sizes.
Because of boundary conditions, the first and last blocks are special cases. In one embodiment, "impossible" basic units, with a total length equal to that of the longest signature, can be padded before the first block, in the reference region before the fingerprint scan region, and after the last block, in the reference region after the fingerprint scan region. "Impossible" basic units are basic units with which no signature begins or ends, so the padded region cannot become part of an actual signature. The padding is unnecessary if the fingerprint scan engine 140, the fixed-length signature lookup engine 160, and the variable-length signature lookup engine 180 have a scan range checking mechanism, which prevents a scan from going beyond the boundaries of the string field.
Fingerprint scan
In one embodiment, the fingerprint scan engine 140 can comprise one or more content-addressable memories (CAMs), one or more finite automata (FAs), or both. In another embodiment, when the original fingerprint, or the shadow of the fingerprint in a shadow space, is fully specified, the fingerprint scan engine 140 can be a hash-based engine. As shown in Figure 1A, in one embodiment the fingerprint scan engine 140 includes a fingerprint scan controller 142, a fingerprint hash calculator 144, a fingerprint searcher 146, a fingerprint database 148, and a fingerprint synthesizer 150. In one embodiment, a fingerprint is scanned as a whole, so the fingerprint synthesizer 150 is an optional component.
In another embodiment, each fingerprint is broken into multiple segments that are scanned independently. All fingerprint segments can be scanned first and then synthesized serially or in parallel (for example, by the fingerprint synthesizer 150) to produce the fingerprint scan result. In one embodiment, the fingerprint scan controller 142 controls the whole scan process.
The fingerprint scan engine 140 outputs either no match or a match in the fingerprint database. A match corresponds to one or more string signatures that can subsequently be looked up by the fixed-length signature lookup engine 160 and the variable-length signature lookup engine 180. If there is no match, the scan process is complete.
In one embodiment, the fingerprint hash calculator 144 includes multiple independent universal hash functions, h_0, h_1, ..., h_i, to support a Bloom filter. A Bloom filter can be used, for example, when the system is limited by memory size rather than memory bandwidth. This is the case, for example, when the signature database of the scan system is small enough to fit entirely in on-chip memory. In another embodiment, the fingerprint hash calculator 144 includes multiple independent universal hash functions, h_0, h_1, ..., h_i, to support multiple hash tables. Multiple hash tables can be used, for example, when a very low false positive rate is required (such as 10^-3 or less) and both the memory bandwidth and the memory size are sufficient. A very low false positive rate is desirable, for example, when DRAM or other slower off-chip memory stores the data structures of the later scan stages and the on-chip memory is large enough to store multiple hash tables.
In another embodiment, the fingerprint hash calculator 144 includes only one hash function, h_0. A single hash table can be used, for example, when the system is limited by memory bandwidth rather than memory size. This is the case, for example, when the signature database of the scan system is so large that off-chip memory must be used.
The fingerprint hash calculator 144 fetches n bytes of data from the shadow field memory 132 and computes their hash values. The data can be used alone, or together with a random initial value or a previous hash value, to compute the current hash values of all the hash functions. In one embodiment, the hash value of one hash function (for example, the first hash function h_0) has more bits than the hash values of the other hash functions, all of which have the same number of bits.
The hash values output by the hash functions are used by the fingerprint searcher 146 to search the fingerprint database 148. In one embodiment, the fingerprint searcher 146 includes a Bloom filter and a hash value demultiplexer. The Bloom filter checks the valid flags that the hash values point to. If all the valid flags are set, the hash value demultiplexer further locates a corresponding hash block using the extra bits of the hash value. Hash value demultiplexing consists of checking the extra bits of a particular hash value; it can be performed alone or together with checks of the fingerprint length and other fingerprint-related information. Hash value demultiplexing can further reduce false positive signature matches. In one embodiment, the Bloom filter can degenerate into a single hash table. In another embodiment, the Bloom filter can be expanded into multiple hash tables.
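One possible reading of the Bloom filter check followed by hash value demultiplexing is sketched below in Python. The table size, the tag width, the use of salted BLAKE2b digests as stand-ins for independent universal hash functions, and the names `hashes` and `FingerprintFilter` are all assumptions made for this illustration; only the extra bits of the first hash value and the fingerprint length are stored and compared.

```python
import hashlib

TABLE_BITS = 10   # assumed table size of 1024 entries
TAG_BITS = 16     # assumed number of extra bits kept from the first hash value

def hashes(data: bytes, count: int = 3):
    """Return `count` hash values from salted BLAKE2b digests (stand-in functions)."""
    values = []
    for i in range(count):
        digest = hashlib.blake2b(data, digest_size=8, salt=bytes([i])).digest()
        values.append(int.from_bytes(digest, "big"))
    return values

class FingerprintFilter:
    def __init__(self):
        self.valid = [False] * (1 << TABLE_BITS)   # valid flags
        self.blocks = {}   # index from first hash -> [(tag, length, signature id)]

    @staticmethod
    def _split(values):
        mask = (1 << TABLE_BITS) - 1
        indexes = [value & mask for value in values]
        tag = (values[0] >> TABLE_BITS) & ((1 << TAG_BITS) - 1)  # extra bits of first hash
        return indexes, tag

    def add(self, fingerprint: bytes, sig_id):
        indexes, tag = self._split(hashes(fingerprint))
        for index in indexes:
            self.valid[index] = True
        self.blocks.setdefault(indexes[0], []).append((tag, len(fingerprint), sig_id))

    def lookup(self, candidate: bytes):
        indexes, tag = self._split(hashes(candidate))
        if not all(self.valid[index] for index in indexes):
            return []                               # filter rules the candidate out
        # Demultiplex: compare the stored extra bits and the fingerprint length.
        return [sig_id for stored_tag, length, sig_id in self.blocks.get(indexes[0], [])
                if stored_tag == tag and length == len(candidate)]
```

Most non-matching candidates are rejected by the valid flags alone; the few that pass are filtered again by the tag and length comparison before any later scan stage is involved.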
Fingerprints of different lengths can be scanned in parallel or serially. In one embodiment, multiple fingerprint searchers 146 can be used to scan in parallel. For example, one fingerprint searcher 146 can be used to scan the fingerprints of each different length.
In another embodiment, a fingerprint finger 146 can carry out serial scan to the fingerprint of different length.For example, the length of fingerprint can be the integral multiple of scanning step.Each quick character string signature scan engine 100 is supported one group of fingerprint length, and S, 2*S, 3*S ..., m*S}, wherein S is that scanning step and m*S are the length of long fingerprint.On each scanning position, the fingerprint of different length is by serial scan.The order hash function of an input S elementary cell can be used to carry out described scanning.In one embodiment, the serial finger scan of described finger scan engine 140 can be described by following false code: k=[t/S]; For (i=0, i < k-1, k++) { scanning position=i*S; Front hashed value=IV; For (j=0, j < m-1, the j++) { length=j*S of hash; If (scanning position+the length L EssT.LTssT.LT t of hash) { existing hash input=character string field [length of hash of scanning position+, the length+S-1 of hash of scanning position+]; Existing hashed value=hash function (existing hash input, front hashed value); Condition code is searched pointer=fingerprint and is searched (existing hashed value); Condition code numbering=condition code is searched (condition code is searched pointer); Front hashed value=existing hashed value; ?
where S is the scanning step, m*S is the longest fingerprint length, t is the total length of the character string field to be scanned, IV is the initial hashed value, and hash function() is a sequential hash function. Fingerprint lookup() is implemented by fingerprint finger 146 and fingerprint compositor 150, and condition code lookup() is implemented by fixed length condition code Lookup engine 160 and random length condition code Lookup engine 180.
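The serial scan loop can be modelled in runnable form as below. It is a sketch under stated assumptions: `serial_scan` and its parameters are illustrative names, CRC-32 (via `zlib.crc32` with a running value) stands in for the sequential hash function, the database is a plain dictionary keyed by hashed value and length, and the bounds check requires a full S-unit chunk, which is slightly stricter than the condition in the pseudocode.

```python
import zlib

def serial_scan(field, step, m, hash_fn, iv, fingerprint_lookup):
    """Scan `field` at every multiple of `step`, extending the hash one step at a time."""
    hits = []
    k = len(field) // step
    for i in range(k):
        pos = i * step
        prev = iv                       # restart the sequential hash at each position
        for j in range(m):
            hlen = j * step
            if pos + hlen + step > len(field):
                break                   # assumption: only hash complete S-unit chunks
            chunk = field[pos + hlen: pos + hlen + step]
            cur = hash_fn(chunk, prev)  # hash of the first (j + 1) * step units
            for sig in fingerprint_lookup(cur, hlen + step):
                hits.append((pos, hlen + step, sig))
            prev = cur
    return hits

# Hypothetical fingerprint database: (hashed value, length) -> condition code names.
db = {
    (zlib.crc32(b"abcd"), 4): ["sig-A"],
    (zlib.crc32(b"abcdefgh"), 8): ["sig-B"],
}
hits = serial_scan(b"xxxxabcdefghyyyy", 4, 2, zlib.crc32, 0,
                   lambda h, n: db.get((h, n), []))
```

Because the hash is sequential, the hashed value of the 8-unit fingerprint is obtained by feeding the second 4-unit chunk into the state left by the first, so each extra fingerprint length costs only one more hash step per scanning position.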
Fig. 2A-C show the data structure of fingerprint database 148 when fingerprints of different lengths are scanned as a whole. It comprises a Bloom filter table 200, a fingerprint Hash block chained list 250, and Hash-entry pieces 256. Using the group of hashed values {hashed value 0A, hashed value 1, ...} supplied by fingerprint hash counter 144, fingerprint finger 146 reads Bloom filter table 200. Each entry of Bloom filter table 200 comprises an effective marker 202 and a Hash block chain list index 204. Effective marker 202 is a flag that is set when at least one condition code exists for the entry.
In one embodiment, Hash block chain list index 204 is indexed only by hashed value 0A and points to the first item of fingerprint Hash block chained list 250. If the effective markers 202 indicated by all hashed values are set, the fingerprint scan continues on the fingerprint Hash block chained list 250 indicated by Hash block chain list index 204. In another embodiment, Hash block chain list index 204 equals hashed value 0A and can be omitted to shrink the Bloom filter table or hash table, at the cost of a larger data structure in the subsequent scan phase.
In one embodiment, hashed value 0A is used to add a fingerprint to Bloom filter table 200, and the other hashed values are used to reduce false positive matches. In addition, so that character string condition codes are not deleted by mistake, each entry of Bloom filter table 200 tracks its number of condition codes with a counter. In another embodiment, Bloom filter table 200 can be reduced to a single hash table that needs only one hashed value (such as hashed value 0A). In another embodiment, Bloom filter table 200 can be expanded into a plurality of hash tables.
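The role of the per-entry counter can be illustrated with a small counting table. This is a hypothetical sketch (the class `CountingBloomTable` and the explicit index lists are assumptions made for illustration): an entry stays valid until every condition code that set it has been removed.

```python
class CountingBloomTable:
    """Hypothetical table whose entries count how many condition codes use them."""

    def __init__(self, size: int):
        self.count = [0] * size

    def add(self, indexes) -> None:
        for i in indexes:
            self.count[i] += 1

    def remove(self, indexes) -> None:
        for i in indexes:
            if self.count[i] > 0:
                self.count[i] -= 1

    def is_valid(self, index: int) -> bool:
        # The effective marker is considered set while the counter is non-zero.
        return self.count[index] > 0

table = CountingBloomTable(8)
table.add([1, 3])     # first condition code
table.add([3, 5])     # second condition code shares entry 3
table.remove([1, 3])  # deleting the first must not invalidate entry 3
```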
In one embodiment, fingerprint Hash block chained list 250 is a linked list. Each item of fingerprint Hash block chained list 250 comprises a last piece 252, a next block pointer 254, and a Hash-entry piece 256. Next block pointer 254 points to the next item of the list. When an item is the tail item, last piece 252 is set. Because the tail item can also be detected by checking whether next block pointer 254 is a null pointer, last piece 252 is an optional field for detecting the tail item quickly. Each Hash-entry piece 256 holds at most n fingerprints, where n is any integer greater than zero. In one embodiment, the best n is chosen according to the memory architecture of the scanning system. For example, if the memory is SRAM, n can equal 1; if the memory is DRAM, n > 1.
In one embodiment, as shown in Fig. 2C, each fingerprint item of Hash-entry piece 256 comprises a hashed value (hashed value 0B) 260, a fingerprint length 262, a type 264, a feature code group pointer 266 or condition code pointer 268, and a dislocation 270. Hashed value 0B 260 and fingerprint length 262 are used to implement hashed value multiplexing and fingerprint length multiplexing, respectively. When type 264 is zero, the item holds a feature code group pointer 266, which outputs a feature code group; otherwise it holds a condition code pointer 268, which outputs a condition code. Dislocation 270 is the offset from the first elementary cell of the fingerprint to the first elementary cell of the subsegment to be compared next. If there is no next data structure, dislocation 270 is not needed and can be set to zero. In another embodiment, when n > 1, an effective marker can be added to each fingerprint item. When that effective marker is set, hashed value 0B 260 and fingerprint length 262 can be compared.
In one embodiment, each fingerprint item of Hash-entry piece 256 stores one fingerprint. In another embodiment, however, because fingerprints can be long and of different sizes, and because further condition code lookup stages follow the fingerprint database lookup, the original fingerprint need not be stored in each fingerprint item. Each matched fingerprint item may therefore correspond to zero fingerprints because of a false positive match, or to a plurality of fingerprints because of fingerprint collisions. When fingerprints are scanned with a simple hash table rather than a Bloom filter table and $k/2^m \ll 1$, the probability of a false positive match is on the order of $k/2^m$ and the probability of a fingerprint collision is on the order of $k^2/2^{2m}$, where k is the total number of fingerprints and m is the total number of bits of hashed value 0, which comprises hashed value 0A and hashed value 0B.
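To give a feel for these orders of magnitude, the short calculation below evaluates both expressions. The values k = 128K fingerprints and m = 32 bits are assumptions chosen only for illustration; the function names are likewise illustrative.

```python
def false_positive_order(k: int, m: int) -> float:
    """Order of magnitude of the false positive probability, k / 2^m."""
    return k / 2 ** m

def collision_order(k: int, m: int) -> float:
    """Order of magnitude of the fingerprint collision probability, k^2 / 2^(2m)."""
    return k ** 2 / 2 ** (2 * m)

k = 128 * 1024  # assumed number of fingerprints (2^17)
m = 32          # assumed total number of hash bits

fp = false_positive_order(k, m)  # 2^-15, roughly 3.1e-5
col = collision_order(k, m)      # 2^-30, roughly 9.3e-10
```

Under these assumed values, roughly one scanning position in 32,768 produces a false positive that a later stage has to reject, while collisions between stored fingerprints are far rarer.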
With a Bloom filter table, both probabilities drop substantially. With a plurality of hash tables, they drop further. To minimize condition code lookups in later stages, the probabilities of false positive matches and fingerprint collisions can be reduced to near zero. In one embodiment, a sufficiently large m and a sufficient number of hash functions are used to reduce the probabilities of false positive matches and fingerprint collisions.
To reduce storage space, in one embodiment, a plurality of hashed values can be multiplexed into one fingerprint item, to reduce the probability of empty items. Each m-bit hashed value can be divided into two parts: an m1-bit hashed value 1 and an (m-m1)-bit hashed value 2. Hashed value 1 is used as the address into the fingerprint database, and hashed value 2 (such as hashed value 0B 260) is used to resolve Hash collisions and false positive matches. The smaller m1 is, the less storage space is required, but the longer fingerprint Hash block chained list 250 becomes. In one embodiment, when $2^{m1}$ is at least twice k, the average length of the fingerprint Hash block chained list is less than 1.
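The split of an m-bit hashed value into an address part and a stored part can be sketched as follows. Which bits form the address is not stated in the text; taking the top m1 bits is an assumption, as are the function names and example values.

```python
def split_hash(h: int, m: int, m1: int):
    """Split an m-bit hash into (m1-bit address, (m - m1)-bit stored tag).

    Assumption: the address is taken from the most significant bits.
    """
    address = h >> (m - m1)
    tag = h & ((1 << (m - m1)) - 1)
    return address, tag

def average_chain_length(k: int, m1: int) -> float:
    """Average number of fingerprints per table address for k fingerprints."""
    return k / 2 ** m1

address, tag = split_hash(0xABCD1234, 32, 16)
load = average_chain_length(2 ** 17, 18)  # table has twice as many addresses as fingerprints
```

With a table of 2^18 addresses for 2^17 fingerprints, the average chain length is 0.5, which agrees with the statement that it falls below 1 once the table has at least twice as many addresses as there are fingerprints.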
To save storage space and reduce the complexity of managing the tables, in another embodiment, fingerprints of all different lengths can be multiplexed into one fingerprint database. Fingerprint length 262 can further reduce the probability of false positive matches and fingerprint collisions.
The lookup data structures shown in Fig. 2A-C can be implemented in several different ways. The particular implementation depends on the size of the condition code table and on the size and type of the available memory, such as on-chip SRAM, on-chip DRAM, off-chip SRAM, and off-chip DRAM. For example, in one embodiment, if the number of condition codes is 128K, effective markers 202 can be kept in on-chip SRAM for fast access, and Hash block chain list index 204 in off-chip SRAM. Effective markers 202 are accessed with all hashed values, whereas Hash block chain list index 204 is accessed only with hashed value 0A. Hash block chain list index 204 needs to be accessed only when the effective markers 202 of all hashed values are set. Last piece 252 and next block pointer 254 can be kept in off-chip SRAM; hashed value 0B 260 and fingerprint length 262 in off-chip SRAM or DRAM; and type 264, feature code group pointer 266 or condition code pointer 268, and dislocation 270 in off-chip DRAM. Type 264, feature code group pointer 266 or condition code pointer 268, and dislocation 270 need to be accessed only when hashed value 0B 260 and fingerprint length 262 match.
In one embodiment, to improve scan efficiency, the fingerprint of a character string condition code can be decomposed into a plurality of fingerprint sections. The fingerprint sections are first scanned in parallel or serially, and the matched fingerprint sections are then synthesized into a fingerprint scan result by fingerprint compositor 150. Because the number of different fingerprint section lengths is generally much smaller than the number of different fingerprint lengths, scanning by fingerprint section can speed up the fingerprint scan. It also makes it possible to support longer fingerprints containing more fingerprint sections, which reduces false positive matches.
In one embodiment, the synthesis of fingerprint sections is exact or complete, with no false positive matches. In another embodiment, to speed up the fingerprint scan, the synthesis is "coarse" or partial, with false positive matches. To reduce the probability of false positive matches, information about one or more possible positions of each fingerprint section and about the possible fingerprint lengths can be stored and used to synthesize the fingerprint sections, in parallel or serially, into any fingerprint match.
Fig. 2D-E show an embodiment of a Hash-entry piece for "at least one fingerprint match" and its corresponding fingerprint compositor when a plurality of fingerprint sections are scanned serially. In one embodiment, a match is reported as soon as at least one fingerprint matches, but no information is given about how many fingerprints matched or about the lengths of the matched fingerprints.
As shown in Fig. 2D, in one embodiment, each item of Hash-entry piece 257 comprises a hashed value 0B 260 of a particular fingerprint section, a fingerprint section map 272, an at least one fingerprint composite signal 274, a type 264, a feature code group pointer 266 or condition code pointer 268, and a dislocation 270. Fingerprint section map 272 is a bitmap of effective markers whose number of bits equals the number of fingerprint sections supported by fingerprint compositor 151. When a fingerprint section is the i-th fingerprint section of a fingerprint, bit i of fingerprint section map 272 is set. Fingerprint section map 272 thus gives all possible positions of the fingerprint section within all fingerprints. At least one fingerprint composite signal 274 gives the number of steps of the shortest fingerprint that contains the fingerprint section, and is used to synthesize fingerprints for the "at least one fingerprint match" case. In one embodiment, at least one fingerprint composite signal 274 is stored in the first fingerprint section of the fingerprint. In another embodiment, at least one fingerprint composite signal 274 or any other fingerprint length information is stored in each fingerprint section of the fingerprint. In another embodiment, at least one fingerprint composite signal 274 is omitted.
In one embodiment, hashed value 0B 260 is the same as hashed value 0B 260 in Fig. 2C. Type 264, feature code group pointer 266 or condition code pointer 268, and dislocation 270 are also the same as the corresponding values in Fig. 2C; they are stored in the first fingerprint section of the fingerprint and are used after a delay of "number of fingerprint sections minus 1" clock periods. In one embodiment, because type 264, feature code group pointer 266 or condition code pointer 268, and dislocation 270 are stored only in the first fingerprint section of a fingerprint, all fingerprints with the same first fingerprint section are stored together. In one embodiment, fingerprints are selected so as to minimize the probability that a plurality of fingerprints share the same first fingerprint section.
Fig. 2E shows an embodiment of fingerprint compositor 151 for the case where at least one fingerprint composite signal 274 is stored only in the first fingerprint section of each fingerprint. In one embodiment, the fingerprint section size is chosen to equal the scanning step. For example, the fingerprint section size and the scanning step are both 4, so that the fingerprint lengths are 4, 8, 12, and 16. Fingerprint compositor 151 comprises 12 D flip-flops 280, three 2-input AND gates 282, and one 4-input MUX 284. Its inputs are fingerprint section map 272 and at least one fingerprint composite signal 274. If a fingerprint is found, fingerprint compositor 151 outputs a match 290. Match 290 is valid only when the first fingerprint section of the synthesized fingerprint is the first fingerprint section of a fingerprint.
In one embodiment, to validate match 290, the 4-input MUX 284 can be replaced by a 5-input MUX whose added input is tied to zero and is selected when the fingerprint section is not a first fingerprint section. In one embodiment, by adding logic gates at different delay stages, fingerprint compositor 151 can be extended to support fingerprint length information in fingerprint sections other than the first fingerprint section of a fingerprint. In another embodiment, when at least one fingerprint composite signal 274 is not stored in any fingerprint section, fingerprint compositor 151 can be simplified by removing MUX 284 and all logic gates not used for the shortest fingerprint in the condition code database. In another embodiment, fingerprint compositor 151 can easily be modified to use other scanning steps, numbers of fingerprint sections, and fingerprint lengths. In one embodiment, because multiple fingerprint matches are merged into a single "at least one fingerprint match", fingerprint compositor 151 can output one fingerprint scan result per clock period.
Fig. 2F-G show another embodiment for serial scanning of a plurality of fingerprint sections: a Hash-entry piece for "all fingerprint matches", which reports every match, and its corresponding fingerprint compositor. When fingerprints of a plurality of different lengths are matched as one condition code match, one or more later scan stages may be needed.
As shown in Fig. 2F, each item of Hash-entry piece 258 comprises a hashed value 0B 260 of a particular fingerprint section, a fingerprint section map 272, an all fingerprint composite signals 276, a type 264, a feature code group pointer 266 or condition code pointer 268, and a dislocation 270. All fingerprint composite signals 276 is the fingerprint composite information for all matches; its number of bits equals the number of fingerprint sections supported by fingerprint compositor 152. When a fingerprint section is a section of a fingerprint that has i fingerprint sections, bit i of all fingerprint composite signals 276 is set. The other fields of Hash-entry piece 258 are the same as the corresponding fields in Fig. 2D. In one embodiment, all fingerprint composite signals 276 is stored only in the first fingerprint section of a fingerprint. In another embodiment, all fingerprint composite signals 276 or any other fingerprint length information is stored in each fingerprint section of a fingerprint. In another embodiment, all fingerprint composite signals 276 is omitted.
Fig. 2G shows an embodiment of a fingerprint compositor 152 that supports a plurality of fingerprint lengths, for the case where all fingerprint composite signals 276 is stored only in the first fingerprint section of each fingerprint. Fingerprint compositor 152 uses the same scanning step, fingerprint section size, and fingerprint lengths as Fig. 2E. It comprises 16 D flip-flops 280, five 2-input AND gates 282, and one 3-input AND gate 286. Fingerprint compositor 152 outputs a match 0 292, a match 1 294, a match 2 296, and a match 3 298, which correspond to matches of fingerprints that are one, two, three, and four fingerprint sections long, respectively.
In one embodiment, by adding logic gates (such as AND gates) at different delay stages, or by adding inputs to existing AND gates, fingerprint compositor 152 can be extended to support additional fingerprint length information in fingerprint sections other than the first fingerprint section of a fingerprint. In another embodiment, when all fingerprint composite signals 276 is not stored in any fingerprint section, the AND-gate inputs for all fingerprint composite signals 276 and all logic gates related to fingerprint length can be removed to simplify fingerprint compositor 152. In one embodiment, Hash-entry piece 258 in Fig. 2F can be extended to contain multiple groups of type 264, feature code group pointer 266 or condition code pointer 268, and dislocation 270, with one group of fields per fingerprint length. In addition, in one embodiment, the exact fingerprint length can also be used in subsequent scan stages.
Fingerprint compositor 152 can scan fingerprints of all different lengths simultaneously. However, because type 264, feature code group pointer 266 or condition code pointer 268, and dislocation 270 are stored in only one fingerprint section, a plurality of fingerprints that share that fingerprint section are stored together. Suitable selection of the fingerprints of the condition codes can minimize this effect. To eliminate the effect entirely, the type 264, feature code group pointer 266 or condition code pointer 268, and dislocation 270 of the fingerprint formed by all matched fingerprint sections can be stored in a separate table that is indexed by the matched fingerprint sections. The addresses of all matched fingerprint sections can be used to look up an entry in that table.
In another embodiment, in addition to at least one fingerprint composite signal 274 or all fingerprint composite signals 276, fingerprint section map 272 can also be omitted, so that no fingerprint composite information is stored in any fingerprint section. Fingerprint sections are then synthesized according to the possible formats of all fingerprints in the condition code database. If a plurality of fingerprint sections conform to any one of the formats of all fingerprints, a fingerprint match is declared. As a special case, if a plurality of fingerprint sections satisfy the minimum requirement of all fingerprint formats, a fingerprint match is declared.
Fig. 2H-I show embodiments of the Hash-entry piece for "at least one fingerprint match" and "all fingerprint matches" and of the corresponding fingerprint compositor when a plurality of fingerprint sections are scanned in parallel. As shown in Fig. 2H, each item of Hash-entry piece 259 comprises a hashed value 0B 260, a fingerprint section match 278, a type 264, a feature code group pointer 266 or condition code pointer 268, and a dislocation 270. Fingerprint section match 278 is the match bit of the fingerprint section; the other fields are the same as the corresponding fields in Fig. 2D. Fingerprint section match 278 has a specific meaning for a specific section matcher (section matcher 0, section matcher 1, section matcher 2, or section matcher 3) of the parallel fingerprint compositor shown in Fig. 2I. When a fingerprint section is the i-th fingerprint section of any fingerprint, fingerprint section match 278 of the i-th section matcher is set. In one embodiment, the i-th section matcher stores only the i-th fingerprint sections of fingerprints, so fingerprint section match 278 is always set and can be omitted. In one embodiment, to further reduce false positive synthesis, fingerprint length information similar to at least one fingerprint composite signal 274 or all fingerprint composite signals 276 can be stored in the first fingerprint section or in all fingerprint sections of a fingerprint.
Fig. 2I shows an embodiment of a parallel fingerprint compositor 153 without fingerprint length information. Fingerprint compositor 153 comprises one 2-input AND gate 282, one 3-input AND gate 286, one 4-input AND gate 288, and one 4-input OR gate 285. For all fingerprint matches, fingerprint compositor 153 outputs a match 0 292, a match 1 294, a match 2 296, and a match 3 298; for at least one fingerprint match, it outputs a match 290. In one embodiment, a global fingerprint length filter can be applied to match 0 292, match 1 294, match 2 296, match 3 298, and match 290 to filter out impossible fingerprint lengths. In one embodiment, by adding logic gates, fingerprint compositor 153 can be extended to support fingerprint length information stored in the first fingerprint section or in all fingerprint sections of each fingerprint. In one embodiment, parallel fingerprint section scanning can use a long scanning step equal to a multiple of the fingerprint section length, to increase scan speed.
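The gate list given for the parallel compositor suggests a simple combinational structure, modelled below in software. The wiring is an inference from the gate counts (one 2-input, one 3-input, and one 4-input AND gate plus a 4-input OR gate), not taken from the figure itself, and the function name is hypothetical: output i is assumed to be the AND of the per-section match bits 0 through i, and the "at least one" output is assumed to be the OR of those four outputs.

```python
def parallel_synthesize(section_hits):
    """Combine per-section match bits into per-length matches and an overall match.

    section_hits[i] is True when the i-th section unit reported a match.
    Returns (matches, at_least_one), where matches[i] means that a fingerprint
    of i + 1 sections matched.
    """
    matches = []
    all_so_far = True
    for hit in section_hits:
        all_so_far = all_so_far and hit  # AND of section bits 0..i
        matches.append(all_so_far)
    return matches, any(matches)         # OR over all per-length matches

result_a = parallel_synthesize([True, True, False, True])
result_b = parallel_synthesize([False, True, True, True])
```

In the first case fingerprints of one and two sections match; in the second nothing matches because the first section is missing, even though later sections hit.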
In general, in one embodiment, each fixed length condition code can be decomposed into a plurality of sections that are scanned independently. As with scanning and synthesizing fingerprint sections, all sections of a condition code are first scanned and then synthesized serially or in parallel. Usually the number of different condition code section lengths, even if it is not one, is much smaller than the number of different condition code lengths, which speeds up scan methods that scan each distinct length separately. In one embodiment, two or more condition code sections of the same length are scanned in parallel.
In another embodiment, one or more hash tables, one or more Bloom filters, or both are used to scan fully specified condition code sections. In one embodiment, when there is at least one condition code match, fingerprint compositor 150 can be used to synthesize the identified condition code sections, serially or in parallel, into any condition code match. The data structures and embodiments shown in Fig. 2D-I can be used to synthesize condition code sections. In another embodiment, one or more finite automata (FA), one or more content addressable memories (CAM), or both are used to synthesize the matched condition code sections into any condition code match.
Fixed length signature scan
As shown in Fig. 1, fixed length condition code Lookup engine 160 comprises a condition code search engine 162, a condition code verifier 164, and a fixed length condition code database 166. Condition code search engine 162 identifies a possible fixed length condition code, a fixed length feature subcode of a possible random length condition code, or a condition code family comprising a plurality of possible fixed length condition codes or fixed length feature subcodes. The possible fixed length condition codes or feature subcodes identified by condition code search engine 162 are then verified by condition code verifier 164. Fixed length condition code database 166 is the database used by condition code search engine 162 and condition code verifier 164.
Fixed length condition code database 166 can be implemented with a variety of data structures. In one embodiment, as shown in Fig. 3A-B, fixed length condition code database 166 is a two-dimensional linked list in which a plurality of condition code chained lists 350 are linked together to form feature code group chained list 300. Each item of feature code group chained list 300 comprises a next code pointer 302, a dislocation 304, and a condition code pointer 306. Next code pointer 302 points to the next item of feature code group chained list 300, and condition code pointer 306 points to a particular condition code chained list 350. Dislocation 304 is the offset from the first unit of the particular fingerprint to the first unit of the character string condition code indicated by condition code pointer 306.
Each fixed length condition code or feature subcode can be decomposed into a plurality of condition code sections 352 to form condition code chained list 350. Condition code sections 352 are linked in scan order, starting from the first elementary cell of the fixed length condition code or feature subcode. In one embodiment, the condition code sections differ in size. In another embodiment, all condition code sections have the same size, which can be chosen to best suit the system architecture. Each item of condition code chained list 350 comprises a condition code section 352, a condition code section film 354, a latter end 356, a next segment pointer 358, a type 360, and a numbering 362. Next segment pointer 358 points to the next section, and latter end 356 is the last-section flag. When type 360 is 0, numbering 362 is a fixed length feature subcode numbering 364; otherwise, numbering 362 is a condition code numbering 366. Condition code section film 354 is a mask that specifies the matching condition of each elementary cell, or even each sub-elementary cell; the conditions include "need not match", "equal", "unequal", "within a range", "outside a range", "case-insensitive", and "case sensitive". The matching conditions can be implemented by selecting the input source and output source of the unit comparer. If the length of a fixed length condition code or feature subcode is not an integral multiple of the size of condition code section 352, up to "size of condition code section 352 minus one" units of "0" or another value can be appended to its tail, with the film of each padding unit set to "need not match".
In one embodiment, condition code section film 354 has 3 bits per elementary cell. The first bit is 0 for "case-insensitive" and 1 for "case sensitive". The last two bits are 0 for "equal", 1 for "unequal", and 2 for "need not match"; the value 3 is reserved. More film bits can be used as needed for other matching conditions, for example predefined ranges (numeric or alphabetic characters), symbol classes, or arbitrary ranges. In another embodiment, to improve storage efficiency, a film for one or more condition code sections, or for a whole fixed length condition code or feature subcode, can be used alone or together with the films of elementary cells or sub-elementary cells.
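The 3-bit film (mask) encoding for one elementary cell can be decoded as in the sketch below. The bit layout is an assumption: the "first bit" is taken to be the most significant of the three, and units are taken to be single bytes. The function names are illustrative only.

```python
def fold_case(unit: int) -> int:
    """Map ASCII upper-case letters to lower case; leave other bytes unchanged."""
    return unit + 32 if 65 <= unit <= 90 else unit

def unit_matches(film: int, pattern: int, data: int) -> bool:
    """Apply a 3-bit film to one unit comparison.

    Assumed layout: bit 2 = case sensitivity (0 insensitive, 1 sensitive),
    bits 1..0 = condition (0 equal, 1 unequal, 2 need not match, 3 reserved).
    """
    case_sensitive = (film >> 2) & 1
    condition = film & 0b11
    if condition == 2:
        return True                     # "need not match", e.g. a padding unit
    if condition == 3:
        raise ValueError("film value 3 is reserved")
    if not case_sensitive:
        pattern, data = fold_case(pattern), fold_case(data)
    equal = pattern == data
    return equal if condition == 0 else not equal
```

For example, `unit_matches(0b000, ord("a"), ord("A"))` treats the two letters as equal, whereas the same comparison with film `0b100` does not.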
In one embodiment, condition code search engine 162 searches feature code group chained list 300 until it reaches the tail item of the list, whose next code pointer 302 is a null pointer. Each item of feature code group chained list 300 returns a condition code pointer 306 that points to a condition code chained list 350.
Condition code verifier 164 verifies each condition code chained list 350. Starting from the first section of condition code chained list 350, it checks each condition code section in scan order. If a section does not match, condition code verifier 164 stops checking the remaining sections; otherwise it checks the whole condition code chained list 350 up to its tail, where latter end 356 is 1. If a match is found and type 360 is 0, the match is a fixed length subcode of a random length condition code, and condition code verifier 164 outputs a fixed length feature subcode numbering 364; if type 360 is 1, the match is a fixed length condition code, and condition code verifier 164 outputs a fixed length condition code numbering 366.
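The section-by-section verification can be sketched as a walk over a linked list. This is a simplified model under stated assumptions: the names `Section` and `verify` are hypothetical, each unit carries only a care/don't-care flag instead of the full film, and units are bytes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Section:
    units: bytes                       # one condition code section
    care: list                         # per unit: False means "need not match" (padding)
    last: bool                         # last-section flag
    next: Optional["Section"] = None   # next segment pointer
    type: int = 0                      # 0 = feature subcode, 1 = condition code
    number: int = 0                    # numbering reported on a match

def verify(first: Section, field: bytes, start: int):
    """Check the sections in scan order; stop at the first mismatch."""
    pos, node = start, first
    while node is not None:
        for off, (unit, care) in enumerate(zip(node.units, node.care)):
            if not care:
                continue
            p = pos + off
            if p >= len(field) or field[p] != unit:
                return None            # mismatch: later sections are not checked
        if node.last:
            kind = "subcode" if node.type == 0 else "code"
            return (kind, node.number)
        pos += len(node.units)
        node = node.next
    return None

# "MALWARE" split into 4-unit sections; the last section is padded with one unit.
tail = Section(b"ARE\x00", [True, True, True, False], last=True, type=1, number=42)
head = Section(b"MALW", [True, True, True, True], last=False, next=tail)
```

The padding unit in the final section is marked "need not match", so the 7-unit condition code verifies correctly even though the section size is 4.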
In another embodiment, so that false positive matches fall to zero rapidly as the number of sections checked increases, condition code sections 352 can be linked in an optimal order. A dislocation giving the offset from the current condition code section to the next condition code section is then added to each item of condition code chained list 350. Although the condition code section lengths may differ, a fixed condition code section length can be chosen.
Fig. 4A-B show block diagrams of an embodiment of a condition code unit comparer 400 and a condition code section comparer 450. Condition code unit comparer 400 performs the unit comparison in the fixed length condition code lookup. It comprises a code film decoder 402, a 2-input MUX 404, an equality comparator 406, two NOT gates 408, a 4-input MUX 410, a 2-input OR gate 412, and a range comparator 414. Code film decoder 402 decodes the code film bits into control signals for the inputs and outputs of equality comparator 406 and range comparator 414. In one embodiment, range comparator 414 is optionally used to support predefined global ranges for each character string field or for each condition code. In one embodiment, m range comparators 414 are used to support m predefined global ranges. In another embodiment, unit match 416 is output after being logically ORed with the "need not match" bit.
A plurality of condition code unit comparers 400 and a multi-input AND gate 452 can be used to build condition code section comparer 450. The data unit of condition code section comparer 450 is usually one byte, but can also be four bits or any other size. In another embodiment, condition code unit comparer 400 can be replaced by a condition code unit comparer 480, shown in Fig. 4C, to support local ranges of condition code units. Each filmed unit in condition code section 352 can then be extended to a filmed unit range, that is, a filmed pair of units giving the upper and lower bounds of the condition code unit.
In one embodiment, fixed length condition code Lookup engine 160 shown in Fig. 1A may need to search a plurality of condition code chained lists. However, the probability of this is usually very low. When a plurality of condition code chained lists must be searched, differential encoding, which encodes the condition codes within a group of condition codes relative to one another, can be used.
For example, in one embodiment, selection character string cell tree 500 shown in Fig. 5A can be used as the lookup data structure of condition code search engine 162 shown in Fig. 1A. Selection character string cell tree 500 comprises burls (tree nodes) 520a-520e. Each burl 520 has two branches: a match branch indicated by pointer 1 530 and a non-match branch indicated by pointer 2 532. As shown in Fig. 5B, selection character string cell tree 500 has two kinds of burls 520: leaves, whose type 528 is 1, and non-leaves, whose type 528 is 0. A matched non-leaf always points to another burl 520, whereas an unmatched non-leaf points to another burl 520 on the tree or to null. A matched leaf points to a condition code family chained list 550 as shown in Fig. 5C, whereas an unmatched leaf points to another burl 520 on the tree or to null.
In one embodiment, as shown in Fig. 5B, each node 520 of the selection character string unit tree 500 comprises an offset 522, a unit 524, a unit mask 526, a type 528, a pointer 1 530, and a pointer 2 532. The type 528 is leaf or non-leaf, as described above. The selected unit 524 can be at any position of the condition code. When the node 520 is not the root, its position is given by the offset 522 of the previous node 520 (for example, the previous node of node 520b is node 520a; node 520a, being the root of the selection character string unit tree 500, has no previous node). When the node 520 is the root (for example, node 520a), its position is given by the offset 270 of the matching entry in the fingerprint hash block linked list 250 (see Fig. 2B). Each node comprises a unit mask 526 corresponding to the condition code section mask 354 (see Fig. 3B) in the condition code linked list 350.
Any two character string condition codes differ in at least one elementary unit, provided that neither is a sub-segment of the other. A unit 524 can therefore be chosen to distinguish at least two character string condition codes, so that at least one condition code is eliminated after that unit 524 is compared. A search of the selection character string unit tree 500 thus distinguishes condition codes that differ in at least one elementary unit. In one embodiment, the selection character string unit tree 500 shown is a binary tree. In another embodiment, a corresponding k-ary selection character string unit tree can be built. The elementary units of k condition codes of a condition code group at the same unit position can serve as the k elementary units of one node of the k-ary tree, although each condition code can also contribute several elementary units to each node of the k-ary tree.
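The lookup described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Node`, `lookup`, and the `family_*` labels are hypothetical, units are assumed to be bytes, and, for simplicity, each node stores the offset of its own selected unit (relative to the start of the candidate) rather than taking it from the previous node.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    offset: int                          # position of the selected unit, relative to the candidate start
    unit: int                            # selected unit value (one byte in this sketch)
    mask: int                            # unit mask: only bits set here are compared
    is_leaf: bool                        # type: True = leaf, False = non-leaf
    on_match: Union["Node", str, None]   # leaf: label of a family list; non-leaf: next node
    on_mismatch: Optional["Node"] = None


def lookup(root: Optional[Node], data: bytes, base: int) -> Optional[str]:
    """Walk the tree; return the family-list label selected for the candidate at `base`."""
    node = root
    while node is not None:
        pos = base + node.offset
        matched = pos < len(data) and (data[pos] & node.mask) == (node.unit & node.mask)
        if matched:
            if node.is_leaf:
                return node.on_match      # candidate family; still to be verified in full
            node = node.on_match
        else:
            node = node.on_mismatch
    return None


# Hypothetical tree separating the codes "abcd", "zbcd", and "abxd".
leaf_z = Node(0, ord("z"), 0xFF, True, "family_zbcd")
leaf_a = Node(0, ord("a"), 0xFF, True, "family_abcd", leaf_z)
leaf_x = Node(2, ord("x"), 0xFF, True, "family_abxd")
root = Node(2, ord("c"), 0xFF, False, leaf_a, leaf_x)
```

In this sketch the tree only narrows the candidates; the selected family list would still have to be compared in full, since only a few selected units have been tested.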
In one embodiment, a character string condition code can be a part of another character string condition code. Condition codes in such a parent-child relationship cannot be distinguished by the selection character string unit tree 500. In one embodiment, no further search is needed once any one of them is found, so there is no need to distinguish condition codes in a parent-child relationship; only the shortest of them has to be scanned. In another embodiment, however, all condition codes must be distinguished, or the longest condition code must be identified.
To support condition code families with parent-child relationships, in one embodiment, the condition code family linked list 550 shown in Fig. 5C can be used as the lookup data structure of the condition code verifier 164 (see Fig. 1A). Each item of the condition code family linked list 550 comprises a type 552, an offset 554, a condition code section 556, a condition code section mask 558, a next item pointer 560, a number type 562, and a number 564. In one embodiment, to support condition code families, the list has two kinds of items: lookup items, whose type 552 is 0, and result items, whose type 552 is 1. When a lookup item is checked, the condition code section 556 is compared under the condition code section mask 558. The condition code section 556 and the condition code section mask 558 are the same as the corresponding fields in the condition code linked list 350 (see Fig. 3B). A result item requires no comparison of a condition code section.
In one embodiment, the system finds all matching condition codes and outputs the number 564 of each. The search then continues until the tail of the condition code family linked list 550 is reached, that is, until the next item pointer 560 is null, so that all parent condition codes are found. There are two kinds of number 564: a feature subcode number 566 and a condition code number 568. The kind is determined by the number type 562. When the number type 562 is 0, the number 564 is a feature subcode number 566 and represents a fixed length feature subcode of a random length condition code; otherwise, the number 564 is a condition code number 568 and represents a fixed length condition code.
In one embodiment, the condition code family linked list 550 is linked from the youngest generation, or shortest subcode, to the oldest generation, or longest subcode. The offset 554 specifies the offset from the first unit of the current condition code section 556 to the first unit of the next condition code section 556. If the current condition code section does not match, the search of the condition code family linked list 550 can stop. Giving a definite offset 554 between feature subcodes is what makes this early termination on a mismatch possible.
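The traversal of a condition code family linked list can be sketched as follows. This is an illustrative reading of the description, not the patent's implementation: `FamilyItem`, `scan_family`, and the example numbers 10 and 11 are hypothetical, units are assumed to be bytes, and the offset of each lookup item is taken relative to the start of the previously compared section.

```python
from dataclasses import dataclass
from typing import List, Optional

SEARCH, RESULT = 0, 1   # item type: 0 = lookup item, 1 = result item


@dataclass
class FamilyItem:
    kind: int
    offset: int = 0                    # offset from the previous section's first unit
    segment: bytes = b""               # section to compare (lookup items only)
    mask: bytes = b""                  # per-unit mask applied during comparison
    number: int = 0                    # number reported by a result item
    next: Optional["FamilyItem"] = None


def scan_family(head: Optional[FamilyItem], data: bytes, start: int) -> List[int]:
    """Return the numbers of all family members matching at `start`, shortest first."""
    found: List[int] = []
    pos = start
    item = head
    while item is not None:
        if item.kind == SEARCH:
            pos += item.offset
            window = data[pos:pos + len(item.segment)]
            if len(window) < len(item.segment):
                break                  # ran past the end of the field
            if any((w ^ s) & m for w, s, m in zip(window, item.segment, item.mask)):
                break                  # mismatch: longer members cannot match either
        else:
            found.append(item.number)  # result item: no comparison needed
        item = item.next
    return found


# Hypothetical family: "abc" (number 10) is contained in "abcde" (number 11).
r11 = FamilyItem(RESULT, number=11)
s_de = FamilyItem(SEARCH, offset=3, segment=b"de", mask=b"\xff\xff", next=r11)
r10 = FamilyItem(RESULT, number=10, next=s_de)
head = FamilyItem(SEARCH, offset=0, segment=b"abc", mask=b"\xff\xff\xff", next=r10)
```

Because the list runs from the shortest member to the longest, the first mismatching section ends the walk without examining longer members.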
In one embodiment, each generation of a condition code family linked list 550 can hold only one condition code. If a particular generation contains a plurality of condition codes, a plurality of condition code family linked lists 550 can be used, one for each condition code of that generation. The plurality of condition code family linked lists 550 can be distinguished with the selection character string unit tree 500.
In another embodiment, condition codes with masks can be stored and scanned with a content addressable memory (CAM) together with one or more memories.
Random length signature scan
As shown in Fig. 1, the fixed length condition code Lookup engine 160 outputs, in scan order, the number, size, and position within the character string field of every matched fixed length feature subcode of a random length condition code. Using this information, the random length condition code Lookup engine 180 synthesizes the identified fixed length feature subcodes into random length condition codes. In one embodiment, one or more finite automata (FA) are used to synthesize the fixed length feature subcodes. In another embodiment, the random length condition code Lookup engine 180 comprises a condition code rule searching device 182, a condition code state verification device 184, a condition code composition rule database 186, and a condition code state table 188. The condition code rule database 186 provides the static rules for synthesizing the fixed length feature subcodes of a random length condition code into that random length condition code. The condition code state table 188 dynamically stores the state of the synthesis in progress for an input character string field.
The condition code rule searching device 182 retrieves from the condition code rule database 186 the rules associated with the number of a matched fixed length feature subcode and passes them to the condition code state verification device 184. The condition code state verification device 184 synthesizes the matched fixed length feature subcodes into random length condition codes according to the condition code composition rules and updates the condition code state table 188.
The condition code rule database 186 and the condition code state table 188 can take various data structures. In one embodiment, the condition code rule database 186 can be a condition code rule linked list 600 as shown in Fig. 6. A condition code rule linked list 600 is pointed to by the number of a feature subcode found by the fixed length condition code Lookup engine 160. Several random length condition codes can contain the same fixed length feature subcode; the condition code rule linked list 600 links together all random length condition codes that contain the fixed length feature subcode represented by a particular feature subcode number.
In one embodiment, each item of the condition code rule linked list 600 corresponds to one random length condition code. Each item comprises a condition code number 602, an order 604, a last code 606, a next code pointer 608, and a distance range information 610. The condition code number 602 identifies a particular random length condition code. The order 604 is the position of the fixed length feature subcode represented by the feature subcode number among all fixed length feature subcodes of that random length condition code. The last code 606 indicates whether the fixed length feature subcode is the last fixed length feature subcode of the random length condition code, and thus marks the end of the scan for that random length condition code. The next code pointer 608 points to the next item of the list 600. The distance range information 610 is an optional field giving the distance range between this fixed length feature subcode and the next one, that is, the minimum and maximum number of units between the two. For example, when the range, the longest distance, or the shortest distance is given in advance or is infinite, the distance range information 610 can be omitted or reduced to the shortest distance or the longest distance alone.
In another embodiment, each item of the condition code rule linked list 600 can comprise one or more additional fields describing one or more random length feature subcodes located between two or more fixed length feature subcodes. For example, a "pattern" field or a "pointer to pattern" field can be added to each item of the condition code rule linked list 600 to describe a repeated pattern that fills the distance given by the distance range information 610, and so describe a random length feature subcode.
As shown in Fig. 7, in one embodiment, the condition code state table 188 can be implemented with one or more condition code state linked lists 700. Each condition code state linked list 700 dynamically stores, for one character string field of a particular link, the condition code state of all identified fixed length feature subcodes of all random length condition codes. Each item of the condition code state linked list 700 comprises a condition code number 702, a last subcode order 704 (lorder), a next subcode position 706 (nloc), and a next code pointer 708 (nptr). The condition code number 702 is the number of a particular random length condition code. The last subcode order 704 is the order of the most recently matched fixed length feature subcode of that random length condition code. The next subcode position 706 is the range of valid positions for the next fixed length feature subcode of that random length condition code.
In one embodiment, each character string field of each line has one condition code state linked list 700. Normally only one character string field of each line is being scanned at any moment, so each line has only one condition code state linked list 700. The condition code state linked list 700 can hold the entire valid history of all matched subcodes of all random length condition codes for one character string field of a particular link.
The condition code state linked list 700 can be dynamic. In one embodiment, a new item can be added to the condition code state linked list 700 of a character string field of a particular link if the fixed length feature subcode indicated by the feature subcode number is the first fixed length feature subcode of a random length condition code, that is, its order 604 is 1, and the list has no item for that random length condition code yet. An item of the condition code state linked list 700 can be deleted if the position of the first unit of a matched fixed length feature subcode falls outside the valid range given by the next subcode position 706, or if a timeout occurs. An item can also be deleted once a matching random length condition code has been found on the basis of that item. In one embodiment, all items of the condition code state linked list 700 of a character string field of a particular link can be deleted at the end of that character string field.
As shown in Fig. 1 and Fig. 6, in one embodiment, the condition code rule searching device 182 receives the number of a fixed length feature subcode from the fixed length condition code Lookup engine 160. It then traverses the entire condition code rule linked list 600 indicated by that number, until the tail of the list is reached (that is, until the next code pointer 608 is null). For each item, it sends to the condition code state verification device 184 the information given by the item (such as {condition code number 602, order 604, last code 606, distance range information 610}) together with the information given by the fixed length condition code Lookup engine 160 (such as {position of the first unit of the fixed length feature subcode, length of the fixed length feature subcode, line number, character string field number}).
For each item received from the condition code rule searching device 182, the condition code state verification device 184 searches the condition code state linked list 700 indicated by the line number. An item of the condition code state linked list 700 is a match if the condition code number 602 equals the condition code number 702, the order 604 equals the last subcode order 704 plus one, and the position of the first elementary unit of the fixed length feature subcode lies within the valid range given by the next subcode position 706. For each matching item of the condition code state linked list 700, if the last code 606 is 1, the condition code number 602 is output and the item is deleted from the condition code state linked list 700. Otherwise the item is updated: the last subcode order 704 is set to the order 604, and the next subcode position 706 is set to the sum of the position of the first unit of the fixed length feature subcode, the length of the fixed length feature subcode, and the value of the distance range information 610. Nothing is done for items that do not match.
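The interplay of the rule list and the state list can be sketched in a few lines of Python. The sketch rests on several assumptions that go beyond the text: the names `Rule`, `State`, `verify`, and `RULES` are hypothetical; dictionaries stand in for the linked lists of one character string field; the distance range is modelled as a (minimum, maximum) gap measured from the end of one fixed length feature subcode to the start of the next; and a random length condition code with a single subcode is reported immediately.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rule:                        # one item of a rule list (cf. fields 602-610)
    sig_id: int                    # condition code number
    order: int                     # order of this subcode within the condition code
    last: bool                     # True if this is the last subcode
    gap: Tuple[int, int] = (0, 0)  # (min, max) units before the next subcode


@dataclass
class State:                       # one item of a state list (cf. fields 702-706)
    last_order: int
    next_range: Tuple[int, int]    # valid start positions of the next subcode


def verify(rules: Dict[int, List[Rule]], states: Dict[int, State],
           sub_id: int, pos: int, length: int) -> List[int]:
    """Process one matched subcode; return the numbers of completed condition codes."""
    done: List[int] = []
    for r in rules.get(sub_id, []):
        nxt = (pos + length + r.gap[0], pos + length + r.gap[1])
        st = states.get(r.sig_id)
        if r.order == 1:
            if st is None:                       # first subcode, no entry yet
                if r.last:
                    done.append(r.sig_id)
                else:
                    states[r.sig_id] = State(1, nxt)
            continue
        if (st is not None and r.order == st.last_order + 1
                and st.next_range[0] <= pos <= st.next_range[1]):
            if r.last:
                done.append(r.sig_id)            # whole condition code matched
                del states[r.sig_id]
            else:
                states[r.sig_id] = State(r.order, nxt)
    return done


# Hypothetical rules: condition code 7 is subcode 100 followed, after a gap of
# 0 to 16 units, by subcode 101.
RULES = {100: [Rule(7, 1, False, (0, 16))], 101: [Rule(7, 2, True)]}
```

For example, reporting subcode 100 at position 0 with length 4 creates a state entry whose next valid range is positions 4 to 20; a later report of subcode 101 inside that range completes condition code 7, while a report outside it leaves the entry untouched.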
In one embodiment, when the condition code state linked list 700 is short and a particular link has only one character string field to scan at a given moment, the condition code state linked list 700 can be searched directly. If the condition code state linked list 700 is long, or a particular link has several character string fields to scan at a given moment, other lookup data structures can be used for the condition code state table 188. In one embodiment, the condition code state table 188 can be a condition code state Bloom filter or a condition code state hash table similar to the data structures in Figs. 2A-C. The hash key of the Bloom filter or hash table can be the 3-tuple {line number, character string field number, condition code number}. In one embodiment, when each line has only one character string field being scanned at a given moment, the character string field number is not needed.
As shown in Fig. 8, in one embodiment, the hash entry block 256 in Figs. 2A-C can be replaced by a hash entry block 856. Each item of the hash entry block 856 comprises a condition code number 860, a line number 862, a character string field number 864, a last subcode order 866, and a next subcode position 868. The last subcode order 866 and the next subcode position 868 are defined in the same way as the last subcode order 704 and the next subcode position 706 in Fig. 7. The hash key 3-tuple {condition code number 860, line number 862, character string field number 864} can be stored and used to resolve hash collisions. An item of the hash entry block 856 is a match if its original hash key is identical, the order 604 equals the last subcode order 866 plus one, and the position of the first unit of the fixed length feature subcode lies within the valid position range given by the next subcode position 868. For each matching item of the hash entry block 856, if the last code 606 is 1, the condition code number 602 is output and the item is deleted from the hash entry block 856; if the last code 606 is 0, the item is updated: the last subcode order 866 is set to the order 604, and the next subcode position 868 is set to the sum of the position of the first unit of the fixed length feature subcode, the length of the fixed length feature subcode, and the distance range information 610. Items that do not match remain unchanged.
In one embodiment, when the order and distance range of two consecutive fixed length feature subcodes satisfy one or more items of the condition code rule linked list 600, the condition code state verification device 184 further verifies whether the character string segment between the two fixed length feature subcodes matches the one or more random length feature subcodes indicated by those items. When the character string between the two consecutive fixed length feature subcodes in the character string field matches a random length feature subcode indicated by one or more items of the condition code rule linked list 600, the condition code state linked list 700 is updated with the new fixed length feature subcode.
Design and performance of the scanning system
In one embodiment, the speed of the fast character string signature scan engine 100 is limited by the speed of the fingerprint scan engine 140, for example, when the false positive match rate is sufficiently small and the subsequent scanning stages are properly designed. When fingerprints are scanned as a whole, one length at a time, the speed of the fingerprint scan engine 140 depends on the combination of the scanning step, the number of fingerprint lengths, and the clock rate. In one embodiment, the speed of the scanning engine 100 is (s/m) × R, where s is the scanning step, m is the number of fingerprint lengths, and R is the clock rate. For example, if the scanning step is 8 bytes, the fingerprint lengths are 4, 8, 16, and 32 bytes, and the clock frequency is 500 MHz, the speed of the scanning engine 100 is (8/4) × 500 MB/s = 1 GB/s, that is, 8 × (8/4) × 500 Mbit/s = 8 Gbit/s.
In another embodiment, if the fingerprints are first scanned in parallel as segments and the segments are then combined serially on an "at least one fingerprint matches" basis, and the fingerprint segment length equals the scanning step, the speed of a single scanning engine 100 is s × R. In one embodiment, with s and R as in the previous example, the scanning engine 100 can scan a character string field at 32 Gbps. In another embodiment, the fingerprints can first be scanned as segments and then combined in parallel, so that the scanning step, and with it the scanning speed, increases a further n times, where n is the number of fingerprint segments scanned and combined in parallel. With the fingerprint segment length and R as in the previous example, if n is 4, four fingerprint segments are scanned and combined in parallel, the scanning step is 32 bytes, and the scanning engine 100 can scan a character string field at 128 Gbps.
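The three throughput figures quoted for a single engine follow from the same arithmetic, which the short Python sketch below reproduces. The function name `scan_rate_gbps` and its parameters are illustrative only; the formula is (s/m) × R converted from bytes to bits, with m = 1 when fingerprint segments equal to the scanning step are used.

```python
def scan_rate_gbps(step_bytes: int, clock_hz: float, num_lengths: int = 1) -> float:
    """Scan rate in Gbit/s for scanning step s (bytes), clock rate R, and m fingerprint lengths."""
    bytes_per_second = step_bytes / num_lengths * clock_hz   # (s / m) * R
    return bytes_per_second * 8 / 1e9                        # bytes -> bits, then giga


# Length-by-length scan: s = 8 bytes, m = 4 lengths, R = 500 MHz.
whole_fingerprints = scan_rate_gbps(8, 500e6, num_lengths=4)

# Segments of 8 bytes scanned in parallel, combined serially: s * R.
serial_combine = scan_rate_gbps(8, 500e6)

# Four 8-byte segments scanned and combined in parallel: step of 4 * 8 = 32 bytes.
parallel_combine = scan_rate_gbps(4 * 8, 500e6)
```

With these inputs the three values correspond to the 8 Gbit/s, 32 Gbps, and 128 Gbps figures given in the text.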
The scanning speeds discussed above are those of a single signature scan engine. In one embodiment, when several signature scan engines scan in parallel, the signature scan speed can be increased several times further.
In one embodiment, the overall architecture and parameters of a signature scan system can be selected according to one or more of the following factors: the required scanning speed for character string condition codes, the sizes of the fixed length condition codes or of the fixed length feature subcodes of random length condition codes, the similarity among condition codes or feature subcodes, and the size of the condition code database, so as to ensure that the fingerprint scan engine 140, the fixed length condition code Lookup engine 160, and the random length condition code Lookup engine 180 meet the requirements of the particular scanning system. For example, the scanning step can be selected according to system requirements. As shown in Table 1, the longer the scanning step, the faster the fast character string signature scan engine 100. However, the minimum size of a fixed length condition code or feature subcode must then be larger, and the number of insertions and deletions is greater. A large scanning step also limits the choice of fingerprint for each condition code and increases the probability of fingerprint collisions and false positive fingerprint matches.
Table 1: Selection of the scanning step
[Table 1 is reproduced as image BPA00001213012300401 in the original publication.]
In addition, the scanning step, and hence the scanning speed, is limited in particular by the size of the shortest fixed length condition code or feature subcode. In one embodiment, to avoid scanning short condition codes separately, the scanning step can be selected from Table 1 according to the minimum size of the fixed length condition codes or feature subcodes.
Table 1 assumes that every elementary unit of all fixed length condition codes and feature subcodes can be used in a fingerprint. In one embodiment, every fixed length condition code and feature subcode is fully specified in at least one shadow space. In another embodiment, the scanning step, and hence the scanning speed, is further limited by the minimum size of the fully specified shadows of all fixed length condition codes and feature subcodes. The heading of the third column of Table 1 can therefore be changed to "minimum size of the fully specified shadow of all fixed length condition codes or feature subcodes".
In another embodiment, a larger scanning step can be selected to improve scanning speed. Condition codes too short to be scanned with that scanning step can be scanned separately, for example, by the scanning method described above or by any other scanning method. Increasing the scanning step is very effective when only a few fixed length condition codes or feature subcodes are short.
In another embodiment, the number of engines can differ between stages of the scanning pipeline and can be selected according to the requirements of the particular system. For example, a particular system can use a scan preprocessing engine 120, four fingerprint scan engines 140, one fixed length condition code Lookup engine 160, and two random length condition code Lookup engines 180.
In one embodiment, a plurality of fingerprint scan engines 140 can be used, each covering one group of fingerprint lengths, to provide multiresolution fingerprint scanning. In one embodiment, all fingerprints are broken down into fingerprint segments of equal length, and all fingerprint segments are scanned with the same scanning step. The number of fingerprint scan engines used for each group of fingerprint lengths can then be the same for all groups.
In another embodiment, fingerprints are broken down into fingerprint segments of different lengths, and segments of different lengths are scanned with scanning steps of different lengths according to the average length of a group of fingerprint lengths: longer fingerprint segments and larger scanning steps are used for a group of fingerprint lengths with a longer average length, and shorter fingerprint segments and smaller scanning steps for a group with a shorter average length. For example, fingerprint segments and a scanning step of 8 elementary units can be used for fingerprints of length 8, 16, 24, 32, and 40 elementary units, and fingerprint segments and a scanning step of 2 elementary units for fingerprints of length 2, 4, and 6 elementary units.
In one embodiment, to balance the speeds of fingerprint scan engines 140 that scan fingerprint segments of different lengths with different scanning steps, more fingerprint scan engines 140 can be used for the shorter fingerprint segments and shorter scanning step than for the longer fingerprint segments and longer scanning step. In another embodiment, to balance the speeds of fingerprint scan engines 140 that use memories of different speeds, more fingerprint scan engines 140 can be used with the slower memory than with the faster memory. More generally, in another embodiment, the number of fingerprint scan engines 140 can be determined by the product of scanning step and memory speed: more fingerprint scan engines 140 can be used where that product is smaller than where it is larger.
In one embodiment, a plurality of fingerprint scan engines 140 that have the same scanning step and scan the same group of fingerprints cover a plurality of non-overlapping, interleaved positions in an input character string field, so that the total scanning step of the engines is the product of the number of fingerprint scan engines and the scanning step of a single engine. For example, to provide the same scanning speed, the number of fingerprint scan engines with a scanning step of 2 elementary units can be 4 times the number of fingerprint scan engines with a scanning step of 8 elementary units. In another embodiment, such engines cover a plurality of partially overlapping, interleaved positions in an input character string field, so that their total scanning step is greater than the scanning step of a single engine but less than the product of the number of engines and the scanning step of a single engine.
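The balancing rule and the non-overlapping interleaving can be made concrete with a small Python sketch. Both helper names (`engines_needed`, `start_positions`) are hypothetical, and the sketch assumes that memory speed is expressed as lookups per second and that each engine advances by one scanning step per lookup.

```python
def engines_needed(target_rate: int, step: int, lookups_per_second: int) -> int:
    """Engines required to reach `target_rate` units/s, given step * memory speed per engine."""
    per_engine = step * lookups_per_second
    return -(-target_rate // per_engine)          # integer ceiling division


def start_positions(engine_index: int, engines: int, step: int, field_length: int) -> list:
    """Window start positions for one engine when engines interleave without overlap."""
    total_step = engines * step                   # total step = number of engines * single step
    return list(range(engine_index * step, field_length, total_step))


# Same target rate, same memory speed: a step of 2 units needs 4 times the
# engines of a step of 8 units.
wide = engines_needed(4_000_000_000, 8, 500_000_000)
narrow = engines_needed(4_000_000_000, 2, 500_000_000)
```

With four engines of step 2 on a 20-unit field, engine 0 starts its windows at positions 0, 8, and 16, and engine 3 at positions 6 and 14, so together the engines advance by 8 units per round.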
In one embodiment, the fingerprint databases 148 for different fingerprint segment lengths can be stored in memories of different speeds, so that the memory used for shorter fingerprint segments is faster than the memory used for longer ones. In one embodiment, the corresponding fixed length condition code databases 166 for different fingerprint length groups can likewise be stored in memories of different speeds, so that the memory used for a group of fingerprints with a shorter average length is faster than that used for a group with a longer average length.
In one embodiment, the fingerprint database 148 for fingerprints shorter than a particular length (for example, 9 elementary units) can be stored in one of the fastest memories of the scanning system (for example, on-chip memory or the CPU cache). In one embodiment, all or part of the corresponding fixed length condition code database 166 for fingerprints shorter than that length can also be stored, together with the fingerprint database 148, in one of the fastest memories of the scanning system. In another embodiment, several fingerprint scan engines for the same fingerprint group can share a fingerprint database 148 stored in a multiport memory.
In one embodiment, one or more engines in the pipeline stages discussed above can be replaced by any other scanning method. For example, in one embodiment, a content addressable memory (CAM) can be used as the fingerprint scan engine 140 to scan the shadows of fingerprints, while the fixed length condition code Lookup engine 160 and the random length condition code Lookup engine 180 are still used to scan condition codes further. In another embodiment, a CAM can be used as the fingerprint scan engine 140 to scan one or more fingerprints in the original space. In one embodiment, a deterministic or non-deterministic finite automaton (DFA or NFA) can be used as the fingerprint compositor 150 to combine fingerprint segments. In another embodiment, a DFA or NFA can be used as the random length condition code Lookup engine 180 to synthesize fixed length feature subcodes into random length condition codes.
Other embodiments of the invention can scan other string data. For example, in a biological system, a genetic code sequence can serve as a character string field. A condition code describing a particular gene can be used to identify that gene sequence in a character string field composed of genetic data. For example, a particular gene can be identified by its condition code using the scanning engine described.
The invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and their structural equivalents, or in combinations of them. The invention can be implemented as one or more computer program products, that is, one or more computer programs embodied in an information carrier, such as a machine-readable storage device or a propagated signal, for execution by, or to control the operation of, data processing apparatus such as a programmable processor, a computer, or multiple computers. A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file. A program can be stored in a file together with other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (for example, files that store one or more modules, subroutines, or portions of code). A computer program can be deployed to be executed on one computer, or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification, including the method steps of the invention, can be performed by one or more programmable processors executing one or more computer programs to perform the functions of the invention by operating on input data and generating output. The processes and logic flows can also be performed by, and the apparatus of the invention can be implemented as, special purpose logic circuitry, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor receives instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memories for storing instructions and data. Generally, a computer also includes, or is operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data (such as magnetic disks, magneto-optical disks, or optical disks). Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including, by way of example, semiconductor memory devices such as EPROM, EEPROM, and flash memory; magnetic disks such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, the invention can be implemented on a computer having a display device, a keyboard, and a pointing device. The display device, such as a cathode ray tube (CRT) or liquid crystal display (LCD), displays information to the user; the keyboard and the pointing device, such as a mouse or a trackball, let the user provide input to the computer. Other kinds of devices can also provide for interaction between the user and the computer; for example, the feedback the computer provides to the user can be any form of sensory feedback, such as visual, auditory, or tactile feedback, and input from the user can be received in any form, including acoustic, speech, or tactile input.
The invention can be implemented in a computing system that includes a back-end component, such as a data server, or a middleware component, such as an application server, or a front-end component, such as a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the invention, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), such as the Internet.
The computing system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises from computer programs that run on the respective computers and have a client-server relationship to each other.
Fig. 9 shows an example of such a computer: a block diagram of a programmable processing system (system) 910 suitable for implementing or performing the apparatus or methods of the invention. The system 910 includes a processor 920, a random access memory (RAM) 921, a read-only memory (ROM) 922 (for example, a writable read-only memory such as a flash ROM), a hard drive controller 923, a video controller 931, and an input/output (I/O) controller 924, coupled by a processor (CPU) bus 925. The system 910 can be preprogrammed, for example in ROM, or it can be programmed (and reprogrammed) by loading a program from another source (for example, a floppy disk, a CD-ROM, or another computer).
The hard drive controller 923 is coupled to a hard disk 930 suitable for storing executable computer programs.
The I/O controller 924 is coupled by an I/O bus 926 to an I/O interface 927. The I/O interface 927 receives and transmits analog or digital data (for example, still images, pictures, movies, and animations) over communication links such as a serial link, a local area network, a wireless link, and a parallel link.
A display 928 and a keyboard 929 are also coupled to the I/O bus 926. Alternatively, separate connections (separate buses) can be used for the I/O interface 927, the display 928, and the keyboard 929.
The invention has been described in terms of particular embodiments. Other embodiments are within the scope of the appended claims. For example, the steps of the invention can be performed in a different order and still achieve the desired results.

Claims (166)

1. A character string signature scan method, the method comprising:
processing one or more condition codes into one or more formats, the processing including selecting a fingerprint for each fixed-length condition code, or for each fixed-length feature subcode among the one or more fixed-length feature subcodes of each variable-length condition code, and building, for the one or more fingerprints of the one or more condition codes, one or more data structures for fingerprint lookup, wherein each fingerprint comprises one or more fragments of the fixed-length condition code or feature subcode, and the one or more fragments may be located at any position within the fixed-length condition code or feature subcode;
receiving a character string field composed of data values; identifying any condition code in the character string field, including scanning the character string field at positions spaced one scanning step apart to look up the one or more fingerprints of the one or more condition codes; and
outputting any matched condition code in the character string field.
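The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the use of a plain dictionary as the fingerprint lookup structure, and the choice of a single contiguous fragment at one fixed offset as each fingerprint are all assumptions made for illustration. A condition code here is simply a byte string.

```python
def build_fingerprint_table(condition_codes, fp_len, fp_offset=0):
    """Map each fingerprint to the condition codes it was taken from.

    Assumption: the fingerprint is one contiguous fragment of length
    fp_len starting at fp_offset inside every condition code.
    """
    table = {}
    for code in condition_codes:
        if len(code) < fp_offset + fp_len:
            raise ValueError("condition code shorter than its fingerprint")
        fingerprint = code[fp_offset:fp_offset + fp_len]
        table.setdefault(fingerprint, []).append((code, fp_offset))
    return table


def scan(field, table, fp_len, step=1):
    """Probe the field every `step` positions and verify each fingerprint hit."""
    matches = []
    for pos in range(0, len(field) - fp_len + 1, step):
        for code, offset in table.get(field[pos:pos + fp_len], ()):
            start = pos - offset
            # A fingerprint hit is only a candidate; confirm the whole code.
            if start >= 0 and field.startswith(code, start):
                matches.append((start, code))
    return matches
```

With a scanning step of 1, every alignment of a condition code is probed, so a single fingerprint per code suffices in this sketch.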
2. The method according to claim 1, wherein the fingerprints of the one or more condition codes all have the same length.
3. The method according to claim 2, wherein processing the one or more condition codes further comprises selecting one or more shadow spaces and projecting one or more fingerprints into the one or more shadow spaces, wherein a shadow space introduces ambiguity into the original space and is therefore broader than the original space, so that the shadow of a fingerprint in the shadow space corresponds to one or more fingerprints in the original space; and looking up the one or more fingerprints of the one or more condition codes further comprises looking up the shadows of the one or more fingerprints in the one or more shadow spaces.
4. The method according to claim 3, wherein processing the one or more condition codes further comprises building, for the one or more condition codes, one or more follow-up data structures corresponding to each fingerprint for looking up the condition codes that correspond to that fingerprint; and identifying any condition code in the character string field further comprises, at the scan position of each of one or more matched fingerprints, using the follow-up data structures corresponding to the matched fingerprints to look up a match of any condition code.
5. The method according to claim 1, wherein the fingerprints of the one or more condition codes have two or more different lengths.
6. The method according to claim 5, wherein processing the one or more condition codes further comprises selecting one or more shadow spaces and projecting one or more fingerprints into the one or more shadow spaces, wherein a shadow space introduces ambiguity into the original space and is therefore broader than the original space, so that the shadow of a fingerprint in the shadow space corresponds to one or more fingerprints in the original space; and looking up the one or more fingerprints of the one or more condition codes further comprises looking up the shadows of the one or more fingerprints in the one or more shadow spaces.
7. The method according to claim 6, wherein processing the one or more condition codes further comprises building, for the one or more condition codes, one or more follow-up data structures corresponding to each fingerprint for looking up the condition codes that correspond to that fingerprint; and identifying any condition code in the character string field further comprises, at the scan position of each of one or more matched fingerprints, using the follow-up data structures corresponding to the matched fingerprints to look up a match of any condition code.
8. The method according to claim 1, wherein processing the one or more condition codes further comprises selecting one or more shadow spaces and projecting one or more fingerprints into the one or more shadow spaces, wherein a shadow space introduces ambiguity into the original space and is therefore broader than the original space, so that the shadow of a fingerprint in the shadow space corresponds to one or more fingerprints in the original space; and looking up the one or more fingerprints of the one or more condition codes further comprises looking up the shadows of the one or more fingerprints in the one or more shadow spaces.
9. The method according to claim 8, wherein processing the one or more condition codes further comprises building, for the one or more condition codes, one or more follow-up data structures corresponding to each fingerprint for looking up the condition codes that correspond to that fingerprint; and identifying any condition code in the character string field further comprises, at the scan position of each of one or more matched fingerprints, using the follow-up data structures corresponding to the matched fingerprints to look up a match of any condition code.
10. The method according to claim 1, wherein processing the one or more condition codes further comprises building, for the one or more condition codes, one or more follow-up data structures corresponding to each fingerprint for looking up the condition codes that correspond to that fingerprint; and identifying any condition code in the character string field further comprises, at the scan position of each of one or more matched fingerprints, using the follow-up data structures corresponding to the matched fingerprints to look up a match of any condition code.
11. The method according to claim 10, wherein the fingerprints of the one or more condition codes all have the same length.
12. The method according to claim 10, wherein the fingerprints of the one or more condition codes have two or more different lengths.
13. A character string signature scan method, the method comprising:
processing one or more condition codes into one or more formats, the processing including selecting a plurality of fingerprints for each fixed-length condition code, or for each fixed-length feature subcode among the one or more fixed-length feature subcodes of each variable-length condition code, and building, for the plurality of fingerprints of the one or more condition codes, one or more data structures for fingerprint lookup, wherein the number of fingerprints of each fixed-length condition code or feature subcode equals the scanning step, and the first elementary unit of each subsequent fingerprint is shifted, in the scanning direction, by one or more elementary units relative to the first elementary unit of the previous fingerprint, so that the fixed-length condition code or feature subcode can be identified at any position in any scanned character string field, and wherein each fingerprint comprises one or more fragments of the fixed-length condition code or feature subcode, and the one or more fragments may be located at any position within the fixed-length condition code or feature subcode;
receiving a character string field composed of data values;
identifying any condition code in the character string field, including scanning the character string field at positions spaced the scanning step apart to look up the plurality of fingerprints of the one or more condition codes; and
outputting any matched condition code in the character string field.
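A short Python sketch can show why one fingerprint per unit of scanning step is enough to catch a condition code at any alignment. The sketch assumes that consecutive fingerprints are shifted by exactly one elementary unit (one byte), that each fingerprint is a contiguous fragment, and that a dictionary serves as the lookup structure; the function names are hypothetical.

```python
def build_shifted_fingerprints(condition_codes, fp_len, step):
    """Select `step` fingerprints per code, at offsets 0 .. step-1."""
    table = {}
    for code in condition_codes:
        if len(code) < fp_len + step - 1:
            raise ValueError("code too short for this step and fingerprint length")
        for offset in range(step):
            fingerprint = code[offset:offset + fp_len]
            table.setdefault(fingerprint, []).append((code, offset))
    return table


def strided_scan(field, table, fp_len, step):
    """Probe only every `step`-th position of the field."""
    found = []
    for pos in range(0, len(field) - fp_len + 1, step):
        for code, offset in table.get(field[pos:pos + fp_len], ()):
            start = pos - offset
            # Confirm the candidate against the full condition code.
            if start >= 0 and field.startswith(code, start):
                found.append((start, code))
    return found
```

Whatever position a condition code starts at, exactly one of the `step` consecutive positions from its start is a probed position, and the fingerprint with the matching offset begins there. The scanner therefore touches only one position in every `step`, at the cost of a lookup table that is `step` times larger.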
14. The method according to claim 13, wherein the fingerprints of the one or more condition codes all have the same length.
15. The method according to claim 14, wherein processing the one or more condition codes further comprises selecting one or more shadow spaces and projecting one or more fingerprints into the one or more shadow spaces, wherein a shadow space introduces ambiguity into the original space and is therefore broader than the original space, so that the shadow of a fingerprint in the shadow space corresponds to one or more fingerprints in the original space; and looking up the one or more fingerprints of the one or more condition codes further comprises looking up the shadows of the one or more fingerprints in the one or more shadow spaces.
16. The method according to claim 15, wherein processing the one or more condition codes further comprises building, for the one or more condition codes, one or more follow-up data structures corresponding to each fingerprint for looking up the condition codes that correspond to that fingerprint; and identifying any condition code in the character string field further comprises, at the scan position of each of one or more matched fingerprints, using the follow-up data structures corresponding to the matched fingerprints to look up a match of any condition code.
17. The method according to claim 13, wherein the fingerprints of the one or more condition codes have two or more different lengths.
18. The method according to claim 17, wherein processing the one or more condition codes further comprises selecting one or more shadow spaces and projecting one or more fingerprints into the one or more shadow spaces, wherein a shadow space introduces ambiguity into the original space and is therefore broader than the original space, so that the shadow of a fingerprint in the shadow space corresponds to one or more fingerprints in the original space; and looking up the one or more fingerprints of the one or more condition codes further comprises looking up the shadows of the one or more fingerprints in the one or more shadow spaces.
19. The method according to claim 18, wherein processing the one or more condition codes further comprises building, for the one or more condition codes, one or more follow-up data structures corresponding to each fingerprint for looking up the condition codes that correspond to that fingerprint; and identifying any condition code in the character string field further comprises, at the scan position of each of one or more matched fingerprints, using the follow-up data structures corresponding to the matched fingerprints to look up a match of any condition code.
20. The method according to claim 13, wherein processing the one or more condition codes further comprises selecting one or more shadow spaces and projecting one or more fingerprints into the one or more shadow spaces, wherein a shadow space introduces ambiguity into the original space and is therefore broader than the original space, so that the shadow of a fingerprint in the shadow space corresponds to one or more fingerprints in the original space; and looking up the one or more fingerprints of the one or more condition codes further comprises looking up the shadows of the one or more fingerprints in the one or more shadow spaces.
21. The method according to claim 20, wherein processing the one or more condition codes further comprises building, for the one or more condition codes, one or more follow-up data structures corresponding to each fingerprint for looking up the condition codes that correspond to that fingerprint; and identifying any condition code in the character string field further comprises, at the scan position of each of one or more matched fingerprints, using the follow-up data structures corresponding to the matched fingerprints to look up a match of any condition code.
22. The method according to claim 13, wherein processing the one or more condition codes further comprises building, for the one or more condition codes, one or more follow-up data structures corresponding to each fingerprint for looking up the condition codes that correspond to that fingerprint; and identifying any condition code in the character string field further comprises, at the scan position of each of one or more matched fingerprints, using the follow-up data structures corresponding to the matched fingerprints to look up a match of any condition code.
23. The method according to claim 22, wherein the fingerprints of the one or more condition codes all have the same length.
24. The method according to claim 22, wherein the fingerprints of the one or more condition codes have two or more different lengths.
25. The method according to any one of claims 1-24, wherein the fingerprints are selected so as to speed up fingerprint scanning while minimizing the probability of false-positive matches in fingerprint scanning.
26. The method according to any one of claims 1-24, wherein the fingerprints are further selected so as to minimize the number of fingerprints shared by multiple condition codes.
27. The method according to any one of claims 1-24, wherein each fingerprint is fully specified in the original space.
28. The method according to any one of claims 3-4, 6-9, 15-16, and 18-21, wherein each fingerprint is fully specified in the original space or in at least one of the shadow spaces.
29. The method according to any one of claims 1-24, wherein each fingerprint is stored, in the original space, in a content addressable memory (CAM).
30. The method according to any one of claims 3-4, 6-9, 15-16, and 18-21, wherein each fingerprint is stored, in the original space or in at least one shadow space, in a content addressable memory (CAM).
31. The method according to any one of claims 1-24, wherein scanning the character string field comprises using one or more hash tables or one or more Bloom filters to look up the one or more fingerprints of the one or more condition codes.
32. The method according to claim 31, wherein the one or more hash tables or one or more Bloom filters further comprise hash value multiplexing, fingerprint length multiplexing, or both hash value multiplexing and fingerprint length multiplexing.
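As a rough illustration of the Bloom filter lookup named in claim 31, the Python sketch below uses a small Bloom filter as a cheap pre-filter and confirms each candidate against an exact set. The class, its parameters (`num_bits`, `num_hashes`), and the use of SHA-256 with a one-byte prefix to derive several hash functions are assumptions for illustration only; the hash value multiplexing and fingerprint length multiplexing of claim 32 are not modeled.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over byte strings (illustrative parameters)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def scan_with_prefilter(field, bloom, exact_fingerprints, fp_len):
    """Report (position, fingerprint) pairs; the Bloom filter rejects most windows."""
    hits = []
    for pos in range(len(field) - fp_len + 1):
        window = field[pos:pos + fp_len]
        # A Bloom filter can return false positives, so confirm exactly.
        if window in bloom and window in exact_fingerprints:
            hits.append((pos, window))
    return hits
```

A Bloom filter never misses a stored fingerprint, so the pre-filter cannot cause a missed match; it can only pass an occasional non-fingerprint window on to the exact check.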
33. The method according to any one of claims 1-24, wherein scanning the character string field uses a scan method that looks up the one or more fingerprints of the one or more condition codes without using hash tables or Bloom filters.
34. The method according to claim 33, wherein scanning the character string field comprises using one or more content addressable memories (CAM) to look up the one or more fingerprints of the one or more condition codes.
35. The method according to any one of claims 1-24, wherein the number of different fingerprint lengths of the plurality of fixed-length condition codes or feature subcodes is smaller than the number of different lengths of the condition codes or feature subcodes to which the fingerprints correspond.
36. The method according to any one of claims 1, 5-10, 12-13, 17-22, and 24, wherein the fingerprint lengths are restricted to a set of fingerprint lengths covering one or more predetermined fingerprint length ranges, so as to realize multi-resolution fingerprint scanning.
37. The method according to any one of claims 1-24, wherein the length of each fingerprint equals the scanning step or an integer multiple of the scanning step.
38. The method according to any one of claims 1-24, wherein building, for the fingerprints of the one or more condition codes, one or more data structures for fingerprint lookup further comprises decomposing each fingerprint into one or more fingerprint segments of one or more lengths, and building, for the fingerprint segments of the one or more condition codes, one or more data structures for looking up fingerprint segments and one or more data structures for synthesizing matched fingerprint segments; and scanning the character string field at each scan position to look up the one or more fingerprints of the one or more condition codes further comprises scanning the character string field at each scan position to look up a plurality of fingerprint segments of the one or more condition codes, and, at each scan position having one or more matched fingerprint segments, synthesizing the matched fingerprint segments into a match of any fingerprint.
39. The method according to claim 38, wherein the fingerprints are further selected so as to minimize the number of fingerprints that share the same fingerprint segment.
40. The method according to claim 38, wherein all the fingerprint segments have the same length, which equals the scanning step or an integer fraction of the scanning step.
41. The method according to claim 38, wherein scanning the character string field comprises using one or more hash tables or one or more Bloom filters to look up the plurality of fingerprint segments of the one or more condition codes.
42. The method according to claim 41, wherein the one or more hash tables or one or more Bloom filters further comprise hash value multiplexing, fingerprint length multiplexing, or both hash value multiplexing and fingerprint length multiplexing.
43. The method according to claim 38, wherein scanning the character string field uses a scan method that looks up the plurality of fingerprint segments of the one or more condition codes without using hash tables or Bloom filters.
44. The method according to claim 43, wherein scanning the character string field comprises using one or more content addressable memories (CAM) to look up the plurality of fingerprint segments of the one or more condition codes.
45. The method according to claim 38, wherein building the one or more synthesis data structures for the fingerprint segments further comprises storing, for each fingerprint segment, one or more fingerprint segment maps that describe the one or more possible positions of the fingerprint segment within all of its corresponding fingerprints; and synthesizing the matched fingerprint segments into a match of any fingerprint further comprises using the one or more fingerprint segment maps.
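One way to read the fingerprint segment map of claim 45 is as a small bitmap per segment, with bit i set when the segment can occur as the i-th segment of some fingerprint. The Python sketch below follows that reading; the bitmap encoding, the equal-length segments, and the function names are assumptions rather than the method as claimed.

```python
def build_segment_maps(fingerprints, seg_len):
    """Return {segment: bitmap}; bit i is set if the segment can be segment i."""
    maps = {}
    for fp in fingerprints:
        if len(fp) % seg_len != 0:
            raise ValueError("fingerprint length must be a multiple of seg_len")
        for i in range(len(fp) // seg_len):
            segment = fp[i * seg_len:(i + 1) * seg_len]
            maps[segment] = maps.get(segment, 0) | (1 << i)
    return maps


def match_fingerprint_at(field, pos, maps, fingerprints, seg_len, num_segs):
    """Synthesize segment matches at `pos` into a fingerprint match, or None."""
    for i in range(num_segs):
        segment = field[pos + i * seg_len:pos + (i + 1) * seg_len]
        # Reject early if this segment can never appear at index i.
        if not (maps.get(segment, 0) >> i) & 1:
            return None
    candidate = field[pos:pos + seg_len * num_segs]
    # Segments of different fingerprints may combine, so confirm the whole.
    return candidate if candidate in fingerprints else None
```

The final membership test matters: with fingerprints `abcd` and `cdab`, the window `abab` passes every per-segment check even though it is not a fingerprint.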
46. The method according to claim 38, wherein building the one or more synthesis data structures for the fingerprint segments further comprises storing, for each fingerprint segment or for the first fingerprint segment of each fingerprint, fingerprint length information that describes the one or more possible lengths of the fingerprints corresponding to the fingerprint segment; and synthesizing the matched fingerprint segments into a match of any fingerprint further comprises using the fingerprint length information.
47. The method according to claim 38, wherein the matched fingerprint segments are synthesized into a match of any fingerprint by a scan method that uses neither fingerprint segment maps nor fingerprint length information.
48. The method according to any one of claims 3-4, 6-9, 15-16, and 18-21, wherein building, for the fingerprints of the one or more condition codes, one or more lookup data structures further comprises segmenting the shadow of each fingerprint, and building, for the shadow segments of the fingerprints of the one or more condition codes, one or more data structures for looking up the shadow segments of fingerprints and one or more data structures for synthesizing matched shadow segments of fingerprints; and scanning the character string field at each scan position to look up the one or more fingerprints, or the shadows of the fingerprints, of the one or more condition codes further comprises scanning the character string field at each scan position to look up the shadow segments of a plurality of fingerprints of the one or more condition codes, and, at each scan position having one or more matched shadow segments of fingerprints, synthesizing the matched shadow segments into a match of the shadow of any fingerprint or a match of any fingerprint.
49. The method according to claim 48, wherein the fingerprints are further selected so as to minimize the number of fingerprints that share the same fingerprint shadow segment.
50. The method according to claim 48, further comprising verifying any identified fingerprint shadow in the original fingerprint space.
51. The method according to claim 48, wherein all the fingerprint segments of each shadow space have the same length, which equals the scanning step of that shadow space or an integer fraction of that scanning step.
52. The method according to claim 48, wherein scanning the character string field comprises using one or more hash tables or one or more Bloom filters to look up the shadow segments of the plurality of fingerprints of the one or more condition codes.
53. The method according to claim 52, wherein the one or more hash tables or one or more Bloom filters further comprise hash value multiplexing, fingerprint length multiplexing, or both hash value multiplexing and fingerprint length multiplexing.
54. The method according to claim 48, wherein scanning the character string field uses a scan method that looks up the shadow segments of the plurality of fingerprints of the one or more condition codes without using hash tables or Bloom filters.
55. The method according to claim 54, wherein scanning the character string field comprises using one or more content addressable memories (CAM) to look up the shadow segments of the plurality of fingerprints of the one or more condition codes.
56. The method according to claim 48, wherein building the one or more synthesis data structures for the shadow segments of fingerprints further comprises storing, for the shadow segment of each fingerprint, one or more fingerprint segment maps that describe the one or more possible positions of the shadow segment within the shadows of all of its corresponding fingerprints; and synthesizing the matched shadow segments of fingerprints into a match of the shadow of any fingerprint or a match of any fingerprint further comprises using the one or more fingerprint segment maps.
57. The method according to claim 48, wherein building the one or more synthesis data structures for the shadow segments of fingerprints further comprises storing, for each shadow segment or for the first shadow segment of the shadow of each fingerprint, fingerprint length information that describes the one or more possible lengths of the shadows of the fingerprints corresponding to the shadow segment; and synthesizing the matched shadow segments of fingerprints into a match of the shadow of any fingerprint or a match of any fingerprint further comprises using the fingerprint length information.
58. The method according to claim 48, wherein the matched shadow segments of fingerprints are synthesized into a match of the shadow of any fingerprint or a match of any fingerprint by a scan method that uses neither fingerprint segment maps nor fingerprint length information.
59. The method according to any one of claims 3-4, 6-9, 15-16, and 18-21, wherein looking up the one or more fingerprints further comprises, at the scan position of each of one or more matched fingerprint shadows, verifying in the original space the fingerprints corresponding to the one or more matched fingerprint shadows.
60. The method according to any one of claims 3-4, 6-9, 15-16, and 18-21, wherein the projection comprises converting uppercase and lowercase letters to all uppercase or all lowercase.
61. The method according to any one of claims 3-4, 6-9, 15-16, and 18-21, wherein the projection comprises one or more of: replacing all letters with a code or with any one letter; replacing all digits with a code or with any one digit; and replacing all spaces and "-" characters with a code, a space, or a "-".
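The projections of claims 60 and 61, together with the verification step of claim 59, can be sketched in Python as follows. The particular replacement characters (`a` for letters, `0` for digits, a space for spaces and `-`), the dictionary used as the shadow-space lookup structure, and the function names are illustrative assumptions.

```python
def fold_case(data: bytes) -> bytes:
    """Claim 60 style projection: map every letter to lowercase."""
    return data.lower()


def classify(data: bytes) -> bytes:
    """Claim 61 style projection: letters -> a, digits -> 0, space and '-' -> space."""
    out = bytearray()
    for b in data:
        ch = bytes([b])
        if ch.isalpha():
            out += b"a"
        elif ch.isdigit():
            out += b"0"
        elif ch in b" -":
            out += b" "
        else:
            out += ch
    return bytes(out)


def build_shadow_table(fingerprints, projection):
    """Map each shadow to the set of original-space fingerprints behind it."""
    table = {}
    for fp in fingerprints:
        table.setdefault(projection(fp), set()).add(fp)
    return table


def lookup(window, table, projection):
    """Look up the window's shadow, then verify candidates in the original space."""
    candidates = table.get(projection(window), ())
    return sorted(fp for fp in candidates if fp == window)
```

Because a projection merges many original strings into one shadow, a shadow hit is only a candidate: in the sketch, `zz99` shares the shadow `aa00` with both stored fingerprints but is rejected by the original-space check.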
62. The method according to any one of claims 1-24, wherein fingerprints of different lengths are scanned separately by different scan methods.
63. The method according to any one of claims 1-24, wherein fingerprints of different lengths are scanned separately by the same scan method with different scan parameters.
64. The method according to claim 63, wherein shorter fingerprints are scanned with a shorter scanning step and longer fingerprints are scanned with a longer scanning step.
65. The method according to any one of claims 4, 7, 9, 10-12, 16, 19, 21, and 22-24, wherein building the one or more follow-up data structures for looking up the condition codes corresponding to a fingerprint further comprises building a differential lookup data structure from the mutually different parts of the plurality of fixed-length condition codes or feature subcodes corresponding to the fingerprint; and using the follow-up data structures corresponding to the matched fingerprints to look up a match of any condition code comprises performing a differential lookup with the differential lookup data structure corresponding to the matched fingerprint.
66. The method according to claim 65, wherein the differential lookup data structure is a binary tree that selects elementary units.
67. The method according to claim 65, wherein the differential lookup data structure is a K-ary tree that selects elementary units.
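A differential lookup can be pictured as a small decision tree over the positions at which the condition codes sharing a fingerprint differ. The Python sketch below builds a binary tree whose inner nodes each select one elementary unit (byte) and branch on whether it equals a pivot value. The tuple-based node layout, the equal-length restriction, and the final full comparison at the leaf are assumptions; the claims do not prescribe this layout.

```python
def build_tree(codes):
    """Build a binary decision tree over equal-length, distinct byte strings."""
    if not codes:
        return ("leaf", None)
    if len(codes) == 1:
        return ("leaf", codes[0])
    for pos in range(len(codes[0])):
        if len({c[pos] for c in codes}) > 1:
            # Select the first unit on which the codes differ.
            pivot = codes[0][pos]
            same = [c for c in codes if c[pos] == pivot]
            other = [c for c in codes if c[pos] != pivot]
            return ("node", pos, pivot, build_tree(same), build_tree(other))
    raise ValueError("codes must be distinct")


def search(tree, field, start):
    """Walk the tree over field[start:], then verify the single survivor."""
    while tree[0] == "node":
        _, pos, pivot, same, other = tree
        index = start + pos
        if index >= len(field):
            return None
        tree = same if field[index] == pivot else other
    code = tree[1]
    if code is not None and field.startswith(code, start):
        return code
    return None
```

Only the distinguishing units are inspected on the way down, so the full comparison is performed against at most one candidate per fingerprint hit.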
68. The method according to any one of claims 4, 7, 9, 10-12, 16, 19, 21, and 22-24, wherein building the one or more follow-up data structures for looking up the condition codes corresponding to a fingerprint further comprises encoding one or more masks onto one or more fixed-length condition codes or feature subcodes, or onto the elementary units, sub-units, or code segments of one or more fixed-length condition codes or feature subcodes, to specify one or more matching conditions including whether a match is required, whether the match is case-sensitive, a predetermined range, an arbitrary range, logical NOT, and other logical operations; and using the follow-up data structures corresponding to the matched fingerprints to look up a match of any condition code further comprises looking up, at the position of each of one or more matched fingerprints, the mask-encoded fixed-length condition codes or feature subcodes corresponding to the one or more matched fingerprints.
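The matching conditions listed in claim 68 can be illustrated by attaching a mask kind to every elementary unit of a fixed-length pattern. In the Python sketch below the mask kinds (`EXACT`, `ANY`, `NOCASE`, `RANGE`, `NOT`) and the list-of-pairs pattern encoding are hypothetical names chosen for illustration; the source does not specify an encoding.

```python
EXACT, ANY, NOCASE, RANGE, NOT = range(5)


def unit_matches(kind, arg, value):
    """Apply one per-unit mask to one byte value."""
    if kind == EXACT:
        return value == arg
    if kind == ANY:            # unit does not need to match
        return True
    if kind == NOCASE:         # case-insensitive comparison
        return bytes([value]).lower() == bytes([arg]).lower()
    if kind == RANGE:          # arg is an inclusive (low, high) pair
        low, high = arg
        return low <= value <= high
    if kind == NOT:            # logical NOT of an exact match
        return value != arg
    raise ValueError("unknown mask kind")


def masked_match(pattern, field, start):
    """True if every masked unit of the pattern matches field[start:]."""
    if start < 0 or start + len(pattern) > len(field):
        return False
    return all(
        unit_matches(kind, arg, field[start + i])
        for i, (kind, arg) in enumerate(pattern)
    )
```

A pattern such as "`id=` in any case, then a digit, then any byte, then anything but `;`" becomes a six-entry list, and `masked_match` is called at the position implied by a fingerprint hit.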
69. according to claim 4,7,9,10-12,16,19,21, and 22-24 method described in one of them, described structure is one or more follow-up for searching the data structure of the corresponding condition code of fingerprint, further comprise each the random length condition code in one or more condition codes is decomposed into a plurality of fixed length feature subcodes and one or more random length feature subcode, and build one by the condition code composition rule database of fixed length feature subcode indication; Fingerprint that described use has been mated is corresponding follow-up for searching the coupling of any condition code of data structure lookup of the corresponding condition code of fingerprint, be included on the position of fixed length feature subcode of one or more couplings with described condition code composition rule database, described fixed length feature subcode of having mated and corresponding random length feature subcode synthesized to the coupling of any random length condition code; How wherein said condition code composition rule database provides the rule that the fixed length feature subcode of random length condition code and random length feature subcode is synthesized to random length condition code.
70. according to method described in claim 69, and described condition code composition rule database comprises the corresponding random length condition code numbering of described fixed length feature subcode; Described synthetic fixed length feature subcode of having mated comprises the corresponding random length condition code numbering of fixed length feature subcode of having mated described in inspection.
71. according to method described in claim 69, described condition code composition rule database comprises the positional information of described fixed length feature subcode, described positional information comprises the order of described fixed length feature subcode in all fixed length feature subcodes of corresponding random length condition code, or to distance or the distance range of next fixed length feature subcode, or the order in all fixed length feature subcodes of corresponding random length condition code and to distance or the distance range of next fixed length feature subcode; Described synthetic fixed length feature subcode of having mated comprises the positional information of the fixed length feature subcode of having mated described in inspection.
72. according to claim 4, 7, 9, 10-12, 16, 19, 21, with 22-24 method described in one of them, described structure is one or more follow-up for searching the data structure of the corresponding condition code of fingerprint, further comprise each the random length condition code in one or more condition codes is decomposed into a plurality of fixed length feature subcodes and one or more random length feature subcode, and build and one or morely described fixed length feature subcode and corresponding random length feature subcode can be synthesized to the generated data structure of random length condition code, described generated data structure is a kind of data structure of searching of the specific scan method that scans random length condition code, wherein each fixed length feature subcode is done to search data structure described in as a whole being added into, fingerprint that described use has been mated is corresponding follow-up for searching the coupling of any condition code of data structure lookup of the corresponding condition code of fingerprint, be included on the position of fixed length feature subcode of one or more couplings by the described specific scan method that scans random length condition code, described fixed length feature subcode of having mated and corresponding random length feature subcode synthesized to the coupling of any random length condition code.
73. according to claim 1-24 method described in one of them, described scanning of described character string field is carried out block by block, wherein each string block comprises a finger scan region, a region before the finger scan region that provides reference data preceding a fingerprint, and a region after the finger scan region that provides reference data following and within a fingerprint, and wherein described string blocks overlap one another, so that the set of the finger scan regions of all described string blocks covers all possible fingerprint positions of described character string field.
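Illustrative note on claim 73: the block-wise scan can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation; the function name `split_into_blocks` and the parameters `region_len`, `front`, and `rear` are hypothetical and do not appear in the claims. Each block carries its own finger scan region plus some reference data before and after it, so neighboring blocks overlap while the scan regions tile the whole field.

```python
def split_into_blocks(field: bytes, region_len: int, front: int, rear: int):
    """Yield (region_start, region_end, block) for a block-by-block scan.

    region_len, front and rear are hypothetical parameters: the size of the
    finger scan region and the amount of reference data kept before and
    after it. Consecutive blocks overlap by the front/rear context.
    """
    if region_len <= 0 or front < 0 or rear < 0:
        raise ValueError("region_len must be positive; front and rear non-negative")
    for start in range(0, len(field), region_len):
        end = min(start + region_len, len(field))
        # The block is the scan region widened by the available context.
        block = field[max(0, start - front):min(len(field), end + rear)]
        yield start, end, block


if __name__ == "__main__":
    for start, end, block in split_into_blocks(b"abcdefghij", 4, 1, 2):
        print(start, end, block)
```

The scan regions `[start, end)` are disjoint and together cover every position of the field, while the extra context lets a fingerprint that starts near a region boundary be checked without reading another block.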
74. according to claim 1-24 method described in one of them, the described character string field of described scanning comprises the scope that checks described character string field, for scanned before or after described character string field or before after, fill any elementary cell not occurring in beginning or the ending of any condition code.
75. according to claim 1-24 method described in one of them, any condition code in the described character string field of described identification, further comprise described character string field normalization, comprise the character string solution coding of having encoded, the character string of having compressed is decompressed, and delete the one or more normalization processes in unwanted character string.
76. according to claim 13-24 method described in one of them, described method scans one or more condition codes with a larger scanning step, and condition codes too short to be scanned with described scanning step are scanned separately by a different scanning method.
77. according to claim 13-24 method described in one of them, described method scans one or more condition codes with a larger scanning step, and condition codes too short to be scanned with described scanning step are scanned separately by the same scanning method but with a different scanning step.
78. according to method described in claim 77, wherein the scanning step for scanning shorter condition codes is shorter than the scanning step for scanning longer condition codes.
79. A character string signature scan system, described system comprising:
A condition code pretreatment module that can process one or more condition codes into one or more forms, described processing including selecting a fingerprint for each fixed length condition code, or for each fixed length feature subcode among the one or more fixed length feature subcodes of each random length condition code, and building, for the one or more fingerprints of described one or more condition codes, a fingerprint database used for searching fingerprints, wherein each described fingerprint comprises one or more fragments of a fixed length condition code or feature subcode, described one or more fragments being at any position in described fixed length condition code or feature subcode; and
A finger scan engine that can identify one or more fingerprints of one or more condition codes at each position in a character string field spaced at the scanning step.
80. according to system described in claim 79, and the length of described a plurality of fingerprints of described one or more condition codes is all identical.
81. according to system described in claim 80, described condition code pretreatment module can further be selected one or more shadow spaces, and one or more fingerprints are projected in described one or more shadow space and gone, wherein said shadow space is by introducing ambiguity to former space, its form is wider than former space, thereby makes a shadow at the fingerprint of described shadow space corresponding to one or more fingerprints in described former space; Described finger scan engine can further be searched the shadow of described one or more fingerprints in described one or more shadow spaces.
82. according to system described in claim 81, further comprises the fixed length condition code Lookup engine that can identify the fixed length feature subcode of corresponding one or more fixed length condition codes of fingerprint described in each or random length condition code; Described condition code pretreatment module may further be described fixed length condition code Lookup engine and builds one by the fixed length condition code database of fingerprint indication.
83. according to system described in claim 82, further comprise that one can synthesize any random length condition code the fixed length feature subcode of random length condition code and corresponding random length feature subcode, maybe can identify the random length condition code Lookup engine of the corresponding one or more random length condition codes of each fingerprint; Described condition code pretreatment module may further be described random length condition code Lookup engine and builds one by the condition code composition rule database of fixed length feature subcode indication, or by the random length condition code database of fingerprint indication.
84. according to system described in claim 79, and described a plurality of fingerprints of described one or more condition codes have two or more different length.
85. according to system described in claim 84, described condition code pretreatment module can further be selected one or more shadow spaces, and one or more fingerprints are projected in described one or more shadow space and gone, wherein said shadow space is by introducing ambiguity to former space, its form is wider than former space, thereby makes a shadow at the fingerprint of described shadow space corresponding to one or more fingerprints in described former space; Described finger scan engine can further be searched the shadow of described one or more fingerprints in described one or more shadow spaces.
86. according to system described in claim 85, further comprises the fixed length condition code Lookup engine that can identify the fixed length feature subcode of corresponding one or more fixed length condition codes of fingerprint described in each or random length condition code; Described condition code pretreatment module may further be described fixed length condition code Lookup engine and builds one by the fixed length condition code database of fingerprint indication.
87. according to system described in claim 86, further comprise that one can synthesize any random length condition code the fixed length feature subcode of random length condition code and corresponding random length feature subcode, maybe can identify the random length condition code Lookup engine of the corresponding one or more random length condition codes of each fingerprint; Described condition code pretreatment module may further be described random length condition code Lookup engine and builds one by the condition code composition rule database of fixed length feature subcode indication, or by the random length condition code database of fingerprint indication.
88. according to system described in claim 79, described condition code pretreatment module can further be selected one or more shadow spaces, and one or more fingerprints are projected in described one or more shadow space and gone, wherein said shadow space is by introducing ambiguity to former space, its form is wider than former space, thereby makes a shadow at the fingerprint of described shadow space corresponding to one or more fingerprints in described former space; Described finger scan engine can further be searched the shadow of described one or more fingerprints in described one or more shadow spaces.
89. according to system described in claim 88, further comprises the fixed length condition code Lookup engine that can identify the fixed length feature subcode of corresponding one or more fixed length condition codes of fingerprint described in each or random length condition code; Described condition code pretreatment module may further be described fixed length condition code Lookup engine and builds one by the fixed length condition code database of fingerprint indication.
90. according to system described in claim 89, further comprise that one can synthesize any random length condition code the fixed length feature subcode of random length condition code and corresponding random length feature subcode, maybe can identify the random length condition code Lookup engine of the corresponding one or more random length condition codes of each fingerprint; Described condition code pretreatment module may further be described random length condition code Lookup engine and builds one by the condition code composition rule database of fixed length feature subcode indication, or by the random length condition code database of fingerprint indication.
91. according to system described in claim 79, further comprises the fixed length condition code Lookup engine that can identify the fixed length feature subcode of corresponding one or more fixed length condition codes of fingerprint described in each or random length condition code; Described condition code pretreatment module may further be described fixed length condition code Lookup engine and builds one by the fixed length condition code database of fingerprint indication.
92. according to system described in claim 91, further comprise that one can synthesize any random length condition code the fixed length feature subcode of random length condition code and corresponding random length feature subcode, maybe can identify the random length condition code Lookup engine of the corresponding one or more random length condition codes of each fingerprint; Described condition code pretreatment module may further be described random length condition code Lookup engine and builds one by the condition code composition rule database of fixed length feature subcode indication, or by the random length condition code database of fingerprint indication.
93. according to system described in claim 92, and the length of described a plurality of fingerprints of described one or more condition codes is all identical.
94. according to system described in claim 92, and described a plurality of fingerprints of described one or more condition codes have two or more different length.
95. A character string signature scan system, described system comprising:
A condition code pretreatment module that can process one or more condition codes into one or more forms, including selecting a plurality of fingerprints for each fixed length condition code, or for each fixed length feature subcode among the one or more fixed length feature subcodes of each random length condition code, and building, for the plurality of fingerprints of one or more condition codes, a fingerprint database used for searching fingerprints, wherein the number of fingerprints of each fixed length condition code or feature subcode equals the scanning step, and the first elementary cell of each subsequent fingerprint is displaced, in the scanning direction, by one or more elementary cells relative to the first elementary cell of the previous fingerprint, so that described fixed length condition code or feature subcode can be identified at any position in any scanned character string field, and wherein each described fingerprint comprises one or more fragments of a fixed length condition code or feature subcode, described one or more fragments being at any position in described fixed length condition code or feature subcode; and
A finger scan engine that can identify a plurality of fingerprints of one or more condition codes at each position in a character string field spaced at described scanning step.
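Illustrative note on claim 95: the relationship between the scanning step and the number of fingerprints can be sketched as follows. This is a simplified sketch under stated assumptions, not the claimed system: the names `build_fingerprint_db` and `scan` are hypothetical, the elementary cell is assumed to be one byte, each fingerprint is a single contiguous fragment, successive fingerprints are shifted by exactly one byte, and a Python dict stands in for the fingerprint database. With a scanning step of S, S fingerprints per condition code at offsets 0 to S-1 guarantee that one of them lands on a scanned position wherever the condition code sits in the field.

```python
def build_fingerprint_db(signatures, step, fp_len):
    """Map each fingerprint to the (signature, offset) pairs it was cut from.

    For every signature, `step` fingerprints of length fp_len are selected,
    each starting one byte later than the previous one.
    """
    db = {}
    for sig in signatures:
        if len(sig) < fp_len + step - 1:
            raise ValueError("signature too short for this step and fingerprint length")
        for offset in range(step):
            db.setdefault(sig[offset:offset + fp_len], []).append((sig, offset))
    return db


def scan(field, db, step, fp_len):
    """Probe the field only at positions 0, step, 2*step, ... and verify hits."""
    hits = set()
    for pos in range(0, len(field) - fp_len + 1, step):
        for sig, offset in db.get(field[pos:pos + fp_len], ()):
            start = pos - offset
            # A fingerprint match is only a candidate; confirm the full signature.
            if start >= 0 and field[start:start + len(sig)] == sig:
                hits.add((start, sig))
    return sorted(hits)


if __name__ == "__main__":
    db = build_fingerprint_db([b"malware!"], step=3, fp_len=4)
    print(scan(b"xxmalware!yy", db, step=3, fp_len=4))
```

The design trade-off in this sketch is that the database grows by a factor of S while the number of probes into the field shrinks by the same factor.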
96. according to system described in claim 95, and the length of described a plurality of fingerprints of described one or more condition codes is all identical.
97. according to system described in claim 96, described condition code pretreatment module can further be selected one or more shadow spaces, and one or more fingerprints are projected in described one or more shadow space and gone, wherein said shadow space is by introducing ambiguity to former space, its form is wider than former space, thereby makes a shadow at the fingerprint of described shadow space corresponding to one or more fingerprints in described former space; Described finger scan engine can further be searched the shadow of described one or more fingerprints in described one or more shadow spaces.
98. according to system described in claim 97, further comprises the fixed length condition code Lookup engine that can identify the fixed length feature subcode of corresponding one or more fixed length condition codes of fingerprint described in each or random length condition code; Described condition code pretreatment module may further be described fixed length condition code Lookup engine and builds one by the fixed length condition code database of fingerprint indication.
99. according to system described in claim 98, further comprise that one can synthesize any random length condition code the fixed length feature subcode of random length condition code and corresponding random length feature subcode, maybe can identify the random length condition code Lookup engine of the corresponding one or more random length condition codes of each fingerprint; Described condition code pretreatment module may further be described random length condition code Lookup engine and builds one by the condition code composition rule database of fixed length feature subcode indication, or by the random length condition code database of fingerprint indication.
100. according to system described in claim 95, and described a plurality of fingerprints of described one or more condition codes have two or more different length.
101. according to system described in claim 100, described condition code pretreatment module can further be selected one or more shadow spaces, and one or more fingerprints are projected in described one or more shadow space and gone, wherein said shadow space is by introducing ambiguity to former space, its form is wider than former space, thereby makes a shadow at the fingerprint of described shadow space corresponding to one or more fingerprints in described former space; Described finger scan engine can further be searched the shadow of described one or more fingerprints in described one or more shadow spaces.
102. according to system described in claim 101, further comprises the fixed length condition code Lookup engine that can identify the fixed length feature subcode of corresponding one or more fixed length condition codes of fingerprint described in each or random length condition code; Described condition code pretreatment module may further be described fixed length condition code Lookup engine and builds one by the fixed length condition code database of fingerprint indication.
103. according to system described in claim 102, further comprise that one can synthesize any random length condition code the fixed length feature subcode of random length condition code and corresponding random length feature subcode, maybe can identify the random length condition code Lookup engine of the corresponding one or more random length condition codes of each fingerprint; Described condition code pretreatment module may further be described random length condition code Lookup engine and builds one by the condition code composition rule database of fixed length feature subcode indication, or by the random length condition code database of fingerprint indication.
104. according to system described in claim 95, described condition code pretreatment module can further be selected one or more shadow spaces, and one or more fingerprints are projected in described one or more shadow space and gone, wherein said shadow space is by introducing ambiguity to former space, its form is wider than former space, thereby makes a shadow at the fingerprint of described shadow space corresponding to one or more fingerprints in described former space; Described finger scan engine can further be searched the shadow of described one or more fingerprints in described one or more shadow spaces.
105. according to system described in claim 104, further comprises the fixed length condition code Lookup engine that can identify the fixed length feature subcode of corresponding one or more fixed length condition codes of fingerprint described in each or random length condition code; Described condition code pretreatment module may further be described fixed length condition code Lookup engine and builds one by the fixed length condition code database of fingerprint indication.
106. according to system described in claim 105, further comprise that one can synthesize any random length condition code the fixed length feature subcode of random length condition code and corresponding random length feature subcode, maybe can identify the random length condition code Lookup engine of the corresponding one or more random length condition codes of each fingerprint; Described condition code pretreatment module may further be described random length condition code Lookup engine and builds one by the condition code composition rule database of fixed length feature subcode indication, or by the random length condition code database of fingerprint indication.
107. according to system described in claim 95, further comprises the fixed length condition code Lookup engine that can identify the fixed length feature subcode of corresponding one or more fixed length condition codes of fingerprint described in each or random length condition code; Described condition code pretreatment module may further be described fixed length condition code Lookup engine and builds one by the fixed length condition code database of fingerprint indication.
108. according to system described in claim 107, further comprise that one can synthesize any random length condition code the fixed length feature subcode of random length condition code and corresponding random length feature subcode, maybe can identify the random length condition code Lookup engine of the corresponding one or more random length condition codes of each fingerprint; Described condition code pretreatment module may further be described random length condition code Lookup engine and builds one by the condition code composition rule database of fixed length feature subcode indication, or by the random length condition code database of fingerprint indication.
109. according to system described in claim 108, and the length of described a plurality of fingerprints of described one or more condition codes is all identical.
110. according to system described in claim 108, and described a plurality of fingerprints of described one or more condition codes have two or more different length.
111. according to claim 79-110 system described in one of them, and the selected described fingerprint of described condition code pretreatment module indicates completely in former space; The described fingerprint indicating completely of described finger scan engine scanning.
112. according to claim 81-83, 85-90, 97-99, and 101-104 system described in one of them, the selected described fingerprint of described condition code pretreatment module in former space or at least one shadow space indicate completely; The described finger scan engine described fingerprint indicating completely of scanning or fingerprint shadow.
113. according to claim 79-110 system described in one of them, further comprises that can be treated to a character string field to be scanned a scanning pre-service engine for the required form of one or more scanning.
114. according to system described in claim 113, and described scanning pre-service engine further comprises a scanning conveyer, a format decoder, a projector, a character string field store device, a decoded word segment memory, and a shadow field store device; Wherein scan conveyer data to be scanned are loaded into format decoder from character string field store device, the output of described format decoder is transported to decoded word segment memory and projector, and the output of described projector is transported to shadow field store device.
115. according to claim 79-110 system described in one of them, described finger scan engine comprises one or more hash tables or one or more Bloom filters.
116. according to system described in claim 115, described one or more hash tables or one or more Bloom filters further comprise hash value multiplexing, or fingerprint length multiplexing, or both hash value multiplexing and fingerprint length multiplexing.
117. according to claim 79-110 system described in one of them, described finger scan engine does not use hash tables or Bloom filters.
118. according to system described in claim 117, described finger scan engine comprises one or more content-addressable memories (CAM).
119. according to claim 79-110 system described in one of them, described condition code pretreatment module further can decompose each fingerprint in one or more fingerprints into one or more fingerprint sections of one or more lengths, and builds, for the plurality of fingerprint sections of described one or more fingerprints, one or more data structures for searching fingerprint sections and one or more data structures for synthesizing matched fingerprint sections; described finger scan engine further comprises a fingerprint finder that can search a plurality of fingerprint sections of one or more fingerprints, and a fingerprint compositor that can synthesize described plurality of fingerprint sections into any matched fingerprint.
120. according to system described in claim 119, all described fingerprint sections of one described finger scan engine have the same length, which equals the scanning step of described finger scan engine or an integer fraction (one divided by an integer) of described scanning step.
121. according to system described in claim 119, described fingerprint finder comprises one or more hash tables or one or more Bloom filters.
122. according to system described in claim 121, described one or more hash tables or one or more Bloom filters further comprise hash value multiplexing, or fingerprint length multiplexing, or both hash value multiplexing and fingerprint length multiplexing.
123. according to system described in claim 119, described fingerprint finder does not use hash tables or Bloom filters.
124. according to system described in claim 123, described fingerprint finder comprises one or more content-addressable memories (CAM).
125. according to system described in claim 119, described fingerprint compositor comprises with one or more fingerprint section figure, or fingerprint length information, or one or more fingerprint section figure and fingerprint length information, a plurality of described fingerprint sections are synthesized serially to the fingerprint of any " at least one fingerprint matching ".
126. according to system described in claim 119, described fingerprint compositor comprises with one or more fingerprint section figure, or fingerprint length information, or one or more fingerprint section figure and fingerprint length information, synthesize a plurality of described fingerprint sections the fingerprint of any " all fingerprint matchings " serially.
127. according to system described in claim 119, and described fingerprint compositor can synthesize a plurality of described fingerprint sections the fingerprint of any " at least one fingerprint matching " or " all fingerprint matchings " concurrently.
128. according to system described in claim 119, and described fingerprint compositor does not use fingerprint section figure and fingerprint length information.
129. according to claim 81-83, 85-90, 97-99, and 101-106 system described in one of them, described condition code pretreatment module further can decompose the shadow of each described fingerprint into one or more fingerprint shadow sections of one or more lengths, and builds, for the fingerprint shadow sections of described one or more condition codes, one or more data structures for searching fingerprint shadow sections and one or more data structures for synthesizing matched fingerprint shadow sections; described finger scan engine further comprises a fingerprint finder that can search the fingerprint shadow sections of one or more condition codes, and a fingerprint compositor that can synthesize described plurality of fingerprint shadow sections into any matched fingerprint shadow or any matched fingerprint.
130. according to system described in claim 129, the shadow sections of all fingerprints of described finger scan engine in each shadow space have the same length, which equals the scanning step of described finger scan engine in described shadow space or an integer fraction (one divided by an integer) of that scanning step.
131. according to system described in claim 129, described fingerprint finder comprises one or more hash tables or one or more Bloom filters.
132. according to system described in claim 131, described one or more hash tables or one or more Bloom filters further comprise hash value multiplexing, or fingerprint length multiplexing, or both hash value multiplexing and fingerprint length multiplexing.
133. according to system described in claim 129, described fingerprint finder does not use hash tables or Bloom filters.
134. according to system described in claim 133, described fingerprint finder comprises one or more content-addressable memories (CAM).
135. according to system described in claim 129, described fingerprint compositor comprises with one or more fingerprint section figure, or fingerprint length information, or one or more fingerprint section figure and fingerprint length information, the shadow section of a plurality of described fingerprints is synthesized serially to the shadow of the fingerprint of any " at least one fingerprint matching ".
136. according to system described in claim 126, and described fingerprint compositor comprises the shadow that the shadow section of a plurality of described fingerprints is synthesized serially to the fingerprint of any " all fingerprint matchings ".
137. according to system described in claim 129, described fingerprint compositor comprises with one or more fingerprint section figure, or fingerprint length information, or one or more fingerprint section figure and fingerprint length information, the shadow section of a plurality of described fingerprints is synthesized concurrently to the shadow of the fingerprint of any " at least one fingerprint matching " or " all fingerprint matchings ".
138. according to system described in claim 129, and described fingerprint compositor does not use fingerprint section figure and fingerprint length information.
139. according to claim 81-83, 85-90, 97-99, and 101-106 system described in one of them, described finger scan engine can further verify, in described former space, the one or more fingerprints corresponding to a matched fingerprint shadow at each scanning position that has one or more matched fingerprint shadows.
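Illustrative note on claim 139: the two-stage check (match a fingerprint shadow first, then verify the fingerprint in the former space) can be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation: ASCII case folding is assumed here as one possible projection that introduces ambiguity, and the names `project`, `build_shadow_db`, and `scan_with_shadow` are hypothetical.

```python
def project(data: bytes) -> bytes:
    """Assumed projection into a shadow space: ASCII case folding.

    Several original byte strings (e.g. b"EvIl", b"evil") share one shadow.
    """
    return data.lower()


def build_shadow_db(fingerprints):
    """Map each fingerprint shadow to the original fingerprints behind it."""
    db = {}
    for fp in fingerprints:
        db.setdefault(project(fp), set()).add(fp)
    return db


def scan_with_shadow(field: bytes, shadow_db, fp_len: int):
    """Return (shadow_hits, confirmed): positions matching in the shadow
    space, and the subset that also matches in the former space."""
    shadow_field = project(field)
    shadow_hits, confirmed = [], []
    for pos in range(len(field) - fp_len + 1):
        originals = shadow_db.get(shadow_field[pos:pos + fp_len])
        if originals is None:
            continue
        shadow_hits.append(pos)
        # Verification step: compare against the unprojected fingerprints.
        if field[pos:pos + fp_len] in originals:
            confirmed.append(pos)
    return shadow_hits, confirmed


if __name__ == "__main__":
    db = build_shadow_db([b"EvIl"])
    print(scan_with_shadow(b"an evil EvIl", db, 4))
```

In this sketch the shadow lookup acts as a coarse filter that may report positions the original fingerprint does not actually match; the second comparison removes those.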
140. according to claim 79-110 system described in one of them, and described finger scan engine further comprises a finger scan controller, a fingerprint hash counter, a fingerprint matching device, a fingerprint compositor, and a fingerprint database.
141. according to system described in claim 140, described fingerprint hash counter uses a sequential hash function that, working on the prefix fragments of a plurality of successive, partly overlapping hash keys, calculates the hash values of described plurality of hash keys in order.
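Illustrative note on claim 141: one well-known way to hash successive, partly overlapping keys in order is a polynomial rolling hash, where each hash value is derived from the previous one instead of being recomputed from scratch. The claim does not name a specific function, so the rolling hash below is a swapped-in assumption, and the names `direct_hash`, `rolling_hashes`, `BASE`, and `MOD` are hypothetical.

```python
BASE = 257             # assumed polynomial base
MOD = (1 << 61) - 1    # assumed modulus (a Mersenne prime)


def direct_hash(key: bytes) -> int:
    """Polynomial hash of one key, computed from scratch."""
    h = 0
    for b in key:
        h = (h * BASE + b) % MOD
    return h


def rolling_hashes(field: bytes, key_len: int):
    """Hash every key_len-byte window of field, left to right.

    Adjacent windows overlap in key_len - 1 bytes, so each hash is updated
    from the previous one: drop the leading byte, append the new byte.
    """
    if key_len <= 0 or len(field) < key_len:
        return []
    top = pow(BASE, key_len - 1, MOD)  # weight of the leading byte
    h = direct_hash(field[:key_len])
    out = [h]
    for i in range(key_len, len(field)):
        h = ((h - field[i - key_len] * top) * BASE + field[i]) % MOD
        out.append(h)
    return out


if __name__ == "__main__":
    print(rolling_hashes(b"fingerprint scan", 4)[:3])
```

Each update costs a constant number of operations regardless of the key length, which is the usual reason for hashing overlapping keys sequentially.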
142. The system according to any one of claims 79-110, further comprising a plurality of the fingerprint scan engines, wherein the fingerprint scan engines that scan fingerprints of different lengths are based on different scanning methods.
143. The system according to any one of claims 79-110, further comprising a plurality of the fingerprint scan engines, wherein the fingerprint scan engines that scan fingerprints of different lengths are based on the same scanning method but use different scanning parameters.
144. The system according to claim 143, wherein a fingerprint scan engine with a shorter scanning step is used to scan shorter fingerprints and a fingerprint scan engine with a longer scanning step is used to scan longer fingerprints.
145. The system according to any one of claims 82-83, 86-87, 89-94, 98-99, 102-103, and 105-110, wherein the fixed-length condition code lookup engine further comprises a condition code finder, a condition code verifier, and a fixed-length condition code database.
146. The system according to claim 145, wherein the condition code preprocessing module can further build a differential search data structure for the plurality of fixed-length condition codes or feature subcodes corresponding to a fingerprint; and the condition code finder can use the differential search data structure to perform a differential search over the plurality of fixed-length condition codes or feature subcodes corresponding to a matched fingerprint.
147. The system according to claim 145, wherein the condition code preprocessing module can further add one or more masks to one or more condition codes or feature subcodes, or to the basic units, sub-basic units, or code segments of one or more fixed-length condition codes or feature subcodes, to specify one or more matching conditions, including whether a match is required, case sensitivity, a preset range, an arbitrary range, logical NOT, and other logical operations; and the condition code verifier, or the condition code verifier together with the condition code finder, can compare one or more masked condition code units or condition code segments.
148. The system according to any one of claims 82-83, 86-87, 89-94, 98-99, 102-103, and 105-110, wherein the fixed-length condition code lookup engine comprises one or more content-addressable memories (CAMs).
149. The system according to any one of claims 82-83, 86-87, 89-94, 98-99, 102-103, and 105-110, wherein the number of the fingerprint scan engines differs from the number of the fixed-length condition code lookup engines.
150. The system according to any one of claims 82-83, 86-87, 89-94, 98-99, 102-103, and 105-110, further comprising a plurality of the fixed-length condition code lookup engines, wherein the fixed-length condition code lookup engines that scan fixed-length condition codes or feature subcodes of different lengths are based on different scanning methods.
151. The system according to any one of claims 82-83, 86-87, 89-94, 98-99, 102-103, and 105-110, further comprising a plurality of the fixed-length condition code lookup engines, wherein the fixed-length condition code lookup engines that scan fixed-length condition codes or feature subcodes of different lengths are based on the same scanning method but use different scanning parameters.
152. The system according to claim 151, further comprising a plurality of the fixed-length condition code lookup engines, wherein a fixed-length condition code lookup engine with a longer scanning step is used to scan longer fixed-length condition codes or feature subcodes, and a fixed-length condition code lookup engine with a shorter scanning step is used to scan shorter fixed-length condition codes or feature subcodes.
153. The system according to any one of claims 83, 87, 90, 92-94, 99, 103, 106, and 108-110, wherein the arbitrary-length condition code lookup engine further comprises a condition code rule finder, a condition code state verifier, a condition code rule database, and a condition code state table, wherein: the condition code rule finder looks up, in the condition code rule database, the rule associated with the number of a matched fixed-length feature subcode and provides that rule to the condition code state verifier; the condition code rule database provides the rules for synthesizing the fixed-length feature subcodes of an arbitrary-length condition code into that arbitrary-length condition code; the condition code state table stores the state of the synthesis process for an input string field; and the condition code state verifier synthesizes the matched fixed-length feature subcodes into arbitrary-length condition codes according to the condition code rules and updates the condition code state table.
154. The system according to claim 153, wherein the condition code preprocessing module can further build the condition code rule database; the condition code rule finder can look up the arbitrary-length condition code synthesis rule corresponding to a matched fixed-length feature subcode; and the condition code state verifier can use the arbitrary-length condition code synthesis rule to verify the matched fixed-length feature subcode and update the condition code state table.
155. The system according to claim 154, wherein the condition code rule database further comprises position information for the fixed-length feature subcode, the position information comprising the order of the fixed-length feature subcode among all fixed-length feature subcodes of the corresponding arbitrary-length condition code, or the distance or distance range to the next fixed-length feature subcode, or both; and verifying the matched fixed-length feature subcode comprises checking the position information of the matched fixed-length feature subcode.
156. The system according to claim 154, wherein the condition code rule database further comprises the number of the arbitrary-length condition code corresponding to the fixed-length feature subcode; and verifying the matched fixed-length feature subcode comprises checking the arbitrary-length condition code number corresponding to the matched fixed-length feature subcode.
157. The system according to any one of claims 83, 87, 90, 92-94, 99, 103, 106, and 108-110, wherein the arbitrary-length condition code lookup engine comprises one or more content-addressable memories (CAMs).
158. The system according to any one of claims 83, 87, 90, 92-94, 99, 103, 106, and 108-110, wherein the numbers of the fingerprint scan engines, the fixed-length condition code lookup engines, and the arbitrary-length condition code lookup engines are all the same.
159. The system according to any one of claims 83, 87, 90, 92-94, 99, 103, 106, and 108-110, wherein the numbers of the fingerprint scan engines, the fixed-length condition code lookup engines, and the arbitrary-length condition code lookup engines are not all the same.
160. The system according to any one of claims 83, 87, 90, 92-94, 99, 103, 106, and 108-110, wherein the arbitrary-length condition code lookup engine is based on a scanning method for arbitrary-length condition codes in which each fixed-length feature subcode is scanned as a whole.
161. The system according to claim 113, wherein the scan preprocessing engine processes the string field block by block.
162. The system according to claim 161, wherein each string block further comprises a fingerprint scan region, a region preceding the fingerprint scan region that provides reference data before a fingerprint, and a region following the fingerprint scan region that provides reference data within and after a fingerprint, wherein the string blocks overlap one another so that the set of the fingerprint scan regions of all the string blocks covers all possible fingerprint positions in the string field.
163. The system according to claim 161, wherein the string field blocks are stored in a plurality of equal-size memory blocks arranged in a ring, to reduce the movement of data in memory.
164. The system according to any one of claims 95-110, wherein the system comprises means for scanning one or more condition codes with a larger scanning step, and condition codes shorter than those the scanning step can scan are scanned separately by a different scanning method.
165. The system according to any one of claims 95-110, wherein the system comprises means for scanning one or more condition codes with a larger scanning step, and condition codes shorter than those the scanning step can scan are scanned separately by the same scanning method but with a different scanning step.
166. The system according to claim 165, wherein the scanning step of the scan engine used for shorter condition codes is shorter than the scanning step of the scan engine used for longer condition codes.
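As an illustration of the sequential hashing recited in claim 141, the following is a minimal sketch that assumes a polynomial rolling hash, a well-known technique swapped in here as a stand-in; the claims do not specify the hash function. The names `BASE`, `MOD`, `rolling_hashes`, and `direct_hash` are hypothetical.

```python
# Hypothetical parameters; the claims do not name a particular hash function.
BASE = 257
MOD = (1 << 61) - 1


def direct_hash(key: bytes) -> int:
    """Hash one key from scratch (reference implementation)."""
    h = 0
    for b in key:
        h = (h * BASE + b) % MOD
    return h


def rolling_hashes(data: bytes, key_len: int) -> list[int]:
    """Hash every key_len-byte window of data in order.

    Consecutive windows overlap in key_len - 1 bytes, so each hash is
    derived from the previous one instead of being recomputed.
    """
    if key_len <= 0 or len(data) < key_len:
        return []
    high = pow(BASE, key_len - 1, MOD)  # weight of the byte leaving the window
    h = direct_hash(data[:key_len])
    out = [h]
    for i in range(key_len, len(data)):
        h = (h - data[i - key_len] * high) % MOD  # drop the oldest byte
        h = (h * BASE + data[i]) % MOD            # append the newest byte
        out.append(h)
    return out
```

Each step costs a constant number of operations regardless of the key length, which is the property that makes hashing every scan position affordable.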
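The masked comparison of claim 147 can be sketched as follows. This is only an illustration under stated assumptions: each condition code unit is taken to be one byte, and the `Cond`, `Unit`, `unit_matches`, and `verify` names are hypothetical, not terms from the claims.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Cond(Enum):
    """Hypothetical per-unit matching conditions carried by a mask."""
    EXACT = auto()   # unit must equal the value
    ANY = auto()     # unit need not match
    NOCASE = auto()  # ASCII case-insensitive match
    RANGE = auto()   # unit must lie in [value, high]
    NOT = auto()     # unit must differ from the value


@dataclass(frozen=True)
class Unit:
    cond: Cond
    value: int = 0
    high: int = 0  # upper bound, used only by RANGE


def unit_matches(u: Unit, byte: int) -> bool:
    """Compare one input byte against one masked unit."""
    if u.cond is Cond.ANY:
        return True
    if u.cond is Cond.EXACT:
        return byte == u.value
    if u.cond is Cond.NOCASE:
        return bytes([byte]).lower() == bytes([u.value]).lower()
    if u.cond is Cond.RANGE:
        return u.value <= byte <= u.high
    if u.cond is Cond.NOT:
        return byte != u.value
    return False


def verify(units: list[Unit], data: bytes, pos: int) -> bool:
    """Check whether the masked condition code matches data at pos."""
    if pos < 0 or pos + len(units) > len(data):
        return False
    return all(unit_matches(u, data[pos + i]) for i, u in enumerate(units))
```

For example, a five-unit code built from `EXACT 'G'`, `NOCASE 'e'`, `ANY`, `RANGE '0'..'9'`, and `NOT 'x'` accepts `b"GEt5y"` and rejects `b"GEt5x"`.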
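Claims 153-156 describe synthesizing an arbitrary-length condition code from matched fixed-length feature subcodes by means of a rule database and a state table. A minimal sketch of that idea follows. The layout is assumed rather than taken from the claims: `Rule`, `StateVerifier`, and `feed` are hypothetical names, distances are measured between the start positions of consecutive subcodes, and only one partial match per condition code is tracked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule entry for one fixed-length feature subcode."""
    sig_id: int                    # number of the arbitrary-length condition code
    order: int                     # position of this subcode within that code
    last: bool                     # True if this is the final subcode
    gap: tuple[int, int] = (0, 0)  # allowed distance range to the next subcode


class StateVerifier:
    """Tracks synthesis state per condition code for one input string field."""

    def __init__(self, rules: dict[int, Rule]):
        self.rules = rules   # rule database: subcode number -> rule
        self.state = {}      # state table: sig_id -> (next_order, prev_pos, gap)

    def feed(self, subcode_id: int, pos: int):
        """Process one matched subcode; return sig_id when a code completes."""
        rule = self.rules.get(subcode_id)
        if rule is None:
            return None
        if rule.order == 0:
            ok = True  # the first subcode always starts a new partial match
        else:
            st = self.state.get(rule.sig_id)
            ok = (
                st is not None
                and st[0] == rule.order                  # order check
                and st[2][0] <= pos - st[1] <= st[2][1]  # distance check
            )
        if not ok:
            return None
        if rule.last:
            self.state.pop(rule.sig_id, None)
            return rule.sig_id
        self.state[rule.sig_id] = (rule.order + 1, pos, rule.gap)
        return None
```

A fuller design would keep several concurrent partial matches per condition code; the single-entry state table here is a simplification to keep the sketch short.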
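The overlapping block layout of claim 162 can be illustrated with a short sketch. The function name `split_blocks` and its parameters are hypothetical; the sketch assumes fixed-size scan regions with a fixed amount of reference data kept before and after each one.

```python
def split_blocks(field_len: int, scan_len: int, before: int, after: int):
    """Split a string field into overlapping blocks.

    Returns a list of (block_start, scan_start, scan_end, block_end) tuples
    with half-open ranges. The scan regions tile the field without gaps,
    while each block also carries `before` bytes of leading reference data
    and `after` bytes of trailing reference data, clipped to the field.
    """
    if scan_len <= 0 or before < 0 or after < 0:
        raise ValueError("scan_len must be positive; margins must be non-negative")
    blocks = []
    for scan_start in range(0, field_len, scan_len):
        scan_end = min(scan_start + scan_len, field_len)
        block_start = max(0, scan_start - before)
        block_end = min(field_len, scan_end + after)
        blocks.append((block_start, scan_start, scan_end, block_end))
    return blocks
```

With `field_len=10`, `scan_len=4`, `before=2`, and `after=3`, adjacent blocks share bytes, yet every position in the field falls in exactly one scan region.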
CN200880127748.0A 2008-10-20 2008-10-20 Fast signature scan Expired - Fee Related CN101960469B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711339378.4A CN108197470A (en) 2008-10-20 2008-10-20 Fast signature scan
CN201410055830.4A CN103793522B (en) 2008-10-20 2008-10-20 Fast signature scan

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2008/080457 WO2010047683A1 (en) 2008-10-20 2008-10-20 Fast signature scan

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN201410055830.4A Division CN103793522B (en) 2008-10-20 2008-10-20 Fast signature scan
CN201711339378.4A Division CN108197470A (en) 2008-10-20 2008-10-20 Fast signature scan

Publications (2)

Publication Number Publication Date
CN101960469A CN101960469A (en) 2011-01-26
CN101960469B true CN101960469B (en) 2014-03-26

Family

ID=42119542

Family Applications (2)

Application Number Title Priority Date Filing Date
CN200880127748.0A Expired - Fee Related CN101960469B (en) 2008-10-20 2008-10-20 Fast signature scan
CN201711339378.4A Pending CN108197470A (en) 2008-10-20 2008-10-20 Fast signature scan

Country Status (2)

Country Link
CN (2) CN101960469B (en)
WO (1) WO2010047683A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5970546B2 (en) * 2012-05-31 2016-08-17 株式会社オプトエレクトロニクス Reading apparatus, reading result output method, and program
EP3091450B1 (en) * 2015-05-06 2017-04-05 Örjan Vestgöte Method and system for performing binary searches
CN105095367B (en) * 2015-06-26 2018-12-28 北京奇虎科技有限公司 A kind of acquisition method and device of client data
WO2020051895A1 (en) * 2018-09-14 2020-03-19 西门子股份公司 Data compression method, data restoration method and device
US20230104304A1 (en) * 2021-09-28 2023-04-06 Rakuten Mobile, Inc. Logic-gate based non-deterministic finite automata tree structure application apparatus and method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1567174A (en) * 2003-06-09 2005-01-19 吴胜远 Method for expressing and processing object and apparatus thereof
CN1972292A (en) * 2005-10-17 2007-05-30 飞塔信息科技(北京)有限公司 Electronic data processing system and method thereof

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04306785A (en) * 1991-04-03 1992-10-29 Mitsubishi Electric Corp Pattern recognition system
CA2195681A1 (en) * 1994-07-26 1996-02-08 Dennis G. Priddy Methods and systems for creating and authenticating unalterable self-verifying articles
WO2002091145A1 (en) * 2001-05-08 2002-11-14 Ip.Com, Inc. Method and apparatus for collecting electronic signatures
AU2002346116A1 (en) * 2001-07-20 2003-03-03 Gracenote, Inc. Automatic identification of sound recordings
CN1459761B (en) * 2002-05-24 2010-04-21 清华大学 Character identification technique based on Gabor filter set
US7444515B2 (en) * 2003-08-14 2008-10-28 Washington University Method and apparatus for detecting predefined signatures in packet payload using Bloom filters
CN1300982C (en) * 2003-12-05 2007-02-14 中国科学技术大学 Hierarchical cooperated network virus and malice code recognition method
US20060106769A1 (en) * 2004-11-12 2006-05-18 Gibbs Kevin A Method and system for autocompletion for languages having ideographs and phonetic characters
CN100354863C (en) * 2005-02-03 2007-12-12 中国科学院计算技术研究所 Method and system for large scale keyboard matching
US7400271B2 (en) * 2005-06-21 2008-07-15 International Characters, Inc. Method and apparatus for processing character streams
US20080005578A1 (en) * 2006-06-29 2008-01-03 Innovya Research & Development Ltd. System and method for traceless biometric identification
US7747078B2 (en) * 2006-07-06 2010-06-29 Intel Corporation Substring detection system and method
CN1997011B (en) * 2006-07-26 2011-01-12 白杰 Data partition method and data partition device
CN100530182C (en) * 2006-10-17 2009-08-19 中兴通讯股份有限公司 Character string matching information processing method in communication system

Also Published As

Publication number Publication date
CN108197470A (en) 2018-06-22
WO2010047683A1 (en) 2010-04-29
CN101960469A (en) 2011-01-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: WANG YING

Free format text: FORMER OWNER: WANG QIANG

Effective date: 20141115

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; TO: 200000 YANGPU DISTRICT, SHANGHAI

TR01 Transfer of patent right

Effective date of registration: 20141115

Address after: 902 room 83, Lane 289, 200000 souvenir Road, Shanghai

Patentee after: Wang Ying

Address before: California, USA

Patentee before: Wang Qiang

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140326