WO2016029664A1 - Two-dimensional filter generation method, query method and device - Google Patents

Info

Publication number
WO2016029664A1
Authority
WO
WIPO (PCT)
Prior art keywords
key
value
hash
key value
sub
Prior art date
Application number
PCT/CN2015/072915
Other languages
French (fr)
Chinese (zh)
Inventor
张延松
陈红
李翠平
孙东旺
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to RU2017109924A priority Critical patent/RU2017109924A/en
Priority to JP2017511736A priority patent/JP2017526081A/en
Priority to EP15836802.7A priority patent/EP3179382A4/en
Publication of WO2016029664A1 publication Critical patent/WO2016029664A1/en
Priority to US15/443,997 priority patent/US20170170968A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3242 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving keyed hash functions, e.g. message authentication codes [MACs], CBC-MAC or HMAC
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/06 Network architectures or network communication protocols for network security for supporting key management in a packet data network
    • H04L63/061 Network architectures or network communication protocols for network security for supporting key management in a packet data network for key exchange, e.g. in peer-to-peer networks

Definitions

  • the present invention relates to the technical field of element query matching, and in particular to a method and device for generating and querying a two-dimensional filter.
  • A hash table is a data structure that quickly maps an element to its storage location based on the key value of the element. This mapping function is what is usually called a hash function.
  • The structure of the hash table is shown in FIG. 1A.
  • Each element in the set first obtains its hash position through the hash function, and the element is then recorded in the hash list at that position.
  • the hash function is HASH
  • A one-dimensional Bloom filter consists of k independent hash functions h1, h2, ..., hk and a bit vector of length m, where each hash function has the range {0, 1, ..., m-1}. Because one byte has 8 bits, the bit vector actually occupies m/8 bytes of memory, and all bits of the bit vector are initialized to 0.
  • When every data element in the set S has been loaded into the Bloom filter, the Bloom filter is said to represent the data element set S. To query whether a data element is in the set S, the same k hash functions are used to calculate a hash sequence for the data element. If every bit of the bit vector corresponding to the hash sequence is 1, the data element is considered to belong to S; otherwise it does not belong to S. Compared with storing the data in full, a Bloom filter saves storage space, and a Bloom filter never misses any element that belongs to the set.
  • A Bloom filter is generated for these spam email addresses. FIG. 1B shows an example of representing spam email addresses with a Bloom filter. For the spam email address XXX@163.com, eight different hash functions (F1, F2, ..., F8) generate eight hash values (f1, f2, ..., f8), and the positions of the bit vector corresponding to the eight hash values are all set to 1, so that the spam address is loaded into the Bloom filter.
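  • The one-dimensional Bloom filter described above can be sketched as follows. This is a minimal illustration, not the implementation of the prior art or of the invention: the class name `BloomFilter`, the method names, the vector length 1024, and the use of SHA-256 salted with an index as the k hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal one-dimensional Bloom filter: k hash functions over an m-bit vector."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # about m/8 bytes, all bits start at 0

    def _positions(self, item: str):
        # Assumption: the i-th hash function is SHA-256 salted with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m  # value in {0, ..., m-1}

    def add(self, item: str) -> None:
        # Loading an element sets the k corresponding bits to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # The element is considered present only if all k bits are 1.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


spam = BloomFilter(m=1024, k=8)   # eight hash functions, as in the spam example
spam.add("XXX@163.com")
print(spam.might_contain("XXX@163.com"))  # True: a loaded element is never missed
```

  • Note that one such filter represents exactly one set; a second set of addresses would need a second `BloomFilter` instance in this sketch.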
  • The Bloom filter generated above is for a single key value element group; that is, one Bloom filter can be linked with only one key value element group, and several key value element groups require several Bloom filters to be generated. The Bloom filter therefore has the drawback of poor flexibility.
  • Embodiments of the present invention provide a method and a device for generating and querying a two-dimensional filter, which are used to improve Bloom filter flexibility.
  • a method of querying a key value element comprising:
  • for each key value element in each key value element group, a hash value is calculated according to the hash function subset corresponding to the key value element group to which the key value element belongs, and the element corresponding to the position of the calculated hash value in the two-dimensional matrix is set as the second preset identifier;
  • the hash functions included in the hash function subsets corresponding to any two different key value element groups are different; or
  • the hash functions included in the hash function subsets corresponding to any two different key value element groups are the same, but the hash functions are arranged in a different order.
  • the calculating the hash value of the key value element to be queried includes:
  • In conjunction with the second possible implementation of the first aspect, in a third possible implementation of the first aspect, acquiring the element corresponding to the position of the hash value of the to-be-queried key value element in the two-dimensional matrix specifically includes:
  • the first hash value is an element of the column.
  • a method for generating a two-dimensional filter including:
  • each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set hashes the first sub-key value element of any key value element in its corresponding at least one key value element group to obtain a first hash value, and hashes the second sub-key value element of that key value element to obtain a second hash value;
  • the first hash value is a positive integer that is less than or equal to the length of the row vector, and the second hash value is a positive integer that is less than or equal to the length of the column vector;
  • a two-dimensional filter including the two-dimensional matrix and the set of hash functions is generated.
  • the length of the row vector and the length of the column vector are both greater than or equal to
  • Sr is the number of key value elements included in all the key value element groups; or Sr is the number of key value elements obtained by filtering all the key value elements included in all the key value element groups.
  • the first sub-key value element includes a key value element consisting of all odd bits of any key value element when expressed in binary, and the second sub-key value element includes a key value element consisting of all even bits of that key value element when expressed in binary; or
  • the first sub-key value element includes a key value element consisting of the first to the Kth bits of any key value element when expressed in binary, and the second sub-key value element includes a key value element consisting of the (K+1)th to the Nth bits of that key value element when expressed in binary, where N is the number of bits of the key value element when expressed in binary, 1 ⁇ K ⁇ N, and K is a positive integer.
  • the method further includes:
  • Each element of the two-dimensional matrix, determined by any one row vector and any one column vector, is initialized to a first preset identifier.
  • an apparatus for querying a key value element comprising:
  • a determining unit configured to determine, from the hash function set, a hash function subset corresponding to each key value element group;
  • a setting unit configured to, for each key value element in each key value element group, calculate a hash value according to the hash function subset corresponding to the key value element group to which the key value element belongs, and set the element corresponding to the position of the calculated hash value in the two-dimensional matrix as the second preset identifier;
  • a calculation unit configured to determine, for the key value element to be queried, the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, and calculate the hash value of the key value element to be queried according to the corresponding hash function subset;
  • an acquiring unit configured to acquire the element corresponding to the position of the hash value of the key value element to be queried in the two-dimensional matrix;
  • a query unit configured to determine, when the acquired element corresponding to the position of the hash value of the to-be-queried key value element in the two-dimensional matrix is the second preset identifier, that the key value element to be queried belongs to the set of key value elements represented by the two-dimensional filter.
  • the hash functions included in the hash function subsets determined by the determining unit for any two different key value element groups are different; or
  • the hash functions included in the hash function subsets determined by the determining unit for any two different key value element groups are the same, but the hash functions are arranged in a different order.
  • the calculating unit is specifically configured to:
  • the acquiring unit is specifically configured to:
  • the first hash value is an element of the column.
  • a device for generating a two-dimensional filter includes:
  • an establishing unit configured to establish a two-dimensional matrix comprising at least two row vectors and at least two column vectors;
  • a determining unit configured to determine a hash function set, wherein each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set hashes the first sub-key value element of any key value element in its corresponding at least one key value element group to obtain a first hash value, and hashes the second sub-key value element of that key value element to obtain a second hash value; the first hash value is a positive integer less than or equal to the length of the row vector, and the second hash value is a positive integer less than or equal to the length of the column vector;
  • a generating unit configured to generate a two-dimensional filter including the two-dimensional matrix and the hash function set.
  • the length of the row vector and the length of the column vector included in the two-dimensional matrix established by the establishing unit are both greater than or equal to
  • Sr is the number of key value elements included in all the key value element groups; or Sr is the number of key value elements obtained by filtering all the key value elements included in all the key value element groups.
  • the first sub-key value element obtained by the determining unit includes a key value element consisting of all odd bits of any key value element when expressed in binary, and the second sub-key value element obtained by the determining unit includes a key value element consisting of all even bits of that key value element when expressed in binary; or
  • the first sub-key value element obtained by the determining unit includes a key value element consisting of the first to the Kth bits of any key value element when expressed in binary, and the second sub-key value element obtained by the determining unit includes a key value element consisting of the (K+1)th to the Nth bits of that key value element when expressed in binary, where N is the number of bits of the key value element when expressed in binary, 1 ⁇ K ⁇ N, and K is a positive integer.
  • An element determined by any one row vector and any one column vector is initialized to a first preset identifier.
  • a two-dimensional filter includes a two-dimensional matrix, and the two-dimensional matrix can be linked with a plurality of key value element groups, thereby improving the flexibility of the filter.
  • FIG. 1A is a schematic structural diagram of a hash table in the prior art
  • FIG. 1B is an example diagram of representing spam email addresses with a Bloom filter in the prior art
  • FIG. 2A is a flowchart of generating a two-dimensional filter in an embodiment of the present invention
  • FIG. 2B is a schematic diagram of a two-dimensional matrix in an embodiment of the present invention.
  • FIG. 3 is a flowchart of querying a key value element in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram showing the functional structure of an apparatus for generating a two-dimensional filter according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of a physical structure of an apparatus for generating a two-dimensional filter according to an embodiment of the present invention
  • FIG. 7 is a schematic diagram showing the functional structure of an apparatus for querying a key value element according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram showing the physical structure of an apparatus for querying a key value element according to an embodiment of the present invention.
  • The terms "system" and "network" are used interchangeably herein.
  • The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone.
  • The character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
  • Step 200 Establish a two-dimensional matrix including at least two row vectors and at least two column vectors;
  • Step 210 Determine a hash function set, wherein each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set hashes the first sub-key value element of any key value element in its corresponding at least one key value element group to obtain a first hash value, and hashes the second sub-key value element of that key value element to obtain a second hash value; the first hash value is a positive integer less than or equal to the length of the row vector, and the second hash value is a positive integer less than or equal to the length of the column vector;
  • Step 220 Generate a two-dimensional filter including a two-dimensional matrix and a hash function set.
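  • Steps 200 to 220 can be sketched as follows. This is only an illustration under stated assumptions: the names `generate_filter`, `make_hash`, `TwoDimensionalFilter`, and `FIRST_PRESET` are hypothetical, the seeded SHA-256 hash functions are a stand-in for whatever hash functions an implementation chooses, and hash values are taken as 0-based indices smaller than the vector length.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List

FIRST_PRESET = 0  # assumed value of the "first preset identifier"

# A hash function maps (sub_key, vector_length) to an index below vector_length.
HashFn = Callable[[int, int], int]


def make_hash(seed: int) -> HashFn:
    """Build one hypothetical hash function; seeded SHA-256 is an assumption."""
    def h(sub_key: int, length: int) -> int:
        digest = hashlib.sha256(f"{seed}:{sub_key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % length
    return h


@dataclass
class TwoDimensionalFilter:
    matrix: List[List[int]]       # num_rows x num_cols grid of identifiers
    hash_functions: List[HashFn]  # the hash function set


def generate_filter(num_rows: int, num_cols: int, num_hashes: int) -> TwoDimensionalFilter:
    # Step 200: the matrix needs at least two row vectors and two column vectors.
    if num_rows < 2 or num_cols < 2:
        raise ValueError("need at least two row vectors and two column vectors")
    matrix = [[FIRST_PRESET] * num_cols for _ in range(num_rows)]
    # Step 210: determine the hash function set.
    hash_functions = [make_hash(seed) for seed in range(num_hashes)]
    # Step 220: the filter bundles the matrix and the hash function set.
    return TwoDimensionalFilter(matrix, hash_functions)


f = generate_filter(9, 9, 3)
print(sum(len(row) for row in f.matrix))  # 81 storage units for a 9 x 9 matrix
```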
  • the two-dimensional matrix is as shown in FIG. 2B.
  • the number of the storage data units of the two-dimensional matrix is the product of the number of row vectors and the number of column vectors. As shown in FIG. 2B, the number of row vectors of the two-dimensional matrix is 9, and the number of column vectors is 9. Then, the number of stored data units of the two-dimensional matrix is 81.
  • In the embodiment of the present invention, if the lengths of the row vector and the column vector of the established two-dimensional matrix are below a threshold that depends on Sr, then when the key value elements in the key value element groups are loaded into the filter, different key value elements are likely to be loaded into the same position, which affects the accuracy of the query. Therefore, to improve the accuracy of the query, in the embodiment of the present invention the lengths of the row vector and the column vector of the established two-dimensional matrix are greater than that threshold, where Sr is the number of all key value elements included in all key value element groups; or Sr is the number of key value elements obtained after all key value elements included in all key value element groups are filtered by the query condition.
  • The first sub-key value element and the second sub-key value element may take various forms; optionally, they may take the following forms:
  • the first sub-key value element includes a key value element consisting of all odd-numbered bits of any key value element when expressed in binary
  • the second sub-key value element includes a key value element consisting of all even-numbered bits of that key value element when expressed in binary.
  • The key value element formed by all the odd-numbered bits may be expressed in decimal, and the key value element formed by all the even-numbered bits may also be expressed in decimal. Of course, other number bases may also be used, and details are not described herein.
  • the key value element is 37348
  • 37348 expressed in binary is: 1001000111100100
  • all odd-numbered bits are: 01011010
  • all even-numbered bits are: 10001100
  • the decimal number represented by all the odd-numbered bits is 90 (the first sub-key value element)
  • the decimal number represented by all the even-numbered bits is 140 (the second sub-key value element).
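  • The odd/even split can be checked with a short sketch. The function name `split_odd_even` is hypothetical, and the bit numbering is an assumption: positions are counted from 1 starting at the least significant bit, which is the convention that reproduces the values 90 and 140 for the key value element 37348.

```python
from typing import Tuple


def split_odd_even(key: int, n_bits: int) -> Tuple[int, int]:
    """Split key into (odd-bit value, even-bit value).

    Assumption: bit positions are counted from 1 starting at the least
    significant bit, which reproduces the 37348 example.
    """
    odd = even = 0
    for i in range(n_bits):          # i is the 0-based offset from the LSB
        bit = (key >> i) & 1
        if i % 2 == 0:               # positions 1, 3, 5, ... (odd)
            odd |= bit << (i // 2)
        else:                        # positions 2, 4, 6, ... (even)
            even |= bit << (i // 2)
    return odd, even


first_sub, second_sub = split_odd_even(37348, 16)
print(format(first_sub, "08b"), first_sub)    # 01011010 90
print(format(second_sub, "08b"), second_sub)  # 10001100 140
```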
  • the first sub-key value element includes a key value element consisting of the 1st to the Kth bits of any key value element when expressed in binary
  • the second sub-key value element includes a key value element consisting of the (K+1)th to the Nth bits of that key value element when expressed in binary, where N is the number of bits of the key value element when expressed in binary, 1 ⁇ K ⁇ N, and K is a positive integer.
  • The two sub-key value elements may be expressed in decimal. Of course, other number bases may also be used, and details are not described herein.
  • the key value element is 37348
  • 37348 expressed in binary is: 1001000111100100
  • the 0th to 7th bits are: 10010001
  • the 8th to 15th bits are: 11100100
  • the decimal number represented by the 0th to 7th bits is 145 (the first sub-key value element)
  • the decimal number represented by the 8th to 15th bits is 228 (the second sub-key value element).
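  • The prefix/suffix split can be sketched in the same way. The function name `split_at` is hypothetical; the sketch assumes bits are numbered from the most significant bit, that the key fits in N bits, and that 1 <= K < N so that both parts are non-empty. For 37348 with N = 16 and K = 8, the two halves 10010001 and 11100100 evaluate to 145 and 228.

```python
from typing import Tuple


def split_at(key: int, n_bits: int, k: int) -> Tuple[int, int]:
    """Split an n_bits-wide key into its first k bits and its remaining bits.

    Assumptions: bits are numbered from the most significant bit, the key
    fits in n_bits, and 1 <= k < n_bits so that both parts are non-empty.
    """
    if not 1 <= k < n_bits:
        raise ValueError("k must satisfy 1 <= k < n_bits in this sketch")
    low_width = n_bits - k
    first = key >> low_width                 # the first k bits
    second = key & ((1 << low_width) - 1)    # bits k+1 .. n_bits
    return first, second


first_sub, second_sub = split_at(37348, 16, 8)
print(format(first_sub, "08b"), first_sub)    # 10010001 145
print(format(second_sub, "08b"), second_sub)  # 11100100 228
```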
  • the method further includes: initializing each element of the two-dimensional matrix, determined by any one row vector and any one column vector, to the first preset identifier.
  • Step 300 Determine, from the hash function set, the hash function subset corresponding to each key value element group;
  • Step 310 For each key value element in each key value element group, calculate a hash value according to the hash function subset corresponding to the key value element group to which the key value element belongs, and set the element corresponding to the position of the calculated hash value in the two-dimensional matrix as the second preset identifier;
  • Step 320 For the key value element to be queried, determine the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, and calculate the hash value of the key value element to be queried according to the corresponding hash function subset;
  • Step 330 Acquire the element corresponding to the position of the hash value of the key value element to be queried in the two-dimensional matrix;
  • Step 340 When the acquired element corresponding to the position of the hash value of the key value element to be queried in the two-dimensional matrix is the second preset identifier, determine that the key value element to be queried belongs to the set of key value elements represented by the two-dimensional filter.
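  • Steps 300 to 340 can be sketched as a single class that serves several key value element groups from one matrix. Everything named here is hypothetical: the class and method names, the group names "region" and "month", the identifier values 0 and 1, the seeded SHA-256 hash functions, and the split of a 16-bit key into its high and low bytes. Pairing every first hash value with every second hash value to form positions is also an assumption of this sketch.

```python
import hashlib
from itertools import product
from typing import Dict, List, Tuple

FIRST_PRESET, SECOND_PRESET = 0, 1  # assumed identifier values


def _hash(seed: int, sub_key: int, length: int) -> int:
    # Assumption: each hash function is SHA-256 salted with a seed.
    digest = hashlib.sha256(f"{seed}:{sub_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % length


class TwoDimensionalFilter:
    def __init__(self, num_rows: int, num_cols: int, subsets: Dict[str, List[int]]) -> None:
        self.num_rows, self.num_cols = num_rows, num_cols
        self.matrix = [[FIRST_PRESET] * num_cols for _ in range(num_rows)]
        self.subsets = subsets  # Step 300: group name -> hash function subset (seeds)

    def _positions(self, group: str, key: int) -> List[Tuple[int, int]]:
        first_sub, second_sub = key >> 8, key & 0xFF  # assumed split of a 16-bit key
        seeds = self.subsets[group]
        firsts = [_hash(s, first_sub, self.num_rows) for s in seeds]
        seconds = [_hash(s, second_sub, self.num_cols) for s in seeds]
        return list(product(firsts, seconds))

    def load(self, group: str, key: int) -> None:
        # Step 310: mark every position derived from the group's hash subset.
        for r, c in self._positions(group, key):
            self.matrix[r][c] = SECOND_PRESET

    def query(self, group: str, key: int) -> bool:
        # Steps 320-340: the key belongs only if every position is marked.
        return all(self.matrix[r][c] == SECOND_PRESET
                   for r, c in self._positions(group, key))


f = TwoDimensionalFilter(256, 256, {"region": [1, 2, 3], "month": [4, 5, 6]})
f.load("region", 37348)
print(f.query("region", 37348))  # True: a loaded element is always found
```

  • As with a one-dimensional Bloom filter, a query for an element that was never loaded can still return True if other elements happened to mark the same positions.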
  • In the embodiment of the present invention, the hash functions included in the hash function subsets corresponding to any two different key value element groups are different.
  • Alternatively, the hash functions included in the hash function subsets corresponding to any two different key value element groups are the same, but the hash functions are arranged in a different order.
  • For example, the first key value element group is a sales table by region
  • the second key value element group is a sales table by month
  • the hash function subset corresponding to the first key value element group is not the same as the hash function subset corresponding to the second key value element group.
  • the first hash value is calculated based on the hash function subset corresponding to the key value element group to which the key value element to be queried belongs and the first sub-key value element of the key value element to be queried;
  • the second hash value is calculated based on the hash function subset corresponding to the key value element group to which the key value element to be queried belongs and the second sub-key value element of the key value element to be queried.
  • the first sub-key value element here may also include a key value element consisting of all odd-numbered bits of any key value element when expressed in binary, and the second sub-key value element includes a key value element consisting of all even-numbered bits of that key value element when expressed in binary; or
  • the first sub-key value element includes a key value element consisting of the 1st to the Kth bits of any key value element when expressed in binary
  • the second sub-key value element includes a key value element consisting of the (K+1)th to the Nth bits of that key value element when expressed in binary, where N is the number of bits of the key value element when expressed in binary, 1 ⁇ K ⁇ N, and K is a positive integer.
  • the specific representation of the first sub-key value element is the same as the representation of the first sub-key value element in the embodiment.
  • For example, the first sub-key value element of a key value element is 90
  • the second sub-key value element is 140
  • the hash function subset corresponding to the key value element is (h1, h2, h3)
  • the first hash values calculated for 90 by (h1, h2, h3) are 6, 128, and 55, respectively
  • the second hash values calculated for 140 by (h1, h2, h3) are 0, 101, and 46, respectively
  • then the positions of the key value element in the two-dimensional matrix are (6, 0), (6, 101), (6, 46), (128, 0), (128, 101), (128, 46), (55, 0), (55, 101), (55, 46), and the elements corresponding to these positions are set to the second preset identifier; or the positions of the key value element in the two-dimensional matrix are (0, 6), (101, 6), (46, 6), (0, 128), (101, 128), (46, 128), (0, 55), (101, 55), (46, 55), and the elements corresponding to these positions are set to the second preset identifier.
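  • The nine positions in this example are every pairing of a first hash value with a second hash value. A brief sketch, assuming a hypothetical 129 x 129 matrix (just large enough for index 128) and 1 as the second preset identifier:

```python
from itertools import product

first_hashes = [6, 128, 55]    # h1, h2, h3 applied to the first sub-key value element 90
second_hashes = [0, 101, 46]   # h1, h2, h3 applied to the second sub-key value element 140

# (first, second) orientation: every first hash value paired with every second one.
positions = list(product(first_hashes, second_hashes))
# Alternative (second, first) orientation of the same pairs.
swapped = [(b, a) for a, b in positions]

SECOND_PRESET = 1                          # assumed identifier value
matrix = [[0] * 129 for _ in range(129)]   # hypothetical matrix size
for r, c in positions:
    matrix[r][c] = SECOND_PRESET

print(positions[:3])   # [(6, 0), (6, 101), (6, 46)]
print(len(positions))  # 9
```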
  • In this solution, when querying whether a key value element belongs to the key value elements included in a plurality of key value element groups, only the two-dimensional filter needs to be queried. It is not necessary to generate a corresponding Bloom filter for each key value element group, nor to query multiple Bloom filters separately, which overcomes the low query efficiency of the current approach.
  • The two-dimensional matrix is used as an example in the first embodiment and the second embodiment. Of course, the matrix may also be a multi-dimensional matrix such as a three-dimensional matrix or a four-dimensional matrix.
  • The process of generating a multi-dimensional matrix is similar to the process of generating a two-dimensional matrix, and the query process based on a multi-dimensional matrix is similar to the query process based on a two-dimensional matrix; they are not described in detail here.
  • Step 400 Establish a two-dimensional matrix including three row vectors and three column vectors;
  • Step 410 Determine a hash function set, and generate a two-dimensional filter including the two-dimensional matrix and the hash function set;
  • each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set hashes the first sub-key value element of any key value element in its corresponding at least one key value element group to obtain a first hash value, and hashes the second sub-key value element of that key value element to obtain a second hash value
  • the first hash value is a positive integer that is less than or equal to the length of the row vector
  • the second hash value is a positive integer that is less than or equal to the length of the column vector
  • the set of hash functions determined in this step includes 10 hash functions: h1, h2, h3, h4, h5, h6, h7, h8, h9, h10.
  • Step 420 Initialize each element of the two-dimensional matrix, determined by any one row vector and any one column vector, to a first preset identifier;
  • Step 430 Determine, from the determined hash function set, the hash function subsets respectively corresponding to two key value element groups;
  • Step 440 For any key value element in the two key value element groups, calculate a hash value according to the hash function subset corresponding to the key value element group to which the key value element belongs, and set the element corresponding to the position of the calculated hash value in the two-dimensional matrix as the second preset identifier;
  • Step 450 For the key value element to be queried, determine the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, and calculate the hash value of the key value element to be queried according to the corresponding hash function subset;
  • Step 460 Acquire the element corresponding to the position of the hash value of the key value element to be queried in the two-dimensional matrix;
  • Step 470 Determine whether the acquired element corresponding to the position of the hash value of the key value element to be queried in the two-dimensional matrix is the second preset identifier; if yes, determine that the key value element to be queried belongs to the set of key value elements represented by the two-dimensional filter; otherwise, determine that the key value element to be queried does not belong to the set of key value elements represented by the two-dimensional filter.
  • an embodiment of the present invention provides a device for generating a two-dimensional filter, where the generating device includes an establishing unit 50, a determining unit 51, and a generating unit 52, where:
  • An establishing unit 50 configured to establish a two-dimensional matrix including at least two row vectors and at least two column vectors;
  • a determining unit 51 configured to determine a hash function set, wherein each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set hashes the first sub-key value element of any key value element in its corresponding at least one key value element group to obtain a first hash value, and hashes the second sub-key value element of that key value element to obtain a second hash value; the first hash value is a positive integer less than or equal to the length of the row vector, and the second hash value is a positive integer less than or equal to the length of the column vector;
  • a generating unit 52 configured to generate a two-dimensional filter including the two-dimensional matrix and the hash function set.
  • the length of the row vector and the length of the column vector included in the two-dimensional matrix established by the establishing unit 50 are greater than or equal to
  • Sr is the number of all key value elements included in all key value element groups; or Sr is the number of key value elements obtained after all key value elements included in all key value element groups are filtered by the query condition.
  • the first sub-key value element obtained by the determining unit 51 includes a key value element consisting of all odd-numbered bits of any key value element when expressed in binary, and the second sub-key value element obtained by the determining unit 51 includes a key value element consisting of all even-numbered bits of that key value element when expressed in binary; or
  • the first sub-key value element obtained by the determining unit 51 includes a key value element consisting of the 1st to the Kth bits of any key value element when expressed in binary, and the second sub-key value element obtained by the determining unit 51 includes a key value element consisting of the (K+1)th to the Nth bits of that key value element when expressed in binary, where N is the number of bits of the key value element when expressed in binary, 1 ⁇ K ⁇ N, and K is a positive integer.
  • the generating device further includes an initialization unit 53 configured to initialize each element of the two-dimensional matrix, determined by any one row vector and any one column vector, to the first preset identifier.
  • FIG. 6 is a schematic diagram of the physical structure of the two-dimensional filter generating apparatus provided by the present invention.
  • the two-dimensional filter generating apparatus includes at least one processor 601, a communication bus 602, a memory 603, and at least one communication interface 604.
  • the communication bus 602 is used to implement the connection and communication between the above components, and the communication interface 604 is used to connect and communicate with external devices.
  • the memory 603 is used to store program code that needs to be executed.
  • when the processor 601 executes the program code in the memory 603, the following functions are implemented:
  • each hash function in the hash function set corresponds to at least one key-value element group, and any hash function in the hash function set hashes the first sub-key-value element of any key-value element in its corresponding key-value element group to obtain a first hash value, and hashes the second sub-key-value element of that key-value element to obtain a second hash value;
  • the first hash value is a positive integer less than or equal to the length of the row vector
  • the second hash value is a positive integer less than or equal to the length of the column vector;
  • a two-dimensional filter is generated that includes a two-dimensional matrix and a collection of hash functions.
  • an embodiment of the present invention provides an apparatus for querying a key-value element, which includes a determining unit 70, a setting unit 71, a calculating unit 72, an obtaining unit 73, and a query unit 74, wherein:
  • a determining unit 70 configured to determine, from the hash function set, a hash function subset corresponding to each group of key value elements
  • the setting unit 71 is configured to: for any key-value element in each key-value element group, calculate a hash value according to the hash function subset corresponding to the key-value element group to which the key-value element belongs, and set the element at the position of the calculated hash value in the two-dimensional matrix to the second preset identifier;
  • the calculating unit 72 is configured to: for the key-value element to be queried, determine the hash function subset corresponding to the key-value element group to which the key-value element to be queried belongs, and calculate the hash value of the key-value element to be queried according to the corresponding hash function subset;
  • the obtaining unit 73 is configured to obtain an element corresponding to a position of the hash value of the key element group to be queried in the two-dimensional matrix;
  • the query unit 74 is configured to: when the obtained element is the second preset identifier corresponding to the position of the hash value of the key-value element to be queried in the two-dimensional matrix, determine that the key-value element to be queried belongs to the set of key-value elements represented by the two-dimensional filter.
  • the hash function subsets, determined by the determining unit 70, that correspond to any two different key-value element groups include different hash functions; or
  • the hash function subsets, determined by the determining unit 70, that correspond to any two different key-value element groups include the same hash functions, arranged in a different order.
  • the calculating unit 72 is specifically configured to:
  • calculate a first hash value for the first sub-key-value element of the key-value element to be queried, based on the hash function subset corresponding to the key-value element group to which the key-value element to be queried belongs; and
  • calculate a second hash value for the second sub-key-value element of the key-value element to be queried, based on the hash function subset corresponding to the key-value element group to which the key-value element to be queried belongs.
  • the obtaining unit 73 is specifically configured to:
  • FIG. 8 is a physical device diagram of the two-dimensional filter generating apparatus provided by the present invention.
  • the two-dimensional filter generating apparatus includes at least one processor 801, a communication bus 802, a memory 803, and at least one communication interface 804.
  • the communication bus 802 is used to implement the connection and communication between the above components, and the communication interface 804 is used to connect and communicate with external devices.
  • the memory 803 is configured to store program code that needs to be executed.
  • when the processor 801 executes the program code in the memory 803, the following functions are implemented:
  • a hash value is calculated according to the hash function subset corresponding to the key-value element group to which the key-value element belongs, and the element at the position of the calculated hash value in the two-dimensional matrix is set to the second preset identifier;
  • it is determined that the key-value element to be queried belongs to the set of key-value elements represented by the two-dimensional filter.
  • a two-dimensional filter includes a two-dimensional matrix, and the two-dimensional matrix can be linked with a plurality of key element groups, thereby improving the flexibility of the filter.
  • when querying whether a key-value element belongs to the key-value elements included in the plurality of key-value element groups, only a query based on the two-dimensional filter is needed: it is not necessary to generate a Bloom filter for each key-value element group, nor to query multiple Bloom filters separately, which addresses the current defect of low query efficiency.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Power Engineering (AREA)
  • Algebra (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract

A two-dimensional filter generation method, query method and device. The two-dimensional filter comprises a two-dimensional matrix that can be linked to a plurality of key-value element groups, which improves the flexibility of the filter. In addition, when querying whether a key-value element belongs to the key-value elements comprised by the plurality of key-value element groups, a single query based on the two-dimensional filter suffices: there is no need to generate a Bloom filter for each key-value element group or to query a plurality of Bloom filters separately, which addresses the current defect of low query efficiency.

Description

Two-dimensional filter generation method, query method and device
Technical Field
The present invention relates to the technical field of element query matching, and in particular to a method and a device for generating and querying a two-dimensional filter.
Background
When designing computer software, it is often necessary to determine whether an element is in a set. For example, word processing software needs to check whether an English word is spelled correctly (that is, whether it is in a known dictionary), and URL (Uniform Resource Locator) filtering software needs to determine whether a URL is on a filter list. The most direct method is to store all elements of the set in the computer and, when a new element is encountered, compare it directly with the elements of the set. To speed up the lookup, a hash table is usually used to store the set. A hash table is a data structure that quickly maps an element to its storage location according to the element's key value; the mapping function is what is commonly called a hash function. The structure of a hash table is shown in FIG. 1A: each element of the set is first passed through the hash function to obtain its hash position, and the element is then recorded in the hash chain at that position. In FIG. 1A, assuming that the hash function is HASH and that A1, A2, ..., A8 are the elements of the set, the figure shows that HASH(A1)=HASH(A2)=H1, HASH(A3)=HASH(A4)=H2, HASH(A5)=HASH(A6)=H3, and HASH(A7)=HASH(A8)=H4.
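The chained hash table described above can be sketched as follows. This is a minimal illustration, not the structure of FIG. 1A itself: the class name `ChainedHashTable`, the bucket count of 4, and the use of Python's built-in `hash()` in place of HASH are all assumptions made for the example.

```python
class ChainedHashTable:
    """Set membership with one chain (list) per hash position."""

    def __init__(self, num_buckets=4):
        # One empty chain per hash position (H1..H4 in the text's example).
        self.buckets = [[] for _ in range(num_buckets)]

    def _position(self, element):
        # Assumption: Python's built-in hash() stands in for HASH.
        return hash(element) % len(self.buckets)

    def add(self, element):
        chain = self.buckets[self._position(element)]
        if element not in chain:
            chain.append(element)

    def __contains__(self, element):
        # Only the chain at the element's hash position is searched.
        return element in self.buckets[self._position(element)]


table = ChainedHashTable()
for name in ("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"):
    table.add(name)

print("A3" in table)  # membership test for a stored element
print("B1" in table)  # membership test for an element never stored
```

Because every element is stored in full, the answer is exact, which is the storage cost the next paragraph contrasts with the Bloom filter.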
The advantage of a hash table is that it can determine quickly and accurately whether an element is in a set; the disadvantage is that it requires a large amount of storage space. To save storage space, the one-dimensional Bloom filter was proposed by Burton Bloom in 1970. Its principle is as follows. A one-dimensional Bloom filter consists of k mutually independent hash functions h1, h2, ..., hk and a bit vector of length m, where the range of each hash function is {0, 1, ..., m-1}. Because one byte has 8 bits, the bit vector actually occupies m/8 bytes of memory, and all bits of the bit vector are initialized to 0. For a set S={s1, s2, ..., sn}, the k hash functions are used to calculate a hash sequence (h1(s), h2(s), ..., hk(s)) for each element s in S, and the bits of the bit vector at the positions given by the hash sequence are set to 1. The Bloom filter is then said to be loaded with the data element set S, or to represent the data element set S. For example, if h1(s1)=5, the 6th bit of the bit vector is set to 1; if h2(s1)=10, the 11th bit is set to 1; and so on until, if hk(s1)=n-1, the nth bit is set to 1. The Bloom filter is then said to be loaded with the data element s1, and when every data element in S has been loaded, the Bloom filter represents the data element set S. To query whether a data element is in S, the same k hash functions are used to calculate a hash sequence for the data element. If every bit of the bit vector at the positions given by the hash sequence is 1, the data element is considered to belong to S; otherwise it does not belong to S. Compared with storing the data in full, a Bloom filter saves storage space, and a Bloom filter never misses an element that belongs to the set.
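The loading and querying procedure above can be sketched as follows. This is an illustrative sketch only: the class name `BloomFilter` is hypothetical, and deriving the k hash functions from salted SHA-256 digests is an assumption, since the text only requires k mutually independent hash functions with range {0, ..., m-1}.

```python
import hashlib


class BloomFilter:
    """One-dimensional Bloom filter: k hash functions over an m-bit vector."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        # m bits are stored in ceil(m / 8) bytes, all initialized to 0.
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # Assumption: h_i(item) is taken from SHA-256 salted with i (k < 256).
        data = item.encode("utf-8")
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Set the bit at every position of the item's hash sequence.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # The item is considered present only if all k bits are 1.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bloom = BloomFilter(m=1024, k=3)
for s in ("s1", "s2"):
    bloom.add(s)

print("s1" in bloom)  # a loaded element is always reported as present
```

A loaded element is never missed, as the text states. The converse does not hold in general: an element that was never loaded can still be reported as present when its k positions happen to have been set by other elements.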
The following gives a simple description of the Bloom filter, using spam email addresses as an example.
Assume that there are 100 million spam email addresses. First, a bit vector with a length of 1.6 billion bits, that is, a vector of 200 million bytes, is created, and all 1.6 billion bits are initialized to zero. For each known spam email address, eight different hash functions (F1, F2, ..., F8) are used to generate eight hash values (f1, f2, ..., f8), and the positions of the bit vector corresponding to these eight hash values are all set to 1. After all 100 million spam email addresses have been processed in this way, a Bloom filter for these spam email addresses has been generated. FIG. 1B shows an example of a spam email address represented by a Bloom filter: for the spam email address XXX@163.com, eight different hash functions (F1, F2, ..., F8) generate eight hash values (f1, f2, ..., f8), and the positions of the bit vector corresponding to these eight hash values are all set to 1; the Bloom filter is then said to be loaded with this spam email address.
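The sizing in this example can be checked with a few lines of arithmetic. The figure of 16 bits per address is not stated in the text; it is derived here from 1.6 billion bits divided by 100 million addresses, and the variable names are illustrative.

```python
num_addresses = 100_000_000   # 100 million spam email addresses
bits_per_address = 16         # derived: 1.6 billion bits / 100 million addresses
num_hash_functions = 8        # F1..F8 in the example

num_bits = num_addresses * bits_per_address
num_bytes = num_bits // 8     # one byte holds 8 bits

print(num_bits)   # length of the bit vector in bits
print(num_bytes)  # memory occupied by the bit vector in bytes
```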
The Bloom filter generated above is for a single key-value element group; that is, one Bloom filter can be linked to only one key-value element group, and as many Bloom filters must be generated as there are key-value element groups. Bloom filters therefore have the defect of poor flexibility.
Summary of the Invention
Embodiments of the present invention provide a method and a device for generating and querying a two-dimensional filter, so as to improve the flexibility of Bloom filters.
The specific technical solutions provided by the embodiments of the present invention are as follows:
In a first aspect, a method for querying a key-value element is provided, including:
determining, from a hash function set, the hash function subset corresponding to each key-value element group;
for any key-value element in each key-value element group, calculating a hash value according to the hash function subset corresponding to the key-value element group to which the key-value element belongs, and setting the element at the position of the calculated hash value in a two-dimensional matrix to a second preset identifier;
for a key-value element to be queried, determining the hash function subset corresponding to the key-value element group to which the key-value element to be queried belongs, and calculating the hash value of the key-value element to be queried according to the corresponding hash function subset;
obtaining the element at the position of the hash value of the key-value element to be queried in the two-dimensional matrix; and
when the obtained element is the second preset identifier corresponding to the position of the hash value of the key-value element to be queried in the two-dimensional matrix, determining that the key-value element to be queried belongs to the set of key-value elements represented by the two-dimensional filter.
With reference to the first aspect, in a first possible implementation of the first aspect, the hash function subsets corresponding to any two different key-value element groups include different hash functions; or
the hash function subsets corresponding to any two different key-value element groups include the same hash functions, arranged in a different order.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, calculating the hash value of the key-value element to be queried specifically includes:
calculating a first hash value for the first sub-key-value element of the key-value element to be queried, based on the hash function subset corresponding to the key-value element group to which the key-value element to be queried belongs; and
calculating a second hash value for the second sub-key-value element of the key-value element to be queried, based on the hash function subset corresponding to the key-value element group to which the key-value element to be queried belongs.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, obtaining the element at the position of the hash value of the key-value element to be queried in the two-dimensional matrix specifically includes:
obtaining the element of the two-dimensional matrix whose row is given by the first hash value and whose column is given by the second hash value; or obtaining the element of the two-dimensional matrix whose row is given by the second hash value and whose column is given by the first hash value.
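The query method of the first aspect can be sketched as follows, under several assumptions that the text does not fix: the group names, the 64 x 64 matrix, the salted SHA-256 hash functions, the use of 1 as the second preset identifier, and the split of a 32-bit key into high and low halves are all hypothetical choices made for illustration. Hash values are kept in the range 1..length, as the text requires, and converted to 0-based indices when the matrix is accessed.

```python
import hashlib

ROWS, COLS = 64, 64
SET_MARK = 1  # assumption: stands in for the "second preset identifier"
matrix = [[0] * COLS for _ in range(ROWS)]


def make_hash(salt):
    """Build one hash function that returns a value in 1..length."""
    def h(sub_key, length):
        digest = hashlib.sha256(f"{salt}:{sub_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % length + 1
    return h


HASH_SET = [make_hash(i) for i in range(4)]
# Each key-value element group gets its own subset of the hash function set.
GROUP_SUBSETS = {"orders": HASH_SET[:2], "customers": HASH_SET[2:]}


def split_key(key):
    # Assumption: a 32-bit key is split into its high and low 16 bits.
    return key >> 16, key & 0xFFFF


def positions(group, key):
    first, second = split_key(key)
    for h in GROUP_SUBSETS[group]:
        # First hash value selects the row, second hash value the column.
        yield h(first, ROWS) - 1, h(second, COLS) - 1


def insert(group, key):
    for r, c in positions(group, key):
        matrix[r][c] = SET_MARK


def query(group, key):
    return all(matrix[r][c] == SET_MARK for r, c in positions(group, key))


insert("orders", 0x12345678)
insert("customers", 0xCAFEBABE)
print(query("orders", 0x12345678))
print(query("customers", 0xCAFEBABE))
```

One matrix serves both groups here because each group addresses it through its own hash function subset, which is the flexibility the text attributes to the two-dimensional filter.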
In a second aspect, a method for generating a two-dimensional filter is provided, including:
establishing a two-dimensional matrix including at least two row vectors and at least two column vectors;
determining a hash function set, where each hash function in the hash function set corresponds to at least one key-value element group, any hash function in the hash function set hashes the first sub-key-value element of any key-value element in its corresponding at least one key-value element group to obtain a first hash value and hashes the second sub-key-value element of that key-value element to obtain a second hash value, the first hash value is a positive integer less than or equal to the length of the row vector, and the second hash value is a positive integer less than or equal to the length of the column vector; and
generating a two-dimensional filter including the two-dimensional matrix and the hash function set.
With reference to the second aspect, in a first possible implementation of the second aspect, the length of the row vector and the length of the column vector are both greater than or equal to
Figure PCTCN2015072915-appb-000001
where Sr is the number of all key-value elements included in all the key-value element groups; or Sr is the number of key-value elements obtained after all key-value elements included in all the key-value element groups are filtered by a query condition.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the first sub-key-value element includes a key-value element composed of all odd-numbered bits of the binary representation of the key-value element, and the second sub-key-value element includes a key-value element composed of all even-numbered bits of the binary representation of the key-value element; or
the first sub-key-value element includes a key-value element composed of the 1st to Kth bits of the binary representation of the key-value element, and the second sub-key-value element includes a key-value element composed of the (K+1)th to Nth bits of the binary representation of the key-value element, where N is the number of bits of the binary representation of the key-value element, 1≤K≤N, and K is a positive integer.
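The two ways of splitting a key-value element into sub-key-value elements can be sketched as follows. The function names are hypothetical, and two assumptions are made that the text leaves open: the key is written as a fixed-width N-bit binary string, and bit 1 is the leftmost (most significant) bit. The sketch also restricts K to 1≤K<N so that the second sub-key is never empty, which is narrower than the 1≤K≤N stated in the text.

```python
def split_odd_even(key, n_bits):
    """First sub-key: odd-numbered bits; second sub-key: even-numbered bits."""
    bits = format(key, f"0{n_bits}b")  # assumption: bit 1 is the leftmost bit
    odd_bits = bits[0::2]              # bit positions 1, 3, 5, ...
    even_bits = bits[1::2]             # bit positions 2, 4, 6, ...
    return int(odd_bits, 2), int(even_bits, 2)


def split_at(key, n_bits, k):
    """First sub-key: bits 1..K; second sub-key: bits K+1..N."""
    if not 1 <= k < n_bits:
        # Assumption of this sketch: K = N is rejected to avoid an empty sub-key.
        raise ValueError("k must satisfy 1 <= k < n_bits")
    bits = format(key, f"0{n_bits}b")
    return int(bits[:k], 2), int(bits[k:], 2)


key = 0b10110100
print(split_odd_even(key, 8))  # odd bits "1100", even bits "0110"
print(split_at(key, 8, 3))     # first 3 bits "101", remaining bits "10100"
```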
With reference to the second aspect or the first or second possible implementation of the second aspect, in a third possible implementation of the second aspect, the method further includes:
initializing the element determined by any row vector and any column vector of the two-dimensional matrix to a first preset identifier.
In a third aspect, a device for querying a key-value element is provided, including:
a determining unit, configured to determine, from a hash function set, the hash function subset corresponding to each key-value element group;
a setting unit, configured to: for any key-value element in each key-value element group, calculate a hash value according to the hash function subset corresponding to the key-value element group to which the key-value element belongs, and set the element at the position of the calculated hash value in a two-dimensional matrix to a second preset identifier;
a calculating unit, configured to: for a key-value element to be queried, determine the hash function subset corresponding to the key-value element group to which the key-value element to be queried belongs, and calculate the hash value of the key-value element to be queried according to the corresponding hash function subset;
an obtaining unit, configured to obtain the element at the position of the hash value of the key-value element group to be queried in the two-dimensional matrix; and
a query unit, configured to: when the obtained element is the second preset identifier corresponding to the position of the hash value of the key-value element to be queried in the two-dimensional matrix, determine that the key-value element to be queried belongs to the set of key-value elements represented by the two-dimensional filter.
With reference to the third aspect, in a first possible implementation of the third aspect, the hash function subsets, determined by the determining unit, that correspond to any two different key-value element groups include different hash functions; or
the hash function subsets, determined by the determining unit, that correspond to any two different key-value element groups include the same hash functions, arranged in a different order.
With reference to the third aspect or the first possible implementation of the third aspect, in a second possible implementation of the third aspect, the calculating unit is specifically configured to:
calculate a first hash value for the first sub-key-value element of the key-value element to be queried, based on the hash function subset corresponding to the key-value element group to which the key-value element to be queried belongs; and
calculate a second hash value for the second sub-key-value element of the key-value element to be queried, based on the hash function subset corresponding to the key-value element group to which the key-value element to be queried belongs.
With reference to the second possible implementation of the third aspect, in a third possible implementation of the third aspect, the obtaining unit is specifically configured to:
obtain the element of the two-dimensional matrix whose row is given by the first hash value and whose column is given by the second hash value; or obtain the element of the two-dimensional matrix whose row is given by the second hash value and whose column is given by the first hash value.
In a fourth aspect, a device for generating a two-dimensional filter is provided, including:
an establishing unit, configured to establish a two-dimensional matrix including at least two row vectors and at least two column vectors;
a determining unit, configured to determine a hash function set, where each hash function in the hash function set corresponds to at least one key-value element group, any hash function in the hash function set hashes the first sub-key-value element of any key-value element in its corresponding at least one key-value element group to obtain a first hash value and hashes the second sub-key-value element of that key-value element to obtain a second hash value, the first hash value is a positive integer less than or equal to the length of the row vector, and the second hash value is a positive integer less than or equal to the length of the column vector; and
a generating unit, configured to generate a two-dimensional filter including the two-dimensional matrix and the hash function set.
With reference to the fourth aspect, in a first possible implementation of the fourth aspect, the length of the row vector and the length of the column vector included in the two-dimensional matrix generated by the establishing unit are both greater than or equal to
Figure PCTCN2015072915-appb-000002
where Sr is the number of all key-value elements included in all the key-value element groups; or Sr is the number of key-value elements obtained after all key-value elements included in all the key-value element groups are filtered by a query condition.
With reference to the fourth aspect or the first possible implementation of the fourth aspect, in a second possible implementation of the fourth aspect, the first sub-key-value element obtained by the determining unit includes a key-value element composed of all odd-numbered bits of the binary representation of the key-value element, and the second sub-key-value element obtained by the determining unit includes a key-value element composed of all even-numbered bits of the binary representation of the key-value element; or
the first sub-key-value element obtained by the determining unit includes a key-value element composed of the 1st to Kth bits of the binary representation of the key-value element, and the second sub-key-value element obtained by the determining unit includes a key-value element composed of the (K+1)th to Nth bits of the binary representation of the key-value element, where N is the number of bits of the binary representation of the key-value element, 1≤K≤N, and K is a positive integer.
With reference to the fourth aspect or the first or second possible implementation of the fourth aspect, in a third possible implementation of the fourth aspect, the device further includes an initialization unit, configured to initialize the element determined by any row vector and any column vector of the two-dimensional matrix to a first preset identifier.
In the embodiments of the present invention, a two-dimensional filter includes a two-dimensional matrix, and the two-dimensional matrix can be linked to a plurality of key-value element groups, which improves the flexibility of the filter.
Brief Description of the Drawings
FIG. 1A is a schematic structural diagram of a hash table in the prior art;
FIG. 1B is an example diagram of a spam email address represented by a filter in the prior art;
FIG. 2A is a flowchart of generating a two-dimensional filter in an embodiment of the present invention;
FIG. 2B is a schematic diagram of a two-dimensional matrix in an embodiment of the present invention;
FIG. 3 is a flowchart of querying a key-value element in an embodiment of the present invention;
FIG. 4 shows an embodiment of generating a two-dimensional filter and querying a key-value element in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the functional structure of a device for generating a two-dimensional filter in an embodiment of the present invention;
FIG. 6 is a schematic diagram of the physical structure of a device for generating a two-dimensional filter in an embodiment of the present invention;
FIG. 7 is a schematic diagram of the functional structure of a device for querying a key-value element in an embodiment of the present invention;
FIG. 8 is a schematic diagram of the physical structure of a device for querying a key-value element in an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In addition, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are intended only to illustrate and explain the present invention, not to limit it, and that the embodiments in this application and the features in the embodiments may be combined with each other provided that they do not conflict.
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
Referring to FIG. 2A, in this embodiment of the present invention, a detailed process of generating a two-dimensional filter is as follows:
Step 200: Establish a two-dimensional matrix including at least two row vectors and at least two column vectors.
Step 210: Determine a hash function set, where each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set performs a hash operation on a first sub-key value element of any key value element in the corresponding at least one key value element group to obtain a first hash value, and performs a hash operation on a second sub-key value element of any key value element in the corresponding key value element group to obtain a second hash value; each first hash value is a positive integer less than or equal to the length of the row vector, and each second hash value is a positive integer less than or equal to the length of the column vector.
Step 220: Generate a two-dimensional filter including the two-dimensional matrix and the hash function set.
In this embodiment of the present invention, the two-dimensional matrix is shown in FIG. 2B.
In this embodiment of the present invention, the number of storage data units of the two-dimensional matrix is the product of the number of row vectors and the number of column vectors. For example, in FIG. 2B, the two-dimensional matrix has 9 row vectors and 9 column vectors, so the two-dimensional matrix has 81 storage data units.
In this embodiment of the present invention, if the lengths of both the row vectors and the column vectors of the established two-dimensional matrix are less than
Figure PCTCN2015072915-appb-000003
then, when the key value elements in the key value element groups are loaded into the filter, different key value elements are likely to be loaded to the same position, which reduces query accuracy. Therefore, to improve query accuracy, in this embodiment of the present invention, the lengths of both the row vectors and the column vectors of the established two-dimensional matrix are greater than
Figure PCTCN2015072915-appb-000004
where Sr is the number of all key value elements included in all key value element groups; or Sr is the number of key value elements obtained after all key value elements included in all key value element groups are filtered by a query condition.
However, the longer the row vectors and the column vectors are, the more storage space is required. Therefore, to improve storage space utilization, in this embodiment of the present invention, the lengths of both the row vectors and the column vectors of the established two-dimensional matrix are equal to
Figure PCTCN2015072915-appb-000005
In this embodiment of the present invention, the first sub-key value element and the second sub-key value element may take multiple forms. Optionally, the following forms may be used:
The first sub-key value element includes the key value element composed of all odd-numbered bits of any key value element expressed in binary, and the second sub-key value element includes the key value element composed of all even-numbered bits of that key value element expressed in binary.
The key value element composed of all odd-numbered bits may be expressed in decimal, and the key value element composed of all even-numbered bits may be expressed in decimal; certainly, another base may also be used, and details are not described herein.
For example, the key value element is 37348, which is 1001000111100100 in binary. All odd-numbered bits are 01011010 and all even-numbered bits are 10001100. The decimal number represented by the odd-numbered bits is 90 (the first sub-key value element), and the decimal number represented by the even-numbered bits is 140 (the second sub-key value element).
Alternatively, the following form may be used:
The first sub-key value element includes the key value element composed of the 1st to Kth bits of any key value element expressed in binary, and the second sub-key value element includes the key value element composed of the (K+1)th to Nth bits of that key value element expressed in binary, where N is the number of bits of the key value element expressed in binary, 1≤K≤N, and K is a positive integer.
The key value element composed of the 1st to Kth bits may be expressed in decimal, and the key value element composed of the (K+1)th to Nth bits may be expressed in decimal; certainly, another base may also be used, and details are not described herein.
For example, the key value element is 37348, which is 1001000111100100 in binary. The 0th to 7th bits are 10010001 and the 8th to 15th bits are 11100100. The decimal number represented by the 0th to 7th bits is 145 (the first sub-key value element), and the decimal number represented by the 8th to 15th bits is 228 (the second sub-key value element).
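The two splitting forms described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names `split_odd_even` and `split_at` are hypothetical, the odd/even form numbers bits from 1 starting at the least significant bit (a convention chosen because it reproduces the 90 and 140 of the odd/even example), and the prefix form counts bits from the most significant end of an N-bit representation.

```python
def split_odd_even(key: int) -> tuple[int, int]:
    """Return (value of odd-numbered bits, value of even-numbered bits).

    Assumption: bits are numbered from 1, starting at the least significant bit.
    """
    odd = even = 0
    for i in range(key.bit_length()):
        bit = (key >> i) & 1
        if i % 2 == 0:            # 1st, 3rd, 5th, ... bit
            odd |= bit << (i // 2)
        else:                     # 2nd, 4th, 6th, ... bit
            even |= bit << (i // 2)
    return odd, even


def split_at(key: int, k: int, n: int) -> tuple[int, int]:
    """Return (leading k bits, remaining n - k bits) of an n-bit key.

    Assumption: bits are counted from the most significant end.
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if key < 0 or key.bit_length() > n:
        raise ValueError("key does not fit in n bits")
    low_bits = n - k
    return key >> low_bits, key & ((1 << low_bits) - 1)


key = 37348  # binary 1001000111100100
print(split_odd_even(key))
high, low = split_at(key, k=8, n=16)
print(format(high, "08b"), format(low, "08b"))
```

With K = 8 and N = 16, the second call yields the bit strings 10010001 and 11100100 from the example. Note that K = N leaves the second sub-key value element empty, which this sketch returns as 0.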
In this embodiment of the present invention, after the two-dimensional matrix including the row vectors and the column vectors is established, the method further includes: initializing the element determined by any row vector and any column vector of the two-dimensional matrix to a first preset identifier.
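Steps 200 to 220, together with the initialization, can be sketched as follows. This is only an illustrative sketch: the names `TwoDimensionalFilter`, `build_filter`, and `make_hash` are hypothetical, the first preset identifier is assumed to be 0, the matrix is assumed to be square (as in the 9 x 9 matrix of FIG. 2B), and the modular hash family is an assumption chosen only because its outputs are positive integers no greater than the vector length.

```python
from dataclasses import dataclass
from typing import Callable, List

FIRST_PRESET = 0  # assumed value of the "first preset identifier"

HashFn = Callable[[int], int]


def make_hash(a: int, b: int, side: int) -> HashFn:
    """Hypothetical hash: maps a sub-key to a positive integer in 1..side."""
    def h(sub_key: int) -> int:
        return ((a * sub_key + b) % 2_147_483_647) % side + 1
    return h


@dataclass
class TwoDimensionalFilter:
    matrix: List[List[int]]
    hash_functions: List[HashFn]


def build_filter(side: int, num_hashes: int) -> TwoDimensionalFilter:
    if side < 2:
        raise ValueError("need at least two row vectors and two column vectors")
    # Step 200 plus initialization: every cell starts as the first preset identifier.
    matrix = [[FIRST_PRESET] * side for _ in range(side)]
    # Step 210: a set of hash functions whose outputs fit the matrix.
    hash_functions = [make_hash(2 * i + 3, 7 * i + 1, side) for i in range(num_hashes)]
    # Step 220: the filter is the matrix together with the hash function set.
    return TwoDimensionalFilter(matrix, hash_functions)


flt = build_filter(side=9, num_hashes=3)
cells = sum(len(row) for row in flt.matrix)
print(cells)  # 81 storage data units for a 9 x 9 matrix
```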
Embodiment 2
Referring to FIG. 3, in this embodiment of the present invention, a detailed process of querying a key value element by using the two-dimensional filter generated as shown in FIG. 2A is as follows:
Step 300: Determine, from the hash function set, the hash function subset corresponding to each group of key value elements.
Step 310: For any key value element in each group of key value elements, calculate hash values according to the hash function subset corresponding to the key value element group to which the key value element belongs, and set the element at the position of the calculated hash values in the two-dimensional matrix to a second preset identifier.
Step 320: For a key value element to be queried, determine the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, and calculate hash values of the key value element to be queried according to the corresponding hash function subset.
Step 330: Obtain the element at the position of the hash values of the key value element to be queried in the two-dimensional matrix.
Step 340: When the obtained element is the second preset identifier corresponding to the position of the hash values of the key value element to be queried in the two-dimensional matrix, determine that the key value element to be queried belongs to the key value element set represented by the two-dimensional filter.
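Steps 300 to 340 can be sketched as follows. The sketch rests on several assumptions that are not in the original text: the identifiers are 0 and 1, the matrix is square with side `SIDE`, a 16-bit key is split into its high and low bytes, the hash family is a simple modular one, and a key is reported as present only if every one of its positions holds the second preset identifier (the usual Bloom-filter convention). Following the worked example given later in this embodiment, every combination of a first hash value and a second hash value is treated as a position. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

FIRST_PRESET, SECOND_PRESET = 0, 1  # assumed identifier values
SIDE = 64                           # assumed side length of a square matrix

HashFn = Callable[[int], int]


def make_hash(a: int, b: int) -> HashFn:
    """Hypothetical hash: maps a sub-key to an integer in 1..SIDE."""
    return lambda sub_key: ((a * sub_key + b) % 2_147_483_647) % SIDE + 1


def split(key: int) -> Tuple[int, int]:
    """Assumed split of a 16-bit key into its high byte and low byte."""
    return key >> 8, key & 0xFF


def positions(key: int, subset: List[HashFn]) -> List[Tuple[int, int]]:
    first, second = split(key)
    rows = [h(first) for h in subset]    # first hash values
    cols = [h(second) for h in subset]   # second hash values
    return list(product(rows, cols))


def load(matrix: List[List[int]], groups: Dict[str, List[int]],
         subsets: Dict[str, List[HashFn]]) -> None:
    """Steps 300-310: mark the positions of every key of every group."""
    for name, keys in groups.items():
        for key in keys:
            for r, c in positions(key, subsets[name]):
                matrix[r - 1][c - 1] = SECOND_PRESET  # hash values are 1-based


def query(matrix: List[List[int]], key: int, subset: List[HashFn]) -> bool:
    """Steps 320-340: True if all positions hold the second preset identifier."""
    return all(matrix[r - 1][c - 1] == SECOND_PRESET
               for r, c in positions(key, subset))


matrix = [[FIRST_PRESET] * SIDE for _ in range(SIDE)]
hashes = [make_hash(a, b) for a, b in [(3, 7), (5, 11), (7, 13), (11, 17)]]
subsets = {"region": hashes[:2], "month": hashes[2:]}  # one subset per group
groups = {"region": [37348, 1021], "month": [1, 2, 12]}

load(matrix, groups, subsets)
print(query(matrix, 37348, subsets["region"]))
```

As with a Bloom filter, a loaded key is always found, while a key that was never loaded may occasionally be reported as present when all of its positions happen to have been set by other keys.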
If the hash function subsets corresponding to two different types of key value elements are the same, the elements are loaded to the same positions of the two-dimensional filter, so that different key value elements correspond to the same position of the two-dimensional filter and query accuracy is low. To improve query accuracy, in this embodiment of the present invention, the hash function subsets corresponding to any two different key value element groups include different hash functions; or
the hash function subsets corresponding to any two different key value element groups include the same hash functions, but the hash functions are arranged in different orders.
For example, if the first key value element group is a sales table by region and the second key value element group is a sales table by month, the hash function subset corresponding to the first key value element group differs from the hash function subset corresponding to the second key value element group.
In this embodiment of the present invention, the hash values of the key value element to be queried may be calculated in multiple manners. Optionally, the following manner may be used:
based on the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, a first hash value is calculated from the first sub-key value element of the key value element to be queried; and
based on the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, a second hash value is calculated from the second sub-key value element of the key value element to be queried.
Certainly, the first sub-key value element here may also include the key value element composed of all odd-numbered bits of any key value element expressed in binary, and the second sub-key value element may include the key value element composed of all even-numbered bits of that key value element expressed in binary; or
the first sub-key value element includes the key value element composed of the 1st to Kth bits of any key value element expressed in binary, and the second sub-key value element includes the key value element composed of the (K+1)th to Nth bits of that key value element expressed in binary, where N is the number of bits of the key value element expressed in binary, 1≤K≤N, and K is a positive integer.
The specific representation form of the first sub-key value element here is the same as that of the first sub-key value element in Embodiment 1.
In this embodiment of the present invention, the element at the position of the hash values of the key value element to be queried in the two-dimensional matrix may be obtained in multiple manners. Optionally, the following manner may be used:
obtain the element in the two-dimensional matrix whose row is the first hash value and whose column is the second hash value; or obtain the element in the two-dimensional matrix whose row is the second hash value and whose column is the first hash value.
For example, the first sub-key value element of a key value element is 90, the second sub-key value element is 140, and the hash function subset corresponding to the key value element is (h1, h2, h3). The first hash values calculated by (h1, h2, h3) for 90 are 6, 128, and 55 respectively, and the second hash values calculated by (h1, h2, h3) for 140 are 0, 101, and 46 respectively. In this case, the positions of the key value element in the two-dimensional matrix are (6, 0), (6, 101), (6, 46), (128, 0), (128, 101), (128, 46), (55, 0), (55, 101), and (55, 46), and the elements at these positions are all set to the second preset identifier. Alternatively, the positions of the key value element in the two-dimensional matrix are (0, 6), (101, 6), (46, 6), (0, 128), (101, 128), (46, 128), (0, 55), (101, 55), and (46, 55), and the elements at these positions are all set to the second preset identifier.
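The position lists in this example can be enumerated mechanically. The short sketch below takes the six hash values from the example as given (h1, h2, and h3 themselves are not specified in the text, so they are not implemented) and builds the nine positions for each of the two orientations; the variable names are illustrative only.

```python
from itertools import product

first_hashes = [6, 128, 55]    # h1, h2, h3 applied to the first sub-key value element (90)
second_hashes = [0, 101, 46]   # h1, h2, h3 applied to the second sub-key value element (140)

# Orientation 1: the first hash value selects the row, the second the column.
row_first = list(product(first_hashes, second_hashes))

# Orientation 2: the second hash value selects the row, the first the column.
row_second = [(second, first) for first, second in row_first]

print(len(row_first), row_first[:3])
print(len(row_second), row_second[:3])
```

Three hash functions therefore mark 3 x 3 = 9 cells per key value element, whichever orientation is chosen; the same orientation must be used for loading and for querying.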
In Embodiment 2, to query whether a key value element belongs to the key value elements included in multiple key value element groups, the query only needs to be performed on this one two-dimensional filter. It is unnecessary to generate a separate Bloom filter for each key value element group, and the query does not need to be performed on multiple Bloom filters one by one. Therefore, the current defect of low query efficiency is also resolved.
Embodiment 1 and Embodiment 2 are described by using a two-dimensional matrix as an example. Certainly, a multi-dimensional matrix such as a three-dimensional matrix or a four-dimensional matrix may also be used. The process of generating a multi-dimensional matrix is similar to the process of generating a two-dimensional matrix, and the query process based on a multi-dimensional matrix is similar to the query process based on a two-dimensional matrix; details are not described herein.
To better understand the embodiments of the present invention, the following provides a specific application scenario and describes the process of querying a key value element in further detail, as shown in FIG. 4:
Embodiment 3
Step 400: Establish a two-dimensional matrix including three row vectors and three column vectors.
Step 410: Determine a hash function set, and generate a two-dimensional filter including the two-dimensional matrix and the hash function set.
Each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set performs a hash operation on a first sub-key value element of any key value element in the corresponding at least one key value element group to obtain a first hash value, and performs a hash operation on a second sub-key value element of any key value element in the corresponding key value element group to obtain a second hash value; each first hash value is a positive integer less than or equal to the length of the row vector, and each second hash value is a positive integer less than or equal to the length of the column vector.
The hash function set determined in this step includes 10 hash functions: h1, h2, h3, h4, h5, h6, h7, h8, h9, and h10.
Step 420: Initialize the element determined by any row vector and any column vector of the two-dimensional matrix to a first preset identifier.
Step 430: Determine, from the determined hash function set, the hash function subsets respectively corresponding to two key value element groups.
Step 440: For any key value element in the two key value element groups, calculate hash values according to the hash function subset corresponding to the key value element group to which the key value element belongs, and set the element at the position of the calculated hash values in the two-dimensional matrix to a second preset identifier.
Step 450: For a key value element to be queried, determine the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, and calculate hash values of the key value element to be queried according to the corresponding hash function subset.
Step 460: Obtain the element at the position of the hash values of the key value element to be queried in the two-dimensional matrix.
Step 470: Determine whether the obtained element is the second preset identifier corresponding to the position of the hash values of the key value element to be queried in the two-dimensional matrix; if yes, determine that the key value element to be queried belongs to the key value element set represented by the two-dimensional filter; otherwise, determine that the key value element to be queried does not belong to the key value element set represented by the two-dimensional filter.
Based on the foregoing technical solution, referring to FIG. 5, an embodiment of the present invention provides an apparatus for generating a two-dimensional filter. The apparatus includes an establishing unit 50, a determining unit 51, and a generating unit 52, where:
the establishing unit 50 is configured to establish a two-dimensional matrix including at least two row vectors and at least two column vectors;
the determining unit 51 is configured to determine a hash function set, where each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set performs a hash operation on a first sub-key value element of any key value element in the corresponding at least one key value element group to obtain a first hash value, and performs a hash operation on a second sub-key value element of any key value element in the corresponding key value element group to obtain a second hash value; each first hash value is a positive integer less than or equal to the length of the row vector, and each second hash value is a positive integer less than or equal to the length of the column vector; and
the generating unit 52 is configured to generate a two-dimensional filter including the two-dimensional matrix and the hash function set.
In this embodiment of the present invention, optionally, the lengths of both the row vectors and the column vectors included in the two-dimensional matrix generated by the establishing unit 50 are greater than or equal to
Figure PCTCN2015072915-appb-000006
where Sr is the number of all key value elements included in all key value element groups; or Sr is the number of key value elements obtained after all key value elements included in all key value element groups are filtered by a query condition.
In this embodiment of the present invention, optionally, the first sub-key value element obtained by the determining unit 51 includes the key value element composed of all odd-numbered bits of any key value element expressed in binary, and the second sub-key value element obtained by the determining unit 51 includes the key value element composed of all even-numbered bits of that key value element expressed in binary; or
the first sub-key value element obtained by the determining unit 51 includes the key value element composed of the 1st to Kth bits of any key value element expressed in binary, and the second sub-key value element obtained by the determining unit 51 includes the key value element composed of the (K+1)th to Nth bits of that key value element expressed in binary, where N is the number of bits of the key value element expressed in binary, 1≤K≤N, and K is a positive integer.
In this embodiment of the present invention, further, the apparatus includes an initialization unit 53, configured to initialize the element determined by any row vector and any column vector of the two-dimensional matrix to a first preset identifier.
FIG. 6 is a physical apparatus diagram of the apparatus for generating a two-dimensional filter provided by the present invention. The apparatus for generating a two-dimensional filter includes at least one processor 601, a communication bus 602, a memory 603, and at least one communication interface 604.
The communication bus 602 is configured to implement connection and communication between the foregoing components, and the communication interface 604 is configured to connect to and communicate with an external device.
The memory 603 is configured to store program code to be executed. When the processor 601 executes the program code in the memory 603, the following functions are implemented:
establishing a two-dimensional matrix including at least two row vectors and at least two column vectors;
determining a hash function set, where each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set performs a hash operation on a first sub-key value element of any key value element in the corresponding at least one key value element group to obtain a first hash value, and performs a hash operation on a second sub-key value element of any key value element in the corresponding key value element group to obtain a second hash value; each first hash value is a positive integer less than or equal to the length of the row vector, and each second hash value is a positive integer less than or equal to the length of the column vector; and
generating a two-dimensional filter including the two-dimensional matrix and the hash function set.
Based on the foregoing technical solution, referring to FIG. 7, an embodiment of the present invention provides an apparatus for querying a key value element. The apparatus includes a determining unit 70, a setting unit 71, a calculating unit 72, an obtaining unit 73, and a querying unit 74, where:
the determining unit 70 is configured to determine, from the hash function set, the hash function subset corresponding to each group of key value elements;
the setting unit 71 is configured to: for any key value element in each group of key value elements, calculate hash values according to the hash function subset corresponding to the key value element group to which the key value element belongs, and set the element at the position of the calculated hash values in the two-dimensional matrix to a second preset identifier;
the calculating unit 72 is configured to: for a key value element to be queried, determine the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, and calculate hash values of the key value element to be queried according to the corresponding hash function subset;
the obtaining unit 73 is configured to obtain the element at the position of the hash values of the key value element to be queried in the two-dimensional matrix; and
the querying unit 74 is configured to: when the obtained element is the second preset identifier corresponding to the position of the hash values of the key value element to be queried in the two-dimensional matrix, determine that the key value element to be queried belongs to the key value element set represented by the two-dimensional filter.
In this embodiment of the present invention, optionally, the hash function subsets determined by the determining unit 70 for any two different key value element groups include different hash functions; or
the hash function subsets determined by the determining unit 70 for any two different key value element groups include the same hash functions, but the hash functions are arranged in different orders.
In this embodiment of the present invention, optionally, the calculating unit 72 is specifically configured to:
calculate, based on the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, a first hash value from the first sub-key value element of the key value element to be queried; and
calculate, based on the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, a second hash value from the second sub-key value element of the key value element to be queried.
In this embodiment of the present invention, optionally, the obtaining unit 73 is specifically configured to:
obtain the element in the two-dimensional matrix whose row is the first hash value and whose column is the second hash value; or obtain the element in the two-dimensional matrix whose row is the second hash value and whose column is the first hash value.
FIG. 8 is a physical apparatus diagram of the apparatus for querying a key value element provided by the present invention. The apparatus for querying a key value element includes at least one processor 801, a communication bus 802, a memory 803, and at least one communication interface 804.
The communication bus 802 is configured to implement connection and communication between the foregoing components, and the communication interface 804 is configured to connect to and communicate with an external device.
The memory 803 is configured to store program code to be executed. When the processor 801 executes the program code in the memory 803, the following functions are implemented:
determining, from the hash function set, the hash function subset corresponding to each group of key value elements;
for any key value element in each group of key value elements, calculating hash values according to the hash function subset corresponding to the key value element group to which the key value element belongs, and setting the element at the position of the calculated hash values in the two-dimensional matrix to a second preset identifier;
for a key value element to be queried, determining the hash function subset corresponding to the key value element group to which the key value element to be queried belongs, and calculating hash values of the key value element to be queried according to the corresponding hash function subset;
obtaining the element at the position of the hash values of the key value element to be queried in the two-dimensional matrix; and
when the obtained element is the second preset identifier corresponding to the position of the hash values of the key value element to be queried in the two-dimensional matrix, determining that the key value element to be queried belongs to the key value element set represented by the two-dimensional filter.
In summary, in the embodiments of the present invention, one two-dimensional filter includes a two-dimensional matrix, and the two-dimensional matrix can be linked to multiple key value element groups, which improves the flexibility of the filter.
Further, to query whether a key value element belongs to the key value elements included in multiple key value element groups, the query only needs to be performed on this one two-dimensional filter. It is unnecessary to generate a separate Bloom filter for each key value element group, and the query does not need to be performed on multiple Bloom filters one by one. Therefore, the current defect of low query efficiency is also resolved.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present invention. It should be understood that computer program instructions can implement each process and/or block in the flowcharts and/or block diagrams, and any combination of processes and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus. The instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, a person skilled in the art can make additional changes and modifications to these embodiments once the basic inventive concept is known. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, a person skilled in the art can make various changes and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to cover them.

Claims (16)

  1. A method for querying a key value element, comprising:
    determining, from a hash function set, a hash function subset corresponding to each key value element group;
    for any key value element in each key value element group, calculating a hash value according to the hash function subset corresponding to the key value element group to which the key value element belongs, and setting the element at the position in a two-dimensional matrix that corresponds to the calculated hash value to a second preset identifier;
    for a to-be-queried key value element, determining the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs, and calculating a hash value of the to-be-queried key value element according to the corresponding hash function subset;
    obtaining the element at the position in the two-dimensional matrix that corresponds to the hash value of the to-be-queried key value element; and
    when the obtained element is the second preset identifier corresponding to the position of the hash value of the to-be-queried key value element in the two-dimensional matrix, determining that the to-be-queried key value element belongs to the key value element set represented by a two-dimensional filter.
  2. The method according to claim 1, wherein the hash functions included in the hash function subsets respectively corresponding to any two different key value element groups are different; or
    the hash functions included in the hash function subsets respectively corresponding to any two different key value element groups are the same, but the hash functions are arranged in different orders.
  3. The method according to claim 1 or 2, wherein calculating the hash value of the to-be-queried key value element specifically comprises:
    calculating a first hash value for a first sub-key value element of the to-be-queried key value element, based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs; and
    calculating a second hash value for a second sub-key value element of the to-be-queried key value element, based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs.
  4. The method according to claim 3, wherein obtaining the element at the position in the two-dimensional matrix that corresponds to the hash value of the to-be-queried key value element specifically comprises:
    obtaining the element in the two-dimensional matrix whose row is given by the first hash value and whose column is given by the second hash value; or obtaining the element in the two-dimensional matrix whose row is given by the second hash value and whose column is given by the first hash value.
  5. A method for generating a two-dimensional filter, comprising:
    establishing a two-dimensional matrix that includes at least two row vectors and at least two column vectors;
    determining a hash function set, wherein each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set performs a hash operation on a first sub-key value element of any key value element in the corresponding at least one key value element group to obtain a first hash value, and performs a hash operation on a second sub-key value element of that key value element to obtain a second hash value; the first hash value is a positive integer less than or equal to the length of the row vector, and the second hash value is a positive integer less than or equal to the length of the column vector; and
    generating a two-dimensional filter that includes the two-dimensional matrix and the hash function set.
  6. The method according to claim 5, wherein the length of the row vector and the length of the column vector are each greater than or equal to
    Figure PCTCN2015072915-appb-100001
    where Sr is the number of all key value elements included in all the key value element groups; or Sr is the number of key value elements that remain after all key value elements included in all the key value element groups are filtered by a query condition.
  7. The method according to claim 5 or 6, wherein the first sub-key value element includes a key value element formed by all odd-numbered bits of the key value element when it is represented in binary, and the second sub-key value element includes a key value element formed by all even-numbered bits of the key value element when it is represented in binary; or
    the first sub-key value element includes a key value element formed by the 1st to Kth bits of the key value element when it is represented in binary, and the second sub-key value element includes a key value element formed by the (K+1)th to Nth bits of the key value element when it is represented in binary, where N is the number of bits of the key value element when it is represented in binary, 1≤K≤N, and K is a positive integer.
  8. The method according to any one of claims 5 to 7, further comprising:
    initializing the element determined by any row vector and any column vector in the two-dimensional matrix to a first preset identifier.
  9. An apparatus for querying a key value element, comprising:
    a determining unit, configured to determine, from a hash function set, a hash function subset corresponding to each key value element group;
    a setting unit, configured to: for any key value element in each key value element group, calculate a hash value according to the hash function subset corresponding to the key value element group to which the key value element belongs, and set the element at the position in a two-dimensional matrix that corresponds to the calculated hash value to a second preset identifier;
    a calculation unit, configured to: for a to-be-queried key value element, determine the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs, and calculate a hash value of the to-be-queried key value element according to the corresponding hash function subset;
    an obtaining unit, configured to obtain the element at the position in the two-dimensional matrix that corresponds to the hash value of the to-be-queried key value element; and
    a query unit, configured to: when the obtained element is the second preset identifier corresponding to the position of the hash value of the to-be-queried key value element in the two-dimensional matrix, determine that the to-be-queried key value element belongs to the key value element set represented by a two-dimensional filter.
  10. The apparatus according to claim 9, wherein the hash functions included in the hash function subsets that the determining unit determines for any two different key value element groups are different; or
    the hash functions included in the hash function subsets that the determining unit determines for any two different key value element groups are the same, but the hash functions are arranged in different orders.
  11. The apparatus according to claim 9 or 10, wherein the calculation unit is specifically configured to:
    calculate a first hash value for a first sub-key value element of the to-be-queried key value element, based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs; and
    calculate a second hash value for a second sub-key value element of the to-be-queried key value element, based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs.
  12. The apparatus according to claim 11, wherein the obtaining unit is specifically configured to:
    obtain the element in the two-dimensional matrix whose row is given by the first hash value and whose column is given by the second hash value; or obtain the element in the two-dimensional matrix whose row is given by the second hash value and whose column is given by the first hash value.
  13. An apparatus for generating a two-dimensional filter, comprising:
    an establishing unit, configured to establish a two-dimensional matrix that includes at least two row vectors and at least two column vectors;
    a determining unit, configured to determine a hash function set, wherein each hash function in the hash function set corresponds to at least one key value element group; any hash function in the hash function set performs a hash operation on a first sub-key value element of any key value element in the corresponding at least one key value element group to obtain a first hash value, and performs a hash operation on a second sub-key value element of that key value element to obtain a second hash value; the first hash value is a positive integer less than or equal to the length of the row vector, and the second hash value is a positive integer less than or equal to the length of the column vector; and
    a generating unit, configured to generate a two-dimensional filter that includes the two-dimensional matrix and the hash function set.
  14. The apparatus according to claim 13, wherein the length of the row vector and the length of the column vector included in the two-dimensional matrix generated by the establishing unit are each greater than or equal to
    Figure PCTCN2015072915-appb-100002
    where Sr is the number of all key value elements included in all the key value element groups; or Sr is the number of key value elements that remain after all key value elements included in all the key value element groups are filtered by a query condition.
  15. The apparatus according to claim 13 or 14, wherein the first sub-key value element obtained by the determining unit includes a key value element formed by all odd-numbered bits of the key value element when it is represented in binary, and the second sub-key value element obtained by the determining unit includes a key value element formed by all even-numbered bits of the key value element when it is represented in binary; or
    the first sub-key value element obtained by the determining unit includes a key value element formed by the 1st to Kth bits of the key value element when it is represented in binary, and the second sub-key value element obtained by the determining unit includes a key value element formed by the (K+1)th to Nth bits of the key value element when it is represented in binary, where N is the number of bits of the key value element when it is represented in binary, 1≤K≤N, and K is a positive integer.
  16. The apparatus according to any one of claims 13 to 15, further comprising an initialization unit, configured to initialize the element determined by any row vector and any column vector in the two-dimensional matrix to a first preset identifier.
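
The two ways of deriving the sub-key value elements named in claims 7 and 15 (odd-numbered and even-numbered bits, or the first K bits and the remaining bits) can be illustrated with the Python sketch below. It is a hypothetical illustration rather than the applicant's implementation: the function names `split_odd_even` and `split_prefix` are invented here, bit 1 is assumed to be the most significant bit of an N-bit representation, and the sketch requires K < N so that the second sub-key is not empty, which is narrower than the 1≤K≤N range stated in the claims.

```python
def _to_bits(key, n_bits):
    """Return the N-bit binary string of a non-negative integer key."""
    if key < 0 or key >= (1 << n_bits):
        raise ValueError("key does not fit in n_bits")
    return format(key, f"0{n_bits}b")


def split_odd_even(key, n_bits):
    """First sub-key: bits 1, 3, 5, ...; second sub-key: bits 2, 4, 6, ...

    Assumes bit 1 is the most significant bit and n_bits >= 2.
    """
    if n_bits < 2:
        raise ValueError("n_bits must be at least 2")
    bits = _to_bits(key, n_bits)
    first = bits[0::2]   # odd-numbered positions (1-based)
    second = bits[1::2]  # even-numbered positions (1-based)
    return int(first, 2), int(second, 2)


def split_prefix(key, n_bits, k):
    """First sub-key: bits 1..K; second sub-key: bits K+1..N.

    This sketch requires 1 <= k < n_bits so both sub-keys are non-empty.
    """
    if not 1 <= k < n_bits:
        raise ValueError("k must satisfy 1 <= k < n_bits in this sketch")
    bits = _to_bits(key, n_bits)
    return int(bits[:k], 2), int(bits[k:], 2)
```

For example, with the 8-bit key `0b10110100`, `split_odd_even` yields the sub-keys `0b1100` and `0b0110`, and `split_prefix` with K = 3 yields `0b101` and `0b10100`. Each sub-key would then be hashed separately to obtain the row and column indices.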
PCT/CN2015/072915 2014-08-28 2015-02-12 Two-dimensional filter generation method, query method and device WO2016029664A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
RU2017109924A RU2017109924A (en) 2014-08-28 2015-02-12 METHOD AND DEVICE FOR GENERATING TWO-DIMENSIONAL MATRIX AND METHOD AND DEVICE FOR REQUESTING ELEMENT WITH KEY VALUE
JP2017511736A JP2017526081A (en) 2014-08-28 2015-02-12 Two-dimensional filter generation method, query method, and apparatus
EP15836802.7A EP3179382A4 (en) 2014-08-28 2015-02-12 Two-dimensional filter generation method, query method and device
US15/443,997 US20170170968A1 (en) 2014-08-28 2017-02-27 Method and apparatus for generating two-dimensional matrix, and method and apparatus for querying key value element

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410431085.9 2014-08-28
CN201410431085.9A CN104317795A (en) 2014-08-28 2014-08-28 Two-dimensional filter generation method, query method and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/443,997 Continuation US20170170968A1 (en) 2014-08-28 2017-02-27 Method and apparatus for generating two-dimensional matrix, and method and apparatus for querying key value element

Publications (1)

Publication Number Publication Date
WO2016029664A1 true WO2016029664A1 (en) 2016-03-03

Family

ID=52373027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/072915 WO2016029664A1 (en) 2014-08-28 2015-02-12 Two-dimensional filter generation method, query method and device

Country Status (6)

Country Link
US (1) US20170170968A1 (en)
EP (1) EP3179382A4 (en)
JP (1) JP2017526081A (en)
CN (1) CN104317795A (en)
RU (1) RU2017109924A (en)
WO (1) WO2016029664A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932263A (en) * 2017-05-26 2018-12-04 腾讯科技(深圳)有限公司 A kind of affiliated partner method for tracing and device

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317795A (en) * 2014-08-28 2015-01-28 华为技术有限公司 Two-dimensional filter generation method, query method and device
CN106339413A (en) * 2016-08-12 2017-01-18 宁波大学 Approximate membership query method based on high-dimensional data filter
CN108287840B (en) * 2017-01-09 2022-05-03 北京大学 Data storage and query method based on matrix hash
US10803085B1 (en) 2018-12-19 2020-10-13 Airspeed Systems LLC Matched array airspeed and angle of attack alignment system and method
US11010941B1 (en) 2018-12-19 2021-05-18 EffectiveTalent Office LLC Matched array general talent architecture system and method
US10896529B1 (en) 2018-12-19 2021-01-19 EffectiveTalent Office LLC Matched array talent architecture system and method
US11016988B1 (en) 2018-12-19 2021-05-25 Airspeed Systems LLC Matched array flight alignment system and method
US11010940B2 (en) 2018-12-19 2021-05-18 EffectiveTalent Office LLC Matched array alignment system and method
CN112199396B (en) * 2020-10-14 2022-11-11 北京理工大学 Industrial Internet identification query method and system facing MES
US20240061808A1 (en) * 2022-08-16 2024-02-22 Sap Se Low-memory and efficient hashmap

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070130188A1 (en) * 2005-12-07 2007-06-07 Moon Hwa S Data hashing method, data processing method, and data processing system using similarity-based hashing algorithm
CN101398820A (en) * 2007-09-24 2009-04-01 北京启明星辰信息技术有限公司 Large scale key word matching method
CN101567815A (en) * 2009-05-27 2009-10-28 清华大学 Method for effectively detecting and defending domain name server (DNS) amplification attacks
CN104317795A (en) * 2014-08-28 2015-01-28 华为技术有限公司 Two-dimensional filter generation method, query method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8266506B2 (en) * 2009-04-18 2012-09-11 Alcatel Lucent Method and apparatus for multiset membership testing using combinatorial bloom filters
WO2011038899A1 (en) * 2009-09-29 2011-04-07 Nec Europe Ltd. Method and system for probabilistic processing of data
CN101901248B (en) * 2010-04-07 2012-08-15 北京星网锐捷网络技术有限公司 Method and device for creating and updating Bloom filter and searching elements
US20130226972A1 (en) * 2012-02-27 2013-08-29 Ramakumar Kosuru Methods and systems for processing data arrays using bloom filters
JP6028567B2 (en) * 2012-12-28 2016-11-16 富士通株式会社 Data storage program, data search program, data storage device, data search device, data storage method, and data search method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU, WEI ET AL.: "Pattern Matching Engine Based on Multi-Dimensional Bloom Filters", JOURNAL OF COMPUTER APPLICATIONS, vol. 31, no. 1, 31 January 2011 (2011-01-31), pages 107 - 109, XP008184618, ISSN: 1001-9081 *
See also references of EP3179382A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932263A (en) * 2017-05-26 2018-12-04 腾讯科技(深圳)有限公司 A kind of affiliated partner method for tracing and device
CN108932263B (en) * 2017-05-26 2023-01-10 腾讯科技(深圳)有限公司 Associated object tracking method and device

Also Published As

Publication number Publication date
CN104317795A (en) 2015-01-28
EP3179382A4 (en) 2017-08-09
JP2017526081A (en) 2017-09-07
RU2017109924A (en) 2018-09-28
US20170170968A1 (en) 2017-06-15
EP3179382A1 (en) 2017-06-14
RU2017109924A3 (en) 2018-09-28

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 15836802; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2017511736; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
REEP Request for entry into the European phase. Ref document number: 2015836802; Country of ref document: EP
WWE WIPO information: entry into national phase. Ref document number: 2015836802; Country of ref document: EP
ENP Entry into the national phase. Ref document number: 2017109924; Country of ref document: RU; Kind code of ref document: A