CN112069188A - Method for identifying new and old equipment id with high performance - Google Patents

Info

Publication number
CN112069188A
Authority
CN
China
Prior art keywords
new
old
equipment
bitmap
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010971026.6A
Other languages
Chinese (zh)
Inventor
刘琛林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhidemai Technology Co ltd
Original Assignee
Beijing Zhidemai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhidemai Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a high-performance method for identifying new and old device ids, comprising the following steps: preparing a Redis server, initializing a Bitmap, and defining M hash functions; executing k hash functions on a device id to obtain k hash values; taking each hash value modulo the length of the Bitmap, the result being the position of a bit in the Bitmap; querying the values of the k bits with the Redis getbit command, where the device id belongs to an old device if all returned values are 1 and to a new device if any returned value is 0; and, for a new device id, setting its corresponding bits to 1 with the setbit command, so that the device id is judged to be an old device in later queries. The invention improves judgment accuracy by increasing the number of bits and the number of hash functions, and can reduce the misjudgment rate to one in a million.

Description

Method for identifying new and old equipment id with high performance
Technical Field
The invention relates to the technical field of computer networks, in particular to a high-performance method for identifying new and old equipment ids.
Background
With the rapid spread and fast iteration of mobile phones, the number of mobile devices keeps growing. Every modern app needs to obtain the user's device id, for example to tell whether a user is new and for big data analysis, in order to serve users better. As a company grows and its user base increases, so does the number of device ids. Once that number reaches the order of hundreds of millions and keeps growing ever faster, the traditional approach of querying a database becomes hard to sustain, and both storage capacity and query efficiency are put to the test. A more efficient and more space-saving scheme is then needed to cope with the continuous growth of device ids and to better meet the needs of business development.
At present, whether a device id belongs to a new device is determined by querying a database: if the device id is not found, it is treated as a new device and written to the database, so that it is judged to be an old device from then on. The device id column of the relational database table is indexed to improve query efficiency.
CN111191120A discloses a method and apparatus for matching device information. The method includes: reading the device information to be matched, which comprises a search field area; determining, according to the search field area, the matching unit to which the device information belongs, where the matching unit pre-stores device information in the same partition as the search field area; and judging whether the matching unit contains the device information to be matched, so as to determine its matching node. That method addresses the technical problem in the related art that matching against a massive device ID library with a join function is slow.
In the database approach described above, each device id produces one row in a relational database table. When the number of devices reaches the hundreds of millions, query and write efficiency drops and a large amount of storage space is consumed. Moreover, the query performance of a relational database is limited, so it cannot cope with high-concurrency, low-latency query requests.
Disclosure of Invention
The embodiment of the invention provides a method for identifying new and old equipment ids with high performance. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
According to the embodiment of the invention, a method for identifying new and old device ids with high performance is provided, which comprises the following steps:
S1: preparing a Redis server and initializing a Bitmap, where the underlying type of the Bitmap is a string, and defining M hash functions;
S2: for a device id, executing k hash functions to obtain k hash values;
S3: taking each hash value modulo the length of the Bitmap to obtain k values, which are the positions of the bits in the Bitmap;
S4: querying the values of the k bits with the Redis getbit command; if all returned values are 1, the device id belongs to an old device, and if any returned value is 0, the device id belongs to a new device;
S5: for a new device id, setting its corresponding bits to 1 with the setbit command, so that the device id is judged to be an old device in later queries.
Preferably, the memory of the Redis server is greater than 1 GB.
Preferably, the Bitmap is a string with a length of 40960000 bits.
Preferably, the value of M ranges from 3 to 10.
Preferably, the hash function is a 64-bit function.
Preferably, k is less than M.
Preferably, when the values of the k bits are queried, multiple hash functions are introduced in the manner of a bloom filter to reduce the false positive rate.
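Steps S2 and S3 can be sketched as follows. This is a minimal illustration in Python, not the implementation described in the source: the function name `bit_positions` is hypothetical, and salted BLAKE2b digests truncated to 64 bits are an assumption standing in for the k hash functions, which the text does not specify at this point.

```python
import hashlib
from typing import List


def bit_positions(device_id: str, k: int, bitmap_bits: int) -> List[int]:
    """Return the k bit positions for a device id (steps S2 and S3)."""
    positions = []
    for i in range(k):
        # Stand-in for the i-th hash function (an assumption): BLAKE2b salted
        # with the index i, truncated to 8 bytes, i.e. a 64-bit hash value.
        digest = hashlib.blake2b(
            device_id.encode("utf-8"),
            digest_size=8,
            salt=i.to_bytes(8, "little"),
        ).digest()
        hash_value = int.from_bytes(digest, "big")
        # S3: the hash value modulo the bitmap length is the bit position.
        positions.append(hash_value % bitmap_bits)
    return positions


if __name__ == "__main__":
    print(bit_positions("device-123", k=8, bitmap_bits=40_960_000))
```

The same device id always yields the same positions, which is what allows a later query to find the bits that an earlier write set.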
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
(1) The invention builds a high-performance filter from the cache component Redis and the principle of the bloom filter; the time complexity of a single judgment is only O(1). Redis supports a relatively rich set of value types, which makes up for the limitations of key-value stores such as memcached, and a single server can handle on the order of ten thousand requests per second.
(2) The business logic is implemented in Golang, which provides high concurrency and availability.
(3) Storing data in a bitmap reduces the space needed for hundreds of millions of elements: an ordinary database needs roughly 10 GB or more, while this method needs only 512 MB.
(4) The invention improves judgment accuracy by increasing the number of bits and the number of hash functions, and can reduce the misjudgment rate to one in a million.
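The memory figure in item (3) can be checked with simple arithmetic. The sketch below (the helper name `bitmap_megabytes` is hypothetical) converts a bit count to megabytes of 10^6 bytes. By this arithmetic a bitmap of 40960000 bits occupies about 5.12 MB, whereas 512 MB corresponds to roughly 4.096 billion bits, so it seems worth confirming which bitmap length a given deployment intends. As far as the Redis documentation states, 512 MB is also the maximum size of a Redis string value.

```python
def bitmap_megabytes(num_bits: int) -> float:
    """Memory needed for a bitmap of num_bits bits, in MB (10**6 bytes)."""
    return num_bits / 8 / 1_000_000


if __name__ == "__main__":
    print(bitmap_megabytes(40_960_000))     # 5.12
    print(bitmap_megabytes(4_096_000_000))  # 512.0
```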
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a method for high performance identification of new and old device ids in accordance with an exemplary embodiment;
FIG. 2 is a schematic diagram of a bloom filter shown in accordance with an exemplary embodiment;
FIG. 3 is an illustration of a use of a bloom filter query in conjunction with a redis bitmap shown in accordance with an exemplary embodiment;
FIG. 4 is a graph illustrating accuracy versus number of elements in accordance with an exemplary embodiment;
FIG. 5 is a graph illustrating accuracy versus number of bits in accordance with an exemplary embodiment;
FIG. 6 is a graph illustrating accuracy versus number of hash functions in accordance with an illustrative embodiment.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. The scope of embodiments of the invention encompasses the full ambit of the claims, as well as all available equivalents of the claims. Embodiments may be referred to herein, individually or collectively, by the term "invention" merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed. Herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed. The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the structures, products and the like disclosed by the embodiments, the description is relatively simple because the structures, the products and the like correspond to the parts disclosed by the embodiments, and the relevant parts can be just described by referring to the method part.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
The invention is further described with reference to the following figures and examples:
As shown in FIG. 1, a high-performance method for identifying new and old device ids includes:
S1: preparing a Redis server and initializing a Bitmap, where the underlying type of the Bitmap is a string, and defining M hash functions;
S2: for a device id, executing k hash functions to obtain k hash values;
S3: taking each hash value modulo the length of the Bitmap to obtain k values, which are the positions of the bits in the Bitmap;
S4: querying the values of the k bits with the Redis getbit command; if all returned values are 1, the device id belongs to an old device, and if any returned value is 0, the device id belongs to a new device;
S5: for a new device id, setting its corresponding bits to 1 with the setbit command, so that the device id is judged to be an old device in later queries.
According to the above scheme, further, the memory of the Redis server is larger than 1 GB; the Bitmap is a string with a length of 40960000 bits; the value of M ranges from 3 to 10; and the hash function is a 64-bit function.
According to the above scheme, further, k is less than M.
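Steps S4 and S5 can be sketched against any object that offers `getbit(name, offset)` and `setbit(name, offset, value)` methods, which is the shape of the corresponding calls in the redis-py client. In this illustration the class `InMemoryBits`, the function `check_and_mark` and the key name `device_ids` are all hypothetical; the in-memory class is only a stand-in so that the sketch runs without a Redis server.

```python
from typing import Iterable, Protocol


class BitStore(Protocol):
    """Anything with GETBIT/SETBIT-style methods, e.g. a Redis client."""

    def getbit(self, name: str, offset: int) -> int: ...

    def setbit(self, name: str, offset: int, value: int) -> int: ...


class InMemoryBits:
    """Stand-in for Redis: remembers which (key, offset) bits are 1."""

    def __init__(self) -> None:
        self._ones = set()

    def getbit(self, name: str, offset: int) -> int:
        return 1 if (name, offset) in self._ones else 0

    def setbit(self, name: str, offset: int, value: int) -> int:
        previous = self.getbit(name, offset)
        if value:
            self._ones.add((name, offset))
        else:
            self._ones.discard((name, offset))
        return previous  # like SETBIT, return the bit's previous value


def check_and_mark(store: BitStore, key: str, positions: Iterable[int]) -> str:
    """Return "old" or "new" for a device id given its k bit positions."""
    positions = list(positions)
    # S4: the id is old only if every one of its bits is already 1.
    is_old = all(store.getbit(key, p) == 1 for p in positions)
    if not is_old:
        # S5: mark the new id so that later queries report it as old.
        for p in positions:
            store.setbit(key, p, 1)
    return "old" if is_old else "new"


if __name__ == "__main__":
    store = InMemoryBits()
    print(check_and_mark(store, "device_ids", [1000, 2000, 3000]))  # new
    print(check_and_mark(store, "device_ids", [1000, 2000, 3000]))  # old
```

Note that the read in S4 and the writes in S5 are separate commands here, so two concurrent requests for the same unseen device id could both be answered "new"; whether that matters depends on the service.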
According to the above scheme, further, the judgment in this scenario has only two possible results (yes or no), so the stored data can be represented entirely by bits. A key can be computed from the data itself with a hash function; the key is a position, and the value at that position is 0 or 1 (because there are only two states). Because of the limitations of a single hash function, however, several different strings are likely to hash to the same value. To solve this problem, k bits are queried instead of one, and multiple hash functions are introduced in the manner of a bloom filter to reduce the false positive rate.
As shown in FIG. 2, a set contains three elements x, y and z, each of which is mapped to several bits of the bitmap by 3 hash functions. To determine whether w is in the set, w is mapped with the same three hash functions; if the resulting bits are not all 1, w is not in the set.
Bloom filters are highly efficient (both writing and judging are O(1), and very little storage space is required), but they have the drawback of false positives. As more elements are added to the set, more and more bits in the binary sequence become 1, and it becomes easier to misjudge whether a string is in the set: a string that was never added may be judged to be in the set.
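The situation in FIG. 2 and the misjudgment described above can be reproduced with a toy bitmap. The bit positions below are invented for illustration (they are not taken from the figure), and the extra element v is hypothetical: it is never added, yet all three of its bits happen to be set by other elements.

```python
# An 18-bit toy bitmap, all zeros at the start.
bitmap = [0] * 18

# Hypothetical positions that three hash functions produce for each element.
positions = {
    "x": (1, 5, 13),
    "y": (4, 11, 16),
    "z": (3, 5, 11),
    "w": (4, 13, 15),  # bit 15 is never set by x, y or z
    "v": (1, 4, 11),   # every bit is set by some other element
}

# Add x, y and z to the set by setting their bits to 1.
for element in ("x", "y", "z"):
    for p in positions[element]:
        bitmap[p] = 1


def maybe_in_set(element: str) -> bool:
    """True if all of the element's bits are 1 (it may be in the set)."""
    return all(bitmap[p] == 1 for p in positions[element])


if __name__ == "__main__":
    print(maybe_in_set("x"))  # True: x was added
    print(maybe_in_set("w"))  # False: w is definitely not in the set
    print(maybe_in_set("v"))  # True: a false positive, v was never added
```

A "no" answer is always correct, while a "yes" answer only means "probably"; in the device id setting this means a genuinely new device can occasionally be reported as old, but never the reverse.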
The misjudgment rate is known to depend on three factors: the number of elements n, the number of bits m, and the number of hash functions k. To reduce the misjudgment rate, a reasonable balance among the three must be found. FIGS. 4, 5 and 6 show how accuracy varies with each of the three values.
Therefore, in practice, as shown in the following table, storing 200 million elements with 40960000 bits and 8 hash functions meets the service requirement, with a misjudgment rate of about one in a million, which is almost negligible.
[Table: supplied as images in the original publication (BDA0002684037080000051, BDA0002684037080000061)]
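The dependence of the misjudgment rate on n, m and k is commonly estimated with the textbook bloom filter approximation p ≈ (1 - e^(-kn/m))^k, with k ≈ (m/n)·ln 2 as the number of hash functions that minimizes it. The source does not state this formula; the sketch below assumes independent, uniformly distributed hash functions, and the function names are hypothetical.

```python
import math


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Approximate bloom filter false positive rate.

    n: number of stored elements, m: number of bits, k: number of hash
    functions. Assumes independent, uniformly distributed hash functions.
    """
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(n: int, m: int) -> int:
    """Number of hash functions that minimizes the approximation above."""
    return max(1, round(m / n * math.log(2)))


if __name__ == "__main__":
    # 1,000 elements in 10,000 bits with 7 hash functions.
    print(false_positive_rate(1_000, 10_000, 7))
    print(optimal_k(1_000, 10_000))
```

Plugging in the element count, bit count and number of hash functions of an actual deployment gives an estimate that can be compared with the measured misjudgment rate.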
Specific examples are given below:
1. Creating the bitmap
Allocate a Bitmap of length m, which corresponds to a key of string type in Redis. Storing bits takes very little memory: assuming 40960000 bits are used, the occupied memory is only 512MB.
2. Choosing hash functions
Examples include BKDRHash, JSHash and RSHash. These are existing hash functions and can be used directly.
3. Writing data
The content to be judged is passed through these hash functions to obtain several values. For example, with 3 hash functions the values obtained might be 1000, 2000 and 3000; the 1000th, 2000th and 3000th bits of the m-bit array are then set to 1.
4. Judging
It can then be determined whether a new piece of content is in the set. The judgment flow follows the write flow: the same hash functions give the same positions, but the bits are read rather than set.
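The four steps above can be sketched end to end with the three hash functions named in step 2. The versions below follow the widely circulated general-purpose definitions of BKDRHash, JSHash and RSHash, masked to 64 bits; the `Bitmap` class and the `write` and `judge` helpers are hypothetical, and a local byte array stands in for the Redis string.

```python
MASK64 = (1 << 64) - 1  # keep hash values within 64 bits


def bkdr_hash(text: str, seed: int = 131) -> int:
    h = 0
    for byte in text.encode("utf-8"):
        h = (h * seed + byte) & MASK64
    return h


def js_hash(text: str) -> int:
    h = 1315423911
    for byte in text.encode("utf-8"):
        h ^= ((h << 5) + byte + (h >> 2)) & MASK64
    return h


def rs_hash(text: str) -> int:
    a, b, h = 63689, 378551, 0
    for byte in text.encode("utf-8"):
        h = (h * a + byte) & MASK64
        a = (a * b) & MASK64
    return h


HASHES = (bkdr_hash, js_hash, rs_hash)


class Bitmap:
    """Step 1: an m-bit array, a local stand-in for the Redis string."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.data = bytearray((num_bits + 7) // 8)

    def set(self, pos: int) -> None:
        self.data[pos >> 3] |= 0x80 >> (pos & 7)

    def get(self, pos: int) -> int:
        return (self.data[pos >> 3] >> (7 - (pos & 7))) & 1


def write(bitmap: Bitmap, content: str) -> None:
    """Step 3: set the bit selected by each hash function."""
    for h in HASHES:
        bitmap.set(h(content) % bitmap.num_bits)


def judge(bitmap: Bitmap, content: str) -> bool:
    """Step 4: the content may be in the set only if all its bits are 1."""
    return all(bitmap.get(h(content) % bitmap.num_bits) for h in HASHES)


if __name__ == "__main__":
    bm = Bitmap(40_960_000)
    print(judge(bm, "device-a"))  # False: nothing written yet
    write(bm, "device-a")
    print(judge(bm, "device-a"))  # True
```

These simple hash functions are fast, but they were not designed to be independent of one another, so the false positive rate observed with them may differ from theoretical estimates.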
It is to be understood that the present invention is not limited to the procedures and structures described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (7)

1. A method for identifying new and old device ids with high performance, characterized by comprising:
S1: preparing a Redis server and initializing a Bitmap, where the underlying type of the Bitmap is a string, and defining M hash functions;
S2: for a device id, executing k hash functions to obtain k hash values;
S3: taking each hash value modulo the length of the Bitmap to obtain k values, which are the positions of the bits in the Bitmap;
S4: querying the values of the k bits with the Redis getbit command; if all returned values are 1, the device id belongs to an old device, and if any returned value is 0, the device id belongs to a new device;
S5: for a new device id, setting its corresponding bits to 1 with the setbit command, so that the device id is judged to be an old device in later queries.
2. The method for identifying new and old device ids with high performance according to claim 1, wherein the memory of the Redis server is greater than 1G.
3. The method for identifying old and new device ids with high performance as claimed in claim 1, wherein said Bitmap is string with length of 40960000 bits.
4. The method for identifying the old and new device ids in high performance according to claim 1, wherein the value range of M is 3-10.
5. The method for high-performance identification of new and old device ids according to claim 1, wherein said hash function is a 64-bit function.
6. The method for identifying the old and new device ids in high performance according to claim 1, wherein the value range of k is smaller than M.
7. The method for high-performance identification of new and old device ids according to claim 1, wherein said querying the return values of k Bit bits uses a bloom filter to introduce multiple hash functions to reduce false positive rate.
CN202010971026.6A 2020-09-15 2020-09-15 Method for identifying new and old equipment id with high performance Pending CN112069188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010971026.6A CN112069188A (en) 2020-09-15 2020-09-15 Method for identifying new and old equipment id with high performance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010971026.6A CN112069188A (en) 2020-09-15 2020-09-15 Method for identifying new and old equipment id with high performance

Publications (1)

Publication Number Publication Date
CN112069188A true CN112069188A (en) 2020-12-11

Family

ID=73695819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010971026.6A Pending CN112069188A (en) 2020-09-15 2020-09-15 Method for identifying new and old equipment id with high performance

Country Status (1)

Country Link
CN (1) CN112069188A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130226941A1 (en) * 2012-02-28 2013-08-29 Ramakumar Kosuru System and method for classifying signals using the bloom filter
CN108804242A (en) * 2018-05-23 2018-11-13 武汉斗鱼网络科技有限公司 A kind of data counts De-weight method, system, server and storage medium
CN109597834A (en) * 2018-10-22 2019-04-09 平安科技(深圳)有限公司 Mass data storage means, device, medium and equipment based on redis
CN111198880A (en) * 2019-12-20 2020-05-26 北京淇瑀信息科技有限公司 Data storage method and device based on redis and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination