CN112069188A

CN112069188A - Method for identifying new and old equipment id with high performance

Info

Publication number: CN112069188A
Application number: CN202010971026.6A
Authority: CN
Inventors: 刘琛林
Original assignee: Beijing Zhidemai Technology Co ltd
Current assignee: Beijing Zhidemai Technology Co ltd
Priority date: 2020-09-15
Filing date: 2020-09-15
Publication date: 2020-12-11

Abstract

The invention discloses a high-performance method for identifying new and old device ids, comprising the following steps: preparing a Redis server, initializing a Bitmap, and defining M hash functions; for a device id, executing k hash functions to obtain k hash values; taking each hash value modulo the length of the Bitmap, the result being the position of a bit in the Bitmap; querying the values of the k bits with the Redis getbit command, where all return values being 1 indicates that the device id is an old device, and any return value being 0 indicates that the device id is a new device; and, for a new device id, setting the corresponding bits to 1 with the setbit command, so that the device id is identified as an old device in subsequent queries. The invention improves the accuracy of the judgment by increasing the number of bits and the number of hash functions, and can reduce the false positive rate to one in a million.

Description

Method for identifying new and old equipment id with high performance

Technical Field

The invention relates to the technical field of computer networks, in particular to a high-performance method for identifying new and old equipment ids.

Background

With the rapid popularization and fast iteration of mobile phones, the number of mobile devices keeps growing. A modern app needs to obtain the device id of a user, for example to determine whether the user is new or to support big data analysis, so as to provide better service. As a company develops and its user base grows, the number of device ids increases. Once the number reaches the order of hundreds of millions and continues to grow ever faster, the traditional approach of querying a database becomes difficult to sustain, and both storage capacity and query efficiency are put to the test. A more efficient and more space-saving scheme is then needed to cope with the continuous growth of device ids and to better meet the requirements of service development.

At present, whether a device id belongs to a new device is determined by querying a database: if the device id is not found in the database, it is considered a new device and is written into the database, so that it is judged to be an old device afterwards. The device id column of the relational database table is indexed to improve query efficiency.

CN111191120A discloses a method and apparatus for matching device information, wherein the method includes: reading the device information to be matched, which comprises a search field area; determining, according to the search field area, the matching unit to which the device information to be matched belongs, wherein the matching unit prestores device information in the same partition as the search field area; and judging whether the matching unit contains the device information to be matched, so as to determine the matching node of that device information. The method and apparatus address the technical problem in the related art that matching against a massive device ID library with a join function is slow.

In the above approaches, each device id generates one row in a relational database table. When the number of devices reaches the scale of hundreds of millions, query and write efficiency drops and a large amount of storage space is occupied. In addition, the query performance of a relational database is limited, and it cannot cope with high-concurrency, low-latency query requests.

Disclosure of Invention

The embodiment of the invention provides a method for identifying new and old equipment ids with high performance. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.

According to the embodiment of the invention, a method for identifying new and old device ids with high performance is provided, which comprises the following steps:

S1: preparing a Redis server and initializing a Bitmap, wherein the underlying type of the Bitmap is a string, and defining M hash functions;

S2: for one device id, respectively executing k hash functions to obtain k hash values;

S3: taking each hash value modulo the length of the Bitmap to obtain k values, which are the positions of the bits in the Bitmap;

S4: querying the values of the k bits with the Redis getbit command, wherein if all the return values are 1, the device id is an old device, and if any return value is 0, the device id is a new device;

S5: for a new device id, setting the corresponding bits to 1 with the setbit command, so that the device id is identified as an old device in subsequent queries.
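The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a Python bytearray stands in for the Redis bitmap (the `getbit` and `setbit` methods only mimic the Redis GETBIT and SETBIT commands), the k hash functions are emulated with salted 64-bit BLAKE2b digests, and the class and method names are hypothetical. Python is used for brevity only.

```python
import hashlib


class DeviceIdFilter:
    """In-memory stand-in for the Redis bitmap described in steps S1 to S5."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits        # length of the bitmap (S1)
        self.num_hashes = num_hashes    # k hash functions per device id (S2)
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, device_id: str) -> list:
        # S2 + S3: k salted 64-bit hashes, each reduced modulo the bitmap length.
        data = device_id.encode("utf-8")
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=i.to_bytes(2, "big")
            ).digest()
            positions.append(int.from_bytes(digest, "big") % self.num_bits)
        return positions

    def getbit(self, offset: int) -> int:
        # Same bit order as Redis GETBIT: offset 0 is the top bit of byte 0.
        return (self.bits[offset >> 3] >> (7 - (offset & 7))) & 1

    def setbit(self, offset: int, value: int) -> None:
        mask = 1 << (7 - (offset & 7))
        if value:
            self.bits[offset >> 3] |= mask
        else:
            self.bits[offset >> 3] &= ~mask & 0xFF

    def is_new_device(self, device_id: str) -> bool:
        """Return True for a new id (and record it), False for an old one."""
        positions = self._positions(device_id)
        if all(self.getbit(p) for p in positions):  # S4: all bits are 1 -> old
            return False
        for p in positions:                         # S5: mark the new id
            self.setbit(p, 1)
        return True
```

Calling `is_new_device` twice with the same id returns True the first time and False the second time, which matches the behavior described in S4 and S5.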

Preferably, the memory of the Redis server is greater than 1 GB.

Preferably, the Bitmap is a string with a length of 40960000 bits.

Preferably, M is in the range of 3 to 10.

Preferably, the hash function is a 64-bit function.

Preferably, k is less than M.

Preferably, when the return values of the k bits are queried, a Bloom filter is adopted, introducing a plurality of hash functions to reduce the false positive rate.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

(1) The invention implements a high-performance filter based on the cache component Redis and the principle of the Bloom filter; the time complexity of a single judgment is only O(1). Redis supports a relatively rich set of value types, which makes up for the shortcomings of plain key-value stores such as memcached, and a single server can process on the order of ten thousand requests per second.

(2) The invention implements the service logic in the Golang language, achieving higher concurrency and availability.

(3) The invention reduces the space needed to store hundreds of millions of elements by using bitmap storage: an ordinary database needs roughly more than 10GB, whereas this method needs only 512MB.

(4) The invention improves the accuracy of the judgment by increasing the number of bits and the number of hash functions, and can reduce the false positive rate to one in a million.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a flow diagram illustrating a method for high performance identification of new and old device ids in accordance with an exemplary embodiment;

FIG. 2 is a schematic diagram of a bloom filter shown in accordance with an exemplary embodiment;

FIG. 3 is an illustration of a use of a bloom filter query in conjunction with a redis bitmap shown in accordance with an exemplary embodiment;

FIG. 4 is a graph illustrating accuracy versus number of elements in accordance with an exemplary embodiment;

FIG. 5 is a graph illustrating accuracy versus number of bits in accordance with an exemplary embodiment;

FIG. 6 is a graph illustrating accuracy versus number of hash functions in accordance with an exemplary embodiment.

Detailed Description

The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. The scope of embodiments of the invention encompasses the full ambit of the claims, as well as all available equivalents of the claims. Embodiments may be referred to herein, individually or collectively, by the term "invention" merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed. Herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed. The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the structures, products and the like disclosed by the embodiments, the description is relatively simple because the structures, the products and the like correspond to the parts disclosed by the embodiments, and the relevant parts can be just described by referring to the method part.

It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

The invention is further described with reference to the following figures and examples:

A high-performance method for identifying new and old device ids, as shown in Fig. 1, includes:

S2: for one device id, respectively executing k hash functions to obtain k hash values;

According to the above scheme, further, the memory of the Redis server is greater than 1 GB; the Bitmap is a string with a length of 40960000 bits; M is in the range of 3 to 10; and the hash function is a 64-bit function.

According to the above scheme, further, k is smaller than M.

According to the above scheme, further, the judgment in this scenario has only two possible results (yes or no), so the stored data can be represented entirely with bits. A key is computed from the data itself by a hash function; the key is a position in the bitmap, and the value at that position is 0 or 1 (because there are only two states). With a single hash function, however, several different strings may hash to the same value. To mitigate this problem, the k bits are queried and, following the Bloom filter, multiple hash functions are introduced to reduce the false positive rate.

As shown in Fig. 2, a set contains three elements x, y and z, each of which is mapped to bits of the bitmap by 3 hash functions. To determine whether w is in the set, w is mapped with the same three hash functions; because the resulting bits are not all 1, w is not in the set.
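The membership test described for Fig. 2 can be illustrated with a toy example. The bit positions below are hypothetical values chosen by hand to stand in for the outputs of three hash functions on an 18-bit bitmap; they are not taken from the figure.

```python
# Hypothetical bit positions standing in for three hash functions.
POSITIONS = {
    "x": (1, 5, 13),
    "y": (4, 11, 16),
    "z": (3, 5, 11),
    "w": (4, 13, 15),
}

# Insert x, y and z by setting their three bits each.
bitmap = [0] * 18
for element in ("x", "y", "z"):
    for pos in POSITIONS[element]:
        bitmap[pos] = 1


def maybe_in_set(element: str) -> bool:
    """True only if every bit for the element is 1 (may be a false positive)."""
    return all(bitmap[pos] == 1 for pos in POSITIONS[element])


# Bits 4 and 13 were set by y and x, but bit 15 is still 0,
# so w is definitely not in the set.
w_bits = [bitmap[pos] for pos in POSITIONS["w"]]
```

A single 0 among the queried bits is enough to rule an element out, whereas all 1s only means the element is probably present.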

Although Bloom filters are highly efficient (writing and judging are both O(1), and very little storage space is required), they have the disadvantage of false positives. As more elements are added to the set, more bits in the binary sequence become 1, and it becomes easier to misjudge whether a string is in the set: a string that is not in the set may be judged to be in it.

The false positive rate is influenced by three factors: the number of elements n, the number of bits m, and the number of hash functions k. To reduce the false positive rate, a reasonable balance among the three must be found. Figs. 4, 5 and 6 show how the accuracy varies with each of the three values.
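For reference, the dependence on the three factors is commonly approximated as follows for a standard Bloom filter. The formula is textbook background rather than part of the original disclosure; p denotes the false positive rate and k_opt the number of hash functions that minimises p for given n and m.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \frac{m}{n}\,\ln 2
```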

Therefore, in practice, as shown in the following table, storing 200 million elements using 40960000 bits and 8 hash functions meets the service requirement, with a false positive rate of about one in a million, which is almost negligible.
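A parameter choice of this kind can be sanity-checked with the usual Bloom filter approximation p ≈ (1 − e^(−kn/m))^k. The sketch below is an assumption-laden calculator, not a measurement: the function names are hypothetical, the approximation assumes independent and uniform hash functions, and the second bit count in the loop is an illustrative alternative (the number of bits in 512 MB), not a value stated in this paragraph.

```python
import math


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Approximate Bloom filter false positive rate: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_hash_count(n: int, m: int) -> float:
    """Hash count that minimises the approximation above: (m / n) * ln 2."""
    return (m / n) * math.log(2)


if __name__ == "__main__":
    n = 200_000_000  # stored elements, as quoted above
    k = 8            # hash functions, as quoted above
    # First value as quoted; second is an assumed alternative for comparison.
    for m in (40_960_000, 4_096_000_000):
        p = false_positive_rate(n, m, k)
        print(f"m={m}: p ~ {p:.3g}, k_opt ~ {optimal_hash_count(n, m):.1f}")
```

Because the estimate is very sensitive to the ratio m/n, it is worth running the calculator with the exact bit count and element count intended for deployment.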

Specific examples are given below:

1. Creating the bitmap

Allocate a Bitmap of length m, corresponding to a key of string type in Redis. Bit storage has a very low memory footprint: assuming 40960000 bits are used, the occupied memory is only 512MB.
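As a unit check, the memory needed for a bitmap is simply its bit count divided by 8. The helper names below are hypothetical, and 1 MB is taken as 10^6 bytes. On that basis 40960000 bits is 5,120,000 bytes, while 512 MB corresponds to 4,096,000,000 bits, so the two figures quoted above differ by a factor of 100; the sketch therefore leaves the bit count as a parameter.

```python
def bitmap_bytes(num_bits: int) -> int:
    """Bytes needed to hold num_bits bits, rounded up to a whole byte."""
    return (num_bits + 7) // 8


def bits_for_megabytes(megabytes: int) -> int:
    """Bits available in a bitmap of the given size (1 MB = 10**6 bytes)."""
    return megabytes * 10**6 * 8


quoted_bits = 40_960_000
quoted_bytes = bitmap_bytes(quoted_bits)   # size of the quoted bit count
bits_in_512_mb = bits_for_megabytes(512)   # bit count matching 512 MB
```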

2. Finding a hash function

Examples include BKDRHash, JSHash and RSHash. These hash functions are readily available and can be used directly.
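For illustration, the three named functions can be written as below, following their widely circulated general-purpose string hash definitions. The classic versions operate on 32-bit unsigned integers; masking to 64 bits here is an assumption made to match the 64-bit preference stated earlier, and the Python function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit width; classic versions use 32 bits


def bkdr_hash(data: bytes, seed: int = 131) -> int:
    """BKDRHash: polynomial hash with a seed such as 31, 131, 1313, ..."""
    h = 0
    for byte in data:
        h = (h * seed + byte) & MASK64
    return h


def js_hash(data: bytes) -> int:
    """JSHash: shift-add-xor hash."""
    h = 1315423911
    for byte in data:
        h ^= ((h << 5) + byte + (h >> 2)) & MASK64
    return h


def rs_hash(data: bytes) -> int:
    """RSHash: multiplicative hash whose multiplier changes at every step."""
    a, b = 63689, 378551
    h = 0
    for byte in data:
        h = (h * a + byte) & MASK64
        a = (a * b) & MASK64
    return h


def bit_positions(device_id: str, num_bits: int) -> list:
    """Map one device id to three bitmap positions, one per hash function."""
    data = device_id.encode("utf-8")
    return [fn(data) % num_bits for fn in (bkdr_hash, js_hash, rs_hash)]
```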

3. Writing data

The content to be judged is passed through these hash functions to obtain several values; for example, with 3 hash functions the obtained values might be 1000, 2000 and 3000. Bits 1000, 2000 and 3000 of the m-bit array are then set to 1.

4. Judgment

It can then be determined whether new content is in the collection. The judgment flow follows the same hashing steps as the writing flow, except that the corresponding bits are read rather than set.
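The writing and judgment flows can be sketched against a Redis-style client as follows. The functions only assume an object exposing `getbit(key, offset)` and `setbit(key, offset, value)`, which is the shape of those methods in the redis-py client. Everything else is hypothetical: the key name, the BKDR-style seeds standing in for three hash functions, and the `FakeBitmapClient` class, which exists only so that the sketch runs without a server.

```python
HASH_SEEDS = (31, 131, 1313)  # hypothetical seeds, one per hash function
MASK64 = (1 << 64) - 1


def bit_offsets(device_id: str, num_bits: int) -> list:
    """Hash the id once per seed and reduce each hash modulo the bitmap length."""
    offsets = []
    for seed in HASH_SEEDS:
        h = 0
        for byte in device_id.encode("utf-8"):
            h = (h * seed + byte) & MASK64
        offsets.append(h % num_bits)
    return offsets


def record_device(client, key: str, device_id: str, num_bits: int) -> None:
    """Writing flow: set every computed bit to 1."""
    for offset in bit_offsets(device_id, num_bits):
        client.setbit(key, offset, 1)


def is_old_device(client, key: str, device_id: str, num_bits: int) -> bool:
    """Judgment flow: the id counts as old only if every computed bit is 1."""
    return all(
        client.getbit(key, offset) == 1
        for offset in bit_offsets(device_id, num_bits)
    )


class FakeBitmapClient:
    """In-memory substitute exposing getbit/setbit like a Redis client."""

    def __init__(self) -> None:
        self._bits = {}

    def getbit(self, key: str, offset: int) -> int:
        return 1 if offset in self._bits.get(key, set()) else 0

    def setbit(self, key: str, offset: int, value: int) -> int:
        bits = self._bits.setdefault(key, set())
        previous = 1 if offset in bits else 0  # SETBIT returns the old bit
        if value:
            bits.add(offset)
        else:
            bits.discard(offset)
        return previous
```

With a real server, several getbit calls for one device id could be batched into a single round trip; the sketch issues them one by one to keep the flow easy to follow.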

It is to be understood that the present invention is not limited to the procedures and structures described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A method for identifying new and old device ids with high performance is characterized by comprising,

2. The method for identifying new and old device ids with high performance according to claim 1, wherein the memory of the Redis server is greater than 1G.

3. The method for identifying old and new device ids with high performance as claimed in claim 1, wherein said Bitmap is string with length of 40960000 bits.

4. The method for identifying the old and new device ids in high performance according to claim 1, wherein the value range of M is 3-10.

5. The method for high-performance identification of new and old device ids according to claim 1, wherein said hash function is a 64-bit function.

6. The method for identifying the old and new device ids in high performance according to claim 1, wherein the value range of k is smaller than M.

7. The method for high-performance identification of new and old device ids according to claim 1, wherein said querying the return values of k Bit bits uses a bloom filter to introduce multiple hash functions to reduce false positive rate.