WO2023179414A1

WO2023179414A1 - Hotspot recognition method and rate limiting method

Info

Publication number: WO2023179414A1
Application number: PCT/CN2023/081466
Authority: WO
Inventors: 陈亚东; 汪翔; 杨文龙; 沈春辉; 杨成虎
Original assignee: 阿里云计算有限公司
Priority date: 2022-03-22
Filing date: 2023-03-14
Publication date: 2023-09-28
Also published as: CN114756544A

Abstract

The description provides a hotspot recognition method and a rate limiting method. The hotspot recognition method comprises: initializing count values corresponding to a plurality of hotspot indexes comprised in a hotspot recognition tree to be 0 in advance, the number of hotspot indexes in the hotspot recognition tree being less than the amount of data in a database; under the condition that a data access request is received, determining a primary key of data that the data access request requires to access; calculating a hotspot index corresponding to the primary key, and determining a corresponding count value in the hotspot recognition tree according to the hotspot index, the count value being used for representing the number of access times of data corresponding to the hotspot index; in response to the count value satisfying a preset hotspot condition, determining that the primary key is a hotspot primary key; and in response to the count value not satisfying the preset hotspot condition, adding one to the count value.

Description

A hotspot identification method and a rate limiting method

This application claims priority to the Chinese patent application filed with the China Patent Office on March 22, 2022, with application number 202210289116.6 and titled "A hotspot identification method and a rate limiting method", the entire content of which is incorporated by reference in this application.

Technical field

One or more embodiments of this specification relate to the field of computer application technology, and in particular, to a hotspot identification method and a rate limiting method.

Background

In a distributed database, the data of a large database is divided into multiple parts that are stored on different physical machines. Because the access service for a given piece of data is generally undertaken by a single physical machine, if a piece of data is accessed frequently within a certain period of time, a hotspot forms; that is, the access requests in the distributed system become concentrated on one physical machine. Hotspot problems are prone to occur in scenarios such as Weibo hot searches and e-commerce flash sales. When a hotspot problem occurs, the physical machine where the hotspot is located may go down under the heavy traffic, affecting the stability of the distributed database service.

For non-relational databases, each piece of data generally corresponds to a primary key (key). In related technologies, some distributed databases identify hotspots by counting the number of accesses to each primary key, but for databases that store a large amount of data on a single physical machine, this approach occupies a large amount of memory and affects the normal operation of the database.

It can be seen that for distributed databases (and non-relational databases) with a large amount of data on a single physical machine, there is a lack of a hotspot identification method that can avoid system downtime.

Summary of the invention

In view of this, one or more embodiments of this specification provide a hotspot identification method and a rate limiting method.

According to a first aspect of one or more embodiments of this specification, a hotspot identification method is proposed, in which the count values corresponding to multiple hotspot indexes included in a hotspot identification tree are initialized to 0 in advance, and the number of hotspot indexes in the hotspot identification tree is less than the number of pieces of data in the database; the method includes:

When a data access request is received, determine the primary key of the data accessed by the data access request;

Calculate the hotspot index corresponding to the primary key, and determine the corresponding count value in the hotspot identification tree based on the hotspot index;

In response to the count value meeting the preset hotspot condition, it is determined that the primary key is the hotspot primary key; in response to the count value not meeting the preset hotspot condition, the count value is increased by one.

According to a second aspect of one or more embodiments of this specification, a rate limiting method is proposed, including:

Receive a data access request for target data in the database, where the data access request carries the primary key corresponding to the target data;

In response to the primary key belonging to a hotspot primary key, the data access request for the target data is blocked; the hotspot primary key is obtained through the aforementioned hotspot identification method.
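The blocking step of the second aspect can be illustrated with a minimal sketch. It assumes the hotspot primary keys have already been collected into a set by the hotspot identification method; the names `handle_request`, `hot_keys`, and `read_data` are hypothetical and do not appear in this specification.

```python
def handle_request(primary_key, hot_keys, read_data):
    """Block requests for hotspot primary keys; serve all others.

    `hot_keys` is assumed to be the set produced by hotspot identification;
    `read_data` is a hypothetical callable that performs the actual access.
    """
    if primary_key in hot_keys:
        return ("blocked", None)
    return ("ok", read_data(primary_key))


# Illustrative data only.
store = {b"item:1": "phone", b"item:2": "laptop"}
hot_keys = {b"item:1"}
blocked = handle_request(b"item:1", hot_keys, store.get)
served = handle_request(b"item:2", hot_keys, store.get)
```

In this sketch every request for a hotspot primary key is rejected; a real deployment might instead admit a limited share of such requests.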

According to a third aspect of the embodiments of this specification, a hotspot identification device is provided, in which the count values corresponding to multiple hotspot indexes included in a hotspot identification tree are initialized to 0 in advance, and the number of hotspot indexes in the hotspot identification tree is less than the number of pieces of data in the database; the device includes:

A primary key determination module, configured to determine the primary key of the data accessed by the data access request when receiving a data access request;

A count value determination module, used to calculate the hotspot index corresponding to the primary key, and determine the corresponding count value in the hotspot identification tree according to the hotspot index;

A hotspot determination module, configured to determine that the primary key is a hotspot primary key in response to the count value meeting a preset hotspot condition; and increment the count value by one in response to the count value not meeting the preset hotspot condition.

According to a fourth aspect of the embodiments of this specification, a rate limiting device is provided, including:

A request receiving module, configured to receive a data access request for target data in the database, where the data access request carries the primary key corresponding to the target data;

A request blocking module, configured to block data access requests to the target data in response to the fact that the primary key belongs to a hotspot primary key; the hotspot primary key is obtained through the aforementioned hotspot identification method.

According to a fifth aspect of the embodiments of this specification, an electronic device is provided, including:

A processor;

A memory for storing instructions executable by the processor;

Wherein the processor executes the executable instructions to implement the foregoing hotspot identification method or the foregoing rate limiting method.

According to a sixth aspect of the embodiments of this specification, a computer-readable storage medium is provided, on which computer instructions are stored. When the computer instructions are executed by a processor, the foregoing hotspot identification method or the foregoing rate limiting method is implemented.

This specification provides a hotspot identification method and a rate limiting method. The count values corresponding to multiple hotspot indexes included in the hotspot identification tree are initialized to 0 in advance, and the number of hotspot indexes in the hotspot identification tree is less than the number of pieces of data in the database. When a data access request is received, the primary key of the data accessed by the data access request is determined; the hotspot index corresponding to the primary key is calculated, and the corresponding count value in the hotspot identification tree is determined based on the hotspot index, the count value representing the number of times the data corresponding to the hotspot index has been accessed; in response to the count value meeting the preset hotspot condition, the primary key is determined to be a hotspot primary key; in response to the count value not meeting the preset hotspot condition, the count value is increased by one.

With the above method, accesses are counted per hotspot index rather than per primary key. Since the number of hotspot indexes is smaller than the number of primary keys in the database, hotspots can be identified with a certain accuracy while occupying less memory, and the identified hotspots can then be handled, which avoids occupying a large amount of memory and avoids affecting the normal operation of the database service.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit this specification.

Description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the specification and together with the description, serve to explain the principles of the specification.

Figure 1 is a flow chart of a hotspot identification method illustrated in this specification according to an exemplary embodiment.

Figure 2 is a schematic structural diagram of a hotspot identification tree shown in this specification according to an exemplary embodiment.

Figure 3 is a flow chart of a rate limiting method according to an exemplary embodiment of this specification.

FIG. 4A is a schematic structural diagram of a sketch according to a specific embodiment of this specification.

FIG. 4B is a schematic structural diagram of a hotspot identification tree according to a specific embodiment of this specification.

Figure 5 is a block diagram of a hotspot identification device according to an exemplary embodiment of this specification.

FIG. 6 is a block diagram of a rate limiting device according to an exemplary embodiment of this specification.

FIG. 7 is a hardware structure diagram of an electronic device in which a hotspot identification device or a rate limiting device is located according to an exemplary embodiment of this specification.

Detailed description

Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of this specification. Rather, they are merely examples of apparatus and methods consistent with some aspects of one or more embodiments of this specification as detailed in the appended claims.

It should be noted that in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, methods may include more or fewer steps than described in this specification. In addition, a single step described in this specification may be broken down into multiple steps for description in other embodiments, and multiple steps described in this specification may also be combined into a single step for description in other embodiments.

In a distributed database, user data is generally sorted in byte-array order, and the sorted data is split into segments. The segments are stored in different modules (regions) of the database, and each region provides the access service for its part of the data. This design is generally called range sharding and is a common design in the field of distributed systems. Under this design, a piece of user data is always located in exactly one range shard.

Distributed systems generally achieve high scalability through horizontal expansion: simply adding machines almost always improves overall throughput. However, under range sharding, certain workloads (for example, Weibo hot searches and e-commerce flash sales, where a single Weibo post or a single product corresponds to one piece of data) can concentrate most of the traffic on one or a few pieces of data within a short period of time. Most of the traffic then hits one or a few specific regions, placing a huge load on the machines where these regions are located and forming a hotspot problem. Hotspot problems may lead to abnormal situations such as machine downtime and process exit. Many distributed systems have disaster recovery designs for single-machine failures: when a machine goes down, other machines take over its service. However, a hotspot problem is not a hardware problem. After the machine goes down, the machines that take over the service are in turn overwhelmed by the hotspot traffic, which can easily cause an avalanche effect across the entire cluster and seriously affect the stability of the service. Because the hardware resources of a single machine are limited, its processing capacity is also limited, so the horizontal expansion capability of a distributed system does not help with hotspot problems, and other methods are needed to solve them.

In related technologies, hotspot problems are usually handled by rate limiting, but the difficulty lies in identifying and discovering the hotspots themselves. To identify hotspots, considering that each piece of data corresponds to a primary key, some in-memory databases count the number of accesses to every primary key to determine the hotspot primary keys.

However, the above method is only suitable for scenarios where the data stored on a single machine is limited, such as in-memory databases. For databases that store a large amount of data on a single machine, such as persistent databases (several TB of data, with throughput possibly on the order of hundreds of thousands of QPS), counting the accesses of every key would occupy a large amount of memory, making this method impractical and inefficient.

Based on this, this specification provides a hotspot identification method and a rate limiting method. The count values corresponding to multiple hotspot indexes included in the hotspot identification tree are initialized to 0 in advance, and the number of hotspot indexes in the hotspot identification tree is smaller than the number of pieces of data in the database. When a data access request is received, the primary key of the data accessed by the data access request is determined; the hotspot index corresponding to the primary key is calculated, and the corresponding count value in the hotspot identification tree is determined based on the hotspot index, the count value representing the number of times the data corresponding to the hotspot index has been accessed; in response to the count value meeting the preset hotspot condition, the primary key is determined to be a hotspot primary key; in response to the count value not meeting the preset hotspot condition, the count value is increased by one.

The above method counts accesses per hotspot index rather than per primary key. Since the number of hotspot indexes is smaller than the number of primary keys in the database, hotspots can be identified with a certain accuracy while occupying less memory, and the identified hotspots can then be handled, which avoids occupying a large amount of memory and avoids affecting the normal operation of the database service.

Next, a hotspot identification method shown in this specification will be described.

As shown in Figure 1, Figure 1 is a flow chart of a hotspot identification method illustrated in this specification according to an exemplary embodiment, including the following steps:

Step 103: Upon receiving a data access request, determine the primary key of the data accessed by the data access request.

It should also be noted that before this method is executed, the count values corresponding to the multiple hotspot indexes included in the hotspot identification tree need to be initialized to 0 in advance, and the number of hotspot indexes in the hotspot identification tree is less than the number of pieces of data in the database.

The primary key needs to be determined in step 103 because the data access request is a request to read or write data. The database targeted by the data access request in this specification is a non-relational database in which each piece of data corresponds to a primary key, so access requests for different data can be distinguished by their primary keys. Therefore, in order to count the fuzzy access counts of different data (the specific meaning is described below), the primary key of the data accessed by the data access request must first be determined.

The count values corresponding to the multiple hotspot indexes need to be initialized to 0 before this method is executed because, in this specification, the hotspot identification tree is used to count the fuzzy access counts (that is, the count values) of different data and thereby identify hotspots. The hotspot identification tree therefore needs to be initialized in advance to a state in which no accesses are recorded, that is, the count values corresponding to the multiple hotspot indexes in the hotspot identification tree are cleared, to facilitate counting the fuzzy accesses.

The number of hotspot indexes in the hotspot identification tree is smaller than the number of pieces of data in the database in order to save memory. Instead of keeping one access count per primary key as in related technologies, one fuzzy access count corresponds to multiple primary keys. The number of hotspot indexes must therefore be smaller than the number of primary keys in the database, which saves memory and allows the method in this specification to identify hotspots in databases with large amounts of data.

In addition, the hotspot identification tree is a tree structure that records the count value of each hotspot index. It may have only one layer or multiple layers. In the case of one layer, multiple hotspot indexes are stored in one layer of the hotspot identification tree, and count values corresponding to the multiple hotspot indexes are stored.

In the case of multiple layers, each layer of the hotspot identification tree stores multiple hotspot indexes and their corresponding count values. In a multi-layer hotspot identification tree, one hotspot index in an upper layer corresponds to multiple hotspot indexes in the layer below. This correspondence arises because the two layers calculate hotspot indexes in different ways: the multiple primary keys that correspond to one hotspot index in the upper layer are distributed, by the different calculation method of the lower layer, across multiple hotspot indexes. In other words, a hotspot index in the upper layer and the multiple lower-layer hotspot indexes corresponding to it represent the same batch of data. Thus, in a multi-layer structure, the amount of data corresponding to each hotspot index decreases as the layer number increases.

The way the hotspot identification tree is expanded also needs to be explained. When the count value of a hotspot index in a certain layer is full (that is, it reaches the maximum value the count value can take), the next-layer hotspot indexes corresponding to that hotspot index are expanded in order to determine which piece or pieces of data, among the data corresponding to that hotspot index, are the hotspot.

In addition, the advantages and other details of the multi-layered hotspot identification tree will be described in detail below and will not be described again here.

Step 105: Calculate the hotspot index corresponding to the primary key, and determine the corresponding count value in the hotspot identification tree according to the hotspot index.

First, the meaning of each noun involved in step 105 will be explained.

The hotspot index is an index value calculated with a hash function or a similar method: the primary key is taken as input, and the hotspot index is the output.
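As a minimal sketch of such a calculation, the function below hashes a primary key and reduces the result modulo the number of hotspot indexes. The choice of BLAKE2b, the per-layer salt, and the name `hotspot_index` are assumptions made for illustration; the specification does not prescribe a particular hash function.

```python
import hashlib


def hotspot_index(primary_key: bytes, num_indexes: int, layer: int = 0) -> int:
    """Map a primary key to one of `num_indexes` hotspot indexes.

    A different `layer` value selects a different hash function by changing
    the salt (an assumed way of obtaining per-layer hash functions).
    """
    salt = layer.to_bytes(2, "big")
    digest = hashlib.blake2b(primary_key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big") % num_indexes


# Illustrative keys: 10,000 primary keys share at most 64 hotspot indexes.
keys = [f"user:{i}".encode() for i in range(10_000)]
indexes = {hotspot_index(k, 64) for k in keys}
```

Because many primary keys map to the same index, a counter stored per index records a fuzzy access count rather than an exact per-key count.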

The count value corresponding to a hotspot index in the hotspot identification tree represents the number of accesses to the data corresponding to that hotspot index. When the hotspot identification tree has only one layer, the count value corresponding to a hotspot index is the sum of the accesses, within a period of time, to the multiple primary keys (pieces of data) corresponding to that hotspot index; that is, the count value represents the fuzzy access count of the data corresponding to the hotspot index.

When the hotspot identification tree has multiple layers, the count value corresponding to any hotspot index in each layer represents the access volume of the data corresponding to that hotspot index (for example, if the hotspot identification tree has two layers, the data corresponding to a second-layer hotspot index is the part of the data under its corresponding first-layer hotspot index whose second-layer index value equals that second-layer hotspot index).

For example, when the hotspot identification tree has two layers: if a first-layer hotspot index has no second-layer hotspot indexes, or the count values of some of its second-layer hotspot indexes are 0, then the data corresponding to those hotspot indexes (the first-layer hotspot index in the former case, those second-layer hotspot indexes in the latter) has a low access volume; if the first-layer hotspot index has second-layer hotspot indexes and the count value of the corresponding second-layer hotspot index is not 0, then the access volume of the data corresponding to that second-layer hotspot index is medium; if the count value of the second-layer hotspot index corresponding to a primary key (piece of data) is full, then the access volume of that primary key is high and it is a hotspot primary key.

Step 107: In response to the count value meeting the preset hotspot condition, determine that the primary key is the hotspot primary key; in response to the count value not meeting the preset hotspot condition, add one to the count value.

In other words, a hotspot primary key is one whose number of accesses within a certain period of time exceeds a certain value. Accordingly, if the count value accumulated within a certain period of time exceeds a certain value, or if the fuzzy access count represented by the count value accounts for more than a certain proportion of the total accesses within that period, the primary key is considered to be a hotspot primary key. When the primary key is not a hotspot primary key, the count value of the hotspot index corresponding to the primary key continues to be accumulated.

Therefore, step 107 may specifically include: in response to the ratio of the count value to the sum of all count values exceeding a preset hotspot ratio, determining the primary key to be a hotspot primary key.
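Steps 103 to 107 with the ratio condition can be sketched for a single-layer hotspot identification tree as follows. The class name `SingleLayerDetector`, the BLAKE2b hash, and in particular the `min_total` warm-up guard are assumptions added for illustration; without some such guard, the very first counted key would trivially account for all of the traffic.

```python
import hashlib


class SingleLayerDetector:
    def __init__(self, num_indexes: int, hot_ratio: float, min_total: int):
        self.counts = [0] * num_indexes  # all count values start at 0
        self.total = 0                   # running sum of all count values
        self.hot_ratio = hot_ratio       # preset hotspot ratio
        self.min_total = min_total       # assumed warm-up guard, not from the source

    def _index(self, primary_key: bytes) -> int:
        digest = hashlib.blake2b(primary_key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self.counts)

    def access(self, primary_key: bytes) -> bool:
        """Return True if the key is judged to be a hotspot primary key."""
        idx = self._index(primary_key)  # step 105: hotspot index and count value
        count = self.counts[idx]
        warmed_up = self.total >= max(self.min_total, 1)
        if warmed_up and count / self.total > self.hot_ratio:
            return True                 # step 107: condition met
        self.counts[idx] += 1           # step 107: otherwise add one
        self.total += 1
        return False


detector = SingleLayerDetector(num_indexes=1024, hot_ratio=0.5, min_total=100)
results = [detector.access(b"item:42") for _ in range(101)]
```

Here the first 100 accesses are only counted; on the 101st access the index for `item:42` holds all of the counted traffic, so the key is reported as a hotspot primary key.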

In other words, whether data is hotspot data can be determined by checking whether the proportion accounted for by the count value exceeds a certain value. This is the more convenient method for a single-layer hotspot identification tree. For a multi-layer hotspot identification tree, in order to identify hotspots accurately, the sum of all count values can be the sum of the count values of the last layer (the count value of each layer is carried over to the next layer; for example, if the count value in the first layer is 15 when the next layer is expanded, the count value in the second layer starts counting from 15). The sum of all count values can also be taken as the total number of accesses (because each arriving access request adds 1 to some count value, the total number of accesses equals the sum of all count values).

In addition, in addition to the above method, step 107 can also be implemented by the following method.

The method also includes: when a data access request for any data is received, adding 1 to the number of access requests; when the number of access requests exceeds a preset access request threshold, resetting the number of access requests to 0 and setting the count values corresponding to the multiple hotspot indexes included in the hotspot identification tree to 0. In this case, step 107 includes: in response to the count value reaching a preset hotspot threshold, determining the primary key to be a hotspot primary key.

In other words, when the total number of accesses to all data in the database exceeds a certain value, the hotspot identification tree is reset and counting restarts. Because the hotspot identification tree is reset in this way, the preset hotspot condition can be that the count value exceeds the preset hotspot threshold, which in effect measures the proportion of the access traffic of all data within a certain period that goes to the hotspot data.
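The reset-by-request-count variant can be sketched as follows for a single layer. The class name `WindowedDetector`, the BLAKE2b hash, and the specific threshold values are assumptions chosen only to keep the example small.

```python
import hashlib


class WindowedDetector:
    def __init__(self, num_indexes: int, hot_threshold: int, request_threshold: int):
        self.counts = [0] * num_indexes
        self.requests = 0                        # number of access requests so far
        self.hot_threshold = hot_threshold       # preset hotspot threshold
        self.request_threshold = request_threshold

    def _index(self, primary_key: bytes) -> int:
        digest = hashlib.blake2b(primary_key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self.counts)

    def access(self, primary_key: bytes) -> bool:
        self.requests += 1
        if self.requests > self.request_threshold:
            # Window is over: reset the request counter and every count value.
            self.requests = 0
            self.counts = [0] * len(self.counts)
        idx = self._index(primary_key)
        if self.counts[idx] >= self.hot_threshold:
            return True                          # count value reached the threshold
        self.counts[idx] += 1
        return False


detector = WindowedDetector(num_indexes=256, hot_threshold=3, request_threshold=5)
outcomes = [detector.access(b"item:42") for _ in range(6)]
```

With these values the key is counted three times, reported as hot on the fourth and fifth requests, and counted from zero again on the sixth request, which starts a new window.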

In addition, the hotspot identification tree can also be reset at regular intervals (that is, the hotspot identification tree is reinitialized). In this case, the hotspot identification condition can likewise be that the count value exceeds a preset hotspot threshold. In this way, whether a hotspot exists is determined by whether the number of accesses within a period of time exceeds a predetermined threshold.

It should also be noted that when the hotspot primary key is identified, you can also record the hotspot primary key and start counting the specific visits of the hotspot primary key. This is mainly to facilitate the observation of traffic conditions and the processing of hotspots.

In other words, the method further includes: when it is determined that the primary key is a hotspot primary key, recording the primary key and recording the number of visits to the primary key.

In the above situation, to prevent the recorded primary keys from growing without bound and occupying too much memory, some of the recorded primary keys can be deleted according to a cache eviction policy such as least recently used (LRU) or least frequently used (LFU).

Specifically, when the number of recorded primary keys exceeds a certain threshold, the primary key with the lowest access frequency can be deleted (such a primary key may have been mistakenly identified as hotspot data because the number of hotspot indexes is less than the number of primary keys). Alternatively, when the number of recorded primary keys exceeds a certain threshold, the primary keys that have not been accessed within a certain period of time can be deleted.

For the former case, the method also includes: when the number of recorded primary keys exceeds a preset primary key number threshold, deleting the primary key with the lowest access frequency based on the access count of each recorded primary key.
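The frequency-based deletion can be sketched as below. The class name `HotKeyRecord` and the use of a plain dictionary are assumptions for illustration, and the linear scan for the coldest key is chosen for brevity rather than efficiency.

```python
class HotKeyRecord:
    """Exact access counts for recorded hotspot primary keys, bounded in size."""

    def __init__(self, max_keys: int):
        self.max_keys = max_keys  # preset primary key number threshold
        self.visits = {}          # recorded primary key -> access count

    def record(self, primary_key: bytes) -> None:
        self.visits[primary_key] = self.visits.get(primary_key, 0) + 1
        if len(self.visits) > self.max_keys:
            # Delete the recorded primary key with the lowest access count.
            coldest = min(self.visits, key=self.visits.get)
            del self.visits[coldest]


record = HotKeyRecord(max_keys=2)
for key in [b"a", b"a", b"a", b"b", b"b", b"c"]:
    record.record(key)
```

In this run the key `c` arrives with a single access while two more frequently accessed keys are already recorded, so `c` is the one deleted.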

After describing the method shown in this specification, the hotspot identification tree will be described in further detail.

As mentioned above, the hotspot identification tree may have multiple layers. The purpose of using multiple layers is to avoid hash collisions while further saving memory. Specifically, the hotspot index is calculated by a certain calculation method, and because the number of hotspot indexes is less than the number of primary keys, one hotspot index corresponds to multiple primary keys. When the hotspot identification tree has only one layer, the different primary keys corresponding to the same hotspot index can be better distinguished by increasing the number of hotspot indexes. However, increasing the number of hotspot indexes occupies more memory (and the count values of many of those hotspot indexes are likely to be 0 or very small). This trade-off can be resolved by giving the hotspot identification tree multiple layers.

Specifically, when the hotspot identification tree has multiple layers, its structure can be as shown in Figure 2. There are N first-layer hotspot indexes, each first-layer hotspot index corresponds to M second-layer hotspot indexes, each second-layer hotspot index can correspond to Q third-layer hotspot indexes (N, M, and Q are all preset positive integers and may be equal or different), and so on. Different layers use different index calculation methods (which can be hash functions), which avoids hash collisions (when the index calculation method is a hash function) and reduces the possibility of misidentifying some data as hotspots. To reduce memory usage, the next-layer hotspot indexes need not be created until the count value of the upper-layer hotspot index is full. In other words, the initial hotspot identification tree has only one layer; when the count value corresponding to any hotspot index in this layer exceeds the preset expansion threshold, the next-layer hotspot indexes corresponding to that hotspot index are created. In this way, when only a small amount of data is hot, most hotspot indexes stop at the first few layers and few reach the last layer. Compared with a solution with only one layer of hotspot indexes, hash collisions can be avoided with the same amount of storage space.

It should also be noted that although Figure 2 shows a structure in which every hotspot index has corresponding lower-layer hotspot indexes, in actual use not every first-layer hotspot index has corresponding second-layer hotspot indexes.

It can be seen that the multi-layer structure of the hotspot identification tree can avoid hash collisions and reduce the memory space occupied.
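A short worked calculation illustrates the memory argument. The values of N, M, and Q and the numbers of expanded hotspot indexes below are hypothetical, and the comparison assumes that a single-layer tree would need N × M × Q hotspot indexes to separate primary keys as finely as the third layer does.

```python
def counters_allocated(n: int, m: int, q: int, expanded_l1: int, expanded_l2: int) -> int:
    """Count values held by a lazily expanded three-layer tree.

    The first layer always holds n count values; each expanded first-layer
    index adds m second-layer count values, and each expanded second-layer
    index adds q third-layer count values.
    """
    return n + expanded_l1 * m + expanded_l2 * q


N, M, Q = 1024, 16, 16                          # hypothetical layer sizes
flat = N * M * Q                                # one layer at third-layer resolution
lazy = counters_allocated(N, M, Q, 10, 3)       # 10 and 3 indexes expanded

print(flat, lazy)  # 262144 1232
```

Under these assumptions the lazily expanded tree holds 1,232 count values where a flat layer of comparable resolution would hold 262,144.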

Next, a two-layer hotspot identification tree will be used as an example to illustrate the above solution.

The hotspot identification tree includes a first layer and a second layer. Step 103 specifically includes: calculating the first-level hotspot index corresponding to the primary key according to the hash function corresponding to the first level of the hotspot identification tree; responding to the corresponding first-level hotspot index not existing in the hotspot identification tree. second layer hotspot index, determine the technical value corresponding to the first layer hotspot index as the corresponding count value; in response to the existence of the second layer hotspot corresponding to the first layer hotspot index in the hotspot identification tree Index, determine the second layer hotspot index corresponding to the primary key according to the hash function corresponding to the second layer, and determine the count value corresponding to the second layer hotspot index as the corresponding count value. Among them, the hash function corresponding to the first layer is different from the hash function corresponding to the second layer.

In other words, when the hotspot identification tree includes two layers, the determined count value is the count value of the last layer that exists for the primary key. Specifically, when a second-layer hotspot index exists under the first-layer hotspot index corresponding to the primary key, the count value of the second-layer hotspot index corresponding to the primary key is used as the determined count value. When no second-layer hotspot index exists, the count value of the first-layer hotspot index is used as the determined count value.

In addition to determining the count value, there is also the process of expanding the next layer. Specifically, initializing in advance the count values corresponding to the multiple hotspot indexes included in the hotspot identification tree to 0 includes: initializing in advance the count values corresponding to the multiple hotspot indexes included in the first layer of the hotspot identification tree to 0. The method also includes: in response to the count value corresponding to the first-layer hotspot index exceeding the preset expansion threshold after being increased by 1, creating multiple second-layer hotspot indexes corresponding to that first-layer hotspot index, and initializing the count values corresponding to those second-layer hotspot indexes to 0.

In other words, only the first-layer hotspot index is initialized during pre-initialization. When the first-layer count value exceeds the expansion threshold, the next-layer hotspot index is expanded and initialized.
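The two-layer lookup and lazy expansion described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the sizes `N` and `M`, both thresholds, the salted SHA-256 hash, and the names `TwoLayerTree` and `layer_hash` are all assumptions made for the example. It also assumes that a key is only judged hot from a second-layer count, while a first-layer count only triggers expansion.

```python
import hashlib

N = 8                 # first-layer hotspot indexes (assumed size)
M = 8                 # second-layer indexes per first-layer index (assumed size)
EXPAND_THRESHOLD = 3  # assumed preset expansion threshold
HOT_THRESHOLD = 3     # assumed preset hotspot threshold


def layer_hash(key: str, layer: int, buckets: int) -> int:
    """Salting the key with the layer number gives each layer a different hash."""
    digest = hashlib.sha256(f"{layer}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


class TwoLayerTree:
    def __init__(self) -> None:
        self.first = [0] * N  # only the first layer is initialized in advance
        self.second = {}      # first-layer index -> list of second-layer counts

    def access(self, key: str) -> bool:
        """Return True if the key is judged hot; otherwise count the access."""
        i = layer_hash(key, 1, N)
        children = self.second.get(i)
        if children is None:
            # No second layer yet: the first-layer count is the determined count.
            self.first[i] += 1
            if self.first[i] > EXPAND_THRESHOLD:
                self.second[i] = [0] * M  # create the second layer lazily
            return False
        j = layer_hash(key, 2, M)
        if children[j] >= HOT_THRESHOLD:
            return True
        children[j] += 1
        return False
```

With these assumed thresholds, a single key that is accessed repeatedly spends four accesses expanding its first-layer index, three more raising its second-layer count, and is reported as hot on the eighth access.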

Through the above method, memory is saved. Since the number of hotspot indexes is fixed and smaller than the number of primary keys, the method is well suited to databases with massive data. With low memory overhead, the false alarm rate can be calculated from the number of layers of the hotspot identification tree, the theoretical maximum number of hotspot indexes at each layer, the maximum value that a count value can reach (when a hotspot primary key is identified by its count value exceeding a predetermined hotspot threshold), and the access request threshold (when one exists), so each threshold can be set reasonably to meet the hotspot identification requirements. And because little memory is occupied, the performance overhead is small.

After the hotspot primary key is identified, it also needs to be handled so that it does not affect the operation of a single machine. There are many ways to handle hotspots. Next, current limiting is taken as an example to illustrate how hotspot primary keys are handled.

As shown in Figure 3, Figure 3 is a flow chart of a current limiting method according to an exemplary embodiment of this specification, including:

Step 301: Receive a data access request for target data in the database, where the data access request carries the primary key corresponding to the target data.

Step 303: In response to the primary key belonging to the hotspot primary key, block the data access request for the target data.

Wherein, the hotspot primary key is obtained by the aforementioned hotspot identification method. In this way, data access requests for hotspot primary keys are blocked, thereby achieving the purpose of protecting the system itself.

It should be noted that in step 303, blocking the access request for the target data may be implemented as follows: after the hotspot primary key is identified through the aforementioned hotspot identification method, the access request is blocked if its primary key is a hotspot primary key. In addition, to further reduce the impact of hotspots on system operation, after a hotspot is discovered the hotspot primary key can be added to a single-machine current limiter (for example, RateLimiter in guava; other current limiters can also be used, and this specification does not limit the type of current limiter), so that access requests for that primary key are blocked before the system processes them further (for example, before judging whether the data is hot through the aforementioned method), which saves operating performance.
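Steps 301 and 303, together with the idea of remembering discovered hotspot primary keys so that later requests are rejected before identification runs again, can be sketched as follows. This is a hedged illustration only: `HotKeyGate`, `RequestBlocked`, and the callback parameters are hypothetical names, and a plain set stands in for the single-machine current limiter, which the specification leaves open.

```python
from typing import Callable, Set


class RequestBlocked(Exception):
    """Raised to fail fast when a request targets a hotspot primary key."""


class HotKeyGate:
    def __init__(self, is_hot: Callable[[str], bool]) -> None:
        self._is_hot = is_hot            # the hotspot identification method
        self._limited: Set[str] = set()  # stand-in for a current limiter

    def handle(self, primary_key: str, read: Callable[[str], object]) -> object:
        # Already-limited keys are rejected without running identification again.
        if primary_key in self._limited:
            raise RequestBlocked(primary_key)
        # Step 303: block the request if the primary key is a hotspot primary key.
        if self._is_hot(primary_key):
            self._limited.add(primary_key)
            raise RequestBlocked(primary_key)
        return read(primary_key)
```

A real limiter would normally allow some bounded rate through and would release keys once they cool down; the set used here blocks a key indefinitely and is only meant to show where the check sits in the request path.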

In addition, in order to ensure the accuracy of identification, in addition to identifying the hotspot primary key through the above method, when the number of visits corresponding to the hotspot primary key is recorded, the hotspot primary key can also be identified through the recorded number of visits. When the recorded access volume of a certain primary key is greater than a certain value, data access requests for that primary key will be blocked, which can improve the accuracy of current limiting.

It should also be noted that when the number of visits corresponding to the hotspot primary key is recorded, after current limiting is applied to a hotspot primary key, the number of visits to that primary key can continue to be observed for a period of time. This provides feedback on how the current limiting of the hotspot primary key is working and makes it easy to adjust the current limiting strategy.

In addition, in the above situation, if an LRU or LFU cache eviction strategy is in use, the hotspot primary keys that need to be observed can also be locked, so that their records are not evicted by the LRU or LFU strategy once current limiting succeeds.

Next, a hotspot identification method and a current limiting method shown in this specification will be described through a specific embodiment.

Next, the structure, update, and reset of the hotspot identification tree are explained using a tree with 4 layers in which each group of hotspot indexes at each layer contains 256 hotspot indexes.

First, the concept of a sketch is introduced. A sketch is a group of hotspot indexes and their corresponding count values (one group includes 256 hotspot indexes). Its structure is shown in Figure 4A. Each sketch includes 256 hotspot indexes and 256 count values, and also stores the maximum value each count value can reach, the number of the layer to which the sketch belongs, and so on.

After the sketch, the structure of the hotspot identification tree is introduced, as shown in Figure 4B. The hotspot identification tree has a 4-layer structure. The first layer includes 1 sketch. The second layer includes the sketches corresponding to the hotspot indexes in the first layer, that is, at most 256 sketches. The third layer includes at most 256*256 sketches, and the fourth layer includes at most 256*256*256 sketches (for convenience, only 1 sketch is shown for each layer). Since each sketch has 256 hotspot indexes, the maximum number of hotspots that this hotspot identification tree can identify is the 4th power of 256 (that is, the number of hotspot indexes included in the fourth layer). It should also be noted that different layers correspond to different hash functions.
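The sketch and the per-layer upper bounds just described can be written down as a small data structure. This is a sketch under assumptions: the field names and the `children` mapping (one possible way to link a hotspot index to its next-layer sketch) are hypothetical, and only the figures 256 and 4 come from the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SLOTS = 256  # hotspot indexes per sketch (from the example)
LAYERS = 4   # layers in the tree (from the example)


@dataclass
class Sketch:
    layer: int      # layer to which this sketch belongs
    max_count: int  # maximum value each count value can reach
    counts: List[int] = field(default_factory=lambda: [0] * SLOTS)
    # Assumed linkage: hotspot index -> next-layer sketch, created on demand.
    children: Dict[int, "Sketch"] = field(default_factory=dict)


def max_sketches(layer: int) -> int:
    """Upper bound on the number of sketches in a layer (layers start at 1)."""
    return SLOTS ** (layer - 1)


def max_indexes(layer: int) -> int:
    """Upper bound on the number of hotspot indexes in a layer."""
    return SLOTS ** layer
```

For the fourth layer this gives at most 256**3 = 16,777,216 sketches and 256**4 = 4,294,967,296 hotspot indexes, matching the bounds stated above. In practice only the sketches under hot indexes are ever created.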

There are two operations for the hotspot identification tree. The first is to update the hotspot identification tree, that is, when a data access request is received, the hotspot identification tree is updated, and by the way, it is determined whether the data accessed by the data access request is a hotspot. The second is to reset the hotspot identification tree.

Next, the process of updating the hotspot identification tree will be described first.

First, use the hash function corresponding to the first layer of the hotspot identification tree to calculate the first-layer hotspot index of the primary key of the data targeted by the data access request. To limit the number of hotspot indexes to 256, the hash value obtained can be taken modulo 256 to obtain the first-layer hotspot index.

After obtaining the first-layer hotspot index, determine whether a second-layer hotspot index exists under that first-layer hotspot index. If it does, calculate the second-layer hotspot index corresponding to the primary key and determine whether a third-layer hotspot index exists under that second-layer hotspot index, and so on, until the last-layer hotspot index corresponding to the primary key is determined. The last layer here means the deepest layer at which a hotspot index currently exists for the primary key.

After determining the last-layer hotspot index, add 1 to the count value corresponding to it, and determine whether the count value after adding 1 exceeds the maximum count value (14). If it does, expand the next-layer sketch corresponding to that hotspot index (that is, initialize the next-layer sketch). If there is no next-layer sketch that can be expanded, the primary key is determined to be a hotspot primary key.
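The update procedure above can be sketched end to end as follows. The figures 256, 4, and 14 come from the example; everything else is an assumption for illustration, including the salted SHA-256 used as the per-layer hash, the names `Sketch`, `slot`, and `update`, and the choice to saturate the fourth-layer count once it has exceeded the maximum.

```python
import hashlib
from typing import Dict, List

SLOTS = 256     # hotspot indexes per sketch
LAYERS = 4      # layers in the tree
MAX_COUNT = 14  # maximum count value before expansion


class Sketch:
    def __init__(self, layer: int) -> None:
        self.layer = layer
        self.counts: List[int] = [0] * SLOTS
        self.children: Dict[int, "Sketch"] = {}


def slot(key: str, layer: int) -> int:
    """Per-layer hash taken modulo 256 (the hash choice is an assumption)."""
    digest = hashlib.sha256(f"{layer}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % SLOTS


def update(root: Sketch, key: str) -> bool:
    """Count one access to key; return True if key is judged a hotspot."""
    node = root
    while True:
        i = slot(key, node.layer)
        child = node.children.get(i)
        if child is None:
            break          # node is the last layer that exists for this key
        node = child
    node.counts[i] += 1
    if node.counts[i] <= MAX_COUNT:
        return False
    if node.layer < LAYERS:
        node.children[i] = Sketch(node.layer + 1)  # expand the next layer
        return False
    node.counts[i] = MAX_COUNT + 1  # saturate so the count cannot grow unbounded
    return True
```

With no other traffic, a single key needs 15 accesses per layer, so `update` first returns true on its 60th access. Colliding keys share counts and can therefore push each other over the limit sooner.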

This completes the identification of the hotspot primary key. In the above process, the hash function disperses data access requests across multiple hotspot indexes for accumulation; the higher the accumulated value, the higher the access frequency (this effect depends on being combined with the reset process of the hotspot identification tree). Because the number of hotspot indexes in each sketch (256) is much smaller than the number of distinct data items that data access requests may target, low-frequency data can be misreported when a hash collision places it in the same hotspot index as high-frequency data. Therefore, a multi-layer design is added to the hotspot identification tree, with each layer corresponding to its own hash function. Data mixed together by a hash collision on the first layer is then very likely to be separated by the different hash function on the second layer. With 4 layers and 4 hash functions, false positives caused by hash collisions can be greatly reduced.

In addition, it should be noted that in a non-relational database (noSQL), each piece of data has a primary key that uniquely identifies it. On the read-write link, the primary key of each accessed data will be used as an input parameter to call the above method to determine whether the accessed data is a hotspot. Each call to the above method will return a boolean type return value (that is, it can only return true or false). When the return value is true, there is a high probability that the data corresponding to the primary key passed in is a hotspot data that is being accessed frequently.

In addition, any primary key for which true is returned is recorded on each read and write access. This serves two purposes. First, the above method cannot report the specific number of visits; by recording, the real number of visits to the hotspot primary key can be obtained, so that operation and maintenance personnel can see it directly. Second, because the above method has a small probability of returning true for low-frequency data (mistaking infrequently accessed data for hotspot data), recording makes it possible to know which keys are real high-frequency hotspots and to remove the low-frequency ones, which prevents low-frequency data from being current-limited by mistake and causing trouble for users.

In addition, the number of continuously recorded primary keys is limited to 1,000. Once it exceeds 1,000, records are evicted according to the LRU principle. At the same time, records that have not been accessed for a certain period of time (such as 5 minutes) are also evicted. On the one hand, this filters out more quickly the low-frequency data that is returned with small probability; on the other hand, it limits the memory overhead so that only the access data of the most frequent hotspot keys is tracked and recorded.
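The bounded record of hotspot primary keys described above might look like the following. It is a minimal sketch, assuming an in-memory `OrderedDict` kept in least-recently-used order; the class name `HotKeyRecorder`, its parameters, and the injectable clock are hypothetical, while the defaults of 1,000 keys and 5 minutes come from the text.

```python
import time
from collections import OrderedDict
from typing import Callable


class HotKeyRecorder:
    def __init__(self, capacity: int = 1000, idle_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._idle = idle_seconds
        self._clock = clock
        # key -> [visits, last_access]; least recently used entry comes first
        self._entries: "OrderedDict[str, list]" = OrderedDict()

    def record(self, key: str) -> int:
        """Record one visit to a key reported as hot; return its visit count."""
        now = self._clock()
        self._expire(now)
        visits = self._entries.pop(key, [0, now])[0] + 1
        self._entries[key] = [visits, now]      # re-insert as most recent
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # evict by the LRU principle
        return visits

    def _expire(self, now: float) -> None:
        """Drop entries that have been idle for longer than the idle limit."""
        while self._entries:
            _, (_, last) = next(iter(self._entries.items()))
            if now - last <= self._idle:
                break
            self._entries.popitem(last=False)

    def visits(self, key: str) -> int:
        entry = self._entries.get(key)
        return entry[0] if entry else 0
```

Because entries are re-inserted on every visit, the first entry is always the one idle the longest, so both the size limit and the idle timeout only ever need to look at the front of the dictionary.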

Next, the process of resetting the hotspot identification tree will be described in detail. If the above process is regarded as the expansion of the hotspot identification tree, then the reset process is its shrinking.

Since the hotspot identification tree is updated through the above method each time a data access request is received, the number of times the above method is executed can be regarded as the number of received data access requests (that is, requests that have not been current-limited). When the number of received data access requests exceeds the preset quantity threshold, the hotspot identification tree is reset: layers 2-4 of the hotspot identification tree are deleted, and all count values in the first layer are reset to 0. The reason is that traffic is constantly changing, and data that is accessed frequently may remain high-frequency only for a period of time. A periodic shrinking mechanism is therefore needed to filter out historical high-frequency data, so that the above hotspot identification method can always correctly determine, with limited memory overhead, the data that is currently being accessed frequently.
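The reset can be sketched as a counter of updates that, once it exceeds a threshold, drops every layer below the first and zeroes the first layer. The class and attribute names below are hypothetical, and the lower layers are represented by a plain dictionary purely for illustration.

```python
from typing import Dict, List, Tuple

SLOTS = 256  # hotspot indexes per sketch (from the example)


class ResettableTree:
    def __init__(self, reset_every: int) -> None:
        self.reset_every = reset_every  # preset quantity threshold
        self.updates = 0                # data access requests seen so far
        self.first_layer: List[int] = [0] * SLOTS
        # Assumed representation: path of indexes -> counts of a deeper sketch.
        self.lower_layers: Dict[Tuple[int, ...], List[int]] = {}

    def note_update(self) -> bool:
        """Call once per update; return True if the tree was reset."""
        self.updates += 1
        if self.updates <= self.reset_every:
            return False
        self.reset()
        return True

    def reset(self) -> None:
        self.updates = 0
        self.lower_layers.clear()            # delete layers 2-4
        self.first_layer = [0] * SLOTS       # zero every first-layer count
```

After a reset the tree is back to its initial single-layer form, so a key that was hot in the previous window has to earn its way down the layers again.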

It should also be noted that the reset mechanism makes it possible to filter for hotspot primary keys whose access frequency (that is, the number of visits to this data divided by the number of visits to all data) is greater than a certain value. Specifically, the hotspot identification tree is reset every P updates. The maximum count value of each layer can then be set so as to obtain hotspot primary keys with a predetermined access frequency.
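The relationship between P, the maximum count value, and the detectable access frequency can be illustrated with simple arithmetic. The calculation below rests on assumptions that the text does not state: the key suffers no hash collisions, every layer uses the same maximum count value, and a layer is passed once its count exceeds that maximum. The function names and the value of P used in the usage note are hypothetical.

```python
def min_accesses_to_flag(layers: int, max_count: int) -> int:
    """Accesses a collision-free key needs within one reset window.

    Each layer is passed when its count exceeds max_count, which takes
    max_count + 1 accesses per layer.
    """
    return layers * (max_count + 1)


def min_detectable_frequency(layers: int, max_count: int, reset_every: int) -> float:
    """Smallest share of all requests a collision-free key must have to be flagged."""
    return min_accesses_to_flag(layers, max_count) / reset_every
```

For example, with 4 layers and a maximum count value of 14, a key needs 60 accesses per window; if the tree were reset every 2,000 updates, that would correspond to a frequency of 3%. Hash collisions lower the real bar, because other keys contribute to the same counts.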

In addition, after the hotspot primary key is discovered, the continuously recorded primary keys can be used to learn the current real read and write access throughput, and operation and maintenance personnel can then apply current limiting, for example by using a single-machine current limiter (such as RateLimiter in guava) and a fail-fast strategy that returns an exception to the client for each current-limited request, to achieve self-protection of the system.

The above method is very effective in saving memory. For each table in the database, using the parameters in the above example, a single table only occupies 500 bytes of resident memory without hot spots. There is a theoretical upper limit for the memory usage of each table, which is 30k (each table is a hotspot, and each table has multiple hotspot keys). It can be seen that the memory overhead required by using the above method has nothing to do with the actual amount of data stored, and it can be applied to database scenarios with mass storage.

Under the premise of achieving extremely low memory overhead, using the above parameters, this solution can detect keys with a minimum frequency of more than 3.5% and count qps and rt, which can fully meet the requirements for identifying hot spots.

The above method has a CPU overhead of log(N) complexity and has little impact on the response to read and write requests. In the absence of hotspots, a personal computer can support a high rate of hotspot key determinations (when there are hotspots the sketches expand, which makes the determination 1 to 2 times slower), which reflects the advantage of low performance overhead.

In addition, the false alarm rate can be calculated through the above parameters, and the above parameters can be reasonably set to reduce the false alarm rate.

Corresponding to the foregoing method embodiments, this specification also provides embodiments of devices and electronic equipment to which they are applied.

As shown in Figure 5, Figure 5 is a block diagram of a hot spot identification device according to an exemplary embodiment of this specification. The device includes:

The initialization module 500 is used to initialize the count values corresponding to multiple hotspot indexes included in the hotspot identification tree to 0 in advance; the number of hotspot indexes in the hotspot identification tree is smaller than the number of data in the database.

The primary key determination module 510 is configured to determine the primary key of the data accessed by the data access request when receiving the data access request.

The count value determination module 520 is used to calculate the hotspot index corresponding to the primary key, and determine the corresponding count value in the hotspot identification tree according to the hotspot index.

The hotspot determination module 530 is configured to determine that the primary key is a hotspot primary key in response to the count value meeting the preset hotspot condition, and to increase the count value by one in response to the count value not meeting the preset hotspot condition.

In an optional embodiment, the hotspot identification tree includes a first layer and a second layer. The count value determination module 520 is specifically configured to: calculate the first-layer hotspot index corresponding to the primary key according to the hash function corresponding to the first layer of the hotspot identification tree; in response to no second-layer hotspot index corresponding to the first-layer hotspot index existing in the hotspot identification tree, determine the count value corresponding to the first-layer hotspot index as the corresponding count value; and in response to a second-layer hotspot index corresponding to the first-layer hotspot index existing in the hotspot identification tree, determine the second-layer hotspot index corresponding to the primary key according to the hash function corresponding to the second layer, and determine the count value corresponding to that second-layer hotspot index as the corresponding count value.

In an optional embodiment, the initialization module 500 is specifically configured to: initialize in advance the count values corresponding to the multiple hotspot indexes included in the first layer of the hotspot identification tree to 0. In addition, the device also includes: an expansion module 521 (not shown in the figure), configured to, in response to the count value corresponding to the first-layer hotspot index exceeding the preset expansion threshold after being increased by 1, create multiple second-layer hotspot indexes corresponding to that first-layer hotspot index, and initialize the count values corresponding to those second-layer hotspot indexes to 0.

In an optional embodiment, the device further includes: a reset module 523 (not shown in the figure), configured to increase the number of access requests by 1 when a data access request for any data is received, and, if the number of access requests is greater than the preset access request threshold, to reset the number of access requests to 0 and set the count values corresponding to the multiple hotspot indexes included in the hotspot identification tree to 0. In this case, the hotspot determination module 530 is specifically configured to determine the primary key as a hotspot primary key in response to the count value reaching a preset hotspot threshold.

In an optional embodiment, the hotspot determination module 530 is specifically configured to determine that the primary key is a hotspot primary key in response to the ratio of the count value to the sum of all count values exceeding a preset hotspot ratio.
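The ratio-based hotspot condition can be illustrated with a few lines of code. This is only a sketch: the function name and parameters are hypothetical, and treating an all-zero tree as "not hot" is an assumption made to avoid dividing by zero.

```python
from typing import Sequence


def is_hot_by_ratio(count: int, all_counts: Sequence[int], hot_ratio: float) -> bool:
    """True if count makes up more than hot_ratio of the sum of all count values."""
    total = sum(all_counts)
    if total == 0:
        return False  # assumption: nothing has been counted yet, so nothing is hot
    return count / total > hot_ratio
```

For instance, with count values of 30, 10, and 10 and a preset hotspot ratio of 0.5, only the index holding 30 (60% of the total) satisfies the condition.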

In an optional embodiment, the device further includes: a recording module 531 (not shown in the figure), configured to record the primary key and the number of visits to the primary key when it is determined that the primary key is a hotspot primary key.

In an optional embodiment, the device further includes: a deletion module 532, configured to, when the number of recorded primary keys exceeds a preset primary key number threshold, delete the primary key with the lowest access frequency according to the recorded access volume of each primary key.
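The behavior of the recording and deletion modules can be sketched together as follows. The function name `record_visit` and the dictionary representation are hypothetical, and ties between equally cold keys are broken by insertion order, which is a choice made for the example rather than something the specification prescribes.

```python
from typing import Dict


def record_visit(visits: Dict[str, int], primary_key: str, max_keys: int) -> None:
    """Record one visit, then delete the least-visited keys while over the limit."""
    visits[primary_key] = visits.get(primary_key, 0) + 1
    while len(visits) > max_keys:
        coldest = min(visits, key=visits.get)  # first key with the lowest count
        del visits[coldest]
```

Note that under this rule a newly recorded key with a single visit can itself be the one deleted when every existing key has more visits, so new keys displace existing ones only when an existing key is at least as cold.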

As shown in Figure 6, Figure 6 is a block diagram of a current limiting device according to an exemplary embodiment of this specification. The device includes:

The request receiving module 610 is used to receive a data access request for target data in the database, where the data access request carries the primary key corresponding to the target data;

The request blocking module 620 is configured to block data access requests for the target data in response to the fact that the primary key belongs to a hotspot primary key; the hotspot primary key is obtained through the aforementioned hotspot identification method.

The specific implementation process of the functions and effects of each module in the above device can be found in the implementation process of the corresponding steps in the above method, and will not be described again here.

As for the device embodiment, since it basically corresponds to the method embodiment, please refer to the description of the method embodiment for relevant details. The device embodiments described above are only illustrative. The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. A person of ordinary skill in the art can understand and implement this without creative effort.

As shown in Figure 7, Figure 7 shows a hardware structure diagram of an electronic device in which the hotspot identification device or the current limiting device is located according to the embodiment. The device may include: a processor 1010, a memory 1020 for storing computer instructions, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively connected to each other within the device through the bus 1050.

The processor 1010 can be implemented using a general-purpose CPU (Central Processing Unit), a microprocessor, an application specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute computer instructions to implement the above hotspot identification method or current limiting method.

The memory 1020 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc. The memory 1020 can store operating systems and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 1020 and called and executed by the processor 1010 .

The input/output interface 1030 is used to connect the input/output module to realize information input and output. The input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions. Input devices can include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices can include monitors, speakers, vibrators, indicator lights, etc.

The communication interface 1040 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module can realize communication through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).

Bus 1050 includes a path that carries information between various components of the device (eg, processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).

It should be noted that although the above device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art can understand that the above-mentioned device may include only the components necessary to implement the embodiments of this specification, and does not necessarily include all components shown in the drawings.

Embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the above-mentioned hot spot identification method and/or current limiting method is implemented.

Computer-readable media include persistent and non-persistent, removable and non-removable media that can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined in this specification, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Embodiments of this specification also provide a computer program that, when run, implements the foregoing hotspot identification method or the foregoing current limiting method.

It should also be noted that the terms "comprises," "includes," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that includes a list of elements includes not only those elements but also other elements that are not expressly listed or that are inherent to the process, method, article, or apparatus. Without further limitation, an element defined by the statement "comprises a..." does not exclude the presence of additional identical elements in a process, method, article, or apparatus that includes the stated element.

The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desired results. Additionally, the processes depicted in the figures do not necessarily require the specific order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible or may be advantageous in certain implementations.

Claims

A hotspot identification method that initializes the count values corresponding to multiple hotspot indexes included in the hotspot identification tree to 0 in advance; the number of hotspot indexes in the hotspot identification tree is smaller than the number of data in the database; the method includes:

When a data access request is received, determine the primary key of the data accessed by the data access request;

Calculate the hotspot index corresponding to the primary key, and determine the corresponding count value in the hotspot identification tree according to the hotspot index;

In response to the count value meeting the preset hotspot condition, it is determined that the primary key is the hotspot primary key; in response to the count value not meeting the preset hotspot condition, the count value is increased by one.
The method according to claim 1, wherein the hotspot identification tree includes a first layer and a second layer;

Calculating the hotspot index corresponding to the primary key and determining the corresponding count value in the hotspot identification tree based on the hotspot index includes:

Calculate the first-level hotspot index corresponding to the primary key according to the hash function corresponding to the first level of the hotspot identification tree;

In response to no second-layer hotspot index corresponding to the first-layer hotspot index existing in the hotspot identification tree, determine the count value corresponding to the first-layer hotspot index as the corresponding count value;

In response to a second-layer hotspot index corresponding to the first-layer hotspot index existing in the hotspot identification tree, determine the second-layer hotspot index corresponding to the primary key according to the hash function corresponding to the second layer, and determine the count value corresponding to that second-layer hotspot index as the corresponding count value.
The method according to claim 2, wherein pre-initializing the count values corresponding to multiple hotspot indexes included in the hotspot identification tree to 0 includes:

Initialize the count values corresponding to multiple hotspot indexes included in the first layer of the hotspot identification tree to 0 in advance;

The method also includes:

In response to the count value corresponding to the first-layer hotspot index exceeding the preset expansion threshold after being increased by 1, multiple second-layer hotspot indexes corresponding to the first-layer hotspot index are created, and the count values corresponding to the multiple second-layer hotspot indexes are initialized to 0.
The method of claim 1, further comprising:

When a data access request is received for any data, the number of access requests is increased by 1;

When the number of access requests is greater than the preset access request threshold, reset the number of access requests to 0, and set the count values corresponding to multiple hotspot indexes included in the hotspot identification tree to 0;

Determining that the primary key is a hotspot primary key in response to the count value meeting a preset hotspot condition includes:

In response to the count value reaching a preset hotspot threshold, the primary key is determined to be a hotspot primary key.
The method of claim 1, wherein determining that the primary key is a hotspot primary key in response to the count value meeting a preset hotspot condition includes:

In response to the ratio of the count value to the sum of all count values exceeding a preset hotspot ratio, the primary key is determined to be a hotspot primary key.
The method of claim 1, further comprising:

When it is determined that the primary key is a hotspot primary key, record the primary key and record the access volume of the primary key.
The method of claim 6, further comprising:

When the number of recorded primary keys exceeds the preset primary key number threshold, the primary key with the lowest access frequency will be deleted based on the recorded access volume of each primary key.
A current limiting method includes:

Receive a data access request for target data in the database, where the data access request carries the primary key corresponding to the target data;

In response to the primary key belonging to a hotspot primary key, data access requests to the target data are blocked; the hotspot primary key is obtained by the hotspot identification method described in any one of claims 1-7.
A hotspot identification device that pre-initializes the count values corresponding to multiple hotspot indexes included in the hotspot identification tree to 0; the number of hotspot indexes in the hotspot identification tree is smaller than the number of data in the database; the device includes:

A primary key determination module, configured to determine the primary key of the data accessed by the data access request when receiving a data access request;

A count value determination module, used to calculate the hotspot index corresponding to the primary key, and determine the corresponding count value in the hotspot identification tree according to the hotspot index;

A hotspot determination module, configured to determine that the primary key is a hotspot primary key in response to the count value meeting a preset hotspot condition; and increment the count value by one in response to the count value not meeting the preset hotspot condition.
A current limiting device including:

A request receiving module, configured to receive a data access request for target data in the database, where the data access request carries the primary key corresponding to the target data;

A request blocking module, configured to block data access requests to the target data in response to the fact that the primary key belongs to a hotspot primary key; the hotspot primary key is obtained by the hotspot identification method described in any one of claims 1-7.
An electronic device including:

processor;

Memory used to store instructions executable by the processor;

Wherein, the processor executes the executable instructions to implement the hot spot identification method according to any one of claims 1 to 7 or the current limiting method according to claim 8.
A computer-readable storage medium on which computer instructions are stored, wherein when the computer instructions are executed by a processor, the hot spot identification method according to any one of claims 1 to 7 or the current limiting method according to claim 8 is implemented.
A computer program that implements the hot spot identification method according to any one of claims 1 to 7 or the current limiting method according to claim 8 when the computer program is run.