WO2023179414A1 - Hotspot recognition method and rate limiting method - Google Patents


Info

Publication number
WO2023179414A1
Authority
WO
WIPO (PCT)
Prior art keywords
hotspot
primary key
count value
layer
data
Prior art date
Application number
PCT/CN2023/081466
Other languages
French (fr)
Chinese (zh)
Inventor
陈亚东
汪翔
杨文龙
沈春辉
杨成虎
Original Assignee
阿里云计算有限公司 (Alibaba Cloud Computing Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 阿里云计算有限公司 (Alibaba Cloud Computing Co., Ltd.)
Publication of WO2023179414A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models

Definitions

  • One or more embodiments of this specification relate to the field of computer application technology, and in particular, to a hotspot identification method and a rate limiting method.
  • In a distributed database, the data of a large database is divided into multiple parts that are stored on different physical machines. Access to any one piece of data is generally served by a single physical machine, so if a piece of data is accessed very frequently within a certain period of time, a hotspot forms: in the distributed system, access requests become concentrated on one physical machine. Hotspot problems are common in scenarios such as Weibo trending searches and e-commerce flash sales. When a hotspot problem occurs, the physical machine hosting the hotspot may go down under the heavy traffic, affecting the stability of the distributed database service.
  • In such a database, each piece of data generally corresponds to a primary key (key).
  • Some distributed databases identify hotspots by counting the number of accesses to each primary key. However, for databases that store a large amount of data on a single physical machine, this approach occupies a lot of memory and affects the normal operation of the database.
  • One or more embodiments of this specification provide a hotspot identification method and a rate limiting method.
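  • A hotspot identification method is provided, in which the count values corresponding to the multiple hotspot indexes included in a hotspot identification tree are initialized to 0 in advance, and the number of hotspot indexes in the hotspot identification tree is less than the number of data items in the database. The method includes: upon receiving a data access request, determining the primary key of the data accessed by the request; calculating the hotspot index corresponding to the primary key and determining the corresponding count value in the hotspot identification tree according to the hotspot index; in response to the count value meeting a preset hotspot condition, determining that the primary key is a hotspot primary key; and in response to the count value not meeting the preset hotspot condition, increasing the count value by one.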
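  • A rate limiting method is provided, including: receiving a data access request for target data in the database, where the request carries the primary key corresponding to the target data; and blocking the data access request for the target data in response to the primary key being a hotspot primary key.
  • The hotspot primary key is obtained through the aforementioned hotspot identification method.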
  • A hotspot identification device is provided, which initializes in advance the count values corresponding to the multiple hotspot indexes included in the hotspot identification tree to 0, the number of hotspot indexes in the hotspot identification tree being smaller than the number of data items in the database. The device includes:
  • a primary key determination module configured to determine, upon receiving a data access request, the primary key of the data accessed by the request;
  • a count value determination module configured to calculate the hotspot index corresponding to the primary key and determine the corresponding count value in the hotspot identification tree according to the hotspot index;
  • a hotspot determination module configured to determine that the primary key is a hotspot primary key in response to the count value meeting a preset hotspot condition, and to increment the count value by one in response to the count value not meeting the preset hotspot condition.
  • A rate limiting device is provided, including:
  • a request receiving module configured to receive a data access request for target data in the database, where the data access request carries the primary key corresponding to the target data;
  • a request blocking module configured to block data access requests for the target data in response to the primary key being a hotspot primary key, where the hotspot primary key is obtained through the aforementioned hotspot identification method.
  • An electronic device is provided, including:
  • a processor; and
  • a memory configured to store instructions executable by the processor;
  • where the processor executes the executable instructions to implement the foregoing hotspot identification method or the foregoing rate limiting method.
  • A computer-readable storage medium is provided, on which computer instructions are stored; when the computer instructions are executed by a processor, the aforementioned hotspot identification method or the aforementioned rate limiting method is implemented.
  • This specification provides a hotspot identification method and a rate limiting method.
  • The count values corresponding to the multiple hotspot indexes included in the hotspot identification tree are initialized to 0 in advance; the number of hotspot indexes in the hotspot identification tree is less than the number of data items in the database.
  • When a data access request is received, the primary key of the data accessed by the request is determined; the hotspot index corresponding to the primary key is calculated, and the corresponding count value in the hotspot identification tree is determined based on the hotspot index. The count value represents the number of times the data corresponding to the hotspot index has been accessed. In response to the count value meeting the preset hotspot condition, the primary key is determined to be a hotspot primary key; in response to the count value not meeting the preset hotspot condition, the count value is increased by one.
  • The method counts accesses per hotspot index rather than per primary key. Because there are fewer hotspot indexes than primary keys in the database, hotspots can be identified with reasonable accuracy while occupying less memory, and the identified hotspots can then be handled. This keeps hotspot identification from occupying large amounts of memory and keeps hotspots from disrupting the normal operation of database services.
  • Figure 1 is a flow chart of a hotspot identification method illustrated in this specification according to an exemplary embodiment.
  • Figure 2 is a schematic structural diagram of a hotspot identification tree shown in this specification according to an exemplary embodiment.
  • Figure 3 is a flow chart of a rate limiting method according to an exemplary embodiment of this specification.
  • FIG. 4A is a schematic structural diagram of a sketch according to a specific embodiment of this specification.
  • FIG. 4B is a schematic structural diagram of a hotspot identification tree according to a specific embodiment of this specification.
  • FIG. 5 is a block diagram of a hotspot identification device according to an exemplary embodiment of this specification.
  • FIG. 6 is a block diagram of a rate limiting device according to an exemplary embodiment of this specification.
  • FIG. 7 is a hardware structure diagram of an electronic device in which a hotspot identification device or a rate limiting device is located according to an exemplary embodiment of this specification.
  • In other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification.
  • In some other embodiments, the method may include more or fewer steps than described in this specification.
  • In addition, a single step described in this specification may be broken down into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.
  • In a distributed database, user data is generally sorted in byte-array order, and the sorted data is split into segments.
  • The segments are stored in different modules (regions) of the database, and each region provides access services for its part of the data.
  • This design is generally called range sharding and is common in the distributed systems field. Under this design, any piece of user data is uniquely located in one range shard.
  • Hotspot problems are usually handled by rate limiting; the difficulty lies in identifying and discovering the hotspots themselves.
  • Some in-memory databases count the number of accesses to every primary key to determine hotspot primary keys.
  • However, this approach suits only scenarios in which the amount of data stored on a single machine is limited, such as in-memory databases.
  • In databases such as persistent databases that store a large amount of data on a single machine (several TB of data, with throughput possibly on the order of hundreds of thousands of QPS), counting the accesses to each key would occupy a large amount of memory, making the approach impractical and inefficient.
  • To this end, this specification provides a hotspot identification method and a rate limiting method.
  • The count values corresponding to the multiple hotspot indexes included in the hotspot identification tree are initialized to 0 in advance; the number of hotspot indexes in the hotspot identification tree is smaller than the number of data items in the database.
  • When a data access request is received, the primary key of the data accessed by the request is determined; the hotspot index corresponding to the primary key is calculated, and the corresponding count value in the hotspot identification tree is determined based on the hotspot index. The count value represents the number of times the data corresponding to the hotspot index has been accessed. If the count value meets the preset hotspot condition, the primary key is determined to be a hotspot primary key; otherwise, the count value is increased by one.
  • Because the method counts per hotspot index, and there are fewer hotspot indexes than primary keys in the database, it can identify hotspots with reasonable accuracy while occupying less memory, and the identified hotspots can then be handled, so that hotspot identification does not occupy large amounts of memory and hotspots do not disrupt the normal operation of database services.
  • Figure 1 is a flow chart of a hotspot identification method illustrated in this specification according to an exemplary embodiment, including the following steps:
  • Step 103 Upon receiving a data access request, determine the primary key of the data accessed by the data access request.
  • Before this, the count values corresponding to the multiple hotspot indexes included in the hotspot identification tree need to be initialized to 0 in advance; the number of hotspot indexes in the hotspot identification tree is less than the number of data items in the database.
  • The primary key needs to be determined in step 103 because the data access request is a request to read or write data.
  • The database targeted by data access requests in this specification is a non-relational database.
  • In it, each piece of data corresponds to a primary key, so access requests for different data can be distinguished by their primary keys. Therefore, to count the approximate ("fuzzy") number of accesses to different data (explained below), the primary key of the data accessed by the data access request must be determined first.
  • The hotspot identification tree is used to count the fuzzy access counts (that is, the count values) of different data and thereby identify hotspots. It therefore has to be initialized in advance so that no accesses are recorded, that is, the count values corresponding to the hotspot indexes in the tree are cleared before fuzzy access counts are collected.
  • The number of hotspot indexes in the hotspot identification tree is smaller than the number of data items in the database in order to save memory.
  • Each fuzzy access count corresponds to multiple primary keys. Keeping the number of hotspot indexes smaller than the number of primary keys in the database saves memory space, so that the method in this specification can support hotspot identification in databases holding large amounts of data.
  • The hotspot identification tree is a tree structure that records the count value of each hotspot index. It may have one layer or multiple layers. With one layer, the hotspot indexes and their count values are all stored in that single layer.
  • With multiple layers, each layer of the hotspot identification tree stores multiple hotspot indexes and their corresponding count values.
  • A hotspot index in an upper layer corresponds to multiple hotspot indexes in the layer below.
  • This correspondence arises because the layers calculate hotspot indexes in different ways: the multiple primary keys that map to one hotspot index in the upper layer are mapped, by the lower layer's different calculation method, to multiple hotspot indexes in the lower layer.
  • A hotspot index in the upper layer and the multiple lower-layer hotspot indexes corresponding to it therefore represent the same batch of data. In a multi-layer structure, the amount of data corresponding to each hotspot index thus decreases as the number of layers increases.
  • The way the hotspot identification tree is expanded also needs to be explained.
  • When the count value of a hotspot index grows large, the next-layer hotspot indexes corresponding to that hotspot index are expanded, in order to determine which of the data corresponding to that hotspot index is the hotspot.
  • Step 105 Calculate the hotspot index corresponding to the primary key, and determine the corresponding count value in the hotspot identification tree according to the hotspot index.
  • Before step 105 is described further, the meaning of each term it involves is explained.
  • The hotspot index is an index value calculated by a method such as a hash function: the primary key is used as input, and the hotspot index is computed from it by a hash function or a similar method.
  • The count value corresponding to a hotspot index in the hotspot identification tree represents the number of accesses to the data corresponding to that hotspot index.
  • More precisely, the count value is the sum of the numbers of accesses, within a period of time, to the multiple primary keys (data) that map to the hotspot index; that is, the count value represents the fuzzy access volume of the data corresponding to the hotspot index.
  • In a multi-layer tree, the count value corresponding to any hotspot index in any layer represents the access volume of the data corresponding to that hotspot index. For example, if the hotspot identification tree has two layers, the data corresponding to a second-layer hotspot index is the part of the data under its parent first-layer hotspot index whose second-layer index value equals that second-layer hotspot index.
  • Continuing the two-layer example: if a first-layer hotspot index has no corresponding second-layer hotspot indexes, or some of its second-layer hotspot indexes have a count value of 0, the data corresponding to those hotspot indexes (the first-layer index in the former case, those second-layer indexes in the latter) is rarely accessed. If a first-layer hotspot index has second-layer hotspot indexes whose count values are non-zero, the data corresponding to those second-layer hotspot indexes has a moderate access volume. If the count value of the second-layer hotspot index corresponding to a primary key (data) is full, the primary key (data) is heavily accessed and is a hotspot primary key.
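  • The fuzzy count can be illustrated with a minimal single-layer sketch in Python. The names `hotspot_index` and `NUM_INDEXES`, and the use of BLAKE2b as the hash function, are assumptions for illustration rather than the patent's implementation; the table is deliberately tiny so that different primary keys visibly share a count value.

```python
import hashlib

NUM_INDEXES = 8  # hypothetical, deliberately tiny so that hash collisions are likely


def hotspot_index(primary_key: str, num_indexes: int = NUM_INDEXES) -> int:
    """Map a primary key to a hotspot index: hash value modulo the table size."""
    digest = hashlib.blake2b(primary_key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_indexes


counts = [0] * NUM_INDEXES  # one count value per hotspot index, initialized to 0

accesses = ["user:1"] * 5 + ["user:2"] * 3 + ["order:9"]
for key in accesses:
    counts[hotspot_index(key)] += 1

# Each count value sums the accesses of every key mapped to that index, so it is
# an approximate ("fuzzy") access volume, never smaller than any one key's true count.
for key in sorted(set(accesses)):
    idx = hotspot_index(key)
    print(key, "-> index", idx, "fuzzy count", counts[idx])
```

  • Memory is fixed at `NUM_INDEXES` counters regardless of how many distinct keys are accessed; the price is that colliding keys inflate each other's counts, which is what the multi-layer design later addresses.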
  • Step 107 In response to the count value meeting the preset hotspot condition, determine that the primary key is the hotspot primary key; in response to the count value not meeting the preset hotspot condition, add one to the count value.
  • A primary key may be identified as a hotspot primary key when its number of accesses within a certain period exceeds a certain value, when the count value accumulated within a certain period exceeds a certain value, or when the ratio of the fuzzy access count represented by the count value to the total number of accesses within a certain period exceeds a certain value. While the primary key is not a hotspot primary key, the count value of its hotspot index continues to be accumulated.
  • step 107 may specifically include: in response to the ratio of the count value to the sum of all count values exceeding a preset hotspot ratio, determining the primary key to be a hotspot primary key.
  • Here, the count value may be the accumulated count value of the last layer (the count value of each layer carries over to the next layer; for example, if the count value in the first layer is 15 when the next layer is expanded, the second-layer count starts from 15).
  • The sum of all count values can also be taken as the total number of accesses (because each arriving access request increases one count value by 1, the total number of accesses equals the sum of all count values).
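  • A sketch of the ratio-based hotspot condition, assuming a single-layer tree and a running total that equals the sum of all count values. `RatioHotspotDetector`, `hot_ratio`, and `min_total` are hypothetical names; `min_total` is an added warm-up guard (not in the text) so that the first few requests, when every ratio is large, do not trigger false hotspots.

```python
import hashlib


class RatioHotspotDetector:
    """Single-layer hotspot detector using the ratio condition (illustrative only)."""

    def __init__(self, num_indexes: int = 256, hot_ratio: float = 0.2,
                 min_total: int = 100) -> None:
        self.counts = [0] * num_indexes  # count values, initialized to 0
        self.total = 0                   # always equals sum(self.counts)
        self.hot_ratio = hot_ratio
        self.min_total = min_total       # hypothetical warm-up guard

    def _index(self, key: str) -> int:
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self.counts)

    def access(self, key: str) -> bool:
        """Return True if `key` is judged to be a hotspot primary key."""
        idx = self._index(key)
        if self.total >= self.min_total and self.counts[idx] / self.total > self.hot_ratio:
            return True  # this index's share of all accesses exceeds the hotspot ratio
        self.counts[idx] += 1  # condition not met: count value plus one
        self.total += 1
        return False


detector = RatioHotspotDetector()
cold_results = [detector.access(f"user:{i}") for i in range(1000)]
hot_results = [detector.access("flash-sale-item") for _ in range(400)]
print(any(cold_results), hot_results[0], hot_results[-1])
```

  • Once a key's share crosses `hot_ratio`, its count stops growing, so it stays hot until other traffic dilutes the ratio or the tree is reset.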
  • Step 107 can also be implemented as follows.
  • The method further includes: upon receiving a data access request for any data, incrementing an access request count by 1; and when the access request count exceeds a preset access request threshold, resetting the access request count to 0 and setting the count values corresponding to the multiple hotspot indexes included in the hotspot identification tree to 0. Step 107 then includes: determining the primary key to be a hotspot primary key in response to the count value reaching a preset hotspot threshold.
  • In this way, the hotspot identification tree is reset and statistics restart after a fixed number of requests.
  • The preset hotspot condition can then be that the count value exceeds the preset hotspot threshold, which effectively tests the proportion of all access traffic within that window taken up by the hotspot data.
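  • The request-count window can be sketched as follows, again on a single-layer tree for brevity. `WindowedHotspotDetector`, `hot_threshold`, and `request_threshold` are hypothetical names. The sketch resets the request counter and every count value once a window of `request_threshold` requests is complete, so the hotspot threshold acts as a share of each window's traffic.

```python
import hashlib


class WindowedHotspotDetector:
    """Single-layer detector reset after every `request_threshold` requests (illustrative)."""

    def __init__(self, num_indexes: int = 256, hot_threshold: int = 50,
                 request_threshold: int = 1000) -> None:
        self.counts = [0] * num_indexes
        self.hot_threshold = hot_threshold
        self.request_threshold = request_threshold
        self.requests = 0  # access requests seen in the current window

    def _index(self, key: str) -> int:
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self.counts)

    def access(self, key: str) -> bool:
        if self.requests >= self.request_threshold:
            # Window complete: reset the request counter and every count value.
            self.requests = 0
            self.counts = [0] * len(self.counts)
        self.requests += 1
        idx = self._index(key)
        if self.counts[idx] >= self.hot_threshold:
            return True  # count value has reached the preset hotspot threshold
        self.counts[idx] += 1
        return False


detector = WindowedHotspotDetector(hot_threshold=50, request_threshold=1000)
burst = [detector.access("item:42") for _ in range(60)]
for i in range(940):
    detector.access(f"user:{i}")
after_reset = detector.access("item:42")  # the 1001st request opens a new window
```

  • In this run the 51st access to `item:42` is the first judged hot, and the first request of the next window starts again from a zero count.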
  • Alternatively, the hotspot identification tree can be reset at regular time intervals (that is, reinitialized).
  • The hotspot identification condition can then be that the count value exceeds a preset hotspot threshold, so that whether a hotspot exists is determined by whether the number of accesses within the time period exceeds a predetermined threshold.
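  • For the time-based variant, a sketch with an injectable clock (so that it can be tested deterministically) might look like this. `TimedHotspotDetector`, `interval_s`, and the default `time.monotonic` clock are assumptions. Following the text, a key is hot once its count value exceeds the threshold within the current interval.

```python
import hashlib
import time
from typing import Callable


class TimedHotspotDetector:
    """Single-layer detector whose count values are cleared every `interval_s` seconds."""

    def __init__(self, num_indexes: int = 256, hot_threshold: int = 50,
                 interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.counts = [0] * num_indexes
        self.hot_threshold = hot_threshold
        self.interval_s = interval_s
        self.clock = clock
        self.window_start = clock()

    def _index(self, key: str) -> int:
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self.counts)

    def access(self, key: str) -> bool:
        now = self.clock()
        if now - self.window_start >= self.interval_s:
            self.counts = [0] * len(self.counts)  # periodic reinitialization
            self.window_start = now
        idx = self._index(key)
        if self.counts[idx] > self.hot_threshold:
            return True  # count value exceeds the preset hotspot threshold
        self.counts[idx] += 1
        return False


fake_now = [0.0]
detector = TimedHotspotDetector(hot_threshold=3, interval_s=1.0, clock=lambda: fake_now[0])
first_window = [detector.access("item:42") for _ in range(5)]
fake_now[0] = 1.5  # one interval later, so the next access triggers a reset
second_window = detector.access("item:42")
```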
  • When a hotspot primary key is identified, it can also be recorded, and its exact access count can be tracked from then on. This mainly makes it easier to observe traffic and to handle hotspots.
  • That is, the method further includes: when the primary key is determined to be a hotspot primary key, recording the primary key and recording the number of accesses to it.
  • Some of the recorded primary keys can be deleted according to a cache eviction strategy, such as least recently used (LRU) or least frequently used (LFU).
  • LRU least recently used
  • LFU least frequently used
  • For example, the primary key with the lowest access frequency can be deleted (because the number of hotspot indexes is less than the number of primary keys, a key with a low access frequency may have been misidentified as hotspot data). Alternatively, when the number of recorded primary keys exceeds a certain threshold, the primary keys that have not been accessed within a certain period can be deleted.
  • the method also includes: when the number of recorded primary keys exceeds a preset primary key number threshold, deleting the primary key with the lowest access frequency based on the access volume of each recorded primary key.
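  • Recording hotspot primary keys with frequency-based eviction can be sketched with `collections.Counter`. `HotKeyRecorder` and `max_keys` are hypothetical names. As a design choice not stated in the text, the key just recorded is excluded from eviction, so that a newly identified hotspot is not dropped before it can accumulate accesses.

```python
from collections import Counter


class HotKeyRecorder:
    """Track exact access counts for keys already identified as hotspots (illustrative)."""

    def __init__(self, max_keys: int = 1000) -> None:
        self.max_keys = max_keys        # preset primary key number threshold
        self.access_counts = Counter()  # primary key -> recorded access count

    def record(self, key: str) -> None:
        self.access_counts[key] += 1
        if len(self.access_counts) > self.max_keys:
            # Evict the recorded key with the lowest access count (LFU-style),
            # skipping the key that was just recorded.
            victim = min((k for k in self.access_counts if k != key),
                         key=self.access_counts.__getitem__)
            del self.access_counts[victim]


recorder = HotKeyRecorder(max_keys=2)
for _ in range(5):
    recorder.record("item:1")
for _ in range(3):
    recorder.record("item:2")
recorder.record("item:3")  # a third key exceeds max_keys, so "item:2" is evicted
```

  • The `min` scan is linear in the number of recorded keys; that is acceptable while the record stays small, and a heap or frequency buckets would be the usual upgrade if it grows.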
  • the hotspot identification tree has multiple layers.
  • The purpose of giving the hotspot identification tree multiple layers is to avoid hash collisions while further saving memory.
  • The hotspot index is calculated by some calculation method, and because there are fewer hotspot indexes than primary keys, each hotspot index corresponds to multiple primary keys. With a single-layer tree, the different primary keys that map to the same hotspot index can only be told apart, as far as possible, by increasing the number of hotspot indexes.
  • For example, the first layer may include N hotspot indexes, and each first-layer hotspot index corresponds to M second-layer hotspot indexes;
  • each second-layer hotspot index can correspond to Q third-layer hotspot indexes (N, M, and Q are preset positive integers and may be equal or different), and so on.
  • Hotspot indexes in different layers use different index calculation methods (for example, different hash functions), which avoids hash collisions (when the index calculation method is a hash function) and reduces the possibility of misidentifying some data as hotspots.
  • Initially, the hotspot identification tree has only one layer.
  • When the count value corresponding to any hotspot index in this layer exceeds a preset expansion threshold, the next-layer hotspot indexes corresponding to that hotspot index are created.
  • In practice, most hotspot indexes stop at the first few layers, and few reach the last layer.
  • Therefore, for the same amount of storage space, hash collisions can be avoided more effectively.
  • Figure 2 shows a structure in which multiple layers have corresponding hotspot indexes; in actual use, not every first-layer hotspot index has corresponding second-layer hotspot indexes.
  • the multi-layer structure of the hotspot identification tree can avoid hash conflicts and reduce the memory space occupied.
  • the hotspot identification tree includes a first layer and a second layer.
  • Step 105 specifically includes: calculating the first-layer hotspot index corresponding to the primary key according to the hash function corresponding to the first layer of the hotspot identification tree; in response to no second-layer hotspot index corresponding to that first-layer hotspot index existing in the hotspot identification tree, determining the count value corresponding to the first-layer hotspot index as the corresponding count value; and in response to second-layer hotspot indexes corresponding to the first-layer hotspot index existing in the hotspot identification tree, determining the second-layer hotspot index corresponding to the primary key according to the hash function corresponding to the second layer, and determining the count value corresponding to that second-layer hotspot index as the corresponding count value.
  • the hash function corresponding to the first layer is different from the hash function corresponding to the second layer.
  • In other words, the determined count value is the count value at the last layer reached by the primary key. Specifically, when the first-layer hotspot index corresponding to the primary key has second-layer hotspot indexes, the count value of the second-layer hotspot index corresponding to the primary key is used as the determined count value; when it has no second-layer hotspot indexes, the count value of the first-layer hotspot index is used.
  • Besides determining the count value, there is also the process of expanding the next layer. Specifically, initializing in advance the count values corresponding to the multiple hotspot indexes included in the hotspot identification tree to 0 includes: initializing in advance the count values corresponding to the multiple first-layer hotspot indexes to 0.
  • The method further includes: in response to the count value corresponding to the first-layer hotspot index exceeding a preset expansion threshold after being incremented by 1, creating multiple second-layer hotspot indexes corresponding to the first-layer hotspot index and attaching them to that first-layer hotspot index in the tree;
  • and initializing the count values corresponding to those second-layer hotspot indexes to 0.
  • That is, only the first-layer hotspot indexes are initialized during pre-initialization;
  • next-layer hotspot indexes are initialized when they are expanded.
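  • The two-layer lookup and expansion described above might be sketched as follows. The class name, the widths `n` and `m`, both thresholds, and the use of BLAKE2b salted with the layer number as the per-layer hash function are assumptions. Second-layer count values start at 0 when created, as in the steps above; the embodiment in which the parent's count carries forward is not modeled.

```python
import hashlib


def layer_hash(key: str, layer: int, width: int) -> int:
    """Hotspot index of `key` in `layer`; each layer uses a differently salted hash."""
    salt = layer.to_bytes(16, "big")  # BLAKE2b accepts a salt of up to 16 bytes
    digest = hashlib.blake2b(key.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big") % width


class TwoLayerHotspotTree:
    def __init__(self, n: int = 256, m: int = 16,
                 expand_threshold: int = 10, hot_threshold: int = 20) -> None:
        self.n, self.m = n, m
        self.first = [0] * n  # only first-layer count values are pre-initialized
        self.second = {}      # first-layer index -> list of M second-layer count values
        self.expand_threshold = expand_threshold
        self.hot_threshold = hot_threshold

    def access(self, key: str) -> bool:
        i = layer_hash(key, 0, self.n)
        children = self.second.get(i)
        if children is None:
            # No second layer yet: the first-layer count value is the count value.
            if self.first[i] >= self.hot_threshold:
                return True
            self.first[i] += 1
            if self.first[i] > self.expand_threshold:
                self.second[i] = [0] * self.m  # expand, initializing counts to 0
            return False
        # A second layer exists: use the second layer's (different) hash function.
        j = layer_hash(key, 1, self.m)
        if children[j] >= self.hot_threshold:
            return True
        children[j] += 1
        return False


tree = TwoLayerHotspotTree()
results = [tree.access("item:42") for _ in range(40)]
```

  • With these parameters, the 11th access expands the key's first-layer index, and the next 20 accesses fill its second-layer count, so the 32nd access is the first to return True.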
  • The false positive rate can be calculated from the number of layers of the hotspot identification tree, the theoretical maximum number of hotspot indexes in each layer, the maximum value the count value can reach (when a hotspot primary key is identified by its count value exceeding the predetermined hotspot threshold), and the access request threshold (when one is used), so each threshold can be set to meet the hotspot identification requirements. And because the method occupies little memory, its performance overhead is small.
  • After a hotspot primary key is identified, it also needs to be handled, to prevent it from affecting the operation of a single machine. Hotspots can be handled in many ways; rate limiting is used below as an example to illustrate how hotspot primary keys are handled.
  • Figure 3 is a flow chart of a rate limiting method according to an exemplary embodiment of this specification, including:
  • Step 301 Receive a data access request for target data in the database, where the data access request carries the primary key corresponding to the target data.
  • Step 303 In response to the primary key belonging to the hotspot primary key, block the data access request for the target data.
  • the hotspot primary key is obtained by the aforementioned hotspot identification method. In this way, data access requests for hotspot primary keys are blocked, thereby achieving the purpose of protecting the system itself.
  • Blocking the access request for the target data may be implemented as follows: after hotspot primary keys are identified through the aforementioned hotspot identification method, an access request is blocked if its primary key is a hotspot primary key.
  • Blocking can use a stand-alone rate limiter, for example, RateLimiter in Guava.
  • Other rate limiters can also be used (this specification does not limit the type of rate limiter). An access request is blocked by the system after this further processing (for example, after the aforementioned method has judged whether it targets hot data), which saves system performance.
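  • A stand-alone limiter in the spirit of Guava's `RateLimiter` can be approximated in Python with a token bucket. This is a swapped-in sketch, not Guava's API or algorithm. `TokenBucket`, `HotspotGate`, `permits_per_second`, and the `is_hot` callback (which would be backed by the hotspot identification method) are hypothetical. Requests for non-hotspot keys pass straight through; requests for hotspot keys are admitted only while their bucket holds a token.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Admit roughly `rate` requests per second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class HotspotGate:
    """Block excess requests only for primary keys judged to be hotspots."""

    def __init__(self, is_hot: Callable[[str], bool], permits_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.is_hot = is_hot
        self.permits_per_second = permits_per_second
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}  # one bucket per hotspot key

    def allow(self, key: str) -> bool:
        if not self.is_hot(key):
            return True  # non-hotspot keys are never limited
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.permits_per_second, self.permits_per_second, self.clock)
            self.buckets[key] = bucket
        return bucket.try_acquire()


fake_now = [0.0]
gate = HotspotGate(is_hot={"item:42"}.__contains__, permits_per_second=2,
                   clock=lambda: fake_now[0])
burst = [gate.allow("item:42") for _ in range(3)]
cold = gate.allow("user:7")
fake_now[0] = 0.5  # half a second later, one token has been refilled
later = [gate.allow("item:42") for _ in range(2)]
```

  • Setting `permits_per_second` to 0 makes every request for a hotspot key fail, which corresponds to blocking hotspot requests outright.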
  • Hotspot primary keys can also be identified through the recorded access counts.
  • When the recorded access count of a primary key exceeds a certain value, data access requests for that primary key are blocked, which improves the accuracy of rate limiting.
  • In a specific embodiment, the hotspot identification tree has a 4-layer structure.
  • The first layer includes 1 sketch;
  • the second layer includes one sketch for each hotspot index in the first layer, that is, at most 256 sketches;
  • the third layer includes at most 256*256 sketches;
  • the fourth layer includes at most 256*256*256 sketches (for convenience, only one sketch is shown per layer). Since each sketch has 256 hotspot indexes, the maximum number of hotspots this hotspot identification tree can identify is 256 to the 4th power (that is, the number of hotspot indexes in the fourth layer). Note also that different layers correspond to different hash functions.
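  • The capacity figures of this embodiment can be checked with a few lines of arithmetic. The 4-bit counter size is an assumption inferred from the maximum count value of 14 used by this embodiment; the text does not state how counters are stored.

```python
WIDTH = 256  # hotspot indexes per sketch
LAYERS = 4   # layers in the hotspot identification tree

# Maximum number of sketches per layer: 1, 256, 256*256, 256*256*256.
sketches_per_layer = [WIDTH ** layer for layer in range(LAYERS)]
# Hotspot indexes in a fully expanded fourth layer.
max_last_layer_indexes = WIDTH ** LAYERS
# Worst case if every index in every layer were expanded.
max_total_sketches = sum(sketches_per_layer)

# Assumption: a count value that never exceeds 15 fits in 4 bits.
BITS_PER_COUNTER = 4
bytes_per_sketch = WIDTH * BITS_PER_COUNTER // 8

print(sketches_per_layer)      # [1, 256, 65536, 16777216]
print(max_last_layer_indexes)  # 4294967296
print(max_total_sketches)      # 16843009
print(bytes_per_sketch)        # 128
```

  • Because sketches are created only where count values overflow, a real tree holds far fewer sketches than this worst case; under the 4-bit assumption, the first layer alone is a single 128-byte sketch.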
  • Two operations are performed on the hotspot identification tree. The first is updating: when a data access request is received, the tree is updated and, at the same time, it is determined whether the data accessed by the request is a hotspot. The second is resetting the hotspot identification tree.
  • To update, the primary key is hashed with the first layer's hash function, and the resulting index value is taken modulo 256 to obtain the first-layer hotspot index.
  • After the first-layer hotspot index is obtained, it is checked whether that first-layer hotspot index has corresponding second-layer hotspot indexes. If it does, the second-layer hotspot index corresponding to the primary key is calculated, and it is checked whether that second-layer hotspot index has third-layer hotspot indexes, and so on, until the last-layer hotspot index corresponding to the primary key is determined. The last layer is the deepest layer reached; note that the last-layer hotspot index here is the deepest hotspot index that exists for the primary key, not necessarily one in the fourth layer.
  • the hotspot index of the last layer After determining the hotspot index of the last layer, add 1 to the count value corresponding to the hotspot index of the last layer, and determine whether the count value after adding 1 exceeds the maximum count value (14). If it does, expand the hotspot index.
  • the corresponding next-level sketch (that is, initializing the next-level sketch). If there is no scalable next-level sketch, the primary key is determined to be a hotspot primary key.
  • the hash function is used to disperse the data access requests into multiple hotspot indexes for accumulation. The higher the accumulated value, the higher the frequency (this effect must be combined with the hotspot identification tree reset process). Because the number of hotspot indexes (256) in each sketch is much smaller than the data distribution to be accessed by the data access request, low-frequency data may cause false positives due to hash conflicts and high-frequency data being scattered to the same hotspot index. Therefore, in A multi-layer design is added to the hotspot identification tree. Each layer corresponds to a hash function. In this way, the data mixed together due to hash conflicts on the first layer has a higher probability of being dispersed using another hash function on the second layer. Through 4 layers of 4 hash functions, false positives caused by hash conflicts can be greatly reduced.
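The update procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The per-layer hash functions are emulated by prefixing the layer number before SHA-256, and the names `Sketch`, `HotspotTree`, `update` and `reset` are hypothetical. The sketch follows the increment-then-compare order described above: add 1, then check whether the count exceeds 14.

```python
import hashlib

SKETCH_SIZE = 256  # hotspot indexes per sketch (value from the text)
MAX_COUNT = 14     # maximum count value before expansion (value from the text)
LAYERS = 4         # depth of the tree (value from the text)


class Sketch:
    """One tree node: 256 counters plus lazily created next-layer sketches."""

    def __init__(self) -> None:
        self.counts = [0] * SKETCH_SIZE
        self.children: dict = {}  # hotspot index -> next-layer Sketch


def layer_index(key: bytes, layer: int) -> int:
    """Hotspot index of `key` in `layer` (1-based).

    Assumption: a different hash function per layer is emulated by prefixing
    the layer number before hashing; the text does not name its hash functions.
    """
    digest = hashlib.sha256(bytes([layer]) + key).digest()
    return int.from_bytes(digest[:8], "big") % SKETCH_SIZE


class HotspotTree:
    def __init__(self) -> None:
        self.root = Sketch()

    def update(self, key: bytes) -> bool:
        """Count one access to `key`; return True if it is judged a hotspot."""
        node, layer = self.root, 1
        idx = layer_index(key, layer)
        # Descend to the deepest hotspot index that exists for this key.
        while idx in node.children:
            node, layer = node.children[idx], layer + 1
            idx = layer_index(key, layer)
        node.counts[idx] += 1
        if node.counts[idx] <= MAX_COUNT:
            return False
        if layer < LAYERS:
            # Expand: initialize the next-layer sketch under this index.
            node.children[idx] = Sketch()
            return False
        # Last layer reached and its counter exceeded the maximum: hotspot.
        return True

    def reset(self) -> None:
        """Drop layers 2-4 and zero every first-layer count value."""
        self.root = Sketch()
```

Suppose no other key collides with the one being counted. Each layer's counter must then overflow once (15 increments) before the path descends, so a single key accessed repeatedly is first reported as a hotspot on its 60th access. Collisions with other keys can make this happen sooner.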
  • Each piece of data has a primary key that uniquely identifies it.
  • The primary key of each accessed data item is passed as the input parameter to the above method to determine whether the accessed data is a hotspot.
  • Each call to the method returns a boolean value (true or false). A return value of true means that the data corresponding to the given primary key is, with high probability, hotspot data that is being accessed frequently.
  • Every primary key for which the method returns true is recorded, and each of its subsequent read and write accesses is counted.
  • The method itself cannot report the exact number of accesses. Recording yields the real access count of each hotspot primary key, so that operations staff can see it directly.
  • The number of continuously recorded primary keys is limited to 1,000. Once it exceeds 1,000, entries are evicted according to the LRU policy. Entries that have not been accessed for a certain period (for example, 5 minutes) are also evicted. This filters out low-frequency data that was flagged with low probability more quickly. It also limits memory overhead, so that only the access data of the most frequent hotspot keys is tracked.
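The recording step can be sketched as a bounded map with LRU eviction and an idle timeout. The capacity of 1,000 and the 5-minute (300 s) timeout are the example values from the text. The class name, the injectable clock, and purging idle entries only when a new access is recorded are assumptions made for brevity.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class HotKeyRecorder:
    """Access counts for keys already flagged as hotspots (hypothetical name)."""

    def __init__(self, capacity: int = 1000, idle_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.idle_seconds = idle_seconds
        self.clock = clock
        # key -> (access count, last access time); front = least recently used.
        self.entries: "OrderedDict[bytes, tuple[int, float]]" = OrderedDict()

    def record(self, key: bytes) -> None:
        """Count one access to `key`, then evict over-capacity and idle keys."""
        now = self.clock()
        count, _ = self.entries.pop(key, (0, now))
        self.entries[key] = (count + 1, now)  # re-insert as most recently used
        self._evict(now)

    def count(self, key: bytes) -> Optional[int]:
        entry = self.entries.get(key)
        return entry[0] if entry else None

    def _evict(self, now: float) -> None:
        # LRU: drop least recently used keys beyond the capacity.
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        # Idle timeout: the longest-idle keys sit at the front of the dict.
        while self.entries:
            key, (_, last) = next(iter(self.entries.items()))
            if now - last <= self.idle_seconds:
                break
            del self.entries[key]
```

Because the dict is kept in recency order, both eviction rules only need to look at its front.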
  • Because the hotspot identification tree is updated through the above method each time a data access request is received, the number of executions of the method can be taken as the number of received data access requests (excluding requests that were rate-limited).
  • Periodically, the hotspot identification tree is reset: layers 2 to 4 are deleted and all count values in the first layer are reset to 0.
  • The reason is that traffic changes constantly, and frequently accessed data may stay hot only for a while. A periodic shrink mechanism is therefore needed to filter out historically hot data. With it, the hotspot identification method always identifies the data currently being accessed frequently, with limited memory overhead.
  • With the reset in place, the method identifies hotspot primary keys whose access frequency (the number of accesses to this data divided by the number of accesses to all data) exceeds a certain value.
  • For example, if the hotspot identification tree is reset after every P updates, the maximum count value of each layer can be set to identify hotspot primary keys above a predetermined access frequency.
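One rough way to relate the reset period P to the detectable frequency relies on an assumption that is not stated in the text. Along a key's path, each layer's counter absorbs at most max_count + 1 = 15 increments before it expands, or, at the last layer, before the key is flagged. A key therefore needs at most 4 * 15 = 60 of its own accesses within one reset window, and collisions only lower that number. The hypothetical helper below turns this into a frequency threshold.

```python
def detection_frequency_bound(reset_period: int, max_count: int = 14,
                              layers: int = 4) -> float:
    """Access frequency at or above which a key is flagged within one window.

    Assumption: each layer's counter on the key's path takes at most
    max_count + 1 increments before expanding (or flagging at the last layer),
    so a key needs at most layers * (max_count + 1) accesses between resets.
    """
    accesses_needed = layers * (max_count + 1)
    return accesses_needed / reset_period
```

For example, take a hypothetical P of 2,000 updates. Under this assumption, a key receiving at least 3% of the accesses in a window (60 of 2,000) is flagged within that window. A key's accesses can straddle a reset, so this is an approximation rather than a guarantee.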
  • The continuously recorded primary keys reveal the current real read and write throughput. Operations staff can then apply rate limiting, for example with a single-machine rate limiter such as RateLimiter in Guava.
  • Rate-limited requests are handled with a fail-fast strategy: an exception is returned to the client, so that the system protects itself.
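The fail-fast handling of rate-limited requests can be sketched with a token bucket. This is a hypothetical stand-in for a single-machine limiter such as Guava's RateLimiter, not a port of it. Instead of waiting for a permit, a request that finds no token is rejected immediately with an exception returned to the caller. The class names and the injectable clock are assumptions.

```python
import time
from typing import Callable


class RateLimitExceeded(Exception):
    """Returned to the client when a request is rejected (fail fast)."""


class TokenBucket:
    """Single-machine limiter: `rate` permits per second, burst of `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one permit if available; never blocks."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle(limiter: TokenBucket, key: bytes) -> str:
    """Serve a request for `key`, or fail fast if it is rate-limited."""
    if not limiter.try_acquire():
        raise RateLimitExceeded(f"rate limited: {key!r}")
    return "ok"
```

Rejecting immediately, rather than queueing, keeps a hotspot from tying up server threads while the client decides whether to retry.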
  • The above method is very memory-efficient.
  • When there are no hotspots, a single table occupies only 500 bytes of resident memory.
  • This solution can detect keys whose access frequency is above 3.5% and collect their QPS and RT (response time), which fully meets the requirements for hotspot identification.
  • The above method incurs CPU overhead of O(log N) complexity, so it has little impact on the response to read and write requests.
  • Even a personal computer can sustain a high rate of hotspot-key determinations, which reflects the method's low performance overhead. When hotspots exist, sketches expand and processing is roughly 1 to 2 times slower.
  • The false alarm rate can be calculated from the above parameters, which can be set appropriately to reduce it.
  • This specification also provides embodiments of devices and of the electronic equipment to which they are applied.
  • Figure 5 is a block diagram of a hotspot identification device according to an exemplary embodiment of this specification.
  • The device includes:
  • The initialization module 500 is configured to initialize, in advance, the count values corresponding to multiple hotspot indexes included in the hotspot identification tree to 0. The number of hotspot indexes in the hotspot identification tree is smaller than the number of data items in the database.
  • The primary key determination module 510 is configured to determine, when a data access request is received, the primary key of the data accessed by the data access request.
  • The count value determination module 520 is configured to calculate the hotspot index corresponding to the primary key, and to determine the corresponding count value in the hotspot identification tree according to the hotspot index.
  • The hotspot determination module 530 is configured to determine that the primary key is a hotspot primary key in response to the count value meeting the preset hotspot condition. In response to the count value not meeting the preset hotspot condition, it increases the count value by one.
  • The hotspot identification tree includes a first layer and a second layer.
  • The count value determination module 520 is specifically configured to calculate the first-layer hotspot index of the primary key using the hash function corresponding to the first layer of the hotspot identification tree. It then proceeds as follows:
  • If the hotspot identification tree contains no second-layer hotspot index corresponding to the first-layer hotspot index, the count value of the first-layer hotspot index is determined as the corresponding count value.
  • If the tree contains second-layer hotspot indexes corresponding to the first-layer hotspot index, the second-layer hotspot index of the primary key is determined using the hash function corresponding to the second layer, and the count value of that second-layer hotspot index is determined as the corresponding count value.
  • The initialization module 500 is specifically configured to initialize, in advance, the count values corresponding to multiple hotspot indexes included in the first layer of the hotspot identification tree to 0.
  • The device also includes an expansion module 521 (not shown in the figure). When the count value of a first-layer hotspot index, after being incremented by 1, exceeds the preset expansion threshold, this module creates multiple second-layer hotspot indexes under that first-layer hotspot index and initializes their count values to 0.
  • The device further includes a reset module 523 (not shown in the figure), configured to increment the number of access requests by 1 each time a data access request for any data is received. When the number of access requests exceeds a preset access request threshold, the module resets that number to 0 and sets the count values of the multiple hotspot indexes in the hotspot identification tree to 0.
  • In one implementation, the hotspot determination module 530 is specifically configured to determine that the primary key is a hotspot primary key in response to the count value reaching a preset hotspot threshold.
  • In another implementation, the hotspot determination module 530 is specifically configured to determine that the primary key is a hotspot primary key in response to the ratio of the count value to the sum of all count values exceeding a preset hotspot ratio.
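The two alternative hotspot conditions used by the hotspot determination module 530 can be written as simple predicates. The text does not specify which count values make up the "sum of all count values", so here the caller passes them in. The function and parameter names are hypothetical.

```python
from typing import Sequence


def exceeds_threshold(count: int, hotspot_threshold: int) -> bool:
    """First variant: the count value reaches a preset hotspot threshold."""
    return count >= hotspot_threshold


def exceeds_ratio(count: int, all_counts: Sequence[int],
                  hotspot_ratio: float) -> bool:
    """Second variant: count / sum of all count values exceeds a preset ratio."""
    total = sum(all_counts)
    return total > 0 and count / total > hotspot_ratio
```

The ratio form adapts to overall traffic volume, while the fixed threshold depends on how often the tree is reset.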
  • The device further includes a recording module 531 (not shown in the figure), configured to record the primary key when it is determined to be a hotspot primary key, and to record the number of accesses to that primary key.
  • The device further includes a deletion module 532. When the number of recorded primary keys exceeds a preset primary key number threshold, this module uses the recorded access volume of each primary key to delete the one with the lowest access frequency.
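The deletion module 532 evicts by lowest access volume rather than by recency, unlike the LRU policy described for the recorded keys earlier. A minimal sketch, assuming the recorded keys are kept in a plain dict mapping each key to its access count. The function name is hypothetical, and the linear scan is chosen for clarity, not speed.

```python
from typing import Dict, List


def evict_coldest(counts: Dict[bytes, int], max_keys: int) -> List[bytes]:
    """Delete the lowest-volume keys until at most `max_keys` remain.

    Returns the deleted keys, lowest access volume first.
    """
    evicted: List[bytes] = []
    while len(counts) > max_keys:
        coldest = min(counts, key=counts.__getitem__)  # lowest access volume
        del counts[coldest]
        evicted.append(coldest)
    return evicted
```

With many recorded keys, a heap keyed on access count would avoid the repeated linear scan.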
  • Figure 6 is a block diagram of a rate limiting device according to an exemplary embodiment of this specification.
  • The device includes:
  • The request receiving module 610 is configured to receive a data access request for target data in the database, where the data access request carries the primary key corresponding to the target data.
  • The request blocking module 620 is configured to block data access requests for the target data in response to the primary key being a hotspot primary key. The hotspot primary key is obtained through the aforementioned hotspot identification method.
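The request blocking module 620 reduces to a membership check in front of the read path. The sketch below assumes the hotspot primary keys are already available as a set, for example one produced by a hotspot identification tree. The exception and function names are hypothetical.

```python
from typing import Callable, Set


class HotspotBlocked(Exception):
    """Raised when a request targets a hotspot primary key."""


def make_gate(hotspot_keys: Set[bytes],
              read: Callable[[bytes], str]) -> Callable[[bytes], str]:
    """Wrap `read` so that requests for hotspot primary keys are blocked."""
    def gate(key: bytes) -> str:
        # The request carries the primary key of the target data (module 610).
        if key in hotspot_keys:
            # Block the request for hotspot data (module 620).
            raise HotspotBlocked(key)
        return read(key)
    return gate
```

For example, `make_gate({b"a"}, store.__getitem__)` serves every key in `store` except `b"a"`.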
  • Since the device embodiments basically correspond to the method embodiments, refer to the description of the method embodiments for relevant details.
  • The device embodiments described above are only illustrative.
  • Modules described as separate components may or may not be physically separate.
  • Components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. A person of ordinary skill in the art can understand and implement this without creative effort.
  • Figure 7 shows a hardware structure diagram of an electronic device in which the hotspot identification device or the rate limiting device of the embodiments is located.
  • The device may include a processor 1010, a memory 1020 for storing computer instructions, an input/output interface 1030, a communication interface 1040, and a bus 1050.
  • The processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 communicate with one another within the device through the bus 1050.
  • The processor 1010 can be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application specific integrated circuit (ASIC), or one or more integrated circuits. It executes computer instructions to implement the above hotspot identification method or rate limiting method.
  • The memory 1020 can be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like.
  • The memory 1020 can store an operating system and other application programs. When the technical solutions of the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
  • The input/output interface 1030 is used to connect input/output modules for information input and output.
  • An input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide the corresponding functions.
  • Input devices can include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices can include monitors, speakers, vibrators, indicator lights, etc.
  • The communication interface 1040 is used to connect a communication module (not shown in the figure) for communication between this device and other devices.
  • The communication module can communicate by wired means (such as USB or network cable) or wireless means (such as mobile network, Wi-Fi, or Bluetooth).
  • The bus 1050 includes a path that carries information between the components of the device (for example, the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040).
  • Although the above device shows only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in a specific implementation the device may also include other components necessary for normal operation.
  • Conversely, the above device may include only the components necessary to implement the embodiments of this specification, and need not include all the components shown in the drawings.
  • Embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored.
  • When the program is executed by a processor, the above hotspot identification method and/or rate limiting method is implemented.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology.
  • The information may be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD), magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As used herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • Embodiments of this specification also provide a computer program that, when run, implements the foregoing hotspot identification method or the foregoing rate limiting method.


Abstract

This specification provides a hotspot recognition method and a rate limiting method. In the hotspot recognition method, count values corresponding to a plurality of hotspot indexes in a hotspot recognition tree are initialized to 0 in advance, and the number of hotspot indexes in the tree is less than the amount of data in a database. When a data access request is received, the primary key of the data that the request accesses is determined. A hotspot index corresponding to the primary key is calculated, and the corresponding count value in the hotspot recognition tree is determined according to the hotspot index; the count value represents the number of accesses to the data corresponding to the hotspot index. In response to the count value satisfying a preset hotspot condition, the primary key is determined to be a hotspot primary key; in response to the count value not satisfying the preset hotspot condition, the count value is increased by one.

Description

A hotspot identification method and a rate limiting method
This application claims priority to Chinese patent application No. 202210289116.6, filed with the China Patent Office on March 22, 2022 and entitled "A hotspot identification method and a rate limiting method", the entire content of which is incorporated herein by reference.
Technical Field
One or more embodiments of this specification relate to the field of computer application technology, and in particular to a hotspot identification method and a rate limiting method.
Background
In a distributed database, the data of a large database is divided into multiple parts stored on different physical machines. Access to any single piece of data is generally served by one physical machine. If a piece of data in the database is accessed frequently over a period of time, the heavy access to it forms a hotspot: access requests in the distributed system concentrate on one physical machine. Hotspot problems arise easily in scenarios such as Weibo trending searches and e-commerce flash sales. When a hotspot occurs, the physical machine hosting it may go down under the heavy traffic, affecting the stability of the distributed database service.
In a non-relational database, each piece of data generally corresponds to a primary key (key). In the related art, some distributed databases identify hotspots by counting the number of accesses to each primary key. For databases that store a large amount of data on a single physical machine, however, this approach occupies a large amount of memory and affects the normal operation of the database.
Thus, for distributed (non-relational) databases with a large amount of data on a single physical machine, there is no hotspot identification method that can prevent system downtime.
Summary
In view of this, one or more embodiments of this specification provide a hotspot identification method and a rate limiting method.
According to a first aspect of one or more embodiments of this specification, a hotspot identification method is proposed. The count values corresponding to multiple hotspot indexes included in a hotspot identification tree are initialized to 0 in advance, and the number of hotspot indexes in the hotspot identification tree is smaller than the number of data items in the database. The method includes:
when a data access request is received, determining the primary key of the data accessed by the data access request;
calculating the hotspot index corresponding to the primary key, and determining the corresponding count value in the hotspot identification tree according to the hotspot index;
in response to the count value meeting a preset hotspot condition, determining that the primary key is a hotspot primary key; and in response to the count value not meeting the preset hotspot condition, increasing the count value by one.
According to a second aspect of one or more embodiments of this specification, a rate limiting method is proposed, including:
receiving a data access request for target data in a database, the data access request carrying the primary key corresponding to the target data;
in response to the primary key being a hotspot primary key, blocking the data access request for the target data, the hotspot primary key being obtained through the foregoing hotspot identification method.
According to a third aspect of the embodiments of this specification, a hotspot identification device is provided. The count values corresponding to multiple hotspot indexes included in a hotspot identification tree are initialized to 0 in advance, and the number of hotspot indexes in the hotspot identification tree is smaller than the number of data items in the database. The device includes:
a primary key determination module, configured to determine, when a data access request is received, the primary key of the data accessed by the data access request;
a count value determination module, configured to calculate the hotspot index corresponding to the primary key, and to determine the corresponding count value in the hotspot identification tree according to the hotspot index;
a hotspot determination module, configured to determine that the primary key is a hotspot primary key in response to the count value meeting a preset hotspot condition, and to increase the count value by one in response to the count value not meeting the preset hotspot condition.
According to a fourth aspect of the embodiments of this specification, a rate limiting device is provided, including:
a request receiving module, configured to receive a data access request for target data in a database, the data access request carrying the primary key corresponding to the target data;
a request blocking module, configured to block the data access request for the target data in response to the primary key being a hotspot primary key, the hotspot primary key being obtained through the foregoing hotspot identification method.
According to a fifth aspect of the embodiments of this specification, an electronic device is provided, including:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor runs the executable instructions to implement the foregoing hotspot identification method or the foregoing rate limiting method.
According to a sixth aspect of the embodiments of this specification, a computer-readable storage medium is provided, on which computer instructions are stored. When executed by a processor, the computer instructions implement the foregoing hotspot identification method or the foregoing rate limiting method.
This specification provides a hotspot identification method and a rate limiting method. The count values corresponding to multiple hotspot indexes included in a hotspot identification tree are initialized to 0 in advance, and the number of hotspot indexes in the hotspot identification tree is smaller than the number of data items in the database. When a data access request is received, the primary key of the data accessed by the request is determined. The hotspot index corresponding to the primary key is calculated, and the corresponding count value is determined in the hotspot identification tree according to the hotspot index; the count value represents the number of times the data corresponding to the hotspot index has been accessed. In response to the count value meeting a preset hotspot condition, the primary key is determined to be a hotspot primary key; in response to the count value not meeting the preset hotspot condition, the count value is increased by one.
The above method counts accesses per hotspot index rather than per primary key. Since the number of hotspot indexes is smaller than the number of primary keys in the database, hotspots can be identified with reasonable accuracy using little memory and then handled. This prevents hotspot identification from occupying large amounts of memory and prevents hotspots from disrupting the normal operation of the database service.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit this specification.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the specification and, together with the description, serve to explain its principles.
Figure 1 is a flowchart of a hotspot identification method according to an exemplary embodiment of this specification.
Figure 2 is a schematic structural diagram of a hotspot identification tree according to an exemplary embodiment of this specification.
Figure 3 is a flowchart of a rate limiting method according to an exemplary embodiment of this specification.
Figure 4A is a schematic structural diagram of a sketch according to a specific embodiment of this specification.
Figure 4B is a schematic structural diagram of a hotspot identification tree according to a specific embodiment of this specification.
Figure 5 is a block diagram of a hotspot identification device according to an exemplary embodiment of this specification.
Figure 6 is a block diagram of a rate limiting device according to an exemplary embodiment of this specification.
Figure 7 is a hardware structure diagram of an electronic device in which a hotspot identification device or a rate limiting device is located according to an exemplary embodiment of this specification.
Detailed Description
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of this specification. Rather, they are merely examples of devices and methods consistent with some aspects of one or more embodiments of this specification as detailed in the appended claims.
It should be noted that in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, a method may include more or fewer steps than described in this specification. In addition, a single step described in this specification may be broken down into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.
In a distributed database, user data is generally sorted in byte array order, and the sorted data is then split. The resulting pieces are stored in different modules (regions) of the database, and each region serves access to its portion of the data. This is generally called a range sharding design and is a common design in the field of distributed systems. Under this design, any piece of user data can be uniquely located to one range shard.
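Under range sharding, locating the region for a key is a binary search over the sorted split points. This works because Python compares `bytes` lexicographically, which matches byte array order. The class and the split points used in the test are hypothetical examples, not taken from the text.

```python
import bisect
from typing import List


class RangeShardMap:
    """Locate the region (range shard) that serves a key.

    Region 0 covers keys below split_points[0]; region i covers
    [split_points[i-1], split_points[i]); the last region is unbounded above.
    """

    def __init__(self, split_points: List[bytes]) -> None:
        # bytes compare lexicographically, i.e. in byte array order.
        self.split_points = sorted(split_points)

    def region_of(self, key: bytes) -> int:
        # bisect_right places a key equal to a split point in the region it starts.
        return bisect.bisect_right(self.split_points, key)
```

Because each key maps to exactly one region, all traffic for a hot key lands on the single machine hosting that region, which is the root of the hotspot problem described next.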
分布式系统一般通过水平扩容获得极大的扩展能力,只需要扩容机器,几乎可以一直提升整体吞吐能力。但是由于范围分片设计,在某些情况下,由于一些特性(比如:微博热搜,电商秒杀,一条微博和一件商品对应的数据为一条数据)可能会使大部分流量在短时间内集中在某一条或某几条数据上,即大部分流量在短时间内会命中特定的一个或几个region,这会给这些region所处的机器带来巨大的负载,这样就形成了热点问题。热点问题可能会导致硬件宕机、进程退出等的异常情况,在许多分布式系统当中,都有对单机异常的容灾设计,当机器出现宕机,会由别的机器负责接管宕机机器上的访问服务,但是,热点问题并不是硬件问题,随着机器宕机,其他机器接管服务,再次被热点流量击垮宕机,容易对整个集群形成雪崩的效应,严重影响服务的稳定性。可见,受限于单机硬件资源有限,单机的处理能力也一定是有限,在热点问题面前,分布式系统的横向拓展能力毫无用武之地,因此需要寻求其他办法来解决热点问题。Distributed systems generally gain great scalability through horizontal expansion. By simply expanding the machine, the overall throughput can almost always be improved. However, due to the range sharding design, in some cases, due to some characteristics (such as: Weibo hot searches, e-commerce flash sales, the data corresponding to a Weibo and a product is one piece of data), most of the traffic may be in a short period of time. Concentrate on one or several pieces of data within a short time, that is, most of the traffic will hit one or several specific regions in a short period of time, which will put a huge load on the machines where these regions are located, thus forming a Hot Issues. Hot issues may lead to abnormal situations such as hardware downtime and process exit. In many distributed systems, there are disaster recovery designs for single-machine abnormalities. When a machine goes down, other machines will be responsible for taking over the downtime machine. However, the hotspot problem is not a hardware problem. As the machine goes down, other machines take over the service and are once again overwhelmed by hotspot traffic. This can easily cause an avalanche effect on the entire cluster, seriously affecting the stability of the service. It can be seen that due to the limited hardware resources of a single machine, the processing power of a single machine must also be limited. In the face of hot issues, the horizontal expansion capabilities of distributed systems are useless, so other methods need to be found to solve hot issues.
In the related art, hotspots are usually handled by rate limiting, but the difficulty lies in identifying and discovering the hotspots themselves. Each piece of data corresponds to a primary key, so some in-memory databases identify hotspots by counting the accesses to every primary key and picking out the hotspot primary keys.
However, this approach is only suitable when a single machine stores a limited amount of data, as in in-memory databases. Some databases, such as persistent databases, store a large amount of data on a single machine: several TB, with throughput possibly on the order of hundreds of thousands of QPS. For these, counting the accesses to every key would consume a large amount of memory, making the approach impractical and inefficient.
Based on this, this specification provides a hotspot identification method and a rate limiting method:
1. The count values of the multiple hotspot indexes in a hotspot identification tree are initialized to 0 in advance. The tree contains fewer hotspot indexes than the database contains pieces of data.
2. When a data access request is received, the primary key of the data accessed by the request is determined.
3. The hotspot index of the primary key is calculated, and the corresponding count value is found in the hotspot identification tree. The count value represents the number of times the data corresponding to the hotspot index has been accessed.
4. If the count value meets a preset hotspot condition, the primary key is determined to be a hotspot primary key. Otherwise, the count value is incremented by one.
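The counting loop above can be sketched in Python for a single-layer tree. This is a minimal sketch under stated assumptions. The class name, the BLAKE2b-based index function, and the parameters `num_indexes` and `hot_threshold` are hypothetical choices for illustration. The specification does not fix a particular hash function or hotspot condition.

```python
import hashlib


class SingleLayerHotspotDetector:
    """Hypothetical single-layer hotspot identification 'tree'.

    num_indexes counters are shared by all primary keys, so memory use does
    not grow with the number of keys stored in the database.
    """

    def __init__(self, num_indexes: int, hot_threshold: int) -> None:
        self.num_indexes = num_indexes
        self.hot_threshold = hot_threshold
        self.counts = [0] * num_indexes  # all count values start at 0

    def hotspot_index(self, key: bytes) -> int:
        # Many primary keys map to the same index (fewer indexes than keys).
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.num_indexes

    def on_access(self, key: bytes) -> bool:
        """Handle one access; return True if `key` is judged a hotspot key."""
        idx = self.hotspot_index(key)
        if self.counts[idx] >= self.hot_threshold:  # hotspot condition met
            return True
        self.counts[idx] += 1  # condition not met: increment by one
        return False
```

Two keys that hash to the same index share one counter. A rarely accessed key can therefore be reported as hot if it collides with a hot one. This is the accuracy that the method trades for memory.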
The above method counts accesses per hotspot index rather than per primary key. There are fewer hotspot indexes than primary keys in the database, so hotspots can be identified with reasonable accuracy while occupying little memory. The identified hotspots can then be handled. This keeps hotspot identification from consuming large amounts of memory and keeps hotspots from disrupting the normal operation of the database service.
Next, a hotspot identification method shown in this specification is described.
As shown in Figure 1, which is a flowchart of a hotspot identification method according to an exemplary embodiment of this specification, the method includes the following steps:
Step 103: When a data access request is received, determine the primary key of the data accessed by the data access request.
Before this method is executed, the count values of the multiple hotspot indexes in the hotspot identification tree are initialized to 0 in advance. The tree contains fewer hotspot indexes than the database contains pieces of data.
Step 103 determines the primary key because a data access request is a request to read or write data. The database targeted by these requests is a non-relational database in which each piece of data corresponds to a primary key. Access requests for different data can therefore be distinguished by their primary keys. To count the fuzzy access volume of different data (explained below), the primary key of the data accessed by the request must be determined first.
The count values must be initialized to 0 before the method is executed because this specification uses the hotspot identification tree to count the fuzzy access volume (that is, the count values) of different data and so identify hotspots. The tree must therefore start with no accesses recorded, which means clearing the count values of all its hotspot indexes.
The tree has fewer hotspot indexes than the database has pieces of data in order to save memory. The related art keeps one access count per primary key. Here, instead, each fuzzy access count covers multiple primary keys. Keeping the number of hotspot indexes below the number of primary keys saves memory, which allows the method in this specification to identify hotspots in databases that hold large amounts of data.
The hotspot identification tree is a tree structure that records the count value of each hotspot index. It may have one layer or multiple layers. With a single layer, that layer stores multiple hotspot indexes and their count values.
With multiple layers, each layer stores multiple hotspot indexes and their count values, and one hotspot index in an upper layer corresponds to multiple hotspot indexes in the layer below. This correspondence exists because each layer calculates hotspot indexes with a different method. The primary keys that share one hotspot index in the first layer are therefore spread across multiple hotspot indexes in the next layer. An upper-layer hotspot index and the lower-layer hotspot indexes under it represent the same batch of data. As a result, the amount of data corresponding to each hotspot index decreases as the layer number increases.
The tree is expanded as follows. When the count value of a hotspot index in a given layer is full (that is, it reaches the maximum value the count value can hold), the next-layer hotspot indexes under that index are created. This determines which of the data under that hotspot index are actually hotspots.
The advantages and other details of a multi-layer hotspot identification tree are described in detail below.
Step 105: Calculate the hotspot index corresponding to the primary key, and use it to find the corresponding count value in the hotspot identification tree.
First, the terms used in step 105 are explained.
A hotspot index is an index value computed from the primary key with a hash function or a similar method.
The count value of a hotspot index in the hotspot identification tree represents the access volume of the data corresponding to that index. When the tree has only one layer, the count value of a hotspot index is the total number of accesses, within a period of time, to all the primary keys (pieces of data) under that index. The count value therefore represents the fuzzy access volume of those data.
When the tree has multiple layers, the count value of any hotspot index in any layer represents the access level of the data corresponding to that index. In a two-layer tree, for example, a second-layer hotspot index covers a subset of the data under its parent first-layer index: those data whose second-layer index value equals that second-layer hotspot index.
For example, in a two-layer tree, the access levels are read as follows:
- Low. A primary key maps only to a first-layer hotspot index with no second layer under it, or some of the second-layer hotspot indexes under its first-layer index have a count value of 0. The data under those indexes (the first-layer index in the former case, those second-layer indexes in the latter) are accessed relatively rarely.
- Moderate. The first-layer hotspot index of a primary key has second-layer hotspot indexes, and the relevant second-layer count value is not 0. The data under that second-layer index receive a moderate number of accesses.
- Hot. The count value of the second-layer hotspot index for the primary key (data) is full. The primary key is heavily accessed and is a hotspot primary key.
Step 107: In response to the count value meeting a preset hotspot condition, determine that the primary key is a hotspot primary key; in response to the count value not meeting the preset hotspot condition, increment the count value by one.
A hotspot primary key is one whose access volume within a period of time exceeds a certain value. The primary key is therefore considered a hotspot primary key in either of two cases:
- the count value accumulated within a certain time exceeds a certain value; or
- the fuzzy access volume represented by the count value exceeds a certain proportion of the total access volume within that time.
When the primary key is not a hotspot primary key, counting continues on the count value of its hotspot index.
Accordingly, step 107 may specifically include: in response to the ratio of the count value to the sum of all count values exceeding a preset hotspot ratio, determining that the primary key is a hotspot primary key.
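The ratio condition of step 107 reduces to a one-line check. The function name and parameters below are hypothetical and only illustrate the comparison described above.

```python
def is_hot_by_ratio(count: int, all_counts: list[int], hot_ratio: float) -> bool:
    """Hypothetical step-107 check: does this count exceed hot_ratio of all counts?"""
    total = sum(all_counts)  # denominator: the sum of all count values
    # Guard against an empty or freshly reset tree, where total is 0.
    return total > 0 and count / total > hot_ratio
```

For example, with counts `[60, 25, 15]` and a hotspot ratio of 0.5, only the index holding 60 qualifies, because 60 / 100 = 0.6 exceeds 0.5.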
Whether data is hotspot data can therefore be decided by checking whether the proportion represented by its count value exceeds a certain value. This is a convenient approach for a single-layer hotspot identification tree. For a multi-layer tree, the count value can be taken at the last layer to identify hotspots accurately. That count already includes the counts accumulated in the layers above, because each layer's count carries over into the next. For example, if a first-layer count value fills up at 15 and the next layer is created, the second-layer count value starts from 15. The sum of all count values can also be taken as the total number of accesses. Each arriving access request increments some count value by 1, so the number of accesses equals the sum of all count values.
Besides the above approach, step 107 can also be implemented as follows.
The method further includes the following:
- Each time a data access request for any data is received, increment an access request count by 1.
- When the access request count exceeds a preset access request threshold, reset it to 0 and set the count values of all hotspot indexes in the hotspot identification tree to 0.
In this case, step 107 includes: in response to the count value reaching a preset hotspot threshold, determining that the primary key is a hotspot primary key.
In other words, when the total number of accesses to all data in the database exceeds a certain value, the hotspot identification tree is reset and counting starts over. Because the tree is reset this way, the preset hotspot condition can simply be that the count value exceeds a preset hotspot threshold. This effectively measures, for each counting period, what share of all access traffic goes to hotspot data.
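The reset-by-request-count variant can be sketched as follows. This is a minimal sketch. The class name, the BLAKE2b index function, and the parameter names are hypothetical. It resets the counters before counting the request that crossed the threshold, which is one reasonable reading of the text.

```python
import hashlib


class RequestWindowDetector:
    """Hypothetical detector that clears its counters after request_threshold requests."""

    def __init__(self, num_indexes: int, hot_threshold: int, request_threshold: int) -> None:
        self.num_indexes = num_indexes
        self.hot_threshold = hot_threshold
        self.request_threshold = request_threshold
        self.requests = 0
        self.counts = [0] * num_indexes

    def on_access(self, key: bytes) -> bool:
        self.requests += 1  # count every request, for any data
        if self.requests > self.request_threshold:
            # Enough traffic seen: reset the request count and the whole tree.
            self.requests = 0
            self.counts = [0] * self.num_indexes
        digest = hashlib.blake2b(key, digest_size=8).digest()
        idx = int.from_bytes(digest, "big") % self.num_indexes
        if self.counts[idx] >= self.hot_threshold:  # count reached the hotspot threshold
            return True
        self.counts[idx] += 1
        return False
```

Because both thresholds are counted in requests, their ratio roughly sets the share of traffic above which a key is reported as hot.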
Alternatively, the hotspot identification tree can be reset (that is, reinitialized) at regular time intervals. In this case, the hotspot condition can again be that the count value exceeds a preset hotspot threshold. A hotspot then exists if the access volume within an interval exceeds that threshold.
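The time-based reset can be sketched the same way. The clock is injectable so the window logic can be exercised without waiting. The class name, `window_seconds`, and the hash choice are hypothetical.

```python
import hashlib
import time
from typing import Callable


class TimeWindowDetector:
    """Hypothetical detector whose counters are reinitialized every window_seconds."""

    def __init__(self, num_indexes: int, hot_threshold: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.num_indexes = num_indexes
        self.hot_threshold = hot_threshold
        self.window_seconds = window_seconds
        self.clock = clock  # injectable for testing
        self.window_start = clock()
        self.counts = [0] * num_indexes

    def on_access(self, key: bytes) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.counts = [0] * self.num_indexes  # reinitialize the tree
            self.window_start = now
        digest = hashlib.blake2b(key, digest_size=8).digest()
        idx = int.from_bytes(digest, "big") % self.num_indexes
        if self.counts[idx] >= self.hot_threshold:  # too many accesses in this window
            return True
        self.counts[idx] += 1
        return False
```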
Once a hotspot primary key is identified, it can be recorded and its exact access volume counted from then on. This mainly makes it easier to observe the traffic and to monitor how the hotspot is being handled.
That is, the method further includes: when the primary key is determined to be a hotspot primary key, recording the primary key and its access volume.
The system should not record primary key access volumes without limit and consume too much memory. Some recorded primary keys can therefore be deleted according to a cache eviction policy such as least recently used (LRU) or least frequently used (LFU).
Two deletion policies are possible when the number of recorded primary keys exceeds a certain threshold:
- Delete the primary key with the lowest access frequency. Such a key may have been misidentified as a hotspot because there are fewer hotspot indexes than primary keys.
- Delete the primary keys that have not been accessed for a certain period of time.
Formally, the first policy is: the method further includes, when the number of recorded primary keys exceeds a preset primary key count threshold, deleting the primary key with the lowest access frequency according to the recorded access volume of each primary key.
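The lowest-frequency deletion policy can be sketched with a dictionary of exact counts for the recorded hotspot keys. The class and parameter names are hypothetical. This sketch never evicts the key it has just recorded, so that a newly detected hotspot key is not dropped at once. That exclusion is a design choice of this sketch, not something the specification requires.

```python
class LfuHotKeyRecorder:
    """Hypothetical record of detected hotspot keys and their exact access counts."""

    def __init__(self, max_keys: int) -> None:
        if max_keys < 1:
            raise ValueError("max_keys must be at least 1")
        self.max_keys = max_keys
        self.counts: dict[bytes, int] = {}

    def record(self, key: bytes) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1
        if len(self.counts) > self.max_keys:
            # Evict the least-accessed key other than the one just recorded.
            victim = min((k for k in self.counts if k != key), key=self.counts.__getitem__)
            del self.counts[victim]
```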
With the method described, the hotspot identification tree is now described in more detail.
As mentioned above, the hotspot identification tree can have multiple layers. Multiple layers reduce hash collisions while saving further memory. Hotspot indexes are computed by some calculation method, and there are fewer of them than primary keys, so each hotspot index corresponds to multiple primary keys. With a single-layer tree, the keys sharing an index can be told apart better only by adding more hotspot indexes. That, however, increases memory usage, and many of the added indexes are likely to have a count value of 0 or a very small one. A multi-layer tree resolves this conflict.
When the hotspot identification tree has multiple layers, its structure can be as shown in Figure 2:
- The first layer has N hotspot indexes.
- Each first-layer hotspot index corresponds to M second-layer hotspot indexes.
- Each second-layer hotspot index can correspond to Q third-layer hotspot indexes, and so on.
N, M, and Q are preset positive integers, which may be equal or different. Each layer uses a different index calculation method, for example a different hash function; this reduces hash collisions and the chance of misidentifying data as hotspots. To save memory, the next-layer hotspot indexes under an upper-layer index are created only once that index's count value is full. The tree therefore starts with a single layer. When the count value of any hotspot index in that layer exceeds a preset expansion threshold, the next-layer hotspot indexes under that index are created. When only a small fraction of the data is hot, most hotspot indexes stop at the first few layers and very few reach the last layer. Compared with a scheme that has only one layer of hotspot indexes, the same amount of storage therefore yields fewer hash collisions.
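A rough count of counters illustrates the memory argument. Assume, hypothetically, a three-layer tree in which $e_1$ first-layer indexes and $e_2$ second-layer indexes have exceeded the expansion threshold. $C_{\text{flat}}$ is the number of counters a single layer needs to have as many buckets as the tree's deepest layer. $C_{\text{tree}}$ is the number of counters the lazily expanded tree actually allocates:

$$
C_{\text{flat}} = N \cdot M \cdot Q, \qquad
C_{\text{tree}} = N + e_1 M + e_2 Q, \quad e_1 \le N,\; e_2 \le e_1 M
$$

Take the illustrative values $N = M = Q = 256$ and $e_1 = e_2 = 4$, that is, only a few hot keys. Then $C_{\text{tree}} = 256 + 4 \cdot 256 + 4 \cdot 256 = 2304$, whereas $C_{\text{flat}} = 256^3 = 16{,}777{,}216$.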
Although Figure 2 shows every layer with hotspot indexes, in practice not every first-layer hotspot index has second-layer hotspot indexes.
Thus, the multi-layer structure of the hotspot identification tree reduces hash collisions and also reduces the memory the tree occupies.
Next, a two-layer hotspot identification tree is used as an example to illustrate the above scheme.
The hotspot identification tree includes a first layer and a second layer. Step 105 specifically includes:
1. Calculate the first-layer hotspot index of the primary key with the hash function of the first layer.
2. If the tree has no second-layer hotspot indexes under that first-layer index, take the count value of the first-layer hotspot index as the corresponding count value.
3. If the tree has second-layer hotspot indexes under that first-layer index, determine the primary key's second-layer hotspot index with the hash function of the second layer, and take that index's count value as the corresponding count value.
The first layer and the second layer use different hash functions.
In other words, in a two-layer tree, the count value used is the one at the deepest layer that exists for the primary key.
In addition to finding the count value, the method expands the next layer as needed. Initializing the count values in advance means initializing only the count values of the first-layer hotspot indexes to 0. The method further includes: when a first-layer count value, after being incremented by 1, exceeds a preset expansion threshold, creating the second-layer hotspot indexes under that first-layer index and initializing their count values to 0.
That is, only the first-layer hotspot indexes are initialized in advance. When a first-layer count value exceeds the expansion threshold, the next-layer hotspot indexes under it are created and initialized.
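The two-layer lookup and expansion can be sketched in Python under several assumptions:
- Each layer's "different hash function" is a BLAKE2b hash salted with the layer number.
- Second-layer counters are created lazily in a dictionary.
- A key is declared hot only at the second layer, in line with the access levels described for the two-layer tree.

The names `layer_hash`, `TwoLayerHotspotTree`, `expand_threshold`, and `hot_threshold` are hypothetical.

```python
import hashlib


def layer_hash(key: bytes, layer: int, width: int) -> int:
    # Hypothetical: a differently salted BLAKE2b per layer stands in for the
    # different hash function each layer uses (BLAKE2b salts are at most 16 bytes).
    digest = hashlib.blake2b(key, digest_size=8, salt=layer.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % width


class TwoLayerHotspotTree:
    def __init__(self, n: int, m: int, expand_threshold: int, hot_threshold: int) -> None:
        self.n, self.m = n, m
        self.expand_threshold = expand_threshold
        self.hot_threshold = hot_threshold
        self.layer1 = [0] * n                   # only layer 1 is initialized in advance
        self.layer2: dict[int, list[int]] = {}  # first-layer index -> M counters, created lazily

    def on_access(self, key: bytes) -> bool:
        """Process one access; return True if `key` is identified as a hotspot key."""
        i1 = layer_hash(key, 1, self.n)
        children = self.layer2.get(i1)
        if children is None:
            # No second layer yet: the first-layer counter is the count value.
            self.layer1[i1] += 1
            if self.layer1[i1] > self.expand_threshold:
                self.layer2[i1] = [0] * self.m  # expand: M new counters set to 0
            return False
        # Second layer exists: locate this key's counter with the second-layer hash.
        i2 = layer_hash(key, 2, self.m)
        if children[i2] >= self.hot_threshold:
            return True
        children[i2] += 1
        return False
```

Keys that share a first-layer index are separated again by the second-layer hash. Only the keys that keep hitting the same second-layer counter are eventually reported.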
The above method saves memory. The number of hotspot indexes is fixed and smaller than the number of primary keys, so the method is well suited to databases with massive amounts of data. The memory overhead stays low, and the false positive rate can be calculated from the following parameters:
- the number of layers of the tree;
- the theoretical maximum number of hotspot indexes per layer;
- the maximum value a count value can reach, if hotspot primary keys are identified by the count value exceeding a predetermined hotspot threshold;
- the access request threshold, if one is used.
The thresholds can therefore be set to meet the hotspot identification requirements. Because the method occupies little memory, its performance overhead is also small.
Once a hotspot primary key has been identified, it must be handled so that it does not affect the operation of the single machine. Hotspots can be handled in many ways; rate limiting is used below as an example.
As shown in Figure 3, which is a flowchart of a rate limiting method according to an exemplary embodiment of this specification, the method includes:
Step 301: Receive a data access request for target data in the database, where the data access request carries the primary key corresponding to the target data.
Step 303: In response to the primary key being a hotspot primary key, block the data access request for the target data.
The hotspot primary key is obtained by the hotspot identification method described above. Blocking data access requests for hotspot primary keys protects the system itself.
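Steps 301 and 303 amount to a membership check in front of the normal read path. In this sketch, the hotspot keys are assumed to be held in a set. The exception type and the `read` callback are hypothetical.

```python
from typing import Callable, TypeVar

V = TypeVar("V")


class HotspotBlocked(Exception):
    """Raised when a request targets a hotspot primary key (hypothetical error type)."""


def handle_request(key: bytes, hot_keys: set[bytes], read: Callable[[bytes], V]) -> V:
    # Step 301: the request carries the primary key of the target data.
    # Step 303: block it if that key has been identified as a hotspot key.
    if key in hot_keys:
        raise HotspotBlocked(key)
    return read(key)
```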
In step 303, the request can be blocked as follows: once the hotspot identification method above identifies a hotspot primary key, requests for that key are blocked. To further reduce the impact of hotspots, a discovered hotspot primary key can also be added to a single-machine rate limiter. An example is RateLimiter in Guava; other rate limiters can also be used, and this specification does not restrict the type. The request is then blocked before the system processes it further, for example before checking with the above method whether it targets hotspot data, which saves processing overhead.
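Guava's RateLimiter is a Java class. This Python sketch instead gives each hotspot key a plain token bucket. It illustrates a single-machine limiter that is consulted before any further processing. It is not Guava's algorithm, and `TokenBucket`, `HotKeyLimiter`, `rate_per_sec`, and `capacity` are hypothetical names.

```python
import time
from typing import Callable


class TokenBucket:
    """Simple token bucket: refills at rate_per_sec, holds at most capacity tokens."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class HotKeyLimiter:
    """Checked first on every request; only keys registered as hot are limited."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_sec = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.buckets: dict[bytes, TokenBucket] = {}

    def add_hot_key(self, key: bytes) -> None:
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.rate_per_sec, self.capacity, self.clock)

    def admit(self, key: bytes) -> bool:
        bucket = self.buckets.get(key)
        return True if bucket is None else bucket.try_acquire()
```

Non-hot keys pass straight through with a single dictionary lookup. Hot keys are throttled to roughly `rate_per_sec` requests per second, with bursts up to `capacity`.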
To improve identification accuracy, the recorded access volume of hotspot primary keys (when it is recorded) can also be used to identify them. Data access requests for a primary key are blocked only when its recorded access volume exceeds a certain value, which makes rate limiting more accurate.
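This refinement can be sketched as a gate that keeps an exact count for each suspected hotspot key. It blocks a key only once that count exceeds a threshold. The class, method, and parameter names are hypothetical.

```python
class RecordedCountGate:
    """Hypothetical gate: block a suspected hotspot key only after its own
    recorded access count exceeds block_threshold."""

    def __init__(self, block_threshold: int) -> None:
        self.block_threshold = block_threshold
        self.recorded: dict[bytes, int] = {}

    def mark_suspected(self, key: bytes) -> None:
        # Called when the hotspot identification tree flags the key.
        self.recorded.setdefault(key, 0)

    def admit(self, key: bytes) -> bool:
        if key not in self.recorded:
            return True  # not a suspected hotspot key
        self.recorded[key] += 1  # exact count, unaffected by hash collisions
        return self.recorded[key] <= self.block_threshold
```

A key that was flagged only because it collided with a hot key rarely reaches the threshold in its own exact count, so it is not blocked.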
When the access volume of hotspot primary keys is recorded, a rate-limited hotspot primary key can be observed for a while after limiting begins. This gives feedback on how well the rate limiting works and makes it easier to adjust the rate limiting policy.
If an LRU or LFU cache eviction policy is also in use, the hotspot primary keys under observation can be locked. Otherwise, once rate limiting succeeds and their access volume drops, their records could be evicted by the LRU or LFU policy.
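The LRU variant with locking can be sketched with `collections.OrderedDict`, where `move_to_end` marks a key as most recently used. Locked ("pinned") keys are skipped during eviction. The class and method names are hypothetical. If every other key is pinned, this sketch may evict the key just recorded, or tolerate a temporary overflow when all recorded keys are pinned.

```python
from collections import OrderedDict


class LruHotKeyRecorder:
    """Hypothetical LRU record of hotspot keys; pinned keys are never evicted."""

    def __init__(self, max_keys: int) -> None:
        self.max_keys = max_keys
        self.counts: OrderedDict[bytes, int] = OrderedDict()
        self.pinned: set[bytes] = set()

    def record(self, key: bytes) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1
        self.counts.move_to_end(key)  # mark as most recently used
        while len(self.counts) > self.max_keys:
            # Oldest entries come first; skip keys locked for observation.
            victim = next((k for k in self.counts if k not in self.pinned), None)
            if victim is None:
                break  # everything is pinned: allow a temporary overflow
            del self.counts[victim]

    def pin(self, key: bytes) -> None:
        self.pinned.add(key)

    def unpin(self, key: bytes) -> None:
        self.pinned.discard(key)
```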
Next, the hotspot identification method and the rate limiting method shown in this specification are described through a specific embodiment.
The example tree has 4 layers, and each group of hotspot indexes in each layer contains 256 indexes. Its structure, update, and reset are described below.
First, the concept of a sketch is introduced. A sketch is a group of 256 hotspot indexes and their count values. Its structure is shown in Figure 4A. Each sketch contains 256 hotspot indexes and 256 count values. It also stores the maximum value each count value can reach, the layer to which the sketch belongs, and so on.
The structure of the tree built from sketches is shown in Figure 4B. It has 4 layers (for convenience, Figure 4B shows only one sketch per layer):
- The first layer contains 1 sketch.
- The second layer contains the sketches under the first layer's hotspot indexes, that is, at most 256 sketches.
- The third layer contains at most 256×256 sketches.
- The fourth layer contains at most 256×256×256 sketches.
Each sketch has 256 hotspot indexes, so the tree can distinguish at most 256^4 hotspots, which is the number of hotspot indexes in the fourth layer. Different layers use different hash functions.
针对热点识别树存在两种操作,第一种是更新热点识别树,也就是在接收到数据访问请求的情况下,更新热点识别树,顺便判断该数据访问请求所访问的数据是不是热点。第二种是重置热点识别树。 There are two operations for the hotspot identification tree. The first is to update the hotspot identification tree, that is, when a data access request is received, the hotspot identification tree is updated, and by the way, it is determined whether the data accessed by the data access request is a hotspot. The second is to reset the hotspot identification tree.
The update process is described first.
First, the hash function of the tree's first layer is applied to the primary key of the data targeted by the data access request. This yields the first-layer hotspot index. To keep the index within the 256 hotspot indexes of a sketch, the hash value is reduced modulo 256, and the remainder is used as the first-layer hotspot index.
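Below is one way to give each layer its own hash function and reduce the result to a hotspot index in the range 0 to 255. The source does not name its hash functions. Salting BLAKE2b from Python's `hashlib` with the layer number is an assumption made purely for illustration, and `layer_index` is a hypothetical helper.

```python
import hashlib

FANOUT = 256  # hotspot indexes per sketch


def layer_index(primary_key: bytes, layer: int) -> int:
    """Hotspot index of `primary_key` in the given 1-based layer.

    Assumption: each layer gets its own hash function by salting BLAKE2b
    with the layer number; the 64-bit digest is then reduced modulo 256.
    """
    salt = layer.to_bytes(16, "little")  # BLAKE2b accepts a salt of up to 16 bytes
    digest = hashlib.blake2b(primary_key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % FANOUT


# Same key, different layers: usually two different indexes in [0, 256).
print(layer_index(b"user:42", 1), layer_index(b"user:42", 2))
```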
After the first-layer hotspot index is obtained, the method checks whether that index has corresponding second-layer hotspot indexes, i.e., a second-layer sketch beneath it.
- If it does, the primary key's second-layer hotspot index is computed, and the method checks whether that index has a third-layer sketch beneath it.
- This repeats until the last-layer hotspot index of the primary key is found.

Here, "last layer" means the deepest layer that currently exists along this primary key's path. It is not necessarily the fourth layer of the tree.
Once the last-layer hotspot index is determined, its count value is incremented by 1 and compared with the maximum count value (14).
- If the incremented value exceeds the maximum, the next-layer sketch under that hotspot index is expanded (i.e., initialized).
- If there is no next layer to expand into, the primary key is determined to be a hotspot primary key.
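The update procedure of the preceding two paragraphs can be modelled in Python as follows. This is an illustrative sketch, not the patented implementation, and it rests on two assumptions. First, sketches are stored in a dictionary keyed by the path of parent hotspot indexes that leads to them. Second, the per-layer hash functions are salted BLAKE2b. `HotspotTree`, `update`, and `layer_index` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple

FANOUT = 256    # hotspot indexes per sketch
MAX_LAYERS = 4  # depth of the tree
MAX_COUNT = 14  # a counter overflows once it exceeds this value

Path = Tuple[int, ...]


def layer_index(primary_key: bytes, layer: int) -> int:
    # Assumption: per-layer hash functions obtained by salting BLAKE2b with the layer.
    salt = layer.to_bytes(16, "little")
    digest = hashlib.blake2b(primary_key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % FANOUT


class HotspotTree:
    def __init__(self) -> None:
        # Each sketch is identified by the path of parent hotspot indexes that
        # leads to it; the single first-layer sketch has the empty path.
        self.sketches: Dict[Path, List[int]] = {(): [0] * FANOUT}

    def update(self, primary_key: bytes) -> bool:
        """Record one access; return True if the key is judged a hotspot."""
        path: Path = ()
        layer = 1
        index = layer_index(primary_key, layer)
        # Descend while a child sketch exists under the current hotspot index.
        while path + (index,) in self.sketches:
            path += (index,)
            layer += 1
            index = layer_index(primary_key, layer)
        counters = self.sketches[path]
        counters[index] += 1
        if counters[index] > MAX_COUNT:
            if layer < MAX_LAYERS:
                # Expand: initialize the next-layer sketch under this index.
                self.sketches[path + (index,)] = [0] * FANOUT
            else:
                # No deeper layer to expand into: report a hotspot.
                return True
        return False


tree = HotspotTree()
hits = [tree.update(b"user:42") for _ in range(60)]
print(hits.index(True) + 1)  # 60: first reported on the 60th access
```

Consider a fresh tree with no other traffic. A single key must push one counter past 14 in each of the four layers, so it is first reported on its 60th access (4 × 15). Collisions with other keys only add to counters, so they can make this happen sooner but never later.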
This completes the identification of hotspot primary keys. In this process, the hash functions spread data access requests across many hotspot indexes, where counts accumulate. A higher accumulated value indicates a higher access frequency; this property relies on the tree reset process described below.

Each sketch has far fewer hotspot indexes (256) than there are distinct data items being accessed. A low-frequency key may therefore hash to the same index as a high-frequency key and be falsely reported. To counter this, the tree uses multiple layers, each with its own hash function. Keys that collide at the first layer are likely to be separated by the different hash function at the second layer. With 4 layers and 4 hash functions, false positives caused by hash collisions are greatly reduced.
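The benefit of independent per-layer hash functions can be quantified under an idealized assumption that the source does not state: each layer's hash maps keys uniformly and independently to the 256 indexes. Under that assumption, a given cold key shares its full path with a given hot key through all L layers with probability

```latex
\Pr[\text{same index in all } L \text{ layers}]
  = \left(\frac{1}{256}\right)^{L} = 2^{-8L},
\qquad L = 4 \;\Rightarrow\; 2^{-32} \approx 2.3 \times 10^{-10}.
```

With a single layer, the same probability would be 1/256. Real hash functions only approximate independence, so this is an idealized bound rather than a measured rate.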
Note also that in a non-relational (NoSQL) database, every piece of data has a primary key that uniquely identifies it. On the read/write path, the primary key of each accessed data item is passed to the method above to determine whether that item is a hotspot. Each call returns a boolean (true or false). A return value of true means there is a high probability that the data identified by the primary key is hotspot data currently being accessed frequently.
In addition, once the method returns true for a primary key, every subsequent read and write access to that key is recorded. This serves two purposes:
- The method above does not report how many requests a key actually receives. Recording yields the true access volume of each hotspot primary key, so operations staff can see it directly.
- The method has a small probability of returning true for low-frequency data, mistaking rarely accessed data for hotspot data. The records reveal which keys are genuine high-frequency hotspots, so the low-frequency ones can be removed. This prevents rarely accessed data from being rate-limited by mistake and inconveniencing users.
The number of continuously recorded primary keys is also capped at 1,000. Once the cap is exceeded, entries are evicted on an LRU (least recently used) basis. Entries not accessed for a certain period (for example, 5 minutes) are also evicted. This has two benefits: it filters out falsely reported low-frequency keys more quickly, and it bounds memory overhead so that only the access data of the most frequent hot keys is tracked.
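A minimal recorder implementing both eviction rules might look like this: a cap of 1,000 keys evicted in LRU order, plus expiry after 5 minutes idle. Three details are assumptions: the class name `HotKeyRecorder`, evicting lazily on each `record` call, and passing timestamps in explicitly (which keeps the behavior easy to test).

```python
import time
from collections import OrderedDict
from typing import Optional


class HotKeyRecorder:
    """Tracks access counts for keys reported as hot, with bounded memory."""

    def __init__(self, capacity: int = 1000, idle_seconds: float = 300.0) -> None:
        self.capacity = capacity          # at most this many keys are tracked
        self.idle_seconds = idle_seconds  # keys idle longer than this are dropped
        # key -> [access_count, last_access_time]; order = least recently used first
        self._entries: "OrderedDict[bytes, list]" = OrderedDict()

    def record(self, key: bytes, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        entry = self._entries.setdefault(key, [0, now])
        entry[0] += 1
        entry[1] = now
        self._entries.move_to_end(key)  # mark as most recently used
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Idle expiry: the front entry is the least recently used, so stop at
        # the first one that is still fresh.
        while self._entries:
            _, (_, last_access) = next(iter(self._entries.items()))
            if now - last_access <= self.idle_seconds:
                break
            self._entries.popitem(last=False)
        # Capacity limit: evict in LRU order.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def count(self, key: bytes) -> int:
        entry = self._entries.get(key)
        return entry[0] if entry else 0
```

Because entries are kept in last-access order, idle expiry only needs to inspect the front of the dictionary.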
Next, the tree reset process is described in detail. If the update process above is viewed as expanding the hotspot identification tree, the reset process is the corresponding shrinking of the tree.
The tree is updated by the method above on every data access request. The number of times the method has run can therefore serve as the number of data access requests received (that is, requests not rejected by rate limiting). When this number exceeds a preset threshold, the tree is reset: layers 2 to 4 are deleted and all count values in the first layer are set to 0.

The reason for this is that traffic changes constantly, and data that is accessed frequently may stay hot only for a while. A periodic shrinking mechanism filters out historically hot data. As a result, the method always identifies the data being accessed frequently right now, within a bounded memory budget.
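The reset step can be sketched as below. The sketch assumes, hypothetically, that the tree is a dictionary mapping each sketch's path of parent indexes to its list of 256 counters, with the empty path as the first-layer sketch. Deleting every non-empty path removes layers 2 to 4, and the root counters are zeroed. `ResetPolicy` and `on_request` are hypothetical names.

```python
from typing import Dict, List, Tuple

FANOUT = 256
Path = Tuple[int, ...]


class ResetPolicy:
    """Counts tree updates and shrinks the tree once a threshold is exceeded."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # preset number of requests per window
        self.requests = 0

    def on_request(self, sketches: Dict[Path, List[int]]) -> bool:
        """Call once per update; return True if this call reset the tree."""
        self.requests += 1
        if self.requests <= self.threshold:
            return False
        self.requests = 0
        # Delete layers 2..4: every sketch except the root (empty path).
        for path in [p for p in sketches if p]:
            del sketches[path]
        # Reset all first-layer count values to 0.
        sketches[()] = [0] * FANOUT
        return True
```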
The reset also makes it possible to select hotspot primary keys whose access frequency exceeds a given value. Here, access frequency means the number of accesses to this data divided by the number of accesses to all data. Specifically, suppose the tree is reset after every P updates. By setting the maximum count value of each layer, one can then capture the hotspot primary keys whose access frequency is above a predetermined level.
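The relationship between the reset period P, the per-layer maximum count, and the lowest detectable frequency can be estimated as follows. This estimate is a reading of the update rule, not a formula from the source. With no collisions, a key must push one counter past the maximum in each of the 4 layers, i.e. 4 × (14 + 1) = 60 accesses within one reset window; collisions can only lower this number. The value P = 2000 below is purely hypothetical.

```python
def accesses_to_flag(layers: int = 4, max_count: int = 14) -> int:
    """Accesses one key needs, with no hash collisions, to be reported as hot.

    At each layer the key's counter must exceed max_count, i.e. reach
    max_count + 1, before the next layer is created (or, at the last
    layer, before the key is reported).
    """
    return layers * (max_count + 1)


def min_detectable_frequency(reset_period: int, layers: int = 4, max_count: int = 14) -> float:
    """Fraction of all requests a key needs within one reset window of P updates."""
    return accesses_to_flag(layers, max_count) / reset_period


print(accesses_to_flag())              # 60
print(min_detectable_frequency(2000))  # 0.03 with the hypothetical P = 2000
```

Under this estimate, raising the per-layer maximum count or shortening P raises the frequency a key needs in order to be reported.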
Once hotspot primary keys are discovered, the continuously recorded keys show the current actual read/write throughput. Operations staff can then apply rate limiting, for example with a single-node rate limiter such as RateLimiter in Guava. Rate-limited requests are handled with a fail-fast strategy that returns an error to the client, so the system protects itself.
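Guava's `RateLimiter` is a Java class. As a language-neutral illustration of the same idea, here is a minimal token-bucket limiter in Python. It has a non-blocking `try_acquire`, similar in spirit to Guava's `tryAcquire()`, and a fail-fast wrapper that raises instead of waiting. The names `TokenBucket`, `admit`, and `HotKeyRejected` are hypothetical, and this is not a port of Guava's implementation.

```python
import time
from typing import Optional


class HotKeyRejected(Exception):
    """Raised to fail fast when a hot key is over its budget."""


class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: float, now: Optional[float] = None) -> None:
        self.rate = rate_per_sec  # tokens added per second
        self.capacity = burst     # maximum tokens that can accumulate
        self.tokens = burst
        self.last = time.monotonic() if now is None else now

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Take one token if available; never blocks."""
        now = time.monotonic() if now is None else now
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def admit(bucket: TokenBucket, now: Optional[float] = None) -> None:
    """Fail fast: reject immediately instead of queueing the request."""
    if not bucket.try_acquire(now):
        raise HotKeyRejected("hot key over its rate limit; retry later")
```

A caller would keep one bucket per recorded hot key and call `admit` before serving each request for that key.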
The method is very memory-efficient. With the parameters of the example above, a single table with no hotspots occupies only 500 bytes of resident memory. Per-table memory has a theoretical upper bound of 30 KB, reached when the table is hot and has multiple hot keys. The memory overhead is thus independent of the amount of data actually stored, so the method suits databases holding massive amounts of data.
With this extremely low memory overhead and the parameters above, the scheme can detect keys whose access frequency is as low as 3.5%. It also collects their QPS (queries per second) and RT (response time). This fully meets the requirements of hotspot identification.
The method's CPU cost is of log(N) complexity, so its impact on read/write response time is small. With no hotspots, a personal computer can sustain a high rate of hot-key checks. When hotspots exist, the sketches expand and checks become about 1 to 2 times slower. The performance overhead therefore stays small.
In addition, the false positive rate can be calculated from the parameters above, so the parameters can be chosen to keep that rate low.
Corresponding to the foregoing method embodiments, this specification also provides embodiments of apparatuses and of the electronic devices in which they are used.
Figure 5 is a block diagram of a hotspot identification apparatus according to an exemplary embodiment of this specification. As shown in Figure 5, the apparatus includes:
Initialization module 500, configured to initialize in advance the count values of the multiple hotspot indexes in the hotspot identification tree to 0. The number of hotspot indexes in the tree is smaller than the number of data items in the database.
Primary key determination module 510, configured to determine, upon receiving a data access request, the primary key of the data accessed by the request.
Count value determination module 520, configured to compute the hotspot index corresponding to the primary key and to determine the corresponding count value in the hotspot identification tree according to that hotspot index.
Hotspot determination module 530, configured to determine that the primary key is a hotspot primary key if the count value meets a preset hotspot condition, and to increment the count value by one if it does not.
In an optional embodiment, the hotspot identification tree includes a first layer and a second layer. The count value determination module 520 is specifically configured to:
- compute the primary key's first-layer hotspot index using the hash function of the tree's first layer;
- if the tree has no second-layer hotspot index corresponding to the first-layer hotspot index, take the count value of the first-layer hotspot index as the corresponding count value;
- if the tree has such a second-layer hotspot index, compute the primary key's second-layer hotspot index using the second layer's hash function, and take the count value of that second-layer hotspot index as the corresponding count value.
In an optional embodiment, the initialization module 500 is specifically configured to initialize in advance the count values of the multiple hotspot indexes in the first layer of the tree to 0. The apparatus further includes an expansion module 521 (not shown). If the count value of a first-layer hotspot index exceeds a preset expansion threshold after being incremented by 1, module 521 creates multiple second-layer hotspot indexes under that first-layer hotspot index and initializes their count values to 0.
In an optional embodiment, the apparatus further includes a reset module 523 (not shown), configured to:
- increment an access request count by 1 whenever a data access request for any data is received;
- when the access request count exceeds a preset access request threshold, reset the access request count to 0 and set the count values of the multiple hotspot indexes in the tree to 0.

In this case, the hotspot determination module 530 is specifically configured to determine that the primary key is a hotspot primary key if the count value reaches a preset hotspot threshold.
In an optional embodiment, the hotspot determination module 530 is specifically configured to determine that the primary key is a hotspot primary key if the ratio of its count value to the sum of all count values exceeds a preset hotspot ratio.
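The ratio-based hotspot condition of this embodiment reduces to a short check. This paragraph does not specify which counters make up "all count values", so the caller passes them in. The function name is hypothetical.

```python
from typing import Sequence


def is_hot_by_ratio(count: int, all_counts: Sequence[int], hot_ratio: float) -> bool:
    """True if `count` is more than `hot_ratio` of the summed counters."""
    total = sum(all_counts)
    # Guard against an all-zero tree (e.g. right after a reset).
    return total > 0 and count / total > hot_ratio
```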
In an optional embodiment, the apparatus further includes a recording module 531 (not shown). When the primary key is determined to be a hotspot primary key, module 531 records the primary key together with its access volume.
In an optional embodiment, the apparatus further includes a deletion module 532. When the number of recorded primary keys exceeds a preset threshold, module 532 deletes the primary key with the lowest access frequency, based on the recorded access volume of each key.
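Below is a sketch of this deletion rule. It assumes, hypothetically, that the recorded access volume is a plain per-key counter in a dictionary. `record_access` is a hypothetical name.

```python
from typing import Dict


def record_access(counts: Dict[bytes, int], key: bytes, max_keys: int) -> None:
    """Count one access to a hot key; drop the least-accessed keys over the cap."""
    counts[key] = counts.get(key, 0) + 1
    while len(counts) > max_keys:
        # min() returns the first key with the lowest count (insertion order on ties).
        coldest = min(counts, key=counts.__getitem__)
        del counts[coldest]
```

As written, a key that has just been recorded with a count of 1 may itself be the one evicted. An implementation might exempt the newest key, or combine this rule with the LRU and idle-time eviction described for the method.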
Figure 6 is a block diagram of a rate limiting apparatus according to an exemplary embodiment of this specification. As shown in Figure 6, the apparatus includes:
Request receiving module 610, configured to receive a data access request for target data in the database. The request carries the primary key of the target data.
Request blocking module 620, configured to block the data access request for the target data if the primary key is a hotspot primary key. The hotspot primary key is obtained by the hotspot identification method described above.
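The blocking step amounts to a membership check before the read is served. In the minimal sketch below, the set of hotspot primary keys is assumed to come from the identification method. `HotspotBlocked`, `handle_request`, and the `read` callback are hypothetical names.

```python
from typing import Callable, Set


class HotspotBlocked(Exception):
    """Returned to the client (as an error) when a hot key is blocked."""


def handle_request(primary_key: bytes, hot_keys: Set[bytes],
                   read: Callable[[bytes], bytes]) -> bytes:
    """Serve the read unless the primary key is a known hotspot primary key."""
    if primary_key in hot_keys:
        raise HotspotBlocked(f"access to hot key {primary_key!r} is blocked")
    return read(primary_key)
```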
For how the functions and effects of each module above are implemented, see the corresponding steps of the method above. The details are not repeated here.
The apparatus embodiments essentially correspond to the method embodiments, so refer to the method embodiments for the relevant details. The apparatus embodiments described above are merely illustrative:
- Modules described as separate components may or may not be physically separate.
- Components shown as modules may or may not be physical modules; they may be located in one place or distributed across multiple network modules.
- Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions in this specification.

A person of ordinary skill in the art can understand and implement them without creative effort.
Figure 7 is a hardware structure diagram of an electronic device in which the hotspot identification apparatus or rate limiting apparatus of the embodiments resides. As shown in Figure 7, the device may include a processor 1010, a memory 1020 for storing computer instructions, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 communicate with one another within the device through the bus 1050.
The processor 1010 may be a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits. It executes computer instructions to implement the hotspot identification method or rate limiting method described above.
The memory 1020 may be ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 can store an operating system and other application programs. When the technical solutions of the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and invoked and executed by the processor 1010.
The input/output interface 1030 connects input/output modules for information input and output. An input/output module may be built into the device as a component (not shown) or attached externally to provide the corresponding function. Input devices may include a keyboard, mouse, touch screen, microphone, and various sensors. Output devices may include a display, speaker, vibrator, and indicator lights.
The communication interface 1040 connects a communication module (not shown) so that this device can communicate with other devices. The communication module may communicate over a wired connection (such as USB or a network cable) or wirelessly (such as a mobile network, Wi-Fi, or Bluetooth).
The bus 1050 provides a path for transferring information between the components of the device, such as the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
Note that the device above is shown with only the processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050. In a specific implementation, the device may also include other components necessary for normal operation. Conversely, those skilled in the art will understand that the device may include only the components necessary to implement the embodiments of this specification, rather than all of the components shown in the figure.
Embodiments of this specification also provide a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the hotspot identification method and/or the rate limiting method described above.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data.

Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, and other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile discs (DVD), and other optical storage;
- magnetic cassettes, magnetic tape or magnetic disk storage, and other magnetic storage devices;
- any other non-transmission medium that can store information accessible by a computing device.

As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Embodiments of this specification also provide a computer program which, when run, implements the hotspot identification method or the rate limiting method described above.
It should also be noted that the terms "comprise", "include", and any variants of them denote non-exclusive inclusion. A process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by "includes a..." does not exclude additional identical elements in the process, method, article, or device that includes it.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order from the embodiments and still achieve the desired results. The processes depicted in the figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.

Claims (13)

  1. A hotspot identification method, in which count values corresponding to multiple hotspot indexes included in a hotspot identification tree are initialized to 0 in advance, the number of hotspot indexes in the hotspot identification tree being smaller than the number of data items in a database; the method comprising:
    upon receiving a data access request, determining the primary key of the data accessed by the data access request;
    computing a hotspot index corresponding to the primary key, and determining a corresponding count value in the hotspot identification tree according to the hotspot index;
    in response to the count value meeting a preset hotspot condition, determining that the primary key is a hotspot primary key; and in response to the count value not meeting the preset hotspot condition, incrementing the count value by one.
  2. The method according to claim 1, wherein the hotspot identification tree comprises a first layer and a second layer;
    the computing a hotspot index corresponding to the primary key and determining a corresponding count value in the hotspot identification tree according to the hotspot index comprises:
    computing a first-layer hotspot index corresponding to the primary key according to a hash function corresponding to the first layer of the hotspot identification tree;
    in response to no second-layer hotspot index corresponding to the first-layer hotspot index existing in the hotspot identification tree, determining the count value corresponding to the first-layer hotspot index as the corresponding count value;
    in response to a second-layer hotspot index corresponding to the first-layer hotspot index existing in the hotspot identification tree, determining a second-layer hotspot index corresponding to the primary key according to a hash function corresponding to the second layer, and determining the count value corresponding to the second-layer hotspot index as the corresponding count value.
  3. The method according to claim 2, wherein the initializing in advance of the count values corresponding to multiple hotspot indexes included in the hotspot identification tree to 0 comprises:
    initializing in advance the count values corresponding to multiple hotspot indexes included in the first layer of the hotspot identification tree to 0;
    the method further comprising:
    in response to the count value corresponding to the first-layer hotspot index exceeding a preset expansion threshold after being incremented by 1, creating multiple second-layer hotspot indexes corresponding to the first-layer hotspot index, and initializing the count values corresponding to the multiple second-layer hotspot indexes to 0.
  4. The method according to claim 1, further comprising:
    upon receiving a data access request for any data, incrementing an access request count by 1;
    when the access request count is greater than a preset access request threshold, resetting the access request count to 0, and setting the count values corresponding to the multiple hotspot indexes included in the hotspot identification tree to 0;
    wherein the determining, in response to the count value meeting a preset hotspot condition, that the primary key is a hotspot primary key comprises:
    in response to the count value reaching a preset hotspot threshold, determining that the primary key is a hotspot primary key.
  5. The method according to claim 1, wherein the determining, in response to the count value meeting a preset hotspot condition, that the primary key is a hotspot primary key comprises:
    in response to the ratio of the count value to the sum of all count values exceeding a preset hotspot ratio, determining that the primary key is a hotspot primary key.
  6. The method according to claim 1, further comprising:
    when the primary key is determined to be a hotspot primary key, recording the primary key and recording the access volume of the primary key.
  7. The method according to claim 6, further comprising:
    when the number of recorded primary keys exceeds a preset primary key number threshold, deleting the primary key with the lowest access frequency according to the recorded access volume of each primary key.
  8. A rate limiting method, comprising:
    receiving a data access request for target data in a database, the data access request carrying a primary key corresponding to the target data;
    in response to the primary key being a hotspot primary key, blocking the data access request for the target data, wherein the hotspot primary key is obtained by the hotspot identification method according to any one of claims 1 to 7.
  9. A hotspot identification apparatus, in which count values corresponding to multiple hotspot indexes included in a hotspot identification tree are initialized to 0 in advance, the number of hotspot indexes in the hotspot identification tree being smaller than the number of data items in a database; the apparatus comprising:
    a primary key determination module, configured to determine, upon receiving a data access request, the primary key of the data accessed by the data access request;
    a count value determination module, configured to compute a hotspot index corresponding to the primary key, and to determine a corresponding count value in the hotspot identification tree according to the hotspot index;
    a hotspot determination module, configured to determine that the primary key is a hotspot primary key in response to the count value meeting a preset hotspot condition, and to increment the count value by one in response to the count value not meeting the preset hotspot condition.
  10. A rate limiting device, comprising:
    A request receiving module, configured to receive a data access request for target data in a database, the data access request carrying the primary key corresponding to the target data;
    A request blocking module, configured to block the data access request for the target data in response to the primary key being a hotspot primary key, wherein the hotspot primary key is obtained by the hotspot identification method according to any one of claims 1-7.
  11. An electronic device, comprising:
    A processor; and
    A memory for storing instructions executable by the processor;
    Wherein the processor executes the executable instructions to implement the hotspot identification method according to any one of claims 1-7 or the rate limiting method according to claim 8.
  12. A computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the hotspot identification method according to any one of claims 1-7 or the rate limiting method according to claim 8.
  13. A computer program which, when run, implements the hotspot identification method according to any one of claims 1-7 or the rate limiting method according to claim 8.
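The counting scheme of claim 9, the eviction rule that closes claim 7, and the blocking rule of claims 8 and 10 can be illustrated with a minimal Python sketch. It is written under loudly stated assumptions and is not the applicant's implementation: the multi-layer hotspot identification tree is flattened into a single array of counters, CRC32 modulo the array size stands in for the unspecified hotspot index computation, the "preset hotspot condition" is taken to be `count >= hot_threshold`, and the "recorded primary keys" are assumed to be the keys already identified as hotspots. The names `HotspotIdentifier`, `handle_request`, `hot_threshold` and `max_recorded_keys` are hypothetical.

```python
import zlib


class HotspotIdentifier:
    """Hypothetical sketch of hotspot identification (claims 7 and 9).

    ASSUMPTIONS: the hotspot identification tree is flattened into one
    counter array; CRC32 stands in for the unspecified index computation;
    the hotspot condition is "count >= hot_threshold"; recorded keys are
    the keys already identified as hotspots.
    """

    def __init__(self, num_indexes: int = 1024, hot_threshold: int = 100,
                 max_recorded_keys: int = 64) -> None:
        self.num_indexes = num_indexes            # fewer indexes than database rows
        self.hot_threshold = hot_threshold        # assumed hotspot condition
        self.max_recorded_keys = max_recorded_keys  # assumed key-number threshold (> 0)
        self.counters = [0] * num_indexes         # every count value starts at 0
        self.recorded = {}                        # primary key -> access volume

    def index_of(self, primary_key: str) -> int:
        # Hash the primary key onto a hotspot index; distinct keys may collide.
        return zlib.crc32(primary_key.encode("utf-8")) % self.num_indexes

    def on_access(self, primary_key: str) -> bool:
        """Return True if this access identifies primary_key as a hotspot."""
        idx = self.index_of(primary_key)
        if self.counters[idx] >= self.hot_threshold:
            self.record(primary_key)
            return True
        # Condition not met: increment the count value by one.
        self.counters[idx] += 1
        return False

    def record(self, primary_key: str) -> None:
        # When the record is full, delete the key with the lowest access volume.
        # Evicting before inserting keeps a newly identified key from being
        # evicted immediately; this ordering is a choice of the sketch.
        if (primary_key not in self.recorded
                and len(self.recorded) >= self.max_recorded_keys):
            coldest = min(self.recorded, key=self.recorded.get)
            del self.recorded[coldest]
        self.recorded[primary_key] = self.recorded.get(primary_key, 0) + 1

    def is_hotspot(self, primary_key: str) -> bool:
        return primary_key in self.recorded


def handle_request(identifier: HotspotIdentifier, primary_key: str) -> bool:
    """Rate limiting (claim 8): return False (blocked) for hotspot keys."""
    identifier.on_access(primary_key)
    return not identifier.is_hotspot(primary_key)


if __name__ == "__main__":
    ident = HotspotIdentifier(hot_threshold=3)
    print([handle_request(ident, "user:42") for _ in range(4)])
    # [True, True, True, False]
```

With `hot_threshold=3`, the first three requests for one key only raise its count value and are served; the fourth finds the count at the threshold, marks the key as a hotspot and is blocked. Because there are fewer counters than rows, keys that hash to the same index share a count value and can be flagged together. The sketch never resets counters; the excerpt does not say whether or how count values decay over time.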
PCT/CN2023/081466 2022-03-22 2023-03-14 Hotspot recognition method and rate limiting method WO2023179414A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210289116.6A CN114756544A (en) 2022-03-22 2022-03-22 Hot spot identification method and current limiting method
CN202210289116.6 2022-03-22

Publications (1)

Publication Number Publication Date
WO2023179414A1 true WO2023179414A1 (en) 2023-09-28

Family

ID=82326426

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081466 WO2023179414A1 (en) 2022-03-22 2023-03-14 Hotspot recognition method and rate limiting method

Country Status (2)

Country Link
CN (1) CN114756544A (en)
WO (1) WO2023179414A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756544A (en) * 2022-03-22 2022-07-15 阿里云计算有限公司 Hot spot identification method and current limiting method
CN115051952A (en) * 2022-08-16 2022-09-13 阿里巴巴(中国)有限公司 Current limiting processing method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220287A (en) * 2017-04-24 2017-09-29 东软集团股份有限公司 For the index managing method of log query, device, storage medium and equipment
US20190347343A1 (en) * 2018-05-09 2019-11-14 Palantir Technologies Inc. Systems and methods for indexing and searching
CN110716794A (en) * 2019-10-14 2020-01-21 网银在线(北京)科技有限公司 Information processing method, device, system and readable storage medium
CN113448922A (en) * 2021-08-30 2021-09-28 阿里云计算有限公司 Data archiving method, data access method and respective devices
CN114756544A (en) * 2022-03-22 2022-07-15 阿里云计算有限公司 Hot spot identification method and current limiting method

Also Published As

Publication number Publication date
CN114756544A (en) 2022-07-15

Similar Documents

Publication Publication Date Title
WO2023179414A1 (en) Hotspot recognition method and rate limiting method
TWI727226B (en) Multi-level storage method and device for blockchain data
US20220121384A1 (en) Hot Data Management Method, Apparatus, and System
US10270668B1 (en) Identifying correlated events in a distributed system according to operational metrics
US20200050694A1 (en) Burst Performance of Database Queries According to Query Size
US6574667B1 (en) Dynamic routing for performance partitioning in a data processing network
US10222987B2 (en) Data deduplication with augmented cuckoo filters
US7430640B2 (en) Detecting when to prefetch inodes and then prefetching inodes in parallel
US7467143B2 (en) Storage operation management system
WO2017050014A1 (en) Data storage processing method and device
US10769126B1 (en) Data entropy reduction across stream shard
CN110851311A (en) Service fault identification method, device, equipment and storage medium
JP2019523952A (en) Streaming data distributed processing method and apparatus
WO2017028696A1 (en) Method and device for monitoring load of distributed storage system
CN107562383B (en) Information processing method, storage device, and storage medium
WO2019001085A1 (en) Method and apparatus for monitoring database performance, computer device and storage medium
JP5329756B2 (en) Tracking space usage in the database
WO2016165542A1 (en) Method for analyzing cache hit rate, and device
CN110750498B (en) Object access method, device and storage medium
CN108920326B (en) Method and device for determining time-consuming abnormity of system and electronic equipment
EP3817432A1 (en) Data processing method and system
US11232228B2 (en) Method and device for improving data storage security
CN112711564A (en) Merging processing method and related equipment
WO2023109046A1 (en) Anomaly detection method and apparatus, electronic device, and storage medium
CN111291409B (en) Data monitoring method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23773662

Country of ref document: EP

Kind code of ref document: A1