CN114116796A - Distributed cache system for preventing cache stampede

Distributed cache system for preventing cache stampede

Info

Publication number
CN114116796A
Authority
CN
China
Prior art keywords
cache
key
commitment
distributed
count
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111287708.6A
Other languages
Chinese (zh)
Inventor
刘津
赵山
许晓笛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunxi Technology Co ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202111287708.6A
Publication of CN114116796A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0808 Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0842 Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed cache system for preventing cache stampede, belonging to the technical field of cloud storage systems. The distributed cache system for preventing cache stampede comprises a protocol parser, a cache processor, a key counter, a control configuration item, a current limiter and a protocol generator, wherein the protocol parser, the cache processor and the protocol generator are components of the distributed cache system, and the key counter, the control configuration item and the current limiter are newly added components for solving the cache stampede problem. The distributed cache system for preventing cache stampede can effectively avoid the cache stampede problem that arises when data is not pre-stored in the cache or suddenly becomes invalid in a distributed cache system, and has good popularization and application value.

Description

Distributed cache system for preventing cache stampede
Technical Field
The invention relates to the technical field of cloud storage systems, and particularly provides a distributed cache system for preventing cache stampede.
Background
In the field of computers, a cache is a software/hardware component for temporarily storing data so that future requests can be answered more quickly; the data stored in the cache may be the result of a previous calculation or data held in any other storage facility. The distributed cache is an extension of the traditional, locally and independently used cache: it can be deployed across multiple servers to extend the capacity and data transmission capability of the cache. The distributed cache is mainly used for temporarily storing application data or session data of web servers, and in recent years Internet companies have widely applied it in business scenarios with high-concurrency, read-heavy request loads.
Key: the cache system discussed in this invention is a key-value cache. The key is the unique identifier of stored data; the cache client stores or accesses data in the cache through its key, and the key of each piece of data is distinct and unique.
Value: the content in the cache corresponding to a key, i.e., the copy of the data held in the cache.
Hit: a data request queries the cache and returns the required data; otherwise it is a miss.
Timeout: the time the cached data has existed reaches the maximum time-to-live the system allows for it.
Eviction: the cache system discards some cached data (overwriting it with new data) according to certain policies.
Invalidation: data in the cache is removed because the preset timeout has been reached, or is evicted according to the preset eviction policy once the cache system's storage space reaches its upper limit.
Penetration: the phenomenon in which the data sought by a query request misses the cache and the query goes directly to the persistence layer instead.
Breakdown: the phenomenon in which a large number of query requests hit the database directly at the moment a cached entry becomes invalid.
Avalanche: the phenomenon in which data in the cache becomes invalid on a large scale, so that a large number of cache client processes break through the cache and the database becomes overloaded or even goes down.
A cascading system failure is the situation in which, in a system with multiple associated components, the failure of one or some of the components recursively causes successive failures of other components. A cache stampede is a cascading system failure that occurs in heavily loaded concurrent computing systems that have a caching mechanism.
Disclosure of Invention
The technical task of the present invention is to provide a distributed cache system for preventing cache stampede which can effectively avoid the cache stampede problem that arises when data is not pre-stored in the cache or suddenly becomes invalid in a distributed cache system.
To achieve this purpose, the invention provides the following technical solution:
A distributed cache system for preventing cache stampede comprises a protocol parser, a cache processor, a key counter, a control configuration item, a current limiter and a protocol generator, wherein the protocol parser, the cache processor and the protocol generator are components of the distributed cache system, and the key counter, the control configuration item and the current limiter are newly added components for solving the cache stampede problem.
Preferably, the protocol parser is responsible for receiving and parsing the request messages of the cache client, obtaining the key value to be stored or the data key to be queried.
Preferably, the cache processor is configured to write or query a key value according to a user request.
Preferably, a key counter is used to count keys requested by the user but not present in the cache.
Preferably, the key counter is responsible for recording, on a cache miss, the number of miss requests for the cached key; this count is integer key-value data, called a count entry, which the current limiter uses to evaluate the current-limiting condition.
The key counter creates a count entry for a cache key the first time the key misses and increments it by 1, and simply increments it by 1 whenever the key misses again (with Redis this can be done conveniently in one step by invoking the incr command, i.e., incr keyname_key_count, where keyname is the name of the missed key and _key_count is the suffix of the key counter's count entries).
After a cache client that received the miss queries the value of the data key and writes it into the cache, a subsequent hit on that key clears its count entry in the key counter, thereby reclaiming the memory resource and preventing the counter area from occupying memory without limit. If no cache client manages to write the data key value, the key counter clears the key's entry according to the maximum request count set in the control configuration item.
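As a minimal sketch of this key counter logic, assuming a Redis-backed counter accessed through the Jedis client library; the class name, method names and the refresh-on-maximum behavior shown here are illustrative assumptions, not the patented implementation:

    import redis.clients.jedis.Jedis;

    public class KeyCounter {
        private final Jedis jedis;
        private final long maxKeyCount; // key-counter-max-count from the control configuration

        public KeyCounter(Jedis jedis, long maxKeyCount) {
            this.jedis = jedis;
            this.maxKeyCount = maxKeyCount;
        }

        // Called on a cache miss: INCR creates the count entry at 1 on the first
        // miss and increments it by 1 on every later miss of the same key.
        public long recordMiss(String keyName) {
            String countKey = keyName + "_key_count";
            long count = jedis.incr(countKey);
            if (count > maxKeyCount) { // clear and recount once the configured maximum is exceeded
                jedis.del(countKey);
                count = jedis.incr(countKey);
            }
            return count;
        }

        // Called after a later hit on the key: reclaims the counter's memory.
        public void clearOnHit(String keyName) {
            jedis.del(keyName + "_key_count");
        }
    }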
Preferably, the control configuration items are divided into cache-system configuration items and client configuration items; the cache-system configuration items comprise the maximum key count value, the peak request count and the database maximum concurrency, and the client configuration items comprise the commitment limit and the maximum number of commitment retries.
The maximum key count value (key-counter-max-count) is used for count control of the key counter's count entries. For example, when key-counter-max-count is 100, the key counter automatically clears a key's count entry on that key's 101st miss.
The peak request count (peak-request-count) is used by the current limiter to calculate the commitment value, and can generally be estimated from the application system's daily peak number of users.
The database maximum concurrency (max-concurrency) is the maximum single-shot concurrency the database system can bear; to keep the database system running safely and stably it can be set to a value slightly smaller than the true maximum. It is likewise used by the current limiter to calculate the commitment value.
The commitment limit (commitment-limit) is used by the cache client, via the time hash function, to calculate how long to wait before initiating a retry after a request misses, and the maximum number of commitment retries (max-commitment-retry) is used to control retries. When max-commitment-retry is 0 no retry is made; when it is 3, a cache client receiving a commitment response initiates at most 3 retries. If the required data has still not been obtained after 3 retries, the database is considered to have failed, the clients that were released with miss responses before the commitment could not successfully refresh the cache, and the attempt should be abandoned and handled as a database connection timeout.
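For illustration only, these control configuration items could be collected as follows; the constant names mirror the configuration keys, and the default values are assumptions taken from the worked example later in this description:

    public final class CacheControlConfig {
        // Cache-system configuration items
        public static final long KEY_COUNTER_MAX_COUNT = 100;    // key-counter-max-count
        public static final long PEAK_REQUEST_COUNT = 2_000_000; // peak-request-count
        public static final long MAX_CONCURRENCY = 5_000;        // database max-concurrency

        // Client configuration items
        public static final long COMMITMENT_LIMIT_MS = 5_000;    // commitment-limit: max acceptable response time
        public static final int MAX_COMMITMENT_RETRY = 3;        // max-commitment-retry

        private CacheControlConfig() {}
    }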
Preferably, the current limiter compares the key count with the count threshold set in the configuration and classifies the non-existent keys according to the comparison result.
Preferably, the current limiter is used for current-limiting control: on a cache miss, the current limiter compares the missed key's count entry in the key counter with the commitment threshold in the control configuration item; if the value of the count entry is greater than the commitment threshold, a commitment is calculated and returned, otherwise a miss is returned directly.
Wherein the commitment value may be determined by the following hash function:
p=H(n) (1)
where n is the count entry value, i.e., the number of miss requests, and H(n) is a hash function of the count entry value, including but not limited to the hash functions commonly used in computer science.
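A sketch of this current-limiting decision, assuming for H(n) the simple floor-division form developed in the embodiment below; the class name and the string return convention are hypothetical:

    public final class CurrentLimiter {
        private final long commitmentThreshold; // count threshold from the control configuration
        private final long singleConcurrency;   // Ns: single maximum concurrency the database can bear

        public CurrentLimiter(long commitmentThreshold, long singleConcurrency) {
            this.commitmentThreshold = commitmentThreshold;
            this.singleConcurrency = singleConcurrency;
        }

        // n is the miss count entry of the requested key.
        public String decide(long n) {
            if (n <= commitmentThreshold) {
                return "MISS";                  // the client may query the database directly
            }
            long p = n / singleConcurrency;     // commitment value p = H(n) = floor(n / Ns)
            return "COMMITMENT " + p;           // handed to the protocol generator
        }
    }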
Preferably, the protocol generator is configured to encapsulate the return data into a TCP packet according to a specific caching protocol and send the TCP packet to the caching client.
The distributed cache system for preventing cache stampede extends a commitment protocol on top of the traditional cache system. Taking the RESP protocol as an example, when the cache misses and the current limiter determines that the current request needs a commitment returned, the protocol message is as follows:
+COMMITMENT 7\r\n, which indicates that the client receiving this commitment is the 7th caching client to receive a commitment.
The cache client extends the commitment handling and retry strategy: after the client receives the commitment protocol, the waiting time before the retry request is calculated from the commitment value and the commitment limit in the configuration items, with the formula:
Tw = Lp - p * tn (2)
where Tw is the time, in milliseconds (ms), that the caching client must wait before a retry, called the committed latency; Lp is the commitment limit, i.e., the maximum response time a client can accept, in milliseconds (ms); p is the commitment value; tn is the database's average response time for a single data query; and p * tn is called the commitment time. When the commitment time is less than the commitment limit, Tw is positive, indicating that the commitment is accepted, and the cache client retries the cache query after Tw; when the commitment time is greater than the commitment limit, Tw is negative, indicating that the commitment is unacceptable: the cache client gives up retrying, no longer requests the database, and throws a timeout exception directly (this may be handled as a database connection timeout, or a null or default value may be returned for the requested data). The client side can be implemented as a lib dependency library for applications in C/C++, Java and the like.
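A sketch of this client-side behavior under formula (2); all names are illustrative assumptions, and the cache lookup is a placeholder:

    public final class CommitmentClient {
        private final long commitmentLimitMs; // Lp: maximum acceptable response time
        private final long avgQueryTimeMs;    // tn: average single-query response time of the database
        private final int maxCommitmentRetry; // max-commitment-retry

        public CommitmentClient(long commitmentLimitMs, long avgQueryTimeMs, int maxCommitmentRetry) {
            this.commitmentLimitMs = commitmentLimitMs;
            this.avgQueryTimeMs = avgQueryTimeMs;
            this.maxCommitmentRetry = maxCommitmentRetry;
        }

        // Handles a commitment response carrying commitment value p.
        public String onCommitment(String key, long p) throws InterruptedException {
            for (int attempt = 0; attempt < maxCommitmentRetry; attempt++) {
                long tw = commitmentLimitMs - p * avgQueryTimeMs; // formula (2): Tw = Lp - p * tn
                if (tw < 0) { // commitment unacceptable: give up, treat as a database connection timeout
                    throw new IllegalStateException("commitment rejected, handle as timeout");
                }
                Thread.sleep(tw);                 // time-hashed wait before retrying the cache
                String value = queryCache(key);
                if (value != null) {
                    return value;                 // another client refreshed the cache in the meantime
                }
            }
            // no data after max-commitment-retry attempts: treat the database as failed
            throw new IllegalStateException("no data after max retries, handle as timeout");
        }

        private String queryCache(String key) {
            return null; // placeholder for a real cache lookup
        }
    }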
In the distributed cache system for preventing cache stampede, the query scenario on a cache hit (i.e., the key exists and is not invalid) is: the protocol parser parses the cache client's request message to obtain the requested key, the cache processor queries and obtains the data, and the data is encapsulated by the protocol generator and the response is returned to the cache client.
Query scenario on a cache miss (key not present) or invalidation (key expired or evicted): the protocol parser parses the cache client's request message to obtain the requested key; the cache processor's query misses the cache and records the request key in the counter; the current limiter compares the request count in the counter with the miss-count threshold set in the extended configuration: if the request count is below the threshold, a miss is returned directly to the protocol generator; if it is above the threshold, a commitment protocol is returned to the protocol generator. After encapsulating the TCP message, the protocol generator returns the response to the cache client.
When many cache clients request data from the system at the same time and the data key value does not exist, only a few cache clients (fewer than the threshold set in the extended configuration) receive miss return messages; the other cache clients receive a commitment protocol. The cache clients receiving the commitment protocol calculate a retry time from the commitment value through the time hash function and, combined with the request retry strategy configured on the cache client, control when and how many times to retry the cache query, thereby realizing active current limiting under huge concurrency and effectively avoiding the occurrence of cache stampede.
Compared with the prior art, the distributed cache system for preventing cache stampede has the following outstanding beneficial effects: it extends a commitment protocol between the cache system and the cache client and uses the time hash function to spread a short burst of requests over time for retry, so it can effectively avoid the cache stampede problem that arises when data is not pre-stored in the cache or suddenly becomes invalid in a distributed cache system, and it has good popularization and application value.
Drawings
FIG. 1 is an operating schematic diagram of the key counter of the distributed cache system for preventing cache stampede according to the present invention;
FIG. 2 is an activity diagram of the current limiter of the distributed cache system for preventing cache stampede according to the present invention;
FIG. 3 is an activity diagram of the client of the distributed cache system for preventing cache stampede according to the present invention.
Detailed Description
The distributed cache system for preventing cache stampede according to the present invention will be described in further detail below with reference to the accompanying drawings and embodiments.
Embodiment
As shown in fig. 1, the distributed cache system for preventing cache stampede of the present invention includes a protocol parser, a cache processor, a key counter, a control configuration item, a current limiter and a protocol generator, wherein the protocol parser, the cache processor and the protocol generator are components of the distributed cache system, and the key counter, the control configuration item and the current limiter are newly added components for solving the cache stampede problem.
The protocol parser is responsible for receiving and parsing the request messages of the cache client, obtaining the key value to be stored or the data key to be queried.
Write scenario: the protocol parser parses the cache client's request message to obtain the request key and value, the cache processor stores the key value in memory and sends a success signal to the protocol generator, and the protocol generator encapsulates it into a TCP message and returns the response to the cache client.
As shown in fig. 1, the cache processor is configured to write or query key values according to user requests. The key counter is used to count keys requested by users but not present in the cache. The key counter is responsible for recording, on a cache miss, the number of miss requests for the cached key; this count is integer key-value data, called a count entry, which the current limiter uses to evaluate the current-limiting condition.
The key counter creates a count entry for a cache key the first time the key misses and increments it by 1, and simply increments it by 1 whenever the key misses again (with Redis this can be done conveniently in one step by invoking the incr command, i.e., incr keyname_key_count, where keyname is the name of the missed key and _key_count is the suffix of the key counter's count entries).
After a cache client that received the miss queries the value of the data key and writes it into the cache, a subsequent hit on that key clears its count entry in the key counter, thereby reclaiming the memory resource and preventing the counter area from occupying memory without limit. If no cache client manages to write the data key value, the key counter clears the key's entry according to the maximum request count set in the control configuration item.
The control configuration items are divided into cache-system configuration items and client configuration items; the cache-system configuration items comprise the maximum key count value, the peak request count and the database maximum concurrency, and the client configuration items comprise the commitment limit and the maximum number of commitment retries.
The maximum key count value (key-counter-max-count) is used for count control of the key counter's count entries. For example, when key-counter-max-count is 100, the key counter automatically clears a key's count entry on that key's 101st miss.
The peak request count (peak-request-count) is used by the current limiter to calculate the commitment value, and can generally be estimated from the application system's daily peak number of users.
The database maximum concurrency (max-concurrency) is the maximum single-shot concurrency the database system can bear; to keep the database system running safely and stably it can be set to a value slightly smaller than the true maximum. It is likewise used by the current limiter to calculate the commitment value.
The commitment limit (commitment-limit) is used by the cache client, via the time hash function, to calculate how long to wait before initiating a retry after a request misses, and the maximum number of commitment retries (max-commitment-retry) is used to control retries. When max-commitment-retry is 0 no retry is made; when it is 3, a cache client receiving a commitment response initiates at most 3 retries. If the required data has still not been obtained after 3 retries, the database is considered to have failed, the clients that were released with miss responses before the commitment could not successfully refresh the cache, and the attempt should be abandoned and handled as a database connection timeout.
As shown in fig. 2, the current limiter compares the key count with the count threshold set in the configuration and classifies the non-existent keys according to the comparison result. The current limiter is used for current-limiting control: on a cache miss, it compares the missed key's count entry in the key counter with the commitment threshold in the control configuration item; if the value of the count entry is greater than the commitment threshold, a commitment is calculated and returned, otherwise a miss is returned directly.
Wherein the commitment value may be determined by the following hash function:
p=H(n) (1)
where n is the count entry value, i.e., the number of miss requests, and H(n) is a hash function of the count entry value, including but not limited to the hash functions commonly used in computer science.
Take Facebook as an example: Facebook currently has over 2.45 billion monthly active users worldwide. A typical global application, however, uses a multi-region deployment in which each region bears part of the load. Take the United States, which has the most users: the U.S. population is over 320 million, about 70% of whom are Facebook users, i.e., around 200 million. Assume for the moment that Facebook has 20 regions across the whole U.S., with roughly 20 million monthly active users in a single region; estimating that 10% of users are online at the same time, the instantaneous concurrency of a single region is about 2 million. A single-node MySQL database can generally bear at most about 1,000 concurrent queries, and read-write separation or clustering can raise that capacity to about 5 times, i.e., 5,000 concurrent queries, so the instantaneous requests are 400 times the database's carrying capacity: 400 concurrent rounds are needed to complete all the queries. The problem is thus converted into using a hash function to thin the 2 million concurrent database query requests down to one batch of 5,000 every 50 ms, 400 rounds in total.
Assuming the average query time of a single indexed data query is 50 milliseconds (ms), 400 rounds take 20,000 ms; that is, the 2 million requests need a total of 50 ms x 400 = 20,000 ms. Assuming the limit response time acceptable to users is 5 seconds (s), i.e., 5,000 ms, only 100 rounds of retries can be accommodated, and the remaining 300 rounds of retries must be cut off.
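A quick check of this arithmetic, under the stated assumptions (2,000,000 instantaneous misses, 5,000 database concurrency, 50 ms per round, a 5,000 ms acceptable limit):

    public class StampedeSizing {
        public static void main(String[] args) {
            long requests = 2_000_000;   // instantaneous single-region concurrency
            long dbConcurrency = 5_000;  // clustered MySQL capacity assumed above
            long roundMs = 50;           // average single indexed query time
            long limitMs = 5_000;        // limit response time acceptable to users

            long rounds = requests / dbConcurrency;    // 400 batches of direct queries
            long totalMs = rounds * roundMs;           // 20,000 ms to drain all batches
            long acceptableRounds = limitMs / roundMs; // only 100 batches fit in the limit

            System.out.printf("rounds=%d, totalMs=%d, acceptableRounds=%d%n",
                    rounds, totalMs, acceptableRounds);
        }
    }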
Ideally, when the transmission delay of the network inside the cluster is negligible and the effect of the count entry value being refreshed when the key counter reaches its limit is not considered, the hash function H(n) can be determined simply by the following formula:
H(n) = floor(n / Ns) (3)
where n is the value of the count entry, i.e., the current number of requests that have missed the cache; Ns is the assumed single maximum concurrency load capacity of the database cluster at cache breakdown, 5,000 in this example; and floor(.) is the round-down operation. floor(n / Ns) indicates the concurrent database batch to which a miss request corresponds. For example, when n is 1 to 4,999, floor(n / Ns) is 0, and formula (3) shows that the commitment value of the 1st to 4,999th miss requests is 0, i.e., the current limiter returns a miss to these requests, allowing them to access the database directly to query the data and update the cache; requests with n of 5,000 to 9,999 will retry their cache queries in the same batch.
Now consider that the key counter in the cache performs a clear operation, i.e., a recount, when the maximum key count value is reached. Introduce into formula (3) the maximum key count value k and the total number of requests N (500,000 in this example). When k < N, ceil(N / k) indicates how many groups the key counter refresh divides the requests into, i.e., how many requests share the same count entry value; for example, when k is 5,000, ceil(N / k) is 100, meaning that every 100 of the 500,000 requests have the same count entry value. Then floor(Ns / ceil(N / k)) represents how many groups of requests with the same count entry value one round of database concurrency can process, and formula (3) evolves into formula (4):
H(n) = floor(n / floor(Ns / ceil(N / k))) (4)
By the nature of the formula, this can be arranged into formula (5):
H(n) = floor(n * N / (k * Ns)) (5)
When k >= N, H(n) degenerates to formula (3); that is, H(n) is a piecewise function of the maximum key count value k:
H(n) = floor(n / Ns) when k >= N; H(n) = floor(n * N / (k * Ns)) when k < N (6)
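The piecewise function of formula (6) is small enough to state directly in code; this sketch assumes exact integer arithmetic and the example values used above:

    public final class TimeHash {
        // Commitment value p = H(n) for the n-th miss request; k is the maximum key
        // count value, totalRequests is N, and ns is the single maximum concurrency Ns.
        public static long commitmentValue(long n, long k, long totalRequests, long ns) {
            if (k >= totalRequests) {
                return n / ns;                     // formula (3): floor(n / Ns)
            }
            return (n * totalRequests) / (k * ns); // formula (5): floor(n * N / (k * Ns))
        }

        public static void main(String[] args) {
            // Example values from the text: k = 5,000, N = 500,000, Ns = 5,000.
            System.out.println(commitmentValue(49, 5_000, 500_000, 5_000)); // 0: miss returned
            System.out.println(commitmentValue(50, 5_000, 500_000, 5_000)); // 1: first committed batch
        }
    }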
the daily activity peak concurrency number N in the formula (6) and the maximum concurrency number N supported by the database system oncesNamely, 4.3 nodes control configuration item the peak-request-count and the database maximum concurrency (max-concurrency).
Meanwhile, by setting the maximum key count value k of the key counter reasonably, multiple batches of miss responses can be triggered, ensuring that clients attempt to read the database multiple times and refresh the data to be queried into the cache.
The protocol generator is used to encapsulate the return data into a TCP message according to the specific cache protocol and send it to the cache client. The distributed cache system for preventing cache stampede extends a commitment protocol on top of the traditional cache system. Taking the RESP protocol as an example, when the cache misses and the current limiter determines that the current request needs a commitment returned, the protocol message is as follows:
+COMMITMENT 7\r\n, which indicates that the client receiving this commitment is the 7th caching client to receive a commitment.
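Parsing this extended reply on the client side might look as follows; the exact token "+COMMITMENT" is an assumption about how the RESP-style simple string would be spelled:

    public final class CommitmentReply {
        private static final String PREFIX = "+COMMITMENT ";

        // Returns the commitment value p, or -1 if the reply is not a commitment.
        public static long parse(String reply) {
            if (reply == null || !reply.startsWith(PREFIX)) {
                return -1;
            }
            return Long.parseLong(reply.substring(PREFIX.length()).trim()); // trim strips \r\n
        }

        public static void main(String[] args) {
            System.out.println(parse("+COMMITMENT 7\r\n")); // 7: the 7th client given a commitment
        }
    }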
The cache client extends the commitment handling and retry strategy: after the client receives the commitment protocol, the waiting time before the retry request is calculated from the commitment value and the commitment limit in the configuration items, with the formula:
Tw = Lp - p * tn (2)
where Tw is the time, in milliseconds (ms), that the caching client must wait before a retry, called the committed latency; Lp is the commitment limit, i.e., the maximum response time a client can accept, in milliseconds (ms); p is the commitment value; tn is the database's average response time for a single data query; and p * tn is called the commitment time. When the commitment time is less than the commitment limit, Tw is positive, indicating that the commitment is accepted, and the cache client retries the cache query after Tw; when the commitment time is greater than the commitment limit, Tw is negative, indicating that the commitment is unacceptable: the cache client gives up retrying, no longer requests the database, and throws a timeout exception directly (this may be handled as a database connection timeout, or a null or default value may be returned for the requested data), as shown in fig. 3. The client side can be implemented as a lib dependency library for applications in C/C++, Java and the like.
In the distributed cache system for preventing cache stampede, the query scenario on a cache hit (i.e., the key exists and is not invalid) is: the protocol parser parses the cache client's request message to obtain the requested key, the cache processor queries and obtains the data, and the data is encapsulated by the protocol generator and the response is returned to the cache client.
Query scenario on a cache miss (key not present) or invalidation (key expired or evicted): the protocol parser parses the cache client's request message to obtain the requested key; the cache processor's query misses the cache and records the request key in the counter; the current limiter compares the request count in the counter with the miss-count threshold set in the extended configuration: if the request count is below the threshold, a miss is returned directly to the protocol generator; if it is above the threshold, a commitment protocol is returned to the protocol generator. After encapsulating the TCP message, the protocol generator returns the response to the cache client.
When many cache clients request data from the system at the same time and the data key value does not exist, only a few cache clients (fewer than the threshold set in the extended configuration) receive miss return messages; the other cache clients receive a commitment protocol. The cache clients receiving the commitment protocol calculate a retry time from the commitment value through the time hash function and, combined with the request retry strategy configured on the cache client, control when and how many times to retry the cache query, thereby realizing active current limiting under huge concurrency and effectively avoiding the occurrence of cache stampede.
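Putting the pieces together, the miss path through the newly added components can be sketched as below, reusing the illustrative classes from the earlier sketches; these are assumptions for exposition, not the patented implementation:

    public final class MissPath {
        private final KeyCounter keyCounter;
        private final CurrentLimiter currentLimiter;

        public MissPath(KeyCounter keyCounter, CurrentLimiter currentLimiter) {
            this.keyCounter = keyCounter;
            this.currentLimiter = currentLimiter;
        }

        // Invoked when the cache processor fails to find the requested key: the key
        // counter records the miss, and the current limiter turns the running count
        // into either a miss or a commitment for the protocol generator to encode.
        public String onMiss(String key) {
            long n = keyCounter.recordMiss(key);
            return currentLimiter.decide(n);
        }
    }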
The above-described embodiments are merely preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims (9)

1. A distributed cache system for preventing cache stampede, characterized in that: the distributed cache system comprises a protocol parser, a cache processor, a key counter, a control configuration item, a current limiter and a protocol generator, wherein the protocol parser, the cache processor and the protocol generator are components of the distributed cache system, and the key counter, the control configuration item and the current limiter are newly added components for solving the cache stampede problem.
2. The distributed cache system for preventing cache stampede of claim 1, wherein: the protocol parser is responsible for receiving and parsing the request messages of the cache client and obtaining the key value to be stored or the data key to be queried.
3. The distributed cache system for preventing cache stampede of claim 2, wherein: the cache processor is used for writing or querying key values according to user requests.
4. The distributed cache system for preventing cache stampede of claim 3, wherein: the key counter is used to count keys requested by users but not present in the cache.
5. The distributed cache system for preventing cache stampede of claim 4, wherein: the key counter is responsible for recording, on a cache miss, the number of miss requests for the cached key; this count is integer key-value data, called a count entry, which the current limiter uses to evaluate the current-limiting condition.
6. The distributed cache system for preventing cache stampede of claim 5, wherein: the control configuration items are divided into cache-system configuration items and client configuration items; the cache-system configuration items comprise the maximum key count value, the peak request count and the database maximum concurrency, and the client configuration items comprise the commitment limit and the maximum number of commitment retries.
7. The distributed cache system for preventing cache stampede of claim 6, wherein: the current limiter compares the key count with the count threshold set in the configuration and classifies the non-existent keys according to the comparison result.
8. The distributed cache system for preventing cache stampede of claim 7, wherein: the current limiter is used for current-limiting control: on a cache miss, it compares the missed key's count entry in the key counter with the commitment threshold in the control configuration item; if the value of the count entry is greater than the commitment threshold, a commitment is calculated and returned, otherwise a miss is returned directly.
9. The distributed cache system for preventing cache stampede of claim 8, wherein: the protocol generator is used to encapsulate the return data into a TCP message according to the specific cache protocol and send it to the cache client.
CN202111287708.6A 2021-11-02 2021-11-02 Distributed cache system for preventing cache stampede Pending CN114116796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111287708.6A CN114116796A (en) 2021-11-02 2021-11-02 Distributed cache system for preventing cache stampede

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111287708.6A CN114116796A (en) 2021-11-02 2021-11-02 Distributed cache system for preventing cache stampede

Publications (1)

Publication Number Publication Date
CN114116796A true CN114116796A (en) 2022-03-01

Family

ID=80380295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111287708.6A Pending CN114116796A (en) 2021-11-02 2021-11-02 Distributed cache system for preventing cache stampede

Country Status (1)

Country Link
CN (1) CN114116796A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090157972A1 (en) * 2007-12-18 2009-06-18 Marcy Evelyn Byers Hash Optimization System and Method
CN106815287A (en) * 2016-12-06 2017-06-09 中国银联股份有限公司 A kind of buffer memory management method and device
CN107124466A (en) * 2017-05-23 2017-09-01 努比亚技术有限公司 One kind prevents caching penetrating method and device, computer-readable recording medium
CN109729108A (en) * 2017-10-27 2019-05-07 阿里巴巴集团控股有限公司 A kind of method, associated server and system for preventing caching from puncturing
US10657064B1 (en) * 2019-01-31 2020-05-19 Salesforce.com. inc. Extending usages of cached data objects beyond cache expiration periods
CN110109956A (en) * 2019-03-21 2019-08-09 福建天泉教育科技有限公司 A kind of method and terminal for preventing caching from penetrating
CN111339148A (en) * 2020-03-13 2020-06-26 深圳前海环融联易信息科技服务有限公司 Method and device for preventing cache breakdown service, computer equipment and storage medium
CN111813792A (en) * 2020-06-22 2020-10-23 上海悦易网络信息技术有限公司 Method and equipment for updating cache data in distributed cache system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANDREA VATTANI et al.: "Optimal Probabilistic Cache Stampede Prevention", Proceedings of the VLDB Endowment, 30 December 2014 (2014-12-30) *
XIAO Jianwei (肖剑伟): "Improvement of the High-Concurrency Caching Mechanism" (高并发缓存机制的改进), Digital World (数码世界), no. 04, 20 April 2019 (2019-04-20) *

Similar Documents

Publication Publication Date Title
US7979509B1 (en) Clustered network acceleration devices having shared cache
US8280976B1 (en) Distributed adaptive network memory engine
US7676554B1 (en) Network acceleration device having persistent in-memory cache
US7840618B2 (en) Wide area networked file system
US7603483B2 (en) Method and system for class-based management of dynamic content in a networked environment
US8639742B2 (en) Refreshing cached documents and storing differential document content
US8239354B2 (en) System and method for managing small-size files in an aggregated file system
US8078574B1 (en) Network acceleration device cache supporting multiple historical versions of content
US9965515B2 (en) Method and device for cache management
CN107451853B (en) Method, device and system for real-time red packet distribution and storage medium
Yin et al. Engineering server-driven consistency for large scale dynamic web services
CN112084258A (en) Data synchronization method and device
US20190268442A1 (en) Systems and methods of rate limiting for a representational state transfer (rest) application programming interface (api)
CN107222567A (en) Method, device and the service cluster of processing data request
US10447623B2 (en) Data storage systems and methods using a real-time messaging system
US20130060810A1 (en) Smart database caching
CN112995046B (en) Content distribution network traffic management method and device
US9928174B1 (en) Consistent caching
US20010049727A1 (en) Method for effficient and scalable interaction in a client-server system in presence of bursty client requests
Vakali LRU-based algorithms for Web cache replacement
CN114116796A (en) Distributed cache system for preventing cache stampede
EP1648138A1 (en) Method and system for caching directory services
Datta et al. Accelerating dynamic web content generation
CN114157482A (en) Service access control method, device, control equipment and storage medium
Kumar et al. Consistency-Latency Trade-Off of the LibRe Protocol: A Detailed Study

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221216

Address after: Room 305-22, Building 2, No. 1158 Zhangdong Road and No. 1059 Dangui Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120

Applicant after: Shanghai Yunxi Technology Co.,Ltd.

Address before: 250100 No. 1036 Tidal Road, Jinan High-tech Zone, Shandong Province, S01 Building, Tidal Science Park

Applicant before: Inspur cloud Information Technology Co.,Ltd.