WO2022016861A1

WO2022016861A1 - Hotspot data caching method and system, and related device

Info

Publication number: WO2022016861A1
Application number: PCT/CN2021/076978
Authority: WO
Inventors: 谢有权
Original assignee: 浪潮电子信息产业股份有限公司
Priority date: 2020-07-24
Filing date: 2021-02-20
Publication date: 2022-01-27
Also published as: CN111857597A

Abstract

A hotspot data caching method and system, a computer-readable storage medium, and a server. The method comprises: receiving a read/write request, and determining the request data corresponding to the read/write request (S101); increasing the access count of the request data by one, and determining, according to the current access count, whether the request data is hotspot data (S102); if so, adding the hotspot data to a hotspot queue, and adding a hotspot mark to the hotspot data in the hotspot queue (S103); and caching the hotspot data in the hotspot queue to cluster local storage (S104). The method reduces the resource consumption caused by clients caching hotspot data, improves client service performance, ensures that all clients share a consistent cache, and further improves cache performance in a distributed shared storage system.

Description

A hotspot data caching method, system and related device

This application claims the priority of the Chinese patent application with the application number 202010724366.9 and the invention titled "A Hotspot Data Cache Method, System and Related Apparatus" filed with the China Patent Office on July 24, 2020, the entire contents of which are incorporated by reference in this application.

Technical Field

The present application relates to the field of data processing, and in particular, to a method, system and related device for caching hotspot data.

Background

With the rapid development of distributed shared storage systems, more and more attention is being paid to their performance and security. In scenarios where multiple distributed clients share a volume, client-side caching is generally not used. Because the clients do not communicate with each other, a client that caches data cannot detect that another client has modified that data, which leads to data inconsistency. The usual solution is to establish communication between clients. However, if the clients hold a large amount of data, this inevitably makes the network busy, and guaranteeing data consistency can also block normal services, degrading client service performance. In addition, multiple distributed clients may cache duplicate data, which wastes resources; since client resources are limited, this can easily affect the normal operation of the clients.

Therefore, how to implement an effective cache in a distributed shared storage system is a technical problem that needs to be solved urgently by those skilled in the art.

SUMMARY OF THE INVENTION

The purpose of this application is to provide a hotspot data caching method, system, computer-readable storage medium and server, which improve the caching performance in a shared volume scenario.

In order to solve the above-mentioned technical problems, the present application provides a method for caching hotspot data, and the specific technical solutions are as follows:

Receive a read/write request, and determine the request data corresponding to the read/write request;

Add one to the number of visits of the requested data, and determine whether the requested data is hot data according to the current number of visits;

If so, add the hotspot data to the hotspot queue, and add a hotspot mark to the hotspot data in the hotspot queue;

The hotspot data in the hotspot queue is cached to the local storage of the cluster.

Optionally, judging whether the request data is hotspot data according to the current number of visits includes:

Whether the requested data is hot data is determined by using the least recently used policy according to the current number of visits.

Optionally, after adding the hotspot data to the hotspot queue, the method further includes:

Sorting according to the hotspot degree of each of the hotspot data in the hotspot queue;

The tail of the hotspot queue is the hotspot data with the lowest hotspot degree.

Optionally, when the hotspot queue is full, adding the hotspot data to the hotspot queue includes:

After moving the hotspot data at the end of the hotspot queue to the aging queue, add the hotspot data to the hotspot queue;

The hotspot flag of each hotspot data in the aging queue is changed to a hotspot aging flag.

After changing the hotspot mark of each hotspot data in the aging queue to the hotspot aging mark, the method further includes:

Obtain the number of visits to the hotspot data in a unit period;

Calculate the aging parameter of the hotspot data according to a preset formula;

Judging whether the difference between the aging parameter of the hotspot data and the number of visits per unit period is less than the rejection threshold;

If not, removing the hotspot data from the cluster local storage;

Wherein, the preset formula is:

Wherein, D = t - t₁, β is the aging parameter, D is the time access interval, t is the current time, t₁ is the hotspot marking time of the hotspot data, δ is the preset attenuation factor, and T is the unit period.

Optionally, when a read operation request for hotspot data is received, the method further includes:

Determine whether the hotspot data corresponding to the read operation are all located in the local storage of the cluster;

If not, obtain the first part of the requested data from the cluster local storage, and obtain the second part of the requested data from the disk;

The first part of the data and the second part of the data are combined and returned to the requester of the read operation.

Optionally, caching the hotspot data in the hotspot queue to the cluster local storage includes:

Confirm the block object ID where the hotspot data including the hotspot mark is located;

The hotspot data and the corresponding block object ID are cached to the local storage of the cluster.

The application also provides a hotspot data caching system, including:

a receiving module, configured to receive a read/write request, and determine the request data corresponding to the read/write request;

a judgment module, used for adding one to the access times of the requested data, and judging whether the requested data is hot data according to the current access times;

A hotspot marking module, for adding the hotspot data to the hotspot queue when the judgment result of the judgment module is yes;

The cache module is configured to add a hotspot mark to the hotspot data in the hotspot queue, and cache the hotspot data in the hotspot queue to the local storage of the cluster.

The present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps of the above-described method.

The present application also provides a server, including a memory and a processor, wherein a computer program is stored in the memory, and the processor implements the steps of the above method when the computer program in the memory is invoked.

The present application provides a method for caching hotspot data, which includes: receiving a read/write request, and determining the request data corresponding to the read/write request; adding one to the access count of the request data, and determining whether the request data is hotspot data according to the current access count; if so, adding the hotspot data to the hotspot queue, and adding a hotspot mark to the hotspot data in the hotspot queue; and caching the hotspot data in the hotspot queue to the cluster local storage.

In this application, the client identifies the hotspot data, and the hotspot data is saved through the hotspot queue to the cluster local storage at the bottom layer of the distributed storage cluster. The local storage engine at the bottom layer of the cluster ensures the consistency of the data cached for each client, and the storage cluster bears the resources occupied by hotspot data. This reduces the resource consumption caused by clients caching hotspot data, improves client service performance, ensures that every client shares a consistent cache, and further improves cache performance in the distributed shared storage system.

The present application also provides a method, system, computer-readable storage medium and server for caching hotspot data, which have the above beneficial effects, and will not be repeated here.

Description of Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings required for the description of the embodiments or the prior art. Obviously, the drawings in the following description are only embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained from the provided drawings without any creative effort.

FIG. 1 is a flowchart of a method for caching hotspot data provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a hotspot data caching system provided by an embodiment of the present application.

Detailed Description

In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.

Please refer to FIG. 1. FIG. 1 is a flowchart of a method for caching hotspot data provided by an embodiment of the present application. The specific technical solution is as follows:

S101: Receive a read/write request, and determine the request data corresponding to the read/write request;

In this step, the client receives the read and write requests, and confirms the corresponding request data. Specifically, the corresponding object can be found according to the offset and length of the read operation.
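As a minimal sketch of how the offset and length of a request might be resolved to the objects it touches, the following assumes that the volume is addressed in fixed-size objects. The 4 MiB object size and the name `objects_for_request` are illustrative assumptions and do not come from the application.

```python
from typing import List

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size in bytes (hypothetical value)


def objects_for_request(offset: int, length: int, object_size: int = OBJECT_SIZE) -> List[int]:
    """Return the indexes of the objects covered by [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // object_size
    last = (offset + length - 1) // object_size  # index of the object holding the last byte
    return list(range(first, last + 1))
```

A request that straddles an object boundary resolves to two object indexes, so its access count would be updated for both objects.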

S102: Add one to the access times of the requested data, and determine whether the requested data is hotspot data according to the current access times; if so, go to S103;

Since the read/write request accesses the request data, the access count of the request data is increased at this point, and it is determined whether the request data satisfies the hotspot data condition, that is, whether it has become hotspot data. How this is determined is not specifically limited here; for example, the least recently used policy can be applied according to the current number of visits. The least recently used (LRU) policy is essentially a cache eviction policy, which decides mainly from the access record of the data whether it is hotspot data. Of course, those skilled in the art can also determine whether the request data is hotspot data in other ways, which are not enumerated here.
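One possible reading of step S102 is sketched below: access counts are kept in a bounded, LRU-ordered table, and data is treated as hotspot data once its count reaches a threshold while its record is still tracked. The class name `AccessTracker`, the `capacity` bound, and the `hot_threshold` value are assumptions made for illustration; the application does not fix any of them.

```python
from collections import OrderedDict


class AccessTracker:
    """Counts accesses per data key and keeps only the most recently used keys."""

    def __init__(self, capacity: int, hot_threshold: int) -> None:
        self.capacity = capacity            # assumed bound on tracked keys
        self.hot_threshold = hot_threshold  # assumed count needed to become hot
        self._counts: "OrderedDict[str, int]" = OrderedDict()

    def record_access(self, key: str) -> bool:
        """Add one to the access count of `key`; return True if it is now hot."""
        count = self._counts.pop(key, 0) + 1
        self._counts[key] = count             # re-insert as most recently used
        if len(self._counts) > self.capacity:
            self._counts.popitem(last=False)  # drop the least recently used key
        return count >= self.hot_threshold
```

With this design, a key that stops being accessed eventually loses its count, so only data that is both frequently and recently accessed can qualify.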

S103: adding the hotspot data to a hotspot queue, and adding a hotspot mark to the hotspot data in the hotspot queue;

If the requested data is hotspot data, the hotspot data is added to the hotspot queue, and a hotspot mark is added to the hotspot data to identify that the data has become hotspot data. It should be noted that the hotspot queue can be a storage queue in the storage device, or it can simply be a collection of hotspot data without an actual storage structure. For example, membership in the hotspot queue can be recorded as an attribute of the hotspot data to indicate that the data has been added to the hotspot queue.

As a preferred implementation of this step, after the hotspot data is added to the hotspot queue, the queue can also be sorted by the hotspot degree of each item of hotspot data, so that the tail of the hotspot queue holds the hotspot data with the lowest hotspot degree. How the hotspot degree is calculated is not specifically limited; it can generally be determined from the access frequency of each item of hotspot data per unit time, where the unit time may be a preset period such as one day, one week, or one month. In addition, if several items of hotspot data have the same access frequency per unit time, the hotspot degree can be further determined by the last access time: the closer the last access time is to the current time, the higher the hotspot degree.
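The ordering described above (access frequency per unit time first, last access time as the tie-breaker) can be sketched as a two-part sort key. The `HotEntry` type and its field names are hypothetical and serve only to illustrate the comparison.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HotEntry:
    key: str
    visits_per_period: int  # access frequency within the unit period
    last_access: float      # timestamp of the most recent access


def sort_hot_queue(entries: List[HotEntry]) -> List[HotEntry]:
    """Order entries hottest first, so the coldest entry ends up at the tail."""
    return sorted(
        entries,
        key=lambda e: (e.visits_per_period, e.last_access),
        reverse=True,
    )
```

Entries with equal frequency are ordered by recency, so the entry at the tail is the one with the lowest frequency and, among those, the oldest last access.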

S104: Cache the hotspot data in the hotspot queue to local storage in the cluster.

This step is designed to cache hotspot data to cluster local storage. It should be noted that in this step, the client sends the hotspot queue to the master node of the distributed storage cluster by default, and then the master node caches the hotspot data in the hotspot queue to the local storage of the cluster. The cluster local storage is located at the bottom layer of the distributed storage cluster and can be accessed by all clients in the cluster. At this time, cache consistency between clients can be achieved. Specifically, it is also necessary to distinguish hotspot data and non-hotspot data according to the hotspot mark in step S103.

It should be noted that only one hotspot queue exists in the entire distributed storage cluster, and each client can determine the corresponding hotspot data according to the read and write status of its own data, and feed it back to the hotspot queue in the cluster.
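The division of work between clients and the master node can be sketched as follows: each client counts accesses on its own and reports hotspot data, while a single cluster-wide queue and cache live on the master node. The classes `MasterNode` and `Client`, the method names, and the simple count threshold are all assumptions for illustration, not the application's interfaces.

```python
from typing import Dict, List


class MasterNode:
    """Holds the single cluster-wide hotspot queue and the cluster local storage."""

    def __init__(self) -> None:
        self.hot_queue: List[str] = []
        self.local_storage: Dict[str, bytes] = {}

    def report_hot(self, key: str, data: bytes) -> None:
        if key not in self.hot_queue:
            self.hot_queue.append(key)  # one queue entry no matter who reports it
        self.local_storage[key] = data  # cached once, visible to every client


class Client:
    def __init__(self, master: MasterNode, hot_threshold: int) -> None:
        self.master = master
        self.hot_threshold = hot_threshold  # assumed hotspot condition
        self.visits: Dict[str, int] = {}

    def on_request(self, key: str, data: bytes) -> None:
        """Count the access locally and feed hotspot data back to the cluster."""
        self.visits[key] = self.visits.get(key, 0) + 1
        if self.visits[key] >= self.hot_threshold:
            self.master.report_hot(key, data)
```

Because the cached copy exists only in the cluster, two clients that both find the same data hot do not hold duplicate copies.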

There is no specific limitation on how to cache the hotspot data here, you can first confirm the block object ID where the hotspot data including the hotspot mark is located, and then cache the hotspot data and the corresponding block object ID to the local storage of the cluster. In order to facilitate management, the shared volume in the distributed storage cluster is divided, that is, divided into several block objects according to the preset size, and any data has its own block object. By dividing the shared volume, it is convenient to improve the search efficiency of hot data in the cache, which is equivalent to establishing an index for each data. Then, when this step is performed, the block object ID where the hotspot data is located may be determined first, and the corresponding block object ID is stored in the cluster local storage synchronously when the hotspot data is cached. It should be noted that all block objects are not directly cached. Since the block object may contain non-hot data, only the hot data in the block object is saved when caching is performed.
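A minimal sketch of caching hotspot data together with its block object ID is given below. It assumes a fixed block size, an ID format of `<volume>.<index>`, and hot ranges that do not cross a block boundary; the names `block_object_id` and `ClusterLocalCache` are hypothetical.

```python
from typing import Dict, Optional, Tuple

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block object size in bytes (hypothetical)


def block_object_id(volume: str, offset: int, block_size: int = BLOCK_SIZE) -> str:
    """Derive a block object ID from the volume name and the byte offset."""
    return f"{volume}.{offset // block_size}"


class ClusterLocalCache:
    """Stand-in for cluster local storage: hot ranges indexed by block object ID."""

    def __init__(self) -> None:
        # block object ID -> {(offset within block, length): data}
        self._store: Dict[str, Dict[Tuple[int, int], bytes]] = {}

    def put_hot(self, volume: str, offset: int, data: bytes) -> str:
        """Cache only the hot bytes, not the whole block object."""
        oid = block_object_id(volume, offset)
        self._store.setdefault(oid, {})[(offset % BLOCK_SIZE, len(data))] = data
        return oid

    def get_hot(self, volume: str, offset: int, length: int) -> Optional[bytes]:
        """Look up a hot range via its block object ID; None if it is not cached."""
        oid = block_object_id(volume, offset)
        return self._store.get(oid, {}).get((offset % BLOCK_SIZE, length))
```

The block object ID acts as the index: a lookup first narrows the search to one block object and only then matches the range inside it.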

It is easy to understand that this embodiment can be executed each time the client receives a read/write request. If the requested data was already hotspot data before its access count was increased by one, the increase means that the hotspot degree of the requested data has risen, and its position in the hotspot queue can be moved closer to the head of the queue. In addition, when the requested data is already hotspot data, the corresponding hotspot data can be retrieved directly from the cluster local storage to answer the request.

In this embodiment of the present application, the client identifies hotspot data, and the hotspot data is saved through the hotspot queue to the cluster local storage at the bottom layer of the distributed storage cluster. The local storage engine at the bottom layer of the cluster ensures the consistency of the data cached for each client, and the storage cluster bears the resources occupied by hotspot data. This reduces the resource consumption caused by clients caching hotspot data, improves client service performance, ensures that every client shares a consistent cache, and further improves cache performance in the distributed shared storage system.

Based on the above embodiment, as a preferred embodiment, in order to ensure that the amount of hotspot data is relatively stable, that is, hotspot data cannot be increased indefinitely, when the hotspot queue is fully loaded, the following steps may also be performed:

The hotspot data at the end of the hotspot queue is moved to the aging queue, and the hotspot mark of each hotspot data in the aging queue is changed to a hotspot aging mark.

When the hotspot queue is full and new hotspot data arrives, the hotspot data at the tail of the hotspot queue is removed and added to the aging queue. This assumes that the hotspot queue has already been sorted by the hotspot degree of each item of hotspot data, so that the hotspot data at the tail of the queue has the lowest current hotspot degree. It should be noted that the access frequency of the new hotspot data is usually the same as or similar to that of the hotspot data at the tail of the hotspot queue. However, since the last access time of the new hotspot data is clearly more recent than that of the hotspot data already in the hotspot queue, the hotspot data at the tail of the queue can be removed unconditionally.

It should be noted that the aging queue is designed to hold the hotspot data that has been replaced; this does not mean that the data in the aging queue is no longer hotspot data. Therefore, not all data in the aging queue has to be removed from the cache. Instead, it is further determined whether data carrying the hotspot aging mark really needs to be removed from the cache.
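The hand-over from the hotspot queue to the aging queue can be sketched as below. The mark values, the `Entry` and `HotQueues` names, and the choice to append the new entry at the tail (re-sorting by hotspot degree is left out) are simplifying assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque

HOT = "hot"      # hotspot mark
AGING = "aging"  # hotspot aging mark


@dataclass
class Entry:
    key: str
    mark: str = HOT


class HotQueues:
    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.hot: Deque[Entry] = deque()    # head = hottest, tail = coldest
        self.aging: Deque[Entry] = deque()

    def add_hot(self, key: str) -> None:
        """Admit new hotspot data, demoting the tail entry if the queue is full."""
        if len(self.hot) >= self.capacity:
            demoted = self.hot.pop()        # tail of the hotspot queue
            demoted.mark = AGING            # hotspot mark -> hotspot aging mark
            self.aging.append(demoted)
        self.hot.append(Entry(key))
```

Note that the demoted entry is only re-marked here; whether it is actually dropped from the cache is a separate decision.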

There is no specific limitation on how to perform hotspot data removal. As a preferred implementation manner of this embodiment, the following steps may be performed:

S201: Obtain the number of visits of the hotspot data in a unit period;

S202: Calculate aging parameters of hotspot data according to a preset formula;

S203: Determine whether the difference between the aging parameter of the hotspot data and the number of visits per unit period is less than the rejection threshold; if not, go to S204;

S204: remove the hotspot data from the cluster local storage;

Among them, the preset formula is:

Wherein, D = t - t₁, β is the aging parameter, D is the time access interval, t is the current time, t₁ is the marking time of the hotspot mark, δ is the preset decay factor, and T is the unit period.

The rejection threshold and the preset attenuation factor are not specifically limited here, and can be set by those skilled in the art according to actual rejection requirements. In addition, steps S201 and S202 are independent of each other and have no predetermined execution order; they only need to be completed before the judgment of S203 is executed. It is easy to understand that the above process is only one detailed process for removing hotspot data from the aging queue provided in this embodiment, and those skilled in the art can make improvements to it, all of which should fall within the protection scope of the present application.
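The decision in S201 to S204 can be sketched as follows. The formula for β is not reproduced in this text, so the sketch takes it as a caller-supplied function of D, δ and T; `stand_in_formula` is a purely hypothetical placeholder and is not the formula of the application. The interval is computed as D = t - t₁, which is read from the definitions of D, t and t₁.

```python
from typing import Callable

# (D, delta, T) -> beta; must be supplied by the caller because the
# preset formula itself is not available in this text.
AgingFormula = Callable[[float, float, float], float]


def should_remove(
    visits_in_period: int,
    now: float,
    marked_at: float,
    delta: float,
    period: float,
    rejection_threshold: float,
    aging_formula: AgingFormula,
) -> bool:
    """Return True when aged hotspot data should leave cluster local storage."""
    interval = now - marked_at                     # D = t - t1
    beta = aging_formula(interval, delta, period)  # S202: aging parameter
    difference = beta - visits_in_period           # S203: compare with visits (S201)
    return not (difference < rejection_threshold)  # S204 applies on "if not"


def stand_in_formula(d: float, delta: float, t: float) -> float:
    """Hypothetical placeholder only: grows with the time since marking."""
    return delta * d / t
```

Under the placeholder, data that was marked long ago and is rarely visited yields a large difference and is removed, while frequently visited data stays cached.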

It should be noted that the execution subject of this embodiment does not have to be the client. Since the hotspot queue is held by the server in the distributed storage cluster, the hotspot data aging process in this embodiment can also be executed by the server. This reduces the hotspot data processing load on the client, avoids repeated operations by different clients, and reduces wasted resources. In addition, the aging queue in this embodiment ensures that the hotspot data in the hotspot queue is the currently hottest data in the distributed storage cluster, which helps improve the utilization efficiency of the cluster cache.

Based on the above embodiment, as a preferred embodiment, when a read operation request for hotspot data is received, the following steps may also be included:

S301: Determine whether the hotspot data corresponding to the read operation are all located in the local storage of the cluster; if not, go to S302;

S302: Obtain the first part of the requested data from the cluster local storage, and obtain the second part of the requested data from the disk;

S303: Combine the first part of the data and the second part of the data and return to the requester of the read operation.

It is easy to understand that, because of message lag between the server and the client, or because the data status is updated too quickly, the data actually targeted by a read operation request that the client sends for hotspot data is not necessarily the hotspot data currently cached in the cluster local storage of the distributed storage system. That is, part of the data corresponding to the read operation request may have become aging data carrying the hotspot aging mark. Alternatively, the division granularity used when dividing the shared volume may be too coarse, so that each block object is large and the hotspot data of some block objects is not completely cached when hotspot data caching is performed. In these cases, the data can be obtained according to the location of the data actually requested: the first part of the data, which is cached in the cluster local storage, is read directly from the cache, while the other part requires a disk IO operation to obtain the second part of the data. The data finally returned to the requester contains both the first part of the data and the second part of the data.
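Steps S301 to S303 can be sketched as a read that walks the cached extents in address order and fills every gap from disk. The extent map `cached` (start offset to bytes) and the `read_disk` callback are assumptions standing in for the cluster local storage and the disk IO path.

```python
from typing import Callable, Dict


def read_merged(
    offset: int,
    length: int,
    cached: Dict[int, bytes],
    read_disk: Callable[[int, int], bytes],
) -> bytes:
    """Serve [offset, offset + length) from cached extents, filling gaps from disk."""
    end = offset + length
    out = bytearray()
    pos = offset
    for start in sorted(cached):                  # cached extents in address order
        data = cached[start]
        stop = start + len(data)
        if stop <= pos or start >= end:
            continue                              # no overlap with what is still needed
        if start > pos:
            out += read_disk(pos, start - pos)    # second part: gap read from disk
            pos = start
        take_to = min(stop, end)
        out += data[pos - start:take_to - start]  # first part: bytes from the cache
        pos = take_to
    if pos < end:
        out += read_disk(pos, end - pos)          # trailing gap read from disk
    return bytes(out)
```

When the whole range is cached, `read_disk` is never called, which corresponds to the "yes" branch of S301.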

The following describes a hotspot data caching system provided by an embodiment of the present application. The hotspot data caching system described below and the hotspot data caching method described above may refer to each other correspondingly.

Referring to FIG. 2, which is a schematic structural diagram of a hotspot data caching system provided by an embodiment of the present application, the hotspot data caching system includes:

A receiving module 100, configured to receive a read/write request, and determine the request data corresponding to the read/write request;

Judging module 200, for adding one to the number of visits of the requested data, and judging whether the requested data is hot data according to the current number of visits;

A hotspot marking module 300, configured to add the hotspot data to the hotspot queue when the judgment result of the judgment module is yes;

The caching module 400 is configured to add a hotspot mark to the hotspot data in the hotspot queue, and cache the hotspot data in the hotspot queue to local storage in the cluster.

Based on the above embodiment, as a preferred embodiment, the judgment module 200 includes:

A judging unit for judging whether the request data is hotspot data by using the least recently used policy according to the current access times.

Based on the above embodiment, as a preferred embodiment, it can also include:

a hotspot sorting module, configured to sort according to the hotspot degree of each of the hotspot data in the hotspot queue;

Based on the above embodiment, as a preferred embodiment, it can also include:

An aging processing module, configured to move the hotspot data at the end of the hotspot queue to the aging queue when the hotspot queue is full, and change the hotspot mark of each hotspot data in the aging queue to the hotspot aging mark.

Further, the aging processing module may include:

An aging processing unit, configured to acquire the number of visits to the hotspot data per unit period; calculate the aging parameter of the hotspot data according to a preset formula; determine whether the difference between the aging parameter of the hotspot data and the number of visits per unit period is less than the rejection threshold; and if not, remove the hotspot data from the cluster local storage;

Wherein, the preset formula is:

Based on the above embodiment, as a preferred embodiment, it also includes:

A data retrieval module, configured to determine whether the hotspot data corresponding to the read operation is all located in the cluster local storage; if not, obtain the first part of the requested data from the cluster local storage, and obtain the second part of the requested data from the disk; and combine the first part of the data and the second part of the data and return the result to the requester of the read operation.

Based on the above embodiment, as a preferred embodiment, the cache module 400 includes:

a confirmation unit for confirming the block object ID where the hotspot data including the hotspot mark is located;

A cache unit, configured to cache the hotspot data and corresponding block object IDs to local storage in the cluster.

The present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed, the steps provided by the above embodiments can be implemented. The storage medium may include: U disk, removable hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes.

The present application also provides a server, which may include a memory and a processor, where a computer program is stored in the memory, and when the processor invokes the computer program in the memory, the steps provided in the above embodiments can be implemented. Of course, the server may also include various network interfaces, power supplies and other components.

The various embodiments in the specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments, and the same and similar parts between the various embodiments can be referred to each other. For the system provided by the embodiment, since it corresponds to the method provided by the embodiment, the description is relatively simple, and the relevant part can be referred to the description of the method.

Specific examples are used herein to illustrate the principles and implementations of the present application, and the descriptions of the above embodiments are only used to help understand the methods and core ideas of the present application. It should be pointed out that for those of ordinary skill in the art, without departing from the principles of the present application, several improvements and modifications can also be made to the present application, and these improvements and modifications also fall within the protection scope of the claims of the present application.

It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or sequence between these entities or operations. Moreover, the terms "comprising", "including" or any other variation thereof are intended to cover non-exclusive inclusion, such that a process, method, article or device comprising a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in a process, method, article or apparatus that includes the element.

Claims

A method for caching hotspot data, comprising:

Receive a read/write request, and determine the request data corresponding to the read/write request;

Add one to the number of visits of the requested data, and determine whether the requested data is hot data according to the current number of visits;

If so, add the hotspot data to the hotspot queue, and add a hotspot mark to the hotspot data in the hotspot queue;

The hotspot data in the hotspot queue is cached to the local storage of the cluster.
The hotspot data caching method according to claim 1, wherein judging whether the requested data is hotspot data according to the current number of visits comprises:

Whether the requested data is hot data is determined by using the least recently used policy according to the current number of visits.
The method for caching hotspot data according to claim 1, wherein after adding the hotspot data to the hotspot queue, the method further comprises:

Sorting according to the hotspot degree of each of the hotspot data in the hotspot queue;

The tail of the hotspot queue is the hotspot data with the lowest hotspot degree.
The method for caching hotspot data according to claim 3, wherein when the hotspot queue is full, adding the hotspot data to the hotspot queue comprises:

After moving the hotspot data at the end of the hotspot queue to the aging queue, add the hotspot data to the hotspot queue;

The hotspot flag of each hotspot data in the aging queue is changed to a hotspot aging flag.
The method for caching hotspot data according to claim 4, wherein after changing the hotspot mark of each hotspot data in the aging queue to a hotspot aging mark, the method further comprises:

Obtain the number of visits to the hotspot data in a unit period;

Calculate the aging parameter of the hotspot data according to a preset formula;

Judging whether the difference between the aging parameter of the hotspot data and the number of visits per unit period is less than the rejection threshold;

If not, removing the hotspot data from the cluster local storage;

Wherein, the preset formula is:

Wherein, D = t - t₁, β is the aging parameter, D is the time access interval, t is the current time, t₁ is the marking time of the hotspot mark, δ is the preset decay factor, and T is the unit period.
The method for caching hotspot data according to claim 1, wherein when receiving a read operation request for hotspot data, the method further comprises:

Determine whether the hotspot data corresponding to the read operation are all located in the local storage of the cluster;

If not, obtain the first part of the requested data from the cluster local storage, and obtain the second part of the requested data from the disk;

The first part of the data and the second part of the data are combined and returned to the requester of the read operation.
The hotspot data caching method according to claim 1, wherein the caching of the hotspot data in the hotspot queue to the cluster local storage comprises:

Confirm the block object ID where the hotspot data including the hotspot mark is located;

The hotspot data and the corresponding block object ID are cached to the local storage of the cluster.
A hotspot data caching system, characterized in that it includes:

a receiving module, configured to receive a read/write request, and determine the request data corresponding to the read/write request;

a judgment module, used for adding one to the access times of the requested data, and judging whether the requested data is hot data according to the current access times;

A hotspot marking module, used for adding the hotspot data to the hotspot queue when the judgment result of the judgment module is yes;

The cache module is configured to add a hotspot mark to the hotspot data in the hotspot queue, and cache the hotspot data in the hotspot queue to the local storage of the cluster.
A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1-7 are implemented.
A server, characterized by comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor implements the steps of the method according to any one of claims 1-7 when invoking the computer program in the memory.