WO2022152086A1 - Data caching method and apparatus, and device and computer-readable storage medium - Google Patents


Info

Publication number
WO2022152086A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
policy
data
type
caching
Prior art date
Application number
PCT/CN2022/071079
Other languages
French (fr)
Chinese (zh)
Inventor
郭畅 (Guo Chang)
Original Assignee
华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Application filed by 华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Publication of WO2022152086A1

Classifications

    • G06F16/24552 Database cache management
    • G06F16/2455 Query execution
    • G06F12/128 Replacement control using replacement algorithms adapted to multidimensional cache systems, e.g. set-associative, multicache, multiset or multilevel
    • G06N20/00 Machine learning
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Definitions

  • the present application relates to the field of caching technologies, and in particular, to a data caching method, apparatus, device, and computer-readable storage medium.
  • an appropriate caching strategy can ensure the effectiveness of the cache, thereby improving the speed at which users acquire data.
  • the data access patterns of different users may also be different, so the caching strategies suitable for processing the data generated by these users will also be different. Therefore, in order to adapt to the data access patterns of different users, constructing a flexible and adaptable caching strategy is an urgent problem to be solved in the current field of caching technology.
  • the present application discloses a data caching method, apparatus, device, and computer-readable storage medium, which can construct a caching strategy group with high flexibility and strong adaptability; when this caching strategy group is used for data caching, the effectiveness of the cache can be improved, thereby increasing the speed of data reading.
  • the present application provides a data caching method, which includes the following steps:
  • each type of cache policy library includes at least one cache policy of the same type, wherein the types of the cache policy library include at least one of a filter type, a prefetch type, a replacement type, and a victim cache type; the filter-type caching strategy is used to filter data, the prefetch-type caching strategy is used to prefetch data, the replacement-type caching strategy is used to evict data from the cache, and the victim-cache-type caching strategy is used to process data that has been evicted from the cache;
  • a first set of caching policies comprising a plurality of caching policies is applied to the data generated by the first entity.
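As a rough illustration (not the patent's implementation), the four library types above can be pictured as stages applied to each data access; every class, function, and variable name below is hypothetical:

```python
# Hypothetical sketch of a cache policy group built from typed policy
# libraries: a filter decides what is cacheable, a replacement policy
# decides what to evict, and evicted items are handed to a victim stage.

class PolicyGroup:
    def __init__(self, filter_policy, replacement_policy, victim_policy):
        self.filter = filter_policy            # filter type: screens data
        self.replacement = replacement_policy  # replacement type: picks evictee
        self.victim = victim_policy            # victim cache type: handles evictee

    def on_access(self, key, value, cache, capacity):
        """Apply the group to one data access."""
        if not self.filter(key):               # filtered out: never cached
            return
        cache[key] = value
        if len(cache) > capacity:              # cache full: evict one item
            evicted_key = self.replacement(cache)
            self.victim(evicted_key, cache.pop(evicted_key))

# Example: cache only ".txt" files, evict the first-inserted key (FIFO
# stands in for a real replacement algorithm), log victims to a dict.
victims = {}
group = PolicyGroup(
    filter_policy=lambda k: k.endswith(".txt"),
    replacement_policy=lambda c: next(iter(c)),
    victim_policy=lambda k, v: victims.__setitem__(k, v),
)
cache = {}
for name in ["a.txt", "b.jpg", "c.txt", "d.txt"]:
    group.on_access(name, name.upper(), cache, capacity=2)
```

Here "b.jpg" is screened out by the filter stage, and the oldest cached entry is handed to the victim stage once the capacity of 2 is exceeded.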
  • the user or device can choose the caching strategies applied to the data generated by the first entity, so the first caching strategy group has more flexibility, which can also improve the caching effect of the first caching strategy group on the data generated by the first entity.
  • applying the first cache policy group including multiple cache policies to the data generated by the first entity includes: optimizing the first cache policy group according to the access records of the data generated by the first entity to obtain a second cache policy group; and applying the second cache policy group to the data generated by the first entity.
  • since the second cache policy group is obtained by optimizing the first cache policy group according to the access records of the data generated by the first entity, the second cache policy group is more suitable than the first cache policy group for processing the data generated by the first entity; that is, using the second cache policy group to cache the data generated by the first entity can achieve a better cache effect, thereby improving the speed at which the first entity acquires data.
  • the type of the cache policy library further includes an exclusive type, where an exclusive-type cache policy is a cache policy set by a user.
  • the user can set the required caching policy by himself, so the first caching policy group can have higher flexibility.
  • the above method further includes: optimizing the first cache policy group according to the access records of the data generated by the second entity to obtain a third cache policy group; and applying the third cache policy group to the data generated by the second entity.
  • the first cache policy group can also be used to process the data generated by the second entity, where the data generated by the first entity is different from the data generated by the second entity; therefore, the first cache policy group can be applied to data generated by different entities and has good adaptability.
  • the multiple cache policies in the first cache policy group are arranged in a preset order, and when the multiple cache policies include an exclusive-type cache policy, the location of the exclusive-type cache policy is set by the user.
  • the user can also set the location of the cache policy by himself. Then, the user can set the first cache policy group according to his own needs, so that the first cache policy group can have higher flexibility.
  • before applying the first cache policy group including multiple cache policies to the data generated by the first entity, the above method further includes: determining the validity of the first cache policy group.
  • since the first cache policy group includes multiple cache policies, there may be conflicts between them; to avoid this situation, the validity of the first cache policy group needs to be checked before it is used for data caching.
  • if the first caching strategy group is valid, it can be applied to the data generated by the first entity.
  • if the first caching strategy group is invalid, it needs to be further adjusted into a valid caching strategy group, which is then applied to the data generated by the first entity.
  • each cache policy in the multiple cache policies corresponds to a policy attribute set, and determining the validity of the first cache policy group includes: determining the legality of the first cache policy group according to the multiple policy attribute sets corresponding to the multiple cache policies; wherein the policy attribute set corresponding to one cache policy includes at least one of a first attribute and a second attribute, the first attribute is used to determine whether the first cache policy group contains a cache policy that conflicts with this cache policy, and the second attribute is used to determine whether the first cache policy group includes multiple instances of this cache policy.
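The legality check over policy attribute sets can be sketched as follows; the attribute names and the example conflict between LRU and LFU are illustrative assumptions, not taken from the patent:

```python
# Hypothetical sketch of the legality check: each policy carries a
# policy-attribute set with (a) a conflict list (first attribute) and
# (b) whether duplicates of it are allowed in a group (second attribute).

def group_is_legal(policy_group, attributes):
    """Return True if no policy conflicts with another member and no
    policy that forbids duplicates appears more than once."""
    names = list(policy_group)
    for name in names:
        attr = attributes[name]
        # First attribute: conflict check against every other member.
        if any(other in attr["conflicts_with"] for other in names if other != name):
            return False
        # Second attribute: duplicate check.
        if not attr["allow_duplicates"] and names.count(name) > 1:
            return False
    return True

# Illustrative attribute sets (the LRU/LFU conflict is an assumption).
attributes = {
    "LRU": {"conflicts_with": {"LFU"}, "allow_duplicates": False},
    "LFU": {"conflicts_with": {"LRU"}, "allow_duplicates": False},
    "bloom_filter": {"conflicts_with": set(), "allow_duplicates": True},
}

assert group_is_legal(["LRU", "bloom_filter"], attributes)
assert not group_is_legal(["LRU", "LFU"], attributes)   # conflicting pair
assert not group_is_legal(["LRU", "LRU"], attributes)   # forbidden duplicate
```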
  • optimizing the first cache policy group according to the access records of the data generated by the first entity to obtain the second cache policy group includes: in the case where the first cache policy group is legal, iteratively optimizing each cache policy in the first cache policy group by using a heuristic algorithm or a machine learning algorithm according to the access records, thereby obtaining the second cache policy group.
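As a toy stand-in for the heuristic or machine-learning optimization (not the patent's algorithm), one can replay the recorded accesses against candidate replacement policies and keep the one with the best hit rate; all names below are illustrative:

```python
# Hypothetical sketch: optimize a policy group against access records by
# simulating each candidate replacement policy on the recorded trace and
# keeping the one with the highest hit rate (a simple greedy stand-in
# for heuristic / machine-learning optimization).
from collections import OrderedDict, Counter

def simulate_hit_rate(trace, capacity, evict):
    cache, hits = OrderedDict(), 0
    freq = Counter()
    for key in trace:
        freq[key] += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)               # refresh recency
        else:
            if len(cache) >= capacity:
                cache.pop(evict(cache, freq))    # make room
            cache[key] = True
    return hits / len(trace)

evict_lru = lambda cache, freq: next(iter(cache))             # oldest entry
evict_lfu = lambda cache, freq: min(cache, key=freq.__getitem__)  # rarest entry

def optimize(trace, capacity, candidates):
    return max(candidates, key=lambda n: simulate_hit_rate(trace, capacity, candidates[n]))

candidates = {"LRU": evict_lru, "LFU": evict_lfu}
trace = ["a", "a", "a", "b", "c", "a", "d", "a", "e", "a"]
best = optimize(trace, capacity=2, candidates=candidates)
```

On this frequency-skewed trace the LFU candidate wins (hit rate 0.5 vs 0.4), mirroring the idea that the optimized group fits the entity's recorded access pattern.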
  • the present application provides a data caching device, the device comprising:
  • the acquiring unit is configured to acquire multiple cache policies from multiple types of cache policy libraries, where each type of cache policy library includes at least one cache policy of the same type; the types of the cache policy library include at least one of a filter type, a prefetch type, a replacement type, and a victim cache type; the filter-type cache policy is used to filter data, the prefetch-type cache policy is used to prefetch data, the replacement-type cache policy is used to evict data from the cache, and the victim-cache-type cache policy is used to handle data evicted from the cache;
  • the cache unit is used for applying the first cache policy group including a plurality of cache policies to the data generated by the first entity.
  • the cache unit is specifically configured to: optimize the first cache policy group according to the access records of the data generated by the first entity to obtain the second cache policy group; and apply the second cache policy group to the data generated by the first entity.
  • the type of the cache policy library further includes an exclusive type, where an exclusive-type cache policy is a cache policy set by a user.
  • the cache unit is further configured to: optimize the first cache policy group according to the access records of the data generated by the second entity to obtain a third cache policy group; and apply the third cache policy group to the data generated by the second entity.
  • the multiple cache policies in the first cache policy group are arranged in a preset order, and when the multiple cache policies include an exclusive-type cache policy, the location of the exclusive-type cache policy is set by the user.
  • the above-mentioned apparatus further includes a determination unit, where the determination unit is configured to: determine the validity of the first cache policy group.
  • each cache policy in the multiple cache policies corresponds to a policy attribute set, and the determining unit is specifically configured to: determine the legality of the first cache policy group according to the multiple policy attribute sets corresponding to the multiple cache policies; wherein the policy attribute set corresponding to one cache policy includes at least one of a first attribute and a second attribute, the first attribute is used to determine whether the first cache policy group contains a cache policy that conflicts with this cache policy, and the second attribute is used to determine whether the first cache policy group includes multiple instances of this cache policy.
  • the caching unit is specifically configured to: in the case where the first caching strategy group is legal, iteratively optimize each cache policy in the first caching strategy group by using a heuristic algorithm or a machine learning algorithm according to the access records of the data generated by the first entity, to obtain the second cache policy group.
  • the present application provides a cache device.
  • the cache device includes a processor and a memory.
  • the processor executes code in the memory to implement some or all of the steps described in the first aspect.
  • the present application provides a computer-readable storage medium storing computer instructions, where the computer instructions are used to implement some or all of the steps described in the first aspect.
  • the present application provides a computer program product, including a computer program, which, when the computer program is read and executed by a computing device, implements some or all of the steps described in the first aspect.
  • FIG. 1A is a schematic diagram of an ARC strategy provided by the present application.
  • FIG. 1B is a schematic diagram of the principle of adaptive adjustment of an ARC strategy provided by the present application.
  • FIG. 2A is a schematic diagram of a cache policy selection interface provided by the present application.
  • FIG. 2B is a schematic diagram of another cache policy selection interface provided by the present application.
  • FIG. 3 is a schematic flowchart of a data caching method provided by the present application.
  • FIG. 4A is a schematic diagram of a cache management page provided by the present application.
  • FIG. 4B is a schematic diagram of another cache management page provided by the present application.
  • FIG. 5A is a schematic diagram of a first cache policy group provided by the present application.
  • FIG. 5B is a schematic diagram of another first cache policy group provided by the present application.
  • FIG. 6 is a schematic flowchart of a specific embodiment provided by the present application.
  • FIG. 7 is a schematic structural diagram of a data cache device provided by the present application.
  • FIG. 8 is a schematic structural diagram of a cache device provided by the present application.
  • Data access entity: refers to data users who access data and have different data storage requirements (i.e., cache requirements), such as users (user groups), applications (application groups), processes (process groups), threads (thread groups), etc.
  • the cache requirement in this application can be understood as the entity's requirement for the data stored in the cache, that is, when the data is stored in the cache, the cache is valid.
  • Cache validity refers to whether the cache is valid. When the cache is valid, the entity can obtain more of the accessed data directly from the cache, thereby improving the speed at which the entity obtains data.
  • indicators to measure cache effectiveness include: cache hit rate, data migration amount, and read amplification.
  • Data migration amount refers to the migration amount of cached data.
  • data is usually stored in different storage methods on storage devices with different performances according to the importance, access frequency, retention time, capacity, performance and other indicators of the data, that is, through hierarchical storage. In this way, data that is not frequently accessed is automatically migrated to a lower level in the storage hierarchy, thereby releasing higher-cost storage space for frequently accessed data. Then, when the amount of data migration is larger, it means that a large amount of infrequently accessed data is stored in the cache, which will increase the cost of data migration and the burden on the storage system.
  • "data migration amount" can have different names; for example, different standards or different versions of the same standard, different manufacturers, and different application scenarios may use different names for "data migration amount", such as cache overhead, etc.
  • Read amplification: for a piece of data, if the cache policy determines that the data meets the cache requirements, the data will be read from memory and copied into the cache; at this point, the data is effectively read twice. Later, when the cache is full, if the cache policy determines that the data no longer meets the cache requirements, the data will be evicted from the cache. When the data is accessed again, the cache policy will once more consider the data to meet the cache requirements and will copy the data from memory into the cache, at which point the data is read twice more. Thus, a larger read amplification for a piece of data means that the data is repeatedly evicted from the cache, and the caching effect of the cache is poor. It is worth noting that "read amplification" can have different names; for example, different standards or different versions of the same standard, different manufacturers, and different application scenarios may use different names for "read amplification", such as cache load, etc.
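The two per-datum metrics can be sketched with a toy LRU simulator; the simplified read-amplification accounting (an extra physical read each time a datum is copied from memory into the cache) follows the description above, and all names are illustrative:

```python
# Hypothetical sketch of two cache-effectiveness metrics: hit rate
# (hits / accesses) and a simplified read amplification (total physical
# reads of a datum, counting the extra read made each time it is copied
# from memory into the cache, divided by its access count).
from collections import OrderedDict, defaultdict

def measure(trace, capacity):
    cache, hits = OrderedDict(), 0
    reads = defaultdict(int)                 # physical reads per key
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)
            reads[key] += 1                  # served from cache: one read
        else:
            reads[key] += 2                  # read from memory + copy into cache
            if len(cache) >= capacity:
                cache.popitem(last=False)    # LRU eviction
            cache[key] = True
    hit_rate = hits / len(trace)
    amplification = {k: reads[k] / trace.count(k) for k in reads}
    return hit_rate, amplification

hit_rate, amp = measure(["a", "b", "a", "c", "a"], capacity=2)
```

In this run "b" and "c" are each read twice for a single access (amplification 2.0), while the frequently hit "a" amortizes its copy cost (4 reads over 3 accesses).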
  • Data access mode refers to the way an entity accesses data, such as the recent access mode and the frequent access mode.
  • The recent access mode means the entity always accesses the most recently accessed data, while the frequent access mode means the entity always accesses data with a high historical access frequency.
  • the data in the cache should meet the requirements of the data access mode. For example, if the entity adopts the recent access mode, the cache should store recently accessed data; if the entity adopts the frequent access mode, the cache should store data with a high historical access frequency.
  • the caching strategy used to process the data produced by the entity should be adapted to the data access pattern so that the caching needs can be satisfied and thus the effectiveness of the cache can be improved.
  • Cache: refers to a storage structure located between two kinds of hardware with a large speed difference (for example, between processor and memory, memory and hard disk, or hard disk and network) that improves data read performance. It is not difficult to understand that when a user frequently accesses a piece of data, if the data is retrieved from memory every time, the user must wait a long time for each access. The emergence of the cache effectively solves this problem: frequently accessed data is copied into the cache, so that when users subsequently access the data, they can read it directly from the cache, thereby improving the speed of data reading.
  • a cache is a storage area for frequently accessed data.
  • frequently accessed data may change; that is, data that was frequently accessed in a previous period may no longer be accessed now, while data that was not frequently accessed before may now be accessed frequently.
  • then how is it determined which data is the frequently accessed data?
  • a caching strategy needs to be designed to manage the cache to ensure the effectiveness of the cache.
  • a "cache policy" can have a different name; different standards or different versions of the same standard, different manufacturers, and different application scenarios may use different names for "cache policy". For example, the term "cache policy" may sometimes be called "cache method", "cache algorithm", etc.
  • an appropriate caching strategy is the key to improving the speed at which the entity obtains data.
  • the caching requirements corresponding to different entities may also be different.
  • Caching strategies suitable for handling data generated by different entities also vary. It is not difficult to understand that designing a dedicated caching strategy for each entity would consume a lot of resources. Therefore, in order to adapt to the data access patterns of different users, how to construct a flexible and adaptable caching strategy to improve the effectiveness of the cache remains an urgent problem to be solved in the current field of caching technology.
  • common caching strategy schemes include a single adaptive strategy scheme and a hybrid caching strategy scheme, as follows:
  • The single adaptive strategy scheme refers to configuring the same caching strategy for different entities and then improving that caching strategy according to each entity's data access situation during actual application, so that it can adapt to the different data access modes adopted by different entities.
  • the adaptive replacement cache (ARC) strategy proposed by Megiddo and Modha is a typical single adaptive strategy, which combines the least recently used (LRU) algorithm and the least frequently used (LFU) algorithm.
  • the ARC strategy is suitable for processing data generated by entities with a recent access pattern, or data generated by an entity with a frequent access pattern.
  • FIG. 1A shows a schematic diagram of an ARC strategy.
  • the ARC strategy specifically includes an LRU linked list, an LFU linked list, a ghost-LRU linked list for storing information about data evicted from the LRU linked list, and a ghost-LFU linked list for storing information about data evicted from the LFU linked list.
  • the LRU linked list and the LFU linked list are used to store data: the LRU linked list stores the most recently used data, and the LFU linked list stores the most frequently used data. The ghost-LRU and ghost-LFU linked lists do not store data; they store information about the data (e.g., offsets).
  • the ARC strategy dynamically adjusts the lengths of the LRU linked list and the LFU linked list according to the hits on the four linked lists, which gives the ARC strategy the ability to adapt. Specifically, if the hit rate of the LRU linked list and the ghost-LRU linked list is high, the length of the LRU linked list is increased; if the hit rate of the LFU linked list and the ghost-LFU linked list is high, the length of the LFU linked list is increased. For example, as shown in FIG. 1B, when the LRU linked list is full and another data A needs to be written into it, the least recently accessed data B in the LRU linked list is evicted and placed into the ghost-LRU list. Suppose that after a period of time, data B is accessed again and the ghost-LRU list is hit. In this case, the length of the LRU list is increased by 1 and, correspondingly, the length of the LFU list is decreased by 1.
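A much-simplified sketch of this ghost-list adaptation rule (not a full ARC implementation; the class and attribute names are illustrative):

```python
# Hypothetical, much-simplified sketch of the ARC adaptation rule: a hit
# in the ghost-LRU list grows the LRU target length by 1 at the expense
# of the LFU side, and a ghost-LFU hit does the opposite.
from collections import deque

class MiniARC:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lru_target = capacity // 2           # adaptive split point
        self.lru, self.lfu = deque(), deque()     # data lists
        self.ghost_lru, self.ghost_lfu = deque(), deque()  # metadata only

    def on_ghost_hit(self, key):
        if key in self.ghost_lru:                 # recency set is too small
            self.lru_target = min(self.capacity, self.lru_target + 1)
            self.ghost_lru.remove(key)
        elif key in self.ghost_lfu:               # frequency set is too small
            self.lru_target = max(0, self.lru_target - 1)
            self.ghost_lfu.remove(key)

    def evict_from_lru(self):
        """Evict the least recently used item; remember it in ghost-LRU."""
        victim = self.lru.popleft()
        self.ghost_lru.append(victim)
        return victim

# Replaying the FIG. 1B example: B is evicted, then accessed again.
arc = MiniARC(capacity=4)
arc.lru.extend(["B", "X"])
evicted = arc.evict_from_lru()    # "B" moves to the ghost-LRU list
arc.on_ghost_hit("B")             # ghost hit: grow the LRU side by 1
```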
  • the adaptability of the ARC strategy is realized by switching between the LRU algorithm and the LFU algorithm based on the hit situation of the linked lists. Therefore, the ARC strategy can only adapt to the recent access mode and the frequent access mode, which means the adaptability of the ARC strategy is limited, and it is difficult to meet cache requirements other than those corresponding to the above two data access modes.
  • other single adaptive strategies have limited flexibility and adaptability. In general, a single adaptive strategy usually suffers from poor flexibility and adaptability, making it difficult to meet many different caching needs.
  • The hybrid caching strategy scheme provides a candidate strategy set; for different entities, a corresponding caching strategy can be selected from the candidate strategy set to process the data generated by the corresponding entity.
  • the hybrid caching strategy is the most commonly used caching strategy in cloud storage, content delivery network (CDN) and other fields.
  • the solution provides a cache policy selection interface, which displays a set of candidate policies to the user, so that the user can select an appropriate cache policy by himself.
  • in the cache policy selection interface shown in FIG. 2A, selection options for cache policies are provided, and a user can select one or more corresponding cache policies according to the data access mode corresponding to the entity.
  • the cache policy selection interface can also provide configuration options for the cache policy, so that the user can conveniently select a cache policy for specified files. Taking the cache policy selection interface shown in FIG. 2B as an example, the user can enter ".txt" in the configuration options and select cache policy 1 for files with the ".txt" suffix, so as to use cache policy 1 to process the data in files with the ".txt" extension.
  • the data access mode corresponding to the entity is analyzed, and then according to the data access mode of the entity, a caching policy suitable for the data access mode corresponding to the entity is selected from the candidate policy set.
  • the present application provides a data caching method, which can construct a caching strategy group with high flexibility and strong adaptability, thereby improving the effectiveness of caching.
  • the data caching method provided by the present application will be described in detail with reference to FIG. 3 to FIG. 6 .
  • FIG. 3 shows a schematic flowchart of a data caching method provided by the present application.
  • the method includes but is not limited to the following steps:
  • the cache device acquires multiple cache policies from multiple types of cache policy libraries.
  • each type of cache policy library includes at least one cache policy of the same type, wherein the type of the cache policy library includes at least one of a filter type, a prefetch type, a replacement type, and a victim cache type.
  • the type of the cache policy library also includes an exclusive type. The following introduces each type of cache policy library and the cache policies it includes:
  • the filtering-type caching strategy library (hereinafter referred to as the filtering strategy library) includes one or more filtering-type caching strategies (hereinafter referred to as the filtering strategy), and the filtering strategy is used to filter data.
  • for example, when the processor executes a face recognition program, a large amount of data such as face images, face features, and face recognition results will be generated, and the filtering strategy can filter out data such as the face images and face features.
  • the filtering strategies included in the filter-type caching strategy library may be classical filtering algorithms such as the double filter and the Bloom filter, or may be user-defined filtering rules; for example, a user may define a rule that filters files whose file names end with ".jpg". This is not specifically limited in this application.
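As one of the classical filtering algorithms named above, a minimal Bloom filter might look like the following sketch (not the patent's code; sizes and hash construction are illustrative):

```python
# Hypothetical minimal Bloom filter: a bit array plus k hash positions
# per item; membership tests can yield false positives but never false
# negatives, which makes it a cheap cacheability filter.
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive k positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True for every added item; usually False for unseen items
        # (false positives are possible, false negatives are not).
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("face_image_001.jpg")
```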
  • the cache strategy library of the prefetch type includes one or more cache strategies of the prefetch type (hereinafter referred to as the prefetch strategy), and the prefetch strategy is used to prefetch data.
  • the prefetch strategy is used to predict the data to be accessed by the entity and store the predicted data in the cache in advance. Then, when the entity accesses the data, it can be obtained directly from the cache, thereby improving the access efficiency of the entity.
  • for example, the prefetch strategy can predict the data required for the next execution of a loop instruction according to the number of times the processor has executed the loop instruction, and store that data in the cache in advance.
  • the prefetching strategies included in the prefetch-type cache strategy library in this application may be a readahead algorithm, an adaptive readahead algorithm, a smart prefetcher algorithm, etc., or user-defined prefetching rules, which are not specifically limited here.
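A toy sequential readahead prefetcher illustrating the idea behind such strategies (the `window` and `depth` parameters, and the function name, are assumptions for illustration):

```python
# Hypothetical sketch of sequential readahead: when the last few block
# accesses are consecutive, the next `depth` blocks are loaded into the
# cache before they are requested, so later sequential reads hit.

def readahead(trace, window=2, depth=3):
    cache, prefetched_hits = set(), 0
    history = []
    for block in trace:
        if block in cache:
            prefetched_hits += 1             # served by an earlier prefetch
        history.append(block)
        cache.add(block)
        # Detect a sequential run over the last `window + 1` accesses.
        recent = history[-(window + 1):]
        if len(recent) == window + 1 and all(
            b - a == 1 for a, b in zip(recent, recent[1:])
        ):
            for nxt in range(block + 1, block + 1 + depth):
                cache.add(nxt)               # prefetch upcoming blocks
    return prefetched_hits

hits = readahead([10, 11, 12, 13, 14, 15, 40])
```

Once the run 10, 11, 12 is detected, blocks 13 to 15 are already in the cache when accessed, while the random jump to block 40 gets no benefit.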
  • the replacement-type caching strategy library (hereinafter referred to as the replacement strategy library) includes one or more replacement-type caching strategies (hereinafter referred to as the replacement strategy), and the replacement strategy is used to eliminate data from the cache.
  • the replacement-type cache The replacement policies included in the policy library may be LRU, LFU, ARC, etc., or may be user-defined replacement rules, which are not specifically limited here.
  • the victim-cache-type caching strategy library includes one or more victim-cache-type caching strategies (hereinafter referred to as victim cache strategies), and a victim cache strategy is used to process data evicted from the cache. Understandably, since data evicted from the cache may still be accessed again, the entity would otherwise need to re-obtain the evicted data from memory. To reduce the loss caused by this process, a victim cache strategy temporarily stores the evicted data in a victim cache and then decides, according to the probability of subsequent access, whether to evict the data from the victim cache; that is, the victim cache strategy keeps data that has been evicted from the cache but has a high probability of being accessed again in the victim cache, so that when the entity accesses the evicted data again, it can obtain the data directly from the victim cache.
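The victim-cache behavior can be sketched as follows (a simplified model; the promotion-on-rescue policy and all names are assumptions for illustration):

```python
# Hypothetical sketch of a victim-cache policy: items evicted from the
# main cache are kept in a small victim cache, so a quick re-access is
# served from the victim cache instead of going back to memory.
from collections import OrderedDict

class VictimCachedStore:
    def __init__(self, main_capacity, victim_capacity):
        self.main = OrderedDict()
        self.victim = OrderedDict()
        self.main_capacity = main_capacity
        self.victim_capacity = victim_capacity

    def put(self, key, value):
        self.main[key] = value
        if len(self.main) > self.main_capacity:
            old_key, old_val = self.main.popitem(last=False)  # LRU eviction
            self.victim[old_key] = old_val                    # keep the victim
            if len(self.victim) > self.victim_capacity:
                self.victim.popitem(last=False)               # drop oldest victim

    def get(self, key):
        if key in self.main:
            return self.main[key], "main"
        if key in self.victim:                  # rescue from the victim cache
            value = self.victim.pop(key)
            self.put(key, value)                # promote back into the main cache
            return value, "victim"
        return None, "memory"                   # would re-read from memory

store = VictimCachedStore(main_capacity=2, victim_capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    store.put(k, v)   # "a" is evicted from main into the victim cache
```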
  • the exclusive-type caching policy library (hereinafter referred to as the exclusive policy library) includes one or more exclusive-type caching policies (hereinafter referred to as exclusive policies), and an exclusive policy is a caching policy set by the user. For example, when an entity accesses a database, it extracts data according to the row number of the database every time; therefore, the user can set an exclusive policy for extracting relevant data according to the row number of the database.
  • multiple types of cache policy libraries may be configured in the cache device, or may be configured in other electronic devices or systems, and may also be partially configured in the cache device and partially configured in other electronic devices or systems. There is no specific limitation here.
  • the types of multiple cache policies obtained from multiple types of cache policy libraries may be different, or may all be the same, or may be partially the same and partially different, which is not specifically limited here.
  • the multiple cache policies may include multiple identical cache policies, and the multiple cache policies may also be multiple different cache policies, which are not specifically limited here.
  • the cache device can obtain multiple cache policies from multiple types of cache policy libraries in the following ways.
  • Manner 1: the cache device acquires multiple cache policies selected by the user from multiple types of cache policy libraries.
  • multiple types of cache policy libraries can be displayed to users in the form of a cache management page.
  • five types of cache policy libraries are displayed on the cache management page: the filtering policy library, the prefetch policy library, the replacement policy library, the victim cache policy library, and the exclusive policy library.
  • the filter strategy library includes 3 filtering strategies
  • the prefetch strategy library includes 4 prefetch strategies
  • the replacement strategy library includes 5 replacement strategies.
• the sacrifice cache strategy library includes 2 sacrifice cache strategies
  • the proprietary strategy library includes 2 proprietary strategies.
  • configuration options can also be displayed on the cache management page, so that the user can define which data is processed by using the cache policy, where the processed data is cached, the cache time, and the cache priority.
• for example, the user inputs directory A in the configuration options on the cache management page, and the cache device then processes the data in directory A using the selected multiple cache policies.
• the user may randomly select multiple cache policies from multiple types of cache policy libraries, or may analyze the data access mode of the first entity (specifically, the access records of the data generated by the first entity) and select multiple cache policies accordingly; multiple cache policies can also be selected from multiple types of cache policy libraries in other ways, which is not specifically limited in this application.
• Manner 2 The cache device selects multiple cache policies from multiple types of cache policy libraries.
• the cache device may randomly select multiple cache policies from multiple types of cache policy libraries; it may also analyze the data access mode of the first entity (specifically, the access records of the data generated by the first entity) and select multiple cache policies accordingly; it may further select multiple cache policies from multiple types of cache policy libraries according to a delivered configuration file, which is not specifically limited in this application.
  • the configuration file includes one or more of the following: the total number of selected cache policies, which types of cache policies are selected, the number of selected cache policies of each type, and which cache policy is selected specifically.
  • the cache device can also obtain multiple cache policies by combining the first mode and the second mode, that is, a part of the cache policies are selected by the user, and the other part of the cache policies are selected by the cache device.
• a user or a cache device can flexibly select cache policies according to actual needs, which enables the first cache policy group to better meet the user's requirements. For example, when a sacrifice cache is not configured in the storage system, the sacrifice cache policy may not be selected; for another example, when data eliminated from the cache needs to be filtered before being stored in the sacrifice cache, the user can set a proprietary policy to achieve this purpose. Moreover, the types of cache policies in the first cache policy group, the number of cache policies of each type, and so on can be adjusted according to actual conditions, which gives the first cache policy group many possible forms. It is not difficult to understand that, compared with the single adaptive strategy and candidate strategy set mentioned in the foregoing content, more cache strategy groups can easily be obtained by extension using the above method, thereby providing more choices, that is, satisfying more caching requirements.
  • the cache device applies a first cache policy group including multiple cache policies to the data generated by the first entity.
  • the multiple cache policies in the first cache policy group are arranged in a preset order.
  • the location of the exclusive policy is set by the user.
• the preset order is: the filtering strategies are arranged before the prefetching strategies, the prefetching strategies are arranged before the replacement strategies, and the replacement strategies are arranged before the sacrifice cache strategies; the position of the proprietary strategy is set by the user, that is, the proprietary strategy can be arranged before or after any strategy. It can be understood that, by specifying the position of the proprietary policy, the user can make the first cache policy group better conform to the caching requirement, so that a better caching effect can be obtained.
  • the preset order may also define the order of these cache policies of the same type.
  • the preset sequence defines that filter policy 1 is arranged before filter policy 2 , and filter policy 3 is arranged before filter policy 1 .
• the cache device can arrange these cache policies according to the preset order, so as to obtain the first cache policy group shown in FIG. 5A.
  • the arrangement order between cache policies of the same type may not be defined in the preset order.
• the preset order does not define the arrangement order between filter policy 1 and filter policy 2; in this case, the cache device will obtain two first cache policy groups, as shown in FIG. 5A and FIG. 5B.
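To make the arrangement step concrete, the following sketch (illustrative names only; it assumes each policy record carries its type, in line with the policy attribute sets described in this disclosure) sorts a set of selected policies into the preset type order:

```python
# Preset order of policy types: filter < prefetch < replacement < sacrifice.
TYPE_ORDER = {"filter": 0, "prefetch": 1, "replacement": 2, "sacrifice": 3}

def arrange_policies(policies):
    """Stable sort by type: policies of the same type keep their
    relative order, matching the case where the preset order does
    not constrain the order among same-type policies."""
    return sorted(policies, key=lambda p: TYPE_ORDER[p["type"]])

group = arrange_policies([
    {"name": "replacement policy 2", "type": "replacement"},
    {"name": "filter policy 1", "type": "filter"},
    {"name": "prefetch policy 4", "type": "prefetch"},
])
print([p["name"] for p in group])
# -> ['filter policy 1', 'prefetch policy 4', 'replacement policy 2']
```

Because Python's `sorted` is stable, a user-specified order among same-type policies (e.g., filter policy 3 before filter policy 1) survives the sort.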
• each cache policy in the multiple cache policies includes not only the algorithm description of the policy itself, but also a description of the policy attributes; that is, a cache policy is jointly described by its algorithm description and its policy attribute description.
  • each cache policy in the above-mentioned multiple cache policies corresponds to a policy attribute set
  • the policy attribute set includes the type of the cache policy
• the type of the cache policy may specifically be any of the filter type, prefetch type, replacement type, sacrifice cache type, or proprietary type mentioned in step S101.
  • the cache device may arrange the multiple cache policies in a preset order according to the type of each cache policy in the multiple cache policies, so as to obtain the first cache policy group. It should be noted here that one cache policy in this application corresponds to only one policy type, so that the cache device can arrange multiple cache policies according to the type of the cache policy.
• the first cache policy group may also be obtained in the following way: after the cache device acquires the above multiple cache policies, it stores these cache policies in a preset file, and then names the file as the first cache policy group.
  • filtering strategy 1 is used to filter data A
  • prefetching strategy 4 is used to put data A into the cache
• filtering strategy 1 and prefetching strategy 4 are incompatible, which means that filtering strategy 1 and prefetching strategy 4 conflict with each other. If filtering strategy 1 and prefetching strategy 4 are used at the same time, filtering strategy 1 or prefetching strategy 4 will fail, causing part of the function of the first cache policy group to be invalid.
• before the cache device applies the first cache policy group including multiple cache policies to the data generated by the first entity, the cache device also needs to determine the legality of the first cache policy group, that is, to check for incompatibilities among the multiple caching strategies.
• determining the validity of the first caching policy group by the caching device includes: determining, by the caching device, the validity of the first caching policy group according to the multiple policy attribute sets corresponding to the multiple caching policies, wherein the policy attribute set of each cache policy further includes at least one of a first attribute and a second attribute.
  • the first attribute is used to determine whether there is a cache policy conflicting with the cache policy in the first cache policy group
• the second attribute is used to determine whether a plurality of the same cache policy can be included in the first cache policy group.
• LRU is used to eliminate the least recently accessed data.
• LFU is used to eliminate the least frequently accessed data. If the first cache policy group includes both LRU and LFU, assume that a certain piece of data in the cache is the most recently accessed data in the last 20 minutes, but is also the least frequently accessed data in the last 2 hours; in this case, the data should not be eliminated according to LRU, but should be eliminated according to LFU, so the two policies conflict.
  • the first attribute of the replacement policy may be set to be incompatible with other replacement policies, and the second attribute may be set to one, indicating that the first cache policy group can only include one replacement policy.
  • the sacrifice cache strategy mainly acts on the data eliminated by the replacement strategy. Therefore, when the replacement strategy is one, the sacrifice cache strategy generally only needs one. Therefore, the first attribute of the sacrifice cache strategy can be set to be incompatible with other sacrifice cache strategies, and the second attribute can be set to one.
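A minimal sketch of this attribute-based legality check (the attribute names and policy records are assumptions introduced for illustration, not the disclosure's own data model) could be:

```python
from collections import Counter

def is_valid_group(policies):
    """Check each policy's first attribute (names of policies it is
    incompatible with) and second attribute ('one' forbids duplicates
    of the same policy in the group)."""
    counts = Counter(p["name"] for p in policies)
    for p in policies:
        for q in policies:
            if q["name"] in p.get("incompatible_with", set()):
                return False  # first attribute violated: conflicting pair
        if p.get("second_attribute") == "one" and counts[p["name"]] > 1:
            return False      # second attribute violated: duplicates present
    return True

lru = {"name": "LRU", "type": "replacement",
       "incompatible_with": {"LFU"}, "second_attribute": "one"}
lfu = {"name": "LFU", "type": "replacement",
       "incompatible_with": {"LRU"}, "second_attribute": "one"}
print(is_valid_group([lru, lfu]))  # False: two conflicting replacement policies
print(is_valid_group([lru]))       # True
```

A prefetch policy whose second attribute is "multiple" would simply omit the duplicate restriction, matching the two-instances-of-prefetch-strategy-1 example below.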
  • the first cache policy group may include multiple identical cache policies. For example, it is assumed that the entity first prefetches data according to the row number of the database, then prefetches the data according to the column number of the database, and then prefetches the data according to the row number of the database.
  • the above process can be implemented by executing prefetch strategy 1, prefetch strategy 2 and prefetch strategy 1 in sequence.
• the first cache strategy group needs two instances of prefetch strategy 1; in this case, the second attribute of prefetch strategy 1 can be set to multiple.
• the policy attribute set corresponding to each cache policy may specifically include at least one of a precondition, a postcondition, the first attribute, and the second attribute.
• the precondition of a cache policy refers to the conditions that the cache policies arranged before it should satisfy, and/or the conditions that the cache policies arranged before it should not satisfy, so that the cache device can determine, according to the precondition, whether a cache policy in the first cache policy group conflicts with the cache policies before it.
• the postcondition of a cache policy refers to the conditions that the cache policies arranged after it should satisfy, and/or the conditions that the cache policies arranged after it should not satisfy, so that the cache device can judge, based on the postcondition, whether a cache policy in the first cache policy group conflicts with the cache policies after it.
• for the first cache policy group shown in FIG. 5A, it is assumed that the postcondition of filtering policy 1 is that data A is filtered out, while the precondition of prefetching policy 4 is that data A is put into the cache. In this case, the cache device determines that the first cache policy group is invalid.
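The precondition/postcondition check can be sketched as a single walk over the ordered group; the condition labels below are illustrative stand-ins, and only one direction of the check (a later precondition ruled out by an earlier postcondition) is shown:

```python
def check_chain(ordered_policies):
    """Walk the ordered group; a policy conflicts when an earlier
    policy's postcondition rules out what this policy's precondition
    requires (a simplified reading of the pre/postcondition check)."""
    ruled_out = set()
    for p in ordered_policies:
        pre = p.get("precondition")
        if pre is not None and pre in ruled_out:
            return False  # conflict with an earlier policy's postcondition
        ruled_out.update(p.get("rules_out", ()))
    return True

# FIG. 5A example: filtering policy 1 filters out data A, while
# prefetching policy 4 requires data A to be put into the cache.
filter1 = {"name": "filter policy 1", "rules_out": {"data A in cache"}}
prefetch4 = {"name": "prefetch policy 4", "precondition": "data A in cache"}
print(check_chain([filter1, prefetch4]))  # False: the group is invalid
```

The symmetric check against the policies arranged after a policy would walk the group once in reverse in the same way.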
  • the cache device may also select multiple cache policies from multiple types of cache policy libraries according to at least one of a precondition, a postcondition, a first attribute or a second attribute.
• for example, if the precondition of prefetching policy 4 is that filtering policy 2 should be used at the same time, then, after the caching device selects prefetching policy 4, filtering policy 2 will also be selected.
• the caching device applies the first caching policy group including multiple caching policies to the data generated by the first entity, including: optimizing the first caching policy group according to the access records of the data generated by the first entity to obtain a second cache policy group; then, applying the second cache policy group to the data generated by the first entity. It can be understood that, through the above steps, a second cache policy group that better fits the data access mode of the first entity can be obtained, thereby improving the effectiveness of the cache.
• the caching device optimizes the first caching policy group according to the access records of the data generated by the first entity to obtain the second caching policy group, including: in the case that the first caching policy group is obtained by combination, the caching device uses a heuristic algorithm or a machine learning algorithm to iteratively optimize each cache policy in the first cache policy group according to the access records of the data generated by the first entity, thereby obtaining the second cache policy group.
• the specific content of this step will be described in detail through steps S1021-S1025 below. Understandably, using a heuristic algorithm or a machine learning algorithm to optimize each cache policy in the first cache policy group can improve the optimization speed, so that the cache device can obtain the second cache policy group faster and apply it to the data generated by the first entity.
  • the caching device may also apply the first caching policy group including multiple caching policies to the data generated by the second entity, wherein the data generated by the first entity is different from the data generated by the second entity.
• the specific process of this step includes: the cache device optimizes the first cache policy group according to the access records of the data generated by the second entity to obtain a third cache policy group, and then applies the third cache policy group to the data generated by the second entity. It can be understood that the specific process of applying the first cache policy group to the data generated by the second entity is similar to that of applying it to the data generated by the first entity, so the details are not expanded here. It can be seen that the first caching strategy group provided by the present application can be applied to data generated by different entities, that is, the first caching strategy group has good adaptability.
  • S1021 Collect an access record of the data generated by the first entity.
  • S1022 Preprocess the access records of the data generated by the first entity, so as to remove abnormal data in the data generated by the first entity.
• the preprocessing methods include filtering, cleaning, and the like; the abnormal data includes incomplete data (including truncated data, censored data, missing data, etc.), data with wrong timestamps, data with addresses exceeding the address range, and the like.
  • S1023 Analyze and evaluate the caching effect of the first caching policy group according to the access records of the preprocessed data.
• the preprocessed data is input into the first cache strategy group and processed by it, so as to obtain the data stored in the current cache; then, according to the access records of the preprocessed data, the cache index corresponding to the current cache is determined, so as to further determine the cache effect of the first cache policy group.
  • the cache indicator refers to an indicator that measures the effectiveness of the current cache. It can be understood that the more effective the cache is, the better the cache effect of the corresponding cache policy is. Therefore, the cache effect of the first cache policy group can be determined according to the cache index corresponding to the current cache.
  • the cache indicator includes at least one of cache hit rate, cache migration amount, and read magnification, wherein, for definitions of cache hit rate, cache migration amount, and read magnification, refer to the introduction of related concepts in the foregoing content.
• taking the cache hit rate as an example, the specific process of determining the cache index corresponding to the current cache is as follows: first, determine which of the preprocessed data is stored in the current cache, so as to obtain the number of cache hits and the number of cache misses; then, the cache hit rate of the current cache is calculated from the number of cache hits and the number of cache misses.
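For the cache hit rate specifically, the calculation above amounts to the following sketch (the record keys are illustrative):

```python
def cache_hit_rate(access_records, cached_keys):
    """Replay the preprocessed access records against the data held
    in the current cache: hit rate = hits / (hits + misses)."""
    hits = sum(1 for key in access_records if key in cached_keys)
    misses = len(access_records) - hits
    return hits / (hits + misses) if access_records else 0.0

records = ["a", "b", "a", "c", "a"]
print(cache_hit_rate(records, cached_keys={"a"}))  # 3 of 5 accesses hit -> 0.6
```

Cache migration amount and read amplification would be accumulated over the same replay in an analogous way.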
  • S1024 Determine whether the first cache policy group needs to be optimized according to the cache effect of the first cache policy group.
• when the cache index does not satisfy the preset index, the cache device determines that the first cache policy group needs to be optimized; when the cache index satisfies the preset index, the cache device may not optimize the first cache policy group.
  • the cache index satisfying the preset index includes one or more of the following: the cache hit rate is greater than the preset hit rate, the cache migration amount is smaller than the preset migration amount, and the read magnification is smaller than the preset multiple.
  • the preset hit rate, the preset migration amount, and the preset multiple may be set by the user, or may be dynamically adjusted by the cache device according to the actual situation, which is not specifically limited here.
• when the cache index satisfies the preset index, it means that the cache is effective, and the cache device may not need to optimize the first cache policy group.
• when the cache index does not satisfy the preset index, it means that the cache is not effective; at this time, the first cache policy group needs to be optimized to improve the effectiveness of the cache.
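The decision of step S1024 can be sketched as a threshold check over whichever indicators are configured (the threshold values below are illustrative; per the text, presets may be user-set or dynamically adjusted):

```python
def satisfies_preset_index(metrics, presets):
    """The cache index satisfies the preset index when every configured
    condition holds: hit rate above the preset hit rate, migration
    amount and read amplification below their presets."""
    return (metrics["hit_rate"] > presets.get("hit_rate", 0.0)
            and metrics["migration"] < presets.get("migration", float("inf"))
            and metrics["read_amplification"] < presets.get("read_amplification", float("inf")))

metrics = {"hit_rate": 0.85, "migration": 10.0, "read_amplification": 1.2}
print(satisfies_preset_index(metrics, {"hit_rate": 0.8}))  # True: no optimization needed
```

Unconfigured presets default to always-satisfied bounds, matching the "one or more of the following" wording above.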
• the optimization of the first caching strategy group is a multi-objective optimization problem, and in a multi-objective optimization problem the sub-objectives (here, the individual caching strategies) restrict one another, that is, optimizing one sub-objective may degrade the performance of other sub-objectives. Therefore, this application uses a heuristic algorithm (for example, an evolutionary algorithm) or a machine learning algorithm (for example, a reinforcement learning algorithm) to optimize the first caching strategy group, so that each caching strategy in the first caching strategy group is optimized as far as possible.
  • the following describes the optimization process of the first cache strategy group by taking the genetic algorithm as an example.
• genetic algorithm (GA)
• the specific process for the cache device to use the genetic algorithm to optimize the first cache policy group is as follows: first, the first cache policy group is regarded as a population, and each cache policy in the first cache policy group is regarded as an individual to be optimized; each individual to be optimized is encoded (that is, the parameters of the cache policy are encoded), and K pieces of string-structured data are randomly generated, where each piece of string-structured data represents an individual to be optimized, so as to obtain the initial population data. Next, taking the initial population data as search points, the fitness of each individual to be optimized is calculated, and the individuals of the current population whose fitness is greater than the expected fitness are inherited to the next-generation population; new individuals are then generated through crossover and mutation operations to obtain the new-generation population, and the cache index of the cache strategy group corresponding to the new-generation population is determined. If the cache index of the cache strategy group is less than or equal to the preset threshold, the above optimization process is performed iteratively; the iteration stops when the cache index of the optimized cache strategy group is greater than the preset threshold, thereby obtaining the second cache strategy group.
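The genetic-algorithm loop described above can be sketched as follows. This is a toy illustration under stated assumptions: the scoring function is a stand-in for the cache index (a real score would come from replaying access records as in step S1023), and all parameters and names are hypothetical.

```python
import random

def optimize_group(cache_index, n_params, pop_size=20, generations=50,
                   threshold=0.8, seed=0):
    """Each individual encodes the parameters of the policies in the
    group; `cache_index` scores an individual (higher is better).
    Iteration stops once the best individual's score exceeds the
    preset threshold, or the generation budget runs out."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    best = max(pop, key=cache_index)
    for _ in range(generations):
        if cache_index(best) > threshold:
            break                                # preset effect reached
        pop.sort(key=cache_index, reverse=True)
        parents = pop[: pop_size // 2]           # inherit the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params)     # single-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:               # mutation
                child[rng.randrange(n_params)] = rng.random()
            children.append(child)
        pop = parents + children
        best = max(pop, key=cache_index)
    return best

# Stand-in cache index: improves as parameters approach 0.5
# (purely illustrative objective).
score = lambda ind: 1.0 - 2 * sum(abs(x - 0.5) for x in ind) / len(ind)
best = optimize_group(score, n_params=4)
print(score(best) > 0.5)
```

Because the fitter half is carried over each generation, the best score is non-decreasing across iterations.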
  • Step 1 Evaluate the caching effect of the caching strategy currently applied to the data generated by the entity.
• the cache index corresponding to the current cache is monitored. If the cache index corresponding to the current cache is greater than the preset threshold, it means that the current cache strategy has a good cache effect, and the current cache strategy can still be used. If the cache index corresponding to the current cache is less than or equal to the preset threshold, it means that a large amount of data that does not meet the cache requirements is stored in the current cache, that is, the cache effect of the current cache strategy is poor; in this case, steps 2 to 6 are performed to build a caching strategy that can meet the caching needs of the entity.
  • Step 2 Collect the access log of the data generated by the entity.
  • Step 3 Build a corresponding cache policy group for the entity.
• Step 4 The validity of the cache strategy group constructed in step 3 is checked, and, in the case that the cache strategy group is obtained by combination, its cache effect is evaluated.
• Step 5 When the cache effect of the cache policy group does not reach the preset effect (that is, the cache index does not satisfy the preset index), the cache policy group is iteratively optimized according to the access log of the data generated by the entity, until a new cache strategy group whose cache effect reaches the preset effect is obtained.
  • Step 6 Apply the new cache strategy group obtained in step 5 to the storage system to process the data generated by the entity, so as to achieve a better cache effect and further improve the speed at which the entity acquires data.
  • step 1 may also be omitted, and steps 2 to 6 are performed periodically to optimize the caching policy corresponding to the entity, so that the caching policy can better adapt to the caching requirements of the entity.
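The periodic variant (steps 2 to 6 without step 1) can be sketched as a driver loop; every callable here is a placeholder for the corresponding step, not an API from the original disclosure:

```python
import time

def periodic_optimization(collect_log, build_group, effect_reached,
                          optimize, deploy, interval_s=3600.0, rounds=1):
    """Run steps 2-6 on a period: collect access logs, build a policy
    group, evaluate its cache effect, optimize if needed, deploy."""
    for i in range(rounds):
        log = collect_log()                        # step 2
        group = build_group(log)                   # step 3
        if not effect_reached(group, log):         # step 4
            group = optimize(group, log)           # step 5
        deploy(group)                              # step 6
        if i + 1 < rounds:
            time.sleep(interval_s)

deployed = []
periodic_optimization(
    collect_log=lambda: ["access-a", "access-b"],
    build_group=lambda log: ["filter policy 1", "prefetch policy 4"],
    effect_reached=lambda group, log: False,       # preset effect not reached
    optimize=lambda group, log: group + ["replacement policy 2"],
    deploy=deployed.append,
    interval_s=0.0, rounds=2)
print(len(deployed))  # 2: one deployment per periodic round
```

The validity check of step 4 is folded into `effect_reached` here for brevity; in the described method it is a separate check before the effect evaluation.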
  • FIG. 7 shows a schematic structural diagram of a data caching apparatus provided by the present application.
  • the apparatus 100 includes an obtaining unit 110 and a caching unit 120 .
• the acquiring unit 110 is configured to acquire multiple cache policies from multiple types of cache policy libraries, where each type of cache policy library includes at least one cache policy of the same type, and the types of the cache policy libraries include at least one of the filter type, the prefetch type, the replacement type, and the sacrifice cache type; the filter-type cache policy is used to filter data, the prefetch-type cache policy is used to prefetch data, the replacement-type cache policy is used to eliminate data from the cache, and the sacrifice-cache-type caching strategy is used to handle the data eliminated from the cache.
  • the cache unit 120 is configured to apply the first cache policy group including a plurality of cache policies to the data generated by the first entity.
• the cache unit 120 is specifically configured to: optimize the first cache policy group according to the access records of the data generated by the first entity to obtain the second cache policy group; and apply the second cache policy group to the data generated by the first entity.
  • the type of the cache policy library further includes a proprietary type
  • the cache policy of the proprietary type is a cache policy set by a user.
• the cache unit 120 is further configured to: optimize the first cache policy group according to the access records of the data generated by the second entity to obtain a third cache policy group; and apply the third cache policy group to the data generated by the second entity.
• the multiple cache policies in the first cache policy group are arranged in a preset order, and when the multiple cache policies include a proprietary-type cache policy, the position of the proprietary-type cache policy is set by the user.
  • the data caching apparatus 100 further includes a determining unit 130, and the determining unit 130 is configured to: determine the validity of the first caching policy group.
  • each cache policy in the multiple cache policies corresponds to a policy attribute set
• the determining unit 130 is specifically configured to: determine the legality of the first cache policy group according to the multiple policy attribute sets corresponding to the multiple cache policies; wherein the policy attribute set corresponding to a cache policy includes at least one of a first attribute and a second attribute, the first attribute is used to determine whether there is a cache policy conflicting with that cache policy in the first cache policy group, and the second attribute is used to determine whether the first cache policy group can include a plurality of that cache policy.
• the caching unit 120 is specifically configured to: in the case that the first caching strategy group is obtained by combination, use a heuristic algorithm or a machine learning algorithm to iteratively optimize each cache policy in the first caching strategy group according to the access records of the data generated by the first entity, so as to obtain the second cache policy group.
• the division of the above functional modules in the data caching apparatus 100 of this embodiment of the present application is merely an example; in practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the data caching apparatus 100 may be divided into different functional modules to complete all or part of the functions described above.
• the data caching apparatus 100 provided in the above embodiment belongs to the same concept as the caching device in the above method embodiments; for the specific implementation process, refer to the above method embodiments. For example, for the process in which the obtaining unit 110 obtains multiple cache policies from multiple types of cache policy libraries, refer to step S101 above; for the process in which the caching unit 120 optimizes the first cache policy group, refer to steps S1021-S1025 above. Details are not repeated here.
  • FIG. 8 shows a schematic structural diagram of a cache device provided by the present application.
  • the cache device 200 includes a processor 210 , a communication interface 220 and a memory 230 .
  • the processor 210 , the communication interface 220 and the memory 230 are coupled through the bus 240 .
• the processor 210 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or any other available processor.
  • the processor 210 may implement or execute various exemplary methods described in conjunction with the above method embodiments. Specifically, the processor 210 reads the program code stored in the memory 230, and cooperates with the communication interface 220 to execute part or all of steps S101-S102, S1021-S1025 and steps 1-6.
  • the communication interface 220 can be a wired interface or a wireless interface for communicating with other modules or devices.
• the wired interface may be an Ethernet interface, a controller area network (CAN) interface, a local interconnect network (LIN) interface, or a FlexRay interface; the wireless interface may be a cellular network interface, a wireless local area network interface, or the like.
  • the communication interface 220 can be connected to other devices.
• for example, the communication interface 220 can be connected to a storage system; after the processor 210 obtains the first cache policy group, the first cache policy group can be sent to the storage system through the communication interface 220 for processing the data generated by the first entity.
  • the memory 230 may include volatile memory, such as random access memory (RAM); the memory 230 may also include non-volatile memory, such as read only memory (ROM), flash memory, hard disk (hard disk drive, HDD) or solid state drive (solid state drive, SSD), the memory 230 may also include a combination of the above-mentioned types of memory.
  • the memory 230 may store program codes and program data.
• the program code is composed of the codes of some or all of the units in the data caching apparatus 100 shown in FIG. 7.
  • the program data is the data generated by the data caching apparatus 100 shown in FIG. 7 in the process of running the program, for example, the data generated by the first entity, the cache policy, and the like.
• the bus 240 may be a controller area network (CAN) bus or another internal bus implementation.
  • the bus 240 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 8, but it does not mean that there is only one bus or one type of bus.
• the cache device 200 in the embodiment of the present application is configured to execute the method executed by the cache device in the above method embodiments, which belongs to the same concept as the above method embodiments; details are not repeated here.
• the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores computer instructions; when the computer instructions run on a computing device (for example, the data caching apparatus 100 shown in FIG. 7 or the cache device 200 shown in FIG. 8), the computing device is caused to execute the method executed by the cache device in the foregoing method embodiments.
• the present application also provides a computer program product, including a computer program; when the computer program is read and executed by a computing device (for example, the data caching apparatus 100 shown in FIG. 7 or the cache device 200 shown in FIG. 8), the computer program is used to implement the method executed by the cache device in the above method embodiments.
• all or part of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof.
• when implemented by software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product described above includes one or more computer instructions.
  • the aforementioned computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
• the above computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the above computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) manner.
• the above computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media.
• the above usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., SSDs), and the like.
  • the disclosed apparatus may also be implemented in other manners.
  • the device embodiments described above are only illustrative.
• the division of the units is only a logical function division; in actual implementation, there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the indirect coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical or other forms.
  • the units described above as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions in the embodiments of the present application.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • when the integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, and this computer software product is stored in a storage medium.
  • such a computer device may be, for example, a personal computer, a server, or a network device.
  • the aforementioned storage medium may include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.

Abstract

A data caching method and apparatus, a device, and a computer-readable storage medium. The method comprises: acquiring a plurality of caching policies from a plurality of types of caching policy libraries (S101), wherein the caching policies comprised in each type of caching policy library are of the same type, and the types of the caching policy libraries comprise at least one of a filtering type, a prefetching type, a replacement type and a victim cache type, a caching policy of the filtering type being used for filtering data, a caching policy of the prefetching type being used for prefetching data, a caching policy of the replacement type being used for evicting data from a cache, and a caching policy of the victim cache type being used for processing the data evicted from the cache; and applying a first caching policy group, which comprises the plurality of caching policies, to data generated by a first entity (S102). By using the above method, the caching policies applied to the data generated by the first entity can be flexibly selected, so as to improve the caching effect of the first caching policy group on the data generated by the first entity.

Description

Data caching method, apparatus, device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 202110057914.1, filed with the China National Intellectual Property Administration on January 15, 2021 and entitled "Data caching method, apparatus, device and computer-readable storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of caching technologies, and in particular, to a data caching method, apparatus, device, and computer-readable storage medium.
Background
With the rapid development of caching technology, the speed at which users obtain data keeps increasing. A key factor affecting this speed is the caching policy: an appropriate caching policy ensures the effectiveness of the cache, thereby improving the speed at which users obtain data. In practice, however, because business logic differs, the data access patterns of different users may also differ, and the caching policies suited to processing the data generated by these users will differ accordingly. Therefore, to adapt to the data access patterns of different users, constructing a flexible and highly adaptable caching policy is an urgent problem to be solved in the current field of caching technology.
Summary of the Invention
The present application discloses a data caching method, apparatus, device, and computer-readable storage medium, which can construct a caching policy group with high flexibility and strong adaptability; when this caching policy group is used for data caching, the effectiveness of the cache can be improved, thereby increasing the speed of data reading.
In a first aspect, the present application provides a data caching method, which includes the following steps:
acquiring a plurality of caching policies from a plurality of types of caching policy libraries, where the caching policies included in each type of caching policy library are of the same type, and the types of the caching policy libraries include at least one of a filtering type, a prefetching type, a replacement type, and a victim cache type, where a filtering-type caching policy is used to filter data, a prefetching-type caching policy is used to prefetch data, a replacement-type caching policy is used to evict data from the cache, and a victim-cache-type caching policy is used to process data evicted from the cache; and
applying a first caching policy group that includes the plurality of caching policies to data generated by a first entity.
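For illustration only, the following Python sketch shows one possible way to organize caching policy libraries by type and assemble a first caching policy group from them. All policy names (for example, `size_filter` and `ghost_list`) and the `CachePolicy` structure are hypothetical and are not part of the application.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical policy record: a name plus the type of library it came from.
@dataclass
class CachePolicy:
    name: str
    ptype: str  # one of: "filter", "prefetch", "replacement", "victim"

# Hypothetical policy libraries, one per type; each library only holds
# policies of its own type, mirroring the method's first step.
POLICY_LIBRARIES: Dict[str, List[CachePolicy]] = {
    "filter":      [CachePolicy("size_filter", "filter")],
    "prefetch":    [CachePolicy("sequential_prefetch", "prefetch")],
    "replacement": [CachePolicy("lru", "replacement"),
                    CachePolicy("lfu", "replacement")],
    "victim":      [CachePolicy("ghost_list", "victim")],
}

def build_policy_group(selection: Dict[str, str]) -> List[CachePolicy]:
    """Pick one named policy from each requested library type."""
    group = []
    for ptype, name in selection.items():
        match = [p for p in POLICY_LIBRARIES[ptype] if p.name == name]
        if not match:
            raise KeyError(f"no policy {name!r} in library {ptype!r}")
        group.append(match[0])
    return group

# A "first caching policy group" drawing one policy from each library.
first_group = build_policy_group({
    "filter": "size_filter",
    "prefetch": "sequential_prefetch",
    "replacement": "lru",
    "victim": "ghost_list",
})
print([p.name for p in first_group])
# → ['size_filter', 'sequential_prefetch', 'lru', 'ghost_list']
```

Because the group is assembled from whatever the user selects, swapping `"lru"` for `"lfu"` in the selection changes the replacement behavior without touching any other policy, which is the flexibility the first aspect describes.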
By implementing the method described in the first aspect, a user or a device can select the caching policies applied to the data generated by the first entity. In this process, the user or device can flexibly choose caching policies according to actual requirements, which not only gives the first caching policy group more flexibility, but also improves the caching effect of the first caching policy group on the data generated by the first entity.
In a possible implementation, applying the first caching policy group including the plurality of caching policies to the data generated by the first entity includes: optimizing the first caching policy group according to access records of the data generated by the first entity to obtain a second caching policy group; and applying the second caching policy group to the data generated by the first entity.
In the above implementation, because the second caching policy group is obtained by optimizing the first caching policy group according to the access records of the data generated by the first entity, the second caching policy group is better suited than the first caching policy group to processing the data generated by the first entity. In other words, using the second caching policy group to cache the data generated by the first entity achieves a better caching effect, thereby increasing the speed at which the first entity obtains data.
In a possible implementation, the types of the caching policy libraries further include a proprietary type, where a proprietary-type caching policy is a caching policy set by a user.
In the above implementation, the user can set the required caching policies, so the first caching policy group can have higher flexibility.
In a possible implementation, the above method further includes: optimizing the first caching policy group according to access records of data generated by a second entity to obtain a third caching policy group; and applying the third caching policy group to the data generated by the second entity.
In the above implementation, the first caching policy group can also be used to process data generated by a second entity, where the data generated by the first entity differs from the data generated by the second entity. Therefore, the first caching policy group is applicable to data generated by different entities and has good adaptability.
In a possible implementation, the plurality of caching policies in the first caching policy group are arranged in a preset order, and when the plurality of caching policies include a proprietary-type caching policy, the position of the proprietary-type caching policy is set by the user.
In the above implementation, the user can also set the position of a caching policy, so the user can configure the first caching policy group according to his or her own needs, which gives the first caching policy group higher flexibility.
In a possible implementation, before applying the first caching policy group including the plurality of caching policies to the data generated by the first entity, the above method further includes: determining the validity of the first caching policy group.
It can be understood that because the first caching policy group includes a plurality of caching policies, conflicts may exist among them. To avoid this, before the first caching policy group is used for data caching, a validity check needs to be performed on it. When the first caching policy group is valid, it can be applied to the data generated by the first entity; when the first caching policy group is invalid, it needs to be further adjusted into a valid caching policy group before being applied to the data generated by the first entity.
In a possible implementation, each of the plurality of caching policies corresponds to a policy attribute set, and determining the validity of the first caching policy group includes: determining the validity of the first caching policy group according to the plurality of policy attribute sets corresponding to the plurality of caching policies, where the policy attribute set corresponding to a caching policy includes at least one of a first attribute and a second attribute, the first attribute is used to determine whether the first caching policy group contains a caching policy that conflicts with said caching policy, and the second attribute is used to determine whether the first caching policy group can include multiple instances of said caching policy.
In the above implementation, by setting a policy attribute set for each caching policy, the validity of the first caching policy group can be determined more conveniently.
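For illustration only, the following Python sketch shows how a validity check based on the two attributes might look. The `PolicyAttributes` structure and the policy names are assumptions made for this example, not part of the application.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical attribute set for one caching policy: the "first attribute"
# lists names of policies it conflicts with, and the "second attribute"
# says whether several instances of this policy may coexist in one group.
@dataclass
class PolicyAttributes:
    name: str
    conflicts_with: Set[str] = field(default_factory=set)  # first attribute
    allow_multiple: bool = False                           # second attribute

def group_is_valid(group: List[PolicyAttributes]) -> bool:
    """Validity check sketched from the two attributes described above."""
    names = [p.name for p in group]
    for p in group:
        # Second attribute: duplicates only allowed if the policy permits it.
        if not p.allow_multiple and names.count(p.name) > 1:
            return False
        # First attribute: no policy in the group may conflict with another.
        if p.conflicts_with.intersection(names):
            return False
    return True

lru = PolicyAttributes("lru", conflicts_with={"lfu"})
lfu = PolicyAttributes("lfu", conflicts_with={"lru"})
prefetch = PolicyAttributes("sequential_prefetch")

print(group_is_valid([lru, prefetch]))  # True: no conflict, no duplicates
print(group_is_valid([lru, lfu]))       # False: lru and lfu conflict
```

If the check returns `False`, the group would be adjusted, for example by removing one of the conflicting policies, before being applied, matching the invalid-group handling described above.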
In a possible implementation, optimizing the first caching policy group according to the access records of the data generated by the first entity to obtain the second caching policy group includes: when the first caching policy group is valid, iteratively optimizing each caching policy in the first caching policy group by using a heuristic algorithm or a machine learning algorithm according to the access records of the data generated by the first entity, thereby obtaining the second caching policy group.
In the above implementation, using a heuristic algorithm or a machine learning algorithm to optimize each caching policy in the first caching policy group can increase the optimization speed, so that the second caching policy group can be obtained faster.
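For illustration only, the following Python sketch shows a simple hill-climbing loop, one possible heuristic, that tunes a single hypothetical policy parameter (a prefetch depth) by replaying access records and scoring each candidate by hit rate. The replay model and its eviction rule are deliberately minimal and are not the application's algorithm.

```python
def insert(cache, item, capacity):
    """Add an item to the cache set, evicting an arbitrary entry if full."""
    if item in cache:
        return
    if len(cache) >= capacity:
        cache.pop()  # arbitrary eviction, kept simple for the sketch
    cache.add(item)

def replay_hit_rate(prefetch_depth, accesses, capacity=8):
    """Replay the access records and measure the resulting hit rate."""
    cache, hits = set(), 0
    for addr in accesses:
        if addr in cache:
            hits += 1
        else:
            insert(cache, addr, capacity)
        # Prefetch the next `prefetch_depth` sequential addresses.
        for k in range(1, prefetch_depth + 1):
            insert(cache, addr + k, capacity)
    return hits / len(accesses)

def optimize_depth(accesses, start=0, rounds=10):
    """Hill-climb the prefetch depth against the replayed records."""
    best, best_score = start, replay_hit_rate(start, accesses)
    for _ in range(rounds):
        for cand in (best - 1, best + 1):
            if cand < 0:
                continue
            score = replay_hit_rate(cand, accesses)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score

trace = list(range(50))  # a purely sequential access trace
depth, score = optimize_depth(trace)
print(depth, score)  # depth 1 already captures the sequential pattern
```

On the sequential trace above, every address is new, so with no prefetching the hit rate is zero; a depth of 1 loads each next address just before it is accessed, so the climb converges immediately. A machine learning approach would replace the scoring loop with a learned model of the access records.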
In a second aspect, the present application provides a data caching apparatus, and the apparatus includes:
an acquiring unit, configured to acquire a plurality of caching policies from a plurality of types of caching policy libraries, where the caching policies included in each type of caching policy library are of the same type, and the types of the caching policy libraries include at least one of a filtering type, a prefetching type, a replacement type, and a victim cache type, where a filtering-type caching policy is used to filter data, a prefetching-type caching policy is used to prefetch data, a replacement-type caching policy is used to evict data from the cache, and a victim-cache-type caching policy is used to process data evicted from the cache; and
a caching unit, configured to apply a first caching policy group including the plurality of caching policies to data generated by a first entity.
In a possible implementation, the caching unit is specifically configured to: optimize the first caching policy group according to access records of the data generated by the first entity to obtain a second caching policy group; and apply the second caching policy group to the data generated by the first entity.
In a possible implementation, the types of the caching policy libraries further include a proprietary type, where a proprietary-type caching policy is a caching policy set by a user.
In a possible implementation, the caching unit is further configured to: optimize the first caching policy group according to access records of data generated by a second entity to obtain a third caching policy group; and apply the third caching policy group to the data generated by the second entity.
In a possible implementation, the plurality of caching policies in the first caching policy group are arranged in a preset order, and when the plurality of caching policies include a proprietary-type caching policy, the position of the proprietary-type caching policy is set by the user.
In a possible implementation, the above apparatus further includes a determining unit, and the determining unit is configured to determine the validity of the first caching policy group.
In a possible implementation, each of the plurality of caching policies corresponds to a policy attribute set, and the determining unit is specifically configured to: determine the validity of the first caching policy group according to the plurality of policy attribute sets corresponding to the plurality of caching policies, where the policy attribute set corresponding to a caching policy includes at least one of a first attribute and a second attribute, the first attribute is used to determine whether the first caching policy group contains a caching policy that conflicts with said caching policy, and the second attribute is used to determine whether the first caching policy group can include multiple instances of said caching policy.
In a possible implementation, the caching unit is specifically configured to: when the first caching policy group is valid, iteratively optimize each caching policy in the first caching policy group by using a heuristic algorithm or a machine learning algorithm according to the access records of the data generated by the first entity, thereby obtaining the second caching policy group.
In a third aspect, the present application provides a caching device, where the caching device includes a processor and a memory, and the processor executes code in the memory to implement some or all of the steps described in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing computer instructions, where the computer instructions are used to implement some or all of the steps described in the first aspect.
In a fifth aspect, the present application provides a computer program product including a computer program, where when the computer program is read and executed by a computing device, some or all of the steps described in the first aspect are implemented.
Description of Drawings
To describe the technical solutions involved in the present application more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments. Evidently, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1A is a schematic diagram of an ARC policy provided by the present application;
FIG. 1B is a schematic diagram of the principle of adaptive adjustment of an ARC policy provided by the present application;
FIG. 2A is a schematic diagram of a caching policy selection interface provided by the present application;
FIG. 2B is a schematic diagram of another caching policy selection interface provided by the present application;
FIG. 3 is a schematic flowchart of a data caching method provided by the present application;
FIG. 4A is a schematic diagram of a cache management page provided by the present application;
FIG. 4B is a schematic diagram of another cache management page provided by the present application;
FIG. 5A is a schematic diagram of a first caching policy group provided by the present application;
FIG. 5B is a schematic diagram of another first caching policy group provided by the present application;
FIG. 6 is a schematic flowchart of a specific embodiment provided by the present application;
FIG. 7 is a schematic structural diagram of a data caching apparatus provided by the present application;
FIG. 8 is a schematic structural diagram of a caching device provided by the present application.
Detailed Description
The technical solutions in the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present invention.
To facilitate understanding of the technical solutions provided by the present application, related concepts involved in the present application are first introduced.
Data access entity (hereinafter referred to as an "entity"): a data consumer that accesses data and has its own data storage requirements (that is, caching requirements), for example, a user (group), an application (group), a process (group), or a thread (group). A caching requirement in this application can be understood as an entity's requirement on the data stored in the cache, that is, which data the cache must hold for the cache to be effective.
Cache effectiveness: whether the cache is effective. When the cache is effective, an entity can obtain more of the accessed data directly from the cache, which increases the speed at which the entity obtains data. Generally, the indicators for measuring cache effectiveness include the cache hit rate, the data migration amount, and the read amplification factor.
Cache hit rate: when an entity accesses a piece of data, if the data is stored in the cache, the entity can obtain the data from the cache, which is a hit; conversely, if the data is not stored in the cache, the entity needs to fetch it from memory, which is a miss. Then, cache hit rate = number of hits / (number of hits + number of misses). A higher cache hit rate indicates higher cache utilization, that is, most of the accessed data is obtained from the cache.
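The hit-rate formula above can be computed directly. The following minimal Python helper is illustrative only:

```python
def cache_hit_rate(hits, misses):
    """cache hit rate = hits / (hits + misses), as defined above."""
    total = hits + misses
    return hits / total if total else 0.0

# For example, 80 requests served from the cache and 20 from memory:
print(cache_hit_rate(80, 20))  # 0.8
```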
Data migration amount: the amount of cached data that is migrated. To improve the performance of a storage system, data is usually stored, in different ways, on storage devices of different performance levels according to indicators such as importance, access frequency, retention time, capacity, and performance; that is, through tiered storage, data that is not frequently accessed is automatically migrated to lower tiers of the storage hierarchy, freeing higher-cost storage space for frequently accessed data. A larger data migration amount therefore means that a large amount of infrequently accessed data is stored in the cache, which increases both the cost of data migration and the burden on the storage system. It is worth noting that the "data migration amount" may go by different names; for example, different standards, different versions of the same standard, different vendors, or different application scenarios may refer to it differently, for example, as cache overhead.
Read amplification factor: for a piece of data, if the caching policy determines that the data meets the caching requirement, the data is read from memory and copied to the cache, at which point the data has effectively been read twice. Later, when the cache is full, if the caching policy determines that the data no longer meets the caching requirement, the data is evicted from the cache; when the data is accessed again, the caching policy once again considers it to meet the caching requirement and copies it from memory to the cache, so the data is read twice more. A larger read amplification factor for a piece of data thus means that the data is repeatedly evicted from the cache, and the caching effect of that cache is poor. It is worth noting that the "read amplification factor" may go by different names; for example, different standards, different versions of the same standard, different vendors, or different application scenarios may refer to it differently, for example, as cache load.
Data access pattern: the way in which an entity accesses data, for example, a recency pattern or a frequency pattern. In the recency pattern, the entity always accesses recently accessed data; in the frequency pattern, the entity always accesses data with a high historical access frequency. Understandably, to improve cache effectiveness, the data in the cache should meet the requirements of the data access pattern. For example, if an entity adopts the recency pattern, the cache should store recently accessed data; if the entity adopts the frequency pattern, the cache should store data with a high historical access frequency. In other words, the caching policy used to process the data generated by an entity should be adapted to its data access pattern, so that the caching requirement can be satisfied and the effectiveness of the cache improved.
To facilitate understanding of the technical solutions provided by the present application, the application scenario to which this application applies, namely the caching scenario, is introduced first.
A cache is a structure located between two kinds of hardware with a large speed difference (for example, processor and memory, memory and hard disk, or hard disk and network), used to bridge the difference in data transfer speed between the two, so as to improve data read performance. It is not difficult to understand that when a user frequently accesses a piece of data, if the data is fetched from memory each time, the user has to wait a relatively long time on every access. The cache effectively solves this problem: frequently accessed data is copied to the cache, so that on subsequent accesses the user can read the data directly from the cache, thereby increasing the speed of data reading.
Simply put, a cache is a storage area for frequently accessed data. However, as time passes, the set of frequently accessed data may change; that is, data that was frequently accessed a while ago may no longer be accessed, while data that was rarely accessed before may now be accessed frequently. How, then, should it be determined which data counts as frequently accessed? In addition, when the cache is full and new data needs to be copied into it, which data should be retained in the cache and which should be deleted? A caching policy therefore needs to be designed to manage the cache and ensure its effectiveness.
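As a concrete illustration of one such policy, the following minimal Python LRU cache answers the two questions above with a single rule: keep what was used most recently, and evict what was used least recently. It is an example only, not the policy-group mechanism of the application.

```python
from collections import OrderedDict

class LruCache:
    """Minimal least-recently-used cache backed by an OrderedDict."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None                     # miss: caller fetches from memory
        self.store.move_to_end(key)         # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
        self.store[key] = value

cache = LruCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # cache is full: "b" is evicted
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```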
It is worth noting that a "caching policy" may have different names. For example, different standards, different versions of the same standard, different vendors, or different application scenarios may refer to a "caching policy" differently; for example, the term "caching policy" is sometimes also called a "caching method" or a "caching algorithm".
For an entity, a suitable caching policy is the key to increasing the speed at which the entity obtains data. In practice, however, because the data access patterns of different entities may differ, the caching requirements of different entities may also differ, so the caching policies suited to processing the data generated by different entities will differ as well. It is not difficult to understand that designing a dedicated caching policy for every entity would consume a large amount of resources. Therefore, to adapt to the data access patterns of different users, how to construct a flexible and adaptable caching policy to improve cache effectiveness remains an urgent problem to be solved in the current field of caching technology.
At present, common caching policy schemes include the single adaptive policy scheme and the hybrid caching policy scheme, as follows:
(1) Single adaptive policy scheme
A single adaptive policy scheme configures the same caching policy for different entities, and then improves that policy during actual use according to each entity's data access situation, so that it can adapt to the different data access patterns adopted by different entities. For example, the adaptive replacement cache (ARC) algorithm proposed by Megiddo and Modha is a typical single adaptive policy. It combines the idea of the least recently used (LRU) algorithm (that is, if a piece of data has not been accessed in the recent past, it can be assumed to be unlikely to be accessed in the future) with the idea of the least frequently used (LFU) algorithm (that is, if a piece of data has rarely been accessed in the recent past, it can be assumed to be unlikely to be accessed in the future). Therefore, the ARC policy is suitable for processing data generated by entities with a recency access pattern or a frequency access pattern.
如图1A所示,图1A示出了一种ARC策略的示意图。从图中可以看出,ARC策略具体 包括LRU链表、LFU链表、用于存储从LRU链表中淘汰的数据的信息的链表(Ghost LRU链表)、用于存储从LFU链表中淘汰的数据的信息的链表(Ghost LFU链表)。其中,LRU链表和LFU链表用于存储数据,具体地,LRU链表存储的是最近最多使用的数据,LFU链表存储的是最近最频繁使用的数据;而Ghost LRU链表和Ghost LFU链表中不存储数据,存储的是数据的信息(例如,偏移量(offset))。As shown in FIG. 1A, FIG. 1A shows a schematic diagram of an ARC strategy. As can be seen from the figure, the ARC strategy specifically includes an LRU linked list, an LFU linked list, a linked list (Ghost LRU linked list) for storing information about data eliminated from the LRU linked list, and a linked list for storing information about data eliminated from the LFU linked list. Linked list (Ghost LFU linked list). Among them, the LRU linked list and the LFU linked list are used to store data. Specifically, the LRU linked list stores the most recently used data, and the LFU linked list stores the most recently used data; while the Ghost LRU linked list and Ghost LFU linked list does not store data , which stores information about the data (eg, offset).
In actual use, the ARC policy dynamically adjusts the lengths of the LRU list and the LFU list according to the hits on these four lists, which is what gives the ARC policy its adaptability. Specifically, if the hit rate of the LRU list and the Ghost-LRU list is high, the length of the LRU list is increased; if the hit rate of the LFU list and the Ghost-LFU list is high, the length of the LFU list is increased. For example, as shown in FIG. 1B, when the LRU list is full and a new piece of data A needs to be written into it, the least recently accessed piece of data B in the LRU list is evicted and placed into the Ghost-LRU list. Suppose that after some time data B is accessed again; the Ghost-LRU list is then hit, and in this case the length of the LRU list is increased by 1 while the length of the LFU list is correspondingly decreased by 1.
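The ghost-list feedback described above can be sketched as follows. This is a deliberately simplified, illustrative model (class and method names are hypothetical), not the full ARC algorithm: it keeps one LRU-side and one LFU-side list, and adjusts the target length `p` of the LRU side by 1 on each ghost hit, as in the FIG. 1B example.

```python
from collections import OrderedDict

class SimplifiedARC:
    """Illustrative sketch of ARC's ghost-list feedback (not full ARC)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.p = capacity // 2          # target length of the LRU side
        self.lru = OrderedDict()        # data accessed once recently
        self.lfu = OrderedDict()        # data accessed more than once
        self.ghost_lru = OrderedDict()  # info about data evicted from lru
        self.ghost_lfu = OrderedDict()  # info about data evicted from lfu

    def access(self, key):
        if key in self.lru:             # second hit: promote to the LFU side
            del self.lru[key]
            self.lfu[key] = True
            return "hit"
        if key in self.lfu:
            self.lfu.move_to_end(key)
            return "hit"
        if key in self.ghost_lru:       # ghost hit: grow the LRU side by 1
            del self.ghost_lru[key]
            self.p = min(self.capacity, self.p + 1)
        elif key in self.ghost_lfu:     # ghost hit: shrink the LRU side by 1
            del self.ghost_lfu[key]
            self.p = max(0, self.p - 1)
        self._evict_if_needed()
        self.lru[key] = True
        return "miss"

    def _evict_if_needed(self):
        if len(self.lru) + len(self.lfu) < self.capacity:
            return
        # Evict from whichever side exceeds its target length; the
        # evicted key's info is kept in the corresponding ghost list.
        if len(self.lru) >= max(1, self.p):
            victim, _ = self.lru.popitem(last=False)
            self.ghost_lru[victim] = True
        else:
            victim, _ = self.lfu.popitem(last=False)
            self.ghost_lfu[victim] = True
```

A re-access of an evicted key hits the ghost list and shifts capacity toward the side that would have kept it, mirroring the length adjustment described for FIG. 1B.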
It can be seen that the adaptability of the ARC policy is achieved by switching between the LRU algorithm and the LFU algorithm based on list hits. The ARC policy can therefore adapt only to the recency-based and frequency-based access patterns, which limits its adaptability and makes it difficult to satisfy caching requirements other than those corresponding to these two data access patterns. Similarly, other single adaptive policies also suffer from limited flexibility and adaptability. In general, a single adaptive policy is usually insufficiently flexible and adaptive, and therefore struggles to satisfy the many different caching requirements.
(2) Hybrid caching policy scheme
Hybrid caching policy scheme: a candidate policy set is provided, and for different entities, a corresponding caching policy can be selected from the candidate policy set to process the data generated by the corresponding entity. At present, the hybrid caching policy is the most commonly used caching policy in fields such as cloud storage and content delivery networks (content delivery network, CDN).
In a possible implementation, the scheme provides a caching policy selection interface that presents the candidate policy set to the user, so that the user can choose an appropriate caching policy by themselves. The caching policy selection interface shown in FIG. 2A provides caching policy selection options, and the user can select one or more corresponding caching policies according to the data access pattern of the entity. In addition to the selection options, the caching policy selection interface may also provide caching policy configuration options, so that the user can conveniently choose a caching policy for specified files. Taking the caching policy selection interface shown in FIG. 2B as an example, the user can enter ".txt" in the configuration options and select caching policy 1 for files whose file name suffix is ".txt", so that caching policy 1 is used to process the data in those files.
In another possible implementation, the data access pattern corresponding to an entity is analyzed, and a caching policy adapted to that data access pattern is then selected from the candidate policy set.
It can be seen that the richer the caching policies in the candidate policy set, the more suitable the policy that can be selected from it. However, constructing a candidate policy set that covers a vast number of caching policies is itself a difficult task, and as a result this scheme also fails to satisfy the many different caching requirements.
To solve the above problems, this application provides a data caching method that can construct a caching policy group with high flexibility and strong adaptability, thereby improving caching effectiveness. The data caching method provided by this application is described in detail below with reference to FIG. 3 to FIG. 6.
First, refer to FIG. 3, which is a schematic flowchart of a data caching method provided by this application. The method includes, but is not limited to, the following steps:
S101: The cache device acquires multiple caching policies from multiple types of caching policy libraries.
In a specific embodiment, the caching policies included in each type of caching policy library are all of that same type, where the types of caching policy libraries include at least one of a filter type, a prefetch type, a replacement type, and a victim cache type. Optionally, the types of caching policy libraries further include a proprietary type. Each type of caching policy library and the caching policies it includes are described below:
The filter-type caching policy library (hereinafter referred to as the filter policy library) includes one or more filter-type caching policies (hereinafter referred to as filter policies). A filter policy is used to filter data. For example, when a processor executes a face recognition program, it generates a large amount of data such as face images, face features, and face recognition results; a filter policy can filter out data such as the face images and face features. In this application, the filter policies included in the filter policy library may be classical filtering algorithms such as the double filter or the Bloom filter, or may be user-defined filtering rules; for example, a user may define a rule that filters out files whose file name suffix is ".jpg". This is not specifically limited in this application.
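A user-defined filter policy of the kind just described can be sketched as follows. The class and method names are illustrative assumptions, not part of the application; the sketch simply excludes from caching any data whose name ends with a blocked suffix such as ".jpg".

```python
class SuffixFilterPolicy:
    """Sketch of a user-defined filter policy: data whose name ends
    with a blocked suffix is filtered out and not cached."""
    policy_type = "filter"          # policy type, as discussed in S101

    def __init__(self, blocked_suffixes):
        self.blocked_suffixes = tuple(blocked_suffixes)

    def should_cache(self, name):
        """Return False for data the policy filters out."""
        return not name.endswith(self.blocked_suffixes)
```

For example, `SuffixFilterPolicy([".jpg"])` lets "log.txt" through but filters out "face.jpg".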
The prefetch-type caching policy library (hereinafter referred to as the prefetch policy library) includes one or more prefetch-type caching policies (hereinafter referred to as prefetch policies). A prefetch policy is used to prefetch data; specifically, it predicts the data an entity is about to access and stores the predicted data in the cache in advance, so that when the entity accesses that data, it can be obtained directly from the cache, thereby improving the entity's access efficiency. For example, when a processor executes a loop instruction, the prefetch policy can predict, based on the number of times the processor has already executed the loop, the data required for the next iteration, and store that data in the cache ahead of time. In this application, the prefetch policies included in the prefetch policy library may be algorithms such as readahead, adaptive readahead, or a smart prefetcher, or may be user-defined prefetch rules; this is not specifically limited here.
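A minimal readahead-style prefetch policy can be sketched as follows. This is an illustrative simplification (real readahead implementations are considerably more elaborate): once two consecutive block accesses are observed, the next `window` blocks are predicted so they can be loaded into the cache in advance.

```python
class SequentialReadaheadPolicy:
    """Sketch of a readahead-style prefetch policy: on a sequential
    access pattern, predict the next `window` blocks."""
    policy_type = "prefetch"

    def __init__(self, window=2):
        self.window = window
        self.last_block = None

    def predict(self, block):
        """Record an access and return the blocks to prefetch."""
        sequential = (self.last_block is not None
                      and block == self.last_block + 1)
        self.last_block = block
        if sequential:
            return [block + i for i in range(1, self.window + 1)]
        return []                   # no sequential pattern detected yet
```

After accesses to blocks 10 and 11, the policy predicts blocks 12 and 13; a jump to an unrelated block yields no prediction.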
The replacement-type caching policy library (hereinafter referred to as the replacement policy library) includes one or more replacement-type caching policies (hereinafter referred to as replacement policies). A replacement policy is used to evict data from the cache. In this application, the replacement policies included in the replacement policy library may be LRU, LFU, ARC, and so on, or may be user-defined replacement rules; this is not specifically limited here.
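An LRU replacement policy of the kind listed above can be sketched as follows. The interface is an illustrative assumption: on overflow, the least recently used entry is evicted and handed back to the caller, for example so that a victim cache policy can store it.

```python
from collections import OrderedDict

class LRUReplacementPolicy:
    """Sketch of an LRU replacement policy: evicts the least recently
    used entry when the cache exceeds its capacity."""
    policy_type = "replacement"

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)    # refresh recency on rewrite
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            return self.entries.popitem(last=False)  # evicted (key, value)
        return None

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)    # refresh recency on access
            return self.entries[key]
        return None
```

With capacity 2, after writing "a" and "b" and then reading "a", writing "c" evicts "b", the least recently used entry.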
The victim-cache-type caching policy library (hereinafter referred to as the victim cache policy library) includes one or more victim-cache-type caching policies (hereinafter referred to as victim cache policies). A victim cache policy is used to process data evicted from the cache. It can be understood that data evicted from the cache may still be accessed again, in which case the entity would need to fetch the evicted data from memory anew. To reduce the cost of this, a victim cache policy can temporarily store the evicted data in a victim cache and then decide, according to the probability of subsequent access, whether to evict the data from the victim cache. In other words, the victim cache policy keeps in the victim cache the data that has been evicted from the cache but has a high probability of being accessed subsequently, so that when the entity accesses the evicted data again, it can obtain the data directly from the victim cache.
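A minimal victim cache policy can be sketched as follows. The interface and the simple oldest-first retention rule are illustrative assumptions standing in for the access-probability decision described above.

```python
from collections import OrderedDict

class VictimCachePolicy:
    """Sketch of a victim cache policy: data evicted from the main
    cache is kept in a small secondary cache, so that a near-term
    re-access can be served from there instead of from memory."""
    policy_type = "victim"

    def __init__(self, capacity):
        self.capacity = capacity
        self.victims = OrderedDict()

    def on_evicted(self, key, value):
        """Receive an entry evicted by the replacement policy."""
        self.victims[key] = value
        if len(self.victims) > self.capacity:
            self.victims.popitem(last=False)   # drop the oldest victim

    def lookup(self, key):
        # On a hit the entry is removed, modelling its promotion back
        # into the main cache.
        return self.victims.pop(key, None)
```

This pairs naturally with a replacement policy: whatever the replacement policy evicts is passed to `on_evicted`, and cache misses consult `lookup` before going to memory.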
The proprietary-type caching policy library (hereinafter referred to as the proprietary policy library) includes one or more proprietary-type caching policies (hereinafter referred to as proprietary policies). A proprietary policy is a caching policy set by the user, and may specifically be a policy custom-tailored to the data access behavior of an entity. For example, if an entity always extracts data from a database by row number, the user can set a proprietary policy that extracts the relevant data according to the database row number.
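The database row-number example can be sketched as a tiny proprietary policy. All names and the `span` parameter are illustrative assumptions: since the entity always reads by row number, the policy fetches a contiguous range of rows starting at the requested one.

```python
class RowNumberPolicy:
    """Sketch of a user-set proprietary policy: fetch a contiguous
    range of database rows starting at the requested row number."""
    policy_type = "proprietary"

    def __init__(self, span=3):
        self.span = span            # how many consecutive rows to fetch

    def rows_to_fetch(self, row):
        return list(range(row, row + self.span))
```

With `span=2`, a request for row 5 fetches rows 5 and 6.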
Optionally, the multiple types of caching policy libraries may all be configured in the cache device, may all be configured in another electronic device or system, or may be partly configured in the cache device and partly in another electronic device or system; this is not specifically limited here.
Optionally, the types of the multiple caching policies acquired from the multiple types of caching policy libraries may all be different, may all be the same, or may be partly the same and partly different; this is not specifically limited here. Optionally, the multiple caching policies may include several identical caching policies, or may all be different caching policies; this is not specifically limited here. In a specific implementation, the cache device may acquire the multiple caching policies from the multiple types of caching policy libraries in the following manners.
Manner 1: the cache device acquires multiple caching policies selected by the user from the multiple types of caching policy libraries.
Specifically, the multiple types of caching policy libraries may be presented to the user in the form of a cache management page. Taking FIG. 4A as an example, five types of caching policy libraries are displayed on the cache management page: a filter policy library, a prefetch policy library, a replacement policy library, a victim cache policy library, and a proprietary policy library, where the filter policy library includes 3 filter policies, the prefetch policy library includes 4 prefetch policies, the replacement policy library includes 5 replacement policies, the victim cache policy library includes 2 victim cache policies, and the proprietary policy library includes 2 proprietary policies. Through the cache management page, the user can clearly see which types of caching policy libraries exist and which caching policies are available for selection, and the user can also operate on the cache management page to select the above multiple caching policies.
Optionally, configuration options may also be displayed on the cache management page, so that the user can define which data the caching policies are applied to, where the processed data is cached, the caching duration, the caching priority, and so on. Taking FIG. 4B as an example, if the user enters directory A in the configuration options on the cache management page, the cache device processes the data in directory A using the selected multiple caching policies.
Optionally, the user may select the multiple caching policies from the multiple types of caching policy libraries at random, may select them by analyzing the data access pattern of the first entity (specifically, the access records of the data generated by the first entity), or may select them from the multiple types of caching policy libraries in other manners; this is not specifically limited in this application.
Manner 2: the cache device selects multiple caching policies from the multiple types of caching policy libraries.
Optionally, the cache device may select the multiple caching policies from the multiple types of caching policy libraries at random, may select them by analyzing the data access pattern of the first entity (specifically, the access records of the data generated by the first entity), or may select them from the multiple types of caching policy libraries according to a delivered configuration file, among other possibilities; this is not specifically limited in this application. The configuration file includes one or more of the following: the total number of caching policies to select, which types of caching policies to select, the number of caching policies to select for each type, and which specific caching policies to select.
Manner 3: the cache device may also acquire the multiple caching policies by combining Manner 1 and Manner 2, that is, some of the caching policies are selected by the user and the others are selected by the cache device.
Through the above method, the user or the cache device can flexibly select caching policies according to actual requirements, which enables the first caching policy group to better match the user's needs. For example, when no victim cache is configured in the storage system, no victim cache policy needs to be selected; for another example, when data evicted from the cache needs to be filtered before being stored in the victim cache, the user can set a proprietary policy to achieve this. Moreover, the types of caching policies in the first caching policy group and the number of caching policies of each type can all be adjusted according to the actual situation, which gives the first caching policy group many possible forms. It is not difficult to see that, compared with the single adaptive policy and the candidate policy set mentioned above, the above method can easily be extended to many more caching policy groups, thereby providing more choices, that is, satisfying more caching requirements.
S102: The cache device applies a first caching policy group including the multiple caching policies to the data generated by the first entity.
In a specific embodiment, the multiple caching policies in the first caching policy group are arranged in a preset order. When the multiple caching policies include a proprietary policy, the position of the proprietary policy is set by the user.
More specifically, when the multiple caching policies include a filter policy, a prefetch policy, a replacement policy, a victim cache policy, and a proprietary policy, the preset order is: the filter policy is arranged before the prefetch policy, the prefetch policy before the replacement policy, and the replacement policy before the victim cache policy, while the position of the proprietary policy is set by the user; that is, the proprietary policy may be arranged before or after any other policy. It can be understood that, by specifying the position of the proprietary policy, the user can make the first caching policy group better match the caching requirements, so that a better caching effect can be obtained.
In a specific embodiment, when the above multiple caching policies include two or more caching policies of the same type, the preset order may also define the order among these same-type caching policies. Taking the caching policies shown in FIG. 4A as an example, suppose the preset order specifies that filter policy 1 is arranged before filter policy 2, and filter policy 3 before filter policy 1. Then, when the multiple caching policies include filter policy 1, filter policy 2, prefetch policy 4, replacement policy 3, victim cache policy 1, and proprietary policy 2, the cache device can arrange these caching policies according to the preset order, obtaining the first caching policy group shown in FIG. 5A. Optionally, the preset order may leave the order among same-type caching policies undefined. Continuing the above example, when the preset order does not define the order between filter policy 1 and filter policy 2, the cache device obtains the two first caching policy groups shown in FIG. 5A and FIG. 5B.
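Arranging a policy group in the preset order can be sketched as a stable sort over the policy types. This is an illustrative simplification: the type ranks and the `(name, type)` tuples are assumptions, and the proprietary policy, whose position is user-defined, is omitted. Because the sort is stable, policies of the same type keep their given relative order when no intra-type order is defined.

```python
# Preset order: filter < prefetch < replacement < victim cache.
PRESET_ORDER = {"filter": 0, "prefetch": 1, "replacement": 2, "victim": 3}

def arrange(policies):
    """Arrange (name, type) policy tuples in the preset type order."""
    return sorted(policies, key=lambda p: PRESET_ORDER[p[1]])
```

For example, a group given as victim cache policy 1, prefetch policy 4, filter policy 1, replacement policy 3, filter policy 2 is arranged as: filter policy 1, filter policy 2, prefetch policy 4, replacement policy 3, victim cache policy 1.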
In a specific embodiment, to make it easier for the cache device to arrange the multiple caching policies, each of the multiple caching policies includes, in addition to a description of the policy's own algorithm, a description of the policy's attributes; in other words, a caching policy is jointly described by an algorithm description and a policy attribute description. Specifically, each of the above multiple caching policies corresponds to a policy attribute set, and the policy attribute set includes the type of the caching policy, which may be any one of the filter type, prefetch type, replacement type, victim cache type, or proprietary type mentioned in step S101. The cache device can then arrange the multiple caching policies in the preset order according to the type of each caching policy, thereby obtaining the first caching policy group. It should be noted that, in this application, one caching policy corresponds to only one policy type, so that the cache device can arrange multiple caching policies according to their types.
Optionally, the first caching policy group may also be obtained as follows: after acquiring the above multiple caching policies, the cache device stores these caching policies in a preset file, and then names the file the first caching policy group.
In this application, it is considered that different caching policies may be incompatible with one another. For example, if filter policy 1 filters out data A while prefetch policy 4 places data A into the cache, then filter policy 1 and prefetch policy 4 are incompatible, meaning that they conflict with each other; applying filter policy 1 and prefetch policy 4 at the same time would invalidate one of them and thus disable part of the functionality of the first caching policy group. To avoid this, in a specific embodiment, before applying the first caching policy group including the multiple caching policies to the data generated by the first entity, the cache device also needs to determine the legality of the first caching policy group, that is, to check whether there are incompatibilities among the multiple caching policies.
In a specific embodiment, the cache device determining the legality of the first caching policy group includes: the cache device determines the legality of the first caching policy group according to multiple policy attribute sets corresponding to the multiple caching policies, where the policy attribute set of each caching policy further includes at least one of a first attribute and a second attribute. The first attribute is used to determine whether the first caching policy group contains a caching policy that conflicts with the caching policy in question, and the second attribute is used to determine whether the first caching policy group may include multiple instances of that caching policy.
It should be noted that different replacement policies are usually incompatible with one another, and combining incompatible replacement policies would invalidate one of them. For example, LRU evicts the data that has been accessed least recently, while LFU evicts the data that has been accessed least frequently in the recent period. If the first caching policy group includes both LRU and LFU, suppose a piece of data in the cache is the most frequently accessed data over the last 20 minutes but also the least frequently accessed data over the last 2 hours; in this case, according to LRU the data should not be evicted, yet according to LFU it should be. Therefore, the first attribute of a replacement policy may be set to indicate incompatibility with other replacement policies, and its second attribute may be set to one, indicating that the first caching policy group can include only one replacement policy. A victim cache policy mainly acts on the data evicted by the replacement policy; therefore, when there is one replacement policy, generally only one victim cache policy is needed. Accordingly, the first attribute of a victim cache policy may be set to indicate incompatibility with other victim cache policies, and its second attribute may be set to one. It should also be noted that the first caching policy group may include multiple identical caching policies. For example, suppose an entity first prefetches data by database row number, then by database column number, and then again by database row number; this process can be implemented by executing prefetch policy 1, prefetch policy 2, and prefetch policy 1 in sequence. In this case, the first caching policy group needs two instances of prefetch policy 1, and the second attribute of prefetch policy 1 may be set to multiple.
In a specific embodiment, when the multiple caching policies in the first caching policy group are arranged in the preset order, the policy attribute set corresponding to each caching policy may specifically include at least one of a precondition, a postcondition, the first attribute, and the second attribute. The precondition of a caching policy refers to a condition that the caching policies arranged before it should satisfy, and/or a condition that they should not satisfy, so that the cache device can use the precondition to judge whether the caching policy conflicts with the policies arranged before it in the first caching policy group. The postcondition refers to a condition that the caching policies arranged after it should satisfy, and/or a condition that they should not satisfy, so that the cache device can use the postcondition to judge whether the caching policy conflicts with the policies arranged after it. Taking the first caching policy group shown in FIG. 5A as an example, suppose the postcondition of filter policy 1 is that data A is filtered out, or the precondition of prefetch policy 4 is that data A is placed into the cache; in either case, the cache device determines that the first caching policy group is not legal.
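The attribute-based part of the legality check can be sketched as follows. The attribute set layout is a hypothetical assumption: `conflicts_with` models the first attribute (policy types this policy is incompatible with) and `max_instances` models the second attribute (how many copies of this policy the group may contain); pre/postcondition checking is omitted for brevity.

```python
def is_legal(group):
    """Check a policy group against hypothetical attribute sets."""
    for policy in group:
        # First attribute: no other policy may have a conflicting type.
        for other in group:
            if (other is not policy
                    and other["type"] in policy.get("conflicts_with", ())):
                return False
        # Second attribute: limit the number of copies of this policy.
        copies = sum(1 for p in group if p["name"] == policy["name"])
        if copies > policy.get("max_instances", 1):
            return False
    return True
```

For example, a group containing a filter policy and LRU passes the check, while adding LFU fails it, because both LRU and LFU are replacement policies declared incompatible with other replacement policies.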
Optionally, the cache device may also select the multiple caching policies from the multiple types of caching policy libraries according to at least one of the precondition, the postcondition, the first attribute, or the second attribute. Taking the first attribute of the replacement policy in the above example, the cache device selects only one replacement policy according to that attribute. For another example, suppose the precondition of prefetch policy 4 is that prefetch policy 4 should be used together with filter policy 2; then, after selecting prefetch policy 4, the cache device also selects filter policy 2.
In a specific embodiment, the cache device applying the first caching policy group including the multiple caching policies to the data generated by the first entity includes: the cache device optimizes the first caching policy group according to the access records of the data generated by the first entity to obtain a second caching policy group, and then applies the second caching policy group to the data generated by the first entity. It can be understood that, through the above steps, a second caching policy group better suited to the data access pattern of the first entity can be obtained, thereby improving caching effectiveness.
In a specific embodiment, the cache device optimizing the first caching policy group according to the access records of the data generated by the first entity to obtain the second caching policy group includes: when the first caching policy group is legal, the cache device iteratively optimizes each caching policy in the first caching policy group using a heuristic algorithm or a machine learning algorithm according to the access records of the data generated by the first entity, thereby obtaining the second caching policy group. The details of this step are described through steps S1021-S1025 below. It can be understood that using a heuristic algorithm or a machine learning algorithm to optimize each caching policy in the first caching policy group can speed up the optimization, so that the cache device can obtain the second caching policy group faster and apply it to the data generated by the first entity.
在一具体的实施例中,缓存设备还可以将包括多个缓存策略的第一缓存策略组运用于第二实体产生的数据,其中,第一实体产生的数据与第二实体产生的数据不同。该步骤的具体过程包括:缓存设备根据第二实体产生的数据的访问记录对第一缓存策略组进行优化,得到第三缓存策略组;然后,将第三缓存策略组运用于所述第二实体产生的数据。可以理解的,缓存设备将第一缓存策略组运用于第二实体产生的数据的具体过程与将第一缓存策略组运用于第一实体产生的数据的具体过程类似,为了简便,此处不再展开详细赘述。可以看出,本申请提供的第一缓存策略组能够适用于不同实体产生的数据,也就是说,第一缓存策略组具有良好的适应能力。In a specific embodiment, the caching device may also apply the first caching policy group including multiple caching policies to the data generated by the second entity, wherein the data generated by the first entity is different from the data generated by the second entity. The specific process of this step includes: the cache device optimizes the first cache policy group according to the access records of the data generated by the second entity to obtain a third cache policy group; and then applies the third cache policy group to the second entity generated data. It can be understood that the specific process of applying the first cache policy group to the data generated by the second entity by the cache device is similar to the specific process of applying the first cache policy group to the data generated by the first entity. Expand the details. It can be seen that the first caching strategy group provided by the present application can be applied to data generated by different entities, that is to say, the first caching strategy group has good adaptability.
下面将结合步骤S1021-S1025对前述步骤S102中缓存设备优化第一缓存策略组,从而得到第二缓存策略组的具体过程进行进一步地描述。The specific process of optimizing the first cache policy group by the cache device in the foregoing step S102 to obtain the second cache policy group will be further described below with reference to steps S1021-S1025.
S1021: Collect access records of the data generated by the first entity.
S1022: Preprocess the access records of the data generated by the first entity, so as to remove abnormal data from the data generated by the first entity.
In a specific embodiment, the preprocessing includes filtering, cleaning, and the like; the abnormal data includes incomplete data (including truncated data, censored data, missing data, etc.), data with incorrect timestamps, data whose address exceeds the address range, and the like.
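A minimal sketch of such a preprocessing filter is shown below. The record fields (`timestamp`, `address`) and the validity bounds are assumptions made for illustration; the patent does not specify a record format.

```python
# Illustrative sketch of the S1022 preprocessing step: drop abnormal
# access records before they are used to evaluate the policy group.

def preprocess(records, max_address, now):
    """Remove incomplete records, records with an impossible timestamp,
    and records whose address lies outside the address range."""
    cleaned = []
    for r in records:
        if r.get("timestamp") is None or r.get("address") is None:
            continue  # incomplete (truncated / missing fields)
        if r["timestamp"] < 0 or r["timestamp"] > now:
            continue  # wrong timestamp
        if not (0 <= r["address"] < max_address):
            continue  # address outside the address range
        cleaned.append(r)
    return cleaned

records = [
    {"timestamp": 10, "address": 4096},        # normal record
    {"timestamp": None, "address": 4096},      # incomplete
    {"timestamp": 99999, "address": 4096},     # timestamp in the future
    {"timestamp": 20, "address": 1 << 40},     # address out of range
]
print(len(preprocess(records, max_address=1 << 32, now=1000)))  # only 1 survives
```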
S1023: Analyze and evaluate the caching effect of the first cache policy group according to the access records of the preprocessed data.
In a specific implementation, the preprocessed data is input into the first cache policy group; after being processed by the first cache policy group, the data stored in the current cache is obtained; then, according to the access records of the preprocessed data, the cache indicator corresponding to the current cache is determined, so as to further determine the caching effect of the first cache policy group. The cache indicator is an indicator that measures the effectiveness of the current cache. It can be understood that the more effective the cache, the better the caching effect of the corresponding cache policy; therefore, the caching effect of the first cache policy group can be determined according to the cache indicator corresponding to the current cache. Optionally, the cache indicator includes at least one of the cache hit rate, the cache migration amount, and the read amplification factor; for their definitions, refer to the introduction of related concepts in the foregoing content.
Taking the cache hit rate as an example, the specific process of determining the cache indicator corresponding to the current cache according to the access records of the preprocessed data is as follows: first, determine which of the preprocessed data is stored in the current cache, so as to obtain the number of cache hits and the number of cache misses; then, calculate the cache hit rate of the current cache according to the number of cache hits and the number of cache misses.
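The hit-rate computation just described can be sketched in a few lines. The representation of the cache contents as a set of addresses is an assumption for illustration.

```python
# Minimal sketch of the hit-rate part of S1023: replay the preprocessed
# access records against the cache contents produced by the policy group.

def cache_hit_rate(cache_contents, access_records):
    hits = sum(1 for addr in access_records if addr in cache_contents)
    misses = len(access_records) - hits
    return hits / (hits + misses) if access_records else 0.0

cache = {0x10, 0x20, 0x30}            # data currently held in the cache
accesses = [0x10, 0x20, 0x40, 0x50]   # addresses from the access records
print(cache_hit_rate(cache, accesses))  # 2 hits out of 4 accesses -> 0.5
```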
S1024: Determine, according to the caching effect of the first cache policy group, whether the first cache policy group needs to be optimized.
In a specific embodiment, when the cache indicator does not meet the preset indicator, the caching device determines that the first cache policy group needs to be optimized. When the cache indicator meets the preset indicator, the caching device may refrain from optimizing the first cache policy group. The cache indicator meeting the preset indicator includes one or more of the following: the cache hit rate is greater than a preset hit rate, the cache migration amount is less than a preset migration amount, and the read amplification factor is less than a preset factor. The preset hit rate, the preset migration amount, and the preset factor may be set by the user, or may be dynamically adjusted by the caching device according to the actual situation; this is not specifically limited here.
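The decision rule in S1024 can be sketched as a simple threshold check. The field names and threshold values are illustrative assumptions, not values from the patent.

```python
# Sketch of the S1024 decision: optimize only when some monitored cache
# indicator fails its preset target.

def needs_optimization(metrics, presets):
    """True when the hit rate is too low, the migration amount is too
    high, or the read amplification factor is too high."""
    if metrics["hit_rate"] <= presets["hit_rate"]:
        return True
    if metrics["migration"] >= presets["migration"]:
        return True
    if metrics["read_amp"] >= presets["read_amp"]:
        return True
    return False

presets = {"hit_rate": 0.8, "migration": 1_000, "read_amp": 2.0}
print(needs_optimization({"hit_rate": 0.9, "migration": 500, "read_amp": 1.5}, presets))  # False
print(needs_optimization({"hit_rate": 0.6, "migration": 500, "read_amp": 1.5}, presets))  # True
```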
It can be understood that when the cache indicator meets the preset indicator, the cache is effective; that is, a satisfactory caching effect is obtained when the data generated by the first entity is cached using the first cache policy group. Therefore, to save resources, the caching device does not need to optimize the first cache policy group. Conversely, when the cache indicator does not meet the preset indicator, the cache is not effective; in this case, the first cache policy group needs to be optimized to improve the effectiveness of the cache.
S1025: When the first cache policy group needs to be optimized, optimize the first cache policy group according to the access records of the preprocessed data, thereby obtaining the second cache policy group.
In a more specific embodiment, considering that the first cache policy group includes multiple cache policies, the optimization of the first cache policy group is a multi-objective optimization problem. Because a multi-objective optimization problem involves mutual constraints among the sub-objectives (here, the individual cache policies), meaning that optimizing one sub-objective may degrade the performance of other sub-objectives, this application uses a heuristic algorithm (for example, an evolutionary algorithm) or a machine learning algorithm (for example, a reinforcement learning algorithm) to optimize the first cache policy group, so that each cache policy in the first cache policy group is optimized as far as possible.
The following takes the genetic algorithm as an example to describe the optimization process of the first cache policy group.
A genetic algorithm (GA) is an evolutionary algorithm with high robustness and wide adaptability. By simulating phenomena such as replication, crossover, and mutation that occur in natural selection and heredity, it allows a population to evolve toward ever better regions of the search space, thereby producing a group of individuals best adapted to the application scenario.
Assume the first cache policy group includes K cache policies, where K is a positive integer. The specific process in which the caching device uses the genetic algorithm to optimize the first cache policy group is as follows: first, the first cache policy group is regarded as a population, each cache policy in the first cache policy group is regarded as an individual to be optimized, and each individual to be optimized is encoded (that is, the parameters of the cache policy are encoded); then, K string-structured data items are randomly generated, each representing one individual to be optimized, thereby obtaining the initial population data; next, taking the initial population data as the search points, the fitness of each individual to be optimized is calculated, the individuals in the current population whose fitness is greater than a preset fitness are passed on to the next-generation population, and new individuals are generated through crossover and mutation operations, thereby obtaining a new-generation population; then, the cache indicator of the cache policy group corresponding to the new-generation population is determined. When the cache indicator of the cache policy group corresponding to the new-generation population is greater than the preset threshold, the above optimization process is performed iteratively; the iteration stops when the cache indicator of the optimized cache policy group is less than or equal to the preset threshold, thereby obtaining the second cache policy group.
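The selection-crossover-mutation loop above can be sketched compactly. This is a generic GA skeleton under stated assumptions: individuals are lists of numeric policy parameters, and the fitness function is a stand-in; a real system would replay the access records to score each candidate policy group.

```python
# Compact genetic-algorithm sketch of the loop described above: K encoded
# policy individuals are selected by fitness, then recombined and mutated.
import random

random.seed(0)

K = 4            # number of cache policies in the group
PARAMS = 3       # encoded parameters per policy

def fitness(individual):
    # stand-in scoring; in practice this would come from the cache indicator
    return sum(individual)

def evolve(population, generations=20, keep=0.5, mutation=0.1):
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: max(2, int(len(population) * keep))]
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, PARAMS)          # crossover point
            child = a[:cut] + b[cut:]
            if random.random() < mutation:             # occasional mutation
                child[random.randrange(PARAMS)] = random.random()
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

population = [[random.random() for _ in range(PARAMS)] for _ in range(K)]
initial_best = max(population, key=fitness)
best = evolve(population)
print(fitness(best) >= fitness(initial_best))  # elitism never loses the best
```

Because the top individuals are always carried into the next generation, the best fitness is monotonically non-decreasing across iterations.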
Next, a specific example is used to further illustrate the first cache policy group provided by this application and the data caching method that uses it. The example includes the following steps, as shown in FIG. 6:
Step ①: Evaluate the caching effect of the cache policy currently applied to the data generated by the entity.
Specifically, the cache indicator corresponding to the current cache is monitored. If the cache indicator corresponding to the current cache is greater than the preset threshold, the current cache policy has a good caching effect and can continue to be used. If the cache indicator corresponding to the current cache is less than or equal to the preset threshold, a large amount of data that does not meet the caching requirements is stored in the current cache, which indicates that the caching effect of the current cache policy is poor; in this case, steps ②-⑥ need to be performed to construct a cache policy that can meet the caching requirements of the entity.
Step ②: Collect access logs of the data generated by the entity.
Step ③: Construct a corresponding cache policy group for the entity.
Step ④: Perform a legality check on the cache policy group constructed in step ③, and, when the cache policy group is legal, evaluate its caching effect.
Step ⑤: When the caching effect of the cache policy group does not reach the preset effect (that is, the cache indicator does not meet the preset indicator), iteratively optimize the cache policy group according to the access logs of the data generated by the entity, until a new cache policy group whose caching effect reaches the preset effect is obtained.
Step ⑥: Apply the new cache policy group obtained in step ⑤ to the storage system to process the data generated by the entity, so as to achieve a better caching effect and further improve the speed at which the entity acquires data.
Optionally, step ① may be omitted, and steps ②-⑥ may be performed periodically to optimize the cache policy corresponding to the entity, so that the cache policy better adapts to the caching requirements of the entity.
It should be understood that, for brevity, the above example does not describe the specific implementation of each step in detail; for details, refer to the foregoing method embodiments, which are not repeated here.
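The flow of steps ① to ⑥ can be summarized as a short control loop. Every function here is a placeholder standing in for a component described above; names, thresholds, and return values are illustrative assumptions only.

```python
# Schematic sketch of the step-①-to-⑥ loop in FIG. 6.

def run_pipeline(current_metric, preset=0.8):
    if current_metric > preset:
        return "keep current policy"            # step ①: effect is good
    logs = collect_access_logs()                # step ②
    group = build_policy_group(logs)            # step ③
    assert check_legal(group)                   # step ④: legality check
    while evaluate(group, logs) <= preset:      # step ④/⑤: evaluate, optimize
        group = optimize(group, logs)
    deploy(group)                               # step ⑥
    return "deployed optimized policy group"

# toy placeholders so the sketch runs end to end
def collect_access_logs(): return ["r1", "r2"]
def build_policy_group(logs): return {"score": 0.5}
def check_legal(group): return True
def evaluate(group, logs): return group["score"]
def optimize(group, logs): return {"score": group["score"] + 0.2}
def deploy(group): pass

print(run_pipeline(current_metric=0.9))  # indicator above preset: keep policy
print(run_pipeline(current_metric=0.4))  # below preset: rebuild and optimize
```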
The foregoing content describes the methods of this application in detail. To better implement the methods provided by this application, the related apparatuses and devices provided by this application for implementing the above methods are introduced next.
As shown in FIG. 7, FIG. 7 is a schematic structural diagram of a data caching apparatus provided by this application. The apparatus 100 includes an acquiring unit 110 and a caching unit 120.
The acquiring unit 110 is configured to acquire multiple cache policies from multiple types of cache policy libraries, where the at least one cache policy included in each type of cache policy library is of the same type. The types of the cache policy libraries include at least one of a filter type, a prefetch type, a replacement type, and a victim cache type: a filter-type cache policy is used to filter data, a prefetch-type cache policy is used to prefetch data, a replacement-type cache policy is used to evict data from the cache, and a victim-cache-type cache policy is used to process the data evicted from the cache.
The caching unit 120 is configured to apply the first cache policy group including the multiple cache policies to the data generated by the first entity.
In a specific embodiment, the caching unit 120 is specifically configured to: optimize the first cache policy group according to the access records of the data generated by the first entity to obtain the second cache policy group; and apply the second cache policy group to the data generated by the first entity.
In a specific embodiment, the types of the cache policy libraries further include a proprietary type, and a proprietary-type cache policy is a cache policy set by the user.
In a specific embodiment, the caching unit 120 is further configured to: optimize the first cache policy group according to access records of the data generated by the second entity to obtain the third cache policy group; and apply the third cache policy group to the data generated by the second entity.
In a specific embodiment, the multiple cache policies in the first cache policy group are arranged in a preset order, and when the multiple cache policies include a proprietary-type cache policy, the position of the proprietary-type cache policy is set by the user.
In a specific embodiment, the data caching apparatus 100 further includes a determining unit 130, and the determining unit 130 is configured to determine the legality of the first cache policy group.
In a specific embodiment, each of the multiple cache policies corresponds to a policy attribute set, and the determining unit 130 is specifically configured to determine the legality of the first cache policy group according to the multiple policy attribute sets corresponding to the multiple cache policies. The policy attribute set corresponding to one cache policy includes at least one of a first attribute and a second attribute; the first attribute is used to determine whether a cache policy conflicting with the one cache policy exists in the first cache policy group, and the second attribute is used to determine whether the first cache policy group can include multiple instances of the one cache policy.
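The attribute-based legality check can be sketched as below. The encoding of the attribute sets as dictionaries with `conflicts` (first attribute) and `allow_multiple` (second attribute) keys is an assumption for illustration.

```python
# Sketch of the legality check performed by the determining unit 130,
# driven by per-policy attribute sets.

def is_legal(group, attributes):
    """group: list of policy names; attributes: per-policy attribute sets."""
    for name in group:
        attr = attributes[name]
        # first attribute: no conflicting policy may be in the same group
        if any(other in attr.get("conflicts", ()) for other in group):
            return False
        # second attribute: duplicates only if the policy allows them
        if group.count(name) > 1 and not attr.get("allow_multiple", False):
            return False
    return True

attributes = {
    "lru": {"conflicts": ["lfu"], "allow_multiple": False},
    "lfu": {"conflicts": ["lru"], "allow_multiple": False},
    "prefetch4": {"allow_multiple": True},
}
print(is_legal(["lru", "prefetch4", "prefetch4"], attributes))  # True
print(is_legal(["lru", "lfu"], attributes))                     # False: conflict
```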
In a specific embodiment, the caching unit 120 is specifically configured to: when the first cache policy group is legal, iteratively optimize each cache policy in the first cache policy group using a heuristic algorithm or a machine learning algorithm according to the access records of the data generated by the first entity, thereby obtaining the second cache policy group.
The data caching apparatus 100 of this embodiment of this application is described merely using the above division of functional modules as an example. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the data caching apparatus 100 may be divided into different functional modules to complete all or some of the functions described above. In addition, the data caching apparatus 100 provided in the above embodiment and the caching device in the above method embodiments belong to the same concept; for the specific implementation process, refer to the above method embodiments. For example, for the process in which the acquiring unit 110 acquires the multiple cache policies from the multiple types of cache policy libraries, refer to step S101 above; for the process in which the caching unit 120 optimizes the first cache policy group, refer to steps S1021-S1025 above. Details are not repeated here.
As shown in FIG. 8, FIG. 8 is a schematic structural diagram of a caching device provided by this application. The caching device 200 includes a processor 210, a communication interface 220, and a memory 230. The processor 210, the communication interface 220, and the memory 230 are coupled through a bus 240.
The processor 210 may be a central processing unit (CPU), a general-purpose processor, a DSP, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device (PLD), a CPLD, a transistor logic device, a hardware component, or any combination thereof. The processor 210 may implement or execute the various example methods described in connection with the above method embodiments. Specifically, the processor 210 reads the program code stored in the memory 230 and cooperates with the communication interface 220 to perform some or all of steps S101-S102, S1021-S1025, and steps ①-⑥.
The communication interface 220 may be a wired interface or a wireless interface for communicating with other modules or devices. The wired interface may be an Ethernet interface, a controller area network interface, a local interconnect network (LIN) interface, or a FlexRay interface; the wireless interface may be a cellular network interface, a wireless local area network interface, or the like. Specifically, the communication interface 220 may be connected to other devices. For example, the communication interface 220 may be connected to a storage system; after the processor 210 obtains the first cache policy group, the first cache policy group may be sent to the storage system through the communication interface 220 to be used for processing the data generated by the first entity.
The memory 230 may include a volatile memory, for example, a random access memory (RAM); the memory 230 may also include a non-volatile memory, for example, a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD); the memory 230 may also include a combination of the above types of memory. The memory 230 may store program code and program data. The program code consists of the code of some or all of the units in the data caching apparatus 100 shown in FIG. 7, for example, the code of the acquiring unit 110, the code of the caching unit 120, and the code of the determining unit 130. The program data is the data generated by the data caching apparatus 100 shown in FIG. 7 in the process of running the program, for example, the data generated by the first entity, the cache policies, and the like.
The bus 240 may be a controller area network (CAN) bus or another bus implementing internal interconnection. The bus 240 may be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 8, but this does not mean that there is only one bus or one type of bus.
The caching device 200 in this embodiment of this application is configured to perform the method performed by the caching device in the above method embodiments, and belongs to the same concept as the above method embodiments; for the specific implementation process, refer to the above method embodiments, which are not repeated here.
This application further provides a computer-readable storage medium storing computer instructions. When the computer instructions run on a computing device (for example, the data caching apparatus 100 shown in FIG. 7 or the caching device 200 shown in FIG. 8), the computing device is caused to perform the method performed by the caching device in the above method embodiments.
This application further provides a computer program product including a computer program. When the computer program is read and executed by a computing device (for example, the data caching apparatus 100 shown in FIG. 7 or the caching device 200 shown in FIG. 8), it is used to implement the method performed by the caching device in the above method embodiments.
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the implementation may be realized wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a storage disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, an SSD), or the like. In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may also be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical or other forms.
The units described above as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments of this application.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.

Claims (18)

  1. 一种数据缓存方法,其特征在于,所述方法包括:A data caching method, characterized in that the method comprises:
    从多个类型的缓存策略库中获取多个缓存策略,每个类型的缓存策略库包括的至少一个缓存策略的类型相同,其中,所述缓存策略库的类型包括过滤类型、预取类型、替换类型、牺牲缓存类型中的至少一个,所述过滤类型的缓存策略用于过滤数据,所述预取类型的缓存策略用于预取数据,所述替换类型的缓存策略用于从缓存中淘汰数据,所述牺牲缓存类型的缓存策略用于处理从所述缓存中淘汰的数据;Acquire multiple cache policies from multiple types of cache policy libraries, each type of cache policy library includes at least one cache policy of the same type, wherein the types of the cache policy library include filter type, prefetch type, replacement type At least one of the type and the sacrificial cache type, the filter-type cache policy is used to filter data, the prefetch-type cache policy is used to prefetch data, and the replacement-type cache policy is used to eliminate data from the cache , the cache strategy of the sacrificial cache type is used to process the data eliminated from the cache;
    将包括所述多个缓存策略的第一缓存策略组运用于第一实体产生的数据。A first set of caching policies including the plurality of caching policies is applied to the data generated by the first entity.
  2. 根据权利要求1所述的方法,其特征在于,所述将包括所述多个缓存策略的第一缓存策略组运用于第一实体产生的数据,包括:The method according to claim 1, wherein the applying the first cache policy group including the multiple cache policies to the data generated by the first entity comprises:
    根据所述第一实体产生的数据的访问记录对所述第一缓存策略组进行优化,得到第二缓存策略组;Optimizing the first cache policy group according to the access record of the data generated by the first entity to obtain a second cache policy group;
    将所述第二缓存策略组运用于所述第一实体产生的数据。Applying the second set of caching policies to data generated by the first entity.
  3. 根据权利要求1或2所述的方法,其特征在于,所述缓存策略库的类型还包括专有类型,所述专有类型的缓存策略为用户设置的缓存策略。The method according to claim 1 or 2, wherein the type of the cache policy library further includes a proprietary type, and the cache policy of the proprietary type is a cache policy set by a user.
  4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
    optimizing the first cache policy group according to an access record of data generated by a second entity to obtain a third cache policy group;
    applying the third cache policy group to the data generated by the second entity.
  5. The method according to claim 3, characterized in that the plurality of cache policies in the first cache policy group are arranged in a preset order, and when the plurality of cache policies include a proprietary-type cache policy, the position of the proprietary-type cache policy is set by the user.
  6. The method according to any one of claims 1 to 5, characterized in that before applying the first cache policy group comprising the plurality of cache policies to the data generated by the first entity, the method further comprises:
    determining the validity of the first cache policy group.
  7. The method according to claim 6, characterized in that each of the plurality of cache policies corresponds to a policy attribute set, and determining the validity of the first cache policy group comprises:
    determining the validity of the first cache policy group according to the plurality of policy attribute sets corresponding to the plurality of cache policies;
    wherein the policy attribute set corresponding to a given cache policy includes at least one of a first attribute and a second attribute, the first attribute being used to determine whether the first cache policy group contains a cache policy that conflicts with the given cache policy, and the second attribute being used to determine whether the first cache policy group may include multiple instances of the given cache policy.
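One plausible reading of the two attributes in claim 7 is a conflict list (first attribute) plus a duplicates flag (second attribute). The sketch below is an assumed encoding for illustration only; the claim does not prescribe these names or data structures.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical attribute set: which policies this one conflicts with
# (first attribute) and whether multiple instances are allowed (second).
@dataclass(frozen=True)
class PolicyAttrs:
    name: str
    conflicts_with: FrozenSet[str] = frozenset()
    allow_duplicates: bool = False

def group_is_valid(group: List[PolicyAttrs]) -> bool:
    """Check a policy group against each member's attribute set."""
    names = [p.name for p in group]
    for p in group:
        # Second attribute: reject repeated instances unless allowed.
        if not p.allow_duplicates and names.count(p.name) > 1:
            return False
        # First attribute: reject pairwise conflicts within the group.
        if any(other in p.conflicts_with for other in names if other != p.name):
            return False
    return True

lru = PolicyAttrs("lru", conflicts_with=frozenset({"lfu"}))
lfu = PolicyAttrs("lfu")
assert group_is_valid([lru]) is True
assert group_is_valid([lru, lfu]) is False  # conflict via first attribute
assert group_is_valid([lfu, lfu]) is False  # duplicate via second attribute
```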
  8. The method according to claim 6 or 7, characterized in that optimizing the first cache policy group according to the access record of the data generated by the first entity to obtain the second cache policy group comprises:
    when the first cache policy group is valid, iteratively optimizing each cache policy in the first cache policy group by using a heuristic algorithm or a machine learning algorithm according to the access record of the data generated by the first entity, thereby obtaining the second cache policy group.
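A very simple instance of the heuristic optimization in claim 8 is to replay the recorded access trace against candidate parameterizations of a policy and keep the best performer. The example below tunes only the capacity of an LRU replacement policy; a real implementation per the claim would iterate over every policy in the group and could use a machine learning algorithm instead. All function names here are illustrative assumptions.

```python
from collections import OrderedDict
from typing import List

def lru_hit_rate(trace: List[str], capacity: int) -> float:
    """Replay an access record against an LRU cache of the given capacity."""
    cache: "OrderedDict[str, None]" = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)

def tune_capacity(trace: List[str], candidates: List[int]) -> int:
    """Greedy heuristic: keep the capacity with the best replayed hit rate."""
    return max(candidates, key=lambda c: lru_hit_rate(trace, c))

trace = ["a", "b", "a", "c", "a", "b", "d", "a"]
best = tune_capacity(trace, [1, 2, 4])
assert best == 4  # capacity 4 keeps the working set resident
```

In practice the candidate set would cover policy choice and parameters jointly, and the replay would be repeated each optimization round as new access records arrive.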
  9. A data caching apparatus, characterized in that the apparatus comprises:
    an acquiring unit, configured to acquire a plurality of cache policies from a plurality of types of cache policy libraries, wherein the at least one cache policy included in each type of cache policy library is of the same type, the types of the cache policy libraries include at least one of a filter type, a prefetch type, a replacement type, and a victim cache type, a filter-type cache policy is used to filter data, a prefetch-type cache policy is used to prefetch data, a replacement-type cache policy is used to evict data from a cache, and a victim-cache-type cache policy is used to process the data evicted from the cache;
    a caching unit, configured to apply a first cache policy group comprising the plurality of cache policies to data generated by a first entity.
  10. The apparatus according to claim 9, characterized in that the caching unit is specifically configured to:
    optimize the first cache policy group according to an access record of the data generated by the first entity to obtain a second cache policy group;
    apply the second cache policy group to the data generated by the first entity.
  11. The apparatus according to claim 9 or 10, characterized in that the types of the cache policy libraries further include a proprietary type, wherein a proprietary-type cache policy is a cache policy set by a user.
  12. The apparatus according to any one of claims 9 to 11, characterized in that the caching unit is further configured to:
    optimize the first cache policy group according to an access record of data generated by a second entity to obtain a third cache policy group;
    apply the third cache policy group to the data generated by the second entity.
  13. The apparatus according to claim 11, characterized in that the plurality of cache policies in the first cache policy group are arranged in a preset order, and when the plurality of cache policies include a proprietary-type cache policy, the position of the proprietary-type cache policy is set by the user.
  14. The apparatus according to any one of claims 9 to 13, characterized in that the apparatus further comprises a determining unit, the determining unit being configured to determine the validity of the first cache policy group.
  15. The apparatus according to claim 14, characterized in that each of the plurality of cache policies corresponds to a policy attribute set, and the determining unit is specifically configured to:
    determine the validity of the first cache policy group according to the plurality of policy attribute sets corresponding to the plurality of cache policies;
    wherein the policy attribute set corresponding to a given cache policy includes at least one of a first attribute and a second attribute, the first attribute being used to determine whether the first cache policy group contains a cache policy that conflicts with the given cache policy, and the second attribute being used to determine whether the first cache policy group may include multiple instances of the given cache policy.
  16. The apparatus according to claim 14 or 15, characterized in that the caching unit is specifically configured to:
    when the first cache policy group is valid, iteratively optimize each cache policy in the first cache policy group by using a heuristic algorithm or a machine learning algorithm according to the access record of the data generated by the first entity, thereby obtaining the second cache policy group.
  17. A caching device, characterized in that the caching device comprises a processor and a memory, the processor executing code in the memory to implement the method according to any one of claims 1 to 8.
  18. A computer-readable storage medium, characterized in that it stores computer instructions, the computer instructions being used to implement the method according to any one of claims 1 to 8.
PCT/CN2022/071079 2021-01-15 2022-01-10 Data caching method and apparatus, and device and computer-readable storage medium WO2022152086A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110057914.1A CN114764416A (en) 2021-01-15 2021-01-15 Data caching method, apparatus, device, and computer-readable storage medium
CN202110057914.1 2021-01-15

Publications (1)

Publication Number Publication Date
WO2022152086A1 true WO2022152086A1 (en) 2022-07-21

Family

ID=82365274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071079 WO2022152086A1 (en) 2021-01-15 2022-01-10 Data caching method and apparatus, and device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN114764416A (en)
WO (1) WO2022152086A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240028512A1 (en) * 2022-07-25 2024-01-25 Samsung Electronics Co., Ltd. Adaptive cache indexing for a storage device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106708636A (en) * 2016-12-29 2017-05-24 北京奇虎科技有限公司 Cluster-based data caching method and apparatus
CN106921713A (en) * 2015-12-25 2017-07-04 中国移动通信集团上海有限公司 A kind of resource caching method and device
US20190124174A1 (en) * 2016-01-22 2019-04-25 Alibaba Group Holding Limited Resource cache management method and system and apparatus
CN110929195A (en) * 2019-11-21 2020-03-27 望海康信(北京)科技股份公司 Data caching strategy determining method and device and electronic equipment


Also Published As

Publication number Publication date
CN114764416A (en) 2022-07-19

Similar Documents

Publication Publication Date Title
US9304928B2 (en) Systems and methods for adaptive prefetching
US10346067B2 (en) Multi-tier file storage management using file access and cache profile information
US9817765B2 (en) Dynamic hierarchical memory cache awareness within a storage system
JP5087467B2 (en) Method and apparatus for managing data compression and integrity in a computer storage system
US20080147974A1 (en) Multi-level caching system
US10409728B2 (en) File access predication using counter based eviction policies at the file and page level
US10133673B2 (en) Cache optimization based on predictive routing
WO2014183514A1 (en) Method, device, and computer storage medium for hierarchical storage
JP6573674B2 (en) Storage constrained synchronization of shared content items
WO2022152086A1 (en) Data caching method and apparatus, and device and computer-readable storage medium
CN116560562A (en) Method and device for reading and writing data
JP5481669B2 (en) Cache control method, node device, manager device, and computer system
US11216316B1 (en) Facilitating object deletion based on delete lock contention in distributed file systems
US6915386B2 (en) Processing service level agreement (SLA) terms in a caching component of a storage system
WO2023165543A1 (en) Shared cache management method and apparatus, and storage medium
US20230224209A1 (en) Adaptive time window-based log message deduplication
CN116028389A (en) Hot spot data caching method, device, equipment and medium
US20220027322A1 (en) Facilitating exclusive local locks on a distributed file system
CN117235088B (en) Cache updating method, device, equipment, medium and platform of storage system
WO2022148306A1 (en) Data elimination method and apparatus, cache node, and cache system
CN116069529B (en) Dynamic caching method and device, electronic equipment and computer readable medium
CN109617943B (en) Data caching method and device, storage medium and computer equipment
US20230169005A1 (en) Cache prefetching method and system based on k-truss graph for storage system, and medium
CN117908788A (en) Metadata caching method and device, electronic equipment and storage medium
KR20240022203A (en) Communication system for private information retrieval using user preference based cache and its operation method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22738967

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22738967

Country of ref document: EP

Kind code of ref document: A1