WO2023185770A1 - Cloud data caching method and apparatus, device and storage medium - Google Patents

Cloud data caching method and apparatus, device and storage medium

Info

Publication number
WO2023185770A1
WO2023185770A1 (PCT/CN2023/084183; CN2023084183W)
Authority
WO
WIPO (PCT)
Prior art keywords
file
user
cache
frequency
access
Prior art date
Application number
PCT/CN2023/084183
Other languages
French (fr)
Chinese (zh)
Inventor
余洋
孙相征
何万青
Original Assignee
阿里巴巴(中国)有限公司
Priority date
Filing date
Publication date
Application filed by 阿里巴巴(中国)有限公司 filed Critical 阿里巴巴(中国)有限公司
Publication of WO2023185770A1 publication Critical patent/WO2023185770A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/0643 Management of files
    • G06F 3/0662 Virtualisation aspects
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 File systems; File servers
    • G06F 16/13 File access structures, e.g. distributed indices
    • G06F 16/137 Hash-based
    • G06F 16/17 Details of further file system functions
    • G06F 16/172 Caching, prefetching or hoarding of files
    • G06F 16/18 File system types
    • G06F 16/182 Distributed file systems
    • G06F 16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of computer technology, and specifically to a cloud data caching method, system, device, and storage medium.
  • HPC: High-Performance Computing, i.e., a high-performance computer cluster.
  • This application proposes a cloud data caching method, system, device, and storage medium that builds a multi-level data caching architecture on the cloud host, comprising a caching layer, a caching management layer, and a caching client, and records the storage location of each user file in a file distribution hash table, so that complex IO scenarios on the cloud can be handled flexibly.
  • In addition, the idle resources of the cloud host are used to build the cache layer, so that cloud resources are fully utilized and the IO pressure of upper-layer processing is effectively alleviated.
  • the first embodiment of the present application proposes a data caching method on the cloud.
  • the method includes:
  • When the user file to be cached is obtained, the hierarchical characteristics corresponding to the user file are determined, and based on these characteristics the user file is cached to the storage area of the corresponding level.
  • The hierarchical characteristics include at least one of access frequency, modification frequency, and data volume; the different levels of storage areas include the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host.
  • the third embodiment of the present application provides a cloud data cache system, including a data source layer, a cache layer on a cloud host, a cache management and control layer, a cache client on the cloud host, and an HPC processing end, where:
  • the data source layer includes cloud low-frequency file storage, cloud object storage and IDC file storage;
  • the cache layer includes a file system, distributed memory and virtual disk mounted on the cloud host.
  • the cache layer is used to cache frequently accessed user files;
  • the cache management and control layer includes a cache configuration center, a file access characteristic statistics table and a file distribution hash table.
  • the cache management and control layer is used to manage cached user files;
  • the cache client is used to provide a data operation interface for the HPC processing layer and process IO requests.
  • An embodiment of the fourth aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor runs the computer program to implement the method described in the first or second aspect above.
  • An embodiment of the fifth aspect of the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the method described in the first or second aspect.
  • A multi-level data cache architecture including a cache layer, a cache management layer, and a cache client can be constructed on the cloud host, and complex IO scenarios on the cloud can be handled flexibly based on the file distribution hash table that records the storage location of each user file.
  • In addition, in the embodiment of this application, the idle resources of the cloud host are used to build the cache layer, so that cloud resources can be fully utilized and the IO pressure of upper-layer processing can be effectively alleviated.
  • Figure 1 shows an operation flow chart of a cloud data caching method provided by an embodiment of the present application
  • Figure 2 shows an architectural diagram of a cloud data caching system provided by an embodiment of the present application
  • Figure 3 shows a schematic structural diagram of a cloud data caching device provided by an embodiment of the present application
  • Figure 4 shows a schematic structural diagram of an electronic device provided by an embodiment of the present application
  • Figure 5 shows a schematic diagram of a storage medium provided by an embodiment of the present application.
  • The cloud data caching method provided by the embodiment of this application is described below. Referring to Figure 1, the method includes the following steps:
  • Step 101: Cache user files through the cloud host's distributed memory, virtual disk, and the file system mounted on the cloud host.
  • HPC workloads include, for example, film and television rendering, bioinformatics, and underground exploration.
  • HPC processing usually requires massive computing resources.
  • the computing process is accompanied by a large number of file reading and writing operations, so the performance requirements for cloud file storage are extremely high.
  • HPC processing cloud migration often encounters the following storage problems. First, storage performance does not match processing requirements: for users migrating to the cloud for the first time, it is difficult to select, in one attempt, the file storage specification that best matches the IO characteristics of their workloads, and subsequently changing the storage specification involves changes to underlying hardware resources and data migration, which is very costly. In addition, due to the diversity of offline HPC workloads, users may find that low-end cloud storage cannot meet processing needs, while high-end storage specifications overshoot the required performance and are expensive. Second, a single storage tier struggles to cope with complex IO characteristics: HPC processing usually involves a huge amount of data, the access characteristics (access frequency, block size) of different data vary greatly, and storing all data on a single storage tier makes costs difficult to control.
  • To address this, this application proposes caching user files through the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host.
  • Specifically, the method is applied to a cloud data caching system comprising a data source layer, a cache layer on the cloud host, a cache management and control layer, a cache client on the cloud host, and an HPC processing end.
  • the data source layer in the cloud data caching system in this application includes cloud low-frequency file storage, cloud object storage and IDC file storage.
  • the cache layer includes the file system, distributed memory and virtual disk mounted on the cloud host.
  • The cache layer is used to cache frequently accessed user files. The cache management and control layer includes the cache configuration center, the file access characteristic statistics table, and the file distribution hash table, and is used to manage cached user files. The cache client is used to provide a data operation interface for the HPC processing layer and to process IO requests.
  • For cloud low-frequency file storage, the transmission of user file data between cloud hosts can be supported through file system mounting. It supports the standard file IO interface and is used only to cache user files whose data access frequency is low.
  • For cloud object storage, the transmission and access of user file data between cloud hosts can be supported through specific API interfaces, which offers certain advantages in data distribution.
  • For IDC file storage, the file storage located in the user's local data center supports the user's local processing on the one hand, and on the other hand is connected to the cloud network through a dedicated line or VPN.
  • The file system mounted on the cloud host in the cache layer serves as the cloud's high-frequency file cache area; it can support a persistent global cache layer for data sharing between cloud hosts, and its performance is stronger than that of low-frequency file storage.
  • It can be used to cache files in the data source that are accessed frequently, have large data volumes, and change frequently, such as core user files of HPC tasks.
  • the free disk space of the cloud host is used to cache files in data sources with high access frequency, small file data volume, and infrequent data changes, such as: processing software, program plug-ins, pre- and post-processing scripts, etc.
  • the disk cache capacity can be expanded by adding a data disk to the cloud host.
  • Since the disk cache is used for local caching, the user files cached there can be files exclusive to a particular cloud host.
  • The distributed memory can be used as a non-persistent global cache layer; that is, like the cloud high-frequency file cache area, it supports data sharing between cloud hosts, but without persistence.
  • The free memory of multiple cloud hosts is built into a memory file system through tmpfs, ramdisk, or similar methods to form a distributed memory cache layer, which is uniformly managed by the cache management and control layer to cache the file blocks in the data sources that are being accessed frequently. The more cloud hosts there are during peak user processing periods, the larger the memory cache space.
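  • The pooling of idle host memory into one cache layer can be sketched as follows. This is an illustrative model only: the class and host names are assumptions, and a single-process registry stands in for the real cache management and control layer.

```python
# Minimal sketch: each cloud host contributes its free memory to a shared
# pool, and the management layer sees the aggregate capacity grow or shrink
# as hosts join or leave (e.g. during peak processing periods).
class MemoryCachePool:
    def __init__(self):
        self._hosts = {}            # host name -> contributed bytes

    def add_host(self, host, free_bytes):
        self._hosts[host] = free_bytes

    def remove_host(self, host):
        self._hosts.pop(host, None)

    @property
    def capacity(self):
        # total distributed memory cache space currently available
        return sum(self._hosts.values())

pool = MemoryCachePool()
pool.add_host("host-1", 8 << 30)    # 8 GiB of idle memory
pool.add_host("host-2", 16 << 30)   # 16 GiB of idle memory
# pool.capacity is now 24 GiB; more hosts -> larger cache space
```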
  • the cache configuration center of the cache management and control layer in the cloud data cache system of this application can be used to maintain the cache configuration information of cached data in the system and provide cache control interfaces to users.
  • Users can enable and disable each cache layer by interacting with the cache configuration center, which cooperates with the cache client so that user files can be retrieved at any time.
  • A cold-data cleaning strategy can also be implemented, such as regularly cleaning data with low access frequency based on the memory/disk occupancy ratio and file popularity.
  • caching strategies can also be customized, such as data prefetching based on specific file names.
  • The file access characteristic statistics table can be used to maintain the file access characteristics of the upper-layer HPC workload, including but not limited to file access popularity, access mode (sequential or random access), read file block size, etc. It supports statistics over periodic dimensions (for example, by month, day, hour, or minute), and also provides input for cold-data cleanup and data flow between tiers.
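  • The statistics table above can be sketched as per-file counters bucketed by time period (here, by hour). All names and the bucketing granularity are illustrative assumptions, not taken from the patent text.

```python
import time
from collections import defaultdict

class AccessStats:
    """Per-file access statistics, bucketed into hourly periods."""
    def __init__(self):
        # (file_id, hour_bucket) -> {"count": accesses, "bytes": total read}
        self._stats = defaultdict(lambda: {"count": 0, "bytes": 0})

    def record(self, file_id, block_size, ts=None):
        hour = int((ts if ts is not None else time.time()) // 3600)
        entry = self._stats[(file_id, hour)]
        entry["count"] += 1
        entry["bytes"] += block_size

    def popularity(self, file_id, hour):
        """Access count of a file within a given hourly bucket."""
        return self._stats[(file_id, hour)]["count"]

stats = AccessStats()
stats.record("scene.obj", 4096, ts=7200)   # both fall into hour bucket 2
stats.record("scene.obj", 8192, ts=7300)
# popularity("scene.obj", 2) is 2; such counts feed cold-data cleanup
```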
  • The file distribution hash table is used to maintain the storage location of files/file blocks in the cache layer, supporting the HPC workload in efficiently obtaining target files/file blocks and ensuring the efficiency of upper-layer processing.
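  • A minimal sketch of such a hash table follows: it maps a file/file-block key to the cache tier and host currently holding the data, so an IO request can be routed directly. The class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FileLocation:
    tier: str      # "memory", "disk", or "mounted_fs"
    host: str      # cloud host holding the cached copy

class FileDistributionTable:
    def __init__(self):
        self._table = {}   # (file_id, block_no) -> FileLocation

    def lookup(self, file_id, block_no=0):
        """Return the cached location, or None on a cache miss."""
        return self._table.get((file_id, block_no))

    def update(self, file_id, block_no, tier, host):
        """Record (or move) a cached file block."""
        self._table[(file_id, block_no)] = FileLocation(tier, host)

table = FileDistributionTable()
table.update("model.dat", 0, "memory", "host-1")
# lookup("model.dat") now reports the memory tier; unknown files miss
```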
  • The three components of the cache management and control layer store the core cache data; the storage method is not limited to a database, Redis, or files, and the consistency of data access/update under the distributed architecture is ensured through mutex locks.
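  • The mutual-exclusion point above can be illustrated as follows. A single-process `threading.Lock` stands in for whatever distributed lock the real system would use; that substitution, and all names here, are assumptions.

```python
import threading

class LockedTable:
    """Shared cache metadata whose updates are serialized by a mutex."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def update(self, key, value):
        with self._lock:           # one writer at a time
            self._data[key] = value

    def get(self, key):
        with self._lock:           # readers see a consistent snapshot
            return self._data.get(key)

t = LockedTable()
# 8 concurrent writers, 4 keys: every update is applied without races
threads = [threading.Thread(target=t.update, args=(i % 4, i)) for i in range(8)]
for th in threads:
    th.start()
for th in threads:
    th.join()
# each key k ends up holding one of the two values written to it (k or k+4)
```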
  • The cache client of this application provides a file system entry and a standard POSIX file operation interface upward to the HPC processing layer, and is responsible downward for real-time processing of IO requests.
  • If the target file/file block has not yet been cached, it can be read from the data source and passed to the upper layer for processing; at the same time, it is cached to the high-frequency file storage area on the cloud.
  • If the target file/file block has been cached in the cloud high-frequency file storage or the disk cache but has not been cached in the distributed memory cache (or its cache entry is invalid), the file is read from that cache and passed to the upper layer for processing; the read target file/file block is then cached in the distributed memory cache, and its mapping relationship is synchronously updated into the file distribution hash table.
  • If the target file/file block has been cached in the distributed memory cache, it is read directly from the distributed memory cache and passed to the upper layer for processing.
  • The cache client is responsible for real-time processing of IO requests. It periodically updates the collected hierarchical feature statistics to the file access characteristic table and obtains the relevant configuration information from the cache configuration center. Based on time-series changes in file access heat and read file block size, combined with the configuration of the cache configuration center, it moves hot data between the cloud high-frequency file cache and the disk cache, cleans up cold data in each cache layer, and synchronously updates the file distribution hash table.
  • Step 102: When the user file to be cached is obtained, determine the hierarchical characteristics corresponding to the user file, and based on these characteristics cache the user file to the storage area of the corresponding level.
  • The hierarchical characteristics include access frequency, modification frequency, and data volume.
  • The storage areas of different levels include the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host.
  • each user file to be cached can be cached in a corresponding hierarchical manner according to one of access frequency, modification frequency, and data amount.
  • the distributed memory, the virtual disk and the file system mounted on the cloud host can jointly form the multi-level cache area.
  • The distributed memory cache is used for user files with high access frequency (i.e., hot data), file-block granularity, and small capacity; as a shared cache, it also provides low-latency data access for upper-layer processing.
  • The disk cache is used for files with high access frequency (i.e., hot data), file-block granularity, small capacity, and infrequent changes; it avoids the single-point IO bottleneck that may occur in a shared cache and shares the pressure on the distributed memory cache.
  • The file system mounted on the cloud host is used for user files with relatively high access frequency (i.e., warm data), file-block granularity, large capacity, and frequent changes. As the backend of the distributed memory cache and the disk cache, it carries most of the hot/warm data required for upper-layer processing, thus reducing access to the data sources.
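  • The tier assignment described in the bullets above can be sketched as one selection rule: hot, small, stable data goes to the disk cache; hot, small, volatile data to distributed memory; everything else to the mounted file system. The threshold values here are hypothetical placeholders for the patent's "preset" frequencies and space threshold.

```python
def select_tier(access_freq, size_bytes, change_freq,
                hot_freq=100, small_size=4 * 1024 * 1024, stable_freq=10):
    """Pick a cache tier from the hierarchical characteristics of a file."""
    if access_freq >= hot_freq and size_bytes < small_size:
        # hot data at file-block granularity with small capacity
        if change_freq < stable_freq:
            return "disk"       # stable: local disk cache, no shared bottleneck
        return "memory"         # volatile: distributed memory cache
    return "mounted_fs"         # warm / large / frequently changing files

# e.g. a hot, rarely changing 1 MiB plug-in lands in the disk cache
```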
  • the cloud data caching system in this application can support hot swapping, support docking with data sources on and off the cloud, and support horizontal and vertical expansion.
  • When an IO request is received, the corresponding cache location can be determined based on the preset file distribution hash table (which records the mapping relationship between each user file and its cache location), and the cached data can be extracted from that location and returned to the sender. Specifically:
  • Step 1: Match the requested user file in the IO request against the file distribution hash table to determine the storage location of the requested user file on the cloud host.
  • Step 2: If the requested user file is not stored on the cloud host, obtain it from the data source associated with the cloud host; otherwise, skip to step 5.
  • Step 3: Cache the requested user file to the cloud high-frequency file storage area in the cloud host (i.e., one of the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host), and update the mapping relationship between the requested user file and the high-frequency file storage area into the file distribution hash table.
  • Step 4: Send the requested user file to the sender of the IO request.
  • Step 5: If the requested user file is stored on the cloud host, determine whether it is cached in the distributed memory.
  • Step 6: If it is cached in the distributed memory, send the requested user file to the sender of the IO request.
  • Step 7: If it is not cached in the distributed memory, cache the requested user file into the distributed memory, update the mapping relationship between the requested user file and the distributed memory into the file distribution hash table, and send the requested user file to the sender of the IO request.
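  • The seven steps above can be sketched as a single request handler. For brevity the hash table is a plain dict mapping a file name to its tier, and `data_source` stands in for the off-host data source; all names are illustrative assumptions.

```python
def handle_io_request(name, hash_table, data_source):
    tier = hash_table.get(name)            # step 1: match against the table
    if tier is None:                       # step 2: not stored on the host
        data = data_source[name]           #   fetch from the data source
        hash_table[name] = "memory"        # step 3: cache and update table
        return data                        # step 4: reply to the sender
    if tier == "memory":                   # steps 5-6: distributed-memory hit
        return f"<{name} from memory>"
    hash_table[name] = "memory"            # step 7: promote to memory,
    return f"<{name} from {tier}>"         #   update table, reply

table = {"b.dat": "disk"}
source = {"a.dat": "<a.dat from source>"}
handle_io_request("a.dat", table, source)  # miss: fetched, then cached
handle_io_request("b.dat", table, source)  # disk hit: promoted to memory
# table now maps both files to "memory"
```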
  • For data management, one method is to periodically determine the access popularity value of each user file based on the cache configuration information set by the user and the file access characteristics collected online, and to clean up user files whose access popularity value is lower than a preset value.
  • The change in space size of each user file can also be determined periodically based on the cache configuration information set by the user and the file access characteristics collected online; based on these changes, user files are moved from the disk cache to the distributed cache, or from the distributed cache to the disk cache.
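  • The periodic maintenance described in the two bullets above can be sketched in one pass: evict files whose access heat falls below a preset value, and migrate the rest between the disk cache and the distributed (memory) cache as their space usage changes. Threshold values and structure names are illustrative assumptions.

```python
def maintain(files, heat_threshold=5, small_size=4 * 1024 * 1024):
    """files: {name: {"heat": int, "size": int, "tier": "disk"|"memory"}}"""
    for name in list(files):
        info = files[name]
        if info["heat"] < heat_threshold:
            del files[name]              # cold-data cleanup
        elif info["size"] < small_size:
            info["tier"] = "memory"      # small enough for the memory cache
        else:
            info["tier"] = "disk"        # grew too large: back to disk

cache = {
    "cold.log": {"heat": 1,  "size": 1024,    "tier": "disk"},
    "hot.bin":  {"heat": 50, "size": 2048,    "tier": "disk"},
    "big.dat":  {"heat": 40, "size": 1 << 30, "tier": "memory"},
}
maintain(cache)
# "cold.log" is evicted; "hot.bin" moves to memory; "big.dat" moves to disk
```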
  • In summary, user files are cached through the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host; when the user file to be cached is obtained, the hierarchical characteristics corresponding to the user file are determined, and based on these characteristics the user file is cached to the storage area of the corresponding level.
  • the hierarchical characteristics include at least one of the access frequency, the modification frequency and the amount of data.
  • The different levels of storage areas include the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host.
  • IO requests for user files can also be responded to based on the file distribution hash table.
  • The file distribution hash table includes the mapping relationship between user files and cache locations; data management of cached user files is performed based on the cache configuration information set by the user and the file access characteristics of user files collected online.
  • The multi-level data cache architecture constructed on a cloud host in the embodiment of the present application, including a cache layer, a cache management layer, and a cache client, can flexibly cope with complex IO scenarios on the cloud based on the file distribution hash table that records the storage location of each user file.
  • the idle resources of the cloud host are used to build the cache layer, so that cloud resources can be fully utilized and the upper layer processing IO pressure can be effectively alleviated.
  • user files can be cached to storage areas of corresponding levels based on hierarchical characteristics, including:
  • Low-frequency access user files are user files whose access frequency is lower than the first preset frequency.
  • If a user file is determined to be a high-frequency access user file, it is cached in a storage area of the corresponding level; high-frequency access user files are user files whose access frequency is not lower than the first preset frequency.
  • This application divides each user file to be cached into a high-frequency access user file or a low-frequency access user file, stores high-frequency access user files in the cloud high-frequency file storage area of the cloud host (that is, one of the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host), and stores low-frequency access user files in the cloud low-frequency file storage area of the cloud host.
  • High-frequency access user files are user files on which users perform relatively frequent retrieval operations (that is, user files whose access frequency is not lower than the first preset frequency).
  • Low-frequency access user files are user files on which users perform relatively infrequent retrieval operations (that is, user files whose access frequency is lower than the first preset frequency).
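  • The high/low-frequency split above reduces to one comparison against the first preset frequency; its value here (100 accesses per period) is a hypothetical placeholder.

```python
def classify(access_freq, first_preset_freq=100):
    """Route a file to high- or low-frequency cloud storage by access rate."""
    if access_freq >= first_preset_freq:   # "not lower than" -> high-frequency
        return "high-frequency storage area"
    return "low-frequency storage area"

# a frequently retrieved file goes to the high-frequency area,
# a rarely retrieved one to the low-frequency area
```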
  • When the user file is a high-frequency access user file, caching the high-frequency access user file in a storage area of the corresponding level includes:
  • High-frequency access user files whose data volume is lower than the preset space threshold are cached into a virtual disk or a file system mounted on the cloud host.
  • The distributed memory cache corresponds to user files with high access frequency, file-block granularity, and small capacity (that is, high-frequency access user files whose data volume is lower than the preset space threshold); as a shared cache, it also provides low-latency data access for upper-layer processing.
  • The disk cache corresponds to files with high access frequency, file-block granularity, small capacity, and infrequent changes (that is, high-frequency access user files whose data volume is lower than the preset space threshold and whose change frequency is lower than the second preset frequency); it avoids the single-point IO bottleneck that may occur in a shared cache and shares the pressure on the distributed memory cache.
  • The file system mounted on the cloud host corresponds to user files with relatively high access frequency, file-block granularity, large capacity, and frequent changes (that is, high-frequency access user files whose data volume is not lower than the preset space threshold and whose change frequency is not lower than the second preset frequency). As the backend of the distributed memory cache and the disk cache, it carries most of the hot/warm data required for upper-layer processing, thereby reducing access to the data sources.
  • the access popularity value of each user file is periodically determined
  • the space size change of each user file is periodically determined
  • the storage area includes one of disk cache and distributed cache.
  • This application can also periodically update the collected hierarchical feature statistics to the file access characteristic table and obtain the relevant configuration information from the cache configuration center. According to changes in the read file block size, combined with the configuration of the cache configuration center, data flow between the storage areas is realized (i.e., based on the change in space size of each user file, user files are moved from the disk cache to the distributed cache, or from the distributed cache to the disk cache), and the file distribution hash table is updated synchronously.
  • the file distribution hash table includes the mapping relationship between the user file and the cache location
  • The embodiment of this application also includes: if the requested user file is cached in the distributed memory, sending it to the sender of the IO request; if it is not, caching it into the distributed memory, updating the mapping relationship between the requested user file and the distributed memory into the file distribution hash table, and then sending the requested user file to the sender of the IO request.
  • A multi-level data cache architecture including a cache layer, a cache management layer, and a cache client can be constructed on the cloud host, and complex IO scenarios on the cloud can be handled flexibly based on the file distribution hash table that records the storage location of each user file.
  • Since the idle resources of the cloud host are used to build the cache layer, cloud resources can be fully utilized and the IO pressure of upper-layer processing can be effectively alleviated.
  • Embodiments of the present application also provide a cloud data caching system, which includes a data source layer, a cache layer on a cloud host, a cache management and control layer, a cache client on the cloud host, and an HPC processing end, where:
  • the data source layer includes low-frequency file storage on the cloud, object storage on the cloud, and IDC file storage;
  • the cache layer includes the file system mounted on the cloud host, distributed memory, and a virtual disk.
  • the cache layer is used to cache frequently accessed user files;
  • the cache management and control layer includes the cache configuration center, file access characteristic statistics table and file distribution hash table.
  • the cache management and control layer is used to manage cached user files;
  • the cache client is used to provide a data operation interface for the HPC processing layer and process IO requests.
  • the cloud data caching system proposed in this application does not distinguish data sources, which can be user local file storage, cloud low-frequency file storage, or cloud object storage.
  • the transmission of user file data between various cloud hosts can be supported based on file system mounting.
  • for cloud object storage, the transmission and access of user file data between cloud hosts can be supported through specific API interfaces, which gives it certain advantages in data distribution.
  • in one approach, it is likewise used only to cache user files whose data access frequency is low.
  • for IDC file storage, the file storage located in the user's local computer room supports the user's local processing on the one hand, and on the other hand is connected to the cloud network through a dedicated line/VPN.
  • for the file system mounted on the cloud host in the cache layer, it can be used as a persistent global cache supporting the transmission of user file data between cloud hosts, and its performance is stronger than that of low-frequency file storage.
  • it can be used to cache files in the data source that are accessed frequently, have large data volumes, and change frequently, such as core user files of HPC tasks.
  • for the disk cache, the free disk space of the cloud host is used to cache files in the data source with high access frequency, small data volume, and infrequent data changes, such as processing software, program plug-ins, and pre- and post-processing scripts.
  • the disk cache capacity can be expanded by adding a data disk to the cloud host.
  • for distributed memory, the free memory of multiple cloud hosts is built into a memory file system through tmpfs/ramdisk or similar methods to form a distributed memory cache layer, which is uniformly managed by the cache management and control layer and used to cache the frequently accessed file blocks in the data source. The more cloud hosts there are during peak user processing periods, the larger the memory cache space.
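The scaling behavior described above (more hosts at peak, more memory cache space) can be illustrated with a small capacity calculation. The function name and the reserve fraction held back for each host's own processing are assumptions, not values from the patent.

```python
def distributed_memory_capacity(hosts, reserve_fraction=0.2):
    """Total distributed-memory cache capacity in bytes.

    `hosts` maps a host name to its free memory in bytes; a fraction of
    each host's free memory is held back for the host's own processing.
    The 20% reserve is an illustrative assumption.
    """
    return sum(int(free * (1 - reserve_fraction)) for free in hosts.values())

# Capacity grows as more cloud hosts join during peak processing periods.
peak_hosts = {"host-1": 8 << 30, "host-2": 16 << 30, "host-3": 16 << 30}
print(distributed_memory_capacity(peak_hosts) / (1 << 30), "GiB usable")
```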
  • the cache configuration center of the cache management and control layer in the cloud data caching system of this application can be used to maintain the cache configuration information of cached data in the system and to provide cache control interfaces to users.
  • users can enable and disable each cache layer by interacting with the cache configuration center, cooperating with the cache client so that user files can be retrieved at any time.
  • the cache cold data cleaning strategy can also be implemented, such as regularly cleaning data with low data access frequency based on the memory/disk occupancy ratio and file popularity.
  • caching strategies can also be customized, such as data prefetching based on specific file names.
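A minimal sketch of the file-name-based data prefetching mentioned above, assuming the cache configuration center stores glob-style patterns (both the pattern form and the function name are illustrative assumptions):

```python
import fnmatch

def prefetch_candidates(file_names, patterns):
    """Return the files matching user-configured prefetch patterns.

    Matching files could be pulled into the cache layer ahead of the
    first IO request; only the selection step is sketched here.
    """
    return [name for name in file_names
            if any(fnmatch.fnmatch(name, pat) for pat in patterns)]

names = ["frame_0001.exr", "frame_0002.exr", "notes.txt"]
print(prefetch_candidates(names, ["frame_*.exr"]))  # the two .exr frames
```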
  • the file access characteristic statistics table can be used to maintain the file access characteristics of the upper-layer HPC workload, including but not limited to file access popularity, access mode (sequential access or random access), and read file block size. It supports statistics over periodic dimensions (for example, by month/day/hour/minute). The file access characteristic statistics table is also used to provide input for cache cold data cleanup and data flow.
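The statistics table described above could be sketched as follows; the data layout, method names, and hourly bucketing are illustrative assumptions (the patent only states that periodic dimensions such as month/day/hour/minute are supported).

```python
from collections import defaultdict
from datetime import datetime

class FileAccessStats:
    """Illustrative file access characteristic statistics table.

    Records per-file access counts bucketed by hour, plus the access
    mode and observed read block sizes.
    """

    def __init__(self):
        self.popularity = defaultdict(lambda: defaultdict(int))  # file -> bucket -> count
        self.block_sizes = defaultdict(list)                     # file -> read block sizes
        self.access_mode = {}                                    # file -> "sequential"/"random"

    def record(self, file_name, block_size, mode, when):
        # Hourly dimension; month/day/minute buckets work the same way.
        bucket = when.strftime("%Y-%m-%d %H")
        self.popularity[file_name][bucket] += 1
        self.block_sizes[file_name].append(block_size)
        self.access_mode[file_name] = mode

    def heat(self, file_name):
        """Total access count across all buckets, usable as a popularity value."""
        return sum(self.popularity[file_name].values())

stats = FileAccessStats()
stats.record("scene.mb", 4096, "sequential", datetime(2024, 1, 1, 12, 0))
stats.record("scene.mb", 4096, "sequential", datetime(2024, 1, 1, 12, 30))
print(stats.heat("scene.mb"))  # 2
```

The `heat` value is the kind of input the cold-data cleanup strategy can consume.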
  • the file distribution hash table is used to maintain the storage location of user files/user file blocks in the cache layer, supporting HPC Workload to efficiently obtain target files/file blocks, and ensuring the efficiency of upper-layer processing.
  • the three components of the cache management and control layer in the data caching system can be used to store core cache data; the storage method is not limited to a database, Redis, or files, and the consistency of data access/update under the distributed architecture is ensured through mutex locks.
  • the cloud data caching system in this application can also include:
  • cached user files whose data volume is not lower than the preset space threshold are stored in the file system mounted on the cloud host;
  • the user files whose data volume is lower than the preset space threshold and whose modification frequency is lower than the preset frequency are stored in the virtual disk;
  • the user files whose data volume is lower than the preset space threshold and whose modification frequency is not lower than the preset frequency are stored in the distributed memory.
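The placement rules can be sketched as a tier-selection function consistent with the logic of response module 202 described below (large files to the mounted file system; small files split between the virtual disk and distributed memory by modification frequency). The concrete threshold values are illustrative assumptions, as the patent only names "preset" thresholds.

```python
def select_cache_tier(data_volume, modification_freq,
                      space_threshold=64 << 20, freq_threshold=10):
    """Pick a cache tier from the hierarchical characteristics.

    The 64 MiB space threshold and the modification-frequency threshold
    of 10 per period are placeholder values for illustration.
    """
    if data_volume >= space_threshold:
        # Large files (e.g. core HPC task files): persistent global cache.
        return "mounted_file_system"
    if modification_freq < freq_threshold:
        # Small, rarely modified files (software, plug-ins, scripts).
        return "virtual_disk"
    # Small, frequently modified files: distributed memory.
    return "distributed_memory"

print(select_cache_tier(256 << 20, 50))  # mounted_file_system
print(select_cache_tier(1 << 20, 2))     # virtual_disk
print(select_cache_tier(1 << 20, 50))    # distributed_memory
```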
  • the cloud data caching system in this application can also include:
  • the cache configuration center is used to maintain cache configuration information and provide cache control interfaces to users;
  • the file access characteristic statistics table is used to collect file access characteristics of the HPC processing layer, where the file access characteristics include file access popularity, access mode, and the data volume of user files.
  • the cloud data caching system provided by the above embodiments of the present application and the cloud data caching method provided by the embodiments of the present application are based on the same inventive concept, and have the same beneficial effects as the methods it adopts, runs, or implements.
  • An embodiment of the present application also provides a cloud data caching device, which is configured to perform operations performed by the cloud data caching method provided in any of the above embodiments.
  • the device includes:
  • Deployment module 201 is used to cache user files through the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host;
  • the response module 202 is configured to, when a user file to be cached is obtained, determine the hierarchical characteristics corresponding to the user file, and cache the user file to a storage area of the corresponding level based on the hierarchical characteristics.
  • the hierarchical characteristics include at least one of access frequency, modification frequency, and data volume, and the storage areas of different levels include the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host.
  • the response module 202 is specifically configured to determine that the user file is a low-frequency access user file, and cache the low-frequency access user file to the cloud low-frequency file storage area in the cloud host.
  • the low-frequency access user file is a user file whose access frequency is lower than the first preset frequency.
  • the response module 202 is specifically configured to determine that the user file is a high-frequency access user file, and cache the high-frequency access user file to the storage area of the corresponding level.
  • the high-frequency access user file is a user file whose access frequency is not lower than the first preset frequency.
  • the response module 202 is specifically configured to cache, among the high-frequency access user files, the user files whose data volume is not lower than the preset space threshold and whose modification frequency is not lower than the second preset frequency into the file system mounted on the cloud host;
  • the response module 202 is specifically configured to cache, among the high-frequency access user files, the user files whose data volume is lower than the preset space threshold into the virtual disk or the distributed memory.
  • the response module 202 is specifically used to determine the modification frequency of the high-frequency access user files whose data volume is lower than the preset space threshold;
  • the response module 202 is specifically configured to store cached user files whose modification frequency is lower than the second preset frequency into the virtual disk; or,
  • the response module 202 is specifically configured to store cached user files whose modification frequency is not lower than the second preset frequency into the distributed memory.
  • the deployment module 201 is specifically configured to periodically determine the access popularity value of each user file based on the cache configuration information set by the user and the file access characteristics of each user file obtained from online statistics;
  • the deployment module 201 is specifically used to clean up user files whose access popularity value is lower than the preset popularity value.
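Cold-data cleanup by access popularity, as performed by the deployment module, might look like the following sketch; the data shapes, function name, and threshold value are illustrative assumptions.

```python
def clean_cold_files(cache, heat, popularity_threshold):
    """Evict cached user files whose access popularity is below a threshold.

    `cache` maps file names to cached data and `heat` maps file names to
    a periodically computed access popularity value; a file absent from
    `heat` counts as popularity 0.
    """
    evicted = [name for name in list(cache)
               if heat.get(name, 0) < popularity_threshold]
    for name in evicted:
        del cache[name]
    return evicted

cache = {"a.bin": b"...", "b.bin": b"..."}
heat = {"a.bin": 120, "b.bin": 3}
print(clean_cold_files(cache, heat, popularity_threshold=10))  # ['b.bin']
print(sorted(cache))  # ['a.bin']
```

In the system above, the popularity values would come from the file access characteristic statistics table, and the run would be triggered periodically or by memory/disk occupancy.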
  • the deployment module 201 is specifically configured to periodically determine the change in space size of each user file based on the cache configuration information set by the user and the file access characteristics of each user file obtained from online statistics;
  • the deployment module 201 is specifically configured to cache each user file to a different storage area based on the change in space size of each user file.
  • the storage area includes one of the disk cache and the distributed cache.
  • Deployment module 201 is specifically used to receive IO requests for user files;
  • Deployment module 201 is specifically used to match the requested user file in the IO request against the file distribution hash table and determine the storage location of the requested user file on the cloud host.
  • the file distribution hash table includes the mapping relationship between user files and cache locations;
  • the deployment module 201 is specifically configured to obtain the requested user file through the associated data source on the cloud host if it is determined that the requested user file is not stored on the cloud host;
  • the deployment module 201 is specifically configured to cache the requested user file to the storage area of the corresponding level, and update the mapping relationship between the requested user file and the storage area of the corresponding level into the file distribution hash table;
  • the deployment module 201 is specifically configured to send the requested user file to the sender of the IO request.
  • the cloud data caching device provided by the above embodiments of the present application and the cloud data caching method provided by the embodiments of the present application are based on the same inventive concept, and have the same beneficial effects as the methods it adopts, runs, or implements.
  • FIG. 4 shows a schematic diagram of an electronic device provided by some embodiments of the present application.
  • the electronic device 3 includes: a processor 300, a memory 301, a bus 302 and a communication interface 303.
  • the processor 300, the communication interface 303 and the memory 301 are connected through the bus 302; the memory 301 stores a computer program that can run on the processor 300. When the processor 300 runs the computer program, it executes the cloud data caching method provided in any of the previous embodiments of this application.
  • the memory 301 may include high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk memory.
  • the communication connection between this device's network element and at least one other network element is realized through at least one communication interface 303 (which can be wired or wireless), using the Internet, a wide area network, a local area network, a metropolitan area network, etc.
  • the bus 302 may be an ISA bus, a PCI bus, an EISA bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc.
  • the memory 301 is used to store a program. After receiving the execution instruction, the processor 300 executes the program.
  • the cloud data caching method disclosed in any of the embodiments of the present application can be applied to the processor 300 , or implemented by the processor 300 .
  • the processor 300 may be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the above method can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 300 .
  • the processor 300 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 301.
  • the processor 300 reads the information in the memory 301 and completes the steps of the above method in combination with its hardware.
  • the electronic device provided by the embodiments of this application and the cloud data caching method provided by the embodiments of this application are based on the same inventive concept, and have the same beneficial effects as the methods adopted, run or implemented.
  • the embodiment of the present application also provides a computer-readable storage medium corresponding to the cloud data caching method provided in the previous embodiments. Please refer to Figure 5.
  • the computer-readable storage medium shown is an optical disk 30, on which a computer program (i.e., a program product) is stored. When the computer program is run by a processor, it executes the cloud data caching method provided by any of the foregoing embodiments.
  • examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or other optical and magnetic storage media, which will not be described in detail here.
  • the computer-readable storage medium provided by the above embodiments of the present application is based on the same inventive concept as the cloud data caching method provided by the embodiments of the present application, and has the same beneficial effects as the methods it adopts, runs, or implements.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the present application are a cloud data caching method and system, a device and a storage medium. The method comprises: in the embodiments of the present application, a multi-level data caching architecture comprising a cache layer, a cache management layer and a cache client can be constructed on a cloud host, and, when a user file to be cached is obtained, a grading feature corresponding to said user file is determined, and said user file is cached to a storage area of a corresponding level on the basis of the grading feature. In the embodiments of the present application, a complex cloud IO scenario can be flexibly dealt with according to a file distribution hash table recording the storage position of each user file. In addition, in the embodiments of the present application, the cache layer is constructed by using idle resources of the cloud host, so that cloud resources can be fully utilized, and the pressure of upper layer processing IO is effectively relieved.

Description

Cloud data caching method, device, equipment and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on March 28, 2022, with application number 202210313588.0 and the application name "Cloud Data Caching Method, Device, Equipment and Storage Medium", the entire content of which is incorporated into this application by reference.
Technical field
This application belongs to the field of computer technology, and specifically relates to a cloud data caching method, system, device, and storage medium.
Background
With the rapid development of cloud computing technology, more and more users in the HPC (High Performance Computing) industry are migrating operating data to the cloud. The HPC industry in scenarios such as film and television rendering, bioinformatics, and geological exploration usually requires massive computing resources, and the computing process is accompanied by a large number of file read and write operations, which places extremely high requirements on cloud file storage performance.
Since the IO (Input/Output) characteristics of different HPC scenarios vary widely, the required specific storage performance indicators, such as throughput, IOPS (Input/Output Operations Per Second), and latency, also vary greatly. As a result, HPC scenarios often encounter many storage problems when migrating to the cloud: storage performance does not match requirements, a single storage mode is difficult to adapt to complex IO characteristics, and managing multiple storage systems is highly complex.
Summary of the invention
This application proposes a cloud data caching method, system, device, and storage medium, which can build a multi-level data caching architecture on a cloud host including a cache layer, a cache management layer, and a cache client, and flexibly cope with complex IO scenarios on the cloud based on a file distribution hash table that records the storage location of each user file. In addition, in the embodiments of this application, because the idle resources of the cloud host are used to build the cache layer, cloud resources can be fully utilized and the IO pressure of upper-layer processing is effectively alleviated.
An embodiment of the first aspect of this application proposes a cloud data caching method, the method including:
caching user files through the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host;
when a user file to be cached is obtained, determining the hierarchical characteristics corresponding to the user file, and caching the user file to a storage area of the corresponding level based on the hierarchical characteristics, where the hierarchical characteristics include at least one of access frequency, modification frequency, and data volume, and the storage areas of different levels include the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host.
An embodiment of the third aspect of this application provides a cloud data caching system, including a data source layer, a cache layer on a cloud host, a cache management and control layer, a cache client on the cloud host, and an HPC processing end, where:
the data source layer includes cloud low-frequency file storage, cloud object storage, and IDC file storage;
the cache layer includes the file system mounted on the cloud host, distributed memory, and a virtual disk, and the cache layer is used to cache frequently accessed user files;
the cache management and control layer includes a cache configuration center, a file access characteristic statistics table, and a file distribution hash table, and the cache management and control layer is used to manage cached user files;
where the cache client is used to provide a data operation interface for the HPC processing layer and to process IO requests.
An embodiment of the fourth aspect of this application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor runs the computer program to implement the method described in the first or second aspect above.
An embodiment of the fifth aspect of this application provides a computer-readable storage medium on which a computer program is stored, and the program is executed by a processor to implement the method described in the first or second aspect above.
The technical solutions provided in the embodiments of this application have at least the following technical effects or advantages:
In the embodiments of this application, a multi-level data cache architecture including a cache layer, a cache management layer, and a cache client can be constructed on the cloud host, and complex IO scenarios on the cloud can be handled flexibly based on the file distribution hash table that records the storage location of each user file. In addition, because the idle resources of the cloud host are used to build the cache layer, cloud resources can be fully utilized and the IO pressure of upper-layer processing is effectively alleviated.
Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.
Description of drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting this application. Throughout the drawings, the same reference characters designate the same components. In the drawings:
Figure 1 shows an operation flow chart of a cloud data caching method provided by an embodiment of this application;
Figure 2 shows an architecture diagram of a cloud data caching system provided by an embodiment of this application;
Figure 3 shows a schematic structural diagram of a cloud data caching device provided by an embodiment of this application;
Figure 4 shows a schematic structural diagram of an electronic device provided by an embodiment of this application;
Figure 5 shows a schematic diagram of a storage medium provided by an embodiment of this application.
Detailed description
Exemplary embodiments of this application will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of this application are shown in the drawings, it should be understood that this application may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this application will be understood more thoroughly, and so that the scope of this application can be fully conveyed to those skilled in the art.
It should be noted that, unless otherwise stated, the technical or scientific terms used in this application should have the usual meanings understood by those skilled in the art to which this application belongs.
A cloud data caching method, system, device, and storage medium proposed according to embodiments of this application will be described below with reference to the accompanying drawings.
Referring to Figure 1, the cloud data caching method provided by an embodiment of this application specifically includes the following steps:
Step 101: Cache user files through the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host.
In related technologies, HPC processing (such as film and television rendering, bioinformatics, and geological exploration) usually requires massive computing resources, and the computing process is accompanied by a large number of file read and write operations, so the performance requirements for cloud file storage are extremely high.
However, since the IO characteristics of different HPC processing scenarios vary widely, the required specific storage performance indicators (throughput, IOPS, latency, etc.) also vary greatly. Considering the diversity of IO characteristics, migrating HPC processing to the cloud often encounters the following storage problems. First, storage performance does not match processing requirements: for users moving to the cloud for the first time, it is difficult to select, in one step, the file storage specification that best matches the IO characteristics of their processing, while subsequent changes of storage specification involve changes to underlying hardware resources and data migration, which is very costly; in addition, due to the diversity of offline HPC processing, users may find that low-end cloud storage performance cannot meet processing requirements, while high-end storage specifications overflow in performance and are expensive. Second, a single storage system is difficult to adapt to complex IO characteristics: HPC processing usually involves a huge amount of data, and the access characteristics (access frequency, block size) of different data vary greatly, so storing all data on a single storage system makes it difficult to keep costs under control.
Based on the above problems, this application proposes a method of implementing data caching by caching user files through the distributed memory of the cloud host, the virtual disk, and the file system mounted on the cloud host. Specifically, it is applied to a cloud data caching system that includes a data source layer, a cache layer on the cloud host, a cache management and control layer, a cache client on the cloud host, and an HPC processing end.
As shown in Figure 2, the data source layer in the cloud data caching system of this application includes cloud low-frequency file storage, cloud object storage, and IDC file storage. The cache layer includes the file system mounted on the cloud host, distributed memory, and a virtual disk, and is used to cache frequently accessed user files. The cache management and control layer includes a cache configuration center, a file access characteristic statistics table, and a file distribution hash table, and is used to manage cached user files. The cache client is used to provide a data operation interface for the HPC processing layer and to process IO requests.
Specifically, for the cloud low-frequency file storage in the data source layer, the transmission of user file data between cloud hosts can be supported based on file system mounting. It should be noted that it can support standard file IO interfaces, and it is only used for caching user files whose data access frequency is low.
In addition, cloud object storage can support the transmission and access of user file data between cloud hosts through specific API interfaces, which gives it certain advantages in data distribution. In one approach, it is likewise used only to cache user files whose data access frequency is low.
Furthermore, IDC file storage is the file storage located in the user's local computer room; on the one hand it supports the user's local processing, and on the other hand it is connected to the cloud network through a dedicated line/VPN.
Further, the file system mounted on the cloud host in the cache layer serves as the cloud high-frequency file cache area and can be used as a persistent global cache layer supporting data sharing between cloud hosts, with stronger performance than low-frequency file storage. In the embodiments of this application, it can be used to cache files in the data source that are accessed frequently, have large data volumes, and change frequently, such as core user files of HPC tasks.
另外,对于磁盘缓存来说,其可以用于持久化局部缓存层。本申请实施例中,云主机的空闲磁盘空间用于缓存数据源中访问频率高、文件数据量较小、数据改动不频繁的文件,如:处理软件、程序插件、前后处理脚本等。一种方式中,可以通过给云主机添加数据盘的方式来扩展磁盘缓存容量。其中,该用于局部缓存的磁盘缓存中缓存的用户文件可以为某个云主机独享的用户文件。Additionally, for disk caching, it can be used to persist local cache layers. In the embodiment of this application, the free disk space of the cloud host is used to cache files in data sources with high access frequency, small file data volume, and infrequent data changes, such as: processing software, program plug-ins, pre- and post-processing scripts, etc. In one method, the disk cache capacity can be expanded by adding a data disk to the cloud host. Among them, the user files cached in the disk cache used for local caching can be user files exclusive to a certain cloud host.
另外,对于分布式内存来说,可以用于非持久化全局缓存层。也即其与云上高频文件缓存区域同样可用于支持云主机之间数据共享的持久化全局缓存层。本申请实施例中,多台云主机的空闲内存通过tmpfs/ramdisk等方式构建内存文件系统并形成分布式的内存缓存层,并由缓存管控层进行统一管理,用于缓存数据源中被高频访问的文件块。用户处理高峰期云主机数量越多,内存缓存空间就越大。In addition, for distributed memory, it can be used for non-persistent global cache layer. That is to say, it and the high-frequency file cache area on the cloud can also be used to support the persistent global cache layer that supports data sharing between cloud hosts. In the embodiment of this application, the free memory of multiple cloud hosts builds a memory file system through tmpfs/ramdisk and other methods to form a distributed memory cache layer, and is uniformly managed by the cache management and control layer to cache data sources that are frequently used. The file block being accessed. The more cloud hosts there are during peak user processing periods, the larger the memory cache space will be.
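The memory-backed cache slice on each host can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the Linux tmpfs mount `/dev/shm` is available as the RAM-backed directory and falls back to an ordinary temp directory elsewhere; function names are hypothetical.

```python
import os
import tempfile
from typing import Optional

# Assumption: on Linux, /dev/shm is a tmpfs (RAM-backed) mount; elsewhere we
# fall back to a plain temp directory so the sketch still runs.
MEM_CACHE_ROOT = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()

def mem_cache_put(file_id: str, block: bytes) -> str:
    """Cache one file block in the memory-backed file system; return its path."""
    path = os.path.join(MEM_CACHE_ROOT, "cache_" + file_id)
    with open(path, "wb") as f:
        f.write(block)
    return path

def mem_cache_get(file_id: str) -> Optional[bytes]:
    """Return the cached block, or None on a cache miss."""
    path = os.path.join(MEM_CACHE_ROOT, "cache_" + file_id)
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return f.read()
```

In a real deployment these per-host slices would be federated under the cache management layer; here each host simply reads and writes its local tmpfs directory.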
Further still, the cache configuration center in the cache management layer of the cloud data caching system of this application can be used to maintain the cache configuration information for the cached data in the system and to provide a cache control interface to the user. In one implementation, the user can enable or disable each cache layer by interacting with the cache configuration center, so that, together with the cache client, user files can be retrieved at any time. The cache configuration center can also implement the cold-data cleanup policy, for example, periodically cleaning up data with a low access frequency based on the memory/disk occupancy ratio and file popularity. Furthermore, caching policies can be customized, such as data prefetching based on specific file names.
In addition, the file access feature statistics table can be used to maintain the file access characteristics of the upper-layer HPC workload, including but not limited to file access popularity, access pattern (sequential or random access), and read block size. It supports statistics over periodic dimensions (for example, by month/day/hour/minute). The table also provides input for cold-data cleanup and data migration between cache tiers.
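The statistics the table maintains can be sketched in a few lines. This is an illustrative, hypothetical structure (class and method names are not from the patent): it counts accesses per hourly bucket, classifies each read as sequential or random by comparing its offset with where the previous read ended, and records read block sizes.

```python
import time
from collections import defaultdict

class AccessStats:
    """Hypothetical sketch of the file access feature statistics table."""

    def __init__(self):
        self.heat = defaultdict(lambda: defaultdict(int))  # file -> hour bucket -> count
        self.last_offset = {}                              # file -> end of last read
        self.sequential = defaultdict(int)                 # file -> sequential reads
        self.random = defaultdict(int)                     # file -> random reads
        self.block_sizes = defaultdict(list)               # file -> read block sizes

    def record_read(self, file_id, offset, size, now=None):
        now = time.time() if now is None else now
        bucket = int(now // 3600)                          # hourly statistics bucket
        self.heat[file_id][bucket] += 1
        self.block_sizes[file_id].append(size)
        last = self.last_offset.get(file_id)
        if last is not None and offset == last:
            self.sequential[file_id] += 1                  # continues the previous read
        elif last is not None:
            self.random[file_id] += 1
        self.last_offset[file_id] = offset + size

    def heat_in_bucket(self, file_id, bucket):
        return self.heat[file_id][bucket]
```

The hourly bucket is one choice among the month/day/hour/minute granularities the text mentions; a production table would likely keep several granularities at once.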
Optionally, the file distribution hash table maintains the locations at which files/file blocks are stored in the cache layer, enabling the HPC workload to fetch target files/file blocks efficiently and ensuring the running efficiency of upper-layer processing.
It should be noted that the three components of the cache management layer in the data caching system (namely the cache configuration center, the file access feature statistics table, and the file distribution hash table) can be used to store the core cache data; the storage medium is not limited to a database, redis, or files, and mutex locks are used to ensure the consistency of data access/updates under the distributed architecture.
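As a minimal sketch of the mutex-protected file distribution hash table, the following in-process version uses a `threading.Lock` around a plain dictionary. It is illustrative only: the patent allows the table to live in a database, redis, or files, and a distributed deployment would need a distributed lock rather than a local one.

```python
import threading

class FileDistributionTable:
    """Sketch: maps a file id to the set of cache tiers holding it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._table = {}  # file id -> set of cache locations

    def update(self, file_id, location):
        """Record that a copy of the file now lives at `location`."""
        with self._lock:
            self._table.setdefault(file_id, set()).add(location)

    def locations(self, file_id):
        """Return a copy of the file's known cache locations."""
        with self._lock:
            return set(self._table.get(file_id, set()))

    def evict(self, file_id, location):
        """Drop one cached copy; remove the entry when no copies remain."""
        with self._lock:
            locs = self._table.get(file_id)
            if locs:
                locs.discard(location)
                if not locs:
                    del self._table[file_id]
```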
To explain further, the cache client of this application provides a file system entry and a standard POSIX file operation interface upward to the HPC processing layer, and is responsible downward for the real-time handling of IO requests.
In the interaction with the HPC processing layer, the cache client first obtains a real-time IO request sent by the upper-layer HPC processing and looks up, in the file distribution hash table, the cache location of the target file/file block corresponding to the requested user file.
If the target file/file block has not been cached, it can be read from the data source and passed to the upper layer for processing; at the same time, the target file/file block is cached into the cloud high-frequency file storage, and the file and its mapping are synchronously updated in the file distribution hash table.
If the target file/file block has been cached in the cloud high-frequency file storage or the disk cache but has not been cached in the distributed memory cache, or its cache entry has become invalid, the file is read from that cache and passed to the upper layer for processing; at the same time, the target file/file block just read is cached into the distributed memory cache, and the file and its mapping are synchronously updated in the file distribution hash table.
If the target file/file block has already been cached in the distributed memory cache, it is read directly from the distributed memory cache and passed to the upper layer for processing.
As for the real-time handling of IO requests, the cache client can periodically collect hierarchical feature statistics, update them in the file access feature table, and obtain the relevant configuration from the cache configuration center. Based on the time-series changes in file access popularity and read block size, combined with the cache configuration center's settings, it moves hot data between the cloud high-frequency file cache and the disk cache, cleans up cold data in each cache tier, and synchronously updates the file distribution hash table.
Step 102: when a user file to be cached is obtained, determine the hierarchical features corresponding to the user file, and cache the user file into a storage area of the corresponding tier based on those features. The hierarchical features include access frequency, modification frequency, and data size; the storage areas of different tiers include the distributed memory of the cloud hosts, the virtual disks, and the file system mounted on the cloud hosts.
In one implementation, in the cloud data caching method proposed in this application, each user file to be cached can be placed in the corresponding tier according to one of its access frequency, modification frequency, and data size. The distributed memory, the virtual disks, and the file system mounted on the cloud hosts together form this multi-tier cache area.
In one implementation, the distributed memory cache holds user files that are accessed frequently (hot data), are cached at file-block granularity, and are small; it also provides low-latency data access for upper-layer processing, acting as a shared cache.
In one implementation, the disk cache holds files that are accessed frequently (hot data), are cached at file-block granularity, are small, and change infrequently, which avoids the single-point IO bottleneck a shared cache may suffer and relieves pressure on the distributed memory cache.
In another implementation, the file system mounted on the cloud hosts holds files that are accessed relatively frequently (warm data), are cached at file-block granularity, are large, and change often. Understandably, as the backend of the distributed memory cache and the disk cache, it carries most of the hot/warm data needed by upper-layer processing, thereby reducing access to the data source.
In addition, the cloud data caching system in this application supports hot plugging, supports connecting to data sources both on and off the cloud, and supports horizontal and vertical scaling.
In one implementation, if an IO request for a certain user file is received from a sender (for example, the HPC processing layer), the corresponding cache location can be determined from the preset file distribution hash table (which records the mapping between each user file and its cache location), and the cached data can be read from that location and returned to the sender.
Specifically, this may include the following steps:
Step 1: match the user file requested in the IO request against the file distribution hash table to determine where the requested file is stored on the cloud hosts.
Step 2: if it is determined that the requested user file is not stored on the cloud hosts, obtain it from the data source associated with the cloud host; otherwise skip to step 5.
Step 3: cache the requested user file into the cloud high-frequency file storage area on the cloud hosts (that is, one of the distributed memory of the cloud hosts, the virtual disks, and the file system mounted on the cloud hosts), and update the mapping between the requested user file and the high-frequency file storage area in the file distribution hash table.
Step 4: send the requested user file to the sender of the IO request.
Step 5: if it is determined that the requested user file is stored on the cloud hosts, determine whether it is cached in distributed memory.
Step 6: if it is cached in distributed memory, send the requested user file to the sender of the IO request.
Step 7: if it is not cached in distributed memory, cache the requested user file into distributed memory, update the mapping between the requested user file and the distributed memory in the file distribution hash table, and send the requested user file to the sender of the IO request.
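The steps above can be sketched as one lookup function. This is a simplified illustration of the flow, not the patent's implementation: the caches and the data source are modeled as plain dictionaries, and all names are hypothetical.

```python
def handle_io_request(file_id, dist_table, data_source, hf_storage, memory_cache):
    """Sketch of steps 1-7: serve an IO request through the cache tiers."""
    locations = dist_table.get(file_id, set())          # step 1: hash-table lookup
    if not locations:                                   # steps 2-4: not on the cloud hosts
        data = data_source[file_id]                     # fetch from the data source
        hf_storage[file_id] = data                      # cache in high-frequency storage
        dist_table.setdefault(file_id, set()).add("hf_storage")
        return data
    if "memory" in locations:                           # steps 5-6: distributed-memory hit
        return memory_cache[file_id]
    data = hf_storage[file_id]                          # step 7: promote into memory
    memory_cache[file_id] = data
    dist_table[file_id].add("memory")
    return data
```

On repeated requests the same file thus migrates from the data source into high-frequency storage and then into the distributed memory cache, matching the step sequence above.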
In addition, for data management in the system, one approach can be to periodically determine the access heat value of each user file based on the cache configuration information set by the user and the file access features of the user files collected online, and to clean up user files whose access heat value is lower than a preset heat value.
In another approach, the change in the space occupied by each user file can also be determined periodically based on the cache configuration information set by the user and the file access features of the user files collected online, so that user files can subsequently be moved from the disk cache to the distributed cache, or from the distributed cache to the disk cache, according to the change in each file's space size.
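A periodic management pass combining both approaches might look like the following sketch. The thresholds and the direction of the size-driven migration (grown files to disk, shrunken files to memory) are illustrative assumptions, not values given in the patent.

```python
def manage_caches(heat, sizes, disk_cache, dist_cache,
                  min_heat=5, space_threshold=1024):
    """Sketch: cold-data cleanup plus size-driven tier migration.

    heat/sizes map file id -> current access heat / size in bytes;
    disk_cache/dist_cache map file id -> cached data.
    """
    # cold-data cleanup: evict files whose heat fell below the preset value
    for cache in (disk_cache, dist_cache):
        for fid in [f for f in cache if heat.get(f, 0) < min_heat]:
            del cache[fid]
    # assumed migration policy: files that grew past the space threshold move
    # to the disk cache; files that shrank below it move to distributed memory
    for fid in [f for f in dist_cache if sizes.get(f, 0) >= space_threshold]:
        disk_cache[fid] = dist_cache.pop(fid)
    for fid in [f for f in disk_cache if sizes.get(f, 0) < space_threshold]:
        dist_cache[fid] = disk_cache.pop(fid)
```

A real pass would also write the resulting moves back into the file distribution hash table, as the text requires.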
In the embodiments of this application, user files are cached through the distributed memory of the cloud hosts, the virtual disks, and the file system mounted on the cloud hosts; when a user file to be cached is obtained, the hierarchical features corresponding to the user file are determined and, based on these features, the file is cached into a storage area of the corresponding tier. The hierarchical features include at least one of access frequency, modification frequency, and data size, and the storage areas of different tiers include the distributed memory of the cloud hosts, the virtual disks, and the file system mounted on the cloud hosts.
In addition, in the embodiments of this application, IO requests for user files can be served according to the file distribution hash table, which includes the mapping between user files and cache locations, and cached user files can be managed according to the cache configuration information set by the user and the file access features of the user files collected online. The multi-tier data cache architecture built on the cloud hosts in the embodiments of this application, comprising a cache layer, a cache management layer, and a cache client, can flexibly cope with complex IO scenarios on the cloud based on the file distribution hash table that records where each user file is stored. Moreover, because the cache layer is built from the idle resources of the cloud hosts, cloud resources are fully utilized and the IO pressure of upper-layer processing is effectively relieved.
Optionally, in one aspect of the embodiments of this application, caching the user file into a storage area of the corresponding tier based on the hierarchical features includes:
determining that the user file is a low-frequency-access user file and caching it into the cloud low-frequency file storage area on the cloud hosts, where a low-frequency-access user file is a user file whose access frequency is lower than a first preset frequency;
determining that the user file is a high-frequency-access user file and caching it into a storage area of the corresponding tier, where a high-frequency-access user file is a user file whose access frequency is not lower than the first preset frequency.
In one implementation, this application can divide each user file to be cached into high-frequency-access user files and low-frequency-access user files, store the high-frequency-access user files in the cloud high-frequency file storage area on the cloud hosts (that is, one of the distributed memory of the cloud hosts, the virtual disks, and the file system mounted on the cloud hosts), and store the low-frequency-access user files in the cloud low-frequency file storage area.
Understandably, high-frequency-access user files are the files that the user retrieves relatively often (that is, files whose access frequency is not lower than the first preset frequency), while low-frequency-access user files are the files that the user retrieves relatively rarely (that is, files whose access frequency is lower than the first preset frequency).
Optionally, in one aspect of the embodiments of this application, determining that the user file is a high-frequency-access user file and caching it into a storage area of the corresponding tier includes:
caching user files that are high-frequency-access files, whose data size is not lower than a preset space threshold, and whose modification frequency is not lower than a second preset frequency, into the file system mounted on the cloud hosts;
caching user files that are high-frequency-access files and whose data size is lower than the preset space threshold into the virtual disks or the distributed memory.
Optionally, in one aspect of the embodiments of this application, caching user files that are high-frequency-access files and whose data size is lower than the preset space threshold into the virtual disks or the distributed memory includes:
determining the modification frequency of the cached user files that are high-frequency-access files and whose data size is lower than the preset space threshold;
storing those cached user files whose modification frequency is lower than the second preset frequency in the virtual disks; or,
storing those cached user files whose modification frequency is not lower than the second preset frequency in the distributed memory.
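The tier-selection rules above can be condensed into one function. The concrete threshold values are illustrative assumptions (the patent leaves the preset frequencies and the space threshold unspecified), and the case of a large but rarely modified file, which the rules do not cover, is routed to the virtual disk here as an arbitrary choice.

```python
def select_tier(access_freq, size_bytes, mod_freq,
                first_preset_freq=10,      # assumed: accesses per period
                space_threshold=1 << 20,   # assumed: 1 MiB
                second_preset_freq=5):     # assumed: modifications per period
    """Sketch of the hierarchical-feature tier selection described above."""
    if access_freq < first_preset_freq:
        return "cloud_low_frequency_storage"        # low-frequency-access file
    if size_bytes >= space_threshold and mod_freq >= second_preset_freq:
        return "mounted_file_system"                # large and frequently modified
    if mod_freq < second_preset_freq:
        return "virtual_disk"                       # small, rarely modified
    return "distributed_memory"                     # small, frequently modified
```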
In one implementation, the distributed memory cache holds user files that are accessed frequently, are cached at file-block granularity, and are small (that is, high-frequency-access user files whose data size is lower than the preset space threshold); it also provides low-latency data access for upper-layer processing, acting as a shared cache.
In one implementation, the disk cache holds files that are accessed frequently, are cached at file-block granularity, are small, and change infrequently (that is, high-frequency-access user files whose data size is lower than the preset space threshold and whose modification frequency is lower than the second preset frequency), which avoids the single-point IO bottleneck a shared cache may suffer and relieves pressure on the distributed memory cache.
In another implementation, the file system mounted on the cloud hosts holds files that are accessed relatively frequently, are cached at file-block granularity, are large, and change often (that is, high-frequency-access user files whose data size is not lower than the preset space threshold and whose modification frequency is not lower than the second preset frequency). Understandably, as the backend of the distributed memory cache and the disk cache, it carries most of the hot/warm data needed by upper-layer processing, thereby reducing access to the data source.
Optionally, in one aspect of the embodiments of this application, after the user file is cached into a storage area of the corresponding tier, the method further includes:
periodically determining the access heat value of each user file based on the cache configuration information set by the user and the file access features of the user files collected online;
cleaning up user files whose access heat value is lower than a preset heat value.
Optionally, in one aspect of the embodiments of this application, after the user file is cached into a storage area of the corresponding tier, the method further includes:
periodically determining the change in the space occupied by each user file based on the cache configuration information set by the user and the file access features of the user files collected online;
caching the user file into a different storage area based on the change in its space size, where the storage area is one of the disk cache and the distributed cache.
In one implementation, for data management in the system, one approach can be to periodically determine the access heat value of each user file based on the cache configuration information set by the user and the file access features collected online, and to clean up user files whose access heat value is lower than the preset heat value.
In addition, this application can periodically collect hierarchical feature statistics, update them in the file access feature table, and obtain the relevant configuration from the cache configuration center; based on changes in read file block sizes, combined with the cache configuration center's settings, data flows between the storage areas (that is, based on the change in each user file's space size, user files are moved from the disk cache to the distributed cache, or from the distributed cache to the disk cache), and the file distribution hash table is updated synchronously.
Optionally, in one aspect of the embodiments of this application, after the user file is cached into a storage area of the corresponding tier, the method further includes:
receiving an IO request for the user file;
matching the user file requested in the IO request against the file distribution hash table to determine where the requested file is stored on the cloud hosts, where the file distribution hash table includes the mapping between user files and cache locations;
if it is determined that the requested user file is not stored on the cloud hosts, obtaining it from the data source associated with the cloud host;
caching the requested user file into a storage area of the corresponding tier, and updating the mapping between the requested user file and that storage area in the file distribution hash table;
sending the requested user file to the sender of the IO request.
In one implementation, if an IO request for a user file is received from a sender (for example, the HPC processing layer), the corresponding cache location can be determined from the preset file distribution hash table (which records the mapping between each user file and its cache location), and the cached data can be read from that location and returned to the sender.
Specifically, this may include the following steps:
Step 1: match the user file requested in the IO request against the file distribution hash table to determine where the requested file is stored on the cloud hosts.
Step 2: if it is determined that the requested user file is not stored on the cloud hosts, obtain it from the data source associated with the cloud host; otherwise skip to step 5.
Step 3: cache the requested user file into the cloud high-frequency file storage area on the cloud hosts, and update the mapping between the requested user file and the high-frequency file storage area in the file distribution hash table.
Step 4: send the requested user file to the sender of the IO request.
Step 5: if it is determined that the requested user file is stored on the cloud hosts, determine whether it is cached in distributed memory.
Step 6: if it is cached in distributed memory, send the requested user file to the sender of the IO request.
Step 7: if it is not cached in distributed memory, cache the requested user file into distributed memory, update the mapping between the requested user file and the distributed memory in the file distribution hash table, and send the requested user file to the sender of the IO request.
In addition, after matching the user file requested in the IO request against the file distribution hash table and determining where the requested file is stored on the cloud hosts, the embodiments of this application further include:
if it is determined that the requested user file is stored on the cloud hosts, determining whether it is cached in distributed memory;
if it is cached in distributed memory, sending the requested user file to the sender of the IO request;
if it is not cached in distributed memory, caching the requested user file into distributed memory, updating the mapping between the requested user file and the distributed memory in the file distribution hash table, and sending the requested user file to the sender of the IO request.
In the embodiments of this application, a multi-tier data cache architecture comprising a cache layer, a cache management layer, and a cache client can be built on the cloud hosts, and complex IO scenarios on the cloud can be handled flexibly based on the file distribution hash table that records where each user file is stored. Moreover, because the cache layer is built from the idle resources of the cloud hosts, cloud resources are fully utilized and the IO pressure of upper-layer processing is effectively relieved.
Embodiments of this application further provide a cloud data caching system, which includes a data source layer, a cache layer on the cloud hosts, a cache management layer, a cache client on the cloud hosts, and an HPC processing end, where:
the data source layer includes cloud low-frequency file storage, cloud object storage, and IDC file storage;
the cache layer includes the file system mounted on the cloud hosts, distributed memory, and virtual disks, and is used to cache frequently accessed user files;
the cache management layer includes the cache configuration center, the file access feature statistics table, and the file distribution hash table, and is used to manage cached user files;
the cache client is used to provide a data operation interface for the HPC processing layer and to handle IO requests.
In one implementation, the cloud data caching system proposed in this application does not distinguish between data sources, which may be the user's local file storage, cloud low-frequency file storage, or cloud object storage.
Specifically, the cloud low-frequency file storage in the data source layer can support the transfer of user file data between cloud hosts by means of file system mounting. In addition, cloud object storage supports the transfer and access of user file data between cloud hosts through specific API interfaces, which gives it certain advantages in data distribution; in one implementation, it too is used only to cache user files with a low data access frequency. As for IDC file storage, it is the file storage located in the user's local machine room; it supports the user's local processing on the one hand, and on the other hand is connected to the cloud network through a dedicated line or VPN.
进一步的,对于缓存层中的云主机上挂载的文件系统来说,其可以用于持久化全局缓存,支持云主机之间的用户文件数据的传输,性能比低频文件存储更强。在本申请实施例中,可以用于缓存数据源中访问频率较高、文件数据量较大、数据改动较频繁的文件,如:HPC任务的核心用户文件。Furthermore, for the file system mounted on the cloud host in the cache layer, it can be used for persistent global cache to support the transmission of user file data between cloud hosts, and its performance is stronger than low-frequency file storage. In this embodiment of the present application, it can be used to cache files in the data source that are accessed more frequently, have larger file data volumes, and frequently change data, such as core user files for HPC tasks.
另外,对于磁盘缓存来说,其可以用于持久化局部缓存层。本申请实施例中,云主机的空闲磁盘空间用于缓存数据源中访问频率高、文件数据量较小、数据改动不频繁的文件,如:处理软件、程序插件、前后处理脚本等。一种方式中,可以通过给云主机添加数据盘的方式来扩展磁盘缓存容量。Additionally, for disk caching, it can be used to persist local cache layers. In the embodiment of this application, the free disk space of the cloud host is used to cache files in data sources with high access frequency, small file data volume, and infrequent data changes, such as: processing software, program plug-ins, pre- and post-processing scripts, etc. In one method, the disk cache capacity can be expanded by adding a data disk to the cloud host.
In addition, distributed memory can be used as a non-persistent global cache layer that uses cloud host memory. In the embodiments of the present application, the free memory of multiple cloud hosts is built into a memory file system by means such as tmpfs/ramdisk, forming a distributed memory cache layer that is managed uniformly by the cache management and control layer and used to cache the file blocks in the data source that are accessed at high frequency. The more cloud hosts there are during a user's peak processing period, the larger the memory cache space.
Further still, the cache configuration center of the cache management and control layer in the cloud data caching system of the present application can be used to maintain the cache configuration information of the cached data in the system and to provide a cache control interface to the user. In one approach, by interacting with the cache configuration center, the user can enable and disable each cache layer, cooperating with the cache client so that user files can be retrieved at any time. It can also implement the cache's cold-data cleanup policy, for example by periodically cleaning up data with a low access frequency according to the memory/disk occupancy ratio and file heat. Moreover, caching policies can be customized, such as data prefetching based on specific file names.
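For illustration only (this is not part of the application's disclosed implementation), a cold-data cleanup pass of the kind described above might decay each file's access count over time and evict low-heat entries once the occupancy ratio crosses a threshold. All names, the half-life decay, and the thresholds here are hypothetical:

```python
import time

def cleanup_cold_files(files, occupancy_ratio, max_ratio=0.8, min_heat=1.0, now=None):
    """Illustrative cold-data cleanup: when the memory/disk occupancy ratio
    exceeds max_ratio, evict cached files whose decayed heat falls below
    min_heat. Heat is an access count halved for every hour since the last
    access (a hypothetical choice, not one stated in the application)."""
    if occupancy_ratio <= max_ratio:
        return []                      # occupancy acceptable: clean nothing
    now = now or time.time()
    evicted = []
    for name, meta in list(files.items()):
        # exponential decay: recent accesses count for more than old ones
        age_hours = (now - meta["last_access"]) / 3600.0
        heat = meta["access_count"] * (0.5 ** age_hours)
        if heat < min_heat:
            evicted.append(name)
            del files[name]
    return evicted
```

A scheduler in the cache configuration center could run such a pass periodically, feeding it the occupancy ratio and the per-file statistics maintained by the file access characteristic statistics table.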
In addition, the file access characteristic statistics table can be used to maintain the file access characteristics of the upper-layer HPC workload, including but not limited to file access heat, access pattern (sequential or random access), and read block size. Statistics over periodic dimensions are supported (for example, by month/day/hour/minute). The file access characteristic statistics table is also used to provide input for cold-data cleanup and data movement in the cache.
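As a hedged sketch of such a statistics table (the class, keys, and hourly bucketing are assumptions for illustration, not details given in the application), each access can be aggregated into a per-file, per-period bucket from which heat and access pattern are derived:

```python
import time
from collections import defaultdict

class FileAccessStats:
    """Illustrative per-hour file access statistics table. Records access
    count, bytes read, and read offsets per (file, hour) bucket, from which
    heat and access pattern can be derived."""

    def __init__(self):
        # (file_name, hour_bucket) -> aggregated access characteristics
        self.table = defaultdict(lambda: {"count": 0, "bytes": 0, "offsets": []})

    def record(self, file_name, offset, size, ts=None):
        bucket = int((ts or time.time()) // 3600)   # hourly dimension
        entry = self.table[(file_name, bucket)]
        entry["count"] += 1
        entry["bytes"] += size
        entry["offsets"].append(offset)

    def access_pattern(self, file_name, bucket):
        # classify as sequential if offsets only ever increase
        offs = self.table[(file_name, bucket)]["offsets"]
        return "sequential" if offs == sorted(offs) else "random"
```

Coarser dimensions (day/month) could be obtained the same way by widening the bucket divisor.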
Optionally, the file distribution hash table is used to maintain the locations where user files/user file blocks are stored in the cache layer, allowing the HPC workload to obtain target files/file blocks efficiently and ensuring the running efficiency of upper-layer processing.
It should be noted that the three components of the cache management and control layer in the data caching system (i.e., the cache configuration center, the file access characteristic statistics table, and the file distribution hash table) can be used to store the core cache data. The storage method is not limited to a database, Redis, or files, and mutex locks are used to guarantee the consistency of data access/updates under the distributed architecture.
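The mutex-guarded file distribution hash table described above might be sketched as follows. This is an illustrative sketch only; the class, the tier names, and the tuple layout are hypothetical, and a real deployment could equally back the same interface with a database or Redis:

```python
import threading

class FileDistributionTable:
    """Illustrative file distribution hash table: maps a user file (or file
    block) to the cache tier and location where it is stored, guarded by a
    mutex so that concurrent lookups and updates stay consistent."""

    def __init__(self):
        self._lock = threading.Lock()
        self._locations = {}            # file name -> (tier, host, path)

    def update(self, file_name, tier, host, path):
        with self._lock:
            self._locations[file_name] = (tier, host, path)

    def lookup(self, file_name):
        with self._lock:
            return self._locations.get(file_name)   # None signals a cache miss
```

In a distributed setting the in-process `threading.Lock` would be replaced by a distributed mutex, but the access discipline stays the same.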
Optionally, the cloud data caching system in the present application may further include:
caching low-frequency-access user files in the low-frequency file storage on the cloud, where a low-frequency-access user file is a user file whose access frequency is lower than a preset frequency.
Optionally, the cloud data caching system in the present application may further include:
storing, among the high-frequency-access user files obtained from the data source layer, the cached user files whose data volume is not lower than a preset space threshold in the distributed memory;
storing, among the high-frequency-access user files obtained from the data source layer, the user files whose data volume is lower than the preset space threshold and whose modification frequency is lower than a preset frequency in the virtual disk;
storing, among the high-frequency-access user files obtained from the data source layer, the user files whose data volume is lower than the preset space threshold and whose modification frequency is not lower than the preset frequency in the file system mounted on the cloud host.
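The placement rules above can be sketched as a single tier selector. For illustration only: the function name, tier labels, and thresholds are hypothetical tunables, not values disclosed in the application:

```python
def select_tier(access_freq, data_size, change_freq,
                freq_threshold, size_threshold, change_threshold):
    """Illustrative tier selector for the placement rules above: low-frequency
    files go to cloud low-frequency storage; hot large files to distributed
    memory; hot small files to virtual disk or the mounted file system
    depending on how often they change."""
    if access_freq < freq_threshold:
        return "cloud_low_frequency_storage"   # not a frequently accessed file
    if data_size >= size_threshold:
        return "distributed_memory"            # large, frequently accessed
    if change_freq < change_threshold:
        return "virtual_disk"                  # small, rarely modified
    return "mounted_file_system"               # small, frequently modified
```

Each cached file would be routed through this selector (or an equivalent table-driven rule) when it is first obtained from the data source layer.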
Optionally, the cloud data caching system in the present application may further include:
the cache configuration center, used to maintain cache configuration information and provide a cache control interface to the user;
the file access characteristic statistics table, used to collect the file access characteristics of the HPC processing layer, where the file access characteristics include file access heat, access pattern, and the data volume of user files.
The cloud data caching system provided by the above embodiments of the present application and the cloud data caching method provided by the embodiments of the present application arise from the same inventive concept, and the system has the same beneficial effects as the method adopted, run, or implemented by the application programs it stores.
An embodiment of the present application further provides a cloud data caching apparatus, which is configured to perform the operations performed by the cloud data caching method provided by any of the above embodiments. As shown in Figure 3, the apparatus includes:
a deployment module 201, configured to cache user files through the distributed memory and virtual disks of cloud hosts and the file systems mounted on the cloud hosts;
a response module 202, configured to, when a user file to be cached is obtained, determine the grading characteristics corresponding to the user file and, based on the grading characteristics, cache the user file in the storage area of the corresponding level, where the grading characteristics include at least one of access frequency, modification frequency, and data volume, and the storage areas of different levels include the distributed memory and virtual disk of the cloud host and the file system mounted on the cloud host.
The response module 202 is specifically configured to determine that the user file is a low-frequency-access user file and cache the low-frequency-access user file in the cloud low-frequency file storage area of the cloud host, where the low-frequency-access user file is a user file whose access frequency is lower than a first preset frequency;
the response module 202 is specifically configured to determine that the user file is a high-frequency-access user file and cache the high-frequency-access user file in the storage area of the corresponding level, where the high-frequency-access user file is a user file whose access frequency is not lower than the first preset frequency.
The response module 202 is specifically configured to cache the user files that are high-frequency-access user files, whose data volume is not lower than the preset space threshold, and whose modification frequency is not lower than a second preset frequency in the file system mounted on the cloud host;
the response module 202 is specifically configured to cache the user files that are high-frequency-access user files and whose data volume is lower than the preset space threshold in the virtual disk or the distributed memory.
The response module 202 is specifically configured to determine the modification frequency corresponding to the cached user files that are high-frequency-access user files and whose data volume is lower than the preset space threshold;
the response module 202 is specifically configured to store those cached user files whose modification frequency is lower than the second preset frequency in the virtual disk; or,
the response module 202 is specifically configured to store those cached user files whose modification frequency is not lower than the second preset frequency in the distributed memory.
The deployment module 201 is specifically configured to periodically determine the access heat value of each user file according to the cache configuration information set by the user and the file access characteristics of the user files collected through online statistics;
the deployment module 201 is specifically configured to clean up the user files whose access heat value is lower than a preset heat value.
The deployment module 201 is specifically configured to periodically determine the change in the space size of each user file according to the cache configuration information set by the user and the file access characteristics of the user files collected through online statistics;
the deployment module 201 is specifically configured to cache each user file in a different storage area based on the change in its space size, where the storage area is one of the disk cache and the distributed cache.
The deployment module 201 is specifically configured to receive an IO request for a user file;
the deployment module 201 is specifically configured to match the requested user file in the IO request against the file distribution hash table to determine the storage location of the requested user file on the cloud hosts, where the file distribution hash table includes the mapping between user files and cache locations;
the deployment module 201 is specifically configured to, if it is determined that the requested user file is not stored on the cloud hosts, obtain the requested user file from the data source associated with the cloud host;
the deployment module 201 is specifically configured to cache the requested user file in the storage area of the corresponding level and update the mapping between the requested user file and the storage area of the corresponding level into the file distribution hash table;
the deployment module 201 is specifically configured to send the requested user file to the sender of the IO request.
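The read path just described (hash-table lookup, fetch from the data source on a miss, cache at the selected tier, update the table, answer the request) might be sketched as follows. This is an illustrative sketch under stated assumptions: the function, the dict-based table and cache, and the `select_tier` callback are all hypothetical stand-ins for the components in the application:

```python
def handle_io_request(file_name, distribution_table, cache, data_source,
                      select_tier):
    """Illustrative read path for an IO request: consult the file
    distribution hash table; on a miss, fetch the file from the data source,
    cache it at the tier chosen for it, record the new mapping, and only
    then return the data to the requester."""
    location = distribution_table.get(file_name)
    if location is not None:
        return cache[location][file_name]      # cache hit: serve directly
    data = data_source[file_name]              # miss: go back to the source
    tier = select_tier(file_name, data)        # choose the storage level
    cache.setdefault(tier, {})[file_name] = data
    distribution_table[file_name] = tier       # keep the mapping current
    return data
```

Because the mapping is updated before the response is returned, a repeat request for the same file is served from the cache without touching the data source again.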
The cloud data caching apparatus provided by the above embodiments of the present application and the cloud data caching method provided by the embodiments of the present application arise from the same inventive concept, and the apparatus has the same beneficial effects as the method adopted, run, or implemented by the application programs it stores.
An embodiment of the present application further provides an electronic device for executing the above cloud data caching method. Referring to Figure 4, which shows a schematic diagram of an electronic device provided by some embodiments of the present application, the electronic device 3 includes: a processor 300, a memory 301, a bus 302, and a communication interface 303, where the processor 300, the communication interface 303, and the memory 301 are connected through the bus 302. The memory 301 stores a computer program that can run on the processor 300, and when the processor 300 runs the computer program, it executes the cloud data caching method provided by any of the foregoing embodiments of the present application.
The memory 301 may include high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk memory. The communication connection between this apparatus's network element and at least one other network element is realized through at least one communication interface 303 (which may be wired or wireless), and the Internet, a wide area network, a local network, a metropolitan area network, etc. may be used.
The bus 302 may be an ISA bus, a PCI bus, an EISA bus, etc. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 301 is used to store a program, and the processor 300 executes the program after receiving an execution instruction; the cloud data caching method disclosed by any of the foregoing embodiments of the present application may be applied to the processor 300 or implemented by the processor 300.
The processor 300 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 300 or by instructions in the form of software. The above processor 300 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and it can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc. The steps of the methods disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 301; the processor 300 reads the information in the memory 301 and completes the steps of the above method in combination with its hardware.
The electronic device provided by the embodiments of the present application and the cloud data caching method provided by the embodiments of the present application arise from the same inventive concept and have the same beneficial effects as the method adopted, run, or implemented.
An embodiment of the present application further provides a computer-readable storage medium corresponding to the cloud data caching method provided by the foregoing embodiments. Referring to Figure 5, the computer-readable storage medium shown is an optical disc 30 on which a computer program (i.e., a program product) is stored; when the computer program is run by a processor, it executes the cloud data caching method provided by any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or other optical or magnetic storage media, which will not be enumerated here one by one.
The computer-readable storage medium provided by the above embodiments of the present application and the cloud data caching method provided by the embodiments of the present application arise from the same inventive concept, and the medium has the same beneficial effects as the method adopted, run, or implemented by the application programs it stores.
It should be noted that:
A large number of specific details are described in the specification provided here. However, it can be understood that the embodiments of the present application may be practiced without these specific details. In some instances, well-known structures and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in the above description of the exemplary embodiments of the present application, the various features of the present application are sometimes grouped together into a single embodiment, figure, or description thereof in order to streamline the application and assist in understanding one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present application.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the present application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The above are only preferred specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person familiar with this technical field could readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

  1. A cloud data caching method, characterized in that the method comprises:
    caching user files through the distributed memory and virtual disks of cloud hosts and the file systems mounted on the cloud hosts;
    when a user file to be cached is obtained, determining the grading characteristics corresponding to the user file and, based on the grading characteristics, caching the user file in the storage area of the corresponding level, wherein the grading characteristics comprise at least one of access frequency, modification frequency, and data volume, and the storage areas of different levels comprise the distributed memory and virtual disk of the cloud host and the file system mounted on the cloud host.
  2. The method according to claim 1, characterized in that caching the user file in the storage area of the corresponding level based on the grading characteristics comprises:
    determining that the user file is a low-frequency-access user file, and caching the low-frequency-access user file in the cloud low-frequency file storage area of the cloud host, wherein the low-frequency-access user file is a user file whose access frequency is lower than a first preset frequency;
    determining that the user file is a high-frequency-access user file, and caching the high-frequency-access user file in the storage area of the corresponding level, wherein the high-frequency-access user file is a user file whose access frequency is not lower than the first preset frequency.
  3. The method according to claim 2, characterized in that determining that the user file is a high-frequency-access user file and caching the high-frequency-access user file in the storage area of the corresponding level comprises:
    caching the user files that are high-frequency-access user files, whose data volume is not lower than a preset space threshold, and whose modification frequency is not lower than a second preset frequency in the file system mounted on the cloud host;
    caching the user files that are high-frequency-access user files and whose data volume is lower than the preset space threshold in the virtual disk or the distributed memory.
  4. The method according to claim 3, characterized in that caching the user files that are high-frequency-access user files and whose data volume is lower than the preset space threshold in the virtual disk or the distributed memory comprises:
    determining the modification frequency corresponding to the cached user files that are high-frequency-access user files and whose data volume is lower than the preset space threshold;
    storing those cached user files whose modification frequency is lower than the second preset frequency in the virtual disk; or,
    storing those cached user files whose modification frequency is not lower than the second preset frequency in the distributed memory.
  5. The method according to claim 1, characterized in that, after caching the user file in the storage area of the corresponding level, the method further comprises:
    periodically determining the access heat value of each user file according to the cache configuration information set by the user and the file access characteristics of the user files collected through online statistics;
    cleaning up the user files whose access heat value is lower than a preset heat value.
  6. The method according to claim 1 or 5, characterized in that, after caching the user file in the storage area of the corresponding level, the method further comprises:
    periodically determining the change in the space size of each user file according to the cache configuration information set by the user and the file access characteristics of the user files collected through online statistics;
    caching each user file in a different storage area based on the change in its space size, wherein the storage area comprises one of the disk cache and the distributed cache.
  7. The method according to claim 1, characterized in that, after caching the user file in the storage area of the corresponding level, the method further comprises:
    receiving an IO request for a user file;
    matching the requested user file in the IO request against a file distribution hash table to determine the storage location of the requested user file on the cloud hosts, wherein the file distribution hash table comprises the mapping between user files and cache locations;
    if it is determined that the requested user file is not stored on the cloud hosts, obtaining the requested user file from the data source associated with the cloud host;
    caching the requested user file in the storage area of the corresponding level, and updating the mapping between the requested user file and the storage area of the corresponding level into the file distribution hash table;
    sending the requested user file to the sender of the IO request.
  8. A cloud data caching system, characterized in that the system comprises a data source layer, a cache layer on cloud hosts, a cache management and control layer, a cache client on the cloud hosts, and an HPC processing end, wherein:
    the data source layer comprises low-frequency file storage on the cloud, object storage on the cloud, and IDC file storage;
    the cache layer comprises the file systems mounted on the cloud hosts, distributed memory, and virtual disks, and the cache layer is used to cache user files accessed at high frequency;
    the cache management and control layer comprises a cache configuration center, a file access characteristic statistics table, and a file distribution hash table, and the cache management and control layer is used to manage cached user files;
    wherein the cache client is used to provide a data operation interface for the HPC processing layer and to process IO requests.
  9. The system according to claim 8, characterized in that the data source layer is configured for:
    caching low-frequency-access user files in the low-frequency file storage on the cloud, wherein the low-frequency-access user files are user files whose access frequency is lower than a preset frequency.
  10. The system according to claim 8, characterized in that the cache layer is configured for:
    storing, among the high-frequency-access user files obtained from the data source layer, the cached user files whose data volume is not lower than a preset space threshold in the distributed memory;
    storing, among the high-frequency-access user files obtained from the data source layer, the user files whose data volume is lower than the preset space threshold and whose modification frequency is lower than a preset frequency in the virtual disk;
    storing, among the high-frequency-access user files obtained from the data source layer, the user files whose data volume is lower than the preset space threshold and whose modification frequency is not lower than the preset frequency in the file system mounted on the cloud host.
  11. The system according to claim 8, characterized in that, in the cache management and control layer:
    the cache configuration center is used to maintain cache configuration information and provide a cache control interface to the user;
    the file access characteristic statistics table is used to collect the file access characteristics of the HPC processing layer, wherein the file access characteristics include file access heat, access pattern, and the data volume of user files.
  12. A cloud data caching apparatus, characterized in that the apparatus comprises:
    a deployment module, configured to cache user files through the distributed memory and virtual disks of cloud hosts and the file systems mounted on the cloud hosts;
    a response module, configured to, when a user file to be cached is obtained, determine the grading characteristics corresponding to the user file and, based on the grading characteristics, cache the user file in the storage area of the corresponding level, wherein the grading characteristics comprise at least one of access frequency, modification frequency, and data volume, and the storage areas of different levels comprise the distributed memory and virtual disk of the cloud host and the file system mounted on the cloud host.
  13. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor runs the computer program to implement the method according to any one of claims 1-11.
  14. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-11.
PCT/CN2023/084183 2022-03-28 2023-03-27 Cloud data caching method and apparatus, device and storage medium WO2023185770A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210313588.0 2022-03-28
CN202210313588.0A CN114840140A (en) 2022-03-28 2022-03-28 On-cloud data caching method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023185770A1 true WO2023185770A1 (en) 2023-10-05

Family

ID=82564657

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084183 WO2023185770A1 (en) 2022-03-28 2023-03-27 Cloud data caching method and apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN114840140A (en)
WO (1) WO2023185770A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312328A (en) * 2023-11-28 2023-12-29 金篆信科有限责任公司 Self-adaptive bottom storage configuration method, device, system and medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840140A (en) * 2022-03-28 2022-08-02 阿里巴巴(中国)有限公司 On-cloud data caching method, device, equipment and storage medium
CN116627920B (en) * 2023-07-24 2023-11-07 华能信息技术有限公司 Data storage method based on industrial Internet

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462240A (en) * 2014-11-18 2015-03-25 浪潮(北京)电子信息产业有限公司 Method and system for realizing hierarchical storage and management in cloud storage
CN109144895A (en) * 2017-06-15 2019-01-04 中兴通讯股份有限公司 A kind of date storage method and device
US20200201768A1 (en) * 2019-08-28 2020-06-25 Intel Corporation Cloud-based frequency-based cache management
CN113742290A (en) * 2021-11-04 2021-12-03 上海闪马智能科技有限公司 Data storage method and device, storage medium and electronic device
CN114840140A (en) * 2022-03-28 2022-08-02 阿里巴巴(中国)有限公司 On-cloud data caching method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6552196B2 (en) * 2011-08-02 2019-07-31 アジャイ ジャドハブ Cloud-based distributed persistence and cache data model
CN105988721A (en) * 2015-02-10 2016-10-05 中兴通讯股份有限公司 Data caching method and apparatus for network disk client
CN106648464B (en) * 2016-12-22 2020-01-21 柏域信息科技(上海)有限公司 Multi-node mixed block cache data reading and writing method and system based on cloud storage
CN113010514B (en) * 2021-03-01 2024-02-20 中国工商银行股份有限公司 Thermal loading method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312328A (en) * 2023-11-28 2023-12-29 金篆信科有限责任公司 Self-adaptive bottom storage configuration method, device, system and medium
CN117312328B (en) * 2023-11-28 2024-03-01 金篆信科有限责任公司 Self-adaptive bottom storage configuration method, device, system and medium

Also Published As

Publication number Publication date
CN114840140A (en) 2022-08-02

Similar Documents

Publication Publication Date Title
WO2023185770A1 (en) Cloud data caching method and apparatus, device and storage medium
US10324832B2 (en) Address based multi-stream storage device access
KR102504042B1 (en) Access parameter based multi-stream storage device access
US9201794B2 (en) Dynamic hierarchical memory cache awareness within a storage system
US10496613B2 (en) Method for processing input/output request, host, server, and virtual machine
US9996542B2 (en) Cache management in a computerized system
US9652405B1 (en) Persistence of page access heuristics in a memory centric architecture
US7089391B2 (en) Managing a codec engine for memory compression/decompression operations using a data movement engine
JP2819982B2 (en) Multiprocessor system with cache match guarantee function that can specify range
US10409728B2 (en) File access predication using counter based eviction policies at the file and page level
US20130290643A1 (en) Using a cache in a disaggregated memory architecture
WO2019085769A1 (en) Tiered data storage and tiered query method and apparatus
US20120102273A1 (en) Memory agent to access memory blade as part of the cache coherency domain
US20200097183A1 (en) Workload based device access
TW201220197A (en) for improving the safety and reliability of data storage in a virtual machine based on cloud calculation and distributed storage environment
US9612975B2 (en) Page cache device and method for efficient mapping
US20210117333A1 (en) Providing direct data access between accelerators and storage in a computing environment, wherein the direct data access is independent of host cpu and the host cpu transfers object map identifying object of the data
WO2021258881A1 (en) Data management method and system for application, and computer device
US9483523B2 (en) Information processing apparatus, distributed processing system, and distributed processing method
WO2023125524A1 (en) Data storage method and system, storage access configuration method and related device
WO2023045492A1 (en) Data pre-fetching method, and computing node and storage system
CN109254958A (en) Distributed data reading/writing method, equipment and system
WO2022257685A1 (en) Storage system, network interface card, processor, and data access method, apparatus, and system
US11327887B2 (en) Server-side extension of client-side caches
WO2023200502A1 (en) Direct swap caching with zero line optimizations

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23778135

Country of ref document: EP

Kind code of ref document: A1