CN116048425A - Hierarchical caching method, hierarchical caching system and related components

Info

Publication number
CN116048425A
Authority
CN
China
Prior art keywords
file
data
operation request
hierarchical
client
Prior art date
Legal status
Granted
Application number
CN202310220769.3A
Other languages
Chinese (zh)
Other versions
CN116048425B (en)
Inventor
臧林劼
何怡川
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN202310220769.3A
Publication of CN116048425A
Application granted
Publication of CN116048425B
Priority to PCT/CN2024/080583 (WO2024183799A1)
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 - Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 - Caches characterised by their organisation or structure
    • G06F12/0897 - Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 - Improving I/O performance
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a hierarchical caching method, a hierarchical caching system and related components, which relate to the field of distributed storage. The hierarchical caching method is applied to each computing node of a distributed storage system and comprises the following steps: monitoring, with a client process, file IO operation requests sent by a client to the distributed storage system, and redirecting a file IO operation request to a server process when one is detected; judging, with the server process, whether the target storage location corresponding to the file IO operation request is the aggregation cache layer; if not, reading the data corresponding to the file IO operation request from the bottom layer of the distributed storage system and caching the data in the aggregation cache layer; if yes, reading the data from the aggregation cache layer and returning it to the client process, so that the client process returns the data to the client. The method and system can improve the IO performance of massive small-file data sets and alleviate the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.

Description

Hierarchical caching method, hierarchical caching system and related components
Technical Field
The present disclosure relates to the field of distributed storage, and in particular, to a hierarchical caching method, system, and related components.
Background
With the rapid growth of HPC (High Performance Computing) cluster computing power, large-scale, highly concurrent applications place great strain on distributed storage system IO (Input/Output). Three key elements are involved in high-performance HPC scenarios. First, the data and tag metadata attributes of each high-performance computing node require large-scale, high-concurrency storage IO, of which roughly 80% consists of loading and modifying data sets for HPC training and random data retrieval, following a typical intensive IO model. Second, the HPC high-performance computing process, which includes data preprocessing, data labeling, data compression and the like. Third, synchronous updating for distributed data consistency.
Analyzing the three key elements of the HPC scenario shows that the massive numbers of small files generated by high-performance computing involve highly concurrent IO and random access, which easily saturates the IO read/write performance of a distributed storage system. For example, a common HPC data set generally contains more than 2 million small files of roughly 3000 different types. If the storage IO read/write software stack cannot meet the demands of large-scale HPC operation, high-performance computing services will stall, so the IO performance of the distributed storage system is crucial to high-performance computing services. Existing technical schemes provide some optimizations for improving storage IO performance for high-performance computing, such as prefetching and caching. However, performing large-scale, highly concurrent storage IO in HPC scenarios with existing solutions still faces many technical challenges; for example, read-intensive high-performance IO on small files generates enormous metadata service overhead in the distributed storage system, which degrades data storage efficiency.
Therefore, how to provide a solution to the above technical problem is a problem that a person skilled in the art needs to solve at present.
Disclosure of Invention
The purpose of the application is to provide a hierarchical caching method, a hierarchical caching system and related components, which can improve the IO performance of massive small-file data sets and alleviate the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
In order to solve the above technical problems, the present application provides a hierarchical caching method, which is applied to each computing node of a distributed storage system, and the hierarchical caching method includes:
monitoring a file IO operation request sent by a client to the distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored;
judging whether a target storage position corresponding to the file IO operation request is an aggregation cache layer or not by utilizing the server process;
if not, reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and caching the data to the aggregation caching layer;
if yes, the data is read from the aggregation cache layer, and the data is returned to the client process, so that the client process returns the data to the client.
Optionally, the process of using the server process to determine whether the target storage location corresponding to the file IO operation request is an aggregation cache layer includes:
inserting the received file IO operation request into a shared queue by using the server process;
in the shared queue, determining whether the data corresponding to the file IO operation request is cached data;
if yes, judging the target storage position corresponding to the file IO operation request as an aggregation cache layer.
Optionally, in the sharing queue, the determining whether the data corresponding to the file IO operation request is cached data includes:
in the shared queue, determining whether the data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
Optionally, the hierarchical caching method further includes:
and when the file IO operation request sent by the client to the distributed storage system is monitored, starting the server process, and dynamically constructing the server process instance by using the state of the computing node and the states of other computing nodes adjacent to the computing node.
Optionally, the process of reading the data from the aggregation cache layer includes:
redirecting the file IO operation request to the aggregation cache layer through a data thread so as to read the data from the aggregation cache layer.
Optionally, the method for hierarchical caching further includes, while inserting the received file IO operation request into a shared queue by using the server process:
and configuring the shared queue with a mutual exclusion lock.
Optionally, the shared queue is a FIFO queue.
Optionally, the data corresponding to the file IO operation request includes a file descriptor, a read offset, and a length.
Optionally, the hierarchical caching method further includes:
and determining the storage position of the data in the aggregation cache layer based on the file path and the computing node.
Optionally, the hierarchical caching method further includes:
and broadcasting the file IO operation request to the computing nodes adjacent to the computing node.
Optionally, the hierarchical caching method further includes:
judging whether a data set corresponding to the file IO operation request is larger than the total capacity of a local storage medium or not;
if yes, performing a cache eviction operation and a replacement operation.
Optionally, the hierarchical caching method further includes:
constructing a dynamic link library based on the environment variable; the dynamic link library is used for intercepting the file IO operation request.
Optionally, the redirecting the file IO operation request to the server process includes:
and redirecting the file IO operation request to a server process through a hash algorithm.
Optionally, the aggregate cache layer is a cache layer formed by high-speed storage media in each computing node in the distributed storage system.
Optionally, the high-speed storage medium is an NVMe SSD.
Optionally, the hierarchical caching method further includes:
when the clearing condition is satisfied, the data stored in the high-speed storage medium on the computing node is cleared.
In order to solve the above technical problem, the present application further provides a hierarchical cache system, which is applied to each computing node of a distributed storage system, where the hierarchical cache system includes:
the monitoring module is used for monitoring a file IO operation request sent by a client to the distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored;
the processing module is used for judging whether the target storage position corresponding to the file IO operation request is an aggregation cache layer or not by utilizing the server process, if not, triggering the first reading module, and if so, triggering the second reading module;
the first reading module is used for reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system and caching the data to the aggregation caching layer;
and the second reading module is used for reading the data from the aggregation cache layer and returning the data to the client process so that the client process returns the data to the client.
In order to solve the above technical problem, the present application further provides an electronic device, including:
a memory for storing a computer program;
a processor for implementing the steps of the hierarchical caching method as claimed in any one of the preceding claims when executing said computer program.
In order to solve the technical problem, the present application further provides a distributed storage system, including a storage bottom layer module and a plurality of nodes, each node includes a layered client process, a layered server process and a storage medium, and the storage medium of each node forms an aggregation cache layer, where:
The hierarchical client process is used for monitoring a file IO operation request sent by a client, and redirecting the file IO operation request to the hierarchical server process when the file IO operation request is monitored;
the hierarchical server process is configured to determine whether a target storage location corresponding to the file IO operation request is the aggregation cache layer, if not, read data corresponding to the file IO operation request from the storage bottom layer module, and send the data to the aggregation cache layer, if yes, read the data from the aggregation cache layer, and return the data to the hierarchical client process, so that the hierarchical client process returns the data to the client;
the aggregation cache layer is used for storing data sent by the layered server process.
To solve the above technical problem, the present application further provides a computer readable storage medium, on which a computer program is stored, the computer program implementing the steps of the hierarchical caching method as described in any one of the above when being executed by a processor.
The application provides a hierarchical caching method: after the client process detects a file IO operation request, it redirects the request to the server process; the server process first searches for the file in the aggregation cache layer, and only when the aggregation cache layer misses does it retrieve the data from the bottom layer of the distributed storage system. This improves the IO performance of massive small-file data sets and alleviates the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services. The application also provides a hierarchical cache system, an electronic device, a distributed storage system and a computer-readable storage medium, which have the same beneficial effects as the hierarchical caching method.
Drawings
For a clearer description of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described, it being apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating steps of a hierarchical caching method provided in the present application;
FIG. 2 is a schematic architecture diagram of a hierarchical cache system provided herein;
FIG. 3 is a schematic diagram of a hierarchical cache framework of a distributed storage system provided herein;
fig. 4 is a schematic structural diagram of a hierarchical cache system provided in the present application.
Detailed Description
The core of the application is to provide a hierarchical caching method, a hierarchical caching system and related components, which can improve the IO performance of massive small-file data sets and alleviate the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a hierarchical caching method provided in the present application, where the hierarchical caching method includes:
s101: monitoring a file IO operation request sent by a client to a distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored;
the embodiment provides an aggregation cache layer, which is a transparent read-only cache layer, aiming at a typical intensive IO model
Figure SMS_2
Constructing a cache by aggregating distributed cluster nodesThe local and adjacent nodes store locally to accelerate the performance of reading the stored data, so as to improve the IO performance of massive small file data sets.
The client accesses the distributed storage system through the POSIX file system interface provided by the distributed storage system, accelerating storage IO access performance in HPC high-performance scenarios, which are characterized by read-only data with a high re-read rate and a typical intensive IO model. Referring to fig. 2, the architecture of the hierarchical cache system provided in the present application consists of two main components: a hierarchical cache client process and a hierarchical cache server process. When a job is distributed over a group of computing nodes on the HPC cluster, the hierarchical cache server process is started, and a server process instance is dynamically constructed using the local storage of the distributed storage node and its neighboring nodes. Each node of the distributed storage system deploys a hierarchical cache client process and a server process, and these processes cache the data requested by HPC high-performance computing jobs on the node's NVMe SSD high-speed storage device.
Specifically, to describe the hierarchical caching process of the present application, referring to fig. 3, the client process is preloaded first, and file system operations such as open, read and close are monitored and intercepted by the client process. Because file system calls are intercepted by this process, no modification is required to existing high-performance computing applications or to the underlying file system of the distributed storage system. It can be understood that the hierarchical cache client process consists of a file system IO interface forwarding module; this interface captures file system calls to the distributed storage system and redirects them to the corresponding hierarchical cache server process, so that hit data is first read through the aggregation cache layer, which serves the performance requirements of high-performance services.
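To make the preloading mechanism concrete, the following is a minimal C sketch of intercepting read() on Linux via LD_PRELOAD and dlsym(RTLD_NEXT). It is an illustration rather than the patent's code; forward_to_server is a hypothetical stand-in for the RPC redirection to the hierarchical cache server process, assumed to return a negative value when the request is not handled there.

```c
#define _GNU_SOURCE   /* for RTLD_NEXT */
#include <dlfcn.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook into the hierarchical cache client logic. */
extern ssize_t forward_to_server(int fd, void *buf, size_t count);

static ssize_t (*real_read)(int, void *, size_t);

ssize_t read(int fd, void *buf, size_t count)
{
    /* Resolve the libc implementation once, on first use. */
    if (real_read == NULL)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");

    ssize_t n = forward_to_server(fd, buf, count);
    if (n >= 0)
        return n;                     /* served via the aggregation cache */

    return real_read(fd, buf, count); /* fall back to the real read() */
}
```

Compiled as a shared object (e.g. the only_read_performance.so named later in this description) and preloaded, such a library interposes on every read() the application issues without recompiling the application.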
S102: judging whether a target storage position corresponding to the file IO operation request is an aggregation cache layer or not by utilizing a server process, if not, executing S103, and if so, executing S104;
s103: reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and caching the data to an aggregation caching layer;
s104: the data is read from the aggregate cache layer and returned to the client process so that the client process returns the data to the client.
After receiving a file IO operation request intercepted by the client process, the server process retrieves the file from the bottom layer of the distributed storage system only when the cache misses. The following description covers two read scenarios: the first read and subsequent (non-first) reads.
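The hit/miss decision just described can be summarized in a short C sketch. The helpers cache_lookup, underlying_read and cache_insert are hypothetical stand-ins for the aggregation cache layer and the storage bottom layer, assumed here only for illustration.

```c
#include <stddef.h>
#include <sys/types.h>

typedef struct {
    int    fd;       /* file descriptor */
    off_t  offset;   /* read offset */
    size_t length;   /* read length */
} io_request;

/* Hypothetical helpers: cache_lookup returns 0 on a hit and fills buf. */
extern int     cache_lookup(int fd, off_t offset, size_t length, void *buf);
extern ssize_t underlying_read(int fd, void *buf, size_t length, off_t offset);
extern void    cache_insert(int fd, off_t offset, const void *buf, size_t length);

ssize_t serve_request(const io_request *req, void *buf)
{
    /* Hit: the data already resides in the aggregation cache layer. */
    if (cache_lookup(req->fd, req->offset, req->length, buf) == 0)
        return (ssize_t)req->length;

    /* Miss: read from the distributed storage bottom layer, then
     * populate the aggregation cache for subsequent reads. */
    ssize_t n = underlying_read(req->fd, buf, req->length, req->offset);
    if (n > 0)
        cache_insert(req->fd, req->offset, buf, (size_t)n);
    return n;
}
```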
For the first read:
In an HPC high-performance scenario, a computing node client initiates a read request to a data set directory on the distributed storage system. The hierarchical cache client process intercepts any incoming file IO operation request and starts tracking it in the data set directory; the RPC (Remote Procedure Call) handler of the hierarchical cache client process redirects the file IO operation request to the corresponding hierarchical cache server process, and the internal management RPC handlers of the hierarchical cache client and server processes are responsible for sending and receiving messages over the network.
As an optional embodiment, the process of determining, by using the server process, whether the target storage location corresponding to the file IO operation request is an aggregate cache layer includes: inserting the received file IO operation request into a shared queue by using a server process; in the shared queue, determining whether data corresponding to the file IO operation request is cached data; if yes, the target storage position corresponding to the file IO operation request is judged to be an aggregation cache layer.
As an optional embodiment, in the shared queue, the process of determining whether the data corresponding to the file IO operation request is cached data includes: in the shared queue, determining whether data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
Specifically, when the hierarchical cache server process receives a file IO operation request, its RPC handler inserts the forwarded file IO into a shared FIFO (First In First Out) queue. In the shared FIFO queue, a move-data thread checks whether the file is cached; because the file is being read for the first time, the data needs to be pulled into the aggregation cache, and the cached file descriptor, read offset and length are determined.
For non-first reads:
as an alternative embodiment, the process of reading data from the aggregate cache layer includes:
the file IO operation request is redirected to the aggregate cache layer by the data thread to read data from the aggregate cache layer.
Specifically, the data thread redirects the IO to the aggregation cache to read the file and returns the file descriptor to the corresponding hierarchical cache client process, which then returns the file descriptor, read offset, length and other data to the HPC application, i.e. the HPC high-performance client request end.
It can be seen that, in this embodiment, after the client process detects a file IO operation request, it redirects the request to the server process; the server process first searches for the file in the aggregation cache layer and retrieves the data from the bottom layer of the distributed storage system only when the aggregation cache layer misses, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
Based on the above embodiments:
as an optional embodiment, while inserting the received file IO operation request into the shared queue by using the server process, the hierarchical caching method further includes:
the shared queue is configured with a mutex lock.
Specifically, it is contemplated that multiple hierarchical cache client processes may request the same file at the same time; a mutual exclusion lock is therefore used on the shared queue to ensure consistency and avoid duplicate copies of a file entering the aggregation cache.
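A minimal C sketch of the mutex-protected shared FIFO queue between the RPC handler and the move-data thread follows. The fixed-capacity ring buffer and the condition variable that wakes the move-data thread are implementation assumptions, not details fixed by the patent.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>

typedef struct {
    int    fd;
    off_t  offset;
    size_t length;
} io_request;

#define QUEUE_CAP 1024

static io_request queue[QUEUE_CAP];
static size_t head = 0, tail = 0, count = 0;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_nonempty = PTHREAD_COND_INITIALIZER;

/* Called by the RPC handler when a forwarded file IO request arrives. */
int enqueue_request(const io_request *req)
{
    pthread_mutex_lock(&queue_lock);
    if (count == QUEUE_CAP) {            /* queue full: caller must retry */
        pthread_mutex_unlock(&queue_lock);
        return -1;
    }
    queue[tail] = *req;
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&queue_nonempty);
    pthread_mutex_unlock(&queue_lock);
    return 0;
}

/* Called by the move-data thread: blocks until a request is available. */
io_request dequeue_request(void)
{
    pthread_mutex_lock(&queue_lock);
    while (count == 0)
        pthread_cond_wait(&queue_nonempty, &queue_lock);
    io_request req = queue[head];
    head = (head + 1) % QUEUE_CAP;
    count--;
    pthread_mutex_unlock(&queue_lock);
    return req;
}
```

Because both ends take the same lock, concurrent requests for the same file are serialized, which is how duplicate copies of a file are kept out of the aggregation cache.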
As an alternative embodiment, the hierarchical caching method further includes:
the storage position of the data in the aggregation cache layer is determined based on the file path and the affiliated computing node.
As an alternative embodiment, the hierarchical caching method further includes:
And broadcasting the file IO operation request to the computing nodes adjacent to the computing node.
Specifically, the hierarchical caching process broadcasts a request to find a file to neighboring nodes, helping to balance the load pressure between the nodes.
As an alternative embodiment, the hierarchical caching method further includes:
judging whether a data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium or not;
if yes, performing a cache eviction operation and a replacement operation.
Specifically, if the data set is larger than the total capacity of the node's local cache, the aggregation cache elimination mechanism performs cache eviction and replacement based on repeated read-data-set operations.
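A short sketch of that capacity check in C, assuming a least-recently-read eviction order; the patent states only that eviction and replacement occur when the data set exceeds local capacity, so the helper names and policy are illustrative.

```c
#include <stddef.h>

/* Hypothetical cache accounting helpers. */
extern size_t cache_used_bytes(void);
extern size_t cache_capacity_bytes(void);   /* total local NVMe capacity */
extern void   evict_least_recently_read(void);

void ensure_capacity(size_t incoming_bytes)
{
    /* Evict until the incoming data fits in the node's local cache. */
    while (cache_used_bytes() + incoming_bytes > cache_capacity_bytes())
        evict_least_recently_read();
}
```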
As an alternative embodiment, the hierarchical caching method further includes:
constructing a dynamic link library based on the environment variable; the dynamic link library is used for intercepting file IO operation requests.
Specifically, this embodiment implements the redirection as a read-only aggregation cache, i.e., an intercept-read-IO mechanism. Based on an initial prototype of high-performance jobs on the distributed storage system, the aggregation cache layer of this embodiment helps analyze high-performance service scenarios, in particular the IO read calls among the three key elements, in order to understand how the data loader in a framework accesses files. This embodiment designs the aggregation cache layer to intercept the relevant IO function calls, built on a mechanism that selectively loads the same functions from different dynamic link libraries; this mechanism avoids forcing the application to modify its code base to support the aggregation cache layer.
Specifically, the intercept-read-IO mechanism of this embodiment redirects to a dynamic link library, only_read_performance.so. The technical advantage of this dynamic link library is that once a function in the library changes, the change is transparent to the executable program, which does not need to be recompiled. For statically linked programs, by contrast, a small change in a function library requires recompiling and releasing the entire program: static linking compiles all referenced functions and variables into the executable file, whereas dynamic linking loads the function library dynamically at program run time, i.e., run-time linking, rather than compiling the functions into the executable. Redirecting to the dynamic link library only_read_performance.so therefore provides compatibility and portability, which is of significant value for a distributed storage system.
The specific steps of the intercept-read-IO mechanism in this embodiment are as follows:
(1) HPC high-performance computing job clients issue file system requests that satisfy standard POSIX semantics and follow the typical intensive IO model, calling into the underlying distributed storage file system;
(2) The LD_PRELOAD environment variable of the Linux servers of the distributed storage system is used; its characteristic is that the dynamic library it names is loaded with the highest priority, making it this embodiment's method for intercepting the read-request processing logic;
(3) Inputs of the intercept-read-IO mechanism:
a. a file system call;
b. the LD_PRELOAD environment variable;
c. the dynamic link library in the local aggregation cache layer, denoted only_read_performance.so;
(4) Output of the intercept-read-IO mechanism: only_read_performance.so is executed, and read-cache logic processing is performed at the cache aggregation layer.
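As a usage illustration (the path and program name are hypothetical), a job could be launched with the library preloaded, e.g. LD_PRELOAD=/opt/cache/only_read_performance.so ./hpc_job, so that the read calls issued by the unmodified application resolve first to the interposed functions of the aggregation cache library, as in the interception sketch above, before falling through to the underlying distributed file system.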
As an alternative embodiment, the process of redirecting the file IO operation request to the server process includes:
and redirecting the file IO operation request to the server process through a hash algorithm.
Redirection through the hash algorithm avoids the bottleneck of metadata lookup, with the aim of improving random read performance. The hierarchical cache client process uses hash-based IO redirection to locate the cache on the hierarchical cache server process, so that cached file metadata does not have to be stored in a distributed metadata store or an in-memory database. In the aggregation cache, the file cache location is determined from the file path and the node it belongs to.
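To make the hash-based redirection concrete, the following C sketch derives a cache server index from the file path alone, so no metadata service is consulted. FNV-1a and the modulo mapping are illustrative assumptions; the patent does not name a specific hash algorithm.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative hash: FNV-1a over the file path (assumed choice). */
static uint64_t fnv1a_hash(const char *s)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a 64-bit offset basis */
    while (*s != '\0') {
        h ^= (uint64_t)(unsigned char)*s++;
        h *= 1099511628211ULL;             /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Map a file path to one of n_servers hierarchical cache server
 * processes; every client computes the same mapping independently. */
size_t server_for_path(const char *path, size_t n_servers)
{
    return (size_t)(fnv1a_hash(path) % n_servers);
}
```

Because the mapping is pure computation over the path, lookups bypass the distributed metadata store entirely, which is the random-read benefit this passage describes.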
As an alternative embodiment, the aggregate cache layer is a cache layer comprised of high-speed storage media in each computing node in the distributed storage system.
As an alternative embodiment, the high-speed storage medium is an NVMe (Non-Volatile Memory Express) SSD (Solid State Disk).
As an alternative embodiment, the hierarchical caching method further includes:
when the clearing condition is satisfied, the data stored in the medium and high speed storage medium on the present computing node is cleared.
In this embodiment, the lifecycle of the data set in the cache is coupled to the lifecycle of the job on the HPC high performance client, and after the job is completed, the cached data set is purged from the node local storage.
In summary, the metadata module and data module storage mechanisms of the aggregation cache layer and of the distributed storage system provided in the present application are independent of each other. The aggregation cache layer is a transparent read-only cache layer built for the typical intensive IO model: based on an RPC remote procedure call mechanism, it aggregates the local storage of distributed cluster nodes and of neighboring nodes to accelerate the reading of stored data, thereby improving the IO performance of massive small-file data sets. A distributed hash algorithm with IO redirection determines the cache location of a data request, and a highest-priority dynamic link library, only_read_performance.so, designed around the LD_PRELOAD environment variable, intercepts read IO requests, avoiding the bottleneck caused by metadata lookup in the distributed storage system, with the aim of improving random read performance. At the same time, the aggregation cache layer and the distributed storage system are independent of each other, giving the scheme portability and universality.
In terms of performance, the scheme effectively improves the high-concurrency IO model of HPC high-performance scenarios and alleviates the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services. In terms of stability, the cache aggregation layer is independent of the underlying distributed storage system: if a storage medium failure occurs in the cache aggregation layer, normal service calls are not affected. In terms of safety, the hierarchical cache architecture is loosely coupled with the distributed storage system, avoiding security risks. In terms of cost, solving the general storage IO performance problem of HPC high-performance scenarios can improve the competitiveness and reduce the maintenance cost of distributed file storage. In terms of compatibility, the scheme is portable and universal, requires no modification of HPC job applications, improves the linear scalability of distributed clusters, and is compatible with common features such as file quotas and snapshots.
In a second aspect, referring to fig. 4, fig. 4 is a schematic structural diagram of a hierarchical cache system provided in the present application, which is applied to each computing node of a distributed storage system, the hierarchical cache system includes:
the monitoring module 41 is configured to monitor, by using a client process, a file IO operation request sent by a client to the distributed storage system, and redirect the file IO operation request to a server process when the file IO operation request is monitored;
the processing module 42 is configured to determine, by using a server process, whether a target storage location corresponding to the file IO operation request is an aggregation cache layer, and if not, trigger the first reading module 43, and if yes, trigger the second reading module 44;
the first reading module 43 is configured to read data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and cache the data to the aggregation cache layer;
a second reading module 44, configured to read the data from the aggregation cache layer and return the data to the client process, so that the client process returns the data to the client.
This embodiment provides an aggregation cache layer, a transparent read-only cache layer built for the typical intensive IO model. The cache is constructed by aggregating the local storage of the distributed cluster nodes and of their neighboring nodes to accelerate the reading of stored data, thereby improving the IO performance of massive small-file data sets.
The client accesses the distributed storage system through the POSIX file system interface provided by the distributed storage system, accelerating storage IO access performance in HPC high-performance scenarios, which are characterized by read-only data with a high re-read rate and a typical intensive IO model. Referring to fig. 2, the architecture of the hierarchical cache system provided in the present application consists of two main components: a hierarchical cache client process and a hierarchical cache server process. When a job is distributed over a group of computing nodes on the HPC cluster, the hierarchical cache server process is started, and a server process instance is dynamically constructed using the local storage of the distributed storage node and its neighboring nodes. Each node of the distributed storage system deploys a hierarchical cache client process and a server process, and these processes cache the data requested by HPC high-performance computing jobs on the node's NVMe SSD high-speed storage device.
Specifically, to describe the hierarchical caching process of the present application, referring to fig. 3, the client process is preloaded first, and file system operations such as open, read and close are monitored and intercepted by the client process. Because file system calls are intercepted by this process, no modification is required to existing high-performance computing applications or to the underlying file system of the distributed storage system. It can be understood that the hierarchical cache client process consists of a file system IO interface forwarding module; this interface captures file system calls to the distributed storage system and redirects them to the corresponding hierarchical cache server process, so that hit data is first read through the aggregation cache layer, which serves the performance requirements of high-performance services.
After receiving a file IO operation request intercepted by the client process, the server process retrieves the file from the bottom layer of the distributed storage system only when the cache misses. The following description covers two read scenarios: the first read and subsequent (non-first) reads.
For the first read:
In an HPC high-performance scenario, a computing node client initiates a read request to a data set directory on the distributed storage system. The hierarchical cache client process intercepts any incoming file IO operation request and starts tracking it in the data set directory; the RPC (Remote Procedure Call) handler of the hierarchical cache client process redirects the file IO operation request to the corresponding hierarchical cache server process, and the internal management RPC handlers of the hierarchical cache client and server processes are responsible for sending and receiving messages over the network.
As an optional embodiment, the process of determining, by using the server process, whether the target storage location corresponding to the file IO operation request is an aggregate cache layer includes: inserting the received file IO operation request into a shared queue by using a server process; in the shared queue, determining whether data corresponding to the file IO operation request is cached data; if yes, the target storage position corresponding to the file IO operation request is judged to be an aggregation cache layer. As an optional embodiment, in the shared queue, the process of determining whether the data corresponding to the file IO operation request is cached data includes: in the shared queue, determining whether data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
Specifically, when the hierarchical cache server process receives a file IO operation request, its RPC handler inserts the forwarded file IO into a shared FIFO queue. In the shared FIFO queue, a move-data thread checks whether the file is cached; because the file is being read for the first time, the data needs to be pulled into the aggregation cache, and the cached file descriptor, read offset and length are determined.
For non-first reads:
as an alternative embodiment, the process of reading data from the aggregate cache layer includes:
the file IO operation request is redirected to the aggregate cache layer by the data thread to read data from the aggregate cache layer.
Specifically, the data thread redirects the IO to the aggregation cache to read the file and returns the file descriptor to the corresponding hierarchical cache client process, which then returns the file descriptor, read offset, length and other data to the HPC application, i.e. the HPC high-performance client request end.
It can be seen that, in this embodiment, after the client process detects a file IO operation request, it redirects the request to the server process; the server process first searches for the file in the aggregation cache layer and retrieves the data from the bottom layer of the distributed storage system only when the aggregation cache layer misses, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
As an optional embodiment, the process of determining, by using the server process, whether the target storage location corresponding to the file IO operation request is an aggregate cache layer includes:
inserting the received file IO operation request into a shared queue by using a server process;
in the shared queue, determining whether data corresponding to the file IO operation request is cached data;
if yes, the target storage position corresponding to the file IO operation request is judged to be an aggregation cache layer.
As an optional embodiment, in the shared queue, the process of determining whether the data corresponding to the file IO operation request is cached data includes:
in the shared queue, determining whether data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
As an alternative embodiment, the hierarchical caching system further comprises:
and the preprocessing module is used for starting a server process when the monitoring client sends a file IO operation request to the distributed storage system, and dynamically constructing a server process instance by utilizing the state of the computing node and the states of other computing nodes adjacent to the computing node.
As an alternative embodiment, the process of reading data from the aggregate cache layer includes:
the file IO operation request is redirected to the aggregate cache layer by the data thread to read data from the aggregate cache layer.
As an optional embodiment, while inserting the received file IO operation request into the shared queue by the server process, the hierarchical cache system further includes:
and the configuration module is used for configuring the shared queue with the mutual exclusion lock.
As an alternative embodiment, the shared queue is a FIFO queue.
As an alternative embodiment, the data corresponding to the file IO operation request includes a file descriptor, a read offset, and a length.
As an alternative embodiment, the hierarchical caching system further comprises:
and the determining module is used for determining the storage position of the data in the aggregation cache layer based on the file path and the computing node.
As an alternative embodiment, the hierarchical caching system further comprises:
and the broadcasting module is used for broadcasting the file IO operation request to the computing nodes adjacent to the computing node.
As an alternative embodiment, the hierarchical caching system further comprises:
and the judging module is used for judging whether the data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium, and if so, executing cache eviction operation and replacement operation.
As an alternative embodiment, the hierarchical caching system further comprises:
the construction module is used for constructing a dynamic link library based on the environment variable; the dynamic link library is used for intercepting file IO operation requests.
As an alternative embodiment, the process of redirecting the file IO operation request to the server process includes:
and redirecting the file IO operation request to the server process through a hash algorithm.
As an alternative embodiment, the aggregate cache layer is a cache layer comprised of high-speed storage media in each computing node in the distributed storage system.
As an alternative embodiment, the high-speed storage medium is an NVMe SSD.
As an alternative embodiment, the hierarchical caching system further comprises:
and the clearing module is used for clearing the data stored in the high-speed storage medium on the computing node when the clearing condition is met.
In a third aspect, the present application further provides an electronic device, including:
a memory for storing a computer program;
a processor for implementing the steps of the hierarchical caching method as any one of the above when executing a computer program.
Specifically, the memory includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer readable instructions, and the internal memory provides an environment for the operating system and the execution of the computer readable instructions in the non-volatile storage medium. The processor provides computing and control capabilities for the electronic device, and when executing the computer program stored in the memory, the following steps may be implemented: monitoring a file IO operation request sent by a client to a distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored; judging whether a target storage position corresponding to the file IO operation request is an aggregation cache layer or not by utilizing a server process; if not, reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and caching the data to an aggregation cache layer; if yes, reading the data from the aggregation cache layer, and returning the data to the client process so that the client process returns the data to the client.
It can be seen that, in this embodiment, after the client process detects a file IO operation request, it redirects the request to the server process; the server process first searches for the file in the aggregation cache layer and retrieves the data from the bottom layer of the distributed storage system only when the aggregation cache layer misses, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: inserting the received file IO operation request into a shared queue by using a server process; in the shared queue, determining whether data corresponding to the file IO operation request is cached data; if yes, the target storage position corresponding to the file IO operation request is judged to be an aggregation cache layer.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: in the shared queue, determining whether data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: and when the monitoring client sends a file IO operation request to the distributed storage system, starting a server process, and dynamically constructing a server process instance by using the state of the computing node and the states of other computing nodes adjacent to the computing node.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: the file IO operation request is redirected to the aggregate cache layer by the data thread to read data from the aggregate cache layer.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: the shared queue is configured with a mutex lock.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: the storage position of the data in the aggregation cache layer is determined based on the file path and the affiliated computing node.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: and broadcasting the file IO operation request to the computing nodes adjacent to the computing node.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: judging whether a data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium or not; if yes, performing a cache eviction operation and a replacement operation.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: constructing a dynamic link library based on the environment variable; the dynamic link library is used for intercepting file IO operation requests.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: and redirecting the file IO operation request to the server process through a hash algorithm.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: when the clearing condition is satisfied, the data stored in the high-speed storage medium on the computing node is cleared.
On the basis of the above embodiment, as a preferred implementation manner, the electronic device further includes:
the input interface is connected with the processor and used for acquiring the externally imported computer programs, parameters and instructions, and the externally imported computer programs, parameters and instructions are controlled by the processor and stored in the memory. The input interface may be coupled to an input device for receiving parameters or instructions manually entered by a user. The input device can be a touch layer covered on a display screen, or can be a key, a track ball or a touch pad arranged on a terminal shell.
And the display unit is connected with the processor and used for displaying the data sent by the processor. The display unit may be a liquid crystal display or an electronic ink display, etc.
And the network port is connected with the processor and used for carrying out communication connection with external terminal equipment. The communication technology adopted by the communication connection can be a wired communication technology or a wireless communication technology, such as a mobile high definition link technology (MHL), a Universal Serial Bus (USB), a High Definition Multimedia Interface (HDMI), a wireless fidelity technology (WiFi), a Bluetooth communication technology with low power consumption, a communication technology based on IEEE802.11s, and the like.
In a fourth aspect, the present application further provides a distributed storage system, including a storage bottom layer module and a plurality of nodes, each node includes a hierarchical client process, a hierarchical server process, and a storage medium, where the storage medium of each node forms an aggregate cache layer, where:
the layering client process is used for monitoring a file IO operation request sent by the client, and redirecting the file IO operation request to the layering server process when the file IO operation request is monitored;
the hierarchical server process is used for judging whether a target storage position corresponding to the file IO operation request is an aggregation cache layer or not, if not, reading data corresponding to the file IO operation request from the storage bottom layer module, sending the data to the aggregation cache layer, and if so, reading the data from the aggregation cache layer, and returning the data to the hierarchical client process, so that the hierarchical client process returns the data to the client;
And the aggregation cache layer is used for storing data sent by the layered server process.
In a fifth aspect, the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the hierarchical caching method as described in any one of the above.
In particular, the computer-readable storage medium may include: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes. The storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of: monitoring a file IO operation request sent by a client to a distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored; judging whether a target storage position corresponding to the file IO operation request is an aggregation cache layer or not by utilizing a server process; if not, reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and caching the data to an aggregation cache layer; if yes, reading the data from the aggregation cache layer, and returning the data to the client process so that the client process returns the data to the client.
It can be seen that, in this embodiment, after the client process detects a file IO operation request, it redirects the request to the server process; the server process first searches for the file in the aggregation cache layer and retrieves the data from the bottom layer of the distributed storage system only when the aggregation cache layer misses, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
As an alternative embodiment, the following steps may be implemented in particular when a computer subroutine stored in a computer readable storage medium is executed by a processor: inserting the received file IO operation request into a shared queue by using a server process; in the shared queue, determining whether data corresponding to the file IO operation request is cached data; if yes, the target storage position corresponding to the file IO operation request is judged to be an aggregation cache layer.
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: determining, in the shared queue, whether the data corresponding to the file IO operation request is cached data through a data thread, where the data thread is a thread generated when the server process instance is constructed.
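As a minimal sketch of this arrangement, the data thread may be spawned when the server process instance is constructed; all names below (server_instance, data_thread_main) are illustrative, not part of the disclosure:

#include <pthread.h>

/* The server process instance owns a data thread that is generated when
 * the instance is constructed; the thread later checks the shared queue
 * to decide whether requested data is already cached. */
typedef struct {
    pthread_t data_thread;
    /* node state, the shared queue, etc. would live here */
} server_instance;

static void *data_thread_main(void *arg) {
    server_instance *s = (server_instance *)arg;
    (void)s;
    /* loop: pop requests from the shared queue and test whether the
     * corresponding data is cached in the aggregation cache layer */
    return NULL;
}

int server_instance_init(server_instance *s) {
    /* the data thread is created as part of instance construction */
    return pthread_create(&s->data_thread, NULL, data_thread_main, s);
}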
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: when it is detected that the client sends a file IO operation request to the distributed storage system, starting the server process, and dynamically constructing the server process instance using the state of the present computing node and the states of other computing nodes adjacent to it.
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: redirecting the file IO operation request to the aggregation cache layer through the data thread, so as to read the data from the aggregation cache layer.
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: configuring the shared queue with a mutex lock.
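A minimal sketch of such a mutex-protected shared queue follows; the request fields mirror the file descriptor, read offset, and length mentioned elsewhere in this document, while the struct and function names are illustrative (claim 7 notes the queue is FIFO):

#include <pthread.h>
#include <stddef.h>

/* FIFO shared queue guarded by a mutex: the client-facing side pushes
 * file IO operation requests, the data thread pops them in order. */
typedef struct io_request {
    int    fd;       /* file descriptor           */
    long   offset;   /* read offset               */
    size_t length;   /* number of bytes requested */
    struct io_request *next;
} io_request;

typedef struct {
    io_request     *head, *tail;
    pthread_mutex_t lock;
} shared_queue;

void queue_init(shared_queue *q) {
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->lock, NULL);
}

void queue_push(shared_queue *q, io_request *r) {
    r->next = NULL;
    pthread_mutex_lock(&q->lock);   /* mutual exclusion on the queue */
    if (q->tail) q->tail->next = r; else q->head = r;
    q->tail = r;
    pthread_mutex_unlock(&q->lock);
}

io_request *queue_pop(shared_queue *q) {
    pthread_mutex_lock(&q->lock);
    io_request *r = q->head;
    if (r) {
        q->head = r->next;
        if (!q->head) q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return r;                       /* NULL when the queue is empty */
}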
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: determining the storage location of the data in the aggregation cache layer based on the file path and the computing node to which the data belongs.
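As an illustration of such a placement rule, the cache location could be keyed on the owning node and the original file path; the "<cache root>/node<id>/<path>" layout below is an assumption for the sketch, not the patented scheme:

#include <stdio.h>

/* Derive a location in the aggregation cache layer from the file path and
 * the computing node the data belongs to. Naive: the original path is
 * reused verbatim under a per-node directory. */
void cache_location(char *out, size_t outlen,
                    const char *cache_root, int node_id, const char *path) {
    snprintf(out, outlen, "%s/node%d/%s", cache_root, node_id, path);
}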
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: broadcasting the file IO operation request to the computing nodes adjacent to the present computing node.
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: determining whether the data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium; and if so, performing a cache eviction operation and a replacement operation.
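A sketch of this capacity check follows; evict_and_replace is a hypothetical hook, since the disclosure requires eviction and replacement but does not fix a particular policy (such as LRU):

#include <stddef.h>

/* Hypothetical hook standing in for the cache eviction and replacement
 * operations; declaration only. */
void evict_and_replace(size_t bytes_to_free);

/* If the data set for the request exceeds the total capacity of the
 * local storage medium, evict and replace before caching. */
void check_capacity(size_t dataset_size, size_t local_capacity) {
    if (dataset_size > local_capacity)
        evict_and_replace(dataset_size - local_capacity);
}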
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: constructing a dynamic link library based on an environment variable, where the dynamic link library is used to intercept file IO operation requests.
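The disclosure does not name the mechanism, but on Linux the conventional way to build such an interception library around an environment variable is an LD_PRELOAD shim; a minimal sketch under that assumption, where redirect_to_server is a hypothetical hook into the hierarchical client logic:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>

/* Interposer shared library that intercepts read() calls.
 * Build:    gcc -shared -fPIC -o libintercept.so intercept.c -ldl
 * Activate: LD_PRELOAD=./libintercept.so <application>            */

static ssize_t (*real_read)(int, void *, size_t) = NULL;

/* Placeholder: would hand the request to the hierarchical client process;
 * returns a negative value when the request is not handled there. */
static ssize_t redirect_to_server(int fd, void *buf, size_t count) {
    (void)fd; (void)buf; (void)count;
    return -1;
}

ssize_t read(int fd, void *buf, size_t count) {
    ssize_t n = redirect_to_server(fd, buf, count);
    if (n >= 0)
        return n;                      /* served via the cache path      */
    if (!real_read)                    /* resolve the libc symbol once   */
        real_read = (ssize_t (*)(int, void *, size_t))
                        dlsym(RTLD_NEXT, "read");
    return real_read(fd, buf, count);  /* fall back to the original call */
}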
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: redirecting the file IO operation request to the server process through a hash algorithm.
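For illustration, such redirection can hash the file path and take the result modulo the number of server processes; FNV-1a is used here purely as an example, since the disclosure does not specify which hash algorithm performs the redirection:

#include <stdint.h>
#include <stddef.h>

/* 64-bit FNV-1a over the file path. */
static uint64_t fnv1a64(const char *s) {
    uint64_t h = 14695981039346656037ULL;  /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Index of the server process that should receive this request. */
size_t pick_server(const char *path, size_t n_servers) {
    return (size_t)(fnv1a64(path) % n_servers);
}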
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: clearing, when a clearing condition is satisfied, the data stored in the high-speed storage medium on the present computing node.
It should also be noted that, in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (20)

1. A hierarchical caching method applied to each computing node of a distributed storage system, the hierarchical caching method comprising:
monitoring a file IO operation request sent by a client to the distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored;
determining, by the server process, whether a target storage location corresponding to the file IO operation request is an aggregation cache layer;
if not, reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and caching the data to the aggregation cache layer;
if so, reading the data from the aggregation cache layer, and returning the data to the client process, so that the client process returns the data to the client.
2. The hierarchical caching method according to claim 1, wherein the process of determining, by the server process, whether the target storage location corresponding to the file IO operation request is the aggregation cache layer comprises:
inserting the received file IO operation request into a shared queue by using the server process;
determining, in the shared queue, whether the data corresponding to the file IO operation request is cached data;
if so, determining that the target storage location corresponding to the file IO operation request is the aggregation cache layer.
3. The hierarchical caching method according to claim 2, wherein in the shared queue, determining whether the data corresponding to the file IO operation request is cached data comprises:
in the shared queue, determining whether the data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
4. The hierarchical caching method of claim 3, further comprising:
and when the file IO operation request sent by the client to the distributed storage system is monitored, starting the server process, and dynamically constructing the server process instance by using the state of the computing node and the states of other computing nodes adjacent to the computing node.
5. A hierarchical caching method according to claim 3, wherein the process of reading the data from the aggregate cache layer comprises:
redirecting the file IO operation request to the aggregation cache layer through the data thread, so as to read the data from the aggregation cache layer.
6. The hierarchical caching method according to claim 2, wherein the method further comprises, while inserting the received file IO operation request into a shared queue with the server process:
and configuring the shared queue with a mutual exclusion lock.
7. The hierarchical caching method of claim 2, wherein the shared queue is a FIFO queue.
8. The hierarchical caching method according to claim 1, wherein the data corresponding to the file IO operation request includes a file descriptor, a read offset, and a length.
9. The hierarchical caching method of claim 1, further comprising:
and determining the storage location of the data in the aggregation cache layer based on the file path and the computing node to which the data belongs.
10. The hierarchical caching method of claim 1, further comprising:
and broadcasting the file IO operation request to the computing nodes adjacent to the present computing node.
11. The hierarchical caching method of claim 1, further comprising:
determining whether a data set corresponding to the file IO operation request is larger than the total capacity of a local storage medium;
if so, performing a cache eviction operation and a replacement operation.
12. The hierarchical caching method of claim 1, further comprising:
constructing a dynamic link library based on the environment variable; the dynamic link library is used for intercepting the file IO operation request.
13. The hierarchical caching method of claim 1, wherein redirecting the file IO operation request to a server process comprises:
and redirecting the file IO operation request to a server process through a hash algorithm.
14. The hierarchical caching method according to any one of claims 1-13, wherein the aggregate cache layer is a cache layer comprised of high-speed storage media in each of the computing nodes in the distributed storage system.
15. The hierarchical caching method of claim 14, wherein the high-speed storage medium is an NVMe SSD.
16. The hierarchical caching method of claim 14, further comprising:
when a clearing condition is satisfied, clearing the data stored in the high-speed storage medium on the present computing node.
17. A hierarchical caching system for each computing node of a distributed storage system, the hierarchical caching system comprising:
the monitoring module is used for monitoring a file IO operation request sent by a client to the distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored;
the processing module is used for determining, by the server process, whether the target storage location corresponding to the file IO operation request is an aggregation cache layer, and for triggering the first reading module if not, or the second reading module if so;
the first reading module is used for reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system and caching the data to the aggregation caching layer;
and the second reading module is used for reading the data from the aggregation cache layer and returning the data to the client process so that the client process returns the data to the client.
18. An electronic device, comprising:
A memory for storing a computer program;
a processor for implementing the steps of the hierarchical caching method according to any one of claims 1-16 when executing said computer program.
19. A distributed storage system, characterized by comprising a storage bottom layer module and a plurality of nodes, wherein each node comprises a hierarchical client process, a hierarchical server process, and a storage medium, and the storage media of the nodes form an aggregation cache layer, wherein:
the hierarchical client process is used for monitoring a file IO operation request sent by a client, and redirecting the file IO operation request to the hierarchical server process when the file IO operation request is monitored;
the hierarchical server process is configured to determine whether a target storage location corresponding to the file IO operation request is the aggregation cache layer; if not, to read data corresponding to the file IO operation request from the storage bottom layer module and send the data to the aggregation cache layer; and if so, to read the data from the aggregation cache layer and return the data to the hierarchical client process, so that the hierarchical client process returns the data to the client;
The aggregation cache layer is used for storing data sent by the layered server process.
20. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the hierarchical caching method according to any one of claims 1-16.
CN202310220769.3A 2023-03-09 2023-03-09 Hierarchical caching method, hierarchical caching system and related components Active CN116048425B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310220769.3A CN116048425B (en) 2023-03-09 2023-03-09 Hierarchical caching method, hierarchical caching system and related components
PCT/CN2024/080583 WO2024183799A1 (en) 2023-03-09 2024-03-07 Hierarchical caching method and system, and related component

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310220769.3A CN116048425B (en) 2023-03-09 2023-03-09 Hierarchical caching method, hierarchical caching system and related components

Publications (2)

Publication Number Publication Date
CN116048425A true CN116048425A (en) 2023-05-02
CN116048425B CN116048425B (en) 2023-07-14

Family

ID=86127618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310220769.3A Active CN116048425B (en) 2023-03-09 2023-03-09 Hierarchical caching method, hierarchical caching system and related components

Country Status (2)

Country Link
CN (1) CN116048425B (en)
WO (1) WO2024183799A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024183799A1 (en) * 2023-03-09 2024-09-12 浪潮电子信息产业股份有限公司 Hierarchical caching method and system, and related component

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020083120A1 (en) * 2000-12-22 2002-06-27 Soltis Steven R. Storage area network file system
CN101158965A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 File reading system and method of distributed file systems
US20110145499A1 (en) * 2009-12-16 2011-06-16 International Business Machines Corporation Asynchronous file operations in a scalable multi-node file system cache for a remote cluster file system
CN103744975A (en) * 2014-01-13 2014-04-23 锐达互动科技股份有限公司 Efficient caching server based on distributed files
CN104317736A (en) * 2014-09-28 2015-01-28 曙光信息产业股份有限公司 Method for implementing multi-level caches in distributed file system
CN111984191A (en) * 2020-08-05 2020-11-24 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Multi-client caching method and system supporting distributed storage
CN112000287A (en) * 2020-08-14 2020-11-27 北京浪潮数据技术有限公司 IO request processing device, method, equipment and readable storage medium
CN113688113A (en) * 2021-07-28 2021-11-23 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Metadata prefetching system and method for distributed file system
CN113835614A (en) * 2020-09-17 2021-12-24 北京焱融科技有限公司 SSD intelligent caching method and system based on distributed file storage client

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9323615B2 (en) * 2014-01-31 2016-04-26 Google Inc. Efficient data reads from distributed storage systems
US10664405B2 (en) * 2017-11-03 2020-05-26 Google Llc In-memory distributed cache
CN112363676A (en) * 2020-11-18 2021-02-12 无锡江南计算技术研究所 Control method and system based on low access delay distributed storage system
CN116048425B (en) * 2023-03-09 2023-07-14 浪潮电子信息产业股份有限公司 Hierarchical caching method, hierarchical caching system and related components

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020083120A1 (en) * 2000-12-22 2002-06-27 Soltis Steven R. Storage area network file system
CN101158965A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 File reading system and method of distributed file systems
US20110145499A1 (en) * 2009-12-16 2011-06-16 International Business Machines Corporation Asynchronous file operations in a scalable multi-node file system cache for a remote cluster file system
CN103744975A (en) * 2014-01-13 2014-04-23 锐达互动科技股份有限公司 Efficient caching server based on distributed files
CN104317736A (en) * 2014-09-28 2015-01-28 曙光信息产业股份有限公司 Method for implementing multi-level caches in distributed file system
CN111984191A (en) * 2020-08-05 2020-11-24 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Multi-client caching method and system supporting distributed storage
CN112000287A (en) * 2020-08-14 2020-11-27 北京浪潮数据技术有限公司 IO request processing device, method, equipment and readable storage medium
CN113835614A (en) * 2020-09-17 2021-12-24 北京焱融科技有限公司 SSD intelligent caching method and system based on distributed file storage client
CN113688113A (en) * 2021-07-28 2021-11-23 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Metadata prefetching system and method for distributed file system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A. A. Shvidkiy; A. A. Savelieva; A. A. Zarubin: "Caching Methods Analysis for Improving Distributed Storage Systems Performance", 2021 Systems of Signal Synchronization, Generating and Processing in Telecommunications *
Cao Fenghua: "An optimization strategy for small-file access in distributed file systems based on an authorization mechanism", Computer Systems & Applications, no. 07 *
Wang Sheng; Yang Chao; Cui Wei; Huang Gaopan; Zhang Mingming: "Distributed cache based on MongoDB", Computer Systems & Applications, no. 04 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024183799A1 (en) * 2023-03-09 2024-09-12 浪潮电子信息产业股份有限公司 Hierarchical caching method and system, and related component

Also Published As

Publication number Publication date
CN116048425B (en) 2023-07-14
WO2024183799A1 (en) 2024-09-12

Similar Documents

Publication Publication Date Title
CA3027756C (en) Systems and methods for efficient distribution of stored data objects
US10785322B2 (en) Server side data cache system
US9244980B1 (en) Strategies for pushing out database blocks from cache
US11080207B2 (en) Caching framework for big-data engines in the cloud
US11966416B2 (en) Cross-organization and cross-cloud automated data pipelines
WO2024183799A1 (en) Hierarchical caching method and system, and related component
US20210318994A1 (en) Extensible streams for operations on external systems
US11762860B1 (en) Dynamic concurrency level management for database queries
CN116450966A (en) Cache access method and device, equipment and storage medium
JP6406254B2 (en) Storage device, data access method, and data access program
Branagan et al. Understanding the top 5 Redis performance metrics
US11748327B2 (en) Streams using persistent tables
US20240330296A1 (en) Active invalidation of metadata cache entries
KR101345802B1 (en) System for processing rule data and method thereof
CN117608864B (en) Multi-core cache consistency method and system
US11797497B2 (en) Bundle creation and distribution
US20240265010A1 (en) Multi-cluster query result caching
Su et al. SACache: Size-Aware Load Balancing for Large-Scale Storage Systems
KR20230080902A (en) Apparatus for preloading data in distributed computing enviroment and method using the same
CN111737298A (en) Cache data control method and device based on distributed storage
KR20120078372A (en) Metadata server and method of processing file in metadata server and asymmetric clustered file system using the same
Frainer et al. Pervasive File Space: Flexible Application And Context Aware Adaptation in A Pervasive File System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant