CN116048425A - Hierarchical caching method, hierarchical caching system and related components

Info

Publication number
CN116048425A
Authority
CN
China
Prior art keywords
file
data
operation request
hierarchical
client
Prior art date
Legal status
Granted
Application number
CN202310220769.3A
Other languages
Chinese (zh)
Other versions
CN116048425B (en)
Inventor
臧林劼
何怡川
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN202310220769.3A
Publication of CN116048425A
Application granted
Publication of CN116048425B
Priority to PCT/CN2024/080583 (WO2024183799A1)
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 - Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 - Caches characterised by their organisation or structure
    • G06F12/0897 - Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 - Improving I/O performance
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a hierarchical caching method, a hierarchical caching system and related components, which relate to the field of distributed storage. The hierarchical caching method is applied to each computing node of a distributed storage system and comprises the following steps: monitoring, with a client process, file IO operation requests sent by a client to the distributed storage system, and redirecting a file IO operation request to a server process when one is detected; judging, with the server process, whether the target storage location corresponding to the file IO operation request is the aggregation cache layer; if not, reading the data corresponding to the file IO operation request from the bottom layer of the distributed storage system and caching the data in the aggregation cache layer; if yes, reading the data from the aggregation cache layer and returning it to the client process, so that the client process returns the data to the client. The method and system can improve the IO performance of massive small-file data sets and alleviate the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.

Description

Hierarchical caching method, hierarchical caching system and related components
Technical Field
The present disclosure relates to the field of distributed storage, and in particular, to a hierarchical caching method, system, and related components.
Background
With the rapid growth of HPC (High Performance Computing) cluster computing power, large-scale, highly concurrent applications place great strain on distributed storage system IO (Input/Output). Three key elements are involved in high-performance HPC scenarios. First, the data and tag metadata attributes of each high-performance computing node require large-scale, high-concurrency storage IO, of which roughly 80% consists of loading and modifying data sets for HPC training and random data retrieval, following a typical intensive IO model. Second, the HPC high-performance computing process, which includes data preprocessing, data labeling, data compression and the like. Third, synchronous updating for distributed data consistency.
Analyzing the three key elements of the HPC scenario shows that the massive numbers of small files generated by high-performance computing involve highly concurrent IO and random access, which easily saturates the IO read/write performance of a distributed storage system. For example, a common HPC data set generally contains more than 2 million small files of roughly 3000 different types. If the storage IO read/write software stack cannot meet the demands of large-scale HPC operation, high-performance computing services will stall, so the IO performance of the distributed storage system is crucial to high-performance computing services. Existing technical schemes provide some optimizations for improving storage IO performance for high-performance computing, such as prefetching and caching. However, performing large-scale, highly concurrent storage IO in HPC scenarios with existing solutions still faces many technical challenges; for example, read-intensive high-performance IO on small files generates enormous metadata service overhead in the distributed storage system, which degrades data storage efficiency.
Therefore, how to provide a solution to the above technical problem is a problem that a person skilled in the art needs to solve at present.
Disclosure of Invention
The purpose of the application is to provide a hierarchical caching method, a hierarchical caching system and related components, which can improve the IO performance of massive small-file data sets and alleviate the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
In order to solve the above technical problems, the present application provides a hierarchical caching method, which is applied to each computing node of a distributed storage system, and the hierarchical caching method includes:
monitoring a file IO operation request sent by a client to the distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored;
judging whether a target storage position corresponding to the file IO operation request is an aggregation cache layer or not by utilizing the server process;
if not, reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and caching the data to the aggregation caching layer;
if yes, the data is read from the aggregation cache layer, and the data is returned to the client process, so that the client process returns the data to the client.
Optionally, the process of using the server process to determine whether the target storage location corresponding to the file IO operation request is an aggregation cache layer includes:
inserting the received file IO operation request into a shared queue by using the server process;
in the shared queue, determining whether the data corresponding to the file IO operation request is cached data;
if yes, judging the target storage position corresponding to the file IO operation request as an aggregation cache layer.
Optionally, in the sharing queue, the determining whether the data corresponding to the file IO operation request is cached data includes:
in the shared queue, determining whether the data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
Optionally, the hierarchical caching method further includes:
and when the file IO operation request sent by the client to the distributed storage system is monitored, starting the server process, and dynamically constructing the server process instance by using the state of the computing node and the states of other computing nodes adjacent to the computing node.
Optionally, the process of reading the data from the aggregation cache layer includes:
redirecting the file IO operation request to the aggregation cache layer through a data thread so as to read the data from the aggregation cache layer.
Optionally, the method for hierarchical caching further includes, while inserting the received file IO operation request into a shared queue by using the server process:
and configuring the shared queue with a mutual exclusion lock.
Optionally, the shared queue is a FIFO queue.
Optionally, the data corresponding to the file IO operation request includes a file descriptor, a read offset, and a length.
Optionally, the hierarchical caching method further includes:
and determining the storage position of the data in the aggregation cache layer based on the file path and the computing node.
Optionally, the hierarchical caching method further includes:
and broadcasting the file IO operation request to the computing nodes adjacent to the computing node.
Optionally, the hierarchical caching method further includes:
judging whether a data set corresponding to the file IO operation request is larger than the total capacity of a local storage medium or not;
if yes, performing a cache eviction operation and a replacement operation.
Optionally, the hierarchical caching method further includes:
constructing a dynamic link library based on the environment variable; the dynamic link library is used for intercepting the file IO operation request.
Optionally, the redirecting the file IO operation request to the server process includes:
and redirecting the file IO operation request to a server process through a hash algorithm.
Optionally, the aggregate cache layer is a cache layer formed by high-speed storage media in each computing node in the distributed storage system.
Optionally, the high-speed storage medium is an NVMe SSD.
Optionally, the hierarchical caching method further includes:
when the clearing condition is satisfied, the data stored in the high-speed storage medium on the computing node is cleared.
In order to solve the above technical problem, the present application further provides a hierarchical cache system, which is applied to each computing node of a distributed storage system, where the hierarchical cache system includes:
the monitoring module is used for monitoring a file IO operation request sent by a client to the distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored;
the processing module is used for judging whether the target storage position corresponding to the file IO operation request is an aggregation cache layer or not by utilizing the server process, if not, triggering the first reading module, and if so, triggering the second reading module;
the first reading module is used for reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system and caching the data to the aggregation caching layer;
and the second reading module is used for reading the data from the aggregation cache layer and returning the data to the client process so that the client process returns the data to the client.
In order to solve the above technical problem, the present application further provides an electronic device, including:
a memory for storing a computer program;
a processor for implementing the steps of the hierarchical caching method as claimed in any one of the preceding claims when executing said computer program.
In order to solve the technical problem, the present application further provides a distributed storage system, including a storage bottom layer module and a plurality of nodes, each node includes a layered client process, a layered server process and a storage medium, and the storage medium of each node forms an aggregation cache layer, where:
The hierarchical client process is used for monitoring a file IO operation request sent by a client, and redirecting the file IO operation request to the hierarchical server process when the file IO operation request is monitored;
the hierarchical server process is configured to determine whether a target storage location corresponding to the file IO operation request is the aggregation cache layer, if not, read data corresponding to the file IO operation request from the storage bottom layer module, and send the data to the aggregation cache layer, if yes, read the data from the aggregation cache layer, and return the data to the hierarchical client process, so that the hierarchical client process returns the data to the client;
the aggregation cache layer is used for storing data sent by the layered server process.
To solve the above technical problem, the present application further provides a computer readable storage medium, on which a computer program is stored, the computer program implementing the steps of the hierarchical caching method as described in any one of the above when being executed by a processor.
The application provides a hierarchical caching method: after the client process detects a file IO operation request, it redirects the request to the server process; the server process first searches for the file in the aggregation cache layer, and only when the aggregation cache layer misses does it retrieve the data from the bottom layer of the distributed storage system. This improves the IO performance of massive small-file data sets and alleviates the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services. The application also provides a hierarchical cache system, an electronic device, a distributed storage system and a computer-readable storage medium, which have the same beneficial effects as the hierarchical caching method.
Drawings
For a clearer description of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described, it being apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating steps of a hierarchical caching method provided in the present application;
FIG. 2 is a schematic architecture diagram of a hierarchical cache system provided herein;
FIG. 3 is a schematic diagram of a hierarchical cache framework of a distributed storage system provided herein;
fig. 4 is a schematic structural diagram of a hierarchical cache system provided in the present application.
Detailed Description
The core of the application is to provide a hierarchical caching method, a hierarchical caching system and related components, which can improve the IO performance of massive small-file data sets and alleviate the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a hierarchical caching method provided in the present application, where the hierarchical caching method includes:
s101: monitoring a file IO operation request sent by a client to a distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored;
the embodiment provides an aggregation cache layer, which is a transparent read-only cache layer, aiming at a typical intensive IO model
Figure SMS_2
Constructing a cache by aggregating distributed cluster nodesThe local and adjacent nodes store locally to accelerate the performance of reading the stored data, so as to improve the IO performance of massive small file data sets.
The client accesses the distributed storage system through the POSIX file system interface provided by the distributed storage system, accelerating storage IO access performance in HPC high-performance scenarios, which are characterized by read-only data with a high re-read rate and a typical intensive IO model. Referring to fig. 2, the architecture of the hierarchical cache system provided in the present application consists of two main components: a hierarchical cache client process and a hierarchical cache server process. When a job is distributed over a group of computing nodes on the HPC cluster, the hierarchical cache server process is started, and a server process instance is dynamically constructed using the local storage of the distributed storage node and its neighboring nodes. Each node of the distributed storage system deploys a hierarchical cache client process and a server process, and these processes cache the data requested by HPC high-performance computing jobs on the node's NVMe SSD high-speed storage device.
Specifically, to describe the hierarchical caching process of the present application, referring to fig. 3, the client process is preloaded first, and file system operations such as open, read and close are monitored and intercepted by the client process. Because file system calls are intercepted by this process, no modification is required to existing high-performance computing applications or to the underlying file system of the distributed storage system. It can be understood that the hierarchical cache client process consists of a file system IO interface forwarding module; this interface captures file system calls to the distributed storage system and redirects them to the corresponding hierarchical cache server process, so that hit data is first read through the aggregation cache layer, which serves the performance requirements of high-performance services.
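To make the preloading mechanism concrete, the following is a minimal C sketch of intercepting read() on Linux via LD_PRELOAD and dlsym(RTLD_NEXT). It is an illustration rather than the patent's code; forward_to_server is a hypothetical stand-in for the RPC redirection to the hierarchical cache server process, assumed to return a negative value when the request is not handled there.

```c
#define _GNU_SOURCE   /* for RTLD_NEXT */
#include <dlfcn.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook into the hierarchical cache client logic. */
extern ssize_t forward_to_server(int fd, void *buf, size_t count);

static ssize_t (*real_read)(int, void *, size_t);

ssize_t read(int fd, void *buf, size_t count)
{
    /* Resolve the libc implementation once, on first use. */
    if (real_read == NULL)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");

    ssize_t n = forward_to_server(fd, buf, count);
    if (n >= 0)
        return n;                     /* served via the aggregation cache */

    return real_read(fd, buf, count); /* fall back to the real read() */
}
```

Compiled as a shared object (e.g. the only_read_performance.so named later in this description) and preloaded, such a library interposes on every read() the application issues without recompiling the application.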
S102: judging whether a target storage position corresponding to the file IO operation request is an aggregation cache layer or not by utilizing a server process, if not, executing S103, and if so, executing S104;
s103: reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and caching the data to an aggregation caching layer;
s104: the data is read from the aggregate cache layer and returned to the client process so that the client process returns the data to the client.
After receiving a file IO operation request intercepted by the client process, the server process retrieves the file from the bottom layer of the distributed storage system only when the cache misses. The following description covers two read scenarios: the first read and subsequent (non-first) reads.
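The hit/miss decision just described can be summarized in a short C sketch. The helpers cache_lookup, underlying_read and cache_insert are hypothetical stand-ins for the aggregation cache layer and the storage bottom layer, assumed here only for illustration.

```c
#include <stddef.h>
#include <sys/types.h>

typedef struct {
    int    fd;       /* file descriptor */
    off_t  offset;   /* read offset */
    size_t length;   /* read length */
} io_request;

/* Hypothetical helpers: cache_lookup returns 0 on a hit and fills buf. */
extern int     cache_lookup(int fd, off_t offset, size_t length, void *buf);
extern ssize_t underlying_read(int fd, void *buf, size_t length, off_t offset);
extern void    cache_insert(int fd, off_t offset, const void *buf, size_t length);

ssize_t serve_request(const io_request *req, void *buf)
{
    /* Hit: the data already resides in the aggregation cache layer. */
    if (cache_lookup(req->fd, req->offset, req->length, buf) == 0)
        return (ssize_t)req->length;

    /* Miss: read from the distributed storage bottom layer, then
     * populate the aggregation cache for subsequent reads. */
    ssize_t n = underlying_read(req->fd, buf, req->length, req->offset);
    if (n > 0)
        cache_insert(req->fd, req->offset, buf, (size_t)n);
    return n;
}
```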
For the first read:
In an HPC high-performance scenario, a computing node client initiates a read request to a data set directory on the distributed storage system. The hierarchical cache client process intercepts any incoming file IO operation request and starts tracking it in the data set directory; the RPC (Remote Procedure Call) handler of the hierarchical cache client process redirects the file IO operation request to the corresponding hierarchical cache server process, and the internal management RPC handlers of the hierarchical cache client and server processes are responsible for sending and receiving messages over the network.
As an optional embodiment, the process of determining, by using the server process, whether the target storage location corresponding to the file IO operation request is an aggregate cache layer includes: inserting the received file IO operation request into a shared queue by using a server process; in the shared queue, determining whether data corresponding to the file IO operation request is cached data; if yes, the target storage position corresponding to the file IO operation request is judged to be an aggregation cache layer.
As an optional embodiment, in the shared queue, the process of determining whether the data corresponding to the file IO operation request is cached data includes: in the shared queue, determining whether data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
Specifically, when the hierarchical cache server process receives a file IO operation request, its RPC handler inserts the forwarded file IO into a shared FIFO (First In First Out) queue. In the shared FIFO queue, a move-data thread checks whether the file is cached; because the file is being read for the first time, the data needs to be pulled into the aggregation cache, and the cached file descriptor, read offset and length are determined.
For non-first reads:
as an alternative embodiment, the process of reading data from the aggregate cache layer includes:
the file IO operation request is redirected to the aggregate cache layer by the data thread to read data from the aggregate cache layer.
Specifically, the data thread redirects the IO to the aggregation cache to read the file and returns the file descriptor to the corresponding hierarchical cache client process, which then returns the file descriptor, read offset, length and other data to the HPC application, i.e. the HPC high-performance client request end.
It can be seen that, in this embodiment, after the client process detects a file IO operation request, it redirects the request to the server process; the server process first searches for the file in the aggregation cache layer and retrieves the data from the bottom layer of the distributed storage system only when the aggregation cache layer misses, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
Based on the above embodiments:
as an optional embodiment, while inserting the received file IO operation request into the shared queue by using the server process, the hierarchical caching method further includes:
the shared queue is configured with a mutex lock.
Specifically, it is contemplated that multiple hierarchical cache client processes may request the same file at the same time; a mutual exclusion lock is therefore used on the shared queue to ensure consistency and avoid duplicate copies of a file entering the aggregation cache.
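A minimal C sketch of the mutex-protected shared FIFO queue between the RPC handler and the move-data thread follows. The fixed-capacity ring buffer and the condition variable that wakes the move-data thread are implementation assumptions, not details fixed by the patent.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>

typedef struct {
    int    fd;
    off_t  offset;
    size_t length;
} io_request;

#define QUEUE_CAP 1024

static io_request queue[QUEUE_CAP];
static size_t head = 0, tail = 0, count = 0;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_nonempty = PTHREAD_COND_INITIALIZER;

/* Called by the RPC handler when a forwarded file IO request arrives. */
int enqueue_request(const io_request *req)
{
    pthread_mutex_lock(&queue_lock);
    if (count == QUEUE_CAP) {            /* queue full: caller must retry */
        pthread_mutex_unlock(&queue_lock);
        return -1;
    }
    queue[tail] = *req;
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&queue_nonempty);
    pthread_mutex_unlock(&queue_lock);
    return 0;
}

/* Called by the move-data thread: blocks until a request is available. */
io_request dequeue_request(void)
{
    pthread_mutex_lock(&queue_lock);
    while (count == 0)
        pthread_cond_wait(&queue_nonempty, &queue_lock);
    io_request req = queue[head];
    head = (head + 1) % QUEUE_CAP;
    count--;
    pthread_mutex_unlock(&queue_lock);
    return req;
}
```

Because both ends take the same lock, concurrent requests for the same file are serialized, which is how duplicate copies of a file are kept out of the aggregation cache.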
As an alternative embodiment, the hierarchical caching method further includes:
the storage position of the data in the aggregation cache layer is determined based on the file path and the affiliated computing node.
As an alternative embodiment, the hierarchical caching method further includes:
And broadcasting the file IO operation request to the computing nodes adjacent to the computing node.
Specifically, the hierarchical caching process broadcasts a request to find a file to neighboring nodes, helping to balance the load pressure between the nodes.
As an alternative embodiment, the hierarchical caching method further includes:
judging whether a data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium or not;
if yes, performing a cache eviction operation and a replacement operation.
Specifically, if the data set is larger than the total capacity of the node's local cache, the aggregation cache elimination mechanism performs cache eviction and replacement based on repeated read-data-set operations.
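A short sketch of that capacity check in C, assuming a least-recently-read eviction order; the patent states only that eviction and replacement occur when the data set exceeds local capacity, so the helper names and policy are illustrative.

```c
#include <stddef.h>

/* Hypothetical cache accounting helpers. */
extern size_t cache_used_bytes(void);
extern size_t cache_capacity_bytes(void);   /* total local NVMe capacity */
extern void   evict_least_recently_read(void);

void ensure_capacity(size_t incoming_bytes)
{
    /* Evict until the incoming data fits in the node's local cache. */
    while (cache_used_bytes() + incoming_bytes > cache_capacity_bytes())
        evict_least_recently_read();
}
```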
As an alternative embodiment, the hierarchical caching method further includes:
constructing a dynamic link library based on the environment variable; the dynamic link library is used for intercepting file IO operation requests.
Specifically, this embodiment implements the redirection as a read-only aggregation cache, i.e., an intercept-read-IO mechanism. Based on an initial prototype of high-performance jobs on the distributed storage system, the aggregation cache layer of this embodiment helps analyze high-performance service scenarios, in particular the IO read calls among the three key elements, in order to understand how the data loader in a framework accesses files. This embodiment designs the aggregation cache layer to intercept the relevant IO function calls, built on a mechanism that selectively loads the same functions from different dynamic link libraries; this mechanism avoids forcing the application to modify its code base to support the aggregation cache layer.
Specifically, the intercept-read-IO mechanism of this embodiment redirects to a dynamic link library, only_read_performance.so. The technical advantage of this dynamic link library is that once a function in the library changes, the change is transparent to the executable program, which does not need to be recompiled. For statically linked programs, by contrast, a small change in a function library requires recompiling and releasing the entire program: static linking compiles all referenced functions and variables into the executable file, whereas dynamic linking loads the function library dynamically at program run time, i.e., run-time linking, rather than compiling the functions into the executable. Redirecting to the dynamic link library only_read_performance.so therefore provides compatibility and portability, which is of significant value for a distributed storage system.
The specific steps of the intercept-read-IO mechanism in this embodiment are as follows:
(1) HPC high-performance computing job clients issue file system requests that satisfy standard POSIX semantics and follow the typical intensive IO model, calling into the underlying distributed storage file system;
(2) The LD_PRELOAD environment variable of the Linux servers of the distributed storage system is used; its characteristic is that the dynamic library it names is loaded with the highest priority, making it this embodiment's method for intercepting the read-request processing logic;
(3) Inputs of the intercept-read-IO mechanism:
a. a file system call;
b. the LD_PRELOAD environment variable;
c. the dynamic link library in the local aggregation cache layer, denoted only_read_performance.so;
(4) Output of the intercept-read-IO mechanism: only_read_performance.so is executed, and read-cache logic processing is performed at the cache aggregation layer.
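As a usage illustration (the path and program name are hypothetical), a job could be launched with the library preloaded, e.g. LD_PRELOAD=/opt/cache/only_read_performance.so ./hpc_job, so that the read calls issued by the unmodified application resolve first to the interposed functions of the aggregation cache library, as in the interception sketch above, before falling through to the underlying distributed file system.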
As an alternative embodiment, the process of redirecting the file IO operation request to the server process includes:
and redirecting the file IO operation request to the server process through a hash algorithm.
Redirection through the hash algorithm avoids the bottleneck of metadata lookup, with the aim of improving random read performance. The hierarchical cache client process uses hash-based IO redirection to locate the cache on the hierarchical cache server process, so that cached file metadata does not have to be stored in a distributed metadata store or an in-memory database. In the aggregation cache, the file cache location is determined from the file path and the node it belongs to.
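To make the hash-based redirection concrete, the following C sketch derives a cache server index from the file path alone, so no metadata service is consulted. FNV-1a and the modulo mapping are illustrative assumptions; the patent does not name a specific hash algorithm.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative hash: FNV-1a over the file path (assumed choice). */
static uint64_t fnv1a_hash(const char *s)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a 64-bit offset basis */
    while (*s != '\0') {
        h ^= (uint64_t)(unsigned char)*s++;
        h *= 1099511628211ULL;             /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Map a file path to one of n_servers hierarchical cache server
 * processes; every client computes the same mapping independently. */
size_t server_for_path(const char *path, size_t n_servers)
{
    return (size_t)(fnv1a_hash(path) % n_servers);
}
```

Because the mapping is pure computation over the path, lookups bypass the distributed metadata store entirely, which is the random-read benefit this passage describes.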
As an alternative embodiment, the aggregate cache layer is a cache layer comprised of high-speed storage media in each computing node in the distributed storage system.
As an alternative embodiment, the high-speed storage medium is an NVMe (Non-Volatile Memory Express) SSD (Solid State Disk).
As an alternative embodiment, the hierarchical caching method further includes:
when the clearing condition is satisfied, the data stored in the medium and high speed storage medium on the present computing node is cleared.
In this embodiment, the lifecycle of the data set in the cache is coupled to the lifecycle of the job on the HPC high performance client, and after the job is completed, the cached data set is purged from the node local storage.
In summary, the metadata module and data module storage mechanisms of the aggregation cache layer and of the distributed storage system provided in the present application are independent of each other. The aggregation cache layer is a transparent read-only cache layer built for the typical intensive IO model: based on an RPC remote procedure call mechanism, it aggregates the local storage of distributed cluster nodes and of neighboring nodes to accelerate the reading of stored data, thereby improving the IO performance of massive small-file data sets. A distributed hash algorithm with IO redirection determines the cache location of a data request, and a highest-priority dynamic link library, only_read_performance.so, designed around the LD_PRELOAD environment variable, intercepts read IO requests, avoiding the bottleneck caused by metadata lookup in the distributed storage system, with the aim of improving random read performance. At the same time, the aggregation cache layer and the distributed storage system are independent of each other, giving the scheme portability and universality.
In terms of performance, the scheme effectively improves the high-concurrency IO model of HPC high-performance scenarios and alleviates the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services. In terms of stability, the cache aggregation layer is independent of the underlying distributed storage system: if a storage medium failure occurs in the cache aggregation layer, normal service calls are not affected. In terms of safety, the hierarchical cache architecture is loosely coupled with the distributed storage system, avoiding security risks. In terms of cost, solving the general storage IO performance problem of HPC high-performance scenarios can improve the competitiveness and reduce the maintenance cost of distributed file storage. In terms of compatibility, the scheme is portable and universal, requires no modification of HPC job applications, improves the linear scalability of distributed clusters, and is compatible with common features such as file quotas and snapshots.
In a second aspect, referring to fig. 4, fig. 4 is a schematic structural diagram of a hierarchical cache system provided in the present application, which is applied to each computing node of a distributed storage system, the hierarchical cache system includes:
the monitoring module 41 is configured to monitor, by using a client process, a file IO operation request sent by a client to the distributed storage system, and redirect the file IO operation request to a server process when the file IO operation request is monitored;
the processing module 42 is configured to determine, by using a server process, whether a target storage location corresponding to the file IO operation request is an aggregation cache layer, and if not, trigger the first reading module 43, and if yes, trigger the second reading module 44;
the first reading module 43 is configured to read data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and cache the data to the aggregation cache layer;
a second reading module 44, configured to read the data from the aggregation cache layer and return the data to the client process, so that the client process returns the data to the client.
This embodiment provides an aggregation cache layer, a transparent read-only cache layer built for the typical intensive IO model. The cache is constructed by aggregating the local storage of the distributed cluster nodes and of their neighboring nodes to accelerate the reading of stored data, thereby improving the IO performance of massive small-file data sets.
The client accesses the distributed storage system through the POSIX file system interface provided by the distributed storage system, accelerating storage IO access performance in HPC high-performance scenarios, which are characterized by read-only data with a high re-read rate and a typical intensive IO model. Referring to fig. 2, the architecture of the hierarchical cache system provided in the present application consists of two main components: a hierarchical cache client process and a hierarchical cache server process. When a job is distributed over a group of computing nodes on the HPC cluster, the hierarchical cache server process is started, and a server process instance is dynamically constructed using the local storage of the distributed storage node and its neighboring nodes. Each node of the distributed storage system deploys a hierarchical cache client process and a server process, and these processes cache the data requested by HPC high-performance computing jobs on the node's NVMe SSD high-speed storage device.
Specifically, to describe the hierarchical caching process of the present application, referring to fig. 3, the client process is preloaded first, and file system operations such as open, read and close are monitored and intercepted by the client process. Because file system calls are intercepted by this process, no modification is required to existing high-performance computing applications or to the underlying file system of the distributed storage system. It can be understood that the hierarchical cache client process consists of a file system IO interface forwarding module; this interface captures file system calls to the distributed storage system and redirects them to the corresponding hierarchical cache server process, so that hit data is first read through the aggregation cache layer, which serves the performance requirements of high-performance services.
After receiving a file IO operation request intercepted by the client process, the server process retrieves the file from the bottom layer of the distributed storage system only when the cache misses. The following description covers two read scenarios: the first read and subsequent (non-first) reads.
For the first read:
In an HPC high-performance scenario, a computing node client initiates a read request to a data set directory on the distributed storage system. The hierarchical cache client process intercepts any incoming file IO operation request and starts tracking it in the data set directory; the RPC (Remote Procedure Call) handler of the hierarchical cache client process redirects the file IO operation request to the corresponding hierarchical cache server process, and the internal management RPC handlers of the hierarchical cache client and server processes are responsible for sending and receiving messages over the network.
As an optional embodiment, the process of determining, by using the server process, whether the target storage location corresponding to the file IO operation request is an aggregate cache layer includes: inserting the received file IO operation request into a shared queue by using a server process; in the shared queue, determining whether data corresponding to the file IO operation request is cached data; if yes, the target storage position corresponding to the file IO operation request is judged to be an aggregation cache layer. As an optional embodiment, in the shared queue, the process of determining whether the data corresponding to the file IO operation request is cached data includes: in the shared queue, determining whether data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
Specifically, when the hierarchical cache server process receives a file IO operation request, its RPC handler inserts the forwarded file IO into a shared FIFO queue. In the shared FIFO queue, a move-data thread checks whether the file is cached; because the file is being read for the first time, the data needs to be pulled into the aggregation cache, and the cached file descriptor, read offset and length are determined.
For non-first reads:
as an alternative embodiment, the process of reading data from the aggregate cache layer includes:
the file IO operation request is redirected to the aggregate cache layer by the data thread to read data from the aggregate cache layer.
Specifically, the data thread redirects the IO to the aggregation cache to read the file and returns the file descriptor to the corresponding hierarchical cache client process, which then returns the file descriptor, read offset, length and other data to the HPC application, i.e. the HPC high-performance client request end.
It can be seen that, in this embodiment, after the client process detects a file IO operation request, it redirects the request to the server process; the server process first searches for the file in the aggregation cache layer and retrieves the data from the bottom layer of the distributed storage system only when the aggregation cache layer misses, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
As an optional embodiment, the process of determining, by using the server process, whether the target storage location corresponding to the file IO operation request is an aggregate cache layer includes:
inserting the received file IO operation request into a shared queue by using a server process;
in the shared queue, determining whether data corresponding to the file IO operation request is cached data;
if yes, the target storage position corresponding to the file IO operation request is judged to be an aggregation cache layer.
As an optional embodiment, in the shared queue, the process of determining whether the data corresponding to the file IO operation request is cached data includes:
in the shared queue, determining whether data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
As an alternative embodiment, the hierarchical caching system further comprises:
and the preprocessing module is used for starting a server process when the monitoring client sends a file IO operation request to the distributed storage system, and dynamically constructing a server process instance by utilizing the state of the computing node and the states of other computing nodes adjacent to the computing node.
As an alternative embodiment, the process of reading data from the aggregate cache layer includes:
the file IO operation request is redirected to the aggregate cache layer by the data thread to read data from the aggregate cache layer.
As an optional embodiment, while inserting the received file IO operation request into the shared queue by the server process, the hierarchical cache system further includes:
and the configuration module is used for configuring the shared queue with the mutual exclusion lock.
As an alternative embodiment, the shared queue is a FIFO queue.
As an alternative embodiment, the data corresponding to the file IO operation request includes a file descriptor, a read offset, and a length.
As an alternative embodiment, the hierarchical caching system further comprises:
and the determining module is used for determining the storage position of the data in the aggregation cache layer based on the file path and the computing node.
As an alternative embodiment, the hierarchical caching system further comprises:
and the broadcasting module is used for broadcasting the file IO operation request to the computing nodes adjacent to the computing node.
As an alternative embodiment, the hierarchical caching system further comprises:
and the judging module is used for judging whether the data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium, and if so, executing cache eviction operation and replacement operation.
As an alternative embodiment, the hierarchical caching system further comprises:
the construction module is used for constructing a dynamic link library based on the environment variable; the dynamic link library is used for intercepting file IO operation requests.
As an alternative embodiment, the process of redirecting the file IO operation request to the server process includes:
and redirecting the file IO operation request to the server process through a hash algorithm.
As an alternative embodiment, the aggregate cache layer is a cache layer comprised of high-speed storage media in each computing node in the distributed storage system.
As an alternative embodiment, the high-speed storage medium is an NVMe SSD.
As an alternative embodiment, the hierarchical caching system further comprises:
and the clearing module is used for clearing the data stored in the high-speed storage medium on the computing node when the clearing condition is met.
In a third aspect, the present application further provides an electronic device, including:
a memory for storing a computer program;
a processor for implementing the steps of the hierarchical caching method as any one of the above when executing a computer program.
Specifically, the memory includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer readable instructions, and the internal memory provides an environment for the operating system and the execution of the computer readable instructions in the non-volatile storage medium. The processor provides computing and control capabilities for the electronic device, and when executing the computer program stored in the memory, the following steps may be implemented: monitoring a file IO operation request sent by a client to a distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored; judging whether a target storage position corresponding to the file IO operation request is an aggregation cache layer or not by utilizing a server process; if not, reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and caching the data to an aggregation cache layer; if yes, reading the data from the aggregation cache layer, and returning the data to the client process so that the client process returns the data to the client.
It can be seen that, in this embodiment, after the client process detects a file IO operation request, it redirects the request to the server process; the server process first searches for the file in the aggregation cache layer and retrieves the data from the bottom layer of the distributed storage system only when the aggregation cache layer misses, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: inserting the received file IO operation request into a shared queue by using a server process; in the shared queue, determining whether data corresponding to the file IO operation request is cached data; if yes, the target storage position corresponding to the file IO operation request is judged to be an aggregation cache layer.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: in the shared queue, determining whether data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: and when the monitoring client sends a file IO operation request to the distributed storage system, starting a server process, and dynamically constructing a server process instance by using the state of the computing node and the states of other computing nodes adjacent to the computing node.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: the file IO operation request is redirected to the aggregate cache layer by the data thread to read data from the aggregate cache layer.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: the shared queue is configured with a mutex lock.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: the storage position of the data in the aggregation cache layer is determined based on the file path and the affiliated computing node.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: and broadcasting the file IO operation request to the computing nodes adjacent to the computing node.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: judging whether a data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium or not; if yes, performing a cache eviction operation and a replacement operation.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: constructing a dynamic link library based on the environment variable; the dynamic link library is used for intercepting file IO operation requests.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: and redirecting the file IO operation request to the server process through a hash algorithm.
As an alternative embodiment, the processor may implement the following steps when executing the computer subroutine stored in the memory: when the clearing condition is satisfied, the data stored in the high-speed storage medium on the computing node is cleared.
On the basis of the above embodiment, as a preferred implementation manner, the electronic device further includes:
the input interface is connected with the processor and used for acquiring the externally imported computer programs, parameters and instructions, and the externally imported computer programs, parameters and instructions are controlled by the processor and stored in the memory. The input interface may be coupled to an input device for receiving parameters or instructions manually entered by a user. The input device can be a touch layer covered on a display screen, or can be a key, a track ball or a touch pad arranged on a terminal shell.
And the display unit is connected with the processor and used for displaying the data sent by the processor. The display unit may be a liquid crystal display or an electronic ink display, etc.
And the network port is connected with the processor and used for carrying out communication connection with external terminal equipment. The communication technology adopted by the communication connection can be a wired communication technology or a wireless communication technology, such as a mobile high definition link technology (MHL), a Universal Serial Bus (USB), a High Definition Multimedia Interface (HDMI), a wireless fidelity technology (WiFi), a Bluetooth communication technology with low power consumption, a communication technology based on IEEE802.11s, and the like.
In a fourth aspect, the present application further provides a distributed storage system, including a storage bottom layer module and a plurality of nodes, each node includes a hierarchical client process, a hierarchical server process, and a storage medium, where the storage medium of each node forms an aggregate cache layer, where:
the layering client process is used for monitoring a file IO operation request sent by the client, and redirecting the file IO operation request to the layering server process when the file IO operation request is monitored;
the hierarchical server process is used for judging whether a target storage position corresponding to the file IO operation request is an aggregation cache layer or not, if not, reading data corresponding to the file IO operation request from the storage bottom layer module, sending the data to the aggregation cache layer, and if so, reading the data from the aggregation cache layer, and returning the data to the hierarchical client process, so that the hierarchical client process returns the data to the client;
And the aggregation cache layer is used for storing data sent by the layered server process.
In a fifth aspect, the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the hierarchical caching method as described in any one of the above.
In particular, the computer-readable storage medium may include: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes. The storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of: monitoring a file IO operation request sent by a client to a distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored; judging whether a target storage position corresponding to the file IO operation request is an aggregation cache layer or not by utilizing a server process; if not, reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and caching the data to an aggregation cache layer; if yes, reading the data from the aggregation cache layer, and returning the data to the client process so that the client process returns the data to the client.
It can be seen that, in this embodiment, after the client process detects a file IO operation request, it redirects the request to the server process; the server process first searches for the file in the aggregation cache layer and retrieves the data from the bottom layer of the distributed storage system only when the aggregation cache layer misses, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system services.
As an alternative embodiment, the following steps may be implemented in particular when a computer subroutine stored in a computer readable storage medium is executed by a processor: inserting the received file IO operation request into a shared queue by using a server process; in the shared queue, determining whether data corresponding to the file IO operation request is cached data; if yes, the target storage position corresponding to the file IO operation request is judged to be an aggregation cache layer.
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: determining, in the shared queue, whether the data corresponding to the file IO operation request is cached data through a data thread, where the data thread is a thread generated when the server process instance is constructed.
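As a minimal sketch of this arrangement, the data thread may be spawned when the server process instance is constructed; all names below (server_instance, data_thread_main) are illustrative, not part of the disclosure:

#include <pthread.h>

/* The server process instance owns a data thread that is generated when
 * the instance is constructed; the thread later checks the shared queue
 * to decide whether requested data is already cached. */
typedef struct {
    pthread_t data_thread;
    /* node state, the shared queue, etc. would live here */
} server_instance;

static void *data_thread_main(void *arg) {
    server_instance *s = (server_instance *)arg;
    (void)s;
    /* loop: pop requests from the shared queue and test whether the
     * corresponding data is cached in the aggregation cache layer */
    return NULL;
}

int server_instance_init(server_instance *s) {
    /* the data thread is created as part of instance construction */
    return pthread_create(&s->data_thread, NULL, data_thread_main, s);
}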
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: when it is detected that the client sends a file IO operation request to the distributed storage system, starting the server process, and dynamically constructing the server process instance using the state of the present computing node and the states of other computing nodes adjacent to it.
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: redirecting the file IO operation request to the aggregation cache layer through the data thread, so as to read the data from the aggregation cache layer.
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: configuring the shared queue with a mutex lock.
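A minimal sketch of such a mutex-protected shared queue follows; the request fields mirror the file descriptor, read offset, and length mentioned elsewhere in this document, while the struct and function names are illustrative (claim 7 notes the queue is FIFO):

#include <pthread.h>
#include <stddef.h>

/* FIFO shared queue guarded by a mutex: the client-facing side pushes
 * file IO operation requests, the data thread pops them in order. */
typedef struct io_request {
    int    fd;       /* file descriptor           */
    long   offset;   /* read offset               */
    size_t length;   /* number of bytes requested */
    struct io_request *next;
} io_request;

typedef struct {
    io_request     *head, *tail;
    pthread_mutex_t lock;
} shared_queue;

void queue_init(shared_queue *q) {
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->lock, NULL);
}

void queue_push(shared_queue *q, io_request *r) {
    r->next = NULL;
    pthread_mutex_lock(&q->lock);   /* mutual exclusion on the queue */
    if (q->tail) q->tail->next = r; else q->head = r;
    q->tail = r;
    pthread_mutex_unlock(&q->lock);
}

io_request *queue_pop(shared_queue *q) {
    pthread_mutex_lock(&q->lock);
    io_request *r = q->head;
    if (r) {
        q->head = r->next;
        if (!q->head) q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return r;                       /* NULL when the queue is empty */
}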
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: determining the storage location of the data in the aggregation cache layer based on the file path and the computing node to which the data belongs.
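As an illustration of such a placement rule, the cache location could be keyed on the owning node and the original file path; the "<cache root>/node<id>/<path>" layout below is an assumption for the sketch, not the patented scheme:

#include <stdio.h>

/* Derive a location in the aggregation cache layer from the file path and
 * the computing node the data belongs to. Naive: the original path is
 * reused verbatim under a per-node directory. */
void cache_location(char *out, size_t outlen,
                    const char *cache_root, int node_id, const char *path) {
    snprintf(out, outlen, "%s/node%d/%s", cache_root, node_id, path);
}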
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: broadcasting the file IO operation request to the computing nodes adjacent to the present computing node.
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: determining whether the data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium; and if so, performing a cache eviction operation and a replacement operation.
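A sketch of this capacity check follows; evict_and_replace is a hypothetical hook, since the disclosure requires eviction and replacement but does not fix a particular policy (such as LRU):

#include <stddef.h>

/* Hypothetical hook standing in for the cache eviction and replacement
 * operations; declaration only. */
void evict_and_replace(size_t bytes_to_free);

/* If the data set for the request exceeds the total capacity of the
 * local storage medium, evict and replace before caching. */
void check_capacity(size_t dataset_size, size_t local_capacity) {
    if (dataset_size > local_capacity)
        evict_and_replace(dataset_size - local_capacity);
}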
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: constructing a dynamic link library based on an environment variable, where the dynamic link library is used to intercept file IO operation requests.
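The disclosure does not name the mechanism, but on Linux the conventional way to build such an interception library around an environment variable is an LD_PRELOAD shim; a minimal sketch under that assumption, where redirect_to_server is a hypothetical hook into the hierarchical client logic:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>

/* Interposer shared library that intercepts read() calls.
 * Build:    gcc -shared -fPIC -o libintercept.so intercept.c -ldl
 * Activate: LD_PRELOAD=./libintercept.so <application>            */

static ssize_t (*real_read)(int, void *, size_t) = NULL;

/* Placeholder: would hand the request to the hierarchical client process;
 * returns a negative value when the request is not handled there. */
static ssize_t redirect_to_server(int fd, void *buf, size_t count) {
    (void)fd; (void)buf; (void)count;
    return -1;
}

ssize_t read(int fd, void *buf, size_t count) {
    ssize_t n = redirect_to_server(fd, buf, count);
    if (n >= 0)
        return n;                      /* served via the cache path      */
    if (!real_read)                    /* resolve the libc symbol once   */
        real_read = (ssize_t (*)(int, void *, size_t))
                        dlsym(RTLD_NEXT, "read");
    return real_read(fd, buf, count);  /* fall back to the original call */
}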
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: redirecting the file IO operation request to the server process through a hash algorithm.
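For illustration, such redirection can hash the file path and take the result modulo the number of server processes; FNV-1a is used here purely as an example, since the disclosure does not specify which hash algorithm performs the redirection:

#include <stdint.h>
#include <stddef.h>

/* 64-bit FNV-1a over the file path. */
static uint64_t fnv1a64(const char *s) {
    uint64_t h = 14695981039346656037ULL;  /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Index of the server process that should receive this request. */
size_t pick_server(const char *path, size_t n_servers) {
    return (size_t)(fnv1a64(path) % n_servers);
}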
As an alternative embodiment, the following steps may be implemented when the computer program stored in the computer-readable storage medium is executed by a processor: clearing, when a clearing condition is satisfied, the data stored in the high-speed storage medium on the present computing node.
It should also be noted that, in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (20)

1. A hierarchical caching method applied to each computing node of a distributed storage system, the hierarchical caching method comprising:
monitoring a file IO operation request sent by a client to the distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored;
determining, by the server process, whether a target storage location corresponding to the file IO operation request is an aggregation cache layer;
if not, reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and caching the data to the aggregation cache layer;
if so, reading the data from the aggregation cache layer, and returning the data to the client process, so that the client process returns the data to the client.
2. The hierarchical caching method according to claim 1, wherein the process of determining, by the server process, whether the target storage location corresponding to the file IO operation request is the aggregation cache layer comprises:
inserting the received file IO operation request into a shared queue by using the server process;
determining, in the shared queue, whether the data corresponding to the file IO operation request is cached data;
if so, determining that the target storage location corresponding to the file IO operation request is the aggregation cache layer.
3. The hierarchical caching method according to claim 2, wherein in the shared queue, determining whether the data corresponding to the file IO operation request is cached data comprises:
in the shared queue, determining whether the data corresponding to the file IO operation request is cached data or not through a data thread; the data thread is a thread generated when the server process instance is constructed.
4. The hierarchical caching method of claim 3, further comprising:
and when the file IO operation request sent by the client to the distributed storage system is monitored, starting the server process, and dynamically constructing the server process instance by using the state of the computing node and the states of other computing nodes adjacent to the computing node.
5. A hierarchical caching method according to claim 3, wherein the process of reading the data from the aggregate cache layer comprises:
redirecting the file IO operation request to the aggregation cache layer through the data thread, so as to read the data from the aggregation cache layer.
6. The hierarchical caching method according to claim 2, wherein the method further comprises, while inserting the received file IO operation request into a shared queue with the server process:
and configuring the shared queue with a mutual exclusion lock.
7. The hierarchical caching method of claim 2, wherein the shared queue is a FIFO queue.
8. The hierarchical caching method according to claim 1, wherein the data corresponding to the file IO operation request includes a file descriptor, a read offset, and a length.
9. The hierarchical caching method of claim 1, further comprising:
and determining the storage location of the data in the aggregation cache layer based on the file path and the computing node to which the data belongs.
10. The hierarchical caching method of claim 1, further comprising:
and broadcasting the file IO operation request to the computing nodes adjacent to the present computing node.
11. The hierarchical caching method of claim 1, further comprising:
determining whether a data set corresponding to the file IO operation request is larger than the total capacity of a local storage medium;
if so, performing a cache eviction operation and a replacement operation.
12. The hierarchical caching method of claim 1, further comprising:
constructing a dynamic link library based on the environment variable; the dynamic link library is used for intercepting the file IO operation request.
13. The hierarchical caching method of claim 1, wherein redirecting the file IO operation request to a server process comprises:
and redirecting the file IO operation request to a server process through a hash algorithm.
14. The hierarchical caching method according to any one of claims 1-13, wherein the aggregate cache layer is a cache layer comprised of high-speed storage media in each of the computing nodes in the distributed storage system.
15. The hierarchical caching method of claim 14, wherein the high-speed storage medium is an NVMe SSD.
16. The hierarchical caching method of claim 14, further comprising:
when a clearing condition is satisfied, clearing the data stored in the high-speed storage medium on the present computing node.
17. A hierarchical caching system for each computing node of a distributed storage system, the hierarchical caching system comprising:
the monitoring module is used for monitoring a file IO operation request sent by a client to the distributed storage system by using a client process, and redirecting the file IO operation request to a server process when the file IO operation request is monitored;
the processing module is used for determining, by the server process, whether the target storage location corresponding to the file IO operation request is an aggregation cache layer, and for triggering the first reading module if not, or the second reading module if so;
the first reading module is used for reading data corresponding to the file IO operation request from the bottom layer of the distributed storage system and caching the data to the aggregation caching layer;
and the second reading module is used for reading the data from the aggregation cache layer and returning the data to the client process so that the client process returns the data to the client.
18. An electronic device, comprising:
A memory for storing a computer program;
a processor for implementing the steps of the hierarchical caching method according to any one of claims 1-16 when executing said computer program.
19. A distributed storage system, characterized by comprising a storage bottom layer module and a plurality of nodes, wherein each node comprises a hierarchical client process, a hierarchical server process, and a storage medium, and the storage media of the nodes form an aggregation cache layer, wherein:
the hierarchical client process is used for monitoring a file IO operation request sent by a client, and redirecting the file IO operation request to the hierarchical server process when the file IO operation request is monitored;
the hierarchical server process is configured to determine whether a target storage location corresponding to the file IO operation request is the aggregation cache layer; if not, to read data corresponding to the file IO operation request from the storage bottom layer module and send the data to the aggregation cache layer; and if so, to read the data from the aggregation cache layer and return the data to the hierarchical client process, so that the hierarchical client process returns the data to the client;
The aggregation cache layer is used for storing data sent by the layered server process.
20. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the hierarchical caching method according to any one of claims 1-16.
CN202310220769.3A 2023-03-09 2023-03-09 Hierarchical caching method, hierarchical caching system and related components Active CN116048425B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310220769.3A CN116048425B (en) 2023-03-09 2023-03-09 Hierarchical caching method, hierarchical caching system and related components
PCT/CN2024/080583 WO2024183799A1 (en) 2023-03-09 2024-03-07 Hierarchical caching method and system, and related component

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310220769.3A CN116048425B (en) 2023-03-09 2023-03-09 Hierarchical caching method, hierarchical caching system and related components

Publications (2)

Publication Number Publication Date
CN116048425A true CN116048425A (en) 2023-05-02
CN116048425B CN116048425B (en) 2023-07-14

Family

ID=86127618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310220769.3A Active CN116048425B (en) 2023-03-09 2023-03-09 Hierarchical caching method, hierarchical caching system and related components

Country Status (2)

Country Link
CN (1) CN116048425B (en)
WO (1) WO2024183799A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024183799A1 (en) * 2023-03-09 2024-09-12 浪潮电子信息产业股份有限公司 Hierarchical caching method and system, and related component

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020083120A1 (en) * 2000-12-22 2002-06-27 Soltis Steven R. Storage area network file system
CN101158965A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 File reading system and method of distributed file systems
US20110145499A1 (en) * 2009-12-16 2011-06-16 International Business Machines Corporation Asynchronous file operations in a scalable multi-node file system cache for a remote cluster file system
CN103744975A (en) * 2014-01-13 2014-04-23 锐达互动科技股份有限公司 Efficient caching server based on distributed files
CN104317736A (en) * 2014-09-28 2015-01-28 曙光信息产业股份有限公司 Method for implementing multi-level caches in distributed file system
CN111984191A (en) * 2020-08-05 2020-11-24 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Multi-client caching method and system supporting distributed storage
CN112000287A (en) * 2020-08-14 2020-11-27 北京浪潮数据技术有限公司 IO request processing device, method, equipment and readable storage medium
CN113688113A (en) * 2021-07-28 2021-11-23 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Metadata prefetching system and method for distributed file system
CN113835614A (en) * 2020-09-17 2021-12-24 北京焱融科技有限公司 SSD intelligent caching method and system based on distributed file storage client

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9323615B2 (en) * 2014-01-31 2016-04-26 Google Inc. Efficient data reads from distributed storage systems
US10664405B2 (en) * 2017-11-03 2020-05-26 Google Llc In-memory distributed cache
CN112363676A (en) * 2020-11-18 2021-02-12 无锡江南计算技术研究所 Control method and system based on low access delay distributed storage system
CN116048425B (en) * 2023-03-09 2023-07-14 浪潮电子信息产业股份有限公司 Hierarchical caching method, hierarchical caching system and related components

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020083120A1 (en) * 2000-12-22 2002-06-27 Soltis Steven R. Storage area network file system
CN101158965A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 File reading system and method of distributed file systems
US20110145499A1 (en) * 2009-12-16 2011-06-16 International Business Machines Corporation Asynchronous file operations in a scalable multi-node file system cache for a remote cluster file system
CN103744975A (en) * 2014-01-13 2014-04-23 锐达互动科技股份有限公司 Efficient caching server based on distributed files
CN104317736A (en) * 2014-09-28 2015-01-28 曙光信息产业股份有限公司 Method for implementing multi-level caches in distributed file system
CN111984191A (en) * 2020-08-05 2020-11-24 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Multi-client caching method and system supporting distributed storage
CN112000287A (en) * 2020-08-14 2020-11-27 北京浪潮数据技术有限公司 IO request processing device, method, equipment and readable storage medium
CN113835614A (en) * 2020-09-17 2021-12-24 北京焱融科技有限公司 SSD intelligent caching method and system based on distributed file storage client
CN113688113A (en) * 2021-07-28 2021-11-23 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Metadata prefetching system and method for distributed file system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A. A. Shvidkiy; A. A. Savelieva; A. A. Zarubin: "Caching Methods Analysis for Improving Distributed Storage Systems Performance", 2021 Systems of Signal Synchronization, Generating and Processing in Telecommunications *
Cao Fenghua: "An optimization strategy for small-file access in distributed file systems based on an authorization mechanism", Computer Systems & Applications, no. 07 *
Wang Sheng; Yang Chao; Cui Wei; Huang Gaopan; Zhang Mingming: "Distributed cache based on MongoDB", Computer Systems & Applications, no. 04 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024183799A1 (en) * 2023-03-09 2024-09-12 浪潮电子信息产业股份有限公司 Hierarchical caching method and system, and related component

Also Published As

Publication number Publication date
CN116048425B (en) 2023-07-14
WO2024183799A1 (en) 2024-09-12

Similar Documents

Publication Publication Date Title
CA3027756C (en) Systems and methods for efficient distribution of stored data objects
US10785322B2 (en) Server side data cache system
US9244980B1 (en) Strategies for pushing out database blocks from cache
US11080207B2 (en) Caching framework for big-data engines in the cloud
US11966416B2 (en) Cross-organization and cross-cloud automated data pipelines
WO2024183799A1 (en) Hierarchical caching method and system, and related component
US20210318994A1 (en) Extensible streams for operations on external systems
US11762860B1 (en) Dynamic concurrency level management for database queries
CN116450966A (en) Cache access method and device, equipment and storage medium
JP6406254B2 (en) Storage device, data access method, and data access program
Branagan et al. Understanding the top 5 Redis performance metrics
US11748327B2 (en) Streams using persistent tables
US20240330296A1 (en) Active invalidation of metadata cache entries
KR101345802B1 (en) System for processing rule data and method thereof
CN117608864B (en) Multi-core cache consistency method and system
US11797497B2 (en) Bundle creation and distribution
US20240265010A1 (en) Multi-cluster query result caching
Su et al. SACache: Size-Aware Load Balancing for Large-Scale Storage Systems
KR20230080902A (en) Apparatus for preloading data in distributed computing enviroment and method using the same
CN111737298A (en) Cache data control method and device based on distributed storage
KR20120078372A (en) Metadata server and method of processing file in metadata server and asymmetric clustered file system using the same
Frainer et al. Pervasive File Space: Flexible Application And Context Aware Adaptation in A Pervasive File System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant