CN116700608A - Caching method, caching device, caching equipment and readable storage medium - Google Patents


Info

Publication number
CN116700608A
CN116700608A (application number CN202310451188.0A)
Authority
CN
China
Prior art keywords
beegfs
data
beeond
pool
caching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310451188.0A
Other languages
Chinese (zh)
Inventor
曹代
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Data Technology Co Ltd
Original Assignee
Jinan Inspur Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Data Technology Co Ltd filed Critical Jinan Inspur Data Technology Co Ltd
Priority to CN202310451188.0A priority Critical patent/CN116700608A/en
Publication of CN116700608A publication Critical patent/CN116700608A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656 Data buffering arrangements
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a caching method, apparatus, device, and readable storage medium. BeeOND source code from which the management service and the metadata service have been deleted, and to which a pointer to the BeeGFS metadata service has been added, is acquired; based on the BeeOND source code, a BeeOND program is run to add the computing-node SSDs in BeeOND to the BeeGFS cluster and mount a client; a default pool of the BeeGFS cluster is created using the computing nodes' local SSDs; and during BeeGFS data interaction, BeeGFS data is cached in the default pool. In the application, BeeOND serves as the cache area of BeeGFS: on the one hand, BeeOND provides a high-speed cache for BeeGFS; on the other hand, because BeeOND is non-volatile, the data cached by BeeGFS is not lost on power failure.

Description

Caching method, caching device, caching equipment and readable storage medium
Technical Field
The present application relates to the field of storage technologies, and in particular to a caching method, apparatus, device, and readable storage medium.
Background
BeeGFS (a distributed file system) is a leading parallel file system whose development focuses on performance and whose design aims at ease of use and simple installation and management; it continues to grow and has gained remarkable popularity in the community. BeeGFS has evolved into a globally valued file system that provides maximum performance, scalability, high flexibility, and robustness.
In practical applications, BeeGFS serves as a global storage system and generally adopts lower-cost storage media. Its caching generally uses server memory, through either the client's buffered mode (write-back and read-ahead via a small static buffer pool) or the native mode (the Linux kernel page cache).
Although such caching techniques can meet the basic caching requirements of BeeGFS, the data they store is volatile and only held temporarily; they cannot cope with situations such as power failure.
In summary, how to effectively solve problems of BeeGFS such as cache-data volatility is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a caching method, apparatus, device, and readable storage medium, which can realize a unified namespace for BeeOND and BeeGFS. Implementing the BeeGFS cache through BeeOND greatly improves BeeGFS performance via a comparatively inexpensive all-flash storage medium (BeeOND). BeeOND cache data has locality: local client reads and writes reduce network latency during data access. BeeGFS is globally accessible, so computing nodes can access data across nodes. The BeeOND data cache is non-volatile, and the data remains accessible after a node is powered off and on again.
In order to solve the technical problems, the application provides the following technical scheme:
a caching method, comprising:
acquiring BeeOND source code from which the management service and the metadata service have been deleted, and to which a pointer to the BeeGFS metadata service has been added; wherein BeeGFS is a distributed file system, and BeeOND is a temporary parallel file system instance;
running a BeeOND program based on the BeeOND source code, so as to add the computing-node SSDs in BeeOND to a BeeGFS cluster and mount a client;
creating a default pool of the BeeGFS cluster by using the local SSD of the computing node;
and caching the data of the BeeGFS in the default pool in the process of data interaction of the BeeGFS.
Preferably, creating a default pool of the BeeGFS cluster using the computing node local SSD includes:
removing an original default pool of the BeeGFS cluster;
creating a pool named BeeGFS by using the local SSD of the computing node;
and determining the pool named as the BeeGFS as a default pool of the BeeGFS cluster.
Preferably, the method further comprises:
and executing a data placement strategy, and placing corresponding data in the BeeOND storage pool and the BeeGFS storage pool based on the data attributes.
Preferably, executing a data placement policy, placing corresponding data in the BeeOND storage pool and the BeeGFS storage pool based on the data attributes, includes:
dividing the data into hot data and cold data according to at least one data attribute among the user identifier, the group identifier, the file name, and the directory;
and storing the hot data in a storage pool of the BeeOND, and storing the cold data in a storage pool of the BeeGFS.
Preferably, the method further comprises:
receiving and parsing a cache-close request to obtain the target computing node requested to be closed;
and after executing the cache-close command corresponding to the target computing node, directly writing the data of the target computing node into the BeeGFS storage medium during BeeGFS data interaction.
Preferably, in the process of data interaction of the BeeGFS, caching the data of the BeeGFS in the default pool includes:
and when an application reads the target data of the BeeGFS, copying the target data into the default pool.
Preferably, the method further comprises:
performing access timing on the target data;
and if the target data is not accessed for more than a preset time period, deleting the target data in the default pool.
Preferably, in the process of data interaction of the BeeGFS, caching the data of the BeeGFS in the default pool includes:
and when the application reads the target data of the BeeGFS, migrating the target data from the BeeGFS to the default pool.
Preferably, the method further comprises:
performing access timing on the target data;
and if the target data is not accessed for more than a preset time period, returning the target data from the default pool to the BeeGFS.
Preferably, the method further comprises:
and after the power-off restarting, reading cache data from the default pool.
A caching apparatus, comprising:
the code acquisition module is used for acquiring BeeOND source code from which the management service and the metadata service have been deleted and to which a pointer to the BeeGFS metadata service has been added; wherein BeeGFS is a distributed file system, and BeeOND is a temporary parallel file system instance;
the system fusion module is used for running a BeeOND program based on the BeeOND source code, so as to add the computing-node SSDs in BeeOND to a BeeGFS cluster and mount a client;
the default pool creation module is used for creating a default pool of the BeeGFS cluster by utilizing the local SSD of the computing node;
and the caching module is used for caching the data of the BeeGFS in the default pool in the process of data interaction of the BeeGFS.
An electronic device, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the caching method when executing the computer program.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the caching method described above.
By applying the method provided by the embodiment of the application, BeeOND source code from which the management service and the metadata service have been deleted, and to which a pointer to the BeeGFS metadata service has been added, is acquired; wherein BeeGFS is a distributed file system, and BeeOND is a temporary parallel file system instance; based on the BeeOND source code, a BeeOND program is run to add the computing-node SSDs in BeeOND to the BeeGFS cluster and mount a client; a default pool of the BeeGFS cluster is created using the computing nodes' local SSDs; and during BeeGFS data interaction, BeeGFS data is cached in the default pool.
In the application, the physical data isolation between BeeOND and BeeGFS is first broken by sharing the BeeGFS metadata service between BeeOND and BeeGFS, i.e., a unified namespace is realized. Then, the computing-node SSDs of BeeOND are added to the BeeGFS cluster, and a default pool of the BeeGFS cluster is created based on those SSDs. Thus, during BeeGFS data interaction, BeeGFS data is cached in the default pool. That is, BeeOND serves as the cache area of BeeGFS: on the one hand, BeeOND provides a high-speed cache for BeeGFS; on the other hand, because BeeOND is non-volatile, the data cached by BeeGFS is not lost on power failure.
Correspondingly, the embodiment of the application also provides a caching apparatus, a device, and a readable storage medium corresponding to the above caching method, which have the same technical effects and are not described in detail herein.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings required by the embodiments or the related technical descriptions are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application; other drawings may be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a flowchart of a caching method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a caching apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to better understand the aspects of the present application, the present application will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, fig. 1 is a flowchart of a caching method according to an embodiment of the present application. The method includes the following steps:
S101, acquiring BeeOND source code from which the management service and the metadata service have been deleted and to which a pointer to the BeeGFS metadata service has been added.
Here, BeeGFS is a distributed file system, and BeeOND is a temporary parallel file system instance.
In the embodiment of the application, when BeeGFS is deployed, the BeeGFS server can be deployed according to the BeeGFS standard implementation manual, including the mgmtd management service, the meta metadata service, and the storage data service.
BeeGFS is a leading parallel file system whose development focuses on performance and whose design aims at ease of use and simple installation and management; it continues to grow and has gained remarkable popularity in the community. BeeGFS has evolved into a globally valued file system that provides maximum performance, scalability, high flexibility, and robustness. One of the most basic concepts of BeeGFS is to strictly avoid architectural bottlenecks by striping file content across multiple storage servers. Another important feature is the distribution of file system metadata (e.g., directory information) across multiple metadata servers; large systems and metadata-intensive applications benefit greatly from this. BeeGFS is built on efficient and scalable multi-threaded core components and supports native RDMA (Remote Direct Memory Access). File system nodes can provide both RDMA (e.g., InfiniBand, Omni-Path, RoCE) and TCP/IP network connections, and automatically switch to a redundant connection path if either fails.
The BeeGFS client and server can run on the same machine to improve the performance of small clusters or networks. BeeGFS does not require a dedicated file system partition on the server; it uses existing partitions formatted with any standard Linux file system, such as XFS, ext4 (the successor to the ext3 journaling file system), or ZFS (Zettabyte File System, the first 128-bit file system). For larger networks, several different BeeGFS file system partitions with different configurations can also be created.
Most HPC (high-performance computing) cluster systems use a global storage system based on a parallel file system on dedicated servers to achieve high throughput. Computing nodes are typically equipped (or can easily be equipped) with internal hard disks or SSDs (solid-state drives), which can provide additional performance advantages. The problem with internal drives in computing nodes is that they provide neither the advantage of a single namespace across multiple machines nor the flexibility and performance of a shared parallel file system. BeeOND was developed to dynamically and easily create one or more BeeGFS instances on a per-job basis: it creates a shared parallel file system across all computing nodes of a particular job, aggregates the performance and capacity of the SSDs or hard disks inside those nodes, and provides additional performance and a very elegant burst buffer.
Because BeeOND is very simple to launch, it is easy to integrate with a workload manager such as Torque or Slurm. Since BeeOND can start and stop a new BeeGFS instance with a single command, it can easily be added to job scripts so that it starts when a computing job starts and stops when the job completes.
However, although BeeOND is non-volatile, its data is isolated from BeeGFS, so it does not by itself provide a caching technique for BeeGFS within a unified namespace. That is, BeeGFS is physically isolated from BeeOND: BeeGFS cannot read or write BeeOND data, and vice versa; the two have no intersection.
In order to break this isolation between BeeOND and BeeGFS, the embodiment of the application provides a method based on a shared metadata service to realize a unified namespace for BeeOND and BeeGFS.
Specifically, the management service and the metadata service are first deleted from the BeeOND source code, and a pointer to the BeeGFS metadata service is added.
That is, the BeeOND source code is first modified to delete the original mgmtd management service and meta metadata service parts, and a parameter pointing to the BeeGFS management service node is then added, so that BeeOND uses the same metadata services as BeeGFS.
S102, running a BeeOND program based on the BeeOND source code, so as to add the computing-node SSDs in BeeOND to the BeeGFS cluster and mount the client.
After the BeeOND source code configured for metadata-service sharing is obtained, the BeeOND program is run on its basis, so that the computing-node SSDs in BeeOND are added to the BeeGFS cluster and the client is mounted. That is, the computing-node SSDs in BeeOND are merged into the BeeGFS cluster so that they are included within it; once the client is mounted, these SSDs can be used by the BeeGFS cluster.
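As a concrete illustration of step S102, the sketch below assembles (but does not run) the command line with which a BeeOND instance is typically started. The `-n`/`-d`/`-c` flags follow the stock `beeond` helper script; the exact options of the modified BeeOND build described in this application may differ, so treat the flag names and paths as assumptions.

```python
# Hypothetical sketch: build the command that starts a BeeOND instance
# over the computing nodes' local SSDs and mounts the client. Flag names
# follow the stock `beeond` helper script and are an assumption for the
# modified build described here; the command is assembled, not executed.
def build_beeond_start_cmd(nodefile, ssd_dir, mountpoint):
    return [
        "beeond", "start",
        "-n", nodefile,    # file listing the computing nodes to include
        "-d", ssd_dir,     # local SSD directory used as the storage target
        "-c", mountpoint,  # where the client mounts the file system
    ]

cmd = build_beeond_start_cmd("/etc/beeond.nodes", "/mnt/ssd/beeond", "/mnt/beeond")
```

In a real deployment this command would be issued by the job script when a computing job starts, with `beeond stop` issued when it completes.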
S103, creating a default pool of the BeeGFS cluster using the computing nodes' local SSDs.
For BeeOND to act as the cache of BeeGFS, a default pool of the BeeGFS cluster can be created from BeeOND's computing-node local SSDs.
That is, after the deployment of steps S101 and S102, the physical isolation between BeeOND and BeeGFS is broken (BeeGFS can access the data stored by BeeOND, and BeeOND can access the data stored by BeeGFS). Therefore, in the embodiment of the application, BeeOND's all-flash storage medium can be used as the component of the BeeGFS default pool, so that BeeGFS caches through that all-flash medium whenever it uses the default pool.
In one embodiment of the present application, creating a default pool of the BeeGFS cluster using the computing nodes' local SSDs includes:
step one, eliminating an original default pool of the BeeGFS cluster;
step two, creating a pool named as BeeGFS by using a local SSD of a computing node;
and step three, determining a pool named as the BeeGFS as a default pool of the BeeGFS cluster.
For convenience of description, the above three steps are described together below.
BeeGFS itself has a default pool. To avoid the confusion of two default pools (after BeeOND and BeeGFS share the metadata service, BeeOND and BeeGFS are by default in the same storage pool), the storage pool is adjusted: the original BeeGFS storage targets are removed from the default pool, and a pool named BeeGFS is created. After the adjustment, all data is preferentially written into the default pool formed by the computing nodes' local SSDs, but it does not automatically land on the BeeGFS storage medium, and no write caching takes place inside the BeeGFS storage medium; in other words, BeeGFS cache data is stored directly in BeeOND.
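The pool adjustment just described can be sketched as follows. BeeGFS 7 provides `beegfs-ctl --addstoragepool` for creating a named pool from existing targets; the target IDs below are hypothetical, and the command is only assembled as a string, not executed.

```python
# Sketch of the storage-pool adjustment: move the original BeeGFS storage
# targets out of the default pool into a new pool named "BeeGFS", so that
# the default pool consists only of the computing-node SSD targets.
# Target IDs are hypothetical examples.
def pool_adjustment_cmd(original_targets):
    targets = ",".join(str(t) for t in original_targets)
    # creating a pool with --targets also removes those targets from
    # their previous pool (here, the default pool)
    return f"beegfs-ctl --addstoragepool --desc=BeeGFS --targets={targets}"
```

With the original targets moved into the "BeeGFS" pool, new writes land by default on the SSD targets that remained in the default pool.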
And S104, caching the data of the BeeGFS in a default pool in the process of data interaction of the BeeGFS.
Since a cache is a buffer area for data exchange, and step S103 forms the BeeGFS default pool from the computing nodes' local SSDs, BeeGFS data can be cached in the default pool whenever BeeGFS exchanges data, that is, whenever there is a caching requirement. BeeOND is thereby implemented as the cache of BeeGFS.
With BeeOND acting as the cache of BeeGFS, the data stored in BeeOND is temporary by nature. To keep providing cache capacity and ensure sufficient free space in BeeOND, the data in BeeOND must be managed; that is, BeeOND requires space reclamation. The embodiment of the application gives the following reclamation modes; either may be selected in practice, and other space-reclamation methods (for example, periodic reclamation) may also be used:
Reclamation mode 1: if, during BeeGFS data interaction, caching BeeGFS data in the default pool specifically means that when an application reads target data of BeeGFS, the target data is copied into the default pool, then space can be reclaimed by performing the following steps:
step one, tracking the access time of the target data;
and step two, deleting the target data from the default pool if it has not been accessed within a preset time period.
Since the cache is a buffer area for data exchange and its data is temporary by nature, the hierarchical storage configuration mainly concerns the modification attributes of files. For example: files not accessed for more than 30 minutes (the time is adjustable and not enumerated here) are automatically migrated to the BeeGFS storage medium; when an application reads data, one copy is made to the BeeOND storage medium, and when that copy is not accessed for more than 30 minutes, the stored temporary data is deleted from BeeOND.
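The access-timing and deletion steps above can be sketched as follows. All names are hypothetical illustrations; the patent describes the policy, not an implementation.

```python
# Minimal sketch (hypothetical names) of reclamation mode 1: each cached
# file records its last access time, and files idle longer than the
# configured window are deleted from the BeeOND default pool. A durable
# copy is assumed to remain in BeeGFS.
CACHE_TTL = 30 * 60  # seconds; the adjustable "30 minutes" from the example

class BeeondCache:
    def __init__(self):
        self.last_access = {}  # path -> last access timestamp (seconds)

    def touch(self, path, now):
        # an application read copies the data into the default pool
        self.last_access[path] = now

    def evict_expired(self, now):
        # delete entries not accessed within CACHE_TTL
        expired = [p for p, t in self.last_access.items()
                   if now - t > CACHE_TTL]
        for p in expired:
            del self.last_access[p]
        return expired
```

In practice the hierarchical storage function performs this expiry; the sketch only makes the timing rule concrete.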
In addition, when data is modified in BeeOND, the latest data can first be written back to BeeGFS, after which the copy stored in BeeOND can be deleted.
Reclamation mode 2: if, during BeeGFS data interaction, caching BeeGFS data in the default pool specifically means that when an application reads target data of BeeGFS, the target data is migrated from BeeGFS to the default pool, then the following steps are performed to reclaim space:
step one, tracking the access time of the target data;
and step two, migrating the target data from the default pool back to BeeGFS if it has not been accessed within the preset time period.
For example: when an application reads data, the data is migrated to the BeeOND storage medium; when the application has not accessed it for more than 30 minutes, the stored temporary data is migrated from BeeOND back to BeeGFS.
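This second mode differs from the first only in what happens on expiry: data is migrated back rather than deleted, so a single copy moves between the two tiers. A minimal sketch, with hypothetical names and dict-based stand-ins for the two pools:

```python
# Sketch of reclamation mode 2: entries idle longer than EXPIRY are
# migrated back from the BeeOND default pool to the BeeGFS storage pool
# instead of being deleted, so exactly one copy of the data exists.
EXPIRY = 30 * 60  # seconds, adjustable

def migrate_expired(last_access, now, beeond_pool, beegfs_pool):
    """Move data idle for more than EXPIRY from beeond_pool to beegfs_pool."""
    for path, t in list(last_access.items()):
        if now - t > EXPIRY:
            beegfs_pool[path] = beeond_pool.pop(path)  # migrate, do not copy
            del last_access[path]
```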
In addition, when data is modified in BeeOND, the latest data stored by BeeOND prevails when it is written back to BeeGFS.
Since BeeOND is non-volatile, cache data can be read from the default pool after a power-off restart. Using non-volatile BeeOND as the cache area of BeeGFS ensures that the data cached by BeeGFS is not lost on power failure, which effectively improves data reliability and consistency.
That is, the embodiment of the application can realize a unified namespace for BeeOND and BeeGFS, with the BeeGFS cache implemented through BeeOND. BeeGFS performance is greatly improved by the comparatively inexpensive all-flash storage medium (BeeOND). BeeOND cache data has locality: local client reads and writes reduce network latency during data access. BeeGFS is globally accessible, so computing nodes can access data across nodes. The BeeOND data cache is non-volatile, and data remains accessible after a node is powered off and on again.
By applying the method provided by the embodiment of the application, BeeOND source code from which the management service and the metadata service have been deleted, and to which a pointer to the BeeGFS metadata service has been added, is acquired; wherein BeeGFS is a distributed file system, and BeeOND is a temporary parallel file system instance; based on the BeeOND source code, a BeeOND program is run to add the computing-node SSDs in BeeOND to the BeeGFS cluster and mount a client; a default pool of the BeeGFS cluster is created using the computing nodes' local SSDs; and during BeeGFS data interaction, BeeGFS data is cached in the default pool.
In the application, the physical data isolation between BeeOND and BeeGFS is first broken by sharing the BeeGFS metadata service between BeeOND and BeeGFS, i.e., a unified namespace is realized. Then, the computing-node SSDs of BeeOND are added to the BeeGFS cluster, and a default pool of the BeeGFS cluster is created based on those SSDs. Thus, during BeeGFS data interaction, BeeGFS data is cached in the default pool. That is, BeeOND serves as the cache area of BeeGFS: on the one hand, BeeOND provides a high-speed cache for BeeGFS; on the other hand, because BeeOND is non-volatile, the data cached by BeeGFS is not lost on power failure.
It should be noted that, based on the above embodiments, the embodiments of the present application further provide corresponding improvements. Since the steps involved in the preferred/improved embodiments are the same as, or correspond to, those in the above embodiments, and the corresponding benefits likewise correspond, a detailed description of them is omitted here.
In a specific embodiment of the present application, considering that BeeOND's storage medium is an all-flash medium with a faster response, a data placement policy can also be executed to effectively improve the overall I/O performance of BeeGFS: based on data attributes, the corresponding data is placed into the BeeOND storage pool or the BeeGFS storage pool.
Specifically, executing a data placement policy, placing corresponding data in a BeeOND storage pool and a BeeGFS storage pool based on data attributes, including:
step one, dividing the data into hot data and cold data according to at least one data attribute among the user identifier, the group identifier, the file name, and the directory;
and step two, storing hot data in a storage pool of the BeeOND, and storing cold data in a storage pool of the BeeGFS.
For convenience of description, the two steps are described in combination.
Through the hierarchical storage function, a data placement policy is configured; according to attributes such as the user identifier (UID), the group identifier (GID), the file name, and the directory, the corresponding data can be selectively placed into the BeeOND storage pool or the BeeGFS storage pool.
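The placement policy can be sketched as a simple classifier. The rule set below is purely illustrative (the UIDs and directory patterns are assumptions); in a real deployment the rules would be configured through the hierarchical storage function rather than hand-written.

```python
# Hedged sketch of the data placement policy: classify data as hot or
# cold from its attributes (UID, GID, file name, directory) and choose
# the storage pool accordingly. Hot data goes to the BeeOND pool; cold
# data goes to the BeeGFS pool. All rule values are assumptions.
import fnmatch

HOT_UIDS = {1001}                  # users whose jobs are active (assumption)
HOT_DIR_PATTERNS = ["/scratch/*"]  # directories treated as hot (assumption)

def choose_pool(uid, gid, path):
    hot = (uid in HOT_UIDS
           or any(fnmatch.fnmatch(path, pat) for pat in HOT_DIR_PATTERNS))
    return "BeeOND" if hot else "BeeGFS"
```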
In a specific embodiment of the present application, the cache can be turned on or off on a per-computing-node basis. The specific implementation includes:
step one, receiving and parsing a cache-close request to obtain the target computing node requested to be closed;
and step two, after executing the cache-close command corresponding to the target computing node, directly writing the data of the target computing node into the BeeGFS storage medium during BeeGFS data interaction.
For convenience of description, the two steps are described in combination.
A user can operate the client as required, causing the client to issue a cache-close request.
After the cache-close request is received, the target computing node to be closed can be determined explicitly by parsing the request, and the cache-close command corresponding to that node is then executed. Thereafter, during BeeGFS data interaction, the data of the target computing node is written directly to the BeeGFS storage medium. This relies on BeeOND's characteristics: because it is very simple to start, it is easy to integrate with a workload manager such as Torque or Slurm, and since BeeOND can start and stop a new BeeGFS instance with a single command, it can easily be added to job scripts so that it starts when a computing job starts and stops when the job completes.
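The per-node switch can be sketched as follows. The JSON request format and class names are assumptions introduced only for illustration; the patent does not specify the request encoding.

```python
# Sketch (hypothetical request format) of the per-node cache switch: a
# cache-close request names a target computing node; once that node's
# cache is off, its writes bypass the BeeOND default pool and go
# straight to the BeeGFS storage medium.
import json

class CacheSwitch:
    def __init__(self, nodes):
        self.enabled = {n: True for n in nodes}  # cache on by default

    def handle_close_request(self, raw):
        # parse the request to obtain the target computing node
        node = json.loads(raw)["node"]
        self.enabled[node] = False
        return node

    def write_target(self, node):
        return "BeeOND" if self.enabled.get(node, False) else "BeeGFS"
```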
In order to facilitate the better application of the caching method provided by the embodiment of the present application for those skilled in the art, the specific application of the caching method is described in detail below with reference to a specific application scenario as an example.
In the embodiment of the application, based on a BeeGFS file system, a metadata service sharing mode is provided to realize the unified naming space of the BeeGFS and the BeeOND. The specific implementation process comprises the following steps:
and step one, performing BeeGFS server deployment according to a BeeGFS standard implementation manual, wherein the BeeGFS server deployment comprises mgmtd management service, meta metadata service and storage data service.
And step two, modifying the BeeOND source code, and deleting the original mgmtd management service and meta metadata service part.
And step three, adding to the BeeOND source code the parameters pointing to the management service node.
And step four, running the BeeOND program, which automatically adds the computing-node SSDs to the original BeeGFS cluster and mounts the client.
Based on the hierarchical storage function, data is automatically migrated to the BeeGFS storage space and the cache space is released in time. The specific implementation includes:
Step one: by default, BeeOND and BeeGFS are in the same storage pool. By adjusting the storage pool, the original BeeGFS storage targets are removed from the default pool, and a pool named BeeGFS is created. After the adjustment, all data is preferentially written into the default pool formed by the computing nodes' local SSDs, but it does not automatically land on the BeeGFS storage medium, and no write caching takes place inside the BeeGFS storage medium.
Step two, configuring a data placement strategy through a hierarchical storage function.
Specifically, according to user-defined attributes such as the UID, GID, file name, and directory, the corresponding data can be selectively placed into the BeeOND storage pool or the BeeGFS storage pool.
And thirdly, configuring a data migration strategy through a hierarchical storage function.
Because the cache is a buffer zone for data exchange, and the data has the characteristic of temporary property, the configuration hierarchical storage mainly focuses on the modification attribute of the file, and the file which is not accessed for more than 30 minutes (with adjustable time) can be automatically migrated to the BeeGFS storage medium; when the application reads the data, the data is copied to the BeeOND storage medium one copy, and when it is not accessed for more than 30 minutes, the stored temporary data is deleted.
Step four, if a task on a certain compute node does not need the BeeOND cache, the cache can be temporarily disabled on that node (via a command), and data is then written directly to the BeeGFS storage medium.
That is, the caching method provided by the embodiments of the present application applies three techniques. BeeOND fused deployment: the original BeeOND mgmtd management service deployment is redirected to an existing BeeGFS cluster and the meta metadata service deployment is removed, so that BeeOND and BeeGFS share a unified namespace. BeeOND caching: through the hierarchical storage function, data I/O flows between the BeeOND storage medium and the BeeGFS storage medium. Separable deployment: the BeeOND cache can be enabled on individual compute nodes and is not required globally.
The caching method provided by the embodiments of the present application can realize a unified namespace for BeeOND and BeeGFS; realize a BeeGFS cache through BeeOND; realize data placement and migration policies through hierarchical storage; and give cached data local, persistent, and globally accessible characteristics.
Corresponding to the above method embodiment, the embodiment of the present application further provides a caching apparatus, where the caching apparatus described below and the caching method described above may be referred to correspondingly.
Referring to fig. 2, the apparatus includes the following modules:
a code acquisition module 101, configured to acquire BeeOND source code from which the management service and the metadata service have been deleted and to which a pointer to the BeeGFS metadata service has been added; wherein BeeGFS is a distributed file system, and BeeOND is a temporary parallel file system instance;
a system fusion module 102, configured to run a BeeOND program based on BeeOND source code, so as to add a computing node SSD in BeeOND to a BeeGFS cluster, and mount a client;
a default pool creation module 103, configured to create a default pool of the BeeGFS cluster by using the local SSD of the computing node;
and the caching module 104 is configured to cache the data of the BeeGFS in a default pool in the process of performing data interaction on the BeeGFS.
By applying the apparatus provided by the embodiments of the present application, BeeOND source code from which the management service and the metadata service have been deleted and to which a pointer to the BeeGFS metadata service has been added is acquired, wherein BeeGFS is a distributed file system and BeeOND is a temporary parallel file system instance; the BeeOND program is run based on the BeeOND source code to add the compute node SSDs in BeeOND to the BeeGFS cluster and mount the client; a default pool of the BeeGFS cluster is created using the compute node local SSDs; and during BeeGFS data interaction, BeeGFS data is cached in the default pool.
In the present application, the physical isolation of data between BeeOND and BeeGFS is first broken by sharing the BeeGFS metadata service between them, which realizes the unified namespace. Then the compute node SSDs of BeeOND are added to the BeeGFS cluster, and a default pool of the BeeGFS cluster is created from those SSDs. Thus, during BeeGFS data interaction, BeeGFS data is cached in the default pool. That is, BeeOND serves as the cache area of BeeGFS: on the one hand, BeeOND provides a cache for BeeGFS; on the other hand, because the BeeOND media are non-volatile, the data cached by BeeGFS is not lost on power failure.
In a specific embodiment of the present application, the default pool creation module 103 is specifically configured to remove the original default pool of the BeeGFS cluster;
creating a pool named BeeGFS by using a computing node local SSD;
the pool named BeeGFS is determined to be the default pool for the BeeGFS cluster.
In one embodiment of the present application, the method further comprises:
and the data placement module is used for executing a data placement strategy and placing corresponding data in the BeeOND storage pool and the BeeGFS storage pool based on the data attribute.
In one embodiment of the present application, the data placement module is specifically configured to divide data into hot data and cold data according to at least one data attribute among a user identifier (UID), a group identifier (GID), a file name, and a directory;
hot data is stored in the storage pool of BeeOND and cold data is stored in the storage pool of BeeGFS.
In one embodiment of the present application, the method further comprises: the cache control module is used for receiving and analyzing the cache closing request to obtain a target computing node which is requested to be closed;
after executing the buffer closing command corresponding to the target computing node, directly writing the data of the target computing node into the BeeGFS storage medium in the process of data interaction of the BeeGFS.
In one embodiment of the present application, the cache control module is specifically configured to copy the target data into the default pool when the application reads the target data of the BeeGFS.
In one embodiment of the present application, the method further comprises:
a space recovery module, configured to perform access timing on the target data;
and if the target data is not accessed for more than the preset time, deleting the target data in the default pool.
In one embodiment of the present application, the cache control module is specifically configured to migrate the target data from the BeeGFS to the default pool when the application reads the target data of the BeeGFS.
In one embodiment of the present application, the method further comprises:
a space recovery module, configured to perform access timing on the target data;
and if the target data is not accessed for more than the preset time period, returning the target data from the default pool to the BeeGFS.
In one embodiment of the present application, the method further comprises:
and the power failure recovery module is used for reading the cache data from the default pool after the power failure is restarted.
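The power failure recovery behaviour can be sketched as follows. Because the default pool lives on non-volatile compute-node SSDs, cached data (and any small index kept beside it) survives a restart and can simply be re-read rather than rebuilt. The path layout and JSON index format below are illustrative assumptions, not details from the patent.

```python
import json
import os
import tempfile

# Sketch of power-failure recovery: re-read cache state from the
# SSD-backed default pool after a restart. The cache_index.json file
# and its format are ILLUSTRATIVE assumptions.

def save_cache_index(pool_dir, index):
    """Persist a small index of cached entries next to the cached data."""
    with open(os.path.join(pool_dir, "cache_index.json"), "w") as f:
        json.dump(index, f)

def recover_cache_index(pool_dir):
    """Re-read the cache index from the SSD pool after a restart."""
    path = os.path.join(pool_dir, "cache_index.json")
    if not os.path.exists(path):
        return {}                     # cold start: nothing was cached
    with open(path) as f:
        return json.load(f)

pool = tempfile.mkdtemp()             # stands in for the SSD pool directory
save_cache_index(pool, {"/data/a": "cached"})
recovered = recover_cache_index(pool)  # simulates the post-restart read
```

This is the property that distinguishes the SSD-backed default pool from a RAM-based cache: no warm-up or re-population phase is needed after an outage.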
Corresponding to the above method embodiment, the embodiment of the present application further provides an electronic device, where an electronic device described below and a caching method described above may be referred to correspondingly.
Referring to fig. 3, the electronic device includes:
a memory 332 for storing a computer program;
processor 322, when executing the computer program, implements the steps of the caching method of the method embodiment described above.
Specifically, referring to fig. 4, fig. 4 is a schematic diagram of a specific structure of an electronic device according to this embodiment. The electronic device may differ considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 322 and a memory 332, where the memory 332 stores one or more computer programs 342 or data 344. The memory 332 may be transient or persistent storage. The program stored in the memory 332 may include one or more modules (not shown), and each module may include a series of instruction operations on the data processing apparatus. Further, the central processor 322 may be configured to communicate with the memory 332 and execute the series of instruction operations in the memory 332 on the electronic device 301.
The electronic device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341.
The steps in the caching method described above may be implemented by the structure of the electronic device.
Corresponding to the above method embodiments, the embodiments of the present application further provide a readable storage medium, where a readable storage medium described below and a caching method described above may be referred to correspondingly.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the caching method of the above-described method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
In this specification, the embodiments are described in a progressive manner, each embodiment focusing mainly on its differences from the others; for the same or similar parts, the embodiments may be referred to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "include", "comprise", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The principles and embodiments of the present application have been described herein with reference to specific examples; the description of the above embodiments is intended only to assist in understanding the method of the present application and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (13)

1. A caching method, comprising:
acquiring BeeOND source code from which a management service and a metadata service have been deleted and to which a pointer to a BeeGFS metadata service has been added; wherein the BeeGFS is a distributed file system, and the BeeOND is a temporary parallel file system instance;
based on the Beeond source code, running a Beeond program to add a computing node SSD in the Beeond to a BeeGFS cluster and to mount a client;
creating a default pool of the BeeGFS cluster by using the local SSD of the computing node;
and caching the data of the BeeGFS in the default pool in the process of data interaction of the BeeGFS.
2. The caching method of claim 1, wherein creating a default pool of the BeeGFS cluster using the computing node local SSD comprises:
removing an original default pool of the BeeGFS cluster;
creating a pool named BeeGFS by using the local SSD of the computing node;
and determining the pool named as the BeeGFS as a default pool of the BeeGFS cluster.
3. The caching method of claim 1, further comprising:
and executing a data placement strategy, and placing corresponding data in the BeeOND storage pool and the BeeGFS storage pool based on the data attributes.
4. The caching method of claim 3, wherein executing the data placement policy places corresponding data in the BeeOND pool and the BeeGFS pool based on the data attributes, comprising:
dividing the data into hot data and cold data according to at least one data attribute among a user identifier, a group identifier, a file name, and a directory;
and storing the hot data in a storage pool of the BeeOND, and storing the cold data in a storage pool of the BeeGFS.
5. The caching method of claim 1, further comprising:
receiving and analyzing a cache closing request to obtain a target computing node which is requested to be closed;
after executing the buffer closing command corresponding to the target computing node, directly writing the data of the target computing node into a BeeGFS storage medium in the process of data interaction of the BeeGFS.
6. The caching method according to claim 1, wherein, during the data interaction of the BeeGFS, caching the data of the BeeGFS in the default pool includes:
and when an application reads the target data of the BeeGFS, copying the target data into the default pool.
7. The caching method of claim 6, further comprising:
performing access timing on the target data;
and if the target data is not accessed for more than a preset time period, deleting the target data in the default pool.
8. The caching method according to claim 1, wherein, during the data interaction of the BeeGFS, caching the data of the BeeGFS in the default pool includes:
and when the application reads the target data of the BeeGFS, migrating the target data from the BeeGFS to the default pool.
9. The caching method of claim 8, further comprising:
performing access timing on the target data;
and if the target data is not accessed for more than a preset time period, returning the target data from the default pool to the BeeGFS.
10. The caching method according to any one of claims 1 to 9, further comprising:
and after the power-off restarting, reading cache data from the default pool.
11. A caching apparatus, comprising:
the code acquisition module is used for acquiring the BeeOND source codes with the management service and the metadata service deleted and adding the BeeGFS metadata service; the BeeGFS is a distributed file system, and the BeeOND is a temporary parallel file system instance;
the system fusion module is used for running a BeeOND program based on the BeeOND source code, so as to add a computing node SSD in the BeeOND to a BeeGFS cluster and mount a client;
the default pool creation module is used for creating a default pool of the BeeGFS cluster by utilizing the local SSD of the computing node;
and the caching module is used for caching the data of the BeeGFS in the default pool in the process of data interaction of the BeeGFS.
12. An electronic device, comprising:
a memory for storing a computer program;
processor for implementing the steps of the caching method according to any one of claims 1 to 10 when executing said computer program.
13. A readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of the caching method according to any one of claims 1 to 10.
CN202310451188.0A 2023-04-21 2023-04-21 Caching method, caching device, caching equipment and readable storage medium Pending CN116700608A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310451188.0A CN116700608A (en) 2023-04-21 2023-04-21 Caching method, caching device, caching equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310451188.0A CN116700608A (en) 2023-04-21 2023-04-21 Caching method, caching device, caching equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN116700608A true CN116700608A (en) 2023-09-05

Family

ID=87830098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310451188.0A Pending CN116700608A (en) 2023-04-21 2023-04-21 Caching method, caching device, caching equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116700608A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination