CN111708488B - Distributed memory disk-based Ceph performance optimization method and device

Distributed memory disk-based Ceph performance optimization method and device

Info

Publication number
CN111708488B
Authority
CN
China
Prior art keywords
storage
storage node
virtual
node
osd
Prior art date
Legal status
Active
Application number
CN202010452359.8A
Other languages
Chinese (zh)
Other versions
CN111708488A (en)
Inventor
Ding Zhao (丁钊)
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010452359.8A priority Critical patent/CN111708488B/en
Publication of CN111708488A publication Critical patent/CN111708488A/en
Application granted granted Critical
Publication of CN111708488B publication Critical patent/CN111708488B/en

Classifications

    • G06F3/0607 Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G06F11/1456 Hardware arrangements for backup
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G06F11/1469 Backup restoration techniques
    • G06F11/1471 Saving, restoring, recovering or retrying involving logging of persistent data for recovery
    • G06F3/061 Improving I/O performance
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F3/064 Management of blocks
    • G06F3/0664 Virtualisation aspects at device level, e.g. emulation of a storage device or system
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The invention provides a distributed memory disk-based Ceph performance optimization method and device, wherein the method comprises the following steps: creating a virtual disk on a memory file system on each storage node of the Ceph distributed storage system; integrating the virtual disks on the plurality of storage nodes and creating a high-speed storage pool from the integrated virtual disks; and accelerating the performance of the Ceph distributed storage system based on the created high-speed storage pool. With the scheme of the invention, the distributed memory disk can be used as a high-speed storage pool that provides higher performance than media such as solid state disks and remains compatible with the existing scheme of accelerating with solid state disks; through redundancy rules and processing flows for special conditions, the tendency of memory to lose data is overcome, giving higher reliability.

Description

Distributed memory disk-based Ceph performance optimization method and device
Technical Field
The present invention relates to the field of computers, and more particularly, to a method and apparatus for distributed memory disk-based Ceph performance optimization.
Background
Ceph is a unified, distributed storage system designed for excellent performance, reliability, and scalability. A Ceph cluster provides three usage scenarios: block storage, object storage, and file storage. Existing performance optimizations usually use a solid state disk as the high-speed medium and improve performance through tiered storage or caching. In the distributed storage domain, the memory of a host is typically used only as a cache for the software stack on a single node. Memory is cached in units of physical pages, which differs from the sector unit used by block devices such as hard disks. Single-machine schemes that use a memory disk for acceleration exist in the prior art, but they suffer from small capacity and easy data loss.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method and an apparatus for Ceph performance optimization based on a distributed memory disk. With the method of the present invention, the distributed memory disk can be used as a high-speed storage pool that provides higher performance than media such as solid state disks and remains compatible with the existing scheme of accelerating with solid state disks; by applying redundancy rules and processing flows for special conditions, the tendency of memory to lose data is overcome, yielding higher reliability.
In view of the above, an aspect of the embodiments of the present invention provides a method for Ceph performance optimization based on a distributed memory disk, including the following steps:
creating a virtual disk on a memory file system on each storage node of the Ceph distributed storage system;
integrating virtual disks on a plurality of storage nodes, and creating a high-speed storage pool by using the integrated virtual disks;
accelerating the performance of the Ceph distributed storage system based on the created high-speed storage pool.
According to an embodiment of the present invention, further comprising:
in response to receiving a command to restart or shut down the storage node, calling a script program to record the state and configuration information of the high-speed storage pool of the storage node, and recording the configuration of the memory disk on the storage node and the configuration information of the corresponding OSD (Object Storage Device);
in response to detecting that the storage node has restarted, reconstructing a virtual block device in the memory file system of the storage node based on the detected recorded information, and replacing the original virtual block device of the storage node with the newly created virtual block device;
and calculating missing data blocks in the restarting process and synchronizing the data blocks based on the data on other non-restarted storage nodes.
According to an embodiment of the present invention, further comprising:
in response to receiving a command indicating that a storage node has recovered from an unexpected power failure, transmitting the OSD capacity and id information of the storage node held by the management node to the storage node, reconstructing a virtual block device in the memory file system of the storage node, and replacing the original virtual block device of the storage node with the newly created virtual block device;
and calculating missing data blocks in the power-off process and synchronizing based on data on other storage nodes.
According to an embodiment of the present invention, creating a virtual disk on a memory file system on each storage node of a Ceph distributed storage system comprises:
mounting a Tmpfs (the Linux memory file system) of a specified size on each storage node to a specified path;
respectively creating virtual disk files of specified sizes under the path;
mounting each virtual disk file as a local loop device (a pseudo device that presents a regular file as a block device, so that the file can be used like a magnetic or optical disk) on each storage node.
According to an embodiment of the present invention, the integrating virtual disks on a plurality of storage nodes, and the creating a high-speed storage pool using the integrated virtual disks includes:
respectively initializing the local loop devices on all storage nodes as OSDs (Object Storage Devices) of the Ceph distributed storage system, creating a high-speed storage pool using the OSDs, and setting the fault domains of the high-speed storage pool according to the number and distribution of the storage nodes;
dividing each OSD into a plurality of placement groups (PGs), and evenly distributing the original data blocks and redundant data blocks across different PGs on different OSDs through the built-in hash algorithm of the Ceph distributed storage system.
In another aspect of the embodiments of the present invention, an apparatus for Ceph performance optimization based on a distributed memory disk is further provided, where the apparatus includes:
the creating module is configured to create a virtual disk on a memory file system on each storage node of the Ceph distributed storage system;
the integration module is configured to integrate the virtual disks on the plurality of storage nodes and create a high-speed storage pool using the integrated virtual disks;
an application module configured to perform performance acceleration on the Ceph distributed storage system based on the created high-speed storage pool.
According to an embodiment of the invention, the apparatus further comprises a recovery module configured to:
in response to receiving a command to restart or shut down the storage node, calling a script program to record the state and configuration information of the high-speed storage pool of the storage node, and recording the configuration of the memory disk on the storage node and the configuration information of the corresponding OSD (Object Storage Device);
in response to detecting that the storage node has restarted, reconstructing a virtual block device in the memory file system of the storage node based on the detected recorded information, and replacing the original virtual block device of the storage node with the newly created virtual block device;
and calculating the missing data blocks in the restarting process and synchronizing based on the data on other non-restarted storage nodes.
According to one embodiment of the invention, the apparatus further comprises a power-off module configured to:
in response to receiving a command indicating that a storage node has recovered from an unexpected power failure, transmitting the OSD capacity and id information of the storage node held by the management node to the storage node, reconstructing a virtual block device in the memory file system of the storage node, and replacing the original virtual block device of the storage node with the newly created virtual block device;
and calculating missing data blocks in the power-off process and synchronizing based on data on other storage nodes.
According to one embodiment of the invention, the creation module is further configured to:
mounting the Tmpfs with the specified size on each storage node to a specified path;
respectively creating virtual disk files with specified sizes under the paths;
and mounting the virtual disk file as a local loop device on each storage node.
According to one embodiment of the invention, the integration module is further configured to:
respectively initializing the local loop devices on all storage nodes as OSDs (Object Storage Devices) of the Ceph distributed storage system, creating a high-speed storage pool using the OSDs, and setting the fault domains of the high-speed storage pool according to the number and distribution of the storage nodes;
dividing each OSD into a plurality of placement groups (PGs), and evenly distributing the original data blocks and redundant data blocks across different PGs on different OSDs through the built-in hash algorithm of the Ceph distributed storage system.
The invention has the following beneficial technical effects: the method for optimizing Ceph performance based on a distributed memory disk provided by the embodiments of the present invention creates a virtual disk on a memory file system on each storage node of the Ceph distributed storage system, integrates the virtual disks on the plurality of storage nodes, creates a high-speed storage pool from the integrated virtual disks, and accelerates the performance of the Ceph distributed storage system based on the created high-speed storage pool. This technical scheme can use the distributed memory disk as a high-speed storage pool that provides higher performance than media such as solid state disks and remains compatible with the existing scheme of accelerating with solid state disks; through redundancy rules and processing flows for special conditions, the tendency of memory to lose data is overcome, giving higher reliability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other embodiments from these drawings without creative effort.
FIG. 1 is a schematic flow chart diagram of a method for distributed memory disk based Ceph performance optimization according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an apparatus for distributed memory disk based Ceph performance optimization according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of constructing a high-speed storage pool, according to one embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
In view of the foregoing, a first aspect of the embodiments of the present invention provides an embodiment of a method for Ceph performance optimization based on a distributed memory disk. Fig. 1 shows a schematic flow diagram of the method.
As shown in fig. 1, the method may comprise the steps of:
s1, creating a virtual disk on a memory file system on each storage node of a Ceph distributed storage system, and using a memory as storage to greatly improve the access performance;
s2, integrating the virtual disks on the plurality of storage nodes, and creating a high-speed storage pool by using the integrated virtual disks, so that a redundancy design can be added to ensure data safety in case of accidents;
s3, the performance of the Ceph distributed storage system is accelerated based on the created high-speed storage pool, and for example, the performance of the storage system can be accelerated by using a three-layer storage scheme.
The technical scheme provided by the invention overcomes, by technical means, the memory's susceptibility to data loss on power failure. The method converts memory into block devices that the distributed storage system can manage, and pools the memory resources of multiple storage nodes in the form of these block devices. A redundancy design is added to the resource pool to strengthen data security, and the created high-speed memory storage pool is used to accelerate distributed storage. Data security is a prerequisite for a storage system.
With the technical scheme of the invention, the distributed memory disk can be used as a high-speed storage pool that provides higher performance than media such as solid state disks and remains compatible with the existing scheme of accelerating with solid state disks; through redundancy rules and processing flows for special conditions, the tendency of memory to lose data is overcome, giving higher reliability.
In a preferred embodiment of the present invention, further comprising:
in response to receiving a command to restart or shut down the storage node, calling a script program to record the state and configuration information of the high-speed storage pool of the storage node, and recording the configuration of the memory disk on the storage node and the configuration information of the corresponding OSD (Object Storage Device);
in response to detecting that the storage node has restarted, reconstructing a virtual block device in the memory file system of the storage node based on the detected recorded information, and replacing the original virtual block device of the storage node with the newly created virtual block device;
and calculating missing data blocks in the restarting process and synchronizing the data blocks based on the data on other non-restarted storage nodes.
Storage nodes may be restarted or shut down; two situations are distinguished below. In the first situation, part of the storage nodes are shut down or restarted as planned and the number of restarting nodes does not exceed the number of failed nodes allowed by the redundancy rule; data recovery then uses the following flow: (1) after the restart or shutdown command is executed, a script program is called to record the state and configuration information of the high-speed storage pool, together with the configuration of the memory disk on the node to be restarted and the configuration information of the corresponding OSD. The data in the high-speed storage pool does not need to be migrated to the non-volatile storage pool, and the node operating system is shut down through the normal flow; (2) after the node system restarts, the event and configuration information recorded at shutdown are detected, and the virtual block device in the memory file system is rebuilt according to the method of the invention. The failed-disk replacement process in Ceph is then executed: the OSD recorded at the last shutdown is removed as a failed disk and replaced with the newly created virtual block device; (3) when the number and distribution of OSDs in the storage pool reach the state before the shutdown or restart, data is rebuilt automatically, and the missing data blocks are calculated from the data on the other nodes that were not restarted. Because memory is fast and the space is small, data recovery completes in a short time, less than the time taken to restart the device.
The second situation is that all storage nodes are shut down or restarted as planned, or the number of simultaneously restarting nodes exceeds the number of failed nodes allowed by the redundancy rule. The following flow is used: (1) after the restart or shutdown command is executed, a script program is called to record the state and configuration information of the high-speed storage pool, together with the configuration of the memory disk on each node to be restarted and the configuration information of the corresponding OSD; (2) the data in the high-speed storage pool is migrated to the non-volatile storage pool, a log is recorded, and the node operating systems are shut down through the normal flow; (3) after all node systems restart, the events and configuration information recorded at shutdown are detected, and the virtual block devices in the memory file systems are rebuilt as in the first situation; (4) the high-speed storage pool is rebuilt according to the method of the invention; (5) when the number and distribution of OSDs in the high-speed storage pool reach the state before the shutdown or restart, the data is restored from the non-volatile storage pool to the high-speed memory pool according to the log recorded in step (2).
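As an illustration of the shutdown hook described in step (1) above, the following is a minimal Python sketch, not the patent's actual script: it records the high-speed pool state and the node's loop-device layout to a local file before a planned reboot. The state-file path /var/lib/memdisk_state.json and the use of the ceph and losetup command-line tools are assumptions made for illustration.

#!/usr/bin/env python3
"""Hedged sketch: persist the high-speed pool state and the memory-disk/OSD
layout of this node before a planned reboot (assumed paths and tooling)."""
import json
import subprocess

STATE_FILE = "/var/lib/memdisk_state.json"  # assumed node-local persistent path

def run(cmd):
    # Run a command and return its stdout as text; raise if it fails.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

def record_state():
    state = {
        # Cluster-wide OSD and pool state as reported by Ceph in JSON form.
        "osd_dump": json.loads(run(["ceph", "osd", "dump", "--format", "json"])),
        "osd_tree": json.loads(run(["ceph", "osd", "tree", "--format", "json"])),
        # Mapping of this node's loop devices to their backing files in tmpfs.
        "loop_devices": run(["losetup", "--list", "--output", "NAME,BACK-FILE"]),
    }
    with open(STATE_FILE, "w") as f:
        json.dump(state, f, indent=2)

if __name__ == "__main__":
    record_state()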
In a preferred embodiment of the present invention, further comprising:
in response to receiving a command of restoring the storage node from an unexpected power failure, transmitting OSD capacity and id information of the storage node in the management node to the storage node, reconstructing a virtual block device in a memory file system of the storage node, and replacing the virtual block device of the original storage node with the newly-created virtual block device;
and calculating missing data blocks in the power-off process and synchronizing based on data on other storage nodes.
When the number of unexpectedly powered-off nodes is lower than the number of failed nodes the redundancy rule can tolerate, the following steps are performed: (1) after the unexpectedly powered-off storage node has fully started, the management node of the distributed storage cluster senses that the node is back online; the OSDs corresponding to the node's memory disks are in a failed state because their block devices can no longer be found; (2) the capacity, id and other information of the failed OSDs held by the management node are passed as parameters to the online node, virtual block devices are created according to the method of the invention, and the failed-disk replacement process is then executed, so that the newly created empty virtual block devices of the same capacity are added to the high-speed memory pool; (3) when the number and distribution of OSDs in the storage pool reach the state before the power loss, data is rebuilt automatically, and the missing data blocks are calculated from the data on the other nodes. Because memory is fast and the space is small, data recovery completes in a short time, less than the time taken to restart the device.
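A hedged sketch of the replacement step in (2) follows, assuming the failed OSD's id and capacity have been supplied by the management node and that tmpfs is already mounted at /mnt/ramdisk. The destroy-then-recreate flow shown is the standard Ceph disk-replacement procedure rather than a transcript of the patent's own script, and the file and device names are illustrative.

#!/usr/bin/env python3
"""Hedged sketch: rebuild an empty memory-backed block device of the original
capacity and re-create the failed OSD under its old id (illustrative names)."""
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

def rebuild_and_replace(osd_id: int, size_gb: int, loop_index: int = 0):
    img = f"/mnt/ramdisk/osd{osd_id}.img"
    loop = f"/dev/loop{loop_index}"
    # Recreate an empty backing file of the original capacity inside tmpfs.
    run(["truncate", "-s", f"{size_gb}G", img])
    # Attach it to a loop device so Ceph can treat it as a block device.
    run(["losetup", loop, img])
    # Mark the old OSD destroyed but keep its id, then recreate it on the new device.
    run(["ceph", "osd", "destroy", str(osd_id), "--yes-i-really-mean-it"])
    run(["ceph-volume", "lvm", "create", "--osd-id", str(osd_id), "--data", loop])

if __name__ == "__main__":
    rebuild_and_replace(osd_id=3, size_gb=63)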
In a preferred embodiment of the present invention, creating a virtual disk on the memory file system on each storage node of the Ceph distributed storage system comprises:
mounting the Tmpfs with the specified size on each storage node to a specified path;
respectively creating virtual disk files with specified sizes under the paths;
and mounting the virtual disk file as a local loop device on each storage node.
This changes the way memory is used, from physical-page caching managed by the local operating system to block devices that can span nodes and be managed by the distributed storage system. Memory is the operating system's temporary working area for data and is organized in pages; ordinarily, server A cannot directly use the memory of server B as a cache, and data in memory is not stored persistently. Mechanical hard disks and solid state disks are block devices that use sectors as the basic storage unit and store data persistently. Tmpfs, the Linux memory file system shipped with Linux distributions, is a common way of using a memory disk. For example, on 4 storage nodes each configured with 256G of memory, the following operations are performed on every node: a 128G memory file system is mounted at the /mnt/ramdisk path, two 63G virtual disk files are created under that path, and the virtual disk files are attached to the two virtual block devices /dev/loop0 and /dev/loop1 using the standard Linux loop-device tool (losetup). This yields a total of 8 memory-backed block storage devices across the 4 storage nodes.
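The per-node preparation in this example can be sketched as follows. The mount point, file names and sizes mirror the example above; the use of truncate to create the files and losetup to attach them is an assumption about how the step would typically be carried out on a stock Linux distribution, not a statement of the patented implementation.

#!/usr/bin/env python3
"""Hedged sketch of the per-node setup from the example: a 128G tmpfs mounted
at /mnt/ramdisk holding two 63G disk files attached as /dev/loop0 and /dev/loop1."""
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

def prepare_memory_disks(tmpfs_size="128G", disk_size="63G", count=2,
                         mount_point="/mnt/ramdisk"):
    # Mount a size-limited tmpfs (Linux memory file system) at the chosen path.
    run(["mkdir", "-p", mount_point])
    run(["mount", "-t", "tmpfs", "-o", f"size={tmpfs_size}", "tmpfs", mount_point])
    for i in range(count):
        img = f"{mount_point}/disk{i}.img"
        # Create a fixed-size virtual disk file inside the memory file system.
        run(["truncate", "-s", disk_size, img])
        # Attach the file to /dev/loopN so it can be used as a block device.
        run(["losetup", f"/dev/loop{i}", img])

if __name__ == "__main__":
    prepare_memory_disks()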
In a preferred embodiment of the present invention, the integrating the virtual disks on the plurality of storage nodes, and the creating the high-speed storage pool using the integrated virtual disks includes:
respectively initializing the local loop devices on all storage nodes as OSDs (Object Storage Devices) of the Ceph distributed storage system, creating a high-speed storage pool using the OSDs, and setting the fault domains of the high-speed storage pool according to the number and distribution of the storage nodes;
dividing each OSD into a plurality of placement groups (PGs), and evenly distributing the original data blocks and redundant data blocks across different PGs on different OSDs through the built-in hash algorithm of the Ceph distributed storage system.
Memory resources on multiple storage nodes are integrated and aggregated into a large capacity space, and a redundancy design is added to keep data safe in the event of an accident. First, the local loop devices created on all storage nodes are initialized as OSDs of the Ceph distributed storage system, and a suitable failure-domain setting is chosen according to the number and distribution of the storage nodes. Each OSD is then divided into several placement groups (PGs), a redundancy policy is set on the storage pool, and the original data blocks and redundant data blocks are distributed evenly across different PGs on different OSDs by Ceph's built-in hash algorithm. For example, with a two-copy redundancy mode, each redundant data block and its original data block are stored in PGs of different failure domains, the failure of up to 50% of the nodes can be tolerated, and the remaining nodes rebuild the failed PGs; choosing three copies gives higher reliability, and choosing an erasure-code rule gives higher space utilization. For example, with 2 loop devices of 63G on each of 4 storage nodes, a storage pool mempool is created from the 8 virtual block devices, the failure domain is set at node level, and the redundancy rule is three copies, i.e., each data block generates two replica blocks and the three blocks are stored on three different storage nodes. The usable size of mempool is the total block-device capacity divided by the number of copies, i.e., 168G. During operation, if any two storage nodes fail, at least one copy of every data block remains on the other nodes; mempool continues to provide storage service normally and automatically rebuilds the missing data on the remaining healthy nodes.
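A hedged sketch of the pool-creation step follows. Initializing each loop device as an OSD runs on its own node, while the pool is created once from any node. The rule name mem_rule, the pool name mempool and the PG count of 128 are illustrative choices; the CRUSH rule command shown is the standard Ceph way of setting a host-level failure domain, and keeping the memory OSDs separate from ordinary OSDs (for example under a dedicated CRUSH root or device class) is assumed to be handled elsewhere.

#!/usr/bin/env python3
"""Hedged sketch: turn the memory-backed loop devices into OSDs and build a
replicated "mempool" with a host-level failure domain and three copies."""
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

def create_osd(device):
    # Initialize one memory-backed loop device as a Ceph OSD (run on its node).
    run(["ceph-volume", "lvm", "create", "--data", device])

def create_memory_pool(pool="mempool", pg_num="128"):
    # CRUSH rule whose failure domain is the host, so replicas land on different nodes.
    run(["ceph", "osd", "crush", "rule", "create-replicated",
         "mem_rule", "default", "host"])
    # Replicated pool bound to that rule, with an illustrative PG count.
    run(["ceph", "osd", "pool", "create", pool, pg_num, pg_num,
         "replicated", "mem_rule"])
    # Three copies: each data block plus two replicas on different hosts.
    run(["ceph", "osd", "pool", "set", pool, "size", "3"])

if __name__ == "__main__":
    for dev in ("/dev/loop0", "/dev/loop1"):
        create_osd(dev)
    create_memory_pool()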
FIG. 3 is a schematic diagram of building the high-speed storage pool, where 01 is the memory; 02 is the memory file system; 03 is the virtual disk file created in the memory file system; 04 is the OSD corresponding to the loop device to which the virtual disk file is attached; 05 is a placement group (PG) on an OSD; 06 is a storage node, shown with two loop devices, and there are a plurality of such storage nodes; 07 is the high-speed memory pool.
The high-speed storage pool created in this way accelerates the performance of the Ceph storage system, and a high-speed storage pool built on memory disks as described above has the same attributes as one built on solid state disks. The latency of a mechanical hard disk is around 10 milliseconds and that of a solid state disk is within 1 millisecond, while a memory disk responds several times faster than a solid state disk. Existing schemes that use solid state disks to accelerate mechanical disks are therefore equally applicable to a high-speed pool based on memory disks.
For example, performance can be accelerated using a tiered storage technique. The three-tier scheme works as follows: data is first written to the high-speed memory storage pool as the first-level cache; when a threshold is reached, a migration operation is triggered and the data is moved to the solid-state-disk storage pool as the second-level cache; when the second-level cache reaches its threshold, a migration is triggered that writes the data to the mechanical-disk storage pool. The two-tier scheme works as follows: data is first written to the high-speed memory storage pool as the first-level cache, and when a threshold is reached a migration operation writes the data to the mechanical-disk storage pool.
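Ceph's built-in cache tiering gives one concrete way to realize the two-tier variant, with the memory pool placed in front of a slower backing pool; the sketch below is illustrative rather than the patent's own flow, and the pool names and capacity threshold are assumptions.

#!/usr/bin/env python3
"""Hedged sketch: use Ceph cache tiering to put the memory pool in front of a
slower backing pool (one possible two-tier realization; names and limits assumed)."""
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

def attach_cache_tier(backing_pool="hddpool", cache_pool="mempool",
                      max_bytes=str(100 * 1024 ** 3)):
    # Attach the memory pool as a cache tier of the backing pool.
    run(["ceph", "osd", "tier", "add", backing_pool, cache_pool])
    # Writeback mode: writes land in the cache first and are flushed later.
    run(["ceph", "osd", "tier", "cache-mode", cache_pool, "writeback"])
    # Route client I/O to the cache tier transparently.
    run(["ceph", "osd", "tier", "set-overlay", backing_pool, cache_pool])
    # Hit-set tracking plus a capacity threshold that triggers flush/eviction,
    # i.e. the "migration when a threshold is reached" described above.
    run(["ceph", "osd", "pool", "set", cache_pool, "hit_set_type", "bloom"])
    run(["ceph", "osd", "pool", "set", cache_pool, "target_max_bytes", max_bytes])

if __name__ == "__main__":
    attach_cache_tier()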
As another example, in a cloud computing scenario, a template volume used to create virtual machines in batches can be migrated online from an ordinary storage pool to the high-speed memory storage pool, improving the performance of the volume and mitigating virtual machine boot storms.
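For this cloud-computing example, RBD live image migration (available in recent Ceph releases) is one hedged way to move a template volume into the memory pool while it stays accessible; the pool and image names below are assumptions.

#!/usr/bin/env python3
"""Hedged sketch: move an RBD template volume into the memory pool using RBD
live migration (Ceph Nautilus or later); pool and image names are illustrative."""
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

def migrate_template(src="hddpool/vm-template", dst="mempool/vm-template"):
    # Stage the migration; the destination image becomes usable while data
    # is still being copied from the source pool in the background.
    run(["rbd", "migration", "prepare", src, dst])
    # Copy the remaining data in the background.
    run(["rbd", "migration", "execute", dst])
    # Finalize once the copy is complete.
    run(["rbd", "migration", "commit", dst])

if __name__ == "__main__":
    migrate_template()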
With the technical scheme of the invention, the distributed memory disk can be used as a high-speed storage pool that provides higher performance than media such as solid state disks and remains compatible with the existing scheme of accelerating with solid state disks; through redundancy rules and processing flows for special conditions, the tendency of memory to lose data is overcome, giving higher reliability.
It should be noted that, as those skilled in the art will understand, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. Embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, which may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the functions defined above in the methods disclosed in the embodiments of the present invention.
In view of the above objects, a second aspect of the embodiments of the present invention provides an apparatus for optimizing Ceph performance based on a distributed memory disk. As shown in fig. 2, the apparatus 200 includes:
the creating module is configured to create a virtual disk on a memory file system on each storage node of the Ceph distributed storage system;
the integration module is configured to integrate the virtual disks on the plurality of storage nodes and create a high-speed storage pool using the integrated virtual disks;
an application module configured to perform performance acceleration on the Ceph distributed storage system based on the created high-speed storage pool.
In a preferred embodiment of the present invention, the apparatus further comprises a recovery module configured to:
in response to receiving a command to restart or shut down the storage node, calling a script program to record the state and configuration information of the high-speed storage pool of the storage node, and recording the configuration of the memory disk on the storage node and the configuration information of the corresponding OSD (Object Storage Device);
in response to detecting that the storage node has restarted, reconstructing a virtual block device in the memory file system of the storage node based on the detected recorded information, and replacing the original virtual block device of the storage node with the newly created virtual block device;
and calculating missing data blocks in the restarting process and synchronizing the data blocks based on the data on other non-restarted storage nodes.
In a preferred embodiment of the present invention, the apparatus further comprises a power-off module configured to:
in response to receiving a command indicating that a storage node has recovered from an unexpected power failure, transmitting the OSD capacity and id information of the storage node held by the management node to the storage node, reconstructing a virtual block device in the memory file system of the storage node, and replacing the original virtual block device of the storage node with the newly created virtual block device;
and calculating missing data blocks in the power-off process and synchronizing based on data on other storage nodes.
In a preferred embodiment of the present invention, the creation module is further configured to:
mounting the Tmpfs with the specified size on each storage node to a specified path;
respectively creating virtual disk files with specified sizes under the paths;
and mounting the virtual disk file as a local loop device on each storage node.
In a preferred embodiment of the invention, the integration module is further configured to:
respectively initializing the local loop devices on all storage nodes as OSDs (Object Storage Devices) of the Ceph distributed storage system, creating a high-speed storage pool using the OSDs, and setting the fault domains of the high-speed storage pool according to the number and distribution of the storage nodes;
dividing each OSD into a plurality of placement groups (PGs), and evenly distributing the original data blocks and redundant data blocks across different PGs on different OSDs through the built-in hash algorithm of the Ceph distributed storage system.
It should be particularly noted that the apparatus embodiment described above uses the method embodiment to describe the working process of each module in detail, and those skilled in the art can readily apply these modules to other embodiments of the method.
Further, the above-described method steps and system units or modules may also be implemented using a controller and a computer-readable storage medium for storing a computer program for causing the controller to implement the functions of the above-described steps or units or modules.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The embodiments described above, particularly any "preferred" embodiments, are possible examples of implementations and are presented merely to clearly understand the principles of the invention. Many variations and modifications may be made to the above-described embodiments without departing from the spirit and principles of the technology described herein. All such modifications are intended to be included within the scope of this disclosure and protected by the following claims.

Claims (6)

1. A distributed memory disk-based Ceph performance optimization method is characterized by comprising the following steps:
creating a virtual disk on a memory file system on each storage node of the Ceph distributed storage system, wherein creating the virtual disk on the memory file system on each storage node of the Ceph distributed storage system comprises: mounting a Tmpfs of a specified size on each storage node to a specified path, respectively creating virtual disk files of specified sizes under the path, and mounting the virtual disk files as local loop devices on each storage node;
integrating the virtual disks on the plurality of storage nodes and creating a high-speed storage pool using the integrated virtual disks, wherein integrating the virtual disks on the plurality of storage nodes and creating the high-speed storage pool using the integrated virtual disks comprises: respectively initializing the local loop devices on all the storage nodes as OSDs (Object Storage Devices) of the Ceph distributed storage system, creating the high-speed storage pool using the OSDs, setting fault domains of the high-speed storage pool according to the number and distribution of the storage nodes, dividing each OSD into a plurality of placement groups (PGs), and uniformly distributing original data blocks and redundant data blocks in different PGs of different OSDs through the built-in hash algorithm of the Ceph distributed storage system;
performing performance acceleration on the Ceph distributed storage system based on the created high-speed storage pool.
2. The method of claim 1, further comprising:
in response to receiving a command of restarting or shutting down a storage node, calling a script program to record the state and configuration information of the high-speed storage pool of the storage node, and recording the configuration of the memory disk on the storage node and the configuration information of the corresponding OSD (Object Storage Device);
in response to detecting that the storage node has restarted, reconstructing a virtual block device in the memory file system of the storage node based on the detected recorded information, and replacing the virtual block device of the original storage node with the newly created virtual block device;
and calculating missing data blocks in the restarting process and synchronizing the data blocks based on the data on other non-restarted storage nodes.
3. The method of claim 1, further comprising:
in response to receiving a command of unexpected power failure recovery of a storage node, transferring OSD capacity and id information of the storage node in a management node to the storage node, reconstructing a virtual block device in a memory file system of the storage node, and replacing the virtual block device of the original storage node with the newly-created virtual block device;
and calculating and synchronizing missing data blocks in the power-off process based on data on other storage nodes.
4. An apparatus for Ceph performance optimization based on a distributed memory disk, the apparatus comprising:
the creating module is configured to create a virtual disk on a memory file system on each storage node of the Ceph distributed storage system, mount a Tmpfs of a specified size on each storage node to a specified path, respectively create virtual disk files of specified sizes under the path, and mount the virtual disk files as local loop devices on each storage node;
the integration module is configured to integrate the virtual disks on the plurality of storage nodes, create a high-speed storage pool using the integrated virtual disks, respectively initialize the local loop devices on all the storage nodes as OSDs (Object Storage Devices) of the Ceph distributed storage system, create the high-speed storage pool using the OSDs, set fault domains of the high-speed storage pool according to the number and distribution of the storage nodes, divide each OSD into a plurality of placement groups (PGs), and uniformly distribute original data blocks and redundant data blocks in different PGs of different OSDs through the built-in hash algorithm of the Ceph distributed storage system;
an application module configured to accelerate performance of the Ceph distributed storage system based on the created high-speed storage pool.
5. The device of claim 4, further comprising a recovery module configured to:
in response to receiving a command of restarting or shutting down a storage node, calling a script program to record the state and configuration information of the high-speed storage pool of the storage node, and recording the configuration of the memory disk on the storage node and the configuration information of the corresponding OSD (Object Storage Device);
in response to the fact that the storage node is restarted, reconstructing a virtual block device in a memory file system of the storage node based on the detected recorded information, and replacing the virtual block device of the original storage node with the newly-created virtual block device;
and calculating missing data blocks in the restarting process and synchronizing the data blocks based on the data on other non-restarted storage nodes.
6. The device of claim 4, further comprising a power-down module configured to:
in response to receiving a command of unexpected power failure recovery of a storage node, transferring OSD capacity and id information of the storage node in a management node to the storage node, reconstructing a virtual block device in a memory file system of the storage node, and replacing the virtual block device of the original storage node with the newly-created virtual block device;
and calculating and synchronizing missing data blocks in the power-off process based on data on other storage nodes.
CN202010452359.8A 2020-05-26 2020-05-26 Distributed memory disk-based Ceph performance optimization method and device Active CN111708488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010452359.8A CN111708488B (en) 2020-05-26 2020-05-26 Distributed memory disk-based Ceph performance optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010452359.8A CN111708488B (en) 2020-05-26 2020-05-26 Distributed memory disk-based Ceph performance optimization method and device

Publications (2)

Publication Number Publication Date
CN111708488A CN111708488A (en) 2020-09-25
CN111708488B 2023-01-06

Family

ID=72537707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010452359.8A Active CN111708488B (en) 2020-05-26 2020-05-26 Distributed memory disk-based Ceph performance optimization method and device

Country Status (1)

Country Link
CN (1) CN111708488B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112905118B (en) * 2021-02-19 2023-01-20 山东英信计算机技术有限公司 Cluster storage pool creating method
CN113608674B (en) * 2021-06-25 2024-02-23 济南浪潮数据技术有限公司 Method and device for realizing reading and writing of distributed block storage system
CN113535095B (en) * 2021-09-14 2022-02-18 苏州浪潮智能科技有限公司 Data storage method, device and equipment for double storage pools and storage medium
CN115686363B (en) * 2022-10-19 2023-09-26 百硕同兴科技(北京)有限公司 Tape simulation gateway system of IBM mainframe based on Ceph distributed storage


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102291466A (en) * 2011-09-05 2011-12-21 浪潮电子信息产业股份有限公司 Method for optimizing cluster storage network resource configuration
CN103593226A (en) * 2013-11-04 2014-02-19 国云科技股份有限公司 Method for improving IO performance of disc of virtual machine
US20150160872A1 (en) * 2013-12-09 2015-06-11 Hsun-Yuan Chen Operation method of distributed memory disk cluster storage system
CN109714229A (en) * 2018-12-27 2019-05-03 山东超越数控电子股份有限公司 A kind of performance bottleneck localization method of distributed memory system

Also Published As

Publication number Publication date
CN111708488A (en) 2020-09-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant