CN112131145A - Caching method and device applied to ceph - Google Patents

Caching method and device applied to ceph

Info

Publication number
CN112131145A
Authority
CN
China
Prior art keywords
request
caching
drbd
block
ceph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010938200.7A
Other languages
Chinese (zh)
Other versions
CN112131145B (en)
Inventor
刘国辉
杨东升
陈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Easy Star Technology Development Co ltd
Original Assignee
Beijing Easy Star Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Easy Star Technology Development Co ltd
Priority to CN202010938200.7A
Publication of CN112131145A
Application granted
Publication of CN112131145B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844Multiple simultaneous or quasi-simultaneous cache accessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662Virtualisation aspects
    • G06F3/0664Virtualisation aspects at device level, e.g. emulation of a storage device or system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45579I/O management, e.g. providing access to device drivers or storage
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45595Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application discloses a caching method and device applied to ceph. The method comprises: receiving, through a bcache block device, an IO request initiated by a virtual machine qemu, wherein the IO request is a write data request; caching the data corresponding to the IO request to a drbd block device in distributed storage, wherein the drbd block device serves as the caching device of the bcache block device; and writing the data corresponding to the IO request back to a kernel rbd block device of the operating system, wherein the kernel rbd block device serves as the backing device of the bcache block device. The application aims to provide a caching scheme applied to ceph that improves ceph performance.

Description

Caching method and device applied to ceph
Technical Field
The application relates to the technical field of computer data processing, in particular to a caching method and device applied to ceph.
Background
Cloud computing technology has developed rapidly in recent years, and its flexible resource sharing is popular with users; OpenStack is currently the most widely used open source cloud computing management platform. In current OpenStack deployments, the distributed file system ceph is commonly used as the component providing storage, since it offers reliable data storage at a reasonable cost. In a typical cloud computing environment with ceph as storage, an IO issued by the virtual machine qemu must traverse the following path before the write is acknowledged as persisted: GuestOS -> qemu (librbd) -> network -> osd -> SSD | HDD | rocksdb. Most of the write latency is spent in the osd. Inside the osd, the IO is converted into disk IO, and write amplification occurs in the process: the location where the data lands must be managed with metadata, so the volume of data written to disk is larger than the original IO. In Bluestore (ceph's storage engine), metadata is managed with a rocksdb database, meaning that every metadata update is also a database update, and rocksdb introduces additional write amplification of its own. The whole process takes a long time. The optimizations currently adopted for these problems mainly consist of using a cache tier (hierarchical cache), or caching on ssd when data lands on the osd side.
The inventor finds that, whichever of these methods is used, the IO still has to pass through the complete ceph software stack, so ceph performance cannot be improved much further.
Disclosure of Invention
The main purpose of the present application is to provide a caching method and apparatus applied to ceph, in order to improve ceph performance.
To achieve the above object, according to a first aspect of the present application, there is provided a caching method applied to ceph.
The caching method applied to ceph comprises the following steps:
receiving, through a bcache block device, an IO request initiated by a virtual machine qemu, wherein the IO request is a write data request;
caching the data corresponding to the IO request to a drbd block device in distributed storage, wherein the drbd block device is used as the caching device of the bcache block device;
and writing the data corresponding to the IO request back to a kernel rbd block device of the operating system, wherein the kernel rbd block device is used as the backing device of the bcache block device.
Optionally, the drbd block device is a multi-copy cache device, and caching the data corresponding to the IO request to the drbd block device includes:
and caching data corresponding to the IO request into a drbd cluster consisting of the solid state disk nvme.
Optionally, the nvme includes a local active partition and a passive partition of a remote node, and caching the data corresponding to the IO request into the drbd cluster composed of the nvme includes:
copying the data corresponding to the IO request to the local active partition and the passive partition of the remote node to realize a multi-copy cache.
Optionally, the method further includes:
if a cache failure is detected and only one copy exists, reading the data cached in the drbd onto a hot spare disk and adding the hot spare disk into the drbd cluster to increase the number of copies.
Optionally, the method further includes:
receiving, through the bcache block device, a read IO request initiated by the virtual machine qemu;
and reading the requested data from the local active partition.
Optionally, after writing back data corresponding to the IO request to the kernel rbd block device of the operating system, the method further includes:
and sending the data written back to the kernel rbd block device to the ceph cluster.
In order to achieve the above object, according to a second aspect of the present application, there is provided a cache apparatus applied to ceph.
The cache device applied to ceph according to the application comprises:
a receiving unit, configured to receive, through a bcache block device, an IO request initiated by a virtual machine qemu, wherein the IO request is a write data request;
a cache unit, configured to cache the data corresponding to the IO request to a drbd block device in distributed storage, wherein the drbd block device is used as the caching device of the bcache block device;
and a write-back unit, configured to write the data corresponding to the IO request back to a kernel rbd block device of the operating system, wherein the kernel rbd block device is used as the backing device of the bcache block device.
Optionally, the drbd block device is a multi-copy cache device, and the cache unit is configured to:
and caching data corresponding to the IO request into a drbd cluster consisting of the solid state disk nvme.
Optionally, the nvme includes a local active partition and a passive partition of a remote node, and the cache unit is further configured to:
copy the data corresponding to the IO request to the local active partition and the passive partition of the remote node to realize a multi-copy cache.
Optionally, the apparatus further comprises:
and the standby unit is used for reading the data cached in the drbd by the hot standby disk and adding the data into the drbd cluster to increase the number of copies if the cache failure is detected and only one copy exists.
Optionally, the apparatus further comprises:
the receiving unit is further configured to receive, through the bcache block device, a read IO request initiated by the virtual machine qemu;
and a reading unit, configured to read data from the local active partition.
Optionally, the apparatus further comprises:
and the sending unit is used for sending the data written back to the kernel rbd block device of the operating system to the ceph cluster after the data corresponding to the IO request is written back to the kernel rbd block device of the operating system.
In order to achieve the above object, according to a third aspect of the present application, there is provided a computer-readable storage medium storing computer instructions for causing the computer to execute the caching method applied to ceph of any one of the above first aspects.
In the embodiments of the present application, the caching method and device applied to ceph introduce three new components, bcache, kernel rbd and drbd, while preserving ceph's affinity for cloud-native deployments. The specific caching scheme is as follows: receiving, through a bcache block device, an IO request initiated by a virtual machine qemu, wherein the IO request is a write data request; caching the data corresponding to the IO request to a drbd block device in distributed storage, wherein the drbd block device serves as the caching device of the bcache block device; and writing the data corresponding to the IO request back to a kernel rbd block device of the operating system, wherein the kernel rbd block device serves as the backing device of the bcache block device. Because drbd needs no extra metadata management (i.e., no extra metadata is needed to manage LBA mapping), latency can be reduced and performance improved. In addition, qemu no longer drives ceph directly through rbd in the conventional way; instead, the block device provided by bcache, backed by the kernel rbd block device, is used as the virtual disk. With the bcache caching mechanism, written data only needs to reach the bcache device and land on the caching disk, i.e. the drbd block device, for the IO to complete, so write performance can be greatly improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:
fig. 1 is a flowchart of a caching method applied to ceph according to an embodiment of the present application;
fig. 2 is a schematic structural diagram corresponding to a caching scheme provided according to an embodiment of the present application;
fig. 3 is a flowchart of another caching method applied to ceph according to an embodiment of the present application;
Fig. 4 is a block diagram illustrating a cache device applied to ceph according to an embodiment of the present disclosure;
fig. 5 is a block diagram of another cache apparatus applied to ceph according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular sequence or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described here. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
According to an embodiment of the present application, there is provided a caching method applied to ceph, as shown in fig. 1, the method includes the following steps:
s101, receiving an IO request initiated by a virtual machine qemu through a bcache block device.
Different from the existing ceph storage, the three added components introduce a client caching scheme, and cache IO data on the basis of ensuring the affinity of ceph for cloud protogenesis. The three components introduced are drbd, kernel rbd and bcache respectively.
The bcache is a cache mechanism realized in an inner core of the linux kernel, one universal block device can be used as a caching disk, the other universal block device can be used as a backing disk, and then a new block device bcache X is simulated.
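The following is a minimal Python sketch of this composition, written only to illustrate the described behaviour; the class names are assumptions for illustration, and the real bcache is a Linux kernel mechanism rather than a Python API.

class ToyBlockDevice:
    """A generic block device reduced to a dict of block_no -> bytes."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def read(self, block_no):
        return self.blocks.get(block_no)


class ToyBcacheDevice:
    """Models the simulated bcacheX device: a write completes once it reaches
    the caching disk; a read that misses the cache falls through to the backing disk."""
    def __init__(self, caching_disk, backing_disk):
        self.caching_disk = caching_disk   # the drbd block device in this scheme
        self.backing_disk = backing_disk   # the kernel rbd block device in this scheme
        self.dirty = set()                 # block numbers not yet written back

    def write(self, block_no, data):
        self.caching_disk.write(block_no, data)
        self.dirty.add(block_no)           # the IO is considered complete here

    def read(self, block_no):
        data = self.caching_disk.read(block_no)
        return data if data is not None else self.backing_disk.read(block_no)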
drbd is a mature and stable distributed block storage technology. It needs no extra metadata management and its performance is excellent. In this embodiment, the drbd device is used as a multi-copy caching disk that stores the cache data of bcache.
kernel rbd: a kernel module of the linux kernel that presents an rbd image as a linux block device. In this embodiment it is used as the backing disk of bcache.
Specifically, this embodiment takes a write request as an example, that is, the IO request is a write data request: a write data request initiated by the virtual machine qemu is received through the bcache block device.
S102, caching the data corresponding to the IO request to a drbd block device in distributed storage, where the drbd block device serves as the caching device of the bcache block device.
Specifically, this embodiment uses the drbd block device as the caching device of the bcache block device. Therefore, after the write data request is received through the bcache block device, the data corresponding to the request is cached in the drbd block device of the distributed storage.
Specifically, the drbd block device is a cache device with multiple copies (for example, 3 copies): drbd uses nvme as its storage, and the caching device of bcache (the drbd block device) is built on a drbd cluster composed of nvme. Accordingly, caching the data corresponding to the IO request into the drbd block device means caching it into a drbd cluster composed of nvme solid state disks. The nvme comprises a local active partition and passive partitions on the remote nodes, and caching the data corresponding to the IO request into the drbd cluster composed of the nvme means copying the data to the local active partition and to the passive partitions of the remote nodes, thereby realizing the multi-copy cache.
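As an illustration only (this is not the real drbd replication protocol, and the names NvmePartition and DrbdCacheCluster are assumptions), the multi-copy write described above can be sketched as follows: a cached write is applied to the local active partition and mirrored to the passive partitions of the remote nodes before it is treated as cached.

class NvmePartition:
    """A toy stand-in for one nvme partition on one node."""
    def __init__(self, node, role):
        self.node = node      # e.g. "node1"
        self.role = role      # "active" for the local copy, "passive" for remote copies
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data


class DrbdCacheCluster:
    """Caching device backed by one local active partition and the passive
    partitions of the remote nodes (three copies in total in this example)."""
    def __init__(self, local_active, remote_passives):
        self.local_active = local_active
        self.remote_passives = list(remote_passives)

    def cache_write(self, block_no, data):
        # Copy the data to every replica; only then report the write as cached.
        self.local_active.write(block_no, data)
        for replica in self.remote_passives:
            replica.write(block_no, data)
        return True


# Example wiring for the three-node layout of the embodiment.
cluster = DrbdCacheCluster(
    NvmePartition("node1", "active"),
    [NvmePartition("node2", "passive"), NvmePartition("node3", "passive")],
)
cluster.cache_write(0, b"data for block 0")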
S103, writing the data corresponding to the IO request back to a kernel rbd block device of the operating system, where the kernel rbd block device is used as the backing device of the bcache block device.
In this embodiment, the kernel rbd block device is used as the backing device of the bcache block device, and the dirty data cached in the nvme of the drbd block device is written back to the kernel rbd block device. Correspondingly, the cached data of the write data request (a kind of IO request) is written back to the kernel rbd block device, and the written-back data is then sent on to the ceph cluster.
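A small sketch of this write-back step, reusing the toy model from the earlier bcache sketch (the helper name write_back is an assumption, not a real kernel interface): dirty data held in the drbd caching device is copied to the kernel rbd block device, which in the real scheme then sends it on to the ceph cluster.

def write_back(bcache_dev):
    """bcache_dev: an object with caching_disk, backing_disk and a set of dirty
    block numbers, as in the ToyBcacheDevice sketch above."""
    for block_no in sorted(bcache_dev.dirty):
        data = bcache_dev.caching_disk.read(block_no)
        bcache_dev.backing_disk.write(block_no, data)  # in the real scheme, kernel rbd forwards this to ceph
    bcache_dev.dirty.clear()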
To explain the above caching scheme more intuitively, a schematic structural diagram is given in fig. 2, where node1 is the local computer and node2 and node3 are two remote computers; each computer is denoted as a node, node1 being the local node and node2 and node3 the remote nodes. The three drbd instances corresponding to the three nodes share one disk (nvme). Specifically, when writing data, qemu issues an IO request to bcache and writes the data into bcache; the data then lands in drbd (specifically, it is stored in nvme). bcache later reads the data from drbd and writes it back into rbd (kernel rbd), and rbd then sends the data to ceph over the network.
From the above description, it can be seen that the caching method applied to ceph in the embodiments of the present application introduces three new components, bcache, kernel rbd and drbd, while preserving ceph's affinity for cloud-native deployments. The specific caching scheme is as follows: receiving, through a bcache block device, an IO request initiated by a virtual machine qemu, wherein the IO request is a write data request; caching the data corresponding to the IO request to a drbd block device in distributed storage, wherein the drbd block device serves as the caching device of the bcache block device; and writing the data corresponding to the IO request back to a kernel rbd block device of the operating system, wherein the kernel rbd block device serves as the backing device of the bcache block device. Because drbd needs no extra metadata management (i.e., no extra metadata is needed to manage LBA mapping), latency can be reduced and performance improved. In addition, qemu no longer drives ceph directly through rbd in the conventional way; instead, the block device provided by bcache, backed by the kernel rbd block device, is used as the virtual disk. With the bcache caching mechanism, written data only needs to reach the bcache device and land on the caching disk, i.e. the drbd block device, for the IO to complete, so write performance can be greatly improved.
Further, if a cache failure is detected and only one cache copy remains, the data cached in the drbd is read onto a hot spare disk, which is added into the drbd cluster to increase the number of copies. The hot spare disk is itself an rbd disk.
In this embodiment, drbd keeps three copies. When the number of copies falls below two, that is, only one copy remains, the current environment is no longer reliable and drbd stops responding to upper-layer/front-end IO requests. However, the dirty data still held in drbd must be written back to rbd, so in this case an rbd-backed hot spare disk is opened and added to the drbd cluster. During this process drbd synchronizes the data from the single remaining copy to the hot spare disk to obtain another copy. The number of drbd copies is thus increased, drbd resumes responding to upper-layer/front-end IO requests, and the write-back process can continue.
It should be noted that drbd provides a mechanism to stop responding when the number of copies drops below a configured threshold. In addition, the number of copies is reduced when a remote node fails or the local node breaks down.
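The copy-count handling can be pictured with the toy model below; it only mirrors what the text describes (a minimum of two copies, pausing front-end IO, resynchronising onto an rbd-backed hot spare), and the names MIN_COPIES and handle_copy_loss are assumptions for illustration.

MIN_COPIES = 2

def handle_copy_loss(replicas, hot_spare):
    """replicas: list of dicts (block_no -> bytes), one per surviving cache copy.
    hot_spare: an empty dict standing in for the rbd-backed hot spare disk.
    Returns the updated replica list and whether front-end IO may be served."""
    if len(replicas) >= MIN_COPIES:
        return replicas, True              # enough copies, keep serving IO

    # Only one copy remains: stop responding to upper-layer/front-end IO,
    # then synchronise the surviving copy onto the hot spare disk.
    survivor = replicas[0]
    hot_spare.update(survivor)             # resync the cached data onto the hot spare
    replicas.append(hot_spare)             # the hot spare joins the drbd cluster

    io_allowed = len(replicas) >= MIN_COPIES   # copy count restored, IO resumes
    return replicas, io_allowed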
Further, when reading data, a read IO request (read data request) initiated by the virtual machine qemu is received through the bcache block device. If the data to be read exists in the cache, it can be read from the local active partition, that is, directly from the local copy; the read performance is then essentially that of a local SSD, so high read performance is ensured.
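A correspondingly small sketch of the read path under the same assumptions (plain dicts stand in for the devices; the function name is illustrative): cached data is served from the local active partition without a network round trip, and only a cache miss falls through to the kernel rbd device.

def read_block(block_no, local_active_partition, kernel_rbd_device):
    """local_active_partition / kernel_rbd_device: dicts of block_no -> bytes standing
    in for the local nvme copy and the kernel rbd block device respectively."""
    data = local_active_partition.get(block_no)   # local copy: roughly local-SSD latency
    if data is not None:
        return data
    return kernel_rbd_device.get(block_no)        # cache miss: read via the kernel rbd device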
For a detailed and intuitive description of the whole caching process, a flow chart is given in fig. 3, where node1 is the local computer and node2 and node3 are two remote computers; each computer is recorded as a node, node1 being the local node and node2 and node3 the remote nodes. The three drbd instances corresponding to the three nodes share one disk (nvme). Specifically, when writing data, qemu issues an IO request to bcache and writes the data into bcache; the data then lands in drbd (specifically, in nvme), and drbd writes the data back into rbd (kernel rbd). drbd stores data to nvme over three paths: one for local replication and two for remote replication. It should be noted that in this embodiment the nvme is divided into three partitions and there are three copies, but the three copies and the three partitions are not in one-to-one correspondence. When the number of copies falls below 2, that is, when a remote node fails or the local disk is damaged, the hot spare disk (RBD Hotspare in the figure) must be started and added into the drbd cluster; during this process drbd synchronizes the data from the single remaining copy to the hot spare disk to obtain another copy, so the number of drbd copies is increased. With the number of copies guaranteed, the reliability of the environment is guaranteed, and the response to upper-layer/front-end IO requests is restored.
Finally, the performance of the caching method of the present application was tested. The hardware configuration is as follows:
CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz x 2 (16 processors per CPU)
Memory: 96G
Network: 6 gigabit + 2 gigabit
NVMe: Intel P3700
Taking write requests as an example, when the cache space is not full, the following performance data was obtained with the above hardware using the caching approach of the present application, as shown in table 1:
RW TYPE BS IODEPTH NUMJOBS IOPS BW(MiB/s) LATENCY(ms)
randwrite 4k 1 1 3810 14.9 0.258340
randwrite 4k 2 1 7061 27.6 0.279680
randwrite 4k 4 1 12000 50.7 0.305370
randwrite 4k 8 1 22700 88.8 0.349380
randwrite 4k 16 1 33400 130 0.476690
randwrite 4k 32 1 38300 150 0.832900
randwrite 4k 64 1 43900 172 1.454960
randwrite 4k 128 1 47800 187 2.677390
With the same hardware configuration, the existing common optimization method, namely ssd caching when data lands on the osd side, was used to obtain the performance data shown in table 2:
(Table 2 is reproduced as images in the original publication and is not available as text here.)
It can be seen that, compared with ssd caching when data lands on the osd side, the latency of the present application is greatly reduced, and the latency advantage is especially clear when the queue depth is not large.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps may be performed in an order different from the one presented here.
According to an embodiment of the present application, a caching apparatus applied to ceph is further provided for implementing the method described in fig. 1. As shown in fig. 4, the apparatus includes:
a receiving unit 21, configured to receive, through a bcache block device, an IO request initiated by a virtual machine qemu, where the IO request is a write data request;
a cache unit 22, configured to cache the data corresponding to the IO request to a drbd block device in distributed storage, where the drbd block device serves as the caching device of the bcache block device;
and a write-back unit 23, configured to write the data corresponding to the IO request back to a kernel rbd block device of the operating system, where the kernel rbd block device is used as the backing device of the bcache block device.
From the above description, it can be seen that the caching device applied to ceph in the embodiments of the present application introduces three new components, bcache, kernel rbd and drbd, while preserving ceph's affinity for cloud-native deployments. The specific caching scheme is as follows: receiving, through a bcache block device, an IO request initiated by a virtual machine qemu, wherein the IO request is a write data request; caching the data corresponding to the IO request to a drbd block device in distributed storage, wherein the drbd block device serves as the caching device of the bcache block device; and writing the data corresponding to the IO request back to a kernel rbd block device of the operating system, wherein the kernel rbd block device serves as the backing device of the bcache block device. Because drbd needs no extra metadata management (i.e., no extra metadata is needed to manage LBA mapping), latency can be reduced and performance improved. In addition, qemu no longer drives ceph directly through rbd in the conventional way; instead, the block device provided by bcache, backed by the kernel rbd block device, is used as the virtual disk. With the bcache caching mechanism, written data only needs to reach the bcache device and land on the caching disk, i.e. the drbd block device, for the IO to complete, so write performance can be greatly improved.
Further, the drbd block device is a multi-copy cache device, and the cache unit 22 is configured to:
and caching data corresponding to the IO request into a drbd cluster consisting of the solid state disk nvme.
Further, the nvme includes a local active partition and a passive partition of a remote node, and the cache unit 22 is further configured to:
copy the data corresponding to the IO request to the local active partition and the passive partition of the remote node to realize a multi-copy cache.
Further, as shown in fig. 5, the apparatus further includes:
and the standby unit 24 is used for reading the data cached in the drbd by the hot standby disk and adding the data into the drbd cluster to increase the number of copies if a cache failure is detected and only one copy exists.
Further, as shown in fig. 5, the apparatus further includes:
the receiving unit 21 is further configured to receive, through the bcache block device, a read IO request initiated by the virtual machine qemu;
and a reading unit 25, configured to read data from the local active partition.
Further, as shown in fig. 5, the apparatus further includes:
and the sending unit 26 is configured to send the data written back to the kernel rbd block device of the operating system to the ceph cluster after the data corresponding to the IO request is written back to the kernel rbd block device of the operating system.
For the specific process by which each unit and module of the apparatus implements its functions in the embodiments of the present application, reference may be made to the related description in the method embodiments, which is not repeated here.
According to an embodiment of the present application, there is further provided a computer-readable storage medium, where the computer-readable storage medium stores computer instructions, and the computer instructions are configured to cause the computer to execute the caching method applied to ceph in the foregoing method embodiment.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A caching method applied to ceph, the method comprising:
receiving, through a bcache block device, an IO request initiated by a virtual machine qemu, wherein the IO request is a write data request;
caching data corresponding to the IO request to a drbd block device in distributed storage, wherein the drbd block device is used as the caching device of the bcache block device;
and writing the data corresponding to the IO request back to a kernel rbd block device of the operating system, wherein the kernel rbd block device is used as the backing device of the bcache block device.
2. The caching method applied to ceph according to claim 1, wherein the drbd block device is a multi-copy caching device, and caching the data corresponding to the IO request to the drbd block device comprises:
caching the data corresponding to the IO request into a drbd cluster composed of nvme solid state disks.
3. The caching method applied to ceph according to claim 2, wherein the nvme comprises a local active partition and a passive partition of a remote node, and caching the data corresponding to the IO request into the drbd cluster composed of the nvme comprises:
copying the data corresponding to the IO request to the local active partition and the passive partition of the remote node to realize a multi-copy cache.
4. The caching method applied to ceph according to claim 3, wherein the method further comprises:
if a cache failure is detected and only one copy exists, reading the data cached in the drbd onto a hot spare disk and adding the hot spare disk into the drbd cluster to increase the number of copies.
5. The caching method applied to ceph according to claim 3, wherein the method further comprises:
receiving, through the bcache block device, a read IO request initiated by the virtual machine qemu;
and reading the requested data from the local active partition.
6. The caching method applied to ceph according to any one of claims 1 to 4, wherein after writing the data corresponding to the IO request back to the kernel rbd block device of the operating system, the method further comprises:
sending the data written back to the kernel rbd block device on to the ceph cluster.
7. A caching apparatus applied to ceph, the apparatus comprising:
a receiving unit, configured to receive, through a bcache block device, an IO request initiated by a virtual machine qemu, wherein the IO request is a write data request;
a cache unit, configured to cache data corresponding to the IO request to a drbd block device in distributed storage, wherein the drbd block device is used as the caching device of the bcache block device;
and a write-back unit, configured to write the data corresponding to the IO request back to a kernel rbd block device of the operating system, wherein the kernel rbd block device is used as the backing device of the bcache block device.
8. The apparatus according to claim 7, wherein the drbd block device is a multi-copy cache device, and the cache unit is configured to:
and caching data corresponding to the IO request into a drbd cluster consisting of the solid state disk nvme.
9. The caching apparatus applied to ceph according to claim 8, wherein the nvme comprises a local active partition and a passive partition of a remote node, and the cache unit is further configured to:
copy the data corresponding to the IO request to the local active partition and the passive partition of the remote node to realize a multi-copy cache.
10. A computer-readable storage medium storing computer instructions for causing a computer to perform the caching method applied to ceph of any one of claims 1 to 6.
CN202010938200.7A 2020-09-08 2020-09-08 Caching method and device applied to ceph Active CN112131145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010938200.7A CN112131145B (en) 2020-09-08 2020-09-08 Caching method and device applied to ceph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010938200.7A CN112131145B (en) 2020-09-08 2020-09-08 Caching method and device applied to ceph

Publications (2)

Publication Number Publication Date
CN112131145A true CN112131145A (en) 2020-12-25
CN112131145B (en) 2021-11-09

Family

ID=73846287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010938200.7A Active CN112131145B (en) 2020-09-08 2020-09-08 Caching method and device applied to ceph

Country Status (1)

Country Link
CN (1) CN112131145B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9141814B1 (en) * 2014-06-03 2015-09-22 Zettaset, Inc. Methods and computer systems with provisions for high availability of cryptographic keys
CN110781158A (en) * 2019-10-25 2020-02-11 山东乾云启创信息科技股份有限公司 Distributed storage method and system based on CEPH
CN111045987A (en) * 2019-12-17 2020-04-21 湖南大学 Ceph-based distributed file system metadata access acceleration method and system
CN111338751A (en) * 2020-02-13 2020-06-26 山东汇贸电子口岸有限公司 Cross-pool migration method and device for data in same ceph cluster
CN111309266A (en) * 2020-02-23 2020-06-19 苏州浪潮智能科技有限公司 Distributed storage metadata system log optimization system and method based on ceph

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEICHAO DING et al.: "Construction and Performance Analysis of Unified Storage Cloud Platform Based on OpenStack with Ceph RBD", 2018 The 3rd IEEE International Conference on Cloud Computing and Big Data Analysis
李翔: "Research and performance testing of the ceph distributed file system" (ceph分布式文件系统的研究及性能测试), China Master's Theses Full-text Database, Information Science and Technology series
詹玲, 方协云 et al.: "Metadata cache backup based on the ceph file system" (基于ceph文件系统的元数据缓存备份), Computer Engineering (《计算机工程》)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114296646A (en) * 2021-12-24 2022-04-08 天翼云科技有限公司 Caching method, device, server and storage medium based on IO service
CN114296646B (en) * 2021-12-24 2023-06-23 天翼云科技有限公司 Caching method and device based on IO service, server and storage medium

Also Published As

Publication number Publication date
CN112131145B (en) 2021-11-09

Similar Documents

Publication Publication Date Title
KR102457611B1 (en) Method and apparatus for tenant-aware storage sharing platform
US10521135B2 (en) Data system with data flush mechanism
US10291739B2 (en) Systems and methods for tracking of cache sector status
JP4892072B2 (en) Storage device that eliminates duplicate data in cooperation with host device, storage system including the storage device, and deduplication method in the system
CN100405304C (en) Method for realizing high speed solid storage device based on storage region network
US8938604B2 (en) Data backup using distributed hash tables
US11811895B2 (en) Automatic data replica manager in distributed caching and data processing systems
US10146632B2 (en) Efficient mechanism to replicate data for multiple controllers
AU2015360953A1 (en) Dataset replication in a cloud computing environment
WO2015010394A1 (en) Data sending method, data receiving method and storage device
US10244069B1 (en) Accelerated data storage synchronization for node fault protection in distributed storage system
CN105897859B (en) Storage system
US9778927B2 (en) Storage control device to control storage devices of a first type and a second type
CN113220729A (en) Data storage method and device, electronic equipment and computer readable storage medium
CN106873902B (en) File storage system, data scheduling method and data node
US11157191B2 (en) Intra-device notational data movement system
CN112131145B (en) Caching method and device applied to ceph
WO2019049224A1 (en) Distributed storage system and distributed storage control method
KR101601877B1 (en) Apparatus and method for client's participating in data storage of distributed file system
US11010091B2 (en) Multi-tier storage
CN112104729A (en) Storage system and caching method thereof
CN116594551A (en) Data storage method and device
CN109343928B (en) Virtual memory file redirection method and system for virtual machine in virtualization cluster
CN114077517A (en) Data processing method, equipment and system
US10437471B2 (en) Method and system for allocating and managing storage in a raid storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant