CN108846009B - Copy data storage method and device in ceph

Info

Publication number: CN108846009B
Application number: CN201810400813.8A
Authority: CN
Inventors: 韩庆波
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2018-04-28
Filing date: 2018-04-28
Publication date: 2021-02-05
Anticipated expiration: 2038-04-28
Also published as: CN108846009A

Abstract

An embodiment of the invention provides a method and a device for storing copy data in ceph, an electronic device, and a storage medium. The method comprises the following steps: acquiring a topology of an object storage device (OSD) cluster; dividing at least one physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains, and taking each undivided fault domain and each virtual fault sub-domain obtained by the division as a fault domain to be stored; selecting, based on the number of copies to be stored, that number of fault domains to be stored as storage fault domains, wherein different virtual fault sub-domains among the selected storage fault domains belong to different physical fault domains; and storing the copies in one OSD of each storage fault domain, respectively. The copy data storage method provided by the embodiment of the invention can reduce the probability that copy data in ceph is lost.

Description

Copy data storage method and device in ceph

Technical Field

The present invention relates to the field of data management technologies, and in particular, to a copy data storage method and apparatus in a ceph (distributed file system), an electronic device, and a storage medium.

Background

ceph is a distributed file system. In ceph, a PG (placement group) is a virtual node for data storage, and the carrier of a PG is a physical hardware storage unit, such as an OSD (object storage device). Each PG has R copies of its data, which are stored on R different OSDs, and the R OSDs belong to different physical fault domains. A physical fault domain is an artificially defined storage area. Physical fault domains are defined so that the failure of one storage area does not affect every copy of the same data; for this reason, copies of the same data are usually stored in different physical fault domains.

If the OSDs that store the copy data of a PG, each of which belongs to a different physical fault domain, all fail, the copy data of the PG is lost. A collection containing a certain number of OSDs is usually referred to as an OSD cluster.

The known probability of losing the copy data of a PG is given by a formula over the following quantities:

R represents the number of copies, P_r represents the probability that R OSDs fail simultaneously, N represents the number of OSDs in the OSD cluster, and M represents the number of possible distributions of the R copies of one PG in the OSD cluster.
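The expression itself is not reproduced in this text. One form that is consistent with the symbols defined above is sketched below. It is an assumption, not a quotation of the original formula: P (the loss probability) and the binomial coefficient (the number of ways to choose R OSDs out of N) are names introduced here for illustration.

```latex
P = P_r \cdot \frac{M}{\binom{N}{R}}
```

Under this assumed form, P grows linearly with M for fixed R, N and P_r, which matches the later statement that a larger M means a larger probability of loss.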

For example, fig. 1 is a schematic diagram of a copy data storage method in ceph in the prior art. As shown in fig. 1, a PG has three copies, which are stored on three OSDs located in three different fault domains. Each rack in fig. 1 is a fault domain; that is, the three OSDs storing the copies of the PG are located on three different racks.

In fig. 1 there are 24 OSDs on each rack, so the number of possible distributions of the 3 copies of the PG in the OSD cluster is 24 × 24 × 24. That is, in the formula for the probability of losing the PG's copy data, M = 24 × 24 × 24 = 13824. This value is relatively large, which means that the probability of losing the PG's copy data is also relatively large.
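The arithmetic for fig. 1 can be checked with a short sketch. The layout (three racks of 24 OSDs, three copies) comes from the text; the variable names are illustrative, and the count of all OSD triples is added only for scale.

```python
from math import comb, prod

# Fig. 1 layout: three racks (physical fault domains), 24 OSDs each.
osds_per_rack = [24, 24, 24]
replicas = 3

# One copy goes to each rack, so M is the product of the per-rack OSD counts.
m = prod(osds_per_rack)

# For scale: the number of ways to pick any 3 OSDs out of all 72.
n = sum(osds_per_rack)
all_triples = comb(n, replicas)

print(m)            # 13824
print(all_triples)  # 59640
```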

Disclosure of Invention

The embodiment of the invention aims to provide a method and a device for storing copy data in ceph, electronic equipment and a storage medium, so as to reduce the probability of loss of the copy data in the ceph.

The specific technical scheme is as follows:

the embodiment of the invention provides a method for storing copy data in ceph, which comprises the following steps:

acquiring a topological structure of an OSD cluster, wherein the OSD cluster comprises a plurality of OSD, and the topological structure represents the division condition of a physical fault domain of the OSD cluster;

dividing at least one physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains, and taking an undivided fault domain and a virtual fault sub-domain obtained by division as fault domains to be stored;

selecting, based on the number of copies to be stored, that number of fault domains to be stored as storage fault domains, wherein different virtual fault sub-domains among the selected storage fault domains belong to different physical fault domains;

and respectively storing the duplicate data in one OSD of each storage fault domain.

Optionally, the dividing at least one physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains includes:

and dividing each physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains.

Optionally, before the selecting of the fault domains to be stored as the storage fault domains, the method further includes:

dividing the obtained fault domains to be stored into a plurality of groups of fault domains to be stored, wherein different virtual fault sub-domains in each group belong to different physical fault domains;

the selecting of the fault domains to be stored as the storage fault domains comprises:

selecting, from one of the plurality of groups of fault domains to be stored, a number of fault domains to be stored equal to the number of copies as the storage fault domains.

Optionally, the dividing at least one physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains includes:

dividing each physical fault domain of the OSD cluster into two virtual fault sub-domains;

before the selecting of the fault domains to be stored as the storage fault domains, the method further includes:

dividing the obtained fault domains to be stored into two groups of fault domains to be stored, wherein different virtual fault sub-domains in each group belong to different physical fault domains;

the selecting of the fault domains to be stored as the storage fault domains comprises:

selecting, from one of the two groups of fault domains to be stored, a number of fault domains to be stored equal to the number of copies as the storage fault domains.

Optionally, the number of the duplicate data to be stored is the same as the number of the physical fault domains of the OSD cluster.

The embodiment of the invention also provides a device for storing the copy data in the ceph, which comprises:

a topology obtaining module, configured to obtain a topology of an OSD cluster, wherein the OSD cluster comprises a plurality of OSDs, and the topology represents the division of the physical fault domains of the OSD cluster;

a fault domain dividing module, configured to divide at least one physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains, and to take each undivided fault domain and each virtual fault sub-domain obtained by the division as a fault domain to be stored;

a storage domain selection module, configured to select, based on the number of copies to be stored, that number of fault domains to be stored as storage fault domains, wherein different virtual fault sub-domains among the selected storage fault domains belong to different physical fault domains;

and a storage module, configured to store the copy data in one OSD of each storage fault domain, respectively.

Optionally, the fault domain dividing module is specifically configured to:

divide each physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains.

Optionally, the apparatus further comprises:

a grouping module, configured to divide the obtained fault domains to be stored into a plurality of groups of fault domains to be stored, wherein different virtual fault sub-domains in each group belong to different physical fault domains;

the storage domain selection module is specifically configured to:

select, from one of the plurality of groups of fault domains to be stored, a number of fault domains to be stored equal to the number of copies as the storage fault domains.

Optionally, the fault domain dividing module is specifically configured to:

divide each physical fault domain of the OSD cluster into two virtual fault sub-domains;

the apparatus further comprises:

a dividing module, configured to divide the obtained fault domains to be stored into two groups of fault domains to be stored, wherein different virtual fault sub-domains in each group belong to different physical fault domains;

and a selecting module, configured to select, from one of the two groups of fault domains to be stored, a number of fault domains to be stored equal to the number of copies as the storage fault domains.

An embodiment of the invention also provides an electronic device, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;

the memory is used for storing a computer program;

the processor is configured to implement any of the above method steps when executing the program stored in the memory.

An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above method steps.

By using the copy data storage method, the device, the electronic equipment and the storage medium in ceph provided by the embodiment of the invention, at least one physical fault domain in the OSD cluster can be divided into a plurality of virtual fault sub-domains, fault domains with the same number as that of copies are selected for copy data storage, and different virtual fault sub-domains in the selected fault domains belong to different physical fault domains. The number of OSD contained in the selected fault domain is smaller than that contained in the physical fault domain before division, so that the number of distribution conditions of the copy data in the selected fault domain is reduced, and the copy data storage method provided by the embodiment of the invention can reduce the probability of the loss of the copy data in ceph according to the existing probability formula of the loss of the copy data.

Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

FIG. 1 is a diagram illustrating a method for storing copy data in ceph according to the prior art;

fig. 2 is a flowchart of a method for storing copy data in ceph according to an embodiment of the present invention;

fig. 3 is a schematic diagram of a copy data storage method in ceph according to an embodiment of the present invention;

fig. 4 is another schematic diagram of a copy data storage method in ceph according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a copy data storage device in ceph according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.

In order to solve the problems in the prior art, embodiments of the present invention provide a method and an apparatus for storing copy data in ceph, an electronic device, and a storage medium, which can solve the problem in the prior art that the probability of losing copy data of PG is relatively high.

Referring to fig. 2, fig. 2 is a flowchart of a copy data storage method in ceph according to an embodiment of the present invention, where the method may include the following steps:

step S201: acquiring a topological structure of an OSD cluster, wherein the OSD cluster comprises a plurality of OSD, and the topological structure represents the division condition of a physical fault domain of the OSD cluster;

In ceph, copy data may be stored on OSDs. Before the data is stored, the topology of the OSD cluster may first be obtained by reading the CRUSH map (the hierarchical cluster map in ceph), which describes the hierarchy of the ceph storage system, for example, how many racks the storage system includes, how many storage devices each rack includes, and how many OSDs each storage device includes.

In the embodiment of the present invention, the obtained topology structure of the OSD cluster may include a division condition of a physical fault domain, where the physical fault domain is a storage region that is artificially divided, and the division of the physical fault domain is to avoid that the same duplicate data stored in a certain storage region is affected when the certain storage region fails, so that the same duplicate data is usually stored in different physical fault domains. Referring to fig. 1, the physical fault domain level in fig. 1 is a rack level, that is, each rack is a physical fault domain, and as can be seen from fig. 1, each rack contains 24 OSDs.
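As a minimal sketch of what step S201 yields, the topology can be held as a mapping from each physical fault domain to the OSDs it contains. The function name `build_topology`, the rack names and the integer OSD ids are hypothetical; in a real cluster this information would come from the CRUSH map rather than being generated.

```python
from typing import Dict, List


def build_topology(racks: List[str], osds_per_rack: int) -> Dict[str, List[int]]:
    """Return a hypothetical topology: physical fault domain -> OSD ids."""
    topology: Dict[str, List[int]] = {}
    next_id = 0
    for rack in racks:
        # Assign consecutive OSD ids to each rack.
        topology[rack] = list(range(next_id, next_id + osds_per_rack))
        next_id += osds_per_rack
    return topology


# Fig. 1 layout: three racks, each a physical fault domain with 24 OSDs.
topology = build_topology(["rack-A", "rack-B", "rack-C"], 24)
for rack, osds in topology.items():
    print(rack, len(osds), osds[0], osds[-1])
```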

Step S202: dividing at least one physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains, and taking an undivided fault domain and a virtual fault sub-domain obtained by division as fault domains to be stored;

In this step, the division of the physical fault domains may be changed by modifying the ceph rule (a data storage rule in ceph), which defines the constraints for storing copy data. For example, for three copies of the same data, the ceph rule specifies that the three copies are stored in three different physical fault domains.

In the embodiment of the invention, at least one physical fault domain of the OSD cluster can be divided into a plurality of virtual fault sub-domains by modifying the ceph rule. After the division is finished, each undivided fault domain and each virtual fault sub-domain obtained by the division is used as a fault domain to be stored.

Alternatively, every physical fault domain of the OSD cluster may be divided into a plurality of virtual fault sub-domains; the embodiment of the present invention does not limit this.
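The division in step S202 can be sketched as follows. This is an illustration only: the text performs the division by modifying the ceph rule, whereas the sketch splits an in-memory OSD list. The function `split_domains`, the naming scheme (`b` becomes `b1`, `b2`) and the OSD counts per domain are assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

# Fault domain to be stored -> (parent physical fault domain, OSD ids).
Candidates = Dict[str, Tuple[str, List[int]]]


def split_domains(topology: Dict[str, List[int]], parts: int,
                  to_split: Optional[Set[str]] = None) -> Candidates:
    """Split the chosen physical fault domains into `parts` virtual sub-domains.

    Domains not listed in `to_split` are kept whole; None means split all.
    """
    candidates: Candidates = {}
    for domain, osds in topology.items():
        if to_split is not None and domain not in to_split:
            candidates[domain] = (domain, list(osds))
            continue
        size = -(-len(osds) // parts)  # ceiling division
        for i in range(parts):
            chunk = osds[i * size:(i + 1) * size]
            if chunk:
                candidates[f"{domain}{i + 1}"] = (domain, chunk)
    return candidates


# Four physical fault domains with four OSDs each (counts are hypothetical).
topology = {
    "a": [0, 1, 2, 3],
    "b": [4, 5, 6, 7],
    "c": [8, 9, 10, 11],
    "d": [12, 13, 14, 15],
}
candidates = split_domains(topology, parts=2, to_split={"b", "c", "d"})
print(list(candidates))  # ['a', 'b1', 'b2', 'c1', 'c2', 'd1', 'd2']
```

Keeping the parent domain next to each sub-domain makes the later constraint (no two selected sub-domains from the same physical fault domain) easy to check.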

Step S203: for the copy data to be stored, selecting, based on the number of copies to be stored, that number of fault domains to be stored as storage fault domains, wherein different virtual fault sub-domains among the selected storage fault domains belong to different physical fault domains;

In the embodiment of the invention, after the physical fault domains are divided, a number of fault domains to be stored equal to the number of copies can be selected to store the copy data. For example, for data with three copies, three fault domains to be stored may be selected to store the three copies respectively. The three selected fault domains belong to different physical fault domains, which ensures that the three copies are stored in different physical fault domains.

In the embodiment of the invention, after the physical fault domains are divided, the obtained fault domains to be stored can also be divided into a plurality of groups, such that different virtual fault sub-domains in each group belong to different physical fault domains. When determining where the copy data is to be stored, one group can be selected from the plurality of groups, and then a number of fault domains to be stored equal to the number of copies is selected from that group as the storage fault domains. The storage fault domains are the selected fault domains in which the copy data is stored.

For example, fig. 3 is a schematic diagram of a copy data storage method according to an embodiment of the present invention. For data to be stored with three copies, three fault domains are finally selected for storing the data. Before that, suppose four physical fault domains are obtained, namely the physical fault domains a, b, c and d shown in fig. 3. In the embodiment shown in fig. 3, the physical fault domain a is not divided, while the physical fault domains b, c and d are divided into the virtual fault sub-domains b1 and b2, c1 and c2, and d1 and d2, respectively. Then a, b1, b2, c1, c2, d1 and d2 are all used as fault domains to be stored.

After the division is completed, the fault domains to be stored can be grouped, and different virtual fault sub domains in each group of fault domains to be stored are ensured to belong to different physical fault domains. For example, a, b1, c1 and d2 may be divided into one group, or b1, c2 and d2 may be divided into one group, that is, the number of fault domains to be stored included in each group may be greater than or equal to the number of copies of copy data to be stored.

After the grouping is determined, one group can be selected from the plurality of groups of fault domains to be stored for storing the copy data. Because the number of fault domains in the selected group may be greater than the number of copies, a number of fault domains equal to the number of copies is selected from the group, and the copy data is stored in those fault domains. For example, if the selected group includes the fault domains a, b1, c1 and d2 and the number of copies is 3, then a, b1 and c1 can be selected from the group for storing the copy data.
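The grouping constraint and the selection in step S203 can be sketched with the fig. 3 names. The `PARENT` table, the helper names and the use of random sampling are assumptions; the text does not say how a group or its members are chosen.

```python
import random
from typing import List

# Parent physical fault domain of each fault domain to be stored (fig. 3).
PARENT = {"a": "a", "b1": "b", "b2": "b", "c1": "c", "c2": "c",
          "d1": "d", "d2": "d"}


def is_valid_group(group: List[str]) -> bool:
    """True if no two members share the same physical fault domain."""
    parents = [PARENT[name] for name in group]
    return len(parents) == len(set(parents))


def select_storage_domains(group: List[str], replicas: int,
                           rng: random.Random) -> List[str]:
    """Pick `replicas` fault domains from one valid group."""
    if not is_valid_group(group):
        raise ValueError("group mixes sub-domains of one physical fault domain")
    if len(group) < replicas:
        raise ValueError("group has fewer fault domains than copies")
    return rng.sample(group, replicas)


group = ["a", "b1", "c1", "d2"]  # one of the groups named in the text
chosen = select_storage_domains(group, replicas=3, rng=random.Random(0))
print(chosen)
```

Any subset of a valid group is itself valid, so the chosen storage fault domains always belong to different physical fault domains.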

Step S204: storing the copy data in one OSD of each storage fault domain, respectively.

After a number of fault domains to be stored equal to the number of copies has been selected, these fault domains are determined to be the storage fault domains, that is, the copy data is stored in them. Each storage fault domain includes a plurality of OSDs, and one copy is stored in one OSD of each storage fault domain.
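Step S204 then reduces to picking one OSD inside each storage fault domain. The sketch below assumes a uniform random pick and hypothetical OSD ids; the text does not specify how the OSD within a domain is chosen.

```python
import random
from typing import Dict, List


def place_copies(storage_domains: Dict[str, List[int]],
                 rng: random.Random) -> Dict[str, int]:
    """Pick one OSD per storage fault domain; returns domain -> OSD id."""
    return {name: rng.choice(osds) for name, osds in storage_domains.items()}


# Storage fault domains a, b1 and c1 with hypothetical OSD ids.
storage_domains = {"a": [0, 1, 2, 3], "b1": [4, 5], "c1": [8, 9]}
placement = place_copies(storage_domains, random.Random(42))
print(placement)
```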

It can be seen that by using the method for storing duplicate data in ceph provided by the embodiment of the present invention, at least one physical fault domain in an OSD cluster can be divided into a plurality of virtual fault sub-domains, and then fault domains with the same number as that of duplicates are selected for storing the duplicate data, and different virtual fault sub-domains in the selected fault domains belong to different physical fault domains. The number of OSD contained in the selected fault domain is smaller than that contained in the physical fault domain before division, so that the number of distribution conditions of the copy data in the selected fault domain is reduced, and the copy data storage method provided by the embodiment of the invention can reduce the probability of the loss of the copy data in ceph according to the existing probability formula of the loss of the copy data.

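In this embodiment of the present invention, dividing at least one physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains may include:

dividing each physical fault domain of the OSD cluster into two virtual fault sub-domains;

before selecting the fault domains to be stored as the storage fault domains, the method may further include:

dividing the obtained fault domains to be stored into two groups of fault domains to be stored, wherein different virtual fault sub-domains in each group belong to different physical fault domains;

selecting the fault domains to be stored as the storage fault domains may include:

selecting, from one of the two groups of fault domains to be stored, a number of fault domains to be stored equal to the number of copies as the storage fault domains.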

Fig. 4 is another schematic diagram of a copy data storage method in ceph according to an embodiment of the present invention. In the embodiment shown in fig. 4, each physical fault domain, that is, each rack in the figure, is divided into two virtual fault sub-domains, giving A1, A2, B1, B2, C1 and C2. As shown in fig. 4, each virtual fault sub-domain includes 12 OSDs. Two groups of fault domains to be stored may then be determined such that the virtual fault sub-domains in each group belong to different physical fault domains; for example, A1, B1 and C1 form one group, and A2, B2 and C2 form the other.

When data is stored, one of the two groups is selected, and each copy is stored on one OSD in one storage fault domain of the selected group. Because each storage fault domain includes 12 OSDs, three copies of the same data have 12 × 12 × 12 possible placements within each group, so the total number of possible placements is 12 × 12 × 12 × 2 = 3456. This is smaller than the 24 × 24 × 24 = 13824 possible placements under the prior-art method. It follows from the existing probability formula for copy data loss that the method provided in the embodiment of the present invention can reduce the probability of losing copy data in ceph.
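The two placement counts can be confirmed by brute-force enumeration. The rack and sub-domain sizes come from the text; the OSD ids and the helper name `count_placements` are illustrative.

```python
from itertools import product
from typing import List


def count_placements(groups: List[List[List[int]]]) -> int:
    """Count OSD tuples when all copies of a PG land in a single group."""
    total = 0
    for group in groups:
        # One OSD is chosen from every fault domain of the group.
        total += sum(1 for _ in product(*group))
    return total


# Three racks A, B, C with 24 OSDs each (ids are hypothetical).
racks = {name: list(range(i * 24, (i + 1) * 24)) for i, name in enumerate("ABC")}

# Prior art: a single group made of the three whole racks.
prior_art = count_placements([[racks[name] for name in "ABC"]])

# Fig. 4: each rack split into halves of 12; group k takes the k-th halves.
halves = [[racks[name][k * 12:(k + 1) * 12] for name in "ABC"] for k in range(2)]
proposed = count_placements(halves)

print(prior_art)             # 13824
print(proposed)              # 3456
print(prior_art / proposed)  # 4.0
```

In this example M shrinks by a factor of four while every PG still has one copy on each rack.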

Referring to fig. 5, fig. 5 is a schematic structural diagram of a copy data storage device in ceph according to an embodiment of the present invention, where the schematic structural diagram may include:

a topology obtaining module 501, configured to obtain a topology of an object storage device (OSD) cluster, where the OSD cluster includes a plurality of OSDs, and the topology represents the division of the physical fault domains of the OSD cluster;

a fault domain dividing module 502, configured to divide at least one physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains, and use an undivided fault domain and a divided virtual fault sub-domain as a fault domain to be stored;

a storage domain selecting module 503, configured to select, for the copy data to be stored and based on the number of copies to be stored, that number of fault domains to be stored as storage fault domains, where different virtual fault sub-domains among the selected storage fault domains belong to different physical fault domains;

the storage module 504 is configured to store the replica data in one OSD of each storage fault domain.
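The four modules can be read as stages of one pipeline. The sketch below mirrors modules 501 to 504 as methods of a hypothetical `ReplicaPlacer` class; the class, its method names, the random choices, and the requirement that each rack's OSD count be divisible by the number of parts are all assumptions made for illustration.

```python
import random
from typing import Dict, List


class ReplicaPlacer:
    """Illustrative pipeline mirroring modules 501-504 (not the patented code)."""

    def __init__(self, topology: Dict[str, List[int]], seed: int = 0) -> None:
        self.topology = topology       # what module 501 would obtain
        self.rng = random.Random(seed)

    def divide(self, parts: int) -> List[Dict[str, List[int]]]:
        """Module 502 plus grouping: group k holds the k-th slice of each rack."""
        groups: List[Dict[str, List[int]]] = [{} for _ in range(parts)]
        for rack, osds in self.topology.items():
            if len(osds) % parts:
                raise ValueError("OSD count must be divisible by parts")
            size = len(osds) // parts
            for k in range(parts):
                groups[k][f"{rack}{k + 1}"] = osds[k * size:(k + 1) * size]
        return groups

    def select(self, groups: List[Dict[str, List[int]]],
               replicas: int) -> Dict[str, List[int]]:
        """Module 503: pick one group, then `replicas` fault domains from it."""
        group = self.rng.choice(groups)
        names = self.rng.sample(sorted(group), replicas)
        return {name: group[name] for name in names}

    def store(self, storage_domains: Dict[str, List[int]]) -> Dict[str, int]:
        """Module 504: pick one OSD in each storage fault domain."""
        return {name: self.rng.choice(osds)
                for name, osds in storage_domains.items()}


topology = {name: list(range(i * 24, (i + 1) * 24)) for i, name in enumerate("ABC")}
placer = ReplicaPlacer(topology)
groups = placer.divide(parts=2)
placement = placer.store(placer.select(groups, replicas=3))
print(placement)
```

Because each group contains exactly one sub-domain per rack, any selection from a single group satisfies the constraint that the storage fault domains belong to different physical fault domains.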

In this embodiment of the present invention, the fault domain dividing module 502 may be specifically configured to:

each physical fault domain of the OSD cluster is divided into a plurality of virtual fault sub-domains.

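In this embodiment of the present invention, on the basis of the copy data storage device in ceph shown in fig. 5, the device may further include:

a grouping module, configured to divide the obtained fault domains to be stored into a plurality of groups of fault domains to be stored, where different virtual fault sub-domains in each group belong to different physical fault domains;

the storage domain selection module may be specifically configured to:

select, from one of the plurality of groups of fault domains to be stored, a number of fault domains to be stored equal to the number of copies as the storage fault domains.

In this embodiment of the present invention, the fault domain dividing module may be specifically configured to:

divide each physical fault domain of the OSD cluster into two virtual fault sub-domains;

on the basis of the copy data storage device in ceph shown in fig. 5, the device may further include:

a dividing module, configured to divide the obtained fault domains to be stored into two groups of fault domains to be stored, where different virtual fault sub-domains in each group belong to different physical fault domains;

and a selecting module, configured to select, from one of the two groups of fault domains to be stored, a number of fault domains to be stored equal to the number of copies as the storage fault domains.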

In the embodiment of the present invention, the number of the duplicate data to be stored may be the same as the number of the physical fault domains of the OSD cluster.

An embodiment of the invention discloses an electronic device which, as shown in fig. 6, comprises a processor 601, a communication interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communication interface 602 and the memory 603 communicate with each other through the communication bus 604;

a memory 603 for storing a computer program;

the processor 601 is configured to implement any of the above method steps when executing the program stored in the memory 603.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.

The embodiment of the invention also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the computer program is used for realizing any one of the method steps when being executed by a processor.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the invention are brought about in whole or in part when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for storing copy data in a distributed file system ceph is characterized by comprising the following steps:

acquiring a topological structure of an object storage device (OSD) cluster, wherein the OSD cluster comprises a plurality of OSDs, and the topological structure represents the division of the physical fault domains of the OSD cluster;

respectively storing the duplicate data in an OSD of each storage fault domain;

2. The method of claim 1, wherein the dividing at least one physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains comprises:

3. The method of claim 1, wherein the dividing at least one physical fault domain of the OSD cluster into a plurality of virtual fault sub-domains comprises:

4. The method of claim 3, wherein the number of replica data to be stored is the same as the number of physical fault domains of the OSD cluster.

5. A replica data storage apparatus in a distributed file system ceph, the apparatus comprising:

a topological structure acquisition module, configured to acquire a topological structure of an object storage device (OSD) cluster, wherein the OSD cluster comprises a plurality of OSDs, and the topological structure represents the division of the physical fault domains of the OSD cluster;

the storage module is used for respectively storing the duplicate data in one OSD of each storage fault domain;

the device further comprises:

the storage domain selection module is specifically configured to:

6. The apparatus according to claim 5, wherein the fault domain partitioning module is specifically configured to:

7. The apparatus of claim 5, wherein the fault domain partitioning module is specifically configured to:

the device further comprises:

8. The apparatus of claim 7, wherein the number of replica data to be stored is the same as the number of physical fault domains of the OSD cluster.

9. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete communication with each other through the communication bus;

the memory is used for storing a computer program;

the processor, when executing the program stored in the memory, implementing the method steps of any of claims 1-4.