CN114138736A - Method, device, equipment and readable medium for selecting members of distributed file system PG

Info

Publication number: CN114138736A
Application number: CN202111276654.3A
Authority: CN
Inventors: 李彦博; 孟祥瑞; 孙润宇
Original assignee: Jinan Inspur Data Technology Co Ltd
Current assignee: Jinan Inspur Data Technology Co Ltd
Priority date: 2021-10-29
Filing date: 2021-10-29
Publication date: 2022-03-04

Abstract

The invention provides a method, an apparatus, a device, and a readable medium for selecting the members of a PG (Placement Group) in a distributed file system. The method comprises the following steps: in response to the failure of a member in an OSD, calculating the new members of the PG in the failed OSD through the CRUSH algorithm and an upmap correction table; in response to the PG performing information synchronization, judging whether the new members of the PG in the failed OSD conform to the fault domain; in response to the new members not conforming to the fault domain, deleting the new members of the PG in the OSD, putting the failed OSD in a waiting state, and sending a message to a monitor; in response to the monitor receiving the message, clearing the information in the upmap correction table and then sending a new osdmap table to the failed OSD; and the failed OSD selecting the members in the new osdmap table as the members of the PG. The scheme ensures that the selected members conform to the fault domain and avoids the front-end service blocking and data loss that fault conditions could otherwise cause.

Description

Method, device, equipment and readable medium for selecting members of distributed file system PG

Technical Field

The present invention relates to the field of computers, and more particularly, to a method, apparatus, device and readable medium for selecting members of a distributed file system PG.

Background

In a distributed storage cluster, the members of a PG (Placement Group, the logical unit of data distribution) are obtained through the CRUSH algorithm. Because CRUSH is a pseudorandom algorithm, the computed PG members are not evenly distributed. To improve the capacity utilization of the cluster, the upmap algorithm is introduced: it produces a correction table, the member list of each PG is corrected with this upmap table, and the PGs end up evenly distributed.

PG balance is thus achieved by correcting the CRUSH result with the upmap correction table. The members of a PG are required to conform to the fault domain. The up members computed by the CRUSH algorithm conform to the fault domain, and the correction table computed by the upmap algorithm is such that the members still conform to the fault domain after replacement. When a fault occurs, however, the members of the PG change and the CRUSH algorithm computes new up members; the correction table of the upmap algorithm may then, with some probability, leave the up members of the PG not conforming to the fault domain, and a further, overlapping fault may ultimately cause data loss.
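The problem described above can be illustrated with a small sketch. It is not taken from the patent: the OSD-to-host layout, the member lists standing in for CRUSH results, and the helper names `apply_upmap` and `hosts_of` are all hypothetical, and the host is assumed to be the fault domain. It shows a correction entry that is harmless while the cluster is healthy but places two members on one host once the CRUSH result changes.

```python
# Hypothetical layout: OSD id -> host. The host is assumed to be the fault domain.
OSD_HOST = {0: "host-a", 1: "host-b", 2: "host-c", 3: "host-c", 4: "host-d"}


def apply_upmap(members, upmap_pairs):
    """Replace every member listed as a 'from' OSD with its paired 'to' OSD."""
    mapping = dict(upmap_pairs)
    return [mapping.get(osd, osd) for osd in members]


def hosts_of(members):
    """Return the host of each member, in member order."""
    return [OSD_HOST[osd] for osd in members]


# One correction entry, computed while the cluster was healthy: move 4 -> 3.
upmap_pairs = [(4, 3)]

# Healthy cluster: the assumed CRUSH result is [0, 1, 4].
before = apply_upmap([0, 1, 4], upmap_pairs)

# OSD 1 has failed: the assumed CRUSH result is now [0, 2, 4].
after = apply_upmap([0, 2, 4], upmap_pairs)

print(before, hosts_of(before))  # three distinct hosts
print(after, hosts_of(after))    # host-c appears twice
```

In this toy layout the same entry `(4, 3)` is valid before the failure and invalid after it, because OSD 2 and OSD 3 share a host.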

Disclosure of Invention

In view of this, an object of the embodiments of the present invention is to provide a method, an apparatus, a device, and a readable medium for selecting members of a distributed file system PG, which can ensure that the selected members conform to a fault domain, and avoid problems of front-end service blocking and data loss caused by a fault condition.

In view of the above object, an aspect of the embodiments of the present invention provides a method for selecting members by a distributed file system PG, including the following steps:

in response to a failure of a member in an OSD (Object-based Storage Device), calculating a new member of the PG in the failed OSD through the CRUSH algorithm and an upmap correction table;

responding to PG for information synchronization, and judging whether a new member of PG in the failed OSD conforms to the fault domain;

in response to the new member of the PG in the failed OSD not conforming to the failure domain, deleting the new member of the PG in the OSD and putting the failed OSD in a waiting state and sending a message to a monitor;

in response to the monitor receiving the message, clearing the information in the upmap correction table and then sending a new osdmap table to the failed OSD;

selecting, by the failed OSD, the members in the new osdmap table as the members of the PG.

According to one embodiment of the present invention, calculating, in response to a failure of a member in an OSD, a new member of the PG in the failed OSD through the CRUSH algorithm and an upmap correction table includes:

in response to the fact that a member in the OSD fails, the member in the failed OSD is removed;

recalculating the members of the failed OSD by using a crush algorithm, and calculating a correction table of the failed OSD by using an upmap algorithm;

and correcting the members calculated by the CRUSH algorithm by using the correction table, wherein the corrected members are selected by the failed OSD as the members of the PG.
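The three sub-steps above (remove the failed member, recompute, correct) can be sketched as follows. This is only an illustration under stated assumptions: `toy_crush` is a hash-based stand-in and not the real CRUSH algorithm, the correction table is passed in as a list of `(from_osd, to_osd)` pairs rather than computed by an upmap algorithm, and all function names are hypothetical.

```python
import hashlib


def toy_crush(pg_id, osds, size):
    """Stand-in for CRUSH: order OSDs by a hash of (pg_id, osd), keep the first `size`."""
    def score(osd):
        return hashlib.sha256(f"{pg_id}:{osd}".encode()).hexdigest()
    return sorted(osds, key=score)[:size]


def reselect_members(pg_id, all_osds, failed_osd, upmap_pairs, size=3):
    """Return (raw, corrected) member lists for one PG after `failed_osd` fails."""
    # Step 1: remove the failed member from the candidates.
    alive = [osd for osd in all_osds if osd != failed_osd]
    # Step 2: recompute the members with the CRUSH stand-in.
    raw = toy_crush(pg_id, alive, size)
    # Step 3: correct the raw members with the (assumed, precomputed) correction table.
    mapping = dict(upmap_pairs)
    corrected = [mapping.get(osd, osd) for osd in raw]
    return raw, corrected


raw, corrected = reselect_members("1.2a", list(range(6)), failed_osd=1, upmap_pairs=[])
print(raw, corrected)
```

Note that nothing in this step checks the fault domain; the corrected list is accepted as-is, which is why the later check during information synchronization is needed.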

According to one embodiment of the present invention, judging, in response to the PG performing information synchronization, whether the new member of the PG in the failed OSD conforms to the fault domain comprises:

in response to the PG performing information synchronization, acquiring node information of the new members of the PG;

judging whether each new member has the same node information;

in response to the same node information existing in the new member, it is determined that the new member of the PG in the failed OSD does not conform to the fault domain.

According to an embodiment of the present invention, further comprising:

in response to the absence of the same node information in the new member, determining that the new member of the PG in the failed OSD conforms to the fault domain.
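The check described above reduces to asking whether any two new members report the same node information. A minimal sketch, assuming the node information is available as a mapping from OSD id to node name (the mapping `node_of` and both function names are hypothetical):

```python
from collections import Counter


def shared_nodes(members, node_of):
    """Return the nodes that host more than one member of the PG, sorted."""
    counts = Counter(node_of[osd] for osd in members)
    return sorted(node for node, n in counts.items() if n > 1)


def conforms_to_fault_domain(members, node_of):
    """True when every member sits on a different node."""
    return not shared_nodes(members, node_of)


node_of = {0: "node-a", 2: "node-c", 3: "node-c", 4: "node-d"}
print(conforms_to_fault_domain([0, 2, 4], node_of))  # all nodes differ
print(shared_nodes([0, 2, 3], node_of))              # node-c is shared
```

Returning the shared nodes rather than only a boolean is a design choice of this sketch; it makes it easier to report which members caused the violation.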

In another aspect of the embodiments of the present invention, there is also provided an apparatus for selecting members by a distributed file system PG, the apparatus including:

a calculation module configured to calculate, in response to the failure of a member in the OSD, a new member of the PG in the failed OSD through the CRUSH algorithm and an upmap correction table;

a judging module configured to judge, in response to the PG performing information synchronization, whether the new member of the PG in the failed OSD conforms to the fault domain;

a delete module configured to delete a new member of PG in OSD and place the failed OSD in a wait state and send a message to a monitor in response to the new member of PG in the failed OSD not conforming to the fault domain;

a processing module configured to send a new osdmap table to the malfunctioning OSD after clearing information in the upmap correction table in response to the monitor receiving the message;

a selection module configured to select a member of the new osdmap table as a member of the PG by the malfunctioning OSD.

According to one embodiment of the invention, the calculation module is further configured to:

According to an embodiment of the invention, the determining module is further configured to:

judging whether each new member has the same node information;

In another aspect of an embodiment of the present invention, there is also provided a computer apparatus including:

at least one processor; and

a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of any of the methods described above.

In another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium storing a computer program, which when executed by a processor implements the steps of any one of the above-mentioned methods.

The invention has the following beneficial technical effects. In the method for selecting members of a distributed file system PG provided by the embodiment of the invention: in response to the failure of a member in an OSD, the new members of the PG in the failed OSD are calculated through the CRUSH algorithm and an upmap correction table; in response to the PG performing information synchronization, it is judged whether the new members of the PG in the failed OSD conform to the fault domain; in response to the new members not conforming to the fault domain, the new members of the PG in the OSD are deleted, the failed OSD is put in a waiting state, and a message is sent to a monitor; in response to the monitor receiving the message, the information in the upmap correction table is cleared and a new osdmap table is then sent to the failed OSD; and the failed OSD selects the members in the new osdmap table as the members of the PG. This technical scheme ensures that the selected members conform to the fault domain and avoids the front-end service blocking and data loss caused by fault conditions.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a method for selecting members of a distributed file system PG according to one embodiment of the present invention;

FIG. 2 is a schematic diagram of an apparatus for selecting members of a distributed file system PG according to one embodiment of the present invention;

FIG. 3 is a schematic diagram of a computer device according to one embodiment of the present invention;

FIG. 4 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.

In view of the above objects, a first aspect of embodiments of the present invention proposes an embodiment of a method for selecting members of a distributed file system PG. FIG. 1 shows a schematic flow diagram of the method.

As shown in FIG. 1, the method may include the steps of:

S1: in response to the failure of a member in the OSD, calculate the new members of the PG in the failed OSD through the CRUSH algorithm and the upmap correction table.

When a member in the OSD fails, namely, a certain PG fails, members need to be reselected for the OSD. After all members of the failed OSD are removed, the PG members of the OSD are calculated through the CRUSH algorithm. However, CRUSH is a pseudorandom algorithm, so the calculated PG members are not evenly distributed. To improve the capacity utilization of the cluster, the upmap algorithm is used to obtain a correction table, the PG members calculated by the CRUSH algorithm are corrected with that table, and the corrected members are the new members of the OSD.

S2: in response to the PG performing information synchronization, judge whether the new members of the PG in the failed OSD conform to the fault domain.

After the new members are selected, the information synchronization process between PGs is carried out. During this process it must be judged whether the selected PG members conform to the fault domain: the node information of the new members of each PG is obtained, and it is judged whether any of the new members share the same node information. If the same node information exists among the new members, the new members of the PG are determined not to conform to the fault domain. If no two new members share the same node information, that is, every PG member is located on a different node, the new members of the PG are determined to conform to the fault domain, and the information synchronization process is then completed.

S3: in response to the new members of the PG in the failed OSD not conforming to the fault domain, delete the new members of the PG in the OSD, put the failed OSD in a waiting state, and send a message to the monitor.

If the new members of the PG are judged not to conform to the fault domain, members need to be selected again. After all the selected members are deleted, the OSD enters a waiting state and sends a message to the monitor so that the monitor triggers the re-selection of members.

S4: in response to the monitor receiving the message, clear the information in the upmap correction table and send the new osdmap table to the failed OSD.

After receiving the message, the monitor deletes the correction table calculated by the upmap algorithm and sends the OSD an osdmap table containing the PG members calculated by the CRUSH algorithm; that is, the PG members in this osdmap table are not corrected.
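The monitor's side of step S4 can be sketched as follows. This is an illustrative model only, not the actual monitor implementation: the `OsdMap` and `Monitor` classes, the `epoch` counter, the `outbox` list standing in for network delivery, and the choice to clear the whole correction table in one step are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class OsdMap:
    epoch: int                                       # assumed version counter
    pg_members: dict                                 # pg_id -> uncorrected CRUSH members
    upmap_table: dict = field(default_factory=dict)  # pg_id -> [(from_osd, to_osd)]


class Monitor:
    def __init__(self, osdmap):
        self.osdmap = osdmap
        self.outbox = []                             # (osd_id, OsdMap) pairs "sent"

    def on_reselect_request(self, osd_id):
        """Handle the message from a waiting OSD: clear upmap, publish a new map."""
        new_map = OsdMap(
            epoch=self.osdmap.epoch + 1,
            pg_members={pg: list(m) for pg, m in self.osdmap.pg_members.items()},
            upmap_table={},                          # correction table cleared
        )
        self.osdmap = new_map
        self.outbox.append((osd_id, new_map))        # stand-in for sending the map
        return new_map


mon = Monitor(OsdMap(epoch=7, pg_members={"1.0": [0, 2, 4]}, upmap_table={"1.0": [(4, 3)]}))
new_map = mon.on_reselect_request(osd_id=2)
print(new_map.epoch, new_map.upmap_table, new_map.pg_members)
```

Because the new map carries no correction entries, the members the OSD reads from it are the uncorrected CRUSH result.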

S5: the failed OSD selects the members in the new osdmap table as the members of the PG.

After receiving the osdmap table, the OSD selects the PG members in the table as the members of the PG and then performs the information synchronization process.

By the technical scheme of the invention, the selected members can be ensured to conform to the fault domain, and the problems of front-end service blocking and data loss caused by the fault condition are avoided.
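Steps S1 to S5 can be tied together in one compact sketch seen from the OSD's side. Everything here is a simplification under stated assumptions: the state names in `PgState` are invented for the example, the monitor round trip of S4 is collapsed into a single line, and the uncorrected CRUSH members are assumed to conform to the fault domain, as the background section states.

```python
from enum import Enum


class PgState(Enum):
    PEERING = "peering"   # information synchronization in progress
    WAITING = "waiting"   # members deleted, waiting for a new osdmap
    ACTIVE = "active"     # synchronization completed


def select_members(raw, upmap_pairs, node_of):
    """Return (members, trace) for one PG, following S1 to S5."""
    trace = []
    mapping = dict(upmap_pairs)
    members = [mapping.get(osd, osd) for osd in raw]   # S1: corrected members
    trace.append(PgState.PEERING)                      # S2: check during sync
    nodes = [node_of[osd] for osd in members]
    if len(set(nodes)) != len(nodes):
        members = []                                   # S3: delete, then wait
        trace.append(PgState.WAITING)
        # S4 (collapsed): the monitor clears the correction table, so the
        # new osdmap carries the uncorrected members.
        members = list(raw)                            # S5: adopt them
        trace.append(PgState.PEERING)
    trace.append(PgState.ACTIVE)
    return members, trace


node_of = {0: "node-a", 2: "node-c", 3: "node-c", 4: "node-d"}
members, trace = select_members([0, 2, 4], [(4, 3)], node_of)
print(members, [state.value for state in trace])
```

With the correction entry `(4, 3)` the corrected list would place two members on `node-c`, so the sketch passes through the waiting state and ends with the uncorrected members.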

In a preferred embodiment of the present invention, calculating, in response to a failure of a member in the OSD, a new member of the PG in the failed OSD through the CRUSH algorithm and an upmap correction table includes:

and correcting the members calculated by the CRUSH algorithm by using the correction table, wherein the corrected members are selected by the failed OSD as the members of the PG. Because the computed PG members are not evenly distributed, the upmap algorithm is used to improve the capacity utilization of the cluster: a correction table is obtained through the upmap algorithm, the PG members computed by the CRUSH algorithm are corrected with it, and the corrected members are the new members of the OSD.

In a preferred embodiment of the present invention, judging, in response to the PG performing information synchronization, whether the new member of the PG in the failed OSD conforms to the fault domain comprises:

judging whether each new member has the same node information;

In a preferred embodiment of the present invention, the method further comprises:

It should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by instructing relevant hardware through a computer program, and the above programs may be stored in a computer-readable storage medium, and when executed, the programs may include the processes of the embodiments of the methods as described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.

Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.

In view of the above objects, according to a second aspect of the embodiments of the present invention, there is provided an apparatus for selecting members by a distributed file system PG, as shown in fig. 2, the apparatus 200 includes:

In a preferred embodiment of the present invention, the calculation module is further configured to:

In a preferred embodiment of the present invention, the determining module is further configured to:

judging whether each new member has the same node information;

In view of the above object, a third aspect of the embodiments of the present invention provides a computer device. Fig. 3 is a schematic diagram of an embodiment of a computer device provided by the present invention. As shown in fig. 3, an embodiment of the present invention includes the following means: at least one processor 21; and a memory 22, the memory 22 storing computer instructions 23 executable on the processor, the instructions when executed by the processor implementing the method of:

in response to the failure of a member in the OSD, calculating a new member of the PG in the failed OSD through the CRUSH algorithm and an upmap correction table;

judging whether each new member has the same node information;

In view of the above object, a fourth aspect of the embodiments of the present invention proposes a computer-readable storage medium. FIG. 4 is a schematic diagram illustrating an embodiment of a computer-readable storage medium provided by the present invention. As shown in fig. 4, the computer-readable storage medium 31 stores a computer program 32 that, when executed by a processor, performs the method of:

judging whether each new member has the same node information;

Furthermore, the methods disclosed according to embodiments of the present invention may also be implemented as a computer program executed by a processor, and the computer program may be stored in a computer-readable storage medium. When executed by the processor, the computer program performs the above-described functions defined in the methods disclosed in embodiments of the invention.

Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.

In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.

The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to these examples. Within the idea of an embodiment of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims

1. A method for selecting members of a distributed file system PG, comprising the steps of:

2. The method of claim 1, wherein calculating, in response to a failure of a member in an OSD, a new member of the PG in the failed OSD through the CRUSH algorithm and an upmap correction table comprises:

3. The method of claim 1, wherein judging, in response to the PG performing information synchronization, whether a new member of the PG in the failed OSD conforms to the fault domain comprises:

judging whether each new member has the same node information;

4. The method of claim 3, further comprising:

5. An apparatus for selecting members of a distributed file system PG, the apparatus comprising:

6. The apparatus of claim 5, wherein the calculation module is further configured to:

7. The apparatus of claim 5, wherein the determining module is further configured to:

judging whether each new member has the same node information;

8. The apparatus of claim 7, wherein the determining module is further configured to:

9. A computer device, comprising:

at least one processor; and

a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 4.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.