CN109614276B - Fault processing method and device, distributed storage system and storage medium - Google Patents

Fault processing method and device, distributed storage system and storage medium

Info

Publication number
CN109614276B
Authority
CN
China
Prior art keywords
object storage
storage device
fault
group
osd
Prior art date
Legal status
Active
Application number
CN201811433003.9A
Other languages
Chinese (zh)
Other versions
CN109614276A (en)
Inventor
宋小兵
姜文峰
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811433003.9A
Publication of CN109614276A
Priority to PCT/CN2019/088634
Application granted
Publication of CN109614276B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094 Redundant storage or storage space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention relates to distributed storage technology, and discloses a fault processing method, a fault processing device, a distributed storage system and a computer-readable storage medium. The invention detects failed main OSDs in real time or at regular intervals; when a failed main OSD is detected, the PG corresponding to each piece of object data stored in the failed main OSD is determined according to the predetermined mapping relation between object data and PGs, and each determined PG is taken as a fault PG; the copy configuration quantity of all object data corresponding to all the fault PGs is reduced from a first preset quantity to a second preset quantity; and one standby OSD is selected from the standby OSD group as a new main OSD, the new main OSD replaces the failed main OSD, and the copy configuration quantity of all object data corresponding to all the fault PGs is increased from the second preset quantity back to the first preset quantity. Compared with the prior art, the method and the device reduce the amount of data migrated between OSDs during OSD fault processing.

Description

Fault processing method and device, distributed storage system and storage medium
Technical Field
The present invention relates to the field of distributed storage technologies, and in particular, to a method and an apparatus for processing a failure, a distributed storage system, and a computer-readable storage medium.
Background
The CEPH distributed file system is a distributed storage system with large capacity, high performance and strong reliability. The core component of CEPH is the OSD (Object Storage Device), which manages an individual hard disk and provides an object-based storage interface for read and write access. A CEPH cluster is composed of a plurality of independent OSDs, and the number of OSDs can be increased or decreased dynamically. The CEPH client distributes object data (Objects) to different OSDs for storage through the CRUSH algorithm. CRUSH is a pseudo-random distribution algorithm: it first assigns each piece of object data to a Placement Group (PG) according to its hash value, and then calculates the OSDs in which that PG is stored, so that object data belonging to the same PG is stored in the target OSDs corresponding to that PG.
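As an illustration of this two-level mapping (not taken from the patent), the following minimal Python sketch hashes an object name to a PG and then deterministically picks the OSDs that hold that PG; the PG count, OSD ids and replica count are assumed values, and real CEPH computes the PG-to-OSD step with the CRUSH map rather than this simplified selection.

```python
import hashlib
import random

def object_to_pg(object_name: str, pg_num: int) -> int:
    """Map an object to a placement group (PG) via its hash value."""
    digest = hashlib.md5(object_name.encode()).hexdigest()
    return int(digest, 16) % pg_num

def pg_to_osds(pg_id: int, osd_ids: list, replicas: int = 3) -> list:
    """Deterministically (pseudo-randomly) choose the OSDs that store a PG."""
    rng = random.Random(pg_id)            # seeded so the mapping is repeatable
    return rng.sample(osd_ids, replicas)  # one primary plus (replicas - 1) further copies

# Example: 128 PGs spread over OSDs 0..5, three copies per PG
pg = object_to_pg("volume1/object42", pg_num=128)
print(pg, pg_to_osds(pg, osd_ids=list(range(6)), replicas=3))
```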
CEPH is also a self-repairing storage cluster: when an OSD in CEPH fails, that OSD is taken out of service, the data belonging to it is reconstructed and redistributed to other OSDs, and after the OSD is repaired, CEPH migrates part of the data on the other OSDs back to it. In this way CEPH maintains the integrity of the stored data even when one of its OSDs fails. However, although this fault handling approach guarantees the integrity of the stored data, a large amount of data is migrated between OSDs during the reconstruction process, which occupies storage cluster resources and reduces storage performance.
Therefore, reducing the amount of data migrated between OSDs during OSD fault processing has become an urgent problem to be solved.
Disclosure of Invention
The main object of the present invention is to provide a fault handling method, a fault handling device, a distributed storage system and a computer-readable storage medium that reduce the amount of data migrated between OSDs during OSD fault handling.
In order to achieve the above object, the present invention provides an electronic device, where the electronic device is respectively in communication connection with a plurality of active OSDs and at least one standby OSD group, where the standby OSD group includes a plurality of standby OSDs, the active OSDs are used to store object data, a first preset number of copies of each object data are respectively stored in corresponding first preset number of active OSDs, the electronic device includes a memory and a processor, where the memory stores a fault handling program, and when the fault handling program is executed by the processor, the electronic device implements the following steps:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main OSD is detected, determining PG corresponding to each object data stored in the fault main OSD according to a predetermined mapping relation between the object data and PG, and taking each determined PG as a fault PG;
a degradation step: reducing the copy configuration quantity of all object data corresponding to all the fault PGs from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby OSD from the standby OSD group as a new main OSD, replacing the failed main OSD by the new main OSD, and increasing the copy configuration quantity of all object data corresponding to the failed PG from a second preset quantity to a first preset quantity.
Preferably, the processor executes the fault handling program, and after the replacing step, further implements the steps of:
and according to a predetermined mapping relation between the PG and the main OSD, taking a first preset number of main OSD corresponding to each fault PG as a fault OSD group, and performing data recovery on the new main OSD by using other non-faulty main OSD except the new main OSD in each fault OSD group.
Preferably, the processor executes the fault handling program, and after the replacing step, further implements the steps of:
when a write request for object data is received by one of the fault OSD groups, redirecting the write request to the standby OSD group, and executing the write request by using the standby OSD group;
judging whether each spare OSD of the spare OSD group stores object data in real time or at regular time or when an incremental data recovery request is received;
when each spare OSD of the spare OSD group stores object data, judging whether the fault OSD group exists or not;
when the fault OSD group does not exist, transferring the object data stored in the standby OSD group to one or more main OSD;
when the fault OSD group exists, searching for a main OSD which does not belong to the fault OSD group, and, when such a main OSD is found, transferring the object data stored in the standby OSD group to one or more of the found main OSDs.
Preferably, the step of replacing the failed active OSD with the new active OSD includes:
and removing the preset mapping relationship between the equipment identification information of the main OSD with the fault and the position information of the main OSD with the fault, allocating the equipment identification information of the main OSD with the fault to the new main OSD as the equipment identification information of the new main OSD, and reestablishing and storing the mapping relationship between the equipment identification information of the new main OSD and the position information of the new main OSD.
In addition, to achieve the above object, the present invention further provides a fault handling method, which is suitable for an electronic device, where the electronic device is respectively in communication connection with a plurality of active OSDs and at least one standby OSD group, where the standby OSD group includes a plurality of standby OSDs, the active OSDs are used to store object data, and copies of a first preset number of each object data are respectively stored in the corresponding active OSDs of the first preset number, and the method includes:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main OSD is detected, determining PG corresponding to each object data stored in the fault main OSD according to a predetermined mapping relation between the object data and PG, and taking each determined PG as a fault PG;
a degradation step: reducing the copy configuration quantity of all object data corresponding to all the fault PGs from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby OSD from the standby OSD group as a new main OSD, replacing the failed main OSD by the new main OSD, and increasing the copy configuration quantity of all object data corresponding to the failed PG from a second preset quantity to a first preset quantity.
Preferably, after the step of replacing, the method further comprises:
and according to a predetermined mapping relation between the PG and the main OSD, taking a first preset number of main OSD corresponding to each fault PG as a fault OSD group, and performing data recovery on the new main OSD by using other non-faulty main OSD except the new main OSD in each fault OSD group.
Preferably, after the step of replacing, the method further comprises:
when a write request for object data is received by one of the fault OSD groups, redirecting the write request to the standby OSD group, and executing the write request by using the standby OSD group;
judging whether each spare OSD of the spare OSD group stores object data in real time or at regular time or when an incremental data recovery request is received;
when each spare OSD of the spare OSD group stores object data, judging whether the fault OSD group exists or not;
when the fault OSD group does not exist, transferring the object data stored in the standby OSD group to one or more main OSD;
when the fault OSD group exists, searching for a main OSD which does not belong to the fault OSD group, and, when such a main OSD is found, transferring the object data stored in the standby OSD group to one or more of the found main OSDs.
Preferably, the step of replacing the failed active OSD with the new active OSD includes:
and removing the preset mapping relationship between the equipment identification information of the main OSD with the fault and the position information of the main OSD with the fault, allocating the equipment identification information of the main OSD with the fault to the new main OSD as the equipment identification information of the new main OSD, and reestablishing and storing the mapping relationship between the equipment identification information of the new main OSD and the position information of the new main OSD.
In addition, to achieve the above object, the present invention further provides a distributed storage system, where the distributed storage system includes an electronic apparatus, a plurality of active object storage devices, and at least one standby object storage device group, the electronic apparatus is respectively in communication connection with each of the active object storage devices and each of the standby object storage device groups, each of the standby object storage device groups includes a plurality of standby object storage devices, the active object storage devices are configured to store object data, a first preset number of copies of each of the object data are respectively stored in corresponding first preset number of active object storage devices, the electronic apparatus includes a memory and a processor, a fault handling program is stored in the memory, and when executed by the processor, the fault handling program implements the following steps:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main object storage device is detected, determining a homing group corresponding to each object data stored in the fault main object storage device according to a predetermined mapping relation between the object data and the homing group, and taking each determined homing group as a fault homing group;
a degradation step: reducing the copy configuration quantity of all the object data corresponding to all the fault homing groups from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby object storage device from the standby object storage device group as a new main object storage device, replacing the failed main object storage device with the new main object storage device, and increasing the copy configuration quantity of all object data corresponding to the failure homing group from a second preset quantity to a first preset quantity.
Furthermore, to achieve the above object, the present invention also proposes a computer-readable storage medium storing a fault handling program executable by at least one processor to cause the at least one processor to perform the steps of the fault handling method according to any one of the above.
The invention detects failed main OSDs in real time or at regular intervals; when a failed main OSD is detected, the PG corresponding to each piece of object data stored in the failed main OSD is determined according to the predetermined mapping relation between object data and PGs, and each determined PG is taken as a fault PG; the copy configuration quantity of all object data corresponding to all the fault PGs is reduced from a first preset quantity to a second preset quantity; and one standby OSD is selected from the standby OSD group as a new main OSD, the new main OSD replaces the failed main OSD, and the copy configuration quantity of all object data corresponding to the fault PGs is increased from the second preset quantity back to the first preset quantity. Compared with the prior art, when an OSD of the distributed storage system fails, reducing the copy configuration quantity of all object data corresponding to the fault PGs from the first preset quantity to the second preset quantity makes the distributed storage system recognize that the current number of copies of each fault PG already satisfies the copy configuration quantity, so no data reconstruction is performed for the failed OSD and no large amount of data migration between OSDs is caused. The invention therefore reduces the amount of data migrated between OSDs during OSD fault processing.
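As a compact illustration of the detection, determination, degradation and replacement steps summarized above, the following Python sketch models the cluster state with plain dictionaries; the Cluster structure, its field names and the handle_failure function are assumptions for illustration only, not the patent's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    object_to_pg: dict      # object name -> PG id
    pg_copy_config: dict    # PG id -> configured copy quantity
    primary_osds: dict      # primary OSD id -> healthy?
    standby_osds: list      # standby OSD group
    osd_objects: dict = field(default_factory=dict)  # OSD id -> object names stored on it

def handle_failure(cluster: Cluster, failed_osd: str,
                   first_preset: int = 3, second_preset: int = 2) -> str:
    # Determination step: the PGs of all object data stored on the failed primary OSD
    fault_pgs = {cluster.object_to_pg[obj] for obj in cluster.osd_objects.get(failed_osd, [])}
    # Degradation step: lower the copy configuration so no data reconstruction is triggered
    for pg in fault_pgs:
        cluster.pg_copy_config[pg] = second_preset
    # Replacement step: promote one standby OSD, then restore the copy configuration
    new_primary = cluster.standby_osds.pop(0)     # assumes the standby group is not empty
    cluster.primary_osds.pop(failed_osd, None)
    cluster.primary_osds[new_primary] = True
    for pg in fault_pgs:
        cluster.pg_copy_config[pg] = first_preset
    return new_primary
```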
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
FIG. 1 is a system architecture diagram of a distributed storage system according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a storage relationship of the distributed storage system of the present invention;
FIG. 3 is a schematic diagram of an operating environment of a first embodiment of a fault handler of the present invention;
FIG. 4 is a block diagram of a first embodiment of a fault handling routine of the present invention;
fig. 5 is a flowchart illustrating a fault handling method according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a schematic diagram of a system architecture of a distributed storage system according to a first embodiment of the present invention.
In this embodiment, the distributed storage system includes a plurality of active OSDs 31 and at least one standby OSD group, where the standby OSD group includes a plurality of standby OSDs 32, each of the active OSDs 31 and the standby OSDs 32 may be disposed in each host 3, for example, at least one active OSD31 and at least one standby OSD32 are disposed in one host 3, and each of the active OSDs 31 and the standby OSD32 are communicatively connected (e.g., communicatively connected through a network 2).
In some application scenarios, an electronic device 1 is further disposed in the distributed storage system, and the electronic device 1 is communicatively connected (e.g., communicatively connected via a network 2) with each of the active OSD31 and the standby OSD 32.
In some application scenarios, the electronic apparatus 1 is disposed independently of the distributed storage system and is communicatively connected with the distributed storage system (e.g., communicatively connected via the network 2).
In this embodiment, the smallest storage unit in the distributed storage system is object data (object), one object data is a data block with a size not exceeding a predetermined value (e.g., 4MB), and each object data is mapped to a corresponding PG.
The distributed storage system supports a multi-copy policy. For example, if the copy configuration quantity of the object data corresponding to a PG in the distributed storage system is preset to a first preset quantity (e.g., three), every piece of object data in that PG has the first preset quantity of copies, and each copy of all the object data in the PG is correspondingly stored in one of the first preset quantity of OSDs. For example, 3 copies of each piece of object data in PG1.1 in fig. 2 are stored in osd.0, osd.1 and osd.2, respectively, so all object data in PG1.1 is stored in osd.0, osd.1 and osd.2. Since the distributed storage system performs data processing with the PG as its basic unit, in the following embodiments one complete copy of all the object data in a PG is referred to as a PG copy of that PG.
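The storage relationship of fig. 2 can be pictured with a small mapping such as the one below; the PG and OSD names are those used in the figure, while the dictionary representation itself is only an illustrative assumption.

```python
# Acting OSDs per PG, matching fig. 2: every OSD in the list holds one PG copy.
pg_to_osds = {
    "PG1.1": ["osd.0", "osd.1", "osd.2"],
    "PG1.2": ["osd.0", "osd.1", "osd.2"],
    "PG1.3": ["osd.0", "osd.2", "osd.3"],
}

def pg_copies(pg_id: str) -> int:
    """Number of PG copies currently held in the cluster for the given PG."""
    return len(pg_to_osds[pg_id])

assert pg_copies("PG1.1") == 3   # equals the first preset quantity of three
```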
Hereinafter, various embodiments of the present invention will be proposed based on the above-described distributed storage system and related devices.
The invention provides a fault handling program.
Referring to fig. 3, fig. 3 is a schematic diagram of an operating environment of a fault handling program according to a first embodiment of the invention.
In the present embodiment, the failure processing program 10 is installed and run in the electronic apparatus 1. The electronic device 1 may be a desktop computer, a notebook, a palm computer, a server, or other computing equipment. The electronic device 1 may include, but is not limited to, a memory 11 and a processor 12 that communicate with each other via a program bus. Fig. 3 only shows the electronic device 1 with components 11, 12, but it is to be understood that not all shown components are required to be implemented, and that more or fewer components may alternatively be implemented.
The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk or a memory of the electronic device 1. The memory 11 may also be an external storage device of the electronic apparatus 1 in other embodiments, such as a plug-in hard disk provided on the electronic apparatus 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic apparatus 1. The memory 11 is used for storing application software installed in the electronic device 1 and various types of data, such as program codes of the fault handling program 10. The memory 11 may also be used to temporarily store data that has been output or is to be output.
The processor 12, which in some embodiments may be a Central Processing Unit (CPU), a microprocessor or another data processing chip, is configured to run the program code stored in the memory 11 or to process data, for example to execute the fault handling program 10.
Referring to fig. 4, a block diagram of a first embodiment of the fault handling program 10 according to the present invention is shown. In this embodiment, the fault handling program 10 may be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to complete the present invention. For example, as shown in fig. 4, the fault handling program 10 may be divided into a detection module 101, a determination module 102, a degradation module 103 and a replacement module 104. The module referred to in the present invention is a series of computer program instruction segments capable of performing a specific function, and is better suited than a whole program for describing the execution process of the fault handling program 10 in the electronic device 1, wherein:
the detecting module 101 is configured to detect whether each active OSD fails in real time or at regular time.
For example, a heartbeat mechanism may be used: a detection message is sent to each primary OSD in real time or at regular intervals to detect whether that primary OSD has failed, as sketched below.
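A minimal heartbeat-style probe might look like the following sketch; the probe() helper, the TCP transport, the timeout and the OSD addresses are assumptions for illustration, since the patent does not prescribe a particular detection transport.

```python
import socket
import time

def probe(address, timeout: float = 1.0) -> bool:
    """Return True if the OSD endpoint accepts a TCP connection within the timeout."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False

def detect_failed_osds(osd_addresses: dict, interval: float = 5.0):
    """Periodically probe every primary OSD and yield the ids that stop responding."""
    while True:
        for osd_id, addr in osd_addresses.items():
            if not probe(addr):
                yield osd_id          # hand the failed OSD over to the determination step
        time.sleep(interval)
```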
The determining module 102 is configured to, when a faulty active OSD is detected, determine, according to a predetermined mapping relationship between object data and PGs, a PG corresponding to each object data stored in the faulty active OSD, and use each determined PG as a faulty PG.
A demotion module 103, configured to reduce the copy configuration amount of all object data corresponding to all the failed PGs from a first preset amount to a second preset amount.
For example, if the first preset number is 3, each object data in each PG should have 3 copies and be correspondingly stored in 3 active OSDs, that is, one PG should have 3 PG copies and be correspondingly stored in 3 active OSDs. Once a primary OSD fails, only 2 PG copies of the failed PG exist in the distributed storage system, and when the distributed storage system recognizes that the number of the copies of the failed PG is less than the copy configuration amount, data reconstruction is started, that is, one PG copy of each failed PG is copied, and each copied PG copy is written into the corresponding primary OSD, so that the number of the copies of the failed PG reaches the copy configuration amount. In this embodiment, the copy allocation amount of all the object data corresponding to all the failed PGs is reduced from the first preset amount to the second preset amount, that is, the copy allocation amount of all the failed PGs is reduced from the first preset amount to the second preset amount, for example, the first preset amount is 3, and the second preset amount is 2, that is, the multi-copy policy of the failed PG is degraded from three copies to two copies. At this time, one PG copy of each failed PG stored in the failed active OSD is removed, two PG copies of each failed PG still exist in the active OSDs in other normal states, and the number of the PG copies of the failed PG is equal to the current copy configuration amount, so that the distributed storage system does not immediately reconstruct data, and does not cause a large amount of data migration.
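The effect of the degradation step can be illustrated with the sketch below: lowering the configured copy count of each failed PG to the number of surviving copies means the "copies below configuration" condition that would trigger reconstruction is never met. The function and parameter names are assumptions, not the patent's code.

```python
def degrade_fault_pgs(pg_copy_config: dict, surviving_copies: dict,
                      fault_pgs: set, second_preset: int = 2) -> None:
    """Lower the configured copy count of every fault PG so the surviving
    copies already satisfy it and no data reconstruction is started."""
    for pg in fault_pgs:
        pg_copy_config[pg] = second_preset
        # With three configured copies and one lost, two survive: 2 >= 2,
        # so the 'copies below configuration' rebuild condition is not met.
        assert surviving_copies[pg] >= pg_copy_config[pg]
```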
The replacing module 104 is configured to select one spare OSD from the spare OSD group as a new active OSD, replace the failed active OSD with the new active OSD, and increase the copy configuration amount of all object data corresponding to all the failed PGs from the second preset amount to the first preset amount.
In this embodiment, the step of selecting one spare OSD from the spare OSD group by the replacement module 104 as the new active OSD includes:
and searching the standby OSD group for the standby OSD which is in the same host as the main OSD with the fault. And if the standby OSD is found, taking the found standby OSD as a new main OSD. If not, a spare OSD is randomly selected from the spare OSD group as a new main OSD.
Further, in this embodiment, the step of replacing, by the replacing module 104, the failed active OSD by the new active OSD includes:
and removing a preset mapping relationship between the device identification information of the main OSD with the fault and the position information (for example, a network port value) of the main OSD with the fault, allocating the device identification information of the main OSD with the fault to the new main OSD as the device identification information of the new main OSD, and reestablishing and storing the mapping relationship between the device identification information of the new main OSD and the position information of the new main OSD.
In this embodiment, the device identification information of the failed main OSD is allocated to the new main OSD as its device identification information, rather than using the new main OSD's original device identification information, for the following reason. If the original device identification information of the new main OSD were used and a mapping relationship between that identification information and the location information of the new main OSD were established, the distributed storage system would recognize that a new OSD had been added and would start a data rebalancing (re-balance) operation, that is, it would select some PG copies from each of the other main OSDs and migrate them to the new main OSD to achieve a reasonable distribution of PG copies. Such a rebalancing operation causes a large amount of data migration and thus affects the response speed of the distributed storage system.
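The identity reuse described above can be sketched as follows; osd_id_to_location is an assumed in-memory stand-in for the cluster's OSD map. Because the failed OSD's identifier is reused, the set of known OSD ids does not grow, so no "new OSD added" rebalancing is triggered.

```python
def replace_failed_primary(osd_id_to_location: dict, failed_id: str,
                           new_osd_location: str) -> None:
    """Give the replacement OSD the failed OSD's identity instead of a brand-new id."""
    # Remove the mapping from the failed OSD's id to its old location ...
    osd_id_to_location.pop(failed_id, None)
    # ... and re-establish the same id at the new OSD's location, so the cluster
    # sees a known OSD coming back rather than an extra OSD joining (no rebalance).
    osd_id_to_location[failed_id] = new_osd_location
```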
Compared with the prior art, when an OSD in the distributed storage system fails, the copy configuration amount of all object data corresponding to the fault PGs is reduced from the first preset amount to the second preset amount, so that the distributed storage system recognizes that the current number of copies of each fault PG already satisfies the copy configuration amount; therefore, no data reconstruction is performed for the failed OSD, and no large amount of data migration between OSDs is caused.
Further, in this embodiment, the program further includes a data recovery module (not shown in the figure) configured to:
according to the predetermined mapping relationship between PGs and active OSDs, the first preset number of active OSDs corresponding to each failed PG is taken as a failed OSD group (as shown in fig. 2, if osd.0 is the failed active OSD, then PG1.1, PG1.2 and PG1.3 are all failed PGs; the failed OSD group corresponding to PG1.1 includes osd.0, osd.1 and osd.2, the failed OSD group corresponding to PG1.2 includes osd.0, osd.1 and osd.2, and the failed OSD group corresponding to PG1.3 includes osd.0, osd.2 and osd.3), and data recovery is performed on the new active OSD using the non-failed active OSDs in each failed OSD group other than the new active OSD. After data recovery is completed, the state of each failed OSD group is marked as normal.
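A sketch of that recovery pass is shown below: each failed PG is copied from a surviving member of its failed OSD group onto the new primary. The read_pg and write_pg callables are hypothetical placeholders for the actual replication transport.

```python
def recover_new_primary(new_primary: str, fault_pgs: set,
                        fault_osd_groups: dict, read_pg, write_pg) -> None:
    """Rebuild every fault PG on the new primary from a healthy member of its fault OSD group."""
    for pg in fault_pgs:
        donors = [osd for osd in fault_osd_groups[pg] if osd != new_primary]
        pg_copy = read_pg(donors[0], pg)     # read one surviving PG copy
        write_pg(new_primary, pg, pg_copy)   # write it onto the new primary OSD
```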
Further, in this embodiment, the program further includes a redirection module (not shown in the figure) configured to:
when one of the fault OSD groups receives a write request of object data, the write request is redirected to the spare OSD group, and the write request is executed by using the spare OSD group.
The standby OSD group is used to execute the write request in this embodiment because, at this point, the new active OSD in the failed OSD group has not yet completed data recovery; if the write request were executed by the failed OSD group, its execution would be delayed. Executing the write request with the standby OSD group therefore effectively preserves write performance.
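A sketch of the redirection rule: while any member of a PG's fault OSD group is still recovering, writes for that PG go to the standby group. The handle_write function, the recovering set and the write_to_group callable are illustrative assumptions.

```python
def handle_write(pg: str, data: bytes, fault_osd_groups: dict,
                 recovering: set, write_to_group) -> str:
    """Send writes for PGs whose fault OSD group is still recovering to the standby group."""
    group = fault_osd_groups.get(pg)
    if group and any(osd in recovering for osd in group):
        write_to_group("standby", pg, data)   # redirect: the fault group is not ready yet
        return "standby"
    write_to_group("primary", pg, data)       # normal write path
    return "primary"
```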
Further, in this embodiment, the data recovery module is further configured to:
and judging whether each spare OSD of the spare OSD group stores object data in real time or at regular time or when an incremental data recovery request is received.
And when each spare OSD of the spare OSD group stores object data, judging whether the fault OSD group exists or not.
And when the fault OSD group does not exist, transferring the object data stored in the standby OSD group to one or more main OSD.
And when the fault OSD group exists, searching for the main OSD which does not belong to the fault OSD group.
And when the standby OSD group is searched, transferring the object data stored in the standby OSD group to one or more searched main OSD.
And when the data is not searched, returning a message of failing to recover the incremental data, or returning and continuously searching the main OSD which does not belong to the fault OSD group until the main OSD which does not belong to the fault OSD group is searched.
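A sketch of that drain-back pass is given below; the move_object callable is a hypothetical transfer helper, and the membership checks mirror the conditions listed above.

```python
def drain_standby_group(standby_objects: dict, primary_osds: list,
                        fault_osd_groups: dict, move_object) -> None:
    """Migrate object data buffered on the standby group back onto primary OSDs
    that are not members of any remaining fault OSD group."""
    in_fault_group = {osd for group in fault_osd_groups.values() for osd in group}
    targets = [osd for osd in primary_osds if osd not in in_fault_group]
    if not targets:
        raise RuntimeError("incremental data recovery failed: no eligible primary OSD")
    for standby_osd, objects in standby_objects.items():
        for i, obj in enumerate(objects):
            move_object(standby_osd, targets[i % len(targets)], obj)
        objects.clear()
```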
Further, in this embodiment, the program is further configured to:
detecting the number of standby OSDs in the standby OSD group in real time or at regular intervals, and, when that number is less than or equal to a preset threshold, selecting, from the OSDs of each host, one or more OSDs that do not belong to the standby OSD group and adding them to the standby OSD group.
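A sketch of this replenishment check follows; the threshold, the per-host OSD inventory and the assumption that the listed OSDs are eligible to serve as standbys are all illustrative.

```python
def replenish_standby_group(standby_group: list, host_osds: dict, threshold: int = 2) -> None:
    """Top up the standby OSD group from per-host spare OSDs when it runs low."""
    if len(standby_group) > threshold:
        return
    for host, osds in host_osds.items():
        for osd in osds:
            if osd not in standby_group:
                standby_group.append(osd)   # add one spare OSD from this host
                break                       # take at most one OSD per host
```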
In addition, the invention provides a fault processing method.
As shown in fig. 5, fig. 5 is a flowchart illustrating a fault handling method according to a first embodiment of the present invention.
In this embodiment, the method is applicable to an electronic device, the electronic device is respectively in communication connection with a plurality of active OSDs and at least one standby OSD group, each of the standby OSD groups includes a plurality of standby OSDs, the active OSDs are used for storing object data, and copies of a first preset number of each of the object data are respectively stored in the corresponding active OSDs of the first preset number, and the method includes:
step S10, detecting whether each active OSD fails in real time or at regular time.
For example, a heartbeat mechanism may be used: a detection message is sent to each primary OSD in real time or at regular intervals to detect whether that primary OSD has failed.
Step S20, when a faulty active OSD is detected, determining PGs corresponding to object data stored in the faulty active OSD according to a predetermined mapping relationship between the object data and the PGs, and using the determined PGs as faulty PGs.
Step S30, reducing the copy allocation amount of all object data corresponding to all the failed PGs from a first preset amount to a second preset amount.
For example, if the first preset number is 3, each object data in each PG should have 3 copies and be correspondingly stored in 3 active OSDs, that is, one PG should have 3 PG copies and be correspondingly stored in 3 active OSDs. Once a primary OSD fails, only 2 PG copies of the failed PG exist in the distributed storage system, and when the distributed storage system recognizes that the number of the copies of the failed PG is less than the copy configuration amount, data reconstruction is started, that is, one PG copy of each failed PG is copied, and each copied PG copy is written into the corresponding primary OSD, so that the number of the copies of the failed PG reaches the copy configuration amount. In this embodiment, the copy allocation amount of all the object data corresponding to all the failed PGs is reduced from the first preset amount to the second preset amount, that is, the copy allocation amount of all the failed PGs is reduced from the first preset amount to the second preset amount, for example, the first preset amount is 3, and the second preset amount is 2, that is, the multi-copy policy of the failed PG is degraded from three copies to two copies. At this time, one PG copy of each failed PG stored in the failed active OSD is removed, two PG copies of each failed PG still exist in the active OSDs in other normal states, and the number of the PG copies of the failed PG is equal to the current copy configuration amount, so that the distributed storage system does not immediately reconstruct data, and does not cause a large amount of data migration.
Step S40, selecting one spare OSD from the spare OSD group as a new active OSD, replacing the failed active OSD with the new active OSD, and increasing the copy allocation amount of all object data corresponding to all the failed PGs from the second preset amount to the first preset amount.
In this embodiment, the step of selecting one spare OSD from the spare OSD group as the new active OSD includes:
and searching the standby OSD group for the standby OSD which is in the same host as the main OSD with the fault. And if the standby OSD is found, taking the found standby OSD as a new main OSD. If not, a spare OSD is randomly selected from the spare OSD group as a new main OSD.
Further, in this embodiment, the step of replacing the failed active OSD with the new active OSD includes:
and removing a preset mapping relationship between the device identification information of the main OSD with the fault and the position information (for example, a network port value) of the main OSD with the fault, allocating the device identification information of the main OSD with the fault to the new main OSD as the device identification information of the new main OSD, and reestablishing and storing the mapping relationship between the device identification information of the new main OSD and the position information of the new main OSD.
In this embodiment, the device identification information of the failed main OSD is allocated to the new main OSD as its device identification information, rather than using the new main OSD's original device identification information, for the following reason. If the original device identification information of the new main OSD were used and a mapping relationship between that identification information and the location information of the new main OSD were established, the distributed storage system would recognize that a new OSD had been added and would start a data rebalancing (re-balance) operation, that is, it would select some PG copies from each of the other main OSDs and migrate them to the new main OSD to achieve a reasonable distribution of PG copies. Such a rebalancing operation causes a large amount of data migration and thus affects the response speed of the distributed storage system.
Compared with the prior art, when an OSD in the distributed storage system fails, the copy configuration amount of all object data corresponding to the fault PGs is reduced from the first preset amount to the second preset amount, so that the distributed storage system recognizes that the current number of copies of each fault PG already satisfies the copy configuration amount; therefore, no data reconstruction is performed for the failed OSD, and no large amount of data migration between OSDs is caused.
Further, in this embodiment, after step S40, the method further includes:
according to the predetermined mapping relationship between PGs and active OSDs, the first preset number of active OSDs corresponding to each failed PG is taken as a failed OSD group (as shown in fig. 2, if osd.0 is the failed active OSD, then PG1.1, PG1.2 and PG1.3 are all failed PGs; the failed OSD group corresponding to PG1.1 includes osd.0, osd.1 and osd.2, the failed OSD group corresponding to PG1.2 includes osd.0, osd.1 and osd.2, and the failed OSD group corresponding to PG1.3 includes osd.0, osd.2 and osd.3), and data recovery is performed on the new active OSD using the non-failed active OSDs in each failed OSD group other than the new active OSD. After data recovery is completed, the state of each failed OSD group is marked as normal.
Further, in this embodiment, after step S40, the method further includes:
when one of the fault OSD groups receives a write request of object data, the write request is redirected to the spare OSD group, and the write request is executed by using the spare OSD group.
The standby OSD group is used to execute the write request in this embodiment because, at this point, the new active OSD in the failed OSD group has not yet completed data recovery; if the write request were executed by the failed OSD group, its execution would be delayed. Executing the write request with the standby OSD group therefore effectively preserves write performance.
Further, in this embodiment, the method further includes:
and judging whether each spare OSD of the spare OSD group stores object data in real time or at regular time or when an incremental data recovery request is received.
And when each spare OSD of the spare OSD group stores object data, judging whether the fault OSD group exists or not.
And when the fault OSD group does not exist, transferring the object data stored in the standby OSD group to one or more main OSD.
And when the fault OSD group exists, searching for the main OSD which does not belong to the fault OSD group.
And when the standby OSD group is searched, transferring the object data stored in the standby OSD group to one or more searched main OSD.
And when the data is not searched, returning a message of failing to recover the incremental data, or returning and continuously searching the main OSD which does not belong to the fault OSD group until the main OSD which does not belong to the fault OSD group is searched.
Further, in this embodiment, the method further includes:
detecting the number of standby OSDs in the standby OSD group in real time or at regular intervals, and, when that number is less than or equal to a preset threshold, selecting, from the OSDs of each host, one or more OSDs that do not belong to the standby OSD group and adding them to the standby OSD group.
Further, the present invention also proposes a computer-readable storage medium storing a fault handling program executable by at least one processor to cause the at least one processor to perform the fault handling method in any of the above embodiments.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An electronic apparatus, wherein the electronic apparatus is respectively in communication connection with a plurality of active object storage devices and at least one standby object storage device group, the standby object storage device group includes a plurality of standby object storage devices, the active object storage devices are used for storing object data, a first preset number of copies of each object data are respectively stored in a corresponding first preset number of active object storage devices, the electronic apparatus includes a memory and a processor, the memory stores a fault handling program, and when executed by the processor, the fault handling program implements the following steps:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main object storage device is detected, determining a homing group corresponding to each object data stored in the fault main object storage device according to a predetermined mapping relation between the object data and the homing group, and taking each determined homing group as a fault homing group;
a degradation step: reducing the copy configuration quantity of all the object data corresponding to all the fault homing groups from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby object storage device from the standby object storage device group as a new main object storage device, replacing the failed main object storage device with the new main object storage device, and increasing the copy configuration quantity of all object data corresponding to the failure homing group from a second preset quantity to a first preset quantity.
2. The electronic device of claim 1, wherein the processor executes the fault handling program and, after the replacing step, further performs the steps of:
according to a predetermined mapping relationship between the homing groups and the primary object storage devices, taking a first preset number of primary object storage devices corresponding to each fault homing group as a fault object storage device group, and performing data recovery on the new primary object storage device by using other non-fault primary object storage devices in each fault object storage device group except the new primary object storage device.
3. The electronic device of claim 2, wherein the processor executes the fault handling program and, after the replacing step, further performs the steps of:
when a fault object storage device group receives a write request of object data, redirecting the write request to a standby object storage device group, and executing the write request by using the standby object storage device group;
judging whether each spare object storage device of the spare object storage device group stores object data in real time or at regular time or when receiving an incremental data recovery request;
when object data is stored in each spare object storage device of the spare object storage device group, judging whether the fault object storage device group exists or not;
when the fault object storage equipment group does not exist, migrating the object data stored in the standby object storage equipment group to one or more main object storage equipment;
when the fault object storage device group exists, searching for a main object storage device which does not belong to the fault object storage device group, and, when such a main object storage device is found, migrating the object data stored in the standby object storage device group to one or more of the found main object storage devices.
4. The electronic apparatus of any of claims 1-3, wherein the step of replacing the failed primary object storage device with the new primary object storage device comprises:
and removing the preset mapping relationship between the device identification information of the failed main object storage device and the position information of the failed main object storage device, allocating the device identification information of the failed main object storage device to the new main object storage device as the device identification information of the new main object storage device, and reestablishing and storing the mapping relationship between the device identification information of the new main object storage device and the position information of the new main object storage device.
5. A fault handling method is applicable to an electronic device, and is characterized in that the electronic device is respectively in communication connection with a plurality of main object storage devices and at least one spare object storage device group, the spare object storage device group comprises a plurality of spare object storage devices, the main object storage devices are used for storing object data, and first preset number of copies of each object data are respectively stored in corresponding first preset number of main object storage devices, the method comprises the following steps:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main object storage device is detected, determining a homing group corresponding to each object data stored in the fault main object storage device according to a predetermined mapping relation between the object data and the homing group, and taking each determined homing group as a fault homing group;
a degradation step: reducing the copy configuration quantity of all the object data corresponding to all the fault homing groups from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby object storage device from the standby object storage device group as a new main object storage device, replacing the failed main object storage device with the new main object storage device, and increasing the copy configuration quantity of all object data corresponding to the failure homing group from a second preset quantity to a first preset quantity.
6. The fault handling method of claim 5 wherein after the step of permuting, the method further comprises:
according to a predetermined mapping relationship between the homing groups and the primary object storage devices, taking a first preset number of primary object storage devices corresponding to each fault homing group as a fault object storage device group, and performing data recovery on the new primary object storage device by using other non-fault primary object storage devices in each fault object storage device group except the new primary object storage device.
7. The fault handling method of claim 6 wherein after the step of permuting, the method further comprises:
when a fault object storage device group receives a write request of object data, redirecting the write request to a standby object storage device group, and executing the write request by using the standby object storage device group;
judging whether each spare object storage device of the spare object storage device group stores object data in real time or at regular time or when receiving an incremental data recovery request;
when object data is stored in each spare object storage device of the spare object storage device group, judging whether the fault object storage device group exists or not;
when the fault object storage equipment group does not exist, migrating the object data stored in the standby object storage equipment group to one or more main object storage equipment;
when the fault object storage device group exists, searching for a main object storage device which does not belong to the fault object storage device group, and, when such a main object storage device is found, migrating the object data stored in the standby object storage device group to one or more of the found main object storage devices.
8. The failure handling method of any of claims 5 to 7, wherein the step of replacing the failed primary object storage device with the new primary object storage device comprises:
and removing the preset mapping relationship between the device identification information of the failed main object storage device and the position information of the failed main object storage device, allocating the device identification information of the failed main object storage device to the new main object storage device as the device identification information of the new main object storage device, and reestablishing and storing the mapping relationship between the device identification information of the new main object storage device and the position information of the new main object storage device.
9. A distributed storage system is characterized in that the distributed storage system comprises an electronic device, a plurality of main object storage devices and at least one spare object storage device group, the electronic device is respectively in communication connection with each main object storage device and each spare object storage device group, each spare object storage device group comprises a plurality of spare object storage devices, the main object storage devices are used for storing object data, a first preset number of copies of each object data are respectively stored in the corresponding first preset number of main object storage devices, the electronic device comprises a memory and a processor, a fault handling program is stored in the memory, and when the fault handling program is executed by the processor, the following steps are realized:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main object storage device is detected, determining a homing group corresponding to each object data stored in the fault main object storage device according to a predetermined mapping relation between the object data and the homing group, and taking each determined homing group as a fault homing group;
a degradation step: reducing the copy configuration quantity of all the object data corresponding to all the fault homing groups from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby object storage device from the standby object storage device group as a new main object storage device, replacing the failed main object storage device with the new main object storage device, and increasing the copy configuration quantity of all object data corresponding to the failure homing group from a second preset quantity to a first preset quantity.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a fault handling program executable by at least one processor to cause the at least one processor to perform the steps of the fault handling method according to any one of claims 5-8.
CN201811433003.9A 2018-11-28 2018-11-28 Fault processing method and device, distributed storage system and storage medium Active CN109614276B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811433003.9A CN109614276B (en) 2018-11-28 2018-11-28 Fault processing method and device, distributed storage system and storage medium
PCT/CN2019/088634 WO2020107829A1 (en) 2018-11-28 2019-05-27 Fault processing method, apparatus, distributed storage system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811433003.9A CN109614276B (en) 2018-11-28 2018-11-28 Fault processing method and device, distributed storage system and storage medium

Publications (2)

Publication Number Publication Date
CN109614276A CN109614276A (en) 2019-04-12
CN109614276B true CN109614276B (en) 2021-09-21

Family

ID=66006290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811433003.9A Active CN109614276B (en) 2018-11-28 2018-11-28 Fault processing method and device, distributed storage system and storage medium

Country Status (2)

Country Link
CN (1) CN109614276B (en)
WO (1) WO2020107829A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614276B (en) * 2018-11-28 2021-09-21 平安科技(深圳)有限公司 Fault processing method and device, distributed storage system and storage medium
CN111190775A (en) * 2019-12-30 2020-05-22 浪潮电子信息产业股份有限公司 OSD (on Screen display) replacing method, system, equipment and computer readable storage medium
CN111752483B (en) * 2020-05-28 2022-07-22 苏州浪潮智能科技有限公司 Method and system for reducing reconstruction data in storage medium change in storage cluster
CN111880747B (en) * 2020-08-01 2022-11-08 广西大学 Automatic balanced storage method of Ceph storage system based on hierarchical mapping
CN111966291B (en) * 2020-08-14 2023-02-24 苏州浪潮智能科技有限公司 Data storage method, system and related device in storage cluster
CN112162699B (en) * 2020-09-18 2023-12-22 北京浪潮数据技术有限公司 Data reading and writing method, device, equipment and computer readable storage medium
CN112395263B (en) * 2020-11-26 2022-08-19 新华三大数据技术有限公司 OSD data recovery method and device
CN113126925B (en) * 2021-04-21 2022-08-02 山东英信计算机技术有限公司 Member list determining method, device and equipment and readable storage medium
CN114510379B (en) * 2022-04-21 2022-11-01 山东百盟信息技术有限公司 Distributed array video data storage device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250055A (en) * 2016-07-12 2016-12-21 乐视控股(北京)有限公司 A kind of date storage method and system
CN107729185A (en) * 2017-10-26 2018-02-23 新华三技术有限公司 A kind of fault handling method and device
CN108121510A (en) * 2017-12-19 2018-06-05 紫光华山信息技术有限公司 OSD choosing methods, method for writing data, device and storage system
CN108235751A (en) * 2017-12-18 2018-06-29 华为技术有限公司 Identify the method, apparatus and data-storage system of object storage device inferior health
CN108287669A (en) * 2018-01-26 2018-07-17 平安科技(深圳)有限公司 Date storage method, device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2725491B1 (en) * 2012-10-26 2019-01-02 Western Digital Technologies, Inc. A distributed object storage system comprising performance optimizations
CN109614276B (en) * 2018-11-28 2021-09-21 平安科技(深圳)有限公司 Fault processing method and device, distributed storage system and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250055A (en) * 2016-07-12 2016-12-21 乐视控股(北京)有限公司 A kind of date storage method and system
CN107729185A (en) * 2017-10-26 2018-02-23 新华三技术有限公司 A kind of fault handling method and device
CN108235751A (en) * 2017-12-18 2018-06-29 华为技术有限公司 Identify the method, apparatus and data-storage system of object storage device inferior health
CN108121510A (en) * 2017-12-19 2018-06-05 紫光华山信息技术有限公司 OSD choosing methods, method for writing data, device and storage system
CN108287669A (en) * 2018-01-26 2018-07-17 平安科技(深圳)有限公司 Date storage method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on redundant storage technology based on the Ceph storage system; 刘媛媛; 《信息通信》; 2018-09-15 (No. 9); pp. 91-92 *

Also Published As

Publication number Publication date
WO2020107829A1 (en) 2020-06-04
CN109614276A (en) 2019-04-12

Similar Documents

Publication Publication Date Title
CN109614276B (en) Fault processing method and device, distributed storage system and storage medium
CN109656896B (en) Fault repairing method and device, distributed storage system and storage medium
CN109656895B (en) Distributed storage system, data writing method, device and storage medium
US10459814B2 (en) Drive extent based end of life detection and proactive copying in a mapped RAID (redundant array of independent disks) data storage system
US10725692B2 (en) Data storage method and apparatus
US10261853B1 (en) Dynamic replication error retry and recovery
US10289336B1 (en) Relocating data from an end of life storage drive based on storage drive loads in a data storage system using mapped RAID (redundant array of independent disks) technology
CN109669822B (en) Electronic device, method for creating backup storage pool, and computer-readable storage medium
CN107656834B (en) System and method for recovering host access based on transaction log and storage medium
CN103942112A (en) Magnetic disk fault-tolerance method, device and system
US10445295B1 (en) Task-based framework for synchronization of event handling between nodes in an active/active data storage system
EP3311272B1 (en) A method of live migration
US11567899B2 (en) Managing dependent delete operations among data stores
US20190163493A1 (en) Methods, systems and devices for recovering from corruptions in data processing units
US10324794B2 (en) Method for storage management and storage device
US11409711B2 (en) Barriers for dependent operations among sharded data stores
US9195528B1 (en) Systems and methods for managing failover clusters
CN114035905A (en) Fault migration method and device based on virtual machine, electronic equipment and storage medium
US20230251931A1 (en) System and device for data recovery for ephemeral storage
US10528426B2 (en) Methods, systems and devices for recovering from corruptions in data processing units in non-volatile memory devices
CN106776142B (en) Data storage method and data storage device
CN115470041A (en) Data disaster recovery management method and device
US9471409B2 (en) Processing of PDSE extended sharing violations among sysplexes with a shared DASD
CN115033337A (en) Virtual machine memory migration method, device, equipment and storage medium
CN112463019A (en) Data reading method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant