CN109614276B - Fault processing method and device, distributed storage system and storage medium - Google Patents

Fault processing method and device, distributed storage system and storage medium

Info

Publication number
CN109614276B
Authority
CN
China
Prior art keywords
object storage
storage device
fault
group
osd
Prior art date
Legal status
Active
Application number
CN201811433003.9A
Other languages
Chinese (zh)
Other versions
CN109614276A (en)
Inventor
宋小兵
姜文峰
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811433003.9A
Publication of CN109614276A
Priority to PCT/CN2019/088634
Application granted
Publication of CN109614276B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094 Redundant storage or storage space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention relates to distributed storage technology, and discloses a fault processing method, a fault processing device, a distributed storage system and a computer-readable storage medium. The invention detects failed main OSDs in real time or at regular intervals; when a failed main OSD is detected, the PG corresponding to each piece of object data stored in the failed main OSD is determined according to the predetermined mapping relation between object data and PGs, and each determined PG is taken as a fault PG; the copy configuration quantity of all object data corresponding to all the fault PGs is reduced from a first preset quantity to a second preset quantity; and one standby OSD is selected from the standby OSD group as a new main OSD, the new main OSD replaces the failed main OSD, and the copy configuration quantity of all object data corresponding to all the fault PGs is increased from the second preset quantity back to the first preset quantity. Compared with the prior art, the method and the device reduce the amount of data migrated between OSDs during OSD fault processing.

Description

Fault processing method and device, distributed storage system and storage medium
Technical Field
The present invention relates to the field of distributed storage technologies, and in particular, to a method and an apparatus for processing a failure, a distributed storage system, and a computer-readable storage medium.
Background
The CEPH distributed file system is a distributed storage system with large capacity, high performance and strong reliability. The core component of CEPH is the OSD (Object Storage Device), which manages an individual hard disk and provides an object-based storage interface for read and write access. A CEPH cluster is composed of a plurality of independent OSDs, and the number of OSDs can be increased or decreased dynamically. The CEPH client distributes object data (Objects) to different OSDs for storage through the CRUSH algorithm. CRUSH is a pseudo-random distribution algorithm: it first assigns each piece of object data to a Placement Group (PG) according to its hash value, and then calculates the OSDs in which that PG is stored, so that object data belonging to the same PG is stored in the target OSDs corresponding to that PG.
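As an illustration of this two-level mapping (not taken from the patent), the following minimal Python sketch hashes an object name to a PG and then deterministically picks the OSDs that hold that PG; the PG count, OSD ids and replica count are assumed values, and real CEPH computes the PG-to-OSD step with the CRUSH map rather than this simplified selection.

```python
import hashlib
import random

def object_to_pg(object_name: str, pg_num: int) -> int:
    """Map an object to a placement group (PG) via its hash value."""
    digest = hashlib.md5(object_name.encode()).hexdigest()
    return int(digest, 16) % pg_num

def pg_to_osds(pg_id: int, osd_ids: list, replicas: int = 3) -> list:
    """Deterministically (pseudo-randomly) choose the OSDs that store a PG."""
    rng = random.Random(pg_id)            # seeded so the mapping is repeatable
    return rng.sample(osd_ids, replicas)  # one primary plus (replicas - 1) further copies

# Example: 128 PGs spread over OSDs 0..5, three copies per PG
pg = object_to_pg("volume1/object42", pg_num=128)
print(pg, pg_to_osds(pg, osd_ids=list(range(6)), replicas=3))
```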
CEPH is also a self-repairing storage cluster: when an OSD in CEPH fails, that OSD is taken out of service, the data belonging to it is reconstructed and redistributed to other OSDs, and after the OSD is repaired, CEPH migrates part of the data on the other OSDs back to it. In this way CEPH maintains the integrity of the stored data even when one of its OSDs fails. However, although this fault handling approach guarantees the integrity of the stored data, a large amount of data is migrated between OSDs during the reconstruction process, which occupies storage cluster resources and reduces storage performance.
Therefore, reducing the amount of data migrated between OSDs during OSD fault processing has become an urgent problem to be solved.
Disclosure of Invention
The main object of the present invention is to provide a fault handling method, a fault handling device, a distributed storage system and a computer-readable storage medium that reduce the amount of data migrated between OSDs during OSD fault handling.
In order to achieve the above object, the present invention provides an electronic device, where the electronic device is respectively in communication connection with a plurality of active OSDs and at least one standby OSD group, where the standby OSD group includes a plurality of standby OSDs, the active OSDs are used to store object data, a first preset number of copies of each object data are respectively stored in corresponding first preset number of active OSDs, the electronic device includes a memory and a processor, where the memory stores a fault handling program, and when the fault handling program is executed by the processor, the electronic device implements the following steps:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main OSD is detected, determining PG corresponding to each object data stored in the fault main OSD according to a predetermined mapping relation between the object data and PG, and taking each determined PG as a fault PG;
a degradation step: reducing the copy configuration quantity of all object data corresponding to all the fault PGs from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby OSD from the standby OSD group as a new main OSD, replacing the failed main OSD by the new main OSD, and increasing the copy configuration quantity of all object data corresponding to the failed PG from a second preset quantity to a first preset quantity.
Preferably, the processor executes the fault handling program, and after the replacing step, further implements the steps of:
and according to a predetermined mapping relation between the PG and the main OSD, taking a first preset number of main OSD corresponding to each fault PG as a fault OSD group, and performing data recovery on the new main OSD by using other non-faulty main OSD except the new main OSD in each fault OSD group.
Preferably, the processor executes the fault handling program, and after the replacing step, further implements the steps of:
when a write request for object data is received by one of the fault OSD groups, redirecting the write request to the standby OSD group, and executing the write request by using the standby OSD group;
judging whether each spare OSD of the spare OSD group stores object data in real time or at regular time or when an incremental data recovery request is received;
when each spare OSD of the spare OSD group stores object data, judging whether the fault OSD group exists or not;
when the fault OSD group does not exist, transferring the object data stored in the standby OSD group to one or more main OSD;
when the fault OSD group exists, searching for a main OSD which does not belong to the fault OSD group, and, when such a main OSD is found, transferring the object data stored in the standby OSD group to one or more of the found main OSDs.
Preferably, the step of replacing the failed active OSD with the new active OSD includes:
and removing the preset mapping relationship between the equipment identification information of the main OSD with the fault and the position information of the main OSD with the fault, allocating the equipment identification information of the main OSD with the fault to the new main OSD as the equipment identification information of the new main OSD, and reestablishing and storing the mapping relationship between the equipment identification information of the new main OSD and the position information of the new main OSD.
In addition, to achieve the above object, the present invention further provides a fault handling method, which is suitable for an electronic device, where the electronic device is respectively in communication connection with a plurality of active OSDs and at least one standby OSD group, where the standby OSD group includes a plurality of standby OSDs, the active OSDs are used to store object data, and copies of a first preset number of each object data are respectively stored in the corresponding active OSDs of the first preset number, and the method includes:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main OSD is detected, determining PG corresponding to each object data stored in the fault main OSD according to a predetermined mapping relation between the object data and PG, and taking each determined PG as a fault PG;
a degradation step: reducing the copy configuration quantity of all object data corresponding to all the fault PGs from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby OSD from the standby OSD group as a new main OSD, replacing the failed main OSD by the new main OSD, and increasing the copy configuration quantity of all object data corresponding to the failed PG from a second preset quantity to a first preset quantity.
Preferably, after the step of replacing, the method further comprises:
and according to a predetermined mapping relation between the PG and the main OSD, taking a first preset number of main OSD corresponding to each fault PG as a fault OSD group, and performing data recovery on the new main OSD by using other non-faulty main OSD except the new main OSD in each fault OSD group.
Preferably, after the step of replacing, the method further comprises:
when a write request for object data is received by one of the fault OSD groups, redirecting the write request to the standby OSD group, and executing the write request by using the standby OSD group;
judging whether each spare OSD of the spare OSD group stores object data in real time or at regular time or when an incremental data recovery request is received;
when each spare OSD of the spare OSD group stores object data, judging whether the fault OSD group exists or not;
when the fault OSD group does not exist, transferring the object data stored in the standby OSD group to one or more main OSD;
when the fault OSD group exists, searching for a main OSD which does not belong to the fault OSD group, and, when such a main OSD is found, transferring the object data stored in the standby OSD group to one or more of the found main OSDs.
Preferably, the step of replacing the failed active OSD with the new active OSD includes:
and removing the preset mapping relationship between the equipment identification information of the main OSD with the fault and the position information of the main OSD with the fault, allocating the equipment identification information of the main OSD with the fault to the new main OSD as the equipment identification information of the new main OSD, and reestablishing and storing the mapping relationship between the equipment identification information of the new main OSD and the position information of the new main OSD.
In addition, to achieve the above object, the present invention further provides a distributed storage system, where the distributed storage system includes an electronic apparatus, a plurality of active object storage devices, and at least one standby object storage device group, the electronic apparatus is respectively in communication connection with each of the active object storage devices and each of the standby object storage device groups, each of the standby object storage device groups includes a plurality of standby object storage devices, the active object storage devices are configured to store object data, a first preset number of copies of each of the object data are respectively stored in corresponding first preset number of active object storage devices, the electronic apparatus includes a memory and a processor, a fault handling program is stored in the memory, and when executed by the processor, the fault handling program implements the following steps:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main object storage device is detected, determining a homing group corresponding to each object data stored in the fault main object storage device according to a predetermined mapping relation between the object data and the homing group, and taking each determined homing group as a fault homing group;
a degradation step: reducing the copy configuration quantity of all the object data corresponding to all the fault homing groups from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby object storage device from the standby object storage device group as a new main object storage device, replacing the failed main object storage device with the new main object storage device, and increasing the copy configuration quantity of all object data corresponding to the failure homing group from a second preset quantity to a first preset quantity.
Furthermore, to achieve the above object, the present invention also proposes a computer-readable storage medium storing a fault handling program executable by at least one processor to cause the at least one processor to perform the steps of the fault handling method according to any one of the above.
The invention detects failed main OSDs in real time or at regular intervals; when a failed main OSD is detected, the PG corresponding to each piece of object data stored in the failed main OSD is determined according to the predetermined mapping relation between object data and PGs, and each determined PG is taken as a fault PG; the copy configuration quantity of all object data corresponding to all the fault PGs is reduced from a first preset quantity to a second preset quantity; and one standby OSD is selected from the standby OSD group as a new main OSD, the new main OSD replaces the failed main OSD, and the copy configuration quantity of all object data corresponding to the fault PGs is increased from the second preset quantity back to the first preset quantity. Compared with the prior art, when an OSD of the distributed storage system fails, reducing the copy configuration quantity of all object data corresponding to the fault PGs from the first preset quantity to the second preset quantity makes the distributed storage system recognize that the current number of copies of each fault PG already satisfies the copy configuration quantity, so no data reconstruction is performed for the failed OSD and no large amount of data migration between OSDs is caused. The invention therefore reduces the amount of data migrated between OSDs during OSD fault processing.
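As a compact illustration of the detection, determination, degradation and replacement steps summarized above, the following Python sketch models the cluster state with plain dictionaries; the Cluster structure, its field names and the handle_failure function are assumptions for illustration only, not the patent's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    object_to_pg: dict      # object name -> PG id
    pg_copy_config: dict    # PG id -> configured copy quantity
    primary_osds: dict      # primary OSD id -> healthy?
    standby_osds: list      # standby OSD group
    osd_objects: dict = field(default_factory=dict)  # OSD id -> object names stored on it

def handle_failure(cluster: Cluster, failed_osd: str,
                   first_preset: int = 3, second_preset: int = 2) -> str:
    # Determination step: the PGs of all object data stored on the failed primary OSD
    fault_pgs = {cluster.object_to_pg[obj] for obj in cluster.osd_objects.get(failed_osd, [])}
    # Degradation step: lower the copy configuration so no data reconstruction is triggered
    for pg in fault_pgs:
        cluster.pg_copy_config[pg] = second_preset
    # Replacement step: promote one standby OSD, then restore the copy configuration
    new_primary = cluster.standby_osds.pop(0)     # assumes the standby group is not empty
    cluster.primary_osds.pop(failed_osd, None)
    cluster.primary_osds[new_primary] = True
    for pg in fault_pgs:
        cluster.pg_copy_config[pg] = first_preset
    return new_primary
```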
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
FIG. 1 is a system architecture diagram of a distributed storage system according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a storage relationship of the distributed storage system of the present invention;
FIG. 3 is a schematic diagram of an operating environment of a first embodiment of a fault handler of the present invention;
FIG. 4 is a block diagram of a first embodiment of a fault handling routine of the present invention;
fig. 5 is a flowchart illustrating a fault handling method according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a schematic diagram of a system architecture of a distributed storage system according to a first embodiment of the present invention.
In this embodiment, the distributed storage system includes a plurality of active OSDs 31 and at least one standby OSD group, where the standby OSD group includes a plurality of standby OSDs 32, each of the active OSDs 31 and the standby OSDs 32 may be disposed in each host 3, for example, at least one active OSD31 and at least one standby OSD32 are disposed in one host 3, and each of the active OSDs 31 and the standby OSD32 are communicatively connected (e.g., communicatively connected through a network 2).
In some application scenarios, an electronic device 1 is further disposed in the distributed storage system, and the electronic device 1 is communicatively connected (e.g., communicatively connected via a network 2) with each of the active OSD31 and the standby OSD 32.
In some application scenarios, the electronic apparatus 1 is disposed independently of the distributed storage system and is communicatively connected with the distributed storage system (e.g., communicatively connected via the network 2).
In this embodiment, the smallest storage unit in the distributed storage system is object data (object), one object data is a data block with a size not exceeding a predetermined value (e.g., 4MB), and each object data is mapped to a corresponding PG.
The distributed storage system supports a multi-copy policy. For example, if the copy configuration quantity of the object data corresponding to a PG in the distributed storage system is preset to a first preset quantity (e.g., three), every piece of object data in that PG has the first preset quantity of copies, and each copy of all the object data in the PG is correspondingly stored in one of the first preset quantity of OSDs. For example, 3 copies of each piece of object data in PG1.1 in fig. 2 are stored in osd.0, osd.1 and osd.2, respectively, so all object data in PG1.1 is stored in osd.0, osd.1 and osd.2. Since the distributed storage system performs data processing with the PG as its basic unit, in the following embodiments one complete copy of all the object data in a PG is referred to as a PG copy of that PG.
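The storage relationship of fig. 2 can be pictured with a small mapping such as the one below; the PG and OSD names are those used in the figure, while the dictionary representation itself is only an illustrative assumption.

```python
# Acting OSDs per PG, matching fig. 2: every OSD in the list holds one PG copy.
pg_to_osds = {
    "PG1.1": ["osd.0", "osd.1", "osd.2"],
    "PG1.2": ["osd.0", "osd.1", "osd.2"],
    "PG1.3": ["osd.0", "osd.2", "osd.3"],
}

def pg_copies(pg_id: str) -> int:
    """Number of PG copies currently held in the cluster for the given PG."""
    return len(pg_to_osds[pg_id])

assert pg_copies("PG1.1") == 3   # equals the first preset quantity of three
```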
Hereinafter, various embodiments of the present invention will be proposed based on the above-described distributed storage system and related devices.
The invention provides a fault handling program.
Referring to fig. 3, fig. 3 is a schematic diagram of an operating environment of a fault handling program according to a first embodiment of the invention.
In the present embodiment, the failure processing program 10 is installed and run in the electronic apparatus 1. The electronic device 1 may be a desktop computer, a notebook, a palm computer, a server, or other computing equipment. The electronic device 1 may include, but is not limited to, a memory 11 and a processor 12 that communicate with each other via a program bus. Fig. 3 only shows the electronic device 1 with components 11, 12, but it is to be understood that not all shown components are required to be implemented, and that more or fewer components may alternatively be implemented.
The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk or a memory of the electronic device 1. The memory 11 may also be an external storage device of the electronic apparatus 1 in other embodiments, such as a plug-in hard disk provided on the electronic apparatus 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic apparatus 1. The memory 11 is used for storing application software installed in the electronic device 1 and various types of data, such as program codes of the fault handling program 10. The memory 11 may also be used to temporarily store data that has been output or is to be output.
The processor 12, which in some embodiments may be a Central Processing Unit (CPU), a microprocessor or another data processing chip, is configured to run the program code stored in the memory 11 or to process data, for example to execute the fault handling program 10.
Referring to fig. 4, a block diagram of a first embodiment of the fault handling program 10 according to the present invention is shown. In this embodiment, the fault handling program 10 may be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to complete the present invention. For example, as shown in fig. 4, the fault handling program 10 may be divided into a detection module 101, a determination module 102, a degradation module 103 and a replacement module 104. The module referred to in the present invention is a series of computer program instruction segments capable of performing a specific function, and is better suited than a whole program for describing the execution process of the fault handling program 10 in the electronic device 1, wherein:
the detecting module 101 is configured to detect whether each active OSD fails in real time or at regular time.
For example, a heartbeat mechanism may be used: a detection message is sent to each primary OSD in real time or at regular intervals to detect whether that primary OSD has failed, as sketched below.
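A minimal heartbeat-style probe might look like the following sketch; the probe() helper, the TCP transport, the timeout and the OSD addresses are assumptions for illustration, since the patent does not prescribe a particular detection transport.

```python
import socket
import time

def probe(address, timeout: float = 1.0) -> bool:
    """Return True if the OSD endpoint accepts a TCP connection within the timeout."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False

def detect_failed_osds(osd_addresses: dict, interval: float = 5.0):
    """Periodically probe every primary OSD and yield the ids that stop responding."""
    while True:
        for osd_id, addr in osd_addresses.items():
            if not probe(addr):
                yield osd_id          # hand the failed OSD over to the determination step
        time.sleep(interval)
```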
The determining module 102 is configured to, when a faulty active OSD is detected, determine, according to a predetermined mapping relationship between object data and PGs, a PG corresponding to each object data stored in the faulty active OSD, and use each determined PG as a faulty PG.
A demotion module 103, configured to reduce the copy configuration amount of all object data corresponding to all the failed PGs from a first preset amount to a second preset amount.
For example, if the first preset number is 3, each object data in each PG should have 3 copies and be correspondingly stored in 3 active OSDs, that is, one PG should have 3 PG copies and be correspondingly stored in 3 active OSDs. Once a primary OSD fails, only 2 PG copies of the failed PG exist in the distributed storage system, and when the distributed storage system recognizes that the number of the copies of the failed PG is less than the copy configuration amount, data reconstruction is started, that is, one PG copy of each failed PG is copied, and each copied PG copy is written into the corresponding primary OSD, so that the number of the copies of the failed PG reaches the copy configuration amount. In this embodiment, the copy allocation amount of all the object data corresponding to all the failed PGs is reduced from the first preset amount to the second preset amount, that is, the copy allocation amount of all the failed PGs is reduced from the first preset amount to the second preset amount, for example, the first preset amount is 3, and the second preset amount is 2, that is, the multi-copy policy of the failed PG is degraded from three copies to two copies. At this time, one PG copy of each failed PG stored in the failed active OSD is removed, two PG copies of each failed PG still exist in the active OSDs in other normal states, and the number of the PG copies of the failed PG is equal to the current copy configuration amount, so that the distributed storage system does not immediately reconstruct data, and does not cause a large amount of data migration.
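The effect of the degradation step can be illustrated with the sketch below: lowering the configured copy count of each failed PG to the number of surviving copies means the "copies below configuration" condition that would trigger reconstruction is never met. The function and parameter names are assumptions, not the patent's code.

```python
def degrade_fault_pgs(pg_copy_config: dict, surviving_copies: dict,
                      fault_pgs: set, second_preset: int = 2) -> None:
    """Lower the configured copy count of every fault PG so the surviving
    copies already satisfy it and no data reconstruction is started."""
    for pg in fault_pgs:
        pg_copy_config[pg] = second_preset
        # With three configured copies and one lost, two survive: 2 >= 2,
        # so the 'copies below configuration' rebuild condition is not met.
        assert surviving_copies[pg] >= pg_copy_config[pg]
```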
The replacing module 104 is configured to select one spare OSD from the spare OSD group as a new active OSD, replace the failed active OSD with the new active OSD, and increase the copy configuration amount of all object data corresponding to all the failed PGs from the second preset amount to the first preset amount.
In this embodiment, the step of selecting one spare OSD from the spare OSD group by the replacement module 104 as the new active OSD includes:
and searching the standby OSD group for the standby OSD which is in the same host as the main OSD with the fault. And if the standby OSD is found, taking the found standby OSD as a new main OSD. If not, a spare OSD is randomly selected from the spare OSD group as a new main OSD.
Further, in this embodiment, the step of replacing, by the replacing module 104, the failed active OSD by the new active OSD includes:
and removing a preset mapping relationship between the device identification information of the main OSD with the fault and the position information (for example, a network port value) of the main OSD with the fault, allocating the device identification information of the main OSD with the fault to the new main OSD as the device identification information of the new main OSD, and reestablishing and storing the mapping relationship between the device identification information of the new main OSD and the position information of the new main OSD.
In this embodiment, the device identification information of the failed main OSD is allocated to the new main OSD as its device identification information, rather than using the new main OSD's original device identification information, for the following reason. If the original device identification information of the new main OSD were used and a mapping relationship between that identification information and the location information of the new main OSD were established, the distributed storage system would recognize that a new OSD had been added and would start a data rebalancing (re-balance) operation, that is, it would select some PG copies from each of the other main OSDs and migrate them to the new main OSD to achieve a reasonable distribution of PG copies. Such a rebalancing operation causes a large amount of data migration and thus affects the response speed of the distributed storage system.
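The identity reuse described above can be sketched as follows; osd_id_to_location is an assumed in-memory stand-in for the cluster's OSD map. Because the failed OSD's identifier is reused, the set of known OSD ids does not grow, so no "new OSD added" rebalancing is triggered.

```python
def replace_failed_primary(osd_id_to_location: dict, failed_id: str,
                           new_osd_location: str) -> None:
    """Give the replacement OSD the failed OSD's identity instead of a brand-new id."""
    # Remove the mapping from the failed OSD's id to its old location ...
    osd_id_to_location.pop(failed_id, None)
    # ... and re-establish the same id at the new OSD's location, so the cluster
    # sees a known OSD coming back rather than an extra OSD joining (no rebalance).
    osd_id_to_location[failed_id] = new_osd_location
```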
Compared with the prior art, when an OSD in the distributed storage system fails, the copy configuration amount of all object data corresponding to the fault PGs is reduced from the first preset amount to the second preset amount, so that the distributed storage system recognizes that the current number of copies of each fault PG already satisfies the copy configuration amount; therefore, no data reconstruction is performed for the failed OSD, and no large amount of data migration between OSDs is caused.
Further, in this embodiment, the program further includes a data recovery module (not shown in the figure) configured to:
according to the predetermined mapping relationship between PGs and active OSDs, the first preset number of active OSDs corresponding to each failed PG is taken as a failed OSD group (as shown in fig. 2, if osd.0 is the failed active OSD, then PG1.1, PG1.2 and PG1.3 are all failed PGs; the failed OSD group corresponding to PG1.1 includes osd.0, osd.1 and osd.2, the failed OSD group corresponding to PG1.2 includes osd.0, osd.1 and osd.2, and the failed OSD group corresponding to PG1.3 includes osd.0, osd.2 and osd.3), and data recovery is performed on the new active OSD using the non-failed active OSDs in each failed OSD group other than the new active OSD. After data recovery is completed, the state of each failed OSD group is marked as normal.
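A sketch of that recovery pass is shown below: each failed PG is copied from a surviving member of its failed OSD group onto the new primary. The read_pg and write_pg callables are hypothetical placeholders for the actual replication transport.

```python
def recover_new_primary(new_primary: str, fault_pgs: set,
                        fault_osd_groups: dict, read_pg, write_pg) -> None:
    """Rebuild every fault PG on the new primary from a healthy member of its fault OSD group."""
    for pg in fault_pgs:
        donors = [osd for osd in fault_osd_groups[pg] if osd != new_primary]
        pg_copy = read_pg(donors[0], pg)     # read one surviving PG copy
        write_pg(new_primary, pg, pg_copy)   # write it onto the new primary OSD
```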
Further, in this embodiment, the program further includes a redirection module (not shown in the figure) configured to:
when one of the fault OSD groups receives a write request of object data, the write request is redirected to the spare OSD group, and the write request is executed by using the spare OSD group.
The standby OSD group is used to execute the write request in this embodiment because, at this point, the new active OSD in the failed OSD group has not yet completed data recovery; if the write request were executed by the failed OSD group, its execution would be delayed. Executing the write request with the standby OSD group therefore effectively preserves write performance.
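A sketch of the redirection rule: while any member of a PG's fault OSD group is still recovering, writes for that PG go to the standby group. The handle_write function, the recovering set and the write_to_group callable are illustrative assumptions.

```python
def handle_write(pg: str, data: bytes, fault_osd_groups: dict,
                 recovering: set, write_to_group) -> str:
    """Send writes for PGs whose fault OSD group is still recovering to the standby group."""
    group = fault_osd_groups.get(pg)
    if group and any(osd in recovering for osd in group):
        write_to_group("standby", pg, data)   # redirect: the fault group is not ready yet
        return "standby"
    write_to_group("primary", pg, data)       # normal write path
    return "primary"
```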
Further, in this embodiment, the data recovery module is further configured to:
and judging whether each spare OSD of the spare OSD group stores object data in real time or at regular time or when an incremental data recovery request is received.
And when each spare OSD of the spare OSD group stores object data, judging whether the fault OSD group exists or not.
And when the fault OSD group does not exist, transferring the object data stored in the standby OSD group to one or more main OSD.
And when the fault OSD group exists, searching for the main OSD which does not belong to the fault OSD group.
And when the standby OSD group is searched, transferring the object data stored in the standby OSD group to one or more searched main OSD.
And when the data is not searched, returning a message of failing to recover the incremental data, or returning and continuously searching the main OSD which does not belong to the fault OSD group until the main OSD which does not belong to the fault OSD group is searched.
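A sketch of that drain-back pass is given below; the move_object callable is a hypothetical transfer helper, and the membership checks mirror the conditions listed above.

```python
def drain_standby_group(standby_objects: dict, primary_osds: list,
                        fault_osd_groups: dict, move_object) -> None:
    """Migrate object data buffered on the standby group back onto primary OSDs
    that are not members of any remaining fault OSD group."""
    in_fault_group = {osd for group in fault_osd_groups.values() for osd in group}
    targets = [osd for osd in primary_osds if osd not in in_fault_group]
    if not targets:
        raise RuntimeError("incremental data recovery failed: no eligible primary OSD")
    for standby_osd, objects in standby_objects.items():
        for i, obj in enumerate(objects):
            move_object(standby_osd, targets[i % len(targets)], obj)
        objects.clear()
```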
Further, in this embodiment, the program is further configured to:
detecting the number of standby OSDs in the standby OSD group in real time or at regular intervals, and, when that number is less than or equal to a preset threshold, selecting, from the OSDs of each host, one or more OSDs that do not belong to the standby OSD group and adding them to the standby OSD group.
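A sketch of this replenishment check follows; the threshold, the per-host OSD inventory and the assumption that the listed OSDs are eligible to serve as standbys are all illustrative.

```python
def replenish_standby_group(standby_group: list, host_osds: dict, threshold: int = 2) -> None:
    """Top up the standby OSD group from per-host spare OSDs when it runs low."""
    if len(standby_group) > threshold:
        return
    for host, osds in host_osds.items():
        for osd in osds:
            if osd not in standby_group:
                standby_group.append(osd)   # add one spare OSD from this host
                break                       # take at most one OSD per host
```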
In addition, the invention provides a fault processing method.
As shown in fig. 5, fig. 5 is a flowchart illustrating a fault handling method according to a first embodiment of the present invention.
In this embodiment, the method is applicable to an electronic device, the electronic device is respectively in communication connection with a plurality of active OSDs and at least one standby OSD group, each of the standby OSD groups includes a plurality of standby OSDs, the active OSDs are used for storing object data, and copies of a first preset number of each of the object data are respectively stored in the corresponding active OSDs of the first preset number, and the method includes:
step S10, detecting whether each active OSD fails in real time or at regular time.
For example, a heartbeat mechanism may be used: a detection message is sent to each primary OSD in real time or at regular intervals to detect whether that primary OSD has failed.
Step S20, when a faulty active OSD is detected, determining PGs corresponding to object data stored in the faulty active OSD according to a predetermined mapping relationship between the object data and the PGs, and using the determined PGs as faulty PGs.
Step S30, reducing the copy allocation amount of all object data corresponding to all the failed PGs from a first preset amount to a second preset amount.
For example, if the first preset number is 3, each object data in each PG should have 3 copies and be correspondingly stored in 3 active OSDs, that is, one PG should have 3 PG copies and be correspondingly stored in 3 active OSDs. Once a primary OSD fails, only 2 PG copies of the failed PG exist in the distributed storage system, and when the distributed storage system recognizes that the number of the copies of the failed PG is less than the copy configuration amount, data reconstruction is started, that is, one PG copy of each failed PG is copied, and each copied PG copy is written into the corresponding primary OSD, so that the number of the copies of the failed PG reaches the copy configuration amount. In this embodiment, the copy allocation amount of all the object data corresponding to all the failed PGs is reduced from the first preset amount to the second preset amount, that is, the copy allocation amount of all the failed PGs is reduced from the first preset amount to the second preset amount, for example, the first preset amount is 3, and the second preset amount is 2, that is, the multi-copy policy of the failed PG is degraded from three copies to two copies. At this time, one PG copy of each failed PG stored in the failed active OSD is removed, two PG copies of each failed PG still exist in the active OSDs in other normal states, and the number of the PG copies of the failed PG is equal to the current copy configuration amount, so that the distributed storage system does not immediately reconstruct data, and does not cause a large amount of data migration.
Step S40, selecting one spare OSD from the spare OSD group as a new active OSD, replacing the failed active OSD with the new active OSD, and increasing the copy allocation amount of all object data corresponding to all the failed PGs from the second preset amount to the first preset amount.
In this embodiment, the step of selecting one spare OSD from the spare OSD group as the new active OSD includes:
and searching the standby OSD group for the standby OSD which is in the same host as the main OSD with the fault. And if the standby OSD is found, taking the found standby OSD as a new main OSD. If not, a spare OSD is randomly selected from the spare OSD group as a new main OSD.
Further, in this embodiment, the step of replacing the failed active OSD with the new active OSD includes:
and removing a preset mapping relationship between the device identification information of the main OSD with the fault and the position information (for example, a network port value) of the main OSD with the fault, allocating the device identification information of the main OSD with the fault to the new main OSD as the device identification information of the new main OSD, and reestablishing and storing the mapping relationship between the device identification information of the new main OSD and the position information of the new main OSD.
In this embodiment, the device identification information of the failed main OSD is allocated to the new main OSD as its device identification information, rather than using the new main OSD's original device identification information, for the following reason. If the original device identification information of the new main OSD were used and a mapping relationship between that identification information and the location information of the new main OSD were established, the distributed storage system would recognize that a new OSD had been added and would start a data rebalancing (re-balance) operation, that is, it would select some PG copies from each of the other main OSDs and migrate them to the new main OSD to achieve a reasonable distribution of PG copies. Such a rebalancing operation causes a large amount of data migration and thus affects the response speed of the distributed storage system.
Compared with the prior art, when an OSD in the distributed storage system fails, the copy configuration amount of all object data corresponding to the fault PGs is reduced from the first preset amount to the second preset amount, so that the distributed storage system recognizes that the current number of copies of each fault PG already satisfies the copy configuration amount; therefore, no data reconstruction is performed for the failed OSD, and no large amount of data migration between OSDs is caused.
Further, in this embodiment, after step S40, the method further includes:
according to the predetermined mapping relationship between PGs and active OSDs, the first preset number of active OSDs corresponding to each failed PG is taken as a failed OSD group (as shown in fig. 2, if osd.0 is the failed active OSD, then PG1.1, PG1.2 and PG1.3 are all failed PGs; the failed OSD group corresponding to PG1.1 includes osd.0, osd.1 and osd.2, the failed OSD group corresponding to PG1.2 includes osd.0, osd.1 and osd.2, and the failed OSD group corresponding to PG1.3 includes osd.0, osd.2 and osd.3), and data recovery is performed on the new active OSD using the non-failed active OSDs in each failed OSD group other than the new active OSD. After data recovery is completed, the state of each failed OSD group is marked as normal.
Further, in this embodiment, after step S40, the method further includes:
when one of the fault OSD groups receives a write request of object data, the write request is redirected to the spare OSD group, and the write request is executed by using the spare OSD group.
The standby OSD group is used to execute the write request in this embodiment because, at this point, the new active OSD in the failed OSD group has not yet completed data recovery; if the write request were executed by the failed OSD group, its execution would be delayed. Executing the write request with the standby OSD group therefore effectively preserves write performance.
Further, in this embodiment, the method further includes:
and judging whether each spare OSD of the spare OSD group stores object data in real time or at regular time or when an incremental data recovery request is received.
And when each spare OSD of the spare OSD group stores object data, judging whether the fault OSD group exists or not.
And when the fault OSD group does not exist, transferring the object data stored in the standby OSD group to one or more main OSD.
And when the fault OSD group exists, searching for the main OSD which does not belong to the fault OSD group.
And when the standby OSD group is searched, transferring the object data stored in the standby OSD group to one or more searched main OSD.
And when the data is not searched, returning a message of failing to recover the incremental data, or returning and continuously searching the main OSD which does not belong to the fault OSD group until the main OSD which does not belong to the fault OSD group is searched.
Further, in this embodiment, the method further includes:
detecting the number of standby OSDs in the standby OSD group in real time or at regular intervals, and, when that number is less than or equal to a preset threshold, selecting, from the OSDs of each host, one or more OSDs that do not belong to the standby OSD group and adding them to the standby OSD group.
Further, the present invention also proposes a computer-readable storage medium storing a fault handling program executable by at least one processor to cause the at least one processor to perform the fault handling method in any of the above embodiments.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An electronic apparatus, wherein the electronic apparatus is respectively in communication connection with a plurality of active object storage devices and at least one standby object storage device group, the standby object storage device group includes a plurality of standby object storage devices, the active object storage devices are used for storing object data, a first preset number of copies of each object data are respectively stored in a corresponding first preset number of active object storage devices, the electronic apparatus includes a memory and a processor, the memory stores a fault handling program, and when executed by the processor, the fault handling program implements the following steps:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main object storage device is detected, determining a homing group corresponding to each object data stored in the fault main object storage device according to a predetermined mapping relation between the object data and the homing group, and taking each determined homing group as a fault homing group;
a degradation step: reducing the copy configuration quantity of all the object data corresponding to all the fault homing groups from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby object storage device from the standby object storage device group as a new main object storage device, replacing the failed main object storage device with the new main object storage device, and increasing the copy configuration quantity of all object data corresponding to the failure homing group from a second preset quantity to a first preset quantity.
2. The electronic device of claim 1, wherein the processor executes the fault handling program and, after the replacing step, further performs the steps of:
according to a predetermined mapping relationship between the homing groups and the primary object storage devices, taking a first preset number of primary object storage devices corresponding to each fault homing group as a fault object storage device group, and performing data recovery on the new primary object storage device by using other non-fault primary object storage devices in each fault object storage device group except the new primary object storage device.
3. The electronic device of claim 2, wherein the processor executes the fault handling program and, after the replacing step, further performs the steps of:
when a fault object storage device group receives a write request of object data, redirecting the write request to a standby object storage device group, and executing the write request by using the standby object storage device group;
judging whether each spare object storage device of the spare object storage device group stores object data in real time or at regular time or when receiving an incremental data recovery request;
when object data is stored in each spare object storage device of the spare object storage device group, judging whether the fault object storage device group exists or not;
when the fault object storage equipment group does not exist, migrating the object data stored in the standby object storage equipment group to one or more main object storage equipment;
when the fault object storage device group exists, searching for a main object storage device which does not belong to the fault object storage device group, and, when such a main object storage device is found, migrating the object data stored in the standby object storage device group to one or more of the found main object storage devices.
4. The electronic apparatus of any of claims 1-3, wherein the step of replacing the failed primary object storage device with the new primary object storage device comprises:
and removing the preset mapping relationship between the device identification information of the failed main object storage device and the position information of the failed main object storage device, allocating the device identification information of the failed main object storage device to the new main object storage device as the device identification information of the new main object storage device, and reestablishing and storing the mapping relationship between the device identification information of the new main object storage device and the position information of the new main object storage device.
5. A fault handling method is applicable to an electronic device, and is characterized in that the electronic device is respectively in communication connection with a plurality of main object storage devices and at least one spare object storage device group, the spare object storage device group comprises a plurality of spare object storage devices, the main object storage devices are used for storing object data, and first preset number of copies of each object data are respectively stored in corresponding first preset number of main object storage devices, the method comprises the following steps:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main object storage device is detected, determining a homing group corresponding to each object data stored in the fault main object storage device according to a predetermined mapping relation between the object data and the homing group, and taking each determined homing group as a fault homing group;
a degradation step: reducing the copy configuration quantity of all the object data corresponding to all the fault homing groups from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby object storage device from the standby object storage device group as a new main object storage device, replacing the failed main object storage device with the new main object storage device, and increasing the copy configuration quantity of all object data corresponding to the failure homing group from a second preset quantity to a first preset quantity.
6. The fault handling method of claim 5 wherein after the step of permuting, the method further comprises:
according to a predetermined mapping relationship between the homing groups and the primary object storage devices, taking a first preset number of primary object storage devices corresponding to each fault homing group as a fault object storage device group, and performing data recovery on the new primary object storage device by using other non-fault primary object storage devices in each fault object storage device group except the new primary object storage device.
7. The fault handling method of claim 6 wherein after the step of permuting, the method further comprises:
when a fault object storage device group receives a write request of object data, redirecting the write request to a standby object storage device group, and executing the write request by using the standby object storage device group;
judging whether each spare object storage device of the spare object storage device group stores object data in real time or at regular time or when receiving an incremental data recovery request;
when object data is stored in each spare object storage device of the spare object storage device group, judging whether the fault object storage device group exists or not;
when the fault object storage equipment group does not exist, migrating the object data stored in the standby object storage equipment group to one or more main object storage equipment;
when the fault object storage device group exists, searching for a main object storage device which does not belong to the fault object storage device group, and, when such a main object storage device is found, migrating the object data stored in the standby object storage device group to one or more of the found main object storage devices.
8. The failure handling method of any of claims 5 to 7, wherein the step of replacing the failed primary object storage device with the new primary object storage device comprises:
and removing the preset mapping relationship between the device identification information of the failed main object storage device and the position information of the failed main object storage device, allocating the device identification information of the failed main object storage device to the new main object storage device as the device identification information of the new main object storage device, and reestablishing and storing the mapping relationship between the device identification information of the new main object storage device and the position information of the new main object storage device.
9. A distributed storage system is characterized in that the distributed storage system comprises an electronic device, a plurality of main object storage devices and at least one spare object storage device group, the electronic device is respectively in communication connection with each main object storage device and each spare object storage device group, each spare object storage device group comprises a plurality of spare object storage devices, the main object storage devices are used for storing object data, a first preset number of copies of each object data are respectively stored in the corresponding first preset number of main object storage devices, the electronic device comprises a memory and a processor, a fault handling program is stored in the memory, and when the fault handling program is executed by the processor, the following steps are realized:
a detection step: detecting whether each main object storage device has a fault in real time or in a timing manner;
a determination step: when a fault main object storage device is detected, determining a homing group corresponding to each object data stored in the fault main object storage device according to a predetermined mapping relation between the object data and the homing group, and taking each determined homing group as a fault homing group;
a degradation step: reducing the copy configuration quantity of all the object data corresponding to all the fault homing groups from a first preset quantity to a second preset quantity;
a replacement step: and selecting one standby object storage device from the standby object storage device group as a new main object storage device, replacing the failed main object storage device with the new main object storage device, and increasing the copy configuration quantity of all object data corresponding to the failure homing group from a second preset quantity to a first preset quantity.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a fault handling program executable by at least one processor to cause the at least one processor to perform the steps of the fault handling method according to any one of claims 5-8.
CN201811433003.9A 2018-11-28 2018-11-28 Fault processing method and device, distributed storage system and storage medium Active CN109614276B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811433003.9A CN109614276B (en) 2018-11-28 2018-11-28 Fault processing method and device, distributed storage system and storage medium
PCT/CN2019/088634 WO2020107829A1 (en) 2018-11-28 2019-05-27 Fault processing method, apparatus, distributed storage system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811433003.9A CN109614276B (en) 2018-11-28 2018-11-28 Fault processing method and device, distributed storage system and storage medium

Publications (2)

Publication Number Publication Date
CN109614276A CN109614276A (en) 2019-04-12
CN109614276B true CN109614276B (en) 2021-09-21

Family

ID=66006290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811433003.9A Active CN109614276B (en) 2018-11-28 2018-11-28 Fault processing method and device, distributed storage system and storage medium

Country Status (2)

Country Link
CN (1) CN109614276B (en)
WO (1) WO2020107829A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614276B (en) * 2018-11-28 2021-09-21 平安科技(深圳)有限公司 Fault processing method and device, distributed storage system and storage medium
CN111190775A (en) * 2019-12-30 2020-05-22 浪潮电子信息产业股份有限公司 OSD (on Screen display) replacing method, system, equipment and computer readable storage medium
CN111752483B (en) * 2020-05-28 2022-07-22 苏州浪潮智能科技有限公司 Method and system for reducing reconstruction data in storage medium change in storage cluster
CN111880747B (en) * 2020-08-01 2022-11-08 广西大学 Automatic balanced storage method of Ceph storage system based on hierarchical mapping
CN111966291B (en) * 2020-08-14 2023-02-24 苏州浪潮智能科技有限公司 Data storage method, system and related device in storage cluster
CN112162699B (en) * 2020-09-18 2023-12-22 北京浪潮数据技术有限公司 Data reading and writing method, device, equipment and computer readable storage medium
CN112395263B (en) * 2020-11-26 2022-08-19 新华三大数据技术有限公司 OSD data recovery method and device
CN113126925B (en) * 2021-04-21 2022-08-02 山东英信计算机技术有限公司 Member list determining method, device and equipment and readable storage medium
CN114510379B (en) * 2022-04-21 2022-11-01 山东百盟信息技术有限公司 Distributed array video data storage device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250055A (en) * 2016-07-12 2016-12-21 乐视控股(北京)有限公司 A kind of date storage method and system
CN107729185A (en) * 2017-10-26 2018-02-23 新华三技术有限公司 A kind of fault handling method and device
CN108121510A (en) * 2017-12-19 2018-06-05 紫光华山信息技术有限公司 OSD choosing methods, method for writing data, device and storage system
CN108235751A (en) * 2017-12-18 2018-06-29 华为技术有限公司 Identify the method, apparatus and data-storage system of object storage device inferior health
CN108287669A (en) * 2018-01-26 2018-07-17 平安科技(深圳)有限公司 Date storage method, device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2725491B1 (en) * 2012-10-26 2019-01-02 Western Digital Technologies, Inc. A distributed object storage system comprising performance optimizations
CN109614276B (en) * 2018-11-28 2021-09-21 平安科技(深圳)有限公司 Fault processing method and device, distributed storage system and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250055A (en) * 2016-07-12 2016-12-21 乐视控股(北京)有限公司 A kind of date storage method and system
CN107729185A (en) * 2017-10-26 2018-02-23 新华三技术有限公司 A kind of fault handling method and device
CN108235751A (en) * 2017-12-18 2018-06-29 华为技术有限公司 Identify the method, apparatus and data-storage system of object storage device inferior health
CN108121510A (en) * 2017-12-19 2018-06-05 紫光华山信息技术有限公司 OSD choosing methods, method for writing data, device and storage system
CN108287669A (en) * 2018-01-26 2018-07-17 平安科技(深圳)有限公司 Date storage method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on redundant storage technology based on the Ceph storage system; 刘媛媛; 《信息通信》; 2018-09-15 (No. 9); pp. 91-92 *

Also Published As

Publication number Publication date
WO2020107829A1 (en) 2020-06-04
CN109614276A (en) 2019-04-12

Similar Documents

Publication Publication Date Title
CN109614276B (en) Fault processing method and device, distributed storage system and storage medium
CN109656896B (en) Fault repairing method and device, distributed storage system and storage medium
CN109656895B (en) Distributed storage system, data writing method, device and storage medium
US10459814B2 (en) Drive extent based end of life detection and proactive copying in a mapped RAID (redundant array of independent disks) data storage system
US10725692B2 (en) Data storage method and apparatus
US10261853B1 (en) Dynamic replication error retry and recovery
US10289336B1 (en) Relocating data from an end of life storage drive based on storage drive loads in a data storage system using mapped RAID (redundant array of independent disks) technology
CN109669822B (en) Electronic device, method for creating backup storage pool, and computer-readable storage medium
CN107656834B (en) System and method for recovering host access based on transaction log and storage medium
CN103942112A (en) Magnetic disk fault-tolerance method, device and system
US10445295B1 (en) Task-based framework for synchronization of event handling between nodes in an active/active data storage system
EP3311272B1 (en) A method of live migration
US11567899B2 (en) Managing dependent delete operations among data stores
US20190163493A1 (en) Methods, systems and devices for recovering from corruptions in data processing units
US10324794B2 (en) Method for storage management and storage device
US11409711B2 (en) Barriers for dependent operations among sharded data stores
US9195528B1 (en) Systems and methods for managing failover clusters
CN114035905A (en) Fault migration method and device based on virtual machine, electronic equipment and storage medium
US20230251931A1 (en) System and device for data recovery for ephemeral storage
US10528426B2 (en) Methods, systems and devices for recovering from corruptions in data processing units in non-volatile memory devices
CN106776142B (en) Data storage method and data storage device
CN115470041A (en) Data disaster recovery management method and device
US9471409B2 (en) Processing of PDSE extended sharing violations among sysplexes with a shared DASD
CN115033337A (en) Virtual machine memory migration method, device, equipment and storage medium
CN112463019A (en) Data reading method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant