WO2020107829A1 - 故障处理方法、装置、分布式存储系统和存储介质 - Google Patents

故障处理方法、装置、分布式存储系统和存储介质 Download PDF

Info

Publication number
WO2020107829A1
WO2020107829A1 (PCT/CN2019/088634)
Authority
WO
WIPO (PCT)
Prior art keywords
object storage
storage device
primary object
faulty
group
Prior art date
Application number
PCT/CN2019/088634
Other languages
English (en)
French (fr)
Inventor
宋小兵
姜文峰
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020107829A1 publication Critical patent/WO2020107829A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094Redundant storage or storage space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments

Definitions

  • the present application relates to the field of distributed storage technology, and in particular, to a fault processing method, device, distributed storage system, and computer-readable storage medium.
  • CEPH distributed file system is a distributed storage system with large capacity, high performance and strong reliability.
  • the core component of CEPH is the OSD (Object Storage Device), which manages an independent hard disk and provides a read-write access interface for object-based storage.
  • the CEPH cluster is composed of many independent OSDs, and the number of OSDs can be increased or decreased dynamically.
  • the CEPH client distributes object data (Object) to different OSDs for storage through the CRUSH algorithm.
  • CRUSH is a pseudo-random distribution algorithm. The algorithm first assigns the object data to a placement group (PG) through a hash value (HASH), and then calculates the OSD stored by the PG. The object data of the same PG is stored in the target OSD corresponding to the PG.
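The two-stage mapping described above (object → PG by hash, then PG → OSDs by a deterministic pseudo-random calculation) can be sketched as follows. This is an illustrative Python sketch only, not CEPH's actual CRUSH implementation; the function names and parameters are assumptions:

```python
import hashlib
import random

def object_to_pg(object_id: str, pg_count: int) -> int:
    """Stage 1: map an object to a placement group (PG) via a stable hash."""
    digest = hashlib.md5(object_id.encode()).hexdigest()
    return int(digest, 16) % pg_count

def pg_to_osds(pg_id: int, osd_ids: list, replicas: int) -> list:
    """Stage 2: deterministically (pseudo-randomly) pick the OSDs storing a PG.
    Seeding by the PG id means every client computes the same placement."""
    rng = random.Random(pg_id)
    return rng.sample(osd_ids, replicas)

pg = object_to_pg("my-object", pg_count=128)
osds = pg_to_osds(pg, osd_ids=list(range(12)), replicas=3)
```

All object data hashed into the same PG thus lands on the same target OSDs, which is why the document can treat the PG as the basic unit of data processing.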
  • CEPH is also a self-healing storage cluster: when an OSD in CEPH fails, that OSD exits service and the data belonging to it is reconstructed and redistributed to other OSDs; after the OSD is repaired, CEPH migrates part of the data on those other OSDs back to it. Therefore, even when an OSD fails, CEPH maintains the integrity of the stored data. However, although this fault handling method guarantees the integrity of the stored data, the data reconstruction process causes a large amount of data migration among multiple OSDs in CEPH, occupying storage cluster resources and reducing storage performance.
  • the main purpose of the present application is to provide a fault processing method, device, distributed storage system, and computer-readable storage medium, aimed at solving the problem of how to reduce the amount of data migration between OSDs during the OSD fault processing.
  • the present application proposes an electronic device that is communicatively connected to a plurality of active OSDs and at least one standby OSD group.
  • the standby OSD group includes a plurality of standby OSDs.
  • the active OSDs are used to store object data, and a first preset number of copies of each piece of object data is stored in a corresponding first preset number of active OSDs. The electronic device includes a memory and a processor; the memory stores a fault processing program which, when executed by the processor, implements the following steps:
  • Detection step: detect, in real time or at regular intervals, whether each of the active OSDs has failed;
  • Determination step: when a faulty active OSD is detected, determine, according to the predetermined mapping relationship between object data and PGs, the PG corresponding to each piece of object data stored in the faulty active OSD, and regard each determined PG as a faulty PG;
  • Downgrading step: reduce the configured number of copies of all object data corresponding to all the faulty PGs from the first preset number to the second preset number;
  • Replacement step: select a standby OSD from the standby OSD group as the new active OSD, replace the failed active OSD with the new active OSD, and increase the configured number of copies of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number.
  • the present application also proposes a fault handling method applicable to an electronic device that is communicatively connected to multiple active OSDs and at least one standby OSD group; the standby OSD group includes several standby OSDs, the active OSDs are used to store object data, and a first preset number of copies of each piece of object data is stored in a corresponding first preset number of active OSDs.
  • the method includes the steps of:
  • Detection step: detect, in real time or at regular intervals, whether each of the active OSDs has failed;
  • Determination step: when a faulty active OSD is detected, determine, according to the predetermined mapping relationship between object data and PGs, the PG corresponding to each piece of object data stored in the faulty active OSD, and regard each determined PG as a faulty PG;
  • Downgrading step: reduce the configured number of copies of all object data corresponding to all the faulty PGs from the first preset number to the second preset number;
  • Replacement step: select a standby OSD from the standby OSD group as the new active OSD, replace the failed active OSD with the new active OSD, and increase the configured number of copies of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number.
  • the present application also proposes a distributed storage system.
  • the distributed storage system includes an electronic device, a plurality of primary object storage devices, and at least one spare object storage device group.
  • the electronic device is communicatively connected to each of the primary object storage devices and each of the spare object storage device groups
  • the backup object storage device group includes a plurality of backup object storage devices
  • the primary object storage device is used to store object data
  • a first preset number of copies of each piece of object data is stored in a corresponding first preset number of primary object storage devices.
  • the electronic device includes a memory and a processor, and a fault handling program is stored in the memory; when the fault handling program is executed by the processor, the following steps are implemented:
  • Detection step: detect, in real time or at regular intervals, whether each of the primary object storage devices has failed;
  • Determination step: when a faulty primary object storage device is detected, determine, according to a predetermined mapping relationship between object data and placement groups, the placement group corresponding to each piece of object data stored in the faulty primary object storage device, and regard each determined placement group as a faulty placement group;
  • Downgrading step: reduce the configured number of copies of all object data corresponding to all the faulty placement groups from the first preset number to the second preset number;
  • Replacement step: select a spare object storage device from the spare object storage device group as the new primary object storage device, replace the failed primary object storage device with the new primary object storage device, and increase the configured number of copies of all object data corresponding to all the faulty placement groups from the second preset number back to the first preset number.
  • the present application also proposes a computer-readable storage medium storing a fault processing program that can be executed by at least one processor, so that the at least one processor performs the following steps:
  • Detection step: detect, in real time or at regular intervals, whether each of the primary object storage devices has failed;
  • Determination step: when a faulty primary object storage device is detected, determine, according to a predetermined mapping relationship between object data and placement groups, the placement group corresponding to each piece of object data stored in the faulty primary object storage device, and regard each determined placement group as a faulty placement group;
  • Downgrading step: reduce the configured number of copies of all object data corresponding to all the faulty placement groups from the first preset number to the second preset number;
  • Replacement step: select a spare object storage device from the spare object storage device group as the new primary object storage device, replace the failed primary object storage device with the new primary object storage device, and increase the configured number of copies of all object data corresponding to all the faulty placement groups from the second preset number back to the first preset number.
  • This application detects faulty active OSDs in real time or at regular intervals; when a faulty active OSD is detected, it determines, based on the predetermined mapping relationship between object data and PGs, the PG corresponding to each piece of object data stored in the faulty OSD, and regards each determined PG as a faulty PG; it reduces the configured number of copies of all object data corresponding to all the faulty PGs from the first preset number to the second preset number; it then selects a standby OSD from the standby OSD group as the new active OSD, replaces the failed active OSD with the new active OSD, and increases the configured number of copies of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number.
  • By reducing the configured number of copies of all object data corresponding to the faulty PGs from the first preset number to the second preset number, this application makes the distributed storage system recognize that the current number of faulty PG copies satisfies the configured amount. Therefore, no data reconstruction is performed for the faulty OSD, and no large amount of data migration between OSDs is caused. Thus, during OSD fault handling, this application reduces the amount of data migration between OSDs.
  • FIG. 1 is a schematic diagram of a system architecture of a first embodiment of a distributed storage system of this application;
  • FIG. 2 is a schematic diagram of the storage relationship of the distributed storage system of the present application.
  • FIG. 3 is a schematic diagram of the operating environment of the first embodiment of the fault handling program of the present application.
  • FIG. 4 is a program module diagram of the first embodiment of the fault handling program of the present application.
  • FIG. 5 is a schematic flowchart of a first embodiment of a fault processing method of the present application.
  • FIG. 1 is a schematic diagram of the system architecture of the first embodiment of the distributed storage system of the present application.
  • the distributed storage system includes multiple active OSDs 31 and at least one standby OSD group.
  • the standby OSD group includes a number of standby OSDs 32; each active OSD 31 and standby OSD 32 may be set in a host 3, at least one active OSD 31 and at least one standby OSD 32 are provided in one host 3, and each active OSD 31 and standby OSD 32 are communicatively connected (for example, through the network 2).
  • an electronic device 1 is also provided in the distributed storage system, and the electronic device 1 is communicatively connected to each of the active OSD 31 and the standby OSD 32 (for example, a communication connection through the network 2).
  • alternatively, the electronic device 1 may be set independently of the distributed storage system and communicatively connected to it (for example, through the network 2).
  • the smallest storage unit in the distributed storage system is object data.
  • A piece of object data is a data block whose size does not exceed a specified value (for example, 4MB), and each piece of object data is mapped to a corresponding PG.
  • the object data is not manipulated directly; instead, data processing (e.g., data addressing, data migration) is performed with the PG as the basic unit.
  • the above-mentioned distributed storage system supports a multi-copy strategy. For example, if the number of copies of the object data corresponding to a PG is preset to a first preset number (for example, three), then all object data in that PG has the first preset number of copies, and each copy of all the object data in the PG is correspondingly stored in the first preset number of OSDs. For example, if three copies of each piece of object data in PG1.1 in FIG. 2 are stored in OSD.0, OSD.1 and OSD.2 respectively, then OSD.0, OSD.1 and OSD.2 each store all the object data in PG1.1. Since the distributed storage system performs data processing with the PG as the basic unit, in the following embodiments a copy of all the object data in a PG is called a PG copy of that PG.
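The storage relationship of FIG. 2 can be modeled as a simple PG-to-OSD map; the helper names below are illustrative assumptions, not part of the patent:

```python
# PG -> list of OSDs that each hold a full copy ("PG copy") of all its
# object data, following the FIG. 2 example.
pg_map = {
    "PG1.1": ["OSD.0", "OSD.1", "OSD.2"],
    "PG1.2": ["OSD.0", "OSD.1", "OSD.2"],
    "PG1.3": ["OSD.0", "OSD.2", "OSD.3"],
}

def pg_copies(pg_id: str) -> int:
    """Number of PG copies currently held for a placement group."""
    return len(pg_map[pg_id])

def pgs_on_osd(osd_id: str) -> list:
    """All PGs that keep one of their copies on the given OSD."""
    return [pg for pg, osds in pg_map.items() if osd_id in osds]
```

With this layout, `pgs_on_osd("OSD.0")` yields all three PGs, which matches the later example where the failure of OSD.0 makes PG1.1, PG1.2 and PG1.3 faulty PGs.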
  • FIG. 3 is a schematic diagram of the operating environment of the first embodiment of the fault handling procedure of the present application.
  • the fault processing program 10 is installed and runs in the electronic device 1.
  • the electronic device 1 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a server.
  • the electronic device 1 may include, but is not limited to, a memory 11 and a processor 12 that communicate with each other through a bus.
  • FIG. 3 only shows the electronic device 1 with the components 11 and 12, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a hard disk or memory of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 is used to store application software installed in the electronic device 1 and various types of data, such as the program code of the fault processing program 10. The memory 11 may also be used to temporarily store data that has been or will be output.
  • the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, and is used to run the program code or process the data stored in the memory 11, for example to execute the fault processing program 10.
  • FIG. 4 is a program module diagram of the first embodiment of the fault processing program 10 of the present application.
  • the fault processing program 10 may be divided into one or more modules; the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to complete the present application.
  • the fault processing program 10 may be divided into a detection module 101, a determination module 102, a degradation module 103, and a replacement module 104.
  • the module referred to in this application is a series of computer program instruction segments capable of performing a specific function, and is more suitable than a program for describing the execution process of the fault processing program 10 in the electronic device 1, where:
  • the detection module 101 is used to detect, in real time or at regular intervals, whether each of the active OSDs has failed.
  • a heartbeat mechanism may be used to detect whether an active OSD is faulty: a detection message is sent to each active OSD in real time or at regular intervals, and if an active OSD does not return a reply message within a preset time period, it is determined that the active OSD has failed.
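A minimal sketch of such a heartbeat detector, assuming a reply-timestamp table and a configurable timeout (both illustrative, not the patent's concrete implementation):

```python
import time

class HeartbeatDetector:
    """Mark an active OSD as faulty if it has not replied within `timeout` seconds."""

    def __init__(self, osd_ids, timeout: float = 20.0):
        self.timeout = timeout
        # Assume every OSD was healthy at start-up.
        self.last_reply = {osd: time.monotonic() for osd in osd_ids}

    def on_reply(self, osd_id: str) -> None:
        """Record the reply message returned by an OSD."""
        self.last_reply[osd_id] = time.monotonic()

    def faulty_osds(self, now: float = None) -> list:
        """OSDs whose last reply is older than the preset time period."""
        now = time.monotonic() if now is None else now
        return [osd for osd, t in self.last_reply.items() if now - t > self.timeout]

detector = HeartbeatDetector(["OSD.0", "OSD.1"], timeout=20.0)
detector.on_reply("OSD.0")
```

In practice the detection messages would be sent over the network 2; here only the bookkeeping side is shown.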
  • the determination module 102 is configured to determine, when a faulty active OSD is detected, the PG corresponding to each piece of object data stored in the faulty active OSD according to the predetermined mapping relationship between object data and PGs, and to regard each determined PG as a faulty PG.
  • the downgrade module 103 is configured to reduce the configured number of copies of all object data corresponding to all the faulty PGs from the first preset number to the second preset number.
  • if the distributed storage system recognizes that the number of copies of a faulty PG is less than the configured number, it starts data reconstruction: a new PG copy of each faulty PG is generated by copying, and each copied PG copy is written into a corresponding active OSD, so that the number of copies of the faulty PG reaches the configured amount.
  • Reducing the configured number of copies of all object data corresponding to all the faulty PGs from the first preset number to the second preset number means reducing the configured copy amount of all faulty PGs from the first preset number to the second preset number. For example, if the first preset number is 3 and the second preset number is 2, the multi-copy strategy of the faulty PGs is downgraded from three copies to two. After the downgrade, the number of faulty PG copies equals the current configured amount; therefore, the distributed storage system neither immediately reconstructs the data nor causes a large amount of data migration.
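The effect of the downgrade can be illustrated with the reconstruction trigger condition. The numbers follow the example in the text (three copies downgraded to two); the function itself is an assumed simplification of the cluster's behavior:

```python
def needs_reconstruction(current_copies: int, configured_copies: int) -> bool:
    """The cluster starts data reconstruction only when a PG holds fewer
    copies than its configured amount."""
    return current_copies < configured_copies

first_preset, second_preset = 3, 2

# A faulty PG loses one of its three copies:
copies_after_failure = first_preset - 1

# Without the downgrade, reconstruction (and mass migration) would start:
assert needs_reconstruction(copies_after_failure, first_preset)

# Downgrading the configuration from 3 to 2 masks the missing copy,
# so no reconstruction is triggered:
assert not needs_reconstruction(copies_after_failure, second_preset)
```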
  • the replacement module 104 is used to select a standby OSD from the standby OSD group as the new active OSD, replace the failed active OSD with the new active OSD, and increase the configured number of copies of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number.
  • the step of the replacement module 104 selecting a standby OSD from the standby OSD group as the new active OSD includes:
  • the step of the replacement module 104 replacing the failed active OSD with the new active OSD includes:
  • the identification information is allocated to the new active OSD as its device identification information, and the mapping relationship between the device identification information of the new active OSD and the location information of the new active OSD is re-established and saved.
  • the device identification information of the failed active OSD is assigned to the new active OSD as its device identification information, instead of using the new active OSD's own original device identification information, because once the original identification were used, a mapping between that original identification and the location of the new active OSD would be established, the distributed storage system would recognize that a new OSD had been added, and it would immediately initiate a data re-balance operation, that is, select some PG copies from every other active OSD and migrate them to the new active OSD to achieve a reasonable distribution of PG copies. The data re-balance operation causes a large amount of data migration and thus affects the response speed of the distributed storage system.
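A sketch of the identity-reuse idea: the standby device inherits the failed OSD's device identification, so the cluster's id-to-location map keeps the same set of device ids and no "new member" re-balance is triggered. The map layout and names are assumptions for illustration:

```python
def replace_failed_osd(osd_location_map: dict, failed_id: str,
                       standby_location: str) -> dict:
    """Give the standby OSD the *failed* OSD's device identification, so the
    cluster sees a replacement rather than a brand-new member."""
    new_map = dict(osd_location_map)
    # Same device id, new physical location; the id set is unchanged.
    new_map[failed_id] = standby_location
    return new_map

locations = {"OSD.0": "host-1/disk-2", "OSD.1": "host-2/disk-0"}
locations = replace_failed_osd(locations, "OSD.0", "host-3/disk-1")
```

Because the set of known device ids never changes, a membership-driven re-balance has nothing to react to; only the id-to-location mapping is re-established and saved.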
  • this embodiment reduces the configured number of copies of all object data corresponding to the faulty PGs from the first preset number to the second preset number, so that the distributed storage system recognizes that the current number of faulty PG copies satisfies its configured amount; therefore it does not reconstruct the data of the faulty OSD and does not cause a large amount of data migration between OSDs. Thus, during OSD fault handling, this application reduces the amount of data migration between OSDs.
  • the program further includes a data recovery module (not shown in the figure), which is used to:
  • the first preset number of active OSDs corresponding to each faulty PG is taken as a faulty OSD group (as shown in FIG. 2, if OSD.0 is the faulty active OSD, then PG1.1, PG1.2 and PG1.3 are all faulty PGs; the faulty OSD group corresponding to PG1.1 includes OSD.0, OSD.1 and OSD.2, the faulty OSD group corresponding to PG1.2 includes OSD.0, OSD.1 and OSD.2, and the faulty OSD group corresponding to PG1.3 includes OSD.0, OSD.2 and OSD.3), and the non-failed active OSDs other than the new active OSD perform data recovery on the new active OSD. After the data recovery is completed, the state of each faulty OSD group is marked as normal.
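The choice of recovery sources can be sketched as follows, using the faulty OSD groups from the FIG. 2 example; the function name is an assumption:

```python
def recovery_sources(faulty_osd_groups: dict, failed_id: str) -> dict:
    """For each faulty PG, the surviving members of its faulty OSD group
    (everyone except the failed device) serve as data-recovery sources
    for the new active OSD."""
    return {
        pg: [osd for osd in group if osd != failed_id]
        for pg, group in faulty_osd_groups.items()
    }

# Faulty OSD groups from the FIG. 2 example, with OSD.0 as the failed OSD:
groups = {
    "PG1.1": ["OSD.0", "OSD.1", "OSD.2"],
    "PG1.3": ["OSD.0", "OSD.2", "OSD.3"],
}
sources = recovery_sources(groups, failed_id="OSD.0")
```

Each faulty PG is thus restored onto the replacement device from its own surviving replicas, rather than by a cluster-wide reconstruction.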
  • the program further includes a redirection module (not shown in the figure), which is used to:
  • the write request is redirected to the standby OSD group, and the write request is executed using the standby OSD group.
  • this embodiment lets the standby OSD group execute the write request because the new active OSD in the faulty OSD group has not yet completed data recovery; if the faulty OSD group executed the write request, the execution of the write request would be delayed. Thus, letting the standby OSD group execute the write request effectively guarantees the execution efficiency of write requests.
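A minimal sketch of the redirection rule, assuming the set of PGs still under recovery is tracked (the names are illustrative):

```python
def route_write(pg_id: str, recovering_pgs: set,
                primary_group: str, standby_group: str) -> str:
    """Send writes for a PG whose faulty OSD group is still recovering to the
    standby OSD group; everything else goes to its primary group as usual."""
    return standby_group if pg_id in recovering_pgs else primary_group

# While PG1.1's new active OSD is still recovering, its writes are redirected:
assert route_write("PG1.1", {"PG1.1"}, "primary", "standby") == "standby"
assert route_write("PG2.7", {"PG1.1"}, "primary", "standby") == "primary"
```

Once recovery completes and the faulty OSD groups are marked normal, the PG leaves the recovering set and writes return to the primary path.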
  • the data recovery module is also used to:
  • the object data stored in the standby OSD group is migrated to one or more active OSDs.
  • active OSDs that do not belong to any faulty OSD group are searched for.
  • the object data stored in the standby OSD group is migrated to one or more found active OSDs.
  • the program is further used to:
  • Detect, in real time or at regular intervals, the number of standby OSDs in the standby OSD group; when the number of standby OSDs is less than or equal to a preset threshold, select one or more standby OSDs that do not belong to the standby OSD group from the standby OSDs of each host and add them to the standby OSD group.
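One possible sketch of this replenishment check; taking every non-member spare from every host is a simplifying assumption, since the text only requires selecting one or more standby OSDs:

```python
def replenish_standby(standby_group: list, host_spares: dict,
                      threshold: int) -> list:
    """When the standby group shrinks to the threshold or below, top it up
    with spare OSDs (from any host) that are not already members."""
    group = list(standby_group)
    if len(group) > threshold:
        return group          # still enough standbys, nothing to do
    for host, spares in host_spares.items():
        for osd in spares:
            if osd not in group:
                group.append(osd)
    return group

pool = replenish_standby(["S.0"],
                         {"host-1": ["S.1"], "host-2": ["S.2"]},
                         threshold=1)
```

Keeping the standby pool above the threshold ensures a replacement device is always available when the next active OSD fails.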
  • FIG. 5 is a schematic flowchart of a first embodiment of a fault processing method of the present application.
  • the method is applicable to an electronic device that is in communication with a plurality of active OSDs and at least one standby OSD group.
  • the standby OSD group includes a number of standby OSDs.
  • the active OSDs are used to store object data, and a first preset number of copies of each piece of object data is stored in a corresponding first preset number of active OSDs.
  • the method includes:
  • Step S10: detect, in real time or at regular intervals, whether each of the active OSDs has failed.
  • a heartbeat mechanism may be used to detect whether an active OSD is faulty: a detection message is sent to each active OSD in real time or at regular intervals, and if an active OSD does not return a reply message within a preset time period, it is determined that the active OSD has failed.
  • Step S20: when a faulty active OSD is detected, determine, according to the predetermined mapping relationship between object data and PGs, the PG corresponding to each piece of object data stored in the faulty active OSD, and regard each determined PG as a faulty PG.
  • Step S30: reduce the configured number of copies of all object data corresponding to all the faulty PGs from the first preset number to the second preset number.
  • if the distributed storage system recognizes that the number of copies of a faulty PG is less than the configured number, it starts data reconstruction: a new PG copy of each faulty PG is generated by copying, and each copied PG copy is written into a corresponding active OSD, so that the number of copies of the faulty PG reaches the configured amount.
  • Reducing the configured number of copies of all object data corresponding to all the faulty PGs from the first preset number to the second preset number means reducing the configured copy amount of all faulty PGs from the first preset number to the second preset number. For example, if the first preset number is 3 and the second preset number is 2, the multi-copy strategy of the faulty PGs is downgraded from three copies to two. After the downgrade, the number of faulty PG copies equals the current configured amount; therefore, the distributed storage system neither immediately reconstructs the data nor causes a large amount of data migration.
  • Step S40: select a standby OSD from the standby OSD group as the new active OSD, replace the failed active OSD with the new active OSD, and increase the configured number of copies of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number.
  • the above step of selecting a standby OSD from the standby OSD group as the new active OSD includes:
  • the step of replacing the failed primary OSD with the new primary OSD includes:
  • the identification information is allocated to the new active OSD as its device identification information, and the mapping relationship between the device identification information of the new active OSD and the location information of the new active OSD is re-established and saved.
  • the device identification information of the failed active OSD is assigned to the new active OSD as its device identification information, instead of using the new active OSD's own original device identification information, because once the original identification were used, a mapping between that original identification and the location of the new active OSD would be established, the distributed storage system would recognize that a new OSD had been added, and it would immediately initiate a data re-balance operation, that is, select some PG copies from every other active OSD and migrate them to the new active OSD to achieve a reasonable distribution of PG copies. The data re-balance operation causes a large amount of data migration and thus affects the response speed of the distributed storage system.
  • this embodiment reduces the configured number of copies of all object data corresponding to the faulty PGs from the first preset number to the second preset number, so that the distributed storage system recognizes that the current number of faulty PG copies satisfies its configured amount; therefore it does not reconstruct the data of the faulty OSD and does not cause a large amount of data migration between OSDs. Thus, during OSD fault handling, this application reduces the amount of data migration between OSDs.
  • after step S40, the method further includes:
  • the first preset number of active OSDs corresponding to each faulty PG is taken as a faulty OSD group (as shown in FIG. 2, if OSD.0 is the faulty active OSD, then PG1.1, PG1.2 and PG1.3 are all faulty PGs; the faulty OSD group corresponding to PG1.1 includes OSD.0, OSD.1 and OSD.2, the faulty OSD group corresponding to PG1.2 includes OSD.0, OSD.1 and OSD.2, and the faulty OSD group corresponding to PG1.3 includes OSD.0, OSD.2 and OSD.3), and the non-failed active OSDs other than the new active OSD perform data recovery on the new active OSD. After the data recovery is completed, the state of each faulty OSD group is marked as normal.
  • after step S40, the method further includes:
  • the write request is redirected to the standby OSD group, and the write request is executed using the standby OSD group.
  • this embodiment lets the standby OSD group execute the write request because the new active OSD in the faulty OSD group has not yet completed data recovery; if the faulty OSD group executed the write request, the execution of the write request would be delayed. Thus, letting the standby OSD group execute the write request effectively guarantees the execution efficiency of write requests.
  • the method further includes:
  • the object data stored in the standby OSD group is migrated to one or more active OSDs.
  • active OSDs that do not belong to any faulty OSD group are searched for.
  • the object data stored in the standby OSD group is migrated to one or more found active OSDs.
  • the method further includes:
  • the present application also provides a computer-readable storage medium that stores a fault processing program; the fault processing program may be executed by at least one processor to cause the at least one processor to execute the fault processing method in any of the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The present application relates to distributed storage technology and discloses a fault handling method, device, distributed storage system and computer-readable storage medium. The application detects faulty active OSDs in real time or at regular intervals; when a faulty active OSD is detected, it determines, according to the predetermined mapping relationship between object data and PGs, the PG corresponding to each piece of object data stored in the faulty active OSD, and regards each determined PG as a faulty PG; it reduces the configured number of copies of all object data corresponding to all the faulty PGs from a first preset number to a second preset number; it selects a standby OSD from the standby OSD group as the new active OSD, replaces the failed active OSD with the new active OSD, and increases the configured number of copies of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number. Compared with the prior art, the present application reduces the amount of data migration between OSDs during OSD fault handling.

Description

Fault handling method, device, distributed storage system and storage medium
Priority claim
Under the Paris Convention, the present application claims priority to the Chinese patent application filed on November 28, 2018 with application number CN201811433003.9, entitled "Fault handling method, device, distributed storage system and computer-readable storage medium", the entire content of which is incorporated herein by reference.
技术领域
本申请涉及分布式存储技术领域,特别涉及一种故障处理方法、装置、分布式存储系统和计算机可读存储介质。
背景技术
CEPH分布式文件系统是一种容量大、性能高、可靠性强的分布式存储系统。CEPH的核心组件是OSD(Object Storage Device,对象存储设备),OSD管理一块独立的硬盘,并提供对象存储(Object-based Storage)的读写访问接口。CEPH集群由很多独立的OSD构成,OSD数量可以动态的增删。CEPH客户端通过CRUSH算法将对象数据(Object)分发到不同OSD上进行存储。其中,CRUSH是一种伪随机分布算法,该算法先将对象数据通过哈希值(HASH)归属到一个归置组(Placement Group,PG)中,然后计算该PG存放的OSD,由此,归属于同一个PG的对象数据存放到该PG对应的目标OSD中。
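The two-step placement described above (object → PG by hash, then PG → OSD list) can be sketched in Python. The hash-and-modulo scheme below is an illustrative stand-in for the real CRUSH computation, and all names and sizes (PG count, OSD list, replica count) are assumed for the example:

```python
import hashlib

PG_COUNT = 8                                        # number of placement groups (illustrative)
OSDS = ["OSD.0", "OSD.1", "OSD.2", "OSD.3", "OSD.4"]
REPLICAS = 3                                        # the "first preset number" of replicas

def object_to_pg(object_name: str) -> int:
    # Step 1: hash the object name onto a placement group.
    digest = hashlib.md5(object_name.encode()).hexdigest()
    return int(digest, 16) % PG_COUNT

def pg_to_osds(pg_id: int) -> list:
    # Step 2: deterministically derive the PG's target OSDs
    # (a toy stand-in for the pseudo-random CRUSH computation).
    start = pg_id % len(OSDS)
    return [OSDS[(start + i) % len(OSDS)] for i in range(REPLICAS)]

pg = object_to_pg("object-42")
targets = pg_to_osds(pg)
```

Because both steps are deterministic, any client can recompute the same OSD list for an object without consulting a central lookup table, which is the property the real CRUSH algorithm provides.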
CEPH is also a self-healing storage cluster: when an OSD in CEPH fails, it is taken out of service and the data belonging to it is reconstructed and redistributed onto other OSDs; once the OSD is repaired, CEPH migrates part of the data on those other OSDs back to it. Thus CEPH preserves data integrity even when an OSD fails. However, although this fault handling approach guarantees data integrity, the reconstruction process causes large amounts of data on multiple OSDs to migrate, occupying cluster resources and degrading storage performance.
How to reduce the amount of data migrated between OSDs during OSD fault handling has therefore become a pressing problem.
Summary
The main purpose of this application is to provide a fault handling method, an apparatus, a distributed storage system and a computer-readable storage medium, aiming to reduce the amount of data migrated between OSDs during OSD fault handling.
To achieve the above purpose, this application proposes an electronic apparatus communicatively connected to a plurality of primary OSDs and at least one standby OSD group, the standby OSD group comprising several standby OSDs, the primary OSDs being configured to store object data, a first preset number of replicas of each piece of object data being stored in a corresponding first preset number of primary OSDs. The electronic apparatus comprises a memory and a processor, the memory storing a fault handling program which, when executed by the processor, implements the following steps:
a detection step: detecting in real time or periodically whether each primary OSD has failed;
a determination step: when a faulty primary OSD is detected, determining, from the predetermined mapping between object data and PGs, the PG corresponding to each piece of object data stored in the faulty primary OSD, and treating each determined PG as a faulty PG;
a degradation step: reducing the configured replica count of all object data corresponding to all the faulty PGs from the first preset number to a second preset number;
a replacement step: selecting a standby OSD from the standby OSD group as a new primary OSD, replacing the faulty primary OSD with the new primary OSD, and increasing the configured replica count of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number.
In addition, to achieve the above purpose, this application also proposes a fault handling method applicable to an electronic apparatus communicatively connected to a plurality of primary OSDs and at least one standby OSD group, the standby OSD group comprising several standby OSDs, the primary OSDs being configured to store object data, a first preset number of replicas of each piece of object data being stored in a corresponding first preset number of primary OSDs. The method comprises the steps of:
a detection step: detecting in real time or periodically whether each primary OSD has failed;
a determination step: when a faulty primary OSD is detected, determining, from the predetermined mapping between object data and PGs, the PG corresponding to each piece of object data stored in the faulty primary OSD, and treating each determined PG as a faulty PG;
a degradation step: reducing the configured replica count of all object data corresponding to all the faulty PGs from the first preset number to a second preset number;
a replacement step: selecting a standby OSD from the standby OSD group as a new primary OSD, replacing the faulty primary OSD with the new primary OSD, and increasing the configured replica count of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number.
In addition, to achieve the above purpose, this application also proposes a distributed storage system comprising an electronic apparatus, a plurality of primary object storage devices and at least one standby object storage device group, the electronic apparatus being communicatively connected to each primary object storage device and each standby object storage device group, the standby object storage device group comprising several standby object storage devices, the primary object storage devices being configured to store object data, a first preset number of replicas of each piece of object data being stored in a corresponding first preset number of primary object storage devices. The electronic apparatus comprises a memory and a processor, the memory storing a fault handling program which, when executed by the processor, implements the following steps:
a detection step: detecting in real time or periodically whether each primary object storage device has failed;
a determination step: when a faulty primary object storage device is detected, determining, from the predetermined mapping between object data and placement groups, the placement group corresponding to each piece of object data stored in the faulty primary object storage device, and treating each determined placement group as a faulty placement group;
a degradation step: reducing the configured replica count of all object data corresponding to all the faulty placement groups from the first preset number to a second preset number;
a replacement step: selecting a standby object storage device from the standby object storage device group as a new primary object storage device, replacing the faulty primary object storage device with the new primary object storage device, and increasing the configured replica count of all object data corresponding to all the faulty placement groups from the second preset number back to the first preset number.
In addition, to achieve the above purpose, this application also proposes a computer-readable storage medium storing a fault handling program executable by at least one processor to cause the at least one processor to perform the steps of:
a detection step: detecting in real time or periodically whether each primary object storage device has failed;
a determination step: when a faulty primary object storage device is detected, determining, from the predetermined mapping between object data and placement groups, the placement group corresponding to each piece of object data stored in the faulty primary object storage device, and treating each determined placement group as a faulty placement group;
a degradation step: reducing the configured replica count of all object data corresponding to all the faulty placement groups from the first preset number to a second preset number;
a replacement step: selecting a standby object storage device from the standby object storage device group as a new primary object storage device, replacing the faulty primary object storage device with the new primary object storage device, and increasing the configured replica count of all object data corresponding to all the faulty placement groups from the second preset number back to the first preset number.
This application detects faulty primary OSDs in real time or periodically; when a faulty primary OSD is detected, it determines, from the predetermined mapping between object data and PGs, the PG corresponding to each piece of object data stored in the faulty primary OSD and treats each determined PG as a faulty PG; it reduces the configured replica count of all object data corresponding to all the faulty PGs from a first preset number to a second preset number; it selects a standby OSD from the standby OSD group as a new primary OSD, replaces the faulty primary OSD with the new primary OSD, and increases the configured replica count of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number. Compared with the prior art, when an OSD in the distributed storage system fails, this application lowers the configured replica count of all object data corresponding to the faulty PGs from the first preset number to the second preset number, so the system sees that the current number of faulty-PG replicas satisfies the configured count; it therefore does not reconstruct the data of the faulty OSD and does not cause large-scale data migration between OSDs. This application thus reduces the amount of data migrated between OSDs during OSD fault handling.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of this application; a person of ordinary skill in the art may derive other drawings from the structures they show without creative effort.
Fig. 1 is a schematic diagram of the system architecture of a first embodiment of the distributed storage system of this application;
Fig. 2 is a schematic diagram of the storage relationships in the distributed storage system of this application;
Fig. 3 is a schematic diagram of the running environment of a first embodiment of the fault handling program of this application;
Fig. 4 is a program module diagram of the first embodiment of the fault handling program of this application;
Fig. 5 is a schematic flowchart of the first embodiment of the fault handling method of this application.
The achievement of the objectives, functional features and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description
The principles and features of this application are described below with reference to the drawings; the examples given are intended only to explain this application, not to limit its scope.
Referring to Fig. 1, a schematic diagram of the system architecture of the first embodiment of the distributed storage system of this application.
In this embodiment, the distributed storage system comprises a plurality of primary OSDs 31 and at least one standby OSD group, the standby OSD group comprising several standby OSDs 32. The primary OSDs 31 and standby OSDs 32 may be deployed on hosts 3; for example, each host 3 hosts at least one primary OSD 31 and at least one standby OSD 32, with the primary OSDs 31 and standby OSDs 32 communicatively connected to one another (for example, via the network 2).
In some application scenarios, the distributed storage system further includes an electronic apparatus 1 communicatively connected to each primary OSD 31 and each standby OSD 32 (for example, via the network 2).
In other application scenarios, the electronic apparatus 1 is deployed independently of the distributed storage system and communicatively connected to it (for example, via the network 2).
In this embodiment, the smallest storage unit of the distributed storage system is the object: a data block no larger than a specified size (for example, 4 MB). Each object is mapped into a corresponding PG; the system does not operate on objects directly but performs data processing (for example, data addressing and data migration) with the PG as the basic unit.
The distributed storage system supports a multi-replica policy. For example, if the configured replica count of the object data in a PG is preset to a first preset number (say, three), then every object in the PG has that number of replicas (copies), and the replicas of all objects in the PG are stored on the corresponding first preset number of OSDs. In Fig. 2, the three replicas of each object in PG1.1 are stored on OSD.0, OSD.1 and OSD.2, so each of OSD.0, OSD.1 and OSD.2 holds all the objects of PG1.1. Since the system processes data with the PG as its basic unit, in the embodiments below one copy (replica) of all the objects in a PG is called a PG replica of that PG.
Based on the above distributed storage system and related devices, the embodiments of this application are presented below.
This application proposes a fault handling program. Referring to Fig. 3, a schematic diagram of the running environment of the first embodiment of the fault handling program of this application.
In this embodiment, the fault handling program 10 is installed on and runs in the electronic apparatus 1. The electronic apparatus 1 may be a computing device such as a desktop computer, a notebook, a palmtop computer or a server. The electronic apparatus 1 may include, but is not limited to, a memory 11 and a processor 12 communicating with each other over a program bus. Fig. 3 shows only the electronic apparatus 1 with components 11 and 12, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
The memory 11 may in some embodiments be an internal storage unit of the electronic apparatus 1, such as its hard disk or RAM. In other embodiments the memory 11 may be an external storage device of the electronic apparatus 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card. Further, the memory 11 may include both the internal storage unit and an external storage device of the electronic apparatus 1. The memory 11 stores the application software installed on the electronic apparatus 1 and various data, such as the program code of the fault handling program 10, and may also temporarily store data that has been or is to be output.
The processor 12 may in some embodiments be a central processing unit (CPU), microprocessor or other data processing chip for running the program code stored in the memory 11 or processing data, for example executing the fault handling program 10.
Referring to Fig. 4, a program module diagram of the first embodiment of the fault handling program 10 of this application. In this embodiment, the fault handling program 10 may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to implement this application. For example, in Fig. 4, the fault handling program 10 is divided into a detection module 101, a determination module 102, a degradation module 103 and a replacement module 104. A module in this application refers to a series of computer program instruction segments capable of performing a specific function, better suited than "program" to describing the execution of the fault handling program 10 in the electronic apparatus 1, where:
the detection module 101 detects in real time or periodically whether each primary OSD has failed.
For example, a heartbeat mechanism may be used: probe messages are sent to each primary OSD in real time or periodically, and a primary OSD that fails to return a reply within a preset time span is determined to have failed.
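The heartbeat check above can be sketched as follows. The timeout value and the reply bookkeeping are assumptions for the example, not details from the source:

```python
import time

HEARTBEAT_TIMEOUT = 5.0   # seconds without a reply before an OSD is deemed faulty (assumed)

now = time.time()
# Timestamp of the last reply received from each primary OSD (hypothetical bookkeeping;
# here OSD.1 last replied 10 s ago, well past the timeout).
last_reply = {"OSD.0": now, "OSD.1": now - 10.0}

def detect_faulty(replies: dict, current: float) -> list:
    # A primary OSD that has not replied within the preset time span is faulty.
    return [osd for osd, ts in replies.items() if current - ts > HEARTBEAT_TIMEOUT]

faulty = detect_faulty(last_reply, now)
```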
The determination module 102 is configured to, when a faulty primary OSD is detected, determine, from the predetermined mapping between object data and PGs, the PG corresponding to each piece of object data stored in the faulty primary OSD, and treat each determined PG as a faulty PG.
The degradation module 103 is configured to reduce the configured replica count of all object data corresponding to all the faulty PGs from the first preset number to the second preset number.
For example, if the first preset number is 3, every object in each PG should have three replicas stored on three primary OSDs, i.e. each PG should have three PG replicas on three primary OSDs. Once a primary OSD fails, only two PG replicas of each faulty PG remain in the distributed storage system. When the system sees that a faulty PG has fewer replicas than the configured count, it starts data reconstruction: it copies one further PG replica of each faulty PG and writes the copies to the corresponding primary OSDs so that the replica count of each faulty PG reaches the configured amount. In this embodiment, the configured replica count of all object data corresponding to all faulty PGs is reduced from the first preset number to the second preset number, i.e. the configured replica count of all faulty PGs is reduced from, say, 3 to 2, downgrading the faulty PGs' multi-replica policy from three replicas to two. Now, excluding the one PG replica of each faulty PG stored on the faulty primary OSD, the remaining healthy primary OSDs still hold two PG replicas of each faulty PG, so the replica count of each faulty PG equals the current configured count. The system therefore does not immediately start data reconstruction, and no large-scale data migration occurs.
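The degradation logic can be demonstrated with a small sketch: before the configured count is lowered, every faulty PG would trigger a rebuild; after lowering it to match the surviving replicas, none does. The PG names follow the Fig. 2 example; the rebuild check itself is a simplified assumption about the cluster's behaviour:

```python
FIRST_PRESET = 3   # normal configured replica count
SECOND_PRESET = 2  # degraded replica count

# Replicas actually surviving for each faulty PG after OSD.0 fails.
surviving = {"PG1.1": 2, "PG1.2": 2, "PG1.3": 2}
replica_config = {pg: FIRST_PRESET for pg in surviving}

def needs_rebuild(pg: str) -> bool:
    # The cluster rebuilds a PG only when it holds fewer copies than configured.
    return surviving[pg] < replica_config[pg]

rebuild_before = [pg for pg in surviving if needs_rebuild(pg)]

# Degradation step: lower the configured count to the second preset number.
for pg in replica_config:
    replica_config[pg] = SECOND_PRESET

rebuild_after = [pg for pg in surviving if needs_rebuild(pg)]
```

With the count lowered, `rebuild_after` is empty: the cluster sees the replica requirement as satisfied and starts no reconstruction, which is exactly the migration-avoiding effect the embodiment describes.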
The replacement module 104 is configured to select a standby OSD from the standby OSD group as a new primary OSD, replace the faulty primary OSD with the new primary OSD, and increase the configured replica count of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number.
In this embodiment, the step in which the replacement module 104 selects a standby OSD from the standby OSD group as the new primary OSD comprises:
searching the standby OSD group for a standby OSD on the same host as the faulty primary OSD; if one is found, taking the found standby OSD as the new primary OSD; if none is found, randomly selecting a standby OSD from the standby OSD group as the new primary OSD.
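A minimal sketch of this same-host-first selection rule, with hypothetical OSD and host names:

```python
import random

def pick_new_primary(standby_group: dict, faulty_host: str) -> str:
    # Prefer a standby OSD on the same host as the faulty primary;
    # otherwise fall back to a random member of the standby group.
    same_host = [osd for osd, host in standby_group.items() if host == faulty_host]
    if same_host:
        return same_host[0]
    return random.choice(list(standby_group))

standbys = {"OSD.10": "host-A", "OSD.11": "host-B"}
chosen = pick_new_primary(standbys, "host-B")   # same-host candidate exists
```

Preferring the same host keeps the replacement within the failure domain the faulty device occupied, so the cluster's placement constraints (e.g. replicas on distinct hosts) remain satisfied without moving data.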
Further, in this embodiment, the step in which the replacement module 104 replaces the faulty primary OSD with the new primary OSD comprises:
releasing the preset mapping between the device identification information of the faulty primary OSD and the location information of the faulty primary OSD (for example, a network port value), assigning the device identification information of the faulty primary OSD to the new primary OSD as the device identification information of the new primary OSD, and re-establishing and saving the mapping between the device identification information of the new primary OSD and the location information of the new primary OSD.
In this embodiment, the faulty primary OSD's device identification information is assigned to the new primary OSD, rather than using the new primary OSD's original device identification information, because if the original identification were used and a mapping established between it and the new primary OSD's location, the distributed storage system would recognise a newly joined OSD and immediately start a data re-balance operation, i.e. select some PG replicas from each of the other primary OSDs and migrate them to the new primary OSD to distribute PG replicas evenly. The re-balance operation causes large-scale data migration and thus affects the response speed of the distributed storage system.
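The identity hand-over can be illustrated with a tiny mapping table. The IDs and endpoints are hypothetical; the point is that the key (device ID) survives while only the value (location) changes, so the cluster sees a repaired device rather than a new one:

```python
# Device-ID -> location (e.g. network endpoint) table kept by the cluster (assumed shape).
id_to_location = {"osd-id-0": "10.0.0.1:6800"}   # entry for the faulty primary

def replace_faulty(mapping: dict, faulty_id: str, new_location: str) -> dict:
    # Unbind the faulty primary's ID from its old location, then rebind the
    # SAME ID to the new primary's location. Because no new device ID appears,
    # no re-balance is triggered.
    del mapping[faulty_id]
    mapping[faulty_id] = new_location
    return mapping

replace_faulty(id_to_location, "osd-id-0", "10.0.0.2:6800")
```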
Compared with the prior art, when an OSD in the distributed storage system fails, this embodiment reduces the configured replica count of all object data corresponding to the faulty PGs from the first preset number to the second preset number, so the system sees that the current number of faulty-PG replicas satisfies the configured count; it therefore does not reconstruct the data of the faulty OSD and does not cause large-scale data migration between OSDs. This application thus reduces the amount of data migrated between OSDs during OSD fault handling.
Further, in this embodiment, the program also includes a data recovery module (not shown), configured to:
take, according to the predetermined mapping between PGs and primary OSDs, the first preset number of primary OSDs corresponding to each faulty PG as a faulty OSD group (as shown in Fig. 2, if OSD.0 is the faulty primary OSD, then PG1.1, PG1.2 and PG1.3 are all faulty PGs; the faulty OSD group of PG1.1 comprises OSD.0, OSD.1 and OSD.2; that of PG1.2 comprises OSD.0, OSD.1 and OSD.2; and that of PG1.3 comprises OSD.0, OSD.2 and OSD.3), and recover the data of the new primary OSD from the non-faulty primary OSDs in each faulty OSD group other than the new primary OSD. After data recovery completes, the state of each faulty OSD group is marked as normal.
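The grouping described above can be sketched directly from the Fig. 2 example. `PG2.1` and `OSD.4` are hypothetical additions to show that healthy PGs are excluded:

```python
# PG -> primary-OSD mapping as in the Fig. 2 example (OSD.0 is the faulty primary).
pg_map = {
    "PG1.1": ["OSD.0", "OSD.1", "OSD.2"],
    "PG1.2": ["OSD.0", "OSD.1", "OSD.2"],
    "PG1.3": ["OSD.0", "OSD.2", "OSD.3"],
    "PG2.1": ["OSD.1", "OSD.3", "OSD.4"],   # hypothetical healthy PG
}

def faulty_osd_groups(faulty_osd: str) -> dict:
    # Every PG stored on the faulty OSD is a faulty PG; its full OSD list
    # forms the corresponding faulty OSD group.
    return {pg: osds for pg, osds in pg_map.items() if faulty_osd in osds}

groups = faulty_osd_groups("OSD.0")
# The surviving members of each group are the data-recovery sources
# for the new primary OSD.
recovery_sources = {pg: [o for o in osds if o != "OSD.0"]
                    for pg, osds in groups.items()}
```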
Further, in this embodiment, the program also includes a redirection module (not shown), configured to:
when a faulty OSD group receives a write request for object data, redirect the write request to the standby OSD group and execute the write request with the standby OSD group.
The standby OSD group is used to execute the write request because at this point the new primary OSD in the faulty OSD group has not yet completed data recovery; if the faulty OSD group also had to execute the write request, the request's execution would be delayed. Using the standby OSD group therefore effectively preserves the execution efficiency of write requests.
Further, in this embodiment, the data recovery module is also configured to:
determine, in real time or periodically, or on receipt of an incremental data recovery request, whether the standby OSDs of the standby OSD group store object data;
when the standby OSDs of the standby OSD group store object data, determine whether a faulty OSD group exists;
when no faulty OSD group exists, migrate the object data stored in the standby OSD group to one or more primary OSDs;
when a faulty OSD group exists, search for primary OSDs that do not belong to the faulty OSD group;
when such primary OSDs are found, migrate the object data stored in the standby OSD group to one or more of the found primary OSDs;
when none is found, return a message that incremental data recovery failed, or keep searching for primary OSDs not belonging to the faulty OSD group until one is found.
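The decision chain above can be condensed into one function. The inputs and return convention (an empty list when there is nothing to migrate or no eligible target yet) are assumptions for the sketch:

```python
def migration_targets(standby_has_data: bool, faulty_groups: list, primaries: list) -> list:
    # Decide where object data buffered in the standby OSD group should go.
    if not standby_has_data:
        return []                    # nothing to migrate
    if not faulty_groups:
        return list(primaries)       # any primary OSD may receive the data
    blocked = {osd for group in faulty_groups for osd in group}
    # Only primaries outside every faulty OSD group are eligible targets.
    return [osd for osd in primaries if osd not in blocked]

targets = migration_targets(True,
                            [["OSD.0", "OSD.1", "OSD.2"]],
                            ["OSD.0", "OSD.1", "OSD.2", "OSD.3"])
```

Excluding members of the faulty OSD groups keeps the migration from landing on devices that are still busy recovering the new primary OSD.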
Further, in this embodiment, the program is also configured to:
detect in real time or periodically the number of standby OSDs in the standby OSD group, and, when the number of standby OSDs is less than or equal to a preset threshold, select from the standby OSDs of the hosts one or more standby OSDs not belonging to the standby OSD group and add them to the standby OSD group.
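A possible sketch of this pool-replenishment check. The threshold, the target pool size after topping up, and the per-host spare lists are all assumptions; the source only specifies "less than or equal to a preset threshold":

```python
THRESHOLD = 2  # minimum standby pool size before topping up (assumed value)

def replenish(standby_pool: list, per_host_spares: dict) -> list:
    # When the pool is at or below the threshold, pull in spare standby OSDs
    # from the hosts that are not already in the pool.
    if len(standby_pool) > THRESHOLD:
        return standby_pool
    extras = [osd for spares in per_host_spares.values()
              for osd in spares if osd not in standby_pool]
    # Top the pool back up to one above the threshold (an assumed policy).
    return standby_pool + extras[: THRESHOLD + 1 - len(standby_pool)]

pool = replenish(["OSD.10"], {"host-A": ["OSD.11"], "host-B": ["OSD.12"]})
```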
In addition, this application proposes a fault handling method. Referring to Fig. 5, a schematic flowchart of the first embodiment of the fault handling method of this application.
In this embodiment, the method is applicable to an electronic apparatus communicatively connected to a plurality of primary OSDs and at least one standby OSD group, the standby OSD group comprising several standby OSDs, the primary OSDs being configured to store object data, a first preset number of replicas of each piece of object data being stored in a corresponding first preset number of primary OSDs. The method comprises:
Step S10: detecting in real time or periodically whether each primary OSD has failed.
For example, a heartbeat mechanism may be used: probe messages are sent to each primary OSD in real time or periodically, and a primary OSD that fails to return a reply within a preset time span is determined to have failed.
Step S20: when a faulty primary OSD is detected, determining, from the predetermined mapping between object data and PGs, the PG corresponding to each piece of object data stored in the faulty primary OSD, and treating each determined PG as a faulty PG.
Step S30: reducing the configured replica count of all object data corresponding to all the faulty PGs from the first preset number to the second preset number.
For example, if the first preset number is 3, every object in each PG should have three replicas stored on three primary OSDs, i.e. each PG should have three PG replicas on three primary OSDs. Once a primary OSD fails, only two PG replicas of each faulty PG remain in the distributed storage system. When the system sees that a faulty PG has fewer replicas than the configured count, it starts data reconstruction: it copies one further PG replica of each faulty PG and writes the copies to the corresponding primary OSDs so that the replica count of each faulty PG reaches the configured amount. In this embodiment, the configured replica count of all object data corresponding to all faulty PGs is reduced from the first preset number to the second preset number, i.e. the configured replica count of all faulty PGs is reduced from, say, 3 to 2, downgrading the faulty PGs' multi-replica policy from three replicas to two. Now, excluding the one PG replica of each faulty PG stored on the faulty primary OSD, the remaining healthy primary OSDs still hold two PG replicas of each faulty PG, so the replica count of each faulty PG equals the current configured count. The system therefore does not immediately start data reconstruction, and no large-scale data migration occurs.
Step S40: selecting a standby OSD from the standby OSD group as a new primary OSD, replacing the faulty primary OSD with the new primary OSD, and increasing the configured replica count of all object data corresponding to all the faulty PGs from the second preset number back to the first preset number.
In this embodiment, the above step of selecting a standby OSD from the standby OSD group as the new primary OSD comprises:
searching the standby OSD group for a standby OSD on the same host as the faulty primary OSD; if one is found, taking the found standby OSD as the new primary OSD; if none is found, randomly selecting a standby OSD from the standby OSD group as the new primary OSD.
Further, in this embodiment, the step of replacing the faulty primary OSD with the new primary OSD comprises:
releasing the preset mapping between the device identification information of the faulty primary OSD and the location information of the faulty primary OSD (for example, a network port value), assigning the device identification information of the faulty primary OSD to the new primary OSD as the device identification information of the new primary OSD, and re-establishing and saving the mapping between the device identification information of the new primary OSD and the location information of the new primary OSD.
In this embodiment, the faulty primary OSD's device identification information is assigned to the new primary OSD, rather than using the new primary OSD's original device identification information, because if the original identification were used and a mapping established between it and the new primary OSD's location, the distributed storage system would recognise a newly joined OSD and immediately start a data re-balance operation, i.e. select some PG replicas from each of the other primary OSDs and migrate them to the new primary OSD to distribute PG replicas evenly. The re-balance operation causes large-scale data migration and thus affects the response speed of the distributed storage system.
Compared with the prior art, when an OSD in the distributed storage system fails, this embodiment reduces the configured replica count of all object data corresponding to the faulty PGs from the first preset number to the second preset number, so the system sees that the current number of faulty-PG replicas satisfies the configured count; it therefore does not reconstruct the data of the faulty OSD and does not cause large-scale data migration between OSDs. This application thus reduces the amount of data migrated between OSDs during OSD fault handling.
Further, in this embodiment, after step S40 the method further comprises:
taking, according to the predetermined mapping between PGs and primary OSDs, the first preset number of primary OSDs corresponding to each faulty PG as a faulty OSD group (as shown in Fig. 2, if OSD.0 is the faulty primary OSD, then PG1.1, PG1.2 and PG1.3 are all faulty PGs; the faulty OSD group of PG1.1 comprises OSD.0, OSD.1 and OSD.2; that of PG1.2 comprises OSD.0, OSD.1 and OSD.2; and that of PG1.3 comprises OSD.0, OSD.2 and OSD.3), and recovering the data of the new primary OSD from the non-faulty primary OSDs in each faulty OSD group other than the new primary OSD. After data recovery completes, the state of each faulty OSD group is marked as normal.
Further, in this embodiment, after step S40 the method further comprises:
when a faulty OSD group receives a write request for object data, redirecting the write request to the standby OSD group and executing the write request with the standby OSD group.
The standby OSD group is used to execute the write request because at this point the new primary OSD in the faulty OSD group has not yet completed data recovery; if the faulty OSD group also had to execute the write request, the request's execution would be delayed. Using the standby OSD group therefore effectively preserves the execution efficiency of write requests.
Further, in this embodiment, the method further comprises:
determining, in real time or periodically, or on receipt of an incremental data recovery request, whether the standby OSDs of the standby OSD group store object data;
when the standby OSDs of the standby OSD group store object data, determining whether a faulty OSD group exists;
when no faulty OSD group exists, migrating the object data stored in the standby OSD group to one or more primary OSDs;
when a faulty OSD group exists, searching for primary OSDs that do not belong to the faulty OSD group;
when such primary OSDs are found, migrating the object data stored in the standby OSD group to one or more of the found primary OSDs;
when none is found, returning a message that incremental data recovery failed, or continuing to search for primary OSDs not belonging to the faulty OSD group until one is found.
Further, in this embodiment, the method further comprises:
detecting in real time or periodically the number of standby OSDs in the standby OSD group, and, when the number of standby OSDs is less than or equal to a preset threshold, selecting from the standby OSDs of the hosts one or more standby OSDs not belonging to the standby OSD group and adding them to the standby OSD group.
Further, this application also proposes a computer-readable storage medium storing a fault handling program executable by at least one processor to cause the at least one processor to perform the fault handling method of any of the above embodiments.
The above are only preferred embodiments of this application and do not thereby limit its patent scope; any equivalent structural transformation made using the content of the specification and drawings of this application under its inventive concept, or any direct or indirect application in other related technical fields, falls within the patent protection scope of this application.

Claims (20)

  1. An electronic apparatus, characterised in that the electronic apparatus is communicatively connected to a plurality of primary object storage devices and at least one standby object storage device group, the standby object storage device group comprising several standby object storage devices, the primary object storage devices being configured to store object data, a first preset number of replicas of each piece of object data being stored in a corresponding first preset number of primary object storage devices, the electronic apparatus comprising a memory and a processor, the memory storing a fault handling program which, when executed by the processor, implements the following steps:
    a detection step: detecting in real time or periodically whether each primary object storage device has failed;
    a determination step: when a faulty primary object storage device is detected, determining, from the predetermined mapping between object data and placement groups, the placement group corresponding to each piece of object data stored in the faulty primary object storage device, and treating each determined placement group as a faulty placement group;
    a degradation step: reducing the configured replica count of all object data corresponding to all the faulty placement groups from the first preset number to a second preset number;
    a replacement step: selecting a standby object storage device from the standby object storage device group as a new primary object storage device, replacing the faulty primary object storage device with the new primary object storage device, and increasing the configured replica count of all object data corresponding to all the faulty placement groups from the second preset number back to the first preset number.
  2. The electronic apparatus of claim 1, characterised in that the fault handling program, when executed by the processor, further implements, after the replacement step, the following step:
    taking, according to the predetermined mapping between placement groups and primary object storage devices, the first preset number of primary object storage devices corresponding to each faulty placement group as a faulty object storage device group, and recovering the data of the new primary object storage device from the non-faulty primary object storage devices in each faulty object storage device group other than the new primary object storage device.
  3. The electronic apparatus of claim 2, characterised in that the fault handling program, when executed by the processor, further implements, after the replacement step, the following steps:
    when a faulty object storage device group receives a write request for object data, redirecting the write request to the standby object storage device group and executing the write request with the standby object storage device group;
    determining, in real time or periodically, or on receipt of an incremental data recovery request, whether the standby object storage devices of the standby object storage device group store object data;
    when the standby object storage devices of the standby object storage device group store object data, determining whether a faulty object storage device group exists;
    when no faulty object storage device group exists, migrating the object data stored in the standby object storage device group to one or more primary object storage devices;
    when a faulty object storage device group exists, searching for primary object storage devices not belonging to the faulty object storage device group, and when found, migrating the object data stored in the standby object storage device group to one or more of the found primary object storage devices.
  4. The electronic apparatus of claim 1, characterised in that the step of replacing the faulty primary object storage device with the new primary object storage device comprises:
    releasing the preset mapping between the device identification information of the faulty primary object storage device and the location information of the faulty primary object storage device, assigning the device identification information of the faulty primary object storage device to the new primary object storage device as the device identification information of the new primary object storage device, and re-establishing and saving the mapping between the device identification information of the new primary object storage device and the location information of the new primary object storage device.
  5. The electronic apparatus of claim 2, characterised in that the step of replacing the faulty primary object storage device with the new primary object storage device comprises:
    releasing the preset mapping between the device identification information of the faulty primary object storage device and the location information of the faulty primary object storage device, assigning the device identification information of the faulty primary object storage device to the new primary object storage device as the device identification information of the new primary object storage device, and re-establishing and saving the mapping between the device identification information of the new primary object storage device and the location information of the new primary object storage device.
  6. The electronic apparatus of claim 3, characterised in that the step of replacing the faulty primary object storage device with the new primary object storage device comprises:
    releasing the preset mapping between the device identification information of the faulty primary object storage device and the location information of the faulty primary object storage device, assigning the device identification information of the faulty primary object storage device to the new primary object storage device as the device identification information of the new primary object storage device, and re-establishing and saving the mapping between the device identification information of the new primary object storage device and the location information of the new primary object storage device.
  7. A fault handling method applicable to an electronic apparatus, characterised in that the electronic apparatus is communicatively connected to a plurality of primary object storage devices and at least one standby object storage device group, the standby object storage device group comprising several standby object storage devices, the primary object storage devices being configured to store object data, a first preset number of replicas of each piece of object data being stored in a corresponding first preset number of primary object storage devices, the method comprising the steps of:
    a detection step: detecting in real time or periodically whether each primary object storage device has failed;
    a determination step: when a faulty primary object storage device is detected, determining, from the predetermined mapping between object data and placement groups, the placement group corresponding to each piece of object data stored in the faulty primary object storage device, and treating each determined placement group as a faulty placement group;
    a degradation step: reducing the configured replica count of all object data corresponding to all the faulty placement groups from the first preset number to a second preset number;
    a replacement step: selecting a standby object storage device from the standby object storage device group as a new primary object storage device, replacing the faulty primary object storage device with the new primary object storage device, and increasing the configured replica count of all object data corresponding to all the faulty placement groups from the second preset number back to the first preset number.
  8. The fault handling method of claim 7, characterised in that, after the replacement step, the method further comprises:
    taking, according to the predetermined mapping between placement groups and primary object storage devices, the first preset number of primary object storage devices corresponding to each faulty placement group as a faulty object storage device group, and recovering the data of the new primary object storage device from the non-faulty primary object storage devices in each faulty object storage device group other than the new primary object storage device.
  9. The fault handling method of claim 8, characterised in that, after the replacement step, the method further comprises:
    when a faulty object storage device group receives a write request for object data, redirecting the write request to the standby object storage device group and executing the write request with the standby object storage device group;
    determining, in real time or periodically, or on receipt of an incremental data recovery request, whether the standby object storage devices of the standby object storage device group store object data;
    when the standby object storage devices of the standby object storage device group store object data, determining whether a faulty object storage device group exists;
    when no faulty object storage device group exists, migrating the object data stored in the standby object storage device group to one or more primary object storage devices;
    when a faulty object storage device group exists, searching for primary object storage devices not belonging to the faulty object storage device group, and when found, migrating the object data stored in the standby object storage device group to one or more of the found primary object storage devices.
  10. The fault handling method of claim 7, characterised in that the step of replacing the faulty primary object storage device with the new primary object storage device comprises:
    releasing the preset mapping between the device identification information of the faulty primary object storage device and the location information of the faulty primary object storage device, assigning the device identification information of the faulty primary object storage device to the new primary object storage device as the device identification information of the new primary object storage device, and re-establishing and saving the mapping between the device identification information of the new primary object storage device and the location information of the new primary object storage device.
  11. The fault handling method of claim 8, characterised in that the step of replacing the faulty primary object storage device with the new primary object storage device comprises:
    releasing the preset mapping between the device identification information of the faulty primary object storage device and the location information of the faulty primary object storage device, assigning the device identification information of the faulty primary object storage device to the new primary object storage device as the device identification information of the new primary object storage device, and re-establishing and saving the mapping between the device identification information of the new primary object storage device and the location information of the new primary object storage device.
  12. The fault handling method of claim 9, characterised in that the step of replacing the faulty primary object storage device with the new primary object storage device comprises:
    releasing the preset mapping between the device identification information of the faulty primary object storage device and the location information of the faulty primary object storage device, assigning the device identification information of the faulty primary object storage device to the new primary object storage device as the device identification information of the new primary object storage device, and re-establishing and saving the mapping between the device identification information of the new primary object storage device and the location information of the new primary object storage device.
  13. A distributed storage system, characterised in that the distributed storage system comprises an electronic apparatus, a plurality of primary object storage devices and at least one standby object storage device group, the electronic apparatus being communicatively connected to each primary object storage device and each standby object storage device group, the standby object storage device group comprising several standby object storage devices, the primary object storage devices being configured to store object data, a first preset number of replicas of each piece of object data being stored in a corresponding first preset number of primary object storage devices, the electronic apparatus comprising a memory and a processor, the memory storing a fault handling program which, when executed by the processor, implements the following steps:
    a detection step: detecting in real time or periodically whether each primary object storage device has failed;
    a determination step: when a faulty primary object storage device is detected, determining, from the predetermined mapping between object data and placement groups, the placement group corresponding to each piece of object data stored in the faulty primary object storage device, and treating each determined placement group as a faulty placement group;
    a degradation step: reducing the configured replica count of all object data corresponding to all the faulty placement groups from the first preset number to a second preset number;
    a replacement step: selecting a standby object storage device from the standby object storage device group as a new primary object storage device, replacing the faulty primary object storage device with the new primary object storage device, and increasing the configured replica count of all object data corresponding to all the faulty placement groups from the second preset number back to the first preset number.
  14. The distributed storage system of claim 13, characterised in that the fault handling program, when executed by the processor, further implements, after the replacement step, the following step:
    taking, according to the predetermined mapping between placement groups and primary object storage devices, the first preset number of primary object storage devices corresponding to each faulty placement group as a faulty object storage device group, and recovering the data of the new primary object storage device from the non-faulty primary object storage devices in each faulty object storage device group other than the new primary object storage device.
  15. A computer-readable storage medium, characterised in that the computer-readable storage medium stores a fault handling program executable by at least one processor to cause the at least one processor to perform the steps of:
    a detection step: detecting in real time or periodically whether each primary object storage device has failed;
    a determination step: when a faulty primary object storage device is detected, determining, from the predetermined mapping between object data and placement groups, the placement group corresponding to each piece of object data stored in the faulty primary object storage device, and treating each determined placement group as a faulty placement group;
    a degradation step: reducing the configured replica count of all object data corresponding to all the faulty placement groups from the first preset number to a second preset number;
    a replacement step: selecting a standby object storage device from the standby object storage device group as a new primary object storage device, replacing the faulty primary object storage device with the new primary object storage device, and increasing the configured replica count of all object data corresponding to all the faulty placement groups from the second preset number back to the first preset number.
  16. The computer-readable storage medium of claim 15, characterised in that the fault handling program, when executed by the processor, further implements, after the replacement step, the following step:
    taking, according to the predetermined mapping between placement groups and primary object storage devices, the first preset number of primary object storage devices corresponding to each faulty placement group as a faulty object storage device group, and recovering the data of the new primary object storage device from the non-faulty primary object storage devices in each faulty object storage device group other than the new primary object storage device.
  17. The computer-readable storage medium of claim 16, characterised in that the fault handling program, when executed by the processor, further implements, after the replacement step, the following steps:
    when a faulty object storage device group receives a write request for object data, redirecting the write request to the standby object storage device group and executing the write request with the standby object storage device group;
    determining, in real time or periodically, or on receipt of an incremental data recovery request, whether the standby object storage devices of the standby object storage device group store object data;
    when the standby object storage devices of the standby object storage device group store object data, determining whether a faulty object storage device group exists;
    when no faulty object storage device group exists, migrating the object data stored in the standby object storage device group to one or more primary object storage devices;
    when a faulty object storage device group exists, searching for primary object storage devices not belonging to the faulty object storage device group, and when found, migrating the object data stored in the standby object storage device group to one or more of the found primary object storage devices.
  18. The computer-readable storage medium of claim 15, characterised in that the step of replacing the faulty primary object storage device with the new primary object storage device comprises:
    releasing the preset mapping between the device identification information of the faulty primary object storage device and the location information of the faulty primary object storage device, assigning the device identification information of the faulty primary object storage device to the new primary object storage device as the device identification information of the new primary object storage device, and re-establishing and saving the mapping between the device identification information of the new primary object storage device and the location information of the new primary object storage device.
  19. The computer-readable storage medium of claim 16, characterised in that the step of replacing the faulty primary object storage device with the new primary object storage device comprises:
    releasing the preset mapping between the device identification information of the faulty primary object storage device and the location information of the faulty primary object storage device, assigning the device identification information of the faulty primary object storage device to the new primary object storage device as the device identification information of the new primary object storage device, and re-establishing and saving the mapping between the device identification information of the new primary object storage device and the location information of the new primary object storage device.
  20. The computer-readable storage medium of claim 17, characterised in that the step of replacing the faulty primary object storage device with the new primary object storage device comprises:
    releasing the preset mapping between the device identification information of the faulty primary object storage device and the location information of the faulty primary object storage device, assigning the device identification information of the faulty primary object storage device to the new primary object storage device as the device identification information of the new primary object storage device, and re-establishing and saving the mapping between the device identification information of the new primary object storage device and the location information of the new primary object storage device.
PCT/CN2019/088634 2018-11-28 2019-05-27 Fault handling method and apparatus, distributed storage system, and storage medium WO2020107829A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811433003.9A CN109614276B (zh) 2018-11-28 2018-11-28 Fault handling method and apparatus, distributed storage system, and storage medium
CN201811433003.9 2018-11-28

Publications (1)

Publication Number Publication Date
WO2020107829A1 true WO2020107829A1 (zh) 2020-06-04

Family

ID=66006290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088634 WO2020107829A1 (zh) 2018-11-28 2019-05-27 故障处理方法、装置、分布式存储系统和存储介质

Country Status (2)

Country Link
CN (1) CN109614276B (zh)
WO (1) WO2020107829A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111966291A (zh) * 2020-08-14 2020-11-20 苏州浪潮智能科技有限公司 A data storage method and system in a storage cluster, and related apparatus
CN112395263A (zh) * 2020-11-26 2021-02-23 新华三大数据技术有限公司 An OSD data recovery method and apparatus
WO2022028033A1 (zh) * 2020-08-01 2022-02-10 广西大学 An automatically balanced storage method for a Ceph storage system based on hierarchical mapping

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614276B (zh) 2018-11-28 2021-09-21 平安科技(深圳)有限公司 Fault handling method and apparatus, distributed storage system, and storage medium
CN111190775A (zh) * 2019-12-30 2020-05-22 浪潮电子信息产业股份有限公司 An OSD replacement method, system, device and computer-readable storage medium
CN111752483B (zh) * 2020-05-28 2022-07-22 苏州浪潮智能科技有限公司 A method and system for reducing reconstructed data upon storage-medium change in a storage cluster
CN112162699B (zh) * 2020-09-18 2023-12-22 北京浪潮数据技术有限公司 A data read/write method, apparatus, device and computer-readable storage medium
CN113126925B (zh) * 2021-04-21 2022-08-02 山东英信计算机技术有限公司 A member list determination method, apparatus, device and readable storage medium
CN114510379B (zh) * 2022-04-21 2022-11-01 山东百盟信息技术有限公司 A distributed-array video data storage apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729185A (zh) * 2017-10-26 2018-02-23 新华三技术有限公司 A fault handling method and apparatus
CN108235751A (zh) * 2017-12-18 2018-06-29 华为技术有限公司 Method, apparatus and data storage system for identifying sub-health of object storage devices
CN109614276A (zh) * 2018-11-28 2019-04-12 平安科技(深圳)有限公司 Fault handling method and apparatus, distributed storage system, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2725491B1 (en) * 2012-10-26 2019-01-02 Western Digital Technologies, Inc. A distributed object storage system comprising performance optimizations
CN106250055A (zh) * 2016-07-12 2016-12-21 乐视控股(北京)有限公司 A data storage method and system
CN108121510A (zh) * 2017-12-19 2018-06-05 紫光华山信息技术有限公司 OSD selection method, data writing method, apparatus and storage system
CN108287669B (zh) * 2018-01-26 2019-11-12 平安科技(深圳)有限公司 Data storage method, apparatus and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729185A (zh) * 2017-10-26 2018-02-23 新华三技术有限公司 A fault handling method and apparatus
CN108235751A (zh) * 2017-12-18 2018-06-29 华为技术有限公司 Method, apparatus and data storage system for identifying sub-health of object storage devices
CN109614276A (zh) * 2018-11-28 2019-04-12 平安科技(深圳)有限公司 Fault handling method and apparatus, distributed storage system, and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022028033A1 (zh) * 2020-08-01 2022-02-10 广西大学 An automatically balanced storage method for a Ceph storage system based on hierarchical mapping
CN111966291A (zh) * 2020-08-14 2020-11-20 苏州浪潮智能科技有限公司 A data storage method and system in a storage cluster, and related apparatus
CN112395263A (zh) * 2020-11-26 2021-02-23 新华三大数据技术有限公司 An OSD data recovery method and apparatus
CN112395263B (zh) * 2020-11-26 2022-08-19 新华三大数据技术有限公司 An OSD data recovery method and apparatus

Also Published As

Publication number Publication date
CN109614276A (zh) 2019-04-12
CN109614276B (zh) 2021-09-21

Similar Documents

Publication Publication Date Title
WO2020107829A1 (zh) Fault handling method and apparatus, distributed storage system, and storage medium
US10261853B1 (en) Dynamic replication error retry and recovery
CN109656895B (zh) Distributed storage system, data writing method, apparatus and storage medium
CN109656896B (zh) Fault repair method and apparatus, distributed storage system and storage medium
US10459814B2 (en) Drive extent based end of life detection and proactive copying in a mapped RAID (redundant array of independent disks) data storage system
US9886736B2 (en) Selectively killing trapped multi-process service clients sharing the same hardware context
CN109669822B (zh) Electronic apparatus, method for creating a standby storage pool, and computer-readable storage medium
EP3311272B1 (en) A method of live migration
US10430336B2 (en) Lock-free raid implementation in multi-queue architecture
US10445295B1 (en) Task-based framework for synchronization of event handling between nodes in an active/active data storage system
CN103516736A (zh) Data recovery method and apparatus for a distributed cache system
US10223205B2 (en) Disaster recovery data sync
CN116107516B (zh) 数据写入方法、装置、固态硬盘、电子设备及存储介质
US10176035B2 (en) System, information processing device, and non-transitory medium for storing program for migration of virtual machine
CN115167782B (zh) Temporary storage replica management method, system, device and storage medium
CN113051104A (zh) Erasure-code-based inter-disk data recovery method and related apparatus
US20230251931A1 (en) System and device for data recovery for ephemeral storage
US9195528B1 (en) Systems and methods for managing failover clusters
EP3696658A1 (en) Log management method, server and database system
US20130246710A1 (en) Storage system and data management method
CN106776142B (zh) A data storage method and data storage apparatus
WO2021043246A1 (zh) Data reading method and apparatus
US11074003B2 (en) Storage system and restoration method
US10656867B2 (en) Computer system, data management method, and data management program
JP2513060B2 (ja) Fault-recovery computer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19888711

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19888711

Country of ref document: EP

Kind code of ref document: A1