WO2019178891A1 - Method and system for handling a device failure - Google Patents

Method and system for handling a device failure

Info

Publication number
WO2019178891A1
WO2019178891A1 · PCT/CN2018/081562 · CN2018081562W
Authority
WO
WIPO (PCT)
Prior art keywords
target
node
control device
shared storage
standby
Prior art date
Application number
PCT/CN2018/081562
Other languages
English (en)
French (fr)
Inventor
范家星
过晓春
Original Assignee
网宿科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 网宿科技股份有限公司 filed Critical 网宿科技股份有限公司
Priority to EP18893326.1A priority Critical patent/EP3570169B1/en
Priority to US16/340,241 priority patent/US20210326224A1/en
Publication of WO2019178891A1 publication Critical patent/WO2019178891A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094Redundant storage or storage space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1479Generic software techniques for error detection or fault masking
    • G06F11/1489Generic software techniques for error detection or fault masking through recovery blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/2028Failover techniques eliminating a faulty processor or activating a spare
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/2025Failover techniques using centralised failover control functionality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2046Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share persistent storage
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2089Redundant storage control functionality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2089Redundant storage control functionality
    • G06F11/2092Techniques of failing over between control units
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0617Improving the reliability of storage systems in relation to availability
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0634Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • the present invention relates to the field of data storage technologies, and in particular, to a method and system for processing device failure.
  • Service providers generally use distributed storage systems to store data, which can be distributed across multiple storage servers (called storage nodes) in a storage cluster.
  • When a distributed storage system provides storage services, it can create multiple copies of each piece of data and store them on multiple storage nodes. If a storage node fails and can no longer provide data storage services, the cluster management node of the distributed storage system may first determine which data the faulty node stored, search for the storage nodes holding the corresponding data copies, select multiple target storage nodes, and then instruct the storage nodes holding those copies to use them to recover the data onto the target storage nodes.
  • However, the storage nodes performing this data recovery must allocate a large amount of device processing resources to the recovery processing, leaving insufficient processing resources for providing data storage services, so the storage quality of the distributed storage system suffers.
  • In one aspect, a method of processing a device failure is provided. The method is applied to a distributed storage system comprising a cluster management node and a plurality of storage nodes, each storage node comprising a shared storage device, and each shared storage device being associated with one control device and one standby device. The method includes:
  • the target standby device associated with the target shared storage device detects the operating state of the target control device that manages the target shared storage device;
  • if the target control device fails, the target standby device sends a management request to the target shared storage device, and sends a replacement request for the target control device to the cluster management node;
  • the target shared storage device sets the target standby device as a local management device
  • the cluster management node determines that the target standby device is a replacement device of the target control device.
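The four steps above can be sketched as a minimal in-memory protocol. All class names, method names, and metadata fields in the following sketch are illustrative assumptions made for exposition; the disclosure does not specify any particular implementation:

```python
class SharedStorageDevice:
    """Shared storage whose owner metadata names its current local manager."""

    def __init__(self, owner_metadata):
        self.owner_metadata = owner_metadata  # e.g. {"id": "A2001", "ip": "1.1.1.106"}

    def set_local_manager(self, metadata):
        # Step 3: replace the owner information with the standby's metadata.
        self.owner_metadata = metadata


class ClusterManagementNode:
    """Maps each storage-node identifier to its current device metadata."""

    def __init__(self, node_table):
        self.node_table = node_table

    def replace_device(self, node_id, standby_metadata):
        # Step 4: record the standby device as the node's replacement device.
        self.node_table[node_id] = standby_metadata


class StandbyDevice:
    def __init__(self, metadata, node_id, storage, cluster):
        self.metadata, self.node_id = metadata, node_id
        self.storage, self.cluster = storage, cluster

    def on_control_device_failure(self):
        # Step 2: send the management request to the shared storage device
        # and the replacement request to the cluster management node.
        self.storage.set_local_manager(self.metadata)
        self.cluster.replace_device(self.node_id, self.metadata)
```

A short usage run: constructing a standby for node `node-1` and invoking `on_control_device_failure()` rewrites both the storage device's owner metadata and the cluster node's table entry, so subsequent requests are routed to the standby.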
  • the management request carries metadata information of the target standby device
  • the target shared storage device sets the target backup device as a local management device, including:
  • the target shared storage device determines the target standby device to be the local management device of the shared storage device by modifying the owner information of the target shared storage device to the metadata information of the target standby device.
  • the replacement request carries the node identifier of the storage node to which the target control device belongs and the metadata information of the target standby device;
  • the cluster management node determines that the target standby device is a replacement device of the target control device, and includes:
  • the cluster management node determines the target standby device to be the replacement device of the target control device by modifying the metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device.
  • each shared storage device is also associated with at least one idle device
  • the method further includes:
  • when determining that the target standby device is the replacement device of the target control device, the cluster management node randomly determines a target idle device among the at least one idle device associated with the target shared storage device, so that the target idle device detects the operating state of the target standby device.
  • after the cluster management node determines that the target standby device is the replacement device of the target control device, the method further includes:
  • the cluster management node updates the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in the node information list to the metadata information of the target standby device, and pushes the updated node information list to all storage nodes in the cluster.
  • In another aspect, a system for processing a device failure is provided. The system is a distributed storage system comprising a cluster management node and a plurality of storage nodes, each storage node comprising a shared storage device, and each shared storage device being associated with one control device and one standby device, where:
  • a target standby device configured to detect an operating state of a target control device that manages the target shared storage device, where the target standby device is associated with the target shared storage device;
  • the target backup device is further configured to: if the target control device fails, send a management request to the target shared storage device, and send a replacement request for the target control device to the cluster management node;
  • the target shared storage device is configured to set the target standby device as a local management device
  • the cluster management node is configured to determine that the target standby device is a replacement device of the target control device.
  • the management request carries metadata information of the target standby device
  • the target shared storage device is further configured to determine the target standby device to be the local management device of the shared storage device by modifying the owner information of the target shared storage device to the metadata information of the target standby device.
  • the replacement request carries the node identifier of the storage node to which the target control device belongs and the metadata information of the target standby device;
  • the cluster management node is further configured to: modify the metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device, and thereby determine the target standby device to be the replacement device of the target control device.
  • each shared storage device is also associated with at least one idle device
  • the cluster management node is further configured to: when determining that the target standby device is the replacement device of the target control device, randomly determine a target idle device among the at least one idle device associated with the target shared storage device, so that the target idle device detects the operating state of the target standby device;
  • the cluster management node is further configured to:
  • the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in the node information list is updated to the metadata information of the target standby device, and the updated node information list is pushed to all storage nodes.
  • the target standby device associated with the target shared storage device detects the operating state of the target control device that manages the target shared storage device; if the target control device fails, the target standby device sends a management request to the target shared storage device and sends a replacement request for the target control device to the cluster management node; the target shared storage device sets the target standby device as its local management device; and the cluster management node determines that the target standby device is the replacement device of the target control device.
  • the storage node can store the data in the shared storage device.
  • After the control device fails, the standby device associated with the shared storage device can serve as the replacement device of the control device and continue to provide services in its place. No device processing resources need to be consumed for data recovery processing, so the storage service quality of the distributed storage system can be guaranteed to some extent.
  • FIG. 1 is a schematic structural diagram of a system for processing a device fault according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a system for processing a device fault according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a method for processing a device fault according to an embodiment of the present invention.
  • the embodiment of the present invention provides a method for processing a device fault.
  • the execution entity of the method is a distributed storage system, which can be deployed in a service room of a service provider.
  • the distributed storage system includes a storage cluster composed of a cluster management node and a plurality of storage nodes; each storage node includes a shared storage device, and each shared storage device is associated, through wired or wireless communication, with one control device and one standby device. The cluster management node can manage the storage nodes in the storage cluster, such as adding and deleting storage nodes and detecting the service status of each storage node; the control device of a storage node can provide data storage and reading services and can store data in the shared storage device associated with it; after the control device fails, the standby device can serve as the replacement device of the control device and provide the data storage and reading services in its place.
  • a single control device can utilize its own device processing resources to construct a plurality of virtual control devices, each of which can provide data storage and reading services, such that each virtual control device can act as the control device of a storage node and be associated with a different shared storage device.
  • Step 301 The target standby device associated with the target shared storage device detects an operating state of the target control device of the management target shared storage device.
  • the operating state of the target control device can be divided into a normal state and a fault state.
  • when the target control device is in the normal state, it can receive data storage or read requests from external devices and store or read the corresponding data in the target shared storage device it manages.
  • the target standby device associated with the target shared storage device may detect the operating state of the target control device by periodically sending a heartbeat query request to it. If the target standby device receives a heartbeat response from the target control device within a preset time, it can record the target control device as being in the normal state; otherwise it records the target control device as being in the fault state.
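The heartbeat-based detection described above could look like the following sketch. The function name, the probe count, and the retry interval are illustrative assumptions; the disclosure only specifies a periodic heartbeat query and a preset response time, which is assumed here to be enforced inside `heartbeat_ok`:

```python
import time

def detect_operating_state(heartbeat_ok, interval=1.0, probes=3):
    """Return "normal" or "fault" for the target control device.

    heartbeat_ok() sends one heartbeat query request and returns True only
    when a heartbeat response arrives within the preset time.
    """
    for attempt in range(probes):
        if heartbeat_ok():
            return "normal"            # response received within the preset time
        if attempt < probes - 1:
            time.sleep(interval)       # wait before the next periodic query
    return "fault"                     # no response: record the fault state
```

For instance, a probe callable that never receives a response yields `"fault"`, while one that answers immediately yields `"normal"`.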
  • Step 302 If the target control device fails, the target standby device sends a management request to the target shared storage device, and sends a replacement request for the target control device to the cluster management node.
  • when the target control device suffers a failure such as downtime or hardware damage, the target standby device detects that the target control device is in the fault state. The target standby device may then send a management request to the target shared storage device in order to take over the target control device's management function for the target shared storage device, and at the same time send a replacement request for the target control device to the cluster management node, so that, as the replacement device of the target control device, it continues to provide the data storage and reading services in its place.
  • Step 303 The target shared storage device sets the target standby device as a local management device.
  • after receiving the management request, the target shared storage device may revoke the management right of the target control device and, based on the management request, set the target standby device as the local management device of the target shared storage device.
  • the specific processing procedure of step 303 may be as follows: the target shared storage device determines that the target standby device is a local management device of the shared storage device by modifying the owner information of the shared storage device to the metadata information of the target standby device.
  • the management request sent by the target standby device to the target shared storage device carries the metadata information of the target standby device.
  • the local management device of the target shared storage device is determined by the owner information recorded on the target shared storage device; therefore, the local management device of the target shared storage device can be replaced by modifying that recorded owner information.
  • the above-mentioned owner information may be metadata information of a device, such as a device identifier and a communication address, where the device identifier may be a unique identifier of the device itself, for example A2001, and the communication address may be an IP (Internet Protocol) address, for example 1.1.1.106.
  • the target standby device may send a management request carrying the metadata information of the target standby device to the target shared storage device.
  • the target shared storage device can receive the management request and obtain the metadata information of the target standby device from it. Thereafter, the target shared storage device may modify the locally recorded owner information to the metadata information of the target standby device, thereby determining the target standby device to be the local management device of the shared storage device.
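The owner-information takeover just described can be sketched as follows. The class name, the request fields `id` and `ip`, and the example values (reusing the A2001 / 1.1.1.106 examples from the text) are assumptions for illustration:

```python
class TargetSharedStorage:
    """Shared storage that records owner information naming its local manager."""

    def __init__(self, owner):
        self.owner = owner  # metadata of the current local management device

    def handle_management_request(self, request):
        # The management request carries the standby device's metadata:
        # a device identifier and a communication address.
        self.owner = {"id": request["id"], "ip": request["ip"]}
        return self.owner
```

After `handle_management_request({"id": "A2001", "ip": "1.1.1.106"})`, the locally recorded owner information names the target standby device, which is thereby determined to be the local management device.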
  • Step 304 The cluster management node determines that the target standby device is a replacement device of the target control device.
  • after receiving the replacement request, the cluster management node may determine that the target standby device is the replacement device of the target control device. Further, the cluster management node may first send a heartbeat query request to the target control device to detect its operating state: if the target control device is detected to be in the fault state, the cluster management node executes the replacement request and determines the target standby device to be the replacement device of the target control device; otherwise it rejects the replacement request. In this way, the cluster management node can identify erroneous replacement requests and ensure the normal operation of the target control device.
  • the specific processing procedure of step 304 may be as follows: the cluster management node determines the target standby device to be the replacement device of the target control device by modifying the metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device.
  • the replacement request for the target control device sent by the target standby device to the cluster management node carries the node identifier of the storage node to which the target control device belongs and the metadata information of the target standby device.
  • the target standby device can obtain the node identifier of the storage node to which the target control device belongs, generate a replacement request carrying that node identifier and the metadata information of the target standby device, and send the replacement request to the cluster management node.
  • after receiving the replacement request, the cluster management node may modify the metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device, whereby the cluster management node determines the target standby device to be the replacement device of the target control device.
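Combining step 304 with the verification described above (re-probing the control device before executing the request), the cluster-side handling could be sketched like this. The function name and the shape of `node_table` and `probe` are assumptions:

```python
def handle_replacement_request(node_table, probe, node_id, standby_metadata):
    """Execute a replacement request only if the control device is truly down.

    node_table maps node identifier -> metadata of the node's current device.
    probe(node_id) returns True when the control device still answers a
    heartbeat query request, in which case the request is an erroneous one.
    """
    if probe(node_id):
        return False  # control device still healthy: reject the request
    # Modify the metadata recorded for the node to the standby's metadata,
    # making the standby the replacement device of the control device.
    node_table[node_id] = dict(standby_metadata)
    return True
```

A request against a node whose control device still responds is rejected and the table is left untouched; one against a silent control device rewrites the node's entry.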
  • when a standby device takes over management of a shared storage device, the cluster management node may also determine an idle device to act as the backup of that standby device. The corresponding processing may be as follows: when determining that the target standby device is the replacement device of the target control device, the cluster management node randomly determines a target idle device among the at least one idle device associated with the target shared storage device, so that the target idle device detects the operating state of the target standby device.
  • each shared storage device can also be associated with at least one idle device.
  • the cluster management node may randomly determine one of the at least one idle device associated with the target shared storage device as the target idle device, which then serves as the standby of the target standby device and detects its operating state. If the target standby device fails, the subsequent processing of the target idle device follows the processing of the target standby device described above, and is not repeated here.
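The random selection of a target idle device can be sketched as below; the function name and the injectable `rng` parameter (useful for deterministic tests) are assumptions:

```python
import random

def pick_target_idle_device(idle_devices, rng=random):
    """Randomly determine one associated idle device to watch the new standby."""
    if not idle_devices:
        raise ValueError("shared storage device has no associated idle device")
    return rng.choice(idle_devices)  # uniform random choice among candidates
```

The chosen device then runs the same heartbeat detection against the target standby device that the standby previously ran against the control device.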
  • the cluster management node may respond only to replacement requests received within a preset time. The corresponding processing may be as follows: the cluster management node detects the operating states of all control devices; if a replacement request is received within the preset time after the target control device is detected to be in the fault state, the cluster management node determines the target standby device to be the replacement device of the target control device; otherwise it reselects, among the at least one idle device associated with the target shared storage device, an idle device as the target standby device.
  • the cluster management node may periodically send a heartbeat query request to the control device of the storage node to detect the service state of the storage node.
  • after detecting that the target control device is in the fault state, the cluster management node may start timing. If the cluster management node receives the replacement request for the target control device within a preset time, for example 2 seconds, it executes the replacement request and determines the target standby device to be the replacement device of the target control device; if it does not receive the replacement request within the preset time, the cluster management node may reselect an idle device as the target standby device among the at least one idle device associated with the target shared storage device.
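The timing window described above (the 2-second value comes from the text's own example) can be sketched as follows. The class name and the injectable `clock` parameter are assumptions made so the behavior can be exercised deterministically:

```python
import time

class ReplacementWindow:
    """Accept a replacement request only within a preset time after a fault."""

    def __init__(self, preset_seconds=2.0, clock=time.monotonic):
        self.preset = preset_seconds
        self.clock = clock
        self.fault_at = None

    def fault_detected(self):
        self.fault_at = self.clock()   # start timing at fault detection

    def accept(self):
        # True while a replacement request arrives inside the window; after
        # the window closes, the cluster management node instead reselects an
        # idle device as the target standby device.
        return (self.fault_at is not None
                and self.clock() - self.fault_at <= self.preset)
```

With a fake clock, a request 1.5 s after the fault is accepted and one 3 s after is not.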
  • the newly selected target standby device may send a management request to the target shared storage device according to the above process, and send a replacement request for the target control device to the cluster management node.
  • for the subsequent processing, reference may be made to the foregoing processing; details are not described herein again.
  • the cluster management node may further update and push the node information list. The corresponding processing may be as follows: the cluster management node updates the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in the node information list to the metadata information of the target standby device, and pushes the updated node information list to all storage nodes.
  • the cluster management node may maintain a node information list that records the node identifier, metadata information, and service status of every storage node in the storage cluster. The node identifier is an identifier that can uniquely determine a storage node;
  • the metadata information can provide an access address for data storage and reading on the control device, such as an IP address; and
  • the service status can be the operating state of the control device.
  • the cluster management node may push the locally maintained node information list to all storage nodes, so that each storage node obtains the current metadata information of every storage node through the node information list, and can then use that metadata information to store and read data with each storage node.
  • the cluster management node may update the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in the node information list to the metadata information of the target standby device.
  • the cluster management node can push the updated node information list to all the storage nodes, so that all the storage nodes can obtain the updated metadata information in time.
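The update-and-push processing can be sketched as below. The function name, the list's field names (`metadata`, `status`), and modeling the push as handing each storage node a shallow copy of the list are all illustrative assumptions:

```python
def update_and_push(node_info_list, storage_nodes, node_id, standby_metadata):
    """Update the node information list and push the result to every node.

    node_info_list maps node identifier -> {"metadata": ..., "status": ...}.
    storage_nodes are the nodes that each receive a copy of the updated list.
    """
    # Replace the failed node's metadata with the target standby's metadata.
    node_info_list[node_id]["metadata"] = dict(standby_metadata)
    # Push: every storage node gets a copy of the updated list so it can
    # address the replacement device in time.
    return {node: dict(node_info_list) for node in storage_nodes}
```

After the call, every storage node's copy resolves the failed node's identifier to the standby device's access address.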
  • the target standby device associated with the target shared storage device detects the operating state of the target control device that manages the target shared storage device; if the target control device fails, the target standby device sends a management request to the target shared storage device and sends a replacement request for the target control device to the cluster management node; the target shared storage device sets the target standby device as its local management device; and the cluster management node determines that the target standby device is the replacement device of the target control device.
  • the storage node can store the data in the shared storage device.
  • After the control device fails, the standby device associated with the shared storage device can serve as the replacement device of the control device and continue to provide services in its place. No device processing resources need to be consumed for data recovery processing, so the storage service quality of the distributed storage system can be guaranteed to some extent.
  • the embodiment of the present invention further provides a system for processing device failure.
  • the system is a distributed storage system that includes a cluster management node and a plurality of storage nodes, each of which includes a shared storage device, each shared storage device being associated with one control device and one standby device, wherein:
  • a target standby device configured to detect an operating state of a target control device that manages the target shared storage device, where the target standby device is associated with the target shared storage device;
  • the target backup device is further configured to: if the target control device fails, send a management request to the target shared storage device, and send a replacement request for the target control device to the cluster management node;
  • the target shared storage device is configured to set the target standby device as a local management device;
  • the cluster management node is configured to determine that the target standby device is a replacement device of the target control device.
  • the management request carries metadata information of the target standby device;
  • the target shared storage device is further configured to determine the target standby device to be the local management device of the shared storage device by modifying the owner information of the target shared storage device to the metadata information of the target standby device.
  • the replacement request carries the node identifier of the storage node to which the target control device belongs and the metadata information of the target standby device;
  • the cluster management node is further configured to: modify the metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device, and determine the target standby device to be the replacement device of the target control device.
  • each shared storage device is also associated with at least one idle device
  • the cluster management node is further configured to: when determining the target standby device to be the replacement device of the target control device, randomly determine a target idle device among the at least one idle device associated with the target shared storage device, so that the target idle device detects the running state of the target standby device.
  • the cluster management node is further configured to: update the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in the node information list to the metadata information of the target standby device, and push the updated node information list to all storage nodes inside the cluster.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Hardware Redundancy (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Retry When Errors Occur (AREA)

Abstract

The present invention discloses a method and system for processing a device failure, belonging to the technical field of data storage. The method includes: a target standby device associated with a target shared storage device detects the running state of a target control device that manages the target shared storage device; if the target control device fails, the target standby device sends a management request to the target shared storage device and sends a replacement request for the target control device to the cluster management node; the target shared storage device sets the target standby device as its local management device; and the cluster management node determines the target standby device to be the replacement device of the target control device. With the present invention, the storage service quality of a distributed storage system can be guaranteed.

Description

Method and System for Processing a Device Failure

Technical Field
The present invention relates to the technical field of data storage, and in particular, to a method and system for processing a device failure.
Background
Today's network services are increasingly numerous and feature-rich, and they generate massive amounts of data. Service providers generally use distributed storage systems to store this data, which can be stored dispersedly across multiple storage servers (which may be called storage nodes) of a storage cluster.
When providing storage services, a distributed storage system can create multiple replicas of each piece of data and store these replicas in multiple storage nodes. If a storage node fails and can no longer provide data storage services, the cluster management node of the distributed storage system can first determine the data stored on the failed node, then find the multiple storage nodes storing the corresponding replicas, select multiple target storage nodes, and then instruct the storage nodes holding the replicas to use the replicas to restore the data onto the multiple target storage nodes.
In implementing the present invention, the inventors found that the prior art has at least the following problem:
The multiple storage nodes used for the above data recovery need to allocate a large amount of device processing resources to the recovery, leaving insufficient device processing resources to provide data storage services; consequently, the storage service quality of the distributed storage system is poor.
Summary
To solve the problems of the prior art, embodiments of the present invention provide a method and system for processing a device failure. The technical solutions are as follows:
In one aspect, a method for processing a device failure is provided. The method is applied to a distributed storage system, the distributed storage system includes a cluster management node and multiple storage nodes, each storage node includes a shared storage device, and each shared storage device is associated with one control device and one standby device. The method includes:
a target standby device associated with a target shared storage device detects the running state of a target control device that manages the target shared storage device;
if the target control device fails, the target standby device sends a management request to the target shared storage device and sends a replacement request for the target control device to the cluster management node;
the target shared storage device sets the target standby device as its local management device;
the cluster management node determines the target standby device to be the replacement device of the target control device.
Further, the management request carries the metadata information of the target standby device;
the target shared storage device setting the target standby device as its local management device includes:
the target shared storage device determines the target standby device to be the local management device of the shared storage device by modifying the owner information of the target shared storage device to the metadata information of the target standby device.
Further, the replacement request carries the node identifier of the storage node to which the target control device belongs and the metadata information of the target standby device;
the cluster management node determining the target standby device to be the replacement device of the target control device includes:
the cluster management node determines the target standby device to be the replacement device of the target control device by modifying the metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device.
Further, each shared storage device is also associated with at least one idle device;
the method further includes:
when the target standby device is determined to be the replacement device of the target control device, the cluster management node randomly determines a target idle device among the at least one idle device associated with the target shared storage device, so that the target idle device detects the running state of the target standby device.
Further, after the cluster management node determines the target standby device to be the replacement device of the target control device, the method further includes:
the cluster management node updates the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in a node information list to the metadata information of the target standby device, and pushes the updated node information list to all storage nodes inside the cluster.
In another aspect, a system for processing a device failure is provided. The system is a distributed storage system, the distributed storage system includes a cluster management node and multiple storage nodes, each storage node includes a shared storage device, and each shared storage device is associated with one control device and one standby device, wherein:
the target standby device is configured to detect the running state of a target control device that manages a target shared storage device, wherein the target standby device is associated with the target shared storage device;
the target standby device is further configured to, if the target control device fails, send a management request to the target shared storage device and send a replacement request for the target control device to the cluster management node;
the target shared storage device is configured to set the target standby device as its local management device;
the cluster management node is configured to determine the target standby device to be the replacement device of the target control device.
Further, the management request carries the metadata information of the target standby device;
the target shared storage device is further configured to determine the target standby device to be the local management device of the shared storage device by modifying the owner information of the target shared storage device to the metadata information of the target standby device.
Further, the replacement request carries the node identifier of the storage node to which the target control device belongs and the metadata information of the target standby device;
the cluster management node is further configured to determine the target standby device to be the replacement device of the target control device by modifying the metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device.
Further, each shared storage device is also associated with at least one idle device;
the cluster management node is further configured to:
when the target standby device is determined to be the replacement device of the target control device, randomly determine a target idle device among the at least one idle device associated with the target shared storage device, so that the target idle device detects the running state of the target standby device.
Further, the cluster management node is further configured to:
update the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in the node information list to the metadata information of the target standby device, and push the updated node information list to all storage nodes.
The beneficial effects brought by the technical solutions provided in the embodiments of the present invention are as follows:
In the embodiments of the present invention, the target standby device associated with the target shared storage device detects the running state of the target control device that manages the target shared storage device; if the target control device fails, the target standby device sends a management request to the target shared storage device and sends a replacement request for the target control device to the cluster management node; the target shared storage device sets the target standby device as its local management device; and the cluster management node determines the target standby device to be the replacement device of the target control device. In this way, a storage node can store data in the shared storage device; when the control device of a storage node fails, the standby device associated with the shared storage device can serve as the replacement device of the control device and continue to provide services in its place, and the other storage nodes do not need to spend device processing resources on data recovery, so the storage service quality of the distributed storage system can be guaranteed to some extent.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Apparently, the accompanying drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a system for processing a device failure according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a system for processing a device failure according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for processing a device failure according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the implementations of the present invention are described in further detail below with reference to the accompanying drawings.
An embodiment of the present invention provides a method for processing a device failure. The method is executed by a distributed storage system, which may be deployed in a machine room of a service provider. As shown in FIG. 1, the distributed storage system includes a storage cluster composed of a cluster management node and multiple storage nodes. Each storage node includes a shared storage device, and each shared storage device can be associated with one control device and one standby device through wired or wireless communication. The cluster management node can manage the storage nodes in the storage cluster, for example, adding or removing storage nodes and detecting the service state of the storage nodes. The control device of a storage node can provide data storage and reading services and can store data in the shared storage device associated with it. After the control device fails, the standby device can serve as the replacement device of the control device and continue to provide data storage and reading services in its place.
In another case, as shown in FIG. 2, a single control device can use its own device processing resources to construct multiple virtual control devices, each of which can provide data storage and reading services. In this way, each virtual control device can serve as the control device of one storage node and be associated with a different shared storage device.
The processing flow for handling a device failure shown in FIG. 3 is described in detail below with reference to specific implementations. The content may be as follows:
Step 301: the target standby device associated with the target shared storage device detects the running state of the target control device that manages the target shared storage device.
The running state of the target control device can be divided into a normal state and a failed state.
In implementation, when the target control device is in the normal state, it can receive data storage or read requests from external devices and store or read the corresponding data in the target shared storage device it manages. The target standby device associated with the target shared storage device can detect the running state of the target control device by periodically sending heartbeat query requests to it: if the target standby device receives a heartbeat reply from the target control device within a preset time, it can record the target control device as being in the normal state; otherwise, it records the target control device as being in the failed state.
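As a minimal illustration of this heartbeat check, the standby device's state tracking might look like the sketch below. The class name, the probe callback, and the 2-second timeout are assumptions for illustration only; the disclosure does not specify them.

```python
import time

class StandbyDevice:
    """Tracks the control device's running state via periodic heartbeat probes."""

    def __init__(self, send_heartbeat, timeout=2.0):
        # send_heartbeat() should return True if the control device replied
        self.send_heartbeat = send_heartbeat
        self.timeout = timeout          # assumed preset time, in seconds
        self.control_state = "normal"

    def check_once(self):
        # Record the control device as failed when no reply arrives in time
        start = time.monotonic()
        replied = self.send_heartbeat()
        on_time = (time.monotonic() - start) <= self.timeout
        self.control_state = "normal" if (replied and on_time) else "failed"
        return self.control_state
```

In a real deployment `check_once` would be driven by a timer and the probe would go over the network; here it is reduced to a callback so the state transition is visible.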
Step 302: if the target control device fails, the target standby device sends a management request to the target shared storage device and sends a replacement request for the target control device to the cluster management node.
In implementation, when the target control device suffers a failure such as a crash or hardware damage, the target standby device can detect that the target control device is in the failed state. The target standby device can then send a management request to the target shared storage device to take over the management of the target shared storage device from the target control device; at the same time, the target standby device can also send a replacement request for the target control device to the cluster management node, so as to serve as the replacement device of the target control device and continue to provide data storage and reading services in its place.
Step 303: the target shared storage device sets the target standby device as its local management device.
In implementation, after receiving the management request sent by the target standby device, the target shared storage device can revoke the management authority of the target control device and, based on the management request, set the target standby device as the local management device of the target shared storage device.
Optionally, the specific processing of step 303 may be as follows: the target shared storage device determines the target standby device to be the local management device of the shared storage device by modifying the owner information of the shared storage device to the metadata information of the target standby device.
The management request sent by the target standby device to the target shared storage device carries the metadata information of the target standby device.
In implementation, the local management device of the target shared storage device can be decided by the owner information recorded by the target shared storage device; therefore, the local management device of the target shared storage device can be changed by modifying the recorded owner information. The owner information may be metadata information of a device, such as a device identifier and a communication address, where the device identifier may be the unique identification code of the device itself, for example A2001, and the communication address may be an IP (Internet Protocol) address, for example 1.1.1.106. When detecting that the target control device is in the failed state, the target standby device can send the target shared storage device a management request carrying the metadata information of the target standby device. The target shared storage device can then receive the management request and obtain the metadata information of the target standby device from it. Afterwards, the target shared storage device can modify the locally recorded owner information to the metadata information of the target standby device, thereby determining the target standby device to be the local management device of the shared storage device.
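The owner-record mechanism described above can be sketched roughly as follows. The field names `device_id` and `address` mirror the A2001 / 1.1.1.106 example in the description but are otherwise assumed.

```python
class SharedStorageDevice:
    """A shared storage device whose local manager is whatever its owner record says."""

    def __init__(self, owner_metadata):
        # e.g. {"device_id": "A2001", "address": "1.1.1.106"}
        self.owner = dict(owner_metadata)

    def handle_management_request(self, standby_metadata):
        # Overwrite the recorded owner with the standby device's metadata,
        # which makes the standby device the local management device.
        self.owner = dict(standby_metadata)

    def local_manager(self):
        return self.owner
```

Changing the local manager is thus a single metadata rewrite on the shared storage device, with no data movement.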
Step 304: the cluster management node determines the target standby device to be the replacement device of the target control device.
In implementation, after receiving the replacement request for the target control device sent by the target standby device, the cluster management node can determine the target standby device to be the replacement device of the target control device. Further, the cluster management node can also send a heartbeat query request to the target control device to detect its running state. If it detects that the target control device is in the failed state, the cluster management node can execute the replacement request and determine the target standby device to be the replacement device of the target control device; otherwise, it rejects the replacement request. In this way, the cluster management node can identify erroneous replacement requests and ensure the normal operation of the target control device.
Optionally, the specific processing of step 304 may be as follows: the cluster management node determines the target standby device to be the replacement device of the target control device by modifying the metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device.
The replacement request for the target control device sent by the target standby device to the cluster management node carries the node identifier of the storage node to which the target control device belongs and the metadata information of the target standby device.
In implementation, after taking over the target shared storage device, the target standby device can obtain from it the node identifier of the storage node to which the target control device belongs. The target standby device can then generate a replacement request carrying that node identifier and the metadata information of the target standby device, and send the replacement request to the cluster management node. After receiving the replacement request sent by the target standby device, the cluster management node can modify the metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device, whereby the cluster management node can determine the target standby device to be the replacement device of the target control device.
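A rough sketch of the cluster management node's side of this exchange is shown below; the class and method names, and the shape of the metadata, are assumptions for illustration.

```python
class ClusterManagementNode:
    """Maps each storage node's identifier to the metadata of its managing device."""

    def __init__(self, node_metadata):
        # node_metadata: node_id -> metadata dict of the current managing device
        self.node_metadata = {k: dict(v) for k, v in node_metadata.items()}

    def handle_replacement_request(self, node_id, standby_metadata):
        # Rebind the failed control device's node identifier to the standby
        # device's metadata; the standby device thereby becomes the
        # replacement device for that node.
        self.node_metadata[node_id] = dict(standby_metadata)
        return self.node_metadata[node_id]
```

The replacement is a pure metadata rebinding at the cluster management node, which is why no replica data has to be copied.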
Optionally, when a standby device takes over the management of a shared storage device, the cluster management node can also determine an idle device to serve as the standby device of that standby device. The corresponding processing may be as follows: when determining the target standby device to be the replacement device of the target control device, the cluster management node randomly determines a target idle device among the at least one idle device associated with the target shared storage device, so that the target idle device detects the running state of the target standby device.
In implementation, in addition to the control device and the standby device, each shared storage device can also be associated with at least one idle device. In this way, after the cluster management node determines the target standby device to be the replacement device of the target control device, in order to cope with a failure of the target standby device, the cluster management node can randomly determine, among the at least one idle device associated with the target shared storage device, one target idle device as the standby device of the target standby device. The target idle device can detect the running state of the target standby device; if the target standby device fails, the subsequent processing of the target idle device can refer to the above processing of the target standby device, which is not repeated here.
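The random selection of a watchdog from the associated idle devices could be as simple as the following sketch (the function name is assumed):

```python
import random

def pick_idle_watchdog(idle_devices):
    """Randomly choose one associated idle device to monitor the new manager."""
    # The chosen idle device then takes over the standby role: it watches the
    # running state of the device currently managing the shared storage device.
    return random.choice(list(idle_devices))
```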
Optionally, the cluster management node may respond only to replacement requests received within a preset time. The corresponding processing may be as follows: the cluster management node detects the running states of all control devices; if the cluster management node receives a replacement request within a preset time after detecting that the target control device is in the failed state, it determines the target standby device to be the replacement device of the target control device; otherwise, it reselects an idle device as the target standby device among the at least one idle device associated with the target shared storage device.
In implementation, the cluster management node can periodically send heartbeat query requests to the control devices of the storage nodes to detect the service states of the storage nodes. When the cluster management node detects that the target control device is in the failed state, it can start timing. If the cluster management node receives a replacement request for the target control device within a preset time, for example 2 seconds, it executes the replacement request and determines the target standby device to be the replacement device of the target control device. If no replacement request for the target control device is received within the preset time, the cluster management node can reselect an idle device as the target standby device among the at least one idle device associated with the target shared storage device. The newly selected target standby device can then, following the above procedure, send a management request to the target shared storage device and send a replacement request for the target control device to the cluster management node. The subsequent processing can refer to the foregoing processing and is not repeated here.
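The acceptance window can be expressed as a small predicate; the 2-second default follows the example in the text, and the function name is an assumption.

```python
def accept_replacement(failure_time, request_time, window=2.0):
    """Accept a replacement request only inside the preset window after failure."""
    # Outside the window the cluster management node instead reselects an
    # idle device as the new target standby device and waits for its request.
    return 0.0 <= (request_time - failure_time) <= window
```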
Optionally, after the target standby device is determined to be the replacement device of the target control device, the cluster management node can also update a node information list and push it. The corresponding processing may be as follows: the cluster management node updates the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in the node information list to the metadata information of the target standby device, and pushes the updated node information list to all storage nodes.
In implementation, the cluster management node can maintain a node information list that records the node identifiers, metadata information and service states of all storage nodes in the storage cluster. The node identifier is identification information that can uniquely determine a storage node; the metadata information can be an access address, such as an IP address, through which the control device provides data storage and reading; and the service state can be the running state of the control device. The cluster management node can push the locally maintained node information list to all storage nodes, so that the storage nodes obtain the current metadata information of each storage node from the list and can then store and read data with each storage node through that metadata information. Further, after determining the target standby device to be the replacement device of the target control device, the cluster management node can update the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in the node information list to the metadata information of the target standby device, and at the same time push the updated node information list to all storage nodes, so that all storage nodes can obtain the updated metadata information in time.
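The update-then-push behavior of the node information list can be sketched as follows; the entry fields and the push callback are illustrative assumptions.

```python
class NodeInfoList:
    """Cluster-side node information list with push-on-update semantics."""

    def __init__(self, entries, push_fn):
        # entries: node_id -> {"address": ..., "state": ...}; push_fn delivers
        # a snapshot of the list to every storage node in the cluster.
        self.entries = {k: dict(v) for k, v in entries.items()}
        self.push_fn = push_fn

    def replace_device(self, node_id, standby_metadata):
        # Point the failed node's entry at the standby device's metadata, then
        # push the updated list so all nodes learn the new access address.
        self.entries[node_id].update(standby_metadata)
        self.push_fn({k: dict(v) for k, v in self.entries.items()})
```

Pushing the whole list after each change keeps every storage node's view of access addresses current without requiring nodes to poll.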
In the embodiments of the present invention, the target standby device associated with the target shared storage device detects the running state of the target control device that manages the target shared storage device; if the target control device fails, the target standby device sends a management request to the target shared storage device and sends a replacement request for the target control device to the cluster management node; the target shared storage device sets the target standby device as its local management device; and the cluster management node determines the target standby device to be the replacement device of the target control device. In this way, a storage node can store data in the shared storage device; when the control device of a storage node fails, the standby device associated with the shared storage device can serve as the replacement device of the control device and continue to provide services in its place, and the other storage nodes do not need to spend device processing resources on data recovery, so the storage service quality of the distributed storage system can be guaranteed to some extent.
Based on the same technical concept, an embodiment of the present invention further provides a system for processing a device failure. As shown in FIG. 1 or FIG. 2, the system is a distributed storage system, the distributed storage system includes a cluster management node and multiple storage nodes, each storage node includes a shared storage device, and each shared storage device is associated with one control device and one standby device, wherein:
the target standby device is configured to detect the running state of a target control device that manages a target shared storage device, wherein the target standby device is associated with the target shared storage device;
the target standby device is further configured to, if the target control device fails, send a management request to the target shared storage device and send a replacement request for the target control device to the cluster management node;
the target shared storage device is configured to set the target standby device as its local management device;
the cluster management node is configured to determine the target standby device to be the replacement device of the target control device.
Optionally, the management request carries the metadata information of the target standby device;
the target shared storage device is further configured to determine the target standby device to be the local management device of the shared storage device by modifying the owner information of the target shared storage device to the metadata information of the target standby device.
Optionally, the replacement request carries the node identifier of the storage node to which the target control device belongs and the metadata information of the target standby device;
the cluster management node is further configured to determine the target standby device to be the replacement device of the target control device by modifying the metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device.
Optionally, each shared storage device is also associated with at least one idle device;
the cluster management node is further configured to: when the target standby device is determined to be the replacement device of the target control device, randomly determine a target idle device among the at least one idle device associated with the target shared storage device, so that the target idle device detects the running state of the target standby device.
Optionally, the cluster management node is further configured to: update the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in the node information list to the metadata information of the target standby device, and push the updated node information list to all storage nodes inside the cluster.
In the embodiments of the present invention, the target standby device associated with the target shared storage device detects the running state of the target control device that manages the target shared storage device; if the target control device fails, the target standby device sends a management request to the target shared storage device and sends a replacement request for the target control device to the cluster management node; the target shared storage device sets the target standby device as its local management device; and the cluster management node determines the target standby device to be the replacement device of the target control device. In this way, a storage node can store data in the shared storage device; when the control device of a storage node fails, the standby device associated with the shared storage device can serve as the replacement device of the control device and continue to provide services in its place, and the other storage nodes do not need to spend device processing resources on data recovery, so the storage service quality of the distributed storage system can be guaranteed to some extent.
From the above description of the implementations, a person skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general hardware platform, or of course by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or in certain parts of the embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. A method for processing a device failure, wherein the method is applied to a distributed storage system, the distributed storage system comprises a cluster management node and a plurality of storage nodes, each storage node comprises a shared storage device, and each shared storage device is associated with one control device and one standby device, the method comprising:
    a target standby device associated with a target shared storage device detecting a running state of a target control device that manages the target shared storage device;
    if the target control device fails, the target standby device sending a management request to the target shared storage device and sending a replacement request for the target control device to the cluster management node;
    the target shared storage device setting the target standby device as a local management device;
    the cluster management node determining the target standby device to be a replacement device of the target control device.
  2. The method according to claim 1, wherein the management request carries metadata information of the target standby device;
    the target shared storage device setting the target standby device as a local management device comprises:
    the target shared storage device determining the target standby device to be the local management device of the shared storage device by modifying owner information of the target shared storage device to the metadata information of the target standby device.
  3. The method according to claim 1, wherein the replacement request carries a node identifier of a storage node to which the target control device belongs and the metadata information of the target standby device;
    the cluster management node determining the target standby device to be the replacement device of the target control device comprises:
    the cluster management node determining the target standby device to be the replacement device of the target control device by modifying metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device.
  4. The method according to claim 1, wherein each shared storage device is further associated with at least one idle device;
    the method further comprises:
    when the target standby device is determined to be the replacement device of the target control device, the cluster management node randomly determining a target idle device among the at least one idle device associated with the target shared storage device, so that the target idle device detects a running state of the target standby device.
  5. The method according to claim 3, wherein after the cluster management node determines the target standby device to be the replacement device of the target control device, the method further comprises:
    the cluster management node updating the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in a node information list to the metadata information of the target standby device, and pushing the updated node information list to all storage nodes inside the cluster.
  6. A system for processing a device failure, wherein the system is a distributed storage system, the distributed storage system comprises a cluster management node and a plurality of storage nodes, each storage node comprises a shared storage device, and each shared storage device is associated with one control device and one standby device, wherein:
    a target standby device is configured to detect a running state of a target control device that manages a target shared storage device, wherein the target standby device is associated with the target shared storage device;
    the target standby device is further configured to, if the target control device fails, send a management request to the target shared storage device and send a replacement request for the target control device to the cluster management node;
    the target shared storage device is configured to set the target standby device as a local management device;
    the cluster management node is configured to determine the target standby device to be a replacement device of the target control device.
  7. The system according to claim 6, wherein the management request carries metadata information of the target standby device;
    the target shared storage device is further configured to determine the target standby device to be the local management device of the shared storage device by modifying owner information of the target shared storage device to the metadata information of the target standby device.
  8. The system according to claim 6, wherein the replacement request carries a node identifier of a storage node to which the target control device belongs and the metadata information of the target standby device;
    the cluster management node is further configured to determine the target standby device to be the replacement device of the target control device by modifying metadata information corresponding to the node identifier of the storage node to which the target control device belongs to the metadata information of the target standby device.
  9. The system according to claim 6, wherein each shared storage device is further associated with at least one idle device;
    the cluster management node is further configured to:
    when the target standby device is determined to be the replacement device of the target control device, randomly determine a target idle device among the at least one idle device associated with the target shared storage device, so that the target idle device detects a running state of the target standby device.
  10. The system according to claim 8, wherein the cluster management node is further configured to:
    update the metadata information corresponding to the node identifier of the storage node to which the target control device belongs in a node information list to the metadata information of the target standby device, and push the updated node information list to all storage nodes inside the cluster.
PCT/CN2018/081562 2018-03-19 2018-04-02 Method and system for processing device failure WO2019178891A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18893326.1A EP3570169B1 (en) 2018-03-19 2018-04-02 Method and system for processing device failure
US16/340,241 US20210326224A1 (en) 2018-03-19 2018-04-02 Method and system for processing device failure

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810226159.3 2018-03-19
CN201810226159.3A CN108509296B (zh) Method and system for processing device failure

Publications (1)

Publication Number Publication Date
WO2019178891A1 (zh) 2019-09-26

Family

ID=63375944

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/081562 WO2019178891A1 (zh) 2018-03-19 2018-04-02 一种处理设备故障的方法和系统

Country Status (4)

Country Link
US (1) US20210326224A1 (zh)
EP (1) EP3570169B1 (zh)
CN (1) CN108509296B (zh)
WO (1) WO2019178891A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111638995A (zh) * 2020-05-08 2020-09-08 杭州海康威视系统技术有限公司 Metadata backup method, apparatus, device, and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110445871A (zh) * 2019-08-14 2019-11-12 益逻触控系统公司 Operating method of a self-service terminal and self-service terminal
CN112835915B (zh) * 2019-11-25 2023-07-18 中国移动通信集团辽宁有限公司 MPP database system, data storage method and data query method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117171A (zh) * 2015-08-28 2015-12-02 南京国电南自美卓控制系统有限公司 Energy SCADA massive data distributed processing system and method
CN105991325A (zh) * 2015-02-10 2016-10-05 华为技术有限公司 Method, device and system for processing a failure in at least one distributed cluster
US20170262346A1 (en) * 2016-03-09 2017-09-14 Commvault Systems, Inc. Data management and backup of distributed storage environment
US20170371542A1 (en) * 2016-06-25 2017-12-28 International Business Machines Corporation Method and system for reducing memory device input/output operations
CN107547653A (zh) * 2017-09-11 2018-01-05 华北水利水电大学 Distributed file storage system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9607001B2 (en) * 2012-07-13 2017-03-28 Facebook, Inc. Automated failover of a metadata node in a distributed file system
JP6122126B2 (ja) * 2013-08-27 2017-04-26 株式会社東芝 Database system, program, and data processing method
US9483367B1 (en) * 2014-06-27 2016-11-01 Veritas Technologies Llc Data recovery in distributed storage environments



Also Published As

Publication number Publication date
US20210326224A1 (en) 2021-10-21
EP3570169B1 (en) 2021-02-24
EP3570169A4 (en) 2020-03-04
EP3570169A8 (en) 2020-04-29
EP3570169A1 (en) 2019-11-20
CN108509296A (zh) 2018-09-07
CN108509296B (zh) 2021-02-02

Similar Documents

Publication Publication Date Title
CN106911524B (zh) HA implementation method and apparatus
US7539150B2 (en) Node discovery and communications in a network
CN106982236B (zh) Information processing method, apparatus and system
US20210176310A1 (en) Data synchronization method and system
JP5187249B2 (ja) Connection recovery apparatus, method, and processing program for redundant systems
WO2016070375A1 (zh) Distributed storage replication system and method
CN109005045B (zh) Active/standby service system and master node failure recovery method
JP2022502926A (ja) UE migration method, apparatus, system, and storage medium
US20230041089A1 (en) State management methods, methods for switching between master application server and backup application server, and electronic devices
US8775859B2 (en) Method, apparatus and system for data disaster tolerance
US20180300210A1 (en) Method for Processing Acquire Lock Request and Server
WO2019178891A1 (zh) Method and system for processing device failure
US20140059315A1 (en) Computer system, data management method and data management program
KR20140093720A (ko) Method and apparatus for messaging in a cloud
WO2023082800A1 (zh) Master node selection method, distributed database, and storage medium
WO2022048646A1 (zh) Virtual IP management method and apparatus, electronic device, and storage medium
CN112367182B (zh) Configuration method and apparatus for disaster recovery active and standby devices
CN113347037A (zh) Data center access method and apparatus
CN109189854B (zh) Method and node device for providing continuous service
CN111342986B (zh) Distributed node management method and apparatus, distributed system, and storage medium
CN112214377B (zh) Device management method and system
WO2019080386A1 (zh) Method and system for managing network services
WO2017080362A1 (zh) Data management method and apparatus
US20140047108A1 (en) Self organizing network event reporting
WO2022242426A1 (zh) Session binding relationship processing method and apparatus, electronic device, and readable medium

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018893326

Country of ref document: EP

Effective date: 20190702

NENP Non-entry into the national phase

Ref country code: DE