WO2021249173A1 - Distributed storage system, exception handling method thereof, and related apparatus - Google Patents

Info

Publication number
WO2021249173A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
switch
notification message
identifier
attribute
Prior art date
Application number
PCT/CN2021/095586
Other languages
English (en)
French (fr)
Inventor
冀智刚
刘宁
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21822660.3A (EP4148549A4)
Publication of WO2021249173A1
Priority to US18/064,752 (US20230106077A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/32Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324Display of status information
    • G06F11/328Computer systems status display
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0617Improving the reliability of storage systems in relation to availability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751Error or fault detection not based on redundancy
    • G06F11/0754Error or fault detection not based on redundancy by exceeding limits
    • G06F11/076Error or fault detection not based on redundancy by exceeding limits by exceeding a count or rate limit, e.g. word- or bit count limit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766Error or fault reporting or storing
    • G06F11/0772Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2089Redundant storage control functionality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0654Management of faults, events, alarms or notifications using network fault recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751Error or fault detection not based on redundancy
    • G06F11/0754Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/10Active monitoring, e.g. heartbeat, ping or trace-route

Definitions

  • This application relates to the field of storage technology, and in particular to a distributed storage system, an exception handling method and related devices applied in the distributed storage system.
  • Distributed storage is a data storage technology that stores data on storage nodes in different locations. These storage nodes are interconnected through network devices to transmit data and exchange information. A node that accesses and uses the data in the storage nodes is called a computing node. Storage nodes, network devices, and computing nodes form a distributed storage system. The distributed storage system may also include control nodes that manage the storage nodes and computing nodes.
  • Distributed storage systems have high reliability requirements. Abnormal events such as congestion, link failures, and packet loss all affect the reliability of a distributed storage system. How to make nodes quickly discover and handle abnormalities in a distributed storage system has become a problem that needs to be solved in this field.
  • This application provides a distributed storage system, and an exception handling method and related devices applied in the distributed storage system, so that nodes in the distributed storage system can quickly discover and handle network abnormalities. This improves the efficiency of anomaly detection and thereby the reliability of the distributed storage system.
  • The first aspect of the present application provides a distributed storage system. The distributed storage system includes a first switch and a plurality of nodes. The plurality of nodes includes a first node and a second node; the first node is connected to the first switch; the first node is a storage node, a computing node, or a control node. The first switch is configured to detect the state of the first node and, when the state of the first node meets a preset condition, send a first notification message to the second node, where the first notification message includes the identifier of the first node and the status information of the first node. The second node is configured to receive the first notification message and perform a processing operation according to the first notification message.
  • The first node and the second node are distinguished by function. There can be one or more first nodes, and one or more second nodes.
  • The first switch detects the state of the nodes connected to it and, when the state of the first node meets a preset condition, sends the state of the first node to the second node through the first notification message. The first node and the second node therefore do not need to send keep-alive messages to each other, which reduces the number of keep-alive messages in the network, saves network bandwidth, shortens the time the second node takes to learn the state of the first node, and enhances the reliability of the distributed storage system.
  • The state of the first node meeting a preset condition includes: the value of an operating parameter of the first node is equal to or greater than a set threshold, in which case the status information of the first node includes the value of the operating parameter; or the first node is locally unreachable, in which case the status information of the first node includes an identifier indicating that the first node is locally unreachable.
  • The first switch not only detects whether the first node is locally unreachable but can also monitor the operating parameters of the first node. In this way, the first switch can send the different states of the first node to the second node according to the needs of the second node, so that the second node can respond more promptly, which reduces the impact of a state change of the first node and avoids service interruption.
  • The first node being locally unreachable includes: the first switch does not receive a keep-alive message from the first node within a set time; or the port through which the first switch connects to the first node fails.
  • The first switch can detect the state of the first node in multiple ways to ensure that the detected state of the first node is timely and accurate, which improves the reliability of the distributed storage system.
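The detection conditions described above can be sketched as a small check that a switch might run for each attached node. This is only an illustration under stated assumptions: the names `NodeState`, `Notification`, `check_node`, and `LOCALLY_UNREACHABLE`, the choice of CPU usage as the operating parameter, and the default timeout and threshold values are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Optional

LOCALLY_UNREACHABLE = "locally_unreachable"  # hypothetical status identifier


@dataclass
class NodeState:
    node_id: str
    port_up: bool          # state of the switch port facing the node
    last_keepalive: float  # arrival time of the last keep-alive, in seconds
    cpu_usage: float       # example operating parameter, in percent


@dataclass
class Notification:
    node_id: str                   # identifier of the first node
    status: str                    # unreachable marker or parameter name
    value: Optional[float] = None  # parameter value, when one is reported


def check_node(state: NodeState, now: float,
               keepalive_timeout: float = 3.0,
               cpu_threshold: float = 90.0) -> Optional[Notification]:
    """Return a notification if the node state meets a preset condition."""
    # Local unreachability: the port has failed, or no keep-alive message
    # arrived within the set time.
    if not state.port_up or now - state.last_keepalive > keepalive_timeout:
        return Notification(state.node_id, LOCALLY_UNREACHABLE)
    # Operating parameter equal to or greater than the set threshold.
    if state.cpu_usage >= cpu_threshold:
        return Notification(state.node_id, "cpu_usage", state.cpu_usage)
    return None
```

In this sketch a healthy node yields `None`, so no message is sent, while a silent node or a node whose parameter crosses the threshold yields a message carrying the node identifier and its status information.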
  • When sending the first notification message to the second node, the first switch is configured to: send the first notification message to the second node in response to a query message sent by the second node; or actively send the first notification message to the second node according to a set condition.
  • The second node may be a node that has subscribed to the status information of the first node, or any of the nodes in the network.
  • The first switch can send the first notification message to the second node in multiple ways, which improves the flexibility of the distributed storage system.
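The two delivery modes (answering a query, or actively pushing to subscribers or to all nodes) might be modeled as follows. This is a minimal sketch, not the switch's actual mechanism: the class `NotificationDispatcher`, its methods, and the in-memory `sent` log standing in for real message transmission are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class NotificationDispatcher:
    """Hypothetical switch-side bookkeeping for status notifications."""

    def __init__(self, all_nodes: Set[str]) -> None:
        self.all_nodes = set(all_nodes)
        self.subscribers: Dict[str, Set[str]] = defaultdict(set)
        self.latest: Dict[str, str] = {}
        # Log of (receiver, node, status) tuples; stands in for sending.
        self.sent: List[Tuple[str, str, str]] = []

    def subscribe(self, receiver: str, target: str) -> None:
        """Record that `receiver` wants the status of `target`."""
        self.subscribers[target].add(receiver)

    def publish(self, node_id: str, status: str,
                broadcast: bool = False) -> None:
        """Actively push a status to subscribers, or to all other nodes."""
        self.latest[node_id] = status
        if broadcast:
            receivers = self.all_nodes - {node_id}
        else:
            receivers = self.subscribers[node_id]
        for receiver in sorted(receivers):
            self.sent.append((receiver, node_id, status))

    def query(self, node_id: str) -> Optional[str]:
        """Answer a query message with the last known status, if any."""
        return self.latest.get(node_id)
```

A subscription table keeps the push targeted, while the broadcast flag covers the case in which every node in the network is treated as a second node.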
  • The distributed storage system further includes a second switch and a third node connected to the second switch. The first switch is further configured to send a second notification message to a fourth node when the routing information of the third node is deleted or invalidated; the fourth node is connected to the first switch, and the second notification message is used to notify the fourth node that the third node is remotely unreachable.
  • The first switch also detects whether the third node connected to the second switch is reachable, and sends a notification message to the fourth node connected to the first switch when the third node is unreachable, so that the fourth node performs the corresponding processing. This prevents the fourth node from communicating with the third node while the third node is unreachable, which improves the reliability of the distributed storage system.
  • The distributed storage system further includes a second switch. The first switch is further configured to send a second notification message to a fourth node after communication with the second switch is interrupted; the fourth node is connected to the first switch, and the second notification message is used to notify the fourth node that communication with the second switch is interrupted or that all third nodes connected to the second switch are remotely unreachable.
  • The first switch also detects whether its communication with the second switch is interrupted and, when it is, sends the second notification message to the fourth node, so that the fourth node avoids communicating with the third nodes connected to the second switch while communication between the first switch and the second switch is interrupted. This improves the reliability of the distributed storage system.
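The two remote-unreachability triggers (a deleted or invalidated route to a third node, and an interrupted link to the second switch) can be illustrated with a toy route table. Everything here is an assumption made for the sketch: the table layout (remote node identifier mapped to the switch it sits behind), the function names, the message dictionaries, and the status string.

```python
from typing import Dict, List

REMOTELY_UNREACHABLE = "remotely_unreachable"  # hypothetical status identifier


def on_route_removed(routes: Dict[str, str], node_id: str,
                     local_nodes: List[str]) -> List[dict]:
    """Build second notification messages when one remote node's route goes away.

    `routes` maps a remote node identifier to the switch it is behind.
    """
    if routes.pop(node_id, None) is None:
        return []  # no route was known, so there is nothing to report
    return [{"to": receiver, "node": node_id, "status": REMOTELY_UNREACHABLE}
            for receiver in local_nodes]


def on_switch_link_down(routes: Dict[str, str], switch_id: str,
                        local_nodes: List[str]) -> List[dict]:
    """Build second notification messages when the link to a switch is interrupted."""
    lost = sorted(n for n, sw in routes.items() if sw == switch_id)
    for node_id in lost:
        del routes[node_id]
    # One message per local node covers every node behind the lost switch.
    return [{"to": receiver, "switch": switch_id, "nodes": lost,
             "status": REMOTELY_UNREACHABLE} for receiver in local_nodes]
```

In the second case a single message names the switch, so a receiver does not need one message per unreachable node.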
  • The second aspect of the present application provides an exception handling method, which is executed by the first switch in the distributed storage system of the first aspect. The first switch detects the state of a first node; the first switch is the switch to which the first node is connected, and the first node is a storage node, a computing node, or a control node. When the state of the first node meets a preset condition, the first switch sends a first notification message to a second node; the second node is a node in the distributed storage system other than the first node, and the first notification message includes the identifier of the first node and the status information of the first node.
  • The state of the first node meeting a preset condition includes: the value of an operating parameter of the first node is equal to or greater than a set threshold, in which case the status information of the first node includes the value of the operating parameter; or the first node is locally unreachable, in which case the status information of the first node includes an identifier indicating that the first node is locally unreachable.
  • The first node being locally unreachable includes: the first switch does not receive a keep-alive message from the first node within a set time; or the port through which the first switch connects to the first node fails.
  • Sending the first notification message to the second node includes: sending the first notification message to the second node in response to a query message sent by the second node; or the first switch actively sending the first notification message to the second node according to a set condition.
  • The distributed storage system further includes a second switch and a third node connected to the second switch. When the routing information of the third node is deleted or invalidated, the first switch sends a second notification message to a fourth node; the fourth node is connected to the first switch, and the second notification message is used to notify the fourth node that the third node is remotely unreachable.
  • The distributed storage system further includes a second switch. After communication between the first switch and the second switch is interrupted, the first switch sends a second notification message to a fourth node; the fourth node is connected to the first switch, and the second notification message is used to notify the fourth node that the third node connected to the second switch is remotely unreachable.
  • The third aspect of the present application provides an exception handling method, which is applied to the second node of the distributed storage system of the first aspect. The distributed storage system further includes a first node and a first switch to which the first node is connected; the first node is a storage node, a computing node, or a control node. The second node receives a first notification message from the first switch, where the first notification message is generated when the first switch determines that the state of the first node meets a preset condition, and the first notification message includes the identifier of the first node and the status information of the first node. The second node performs a processing operation according to the first notification message.
  • When the status information of the first node includes an identifier indicating that the first node is locally unreachable, the second node performing a processing operation according to the first notification message includes: the second node obtains the identifier of the first node from the first notification message; the second node determines, according to the status information of the first node, that the first node is locally unreachable; the second node determines the type and attribute of the first node according to the identifier of the first node; and the second node performs a fault handling operation according to the type and attribute of the first node and the type and attribute of the second node.
  • When the status information of the first node includes the value of an operating parameter of the first node, the second node performing a processing operation according to the first notification message includes: the second node obtains the identifier of the first node and the value of the operating parameter of the first node from the first notification message; when the value of the operating parameter is greater than or equal to the alarm threshold of the operating parameter and less than the failure threshold of the operating parameter, the second node sends an alarm message, which includes the identifier of the first node and the value of the operating parameter of the first node; when the value of the operating parameter is greater than the failure threshold of the operating parameter, the second node determines the type and attribute of the first node according to the identifier of the first node, and performs a fault handling operation according to the type and attribute of the first node and the type and attribute of the second node.
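The two-threshold decision made by the second node can be sketched as below. The function name `handle_parameter` and the returned dictionaries are hypothetical. The text above specifies "greater than" the failure threshold for fault handling and does not say what happens when the value equals it exactly; treating equality as a fault is an assumption of this sketch.

```python
from typing import Optional


def handle_parameter(node_id: str, value: float,
                     alarm_threshold: float,
                     failure_threshold: float) -> Optional[dict]:
    """Decide what the second node does with a reported parameter value."""
    # Assumption: a value exactly equal to the failure threshold is treated
    # as a fault; the source text only covers values greater than it.
    if value >= failure_threshold:
        return {"action": "fault_handling", "node": node_id}
    # Alarm band: at or above the alarm threshold, below the failure threshold.
    if value >= alarm_threshold:
        return {"action": "alarm", "node": node_id, "value": value}
    return None  # below both thresholds: nothing to do
```

For example, with an alarm threshold of 80 and a failure threshold of 95, a reported value of 85 produces an alarm that carries the node identifier and the value, while 99 triggers fault handling.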
  • The distributed storage system further includes a second switch, and the second node also receives a second notification message, where the second notification message is used to notify the second node that communication with the second switch is interrupted or that all third nodes connected to the second switch are remotely unreachable; the second node performs a processing operation according to the second notification message.
  • When the second notification message includes an identifier of the third node and an identifier indicating that the third node is remotely unreachable, the second node performing a processing operation according to the second notification message includes: the second node obtains the identifier of the third node from the second notification message; the second node determines, according to the status information of the third node, that the third node is remotely unreachable; the second node determines the type and attribute of the third node according to the identifier of the third node; and the second node performs a fault handling operation according to the type and attribute of the third node and the type and attribute of the second node.
  • The second node can directly determine the unreachable third node according to the identifier of the third node in the second notification message and avoid communicating with it, which improves the reliability of the distributed storage system.
  • When the second notification message includes an identifier or a subnet prefix of the second switch, the second node performing a processing operation according to the second notification message includes: the second node obtains the identifier or subnet prefix of the second switch from the second notification message; the second node determines the identifier of each third node that matches the identifier or subnet prefix of the second switch; the second node determines the type and attribute of the third node according to the identifier of the third node; and the second node performs a fault handling operation according to the type and attribute of the third node and the type and attribute of the second node.
  • The first switch only needs to send the second node one second notification message that includes the identifier or subnet prefix of the second switch, and the second node can then avoid communicating with all third nodes connected to the second switch. This improves processing efficiency, saves network bandwidth, and improves network reliability.
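Matching known nodes against a subnet prefix can be sketched with Python's standard `ipaddress` module. This assumes that nodes are identified by IP addresses and that the second node keeps a mapping from node name to address; neither detail is stated in the text above, and `nodes_behind_switch` is a hypothetical helper.

```python
import ipaddress
from typing import Dict, List


def nodes_behind_switch(prefix: str, known_nodes: Dict[str, str]) -> List[str]:
    """Return the names of known nodes whose address falls inside `prefix`.

    `known_nodes` maps a node name to its IP address (an assumption of
    this sketch about how the second node tracks its peers).
    """
    network = ipaddress.ip_network(prefix)
    return sorted(name for name, address in known_nodes.items()
                  if ipaddress.ip_address(address) in network)
```

With a prefix of `10.0.2.0/24`, every known node whose address lies in that range is selected for fault handling from a single notification message.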
  • Alternatively, when the second notification message includes an identifier or a subnet prefix of the second switch, the second node performing a processing operation according to the second notification message includes: the second node obtains the identifier or subnet prefix of the second switch from the second notification message; the second node stores the identifier or subnet prefix of the second switch; when the second node needs to access a new node, the second node compares the identifier of the new node with the identifier or subnet prefix of the second switch; when the identifier of the new node matches the identifier or subnet prefix of the second switch, the second node determines the type and attribute of the new node according to the identifier of the new node; and the second node performs a fault handling operation according to the type and attribute of the new node and the type and attribute of the second node.
  • The second node does not need to perform the corresponding processing as soon as it receives the identifier or subnet prefix of the second switch; it only needs to determine, when accessing a new node, whether that node is a third node with which communication should be avoided. This improves the flexibility of implementation and saves the processing resources of the second node.
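The deferred variant, in which the prefix is stored and checked only when a new node is about to be accessed, might look like the following. As before, identifying nodes by IP address is an assumption of the sketch, and the class `UnreachablePrefixes` and its method names are hypothetical.

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class UnreachablePrefixes:
    """Prefixes of switches reported as unreachable, checked lazily."""

    def __init__(self) -> None:
        self._networks: List[Network] = []

    def record(self, prefix: str) -> None:
        """Store the subnet prefix carried in a second notification message."""
        self._networks.append(ipaddress.ip_network(prefix))

    def should_avoid(self, address: str) -> bool:
        """Check a node address only at the moment it is about to be accessed."""
        ip = ipaddress.ip_address(address)
        return any(ip in network for network in self._networks)
```

Recording a prefix costs a single list append; the comparison work is paid only for nodes that the second node actually tries to reach.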
  • The fourth aspect of the present application provides a switch. The switch includes functional modules that execute the exception handling method provided by the second aspect or any possible design of the second aspect. This application does not limit the division of the functional modules: they can be divided according to the process steps of the exception handling method of the second aspect, or according to specific implementation needs.
  • The fifth aspect of the present application provides a node. The node includes functional modules that execute the exception handling method provided by the third aspect or any possible design of the third aspect. This application does not limit the division of the functional modules: they can be divided according to the process steps of the exception handling method of the third aspect, or according to specific implementation needs.
  • The sixth aspect of the present application provides a host. The node running on the host includes a memory, a processor, and a communication interface. The memory is used to store computer program code and data. The processor is used to call the computer program code and, in combination with the data, enable the node to implement the exception handling method of the third aspect of this application and any possible design thereof.
  • The seventh aspect of the present application provides a chip. When the chip is running, it can implement the exception handling method of the second aspect of this application and any possible design thereof, as well as the exception handling method of the third aspect of this application and any possible design thereof.
  • The eighth aspect of the present application provides a storage medium in which program code is stored. When the program code is run by a device (a switch, a server, a terminal device, etc.), the device can implement the exception handling method of the second aspect of this application.
  • FIG. 1 is a schematic structural diagram of a distributed storage system provided by an embodiment of this application.
  • FIG. 2 is a schematic diagram of a fault location of a distributed storage system provided by an embodiment of this application.
  • FIG. 3 is a schematic flowchart of an exception handling method provided by an embodiment of this application.
  • FIG. 4 is a schematic diagram of a subscription table provided by an embodiment of this application.
  • FIG. 5 is a schematic structural diagram of an access device 500 provided by an embodiment of this application.
  • FIG. 6 is a schematic structural diagram of a terminal device 600 provided by an embodiment of this application.
  • In the embodiments of the present application, words such as “exemplary” or “for example” are used to present examples, instances, or illustrations. Any embodiment or design solution described as “exemplary” or “for example” in the embodiments of the present application should not be construed as being more preferable or advantageous than other embodiments or design solutions. Rather, words such as “exemplary” or “for example” are used to present related concepts in a specific manner.
  • “Plurality” means two or more. For example, multiple nodes refers to two or more nodes.
  • “At least one” refers to any number, for example, one, two, or more.
  • “A and/or B” can be only A, only B, or both A and B.
  • “At least one of A, B, and C” may include only A, only B, only C, or include A and B, include B and C, include A and C, or include A, B, and C.
  • The terms “first” and “second” in this application are only used to distinguish different objects, not to indicate the priority or importance of the objects.
  • The embodiments of the present application enable nodes in a distributed storage system to quickly perceive and handle abnormalities in the distributed storage system. The abnormalities include abnormal network operation such as congestion, packet loss, excessive CPU usage, and excessive delay, as well as network failures such as port failure and service interruption.
  • FIG. 1 is a schematic structural diagram of a distributed storage system provided by an embodiment of the application.
  • The distributed storage system of the present application includes an access device 20, a computing node 50, and a storage node 40.
  • The storage node 40 and the computing node 50 may be distributed on one or more hosts 10. Multiple distributed storage systems can form a larger system.
  • The distributed storage system 100a includes a computing node 50a deployed on a host 10a, storage nodes 40a-40c deployed on the host 10b, and an access device 20a connected to the hosts 10a and 10b.
  • The distributed storage system may also include a control node 60 for managing the storage node 40 and the computing node 50.
  • The distributed storage system 100b includes not only computing nodes and storage nodes deployed on the hosts 10c and 10d, and an access device 20b, but also a control node 60a deployed on the host 10d.
  • The distributed storage systems 100a, 100b, and 100c are communicatively connected through backbone devices 30a and 30b to form a larger distributed storage system 1000.
  • The storage node 40, the computing node 50, and the control node 60 may be collectively referred to as nodes. The host 10 is the carrier of the nodes, and one host can carry one or more nodes.
  • The multiple nodes that can be carried by a host may include one or more of the computing node 50, the storage node 40, or the control node 60.
  • The host 10 may be a physical server, a virtual server, a workstation, a mobile station, a general-purpose computer, or another device that can carry nodes.
  • the storage node 40 is used to store data, and the data can be any digital information, for example, information generated when a user uses a network application, files stored by the user, network configuration, and so on.
  • the computing node 50 is used to obtain data from the storage node 40 and process services based on the obtained data.
  • the access device 20 is used to connect to different hosts and forward data sent between nodes on the hosts.
  • the backbone device 30 is used to connect different access devices to expand the scale of the distributed storage system.
  • the access device 20 may be a layer 2 switch or a layer 3 switch
  • the backbone device 30 may be a layer 3 switch or a router.
  • a spare node can be deployed for one node.
  • both computing nodes 50a and 50b can provide computing function A
  • computing node 50a can be used as the main computing node of computing function A
  • computing node 50b can be used as the backup computing node of computing function A.
  • the storage node 40a can be used as the primary storage node for data B, and the storage node 40g can be used as the backup storage node for data B.
  • the control node 60a is the active control node, and the control node 60b is the standby control node.
  • the distributed storage system may not include the control node. When the distributed storage system does not include the control node, the storage node 40 and the computing node 50 may be managed by the network management system.
  • the distributed storage system may fail during operation. Take the process of the computing node 50a reading data from the storage node 40g as an example.
  • the locations (indicated by X) where the distributed storage system may fail are shown in FIG. 2. That is, a failure may occur between any two devices, and the failure may be a port failure of one of the two devices or a link failure.
  • nodes in a distributed storage system send keep-alive messages to each other to detect failures, so as to avoid sending requests to nodes that have lost connections. However, it can be seen from Figure 2 that the path between two nodes may be very long.
  • the distributed storage system shown in FIG. 1 is abstracted as including a first access device and a plurality of nodes, the plurality of nodes including a first type of node and a second type of node. The first type of node accesses the first access device (that is, the first access device is the access device of the first type of node). Further, the distributed storage system also includes a second access device, a third type of node, and a fourth type of node.
  • the third type of node is connected to the second access device
  • the fourth type of node is connected to the first access device.
  • the first type of node, the second type of node, the third type of node, and the fourth type of node in this application are distinguished based on functions, and do not refer to a specific node.
  • the first type of node in one scenario can be the second type of node in another scenario.
  • Any type of node can include one or more nodes.
  • the first type of node, the second type of node, the third type of node, and the fourth type of node are referred to as the first node, the second node, the third node, and the fourth node, respectively.
  • the exception handling method provided by the embodiment of the present application includes steps S300-S355. Any of steps S300-S355 may be omitted according to the needs of the scenario. That is, the exception handling method provided by the embodiment of the present application does not require all of steps S300-S355 to be performed.
  • step S300 a connection between the node and the corresponding access device is established.
  • each node accesses the network, it needs to establish a connection with the corresponding access device.
  • FIG. 3 takes as an example the first node connecting to the first access device and the third node connecting to the second access device.
  • the first node can be a storage node, a computing node or a control node
  • the second node can be a storage node, a computing node or a control node that subscribes to the status information of the first node, or any node other than the first node.
  • this is described below with reference to specific scenarios.
  • the first access device may be any one of the access devices 20a, 20b, and 20c in FIG. 1, and the second access device is any access device other than the first access device.
  • in step S310, the first access device determines whether the state of the first node meets a preset condition. When the state of the first node does not meet the preset condition, the first access device may perform step S310 again, either immediately or after waiting for a period of time. When the state of the first node meets the preset condition, the first access device performs step S320.
  • the preset condition may be that the value of the operating parameter of the first node is equal to or greater than a set threshold or the first node is locally unreachable.
  • the first node can be considered as an abnormal node.
  • the first access device may obtain the operating parameters of the first node.
  • the operating parameters of the first node may include, for example, one or more of packet loss rate, time delay, bit error rate, CPU utilization, and network card status.
  • the value of the operating parameter indicates the operating status of the first node, for example, whether it is operating normally (the value of the operating parameter is less than the set threshold) or abnormally (the value of the operating parameter is equal to or greater than the set threshold).
  • the first access device can monitor or collect the value of the operating parameter of the first node and determine whether that value is equal to or greater than the set threshold; when it is, the first access device sends the first notification message to the second node.
  • the first access device determines that there is a fault that makes the first node locally unreachable.
  • "locally unreachable" of a node means that data cannot be forwarded to the node through the access device of the node. For example, if the link between the computing node 50a and the access device 20a fails, and other nodes cannot send messages to the computing node 50a through the access device 20a, the computing node 50a is locally unreachable relative to the access device 20a. The communication between the first access device and the first node is interrupted.
  • the port of the first access device connected to the first node may fail, or the first access device may receive no keep-alive message from the first node within a set time (for example, every 5 minutes, every N keep-alive message periods, or at a set time point).
  • the port of the first access device connected to the first node fails, that is, the port of the first access device connected to the first node is in a faulty (e.g., down) state.
  • Various situations can cause the port of the first access device connected to the first node to fail, for example, a fault of that port on the first access device, a fault of the port of the first node connected to the first access device, a fault of the cable between the first access device and the first node, a power-off or reset of the first node, a fault of the optical module of the first access device, and so on.
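  • The check in step S310 can be illustrated with a minimal sketch. The threshold values, the keep-alive timeout, and the `NodeStatus` layout below are hypothetical; the application only speaks of "a set threshold" and "a set time".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical values; the source does not give concrete thresholds or timeouts.
THRESHOLDS: Dict[str, float] = {"packet_loss_rate": 0.05, "cpu_utilization": 0.90}
KEEPALIVE_TIMEOUT_S = 3.0


@dataclass
class NodeStatus:
    """What an access device knows about one directly connected node (assumed layout)."""
    node_id: str
    port_up: bool = True
    seconds_since_keepalive: float = 0.0
    operating_params: Dict[str, float] = field(default_factory=dict)


def preset_condition_met(status: NodeStatus) -> Optional[str]:
    """Return a short reason when the preset condition is met, otherwise None."""
    # Local unreachability: the port is down or keep-alive messages stopped arriving.
    if not status.port_up:
        return "locally_unreachable: port down"
    if status.seconds_since_keepalive > KEEPALIVE_TIMEOUT_S:
        return "locally_unreachable: keep-alive timeout"
    # Abnormal operation: a monitored parameter is equal to or greater than its threshold.
    for name, value in status.operating_params.items():
        limit = THRESHOLDS.get(name)
        if limit is not None and value >= limit:
            return f"operating parameter over threshold: {name}"
    return None
```

  • In this sketch a return value of `None` corresponds to repeating step S310, and any other return value corresponds to proceeding to step S320.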
  • step S320 the first access device sends a first notification message to the second node, where the first notification message includes the identifier of the first node and the status information of the first node.
  • the first notification message refers to a type of message sent by the first access device when it finds that the state of the first node connected to the first access device meets a preset condition.
  • there may be one or more first notification messages.
  • the first notification message may include the identification of the first node.
  • the status information of the first node includes an identifier indicating that the first node is locally unreachable, and the identifier may specifically be a field or a flag bit.
  • the first notification message may also include the location of the fault, the time of the fault, and so on. The fault location in the first notification message may specifically be the identifier of the first node or the identifier of the port connected to the first node.
  • the identifier of the first node may be the IP address, MAC address, device name, device identifier (identifier, ID) of the first node and other information that can uniquely identify the node.
  • when the first notification message is an operating parameter notification message, the first notification message further includes the operating parameter and the value of the operating parameter.
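  • One possible in-memory shape of the first notification message is sketched below. The field names and the JSON encoding are assumptions made for illustration; the application lists the kinds of information carried but does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FirstNotification:
    """Illustrative container; all field names are hypothetical."""
    node_id: str                        # identifier of the first node, e.g. an IP address
    locally_unreachable: bool = False   # flag indicating local unreachability
    param_name: Optional[str] = None    # set for an operating parameter notification
    param_value: Optional[float] = None
    fault_port: Optional[str] = None    # optional fault location
    fault_time: Optional[float] = None  # optional fault time


def encode(msg: FirstNotification) -> bytes:
    """Serialize the message, leaving out optional fields that are not set."""
    body = {key: value for key, value in asdict(msg).items() if value is not None}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> FirstNotification:
    """Rebuild the message; omitted optional fields fall back to their defaults."""
    return FirstNotification(**json.loads(raw.decode("utf-8")))
```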
  • step S320 may be that the first access device actively sends the first notification message to the second node according to a set condition.
  • the second node may be a node that has subscribed to the abnormal information of the first node, or may be a node determined by the first access device according to a certain rule.
  • the rule may be determined according to the type of the node. For example, the rule may be that when an abnormal node is the active computing node, a notification message is sent to all standby computing nodes of the active computing node.
  • the second node may also be all other nodes except the first node.
  • the first access device may store the subscription table shown in FIG. 4.
  • the subscription table records the information of each node connected to the access device 20a and the subscribed nodes.
  • the subscription table is just an example, and the content in the subscription table can be deleted or added as needed.
  • the first access device may actively send the first notification message to the second node according to the subscription table.
  • after the access device 20a detects that the storage node 40a is locally unreachable or that the value of the operating parameter of the storage node 40a meets the preset condition, it sends the first notification message to the computing node 50a.
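  • How an access device might pick the second node from a subscription table can be sketched as follows. The table contents are hypothetical (the actual table is the one shown in FIG. 4), and falling back to all other local nodes is only one of the options the text allows.

```python
from typing import Dict, Iterable, List, Optional, Set


def select_recipients(
    abnormal_node: str,
    subscriptions: Dict[str, Set[str]],
    local_nodes: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the nodes that should receive the first notification message."""
    subscribers = subscriptions.get(abnormal_node, set())
    if subscribers:
        # Nodes that subscribed to the abnormal node's status information.
        return sorted(subscribers - {abnormal_node})
    if local_nodes is not None:
        # Assumed fallback: notify every other node known to the access device.
        return sorted(set(local_nodes) - {abnormal_node})
    return []


# Hypothetical subscription table held by access device 20a.
SUBSCRIPTIONS: Dict[str, Set[str]] = {"storage-40a": {"compute-50a"}}
```

  • With this table, `select_recipients("storage-40a", SUBSCRIPTIONS)` yields only `compute-50a`, which mirrors the example in which the access device 20a notifies the computing node 50a about the storage node 40a.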
  • step S320 may be that the first access device receives the query message sent by the second node, and sends the first notification message to the second node according to the query message.
  • before step S320, the method may further include step S315, in which the second node sends a query message to the first access device. The query message is used to obtain the status information of the first node from the first access device.
  • the second node stores information about its target node (for example, the first node) and the access device of the target node (for example, the first access device). The second node may, as needed or according to a set condition, send a query message to the access device of the target node; the query message includes the identifier of the target node and is used to obtain the status information of the target node from that access device.
  • step S325 the second node performs processing operations according to the first notification message.
  • the second node obtains the state information of the first node according to the first notification message, and performs processing operations according to the state information of the first node.
  • the status information of the first node includes the value of the operating parameter of the first node, or the identifier indicating that the first node is locally unreachable.
  • when the status information of the first node includes an identifier indicating that the first node is locally unreachable, the second node obtains the identifier of the first node, determines from the status information that the first node is locally unreachable, determines the type and attribute of the first node according to the identifier of the first node, and then performs a fault handling operation according to the type and attribute of the first node and the type and attribute of the second node. Since the first node is locally unreachable, the second node can consider that the network is faulty, so a fault handling operation needs to be performed.
  • the type of the node refers to whether the node is a control node, a computing node or a storage node, and the attribute of the node indicates that the node is an active node or a standby node.
  • the first notification message may include the identifier of the first node; the second node may look up the node mapping table according to the identifier of the first node to determine the type and attribute of the first node, and then perform processing operations according to the type and attribute of the first node and the type and attribute of the second node.
  • the second node stores a node mapping table, and each entry of the node mapping table includes a corresponding relationship between a node identifier, a node type, and a node attribute.
  • the identification of a node may be the node's IP address, MAC address, device name of the node, device identification and other information that can uniquely identify the node.
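  • The node mapping table described above can be pictured as a simple lookup keyed by node identifier. The sketch below assumes IP addresses as identifiers and uses made-up entries; the enum and class names are illustrative only.

```python
from enum import Enum
from typing import Dict, NamedTuple, Optional


class NodeType(Enum):
    CONTROL = "control"
    COMPUTING = "computing"
    STORAGE = "storage"


class NodeAttr(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class NodeEntry(NamedTuple):
    node_type: NodeType
    attribute: NodeAttr


# Hypothetical entries: node identifier -> (type, attribute).
NODE_MAP: Dict[str, NodeEntry] = {
    "10.0.1.11": NodeEntry(NodeType.STORAGE, NodeAttr.ACTIVE),
    "10.0.3.17": NodeEntry(NodeType.STORAGE, NodeAttr.STANDBY),
    "10.0.1.50": NodeEntry(NodeType.COMPUTING, NodeAttr.ACTIVE),
}


def lookup(node_id: str, table: Dict[str, NodeEntry] = NODE_MAP) -> Optional[NodeEntry]:
    """Return the type and attribute recorded for node_id, or None if unknown."""
    return table.get(node_id)
```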
  • step S325 has the following possible implementation manners:
  • if the first node is the primary computing node (for example, computing node 50a) of computing function A (in this application, computing function A is used to indicate any computing function) and the second node is a standby computing node (for example, computing node 50b) of computing function A, the second node switches to being the primary computing node of computing function A according to the first notification message;
  • alternatively, according to the first notification message, the attribute of the standby computing node of computing function A (for example, computing node 50b) is changed from standby to active, and the standby computing node is informed so that it performs an active/standby switchover (that is, the standby computing node changes its own attribute from standby to active);
  • if the first node is a standby computing node of computing function A (for example, computing node 50b) and the second node is a control node (for example, control node 60a), the control node updates the network topology according to the first notification message (by modifying the attribute of the standby computing node to unavailable or deleting the standby computing node).
  • implementation (1) usually occurs in a scenario where there is no control node
  • implementation (2) usually occurs in a scenario where there is a control node.
  • the standby computing node of computing function A can switch to the primary computing node of computing function A according to the first notification message sent by the first access device, or according to a notification from the control node; the switching operation is performed according to whichever message is received first.
  • the first access device may send notification messages neither to the primary computing node of computing function A nor to the storage nodes that store data B that computing function A needs to access, so as to avoid too many messages in the distributed storage system.
  • step S325 has the following possible implementation modes:
  • if the first node is the primary storage node of data B (for example, storage node 40a) and the second node is the primary computing node (for example, computing node 50a) of computing function A that accesses data B, the second node switches its read-write interface to the backup storage node of data B (for example, the storage node 40f) to obtain data B from the backup storage node.
  • Each computing node stores the storage node corresponding to each computing function and the attributes of each storage node. Among them, one computing function can correspond to multiple storage nodes.
  • the attributes of a storage node include active or standby.
  • the control node, according to the first notification message, deletes the primary storage node of data B or modifies the attribute of that primary storage node, determines a new primary storage node for data B (for example, storage node 40g), and notifies the primary computing node and the standby computing node of computing function A, which uses data B, of the information about the new primary storage node.
  • the control node, according to the first notification message, deletes the standby storage node or modifies the attribute of the standby storage node to unavailable; the control node then selects a new standby storage node (for example, storage node 40e) for data B and notifies the primary storage node of the new standby storage node, so that the primary storage node sends data B to the new standby storage node.
  • the primary storage node stops sending data B to the storage node according to the first notification message.
  • the access device of the primary storage node need not send the first notification message to the standby computing node of a computing function that needs to access the data stored on the primary storage node.
  • the access device of the standby storage node need not send the first notification message to computing nodes (whether primary or standby).
  • step S325 has the following possible implementation modes:
  • the second node stops sending information to the active control node according to the first notification message and, after receiving a message sent by the new active control node, registers with the new active control node.
  • after receiving the first notification message, the second node switches itself to the active control node and then sends a notification message to the nodes it manages (including computing nodes and storage nodes), so that the managed nodes register with the second node.
  • the first access device need not send notification messages to the computing node or storage node, but may send notification messages only to the management device or administrator, so that the management device or administrator can re-designate or deploy a standby control node.
  • the distributed storage system shown in FIG. 1 may not include a control node.
  • when the control node is not included, the above-mentioned processes related to the control node need not be executed.
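  • The way step S325 selects a fault handling operation from the type and attribute of both nodes can be summarized as a dispatch table. The sketch below is one reading of the implementations listed above; the role tuples, the action strings, and the pairing of each action with a particular second node are illustrative assumptions rather than definitions from the application.

```python
from typing import Dict, Tuple

Role = Tuple[str, str]  # (node type, node attribute)

# (role of the abnormal first node, role of the notified second node) -> action.
ACTIONS: Dict[Tuple[Role, Role], str] = {
    (("computing", "active"), ("computing", "standby")):
        "switch self to active computing node",
    (("computing", "standby"), ("control", "active")):
        "update topology: mark standby computing node unavailable or delete it",
    (("storage", "active"), ("computing", "active")):
        "switch read-write interface to standby storage node",
    (("storage", "active"), ("control", "active")):
        "choose new active storage node and inform computing nodes",
    (("storage", "standby"), ("control", "active")):
        "choose new standby storage node and inform active storage node",
    (("storage", "standby"), ("storage", "active")):
        "stop sending data to the standby storage node",
    (("control", "active"), ("control", "standby")):
        "switch self to active control node and notify managed nodes",
}


def fault_action(first: Role, second: Role) -> str:
    """Pick the fault handling operation the second node performs."""
    if first == ("control", "active") and second[0] in ("computing", "storage"):
        # Managed nodes behave the same way regardless of their own attribute.
        return "stop reporting to old control node; register with new active control node"
    return ACTIONS.get((first, second), "no action")
```

  • A table of this kind keeps the handling logic in one place, so a deployment without a control node simply never produces the control-node rows.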
  • when the status information of the first node includes the value of the operating parameter of the first node, the second node obtains the identifier of the first node and the value of the operating parameter. When that value is greater than or equal to the alarm threshold of the operating parameter and less than the fault threshold of the operating parameter, the second node sends an alarm message that includes the identifier of the first node and the value of the operating parameter of the first node. The second node may send the alarm message to the network management system or the controller.
  • when the status information of the first node includes the value of the operating parameter of the first node, the second node obtains the identifier of the first node and the value of the operating parameter. When that value is greater than the fault threshold of the operating parameter, the second node determines the type and attribute of the first node according to the identifier of the first node, and then performs a fault handling operation according to the type and attribute of the first node and the type and attribute of the second node.
  • the fault handling operation performed by the second node may refer to the fault handling operation of the second node when the status information of the first node includes an identifier indicating that the first node is locally unreachable.
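  • The two-threshold decision can be written as a small classification function. The text covers values in the range from the alarm threshold up to (but not including) the fault threshold, and values greater than the fault threshold; treating a value exactly equal to the fault threshold as a fault is an assumption made here to close that gap.

```python
def classify_parameter(value: float, alarm_threshold: float, fault_threshold: float) -> str:
    """Classify an operating parameter value as 'normal', 'alarm', or 'fault'."""
    if alarm_threshold > fault_threshold:
        raise ValueError("alarm threshold must not exceed fault threshold")
    if value >= fault_threshold:
        # Assumption: equality with the fault threshold is handled as a fault.
        return "fault"
    if value >= alarm_threshold:
        return "alarm"  # the second node sends an alarm message
    return "normal"
```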
  • step S330 the first access device and the second access device send keep-alive messages to each other, so that the first access device determines whether the connection with the second access device is interrupted.
  • step S330 may be executed before step S310, may also be executed after step S310, or may not be executed.
  • in step S340, the first access device determines whether there is a fault that makes the third node remotely unreachable.
  • when there is no such fault, the first access device may continue to perform step S340 after waiting for a period of time; when there is a fault that makes the third node remotely unreachable, step S350 is performed.
  • the third node refers to a local node of the second access device, which is an access device other than the first access device; that is, the third node is a node deployed on a host directly connected to the second access device.
  • "remotely unreachable" of a certain node means that an access device cannot access the node through the access device of the node.
  • for example, if there is a failure in the link between the access device 20a and the access device 20c, so that the access device 20a cannot send messages to the local nodes of the access device 20c (the computing nodes 50d and 50e, the storage nodes 40f and 40g, and the control node 60b), then those nodes are remotely unreachable relative to the access device 20a.
  • the first access device may determine whether the nodes connected to the second access device are remotely unreachable according to whether the keep-alive message sent by the second access device in step S330 is received. If no keep-alive message sent by the second access device is received within the set time or the set period, the first access device determines that the communication with the second access device is interrupted, that is, the third node connected to the second access device is remotely unreachable.
  • the first access device may monitor the routing table, and if the second access device, which previously had a reachable route, becomes unreachable (for example, the routing information of the subnet corresponding to the second access device is deleted or invalidated), the first access device determines that the communication with the second access device is interrupted.
  • the interruption of communication between the first access device and the second access device means that the third node connected to the second access device is remotely unreachable.
  • the third node refers to all nodes connected to the second access device.
  • the communication between the first access device and the second access device being interrupted means, for example, that the access device 20a can communicate with the access device 20c through neither the backbone device 30a nor the backbone device 30b, so the nodes connected to the access device 20a cannot access any node of the access device 20c.
  • the access mapping table on each access device stores the correspondence between nodes in the entire distributed storage system and their access devices. Therefore, the first access device may determine the remotely unreachable third node according to the access mapping table after the communication with the second access device is interrupted.
  • the first access device may monitor the routing table, and if a third node that originally had a reachable route becomes unreachable (that is, the routing table entry corresponding to the third node is deleted or set to invalid), the first access device determines that the third node is remotely unreachable.
  • the third node can be one node, multiple nodes, or all nodes corresponding to a subnet.
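  • The keep-alive based variant of step S340 can be sketched as a small monitor that records when each peer access device was last heard from and expands an interrupted peer into its nodes through the access mapping table. The class name, the timeout handling, and the injectable clock are assumptions for illustration; peers that have never sent a keep-alive message are not tracked in this sketch.

```python
import time
from typing import Callable, Dict, List


class PeerMonitor:
    """Tracks keep-alive messages from other access devices (illustrative only)."""

    def __init__(
        self,
        access_map: Dict[str, str],  # node id -> id of the access device it connects to
        timeout_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._access_map = dict(access_map)
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def on_keepalive(self, peer_id: str) -> None:
        """Record that a keep-alive message from peer_id has just arrived."""
        self._last_seen[peer_id] = self._clock()

    def interrupted_peers(self) -> List[str]:
        """Peers whose last keep-alive message is older than the timeout."""
        now = self._clock()
        return sorted(p for p, seen in self._last_seen.items() if now - seen > self._timeout_s)

    def remotely_unreachable_nodes(self) -> List[str]:
        """Nodes attached to interrupted peers, found via the access mapping table."""
        down = set(self.interrupted_peers())
        return sorted(node for node, device in self._access_map.items() if device in down)
```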
  • step S350 the first access device sends a second notification message to the fourth node.
  • the second notification message is used to notify the fourth node that communication with the second access device is interrupted or that the third node connected to the second access device is remotely unreachable.
  • the second notification message refers to a message sent when the access device discovers that a node is remotely unreachable, and there may be one or more second notification messages.
  • the second notification message may include the location of the failure, the time of the failure, and so on.
  • the fault location may be the identifier of the second access device (for example, the IP address, MAC address, or path identifier of the second access device), a subnet prefix, etc.
  • the second notification message may include an identifier or a subnet prefix of the second access device, and the second notification message may also include an identifier indicating that the second access device is disconnected from communication.
  • the fault location may be the identifiers of one or more third nodes, and the identifier of a third node may be its IP address, MAC address, device name, device identifier, or other information that can uniquely identify the third node.
  • the second notification message may include an identifier of the third node and an identifier indicating that the third node is remotely unreachable.
  • the fourth node may be all local nodes of the first access device.
  • the first access device may also generate different second notification messages according to the type and attributes of each third node, and send them to different fourth nodes.
  • the fourth node may be the local node of the first access device or the local node of other access devices, and the fourth node may be one node or multiple nodes.
  • the fourth node may include the first node.
  • the first access device may find all third nodes according to the stored access mapping table, and then determine the type and attribute of each third node according to the node mapping table, and according to the type and attribute of each third node Generate a different second notification message.
  • for example, assume that the access device 20a in FIG. 1 is the first access device and the access device 20c is the second access device.
  • when the access device 20a does not receive, within the set time, a keep-alive message sent by the access device 20c, it determines that the communication with the access device 20c is interrupted, and it can also determine that the local nodes of the access device 20c, namely the computing nodes 50d and 50e, the storage nodes 40f and 40g, and the control node 60b, are remotely unreachable relative to the access device 20a.
  • the access device 20a may generate a second notification message that includes the identifier (for example, IP address) or subnet prefix of the access device 20c and an identifier indicating that the communication between the access device 20a and the access device 20c is interrupted, and then unicast, broadcast, or multicast the second notification message to all local nodes, namely the computing node 50a and the storage nodes 40a-40c.
  • the access device 20a may also generate different second notification messages according to the information about the local nodes of the access device 20c stored by the access device 20a and the subscription table stored by the access device 20a.
  • for example, according to the subscription table, the access device 20a may send the computing node 50a and the storage node 40b a second notification message indicating that the control node 60b is remotely unreachable, and may send the computing node 50a a second notification message indicating that the computing node 50d is remotely unreachable.
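  • Generating different second notification messages from the subscription table can be sketched as grouping the remotely unreachable third nodes by subscriber. The function name and the table contents are hypothetical; the example data mirrors the subscriptions mentioned in the text (50a and 40b subscribe to 60b, and 50a subscribes to 50d).

```python
from typing import Dict, Iterable, List, Set


def build_second_notifications(
    unreachable_nodes: Iterable[str],
    subscriptions: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each fourth node to the third nodes it should be told are remotely unreachable."""
    per_recipient: Dict[str, List[str]] = {}
    for node in unreachable_nodes:
        for subscriber in subscriptions.get(node, set()):
            per_recipient.setdefault(subscriber, []).append(node)
    # Sort each list so the result does not depend on iteration order.
    return {recipient: sorted(nodes) for recipient, nodes in per_recipient.items()}


# Hypothetical subscription table on access device 20a: subscribed node -> subscribers.
SUBSCRIPTIONS: Dict[str, Set[str]] = {"60b": {"50a", "40b"}, "50d": {"50a"}}
```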
  • step S355 the fourth node receives the second notification message, and performs processing operations according to the second notification message.
  • the fourth node may be one or more. When there are multiple fourth nodes, after each fourth node receives the second notification message, it performs processing operations according to the second notification message.
  • a broadcast or multicast second notification message may include the identifier or subnet prefix of the second access device, the identifier of one third node, or the identifiers of multiple third nodes.
  • when the fourth node receives the second notification message, it can obtain the identifier or subnet prefix of the second access device, determine the identifier of the third node according to the identifier or subnet prefix of the second access device, and perform corresponding processing operations according to the identifier of the third node. For example, the fourth node records information about its associated nodes.
  • the associated node may be a node that the fourth node needs to access.
  • the fourth node compares the identifiers of its associated nodes with the identifier or subnet prefix of the second access device, uses the identifier of each matching associated node as the identifier of a third node, determines the type and attribute of the third node according to that identifier, and performs a fault handling operation according to the type and attribute of the third node and the type and attribute of the fourth node.
  • the associated node of the fourth node is recorded in an association table of the fourth node, and the association table records the identification, type and attribute of the associated node of the fourth node.
  • the fourth node may also store the identifier or subnet prefix of the second access device and, when it needs to access a new node, compare the stored identifier or subnet prefix with the identifier of the new node to determine whether a fault handling operation needs to be performed for the new node.
  • when they match, the fourth node determines that it is necessary to perform a fault handling operation for the new node. Then, the fourth node determines the type and attribute of the new node according to the identifier of the new node, and performs a fault handling operation according to the type and attribute of the new node and the type and attribute of the fourth node.
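  • The comparison between associated-node identifiers and a subnet prefix can be sketched with Python's standard `ipaddress` module, under the assumption that node identifiers are IP addresses and that the second notification message carries the prefix in CIDR notation. The addresses below are made up.

```python
import ipaddress
from typing import Iterable, List


def affected_nodes(associated_ips: Iterable[str], failed_prefix: str) -> List[str]:
    """Return the associated nodes whose address falls inside the failed subnet prefix."""
    # strict=False accepts a prefix written with host bits set, e.g. "10.0.3.1/24".
    network = ipaddress.ip_network(failed_prefix, strict=False)
    return [ip for ip in associated_ips if ipaddress.ip_address(ip) in network]


def needs_fault_handling(new_node_ip: str, failed_prefix: str) -> bool:
    """Check a node the fourth node is about to access against a stored prefix."""
    return bool(affected_nodes([new_node_ip], failed_prefix))
```

  • The nodes returned by `affected_nodes` play the role of third nodes, whose type and attribute the fourth node would then look up before choosing a fault handling operation.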
  • when the second notification message is a unicast message, the second notification message includes the identifier of the third node, and the fourth node performs the fault handling operation according to the identifier of the third node. That is, the fourth node obtains the identifier of the third node from the second notification message, and determines that the third node is remotely unreachable according to the status information of the third node. The fourth node determines the type and attribute of the third node according to the identifier of the third node; the fourth node performs a fault handling operation according to the type and attribute of the third node and the type and attribute of the fourth node.
  • the first stage (including steps S310, S315, S320, and S325) and the second stage (including steps S330, S340, S350, and S355) are two independent processes. Therefore, the second stage can be executed before, after, or at the same time as the first stage.
  • the first node, second node, third node, and fourth node in this application are only used to distinguish different functions of nodes in different scenarios, and are not used to distinguish different nodes.
  • For example, the same node, computing node 50a, may be the first node at time T1, when a fault makes it locally unreachable with respect to access device 20a; may be the second node at time T2, after the fault is recovered, when it receives a notification from access device 20a that storage node 40a is locally unreachable; and may be the third node at time T3, when it is remotely unreachable with respect to access device 20b because communication between access device 20a and access device 20b is interrupted.
  • step S325 and step S355 in FIG. 3 of the present application may be executed by the same node, or may be executed by different nodes.
  • the access device in the distributed storage system actively detects whether there is an abnormality in the network, and after detecting the abnormality, sends a notification message to the node (which may be a node that has subscribed to the failure notification or all local nodes).
  • the node that receives the notification message can perform corresponding processing operations.
  • This application does not require nodes to send keep-alive messages to each other, which not only reduces the number of keep-alive messages in the distributed storage system and saves network bandwidth, but also improves the accuracy and reliability of data storage and reading.
  • the exception handling methods provided in the embodiments of this application are respectively introduced from the perspective of nodes and access devices.
  • the nodes and access devices in the embodiments of the present application include hardware structures and/or software modules corresponding to each function.
  • The functions and steps of the examples described in the embodiments disclosed in this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions, but such implementations should not be considered beyond the scope of this application. The following describes the structures of the node and the access device of this application from different perspectives.
  • an embodiment of the present application provides an access device 500.
  • the access device 500 may be the first access device in FIG. 3.
  • the access device 500 may also include other components to implement more functions.
  • the access device 500 includes a detection unit 5031 and a processing unit 5032.
  • the detection unit 5031 is configured to execute steps S310 and/or S340
  • the processing unit 5032 is configured to execute steps S320 and/or S340. Further, the processing unit 5032 may also implement step S330. The functions of the detection unit 5031 and the processing unit 5032 are described in detail below.
  • the detection unit 5031 is configured to detect the state of the first node, the access device 500 is an access device of the first node, and the first node is a storage node, a computing node, or a control node;
  • the processing unit 5032 is configured to send a first notification message to a second node when the state of the first node meets a preset condition, and the second node is a node other than the first node in the distributed storage system.
  • the first notification message includes the identification of the first node and the status information of the first node.
  • the state of the first node that satisfies the preset condition includes that the value of the operating parameter of the first node is equal to or greater than a set threshold, and the state information of the first node includes the value of the operating parameter of the first node.
  • the state of the first node that satisfies a preset condition includes that the first node is locally unreachable, and the state information of the first node includes an identifier indicating that the first node is locally unreachable.
  • the local unreachability of the first node includes: the access device 500 does not receive a keep-alive message from the first node within a set time; or the port through which the access device 500 connects to the first node fails.
  • When sending the first notification message to the second node, the processing unit 5032 is configured to: send the first notification message to the second node according to a query message sent by the second node; or actively send the first notification message to the second node according to a set condition.
  • The distributed storage system further includes a second access device and a third node connected to the second access device, and the processing unit 5032 is further configured to: when the routing information of the third node is deleted or set as invalid, send a second notification message to a fourth node, where the fourth node accesses the access device 500, and the second notification message is used to notify the fourth node that the third node is remotely unreachable.
  • The distributed storage system further includes a second access device, and the processing unit 5032 is further configured to: after the communication between the access device 500 and the second access device is interrupted, send a second notification message to a fourth node, where the fourth node accesses the access device 500, and the second notification message is used to notify the fourth node that the third node connected to the second access device is remotely unreachable.
  • The access device 500 may further include a storage unit 5033 for storing data required when implementing the method shown in FIG. 3. The data may include, for example, a subscription table, a routing table, and the like.
  • the detection unit 5031 and the processing unit 5032 may be implemented by hardware or software.
  • the access device 500 may further include a processor 501, a communication interface 502, and a memory 503.
  • the processor 501, the communication interface 502, and the memory 503 are connected through a bus system 504.
  • the memory 503 is used to store program code, and the program code includes instructions that can implement the functions of the detection unit 5031 and the processing unit 5032.
  • the processor 501 can call the program code in the memory 503 to realize the functions of the detection unit 5031 and the processing unit 5032.
  • the access device 500 may also include a programming interface 505 for writing the program code to the memory 503.
  • The processor 501 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or any conventional processor.
  • the processor 501 may include one or more processing cores.
  • the communication interface 502 is used to communicate with external devices, for example, to cooperate with the program code in the memory 503 to implement receiving and/or sending functions.
  • the communication interface 502 in FIG. 5 is only an example.
  • the access device 500 may include multiple communication interfaces to connect multiple different external devices and communicate with these external devices.
  • the memory 503 may include read-only memory (ROM) or random access memory (RAM). Any other suitable type of storage device can also be used as the memory 503.
  • the memory 503 may include one or more storage devices.
  • the memory 503 may further store an operating system 5034, which is used to support the operation of the access device 500.
  • the bus system 504 may also include a power bus, a control bus, and a status signal bus. However, for clear description, various buses are marked as the bus system 504 in the figure.
  • an embodiment of the present application also provides a terminal device 600.
  • the terminal device 600 may be a node (in this case, the node is a physical device), or a host where the node is located (in this case, the node is a virtual device running on a physical device).
  • the terminal device 600 may have the functions of the second node and/or the fourth node in FIG. 3.
  • the terminal device 600 may also include other components to implement more functions.
  • the terminal device 600 includes a receiving unit 6031 and a processing unit 6032.
  • the receiving unit 6031 is configured to execute steps S320 and/or S350
  • the processing unit 6032 is configured to execute steps S325 and/or S355. Further, the processing unit 6032 may also implement step S315. The functions of the receiving unit 6031 and the processing unit 6032 are described in detail below.
  • The receiving unit 6031 is configured to receive a first notification message from a first access device, where the first notification message is generated when the first access device determines that the state of the first node meets a preset condition, and the first notification message includes the identifier of the first node and the status information of the first node;
  • the processing unit 6032 is configured to perform processing operations according to the first notification message.
  • When the status information of the first node includes an identifier indicating that the first node is locally unreachable, the processing unit 6032 is configured to: obtain the identifier of the first node; determine, according to the status information of the first node, that the first node is locally unreachable; determine the type and attribute of the first node according to the identifier of the first node; and perform a fault handling operation according to the type and attribute of the first node and the type and attribute of the second node.
  • When the status information of the first node includes the value of an operating parameter of the first node, the processing unit 6032 is configured to: obtain the identifier of the first node and the value of the operating parameter of the first node; and, when the value of the operating parameter is greater than or equal to the alarm threshold of the operating parameter and less than the fault threshold of the operating parameter, send an alarm message, where the alarm message includes the identifier of the first node and the value of the operating parameter of the first node.
  • When the value of the operating parameter is greater than the fault threshold of the operating parameter, the processing unit 6032 is configured to: determine the type and attribute of the first node according to the identifier of the first node; and perform a fault handling operation according to the type and attribute of the first node and the type and attribute of the second node.
  • The receiving unit 6031 is further configured to receive a second notification message, where the second notification message is used to notify the second node that the communication of the second access device in the distributed storage system is interrupted or that the third node connected to the second access device is remotely unreachable; the processing unit 6032 is further configured to perform a processing operation according to the second notification message.
  • When the second notification message includes the identifier of the third node and an identifier indicating that the third node is remotely unreachable, the processing unit 6032 is configured to: obtain the identifier of the third node from the second notification message; determine, according to the status information of the third node, that the third node is remotely unreachable; determine the type and attribute of the third node according to the identifier of the third node; and perform a fault handling operation according to the type and attribute of the third node and the type and attribute of the second node.
  • When the second notification message includes the identifier or subnet prefix of the second access device, the processing unit 6032 is configured to: obtain the identifier or subnet prefix of the second access device from the second notification message; determine the identifier of the third node that matches the identifier or subnet prefix of the second access device; determine the type and attribute of the third node according to the identifier of the third node; and perform a fault handling operation according to the type and attribute of the third node and the type and attribute of the second node.
  • Alternatively, when the second notification message includes the identifier or subnet prefix of the second access device, the processing unit 6032 is configured to: obtain the identifier or subnet prefix of the second access device from the second notification message; send the identifier or subnet prefix of the second access device to the storage unit, so that the storage unit stores it; before the second node accesses a new node, compare the identifier of the new node with the identifier or subnet prefix of the second access device; when the identifier of the new node matches the identifier or subnet prefix of the second access device, determine the type and attribute of the new node according to the identifier of the new node; and perform a fault handling operation according to the type and attribute of the new node and the type and attribute of the second node.
  • The terminal device 600 may also include a storage unit 6033 for storing data required to implement the method shown in FIG. 3. The data may include, for example, an association table, the identifier or subnet prefix of the second access device, a node mapping table, and the like.
  • the receiving unit 6031 and the processing unit 6032 may be implemented by hardware or software.
  • the terminal device 600 may further include a processor 601, a communication interface 602, and a memory 603.
  • the processor 601, the communication interface 602, and the memory 603 are connected through a bus system 604.
  • the memory 603 is used to store program code, and the program code includes instructions that can implement the functions of the receiving unit 6031 and the processing unit 6032.
  • the processor 601 can call the program code in the memory 603 to realize the functions of the receiving unit 6031 and the processing unit 6032.
  • the terminal device 600 may further include a programming interface 605 for writing the program code to the memory 603.
  • The processor 601 may be a CPU, or may be another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or any conventional processor.
  • the processor 601 may include one or more processing cores.
  • the communication interface 602 is used to communicate with an external device, for example, to cooperate with the program code in the memory 603 to implement receiving and/or sending functions.
  • the communication interface 602 in FIG. 6 is just an example.
  • the terminal device 600 may include multiple communication interfaces to connect multiple different external devices and communicate with these external devices.
  • the memory 603 may include ROM or RAM. Any other suitable type of storage device can also be used as the memory 603.
  • the memory 603 may include one or more storage devices.
  • the memory 603 may further store an operating system 6034, and the operating system 6034 is used to support the operation of the terminal device 600.
  • the bus system 604 may also include a power bus, a control bus, a status signal bus, and the like. However, for clear description, various buses are marked as the bus system 604 in the figure.
  • The components of the access device 500 and the terminal device 600 provided in the embodiments of this application are only exemplary. Those skilled in the art can add or remove components as needed, or divide the function of one component among multiple components. For the implementation of each function of the access device 500 or the terminal device 600 of this application, refer to the description of each step in FIG. 3.
  • the access device 500 or the terminal device 600 of this application can be implemented by hardware, or can be implemented by means of software plus a necessary general hardware platform.
  • the technical solution of the present application can be embodied in the form of a hardware product or a software product.
  • the hardware product can be a dedicated chip or processor.
  • The software product can be stored in a non-volatile storage medium (for example, a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions. When the software product is executed, it can cause a computer device to execute the method described in each embodiment of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Environmental & Geological Engineering (AREA)
  • Mathematical Physics (AREA)
  • Hardware Redundancy (AREA)

Abstract

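This application discloses a distributed storage system, applied to the field of storage technologies. The distributed storage system includes a first access device and a plurality of nodes, the plurality of nodes including a first node and a second node. The first node accesses the first access device, and the first node is a storage node, a computing node, or a control node. The first access device is configured to: detect the state of the first node and, when the state of the first node meets a preset condition, send a first notification message to the second node, where the first notification message includes the identifier of the first node and the status information of the first node. The second node is configured to: receive the first notification message and perform a processing operation according to the first notification message. This application can improve the exception detection efficiency of the distributed storage system and thereby improve its reliability.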

Description

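Distributed storage system, exception handling method thereof, and related apparatus
This application claims priority to Chinese Patent Application No. 202010538198.4, filed on June 12, 2020 and entitled "Distributed storage system, exception handling method thereof, and related apparatus", which is incorporated herein by reference in its entirety.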
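Technical Field
This application relates to the field of storage technologies, and in particular to a distributed storage system, an exception handling method applied in the distributed storage system, and a related apparatus.
Background
Distributed storage is a data storage technology that stores data dispersedly on storage nodes at different locations. The storage nodes at these locations are interconnected through network devices to transmit data and exchange information. A node that accesses and uses the data in the storage nodes is called a computing node. Storage nodes, network devices, and computing nodes make up a distributed storage system. A distributed storage system may also include a control node to manage the storage nodes and computing nodes.
A distributed storage system has high reliability requirements, and abnormal events such as congestion, link faults, and packet loss all affect its reliability. How to enable nodes to quickly discover and handle exceptions in the distributed storage system has become a problem to be solved in this field.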
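Summary
This application provides a distributed storage system, an exception handling method applied in the distributed storage system, and a related apparatus, so that the nodes in the distributed storage system can quickly discover and handle network exceptions, which improves the exception detection efficiency of the distributed storage system and thereby its reliability.
A first aspect of this application provides a distributed storage system. The distributed storage system includes a first switch and a plurality of nodes, the plurality of nodes including a first node and a second node; the first node accesses the first switch; and the first node is a storage node, a computing node, or a control node. The first switch is configured to: detect the state of the first node and, when the state of the first node meets a preset condition, send a first notification message to the second node, where the first notification message includes the identifier of the first node and the status information of the first node. The second node is configured to: receive the first notification message and perform a processing operation according to the first notification message. The first node and the second node are distinguished by function. There may be one or more first nodes and one or more second nodes.
In this application, the first switch detects the state of the nodes connected to it and, when the state of the first node meets the preset condition, sends the state of the first node to the second node through the first notification message. This avoids the first node and the second node sending keep-alive messages to each other, which reduces the number of keep-alive messages in the network, saves network bandwidth, shortens the time the second node needs to learn the state of the first node, and enhances the reliability of the distributed storage system.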
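Optionally, the state of the first node meeting the preset condition includes the value of an operating parameter of the first node being equal to or greater than a set threshold, and the status information of the first node includes the value of the operating parameter of the first node; or the state of the first node meeting the preset condition includes the first node being locally unreachable, and the status information of the first node includes an identifier indicating that the first node is locally unreachable.
In this application, the first switch not only detects whether the first node is locally unreachable, but can also detect the operating parameters of the first node. In this way, the first switch can send different states of the first node to the second node according to the needs of the second node, so that the second node can respond in a more timely manner, which reduces the impact of state changes of the first node and avoids service interruption.
Optionally, the first node being locally unreachable includes: the first switch does not receive a keep-alive message from the first node within a set time; or the port through which the first switch connects to the first node fails.
In this application, the first switch can detect the state of the first node comprehensively in multiple ways, ensuring that the detected state of the first node is timely and accurate, which improves the reliability of the distributed storage system.
Optionally, when sending the first notification message to the second node, the first switch is configured to: send the first notification message to the second node according to a query message sent by the second node; or actively send the first notification message to the second node according to a set condition. The second node may be a node that has subscribed to the status information of the first node, or may be all nodes in the network.
In this application, the first switch sends the first notification message to the second node in multiple ways, which improves the flexibility of the distributed storage system.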
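Optionally, the distributed storage system further includes a second switch and a third node connected to the second switch; the first switch is further configured to send a second notification message to a fourth node when the routing information of the third node is deleted or set as invalid, where the fourth node accesses the first switch, and the second notification message is used to notify the fourth node that the third node is remotely unreachable.
In this application, the first switch also detects whether the third node connected to the second switch is reachable and, when the third node is unreachable, sends a notification message to the fourth node of the first switch so that the fourth node performs corresponding processing. This prevents the fourth node from communicating with the third node while the third node is unreachable, which improves the reliability of the distributed storage system.
Optionally, the distributed storage system further includes a second switch; the first switch is further configured to send a second notification message to a fourth node after communication with the second switch is interrupted, where the fourth node accesses the first switch, and the second notification message is used to notify the fourth node that the communication of the second switch is interrupted or that all third nodes connected to the second switch are remotely unreachable.
In this application, the first switch also detects whether its communication with the second switch is interrupted and, when that communication is interrupted, sends the second notification message to the fourth node, so that the fourth node avoids communicating with the third nodes connected to the second switch while the communication between the first switch and the second switch is interrupted, which improves the reliability of the distributed storage system.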
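A second aspect of this application provides an exception handling method, performed by the first switch in the distributed storage system of the first aspect. The first switch detects the state of a first node, where the first switch is the switch of the first node, and the first node is a storage node, a computing node, or a control node. When the state of the first node meets a preset condition, the first switch sends a first notification message to a second node, where the second node is a node other than the first node in the distributed storage system, and the first notification message includes the identifier of the first node and the status information of the first node.
Optionally, the state of the first node meeting the preset condition includes the value of an operating parameter of the first node being equal to or greater than a set threshold, and the status information of the first node includes the value of the operating parameter of the first node; or the state of the first node meeting the preset condition includes the first node being locally unreachable, and the status information of the first node includes an identifier indicating that the first node is locally unreachable.
Optionally, the first node being locally unreachable includes: the first switch does not receive a keep-alive message from the first node within a set time; or the port through which the first switch connects to the first node fails. Optionally, sending the first notification message to the second node includes: sending the first notification message to the second node according to a query message sent by the second node; or the first switch actively sending the first notification message to the second node according to a set condition.
Optionally, the distributed storage system further includes a second switch and a third node connected to the second switch. When the routing information of the third node is deleted or set as invalid, the first switch sends a second notification message to a fourth node, where the fourth node accesses the first switch, and the second notification message is used to notify the fourth node that the third node is remotely unreachable.
Optionally, the distributed storage system further includes a second switch. After the communication between the first switch and the second switch is interrupted, the first switch sends a second notification message to a fourth node, where the fourth node accesses the first switch, and the second notification message is used to notify the fourth node that the third node connected to the second switch is remotely unreachable.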
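A third aspect of this application provides an exception handling method, applied in the second node of the distributed storage system of the first aspect. The distributed storage system further includes a first node, the first node accesses a first switch, and the first node is a storage node, a computing node, or a control node. The second node receives a first notification message from the first switch, where the first notification message is generated when the first switch determines that the state of the first node meets a preset condition, and the first notification message includes the identifier of the first node and the status information of the first node. The second node performs a processing operation according to the first notification message.
Optionally, the status information of the first node includes an identifier indicating that the first node is locally unreachable, and the second node performing a processing operation according to the first notification message includes: the second node obtains the identifier of the first node from the first notification message; the second node determines, according to the status information of the first node, that the first node is locally unreachable; the second node determines the type and attribute of the first node according to the identifier of the first node; and the second node performs a fault handling operation according to the type and attribute of the first node and the type and attribute of the second node.
Optionally, the status information of the first node includes the value of an operating parameter of the first node, and the second node performing a processing operation according to the first notification message includes: the second node obtains the identifier of the first node and the value of the operating parameter of the first node from the first notification message; when the value of the operating parameter is greater than or equal to the alarm threshold of the operating parameter and less than the fault threshold of the operating parameter, the second node sends an alarm message, where the alarm message includes the identifier of the first node and the value of the operating parameter of the first node; when the value of the operating parameter is greater than the fault threshold of the operating parameter, the second node determines the type and attribute of the first node according to the identifier of the first node, and performs a fault handling operation according to the type and attribute of the first node and the type and attribute of the second node.
Optionally, the distributed storage system further includes a second switch, and the second node further receives a second notification message, where the second notification message is used to notify the second node that the communication of the second switch is interrupted or that all third nodes connected to the second switch are remotely unreachable; the second node performs a processing operation according to the second notification message.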
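Optionally, the second notification message includes the identifier of the third node and an identifier indicating that the third node is remotely unreachable, and the second node performing a processing operation according to the second notification message includes: the second node obtains the identifier of the third node from the second notification message; the second node determines, according to the status information of the third node, that the third node is remotely unreachable; the second node determines the type and attribute of the third node according to the identifier of the third node; and the second node performs a fault handling operation according to the type and attribute of the third node and the type and attribute of the second node.
In this application, the second node can directly determine the unreachable third node according to the identifier of the third node in the second notification message and avoid communicating with the third node, which can improve the reliability of the distributed storage system.
Optionally, the second notification message includes the identifier or subnet prefix of the second switch, and the second node performing a processing operation according to the second notification message includes: the second node obtains the identifier or subnet prefix of the second switch from the second notification message; the second node determines the identifier of the third node that matches the identifier or subnet prefix of the second switch; the second node determines the type and attribute of the third node according to the identifier of the third node; and the second node performs a fault handling operation according to the type and attribute of the third node and the type and attribute of the second node.
In this application, the first switch only needs to send the second node one second notification message that includes the identifier or subnet prefix of the second switch, and the second node can then act to avoid communicating with all third nodes of the second switch, which improves processing efficiency, saves network bandwidth, and improves network reliability.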
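Optionally, the second notification message includes the identifier or subnet prefix of the second switch, and the second node performing a processing operation according to the second notification message includes: the second node obtains the identifier or subnet prefix of the second switch from the second notification message; the second node stores the identifier or subnet prefix of the second switch; before the second node accesses a new node, the second node compares the identifier of the new node with the identifier or subnet prefix of the second switch; when the identifier of the new node matches the identifier or subnet prefix of the second switch, the second node determines the type and attribute of the new node according to the identifier of the new node; and the second node performs a fault handling operation according to the type and attribute of the new node and the type and attribute of the second node.
In this implementation, the second node does not need to perform the corresponding processing as soon as it receives the identifier or subnet prefix of the second switch. It only needs to determine, when it accesses a new node, whether that new node is a third node with which communication must be avoided, which makes the implementation more flexible and saves processing resources on the second node.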
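The deferred check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes node identifiers are IPv4 addresses and that the second notification message carries a subnet prefix in CIDR form; the class name `UnreachablePrefixCache` and its methods are hypothetical.

```python
import ipaddress


class UnreachablePrefixCache:
    """Stores subnet prefixes reported as unreachable (hypothetical helper)."""

    def __init__(self):
        self._prefixes = set()

    def record(self, prefix: str) -> None:
        # Called when a second notification message arrives: store only,
        # no fault handling is triggered at this point.
        self._prefixes.add(ipaddress.ip_network(prefix))

    def must_avoid(self, node_ip: str) -> bool:
        # Called just before the node accesses a new peer.
        addr = ipaddress.ip_address(node_ip)
        return any(addr in net for net in self._prefixes)
```

A node would call `record` on receipt of the notification and `must_avoid` only when it is about to contact a peer it has not used before, so the matching cost is paid only for peers that are actually accessed.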
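A fourth aspect of this application provides a switch. The switch includes functional modules that perform the exception handling method provided by the second aspect or any possible design of the second aspect. This application does not limit how the functional modules are divided: they may be divided to correspond to the procedure steps of the exception handling method of the second aspect, or according to specific implementation needs.
A fifth aspect of this application provides a node. The node includes functional modules that perform the exception handling method provided by the third aspect or any possible design of the third aspect. This application does not limit how the functional modules are divided: they may be divided to correspond to the procedure steps of the exception handling method of the third aspect, or according to specific implementation needs.
A sixth aspect of this application provides a host. A node runs on the host, and the host includes a memory, a processor, and a communication interface. The memory is used to store computer program code and data, and the processor is used to call the computer program code and, in combination with the data, cause the node to implement the exception handling method in the third aspect of this application and any possible design thereof.
A seventh aspect of this application provides a chip. When the chip runs, it can implement the exception handling method in the second aspect of this application and any possible design thereof, and the exception handling method in the third aspect of this application and any possible design thereof.
An eighth aspect of this application provides a storage medium that stores program code. When the program code runs, it can cause the device running it (a switch, a server, a terminal device, or the like) to implement the exception handling method in the second aspect of this application and any possible design thereof, and the exception handling method in the third aspect of this application and any possible design thereof.
For the beneficial effects of the second to eighth aspects of this application, refer to the description of the beneficial effects of the first aspect and its possible designs; details are not repeated here.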
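Brief Description of Drawings
FIG. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of this application;
FIG. 2 is a schematic diagram of fault locations in the distributed storage system according to an embodiment of this application;
FIG. 3 is a schematic flowchart of the exception handling method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a subscription table according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of an access device 500 according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a terminal device 600 according to an embodiment of this application.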
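Description of Embodiments
In the embodiments of this application, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, words such as "exemplary" or "for example" are intended to present a related concept in a concrete manner.
In the embodiments of this application, unless otherwise specified, "a plurality of" means two or more. For example, a plurality of nodes means two or more nodes. "At least one" means any quantity, for example, one, two, or more. "A and/or B" may be only A, only B, or both A and B. "At least one of A, B, and C" may be only A, only B, only C, A and B, B and C, A and C, or A, B, and C. Terms such as "first" and "second" in this application are used only to distinguish different objects, and do not indicate the priority or importance of the objects.
The embodiments of this application enable the nodes in a distributed storage system to quickly perceive and handle exceptions in the distributed storage system. The exceptions include network operation anomalies such as congestion, packet loss, excessive CPU usage, and excessive latency, as well as network faults such as port failure and service interruption. To make the objectives, technical solutions, and advantages of this application clearer, the following describes this application in further detail with reference to the accompanying drawings.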
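FIG. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of this application. The distributed storage system of this application includes access devices 20, computing nodes 50, and storage nodes 40. The storage nodes 40 and computing nodes 50 may be distributed on one or more hosts 10. Multiple distributed storage systems can make up a larger system. For example, distributed storage system 100a includes computing node 50a deployed on host 10a, storage nodes 40a-40c deployed on host 10b, and access device 20a, which connects hosts 10a and 10b. A distributed storage system may also include a control node 60 for managing the storage nodes 40 and computing nodes 50. For example, distributed storage system 100b includes, in addition to the computing nodes and storage nodes deployed on hosts 10c and 10d and access device 20b, control node 60a deployed on host 10d. Distributed storage systems 100a, 100b, and 100c are communicatively connected through backbone devices 30a and 30b to form a larger distributed storage system 1000.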
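In the distributed storage system of this application, storage nodes 40, computing nodes 50, and control nodes 60 may be collectively referred to as nodes. A host 10 is the carrier of nodes, and one host may carry one or more nodes. The nodes carried by one host may include one or more of computing nodes 50, storage nodes 40, and control nodes 60. A host 10 may be a physical server, a virtual server, a workstation, a mobile station, a general-purpose computer, or another device that can carry nodes. A storage node 40 is used to store data, which may be any digitized information, for example, information generated when a user uses a network application, files stored by a user, or network configuration. A computing node 50 is used to obtain data from a storage node 40 and process services based on the obtained data. An access device 20 is used to connect different hosts and forward the data sent between nodes on the hosts. A backbone device 30 is used to connect different access devices to expand the scale of the distributed storage system. The access device 20 may be a layer-2 switch or a layer-3 switch, and the backbone device 30 may be a layer-3 switch or a router. To improve system reliability, a backup node may be deployed for a node. For example, if computing nodes 50a and 50b can both provide computing function A, computing node 50a may serve as the primary computing node of computing function A and computing node 50b as its backup computing node. As another example, if storage node 40a and storage node 40g both store data B, storage node 40a may serve as the primary storage node of data B and storage node 40g as its backup storage node. As yet another example, control node 60a is the primary control node and control node 60b is the backup control node. The distributed storage system may also include no control node, in which case a network management system may manage the storage nodes 40 and computing nodes 50.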
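A distributed storage system may fail during operation. Taking the process in which computing node 50a reads data from storage node 40g as an example, the locations where the distributed storage system shown in FIG. 1 may fail (marked with X) are shown in FIG. 2. That is, a fault may occur between any two devices; the fault may be a port fault on one of the two devices or a link fault. Typically, nodes in a distributed storage system detect faults by sending keep-alive packets to each other, so as to avoid sending requests to a node that has lost its connection. However, as FIG. 2 shows, the path between two nodes may be long, and detecting faults by sending keep-alive packets between nodes leaves a large number of keep-alive packets in the network, wasting network and system resources. Moreover, to prevent heartbeat packets from going unanswered in time because a host is busy or for similar reasons, a long heartbeat wait time is usually set, which leads to a long fault convergence time and, in turn, service interruption. Further, any node or device in the distributed storage system may, while running, experience events that affect network performance, such as packet loss, increased latency, excessive CPU utilization, and an increased bit error rate. If the relevant nodes are not notified of these events in time, they will keep communicating with the node affected by the exception, causing service delay or interruption. In this application, network faults and events that affect network performance are collectively referred to as exceptions.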
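To avoid the network performance degradation or service interruption caused by network exceptions, an embodiment of this application provides, on the basis of the distributed storage system shown in FIG. 1, an exception handling method in which the access device of a node performs exception detection, so as to improve the efficiency of exception discovery and exception handling in the distributed storage system and save network resources. In the following embodiments, for ease of description, the distributed storage system shown in FIG. 1 is abstracted as including a first access device and a plurality of nodes, where the plurality of nodes include first-type nodes and second-type nodes, and a first-type node accesses the first access device (that is, the first access device is the access device of the first-type node). Further, the distributed storage system also includes a second access device, third-type nodes, and fourth-type nodes, where a third-type node accesses the second access device and a fourth-type node accesses the first access device. The first-type, second-type, third-type, and fourth-type nodes in this application are distinguished by function and do not refer to specific nodes; for example, a first-type node in one scenario may be a second-type node in another scenario. Each type may include one or more nodes. For convenience of description, this application refers to the first-type, second-type, third-type, and fourth-type nodes simply as the first node, the second node, the third node, and the fourth node, respectively. The functions of the first access device, the second access device, the backbone device, and the nodes are described below with reference to FIG. 3. As shown in FIG. 3, the exception handling method provided in the embodiments of this application includes steps S300 to S355. Any of steps S300 to S355 may be omitted as the scenario requires; that is, the exception handling method provided in the embodiments of this application does not require that all of steps S300 to S355 be performed.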
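In step S300, connections are established between nodes and their corresponding access devices. Each node needs to establish a connection with its corresponding access device when it joins the network. FIG. 3 takes as an example the first node connecting to the first access device and the third node accessing the second access device. The first node may be a storage node, a computing node, or a control node. The second node may be a storage node, computing node, or control node that has subscribed to the status information of the first node, or may be any node other than the first node, as described later with specific scenarios. The first access device may be any one of access devices 20a, 20b, and 20c in FIG. 1, and the second access device is any access device other than the first access device.
In step S310, the first access device determines whether the state of the first node meets a preset condition. When the state of the first node does not meet the preset condition, the first access device may perform step S310 again after waiting for a period of time, or may perform it again immediately. When the state of the first node meets the preset condition, the first access device performs step S320.
The preset condition may be that the value of an operating parameter of the first node is equal to or greater than a set threshold, or that the first node is locally unreachable. When the state of the first node meets the preset condition, the first node can be regarded as an abnormal node.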
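While the first access device and the first node are communicating continuously, the first access device can obtain the operating parameters of the first node. These may include, for example, one or more of packet loss rate, latency, bit error rate, CPU utilization, and network adapter status. The value of an operating parameter indicates the running state of the first node, for example, whether it is running normally (the value is less than the set threshold) or abnormally (the value is equal to or greater than the set threshold). The first access device may monitor or collect the values of the operating parameters of the first node, determine whether a value is equal to or greater than the set threshold, and, if so, send the first notification message to the second node.
When communication between the first access device and the first node is interrupted, or when the route to the first node stored on the first access device becomes invalid, the first access device determines that there is a fault making the first node locally unreachable. In this application, a node being "locally unreachable" means that data cannot be forwarded to the node through the node's access device. For example, if a fault on the link between computing node 50a and access device 20a prevents other nodes from sending messages to computing node 50a through access device 20a, computing node 50a is locally unreachable with respect to access device 20a. Communication between the first access device and the first node being interrupted may mean, for example, that the port through which the first access device connects to the first node has failed, or that the first access device has not received a keep-alive message from the first node within a set time (for example, every 5 minutes, every N configured keep-alive periods, or at a set point in time). The port through which the first access device connects to the first node having failed means that this port is in a fault state (for example, down). Many situations can cause this, for example, a fault on the port of the first access device that connects to the first node, a fault on the port of the first node that connects to the first access device, a cable fault between the first access device and the first node, the first node losing power, the first node being reset, or an optical module fault on the first access device.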
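The two local-unreachability conditions in this paragraph (port down, or no keep-alive within the set time) can be sketched as a single predicate. This is only an illustration under assumptions: the `PortState` record, its field names, and the use of plain timestamps in seconds are hypothetical and not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class PortState:
    link_up: bool          # False when the port toward the node is down
    last_keepalive: float  # timestamp (seconds) of the last keep-alive received


def is_locally_unreachable(port: PortState, now: float, timeout: float) -> bool:
    """Return True if either condition described in the text holds."""
    if not port.link_up:
        return True  # port failure
    # No keep-alive message received within the set time.
    return (now - port.last_keepalive) > timeout
```

An access device would evaluate this predicate for each attached node as part of step S310 and proceed to step S320 when it returns True.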
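In step S320, the first access device sends a first notification message to the second node, where the first notification message includes the identifier of the first node and the status information of the first node.
In this application, the first notification message refers to a type of message that the first access device sends when it finds that the state of a first node accessing it meets the preset condition; there may be one or more first notification messages. The first notification message may include the identifier of the first node. In one implementation, the status information of the first node includes an identifier indicating that the first node is locally unreachable, which may specifically be a field or a flag bit. The first notification message may also include a fault location, a fault time, and the like. The fault location in the first notification message may specifically be the identifier of the first node or the identifier of the port connected to the first node. The identifier of the first node may be its IP address, MAC address, device name, device identifier (ID), or other information that uniquely identifies the node. When the first notification message is an operating parameter notification message, it also includes the operating parameter and the value of the operating parameter.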
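One possible in-memory shape for the first notification message is sketched below. The application does not define a wire format, so the class `FirstNotification`, its field names, and the two helper constructors are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FirstNotification:
    node_id: str                          # IP, MAC, device name, or device ID
    locally_unreachable: bool = False     # the field or flag bit from the text
    run_params: Dict[str, float] = field(default_factory=dict)
    fault_location: Optional[str] = None  # node identifier or port identifier
    fault_time: Optional[float] = None


def unreachable_notice(node_id: str, port_id: str, when: float) -> FirstNotification:
    # Variant carrying the "locally unreachable" indication.
    return FirstNotification(node_id, locally_unreachable=True,
                             fault_location=port_id, fault_time=when)


def run_param_notice(node_id: str, name: str, value: float) -> FirstNotification:
    # Variant carrying an operating parameter and its value.
    return FirstNotification(node_id, run_params={name: value})
```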
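In one implementation, step S320 may be that the first access device actively sends the first notification message to the second node according to a set condition. The second node may be a node that has subscribed to the exception information of the first node, or a node determined by the first access device according to certain rules. A rule may be determined according to node type; for example, when the abnormal node is a primary computing node, notification messages are sent to all backup computing nodes of that primary computing node. The second node may also be all nodes other than the first node.
In the scenario where the second node is a node that has subscribed to the exception information of the first node, the first access device (for example, access device 20a) may store the subscription table shown in FIG. 4. The subscription table records information about each node connected to access device 20a and the nodes it subscribes to. The subscription table is only an example, and its content may be removed or extended as needed. The first access device may actively send the first notification message to the second node according to the subscription table. For example, if computing node 50a has subscribed to information about storage node 40a, then after detecting that storage node 40a is locally unreachable or that the value of an operating parameter of storage node 40a meets the preset condition, access device 20a sends the first notification message to computing node 50a.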
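The subscription-driven push can be sketched as a lookup from the abnormal node to its subscribers. The layout of the table in FIG. 4 is not reproduced in this text, so the structure below (a map from monitored node to a set of subscriber identifiers) and the names `SubscriptionTable`, `subscribe`, and `recipients` are assumptions.

```python
from collections import defaultdict


class SubscriptionTable:
    """Maps a monitored node to the nodes that subscribed to it (illustrative)."""

    def __init__(self):
        self._subs = defaultdict(set)

    def subscribe(self, subscriber: str, target: str) -> None:
        self._subs[target].add(subscriber)

    def recipients(self, abnormal_node: str) -> list:
        # Nodes that should receive the first notification message.
        return sorted(self._subs.get(abnormal_node, ()))
```

With computing node 50a subscribed to storage node 40a, `recipients("storage-40a")` yields the single destination for the first notification message, matching the example in the text.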
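In another implementation, step S320 may be that the first access device receives a query message sent by the second node and sends the first notification message to the second node according to the query message. In this scenario, step S315 may precede step S320: the second node sends a query message to the first access device. The query message is used to obtain the status information of the first node from the first access device.
Optionally, the second node stores information about its target node (for example, the first node) and the access device of the target node (for example, the first access device). The second node may send a query message to the access device of the target node as needed or according to a set condition. The query message includes the identifier of the target node and is used to obtain the status information of the target node from the access device of the target node.
In step S325, the second node performs a processing operation according to the first notification message.
In one implementation, the second node obtains the status information of the first node from the first notification message and performs a processing operation according to that status information. As described above, the status information of the first node includes the value of an operating parameter of the first node or the identifier indicating that the first node is locally unreachable.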
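In an optional implementation, when the status information of the first node includes the identifier indicating that the first node is locally unreachable, the second node obtains the identifier of the first node, determines according to the status information of the first node that the first node is locally unreachable, determines the type and attribute of the first node according to the identifier of the first node, and then performs a fault handling operation according to the type and attribute of the first node and the type and attribute of the second node. Because the first node is locally unreachable, the second node can conclude that a network fault has occurred and therefore needs to perform a fault handling operation.
In the embodiments of this application, the type of a node indicates whether the node is a control node, a computing node, or a storage node, and the attribute of a node indicates whether the node is a primary node or a backup node.
For example, the first notification message may include the identifier of the first node. The second node may look up a node mapping table according to the identifier of the first node to determine the type and attribute of the first node, and then perform a processing operation according to the type and attribute of the first node and the type and attribute of the second node. Optionally, the second node stores the node mapping table, each entry of which records the correspondence among a node identifier, a node type, and a node attribute. The identifier of a node may be its IP address, MAC address, device name, device identifier, or other information that uniquely identifies the node.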
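A node mapping table of the kind described here can be sketched as a dictionary keyed by node identifier. The enum names, the `NodeInfo` tuple, and the sample addresses are hypothetical; only the three columns (identifier, type, attribute) come from the text.

```python
from enum import Enum
from typing import NamedTuple, Optional


class NodeType(Enum):
    CONTROL = "control"
    COMPUTE = "compute"
    STORAGE = "storage"


class Role(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"


class NodeInfo(NamedTuple):
    node_type: NodeType
    role: Role


# Illustrative entries: identifier -> (type, attribute).
NODE_MAP = {
    "10.0.1.11": NodeInfo(NodeType.COMPUTE, Role.PRIMARY),
    "10.0.1.21": NodeInfo(NodeType.STORAGE, Role.BACKUP),
}


def lookup(node_id: str) -> Optional[NodeInfo]:
    """Resolve the type and attribute of a node from its identifier."""
    return NODE_MAP.get(node_id)
```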
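When the first node is a computing node, step S325 has the following possible implementations:
(1) If the first node is the primary computing node (for example, computing node 50a) of computing function A (computing function A is used in this application to denote any computing function), and the second node is a backup computing node (for example, computing node 50b) of computing function A, the second node switches its own attribute to primary computing node of computing function A.
(2) If the first node is the primary computing node of computing function A (for example, computing node 50a), and the second node is a control node (for example, control node 60a), the control node changes, according to the first notification message, the attribute of the backup computing node of computing function A (for example, computing node 50b) from backup to primary and notifies the backup computing node, so that the backup computing node performs a primary/backup switchover (that is, the backup computing node changes its own attribute from backup to primary).
(3) If the first node is a backup computing node of computing function A (for example, computing node 50b), and the second node is a control node (for example, control node 60a), the control node updates the network topology according to the first notification message (changes the attribute of the backup computing node to unavailable or deletes the backup computing node).
Implementation (1) above usually occurs in scenarios without a control node, and implementation (2) usually occurs in scenarios with a control node. Both (1) and (2) may also be performed when a control node exists. In that case, the backup computing node of computing function A may switch to primary computing node of computing function A either according to the first notification message sent by the first access device or according to the notification from the control node; preferably, it performs the switchover according to whichever message arrives first.
When the first node is a backup computing node of computing function A, the first access device may send a notification message neither to the primary computing node of computing function A nor to the storage nodes that store data B, which computing function A needs to access, so as to avoid too many messages in the distributed storage system.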
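Cases (1) to (3) above amount to a decision on the attribute of the failed computing node and the type and attribute of the receiving node. The sketch below encodes that decision; the function name, the string labels, and the returned action names are hypothetical and serve only to make the branching explicit.

```python
def handle_compute_failure(failed_role: str, my_type: str, my_role: str,
                           same_function: bool = True) -> str:
    """Return the action the second node takes when a computing node is unreachable.

    failed_role: "primary" or "backup" (attribute of the first node)
    my_type:     "compute", "storage", or "control" (type of the second node)
    my_role:     "primary" or "backup" (attribute of the second node)
    """
    if (failed_role == "primary" and my_type == "compute"
            and my_role == "backup" and same_function):
        return "promote_self_to_primary"      # case (1)
    if failed_role == "primary" and my_type == "control":
        return "promote_backup_and_notify"    # case (2)
    if failed_role == "backup" and my_type == "control":
        return "update_topology"              # case (3)
    return "ignore"
```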
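When the first node is a storage node, step S325 has the following possible implementations:
(4) If the first node is the primary storage node of data B (for example, storage node 40a), and the second node is the primary computing node of computing function A, which accesses data B (for example, computing node 50a), the second node switches its read/write interface to the backup storage node of data B (for example, storage node 40f) so as to obtain data B from the backup storage node. Each computing node stores the storage nodes corresponding to each computing function and the attribute of each storage node. One computing function may correspond to multiple storage nodes. The attribute of a storage node is primary or backup.
(5) If the first node is the primary storage node of data B (for example, storage node 40a), and the second node is a control node (for example, control node 60a), the control node, according to the first notification message, deletes the primary storage node of data B or modifies its attribute, determines a new primary storage node for data B (for example, storage node 40g), and notifies the primary and backup computing nodes of computing function A, which uses data B, of the information about the new primary storage node.
(6) If the first node is a backup storage node of data B (for example, storage node 40f), and the second node is a control node (for example, control node 60a), the control node deletes the backup storage node or changes its attribute to unavailable according to the first notification message. The control node then selects a new backup storage node for data B (for example, storage node 40e) and notifies the primary storage node of the new backup storage node, so that the primary storage node sends data B to the new backup storage node.
(7) If the first node is a backup storage node of data B (for example, storage node 40f), and the second node is the primary storage node of data B (for example, storage node 40a), the primary storage node stops sending data B to that backup storage node according to the first notification message.
In an implementation, to avoid too many packets in the distributed storage system, when a primary storage node fails, its access device may refrain from sending the first notification message to the backup computing nodes of the computing functions that need to access the data stored on that primary storage node. When a backup storage node fails, its access device may refrain from sending the first notification message to the computing nodes (whether primary or backup) of the computing functions that need to access the data stored on that backup storage node.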
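Case (4), in which a computing node redirects its read/write interface to a backup storage node, can be sketched as below. The class `DataReader`, its method name, and the node labels are hypothetical; the sketch assumes the computing node already holds an ordered list of backup storage nodes for the data it uses.

```python
class DataReader:
    """Tracks which storage node a computing node reads data B from (illustrative)."""

    def __init__(self, primary: str, backups):
        self.active = primary
        self._backups = list(backups)

    def on_first_notification(self, unreachable_id: str):
        # Only a failure of the node currently in use triggers a switch.
        if unreachable_id == self.active:
            self.active = self._backups.pop(0) if self._backups else None
        return self.active
```

Notifications about storage nodes the reader is not using leave `active` unchanged, which is consistent with the text's point that such notifications need not be sent to computing nodes at all.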
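When the first node is a control node, step S325 has the following possible implementations:
(8) If the first node is the primary control node, and the second node is a computing node or storage node managed by the primary control node, the second node stops sending information to the primary control node according to the first notification message and, after receiving a message sent by a new primary control node, registers with the new primary control node.
(9) If the first node is the primary control node, and the second node is the backup control node of the primary control node, the second node, after receiving the first notification message, switches itself to primary control node and then sends a notification message to the nodes it manages (including computing nodes and storage nodes), so that the managed nodes register with the second node.
When the first node is a backup control node, the first access device may refrain from sending a notification message to computing nodes or storage nodes and send a notification message only to a management device or administrator, so that the management device or administrator designates or deploys a new backup control node.
The distributed storage system shown in FIG. 1 may include no control node, in which case the processes above that involve a control node need not be performed.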
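In another optional implementation, when the status information of the first node includes the value of an operating parameter of the first node, the second node obtains the identifier of the first node and the value of the operating parameter. When the value of the operating parameter is greater than or equal to the alarm threshold of the operating parameter and less than the fault threshold of the operating parameter, the second node sends an alarm message, where the alarm message includes the identifier of the first node and the value of the operating parameter of the first node. The second node may send the alarm message to a network management system or a controller.
In yet another optional implementation, when the status information of the first node includes the value of an operating parameter of the first node, the second node obtains the identifier of the first node and the value of the operating parameter. When the value of the operating parameter is greater than the fault threshold of the operating parameter, the second node determines the type and attribute of the first node according to the identifier of the first node, and then performs a fault handling operation according to the type and attribute of the first node and the type and attribute of the second node. In this scenario, for the fault handling operation performed by the second node, refer to the fault handling operation of the second node when the status information of the first node includes the identifier indicating that the first node is locally unreachable.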
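The two-threshold rule in the preceding paragraphs can be sketched as a small classifier. The text specifies an alarm for values from the alarm threshold up to (but excluding) the fault threshold and fault handling for values above the fault threshold; it does not say what happens when the value equals the fault threshold exactly, so treating that case as a fault below is an assumption, as are the function and label names.

```python
def classify(value: float, alarm_threshold: float, fault_threshold: float) -> str:
    """Classify an operating parameter value reported in a first notification."""
    if value >= fault_threshold:
        # Assumption: equality with the fault threshold is treated as a fault.
        return "fault"
    if value >= alarm_threshold:
        return "alarm"   # send an alarm message, no fault handling yet
    return "normal"
```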
进一步地,在步骤S330中,第一接入设备与第二接入设备之间互相发送保活消息,以使第一接入设备确定与第二接入设备的连接是否中断。
其中,步骤S330可以在步骤S310之前执行,也可以在步骤S310之后执行,也可以不执行。
在步骤S340中,第一接入设备确定是否存在使第三节点远端不可达的故障。当不存在使第三节点远端不可达的故障时,第一接入设备可以在等待一段时间后继续执行步骤S340,当存在使第三节点远端不可达的故障时,执行步骤S350。其中,第三节点是指除第一接入设备外的第二接入设备的本地节点,即部署在第二接入设备直接相连的主机上的节点。本申请中,某个节点“远端不可达”是指一个接入设备不能通过该节点的接入设备访问该节点。例如,接入设备20a和接入设备20c之间的链路存在故障,导致接入设备20a不能向接入节点20c本地的计算节点50d和50e,存储节点40f和40g,控制节点60b发送消息,则计算节点50d和50e,存储节点40f和40g,控制节点60b相对于接入设备20a远端不可达。
在步骤S340的第一种实施方式中,第一接入设备可以根据步骤S330中是否接收到第二接入设备发送的保活消息来判断第二接入设备所连接的节点是否远端不可达,如果在设定的时间或设定的周期内没有接收到第二接入设备发送的保活消息,则第一接入设备确定与第二接入设备之间的通信中断,即确定第二接入设备所连接的第三节点远端不可达。
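In a second implementation of step S340, the first access device may monitor the routing table. If the second access device, which was previously reachable by route, becomes unreachable (for example, the routing information for the subnet corresponding to the second access device is deleted or set to invalid), the first access device determines that communication with the second access device is interrupted.
In the first and second implementations above, an interruption of communication between the first access device and the second access device means that the third nodes connected to the second access device are remotely unreachable. In this case, the third node refers to all nodes connected to the second access device. Assume that the first access device is access device 20a and the second access device is access device 20c. An interruption of communication between them means that access device 20a can communicate with access device 20c neither through backbone device 30a nor through backbone device 30b; consequently, the nodes connected to access device 20a cannot access any node of access device 20c.
In an implementation, the access mapping table on each access device stores the correspondence between the nodes in the entire distributed storage system and their access devices. Therefore, after communication with the second access device is interrupted, the first access device can determine the remotely unreachable third nodes from the access mapping table.
In a third implementation of step S340, the first access device may monitor the routing table. If a third node that was previously reachable by route becomes unreachable (that is, the routing table entry corresponding to the third node is deleted or set to invalid), the first access device determines that the third node is remotely unreachable. In this case, the third node may be one node, multiple nodes, or all nodes corresponding to a subnet.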
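The routing-table implementations, combined with the access mapping table, might look roughly like the sketch below. It models the routing table as a set of reachable destination identifiers and the access mapping table as a dictionary from node to access device; both representations and the function name `remotely_unreachable` are simplifying assumptions.

```python
def remotely_unreachable(old_routes, new_routes, access_map):
    """Return the third nodes that became remotely unreachable.

    old_routes / new_routes: sets of reachable destinations (access devices or nodes).
    access_map: node id -> id of the access device the node is attached to.
    """
    lost = old_routes - new_routes
    nodes = set()
    for dest in lost:
        if dest in access_map:
            # Third implementation: the route of an individual node disappeared.
            nodes.add(dest)
        else:
            # Second implementation: the route of an access device disappeared,
            # so every node attached to it is remotely unreachable.
            nodes.update(n for n, dev in access_map.items() if dev == dest)
    return sorted(nodes)


access_map = {"50d": "20c", "50e": "20c", "40f": "20c",
              "40g": "20c", "60b": "20c", "40d": "20b"}
unreachable = remotely_unreachable({"20b", "20c", "40d"}, {"20b", "40d"}, access_map)
```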
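In step S350, the first access device sends a second notification message to a fourth node. The second notification message notifies the fourth node that communication with the second access device is interrupted or that the third node connected to the second access device is remotely unreachable.
In this application, the second notification message is a message sent by an access device when it finds that a node is remotely unreachable; there may be one or more second notification messages. Optionally, a second notification message may include the fault location, the fault time, and the like.
When the second notification message notifies that communication with the second access device is interrupted, the fault location may be the identifier of the second access device (for example, its IP address, its MAC address, or the identifier of the path it is on), the subnet prefix of the second access device, or the like. Accordingly, the second notification message may include the identifier or subnet prefix of the second access device, and may also include an identifier indicating that communication with the second access device is interrupted.
When the second notification message notifies that a third node attached to the second access device is remotely unreachable, the fault location may be the identifiers of one or more third nodes. Such an identifier may be the third node's IP address, MAC address, device name, device identifier, or other information that uniquely identifies the third node. Accordingly, the second notification message may include the identifier of the third node and an identifier indicating that the third node is remotely unreachable.
When communication between the first access device and the second access device is interrupted, using the identifier or subnet prefix of the second access device as the fault location and sending the second notification message by broadcast or multicast reduces the number of second notification messages that the first access device sends and saves network bandwidth. In this case, the fourth nodes may be all local nodes of the first access device.
The first access device may also generate different second notification messages based on the type and attribute of each third node and send them to different fourth nodes. In this case, a fourth node may be a local node of the first access device or a local node of another access device, and there may be one or more fourth nodes. The fourth nodes may include the first node. Optionally, the first access device may find all third nodes from the stored access mapping table, determine the type and attribute of each third node from the node mapping table, and generate different second notification messages based on the type and attribute of each third node.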
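In an implementation, access device 20a in FIG. 1 is the first access device and access device 20c is the second access device. Access device 20a detects that it has not received a keepalive message from access device 20c within the set time, determines that communication with access device 20c is interrupted, and can further determine that the local nodes of access device 20c, namely compute nodes 50d and 50e, storage nodes 40f and 40g, and control node 60b, are remotely unreachable relative to access device 20a. Access device 20a may generate one second notification message that includes the identifier (for example, the IP address) or subnet prefix of access device 20c and an identifier indicating that communication between access device 20a and access device 20c is interrupted, and then send that second notification message by unicast, broadcast, or multicast to all of its local nodes, namely compute node 50a and storage nodes 40a to 40c. Access device 20a may also generate different second notification messages based on the information it stores about the local nodes of access device 20c and on the subscription table it stores. For example, when access device 20a determines that compute nodes 50d and 50e, storage nodes 40f and 40g, and control node 60b are remotely unreachable, it may, according to the subscription table, send compute node 50a and storage node 40b a second notification message indicating that control node 60b is remotely unreachable, and may send compute node 50a a second notification message indicating that compute node 50d is remotely unreachable.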
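The subscription-based variant of this example can be sketched as a lookup that intersects each subscriber's watched nodes with the set of remotely unreachable nodes. The layout of the subscription table (subscriber mapped to a set of node identifiers), the message fields, and the function name are assumptions; the application does not specify a message format.

```python
def build_second_notifications(unreachable, subscriptions):
    """Return one second notification message per affected subscriber.

    unreachable: iterable of node ids that are remotely unreachable.
    subscriptions: subscriber id -> set of node ids it subscribed to (assumed layout).
    """
    lost = set(unreachable)
    messages = {}
    for subscriber, watched in subscriptions.items():
        hit = sorted(watched & lost)
        if hit:  # subscribers with no affected node receive nothing
            messages[subscriber] = {"type": "remote-unreachable", "nodes": hit}
    return messages


subscriptions = {"50a": {"60b", "50d"}, "40b": {"60b"}, "40c": {"40a"}}
messages = build_second_notifications(["50d", "50e", "40f", "40g", "60b"], subscriptions)
```

Compared with broadcasting one message that carries the subnet prefix of access device 20c, this approach sends more messages but spares each recipient the work of matching the prefix against its own associated nodes.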
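In step S355, the fourth node receives the second notification message and performs a processing operation based on it.
There may be one or more fourth nodes. When there are multiple fourth nodes, each fourth node performs a processing operation based on the second notification message after receiving it.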
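When the second notification message is a broadcast or multicast message, it may include the identifier or subnet prefix of the second access device, the identifier of one third node, or the identifiers of multiple third nodes. After receiving the second notification message, the fourth node may obtain the identifier or subnet prefix of the second access device, determine the identifier of the third node from it, and perform the corresponding processing operation based on the identifier of the third node. For example, the fourth node records information about its associated nodes, where an associated node may be a node that the fourth node needs to access. The fourth node compares the identifiers of the associated nodes with the identifier or subnet prefix of the second access device, takes the identifier of each matching associated node as the identifier of a third node, determines the type and attribute of the third node from that identifier, and performs a fault handling operation based on the type and attribute of the third node and the type and attribute of the fourth node. Optionally, the associated nodes of the fourth node are recorded in an association table of the fourth node, which records the identifier, type, and attribute of each associated node.
In another implementation, the fourth node may also store the identifier or subnet prefix of the second access device and, when it needs to access a new node, compare the stored identifier or subnet prefix with the identifier of the new node to determine whether a fault handling operation needs to be performed for the new node. When the identifier of the new node matches the identifier or subnet prefix of the second access device, the fourth node determines that a fault handling operation needs to be performed for the new node. The fourth node then determines the type and attribute of the new node from the identifier of the new node and performs the fault handling operation based on the type and attribute of the new node and the type and attribute of the fourth node.
When the second notification message is a unicast message, it includes the identifier of the third node, and the fourth node performs the fault handling operation based on that identifier. That is, the fourth node obtains the identifier of the third node from the second notification message, determines from the status information of the third node that the third node is remotely unreachable, determines the type and attribute of the third node from the identifier of the third node, and performs the fault handling operation based on the type and attribute of the third node and the type and attribute of the fourth node.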
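The prefix matching performed by the fourth node can be sketched with Python's standard `ipaddress` module, assuming that node identifiers are IP addresses and that the fault location is a subnet prefix. The association table contents, the addresses, and the function names below are made up for illustration.

```python
import ipaddress


def affected_nodes(association_table, failed_prefix):
    """Return the associated nodes whose address falls inside the failed prefix."""
    net = ipaddress.ip_network(failed_prefix)
    return [name for name, addr in association_table.items()
            if ipaddress.ip_address(addr) in net]


def needs_fault_handling(new_node_addr, stored_prefixes):
    """Check a node that is about to be accessed against the stored failed prefixes."""
    addr = ipaddress.ip_address(new_node_addr)
    return any(addr in ipaddress.ip_network(p) for p in stored_prefixes)


# Hypothetical association table of a fourth node: node id -> IP address.
association_table = {"40f": "10.0.3.11", "60b": "10.0.3.20", "40a": "10.0.1.5"}
third_nodes = affected_nodes(association_table, "10.0.3.0/24")
blocked = needs_fault_handling("10.0.3.42", ["10.0.3.0/24"])
```

The first function corresponds to matching the associated nodes when the notification arrives; the second corresponds to the variant in which the prefix is stored and checked only when a new node is about to be accessed.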
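For the fault handling operation performed by the fourth node, refer to the fault handling process of the second node.
In this application, the first phase (steps S310, S315, S320, and S325) and the second phase (steps S330, S340, S350, and S355) are two independent processes. The second phase may therefore be performed before, after, or concurrently with the first phase.
The terms first node, second node, third node, and fourth node in this application merely distinguish the roles that a node plays in different scenarios; they do not distinguish different nodes. The same node, for example compute node 50a, may fail at time T1 and thus be the first node, locally unreachable to access device 20a; at time T2, after recovering, it may be the second node that receives from access device 20a a notification that storage node 40a is locally unreachable; at time T3, because communication between access device 20a and access device 20b is interrupted, it may be treated by access device 20b as a remotely unreachable third node; and at time T4, because communication between access device 20a and access device 20c is interrupted, it may be the fourth node that receives from access device 20a a notification that compute node 50d is remotely unreachable. For these reasons, steps S325 and S355 in FIG. 3 of this application may be performed by the same node or by different nodes.
In the embodiments of this application, the access devices in the distributed storage system actively detect whether an exception exists in the network and, after detecting one, send notification messages to nodes (either the nodes that subscribed to fault notifications or all local nodes); a node that receives the notification message can perform the corresponding processing operation. This application does not require nodes to send keepalive messages to each other, which not only reduces the number of keepalive messages in the distributed storage system and saves network bandwidth, but can also improve the accuracy and reliability of data storage and reading.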
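The embodiments above describe the exception handling method provided in this application from the perspectives of the nodes and the access devices. It can be understood that, to implement the foregoing functions, the nodes and access devices in the embodiments of this application include the corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that the functions and steps of the examples described in the disclosed embodiments can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions, but such implementations should not be considered beyond the scope of this application. The structures of the nodes and the access devices of this application are described below from different perspectives.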
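To implement the method shown in FIG. 3 of this application, an embodiment of this application provides an access device 500. Access device 500 may be the first access device in FIG. 3. In addition to the components shown in FIG. 5, access device 500 may include other components to implement more functions.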
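As shown in FIG. 5, access device 500 includes a detection unit 5031 and a processing unit 5032. In this embodiment of this application, detection unit 5031 performs step S310 and/or S340, and processing unit 5032 performs step S320 and/or S350. Further, processing unit 5032 may also implement step S330. The functions of detection unit 5031 and processing unit 5032 are described in detail below.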
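Detection unit 5031 is configured to detect the status of a first node, where access device 500 is the access device of the first node, and the first node is a storage node, a compute node, or a control node.
Processing unit 5032 is configured to send a first notification message to a second node when the status of the first node meets a preset condition, where the second node is a node in the distributed storage system other than the first node, and the first notification message includes the identifier of the first node and the status information of the first node.
In one implementation, that the status of the first node meets the preset condition includes that the value of a running parameter of the first node is greater than or equal to a set threshold, and the status information of the first node includes the value of the running parameter of the first node.
In another implementation, that the status of the first node meets the preset condition includes that the first node is locally unreachable, and the status information of the first node includes an identifier indicating that the first node is locally unreachable.
Optionally, that the first node is locally unreachable includes: access device 500 does not receive a keepalive message from the first node within a set time; or the port through which access device 500 is connected to the first node fails.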
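The local-unreachability condition and the resulting first notification message can be sketched as below. The function name, the dictionary layout of the message, and the `"locally-unreachable"` label are hypothetical; the text only requires that the message carry the identifier of the first node and its status information.

```python
def first_notification(node_id, port_up, last_keepalive, now, timeout):
    """Return a first notification message, or None if the node is still reachable."""
    keepalive_missed = (now - last_keepalive) > timeout
    if port_up and not keepalive_missed:
        return None
    # Either the port to the node failed or no keepalive arrived within the set time.
    return {"node": node_id, "status": "locally-unreachable"}


healthy = first_notification("50a", True, last_keepalive=10.0, now=11.0, timeout=3.0)
port_failed = first_notification("50a", False, last_keepalive=10.0, now=11.0, timeout=3.0)
```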
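Optionally, when sending the first notification message to the second node, processing unit 5032 is configured to: send the first notification message to the second node in response to a query message sent by the second node; or proactively send the first notification message to the second node based on a set condition.
In one implementation, the distributed storage system further includes a second access device and a third node connected to the second access device, and processing unit 5032 is further configured to: when the routing information of the third node is deleted or set to invalid, send a second notification message to a fourth node, where the fourth node is connected to access device 500, and the second notification message notifies the fourth node that the third node is remotely unreachable.
In another implementation, the distributed storage system further includes a second access device, and processing unit 5032 is further configured to: after communication between access device 500 and the second access device is interrupted, send a second notification message to a fourth node, where the fourth node is connected to access device 500, and the second notification message notifies the fourth node that the third node connected to the second access device is remotely unreachable.
Further, access device 500 may include a storage unit 5033 for storing the data needed to implement the method shown in FIG. 3; the data may include, for example, a subscription table and a routing table.
Detection unit 5031 and processing unit 5032 may be implemented in hardware or in software. When they are implemented in software, as shown in FIG. 5, access device 500 may further include a processor 501, a communication interface 502, and a memory 503, which are connected through a bus system 504. Memory 503 stores program code that includes instructions implementing the functions of detection unit 5031 and processing unit 5032. Processor 501 can invoke the program code in memory 503 to implement the functions of detection unit 5031 and processing unit 5032. Further, access device 500 may include a programming interface 505 for writing the program code into memory 503.
In this embodiment of this application, processor 501 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. Processor 501 may include one or more processing cores.
Communication interface 502 is used to communicate with external devices, for example, to implement receiving and/or sending functions in cooperation with the program code in memory 503. Communication interface 502 in FIG. 5 is only an example; in practice, access device 500 may include multiple communication interfaces to connect to and communicate with multiple different external devices.
Memory 503 may include a read-only memory (ROM) or a random access memory (RAM). Any other suitable type of storage device may also be used as memory 503. Memory 503 may include one or more storage devices. Memory 503 may further store an operating system 5034, which supports the operation of access device 500.
In addition to a data bus, bus system 504 may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all the buses are labeled as bus system 504 in the figure.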
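To implement the method shown in FIG. 3 of this application, an embodiment of this application further provides a terminal device 600. Terminal device 600 may be a node (in which case the node is a physical device) or the host on which a node resides (in which case the node is a virtual device running on a physical device). Terminal device 600 may have the functions of the second node and/or the fourth node in FIG. 3. In addition to the components shown in FIG. 6, terminal device 600 may include other components to implement more functions.
As shown in FIG. 6, terminal device 600 includes a receiving unit 6031 and a processing unit 6032. In this embodiment of this application, receiving unit 6031 performs step S320 and/or S350, and processing unit 6032 performs step S325 and/or S355. Further, processing unit 6032 may also implement step S315. The functions of receiving unit 6031 and processing unit 6032 are described in detail below.
Receiving unit 6031 is configured to receive a first notification message from a first access device, where the first notification message is generated when the first access device determines that the status of the first node meets a preset condition, and the first notification message includes the identifier of the first node and the status information of the first node.
Processing unit 6032 is configured to perform a processing operation based on the first notification message.
In one implementation, the status information of the first node includes an identifier indicating that the first node is locally unreachable, and processing unit 6032 is configured to: obtain the identifier of the first node; determine from the status information of the first node that the first node is locally unreachable; determine the type and attribute of the first node from the identifier of the first node; and perform a fault handling operation based on the type and attribute of the first node and the type and attribute of the second node.
In another implementation, the status information of the first node includes the value of a running parameter of the first node, and processing unit 6032 is configured to: obtain the identifier of the first node and the value of the running parameter of the first node; when the value of the running parameter is greater than or equal to the alarm threshold of the running parameter and less than the fault threshold of the running parameter, send an alarm message that includes the identifier of the first node and the value of the running parameter of the first node; and when the value of the running parameter is greater than the fault threshold of the running parameter, determine the type and attribute of the first node from the identifier of the first node, and perform a fault handling operation based on the type and attribute of the first node and the type and attribute of the second node.
In one implementation, receiving unit 6031 is further configured to receive a second notification message, where the second notification message notifies the second node that communication with a second access device in the distributed storage system is interrupted or that a third node connected to the second access device is remotely unreachable; and processing unit 6032 is further configured to perform a processing operation based on the second notification message.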
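Optionally, the second notification message includes the identifier of the third node and an identifier indicating that the third node is remotely unreachable, and processing unit 6032 is configured to: obtain the identifier of the third node from the second notification message; determine from the status information of the third node that the third node is remotely unreachable; determine the type and attribute of the third node from the identifier of the third node; and perform a fault handling operation based on the type and attribute of the third node and the type and attribute of the second node.
Optionally, the second notification message includes the identifier or subnet prefix of the second access device, and processing unit 6032 is configured to: obtain the identifier or subnet prefix of the second access device from the second notification message; determine the identifier of a third node that matches the identifier or subnet prefix of the second access device; determine the type and attribute of the third node from the identifier of the third node; and perform a fault handling operation based on the type and attribute of the third node and the type and attribute of the second node.
Optionally, the second notification message includes the identifier or subnet prefix of the second access device, and processing unit 6032 is configured to: obtain the identifier or subnet prefix of the second access device from the second notification message; send the identifier or subnet prefix of the second access device to the storage unit, so that the storage unit stores it; before the second node accesses a new node, compare the identifier of the new node with the identifier or subnet prefix of the second access device; when the identifier of the new node matches the identifier or subnet prefix of the second access device, determine the type and attribute of the new node from the identifier of the new node; and perform a fault handling operation based on the type and attribute of the new node and the type and attribute of the second node.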
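Further, terminal device 600 may include a storage unit 6033 for storing the data needed to implement the method shown in FIG. 3; the data may include, for example, an association table, the identifier or subnet prefix of the second access device, and a node mapping table.
Receiving unit 6031 and processing unit 6032 may be implemented in hardware or in software. When they are implemented in software, as shown in FIG. 6, terminal device 600 may further include a processor 601, a communication interface 602, and a memory 603, which are connected through a bus system 604. Memory 603 stores program code that includes instructions implementing the functions of receiving unit 6031 and processing unit 6032. Processor 601 can invoke the program code in memory 603 to implement the functions of receiving unit 6031 and processing unit 6032. Further, terminal device 600 may include a programming interface 605 for writing the program code into memory 603.
In this embodiment of this application, processor 601 may be a CPU, or may be another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. Processor 601 may include one or more processing cores.
Communication interface 602 is used to communicate with external devices, for example, to implement receiving and/or sending functions in cooperation with the program code in memory 603. Communication interface 602 in FIG. 6 is only an example; in practice, terminal device 600 may include multiple communication interfaces to connect to and communicate with multiple different external devices.
Memory 603 may include a ROM or a RAM. Any other suitable type of storage device may also be used as memory 603. Memory 603 may include one or more storage devices. Memory 603 may further store an operating system 6034, which supports the operation of terminal device 600.
In addition to a data bus, bus system 604 may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all the buses are labeled as bus system 604 in the figure.
The components of access device 500 and terminal device 600 provided in the embodiments of this application are merely examples. A person skilled in the art may add or remove components as needed, or split the function of one component across multiple components. For how each function of access device 500 or terminal device 600 is implemented, refer to the description of the steps in FIG. 3.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that access device 500 or terminal device 600 of this application may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of this application may be embodied in the form of a hardware product or a software product. The hardware product may be a dedicated chip or a processor. The software product may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, or the like) and includes several instructions. When executed, the software product can cause a computer device to perform the methods described in the embodiments of this application.
The foregoing are merely preferred implementations of this application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of this application, and such improvements and refinements shall also fall within the protection scope of this application.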

Claims (34)

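  1. A distributed storage system, comprising a first switch and a plurality of nodes, wherein the plurality of nodes comprise a first node and a second node; the first node is connected to the first switch; and the first node is a storage node, a compute node, or a control node;
    the first switch is configured to: detect a status of the first node, and when the status of the first node meets a preset condition, send a first notification message to the second node, wherein the first notification message comprises an identifier of the first node and status information of the first node; and
    the second node is configured to: receive the first notification message, and perform a processing operation based on the first notification message.
  2. The distributed storage system according to claim 1, wherein
    that the status of the first node meets the preset condition comprises that a value of a running parameter of the first node is greater than or equal to a set threshold, and the status information of the first node comprises the value of the running parameter of the first node; or
    that the status of the first node meets the preset condition comprises that the first node is locally unreachable, and the status information of the first node comprises an identifier indicating that the first node is locally unreachable.
  3. The distributed storage system according to claim 2, wherein that the first node is locally unreachable comprises:
    the first switch does not receive a keepalive message from the first node within a set time; or
    a port through which the first switch is connected to the first node fails.
  4. The distributed storage system according to any one of claims 1 to 3, wherein, when sending the first notification message to the second node, the first switch is configured to:
    send the first notification message to the second node in response to a query message sent by the second node; or
    proactively send the first notification message to the second node based on a set condition.
  5. The distributed storage system according to any one of claims 1 to 4, wherein the distributed storage system further comprises a second switch and a third node connected to the second switch; and
    the first switch is further configured to: when routing information of the third node is deleted or set to invalid, send a second notification message to a fourth node, wherein the fourth node is connected to the first switch, and the second notification message notifies the fourth node that the third node is remotely unreachable.
  6. The distributed storage system according to any one of claims 1 to 4, wherein the distributed storage system further comprises a second switch; and
    the first switch is further configured to: after communication with the second switch is interrupted, send a second notification message to a fourth node, wherein the fourth node is connected to the first switch, and the second notification message notifies the fourth node that communication with the second switch is interrupted or that all third nodes connected to the second switch are remotely unreachable.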
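  7. An exception handling method, wherein the method is applied to a first switch of a distributed storage system, and the method comprises:
    detecting, by the first switch, a status of a first node, wherein the first switch is the switch of the first node, and the first node is a storage node, a compute node, or a control node; and
    when the status of the first node meets a preset condition, sending, by the first switch, a first notification message to a second node, wherein the second node is a node in the distributed storage system other than the first node, and the first notification message comprises an identifier of the first node and status information of the first node.
  8. The exception handling method according to claim 7, wherein
    that the status of the first node meets the preset condition comprises that a value of a running parameter of the first node is greater than or equal to a set threshold, and the status information of the first node comprises the value of the running parameter of the first node; or
    that the status of the first node meets the preset condition comprises that the first node is locally unreachable, and the status information of the first node comprises an identifier indicating that the first node is locally unreachable.
  9. The exception handling method according to claim 8, wherein that the first node is locally unreachable comprises:
    the first switch does not receive a keepalive message from the first node within a set time; or
    a port through which the first switch is connected to the first node fails.
  10. The exception handling method according to any one of claims 7 to 9, wherein the sending a first notification message to a second node comprises:
    sending the first notification message to the second node in response to a query message sent by the second node; or
    proactively sending, by the first switch, the first notification message to the second node based on a set condition.
  11. The exception handling method according to any one of claims 7 to 10, wherein the distributed storage system further comprises a second switch and a third node connected to the second switch, and the method further comprises:
    when routing information of the third node is deleted or set to invalid, sending, by the first switch, a second notification message to a fourth node, wherein the fourth node is connected to the first switch, and the second notification message notifies the fourth node that the third node is remotely unreachable.
  12. The exception handling method according to any one of claims 7 to 10, wherein the distributed storage system further comprises a second switch, and the method further comprises:
    after communication between the first switch and the second switch is interrupted, sending a second notification message to a fourth node, wherein the fourth node is connected to the first switch, and the second notification message notifies the fourth node that a third node connected to the second switch is remotely unreachable.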
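  13. An exception handling method, wherein the method is applied to a second node of a distributed storage system, the distributed storage system further comprises a first node, the first node is connected to a first switch, the first node is a storage node, a compute node, or a control node, and the method comprises:
    receiving, by the second node, a first notification message from the first switch, wherein the first notification message is generated when the first switch determines that a status of the first node meets a preset condition, and the first notification message comprises an identifier of the first node and status information of the first node; and
    performing, by the second node, a processing operation based on the first notification message.
  14. The exception handling method according to claim 13, wherein the status information of the first node comprises an identifier indicating that the first node is locally unreachable, and the performing, by the second node, a processing operation based on the first notification message comprises:
    obtaining, by the second node, the identifier of the first node from the first notification message;
    determining, by the second node based on the status information of the first node, that the first node is locally unreachable;
    determining, by the second node, a type and an attribute of the first node based on the identifier of the first node; and
    performing, by the second node, a fault handling operation based on the type and attribute of the first node and a type and an attribute of the second node.
  15. The exception handling method according to claim 13, wherein the status information of the first node comprises a value of a running parameter of the first node, and the performing, by the second node, a processing operation based on the first notification message comprises:
    obtaining, by the second node, the identifier of the first node and the value of the running parameter of the first node from the first notification message; and when the value of the running parameter is greater than or equal to an alarm threshold of the running parameter and less than a fault threshold of the running parameter, sending, by the second node, an alarm message, wherein the alarm message comprises the identifier of the first node and the value of the running parameter of the first node; or
    obtaining, by the second node, the identifier of the first node and the value of the running parameter of the first node from the first notification message; when the value of the running parameter is greater than the fault threshold of the running parameter, determining, by the second node, a type and an attribute of the first node based on the identifier of the first node; and performing, by the second node, a fault handling operation based on the type and attribute of the first node and a type and an attribute of the second node.
  16. The method according to any one of claims 13 to 15, wherein the distributed storage system further comprises a second switch, and the method further comprises:
    receiving, by the second node, a second notification message, wherein the second notification message notifies the second node that communication with the second switch is interrupted or that all third nodes connected to the second switch are remotely unreachable; and
    performing, by the second node, a processing operation based on the second notification message.
  17. The method according to claim 16, wherein the second notification message comprises an identifier of the third node and an identifier indicating that the third node is remotely unreachable, and the performing, by the second node, a processing operation based on the second notification message comprises:
    obtaining, by the second node, the identifier of the third node from the second notification message;
    determining, by the second node based on status information of the third node, that the third node is remotely unreachable;
    determining, by the second node, a type and an attribute of the third node based on the identifier of the third node; and
    performing, by the second node, a fault handling operation based on the type and attribute of the third node and a type and an attribute of the second node.
  18. The method according to claim 16, wherein the second notification message comprises an identifier or a subnet prefix of the second switch, and the performing, by the second node, a processing operation based on the second notification message comprises:
    obtaining, by the second node, the identifier or subnet prefix of the second switch from the second notification message;
    determining, by the second node, an identifier of a third node that matches the identifier or subnet prefix of the second switch;
    determining, by the second node, a type and an attribute of the third node based on the identifier of the third node; and
    performing, by the second node, a fault handling operation based on the type and attribute of the third node and a type and an attribute of the second node.
  19. The method according to claim 16, wherein the second notification message comprises an identifier or a subnet prefix of the second switch, and the performing, by the second node, a processing operation based on the second notification message comprises:
    obtaining, by the second node, the identifier or subnet prefix of the second switch from the second notification message;
    storing, by the second node, the identifier or subnet prefix of the second switch;
    before the second node accesses a new node, comparing, by the second node, an identifier of the new node with the identifier or subnet prefix of the second switch;
    when the identifier of the new node matches the identifier or subnet prefix of the second switch, determining, by the second node, a type and an attribute of the new node based on the identifier of the new node; and
    performing, by the second node, a fault handling operation based on the type and attribute of the new node and a type and an attribute of the second node.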
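  20. A switch, wherein the switch is a first switch in a distributed storage system, and the first switch comprises:
    a detection unit, configured to detect a status of a first node, wherein the first switch is the switch of the first node, and the first node is a storage node, a compute node, or a control node; and
    a processing unit, configured to: when the status of the first node meets a preset condition, send a first notification message to a second node, wherein the second node is a node in the distributed storage system other than the first node, and the first notification message comprises an identifier of the first node and status information of the first node.
  21. The switch according to claim 20, wherein
    that the status of the first node meets the preset condition comprises that a value of a running parameter of the first node is greater than or equal to a set threshold, and the status information of the first node comprises the value of the running parameter of the first node; or
    that the status of the first node meets the preset condition comprises that the first node is locally unreachable, and the status information of the first node comprises an identifier indicating that the first node is locally unreachable.
  22. The switch according to claim 21, wherein that the first node is locally unreachable comprises:
    the first switch does not receive a keepalive message from the first node within a set time; or
    a port through which the first switch is connected to the first node fails.
  23. The switch according to any one of claims 20 to 22, wherein, when sending the first notification message to the second node, the processing unit is configured to:
    send the first notification message to the second node in response to a query message sent by the second node; or
    proactively send the first notification message to the second node based on a set condition.
  24. The switch according to any one of claims 20 to 23, wherein the distributed storage system further comprises a second switch and a third node connected to the second switch, and the processing unit is further configured to:
    when routing information of the third node is deleted or set to invalid, send a second notification message to a fourth node, wherein the fourth node is connected to the first switch, and the second notification message notifies the fourth node that the third node is remotely unreachable.
  25. The switch according to any one of claims 20 to 23, wherein the distributed storage system further comprises a second switch, and the processing unit is further configured to:
    after communication between the first switch and the second switch is interrupted, send a second notification message to a fourth node, wherein the fourth node is connected to the first switch, and the second notification message notifies the fourth node that a third node connected to the second switch is remotely unreachable.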
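  26. A node, wherein the node is a second node deployed in a distributed storage system, the distributed storage system further comprises a first node, the first node is connected to a first switch, the first node is a storage node, a compute node, or a control node, and the second node comprises:
    a receiving unit, configured to receive a first notification message from the first switch, wherein the first notification message is generated when the first switch determines that a status of the first node meets a preset condition, and the first notification message comprises an identifier of the first node and status information of the first node; and
    a processing unit, configured to perform a processing operation based on the first notification message.
  27. The node according to claim 26, wherein the status information of the first node comprises an identifier indicating that the first node is locally unreachable, and the processing unit is configured to:
    obtain the identifier of the first node;
    determine, based on the status information of the first node, that the first node is locally unreachable;
    determine a type and an attribute of the first node based on the identifier of the first node; and
    perform a fault handling operation based on the type and attribute of the first node and a type and an attribute of the second node.
  28. The node according to claim 26, wherein the status information of the first node comprises a value of a running parameter of the first node, and the processing unit is configured to:
    obtain the identifier of the first node and the value of the running parameter of the first node; and when the value of the running parameter is greater than or equal to an alarm threshold of the running parameter and less than a fault threshold of the running parameter, send an alarm message, wherein the alarm message comprises the identifier of the first node and the value of the running parameter of the first node; or
    obtain the identifier of the first node and the value of the running parameter of the first node; when the value of the running parameter is greater than the fault threshold of the running parameter, determine a type and an attribute of the first node based on the identifier of the first node; and perform a fault handling operation based on the type and attribute of the first node and a type and an attribute of the second node.
  29. The node according to any one of claims 26 to 28, wherein the distributed storage system further comprises a second switch,
    the receiving unit is further configured to receive a second notification message, wherein the second notification message notifies the second node that communication with the second switch is interrupted or that a third node connected to the second switch is remotely unreachable; and
    the processing unit is further configured to perform a processing operation based on the second notification message.
  30. The node according to claim 29, wherein the second notification message comprises an identifier of the third node and an identifier indicating that the third node is remotely unreachable, and the processing unit is configured to:
    obtain the identifier of the third node from the second notification message;
    determine, based on status information of the third node, that the third node is remotely unreachable;
    determine a type and an attribute of the third node based on the identifier of the third node; and
    perform a fault handling operation based on the type and attribute of the third node and a type and an attribute of the second node.
  31. The node according to claim 29, wherein the second notification message comprises an identifier or a subnet prefix of the second switch, and the processing unit is configured to:
    obtain the identifier or subnet prefix of the second switch from the second notification message;
    determine an identifier of a third node that matches the identifier or subnet prefix of the second switch;
    determine a type and an attribute of the third node based on the identifier of the third node; and
    perform a fault handling operation based on the type and attribute of the third node and a type and an attribute of the second node.
  32. The node according to claim 29, wherein the second notification message comprises an identifier or a subnet prefix of the second switch, the node further comprises a storage unit, and the processing unit is configured to:
    obtain the identifier or subnet prefix of the second switch from the second notification message;
    send the identifier or subnet prefix of the second switch to the storage unit, so that the storage unit stores the identifier or subnet prefix of the second switch;
    before the second node accesses a new node, compare an identifier of the new node with the identifier or subnet prefix of the second switch;
    when the identifier of the new node matches the identifier or subnet prefix of the second switch, determine a type and an attribute of the new node based on the identifier of the new node; and
    perform a fault handling operation based on the type and attribute of the new node and a type and an attribute of the second node.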
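  33. A switch, wherein the switch is deployed in a distributed storage system, the switch comprises a memory and a processor, the memory is configured to store computer program code, and the processor is configured to invoke the computer program code to cause the switch to perform the exception handling method according to any one of claims 7 to 12.
  34. A host, wherein the host is deployed in a distributed storage system, a node runs on the host, the host comprises a memory, a processor, and a communication interface, the memory is configured to store computer program code, and the processor is configured to invoke the computer program code to cause the node, in cooperation with the communication interface, to implement the exception handling method according to any one of claims 13 to 19.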
PCT/CN2021/095586 2020-06-12 2021-05-24 Distributed Storage System, Exception Handling Method Thereof, and Related Apparatus WO2021249173A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21822660.3A EP4148549A4 (en) 2020-06-12 2021-05-24 DISTRIBUTED STORAGE SYSTEM, ANOMALY PROCESSING METHOD THEREFOR AND ASSOCIATED APPARATUS
US18/064,752 US20230106077A1 (en) 2020-06-12 2022-12-12 Distributed Storage System, Exception Handling Method Thereof, and Related Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010538198.4A CN113805788B (zh) 2020-06-12 2020-06-12 Distributed Storage System, Exception Handling Method Thereof, and Related Apparatus
CN202010538198.4 2020-06-12

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/064,752 Continuation US20230106077A1 (en) 2020-06-12 2022-12-12 Distributed Storage System, Exception Handling Method Thereof, and Related Apparatus

Publications (1)

Publication Number Publication Date
WO2021249173A1 true WO2021249173A1 (zh) 2021-12-16

Family

ID=78845183

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095586 WO2021249173A1 (zh) 2020-06-12 2021-05-24 Distributed Storage System, Exception Handling Method Thereof, and Related Apparatus

Country Status (4)

Country Link
US (1) US20230106077A1 (zh)
EP (1) EP4148549A4 (zh)
CN (1) CN113805788B (zh)
WO (1) WO2021249173A1 (zh)

Also Published As

Publication number Publication date
US20230106077A1 (en) 2023-04-06
EP4148549A4 (en) 2023-10-25
CN113805788A (zh) 2021-12-17
CN113805788B (zh) 2024-04-09
EP4148549A1 (en) 2023-03-15

Similar Documents

Publication Publication Date Title
US6658595B1 (en) Method and system for asymmetrically maintaining system operability
JP3649580B2 (ja) 分散コンピュータ・システムのエラーを報告するシステム
US11463303B2 (en) Determining the health of other nodes in a same cluster based on physical link information
US20140254347A1 (en) Ethernet Ring Protection Switching Method, Node, and System
US20230111966A1 (en) Ethernet storage system, and information notification method and related apparatus thereof
US11750496B2 (en) Method for multi-cloud interconnection and device
CN113472646B (zh) 一种数据传输方法、节点、网络管理器及系统
JP3101604B2 (ja) 分散コンピュータ・システムのエラーを報告する方法
CN111371625A (zh) 一种双机热备的实现方法
CN107612772B (zh) 支付系统的节点状态探测方法及装置
WO2021249173A1 (zh) 一种分布式存储系统及其异常处理方法和相关装置
CN115514719B (zh) 报文发送方法、装置、交换机及可读存储介质
CN110661836B (zh) 消息路由方法、装置及系统、存储介质
CN115152192B (zh) Pce受控网络可靠性
CN109818870B (zh) 一种组播选路方法、装置、业务板及机器可读存储介质
CN112217718A (zh) 一种业务处理方法、装置、设备及存储介质
WO2022083503A1 (zh) 数据处理方法及装置
US11729022B2 (en) Uplink connectivity in ring networks
CN116112500B (zh) 一种基于故障探测和路由策略的nfs高可用系统及方法
CN112328375B (zh) 一种用于跟踪分布式系统的数据片段的关联方法和装置
CN116962446B (zh) 一种NVMe-oF链路动态管理方法及系统
WO2017000097A1 (zh) 一种数据转发的方法、装置和系统
WO2022044546A1 (ja) 通信システムおよびその障害復旧方法
CN117499206A (zh) 一种通信异常处理方法及计算设备
CN115643202A (zh) 一种链路选择控制协议切换方法、装置、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21822660

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021822660

Country of ref document: EP

Effective date: 20221208

NENP Non-entry into the national phase

Ref country code: DE