CN110535692B - Fault processing method and device, computer equipment, storage medium and storage system

Info

Publication number: CN110535692B
Application number: CN201910741190.5A
Authority: CN (China)
Prior art keywords: fault, storage node, target, storage system, fault state
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN110535692A
Inventor: 陶毅 (Tao Yi)
Assignee: Huawei Technologies Co Ltd
Application filed by: Huawei Technologies Co Ltd
Priority to: CN201910741190.5A
PCT application: PCT/CN2020/102302 (WO2021027481A1)
Publications: CN110535692A, CN110535692B (granted)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The invention discloses a fault processing method and apparatus for a distributed storage system, a computer device, a storage medium and a storage system, and belongs to the technical field of fault processing. In the method, the fault state of the distributed storage system is determined from the at least one storage node of the plurality of storage nodes that has failed, so the fault state can be determined without waiting for every storage node to fail. Once the fault state is determined, it can be sent immediately to each storage node in the distributed storage system, so that each storage node can perform fault processing according to the determined fault state, which shortens the time needed for the distributed storage system to return to normal operation.

Description

Fault processing method and device, computer equipment, storage medium and storage system
Technical Field
The present invention relates to the field of failure processing technologies, and in particular, to a method and an apparatus for processing a failure in a distributed storage system, a computer device, a storage medium, and a distributed storage system.
Background
With the development of big data technology, distributed storage systems are increasingly favored by enterprises for storing more data and preventing data loss. As service time increases, the storage nodes in a distributed storage system inevitably fail. When all storage nodes in the distributed storage system have failed, the computing nodes that provide services to users perform failure processing on the faults in the distributed storage system to ensure that the failed storage nodes do not affect normal services.
The fault handling may proceed as follows. In a distributed storage system, a client sends Small Computer System Interface (SCSI) requests to a plurality of storage nodes. When all storage nodes in the distributed storage system have failed, and the faults can be repaired within a short time, the storage nodes do not respond to the SCSI requests; when the client receives no response from any storage node, it determines the fault state of the distributed storage system to be the All Paths Down (APD) state. The APD state is a storage fault state defined by the VMWare virtual machine and indicates that none of the paths to the back-end storage nodes can respond to host requests; the client then suspends the unprocessed SCSI requests and waits for a technician to repair the faults in the distributed storage system. When a storage node has a fault that cannot be repaired within a short time, the storage node returns a storage exception message to the client; on receiving the message, the client determines the fault state of the distributed storage system to be the Permanent Device Loss (PDL) state. The PDL state is a storage fault state defined by the VMWare virtual machine and represents a long-term or permanent fault of a back-end storage node. A long-term fault of a storage node can damage the file system in the distributed storage system, so when the fault state determined by the client is the PDL state, the client powers off the file system and waits for a technician to repair the fault.
In the above failure processing process, the client performs failure processing only when all storage nodes in the distributed storage system have failed; it does not perform failure processing when only some of the storage nodes have failed. However, partial storage node failure is a common situation. Once some storage nodes fail, if no failure diagnosis is performed on the storage nodes in the distributed storage system, the client cannot determine whether any storage node has failed, so a technician cannot learn of the failure from the client in time and cannot repair the failed storage nodes immediately, which prolongs the time needed for the distributed storage system to return to normal.
Disclosure of Invention
The embodiment of the invention provides a distributed storage system fault processing method and device, computer equipment, a storage medium and a storage system, which can reduce the time for the distributed storage system to recover to normal. The technical scheme is as follows:
in a first aspect, a method for handling a failure of a distributed storage system is provided, where the distributed storage system includes a plurality of storage nodes; the method comprises the following steps:
determining a fault state of the distributed storage system according to at least one storage node with a fault in the plurality of storage nodes; the fault state is used for indicating whether the at least one failed storage node can be completely repaired within a first preset time length;
transmitting the fault status to each of the plurality of storage nodes.
Based on this implementation, the fault state of the distributed storage system is determined from the failed storage node(s) among the plurality of storage nodes, so the fault state can be determined without waiting for all storage nodes to fail. Once the fault state is determined, it can be sent immediately to each storage node in the distributed storage system so that each storage node performs fault processing according to the determined fault state, which shortens the time needed for the distributed storage system to return to normal.
In one possible implementation, the method further includes:
and carrying out fault processing according to the fault state of the distributed storage system.
In one possible implementation, the determining the failure status of the distributed storage system according to the storage node in which the failure occurs in at least one of the plurality of storage nodes includes:
determining at least one storage node with a fault in the distributed storage system and target data which cannot be accessed in the at least one storage node;
and determining the fault state according to the at least one storage node and the target data.
In one possible implementation, the determining the fault status according to the at least one storage node and the target data includes:
when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determining the fault state as a first fault state, wherein the first fault state is used for indicating that the at least one storage node can be completely repaired within the first preset time.
Based on the possible implementation manner, in the first failure state, the target device does not power down the file system, and once the at least one storage node can be repaired within the first preset time, the power down of the file system can be avoided, so that the time for repairing the file system can be reduced, the distributed storage system can recover the service as soon as possible, and the service quality can be ensured.
In one possible implementation, the determining the fault status according to the at least one storage node and the target data includes:
when the number of the at least one storage node is larger than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determining the fault state according to a fault scene of the distributed storage system, wherein the fault scene is used for indicating whether the at least one storage node simultaneously breaks down.
In one possible implementation, the preset condition includes any one of:
the ratio of the data volume of the target data to a first preset data volume is greater than a preset ratio, and the first preset data volume is the total data volume of all data stored in the distributed storage system;
the data volume of the target data is larger than a second preset data volume.
In one possible implementation, after determining the fault status according to the at least one storage node and the target data, the method further includes:
when the fault state is the first fault state, if all the at least one storage node is not repaired within the first preset time, updating the fault state from the first fault state to a second fault state, where the second fault state is used to indicate that all the at least one storage node cannot be repaired within the first preset time.
In one possible implementation manner, before determining the fault state of the distributed storage system according to the fault scenario of the distributed storage system, the method further includes:
and determining the fault scenario according to the time at which the at least one storage node failed.
In one possible implementation manner, the determining the failure scenario according to the time of the failure of the at least one storage node includes:
when the at least one storage node fails within a target time length, determining the fault scene as a first fault scene, otherwise, determining the fault scene as a second fault scene, wherein the first fault scene is used for indicating that the at least one storage node fails simultaneously, and the second fault scene is used for indicating that the at least one storage node fails at different times.
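As an illustration of this rule, the following Python sketch classifies the fault scenario from the recorded failure times of the failed storage nodes; the function name, the scenario labels and the default target duration are assumptions for illustration, not values taken from the patent.

```python
from datetime import timedelta

FIRST_FAULT_SCENARIO = "simultaneous"   # all failures fall within the target duration
SECOND_FAULT_SCENARIO = "staggered"     # failures occurred at different times

def determine_fault_scenario(failure_times, target_duration=timedelta(seconds=30)):
    """Classify the fault scenario from the failure times of the failed nodes."""
    if not failure_times:
        raise ValueError("no failed storage nodes")
    earliest, latest = min(failure_times), max(failure_times)
    # All failures within the target duration count as a simultaneous failure.
    return FIRST_FAULT_SCENARIO if latest - earliest <= target_duration else SECOND_FAULT_SCENARIO
```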
In one possible implementation manner, the determining the fault state of the distributed storage system according to the fault scenario of the distributed storage system includes:
if the fault scene is a first fault scene, determining the fault state according to the fault type of each storage node in the at least one storage node, wherein the first fault scene is used for indicating that the at least one storage node simultaneously has faults, and the fault type is used for indicating whether the faults of one storage node can be repaired within a second preset time length;
and if the fault scene is a second fault scene, determining the fault state according to the fault type of a first storage node which has a fault in the at least one storage node at the latest, wherein the second fault scene is used for indicating that the time when the at least one storage node has the fault is different.
In one possible implementation, the determining the fault status according to the fault type of each of the at least one storage node includes:
when the fault type of each storage node in the at least one storage node is a first fault type, determining the fault state as a first fault state, wherein the first fault type is used for indicating that the fault of one storage node can be repaired within the second preset time period, and the first fault state is used for indicating that the at least one storage node can be completely repaired within the first preset time period;
when the failure type of a target number of storage nodes in the at least one storage node is a second failure type, if the target number is less than or equal to the redundancy of the distributed storage system, determining the failure state as the first failure state, otherwise, determining the failure state as a second failure state, where the second failure type is used to indicate that a failure of one storage node cannot be repaired within the second preset duration, and the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset duration.
In one possible implementation manner, the determining the fault status according to a fault type of a first storage node that has failed latest in the at least one storage node includes:
when the fault type of the first storage node is a first fault type, determining the fault state as a first fault state, wherein the first fault type is used for indicating that the fault of one storage node can be repaired within the second preset time length, and the first fault state is used for indicating that the at least one storage node can be completely repaired within the first preset time length;
and when the fault type of the first storage node is a second fault type, determining the fault state as a second fault state, wherein the second fault type is used for indicating that the fault of one storage node cannot be repaired within the second preset time length, and the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
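The two decision rules above can be sketched as follows; the state and type labels, the function names and the representation of redundancy as a plain count are illustrative assumptions rather than the patented implementation.

```python
FIRST_FAULT_TYPE = "repairable"        # repairable within the second preset duration
SECOND_FAULT_TYPE = "not_repairable"
FIRST_FAULT_STATE = "first_fault_state"
SECOND_FAULT_STATE = "second_fault_state"

def state_for_simultaneous_failures(fault_types, redundancy):
    """First fault scenario: decide from the fault type of every failed node."""
    irreparable = sum(1 for t in fault_types if t == SECOND_FAULT_TYPE)
    if irreparable == 0:
        return FIRST_FAULT_STATE                 # every fault is repairable in time
    # Otherwise compare the number of irreparable nodes with the redundancy.
    return FIRST_FAULT_STATE if irreparable <= redundancy else SECOND_FAULT_STATE

def state_for_staggered_failures(latest_fault_type):
    """Second fault scenario: decide from the node that failed most recently."""
    return FIRST_FAULT_STATE if latest_fault_type == FIRST_FAULT_TYPE else SECOND_FAULT_STATE
```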
In one possible implementation manner, before determining the fault state of the distributed storage system according to the fault scenario of the distributed storage system, the method further includes:
for any storage node in the at least one storage node, when a preset network fault, a preset abnormal power failure fault, a preset misoperation fault, a preset hardware fault or a preset software fault occurs in the any storage node, determining the fault type of the any storage node as a first fault type, otherwise, determining the fault type of the any storage node as a second fault type, wherein the first fault type is used for indicating that the fault of the one storage node can be repaired within a second preset time length, and the second fault type is used for indicating that the fault of the one storage node cannot be repaired within the second preset time length.
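A minimal sketch of this fault-type classification, assuming the failure cause is recorded as a string label; the specific cause names and the function name are hypothetical.

```python
# Causes the text treats as repairable within the second preset duration.
REPAIRABLE_CAUSES = {
    "preset_network_fault",
    "preset_abnormal_power_failure",
    "preset_misoperation",
    "preset_hardware_fault",
    "preset_software_fault",
}

def classify_fault_type(failure_cause: str) -> str:
    """Map a recorded failure cause to the first or second fault type."""
    return "first_fault_type" if failure_cause in REPAIRABLE_CAUSES else "second_fault_type"
```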
In a possible implementation manner, after performing the fault handling according to the fault status of the distributed storage system, the method further includes:
when the repair of the at least one storage node is completed, sending a repair completion response to each device in the distributed storage system, where the repair completion response is used to indicate that there is no failed device in the distributed storage system.
In a second aspect, a method for handling a failure of a distributed storage system is provided, where the distributed storage system includes a plurality of storage nodes; the method comprises the following steps:
sending an access request to a target storage node in the distributed storage system;
receiving a response returned by the target storage node; the response includes a fault state of the distributed storage system; the fault state is used for indicating whether at least one failed storage node can be completely repaired within a first preset time length.
In one possible implementation, after receiving the response returned by the target storage node, the method further includes:
and processing the fault based on the fault state contained in the response.
In a possible implementation manner, the fault identifier of the fault state includes any one of a first fault identifier or a second fault identifier, where the first fault identifier is used to indicate a first fault state, the second fault identifier is used to indicate a second fault state, the first fault state is used to indicate that at least one storage node can be completely repaired within a first preset time period, the second fault state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period, and the storage node is a failed storage node in the distributed storage system.
In one possible implementation manner, the performing fault handling based on the fault status included in the response includes:
when the access request is sent by a target client in the distributed storage system based on a target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is the first fault state, the access request is not responded to the target virtual machine, and the first fault state is used for indicating that at least one storage node can be completely repaired within a first preset time length;
when the access request is sent by the target client based on a target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is the second fault state, returning a message of abnormal storage to the target virtual machine, wherein the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
In one possible implementation manner, the performing fault handling based on the fault status included in the response includes:
when the access request is sent by a target client in the distributed storage system based on a target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is the first fault state, sending a retry request to the target virtual machine, wherein the retry request is used for indicating to resend the access request, and the first fault state is used for indicating that at least one storage node can be completely repaired within a first preset time length;
when the access request is sent by the target client based on a target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is the second fault state, returning a target error which can be identified by the target virtual machine to the target virtual machine, wherein the target error is used for indicating a storage medium fault, and the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
Based on the possible implementation manners, different fault processing manners are provided, so that the fault processing manner provided by the embodiment of the invention has higher universality.
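A sketch of how a client-side dispatcher could follow these rules is given below, assuming the fault state and the resulting actions are represented as plain strings; only the branching mirrors the text, the identifiers are invented.

```python
def handle_fault_state(fault_state: str, is_vmware: bool) -> str:
    """Return the action taken for an unanswered access request."""
    if is_vmware:
        # VMWare virtual machine: hold the request in the first fault state,
        # report a storage exception in the second fault state.
        return "no_response" if fault_state == "first_fault_state" else "storage_exception"
    # Other virtual machines: ask for a retry in the first fault state,
    # return a recognizable storage-medium error in the second fault state.
    return "retry_request" if fault_state == "first_fault_state" else "medium_error"
```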
In one possible implementation, before sending the access request to any storage node in the distributed storage system, the method further includes:
receiving a target access request sent by a target client in the distributed storage system;
the sending an access request to a target storage node in the distributed storage system comprises:
and sending the access request to a target storage node in the distributed storage system based on the target access request.
In one possible implementation, after receiving the response returned by the target storage node, the method further includes:
and receiving a repair completion response returned by the target storage node, wherein the repair completion response is used for indicating that no fault equipment exists in the distributed storage system.
In a third aspect, a distributed storage system is provided, which includes a supervisory node and a plurality of storage nodes;
the supervisory node is configured to:
determining a failure state of the distributed storage system according to at least one failed storage node in the plurality of storage nodes; the fault state is used for indicating whether the at least one failed storage node can be completely repaired within a first preset time length;
transmitting the fault status to each of the plurality of storage nodes;
each storage node of the plurality of storage nodes to receive the fault condition.
In a possible implementation manner, the fault identifier of the fault state includes any one of a first fault identifier and a second fault identifier, the first fault identifier is used to indicate the first fault state, the second fault identifier is used to indicate a second fault state, the first fault state is used to indicate that the at least one storage node can be completely repaired within the first preset duration, and the second fault state is used to indicate that the at least one storage node cannot be completely repaired within the first preset duration.
In a possible implementation manner, each of the plurality of storage nodes is further configured to suspend the access request if the access request is received after the failure state is received, and perform failure processing based on the received failure state.
In a fourth aspect, a fault handling apparatus is provided, which is configured to execute the above fault handling method for the distributed storage system. Specifically, the fault handling apparatus includes a functional module configured to execute the fault handling method provided in the first aspect or any one of the optional manners of the first aspect.
In a fifth aspect, a failure processing apparatus is provided, which is configured to execute the failure processing method of the distributed storage system. Specifically, the fault handling apparatus includes a functional module configured to execute the fault handling method provided in the second aspect or any one of the optional manners of the second aspect.
In a sixth aspect, a computer device is provided, which includes a processor and a memory, where the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the operations performed by the fault handling method of the distributed storage system.
In a seventh aspect, a storage medium is provided, where at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operations performed by the method for processing the fault in the distributed storage system.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a distributed storage system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network environment of a distributed storage system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of interaction among devices in a distributed storage system according to an embodiment of the present invention
FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention;
fig. 5 is a flowchart of a method for processing a failure in a distributed storage system according to an embodiment of the present invention;
fig. 6 is a flowchart of a method for processing a failure in a distributed storage system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a fault handling apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a fault handling apparatus according to an embodiment of the present invention;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a distributed storage system provided by an embodiment of the present invention. Referring to Fig. 1, the distributed storage system includes at least one client 101, a plurality of storage nodes 102, and a supervisory node 103. The client 101 is configured to provide data storage and data reading services for a user; that is, the client 101 may store data uploaded by the user in the storage nodes 102 and may also read data from the storage nodes 102.
The storage node 102 is configured to store data written by the client 101 and to return data to the client 101. The returned data may be data that the client 101 has requested to read; the storage node 102 may also return to the client 101 the fault state of the distributed storage system issued by the supervisory node 103, so that the client 101 performs fault processing according to the fault state of the distributed storage system.
The supervisory node 103 is configured to monitor whether each storage node 102 in the distributed storage system has failed. When the number of failed storage nodes in the distributed storage system is higher than the redundancy of the distributed storage system, normal operation of the service may be affected. The supervisory node 103 may therefore determine the fault state of the distributed storage system according to the failed storage nodes, and issue the determined fault state to all storage nodes 102; the storage nodes 102 may then notify the client 101 of the fault state, so that the client 101 or the storage nodes 102 can perform fault processing according to the fault state of the distributed storage system.
The Baseboard Management Controller (BMC) of each storage node 102 may monitor in real time whether that storage node has failed. When the BMC of a storage node detects that the storage node has failed, the BMC may store the cause of the failure and the time at which the failure occurred, and may also send the cause and the time to the supervisory node 103, so that the supervisory node 103 learns that the storage node has failed.
When the BMC of a storage node does not send the failure cause and failure time to the supervisory node 103, the supervisory node 103 may instead access each storage node 102 to learn whether it has failed and, if so, acquire the failure cause and failure time from the failed storage node.
Whether the supervisory node 103 receives the failure cause and failure time reported by a failed storage node or actively acquires them from the failed storage node, it may store the failure cause and failure time of the failed storage node, so that the fault type and the fault scenario can later be determined from them; the fault type is described in step 603 below and the fault scenario in step 602 below.
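The two paths just described (a push report from the BMC and an active poll by the supervisory node) could look roughly like the following sketch; the per-node status interface and all field names are assumptions.

```python
def on_bmc_report(fault_records: list, node_id, failure_time, failure_cause):
    """Push path: the BMC of a failed node reports the cause and time."""
    fault_records.append(
        {"node": node_id, "time": failure_time, "cause": failure_cause})

def poll_storage_nodes(fault_records: list, storage_nodes):
    """Pull path: the supervisory node queries each node when no report arrives."""
    for node in storage_nodes:
        status = node.query_status()      # assumed per-node status interface
        if status.get("failed"):
            fault_records.append(
                {"node": node.node_id, "time": status["time"], "cause": status["cause"]})
```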
The supervisory node 103 may store the failure cause and failure time of a failed storage node as follows: the failure cause and failure time sent by the failed storage node are stored in a fault table. The fault table may store a number, an identifier of the storage node, a failure time and a failure cause, where the number indicates how many storage nodes have failed so far (i.e., the ordinal of the entry), and the identifier of a storage node uniquely identifies that storage node and may be the internet protocol (IP) address of the storage node, the media access control (MAC) address of the storage node, or the number of the storage node in the distributed storage system. The failure time is the time at which the storage node failed, and the failure cause is the reason it failed.
For example, from the fault table shown in Table 1 it can be seen that two storage nodes in the current distributed storage system have failed, namely the storage node identified as X and the storage node identified as Y: the storage node identified as X failed at time A with failure cause D, and the storage node identified as Y failed at time B with failure cause C.
TABLE 1

Number | Identifier of storage node | Time of failure | Cause of failure
01     | X                          | A               | D
02     | Y                          | B               | C
It should be noted that, after a storage node recorded in the fault table has been repaired, the supervisory node may delete the information about the repaired storage node from the fault table, so that the supervisory node can determine the number of currently failed storage nodes in the distributed storage system from the number of the last entry in the fault table.
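A minimal in-memory sketch of such a fault table, including deletion of repaired entries and renumbering so that the last number equals the count of failed nodes; the class and field names are assumptions chosen to match the columns of Table 1.

```python
from dataclasses import dataclass

@dataclass
class FaultRecord:
    number: int          # ordinal of the entry in the table
    node_id: str         # IP address, MAC address or in-system number of the node
    failure_time: str    # time at which the node failed
    failure_cause: str   # recorded cause of the failure

class FaultTable:
    def __init__(self):
        self.records = []                 # list of FaultRecord

    def add(self, node_id, failure_time, failure_cause):
        self.records.append(
            FaultRecord(len(self.records) + 1, node_id, failure_time, failure_cause))

    def remove_repaired(self, node_id):
        # Drop the repaired node and renumber the remaining entries.
        self.records = [r for r in self.records if r.node_id != node_id]
        for i, record in enumerate(self.records, start=1):
            record.number = i

    def failed_count(self) -> int:
        # The number of the last entry equals the count of failed nodes.
        return self.records[-1].number if self.records else 0
```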
When a storage node in the distributed storage system fails, a file system in the distributed storage system may be damaged. When the file system is damaged, metadata in the file system may become erroneous. Since metadata is system data describing the characteristics of a file, such as access rights, the file owner, and the distribution of the file's data blocks, erroneous metadata can make the data blocks of the file indicated by that metadata inaccessible.
In some implementations, when a client fails to access a data block in a storage node, the client may send the data volume of the data block and a data identifier uniquely identifying the data block to the supervisory node. After receiving them, the supervisory node stores the received data volume and data identifier, specifically in a data table. The data table may store a total data volume, the data identifiers, and the data volume corresponding to each data identifier, where the total data volume is the volume of all data in the distributed storage system that is currently inaccessible.
For example, as shown in Table 2, the data that currently cannot be accessed in the distributed storage system is the data in the data blocks indicated by data identifiers M and N: the total amount of inaccessible data is 30 kilobytes (KB), consisting of 10 KB in the data block indicated by data identifier M and 20 KB in the data block indicated by data identifier N.
TABLE 2

Total data volume: 30 KB

Data identifier | Data volume
M               | 10 KB
N               | 20 KB
It should be noted that, when the client accesses the data block that cannot be accessed again, if the client can access successfully, the client sends an access data success response carrying the data identifier of the data block to the supervisory node, where the access data success response is used to indicate that the data in the data block can be accessed, and after receiving the access data success response, the supervisory node may delete the data amount corresponding to the data identifier and update the data amount in the data table. For example, if the access data success response carries the data identifier M, the supervisory node deletes the information related to the data identifier M in the data table, and updates the total data amount to 20 KB. It should be noted that, the data table may also store an identifier of a storage node corresponding to the data identifier, so as to indicate which data block in which storage node cannot be accessed.
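A corresponding sketch of the data table is given below, including the update performed when an access-data-success response arrives; the class name and method names are illustrative assumptions.

```python
class DataTable:
    def __init__(self):
        self.blocks = {}                  # data identifier -> data volume in KB

    def record_inaccessible(self, data_id: str, volume_kb: int):
        self.blocks[data_id] = volume_kb

    def on_access_success(self, data_id: str):
        # The client reports that the block can be accessed again.
        self.blocks.pop(data_id, None)

    def total_volume_kb(self) -> int:
        return sum(self.blocks.values())

# Reproducing the example of Table 2:
table = DataTable()
table.record_inaccessible("M", 10)
table.record_inaccessible("N", 20)
assert table.total_volume_kb() == 30
table.on_access_success("M")              # identifier M becomes accessible again
assert table.total_volume_kb() == 20
```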
In some embodiments, since the distributed storage system is responsible for a relatively large amount of traffic, the number of the clients 101 and the storage nodes 102 may be relatively large, and in order to facilitate data transmission between the clients 101 and the storage nodes 102, the application layer where the clients 101 are located may be provided with at least one service switch to facilitate interaction between the clients 101 and the storage nodes 102. To facilitate data transmission between the storage nodes 102, at least one storage switch may be disposed in the storage tier where the storage nodes 102 are located to implement interaction between the storage nodes 102. In order to facilitate data transmission between the supervising node 103 and the storage node 102, a supervising switch may be provided to enable interaction between the supervising node 103 and the storage node 102.
As can be seen from the above description, in the distributed storage system, in addition to the service, a monitoring service needs to be provided, and different services can be implemented through different networks. In order to implement network connection among the client, the storage node and the supervisory node, at least one network port may be installed in each of the client, the storage node and the supervisory node, the at least one network port may be used to connect different networks, the different networks may transmit data of different services, and the at least one network port may be a service network port connected to a service network, a supervisory network port connected to a supervisory network, and a BMC network port connected to a BMC network, respectively.
For explaining a network environment in a distributed storage system, refer to fig. 2, which illustrates a schematic diagram of a network environment of a distributed storage system according to an embodiment of the present invention, in which a network may include a service network, a supervisory network, and a BMC network.
For example, when synchronizing the data block 1 stored in the storage node 1 to the storage node 2, the storage node 1 may pass through a service network port, and in the service network, send the data block 1 to the storage node 2, so that the storage node 2 may receive the data block 1 through its own service network port and store the data block 1 in the storage node 2.
The monitoring network is used for monitoring whether the storage node is in fault and inquiring information, and in the monitoring network, the fault state of the distributed storage system issued by the monitoring node can be transmitted, and the storage node in fault can be inquired. In some possible embodiments, the monitoring node may send the fault state of the distributed storage system to the monitoring network port of the storage node through the monitoring network port, and the storage node may receive the fault state of the monitoring node from the monitoring network through its monitoring network port, and after receiving the service request (that is, SCSI request hereinafter) from the client, the storage node may directly return the issued fault state to the client without processing the received service request, so that the client may perform corresponding fault processing according to the fault state.
The BMC network is a network for managing BMC, the monitoring node can monitor the state of the BMC by accessing the BMC network port of the BMC network, and whether the storage node has a fault can be determined according to the monitored state of the BMC. It should be noted that the BMC network is an optional network, and in some embodiments, it may be possible to monitor whether a storage node is faulty or not by using the BMC, but may also monitor whether a storage node is faulty or not by using other methods, so that the BMC network may also not be set in the distributed storage system, and the monitoring is directly implemented by using a monitoring network.
When the storage nodes in the distributed storage system are all connected with the service network, the supervisory network and the BMC network, the supervisory node can receive status information of the storage nodes, such as information about whether a fault occurs, from the three networks in real time.
To further illustrate the interaction process among the client, the storage node, and the supervisory node, referring to fig. 3, which illustrates an interaction diagram among devices in a distributed storage system according to an embodiment of the present invention, in fig. 3, at least one Object Storage Device (OSD) process, SCSI processing process, and Node Monitor Service (NMS) proxy process may be installed in one storage node.
An OSD process may correspond to one or more storage media in the storage node, where a storage medium may be a hard disk. The OSD process is configured to manage access requests for the one or more storage media. An access request instructs processing of data to be processed: the processing may include reading a data block that contains the data to be processed from the one or more storage media, or writing the data to be processed to the one or more storage media. When the access request is sent using SCSI, it may be regarded as an SCSI request.
The SCSI processing process is used for acquiring the SCSI request sent by the client from the service network, converting and decomposing the SCSI request to obtain a plurality of SCSI sub-requests, and issuing the SCSI sub-requests to the corresponding OSD processes. For example, the SCSI request carries a Logical Block Address (LBA) of data to be read being 100-.
The NMS agent process is used for receiving the fault state of the distributed storage system sent by the supervisory node and forwarding the received fault state to all OSD processes of the storage node. For example, the supervisory node sends the fault state of the distributed storage system to a storage node through the supervisory network; the NMS agent process in the storage node obtains the fault state from the supervisory network and sends it to each OSD process in the storage node. After an OSD process has received the fault state, if it then receives an SCSI sub-request or SCSI request from an SCSI processing process, it directly returns the received fault state to the SCSI processing process, so that the device on which the SCSI processing process is installed performs fault processing according to the received fault state.
It should be noted that, in some embodiments, the SCSI processing process is not installed in the storage node but in the client, and the device on which the SCSI processing process is installed is not specifically limited in this embodiment of the present invention. For example, suppose the SCSI request received by the SCSI processing process of the client carries an LBA range of 0 to 100 for the data to be read. Since the storage location indicated by LBA 0 to 50 is in storage node 1 and the storage location indicated by LBA 51 to 100 is in storage node 2, the SCSI processing process may convert and decompose the SCSI request into 2 SCSI sub-requests, where SCSI sub-request 1 requests reading of the data stored at LBA 0 to 50 in storage node 1, and SCSI sub-request 2 requests reading of the data stored at LBA 51 to 100 in storage node 2; the SCSI processing process then sends SCSI sub-request 1 to the OSD process in storage node 1 and SCSI sub-request 2 to the OSD process in storage node 2.
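The LBA-based decomposition in the example above can be sketched as follows; the mapping of LBA ranges to storage nodes is passed in explicitly here, which is a simplifying assumption.

```python
def split_scsi_request(start_lba: int, end_lba: int, node_ranges: dict) -> list:
    """Split one SCSI request into per-node sub-requests by LBA range.

    node_ranges maps a storage node name to the (first_lba, last_lba) range it holds.
    """
    sub_requests = []
    for node, (first, last) in node_ranges.items():
        lo, hi = max(start_lba, first), min(end_lba, last)
        if lo <= hi:                      # the request overlaps this node's range
            sub_requests.append({"node": node, "lba_range": (lo, hi)})
    return sub_requests

# LBA 0-50 lives on storage node 1 and LBA 51-100 on storage node 2:
print(split_scsi_request(0, 100, {"node1": (0, 50), "node2": (51, 100)}))
# [{'node': 'node1', 'lba_range': (0, 50)}, {'node': 'node2', 'lba_range': (51, 100)}]
```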
To further explain the hardware structure of the computer device, refer to the schematic structural diagram of the computer device provided in the embodiment of the present invention shown in fig. 4. The computer device 400 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 401 and one or more memories 402, where the memories 402 store at least one instruction that is loaded and executed by the processors 401 to implement the method provided in the fault handling method embodiments below. Of course, the computer device 400 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including instructions executable by a processor in a terminal to perform the fault handling method in the following embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In the embodiment of the invention, the supervisory node can determine the fault state of the distributed storage system according to the storage node with the fault, and the supervisory node issues the determined fault state to all the storage nodes in the distributed storage system; after receiving a read request or a write request of a user, a client may send an SCSI request to a storage node in the distributed storage system, so as to complete the read request or the write request of the user; after any storage node receives the SCSI request, any storage node does not process the received SCSI request for the moment, and returns the received fault state to the client, so that the client can process the fault based on the fault state returned by any storage node. In some embodiments, any storage node may also perform fault processing based on the fault state, and in a possible implementation manner, when any storage node receives the fault state sent by the supervisory node, if a SCSI request sent by the client is received again, any storage node may perform fault processing according to the fault state.
For further explaining the above process, refer to a flowchart of a distributed storage system fault handling method provided in an embodiment of the present invention as shown in fig. 5, where the method specifically includes:
501. The supervisory node determines at least one failed storage node within the distributed storage system and target data on the at least one storage node that cannot be accessed.
The target data may be all data stored in the distributed storage system that cannot be accessed, or only the inaccessible data stored on the at least one storage node. The embodiment of the present invention is described by taking the target data as the data on the at least one storage node that cannot be accessed as an example.
In a possible implementation manner, the supervising node may query the fault table and the data table every eighth preset time period, determine a storage node with a fault from the fault table, and determine data that cannot be accessed from the data table. The eighth preset time period may be 10 minutes or 1 hour, and the eighth preset time period is not particularly limited in the embodiment of the present invention.
It should be noted that, the manner of determining the storage node with the fault from the fault table and the manner of determining the data that cannot be accessed from the data table are described in the foregoing, and details are not described herein.
After the supervisory node performs step 501, it may determine the fault state according to the at least one storage node and the target data. In other words, determining the at least one failed storage node in the distributed storage system and the target data on it that cannot be accessed, and then determining the fault state according to the at least one storage node and the target data, together constitute determining the fault state of the distributed storage system according to the at least one failed storage node among the plurality of storage nodes. The process of determining the fault state of the distributed storage system from the failed storage node(s) can be implemented as shown in step 502 below.
502. When the number of the at least one storage node is larger than the redundancy of the distributed storage system, and the data volume of the target data meets a preset condition, the supervisory node determines the fault state of the distributed storage system as a first fault state.
The redundancy is the redundancy of the data stored in the distributed storage system, that is, the number of redundant copies of the data stored in the distributed storage system. The fault state of the distributed storage system is used to indicate whether the at least one storage node can be completely repaired within a first preset time period, and may be either a first fault state or a second fault state, where the second fault state indicates that the at least one storage node cannot be completely repaired within the first preset time period. That is, when the at least one storage node can be repaired within the first preset time period, the fault in the distributed storage system is considered repairable within a short time, and the distributed storage system is in the first fault state, which is a node short time fault (TND) state. When the at least one storage node cannot be repaired within the first preset time period, the fault in the distributed storage system is considered not repairable within a short time, and the distributed storage system is in the second fault state, which is a long-term node down (PND) state. The first preset time period may be 20 minutes or 2 hours and is not particularly limited in the embodiment of the present invention.
When the distributed storage system stores a large amount of data and has many storage nodes, if the number of failed storage nodes is smaller than the redundancy of the distributed storage system, the non-failed storage nodes can reconstruct the inaccessible data on the failed storage nodes from the data that is still accessible. In that case the normal service of the distributed storage system is not affected and the failed storage nodes do not need to be repaired. However, if the number of failed storage nodes is greater than the redundancy of the distributed storage system, the non-failed storage nodes cannot reconstruct the inaccessible data on the failed storage nodes from the accessible data, and normal services in the distributed storage system may be affected. Whether the data stored in the distributed storage system can be accessed is also an important factor affecting the service: when too much target data in the distributed storage system cannot be accessed, the impact on the service provided by the distributed storage system is larger, and normal operation of the service may be affected. Therefore, the supervisory node may first determine the fault state of the distributed storage system, so that fault processing can be performed quickly according to the fault state and the impact on the service is minimized. When the target data is small, the impact on the service is relatively small and normal operation of the service may not be affected, so in order to keep providing the service to users, the supervisory node may refrain from fault processing.
The supervisory node may determine, by means of a preset condition, whether the data volume of the target data can affect the normal operation of the service. The preset condition may include either of the following: the ratio of the data volume of the target data to a first preset data volume is greater than a preset ratio, where the first preset data volume is the total data volume of all data stored in the distributed storage system; or the data volume of the target data is larger than a second preset data volume. When the ratio between the data volume of the target data and the first preset data volume is greater than the preset ratio, or when the data volume of the target data is greater than the second preset data volume, the data volume of the target data is relatively large and may affect normal operation of the service. If the at least one storage node is not completely repaired within the first preset time period, the fault state can be updated to the second fault state. It should be noted that the preset ratio may be 0.4, 0.5, or 0.6, and the embodiment of the present invention does not specifically limit the preset ratio, the first preset data volume, or the second preset data volume.
To enable comparison of the number of the at least one storage node with the redundancy of the distributed storage system, the supervising node may query a fault table stored by the supervising node, determining the number of the at least one storage node from the fault table. In order to determine whether the data volume of the target data meets the preset condition, the supervisory node may query the stored data table, and determine the data volume of the target data that cannot be accessed currently in the distributed storage system from the data table. For example, the supervising node, by looking up table 2, may determine that the target data includes 10KB of data within the data block indicated by data identity M and 20KB of data within the data block indicated by data identity N.
It should be noted that, when the fault table is introduced in the foregoing, a process of determining the number of storage nodes that have faults in the distributed storage system from the fault table is described, and here, the process of determining the number of storage nodes that have faults in the distributed storage system from the fault table is not described in detail in the embodiment of the present invention.
It should be noted that the process shown in this step 502 is a process of determining a fault state of the distributed storage system according to the at least one storage node and the target data.
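A compact sketch of the decision in step 502, with the preset condition folded in; the threshold values, the state label "TND" and the function name are placeholders, not values prescribed by the patent.

```python
def determine_fault_state(failed_count, redundancy,
                          inaccessible_kb, total_kb,
                          preset_ratio=0.5, second_preset_kb=1024):
    """Return the first fault state ("TND") when the failure must be handled,
    or None when the system can keep serving without fault processing."""
    data_condition = (
        (total_kb > 0 and inaccessible_kb / total_kb > preset_ratio)
        or inaccessible_kb > second_preset_kb
    )
    if failed_count > redundancy and data_condition:
        # Initially the first fault state; it is later updated to the second
        # fault state if repair does not finish within the first preset duration.
        return "TND"
    return None
```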
503. The supervisory node sends a first fault identification to all storage nodes within the distributed storage system indicating a first fault status.
The fault identifier of the fault state includes any one of a first fault identifier and a second fault identifier, where the first fault identifier is used to indicate the first fault state, and the second fault identifier is used to indicate the second fault state, and the first fault identifier and the second fault identifier may be different, for example, the first fault identifier may be s, and the second fault identifier may be t.
The supervisory node may send the first fault identifier to the NMS agent process of each storage node, thereby sending the first fault identifier to all storage nodes to inform all storage nodes that the current fault state of the distributed storage system is the first fault state.
It should be noted that the process shown in this step 503 is also a process in which the supervisory node sends the fault status to each of the plurality of storage nodes included in the distributed storage system.
It should be noted that, in some embodiments, after the monitoring node determines the fault state, the monitoring node may further perform fault processing according to the fault state of the storage system.
504. A target storage node in the distributed storage system receives the first failure identification.
The target storage node is any storage node in the distributed storage system. Each OSD process in the target storage node can obtain the first fault identifier from the NMS agent process of the target storage node. It should be noted that each storage node in the distributed storage system may execute this step 504, and a failed storage node may or may not receive the first fault identifier.
505. The target device sends an access request to the target storage node.
The access request is used to indicate reading data stored in the target storage node or writing data to the target storage node. The target device is a device on which an SCSI processing process is installed; it may be the target client or a target storage node, where the target client is any client in the distributed storage system. This step 505 may be implemented by the SCSI processing process within the target device.
Before step 505, a target client in the distributed storage system may send a target access request to the target device. The target access request requests processing of first target data, where the first target data includes the data indicated by the access request, and the target access request may carry a target storage address, which may be the storage address of the first target data. The target access request is sent by a target virtual machine that may be installed on the target client; specifically, the target virtual machine may send the target access request to the SCSI processing process within the target device. This sending may be triggered by a user action: for example, when the user inputs the storage address of data to be read in the client interface and clicks a read button, the target virtual machine within the client is triggered to send the target access request to the SCSI processing process to request reading of the data stored at the address input by the user.
The target device then receives the target access request sent by the target client in the distributed storage system, and this step 505 may be implemented as follows: the target device sends an access request to a target storage node in the distributed storage system based on the target access request. Specifically, after receiving the target access request, the SCSI processing process converts and decomposes it according to the target address into a plurality of access requests, where each access request may carry a part of the target address, and that part may be an offset address within a storage medium managed by an OSD process in the target storage node. The SCSI processing process then sends each access request to the corresponding OSD process in the target storage node. The process of converting the target access request into access requests is the SCSI request conversion process described above.
506. After a target storage node in the distributed storage system receives the first fault identifier, if the target storage node then receives an access request, the target storage node suspends the access request and sends the first fault identifier to a target device.
This step 506 may be implemented by the OSD process within the target storage node receiving the access request. After the target storage node receives the first failure flag, it indicates that the target storage node already knows the failed storage node in the distributed storage system, and the current failure state is the first failure state.
In order to enable the target device to also know the fault state of the distributed storage system, when the target storage node receives an access request sent by the target device, the target storage node may output the first fault identifier to the target device. Specifically, after any OSD process in the target storage node receives an access request sent by a SCSI processing process of the target device, any OSD process sends the first failure identifier to the SCSI processing process. Of course, in some embodiments, after determining the fault state of the distributed storage system, the supervisory node may also directly send the fault state to the target device, so that the target device may obtain the fault state of the distributed storage system, that is, the fault state does not need to be sent to the target device through the storage node.
It should be noted that the process shown in step 506 is a process of outputting the failure flag when the target storage node in the distributed storage system receives the failure flag and then receives the access request.
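For illustration only, the following Python sketch outlines how an OSD process could cache the fault identifier received from the NMS agent process (step 504), suspend later access requests, and return the identifier (step 506); the class and the identifier values are assumptions of the example, not the patented implementation.

```python
FIRST_FAULT = "FAULT_STATE_1"   # repairable within the first preset duration
SECOND_FAULT = "FAULT_STATE_2"  # not repairable within the first preset duration


class OSDProcess:
    def __init__(self):
        self.fault_id = None   # fault identifier pushed by the NMS agent process
        self.suspended = []    # access requests held while the system is faulty

    def on_fault_identifier(self, fault_id):
        # Step 504 / 512: record the fault identifier obtained via the NMS agent process.
        self.fault_id = fault_id

    def on_repair_complete(self):
        # Step 509 / 517: delete the identifier and hand back the suspended requests.
        self.fault_id = None
        pending, self.suspended = self.suspended, []
        return pending

    def on_access_request(self, request):
        # Step 506 / 514: suspend the request and output the fault identifier.
        if self.fault_id is not None:
            self.suspended.append(request)
            return {"status": "suspended", "fault_identifier": self.fault_id}
        return {"status": "ok", "data": b"..."}  # normal read/write path (elided)
```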
507. And the target device receives a first fault identification returned by the target storage node based on the access request.
This step 507 is implemented by the SCSI processing process in the target device, and the process shown in this step 507 is a process of receiving the fault identifier returned by the target storage node based on the access request. When any OSD process sends the first fault identifier to the SCSI processing process, the SCSI processing process can receive the first fault identifier. It should be noted that the process shown in this step 507 is a process of receiving a response returned by the target storage node, where the response includes the fault state of the distributed storage system, the fault state is used for indicating whether the at least one failed storage node can be completely repaired within the first preset duration, and the fault state included in the response is also the fault identifier.
508. And the target equipment carries out fault processing based on the received first fault identification.
This step 508 may be executed by a SCSI processing process installed in the target device, and after the SCSI processing process receives the first failure identifier, the SCSI processing process may perform failure processing based on the first failure identifier, so that the target client can perform corresponding processing.
In some embodiments, the target virtual machine interfacing with the SCSI processing process in the target client may or may not be a VMWare virtual machine, and since the target device performs fault processing differently for different virtual machines, this step 508 may be implemented in either of the following modes 1 and 2.
Mode 1, when the access request is sent by a target client in the distributed storage system based on a target virtual machine, and the target virtual machine is a VMWare virtual machine, if the failure state is the first failure state, the SCSI processing process does not respond to the access request to the target virtual machine.
When the SCSI processing process receives the first fault identifier, it indicates that the distributed storage system is in the first fault state; for the VMWare virtual machine, the first fault state corresponds to the APD state. Since the access request received by the SCSI processing process was sent for that target virtual machine, the SCSI processing process does not respond to the target virtual machine, so that the VMWare virtual machine perceives that the distributed storage system is in the APD state defined by the virtual machine itself. Because all SCSI processing processes in the distributed storage system issue access requests to the OSD processes, they all receive the first fault identifier, and therefore every SCSI processing process handling an access request withholds its response to the target virtual machine, which emulates a state in which no link responds (all links DOWN). Without receiving a response from the SCSI processing process, the target virtual machine keeps resending access requests. Therefore, even if not all storage nodes in the distributed storage system have failed, fault processing can be carried out according to the fault state defined by the VMWare virtual machine.
Mode 2, when the access request is sent by the target client based on the target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is the first fault state, the SCSI processing process sends a retry request to the target virtual machine, where the retry request is used to instruct the target virtual machine to resend the access request.
The sense key carried in the retry request may be a Unit Attention (0x6) error code, which indicates that the storage medium or the link state in the storage node has changed, that is, a fault has occurred. After the target virtual machine receives the retry request, the target virtual machine re-issues the access request to the SCSI processing process, thereby achieving the same effect as the processing in mode 1 above.
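The two modes above can be summarized, purely as an illustrative sketch, by the following Python fragment; the return values and the constant name are invented for the example and do not reproduce a real SCSI response layout.

```python
UNIT_ATTENTION_SENSE_KEY = 0x06  # carried in the retry request of mode 2


def handle_first_fault(virtual_machine_type: str):
    """Reaction of the SCSI processing process once the first fault identifier is received."""
    if virtual_machine_type == "VMWare":
        # Mode 1: withhold the response so the VMWare virtual machine perceives the
        # APD state and keeps resending the access request.
        return None
    # Mode 2: ask any other virtual machine to resend the access request.
    return {"action": "retry", "sense_key": UNIT_ATTENTION_SENSE_KEY}


print(handle_first_fault("VMWare"))  # None: no response is returned
print(handle_first_fault("other"))   # retry request carrying Unit Attention
```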
For different virtual machines, the embodiment of the invention provides different fault processing modes, so that the fault processing mode provided by the embodiment of the invention has universality.
It should be noted that the process shown in step 508 is a process of performing fault handling based on the fault state included in the response.
509. If all the at least one storage node is repaired within the first preset time, the supervisory node sends a repair completion response to each device in the distributed storage system, and the repair completion response is used for indicating that no fault device exists in the distributed storage system.
The devices include the storage nodes and the clients. When all of the at least one storage node is repaired within the first preset duration, it indicates that there is no faulty device in the distributed storage system at this time. If the supervisory node stores the first fault identifier used for identifying the first fault state, the supervisory node may delete the first fault identifier and send a repair completion response to each device in the distributed storage system, so as to notify each device that there is no faulty device in the distributed storage system and that it can work normally. Then, after each device receives the repair completion response, it deletes the previously received fault identifier and may start to work normally. It should be noted that the process shown in step 509 is a process of sending a repair completion response to each device in the distributed storage system when the repair of the at least one storage node is completed.
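For illustration only, the following Python sketch shows a possible shape of this repair-completion handling on the supervisory node; the device objects and their send method are placeholders assumed for the example.

```python
class StorageDeviceStub:
    """Placeholder for a storage node or client; send() is an assumed interface."""
    def send(self, message):
        print("device received:", message)


class SupervisoryNode:
    def __init__(self, devices):
        self.devices = devices                   # storage nodes and clients
        self.fault_identifier = "FAULT_STATE_1"  # stored first fault identifier

    def on_all_repaired(self):
        # Step 509: delete the stored first fault identifier, then tell every
        # device that no faulty device remains so it can resume normal work.
        self.fault_identifier = None
        for device in self.devices:
            device.send({"type": "repair_complete"})


SupervisoryNode([StorageDeviceStub(), StorageDeviceStub()]).on_all_repaired()
```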
In the prior art, when a client does not obtain a response from any storage node, the fault state of the distributed storage system is directly considered to be the APD state, and the fault state of the distributed storage system is determined to be the PDL state only when a storage node explicitly returns a message indicating a storage exception. If a storage node fails for a long time, the message indicating the storage exception may never be returned to the client, so the client may wrongly determine the fault state as the APD state. In the embodiment provided by the invention, each storage node in the distributed storage system knows the fault state of the distributed storage system, and a storage node that has not failed can return the fault state to the target device based on the SCSI request, so that the target device can definitely know the fault state of the distributed storage system, which improves the accuracy with which the target device determines the fault state.
510. When the fault state is the first fault state, if all the at least one storage node is not repaired within the first preset time, the supervisory node updates the fault state from the first fault state to the second fault state.
Since the first preset duration is only a preset value, the at least one storage node may not all be repaired within it; when that happens, repairing the storage nodes that have not yet been repaired may take a longer time. Because the time needed to repair those storage nodes is uncertain and may be long, the supervisory node may directly update the fault state from the first fault state to the second fault state.
511. When the fault state is updated from the first fault state to the second fault state, the supervisory node sends a second fault identifier indicating the second fault state to all storage nodes in the distributed storage system.
The manner in which the supervisory node sends the second fault identifier to all the storage nodes is the same as the manner in which the supervisory node sends the first fault identifier to all the storage nodes in step 503, and this step 511 is not described in detail again in this embodiment of the present invention. The process shown in this step 511 is also a process of sending the fault state to each of the plurality of storage nodes.
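For illustration only, the behavior of steps 510 and 511 could be sketched as the following Python fragment; the duration value and the broadcast callable are assumptions of the example.

```python
import time

FIRST_PRESET_DURATION = 300.0  # seconds; the concrete value is an assumption


def supervise_repair(fault_start: float, all_repaired: bool, broadcast) -> str:
    """Called periodically by the supervisory node after it enters the first fault state.
    broadcast(identifier) sends a fault identifier to every storage node."""
    if all_repaired:
        broadcast("repair_complete")   # step 509: repair finished in time
        return "repair_complete"
    if time.time() - fault_start > FIRST_PRESET_DURATION:
        broadcast("FAULT_STATE_2")     # steps 510-511: switch to the second fault state
        return "second_fault_state"
    return "first_fault_state"         # keep waiting within the first preset duration
```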
512. The target storage node receives the second failure identification.
The manner in which the target storage node receives the second failure flag is the same as the manner in which the target storage node receives the first failure flag in step 504, and here, this step 512 is not described in detail in this embodiment of the present invention.
513. The target device sends an access request to the target storage node.
The manner in which the target device sends the access request to the target storage node is described in step 505, and this step 513 is not described in detail in this embodiment of the present invention.
514. When a target storage node in the distributed storage system receives the second fault identifier, if the target storage node then receives an access request, the target storage node suspends the access request and outputs the second fault identifier.
The manner in which the target storage node suspends the access request and outputs the second fault identifier is the same as the manner in which the target storage node suspends the access request and outputs the first fault identifier in step 506, and this step 514 is not described in detail again in this embodiment of the present invention.
515. And the target device receives a second fault identification returned by the target storage node based on the access request.
The manner in which the target device receives the second failure flag is the same as the manner in which the target device receives the first failure flag in step 507, and here, this is not described in detail in this embodiment of the present invention.
516. And the target equipment carries out fault processing based on the received second fault identification.
This step 516 may be executed by the SCSI processing process of the target device. In some embodiments, the target virtual machine interfacing with the SCSI processing process in the client may or may not be a VMWare virtual machine, and since the target device performs fault processing differently for different virtual machines, this step 516 may be implemented in either of the following modes 3 and 4.
Mode 3, when the access request is sent by the target client based on the target virtual machine and the target virtual machine is the VMWare virtual machine, if the failure state is the second failure state, the SCSI processing process returns a message of storage exception to the VMWare virtual machine.
When the fault identifier is the second fault identifier, it indicates that the at least one storage node cannot be repaired within the first preset duration and needs a longer time, so the target virtual machine should perform fault processing in the PDL state. In order that the target virtual machine can perceive the PDL state, the storage exception message may carry a SCSI error customized by the VMWare virtual machine, such as SK 0x0 with ASC & ASCQ 0x0200, or SK 0x5 with ASC & ASCQ 0x2500. The SCSI error indicates that the state of the distributed storage system is the PDL state, so that, after receiving the storage exception message, the target virtual machine can perceive the PDL state. The target virtual machine may then power down the file system in the distributed storage system and wait for a technician to repair the fault in the distributed storage system, or select a better fault processing mode to process the failed storage node according to a user-defined fault processing mode, for example, powering down the failed storage node.
It should be noted that, because the file system may become abnormal when a fault of a storage node in the distributed storage system cannot be repaired in a short time, and in order to ensure that the file system can be used normally after the fault of the storage node in the distributed storage system is repaired, the file system needs to be powered off first; when the repair is finished, the file system is powered on and repaired.
Mode 4, when the access request is sent by the target client based on the target virtual machine and the target virtual machine is not the VMWare virtual machine, if the failure state is the second failure state, the target device returns a target error recognizable by the target virtual machine to the target virtual machine, where the target error is used to indicate a storage medium failure.
The target error may be a Sense key 0x3 error, that is, a storage Medium Error, which a general virtual machine can recognize. When the target virtual machine receives the target error, it indicates that the state of the distributed storage system is the second fault state at this time; the target device may then power down the distributed file system and wait for a technician to repair the fault in the distributed storage system, or select a better fault processing mode according to a user-defined fault processing mode to process the failed storage node.
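Modes 3 and 4 can be summarized, purely as an illustrative sketch, by the following Python fragment; the dictionary return values are invented for the example and do not model the byte layout of a real SCSI sense response.

```python
def handle_second_fault(virtual_machine_type: str):
    """Reaction of the SCSI processing process once the second fault identifier is received."""
    if virtual_machine_type == "VMWare":
        # Mode 3: return a storage-exception message so the VMWare virtual machine
        # perceives the PDL state (e.g. SK 0x5 with ASC & ASCQ 0x2500, as named in the text).
        return {"action": "storage_exception", "sense_key": 0x5, "asc_ascq": 0x2500}
    # Mode 4: return a Medium Error (sense key 0x3) that a generic virtual machine
    # recognizes as a storage medium fault.
    return {"action": "target_error", "sense_key": 0x3}
```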
It should be noted that the process shown in step 516 is a process of performing fault handling on the target device based on the received fault identifier, that is, a process of performing fault handling based on the fault status included in the response.
517. When the at least one storage node completes repairing, a repair complete response is sent to each device in the distributed storage system.
Step 517 is the same as step 509, and this step 517 is not described herein in this embodiment of the present invention. It should be noted that the failure of each storage node may be repaired by itself or by a technician, and the embodiment of the present invention does not specifically limit the repair method of the storage node.
It should be noted that, when the client receives the repair completion response, if the file system has been powered off, the client powers on the file system and repairs it. Because a large amount of metadata is stored in the file system, repairing the file system requires scanning all the metadata in the file system and modifying the erroneous metadata found during the scan, so the repair generally consumes a certain amount of time. In the first fault state, the client does not power down the file system; as long as the at least one storage node can be repaired within the first preset duration, powering down the file system can be avoided, so that the time for repairing the file system can be reduced and the distributed storage system can restore services as soon as possible, thereby ensuring the quality of service.
According to the method disclosed by the embodiment of the invention, the fault state of the distributed storage system is determined according to at least one storage node with a fault in the plurality of storage nodes, so that the fault state of the distributed storage system is determined without determining that all the storage nodes have faults, and after the fault state is determined, the fault state can be immediately sent to each storage node in the distributed storage system, so that each storage node can perform fault processing according to the determined fault state, and the time for the distributed storage system to recover to be normal can be reduced. Moreover, for different virtual machines, the embodiment of the invention provides different fault processing modes, so that the fault processing mode provided by the embodiment of the invention has higher universality. And each storage node in the distributed storage system knows the fault state of the distributed storage system, and for the storage nodes without faults, the fault state can be returned to the target equipment based on the access request, so that the target equipment can definitely know the fault state of the distributed storage system, and the precision of determining the fault state by the target equipment can be further improved. In addition, in the first failure state, the target device does not power off the file system, once the at least one storage node can be repaired within the first preset time, the power off of the file system can be avoided, and after the at least one storage node is restored, the file system and the service can be immediately restored, so that the time for repairing the file system can be reduced, the service can be restored by the distributed storage system as soon as possible, and the service quality can be guaranteed.
Since the failure of a storage node may be repairable in a short time and may also require a long time to repair, and the repair time of each storage node may affect the repair time of the entire distributed storage system, in some embodiments, the failure status of the distributed storage system may also be determined according to the failure type of each storage node. To further illustrate this process, refer to a flowchart of a distributed storage system fault handling method provided by the embodiment of the present invention shown in fig. 6, where the flow of the method may include the following steps.
601. The supervisory node determines at least one failed storage node in the distributed storage system and target data in the at least one storage node that cannot be accessed.
Step 601 is the same as step 501, and step 601 is not described herein in this embodiment of the present invention.
602. And the supervision node determines a fault scene of the distributed storage system according to the time when the at least one storage node fails, wherein the fault scene is used for indicating whether the at least one storage node fails simultaneously.
The failure scenario may include any one of a first failure scenario and a second failure scenario, wherein the first failure scenario is used for indicating that the at least one storage node fails simultaneously, and the second failure scenario is used for indicating that the at least one storage node fails at different times.
The supervising node can determine the time of the failure of each storage node from the stored failure table, so that the supervising node can determine the failure scene according to whether the time of the failure of at least one storage node is the same.
In a possible implementation manner, when the at least one storage node fails within the target time length, the supervisory node determines the failure scenario as a first failure scenario, otherwise, determines the failure scenario as a second failure scenario. It should be noted that the process shown in step 602 is a process of determining the failure scenario according to the time when the at least one storage node fails.
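For illustration only, the determination of step 602 could be sketched as follows in Python, assuming the fault table records one failure timestamp per failed storage node; the target duration value is an assumption of the example.

```python
TARGET_DURATION = 5.0  # seconds; failures this close together count as simultaneous


def determine_fault_scenario(failure_times: list) -> str:
    """failure_times: one failure timestamp per failed storage node, read from the fault table."""
    if max(failure_times) - min(failure_times) <= TARGET_DURATION:
        return "first_fault_scenario"   # the nodes failed at the same time
    return "second_fault_scenario"      # the nodes failed at different times
```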
603. For any storage node in the at least one storage node, when a preset network fault, a preset abnormal power failure fault, a preset misoperation fault, a preset hardware fault or a preset software fault occurs in the any storage node, the supervisory node determines the fault type of the any storage node as a first fault type, otherwise, determines the fault type of the any storage node as a second fault type.
The fault type of a storage node is used to indicate whether the fault of that storage node can be repaired within a second preset duration. The fault type may be either the first fault type or the second fault type: the first fault type is used to indicate that the fault of a storage node can be repaired within the second preset duration, and the second fault type is used to indicate that the fault of a storage node cannot be repaired within the second preset duration. The second preset duration may be less than or equal to the first preset duration.
Wherein the preset network fault may include any one of the following items 1.1 to 1.7:
Item 1.1, the service network port of the any storage node cannot be accessed, but the supervision network port of the any storage node can be accessed, where the service network port is a port of the service network used between storage nodes for heartbeat, data synchronization and mirroring, and the supervision network port is a port of the supervision network used for monitoring whether a storage node is faulty and for information query.
The supervisory node can send a ping request to the service network port of the any storage node through its own service network port, where the ping request is used for requesting to establish a connection; if the connection succeeds, the service network port of the any storage node is considered accessible, otherwise the service network port of the any storage node is considered inaccessible. Similarly, the supervisory node can send a ping request to the supervision network port of the any storage node through its own supervision network port; if the connection succeeds, the supervision network port of the any storage node is considered accessible, otherwise it is considered inaccessible.
If the supervisory node cannot access the any storage node through the service network port, a network fault has occurred in the any storage node; if the any storage node can still be accessed through the supervision network port, the fault of the any storage node can be repaired in a short time, and therefore the preset network fault is considered to have occurred in the any storage node.
Item 1.2, when the service network and the supervision network are the same target network, a first preset number of packet losses or a second preset number of malformed packets occur in the data packets transmitted by the any storage node in the target network, and the service network port, the supervision network port, and the BMC network port of the any storage node are all inaccessible, where the BMC network port is a port of the BMC network used to manage the baseboard management controller (BMC).
The monitoring node can send a ping request to the BMC network port of any storage node through the BMC network port of the monitoring node, if the connection is successful, the BMC network port of any storage node is considered to be accessible, otherwise, the BMC network port of any storage node is considered to be inaccessible.
In the delivery stage, a technician may configure the service network and the supervision network as the same network, that is, the target network. When the service network and the supervision network are the same target network, if a first preset number of packet losses or a second preset number of malformed packets occur in the data packets transmitted by the any storage node in the target network, it indicates that a network fault has occurred in the any storage node; if, in addition, the supervisory node cannot access the any storage node, it indicates that the fault of the any storage node can be repaired in a short time, and therefore the preset network fault has occurred in the any storage node.
Item 1.3, when the service network and the supervisory network are the same target network, a packet loss greater than a first preset number or a malformed packet greater than a second preset number occurs in a data packet transmitted by any storage node in the target network, and a time delay of the data transmission by any storage node in the target network is greater than a third preset time.
It should be noted that, when the monitoring node sends the ping request to any storage node, if the connection is unsuccessful, the any storage node sends a connection failure response to the monitoring node, where the connection failure response is used to indicate that the connection is failed, and the connection failure response may carry delay information, and the delay information is used to indicate a delay of data transmission by any storage node in the target network, so that the monitoring node may determine whether the delay indicated by the delay information is greater than a third preset duration.
Item 1.4, when the service network and the supervision network are the same target network, more than a third preset number of priority-based flow control (PFC) messages of one priority sent by the any storage node appear in the target network, and the any storage node is inaccessible.
The supervisory node may detect the priority-based flow control (PFC) messages of each priority in the target network, so as to determine whether the number of PFC messages sent by the any storage node is greater than the third preset number, and thereby determine whether the any storage node meets the preset condition. The third preset number is not specifically limited in the embodiment of the present invention.
Item 1.5, when the service network and the supervision network are the same target network, more than the third preset number of PFC messages of one priority sent by the any storage node appear in the target network, and the time delay of the any storage node transmitting data in the target network is longer than a fourth preset duration.
It should be noted that the fourth preset duration is not specifically limited in the embodiment of the present invention.
Item 1.6, when the service network and the supervision network are the same target network, a broadcast storm caused by the any storage node occurs in the target network, and the service network port, the supervision network port, and the BMC network port of the any storage node are all inaccessible.
It should be noted that when any storage node sends a large number of broadcast packets in the target network, a broadcast storm may occur in the target network.
Item 1.7, when the service network and the supervision network are the same target network, a broadcast storm caused by the any storage node occurs in the target network, and the time delay of the any storage node transmitting data in the target network is longer than a fifth preset duration.
It should be noted that the fifth preset duration is not specifically limited in the embodiment of the present invention.
The preset abnormal power failure fault may include any one of the following items 2.1 to 2.2:
Item 2.1, the service network ports, supervision network ports, and BMC network ports of all storage nodes in a subrack are inaccessible, and the subrack includes the any storage node.
A subrack may include at least one storage node. When the service network ports, supervision network ports, and BMC network ports of all storage nodes in the subrack are inaccessible, all storage nodes in the subrack can be considered to have been powered down. If the any storage node is in that subrack, it indicates that the any storage node has been powered down, and its fault can be repaired as soon as the subrack is powered up again; therefore, the preset abnormal power failure fault is considered to have occurred in the any storage node.
Item 2.2, within a seventh preset duration, none of the service network ports, supervision network ports, and BMC network ports of a first target number of storage nodes is accessible, and the first target number of storage nodes includes the any storage node.
When the service network ports, supervision network ports, and BMC network ports of the first target number of storage nodes are all inaccessible within the seventh preset duration, the first target number of storage nodes can all be considered to have had the preset abnormal power failure fault. It should be noted that the seventh preset duration is not specifically limited in the embodiment of the present invention.
The preset misoperation fault may include: the any storage node is actively powered down. For example, when a user clicks a shutdown button or a restart button of the any storage node, the any storage node considers that it is being actively powered down and sends active power-down information to the supervisory node, so that the supervisory node determines that the preset misoperation fault has occurred in the any storage node.
The preset hardware fault includes: the any storage node exits abnormally, the any storage node can still be accessed through its BMC network port, and a loose component exists in the any storage node.
When the any storage node exits abnormally, it may send abnormal exit information to the supervisory node to indicate the abnormal exit. The abnormal exit may be caused by the loosening of an internal component. The loose component may be, for example, a memory module or an add-in card, and such a fault can be recovered immediately by re-seating the component, that is, it is a short-time fault. It should be noted that when the any storage node detects that a component is poorly connected, it indicates that the any storage node has a loose component; the any storage node may then send loose-component information to the supervisory node, so that the supervisory node determines, according to the loose-component information, that the preset hardware fault has occurred in the any storage node.
The preset software fault may include any one of the following items 3.1-3.3:
the exception reset of any one of the storage nodes is caused by an operating system exception for any one of the storage nodes of item 3.1.
When the memory of the any storage node is insufficient, the operating system of the any storage node cannot continue to run and needs to be reset; the abnormal reset of the any storage node may also be triggered by a watchdog, and so on. When the abnormal reset occurs in the any storage node, the any storage node can send an abnormal reset message to the supervisory node, so that the supervisory node knows that the abnormal reset has occurred in the any storage node, which indicates that the preset software fault has occurred in the any storage node.
Item 3.2, a software exception of the any storage node causes the target process of the any storage node to exit.
The target process may be an OSD process. When the target process exits, the any storage node may send a message indicating that the target process has exited to the supervisory node, so that the supervisory node knows that the target process of the any storage node has exited, which indicates that the preset software fault has occurred in the any storage node.
Item 3.3, a software exception of the any storage node results in an operating system reset of the any storage node.
When the operating system of the any storage node is reset due to a software exception, the any storage node can send an operating system reset message to the supervisory node, so that the supervisory node knows that the operating system of the any storage node has been reset, which indicates that the preset software fault has occurred in the any storage node.
It should be noted that, when each storage node in the at least one storage node fails, the supervising node may determine, through this step 603, whether the failure type of each storage node is the first failure type or the second failure type, and store the failure type of each storage node in the failure table, so that when the supervising node needs the failure type of any storage node, the supervising node may directly obtain the failure type from the failure table.
It should be noted that the multiple fault type determination method embodied in this step 603 can accurately determine the fault type of each storage node, and further can determine the fault state of the distributed storage system more accurately according to the fault type of each storage node.
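For illustration only, the classification of step 603 could be sketched as the following Python fragment, assuming each detected condition is reported to the supervisory node as a tag; the tag names are invented for the example.

```python
FIRST_TYPE_FAULTS = {
    "preset_network_fault",           # items 1.1 - 1.7
    "preset_abnormal_power_failure",  # items 2.1 - 2.2
    "preset_misoperation_fault",      # active power-down by the user
    "preset_hardware_fault",          # abnormal exit with a loose component
    "preset_software_fault",          # items 3.1 - 3.3
}


def classify_fault_type(detected_fault: str) -> str:
    """First fault type: repairable within the second preset duration; otherwise second type."""
    if detected_fault in FIRST_TYPE_FAULTS:
        return "first_fault_type"
    return "second_fault_type"
```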
604. When the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, if the fault scene is the first fault scene, the supervisory node determines the fault state according to the fault type of each storage node in the at least one storage node.
Since the first fault scenario is used to indicate that the at least one storage node failed at the same time, the supervisory node may determine the fault state of the distributed storage system according to the fault type of each of these storage nodes.
In one possible implementation, the determining, by the supervising node, the fault status according to the fault type of each of the at least one storage node may include: when the fault type of each storage node in the at least one storage node is a first fault type, the supervising node determines the fault state as the first fault state, and the first fault state is used for indicating that the at least one storage node can be completely repaired within the first preset time length; when the failure type of a target number of storage nodes in the at least one storage node is a second failure type, if the target number is less than or equal to the redundancy of the distributed storage system, the supervising node determines the failure state as the first failure state, otherwise, determines the failure state as the second failure state, and the second failure state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
When the failure type of each of the at least one storage node is the first failure type, it may be considered that the failure within the distributed storage system may be repaired in a short time, and the failure state may be determined as the first failure state. Although the failure type of the target number of storage nodes is the second failure type, when the target number is less than or equal to the redundancy of the distributed storage system, which indicates that the target number of storage nodes has little influence on the distributed storage system, the failure state may be determined as the first failure state. Once the target number is greater than the redundancy of the distributed storage system, which indicates that the target number of storage nodes has a greater impact on the distributed storage system, the failure state may be determined as a second failure state, so as to quickly repair the failure.
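For illustration only, the determination of step 604 could be sketched as the following Python fragment; the state and type names are the illustrative tags used in the earlier sketches.

```python
def determine_state_first_scenario(fault_types: list, redundancy: int) -> str:
    """fault_types: one fault type per failed storage node (first fault scenario)."""
    second_type_count = sum(1 for t in fault_types if t == "second_fault_type")
    if second_type_count == 0:
        return "first_fault_state"   # every failed node is repairable in the short term
    if second_type_count <= redundancy:
        return "first_fault_state"   # long-term failures are still covered by the redundancy
    return "second_fault_state"      # the nodes cannot all be repaired within the first preset duration
```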
It should be noted that, for the description whether the number of the at least one storage node is greater than the redundancy of the distributed storage system and whether the data size of the target data meets the preset condition, the description is presented in the foregoing, and no further description is given to this embodiment of the present invention.
It should be noted that, when the fault scenario is the first fault scenario, the supervisory node may determine the fault type of each storage node according to any one of the preset network fault, the preset abnormal power failure fault, or the preset misoperation fault described in step 603.
605. When the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, if the fault scene is a second fault scene, the supervision node determines the fault state according to the fault type of the first storage node with the fault at the last of the at least one storage node.
Since the second failure scenario is used to indicate that the at least one storage node fails at different times, the supervising node may determine the failure status of the distributed storage system according to the failure type of the last failed storage node of the at least one storage node, where the last failed storage node is also the first storage node.
In a possible implementation manner, when the failure type of the first storage node is a first failure type, the supervising node determines the failure state as a first failure state, where the first failure state is used to indicate that the at least one storage node can be completely repaired within the first preset time period; when the fault type of the first storage node is the second fault type, the supervising node determines the fault state as a second fault state, and the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time.
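For illustration only, the determination of step 605 could be sketched as the following Python fragment, using the same illustrative tags.

```python
def determine_state_second_scenario(last_failed_node_type: str) -> str:
    """Only the fault type of the storage node that failed last (the first storage node) matters."""
    if last_failed_node_type == "first_fault_type":
        return "first_fault_state"
    return "second_fault_state"
```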
It should be noted that the processes shown in steps 604 and 605 are processes for determining the fault state of the distributed storage system according to the fault scenario of the distributed storage system.
It should be noted that, when the fault scenario is the second fault scenario, the supervisory node only needs to determine the fault type of the first storage node and does not need to determine the fault types of the other storage nodes; the fault type of the first storage node may be determined according to any one of the preset network fault, the preset abnormal power failure fault, the preset misoperation fault, the preset hardware fault, or the preset software fault described in step 603.
606. The supervisory node sends a fault identification to all storage nodes within the distributed storage system indicating a fault status.
When the failure state is the first failure state, the supervising node sends a first failure flag to all storage nodes in the distributed storage system, and when the failure state is the second failure state, the supervising node sends a second failure flag to all storage nodes in the distributed storage system. It should be noted that the process shown in this step 606 is also a process of sending the fault status to each of the plurality of storage nodes.
607. A target storage node within the distributed storage system receives a failure identification.
When the failure identifier is the first failure identifier, the target storage node receives the first failure identifier, and when the failure identifier is the second failure identifier, the target storage node receives the second failure identifier, and the specific execution process is the same as that in step 504, which is not described herein again.
608. The target device sends an access request to the target storage node.
The process shown in step 608 is the same as that shown in step 505, and the embodiment of the present invention does not repeat this step 608.
609. When a target storage node in the distributed storage system receives the fault identifier, if the target storage node then receives an access request, the target storage node suspends the access request and outputs the fault identifier.
When the failure identifier is the first failure identifier, the target storage node outputs the first failure identifier to the target device, and when the failure identifier is the second failure identifier, the target storage node outputs the second failure identifier to the target device, and the specific execution process is the same as that in step 506, which is not described herein again.
610. And the target device receives the fault identification returned by the target storage node based on the access request.
The process shown in step 610 is the same as the process shown in step 507, and this step 610 is not described again in this embodiment of the present invention. It should be noted that the process shown in this step 610 is a process of receiving the response returned by the target storage node, where the response includes the fault state of the distributed storage system, and the fault state is used for indicating whether the at least one failed storage node can be completely repaired within the first preset duration. The fault state included in the response is also the fault identifier.
611. And the target equipment carries out fault processing based on the received fault identification.
When the fault identifier is the first fault identifier, the target device performs fault processing based on the received first fault identifier, and the specific execution process is the same as the process shown in step 508. When the fault identifier is the second fault identifier, the target device performs fault processing based on the received second fault identifier, and the specific execution process is the same as the process shown in step 516, where this step 611 is not described in this embodiment of the present invention again.
It should be noted that each storage node in the distributed storage system knows the fault state of the distributed storage system, and for a storage node that does not have a fault, the fault state may be returned to the target device based on the access request, so that the target device may definitely know the fault state of the distributed storage system, and further, the accuracy of determining the fault state by the target device may be improved.
It should be noted that, for different virtual machines, the embodiment of the present invention provides different failure processing manners, so that the failure processing manner provided by the embodiment of the present invention has more universality.
It should be noted that the process shown in step 611 is a process of performing fault handling based on the fault state included in the response.
612. When the at least one storage node completes repairing, the supervisory node sends a repair complete response to each device within the distributed storage system.
Step 612 is the same as step 509, and this step 612 is not described herein in this embodiment of the present invention. It should be noted that, when the failure state is the first failure state, if all of the at least one storage node is completely repaired within the first preset time period, this step 612 may be directly executed, and if all of the at least one storage node is not repaired within the first preset time period, the supervisory node updates the failure state from the first failure state to the second failure state, and jumps to execute step 606. It should be noted that, in the first failure state, the client may not power down the file system, and once the at least one storage node is repaired within the first preset time, the power down of the file system may be avoided, so that the time for repairing the file system may be reduced, and the distributed storage system may recover the service as soon as possible, so as to ensure the quality of service.
According to the method disclosed by the embodiment of the invention, the fault state of the distributed storage system is determined according to at least one storage node with a fault in the plurality of storage nodes, so that the fault state of the distributed storage system is determined without determining that all the storage nodes have faults, and after the fault state is determined, the fault state can be immediately sent to each storage node in the distributed storage system, so that each storage node can perform fault processing according to the determined fault state, and the time for the distributed storage system to recover to be normal can be reduced. Moreover, for different virtual machines, the embodiment of the invention provides different fault processing modes, so that the fault processing mode provided by the embodiment of the invention has higher universality. And each storage node in the distributed storage system knows the fault state of the distributed storage system, and for the storage nodes without faults, the fault state can be returned to the target equipment based on the access request, so that the target equipment can definitely know the fault state of the distributed storage system, and the precision of determining the fault state by the target equipment can be further improved. In addition, in the first failure state, the target device does not power off the file system, and once the at least one storage node can be repaired within the first preset time, the power off of the file system can be avoided, so that the time for repairing the file system can be reduced, and the distributed storage system can recover the service as soon as possible to ensure the service quality. Moreover, the multiple fault type determination method embodied in this step 603 can accurately determine the fault type of each storage node, and further can more accurately determine the fault state of the distributed storage system according to the fault type of each storage node.
Fig. 7 is a schematic structural diagram of a fault handling apparatus provided in an embodiment of the present invention, which is applied to a distributed storage system, where the distributed storage system includes a plurality of storage nodes, and the apparatus includes:
a determining module 701, configured to determine a failure state of the distributed storage system according to a failed storage node in at least one of the plurality of storage nodes; the fault state is used for indicating whether the at least one failed storage node can be completely repaired within a first preset time length;
a sending module 702, configured to send the failure status to each of the plurality of storage nodes.
Optionally, the apparatus further comprises:
a processing module, configured to perform fault processing according to the fault state of the storage system.
Optionally, the determining module 701 includes:
a first determining unit, configured to perform step 501;
a second determining unit, configured to determine the fault status according to the at least one storage node and the target data.
Optionally, the second determining unit is configured to:
when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determining the fault state as a first fault state, wherein the first fault state is used for indicating that the at least one storage node can be completely repaired within the first preset time.
Optionally, the second determining unit is configured to:
when the number of the at least one storage node is larger than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determining the fault state according to a fault scene of the distributed storage system, wherein the fault scene is used for indicating whether the at least one storage node simultaneously breaks down.
Optionally, the preset condition comprises any one of:
the ratio of the data volume of the target data to a first preset data volume is greater than a preset ratio, and the first preset data volume is the total data volume of all data stored in the distributed storage system;
the data volume of the target data is larger than a second preset data volume.
Optionally, the apparatus further comprises:
and an updating module, configured to update the failure state from a first failure state to a second failure state if the at least one storage node is not completely repaired within the first preset time period when the failure state is the first failure state, where the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
Optionally, the second determining unit is configured to perform step 602.
Optionally, the second determining unit is configured to determine the failure scenario as a first failure scenario when the at least one storage node fails within a target time duration, and otherwise, determine the failure scenario as a second failure scenario, where the first failure scenario is used to indicate that the at least one storage node fails at the same time, and the second failure scenario is used to indicate that the at least one storage node fails at different times.
Optionally, the second determining unit includes:
a first determining subunit, configured to perform step 604;
a second determining subunit, configured to perform step 605.
Optionally, the first determining subunit is configured to:
when the fault type of each storage node in the at least one storage node is a first fault type, determining the fault state as a first fault state, wherein the first fault type is used for indicating that the fault of one storage node can be repaired within the second preset time period, and the first fault state is used for indicating that the at least one storage node can be completely repaired within the first preset time period;
when the failure type of a target number of storage nodes in the at least one storage node is a second failure type, if the target number is less than or equal to the redundancy of the distributed storage system, determining the failure state as the first failure state, otherwise, determining the failure state as a second failure state, where the second failure type is used to indicate that a failure of one storage node cannot be repaired within the second preset duration, and the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset duration.
Optionally, the second determining subunit is configured to:
when the fault type of the first storage node is a first fault type, determining the fault state as a first fault state, wherein the first fault type is used for indicating that the fault of one storage node can be repaired within the second preset time length, and the first fault state is used for indicating that the at least one storage node can be completely repaired within the first preset time length;
and when the fault type of the first storage node is a second fault type, determining the fault state as a second fault state, wherein the second fault type is used for indicating that the fault of one storage node cannot be repaired within the second preset time length, and the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
Optionally, the determining module 701 is further configured to execute step 603.
Optionally, the sending module 702 is further configured to execute the step 509.
Fig. 8 is a schematic structural diagram of a fault handling apparatus provided in an embodiment of the present invention, which is applied to a distributed storage system, where the distributed storage system includes a plurality of storage nodes; the apparatus includes:
a sending module 801, configured to execute the step 608;
a receiving module 802, configured to receive a response returned by the target storage node; the response includes a fault state of the distributed storage system; the fault state is used for indicating whether at least one failed storage node can be completely repaired within a first preset time length.
Optionally, the apparatus further comprises:
and the processing module is used for processing the fault based on the fault state contained in the response.
Optionally, the fault identifier of the fault state includes any one of a first fault identifier or a second fault identifier, where the first fault identifier is used to indicate a first fault state, the second fault identifier is used to indicate a second fault state, the first fault state is used to indicate that at least one storage node can be completely repaired within a first preset time period, the second fault state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period, and the storage node is a storage node in the distributed storage system that has a fault.
Optionally, the processing module is configured to:
when the access request is sent by a target client in the distributed storage system based on a target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is the first fault state, the access request is not responded to the target virtual machine, and the first fault state is used for indicating that at least one storage node can be completely repaired within a first preset time length;
when the access request is sent by the target client based on a target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is the second fault state, returning a message of abnormal storage to the target virtual machine, wherein the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
Optionally, the processing module is configured to:
when the target access request is sent by a target client in the distributed storage system based on a target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is the first fault state, sending a retry request to the target virtual machine, wherein the retry request is used for indicating to resend the access request, and the first fault state is used for indicating that at least one storage node can be completely repaired within a first preset time length;
when the target access request is sent by the target client based on a target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is the second fault state, returning a target error which can be identified by the target virtual machine to the target virtual machine, wherein the target error is used for indicating a storage medium fault, and the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
Optionally, the receiving module 802 is further configured to receive a target access request sent by a target client in the distributed storage system, where the target access request is used to instruct to process first target data, where the first target data includes the target data;
the sending module 801 is configured to send the access request to a target storage node in the distributed storage system based on the target access request.
Optionally, the receiving module 802 is configured to receive a repair complete response returned by the target storage node, where the repair complete response is used to indicate that there is no failed device in the distributed storage system.
The embodiment of the invention also provides a distributed storage system, which comprises a supervisory node and a plurality of storage nodes;
the supervisory node is configured to:
determining a failure state of the distributed storage system according to at least one failed storage node in the plurality of storage nodes; the fault state is used for indicating whether the at least one failed storage node can be completely repaired within a first preset time length;
transmitting the fault status to each of the plurality of storage nodes;
each storage node of the plurality of storage nodes to receive the fault condition.
Optionally, the fault identifier of the fault state includes any one of a first fault identifier and a second fault identifier, the first fault identifier is used to indicate the first fault state, the second fault identifier is used to indicate a second fault state, the first fault state is used to indicate that the at least one storage node can be completely repaired within the first preset time period, and the second fault state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
Optionally, each of the plurality of storage nodes is further configured to suspend the access request if the access request is received after the failure identifier is received, and perform failure processing based on a received failure state.
It should be noted that each of the devices in the distributed storage system provided above may be the devices in embodiments 5 and 6.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that: when the fault handling apparatus provided in the above embodiment handles a fault, the division into the foregoing functional modules is merely used as an example for description; in practical applications, the foregoing functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the fault handling apparatus provided in the foregoing embodiments and the embodiments of the fault processing method for the distributed storage system belong to the same concept; for the specific implementation process, refer to the method embodiments, and details are not described herein again.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only intended to illustrate the preferred embodiments of the present invention and is not intended to limit the present invention. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (19)

1. A distributed storage system fault processing method, wherein the distributed storage system comprises a plurality of storage nodes, and the method comprises the following steps:
determining at least one storage node with a fault in the distributed storage system and target data which cannot be accessed in the at least one storage node;
determining a fault state of the distributed storage system according to the at least one storage node and the target data; the fault state is used for indicating whether the at least one storage node can be completely repaired within a first preset time length;
transmitting the fault state to each of the plurality of storage nodes.
2. The method of claim 1, further comprising:
performing fault processing according to the fault state of the distributed storage system.
3. The method of claim 1, wherein determining the fault condition based on the at least one storage node and the target data comprises:
when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determining the fault state as a first fault state, wherein the first fault state is used for indicating that the at least one storage node can be completely repaired within the first preset time length.
4. The method of claim 1, wherein determining the fault condition based on the at least one storage node and the target data comprises:
when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determining the fault state according to a fault scenario of the distributed storage system, wherein the fault scenario is used for indicating whether failures of the at least one storage node occur simultaneously.
5. A distributed storage system fault processing method, wherein the distributed storage system comprises a plurality of storage nodes, and the method comprises the following steps:
sending an access request to a target storage node in the distributed storage system;
receiving a response returned by the target storage node; the response comprises a fault state of the distributed storage system, wherein the fault state is determined according to at least one storage node with a fault in the distributed storage system and target data which cannot be accessed in the at least one storage node; the fault state is used for indicating whether at least one storage node can be completely repaired within a first preset time length.
6. The method of claim 5, wherein after receiving the response returned by the target storage node, the method further comprises:
performing fault processing based on the fault state contained in the response.
7. The method of claim 6, wherein performing fault processing based on the fault state contained in the response comprises:
when the access request is sent by a target client in the distributed storage system based on a target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is a first fault state, making no response to the target virtual machine for the access request, wherein the first fault state is used for indicating that the at least one storage node can be completely repaired within a first preset time length;
when the access request is sent by the target client based on the target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is a second fault state, returning a storage exception message to the target virtual machine, wherein the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
8. The method of claim 6, wherein performing fault processing based on the fault state contained in the response comprises:
when the access request is sent by a target client in the distributed storage system based on a target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is a first fault state, sending a retry request to the target virtual machine, wherein the retry request is used for instructing the target virtual machine to resend the access request, and the first fault state is used for indicating that the at least one storage node can be completely repaired within a first preset time length;
when the access request is sent by the target client based on the target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is a second fault state, returning, to the target virtual machine, a target error that can be recognized by the target virtual machine, wherein the target error is used for indicating a storage medium fault, and the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
9. A distributed storage system, comprising a supervisory node and a plurality of storage nodes;
the supervisory node is configured to:
determine at least one storage node with a fault in the distributed storage system and target data which cannot be accessed in the at least one storage node;
determine a fault state of the distributed storage system according to the at least one storage node and the target data, wherein the fault state is used for indicating whether the at least one storage node can be completely repaired within a first preset time length;
transmit the fault state to each of the plurality of storage nodes; and
each storage node of the plurality of storage nodes is configured to receive the fault state.
10. The system of claim 9, wherein each of the plurality of storage nodes is further configured to suspend an access request if the access request is received after the fault state is received, and to perform fault handling based on the received fault state.
11. A fault handling apparatus, applied to a distributed storage system comprising a plurality of storage nodes, wherein the apparatus comprises a determining module and a sending module;
the determining module comprises a first determining unit and a second determining unit;
the first determining unit is configured to determine at least one storage node in the distributed storage system that has a failure and target data that cannot be accessed in the at least one storage node;
the second determining unit is configured to determine a fault state of the distributed storage system according to the at least one storage node and the target data; the fault state is used for indicating whether the at least one storage node can be completely repaired within a first preset time length;
the sending module is configured to send the failure status to each of the plurality of storage nodes.
12. The apparatus of claim 11, wherein the second determining unit is configured to:
when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determining the fault state as a first fault state, wherein the first fault state is used for indicating that the at least one storage node can be completely repaired within the first preset time length.
13. The apparatus of claim 11, wherein the second determining unit is configured to:
when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determining the fault state according to a fault scenario of the distributed storage system, wherein the fault scenario is used for indicating whether failures of the at least one storage node occur simultaneously.
14. A fault handling apparatus, applied to a distributed storage system, the distributed storage system including a plurality of storage nodes, the apparatus comprising:
a sending module, configured to send an access request to a target storage node in the distributed storage system, where the distributed storage system includes a plurality of storage nodes;
a receiving module, configured to receive a response returned by the target storage node, wherein the response comprises a fault state of the distributed storage system, the fault state is determined according to at least one storage node with a fault in the distributed storage system and target data which cannot be accessed in the at least one storage node, and the fault state is used for indicating whether the at least one storage node can be completely repaired within a first preset time length.
15. The apparatus of claim 14, further comprising:
a processing module, configured to perform fault processing based on the fault state contained in the response.
16. The apparatus of claim 15, wherein the processing module is configured to:
when the access request is sent by a target client in the distributed storage system based on a target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is a first fault state, making no response to the target virtual machine for the access request, wherein the first fault state is used for indicating that the at least one storage node can be completely repaired within a first preset time length;
when the access request is sent by the target client based on the target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is a second fault state, returning a storage exception message to the target virtual machine, wherein the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
17. The apparatus of claim 15, wherein the processing module is configured to:
when the access request is sent by a target client in the distributed storage system based on a target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is a first fault state, sending a retry request to the target virtual machine, wherein the retry request is used for instructing the target virtual machine to resend the access request, and the first fault state is used for indicating that the at least one storage node can be completely repaired within a first preset time length;
when the access request is sent by the target client based on the target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is a second fault state, returning, to the target virtual machine, a target error that can be recognized by the target virtual machine, wherein the target error is used for indicating a storage medium fault, and the second fault state is used for indicating that the at least one storage node cannot be completely repaired within the first preset time length.
18. A computer device, comprising a processor and a memory, wherein the memory stores at least one instruction that is loaded and executed by the processor to perform the operations performed by the distributed storage system fault processing method according to any one of claims 1 to 8.
19. A storage medium, wherein the storage medium stores at least one instruction that is loaded and executed by a processor to perform the operations performed by the distributed storage system fault processing method according to any one of claims 1 to 8.
CN201910741190.5A 2019-08-12 2019-08-12 Fault processing method and device, computer equipment, storage medium and storage system Active CN110535692B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910741190.5A CN110535692B (en) 2019-08-12 2019-08-12 Fault processing method and device, computer equipment, storage medium and storage system
PCT/CN2020/102302 WO2021027481A1 (en) 2019-08-12 2020-07-16 Fault processing method, apparatus, computer device, storage medium and storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910741190.5A CN110535692B (en) 2019-08-12 2019-08-12 Fault processing method and device, computer equipment, storage medium and storage system

Publications (2)

Publication Number Publication Date
CN110535692A CN110535692A (en) 2019-12-03
CN110535692B (en) 2020-12-18

Family

ID=68662506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910741190.5A Active CN110535692B (en) 2019-08-12 2019-08-12 Fault processing method and device, computer equipment, storage medium and storage system

Country Status (2)

Country Link
CN (1) CN110535692B (en)
WO (1) WO2021027481A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110535692B (en) * 2019-08-12 2020-12-18 华为技术有限公司 Fault processing method and device, computer equipment, storage medium and storage system
CN111371848A (en) * 2020-02-21 2020-07-03 苏州浪潮智能科技有限公司 Request processing method, device, equipment and storage medium
CN113805788B (en) * 2020-06-12 2024-04-09 华为技术有限公司 Distributed storage system and exception handling method and related device thereof
CN112187919B (en) * 2020-09-28 2024-01-23 腾讯科技(深圳)有限公司 Storage node management method and related device
CN113326251B (en) * 2021-06-25 2024-02-23 深信服科技股份有限公司 Data management method, system, device and storage medium
US11544139B1 (en) * 2021-11-30 2023-01-03 Vast Data Ltd. Resolving erred 10 flows
CN114584454B (en) * 2022-02-21 2023-08-11 苏州浪潮智能科技有限公司 Processing method and device of server information, electronic equipment and storage medium
CN117008815A (en) * 2022-04-28 2023-11-07 华为技术有限公司 Storage device and data processing method
CN116382850B (en) * 2023-04-10 2023-11-07 北京志凌海纳科技有限公司 Virtual machine high availability management device and system using multi-storage heartbeat detection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104935481A (en) * 2015-06-24 2015-09-23 华中科技大学 Data recovery method based on redundancy mechanism in distributed storage
CN108984107A (en) * 2017-06-02 2018-12-11 伊姆西Ip控股有限责任公司 Improve the availability of storage system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092712B (en) * 2011-11-04 2016-03-30 阿里巴巴集团控股有限公司 A kind of tasks interrupt restoration methods and equipment
US10691479B2 (en) * 2017-06-28 2020-06-23 Vmware, Inc. Virtual machine placement based on device profiles
CN109831342A (en) * 2019-03-19 2019-05-31 江苏汇智达信息科技有限公司 A kind of fault recovery method based on distributed system
CN110535692B (en) * 2019-08-12 2020-12-18 华为技术有限公司 Fault processing method and device, computer equipment, storage medium and storage system

Also Published As

Publication number Publication date
WO2021027481A1 (en) 2021-02-18
CN110535692A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN110535692B (en) Fault processing method and device, computer equipment, storage medium and storage system
RU2644146C2 (en) Method, device and control system of fault processing
US9021317B2 (en) Reporting and processing computer operation failure alerts
US20140059315A1 (en) Computer system, data management method and data management program
JP5617304B2 (en) Switching device, information processing device, and fault notification control program
US7499987B2 (en) Deterministically electing an active node
CN109586989B (en) State checking method, device and cluster system
CN111342986B (en) Distributed node management method and device, distributed system and storage medium
CN108512753B (en) Method and device for transmitting messages in cluster file system
CN112738295B (en) IP address exception handling method, device, computer system and storage medium
US10860411B2 (en) Automatically detecting time-of-fault bugs in cloud systems
US20140164851A1 (en) Fault Processing in a System
TW201510995A (en) Method for maintaining file system of computer system
CN105323271A (en) Cloud computing system, and processing method and apparatus thereof
CN111162938A (en) Data processing system and method
US8819481B2 (en) Managing storage providers in a clustered appliance environment
CN113596195B (en) Public IP address management method, device, main node and storage medium
CN116382850B (en) Virtual machine high availability management device and system using multi-storage heartbeat detection
US20230090032A1 (en) Storage system and control method
US11880266B2 (en) Malfunction monitor for computing devices
CN113965576B (en) Container-based big data acquisition method, device, storage medium and equipment
CN113609199B (en) Database system, server, and storage medium
US20230106077A1 (en) Distributed Storage System, Exception Handling Method Thereof, and Related Apparatus
CN117312081A (en) Fault detection method, device, equipment and medium for distributed storage system
RU2672184C1 (en) Method, device and management system for processing failures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant