WO2021027481A1 - Fault handling method and apparatus, computer device, storage medium, and storage system - Google Patents
Fault handling method and apparatus, computer device, storage medium, and storage system
- Publication number
- WO2021027481A1 (PCT/CN2020/102302, CN2020102302W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- failure
- storage node
- target
- fault
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Definitions
- the present invention relates to the technical field of fault processing, and in particular to a method, device, computer equipment, storage medium and distributed storage system for fault processing of a distributed storage system.
- in the related art, fault handling can proceed as follows: the client sends a small computer system interface (SCSI) request to multiple storage nodes; when all storage nodes in the distributed storage system fail, the storage nodes do not respond to the SCSI request.
- the client then determines the fault status of the distributed storage system to be the all paths down (APD) state.
- the APD state is a storage node failure state defined by the VMWare virtual machine and is used to indicate that none of the paths to the back-end storage nodes can respond to host requests; the client suspends the SCSI requests being processed and waits for technicians to repair the fault in the distributed storage system.
- when a storage node has a failure that cannot be recovered in the short term, the storage node returns a storage exception message to the client; after receiving the storage exception message, the client determines the fault status of the distributed storage system to be the permanent device lost (PDL) state, where the PDL state is a storage node failure state defined by the VMWare virtual machine and is used to indicate a long-term or permanent failure of the back-end storage nodes.
- a long-term failure of a storage node may damage the file system in the distributed storage system.
- in that case, the client powers off the file system and waits for technicians to fix the fault.
- however, the client executes the above fault handling process only when all storage nodes in the distributed storage system fail; when only some storage nodes fail, the client does not perform the above process. Yet failure of some storage nodes in a distributed storage system is a relatively common phenomenon. Once some storage nodes fail, if the storage nodes in the distributed storage system are not diagnosed, the client cannot determine whether any storage node in the distributed storage system is faulty; technicians therefore cannot obtain information about the storage node failure from the client in time and cannot immediately repair the failed storage nodes, which prolongs the time it takes for the distributed storage system to recover.
- the embodiments of the present invention provide a distributed storage system failure processing method, device, computer equipment, storage medium, and storage system, which can reduce the time it takes for the distributed storage system to recover.
- the technical scheme is as follows:
- in a first aspect, a fault handling method for a distributed storage system is provided, where the distributed storage system includes a plurality of storage nodes; the method includes:
- the failure state is used to indicate whether all of the at least one failed storage node can be repaired within a set time period;
- the failure state of the distributed storage system is determined according to at least one storage node, among the plurality of storage nodes, that has failed, so the failure state does not have to wait until all storage nodes have failed to be determined.
- once the failure state is determined, it can be sent immediately to each storage node in the distributed storage system, so that each storage node performs fault handling according to the determined failure state, which reduces the time it takes for the distributed storage system to recover to normal.
- the method further includes:
- the determining the failure state of the distributed storage system according to at least one storage node, among the plurality of storage nodes, that has failed includes:
- the determining the fault state according to the at least one storage node and the target data includes:
- the failure state is determined to be a first failure state, where the first failure state is used to indicate that the at least one storage node can be completely repaired within a first preset time period.
- in the first failure state, the target device does not power off the file system. If the at least one storage node can be repaired within the first preset time period, powering off the file system is avoided, which reduces the time needed to repair the file system, so that the distributed storage system can resume business as soon as possible and ensure quality of service.
- the determining the fault state according to the at least one storage node and the target data includes:
- the failure scenario is used to indicate whether the at least one storage node fails at the same time.
- the preset condition includes any one of the following:
- the ratio between the data volume of the target data and the first preset data volume is greater than the preset ratio, and the first preset data volume is the total data volume of all data stored in the distributed storage system;
- the data amount of the target data is greater than the second preset data amount.
- the method further includes:
- if the failure state is the first failure state and the repair of the at least one storage node is not completed when the first preset time period expires, the failure state is updated from the first failure state to a second failure state, where the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- before the determining the failure state of the distributed storage system according to the failure scenario of the distributed storage system, the method further includes:
- the determining the failure scenario according to the time when the at least one storage node fails includes:
- if the at least one storage node fails at the same time, the failure scenario is determined to be a first failure scenario; otherwise, the failure scenario is determined to be a second failure scenario, where the first failure scenario is used to indicate that the at least one storage node fails at the same time, and the second failure scenario is used to indicate that the at least one storage node fails at different times.
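- as an illustration only (this summary does not fix how "at the same time" is judged), the sketch below treats failure times that fall within a small tolerance window as simultaneous; the function name, constants, and the 30-second window are assumptions.

```python
from datetime import datetime, timedelta

FIRST_FAILURE_SCENARIO = "first"    # the at least one storage node failed at the same time
SECOND_FAILURE_SCENARIO = "second"  # the at least one storage node failed at different times

def determine_failure_scenario(failure_times, tolerance=timedelta(seconds=30)):
    """Classify the failure scenario from the failure times of the failed storage nodes."""
    if not failure_times:
        raise ValueError("at least one failed storage node is expected")
    spread = max(failure_times) - min(failure_times)
    return FIRST_FAILURE_SCENARIO if spread <= tolerance else SECOND_FAILURE_SCENARIO

# Two nodes failing 5 seconds apart are treated as simultaneous under this assumption.
times = [datetime(2020, 7, 16, 10, 0, 0), datetime(2020, 7, 16, 10, 0, 5)]
print(determine_failure_scenario(times))  # -> "first"
```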
- the determining the failure state of the distributed storage system according to the failure scenario of the distributed storage system includes:
- when the failure scenario is the first failure scenario, the failure state is determined according to the failure type of each storage node in the at least one storage node, where the first failure scenario is used to indicate that the at least one storage node fails simultaneously, and the failure type is used to indicate whether a storage node failure can be repaired within a second preset time period;
- when the failure scenario is the second failure scenario, the failure state is determined according to the failure type of the first storage node, which is the storage node that failed most recently among the at least one storage node, where the second failure scenario is used to indicate that the at least one storage node fails at different times.
- the determining the failure state according to the failure type of each storage node in the at least one storage node includes:
- if the failure type of each storage node in the at least one storage node is a first failure type, the failure state is determined to be the first failure state; otherwise, the failure state is determined to be the second failure state, where the first failure type is used to indicate that a storage node's failure can be repaired within the second preset time period, the second failure type is used to indicate that a storage node's failure cannot be repaired within the second preset time period, the first failure state is used to indicate that the at least one storage node can be repaired within the first preset time period, and the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- the determining the failure state according to the failure type of the first storage node, which failed most recently among the at least one storage node, includes:
- if the failure type of the first storage node is the first failure type, the failure state is determined to be the first failure state, where the first failure type is used to indicate that a storage node's failure can be repaired within the second preset time period, and the first failure state is used to indicate that the at least one storage node can be repaired within the first preset time period;
- if the failure type of the first storage node is the second failure type, the failure state is determined to be the second failure state, where the second failure type is used to indicate that a storage node's failure cannot be repaired within the second preset time period, and the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
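- the following minimal sketch puts the two limbs above side by side; the constant and function names are illustrative assumptions rather than terms fixed by the text.

```python
FIRST_FAILURE_TYPE = "repairable_within_second_preset_period"
SECOND_FAILURE_TYPE = "not_repairable_within_second_preset_period"
FIRST_FAILURE_STATE = "first_failure_state"    # all failed nodes repairable within the first preset period
SECOND_FAILURE_STATE = "second_failure_state"  # not all repairable within the first preset period

def state_for_first_failure_scenario(failure_types):
    """First failure scenario: the failure type of every failed storage node is considered."""
    if all(t == FIRST_FAILURE_TYPE for t in failure_types):
        return FIRST_FAILURE_STATE
    return SECOND_FAILURE_STATE

def state_for_second_failure_scenario(latest_failure_type):
    """Second failure scenario: only the most recently failed storage node's type is considered."""
    return FIRST_FAILURE_STATE if latest_failure_type == FIRST_FAILURE_TYPE else SECOND_FAILURE_STATE

print(state_for_first_failure_scenario([FIRST_FAILURE_TYPE, SECOND_FAILURE_TYPE]))  # -> second_failure_state
print(state_for_second_failure_scenario(FIRST_FAILURE_TYPE))                        # -> first_failure_state
```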
- before the determining the failure state of the distributed storage system according to the failure scenario of the distributed storage system, the method further includes:
- when any storage node of the at least one storage node has a preset network failure, a preset abnormal power-down failure, a preset misoperation failure, a preset hardware failure, or a preset software failure, the failure type of that storage node is determined to be the first failure type; otherwise, the failure type of that storage node is determined to be the second failure type, where the first failure type is used to indicate that a storage node's failure can be repaired within the second preset time period, and the second failure type is used to indicate that a storage node's failure cannot be repaired within the second preset time period.
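- a minimal sketch of this classification follows; the cause strings in the preset set are assumptions chosen to mirror the categories listed above.

```python
PRESET_RECOVERABLE_CAUSES = {
    "preset_network_failure",
    "preset_abnormal_power_down",
    "preset_misoperation",
    "preset_hardware_failure",
    "preset_software_failure",
}

def classify_failure_type(failure_cause: str) -> str:
    """Map a failure cause to the first or second failure type."""
    if failure_cause in PRESET_RECOVERABLE_CAUSES:
        return "first_failure_type"   # repairable within the second preset time period
    return "second_failure_type"      # not repairable within the second preset time period

print(classify_failure_type("preset_network_failure"))  # -> first_failure_type
print(classify_failure_type("unknown_media_damage"))    # -> second_failure_type
```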
- the method further includes:
- a repair completion response is sent to each device in the distributed storage system, and the repair completion response is used to indicate that there is no faulty device in the distributed storage system.
- in a second aspect, a fault handling method for a distributed storage system is provided, where the distributed storage system includes multiple storage nodes; the method includes:
- the response returned by the target storage node includes the fault status of the distributed storage system; the fault status is used to indicate whether at least one failed storage node can be fully repaired within the first preset time period.
- the method further includes:
- the fault identifier of the fault state includes any one of a first fault identifier or a second fault identifier, where the first fault identifier is used to indicate the first fault state and the second fault identifier is used to indicate a second fault state; the first fault state is used to indicate that the at least one storage node can be repaired within a first preset time period, the second fault state is used to indicate that the at least one storage node cannot all be repaired within the first preset time period, and the at least one storage node is a storage node in the distributed storage system that has failed.
- the performing fault processing based on the fault status contained in the response includes:
- when the access request is sent by a target client in the distributed storage system based on a target virtual machine, and the target virtual machine is a VMWare virtual machine, if the failure state is the first failure state, no response to the access request is returned to the target virtual machine, where the first failure state is used to indicate that at least one storage node can be repaired within a first preset time period;
- when the access request is sent by the target client based on the target virtual machine, and the target virtual machine is a VMWare virtual machine, if the failure state is the second failure state, a storage abnormality message is returned to the target virtual machine, where the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- the performing fault processing based on the fault status contained in the response includes:
- when the access request is sent by the target client in the distributed storage system based on the target virtual machine, and the target virtual machine is not a VMWare virtual machine, if the fault state is the first fault state, a retry request is sent to the target virtual machine, where the retry request is used to instruct the target virtual machine to reissue the access request, and the first fault state is used to indicate that the at least one storage node can be repaired within the first preset time period;
- when the access request is sent by the target client based on the target virtual machine, and the target virtual machine is not a VMWare virtual machine, if the fault state is the second fault state, a target error that can be recognized by the target virtual machine is returned to the target virtual machine, where the target error is used to indicate a storage medium failure, and the second fault state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- before the sending of an access request to any storage node in the distributed storage system, the method further includes:
- the sending an access request to the target storage node in the distributed storage system includes:
- the access request is sent to the target storage node in the distributed storage system.
- the method further includes:
- in a third aspect, a distributed storage system is provided, including a supervisory node and a plurality of storage nodes;
- the supervisory node is used for:
- the failure state of the distributed storage system is determined according to at least one of the storage nodes that has failed; the failure state is used to indicate whether all of the at least one failed storage node can be repaired within a first preset time period;
- Each storage node of the plurality of storage nodes is configured to receive the failure state.
- the fault identifier of the fault state includes any one of a first fault identifier and a second fault identifier, where the first fault identifier is used to indicate the first fault state and the second fault identifier is used to indicate a second fault state; the first fault state is used to indicate that the at least one storage node can be repaired within the first preset time period, and the second fault state is used to indicate that the at least one storage node cannot all be completely repaired within the first preset time period.
- each storage node of the multiple storage nodes is further configured to suspend an access request if the access request is received after the fault state has been received, and to perform fault handling based on the received fault state.
- in a fourth aspect, a fault handling apparatus is provided for executing the above-mentioned distributed storage system fault handling method.
- the fault handling device includes a functional module for executing the fault handling method provided in the foregoing first aspect or any optional manner of the foregoing first aspect.
- in a fifth aspect, a fault handling apparatus is provided, which is used to execute the above-mentioned distributed storage system fault handling method.
- the fault processing device includes a functional module for executing the fault processing method provided in the foregoing second aspect or any optional manner of the foregoing second aspect.
- in a sixth aspect, a computer device is provided, including a processor and a memory, where at least one instruction is stored in the memory, and the instruction is loaded and executed by the processor to implement the operations performed by the above-mentioned distributed storage system fault handling method.
- in a seventh aspect, a storage medium is provided, where at least one instruction is stored in the storage medium, and the instruction is loaded and executed by a processor to implement the operations performed by the above-mentioned distributed storage system fault handling method.
- Figure 1 is a schematic diagram of a distributed storage system provided by an embodiment of the present invention.
- FIG. 2 is a schematic diagram of a network environment of a distributed storage system provided by an embodiment of the present invention
- Figure 3 is a schematic diagram of interaction between various devices in a distributed storage system provided by an embodiment of the present invention
- Figure 4 is a schematic structural diagram of a computer device provided by an embodiment of the present invention.
- FIG. 5 is a flowchart of a method for processing a fault in a distributed storage system according to an embodiment of the present invention
- FIG. 6 is a flowchart of a method for processing a fault in a distributed storage system according to an embodiment of the present invention
- FIG. 7 is a schematic structural diagram of a fault handling device provided by an embodiment of the present invention.
- Fig. 8 is a schematic structural diagram of a fault handling device provided by an embodiment of the present invention.
- FIG. 1 is a schematic diagram of a distributed storage system provided by an embodiment of the present invention.
- the distributed storage system includes at least one client 101, multiple storage nodes 102, and a supervisory node 103.
- the client 101 is used to provide users with data storage and data reading services, that is, the client 101 can store data uploaded by the user in the storage node 102 or read data from the storage node 102.
- the storage node 102 is used to store data written by the client 101, and is also used to return data to the client 101.
- the returned data can be the data that the client 101 requested to read, or the storage node 102 can return to the client 101 the fault status of the distributed storage system sent by the supervisory node 103, so that the client 101 can handle the faults that occur according to the fault status of the distributed storage system.
- the supervisory node 103 is used to monitor whether each storage node 102 in the distributed storage system fails. When the number of failed storage nodes in the distributed storage system is higher than the redundancy of the distributed storage system, the normal operation of the business may be affected; therefore, the supervisory node 103 can determine the fault status of the distributed storage system according to the storage nodes that have failed in the distributed storage system, and send the determined fault status to all storage nodes 102, so that a storage node 102 can inform the client 101 of the fault status of the distributed storage system and the client 101 or the storage node 102 can perform fault handling according to the fault status of the distributed storage system.
- the baseboard management controller (BMC) of each storage node 102 can monitor in real time whether the respective storage node fails.
- the BMC can store the cause and time of the failure of any storage node, and can also send the cause and time of the failure of any storage node to the supervisory node 103 so that the supervisory node 103 can Know whether any storage node is faulty.
- the supervisory node 103 can access each storage node 102 to learn whether that storage node 102 has failed; if it has, the cause of the failure and the time of the failure can be obtained from the failed storage node.
- the supervisory node 103 can store the cause of failure and the time of failure sent by the failed storage node, so that the failure type and the failure scenario can be determined according to them.
- for a description of the failure type, refer to step 603 below, and for a description of the failure scenario, refer to step 602 below.
- storing, by the supervisory node 103, the cause of failure and the time of failure of the failed storage node may include: storing the cause of failure and the time of failure sent by the failed storage node in a failure table.
- the failure table may store a number, a storage node identifier, a failure time, and a failure cause, where the number is used to indicate a storage node that has failed, and the identifier of a storage node is used to uniquely indicate that storage node; the identifier can be the Internet protocol (IP) address or the media access control (MAC) address of the storage node, and the embodiment of the present invention does not specifically limit the identifier of the storage node.
- the failure time is the time when the storage node fails, and the cause of the failure is the cause of the storage node failure.
- for the failure table shown in Table 1, it can be seen from Table 1 that two storage nodes have failed in the current distributed storage system, namely the storage node identified as X and the storage node identified as Y, where the failure time of the storage node identified as X is A and its failure cause is D, and the failure time of the storage node identified as Y is B and its failure cause is C.
- the supervisory node can delete from the failure table the information related to a storage node that has been repaired, so that the supervisory node can determine the number of failed storage nodes in the distributed storage system according to the number of the last storage node in the failure table.
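- a minimal sketch of such a failure table follows, assuming a small in-memory structure; the class and method names are illustrative and not part of the patent.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

@dataclass
class FailureRecord:
    number: int             # sequence number of the failed storage node in the table
    node_id: str            # unique identifier, e.g. an IP address or MAC address
    failure_time: datetime
    failure_cause: str

class FailureTable:
    def __init__(self):
        self._records: Dict[str, FailureRecord] = {}
        self._next_number = 1

    def add_failure(self, node_id: str, failure_time: datetime, cause: str):
        self._records[node_id] = FailureRecord(self._next_number, node_id, failure_time, cause)
        self._next_number += 1

    def remove_repaired(self, node_id: str):
        # Information about a repaired storage node is deleted from the table.
        self._records.pop(node_id, None)

    def failed_node_count(self) -> int:
        return len(self._records)

table = FailureTable()
table.add_failure("X", datetime(2020, 7, 16, 10, 0), "D")
table.add_failure("Y", datetime(2020, 7, 16, 11, 0), "C")
print(table.failed_node_count())  # -> 2
```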
- when a storage node in the distributed storage system fails, the file system in the distributed storage system may be damaged.
- for example, the metadata in the file system may be wrong; the metadata is system data used to describe the characteristics of a file, such as access permissions, the file owner, and the distribution information of the data blocks in the file.
- the data blocks in the file indicated by the metadata may not be accessible.
- when a client fails to access a data block in any storage node, the client can send the data volume of the data block and a data identifier that uniquely identifies the data block to the supervisory node; after receiving the data volume and the data identifier, the supervisory node stores them, for example in a data table.
- the data table can be used to store the total amount of data, the data identifiers, and the data volume corresponding to each data identifier, where the total amount of data is the amount of all data that is currently inaccessible in the distributed storage system.
- for example, suppose the currently inaccessible data in the distributed storage system is the data in the data block indicated by the data identifier M and the data in the data block indicated by the data identifier N; the total amount of currently inaccessible data is 30 kilobytes (KB), including 10KB of data in the data block indicated by the data identifier M and 20KB of data in the data block indicated by the data identifier N.
- when the client again accesses a data block that previously could not be accessed, if the access succeeds, the client sends a data access success response carrying the data identifier of the data block to the supervisory node.
- the data access success response is used to indicate that the data in the data block can now be accessed.
- after receiving the response, the supervisory node can delete the data amount corresponding to the data identifier and update the total amount of data in the data table; for example, if the data access success response carries the data identifier M, the supervisory node deletes the information related to the data identifier M in the data table and updates the total amount of data to 20KB.
- the data table may also store the identifier of the storage node corresponding to the data identifier to indicate which data block in which storage node cannot be accessed.
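- a minimal sketch of such a data table follows, assuming an in-memory mapping from data identifier to inaccessible data volume (in KB); the storage node identifier column mentioned above is omitted for brevity, and all names are illustrative.

```python
class DataTable:
    def __init__(self):
        self._inaccessible = {}   # data identifier -> inaccessible data volume in KB

    def record_inaccessible(self, data_id: str, volume_kb: int):
        # Called when a client reports that a data block cannot be accessed.
        self._inaccessible[data_id] = volume_kb

    def record_access_success(self, data_id: str):
        # Called when a client's data access success response arrives for this identifier.
        self._inaccessible.pop(data_id, None)

    def total_inaccessible_kb(self) -> int:
        return sum(self._inaccessible.values())

table = DataTable()
table.record_inaccessible("M", 10)
table.record_inaccessible("N", 20)
print(table.total_inaccessible_kb())  # -> 30
table.record_access_success("M")
print(table.total_inaccessible_kb())  # -> 20
```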
- the number of clients 101 and storage nodes 102 may be relatively large.
- at least one service switch can be set up at the application layer where the client 101 is located, for interaction between the client 101 and the storage node 102.
- At least one storage switch may be provided in the storage layer where the storage node 102 is located, so as to implement interaction between the storage nodes 102.
- a supervisory switch may be provided to realize the interaction between the supervisory node 103 and the storage node 102.
- At least one network port can be installed in the client, the storage node, and the supervisory node.
- the at least one network port can be used to connect to different networks.
- the network can transmit data of different services, and the at least one network port can be a business network port connected to a business network, a supervisory network port connected to a supervisory network, and a BMC network port connected to a BMC network.
- the network may include a business network, a supervisory network, and a BMC network.
- the business network is the network used for heartbeat, data synchronization, and mirroring between storage nodes. For example, when data block 1 stored in storage node 1 is synchronized to storage node 2, storage node 1 can send data block 1 to storage node 2 through the business network, and storage node 2 can receive data block 1 through its own business network port and store it.
- the supervisory network is a network used to monitor whether a storage node fails and for information query.
- through the supervisory network, the fault status of the distributed storage system issued by the supervisory node can be transmitted, and the storage nodes that have failed can also be queried.
- the supervisory node can send the fault status of the distributed storage system to the supervisory network port of a storage node through its own supervisory network port, and the storage node can receive the fault status sent by the supervisory node through its supervisory network port.
- when the storage node then receives the client's service request (that is, the SCSI request described below), it can directly return the issued fault status to the client without processing the received service request.
- the client can then carry out the corresponding fault handling according to the fault status.
- the BMC network is a network that manages the BMC.
- the supervisory node can monitor the status of the BMC by accessing the BMC network port of the BMC network. According to the monitored BMC status, it can determine whether the storage node is faulty.
- the BMC network is an optional network. In some implementations, the BMC may not be used to monitor whether a storage node is faulty, and other methods may be used instead; therefore, the BMC network may not be set up in the distributed storage system, and monitoring can be realized directly through the supervisory network.
- the supervisory node can receive the status information of the storage node from the three networks in real time, such as whether it is faulty.
- at least one object storage device (OSD) process, a SCSI processing process, and a node monitoring service (NMS) agent process can be installed in a storage node.
- an OSD process can correspond to one or more storage media used to store data in a storage node.
- the storage media can be a hard disk.
- the OSD process is used to manage access requests for the one or more storage media, and an access request is used to indicate processing of data to be processed; processing the data to be processed may include reading data blocks stored in the one or more storage media, where the data blocks to be read include the data to be processed, and may further include writing the data to be processed into the one or more storage media.
- the access request can be sent using SCSI
- the access request can be regarded as a SCSI request.
- the SCSI processing process is used to obtain the SCSI request sent by the client from the service network, convert and decompose the SCSI request, obtain multiple SCSI sub-requests, and issue the multiple SCSI sub-requests to the corresponding OSD process.
- for example, suppose the logical block address (LBA) range of the data to be read carried in a SCSI request is 100-200. Since the storage location indicated by LBA 100-150 is in storage medium 1 of storage node 1 and the storage location indicated by LBA 151-200 is in storage medium 2 of storage node 1, the SCSI processing process can convert and decompose the SCSI request into two SCSI sub-requests, where SCSI sub-request 1 is used to indicate a request to read the data stored at LBA 100-150 in storage medium 1 and SCSI sub-request 2 is used to indicate a request to read the data stored at LBA 151-200 in storage medium 2; the SCSI processing process then sends SCSI sub-request 1 to the OSD process corresponding to storage medium 1 and SCSI sub-request 2 to the OSD process corresponding to storage medium 2.
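- the decomposition in the example above can be sketched as follows; the layout tuples and the function name are assumptions, and a real SCSI processing process would of course build SCSI sub-requests rather than dictionaries.

```python
def split_scsi_request(start_lba: int, end_lba: int, layout):
    """layout: list of (first_lba, last_lba, target_osd) describing where each LBA range lives."""
    sub_requests = []
    for first, last, target_osd in layout:
        lo, hi = max(start_lba, first), min(end_lba, last)
        if lo <= hi:
            sub_requests.append({"target": target_osd, "lba_range": (lo, hi)})
    return sub_requests

# LBA 100-150 is in storage medium 1 and LBA 151-200 is in storage medium 2 of storage node 1.
layout = [(100, 150, "osd_for_medium_1"), (151, 200, "osd_for_medium_2")]
for sub in split_scsi_request(100, 200, layout):
    print(sub)
# {'target': 'osd_for_medium_1', 'lba_range': (100, 150)}
# {'target': 'osd_for_medium_2', 'lba_range': (151, 200)}
```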
- An NMS agent process is used to receive the fault status of the distributed storage system issued by the supervisory node, and send the received fault status to all OSD processes of a storage node.
- for example, the supervisory node sends the fault status of the distributed storage system to the storage node through the supervisory network;
- the NMS agent process in the storage node obtains the fault status sent by the supervisory node from the supervisory network, and sends the obtained fault status to each OSD process in the storage node;
- when any OSD process has received the fault status, if it then receives a SCSI sub-request or SCSI request sent by the SCSI processing process, it directly sends the received fault status to the SCSI processing process, so that the device on which the SCSI processing process is installed performs fault handling according to the received fault status.
- in some embodiments, the SCSI processing process is not installed in the storage node but in the client.
- the embodiment of the present invention does not specifically limit the device on which the SCSI processing process is installed.
- for example, suppose a SCSI request in the SCSI processing process of the client carries an LBA range of 0-100 for the data to be read. Because the storage location indicated by LBA 0-50 is in storage node 1 and the storage location indicated by LBA 51-100 is in storage node 2, the SCSI processing process can convert and decompose the SCSI request into two SCSI sub-requests, where SCSI sub-request 1 is used to indicate a request to read the data stored at LBA 0-50 in storage node 1 and SCSI sub-request 2 is used to indicate a request to read the data stored at LBA 51-100 in storage node 2; the SCSI processing process then sends SCSI sub-request 1 to the OSD process in storage node 1 and SCSI sub-request 2 to the OSD process in storage node 2.
- the client, the storage node, and the supervisory node may all be computer equipment.
- the computer device 400 may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 401 and one or more memories 402, where at least one instruction is stored in the memory 402 and is loaded and executed by the processor 401 to implement the methods provided in the following fault handling method embodiments.
- the computer device 400 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output.
- the computer device 400 may also include other components for implementing device functions, which will not be repeated here.
- a computer-readable storage medium such as a memory including instructions, which can be executed by a processor in a terminal to complete the fault handling method in the following embodiments.
- the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
- the supervisory node can determine the failure state of the distributed storage system according to the storage nodes that have failed, and send the determined failure state to all storage nodes in the distributed storage system; after receiving a user's read request or write request, the client can send a SCSI request to a storage node in the distributed storage system to complete the user's read request or write request; when any storage node receives the SCSI request, that storage node temporarily does not process it and returns the received failure state to the client, so that the client can perform fault handling based on the failure state returned by the storage node. In some embodiments, the storage node can also perform fault handling based on the failure state. In a possible implementation, when any storage node has received the failure state sent by the supervisory node, if it then receives a SCSI request sent by the client, that storage node can perform fault handling according to the failure state.
- the method specifically includes:
- the supervisory node determines at least one storage node that has failed in the distributed storage system and target data in the at least one storage node that cannot be accessed.
- the target data may also be data that is stored in the distributed storage system and cannot be accessed, and the target data may also be only the inaccessible data stored by the at least one storage node.
- in the embodiment of the present invention, the case in which the target data is data that cannot be accessed in the at least one storage node is taken as an example for description.
- the supervisory node can determine the at least one storage node and the target data by querying. In a possible implementation manner, the supervisory node can query the failure table and the data table every eighth preset duration, determine the failed storage nodes from the failure table, and determine the data that cannot be accessed from the data table.
- the eighth preset duration may be 10 minutes or 1 hour, and the embodiment of the present invention does not specifically limit the eighth preset duration.
- after the supervisory node completes this step 501, it can also determine the fault state based on the at least one storage node and the target data; that is, it first determines that at least one storage node in the distributed storage system has failed and determines the target data in the at least one storage node that cannot be accessed, and then determines the fault state according to the at least one storage node and the target data. The process of determining the fault state according to the at least one storage node and the target data is the process of determining the failure state of the distributed storage system according to at least one storage node, among the plurality of storage nodes, that has failed, and it may be implemented by the process shown in step 502 below.
- the supervisory node determines the failure state of the distributed storage system to be the first failure state.
- the redundancy is the redundancy of the data stored in the distributed storage system, that is, the number of copies of the data stored in the distributed storage system; the failure state of the distributed storage system is used to indicate whether all of the at least one storage node can be repaired within the first preset time period, and the failure state may be either the first failure state or the second failure state, where the second failure state is used to indicate that the at least one storage node cannot all be repaired within the first preset time period.
- when the at least one storage node can be repaired within the first preset time period, it can be considered that the fault in the distributed storage system can be repaired in a short time; the distributed storage system is then in the first failure state, and the first failure state is also called the transient node down (TND) state.
- when the at least one storage node cannot all be repaired within the first preset time period, it is considered that the fault in the distributed storage system cannot be repaired in a short time; the distributed storage system is then in the second failure state, and the second failure state is also called the permanent node down (PND) state.
- the first preset duration may be 20 minutes or 2 hours, and the embodiment of the present invention does not specifically limit the first preset duration.
- when the amount of data stored in the distributed storage system is large and the number of storage nodes is also large, if the number of failed storage nodes in the distributed storage system is less than the redundancy of the distributed storage system, the storage nodes that have not failed can reconstruct the data that cannot be accessed in the failed storage nodes based on the data that can still be accessed; therefore, when the number of failed storage nodes is less than the redundancy of the distributed storage system, the normal business of the distributed storage system is not affected, and there is no need to repair the faulty storage nodes immediately.
- otherwise, the supervisory node can first determine the fault status of the distributed storage system, so that the fault can be handled quickly according to the fault status to minimize the impact on the business.
- when the data volume of the target data is small, the impact on the business is relatively small, and the normal operation of the business may not be affected.
- in that case, the supervisory node may temporarily not perform fault handling.
- the supervisory node can determine whether the data volume of the target data can affect the normal operation of the business through preset conditions.
- the preset conditions may include any of the following: the ratio between the data volume of the target data and the first preset data volume is greater than the preset ratio, where the first preset data volume is the total data volume of all data stored in the distributed storage system; or the data volume of the target data is greater than the second preset data volume.
- when the data volume of the target data meets a preset condition, the supervisory node can first set the failure state of the distributed storage system to the first failure state, so that fault handling can be performed quickly according to the failure state to reduce the impact on the business; if the first preset time period elapses and the repair of the at least one storage node is not completed, the failure state may be updated to the second failure state.
- the preset ratio may be 0.4, 0.5 or 0.6, and the embodiment of the present invention does not specifically limit the preset ratio, the first preset data amount, and the second preset data amount.
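- the decision described in this step can be sketched as below; the thresholds, the way the conditions are combined, and the function name are assumptions for illustration only.

```python
def should_set_first_failure_state(failed_node_count: int,
                                   redundancy: int,
                                   target_data_kb: int,
                                   total_data_kb: int,
                                   preset_ratio: float = 0.5,
                                   second_preset_kb: int = 1024) -> bool:
    # Fewer failed nodes than the redundancy can be tolerated, but a large volume of
    # inaccessible target data may still affect the business.
    if failed_node_count >= redundancy:
        return True
    # Preset condition 1: ratio of target data to all stored data exceeds the preset ratio.
    if total_data_kb and target_data_kb / total_data_kb > preset_ratio:
        return True
    # Preset condition 2: target data exceeds the second preset data volume.
    return target_data_kb > second_preset_kb

print(should_set_first_failure_state(1, 3, 30, 40))      # ratio 0.75 > 0.5 -> True
print(should_set_first_failure_state(1, 3, 30, 100000))  # little impact     -> False
```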
- the supervisory node may query the failure table stored by the supervisory node, and determine the number of the at least one storage node from the failure table.
- the supervisory node can query the stored data table and determine from it the data volume of the target data that is currently inaccessible in the distributed storage system. For example, by querying Table 2, the supervisory node can determine that the target data includes 10KB of data in the data block indicated by the data identifier M and 20KB of data in the data block indicated by the data identifier N.
- the process of determining, from the failure table, the number of storage nodes that have failed in the distributed storage system is not repeated in the embodiment of the present invention.
- the process shown in this step 502 is also a process of determining the fault state of the distributed storage system based on the at least one storage node and the target data.
- the supervisory node sends a first failure identifier for indicating a first failure state to all storage nodes in the distributed storage system.
- the fault identifier of the fault state includes any one of a first fault identifier and a second fault identifier, where the first fault identifier is used to indicate the first fault state and the second fault identifier is used to indicate the second fault state.
- the first fault identifier and the second fault identifier may be different; for example, the first fault identifier may be s and the second fault identifier may be t.
- the embodiment of the present invention does not specifically limit the manner in which the first fault identifier and the second fault identifier are expressed.
- the supervisory node can send the first failure identifier to the NMS agent process of each storage node, so that the first failure identifier is sent to all storage nodes to inform all storage nodes that the current failure state of the distributed storage system is the first failure status.
- the process shown in this step 503 is also a process in which the supervisory node sends the fault state to each of the multiple storage nodes included in the distributed storage system.
- the supervisory node may also perform fault handling based on the fault status of the storage system; the embodiment of the present invention does not specifically limit the fault handling process of the supervisory node.
- the target storage node in the distributed storage system receives the first failure identifier.
- the target storage node is any storage node in the distributed storage system, and each OSD process in the target storage node can obtain the first failure identifier from the NMS agent process of the target storage node. It should be noted that each storage node in the distributed storage system can perform this step 504; a failed storage node may or may not receive the first failure identifier.
- the target device sends an access request to the target storage node.
- the access request is used to instruct to read data stored in the target storage node or write data to the target storage node.
- the target device is a device installed with a SCSI processing process, and may be the target client or a target storage node, where the target client is any client in the distributed storage system.
- this step 505 can be implemented by the SCSI processing process in the target device.
- the target client in the distributed storage system can send a target access request to the target device.
- the target access request is used to request processing of first target data, where the first target data includes the data indicated by the access request; the target access request may carry a target storage address, and the target storage address may be a storage address of the first target data.
- a target virtual machine installed on the target client can send the target access request. Specifically, the target virtual machine may send the target access request to the SCSI processing process in the target device, and the sending of the target access request can be triggered by a user's action.
- for example, the target virtual machine of the client sends a target access request to the SCSI processing process to request reading of the data stored at a storage address input by the user.
- the target device receives the target access request sent by the target client in the distributed storage system.
- this step 505 can be implemented in the following manner: the target device sends an access request to the target storage node in the distributed storage system based on the target access request. Specifically, after receiving the target access request, the SCSI processing process converts and decomposes the target access request according to the target storage address to obtain multiple access requests, each of which can carry part of the target storage address.
- the partial address can be an offset address in a storage medium managed by an OSD process in the target storage node, so that the SCSI processing process sends the corresponding access request to that OSD process; the process of converting the target access request into the access requests is the aforementioned process of converting SCSI requests.
- after the target storage node in the distributed storage system receives the first fault identifier, if the target storage node then receives an access request, the target storage node suspends the access request and sends the first fault identifier to the target device.
- This step 506 can be implemented by the OSD process in the target storage node that receives the access request.
- when the target storage node has received the first failure identifier, it means that the target storage node already knows that a storage node in the distributed storage system has failed and that the current failure state is the first failure state; because a storage node in the distributed storage system has failed, the target storage node can suspend the access request and temporarily not process it, waiting for the failed storage node to be repaired automatically or manually.
- the target storage node may output the first failure identifier to the target device.
- for example, when any OSD process in the target storage node receives the access request sent by the SCSI processing process of the target device, that OSD process sends the first failure identifier to the SCSI processing process.
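- the OSD-side behaviour just described can be sketched as follows; class and method names, and the identifier value "s", are illustrative assumptions.

```python
class OsdProcess:
    def __init__(self):
        self.fault_identifier = None   # set when the NMS agent process delivers a fault identifier
        self.suspended_requests = []

    def on_fault_identifier(self, identifier: str):
        self.fault_identifier = identifier

    def on_access_request(self, request):
        if self.fault_identifier is not None:
            # Suspend the request and return the fault identifier to the SCSI processing process.
            self.suspended_requests.append(request)
            return {"fault_identifier": self.fault_identifier}
        return {"data": f"payload for {request}"}  # normal processing path (stubbed)

osd = OsdProcess()
osd.on_fault_identifier("s")                      # first fault identifier issued by the supervisory node
print(osd.on_access_request("read LBA 100-150"))  # -> {'fault_identifier': 's'}
```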
- after the supervisory node determines the fault status of the distributed storage system, it can also send the fault status directly to the target device, so that the target device can obtain the fault status of the distributed storage system without the storage node having to send it to the target device.
- the process shown in this step 506 is also the process of outputting the failure identifier when the target storage node in the distributed storage system has received the failure identifier and then receives an access request.
- the target device receives the first failure identifier returned by the target storage node based on the access request.
- This step 507 is implemented by the SCSI processing process in the target device.
- the process shown in this step 507 is also a process of receiving the failure identifier returned by the target storage node based on the access request.
- the SCSI processing process may receive the first failure identifier.
- the process shown in this step 507 is also the process of receiving the response returned by the target storage node, where the response includes the fault status of the distributed storage system, that is, the fault identifier, and the fault status is used to indicate whether at least one failed storage node can be completely repaired within the first preset time period.
- the target device performs fault processing based on the received first fault identifier.
- this step 508 can be performed by the SCSI processing process installed in the target device. After the SCSI processing process receives the first fault identifier, it can perform fault processing based on the first fault identifier, so that the target client can respond accordingly.
- the target virtual machine in the target client that interfaces with the SCSI processing process may or may not be a VMWare virtual machine. Because the target device handles faults differently for different virtual machines, this step 508 can be implemented in either of the following manners 1-2.
- Manner 1: when the access request is sent by the target client in the distributed storage system based on the target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is the first fault state, the SCSI processing process does not respond to the target virtual machine for the access request.
- when the SCSI processing process receives the first failure identifier, it indicates that the distributed storage system is in the first failure state; for the VMWare virtual machine, the first failure state corresponds to the APD state. Because the access request received by the SCSI processing process was sent by the target virtual machine, and in order for the VMWare virtual machine to perceive that the distributed storage system is in the APD state it defines, the SCSI processing process does not respond to the target virtual machine. Moreover, every SCSI processing process in the distributed storage system receives the first fault identifier as soon as it issues an access request to an OSD process, and each SCSI processing process that handles an access request likewise does not respond to the target virtual machine, which simulates all links in the distributed storage system being unresponsive (down).
- if the target virtual machine does not receive a response from the SCSI processing process, it will continue to send access requests; even though not all storage nodes in the distributed storage system have failed, the fault can still be handled according to the APD fault status defined by the VMWare virtual machine.
- Manner 2: when the access request is sent by the target client based on the target virtual machine, and the target virtual machine is not a VMWare virtual machine, if the fault state is the first fault state, the SCSI processing process sends a retry request to the target virtual machine, where the retry request is used to instruct the target virtual machine to reissue the access request.
- the sense key carried in the retry request can be the Unit Attention (0x6) error code, which can indicate that the storage medium or the link status in the storage node has changed, that is, a failure has occurred.
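- manners 1 and 2 can be sketched together as below; the return conventions (None for "no response", a dictionary for the retry request) are assumptions made for illustration.

```python
UNIT_ATTENTION_SENSE_KEY = 0x6  # the sense key carried by the retry request

def handle_first_failure_state(is_vmware_vm: bool):
    if is_vmware_vm:
        # Manner 1: give no response, so the VMWare virtual machine perceives the APD state.
        return None
    # Manner 2: instruct the virtual machine to reissue the access request.
    return {"retry": True, "sense_key": UNIT_ATTENTION_SENSE_KEY}

print(handle_first_failure_state(True))   # -> None
print(handle_first_failure_state(False))  # -> {'retry': True, 'sense_key': 6}
```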
- the embodiments of the present invention provide different failure handling methods, so that the failure handling methods provided by the embodiments of the present invention are more universal.
- the process shown in this step 508 is also a process of performing fault handling based on the fault status contained in the response.
- when the repair of the at least one storage node is completed, the supervisory node sends a repair completion response to each device in the distributed storage system, where the repair completion response is used to indicate that there is no faulty device in the distributed storage system.
- Each device includes a storage node and a client.
- when the at least one storage node is fully repaired within the first preset time period, it means that there is no faulty device in the distributed storage system at this time.
- the supervisory node stores the first fault identifier used to indicate the first fault state.
- in that case, the supervisory node can delete the first failure identifier and send a repair completion response to each device in the distributed storage system to notify each device that there is no failed device in the distributed storage system and that it can work normally; after receiving the repair completion response, each device deletes the previously received fault identifier and can begin to work normally.
- the process shown in step 509 is a process of sending a repair completion response to each device in the distributed storage system when the repair of the at least one storage node is completed.
- in the prior art, when the client does not obtain a response from any storage node, it directly considers the failure state of the distributed storage system to be the APD state, and determines the failure state to be the PDL state only when a storage node explicitly returns a storage abnormality message. If a storage node has a long-term failure, it may not be able to return a storage abnormality message to the client, and the client may then determine the failure state to be the APD state. Therefore, the prior art does not determine the failure state of the distributed storage system accurately, and if the PDL state is mistaken for the APD state, repair work on the business side is not carried out, which eventually prolongs the fault repair time.
- each storage node in the distributed storage system knows the failure status of the distributed storage system. For storage nodes that have not failed, the failure status can be returned to the target device based on the SCSI request. Therefore, the target device can clearly know the fault status of the distributed storage system, and the accuracy of the target device in determining the fault status can be improved.
- when the repair of the at least one storage node is not completed within the first preset time period, the supervisory node updates the failure state from the first failure state to the second failure state.
- the at least one storage node may not be fully repaired within the first preset time period; in that case, repairing the storage nodes that have not yet been repaired may take longer. Since the time needed to repair the remaining storage nodes is uncertain and may be long, the supervisory node can directly update the failure state from the first failure state to the second failure state.
- the supervisory node sends a second failure identifier used to indicate the second failure state to all storage nodes in the distributed storage system.
- the manner in which the supervisory node sends the second fault identifier to all storage nodes is the same as the manner in which the first fault identifier is sent to all storage nodes in step 503.
- the embodiment of the present invention does not repeat this step 511.
- the process shown in this step 511 is also a process of sending the failure status to each of the multiple storage nodes.
- the target storage node receives the second failure identifier.
- the manner in which the target storage node receives the second failure identifier is the same as that of receiving the first failure identifier in step 504.
- the embodiment of the present invention does not repeat this step 512.
- the target device sends an access request to the target storage node.
- the manner in which the target device sends the access request to the target storage node is described in step 505.
- the embodiment of the present invention does not repeat this step 513.
- after the target storage node in the distributed storage system receives the second failure identifier, if the target storage node then receives an access request, the target storage node suspends the access request and outputs the second failure identifier.
- the manner in which the target storage node suspends the access request and outputs the second fault identifier is the same as the manner in which the target storage node suspends the access request and outputs the first fault identifier in step 506.
- the embodiment of the present invention does not repeat this step 514.
- the target device receives a second failure identifier returned by the target storage node based on the access request.
- the manner in which the target device receives the second fault identifier is the same as the manner in which the first fault identifier is received in step 507, which is not repeated in the embodiment of the present invention.
- the target device performs fault processing based on the received second fault identifier.
- This step 516 may be performed by the SCSI processing process of the target device. In some embodiments, the target virtual machine that interfaces with the SCSI processing process in the client may or may not be a VMWare virtual machine, and the target device performs fault handling in different ways for different virtual machines. Therefore, step 516 can be implemented in either of the following manners 3-4.
- Manner 3: When the access request is sent by the target client based on the target virtual machine, and the target virtual machine is a VMWare virtual machine, if the failure state is the second failure state, the SCSI processing process returns a storage abnormality message to the VMWare virtual machine. When the failure identifier is the second failure identifier, it means that the at least one storage node cannot be completely repaired within the first preset time period and a longer time is needed, so the target virtual machine can perform fault handling in the PDL state. To enable the target virtual machine to perceive the PDL state, the storage abnormality message can carry a SCSI error customized by the VMWare virtual machine, such as SK 0x0, ASC&ASCQ 0x0200 or SK 0x5, ASC&ASCQ 0x2500. The SCSI error indicates that the state of the distributed storage system is the PDL state, so after receiving the storage abnormality message the target virtual machine can perceive the PDL state. The target virtual machine can then power off the file system in the distributed storage system and wait for technicians to repair the fault in the distributed storage system, or select a better fault handling method according to the user-defined fault handling method to handle the faulty storage node, for example, powering off the faulty storage node. It should be noted that when a storage node in the distributed storage system has a failure that cannot be repaired in a short time, the file system may become abnormal. To ensure that the file system can be used normally after the storage node failures in the distributed storage system are repaired, the file system needs to be powered off first; when the repair is complete, the file system is powered on and repaired.
- Manner 4: When the access request is sent by the target client based on the target virtual machine, and the target virtual machine is not a VMWare virtual machine, if the failure state is the second failure state, the target device returns to the target virtual machine a target error that can be recognized by the target virtual machine, where the target error is used to indicate a storage medium failure.
- The target error can be a Sense key 0x3 error, that is, a storage medium error (Medium Error), which can be recognized by ordinary virtual machines. When the target virtual machine receives the target error, it means that the state of the distributed storage system is the second failure state, so the target device can power off the distributed file system and wait for technicians to repair the fault in the distributed storage system, or select a better fault handling method according to the user-defined fault handling method to handle the faulty storage node.
- the process shown in this step 516 is also a process in which the target device performs fault processing based on the received fault identifier, that is, a process in which fault processing is performed based on the fault status contained in the response.
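As an illustration of manners 3 and 4, the sketch below maps the second failure state onto the SCSI errors mentioned above; the function and field names are hypothetical, and the sense-code values are simply those quoted in this description.

```python
def build_scsi_error(is_vmware_vm: bool) -> dict:
    """Return a SCSI error descriptor for the second failure state (sketch only)."""
    if is_vmware_vm:
        # Manner 3: a VMWare-defined PDL error, e.g. SK 0x5, ASC&ASCQ 0x2500,
        # so the VMWare virtual machine perceives the PDL state.
        return {"sense_key": 0x5, "asc": 0x25, "ascq": 0x00}
    # Manner 4: ordinary virtual machines recognize a storage medium error
    # (Sense key 0x3); the ASC/ASCQ values here are placeholders.
    return {"sense_key": 0x3, "asc": 0x00, "ascq": 0x00}
```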
- 517. When the at least one storage node is completely repaired, a repair completion response is sent to each device in the distributed storage system. This step 517 is the same as step 509 and is not described in detail in this embodiment of the present invention. It should be noted that the failure of each storage node may be repaired by the storage node itself or by a technician; the embodiment of the present invention does not specifically limit the repair method of the storage node.
- It should be noted that when the client receives the repair completion response, if the file system has been powered off, the client powers on the file system and repairs it. Because a large amount of metadata is stored in the file system, repairing the file system requires scanning all the metadata in the file system and correcting any erroneous metadata that is found, which generally takes some time. In the first failure state, the client does not power off the file system, so if the at least one storage node can be repaired within the first preset time period, powering off the file system can be avoided, which reduces the time spent repairing the file system and enables the distributed storage system to resume business as soon as possible to ensure the quality of service.
- The method shown in the embodiment of the present invention determines the failure state of the distributed storage system according to at least one failed storage node among the multiple storage nodes, so the failure state does not need to be determined only when all storage nodes have failed. Once the failure state is determined, it can be sent immediately to each storage node in the distributed storage system, so that each storage node can perform fault handling according to the determined failure state, thereby reducing the time taken for the distributed storage system to return to normal.
- the embodiments of the present invention provide different failure handling methods, so that the failure handling methods provided by the embodiments of the present invention are more universal.
- Furthermore, each storage node in the distributed storage system knows the failure state of the distributed storage system. A storage node that has not failed can return the failure state to the target device based on the access request, so that the target device can clearly know the failure state of the distributed storage system, which further improves the accuracy of the target device in determining the failure state. In addition, in the first failure state, the target device does not power off the file system; if the at least one storage node can be repaired within the first preset time period, powering off the file system can be avoided, and once the at least one storage node is restored, the file system and services can be restored immediately. This reduces the time needed to repair the file system, so that the distributed storage system can restore services as soon as possible to ensure the quality of service.
- 601. The supervisory node determines at least one failed storage node in the distributed storage system and target data in the at least one storage node that cannot be accessed.
- step 601 is the same as step 501, and step 601 is not described in detail in the embodiment of the present invention.
- 602. The supervisory node determines the failure scenario of the distributed storage system according to the time when the at least one storage node failed, where the failure scenario is used to indicate whether the at least one storage node failed at the same time.
- the failure scenario may include any one of a first failure scenario and a second failure scenario, where the first failure scenario is used to indicate that the at least one storage node fails at the same time, and the second failure scenario is used to indicate the at least one storage node The time of node failure is different.
- the supervisory node can determine the failure time of each storage node from the stored failure table, so that the supervisory node can determine the failure scenario according to whether the failure time of at least one storage node is the same.
- In a possible implementation, when the at least one storage node all fail within a target duration, the supervisory node determines the failure scenario as the first failure scenario; otherwise, it determines the failure scenario as the second failure scenario.
- the process shown in this step 602 is a process of determining the failure scenario according to the time when the at least one storage node fails.
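A minimal sketch of the scenario decision in step 602 follows; it assumes the failure times recorded in the failure table are available as timestamps, and the names are hypothetical.

```python
def determine_failure_scenario(failure_times, target_duration):
    """Step 602 sketch: decide whether the failed nodes failed 'at the same time'.

    failure_times: failure timestamps (in seconds) read from the failure table.
    target_duration: the target duration within which failures count as simultaneous.
    """
    if max(failure_times) - min(failure_times) <= target_duration:
        return "FIRST_FAILURE_SCENARIO"   # all nodes failed within the target duration
    return "SECOND_FAILURE_SCENARIO"      # the failure times differ
```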
- 603. For any storage node of the at least one storage node, when the storage node has a preset network fault, a preset abnormal power-down fault, a preset misoperation fault, a preset hardware fault, or a preset software fault, the supervisory node determines the failure type of the storage node as the first failure type; otherwise, it determines the failure type of the storage node as the second failure type.
- the failure type of a storage node is used to indicate whether the failure of a storage node can be repaired within a second preset time period.
- The failure type may include either the first failure type or the second failure type, where the second failure type is used to indicate that the failure of a storage node cannot be repaired within the second preset time period.
- the second preset time period may be less than or equal to the first preset time period. The embodiment of the present invention does not specifically limit the second preset time period.
- the preset network fault may include any of the following items 1.1-1.7:
- the service network port of any storage node cannot be accessed, and the supervision network port of any storage node can be accessed.
- The service network port is the network port of the service network, which is used for heartbeat, data synchronization and mirroring between storage nodes; the supervisory network port is the network port of the supervisory network, which is used for monitoring whether a storage node is faulty and for information query.
- The supervisory node can send an Internet packet explorer (ping) request to the service network port of any storage node through the service network port of the supervisory node. The ping request is used to request the establishment of a connection. If the connection succeeds, the service network port of the storage node is considered accessible; otherwise, the service network port of the storage node is considered inaccessible.
- Similarly, the supervisory node can send a ping request to the supervisory network port of any storage node through the supervisory network port of the supervisory node. If the connection succeeds, the supervisory network port of the storage node is considered accessible; otherwise, the supervisory network port of the storage node is considered inaccessible.
- When the supervisory node cannot access a storage node through the service network port, the storage node has a network failure; if the storage node can still be accessed through the supervisory network port, the failure of the storage node can be repaired within a short time, and the storage node is considered to have a preset network fault.
- the BMC network port is the network port of the BMC network that manages the BMC.
- The supervisory node can send a ping request to the BMC network port of any storage node through the BMC network port of the supervisory node. If the connection succeeds, the BMC network port of the storage node is considered accessible; otherwise, the BMC network port of the storage node is considered inaccessible.
- technicians can configure the business network and the supervision network to be the same network, that is, the target network.
- When the service network and the supervisory network are the same target network, if the data packets transmitted by any storage node in the target network experience packet loss greater than a first preset number or malformed packets greater than a second preset number, the storage node has a network failure. If the supervisory node cannot access the storage node, the failure of the storage node can be repaired within a short time, and the storage node is considered to have a preset network fault.
- Another case is that the data packets transmitted by any storage node in the target network experience packet loss greater than the first preset number or malformed packets greater than the second preset number, and the delay with which the storage node transmits data in the target network is greater than a third preset duration.
- In this case, a connection failure response is used to indicate the connection failure. The connection failure response can carry delay information, which indicates the delay with which the storage node transmits data in the target network, so that the supervisory node can determine whether the delay indicated by the delay information is greater than the third preset duration.
- the supervisory node can detect priority-based flow control (PFC) packets of each priority in the target network, so as to determine whether the number of PFC packets sent by any storage node is greater than the third preset number. The supervisory node can then determine whether any storage node meets the preset conditions.
- Another case is that any storage node sends more than a third preset number of priority-based flow control (PFC) packets in the target network, and the delay with which the storage node transmits data in the target network is greater than a fourth preset duration.
- Item 1.6: When the service network and the supervisory network are the same target network, a broadcast storm caused by any storage node occurs in the target network, and the service network port, supervisory network port, and BMC network port of the storage node are all inaccessible.
- the preset abnormal power failure can include any of the following items 2.1-2.2:
- the service network ports, supervisory network ports, and BMC network ports of all storage nodes in the chassis are inaccessible, and the chassis includes any of the storage nodes.
- a chassis can include at least one storage node.
- When the service network ports, supervisory network ports, and BMC network ports of all storage nodes in a chassis are inaccessible, all storage nodes in the chassis can be considered powered off. If any storage node is in that chassis, the storage node is also powered off. As soon as the chassis is powered on, the failure of the storage node can be repaired, so the storage node is considered to have a preset abnormal power-down fault.
- Another case is that the service network ports, supervisory network ports, and BMC network ports of a first target number of storage nodes are inaccessible, and the first target number of storage nodes includes any given storage node.
- In this case, the first target number of storage nodes can all be considered to have a preset abnormal power-down fault. It should be noted that the embodiment of the present invention does not specifically limit the seventh preset duration.
- The preset misoperation fault may include: any storage node is actively powered off. For example, when the user clicks the shutdown button or restart button of a storage node, the storage node considers itself to be actively powered off and sends active power-off information to the supervisory node, so that the supervisory node determines that the storage node has a preset misoperation fault.
- the preset hardware failure includes: any storage node exits abnormally, the BMC network port of any storage node can be accessed, and any storage node has loose parts.
- When any storage node exits abnormally, it can send abnormal exit information to the supervisory node to indicate that it has exited abnormally.
- The abnormal exit may be caused by loose internal components; a loose component can be, for example, a memory module or a card. A loose component can be restored immediately by re-plugging it, that is, it causes a short-term failure. It should be noted that when any storage node detects that a component is poorly connected, it means that the storage node has a loose component, and the storage node can send loose-component information to the supervisory node, so that the supervisory node can determine, according to the loose-component information, that the storage node has a preset hardware fault.
- the preset software failure can include any of the following items 3.1-3.3:
- When any storage node is abnormally reset, the storage node can send an abnormal reset message to the supervisory node, so that the supervisory node learns that the storage node has been abnormally reset, which indicates that the storage node has a preset software fault.
- the target process can be an OSD process.
- When the target process of any storage node exits, the storage node can send a target process exit message to the supervisory node, so that the supervisory node learns that the target process of the storage node has exited, which indicates that the storage node has a preset software fault.
- Item 3.3 The software abnormality of any storage node causes the operating system of any storage node to reset.
- In this case, the storage node can send an operating system reset message to the supervisory node, so that the supervisory node learns that the operating system of the storage node has been reset, which indicates that the storage node has a preset software fault.
- Based on the above, the supervisory node can determine through this step 603 whether the failure type of each storage node is the first failure type or the second failure type, and then store the failure type of each storage node in the failure table, so that when the supervisory node needs the failure type of any storage node, it can obtain it directly from the failure table.
- In addition, the multiple failure type discrimination methods embodied in this step 603 can accurately determine the failure type of each storage node, and based on the failure type of each storage node, the failure state of the distributed storage system can then be determined more accurately.
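The per-node classification of step 603 could be sketched as follows; the predicate functions stand in for the checks in items 1.1-3.3 above and are purely hypothetical placeholders.

```python
def classify_failure_type(node, checks):
    """Step 603 sketch: classify one failed storage node.

    checks: iterable of predicates, e.g. is_preset_network_fault,
    is_preset_abnormal_power_down, is_preset_misoperation,
    is_preset_hardware_fault, is_preset_software_fault (all hypothetical).
    """
    if any(check(node) for check in checks):
        # The failure is expected to be repairable within the second preset duration.
        return "FIRST_FAILURE_TYPE"
    return "SECOND_FAILURE_TYPE"
```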
- 604. If the failure scenario is the first failure scenario, the supervisory node determines the failure state according to the failure type of each storage node in the at least one storage node.
- When the failure scenario is the first failure scenario, the at least one storage node fails at the same time, so the supervisory node may determine the failure state of the distributed storage system according to the failure type of each storage node.
- In a possible implementation, that the supervisory node determines the failure state according to the failure type of each storage node in the at least one storage node may include the following: when the failure type of each storage node in the at least one storage node is the first failure type, the supervisory node determines the failure state as the first failure state, where the first failure state is used to indicate that the at least one storage node can be completely repaired within the first preset time period; when the failure type of a target number of storage nodes in the at least one storage node is the second failure type, if the target number is less than or equal to the redundancy of the distributed storage system, the supervisory node determines the failure state as the first failure state; otherwise, it determines the failure state as the second failure state, where the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- When the failure type of each storage node in the at least one storage node is the first failure type, every failed storage node can be repaired within the second preset time period, so the failure state can be determined as the first failure state. When the failure type of a target number of storage nodes is the second failure type, if the target number is less than or equal to the redundancy of the distributed storage system, the impact of these storage nodes on the distributed storage system is limited, and the failure state can still be determined as the first failure state. Once the target number is greater than the redundancy of the distributed storage system, indicating that the target number of failed storage nodes has a greater impact on the distributed storage system, the failure state is determined as the second failure state so that the fault can be repaired quickly.
- It should be noted that the supervisory node can determine the failure type of each storage node using any of the preset network fault, preset abnormal power-down fault, preset misoperation fault, preset hardware fault, or preset software fault described in step 603.
- 605. If the failure scenario is the second failure scenario, the supervisory node determines the failure state according to the failure type of the first storage node, which is the storage node that failed last among the at least one storage node.
- When the failure scenario is the second failure scenario, the supervisory node can determine the failure state of the distributed storage system according to the failure type of the storage node that failed last among the at least one storage node; the storage node that failed last is the first storage node.
- When the failure type of the first storage node is the first failure type, the supervisory node determines the failure state as the first failure state, where the first failure state is used to indicate that the at least one storage node can be completely repaired within the first preset time period; when the failure type of the first storage node is the second failure type, the supervisory node determines the failure state as the second failure state, where the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- It should be noted that the process shown in steps 604 and 605 is also a process of determining the failure state of the distributed storage system according to the failure scenario of the distributed storage system.
- the supervisory node only needs to determine the failure type of the first storage node, and there is no need to determine the failure type of all storage nodes.
- It should be noted that any one of the preset network fault, preset abnormal power-down fault, preset misoperation fault, preset hardware fault, or preset software fault described in step 603 can be used to determine the failure type of the first storage node.
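Steps 604 and 605 can be summarized in the following sketch, which assumes each failed node carries hypothetical failure_type and failure_time attributes:

```python
def determine_failure_state(scenario, failed_nodes, redundancy):
    """Steps 604/605 sketch: derive the failure state from the failure scenario."""
    if scenario == "FIRST_FAILURE_SCENARIO":
        # Step 604: the nodes failed at the same time; look at every node's failure type.
        long_term = [n for n in failed_nodes if n.failure_type == "SECOND_FAILURE_TYPE"]
        if len(long_term) <= redundancy:
            return "FIRST_FAILURE_STATE"   # repairable within the first preset duration
        return "SECOND_FAILURE_STATE"
    # Step 605: the failure times differ; the node that failed last (the first storage node) decides.
    first_storage_node = max(failed_nodes, key=lambda n: n.failure_time)
    if first_storage_node.failure_type == "FIRST_FAILURE_TYPE":
        return "FIRST_FAILURE_STATE"
    return "SECOND_FAILURE_STATE"
```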
- 606. The supervisory node sends a failure identifier used to indicate the failure state to all storage nodes in the distributed storage system.
- When the failure state is the first failure state, the supervisory node sends the first failure identifier to all storage nodes in the distributed storage system; when the failure state is the second failure state, the supervisory node sends the second failure identifier to all storage nodes in the distributed storage system. The specific execution process is the same as that of step 503 and is not repeated here. It should be noted that the process shown in this step 606 is also a process of sending the failure state to each of the multiple storage nodes.
- 607. The target storage node in the distributed storage system receives the failure identifier.
- When the failure identifier is the first failure identifier, the target storage node receives the first failure identifier; when the failure identifier is the second failure identifier, the target storage node receives the second failure identifier. The specific execution process is the same as that of step 504 and is not repeated here.
- 608. The target device sends an access request to the target storage node.
- This step 608 is the same as the process shown in step 505, and this step 608 is not repeated in this embodiment of the present invention.
- 609. After the target storage node in the distributed storage system receives the failure identifier, if the target storage node receives an access request again, the target storage node suspends the access request and outputs the failure identifier. This step 609 is the same as step 506 and is not repeated here.
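Steps 607-609 from the storage node's perspective could look like the following sketch; OsdProcess and its methods are hypothetical names, not part of the original description.

```python
class OsdProcess:
    """Sketch of an OSD process handling steps 607-609."""

    def __init__(self):
        self.failure_identifier = None   # set when the NMS agent pushes a failure identifier
        self.suspended_requests = []     # access requests held while the system is faulty

    def on_failure_identifier(self, identifier):
        # Step 607: record the failure identifier delivered by the supervisory node.
        self.failure_identifier = identifier

    def on_access_request(self, request):
        # Step 609: while a failure identifier is held, suspend the request instead
        # of processing it, and return the identifier to the target device.
        if self.failure_identifier is not None:
            self.suspended_requests.append(request)
            return self.failure_identifier
        return self.process(request)

    def process(self, request):
        # Placeholder for normal read/write handling of the storage medium.
        raise NotImplementedError
```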
- 610. The target device receives the failure identifier returned by the target storage node based on the access request.
- This step 610 is similar to the process shown in step 505, and is not repeated in this embodiment of the present invention. It should be noted that the process shown in this step 610 is also a process of receiving the response returned by the target storage node, where the response includes the failure state of the distributed storage system, and the failure state is used to indicate whether the at least one failed storage node can be completely repaired within the first preset time period. The failure state included in the response is also the failure identifier.
- 611. The target device performs fault processing based on the received failure identifier.
- When the failure identifier is the first failure identifier, the target device performs fault processing based on the received first failure identifier, and the specific execution process is the same as the process shown in step 508. When the failure identifier is the second failure identifier, the target device performs fault processing based on the received second failure identifier, and the specific execution process is the same as the process shown in step 516. This step 611 is not repeated in the embodiment of the present invention.
- In the embodiments of the present invention, each storage node in the distributed storage system knows the failure state of the distributed storage system. A storage node that has not failed can return the failure state to the target device based on the access request, so that the target device can clearly know the failure state of the distributed storage system, which can improve the accuracy of the target device in determining the failure state.
- the embodiments of the present invention provide different failure handling methods, so that the failure handling methods provided by the embodiments of the present invention are more universal.
- The process shown in this step 611 is also a process of performing fault processing based on the failure state contained in the response.
- 612. When the at least one storage node is completely repaired, the supervisory node sends a repair completion response to each device in the distributed storage system.
- This step 612 has the same principle as step 509 and is not described in detail in this embodiment of the present invention. It should be noted that when the failure state is the first failure state, if the at least one storage node is completely repaired within the first preset time period, this step 612 can be executed directly; if the at least one storage node is not completely repaired within the first preset time period, the supervisory node updates the failure state from the first failure state to the second failure state and jumps to step 606. It should also be noted that in the first failure state, the client does not power off the file system; if the at least one storage node can be repaired within the first preset time period, powering off the file system can be avoided, which reduces the time to repair the file system, so that the distributed storage system can resume business as soon as possible to ensure the quality of service.
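A sketch of the client-side handling of the repair completion response (step 612 together with the note above) follows; the attribute and method names are hypothetical.

```python
def on_repair_complete(client):
    """Step 612 sketch: react to the repair completion response on the client."""
    # No faulty device remains, so the previously received failure identifier is dropped.
    client.failure_identifier = None
    if client.file_system_powered_off:
        # Only needed if the second failure state forced a power-off earlier.
        client.power_on_file_system()
        client.repair_file_system()   # scan the metadata and fix erroneous entries
        client.file_system_powered_off = False
```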
- The method shown in the embodiment of the present invention determines the failure state of the distributed storage system according to at least one failed storage node among the multiple storage nodes, so the failure state does not need to be determined only when all storage nodes have failed. Once the failure state is determined, it can be sent immediately to each storage node in the distributed storage system, so that each storage node can perform fault handling according to the determined failure state, thereby reducing the time taken for the distributed storage system to return to normal.
- the embodiments of the present invention provide different failure handling methods, so that the failure handling methods provided by the embodiments of the present invention are more universal.
- Furthermore, each storage node in the distributed storage system knows the failure state of the distributed storage system. A storage node that has not failed can return the failure state to the target device based on the access request, so that the target device can clearly know the failure state of the distributed storage system, which further improves the accuracy of the target device in determining the failure state. In addition, in the first failure state, the target device does not power off the file system; if the at least one storage node can be repaired within the first preset time period, powering off the file system can be avoided, which reduces the time spent repairing the file system and enables the distributed storage system to resume business as soon as possible to ensure the quality of service.
- In addition, the multiple failure type discrimination methods embodied in step 603 can accurately determine the failure type of each storage node, and according to the failure type of each storage node, the failure state of the distributed storage system can be determined more accurately.
- FIG. 7 is a schematic structural diagram of a fault handling device provided by an embodiment of the present invention, which is applied to a distributed storage system.
- the distributed storage system includes multiple storage nodes, and the device includes:
- the determining module 701 is configured to determine the failure state of the distributed storage system according to at least one failed storage node among the multiple storage nodes, where the failure state is used to indicate whether the at least one failed storage node can be completely repaired within the first preset time period;
- the sending module 702 is configured to send the fault status to each storage node of the multiple storage nodes.
- the device further includes:
- the processing module is configured to perform fault processing according to the failure state of the distributed storage system.
- the determining module 701 includes:
- the first determining unit is configured to execute the above step 501;
- the second determining unit is configured to determine the fault state according to the at least one storage node and the target data.
- the second determining unit is used to:
- when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data amount of the target data meets a preset condition, determine the failure state as the first failure state, where the first failure state is used to indicate that the at least one storage node can be completely repaired within the first preset time period.
- the second determining unit is used to:
- when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data amount of the target data meets the preset condition, determine the failure state according to the failure scenario of the distributed storage system, where the failure scenario is used to indicate whether the at least one storage node fails at the same time.
- the preset condition includes any one of the following:
- the ratio between the data volume of the target data and the first preset data volume is greater than the preset ratio, and the first preset data volume is the total data volume of all data stored in the distributed storage system;
- the data amount of the target data is greater than the second preset data amount.
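The preset condition above amounts to a simple predicate; a minimal sketch follows, with hypothetical parameter names.

```python
def meets_preset_condition(target_data_amount, total_data_amount,
                           preset_ratio, second_preset_data_amount):
    """True if the amount of inaccessible target data is large enough to matter."""
    return (target_data_amount / total_data_amount > preset_ratio
            or target_data_amount > second_preset_data_amount)
```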
- the device further includes:
- the update module is configured to, when the failure state is the first failure state, if the at least one storage node is not completely repaired within the first preset time period, update the failure state from the first failure state to a second failure state, where the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- the second determining unit is configured to perform step 602 above.
- the second determining unit is configured to determine the failure scenario as the first failure scenario when the at least one storage node fails within the target time period; otherwise, determine the failure scenario as A second failure scenario, where the first failure scenario is used to indicate that the at least one storage node fails at the same time, and the second failure scenario is used to indicate that the at least one storage node fails at different times.
- the second determining unit includes:
- the first determining subunit is configured to execute the above step 604;
- the second determining subunit is configured to perform step 605 above.
- the first determining subunit is used for:
- when the failure type of each storage node in the at least one storage node is the first failure type, determine the failure state as the first failure state, where the first failure type is used to indicate that the failure of a storage node can be repaired within the second preset time period, and the first failure state is used to indicate that the at least one storage node can be completely repaired within the first preset time period;
- when the failure type of a target number of storage nodes in the at least one storage node is the second failure type, if the target number is less than or equal to the redundancy of the distributed storage system, determine the failure state as the first failure state; otherwise, determine the failure state as the second failure state, where the second failure type is used to indicate that the failure of a storage node cannot be repaired within the second preset time period, and the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- the second determining subunit is used for:
- when the failure type of the first storage node is the first failure type, determine the failure state as the first failure state, where the first failure type is used to indicate that the failure of a storage node can be repaired within the second preset time period, and the first failure state is used to indicate that the at least one storage node can be completely repaired within the first preset time period;
- when the failure type of the first storage node is the second failure type, determine the failure state as the second failure state, where the second failure type is used to indicate that the failure of a storage node cannot be repaired within the second preset time period, and the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- the determining module 701 is further configured to perform step 603.
- the sending module 702 is further configured to perform step 509 above.
- FIG. 8 is a schematic structural diagram of a fault handling device provided by an embodiment of the present invention.
- the distributed storage system includes multiple storage nodes; the device includes:
- the sending module 801 is configured to execute the above step 608;
- the receiving module 802 is configured to receive a response returned by the target storage node, where the response includes the failure state of the distributed storage system, and the failure state is used to indicate whether at least one failed storage node can be completely repaired within the first preset time period.
- the device further includes:
- the processing module is used to perform fault processing based on the fault status contained in the response.
- the fault identifier of the fault state includes either a first fault identifier or a second fault identifier, wherein the first fault identifier is used to indicate the first fault state, and the second fault identifier is used for Indicates a second failure state, the first failure state is used to indicate that at least one storage node can be repaired within a first preset time period, and the second failure state is used to indicate that the at least one storage node cannot be in the All are repaired within the first preset time period, and the storage node is a failed storage node in the distributed storage system.
- the processing module is used for:
- when the access request is sent by the target client in the distributed storage system based on a target virtual machine and the target virtual machine is a VMWare virtual machine, if the failure state is the first failure state, not responding to the access request to the target virtual machine, where the first failure state is used to indicate that at least one storage node can be completely repaired within a first preset time period;
- when the access request is sent by the target client based on the target virtual machine and the target virtual machine is a VMWare virtual machine, if the failure state is the second failure state, returning a storage abnormality message to the target virtual machine, where the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- the processing module is used for:
- when the access request is sent by the target client in the distributed storage system based on a target virtual machine and the target virtual machine is not a VMWare virtual machine, if the failure state is the first failure state, sending a retry request to the target virtual machine, where the retry request is used to instruct the target virtual machine to reissue the access request, and the first failure state is used to indicate that at least one storage node can be completely repaired within a first preset time period;
- when the access request is sent by the target client based on the target virtual machine and the target virtual machine is not a VMWare virtual machine, if the failure state is the second failure state, returning to the target virtual machine a target error recognizable by the target virtual machine, where the target error is used to indicate a storage medium failure, and the second failure state is used to indicate that the at least one storage node cannot be completely repaired within the first preset time period.
- the receiving module 802 is further configured to receive a target access request sent by a target client in the distributed storage system, where the target access request is used to instruct processing of first target data, and the first target data includes the target data;
- the sending module 801 is configured to send the access request to the target storage node in the distributed storage system based on the target access request.
- the receiving module 802 is configured to receive a repair completion response returned by the target storage node, where the repair completion response is used to indicate that there is no faulty device in the distributed storage system.
- the embodiment of the present invention also provides a distributed storage system, the distributed storage system includes a supervisory node and multiple storage nodes;
- the supervisory node is used for:
- determine the failure state of the distributed storage system according to at least one failed storage node among the multiple storage nodes, where the failure state is used to indicate whether the at least one failed storage node can be completely repaired within a first preset time period;
- send the failure state to each storage node of the multiple storage nodes;
- Each storage node of the plurality of storage nodes is configured to receive the failure state.
- the fault identifier of the fault state includes any one of a first fault identifier and a second fault identifier, the first fault identifier is used for indicating the first fault state, and the second fault identifier is used for Indicate a second failure state, the first failure state is used to indicate that the at least one storage node can be completely repaired within the first preset time period, and the second failure state is used to indicate the at least one storage node It cannot be completely repaired within the first preset time period.
- each storage node of the multiple storage nodes is further configured to, after receiving the failure identifier, if an access request is received again, suspend the access request and perform fault processing based on the received failure state.
- each device in the distributed storage system provided above may be the device in Embodiments 5 and 6.
- It should be noted that when the fault handling device provided in the above embodiments handles faults, the division of the above functional modules is used only as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
- In addition, the fault handling device provided in the above embodiments and the embodiments of the fault handling method for a distributed storage system belong to the same concept. For the specific implementation process, refer to the method embodiments, which will not be repeated here.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
Abstract
本发明公开了一种分布式存储系统故障处理方法、装置、计算机设备、存储介质及存储系统,属于故障处理技术领域。本方法根据所述多个存储节点中的至少一个出现故障的存储节点,确定所述分布式存储系统的故障状态,从而无需当所有存储节点均故障时,才确定分布式存储系统的故障状态,当确定完故障状态后,可以立即向分布式存储系统内的每个存储节点发送故障状态,以便每个存储节点根据确定的故障状态进行故障处理,从而可以降低分布式存储系统恢复正常所用的时间。
Description
本发明涉及故障处理技术领域,特别涉及一种分布式存储系统故障处理方法、装置、计算机设备、存储介质及分布式存储系统。
随着大数据技术的发展,为了存储更多的数据以及防止数据丢失,分布式存储系统越来越受到企业的青睐,分布式存储系统中的存储节点随着使用时间的增长,不可避免的会出现故障,当分布式存储系统中的存储节点均出现故障时,为了保证故障的存储节点不影响正常的业务,为用户提供业务的计算节点可以对分布式存储系统中的故障,进行故障处理。
其中,故障处理可以是以下过程:在分布式存储系统中,客户端向多个存储节点发送小型计算机系统接口(small computer system interface,SCSI)请求;当分布式存储系统中的所有存储节点均出现故障时,若存储节点出现的是短时间内可修复的故障,则存储节点不会响应SCSI请求,当客户端未获取任何存储节点的响应时,则客户端将该分布式存储系统的故障状态确定为全部路径异常(all path down,APD)状态,APD状态为VMWare虚拟机定义的一种存储节点的故障状态,用于表示后端存储节点的所有路径均无法响应主机请求,客户端悬挂未处理的SCSI请求,等待技术人员修复分布式存储系统内的故障;当存储节点出现短时间内不可修复的故障时,存储节点向客户端返回存储异常的消息,则客户端接收到该存储异常的消息后,将该分布式存储系统的故障状态确定为永久设备丢失(permanent device lost,PDL)状态,其中,PDL状态为VMWare虚拟机定义的一种存储节点的故障状态,用于表示后端存储节点长期或永久故障,由于存储节点长时间出现故障会导致分布式存储系统内的文件系统损坏,当客户端确定的故障状态为PDL状态时,该客户端将文件系统下电,并等待技术人员修复故障。
在上述故障处理过程中,只有当分布式存储系统中所有存储节点均出现故障时,客户端才会执行上述故障处理的过程,但是,当分布式存储系统中的部分存储节点出现故障时,客户端不会执行上述故障处理的过程。然而,在分布式存储系统中部分存储节点出现故障是一种比较常见的现象,一旦部分存储节点出现故障,若不对分布式存储系统中的存储节点进行故障诊断,则客户端并无法确定分布式存储系统中的存储节点是否有故障,从而技术人员不能及时地从客户端获知存储节点出现故障的消息,也就不能立即对出现故障的存储节点进行修复,从而延长了分布式存储系统恢复正常所用的时间。
发明内容
本发明实施例提供了一种分布式存储系统故障处理方法、装置、计算机设备、存储介质及存储系统,能够降低分布式存储系统恢复正常所用的时间。该技术方案如下:
第一方面,提供了一种分布式存储系统故障处理方法,所述分布式存储系统包含多个存储节点;该方法包括:
根据所述多个存储节点中的至少一个出现故障的存储节点,确定所述分布式存储系统的故障状态;所述故障状态用于指示所述至少一个出现故障的存储节点能否在第一预设时长内全部被修复;
向所述多个存储节点中每一个存储节点发送所述故障状态。
基于上述实现方式,根据所述多个存储节点中的至少一个出现故障的存储节点,确定所述分布式存储系统的故障状态,从而无需当所有存储节点均故障时,才确定分布式存储系统的故障状态,当确定完故障状态后,可以立即向分布式存储系统内的每个存储节点发送故障状态,以便每个存储节点根据确定的故障状态进行故障处理,从而可以降低分布式存储系统恢复正常所用的时间。
在一种可能实现方式中,所述方法还包括:
根据所述分布式存储系统的故障状态,进行故障处理。
在一种可能实现方式中,所述根据所述多个存储节点中的至少一个出现故障的存储节点,确定所述分布式存储系统的故障状态包括:
确定所述分布式存储系统内出现故障的至少一个存储节点以及所述至少一个存储节点内无法被访问的目标数据;
根据所述至少一个存储节点以及所述目标数据,确定所述故障状态。
在一种可能实现方式中,所述根据所述至少一个存储节点以及所述目标数据,确定所述故障状态包括:
当所述至少一个存储节点的数目大于所述分布式存储系统的冗余度,且所述目标数据的数据量符合预设条件时,将所述故障状态确定为第一故障状态,所述第一故障状态用于指示所述至少一个存储节点能在所述第一预设时长内全部被修复。
基于上述可能的实现方式,在第一故障状态下,目标设备不会下电文件系统,一旦在第一预设时间内该至少一个存储节点均能被修复,那么也就可以避免下电文件系统,从而可以减少修复文件系统的时间,使得分布式存储系统可以尽快恢复业务,以保证服务质量。
在一种可能实现方式中,所述根据所述至少一个存储节点以及所述目标数据,确定所述故障状态包括:
当所述至少一个存储节点的数目大于所述分布式存储系统的冗余度,且所述目标数据的数据量符合预设条件时,根据所述分布式存储系统的故障场景,确定所述故障状态,所述故障场景用于指示所述至少一个存储节点是否同时出现故障。
在一种可能实现方式中,所述预设条件包括下述任一项:
所述目标数据的数据量与第一预设数据量之间的比值大于预设比值,所述第一预设数据量为所述分布式存储系统存储的所有数据的总数据量;
所述目标数据的数据量大于第二预设数据量。
在一种可能实现方式中,所述根据所述至少一个存储节点以及所述目标数据,确定所述故障状态之后,所述方法还包括:
当所述故障状态为所述第一故障状态时,若在所述第一预设时长内所述至少一个存储 节点未全部被修复,则将所述故障状态由第一故障状态更新为第二故障状态,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
在一种可能实现方式中,所述根据所述分布式存储系统的故障场景,确定所述分布式存储系统的故障状态之前,所述方法还包括:
根据所述至少一个存储节点出现故障的时间,确定所述故障场景。
在一种可能实现方式中,所述根据所述至少一个存储节点出现故障的时间,确定所述故障场景包括:
当所述至少一个存储节点在目标时长内均出现故障时,将所述故障场景确定为第一故障场景,否则,将所述故障场景确定为第二故障场景,所述第一故障场景用于指示所述至少一个存储节点同时出现故障,所述第二故障场景用于指示所述至少一个存储节点出现故障的时间不同。
在一种可能实现方式中,所述根据所述分布式存储系统的故障场景,确定所述分布式存储系统的故障状态包括:
若所述故障场景为第一故障场景,根据所述至少一个存储节点中每一个存储节点的故障类型,确定所述故障状态,所述第一故障场景用于指示所述至少一个存储节点同时出现故障,所述故障类型用于指示一个存储节点的故障能否在第二预设时长内被修复;
若所述故障场景为第二故障场景,根据所述至少一个存储节点中最晚出现故障的第一存储节点的故障类型,确定所述故障状态,所述第二故障场景用于指示所述至少一个存储节点出现故障的时间不同。
在一种可能实现方式中,所述根据所述至少一个存储节点中每一个存储节点的故障类型,确定所述故障状态包括:
当所述至少一个存储节点中每一个存储节点的故障类型均为第一故障类型时,将所述故障状态确定为第一故障状态,所述第一故障类型用于指示一个存储节点的故障能在所述第二预设时长内被修复,所述第一故障状态用于指示所述至少一个存储节点能在所述第一预设时长内全部被修复;
当所述至少一个存储节点中目标个数的存储节点的故障类型为第二故障类型时,若所述目标个数小于或等于所述分布式存储系统的冗余度,将所述故障状态确定为所述第一故障状态,否则,将所述故障状态确定为第二故障状态,所述第二故障类型用于指示一个存储节点的故障不能在所述第二预设时长内被修复,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
在一种可能实现方式中,所述根据所述至少一个存储节点中最晚出现故障的第一存储节点的故障类型,确定所述故障状态包括:
当所述第一存储节点的故障类型为第一故障类型时,将所述故障状态确定为第一故障状态,所述第一故障类型用于指示一个存储节点的故障能在所述第二预设时长内被修复,所述第一故障状态用于指示所述至少一个存储节点能在所述第一预设时长内全部被修复;
当所述第一存储节点的故障类型为第二故障类型时,则将所述故障状态确定为第二故障状态,所述第二故障类型用于指示一个存储节点的故障不能在所述第二预设时长内被修复,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被 修复。
在一种可能实现方式中,所述根据所述分布式存储系统的故障场景,确定所述分布式存储系统的故障状态之前,所述方法还包括:
对于所述至少一个存储节点中的任一存储节点,当所述任一存储节点出现预设的网络故障、预设的异常掉电故障、预设的误操作故障、预设的硬件故障或预设的软件故障时,将所述任一存储节点的故障类型确定为第一故障类型,否则,将所述任一存储节点的故障类型确定为第二故障类型,所述第一故障类型用于指示一个存储节点的故障能在所述第二预设时长内被修复,所述第二故障类型用于指示一个存储节点的故障不能在所述第二预设时长内被修复。
在一种可能实现方式中,所述根据所述分布式存储系统的故障状态,进行故障处理之后,所述方法还包括:
当所述至少一个存储节点修复完成时,向所述分布式存储系统内的各个设备发送修复完成响应,所述修复完成响应用于指示所述分布式存储系统内没有故障设备。
第二方面,提供了一种分布式存储系统故障处理方法,所述分布式存储系统包含多个存储节点;该方法包括:
向所述分布式存储系统中的目标存储节点发送访问请求;
接收所述目标存储节点返回的响应;所述响应中包含所述分布式存储系统的故障状态;所述故障状态用于指示至少一个出现故障的存储节点能否在第一预设时长内全部被修复。
在一种可能实现方式中,所述接收所述目标存储节点返回的响应之后,所述方法还包括:
基于所述响应中包含的故障状态,进行故障处理。
在一种可能实现方式中,所述故障状态的故障标识包括第一故障标识或第二故障标识中的任一个,其中,所述第一故障标识用于指示第一故障状态,所述第二故障标识用于指示第二故障状态,所述第一故障状态用于指示至少一个存储节点能在第一预设时长内全部被修复,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复,所述存储节点为所述分布式存储系统中出现故障的存储节点。
在一种可能实现方式中,所述基于所述响应中包含的故障状态,进行故障处理包括:
当所述访问请求由所述分布式存储系统中的目标客户端基于目标虚拟机发送,且所述目标虚拟机是VMWare虚拟机时,若所述故障状态为所述第一故障状态,不向所述目标虚拟机响应所述访问请求,所述第一故障状态用于指示至少一个存储节点能在第一预设时长内全部被修复;
当所述访问请求由所述目标客户端基于目标虚拟机发送,且所述目标虚拟机是VMWare虚拟机时,若所述故障状态为所述第二故障状态时,向所述目标虚拟机返回存储异常的消息,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
在一种可能实现方式中,所述基于所述响应中包含的故障状态,进行故障处理包括:
当所述访问请求由所述分布式存储系统中的目标客户端基于目标虚拟机发送,且所述 目标虚拟机不是VMWare虚拟机时,若所述故障状态为所述第一故障状态,向所述目标虚拟机发送重试请求,所述重试请求用于指示重新下发所述访问请求,所述第一故障状态用于指示至少一个存储节点能在第一预设时长内全部被修复;
当所述访问请求由所述目标客户端基于目标虚拟机发送,且所述目标虚拟机不是VMWare虚拟机时,若所述故障状态为所述第二故障状态,向所述目标虚拟机返回所述目标虚拟机可识别的目标错误,所述目标错误用于指示存储介质故障,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
基于上述可能的实现方式,提供了不同的故障处理方式,从而使得该本发明实施例所提供的故障处理方式更具有普适性。
在一种可能实现方式中,所述向分布式存储系统内的任一存储节点发送访问请求之前,所述方法还包括:
接收所述分布式存储系统中的目标客户端发送的目标访问请求;
所述向分布式存储系统中的目标存储节点发送访问请求包括:
基于所述目标访问请求,向分布式存储系统中的目标存储节点发送所述访问请求。
在一种可能实现方式中,所述接收所述目标存储节点返回的响应之后,所述方法还包括:
接收目标存储节点返回的修复完成响应,所述修复完成响应用于指示所述分布式存储系统内没有故障设备。
第三方面,提供了一种分布式存储系统,所述分布式存储系统包括监管节点和多个存储节点;
所述监管节点用于:
根据所述多个存储节点中的至少一个出现故障的存储节点确定所述分布式存储系统的故障状态;所述故障状态用于指示所述至少一个出现故障的存储节点能否在第一预设时长内全部被修复;
向所述多个存储节点中每一个存储节点发送所述故障状态;
所述多个存储节点中的每一个存储节点,用于接收所述故障状态。
在一种可能的实现方式中,所述故障状态的故障标识包括第一故障标识和第二故障标识中的任一个,所述第一故障标识用于指示所述第一故障状态,所述第二故障标识用于指示第二故障状态,所述第一故障状态用于指示所述至少一个存储节点能在所述第一预设时长内全部被修复,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
在一种可能的实现方式中,所述多个存储节点中的每一个存储节点,还用于当接收到所述故障状态后,若再接收到所述访问请求,悬挂所述访问请求,基于接收的故障状态,进行故障处理。
第四方面,提供了一种故障处理装置,用于执行上述分布式存储系统故障处理方法。具体地,该故障处理装置包括用于执行上述第一方面或上述第一方面的任一种可选方式提供的故障处理方法的功能模块。
第五方面,提供了一种故障处理装置,用于执行上述分布式存储系统故障处理方法。 具体地,该故障处理装置包括用于执行上述第二方面或上述第二方面的任一种可选方式提供的故障处理方法的功能模块。
第六方面,提供一种计算机设备,该计算机设备包括处理器和存储器,该存储器中存储有至少一条指令,该指令由该处理器加载并执行以实现如上述分布式存储系统故障处理方法所执行的操作。
第七方面,提供一种存储介质,该存储介质中存储有至少一条指令,该指令由处理器加载并执行以实现如上述分布式存储系统故障处理方法所执行的操作。
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例提供的一种分布式存储系统的示意图;
图2是本发明实施例提供的一种分布式存储系统的网络环境的示意图;
图3是本发明实施例提供的一种分布式存储系统内各个设备之间的交互示意图
图4是本发明实施例提供的一种计算机设备的结构示意图;
图5是本发明实施例提供的一种分布式存储系统故障处理方法的流程图;
图6是本发明实施例提供的一种分布式存储系统故障处理方法的流程图;
图7是本发明实施例提供的一种故障处理装置的结构示意图;
图8是本发明实施例提供的一种故障处理装置的结构示意图。
为使本发明的目的、技术方案和优点更加清楚,下面将结合附图对本发明实施方式作进一步地详细描述。
图1是本发明实施例提供的一种分布式存储系统的示意图,参见图1,该分布式存储系统包括至少一个客户端101、多个存储节点102以及监管节点103。其中,客户端101用于为用户提供数据存储以及数据读取的业务,也即是,客户端101可以将用户上传的数据存储在存储节点102中,也可以从存储节点102中读取数据。
存储节点102,用于存储客户端101写入的数据,还用于向客户端101返回数据,返回的数据可以是客户端101请求读取的数据,还可以向客户端101返回监管节点103下发的分布式存储系统的故障状态,以便客户端101根据该分布式存储系统的故障状态,处理出现的故障。
监管节点103,用于监控分布式存储系统中各个存储节点102是否出现故障,当分布式存储系统中出现故障的存储节点的数目高于该分布式存储系统的冗余度时,可能会影响到业务的正常运行,因此,该监管节点103可以根据分布式存储系统中出现故障的存储节点,确定该分布式存储系统的故障状态,并将确定的故障状态下发至所有的存储节点102,存储节点102能够告知客户端101该分布式存储系统的故障状态,从而可以使得客户端101 或存储节点102可以根据该分布式存储系统的故障状态,进行故障处理。
各个存储节点102的基板管理控制器(baseboard management controller,BMC)可以实时监控各自的存储节点是否出现故障,当该至少一个存储节点102中的任一存储节点的BMC监控到该任一存储节点出现故障时,该BMC可以存储该任一存储节点出现故障的原因以及出现故障的时间,还可以将该任一存储节点出现故障的原因以及出现故障的时间发送给监管节点103,以便监管节点103可以获知该任一存储节点是否出现故障。
当存储节点的BMC不将存储节点出现故障的原因以及出现故障的时间发送给监管节点103时,监管节点103可以通过访问各个存储节点102,来获知各个存储节点102是否存在故障,若存在故障,则可以从故障的存储节点获取出现故障的原因以及出现故障的时间。
无论监管节点103是接收故障的存储节点发送的出现故障的原因以及出现故障的时间,还是主动地从故障的存储节点获取发送的出现故障的原因以及出现故障的时间,该监管节点103均可以对故障的存储节点出现故障的原因以及出现故障的时间进行存储,以便后续可以根据故障的存储节点发送的出现故障的原因以及出现故障的时间,确定故障类型以及故障场景,其中,对故障类型的描述参见下文中的步骤603,对故障场景的描述参见下文中的步骤602。
该监管节点103可以对故障的存储节点出现故障的原因以及出现故障的时间进行存储可以包括:将故障的存储节点发送的出现故障的原因以及出现故障的时间存储在故障表中,该故障表可以存储编号、存储节点的标识、故障时间以及故障原因,其中,编号用于指示第几个出现故障的存储节点,一个存储节点的标识用于唯一指示一个存储节点,该标识可以是存储节点的互联网协议地址(internet protocol address,IP),还可以存储节点的媒体访问控制地址(media access control address,MAC),还可以是该存储节点在该分布式存储系统内的编号,本发明实施例对该存储节点的标识不做具体限定。另外,故障时间为存储节点出现故障的时间,故障原因为存储节点出现故障的原因。
例如表1所示的故障表,从表1可知,当前分布式存储系统内出现故障的存储节点有2个,分别为标识为X的存储节点以及标识为Y的存储节点,其中,标识为X的存储节点出现故障的时间为A,出现故障的原因为D;标识为Y的存储节点出现故障的时间为B,出现故障的原因为C。
表1
| 编号 | 存储节点的标识 | 故障时间 | 故障原因 |
| 01 | X | A | D |
| 02 | Y | C | E |
需要说明的是,当故障表内的记录的存储节点被修复后,该监管节点可以在该故障表中,删除被修复完成的存储节点的相关信息,从而监管节点可以根据故障表中最后一个存储节点的编号,确定分布式存储系统内出现故障的存储节点的数目。
当分布式存储系统内的存储节点出现故障时,可能造成分布式存储系统内的文件系统的损坏,当文件系统损坏时,文件系统内的元数据可能出现错误,由于元数据是指用来描 述一个文件的特征的系统数据,例如访问权限、文件拥有者以及文件内的数据块的分布信息等,当元数据出现错误时,元数据所指示的文件中的数据块可能无法被访问。
在一些实施中,当客户端访问任一存储节点内的数据块失败时,该客户端可以将该数据块的数据量以及唯一标识该数据块的数据标识发送给监管节点,该监管节点接收到该数据量以及数据标识后,对接收到的数据量以及数据标识进行存储,具体的可以存储在数据表中,该数据表可以用于存储数据总量、数据标识以及与每个数据标识对应的数据量,其中,数据总量为分布式存储系统内当前不可被访问的所有数据的数据量。
例如表2所示的数据表,从表2可知,分布式存储系统中当前不能被访问的数据为数据标识M所指示的数据块内的数据以及数据标识N所指示的数据块内的数据,其中,当前不能被访问的数据的数据总量为30千字节(kilobyte,KB),当前不能被访问的数据包括数据标识M所指示的数据块内10KB的数据以及数据标识N所指示的数据块内20KB的数据。
表2
需要说明的是,当客户端再一次访问不能被访问的数据块时,若该客户端可以访问成功,则该客户端向监管节点发送携带该数据块的数据标识的访问数据成功响应,该访问数据成功响应用于指示该数据块内的数据可以被访问,当接收到该访问数据成功响应后,在该数据表中,该监管节点可以删除该数据标识对应的数据量,并更新数据表内的数据总量。例如,该访问数据成功响应携带数据标识M,则该监管节点删除数据表内与数据标识M相关的信息,并将数据总量更新为20KB。需要说明的是,数据表中还可以存储有与数据标识对应的存储节点的标识,以指示哪个存储节点内的那个数据块不能被访问。
在一些实施例中,由于分布式存储系统负责的业务量比较大,客户端101和存储节点102的数目可能比较多,为了方便客户端101与存储节点102之间的数据传输,客户端101所在的应用层可以设置有至少一个业务交换机,以客户端101与存储节点102之间的交互。为了便于存储节点102之间的数据传输,可以在存储节点102所在的存储层设置至少一个存储交换机,以实现各个存储节点102之间的交互。为了便于监管节点103与存储节点102之间的数据传输,可以设置有监管交换机,以实现监管节点103与存储节点102之间的交互。
从以上的描述可知,在该分布式存储系统中,除了需要提供业务服务以外,还需要提供监控服务,对于不同的服务可以通过不同的网络来实现。为了实现客户端、存储节点以及监管节点之间的网络连接,该客户端、存储节点以及监管节点中均可以安装有至少一个网口,该至少一个网口可以用于连接不同的网络,不同的网络可以传输不同服务的数据,该至少一个网口可以分别是连接业务网络的业务网口、连接监管网络的监管网口,以及连接BMC网络的BMC网口。
为了说明分布式存储系统中的网络环境,参见图2所示的本发明实施例提供的一种分布式存储系统的网络环境的示意图,在该分布式存储系统中网络可以包括业务网络、监管 网络以及BMC网络。
其中,业务网络是存储节点之间用于心跳、数据同步以及镜像时所使用的网络,例如,当将存储节点1中存储的数据块1同步至存储节点2时,存储节点1可以通过业务网口,在该业务网络中,向存储节点2发送数据块1,那么存储节点2通过自己的业务网口,可以接收到数据块1,并将数据块1存储在存储节点2内。
监管网络是监控存储节点是否出现故障以及进行信息查询时所使用的网络,在监管网络中,可以传输监管节点下发的分布式存储系统的故障状态,还可以查询出现故障的存储节点。在一些可能的实施方式中,监管节点可以通过监管网口向存储节点的监管网口,在监管网络中,发送分布式存储系统的故障状态,存储节点可以通过自己的监管网口从监管网络中接收监管节点的故障状态,当存储节点再接收到客户端的业务请求(也即是下文中SCSI请求)后,可以不处理接收的业务请求,直接向客户端返回下发的故障状态,以便客户端可以根据故障状态,进行相应的故障处理。
BMC网络是管理BMC的网络,监管节点通过访问该BMC网络的BMC网口,可以监控BMC的状态,根据监控的BMC的状态,可以确定存储节点是否有故障。需要说明的是,BMC网络为可选的网络,在一些实施方式中,可以不通过BMC来监控存储节点的是否有故障,而是可以通过其他方式,来监控存储节点是否有故障,因此,分布式存储系统内还可以不设置该BMC网络,直接通过监管网络,来实现监控。
当分布式存储系统内的存储节点均和业务网络、监管网络以及BMC网络连接时,监管节点则可以从三网中实时接收存储节点的状态信息,例如是否故障的信息。
为了进一步说明客户端、存储节点以及监管节点之间交互过程,参见图3所示的本发明实施例提供的一种分布式存储系统内各个设备之间的交互示意图,在图3中,一个存储节点中可以安装有至少一个对象存储(object storage device,OSD)进程、SCSI处理进程以及节点监控服务(node monitor service,NMS)代理进程。
其中,一个OSD进程可以对应存储节点中用于存储数据的一个或多个存储介质,该存储介质可以是硬盘,OSD进程用于管理对于一个或多个存储介质的访问请求,访问请求用于指示对待处理的数据进行处理,其中,对待处理的数据进行处理可以包括读取所述至少一个或多个存储介质内存储的数据块,待处理的数据块包括待处理的数据,对待处理的数据进行处理可以还包括将待处理的数据写入所述至少一个或多个存储介质,当访问请求可以使用SCSI发送时,该访问请求可以视为SCSI请求。
SCSI处理进程用于从业务网络中获取客户端发送的SCSI请求,并转换和分解SCSI请求,得到多个SCSI子请求,并将多个SCSI子请求下发到对应的OSD进程。例如,SCSI请求携带待读取的数据的逻辑区块地址(logical block address,LBA)为100-200,由于LBA100-150所指示的存储位置在存储节点1的存储介质1中,LBA 151-200所指示的存储位置在存储节点1的存储介质2中,则SCSI处理进程可以将SCSI请求转换和分解为2个SCSI子请求,其中,SCSI子请求1用于指示请求读取存储介质1中LBA 100-150处存储的数据,SCSI子请求2用于指示请求读取存储介质2中LBA 151-200处存储的数据,从而SCSI处理进程可以将SCSI子请求1发送至于存储介质1对应的OSD进程,将SCSI子请求2发送至于存储介质2对应的OSD进程。
一个NMS代理进程用于接收监管节点下发的分布式存储系统的故障状态,并向一个存储节点的所有OSD进程下发接收的故障状态。例如,客户端通过监管网络向存储节点发送分布式存储系统的故障状态,存储节点内的NMS代理进程从监管网络中获取监管节点发送的故障状态,并将获取的故障状态下发至该存储节点内的各个OSD进程中,当任一OSD进程接收到故障状态后,若再接收到SCSI处理进程发送的SCSI子请求或SCSI请求,则直接向SCSI处理进程发送接收的故障状态,以便安装SCSI处理进程的设备根据接收的故障状态,进行故障处理。
需要说明的是,在一些实施方式中,SCSI处理进程未被安装在存储节点内,而是安装在客户端内,本发明实施例对安装该SCSI处理进程的设备不做具体限定。例如,客户端的SCSI处理进程中的SCSI请求携带待读取的数据的LBA为0-100,由于LBA 0-50所指示的存储位置在存储节点1中,LBA 51-100所指示的存储位置在存储节点2中,则SCSI处理进程可以将SCSI请求转换和分解为2个SCSI子请求,其中,SCSI子请求1用于指示请求读取存储节点1中LBA 0-50处存储的数据,SCSI子请求2用于指示请求读取存储节点2中LBA 51-100处存储的数据,从而SCSI处理进程可以将SCSI子请求1发送至于存储节点1内的OSD进程,将SCSI子请求2发送至于存储节点2内的OSD进程。
客户端、存储节点以及监管节点均可以是计算机设备,为了进一步说明,计算机设备的硬件结构,参见图4所示的本发明实施例提供的一种计算机设备的结构示意图,计算机设备400包括可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器(central processing units,CPU)401和一个或一个以上的存储器402,其中,该存储器402中存储有至少一条指令,该至少一条指令由该处理器401加载并执行以实现下的故障处理方法实施例提供的方法。当然,该计算机设备400还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该计算机设备400还可以包括其他用于实现设备功能的部件,在此不做赘述。
在示例性实施例中,还提供了一种计算机可读存储介质,例如包括指令的存储器,上述指令可由终端中的处理器执行以完成下述实施例中的故障处理方法。例如,该计算机可读存储介质可以是只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、只读光盘(compact disc read-only memory,CD-ROM)、磁带、软盘和光数据存储节点等。
在本发明实施例中,监管节点可以根据出现故障的存储节点,来确定分布式存储系统的故障状态,监管节点并将确定的故障状态下发至分布式存储系统内所有的存储节点;客户端在接收到用户的读请求或者写请求后,可以向分布式存储系统内的存储节点发送SCSI请求,用于完成用户的读请求或者写请求;当任一存储节点接收到SCSI请求后,任一存储节点暂不处理接收的SCSI请求,并向客户端返回接收的故障状态,使得客户端可以基于任一存储节点返回的故障状态,进行故障处理。而在一些实施例中,任一存储节点也可以基于故障状态进行故障处理,在一种可能的实现方式中,当任一存储节点接收到监管节点发送的故障状态时,若再接收到客户端发送的SCSI请求,则任一存储节点可以根据故障状态,进行故障处理。
为了进一步说明上述过程,参见如图5所示的本发明实施例提供的一种分布式存储系 统故障处理方法的流程图,该方法具体包括:
501、监管节点确定分布式存储系统内出现故障的至少一个存储节点以及该至少一个存储节点无法被访问的目标数据。
该目标数据还可以是该分布式存储系统中存储的且不能被访问的数据,该目标数据还可以仅是该至少一个存储节点所存储的无法被访问的数据。本发明实施例以该目标数据是该至少一个存储节点内不能被访问的数据为例进行说明。
该监管节点可以通过查询的方式,确定该至少一个存储节点以及该目标数据,在一种可能的实现方式中,该监管节点可以每经过第八预设时长,查询故障表以及数据表,从该故障表中确定出现故障的存储节点,从数据表中确定无法被访问的数据。该第八预设时长可以是10分钟或者1小时,本发明实施例对该第八预设时长不做具体限定。
需要说明的是,在前文中对从该故障表中确定出现故障的存储节点的方式以及从数据表中确定无法被访问的数据的方式有描述,在此不做赘述。
当该监管节点执行完本步骤501后,还可以根据至少一个存储节点以及所述目标数据,确定故障状态,那么,确定分布式存储系统内出现故障的至少一个存储节点以及该至少一个存储节点无法被访问的目标数据;根据至少一个存储节点以及所述目标数据,确定故障状态的过程,也即是,根据所述多个存储节点中的至少一个出现故障的存储节点,确定所述分布式存储系统的故障状态的过程。其中,根据所述多个存储节点中的至少一个出现故障的存储节点,确定所述分布式存储系统的故障状态的过程可以通过下述步骤502所示的过程的来实现。
502、当该至少一个存储节点的数目大于该分布式存储系统的冗余度,且该目标数据的数据量符合预设条件时,该监管节点则将该分布式存储系统的故障状态确定为第一故障状态。
该冗余度为分布式存储系统内存储的数据的冗余度,也即是该分布式存储系统内存储的数据的副本数目,分布式存储系统的故障状态用于指示该至少一个存储节点能否在第一预设时长内全部被修复,故障状态可以包括第一故障状态以及第二故障状态中的任一个,其中,第二故障状态用于指示该至少一个存储节点不能在该第一预设时长内全部被修复。也即是,当该至少一个存储节点可以在第一预设时长内被修复时,可以认为分布式存储系统内的故障可以在短时间内修复,分布式存储系统处于第一故障状态,该第一故障状态也即是节点短时故障(transient node down,TND)状态。当该至少一个存储节点可以在第一预设时长内被修复时,则认为分布式存储系统内的故障不可以在短时间内修复,分布式存储系统处于第二故障状态,第二故障状态也即是节点长期故障(permanent node down,PND)状态。第一预设时长可以是20分钟或者2两个小时,本发明实施例对该第一预设时长不做具体限定。
当分布式存储系统内存储的数据较多且存储节点的数目也较多时,若分布式存储系统内出现故障的存储节点的数目小于该分布式存储系统的冗余度,该分布式存储系统内的未故障的存储节点可以根据可以被访问的数据,重构故障的存储节点内不能被访问的数据,因此,当分布式存储系统内出现故障的存储节点的数目小于该分布式存储系统的冗余度时,不会影响到该分布式存储系统的正常业务,那么,也就无需修复故障的存储节点。但是, 若出现故障的存储节点的数目大于该分布式存储系统的冗余度,该分布式存储系统内的未出现故障的存储节点无法根据可以被访问的数据,重构故障的存储节点内不能被访问的数据,从而就可能会影响到分布式存储系统内正常的业务。并且考虑到分布式存储系统内存储的数据能否被访问也是影响业务的重要因素,当分布式存储系统内无法被访问的目标数据过多时,对分布式存储系统所提供的业务的影响也就比较大,可能会影响到业务的正常运行,因此,该监管节点可以先确定分布式存储系统的故障状态,以便可以快速根据故障状态,进行故障处理,来最大限度的降低对业务的影响程度。而当该目标数据的较少时,对业务影响也就相对较小,可能不会影响到业务的正常运行,为了可以为用户持续提供业务,该监管节点可以暂不进行故障处理。
监管节点可以通过预设条件,确定目标数据的数据量是否能够影响到业务的正常运行,该预设条件可以包括下述任一项:目标数据的数据量与第一预设数据量之间的比值大于预设比值,该第一预设数据量为该分布式存储系统存储的所有数据的总数据量;该目标数据的数据量大于第二预设数据量。当目标数据的数据量与第一预设数据量之间的比值大于预设比值时,或者当该目标数据的数据量大于第二预设数据量时,说明目标数据的数据量比较大,也就可能会影响到业务的正常运行,从而当目标数据的数据量符合预设条件时,该监管节点可以先将分布式存储系统的故障状态设置为第一故障状态,以便可以快速根据故障状态,进行故障处理,以降低对业务的影响程度。若超过第一预设时间,该至少一个存储节点未全部修复完成,则可以再将该故障状态更新为第二故障状态。需要说明的是,该预设比值可以是0.4、0.5或者是0.6,本发明实施例对该预设比值、第一预设数据量以及第二预设数据量不做具体限定。
为了能够将该至少一个存储节点的数目与该分布式存储系统的冗余度相比较,该监管节点可以查询该监管节点所存储的故障表,从故障表中确定该至少一个存储节点的数目。为了能够确定目标数据的数据量是否符合预设条件,该监管节点可以查询所存储的数据表,从数据表中确定该分布式存储系统内目前无法访问的目标数据的数据量。例如,监管节点通过查询表2可以确定目标数据包括数据标识M所指示的数据块内10KB的数据以及数据标识N所指示的数据块内20KB的数据。
需要说明的是,在前文中介绍故障表时,对从故障表中确定在该分布式存储系统内出现故障的存储节点的数目的过程进行了叙述,在此,本发明实施例对从故障表中确定在该分布式存储系统内出现故障的存储节点的数目的过程不做赘述。
需要说明的是,本步骤502所示的过程也即是根据该至少一个存储节点以及所述目标数据,确定该分布式存储系统的故障状态的过程。
503、该监管节点向该分布式存储系统内的所有存储节点发送用于指示第一故障状态的第一故障标识。
故障状态的故障标识包括第一故障标识和第二故障标识中的任一个,其中,第一故障标识用于指示第一故障状态,第二故障标识用于指示第二故障状态,该第一故障标识和第二故障标识可以不同,例如,第一故障标识可以是s,第二故障标识可以是t,本发明实施例对第一故障标识和第二故障标识的表示方式不做具体限定。
该监管节点可以向每个存储节点的NMS代理进程发送该第一故障标识,从而实现向所 有存储节点发送该第一故障标识,以告知所有存储节点分布式存储系统目前的故障状态为第一故障状态。
需要说明的是,本步骤503所示的过程也即是监管节点向分布式存储系统包含的多个存储节点中每一个存储节点发送所述故障状态的过程。
需要说明的是,在一些实施例中,监管节点在确定完故障状态后,还可以根据所述存储系统的故障状态,进行故障处理,本发明实施例对监管节点进行故障处理的过程不做具体限定。
504、该分布式存储系统中的目标存储节点接收该第一故障标识。
该目标存储节点为该分布式存储系统中的任一存储节点,该目标存储节点内的每个OSD进程可以从该目标存储节点的NMS代理进程获取该第一故障标识,从而该目标存储节点的每个OSD进程可以获取该第一故障标识。需要说明的是,该分布式存储系统内的每一个存储节点都可以执行本步骤504,对于故障的存储节点可能能够接收到该第一故障标识,也可能接收不到该第一故障标识。
505、目标设备向该目标存储节点发送访问请求,
该访问请求用于指示读取该目标存储节点所存储的数据或者向该目标存储节点写入数据。该目标设备为安装有SCSI处理进程的设备,可以是该目标客户端,还可以是目标存储节点,其中,目标客户端为该分布式存储系统中的任一客户端。本步骤506可以由目标设备内的SCSI处理进程来实现。
在本步骤之前505之前,分布式存储系统内的目标客户端可以向目标设备发送目标访问请求,该目标访问请求用于执行对第一目标数据进行处理,其中第一目标数据包括该访问请求所指示的数据,该目标访问请求可以携带目标存储地址,该目标存储地址可以是第一目标数据的存储地址。由目标客户端可以安装的目标虚拟机发送目标访问请求来。具体地,该目标虚拟机可以向该目标设备内的SCSI处理进程发送该目标访问请求。目标虚拟机向SCSI处理进程发送目标访问请求可以由用户的动作来触发,例如,当用户在客户端的界面内输入待读取的数据的存储地址,并点击读取按钮时,触发客户端内的目标虚拟机向SCSI处理进程发送目标访问请求,以请求读取到用户输入的存储地址处存储的数据。
然后,该目标设备接收分布式存储系统内的目标客户端发送的目标访问请求,本步骤506可以通下述方式来实现:目标设备基于该目标访问请求,向分布式存储系统内的目标存储节点发送访问请求。具体地,SCSI处理进程接收到该目标访问请求后,根据目标地址,对该目标访问请求进行转化和分解,得到多个访问请求,每个访问请求可以携带该目标地址中的部分地址,这部分地址可以是该目标存储节点内的任一OSD进程所管理的存储介质中的偏移地址,从而SCSI处理进程向目标存储节点内的OSD进程发送对应的访问请求,将访问请求转换访问请求的过程也即是前述的转换SCSI请求的过程。
506、当该分布式存储系统内的目标存储节点接收到该第一故障标识后,若该目标存储节点再接收到访问请求,该目标存储节点悬挂该访问请求,向目标设备发送该第一故障标识。
本步骤506可以由该目标存储节点内接收该访问请求的OSD进程来实现。当该目标存储节点接收到该第一故障标识后,说明该目标存储节点已经知道该分布式存储系统中有故 障的存储节点,且目前的故障状态为第一故障状态,由于目前分布式存储系统内的存储节点出现故障,则该目标存储节点可以将该访问请求悬挂,暂不处理该访问请求,等待故障的存储节点被自动修复或者手动修复。
为了使得目标设备也能获知分布式存储系统的故障状态,则该目标存储节点在接收到目标设备发送的访问请求时,该目标存储节点可以将该第一故障标识输出给目标设备。具体地,该目标存储节点内的任一OSD进程接收到目标设备SCSI处理进程发送的访问请求后,任一OSD进程向该SCSI处理进程发送该第一故障标识。当然,在一些实施例中,监管节点确定完该分布式存储系统的故障状态后,也可以将故障状态直接发送给目标设备,以便目标设备可以获取分布式存储系统的故障状态,也就无需通过存储节点向目标设备发送故障状态。
需要说明的是,本步骤506所示的过程也即是当该分布式存储系统中的目标存储节点接收到该故障标识后,若该目标存储节点再接收到访问请求时,输出故障标识的过程。
507、目标设备接收该目标存储节点基于该访问请求返回的第一故障标识。
本步骤507以由该目标设备内的SCSI处理进程来实现,本步骤507所示的过程也即是接收该目标存储节点基于该访问请求返回的故障标识的过程。当任一OSD进程向该SCSI处理进程发送该第一故障状态,该SCSI处理进程可以接收该第一故障标识。需要说明的是,本步骤507所示的过程,也即是,接收所述目标存储节点返回的响应;所述响应中包含所述分布式存储系统的故障状态;所述故障状态用于指示至少一个出现故障的存储节点能否在第一预设时长内全部被修复的过程,其中,所述响应中包含所述分布式存储系统的故障状态也即是故障标识。
508、目标设备基于接收的第一故障标识,进行故障处理。
本步骤508可以由目标设备内安装的SCSI处理进程来执行,当该SCSI处理进程接收到该第一故障标识后,该SCSI处理进程可以将基于第一故障标识,进行故障处理,以便目标客户端内能够进行相应地处理。
在一些实施例中,目标客户端中与该SCSI处理进程对接目标虚拟机可能是VMWare虚拟机,也可能不是VMWare虚拟机,由于不同的虚拟机目标设备进行故障处理的方式不同,因此,本步骤508可以通过下述方式1-2中的任一方式实现。
方式1、当该访问请求由该分布式存储系统中的目标客户端基于目标虚拟机发送,且该目标虚拟机是VMWare虚拟机时,若该故障状态为该第一故障状态,SCSI处理进程不向该目标虚拟机响应该访问请求。
当SCSI处理进程接收到第一故障标识时,说明分布式存储系统处于第一故障状态,对于VMWare虚拟机而言,第一故障状态也即是ADP状态。且由于SCSI处理进程接收的访问请求为该目标虚拟机发送的,那么,为了使得VMWare虚拟机可以感知到分布式存储系统处于自己定义的APD状态,SCSI处理进程不响应目标虚拟机,又因为分布式存储系统内所有的SCSI处理进程只要向OSD进程下发访问请求,均会接收到第一故障标识,而处理访问请求的每个SCSI处理进程均不会响应目标虚拟机,从而可以模拟分布式存储系统内所有链路无响应(DOWN),对于目标虚拟机而言,没有接收到SCSI处理进程的响应,目标虚拟机就会持续发送访问请求,从而即使分布式存储系统内的存储节点没有全部故障时, 也可以根据VMWare虚拟机定义的故障状态进行故障处理。
方式2、当该访问请求由该目标客户端基于目标虚拟机发送,且该目标虚拟机不是VMWare虚拟机时,若该故障状态为该第一故障状态,则SCSI处理进程向该目标虚拟机发送重试请求,该重试请求用于指示重新下发该访问请求。
重试请求中携带的检测关键字可以是Unit Attention(0x6)错误码,Unit Attention(0x6)错误码可以指示存储节点内的存储介质或链路状态发生了变化,也即是出现了故障,该目标虚拟机接收到该重试请求后,目标虚拟机向该SCSI处理进程重新下发一个访问请求,用于实现上述方式1中的处理方式。
对于不同的虚拟机,本发明实施例提供了不同的故障处理方式,从而使得该本发明实施例所提供的故障处理方式更具有普适性。
需要说明的是,本步骤508所示的过程也即是基于响应中包含的故障状态,进行故障处理的过程。
509、若在该第一预设时长内该至少一个存储节点全部修复,监管节点向该分布式存储系统内的各个设备发送修复完成响应,该修复完成响应用于指示该分布式存储系统内没有故障设备。
该各个设备包括存储节点以及客户端。当在该第一预设时长内该至少一个存储节点全部修复时,说明此时分布式存储系统内的没有故障设备,若该监管节点存储了该用于标识第一故障状态的第一故障标识,该监管节点可以删除该第一故障标识,并向该分布式存储系统内的各个设备发送修复完成响应,已告知各个设备该分布式存储系统内没有故障设备,可以正常工作,那么当各个设备接收到该修复完成响应后,删除之前接收的故障标识,并可以开始正常工作。需要说明的是,步骤509所示的过程也即是当该至少一个存储节点修复完成时,向该分布式存储系统内的各个设备发送修复完成响应的过程。
在现有技术中,当客户端未获取任何存储节点的响应时,直接认为分布式存储系统的故障状态为APD状态,当存储节点明确返回存储异常的消息时才会认为分布式存储系统的故障状态确定为PDL状态,若存储节点出现长期故障时,可能无法向客户端返回存储异常的消息,从而客户端可能将故障状态确定为APD状态,因此,现有技术中确定分布式存储系统的故障状态并不精确,且若将PDL状态误认为APD状态,将导致业务侧修复工作无法开展,最终反而延长故障修复时长。而在本发明提供的实施例中,分布式存储系统内的每个存储节点均知道分布式存储系统的故障状态,对于未出现故障的存储节点而言,可以基于SCSI请求向目标设备返回故障状态,从而目标设备可以明确的知道分布式存储系统的故障状态,进而可以提高目标设备确定故障状态的精度。
510、当该故障状态为第一故障状态时,若在该第一预设时长内该至少一个存储节点未全部被修复,则监管节点将该故障状态由第一故障状态更新为第二故障状态。
由于第一预设时长仅是一个预设的时长，在该第一预设时长内，该至少一个存储节点可能无法被全部修复，那么，当该至少一个存储节点未全部被修复时，就可能需要更长的时间修复还未修复的存储节点。由于修复还未修复的存储节点所用的时长不确定，可能会比较久，因此，该监管节点可以直接将该故障状态由第一故障状态更新为第二故障状态。
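下面给出监管节点在第一预设时长内等待修复、超时后更新故障状态这一流程的示意性Python代码草图（其中的函数名、标识取值与轮询方式均为假设性示例，并非对实现方式的限定）：

```python
import time

def supervise_fault(fault_nodes, first_preset_seconds, broadcast):
    """监管节点的示意流程：下发第一故障标识后等待修复，超时则升级故障状态。"""
    broadcast("FAULT_ID_1")                            # 对应步骤503：下发第一故障标识
    deadline = time.time() + first_preset_seconds      # 第一预设时长
    while time.time() < deadline:
        if all(node.repaired() for node in fault_nodes):
            broadcast("REPAIR_DONE")                   # 对应步骤509：发送修复完成响应
            return "FAULT_STATE_1"
        time.sleep(1)
    broadcast("FAULT_ID_2")                            # 对应步骤510-511：更新并下发第二故障标识
    return "FAULT_STATE_2"
```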
511、当将该故障状态由第一故障状态更新为该第二故障状态时，监管节点向该分布式存储系统内的所有存储节点发送用于指示该第二故障状态的第二故障标识。
监管节点向所有存储节点发送第二故障标识的方式与步骤503中向所有存储节点发送第一故障标识的方式同理，在此，本发明实施例对本步骤511不做赘述。本步骤511所示的过程也即是向该多个存储节点中每一个存储节点发送所述故障状态的过程。
512、目标存储节点接收该第二故障标识。
目标存储节点接收该第二故障标识的方式与步骤504中接收第一故障标识的方式同理，在此，本发明实施例对本步骤512不做赘述。
513、目标设备向该目标存储节点发送访问请求。
目标设备向该目标存储节点发送访问请求的方式在步骤505中有相关描述，在此，本发明实施例对本步骤513不做赘述。
514、当该分布式存储系统内的目标存储节点接收到该第二故障标识时,若该目标存储节点再接收到访问请求,该目标存储节点悬挂该访问请求,输出该第二故障标识。
该目标存储节点悬挂访问请求以及输出该第二故障标识的方式与步骤506中该目标存储节点悬挂访问请求以及输出该第一故障标识的方式同理，在此，本发明实施例对本步骤514不做赘述。
515、目标设备接收该目标存储节点基于该访问请求返回的第二故障标识。
目标设备接收该第二故障标识的方式与步骤507中接收第一故障标识的方式同理,在此,本发明实施例对此不做赘述。
516、目标设备基于接收的第二故障标识,进行故障处理。
本步骤516可以由目标设备的SCSI处理进程来执行，在一些实施例中，客户端中与该SCSI处理进程对接的目标虚拟机可能是VMWare虚拟机，也可能不是VMWare虚拟机，由于针对不同的虚拟机，目标设备进行故障处理的方式不同，因此，本步骤516可以通过下述方式3-4中的任一方式实现。
方式3、当该访问请求由该目标客户端基于目标虚拟机发送,且该目标虚拟机是VMWare虚拟机时,若该故障状态为该第二故障状态,SCSI处理进程向VMWare虚拟机返回存储异常的消息。
当该故障标识为第二故障标识时，说明至少一个存储节点在第一预设时长内不能全部被修复，需要更长的时间，目标虚拟机可以进行PDL状态下的故障处理。为了使目标虚拟机可以感知到PDL状态，存储异常的消息可以携带VMWare虚拟机自定义的SK 0x0,ASC&ASCQ 0x0200或SK 0x5,ASC&ASCQ 0x2500等SCSI错误，SCSI错误可以指示分布式存储系统内的状态为PDL状态，从而目标虚拟机接收到该存储异常的消息就可以感知到PDL状态，那么该目标虚拟机可以将该分布式存储系统内的文件系统下电，等待技术人员修复分布式存储系统内的故障，或者按照用户自定义的故障处理方式选择较优的故障处理方式来处理故障的存储节点。例如，将故障的存储节点下电。
需要说明的是，由于分布式存储系统内的存储节点出现不能短时修复的故障时，可能导致文件系统异常，为了保证分布式存储系统内的存储节点的故障被修复后，文件系统能够被正常使用，则需要先将文件系统下电。当修复完成时，再将文件系统上电，并修复文件系统。
方式4、当该访问请求由该目标客户端基于目标虚拟机发送,且该目标虚拟机不是VMWare虚拟机时,若该故障状态为该第二故障状态,该目标设备向该目标虚拟机返回该目标虚拟机可识别的目标错误,该目标错误用于指示存储介质故障。
该目标错误可以是Sense key 0x3错误,也即是存储介质错误(Medium Error),一般的虚拟机均可以识别,当目标虚拟机接收到该目标错误,说明此时分布式存储系统的状态为第二故障状态,那么该目标设备可以将分布式文件系统下电,等待技术人员修复分布式存储系统内的故障,或者按照用户自定义的故障处理方式选择较优故障处理方式来处理故障的存储节点。
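结合方式3和方式4，下面给出SCSI处理进程在第二故障状态下按目标虚拟机类型返回不同错误的示意性Python代码草图（错误码取自上文描述，函数名与封装方式为假设性示例，并非对实现方式的限定）：

```python
def handle_second_fault(vm_type: str):
    """SCSI处理进程在收到第二故障标识后返回的错误（对应方式3和方式4）。"""
    if vm_type == "VMWare":
        # 方式3：返回VMWare可识别的PDL相关SCSI错误之一
        return {"sense_key": 0x5, "asc_ascq": 0x2500}
    # 方式4：返回一般虚拟机均可识别的存储介质错误（Medium Error）
    return {"sense_key": 0x3}
```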
需要说明的是,本步骤516所示的过程也即是基于接收的故障标识,目标设备进行故障处理的过程,也即是基于所述响应中包含的故障状态,进行故障处理的过程。
517、当该至少一个存储节点修复完成时,向该分布式存储系统内的各个设备发送修复完成响应。
本步骤517与步骤509同理,在此本发明实施例对本步骤517不做赘述。需要说明的是,该每个存储节点的故障可以是被自身所修复,还可以被技术人员修复,本发明实施例对存储节点的修复方式不做具体限定。
需要说明的是,当客户端接收到该修复完成响应时,若文件系统已经下电,则该客户端对该文件系统上电,并修复该文件系统。由于文件系统内存储有大量的元数据,当对该文件系统进行修复时,需要扫描该文件系统内的所有元数据,并修改扫描到的错误元数据,一般该修复文件系统的过程需要消耗部分时间,在第一故障状态下,客户端不会下电文件系统,一旦在第一预设时间内该至少一个存储节点均能被修复,那么也就可以避免下电文件系统,从而可以减少修复文件系统的时间,使得分布式存储系统可以尽快恢复业务,以保证服务质量。
本发明实施例所示的方法，根据所述多个存储节点中的至少一个出现故障的存储节点，确定所述分布式存储系统的故障状态，从而无需当所有存储节点均故障时，才确定分布式存储系统的故障状态，当确定完故障状态后，可以立即向分布式存储系统内的每个存储节点发送故障状态，以便每个存储节点根据确定的故障状态进行故障处理，从而可以降低分布式存储系统恢复正常所用的时间。并且，对于不同的虚拟机，本发明实施例提供了不同的故障处理方式，从而使得本发明实施例所提供的故障处理方式更具有普适性。并且，分布式存储系统内的每个存储节点均知道分布式存储系统的故障状态，对于未出现故障的存储节点而言，可以基于访问请求向目标设备返回故障状态，从而目标设备可以明确地知道分布式存储系统的故障状态，进而可以提高目标设备确定故障状态的精度。并且，在第一故障状态下，目标设备不会下电文件系统，一旦在第一预设时间内该至少一个存储节点均能被修复，那么也就可以避免下电文件系统，且当该至少一个存储节点恢复后，文件系统以及业务可以立即恢复，从而可以减少修复文件系统的时间，使得分布式存储系统可以尽快恢复业务，以保证服务质量。
由于存储节点出现的故障可能在短时间内修复，还可能需要长时间修复，且每个存储节点的修复时间，都可以影响到整个分布式存储系统的修复时间，因此，在一些实施例中，还可以根据每个存储节点的故障类型，来确定分布式存储系统的故障状态。为了进一步说明此过程，参见图6所示的本发明实施例所提供的一种分布式存储系统故障处理方法的流程图，该方法的流程可以包括以下步骤。
601、监管节点确定分布式存储系统内出现故障的至少一个存储节点以及该至少一个存储节点无法被访问的目标数据。
本步骤601与步骤501同理,在此本发明实施例对步骤601不做赘述。
602、监管节点根据该至少一个存储节点出现故障的时间,确定该分布式存储系统的故障场景,该故障场景用于指示该至少一个存储节点是否同时出现故障。
该故障场景可以包括第一故障场景和第二故障场景中的任一个,其中,该第一故障场景用于指示该至少一个存储节点同时出现故障,该第二故障场景用于指示该至少一个存储节点出现故障的时间不同。
该监管节点可以从存储的故障表中确定每一个存储节点出现故障的时间,从而该监管节点就可以根据至少一个存储节点出现故障的时间是否相同,确定故障场景。
在一种可能的实现方式中，当该至少一个存储节点在目标时长内均出现故障时，该监管节点将该故障场景确定为第一故障场景，否则，将该故障场景确定为第二故障场景。需要说明的是，本步骤602所示的过程也即是根据所述至少一个存储节点出现故障的时间，确定所述故障场景的过程。
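下面给出根据各存储节点出现故障的时间判断故障场景的示意性Python代码草图（其中目标时长以参数target_window表示，函数名为假设性示例，并非对实现方式的限定）：

```python
def decide_fault_scene(fault_times, target_window):
    """fault_times为各出现故障的存储节点的故障时刻列表（单位：秒）。"""
    if max(fault_times) - min(fault_times) <= target_window:
        return "SCENE_1"   # 第一故障场景：视为同时出现故障
    return "SCENE_2"       # 第二故障场景：出现故障的时间不同
```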
603、对于该至少一个存储节点中的任一存储节点,当该任一存储节点出现预设的网络故障、预设的异常掉电故障、预设的误操作故障、预设的硬件故障或预设的软件故障时,该监管节点将该任一存储节点的故障类型确定为第一故障类型,否则,将该任一存储节点的故障类型确定为第二故障类型。
存储节点的故障类型用于表示一个存储节点的故障能否在第二预设时长内被修复，故障类型可以包括第一故障类型和第二故障类型中的任一个，第一故障类型用于指示一个存储节点的故障能在该第二预设时长内被修复，第二故障类型用于指示一个存储节点的故障不能在该第二预设时长内被修复，该第二预设时长可以小于或等于第一预设时长，本发明实施例对该第二预设时长不做具体限定。
其中,该预设的网络故障可以包括下述第1.1-1.7项中的任一项:
第1.1项、该任一存储节点的业务网口无法访问,该任一存储节点的监管网口能够访问,该业务网口是存储节点之间用于心跳、数据同步以及镜像时所使用的业务网络的网口,该监管网口为监控存储节点是否出现故障以及进行信息查询时所使用的监管网络的网口。
该监管节点可以通过监管节点的业务网口向任一存储节点的业务网口发送因特网包探索器(ping)请求，该ping请求用于请求建立连接，若可以连接成功，则认为该任一存储节点的业务网口能够访问，反之，则认为该任一存储节点的业务网口无法访问。同理，该监管节点可以通过监管节点的监管网口向任一存储节点的监管网口发送ping请求，若可以连接成功，则认为该任一存储节点的监管网口可以访问，否则认为该任一存储节点的监管网口无法访问。
若监管节点通过业务网口无法访问该任一存储节点，说明该任一存储节点出现了网络故障，但是通过监管网口能够访问该任一存储节点，说明该任一存储节点出现的故障可以在短时间内修复，则该任一存储节点出现了预设的网络故障。
第1.2项、当该业务网络和该监管网络为同一个目标网络时,该任一存储节点在该目标网络内传输的数据包出现第一预设数目的丢包或第二预设数目的畸形包,且该任一存储节点的业务网口、监管网口以及基板管理控制器BMC网口均不可访问,BMC网口为管理BMC的BMC网络的网口。
该监管节点可以通过监管节点的BMC网口向任一存储节点的BMC网口发送ping请求,若可以连接成功,则认为该任一存储节点的BMC网口可以访问,否则,认为该任一存储节点的BMC网口不可以访问。
在交付阶段,技术人员可以将业务网络和监管网络配置为同一个网络,也即是目标网络,当该业务网络和该监管网络为同一个目标网络时,如果该任一存储节点在该目标网络内传输的数据包出现第一预设数目的丢包或第二预设数目的畸形包,说明该任一存储节点出现了网络故障,若该监管节点无法访问该任一存储节点,说明该任一存储节点出现的故障可以在短时间内修复,则该任一存储节点出现了预设的网络故障。
第1.3项、当该业务网络和该监管网络为同一个目标网络时,该任一存储节点在该目标网络内传输的数据包出现大于第一预设数目的丢包或大于第二预设数目的畸形包,且该任一存储节点在目标网络内传输数据的时延大于第三预设时长。
需要说明的是,当监管节点在向该任一存储节点发送ping请求时,若连接不成功,该任一存储节点会向该监管节点发送连接失败响应,该连接失败响应用于指示连接失败,且该连接失败响应内可以携带时延信息,该时延信息用于指示该任一存储节点在目标网络内传输数据的时延,从而该监管节点可以判断时延信息所指示的时延是否大于第三预设时长。
第1.4项、当该业务网络和该监管网络为同一个目标网络时，该目标网络中该任一存储节点发送的优先级流量控制PFC报文的数目大于第三预设数目，且该任一存储节点不可访问。
监管节点可以检测目标网络内的各个优先级的流量控制(priority-based flow control，PFC)报文，从而可以确定该任一存储节点发送的PFC报文的数目是否大于第三预设数目，进而该监管节点就可以确定该任一存储节点是否符合本项所述的条件。本发明实施例对该第三预设数目不做具体限定。
第1.5项、当该业务网络和该监管网络为同一个目标网络时,该目标网络中出现该任一存储节点发送第三预设数目的优先级的流量控制PFC报文,且该任一存储节点在该目标网络内传输数据的时延大于第四预设时长。
需要说明的是，本发明实施例对该第四预设时长不做具体限定。
第1.6项、当该业务网络和该监管网络为同一个目标网络时,该目标网络中出现任一存储节点导致的广播风暴,且该任一存储节点的业务网口、监管网口以及BMC网口均不可访问。
需要说明的是,当任一存储节点在该目标网络内发送大量的广播包,则该目标网络中可能出现广播风暴。
第1.7项、当该业务网络和该监管网络为同一个目标网络时,该目标网络中出现该任一存储节点导致的广播风暴,且该任一存储节点在该目标网络内的时延大于第五预设时长。
需要说明的是，本发明实施例对该第五预设时长不做具体限定。
该预设的异常掉电故障可以包括下述第2.1-2.2项中的任一项:
第2.1项、机框内的所有存储节点的业务网口、监管网口以及BMC网口均不可访问,该机框包括所述任一存储节点。
一个机框内可以包括至少一个存储节点，当所有存储节点的业务网口、监管网口以及BMC网口均不可访问时，可以认为机框内所有的存储节点均被下电，那么，若该任一存储节点在该机框内，说明该任一存储节点也被下电，只要给机框上电，就可以修复该任一存储节点的故障，则认为该任一存储节点出现了预设的异常掉电故障。
第2.2项、在第七预设时长内,第一目标个数的存储节点的业务网口、监管网口以及BMC网口均不可访问,该第一目标个数的存储节点包括该任一存储节点。
当在第七预设时长内,第一目标个数的存储节点的业务网口、监管网口以及BMC网口均不可访问时,第一目标个数的存储节点均可以认为出现了预设的异常掉电故障。需要说明的是,本发明实施例对该第七预设时长不做具体限定。
该预设的误操作故障可以包括:该任一存储节点被主动下电。例如,当用户点击任一存储节点的关机按钮或者重启按钮时,存储节点认为被主动下电,并将主动下电的信息发送给监管节点,从而监管节点确定该任一存储节点出现了预设的误操作故障。
该预设的硬件故障包括:任一存储节点异常退出,该任一存储节点的BMC网口能够访问,且该任一存储节点存在松动的部件。
当任一存储节点异常退出时，可以向监管节点发送异常退出的信息，以表示自己已经异常退出。异常退出可能是内部的部件松动导致的，松动的部件可以是内存条以及卡条等，对于松动的部件，通过插拔的方式可以立即恢复，也就是出现的是短时故障。需要说明的是，当该任一存储节点检测到任一部件连接不良时，说明该任一存储节点存在松动的部件，则该任一存储节点可以向监管节点发送松动部件的信息，以便监管节点可以根据松动部件的信息，确定该任一存储节点出现了预设的硬件故障。
该预设的软件故障可以包括下述第3.1-3.3项中的任一项:
第3.1项、该任一存储节点的操作系统异常导致该任一存储节点异常复位。
例如，当该任一存储节点的内存不足时，会导致该任一存储节点的操作系统无法继续运行，需要复位，或者是看门狗触发该任一存储节点异常复位等。当该任一存储节点出现异常复位时，该任一存储节点可以向监管节点发送异常复位的消息，从而该监管节点可以获知该任一存储节点异常复位，则说明该任一存储节点出现了预设的软件故障。
第3.2项、该任一存储节点的软件异常导致该任一存储节点的目标进程退出。
该目标进程可以是OSD进程，当该任一存储节点的目标进程退出时，该任一存储节点可以向监管节点发送目标进程退出的消息，从而该监管节点可以获知该任一存储节点的目标进程退出，则说明该任一存储节点出现了预设的软件故障。
第3.3项、该任一存储节点的软件异常导致该任一存储节点的操作系统复位。
由于软件异常,该任一存储节点操作系统出现复位时,该任一存储节点可以向监管节点发送操作系统复位的消息,从而该监管节点可以获知该任一存储节点的操作系统复位,则说明该任一存储节点出现了预设的软件故障。
需要说明的是，当该至少一个存储节点中的每个存储节点出现故障时，监管节点就可以通过本步骤603判断每个存储节点的故障类型是第一故障类型，还是第二故障类型，并将每个存储节点的故障类型存储在故障表中，以便监管节点需要任一存储节点的故障类型时，可以直接从故障表中获取。
需要说明的是，本步骤603中所体现的多种故障类型判别方法，可以精确地确定每个存储节点的故障类型，进而根据每个存储节点的故障类型，可以更加精确地确定分布式存储系统的故障状态。
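为便于理解步骤603的判别思路，下面给出一个示意性的Python代码草图（node中的各字段名均为假设性占位，实际应由上述各项预设故障的检测结果填充，并非对实现方式的限定）：

```python
PRESET_FAULT_KEYS = (
    "preset_network_fault",       # 预设的网络故障
    "preset_power_off_fault",     # 预设的异常掉电故障
    "preset_misoperation_fault",  # 预设的误操作故障
    "preset_hardware_fault",      # 预设的硬件故障
    "preset_software_fault",      # 预设的软件故障
)

def classify_fault_type(node: dict) -> str:
    """node中各字段应由上文第1.1-1.7、2.1-2.2、3.1-3.3等各项检测结果填充。"""
    if any(node.get(key, False) for key in PRESET_FAULT_KEYS):
        return "FAULT_TYPE_1"   # 第一故障类型：能在第二预设时长内被修复
    return "FAULT_TYPE_2"       # 第二故障类型：不能在第二预设时长内被修复
```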
604、当该至少一个存储节点的数目大于该分布式存储系统的冗余度,且该目标数据的数据量符合预设条件时,若该故障场景为该第一故障场景,该监管节点则根据该至少一个存储节点中每一个存储节点的故障类型,确定该故障状态。
由于第一故障场景用于表示该至少一个存储节点同时出现故障，则该监管节点可以根据每一个存储节点的故障类型，来确定该分布式存储系统的故障状态。
在一种可能的实现方式中，监管节点根据该至少一个存储节点中每一个存储节点的故障类型确定该故障状态的过程可以包括：当该至少一个存储节点中每一个存储节点的故障类型均为第一故障类型时，该监管节点将该故障状态确定为该第一故障状态，该第一故障状态用于指示该至少一个存储节点能在所述第一预设时长内全部被修复；当该至少一个存储节点中目标个数的存储节点的故障类型为第二故障类型时，若该目标个数小于或者等于该分布式存储系统的冗余度时，该监管节点将该故障状态确定为该第一故障状态，否则，将该故障状态确定为该第二故障状态，该第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
当该至少一个存储节点中每一个存储节点的故障类型均为第一故障类型时,可以认为分布式存储系统内的故障可以在短时间内被修复,则可以将该故障状态确定为第一故障状态。虽然目标个数的存储节点的故障类型为第二故障类型,但是当目标个数小于或等于该分布式存储系统的冗余度,说明目标个数的存储节点对分布式存储系统的影响不大,则可以将该故障状态确定为第一故障状态。一旦目标个数大于该分布式存储系统的冗余度,说明目标个数的存储节点对分布式存储系统的影响较大,则可以将该故障状态确定为第二故障状态,以便迅速修复故障。
需要说明的是,对于该至少一个存储节点的数目是否大于该分布式存储系统的冗余度,且该目标数据的数据量是否符合预设条件的描述,在前文中有体现,对此本发明实施例不做赘述。
需要说明的是,当故障场景为第一故障场景时,监管节点可以通过步骤603中预设的网络故障,预设的异常掉电故障或预设的误操作故障中的任一预设故障,判断每个存储节点的故障类型。
605、当该至少一个存储节点的数目大于该分布式存储系统的冗余度,且该目标数据的数据量符合预设条件时,若该故障场景为第二故障场景,监管节点则根据该至少一个存储节点中最后一个出现故障的第一存储节点的故障类型,确定该故障状态。
由于第二故障场景用于表示该至少一个存储节点出现故障的时间不同，则该监管节点可以根据该至少一个存储节点中最后一个出现故障的存储节点的故障类型，来确定该分布式存储系统的故障状态，其中，最后一个出现故障的存储节点也即是第一存储节点。
在一种可能的实现方式中,当该第一存储节点的故障类型为第一故障类型时,监管节点则将该故障状态确定为第一故障状态,该第一故障状态用于指示该至少一个存储节点能在该第一预设时长内全部被修复;当该第一存储节点的故障类型为所述第二故障类型时,监管节点则将该故障状态确定为第二故障状态,该第二故障状态用于指示该至少一个存储节点不能在所述第一预设时长内全部被修复。
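结合步骤604和步骤605，下面给出根据故障场景、故障类型以及冗余度确定故障状态的示意性Python代码草图（其中的参数名与返回值均为假设性示例，并非对实现方式的限定）：

```python
def decide_fault_state(scene, node_types, redundancy, last_node_type):
    """scene为故障场景，node_types为各故障节点的故障类型，redundancy为系统冗余度。"""
    if scene == "SCENE_1":
        # 第一故障场景（步骤604）：根据每一个存储节点的故障类型判断
        second_type_count = sum(1 for t in node_types if t == "FAULT_TYPE_2")
        if second_type_count <= redundancy:
            return "FAULT_STATE_1"   # 第一故障状态：能在第一预设时长内全部被修复
        return "FAULT_STATE_2"       # 第二故障状态：不能在第一预设时长内全部被修复
    # 第二故障场景（步骤605）：仅依据最后一个出现故障的第一存储节点的故障类型
    return "FAULT_STATE_1" if last_node_type == "FAULT_TYPE_1" else "FAULT_STATE_2"
```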
需要说明的是,本步骤604和605所示的过程也即是根据所述分布式存储系统的故障场景,确定所述分布式存储系统的故障状态的过程。
需要说明的是，当故障场景为第二故障场景时，监管节点仅需要确定第一存储节点的故障类型即可，无需确定所有存储节点的故障类型，可以通过步骤603中预设的网络故障、预设的异常掉电故障、预设的误操作故障、预设的硬件故障或预设的软件故障中的任一预设故障，来判断第一存储节点的故障类型。
606、该监管节点向该分布式存储系统内的所有存储节点发送用于指示故障状态的故障标识。
当该故障状态为第一故障状态时，该监管节点向该分布式存储系统内的所有存储节点发送第一故障标识，当该故障状态为第二故障状态时，该监管节点向该分布式存储系统内的所有存储节点发送第二故障标识，具体执行过程与步骤503同理，在此不做赘述。需要说明的是，本步骤606所示的过程也即是向所述多个存储节点中每一个存储节点发送所述故障状态的过程。
607、该分布式存储系统内的目标存储节点接收故障标识。
当该故障标识为第一故障标识时,目标存储节点接收到第一故障标识,当该故障标识为第二故障标识时,目标存储节点接收到第二故障标识,具体执行过程与步骤504同理,在此不做赘述。
608、目标设备向该目标存储节点发送访问请求。
本步骤608与步骤505所示的过程同理,本发明实施例对本步骤608不做赘述。
609、当该分布式存储系统内的目标存储节点接收到该故障标识后,若该目标存储节点再接收到访问请求,该目标存储节点悬挂该访问请求,输出该故障标识。
当该故障标识为第一故障标识时,目标存储节点向目标设备输出第一故障标识,当该故障标识为第二故障标识时,目标存储节点向目标设备输出第二故障标识,具体执行过程与步骤506同理,在此不做赘述。
610、目标设备接收该目标存储节点基于该访问请求返回的故障标识。
本步骤610与步骤507所示的过程同理，本发明实施例对本步骤610不做赘述。需要说明的是，本步骤610所示的过程也即是接收所述目标存储节点返回的响应；所述响应中包含所述分布式存储系统的故障状态；所述故障状态用于指示至少一个出现故障的存储节点能否在第一预设时长内全部被修复的过程。其中，所述响应中包含的故障状态也即是故障标识。
611、目标设备基于接收的故障标识,进行故障处理。
当该故障标识为第一故障标识时，目标设备基于接收的第一故障标识，进行故障处理，具体执行过程与步骤508所示的过程同理。当该故障标识为第二故障标识时，目标设备基于接收的第二故障标识，进行故障处理，具体执行过程与步骤516所示的过程同理，在此，本发明实施例对本步骤611不做赘述。
需要说明的是,分布式存储系统内的每个存储节点均知道分布式存储系统的故障状态,对于未出现故障的存储节点而言,可以基于访问请求向目标设备返回故障状态,从而目标设备可以明确的知道分布式存储系统的故障状态,进而可以提高目标设备确定故障状态的精度。
需要说明的是，对于不同的虚拟机，本发明实施例提供了不同的故障处理方式，从而使得本发明实施例所提供的故障处理方式更具有普适性。
需要说明的是,本步骤611所示的过程也即是基于所述响应中包含的故障状态,进行故障处理的过程。
612、当该至少一个存储节点修复完成时,监管节点向该分布式存储系统内的各个设备发送修复完成响应。
本步骤612与步骤509同理，在此本发明实施例对本步骤612不做赘述。需要说明的是，当该故障状态为第一故障状态时，若在该第一预设时长内该至少一个存储节点全部被修复，则可以直接执行本步骤612，若在该第一预设时长内该至少一个存储节点未全部修复，则监管节点将该故障状态由第一故障状态更新为第二故障状态，并跳转执行步骤606。需要说明的是，在第一故障状态下，客户端不会下电文件系统，一旦在第一预设时间内该至少一个存储节点均能被修复，那么也就可以避免下电文件系统，从而可以减少修复文件系统的时间，使得分布式存储系统可以尽快恢复业务，以保证服务质量。
本发明实施例所示的方法，根据所述多个存储节点中的至少一个出现故障的存储节点，确定所述分布式存储系统的故障状态，从而无需当所有存储节点均故障时，才确定分布式存储系统的故障状态，当确定完故障状态后，可以立即向分布式存储系统内的每个存储节点发送故障状态，以便每个存储节点根据确定的故障状态进行故障处理，从而可以降低分布式存储系统恢复正常所用的时间。并且，对于不同的虚拟机，本发明实施例提供了不同的故障处理方式，从而使得本发明实施例所提供的故障处理方式更具有普适性。并且，分布式存储系统内的每个存储节点均知道分布式存储系统的故障状态，对于未出现故障的存储节点而言，可以基于访问请求向目标设备返回故障状态，从而目标设备可以明确地知道分布式存储系统的故障状态，进而可以提高目标设备确定故障状态的精度。并且，在第一故障状态下，目标设备不会下电文件系统，一旦在第一预设时间内该至少一个存储节点均能被修复，那么也就可以避免下电文件系统，从而可以减少修复文件系统的时间，使得分布式存储系统可以尽快恢复业务，以保证服务质量。并且，步骤603中所体现的多种故障类型判别方法，可以精确地确定每个存储节点的故障类型，进而根据每个存储节点的故障类型，可以更加精确地确定分布式存储系统的故障状态。
图7是本发明实施例提供的一种故障处理装置的结构示意图,应用于分布式存储系统,所述分布式存储系统包含多个存储节点,该装置包括:
确定模块701，用于根据所述多个存储节点中的至少一个出现故障的存储节点，确定所述分布式存储系统的故障状态；所述故障状态用于指示所述至少一个出现故障的存储节点能否在第一预设时长内全部被修复；
发送模块702,用于向所述多个存储节点中每一个存储节点发送所述故障状态。
可选地,所述装置还包括:
处理模块，用于根据所述分布式存储系统的故障状态，进行故障处理。
可选地，所述确定模块701包括：
第一确定单元,用于执行上述步骤501;
第二确定单元,用于根据所述至少一个存储节点以及所述目标数据,确定所述故障状态。
可选地,所述第二确定单元用于:
当所述至少一个存储节点的数目大于所述分布式存储系统的冗余度,且所述目标数据的数据量符合预设条件时,将所述故障状态确定为第一故障状态,所述第一故障状态用于指示所述至少一个存储节点能在所述第一预设时长内全部被修复。
可选地,所述第二确定单元用于:
当所述至少一个存储节点的数目大于所述分布式存储系统的冗余度,且所述目标数据的数据量符合预设条件时,根据所述分布式存储系统的故障场景,确定所述故障状态,所述故障场景用于指示所述至少一个存储节点是否同时出现故障。
可选地,所述预设条件包括下述任一项:
所述目标数据的数据量与第一预设数据量之间的比值大于预设比值,所述第一预设数据量为所述分布式存储系统存储的所有数据的总数据量;
所述目标数据的数据量大于第二预设数据量。
可选地,所述装置还包括:
更新模块,用于当所述故障状态为所述第一故障状态时,若在所述第一预设时长内所述至少一个存储节点未全部被修复,则将所述故障状态由第一故障状态更新为第二故障状态,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
可选地,所述第二确定单元,用于执行上述步骤602。
可选地,所述第二确定单元,用于当所述至少一个存储节点在目标时长内均出现故障时,将所述故障场景确定为第一故障场景,否则,将所述故障场景确定为第二故障场景,所述第一故障场景用于指示所述至少一个存储节点同时出现故障,所述第二故障场景用于指示所述至少一个存储节点出现故障的时间不同。
可选地,所述第二确定单元包括:
第一确定子单元,用于执行上述步骤604;
第二确定子单元,用于执行上述步骤605。
可选地,所述第一确定子单元用于:
当所述至少一个存储节点中每一个存储节点的故障类型均为第一故障类型时,将所述故障状态确定为第一故障状态,所述第一故障类型用于指示一个存储节点的故障能在所述第二预设时长内被修复,所述第一故障状态用于指示所述至少一个存储节点能在所述第一预设时长内全部被修复;
当所述至少一个存储节点中目标个数的存储节点的故障类型为第二故障类型时,若所述目标个数小于或等于所述分布式存储系统的冗余度,将所述故障状态确定为所述第一故障状态,否则,将所述故障状态确定为第二故障状态,所述第二故障类型用于指示一个存储节点的故障不能在所述第二预设时长内被修复,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
可选地,所述第二确定子单元用于:
当所述第一存储节点的故障类型为第一故障类型时,将所述故障状态确定为第一故障状态,所述第一故障类型用于指示一个存储节点的故障能在所述第二预设时长内被修复,所述第一故障状态用于指示所述至少一个存储节点能在所述第一预设时长内全部被修复;
当所述第一存储节点的故障类型为第二故障类型时,则将所述故障状态确定为第二故障状态,所述第二故障类型用于指示一个存储节点的故障不能在所述第二预设时长内被修复,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
可选地,所述确定模块701,还用于执行步骤603。
可选地,所述发送模块702,还用于执行上述步骤509。
图8是本发明实施例提供的一种故障处理装置的结构示意图，应用于分布式存储系统，所述分布式存储系统包含多个存储节点，该装置包括：
发送模块801,用于执行上述步骤608;
接收模块802,用于接收所述目标存储节点返回的响应;所述响应中包含所述分布式存储系统的故障状态;所述故障状态用于指示至少一个出现故障的存储节点能否在第一预设时长内全部被修复。
可选地,该装置还包括:
处理模块,用于基于所述响应中包含的故障状态,进行故障处理。
可选地,所述故障状态的故障标识包括第一故障标识或第二故障标识中的任一个,其中,所述第一故障标识用于指示第一故障状态,所述第二故障标识用于指示第二故障状态,所述第一故障状态用于指示至少一个存储节点能在第一预设时长内全部被修复,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复,所述存储节点为所述分布式存储系统中出现故障的存储节点。
可选地,所述处理模块用于:
当所述访问请求由所述分布式存储系统中的目标客户端基于目标虚拟机发送,且所述目标虚拟机是VMWare虚拟机时,若所述故障状态为所述第一故障状态,不向所述目标虚拟机响应所述访问请求,所述第一故障状态用于指示至少一个存储节点能在第一预设时长内全部被修复;
当所述访问请求由所述目标客户端基于目标虚拟机发送,且所述目标虚拟机是VMWare虚拟机时,若所述故障状态为所述第二故障状态时,向所述目标虚拟机返回存储异常的消息,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
可选地,所述处理模块用于:
当所述目标访问请求由所述分布式存储系统中的目标客户端基于目标虚拟机发送,且所述目标虚拟机不是VMWare虚拟机时,若所述故障状态为所述第一故障状态,向所述目标虚拟机发送重试请求,所述重试请求用于指示重新下发所述访问请求,所述第一故障状态用于指示至少一个存储节点能在第一预设时长内全部被修复;
当所述目标访问请求由所述目标客户端基于目标虚拟机发送,且所述目标虚拟机不是VMWare虚拟机时,若所述故障状态为所述第二故障状态,向所述目标虚拟机返回所述目标虚拟机可识别的目标错误,所述目标错误用于指示存储介质故障,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
可选地,所述接收模块802,还用于接收所述分布式存储系统中的目标客户端发送的目标访问请求,所述目标访问请求用于指示对第一目标数据进行处理,所述第一目标数据包括所述目标数据;
所述发送模块801,用于基于所述目标访问请求,向分布式存储系统内的目标存储节点发送所述访问请求。
可选地,接收模块802,用于接收目标存储节点返回的修复完成响应,所述修复完成响应用于指示所述分布式存储系统内没有故障设备。
本发明实施例还提供一种分布式存储系统,所述分布式存储系统包括监管节点和多个存储节点;
所述监管节点用于:
根据所述多个存储节点中的至少一个出现故障的存储节点确定所述分布式存储系统的故障状态;所述故障状态用于指示所述至少一个出现故障的存储节点能否在第一预设时长内全部被修复;
向所述多个存储节点中每一个存储节点发送所述故障状态;
所述多个存储节点中的每一个存储节点,用于接收所述故障状态。
可选地,所述故障状态的故障标识包括第一故障标识和第二故障标识中的任一个,所述第一故障标识用于指示所述第一故障状态,所述第二故障标识用于指示第二故障状态,所述第一故障状态用于指示所述至少一个存储节点能在所述第一预设时长内全部被修复,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
可选地,所述多个存储节点中的每一个存储节点,还用于当接收到所述故障标识后,若再接收到所述访问请求,悬挂所述访问请求,基于接收的故障状态,进行故障处理。
需要说明的是，上述提供的分布式存储系统内的各个设备均可以是实施例5和6中的设备。
上述所有可选技术方案,可以采用任意结合形成本公开的可选实施例,在此不再一一赘述。
需要说明的是：上述实施例提供的故障处理装置在处理故障时，仅以上述各功能模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能模块完成，即将装置的内部结构划分成不同的功能模块，以完成以上描述的全部或者部分功能。另外，上述实施例提供的故障处理装置与分布式存储系统故障处理方法实施例属于同一构思，其具体实现过程详见方法实施例，这里不再赘述。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本发明的较佳实施例,并不用以限制本发明,凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。
Claims (22)
- 一种分布式存储系统故障处理方法,其特征在于,所述分布式存储系统包含多个存储节点;所述方法包括:根据所述多个存储节点中的至少一个出现故障的存储节点,确定所述分布式存储系统的故障状态;所述故障状态用于指示所述至少一个出现故障的存储节点能否在第一预设时长内全部被修复;向所述多个存储节点中每一个存储节点发送所述故障状态。
- 根据权利要求1所述的方法,其特征在于,所述方法还包括:根据所述存储系统的故障状态,进行故障处理。
- 根据权利要求1所述的方法,其特征在于,所述根据所述多个存储节点中的至少一个出现故障的存储节点,确定所述分布式存储系统的故障状态包括:确定所述分布式存储系统内出现故障的至少一个存储节点以及所述至少一个存储节点内无法被访问的目标数据;根据所述至少一个存储节点以及所述目标数据,确定所述故障状态。
- 根据权利要求3所述的方法,其特征在于,所述根据所述至少一个存储节点以及所述目标数据,确定所述故障状态包括:当所述至少一个存储节点的数目大于所述分布式存储系统的冗余度,且所述目标数据的数据量符合预设条件时,将所述故障状态确定为第一故障状态,所述第一故障状态用于指示所述至少一个存储节点能在所述第一预设时长内全部被修复。
- 根据权利要求3所述的方法,其特征在于,所述根据所述至少一个存储节点以及所述目标数据,确定所述故障状态包括:当所述至少一个存储节点的数目大于所述分布式存储系统的冗余度,且所述目标数据的数据量符合预设条件时,根据所述分布式存储系统的故障场景,确定所述故障状态,所述故障场景用于指示所述至少一个存储节点是否同时出现故障。
- 一种分布式存储系统故障处理方法,其特征在于,所述分布式存储系统包含多个存储节点;所述方法包括:向所述分布式存储系统中的目标存储节点发送访问请求;接收所述目标存储节点返回的响应;所述响应中包含所述分布式存储系统的故障状态;所述故障状态用于指示至少一个出现故障的存储节点能否在第一预设时长内全部被修复。
- 根据权利要求6所述的方法,其特征在于,所述接收所述目标存储节点返回的响应之后,所述方法还包括:基于所述响应中包含的故障状态,进行故障处理。
- 根据权利要求7所述的方法,其特征在于,所述基于所述响应中包含的故障状态,进行故障处理包括:当所述访问请求由所述分布式存储系统中的目标客户端基于目标虚拟机发送,且所述目标虚拟机是VMWare虚拟机时,若所述故障状态为所述第一故障状态,不向所述目标虚拟机响应所述访问请求,所述第一故障状态用于指示至少一个存储节点能在第一预设时长内全部被修复;当所述访问请求由所述目标客户端基于目标虚拟机发送,且所述目标虚拟机是VMWare虚拟机时,若所述故障状态为所述第二故障状态时,向所述目标虚拟机返回存储异常的消息,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
- 根据权利要求7所述的方法,其特征在于,所述基于所述响应中包含的故障状态,进行故障处理包括:当所述访问请求由所述分布式存储系统中的目标客户端基于目标虚拟机发送,且所述目标虚拟机不是VMWare虚拟机时,若所述故障状态为所述第一故障状态,向所述目标虚拟机发送重试请求,所述重试请求用于指示重新下发所述访问请求,所述第一故障状态用于指示至少一个存储节点能在第一预设时长内全部被修复;当所述访问请求由所述目标客户端基于目标虚拟机发送,且所述目标虚拟机不是VMWare虚拟机时,若所述故障状态为所述第二故障状态,向所述目标虚拟机返回所述目标虚拟机可识别的目标错误,所述目标错误用于指示存储介质故障,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
- 一种分布式存储系统,其特征在于,所述分布式存储系统包括监管节点和多个存储节点;所述监管节点用于:根据所述多个存储节点中的至少一个出现故障的存储节点确定所述分布式存储系统的故障状态;所述故障状态用于指示所述至少一个出现故障的存储节点能否在第一预设时长内全部被修复;向所述多个存储节点中每一个存储节点发送所述故障状态;所述多个存储节点中的每一个存储节点,用于接收所述故障状态。
- 根据权利要求10所述的系统,其特征在于,所述多个存储节点中的每一个存储节点,还用于当接收到所述故障标识后,若再接收到所述访问请求,悬挂所述访问请求,基于接收的故障状态,进行故障处理。
- 一种故障处理装置，其特征在于，应用于分布式存储系统，所述分布式存储系统包含多个存储节点，所述装置包括：确定模块，用于根据所述多个存储节点中的至少一个出现故障的存储节点，确定所述分布式存储系统的故障状态；所述故障状态用于指示所述至少一个出现故障的存储节点能否在第一预设时长内全部被修复；发送模块，用于向所述多个存储节点中每一个存储节点发送所述故障状态。
- 根据权利要求12所述的装置,其特征在于,所述装置还包括:处理模块,用于向所述多个存储节点中每一个存储节点发送所述故障状态。
- 根据权利要求12所述的装置,其特征在于,所述确定模块包括:第一确定单元,用于确定所述分布式存储系统内出现故障的至少一个存储节点以及所述至少一个存储节点内无法被访问的目标数据;第二确定单元,用于根据所述至少一个存储节点以及所述目标数据,确定所述故障状态。
- 根据权利要求14所述的装置,其特征在于,所述第二确定单元用于:当所述至少一个存储节点的数目大于所述分布式存储系统的冗余度,且所述目标数据的数据量符合预设条件时,将所述故障状态确定为第一故障状态,所述第一故障状态用于指示所述至少一个存储节点能在所述第一预设时长内全部被修复。
- 根据权利要求14所述的装置,其特征在于,所述第二确定单元用于:当所述至少一个存储节点的数目大于所述分布式存储系统的冗余度,且所述目标数据的数据量符合预设条件时,根据所述分布式存储系统的故障场景,确定所述故障状态,所述故障场景用于指示所述至少一个存储节点是否同时出现故障。
- 一种故障处理装置,其特征在于,应用于分布式存储系统,所述分布式存储系统包含多个存储节点,所述装置包括:发送模块,用于向所述分布式存储系统中的目标存储节点发送访问请求,所述分布式存储系统包含多个存储节点;接收模块,用于接收所述目标存储节点返回的响应;所述响应中包含所述分布式存储系统的故障状态;所述故障状态用于指示至少一个出现故障的存储节点能否在第一预设时长内全部被修复。
- 根据权利要求17所述的装置,其特征在于,所述装置还包括:处理模块,用于基于所述响应中包含的故障状态,进行故障处理。
- 根据权利要求18所述的装置，其特征在于，所述处理模块用于：当所述访问请求由所述分布式存储系统中的目标客户端基于目标虚拟机发送，且所述目标虚拟机是VMWare虚拟机时，若所述故障状态为所述第一故障状态，不向所述目标虚拟机响应所述访问请求，所述第一故障状态用于指示至少一个存储节点能在第一预设时长内全部被修复；当所述访问请求由所述目标客户端基于目标虚拟机发送，且所述目标虚拟机是VMWare虚拟机时，若所述故障状态为所述第二故障状态时，向所述目标虚拟机返回存储异常的消息，所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
- 根据权利要求18所述的装置,其特征在于,所述处理模块用于:当所述目标访问请求由所述分布式存储系统中的目标客户端基于目标虚拟机发送,且所述目标虚拟机不是VMWare虚拟机时,若所述故障状态为所述第一故障状态,向所述目标虚拟机发送重试请求,所述重试请求用于指示重新下发所述访问请求,所述第一故障状态用于指示至少一个存储节点能在第一预设时长内全部被修复;当所述目标访问请求由所述目标客户端基于目标虚拟机发送,且所述目标虚拟机不是VMWare虚拟机时,若所述故障状态为所述第二故障状态,向所述目标虚拟机返回所述目标虚拟机可识别的目标错误,所述目标错误用于指示存储介质故障,所述第二故障状态用于指示所述至少一个存储节点不能在所述第一预设时长内全部被修复。
- 一种计算机设备,其特征在于,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条指令,所述指令由所述处理器加载并执行以实现如权利要求1至权利要求9任一项所述的分布式存储系统故障处理方法所执行的操作。
- 一种存储介质，其特征在于，所述存储介质中存储有至少一条指令，所述指令由处理器加载并执行以实现如权利要求1至权利要求9任一项所述的分布式存储系统故障处理方法所执行的操作。
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910741190.5 | 2019-08-12 | ||
| CN201910741190.5A CN110535692B (zh) | 2019-08-12 | 2019-08-12 | 故障处理方法、装置、计算机设备、存储介质及存储系统 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021027481A1 true WO2021027481A1 (zh) | 2021-02-18 |
Family
ID=68662506
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2020/102302 Ceased WO2021027481A1 (zh) | 2019-08-12 | 2020-07-16 | 故障处理方法、装置、计算机设备、存储介质及存储系统 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN110535692B (zh) |
| WO (1) | WO2021027481A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11544139B1 (en) * | 2021-11-30 | 2023-01-03 | Vast Data Ltd. | Resolving erred 10 flows |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110535692B (zh) * | 2019-08-12 | 2020-12-18 | 华为技术有限公司 | 故障处理方法、装置、计算机设备、存储介质及存储系统 |
| CN111371848A (zh) * | 2020-02-21 | 2020-07-03 | 苏州浪潮智能科技有限公司 | 一种请求处理方法、装置、设备及存储介质 |
| CN113805788B (zh) * | 2020-06-12 | 2024-04-09 | 华为技术有限公司 | 一种分布式存储系统及其异常处理方法和相关装置 |
| CN112187919B (zh) * | 2020-09-28 | 2024-01-23 | 腾讯科技(深圳)有限公司 | 一种存储节点管理方法及相关装置 |
| CN113032106B (zh) * | 2021-04-29 | 2024-07-09 | 中国工商银行股份有限公司 | 计算节点io悬挂异常自动检测方法及装置 |
| CN113326251B (zh) * | 2021-06-25 | 2024-02-23 | 深信服科技股份有限公司 | 数据管理方法、系统、设备和存储介质 |
| CN114584454B (zh) * | 2022-02-21 | 2023-08-11 | 苏州浪潮智能科技有限公司 | 一种服务器信息的处理方法、装置、电子设备及存储介质 |
| CN119200963B (zh) * | 2022-04-28 | 2025-08-08 | 华为技术有限公司 | 存储装置及数据处理方法 |
| CN116382850B (zh) * | 2023-04-10 | 2023-11-07 | 北京志凌海纳科技有限公司 | 一种利用多存储心跳检测的虚拟机高可用管理装置及系统 |
| WO2025000362A1 (en) * | 2023-06-29 | 2025-01-02 | Nokia Shanghai Bell Co., Ltd. | Supervision on supervision object |
| CN118567576B (zh) * | 2024-07-31 | 2024-10-29 | 浪潮电子信息产业股份有限公司 | 多控存储器系统及其数据存储方法、设备、介质、产品 |
| CN120780250B (zh) * | 2025-09-02 | 2025-12-16 | 浪潮电子信息产业股份有限公司 | 多控存储系统的存储管理方法、设备、程序产品及介质 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103092712A (zh) * | 2011-11-04 | 2013-05-08 | 阿里巴巴集团控股有限公司 | 一种任务中断恢复方法和设备 |
| CN104935481A (zh) * | 2015-06-24 | 2015-09-23 | 华中科技大学 | 一种分布式存储下基于冗余机制的数据恢复方法 |
| US20190004845A1 (en) * | 2017-06-28 | 2019-01-03 | Vmware, Inc. | Virtual machine placement based on device profiles |
| CN109831342A (zh) * | 2019-03-19 | 2019-05-31 | 江苏汇智达信息科技有限公司 | 一种基于分布式系统的故障恢复方法 |
| CN110535692A (zh) * | 2019-08-12 | 2019-12-03 | 华为技术有限公司 | 故障处理方法、装置、计算机设备、存储介质及存储系统 |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108984107B (zh) * | 2017-06-02 | 2021-06-29 | 伊姆西Ip控股有限责任公司 | 提高存储系统的可用性 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN110535692A (zh) | 2019-12-03 |
| CN110535692B (zh) | 2020-12-18 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20853099; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20853099; Country of ref document: EP; Kind code of ref document: A1 |
