CN117311634A - Method and device for processing disk exception of SAS (serial attached small computer system interface) storage system - Google Patents

Info

Publication number
CN117311634A
Authority
CN
China
Prior art keywords
storage device
sas storage
disk
application layer
layer software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311293653.9A
Other languages
Chinese (zh)
Other versions
CN117311634B (en)
Inventor
刘亿民
汪宏志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Zhongxing Microsystem Technology Co ltd
Original Assignee
Wuxi Zhongxing Microsystem Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Zhongxing Microsystem Technology Co ltd filed Critical Wuxi Zhongxing Microsystem Technology Co ltd
Priority to CN202311293653.9A priority Critical patent/CN117311634B/en
Publication of CN117311634A publication Critical patent/CN117311634A/en
Application granted granted Critical
Publication of CN117311634B publication Critical patent/CN117311634B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656Data buffering arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0689Disk arrays, e.g. RAID, JBOD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a method and a device for processing disk exceptions of a SAS storage system. The method comprises: detecting the state of a SAS storage device and, when a condition for initiating a SAS storage device termination instruction is met, issuing the instruction to the HBA through application layer software; and, in response to the HBA receiving the instruction, emptying the I/O buffer corresponding to the SAS storage device, recovering the hardware resources of the I/Os corresponding to the SAS storage device, and returning the execution result of the instruction to the application layer software. In this scheme the hardware actively recovers the resources of all residual I/Os of the target disk or disk group, so that disk exception handling completes quickly, robustness improves, and the software operation flow is simplified.

Description

Method and device for processing disk exception of SAS (serial attached small computer system interface) storage system
Technical Field
The invention belongs to the field of disk storage design, and particularly relates to a method for processing disk abnormality of an SAS storage system.
Background
In a SAS (Serial Attached SCSI) storage system architecture, a SAS I/O controller works together with software to handle disk management, I/O management and transfer, SAS protocol processing, and related tasks of the storage system. The SAS I/O controller may be implemented on an HBA or RAID card. An HBA (Host Bus Adapter) or RAID (Redundant Array of Independent Disks) card typically serves as a bridge between a host and peripheral storage devices: it expands the number of peripheral storage devices a server can connect, supports conversion between different interface protocols, enriches the functions of the server system, and accommodates diverse application scenarios. A typical SAS storage system is shown in FIG. 1. The Expander in the disk array system is responsible for route switching and service forwarding.
As shown in FIG. 2, in a SAS storage system an I/O travels from the software that builds the command, through the SAS controller, to the disk, and the disk's response then travels back to the software; the entire data path passes through multiple stages of modules. To improve performance, SAS I/O is designed to be concurrent, so several processing stages on the SAS I/O data path have I/O buffers that hold information about I/Os in flight; the disk side likewise holds multiple I/O instructions awaiting execution. The SAS controller also manages the information of all disk devices in the system.
During SAS I/O execution, exceptions may occur because of link faults or other causes. The SAS protocol defines two instructions, ABORT TASK and ABORT TASK SET. When a particular I/O becomes abnormal, the software can handle it with ABORT TASK: on receiving the instruction, the disk clears the remaining abnormal target instruction, every stage on the I/O path must also clear that instruction, and a response to ABORT TASK is then returned to the application layer software. When a task set of the disk becomes abnormal, the exception is handled with ABORT TASK SET; the flow is similar to that of ABORT TASK, except that it applies to all corresponding I/Os in the disk's task set. In this process, the resources associated with the abnormal I/O must be recovered, including the IPTT number (initiator port transfer tag, i.e. the command ID that identifies an I/O), context cache, I/O cache, and other resources, after which normal access to the disk is restored.
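The resource recovery described above can be pictured with a small model. This is a sketch only: the `TagPool` class, its method names, and the per-I/O record (disk number plus task set number) are hypothetical and are not taken from the SAS protocol or from this description; the model tracks IPTT numbers only and ignores the context cache and I/O cache.

```python
from dataclasses import dataclass, field


@dataclass
class TagPool:
    """Hypothetical pool of IPTT numbers (command IDs) held by the initiator."""
    size: int
    free: list = field(default_factory=list)
    in_flight: dict = field(default_factory=dict)  # iptt -> (disk_id, task_set_id)

    def __post_init__(self):
        self.free = list(range(self.size))

    def issue(self, disk_id, task_set_id=0):
        iptt = self.free.pop(0)  # allocate a tag for the new I/O
        self.in_flight[iptt] = (disk_id, task_set_id)
        return iptt

    def abort_task(self, iptt):
        """Model of ABORT TASK: recover the resources of one I/O."""
        if self.in_flight.pop(iptt, None) is not None:
            self.free.append(iptt)

    def abort_task_set(self, disk_id, task_set_id=0):
        """Model of ABORT TASK SET: recover every I/O in one task set of a disk."""
        victims = [tag for tag, owner in self.in_flight.items()
                   if owner == (disk_id, task_set_id)]
        for iptt in victims:
            self.abort_task(iptt)
        return victims


pool = TagPool(size=8)
a = pool.issue(disk_id=1)
b = pool.issue(disk_id=1)
c = pool.issue(disk_id=2)
pool.abort_task(a)                # one abnormal I/O on disk 1
aborted = pool.abort_task_set(1)  # the remaining I/Os in disk 1's task set
```

After these calls only the I/O for disk 2 is still in flight, and the two tags used by disk 1 are back in the free list.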
However, both instructions presuppose that the disk can still communicate normally with the HBA card. If a disk is pulled or replaced, I/O resources cannot be recovered through ABORT TASK or ABORT TASK SET. Therefore, when a disk is hot-plugged or replaced, the software must take additional steps to meet two key requirements. First, old I/Os must not be issued to the new disk. Second, all I/O resources of the original disk that are cached on the data path must be reclaimed.
For the first requirement, in the prior art the software marks the target disk as invalid after detecting that the disk has dropped off, so that I/Os for the old disk are no longer issued onto the link. For the second requirement, the conventional SAS protocol defines no instruction for recovering disk I/O; the common practice is to wait for the I/Os on the link to finish through the normal flow, that is, to recover them by natural draining. This approach has two drawbacks. First, it is inefficient: waiting for I/Os to drain naturally takes a long and unpredictable time. Second, I/Os on the disk side may never drain naturally, so the software must wait for them to time out. FIG. 3 shows this conventional processing flow.
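The cost of natural draining can be illustrated with a small model. The function name, the I/O identifiers, and the sample completion times below are hypothetical; the 10-second timeout is the lower end of the 10 to 30 second range that the detailed description gives for software I/O timeouts.

```python
def natural_drain_wait(outstanding, timeout_s):
    """Seconds until every outstanding I/O has completed or timed out.

    `outstanding` maps a hypothetical I/O id to the time (in seconds) at
    which the I/O would complete, or to None if it will never complete,
    for example because it was already on a disk that has been pulled.
    """
    waits = [timeout_s if done_at is None else min(done_at, timeout_s)
             for done_at in outstanding.values()]
    return max(waits, default=0.0)


# Two I/Os still finish normally; one was stranded on the removed disk.
outstanding = {"io0": 0.002, "io1": 0.005, "io2": None}
wait = natural_drain_wait(outstanding, timeout_s=10.0)
```

In this model a single stranded I/O is enough to push the wait up to the full timeout, regardless of how quickly the other I/Os finish.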
Consequently, exception handling in software is relatively complex and passive.
Disclosure of Invention
The invention aims to provide a method and a device for processing disk exceptions of a SAS storage system, in order to solve the problems of low I/O recovery efficiency and a complex processing flow when a disk exception occurs.
According to a first aspect of the present invention, there is provided a method for processing disk exceptions of a SAS storage system, including:
detecting the state of a SAS storage device and, when a condition for initiating a SAS storage device termination instruction is met, issuing the SAS storage device termination instruction to the HBA through application layer software; and
in response to the HBA receiving the SAS storage device termination instruction, emptying the I/O buffer corresponding to the SAS storage device, recovering the hardware resources of the I/Os corresponding to the SAS storage device, and returning the execution result of the SAS storage device termination instruction to the application layer software.
Preferably, the SAS storage device termination instruction is a target disk termination instruction, and the condition for initiating the SAS storage device termination instruction includes any one of:
when the target disk becomes abnormal and the application layer software needs to terminate all I/Os already issued to the target disk;
when the application layer software needs to delete the target disk from the storage system; or
when the application layer software needs to perform a disk replacement operation.
Preferably, emptying the I/O buffer corresponding to the SAS storage device and recovering the hardware resources of the I/Os corresponding to the SAS storage device further includes:
scanning an internal cache of the SAS controller, clearing all I/Os of the target disk, and recovering the hardware resources corresponding to each I/O of the target disk;
returning the status of all I/Os of the target disk to the application layer software, so as to notify the application layer software to recover the software resources corresponding to those I/Os; and
determining whether all I/Os of the target disk have been emptied and, if not, returning to continue the emptying operation.
Preferably, the SAS storage device termination instruction is a target port termination instruction, and the condition for initiating the SAS storage device termination instruction includes any one of:
when the target port becomes abnormal and the application layer software needs to terminate and reclaim all I/Os already issued to all disks under the target port;
when the application layer software needs to delete the target port from the storage system; or
when the application layer software needs to perform a port switch operation.
Preferably, emptying the I/O buffer corresponding to the SAS storage device and recovering the hardware resources of the I/Os corresponding to the SAS storage device further includes:
scanning an internal cache of the SAS controller, clearing the I/Os of all disks under the target port, and recovering the hardware resources corresponding to each I/O of all disks under the target port;
returning the status of all I/Os of the target port to the application layer software, so as to notify the application layer software to recover the software resources corresponding to all I/Os of all disks under the target port; and
determining whether all I/Os under the target port have been emptied and, if not, returning to continue the emptying operation.
According to a second aspect of the present invention, there is provided a device for processing disk exceptions of a SAS storage system, comprising:
a termination instruction issuing unit, configured to detect the state of a SAS storage device and, when a condition for initiating a SAS storage device termination instruction is met, issue the SAS storage device termination instruction to the HBA through application layer software; and
an exception handling unit, configured to, in response to the HBA receiving the SAS storage device termination instruction, empty the I/O buffer corresponding to the SAS storage device, recover the hardware resources of the I/Os corresponding to the SAS storage device, and return the execution result of the SAS storage device termination instruction to the application layer software.
Preferably, the SAS storage device termination instruction is a target disk termination instruction, and the condition for initiating the SAS storage device termination instruction includes any one of:
when the target disk becomes abnormal and the application layer software needs to terminate all I/Os already issued to the target disk;
when the application layer software needs to delete the target disk from the storage system; or
when the application layer software needs to perform a disk replacement operation.
Preferably, the exception handling unit is further configured to:
scan an internal cache of the SAS controller, clear all I/Os of the target disk, and recover the hardware resources corresponding to each I/O of the target disk;
return the status of all I/Os of the target disk to the application layer software, so as to notify the application layer software to recover the software resources corresponding to those I/Os; and
determine whether all I/Os of the target disk have been emptied and, if not, return to continue the emptying operation.
Preferably, the SAS storage device termination instruction is a target port termination instruction, and the condition for initiating the SAS storage device termination instruction includes any one of:
when the target port becomes abnormal and the application layer software needs to terminate and reclaim all I/Os already issued to all disks under the target port;
when the application layer software needs to delete the target port from the storage system; or
when the application layer software needs to perform a port switch operation.
Preferably, the exception handling unit is further configured to:
scan an internal cache of the SAS controller, clear the I/Os of all disks under the target port, and recover the hardware resources corresponding to each I/O of all disks under the target port;
return the status of all I/Os of the target port to the application layer software, so as to notify the application layer software to recover the software resources corresponding to all I/Os of all disks under the target port; and
determine whether all I/Os under the target port have been emptied and, if not, return to continue the emptying operation.
Compared with the prior art, the technical scheme of the invention uses the custom hardware processing modes abort local by disk and abort local by port to shorten disk-based or port-based I/O exception handling from the order of seconds to at most the order of milliseconds, which markedly improves exception handling efficiency. It ensures that the I/Os of the target device are completely cleaned up, removes the need to handle abnormal I/Os through timeouts and scan queries, and markedly simplifies the software processing flow.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure and process particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a typical disk array architecture according to the prior art.
FIG. 2 is a schematic diagram of a SAS storage system data path according to the prior art.
FIG. 3 is a schematic diagram of a typical HBA disk exception handling process according to the prior art.
FIG. 4 is a main flowchart of a method for processing disk exceptions of a SAS storage system according to the present invention.
FIG. 5 is a schematic diagram of a disk termination instruction execution process according to the present invention.
FIG. 6 is a schematic diagram of a port termination instruction execution process according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which are derived by a person skilled in the art from the embodiments according to the invention without creative efforts, fall within the protection scope of the invention.
Based on the above analysis, the invention provides a design and implementation scheme for a high-performance SAS HBA. Borrowing the approach of protocol definition, it adds two custom instructions on top of the SAS protocol: abort local by disk and abort local by port. With the custom abort local by disk instruction, the HBA hardware actively recovers the resources of all residual I/Os of a single target disk. This speeds up the processing flow so that exception handling for disk pulling, disk replacement, and disk group replacement completes quickly, improves the robustness of exception handling, and simplifies the operation flow of the application layer software. For a group of disks mounted under one port, the custom abort local by port instruction recovers the resources of all residual I/Os of the group in a batch.
Referring to the flowchart of FIG. 4, the method for processing disk exceptions of a SAS storage system provided by the invention includes:
Step 101: detecting the state of the SAS storage device and, when a condition for initiating a SAS storage device termination instruction is met, issuing the SAS storage device termination instruction to the HBA through application layer software.
The SAS storage device is either a target disk or a target port. Correspondingly, the SAS storage device termination instruction is either a target disk termination instruction (abort local by disk) or a target port termination instruction (abort local by port).
When the application layer software needs to perform any of the following operations, exception handling for all I/Os of the target disk can be completed quickly and safely through the custom abort local by disk instruction:
1.1 When a disk becomes abnormal and the application layer software needs to terminate all I/Os already issued to that disk;
1.2 When the application layer software needs to delete the disk from the storage system;
1.3 When the application layer software needs to perform a disk replacement operation.
Alternatively, when the application layer software needs to perform any of the following operations, exception handling for all disks and devices under the target port can be completed quickly and safely through the custom abort local by port instruction:
2.1 When the cable of a port becomes abnormal, for example because of a failure or another specific cause, and all disk I/Os under that port must be terminated and reclaimed;
2.2 When the application layer software needs to delete a port from the storage system;
2.3 When the application layer software needs to switch a port.
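The choice between the two custom instructions can be sketched as a simple lookup. The `Event` names and the `select_termination_instruction` function are hypothetical labels introduced here for conditions 1.1 to 1.3 and 2.1 to 2.3; the description does not define such an interface.

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical event names for the conditions listed above."""
    DISK_ABNORMAL = auto()   # 1.1: terminate all issued I/Os of an abnormal disk
    DISK_DELETE = auto()     # 1.2: delete the disk from the storage system
    DISK_REPLACE = auto()    # 1.3: replace the disk
    PORT_ABNORMAL = auto()   # 2.1: cable or port fault
    PORT_DELETE = auto()     # 2.2: delete the port from the storage system
    PORT_SWITCH = auto()     # 2.3: switch the port
    NO_ACTION = auto()       # no initiating condition is met


DISK_EVENTS = {Event.DISK_ABNORMAL, Event.DISK_DELETE, Event.DISK_REPLACE}
PORT_EVENTS = {Event.PORT_ABNORMAL, Event.PORT_DELETE, Event.PORT_SWITCH}


def select_termination_instruction(event):
    """Return the custom instruction to issue for an event, or None."""
    if event in DISK_EVENTS:
        return "abort local by disk"
    if event in PORT_EVENTS:
        return "abort local by port"
    return None
```

For example, `select_termination_instruction(Event.DISK_REPLACE)` selects the per-disk instruction, while `Event.PORT_SWITCH` selects the per-port one.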
Step 102: in response to the HBA receiving the SAS storage device termination instruction, emptying the I/O buffer corresponding to the SAS storage device, recovering the hardware resources of the I/Os corresponding to the SAS storage device, and returning the execution result of the SAS storage device termination instruction to the application layer software.
Specifically, where the SAS storage device is a target disk and the SAS storage device termination instruction is a target disk termination instruction, once the application layer software determines that any of operations 1.1 to 1.3 above is required, it first performs a preparatory operation before issuing the abort local by disk instruction: it stops issuing I/Os to the target disk and places the target disk in an invalid state. It then issues the abort local by disk instruction to the HBA.
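The software-side ordering described above (stop issuing, invalidate, then send the instruction) might look like the following sketch. `DiskHandle`, `submit_io`, `prepare_and_abort`, the dictionary-shaped command, and the `fake_hba` stand-in are all hypothetical; a real driver would use its own structures and HBA interface.

```python
class DiskHandle:
    """Hypothetical software-side record of one disk."""

    def __init__(self, disk_id):
        self.disk_id = disk_id
        self.issuing = True   # new I/Os may be issued
        self.valid = True     # disk is considered present and usable


def submit_io(disk, queue, io):
    """Queue an I/O only while the disk is valid and issuing is allowed."""
    if not (disk.valid and disk.issuing):
        return False
    queue.append(io)
    return True


def prepare_and_abort(disk, send_to_hba):
    """Stop issuing, invalidate the disk, then send abort local by disk."""
    disk.issuing = False
    disk.valid = False
    return send_to_hba({"op": "abort local by disk", "disk": disk.disk_id})


sent = []


def fake_hba(cmd):
    # Stand-in for the HBA interface: record the command and acknowledge it.
    sent.append(cmd)
    return "OK"


disk = DiskHandle(disk_id=3)
queue = []
before = submit_io(disk, queue, "read lba 0")   # accepted
result = prepare_and_abort(disk, fake_hba)
after = submit_io(disk, queue, "read lba 8")    # rejected: disk is invalid
```

Invalidating the disk before the instruction is sent is what keeps old I/Os from reaching a replacement disk while the HBA is still cleaning up.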
The HBA's execution flow for the abort local by disk instruction is shown in FIG. 5 and includes the following steps:
Step A1: scan the internal cache of the SAS controller, clear all I/Os of the target disk (disk A), and recover the hardware resources corresponding to each I/O of the target disk, rather than continuing to dispatch the cached I/Os.
Step A2: return the status of all I/Os of the target disk to the application layer software, so as to notify it to recover the software resources corresponding to each I/O.
Step A3: determine whether all I/Os of the target disk have been emptied. If not, return to step A1 and continue the emptying operation. If so, return the response to the abort local by disk instruction to the application layer software, notifying it that the resources of the disk can be released, and end the exception handling flow.
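Steps A1 to A3 can be sketched as a loop over a simplified cache. This is a software model under stated assumptions, not the hardware design: the cache is a plain list of hypothetical `{"iptt", "disk"}` records, hardware resources are reduced to a list of free IPTT numbers, and `notify` stands in for the response path to the application layer software.

```python
def abort_local_by_disk(cache, free_tags, target_disk, notify):
    """Model of steps A1 to A3 for one target disk."""
    while True:
        # A1: scan the cache, clear the target disk's I/Os, recover resources.
        victims = [io for io in cache if io["disk"] == target_disk]
        for io in victims:
            cache.remove(io)
            free_tags.append(io["iptt"])
        # A2: report the status of each cleared I/O to the application layer.
        for io in victims:
            notify(("io_aborted", io["iptt"]))
        # A3: repeat until no I/O of the target disk remains in the cache.
        if not any(io["disk"] == target_disk for io in cache):
            break
    # Respond to the instruction itself: the disk's resources can be released.
    notify(("abort_local_by_disk_done", target_disk))


cache = [{"iptt": 0, "disk": 1}, {"iptt": 1, "disk": 2}, {"iptt": 2, "disk": 1}]
free_tags = []
events = []
abort_local_by_disk(cache, free_tags, target_disk=1, notify=events.append)
```

In this single-threaded model the loop body runs once; the re-check in A3 matters in hardware, where I/Os held in other stages may still arrive in the cache between scans.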
The abort local by disk instruction can be defined by a convention between software and hardware and uses a specific operation code that instructs the HBA hardware to perform a local, per-disk command reclamation operation. The instruction is not issued to any target disk device, but it must still be allocated an IPTT number so that it can be identified when its response is returned to the application layer software; the number of the target disk to be reclaimed must also be specified.
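One possible software and hardware convention for such an instruction is sketched below. The description only requires an operation code, an IPTT number, and a target disk number; the opcode value `0xF0`, the 8-byte little-endian layout, and the function names are assumptions made purely for illustration.

```python
import struct

# Hypothetical opcode; the real value is fixed by the software/hardware convention.
OP_ABORT_LOCAL_BY_DISK = 0xF0

# Assumed layout: opcode (1 byte), pad (1 byte), IPTT (2 bytes), disk number (4 bytes).
_LAYOUT = "<BxHI"


def encode_abort_local_by_disk(iptt, disk_number):
    """Pack the instruction into an 8-byte descriptor."""
    return struct.pack(_LAYOUT, OP_ABORT_LOCAL_BY_DISK, iptt, disk_number)


def decode_descriptor(raw):
    """Unpack a descriptor into (opcode, iptt, disk_number)."""
    return struct.unpack(_LAYOUT, raw)


raw = encode_abort_local_by_disk(iptt=0x12, disk_number=5)
opcode, iptt, disk_number = decode_descriptor(raw)
```

Carrying the IPTT number in the descriptor lets the response be matched to the request in the same way as for an ordinary I/O, even though nothing is sent to a disk.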
In an alternative embodiment, where the SAS storage device is a target port and the SAS storage device termination instruction is a target port termination instruction, once the application layer software determines that any of operations 2.1 to 2.3 above is required, it first performs a preparatory operation before issuing the abort local by port instruction: it stops issuing I/Os to the target port and places the target port in an invalid state. It then issues the abort local by port instruction to the HBA.
The HBA's processing flow for the abort local by port instruction is shown in FIG. 6 and includes the following steps:
Step B1: scan the internal cache of the SAS controller, clear the I/Os of all disk devices under the target port (port A), and recover the hardware resources corresponding to each I/O of all disks under the target port, rather than continuing to dispatch the cached I/Os.
Step B2: return the status of all I/Os of the target port to the application layer software, so as to notify it to recover the software resources corresponding to each I/O.
Step B3: determine whether all I/Os of the target port have been emptied. If not, return to step B1 and continue the emptying operation. If so, return the response to the abort local by port instruction to the application layer software, notifying it that the resources of the port can be released, and end the exception handling flow.
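The per-port flow differs from the per-disk flow mainly in how the affected I/Os are selected: every disk mounted under the port is covered in one batch. The sketch below assumes a hypothetical `disks_by_port` topology map and the same simplified cache records (`{"iptt", "disk"}`) and `notify` callback as any software model would need; none of these names come from the description.

```python
from collections import Counter


def abort_local_by_port(cache, free_tags, disks_by_port, target_port, notify):
    """Model of steps B1 to B3; returns how many I/Os were cleared per disk."""
    targets = set(disks_by_port.get(target_port, ()))
    cleared = Counter()
    while True:
        # B1: clear the I/Os of every disk under the port, recover resources.
        victims = [io for io in cache if io["disk"] in targets]
        for io in victims:
            cache.remove(io)
            free_tags.append(io["iptt"])
            cleared[io["disk"]] += 1
        # B2: report the status of each cleared I/O to the application layer.
        for io in victims:
            notify(("io_aborted", io["iptt"]))
        # B3: repeat until no I/O for any disk under the port remains.
        if not any(io["disk"] in targets for io in cache):
            break
    notify(("abort_local_by_port_done", target_port))
    return dict(cleared)


cache = [{"iptt": 0, "disk": 1}, {"iptt": 1, "disk": 2},
         {"iptt": 2, "disk": 3}, {"iptt": 3, "disk": 1}]
free_tags = []
events = []
summary = abort_local_by_port(cache, free_tags, {"A": [1, 2], "B": [3]},
                              "A", events.append)
```

Here disks 1 and 2 sit under port A, so three I/Os are cleared in one pass while the I/O for disk 3 under port B is left untouched.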
The abort local by port instruction can likewise be defined by a convention between software and hardware and uses a specific operation code that instructs the HBA hardware to perform a local, per-port command reclamation operation. The instruction is not issued to any target disk device, but it must still be allocated an IPTT number so that it can be identified when its response is returned to the application layer software; the number of the target port to be reclaimed must also be specified.
Therefore, the method for processing the disk abnormality of the SAS storage system provided by the invention has the following advantages:
First, with the hardware processing modes abort local by disk and abort local by port, the time needed to handle abnormal I/O for a disk or a port is reduced from the order of seconds to, at most, the order of milliseconds, which significantly improves exception handling efficiency. Once the hardware responds to an abort local by disk or abort local by port instruction, it guarantees that the I/O of the target device has been completely cleaned up. Because the software I/O timeout is generally designed to be 10 to 30 seconds, without the scheme of the invention the software must wait for a processing period longer than one I/O timeout before it can reasonably conclude that the I/O of the target device has been completely cleaned up.
Second, the hardware processing mode ensures that the pending I/O is completely cleaned up, so the scheme of the invention is more robust. By contrast, the conventional scheme of waiting for I/O timeouts may leave processing incomplete.
Third, the software only needs to issue the customized abort local by disk and abort local by port instructions and wait for their responses to complete the corresponding exception handling. Abnormal I/O no longer needs to be handled through timeouts and scan-and-query, which significantly simplifies the software processing flow.
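The simplified software flow described in the third advantage might look like the following Python sketch. The `submit` callable, the `completions` queue, and the message dictionaries are hypothetical stand-ins for a driver's command and completion paths; the patent does not define such an interface.

```python
import queue


def abort_disk_and_wait(submit, completions, disk_id, iptt, timeout_s=1.0):
    """Issue abort local by disk and block until the HBA answers it.

    `submit` and `completions` are hypothetical stand-ins for the driver's
    command and completion paths; the message layout is also an assumption.
    """
    submit({"op": "abort_local_by_disk", "disk": disk_id, "iptt": iptt})
    reclaimed = []
    while True:
        # queue.Empty is raised if the hardware stays silent for timeout_s.
        msg = completions.get(timeout=timeout_s)
        if msg["kind"] == "io_aborted":
            reclaimed.append(msg["iptt"])   # release the software resources of this I/O
        elif msg["kind"] == "abort_done" and msg["iptt"] == iptt:
            return reclaimed                # the HBA reports the disk's I/O as cleaned up
```

The caller does not track per-I/O timers or poll the device: it waits for the single response that carries the IPTT of the command it issued.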
Accordingly, in a second aspect, the present invention provides a device for processing disk exceptions of an SAS storage system, including:
a termination instruction issuing unit, configured to detect a state of an SAS storage device, and issue, when a condition for initiating a termination instruction of the SAS storage device is satisfied, the termination instruction of the SAS storage device to the HBA through application layer software;
an exception handling unit, configured to, in response to the HBA receiving the SAS storage device termination instruction, clear the I/O cache corresponding to the SAS storage device, reclaim the hardware resources of the I/O corresponding to the SAS storage device, and return the execution result of the SAS storage device termination instruction to the application layer software.
The above device may be implemented using the SAS storage system disk exception handling method provided by the embodiments of the foregoing aspect. For the specific implementation, refer to the description of the method embodiments, which is not repeated here.
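As a rough illustration of how the two units divide the work, the Python sketch below models the termination instruction issuing unit and the exception handling unit for the target-disk case. All class names, the `Trigger` enumeration, and the dictionary layout are assumptions made for this example and do not come from the patent.

```python
from enum import Enum, auto


class Trigger(Enum):
    # Conditions under which a target disk termination instruction is initiated.
    DISK_ABNORMAL = auto()
    DISK_DELETED = auto()
    DISK_REPLACED = auto()


class ExceptionHandlingUnit:
    """Stands in for the HBA side: clears the cache and reclaims resources."""

    def __init__(self, pending):
        self.pending = pending  # hypothetical layout: disk id -> list of pending I/O tags

    def terminate(self, disk_id):
        reclaimed = self.pending.pop(disk_id, [])
        return {"disk": disk_id, "reclaimed": reclaimed, "done": True}


class TerminationInstructionIssuingUnit:
    """Stands in for the software side: watches device state and issues the instruction."""

    def __init__(self, handler):
        self.handler = handler

    def on_state(self, disk_id, trigger=None):
        if trigger is None:          # no initiating condition is met
            return None
        return self.handler.terminate(disk_id)
```

The issuing unit only decides when to send the instruction; every cleanup step lives in the exception handling unit, which returns a single execution result.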
It will be appreciated that the number of data or command transfer processes, the element topologies, and the functional modules described in the above embodiments are merely examples. Those skilled in the art may combine and adapt the structural features of the above embodiments, or adjust the order or parameters of the individual steps of the above method flows, as needed, without limiting the inventive concept to the specific structures and steps illustrated above.
While the invention has been described in detail with reference to the foregoing embodiments, it will be appreciated by those skilled in the art that variations may be made in the techniques described in the foregoing embodiments, or equivalents may be substituted for elements thereof; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for processing disk exceptions of a SAS storage system, characterized by comprising the following steps:
detecting the state of an SAS storage device, and when the condition for initiating an SAS storage device termination instruction is met, issuing the SAS storage device termination instruction to the HBA through application layer software;
and, in response to the HBA receiving the SAS storage device termination instruction, clearing an I/O cache corresponding to the SAS storage device, reclaiming hardware resources of the I/O corresponding to the SAS storage device, and returning an execution result of the SAS storage device termination instruction to the application layer software.
2. The SAS storage system disk exception handling method of claim 1 wherein the SAS storage device termination instruction is a target disk termination instruction and the condition to initiate a SAS storage device termination instruction comprises any one of:
when an abnormality occurs in a target disk and the application layer software needs to terminate and reclaim all I/Os already issued to the target disk;
when the application layer software needs to delete the target disk from the storage system; or
when the application layer software needs to perform a disk replacement operation.
3. The method for processing a disk exception in a SAS storage system of claim 2 wherein said flushing the I/O cache corresponding to the SAS storage device and recovering the hardware resources of the I/O corresponding to the SAS storage device further comprises:
scanning an internal cache of an SAS controller, clearing all I/Os of the target disk, and recovering hardware resources corresponding to each I/O of the target disk;
reporting the status of all I/Os of the target disk to the application layer software, to notify the application layer software to reclaim the software resources corresponding to all I/Os of the target disk;
and determining whether all I/Os of the target disk have been cleared, and if not, returning to continue the clearing operation.
4. The SAS storage system disk exception handling method of claim 1 wherein the SAS storage device termination instruction is a target port termination instruction and the condition for initiating a SAS storage device termination instruction comprises any one of:
when the target port is abnormal and the application layer software needs to terminate and reclaim all I/Os already issued to all disks under the target port;
when the application layer software needs to delete the target port from the storage system; or
when the application layer software needs to perform a port switch operation.
5. The method for processing a disk exception in a SAS storage system of claim 4 wherein said flushing the I/O cache corresponding to the SAS storage device and recovering hardware resources of the I/O corresponding to the SAS storage device further comprises:
scanning an internal cache of an SAS controller, clearing corresponding I/Os of all disks under the target port, and recovering hardware resources corresponding to each I/O of all disks under the target port;
reporting the status of all I/Os of the target port to the application layer software, to notify the application layer software to reclaim the software resources corresponding to all I/Os of all disks under the target port;
and determining whether all I/Os under the target port have been cleared, and if not, returning to continue the clearing operation.
6. A SAS storage system disk exception handling apparatus comprising:
a termination instruction issuing unit, configured to detect a state of an SAS storage device, and issue, when a condition for initiating a termination instruction of the SAS storage device is satisfied, the termination instruction of the SAS storage device to the HBA through application layer software;
an exception handling unit, configured to, in response to the HBA receiving the SAS storage device termination instruction, clear the I/O cache corresponding to the SAS storage device, reclaim the hardware resources of the I/O corresponding to the SAS storage device, and return the execution result of the SAS storage device termination instruction to the application layer software.
7. The SAS storage system disk exception handling apparatus of claim 6 wherein the SAS storage device termination instruction is a target disk termination instruction and the condition to initiate a SAS storage device termination instruction comprises any one of:
when an abnormality occurs in a target disk and the application layer software needs to terminate and reclaim all I/Os already issued to the target disk;
when the application layer software needs to delete the target disk from the storage system; or
when the application layer software needs to perform a disk replacement operation.
8. The SAS storage system disk exception handling apparatus of claim 7, wherein the exception handling unit is further configured to:
scan an internal cache of a SAS controller, clear all I/Os of the target disk, and reclaim the hardware resources corresponding to each I/O of the target disk;
report the status of all I/Os of the target disk to the application layer software, to notify the application layer software to reclaim the software resources corresponding to all I/Os of the target disk;
and determine whether all I/Os of the target disk have been cleared, and if not, return to continue the clearing operation.
9. The SAS storage system disk exception handling apparatus of claim 6 wherein the SAS storage device termination instruction is a target port termination instruction and the condition to initiate a SAS storage device termination instruction comprises any one of:
when the target port is abnormal and the application layer software needs to terminate and reclaim all I/Os already issued to all disks under the target port;
when the application layer software needs to delete the target port from the storage system; or
when the application layer software needs to perform a port switch operation.
10. The SAS storage system disk exception handling apparatus of claim 9, wherein the exception handling unit is further configured to:
scan an internal cache of a SAS controller, clear the corresponding I/Os of all disks under the target port, and reclaim the hardware resources corresponding to each I/O of all disks under the target port;
report the status of all I/Os of the target port to the application layer software, to notify the application layer software to reclaim the software resources corresponding to all I/Os of all disks under the target port;
and determine whether all I/Os under the target port have been cleared, and if not, return to continue the clearing operation.
CN202311293653.9A 2023-10-08 2023-10-08 Method and device for processing disk exception of SAS (serial attached small computer system interface) storage system Active CN117311634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311293653.9A CN117311634B (en) 2023-10-08 2023-10-08 Method and device for processing disk exception of SAS (serial attached small computer system interface) storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311293653.9A CN117311634B (en) 2023-10-08 2023-10-08 Method and device for processing disk exception of SAS (serial attached small computer system interface) storage system

Publications (2)

Publication Number Publication Date
CN117311634A (en) 2023-12-29
CN117311634B (en) 2024-04-12

Family

ID=89284499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311293653.9A Active CN117311634B (en) 2023-10-08 2023-10-08 Method and device for processing disk exception of SAS (serial attached small computer system interface) storage system

Country Status (1)

Country Link
CN (1) CN117311634B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103518A (en) * 2011-02-23 2011-06-22 运软网络科技(上海)有限公司 System for managing resources in virtual environment and implementation method thereof
CN102984002A (en) * 2012-11-27 2013-03-20 华为技术有限公司 Method and device for processing input/output (I/O) overtime
CN105760261A (en) * 2014-12-16 2016-07-13 华为技术有限公司 Business IO (input/output) processing method and device
CN106325761A (en) * 2015-06-29 2017-01-11 中兴通讯股份有限公司 Storage resource management system and method
US20170199667A1 (en) * 2016-01-08 2017-07-13 Oracle International Corporation System and method for scalable processing of abort commands in a host bus adapter system
CN107203451A (en) * 2016-03-18 2017-09-26 伊姆西公司 Method and apparatus for handling failure within the storage system
CN107422989A (en) * 2017-07-27 2017-12-01 深圳市云舒网络技术有限公司 A kind of more copy read methods of Server SAN systems and storage architecture
US10223224B1 (en) * 2016-06-27 2019-03-05 EMC IP Holding Company LLC Method and system for automatic disk failure isolation, diagnosis, and remediation
CN116010142A (en) * 2022-12-19 2023-04-25 浙江大华技术股份有限公司 Method, system and device for processing abnormal IO of storage device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Han Dezhi, Wang Ermei, Song Weiguo: "Overview of SCSI Bus Technology", Journal of Xinyang Normal University (Natural Science Edition), no. 03, 10 July 2000 (2000-07-10), pages 360 - 365 *

Also Published As

Publication number Publication date
CN117311634B (en) 2024-04-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant