WO2024082834A1 - Disk arbitration area detection method, apparatus, device and non-volatile readable storage medium - Google Patents

Disk arbitration area detection method, apparatus, device and non-volatile readable storage medium

Info

Publication number
WO2024082834A1
Authority
WO
WIPO (PCT)
Prior art keywords
disk
arbitration
target
response
arbitration area
Prior art date
Application number
PCT/CN2023/115982
Other languages
English (en)
French (fr)
Inventor
刘晨曦
苑忠科
王见
Original Assignee
苏州元脑智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司
Publication of WO2024082834A1

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/56 External testing equipment for static stores, e.g. automatic test equipment [ATE]; Interfaces therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of computer technology, and in particular to a disk arbitration area detection method, device, equipment and non-volatile readable storage medium.
  • an arbitration disk needs to be set up in the storage system to record arbitration information. Based on the arbitration information, the master node in the storage system can be easily determined, or the storage system can be split into multiple nodes.
  • the RAID (Redundant Array of Independent Disks) member disks in the storage system can be tested to determine, based on the health status of the detected disks, which disks can be selected as arbitration disks.
  • the system then has to carry both IO (input/output) services and detection tasks, so the overall system load may become high, which reduces overall system performance.
  • the purpose of the embodiments of the present application is to provide a disk arbitration area detection method, device, equipment and non-volatile readable storage medium to balance the pressure of disk detection tasks and IO business on the storage system.
  • the detailed scheme is as follows:
  • an embodiment of the present application provides a method for detecting a disk arbitration area, comprising:
  • arbitration area failure mark information is added to the target disk.
  • determining a target node for detecting a target disk based on a preset load balancing strategy includes:
  • a target node for detecting the target disk is determined according to the corresponding relationship.
  • the process of building the corresponding relationship includes:
  • using the target node to detect the response of the arbitration area in the target disk to the IO includes:
  • before determining that the response of the arbitration area to the IO is an IO error message, the method further includes:
  • a repair operation is performed on the current continuous address. If the current continuous address repair fails, a step of determining that the arbitration area's response to the IO is an IO error message is performed.
  • a repair operation is performed on the current continuous address, including:
  • it also includes:
  • the target node is used to read the data stored in the next continuous address adjacent to the current continuous address, and determine whether the data is read successfully.
  • it also includes:
  • the next detection time point is waited for so that the next disk can be traversed when the next detection time point is reached.
  • it also includes:
  • if the response is an IO error message and the target disk is a system arbitration disk,
  • a failure prompt message of the system arbitration disk is generated.
  • the method further includes:
  • reselecting a system arbitration disk based on the tag information of each disk includes:
  • Disks without tag information are taken as candidate disks, and a system arbitration disk is selected from the candidate disks based on the arbitration disk selection strategy.
  • using the target node to detect the response of the arbitration area in the target disk to the IO includes:
  • the processing result of the arbitration area for the read request and/or the write request is used as the response of the arbitration area to the IO.
  • the arbitration area in the target disk is used to store arbitration data; when the target disk is not selected as the system arbitration disk, no data is stored in the arbitration area in the target disk.
  • the arbitration area is located at the end of the storage space of the target disk.
  • using the processing result of the arbitration area for the read request and/or the write request as the response of the arbitration area to the IO includes:
  • if the processing result of the arbitration area for the read request indicates a problem with the response to the read IO, or the processing result for the write request indicates a problem with the response to the write IO, it is determined that the response of the arbitration area to the IO is an IO error message.
  • traverse each disk in the multi-controller storage system including:
  • the timer is set to indicate the interval between detection tasks of the same node for different disks.
  • an embodiment of the present application provides a disk arbitration area detection device, comprising:
  • the traversal module is configured to traverse each disk in the multi-controller storage system, and if the target disk is currently traversed, a target node for detecting the target disk is determined based on a preset load balancing strategy; wherein the target node is any node in the multi-controller storage system;
  • the detection module is configured to detect the response of the arbitration area in the target disk to the IO by using the target node;
  • the marking module is configured to add marking information of arbitration area failure to the target disk if the response is an IO error message and the target disk is not a system arbitration disk.
  • the traversal module is set to:
  • the corresponding relationship between each node in the multi-controller storage system and the disk to be detected is obtained from the preset load balancing strategy; and the target node for detecting the target disk is determined according to the corresponding relationship.
  • the process of building the corresponding relationship includes:
  • All disks in the multi-controller storage system are evenly distributed to each node in the multi-controller storage system to obtain a distribution result; and a corresponding relationship is established according to the distribution result.
  • the detection module includes:
  • the read IO detection unit is configured to use the target node to read the data stored in any segment of continuous addresses in the arbitration area; if the data reading fails, it is determined that the response of the arbitration area to the IO is an IO error message.
  • the detection module further includes:
  • the repair unit is configured to perform a repair operation on the current continuous address before determining that the arbitration area's response to the IO is an IO error message. If the current continuous address repair fails, the step of determining that the arbitration area's response to the IO is an IO error message is executed.
  • the repair unit is set to:
  • the randomly generated data is written to the current continuous address; the randomly generated data is read from the current continuous address; if the read randomly generated data is consistent with the written randomly generated data, it is determined that the current continuous address is repaired successfully; otherwise, it is determined that the current continuous address is repaired unsuccessfully.
  • the detection module further includes:
  • the loop unit is configured to use the target node to read the data stored in the next segment of continuous addresses adjacent to the current continuous address if the data is read successfully or the current continuous address is repaired successfully, when the tail address of the current continuous address is not the end address of the arbitration area, and determine whether the data is read successfully.
  • it also includes:
  • the timing module is configured to wait for the next detection time point if an offline message of the target disk is obtained when reading data or the tail address of the current continuous address is the end address of the arbitration area, so as to traverse to the next disk when the next detection time point is reached.
  • it also includes:
  • the prompt module is configured to generate a fault prompt message of the system arbitration disk if the response is an IO error message and the target disk is a system arbitration disk.
  • it also includes:
  • the arbitration disk replacement module is configured to collect the mark information of each disk in the multi-controller storage system after generating a fault prompt message of the system arbitration disk; and reselect the system arbitration disk based on the mark information of each disk.
  • the arbitration disk replacement module is configured to:
  • Disks without tag information are taken as candidate disks, and a system arbitration disk is selected from the candidate disks based on the arbitration disk selection strategy.
  • the detection module is configured to:
  • the processing result of the arbitration area for the read request and/or the write request is used as the response of the arbitration area to the IO.
  • an electronic device including:
  • a memory arranged to store a computer program
  • the processor is configured to execute a computer program to implement the above-disclosed disk arbitration area detection method.
  • an embodiment of the present application provides a non-volatile readable storage medium, which is configured to store a computer program, wherein the computer program, when executed by a processor, implements the disk arbitration area detection method disclosed above.
  • an embodiment of the present application provides a disk arbitration area detection method, including: traversing each disk in a multi-controller storage system, and if the target disk is currently traversed, determining a target node for detecting the target disk based on a preset load balancing strategy; wherein the target node is any node in the multi-controller storage system; using the target node to detect the response of the arbitration area in the target disk to IO; if the response is an IO error message and the target disk is not a system arbitration disk, adding arbitration area failure mark information to the target disk.
  • the embodiment of the present application detects the health status of each disk by traversing each disk in the multi-controller storage system. When the target disk is traversed, the target node for detecting it is determined based on the preset load balancing strategy, so that all nodes in the system share the detection tasks evenly and no single node carries so many detection tasks that the IO services on that node are affected.
  • the target node determined by the above steps is used to detect the response of the arbitration area in the target disk to IO; if the response is an IO error message and the target disk is not the system arbitration disk, arbitration area failure mark information is added to the target disk, and an arbitration disk can later be selected for the system based on this mark information.
  • the embodiment of the present application detects each disk in the system, that is to say: each disk in the system has a probability of being selected as an arbitration disk, and the selection range is not limited to the RAID member disks in the system. Because the business load pressure of non-RAID member disks in the storage system is generally small, it is more suitable to be selected as an arbitration disk.
  • the embodiment of the present application enables all nodes in the system to evenly bear the detection tasks of each disk in the system, which can balance the pressure of disk detection tasks and IO services on the storage system, and non-RAID member disks with smaller loads in the system may also be selected as arbitration disks, which can optimize the selection strategy of arbitration disks.
  • a disk arbitration area detection device, equipment and non-volatile readable storage medium provided in the embodiments of the present application also have the above-mentioned technical effects.
  • FIG. 1 is a flow chart of a disk arbitration area detection method disclosed in an embodiment of the present application.
  • FIG. 2 is a flow chart of another disk arbitration area detection method disclosed in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a verify IO result processing disclosed in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of result processing of a write verify instruction disclosed in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a disk arbitration area detection device disclosed in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an electronic device disclosed in an embodiment of the present application.
  • the RAID member disks in the storage system can be tested to determine which disks can be selected as arbitration disks based on the detected disk health status.
  • the system needs to undertake both IO services and detection tasks, so the overall system load may be large, which will reduce the overall system performance.
  • the embodiment of the present application provides a disk arbitration area detection solution that can balance the pressure of disk detection tasks and IO services on the storage system and optimize the selection strategy of the arbitration disk.
  • the embodiment of the present application discloses a disk arbitration area detection method, comprising:
  • the multi-controller storage system includes multiple nodes and multiple disks, and all disks are visible to each node.
  • each disk can be evenly allocated to the corresponding node so that each node can evenly bear the detection task of each disk. Assuming that there are 2 nodes and 10 disks in total, each node can bear the detection task of 5 disks.
  • the target node is any node in the multi-controller storage system.
  • determining a target node for detecting a target disk based on a preset load balancing strategy includes: obtaining a correspondence between each node in a multi-controller storage system and a disk to be detected from the preset load balancing strategy; and determining a target node for detecting the target disk according to the correspondence.
  • the corresponding relationship construction process includes: evenly distributing all disks in the multi-controller storage system to each node in the multi-controller storage system to obtain a distribution result; and establishing a corresponding relationship according to the distribution result.
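  • the construction and lookup of the corresponding relationship can be illustrated with a minimal sketch. Round-robin assignment is only one assumed way of distributing disks evenly; the function names and node labels are hypothetical and do not come from the application.

```python
from typing import Dict, List


def build_correspondence(node_ids: List[str], disk_ids: List[int]) -> Dict[int, str]:
    """Assign disks to nodes round-robin so each node gets an (almost) equal share.

    Round-robin is an assumption; the source only requires an even distribution.
    """
    if not node_ids:
        raise ValueError("at least one node is required")
    ordered = sorted(disk_ids)
    return {disk: node_ids[i % len(node_ids)] for i, disk in enumerate(ordered)}


def target_node_for(disk_id: int, correspondence: Dict[int, str]) -> str:
    """Look up the node responsible for detecting the given disk."""
    return correspondence[disk_id]


# 2 nodes and 10 disks, as in the example in the text: 5 disks per node.
mapping = build_correspondence(["node_a", "node_b"], list(range(10)))
per_node = {n: sum(1 for v in mapping.values() if v == n) for n in ("node_a", "node_b")}
```

  • with 2 nodes and 10 disks, `per_node` holds 5 disks for each node, matching the example given above.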
  • the arbitration area in the disk is used to store arbitration data. If a disk is not selected as the system arbitration disk, the arbitration area in the disk does not store any data.
  • the arbitration area can be located at the end of the storage space of each disk, for example: 512M is reserved at the end of each disk as the arbitration area. The size of the arbitration area can be adjusted according to actual conditions.
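  • as a rough illustration of an arbitration area reserved at the end of a disk, the sketch below computes its block address range. It assumes that "512M" means 512 MiB and that the logical block size is 512 bytes; both values and the function name are assumptions made for this example.

```python
ARBITRATION_AREA_BYTES = 512 * 1024 * 1024  # "512M" read as 512 MiB (assumption)
SECTOR_BYTES = 512                          # assumed logical block size


def arbitration_area_lba_range(disk_bytes: int,
                               area_bytes: int = ARBITRATION_AREA_BYTES,
                               sector_bytes: int = SECTOR_BYTES) -> tuple:
    """Return (first_lba, last_lba) of an area reserved at the end of the disk."""
    if disk_bytes < area_bytes:
        raise ValueError("disk is smaller than the arbitration area")
    total_sectors = disk_bytes // sector_bytes
    area_sectors = area_bytes // sector_bytes
    return total_sectors - area_sectors, total_sectors - 1


# Example: a 1 TiB disk.
first_lba, last_lba = arbitration_area_lba_range(2 ** 40)
```

  • changing `area_bytes` corresponds to adjusting the size of the arbitration area according to actual conditions.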
  • the purpose of detecting the response of the arbitration area to IO is to detect whether there is a problem with the read and write functions of the arbitration area.
  • the target node is used to detect the response of the arbitration area in the target disk to IO, including: using the target node to send a read request and/or a write request to the arbitration area; obtaining the processing result of the arbitration area for the read request and/or the write request; using the processing result of the arbitration area for the read request and/or the write request as the response of the arbitration area to IO.
  • the response of the arbitration area to IO includes: a response to read IO and/or a response to write IO. If there is a problem with the response to read IO or the response to write IO, it is considered that the response of the arbitration area to IO is wrong, so the response is determined to be an IO error message.
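  • the rule above (an error on either the read IO or the write IO makes the overall response an IO error message) can be sketched as follows. The constant and function names are hypothetical; `None` is used here, as an assumption, to mean that the corresponding request was not sent.

```python
from typing import Optional

IO_OK = "IO_OK"
IO_ERROR = "IO_ERROR"


def arbitration_area_response(read_ok: Optional[bool] = None,
                              write_ok: Optional[bool] = None) -> str:
    """Combine read and/or write results into one response.

    None means the corresponding request was not sent (read and/or write).
    """
    results = [r for r in (read_ok, write_ok) if r is not None]
    if not results:
        raise ValueError("at least one of read_ok / write_ok must be provided")
    return IO_OK if all(results) else IO_ERROR
```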
  • using the target node to detect the response of the arbitration area in the target disk to IO includes: using the target node to read the data stored in any continuous address in the arbitration area; if the data reading fails, determining that the response of the arbitration area to IO is an IO error message. If no data can be read from the current continuous address, it indicates that the current continuous address is faulty, so it is determined that the response of the arbitration area to IO is wrong.
  • before determining that the arbitration area's response to IO is an IO error message, the method also includes: performing a repair operation on the current continuous address, and, if the repair of the current continuous address fails, executing the step of determining that the arbitration area's response to IO is an IO error message.
  • performing a repair operation on the current continuous address includes: writing randomly generated data to the current continuous address; reading randomly generated data from the current continuous address; if the read randomly generated data is consistent with the written randomly generated data, it is determined that the current continuous address repair is successful; otherwise, it is determined that the current continuous address repair has failed.
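  • the write-then-read-back repair check described above can be sketched against an in-memory stand-in for a disk. `FakeDisk` is a hypothetical test double invented for this example, not a real driver interface; a block listed in `unrepairable` silently drops writes so that the read-back comparison fails.

```python
import os


class FakeDisk:
    """Hypothetical in-memory stand-in for a disk; not a real driver API."""

    def __init__(self, sectors: int, sector_bytes: int = 512, unrepairable=()):
        self.sector_bytes = sector_bytes
        self.data = [bytes(sector_bytes) for _ in range(sectors)]
        self.unrepairable = set(unrepairable)

    def write(self, lba: int, payload: bytes) -> None:
        # Writes to an unrepairable block are lost, simulating a persistent bad block.
        if lba not in self.unrepairable:
            self.data[lba] = payload

    def read(self, lba: int) -> bytes:
        return self.data[lba]


def repair(disk: FakeDisk, lba: int) -> bool:
    """Write random data, read it back, and report whether both copies match."""
    payload = os.urandom(disk.sector_bytes)
    disk.write(lba, payload)
    return disk.read(lba) == payload
```

  • for example, `repair(FakeDisk(8), 3)` reports a successful repair, while the same call on a disk created with `unrepairable={3}` reports a failure.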
  • the target node is used to read the data stored in the next continuous address adjacent to the current continuous address, and it is determined whether the data is read successfully, and processing is performed based on the result of whether the data reading is successful or not.
  • a disk in the storage system may also go offline due to failures or other reasons. Therefore, if the target disk goes offline while data is being read, or the tail address of the current continuous address is the end address of the arbitration area, the method waits for the next detection time point and traverses to the next disk when that time point is reached. It can be seen that this embodiment traverses all disks in the system at fixed intervals. For example, after the current disk is checked, the next disk is checked 3 seconds later.
  • because the arbitration information recorded in the system arbitration disk is very important, the system must be informed promptly to replace the system arbitration disk once a problem with its arbitration area is confirmed. Therefore, in one implementation, if the response is an IO error message and the target disk is the system arbitration disk, a fault prompt message of the system arbitration disk is generated.
  • the method further includes: collecting tag information of each disk in the multi-controller storage system; and reselecting a system arbitration disk based on the tag information of each disk. Reselecting a system arbitration disk based on the tag information of each disk includes: using the disks without tag information as candidate disks, and selecting a system arbitration disk from the candidate disks based on an arbitration disk selection strategy, so that the newly selected system arbitration disk does not have a faulty arbitration area.
  • the arbitration disk selection strategy can be formulated according to existing related technologies.
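  • candidate filtering and reselection can be sketched as below. The tag string and function names are hypothetical, and the default strategy (lowest disk number) is only a placeholder, since the text leaves the selection strategy to existing related technologies.

```python
from typing import Callable, Dict, List, Optional


def candidate_disks(disk_tags: Dict[int, Optional[str]]) -> List[int]:
    """Disks without any tag information are candidates."""
    return sorted(d for d, tag in disk_tags.items() if tag is None)


def reselect_arbitration_disk(disk_tags: Dict[int, Optional[str]],
                              strategy: Callable[[List[int]], int] = min) -> Optional[int]:
    """Pick a new system arbitration disk from the untagged candidates.

    The default strategy (lowest disk number) is a placeholder assumption.
    """
    candidates = candidate_disks(disk_tags)
    return strategy(candidates) if candidates else None


tags = {0: "ARBITRATION_AREA_FAILED", 1: None, 2: None}
```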
  • the disk in the system can be a SAS (Serial Attached SCSI) disk or an NVMe (Non-Volatile Memory Express) disk.
  • this embodiment detects the health status of each disk by traversing each disk in the multi-controller storage system. When the target disk is traversed, the target node for detecting it is determined based on the preset load balancing strategy, so that all nodes in the system share the detection tasks evenly and no single node carries so many detection tasks that the IO services on that node are affected.
  • the target node determined by the above steps is used to detect the response of the arbitration area in the target disk to IO; if the response is an IO error message and the target disk is not the system arbitration disk, arbitration area failure mark information is added to the target disk, and an arbitration disk can be selected for the system based on this mark information.
  • the embodiment of the present application detects each disk in the system, that is to say: each disk in the system has a probability of being selected as an arbitration disk, and the selection range is not limited to the RAID member disks in the system. Because the business load pressure of non-RAID member disks in the storage system is generally small, it is more suitable to be selected as an arbitration disk.
  • this embodiment enables all nodes in the system to evenly bear the detection tasks of each disk in the system, which can balance the pressure of disk detection tasks and IO services on the storage system, and non-RAID member disks with smaller loads in the system may also be selected as arbitration disks, which can optimize the selection strategy of arbitration disks.
  • this embodiment sets up a disk inspection task and a timer for executing it. The execution thread of the inspection task is scrub_scan fiber, and the timer is set so that the interval between detection tasks for different disks on the same node is 3 seconds; the detection interval for different disks on different nodes is also 3 seconds.
  • all managed disks in the storage system are traversed in order of disk number, and one disk is taken every 3 seconds to start the inspection of that disk's arbitration area.
  • the inspection execution thread is scrub_scan fiber. If the disk number exists, the inspection time is up, and no inspection task is currently running on the node used to detect the current disk, IO is sent to the arbitration area of the current disk; the sent IO enters the queue and waits for system scheduling. If the disk number does not exist, or the inspection time is up but the current node is not the node determined for detecting the current disk, the next disk is inspected.
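  • the decision that scrub_scan fiber makes for each disk number can be sketched as a small pure function. The enum and parameter names are hypothetical; the `WAIT` outcome for "time not yet up" and for "node busy" is an assumption, because the text does not state what happens in those two cases.

```python
from enum import Enum


class Action(Enum):
    SEND_IO = "send_io"      # queue a verify IO for the disk's arbitration area
    NEXT_DISK = "next_disk"  # skip to the next disk number
    WAIT = "wait"            # assumption: not specified in the text


def decide(drive_exists: bool, time_is_up: bool,
           node_is_responsible: bool, node_busy: bool) -> Action:
    """Decide what to do for the current disk number."""
    if not drive_exists:
        return Action.NEXT_DISK
    if not time_is_up:
        return Action.WAIT
    if not node_is_responsible:
        return Action.NEXT_DISK
    if node_busy:
        return Action.WAIT
    return Action.SEND_IO
```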
  • the following strategy is designed to select the detection node for each disk.
  • all managed disks are visible to both nodes of the dual-controller system, nodes D0 and D1, and the disk numbers (driveD) of all managed disks are incremented from D0.
  • node D0 inspects the disks whose driveD is even;
  • node D1 inspects the disks whose driveD is odd.
  • the detection tasks of each disk are evenly shared by the two nodes D0 and D1 to avoid affecting the IO business on the node when the node detects each disk.
  • the inspection tasks of all disks are executed on this node. Therefore, for each node, it can be determined whether the current node should execute the current inspection task of the current disk. If the current node should execute the current inspection task of the current disk, it will proceed to the next step. If the current node should not execute the current inspection task of the current disk, this execution ends and waits for 3 seconds before operating the next disk.
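  • the even/odd assignment for the two nodes D0 and D1 can be sketched as a parity check. The function names are hypothetical, and the disk number is assumed to be a non-negative integer.

```python
NODES = ("D0", "D1")  # node names as used in the text


def node_for_drive(drive_number: int) -> str:
    """Even disk numbers go to D0, odd disk numbers go to D1."""
    return NODES[drive_number % 2]


def should_inspect(current_node: str, drive_number: int) -> bool:
    """True if the current node is the one that should inspect this disk."""
    return node_for_drive(drive_number) == current_node
```

  • a node for which `should_inspect` is false ends this execution and waits 3 seconds before handling the next disk, as described above.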
  • the scsi verify instruction can be used to detect whether the target address of the issued IO is readable.
  • the data length read by each issued IO is 512 bytes, and the arbitration area of the disk is 512M.
  • the last->lba (tail address) of the arbitration area of the disk, the length of the data read by this IO, and the address next_lba of the next IO read can be set.
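  • as a worked example of these settings, the sketch below tracks next_lba against last->lba and counts how many verify IOs a full pass needs. It assumes "512M" means 512 MiB and that one IO covers one 512-byte block; the class, its `finished` comparison, and the helper name are hypothetical.

```python
from dataclasses import dataclass

IO_BYTES = 512                    # data length of each issued IO (from the text)
AREA_BYTES = 512 * 1024 * 1024    # "512M" read as 512 MiB (assumption)


@dataclass
class ScanCursor:
    next_lba: int    # address of the next IO
    last_lba: int    # tail address of the arbitration area (last->lba)
    length: int = 1  # blocks covered by one IO (assumed: one 512-byte block)

    def advance(self) -> None:
        """next_lba = current read completion address + length."""
        self.next_lba += self.length

    def finished(self) -> bool:
        """Assumed completion test: next_lba has moved past last_lba."""
        return self.next_lba > self.last_lba


def ios_needed(area_bytes: int = AREA_BYTES, io_bytes: int = IO_BYTES) -> int:
    """Number of IOs needed to cover the whole area (ceiling division)."""
    return -(-area_bytes // io_bytes)


def count_ios(cursor: ScanCursor) -> int:
    """Walk the cursor to the end and count the IOs issued."""
    issued = 0
    while not cursor.finished():
        issued += 1
        cursor.advance()
    return issued
```

  • under these assumptions a full pass over one arbitration area takes 1,048,576 IOs of 512 bytes each.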
  • the IO that has been issued to the queue is processed by the thread scrub_submit fiber in the node to determine whether the data can be read successfully this time. The way scrub_submit fiber processes IO is shown in Figure 3.
  • the thread scrub_submit fiber takes a verify IO from the queue, submits the verify IO to the specified disk, processes the IO on the disk and returns the result (i.e., the verify IO result), which is asynchronously dispatched back to scrub_scan fiber again, and scrub_scan fiber confirms the verify IO result.
  • if the return result is success, the target address of the IO just sent to the disk is in good condition. Whether the inspection of the disk's arbitration area is complete is then judged by comparing next_lba with last->lba.
  • if the inspection is not complete, next_lba is set to the current read completion address + length, and the IO sending process is executed again.
  • if the return result is offline, the disk is offline and no longer needs to be inspected, so the inspection task of the disk stops immediately.
  • if the return result is any other error, the target address of the IO just sent is suspected to be a bad block. If the disk is the arbitration disk of the storage system, the cluster is immediately notified to replace the arbitration disk and the inspection task of the disk ends; if it is not the arbitration disk of the system, the write verify command is submitted to try to repair the block.
  • the write verify command is used to write binary data to suspected bad blocks and compare the read and written data.
  • if the processing result of the write verify instruction is success, the bad block has been repaired and the verify IO request for the next block address is submitted.
  • if the processing result of the write verify instruction is offline, the disk is offline and no longer needs to be inspected,
  • so the inspection task stops immediately.
  • if the processing result of the write verify instruction is any other error, the bad block repair has failed, and the disk is marked as WRITE VERIFY ERROR.
  • the cluster is then notified so that the arbitration disk selection mechanism of the storage system can use this result as a selection weight.
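  • the handling of verify and write verify results described above can be summarized as two small dispatch functions. This is a sketch only: the enum members and function names are hypothetical, and the completion test `next_lba > last_lba` is an assumption about how the two addresses are compared.

```python
from enum import Enum


class Result(Enum):
    SUCCESS = "success"
    OFFLINE = "offline"
    ERROR = "error"  # any other error


class Step(Enum):
    VERIFY_NEXT = "verify_next"              # submit verify IO for the next block
    DONE = "done"                            # arbitration area fully inspected
    STOP_OFFLINE = "stop_offline"            # disk offline, stop inspecting it
    NOTIFY_REPLACE = "notify_replace"        # tell the cluster to replace the arbitration disk
    TRY_WRITE_VERIFY = "try_write_verify"    # attempt repair of a suspected bad block
    MARK_WRITE_VERIFY_ERROR = "mark_error"   # mark disk as WRITE VERIFY ERROR


def after_verify(result: Result, is_system_arbitration_disk: bool,
                 next_lba: int, last_lba: int) -> Step:
    """Next step after a verify IO result."""
    if result is Result.SUCCESS:
        return Step.DONE if next_lba > last_lba else Step.VERIFY_NEXT
    if result is Result.OFFLINE:
        return Step.STOP_OFFLINE
    if is_system_arbitration_disk:
        return Step.NOTIFY_REPLACE
    return Step.TRY_WRITE_VERIFY


def after_write_verify(result: Result) -> Step:
    """Next step after a write verify result."""
    if result is Result.SUCCESS:
        return Step.VERIFY_NEXT
    if result is Result.OFFLINE:
        return Step.STOP_OFFLINE
    return Step.MARK_WRITE_VERIFY_ERROR
```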
  • this embodiment implements bad block detection of the arbitration area of all managed disks in the storage system based on the scsi verify command and the write verify command, which can timely discover bad blocks in the arbitration area of the disk, ensure the reliability of the arbitration disk data, and provide a reliable reference factor for the arbitration selection mechanism of the storage system.
  • this embodiment issues the verify command over the arbitration area of each disk in a sliding manner, one address block at a time, to detect bad blocks, and attempts to repair a bad block by issuing the write verify command. Detection of the current disk can be stopped immediately after a bad block is detected, so the entire arbitration area does not always have to be scanned; because each IO carries little data, detection efficiency is high.
  • a disk arbitration area detection device provided in an embodiment of the present application is introduced below.
  • the disk arbitration area detection device described below and the disk arbitration area detection method described above can be referenced to each other.
  • a disk arbitration area detection device including:
  • the traversal module 501 is configured to traverse each disk in the multi-controller storage system. If the target disk is currently traversed, a target node for detecting the target disk is determined based on a preset load balancing strategy; wherein the target node is any node in the multi-controller storage system;
  • the detection module 502 is configured to detect the response of the arbitration area in the target disk to the IO by using the target node;
  • the marking module 503 is configured to add marking information of arbitration area failure to the target disk if the response is an IO error message and the target disk is not a system arbitration disk.
  • the traversal module is configured as:
  • the corresponding relationship between each node in the multi-controller storage system and the disk to be detected is obtained from the preset load balancing strategy; and the target node for detecting the target disk is determined according to the corresponding relationship.
  • the process of constructing the corresponding relationship includes:
  • All disks in the multi-controller storage system are evenly distributed to each node in the multi-controller storage system to obtain a distribution result; and a corresponding relationship is established according to the distribution result.
  • the detection module includes:
  • the read IO detection unit is configured to use the target node to read the data stored in any segment of continuous addresses in the arbitration area; if the data reading fails, it is determined that the response of the arbitration area to the IO is an IO error message.
  • the detection module further includes:
  • the repair unit is configured to perform a repair operation on the current continuous address before determining that the arbitration area's response to the IO is an IO error message. If the current continuous address repair fails, the step of determining that the arbitration area's response to the IO is an IO error message is executed.
  • the repair unit is configured to:
  • the randomly generated data is written to the current continuous address; the randomly generated data is read from the current continuous address; if the read randomly generated data is consistent with the written randomly generated data, it is determined that the current continuous address is repaired successfully; otherwise, it is determined that the current continuous address is repaired unsuccessfully.
  • the detection module further includes:
  • the loop unit is configured to use the target node to read the data stored in the next segment of continuous addresses adjacent to the current continuous address if the data is read successfully or the current continuous address is repaired successfully, when the tail address of the current continuous address is not the end address of the arbitration area, and determine whether the data is read successfully.
  • it further includes:
  • the timing module is configured to wait for the next detection time point if an offline message of the target disk is obtained when reading data or the tail address of the current continuous address is the end address of the arbitration area, so as to traverse to the next disk when the next detection time point is reached.
  • it further includes:
  • the prompt module is configured to generate a fault prompt message of the system arbitration disk if the response is an IO error message and the target disk is a system arbitration disk.
  • it further includes:
  • the arbitration disk replacement module is configured to collect the mark information of each disk in the multi-controller storage system after generating a fault prompt message of the system arbitration disk; and reselect the system arbitration disk based on the mark information of each disk.
  • the arbitration disk replacement module is configured to:
  • disks without mark information are taken as candidate disks, and a system arbitration disk is selected from the candidate disks based on the arbitration disk selection strategy.
  • the detection module is configured to:
  • the processing result of the arbitration area for the read request and/or the write request is used as the response of the arbitration area to the IO.
  • this embodiment provides a disk arbitration area detection device, which can balance the pressure of disk detection tasks and IO services on the storage system and optimize the arbitration disk selection strategy.
  • An electronic device provided in an embodiment of the present application is introduced below.
  • the electronic device described below and the disk arbitration area detection method and device described above can be referenced to each other.
  • an electronic device including:
  • Memory 601 configured to store computer programs
  • the processor 602 is configured to execute a computer program to implement the method disclosed in any of the above embodiments.
  • the following steps can be implemented: traverse each disk in the multi-controller storage system, and if the target disk is currently traversed, determine the target node for detecting the target disk based on a preset load balancing strategy; wherein the target node is any node in the multi-controller storage system; use the target node to detect the response of the arbitration area in the target disk to IO; if the response is an IO error message and the target disk is not a system arbitration disk, add arbitration area failure mark information to the target disk.
  • the following steps can be implemented: obtaining the correspondence between each node in the multi-controller storage system and the disk it needs to detect from the preset load balancing strategy; determining the target node for detecting the target disk according to the correspondence.
  • the following steps can be implemented: allocating all disks in the multi-controller storage system to each node in the multi-controller storage system evenly to obtain an allocation result; and establishing a corresponding relationship according to the allocation result.
  • the following steps can be implemented: using the target node to read the data stored in any continuous address in the arbitration area; if the data reading fails, determining that the response of the arbitration area to the IO is an IO error message.
  • the following steps can be implemented: performing a repair operation on the current continuous address, and if the current continuous address repair fails, executing the step of determining that the arbitration area's response to the IO is an IO error message.
  • the following steps can be implemented: writing the randomly generated data to the current continuous address; reading the randomly generated data from the current continuous address; if the read randomly generated data is consistent with the written randomly generated data, it is determined that the current continuous address is repaired successfully; otherwise, it is determined that the current continuous address is repaired unsuccessfully.
  • the following steps can be implemented: if the data is read successfully or the current continuous address is repaired successfully, when the tail address of the current continuous address is not the end address of the arbitration area, the target node is used to read the data stored in the next continuous address adjacent to the current continuous address, and it is determined whether the data is read successfully.
  • the following steps can be implemented: if an offline message of the target disk is obtained when reading data or the tail address of the current continuous address is the end address of the arbitration area, wait for the next detection time point so that when the next detection time point is reached, traverse to the next disk.
  • the following steps may be implemented: if the response is an IO error message and the target disk is a system arbitration disk, a fault prompt message of the system arbitration disk is generated.
  • the following steps can be implemented: collecting the mark information of each disk in the multi-controller storage system; and reselecting the system arbitration disk based on the mark information of each disk.
  • the following steps can be implemented: taking disks without mark information as candidate disks, and selecting a system arbitration disk from the candidate disks based on the arbitration disk selection strategy.
  • the following steps can be implemented: using the target node to send a read request and/or a write request to the arbitration area; obtaining the processing result of the arbitration area for the read request and/or the write request; using the processing result of the arbitration area for the read request and/or the write request as the response of the arbitration area to the IO.
  • an embodiment of the present application further provides a server as the above-mentioned electronic device.
  • the server may include: at least one processor, at least one memory, a power supply, a communication interface, an input/output interface, and a communication bus.
  • the memory is configured to store a computer program, which is loaded and executed by the processor to implement the relevant steps in the disk arbitration area detection method disclosed in any of the above-mentioned embodiments.
  • the power supply is configured to provide operating voltage for each hardware device on the server;
  • the communication interface can create a data transmission channel between the server and external devices, and the communication protocol it follows is any communication protocol that can be applied to the technical solution of the embodiment of the present application, and is not limited here;
  • the input and output interface is configured to obtain external input data or output data to the outside world, and its actual interface type can be selected according to actual application needs and is not limited here.
  • the memory, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk, an optical disc, etc.
  • the resources stored thereon include operating system, computer programs and data, etc.
  • the storage method can be temporary storage or permanent storage.
  • the operating system is configured to manage and control the hardware devices and computer programs on the server to realize the operation and processing of the data in the memory by the processor.
  • the operating system may be Windows Server, Netware, Unix, Linux, etc.
  • the computer program may also include computer programs that can be used to complete other specific tasks.
  • the data may also include data such as information about the developer of the virtual machine.
  • the embodiment of the present application further provides a terminal as the above electronic device.
  • the terminal may include but is not limited to a smart phone, a tablet computer, a laptop computer or a desktop computer.
  • the terminal in this embodiment includes: a processor and a memory.
  • the processor may include one or more processing cores, such as a 4-core processor, an 8-core processor, etc.
  • the processor may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor may also include a main processor and a coprocessor.
  • the main processor is a processor configured to process data in an awake state, also known as a CPU (Central Processing Unit);
  • the coprocessor is a low-power processor configured to process data in a standby state.
  • the processor may be integrated with a GPU (Graphics Processing Unit), and the GPU is configured to be responsible for rendering and drawing the content to be displayed on the display screen.
  • the processor may also include an AI (Artificial Intelligence) processor, which is configured to process computing operations related to machine learning.
  • the memory may include one or more non-volatile readable storage media, which may be non-transitory.
  • the memory may also include high-speed random access memory, and non-volatile memory, such as one or more disk storage devices, flash memory storage devices.
  • the memory is at least configured to store the following computer program, wherein, after the computer program is loaded and executed by the processor, it can implement the relevant steps in the disk arbitration area detection method performed by the terminal side disclosed in any of the aforementioned embodiments.
  • the resources stored in the memory may also include an operating system and data, etc., and the storage method may be temporary storage or permanent storage.
  • the operating system may include Windows, Unix, Linux, etc.
  • the data may include, but is not limited to, update information of the application.
  • the terminal may also include a display screen, an input and output interface, a communication interface, a sensor, a power supply, and a communication bus.
  • a non-volatile readable storage medium provided in an embodiment of the present application is introduced below.
  • the non-volatile readable storage medium described below and the disk arbitration area detection method, device and apparatus described above can be cross-referenced.
  • a non-volatile readable storage medium is configured to store a computer program, wherein when the computer program is executed by a processor, the disk arbitration area detection method disclosed in the above embodiment is implemented.
  • the following steps can be implemented: traverse each disk in the multi-controller storage system, and if the target disk is currently traversed, determine the target node for detecting the target disk based on the preset load balancing strategy; wherein the target node is any node in the multi-controller storage system; use the target node to detect the response of the arbitration area in the target disk to IO; if the response is an IO error message and the target disk is not the system arbitration disk, add arbitration area failure mark information to the target disk.
  • the following steps can be implemented: obtaining the correspondence between each node in the multi-controller storage system and the disk it needs to detect from the preset load balancing strategy; determining the target node for detecting the target disk according to the correspondence.
  • the following steps can be implemented: allocating all disks in the multi-controller storage system to each node in the multi-controller storage system evenly to obtain an allocation result; and establishing a corresponding relationship according to the allocation result.
  • the following steps can be implemented: using the target node to read the data stored in any continuous address segment in the arbitration area; if the data reading fails, determining that the arbitration area's response to the IO is an IO error message.
  • the following steps can be implemented: performing a repair operation on the current continuous address, and if the current continuous address repair fails, executing the step of determining that the arbitration area's response to the IO is an IO error message.
  • the following steps can be implemented: writing the randomly generated data to the current continuous address; reading the randomly generated data from the current continuous address; if the read randomly generated data is consistent with the written randomly generated data, it is determined that the current continuous address is repaired successfully; otherwise, it is determined that the current continuous address is repaired unsuccessfully.
  • the following steps can be implemented: if the data is read successfully or the current continuous address is repaired successfully, then when the tail address of the current continuous address is not the end address of the arbitration area, the target node is used to read the data stored in the next continuous address adjacent to the current continuous address, and it is determined whether the data is read successfully.
  • the following steps can be implemented: if an offline message of the target disk is obtained when reading data or the tail address of the current continuous address is the end address of the arbitration area, wait for the next detection time point so that when the next detection time point is reached, traverse to the next disk.
  • the following steps can be implemented: if the response is an IO error message and the target disk is a system arbitration disk, a fault prompt message of the system arbitration disk is generated.
  • the following steps can be implemented: collecting the mark information of each disk in the multi-controller storage system; reselecting the system arbitration disk based on the mark information of each disk.
  • the following steps can be implemented: disks without mark information are used as candidate disks, and a system arbitration disk is selected from the candidate disks based on the arbitration disk selection strategy.
  • the following steps can be implemented: using the target node to send a read request and/or a write request to the arbitration area; obtaining the processing result of the arbitration area for the read request and/or the write request; using the processing result of the arbitration area for the read request and/or the write request as the response of the arbitration area to the IO.
  • the steps of the method or algorithm described in conjunction with the embodiments disclosed herein may be implemented directly using hardware, a software module executed by a processor, or a combination of the two.
  • the software module may be placed in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of non-volatile readable storage medium known in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

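A disk arbitration area detection method, apparatus, device, and non-volatile readable storage medium. The method includes: traversing each disk in a multi-controller storage system and, if a target disk is currently traversed, determining a target node for detecting the target disk based on a preset load balancing strategy (S101); using the target node to detect the response of the arbitration area in the target disk to IO (S102); and, if the response is an IO error message and the target disk is not the system arbitration disk, adding mark information indicating an arbitration area fault to the target disk (S103). All nodes in the system thus share the detection tasks for the disks evenly, which balances the pressure that disk detection tasks and IO services place on the storage system. Lightly loaded non-RAID member disks may also be selected as the arbitration disk, which optimizes the arbitration disk selection strategy. A disk arbitration area detection apparatus, device, and non-volatile readable storage medium have the same technical effects.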

Description

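Disk arbitration area detection method, apparatus, device, and non-volatile readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202211269936.5, filed with the China Patent Office on October 18, 2022 and entitled "Disk arbitration area detection method, apparatus, device, and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a disk arbitration area detection method, apparatus, device, and non-volatile readable storage medium.
Background
At present, a storage system needs an arbitration disk to record arbitration information. Based on this arbitration information, the master control node of the storage system can be readily determined, or the storage system can be split into several parts. Currently, the RAID (Redundant Arrays of Independent Disks) member disks in a storage system can be detected, and the detected disk health determines which disks may be selected as the arbitration disk. While the RAID member disks are being detected, the system must carry both IO (Input/Output) services and detection tasks, so the overall system load may be high and overall system performance may drop.
How to balance the pressure that disk detection tasks and IO services place on the storage system is a problem that those skilled in the art need to solve.
Summary
In view of this, the purpose of the embodiments of the present application is to provide a disk arbitration area detection method, apparatus, device, and non-volatile readable storage medium that balance the pressure that disk detection tasks and IO services place on the storage system. The detailed solution is as follows: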
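In a first aspect, an embodiment of the present application provides a disk arbitration area detection method, including:
traversing each disk in a multi-controller storage system and, if a target disk is currently traversed, determining a target node for detecting the target disk based on a preset load balancing strategy; wherein the target node is any node in the multi-controller storage system;
using the target node to detect the response of the arbitration area in the target disk to IO;
if the response is an IO error message and the target disk is not the system arbitration disk, adding mark information indicating an arbitration area fault to the target disk.
Optionally, determining the target node for detecting the target disk based on the preset load balancing strategy includes:
obtaining, from the preset load balancing strategy, the correspondence between each node in the multi-controller storage system and the disks it needs to detect;
determining the target node for detecting the target disk according to the correspondence.
Optionally, the process of constructing the correspondence includes:
distributing all disks in the multi-controller storage system evenly among the nodes in the multi-controller storage system to obtain a distribution result;
establishing the correspondence according to the distribution result.
Optionally, using the target node to detect the response of the arbitration area in the target disk to IO includes:
using the target node to read the data stored in any segment of continuous addresses in the arbitration area;
if the data reading fails, determining that the response of the arbitration area to IO is an IO error message.
Optionally, before determining that the response of the arbitration area to IO is an IO error message, the method further includes:
performing a repair operation on the current continuous address and, if the repair of the current continuous address fails, executing the step of determining that the response of the arbitration area to IO is an IO error message.
Optionally, performing a repair operation on the current continuous address includes:
writing randomly generated data to the current continuous address;
reading the randomly generated data from the current continuous address;
if the randomly generated data read is consistent with the randomly generated data written, determining that the current continuous address is repaired successfully; otherwise, determining that the repair of the current continuous address has failed.
Optionally, the method further includes:
if the data is read successfully or the current continuous address is repaired successfully, then, when the tail address of the current continuous address is not the end address of the arbitration area, using the target node to read the data stored in the next segment of continuous addresses adjacent to the current continuous address, and determining whether the data is read successfully.
Optionally, the method further includes:
if an offline message of the target disk is obtained while reading data, or the tail address of the current continuous address is the end address of the arbitration area, waiting for the next detection time point so that the next disk is traversed when that time point is reached.
Optionally, the method further includes:
if the response is an IO error message and the target disk is the system arbitration disk, generating a fault prompt message for the system arbitration disk.
Optionally, after generating the fault prompt message for the system arbitration disk, the method further includes:
collecting the mark information of each disk in the multi-controller storage system;
reselecting the system arbitration disk based on the mark information of each disk.
Optionally, reselecting the system arbitration disk based on the mark information of each disk includes:
taking disks without mark information as candidate disks, and selecting the system arbitration disk from the candidate disks based on an arbitration disk selection strategy.
Optionally, using the target node to detect the response of the arbitration area in the target disk to IO includes:
using the target node to issue a read request and/or a write request to the arbitration area;
obtaining the processing result of the arbitration area for the read request and/or the write request;
taking the processing result of the arbitration area for the read request and/or the write request as the response of the arbitration area to IO.
Optionally, when the target disk is selected as the system arbitration disk, the arbitration area in the target disk is used to store arbitration data; when the target disk is not selected as the system arbitration disk, the arbitration area in the target disk stores no data.
Optionally, the arbitration area is located at the tail of the storage space of the target disk.
Optionally, taking the processing result of the arbitration area for the read request and/or the write request as the response of the arbitration area to IO includes:
when the processing result for the read request indicates a problem with the read IO response, or the processing result for the write request indicates a problem with the write IO response, determining that the response of the arbitration area to IO is an IO error message.
Optionally, traversing each disk in the multi-controller storage system includes:
traversing all managed disks in the multi-controller storage system by disk number according to a timer setting.
Optionally, the timer setting indicates the interval between the detection tasks that the same node performs on different disks.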
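In a second aspect, an embodiment of the present application provides a disk arbitration area detection apparatus, including:
a traversal module, configured to traverse each disk in a multi-controller storage system and, if a target disk is currently traversed, determine a target node for detecting the target disk based on a preset load balancing strategy; wherein the target node is any node in the multi-controller storage system;
a detection module, configured to use the target node to detect the response of the arbitration area in the target disk to IO;
a marking module, configured to add mark information indicating an arbitration area fault to the target disk if the response is an IO error message and the target disk is not the system arbitration disk.
Optionally, the traversal module is configured to:
obtain, from the preset load balancing strategy, the correspondence between each node in the multi-controller storage system and the disks it needs to detect; and determine the target node for detecting the target disk according to the correspondence.
Optionally, the process of constructing the correspondence includes:
distributing all disks in the multi-controller storage system evenly among the nodes in the multi-controller storage system to obtain a distribution result; and establishing the correspondence according to the distribution result.
Optionally, the detection module includes:
a read IO detection unit, configured to use the target node to read the data stored in any segment of continuous addresses in the arbitration area and, if the data reading fails, determine that the response of the arbitration area to IO is an IO error message.
Optionally, the detection module further includes:
a repair unit, configured to perform a repair operation on the current continuous address before it is determined that the response of the arbitration area to IO is an IO error message and, if the repair of the current continuous address fails, execute the step of determining that the response of the arbitration area to IO is an IO error message.
Optionally, the repair unit is configured to:
write randomly generated data to the current continuous address; read the randomly generated data from the current continuous address; if the randomly generated data read is consistent with the randomly generated data written, determine that the current continuous address is repaired successfully; otherwise, determine that the repair of the current continuous address has failed.
Optionally, the detection module further includes:
a loop unit, configured to, if the data is read successfully or the current continuous address is repaired successfully, and the tail address of the current continuous address is not the end address of the arbitration area, use the target node to read the data stored in the next segment of continuous addresses adjacent to the current continuous address and determine whether the data is read successfully.
Optionally, the apparatus further includes:
a timing module, configured to wait for the next detection time point if an offline message of the target disk is obtained while reading data or the tail address of the current continuous address is the end address of the arbitration area, so that the next disk is traversed when that time point is reached.
Optionally, the apparatus further includes:
a prompt module, configured to generate a fault prompt message for the system arbitration disk if the response is an IO error message and the target disk is the system arbitration disk.
Optionally, the apparatus further includes:
an arbitration disk replacement module, configured to, after the fault prompt message for the system arbitration disk is generated, collect the mark information of each disk in the multi-controller storage system and reselect the system arbitration disk based on the mark information of each disk.
Optionally, the arbitration disk replacement module is configured to:
take disks without mark information as candidate disks, and select the system arbitration disk from the candidate disks based on an arbitration disk selection strategy.
Optionally, the detection module is configured to:
use the target node to issue a read request and/or a write request to the arbitration area;
obtain the processing result of the arbitration area for the read request and/or the write request;
take the processing result of the arbitration area for the read request and/or the write request as the response of the arbitration area to IO.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory, configured to store a computer program;
a processor, configured to execute the computer program to implement the disk arbitration area detection method disclosed above.
In a fourth aspect, an embodiment of the present application provides a non-volatile readable storage medium configured to store a computer program, wherein the computer program, when executed by a processor, implements the disk arbitration area detection method disclosed above.
As can be seen from the above solution, an embodiment of the present application provides a disk arbitration area detection method, including: traversing each disk in a multi-controller storage system and, if a target disk is currently traversed, determining a target node for detecting the target disk based on a preset load balancing strategy, wherein the target node is any node in the multi-controller storage system; using the target node to detect the response of the arbitration area in the target disk to IO; and, if the response is an IO error message and the target disk is not the system arbitration disk, adding mark information indicating an arbitration area fault to the target disk.
It can be seen that the embodiment of the present application checks the health of each disk by traversing the disks in the multi-controller storage system. When a target disk is traversed, the target node for detecting it is determined based on the preset load balancing strategy, so all nodes in the system share the detection tasks evenly and no single node carries so many detection tasks that its IO services are affected. During actual detection, the target node determined in the above step detects the response of the arbitration area in the target disk to IO; if the response is an IO error message and the target disk is not the system arbitration disk, mark information indicating an arbitration area fault is added to the target disk, and this mark information can later be used to select an arbitration disk for the system. Moreover, the embodiment detects every disk in the system, so every disk may be selected as the arbitration disk and the selection is not limited to the RAID member disks. Because non-RAID member disks in a storage system generally carry a lighter service load, they are better suited to serve as the arbitration disk. In short, the embodiment lets all nodes share the detection tasks evenly, balances the pressure that disk detection tasks and IO services place on the storage system, and allows lightly loaded non-RAID member disks to be selected as the arbitration disk, which optimizes the arbitration disk selection strategy.
Correspondingly, the disk arbitration area detection apparatus, device, and non-volatile readable storage medium provided in the embodiments of the present application have the same technical effects.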
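Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flowchart of a disk arbitration area detection method disclosed in an embodiment of the present application;
Figure 2 is a flowchart of another disk arbitration area detection method disclosed in an embodiment of the present application;
Figure 3 is a schematic diagram of the result handling for a verify IO disclosed in an embodiment of the present application;
Figure 4 is a schematic diagram of the result handling for a write verify command disclosed in an embodiment of the present application;
Figure 5 is a schematic diagram of a disk arbitration area detection apparatus disclosed in an embodiment of the present application;
Figure 6 is a schematic diagram of an electronic device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the embodiments of the present application.
At present, the RAID member disks in a storage system can be detected, and the detected disk health determines which disks may be selected as the arbitration disk. While the RAID member disks are being detected, the system must carry both IO services and detection tasks, so the overall system load may be high and overall system performance may drop. To this end, the embodiments of the present application provide a disk arbitration area detection solution that balances the pressure that disk detection tasks and IO services place on the storage system and optimizes the arbitration disk selection strategy.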
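Referring to Figure 1, an embodiment of the present application discloses a disk arbitration area detection method, including:
S101. Traverse each disk in the multi-controller storage system and, if a target disk is currently traversed, determine a target node for detecting the target disk based on a preset load balancing strategy.
It should be noted that a multi-controller storage system includes multiple nodes and multiple disks, and all disks are visible to every node. So that the nodes share the disk detection tasks, the disks can be distributed evenly among the nodes, and each node then carries a balanced share of the detection work. For example, with 2 nodes and 10 disks, each node can take on the detection tasks for 5 disks. The target node is any node in the multi-controller storage system.
In one implementation, determining the target node for detecting the target disk based on the preset load balancing strategy includes: obtaining, from the preset load balancing strategy, the correspondence between each node in the multi-controller storage system and the disks it needs to detect; and determining the target node for detecting the target disk according to the correspondence. The process of constructing the correspondence includes: distributing all disks in the multi-controller storage system evenly among the nodes to obtain a distribution result, and establishing the correspondence according to the distribution result.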
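The correspondence between nodes and disks can be illustrated with a minimal sketch. Round-robin assignment is only one way to distribute disks evenly and is an assumption here; the function names `build_correspondence` and `target_node_for` and the node labels are hypothetical and do not come from the application.

```python
from collections import Counter


def build_correspondence(disk_ids, node_ids):
    """Spread disks over nodes round-robin and return {disk_id: node_id}.

    Round-robin is an illustrative choice; the source only requires that
    disks be distributed evenly among the nodes.
    """
    if not node_ids:
        raise ValueError("at least one node is required")
    ordered = sorted(disk_ids)
    return {disk: node_ids[i % len(node_ids)] for i, disk in enumerate(ordered)}


def target_node_for(correspondence, disk_id):
    """Look up the node responsible for detecting the given disk."""
    return correspondence[disk_id]


# The 2-node, 10-disk example from the text: each node gets 5 disks.
correspondence = build_correspondence(range(10), ["node0", "node1"])
load = Counter(correspondence.values())
```

With this mapping built once, determining the target node during traversal is a single dictionary lookup.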
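S102. Use the target node to detect the response of the arbitration area in the target disk to IO.
It should be noted that part of the space in each disk serves as the arbitration area, while the remaining space stores service data. If a disk is selected as the system arbitration disk, its arbitration area stores arbitration data. If a disk is not selected as the system arbitration disk, its arbitration area stores no data. The arbitration area may be located at the tail of each disk's storage space, for example by reserving 512M at the end of each disk. The size of the arbitration area can be adjusted as needed.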
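As a worked example of placing the arbitration area at the tail of a disk, the sketch below computes the first and last logical block addresses of the reserved region. It assumes 512-byte logical blocks and reads "512M" as 512 MiB; both are assumptions, and `arbitration_range` is a hypothetical helper name.

```python
SECTOR_BYTES = 512                      # assumed logical block size
ARBITRATION_BYTES = 512 * 1024 * 1024   # "512M" interpreted as 512 MiB (assumption)


def arbitration_range(disk_bytes, arb_bytes=ARBITRATION_BYTES, sector=SECTOR_BYTES):
    """Return (first_lba, last_lba) of an arbitration area reserved at the disk tail."""
    if disk_bytes < arb_bytes:
        raise ValueError("disk is smaller than the arbitration area")
    total_blocks = disk_bytes // sector
    arb_blocks = arb_bytes // sector
    return total_blocks - arb_blocks, total_blocks - 1


# A 1 GiB disk has 2097152 blocks; the last 1048576 of them form the area.
first_lba, last_lba = arbitration_range(1024 ** 3)
```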
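In this embodiment, the purpose of detecting the response of the arbitration area to IO is to check whether the read and write functions of the arbitration area work properly. To this end, using the target node to detect the response of the arbitration area in the target disk to IO includes: using the target node to issue a read request and/or a write request to the arbitration area; obtaining the processing result of the arbitration area for the read request and/or the write request; and taking that processing result as the response of the arbitration area to IO. The response of the arbitration area to IO therefore includes the response to read IO and/or the response to write IO. If either response indicates a problem, the response of the arbitration area to IO is considered erroneous, and the response is determined to be an IO error message.
Because the arbitration area is one contiguous block of storage space in which individual addresses may be faulty, the arbitration area can be detected segment by segment in order to find those faulty addresses. In one implementation, using the target node to detect the response of the arbitration area in the target disk to IO includes: using the target node to read the data stored in any segment of continuous addresses in the arbitration area; and, if the data reading fails, determining that the response of the arbitration area to IO is an IO error message. If no data can be read from the current continuous address, the current continuous address is faulty, so the response of the arbitration area to IO is determined to be erroneous.
Of course, a read error may be incidental, so when data cannot be read from a segment of continuous addresses, a repair operation can be attempted. Only after the repair fails is the response of the arbitration area to IO considered erroneous; if the repair succeeds, detection continues on the other addresses in the arbitration area. In one implementation, before determining that the response of the arbitration area to IO is an IO error message, the method further includes: performing a repair operation on the current continuous address and, if the repair fails, executing the step of determining that the response of the arbitration area to IO is an IO error message. Performing a repair operation on the current continuous address includes: writing randomly generated data to the current continuous address; reading the randomly generated data from the current continuous address; if the data read is consistent with the data written, determining that the current continuous address is repaired successfully; otherwise, determining that the repair has failed.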
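The write-then-read-back repair described above can be sketched as follows. The `write` and `read` callables stand in for whatever block IO interface the target node uses; they, the `repair_segment` name, and the choice to treat an `OSError` as a failed repair are assumptions made for illustration.

```python
import os


def repair_segment(write, read, lba, length):
    """Try to repair a segment by writing random data and reading it back.

    write(lba, data) and read(lba, length) are hypothetical device callbacks.
    Returns True when the data read back matches the data written.
    """
    payload = os.urandom(length)  # randomly generated data
    try:
        write(lba, payload)
        return read(lba, length) == payload
    except OSError:
        # An IO error during the repair attempt counts as a failed repair.
        return False


# In-memory stand-ins for a healthy segment and a segment that returns zeros.
store = {}

def good_write(lba, data):
    store[lba] = data

def good_read(lba, length):
    return store[lba][:length]

def stuck_read(lba, length):
    return b"\x00" * length

healthy = repair_segment(good_write, good_read, 0, 512)
broken = repair_segment(good_write, stuck_read, 0, 512)
```

Only when such a repair fails would the arbitration area's response be treated as an IO error message.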
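In one implementation, if the data is read successfully or the current continuous address is repaired successfully, then, when the tail address of the current continuous address is not the end address of the arbitration area, the target node reads the data stored in the next segment of continuous addresses adjacent to the current continuous address, determines whether the data is read successfully, and proceeds according to that result.
Of course, a disk in the storage system may also go offline because of a fault or for other reasons. Therefore, if an offline message of the target disk is obtained while reading data, or the tail address of the current continuous address is the end address of the arbitration area, the method waits for the next detection time point so that the next disk is traversed when that time point is reached. This embodiment can thus traverse all disks in the system on a timer, for example detecting the next disk 3 seconds after detection of the current disk finishes.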
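The segment-by-segment loop, including the offline and end-of-area exits, can be sketched as below. The status strings, the `probe` and `repair` callbacks, and the `scan_region` name are hypothetical; the sketch only mirrors the control flow described in the text.

```python
def scan_region(probe, repair, start_lba, end_lba, step=1):
    """Walk the arbitration area one segment at a time.

    probe(lba) returns "success", "offline", or any other string for an error.
    repair(lba) returns True when the segment was repaired.
    Returns "healthy", "offline", or "io_error".
    """
    lba = start_lba
    while lba <= end_lba:
        status = probe(lba)
        if status == "offline":
            return "offline"        # disk went offline: stop and wait for next slot
        if status != "success" and not repair(lba):
            return "io_error"       # read failed and repair failed
        lba += step                 # move on to the adjacent segment
    return "healthy"                # reached the end address of the area


def flaky_probe(lba):
    # Hypothetical device in which only segment 5 fails to read.
    return "error" if lba == 5 else "success"

all_good = scan_region(lambda lba: "success", lambda lba: False, 0, 9)
repaired = scan_region(flaky_probe, lambda lba: True, 0, 9)
failed = scan_region(flaky_probe, lambda lba: False, 0, 9)
offline = scan_region(lambda lba: "offline", lambda lba: True, 0, 9)
```

Whatever the outcome, the caller would then wait for the next detection time point before moving to the next disk.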
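Because the arbitration information recorded in the system arbitration disk is very important, once the arbitration area of the system arbitration disk is confirmed to have a problem, the system must be told promptly to replace the system arbitration disk. Therefore, in one implementation, if the response is an IO error message and the target disk is the system arbitration disk, a fault prompt message for the system arbitration disk is generated.
In one implementation, after the fault prompt message for the system arbitration disk is generated, the method further includes: collecting the mark information of each disk in the multi-controller storage system, and reselecting the system arbitration disk based on the mark information of each disk. Reselecting the system arbitration disk based on the mark information includes: taking disks without mark information as candidate disks, and selecting the system arbitration disk from the candidate disks based on an arbitration disk selection strategy, which avoids choosing a new system arbitration disk whose arbitration area has a problem. The arbitration disk selection strategy can be formulated according to existing related technology.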
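A minimal sketch of the reselection step follows. The source leaves the arbitration disk selection strategy to existing technology, so the default policy used here (pick the lowest disk number) is purely a placeholder assumption, as are the names `reselect_arbitration_disk` and `marks`.

```python
def reselect_arbitration_disk(disk_ids, marks, policy=min):
    """Pick a new system arbitration disk among disks without mark information.

    marks maps disk_id -> mark information for disks flagged as faulty.
    policy is a placeholder for the arbitration disk selection strategy.
    Returns None when no candidate disk is available.
    """
    candidates = [disk for disk in disk_ids if disk not in marks]
    if not candidates:
        return None
    return policy(candidates)


marks = {0: "arbitration area fault", 2: "arbitration area fault"}
chosen = reselect_arbitration_disk([0, 1, 2, 3], marks)
none_left = reselect_arbitration_disk([0, 2], marks)
```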
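The disks in the system may be SAS (Serial Attached SCSI) disks or NVMe (Non-Volatile Memory Express) disks. When a disk is a SAS disk, the target node uses SCSI bus commands to detect the response of the arbitration area in the disk to IO; when a disk is an NVMe disk, the target node uses NVMe bus commands to detect the response of the arbitration area in the disk to IO.
S103. If the response is an IO error message and the target disk is not the system arbitration disk, add mark information indicating an arbitration area fault to the target disk.
It can be seen that this embodiment checks the health of each disk by traversing the disks in the multi-controller storage system. When a target disk is traversed, the target node for detecting it is determined based on the preset load balancing strategy, so all nodes in the system share the detection tasks evenly and no single node carries so many detection tasks that its IO services are affected. During actual detection, the target node determined in the above step detects the response of the arbitration area in the target disk to IO; if the response is an IO error message and the target disk is not the system arbitration disk, mark information indicating an arbitration area fault is added to the target disk, and this mark information can later be used to select an arbitration disk for the system. Moreover, the embodiment detects every disk in the system, so every disk may be selected as the arbitration disk and the selection is not limited to the RAID member disks. Because non-RAID member disks in a storage system generally carry a lighter service load, they are better suited to serve as the arbitration disk. In short, this embodiment lets all nodes share the detection tasks evenly, balances the pressure that disk detection tasks and IO services place on the storage system, and allows lightly loaded non-RAID member disks to be selected as the arbitration disk, which optimizes the arbitration disk selection strategy.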
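The following describes an embodiment of the present application using SAS disks and SCSI bus commands as an example. For the overall logic of this embodiment, see Figure 2. As shown in Figure 2, this embodiment sets up a disk inspection task and a timer that drives it. The inspection task runs in the thread scrub_scan fibre, and the timer is set so that the detection tasks performed by the same node on different disks are 3 seconds apart, and the detections performed by different nodes on different disks are also 3 seconds apart.
Optionally, according to the timer setting, all managed disks in the storage system are traversed by disk number: every 3 s the next disk is taken and the check of its arbitration area is started, with scrub_scan fibre as the inspection thread. If the disk number exists, the inspection time has arrived, and the node responsible for detecting the current disk has no inspection task in progress, IO is issued to the arbitration area of the current disk, and the issued IO enters a queue to await system scheduling. If the disk number does not exist, or the inspection time has arrived but the determined node is not responsible for detecting the current disk, detection moves on to the next disk.
To balance the IO load across all nodes in the system, the following strategy is designed for choosing the detection node of each disk. Assume a dual-controller system (two nodes in the cluster) in which all managed disks are visible to both nodes, the nodes are D0 and D1, and the drive IDs of all managed disks increase sequentially from 0. Node D0 handles the disks whose drive ID is even and node D1 handles the disks whose drive ID is odd, so the detection tasks are shared evenly between D0 and D1 and detection does not interfere with the IO services on a node. Of course, when only one node in the cluster is working, the inspection tasks for all disks run on that node. Each node can therefore determine whether it should perform the current inspection task for the current disk: if it should, it proceeds to the next step; if it should not, this round ends and the node waits 3 s before handling the next disk.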
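The even/odd rule for a dual-controller system, together with the single-node fallback, can be sketched as a small predicate. Representing nodes D0 and D1 as the integers 0 and 1 and passing the set of working nodes explicitly are assumptions; `should_inspect` is a hypothetical name.

```python
def should_inspect(node_id, drive_id, active_nodes):
    """Decide whether this node runs the inspection task for the given drive.

    node_id is 0 or 1 (for nodes D0 and D1); active_nodes is the set of
    nodes currently working in the cluster.
    """
    if node_id not in active_nodes:
        return False
    if len(active_nodes) == 1:
        return True                    # the only working node inspects every disk
    return drive_id % 2 == node_id     # even drives on D0, odd drives on D1


both = {0, 1}
d0_takes_4 = should_inspect(0, 4, both)
d1_takes_4 = should_inspect(1, 4, both)
d1_alone_takes_4 = should_inspect(1, 4, {1})
```

A node for which the predicate is false would simply end the round and wait for the next timer tick.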
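Because the disks in this embodiment are SAS disks, the SCSI verify command can be used to check whether the target address of an issued IO is readable. Each issued IO reads 512 bytes of data, and the arbitration area of a disk is 512M. Optionally, the following can be defined: last->lba (the tail address of the disk's arbitration area), length (the data length read by the current IO), and next_lba (the address read by the next IO). IOs that have been issued to the queue are processed by the thread scrub_submit fibre in the node to determine whether the current IO reads data successfully. For how scrub_submit fibre processes an IO, see Figure 3. As shown in Figure 3, the thread scrub_submit fibre takes a verify IO from the queue and submits it to the specified disk; the disk processes the IO and returns a result (the verify IO result), which is asynchronously scheduled back to scrub_scan fibre, and scrub_scan fibre checks the verify IO result. When the result is success, the target address of the issued IO is in good condition, and the relationship between next_lba and last->lba determines whether inspection of the disk's arbitration area is complete. If the address just read equals last->lba, inspection of the arbitration area is finished; if not, next->lba is recomputed as the address just read plus length, and the IO issuing flow runs again. When the result is offline, the disk is offline and no longer needs inspection, so the inspection task for that disk stops immediately. When the result is any other error, the target address of the issued IO is a suspected bad block: if the disk is the arbitration disk of the storage system, the cluster is notified immediately to replace the arbitration disk and the inspection task for that disk ends; if it is not the system arbitration disk, a write verify command is submitted to attempt a repair.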
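The handling of a verify IO result described above can be summarized as a small decision function. This is a sketch of the control flow only, not the actual scrub_scan fibre code; the result strings, the returned action names, and `handle_verify_result` are hypothetical.

```python
def handle_verify_result(result, done_lba, last_lba, length, is_arbitration_disk):
    """Map a verify IO result to the next action as an (action, lba) pair.

    done_lba is the address just read; last_lba is the tail address of the
    arbitration area; length is the amount read per IO.
    """
    if result == "success":
        if done_lba == last_lba:
            return ("inspection_done", None)
        return ("issue_verify", done_lba + length)   # next_lba = done_lba + length
    if result == "offline":
        return ("stop_inspection", None)
    # Any other error: suspected bad block at done_lba.
    if is_arbitration_disk:
        return ("notify_replace_arbitration_disk", None)
    return ("issue_write_verify", done_lba)


step_on = handle_verify_result("success", 100, 200, 1, False)
finished = handle_verify_result("success", 200, 200, 1, False)
suspect = handle_verify_result("medium_error", 150, 200, 1, False)
arb_fault = handle_verify_result("medium_error", 150, 200, 1, True)
```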
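The write verify command writes binary data to the suspected bad block and compares the data read back with the data written, in an attempt to repair the block. For the repair flow, see Figure 4. As shown in Figure 4, once the processing result of the write verify command is obtained, it determines whether the repair succeeded. When the result is success, the bad block has been repaired, and the verify IO request for the next address block is submitted. When the result is offline, the disk is offline and no longer needs inspection, so the inspection task stops immediately. When the result is any other error, the repair has failed: the disk is marked WRITE VERIFY ERROR and the cluster is notified, so that the arbitration disk selection mechanism of the storage system can use this result as a selection weight.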
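The three outcomes of the write verify command can likewise be sketched as a decision function. The action names and `handle_write_verify_result` are hypothetical; only the mark string WRITE VERIFY ERROR comes from the text.

```python
WRITE_VERIFY_ERROR = "WRITE VERIFY ERROR"  # mark named in the text


def handle_write_verify_result(result, lba, length):
    """Map a write verify result to the next action as an (action, value) pair."""
    if result == "success":
        return ("issue_verify", lba + length)     # repaired: check the next block
    if result == "offline":
        return ("stop_inspection", None)          # disk offline: stop inspecting
    # Repair failed: mark the disk and notify the cluster.
    return ("mark_and_notify", WRITE_VERIFY_ERROR)


repaired_next = handle_write_verify_result("success", 150, 1)
gone = handle_write_verify_result("offline", 150, 1)
marked = handle_write_verify_result("medium_error", 150, 1)
```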
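It can be seen that this embodiment uses the SCSI verify command and the write verify command to detect bad blocks in the arbitration areas of all managed disks in the storage system. Bad blocks in a disk's arbitration area can be found promptly, which safeguards the reliability of the arbitration disk data and gives the arbitration selection mechanism of the storage system a reliable reference. For each disk's arbitration area, this embodiment issues verify commands in a sliding manner, one address block at a time, to detect bad blocks, and issues write verify commands to attempt repair. Detection of the current disk can stop as soon as a bad block is detected, each IO carries little data, and scanning the entire arbitration area can be avoided, so detection efficiency is high.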
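A disk arbitration area detection apparatus provided in an embodiment of the present application is introduced below. The apparatus described below and the disk arbitration area detection method described above can be referenced to each other.
Referring to Figure 5, an embodiment of the present application discloses a disk arbitration area detection apparatus, including:
a traversal module 501, configured to traverse each disk in a multi-controller storage system and, if a target disk is currently traversed, determine a target node for detecting the target disk based on a preset load balancing strategy; wherein the target node is any node in the multi-controller storage system;
a detection module 502, configured to use the target node to detect the response of the arbitration area in the target disk to IO;
a marking module 503, configured to add mark information indicating an arbitration area fault to the target disk if the response is an IO error message and the target disk is not the system arbitration disk.
In one implementation, the traversal module is configured to:
obtain, from the preset load balancing strategy, the correspondence between each node in the multi-controller storage system and the disks it needs to detect; and determine the target node for detecting the target disk according to the correspondence.
In one implementation, the process of constructing the correspondence includes:
distributing all disks in the multi-controller storage system evenly among the nodes in the multi-controller storage system to obtain a distribution result; and establishing the correspondence according to the distribution result.
In one implementation, the detection module includes:
a read IO detection unit, configured to use the target node to read the data stored in any segment of continuous addresses in the arbitration area and, if the data reading fails, determine that the response of the arbitration area to IO is an IO error message.
In one implementation, the detection module further includes:
a repair unit, configured to perform a repair operation on the current continuous address before it is determined that the response of the arbitration area to IO is an IO error message and, if the repair of the current continuous address fails, execute the step of determining that the response of the arbitration area to IO is an IO error message.
In one implementation, the repair unit is configured to:
write randomly generated data to the current continuous address; read the randomly generated data from the current continuous address; if the randomly generated data read is consistent with the randomly generated data written, determine that the current continuous address is repaired successfully; otherwise, determine that the repair of the current continuous address has failed.
In one implementation, the detection module further includes:
a loop unit, configured to, if the data is read successfully or the current continuous address is repaired successfully, and the tail address of the current continuous address is not the end address of the arbitration area, use the target node to read the data stored in the next segment of continuous addresses adjacent to the current continuous address and determine whether the data is read successfully.
In one implementation, the apparatus further includes:
a timing module, configured to wait for the next detection time point if an offline message of the target disk is obtained while reading data or the tail address of the current continuous address is the end address of the arbitration area, so that the next disk is traversed when that time point is reached.
In one implementation, the apparatus further includes:
a prompt module, configured to generate a fault prompt message for the system arbitration disk if the response is an IO error message and the target disk is the system arbitration disk.
In one implementation, the apparatus further includes:
an arbitration disk replacement module, configured to, after the fault prompt message for the system arbitration disk is generated, collect the mark information of each disk in the multi-controller storage system and reselect the system arbitration disk based on the mark information of each disk.
In one implementation, the arbitration disk replacement module is configured to:
take disks without mark information as candidate disks, and select the system arbitration disk from the candidate disks based on an arbitration disk selection strategy.
In one implementation, the detection module is configured to:
use the target node to issue a read request and/or a write request to the arbitration area;
obtain the processing result of the arbitration area for the read request and/or the write request;
take the processing result of the arbitration area for the read request and/or the write request as the response of the arbitration area to IO.
For a more detailed description of how the modules and units in this embodiment work, refer to the corresponding content disclosed in the foregoing embodiments; it is not repeated here.
It can be seen that this embodiment provides a disk arbitration area detection apparatus that balances the pressure that disk detection tasks and IO services place on the storage system and optimizes the arbitration disk selection strategy.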
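An electronic device provided in an embodiment of the present application is introduced below. The electronic device described below and the disk arbitration area detection method and apparatus described above can be referenced to each other.
Referring to Figure 6, an embodiment of the present application discloses an electronic device, including:
a memory 601, configured to store a computer program;
a processor 602, configured to execute the computer program to implement the method disclosed in any of the above embodiments.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: traversing each disk in a multi-controller storage system and, if a target disk is currently traversed, determining a target node for detecting the target disk based on a preset load balancing strategy, wherein the target node is any node in the multi-controller storage system; using the target node to detect the response of the arbitration area in the target disk to IO; and, if the response is an IO error message and the target disk is not the system arbitration disk, adding mark information indicating an arbitration area fault to the target disk.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: obtaining, from the preset load balancing strategy, the correspondence between each node in the multi-controller storage system and the disks it needs to detect; and determining the target node for detecting the target disk according to the correspondence.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: distributing all disks in the multi-controller storage system evenly among the nodes in the multi-controller storage system to obtain a distribution result; and establishing the correspondence according to the distribution result.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: using the target node to read the data stored in any segment of continuous addresses in the arbitration area; and, if the data reading fails, determining that the response of the arbitration area to IO is an IO error message.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: performing a repair operation on the current continuous address and, if the repair of the current continuous address fails, executing the step of determining that the response of the arbitration area to IO is an IO error message.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: writing randomly generated data to the current continuous address; reading the randomly generated data from the current continuous address; if the randomly generated data read is consistent with the randomly generated data written, determining that the current continuous address is repaired successfully; otherwise, determining that the repair of the current continuous address has failed.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: if the data is read successfully or the current continuous address is repaired successfully, then, when the tail address of the current continuous address is not the end address of the arbitration area, using the target node to read the data stored in the next segment of continuous addresses adjacent to the current continuous address, and determining whether the data is read successfully.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: if an offline message of the target disk is obtained while reading data, or the tail address of the current continuous address is the end address of the arbitration area, waiting for the next detection time point so that the next disk is traversed when that time point is reached.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: if the response is an IO error message and the target disk is the system arbitration disk, generating a fault prompt message for the system arbitration disk.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: collecting the mark information of each disk in the multi-controller storage system; and reselecting the system arbitration disk based on the mark information of each disk.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: taking disks without mark information as candidate disks, and selecting the system arbitration disk from the candidate disks based on an arbitration disk selection strategy.
In this embodiment, when the computer program stored in the memory is executed by the processor, the following steps can be implemented: using the target node to issue a read request and/or a write request to the arbitration area; obtaining the processing result of the arbitration area for the read request and/or the write request; and taking that processing result as the response of the arbitration area to IO.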
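Optionally, an embodiment of the present application further provides a server as the above electronic device. The server may include at least one processor, at least one memory, a power supply, a communication interface, an input/output interface, and a communication bus. The memory is configured to store a computer program, which is loaded and executed by the processor to implement the relevant steps of the disk arbitration area detection method disclosed in any of the foregoing embodiments.
In this embodiment, the power supply is configured to provide operating voltage for the hardware devices on the server. The communication interface creates a data transmission channel between the server and external devices; it may follow any communication protocol applicable to the technical solution of the embodiments of the present application, which is not limited here. The input/output interface is configured to obtain external input data or output data to the outside; its actual interface type can be selected according to application needs and is not limited here.
In addition, the memory, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like. The resources stored on it include an operating system, computer programs, and data, and the storage may be temporary or permanent.
The operating system is configured to manage and control the hardware devices and computer programs on the server so that the processor can operate on and process the data in the memory; it may be Windows Server, Netware, Unix, Linux, or the like. In addition to the computer program that carries out the disk arbitration area detection method disclosed in any of the foregoing embodiments, the computer programs may include programs for other specific tasks. In addition to data such as virtual machines, the data may include information such as the virtual machine developer.
Optionally, an embodiment of the present application further provides a terminal as the above electronic device. The terminal may include, but is not limited to, a smartphone, a tablet computer, a laptop computer, or a desktop computer.
Generally, the terminal in this embodiment includes a processor and a memory.
The processor may include one or more processing cores, for example a 4-core or 8-core processor. The processor may be implemented in at least one of the hardware forms DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), is configured to process data in the awake state, and the coprocessor is a low-power processor configured to process data in the standby state. In some embodiments, the processor may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor may also include an AI (Artificial Intelligence) processor configured to handle computing operations related to machine learning.
The memory may include one or more non-volatile readable storage media, which may be non-transitory. The memory may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In this embodiment, the memory is at least configured to store a computer program which, after being loaded and executed by the processor, implements the relevant steps of the disk arbitration area detection method performed on the terminal side as disclosed in any of the foregoing embodiments. The resources stored in the memory may also include an operating system and data, and the storage may be temporary or permanent. The operating system may include Windows, Unix, Linux, and so on. The data may include, but is not limited to, update information of applications.
In some embodiments, the terminal may also include a display screen, an input/output interface, a communication interface, sensors, a power supply, and a communication bus.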
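A non-volatile readable storage medium provided in an embodiment of the present application is introduced below. The non-volatile readable storage medium described below and the disk arbitration area detection method, apparatus, and device described above can be referenced to each other.
A non-volatile readable storage medium is configured to store a computer program, wherein the computer program, when executed by a processor, implements the disk arbitration area detection method disclosed in the foregoing embodiments.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: traversing each disk in a multi-controller storage system and, if a target disk is currently traversed, determining a target node for detecting the target disk based on a preset load balancing strategy, wherein the target node is any node in the multi-controller storage system; using the target node to detect the response of the arbitration area in the target disk to IO; and, if the response is an IO error message and the target disk is not the system arbitration disk, adding mark information indicating an arbitration area fault to the target disk.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: obtaining, from the preset load balancing strategy, the correspondence between each node in the multi-controller storage system and the disks it needs to detect; and determining the target node for detecting the target disk according to the correspondence.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: distributing all disks in the multi-controller storage system evenly among the nodes in the multi-controller storage system to obtain a distribution result; and establishing the correspondence according to the distribution result.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: using the target node to read the data stored in any segment of continuous addresses in the arbitration area; and, if the data reading fails, determining that the response of the arbitration area to IO is an IO error message.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: performing a repair operation on the current continuous address and, if the repair of the current continuous address fails, executing the step of determining that the response of the arbitration area to IO is an IO error message.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: writing randomly generated data to the current continuous address; reading the randomly generated data from the current continuous address; if the randomly generated data read is consistent with the randomly generated data written, determining that the current continuous address is repaired successfully; otherwise, determining that the repair of the current continuous address has failed.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: if the data is read successfully or the current continuous address is repaired successfully, then, when the tail address of the current continuous address is not the end address of the arbitration area, using the target node to read the data stored in the next segment of continuous addresses adjacent to the current continuous address, and determining whether the data is read successfully.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: if an offline message of the target disk is obtained while reading data, or the tail address of the current continuous address is the end address of the arbitration area, waiting for the next detection time point so that the next disk is traversed when that time point is reached.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: if the response is an IO error message and the target disk is the system arbitration disk, generating a fault prompt message for the system arbitration disk.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: collecting the mark information of each disk in the multi-controller storage system; and reselecting the system arbitration disk based on the mark information of each disk.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: taking disks without mark information as candidate disks, and selecting the system arbitration disk from the candidate disks based on an arbitration disk selection strategy.
In this embodiment, when the computer program stored in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: using the target node to issue a read request and/or a write request to the arbitration area; obtaining the processing result of the arbitration area for the read request and/or the write request; and taking that processing result as the response of the arbitration area to IO.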
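The terms "first", "second", "third", "fourth", and so on (if present) in the embodiments of the present application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, or device.
It should be noted that descriptions involving "first", "second", and the like in the embodiments of the present application are for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly specifying the number of the technical features indicated. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, but only on the basis that those of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and falls outside the scope of protection claimed by the embodiments of the present application.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to one another.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of non-volatile readable storage medium known in the art.
Optional examples are used herein to explain the principles and implementations of the embodiments of the present application. The description of the above embodiments is only intended to help understand the method of the embodiments and its core idea. Those of ordinary skill in the art may, following the idea of the embodiments of the present application, make changes to the optional implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the embodiments of the present application.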

Claims (20)

  1. 一种盘仲裁区域检测方法,其特征在于,包括:
    遍历多控存储系统中的各磁盘,若当前遍历到目标磁盘,则基于预设负载均衡策略确定用于检测所述目标磁盘的目标节点;其中,所述目标节点为所述多控存储系统中的任一节点;
    利用所述目标节点检测所述目标磁盘中的仲裁区域对于IO的响应;
    若所述响应为IO报错消息且所述目标磁盘不是系统仲裁盘,则为所述目标磁盘添加仲裁区域故障的标记信息。
  2. 根据权利要求1所述的方法,其特征在于,所述基于预设负载均衡策略确定用于检测所述目标磁盘的目标节点,包括:
    从所述预设负载均衡策略中获取所述多控存储系统中的各节点与其需要检测的磁盘之间的对应关系;
    按照所述对应关系确定用于检测所述目标磁盘的目标节点。
  3. 根据权利要求2所述的方法,其特征在于,所述对应关系的构建过程包括:
    将所述多控存储系统中的所有磁盘均衡分配至所述多控存储系统中的各节点,得到分配结果;
    按照所述分配结果建立所述对应关系。
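  4. The method according to claim 1, characterized in that detecting, by the target node, the response to IO of the arbitration area in the target disk comprises:
    reading, by the target node, the data stored in any contiguous address range in the arbitration area;
    if reading the data fails, determining that the arbitration area's response to IO is the IO error message.
  5. The method according to claim 4, characterized in that, before determining that the arbitration area's response to IO is the IO error message, the method further comprises:
    performing a repair operation on the current contiguous address range and, if the repair of the current contiguous address range fails, performing the step of determining that the arbitration area's response to IO is the IO error message.
  6. The method according to claim 5, characterized in that performing the repair operation on the current contiguous address range comprises:
    writing randomly generated data to the current contiguous address range;
    reading the randomly generated data from the current contiguous address range;
    if the randomly generated data read back is consistent with the randomly generated data written, determining that the current contiguous address range is repaired successfully; otherwise, determining that the repair of the current contiguous address range has failed.
  7. The method according to claim 5, characterized by further comprising:
    if the data is read successfully or the current contiguous address range is repaired successfully, then, when the tail address of the current contiguous address range is not the end address of the arbitration area, reading, by the target node, the data stored in the next contiguous address range adjacent to the current one, and determining whether the read succeeds.
  8. The method according to claim 7, characterized by further comprising:
    if an offline message for the target disk is received while reading the data, or the tail address of the current contiguous address range is the end address of the arbitration area, waiting for the next detection time point, so that the traversal moves on to the next disk when that time point arrives.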
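  9. The method according to any one of claims 1 to 8, characterized by further comprising:
    if the response is the IO error message and the target disk is the system arbitration disk, generating a fault notification message for the system arbitration disk.
  10. The method according to claim 9, characterized in that, after generating the fault notification message for the system arbitration disk, the method further comprises:
    collecting the marker information of each disk in the multi-controller storage system;
    reselecting the system arbitration disk based on the marker information of each disk.
  11. The method according to claim 10, characterized in that reselecting the system arbitration disk based on the marker information of each disk comprises:
    taking the disks that have no marker information as candidate disks, and selecting the system arbitration disk from the candidate disks based on an arbitration disk selection policy.
  12. The method according to claim 1, characterized in that detecting, by the target node, the response to IO of the arbitration area in the target disk comprises:
    issuing, by the target node, a read request and/or a write request to the arbitration area;
    obtaining the arbitration area's processing result for the read request and/or the write request;
    taking the arbitration area's processing result for the read request and/or the write request as the arbitration area's response to IO.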
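  13. The method according to claim 1, characterized in that, when the target disk is selected as the system arbitration disk, the arbitration area in the target disk is used to store arbitration data; when the target disk is not selected as the system arbitration disk, no data is stored in the arbitration area in the target disk.
  14. The method according to claim 13, characterized in that the arbitration area is located at the tail of the target disk's storage space.
  15. The method according to claim 12, characterized in that taking the arbitration area's processing result for the read request and/or the write request as the arbitration area's response to IO comprises:
    when the arbitration area's processing result for the read request indicates a problem with the read IO response, or its processing result for the write request indicates a problem with the write IO response, determining that the arbitration area's response to the IO is the IO error message.
  16. The method according to claim 1, characterized in that traversing the disks in the multi-controller storage system comprises:
    traversing, by disk number and according to a timer setting, all managed disks in the multi-controller storage system.
  17. The method according to claim 16, characterized in that the timer setting indicates the interval between detection tasks performed by the same node on different disks.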
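  18. A disk arbitration area detection apparatus, characterized by comprising:
    a traversal module configured to traverse the disks in a multi-controller storage system and, when a target disk is currently reached, determine, based on a preset load-balancing policy, a target node for detecting the target disk, wherein the target node is any node in the multi-controller storage system;
    a detection module configured to detect, by the target node, the response to IO of the arbitration area in the target disk;
    a marking module configured to add marker information indicating an arbitration area fault to the target disk if the response is an IO error message and the target disk is not the system arbitration disk.
  19. An electronic device, characterized by comprising:
    a memory configured to store a computer program;
    a processor configured to execute the computer program to implement the method according to any one of claims 1 to 17.
  20. A non-volatile readable storage medium, characterized by being configured to store a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 17.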
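The node-to-disk correspondence of claims 2 and 3 and the marking rule of claim 1 can be illustrated with the following sketch. The names `build_assignment` and `detection_pass` are hypothetical, and round-robin is only one possible way to distribute disks evenly; the claims do not prescribe a particular scheme. The `probe` callback stands in for whatever IO check the target node performs.

```python
def build_assignment(disk_ids, node_ids):
    """Spread disks evenly over nodes (round-robin, an assumption) as disk -> node."""
    if not node_ids:
        raise ValueError("at least one node is required")
    return {
        disk: node_ids[index % len(node_ids)]
        for index, disk in enumerate(sorted(disk_ids))
    }


def detection_pass(disk_ids, node_ids, arbiter_id, probe):
    """Traverse disks by disk number and classify IO errors.

    `probe(node, disk)` is a caller-supplied function returning True when the
    arbitration area of `disk` responds normally to IO issued from `node`.
    """
    assignment = build_assignment(disk_ids, node_ids)
    marked = set()  # non-arbitration disks with a faulty arbitration area
    alerts = []     # fault notifications for the system arbitration disk
    for disk in sorted(disk_ids):
        target_node = assignment[disk]
        if probe(target_node, disk):
            continue
        if disk == arbiter_id:
            alerts.append(disk)
        else:
            marked.add(disk)
    return assignment, marked, alerts


# Usage: five disks, two nodes, disk 0 is the system arbitration disk,
# and the arbitration areas of disks 0 and 3 are simulated as faulty.
faulty = {0, 3}
assignment, marked, alerts = detection_pass(
    range(5), ["node-A", "node-B"], 0, lambda node, disk: disk not in faulty
)
```

Fixing the correspondence in advance means each disk is checked by exactly one node per pass, so the detection work is shared rather than duplicated across controllers.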
PCT/CN2023/115982 2022-10-18 2023-08-30 Disk arbitration area detection method, apparatus, device, and non-volatile readable storage medium WO2024082834A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211269936.5 2022-10-18
CN202211269936.5A CN115359834B (zh) 2022-10-18 2022-10-18 Disk arbitration area detection method, apparatus, device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2024082834A1 (zh) 2024-04-25

Family

ID=84008797

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/115982 WO2024082834A1 (zh) 2022-10-18 2023-08-30 Disk arbitration area detection method, apparatus, device, and non-volatile readable storage medium

Country Status (2)

Country Link
CN (1) CN115359834B (zh)
WO (1) WO2024082834A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359834B (zh) * 2022-10-18 2023-03-24 苏州浪潮智能科技有限公司 Disk arbitration area detection method, apparatus, device, and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080184061A1 (en) * 2007-01-31 2008-07-31 Shailendra Tripathi Managing A Node Cluster
US20110252270A1 (en) * 2010-04-12 2011-10-13 Symantec Corporation Updating a list of quorum disks
US20170147425A1 (en) * 2015-11-25 2017-05-25 Salesforce.Com, Inc. System and method for monitoring and detecting faulty storage devices
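CN106980468A (zh) * 2017-03-03 2017-07-25 杭州宏杉科技股份有限公司 Method and apparatus for triggering RAID array rebuild
CN107273231A (zh) * 2016-04-07 2017-10-20 阿里巴巴集团控股有限公司 Method and apparatus for detecting and handling hard-disk hang faults in a distributed storage system
CN114064374A (zh) * 2021-11-12 2022-02-18 中国建设银行股份有限公司 Fault detection method and system based on distributed block storage
CN115359834A (zh) * 2022-10-18 2022-11-18 苏州浪潮智能科技有限公司 Disk arbitration area detection method, apparatus, device, and readable storage medium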

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105872031B (zh) * 2016-03-26 2019-06-14 天津书生云科技有限公司 Storage system
US10782898B2 (en) * 2016-02-03 2020-09-22 Surcloud Corp. Data storage system, load rebalancing method thereof and access control method thereof
CN111813604B (zh) * 2020-07-17 2022-06-10 济南浪潮数据技术有限公司 Data recovery method and system for a faulty storage device, and related apparatus

Also Published As

Publication number Publication date
CN115359834B (zh) 2023-03-24
CN115359834A (zh) 2022-11-18
