WO2024022469A1 - Disk array redundancy method, system, computer device and storage medium - Google Patents

Disk array redundancy method, system, computer device and storage medium

Info

Publication number
WO2024022469A1
Authority
WO
WIPO (PCT)
Prior art keywords
disk array
operating mode
agent
silent
state machine
Prior art date
Application number
PCT/CN2023/109739
Other languages
English (en)
French (fr)
Inventor
贺坤
朱红玉
Original Assignee
苏州元脑智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司
Publication of WO2024022469A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1456 Hardware arrangements for backup
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0634 Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices

Definitions

  • the present application relates to the field of disk arrays, and in particular, to a disk array redundancy method, system, computer equipment and storage medium.
  • RAID: Redundant Array of Independent Disks
  • although RAID provides fault tolerance and performance to a certain extent, it cannot guarantee that the business will always be online. That is, when the number of offline disks in a RAID exceeds the number of redundant disks, the RAID goes offline, which stops the front-end business and directly causes business downtime. This is a catastrophic failure for key industries such as finance and communications.
  • the present application provides a disk array redundancy method.
  • the disk array redundancy method includes:
  • the state machine sends a silent request to the agent, causing the agent to perform silent processing to suspend the input and output operations of the disk array, and switch the normal operating mode of the disk array to a temporary operating mode through the agent;
  • the state machine sends a silent recovery request to the agent, causing the agent to perform silent recovery processing to restore the input and output operations of the disk array;
  • in response to the timer not exceeding the preset redundancy duration and the operating status of the disk array returning to normal, the operating mode of the disk array is switched to the normal operating mode through the agent.
  • abnormal information is sent to the state machine, so that the state machine starts a timer for timing according to the preset redundancy duration, including:
  • the state machine sends a silent request to the agent, causing the agent to perform silent processing to suspend the input and output operations of the disk array, and to switch the normal operating mode of the disk array to a temporary operating mode through the agent, including:
  • the state machine sends a silent request to the agent and prepares mode change data
  • the agent performs silent processing, causing the host to suspend issuing input and output operations, and adds the issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
  • the state machine sends the mode change data to the agent, and the agent switches the operating mode of the disk array from the normal operating mode to the temporary operating mode based on the mode change data.
  • the state machine sends a silent recovery request to the agent, causing the agent to perform silent recovery processing to restore the input and output operations of the disk array, including:
  • the agent performs silent recovery processing, restores the input and output operations in the silent queue to the normal queue, and allows the host to continue to issue input and output operations. After the input and output operations return to normal in the temporary operating mode, a message that the input and output operations have returned to normal in the temporary operating mode is sent to the state machine.
  • after the message that the input and output operations have returned to normal in the temporary operating mode is sent to the state machine, the method further includes:
  • the temporary operating mode of the disk array is switched to the offline operating mode through the agent.
  • switching the temporary operating mode of the disk array to the normal operating mode through the agent includes:
  • the state machine sends a silent request to the agent and prepares mode change data
  • the agent performs silent processing, causing the host to suspend issuing input and output operations, and adds the issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
  • the state machine sends the mode change data to the agent, and the agent switches the operating mode of the disk array from the temporary operating mode to the normal operating mode based on the mode change data.
  • switching the temporary operating mode of the disk array to the normal operating mode through the agent also includes:
  • the agent performs silent recovery processing, restores the input and output operations in the silent queue to the normal queue, and allows the host to continue to issue input and output operations. After the input and output operations return to normal in the normal operating mode, a message that the input and output operations have returned to normal in the normal operating mode is sent to the state machine.
  • switching the temporary operating mode of the disk array to the offline operating mode through the agent includes:
  • the state machine sends a silent request to the agent and prepares mode change data
  • the agent performs silent processing, causing the host to stop issuing input and output operations, and adds the issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
  • the state machine sends the mode change data to the agent, and the agent switches the disk array's operating mode from temporary operating mode to offline operating mode based on the mode changing data.
  • switching the temporary operating mode of the disk array to the offline operating mode through the agent also includes:
  • the agent performs silent recovery processing, returns all input and output operation requests to the host, and disconnects the main link of the host;
  • the abnormal operating status of the disk array means that the number of offline disks in the disk array exceeds the number of redundant disks.
  • before starting the timer according to the preset redundancy duration, the method further includes:
  • this application provides a disk array redundant system.
  • the disk array redundant system includes:
  • the disk management terminal is used to monitor the operating status of the disk array in real time, and in response to abnormal operating status of the disk array, sends abnormal information to the state machine;
  • the state machine is used to start the timer according to the preset redundancy duration, send a silent request or a silent recovery request to the agent, and determine whether the timer's timing exceeds the preset redundancy duration;
  • the agent is used to perform silent processing or silent recovery processing, and to switch the operating mode of the disk array.
  • a computer device including one or more memories, one or more processors, and computer-readable instructions stored on the one or more memories and executable on the one or more processors. When the one or more processors execute the computer-readable instructions, they implement the steps of the disk array redundancy method.
  • the present application provides a non-volatile computer-readable storage medium.
  • the non-volatile computer-readable storage medium stores computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to execute the steps of the disk array redundancy method.
  • Figure 1 is a flow chart of a first disk array redundancy method in one or more embodiments of the present application
  • Figure 2 is a flow chart of a second disk array redundancy method in one or more embodiments of the present application
  • Figure 3 is a timing diagram of a disk array redundancy method in one or more embodiments of the present application.
  • Figure 4 is a system structure diagram of a disk array redundancy system in one or more embodiments of the present application.
  • Figure 5 is a device structure diagram of a computer device in one or more embodiments of the present application.
  • although RAID provides a fault-tolerant mechanism and performance to a certain extent, it cannot guarantee that the business will always be online. That is, when the number of offline disks in a RAID exceeds the number of redundant disks, the RAID goes offline, which stops the front-end business and directly causes business downtime. This is a catastrophic failure for key industries such as finance and communications.
  • this application proposes a disk array redundancy method, system, computer equipment and storage medium, so that a disk array whose operating status is abnormal can enter a temporary operating mode without interrupting the business, which keeps the business running normally. A preset redundancy duration is set for the temporary operating mode, so that normal business operation is guaranteed within that duration; if the disk array returns to normal within the preset redundancy duration, it automatically returns to the normal operating mode. This improves the redundancy capability of the disk array and ensures the normal operation of the business.
  • Figure 1 is the first method flow chart of the disk array redundancy method of the present application.
  • Figure 2 is the second method flow chart of the disk array redundancy method of the present application.
  • Figure 3 is a timing diagram of the disk array redundancy method of the present application.
  • the disk array redundancy method includes the following steps:
  • the disk array is equipped with redundant disks. When the number of offline disks exceeds the number of redundant disks, the operating status of the disk array becomes abnormal. For example, a RAID6 disk array is equipped with two redundant disks; when more than two disks in a RAID6 disk array are offline, the RAID6 disk array is in an abnormal state. The operating status of the disk array must therefore be monitored in real time.
  • the abnormal message of the disk array is sent to the state machine, causing the state machine to start the timer and to monitor the relationship between the timer's elapsed time and the preset redundancy duration, so that the operating mode of the disk array is switched based on the relationship between the two.
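  • The monitoring step can be illustrated with a minimal sketch. The names `is_abnormal`, `RedundancyTimer` and `monitor_once` are hypothetical and do not come from the application; the injectable clock exists only to make the sketch testable.

```python
import time
from typing import Callable, Optional


def is_abnormal(offline_disks: int, redundant_disks: int) -> bool:
    """The array is abnormal when more disks are offline than it has redundancy for."""
    return offline_disks > redundant_disks


class RedundancyTimer:
    """Hypothetical timer owned by the state machine."""

    def __init__(self, preset_redundancy_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.preset_redundancy_s = preset_redundancy_s
        self._clock = clock
        self.started_at: Optional[float] = None

    def start(self) -> None:
        # Start timing once per abnormal episode (an assumption of this sketch).
        if self.started_at is None:
            self.started_at = self._clock()

    def exceeded(self) -> bool:
        if self.started_at is None:
            return False
        return self._clock() - self.started_at > self.preset_redundancy_s


def monitor_once(offline_disks: int, redundant_disks: int,
                 timer: RedundancyTimer) -> bool:
    """One monitoring pass: report abnormality and start the timer if needed."""
    abnormal = is_abnormal(offline_disks, redundant_disks)
    if abnormal:
        timer.start()
    return abnormal
```

  • For the RAID6 example in the text, `is_abnormal(2, 2)` is false and `is_abnormal(3, 2)` is true.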
  • the state machine sends a silent request to the agent, causing the agent to perform silent processing to suspend the input and output operations of the disk array, and switch the normal operating mode of the disk array to a temporary operating mode through the agent;
  • Silent processing refers to pausing the input and output operations of the disk array in the current operating mode.
  • in this step, the input and output operations of the disk array in the normal operating mode are suspended, and the operating mode of the disk array is then switched to the temporary operating mode through the agent, so that input and output operations can still be processed while the disk array is abnormal. In the temporary operating mode the disk array processes fewer input and output operations, but normal business can still be guaranteed.
  • the state machine sends a silent recovery request to the agent, causing the agent to perform silent recovery processing to restore the input and output operations of the disk array;
  • after the agent switches the operating mode of the disk array to the temporary operating mode, the state machine sends a silent recovery request to the agent. The silent recovery request causes the disk array to resume input and output operations according to the operating mode it has been switched to. In this step, the disk array performs input and output operations according to the temporary operating mode, so that a disk array in an abnormal state can still keep the business running normally.
  • in response to the timer not exceeding the preset redundancy duration and the operating status of the disk array returning to normal, the operating mode of the disk array is switched to the normal operating mode through the agent.
  • after the agent performs silent recovery processing, the operating status of the disk array must be monitored. If the operating status returns to normal and the timer has not exceeded the preset redundancy duration, the operating mode of the disk array is switched to the normal operating mode through the agent. During the preset redundancy duration, the offline disks in the disk array may come back online automatically, or the operation and maintenance personnel may manually maintain the disk array to bring them back online.
  • in short, if the disk array recovers within the preset redundancy duration, its operating mode can be switched back to the normal operating mode and its normal input and output operations restored. Restoring the normal business of the disk array in this way effectively improves the redundancy capability of the disk array.
  • the abnormality information is sent to the state machine, so that the state machine starts a timer for timing according to the preset redundancy duration, including:
  • the abnormal information is sent to the state machine and the operation and maintenance personnel are notified of the abnormal information;
  • when the operating status of the disk array is abnormal, a disk array abnormality message is sent to the state machine to start the timer of the state machine, and the operation and maintenance personnel are notified of the abnormal information so that they can troubleshoot and restore the offline disks in the disk array. In addition, logs must be printed to save operation information.
  • the state machine has a flag state that identifies the operating status of the disk array. After the state machine receives the abnormal information of the disk array, it automatically sets its flag state to the disk abnormal state, that is, the state in which disks in the disk array are offline. The state machine then automatically starts the timer and continuously compares the timer's elapsed time with the preset redundancy duration, in order to determine the operating mode to which the disk array is to be switched. In addition, the operation and maintenance personnel must be notified that timing has started.
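  • A minimal sketch of the state machine's handling of abnormal information, under the assumption that notification of the operation and maintenance personnel is a simple callback. `DiskFlag`, `StateMachine`, `on_abnormal_info` and the logger name are hypothetical names chosen for illustration.

```python
import logging
import time
from enum import Enum
from typing import Callable, Optional

log = logging.getLogger("raid_redundancy")  # hypothetical logger name


class DiskFlag(Enum):
    NORMAL = "disk_normal"
    ABNORMAL = "disk_abnormal"


class StateMachine:
    def __init__(self, preset_redundancy_s: float,
                 notify: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.preset_redundancy_s = preset_redundancy_s
        self.notify = notify          # stands in for alerting the O&M personnel
        self.clock = clock
        self.flag = DiskFlag.NORMAL
        self.timer_started_at: Optional[float] = None

    def on_abnormal_info(self, info: str) -> None:
        # Set the flag, start the timer, notify the personnel, and log.
        self.flag = DiskFlag.ABNORMAL
        self.timer_started_at = self.clock()
        self.notify(f"disk array abnormal: {info}; redundancy timer started")
        log.info("abnormal info received: %s", info)

    def elapsed(self) -> float:
        if self.timer_started_at is None:
            return 0.0
        return self.clock() - self.timer_started_at

    def timer_exceeded(self) -> bool:
        return self.elapsed() > self.preset_redundancy_s
```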
  • the state machine sends a silent request to the agent, causing the agent to perform silent processing to suspend the input and output operations of the disk array, and to switch the normal operating mode of the disk array to a temporary operating mode through the agent, including:
  • the state machine sends a silent request to the agent and prepares mode change data
  • after the timer of the state machine starts timing, the state machine automatically sends a silent request to the agent and prepares mode change data.
  • to execute input and output operation requests in the normal operating mode, the disk array needs the normal operating mode data; in the temporary operating mode, it needs the temporary operating mode data; in the offline operating mode, it needs the offline operating mode data. Therefore, the mode change data must be prepared before the operating mode of the disk array is switched. For example, if the current operating mode of the disk array is the normal operating mode and it is to be switched to the temporary operating mode, the mode change data for switching from the normal operating mode to the temporary operating mode must be prepared.
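  • Preparing mode change data can be sketched as follows. The application does not specify what the mode change data contains, so the `ModeChangeData` record here carries only the source and target modes; that shape, the names, and the rejection of switches other than the three described in the text are all assumptions of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    TEMPORARY = "temporary"
    OFFLINE = "offline"


# The three switches described in the text.
ALLOWED_SWITCHES = {
    (Mode.NORMAL, Mode.TEMPORARY),
    (Mode.TEMPORARY, Mode.NORMAL),
    (Mode.TEMPORARY, Mode.OFFLINE),
}


@dataclass(frozen=True)
class ModeChangeData:
    source: Mode
    target: Mode


def prepare_mode_change(current: Mode, target: Mode) -> ModeChangeData:
    """Build the data the state machine later sends to the agent."""
    if (current, target) not in ALLOWED_SWITCHES:
        raise ValueError(f"unsupported switch: {current.value} -> {target.value}")
    return ModeChangeData(source=current, target=target)
```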
  • the agent performs silent processing, causing the host to suspend issuing input and output operations, and adds the issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
  • after receiving the silent request from the state machine, the agent performs silent processing: it stops the host from issuing input and output operations in the normal operating mode of the disk array and adds the input and output operations that have already been issued to the silent queue. After the silent processing is completed, a silent operation completion message is returned to the state machine.
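  • A minimal sketch of silent processing on the agent, assuming two in-memory queues. The class and method names and the `"SILENT_DONE"` message are hypothetical.

```python
from collections import deque
from typing import Deque


class Agent:
    def __init__(self) -> None:
        self.normal_queue: Deque[str] = deque()
        self.silent_queue: Deque[str] = deque()
        self.host_paused = False

    def submit(self, io_id: str) -> bool:
        """Host-side entry point; refused while the agent is silent."""
        if self.host_paused:
            return False
        self.normal_queue.append(io_id)
        return True

    def silent_process(self) -> str:
        # Stop accepting new host I/O, then park already-issued I/O in order.
        self.host_paused = True
        while self.normal_queue:
            self.silent_queue.append(self.normal_queue.popleft())
        return "SILENT_DONE"  # completion message for the state machine
```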
  • the state machine sends the mode change data to the agent, and the agent switches the disk array's operating mode from normal operating mode to temporary operating mode based on the mode changing data.
  • after the state machine receives the silent operation completion message, it sends the mode change data to the agent. Based on the mode change data, the agent switches the operating mode of the disk array from the normal operating mode to the temporary operating mode, so that input and output operations are performed in the temporary operating mode without interrupting the business, which effectively improves the redundancy capability of the disk array.
  • the state machine sends a silent recovery request to the agent, causing the agent to perform silent recovery processing to restore the input and output operations of the disk array, including:
  • after the agent switches the disk array from the normal operating mode to the temporary operating mode, a message reporting the switch is sent to the state machine. When the state machine receives the message, it treats the operating mode switch of the disk array as completed. The state machine then sends a silent recovery request to the agent to end the silent state, and the disk array performs input and output operations according to the switched operating mode.
  • the agent performs silent recovery processing, restores the input and output operations in the silent queue to the normal queue, and allows the host to continue to issue input and output operations. After the input and output operations return to normal in the temporary operating mode, a message that the input and output operations have returned to normal in the temporary operating mode is sent to the state machine.
  • after receiving the silent recovery request, the agent restores the input and output operations in the silent queue to the normal queue in the temporary operating mode, that is, the input and output operations in the silent queue are executed normally according to the operating mode after the disk array is switched. The host also resumes issuing input and output operations, which are executed normally according to the switched operating mode. After the input and output operations return to normal, a message that they have returned to normal is returned to the state machine.
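  • Silent recovery in the temporary operating mode can be sketched as below. The starting state (two parked operations, host paused) and the `"IO_RESUMED:<mode>"` message format are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Deque, Iterable


class Agent:
    def __init__(self, mode: str, held_ios: Iterable[str]) -> None:
        self.mode = mode                      # mode already switched by the agent
        self.normal_queue: Deque[str] = deque()
        self.silent_queue: Deque[str] = deque(held_ios)
        self.host_paused = True

    def silent_recover(self) -> str:
        # Move parked I/O back in its original order, then let the host resume.
        while self.silent_queue:
            self.normal_queue.append(self.silent_queue.popleft())
        self.host_paused = False
        return f"IO_RESUMED:{self.mode}"      # message for the state machine
```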
  • after the message that the input and output operations have returned to normal in the temporary operating mode is sent to the state machine, the method further includes:
  • after receiving the message that the input and output operations have returned to normal, the state machine obtains the operating status of the disk array and determines whether that status is normal and whether the timer has exceeded the preset redundancy duration. If the operating status has returned to normal and the timer has not exceeded the preset redundancy duration, the offline disks in the disk array have come back online, either automatically or through manual intervention by the operation and maintenance personnel; that is, the disk array has returned to its normal state. The temporary operating mode of the disk array is then switched to the normal operating mode through the agent, input and output operations are performed according to the normal operating mode, and business processing on the disk array returns to normal.
  • the temporary operating mode of the disk array is switched to the offline operating mode through the agent.
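  • if the timer exceeds the preset redundancy duration and the operating status of the disk array has not returned to normal, the temporary operating mode of the disk array needs to be switched to the offline operating mode through the agent, and a message needs to be sent to the operation and maintenance personnel so that they can promptly maintain the disk array in the offline operating mode.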
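  • The decision made while the disk array is in the temporary operating mode can be sketched as a single function. Two points are assumptions of this sketch rather than statements from the text: the array stays in the temporary mode while neither condition is met, and an array that recovers only after the timer has expired is still treated as offline.

```python
def next_mode(array_normal: bool, elapsed_s: float, preset_redundancy_s: float) -> str:
    """Choose the mode to switch to from the temporary operating mode."""
    if elapsed_s > preset_redundancy_s:
        return "offline"      # redundancy window used up
    if array_normal:
        return "normal"       # recovered within the window
    return "temporary"        # keep waiting (assumption of this sketch)
```

  • For example, with a preset redundancy duration of 300 seconds, `next_mode(True, 120.0, 300.0)` selects the normal operating mode and `next_mode(False, 301.0, 300.0)` selects the offline operating mode.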
  • switching the temporary operating mode of the disk array to the normal operating mode through the agent includes:
  • the state machine sends a silent request to the agent and prepares mode change data
  • after the timer of the state machine starts timing, the state machine automatically sends a silent request to the agent, and prepares mode change data before switching the operating mode of the disk array. Here the mode change data is the data for switching the disk array from the temporary operating mode to the normal operating mode.
  • the agent performs silent processing, causing the host to suspend issuing input and output operations, and adds the issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
  • after receiving the silent request from the state machine, the agent performs silent processing: it stops the host from issuing input and output operations in the temporary operating mode of the disk array and adds the input and output operations that have already been issued to the silent queue. After the silent processing is completed, a silent operation completion message is returned to the state machine.
  • the state machine sends the mode change data to the agent, and the agent switches the disk array's operating mode from temporary operating mode to normal operating mode based on the mode changing data.
  • after receiving the silent operation completion message, the state machine sends to the agent the mode change data for switching the disk array from the temporary operating mode to the normal operating mode. Based on the mode change data, the agent switches the operating mode of the disk array from the temporary operating mode to the normal operating mode, so that input and output operations are performed in the normal operating mode and business processing capability is restored, which effectively improves the redundancy capability of the disk array.
  • switching the temporary operating mode of the disk array to the normal operating mode through the agent also includes:
  • after the agent switches the disk array from the temporary operating mode to the normal operating mode, a message reporting the switch is sent to the state machine. When the state machine receives the message, it treats the operating mode switch of the disk array as completed. The state machine then sends a silent recovery request to the agent to end the silent state, and the disk array performs input and output operations according to the normal operating mode after switching.
  • the agent performs silent recovery processing, restores the input and output operations in the silent queue to the normal queue, and allows the host to continue to issue input and output operations. After the input and output operations return to normal in the normal operating mode, a message that the input and output operations have returned to normal in the normal operating mode is sent to the state machine;
  • after receiving the silent recovery request from the state machine, the agent restores the input and output operations in the silent queue to the normal queue in the normal operating mode, that is, the input and output operations in the silent queue are executed normally according to the normal operating mode after the disk array is switched. The host also resumes issuing input and output operations, which are likewise executed according to the normal operating mode. After the input and output operations return to normal, a message that they have returned to normal in the normal operating mode is returned to the state machine.
  • at this point the operating status of the disk array has returned to normal, and the disk array has returned to the normal operating mode.
  • the flag state of the state machine needs to be set to the disk normal state; the business and the input and output operations are executed normally, and the disk array continues to run normally. In addition, logs must be printed to save operation information.
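  • The message exchange for one mode switch (silent request, mode change data, silent recovery request) can be traced with a small sketch. The message names, the `handle` dispatcher and the `switch_mode` driver are hypothetical; in a real system the state machine and the agent would be separate components exchanging these messages asynchronously.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class Agent:
    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.normal_queue: Deque[str] = deque()
        self.silent_queue: Deque[str] = deque()
        self.host_paused = False

    def handle(self, message: str, payload: Optional[Tuple[str, str]] = None) -> str:
        if message == "SILENT_REQUEST":
            self.host_paused = True
            self.silent_queue.extend(self.normal_queue)
            self.normal_queue.clear()
            return "SILENT_DONE"
        if message == "MODE_CHANGE_DATA":
            if payload is None:
                raise ValueError("mode change data missing")
            source, target = payload
            if self.mode != source:
                raise ValueError(f"agent is in {self.mode}, not {source}")
            self.mode = target
            return "MODE_SWITCHED"
        if message == "SILENT_RECOVERY_REQUEST":
            self.normal_queue.extend(self.silent_queue)
            self.silent_queue.clear()
            self.host_paused = False
            return "IO_RESUMED"
        raise ValueError(f"unknown message: {message}")


def switch_mode(agent: Agent, source: str, target: str) -> List[Tuple[str, str]]:
    """Drive one switch from the state machine's side and record each reply."""
    steps = (
        ("SILENT_REQUEST", None),
        ("MODE_CHANGE_DATA", (source, target)),
        ("SILENT_RECOVERY_REQUEST", None),
    )
    return [(message, agent.handle(message, payload)) for message, payload in steps]
```

  • Calling `switch_mode(agent, "temporary", "normal")` on an agent in the temporary mode yields the replies `SILENT_DONE`, `MODE_SWITCHED` and `IO_RESUMED` in that order, with any parked operations back in the normal queue.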
  • switching the temporary operating mode of the disk array to the offline operating mode through the agent includes:
  • the state machine sends a silent request to the agent and prepares mode change data
  • after the timer of the state machine starts timing, the state machine automatically sends a silent request to the agent, and prepares mode change data before switching the operating mode of the disk array. Here the mode change data is the data for switching the disk array from the temporary operating mode to the offline operating mode.
  • the agent performs silent processing, causing the host to stop issuing input and output operations, and adds the issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
  • after receiving the silent request from the state machine, the agent performs silent processing: it stops the host from issuing input and output operations in the temporary operating mode of the disk array and adds the input and output operations that have already been issued to the silent queue. After the silent processing is completed, a silent operation completion message is returned to the state machine.
  • the state machine sends the mode change data to the agent, and the agent switches the disk array's operating mode from temporary operating mode to offline operating mode based on the mode changing data.
  • after receiving the silent operation completion message, the state machine sends to the agent the mode change data for switching the disk array from the temporary operating mode to the offline operating mode. Based on the mode change data, the agent switches the operating mode of the disk array from the temporary operating mode to the offline operating mode, which stops all input and output operations and services.
  • switching the temporary operating mode of the disk array to the offline operating mode through the agent also includes:
  • After the agent switches the disk array from the temporary operating mode to the offline operating mode, a message reporting the switch is sent to the state machine.
  • On receiving the message, the state machine treats the operating mode switch of the disk array as complete.
  • The state machine then sends a silent recovery request to the agent to end the silent state, and the disk array handles input and output operations according to the offline operating mode it has switched to.
  • the agent performs silent recovery processing, returns all input and output operation requests to the host, and disconnects the main link of the host;
  • the agent executes the input and output operations in the silent queue according to the offline operation mode after the disk array is switched, that is, stops all input and output operations and services.
  • the flag status of the state machine needs to be set to the disk abnormal status and reported to the operation and maintenance personnel. In addition, logs must be printed to save operation information.
  • the abnormal operating status of the disk array means that the number of offline disks in the disk array exceeds the number of redundant disks.
  • the abnormal state of the disk array means that the number of offline disks in the disk array exceeds the number of redundant disks, and normal input and output operations and normal business processing cannot be performed.
  • Before the timer is started according to the preset redundancy duration, the method further includes:
  • Different fault types of the disk array correspond to different preset redundancy durations. A separate preset redundancy duration can therefore be set for each fault type and stored in a database, so that the duration matching the fault type can be retrieved before the timer starts. The preset redundancy duration also takes into account factors such as the controller failure recovery time.
  • Before the timer is started according to the preset redundancy duration, the method further includes:
  • The state machine determines the fault type of the disk array, obtains the corresponding preset redundancy duration from the preset redundancy duration database based on the fault type, and then starts the timer according to that duration.
  • Figure 1 is a flow chart of the first method of the disk array redundancy method of the present application.
  • Figure 2 is a flow chart of the second method of the disk array redundancy method of the present application.
  • Figure 3 is a timing diagram of the disk array redundancy method of the present application.
  • the abnormal information is sent to the state machine and the operation and maintenance personnel are notified of the abnormal information;
  • When the operating status of the disk array is detected to be abnormal, a disk array abnormality message is sent to the state machine to start its timer, and the operation and maintenance personnel are notified of the abnormal information so that they can troubleshoot and restore the offline disks in the disk array.
  • the fault type of the disk array will be saved.
  • After the state machine receives the abnormal information of the disk array, it automatically sets its flag state to the disk abnormal state, which indicates that disks in the disk array are offline.
  • The state machine automatically starts the timer, obtains the corresponding preset redundancy duration from the preset redundancy duration database according to the fault type of the disk array, and continuously compares the timer's elapsed time with the preset redundancy duration.
  • the relationship between the timer timing time and the preset redundancy duration is used to determine the operating mode that the disk array is to switch to; in addition, the operation and maintenance personnel must be notified that timing has started.
  • the state machine sends a silent request to the agent and prepares mode change data
  • After the timer of the state machine starts timing, the state machine automatically sends a silent request to the agent and prepares the mode change data.
  • the current operating mode of the disk array is the normal operating mode. If it is to be switched to the temporary operating mode, you need to prepare the mode change data for switching the normal operating mode to the temporary operating mode.
  • the agent performs silent processing, causing the host to suspend issuing input and output operations, and adds the issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
  • After receiving the silent request from the state machine, the agent performs silent processing: it stops the input and output operations issued by the host in the normal operating mode of the disk array and adds the already issued input and output operations to the silent queue. Once the silent processing is complete, it returns a silent operation completion message to the state machine.
  • the state machine sends the mode change data to the agent, and the agent switches the operating mode of the disk array from the normal operating mode to the temporary operating mode based on the mode change data;
  • After the state machine receives the silent operation completion message, it sends the mode change data to the agent.
  • The agent switches the operating mode of the disk array according to the mode change data, moving it from the normal operating mode to the temporary operating mode, so that input and output operations are performed in the temporary operating mode without interrupting services, which effectively improves the redundancy capability of the disk array.
  • After the agent switches the disk array from the normal operating mode to the temporary operating mode, a message reporting the switch is sent to the state machine.
  • On receiving the message, the state machine treats the operating mode switch of the disk array as complete.
  • The state machine then sends a silent recovery request to the agent to end the silent state, and the disk array performs input and output operations according to the operating mode it has switched to.
  • the agent performs silent recovery processing, restores the input and output operations in the silent queue to the normal queue, and causes the host to continue to issue input and output operations. After the input and output operations return to normal in the temporary operating mode, a message that the input and output operations have returned to normal in the temporary operating mode is sent to the state machine;
  • After receiving the silent recovery request from the state machine, the agent restores the input and output operations in the silent queue to the normal queue in the temporary operating mode, that is, the input and output operations in the silent queue are executed normally according to the operating mode the disk array has switched to.
  • The host resumes issuing input and output operations, which are likewise executed according to the operating mode the disk array has switched to; once the input and output operations have returned to normal, a message to that effect is returned to the state machine.
  • the state machine sends a silent request to the agent and prepares mode change data
  • After the timer of the state machine starts timing, the state machine automatically sends a silent request to the agent and prepares the mode change data before the operating mode of the disk array is switched.
  • Here the mode change data is the data for switching the disk array from the temporary operating mode to the normal operating mode.
  • the agent performs silent processing, causing the host to suspend issuing input and output operations, and adds the issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
  • After receiving the silent request from the state machine, the agent performs silent processing: it stops the input and output operations issued by the host in the temporary operating mode of the disk array and adds the already issued input and output operations to the silent queue. Once the silent processing is complete, it returns a silent operation completion message to the state machine.
  • the state machine sends the mode change data to the agent, and the agent switches the disk array's operating mode from temporary operating mode to normal operating mode based on the mode changing data;
  • After receiving the silent operation completion message, the state machine sends the agent the mode change data for switching the disk array from the temporary operating mode to the normal operating mode.
  • The agent switches the operating mode of the disk array according to the mode change data, moving it from the temporary operating mode to the normal operating mode, so that normal operation, input and output operations and service processing capability are restored, which effectively improves the redundancy capability of the disk array.
  • After the agent switches the disk array from the temporary operating mode to the normal operating mode, a message reporting the switch is sent to the state machine.
  • On receiving the message, the state machine treats the operating mode switch of the disk array as complete.
  • The state machine then sends a silent recovery request to the agent to end the silent state, and the disk array performs input and output operations according to the normal operating mode it has switched to.
  • the agent performs silent recovery processing, restores the input and output operations in the silent queue to the normal queue, and causes the host to continue to issue input and output operations. After the input and output operations return to normal in the normal operating mode, a message that the input and output operations have returned to normal in the normal operating mode is sent to the state machine;
  • After receiving the silent recovery request from the state machine, the agent restores the input and output operations in the silent queue to the normal queue in the normal operating mode, that is, the input and output operations in the silent queue are executed according to the normal operating mode the disk array has switched to. The host also resumes issuing input and output operations, which are likewise executed according to the normal operating mode; once the input and output operations have returned to normal, a message to that effect is returned to the state machine.
  • the operating status of the disk array has returned to normal, and the disk array has returned to the normal operating mode.
  • the flag state of the state machine needs to be set to the disk normal state.
  • the state machine sends a silent request to the agent and prepares mode change data
  • After the timer of the state machine starts timing, the state machine automatically sends a silent request to the agent and prepares the mode change data before the operating mode of the disk array is switched.
  • Here the mode change data is the data for switching the disk array from the temporary operating mode to the offline operating mode.
  • the agent performs silent processing, causing the host to stop issuing input and output operations, and adds the issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
  • After receiving the silent request from the state machine, the agent performs silent processing: it stops the input and output operations issued by the host in the temporary operating mode of the disk array and adds the already issued input and output operations to the silent queue. Once the silent processing is complete, it returns a silent operation completion message to the state machine.
  • the state machine sends the mode change data to the agent, and the agent switches the disk array's operating mode from temporary operating mode to offline operating mode based on the mode changing data;
  • After receiving the silent operation completion message, the state machine sends the agent the mode change data for switching the disk array from the temporary operating mode to the offline operating mode.
  • The agent switches the operating mode of the disk array according to the mode change data, moving it from the temporary operating mode to the offline operating mode so that all input and output operations and services stop.
  • After the agent switches the disk array from the temporary operating mode to the offline operating mode, a message reporting the switch is sent to the state machine.
  • On receiving the message, the state machine treats the operating mode switch of the disk array as complete.
  • The state machine then sends a silent recovery request to the agent to end the silent state, and the disk array handles input and output operations according to the offline operating mode it has switched to.
  • the agent performs silent recovery processing, returns all input and output operation requests to the host, and disconnects the host's main link;
  • After receiving the silent recovery request from the state machine, the agent handles the input and output operations in the silent queue according to the offline operating mode the disk array has switched to, that is, it stops all input and output operations and services.
  • the flag status of the state machine needs to be set to the disk abnormal status and reported to the operation and maintenance personnel.
  • Figure 4 is a system structure diagram of the disk array redundant system of the present application.
  • the disk management terminal is used to monitor the operating status of the disk array in real time. When the operating status of the disk array is abnormal, it sends abnormal information to the state machine;
  • the disk management end includes a status monitoring module and an information processing module.
  • the status monitoring module is used to monitor the operating status of the disk array in real time.
  • the information processing module is used to send abnormal information to the state machine when the operating status of the disk array is abnormal.
  • The disk array is equipped with redundant disks. When the disk array fails and the number of offline disks exceeds the number of redundant disks, the operating status of the disk array is abnormal. The status monitoring module therefore monitors the disk array in real time to track its operating status.
  • When the operating status of the disk array is detected to be abnormal, the disk array exception message is sent to the state machine through the information processing module, causing the state machine to start its timer.
  • the state machine is used to start the timer for timing according to the preset redundancy duration, send a silent request or a silent recovery request to the agent, and determine whether the timer's timing exceeds the preset redundancy duration;
  • The state machine includes a timer module, a request processing module and a timing judgment module: the timer module starts the timer according to the preset redundancy duration; the request processing module sends a silent request or a silent recovery request to the agent; the timing judgment module determines whether the timer's elapsed time exceeds the preset redundancy duration. The timer is started through the timer module.
  • After the state machine starts the timer, it sends a silent request or a silent recovery request to the agent through the request processing module, so that the agent performs silent processing or silent recovery processing.
  • After the message that input and output operations have returned to normal in the temporary operating mode is sent to the state machine, the timing judgment module compares the timer's elapsed time with the preset redundancy duration to determine whether the disk array should subsequently switch to the normal operating mode or to the offline operating mode.
  • the agent is used to perform silent processing or silent recovery processing, and switch the operating mode of the disk array.
  • the agent includes a silent processing module and a mode switching module; the silent processing module is used to perform silent processing or silent recovery processing; the mode switching module is used to switch the operating mode of the disk array.
  • After receiving the silent request or silent recovery request sent by the state machine, the silent processing module performs the corresponding silent processing or silent recovery processing; in addition, the mode switching module switches the operating mode of the disk array according to the mode change data.
  • the disk array redundant system further includes:
  • Information alarm terminal is used to send alarm information to operation and maintenance personnel.
  • the state machine further includes:
  • the state setting module is used to set the flag state of the state machine.
  • The flag state of the state machine is set either to the disk abnormal state or to the disk normal state.
  • the mode data module is used to store the configuration data of each operating mode of the disk array.
  • the mode change data is the data when switching the operating mode of the disk array, and the configuration data of each operating mode of the disk array is stored through the mode data module.
  • Each module in the above-mentioned disk array redundant system can be realized in whole or in part through software, hardware and their combination.
  • Each of the above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • This embodiment provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, the steps of the disk array redundancy method are implemented.
  • the computer device may be a terminal, and its internal structure diagram may be as shown in Figure 5.
  • The computer device includes a processor, a memory, a network interface, a display screen and an input device connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes non-volatile storage media and internal memory.
  • the non-volatile storage medium stores an operating system and computer-readable instructions.
  • This internal memory provides an environment for the execution of an operating system and computer-readable instructions in a non-volatile storage medium.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • the computer readable instructions when executed by the processor implement the disk array redundancy method.
  • the display screen of the computer device may be a liquid crystal display or an electronic ink display.
  • the input device of the computer device may be a touch layer covering the display screen, or a button, trackball or touch pad provided on the housing of the computer device, or it can also be an external keyboard, track
  • FIG. 5 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied.
  • A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
  • a computer device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, the disk array redundancy method is implemented.
  • This embodiment provides a non-volatile computer-readable storage medium on which computer-readable instructions are stored.
  • When the computer-readable instructions are executed by a processor, the steps of the disk array redundancy method are implemented.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Hardware Redundancy (AREA)

Abstract

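This application discloses a disk array redundancy method, system, computer device and storage medium. The method includes: monitoring the operating status of the disk array in real time and, in response to detecting that the operating status of the disk array is abnormal, sending abnormal information to a state machine so that the state machine starts a timer according to a preset redundancy duration; the state machine sending a silent request to an agent so that the agent performs silent processing to suspend the input and output operations of the disk array, and switching the disk array from the normal operating mode to a temporary operating mode through the agent; the state machine sending a silent recovery request to the agent so that the agent performs silent recovery processing to resume the input and output operations of the disk array; and, after the silent recovery processing is completed, in response to the timer's elapsed time not exceeding the preset redundancy duration and the operating status of the disk array having returned to normal, switching the operating mode of the disk array to the normal operating mode through the agent.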

Description

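Disk array redundancy method, system, computer device and storage medium
Cross-reference to related applications
This application claims priority to the Chinese patent application filed with the China Patent Office on July 28, 2022, with application number 202210895183.2 and entitled "Disk array redundancy method, system, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of disk arrays, and in particular to a disk array redundancy method, system, computer device and storage medium.
Background
In a storage system, RAID (Redundant Array of Independent Disks) combines multiple hard disk devices into a disk array with larger capacity and better safety. Data is split into multiple segments that are stored on different physical hard disk devices, and distributed reads and writes are used to improve the overall performance of the disk array. Copies of important data are also synchronized to different physical hard disk devices, which provides very good redundant data backup.
However, the inventors realized that during the use of RAID, various disk faults, link faults, inter-enclosure node faults and the like inevitably occur. These faults can cause the number of offline disks to exceed the number of redundant disks, and can further cause hard disks to go offline temporarily or to go online and offline frequently. Although RAID provides fault tolerance and performance to a certain extent, it cannot guarantee that services always stay online: when the number of offline disks in the RAID exceeds the number of redundant disks, the array goes offline, front-end services stop, and the business goes down directly, which is a catastrophic failure for key industries such as finance and telecommunications.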
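Summary of the invention
According to one aspect, the present application provides a disk array redundancy method, which includes:
monitoring the operating status of the disk array in real time and, in response to detecting that the operating status of the disk array is abnormal, sending abnormal information to the state machine, so that the state machine starts a timer according to a preset redundancy duration;
the state machine sending a silent request to the agent so that the agent performs silent processing to suspend the input and output operations of the disk array, and switching the disk array from the normal operating mode to the temporary operating mode through the agent;
the state machine sending a silent recovery request to the agent so that the agent performs silent recovery processing to resume the input and output operations of the disk array; and
after the silent recovery processing is completed, in response to the timer's elapsed time not exceeding the preset redundancy duration and the operating status of the disk array having returned to normal, switching the operating mode of the disk array to the normal operating mode through the agent.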
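In some of these embodiments, in response to detecting that the operating status of the disk array is abnormal, sending abnormal information to the state machine so that the state machine starts a timer according to the preset redundancy duration includes:
in response to detecting that the operating status of the disk array is abnormal, sending abnormal information to the state machine and notifying the operation and maintenance personnel of the abnormal information; and
setting the flag state of the state machine to the disk abnormal state, starting the timer according to the preset redundancy duration, and notifying the operation and maintenance personnel that timing has started.
In some of these embodiments, the state machine sending a silent request to the agent so that the agent performs silent processing to suspend the input and output operations of the disk array, and switching the disk array from the normal operating mode to the temporary operating mode through the agent, includes:
the state machine sending a silent request to the agent and preparing mode change data;
the agent performing silent processing, causing the host to suspend issuing input and output operations, adding the already issued input and output operations to the silent queue, and, after the agent completes the silent processing, sending a silent operation completion message to the state machine; and
the state machine sending the mode change data to the agent, and the agent switching the operating mode of the disk array from the normal operating mode to the temporary operating mode according to the mode change data.
In some of these embodiments, the state machine sending a silent recovery request to the agent so that the agent performs silent recovery processing to resume the input and output operations of the disk array includes:
sending to the state machine a message that the disk array has been switched from the normal operating mode to the temporary operating mode, the state machine then sending a silent recovery request to the agent; and
the agent performing silent recovery processing, restoring the input and output operations in the silent queue to the normal queue, and causing the host to continue issuing input and output operations; after the input and output operations return to normal in the temporary operating mode, sending to the state machine a message that the input and output operations have returned to normal in the temporary operating mode.
In some of these embodiments, after the message that the input and output operations have returned to normal in the temporary operating mode is sent to the state machine, the method further includes:
obtaining the operating status of the disk array and, in response to the operating status of the disk array having returned to normal and the timer's elapsed time not exceeding the preset redundancy duration, switching the disk array from the temporary operating mode to the normal operating mode through the agent; and
in response to the operating status of the disk array not having returned to normal, or the timer's elapsed time exceeding the preset redundancy duration, switching the disk array from the temporary operating mode to the offline operating mode through the agent.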
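In some of these embodiments, switching the disk array from the temporary operating mode to the normal operating mode through the agent includes:
the state machine sending a silent request to the agent and preparing mode change data;
the agent performing silent processing, causing the host to suspend issuing input and output operations, adding the already issued input and output operations to the silent queue, and, after the agent completes the silent processing, sending a silent operation completion message to the state machine; and
the state machine sending the mode change data to the agent, and the agent switching the operating mode of the disk array from the temporary operating mode to the normal operating mode according to the mode change data.
In some of these embodiments, switching the disk array from the temporary operating mode to the normal operating mode through the agent further includes:
sending to the state machine a message that the disk array has been switched from the temporary operating mode to the normal operating mode, the state machine then sending a silent recovery request to the agent;
the agent performing silent recovery processing, restoring the input and output operations in the silent queue to the normal queue, and causing the host to continue issuing input and output operations; after the input and output operations return to normal in the normal operating mode, sending to the state machine a message that the input and output operations have returned to normal in the normal operating mode; and
setting the flag state of the state machine to the disk normal state.
In some of these embodiments, switching the disk array from the temporary operating mode to the offline operating mode through the agent includes:
the state machine sending a silent request to the agent and preparing mode change data;
the agent performing silent processing, causing the host to stop issuing input and output operations, adding the already issued input and output operations to the silent queue, and, after the agent completes the silent processing, sending a silent operation completion message to the state machine; and
the state machine sending the mode change data to the agent, and the agent switching the operating mode of the disk array from the temporary operating mode to the offline operating mode according to the mode change data.
In some of these embodiments, switching the disk array from the temporary operating mode to the offline operating mode through the agent further includes:
sending to the state machine a message that the disk array has been switched from the temporary operating mode to the offline operating mode, the state machine then sending a silent recovery request to the agent;
the agent performing silent recovery processing, returning all input and output operation requests to the host, and disconnecting the host's main link; and
reporting the disk abnormal state of the state machine to the operation and maintenance personnel.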
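In some of these embodiments, an abnormal operating status of the disk array means that the number of offline disks in the disk array exceeds the number of redundant disks.
In some of these embodiments, before the timer is started according to the preset redundancy duration, the method further includes:
determining a preset redundancy duration database according to the fault type of the disk array, the processing time of input and output operations, and the disk plug and unplug time.
In some of these embodiments, before the timer is started according to the preset redundancy duration, the method further includes:
obtaining the corresponding preset redundancy duration from the preset redundancy duration database according to the fault type.
According to another aspect, the present application provides a disk array redundancy system, which includes:
a disk management terminal, used to monitor the operating status of the disk array in real time and, in response to the operating status of the disk array being abnormal, send abnormal information to the state machine;
a state machine, used to start the timer according to the preset redundancy duration, send a silent request or a silent recovery request to the agent, and determine whether the timer's elapsed time exceeds the preset redundancy duration; and
an agent, used to perform silent processing or silent recovery processing and to switch the operating mode of the agent.
According to a further aspect, the present application provides a computer device, including one or more memories, one or more processors, and computer-readable instructions stored in the one or more memories and executable on the one or more processors; when the one or more processors execute the computer-readable instructions, the steps of the disk array redundancy method are implemented.
According to yet another aspect, the present application provides a non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the disk array redundancy method.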
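Description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flow chart of a first disk array redundancy method in one or more embodiments of the present application;
Figure 2 is a flow chart of a second disk array redundancy method in one or more embodiments of the present application;
Figure 3 is a timing diagram of a disk array redundancy method in one or more embodiments of the present application;
Figure 4 is a system structure diagram of a disk array redundancy system in one or more embodiments of the present application;
Figure 5 is a device structure diagram of a computer device in one or more embodiments of the present application.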
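Detailed description
To make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
During the use of RAID, various disk faults, link faults, inter-enclosure node faults and the like inevitably occur. These faults can cause the number of offline disks to exceed the number of redundant disks, and can further cause hard disks to go offline temporarily or to go online and offline frequently. Although RAID provides fault tolerance and performance to a certain extent, it cannot guarantee that services always stay online: when the number of offline disks in the RAID exceeds the number of redundant disks, the array goes offline, front-end services stop, and the business goes down directly, which is a catastrophic failure for key industries such as finance and telecommunications. To address this technical problem, this document proposes a disk array redundancy method, system, computer device and storage medium that puts a disk array whose operating status is abnormal into a temporary operating mode, so that services are not cut off and continue to run. A preset redundancy duration is set for the temporary operating mode: services are kept running within the preset redundancy duration, and if the disk array recovers within that duration it automatically returns to the normal operating mode. This improves the redundancy capability of the disk array and keeps services running.
For the disk array redundancy method of this embodiment, refer to Figures 1 to 3. Figure 1 is a flow chart of the first method of the disk array redundancy method of the present application; Figure 2 is a flow chart of the second method of the disk array redundancy method of the present application; Figure 3 is a timing diagram of the disk array redundancy method of the present application.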
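The disk array redundancy method includes the following steps:
Monitor the operating status of the disk array in real time; when the operating status of the disk array is detected to be abnormal, send abnormal information to the state machine so that the state machine starts a timer according to the preset redundancy duration;
The disk array is configured with redundant disks. When the disk array fails and the number of offline disks exceeds the number of redundant disks, the operating status of the disk array is abnormal (for example, a RAID6 disk array has two redundant disks, and when more than 2 of its disks are offline the RAID6 disk array is in an abnormal state). The disk array is therefore monitored in real time to track its operating status. When the operating status is detected to be abnormal, the disk array's abnormality message is sent to the state machine, which starts its timer and monitors the relationship between the elapsed time and the preset redundancy duration in order to switch the operating mode of the disk array based on that relationship.
The state machine sends a silent request to the agent so that the agent performs silent processing to suspend the input and output operations of the disk array, and the agent switches the disk array from the normal operating mode to the temporary operating mode;
After the state machine starts the timer, it sends a silent request to the agent so that the agent performs silent processing. Silent processing means suspending the input and output operations of the disk array in its current operating mode, which at this point means suspending them in the normal operating mode. The agent then switches the operating mode of the disk array to the temporary operating mode so that input and output operations can still be handled while the disk array is abnormal. A disk array in the temporary operating mode handles a reduced volume of input and output operations, but normal services can still be maintained.
The state machine sends a silent recovery request to the agent so that the agent performs silent recovery processing to resume the input and output operations of the disk array;
After the agent switches the operating mode of the disk array to the temporary operating mode, the state machine sends a silent recovery request to the agent. A silent recovery request asks the disk array to perform input and output operations according to the operating mode it has been switched to, which at this point is the temporary operating mode, so that a disk array in an abnormal state can still keep normal services running.
After the silent recovery processing is completed, if the timer's elapsed time has not exceeded the preset redundancy duration and the operating status of the disk array has returned to normal, the operating mode of the agent is switched to the normal operating mode.
After the agent completes the silent recovery processing, the operating status of the disk array needs to be monitored. If the operating status of the disk array has returned to normal and the timer's elapsed time has not exceeded the preset redundancy duration, the operating mode of the agent is switched to the normal operating mode. During the preset redundancy duration, the offline disks in the disk array may come back online automatically, or the operation and maintenance personnel may manually service the disk array to bring the offline disks back online. In short, as long as the operating status of the disk array returns to normal within the preset redundancy duration, the operating mode of the disk array can be switched back to the normal operating mode, restoring normal input and output operations and normal services, which effectively improves the redundancy capability of the disk array.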
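In one of the embodiments, when the operating status of the disk array is detected to be abnormal, sending abnormal information to the state machine so that the state machine starts a timer according to the preset redundancy duration includes:
When the operating status of the disk array is detected to be abnormal, send abnormal information to the state machine and notify the operation and maintenance personnel of the abnormal information;
The operating status of the disk array is monitored in real time. When it is detected to be abnormal, a disk array abnormality message is sent to the state machine to start its timer, and the operation and maintenance personnel are notified of the abnormal information so that they can troubleshoot and restore the offline disks in the disk array. In addition, logs are printed to record the operation information.
Set the flag state of the state machine to the disk abnormal state, start the timer according to the preset redundancy duration, and notify the operation and maintenance personnel that timing has started.
The state machine has a flag state that identifies the operating status of the disk array. After receiving the abnormal information of the disk array, the state machine automatically sets its flag state to the disk abnormal state, which indicates that disks in the disk array are offline. The state machine then automatically starts the timer and continuously compares the timer's elapsed time with the preset redundancy duration, using the relationship between the two to determine the operating mode the disk array should switch to. In addition, the operation and maintenance personnel are notified that timing has started.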
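The abnormal-information handling described above can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: the class name `StateMachine`, the flag strings, the `notify` callback and the injectable `clock` are all hypothetical names introduced here, and the source does not specify how the timer is realized.

```python
import time
from typing import Callable, Optional


class StateMachine:
    """Hypothetical sketch: flag state plus a redundancy timer."""

    def __init__(self, notify: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.flag = "disk_normal"            # flag state of the state machine
        self._notify = notify                # assumed channel to O&M personnel
        self._clock = clock                  # injectable so the sketch is testable
        self._started_at: Optional[float] = None
        self._preset_s: Optional[float] = None

    def on_abnormal(self, preset_s: float) -> None:
        """Set the disk abnormal flag, start timing, tell O&M timing began."""
        self.flag = "disk_abnormal"
        self._preset_s = preset_s
        self._started_at = self._clock()
        self._notify("redundancy timer started")

    def elapsed(self) -> float:
        """Seconds since the timer was started."""
        if self._started_at is None:
            raise RuntimeError("timer has not been started")
        return self._clock() - self._started_at

    def timed_out(self) -> bool:
        """True once the elapsed time exceeds the preset redundancy duration."""
        return self.elapsed() > self._preset_s
```

Using a monotonic clock rather than wall-clock time is a design choice of this sketch: it keeps the elapsed-time comparison unaffected by system clock adjustments.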
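In one of the embodiments, the state machine sending a silent request to the agent so that the agent performs silent processing to suspend the input and output operations of the disk array, and switching the disk array from the normal operating mode to the temporary operating mode through the agent, includes:
The state machine sends a silent request to the agent and prepares mode change data;
After the timer of the state machine starts timing, the state machine automatically sends a silent request to the agent and prepares the mode change data. When the disk array handles input and output requests in the normal operating mode, it does so according to the normal operating mode data; in the temporary operating mode, according to the temporary operating mode data; and in the offline operating mode, according to the offline operating mode data. The mode change data must therefore be prepared before the operating mode of the disk array is switched. For example, if the current operating mode of the disk array is the normal operating mode and it is about to switch to the temporary operating mode, the mode change data for switching from the normal operating mode to the temporary operating mode must be prepared.
The agent performs silent processing, causing the host to suspend issuing input and output operations, and adds the already issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
After receiving the silent request from the state machine, the agent performs silent processing: it stops the input and output operations issued by the host in the normal operating mode of the disk array and adds the already issued input and output operations to the silent queue. Once the silent processing is complete, it returns a silent operation completion message to the state machine.
The state machine sends the mode change data to the agent, and the agent switches the operating mode of the disk array from the normal operating mode to the temporary operating mode according to the mode change data.
After receiving the silent operation completion message, the state machine sends the mode change data to the agent. The agent switches the operating mode of the disk array according to the mode change data, moving it from the normal operating mode to the temporary operating mode, so that input and output operations are performed in the temporary operating mode without cutting off services, which effectively improves the redundancy capability of the disk array.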
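In one of the embodiments, the state machine sending a silent recovery request to the agent so that the agent performs silent recovery processing to resume the input and output operations of the disk array includes:
A message that the disk array has been switched from the normal operating mode to the temporary operating mode is sent to the state machine, and the state machine sends a silent recovery request to the agent;
After the agent switches the disk array from the normal operating mode to the temporary operating mode, it sends a message reporting the switch to the state machine. On receiving the message, the state machine treats the operating mode switch of the disk array as complete and then sends a silent recovery request to the agent to end the silent state, after which the disk array performs input and output operations according to the operating mode it has switched to.
The agent performs silent recovery processing, restores the input and output operations in the silent queue to the normal queue, and causes the host to continue issuing input and output operations. After the input and output operations return to normal in the temporary operating mode, a message that the input and output operations have returned to normal in the temporary operating mode is sent to the state machine.
After receiving the silent recovery request from the state machine, the agent restores the input and output operations in the silent queue to the normal queue in the temporary operating mode, that is, the input and output operations in the silent queue are executed normally according to the operating mode the disk array has switched to. The host also resumes issuing input and output operations, which are likewise executed according to the operating mode the disk array has switched to. Once the input and output operations have returned to normal, a message to that effect is returned to the state machine.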
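The silent request, mode switch and silent recovery sequence can be illustrated with a small in-memory model of the agent. This is a sketch under stated assumptions: the `Agent` class, its method names, the returned message strings and the `{"from": ..., "to": ...}` shape of the mode change data are hypothetical, and input and output operations are modeled as plain strings rather than real block requests.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Agent:
    """Hypothetical agent: parks I/O while silent, replays it after the switch."""

    def __init__(self) -> None:
        self.mode = "normal"
        self.silent = False
        self.silent_queue: Deque[str] = deque()
        self.executed: List[Tuple[str, str]] = []   # (io, mode it ran under)

    def submit(self, io: str) -> None:
        """Host-issued I/O: executed immediately unless the agent is silent."""
        if self.silent:
            self.silent_queue.append(io)
        else:
            self.executed.append((io, self.mode))

    def quiesce(self) -> str:
        """Silent processing; the return value stands in for the completion message."""
        self.silent = True
        return "silent_done"

    def switch_mode(self, mode_change: Dict[str, str]) -> str:
        """Apply mode change data; only allowed while I/O is suspended."""
        if not self.silent:
            raise RuntimeError("mode switch requires silent processing first")
        self.mode = mode_change["to"]
        return "mode_switched"

    def resume(self) -> str:
        """Silent recovery: replay parked I/O under the new mode."""
        self.silent = False
        while self.silent_queue:
            self.executed.append((self.silent_queue.popleft(), self.mode))
        return "io_resumed"
```

In this model a state machine would call `quiesce()`, wait for the completion message, call `switch_mode({"from": "normal", "to": "temporary"})`, and then call `resume()`, so that any operation parked in the silent queue runs under the temporary operating mode.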
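In one of the embodiments, after the message that the input and output operations have returned to normal in the temporary operating mode is sent to the state machine, the method further includes:
Obtain the operating status of the disk array; if the operating status of the disk array has returned to normal and the timer's elapsed time has not exceeded the preset redundancy duration, switch the disk array from the temporary operating mode to the normal operating mode through the agent;
After receiving the message that the input and output operations have returned to normal, the state machine obtains the operating status of the disk array and determines whether the status is normal and whether the timer's elapsed time exceeds the preset redundancy duration. If the operating status has returned to normal and the elapsed time has not exceeded the preset redundancy duration, the offline disks in the disk array have come back online automatically or the operation and maintenance personnel have manually brought them back online; that is, the disk array has returned to a normal state. The agent then switches the disk array from the temporary operating mode to the normal operating mode, input and output operations are performed according to the normal operating mode, and service processing on the disk array returns to normal.
If the operating status of the disk array has not returned to normal, or the timer's elapsed time exceeds the preset redundancy duration, switch the disk array from the temporary operating mode to the offline operating mode through the agent.
If the operating status of the disk array has not returned to normal, or the timer's elapsed time exceeds the preset redundancy duration, the disk array can no longer perform input and output operations properly in the temporary operating mode, cannot return to the normal operating mode, and cannot process services normally. The agent then switches the disk array from the temporary operating mode to the offline operating mode, and a message is sent to the operation and maintenance personnel so that the disk array in the offline operating mode is serviced promptly.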
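The two branches above reduce to a single decision, sketched below. The names `Mode` and `next_mode` are hypothetical. The function follows the text literally, treating the two conditions as exhaustive; whether the status check is repeated until the timer expires is not stated in the source and is left out of this sketch.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    TEMPORARY = "temporary"
    OFFLINE = "offline"


def next_mode(array_recovered: bool, elapsed_s: float, preset_s: float) -> Mode:
    """Pick the mode to leave the temporary operating mode for.

    Normal mode requires both conditions: the array has recovered AND the
    elapsed time has not exceeded the preset redundancy duration.
    Otherwise the array goes to the offline operating mode.
    """
    if array_recovered and elapsed_s <= preset_s:
        return Mode.NORMAL
    return Mode.OFFLINE
```

For example, `next_mode(True, 12.0, 30.0)` yields `Mode.NORMAL`, while a recovery that arrives after the preset duration, such as `next_mode(True, 31.0, 30.0)`, still yields `Mode.OFFLINE`.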
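In one of the embodiments, switching the disk array from the temporary operating mode to the normal operating mode through the agent includes:
The state machine sends a silent request to the agent and prepares mode change data;
After the timer of the state machine starts timing, the state machine automatically sends a silent request to the agent and, before the operating mode of the disk array is switched, prepares the mode change data, which here is the data for switching the disk array from the temporary operating mode to the normal operating mode.
The agent performs silent processing, causing the host to suspend issuing input and output operations, and adds the already issued input and output operations to the silent queue. After the agent completes the silent processing, it sends a silent operation completion message to the state machine;
After receiving the silent request from the state machine, the agent performs silent processing: it stops the input and output operations issued by the host in the temporary operating mode of the disk array and adds the already issued input and output operations to the silent queue. Once the silent processing is complete, it returns a silent operation completion message to the state machine.
The state machine sends the mode change data to the agent, and the agent switches the operating mode of the disk array from the temporary operating mode to the normal operating mode according to the mode change data.
After receiving the silent operation completion message, the state machine sends the agent the mode change data for switching the disk array from the temporary operating mode to the normal operating mode. The agent switches the operating mode of the disk array according to the mode change data, moving it from the temporary operating mode to the normal operating mode, so that normal operation, input and output operations and service processing capability are restored, which effectively improves the redundancy capability of the disk array.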
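In one embodiment, switching the disk array from the temporary operating mode to the normal operating mode through the agent further includes:
A message indicating that the disk array has been switched from the temporary operating mode to the normal operating mode is sent to the state machine, and the state machine sends a quiesce resume request to the agent;
After the agent switches the disk array from the temporary operating mode to the normal operating mode, it sends a message reporting the switch to the state machine. On receiving this message, the state machine treats the mode switch as complete and sends a quiesce resume request to the agent to end the quiesce, so that the disk array executes I/O operations in the switched-to normal operating mode.
The agent performs quiesce resume processing: it restores the I/O operations in the quiesce queue to the normal queue and lets the host continue issuing I/O operations. Once I/O operations have returned to normal in the normal operating mode, the agent sends a message to that effect to the state machine;
After receiving the quiesce resume request from the state machine, the agent restores the I/O operations in the quiesce queue to the normal queue in the normal operating mode; that is, the queued I/O operations are executed in the disk array's switched-to normal operating mode, and the host resumes issuing I/O operations, which are likewise executed in that mode. Once I/O operations have returned to normal, the agent returns a message to that effect to the state machine.
The flag state of the state machine is set to the member-disk normal state.
At this point the operating state of the disk array has returned to normal and the disk array is back in the normal operating mode, so the flag state of the state machine is set to the member-disk normal state. Services and I/O operations are executed normally, and the disk array continues to run normally. In addition, a log is printed to record the operation information.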
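The message exchange for one mode switch (quiesce request, mode change data, quiesce resume request) can be sketched as follows. This is an illustrative model only: the message names, `ModeSwitchAgent`, and `switch_mode` are hypothetical, and the check that the mode change data matches the current mode is an added assumption rather than something the application states.

```python
class ModeSwitchAgent:
    """Hypothetical agent that only changes mode while quiesced."""

    def __init__(self, mode):
        self.mode = mode
        self.quiesced = False

    def handle(self, message, payload=None):
        if message == "quiesce_request":
            self.quiesced = True
            return "quiesce_done"
        if message == "mode_change_data":
            if not self.quiesced:
                raise RuntimeError("mode change requires a completed quiesce")
            if payload["from"] != self.mode:
                raise ValueError("mode change data does not match current mode")
            self.mode = payload["to"]
            return "mode_switched"
        if message == "quiesce_resume_request":
            self.quiesced = False
            return "io_resumed"
        raise ValueError(f"unknown message: {message}")


def switch_mode(agent, target_mode):
    """State-machine side: drive one switch and return (message, reply) pairs."""
    change = {"from": agent.mode, "to": target_mode}  # prepared before the switch
    steps = [
        ("quiesce_request", None),
        ("mode_change_data", change),
        ("quiesce_resume_request", None),
    ]
    return [(message, agent.handle(message, payload)) for message, payload in steps]
```

Calling `switch_mode(ModeSwitchAgent("temporary"), "normal")` walks through the three replies in order; the same helper would model the normal-to-temporary switch by starting from `"normal"`.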
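In one embodiment, switching the disk array from the temporary operating mode to the offline operating mode through the agent includes:
The state machine sends a quiesce request to the agent and prepares mode change data;
After the state machine's timer has started timing, the state machine automatically sends a quiesce request to the agent and, before the mode switch is performed, prepares the mode change data. Here the mode change data is the data for switching the disk array from the temporary operating mode to the offline operating mode.
The agent performs quiesce processing: it makes the host stop issuing I/O operations and adds the I/O operations already issued to the quiesce queue. When the quiesce processing is complete, the agent sends a quiesce-complete message to the state machine;
After receiving the quiesce request from the state machine, the agent performs quiesce processing; that is, it halts the I/O operations issued by the host in the temporary operating mode and adds the I/O operations already issued to the quiesce queue. When the quiesce processing is complete, it returns a quiesce-complete message to the state machine.
The state machine sends the mode change data to the agent, and the agent switches the disk array from the temporary operating mode to the offline operating mode according to the mode change data.
After receiving the quiesce-complete message, the state machine sends the agent the mode change data for switching from the temporary operating mode to the offline operating mode. The agent switches the operating mode of the disk array accordingly, placing it in the offline operating mode so that all I/O operations and services can be stopped.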
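In one embodiment, switching the disk array from the temporary operating mode to the offline operating mode through the agent further includes:
A message indicating that the disk array has been switched from the temporary operating mode to the offline operating mode is sent to the state machine, and the state machine sends a quiesce resume request to the agent;
After the agent switches the disk array from the temporary operating mode to the offline operating mode, it sends a message reporting the switch to the state machine. On receiving this message, the state machine treats the mode switch as complete and sends a quiesce resume request to the agent to end the quiesce, so that the disk array handles I/O operations according to the switched-to offline operating mode.
The agent performs quiesce resume processing, returns all I/O operation requests to the host, and disconnects the main host link;
After receiving the quiesce resume request from the state machine, the agent handles the I/O operations in the quiesce queue according to the switched-to offline operating mode; that is, all I/O operations and services are stopped.
The member-disk abnormal state of the state machine is reported to O&M personnel.
At this point the operating state of the disk array can no longer return to normal, so the flag state of the state machine is set to the member-disk abnormal state and reported to O&M personnel. In addition, a log is printed to record the operation information.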
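For the offline case, quiesce resume processing drains the quiesce queue back to the host instead of executing it. The sketch below is a loose illustration: the function `resume_offline`, the `"array_offline"` status string, and modeling the main host link as a dictionary are all assumptions, since the application does not specify how requests are returned.

```python
from collections import deque


def resume_offline(quiesce_queue, host_link):
    """Return every parked I/O request to the host, then drop the host link.

    Assumption: each request is returned with a failure status rather than
    being executed, because the array is in the offline operating mode.
    """
    returned = []
    while quiesce_queue:
        io = quiesce_queue.popleft()
        returned.append({"io": io, "status": "array_offline"})
    host_link["connected"] = False  # disconnect the main host link
    return returned
```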
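In one embodiment, an abnormal operating state of the disk array means that the number of offline disks in the disk array exceeds the number of redundant disks.
In the abnormal state, the number of offline disks in the disk array exceeds the number of redundant disks, so normal I/O operations and normal service processing are not possible.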
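The definition above is a strict comparison of two counts. A minimal sketch, with the hypothetical function name `is_array_abnormal`:

```python
def is_array_abnormal(offline_disks: int, redundant_disks: int) -> bool:
    """Abnormal only when offline disks strictly exceed redundant disks."""
    if offline_disks < 0 or redundant_disks < 0:
        raise ValueError("disk counts must be non-negative")
    return offline_disks > redundant_disks
```

With two redundant disks, as in claim 3, two offline disks are still within the redundancy of the array, and a third offline disk makes the state abnormal.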
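In one embodiment, before the timer is started according to the preset redundancy duration, the method further includes:
Determining a preset redundancy duration database according to the fault type of the disk array, the I/O operation processing duration, and the disk removal and insertion duration.
Different fault types of the disk array call for different preset redundancy durations. A separate preset redundancy duration can therefore be set for each fault type and stored in a database, so that the duration matching the fault type can be retrieved before the timer starts. The preset redundancy duration should also take into account factors such as the controller fault recovery duration.
In one embodiment, before the timer is started according to the preset redundancy duration, the method further includes:
Obtaining the corresponding preset redundancy duration from the preset redundancy duration database according to the fault type.
When the disk array is detected to be in an abnormal state, the state machine determines the fault type of the disk array, obtains the corresponding preset redundancy duration from the preset redundancy duration database, and then starts the timer according to that duration.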
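One way to picture the preset redundancy duration database is a mapping from fault type to duration. The application names the inputs (fault type, I/O processing duration, disk removal and insertion duration, controller fault recovery duration) but not how they are combined, so the simple sum below is purely an assumption, as are the function names and the fault type labels used in the usage note.

```python
def build_duration_db(plug_durations_s, io_processing_s, controller_recovery_s=0.0):
    """Build {fault_type: preset redundancy duration in seconds}.

    plug_durations_s maps each fault type to its disk removal/insertion time.
    Assumption: the preset duration is the plain sum of the contributing times.
    """
    return {
        fault_type: plug_s + io_processing_s + controller_recovery_s
        for fault_type, plug_s in plug_durations_s.items()
    }


def preset_duration_for(db, fault_type):
    """Look up the preset redundancy duration for a detected fault type."""
    try:
        return db[fault_type]
    except KeyError:
        raise ValueError(f"no preset redundancy duration for {fault_type!r}") from None
```

A database built with `build_duration_db({"disk_pulled": 30.0, "link_flap": 10.0}, io_processing_s=5.0, controller_recovery_s=15.0)` would then be queried with `preset_duration_for(db, "disk_pulled")` just before the timer is started; both fault type names are made up for this example.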
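Referring to FIG. 1 to FIG. 3: FIG. 1 is a first flowchart of the disk array redundancy method of the present application, FIG. 2 is a second flowchart of the method, and FIG. 3 is a sequence diagram of the method.
The operating state of the disk array is monitored in real time;
The disk array is monitored in real time so that, when it enters an abnormal state, its operating mode can be switched to the temporary operating mode and services can continue.
When the operating state of the disk array is detected to be abnormal, abnormality information is sent to the state machine and O&M personnel are notified of it;
When real-time monitoring detects that the operating state of the disk array is abnormal, a message reporting the abnormality is sent to the state machine to start the state machine's timer, and O&M personnel are notified so that they can troubleshoot the fault and bring the offline disks in the disk array back online. In addition, the fault type of the disk array is saved when the abnormality is detected.
The flag state of the state machine is set to the member-disk abnormal state;
After receiving the abnormality information of the disk array, the state machine automatically sets its flag state to the member-disk abnormal state, that is, the state in which disks in the disk array are offline.
The corresponding preset redundancy duration is obtained from the preset redundancy duration database according to the fault type of the disk array, the timer is started according to that duration, and O&M personnel are notified that timing has begun;
The state machine automatically starts the timer, obtains the corresponding preset redundancy duration from the preset redundancy duration database according to the fault type of the disk array, and continuously compares the timer's elapsed time with the preset redundancy duration; this comparison determines the operating mode to which the disk array will be switched. In addition, O&M personnel are notified that timing has begun.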
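The timer comparison can be sketched with a monotonic clock. The class name `RedundancyTimer` and its methods are hypothetical; the injectable `clock` parameter exists only so the sketch can be exercised without waiting in real time.

```python
import time


class RedundancyTimer:
    """Hypothetical timer compared against a preset redundancy duration."""

    def __init__(self, preset_s, clock=time.monotonic):
        self.preset_s = preset_s
        self._clock = clock   # injectable clock, monotonic by default
        self._start = None

    def start(self):
        self._start = self._clock()

    def elapsed(self):
        if self._start is None:
            raise RuntimeError("timer has not been started")
        return self._clock() - self._start

    def expired(self):
        # "Exceeds" is treated as strictly greater than the preset duration.
        return self.elapsed() > self.preset_s
```

A monotonic clock is used here because wall-clock adjustments should not shorten or lengthen the redundancy window; that choice is a design assumption of the sketch.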
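The state machine sends a quiesce request to the agent and prepares mode change data;
After the state machine's timer has started timing, the state machine automatically sends a quiesce request to the agent and prepares the mode change data. The disk array is currently in the normal operating mode and is about to be switched to the temporary operating mode, so the mode change data for switching from the normal operating mode to the temporary operating mode must be prepared.
The agent performs quiesce processing: it makes the host pause issuing I/O operations and adds the I/O operations already issued to the quiesce queue. When the quiesce processing is complete, the agent sends a quiesce-complete message to the state machine;
After receiving the quiesce request from the state machine, the agent performs quiesce processing; that is, it halts the I/O operations issued by the host in the normal operating mode and adds the I/O operations already issued to the quiesce queue. When the quiesce processing is complete, it returns a quiesce-complete message to the state machine.
The state machine sends the mode change data to the agent, and the agent switches the disk array from the normal operating mode to the temporary operating mode according to the mode change data;
After receiving the quiesce-complete message, the state machine sends the mode change data to the agent. The agent switches the operating mode of the disk array accordingly, from the normal operating mode to the temporary operating mode, so that I/O operations are executed in the temporary operating mode and services are not interrupted, which effectively improves the redundancy capability of the disk array.
A message indicating that the disk array has been switched from the normal operating mode to the temporary operating mode is sent to the state machine, and the state machine sends a quiesce resume request to the agent;
After the agent switches the disk array from the normal operating mode to the temporary operating mode, it sends a message reporting the switch to the state machine. On receiving this message, the state machine treats the mode switch as complete and sends a quiesce resume request to the agent to end the quiesce, so that the disk array executes I/O operations in the switched-to operating mode.
The agent performs quiesce resume processing: it restores the I/O operations in the quiesce queue to the normal queue and lets the host continue issuing I/O operations. Once I/O operations have returned to normal in the temporary operating mode, the agent sends a message to that effect to the state machine;
After receiving the quiesce resume request from the state machine, the agent restores the I/O operations in the quiesce queue to the normal queue in the temporary operating mode; that is, the queued I/O operations are executed in the disk array's switched-to operating mode, and the host resumes issuing I/O operations, which are likewise executed in that mode. Once I/O operations have returned to normal, the agent returns a message to that effect to the state machine.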
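The operating state of the disk array is obtained, and it is determined whether the operating state has returned to normal and whether the timer's elapsed time exceeds the preset redundancy duration;
If the operating state of the disk array has returned to normal and the timer's elapsed time does not exceed the preset redundancy duration, the state machine sends a quiesce request to the agent and prepares mode change data;
After the state machine's timer has started timing, the state machine automatically sends a quiesce request to the agent and, before the mode switch is performed, prepares the mode change data. Here the mode change data is the data for switching the disk array from the temporary operating mode to the normal operating mode.
The agent performs quiesce processing: it makes the host pause issuing I/O operations and adds the I/O operations already issued to the quiesce queue. When the quiesce processing is complete, the agent sends a quiesce-complete message to the state machine;
After receiving the quiesce request from the state machine, the agent performs quiesce processing; that is, it halts the I/O operations issued by the host in the temporary operating mode and adds the I/O operations already issued to the quiesce queue. When the quiesce processing is complete, it returns a quiesce-complete message to the state machine.
The state machine sends the mode change data to the agent, and the agent switches the disk array from the temporary operating mode to the normal operating mode according to the mode change data;
After receiving the quiesce-complete message, the state machine sends the agent the mode change data for switching from the temporary operating mode to the normal operating mode. The agent switches the operating mode of the disk array accordingly, restoring the normal operating mode so that I/O operations are executed and service processing capability is recovered, which effectively improves the redundancy capability of the disk array.
A message indicating that the disk array has been switched from the temporary operating mode to the normal operating mode is sent to the state machine, and the state machine sends a quiesce resume request to the agent;
After the agent switches the disk array from the temporary operating mode to the normal operating mode, it sends a message reporting the switch to the state machine. On receiving this message, the state machine treats the mode switch as complete and sends a quiesce resume request to the agent to end the quiesce, so that the disk array executes I/O operations in the switched-to normal operating mode.
The agent performs quiesce resume processing: it restores the I/O operations in the quiesce queue to the normal queue and lets the host continue issuing I/O operations. Once I/O operations have returned to normal in the normal operating mode, the agent sends a message to that effect to the state machine;
After receiving the quiesce resume request from the state machine, the agent restores the I/O operations in the quiesce queue to the normal queue in the normal operating mode; that is, the queued I/O operations are executed in the disk array's switched-to normal operating mode, and the host resumes issuing I/O operations, which are likewise executed in that mode. Once I/O operations have returned to normal, the agent returns a message to that effect to the state machine.
The flag state of the state machine is set to the member-disk normal state;
At this point the operating state of the disk array has returned to normal and the disk array is back in the normal operating mode, so the flag state of the state machine is set to the member-disk normal state.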
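If the operating state of the disk array has not returned to normal, or the timer's elapsed time exceeds the preset redundancy duration, the state machine sends a quiesce request to the agent and prepares mode change data;
After the state machine's timer has started timing, the state machine automatically sends a quiesce request to the agent and, before the mode switch is performed, prepares the mode change data. Here the mode change data is the data for switching the disk array from the temporary operating mode to the offline operating mode.
The agent performs quiesce processing: it makes the host stop issuing I/O operations and adds the I/O operations already issued to the quiesce queue. When the quiesce processing is complete, the agent sends a quiesce-complete message to the state machine;
After receiving the quiesce request from the state machine, the agent performs quiesce processing; that is, it halts the I/O operations issued by the host in the temporary operating mode and adds the I/O operations already issued to the quiesce queue. When the quiesce processing is complete, it returns a quiesce-complete message to the state machine.
The state machine sends the mode change data to the agent, and the agent switches the disk array from the temporary operating mode to the offline operating mode according to the mode change data;
After receiving the quiesce-complete message, the state machine sends the agent the mode change data for switching from the temporary operating mode to the offline operating mode. The agent switches the operating mode of the disk array accordingly, placing it in the offline operating mode so that all I/O operations and services can be stopped.
A message indicating that the disk array has been switched from the temporary operating mode to the offline operating mode is sent to the state machine, and the state machine sends a quiesce resume request to the agent;
After the agent switches the disk array from the temporary operating mode to the offline operating mode, it sends a message reporting the switch to the state machine. On receiving this message, the state machine treats the mode switch as complete and sends a quiesce resume request to the agent to end the quiesce, so that the disk array handles I/O operations according to the switched-to offline operating mode.
The agent performs quiesce resume processing, returns all I/O operation requests to the host, and disconnects the main host link;
After receiving the quiesce resume request from the state machine, the agent handles the I/O operations in the quiesce queue according to the switched-to offline operating mode; that is, all I/O operations and services are stopped.
The member-disk abnormal state of the state machine is reported to O&M personnel.
At this point the operating state of the disk array can no longer return to normal, so the flag state of the state machine is set to the member-disk abnormal state and reported to O&M personnel.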
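Taken together, the flow uses only three mode transitions: normal to temporary, temporary to normal, and temporary to offline. The sketch below encodes them as a table and checks a sequence of modes against it; the names `ALLOWED_TRANSITIONS` and `is_valid_path` are hypothetical, and treating every other transition as invalid is an assumption drawn from the steps described here rather than an explicit statement in the application.

```python
# Transitions that appear in the described flow (assumed to be exhaustive).
ALLOWED_TRANSITIONS = {
    ("normal", "temporary"),   # array detected abnormal, timer started
    ("temporary", "normal"),   # recovered within the preset redundancy duration
    ("temporary", "offline"),  # not recovered, or duration exceeded
}


def is_valid_path(modes):
    """Return True if every consecutive pair of modes is an allowed transition."""
    return all(pair in ALLOWED_TRANSITIONS for pair in zip(modes, modes[1:]))
```

Under this table, `["normal", "temporary", "normal"]` and `["normal", "temporary", "offline"]` are valid paths, while a direct jump from `"normal"` to `"offline"` is not.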
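It should be understood that although the steps in the flowcharts of FIG. 1 to FIG. 3 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 1 to FIG. 3 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed sequentially, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.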
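Referring to FIG. 4, FIG. 4 is a structural diagram of the disk array redundancy system of the present application.
The disk array redundancy system of this embodiment includes:
A disk manager, configured to monitor the operating state of the disk array in real time and, when the operating state of the disk array is abnormal, send abnormality information to the state machine;
The disk manager includes a state monitoring module and an information processing module. The state monitoring module monitors the operating state of the disk array in real time, and the information processing module sends abnormality information to the state machine when the operating state is abnormal. The disk array is provided with redundant disks; when a fault causes the number of offline disks to exceed the number of redundant disks, the disk array enters an abnormal state, so the state monitoring module monitors the disk array in real time to track its operating state. When an abnormal operating state is detected, the information processing module sends the abnormality message to the state machine, which then starts its timer.
A state machine, configured to start a timer according to the preset redundancy duration, send a quiesce request or a quiesce resume request to the agent, and determine whether the timer's elapsed time exceeds the preset redundancy duration;
The state machine includes a timer module, a request processing module, and a timing judgment module. The timer module starts the timer according to the preset redundancy duration; the request processing module sends a quiesce request or a quiesce resume request to the agent; and the timing judgment module determines whether the timer's elapsed time exceeds the preset redundancy duration. Once the timer module has started the timer, the state machine uses the request processing module to send a quiesce request or a quiesce resume request to the agent, causing the agent to perform quiesce processing or quiesce resume processing. After the message that I/O operations have returned to normal in the temporary operating mode is sent to the state machine, the timing judgment module compares the timer's elapsed time with the preset redundancy duration to decide whether the disk array should subsequently be switched to the normal operating mode or the offline operating mode.
An agent, configured to perform quiesce processing or quiesce resume processing and to switch the operating mode of the disk array.
The agent includes a quiesce processing module and a mode switching module. The quiesce processing module performs quiesce processing or quiesce resume processing, and the mode switching module switches the operating mode of the disk array. On receiving a quiesce request or a quiesce resume request from the state machine, the quiesce processing module performs the corresponding processing; in addition, the mode switching module switches the operating mode of the disk array according to the mode change data.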
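How the disk manager hands an abnormality to the state machine can be sketched with two small classes. All names below (`DiskManager`, `FlagStateMachine`, `report`, `on_abnormal`, and the flag strings) are hypothetical, and the agent is omitted to keep the example short.

```python
class FlagStateMachine:
    """Hypothetical state machine holding the flag state and timer status."""

    def __init__(self):
        self.flag = "member_disk_normal"
        self.timer_started = False
        self.last_abnormality = None

    def on_abnormal(self, info):
        # Receiving abnormality information sets the flag and starts timing.
        self.flag = "member_disk_abnormal"
        self.timer_started = True
        self.last_abnormality = info


class DiskManager:
    """Hypothetical disk manager: monitors counts and notifies the state machine."""

    def __init__(self, redundant_disks, state_machine):
        self.redundant_disks = redundant_disks
        self.state_machine = state_machine

    def report(self, offline_disks):
        # Only a count above the redundancy level is treated as abnormal.
        if offline_disks > self.redundant_disks:
            self.state_machine.on_abnormal({"offline_disks": offline_disks})
            return True
        return False
```

Keeping the abnormality check in the disk manager and the flag and timer in the state machine mirrors the division of responsibilities described above.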
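In one embodiment, the disk array redundancy system further includes:
An alarm unit, configured to send alarm information to O&M personnel.
The alarm unit sends O&M personnel the fault information of the disk array, the timer start information, the disk back-online information, the operating mode switch information of the disk array, and the like.
In one embodiment, the state machine further includes:
A state setting module, configured to set the flag state of the state machine.
The state setting module sets the flag state of the state machine to the member-disk abnormal state or to the member-disk normal state.
A mode data module, configured to store the configuration data of each operating mode of the disk array.
The mode change data is the data used when the operating mode of the disk array is switched; the mode data module stores the configuration data of each operating mode of the disk array.
For specific limitations of the disk array redundancy system, refer to the limitations of the method above, which are not repeated here. Each module of the disk array redundancy system may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in or independent of a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.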
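This embodiment provides a computer device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the disk array redundancy method when executing the computer-readable instructions.
The computer device may be a terminal, and its internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer-readable instructions, and the internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The network interface is used to communicate with external terminals over a network connection. The computer-readable instructions, when executed by the processor, implement the disk array redundancy method. The display screen may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will understand that the structure shown in FIG. 5 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the disk array redundancy method when executing the computer-readable instructions.
This embodiment provides a non-volatile computer-readable storage medium storing computer-readable instructions that, when executed by a processor, implement the steps of the disk array redundancy method.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. The scope of protection of this patent application shall therefore be determined by the appended claims.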

Claims (20)

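  1. A disk array redundancy method, characterized in that the disk array redundancy method includes:
    monitoring the operating state of the disk array in real time and, in response to detecting that the operating state of the disk array is abnormal, sending abnormality information to a state machine, so that the state machine starts a timer according to a preset redundancy duration;
    sending, by the state machine, a quiesce request to an agent, so that the agent performs quiesce processing to pause the I/O operations of the disk array, and switching the disk array from a normal operating mode to a temporary operating mode through the agent;
    sending, by the state machine, a quiesce resume request to the agent, so that the agent performs quiesce resume processing to resume the I/O operations of the disk array; and
    after the quiesce resume processing is complete, in response to the timer's elapsed time not exceeding the preset redundancy duration and the operating state of the disk array having returned to normal, switching the operating mode of the disk array to the normal operating mode through the agent;
    wherein an abnormal operating state of the disk array means that the number of offline disks in the disk array exceeds the number of redundant disks.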
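  2. The disk array redundancy method according to claim 1, characterized in that sending abnormality information to the state machine in response to detecting that the operating state of the disk array is abnormal, so that the state machine starts a timer according to the preset redundancy duration, includes:
    in response to detecting that the operating state of the disk array is abnormal, sending the abnormality information to the state machine and notifying O&M personnel of the abnormality information; and
    setting the flag state of the state machine to a member-disk abnormal state, starting the timer according to the preset redundancy duration, and notifying O&M personnel that timing has begun.
  3. The disk array redundancy method according to claim 1, characterized in that the disk array is provided with two redundant disks and, in response to the number of offline disks in the disk array exceeding 2, the disk array is in an abnormal state.
  4. The disk array redundancy method according to claim 1, characterized in that the quiesce processing is pausing the I/O operations of the disk array in its current operating mode, and causing the agent to perform quiesce processing includes:
    causing the agent to pause the I/O operations of the disk array in the normal operating mode.
  5. The disk array redundancy method according to claim 1, characterized in that the quiesce resume request is a request for the disk array to execute I/O operations in the operating mode to which it is switched, and causing the agent to perform quiesce resume processing includes:
    causing the agent to have the disk array execute I/O operations in the temporary operating mode.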
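  6. The disk array redundancy method according to claim 1, characterized in that switching the operating mode of the disk array to the normal operating mode includes:
    restoring normal I/O operations of the disk array and normal services of the disk array, thereby improving the redundancy capability of the disk.
  7. The disk array redundancy method according to claim 2, characterized in that, after the abnormality information is sent to the state machine, the method further includes:
    notifying O&M personnel of the abnormality information so that the O&M personnel troubleshoot the fault and bring the offline disks in the disk array back online;
    printing a log to record the operation information.
  8. The disk array redundancy method according to claim 2, characterized in that the operating mode to which the disk array needs to be switched is determined from the relationship between the timer's elapsed time and the preset redundancy duration.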
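  9. The disk array redundancy method according to claim 1, characterized in that sending, by the state machine, a quiesce request to the agent, so that the agent performs quiesce processing to pause the I/O operations of the disk array, and switching the disk array from the normal operating mode to the temporary operating mode through the agent, includes:
    sending, by the state machine, a quiesce request to the agent and preparing mode change data;
    performing, by the agent, quiesce processing, making the host pause issuing I/O operations, adding the I/O operations already issued to a quiesce queue, and, after the agent completes the quiesce processing, sending a quiesce-complete message to the state machine; and
    sending, by the state machine, the mode change data to the agent, the agent switching the operating mode of the disk array from the normal operating mode to the temporary operating mode according to the mode change data.
  10. The disk array redundancy method according to claim 2, characterized in that sending, by the state machine, a quiesce resume request to the agent, so that the agent performs quiesce resume processing to resume the I/O operations of the disk array, includes:
    sending to the state machine a message that the disk array has been switched from the normal operating mode to the temporary operating mode, the state machine sending a quiesce resume request to the agent; and
    performing, by the agent, quiesce resume processing, restoring the I/O operations in the quiesce queue to the normal queue, letting the host continue issuing I/O operations, and, after the I/O operations return to normal in the temporary operating mode, sending to the state machine a message that the I/O operations have returned to normal in the temporary operating mode.
  11. The disk array redundancy method according to claim 10, characterized in that, after the message that the I/O operations have returned to normal in the temporary operating mode is sent to the state machine, the method further includes:
    obtaining the operating state of the disk array and, in response to the operating state of the disk array having returned to normal and the timer's elapsed time not exceeding the preset redundancy duration, switching the disk array from the temporary operating mode to the normal operating mode through the agent; and
    in response to the operating state of the disk array not having returned to normal, or the timer's elapsed time exceeding the preset redundancy duration, switching the disk array from the temporary operating mode to an offline operating mode through the agent.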
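  12. The disk array redundancy method according to claim 11, characterized in that switching the disk array from the temporary operating mode to the normal operating mode through the agent includes:
    sending, by the state machine, a quiesce request to the agent and preparing mode change data;
    performing, by the agent, quiesce processing, making the host pause issuing I/O operations, adding the I/O operations already issued to the quiesce queue, and, after the agent completes the quiesce processing, sending a quiesce-complete message to the state machine; and
    sending, by the state machine, the mode change data to the agent, the agent switching the operating mode of the disk array from the temporary operating mode to the normal operating mode according to the mode change data.
  13. The disk array redundancy method according to claim 12, characterized in that switching the disk array from the temporary operating mode to the normal operating mode through the agent further includes:
    sending to the state machine a message that the disk array has been switched from the temporary operating mode to the normal operating mode, the state machine sending a quiesce resume request to the agent;
    performing, by the agent, quiesce resume processing, restoring the I/O operations in the quiesce queue to the normal queue, letting the host continue issuing I/O operations, and, after the I/O operations return to normal in the normal operating mode, sending to the state machine a message that the I/O operations have returned to normal in the normal operating mode; and
    setting the flag state of the state machine to a member-disk normal state.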
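  14. The disk array redundancy method according to claim 11, characterized in that switching the disk array from the temporary operating mode to the offline operating mode through the agent includes:
    sending, by the state machine, a quiesce request to the agent and preparing mode change data;
    performing, by the agent, quiesce processing, making the host stop issuing I/O operations, adding the I/O operations already issued to the quiesce queue, and, after the agent completes the quiesce processing, sending a quiesce-complete message to the state machine; and
    sending, by the state machine, the mode change data to the agent, the agent switching the operating mode of the disk array from the temporary operating mode to the offline operating mode according to the mode change data.
  15. The disk array redundancy method according to claim 14, characterized in that switching the disk array from the temporary operating mode to the offline operating mode through the agent further includes:
    sending to the state machine a message that the disk array has been switched from the temporary operating mode to the offline operating mode, the state machine sending a quiesce resume request to the agent;
    performing, by the agent, quiesce resume processing, returning all I/O operation requests to the host, and disconnecting the main host link; and
    reporting the member-disk abnormal state of the state machine to O&M personnel.
  16. The disk array redundancy method according to claim 1, characterized in that, before the timer is started according to the preset redundancy duration, the method further includes:
    determining a preset redundancy duration database according to the fault type of the disk array, the I/O operation processing duration, and the disk removal and insertion duration.
  17. The disk array redundancy method according to claim 16, characterized in that, before the timer is started according to the preset redundancy duration, the method further includes:
    obtaining the corresponding preset redundancy duration from the preset redundancy duration database according to the fault type.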
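  18. A disk array redundancy system for performing the disk array redundancy method according to any one of claims 1 to 17, characterized in that the disk array redundancy system includes:
    a disk manager, configured to monitor the operating state of the disk array in real time and, in response to the operating state of the disk array being abnormal, send abnormality information to a state machine;
    the state machine, configured to start a timer according to a preset redundancy duration, send a quiesce request or a quiesce resume request to an agent, and determine whether the timer's elapsed time exceeds the preset redundancy duration; and
    the agent, configured to cause the agent to perform quiesce processing or quiesce resume processing, and to switch the operating mode of the agent.
  19. A computer device, including one or more memories, one or more processors, and computer-readable instructions stored in the one or more memories and executable on the one or more processors, characterized in that the one or more processors implement the steps of the method according to any one of claims 1 to 17 when executing the computer-readable instructions.
  20. A non-volatile computer-readable storage medium, characterized in that the non-volatile computer-readable storage medium stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 17.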
PCT/CN2023/109739 2022-07-28 2023-07-28 Disk array redundancy method and system, computer device, and storage medium WO2024022469A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210895183.2 2022-07-28
CN202210895183.2A CN114968129B (zh) 2022-07-28 2022-07-28 Disk array redundancy method and system, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2024022469A1 true WO2024022469A1 (zh) 2024-02-01

Family

ID=82969538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/109739 WO2024022469A1 (zh) 2022-07-28 2023-07-28 Disk array redundancy method and system, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN114968129B (zh)
WO (1) WO2024022469A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114968129B (zh) * 2022-07-28 2022-12-06 苏州浪潮智能科技有限公司 Disk array redundancy method and system, computer device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6397347B1 (en) * 1998-02-26 2002-05-28 Nec Corporation Disk array apparatus capable of dealing with an abnormality occurring in one of disk units without delaying operation of the apparatus
US20130110909A1 (en) * 2011-11-02 2013-05-02 Jeffrey A. Dean Redundant Data Requests with Cancellation
CN110908613A (zh) * 2019-11-28 2020-03-24 深信服科技股份有限公司 一种数据写命令处理方法、装置、电子设备及存储介质
CN114020516A (zh) * 2022-01-05 2022-02-08 苏州浪潮智能科技有限公司 一种异常io处理的方法、系统、设备及可读存储介质
CN114968129A (zh) * 2022-07-28 2022-08-30 苏州浪潮智能科技有限公司 磁盘阵列冗余方法、系统、计算机设备和存储介质

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4325817B2 (ja) * 1999-04-05 2009-09-02 株式会社日立製作所 ディスクアレイ装置
JP2010244130A (ja) * 2009-04-01 2010-10-28 Toshiba Corp ディスクアレイ装置及びディスクアレイ制御方法
CN101561773B (zh) * 2009-06-03 2011-08-24 成都市华为赛门铁克科技有限公司 一种磁盘数据恢复方法及装置
CN101609420A (zh) * 2009-07-17 2009-12-23 杭州华三通信技术有限公司 实现磁盘冗余阵列重建的方法和磁盘冗余阵列及其控制器
US8307157B2 (en) * 2010-04-21 2012-11-06 Hitachi, Ltd. Disk array system and traffic control method
JP5938997B2 (ja) * 2012-03-30 2016-06-22 富士通株式会社 情報記憶装置、情報記憶装置制御プログラム、情報記憶装置制御方法
CN103019894B (zh) * 2012-12-25 2015-03-04 创新科存储技术(深圳)有限公司 一种独立冗余磁盘阵列的重建方法
CN110413225B (zh) * 2019-06-28 2023-01-10 苏州浪潮智能科技有限公司 高可靠集群存储双活配置方法、系统、终端及存储介质
CN112181298B (zh) * 2020-09-25 2022-05-17 杭州宏杉科技股份有限公司 阵列访问方法、装置、存储设备及机器可读存储介质

Also Published As

Publication number Publication date
CN114968129B (zh) 2022-12-06
CN114968129A (zh) 2022-08-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23845673

Country of ref document: EP

Kind code of ref document: A1