CN115098302A

CN115098302A - Multi-control storage system RAID hot spare disk management method, system, terminal and storage medium

Info

Publication number: CN115098302A
Application number: CN202210842311.7A
Authority: CN
Inventors: 朱红玉
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2022-07-18
Filing date: 2022-07-18
Publication date: 2022-09-23

Abstract

The invention relates to the technical field of storage, and in particular to a method, system, terminal and storage medium for managing RAID hot spare disks in a multi-control storage system. The method comprises: configuring a dedicated hot spare disk for a RAID and storing the dedicated hot spare disk information in the hot spare array of that RAID; and, when a failed RAID is detected, extracting the dedicated hot spare disk information from the hot spare array of the failed RAID and reconstructing the failed RAID based on the extracted information. The invention makes hot spare disks RAID-specific: RAIDs of high importance are configured with dedicated hot spare disks to ensure their security to the greatest extent, while RAIDs of low importance are served by global hot spare disks.

Description

Multi-control storage system RAID hot spare disk management method, system, terminal and storage medium

Technical Field

The invention belongs to the technical field of storage, and in particular relates to a method, system, terminal and storage medium for managing RAID hot spare disks in a multi-control storage system.

Background

In RAID design, a hot spare disk replaces a member disk that has become unavailable for any of various reasons, so that the RAID regains its redundancy; this process is called RAID reconstruction. Therefore, when a RAID is configured, a sufficient number of hot spare disks must be configured in addition to the member disks that form the RAID and store data, so that a hot spare disk is available in time when the array fails. Hot spare disks are gradually consumed as hard disks fail. In conventional RAID hot spare designs, hot spare disks are typically globally available, i.e. usable by any RAID.

Currently, a single storage system is often configured with multiple RAIDs, which serve different services or isolate services from one another, so the RAIDs differ in importance. When failures occur, users usually expect the RAID carrying important services to be reconstructed first. However, the number of globally configured hot spare disks is often limited, so the spares may be consumed by whichever RAID failed earliest, leaving none for a more important RAID that fails later.

Disclosure of Invention

To address the problem of unreasonable hot spare disk allocation in the prior art, the invention provides a method, system, terminal and storage medium for managing RAID hot spare disks in a multi-control storage system.

In a first aspect, the present invention provides a method for managing a RAID hot spare disk of a multi-control storage system, including:

configuring a dedicated hot spare disk for a RAID, and storing the dedicated hot spare disk information in the hot spare array of the RAID;

when a failed RAID is detected, extracting the dedicated hot spare disk information from the hot spare array of the failed RAID, and reconstructing the failed RAID based on the extracted dedicated hot spare disk information.

Further, configuring a dedicated hot spare disk for the RAID and storing the dedicated hot spare disk information in the hot spare array of the RAID includes:

allocating global hard disk numbers to all hot spare disks;

dividing the RAIDs into priority levels according to the importance of the services they carry;

designating a dedicated hot spare disk for each high-level RAID, and storing the global hard disk number of the dedicated hot spare disk in the hot spare array of that RAID;

storing the global hard disk numbers of all hot spare disks other than the dedicated hot spare disks in a global array;

and summarizing the global array, the information of all high-level RAIDs and their hot spare arrays into a hot spare disk configuration file, and persisting the configuration file on all controllers.
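The configuration steps above can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the patented implementation: the names `HotSpareConfig`, `build_config` and `persist`, the convention that priority level 1 is the high level, allocating global numbers by sorting physical disk identifiers, and the dictionaries standing in for each controller's local storage are all hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class HotSpareConfig:
    # Hypothetical in-memory form of the hot spare disk configuration file.
    global_array: list = field(default_factory=list)   # global numbers of non-dedicated spares
    dedicated: dict = field(default_factory=dict)      # high-level RAID -> its hot spare array


def build_config(spare_disks, raid_priority, dedicated_plan, high_level=1):
    # Allocate a global hard disk number to every hot spare disk (assumed: by sorted id).
    number_of = {disk: n for n, disk in enumerate(sorted(spare_disks))}
    cfg = HotSpareConfig()
    for raid, disks in dedicated_plan.items():
        # Only RAIDs at the (assumed) high priority level receive dedicated spares.
        if raid_priority.get(raid) != high_level:
            continue
        cfg.dedicated[raid] = [number_of[d] for d in disks]
    reserved = {n for spares in cfg.dedicated.values() for n in spares}
    # Every hot spare that is not dedicated goes into the global array.
    cfg.global_array = sorted(n for n in number_of.values() if n not in reserved)
    return cfg


def persist(cfg, controller_stores):
    # Write an identical copy of the configuration file to every controller.
    payload = json.dumps({"global": cfg.global_array, "dedicated": cfg.dedicated},
                         sort_keys=True)
    for store in controller_stores.values():
        store["hot_spare_config"] = payload
    return payload


# Usage: four spares; only "raid_db" is high level, so "raid_log" gets no dedicated disk.
cfg = build_config(["sdb", "sdc", "sdd", "sde"],
                   {"raid_db": 1, "raid_log": 2},
                   {"raid_db": ["sdc"], "raid_log": ["sdd"]})
stores = {0: {}, 1: {}}
persist(cfg, stores)
print(cfg.dedicated, cfg.global_array)   # {'raid_db': [1]} [0, 2, 3]
```

The global array here deliberately excludes dedicated disks, matching the step that stores only the hot spare disks other than the dedicated ones in the global array.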

Further, when a failed RAID is detected, extracting the dedicated hot spare disk information from the hot spare array of the failed RAID and reconstructing the failed RAID based on the extracted information includes:

if a failed member disk is detected in a RAID, adding the RAID to a failed RAID list as a failed RAID;

and selecting, by the multiple controllers, a target hot spare disk for each failed RAID in the failed RAID list, and reconstructing the failed RAID based on the selected target hot spare disk.

Further, selecting, by the multiple controllers, a target hot spare disk for the failed RAID in the failed RAID list and reconstructing the failed RAID based on the selected target hot spare disk includes:

detecting, by the multiple controllers, the numbers of the currently alive controllers, and electing the controller with the smallest number as the execution controller;

reading, by the execution controller, the hot spare array of the failed RAID from its local hot spare disk configuration file, and searching the hot spare array for an available hot spare disk;

if an available hot spare disk is found, notifying all controllers to reconstruct the failed RAID using that hot spare disk;

if no available hot spare disk is found, searching the global array of the local hot spare disk configuration file for an available global hot spare disk, and notifying all controllers to reconstruct the failed RAID using the global hot spare disk found;

and if neither an available dedicated hot spare disk nor an available global hot spare disk is found, not reconstructing the failed RAID.
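The search order described above (the failed RAID's own hot spare array first, then the global array, otherwise no reconstruction) can be sketched as follows. The function name `select_spare`, the `in_use` set used to skip spares already consumed by earlier rebuilds, and the dictionary layout are assumptions for illustration.

```python
def select_spare(failed_raid, hot_spare_arrays, global_array, in_use):
    """Return (global disk number, source) for a failed RAID, or None.

    hot_spare_arrays: RAID name -> dedicated hot spare numbers (hypothetical layout)
    global_array:     global hot spare numbers, assumed sorted ascending
    in_use:           numbers already consumed by earlier rebuilds (assumed bookkeeping)
    """
    # 1. Search the failed RAID's own hot spare array first.
    for n in hot_spare_arrays.get(failed_raid, []):
        if n not in in_use:
            return n, "dedicated"
    # 2. Fall back to the global array.
    for n in global_array:
        if n not in in_use:
            return n, "global"
    # 3. Neither found: the failed RAID is not reconstructed for now.
    return None


arrays = {"raid_db": [1]}
print(select_spare("raid_db", arrays, [0, 2, 3], set()))        # (1, 'dedicated')
print(select_spare("raid_db", arrays, [0, 2, 3], {1}))          # (0, 'global')
print(select_spare("raid_log", arrays, [0, 2, 3], {0, 2, 3}))   # None
```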

In a second aspect, the present invention provides a RAID hot spare disk management system for a multi-control storage system, including:

a hot spare configuration unit, configured to configure a dedicated hot spare disk for a RAID and store the dedicated hot spare disk information in the hot spare array of the RAID;

and a hot spare selection unit, configured to, when a failed RAID is detected, extract the dedicated hot spare disk information from the hot spare array of the failed RAID and reconstruct the failed RAID based on the extracted information.

Further, the hot spare configuration unit includes:

a number allocation module, configured to allocate global hard disk numbers to all hot spare disks;

a level division module, configured to divide the RAIDs into priority levels according to the importance of the services they carry;

a dedicated designation module, configured to designate a dedicated hot spare disk for each high-level RAID and store its global hard disk number in the hot spare array of that RAID;

a global storage module, configured to store the global hard disk numbers of all hot spare disks other than the dedicated hot spare disks in a global array;

and a dedicated storage module, configured to summarize the global array, the information of all high-level RAIDs and their hot spare arrays into a hot spare disk configuration file and persist the configuration file on all controllers.

Further, the hot spare selection unit includes:

a failure monitoring module, configured to add a RAID to the failed RAID list as a failed RAID when a failed member disk of the RAID is detected;

and a target selection module, configured for the multiple controllers to select a target hot spare disk for each failed RAID in the failed RAID list and reconstruct the failed RAID based on the selected target hot spare disk.

Further, the target selection module includes:

an execution election submodule, configured for the multiple controllers to detect the numbers of the currently alive controllers and elect the controller with the smallest number as the execution controller;

a first search submodule, configured for the execution controller to read the hot spare array of the failed RAID from the local hot spare disk configuration file and search it for an available hot spare disk;

a first reconstruction submodule, configured to notify all controllers to reconstruct the failed RAID using the available hot spare disk if one is found;

a second reconstruction submodule, configured to, if no available hot spare disk is found, search the global array of the local hot spare disk configuration file for an available global hot spare disk and notify all controllers to reconstruct the failed RAID using it;

and a reconstruction waiting submodule, configured not to reconstruct the failed RAID if neither an available hot spare disk nor an available global hot spare disk is found.

In a third aspect, a terminal is provided, including:

a processor and a memory, wherein

the memory is configured to store a computer program, and

the processor is configured to call and run the computer program from the memory, so that the terminal performs the method described above.

In a fourth aspect, a computer storage medium is provided, storing instructions that, when executed on a computer, cause the computer to perform the methods of the above aspects.

The method, system, terminal and storage medium for managing RAID hot spare disks in a multi-control storage system have the following advantage: by assigning dedicated hot spare disks to selected RAIDs, they avoid the situation in which an important RAID cannot be reconstructed because no hot spare disk is left when it needs one. Hot spare disks thus become RAID-specific: RAIDs of high importance are configured with dedicated hot spare disks to ensure their security to the greatest extent, while RAIDs of low importance are served by global hot spare disks.

In addition, the invention has a reliable design principle and a simple structure, and has very wide application prospects.

Drawings

To more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.

FIG. 2 is a schematic block diagram of a system of one embodiment of the present invention.

Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Detailed Description

To enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

The following explains key terms appearing in the present invention.

RAID (Redundant Array of Independent Disks) means an array with redundancy formed from multiple independent disks. A disk array combines multiple independent disks into a large-capacity disk group, and improves the performance of the whole disk system through the additive effect of the individual disks serving data in parallel. With this technique, data is divided into multiple segments, each stored on a different hard disk. Using parity check, the disk array can still read the data when any one disk in the array fails; during reconstruction, the lost data is recalculated and written to a new hard disk.
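The parity-based recovery mentioned above can be illustrated with single XOR parity, as used in RAID 5. This is a generic illustration of the technique rather than something the patent specifies; the block contents are made up and the function name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    # Byte-wise XOR of equally sized blocks; zip(*blocks) yields one column of bytes at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks on three disks plus one parity block on a fourth disk.
data = [b"\x01\x02", b"\x10\x20", b"\xa0\x0b"]
parity = xor_blocks(data)

# The disk holding data[1] fails. XOR-ing the surviving data blocks with the parity
# recomputes the lost block, which a rebuild then writes to the replacement (hot spare) disk.
rebuilt = xor_blocks([data[0], data[2], parity])
print(parity.hex(), rebuilt == data[1])   # b129 True
```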

Basic composition of the storage system: in hardware, the system consists of a storage frame (disk enclosure) and a control frame (controller enclosure). Safe box (vault) disks store dirty data, system logs and alarm logs. A multi-control storage system is one in which the same RAID group is shared or managed by multiple controllers at the same time.

FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention. The execution subject in FIG. 1 may be a RAID hot spare disk management system of a multi-control storage system.

As shown in FIG. 1, the method includes:

step 110, configuring a dedicated hot spare disk for a RAID, and storing the dedicated hot spare disk information in the hot spare array of the RAID;

step 120, when a failed RAID is detected, extracting the dedicated hot spare disk information from the hot spare array of the failed RAID, and reconstructing the failed RAID based on the extracted information.

To facilitate understanding of the present invention, the method for managing RAID hot spare disks in a multi-control storage system is further described below according to its principle, in combination with the hot spare disk management process of an embodiment.

Specifically, the method for managing the RAID hot spare disk of the multi-control storage system includes:

S1, configuring a dedicated hot spare disk for the RAID, and storing the dedicated hot spare disk information in the hot spare array of the RAID.

Global hard disk numbers are allocated to all hot spare disks; the RAIDs are divided into priority levels according to the importance of the services they carry; a dedicated hot spare disk is designated for each high-level RAID, and its global hard disk number is stored in the hot spare array of that RAID; the global hard disk numbers of all hot spare disks other than the dedicated ones are stored in a global array; and the global array, the information of all high-level RAIDs and their hot spare arrays are summarized into a hot spare disk configuration file, which is persisted on all controllers.

Specifically, when a RAID is configured, the number of its dedicated hot spare disk is designated, and all other RAIDs are prohibited from using the designated disk as a hot spare. The algorithm is implemented on a cluster state machine, and the hot spare disk selection logic consists of two parts: initial configuration, and hot spare selection, the latter comprising hot spare selection monitoring and hot spare disk selection with multi-controller proposal adoption.
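The rule that a designated dedicated hot spare disk may not be used by any other RAID can be enforced with a simple ownership map. A minimal sketch; the class `DedicatedSpareRegistry` and its methods are hypothetical names, not an API from the source.

```python
class DedicatedSpareRegistry:
    """Hypothetical ownership map: global disk number -> RAID it is dedicated to."""

    def __init__(self):
        self.owner = {}

    def designate(self, disk, raid):
        # Refuse to dedicate a disk that already belongs to a different RAID.
        current = self.owner.get(disk)
        if current is not None and current != raid:
            raise ValueError(f"disk {disk} is already dedicated to {current}")
        self.owner[disk] = raid

    def may_use(self, disk, raid):
        # A RAID may use a spare that is either global (no owner) or dedicated to itself.
        return self.owner.get(disk, raid) == raid


reg = DedicatedSpareRegistry()
reg.designate(1, "raid_db")
print(reg.may_use(1, "raid_db"), reg.may_use(1, "raid_log"), reg.may_use(0, "raid_log"))
# True False True
```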

Initializing the RAID configuration environment and configuring RAID dedicated hot spare disks:

(1) The multiple controllers of the storage server run as a cluster, and the initial data of all controllers is consistent.

(2) The user creates a RAID through a command or a user interface, and the multiple controllers jointly execute the RAID creation logic.

(3) The user designates, by command, a hard disk as the dedicated hot spare disk of a specified RAID, and the multiple controllers of the storage server jointly configure the hot spare disk.

Jointly configuring the hot spare disk in step (3) includes setting the specified hard disk as a hot spare disk and setting it as the dedicated hot spare disk of the specified RAID. The configuration data arrays are kept consistent among the controllers.

The hot spare disk configuration information is stored in a global array holding the global hard disk numbers of the hot spare disks; the dedicated hot spare disk configuration information is stored in an array belonging to each RAID, holding the global hard disk numbers of that RAID's dedicated hot spare disks.
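One way to check that the configuration data arrays (the global array plus one array per RAID, each holding global hard disk numbers) are consistent among the controllers is to compare a digest of a canonical encoding of each controller's copy. The source does not say how consistency is verified; the digest comparison and the names `config_digest` and `configs_consistent` are assumptions.

```python
import hashlib
import json


def config_digest(global_array, raid_arrays):
    # Canonical JSON (sorted keys, list order preserved) so equal configs hash equally.
    canonical = json.dumps({"global": global_array, "raids": raid_arrays}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def configs_consistent(copies):
    # copies: controller id -> (global array, {RAID: dedicated spare numbers})
    digests = {config_digest(g, r) for g, r in copies.values()}
    return len(digests) == 1


copies = {0: ([0, 2, 3], {"raid_db": [1]}),
          1: ([0, 2, 3], {"raid_db": [1]})}
print(configs_consistent(copies))        # True
copies[2] = ([0, 2], {"raid_db": [1]})   # controller 2 missed an update
print(configs_consistent(copies))        # False
```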

S2, when a failed RAID is detected, extracting the dedicated hot spare disk information from the hot spare array of the failed RAID, and reconstructing the failed RAID based on the extracted information.

If a failed member disk exists in a RAID, the RAID is added to the failed RAID list as a failed RAID, and the multiple controllers select a target hot spare disk for each failed RAID in the list and reconstruct it based on the selected disk. This selection and reconstruction proceeds as follows: the multiple controllers detect the numbers of the currently alive controllers, and the controller with the smallest number is elected as the execution controller; the execution controller reads the hot spare array of the failed RAID from its local hot spare disk configuration file and searches it for an available hot spare disk; if one is found, all controllers are notified to reconstruct the failed RAID using it; if none is found, the global array of the local configuration file is searched for an available global hot spare disk, and all controllers are notified to reconstruct the failed RAID using it; and if neither is found, the failed RAID is not reconstructed.

Specifically, hot spare selection monitoring works as follows: failures of RAID member disks are monitored, and a RAID that needs a hot spare disk is added to the failed RAID list, which triggers hot spare disk selection and multi-controller proposal adoption. Changes to the RAID hot spare disks are also monitored: on such a change the failed RAID list is read, and if it contains a RAID that still needs a hot spare, hot spare disk selection and multi-controller proposal adoption are triggered.
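The two monitoring triggers described above (a member disk failure, and a change to the hot spare disks while RAIDs are still waiting) can be sketched as an event handler around the failed RAID list. The class name `HotSpareMonitor`, the callback parameter `trigger_selection` and the `on_rebuild_started` hook are hypothetical.

```python
class HotSpareMonitor:
    """Hypothetical handler maintaining the failed RAID list."""

    def __init__(self, trigger_selection):
        self.failed_raids = []                   # RAIDs still waiting for a hot spare
        self.trigger_selection = trigger_selection

    def on_member_disk_failed(self, raid):
        # Add the RAID once, then trigger hot spare selection.
        if raid not in self.failed_raids:
            self.failed_raids.append(raid)
        self.trigger_selection(list(self.failed_raids))

    def on_hot_spare_changed(self):
        # Re-read the failed RAID list; trigger only if some RAID still needs a spare.
        if self.failed_raids:
            self.trigger_selection(list(self.failed_raids))

    def on_rebuild_started(self, raid):
        # A spare was assigned, so the RAID no longer waits.
        self.failed_raids.remove(raid)


calls = []
monitor = HotSpareMonitor(calls.append)
monitor.on_hot_spare_changed()              # nothing waiting: no trigger
monitor.on_member_disk_failed("raid_db")    # trigger with ["raid_db"]
monitor.on_hot_spare_changed()              # still waiting: trigger again
monitor.on_rebuild_started("raid_db")
monitor.on_hot_spare_changed()              # list empty: no trigger
print(calls)   # [['raid_db'], ['raid_db']]
```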

Hot spare disk selection and multi-controller proposal adoption are performed as follows:

1) Each controller detects the numbers of the currently alive controllers. If its own number is the smallest, it executes steps 2) to 6); otherwise, it exits the logic.

2) If an available dedicated hot spare disk exists for the RAID, it is selected to replace the failed member disk.

3) If no such dedicated hot spare disk exists, a hot spare disk that is not dedicated to any other RAID is selected:

the available hot spare disks are read from the hot spare array; all RAIDs are traversed and the dedicated hot spare disks belonging to the other RAIDs are removed; and the remaining hot spare disk with the smallest number is selected.

4) If no hot spare disk is selected, the array does not initiate a rebuild.

5) If a hot spare disk is selected, the hot spare configuration data is updated and the selection result is saved.

6) The controller synchronizes the hot spare configuration data and the selection result to the other controllers in the cluster.

7) The other controllers apply the synchronized operation result and configuration information, and after synchronization completes, all controllers start the reconstruction.
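Steps 1) to 7) can be simulated in a single process to show the control flow. This sketch assumes, following step 3), that the global hot spare array lists every hot spare disk and that other RAIDs' dedicated disks are filtered out at selection time. The `Controller` class, the `assigned` map standing in for the saved selection result, and the in-memory "synchronization" are hypothetical simplifications of what would be cluster messaging in a real system.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    cid: int                                          # controller number
    alive: bool = True
    global_array: list = field(default_factory=list)  # all hot spare numbers (assumed)
    raid_spares: dict = field(default_factory=dict)   # RAID -> dedicated hot spare numbers
    assigned: dict = field(default_factory=dict)      # RAID -> selected spare (selection result)
    rebuilding: list = field(default_factory=list)    # (RAID, spare) rebuilds started locally


def choose_spare(ctrl, raid):
    used = set(ctrl.assigned.values())
    # Step 2: an available dedicated hot spare of this RAID.
    for n in ctrl.raid_spares.get(raid, []):
        if n not in used:
            return n
    # Step 3: drop spares dedicated to other RAIDs, then take the smallest number.
    others = {n for r, spares in ctrl.raid_spares.items() if r != raid for n in spares}
    candidates = [n for n in ctrl.global_array if n not in used and n not in others]
    return min(candidates) if candidates else None    # Step 4: None means no rebuild


def handle_failed_raids(cluster, failed_raids):
    alive = [c for c in cluster if c.alive]
    executor = min(alive, key=lambda c: c.cid)        # Step 1: smallest alive number runs
    for raid in failed_raids:
        spare = choose_spare(executor, raid)
        if spare is None:
            continue                                  # Step 4: RAID stays degraded
        executor.assigned[raid] = spare               # Step 5: save the selection result
        for peer in alive:                            # Step 6: sync to the other controllers
            if peer is not executor:
                peer.assigned = dict(executor.assigned)
        for peer in alive:                            # Step 7: all controllers start rebuild
            peer.rebuilding.append((raid, spare))


cluster = [Controller(cid=i, global_array=[0, 1], raid_spares={"raid_db": [1]})
           for i in range(3)]
cluster[0].alive = False                              # controller 1 becomes the executor
handle_failed_raids(cluster, ["raid_log", "raid_web", "raid_db"])
print(cluster[1].assigned)    # {'raid_log': 0, 'raid_db': 1}
print(cluster[2].rebuilding)  # [('raid_log', 0), ('raid_db', 1)]
```

Note that `raid_web`, which failed before `raid_db`, is not given disk 1: that disk is reserved as `raid_db`'s dedicated spare, which is the behaviour the method is meant to guarantee.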

As shown in FIG. 2, the system 200 includes:

a hot spare configuration unit 210, configured to configure a dedicated hot spare disk for a RAID and store the dedicated hot spare disk information in the hot spare array of the RAID;

and a hot spare selection unit 220, configured to, when a failed RAID is detected, extract the dedicated hot spare disk information from the hot spare array of the failed RAID and reconstruct the failed RAID based on the extracted information.

Optionally, as an embodiment of the present invention, the hot spare configuration unit includes:

a number allocation module, configured to allocate global hard disk numbers to all hot spare disks;

a dedicated designation module, configured to designate a dedicated hot spare disk for each high-level RAID and store its global hard disk number in the hot spare array of that RAID;

a global storage module, configured to store the global hard disk numbers of all hot spare disks other than the dedicated hot spare disks in the global array;

Optionally, as an embodiment of the present invention, the hot spare selection unit includes:

Optionally, as an embodiment of the present invention, the target selection module includes:

an execution election submodule, configured for the multiple controllers to detect the numbers of the currently alive controllers and elect the controller with the smallest number as the execution controller;

a first reconstruction submodule, configured to notify all controllers to reconstruct the failed RAID using the available hot spare disk if one is found;

a second reconstruction submodule, configured to, if no available hot spare disk is found, search the global array of the local hot spare disk configuration file for an available global hot spare disk and notify all controllers to reconstruct the failed RAID using it;

FIG. 3 is a schematic structural diagram of a terminal 300 according to an embodiment of the present invention; the terminal 300 may be used to execute the method for managing RAID hot spare disks in a multi-control storage system according to the embodiments of the present invention.

The terminal 300 may include a processor 310, a memory 320 and a communication unit 330. These components communicate via one or more buses. Those skilled in the art will appreciate that the structure shown in the figure is not limiting: it may be a bus or star architecture, may include more or fewer components than shown, may combine certain components, or may arrange the components differently.

The memory 320 may be used to store instructions executed by the processor 310, and may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk. When executed by the processor 310, the executable instructions in the memory 320 enable the terminal 300 to perform some or all of the steps in the method embodiments described above.

The processor 310 is the control center of the storage terminal; it connects all parts of the electronic terminal through various interfaces and lines, and performs the functions of the electronic terminal and/or processes data by running or executing software programs and/or modules stored in the memory 320 and calling data stored in the memory. The processor may be composed of integrated circuits (ICs), for example a single packaged IC, or multiple packaged ICs with the same or different functions connected together. For example, the processor 310 may include only a central processing unit (CPU). In the embodiments of the present invention, the CPU may have a single core or multiple cores.

The communication unit 330 is configured to establish a communication channel so that the storage terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to them.

The present invention also provides a computer storage medium, wherein the computer storage medium may store a program, and the program may include some or all of the steps in the embodiments provided by the present invention when executed. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).

Therefore, by assigning dedicated hot spare disks to selected RAIDs, the invention avoids reconstruction failures caused by an important RAID having no hot spare disk when it needs to be reconstructed. Hot spare disks thus become RAID-specific: RAIDs of high importance are configured with dedicated hot spare disks to ensure their security to the greatest extent, and RAIDs of low importance are configured with global hot spare disks.

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention, in essence or in part, may be embodied in the form of a software product. The computer software product is stored in a storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code, and includes several instructions for causing a computer terminal (which may be a personal computer, a server, another terminal, a network terminal or the like) to execute all or part of the steps of the methods in the embodiments of the present invention.

The same and similar parts in the various embodiments in this specification may be referred to each other. Especially, for the terminal embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the description in the method embodiment.

In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from the spirit and scope of the present invention, and such modifications or substitutions fall within the scope of the present invention; any change or substitution that a person skilled in the art can easily conceive within the technical scope disclosed by the present invention shall likewise fall within its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for managing RAID hot spare disks in a multi-control storage system, characterized by comprising the following steps:

2. The method of claim 1, wherein configuring the dedicated hot spare disk for the RAID and saving the dedicated hot spare disk information to a hot spare array of the RAID comprises:

allocating global hard disk numbers to all hot spare disks;

3. The method of claim 2, wherein when a failed RAID is monitored, extracting dedicated hot spare disk information from a hot spare array of the failed RAID, and reconstructing the failed RAID based on the extracted dedicated hot spare disk information comprises:

if a failed member disk exists in a RAID, adding the RAID to a failed RAID list as a failed RAID;

4. The method of claim 3, wherein the selecting, by the multi-controller, a target hot spare disk for the failed RAID in the failed RAID list, and reconstructing the failed RAID based on the selected target hot spare disk comprises:

detecting, by the plurality of controllers, the numbers of the currently alive controllers, and electing the controller with the smallest number as the execution controller;

5. A RAID hot spare disk management system for a multi-control storage system, characterized by comprising:

6. The system of claim 5, wherein the hot spare configuration unit comprises:

7. The system of claim 6, wherein the hot spare selection unit comprises:

8. The system of claim 7, wherein the target selection module comprises:

9. A terminal, comprising:

a processor;

a memory for storing instructions for execution by the processor;

wherein the processor is configured to perform the method of any one of claims 1-4.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.