CN116560916A

CN116560916A - Disk switching method, system, device, medium and distributed storage system

Info

Publication number: CN116560916A
Application number: CN202310828471.0A
Authority: CN
Inventors: 田润芦
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2023-07-07
Filing date: 2023-07-07
Publication date: 2023-08-08

Abstract

The application discloses a disk switching method, system, device, medium and distributed storage system, relates to the technical field of storage, and solves the problems of low efficiency and disruption to the operation of the distributed system when a failed disk is replaced manually. In the scheme, a plurality of main disks are monitored to judge whether any main disk has failed; if so, a target spare disk in one-to-one correspondence with the failed main disk is selected from a plurality of spare disks, and the data stored on the failed main disk is migrated to its corresponding target spare disk. By monitoring the plurality of main disks, a main disk failure is found in time, the target spare disk is powered on, and the data stored on the failed main disk is migrated to the target spare disk, so that automatic disk switching is realized, the efficiency and speed of disk switching are improved, the operation and maintenance cost is reduced, and the safety and fault tolerance of the distributed storage system are improved.

Description

Disk switching method, system, device, medium and distributed storage system

Technical Field

The present disclosure relates to the field of storage technologies, and in particular, to a method, a system, an apparatus, a medium, and a distributed storage system for disk switching.

Background

With the advent of the digital information age, data volumes have grown increasingly large, and the storage and protection of data have become extremely important. Distributed storage is a new technology that spreads massive data across the disk space of the machines in an enterprise to form a virtual storage device. At present, however, disk failures are still handled by manual commands or on-site operation, which is time-consuming and incurs high operation and maintenance cost, and the process of manually replacing a failed disk affects system services.

Therefore, a new fault handling method is needed to reduce the impact on the system, shorten the processing cycle, reduce the cost and improve the efficiency.

Disclosure of Invention

The purpose of the application is to provide a disk switching method, system, device, medium and distributed storage system in which, by monitoring a plurality of main disks, a main disk failure is found in time, a target spare disk is powered on, and the data stored on the failed main disk is migrated to the target spare disk, so that automatic disk switching is realized, the efficiency and speed of disk switching are improved, the operation and maintenance cost is reduced, and the safety and fault tolerance of the distributed storage system are improved.

To solve the above technical problem, the present application provides a disk switching method applied to a distributed storage system, where the distributed storage system includes a plurality of main disks and a plurality of spare disks, and the method includes:

judging whether any main disk has failed;

if so, selecting, from the plurality of spare disks, a target spare disk in one-to-one correspondence with the failed main disk;

and migrating the data stored on the failed main disk to its corresponding target spare disk.

In one embodiment, the plurality of spare disks are not powered on in an initial state;

before migrating the data stored on the failed main disk to its corresponding target spare disk, the method further includes:

powering on the target spare disk.

In one embodiment, judging whether any main disk has failed includes:

judging whether the service life of a main disk has reached its service life upper limit;

if so, judging that the main disk whose service life has reached the upper limit has failed.

performing a read operation and/or a write operation on each main disk at preset time intervals, and judging whether the read operation and/or write operation on the main disk proceeds abnormally;

if so, judging that the main disk on which the read operation and/or write operation proceeded abnormally has failed.

monitoring performance indexes of a plurality of main disks, and judging whether the performance indexes exceed corresponding preset thresholds or not;

if yes, judging that the main disk exceeding the preset threshold fails.

In one embodiment, selecting, from the plurality of spare disks, a target spare disk in one-to-one correspondence with the failed main disk includes:

performing in-place detection on each spare disk, and selecting a spare disk that is in place from the plurality of spare disks as the target spare disk.

In one embodiment, selecting a spare disk that is in place from the plurality of spare disks as the target spare disk includes:

selecting, from the spare disks that are in place, a spare disk that is idle as the target spare disk.

In one embodiment, after it is judged that a main disk has failed, the method further includes:

determining the fault type of the failed main disk, or determining the fault type of each storage area in the failed main disk;

migrating the data stored on the failed main disk to its corresponding target spare disk includes:

determining a target migration mode according to the determined fault type of the main disk or of each storage area of the main disk, and migrating the data stored on the failed main disk to its corresponding target spare disk by using the target migration mode.

In one embodiment, determining the target migration mode according to the determined fault type of the main disk or of each storage area of the main disk includes:

when migrating data of a storage area that has not failed in the failed main disk, determining that the target migration mode is copy;

migrating the data stored on the failed main disk to its corresponding target spare disk by using the target migration mode includes:

copying the data in the non-failed storage area of the failed main disk to its corresponding target spare disk.

when migrating data of a storage area that has failed in the failed main disk, determining that the target migration mode is data reconstruction;

obtaining, by data reconstruction from other main disks that have not failed, the data of the failed storage area of the failed main disk, and storing the data on the target spare disk.

In one embodiment, the distributed storage system is further provided with prompting devices in one-to-one correspondence with the disks; when it is judged that a main disk has failed, the method further includes:

setting the prompting device of the target spare disk corresponding to the failed main disk to a first state, so as to generate first prompting information.

In one embodiment, when it is judged that a main disk has failed, the method further includes:

setting the prompting device of the failed main disk to a second state, so as to generate second prompting information.

In one embodiment, while the data stored on the failed main disk is being migrated, the method further includes:

setting the prompting device of the target spare disk corresponding to the failed main disk to a third state, so as to generate third prompting information.

In one embodiment, when migration of the data stored on the failed main disk is completed, the method further includes:

setting the prompting device of the target spare disk corresponding to the failed main disk to a fourth state, so as to generate fourth prompting information.

In one embodiment, the prompting device is a display prompting device and/or an audible prompting device.

In one embodiment, after migrating the data stored on the failed main disk to its corresponding target spare disk, the method further includes:

reminding a worker to swap the positions of the target spare disk and the failed main disk.

In one embodiment, before the positions of the target spare disk and the failed main disk are swapped, the method further includes:

powering off the slot in which the target spare disk is located.

In one embodiment, swapping the positions of the target spare disk and the failed main disk further includes:

determining whether the target spare disk has been pulled out of the slot in which it is located;

if so, issuing a reconstruction prohibition instruction to the distributed storage system, so as to prohibit the system from performing data reconstruction.

In one embodiment, after it is determined that the target spare disk has been pulled out of the slot in which it is located, the method further includes:

determining whether the target spare disk has been inserted into the slot of the failed main disk;

if so, issuing a reconstruction resumption instruction to the distributed storage system, so that the system resumes data reconstruction.

To solve the above technical problem, the present application further provides a disk switching system applied to a distributed storage system, where the distributed storage system includes a plurality of main disks and a plurality of spare disks, and the system includes:

a detection unit, configured to judge whether any main disk has failed;

a triggering unit, configured to select, from the plurality of spare disks, a target spare disk in one-to-one correspondence with the failed main disk when a main disk has failed;

and a data migration unit, configured to migrate the data stored on the failed main disk to its corresponding target spare disk.

To solve the above technical problem, the present application further provides a disk switching device, including:

a memory for storing a computer program;

a processor for implementing the steps of the disk switching method described above when executing the computer program.

To solve the above technical problem, the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the disk switching method described above.

To solve the above technical problem, the present application further provides a distributed storage system, which includes the disk switching device described above, a plurality of main disks and a plurality of spare disks.

The application provides a disk switching method, system, device, medium and distributed storage system, relates to the technical field of storage, and solves the problems of low efficiency and disruption to the operation of the distributed system when a failed disk is replaced manually. In the scheme, a plurality of main disks are monitored to judge whether any main disk has failed; if so, a target spare disk in one-to-one correspondence with the failed main disk is selected from the plurality of spare disks, and the data stored on the failed main disk is migrated to its corresponding target spare disk. By monitoring the plurality of main disks, a main disk failure is found in time, the target spare disk is powered on, and the data stored on the failed main disk is migrated to the target spare disk, so that automatic disk switching is realized, the efficiency and speed of disk switching are improved, the operation and maintenance cost is reduced, and the safety and fault tolerance of the distributed storage system are improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description will briefly explain the drawings needed in the prior art and embodiments, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a disk switching method provided in the present application;

FIG. 2 is a schematic flow chart of a disk switching method provided in the present application;

FIG. 3 is a flow chart of the disk replacement procedure performed by a worker, provided in the present application;

FIG. 4 is a block diagram of a disk switching system provided in the present application;

FIG. 5 is a block diagram of a disk switching apparatus according to the present application;

FIG. 6 is a block diagram of a computer readable storage medium provided herein;

FIG. 7 is a block diagram of a distributed storage system provided in the present application.

Detailed Description

The core of the application is to provide a disk switching method, system, device, medium and distributed storage system in which, by monitoring a plurality of main disks, a main disk failure is found in time, a target spare disk is powered on, and the data stored on the failed main disk is migrated to the target spare disk, thereby realizing automatic disk switching, improving the efficiency and speed of disk switching, reducing the operation and maintenance cost, and improving the safety and fault tolerance of the distributed storage system.

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

Specifically, in a conventional distributed storage system, handling a disk failure requires manual commands or on-site maintenance, which raises the following problems. It is time-consuming and laborious: fault maintenance performed by manual command or on-site operation takes time and effort and affects services. Operation and maintenance cost is high: fault maintenance must be carried out by professionals, at considerable labor and material cost. System availability is poor: during fault maintenance, system services may be interrupted, and misoperation is likely.

To address these technical problems, the application provides a disk switching method that uses a plurality of spare disks for failover, which improves the availability and stability of the system and reduces the operation and maintenance cost.

The main technical principle of the embodiment is to automatically detect the disk faults, automatically select the spare disk to replace the faulty disk and transfer the data to the target spare disk, thereby realizing the automation and optimization of disk switching. Referring to fig. 1, fig. 1 is a flow chart of a disk switching method provided in the present application, where the method is applied to a distributed storage system, and the distributed storage system includes a plurality of primary disks and a plurality of spare disks, and the method includes:

S11: monitoring the plurality of main disks to judge whether any main disk has failed;

In this embodiment, automatic detection of disk failures is realized by periodically monitoring the main disks. In a specific implementation, monitoring a main disk may include monitoring indexes such as disk read-write speed, response time and disk space utilization to determine whether a failure exists. The disk switching flow is triggered automatically when a failure is detected.

Specifically, a disk performance index model under normal conditions can be established through long-term experiments and monitoring, and when the disk is abnormal, whether faults exist or not can be rapidly judged by comparing the changes of the disk performance index. Specifically, whether an abnormality exists in a certain performance index can be determined by setting a proper threshold, for example, when the index exceeds the threshold, the abnormality of the index is determined.
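The threshold comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the index names and limit values in `THRESHOLDS` are hypothetical, and in practice they would be derived from the baseline model built through long-term monitoring. Only indexes for which a larger value is worse are shown.

```python
from typing import Dict, List

# Hypothetical upper limits for each monitored index (assumed values).
THRESHOLDS: Dict[str, float] = {
    "read_latency_ms": 50.0,
    "write_latency_ms": 80.0,
    "space_utilization": 0.95,
}


def abnormal_indexes(sample: Dict[str, float],
                     thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return the names of the indexes whose sampled value exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if name in sample and sample[name] > limit
    )


def disk_has_failed(sample: Dict[str, float]) -> bool:
    """Judge a disk as failed when at least one index is abnormal."""
    return bool(abnormal_indexes(sample))
```

A monitoring loop would call `disk_has_failed` once per main disk with the most recent sample and trigger the switching flow for every disk for which it returns `True`.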

In one embodiment, judging whether any main disk has failed includes:

judging whether the service life of a main disk has reached its service life upper limit;

if so, judging that the main disk whose service life has reached the upper limit has failed.

In a distributed storage system, the service lives of the main disks may differ; if this is not handled properly, some disks may fail early, resulting in data loss and system instability. Therefore, in this embodiment, whether a main disk has failed is judged by monitoring its service life. When the service life of a main disk reaches the upper limit, the disk is at risk of failure and disk switching should be performed in time. At this point, a corresponding target spare disk is selected from the spare disks, and the data is migrated to the target spare disk.

Specifically, in this embodiment, a life-expiration detection technique may be used to detect whether the service life of a main disk has reached its upper limit, by counting the number of writes to the disk's Flash storage units. The number of writes a Flash storage unit can endure is limited, and the accumulated write count grows over time; once it reaches the disk's designed write upper limit, the service life of the main disk has reached its upper limit and the disk can no longer be used. That is, in this embodiment, counting the number of writes to a main disk makes it possible to judge whether the main disk has failed.

When selecting the target spare disk, the spare disk with the highest matching degree needs to be selected for switching in consideration of the storage content of the main disk and the storage capacity of the spare disk.

For example, assume that the distributed storage system contains three main disks and two spare disks, and that main disk A reaches its service life upper limit and is judged to have failed. The system automatically selects spare disk B as the target spare disk and migrates the data stored on the failed main disk A to spare disk B.

In addition, for the main disk with shorter service life, an early warning mechanism can be arranged so as to switch the disk in advance and reduce the occurrence rate of faults.
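The write-count check and the early warning mechanism can be sketched together as below. The names `LifeStatus` and `classify_life` and the default `warn_ratio` of 0.9 are assumptions made for illustration; the application does not specify a warning threshold.

```python
from enum import Enum


class LifeStatus(Enum):
    OK = "ok"
    WARNING = "warning"   # close to the write limit: switch in advance
    EXPIRED = "expired"   # write limit reached: treat the disk as failed


def classify_life(write_count: int, write_limit: int,
                  warn_ratio: float = 0.9) -> LifeStatus:
    """Classify a main disk by its accumulated write count.

    write_limit is the designed upper limit of writes for the disk;
    warn_ratio is a hypothetical fraction of that limit at which an
    early warning is raised.
    """
    if write_limit <= 0:
        raise ValueError("write_limit must be positive")
    if write_count >= write_limit:
        return LifeStatus.EXPIRED
    if write_count >= warn_ratio * write_limit:
        return LifeStatus.WARNING
    return LifeStatus.OK
```

A disk classified as `EXPIRED` would enter the switching flow immediately, while `WARNING` lets the switch be scheduled before the disk actually fails.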

The disk switching method provided by the embodiment can effectively reduce data loss and system instability caused by disk faults, improves the reliability and stability of the system, and reduces the data recovery and maintenance cost.

In another embodiment, judging whether any main disk has failed includes:

performing a read operation and/or a write operation on each main disk at preset time intervals, and judging whether the read operation and/or write operation on the main disk proceeds abnormally;

if so, judging that the main disk on which the read operation and/or write operation proceeded abnormally has failed.

Specifically, in a distributed storage system, a disk failure may cause data to be lost or become unreadable. In this embodiment, the disk status is therefore monitored in a timely manner by periodically performing read-write operations on the main disks and checking whether any abnormality occurs during the process. A read-write operation requires a response from the disk itself; if the disk cannot respond normally, a failure is indicated. When a disk failure is found, a target spare disk is automatically selected from the spare disks, so that automatic disk switching is realized.

The specific process may be as follows. A monitoring time interval is preset, and read-write operations are performed on all main disks once per interval. The response time of each disk is monitored; if the response time exceeds a preset value or the operation times out, this indicates that a partition of the disk, the disk surface or a data area on the disk is damaged, and the disk is considered to have failed. A target spare disk is then selected from the spare disks, and the data on the failed disk is migrated to the target spare disk. Finally, the target spare disk is brought online so that the system continues to operate stably.

For example, assume that the disk switching method of this embodiment is used in a distributed storage system containing 5 main disks and 5 spare disks. The preset monitoring time interval is 30 minutes, so read-write operations are performed every 30 minutes. If a read-write operation on a disk is abnormal (for example, the response times out), the disk is judged to have failed. The system automatically selects a target spare disk from the spare disks, migrates the data on the failed disk to the target spare disk, and brings the target spare disk online.
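One round of the periodic read-write check can be sketched as below. This is an illustrative sketch, not the implementation of the application: each main disk is represented by a hypothetical callable that performs a small read and/or write, and the function names and the use of a worker thread to enforce the timeout are assumptions.

```python
import concurrent.futures
from typing import Callable, Dict, List


def probe_is_abnormal(io_probe: Callable[[], None], timeout_s: float) -> bool:
    """Run one read/write probe; report True on an I/O error or a timeout."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(io_probe)
    try:
        future.result(timeout=timeout_s)
        return False
    except concurrent.futures.TimeoutError:
        return True   # the disk did not respond in time
    except OSError:
        return True   # the read or write itself failed
    finally:
        # Do not block on a probe that is still hanging.
        executor.shutdown(wait=False)


def find_failed_disks(probes: Dict[str, Callable[[], None]],
                      timeout_s: float) -> List[str]:
    """Probe every main disk once and return the names judged as failed."""
    return sorted(name for name, probe in probes.items()
                  if probe_is_abnormal(probe, timeout_s))
```

A scheduler would call `find_failed_disks` once per preset interval (every 30 minutes in the example above) and start the switching flow for each returned disk.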

In addition to basic monitoring items such as read-write operations, other indexes such as disk temperature may be added so that the disk state is judged more comprehensively. Furthermore, the preset time interval can be shortened according to actual requirements, so that disk monitoring is more timely and accurate.

In summary, the core technology of the embodiment is based on periodic read-write operation, and by monitoring the response condition of the disk, the automatic detection and switching of the disk faults are realized. Compared with the traditional disk monitoring and switching method, the scheme of the invention has the advantages of high monitoring precision, high reaction speed, high automation degree and the like, and can realize stable data storage and operation in a large-scale distributed storage system.

S12: if so, selecting, from the plurality of spare disks, a target spare disk in one-to-one correspondence with the failed main disk;

In this embodiment, when only one main disk fails, one corresponding target spare disk may be selected at random or according to a preset requirement; when several main disks fail, several target spare disks in one-to-one correspondence with the failed main disks are selected. The way the target spare disk is selected can be specified by a user or chosen according to requirements.

Specifically, when selecting according to requirements, the factors considered may include, but are not limited to, the type, capacity and performance of the spare disk and its correspondence with the failed main disk. The most suitable spare disk is selected by evaluating the reliability, capacity and performance of the spare disks together with factors such as the specification of the failed main disk and the data migration time.
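As one possible reading of requirement-based selection, the sketch below keeps only the spare disks of the same type and with enough capacity, then picks the one whose capacity is closest to that of the failed main disk. The `DiskSpec` fields and this particular matching rule are assumptions for illustration; the application leaves the concrete criteria open.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DiskSpec:
    name: str
    media_type: str    # hypothetical field, e.g. "ssd" or "hdd"
    capacity_gb: int


def best_matching_spare(failed: DiskSpec,
                        spares: List[DiskSpec]) -> Optional[DiskSpec]:
    """Pick the same-type spare with the smallest sufficient capacity."""
    candidates = [
        s for s in spares
        if s.media_type == failed.media_type
        and s.capacity_gb >= failed.capacity_gb
    ]
    # min() with default=None returns None when no spare qualifies.
    return min(candidates, key=lambda s: s.capacity_gb, default=None)
```

Other factors named in the text, such as reliability or expected migration time, could be folded into the `key` function as additional terms.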

In one embodiment, selecting, from the plurality of spare disks, a target spare disk in one-to-one correspondence with the failed main disk includes:

selecting an idle spare disk from the plurality of spare disks as the target spare disk.

Specifically, if an arbitrary spare disk is chosen directly as the target spare disk, the switch may fail because there are too few spare disks or too few idle spare disks. A reasonable selection method is therefore needed to ensure that the switch succeeds.

In this embodiment, two aspects are considered when selecting the target spare disk: the availability of a spare disk and the number of spare disks. Availability refers to whether the spare disk can replace the failed main disk, that is, whether it has the same capacity, speed and other characteristics; the number of spare disks refers to the number of spare disks in the system that are available for switching. The embodiment selects an appropriate spare disk as the target spare disk in light of both aspects.

When an idle spare disk is selected as the target spare disk, it is only necessary to find, among the existing spare disks, one that has not been allocated to another main disk. This neither affects the use of the other spare disks nor places additional burden on the system.

In summary, in this embodiment, factors such as availability of spare disks and the number of spare disks are comprehensively considered, so that the success rate of system switching can be improved. Meanwhile, the idle spare disk is selected as the target spare disk, so that extra burden on the system can be avoided, and the performance and stability of the system can be improved.

In one embodiment, selecting an idle spare disk from the plurality of spare disks as the target spare disk includes:

performing in-place detection on each spare disk, and selecting an idle spare disk from the spare disks that are in place as the target spare disk.

In this embodiment, in-place detection is first performed on each spare disk, and an idle spare disk is then selected from the in-place spare disks as the target spare disk. This ensures that the selected spare disk is both usable and idle, which improves the reliability of the data and the efficiency of the storage system. In-place detection refers to checking the power supply and connection interface of a spare disk to confirm that it is in a normal working state. Selecting an idle spare disk refers to selecting, from the spare disks, a disk that is neither storing data nor performing other operations. This preserves the continuity and integrity of the data and avoids data conflicts and data errors.

For example, a distributed storage system contains 5 main disks and 3 spare disks, where slot detection shows that spare disks 1 and 2 are in place and spare disk 3 is not. Main disk 4 then fails and must be switched to a spare disk. According to step S12, the spare disks are first scanned and the 2 spare disks that meet the requirement are screened out; an idle spare disk is then selected from these 2 as the target spare disk, and the data on main disk 4 is migrated to it.
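The two-stage filter in this example (in place first, then idle) can be sketched as below. The `SpareDisk` fields are hypothetical stand-ins for the results of slot detection and allocation tracking.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpareDisk:
    slot: int
    in_place: bool   # result of the in-place (power and interface) detection
    idle: bool       # not storing data and not allocated to another main disk


def select_target_spare(spares: List[SpareDisk]) -> Optional[SpareDisk]:
    """Return the first spare disk that is both in place and idle, if any."""
    in_place = [s for s in spares if s.in_place]   # stage 1: in-place detection
    for spare in in_place:                         # stage 2: idle check
        if spare.idle:
            return spare
    return None
```

With spare disks 1 and 2 in place and spare disk 3 out of place, the function returns spare disk 1 when it is idle, falls back to spare disk 2 otherwise, and returns `None` when no candidate remains, in which case the switch cannot proceed automatically.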

In summary, in this embodiment, selecting a spare disk that is both in place and idle as the target spare disk improves the validity and reliability of the selected spare disk, avoids data errors and conflicts, and preserves the continuity and integrity of the data.

S13: migrating the data stored on the failed main disk to its corresponding target spare disk.

Specifically, data migration is the key step in disk switching. In a specific implementation, data is read from the failed disk and transferred to the target spare disk in a way that preserves the integrity and consistency of the data.

In addition, when the data on the failed main disk is backed up to the spare disk, a repeated check and synchronization mechanism can be provided to ensure the integrity and accuracy of the data; after the migration is completed, the data is verified to determine whether any of it has been lost.
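The post-migration check can be sketched as a digest comparison between source and target. The block-level model (a mapping from block id to bytes) and the use of SHA-256 are assumptions for illustration; the application does not name a verification algorithm.

```python
import hashlib
from typing import Dict, List


def copy_blocks(source: Dict[int, bytes], target: Dict[int, bytes]) -> None:
    """Copy every readable block from the failed disk to the target spare disk."""
    for block_id, data in source.items():
        target[block_id] = data


def mismatched_blocks(source: Dict[int, bytes],
                      target: Dict[int, bytes]) -> List[int]:
    """Return ids of blocks that are missing or differ on the target."""
    bad = []
    for block_id, data in source.items():
        copied = target.get(block_id)
        if copied is None or (
            hashlib.sha256(copied).digest() != hashlib.sha256(data).digest()
        ):
            bad.append(block_id)
    return sorted(bad)
```

Any block id returned by `mismatched_blocks` would be copied again and rechecked, which corresponds to the repeated check and synchronization described above.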

Further, before the data migration, the method may further include: formatting the target spare disk.

In one embodiment, after it is judged that a main disk has failed, the method further includes:

determining the fault type of the failed main disk, or determining the fault type of each storage area in the failed main disk;

migrating the data stored on the failed main disk to its corresponding target spare disk includes:

determining a target migration mode according to the determined fault type of the main disk or of each storage area of the main disk, and migrating the data stored on the failed main disk to its corresponding target spare disk by using the target migration mode.

In particular, in a distributed storage system, how to ensure reliability and durability of data and how to quickly switch to a spare disk when a failure of a primary disk occurs is a very challenging technical problem.

In this embodiment, the plurality of main disks are monitored and the fault type is determined, so that a corresponding target spare disk is selected quickly and the data stored on the failed main disk is migrated to the target spare disk using the target migration mode.

The approach in this embodiment can greatly improve the reliability and durability of data in the distributed storage system, reduce data loss caused by main disk failures, and improve the stability and availability of the system.

In one embodiment, determining the target migration mode according to the determined fault type of the main disk or of each storage area of the main disk includes:

when migrating data of a storage area that has not failed in the failed main disk, determining that the target migration mode is copy;

migrating the data stored on the failed main disk to its corresponding target spare disk by using the target migration mode includes:

copying the data in the non-failed storage area of the failed main disk to its corresponding target spare disk.

In this embodiment, when the data of a storage area that has not failed in the failed main disk is migrated, copy is selected as the target migration mode, and the data in the non-failed storage area is then copied to the target spare disk. Copying is fast and has little impact on services, so the data can be migrated to the target spare disk quickly while its consistency and correctness are preserved.

In addition to the copy mode, other target migration modes, such as a mirror mode or an incremental backup mode, may be employed. Different migration modes are selected according to the specific scenario and requirements, so that disk switching is efficient and reliable.

In summary, the disk switching method based on the fault type is adopted in the embodiment, so that an optimal migration mode can be selected according to the specific fault type, the safety and reliability of data are ensured, the stability and usability of the system are improved, and the speed and efficiency of data access are improved.

When migrating data in a storage area of the failed main disk that has itself failed, the target migration mode is determined to be data reconstruction;

the data of the failed storage area of the failed main disk is obtained, by means of data reconstruction, from other main disks that have not failed, and is stored on the target spare disk.

In this embodiment, data in a failed storage area of a failed main disk cannot be migrated by copying. The data of that storage area is therefore obtained, by means of data reconstruction, from other main disks that have not failed and is stored on the target spare disk, thereby recovering the data of the failed disk.

Data reconstruction refers to the process of recombining existing data according to specific rules to obtain a new set of data. In a distributed storage system, data is generally stored as multiple copies, so the data of the failed disk can be recovered by acquiring and reorganizing data from main disks that have not failed.
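The choice between the two migration modes can be illustrated with a minimal Python sketch. It assumes a simplified model that is not part of the application: each storage area carries a `failed` flag, and the copies held on other, non-failed main disks are represented by a plain dictionary named `replicas`. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MigrationMode(Enum):
    COPY = "copy"
    RECONSTRUCT = "reconstruct"


@dataclass
class Area:
    """One storage area of the failed main disk (illustrative model)."""
    area_id: int
    failed: bool
    data: Optional[bytes]  # readable content; None when the area has failed


def choose_mode(area: Area) -> MigrationMode:
    """Copy areas that are still readable, reconstruct the failed ones."""
    return MigrationMode.RECONSTRUCT if area.failed else MigrationMode.COPY


def migrate(areas: list, replicas: dict) -> dict:
    """Return what ends up on the target spare disk, keyed by area id.

    `replicas` stands in for copies held on other, non-failed main disks.
    """
    spare = {}
    for area in areas:
        if choose_mode(area) is MigrationMode.COPY:
            spare[area.area_id] = area.data            # direct copy
        else:
            spare[area.area_id] = replicas[area.area_id]  # rebuilt from a replica
    return spare
```

In this sketch a missing replica raises `KeyError`; a real system would need its own policy for data that cannot be reconstructed.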

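In one embodiment, the plurality of spare disks are not powered up in an initial state;

before the data stored on the failed main disk is migrated to the target spare disk, the method further comprises: powering up the target spare disk.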

Specifically, the spare disk is not powered up in the initial state, and power-up processing is performed only when the spare disk is selected as a target spare disk, so that power consumption is reduced.
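As a rough illustration of the cold-spare behaviour, the following Python sketch keeps every spare slot unpowered until one of them is chosen as the target. The `SpareSlot` type and the `power_up_target` function are hypothetical names; how slot power is actually controlled is hardware-specific and is not described here.

```python
from dataclasses import dataclass


@dataclass
class SpareSlot:
    slot_id: int
    powered: bool = False  # spare disks start unpowered (cold spare)


def power_up_target(slots: list, target_id: int) -> SpareSlot:
    """Power up only the slot chosen as the target; the others stay off."""
    for slot in slots:
        if slot.slot_id == target_id:
            slot.powered = True
            return slot
    raise ValueError(f"no spare slot with id {target_id}")
```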

In summary, the approach in this embodiment recovers the data of the failed storage area, which improves data recovery efficiency and ensures the integrity and reliability of the data. It also reduces the possibility of data loss and improves the fault tolerance and stability of the system.

In one embodiment, the distributed storage system is further provided with prompting devices in one-to-one correspondence with the disks; when it is determined that a main disk has failed, the method further comprises: setting the prompting device of the target spare disk corresponding to the failed main disk to a first state so as to generate first prompting information.

In one embodiment, when it is determined that a main disk has failed, the method further comprises: setting the prompting device of the failed main disk to a second state so as to generate second prompting information.

In one embodiment, while the data stored on the failed main disk is being migrated, the method further comprises: setting the prompting device of the target spare disk corresponding to the failed main disk to a third state so as to generate third prompting information.

In one embodiment, when migration of the data stored on the failed main disk is completed, the method further comprises: setting the prompting device of the target spare disk corresponding to the failed main disk to a fourth state so as to generate fourth prompting information.

In this embodiment, the operating condition of the main disks is monitored; once a failure is found, the system automatically switches from the failed main disk to a spare disk and migrates the corresponding data to the target spare disk, while the prompting devices inform staff of the system's operating state so that the system can be maintained in time.

Specifically, when a main disk is found to be faulty, the prompting device of the corresponding target spare disk is set to the first state to generate the first prompting information, and the prompting device of the failed main disk is set to the second state to generate the second prompting information. During data migration, the prompting device of the target spare disk is set to the third state to generate the third prompting information. When migration of the data stored on the failed main disk is completed, the prompting device of the target spare disk is set to the fourth state to generate the fourth prompting information.

In practical applications, indicator lights may be used as the prompting devices. When a main disk fault is found, the first prompting information may be generated by setting the indicator light of the corresponding target spare disk to red, and the second prompting information may be generated by setting the indicator light of the failed main disk to red. During data migration, the indicator light of the target spare disk may be set to yellow or orange to generate the third prompting information. When migration of the data stored on the failed main disk is completed, the indicator light of the target spare disk may be set to blue to generate the fourth prompting information, and the indicator light of the failed main disk may be turned off. The specific implementation can be adjusted according to actual conditions.

In the disk switching method provided by the application, the prompting device can be an indicator lamp, and different states are represented by lamps with different colors. In addition, the state prompt can be performed by means of sound, vibration and the like. Meanwhile, the state information can be transmitted to a mobile phone or a computer of an administrator through a network or a wireless signal, or a fault notification can be sent to the administrator in a mode of mail, short message, telephone and the like. The present application is not limited herein.

In summary, this embodiment uses the state of the prompting devices to inform the administrator of the system's running state, so that the system can be maintained in time and its reliability and safety are further improved.
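The indicator-light example can be summarised as a small lookup table from events to light colours. The Python sketch below is only illustrative: the event names, the role labels "primary" and "spare", and the `apply_event` helper are assumptions, and the colours follow the example given in the text (yellow is used for the migration state, although orange is equally permitted).

```python
from enum import Enum


class Event(Enum):
    FAULT_DETECTED = 1
    MIGRATION_STARTED = 2
    MIGRATION_DONE = 3


# (event, disk role) -> indicator colour, following the example in the text.
INDICATOR_RULES = {
    (Event.FAULT_DETECTED, "spare"): "red",        # first state
    (Event.FAULT_DETECTED, "primary"): "red",      # second state
    (Event.MIGRATION_STARTED, "spare"): "yellow",  # third state
    (Event.MIGRATION_DONE, "spare"): "blue",       # fourth state
    (Event.MIGRATION_DONE, "primary"): "off",      # failed disk light turned off
}


def apply_event(lights: dict, event: Event) -> dict:
    """Return a new light map; roles without a rule keep their colour."""
    updated = dict(lights)
    for role in updated:
        colour = INDICATOR_RULES.get((event, role))
        if colour is not None:
            updated[role] = colour
    return updated
```

Keeping the rules in a table rather than in branching code makes it straightforward to substitute other prompting means, such as sounds or notifications, for the colours.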

In one embodiment, after the data stored on the failed main disk is migrated to the target spare disk, the method further comprises:

prompting a worker to swap the position of the target spare disk with the position of the failed main disk.

After the data on the failed main disk has been migrated to the target spare disk, the positions of the target spare disk and the failed main disk need to be swapped to ensure normal operation of the system. Staff may be prompted to perform the swap by means of a prompting device (e.g. an indicator light or a sound). The specific implementation can be adjusted according to actual conditions.

In one embodiment, before the position of the target spare disk and the position of the failed main disk are swapped, the method further comprises:

performing power-off processing on the slot in which the target spare disk is located.

In this embodiment, to improve the security and data integrity of the system, the slot of the target spare disk is powered off before the disk positions are swapped, so that the data on the target spare disk cannot be deleted or tampered with by mistake.

In addition, besides the power-off processing of the slot position of the target spare disk, other modes can be adopted to ensure the safety and the integrity of the data. For example, the data may be encrypted before being stored on the target spare disk to prevent data leakage and tampering.

In conclusion, the embodiment can avoid data loss and data falsification caused by misoperation, and improves the safety and data integrity of system operation.

In one embodiment, the process of exchanging the position of the target spare disk with the position of the main disk with failure further comprises:

determining whether the target spare disk is pulled out of the slot position of the target spare disk;

if it has been pulled out, issuing a reconstruction prohibition instruction to the distributed storage system so as to prohibit the system from performing data reconstruction.

In particular, in a distributed storage system, when a disk needs to be replaced, the system typically automatically performs data reconstruction to ensure consistency and integrity of the data. Therefore, in performing disk swap operations, consideration needs to be given to how to avoid the impact of data reconstruction on the system.

In this embodiment, by determining whether the target spare disk has been pulled out from the slot position where the target spare disk is located, if the target spare disk has been pulled out, it is indicated that the spare disk is no longer involved in the data reconstruction operation of the system, so that a command for prohibiting the reconstruction can be issued to the system, so as to avoid the influence of the data reconstruction on the system.

A specific implementation may be as follows: a status bit is provided in the control circuit of the target spare disk; when the target spare disk is pulled out, the status bit becomes "1", and the system determines whether the target spare disk has been pulled out by reading the status bit. If it has been pulled out, a reconstruction prohibition instruction is issued to the system.

In summary, by detecting whether the target spare disk has been pulled out, this embodiment prevents the disk replacement operation from interfering with the system's data reconstruction, ensures data consistency and integrity, and improves the reliability and stability of the system. The reconstruction prohibition instruction also allows the data reconstruction operation to be controlled more flexibly.

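In one embodiment, after determining that the target spare disk has been pulled out of the slot in which it is located, the method further comprises:

determining whether the target spare disk has been inserted into the slot of the failed main disk;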

if so, a recovery reconstruction instruction is issued to the distributed system so that the system recovers the data reconstruction.

In this embodiment, the system is enabled to resume data reconstruction on the basis of the above embodiment. Specifically, it is determined whether the target spare disk has been inserted into the slot of the failed main disk; if so, a recovery reconstruction instruction is issued to the distributed system so that the system resumes data reconstruction, which allows the data on the target spare disk and on the other disks to be rebalanced and ensures effective storage and backup of the data. In addition, the system can record the position information of the target spare disk and compare it with that of the other spare disks so that the state information of the disks in the system is correctly maintained and managed.

In this embodiment, detecting removal and insertion of the spare disk, resuming data reconstruction, and recording and comparing position information together ensure safe storage and reliable backup of the data in the distributed storage system and improve the self-repairing capability of the system. Staff can also handle disk faults promptly according to the warning information sent by the system, avoiding continued disk damage and data loss.
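The pull-out and re-insertion handling can be sketched as a small state machine. The Python example below is an assumption-laden illustration: `pulled_bit` models the status bit read from the control circuit of the target spare disk, `inserted_in_failed_slot` models detection of the disk in the slot of the failed main disk, and the instruction names recorded in `issued` are placeholders rather than real commands of any storage system.

```python
from enum import Enum


class SwapState(Enum):
    IN_SPARE_SLOT = 1    # target spare disk still in its original slot
    PULLED = 2           # pulled out, reconstruction prohibited
    IN_PRIMARY_SLOT = 3  # inserted into the failed main disk's slot


class RebuildGate:
    """Issues the prohibit/resume instructions exactly once per swap."""

    def __init__(self) -> None:
        self.state = SwapState.IN_SPARE_SLOT
        self.issued = []  # instructions sent to the storage system

    def poll(self, pulled_bit: int, inserted_in_failed_slot: bool) -> None:
        """Handle one reading of the (hypothetical) slot status signals."""
        if self.state is SwapState.IN_SPARE_SLOT and pulled_bit == 1:
            self.state = SwapState.PULLED
            self.issued.append("prohibit_reconstruction")
        elif self.state is SwapState.PULLED and inserted_in_failed_slot:
            self.state = SwapState.IN_PRIMARY_SLOT
            self.issued.append("resume_reconstruction")

    @property
    def rebuild_allowed(self) -> bool:
        return self.state is not SwapState.PULLED
```

Tracking an explicit state, rather than reacting to the raw status bit on every poll, avoids re-issuing the prohibition after the disk has already been re-inserted while the old slot still reads as empty.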

Referring to fig. 2, fig. 2 is a specific flow chart of a disk switching method provided in the present application. In one embodiment, the process is as follows:

1. When the system detects a main disk fault, the indicator light of the failed main disk is lit red and the system scans its spare disk slots; if an available (in place and idle) spare disk is found, the power-on operation of that spare disk (the cold spare disk in fig. 2) is triggered.

2. After the spare disk is powered on, its indicator light turns yellow and the extent of damage to the failed main disk is assessed. Data in the storage areas of the main disk that have not failed is migrated first, by copying, which is faster and has less impact on services. Data in failed storage areas, which cannot be recovered by copying, is restored from the backups on other nodes (other main disks) of the system.

3. After the data of the failed storage areas has been backed up to the target spare disk, a blue light is turned on as a prompt and the failed main disk is shut down, the target spare disk taking its place. A worker is then prompted to replace the failed disk with the target spare disk.

4. After the worker replaces the failed main disk with the target spare disk, the system powers off the slot in which the target spare disk was located, and the worker inserts a new spare disk so that it is available the next time a main disk fails.

Referring to fig. 3, fig. 3 is a flowchart of the disk replacement performed by a worker according to the present application. In fig. 3, after the system recognizes that the target spare disk (the cold spare disk in fig. 3) has been pulled out, the system is set to prohibit data reconstruction and, at the same time, the slot of the target spare disk is powered down. After the target spare disk is reinserted into the slot of the failed main disk, the system automatically identifies and mounts it; once the system has returned to normal, it is set to resume reconstruction, and the indicator of the target spare disk is lit (for example, blue) to inform workers that the system has returned to normal.

In summary, in the method, by monitoring a plurality of main disks, faults of the main disks are found in time, and the target spare disk is electrified, so that data stored on the main disk is migrated to the target spare disk, automatic switching of the disks is realized, the efficiency and speed of disk switching are improved, the operation and maintenance cost is reduced, and the safety and fault tolerance of the distributed storage system are improved.

To solve the above technical problem, the present application further provides a disk switching system. Referring to fig. 4, fig. 4 is a block diagram of a disk switching system provided in the present application. The system is applied to a distributed storage system that includes a plurality of primary disks and a plurality of spare disks, and the system includes:

a detecting unit 41, configured to monitor a plurality of primary disks to determine whether there is a failure of the primary disk;

a triggering unit 42, configured to select, when there is a failure of the primary disk, a target spare disk corresponding to the failed primary disk one by one from the plurality of spare disks;

and a data migration unit 43, configured to migrate the data stored on the failed primary disk to the corresponding target spare disk.

In one embodiment, a plurality of the spare disks are not powered up in an initial state; further comprises:

and the power-on unit is used for powering on the target spare disk before the data stored on the main disk corresponding to the fault is migrated to the target spare disk.

In one embodiment, the detection unit specifically includes:

the service life detection unit is used for judging whether the service life of the main magnetic disk reaches the service life upper limit; if the service life of the main disk reaches the service life upper limit, judging that the main disk with the service life reaching the service life upper limit fails.

In one embodiment, the detection unit specifically includes:

the read-write detection unit is used for executing a read operation and/or a write operation on each main disk at preset time intervals and judging whether the read operation and/or write operation on that main disk is abnormal; if it is abnormal, the main disk on which the read operation and/or write operation is abnormal is judged to have failed.

In one embodiment, the detection unit specifically includes:

the performance monitoring unit is used for monitoring performance indexes of the plurality of main disks and judging whether the performance indexes exceed corresponding preset thresholds or not;

if yes, judging that the main disk exceeding the preset threshold fails.
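The three detection criteria described for the detection unit (service life, periodic read/write probing, and performance thresholds) could be combined roughly as follows. This Python sketch is illustrative only: the `DiskStatus` fields, the metric name used in the usage example, and the idea of passing the limits as plain arguments are assumptions, not details from the application.

```python
from dataclasses import dataclass, field


@dataclass
class DiskStatus:
    """Snapshot of one main disk; all field names are illustrative."""
    disk_id: str
    hours_in_service: float
    io_probe_ok: bool  # result of the periodic read/write probe
    metrics: dict = field(default_factory=dict)  # performance index -> value


def is_failed(disk: DiskStatus, life_limit_hours: float, thresholds: dict) -> bool:
    """Apply the three criteria; any single one marks the disk as failed."""
    if disk.hours_in_service >= life_limit_hours:  # service life upper limit reached
        return True
    if not disk.io_probe_ok:                       # read/write process abnormal
        return True
    # A performance index exceeding its preset threshold counts as a failure.
    return any(
        disk.metrics.get(name, 0.0) > limit
        for name, limit in thresholds.items()
    )
```

For example, with `thresholds = {"latency_ms": 50.0}`, a disk reporting a latency of 80 ms would be judged as failed even if its read/write probe succeeded and its service life had not been reached.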

In one embodiment, the triggering unit is specifically configured to select an idle spare disk from the plurality of spare disks as the target spare disk.

In one embodiment, the trigger unit is specifically configured to: and performing in-place detection on each spare disk, and selecting an idle spare disk from the in-place spare disks as a target spare disk.
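A minimal sketch of the selection performed by the triggering unit is given below, assuming each spare slot exposes an in-place flag and an idle flag. The `Spare` type and `select_target` function are hypothetical; marking the chosen spare as no longer idle is one simple way to keep the one-to-one correspondence between failed main disks and target spare disks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spare:
    slot_id: int
    in_place: bool  # a disk is physically present in the slot
    idle: bool      # not already assigned to another failed main disk


def select_target(spares: list) -> Optional[Spare]:
    """Pick the first spare that is both in place and idle, or None."""
    for spare in spares:
        if spare.in_place and spare.idle:
            spare.idle = False  # reserve it so no other failure picks it
            return spare
    return None
```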

In one embodiment, further comprising: a failure type determining unit, configured to determine a failure type of a failed primary disk after determining that there is a failure of the primary disk, or determine a failure type of each storage area in the failed primary disk;

The data migration unit is specifically configured to: determine a target migration mode according to the determined failure type of the main disk or of each storage area of the main disk, and migrate the data stored on the failed main disk to the target spare disk by using the target migration mode.

In one embodiment, the data migration unit is specifically configured to: when migrating data in a storage area of the failed main disk that has not itself failed, determine that the target migration mode is copy; and copy the data in the storage areas of the failed main disk that have not failed to the target spare disk.

In one embodiment, the data migration unit is specifically configured to: when migrating data in a storage area of the failed main disk that has itself failed, determine that the target migration mode is data reconstruction; and obtain the data of the failed storage area of the failed main disk, by means of data reconstruction, from other main disks that have not failed, and store the data on the target spare disk.

In one embodiment, the distributed storage system is also provided with a prompting device which corresponds to each magnetic disk one by one; further comprises:

And the first prompting unit is used for setting the prompting device corresponding to the target standby disk corresponding to the main disk with the fault into a first state when judging that the main disk has the fault so as to generate first prompting information.

In one embodiment, further comprising:

and the second prompting unit is used for setting the prompting device of the failed main disk to a second state when it is judged that a main disk has failed, so as to generate second prompting information.

In one embodiment, further comprising:

and the third prompting unit is used for setting the prompting device of the target spare disk corresponding to the failed main disk to a third state while the data stored on the failed main disk is being migrated, so as to generate third prompting information.

In one embodiment, further comprising:

and the fourth prompting unit is used for setting the prompting device of the target spare disk corresponding to the failed main disk to a fourth state when migration of the data stored on the failed main disk is completed, so as to generate fourth prompting information.

In one embodiment, further comprising:

and the fifth prompting unit is used for prompting staff to swap the position of the target spare disk and the position of the failed main disk after the data stored on the failed main disk is migrated to the target spare disk.

In one embodiment, further comprising:

and the power-off unit is used for performing power-off processing on the slot position of the target spare disk before the position of the target spare disk and the position of the main disk with faults are exchanged.

the unplugging determining unit is used for determining whether the target spare disk is unplugged from the slot position of the target spare disk;

and the reconstruction prohibition unit is used for issuing a reconstruction prohibition instruction to the distributed storage system when the target spare disk is pulled out of the slot in which it is located, so as to prohibit the system from performing data reconstruction.

the insertion determining unit is used for determining whether the target standby disk is inserted into the slot position of the main disk with the fault;

and the reconstruction recovery unit is used for issuing a recovery reconstruction instruction to the distributed system when the target spare disk is inserted into the slot position of the main disk with the fault, so that the system recovers the data reconstruction.

For the description of the disk switching system, refer to the above embodiments, and the description is omitted herein.

To solve the above technical problem, the present application further provides a disk switching device. Referring to fig. 5, fig. 5 is a block diagram of a disk switching device provided in the present application. The device includes:

a memory 51 for storing a computer program;

a processor 52, configured to implement the steps of the disk switching method described above when executing the computer program.

For the description of the disk switching device, refer to the above embodiments, and the description is omitted herein.

To solve the above technical problem, the present application further provides a computer readable storage medium. Referring to fig. 6, fig. 6 is a block diagram of the computer readable storage medium provided in the present application. The computer readable storage medium 60 stores a computer program 61, and the computer program 61 implements the steps of the disk switching method described above when executed by the processor 52. For the description of the computer-readable storage medium 60, refer to the above embodiments; the description is not repeated herein.

In order to solve the above technical problems, the present application further provides a distributed storage system, and fig. 7 is a block diagram of a distributed storage system provided in the present application, where the distributed storage system includes the above disk switching device, a plurality of primary disks, and a plurality of spare disks. For the description of the distributed storage system, refer to the above embodiments, and the description is omitted herein.

It should also be noted that in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A disk switching method, applied to a distributed storage system, the distributed storage system including a plurality of primary disks and a plurality of spare disks, the method comprising:

judging whether the main disk fails or not;

2. The disk switching method as claimed in claim 1, wherein a plurality of said spare disks are not powered up in an initial state;

and powering up the target standby disk.

3. The disk switching method as claimed in claim 1, wherein determining whether there is a failure of the primary disk comprises:

4. The disk switching method as claimed in claim 1, wherein determining whether there is a failure of the primary disk comprises:

5. The disk switching method as claimed in claim 1, wherein determining whether there is a failure of the primary disk comprises:

if yes, judging that the main disk exceeding the preset threshold fails.

6. The disk switching method as claimed in claim 1, wherein selecting a target spare disk corresponding one-to-one to a failed primary disk from a plurality of said spare disks, comprises:

7. The disk switching method as claimed in claim 6, wherein selecting a spare disk in place from a plurality of said spare disks as said target spare disk, comprises:

8. The disk switching method according to claim 1, further comprising, after determining that there is a failure of the primary disk:

9. The disk switching method according to claim 8, wherein determining a target migration pattern according to the determined failure type of the primary disk or each storage area of the primary disk comprises:

10. The disk switching method according to claim 8, wherein determining a target migration pattern according to the determined failure type of the primary disk or each storage area of the primary disk comprises:

11. The disk switching method as claimed in claim 1, wherein the distributed storage system is further provided with a prompting device corresponding to each disk one by one; when it is determined that there is a failure of the primary disk, further comprising:

12. The disk switching method as claimed in claim 11, wherein upon determining that there is a failure of the primary disk, further comprising:

and setting the prompting device corresponding to the main disk with the fault to be in a second state so as to generate second prompting information.

13. The disk switching method as claimed in claim 11, wherein when migrating data stored in the failed primary disk, further comprising:

14. The disk switching method as claimed in claim 11, further comprising, upon completion of migration of data stored in the failed primary disk:

15. The disk switching method according to claim 11, wherein the prompting device is a display prompting device and/or an audio prompting device.

16. The disk switching method according to any one of claims 1 to 15, further comprising, after migrating the data stored on the failed primary disk to the target spare disk:

17. The disk switching method of claim 16, further comprising, prior to swapping the location of the target spare disk and the location of the failed primary disk:

18. The disk switching method as claimed in claim 16, wherein the process of swapping the location of the target spare disk and the location of the failed primary disk further comprises:

19. The disk switching method of claim 16, further comprising, after determining that the target spare disk has been pulled from a slot in which the target spare disk is located:

and if so, issuing a recovery reconstruction instruction to the distributed storage system so as to enable the system to recover data reconstruction.

20. A disk switching system for use in a distributed storage system comprising a plurality of primary disks and a plurality of spare disks, the system comprising:

the detection unit is used for judging whether the main disk fails or not;

21. A disk switching apparatus, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the disk switching method according to any one of claims 1-19 when executing the computer program.

22. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the disk switching method according to any of claims 1-19.

23. A distributed storage system comprising a disk switching apparatus as claimed in claim 21, a plurality of primary disks and a plurality of spare disks.