CN107544868B

CN107544868B - Data recovery method and device

Info

Publication number: CN107544868B
Application number: CN201710331059.2A
Authority: CN
Inventors: 钟晋明
Original assignee: New H3C Cloud Technologies Co Ltd
Current assignee: New H3C Cloud Technologies Co Ltd
Priority date: 2017-05-11
Filing date: 2017-05-11
Publication date: 2020-06-09
Anticipated expiration: 2037-05-11
Also published as: CN107544868A

Abstract

The application provides a data recovery method and device. In the method, virtual machines do not need to be shut down while the shared file system is repaired, which prevents the service interruption and related problems that would otherwise be caused by shutting down the virtual machines associated with the shared file system for data repair. Further, the method and device enable online migration of a virtual machine from one shared file system to another shared file system in a short time.

Description

Data recovery method and device

Technical Field

The present application relates to network communication technologies, and in particular, to a data recovery method and apparatus.

Background

In a virtualization environment, the disk image files of a virtual machine are deployed in advance on a shared file system when the virtual machine is created. Once the virtual machine is running its services, it accesses the shared file system to read and write its disk image files.

However, when the shared file system fails, part or all of the data on the storage space (such as a disk) corresponding to the shared file system, including the disk image files of the virtual machines, also becomes faulty (for example, incomplete or inconsistent). At that point the shared file system supports only read operations.

To repair the data on the storage space corresponding to a failed shared file system in time, a commonly used repair method is as follows: shut down all virtual machines associated with the shared file system (specifically, the virtual machines whose disk image files are deployed on the shared file system), unmount the shared file system, repair the faulty data on the storage space corresponding to the shared file system, re-mount the shared file system once the fault is repaired (re-mounting is similar to restarting a piece of software), and then start all virtual machines associated with the shared file system.

However, this repair method requires all virtual machines associated with the shared file system to be shut down before the shared file system is mounted again. Shutting them down affects the services running on those virtual machines and ultimately causes a large amount of service interruption.

Disclosure of Invention

The application provides a data recovery method and device to prevent the problems caused by shutting down the virtual machines associated with a shared file system in order to repair data.

The technical scheme provided by the application comprises the following steps:

a method of data recovery, the method comprising:

when the first shared file system only supports read operation, repairing fault data on a storage space corresponding to the first shared file system;

after the fault data repair is completed, establishing a second shared file system based on data on the storage space corresponding to the first shared file system, wherein the second shared file system and the first shared file system correspond to the same storage space;

and verifying whether the second shared file system supports read and write operations, and migrating each virtual machine associated with the first shared file system to the second shared file system online when the second shared file system supports read and write operations.

A data recovery apparatus, the apparatus comprising:

the fault repairing unit is used for repairing fault data on a storage space corresponding to the first shared file system when the first shared file system only supports read operation;

the second shared file system processing unit is used for establishing a second shared file system based on data on the storage space corresponding to the first shared file system after the fault data repair is completed, wherein the second shared file system and the first shared file system correspond to the same storage space;

the verifying unit is used for verifying whether the second shared file system supports read and write operations;

and the migration unit is used for migrating each virtual machine associated with the first shared file system to the second shared file system online when the verifying unit verifies that the second shared file system supports read and write operations.

According to this technical scheme, when the first shared file system supports only read operations, a second shared file system is newly established on the storage space of the first shared file system as part of repairing the first shared file system, and all virtual machines associated with the first shared file system are migrated to the second shared file system online without being shut down. This prevents the problems caused by shutting down the virtual machines associated with the shared file system for data repair.

Furthermore, because the second shared file system and the first shared file system are established on the same storage space, migrating the virtual machines associated with the first shared file system to the second shared file system online does not occupy a large amount of additional storage space and does not require copying the virtual machines' disk data, so the virtual machines can be migrated online from the first shared file system to the second shared file system in a short time.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

FIG. 1 is a flow chart of a method provided herein;

FIG. 2 is a flowchart of an embodiment of verifying whether a second shared file system supports read and write operations, as provided herein;

FIG. 3 is a flow diagram of an embodiment of virtual machine migration provided herein;

FIG. 4 is a flow diagram of an embodiment provided herein for the case in which the second shared file system does not support read or write operations;

FIG. 5 is a flow diagram of an embodiment of data repair provided herein;

FIG. 6 is a structural diagram of the apparatus provided herein.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in detail below with reference to the accompanying drawings and specific embodiments.

Referring to fig. 1, fig. 1 is a flow chart of a method provided by the present application. As an example, the method shown in fig. 1 may be applied to a management device. In a specific implementation the management device may be a host, a server, or another device; the present application places no particular limit on it.

As shown in fig. 1, the process may include the following steps:

Step 101, when the first shared file system only supports read operations, repairing the fault data on the storage space corresponding to the first shared file system.

In this application, the name "first shared file system" is used only for convenience of description and does not restrict the method to any particular shared file system.

Here, the storage space corresponding to the first shared file system may be the storage space occupied by the first shared file system, or a storage space that is not occupied by the first shared file system but is designated as corresponding to it. The storage space may be a physical disk or another storage medium; the present application places no particular limit on it.

Here, the fact that the first shared file system supports only read operations can be taken to mean that the first shared file system has failed. When the first shared file system fails, part or all of the data on the corresponding storage space, including the disk image files of the virtual machines, also becomes faulty (for example, incomplete or inconsistent), and this affects the running services. To prevent services from being affected, step 101 repairs the fault data on the storage space corresponding to the first shared file system while the first shared file system remains mounted. How the fault data is repaired is described below and is not repeated here.

Step 102, after the fault data repair is completed, establishing a second shared file system based on data on the storage space corresponding to the first shared file system, where the second shared file system and the first shared file system correspond to the same storage space.

In this application, the second shared file system is named for convenience of distinguishing from the first shared file system described above, and is not used to limit a certain shared file system.

Through step 102, two shared file systems can coexist on the same storage space.

Step 103, verifying whether the second shared file system supports read and write operations, and migrating each virtual machine associated with the first shared file system to the second shared file system online when the second shared file system supports read and write operations.

As described above, the second shared file system corresponds to the same storage space as the first shared file system; in other words, both are established on the same storage space. Consequently, migrating each virtual machine associated with the first shared file system to the second shared file system online in step 103 is a migration within a single storage space: it does not occupy a large amount of additional storage space and does not require copying the virtual machine's disk data, so the virtual machine can be migrated online from the first shared file system to the second shared file system in a short time.

Thus, the flow shown in fig. 1 is completed.
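The flow of steps 101 to 103 can be sketched as follows. This is only an illustrative in-memory model under stated assumptions: the `SharedFS` class, the `recover` function, and the `repair` and `verify` callbacks are hypothetical names that do not appear in the application, and a real management device would call its own storage and hypervisor interfaces instead.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SharedFS:
    """Hypothetical in-memory stand-in for a shared file system."""
    name: str
    storage: str                      # identifier of the underlying storage space
    writable: bool = True             # False models "only supports read operations"
    vms: List[str] = field(default_factory=list)


def recover(first: SharedFS,
            repair: Callable[[str], None],
            verify: Callable[[SharedFS], bool]) -> SharedFS:
    """Illustrative version of steps 101 to 103."""
    if first.writable:
        return first                  # the first file system is healthy
    repair(first.storage)             # step 101: repair the fault data
    # Step 102: the second file system is created on the SAME storage space.
    second = SharedFS(first.name + "-2", first.storage)
    if not verify(second):            # step 103: verify read and write support
        raise RuntimeError("second shared file system is not read-write")
    # Online migration: the virtual machines keep running and only their
    # association moves from the first file system to the second.
    second.vms, first.vms = first.vms, []
    return second
```

Because both objects refer to the same `storage` value, the sketch moves no disk data at all, which mirrors the reason the application gives for the migration being fast.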

As can be seen from the flow shown in fig. 1, when the first shared file system supports only read operations, the present application repairs the fault data on the storage space corresponding to the first shared file system, newly creates a second shared file system on that same storage space, and migrates the virtual machines associated with the first shared file system to the second shared file system online without shutting them down. This prevents the problems caused by shutting down the virtual machines associated with the shared file system for data repair.

In this application, for verifying whether the second shared file system supports read and write operations, reference may be made to the flow illustrated in fig. 2, which includes steps 201 and 202:

Step 201, performing a read operation on data in the storage space corresponding to the second shared file system, and performing a write operation to the storage space corresponding to the second shared file system, where the write operation includes writing newly created data.

Step 202, if the read operation and the write operation are both executed successfully, determining that the second shared file system supports read and write operations; otherwise, when the read operation is not executed successfully, determining that the second shared file system does not support read operations, and when the write operation is not executed successfully, determining that the second shared file system does not support write operations.

Here, a determination that the second shared file system does not support read operations, or that it does not support write operations, means in either case that the second shared file system cannot support both read and write operations.

Through steps 201 to 202, the step of verifying whether the second shared file system supports read and write operations is realized.
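A minimal sketch of such a check is given below, assuming that the second shared file system is reachable through an ordinary mount point on the management device. The function name `supports_read_write` and the probe file name are hypothetical; the application does not specify how the read and write operations are issued.

```python
import os
import uuid


def supports_read_write(mount_point: str) -> bool:
    """Return True only if a read probe and a write probe both succeed."""
    probe = os.path.join(mount_point, ".rw-probe-" + uuid.uuid4().hex)
    try:
        # Step 201 (read): read from one existing regular file, if there is one.
        with os.scandir(mount_point) as entries:
            for entry in entries:
                if entry.is_file(follow_symlinks=False):
                    with open(entry.path, "rb") as existing:
                        existing.read(4096)
                    break
        # Step 201 (write): write newly created data and force it to storage.
        payload = uuid.uuid4().bytes
        with open(probe, "wb") as new_file:
            new_file.write(payload)
            new_file.flush()
            os.fsync(new_file.fileno())
        # Read the new data back to confirm the write really took effect.
        with open(probe, "rb") as new_file:
            return new_file.read() == payload
    except OSError:
        # Step 202: a failed read or a failed write both mean "not read-write".
        return False
    finally:
        # Best-effort cleanup of the probe file.
        try:
            os.remove(probe)
        except OSError:
            pass
```

The write probe uses a new file rather than an existing disk image so that the check never modifies virtual machine data.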

In step 103, as an embodiment, the host of each virtual machine remains unchanged when each virtual machine associated with the first shared file system is migrated to the second shared file system online. Because the virtual machines are virtualized on the host, keeping the host unchanged means that the online migration amounts to a configuration change inside the host. It does not affect any other device connected to the host, the host does not need to notify any other connected device, and the networking topology does not change.
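One way to picture this host-local configuration change is a rewrite of the disk paths that a virtual machine uses. The sketch below is an assumption for illustration only: the application does not say how the configuration is stored, and `retarget_disks`, the device names, and the mount points are hypothetical. It also assumes that a disk image keeps the same relative path under the second shared file system, since both file systems sit on the same storage space.

```python
import posixpath
from typing import Dict


def retarget_disks(vm_disks: Dict[str, str],
                   old_mount: str,
                   new_mount: str) -> Dict[str, str]:
    """Return a copy of the disk map with paths under old_mount moved to new_mount."""
    old_prefix = old_mount.rstrip("/") + "/"
    result: Dict[str, str] = {}
    for device, path in vm_disks.items():
        if path.startswith(old_prefix):
            # Same relative path, different shared file system.
            result[device] = posixpath.join(new_mount, path[len(old_prefix):])
        else:
            # Disks that do not live on the first shared file system are untouched.
            result[device] = path
    return result
```

Only entries that point into the first shared file system change; nothing outside the host is involved.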

As described above, each virtual machine associated with the first shared file system is migrated to the second shared file system online while its host remains unchanged. For this case, a preferred implementation additionally migrates each virtual machine associated with the first shared file system from the second shared file system back to the first shared file system.

In order to implement that each virtual machine associated with the first shared file system is migrated from the second shared file system back to the first shared file system, the method provided by the present application may further include steps 301 to 303 in the flow illustrated in fig. 3:

Step 301, unmounting the first shared file system.

Step 302, when the first shared file system is mounted again, migrating each virtual machine associated with the first shared file system from the second shared file system to the first shared file system online.

Step 303, unmounting the second shared file system.

Through steps 301 to 303, it is realized that each virtual machine associated with the first shared file system is migrated from the second shared file system back to the first shared file system.
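The ordering of steps 301 to 303 can be sketched with a small model that records each action in a log. The `SharedFS` class and the `migrate_back` function are hypothetical names used only for this illustration; the guard in `unmount` is an added assumption that a file system is never unmounted while virtual machines still run on it.

```python
from typing import List, Optional, Set


class SharedFS:
    """Hypothetical stand-in for a mounted shared file system."""

    def __init__(self, name: str, vms: Optional[Set[str]] = None) -> None:
        self.name = name
        self.mounted = True
        self.vms: Set[str] = set(vms or ())

    def unmount(self, log: List[str]) -> None:
        if self.vms:
            raise RuntimeError(f"{self.name} still has running virtual machines")
        self.mounted = False
        log.append(f"unmount {self.name}")

    def mount(self, log: List[str]) -> None:
        self.mounted = True
        log.append(f"mount {self.name}")


def migrate_back(first: SharedFS, second: SharedFS) -> List[str]:
    """Illustrative version of steps 301 to 303; returns the ordered action log."""
    log: List[str] = []
    first.unmount(log)                    # step 301
    first.mount(log)                      # step 302: mount the first FS again ...
    for vm in sorted(second.vms):         # ... then migrate each VM back online
        log.append(f"migrate {vm}: {second.name} -> {first.name}")
    first.vms |= second.vms
    second.vms.clear()
    second.unmount(log)                   # step 303
    return log
```

The first file system can be unmounted safely in step 301 because, at that point, every virtual machine is already running from the second file system.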

As an embodiment, before repairing the fault data on the storage space corresponding to the first shared file system, the method further includes backing up the fault data on the storage space corresponding to the first shared file system. The fault data is preferably backed up in the form of data blocks.

The fault data is backed up so that, when a repair attempt does not succeed, the original fault data can be restored and repaired again. This allows the repair to be repeated until it finally succeeds.

In one embodiment, the backup may be stored on the storage space corresponding to the first shared file system, since the amount of backed-up data is relatively small.

Based on the above-described backup of the failure data on the storage space corresponding to the first shared file system, in this application, when it is verified that the second shared file system does not support the read or write operation, the method provided by this application further includes steps 401 and 402 in the flow shown in fig. 4:

Step 401, unmounting the second shared file system, and restoring the corresponding data on the storage space based on the backup of the fault data.

That is, the storage space is returned to the original fault data, as it was before the repair attempt.

Step 402, repairing the fault data on the storage space corresponding to the first shared file system, and returning to execute the operation of establishing the second shared file system based on the data on the storage space corresponding to the first shared file system after completing the data repair.

How to repair the failure data on the storage space corresponding to the first shared file system in step 402 is described in detail below, and will not be described herein again.

Through steps 401 to 402, repeated repair and repeated verification are finally realized, and the correctness of data repair is verified in the data repair process.
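The backup, restore, and retry loop can be sketched as below. The storage space is modelled as a dictionary from block number to block contents, and `repair_until_verified`, the `repair` and `verify` callbacks, and the `max_attempts` bound are all assumptions made for this illustration; the application does not state a limit on the number of repair attempts, and creating and unmounting the second shared file system is not modelled.

```python
from typing import Callable, Dict, List

Blocks = Dict[int, bytes]   # hypothetical model: block number -> block contents


def repair_until_verified(storage: Blocks,
                          faulty: List[int],
                          repair: Callable[[Blocks, List[int]], None],
                          verify: Callable[[Blocks], bool],
                          max_attempts: int = 3) -> int:
    """Repair, verify, and on failure restore the backup and try again.

    Returns the number of the attempt that passed verification.
    """
    # Back up the fault data block by block before the first repair.
    backup = {number: storage[number] for number in faulty}
    for attempt in range(1, max_attempts + 1):
        repair(storage, faulty)          # step 101 on the first pass, step 402 after
        if verify(storage):              # stands in for steps 102 and 103
            return attempt
        # Step 401: put the original fault data back before repairing again.
        storage.update(backup)
    raise RuntimeError("fault data could not be repaired")
```

Restoring the backup before each new attempt ensures that a failed repair never becomes the starting point of the next one.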

In this application, the repairing the failure data in the storage space corresponding to the first shared file system includes steps 501 to 502 in the flow shown in fig. 5:

Step 501, dividing the fault data on the storage space corresponding to the first shared file system into data blocks.

Step 502, repairing the failure data on the storage space corresponding to the first shared file system according to the sequence from the smallest failure data block to the largest failure data block.

Through steps 501 to 502, the fault data on the storage space corresponding to the first shared file system is repaired while the first shared file system is mounted. In one embodiment, the fsck tool may be used to perform this repair. fsck is a common Linux tool for checking and repairing file systems, so its use is not described in detail here.
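Steps 501 and 502 can be pictured with the sketch below, which does not invoke fsck. It rests on two assumptions that the application leaves open: that the fault data is divided by grouping contiguous faulty sectors into one block, and that "smallest" refers to block length. The names `divide_into_blocks` and `repair_order` are hypothetical.

```python
from typing import List, Tuple

Extent = Tuple[int, int]   # (first sector, number of sectors)


def divide_into_blocks(faulty_sectors: List[int]) -> List[Extent]:
    """Step 501 (assumed): group contiguous faulty sectors into blocks."""
    extents: List[Extent] = []
    for sector in sorted(set(faulty_sectors)):
        if extents and extents[-1][0] + extents[-1][1] == sector:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)   # extend the current block
        else:
            extents.append((sector, 1))         # start a new block
    return extents


def repair_order(extents: List[Extent]) -> List[Extent]:
    """Step 502 (assumed): smallest block first, ties broken by position."""
    return sorted(extents, key=lambda extent: (extent[1], extent[0]))
```

For example, faulty sectors 3, 10, 11, 12, 20, and 21 form blocks of length 1, 3, and 2, and the repair would visit them in the order (3, 1), (20, 2), (10, 3).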

In step 103, as another embodiment, each virtual machine associated with the first shared file system is migrated to the second shared file system online across hosts, that is, the host of each virtual machine changes. Unlike the case in which the host remains unchanged, a change of host changes the networking topology. The original host of each virtual machine therefore needs to notify every other connected device in time that the virtual machine has been migrated away, and the new host of each virtual machine needs to notify every other connected device in time that the virtual machine has been migrated in, so as to prevent service interruption of the virtual machine.
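The two notifications in the cross-host case can be sketched as follows. The `Host` class, the `migrate_across_hosts` function, and the message wording are hypothetical; the application does not specify what form the notifications take or which protocol carries them.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Host:
    """Hypothetical host with the devices it is connected to."""
    name: str
    peers: List[str] = field(default_factory=list)
    vms: Set[str] = field(default_factory=set)


def migrate_across_hosts(vm: str, source: Host, target: Host) -> List[str]:
    """Move a VM to a new host and return the notifications that must be sent."""
    if vm not in source.vms:
        raise ValueError(f"{vm} is not running on {source.name}")
    source.vms.remove(vm)
    target.vms.add(vm)
    # The original host tells its connected devices that the VM has left.
    messages = [f"{source.name} -> {peer}: {vm} migrated away" for peer in source.peers]
    # The new host tells its connected devices that the VM has arrived.
    messages += [f"{target.name} -> {peer}: {vm} migrated in" for peer in target.peers]
    return messages
```

In the same-host embodiment neither list of messages is needed, which is why that embodiment leaves the networking topology untouched.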

The methods provided herein are described above. The following describes the apparatus provided in the present application:

Referring to fig. 6, fig. 6 is a structural diagram of the apparatus provided by the present application. As an example, the apparatus shown in fig. 6 may be applied to a management device. In a specific implementation the management device may be a host, a server, or another device; the present application places no particular limit on it.

As shown in fig. 6, the apparatus includes:

Preferably, as shown in fig. 6, the apparatus further comprises: a first shared file system processing unit;

the first shared file system processing unit is used for unmounting the first shared file system and, when the first shared file system is mounted again, migrating the virtual machines from the second shared file system to the first shared file system online;

the second shared file system processing unit is further used for unmounting the second shared file system after the virtual machines are migrated online from the second shared file system to the first shared file system.

Preferably, the verifying, by the verifying unit, of whether the second shared file system supports read and write operations includes:

executing a read operation on data in the storage space corresponding to the second shared file system, and executing a write operation to the storage space corresponding to the second shared file system, where the write operation includes writing newly created data;

and if the read operation and the write operation are successfully executed, determining that the second shared file system supports the read operation and the write operation, otherwise, when the read operation is not successfully executed, determining that the second shared file system does not support the read operation, and when the write operation is not successfully executed, determining that the second shared file system does not support the write operation.

Preferably, as shown in fig. 6, the apparatus further comprises: a backup processing unit;

the backup processing unit is used for backing up the fault data on the storage space corresponding to the first shared file system before the fault repairing unit repairs the fault data on the storage space corresponding to the first shared file system;

the second shared file system processing unit is further used for unmounting the second shared file system when the verifying unit verifies that the second shared file system does not support read or write operations;

the backup processing unit is further used for restoring the corresponding data on the storage space based on the backup of the fault data, and for triggering the fault repairing unit to repair the fault data on the storage space corresponding to the first shared file system.

Preferably, the repairing the failure data on the storage space corresponding to the first shared file system by the failure repairing unit includes:

dividing the fault data on the storage space corresponding to the first shared file system into data blocks;

and repairing the fault data on the storage space corresponding to the first shared file system according to the sequence from the minimum fault data block to the maximum fault data block.

Thus, the description of the device structure shown in fig. 6 is completed.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims

1. A method for data recovery, the method comprising:

2. The method of claim 1, further comprising:

unmounting the first shared file system;

when the first shared file system is mounted again, migrating each virtual machine from the second shared file system to the first shared file system online;

unmounting the second shared file system.

3. The method of claim 1, wherein verifying whether the second shared file system supports read and write operations comprises:

4. The method of claim 1, further comprising, before repairing the failed data on the storage space corresponding to the first shared file system: backing up fault data on a storage space corresponding to the first shared file system;

when it is verified that the second shared file system does not support read or write operations, the method further comprises:

unmounting the second shared file system, and restoring the corresponding data on the storage space based on the backup of the fault data;

and repairing the fault data on the storage space corresponding to the first shared file system, and returning to execute the operation of establishing a second shared file system based on the data on the storage space corresponding to the first shared file system after completing the data repair.

5. The method of claim 1 or 4, wherein repairing the failed data on the storage space corresponding to the first shared file system comprises:

6. A data recovery apparatus, characterized in that the apparatus comprises:

7. The apparatus of claim 6, further comprising:

8. The apparatus of claim 6, wherein the verifying unit verifies whether the second shared file system supports read and write operations comprises:

9. The apparatus of claim 6, further comprising: a backup processing unit;

10. The apparatus according to claim 6 or 9, wherein the repairing the failed data on the storage space corresponding to the first shared file system by the failure repairing unit comprises: