CN114895849A - Data migration and storage method and device and management node

Data migration and storage method and device and management node

Info

Publication number
CN114895849A
CN114895849A
Authority
CN
China
Prior art keywords
migration
data
data block
storage node
migrated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210475723.1A
Other languages
Chinese (zh)
Inventor
周晓熔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202210475723.1A
Publication of CN114895849A

Classifications

    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers (G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING; G06F3/00 Input/output arrangements; G06F3/0601 Interfaces specially adapted for storage systems)
    • G06F3/0647 Migration mechanisms
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G06F3/064 Management of blocks
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data migration and storage method and device and a management node. When a migration operation is triggered, a data block to be migrated is obtained, and whether it meets a preset reliability requirement is judged based on the data blocks already on its destination storage node and destination disk; if the requirement is met, the data block is migrated. When the preset reliability requirement is not met, it is detected whether the data blocks to be migrated form a ring structure: if so, they are directly marked as migration-complete; if not, the data block waits for the next migration operation to be triggered. This scheme avoids the data loss that a subsequent abnormality after migration could otherwise cause. Moreover, directly marking data blocks that form the special ring structure as successfully migrated avoids unnecessary migration work on the one hand and guarantees that every data block is migrated reasonably on the other.

Description

Data migration and storage method and device and management node
Technical Field
The present application relates to the field of distributed storage technologies, and in particular to a data migration and storage method and apparatus and a management node.
Background
With the growth of data, distributed storage systems have become the mainstream of storage systems thanks to advantages such as scalable capacity and high data reliability. Among them, erasure-code-based distributed block storage is widely used because of advantages such as high space utilization.
The storage space created by a user can be divided into multiple files (FILE); one FILE can be divided into multiple objects (OBJ) according to the redundancy policy, and each OBJ consists of N + M data blocks (block). When the cluster encounters an abnormal condition, block data become abnormal and the reliability of the cluster data is reduced; data recovery is then triggered to preserve data reliability. However, the large amount of block recovery caused by a node going offline leaves the cluster load-imbalanced once the node comes back, so recovery migration is required. Likewise, when nodes (or disks) are later added to expand the cluster, data must be migrated across the cluster. Data reliability during the migration process therefore becomes a critical consideration.
In the prior art, data migration does not take the reliability of data during the migration process into account, so a node involved in migration may fail again and cause data loss. Moreover, the prior art has no complete mechanism to ensure that every data block can be migrated effectively and reasonably.
Disclosure of Invention
The present application provides a data migration storage method, apparatus, and management node, which can avoid data loss and ensure that data blocks are reasonably migrated.
The embodiment of the application can be realized as follows:
in a first aspect, the present application provides a data migration storage method, which is applied to a management node in a distributed cluster, where the distributed cluster further includes a plurality of storage nodes connected to the management node, each storage node includes a plurality of disks, each disk is used for storing data, and each data is divided into a plurality of data blocks, and the method includes:
when the migration operation is triggered, acquiring a data block to be migrated;
detecting whether the data block to be migrated meets a preset reliability requirement or not according to a target storage node of the data block to be migrated and a data block in a target disk, and migrating the data block to be migrated to the target storage node and the target disk if the preset reliability requirement is met;
if the preset reliability requirement is not met, detecting whether the data blocks to be migrated form a ring structure or not, wherein the ring structure is a structure in which respective current storage nodes of the two data blocks to be migrated are destination storage nodes of each other;
if a ring structure is formed, directly marking the two data blocks to be migrated as a migration completion state;
if the ring structure is not formed, waiting for the triggering of the next migration operation.
In an optional embodiment, the migration operation includes a recovery migration operation after the storage node or the disk is offline and then online, and the method further includes a step of recovering the data block when the storage node or the disk is offline, where the step includes:
when determining that the storage node or the disk is offline, acquiring data information of a data block on the offline storage node or the disk and storing the data information into a recovery table;
scanning the data information in the recovery table, and detecting whether other data blocks of the same data to which the data block corresponding to the scanned data information belongs exist at present and are in a recovery state;
if yes, skipping the data information to scan the data information of the next data block;
and if the data block does not exist, sending the data block corresponding to the data information to a target storage node or a target disk determined from other online storage nodes for recovery.
In an optional embodiment, the step of determining that the storage node or the disk is offline includes:
and determining that the storage node or the disk is offline when the storage node or the disk is monitored to be offline and the offline duration exceeds the preset duration.
In an optional embodiment, the migration operation includes a recovery migration operation after a storage node or a disk is offline and then online, and the step of detecting whether the data block to be migrated meets a preset reliability requirement according to a target storage node of the data block to be migrated and a data block in a target disk includes:
detecting whether a data block with the same data as the data block to be migrated belongs to exists in a waiting migration state or not at present, and if not, detecting whether a data block with the same data as the data block to be migrated belongs to exists in the target disk or not;
if not, detecting whether the number of data blocks of the same data as the data block to be migrated in the target storage node is smaller than a preset maximum allowable number, and if so, judging that the data block to be migrated meets a preset reliability requirement.
In an optional embodiment, the migration operation includes a capacity expansion migration when there is a newly added storage node in the distributed cluster, and before the step of triggering the migration operation, the method further includes:
acquiring the used capacity of each storage node currently existing in the distributed cluster;
judging whether the capacity expansion migration between the storage nodes is carried out or not according to the number of the storage nodes which exist at present, the used capacity of each storage node which exists at present and a preset storage node capacity difference threshold value, and triggering the migration operation if the capacity expansion migration between the storage nodes is judged to be carried out.
In an optional implementation manner, the step of determining whether to perform the capacity expansion migration between the storage nodes according to the number of the currently existing storage nodes, the used capacity of each currently existing storage node, and a preset capacity difference threshold includes:
calculating to obtain the average used capacity of the storage nodes according to the total used capacity of the storage nodes of all the storage nodes in the distributed cluster and by combining the number of the currently existing storage nodes;
and for any storage node, judging whether to perform capacity expansion migration between the storage nodes based on the difference between the used capacity of the storage node and the average used capacity of the storage nodes.
In an optional embodiment, the step of obtaining the data block to be migrated includes:
comparing the used capacity of each storage node currently existing in the distributed cluster;
taking the storage node with the highest used capacity as a migration storage node, and taking the storage node with the lowest used capacity as a target storage node;
and extracting the data block from the migrated storage node as the data block to be migrated.
In an optional embodiment, the migration operation includes a capacity expansion migration when there is a newly added disk in the distributed cluster, and before the step of triggering the migration operation, the method further includes:
aiming at a storage node with a newly added disk in the distributed cluster, acquiring the used capacity of each disk under the storage node;
and judging whether the capacity expansion migration between the disks is carried out or not according to the number of the disks under the storage node, the used capacity of each disk and a preset disk capacity difference threshold value, and triggering the migration operation if the capacity expansion migration between the disks is judged to be carried out.
In a second aspect, the present application provides a data migration storage apparatus, which is applied to a management node in a distributed cluster, where the distributed cluster further includes a plurality of storage nodes connected to the management node, each storage node includes a plurality of disks, each disk is used to store data, and each data is divided into a plurality of data blocks, and the apparatus includes:
the acquisition module is used for acquiring the data block to be migrated when the migration operation is triggered;
the first detection module is used for detecting whether the data block to be migrated meets the preset reliability requirement or not according to the target storage node of the data block to be migrated and the data block in the target disk;
the migration module is used for migrating the data block to be migrated to the destination storage node and the destination disk when the preset reliability requirement is met;
the second detection module is used for detecting whether the data blocks to be migrated form a ring structure or not when the preset reliability requirement is not met, wherein the ring structure is a structure in which respective current storage nodes of the two data blocks to be migrated are destination storage nodes of each other;
the marking module is used for directly marking the two data blocks to be migrated as migration completion states when a ring structure is formed;
and the waiting module is used for waiting for triggering the next migration operation when a ring structure is not formed.
In a third aspect, the present application provides a management node comprising one or more storage media and one or more processors in communication with the storage media, the one or more storage media storing machine-executable instructions that, when the management node runs, are executed by the processors to perform the method steps of any one of the preceding embodiments.
The beneficial effects of the embodiment of the application include, for example:
the application provides a data migration and storage method, a data migration and storage device and a management node. And when the preset reliability requirement is not met, detecting whether the data block to be migrated forms a ring structure, if so, directly marking the data block to be migrated as a migration completion state, and if not, waiting for triggering of the next migration operation. In the scheme, whether the data block to be migrated meets the preset reliability requirement is judged based on the target storage node and the data block in the target disk, so that the problem of data loss caused by subsequent abnormity possibly caused by migration is avoided. Moreover, for the data blocks to be migrated which form a special ring structure, a mode of directly marking successful migration is adopted, so that unnecessary migration work can be avoided on one hand, and reasonable migration of each data block can be guaranteed on the other hand.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of a distributed cluster provided in an embodiment of the present application;
fig. 2 is a flowchart of a data migration storage method according to an embodiment of the present application;
fig. 3 is a flowchart of a data recovery method in the data migration storage method according to the embodiment of the present application;
FIG. 4 is a flowchart of sub-steps included in step S202 of FIG. 2;
FIG. 5 is a diagram of a data block in a ring structure provided in an embodiment of the present application;
FIG. 6 is a diagram illustrating an original distribution of data blocks in the same group of data according to an embodiment of the present application;
fig. 7 is a schematic distribution diagram of data blocks in the same group of data after multiple recoveries according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a data migration storage process according to an embodiment of the present application;
fig. 9 is a flowchart of a capacity expansion migration determination method in the data migration storage method according to the embodiment of the present application;
fig. 10 is a second flowchart of a capacity expansion migration determination method in the data migration storage method according to the embodiment of the present application;
fig. 11 is a flowchart of sub-steps included in step S201 in fig. 2;
FIG. 12 is a schematic diagram of the used capacity of each node when a node is newly added in a distributed cluster;
FIG. 13 is a specific numerical value of the used capacity of each node when a node is newly added in the distributed cluster;
FIG. 14 is a diagram illustrating the capacity used by each node in a distributed cluster after adding nodes and performing migration processing;
fig. 15 is a specific numerical value of the used capacity of each node after newly adding a node in the distributed cluster and performing migration processing;
fig. 16 is a block diagram of a management node according to an embodiment of the present application;
fig. 17 is a functional block diagram of a data migration storage apparatus according to an embodiment of the present application.
Reference numerals: 110 - storage medium; 120 - processor; 130 - data migration storage apparatus; 131 - acquisition module; 132 - first detection module; 133 - migration module; 134 - second detection module; 135 - marking module; 136 - waiting module; 140 - communication interface.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present application, it is noted that the terms "first", "second", and the like are used merely for distinguishing between descriptions and are not intended to indicate or imply relative importance.
It should be noted that the features of the embodiments of the present application may be combined with each other without conflict.
Fig. 1 is a schematic structural diagram of a distributed cluster according to an embodiment of the present disclosure. The distributed cluster includes a management node and a plurality of storage nodes (data nodes) connected to the management node. Wherein each storage node comprises a plurality of disks. The management node and the storage node may be terminal devices or servers. The management node and the storage node construct an operation framework of the storage system.
Each disk may be used to store data, each data being divided into a plurality of data blocks, wherein data blocks belonging to the same data may be divided into the same group. For example, the created storage space may be split into a plurality of FILEs (FILE), and one FILE may be split into a plurality of Objects (OBJ) according to different redundancy policies, each OBJ consisting of N + M data blocks (block).
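To make this FILE/OBJ/block hierarchy concrete, the following is a minimal Python sketch of the data model just described; all class and field names are hypothetical illustrations rather than identifiers from the patent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Block:
    file_id: int
    obj_id: int
    block_id: int
    dn_id: int    # storage node (data node) currently holding the block
    disk_id: int  # disk within that node

@dataclass
class Obj:
    # One OBJ consists of N data blocks plus M redundant blocks.
    blocks: List[Block] = field(default_factory=list)

@dataclass
class File:
    # One FILE is split into multiple OBJs per the redundancy policy.
    objects: List[Obj] = field(default_factory=list)
```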
Please refer to fig. 2, which is a flowchart illustrating a data migration and storage method according to an embodiment of the present application, where the migration and storage method is applicable to a management node in the distributed cluster. The following describes a data migration storage method provided in the embodiment of the present application in detail with reference to fig. 2.
S201, when the migration operation is triggered, the data block to be migrated is obtained.
S202, detecting whether the data block to be migrated meets a preset reliability requirement or not according to the target storage node of the data block to be migrated and the data block in the target disk, if so, executing the following step S203, and if not, executing the following step S204.
S203, the data block to be migrated is migrated to the destination storage node and the destination disk.
And S204, detecting whether the data blocks to be migrated form a ring structure, wherein the ring structure is a structure in which the current storage nodes of the two data blocks to be migrated are the destination storage nodes of each other, and if the ring structure is formed, executing the following step S205, and if the ring structure is not formed, executing the following step S206.
S205, directly marking the two data blocks to be migrated as a migration completion state.
S206, waiting for the trigger of the next migration operation.
In this embodiment, when an abnormality occurs in a storage node or a disk in a distributed cluster, the reliability of cluster data will be affected, and at this time, data on the abnormal storage node or disk needs to be recovered. After the abnormal storage node or disk is on-line again, the data restored before needs to be migrated to the original storage node or disk, so as to ensure load balance in the cluster. In addition, when a new storage node or disk exists in the cluster, the problem of load imbalance in the cluster also exists, and at this time, the migration operation is also triggered.
Under the condition that the migration operation is triggered, the data block to be migrated can be obtained. When the abnormal storage node or disk is online again, the data block to be migrated may be a data block that is originally on the abnormal storage node or disk and has been restored to another storage node. And under the condition of adding a storage node or a disk, the data block to be migrated is the data block on the storage node originally existing in the distributed cluster.
When the abnormal storage node or disk comes back online, the destination storage node and destination disk of the data block to be migrated are that re-onlined storage node or disk. When a storage node or disk is newly added to the cluster, however, the destination storage node and destination disk may be the newly added storage node and disk.
After determining the data block to be migrated, the target storage node and the target disk, the conditions of the data block on the target storage node and the target disk can be obtained, and whether the data block to be migrated meets the preset reliability requirement or not is judged by combining the data block to be migrated, that is, whether the data block to be migrated is suitable for being migrated to the target storage node and the target disk at present is judged. If the data block to be migrated meets the preset reliability requirement, the data block to be migrated can be directly migrated to the destination storage node and the destination disk.
If the data block to be migrated is determined not to meet the preset reliability requirement, generally, the data block needs to wait for triggering of the next migration operation, and then determine whether to perform migration. However, there are some special cases, for example, the data blocks to be migrated form a ring structure, that is, the current data block to be migrated includes at least two data blocks, and each current storage node of the two data blocks to be migrated is a destination storage node of each other. For example, the data block to be migrated includes a data block a and a data block B, the current storage node of the data block a is a storage node a, the destination storage node is a storage node B, and the current storage node of the data block B is a storage node B, and the destination storage node is a storage node a. That is, data block a needs to be migrated from storage node a to storage node B, and data block B needs to be migrated from storage node B to storage node a.
When the data blocks to be migrated do not meet the preset reliability requirement but form the ring structure, the two data blocks to be migrated can be directly marked as migration-complete, that is, no actual migration action needs to be performed. If a data block to be migrated does not meet the preset reliability requirement and does not form a ring structure, migration is judged again after the next migration operation is triggered.
The data migration storage method provided by this embodiment judges whether the data block to be migrated meets the preset reliability requirement based on the data blocks already on the destination storage node and destination disk, which avoids the data loss that a subsequent abnormality after migration could otherwise cause. Moreover, for data blocks to be migrated that form the special ring structure, directly marking them as successfully migrated avoids unnecessary migration work on the one hand and guarantees that every data block is migrated reasonably on the other.
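The overall decision flow of steps S201-S206 can be sketched as follows. This is a minimal sketch under stated assumptions: the helper functions (meets_reliability, forms_ring, and so on) are hypothetical stand-ins for the checks detailed later in this description.

```python
def on_migration_triggered(blocks_to_migrate, cluster):
    for blk in blocks_to_migrate:                           # S201
        # S202: judge the reliability requirement against the data
        # blocks already on the destination node and destination disk.
        if meets_reliability(blk, blk.dest_dn_id, blk.dest_disk_id, cluster):
            migrate(blk, blk.dest_dn_id, blk.dest_disk_id)  # S203
        elif forms_ring(blk, cluster):                      # S204
            mark_migration_complete_pair(blk, cluster)      # S205
        else:
            continue  # S206: wait for the next migration trigger
```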
In this embodiment, the migration operation includes a recovery migration operation after a storage node or disk goes offline and then comes back online, and a capacity expansion migration when a storage node or disk is newly added to the distributed cluster.
When the migration operation is a recovery migration operation after the storage node or the disk is offline and then online, it means that the storage node or the disk is offline before the recovery migration operation. In order to ensure that data is not lost, the data blocks are recovered when the storage nodes or the disks are offline. Therefore, referring to fig. 3, the data migration storage method provided in this embodiment further includes the following step of recovering the data block when the storage node or the disk is offline.
S101, when determining that the storage node or the disk is offline, acquiring data information of the data block on the offline storage node or the disk and storing the data information into a recovery table.
S102, scanning the data information in the recovery table, and detecting whether there is another data block of the same data to which the data block corresponding to the scanned data information belongs currently in a recovery state, if so, executing the following step S103, and if not, executing the following step S104.
S103, skipping the data information to scan the data information of the next data block;
and S104, sending the data block corresponding to the data information to a target storage node or a target disk determined from other online storage nodes for recovery.
In this embodiment, when a storage node or disk stays offline for more than a specified time, its data blocks are determined to be unavailable. These data blocks become the data blocks to be recovered, and their data information is stored in the recovery table.
To avoid wasting cluster resources on a storage node or disk that is only briefly offline (for example, a disk that is unplugged and re-plugged, or a node that is restarted), in this embodiment the storage node or disk is determined to be offline only when the monitored offline duration exceeds a preset duration, and only then is recovery of its data blocks triggered. This avoids the resource waste of recovering data blocks for a merely short offline period.
In addition, in this embodiment, when a storage node or disk is monitored to be offline and the offline duration has not yet exceeded the preset duration, any read or write operation on its data blocks will fail. In that case, although the storage node or disk has not been determined to be offline, the data block targeted by the failed read or write operation still needs to be recovered; its data information is likewise stored in the recovery table to wait for recovery. In this way, subsequent read and write operations are not affected.
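The offline decision just described can be sketched as follows; the grace-period constant and the node/recovery-table interfaces are hypothetical assumptions for illustration.

```python
import time

OFFLINE_GRACE_SECONDS = 300  # hypothetical preset offline duration

def check_node_offline(node, recover_table, now=None):
    """Queue a node's blocks for recovery only after the grace period."""
    now = now if now is not None else time.time()
    if not node.online and now - node.offline_since > OFFLINE_GRACE_SECONDS:
        for blk in node.blocks:
            recover_table.put(blk, state=0)  # 0 = recovery not started
        return True
    return False

def on_read_write_failure(block, recover_table):
    # Within the grace period, a failed read/write still queues that
    # single block for recovery, so later operations are not affected.
    recover_table.put(block, state=0)
```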
In this embodiment, the data information of the data blocks on a storage node or disk determined to be offline, together with the data information of data blocks targeted by failed read or write operations, is stored in the recovery table shown below to wait for recovery, and the state of the corresponding data blocks is updated to offline.
Table 1 Recovery table

Description | Table name | Type | Key | Value
Recovery table | TBL_RECOVER | Hash | FILEID-OBJID-BLOCKID | timestamp-state
Here, state indicates the recovery state: 0 means recovery not started, 1 means ready to recover, and 2 means recovery in progress.
The recovery thread periodically scans the data information in the recovery table; each scan may cover a specified number of entries. For each piece of scanned data information, it is first detected whether another data block of the same data (that is, a data block in the same group) is currently in the recovery state. If so, the data information is skipped until the next scan. If not, recovery can be executed for the corresponding data block: a destination storage node is determined from the storage nodes currently online in the distributed cluster (and, for recovery between disks, a destination disk is determined on that node), and the data block corresponding to the scanned data information is recovered to the destination storage node or destination disk.
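A sketch of the recovery thread's scan loop follows, assuming hypothetical recovery-table and cluster helper functions (scan, group_in_recovery, pick_online_node, and so on).

```python
SCAN_BATCH = 100  # hypothetical specified number of entries per scan

def recovery_scan(recover_table, cluster):
    for info in recover_table.scan(limit=SCAN_BATCH):
        # Skip if another block of the same group (same FILEID-OBJID)
        # is already in the recovery state.
        if recover_table.group_in_recovery(info.file_id, info.obj_id,
                                           exclude_block=info.block_id):
            continue  # wait for the next scan
        dest_node = pick_online_node(cluster, exclude=info.dn_id)
        dest_disk = pick_disk(dest_node)
        send_for_recovery(info, dest_node, dest_disk)
        recover_table.set_state(info, state=2)  # 2 = recovery in progress
```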
After a data block is recovered, the management node receives the reported recovery confirmation message. If the recovery state in the message is normal, the data block has been recovered successfully; otherwise the recovery has failed and must be performed again. After a data block is successfully recovered, its data information is deleted from the recovery table and stored in the recovery completion table shown below; the data blocks recorded in the recovery completion table then wait for the migration operation.
Table 2 Recovery completion table

Description | Table name | Type | Key
Recovery completion table | TBL_RECOVERD-SrcDNID-SrcDISKID | Set | FILEID-OBJID-BLOCKID
Here, the SrcDNID and SrcDISKID embedded in the table name record the initial position of the data block.
After the data blocks corresponding to all the data information in the recovery table have been recovered, that is, after all the data information in the recovery table has been moved to the recovery completion table, the migration scanning thread periodically scans the data information in the recovery completion table and, when the corresponding data block is determined to meet the preset reliability requirement, stores it into the migration table shown below to wait for data migration. If the data to which a data block belongs no longer exists in the database, the corresponding data information is deleted from the recovery completion table. In addition, if the destination disk of a data block is not online, the data information is skipped until the next migration scan.
Table 3 Migration table

Description | Table name | Type | Key | Value
Migration table | TBL_MIGRATION | Hash | FILEID-OBJID-BLOCKID | SrcDNID-SrcDISKID-DesDNID-DesDISKID-RecvedDataType-MgrStatus-Timestamp
Here, SrcDNID and SrcDISKID record the data block's current storage node and disk, DesDNID and DesDISKID record the destination storage node and disk, RecvedDataType indicates the received data type (1 - recovery migration; 2 - capacity expansion migration), MgrStatus indicates the data block's migration state (0 - migration not started; 1 - waiting for migration; 2 - migration in progress; 3 - migration finished), and Timestamp records the migration table time.
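As an illustration, a migration-table entry with these fields could be modeled as below; the enum and class names are hypothetical, and only the field semantics come from the table description above.

```python
from dataclasses import dataclass
from enum import IntEnum

class RecvedDataType(IntEnum):
    RECOVERY_MIGRATION = 1
    CAPACITY_EXPANSION_MIGRATION = 2

class MgrStatus(IntEnum):
    NOT_STARTED = 0
    WAITING = 1
    IN_PROGRESS = 2
    FINISHED = 3

@dataclass
class MigrationEntry:
    key: str                 # "FILEID-OBJID-BLOCKID"
    src_dn_id: int           # SrcDNID: current storage node
    src_disk_id: int         # SrcDISKID: current disk
    des_dn_id: int           # DesDNID: destination storage node
    des_disk_id: int         # DesDISKID: destination disk
    recved_data_type: RecvedDataType
    mgr_status: MgrStatus
    timestamp: float         # migration table time
```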
Referring to fig. 4, in the present embodiment, when determining whether the data block meets the reliability requirement, the following method may be implemented:
s2021, detecting whether there is a data block of the same data as the data block to be migrated currently in the migration waiting state, if not, executing the following step S2022, and if so, executing the following step S2025.
S2022, detecting whether there is a data block in the destination disk that belongs to the same data as the data block to be migrated, if not, executing the following step S2023, and if so, executing the following step S2025.
S2023, detecting whether the number of data blocks in the destination storage node, which belong to the same data as the data block to be migrated, is less than a preset maximum allowable number, if so, executing the following step S2024, and if not, executing the following step S2025.
S2024, judging that the data block to be migrated meets the preset reliability requirement.
And S2025, judging that the data block to be migrated does not meet the preset reliability requirement.
As can be seen from the above, the migration scan thread scans data information from the recovery completion table and stores it into the migration table. For each piece of scanned data information, it is first detected whether a data block of the same data is already waiting to migrate, that is, whether the migration table already contains data information of a block of the same data; if it does, the preset reliability requirement is judged not to be met. If it does not, the subsequent detection operations are executed.
In addition, it is detected whether the destination disk already holds a data block of the same data as the data block to be migrated, that is, a data block of the same group. If it does, the preset reliability requirement is judged not to be met. If it does not, the number of data blocks on the destination storage node that belong to the same data as the data block to be migrated is compared with the preset maximum allowable number. The preset maximum allowable number is calculated as follows:
MaxNum = HighAlign(N + M, DnNum) / DnNum
Here, N and M respectively denote the number of data blocks into which the data is divided under the erasure code and the number of redundant fragments generated by the erasure algorithm, HighAlign denotes rounding up to a multiple of the second argument, and DnNum is the number of storage nodes in the distributed cluster.
If the number of data blocks of the same data on the destination storage node is less than the preset maximum allowable number, the preset reliability requirement is judged to be met; if it is greater than or equal to the preset maximum allowable number, the requirement is judged not to be met.
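Steps S2021-S2025 and the MaxNum formula can be sketched together as follows; high_align is a straightforward reading of HighAlign (round up to a multiple), while the cluster and table query methods are hypothetical assumptions.

```python
import math

def high_align(x, align):
    # HighAlign: round x up to the next multiple of align.
    return math.ceil(x / align) * align

def meets_reliability(blk, dest_node, dest_disk, cluster, n, m):
    # S2021: a block of the same data already waiting to migrate?
    if cluster.migration_table.group_waiting(blk.file_id, blk.obj_id):
        return False
    # S2022: a block of the same data already on the destination disk?
    if dest_disk.has_block_of_group(blk.file_id, blk.obj_id):
        return False
    # S2023: would the destination node exceed MaxNum blocks of the group?
    dn_num = cluster.node_count()
    max_num = high_align(n + m, dn_num) // dn_num
    return dest_node.count_blocks_of_group(blk.file_id, blk.obj_id) < max_num
```

For instance, with a 4+2 erasure ratio on a 4-node cluster, MaxNum = HighAlign(6, 4) / 4 = 8 / 4 = 2, so at most two blocks of one group may land on the same node.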
After the preset reliability requirement is judged to be met, the data information of the data block to be migrated can be stored in the migration table, and the migration state of the data block to be migrated is updated from the non-beginning migration state to the waiting migration state.
The migration processing thread acquires the data blocks in the waiting-for-migration state, sends them to the destination storage node and destination disk for migration, and updates each data block's migration state to the in-migration state once it is sent successfully. If sending fails, the migration state is reset to the not-started state to wait for the next migration scan. The next migration scan here refers to a scan of the data information already in the migration table.
In addition, during migration, if a data block stays in the in-migration state beyond the specified duration, timeout processing is performed: the data block is rolled back into the recovery completion table and deleted from the migration table at the same time, to wait for the next migration scan. The next migration scan here refers to a scan of the data information in the recovery completion table.
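A sketch of the migration processing thread, including the send-failure reset and the timeout rollback just described; the timeout constant and table APIs are hypothetical, and MgrStatus reuses the values from the earlier sketch.

```python
MIGRATION_TIMEOUT = 600  # hypothetical specified migration duration (s)

def migration_process(migration_table, recover_done_table, now):
    # Send every block that is waiting for migration.
    for e in migration_table.scan(status=MgrStatus.WAITING):
        if send_to_destination(e):
            e.mgr_status = MgrStatus.IN_PROGRESS
        else:
            # Send failed: retry at the next scan of the migration table.
            e.mgr_status = MgrStatus.NOT_STARTED

    # Time out blocks stuck in migration and roll them back to the
    # recovery completion table to await the next scan of that table.
    for e in migration_table.scan(status=MgrStatus.IN_PROGRESS):
        if now - e.timestamp > MIGRATION_TIMEOUT:
            recover_done_table.put(e.key, e.src_dn_id, e.src_disk_id)
            migration_table.delete(e.key)
```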
If a data block to be migrated does not satisfy the preset reliability requirement, the data blocks to be migrated may have formed a ring structure. For example, as shown in fig. 5, data block b1 has been recovered to Node2 and data block b2 to Node1. Because of the reliability limitation, b1 and b2 cannot simply be moved back to their initial positions. A ring structure can likewise be formed by more than two data blocks.
In this case, b1 needs to be migrated back to Node1, so the data block that actually resides on Node1 (here b2) can be queried. It is then determined whether b2 is a data block waiting to be migrated back to Node2; if so, the to-be-migrated position of b1 is modified to the to-be-migrated position of b2, and the migration request information of b2 is deleted. Next, it is checked whether the to-be-migrated position of b1 is the same as b1's current position (Node2); if so, the migration request information of b1 can also be deleted. In this way it can be judged whether b1 and b2 form a ring structure, and when they do, their states can be directly set to migration-complete without performing any actual migration.
If no ring structure is formed, the data block waits for the next migration operation to be triggered.
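The ring check and ring breaking just described can be sketched as follows, assuming hypothetical lookup methods on the cluster and on the set of pending migration requests.

```python
def try_break_ring(b1, requests, cluster):
    """Called for a block b1 that failed the reliability requirement."""
    req1 = requests.get(b1)
    # Which block of the same group sits on b1's destination node? (b2)
    b2 = cluster.block_of_group_on_node(b1.file_id, b1.obj_id,
                                        req1.dest_dn_id)
    req2 = requests.get(b2) if b2 is not None else None
    if req2 is None or req2.dest_dn_id != b1.dn_id:
        return False  # no ring: wait for the next migration trigger
    # Redirect b1 to b2's destination and drop b2's request.
    req1.dest_dn_id = req2.dest_dn_id
    requests.delete(b2)
    # b1's new destination now equals its current node: nothing to move.
    if req1.dest_dn_id == b1.dn_id:
        requests.delete(b1)
    return True  # ring broken; both blocks are migration-complete
```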
Further, the migration of the ring-formed structure is illustrated in conjunction with fig. 6 and 7:
In a 4-node cluster with an erasure ratio of 4+2, the data blocks of node1 are first recovered onto the other three nodes; node2 then goes offline while node1 comes back online, and after that recovery a disk goes offline as well. Because recovery takes priority over migration, after multiple rounds of recovery the data blocks of the same file group that are waiting to migrate back form a ring. Taking the file 67444838135472 as an example, the changes of the 6 blocks of the same group (file-object-blockid) are shown in fig. 6 and fig. 7.
After multiple recoveries, 67444838135472-1-6 has been recovered onto disk3 of node2 and 67444838135472-1-3 onto disk1 of node1, and the recovery completion table contains: key "TBL_RECOVERD-1-1" with member "67444838135472-1-6", and key "TBL_RECOVERD-2-3" with member "67444838135472-1-3". Each of the two data blocks needs to migrate to the other's node, but the migration check fails the preset reliability requirement. Since a ring is formed, the to-be-migrated position of 67444838135472-1-6 is modified to the to-be-migrated position of 67444838135472-1-3, and the to-be-migrated information of 67444838135472-1-3 is deleted; the recovery completion table then contains key "TBL_RECOVERD-2-3" with member "67444838135472-1-6". At the next migration check, the to-be-migrated position is the same as the current position, so no migration is needed and the entry is deleted from the recovery completion table; the ring-forming data blocks are thus migration-complete.
After a migration completes, the management node receives the reported migration result and determines from the migration state in the result whether the migration succeeded. If it succeeded, the current position of the data block is updated and its data information is deleted from the migration table; otherwise the migration must be performed again.
After the data blocks corresponding to all the data information in the recovery completion table have been migrated, that is, when the recovery completion table is empty, the recovery migration operation ends.
As noted above, in this embodiment the migration operation further includes a capacity expansion migration when there is a newly added storage node or disk in the distributed cluster. As shown in fig. 8, a recovery module performs the recovery processing on data blocks (block), a recovery migration module implements the recovery migration processing, and a capacity expansion migration module implements the capacity expansion migration processing.
When a storage node or disk is newly added, it must first be determined whether the nodes in the cluster (or the disks within a node) are balanced; if they are not, data blocks need to be migrated.
In this embodiment, as described above, the migration table stores the data information of data blocks waiting to migrate, and the number of entries in the migration table can be scanned periodically. If the number of entries is greater than a preset threshold, many data blocks are still waiting to migrate, so capacity expansion migration is temporarily not triggered and the next scan is awaited. If the number of entries is less than or equal to the preset threshold, it is judged whether the nodes (or the disks) are balanced, and capacity expansion migration is triggered if they are not.
Referring to fig. 9, in this embodiment, when detecting whether storage nodes are balanced, before triggering a migration operation, the detection of whether a cluster is balanced may be performed in the following manner:
s301, acquiring the used capacity of each storage node currently existing in the distributed cluster.
S302, judging whether the capacity expansion migration between the storage nodes is carried out or not according to the number of the storage nodes which currently exist, the used capacity of each storage node which currently exists and a preset storage node capacity difference threshold value, and triggering the migration operation if the capacity expansion migration between the storage nodes is judged to be carried out.
In this embodiment, the storage nodes currently existing in the distributed cluster include an existing storage node and a newly added storage node in the cluster.
In this embodiment, except for the newly added storage node, the used capacity of each storage node in the existing storage nodes may be the same or different. The total used capacity of the storage nodes of all the storage nodes in the distributed cluster can be calculated, that is, the used capacities of all the storage nodes are added. And calculating the average used capacity of the storage nodes according to the total used capacity of the storage nodes of all the storage nodes in the distributed cluster and the number of the currently existing storage nodes. And then, for any storage node, judging whether to perform capacity expansion migration between the storage nodes based on the difference between the used capacity of the storage node and the average used capacity of the storage node.
Specifically, when the absolute value of the difference between the used capacity of any storage node and the average used capacity of the storage node is greater than the preset storage node capacity difference threshold, it may be determined that the cluster has an imbalance and needs to be subjected to capacity expansion migration.
In this embodiment, it may be determined that capacity expansion migration needs to be performed when the following formula is satisfied:
Labs(nodeUsedCap - (nodeTotalUsedCap / nodeNum)) > nodeUsedCapThread
Here, Labs denotes the absolute value, nodeUsedCap is the used capacity of any storage node, nodeTotalUsedCap is the total used capacity of all storage nodes, nodeNum is the number of storage nodes, and nodeUsedCapThread is the preset storage node capacity difference threshold.
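The node-balance test is easy to express directly; the following self-contained sketch applies the formula above to a list of per-node used capacities (the threshold value is a hypothetical example).

```python
def needs_expansion_migration(used_caps, threshold):
    """used_caps: used capacity of each current storage node, in GB."""
    avg = sum(used_caps) / len(used_caps)  # nodeTotalUsedCap / nodeNum
    # Labs(nodeUsedCap - avg) > nodeUsedCapThread for any node?
    return any(abs(cap - avg) > threshold for cap in used_caps)

# Example matching the expansion scenario described later: three 2500GB
# nodes plus one empty new node, with a 500GB threshold.
print(needs_expansion_migration([2500, 2500, 2500, 0], 500))  # True (625 > 500)
```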
In addition, if a new disk is added under a storage node, it can similarly be determined whether that node is unbalanced. Referring to fig. 10, in this embodiment, whether to perform capacity expansion migration between the disks under a node can be determined in advance as follows:
s401, aiming at the storage nodes with the newly added disks in the distributed cluster, the used capacity of each disk under the storage nodes is obtained.
S402, judging whether the capacity expansion migration between the disks is carried out or not according to the number of the disks under the storage node, the used capacity of each disk and a preset disk capacity difference threshold, and triggering the migration operation if the capacity expansion migration between the disks is judged to be carried out.
In this embodiment, when a new disk exists under the storage node, whether to perform migration may be determined based on the used condition of the current disk. The total used capacity of the disks of all the disks can be calculated, and the average used capacity of the disks can be calculated according to the total number of the disks. For any disk, the difference between the used capacity of the disk and the average used capacity of the disk can be calculated, and whether migration is performed or not is judged according to the magnitude relation between the difference and a preset disk capacity difference threshold value. Specifically, when the absolute value of the difference is greater than the disk capacity difference threshold, it may be determined that migration is required.
In this embodiment, whether to perform migration between disks may be determined by the following formula:
Labs(diskUsedCap - (nodeUsedCap / diskNum)) > diskUsedCapThread
the distUsedCap indicates the used capacity of any disk, the nodeUsedCap is the used capacity of a node (namely the total used capacity of the disk), the diskNum is the number of disks in a storage node, and the distUsedCapThread is a preset threshold value of the capacity difference between the disks.
In this embodiment, if the cluster load or the node load is determined to be balanced, the next scan is performed after a certain time. If data migration between nodes or between disks is needed, migration operations are executed until the cluster or node load is balanced. Referring to fig. 11, when a migration operation is performed, the data block to be migrated can be determined as follows:
s2011, comparing the used capacity of each storage node currently existing in the distributed cluster.
S2012 takes the storage node with the highest used capacity as the migration storage node, and takes the storage node with the lowest used capacity as the destination storage node.
And S2013, extracting the data block from the migrated storage node as the data block to be migrated.
In this embodiment, when the load in the cluster is unbalanced, the migrated storage node is determined from the ranking of each storage node's used capacity, and the migrated disk is determined in the same way; the destination storage node and destination disk are determined likewise. The migration principle is to migrate from the node or disk with the highest used capacity to the node or disk with the lowest used capacity.
The data blocks to be migrated are therefore extracted from the migrated storage node or migrated disk. A specified number of data blocks are extracted, and the data information of those that meet the reliability requirement is stored in the migration table to wait for migration. After a successful migration, the data information of the data block is deleted from the migration table. If a migration fails, the data information is likewise deleted from the migration table, but the data block does not need to be returned to the recovery completion table; it simply waits for the next round of capacity expansion migration scanning.
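A sketch of selecting the migrated node/disk and extracting candidate blocks, per the highest-to-lowest principle above; the cluster API and batch size are hypothetical, and meets_reliability reuses the earlier sketch.

```python
BATCH = 100  # hypothetical number of blocks extracted per round

def pick_expansion_candidates(cluster, n, m):
    nodes = sorted(cluster.nodes, key=lambda node: node.used_cap)
    dest_node, src_node = nodes[0], nodes[-1]  # lowest vs highest usage
    src_disk = max(src_node.disks, key=lambda d: d.used_cap)
    # Only blocks meeting the reliability requirement enter the
    # migration table; the rest wait for the next expansion scan.
    return [b for b in src_disk.sample_blocks(BATCH)
            if meets_reliability(b, dest_node, pick_disk(dest_node),
                                 cluster, n, m)]
```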
The following illustrates an example of the capacity expansion migration process when a new storage node exists in the cluster:
In a cluster of three nodes, each node has 10 disks of 8T capacity and the erasure ratio is 4+2. Several files are created with a total capacity of 5000GB, so each node's used capacity is 2500GB and each disk's used capacity is 250GB. After file creation completes, one new node (node 4) with the same configuration is added to the cluster; after the expansion succeeds, node 4's used capacity is 0, which triggers capacity expansion migration, as shown in fig. 12.
During capacity expansion migration, the used capacity of each node and each disk may be obtained by polling, and the used capacity is sorted according to the size of the used capacity, with the sorting result shown in fig. 13.
It is then judged whether the capacity expansion migration condition is met. With the capacity difference threshold of the capacity expansion migration set to 500GB, the capacity difference is calculated: Labs(2500GB - (2500 × 3)GB / 4) = 625GB > 500GB, so the capacity expansion migration condition is met. According to the sorting result, DN1 (the node with the largest used capacity) and its most-used DISK1 are selected as the migrated DN and migrated disk, and DN4 is the destination node; 100 blocks on DN1-DISK1 that meet the migration reliability requirement are acquired, and their data information is put into the migration table TBL_MIGRATION to wait for migration.
When the number of entries in the migration table drops below 10, the used capacities of the DNs and disks are obtained again and the capacity expansion calculation is repeated. After nodes 1/2/3 each complete a group of 100 block migrations, each of them has migrated 64M × 100 = 6400M = 6.25GB to node 4, where 64M is the size of one data block. The used capacity of each node and disk at this point is shown in fig. 14.
The used capacity of each node and disk can be obtained by polling, and the used capacity is sorted according to the size of the used capacity, and the sorting result is shown in fig. 15.
The capacity difference is calculated again: Labs(2493.75GB - (2493.75 × 3 + 18.75)GB / 4) = 618.75GB > 500GB, so the capacity expansion migration condition is still met. Capacity expansion migration proceeds by the same steps and stops once the calculated capacity difference is no longer greater than 500GB.
The data migration and storage method provided by this embodiment migrates recovered data blocks back to their original locations after the cluster returns to normal, ensuring the reliability of cluster data and the balance of cluster capacity. For data blocks that form a ring structure and cannot satisfy the preset reliability requirement to be migrated back, the ring is broken by directly setting the migration-complete state, so every data block can finish migration normally. In addition, when new disks and nodes are added to a cluster whose capacity usage is unbalanced, capacity expansion migration equalizes the capacity of every node and disk without affecting normal use of the cluster, ensuring cluster load balance.
Referring to fig. 16, a diagram illustrating exemplary components of a management node according to an embodiment of the present application is provided. The management node may include a storage medium 110, a processor 120, a data migration storage 130, and a communication interface 140. In this embodiment, the storage medium 110 and the processor 120 are both located in the management node and are separately disposed. However, it should be understood that the storage medium 110 may be separate from the management node and may be accessed by the processor 120 through a bus interface. Alternatively, the storage medium 110 may be integrated into the processor 120, such as a cache and/or general purpose registers.
The data migration storage apparatus 130 may be understood as the management node itself, or as the processor 120 of the management node, or as a software functional module that is independent of the management node or the processor 120 and implements the data migration and storage method under the control of the management node.
As shown in fig. 17, the data migration storage apparatus 130 may include an obtaining module 131, a first detecting module 132, a migration module 133, a second detecting module 134, a marking module 135, and a waiting module 136. The functions of the functional modules of the data migration storage apparatus 130 are described in detail below.
An obtaining module 131, configured to obtain a data block to be migrated when a migration operation is triggered;
it is understood that the obtaining module 131 may be configured to perform the step S201, and as to the detailed implementation of the obtaining module 131, reference may be made to what is described above with respect to the step S201.
A first detecting module 132, configured to detect whether the data block to be migrated meets a preset reliability requirement according to the data blocks in a destination storage node and a destination disk of the data block to be migrated;
it is understood that the first detection module 132 can be used to perform the step S202, and for the detailed implementation of the first detection module 132, reference can be made to the above description regarding the step S202.
A migration module 133, configured to migrate the data block to be migrated to the destination storage node and the destination disk when the preset reliability requirement is met;
it is understood that the migration module 133 may be configured to perform the step S203, and for a detailed implementation of the migration module 133, reference may be made to the content related to the step S203.
A second detecting module 134, configured to detect whether the data block to be migrated forms a ring structure when the preset reliability requirement is not met, where the ring structure is a structure in which the current storage nodes of two data blocks to be migrated are each other's destination storage nodes (a sketch follows this list);
it is to be understood that the second detection module 134 can be used to perform the step S204, and for the detailed implementation of the second detection module 134, reference can be made to the content related to the step S204.
A marking module 135, configured to directly mark the two data blocks to be migrated as migration complete states when a ring structure is formed;
it is understood that the marking module 135 may be configured to perform the step S205, and as to the detailed implementation of the marking module 135, reference may be made to what is described above with respect to the step S205.
And a waiting module 136, configured to wait for triggering of a next migration operation when the ring structure is not formed.
It is understood that the wait module 136 can be used to execute the step S206, and the detailed implementation of the wait module 136 can refer to the content related to the step S206.
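As a rough sketch of how the second detection module 134 and the marking module 135 might cooperate (a Python illustration; the PendingBlock structure and its field names are assumptions, not the patent's data model):

from dataclasses import dataclass

@dataclass
class PendingBlock:
    block_id: str
    current_node: str   # node the block currently resides on
    dest_node: str      # node the block should migrate to
    state: str = "waiting"

def mark_ring_blocks(pending):
    # If two blocks' current nodes are each other's destination nodes,
    # migrating both would merely swap them, so both are marked
    # migration-complete without actually moving any data.
    for i, a in enumerate(pending):
        for b in pending[i + 1:]:
            if (a.state == b.state == "waiting"
                    and a.current_node == b.dest_node
                    and b.current_node == a.dest_node):
                a.state = b.state = "migration_complete"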
In a possible implementation manner, the migration operation includes a recovery migration operation performed after a storage node or disk goes offline and then comes back online, and the data migration storage apparatus 130 further includes a recovery module, where the recovery module may be configured to:
when it is determined that the storage node or the disk is offline, acquiring data information of the data blocks on the offline storage node or disk and storing the data information into a recovery table;
scanning the data information in the recovery table, and detecting whether another data block of the data to which the data block corresponding to the scanned data information belongs is currently in a recovering state;
if so, skipping the data information and scanning the data information of the next data block;
if not, sending the data block corresponding to the data information to a destination storage node or a destination disk determined from the other online storage nodes for recovery.
In a possible implementation, the recovery module may be configured to:
determining that the storage node or the disk is offline when the storage node or the disk is monitored to be offline and the offline duration exceeds a preset duration.
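A minimal Python sketch of this recovery flow, assuming a 300-second stand-in for the preset duration and two callback helpers (is_sibling_recovering and recover_block) that stand in for the cluster's metadata queries:

import time

OFFLINE_TIMEOUT_S = 300          # assumed stand-in for the preset duration
recovery_table = []              # stands in for the recovery table

def on_offline(offline_since, blocks):
    # Record the blocks' data information only once the node/disk has
    # been offline longer than the preset duration.
    if time.time() - offline_since > OFFLINE_TIMEOUT_S:
        recovery_table.extend(blocks)

def scan_recovery_table(is_sibling_recovering, recover_block):
    for info in recovery_table:
        if is_sibling_recovering(info["data_id"]):
            continue              # another block of the same data is recovering; skip it
        recover_block(info)       # destination chosen among the online nodes/disks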
In a possible implementation manner, the migration operation includes a recovery migration operation performed after a storage node or disk goes offline and then comes back online, and the first detection module 132 may be configured to:
detecting whether another data block of the data to which the data block to be migrated belongs is currently in a waiting-migration state, and if not, detecting whether a data block of the same data exists in the destination disk;
if not, detecting whether the number of data blocks of the same data as the data block to be migrated in the destination storage node is smaller than a preset maximum allowable number, and if so, judging that the data block to be migrated meets the preset reliability requirement.
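These three checks could be collapsed into a single predicate; the helper callables and the data_id field below are assumptions standing in for the cluster's metadata service:

def meets_reliability(block, dest_node, dest_disk,
                      waiting_same_data, blocks_on_disk, blocks_on_node,
                      max_allowed):
    # 1. Another block of the same data is still waiting to migrate.
    if waiting_same_data(block.data_id):
        return False
    # 2. The destination disk already holds a block of the same data.
    if any(b.data_id == block.data_id for b in blocks_on_disk(dest_disk)):
        return False
    # 3. The destination node must stay below the maximum allowable
    #    number of blocks of the same data.
    same = sum(1 for b in blocks_on_node(dest_node) if b.data_id == block.data_id)
    return same < max_allowed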
In a possible implementation manner, the migration operation includes a capacity expansion migration performed when there is a newly added storage node in the distributed cluster, and the data migration storage apparatus 130 may further include a determining module, where the determining module may be configured to:
acquiring the used capacity of each storage node currently existing in the distributed cluster;
judging whether to perform capacity expansion migration between the storage nodes according to the number of the currently existing storage nodes, the used capacity of each currently existing storage node, and a preset storage node capacity difference threshold, and triggering the migration operation if it is judged that the capacity expansion migration between the storage nodes is to be performed.
In a possible implementation manner, the determining module may be configured to:
calculating the average used capacity of the storage nodes according to the total used capacity of all the storage nodes in the distributed cluster in combination with the number of the currently existing storage nodes;
for any storage node, judging whether to perform capacity expansion migration between the storage nodes based on the difference between the used capacity of the storage node and the average used capacity of the storage nodes.
In a possible implementation, the obtaining module 131 may be configured to:
comparing the used capacity of each storage node currently existing in the distributed cluster;
taking the storage node with the highest used capacity as the migrated-from storage node, and taking the storage node with the lowest used capacity as the destination storage node;
extracting the data block from the migrated-from storage node as the data block to be migrated.
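A compact sketch of this selection rule (names assumed):

def pick_migration_pair(used_gb):
    # Source: node with the highest used capacity;
    # destination: node with the lowest used capacity.
    src = max(used_gb, key=used_gb.get)
    dst = min(used_gb, key=used_gb.get)
    return src, dst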
In a possible implementation manner, the migration operation includes a capacity expansion migration performed when there is a newly added disk in the distributed cluster, and the determining module may be further configured to:
for a storage node with a newly added disk in the distributed cluster, acquiring the used capacity of each disk under the storage node;
judging whether to perform capacity expansion migration between the disks according to the number of disks under the storage node, the used capacity of each disk, and a preset disk capacity difference threshold, and triggering the migration operation if it is judged that the capacity expansion migration between the disks is to be performed.
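The disk-level decision mirrors the node-level one, applied to the disks under a single storage node; in the sketch below the 100GB threshold is illustrative, not taken from the patent:

DISK_DIFF_THRESHOLD_GB = 100.0   # assumed per-disk capacity difference threshold

def disks_need_migration(disk_used_gb):
    # Same averaging rule as the node-level check, restricted to one node's disks.
    avg = sum(disk_used_gb.values()) / len(disk_used_gb)
    return any(abs(u - avg) > DISK_DIFF_THRESHOLD_GB for u in disk_used_gb.values())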
For the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the related description in the method embodiments above; details are not repeated here.
To sum up, according to the data migration and storage method and apparatus and the management node provided by the embodiments of the present application, a data block to be migrated is acquired when a migration operation is triggered, and when it is determined, based on the data blocks in the destination storage node and destination disk of the data block to be migrated, that the preset reliability requirement is met, the data block is migrated to the destination storage node and destination disk. When the preset reliability requirement is not met, whether the data block to be migrated forms a ring structure is detected; if so, the data block is directly marked as migration complete, and if not, the triggering of the next migration operation is awaited. In this scheme, judging whether the data block to be migrated meets the preset reliability requirement based on the data blocks in the destination storage node and destination disk avoids the data loss that could otherwise be caused by anomalies following an ill-advised migration. Moreover, for data blocks to be migrated that form the special ring structure, directly marking migration as successful avoids unnecessary migration work on the one hand and guarantees reasonable migration of each data block on the other.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A data migration and storage method, applied to a management node in a distributed cluster, wherein the distributed cluster further includes a plurality of storage nodes connected to the management node, each storage node includes a plurality of disks, each disk is used for storing data, and each piece of data is divided into a plurality of data blocks, and the method includes:
when the migration operation is triggered, acquiring a data block to be migrated;
detecting whether the data block to be migrated meets a preset reliability requirement according to the data blocks in a target storage node and a target disk of the data block to be migrated, and migrating the data block to be migrated to the target storage node and the target disk if the preset reliability requirement is met;
if the preset reliability requirement is not met, detecting whether the data block to be migrated forms a ring structure, wherein the ring structure is a structure in which the respective current storage nodes of two data blocks to be migrated are each other's destination storage nodes;
if a ring structure is formed, directly marking the two data blocks to be migrated as a migration completion state;
if the ring structure is not formed, waiting for the triggering of the next migration operation.
2. The data migration and storage method according to claim 1, wherein the migration operation includes a recovery migration operation performed after a storage node or disk goes offline and then comes back online, the method further includes a step of recovering data blocks when the storage node or the disk is offline, and the step includes:
when it is determined that the storage node or the disk is offline, acquiring data information of the data blocks on the offline storage node or disk and storing the data information into a recovery table;
scanning the data information in the recovery table, and detecting whether another data block of the data to which the data block corresponding to the scanned data information belongs is currently in a recovering state;
if so, skipping the data information and scanning the data information of the next data block;
if not, sending the data block corresponding to the data information to a target storage node or a target disk determined from the other online storage nodes for recovery.
3. The data migration and storage method according to claim 2, wherein the step of determining whether the storage node or the disk is offline includes:
determining that the storage node or the disk is offline when the storage node or the disk is monitored to be offline and the offline duration exceeds a preset duration.
4. The data migration and storage method according to claim 1, wherein the migration operation includes a recovery migration operation performed after a storage node or disk goes offline and then comes back online, and the step of detecting whether the data block to be migrated meets a preset reliability requirement according to the data blocks in a target storage node and a target disk of the data block to be migrated includes:
detecting whether another data block of the data to which the data block to be migrated belongs is currently in a waiting-migration state, and if not, detecting whether a data block of the same data exists in the target disk;
if not, detecting whether the number of data blocks of the same data as the data block to be migrated in the target storage node is smaller than a preset maximum allowable number, and if so, judging that the data block to be migrated meets the preset reliability requirement.
5. The data migration and storage method according to claim 1, wherein the migration operation includes a capacity expansion migration performed when there is a newly added storage node in the distributed cluster, and before the step of triggering the migration operation, the method further includes:
acquiring the used capacity of each storage node currently existing in the distributed cluster;
judging whether to perform capacity expansion migration between the storage nodes according to the number of the currently existing storage nodes, the used capacity of each currently existing storage node, and a preset storage node capacity difference threshold, and triggering the migration operation if it is judged that the capacity expansion migration between the storage nodes is to be performed.
6. The data migration and storage method according to claim 5, wherein the step of judging whether to perform the capacity expansion migration between the storage nodes according to the number of the currently existing storage nodes, the used capacity of each currently existing storage node, and a preset storage node capacity difference threshold includes:
calculating the average used capacity of the storage nodes according to the total used capacity of all the storage nodes in the distributed cluster in combination with the number of the currently existing storage nodes;
for any storage node, judging whether to perform the capacity expansion migration between the storage nodes based on the difference between the used capacity of the storage node and the average used capacity of the storage nodes.
7. The data migration and storage method according to claim 5, wherein the step of obtaining the data block to be migrated includes:
comparing the used capacity of each storage node currently existing in the distributed cluster;
taking the storage node with the highest used capacity as the migrated-from storage node, and taking the storage node with the lowest used capacity as the target storage node;
extracting the data block from the migrated-from storage node as the data block to be migrated.
8. The data migration and storage method according to claim 1, wherein the migration operation includes a capacity expansion migration performed when there is a newly added disk in the distributed cluster, and before the step of triggering the migration operation, the method further includes:
for a storage node with a newly added disk in the distributed cluster, acquiring the used capacity of each disk under the storage node;
judging whether to perform capacity expansion migration between the disks according to the number of disks under the storage node, the used capacity of each disk, and a preset disk capacity difference threshold, and triggering the migration operation if it is judged that the capacity expansion migration between the disks is to be performed.
9. A data migration storage apparatus, applied to a management node in a distributed cluster, wherein the distributed cluster further includes a plurality of storage nodes connected to the management node, each storage node includes a plurality of disks, each disk is used for storing data, and each piece of data is divided into a plurality of data blocks, and the apparatus includes:
the acquisition module is used for acquiring the data block to be migrated when the migration operation is triggered;
the first detection module is used for detecting whether the data block to be migrated meets a preset reliability requirement according to the data blocks in a destination storage node and a destination disk of the data block to be migrated;
the migration module is used for migrating the data block to be migrated to the destination storage node and the destination disk when the preset reliability requirement is met;
the second detection module is used for detecting whether the data block to be migrated forms a ring structure when the preset reliability requirement is not met, wherein the ring structure is a structure in which the respective current storage nodes of two data blocks to be migrated are each other's destination storage nodes;
the marking module is used for directly marking the two data blocks to be migrated as migration completion states when a ring structure is formed;
and the waiting module is used for waiting for triggering the next migration operation when a ring structure is not formed.
10. A management node, comprising one or more storage media and one or more processors in communication with the storage media, the one or more storage media storing machine-executable instructions executable by the processors, wherein when the management node runs, the processors execute the machine-executable instructions to perform the method steps of any one of claims 1-8.
CN202210475723.1A 2022-04-29 2022-04-29 Data migration and storage method and device and management node Pending CN114895849A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210475723.1A CN114895849A (en) 2022-04-29 2022-04-29 Data migration and storage method and device and management node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210475723.1A CN114895849A (en) 2022-04-29 2022-04-29 Data migration and storage method and device and management node

Publications (1)

Publication Number Publication Date
CN114895849A true CN114895849A (en) 2022-08-12

Family

ID=82719458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210475723.1A Pending CN114895849A (en) 2022-04-29 2022-04-29 Data migration and storage method and device and management node

Country Status (1)

Country Link
CN (1) CN114895849A (en)

Similar Documents

Publication Publication Date Title
US7007144B2 (en) Method, apparatus, and computer readable medium for managing back-up
JP4648447B2 (en) Failure recovery method, program, and management server
US7778984B2 (en) System and method for a distributed object store
US20100162035A1 (en) Multipurpose Storage System Based Upon a Distributed Hashing Mechanism with Transactional Support and Failover Capability
US9189303B2 (en) Shadow queues for recovery of messages
EP1387269A1 (en) Backup system and method of generating a checkpoint for a database
CN109325016B (en) Data migration method, device, medium and electronic equipment
JP4748950B2 (en) Storage area management method and system
CN112596960A (en) Distributed storage service switching method and device
US8732356B2 (en) Storage system, and access path state update method
US10445295B1 (en) Task-based framework for synchronization of event handling between nodes in an active/active data storage system
US9330153B2 (en) System, method, and computer readable medium that coordinates between devices using exchange of log files
US10133757B2 (en) Method for managing data using in-memory database and apparatus thereof
JP4813975B2 (en) Method of changing configuration of non-shared database system, management server, and non-shared database system
CN110377664B (en) Data synchronization method, device, server and storage medium
US9684668B1 (en) Systems and methods for performing lookups on distributed deduplicated data systems
JPH11259326A (en) Hot standby system, automatic re-execution method for the same and storage medium therefor
JP4668556B2 (en) Task management system
CN114895849A (en) Data migration and storage method and device and management node
US8203937B2 (en) Global detection of resource leaks in a multi-node computer system
CN109254880B (en) Method and device for processing database downtime
CN115599295A (en) Node capacity expansion method and device of storage system
US8615769B2 (en) Data processing system, data processing method, and data processing program
CN111488124A (en) Data updating method and device, electronic equipment and storage medium
JP5601587B2 (en) Process restart device, process restart method, and process restart program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination