CN114138197A

CN114138197A - Online cross-pool data migration method and electronic equipment

Info

Publication number: CN114138197A
Application number: CN202111426913.6A
Authority: CN
Inventors: 刘爱贵; 董冠军; 阮薛平
Original assignee: Beijing Dadao Yunxing Technology Co ltd
Current assignee: Beijing Dadao Yunxing Technology Co ltd
Priority date: 2021-11-28
Filing date: 2021-11-28
Publication date: 2022-03-04
Anticipated expiration: 2041-11-28
Also published as: CN114138197B

Abstract

The invention relates to the technical field of data migration, and in particular to an online cross-pool data migration method and electronic equipment. By adopting a globally unique volume ID and using reference relationships to manage both the relationship between volumes and storage pools and the export of volumes, the invention achieves online data migration of volumes and keeps the underlying data blocks freely movable between pools. The mapping table is maintained by a distributed cache module, which provides strong consistency and good access performance. The performance and controllability of data migration are improved by executing the migration in parallel through a distributed job module.

Description

Online cross-pool data migration method and electronic equipment

Technical Field

The invention relates to the technical field of data migration, in particular to an online cross-pool data migration method and electronic equipment.

Background

In a distributed block storage system, a cluster is divided into multiple physically isolated storage pools, each containing disks on several different nodes. All data blocks of a volume are located in one storage pool. As shown in FIG. 1, the storage pool spans several nodes, each with its own disks, and the two copies of each data block C1, C2, C3 of a two-copy volume are placed on different disks of different nodes. The placement of the copies of a data block must satisfy the fault domain rule, i.e. different copies of the same block cannot fall into the same fault domain (a node-level fault domain in this example).
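The fault domain rule can be expressed as a small check. The following is a minimal sketch only: the `Replica` type, the node and disk names, and the function name are illustrative assumptions, not part of the described system.

```python
from typing import NamedTuple


class Replica(NamedTuple):
    """One copy of a data block (hypothetical type for illustration)."""
    node: str  # the fault domain; node level in this example
    disk: str


def satisfies_fault_domain_rule(replicas):
    """Return True when no two copies of one data block share a node."""
    nodes = [r.node for r in replicas]
    return len(nodes) == len(set(nodes))


# Two copies of block C1 on different nodes: allowed.
c1_ok = [Replica("node1", "disk1"), Replica("node2", "disk3")]
# Two copies on different disks of the same node: violates the rule.
c1_bad = [Replica("node1", "disk1"), Replica("node1", "disk2")]
```

Placing copies on different disks is not enough by itself; with node-level fault domains the nodes must differ as well.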

If a cluster contains multiple storage pools, it is desirable to map a volume dynamically to different storage pools, for example to migrate data between a high-performance storage pool and a high-capacity storage pool. Data migration comes in two modes: offline migration and online migration. Offline migration requires disconnecting all client connections to a volume and re-establishing them after the migration is complete. Online migration achieves cross-pool migration of volumes without affecting client IO.

The prior art has the following defects. First, the volume ID contains storage pool information. Although this simplifies obtaining the storage pool ID, it couples the volume to its storage pool, so the relationship is hard to modify dynamically and online cross-pool data migration is difficult.

Second, the tree structure of storage pools and volumes is coupled with volume export, so that all volumes under a storage pool can only be exported by the same Target, and the export policy cannot be flexibly customized to export volumes from different storage pools.

Disclosure of Invention

Aiming at the defects of the prior art, the invention discloses an online cross-pool data migration method and electronic equipment, which are used for solving the problems.

The invention is realized by the following technical scheme:

In a first aspect, the present invention provides an online cross-pool data migration method, in which a central controller is responsible for generating monotonically increasing natural numbers as volume IDs, and a tree structure on a global configuration server manages the relationship between storage pools and volumes. When a volume is migrated between storage pools, it is unmapped from the source storage pool and remapped into the target storage pool; volumes from different storage pools can then be exported through the same Target, and the distributed data migration is completed online and asynchronously.

Further, in the method, when a volume is created, an initial pool_id attribute is recorded in the metadata of the volume.

Furthermore, in the method, when the Client calls the lookup(volume_id) interface on the local Range Controller to query the storage pool in which the volume is located, a caching scheme is adopted: the Master is not queried directly; instead, the cache on the Range Controller is queried, and the latest value is loaded from the Master only on a cache miss.

Furthermore, in the method, to update the volume_id:pool_id mapping through the Master, the Client initiates an RPC call to the Master. After receiving the message, the Master notifies all Range Controllers through a broadcast event that the cache entry is invalid, and then returns an RPC success message to the Client.

Further, in the method, data blocks are allocated on demand during the IO process, and the allocation process depends on pool_id.

Further, in the method, the corresponding storage pool information is specified when a data block is allocated:

allocate := diskmap(pool_id, REPNUM)

where REPNUM is the number of copies requested. Because the data block ID itself does not carry pool_id information, the volume_id:pool_id mapping is used to obtain it.

Further, in the method, for the export of and access to volumes, export management establishes another reference relationship with the volume, and this relationship can be dynamically established and removed.

Furthermore, in the method, the asynchronously completed distributed data migration is executed concurrently as distributed jobs, using a master/worker distributed job architecture.

Furthermore, in the method, a master node periodically polls a job queue and initiates jobs on all nodes as required, worker threads on the worker nodes execute the specific tasks, and a QoS policy is applied to the jobs.

In a second aspect, the present invention provides an electronic device, including a processor and a memory storing execution instructions, where when the processor executes the execution instructions stored in the memory, the processor executes the online cross-pool data migration method according to the first aspect.

The invention has the beneficial effects that:

By adopting a globally unique volume ID and using reference relationships to manage both the relationship between volumes and storage pools and the export of volumes, the invention achieves online data migration of volumes and keeps the underlying data blocks freely movable between pools without interrupting client IO. The volume_id to pool_id mapping table is maintained by a distributed cache module, which provides strong consistency and good access performance. The performance and controllability of data migration are improved by executing the migration in parallel through a distributed job module.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a diagram illustrating a distribution of data block copies in a storage pool according to the background art of the present invention;

FIG. 2 is a diagram illustrating an organization of volumes within a cluster, in accordance with an embodiment of the present invention;

FIG. 3 is a diagram of a volume_id to pool_id mapping used by an embodiment of the present invention;

FIG. 4 is a diagram illustrating an embodiment of the present invention for updating the volume_id to pool_id mapping;

FIG. 5 is a diagram of an IO process under the iSCSI protocol of an embodiment of the invention;

FIG. 6 is a diagram of a distributed job processing framework according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating a migration process of data blocks according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

This embodiment provides an online cross-pool data migration method, in which a central controller is responsible for generating monotonically increasing natural numbers as volume IDs, and a tree structure on a global configuration server manages the relationship between storage pools and volumes. When a volume is migrated between storage pools, its mapping is removed from the source storage pool and the volume is remapped into the target storage pool; volumes from different storage pools can then be exported by the same Target, and the distributed data migration is completed online and asynchronously.

Referring to FIG. 2, this embodiment provides an organization of volumes in a cluster, where storage pool A includes volume 1 and volume 2, and storage pool B includes volume 3 and volume 4. Volume 1 and volume 3 are exported by Target A, and volume 2 and volume 4 are exported by Target B. It can be seen that migrating a volume between storage pools does not change the volume itself; the basic process is to unmap the volume from the source storage pool and remap it to the target storage pool. Meanwhile, volumes from different storage pools can be exported by the same Target.

This embodiment adopts a globally unique volume ID that no longer depends on storage pool information, and uses two reference relationships: one to manage the relationship between storage pools and volumes, and one to manage the export of volumes.

Example 2

In a specific implementation level, this embodiment provides a specific application of the online cross-pool data migration method.

This embodiment provides a 1:M dynamic mapping relationship between storage pools and volumes. The volume ID generation rule is that the central controller is responsible for generating monotonically increasing natural numbers.
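As a rough illustration of this generation rule, the sketch below hands out monotonically increasing natural numbers from a single in-process counter. The class name `VolumeIdAllocator`, the starting value of 1, and the use of a lock are assumptions made for the example; a real central controller would also need to persist the counter.

```python
import itertools
import threading


class VolumeIdAllocator:
    """Hypothetical stand-in for the central controller's ID generator."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self):
        # The lock makes concurrent callers see distinct, increasing IDs.
        with self._lock:
            return next(self._counter)


allocator = VolumeIdAllocator()
first_ids = [allocator.next_id() for _ in range(3)]
```

Because the ID is just a number, it says nothing about which storage pool the volume lives in, which is what makes later remapping possible.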

On the global configuration server, the relationship between storage pools and volumes is managed by a tree structure; the key point is that the volume ID must not contain storage pool information. In this way, remapping at the metadata level can be accomplished by a simple rename operation. The operation that remaps a volume to another storage pool is: remap(/pool1/volume1, /pool2/volume1).
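A minimal sketch of such a rename-style remap is shown below, using an in-memory dictionary as a toy stand-in for the global configuration server. The `ConfigTree` class and all of its method names are hypothetical; only the `remap(/pool1/volume1, /pool2/volume1)` operation comes from the description.

```python
class ConfigTree:
    """Toy configuration tree: pool name -> {volume name -> volume_id}."""

    def __init__(self):
        self._pools = {}

    def create_pool(self, pool):
        self._pools.setdefault(pool, {})

    def create_volume(self, pool, name, volume_id):
        self._pools[pool][name] = volume_id

    @staticmethod
    def _split(path):
        parts = path.strip("/").split("/")
        if len(parts) != 2:
            raise ValueError(f"expected /pool/volume, got {path!r}")
        return parts[0], parts[1]

    def remap(self, src_path, dst_path):
        """Move a volume entry between pools; the volume_id is unchanged."""
        src_pool, src_name = self._split(src_path)
        dst_pool, dst_name = self._split(dst_path)
        # Validate the destination before touching the source entry.
        if dst_name in self._pools[dst_pool]:
            raise ValueError(f"{dst_path} already exists")
        volume_id = self._pools[src_pool].pop(src_name)
        self._pools[dst_pool][dst_name] = volume_id

    def pool_of(self, volume_id):
        for pool, volumes in self._pools.items():
            if volume_id in volumes.values():
                return pool
        return None


tree = ConfigTree()
tree.create_pool("pool1")
tree.create_pool("pool2")
tree.create_volume("pool1", "volume1", 1)
tree.remap("/pool1/volume1", "/pool2/volume1")
```

After the call, volume ID 1 is reachable under pool2 and the ID itself is untouched, so anything that refers to the volume by ID keeps working.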

This embodiment manages the volume_id to pool_id mapping. When a volume is created, an initial pool_id attribute is recorded in the metadata of the volume. The volume_id to pool_id mapping can, however, be adjusted dynamically, and each controller must be able to perceive an update in real time (the update must be atomic) so as to obtain the latest value. The update process must therefore be handled reliably.

As shown in FIG. 3, the Client calls the interface lookup(volume_id) on the local Range Controller to query the storage pool in which the volume is located. To reduce the load on the Master, a caching scheme is adopted: the Master is not queried directly; instead, the cache in the Range Controller is queried, and the latest value is loaded from the Master only on a cache miss.
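The lookup path can be sketched as a read-through cache. In the sketch below, `Master.load`, the `queries` counter, and the in-memory dictionaries are assumptions for illustration; only the names Master, Range Controller, and lookup(volume_id) come from the description.

```python
class Master:
    """Holds the authoritative volume_id -> pool_id mapping (toy version)."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)
        self.queries = 0  # counts how often the Master is actually asked

    def load(self, volume_id):
        self.queries += 1
        return self._mapping[volume_id]


class RangeController:
    """Answers lookups from a local cache, falling back to the Master."""

    def __init__(self, master):
        self._master = master
        self._cache = {}

    def lookup(self, volume_id):
        if volume_id not in self._cache:
            # Cache miss: load the latest value from the Master.
            self._cache[volume_id] = self._master.load(volume_id)
        return self._cache[volume_id]


master = Master({1: "poolA", 3: "poolB"})
controller = RangeController(master)
pool_first = controller.lookup(1)   # miss, goes to the Master
pool_second = controller.lookup(1)  # hit, served locally
```

Only the first lookup of a given volume reaches the Master; repeated lookups are served from the Range Controller.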

In addition, once the volume_id:pool_id mapping has been updated on the Master, the cached entry in each Range Controller becomes stale, so the problem of distributed cache consistency must be solved. As shown in FIG. 4, the Client initiates an RPC call to the Master. After receiving the message, the Master notifies all Range Controllers through a broadcast event that the cache entry is invalid, and then returns an RPC success message to the Client.
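The update flow can be sketched as follows. This is a single-process illustration under stated assumptions: the broadcast is modeled as a plain loop over controller objects, the method names `update_mapping` and `on_invalidate` are hypothetical, and the choice to write the new mapping before broadcasting is an assumption (the description does not fix the order of those two steps).

```python
class RangeController:
    def __init__(self):
        self.cache = {}

    def on_invalidate(self, volume_id):
        # Drop the stale entry; the next lookup reloads it from the Master.
        self.cache.pop(volume_id, None)


class Master:
    def __init__(self, controllers):
        self.mapping = {}
        self.controllers = list(controllers)

    def update_mapping(self, volume_id, pool_id):
        """Handler for the Client's RPC (modeled as a direct call)."""
        self.mapping[volume_id] = pool_id
        # Broadcast the invalidation to every Range Controller first ...
        for controller in self.controllers:
            controller.on_invalidate(volume_id)
        # ... and only then report success to the Client.
        return "OK"


controllers = [RangeController() for _ in range(3)]
for rc in controllers:
    rc.cache[1] = "poolA"  # every controller holds the old value

master = Master(controllers)
reply = master.update_mapping(1, "poolB")
```

Returning success only after every controller has dropped the entry means a Client that sees the reply cannot subsequently read the old pool_id from any Range Controller cache.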

This embodiment provides an IO process. The IO process for a volume tracks copy location information through metadata, so it does not need pool_id. However, to support thin provisioning, data blocks are allocated on demand during IO, and the allocation process depends on pool_id. The IO process therefore shares the processing logic of the allocation process.

This embodiment provides an allocation process for data blocks. When a data block is allocated, the corresponding storage pool information must be specified:

allocate := diskmap(pool_id, REPNUM)

where REPNUM is the number of copies requested. Since the data block ID itself does not carry pool_id information, the volume_id:pool_id mapping must be used to obtain it.
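A minimal sketch of this allocation step is given below. The description names only diskmap(pool_id, REPNUM); the extra `pools` argument, the random selection policy, the `allocate` wrapper, and the sample topology are assumptions for illustration, with node-level fault domains taken from the background example.

```python
import random


def diskmap(pool_id, repnum, pools):
    """Pick `repnum` disks in one pool, each on a different node.

    `pools` maps pool_id -> {node name -> list of disk names}.
    """
    nodes = pools[pool_id]
    candidates = [node for node, disks in nodes.items() if disks]
    if len(candidates) < repnum:
        raise ValueError("not enough fault domains for the requested copies")
    chosen = random.sample(candidates, repnum)
    return [(node, random.choice(nodes[node])) for node in chosen]


def allocate(volume_id, repnum, volume_to_pool, pools):
    # The block ID carries no pool information, so resolve the pool
    # through the volume_id -> pool_id mapping first.
    pool_id = volume_to_pool[volume_id]
    return diskmap(pool_id, repnum, pools)


pools = {
    "poolA": {"node1": ["d1", "d2"], "node2": ["d3"], "node3": ["d4"]},
}
volume_to_pool = {7: "poolA"}
placement = allocate(7, 2, volume_to_pool, pools)
```

Changing the entry for volume 7 in `volume_to_pool` is enough to make newly allocated blocks land in another pool, without touching the volume ID or the block IDs.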

This embodiment provides for the export of and access to volumes. Client connections are established using the volume ID, which is globally unique and does not depend on storage pool information, so the movement of the underlying data blocks between storage pools does not affect established client connections. Like the reference relationship between storage pools and volumes, export management amounts to establishing another reference relationship with a volume, which can be dynamically established and removed.
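The export reference relationship can be sketched as a mapping from Target to a set of volume IDs, kept separately from the pool mapping. The `ExportManager` class and its method names are hypothetical; the Target and volume layout follows the FIG. 2 example.

```python
class ExportManager:
    """Target name -> set of exported volume IDs (toy version)."""

    def __init__(self):
        self._exports = {}

    def export(self, target, volume_id):
        self._exports.setdefault(target, set()).add(volume_id)

    def unexport(self, target, volume_id):
        self._exports.get(target, set()).discard(volume_id)

    def volumes(self, target):
        return sorted(self._exports.get(target, set()))


# Pool membership is tracked independently of exports.
volume_to_pool = {1: "poolA", 2: "poolA", 3: "poolB", 4: "poolB"}

exports = ExportManager()
exports.export("TargetA", 1)
exports.export("TargetA", 3)  # one Target exports volumes from two pools
exports.export("TargetB", 2)
exports.export("TargetB", 4)

before = exports.volumes("TargetA")
volume_to_pool[1] = "poolB"   # migrate volume 1 to another pool
after = exports.volumes("TargetA")
```

Since exports refer to volumes only by ID, remapping a volume to another pool leaves the export table, and hence what the client sees, unchanged.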

This embodiment provides an asynchronously completed distributed data migration process. To improve performance, the whole migration is executed concurrently as distributed jobs, using a master/worker distributed job framework. The master node periodically polls the job queue and initiates jobs on all nodes as required; worker threads on the worker nodes execute the specific tasks, and a QoS policy can be applied to the jobs.
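The master/worker pattern can be sketched in a single process with a thread pool. Everything here is an assumption for illustration: the job dictionary layout, the placeholder `migrate_block` function, a single polling pass instead of a periodic timer, and a cap on parallel workers standing in for a QoS policy.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def migrate_block(block_id, src_pool, dst_pool):
    # Placeholder task: a real worker would copy the block's data from
    # src_pool to dst_pool and then update the block's location metadata.
    return (block_id, dst_pool)


def run_master(job_queue, max_parallel):
    """Drain the job queue once, fanning each job's blocks out to workers.

    `max_parallel` limits how many blocks move at the same time, which is
    one simple way to bound the load that migration puts on the cluster.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_parallel) as workers:
        while True:
            try:
                job = job_queue.get_nowait()
            except queue.Empty:
                break
            futures = [
                workers.submit(migrate_block, block, job["src"], job["dst"])
                for block in job["blocks"]
            ]
            results.extend(f.result() for f in futures)
    return results


jobs = queue.Queue()
jobs.put({"src": "poolA", "dst": "poolB", "blocks": ["C1", "C2", "C3"]})
migrated = run_master(jobs, max_parallel=2)
```

In a real deployment the master would repeat the polling pass on a timer and the workers would run on separate nodes; the sketch only shows the division of labor between the two roles.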

Example 3

The embodiment provides an electronic device, which comprises a processor and a memory, wherein the memory stores execution instructions, and when the processor executes the execution instructions stored in the memory, the processor executes an online cross-pool data migration method.

In summary, the present invention adopts a globally unique volume ID and uses reference relationships to manage both the relationship between volumes and storage pools and the export of volumes, thereby achieving online data migration of volumes and keeping the underlying data blocks freely movable between pools without interrupting client IO. The volume_id to pool_id mapping table is maintained by a distributed cache module, which provides strong consistency and good access performance. The performance and controllability of data migration are improved by executing the migration in parallel through a distributed job module.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An online cross-pool data migration method, characterized in that a central controller is responsible for generating monotonically increasing natural numbers, the relationship between storage pools and volumes is managed on a global configuration server through a tree structure, and when a volume is migrated between different storage pools, the mapping of the volume is removed from the source storage pool and the volume is remapped into the target storage pool, so that volumes from different storage pools are finally exported through the same Target, and the distributed data migration is completed online and asynchronously.

2. The online cross-pool data migration method according to claim 1, wherein in the method, when a volume is created, an initial pool_id attribute is recorded in the metadata of the volume.

3. The online cross-pool data migration method according to claim 1, wherein in the method, when a Client calls the lookup(volume_id) interface on the local Range Controller to query the storage pool in which the volume is located, a caching scheme is adopted: instead of querying the Master directly, the cache in the Range Controller is queried, and the latest value is loaded from the Master on a cache miss.

4. The online cross-pool data migration method according to claim 1, wherein in the method, to update the volume_id:pool_id mapping through the Master, the Client initiates an RPC call to the Master; after receiving the message, the Master notifies all Range Controllers through a broadcast event that the cache entry is invalid, and then returns an RPC success message to the Client.

5. The online cross-pool data migration method according to claim 1, wherein in the method, data blocks are allocated on demand during the IO process, and the allocation process depends on pool_id.

6. The online cross-pool data migration method according to claim 5, wherein the corresponding storage pool information is specified when a data block is allocated:

allocate := diskmap(pool_id, REPNUM)

7. The online cross-pool data migration method according to claim 1, wherein for the export of and access to volumes, export management establishes another reference relationship with the volume, and the relationship is dynamically established and removed.

8. The online cross-pool data migration method according to claim 1, wherein in the method, the asynchronously completed distributed data migration is executed concurrently as distributed jobs, using a master/worker distributed job architecture.

9. The online cross-pool data migration method according to claim 8, wherein in the method, a master node periodically polls a job queue and initiates jobs on all nodes as required, worker threads on the worker nodes execute the specific tasks, and a QoS policy is applied to the jobs.

10. An electronic device comprising a processor and a memory storing execution instructions, the processor executing the online cross-pool data migration method of any of claims 1-9 when the processor executes the execution instructions stored by the memory.