CN107391033B

CN107391033B - Data migration method and device, computing equipment and computer storage medium

Info

Publication number: CN107391033B
Application number: CN201710555265.1A
Authority: CN
Inventors: 刘高辉; 李丹
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2017-06-30
Filing date: 2017-06-30
Publication date: 2020-07-07
Anticipated expiration: 2037-06-30
Also published as: CN107391033A

Abstract

The invention discloses a data migration method and device, a computing device and a computer storage medium. The method comprises: newly building a distributed data cluster system; deleting the second data node in the distributed data cluster system and adding the first data node to the distributed data cluster system; adding a second data node to the distributed data cluster system; storing a part of the data stored in the first data node into the second data node according to a data balancing strategy; and updating the metadata recorded in the metadata node. With this scheme, storage on the first data node alone can be expanded into storage across the distributed data cluster system, and the data stored in the first data node is migrated in full to the distributed data cluster system during the expansion, thereby improving the efficiency with which the database reads and writes data as the data volume grows.

Description

Data migration method and device, computing equipment and computer storage medium

Technical Field

The invention relates to the technical field of data storage, in particular to a data migration method and device, computing equipment and a computer storage medium.

Background

MongoDB is a non-relational database based on distributed file storage. It has two storage modes: storage in a replica set, and distributed storage. When the data volume is small, the data is stored in a replica set, that is, effectively on a single machine, since each member of the set holds a full copy of the data. Single-machine storage has a bottleneck, however: it is limited by the disk space and CPU of that machine. When the data volume grows, its read-write efficiency drops because it cannot provide more storage space. At that point the data needs to be stored in the distributed mode, that is, on multiple machines, so that the data originally stored on a single machine is shared among multiple machines.

The prior art provides no measure for migrating the data stored in a replica set in full to a distributed storage system so as to improve database read-write efficiency as the data volume grows.

Disclosure of Invention

In view of the above, the present invention has been made to provide a data migration method and apparatus, a computing device, a computer storage medium that overcome or at least partially solve the above problems.

According to an aspect of the present invention, a data migration method for migrating data stored in a first data node to a distributed data cluster system is provided, which includes:

newly building a distributed data cluster system, wherein the distributed data cluster system comprises at least one routing node, at least one metadata node and at least one second data node, and data stored in the second data node is empty;

deleting the second data node in the distributed data cluster system, and adding the first data node into the distributed data cluster system;

adding a second data node to the distributed data cluster system;

storing a part of data stored in the first data node into the second data node according to a data balancing strategy;

the metadata recorded in the metadata node is updated.

According to another aspect of the present invention, there is provided a data migration apparatus for migrating data stored in a first data node into a distributed data cluster system, including:

the new building module is suitable for newly building a distributed data cluster system, wherein the distributed data cluster system comprises at least one routing node, at least one metadata node and at least one second data node, and data stored in the second data node is empty;

the loading module is suitable for deleting the second data node in the distributed data cluster system and adding the first data node into the distributed data cluster system;

the adding module is suitable for adding the second data node into the distributed data cluster system;

the balancing module is suitable for storing a part of data stored in the first data node into the second data node according to the data balancing strategy;

and the updating module is suitable for updating the metadata recorded in the metadata node.

According to yet another aspect of the present invention, there is provided a computing device comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the data migration method.

According to still another aspect of the present invention, a computer storage medium is provided, where at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform operations corresponding to the data migration method.

According to the data migration method, the data migration device, the computing equipment and the computer storage medium, a distributed data cluster system comprising at least one routing node, at least one metadata node and at least one second data node is newly built, a first data node storing data is added into the distributed data cluster system, and therefore the data stored in the first data node is completely migrated to the distributed data cluster system; and data balance in the first data node and the second data node is achieved according to a data balance strategy, so that the aim of improving the database reading and writing efficiency when the data volume is increased is achieved.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 illustrates a flow diagram of a data migration method according to one embodiment of the present invention;

FIG. 2 illustrates a flow diagram of a data migration method according to another embodiment of the present invention;

FIG. 3 illustrates a functional block diagram of a data migration apparatus according to one embodiment of the present invention;

FIG. 4 shows a functional block diagram of a data migration apparatus according to another embodiment of the present invention;

FIG. 5 illustrates a schematic structural diagram of a computing device according to an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The invention aims to migrate data stored in a first data node to a distributed data cluster system capable of storing more data when business data grows, so as to improve database read-write efficiency.

Taking the MongoDB database as an example, one storage mode is storage in a replica set, and the other is storage in a sharded cluster. Correspondingly, the first data node in the following embodiments is a MongoDB replica set, which requires at least two nodes. One of them is the master node, which handles client requests and records all operations performed on it; the other nodes are slave nodes, which periodically poll the master node to obtain those operations and then apply them to their own copy of the data, so that the data on the slave nodes stays consistent with the master node. In addition, so that the MongoDB replica set always has a master node able to provide read-write service to the service requester, voting (arbiter) nodes are placed in the replica set to ensure that the election of a master node succeeds.

Correspondingly, the distributed data cluster system in the following embodiments is a MongoDB sharded cluster, that is, a Mongos architecture: a database architecture for distributed storage that is suited to storing massive data. The Mongos architecture includes a plurality of routing nodes, a plurality of shard nodes and a plurality of metadata nodes, which correspond respectively to the routing nodes, the second data nodes and the metadata nodes in the following embodiments. The routing node directs the request of a service requester to the corresponding shard; the metadata records where the data is stored in each shard; and the shard nodes store the bulk of the data, each shard being divided into a plurality of data blocks (chunks or buckets). When a user queries data, the data is located through its storage position recorded in the metadata node. When massive data must be stored, the whole database cluster can be expanded by adding shards.
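The master/slave behaviour described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names `Primary` and `Secondary`, the list-based operation log, and the `poll` method are hypothetical and are not MongoDB's actual replication interface.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical master node: serves writes and records each operation."""
    data: dict = field(default_factory=dict)
    oplog: list = field(default_factory=list)  # ordered (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append((key, value))


@dataclass
class Secondary:
    """Hypothetical slave node: replays the master's operations on its copy."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries already replayed

    def poll(self, primary):
        # Fetch only the operations not yet applied, then replay them in order.
        for key, value in primary.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.oplog)


primary = Primary()
secondary = Secondary()
primary.write("user:1", "alice")
primary.write("user:2", "bob")
secondary.poll(primary)  # one polling round brings the copy up to date
```

After a polling round the slave's copy matches the master's, which is the consistency property the paragraph describes.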

Of the two storage modes of the MongoDB database, a MongoDB replica set is equivalent to a single shard node in the Mongos architecture and is used for storing business data. When business data grows and the storage capacity of the MongoDB replica set can no longer meet business requirements, the data in the replica set can be migrated to a plurality of shards in the Mongos architecture, which then provides services externally to meet the needs of the service requester.

FIG. 1 shows a flow diagram of a data migration method according to one embodiment of the invention. The method is used for migrating data stored in a first data node to a distributed data cluster system, and as shown in fig. 1, the method includes the following steps:

Step S101, newly building a distributed data cluster system.

The purpose of the present invention is to migrate data stored in a first data node to a distributed data cluster system capable of storing more data, so a distributed data cluster system able to hold the current volume of business data must be built first.

Specifically, the distributed data cluster system comprises at least one routing node, at least one metadata node and at least one second data node, and the data stored in the second data node is empty. The at least one routing node is the entry point for data read-write access and directs the request of a service requester to the corresponding second data node. A routing rule for routing data is set in the at least one routing node; when data is routed by this rule, it is stored in different data blocks of the second data node according to the structure of the cluster system. The at least one metadata node records the mapping between stored data and its storage position, that is, which data node can be queried for a given piece of data. The at least one second data node is used for storing data; which data it stores is determined by the routing rule, and changes when the cluster system is extended (for example, when a new data node is added).

After migration, the newly built distributed data cluster system provides services externally through the first data node and the at least one second data node, and can meet larger business requirements than storage on the first data node alone before migration.

Step S102, deleting the second data node in the distributed data cluster system, and adding the first data node to the distributed data cluster system.

When the data in the first data node is to be migrated in full to the newly built distributed data cluster system, the first data node is added to the distributed data cluster system as a whole, so that no data is lost during the migration.

For the distributed data cluster system, when a second data node already exists in the system, a node whose stored data is empty can be added directly. This applies when, after the data migration has succeeded, further second data nodes need to be added to expand capacity. However, when a second data node already exists in the system, the first data node, which already stores data, cannot be added directly: if it is, the loading fails.

Specifically, the first data node is added to the distributed data cluster system as a whole as follows: the second data node in the distributed data cluster system is deleted, and the first data node is then added. Deleting the second data node removes the association between it and the distributed data cluster system, which then no longer has any node for storing data. Adding the first data node, which stores business data, establishes an association between the first data node and the distributed data cluster system. At this point the first data node is the only node in the system that can store data, so the cluster system assumes by default that all data is stored in the data blocks of the first data node, and the metadata node records the corresponding metadata, namely that all data resides in the data blocks of the first data node.
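The delete-then-add ordering of step S102 can be sketched with a toy cluster model. The `Cluster` class, its method names, and the exact failure condition below are assumptions made for illustration; they encode the rule stated in the text (a node that already holds data can be loaded only while the cluster has no data node) and are not a real database API.

```python
class Cluster:
    """Toy model of the cluster's data nodes and its metadata node."""

    def __init__(self):
        self.nodes = {}        # node name -> {chunk id: {key: value}}
        self.chunk_owner = {}  # metadata: chunk id -> node name

    def add_node(self, name, chunks=None):
        chunks = chunks or {}
        # Assumed rule from the text: a populated node cannot be loaded
        # while another data node is still attached to the cluster.
        if chunks and self.nodes:
            raise ValueError("cannot load a populated node into a non-empty cluster")
        self.nodes[name] = chunks
        for chunk_id in chunks:
            self.chunk_owner[chunk_id] = name  # all data defaults to this node

    def remove_node(self, name):
        if self.nodes[name]:
            raise ValueError("node still holds data")
        del self.nodes[name]


cluster = Cluster()
cluster.add_node("second")  # empty node created in step S101
first_chunks = {"chunk-0": {"a": 1}, "chunk-1": {"b": 2}}

try:
    cluster.add_node("first", first_chunks)  # adding directly fails
    direct_add_failed = False
except ValueError:
    direct_add_failed = True

cluster.remove_node("second")            # step S102: delete the empty node
cluster.add_node("first", first_chunks)  # step S102: load the populated node
```

Once the populated node is loaded, the metadata maps every chunk to the first data node, matching the default state described above.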

Step S103, adding a second data node to the distributed data cluster system.

After the first data node has been successfully added to the distributed data cluster system, the storage space available to the system is the same as that of the first data node when it stored data as a replica set before the addition; it has not been enlarged. To meet the needs of a larger volume of business data, second data nodes must be added to the distributed data cluster system. The number of second data nodes can be set according to the volume of business data, so that the required read-write efficiency is met with the most reasonable resource configuration.

Step S104, storing a part of the data stored in the first data node into the second data node according to the data balancing strategy.

The second data node added to the distributed data cluster system stores no data, so data storage is unbalanced: during reads and writes, the heavy load on the first data node reduces the read-write efficiency of the whole system.

Specifically, the distributed data cluster system checks at a certain frequency, based on the number of nodes and the data volume, whether the data is balanced. It uses the data balancing strategy to redistribute the data stored in the first data node, moving part of it to the second data node so that the data volumes stored in the first and second data nodes are balanced. Taking the Mongos architecture as an example, to avoid data imbalance the architecture provides a data balancing function: part of the data in shards that hold more data is moved out and stored in other, relatively idle shards, so that the data volume across all shards is balanced. That is, when the difference in data volume between shards reaches a certain level, the data balancer is started to perform balancing, keeping the data volume in each shard as consistent as possible.
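A balancing pass of this kind might look like the sketch below. It assumes, purely for illustration, that imbalance is measured by the number of data blocks per node and that a pass stops once the gap between the fullest and emptiest node is at most `threshold`; the function name `balance` and both of these choices are hypothetical and are not taken from the patent or from MongoDB's balancer.

```python
def balance(chunks_by_node, threshold=1):
    """Move chunks from the fullest node to the emptiest node until the
    chunk-count gap is at most `threshold`. Returns the moves performed
    as (chunk, source, target) tuples."""
    if threshold < 1:
        # A threshold of 0 could shuttle one chunk back and forth forever.
        raise ValueError("threshold must be at least 1")
    moves = []
    while True:
        fullest = max(chunks_by_node, key=lambda n: len(chunks_by_node[n]))
        emptiest = min(chunks_by_node, key=lambda n: len(chunks_by_node[n]))
        gap = len(chunks_by_node[fullest]) - len(chunks_by_node[emptiest])
        if gap <= threshold:
            return moves
        chunk = chunks_by_node[fullest].pop()
        chunks_by_node[emptiest].append(chunk)
        moves.append((chunk, fullest, emptiest))


# All chunks start on the first data node; the second node was just added.
placement = {"first": ["c0", "c1", "c2", "c3", "c4", "c5"], "second": []}
moves = balance(placement)
```

Returning the list of moves keeps the balancing step separate from the metadata update, which can then be applied from that list.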

Step S105, the metadata recorded in the metadata node is updated.

Before data balancing, all data is stored in the data blocks of the first data node, and the metadata recorded in the metadata node reflects this. After data balancing, the data originally stored in the first data node is spread across all nodes, which amounts to updating the data in both the first and second data nodes. The metadata for each node is updated accordingly, so that the metadata recorded in the metadata node matches the current relationship between the data stored in each node and its storage position.
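The metadata update of step S105 can be pictured as applying a list of data-block moves to a block-to-node mapping. The sketch below is a minimal illustration; the function `apply_moves`, the tuple layout of a move, and the staleness check are all assumptions introduced here.

```python
def apply_moves(chunk_owner, moves):
    """Return a new metadata mapping (chunk id -> node name) after applying
    each (chunk, source, target) move. The input mapping is not modified."""
    updated = dict(chunk_owner)
    for chunk, source, target in moves:
        # Refuse to apply a move whose source disagrees with the metadata.
        if updated.get(chunk) != source:
            raise ValueError(f"metadata out of date for {chunk}")
        updated[chunk] = target
    return updated


# Before balancing, every chunk is recorded as stored on the first data node.
before = {"c0": "first", "c1": "first", "c2": "first", "c3": "first"}
after = apply_moves(before, [("c2", "first", "second"), ("c3", "first", "second")])
```

Checking the recorded source before each update is one simple way to keep the metadata consistent with where the data actually resides.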

In the data migration method provided by this embodiment, by newly building a distributed data cluster system including at least one routing node, at least one metadata node, and at least one second data node, the distributed data cluster system can expand nodes for storing data, and reduce storage pressure when a single first data node is used for storage; adding a first data node storing data into the distributed data cluster system, and completely migrating the data stored in the first data node into the distributed data cluster system; and data balance in the first data node and the second data node is achieved according to a data balance strategy, so that the aim of improving the database reading and writing efficiency when the data volume is increased is achieved. The embodiment for realizing the above purpose is suitable for the situations that the traffic volume is increased, the service data is increased, and the capacity of the storage space needs to be expanded on the basis of ensuring that the historical service data is not lost.

FIG. 2 shows a flow diagram of a data migration method according to another embodiment of the invention. As shown in fig. 2, the method comprises the steps of:

Step S201, newly building a distributed data cluster system according to the minimization principle.

The minimization principle means that the newly built distributed data cluster system has the minimum number of nodes; specifically, the distributed data cluster system comprises one routing node, one metadata node and one second data node.

Step S202, a routing rule is set in the routing node.

A routing rule determines how business data is routed. For example, in the Mongos architecture, the routing rule of the routing node determines which shard a piece of data is stored in.

Step S203, the service connection between the service requester and the first data node is disconnected.

While the first data node is being added to the distributed data cluster system to carry out the migration, the service connection between the service requester and the first data node must be disconnected before the first data node is added, to ensure stable and reliable data reads and writes. For the requester, this means the service is suspended for a period of time; that period is traded for accuracy of the migrated data. Taking data replication as an example, if the service connection is not disconnected and data is written while replication is in progress, data may be missed or replicated incorrectly.

Specifically, the operation of disconnecting the service connection is performed by a Database Administrator (DBA).
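The effect of disconnecting the service connection can be illustrated with a simple write gate. This is a hypothetical sketch: the class `FirstDataNode`, the `accepting` flag, and the use of a dictionary copy as a stand-in for the migration are assumptions and do not describe how a DBA actually closes connections.

```python
class FirstDataNode:
    """Toy data node whose service connection can be closed before migration."""

    def __init__(self):
        self.data = {}
        self.accepting = True  # True while the service connection is open

    def write(self, key, value):
        if not self.accepting:
            raise ConnectionError("service connection is closed for migration")
        self.data[key] = value


node = FirstDataNode()
node.write("k1", "v1")

node.accepting = False      # step S203: disconnect the service connection
snapshot = dict(node.data)  # stand-in for migrating the node's data

try:
    node.write("k2", "v2")  # a write arriving during migration is refused
    rejected = False
except ConnectionError:
    rejected = True
```

Because writes are refused while the copy is taken, the migrated snapshot cannot diverge from the source, which is the accuracy the pause is traded for.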

Step S204, deleting the second data node in the distributed data cluster system, and adding the first data node to the distributed data cluster system.

Because the newly built distributed data cluster system already has a second data node, adding the first data node directly at this point would cause the loading to fail, so the second data node in the distributed data cluster system must be deleted first. Deleting the second data node removes the association between it and the distributed data cluster system, which then no longer has any node for storing data. Adding the first data node, which stores business data, establishes an association between the first data node and the distributed data cluster system. At this point the first data node is the only node in the system that can store data, so the cluster system assumes by default that all data is stored in the data blocks of the first data node, and the metadata node records the corresponding metadata, namely that all data resides in the data blocks of the first data node.

Step S205, adding a second data node to the distributed data cluster system.

After the first data node has been successfully added to the distributed data cluster system, the storage space available for storing data is the same as before the addition; it has not been enlarged. To meet the needs of a larger volume of business data, second data nodes must be added to the distributed data cluster system. The number of second data nodes can be set according to the volume of business data, so that the required read-write efficiency is met with the most reasonable resource configuration.

Step S206, storing a part of data stored in the first data node into the second data node according to the routing rule.

The distributed data cluster system may check at a certain frequency, based on the number of nodes and the data volume, whether the data is balanced. Specifically, once an imbalance among the nodes is detected, data balancing is performed according to the data balancing strategy and the routing rule in the routing node. For example, each time an empty second data node is added to the distributed data cluster system, data balancing can be carried out according to the routing rule.

In a specific embodiment, the routing rule is to route data according to the hash value of the data key. For example, a hash value is computed from the primary key of a piece of data; if the hash value is 1, the data is stored in a data block of the first data node, and the corresponding metadata is the mapping between that data and its storage position. The hash value is computed in turn for each piece of data stored in the first data node, and the data is stored according to its hash value so that the data volumes stored in the first and second data nodes are balanced. Whether the data volumes are balanced is determined by the data balancing strategy.
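A hash-based routing rule of this kind can be sketched as follows. The patent does not name a hash function or say how hash values map to nodes; the use of SHA-256, the modulo mapping, and the function name `route` are assumptions chosen only to give a stable, runnable example.

```python
import hashlib


def route(key, node_names):
    """Pick a data node from a stable hash of the data key.

    SHA-256 is used here only because it gives the same result on every
    run; it stands in for whatever hash the real routing rule uses."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % len(node_names)
    return node_names[slot]


nodes = ["first", "second"]
placement = {name: [] for name in nodes}

# Hash each key in turn and store it on the node the rule selects.
for i in range(1000):
    key = f"user:{i}"
    placement[route(key, nodes)].append(key)
```

Because the node is derived from the key alone, the same key always routes to the same node, and a large set of keys tends to spread roughly evenly across the nodes.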

Step S207, recording the mapping relationship between the data stored in the first data node and the second data node and the storage location thereof in the metadata node.

Before data balancing, all data is by default stored in the data blocks of the first data node, and the metadata node records the corresponding metadata. After data balancing, part of the data in the first data node is stored in the second data node. The data in the second data node is then no longer empty, so neither is its metadata, and the metadata in the metadata node must be updated according to how the data was routed during balancing.

Specifically, the mapping relationship between the data remaining stored in the first data node and the storage location thereof is recorded in the metadata node, and the mapping relationship between the data stored in the second data node and the storage location thereof is recorded in the metadata node.

Step S208, notifying the service requester of the access address of the routing node, and establishing a service connection between the service requester and the distributed data cluster system.

Before the first data node is added to the distributed data cluster system, the service connection between the service requester and the first data node is disconnected, so after the distributed data cluster system is deployed and data is updated, the access address of the routing node needs to be notified to the service requester, and the service connection between the service requester and the distributed data cluster system is established.

The distributed data cluster system, having completed data migration and data updating, is then used to store a larger volume of business data. The routing node serves as the entry point for reads and writes. When data is written, it is stored in a data block of the corresponding node according to the routing rule, and the corresponding metadata is recorded in the metadata node. When data is read, if the read request is based on a data key, the routing node queries the metadata node directly for the mapping between the data and its storage position; the data is then read from that storage position and returned to the requester by the routing node.
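The write and read paths through the routing node can be sketched with a small in-memory model. Everything named here is hypothetical: the `Router` class holds the data nodes and the metadata in plain dictionaries, and `by_length` is a deliberately trivial stand-in for the routing rule.

```python
class Router:
    """Toy routing node: entry point for writes and key-based reads."""

    def __init__(self, node_names):
        self.storage = {name: {} for name in node_names}  # data nodes
        self.location = {}  # metadata: data key -> node name

    def write(self, key, value, rule):
        node = rule(key, list(self.storage))  # routing rule picks the node
        self.storage[node][key] = value
        self.location[key] = node             # record the storage position

    def read(self, key):
        node = self.location[key]             # look up the position in metadata
        return self.storage[node][key]        # read from the recorded position


def by_length(key, node_names):
    # Hypothetical routing rule used only for this example.
    return node_names[len(key) % len(node_names)]


router = Router(["first", "second"])
router.write("ab", "x", by_length)
router.write("abc", "y", by_length)
value = router.read("abc")
```

A key-based read needs only one metadata lookup before going to the node that holds the data, which is why the routing node can answer such requests directly.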

In another embodiment of the present invention, the newly built distributed data cluster system includes a plurality of routing nodes, and the routing nodes can share the pressure of processing requests when a large number of concurrent requests occur.

In the data migration method provided by this embodiment, by newly building a distributed data cluster system including a routing node, a metadata node, and a second data node, the distributed data cluster system can expand nodes for storing data, and reduce storage pressure when storing with the first data node; service connection is cut off in the process of migrating the data in the first data node to the distributed data cluster system, so that the accuracy of data migration and the reliability of access are ensured; adding a first data node storing data into the distributed data cluster system, and completely migrating the data stored in the first data node into the distributed data cluster system; and data balance in the first data node and the second data node is realized according to the routing rule, so that the aim of improving the database reading and writing efficiency when the data volume is increased is fulfilled. The embodiment for realizing the above purpose is suitable for the situations that the traffic volume is increased, the service data is increased, and the capacity of the storage space needs to be expanded on the basis of ensuring that the historical service data is not lost.

FIG. 3 shows a functional block diagram of a data migration apparatus according to one embodiment of the present invention. As shown in fig. 3, the apparatus includes: a new building module 31, a loading module 32, an adding module 33, a balancing module 34 and an updating module 35.

The new building module 31 is adapted to build a distributed data cluster system, where the distributed data cluster system includes at least one routing node, at least one metadata node, and at least one second data node, and data stored in the second data node is empty.

A loading module 32 adapted to delete the second data node in the distributed data cluster system and add the first data node to the distributed data cluster system.

Deleting the second data node removes the association between it and the distributed data cluster system, which then no longer has any node for storing data. Adding the first data node, which stores business data, establishes an association between the first data node and the distributed data cluster system. At this point the first data node is the only node in the system that can store data, so the cluster system assumes by default that all data is stored in the data blocks of the first data node, and the metadata node records the corresponding metadata, namely that all data resides in the data blocks of the first data node.

An adding module 33 adapted to add the second data node to the distributed data cluster system.

If the requirement of larger service data is to be met, a second data node needs to be added to the distributed data cluster system.

A balancing module 34 adapted to store a part of the data stored in the first data node into the second data node according to a data balancing strategy.

Specifically, the distributed data cluster system checks at a certain frequency, based on the number of nodes and the data volume, whether the data is balanced, and uses the data balancing strategy to redistribute the data stored in the first data node, moving part of it to the second data node so that the data volumes stored in the first and second data nodes are balanced.

An updating module 35 adapted to update the metadata recorded in the metadata node.

In the data migration apparatus provided in this embodiment, by newly building a distributed data cluster system including at least one routing node, at least one metadata node, and at least one second data node, the distributed data cluster system can expand the nodes used for storing data and reduce the storage pressure that arises when data is stored in a single replica set. A first data node storing data is added to the distributed data cluster system, so that the data stored in the first data node is migrated in full to the distributed data cluster system. Data is balanced between the first data node and the second data node according to a data balancing strategy, thereby improving database read-write efficiency as the data volume grows. This embodiment is suited to situations in which traffic and business data increase and storage capacity must be expanded without losing historical business data.

FIG. 4 shows a functional block diagram of a data migration apparatus according to another embodiment of the present invention. As shown in fig. 4, the apparatus further includes, on the basis of fig. 3: a setting module 41 and a service processing module 42.

A setting module 41 adapted to set routing rules in the routing nodes.

A service processing module 42 adapted to disconnect the service connection between the service requester and the first data node.

In the process of adding the first data node to the distributed data cluster system to implement data migration, in order to ensure the stability and reliability of data reading and writing, the service connection between the service requester and the first data node needs to be disconnected before the first data node is added to the distributed data cluster system. For the service requester, the service is suspended for a period of time; this downtime is traded for the accuracy of the migrated data.

The service processing module 42 is further adapted to: notify the service requester of the access address of the routing node, and establish a service connection between the service requester and the distributed data cluster system.

The newly created module 31 is further adapted to: build the distributed data cluster system according to the minimization principle, so that the distributed data cluster system comprises a routing node, a metadata node and a second data node.

The balancing module 34 is further adapted to: store a part of the data stored in the first data node into the second data node according to the routing rule.

The routing rule is specifically as follows: data routing is carried out according to the hash value of the data key.
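A minimal sketch of such a rule is given below for illustration. The embodiment only states that routing is based on the hash value of the data key; the choice of SHA-256, the modulo mapping onto the node list, and the names `route` and `data_nodes` are assumptions and do not describe how any particular database implements hashed routing.

```python
import hashlib
from typing import List


def route(key: str, data_nodes: List[str]) -> str:
    """Select a data node for `key` from the hash value of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer
    # and map it onto the list of data nodes.
    index = int.from_bytes(digest[:8], "big") % len(data_nodes)
    return data_nodes[index]


# Hypothetical node names; the same key is always routed to the same node.
data_nodes = ["first", "second"]
target = route("user:42", data_nodes)
```

Because the mapping depends only on the key and the node list, the routing node can compute the target without consulting the data nodes themselves.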

The updating module 35 is further adapted to: record in the metadata node the mapping relation between the data remaining in the first data node and its storage location, and record in the metadata node the mapping relation between the data stored in the second data node and its storage location.
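For illustration, the mapping relation recorded in the metadata node can be pictured as a table from each data key to the data node that stores it. The sketch below assumes a plain in-memory dictionary and a hypothetical helper `build_metadata`; the actual metadata format is not specified in this embodiment.

```python
from typing import Dict, Iterable


def build_metadata(nodes: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Record, for every data key, the name of the data node that stores it."""
    metadata: Dict[str, str] = {}
    for node_name, keys in nodes.items():
        for key in keys:
            metadata[key] = node_name
    return metadata


# Hypothetical state after balancing: two keys remain in the first data node
# and one key has been stored in the second data node.
metadata = build_metadata({"first": ["a", "b"], "second": ["c"]})
```

A lookup such as `metadata["c"]` then tells the routing node which data node to contact for that key.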

In the data migration apparatus provided in this embodiment, a distributed data cluster system including a routing node, a metadata node, and a second data node is newly built, so that the nodes used for storing data can be expanded and the storage pressure of storing data in a single replica set is reduced. The service connection is disconnected while the data in the first data node is migrated to the distributed data cluster system, which ensures the accuracy of the data migration and the reliability of access. The first data node, which already stores data, is added into the distributed data cluster system, so that the data stored in the first data node is completely migrated into the distributed data cluster system. The data in the first data node and the second data node is then balanced according to the routing rule, which improves the efficiency of database reads and writes as the data volume grows. This embodiment is suitable for situations in which the business volume and the service data increase and the storage capacity needs to be expanded without losing historical service data.
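The overall sequence summarized above can be sketched with a toy in-memory model. Everything in this sketch is an assumption made for illustration: the `Cluster` and `Client` classes, the node names, the placeholder address `router.example:27017`, and the SHA-256 modulo routing rule. It is not the implementation of this embodiment and does not use any real database API.

```python
import hashlib
from typing import Dict, Optional


class Client:
    """Hypothetical service requester that remembers which endpoint it uses."""

    def __init__(self, endpoint: Optional[str]) -> None:
        self.endpoint = endpoint


class Cluster:
    """Toy model: one routing node address, one metadata table, named data nodes."""

    def __init__(self, router_address: str) -> None:
        self.router_address = router_address
        self.metadata: Dict[str, str] = {}               # data key -> node name
        self.data_nodes: Dict[str, Dict[str, str]] = {}  # node name -> {key: value}

    def add_node(self, name: str, data: Dict[str, str]) -> None:
        self.data_nodes[name] = data
        for key in data:
            self.metadata[key] = name

    def remove_node(self, name: str) -> None:
        for key in self.data_nodes.pop(name):
            self.metadata.pop(key, None)

    def target_for(self, key: str) -> str:
        # Assumed routing rule: hash of the data key, modulo the node count.
        names = sorted(self.data_nodes)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return names[int.from_bytes(digest[:8], "big") % len(names)]

    def balance(self) -> None:
        # Move every key to the node chosen by the routing rule and
        # record the new storage location in the metadata table.
        for name in list(self.data_nodes):
            for key in list(self.data_nodes[name]):
                target = self.target_for(key)
                if target != name:
                    self.data_nodes[target][key] = self.data_nodes[name].pop(key)
                    self.metadata[key] = target


def migrate(first_node_data: Dict[str, str], client: Client,
            router_address: str = "router.example:27017") -> Cluster:
    cluster = Cluster(router_address)
    cluster.add_node("second", {})              # new cluster with an empty second data node
    cluster.remove_node("second")               # delete the second data node
    client.endpoint = None                      # disconnect the service requester
    cluster.add_node("first", first_node_data)  # add the first data node with its data
    cluster.add_node("second", {})              # add the second data node again
    cluster.balance()                           # redistribute data and update metadata
    client.endpoint = cluster.router_address    # notify the routing node address
    return cluster


client = Client("first-node.example:27017")
data = {f"k{i}": str(i) for i in range(100)}
cluster = migrate(dict(data), client)
```

After `migrate` returns, every key is held by exactly one of the two data nodes, the metadata table points at the node that holds it, and the client addresses the routing node instead of the first data node.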

An embodiment of the present application provides a non-volatile computer storage medium. The computer storage medium stores at least one computer-executable instruction, and the computer-executable instruction can carry out the data migration method in any of the above method embodiments.

FIG. 5 is a schematic structural diagram of a computing device according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the computing device.

As shown in FIG. 5, the computing device may include: a processor 502, a communication interface 504, a memory 506, and a communication bus 508.

Wherein:

the processor 502, communication interface 504, and memory 506 communicate with one another via a communication bus 508.

The communication interface 504 is used to communicate with network elements of other devices, such as clients or other servers.

The processor 502 is configured to execute the program 510, and may specifically perform relevant steps in the data migration method embodiment described above.

In particular, program 510 may include program code that includes computer operating instructions.

The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 506 is used to store the program 510. The memory 506 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

The program 510 may specifically be used to cause the processor 502 to perform the following operations:

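newly building a distributed data cluster system, wherein the distributed data cluster system comprises at least one routing node, at least one metadata node and at least one second data node;

deleting the second data node in the distributed data cluster system, and adding the first data node into the distributed data cluster system;

adding the second data node to the distributed data cluster system;

storing a part of the data stored in the first data node into the second data node according to a data balancing strategy;

updating the metadata recorded in the metadata node.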

The program 510 may further be used to cause the processor 502 to perform the following operations:

building the distributed data cluster system according to the minimization principle, so that the distributed data cluster system comprises a routing node, a metadata node and a second data node;

setting a routing rule in the routing node;

storing a part of the data stored in the first data node into the second data node according to the routing rule;

carrying out data routing according to the hash value of the data key;

disconnecting the service connection between the service requester and the first data node;

notifying the service requester of the access address of the routing node, and establishing a service connection between the service requester and the distributed data cluster system;

recording in the metadata node the mapping relation between the data remaining in the first data node and its storage location, and recording in the metadata node the mapping relation between the data stored in the second data node and its storage location.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a data migration apparatus according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.

Claims

1. A data migration method for migrating data stored in a first data node to a distributed data cluster system, comprising:

newly building a distributed data cluster system, wherein the distributed data cluster system comprises at least one routing node, at least one metadata node and at least one second data node, and the second data node stores no data;

deleting the second data node in the distributed data cluster system, then adding the first data node into the distributed data cluster system, and establishing an association relationship between the first data node and the distributed data cluster system;

adding the second data node to the distributed data cluster system;

storing a part of data stored in the first data node into a second data node according to a data balancing strategy;

and updating the metadata recorded in the metadata node.

2. The method of claim 1, wherein the newly building a distributed data cluster system further comprises: building the distributed data cluster system according to a minimization principle, so that the distributed data cluster system comprises a routing node, a metadata node and a second data node.

3. The method of claim 1 or 2, wherein after the newly building a distributed data cluster system, the method further comprises: setting a routing rule in a routing node;

the storing a part of data stored in the first data node into the second data node further comprises:

storing a part of data stored in the first data node into the second data node according to the routing rule.

4. The method according to claim 3, wherein the routing rule is specifically: carrying out data routing according to the hash value of the data key.

5. The method of any of claims 1-4, wherein prior to the adding the first data node into the distributed data cluster system, the method further comprises: disconnecting the service connection between the service requester and the first data node.

6. The method of claim 5, wherein after the updating the metadata recorded in the metadata node, the method further comprises: notifying the service requester of the access address of the routing node, and establishing a service connection between the service requester and the distributed data cluster system.

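7. The method of any of claims 1-6, wherein the updating the metadata recorded in the metadata node further comprises: recording in the metadata node the mapping relation between the data remaining in the first data node and its storage location, and recording in the metadata node the mapping relation between the data stored in the second data node and its storage location.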

8. A data migration apparatus for migrating data stored in a first data node into a distributed data cluster system, comprising:

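a newly created module adapted to newly build a distributed data cluster system, wherein the distributed data cluster system comprises at least one routing node, at least one metadata node and at least one second data node, and the second data node stores no data;

a loading module adapted to delete the second data node in the distributed data cluster system, then add the first data node into the distributed data cluster system, and establish an association relationship between the first data node and the distributed data cluster system;

an adding module adapted to add the second data node to the distributed data cluster system;

a balancing module adapted to store a part of data stored in the first data node into the second data node according to a data balancing strategy;

an updating module adapted to update the metadata recorded in the metadata node.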

9. The apparatus of claim 8, wherein the newly created module is further adapted to: build the distributed data cluster system according to a minimization principle, so that the distributed data cluster system comprises a routing node, a metadata node and a second data node.

10. The apparatus of claim 8 or 9, wherein the apparatus further comprises: a setting module adapted to set routing rules in the routing nodes;

the balancing module is further adapted to: store a part of data stored in the first data node into the second data node according to the routing rule.

11. The apparatus according to claim 10, wherein the routing rule is specifically: carrying out data routing according to the hash value of the data key.

12. The apparatus of any one of claims 8-11, wherein the apparatus further comprises: a service processing module adapted to disconnect the service connection between the service requester and the first data node.

13. The apparatus of claim 12, wherein the service processing module is further adapted to: notify the service requester of the access address of the routing node, and establish a service connection between the service requester and the distributed data cluster system.

14. The apparatus of any of claims 8-13, wherein the updating module is further adapted to:

record in the metadata node the mapping relation between the data remaining in the first data node and its storage location, and

record in the metadata node the mapping relation between the data stored in the second data node and its storage location.

15. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the data migration method according to any one of claims 1-7.

16. A computer storage medium having stored therein at least one executable instruction that causes a processor to perform operations corresponding to the data migration method of any one of claims 1-7.