WO2019001017A1 - Method and system for inter-cluster data migration, server, and computer storage medium - Google Patents

Method and system for inter-cluster data migration, server, and computer storage medium

Info

Publication number
WO2019001017A1
WO2019001017A1 PCT/CN2018/079027
Authority
WO
WIPO (PCT)
Prior art keywords
data
cluster
storage system
intermediate storage
child node
Prior art date
Application number
PCT/CN2018/079027
Other languages
English (en)
Chinese (zh)
Inventor
张恒
杨挺
Original Assignee
北京奇虎科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京奇虎科技有限公司
Publication of WO2019001017A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 - Design, administration or maintenance of databases
    • G06F16/214 - Database migration support
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the present disclosure relates to the field of computer technologies, and in particular, to an inter-cluster data migration method, system, server, and non-transitory computer readable storage medium.
  • the existing technical solution is to directly transfer data between two clusters during data migration.
  • For each data table, each node needs to start a remote transmission process. Starting this service takes a long time, and when the amount of data in a data table is very small, the time needed to transfer the data may be shorter than the time needed to start the service.
  • As a result, the data migration speed is very slow. In addition, the existing data migration method cannot transfer empty tables between clusters; when an empty table is encountered, the process blocks.
  • Furthermore, the existing data migration method does not support migrating data from a cluster with a larger number of child nodes to a cluster with a smaller number of child nodes.
  • the present disclosure has been made in order to provide an inter-cluster data migration method, an inter-cluster data migration system, a server, and a non-transitory computer-readable storage medium that overcome the above problems or at least partially solve the above problems.
  • an inter-cluster data migration method for data migration between a first cluster and a second cluster, the first cluster and the second cluster each including at least one child node; the method includes:
  • according to a data migration request, each child node in the first cluster writes its respective data in parallel to a specified path of an intermediate storage system;
  • the data is read from the specified path of the intermediate storage system and stored in parallel by the respective child nodes in the second cluster in accordance with the data redistribution policy of the second cluster.
  • an inter-cluster data migration system for data migration between a first cluster and a second cluster, the system comprising: a first cluster, a second cluster, and an intermediate storage system, wherein the first cluster and the second cluster each include at least one child node;
  • Each child node in the first cluster is adapted to write the respective data to the specified path of the intermediate storage system in parallel according to the data migration request;
  • Each child node in the second cluster is adapted to read data from the specified path of the intermediate storage system and store it in parallel according to the data redistribution policy of the second cluster.
  • a server including: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface complete communication with each other through a communication bus;
  • the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform an operation corresponding to the inter-cluster data migration method.
  • a computer program comprising computer readable code which, when run on a computing device, causes the computing device to perform the inter-cluster data migration method described above.
  • a non-transitory computer readable storage medium having stored therein at least one executable instruction that causes a processor to perform the operations corresponding to the above inter-cluster data migration method.
  • In the solution of the present disclosure, each child node in the first cluster writes its respective data in parallel to the specified path of the intermediate storage system, and each child node in the second cluster reads data from the specified path of the intermediate storage system in parallel and stores it according to the data redistribution policy of the second cluster, which improves the data migration speed and reduces the time required for data migration.
  • Moreover, the second cluster reads data from the intermediate storage system rather than directly from the first cluster.
  • FIG. 1 is a schematic flowchart diagram of an inter-cluster data migration method according to Embodiment 1 of the present disclosure
  • FIG. 2 is a schematic flowchart diagram of an inter-cluster data migration method according to Embodiment 2 of the present disclosure
  • FIG. 3 is a schematic flowchart diagram of an inter-cluster data migration method according to Embodiment 3 of the present disclosure
  • FIG. 4 is a schematic structural diagram of an inter-cluster data migration system according to Embodiment 4 of the present disclosure.
  • FIG. 5 is still another schematic structural diagram of an inter-cluster data migration system according to Embodiment 4 of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a server according to Embodiment 6 of the present disclosure.
  • FIG. 1 is a schematic flowchart diagram of an inter-cluster data migration method according to Embodiment 1 of the present disclosure.
  • the method is used for data migration between the first cluster and the second cluster.
  • the first cluster and the second cluster each include at least one child node. As shown in FIG. 1, the method includes the following steps:
  • Step S100: according to the data migration request, each child node in the first cluster writes its respective data in parallel to the specified path of the intermediate storage system.
  • The inter-cluster data migration method provided by this embodiment of the present disclosure may be used for data migration between the first cluster and the second cluster, for example, migrating data in the first cluster to the second cluster.
  • After receiving the data migration request, each child node in the first cluster writes its respective data in parallel to the specified path of the intermediate storage system, where the intermediate storage system is used to store the data of each child node in the first cluster and is a storage system independent of both the first cluster and the second cluster.
  • The intermediate storage system is a distributed file system with the advantages of large bandwidth, large capacity, and high I/O throughput; it can therefore support each child node writing its data to the intermediate storage system in parallel.
  • Step S102: data is read from the specified path of the intermediate storage system and stored in parallel by each child node in the second cluster according to the data redistribution policy of the second cluster.
  • the data redistribution policy defines how the data is redistributed.
  • The data read by a child node in the second cluster may not be data that this child node should store; therefore, the data needs to be redistributed according to the data redistribution policy so that it is distributed to the child nodes that should store it. Specifically, each child node in the second cluster reads data from the specified path of the intermediate storage system in parallel and stores the data according to the data redistribution policy of the second cluster.
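  • As a minimal sketch only (Python; the node lists, the per-node export_node_data / import_node_data callables and the path value below are assumptions, not part of the disclosure), the two steps of this embodiment can be pictured as an export phase followed by an import phase:

    from concurrent.futures import ThreadPoolExecutor

    # Assumed "specified path" on the intermediate storage system (illustrative value only).
    SPECIFIED_PATH = "/migration/job_001"

    def export_phase(first_cluster_nodes, export_node_data):
        # Step S100: every child node of the first cluster writes its own data
        # to the specified path of the intermediate storage system in parallel.
        with ThreadPoolExecutor(max_workers=len(first_cluster_nodes)) as pool:
            list(pool.map(lambda node: export_node_data(node, SPECIFIED_PATH),
                          first_cluster_nodes))

    def import_phase(second_cluster_nodes, import_node_data):
        # Step S102: every child node of the second cluster reads the data files from the
        # specified path in parallel and stores them according to the data
        # redistribution policy of the second cluster.
        with ThreadPoolExecutor(max_workers=len(second_cluster_nodes)) as pool:
            list(pool.map(lambda node: import_node_data(node, SPECIFIED_PATH),
                          second_cluster_nodes))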
  • FIG. 2 is a schematic flowchart diagram of an inter-cluster data migration method according to Embodiment 2 of the present disclosure.
  • the method is used for data migration between the first cluster and the second cluster.
  • the first cluster and the second cluster each include a primary node and at least one child node. As shown in FIG. 2, the method includes the following steps:
  • Step S100: according to the data migration request, each child node in the first cluster writes its respective data in parallel to the specified path of the intermediate storage system.
  • The inter-cluster data migration method provided by this embodiment of the present disclosure may be used for data migration between the first cluster and the second cluster, for example, migrating data in the first cluster to the second cluster.
  • After receiving the data migration request, each child node in the first cluster writes its respective data in parallel to the specified path of the intermediate storage system, where the intermediate storage system is used to store the data of each child node in the first cluster and is a storage system independent of both the first cluster and the second cluster.
  • The intermediate storage system is a distributed file system with the advantages of large bandwidth, large capacity, and high I/O throughput; it can therefore support each child node writing its data to the intermediate storage system in parallel.
  • Step S100 is the same as step S100 in the first embodiment.
  • Step S101: the primary node of the first cluster backs up the data table structure locally in the form of a table file and sends the table file to the primary node of the second cluster, so that the primary node of the second cluster synchronizes the data table structure to each child node in the second cluster.
  • On the primary node and each child node of the first cluster, data is stored in the form of data tables, and the data table structure of a data table defines its fields, types, primary keys, foreign keys, indexes, and the like. Therefore, before the data is migrated to the second cluster, the data table structure of the data table needs to be migrated to the second cluster.
  • The primary node of the first cluster stores the data table structures of all the data tables. Therefore, the primary node of the first cluster can back up the data table structures locally in the form of table files, then send the table files to the primary node of the second cluster, and the primary node of the second cluster synchronizes the data table structures to each child node in the second cluster.
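  • As an illustration only (the disclosure fixes neither a file format nor a transfer mechanism; dump_table_ddl, send_to_second_master and the backup directory below are hypothetical helpers), the table-file exchange performed by the primary nodes could look like this:

    import os

    def backup_and_ship_table_structures(tables, dump_table_ddl, send_to_second_master,
                                         backup_dir="/backup/table_files"):
        # Back up the structure of every data table on the local (primary) node as a
        # table file, then send the table files to the primary node of the second
        # cluster, which synchronizes the structures to its own child nodes.
        os.makedirs(backup_dir, exist_ok=True)
        table_files = []
        for table in tables:
            path = os.path.join(backup_dir, f"{table}.sql")
            with open(path, "w", encoding="utf-8") as f:
                f.write(dump_table_ddl(table))   # e.g. a CREATE TABLE statement
            table_files.append(path)
        for path in table_files:
            send_to_second_master(path)
        return table_files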
  • Step S102: data is read from the specified path of the intermediate storage system and stored in parallel by each child node in the second cluster according to the data redistribution policy of the second cluster.
  • the data redistribution policy defines how the data is redistributed.
  • The data read by a child node in the second cluster may not be data that this child node should store; therefore, the data needs to be redistributed according to the data redistribution policy so that it is distributed to the child nodes that should store it. Specifically, each child node in the second cluster reads data from the specified path of the intermediate storage system in parallel and stores the data according to the data redistribution policy of the second cluster.
  • Step S102 is the same as step S102 in the first embodiment.
  • In this embodiment, each child node in the first cluster writes its respective data in parallel to the specified path of the intermediate storage system, and each child node in the second cluster reads data from the specified path of the intermediate storage system in parallel and stores it according to the data redistribution policy of the second cluster, which improves the data migration speed and reduces the time required for data migration.
  • Because the second cluster reads data from the intermediate storage system, the solution is applicable to data migration between any two clusters; it is not limited to migrating data from a cluster with fewer child nodes to a cluster with more child nodes, or to migration between clusters with the same number of child nodes. It therefore has a wide range of application, and there is no defect of data tables that cannot be migrated between clusters.
  • FIG. 3 is a schematic flowchart diagram of an inter-cluster data migration method according to Embodiment 3 of the present disclosure. As shown in FIG. 3, the method includes the following steps:
  • Step S200: start a data writing service for connecting each child node in the first cluster with the HDFS system according to the data migration request.
  • the intermediate storage system includes an HDFS system.
  • The HDFS system has the advantages of large bandwidth, large capacity, and high I/O throughput; it can therefore support each child node writing data to the HDFS system in parallel.
  • the HDFS system will be described in detail below as an example.
  • After receiving the data migration request, each child node in the first cluster starts a data writing service for connecting that child node to the HDFS system according to the data migration request, where the number of data writing services started is the same as the number of child nodes, and each child node corresponds to one data writing service. For example, if the first cluster has 10 child nodes, 10 data writing services need to be started, and each child node writes its own data into the HDFS system through the started data writing service.
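  • A minimal sketch of this one-service-per-node relationship (Python; start_service and stop_service are hypothetical helpers that connect or disconnect one child node and the HDFS system):

    def start_write_services(child_nodes, start_service):
        # Step S200 sketch: one data writing service per child node of the first cluster,
        # so the number of services equals the number of child nodes (10 nodes -> 10 services).
        return {node: start_service(node) for node in child_nodes}

    def cancel_write_services(services, stop_service):
        # Step S203 sketch: once the data has been written, cancel every service to free resources.
        for node, handle in services.items():
            stop_service(handle)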
  • A configuration file is preset for the data writing service, and by reading the configuration file the data writing service can obtain the specified path for writing data into the HDFS system, where the specified path indicates the storage path of the data in the HDFS system.
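  • For example, the preset configuration file could look like the sketch below (the INI layout and key names are assumptions; the method only requires that the specified path can be obtained from the file):

    import configparser

    # Assumed layout of the preset configuration file.
    SAMPLE_CONFIG = "[hdfs]\nspecified_path = /migration/job_001\n"

    def load_specified_path(config_text=SAMPLE_CONFIG):
        parser = configparser.ConfigParser()
        parser.read_string(config_text)
        return parser["hdfs"]["specified_path"]

    print(load_specified_path())   # -> /migration/job_001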
  • Step S201: read the data distributed in each child node in the form of data tables, and write the respective data in parallel, in the form of data files, to the specified path of the HDFS system through the data writing service according to the specified path.
  • Each child node stores only part of the data of a data table; the data of one data table is stored distributed across the child nodes.
  • For example, the first cluster includes a master node and 10 child nodes, and the data of data table A is distributed among the 10 child nodes, represented as A1, A2, ..., A10.
  • Each child node can store data of multiple data tables. For example, child node 1 can store partial data of data tables A, B, C, and D, namely A1, B1, C1, and D1. Each child node can read the data of one or more data tables, for example A1, B1, and C1, and then write it in parallel, in the form of data files, to the HDFS system through the started data writing service according to the specified path.
  • When each child node stores data of a large number of data tables, a data table storage directory is automatically created under the specified path for each different data table, where the directory name of the data table storage directory includes at least a data table identifier, and the data is written into the HDFS system as data files through the data writing service according to the data table storage directories.
  • When each child node stores data of many data tables, a data table storage directory is automatically created under the specified path for each data table, where the directory name of the data table storage directory includes at least a data table identifier, for example the data table name, so that the data of each data table can be quickly identified from the data table identifier; for example, data table storage directories named A, B, C, and D are automatically created under the specified path.
  • Since each child node stores only part of the data of a data table, for each data table and according to the data table storage directory, the part of the data of that data table stored by the child node is written into the HDFS system as a separate data file by the data writing service.
  • For example, the data of data table A is distributed across 10 child nodes and represented as A1, A2, ..., A10. The partial data A1, A2, ..., A10 can therefore each be stored as a separate data file in the data table storage directory named A; that is, 10 data files are stored in the data table storage directory named A.
  • The data file in the HDFS system is named with the data table identifier and the child node identifier, and carries data write time information, such as a timestamp indicating the time of the current data write, for example a current write time of 2017-6-29.
  • the number of data files stored in the HDFS system is related to the number of data tables and the number of child nodes.
  • For example, if the number of data tables is 10 and the number of child nodes is 10, the number of data files stored in the HDFS system is 10*10, that is, 100; this is only an example and does not have any limiting effect.
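  • Taken together, the per-table storage directories and the file naming described above can be sketched as follows (illustrative Python; the concrete name format and the path value are assumptions, the description only requires the table identifier, the child node identifier and the write time to be carried):

    import datetime
    import posixpath

    def data_file_path(specified_path, table_id, node_id, write_time=None):
        # One child node's part of one data table is stored as
        # <specified_path>/<table_id>/<table_id>_<node_id>_<timestamp> (assumed format).
        write_time = write_time or datetime.datetime.now()
        timestamp = write_time.strftime("%Y%m%d")
        table_dir = posixpath.join(specified_path, table_id)   # per-table storage directory
        return posixpath.join(table_dir, f"{table_id}_{node_id}_{timestamp}")

    # With 10 data tables and 10 child nodes, 10 * 10 = 100 data files are produced.
    paths = [data_file_path("/migration/job_001", f"T{t}", f"node{n}")
             for t in range(10) for n in range(10)]
    print(len(paths))   # 100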
  • Step S202: the primary node of the first cluster backs up the data table structure locally in the form of a table file, and sends the table file to the primary node of the second cluster, so that the primary node of the second cluster synchronizes the data table structure to each child node in the second cluster.
  • In each child node of the first cluster, data is stored in the form of data tables, and the data table structure of a data table defines its fields, types, primary keys, foreign keys, indexes, and the like; therefore, before the data is migrated to the second cluster, the data table structure of the data table needs to be migrated to the second cluster.
  • The primary node of the first cluster stores the data table structures of all the data tables. Therefore, the primary node of the first cluster can back up the data table structures to the local node in the form of table files, then send the table files to the primary node of the second cluster, and the primary node of the second cluster synchronizes the data table structures to each child node in the second cluster.
  • Step S203: after the data writing is completed, cancel the data writing service for connecting each child node in the first cluster with the HDFS system.
  • The data writing service serves the data writing; once the data has been written, the data writing service has fulfilled its purpose. To save resources, the data writing service for connecting each child node in the first cluster with the HDFS system can be cancelled.
  • Step S204: the HDFS system performs compression processing on each data file in the specified path to obtain compressed data files.
  • The HDFS system can compress each data file in the specified path and store the compressed data files.
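  • A minimal sketch of this compression step, assuming gzip as the codec and assuming the data files are reachable as ordinary paths (neither assumption is fixed by the disclosure):

    import glob
    import gzip
    import os
    import shutil

    def compress_data_files(specified_path):
        # Step S204 sketch: compress every data file found in the per-table directories
        # under the specified path and keep only the compressed copy.
        for path in glob.glob(os.path.join(specified_path, "*", "*")):
            if path.endswith(".gz"):
                continue   # already compressed
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)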
  • Step S205: start a data reading service for connecting each child node in the second cluster with the HDFS system according to the data migration request.
  • After receiving the data migration request, each child node in the second cluster starts a data reading service for connecting that child node to the HDFS system according to the data migration request, where the number of data reading services started is the same as the number of child nodes, and each child node corresponds to one data reading service. For example, if the second cluster has 5 child nodes, 5 data reading services need to be started, and each child node reads data through the started data reading service and stores it.
  • A configuration file is preset for the data reading service, and by reading the configuration file the data reading service can obtain the specified path for reading data in the HDFS system, where the specified path indicates the storage path of the data in the HDFS system.
  • the data can be migrated to any cluster system.
  • The number of child nodes included in the cluster system is not limited; that is, the number of child nodes in the second cluster can be the same as or different from the number of child nodes in the first cluster. For example, the number of child nodes in the second cluster may be greater or smaller than the number of child nodes in the first cluster.
  • Step S206: read the data files in the HDFS system through the data reading service according to the specified path.
  • the data reading service is pre-configured with a specified path for reading data in the HDFS system. Therefore, each child node in the second cluster can read the data file in the HDFS system through the data reading service according to the specified path.
  • each child node can read data files in parallel, and can also read data files of multiple data tables in parallel, thereby improving the efficiency of data migration and saving time required for data migration.
  • Step S207: decompress the read data files.
  • Since the data files read by each child node are compressed, decompression needs to be performed first to obtain the decompressed data files.
  • Step S208: determine, according to the data redistribution policy, whether each data fragment in the data file belongs to the data to be stored by the child node; if yes, execute step S209; if not, execute step S210.
  • When data is written, each data file stores a plurality of data fragments. Therefore, after a child node reads a data file, it must also determine whether the data in the data file belongs to the data that this child node needs to store.
  • According to the data redistribution policy, the child node may sequentially determine whether each data fragment in the data file belongs to the data it should store; if a data fragment does not belong to the data to be stored by this child node, the data fragment is distributed to the corresponding child node for storage, and if it does belong to the data to be stored by this child node, the child node stores the corresponding data fragment.
  • Specifically, the following method may be used to determine whether each data fragment in the data file belongs to the data to be stored by the child node: determine the data belonging to the preset distribution column in the data fragment; hash the data of the distribution column to obtain a hash value; and, according to the hash value, determine whether the data fragment belongs to the data to be stored by the child node.
  • The data belonging to the preset distribution column is hashed to obtain a hash value; for example, the MD5 algorithm or the SHA-1 algorithm may be used to hash the data of the preset distribution column, which is only an example and does not have any limiting effect. Then, according to the hash value, it is judged whether each data fragment in the data file belongs to the data to be stored by the child node.
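  • A minimal sketch of this check (Python; the modulo mapping from hash value to child node and the column and field names are assumptions, the disclosure only states that the decision is made according to the hash value, obtained for example with MD5 or SHA-1):

    import hashlib

    def owner_node(fragment, distribution_columns, num_nodes):
        # Take the data belonging to the preset distribution column(s) of the fragment.
        key = "|".join(str(fragment[col]) for col in distribution_columns)
        # Hash it; MD5 is used here, SHA-1 would work the same way.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        # Assumed mapping from hash value to a child node index: modulo the node count.
        return int(digest, 16) % num_nodes

    def belongs_to_this_node(fragment, distribution_columns, num_nodes, this_node_index):
        # Step S208 sketch: decide whether this child node should store the fragment.
        return owner_node(fragment, distribution_columns, num_nodes) == this_node_index

    # Example: a fragment read from a data file, with "user_id" as the distribution column.
    fragment = {"user_id": 42, "name": "alice"}
    print(owner_node(fragment, ["user_id"], num_nodes=5))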
  • Step S209: the corresponding data fragment is stored by the child node.
  • Step S210: the data fragment is distributed to the corresponding child node for storage.
  • Specifically, the data fragment may be distributed to the corresponding child node for storage according to the hash value.
  • Step S211: after the data reading is completed, cancel the data reading service for connecting each child node in the second cluster with the HDFS system.
  • The data reading service serves the data reading; once the data reading is completed, the data reading service has fulfilled its purpose. To save resources, the data reading service for connecting each child node in the second cluster with the HDFS system can be cancelled.
  • In this embodiment, each child node in the first cluster writes its respective data in parallel to the specified path of the intermediate storage system, and each child node in the second cluster reads data from the specified path of the intermediate storage system in parallel and stores it according to the data redistribution policy of the second cluster; there is no need to start a transmission process for each data table, which improves the data migration speed and reduces the time required for data migration.
  • Because the second cluster reads data from the intermediate storage system, the solution is applicable to data migration between any two clusters; it is not limited to migrating data from a cluster with fewer child nodes to a cluster with more child nodes, or to migration between clusters with the same number of child nodes, and therefore has a wide range of application. It overcomes the defect in the prior art that data of a cluster with a larger number of child nodes cannot be migrated to a cluster with a smaller number of child nodes. In addition, the data table structure is backed up locally and then transferred to the second cluster, so there is no defect of data tables that cannot be migrated between clusters.
  • FIG. 4 is a schematic structural diagram of an inter-cluster data migration system according to Embodiment 4 of the present disclosure.
  • the system is configured to perform data migration between the first cluster and the second cluster.
  • The system includes: a first cluster 300, a second cluster 310, and an intermediate storage system 320, where the first cluster includes at least one child node 302 and the second cluster includes at least one child node 312.
  • Each child node in the first cluster is adapted to write the respective data to the specified path of the intermediate storage system in parallel according to the data migration request;
  • Each child node in the second cluster is adapted to read data from the specified path of the intermediate storage system and store it in parallel according to the data redistribution policy of the second cluster.
  • The first cluster may further include a master node 301.
  • The second cluster may further include a master node 311.
  • The master node of the first cluster is adapted to back up the data table structure to the local node in the form of a table file and send the table file to the primary node of the second cluster, so that the primary node of the second cluster synchronizes the data table structure to each child node in the second cluster.
  • each of the child nodes in the first cluster is further adapted to: initiate, according to the data migration request, a data writing service for connecting each child node in the first cluster with the intermediate storage system, where the data writing service is pre- Configured with a specified path for writing data to the intermediate storage system;
  • The data distributed in each child node in the form of data tables is read, and the respective data is written in parallel, in the form of data files, to the specified path of the intermediate storage system by the data writing service according to the specified path.
  • Each child node in the first cluster is further adapted to: after the data writing is completed, cancel the data writing service for connecting each child node in the first cluster with the intermediate storage system.
  • each sub-node in the first cluster is further adapted to automatically create a data table storage directory under a specified path for different data tables, where the directory name of the data table storage directory at least includes a data table identifier;
  • the respective data is written in parallel to the specified path of the intermediate storage system in the form of a data file by the data writing service.
  • In the case where each child node stores only part of the data of a data table, each child node in the first cluster is further adapted to: write, for each data table, the part of the data of that data table stored by the child node into the specified path of the intermediate storage system through the data writing service, in the form of a separate data file.
  • the intermediate storage system is configured to perform compression processing on each data file in the specified path to obtain a compressed data file.
  • the data file in the intermediate storage system is named by the data table identifier and the child node identifier, and carries data writing time information.
  • the number of data files stored in the intermediate storage system is related to the number of data tables and the number of child nodes.
  • each sub-node in the second cluster is further adapted to: start, according to the data migration request, a data reading service for connecting each sub-node in the second cluster with the intermediate storage system, where the data reading service is pre- Configured with a specified path for reading data in the intermediate storage system;
  • Each child node in the second cluster is further adapted to: sequentially determine, according to the data redistribution policy, whether each data fragment in the read data file belongs to the data to be stored by that child node; if yes, the child node stores the corresponding data fragment; if not, the data fragment is distributed to the corresponding child node for storage.
  • Each child node in the second cluster is further adapted to: determine the data belonging to the preset distribution column in the data fragment, hash the data of the distribution column to obtain a hash value, and determine, according to the hash value, whether the data fragment belongs to the data to be stored by the child node.
  • The distribution unit is further adapted to: if a data fragment in the data file does not belong to the data to be stored by the child node, distribute the data fragment to the corresponding child node for storage according to the hash value.
  • each child node in the second cluster is further adapted to: perform decompression processing on the read data file.
  • each child node in the second cluster is further adapted to: after the data reading is completed, cancel the data reading service for connecting each child node in the second cluster with the intermediate storage system.
  • the number of child nodes in the first cluster is greater than the number of child nodes in the second cluster.
  • the intermediate storage system includes: an HDFS system.
  • In this system, each child node in the first cluster writes its respective data in parallel to the specified path of the intermediate storage system, and each child node in the second cluster reads data from the specified path of the intermediate storage system in parallel and stores it according to the data redistribution policy of the second cluster, which improves the data migration speed and reduces the time required for data migration.
  • Because the second cluster reads data from the intermediate storage system, the system is applicable to data migration between any two clusters; it is not limited to migrating data from a cluster with fewer child nodes to a cluster with more child nodes, or to migration between clusters with the same number of child nodes. It therefore has a wide range of application, and there is no defect of data tables that cannot be migrated between clusters.
  • The fifth embodiment provides a non-transitory computer readable storage medium storing at least one executable instruction, the executable instruction being capable of executing the inter-cluster data migration method in any of the above method embodiments.
  • FIG. 6 is a schematic structural diagram of a server according to Embodiment 6 of the present disclosure, and the specific embodiment of the present disclosure does not limit the specific implementation of the server.
  • the server may include a processor 402, a Communications Interface 404, a memory 406, and a communication bus 408.
  • the processor 402, the communication interface 404, and the memory 406 complete communication with one another via the communication bus 408.
  • the communication interface 404 is configured to communicate with network elements of other devices, such as clients or other servers.
  • the processor 402 is configured to execute the program 410, and specifically, the related steps in the foregoing embodiment of the inter-cluster data migration method.
  • program 410 can include program code, the program code including computer operating instructions.
  • the processor 402 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
  • the server includes one or more processors, which may be the same type of processor, such as one or more CPUs; or may be different types of processors, such as one or more CPUs and one or more ASICs.
  • the memory 406 is configured to store the program 410.
  • Memory 406 may include high speed RAM memory and may also include non-volatile memory, such as at least one disk memory.
  • the program 410 may be specifically configured to cause the processor 402 to perform the following operations: according to the data migration request, each sub-node in the first cluster writes the respective data in parallel to the specified path of the intermediate storage system;
  • The primary node of the first cluster backs up the data table structure to the local node in the form of a table file, and sends the table file to the primary node of the second cluster, so that the primary node of the second cluster synchronizes the data table structure to each child node in the second cluster;
  • the data is read from the specified path of the intermediate storage system and stored in parallel by the respective child nodes in the second cluster in accordance with the data redistribution policy of the second cluster.
  • The program 410 is further configured to cause the processor 402 to perform the following operations so that each child node in the first cluster writes its respective data in parallel to the specified path of the intermediate storage system according to the data migration request: start a data writing service for connecting each child node in the first cluster with the intermediate storage system, wherein the data writing service is preconfigured with a specified path for writing data into the intermediate storage system;
  • the data distributed in each child node in the form of data tables is read, and the respective data is written in parallel, in the form of data files, to the specified path of the intermediate storage system by the data writing service according to the specified path.
  • The program 410 is further configured to cause the processor 402 to perform the following operation: after the data writing is completed, cancel the data writing service for connecting each child node in the first cluster with the intermediate storage system.
  • The program 410 is further configured to cause the processor 402 to perform the following operations when the respective data is written in parallel, in the form of data files, to the specified path of the intermediate storage system through the data writing service according to the specified path: automatically create a data table storage directory under the specified path for different data tables, where the directory name of the data table storage directory includes at least a data table identifier; and write the respective data in parallel, in the form of data files, to the specified path of the intermediate storage system through the data writing service according to the data table storage directories.
  • In the case where each child node stores only part of the data of a data table, the program 410 is further configured to cause the processor 402 to perform the following operation when the respective data is written in parallel, in the form of data files, to the specified path of the intermediate storage system through the data writing service: for each data table, write the part of the data of that data table stored by the child node into the specified path of the intermediate storage system through the data writing service, in the form of a separate data file.
  • the program 410 is further configured to cause the processor 402 to perform compression processing on each data file in the specified path to obtain a compressed data file.
  • The data file in the intermediate storage system is named with the data table identifier and the child node identifier, and carries the data write time information.
  • the number of data files stored in the intermediate storage system is related to the number of data tables and the number of child nodes.
  • The program 410 is further configured to cause the processor 402 to perform the following operations when data is read from the specified path of the intermediate storage system and stored in parallel by the respective child nodes in the second cluster according to the data redistribution policy of the second cluster: sequentially determine whether each data fragment in the read data file belongs to the data to be stored by the child node; if yes, the child node stores the corresponding data fragment; if not, the data fragment is distributed to the corresponding child node for storage.
  • The program 410 is further configured to cause the processor 402 to perform the following operation when sequentially determining, according to the data redistribution policy, whether each data fragment in the data file belongs to the data to be stored by the child node: if a data fragment does not belong to the data to be stored by the child node, distribute the data fragment to the corresponding child node for storage according to the hash value.
  • the program 410 is further configured to cause the processor 402 to perform the following operations: decompressing the read data file.
  • the program 410 is further configured to cause the processor 402 to perform the following operations: after the data reading is completed, cancel the data reading for connecting each sub-node in the second cluster with the intermediate storage system. service.
  • the number of child nodes in the first cluster is greater than the number of child nodes in the second cluster.
  • the intermediate storage system includes: an HDFS system.
  • modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and further they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • All features disclosed in this specification (including the accompanying claims, the abstract, and the drawings), and all processes or units of any method or device so disclosed, may be combined in any combination.
  • Each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by alternative features that provide the same, equivalent or similar purpose.
  • Various component embodiments of the present disclosure may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • a microprocessor or digital signal processor may be used in practice to implement some or all of the functionality of some or all of the components of the inter-cluster data migration device in accordance with embodiments of the present disclosure.
  • the present disclosure may also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • Such a program implementing the present disclosure may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • the present disclosure can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer.
  • several of these means can be embodied by the same hardware item.
  • the use of the words first, second, and third does not indicate any order. These words can be interpreted as names.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an inter-cluster data migration method and system, a server, and a non-volatile computer readable storage medium. The method is used for data migration between a first cluster and a second cluster, the first cluster and the second cluster each comprising at least one child node. The method comprises: according to a data migration request, writing respective data in parallel to a specified path of an intermediate storage system by each child node in the first cluster (S100); and, according to a data redistribution policy of the second cluster, reading and storing the data from the specified path of the intermediate storage system in parallel by each child node in the second cluster (S102).
PCT/CN2018/079027 2017-06-30 2018-03-14 Method and system for inter-cluster data migration, server and computer storage medium WO2019001017A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710555588.0A CN107391629B (zh) 2017-06-30 2017-06-30 集群间数据迁移方法、系统、服务器及计算机存储介质
CN201710555588.0 2017-06-30

Publications (1)

Publication Number Publication Date
WO2019001017A1 true WO2019001017A1 (fr) 2019-01-03

Family

ID=60335391

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079027 WO2019001017A1 (fr) 2018-03-14 Method and system for inter-cluster data migration, server and computer storage medium

Country Status (2)

Country Link
CN (1) CN107391629B (fr)
WO (1) WO2019001017A1 (fr)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391629B (zh) * 2017-06-30 2021-01-29 三六零科技集团有限公司 集群间数据迁移方法、系统、服务器及计算机存储介质
CN108052664A (zh) * 2017-12-29 2018-05-18 北京小度信息科技有限公司 数据库存储集群的数据迁移方法和装置
CN110928943B (zh) * 2018-08-29 2023-06-20 阿里云计算有限公司 一种分布式数据库及数据写入方法
CN109298974B (zh) * 2018-09-30 2023-04-07 平安科技(深圳)有限公司 系统控制方法、装置、计算机及计算机可读存储介质
CN111444008B (zh) * 2018-12-29 2024-04-16 北京奇虎科技有限公司 集群间服务迁移方法及装置
CN110287060B (zh) * 2019-06-06 2021-06-22 郑州阿帕斯科技有限公司 一种数据的处理方法、装置
CN111258985A (zh) * 2020-01-17 2020-06-09 中国工商银行股份有限公司 数据集群迁移方法及装置
CN111459411B (zh) * 2020-03-30 2023-07-21 北京奇艺世纪科技有限公司 数据迁移方法、装置、设备及存储介质
CN111708763B (zh) * 2020-06-18 2023-12-01 北京金山云网络技术有限公司 分片集群的数据迁移方法、装置和分片集群系统
CN112035064B (zh) * 2020-08-28 2024-09-20 浪潮云信息技术股份公司 一种用于对象存储的分布式迁移方法
CN112506606A (zh) * 2020-11-23 2021-03-16 北京达佳互联信息技术有限公司 集群中容器的迁移方法、装置、设备和介质
CN112861188A (zh) * 2021-02-01 2021-05-28 青岛易来智能科技股份有限公司 用于多集群的数据汇集系统和方法
CN113050890B (zh) * 2021-03-26 2024-10-18 北京沃东天骏信息技术有限公司 一种数据迁移方法和装置
CN114615263A (zh) * 2022-02-10 2022-06-10 深圳市小满科技有限公司 集群在线迁移方法、装置、设备及存储介质
CN115103020B (zh) * 2022-08-25 2022-11-15 建信金融科技有限责任公司 数据迁移处理方法和装置
CN115905167B (zh) * 2022-11-10 2023-11-21 上海威固信息技术股份有限公司 一种可快速迁移数据的智能化数据存储方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6499058B1 (en) * 1999-09-09 2002-12-24 Motokazu Hozumi File shared apparatus and its method file processing apparatus and its method recording medium in which file shared program is recorded and recording medium in which file processing program is recorded
CN102737130A (zh) * 2012-06-21 2012-10-17 广州从兴电子开发有限公司 处理hdfs元数据的方法及系统
CN103365740A (zh) * 2012-04-06 2013-10-23 腾讯科技(深圳)有限公司 一种数据冷备方法及装置
CN107391629A (zh) * 2017-06-30 2017-11-24 北京奇虎科技有限公司 集群间数据迁移方法、系统、服务器及计算机存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014057520A1 (fr) * 2012-10-11 2014-04-17 Hitachi, Ltd. Serveur de fichiers de destination de migration et procédé de migration d'un système de fichiers
US9159149B2 (en) * 2013-03-14 2015-10-13 International Business Machines Corporation Visualizing data transfers in distributed file system
CN103500146B (zh) * 2013-09-30 2016-04-27 北京邮电大学 虚拟机磁盘存储数据迁移方法和系统
CN106708902A (zh) * 2015-11-18 2017-05-24 青岛海日安电子有限公司 数据库数据迁移方法及系统
CN106777225B (zh) * 2016-12-26 2021-04-06 腾讯科技(深圳)有限公司 一种数据的迁移方法和系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6499058B1 (en) * 1999-09-09 2002-12-24 Motokazu Hozumi File shared apparatus and its method file processing apparatus and its method recording medium in which file shared program is recorded and recording medium in which file processing program is recorded
CN103365740A (zh) * 2012-04-06 2013-10-23 腾讯科技(深圳)有限公司 一种数据冷备方法及装置
CN102737130A (zh) * 2012-06-21 2012-10-17 广州从兴电子开发有限公司 处理hdfs元数据的方法及系统
CN107391629A (zh) * 2017-06-30 2017-11-24 北京奇虎科技有限公司 集群间数据迁移方法、系统、服务器及计算机存储介质

Also Published As

Publication number Publication date
CN107391629A (zh) 2017-11-24
CN107391629B (zh) 2021-01-29

Similar Documents

Publication Publication Date Title
WO2019001017A1 (fr) Procédé et système de transfert de données entre groupes, serveur et support de stockage informatique
US10853242B2 (en) Deduplication and garbage collection across logical databases
CN106777225B (zh) 一种数据的迁移方法和系统
WO2016197994A1 (fr) Procédé et dispositif d'extension de capacité
US10291696B2 (en) Peer-to-peer architecture for processing big data
WO2016180055A1 (fr) Procédé, dispositif et système de stockage et de lecture de données
US20150074671A1 (en) Anticipatory warm-up of cluster resources for jobs processed on multiple cluster nodes
US9952940B2 (en) Method of operating a shared nothing cluster system
US10157214B1 (en) Process for data migration between document stores
US11245774B2 (en) Cache storage for streaming data
CN109643310B (zh) 用于数据库中数据重分布的系统和方法
CN108228102B (zh) 节点间数据迁移方法、装置、计算设备及计算机存储介质
CN112199427A (zh) 一种数据处理方法和系统
US20170161313A1 (en) Detection and Resolution of Conflicts in Data Synchronization
US9984139B1 (en) Publish session framework for datastore operation records
US10298709B1 (en) Performance of Hadoop distributed file system operations in a non-native operating system
CN110858194A (zh) 一种数据库扩容的方法和装置
CN107391033B (zh) 数据迁移方法及装置、计算设备、计算机存储介质
WO2019001021A1 (fr) Procédé, appareil et système de traitement de données, serveur et support de stockage informatique
CN111225003B (zh) 一种nfs节点配置方法和装置
TW201738781A (zh) 資料表連接方法及裝置
CN110532123A (zh) HBase系统的故障转移方法及装置
JP7038864B2 (ja) 検索サーバの集中型ストレージ
US8914324B1 (en) De-duplication storage system with improved reference update efficiency
US11157456B2 (en) Replication of data in a distributed file system using an arbiter

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18824632

Country of ref document: EP

Kind code of ref document: A1