WO2021238701A1 - Data migration method and apparatus - Google Patents

Data migration method and apparatus

Info

Publication number
WO2021238701A1
Authority
WO
WIPO (PCT)
Prior art keywords
database
target
replica
data migration
data
Prior art date
Application number
PCT/CN2021/094094
Other languages
English (en)
French (fr)
Inventor
李鑫
潘岳
张浩然
郑博文
李飞飞
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2021238701A1
Priority to US18/070,450 (US20230087447A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/214: Database migration support
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275: Synchronous replication

Definitions

  • the embodiments of this specification relate to the field of database technology, and in particular to a data migration method.
  • One or more embodiments of this specification simultaneously relate to a data migration device, a computing device, and a computer-readable storage medium.
  • the embodiment of this specification provides a data migration method.
  • One or more embodiments of this specification also involve a data migration device, a computing device, and a computer-readable storage medium, so as to solve the technical defects in the prior art.
  • a data migration method including:
  • routing configuration is performed on the target master replica database of the target database.
  • the source database is composed of the master replica database and at least one slave replica database;
  • the target database is composed of at least two replica databases.
  • the data migration log is synchronized to the target database in the following manner:
  • the synchronization of the data migration log to the target database according to the upstream and downstream connection relationship of each node in the log synchronization link includes:
  • the data migration log is synchronized from the at least one replica database to the at least two replica databases.
  • the performing routing configuration on the target master replica database of the target database according to the routing rules of the master replica database includes:
  • the target master replica database is determined in the following manner:
  • the creating a snapshot of the data to be migrated in the source database and migrating the snapshot to the target database includes:
  • the snapshot is migrated to the target database according to a preset migration manner.
  • the routing rules of the master replica database are performed to the target database.
  • the method further includes:
  • a data migration device including:
  • the obtaining module is configured to obtain the migration request for the source database
  • a creation module configured to create a snapshot of the data to be migrated in the source database, and migrate the snapshot to the target database;
  • a reading module configured to read the data migration log stored in the primary replica database of the source database, and synchronize with the target database
  • the configuration module is configured to perform routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
  • a computing device including:
  • the memory is used to store computer-executable instructions
  • the processor is used to execute the computer-executable instructions:
  • routing configuration is performed on the target master replica database of the target database.
  • a computer-readable storage medium which stores computer-executable instructions, which implement the steps of the data migration method when the instructions are executed by a processor.
  • An embodiment of this specification obtains a migration request for the source database, creates a snapshot of the data to be migrated in the source database, migrates the snapshot to the target database, reads the data migration log stored in the master replica database of the source database and synchronizes it to the target database, and performs routing configuration on the target master replica database of the target database according to the routing rules of the master replica database;
  • the data migration log is synchronized incrementally, and the route switch is performed during the log synchronization process.
  • On the one hand, this does not block the synchronization link used for incremental synchronization, which helps improve the efficiency of log synchronization; on the other hand, no route switch is performed during the full migration, so data can be read and written normally during the full migration, which helps improve data read and write efficiency.
  • FIG. 1 is a processing flowchart of a data migration method provided by an embodiment of this specification
  • Figure 2 is a schematic diagram of a database expansion process provided by an embodiment of this specification
  • Fig. 3 is a process flow chart of a data migration method provided by an embodiment of this specification.
  • FIG. 4 is a schematic diagram of a data migration device provided by an embodiment of this specification.
  • Fig. 5 is a structural block diagram of a computing device provided by an embodiment of this specification.
  • first, second, etc. may be used to describe various information in one or more embodiments of this specification, the information should not be limited to these terms. These terms are only used to distinguish the same type of information from each other.
  • the first may also be referred to as the second, and similarly, the second may also be referred to as the first.
  • The word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".
  • Raft algorithm: a consensus algorithm that provides a general method for distributing a state machine across a cluster of computing systems, ensuring that every node in the cluster agrees on the same sequence of state transitions.
  • Data mirroring backup tool (rsync): an effective data transfer and file synchronization tool, written in C, that is widely integrated and used in Unix-like operating systems.
  • a data migration method is provided.
  • This specification also relates to a data migration device, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments.
  • FIG. 1 shows a processing flowchart of a data migration method according to an embodiment of the present specification, including step 102 to step 108.
  • Step 102 Obtain a migration request for the source database.
  • the data migration method provided in the embodiment of this specification is applied to a distributed database system, and the data to be migrated in the source database is migrated to the target database in a snapshot export mode.
  • the data migration log is synchronized by incremental synchronization.
  • Performing the route switch during the log synchronization process, on the one hand, does not block the synchronization link used for incremental synchronization, which helps improve the efficiency of log synchronization; on the other hand, no route switch is performed during the full migration, so data can be read and written normally during the full migration, which helps improve data read and write efficiency.
  • To ensure high availability while the distributed database system is linearly scaled out or in, the data migration method provided in the embodiment of this specification uses a full-plus-incremental data migration approach, so that neither the migration process nor the route switching process affects system availability.
  • the data migration is to migrate the data in the source database to the target database;
  • the source database is composed of the master replica database and at least one slave replica database, and the target database is composed of at least two replica databases;
  • a snapshot of the full amount of data in the source database can be created, and all the data in the source database can be migrated to the target database by means of snapshot migration.
  • Step 104 Create a snapshot of the data to be migrated in the source database, and migrate the snapshot to the target database.
  • the source database contains a master replica database and at least one slave replica database
  • a snapshot of the data to be migrated in the source database is created and migrated to the target database; that is, a snapshot of all the data to be migrated is created from the master replica database or any slave replica database of the source database, and the snapshot is migrated to the target database according to a preset migration manner.
  • Because the master replica database and the at least one slave replica database in the source database store the same data, only the data stored in one of them needs to be migrated. Therefore, creating a snapshot of the data to be migrated in the source database means creating a snapshot of all the data to be migrated from the master replica database or any slave replica database of the source database. After the snapshot is created, it is migrated in parallel to the at least two replica databases of the target database according to the preset migration manner.
  • the preset migration manner may use a data mirroring backup tool (the rsync tool), that is, all the data is migrated using the rsync tool.
  • Before the data migration is complete, the source database provides the data read and write functions. Therefore, before the route is switched from the source database to the target database, the master replica database and the at least one slave replica database in the source database are in the active state (actively processing data read and write requests), and the at least two replica databases in the target database are in the passive state (idle), ready to become active.
  • Step 106 Read the data migration log stored in the primary replica database of the source database, and synchronize with the target database.
  • the data migration log is a file separate from the database files. It stores all changes made to the database, recording all inserts, updates, deletes, commits, rollbacks, and database schema changes.
  • the data migration log is an important component for data/file backup and recovery. Therefore, after the data in the source database has been fully migrated to the target database, the data migration log needs to be incrementally migrated (synchronized) to the target database.
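  • The full-plus-incremental idea described above can be illustrated with a minimal in-memory sketch. The dictionary layout (`data`, `log`), the `(op, key, value)` entry format, and the function names are assumptions made for illustration only; they do not appear in the specification.

```python
import copy


def apply_entry(data, entry):
    """Apply one log entry (op, key, value) to a key-value store."""
    op, key, value = entry
    if op == "put":
        data[key] = value
    elif op == "delete":
        data.pop(key, None)


def migrate(source, target, writes_during_copy):
    """Full migration from a snapshot, then incremental replay of the log."""
    # Full phase: snapshot the data and remember the log position it reflects.
    snapshot = copy.deepcopy(source["data"])
    snapshot_pos = len(source["log"])

    # The source stays writable while the snapshot is being copied.
    for entry in writes_during_copy:
        source["log"].append(entry)
        apply_entry(source["data"], entry)

    target["data"] = snapshot

    # Incremental phase: replay only the entries written after the snapshot.
    for entry in source["log"][snapshot_pos:]:
        apply_entry(target["data"], entry)
    return target
```

  • In this sketch the writes that arrive while the snapshot is copied reach the target only through the log replay, which is why the source can keep serving reads and writes during the full phase.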
  • the raft algorithm can be used to synchronize the data migration log.
  • the raft algorithm divides the roles in the distributed database system into leader (leader node), follower (follower node) and candidate (candidate node).
  • the master replica database contained in the source database is the leader node
  • the at least one slave replica database contained in the source database is the follower node. Because no target master replica database exists in the target database before the data migration log is synchronized to the target database, the at least two replica databases contained in the target database are in the passive state at that point.
  • After the data migration log is synchronized to the target database and the state of the at least two replica databases in the target database changes from passive to active, the at least two replica databases can serve as switching targets for the leader node (that is, the at least two replica databases can serve as candidate nodes).
  • the leader node is used to receive data read and write requests sent by the client, and synchronize the data migration log to the follower node.
  • the leader node can send the follower node a notification to commit the log;
  • the follower node receives and persists the data migration log synchronized by the leader node and commits the log after receiving the leader node's commit notification; the candidate node is a temporary role in the leader node election process.
  • the state of the leader node and follower node in the source database is active, and the state of at least two replica databases in the target database is passive.
  • Before the switch, the at least two replica databases contained in the target database are all in the passive state.
  • the at least two replica databases can serve as candidate nodes: when the log synchronization progress meets the preset threshold, the leader node and follower node of the source database vote among the at least two replica databases in the target database, that is, they elect the target master replica database from the candidate nodes.
  • the raft-log synchronization mechanism can be used to synchronize data migration logs. Since there can be only one leader node in a raft distributed database system, the log can only be copied from the leader node to the follower nodes.
  • One application scenario of the raft algorithm is the replicated state machine.
  • When a client (participant) sends a command, the raft algorithm is responsible for copying the command to the other state machines in the form of log entries. If the state machines start from a consistent state and execute these commands in the same order, they produce the same output. Therefore, the consensus algorithm can be used to ensure that the content and order of the synchronized data migration log are consistent.
  • the data migration log can be synchronized through a log synchronization link. Therefore, before the log is synchronized, a log synchronization link needs to be established, and the log is then synchronized to the target database through that link.
  • This can be implemented as follows:
  • synchronizing the data migration log to the target database according to the upstream and downstream connection relationship of each node in the log synchronization link can be specifically implemented in the following manner:
  • the data migration log is synchronized from the at least one slave replica database to the at least two replica databases.
  • the data migration log is a file separate from the database files. It stores all changes made to the database, recording inserts, updates, deletes, commits, rollbacks, and database schema changes.
  • the data migration log is an important component for data/file backup and recovery, and the log is stored in the leader node.
  • the leader node synchronizes the log to the follower node, and the follower node then synchronizes the log to the at least two replica databases of the target database. Therefore, before the data migration log is synchronized, a log synchronization link must be established based on the master replica database, the at least one slave replica database, the at least two replica databases, and the order in which the data migration log is synchronized among these nodes. After the log synchronization link is established, the data migration log can be synchronized to the target database according to the upstream and downstream connection relationship of each node in the link.
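  • A minimal sketch of such a log synchronization link is given here: leader, then source followers, then target replicas. The `Node` class, the round-robin assignment of target replicas to source followers, and the assumption that every downstream log is a prefix of its upstream log are illustrative choices, not details taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)
    downstream: list = field(default_factory=list)


def build_sync_link(leader, followers, target_replicas):
    """Wire the link: leader -> source followers -> target replicas."""
    leader.downstream = list(followers)
    for follower in followers:
        follower.downstream = []
    # Assumption: target replicas are spread round-robin over the followers.
    for i, replica in enumerate(target_replicas):
        followers[i % len(followers)].downstream.append(replica)


def push_log(node):
    """Send each downstream node the entries it is missing, then recurse."""
    for child in node.downstream:
        # Assumption: child.log is always a prefix of node.log.
        child.log.extend(node.log[len(child.log):])
        push_log(child)
```

  • Calling `push_log(leader)` after `build_sync_link(...)` walks the link from upstream to downstream, so a target replica only ever receives entries that its upstream source follower already holds.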
  • Step 108 Perform routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
  • routing can be configured for the replica database of the target database, that is, the routing of the master replica database in the source database is switched to the target master replica database of the target database, so that the target master replica database can provide data read and write functions.
  • the synchronization progress of the data migration log can be determined in the following ways:
  • routing configuration is performed for the target master replica database of the target database according to the routing rules of the master replica database in the source database.
  • judging whether the value of the synchronized log entries in the data migration log meets a preset threshold can be performed at the same time as the data migration log is synchronized to the target database.
  • If it is determined that the value of the synchronized log entries meets the preset threshold, the route can be switched during the log synchronization process, that is, switched from the source database to the target database; if it does not meet the preset threshold, log synchronization simply continues.
  • After receiving these requests, the leader node first synchronizes the requests to the follower nodes in the form of log entries. After receiving the follower nodes' responses confirming successful log replication, it updates the log position, calls the interface, executes the counting operation in the request, and applies the instruction sent by the client to the counter.
  • An updated log position means that the log entries up to and including that position have been copied to more than half of the nodes in the system. If the position is at "2", the log entries "0-2" have been copied to more than half of the nodes. If the leader then copies the two log entries "3" and "4" to the follower nodes in a batch, the position slides to "4", indicating that the log entries "0-4" have been copied to more than half of the nodes.
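  • The sliding log position can be computed from the highest position each node is known to hold. The sketch below is a simplification: the function name and the input list (one entry per node, including the leader itself) are assumptions, and the Raft rule that only entries from the leader's current term are committed this way is left out.

```python
def commit_position(replicated_positions):
    """Return the highest log position held by more than half of the nodes.

    replicated_positions holds one value per node (leader included): the
    highest log position that node is known to have stored.
    """
    ordered = sorted(replicated_positions, reverse=True)
    majority = len(ordered) // 2 + 1
    # The majority-th highest value is held by at least `majority` nodes.
    return ordered[majority - 1]
```

  • With three nodes, `commit_position([2, 2, 0])` is 2; once entries "3" and "4" reach one follower, `commit_position([4, 4, 2])` slides to 4, matching the example in the text.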
  • the leader node in the source database initiates the member change operation, which switches the state of the at least one slave replica database in the source database from active to passive, switches the state of the at least two replica databases in the target database from passive to active, and changes the role of the at least two replica databases to follower node.
  • the target master copy database of the target database can be determined through elections, which can be implemented in the following ways:
  • the statistics of the voting results are dynamically carried out along with the voting process.
  • the master replica database and the at least one slave replica database in the source database vote for the at least two replica databases in the target database, and the target master replica database of the target database is determined according to the voting results.
  • That is, during the voting process, the number of votes received by each of the at least two replica databases in the target database is counted dynamically, and the first of the at least two replica databases whose number of votes exceeds a preset threshold is determined as the target master replica database.
  • raft uses a heartbeat to trigger leader election, initializes the roles of at least two replica databases in the target database as follower nodes, and uses the master replica database in the source database and at least one slave replica database as voting members.
  • the follower nodes in the target database are elected to generate the target master replica database in the target database.
  • the voting results are counted, and the first follower node in the target database whose number of votes is greater than the preset threshold is determined as the leader node. In practical applications, the preset threshold can be determined based on the number of replica databases in the source database. For example, if the source database includes 3 replica databases (1 master replica database and 2 slave replica databases), the preset threshold can be set to 2 (a vote share greater than 50%), and the first follower node in the target database with a number of votes greater than 2 is determined as the leader node.
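  • Dynamic vote counting can be sketched as follows. This sketch assumes a strict-majority rule (a candidate wins as soon as it holds more than half of the voters' votes), which is one reading of the "greater than 50%" condition; the function name and the vote format are hypothetical.

```python
from collections import Counter


def elect_target_leader(votes, voter_count):
    """Tally votes as they arrive and return the first majority winner.

    votes is an iterable of candidate names in arrival order; voter_count
    is the number of voting members (the source replica databases).
    """
    tally = Counter()
    for candidate in votes:
        tally[candidate] += 1
        # Assumption: strictly more than half of the voters is required.
        if tally[candidate] * 2 > voter_count:
            return candidate
    return None  # no candidate reached a majority
```

  • Because the tally is checked after every vote, the election can finish before all voting members have responded.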
  • the basis for voting may be the synchronization progress of the data migration logs in each node, that is, the follower node with the most synchronized log entries has a higher probability of becoming the leader node.
  • the target master replica database can be used to receive data read and write requests from the client.
  • the processing process of the data read and write request can be specifically implemented in the following ways:
  • the target master replica database (leader node) can provide data read and write services. The leader node appends the request to its log as a log entry and then synchronizes the entry to the other follower nodes in parallel. When the entry has been synchronized to most of the follower nodes, the leader node applies it to its state machine and returns the execution result to the client.
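  • The write path of the new leader node can be sketched as below. The class names, the `reachable` flag used to simulate a failed follower, and the return strings are assumptions; replication is done sequentially here rather than in parallel, and the majority is counted over the whole group including the leader, as in standard Raft.

```python
class Follower:
    def __init__(self, reachable=True):
        self.log = []
        self.reachable = reachable

    def append(self, entry):
        """Persist the entry and acknowledge, unless the node is down."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


class Leader:
    def __init__(self, followers):
        self.log = []
        self.followers = followers
        self.state = {}  # the state machine: a simple key-value store

    def handle_write(self, key, value):
        entry = (key, value)
        self.log.append(entry)
        # The leader counts itself plus every follower that acknowledged.
        acks = 1 + sum(f.append(entry) for f in self.followers)
        if acks * 2 > len(self.followers) + 1:
            self.state[key] = value  # apply only after majority replication
            return "ok"
        return "not committed"
```

  • The entry is applied to the state machine only after a majority holds it, so a client never sees a result that could later be lost with a minority of nodes.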
  • the embodiment of this specification takes the source database containing three replica databases as an example for description.
  • a schematic diagram of the expansion process of a distributed database system is shown in FIG. 2. The distributed database system shown in FIG. 2 contains four source databases, DB1, DB2, DB3, and DB4.
  • the data migration process of DB2 is taken as an example for schematic illustration.
  • the data migration process of DB1, DB3, and DB4 is similar to the data migration process of DB2, and will not be repeated here.
  • the source database DB2 and the target database DB2-1 both contain 3 replica databases.
  • one replica database in the source database is the master replica database (leader node), and the remaining two replica databases are slave replica databases (follower nodes).
  • the 3 replica databases in the source database are still in the active state, and the 3 replica databases in the target database are in the passive state; the data migration log of the leader node in the source database is read and incrementally migrated to the target database.
  • Incremental migration uses the raft-log synchronization mechanism.
  • the data migration log is synchronized by the replica databases of the source database to the target database through the log synchronization link.
  • When the log position of any replica database in the target database is close to that of the master replica database (leader node) of the source database (a position difference of less than 100), the leader node initiates the member change operation: the state of the two follower nodes in the source database is switched from active to passive, the state of the three replica databases of the target database is switched from passive to active, and the roles of the three replica databases are switched to follower nodes.
  • After the state of each replica database in the source database and the target database has changed, the leader node of the source database initiates an election based on the identification information of any replica database (follower node) in the target database and determines that replica database as the target master replica database. In other words, the master replica database in the source database actively transfers leadership to a replica database of the target database; after the transfer, that replica database changes from a follower node to a leader node.
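  • The trigger condition, the member change, and the leadership transfer can be sketched with plain dictionaries. The field names (`state`, `role`), the function names, and the step in which the old leader becomes a follower are assumptions for illustration; the text only states what happens to the new leader.

```python
def close_enough(leader_pos, target_positions, max_lag=100):
    """True once any target replica trails the leader by fewer than max_lag."""
    return any(leader_pos - pos < max_lag for pos in target_positions)


def member_change(source_followers, target_replicas):
    """Flip states: source followers go passive, target replicas go active."""
    for replica in source_followers:
        replica["state"] = "passive"
    for replica in target_replicas:
        replica["state"] = "active"
        replica["role"] = "follower"


def transfer_leadership(old_leader, new_leader):
    """Hand the leader role to a chosen target replica."""
    # Assumption: the old leader steps down to follower after the transfer.
    old_leader["role"] = "follower"
    new_leader["role"] = "leader"
```

  • With a leader at position 1000 and target replicas at 850, 950, and 700, `close_enough` returns true because one replica trails by only 50 entries.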
  • routing configuration is performed on the target master replica database according to the routing rules of the leader node of the source database.
  • the controller is used to manage the read and write routing, that is, to determine which node holds the shard of data to be read or written.
  • the controller receives data read and write requests. After a read or write request is sent to the controller, the controller allocates the read and write tasks. As long as the read and write routing information is stored on the controller, the controller can find the replica database related to the request. Therefore, when the route is switched, the new leader node immediately sends a heartbeat to inform the controller of the result of the route switch, and the controller updates its own routing information accordingly.
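  • A minimal sketch of the controller's routing table follows. The `Controller` class, its method names, and the shard and node identifiers are hypothetical; the specification does not define the controller's interface.

```python
class Controller:
    """Keeps the shard-to-leader routing table used for reads and writes."""

    def __init__(self):
        self.routes = {}

    def on_heartbeat(self, shard, leader_name):
        # A heartbeat from a newly elected leader overwrites the old route.
        self.routes[shard] = leader_name

    def route(self, shard):
        """Return the node that should serve requests for this shard."""
        return self.routes[shard]


controller = Controller()
controller.on_heartbeat("shard-2", "DB2-leader")
# After the route switch, the new leader announces itself.
controller.on_heartbeat("shard-2", "DB2-1-leader")
```

  • Since only the latest heartbeat is kept per shard, requests sent to the controller after the switch are routed to the target master replica database without any further coordination.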
  • An embodiment of this specification obtains a data migration request for a source database, creates a snapshot of the data to be migrated in the source database, migrates the snapshot to the target database, reads the data migration log stored in the master replica database of the source database, and synchronizes it to the target database.
  • Routing configuration is then performed on the target master replica database of the target database according to the routing rules of the master replica database;
  • the data migration log is synchronized incrementally, and the route is switched during the log synchronization process. On the one hand, this does not block the synchronization link used for incremental synchronization, which helps improve the efficiency of log synchronization; on the other hand, no route switch is performed during the full migration, so data can be read and written normally during the full migration, which helps improve data read and write efficiency.
  • FIG. 3 shows a process flow chart of a data migration method provided by an embodiment of this specification.
  • the specific steps include step 302 to step 316.
  • Step 302 Obtain a migration request for the source database.
  • Step 304 Create a snapshot of all the data to be migrated in the master replica database or any slave replica database in the source database.
  • Step 306 Migrate the snapshot to at least one replica database in the target database according to a preset migration manner.
  • Step 308 Read the data migration log stored in the master replica database of the source database.
  • Step 310 Establish a log synchronization link based on the master replica database, the at least one slave replica database, and the at least two replica databases.
  • Step 312 Synchronize the data migration log to the target database according to the upstream and downstream connection relationship of each node in the log synchronization link.
  • the data migration log is synchronized from the master replica database to the at least one slave replica database, and the data migration log is synchronized from the at least one slave replica database to the at least two replica databases.
  • Step 314 Determine whether the value of the synchronized log entry in the data migration log meets a preset threshold; if so, perform step 316.
  • If it is determined that the value of the synchronized log entries in the data migration log does not meet the preset threshold, log synchronization is continued; if it is determined that the value meets the preset threshold, the route switch can be performed during the log synchronization process, that is, the read-write routing of the master replica database in the source database is switched to the target master replica database of the target database.
  • Step 316 Determine any one of the replica databases included in the target database as the target master replica database, and perform routing configuration for the target master replica database according to the routing rules of the master replica database.
  • the data in the source database is migrated to the target database in a snapshot export mode.
  • the data migration log is synchronized incrementally, and the route switch is performed during the log synchronization process. On the one hand, this does not block the synchronization link used for incremental synchronization, which helps improve the efficiency of log synchronization; on the other hand, no route switch is performed during the full migration, so data can be read and written normally during the full migration, which helps improve data read and write efficiency.
  • FIG. 4 shows a schematic diagram of a data migration device provided by an embodiment of this specification. As shown in Figure 4, the device includes:
  • the obtaining module 402 is configured to obtain a migration request for the source database
  • the creation module 404 is configured to create a snapshot of the data to be migrated in the source database, and migrate the snapshot to the target database;
  • the reading module 406 is configured to read the data migration log stored in the primary replica database of the source database, and synchronize with the target database;
  • the configuration module 408 is configured to perform routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
  • the source database is composed of the master replica database and at least one slave replica database;
  • the target database is composed of at least two replica databases.
  • the reading module 406 includes:
  • An establishment sub-module configured to establish a log synchronization link based on the master replica database, the at least one slave replica database, and the at least two replica databases;
  • the synchronization sub-module is configured to synchronize the data migration log to the target database according to the upstream and downstream connection relationship of each node in the log synchronization link.
  • the synchronization sub-module includes:
  • a first synchronization unit configured to synchronize the data migration log from the master replica database to the at least one slave replica database
  • the second synchronization unit is configured to synchronize the data migration log from the at least one replica database to the at least two replica databases.
  • the configuration module 408 includes:
  • the determining sub-module is configured to determine any one of the replica databases included in the target database as the target master replica database;
  • the configuration sub-module is configured to perform routing configuration for the target master replica database according to the routing rules of the master replica database.
  • the target master replica database is determined in the following manner:
  • the creation module 404 includes:
  • the creation sub-module is configured to create a snapshot of all the data to be migrated from the master replica database or any slave replica database of the source database;
  • the migration sub-module is configured to migrate the snapshot to the target database according to a preset migration manner.
  • the data migration device further includes:
  • a judging module configured to judge whether the number of synchronized log entries in the data migration log meets a preset threshold;
  • if the result of the judging module is yes, it is determined that the synchronization progress of the data migration log meets the preset progress threshold, and the step of performing routing configuration for the target master replica database of the target database according to the routing rules of the master replica database in the source database is executed.
  • the data migration device further includes:
  • the receiving module is configured to receive data read/write requests;
  • the search module is configured to search for the corresponding target master replica database according to the requested data identifier in the data read/write request;
  • the execution module is configured to perform data read and write operations on the target master replica database according to the data read and write routing rules of the target master replica database.
  • Fig. 5 shows a structural block diagram of a computing device 500 according to an embodiment of the present specification.
  • the components of the computing device 500 include but are not limited to a memory 510 and a processor 520.
  • the processor 520 and the memory 510 are connected through a bus 530, and the database 550 is used to store data.
  • the computing device 500 also includes an access device 540 that enables the computing device 500 to communicate via one or more networks 560.
  • examples of these networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet.
  • the access device 540 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, etc.
  • the aforementioned components of the computing device 500 and other components not shown in FIG. 5 may also be connected to each other, for example, via a bus. It should be understood that the structural block diagram of the computing device shown in Fig. 5 is only for the purpose of example, and is not intended to limit the scope of this specification. Those skilled in the art can add or replace other components as needed.
  • the computing device 500 may be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (for example, tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (for example, smartphones), wearable computing devices (for example, smart watches, smart glasses, etc.) or other types of mobile devices, or stationary computing devices such as desktop computers or PCs.
  • the computing device 500 may also be a mobile or stationary server.
  • the memory 510 is used to store computer-executable instructions
  • the processor 520 is used to execute the following computer-executable instructions:
  • routing configuration is performed on the target master replica database of the target database.
  • An embodiment of the present specification also provides a computer-readable storage medium that stores computer instructions, which are used to implement the steps of the data migration method when executed by a processor.
  • the computer instructions include computer program codes, and the computer program codes may be in the form of source code, object code, executable files, or some intermediate forms.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content covered by the computer-readable medium can be appropriately expanded or narrowed according to the requirements of legislation and patent practice in a jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

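A data migration method and device, wherein the data migration method includes: obtaining a migration request for a source database; creating a snapshot of the data to be migrated in the source database and migrating the snapshot to a target database; reading the data migration log stored in the master replica database of the source database and synchronizing it to the target database; and performing routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.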

Description

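Data migration method and device
This application claims priority to Chinese patent application No. 202010477729.3, filed on May 29, 2020 and entitled "Data migration method and device", the entire contents of which are incorporated herein by reference.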
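Technical Field
The embodiments of this specification relate to the field of database technology, and in particular to a data migration method. One or more embodiments of this specification also relate to a data migration device, a computing device, and a computer-readable storage medium.
Background
With the development of technology, the Internet has reached every aspect of social life and brought great convenience to work, life, and study. In Internet business operations, certain time periods often see a surge in data traffic (a sharp increase in required database capacity) or a sudden drop in data traffic (a decrease in required database capacity).
In a distributed database, meeting elasticity requirements generally calls for the ability to scale horizontally and linearly by adding or removing nodes. During scale-out or scale-in, data must be migrated between nodes to balance storage and computing load, and the efficiency of data migration affects system availability and performance while scaling is in progress. For example, if data blocks are sent directly over the communication link, the link's low-latency requirement limits the size of the data shards and reduces flexibility; or, when the data set is large, the transfer may block the synchronization and communication links and reduce data synchronization efficiency. A data migration method that overcomes these problems is therefore needed.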
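Summary
In view of this, the embodiments of this specification provide a data migration method. One or more embodiments of this specification also relate to a data migration device, a computing device, and a computer-readable storage medium, to address the technical deficiencies of the prior art.
According to a first aspect of the embodiments of this specification, a data migration method is provided, including:
obtaining a migration request for a source database;
creating a snapshot of the data to be migrated in the source database, and migrating the snapshot to a target database;
reading the data migration log stored in the master replica database of the source database, and synchronizing it to the target database;
performing routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
Optionally, the source database is composed of the master replica database and at least one slave replica database; the target database is composed of at least two replica databases.
Optionally, the data migration log is synchronized to the target database in the following manner:
establishing a log synchronization link based on the master replica database, the at least one slave replica database, and the at least two replica databases;
synchronizing the data migration log to the target database according to the upstream and downstream connection relationships of the nodes in the log synchronization link.
Optionally, synchronizing the data migration log to the target database according to the upstream and downstream connection relationships of the nodes in the log synchronization link includes:
synchronizing the data migration log from the master replica database to the at least one slave replica database;
synchronizing the data migration log from the at least one slave replica database to the at least two replica databases.
Optionally, performing routing configuration on the target master replica database of the target database according to the routing rules of the master replica database includes:
determining any one of the replica databases included in the target database as the target master replica database;
performing routing configuration for the target master replica database according to the routing rules of the master replica database.
Optionally, the target master replica database is determined in the following manner:
initiating an election to the source database based on the identification information of the at least two replica databases, the election being used to elect the target master replica database from the at least two replica databases;
obtaining the voting results submitted by the master replica database and the at least one slave replica database;
counting the voting results, and determining the first of the at least two replica databases whose number of votes exceeds a preset threshold as the target master replica database.
Optionally, creating a snapshot of the data to be migrated in the source database and migrating the snapshot to the target database includes:
creating a snapshot of all data to be migrated in the master replica database, or in any one slave replica database, of the source database;
migrating the snapshot to the target database according to a preset migration manner.
Optionally, after the step of reading the data migration log stored in the master replica database of the source database and synchronizing it to the target database, and before the step of performing routing configuration on the target master replica database of the target database according to the routing rules of the master replica database, the method further includes:
judging whether the number of synchronized log entries in the data migration log meets a preset threshold;
if so, determining that the synchronization progress of the data migration log meets the preset progress threshold, and executing the step of performing routing configuration for the target master replica database of the target database according to the routing rules of the master replica database in the source database.
Optionally, after the step of performing routing configuration for the target master replica database of the target database according to the routing rules of the master replica database in the source database, the method further includes:
receiving a data read/write request;
searching for the corresponding target master replica database according to the requested data identifier in the data read/write request;
performing data read/write operations on the target master replica database according to the data read/write routing rules of the target master replica database.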
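According to a second aspect of the embodiments of this specification, a data migration device is provided, including:
an obtaining module configured to obtain a migration request for a source database;
a creation module configured to create a snapshot of the data to be migrated in the source database, and migrate the snapshot to a target database;
a reading module configured to read the data migration log stored in the master replica database of the source database, and synchronize it to the target database;
a configuration module configured to perform routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
According to a third aspect of the embodiments of this specification, a computing device is provided, including:
a memory and a processor;
the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to:
obtain a migration request for a source database;
create a snapshot of the data to be migrated in the source database, and migrate the snapshot to a target database;
read the data migration log stored in the master replica database of the source database, and synchronize it to the target database;
perform routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
According to a fourth aspect of the embodiments of this specification, a computer-readable storage medium is provided, which stores computer-executable instructions that, when executed by a processor, implement the steps of the data migration method.
In one embodiment of this specification, a migration request for a source database is obtained, a snapshot of the data to be migrated in the source database is created and migrated to a target database, the data migration log stored in the master replica database of the source database is read and synchronized to the target database, and routing configuration is performed on the target master replica database of the target database according to the routing rules of the master replica database.
In this way, the data to be migrated in the source database is fully migrated to the target database by snapshot export, the data migration log is synchronized incrementally, and the route is switched during log synchronization. On the one hand, the synchronization link used for incremental synchronization is not blocked, which helps improve log synchronization efficiency; on the other hand, the route is not switched during the full migration, so data can be read and written normally during the full migration, which helps improve data read/write efficiency.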
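Brief Description of the Drawings
Fig. 1 is a processing flowchart of a data migration method provided by an embodiment of this specification;
Fig. 2 is a schematic diagram of a database expansion process provided by an embodiment of this specification;
Fig. 3 is a flowchart of the processing procedure of a data migration method provided by an embodiment of this specification;
Fig. 4 is a schematic diagram of a data migration device provided by an embodiment of this specification;
Fig. 5 is a structural block diagram of a computing device provided by an embodiment of this specification.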
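Detailed Description
Many specific details are set forth in the following description to provide a thorough understanding of this specification. However, this specification can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its substance; this specification is therefore not limited by the specific implementations disclosed below.
The terms used in one or more embodiments of this specification are for the purpose of describing particular embodiments only and are not intended to limit one or more embodiments of this specification. The singular forms "a", "said", and "the" used in one or more embodiments of this specification and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of this specification refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, a first may also be called a second, and similarly a second may also be called a first. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, the terms involved in one or more embodiments of this specification are explained.
Raft algorithm: a consensus algorithm that provides a general method for distributing a state machine across a cluster of computing systems, ensuring that every node in the cluster agrees on the same state transitions.
Data mirroring backup tool: an efficient data transfer and file synchronization tool, written in C and widely integrated and used in Unix-like operating systems.
This specification provides a data migration method, and also relates to a data migration device, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.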
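Fig. 1 shows a processing flowchart of a data migration method provided according to an embodiment of this specification, including steps 102 to 108.
Step 102: obtain a migration request for a source database.
With the development of technology, the Internet has reached every aspect of social life and brought great convenience to work, life, and study. In Internet business operations, certain time periods often see a surge in data traffic (a sharp increase in required database capacity) or a sudden drop in data traffic (a decrease in required database capacity).
In a distributed database, meeting elasticity requirements generally calls for the ability to scale horizontally and linearly by adding or removing nodes. During scale-out or scale-in, data must be migrated between nodes to balance storage and computing load, and the efficiency of data migration affects system availability and performance while scaling is in progress.
On this basis, the data migration method provided by the embodiments of this specification is applied to a distributed database system. The data to be migrated in the source database is fully migrated to the target database by snapshot export, the data migration log is synchronized incrementally, and the route is switched during log synchronization. On the one hand, the synchronization link used for incremental synchronization is not blocked, which helps improve log synchronization efficiency; on the other hand, the route is not switched during the full migration, so data can be read and written normally during the full migration, which helps improve data read/write efficiency.
To keep the distributed database system highly available during linear scale-out and scale-in, the data migration method provided by the embodiments of this specification combines full and incremental migration, so that neither the migration process nor the route switching process affects system availability.
Specifically, data migration means migrating the data in the source database to the target database. The source database is composed of the master replica database and at least one slave replica database, and the target database is composed of at least two replica databases. After the migration request for the source database is obtained, a snapshot of the full data in the source database can be created, and the data in the source database can be fully migrated to the target database by migrating the snapshot.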
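Step 104: create a snapshot of the data to be migrated in the source database, and migrate the snapshot to the target database.
Specifically, because the source database contains one master replica database and at least one slave replica database, in a specific implementation, after the migration request for the source database is obtained, creating a snapshot of the data to be migrated in the source database and migrating it to the target database means creating a snapshot of all data to be migrated in the master replica database, or in any one slave replica database, of the source database, and migrating the snapshot to the target database according to a preset migration manner.
Further, because the master replica database and the at least one slave replica database in the source database store the same data, only the data stored in one of them needs to be synchronized during data migration. Creating a snapshot of the data to be migrated in the source database therefore means creating a snapshot of all data to be migrated in the master replica database or in any one slave replica database. After the snapshot is created, it is migrated in parallel to the at least two replica databases of the target database according to the preset migration manner. In a specific implementation, the preset migration manner may be a data mirroring backup tool (the rsync tool), that is, rsync is used for the full data migration.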
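As a minimal sketch of the parallel snapshot transfer described above, the following Python code builds one rsync command per target replica and hands the commands to a thread pool. The host names, directory paths, and the `runner` hook are hypothetical and do not come from the specification; by default the runner only returns the command string, so nothing is executed against a real host.

```python
import shlex
from concurrent.futures import ThreadPoolExecutor


def build_rsync_command(snapshot_dir, target_host, target_dir):
    """Build an rsync command that copies one snapshot directory to one replica."""
    # -a preserves file attributes; --partial keeps partially transferred files.
    return ["rsync", "-a", "--partial", f"{snapshot_dir}/", f"{target_host}:{target_dir}/"]


def plan_parallel_migration(snapshot_dir, replicas, target_dir, runner=None):
    """Send the same snapshot to every target replica in parallel.

    `runner` is a hypothetical hook that executes one command. The default
    only renders the shell string, so the sketch has no side effects.
    """
    runner = runner or shlex.join
    commands = [build_rsync_command(snapshot_dir, host, target_dir) for host in replicas]
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(runner, commands))


# Example: one snapshot, three target replicas (names are illustrative only).
planned = plan_parallel_migration("/snap/db2", ["t1", "t2", "t3"], "/data/db2")
```

To actually run the transfers, a caller could pass a runner based on `subprocess.run`; that choice, like the flags used here, is an assumption rather than something the specification prescribes.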
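Until data migration is complete, the source database provides the data read/write service. Therefore, before the route is switched from the source database to the target database, the master replica database and the at least one slave replica database in the source database are in the active state (actively processing data read/write requests), while the at least two replica databases in the target database are in the passive state (idle), in preparation for becoming active.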
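Step 106: read the data migration log stored in the master replica database of the source database, and synchronize it to the target database.
Specifically, the data migration log is a file separate from the database files. It stores all changes made to the database, recording every insert, update, delete, commit, rollback, and schema change, and it is an important component for data/file backup and recovery. Therefore, after the data in the source database has been fully migrated to the target database, the data migration log must also be incrementally migrated (synchronized) to the target database.
In a specific implementation, the Raft algorithm can be used to synchronize the data migration log. Raft divides the roles in a distributed database system into leader, follower, and candidate. In the embodiments of this specification, the master replica database in the source database is the leader node, and the at least one slave replica database in the source database is a follower node. Because the target database has no target master replica database before the data migration log is synchronized to it, the at least two replica databases in the target database are all in the passive state before synchronization. After the data migration log has been synchronized to the target database and the state of the at least two replica databases has changed from passive to active, these replica databases can serve as switching targets for the leader node (that is, they can act as candidate nodes).
The leader node receives the data read/write requests sent by clients and synchronizes the data migration log to the follower nodes. Once the log has been synchronized to a majority of nodes, the leader node can notify the follower nodes to commit the log. A follower node receives and persists the data migration log synchronized by the leader node, and commits the log after receiving the leader's commit notification. The candidate is a temporary role used during leader election.
The leader node and follower nodes in the source database are in the active state, and the at least two replica databases in the target database are in the passive state. Before the data migration log is synchronized to the target database, the at least two replica databases in the target database are all passive. After the log has been synchronized and their state has changed from passive to active, they can act as candidate nodes. When the log synchronization progress meets the preset threshold, the leader node and follower nodes in the source database hold an election over the at least two replica databases in the target database, that is, the target master replica database is elected from among the candidate nodes.
In practical applications, the raft-log synchronization mechanism can be used to synchronize the data migration log. A Raft-based distributed database system has at most one leader node, and the log can only be replicated from the leader node to the follower nodes. One application scenario of Raft is the replicated state machine: a client (participant) sends commands to be executed on the state machine, and Raft replicates the commands, in the form of a log, to the other state machines. If different state machines execute these commands in the same order, they produce the same output. The consensus algorithm can therefore be used to guarantee that the content and order of the synchronized data migration log are consistent.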
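The replicated state machine property mentioned above can be illustrated with a small sketch: two replicas that apply the same log entries in the same index order end up in the same state, even if the entries arrived in a different order. The entry layout `(index, op, key, value)` and the key-value state are assumptions made for illustration only.

```python
def apply_log(entries):
    """Apply (index, op, key, value) entries in index order to an empty key-value state."""
    state = {}
    for _, op, key, value in sorted(entries, key=lambda entry: entry[0]):
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


log = [
    (1, "put", "a", 1),
    (2, "put", "b", 2),
    (3, "delete", "a", None),
    (4, "put", "b", 5),
]

# The source replica sees the entries in order; the target replica receives
# them in reverse, but both apply them by log index.
source_state = apply_log(log)
target_state = apply_log(list(reversed(log)))
```

Both calls produce the same dictionary because ordering is decided by the log index rather than by arrival time, which is the guarantee the consensus algorithm provides for the synchronized log.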
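In a specific implementation, the data migration log can be synchronized over a log synchronization link. The link must therefore be established before the log is synchronized, and the log is then synchronized to the target database over it, which can be implemented as follows:
establishing a log synchronization link based on the master replica database, the at least one slave replica database, and the at least two replica databases;
synchronizing the data migration log to the target database according to the upstream and downstream connection relationships of the nodes in the log synchronization link.
Further, synchronizing the data migration log to the target database according to the upstream and downstream connection relationships of the nodes in the log synchronization link can be implemented as follows:
synchronizing the data migration log from the master replica database to the at least one slave replica database;
synchronizing the data migration log from the at least one slave replica database to the at least two replica databases.
Specifically, the data migration log is a file separate from the database files. It stores all changes made to the database, recording inserts, updates, deletes, commits, rollbacks, and schema changes, and it is an important component for data/file backup and recovery. The log is stored on the leader node, which synchronizes it to the follower nodes, which in turn synchronize it to the at least two replica databases of the target database. Therefore, before the data migration log is synchronized, the log synchronization link must be established based on the master replica database, the at least one slave replica database, the at least two replica databases, and the order in which the log is synchronized between the nodes. Once the link is established, the data migration log can be synchronized to the target database according to the upstream and downstream connection relationships of the nodes in the link.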
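The upstream/downstream relationship of the log synchronization link can be sketched as a small forwarding tree. The `Node` class, the node names, and the choice to feed all target replicas from the first follower are illustrative assumptions; the specification only states that the log flows from the master replica to the slave replicas and from there to the target replicas.

```python
class Node:
    """One replica in the log synchronization link."""

    def __init__(self, name):
        self.name = name
        self.log = []         # entries received so far, in arrival order
        self.downstream = []  # nodes fed by this node

    def connect(self, *nodes):
        self.downstream.extend(nodes)

    def receive(self, entry):
        # Persist locally, then forward to every downstream node.
        self.log.append(entry)
        for node in self.downstream:
            node.receive(entry)


def build_link(leader, followers, targets):
    """Wire leader -> followers, then the first follower -> all target replicas."""
    leader.connect(*followers)
    followers[0].connect(*targets)
    return leader


leader = Node("leader")
followers = [Node("follower-1"), Node("follower-2")]
targets = [Node("target-1"), Node("target-2"), Node("target-3")]
build_link(leader, followers, targets)
for index in range(3):
    leader.receive({"index": index, "op": "put"})
```

After the loop, every follower and every target replica holds the same three entries as the leader, in the same order.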
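Step 108: perform routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
Specifically, the data migration log consists of log entries with ordered numbers (log index), so synchronizing the data migration log means synchronizing its individual log entries. When the number of synchronized log entries meets a preset threshold, a route can be configured for a replica database of the target database, that is, the route of the master replica database in the source database is switched to the target master replica database of the target database. The route provides the data read/write service.
In a specific implementation, because the data migration log consists of log entries, its synchronization progress can be determined as follows:
judging whether the number of synchronized log entries in the data migration log meets a preset threshold;
if so, determining that the synchronization progress of the data migration log meets the preset progress threshold, and performing routing configuration for the target master replica database of the target database according to the routing rules of the master replica database in the source database.
Specifically, because synchronizing the data migration log means synchronizing its individual log entries, the check of whether the number of synchronized entries meets the preset threshold can run concurrently with the synchronization itself. If the number of synchronized entries meets the preset threshold, the route can be switched from the source database to the target database while log synchronization continues; if it does not, log synchronization simply continues.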
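A minimal sketch of the progress check follows, assuming that "meets the preset threshold" is read as "the target's lag behind the leader is below a threshold" (one plausible reading, consistent with the position-difference example given later in the description). The function name and batch size are hypothetical.

```python
def sync_until_ready(leader_log, target_log, lag_threshold, batch_size=2):
    """Copy entries in batches and return how many batches were sent before
    the target's lag dropped below `lag_threshold`."""
    batches = 0
    while len(leader_log) - len(target_log) >= lag_threshold:
        start = len(target_log)
        target_log.extend(leader_log[start:start + batch_size])
        batches += 1
    return batches


leader_entries = list(range(10))
target_entries = []
sent = sync_until_ready(leader_entries, target_entries, lag_threshold=3)
```

Once the function returns, the caller could start the route switch while the remaining entries continue to be synchronized, which mirrors the "switch during synchronization" behavior described above.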
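Take a client that issues three write requests as an example. After receiving these requests, the leader node first synchronizes them to the follower nodes in a batch, in the form of log entries. After receiving the follower nodes' responses confirming that the log was replicated successfully, it updates the log position, calls the interface to perform the counting operation in the requests, and adds the values sent by the client to the counter.
After the log position is updated, every log entry up to and including that position has been replicated to more than half of the nodes in the system. If the position is at "2", entries "0-2" have been replicated to more than half of the nodes; if the leader then replicates entries "3" and "4" to the follower nodes in a batch, the position slides to "4", meaning entries "0-4" have been replicated to more than half of the nodes.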
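The sliding log position in this example can be computed from the highest index each node is known to hold. The sketch below is a simplification: it ignores the term check that a full Raft implementation applies before advancing the commit position, and the node names are illustrative.

```python
def commit_position(match_index):
    """Return the highest log index stored on more than half of the nodes.

    `match_index` maps node name -> highest log index known to be stored on
    that node (the leader's own last index included).
    """
    ordered = sorted(match_index.values(), reverse=True)
    majority = len(ordered) // 2 + 1
    return ordered[majority - 1]


# Entries 0-2 are on every node; the leader has already appended 3 and 4.
before = commit_position({"leader": 4, "follower-1": 2, "follower-2": 2})
# One follower acknowledges entries 3 and 4, so a majority now holds them.
after = commit_position({"leader": 4, "follower-1": 4, "follower-2": 2})
```

Sorting in descending order and taking the element at the majority rank gives the largest index that at least a majority of nodes have reached.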
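When the difference between the log position of at least one replica database in the target database and the log position of the master replica database in the source database is less than a preset threshold (100), the leader node in the source database initiates a membership change: the state of the at least one slave replica database in the source database is switched from active to passive, the state of the at least two replica databases in the target database is switched from passive to active, and the role of the at least two replica databases is changed to follower.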
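The membership change can be sketched as a single state flip guarded by the position difference. The dictionary layout for replicas is a hypothetical representation; only the threshold value of 100 and the state and role names come from the description.

```python
LAG_THRESHOLD = 100  # threshold value given in the description


def maybe_change_membership(leader_pos, source_followers, target_replicas):
    """Flip replica states once any target replica is within LAG_THRESHOLD of the leader."""
    close_enough = any(leader_pos - r["pos"] < LAG_THRESHOLD for r in target_replicas)
    if not close_enough:
        return False
    for follower in source_followers:
        follower["state"] = "passive"
    for replica in target_replicas:
        replica["state"] = "active"
        replica["role"] = "follower"
    return True


source_followers = [{"name": "s2", "state": "active"}, {"name": "s3", "state": "active"}]
target_replicas = [
    {"name": "t1", "pos": 950, "state": "passive", "role": None},
    {"name": "t2", "pos": 700, "state": "passive", "role": None},
]
changed = maybe_change_membership(1000, source_followers, target_replicas)
```

Here replica `t1` trails the leader by 50 entries, so the change is triggered and all target replicas become active followers.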
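In addition, the target master replica database of the target database can be determined by election, which can be implemented as follows:
initiating an election to the source database based on the identification information of the at least two replica databases, the election being used to elect the target master replica database from the at least two replica databases;
obtaining the voting results submitted by the master replica database and the at least one slave replica database;
counting the voting results, and determining the first of the at least two replica databases whose number of votes exceeds a preset threshold as the target master replica database.
Specifically, the votes are counted dynamically as voting proceeds. The master replica database and the at least one slave replica database in the source database vote on the at least two replica databases in the target database, and the target master replica database is determined from the voting results. That is, during voting, the votes received by each of the at least two replica databases are tallied dynamically, and the first replica database whose vote count exceeds the preset threshold is determined as the target master replica database.
In practical applications, Raft uses heartbeats to trigger leader election. The roles of the at least two replica databases in the target database are initialized to follower, and the master replica database and the at least one slave replica database in the source database act as voting members that hold an election over the follower nodes in the target database to produce the target master replica database.
After the voting members' results are obtained, they are counted, and the first follower node in the target database whose number of votes exceeds the preset threshold is determined as the leader node. In practical applications, the preset threshold can be determined from the number of replica databases in the source database. For example, if the source database contains 3 replica databases (1 master replica database and 2 slave replica databases), the preset threshold can be set to 2 (a vote share greater than 50%), and the first follower node in the target database to obtain more than 2 votes is determined as the leader node.
Votes may be cast based on the synchronization progress of the data migration log on each node, that is, the follower node with the most synchronized log entries is more likely to become the leader node.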
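The dynamic vote count can be sketched as a running tally that stops at the first candidate to cross the threshold. This sketch assumes the threshold is "more than half of the voting members", following the "vote share greater than 50%" remark; the voter and candidate names and the rule that every voter picks the most advanced replica are illustrative assumptions.

```python
def cast_votes(voters, progress):
    """Each voter picks the target replica with the most synchronized log entries."""
    best = max(progress, key=progress.get)
    return [(voter, best) for voter in voters]


def elect_target_leader(votes, voter_count):
    """Return the first candidate whose running tally exceeds half of the voters."""
    tally = {}
    for _voter, candidate in votes:
        tally[candidate] = tally.get(candidate, 0) + 1
        if tally[candidate] > voter_count / 2:
            return candidate
    return None  # no candidate reached a majority


voters = ["source-leader", "source-follower-1", "source-follower-2"]
progress = {"target-1": 980, "target-2": 995, "target-3": 990}
winner = elect_target_leader(cast_votes(voters, progress), len(voters))
```

With three voters, the winner is decided as soon as a candidate collects two votes, so the third vote does not need to be counted.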
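In addition, because the leader node receives clients' data read/write requests, once the route has been switched to the target master replica database (leader node) of the target database, the target master replica database can receive clients' data read/write requests. These requests can be processed as follows:
receiving a data read/write request;
searching for the corresponding target master replica database according to the requested data identifier in the data read/write request;
performing data read/write operations on the target master replica database according to the data read/write routing rules of the target master replica database.
Specifically, after a route is configured for the target master replica database of the target database, the target master replica database (leader node) can provide the data read/write service. The leader node appends the request to its log as a log entry and then synchronizes the entry to the other follower nodes in parallel. Once the entry has been synchronized to a majority of follower nodes, the leader node applies it to its state machine and returns the execution result to the client.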
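The write path on the new leader can be sketched as "append, replicate, apply on majority". The `Leader` class and the way acknowledgements are passed in as a count are simplifications introduced for illustration; a real node would collect acknowledgements asynchronously.

```python
class Leader:
    """Simplified leader that applies an entry once most followers hold it."""

    def __init__(self, follower_count):
        self.follower_count = follower_count
        self.log = []    # appended log entries
        self.state = {}  # state machine: applied key-value pairs

    def write(self, key, value, acks):
        """Append the request as a log entry; apply it if `acks` followers,
        a majority of all followers, have confirmed replication."""
        self.log.append((key, value))
        if acks > self.follower_count // 2:
            self.state[key] = value
            return "ok"
        return "pending"


node = Leader(follower_count=2)
first = node.write("k1", "v1", acks=2)   # both followers confirmed
second = node.write("k2", "v2", acks=1)  # only one follower confirmed
```

The second entry stays in the log but is not applied, so a client would not yet see `k2` in the state machine.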
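Specifically, the embodiments of this specification are described using a source database that contains 3 replica databases as an example. Fig. 2 is a schematic diagram of the expansion process of a distributed database system. The system shown in Fig. 2 contains four source databases, DB1, DB2, DB3, and DB4. The data migration process of DB2 is used as an illustrative example; the processes for DB1, DB3, and DB4 are similar and are not repeated here.
In Fig. 2, the source database DB2 and the target database DB2-1 each contain 3 replica databases. Before routing is configured for the target database, one replica database in the source database is the master replica database (leader node) and the other two are slave replica databases (follower nodes). After the data migration request for the source database DB2 is obtained, a snapshot of all data to be migrated is created in the master replica database or any one slave replica database, and the snapshot is migrated in parallel to the 3 replica databases of the target database using the rsync tool. After the migration, the Raft group for this data shard has 6 replica databases: the 3 in the source database remain active, and the 3 in the target database are passive. The data migration log of the leader node in the source database is read and incrementally migrated to the target database. The incremental migration uses the raft-log synchronization mechanism, and the data migration log is synchronized from the replica databases of the source database to the target database over the log synchronization link. When the log position of any replica database in the target database approaches that of the master replica database (leader node) of the source database (a position difference of less than 100), the leader node initiates a membership change: the state of the other 2 follower nodes in the source database is switched from active to passive, the state of the 3 replica databases of the target database is switched from passive to active, and the role of these 3 replica databases is switched to follower.
After the states of the replica databases in the source and target databases have changed, the leader node initiates an election to the source database based on the identification information of any one replica database (follower node) in the target database, and that replica database is determined as the target master replica database. In other words, the master replica database in the source database actively transfers leadership to one of the replica databases of the target database, after which that replica database changes from a follower node to the leader node.
After the target master replica database (the new leader node) is determined from the 3 follower nodes, routing configuration is performed on it according to the routing rules of the leader node of the source database.
The controller manages read/write routing, that is, it determines which node the data shard to be read or written resides on. The controller receives data read/write requests and, when a request arrives, assigns the read/write task. As long as the read/write routing information is stored on the controller, the controller can find the replica database relevant to a request. Therefore, after the route is switched, the new leader node immediately sends a heartbeat to inform the controller of the result, and the controller updates its own routing information accordingly.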
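A minimal sketch of the controller's routing table follows, assuming routes are keyed by a shard identifier and that a heartbeat carries the shard identifier and the sender's name. The class, method names, and identifiers are hypothetical.

```python
class Controller:
    """Keeps the read/write route for each data shard."""

    def __init__(self):
        self.routes = {}  # shard id -> name of the leader serving reads and writes

    def on_heartbeat(self, shard_id, leader_name):
        # A heartbeat from a (new) leader overwrites the stored route.
        self.routes[shard_id] = leader_name

    def dispatch(self, shard_id, request):
        """Return the node that should handle `request`, together with the request."""
        if shard_id not in self.routes:
            raise KeyError(f"no route registered for shard {shard_id!r}")
        return self.routes[shard_id], request


controller = Controller()
controller.on_heartbeat("DB2", "source-leader")
before_switch = controller.dispatch("DB2", "read k1")
controller.on_heartbeat("DB2", "target-leader")  # sent right after the route switch
after_switch = controller.dispatch("DB2", "read k1")
```

Because clients only talk to the controller, a single heartbeat from the new leader is enough to redirect subsequent requests.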
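In one embodiment of this specification, a migration request for migrating data from a source database is obtained, a snapshot of the data to be migrated in the source database is created and migrated to a target database, the data migration log stored in the master replica database of the source database is read and synchronized to the target database, and, when the synchronization progress of the data migration log meets the preset progress threshold, routing configuration is performed on the target master replica database of the target database according to the routing rules of the master replica database.
In this way, the data in the source database is fully migrated to the target database by snapshot export, the data migration log is synchronized incrementally, and the route is switched during log synchronization. On the one hand, the synchronization link used for incremental synchronization is not blocked, which helps improve log synchronization efficiency; on the other hand, the route is not switched during the full migration, so data can be read and written normally during the full migration, which helps improve data read/write efficiency.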
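The data migration method is further described below with reference to Fig. 3, taking an application of the method provided in this specification as an example. Fig. 3 shows a flowchart of the processing procedure of a data migration method provided by an embodiment of this specification, which includes steps 302 to 316.
Step 302: obtain a migration request for a source database.
Step 304: create a snapshot of all data to be migrated in the master replica database, or in any one slave replica database, of the source database.
Step 306: migrate the snapshot to at least one replica database in the target database according to a preset migration manner.
Step 308: read the data migration log stored in the master replica database of the source database.
Step 310: establish a log synchronization link based on the master replica database, the at least one slave replica database, and the at least two replica databases.
Step 312: synchronize the data migration log to the target database according to the upstream and downstream connection relationships of the nodes in the log synchronization link.
Specifically, the data migration log is synchronized from the master replica database to the at least one slave replica database, and from the at least one slave replica database to the at least two replica databases.
Step 314: judge whether the number of synchronized log entries in the data migration log meets a preset threshold; if so, execute step 316.
Specifically, if the number of synchronized log entries does not meet the preset threshold, log synchronization continues; if it does, the route can be switched while log synchronization is in progress, that is, the read/write route of the master replica database in the source database is switched to the target master replica database of the target database.
Step 316: determine any one of the replica databases included in the target database as the target master replica database, and perform routing configuration for the target master replica database according to the routing rules of the master replica database.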
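Steps 302 to 316 can be strung together in a compact in-memory sketch. Everything here is an illustrative assumption: the data structures, the reading of the threshold as a lag bound, and the choice of the first replica as the target master replica (the description allows any replica to be chosen).

```python
def migrate(source_data, source_log, replica_names, lag_threshold, source_rule):
    """Run a simplified version of steps 304-316 and return (route, replicas)."""
    snapshot = dict(source_data)  # step 304: full snapshot of the source data
    # Step 306: every target replica starts from the same snapshot.
    replicas = {name: {"data": dict(snapshot), "log": []} for name in replica_names}
    route = None
    for entry in source_log:  # steps 308-312: incremental log synchronization
        for replica in replicas.values():
            replica["log"].append(entry)
        synced = len(replicas[replica_names[0]]["log"])
        # Step 314: switch the route once the lag is small, but keep syncing.
        if route is None and len(source_log) - synced < lag_threshold:
            # Step 316: pick a replica and reuse the source routing rule.
            route = {"rule": source_rule, "leader": replica_names[0], "switched_at": synced}
    return route, replicas


route, replicas = migrate(
    source_data={"k1": "v1"},
    source_log=["e1", "e2", "e3", "e4", "e5"],
    replica_names=["t1", "t2"],
    lag_threshold=2,
    source_rule="shard-DB2",
)
```

In this run the route is switched after the fourth entry, while the fifth entry is still synchronized afterwards, which reflects the idea that route switching overlaps with log synchronization.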
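In the embodiments of this specification, the data in the source database is fully migrated to the target database by snapshot export, the data migration log is synchronized incrementally, and the route is switched during log synchronization. On the one hand, the synchronization link used for incremental synchronization is not blocked, which helps improve log synchronization efficiency; on the other hand, the route is not switched during the full migration, so data can be read and written normally during the full migration, which helps improve data read/write efficiency.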
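Corresponding to the above method embodiments, this specification also provides data migration device embodiments. Fig. 4 shows a schematic diagram of a data migration device provided by an embodiment of this specification. As shown in Fig. 4, the device includes:
an obtaining module 402 configured to obtain a migration request for a source database;
a creation module 404 configured to create a snapshot of the data to be migrated in the source database, and migrate the snapshot to a target database;
a reading module 406 configured to read the data migration log stored in the master replica database of the source database, and synchronize it to the target database;
a configuration module 408 configured to perform routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
Optionally, the source database is composed of the master replica database and at least one slave replica database; the target database is composed of at least two replica databases.
Optionally, the reading module 406 includes:
an establishment sub-module configured to establish a log synchronization link based on the master replica database, the at least one slave replica database, and the at least two replica databases;
a synchronization sub-module configured to synchronize the data migration log to the target database according to the upstream and downstream connection relationships of the nodes in the log synchronization link.
Optionally, the synchronization sub-module includes:
a first synchronization unit configured to synchronize the data migration log from the master replica database to the at least one slave replica database;
a second synchronization unit configured to synchronize the data migration log from the at least one slave replica database to the at least two replica databases.
Optionally, the configuration module 408 includes:
a determining sub-module configured to determine any one of the replica databases included in the target database as the target master replica database;
a configuration sub-module configured to perform routing configuration for the target master replica database according to the routing rules of the master replica database.
Optionally, the target master replica database is determined in the following manner:
initiating an election to the source database based on the identification information of the at least two replica databases, the election being used to elect the target master replica database from the at least two replica databases;
obtaining the voting results submitted by the master replica database and the at least one slave replica database;
counting the voting results, and determining the first of the at least two replica databases whose number of votes exceeds a preset threshold as the target master replica database.
Optionally, the creation module 404 includes:
a creation sub-module configured to create a snapshot of all data to be migrated in the master replica database, or in any one slave replica database, of the source database;
a migration sub-module configured to migrate the snapshot to the target database according to a preset migration manner.
Optionally, the data migration device further includes:
a judging module configured to judge whether the number of synchronized log entries in the data migration log meets a preset threshold;
if the result of the judging module is yes, it is determined that the synchronization progress of the data migration log meets the preset progress threshold, and the step of performing routing configuration for the target master replica database of the target database according to the routing rules of the master replica database in the source database is executed.
Optionally, the data migration device further includes:
a receiving module configured to receive data read/write requests;
a search module configured to search for the corresponding target master replica database according to the requested data identifier in the data read/write request;
an execution module configured to perform data read/write operations on the target master replica database according to the data read/write routing rules of the target master replica database.
The above is a schematic solution of a data migration device of this embodiment. It should be noted that the technical solution of the data migration device and the technical solution of the data migration method described above belong to the same concept; for details not described in the technical solution of the data migration device, refer to the description of the technical solution of the data migration method.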
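Fig. 5 shows a structural block diagram of a computing device 500 provided according to an embodiment of this specification. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. The processor 520 and the memory 510 are connected through a bus 530, and a database 550 is used to store data.
The computing device 500 also includes an access device 540 that enables the computing device 500 to communicate via one or more networks 560. Examples of these networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 540 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so on.
In an embodiment of this specification, the above components of the computing device 500 and other components not shown in Fig. 5 may also be connected to each other, for example via a bus. It should be understood that the structural block diagram of the computing device shown in Fig. 5 is for illustration only and does not limit the scope of this specification. Those skilled in the art can add or replace components as needed.
The computing device 500 may be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (for example, tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (for example, smartphones), wearable computing devices (for example, smart watches, smart glasses, etc.) or other types of mobile devices, or stationary computing devices such as desktop computers or PCs. The computing device 500 may also be a mobile or stationary server.
The memory 510 is used to store computer-executable instructions, and the processor 520 is used to execute the following computer-executable instructions:
obtaining a migration request for migrating data from a source database;
creating a snapshot of the data in the source database, and migrating the snapshot to a target database;
reading the data migration log stored in the master replica database of the source database, and synchronizing it to the target database;
performing routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
The above is a schematic solution of a computing device of this embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data migration method described above belong to the same concept; for details not described in the technical solution of the computing device, refer to the description of the technical solution of the data migration method.
An embodiment of this specification also provides a computer-readable storage medium that stores computer instructions which, when executed by a processor, implement the steps of the data migration method.
The above is a schematic solution of a computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the data migration method described above belong to the same concept; for details not described in the technical solution of the storage medium, refer to the description of the technical solution of the data migration method.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content covered by the computer-readable medium can be appropriately expanded or narrowed according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combined actions, but those skilled in the art should understand that the embodiments of this specification are not limited by the described order of actions, because according to the embodiments of this specification some steps can be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by the embodiments of this specification.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of this specification disclosed above are only intended to help explain this specification. The optional embodiments do not describe every detail exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and changes can be made based on the content of the embodiments of this specification. These embodiments were selected and described in detail in order to better explain the principles and practical applications of the embodiments of this specification, so that those skilled in the art can understand and use this specification well. This specification is limited only by the claims and their full scope and equivalents.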

Claims (12)

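  1. A data migration method, including:
    obtaining a migration request for a source database;
    creating a snapshot of the data to be migrated in the source database, and migrating the snapshot to a target database;
    reading the data migration log stored in the master replica database of the source database, and synchronizing it to the target database;
    performing routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
  2. The data migration method according to claim 1, wherein the source database is composed of the master replica database and at least one slave replica database, and the target database is composed of at least two replica databases.
  3. The data migration method according to claim 2, wherein the data migration log is synchronized to the target database in the following manner:
    establishing a log synchronization link based on the master replica database, the at least one slave replica database, and the at least two replica databases;
    synchronizing the data migration log to the target database according to the upstream and downstream connection relationships of the nodes in the log synchronization link.
  4. The data migration method according to claim 3, wherein synchronizing the data migration log to the target database according to the upstream and downstream connection relationships of the nodes in the log synchronization link includes:
    synchronizing the data migration log from the master replica database to the at least one slave replica database;
    synchronizing the data migration log from the at least one slave replica database to the at least two replica databases.
  5. The data migration method according to claim 2, wherein performing routing configuration on the target master replica database of the target database according to the routing rules of the master replica database includes:
    determining any one of the replica databases included in the target database as the target master replica database;
    performing routing configuration for the target master replica database according to the routing rules of the master replica database.
  6. The data migration method according to claim 2, wherein the target master replica database is determined in the following manner:
    initiating an election to the source database based on the identification information of the at least two replica databases, the election being used to elect the target master replica database from the at least two replica databases;
    obtaining the voting results submitted by the master replica database and the at least one slave replica database;
    counting the voting results, and determining the first of the at least two replica databases whose number of votes exceeds a preset threshold as the target master replica database.
  7. The data migration method according to claim 1, wherein creating a snapshot of the data to be migrated in the source database and migrating the snapshot to the target database includes:
    creating a snapshot of all data to be migrated in the master replica database, or in any one slave replica database, of the source database;
    migrating the snapshot to the target database according to a preset migration manner.
  8. The data migration method according to claim 1, wherein after the step of reading the data migration log stored in the master replica database of the source database and synchronizing it to the target database, and before the step of performing routing configuration on the target master replica database of the target database according to the routing rules of the master replica database, the method further includes:
    judging whether the number of synchronized log entries in the data migration log meets a preset threshold;
    if so, determining that the synchronization progress of the data migration log meets the preset progress threshold, and executing the step of performing routing configuration for the target master replica database of the target database according to the routing rules of the master replica database in the source database.
  9. The data migration method according to claim 1, wherein after the step of performing routing configuration for the target master replica database of the target database according to the routing rules of the master replica database in the source database, the method further includes:
    receiving a data read/write request;
    searching for the corresponding target master replica database according to the requested data identifier in the data read request;
    performing data read/write operations on the target master replica database according to the data read/write routing rules of the target master replica database.
  10. A data migration device, including:
    an obtaining module configured to obtain a migration request for a source database;
    a creation module configured to create a snapshot of the data to be migrated in the source database, and migrate the snapshot to a target database;
    a reading module configured to read the data migration log stored in the master replica database of the source database, and synchronize it to the target database;
    a configuration module configured to perform routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
  11. A computing device, including:
    a memory and a processor;
    the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to:
    obtain a migration request for a source database;
    create a snapshot of the data to be migrated in the source database, and migrate the snapshot to a target database;
    read the data migration log stored in the master replica database of the source database, and synchronize it to the target database;
    perform routing configuration on the target master replica database of the target database according to the routing rules of the master replica database.
  12. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the data migration method according to any one of claims 1 to 9.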
PCT/CN2021/094094 2020-05-29 2021-05-17 Data migration method and device WO2021238701A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/070,450 US20230087447A1 (en) 2020-05-29 2022-11-28 Data migration method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010477729.3A CN111813760B (zh) 2020-05-29 2020-05-29 数据迁移方法以及装置
CN202010477729.3 2020-05-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/070,450 Continuation US20230087447A1 (en) 2020-05-29 2022-11-28 Data migration method and device

Publications (1)

Publication Number Publication Date
WO2021238701A1 true WO2021238701A1 (zh) 2021-12-02

Family

ID=72848426

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/094094 WO2021238701A1 (zh) 2020-05-29 2021-05-17 Data migration method and device

Country Status (3)

Country Link
US (1) US20230087447A1 (zh)
CN (1) CN111813760B (zh)
WO (1) WO2021238701A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115118604A (zh) * 2022-07-01 2022-09-27 杭州宇信数字科技有限公司 Data migration method, device, system, and medium for dynamic capacity expansion and reduction

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813760B (zh) * 2020-05-29 2024-03-26 阿里巴巴集团控股有限公司 Data migration method and device
WO2022005322A1 (en) * 2020-06-29 2022-01-06 Marvell Rus Llc Method and apparatus for direct memory access of network device
CN113010496B (zh) * 2021-03-19 2024-03-08 腾讯云计算(北京)有限责任公司 Data migration method, apparatus, device, and storage medium
CN113821362B (zh) * 2021-11-25 2022-03-22 云和恩墨(北京)信息技术有限公司 Data replication method and device
CN115080541A (zh) * 2022-06-16 2022-09-20 京东科技信息技术有限公司 Data migration method, apparatus, device, and storage medium
CN115600172A (zh) * 2022-12-15 2023-01-13 南京鹏云网络科技有限公司(Cn) Identity state processing method, device, medium, and computer program product for a distributed storage system
CN117370078B (zh) * 2023-10-31 2024-05-28 广州鼎甲计算机科技有限公司 Database backup management method, apparatus, computer device, and storage medium
CN117194390B (zh) * 2023-11-08 2024-02-09 建信金融科技有限责任公司 Database migration method and device
CN117349025B (zh) * 2023-12-01 2024-02-02 中控技术股份有限公司 Configuration migration method, apparatus, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528368A (zh) * 2014-09-30 2016-04-27 北京金山云网络技术有限公司 Database migration method and device
CN107391634A (zh) * 2017-06-30 2017-11-24 北京奇虎科技有限公司 Data migration method and device
CN110502373A (zh) * 2019-07-26 2019-11-26 苏州浪潮智能科技有限公司 Method, device, and readable medium for master-slave node data synchronization
US20200104377A1 (en) * 2018-09-28 2020-04-02 Oracle International Corporation Rules Based Scheduling and Migration of Databases Using Complexity and Weight
CN111813760A (zh) * 2020-05-29 2020-10-23 阿里巴巴集团控股有限公司 Data migration method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108132949B (zh) * 2016-12-01 2021-02-12 腾讯科技(深圳)有限公司 Method and device for data migration in a database cluster
CN109842636A (zh) * 2017-11-24 2019-06-04 阿里巴巴集团控股有限公司 Cloud service migration method, apparatus, and electronic device

Also Published As

Publication number Publication date
CN111813760A (zh) 2020-10-23
CN111813760B (zh) 2024-03-26
US20230087447A1 (en) 2023-03-23

Similar Documents

Publication Publication Date Title
WO2021238701A1 (zh) Data migration method and device
US11416495B2 (en) Near-zero downtime relocation of a pluggable database across container databases
WO2020224374A1 (zh) Data replication method, apparatus, computer device, and storage medium
US10915549B2 (en) Techniques for keeping a copy of a pluggable database up to date with its source pluggable database in read-write mode
US9589041B2 (en) Client and server integration for replicating data
US10891291B2 (en) Facilitating operations on pluggable databases using separate logical timestamp services
US7299378B2 (en) Geographically distributed clusters
CN113535656B (zh) Data access method, apparatus, device, and storage medium
US20170249246A1 (en) Deduplication and garbage collection across logical databases
WO2018113580A1 (zh) Data management method and server
US20140181026A1 (en) Read-only operations processing in a paxos replication system
CN111797121B (zh) Strong-consistency query method, device, and system for a service system with a read/write-splitting architecture
CN111078667B (zh) Data migration method and related device
CN111386522A (zh) Multi-region, multi-master replication of database tables
EP2380090B1 (en) Data integrity in a database environment through background synchronization
WO2022111188A1 (zh) Transaction processing method, system, apparatus, device, storage medium, and program product
CN108369588B (zh) Database-level automatic storage management
WO2024041433A1 (zh) Data processing method and device
US20230394024A1 (en) Data processing method and apparatus, electronic device, storage medium, and program product
CN113297159A (zh) Data storage method and device
CN113297231A (zh) Database processing method and device
CN113946542A (zh) Data processing method and device
CN114661690A (zh) Multi-version concurrency control and log cleanup method, node, device, and medium
CN111400098A (zh) Replica management method, apparatus, electronic device, and storage medium
US20240126781A1 (en) Consensus protocol for asynchronous database transaction replication with fast, automatic failover, zero data loss, strong consistency, full sql support and horizontal scalability

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21813370

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21813370

Country of ref document: EP

Kind code of ref document: A1
