CN117851434A - Method, system and storage medium for concurrent migration of database - Google Patents

Method, system and storage medium for concurrent migration of database Download PDF

Info

Publication number
CN117851434A
CN117851434A CN202410258965.4A CN202410258965A CN117851434A CN 117851434 A CN117851434 A CN 117851434A CN 202410258965 A CN202410258965 A CN 202410258965A CN 117851434 A CN117851434 A CN 117851434A
Authority
CN
China
Prior art keywords
data
database
migration
source database
concurrency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410258965.4A
Other languages
Chinese (zh)
Other versions
CN117851434B (en
Inventor
梁彬华
张瑜平
春煜
毕锦程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ylink Computing System Co ltd
Original Assignee
Shenzhen Ylink Computing System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ylink Computing System Co ltd filed Critical Shenzhen Ylink Computing System Co ltd
Priority to CN202410258965.4A priority Critical patent/CN117851434B/en
Publication of CN117851434A publication Critical patent/CN117851434A/en
Application granted granted Critical
Publication of CN117851434B publication Critical patent/CN117851434B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a system and a storage medium for concurrent migration of a database, which comprise the following steps: scanning and analyzing a source database to obtain the characteristics of the source database; splitting and sequencing the tasks of the data migration according to the analysis result of the source database to form the concurrency plan; the first list structure ddl statement is led out from the source database, converted into a second list structure ddl statement of a target database, executed into the target database, and the foreign key reference is deleted from the target database according to the analysis result of the source database; performing the data migration according to the concurrency plan; and after the data is transplanted, checking, and restoring the deleted foreign key reference after the checking is successful. The method and the system can improve the concurrency of database migration, further improve the efficiency of data migration, and realize a universal, intelligent, efficient and customizable data migration scheme.

Description

Method, system and storage medium for concurrent migration of database
Technical Field
The invention relates to the field of database migration, in particular to a method, a system and a storage medium for concurrent migration of a database.
Background
Interpretation of related terms:
source database: in data migration, a database is used as a source of data.
Target database: in data migration, a database is used as a destination of data migration. The target database and the source database may be the same type of database or may be different types of databases.
Main table: among tables having foreign key association relationships, a table having a foreign key of another relationship as a primary key is called a primary table.
From the table: among tables having foreign key association relationships, a table having a foreign key is called a slave table of the master table.
In the field of database migration technology, the situation is often faced with: firstly, the amount of transplanted data is large, such as tens of TB and even hundreds of TB; secondly, the data size distribution in different data tables is extremely uneven, for example, small data tables such as parameter tables can be only hundreds of pieces of data, and transaction detail tables for recording each service can be hundreds of millions of pieces.
In the existing data migration method, in order to achieve the purpose of high efficiency, multiple data tables are generally migrated simultaneously. But typically suffer from the following drawbacks:
(1) Because the data volume of different data tables in the database is unevenly distributed, when the data tables are concurrently transplanted according to the dimension of the tables, the total time spent on data transplantation is determined by the transplantation time of the data table with larger data volume, and the total time spent on data transplantation can become the bottleneck of whole database transplantation;
(2) When there is a foreign key dependency relationship between data tables, the slave table must wait for the master table to be transplanted after the transplantation, and the parallel transplantation cannot be performed. If the data amount of the slave table and the master table is large, the total time length of data transplantation is determined by the sum of the time spent by the slave table and the master table, so that the concurrency is greatly reduced, and the improvement of the data transplantation efficiency is not facilitated.
Therefore, a method and a system for database concurrency migration are needed to be developed, which can solve the problems of low concurrency rate and low efficiency of database migration caused by uneven data volume distribution of a data table.
Disclosure of Invention
The invention aims to provide a method, a system and a storage medium for concurrent migration of a database, which are used for solving the problems of low concurrency rate and low efficiency of database migration caused by uneven data volume distribution of a data table in the prior art.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
according to one aspect of the present invention, there is provided a method of database concurrency migration, the method comprising:
analyzing a source database, scanning and analyzing the source database, and obtaining the characteristics of the source database;
scheduling a concurrency plan, namely splitting and sequencing the tasks transplanted by the data according to the analysis result of the source database to form the concurrency plan;
The method comprises the steps of table structure transplanting and foreign key reference removing, wherein a first table structure ddl statement is derived from a source database, converted into a second table structure ddl statement of a target database, executed into the target database, and the foreign key reference is deleted from the target database according to the analysis result of the source database;
data transplantation, wherein the data transplantation is carried out according to the concurrency plan;
and checking the data, adding foreign key references, performing checking after the data is transplanted, and restoring the deleted foreign key references after the checking is successful.
According to another aspect of the present invention, there is provided a database concurrency migration system, comprising:
the source database analysis module scans and analyzes a source database to acquire the characteristics of the source database;
the concurrency plan scheduling module is used for splitting and sequencing the tasks transplanted by the data according to the analysis result of the source database to form the concurrency plan;
the table structure transplanting and foreign key reference removing module is used for leading out a first table structure ddl statement from the source database, converting the first table structure ddl statement into a second table structure ddl statement of a target database, executing the second table structure ddl statement into the target database, and deleting the foreign key reference in the target database according to the analysis result of the source database;
The data transplanting module is used for carrying out data transplanting according to the concurrency plan;
and the data verification and foreign key reference adding module is used for verifying after the data migration is completed, and restoring the deleted foreign key reference after the verification is successful.
Based on the foregoing solution, the method for scanning and analyzing a source database to obtain features of the source database includes:
counting the data quantity N of each data table of the source database;
summarizing the data volume to obtain the total data volume N of the source database T
Identifying an oversized table in the data table;
and analyzing the table structure of the source database to determine whether foreign key dependency relationship exists between the data tables.
Based on the foregoing solution, the method for splitting and ordering the tasks of data migration according to the result of the analysis of the source database to form the concurrency plan includes:
sequencing each data table according to the data quantity sequence, and storing the data tables into a queue Q;
sequentially taking out the data table from the queue Q, and judging whether the current data table has an oversized table identifier or not;
if yes, identifying partition keys of the oversized table, and partitioning the oversized table according to the partition keys;
And allocating the concurrent groups, calculating the average data quantity of each concurrent group, allocating the concurrent groups according to the calculation result, and generating the concurrent planning schedule.
Based on the foregoing solution, the data migration is performed according to the concurrency schedule, using a multi-process and multi-thread architecture mode, including:
different concurrent groups adopt the multi-process parallel mode to carry out the data migration;
and carrying out the data migration by adopting the multithreading parallel mode by different data tables in the same concurrent group.
Based on the foregoing scheme, the method for verifying after the data migration is completed, and restoring the deleted foreign key reference after the verification is successful includes:
checking whether the data amount of each data table in the target database is consistent with a table record number field in the concurrent scheduling table;
if yes, marking that the transplanting is successful;
if not, comparing the data in the source database and the target database one by one according to the primary key of the data table, screening out difference data, and forming an analysis report and revising sql;
executing the revision sql;
executing the foreign key reference ddl statement to restore the foreign key reference of the target database.
Based on the foregoing solution, the method for partitioning the oversized table according to the partition key includes:
calculating an average data size for each of the partitionsWherein->For the average data volume of each of the partitions, N t The data size of the super-large table is calculated, and n is the number of the super-large table divided into small data tables;
identifying partition keys of the oversized table;
computing the sum of said data amounts over i key rangesComparison->And->When meeting the size ofAnd->When i is the end point of the key range, i+1 is the start point of the next key range, where N i An amount of data within any of the key ranges;
and repeating the previous step until the oversized table partition is completed.
Based on the foregoing solution, the method for calculating the average data size of each concurrent group and distributing the concurrent group according to the calculation result includes:
calculating the average data volume of each concurrent groupWherein->For the average data volume of each of the concurrent groups, N T M is the number of concurrent groups transplanted by the data;
calculating the sum of the data amounts of j data tablesComparison->And->When meeting the size ofAnd->Then j of said data tables are assigned to one of said concurrent groups, where N j A data amount for any one of the data tables;
and repeating the previous step until all the data tables are distributed.
Based on the foregoing scheme, the generation rule of the above-mentioned revision sql includes:
if the difference data only exist in the source database, the difference data are taken out from the source database to generate an insert statement;
if the difference data only exist in the target database, the difference data are taken out from the target database to generate delete sentences;
and if the difference data exist in the source database and the target database, the difference data are taken out from the source database to generate update sentences.
The embodiment of the invention also provides a computer storage medium which stores computer executable codes; after the computer executable code is executed, the method for concurrently transplanting the database provided by the one or more technical schemes can be realized.
Compared with the prior art, the invention has at least the following advantages and positive effects:
(1) According to the method, the oversized table is partitioned, the oversized table is split into the plurality of small data tables, and the plurality of tasks are concurrently subjected to data migration in multiple processes and multiple threads, so that the problem of long migration time in a database caused by uneven data size distribution of the data tables is solved, and the migration efficiency is effectively improved;
(2) The data verification process uses partition verification, so that the verification time after data transplantation is effectively shortened, the transplantation efficiency is further improved, and the accuracy of data transplantation is ensured;
(3) The partitioning of the ultra-large table can be automatically performed according to a set technical scheme, parameterized configuration is set, and the partitioning can be flexibly adjusted according to actual conditions during use; the partition result automatically completed according to the set technical scheme can be manually adjusted according to different service characteristics of the database, so that the partition key is ensured to carry more important service and technical information, and the partition is more accurate and balanced;
(4) For the data transplanting task with small data volume, the transplanting can be deployed by using a single machine and a single node, so that the method is simple and efficient; for a data transplanting task with large data volume, the task can be laterally expanded, distributed deployment is adopted, and resources such as CPU (central processing unit), memory and the like of a plurality of machines are fully utilized for parallel transplanting;
(5) The method has good universality and is suitable for data migration among all relational databases. The migration of data between the source and target databases can be accomplished as long as they support standard select and insert statements. Therefore, the method can be widely used for databases such as oracle, mysql, postgresql, db2 and other national databases.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 is a flow chart of a method for concurrent database migration according to the present invention;
FIG. 2 is a schematic diagram of data migration using multi-process and multi-thread architecture modes in accordance with the present invention;
FIG. 3 is a schematic diagram of the time spent in the prior art of data migration;
FIG. 4 is a schematic diagram of the time taken for the optimized database concurrency migration of the present invention;
FIG. 5 is a flow chart of a method of the present invention for obtaining characteristics of a source database;
FIG. 6 is a flow chart of a method of forming a concurrency plan of the present invention;
FIG. 7 is a flow chart of a method of data verification and foreign key referencing according to the present invention;
FIG. 8 is a schematic diagram of a block flow diagram system of the present invention.
Detailed Description
For a more clear explanation of the objects, technical solutions and advantages of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and the exemplary embodiments can be implemented in various forms and should not be construed as being limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
The invention will be described in detail with reference to specific examples below:
example 1
As shown in fig. 1, the embodiment provides a method for concurrent migration of databases, which specifically includes the following steps:
s1: analyzing a source database, scanning and analyzing the source database, and obtaining the characteristics of the source database;
in this embodiment, the scanning and analysis of the source database is an automatic process, and the data amount including, but not limited to, the data amount of each data table, the total data amount of all data tables, and the foreign key dependency relationship can be obtained by analysis, and the oversized table can be identified, so that preparation work is made for subsequent data migration.
S2: scheduling a concurrency plan, namely splitting and sequencing the tasks transplanted by the data according to the analysis result of the source database to form the concurrency plan;
in this embodiment, the concurrency planning adjusts the extremely unbalanced data amount in the concurrency task to a relatively balanced state, so that the multithreaded concurrency task is more balanced, thereby solving the problem of long migration time in the database caused by uneven data amount distribution of the data table, and effectively shortening the time of data migration.
S3: the method comprises the steps of table structure transplanting and foreign key reference removing, wherein a first table structure ddl statement is derived from a source database, converted into a second table structure ddl statement of a target database, executed into the target database, and the foreign key reference is deleted from the target database according to the analysis result of the source database;
in this embodiment, when there is a foreign key dependency relationship between data tables, the slave table must wait for the master table to be migrated before parallel migration is possible. Therefore, before data migration, all foreign key reference relations are deleted, so that concurrency of data table migration is improved. Specifically, according to the analysis result of S1, when determining that the foreign key reference exists, backing up all foreign key reference ddl sentences, and then temporarily deleting all foreign key references of the target database by using an 'alter table name drop constraint foreign key name' sentence.
S4: data transplantation, wherein the data transplantation is carried out according to the concurrency plan;
in this embodiment, as shown in fig. 2, data migration using multi-process and multi-thread architecture modes is illustratively shown. Specifically, according to the concurrency plan formed in S2, the architecture mode of multi-process and multi-thread is adopted. Further, the data migration is carried out by adopting the multi-process parallel mode by different concurrent groups, so that the resources of a plurality of machines can be fully utilized for parallel migration; and carrying out data migration on different data tables in the same concurrency group by adopting the multithreading parallel mode, specifically, carrying out parallel migration on different data tables by different threads or carrying out parallel migration on different partitions of the same oversized table by different threads.
In the existing data migration method, in order to increase the migration speed and reduce the migration time, the migration operation is generally performed in parallel with the table as the caliber, taking 6 data tables t1, t2, t3, t4, t5 and t6 in the source database as examples, and the data amount and external key dependency relationship of the 6 data tables are as follows:
the data size of t1 is 1 ten thousand, and no external key dependency relation exists;
t2 data size is 2 ten thousand, and no foreign key dependency relation exists;
t3 data size is 1 hundred million, and has external key dependency relationship, which is a master table;
the data quantity of t4 is 10 hundred million, and the data quantity has an external key dependency relationship and is a slave table of t 3;
the data quantity of t5 is 1 ten thousand records, the external key dependency relationship exists, and the data quantity is t 3's slave table;
the t6 data size is 2 hundred million records, and no foreign key dependence exists.
If 6 threads are used to concurrently migrate 6 data tables, as shown in fig. 3, a schematic diagram of the time taken by the existing data migration technique is exemplarily shown. When the foreign key dependency relationship exists, the secondary table must wait for the primary table to be transplanted after the primary table is transplanted, that is, t4 and t5 must be transplanted after t3 is transplanted, and the data size of t3 and t4 is larger, as can be seen from fig. 3, the total time spent for data transplantation mainly depends on the sum of the time spent for t3 and t4 to be transplanted, the concurrency degree at the later stage of the time axis is greatly reduced, and the effects of optimizing the concurrency degree and shortening the transplanting time cannot be expected.
In this embodiment, the six data tables t1, t2, t3, t4, t5 and t6 are transplanted, all foreign key references are deleted, and the three large data tables t3, t4 and t6 are respectively split into n small data tables with the same magnitude, where n is a different value for t3, t4 and t6 in the splitting process of the large data tables, and is a configurable parameter that can be adjusted according to the data amount of the data tables and the requirement of data transplantation. As shown in fig. 4, a schematic diagram of time spent for concurrent migration of the database optimized in this embodiment is exemplarily shown, where t3-1 represents a split first small data table of the t3 table, and the other is the same.
In this embodiment, since foreign key references are deleted and the oversized table is split before data migration is performed, concurrency is optimized, and migration time is shortened.
S5: and checking the data, adding foreign key references, performing checking after the data is transplanted, and restoring the deleted foreign key references after the checking is successful.
In this embodiment, since the processing such as removing the foreign key and partitioning is performed before the data migration, in order to ensure the accuracy of the data migration, avoiding missing or repeating the data, performing verification after the data migration is successful, and restoring the foreign key reference deleted in the foregoing step after the data migration is verified and confirmed, so as to keep the structure consistent with that of the source database table, and finally completing the data migration formally.
In the embodiment, for the data transplanting task with small data volume, a single machine and a single node can be used for deploying the transplanting, so that the method is simple and efficient; for the data transplanting task with large data volume, the method can be laterally expanded, adopts distributed deployment, and fully utilizes the resources such as cpu, memory and the like of a plurality of machines to carry out parallel transplanting.
Example 2
As shown in fig. 5, the present embodiment provides a method for obtaining features of a source database, which is specifically as follows:
S11: counting the data quantity N of each data table of the source database;
in this embodiment, the data amount of each data table is preferably counted in a manner of parallel initiation of "select count (x) from table name". In the process of counting each data table, the larger the data quantity in the data table is, the more time is consumed, but the step is only needed to be executed once, the counting result is recorded, and the subsequent processing directly adopts the counting result.
S12: summarizing the data volume to obtain the total data volume N of the source database T
In this embodiment, the data amounts of the data tables are summed, and the final result is the total data amount of the source database.
S13: identifying an oversized table in the data table;
in this embodiment, preferably, the oversized table is defined according to the data amount of the data table, and it is determined whether the data amount of each data table exceeds a set threshold according to the data amount of each data table counted in S11, and if so, the oversized table is identified as the oversized table. Further, the threshold is a configurable parameter, and a user can flexibly adjust the threshold according to actual requirements. For example, in the present embodiment, when the data amount threshold of the oversized table is set to 1 million, that is, the data amount of the data table exceeds 1 million, the oversized table is identified.
S14: and analyzing the table structure of the source database to determine whether foreign key dependency relationship exists between the data tables.
In this embodiment, the primary purpose of analyzing the table structure of the source database is to determine whether there are foreign key dependencies between the data tables. When the dependency relationship exists, the slave table can be transplanted after the master table is transplanted, and parallel transplantation cannot be performed, so that when the characteristics of the source database are acquired, the foreign key dependency relationship among the data tables is determined for the subsequent steps to be directly adopted when needed.
Further, the method for determining whether there is a foreign key dependency relationship between the data tables is as follows:
the determination is made for Mysql using the following statement:
the determination is made for Oracle using the following statement:
using this statement for postgresql makes a determination
Using this statement for db2 to make the determination
Example 3
As shown in fig. 6, the present embodiment exemplarily illustrates a method for forming a concurrency plan, which is specifically as follows:
s21: sequencing each data table according to the data quantity sequence, and storing the data tables into a queue Q;
in this embodiment, it is preferable that each data table is sorted according to the statistics of S1 in descending order of data amount and stored in the queue Q, and it should be noted that the descending order is only an exemplary illustration and is not limited by the way of arranging.
S22: sequentially taking out the data table from the queue Q, and judging whether the current data table has an oversized table identifier or not;
in this embodiment, according to the identification result of S13, it is determined whether the data table has an oversized table identification, and if so, step S43 is entered for processing; if not, the flow advances to S44 to wait for the concurrent group allocation.
S23: if yes, identifying partition keys of the oversized table, and partitioning the oversized table according to the partition keys;
in this embodiment, a partition key capable of parameterizing configuration is provided, for example, the partition key may be set to date, serial number, or the like.
Further, the method for partitioning the oversized table according to the partition key specifically comprises the following steps:
s231: calculating each of saidAverage data volume of partitionWherein->For the average data volume of each of the partitions, N t The data size of the super-large table is calculated, and n is the number of the super-large table divided into small data tables;
s232: identifying partition keys of the oversized table;
s233: computing the sum of said data amounts over i key rangesComparison->And->When meeting the size ofAnd->When i is the end point of the key range, i+1 is the start point of the next key range, where N i An amount of data within any of the key ranges;
S234: and repeating the previous step until the oversized table partition is completed.
In this embodiment, further, with the partition key having the date exch_date field set as the oversized table, the date of the exch_date field ranges from 2000 to 2024, the data size of the oversized table is 1 million, the number of the oversized table divided into small data tables is n, and the average data size of each partition is taken as an exampleThe content of the estimation of the date range according to the partition average data amount is as follows:
starting from year 2000, it is assumed that the sum of data amounts from year 2000 to year 2008 is smaller thanHowever, the sum of the data amounts in 2000 to 2009 is greater than +>Then 2000 is the start of the first partition key range, 2008 is the end of the first partition key range, 2009 is the start of the next partition key range. The above process is repeatedly performed until 2024, and partitioning of the extra-large table is completed.
Preferably, in order to simplify the complexity of calculation, in practical application, the partitioning range is not required to be too precise, and the purpose of balancing the data volume can be achieved only by ensuring that the magnitude of the data volume of each small data table is the same.
Furthermore, the embodiment also provides a simple algorithm for partitioning the oversized table, specifically taking the date exch_date field as a partitioning key of the oversized table as an example, firstly carrying out multithread statistics on the data quantity of each year, and then sorting according to the year; if the magnitude of the data quantity of the adjacent years reaches a preset value, merging the adjacent years into a partition; if the individual data amount for a year has reached a preset value, the year is taken as a partition.
In this embodiment, the partition of the oversized table is automatically deduced first, and the partition key can carry more important business and technical information by manually adjusting the partition key according to different business characteristics of the database under the result of the automatic deduction, so that the partition can be more accurate and balanced.
S24: and allocating the concurrent groups, calculating the average data quantity of each concurrent group, allocating the concurrent groups according to the calculation result, and generating the concurrent planning schedule.
In this embodiment, the concurrency group allocation is to allocate all data tables to be transplanted in the source database, including an undivided data table and an oversized table split into a plurality of partitions, and reasonably allocate all data tables to different machines for concurrency transplantation, so as to improve the transplantation speed.
Specifically, the present embodiment exemplarily illustrates a method for concurrency group allocation, including:
s241: calculating the average data volume of each concurrent groupWherein->For the average data volume of each of the concurrent groups, N T M is the number of concurrent groups transplanted by the data;
in this embodiment, m is a configurable parameter, which can be flexibly adjusted according to practical situations in practical application.
S242: calculating the sum of the data amounts of j data tablesComparison->And->When meeting +.>And->Then j of said data tables are assigned to one of said concurrent groups, where N j A data amount for any one of the data tables;
in this embodiment, preferably, the data table may be an undivided data table, or may be a small data table formed by splitting an oversized table.
S243: and repeating the previous step until all the data tables are distributed.
Preferably, in order to simplify the complexity of calculation, in practical application, the sum of the data amounts of each concurrent group only needs to ensure that the orders of magnitude are the same, so as to achieve the purpose of balancing the data amounts.
After the data table allocation is completed, a concurrency scheduling table as shown in table 1 is finally formed:
TABLE 1
Concurrency group: the grouping of different data tables is to use multiple processes to perform data transplantation, different processes configure different grouping numbers, and the process only loads the configuration table items of the grouping numbers to perform data transplantation.
Table name: and (3) the table name of the data table to be transplanted is a new table name corresponding to the split small data table for the split oversized table, and if t3-1 represents the first small data table split by the t3 table, and the other is the same.
Super large table partition white condition: for an oversized table, the partitions are partitioned according to the range of values of the partition key, and for a table with a small data volume and no need for partitioning, this field is empty.
Key range: setting the range value of partition key of oversized table partition, and for the table with small data quantity and no partition, the field is empty.
Transplanting state: the migration status is determined by this field and the state of a table is supported to be modified alone to repeatedly migrate the table.
Table record number: the data amount corresponding to the data table.
In this embodiment, preferably, after the concurrency schedule is generated in practical application, the concurrency schedule may be manually adjusted according to practical situations, so as to make the concurrency schedule more accurate.
Example 4
As shown in fig. 7, the present embodiment exemplarily illustrates a method for data verification and foreign key referencing, including:
s51: checking whether the data amount of each data table in the target database is consistent with a table record number field in the concurrent scheduling table;
in this embodiment, it is checked whether the data amount of each data table in the target database is consistent with the concurrency schedule, so as to ensure the accuracy of data migration.
S52: if yes, marking that the transplanting is successful;
S53: if not, comparing the data in the source database and the target database one by one according to the primary key of the data table, screening out difference data, and forming an analysis report and revising sql;
in this embodiment, after confirming that the data is inconsistent, the source database and the target database are directly compared, so as to screen out the differential data. Further, for the super-large table subjected to the partitioning, the data is checked according to the partitioning, and if the partitioning data is inconsistent, only the detail data of the partitioning may be checked.
Further, this embodiment exemplarily shows the generation rule of the revision sql, specifically as follows:
s531: if the difference data only exist in the source database, the difference data are taken out from the source database to generate an insert statement;
specifically, if the difference data exists only in the source database and does not exist in the target database, the missing of the migration is indicated, and the migration should be continued to the target database.
S532: if the difference data only exist in the target database, the difference data are taken out from the target database to generate delete sentences;
Specifically, if the difference data only exists in the target database and does not exist in the source database, the difference data indicates that the data which is not in the data migration range is migrated, and the difference data is deleted from the target database.
S533: and if the difference data exist in the source database and the target database, the difference data are taken out from the source database to generate update sentences.
Specifically, if the difference data exists in both the source database and the target database, but there is a difference, the difference data should be updated according to the data in the source database to ensure the accuracy of the migration. Preferably, the generated revision sql is presented in the form of a report, in which application it can be decided whether to execute into the target database according to the actual situation.
In the embodiment, after the data is successfully transplanted, the verification is carried out, so that the situations of data misplacement, missed migration and the like can be effectively avoided, and the accuracy of the data transplantation is ensured; and the data is checked by using the partition, so that the checking time is effectively shortened, and the efficiency of data transplantation is further improved.
S54: executing the revision sql;
s55: executing the foreign key reference ddl statement to restore the foreign key reference of the target database.
Specifically, after all revisions are executed, the deleted foreign key references are restored, the structure of the foreign key references is kept consistent with that of the source database table, and the data migration is formally completed.
Example 5
As shown in fig. 8, the present embodiment exemplarily presents a set of database concurrency migration system, which includes a source database analysis module, a concurrency planning module, a table structure migration and foreign key reference removal module, a data migration module, a data verification and foreign key reference addition module, specifically as follows:
the source database analysis module scans and analyzes a source database to acquire the characteristics of the source database; specifically, the source database analysis module comprises a data volume statistics and summarization unit, an oversized table identification unit and a table structure analysis unit.
Data volume statistics and summarization unit: specifically, the method is used for counting the data quantity N of each data table of a source database, and summing the data quantity of each data table to obtain the total data quantity N of the source database T For direct retrieval in the subsequent data transplanting process.
Ultra-large table identification unit: in particular for identifying an oversized table in the data table. Specifically, according to the data amount of each data table counted by the data amount counting and summarizing unit, whether the data amount of each data table exceeds a set threshold value is judged, and if so, the table is identified as an oversized table. In practical application, a user can flexibly adjust the threshold according to practical requirements.
Table structure analysis unit: specifically, the method is used for determining whether the foreign key dependency relationship exists between the data tables for direct retrieval in the subsequent data transplanting process.
The concurrency plan scheduling module is used for splitting and sequencing the tasks transplanted by the data according to the analysis result of the source database to form the concurrency plan; specifically, the concurrency planning module includes a data table queue forming unit, an oversized table partition unit, and a concurrency allocation unit.
A data table queue forming unit: specifically, according to the data amount of each data table counted by the data amount counting and summarizing unit, each data table is ordered according to the data amount sequence and stored in the queue Q, further, the order ordering can be descending or ascending, and the order ordering can be flexibly adjusted in practical application.
Ultra-large table partition unit: specifically, the specific method for determining whether the data table in the queue Q has the oversized table identifier, identifying the partition key of the data table with the oversized table identifier, and partitioning the oversized table according to the partition key is described in detail in embodiment 3, and is not described herein. In this embodiment, preferably, the oversized table is split before data migration, so that concurrency can be effectively optimized, and migration time can be shortened.
Concurrency allocation unit: specifically, the average data size of each concurrency group is calculated, the concurrency groups are allocated according to the calculation result, the specific computer allocation method is described in detail in embodiment 3, and is not described in detail herein, and finally, a concurrency planning schedule is generated, and can be directly invoked in the subsequent data transplanting process. In this embodiment, the concurrency group allocation is to allocate all data tables to be transplanted in the source database, including an undivided data table and an oversized table split into a plurality of partitions, and reasonably allocate all data tables to different machines for concurrency transplantation, so as to improve the transplantation speed.
The table structure transplanting and foreign key reference removing module is used for leading out a first table structure ddl statement from the source database, converting the first table structure ddl statement into a second table structure ddl statement of a target database, executing the second table structure ddl statement into the target database, and deleting the foreign key reference in the target database according to the analysis result of the source database; specifically, the method comprises a table structure transplanting unit and an external key removing reference unit.
Table structure transplanting unit: specifically, according to the ddl language of the table structure of the source database, the ddl language of the target database is determined so as to be consistent with the source database.
Foreign key reference unit: specifically, according to the foreign key dependency relationship determined by the table structure analysis unit, when the existence of the foreign key reference is confirmed, all foreign key reference ddl sentences are backed up first, and then all foreign key references of the target database are temporarily deleted by using the statement of 'alter table name drop constraint foreign key name'.
The data transplanting module is used for carrying out data transplanting according to the concurrency plan;
in this embodiment, specifically, data migration is performed using a multi-process and multi-thread architecture mode. Further, according to the concurrency plan output by the concurrency plan scheduling module, the method is carried out in a multi-process and multi-thread architecture mode. Further, the concurrent plan output by the concurrent plan arrangement module carries out data transplantation by adopting a multi-process parallel mode by different concurrent groups, and the resources of a plurality of machines can be fully utilized for parallel transplantation; different data tables in the same concurrency group are subjected to data migration in a multithreading parallel mode, and specifically, different threads carry out parallel migration on different data tables or different threads carry out parallel migration on different partitions of the same oversized table.
And the data verification and foreign key reference adding module is used for verifying after the data migration is completed, and restoring the deleted foreign key reference after the verification is successful. Specifically, the device comprises a data quantity checking unit, a difference data confirming unit, a difference data revising unit and an external key referencing unit.
Data amount checking unit: specifically, it is checked concurrently whether the data amount of each data table in the target database is consistent with the number of table records in the concurrency scheduling table, so as to ensure the accuracy of data migration. If the identification is consistent, the identification is successfully transplanted, and if the identification is inconsistent, the identification enters a difference data confirmation unit for further confirmation.
A difference data confirmation unit: specifically, after confirming that the data are inconsistent, the data in the source database and the target database are compared piece by piece according to the primary key of the data table, the differential data are screened out, and an analysis report and a revision sql are formed. Further, for the super-large table subjected to the partitioning, the data is checked according to the partitioning, and if the partitioning data is inconsistent, only the detail data of the partitioning may be checked.
A difference data revision unit: specifically, the difference data is distinguished in the source database and the target database according to the difference data, and the difference data is reasonably revised. Further, if the difference data only exist in the source database, the difference data are taken out from the source database to generate an insert statement; if the difference data only exist in the target database, the difference data are taken out from the target database to generate delete sentences; and if the difference data exist in the source database and the target database, the difference data are taken out from the source database to generate update sentences.
Preferably, verification is carried out after successful data migration, so that the situations of data misplacement, missed migration and the like can be effectively avoided, and the accuracy of data migration is ensured; and the data is checked by using the partition, so that the checking time is effectively shortened, and the efficiency of data transplantation is further improved.
Add foreign key reference unit: and after all the difference data are revised by the difference data revising unit, executing the foreign key reference ddl statement to restore the foreign key reference of the target database according to the foreign key reference ddl statement backed up by the foreign key reference unit, and keeping the structure consistent with that of the source database table, so that the data migration is formally completed.
In an exemplary embodiment of the present invention, a computer storage medium capable of implementing the above method is also provided. On which a program product is stored which enables the implementation of the method described above in the present specification. In some possible embodiments, various aspects of the present disclosure may also be implemented in the form of a program product comprising program code for causing an apparatus as described above to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the apparatus.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims. It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A method of database concurrency migration, the method comprising:
analyzing a source database, scanning and analyzing the source database, and obtaining the characteristics of the source database;
scheduling a concurrency plan, namely splitting and sequencing the tasks transplanted by the data according to the analysis result of the source database to form the concurrency plan;
the method comprises the steps of table structure transplanting and foreign key reference removing, wherein a first table structure ddl statement is derived from a source database, converted into a second table structure ddl statement of a target database, executed into the target database, and the foreign key reference is deleted from the target database according to the analysis result of the source database;
Data transplantation, wherein the data transplantation is carried out according to the concurrency plan;
and checking the data, adding foreign key references, performing checking after the data is transplanted, and restoring the deleted foreign key references after the checking is successful.
2. The method for concurrency migration of databases according to claim 1, wherein the method for scanning and analyzing a source database to obtain features of the source database comprises:
counting the data quantity N of each data table of the source database;
summarizing the data volume to obtain the total data volume N of the source database T
Identifying an oversized table in the data table;
and analyzing the table structure of the source database to determine whether foreign key dependency relationship exists between the data tables.
3. The method for concurrency migration of databases according to claim 2, wherein the method for splitting and ordering tasks of data migration according to the results of the analysis of the source databases to form the concurrency plan comprises:
sequencing each data table according to the data quantity sequence, and storing the data tables into a queue Q;
sequentially taking out the data table from the queue Q, and judging whether the current data table has the oversized table identifier or not;
If yes, identifying partition keys of the oversized table, and partitioning the oversized table according to the partition keys;
and allocating the concurrent groups, calculating the average data quantity of each concurrent group, allocating the concurrent groups according to the calculation result, and generating a concurrent planning schedule.
4. A method of concurrent migration of databases as in claim 3, wherein said data migration is performed in a multi-process and multi-thread architecture mode according to said concurrency schedule, comprising:
different concurrent groups adopt the multi-process parallel mode to carry out the data migration;
and carrying out the data migration by adopting the multithreading parallel mode by different data tables in the same concurrent group.
5. A method of concurrent migration of a database according to claim 3, wherein the data migration is verified after completion, and the method of restoring the deleted foreign key reference after verification is successful comprises:
checking whether the data amount of each data table in the target database is consistent with a table record number field in the concurrent scheduling table;
if yes, marking that the transplanting is successful;
If not, comparing the data in the source database and the target database one by one according to the primary key of the data table, screening out difference data, and forming an analysis report and revising sql;
executing the revision sql;
executing the foreign key reference ddl statement to restore the foreign key reference of the target database.
6. A method of concurrent migration of a database as claimed in claim 3, wherein said method of partitioning said oversized table according to said partitioning key comprises:
calculating an average data size for each of the partitionsWherein->For the average data volume of each of the partitions, N t The data size of the super-large table is calculated, and n is the number of the super-large table divided into small data tables;
identifying partition keys of the oversized table;
computing the sum of said data amounts over i key rangesComparison->And->When meeting +.>And (2) andwhen i is the end point of the key range, i+1 is the start point of the next key range, where N i An amount of data within any of the key ranges;
and repeating the previous step until the oversized table partition is completed.
7. A method of database concurrency migration according to claim 3, wherein said calculating an average data size for each of said concurrency groups, assigning said concurrency groups based on said calculation results comprises:
Calculating the average data volume of each concurrent groupWherein->For the average data volume of each of the concurrent groups, N T M is the number of concurrent groups transplanted by the data;
calculating the sum of the data amounts of j data tablesComparison->And->When meeting +.>And->Then j of said data tables are assigned to one of said concurrent groups, where N j A data amount for any one of the data tables;
and repeating the previous step until all the data tables are distributed.
8. The method of claim 5, wherein the generating rules of the revised sql comprise:
if the difference data only exist in the source database, the difference data are taken out from the source database to generate an insert statement;
if the difference data only exist in the target database, the difference data are taken out from the target database to generate delete sentences;
and if the difference data exist in the source database and the target database, the difference data are taken out from the source database to generate update sentences.
9. A computer storage medium having computer executable code stored thereon; the computer executable code, when executed, is capable of implementing the method provided in any one of claims 1-8.
10. A database concurrency migration system, comprising:
the source database analysis module scans and analyzes a source database to acquire the characteristics of the source database;
the concurrency plan scheduling module is used for splitting and sequencing the tasks transplanted by the data according to the analysis result of the source database to form the concurrency plan;
the table structure transplanting and foreign key reference removing module is used for leading out a first table structure ddl statement from the source database, converting the first table structure ddl statement into a second table structure ddl statement of a target database, executing the second table structure ddl statement into the target database, and deleting the foreign key reference in the target database according to the analysis result of the source database;
the data transplanting module is used for carrying out data transplanting according to the concurrency plan;
and the data verification and foreign key reference adding module is used for verifying after the data migration is completed, and restoring the deleted foreign key reference after the verification is successful.
CN202410258965.4A 2024-03-07 2024-03-07 Method, system and storage medium for concurrent migration of database Active CN117851434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410258965.4A CN117851434B (en) 2024-03-07 2024-03-07 Method, system and storage medium for concurrent migration of database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410258965.4A CN117851434B (en) 2024-03-07 2024-03-07 Method, system and storage medium for concurrent migration of database

Publications (2)

Publication Number Publication Date
CN117851434A true CN117851434A (en) 2024-04-09
CN117851434B CN117851434B (en) 2024-05-14

Family

ID=90542076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410258965.4A Active CN117851434B (en) 2024-03-07 2024-03-07 Method, system and storage medium for concurrent migration of database

Country Status (1)

Country Link
CN (1) CN117851434B (en)

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1664816A (en) * 2005-03-11 2005-09-07 谢永强 Method for realizing transplantation between information websites of different platforms
US20070233648A1 (en) * 2006-03-30 2007-10-04 International Business Machines Corporation Execution cost reduction of sampled queries in a database
US20110029577A1 (en) * 2009-07-29 2011-02-03 One Network Enterprises, Inc. Computer program product and method for sharing information between multiple computer applications using a grafted model network
US20120265726A1 (en) * 2011-04-18 2012-10-18 Infosys Limited Automated data warehouse migration
US20140067824A1 (en) * 2012-08-30 2014-03-06 International Business Machines Corporation Database table format conversion based on user data access patterns in a networked computing environment
EP2746964A2 (en) * 2012-12-20 2014-06-25 Business Objects Software Ltd. Automatic tuning of database queries
WO2016107102A1 (en) * 2014-12-30 2016-07-07 中兴通讯股份有限公司 Short message concurrent sending service processing method and apparatus
US20180357298A1 (en) * 2017-06-08 2018-12-13 Sap Se Performance of Distributed Databases and Database-Dependent Software Applications
US10176435B1 (en) * 2015-08-01 2019-01-08 Shyam Sundar Sarkar Method and apparatus for combining techniques of calculus, statistics and data normalization in machine learning for analyzing large volumes of data
CN112650736A (en) * 2020-12-31 2021-04-13 中国农业银行股份有限公司 Data migration method and device
US20210289746A1 (en) * 2019-03-22 2021-09-23 Arizona Board Of Regents On Behalf Of Arizone State University Systems, methods, and apparatuses for implementing real time beef cattle monitoring utilizing radio-frequency identification (rfid) based technologies
CN113515306A (en) * 2021-04-26 2021-10-19 建信金融科技有限责任公司 System transplanting method and device
CA3131725A1 (en) * 2020-09-23 2022-03-23 10353744 Canada Ltd. Sql optimization method and device, computer equipment and storage medium
US11461366B1 (en) * 2021-10-20 2022-10-04 Bnsf Railway Company System and method for data pruning via dynamic partition management
CN115525624A (en) * 2022-03-17 2022-12-27 敦伟 Method for modeling and processing batch data
CN116226086A (en) * 2023-02-08 2023-06-06 电子科技大学 Sequential maintenance and verification method for segmented data in data migration process
CN116821138A (en) * 2023-08-24 2023-09-29 腾讯科技(深圳)有限公司 Data processing method and related equipment
CN117194385A (en) * 2023-10-12 2023-12-08 中国船舶集团有限公司第七一三研究所 Data migration method between different types of relational databases
CN117421302A (en) * 2023-10-31 2024-01-19 腾讯科技(深圳)有限公司 Data processing method and related equipment
CN117539849A (en) * 2023-12-28 2024-02-09 金篆信科有限责任公司 Data migration method and device of heterogeneous database, storage medium and electronic device

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1664816A (en) * 2005-03-11 2005-09-07 谢永强 Method for realizing transplantation between information websites of different platforms
US20070233648A1 (en) * 2006-03-30 2007-10-04 International Business Machines Corporation Execution cost reduction of sampled queries in a database
US20110029577A1 (en) * 2009-07-29 2011-02-03 One Network Enterprises, Inc. Computer program product and method for sharing information between multiple computer applications using a grafted model network
US20120265726A1 (en) * 2011-04-18 2012-10-18 Infosys Limited Automated data warehouse migration
US20140067824A1 (en) * 2012-08-30 2014-03-06 International Business Machines Corporation Database table format conversion based on user data access patterns in a networked computing environment
EP2746964A2 (en) * 2012-12-20 2014-06-25 Business Objects Software Ltd. Automatic tuning of database queries
WO2016107102A1 (en) * 2014-12-30 2016-07-07 中兴通讯股份有限公司 Short message concurrent sending service processing method and apparatus
US10176435B1 (en) * 2015-08-01 2019-01-08 Shyam Sundar Sarkar Method and apparatus for combining techniques of calculus, statistics and data normalization in machine learning for analyzing large volumes of data
US20180357298A1 (en) * 2017-06-08 2018-12-13 Sap Se Performance of Distributed Databases and Database-Dependent Software Applications
US20210289746A1 (en) * 2019-03-22 2021-09-23 Arizona Board Of Regents On Behalf Of Arizone State University Systems, methods, and apparatuses for implementing real time beef cattle monitoring utilizing radio-frequency identification (rfid) based technologies
CA3131725A1 (en) * 2020-09-23 2022-03-23 10353744 Canada Ltd. Sql optimization method and device, computer equipment and storage medium
CN112650736A (en) * 2020-12-31 2021-04-13 中国农业银行股份有限公司 Data migration method and device
CN113515306A (en) * 2021-04-26 2021-10-19 建信金融科技有限责任公司 System transplanting method and device
US11461366B1 (en) * 2021-10-20 2022-10-04 Bnsf Railway Company System and method for data pruning via dynamic partition management
CN115525624A (en) * 2022-03-17 2022-12-27 敦伟 Method for modeling and processing batch data
CN116226086A (en) * 2023-02-08 2023-06-06 电子科技大学 Sequential maintenance and verification method for segmented data in data migration process
CN116821138A (en) * 2023-08-24 2023-09-29 腾讯科技(深圳)有限公司 Data processing method and related equipment
CN117194385A (en) * 2023-10-12 2023-12-08 中国船舶集团有限公司第七一三研究所 Data migration method between different types of relational databases
CN117421302A (en) * 2023-10-31 2024-01-19 腾讯科技(深圳)有限公司 Data processing method and related equipment
CN117539849A (en) * 2023-12-28 2024-02-09 金篆信科有限责任公司 Data migration method and device of heterogeneous database, storage medium and electronic device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
宁泰安;刘金刚;: "基于SQL-92和JDBC的SpaceOS数据库数据迁移的实现", 计算机应用与软件, no. 12, 15 December 2014 (2014-12-15) *
段永见;: "ORACLE到DB2应用移植方法探讨", 科技资讯, no. 11, 13 April 2007 (2007-04-13) *
沈世铭;尹绍宏;: "基于XML在异构数据之间转换的研究", 仪器仪表用户, no. 01, 8 February 2007 (2007-02-08) *
甘海鹰;: "数据库性能优化方法", 科技广场, no. 07, 1 July 2009 (2009-07-01) *
陈光仪;祝转;: "异构关系数据库移植平台的设计", 现代计算机(专业版), no. 34, 5 December 2014 (2014-12-05) *
黎冬媛;周文辉;莫剑斌;: "基于MVC的网上学习系统数据库的开发", 计算机与现代化, no. 06, 15 June 2012 (2012-06-15) *

Also Published As

Publication number Publication date
CN117851434B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
Sofianopoulou Manufacturing cells design with alternative process plans and/or replicate machines
US6801903B2 (en) Collecting statistics in a database system
EP0421408B1 (en) Joining two database relations on a common field in a parallel relational database field
AU770289B2 (en) Method and apparatus for concurrent DBMS table operations
Fan et al. Incrementalization of graph partitioning algorithms
US8140490B2 (en) Method, system and program for prioritizing maintenance of database tables
US20050235004A1 (en) Using estimated cost to schedule an order for refreshing a set of materialized views (MVS)
CN106611064B (en) Data processing method and device for distributed relational database
US11531704B2 (en) Autonomously partitioning database tables
CN103514223A (en) Data synchronism method and system of database
US20100257154A1 (en) Testing Efficiency and Stability of a Database Query Engine
CN109408591B (en) Decision-making distributed database system supporting SQL (structured query language) driven AI (Artificial Intelligence) and feature engineering
CN117851434B (en) Method, system and storage medium for concurrent migration of database
CN110084507A (en) The scientific workflow method for optimizing scheduling of perception is classified under cloud computing environment
US20050234945A1 (en) Allocating CPU resources for a particular refresh schedule
US8046394B1 (en) Dynamic partitioning for an ordered analytic function
Tsuchiya et al. A new heuristic algorithm based on GAs for multiprocessor scheduling with task duplication
CN112256418B (en) Big data task scheduling method
JPH1139340A (en) Data base retrieval system, multiprocessor system and data base retrieval method
CN112783892A (en) Chained task execution engine realized through event-driven model
CN110515989B (en) Data real-time statistical method based on financial data management platform
KR20130047431A (en) Method for storaging in memory and pararell-processing for batch process of mass information
CN106649754B (en) Data warehouse performance test method and system
CN111522805A (en) Distributed batch data cleaning method and system
US12007881B2 (en) Method for populating spanner databases in test environments

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant