CN113553313A - Data migration method and system, storage medium and electronic device - Google Patents

Data migration method and system, storage medium and electronic device


Publication number
CN113553313A
CN113553313A (application number CN202110778574.1A; granted as CN113553313B)
Authority
CN
China
Prior art keywords: data, version, warehouse, new database, patch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110778574.1A
Other languages
Chinese (zh)
Other versions
CN113553313B (en)
Inventor
许哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd
Priority to CN202110778574.1A
Publication of CN113553313A
Application granted
Publication of CN113553313B
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/21: Design, administration or maintenance of databases
    • G06F 16/219: Managing data history or versioning
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data migration method comprising the steps of: data migration preparation, data reflow, data patch generation, and data migration completion judgment. The invention also relates to a data migration system, a storage medium, and an electronic device. The invention overcomes the strong dependence on database import and export tools by using the data warehouse's synchronous reflow tool, and overcomes the drawbacks that statistical checking is limited in efficiency or feasibility and that a successful check cannot strictly guarantee data consistency: the invention uses row-level, field-by-field comparison, so a successful check guarantees that the data are completely consistent. It overcomes the inability to inspect difference data and the causes of anomalies, and the repeated consumption of process time on re-runs. It overcomes the inefficiency and potential human error of manual DBA operation: the migration process can be executed serially through the data warehouse's task tree, and the system achieves efficient, fully automatic data migration.

Description

Data migration method and system, storage medium and electronic device
The present application is a divisional application of Chinese patent application CN109063005A. The original application was filed on July 10, 2018, under application number 201810749595.9, and is entitled: Data migration method and system, storage medium and electronic device.
Technical Field
The present invention relates to the field of data migration, and in particular, to a data migration method and system, a storage medium, and an electronic device.
Background
A database middleware is a middleware system that manages data organized, stored, and administered according to a data structure; through the database language, a middleware user can manage the database's permissions and data structures and perform create, delete, update, and query operations on the data. As the business volume supported by an application system grows rapidly, the capacity and performance of the database middleware hit a bottleneck and can no longer support the storage of business data or the read-write operations of business services. Meanwhile, the rapid development of database technology has produced new technologies with better qualities; for example, a distributed database offers virtually unlimited horizontal scaling compared with a traditional centralized database. In many scenarios of business need and architecture upgrade, there is therefore a need to upgrade the database middleware.
Upgrading the database middleware itself is stateless: only the installation of the new middleware needs to be completed. The data carried by the database, however, is stateful, so the main problem to be solved in a database middleware upgrade is how to migrate the data to the new database middleware efficiently while guaranteeing data consistency. Current data migration is usually accomplished by DBA operation, where a DBA (DataBase Administrator) is responsible for system management of the database middleware (monitoring and deployment of system CPU, memory, and physical storage resources), base table management (database permissions, table structures, etc.), data management (data migration, backup, archiving, etc.), and so on.
The data migration method flow of DBA operation is shown in fig. 1 and the corresponding data migration system in fig. 2:
Q1. New database table data cleanup. To prevent the influence of dirty data, such as data left by a previously failed verification or test data, the table is deleted and rebuilt, or its data is emptied;
Q2. The old database table data is exported to a data file. The data file, containing the full data, is generated by the database's own data export tool, such as MySQL's export command tool;
Q3. The new database imports the data file. The data file is imported through the database's data import tool, such as MySQL's import command tool, and the data records are added to the new database;
Q4. Statistics and sampling are performed to check the consistency of the new and old data. The common statistical approach is to count the number of records in a data table, the number of records meeting certain criteria, the sum of a field, and so on, and to compare whether these statistical indicators agree between the old and new tables; the common sampling approach is to sample data records with different characteristics and compare whether all their fields agree between the old and new tables. When the check finds an inconsistency, steps Q1 through Q4 are re-executed, until the check in step Q4 finds consistency and the data migration is complete.
The data migration scheme of DBA operation has the following disadvantages:
1. Strong dependence on the database's import and export tools makes it inapplicable to all types of database, or unable to support large-volume migration. When the old database has no export tool or the new database has no import tool, the migration process cannot be completed; when the import/export tool limits the size of the data file, the database may need to be migrated in stages, or the migration may be impossible altogether.
2. The consistency check of the new and old data is performed by statistics and sampling, whose efficiency or feasibility is problematic, and a successful check cannot strictly guarantee data consistency. When the table is large, computing statistics over it is slow and inefficient, or even infeasible; a successful check may misjudge consistency because differences can offset one another, so success only guarantees consistency with high probability.
3. When the check finds an inconsistency, the difference data and the cause of the anomaly cannot be inspected; the export or import process may have gone wrong, so the data file must be exported and imported again, repeatedly consuming process time.
4. DBA operation is manual, inefficient, and prone to human error. When the volume of table data to be migrated is large, each step is time-consuming, and the serial execution of the steps consumes a great deal of the DBA's time; efficiency is low and human errors easily occur.
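The second disadvantage can be seen in a small sketch (illustrative Python with made-up table data, not part of the patent): aggregate indicators can agree even though the rows differ, whereas a row-level, field-by-field comparison detects the difference immediately.

```python
# Hypothetical (id, amount) rows; the two tables have equal counts and sums.
old_table = [(1, 10), (2, 20), (3, 30)]
new_table = [(1, 10), (2, 25), (3, 25)]   # two rows were corrupted in transit

# Statistical check: record count and field sum both match, so migration "passes".
count_ok = len(old_table) == len(new_table)
sum_ok = sum(a for _, a in old_table) == sum(a for _, a in new_table)

# Row-level, field-by-field check: the corruption is found at once.
diff = [(o, n) for o, n in zip(old_table, new_table) if o != n]
print(count_ok, sum_ok, diff)
```

Here the statistical indicators agree only because the two errors offset each other, which is exactly the misjudgment described above.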
Disclosure of Invention
The invention, addressing at least one of the above technical problems, provides a data migration method and system, an electronic device, and a storage medium, in which the differences between new and old data are checked using the data warehouse's big-data cleaning capability. The data warehouse uses the Map Reduce principle to compute over big data in parallel, divide-and-conquer style, so computation time does not grow linearly as the data set grows. Comparing data by means of data warehouse cleaning allows abnormal data to be identified quickly and accurately. The invention uses recursive comparison to generate and reflow difference data sets: only the difference data is reflowed in each recursion, which improves correction efficiency. By obtaining the difference data set precisely, only the difference data needs to be reflowed, instead of re-executing every step as in the traditional approach, and the difference data shrinks with each recursion, greatly improving migration efficiency.
In order to achieve the above object, the present invention provides a data migration method comprising the steps of:
data migration preparation: synchronizing the old database into the data warehouse through a data warehouse synchronization tool to generate a first data version, and cleaning the new database table so that the new database is empty;
data reflow: reflowing the first data version or a data patch to the new database through the data warehouse synchronization tool, then synchronizing the new database's data back into the data warehouse through the same tool to generate a second data version;
data patch generation: comparing the data differences between the first data version and the second data version; if a difference exists, a data patch is generated, otherwise no data patch is generated;
data migration completion judgment: judging whether a data patch exists; if so, the method jumps back to the data reflow step, and if not, the data migration is complete.
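The four steps above can be sketched as a loop (an illustrative Python sketch with in-memory dicts standing in for the databases and the warehouse versions; all names here are assumptions, not from the patent):

```python
# Minimal sketch of: preparation -> reflow -> patch generation -> completion check.
def migrate(old_db, new_db, sync):
    # Data migration preparation: sync the old data into the warehouse
    # (first data version) and empty the new database.
    version_a = sync(old_db)
    new_db.clear()

    patch = dict(version_a)            # first reflow carries the full first version
    while patch:                       # completion judgment: no patch => done
        # Data reflow: apply the patch, then sync the new database back
        # into the warehouse as the second data version.
        new_db.update(patch)
        version_b = sync(new_db)
        # Data patch generation: keep only rows that still differ.
        patch = {k: v for k, v in version_a.items() if version_b.get(k) != v}
    return new_db

old = {1: "alice", 2: "bob", 3: "carol"}
result = migrate(old, {}, sync=lambda db: dict(db))
print(result == old)   # True
```

With a lossless `sync`, the loop converges after one reflow; a lossy sync would simply carry the shrinking difference set through further recursions.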
Preferably, the data reflow is preceded by a first-reflow judgment: judge whether the second data version contains data. If the second data version does not exist, or exists but contains no data, the reflow is judged to be the first one, and in the data reflow step the data warehouse synchronization tool reflows the first data version to the new database; if the second data version exists and contains data, the reflow is judged not to be the first one, and in the data reflow step the data warehouse synchronization tool reflows the data patch to the new database.
Preferably, the data warehouse adopts the Map Reduce principle to decompose the data patch generation task into a plurality of task units with the same processing logic, and merges the task units after the processing is completed to output a final result set, so as to obtain the data patch.
Further, the data differences are generated by row-level field-by-field comparison using an offline data warehouse.
Further, the data patch is generated by executing an SQL script by the data warehouse.
Further, emptying the data in the new database is realized by deleting and re-creating the table, or by directly emptying the table data.
A data migration system comprises a data warehouse, a comparison unit, and a cleaning tool. The data warehouse contains a synchronization tool and two independent storage areas for storing data; the synchronization tool synchronizes the data of the new database and the old database into these storage areas. The two independent storage areas hold the first data version and the second data version respectively: the old database synchronizes its data to the first data version through the synchronization tool, and the new database synchronizes its data to the second data version through the synchronization tool. The cleaning tool is used to clean the new database, emptying its data; the comparison unit compares the data differences between the first data version and the second data version and generates a data patch, which is synchronized to the new database by the synchronization tool.
Preferably, the data warehouse adopts the Map Reduce principle to decompose the data patch generation task into a plurality of task units with the same processing logic, and merges the task units after the processing is completed to output a final result set, so as to obtain the data patch.
Preferably, the data differences are generated by row-level field-by-field comparison using an offline data warehouse.
Preferably, the comparison unit is specifically an SQL script.
Preferably, the cleaning tool works by deleting and re-creating the table, or by directly emptying the table data.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a data migration method as described above when executing the program.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements a data migration method as described above.
Compared with the prior art, the invention has the advantages that:
the invention provides a data migration method, which comprises the following steps: preparing data migration, namely, performing data synchronization on an old database through a data warehouse synchronization tool to generate a first data version, cleaning data in a new database table, and emptying data in the new database; data reflow, namely reflowing a first data version or data patch to a new database through a data warehouse synchronization tool, and then synchronizing the data of the new database to a data warehouse through the data warehouse synchronization tool to generate a second data version; generating a data patch, comparing the data difference between the first data version and the second data version, if the difference exists, generating the data patch, and if the difference does not exist, generating no data patch; and judging whether the data migration is finished, namely judging whether a data patch exists, if so, skipping to data reflux, and if not, finishing the data migration. The invention also relates to a data migration system, a storage medium and an electronic device. The invention overcomes the defect of strong dependence of leading-in and leading-out tools, adopts a data synchronization reflow tool of a data warehouse, and adopts a solution scheme that a big data synchronization and reflow tool are independent of a database; meanwhile, the defects that the checking efficiency or feasibility and the checking success can not strictly ensure the data consistency are overcome: the data synchronous reflow tool generally adopts synchronous reading and writing, does not adopt files for transition, and does not have the problem that the operation cannot be carried out when large data exists; by adopting row-level field-by-field comparison, the data can be ensured to be completely consistent by successful check. 
The method overcomes the inability to inspect difference data and the causes of anomalies, and the repeated consumption of process time on re-runs: based on Map Reduce and the cleaning capability of the offline data warehouse, it can precisely compare the field differences of every row, and only the abnormal records need to be reprocessed each time, so the abnormal data set shrinks progressively and migration efficiency is greatly improved. The invention also overcomes the inefficiency and potential human error of manual DBA operation: the data migration process can be executed serially through the data warehouse's task tree, and the system achieves efficient, fully automatic data migration.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings. The detailed description of the present invention is given in detail by the following examples and the accompanying drawings.
Drawings
The following description will be made in further detail with reference to the accompanying drawings and embodiments of the present invention.
FIG. 1 is a flow chart of a conventional data migration method;
FIG. 2 is a schematic diagram of the data migration system of FIG. 1;
FIG. 3 is a flow chart illustrating a data migration method according to the present invention;
FIG. 4 is a flowchart of a data migration method according to embodiment 1 of the present invention;
FIG. 5 is a flowchart of a data migration method according to embodiment 2 of the present invention;
FIG. 6 is a diagram illustrating a data migration system according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
A data migration method, as shown in fig. 3, comprises the steps of:
data migration preparation: synchronizing the old database into the data warehouse through a data warehouse synchronization tool to generate a first data version, and cleaning the new database table so that the new database is empty;
data reflow: reflowing the first data version or a data patch to the new database through the data warehouse synchronization tool, then synchronizing the new database's data back into the data warehouse through the same tool to generate a second data version;
data patch generation: comparing the data differences between the first data version and the second data version; if a difference exists, a data patch is generated, otherwise no data patch is generated;
data migration completion judgment: judging whether a data patch exists; if so, the method jumps back to the data reflow step, and if not, the data migration is complete.
The method checks the differences between new and old data using the data warehouse's big-data cleaning capability. The data warehouse uses the Map Reduce principle to compute over big data in parallel, divide-and-conquer style, so computation time does not grow linearly as the data set grows, and the data migration process can be executed serially through the data warehouse's task tree. Comparing data by means of data warehouse cleaning allows abnormal data to be identified quickly and accurately. Difference data sets are generated and reflowed by recursive comparison; only the difference data is reflowed in each recursion, improving correction efficiency. By obtaining the difference data set precisely, the method does not need to re-execute every step as in the traditional approach, and the difference data shrinks with each recursion, greatly improving migration efficiency.
Map is a mapping function that performs a specified operation on each element of a conceptual list of independent elements (for example, if it is found that every student's test score was over-counted by one, a "minus one" mapping function can be defined to correct the error). Each element is operated on independently, and the original list is not modified, since a new list is created to hold the new values; that is, Map operations can be highly parallel, which is very useful for high-performance applications and for the field of parallel computing. Reduce refers to an appropriate merging of the elements of a list (for example, computing the average score of a class). Although not as parallel as the mapping function, the reduction function is also useful in a highly parallel environment, because a reduction always has a simple answer and large-scale operations are relatively independent. Map Reduce achieves reliability by distributing the large-scale operations on the data set across the nodes of the network; each node periodically reports back the work it has completed and its latest status. If a node stays silent beyond a predetermined period, the master node (similar to the master server in the Google File System) records that node's status as dead and sends the data assigned to it to another node. Each operation uses atomic operations on named files to ensure that conflicts between parallel threads do not occur; when files are renamed, the system may copy them under a name other than the task name.
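The Map and Reduce operations described above can be sketched with Python's built-in `map` and `functools.reduce` (purely illustrative; the scores are made up):

```python
from functools import reduce

scores = [71, 86, 94]                  # every score was over-counted by one

# Map: apply the "minus one" correction independently to each element,
# producing a new list without modifying the original.
corrected = list(map(lambda s: s - 1, scores))

# Reduce: merge the list's elements into one value, here a sum used
# to compute the class average.
total = reduce(lambda acc, s: acc + s, corrected, 0)
average = total / len(corrected)
print(corrected, average)
```

Because each Map application touches one element only, the map step parallelizes trivially; the reduce step merges partial results, mirroring how Map Reduce merges task-unit outputs into a final result set.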
Map Reduce provides the following main functions:
1) data partitioning and computing task scheduling:
the system automatically divides the big data to be processed for one Job into a plurality of data blocks, each data block corresponding to one computation Task, and automatically schedules computing nodes to process the corresponding data blocks. The job and task scheduling function is mainly responsible for allocating and scheduling computing nodes (Map nodes or Reduce nodes), monitoring their execution states, and controlling the synchronization of the Map nodes' execution.
2) Data/code co-location:
in order to reduce data communication, a basic principle is localized data processing: a computing node processes data stored on its local disk whenever possible, realizing the migration of code to data. When such localized processing is not possible, another available node is sought and the data is transferred over the network to that node (migration of data to code), preferably a node on the same rack as the data, to reduce communication delay.
3) And (3) system optimization:
in order to reduce data communication overhead, intermediate result data undergoes a certain amount of merging before entering the Reduce nodes. The data processed by one Reduce node may come from multiple Map nodes; to avoid data correlation in the Reduce computing stage, the intermediate results output by the Map nodes must be partitioned with an appropriate strategy, ensuring that correlated data is sent to the same Reduce node. In addition, the system performs some computation-performance optimizations, such as speculatively executing multiple backups of the slowest computation tasks and taking the result of whichever finishes first.
4) Error detection and recovery:
in a large-scale Map Reduce computing cluster built from low-end commodity servers, node hardware errors (host, disk, memory, etc.) and software errors are the norm, so Map Reduce must be able to detect and isolate faulty nodes, and schedule and allocate new nodes to take over their computation tasks. The system also maintains the reliability of data storage, improving it with a multi-backup redundant storage mechanism, and can detect and recover erroneous data in time.
Embodiment 1, as shown in fig. 4, a data migration method includes the steps of:
S11, synchronizing the old database data to the data warehouse: the old database table data is synchronized to the data warehouse through a data warehouse synchronization tool; call this version data warehouse data version A.
S21, cleaning the new database table data: the new database table data is cleared, either by deleting and re-creating the table or by directly emptying the table data; for example, in MySQL the data table can be deleted with drop or delete (or via a PHP script) and created with create (or via a PHP script).
S3, reflowing the data or the data patch to the new database through the data warehouse synchronization tool: data warehouse data version A (or the data patch, if one exists) is synchronously reflowed to the new database through the data warehouse synchronization tool. Here the data warehouse's core ETL component serves as the synchronization tool; ETL stands for Extract, Transform, Load. The extraction process collects the specified data from the operational database; the transformation process converts the data into the specified format and cleans it to guarantee data quality; the loading process loads the converted, format-conforming data into the data warehouse. The data warehouse may periodically and continuously extract cleaned data from the source database.
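The Extract, Transform, Load stages can be sketched as follows (an illustrative Python sketch; the data structures, field names, and cleaning rule are assumptions, not from the patent):

```python
def extract(source_db):
    """Extract: collect the specified data from the operational database."""
    return source_db["rows"]

def transform(rows):
    """Transform: convert to the target format and clean to guarantee
    data quality (here: drop rows missing an id, strip whitespace)."""
    return [{"id": r[0], "content": str(r[1]).strip()}
            for r in rows if r[0] is not None]

def load(warehouse, rows):
    """Load: append the converted, format-conforming data to the warehouse."""
    warehouse.extend(rows)

source = {"rows": [(1, " a "), (None, "junk"), (2, "b")]}
warehouse = []
load(warehouse, transform(extract(source)))
print(warehouse)   # two cleaned rows
```

In a real deployment the extract step would run periodically against the source database, as the text notes, rather than once over an in-memory structure.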
S4, synchronizing the new database data to the data warehouse through the data warehouse synchronization tool; and synchronizing the data of the new database table to the data warehouse through a data warehouse synchronization tool, and assuming that the version is a data warehouse data version B.
S5, comparing the differences of the new data and the old data of the data warehouse to generate a data patch; and comparing the data version A of the data warehouse with the data version B of the data warehouse to generate a data patch. The data warehouse decomposes a data patch generation task into a plurality of task units with the same processing logic by adopting a Map Reduce principle, and merges the task units after the processing is finished and outputs a final result set to obtain a data patch; SQL scripts are typically executed in a data warehouse to generate data patches, taking table fields [ id (primary key identification), content (content) ] as an example:
select id1, content1 from (
select a.id id1, a.content content1, b.id id2, b.content content2
from data_version_A a
left outer join data_version_B b
on a.id = b.id
) g
where g.id2 is null;
The data differences are compared row by row and field by field in the offline data warehouse; only the abnormal records need to be processed again in each round, so the abnormal data set shrinks progressively and migration efficiency is greatly improved.
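A query of this shape can be tried end-to-end, for example with sqlite3 (an assumption, since the patent names no specific engine); the extra content predicate below is an illustrative extension that also surfaces rows whose fields differ, not only rows missing from version B:

```python
import sqlite3

# version_a / version_b stand in for data warehouse data versions A and B.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table version_a (id integer primary key, content text);
    create table version_b (id integer primary key, content text);
    insert into version_a values (1, 'x'), (2, 'y'), (3, 'z');
    insert into version_b values (1, 'x'), (2, 'WRONG');  -- id 3 missing, id 2 differs
""")

# Left outer join of A against B: rows of A that are absent from B
# (id2 is null) or whose content field differs form the data patch.
patch = con.execute("""
    select id1, content1 from (
        select a.id id1, a.content content1, b.id id2, b.content content2
        from version_a a
        left outer join version_b b on a.id = b.id
    ) g
    where g.id2 is null or g.content1 <> g.content2
""").fetchall()
print(patch)   # the rows of version A to reflow into the new database
```

Reflowing just these rows and re-running the comparison reproduces the shrinking-difference recursion described above.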
And S6, judging the data patch, if the data patch exists, jumping to S3, and if the data patch does not exist, completing the data migration.
Embodiment 2, as shown in fig. 5, a data migration method includes the steps of:
S12, cleaning the new database table data: the new database table data is cleared, either by deleting and re-creating the table or by directly emptying the table data; for example, in MySQL the data table can be deleted with drop or delete (or via a PHP script) and created with create (or via a PHP script).
S22, synchronizing the old database data to the data warehouse: the old database table data is synchronized to the data warehouse through a data warehouse synchronization tool to generate the first data version. It should be appreciated that cleaning the new database table data and synchronizing the old database data to the data warehouse are independent of each other, with no required temporal or procedural order between them.
S23, first-reflow judgment: judge whether the second data version contains data. If the second data version does not exist, or exists but contains no data, this is the first reflow and the method jumps to S31; if the second data version exists and contains data, this is not the first reflow and the method jumps to S32. The first-reflow judgment is essentially a prior check: the type of reflow is determined by inspecting the second data version's data. On the first reflow the second data version can of course be regarded as empty, so the difference between the first and second data versions is the whole first data version; that is, the data patch is the entire first data version.
S31, reflowing the first data version to the new database through the data warehouse synchronization tool; and synchronously reflowing the first data version to the new database through the data warehouse synchronization tool.
S32, reflowing the data patch to the new database through the data warehouse synchronization tool; and synchronously reflowing the data patch to the new database through a data warehouse synchronization tool.
Here the data warehouse's core ETL component serves as the synchronization tool; ETL stands for Extract, Transform, Load. The extraction process collects the specified data from the operational database; the transformation process converts the data into the specified format and cleans it to guarantee data quality; the loading process loads the converted, format-conforming data into the data warehouse. The data warehouse may periodically and continuously extract cleaned data from the source database.
S4, synchronizing the new database data to the data warehouse through the data warehouse synchronization tool; and synchronizing the data of the new database table to the data warehouse through a data warehouse synchronization tool to generate a second data version of the data warehouse.
S5, comparing the differences of the new data and the old data of the data warehouse to generate a data patch; and comparing the first data version and the second data version of the data warehouse to generate a data patch. The data warehouse decomposes a data patch generation task into a plurality of task units with the same processing logic by adopting a Map Reduce principle, and merges the task units after the processing is finished and outputs a final result set to obtain a data patch; SQL scripts are typically executed in a data warehouse to generate data patches, taking table fields [ id (primary key identification), content (content) ] as an example:
select g.id1, g.content1 from(
select a.id id1,a.content content1,b.id id2,b.content content2
from first_data_version_A a
left outer join second_data_version_B b
on a.id=b.id
)g
where g.id2 is null;
The offline data warehouse compares the content row by row and field by field; in each round only the abnormal records need to be reprocessed, so the abnormal data set shrinks step by step and migration efficiency is greatly improved.
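The patch-generation SQL above can be demonstrated end to end against throwaway tables. The sqlite3 tables below (`version_a`, `version_b`) are stand-ins for the first and second data versions, chosen only for illustration:

```python
import sqlite3

# In-memory stand-ins for the first (A) and second (B) data versions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE version_a (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("CREATE TABLE version_b (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany("INSERT INTO version_a VALUES (?, ?)",
                 [(1, "x"), (2, "y"), (3, "z")])
conn.executemany("INSERT INTO version_b VALUES (?, ?)",
                 [(1, "x"), (3, "z")])  # row 2 never reached the new DB

# Left outer join A to B on the primary key; rows where B's id is NULL
# exist only in the first version and therefore form the data patch.
patch = conn.execute("""
    SELECT g.id1, g.content1 FROM (
        SELECT a.id AS id1, a.content AS content1,
               b.id AS id2, b.content AS content2
        FROM version_a a
        LEFT OUTER JOIN version_b b ON a.id = b.id
    ) g
    WHERE g.id2 IS NULL
""").fetchall()
print(patch)  # [(2, 'y')]
```

The single returned row is exactly the record that must be reflowed to the new database in the next round.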
And S6, judging whether a data patch exists; if a data patch exists, jumping back to S23, and if not, the data migration is complete.
A data migration system, as shown in FIG. 6, includes a data warehouse, a comparison unit, and a cleaning tool. The data warehouse contains a synchronization tool and two independent storage areas for storing data: the first data version and the second data version. The old database synchronizes its data to the first data version through the synchronization tool, and the new database synchronizes its data to the second data version through the synchronization tool. The cleaning tool is used to clear the data in the new database. The comparison unit compares the data difference between the first data version and the second data version and generates a data patch, which is synchronized to the new database by the synchronization tool.
Preferably, the data warehouse adopts the Map Reduce principle to decompose the data patch generation task into a plurality of task units with the same processing logic, and merges the task units after the processing is completed to output a final result set, so as to obtain the data patch.
Preferably, the data differences are generated by row-level field-by-field comparison using an offline data warehouse.
Preferably, the comparison unit is specifically an SQL script.
Preferably, the cleaning tool supports deleting and recreating tables, or directly emptying table data.
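The Map Reduce decomposition described above can be sketched as follows; the shard count, row format, and function names are assumptions made for illustration:

```python
def map_shard(shard, second_rows):
    """Map: per-shard diff with the same processing logic in every unit."""
    return [row for row in shard if row not in second_rows]

def generate_patch(first_version, second_version, num_shards=4):
    """Split the patch task into shards, process them, merge the results."""
    shards = [first_version[i::num_shards] for i in range(num_shards)]
    second_rows = set(second_version)
    partials = [map_shard(s, second_rows) for s in shards]
    # Reduce: merge the per-shard partial results into one final result set.
    return sorted(row for partial in partials for row in partial)

patch_rows = generate_patch([(1, "a"), (2, "b"), (3, "c")],
                            [(1, "a"), (3, "c")])
print(patch_rows)  # [(2, 'b')]
```

Each shard runs the identical diff logic independently, so the task parallelizes naturally; the merge step is the "reduce" that produces the data patch.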
An electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, the data migration method described above is implemented.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the data migration method described above.
The invention provides a data migration method comprising the following steps: data migration preparation, in which the old database is synchronized through a data warehouse synchronization tool to generate a first data version, and the data in the new database table is cleared; data reflow, in which the first data version or a data patch is reflowed to the new database through the data warehouse synchronization tool, after which the new database data is synchronized to the data warehouse to generate a second data version; data patch generation, in which the data difference between the first data version and the second data version is compared, and a data patch is generated only if a difference exists; and migration-completion judgment, in which the process jumps back to data reflow if a data patch exists and the migration is complete otherwise. The invention also relates to a data migration system, a storage medium, and an electronic device. The invention overcomes the strong dependence on database import and export tools: it adopts the data warehouse's own synchronization and reflow tools, a solution in which big-data synchronization and reflow are independent of any particular database. It also overcomes the drawbacks that checking is inefficient or infeasible and that a successful check cannot strictly guarantee data consistency: the synchronization and reflow tools read and write synchronously rather than staging through files, so large data volumes pose no problem, and row-level field-by-field comparison ensures that a successful check means the data are fully consistent.
The method also overcomes the drawbacks that difference data and the causes of anomalies cannot be inspected and that the whole process must be repeated on failure: relying on the cleaning capability of the offline data warehouse, the field differences of each row can be compared precisely based on Map Reduce, and only the abnormal records need to be reprocessed each round, so the abnormal data set shrinks step by step and migration efficiency is greatly improved. Finally, it overcomes the inefficiency and potential human error of manual DBA operation: the data migration process can be executed serially through the data warehouse's task tree, and the system thereby achieves efficient, fully automatic data migration.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples only show some embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (19)

1. A method of data migration, comprising:
performing data synchronization on the old database through a data warehouse synchronization tool to generate a first data version; and emptying the data in the new database;
reflowing the first data version or the data patch to the new database through a data warehouse synchronization tool, and synchronizing the data of the new database to the data warehouse through the data warehouse synchronization tool to generate a second data version;
comparing the data difference between the first data version and the second data version, and if the data difference exists, generating a data patch; if no difference exists, no data patch is generated;
and judging whether a data patch exists; if so, jumping back to data reflow, and if not, completing the data migration.
2. The data migration method according to claim 1, wherein before reflowing the first data version or the data patch to the new database, it is determined whether the second data version exists or whether data exists in the second data version; if the second data version does not exist or no data exists in the second data version, reflowing the first data version to the new database; and if the second data version exists or the data exists in the second data version, reflowing the data patch to the new database.
3. The data migration method according to claim 1, wherein a data patch generation task is decomposed into a plurality of task units with the same processing logic by using a Map Reduce principle, and the processed task units are merged to output a final result set to obtain the data patch.
4. A method of data migration as claimed in claim 3, wherein said data differences are generated by row level field-by-field alignment using an offline data warehouse.
5. A data migration method as claimed in claim 3, wherein said data patch is generated by a data warehouse executing SQL scripts.
6. A data migration method as claimed in claim 3, wherein the clearing of data in the new database is performed by deleting and creating tables or by directly clearing table data.
7. A method of data migration, comprising:
synchronizing the old database data to a data warehouse and cleaning the data of the new database; the old database data is data of a first data version;
synchronizing data of the first data version of the data warehouse to the new database;
synchronizing new database data to the data warehouse;
comparing the data of the first data version of the data warehouse with the data newly synchronized by the new database for data differences;
and if the data difference does not exist, finishing the data migration.
8. The method of claim 7, wherein, if a data difference exists between the data of the first data version of the data warehouse and the data newly synchronized from the new database, a data patch is generated and the newly generated data patch is synchronized to the new database; the latest data of the new database is then synchronized to the data warehouse, and the data of the first data version is again compared with the newly synchronized data for differences.
9. The method of claim 8, wherein, if no data difference exists between the data of the first data version of the data warehouse and the data newly synchronized from the new database, the data migration is completed.
10. The method of claim 8, wherein generating a data patch comprises: decomposing the data patch generation task into a plurality of task units with the same processing logic by adopting the Map Reduce principle, and merging the processed task units and outputting a final result set to obtain the data patch.
11. The method of claim 7, wherein the cleaning of the data of the new database is realized by deleting and creating the table or directly cleaning the table data.
12. The method of any of claims 8 to 11, wherein, each time a data difference exists between the first data version of the data warehouse and the data newly synchronized from the new database, the data patch is generated from that data difference.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 12 when executing the program.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 12.
15. A data migration system comprising a data warehouse, a comparison unit, and a cleaning tool; the data warehouse contains a synchronization tool and two independent storage areas for storing data, the two storage areas being a first data version and a second data version; the old database synchronizes data to the first data version through the synchronization tool, and the new database synchronizes data to the second data version through the synchronization tool; the cleaning tool is used to clear the data in the new database; the comparison unit is used to compare the data difference between the first data version and the second data version and generate a data patch; and the data patch is synchronized to the new database by the synchronization tool.
16. The data migration system according to claim 15, wherein the data warehouse decomposes the data patch generation task into a plurality of task units with the same processing logic by using a Map Reduce principle, and merges the task units after the processing is completed to output a final result set, so as to obtain the data patch.
17. A data migration system as claimed in claim 15, wherein said data differences are generated by row level field-by-field alignment using an offline data warehouse.
18. The data migration system of claim 15, wherein the comparison unit is embodied as an SQL script.
19. A data migration system according to any one of claims 15 to 18, wherein the cleaning tool supports deleting and recreating tables, or directly emptying table data.
CN202110778574.1A 2018-07-10 2018-07-10 Data migration method and system, storage medium and electronic equipment Active CN113553313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110778574.1A CN113553313B (en) 2018-07-10 2018-07-10 Data migration method and system, storage medium and electronic equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810749595.9A CN109063005B (en) 2018-07-10 2018-07-10 Data migration method and system, storage medium and electronic device
CN202110778574.1A CN113553313B (en) 2018-07-10 2018-07-10 Data migration method and system, storage medium and electronic equipment

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201810749595.9A Division CN109063005B (en) 2018-07-10 2018-07-10 Data migration method and system, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN113553313A true CN113553313A (en) 2021-10-26
CN113553313B CN113553313B (en) 2023-12-05

Family

ID=64819317

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110778574.1A Active CN113553313B (en) 2018-07-10 2018-07-10 Data migration method and system, storage medium and electronic equipment
CN201810749595.9A Active CN109063005B (en) 2018-07-10 2018-07-10 Data migration method and system, storage medium and electronic device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201810749595.9A Active CN109063005B (en) 2018-07-10 2018-07-10 Data migration method and system, storage medium and electronic device

Country Status (1)

Country Link
CN (2) CN113553313B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553313B (en) * 2018-07-10 2023-12-05 创新先进技术有限公司 Data migration method and system, storage medium and electronic equipment
CN111367886B (en) * 2020-03-02 2024-01-19 中国邮政储蓄银行股份有限公司 Method and device for data migration in database
CN112422635B (en) * 2020-10-27 2023-05-23 中国银联股份有限公司 Data checking method, device, equipment, system and storage medium
CN112905602B (en) * 2021-03-26 2022-09-30 掌阅科技股份有限公司 Data comparison method, computing device and computer storage medium
CN113157668B (en) * 2021-04-23 2022-06-10 上海数禾信息科技有限公司 Non-stop data migration method and device

Citations (6)

Publication number Priority date Publication date Assignee Title
WO2001033468A1 (en) * 1999-11-03 2001-05-10 Accenture Llp Data warehouse computing system
CN101105793A (en) * 2006-07-11 2008-01-16 阿里巴巴公司 Data processing method and system of data library
US20100145910A1 (en) * 2008-12-10 2010-06-10 Alibaba Group Holding Limited Method and system for efficient data synchronization
US20150363452A1 (en) * 2014-06-15 2015-12-17 Enping Tu Hybrid database upgrade migration
US20180095953A1 (en) * 2016-10-05 2018-04-05 Sap Se Multi-procedure support in data migration
CN109063005B (en) * 2018-07-10 2021-05-25 创新先进技术有限公司 Data migration method and system, storage medium and electronic device

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US9519695B2 (en) * 2013-04-16 2016-12-13 Cognizant Technology Solutions India Pvt. Ltd. System and method for automating data warehousing processes
CN104915341B (en) * 2014-03-10 2018-06-26 中国科学院沈阳自动化研究所 Visualize multiple database ETL integrated approaches and system
CN105389312A (en) * 2014-09-04 2016-03-09 上海福网信息科技有限公司 Big data migration method and tool
US10089377B2 (en) * 2014-09-26 2018-10-02 Oracle International Corporation System and method for data transfer from JDBC to a data warehouse layer in a massively parallel or distributed database environment
CN104899333A (en) * 2015-06-24 2015-09-09 浪潮(北京)电子信息产业有限公司 Cross-platform migrating method and system for Oracle database
US10331656B2 (en) * 2015-09-25 2019-06-25 Microsoft Technology Licensing, Llc Data migration validation
CN107122355B (en) * 2016-02-24 2021-07-06 阿里巴巴集团控股有限公司 Data migration system and method
CN106570086B (en) * 2016-10-19 2020-08-14 上海携程商务有限公司 Data migration system and data migration method
CN107656951B (en) * 2016-12-23 2018-11-23 航天星图科技(北京)有限公司 A kind of method of real time data in synchronous and heterogeneous Database Systems
CN107315814B (en) * 2017-06-29 2021-03-02 苏州浪潮智能科技有限公司 Method and system for verifying data consistency after data migration of KDB (KDB) database
CN107958082B (en) * 2017-12-15 2021-03-26 杭州有赞科技有限公司 Off-line increment synchronization method and system from database to data warehouse


Non-Patent Citations (1)

Title
Jing Qingkun; Xu Qiheng; Huang Yingbing; Yu Qian: "Research on the Upgrade Strategy of the Update, Operation and Maintenance System of Geographic Information Public Service Platforms", Geomatics World *

Also Published As

Publication number Publication date
CN109063005A (en) 2018-12-21
CN109063005B (en) 2021-05-25
CN113553313B (en) 2023-12-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant