CN114968975A

CN114968975A - Data migration method, device, equipment and storage medium

Info

Publication number: CN114968975A
Application number: CN202210376626.7A
Authority: CN
Inventors: 姚晓龙; 张雪
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2022-04-12
Filing date: 2022-04-12
Publication date: 2022-08-30

Abstract

The disclosure provides a data migration method, apparatus, device and storage medium. The method comprises: acquiring data configuration rules of an original database and a target database; processing the data in the original database based on the data configuration rules, and storing the processed data in a cache database; acquiring a data check log corresponding to the data processing process; repairing, based on the data check log, abnormal data appearing in the cache database; and, after the repair is determined to be completed, migrating the data of the cache database to the target database. The method automatically repairs abnormal data arising during migration between different databases and completes the automatic migration of mass data; it is efficient and easy to master and maintain.

Description

Data migration method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a data migration method, apparatus, device, and storage medium.

Background

With the development of database technology, modifying and migrating databases between systems of different versions has become increasingly common, and data migration has become a routine operation. To ensure that data is consistent before and after migration, the migrated data needs to be comprehensively audited. For data volumes at the ten-million level or above, auditing and repairing data consistency before and after migration takes a great deal of time.

The manual comparison auditing methods of the prior art have poor flexibility and a high error rate, and cannot meet the efficiency and accuracy requirements of mass data migration.

Disclosure of Invention

The disclosure provides a data migration method, a data migration device, data migration equipment and a storage medium, so as to improve efficiency and accuracy in a mass data migration process.

In a first aspect, the present disclosure provides a data migration method, where the data migration method includes:

acquiring data configuration rules of an original database and a target database;

processing the data in the original database based on the data configuration rule;

storing the processed data in the original database in a cache database;

acquiring a data check log corresponding to a data processing process;

based on the data check log, repairing abnormal data appearing in the cache database;

and after the repair is determined to be completed, migrating the data of the cache database to the target database.

Optionally, repairing the abnormal data appearing in the cache database based on the data check log includes: determining, based on the data check log, the abnormal position of the abnormal data in the cache database; and repairing the abnormal data based on the original data at the position in the original database corresponding to the abnormal position.

Optionally, the method further comprises: acquiring an automatic repair rule for abnormal data; correspondingly, repairing the abnormal data appearing in the cache database based on the data check log further includes: repairing the abnormal data based on the automatic repair rule and the data in the original database.

Optionally, after the abnormal data is repaired based on the original data corresponding to the abnormal position in the original database, the method further includes: determining that the repair of the abnormal data is completed, wherein repair completion indicates that the original data is the same as the repaired data at the abnormal position in the cache database.

Optionally, after repairing the abnormal data appearing in the cache database based on the data check log, the method further includes: determining that the repair of the abnormal data is completed, wherein repair completion indicates that no new abnormal data is added to the data check log.

Optionally, migrating the data of the cache database to the target database after the repair is determined to be completed includes: when abnormal data is generated while migrating the data of the cache database to the target database, acquiring a migration data check log corresponding to the data migration process; repairing, based on the migration data check log, the abnormal data appearing in the cache database; and, after the repair is determined to be completed, continuing to migrate the data of the cache database to the target database.

Optionally, after obtaining the data configuration rules of the original database and the target database, the method further includes: determining that the data configuration rules in the original database and the target database meet the following pre-rules: the data types of fields between the forms in the original database and the target database are the same; the length of fields between the forms in the original database and the target database is the same; the field names between the forms in the original database and the target database are the same; the form in the target database contains primary keys.

Optionally, after obtaining the data configuration rules of the original database and the target database, the method further includes: preprocessing the data in the original database based on the data configuration rule of the target database; and determining that the data configuration rules of the preprocessed original database and of the target database satisfy the pre-rules.

In a second aspect, the present disclosure provides a data migration apparatus comprising:

the obtaining module is used for acquiring data configuration rules of an original database and a target database;

the processing module is used for processing the data in the original database based on the data configuration rule;

the cache module is used for storing the processed data in the original database in a cache database;

the checking module is used for acquiring a data check log corresponding to the data processing process;

the repair module is used for repairing abnormal data in the cache database based on the data check log;

and the migration module is used for migrating the data of the cache database to the target database after the repair is determined to be completed.

Optionally, the repair module is specifically configured to determine, based on the data check log, the abnormal position of the abnormal data in the cache database; and to repair the abnormal data based on the original data at the position in the original database corresponding to the abnormal position.

Optionally, the repair module is further configured to obtain an automatic repair rule for abnormal data; correspondingly, repairing the abnormal data appearing in the cache database based on the data check log further includes: repairing the abnormal data based on the automatic repair rule and the data in the original database.

Optionally, the repair module is further configured to determine, after repairing the abnormal data based on the original data corresponding to the abnormal position in the original database, that the repair of the abnormal data is completed, where repair completion indicates that the original data is identical to the repaired data at the abnormal position in the cache database.

Optionally, the repair module is further configured to determine, after the abnormal data appearing in the cache database is repaired based on the data check log, that the repair of the abnormal data is completed, where repair completion indicates that no new abnormal data is added to the data check log.

Optionally, the migration module is further configured to, when data in the cache database is migrated to the target database to generate abnormal data, obtain a migration data check log corresponding to the data migration process; based on the migration data check log, repairing abnormal data appearing in the cache database; and after the repair is determined to be completed, continuously migrating the data of the cache database to the target database.

Optionally, the obtaining module is further configured to, after obtaining the data configuration rules of the original database and the target database, determine that the data configuration rules in the original database and the target database satisfy the following pre-rules: the data types of fields between the forms in the original database and the target database are the same; the length of fields between the forms in the original database and the target database is the same; the field names between the forms in the original database and the target database are the same; the form in the target database contains primary keys.

Optionally, the obtaining module is further configured to, after obtaining the data configuration rules of the original database and the target database, preprocess the data in the original database based on the data configuration rule of the target database; and to determine that the data configuration rules of the preprocessed original database and of the target database satisfy the pre-rules.

Optionally, the obtaining module includes a Django architecture and a front-end interface provided in the Django architecture, where the front-end interface is used to obtain the data configuration rules of the original database and the target database; the processing module includes the data migration tool yugong, which is used for processing the data in the original database; the cache module includes the Hadoop Distributed File System (HDFS) of a Hadoop system, and the HDFS is used for caching the processed data; the checking module includes the CHECK mode of the yugong tool, which is used for acquiring the data check log corresponding to the data processing process; the repair module includes the MapReduce model, a data processing tool contained in the Hadoop system, which is used for repairing the abnormal data identified by the data check log; the migration module includes the yugong tool, which is also used for migrating the data in the HDFS to the target database.

Optionally, the repair module further includes a data repair interface, provided in the Django architecture and used for repairing abnormal data appearing in the cache database based on the data check log.

In a third aspect, the present disclosure also provides an electronic device, including:

at least one processor;

and a memory communicatively coupled to the at least one processor;

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the electronic device to perform a data migration method corresponding to any one of the embodiments of the first aspect of the disclosure.

In a fourth aspect, the present disclosure also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the computer-executable instructions are used for implementing the data migration method according to any one of the first aspects of the present disclosure.

According to the data migration method, apparatus, device and storage medium provided by the present disclosure, the data configuration rules of an original database and a target database are obtained; the data in the original database is processed based on the data configuration rules, and the processed data is stored in a cache database; a data check log corresponding to the data processing process is then acquired, and abnormal data appearing in the cache database is repaired based on the data check log; finally, after the repair is determined to be completed, the data of the cache database is migrated to the target database. Because differences inevitably arise when data is migrated between different databases, abnormal data in the processed data is first repaired in the cache database, and the data is then migrated directly to the target database without further modification. This enables automatic field-by-field checking between the tables of heterogeneous databases, with high accuracy and markedly improved efficiency. Problems can be identified directly from the data check log, which facilitates locating and solving them and makes the migration easier for operators to carry out and maintain.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

FIG. 1 is an application scenario diagram of a data migration method according to an embodiment of the present disclosure;

FIG. 2 is a flow chart of a data migration method provided by an embodiment of the present disclosure;

FIG. 3 is a flow chart of a data migration method according to another embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of a data migration apparatus according to yet another embodiment of the present disclosure;

FIG. 5 is a schematic connection relationship diagram of structures corresponding to a data migration apparatus according to yet another embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of an electronic device according to yet another embodiment of the present disclosure.

Specific embodiments of the present disclosure have been shown by way of example in the drawings and will be described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The following describes the technical solutions of the present disclosure and how to solve the above technical problems in specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.

The following is a description of the related terms referred to in the embodiments of the present disclosure:

Django architecture: an open-source Web application framework based on the Python language, used here to deploy the data migration tools and an interactive interface for user operation, so that data migration and abnormal data repair can be carried out.

Hadoop system: a distributed system infrastructure, deployed within the Django architecture, used for caching the data to be migrated and processing it.

MapReduce model: a data processing model tool in the Hadoop system. A specified Map function maps a set of key-value pairs into a new set of key-value pairs, and a specified concurrent Reduce function then merges all mapped key-value pairs that share the same key, thereby realizing parallel processing of large-scale data.

HDFS: short for Hadoop Distributed File System, used together with the MapReduce model to cache large-scale data so that the MapReduce model can process it.

yugong tool: an open-source database migration tool, whose name corresponds to the Chinese "Yugong" (of "Yugong moves the mountain"), capable of large-scale data migration.

CHECK mode: a functional mode of the yugong tool, used for generating and acquiring an automatic data migration log so that information about the data migration process, such as migration progress and problems, can be checked.

DRDS system: short for Distributed Relational Database Service, a horizontally split, smoothly scalable online distributed database service with read-write separation, used for providing the key value of the specific sub-database in which selected data of the original database is located, so that the selected data can be quickly located for modification.

RDS system: short for Relational Database Service, a sub-database storing selected data of the original database. The RDS system holding the selected data is quickly located through the sub-database key value provided by the DRDS system, after which the exact position of the selected data is determined.
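The Map and Reduce roles described in the MapReduce term above can be illustrated with a minimal single-process sketch. It does not use the Hadoop API; the function `map_reduce`, the helpers `emit_form` and `count_rows`, and the sample rows are all hypothetical names chosen only for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[Any],
    map_fn: Callable[[Any], Iterable[Tuple[Any, Any]]],
    reduce_fn: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Run a single-process Map -> group-by-key -> Reduce pass."""
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    # Reduce phase: merge all values that share the same key.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def emit_form(record):
    # Hypothetical map function: key each row by the form it belongs to.
    yield record["form"], 1


def count_rows(key, values):
    # Hypothetical reduce function: total the rows seen for one form.
    return sum(values)


rows = [
    {"form": "user", "id": 1},
    {"form": "user", "id": 2},
    {"form": "order", "id": 7},
]
row_counts = map_reduce(rows, emit_form, count_rows)
```

In a real Hadoop deployment the grouping step is performed by the framework across many nodes; the sketch only shows the division of labour between the two functions.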

With the development of database technology and data storage technology, the transformation and migration of business systems involving big data, and the migration of the data they contain, has become routine and particularly important. The open questions are how to ensure the accuracy and completeness of migrating data at the ten-million level or above between heterogeneous databases, how to improve the efficiency of data comparison after migration, and how to avoid both the inflexibility and error-proneness of the manual comparison required by conventional data auditing methods and the labor and material costs of manually writing audit scripts.

In traditional data migration, different key points of the migrated data need to be compared before and after migration. Difference analysis and anomaly analysis are carried out on the data according to predefined data comparison and verification rules; based on the analysis result, real-time warnings and notifications are issued on the one hand, and a statistical report is produced from a predefined report template on the other, so as to ensure accuracy before and after the data migration.

The comparison process is configured as an automatic process, and for data corresponding to a core business object or entity, a comparison time range and business rules can be defined, and real-time data comparison work can be performed on the data before and after migration.

When the comparison finds abnormal data, the abnormal data needs to be repaired. Conventional repair methods include repairing data through programming statements (and migrating again with a migration tool), repairing data through stored procedures and triggers, and secondary processing of the data through code and interfaces. Repair through programming statements can meet only a very small number of simple requirements and suits only data of small magnitude; it is weak at data processing with many complex rules, and it carries risk on servers with lower processing performance and environment configuration. Migrating the data again doubles the working period, increasing manpower consumption and reducing work efficiency. Repair through stored procedures and triggers, and secondary processing through code and interfaces, not only place high technical demands on operators but also, once system iteration and personnel changes are taken into account, make later tracing of the original data difficult.

Meanwhile, owing to structural differences between the old and new systems, when the data magnitude is large enough these methods cannot compare all fields in the migrated tables. Even if program scripts are written for the comparison, the demands on programming ability and language fundamentals are high, and different audit (that is, data comparison and verification) requirements call for different audit scripts, so the scripts are not reusable and efficiency is low.

For data migration of Excel-based databases, data can be audited and repaired using Excel's built-in data analysis functions such as VLOOKUP and EXACT. However, because the functions are complex and auditors' command of them varies, and because the functions cover only a few rules, other methods must be combined to meet audit requirements, and the approach cannot be applied to data from other data sources. Manual comparison is also error-prone, and its errors are hard to detect.

Another approach is to verify data consistency with a conventional data verification tool, which requires a separate process to confirm whether the data is consistent before and after migration. However, such tools have complicated configuration environments, a high learning cost, are difficult to master, and place high technical demands on operators.

To solve the above problems, an embodiment of the present disclosure provides a data migration method in which the data in the original database is processed based on the data configuration rules of the original database and the target database and stored in a cache database, and data migration is performed after the abnormal data in it has been repaired, so that the accuracy of data migration is ensured and the efficiency of data repair during data migration is improved.

The following explains an application scenario of the embodiment of the present disclosure:

FIG. 1 is an application scenario diagram of a data migration method according to an embodiment of the present disclosure. As shown in FIG. 1, during data migration the server 100 processes the data in the original database 110 based on the data configuration rule of the target database 120, stores the processed data in the cache database 130, and, after repair is completed, migrates the processed data to the target database 120, completing the data migration.

It should be noted that the scenario shown in FIG. 1 illustrates only one server, one original database, one cache database and one target database as an example, but the present disclosure is not limited thereto; that is, there may be any number of servers, original databases, cache databases and target databases.

The data migration method provided by the present disclosure is explained in detail by specific embodiments below. It should be noted that the following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.

FIG. 2 is a flowchart of a data migration method according to an embodiment of the present disclosure. As shown in FIG. 2, the method comprises the following steps:

Step S201, acquiring data configuration rules of the original database and the target database.

The original database stores the data to be migrated, and the target database receives the data to be migrated. The original database and the target database may be Excel-based spreadsheet databases or Oracle databases. The two may be databases of the same type or of different types; for example, data migration can still be carried out when the original database is an Excel-based spreadsheet database and the target database is an Oracle database.

The data configuration rule represents the specific configuration information of the original database and the target database, such as the name and address of each database, the names of the forms it contains, and the user name and password of its management account.

The data configuration rules of the original database and the target database may be the same, for example when the two have the same internal address, name and management account in different storage systems; they may also differ, for example in database names and form names, in which case the data configuration rule further includes the correspondence between the differing forms.

Further, after the data configuration rules of the original database and the target database are determined, the specific data and the administrator authority of the two databases can be obtained, and therefore data migration processing can be completed.

Furthermore, the data configuration rules of the original database and the target database can be entered manually, or acquired automatically through automatic detection rules, which simplifies operation and reduces the difficulty of the data migration.
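The data configuration rule described for step S201 could be represented, for instance, as a small record type. The following is only a sketch under stated assumptions: the class names `DatabaseConfig` and `DataConfigurationRule`, the field names, and the sample addresses and form names are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatabaseConfig:
    """Configuration information of one database (hypothetical layout)."""
    name: str
    address: str
    forms: List[str]
    username: str
    password: str


@dataclass
class DataConfigurationRule:
    """Configuration of both databases plus the form correspondence."""
    source: DatabaseConfig
    target: DatabaseConfig
    # Maps a form name in the original database to its name in the target
    # database; forms not listed are assumed to keep the same name.
    form_mapping: Dict[str, str] = field(default_factory=dict)

    def target_form(self, source_form: str) -> str:
        return self.form_mapping.get(source_form, source_form)


rule = DataConfigurationRule(
    source=DatabaseConfig("legacy_crm", "10.0.0.1:1521", ["t_user", "t_order"], "admin", "<secret>"),
    target=DatabaseConfig("new_crm", "10.0.0.2:1521", ["user_info", "t_order"], "admin", "<secret>"),
    form_mapping={"t_user": "user_info"},
)
```

Here `rule.target_form("t_user")` resolves to the renamed form, while a form without an explicit mapping resolves to itself.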

Step S202, processing the data in the original database based on the data configuration rule.

Specifically, processing the data in the original database mainly means adjusting it based on the data configuration rule of the target database, for example adjusting the corresponding database version, the corresponding form names, and the format of specific data. After processing, the data configuration rule of the processed data is the same as the data configuration rule of the target database.

In some embodiments, when the data configuration rule in the original database is the same as the data configuration rule in the target database, the data in the original database is processed only by copying the data into the cache database.
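As a rough illustration of step S202, the sketch below adjusts field formats row by row and places the result in an in-memory stand-in for the cache database. It is not the yugong tool's behaviour; `process_rows`, `slash_date_to_iso`, the `cache` dictionary and the sample data are hypothetical. When no converters are supplied, the rows are simply copied, matching the case where both databases share the same data configuration rule.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def process_rows(rows: List[Row], converters: Dict[str, Callable[[Any], Any]]) -> List[Row]:
    """Return adjusted copies of the rows; the source rows are left untouched."""
    processed = []
    for row in rows:
        new_row = dict(row)  # copy so the original database data is not modified
        for field_name, convert in converters.items():
            if field_name in new_row:
                new_row[field_name] = convert(new_row[field_name])
        processed.append(new_row)
    return processed


def slash_date_to_iso(value: str) -> str:
    # Hypothetical format adjustment: "2022/04/12" -> "2022-04-12".
    return value.replace("/", "-")


source_rows = [{"id": 1, "created": "2022/04/12"}]

# In-memory stand-in for the cache database, keyed by target form name.
cache: Dict[str, List[Row]] = {}
cache["user_info"] = process_rows(source_rows, {"created": slash_date_to_iso})
# Identical configuration rules: processing degenerates to a plain copy.
cache["user_copy"] = process_rows(source_rows, {})
```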

Step S203, storing the processed data of the original database in a cache database.

Specifically, the cache database is built on the HDFS and is used for caching the processed data. Since errors are likely to occur during data processing, the processed data needs to be temporarily stored in the cache database and checked there before being migrated to the target database. In conventional data migration, by contrast, an error discovered in the target database has to be corrected in the target database itself, where the security risk and difficulty of modifying the data are far greater than before the migration.

The data configuration rule of the data stored in the cache database is the same as the data configuration rule of the target database.

Step S204, acquiring a data check log corresponding to the data processing process.

Specifically, while the data in the original database is being processed, the data before and after processing is compared and checked in real time, and every comparison result that shows a difference is recorded, yielding the data check log. From the data check log it can be determined whether erroneous data exists.

In some embodiments, the CHECK mode of the yugong tool obtains changes to primary keys in the database through JDBC (Java Database Connectivity) and fetches the changed data for batch comparison, thereby generating the corresponding data check log.

Furthermore, based on the data comparison differences reported in the data check log, data analysis can confirm which data fields contain errors and need to be repaired. An operator can then configure the fields to be repaired as required, or the fields can be repaired automatically based on existing configuration rules.
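The field-by-field comparison behind the data check log in step S204 can be sketched as follows. The log entry layout (`id`, `status`, `fields`) is a hypothetical format invented for this illustration and is not the log format produced by the yugong CHECK mode; `build_check_log` and the sample rows are likewise assumptions.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_check_log(expected_rows: List[Row], cache_rows: List[Row], primary_key: str) -> List[Dict[str, Any]]:
    """Compare rows by primary key and record every difference found."""
    cache_by_key = {row[primary_key]: row for row in cache_rows}
    log = []
    for expected in expected_rows:
        key = expected[primary_key]
        cached = cache_by_key.get(key)
        if cached is None:
            # The row never reached the cache database.
            log.append({"id": key, "status": "missing", "fields": []})
            continue
        # Collect the names of fields whose values differ.
        differing = [
            name for name in expected
            if name not in cached or cached[name] != expected[name]
        ]
        if differing:
            log.append({"id": key, "status": "mismatch", "fields": differing})
    return log


expected_rows = [
    {"id": 137, "c4": "a", "c6": "x"},
    {"id": 138, "c4": "b", "c6": "y"},
    {"id": 141, "c4": "c", "c6": "z"},
]
cache_rows = [
    {"id": 137, "c4": "A?", "c6": "x"},  # c4 was damaged during processing
    {"id": 138, "c4": "b", "c6": "y"},
]
check_log = build_check_log(expected_rows, cache_rows, "id")
```

Rows that match produce no log entry, so an empty log corresponds to the situation in which no abnormal data has been found.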

Step S205, repairing abnormal data in the cache database based on the data check log.

Specifically, the data check log makes it possible to determine the position of the abnormal data in the cache database and the position of the corresponding data in the original database. The data is then repaired based on a preset data repair rule; for example, the corresponding data in the original database is processed again and used to replace the data in the cache database, thereby repairing it.

Illustratively, the data check log obtained by the yugong tool records the data with position IDs 137, 141, 144 and 145. Comparing fields c4 and c6 of these records in the cache database with the corresponding fields in the original database shows that differences exist, so fields c4 and c6 in the cache database need to be repaired; the data in the cache database is replaced with the data from the original database, thereby completing the data repair.
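The repair in the example above, followed by a check that the repair is complete, might look like the sketch below. It assumes the simplest case, in which the correct cached value equals the value in the original database with no re-processing needed; `repair_cache`, the log entry layout and the sample values are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def repair_cache(
    check_log: List[Dict[str, Any]],
    source_by_id: Dict[int, Row],
    cache_by_id: Dict[int, Row],
) -> bool:
    """Overwrite abnormal fields in the cache with the original values.

    Returns True when every repaired field now equals the original data,
    which is the completion criterion assumed in this sketch.
    """
    for entry in check_log:
        original = source_by_id[entry["id"]]
        cached = cache_by_id[entry["id"]]
        for field_name in entry["fields"]:
            cached[field_name] = original[field_name]
    return all(
        cache_by_id[entry["id"]][name] == source_by_id[entry["id"]][name]
        for entry in check_log
        for name in entry["fields"]
    )


source_by_id = {
    137: {"id": 137, "c4": "north", "c6": 12},
    141: {"id": 141, "c4": "south", "c6": 30},
}
cache_by_id = {
    137: {"id": 137, "c4": "n0rth", "c6": -1},  # c4 and c6 abnormal
    141: {"id": 141, "c4": "south", "c6": 0},   # c6 abnormal
}
check_log = [
    {"id": 137, "fields": ["c4", "c6"]},
    {"id": 141, "fields": ["c6"]},
]
repaired = repair_cache(check_log, source_by_id, cache_by_id)
```

Only the fields named in the log are touched, so data that passed the check is never rewritten.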

Step S206, after the repair is determined to be completed, migrating the data of the cache database to the target database.

Specifically, after the repair is completed, the data in the cache database can be directly migrated to the target database. In the data migration process, data configuration rules such as versions and structures of data in the database are not modified, and the data in the cache database are directly stored in the target database in a copying mode. Therefore, abnormal data caused by modification processing of data in the migration process can be avoided, and accuracy of the migrated data is guaranteed to the maximum extent.
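A minimal sketch of this copy-only migration is given below, using plain lists as stand-ins for the cache database and the target database. The function `migrate`, its `batch_size` parameter and the sample rows are hypothetical; a real migration would write through the target database's own interface.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def migrate(cache_rows: List[Row], target_rows: List[Row], batch_size: int = 1000) -> int:
    """Copy rows from the cache to the target in batches, without modifying them."""
    migrated = 0
    for start in range(0, len(cache_rows), batch_size):
        batch = cache_rows[start:start + batch_size]
        # Rows are copied as-is: no renaming, conversion or restructuring.
        target_rows.extend(dict(row) for row in batch)
        migrated += len(batch)
    return migrated


cache_rows = [{"id": i, "c4": f"v{i}"} for i in range(5)]
target_rows: List[Row] = []
migrated_count = migrate(cache_rows, target_rows, batch_size=2)
```

Because nothing is transformed at this stage, the target contents can be compared directly against the cache contents to confirm the migration.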

Furthermore, because the data is processed before being migrated and the processed data is checked for anomalies and repaired, the limitations and complexity of comparing data after migration are avoided, the technical demands on operators are greatly reduced, and there is no need to repair data in the already-migrated database, which would require a large amount of code and complicated SQL; operation therefore becomes easier.

According to the data migration method provided by the embodiment of the disclosure, the data configuration rules of an original database and a target database are obtained; the data in the original database is processed based on the data configuration rules, and the processed data is stored in a cache database; a data check log corresponding to the data processing process is then acquired, and abnormal data appearing in the cache database is repaired based on the data check log; finally, after the repair is determined to be completed, the data of the cache database is migrated to the target database. Because differences inevitably arise when data is migrated between different databases, abnormal data in the processed data is first repaired in the cache database, and the data is then migrated directly to the target database without further modification. This enables automatic field-by-field checking between the tables of heterogeneous databases, with high accuracy and markedly improved efficiency. Problems can be identified directly from the data check log, which facilitates locating and solving them and makes the migration easier for operators to carry out and maintain.

FIG. 3 is a flow chart of a data migration method provided by the present disclosure. As shown in FIG. 3, the data migration method provided in this embodiment includes the following steps:

Step S301, acquiring data configuration rules of the original database and the target database.

This step is the same as step S201 in the embodiment shown in fig. 2, and is not described here again.

Step S302, determining that the data configuration rules in the original database and the target database meet the pre-rules.

Wherein the pre-rules include: the data types of fields between the forms in the original database and the target database are the same; the length of fields between the forms in the original database and the target database is the same; the field names between the forms in the original database and the target database are the same; the form in the target database contains primary keys.

In some embodiments, the form in the original database needs to have the same structure as the form in the target database to ensure that data can be migrated between the two databases. Because the data volume is huge, adjusting the form structure during migration would involve a large amount of processing, would easily produce a large amount of abnormal data, would make the repair workload heavy, and would make the migration insufficiently reliable. The data to be migrated therefore needs to meet these preconditions.
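The four pre-rules can be checked mechanically before any data is moved. The following is a minimal sketch, assuming each form is described by a list of field definitions; the `Field` class and `check_pre_rules` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical description of one field of a form (table)."""
    name: str
    data_type: str
    length: int


def check_pre_rules(source: Sequence[Field], target: Sequence[Field],
                    target_primary_keys: Sequence[str]) -> List[str]:
    """Return a list of pre-rule violations; an empty list means all rules hold."""
    violations: List[str] = []
    src = {f.name: f for f in source}
    dst = {f.name: f for f in target}
    # Rule: field names are the same.
    if set(src) != set(dst):
        violations.append("field names differ: " + ", ".join(sorted(set(src) ^ set(dst))))
    for name in sorted(set(src) & set(dst)):
        # Rule: data types are the same.
        if src[name].data_type != dst[name].data_type:
            violations.append(f"data type differs for field {name}")
        # Rule: field lengths are the same.
        if src[name].length != dst[name].length:
            violations.append(f"length differs for field {name}")
    # Rule: the target form contains a primary key.
    if not target_primary_keys:
        violations.append("target form has no primary key")
    return violations
```

Returning the full list of violations, rather than stopping at the first one, lets an operator see everything that must be preprocessed in steps S303 to S304.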

Step S303, preprocessing the data in the original database based on the data configuration rule in the target database.

In some embodiments, the data configuration rules of the forms in the original database and the target database may differ. In this case, the data in the original database can be preprocessed so that it conforms to the data configuration rules of the target database. Checking and repairing the adjusted data then ensures the accuracy of the data to be migrated and, in turn, the reliability of the migration process.

Furthermore, the difference between the data configuration rules of the original database and the target database may be that the format version of specific fields on a form needs to be converted, or that the internal addresses of different forms differ. Such differences can be adjusted quickly through data processing without incurring an excessive processing load.

Step S304, determining that the data configuration rules in the preprocessed original database and the target database meet the pre-rules.

Specifically, steps S303 to S304 are optional steps in parallel with step S302, and those skilled in the art may select which steps to implement according to the relationship between the data configuration rules of the original database and the target database actually being processed.

In some embodiments, after determining the data configuration rules in the original database and the target database, the method further includes: and acquiring an automatic repairing rule of the abnormal data.

Specifically, the automatic repair rule is matched against the data check log to obtain the abnormal data record at the corresponding position. Based on the structure of the data check log, the abnormal data is located automatically and repaired automatically by a preset method. This simplifies the data repair process.

Step S305, processing the data in the original database based on the data configuration rule, and storing the processed data in the cache database.

This step is the same as step S202 and step S203 in the embodiment shown in fig. 2, and is not described here again.

Step S306, acquiring a data check log corresponding to the data processing process.

Specifically, the data check log is produced by checking the data in the cache database against the original database in terms of data format, data length, value range, integrity, consistency and the like, and it records the data found to be abnormal in these checks. The abnormal data in the data check log may therefore have various causes, such as format changes during data processing, entries lost when the data is written to the cache database, and differences between the configuration attributes of the cache database and the original database.

Step S307, determining the abnormal position of the abnormal data in the cache database based on the data check log.

Specifically, both the position of the abnormal data in the cache database and the position of the corresponding original data in the original database can be determined, so that the abnormal data can be repaired based on the original data.

Step S308, based on the original data of the position corresponding to the abnormal position in the original database, the abnormal data is repaired.

Specifically, after the positions of the original data and the abnormal data are respectively determined, the abnormal data which is different from the original data is repaired based on the original data.
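Steps S307 and S308 can be illustrated as follows. This is a minimal sketch, assuming that each record of the data check log identifies an abnormal value by form name, primary key and field name; that record layout, and the names `CheckLogEntry` and `repair_from_original`, are assumptions made for illustration and are not specified by the disclosure.

```python
from typing import Dict, Iterable, NamedTuple

Row = Dict[str, object]
Database = Dict[str, Dict[int, Row]]  # hypothetical model: form -> primary key -> row


class CheckLogEntry(NamedTuple):
    """Assumed shape of one data check log record."""
    table: str
    primary_key: int
    field: str


def repair_from_original(log: Iterable[CheckLogEntry],
                         original: Database, cache: Database) -> int:
    """Overwrite each abnormal value in the cache with the original value.

    Returns the number of values that actually had to be changed.
    """
    repaired = 0
    for entry in log:
        # The same (form, primary key, field) address is used in both databases.
        source_row = original[entry.table][entry.primary_key]
        cached_row = cache[entry.table].setdefault(entry.primary_key, {})
        if cached_row.get(entry.field) != source_row[entry.field]:
            cached_row[entry.field] = source_row[entry.field]
            repaired += 1
    return repaired
```

Running the function a second time on the same log changes nothing, which is the property that the later confirmation step relies on.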

Further, the abnormal data may be repaired either by replacing it directly with the original data, or by replacing it with the result of processing the original data. For example, if the original data is in the format 2000-1-1, the corresponding position in the cache database has a missing item, and the corresponding data in the target database requires the format XXXXXX/XX/XX, then the original data first needs to be converted to 2000/1/1 by format adjustment and is then used to replace the abnormal data in the cache database.
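The date example above can be written as a small conversion followed by a replacement. This is a sketch only; the function names and the `birthday` field used in the usage note are hypothetical.

```python
from datetime import date
from typing import Dict, Optional


def to_slash_date(value: str) -> str:
    """Convert a date such as '2000-1-1' to '2000/1/1'.

    Note: leading zeros are dropped, so '2000-01-01' also becomes '2000/1/1'.
    """
    year, month, day = (int(part) for part in value.split("-"))
    date(year, month, day)  # raises ValueError for impossible dates
    return f"{year}/{month}/{day}"


def repair_missing_date(original_value: str,
                        cache_row: Dict[str, Optional[str]], field: str) -> None:
    """Replace the abnormal (missing) cached value with the processed original."""
    cache_row[field] = to_slash_date(original_value)
```

For instance, calling `repair_missing_date("2000-1-1", row, "birthday")` on a row whose `birthday` is missing stores `2000/1/1` in that row.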

In some embodiments, the abnormal data is repaired based on the automatic repair rules and the data in the original database.

Specifically, the abnormal data can be automatically repaired based on an automatic repair rule, so that the processing efficiency is improved, the inaccuracy problem possibly caused by manual repair is avoided, and the accuracy of the data migration process is further ensured.

Step S309, it is determined that the repair of the abnormal data is completed.

Here, repair completion indicates that the original data is the same as the repaired data at the abnormal position in the cache database.

Alternatively, repair completion indicates that no new abnormal data has been added to the data check log.

Specifically, after the abnormal data is repaired, the repaired cache database needs to be re-checked to confirm that no unrepaired data remains in it, which in turn ensures the accuracy of the cached data to be migrated.

In some embodiments, confirming that the repair is complete may be accomplished by reviewing the data check log again.

By examining the records in the data check log, it can be determined whether the data has been repaired; when the log shows no abnormal data, the data repair has been completed.

In some embodiments, confirming that the repair is completed may also be implemented by reviewing an address corresponding to the abnormal data.

A second check is performed on the abnormal data using the address recorded in the data check log acquired before the repair. If the data at that position is the same as the corresponding data in the original database, or as that data after processing, the repair of the abnormal data is determined to be completed.

Step S310, after the repair is determined to be completed, migrating the data of the cache database to the target database.
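The two ways of confirming repair completion described above can be sketched as two small predicates. The sketch assumes that an address is simply a primary key and that a re-run of the check yields a list of abnormal keys; both are illustrative assumptions, as are the function names.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence

Row = Dict[str, object]
Table = Dict[int, Row]  # hypothetical model: primary key -> row


def confirmed_by_log(recheck_log: Sequence[int]) -> bool:
    """Repair is complete when the re-run check log lists no abnormal data."""
    return len(recheck_log) == 0


def confirmed_by_address(addresses: Iterable[int], original: Table, cache: Table,
                         process: Optional[Callable[[Row], Row]] = None) -> bool:
    """Repair is complete when every previously abnormal address now matches
    the original data, or the original data after processing."""
    for pk in addresses:
        expected = original[pk]
        processed = process(expected) if process else expected
        if cache.get(pk) not in (expected, processed):
            return False
    return True
```

Checking only the previously logged addresses is cheaper than re-running the whole comparison, at the cost of not detecting anomalies introduced elsewhere during the repair.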

After the repair is determined to be completed, the data in the cache database can be directly migrated to the target database without manual control.

This step is the same as step S205 in the embodiment shown in fig. 2, and is not described here again.

Step S311, when the data in the cache database is migrated to the target database and abnormal data is generated, acquiring a corresponding migration data check log in the data migration process.

Specifically, migrating data from the cache database to the target database involves extracting and loading the data in the cache database, which may also cause the data in the cache database to become abnormal. When the data in the cache database becomes abnormal or changes, a corresponding record is also generated in the data check log, namely the migration data check log. At this point the data migration process needs to be suspended, and it is continued after the abnormal data has been repaired.

In some embodiments, when the target database supports repeated loading of data, if abnormal data is migrated to the target database, the data needs to be repaired in the cache database, and the repaired data needs to be migrated to the target database again.

Steps S311 to S313 are optional steps in parallel to step S310, and the implementation of the steps can be selected according to the encountered situation.

Step S312, based on the migration data check log, abnormal data appearing in the cache database is repaired.

Specifically, the repairing of the abnormal data occurring in the cache database is the same as the repairing of the abnormal data generated in the data processing process from step S305 to step S307, and is not described herein again.

Step S313, after the repair is determined to be completed, continuing to migrate the data of the cache database to the target database.

After the repair is determined to be completed, the suspended migration process can be directly resumed, and the data migration is continued, so that the continuity of the data migration process is ensured without manual control.
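Steps S311 to S313 amount to a suspend, repair and resume loop around the copy operation. The sketch below assumes that data is migrated in batches ordered by primary key and that anomalies are detected by a caller-supplied predicate; the batching and the names `migrate_with_resume`, `is_abnormal` and `repair` are illustrative assumptions.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def migrate_with_resume(cache: Dict[int, Row], target: Dict[int, Row],
                        is_abnormal: Callable[[Row], bool],
                        repair: Callable[[int], Row],
                        batch_size: int = 2) -> List[int]:
    """Copy the cache to the target batch by batch, repairing anomalies first.

    Returns the primary keys that had to be repaired during migration.
    """
    repaired_keys: List[int] = []
    keys = sorted(cache)
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # Stand-in for the migration data check log of this batch.
        migration_log = [pk for pk in batch if is_abnormal(cache[pk])]
        # Migration of the batch is held back until the cache is repaired.
        for pk in migration_log:
            cache[pk] = repair(pk)
            repaired_keys.append(pk)
        # Resume: copy the (now clean) batch without modification.
        for pk in batch:
            target[pk] = dict(cache[pk])
    return repaired_keys
```

Because the repair is applied to the cache database rather than to the target, the target only ever receives data that has passed the check.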

According to the data migration method provided by the embodiment of the disclosure, the data configuration rules of the original database and the target database are determined to meet the pre-rules which can be used for data migration, then the data are processed and stored in the cache database, the abnormal data are repaired based on the data check log, the repair result is confirmed, and if the abnormal data occur in the data migration process, the data migration process is continued after the abnormal data are repaired through the data check log. The abnormal data are automatically repaired based on the data inspection logs after the data are processed and in the data migration process, so that the accuracy in the data migration process is effectively guaranteed, and the reliability of the data migration is effectively guaranteed by carrying out inspection and confirmation after the data are repaired.

Fig. 4 is a schematic structural diagram of a data migration apparatus provided in the present disclosure. As shown in fig. 4, the data migration apparatus 400 includes: an acquisition module 410, a processing module 420, a caching module 430, a checking module 440, a repair module 450, and a migration module 460. Wherein:

the acquisition module 410 is configured to obtain data configuration rules of an original database and a target database;

the processing module 420 is configured to process data in the original database based on the data configuration rules;

the caching module 430 is configured to store the processed data from the original database in a cache database;

the checking module 440 is configured to obtain a data check log corresponding to the data processing process;

the repair module 450 is configured to repair abnormal data appearing in the cache database based on the data check log;

and the migration module 460 is configured to migrate the data of the cache database to the target database after the repair is determined to be completed.

Optionally, the repair module 450 is specifically configured to determine, based on the data check log, the abnormal position of the abnormal data in the cache database, and to repair the abnormal data based on the original data at the position in the original database corresponding to the abnormal position.

Optionally, the repair module 450 is further configured to obtain an automatic repair rule for the abnormal data; correspondingly, repairing the abnormal data appearing in the cache database based on the data check log further comprises: repairing the abnormal data based on the automatic repair rule and the data in the original database.

Optionally, the repair module 450 is further configured to determine that the repair of the abnormal data is completed after the abnormal data is repaired based on the original data corresponding to the abnormal position in the original database, where repair completion indicates that the original data is the same as the repaired data at the abnormal position in the cache database.

Optionally, the repair module 450 is further configured to determine that the repair of the abnormal data is completed after the abnormal data appearing in the cache database is repaired based on the data check log, where repair completion indicates that no new abnormal data has been added to the data check log.

Optionally, the migration module 460 is further configured to: when abnormal data is generated while the data in the cache database is being migrated to the target database, obtain the corresponding migration data check log of the data migration process; repair the abnormal data appearing in the cache database based on the migration data check log; and, after the repair is determined to be completed, continue to migrate the data of the cache database to the target database.

Optionally, the acquisition module 410 is further configured to, after obtaining the data configuration rules of the original database and the target database, determine that the data configuration rules in the original database and the target database satisfy the following pre-rules: the data types of fields between the forms in the original database and the target database are the same; the lengths of fields between the forms in the original database and the target database are the same; the field names between the forms in the original database and the target database are the same; the form in the target database contains a primary key.

Optionally, the processing module 420 is further configured to, after the data configuration rules of the original database and the target database are obtained, preprocess the data in the original database based on the data configuration rules in the target database, and determine that the data configuration rules in the preprocessed original database and the target database meet the pre-rules.

Optionally, the acquisition module 410 includes a Django architecture and a front-end interface provided in the Django architecture, where the front-end interface is used to obtain the data configuration rules of the original database and the target database; the processing module 420 includes the data migration tool yugong, which is used to process data in the original database; the caching module 430 includes the Hadoop Distributed File System (HDFS) of a Hadoop system, and the HDFS is used to cache the processed data; the checking module 440 includes the CHECK mode of the yugong tool, which is used to obtain the data check log corresponding to the data processing process; the repair module 450 includes the MapReduce model, a data processing tool included in the Hadoop system, which is used to repair the abnormal data found through the data check log; and the migration module 460 includes the yugong tool, which is also used to migrate the data in the HDFS into the target database.

Specifically, by configuring the yugong tool under the Django architecture, the extraction and loading work required for large-scale data migration can be carried out, and the CHECK mode it provides can be used to perform data consistency checks on abnormal data generated during large-scale data processing. The CHECK mode works on the basis of JDBC (Java Database Connectivity): the processed data and the data in the original database are compared in batches over successive primary key ranges, so that abnormal data is found quickly.
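The principle of comparing data in batches over primary key ranges can be illustrated as follows. This sketch is not the yugong implementation and does not use JDBC; it uses Python's standard `sqlite3` module as a stand-in for both databases, and it assumes a hypothetical table layout with an integer primary key column `id` and a single data column `value`.

```python
import sqlite3
from typing import List

_MISSING = object()  # marks a row that is absent from the cache database


def compare_in_batches(original: sqlite3.Connection, cache: sqlite3.Connection,
                       table: str, batch_size: int) -> List[int]:
    """Return primary keys whose row is missing or different in the cache.

    `table` must be a trusted identifier; it is interpolated into the SQL text.
    """
    abnormal: List[int] = []
    low, high = original.execute(f"SELECT MIN(id), MAX(id) FROM {table}").fetchone()
    if low is None:  # empty table, nothing to compare
        return abnormal
    query = f"SELECT id, value FROM {table} WHERE id >= ? AND id < ? ORDER BY id"
    for start in range(low, high + 1, batch_size):
        bounds = (start, start + batch_size)  # one primary key range per batch
        expected = dict(original.execute(query, bounds).fetchall())
        actual = dict(cache.execute(query, bounds).fetchall())
        abnormal.extend(pk for pk in sorted(expected)
                        if actual.get(pk, _MISSING) != expected[pk])
    return abnormal
```

Walking the key range in fixed-size windows keeps the memory needed per comparison bounded regardless of the total table size, which is what makes the approach suitable for very large tables.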

By configuring the Hadoop system, a large amount of data in the original database can be processed in a distributed manner that is reliable, efficient and scalable, and the processed data is cached in the HDFS provided by the Hadoop system. The HDFS can store massive amounts of data and supports the comparison of data across large heterogeneous databases, so it can work with the CHECK mode of the yugong tool to check the fields of the original database forms against those of the cache database forms one by one, thereby quickly finding problematic data.

The MapReduce model provides computation over the mass data cached in the HDFS, so that the mass data from the original database can be converted according to the differences between the configuration rules of the target database and the original database.

In some embodiments, because of its huge volume, the data in the original database is usually stored across a plurality of different RDS databases. By configuring a DRDS system connected to each RDS database, the RDS database holding the abnormal data to be modified can be found quickly through the sub-database key value provided by the DRDS system, so that the abnormal data can be processed quickly.
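The idea of using a sub-database key value to find the database that holds a given record can be illustrated with a simple routing function. The disclosure does not state how the DRDS system maps key values to RDS databases; the hash-and-modulo scheme below is purely an assumption chosen for illustration, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_for(key: str, shard_count: int) -> int:
    """Map a key value to a database index (assumed hash-modulo routing)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def group_by_shard(keys: Iterable[str], shard_count: int) -> Dict[int, List[str]]:
    """Group the keys of abnormal records by the database that holds them."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        groups[shard_for(key, shard_count)].append(key)
    return dict(groups)
```

Grouping the abnormal records by database in this way means that each database only has to be visited once during repair.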

Furthermore, by providing a front-end interface, the data configuration rules of the original database and the target database can be visually input, the configuration environment is simple, and the automatic repair rules can be input, so that abnormal data can be automatically repaired.

Optionally, the repair module 450 further includes a data repair interface provided in the Django architecture; the data repair interface is used to repair abnormal data in the cache database based on the data check log.

Fig. 5 is a schematic diagram of the connection relationships between the structures of the data migration apparatus. As shown in Fig. 5, the data migration apparatus 400 includes a Django architecture 510. A front-end interface 511, a Hadoop system 520 (a data file system) and a yugong tool 530 (a data migration tool) are provided in the Django architecture 510. The Hadoop system 520 includes a MapReduce model 521 (a data processing tool) and a Hadoop Distributed File System (HDFS) 522, and the yugong tool 530 includes a CHECK mode.

The data migration apparatus is thus supported by the capacity of the Hadoop system to process large amounts of data, and uses the ability of the yugong tool to produce data check logs by batch comparison over primary key ranges. Combined with the sub-database key values provided by the DRDS system, this greatly shortens the time needed for data comparison and data repair and overcomes the limitations and complexity of the conventional approach of comparing data after migration. Meanwhile, checking the fields between the forms of heterogeneous databases (the target database and the original database) is accurate and efficient, and the comparison result points directly to the problem, which facilitates locating and solving it. The technical requirements on operators are greatly reduced, large amounts of code and complex query commands are avoided, manual operation becomes easier, and the accuracy and reliability of the data migration are guaranteed.

In this embodiment, by combining the modules, the data migration apparatus can repair data automatically and in real time during the data migration process, thereby effectively ensuring the accuracy of the data during migration and preventing the migrated data from being abnormal.

Fig. 6 is a schematic structural diagram of an electronic device provided in the present disclosure, and as shown in fig. 6, the electronic device 600 includes: a memory 610 and a processor 620.

Wherein the memory 610 stores computer programs executable by the at least one processor 620. The computer program is executed by the at least one processor 620 to cause the electronic device to implement the data migration method as provided in any of the embodiments above.

Wherein the memory 610 and the processor 620 may be connected by a bus 630.

The related descriptions may be understood by referring to the related descriptions and effects corresponding to the method embodiments, which are not repeated herein.

One embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the data migration method according to any of the embodiments corresponding to fig. 2 to 3.

The computer readable storage medium may be, among others, ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.

One embodiment of the present disclosure provides a computer program product comprising computer executable instructions for implementing the data migration method according to any of the embodiments corresponding to fig. 2 to 3 when executed by a processor.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a logical division, and other divisions are possible in practice; a plurality of modules or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be electrical, mechanical or of another form.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

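1. A data migration method, characterized in that the migration method comprises:

acquiring data configuration rules of an original database and a target database;

processing the data in the original database based on the data configuration rules;

storing the processed data in the original database in a cache database;

acquiring a data check log corresponding to a data processing process;

repairing abnormal data occurring in the cache database based on the data check log;

and after the repair is determined to be completed, migrating the data of the cache database to the target database.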

2. The data migration method according to claim 1, wherein the repairing abnormal data occurring in the cache database based on the data check log comprises:

determining the abnormal position of the abnormal data in the cache database based on the data check log;

and repairing the abnormal data based on the original data of the position corresponding to the abnormal position in the original database.

3. The data migration method according to claim 2, wherein the method further comprises:

acquiring an automatic repairing rule of abnormal data;

correspondingly, the repairing abnormal data occurring in the cache database based on the data check log further includes:

and repairing the abnormal data based on the automatic repairing rule and the data in the original database.

4. The data migration method according to claim 2, wherein after the repairing the abnormal data based on the original data corresponding to the abnormal location in the original database, the method further comprises:

and determining that the abnormal data is repaired completely, wherein the repair completion is used for indicating that the original data is the same as the repaired data of the abnormal position in the cache database.

5. The data migration method according to any one of claims 1 to 4, wherein after repairing the abnormal data occurring in the cache database based on the data check log, the method further comprises:

and determining that the abnormal data is repaired, wherein the repair completion is used for indicating that no additional abnormal data is added in the data check log.

6. The data migration method according to any one of claims 1 to 4, wherein the migrating the data of the cache database to the target database after the repair is determined to be completed comprises:

when the data of the cache database is migrated to the target database to generate abnormal data, acquiring a corresponding migration data check log in the data migration process;

repairing abnormal data appearing in the cache database based on the migration data check log;

and after the repair is determined to be completed, continuing to migrate the data of the cache database to the target database.

7. The data migration method according to any one of claims 1 to 4, wherein after the obtaining of the data configuration rules of the original database and the target database, the method further comprises:

determining that the data configuration rules in the original database and the target database meet the following pre-rules:

the data types of fields between the forms in the original database and the target database are the same;

the length of fields between the forms in the original database and the target database is the same;

the field names between the forms in the original database and the target database are the same;

the form in the target database contains primary keys.

8. The data migration method according to claim 7, wherein after obtaining the data configuration rules of the original database and the target database, the method further comprises:

preprocessing the data in the original database based on a data configuration rule in a target database;

and determining that the data configuration rules in the preprocessed original database and the target database meet the pre-rules.

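9. A data migration apparatus, comprising:

the acquisition module is used for acquiring data configuration rules of the original database and the target database;

the processing module is used for processing the data in the original database based on the data configuration rules;

the cache module is used for storing the processed data in the original database in a cache database;

the checking module is used for acquiring a data check log corresponding to the data processing process;

the repair module is used for repairing abnormal data occurring in the cache database based on the data check log;

and the migration module is used for migrating the data of the cache database to the target database after the repair is determined to be completed.

10. The data migration apparatus according to claim 9, wherein: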

the acquisition module comprises a Django architecture and a front-end interface arranged in the Django architecture, and the front-end interface is used for acquiring data configuration rules of an original database and a target database;

the processing module comprises a data migration tool yugong, and the data migration tool yugong is used for processing data in an original database;

the cache module comprises a Hadoop Distributed File System (HDFS) of a Hadoop system, and the HDFS is used for caching the processed data;

the checking module comprises a CHECK mode of the data migration tool yugong, and the CHECK mode is used for acquiring a data check log corresponding to the data processing process;

the repair module comprises a MapReduce model, a data processing tool contained in the Hadoop system, and the MapReduce model is used for repairing abnormal data found through the data check log;

the migration module comprises the yugong tool, and the yugong tool is also used for migrating the data in the HDFS to the target database.

11. The data migration apparatus according to claim 10, wherein the repair module further comprises:

a data repair interface arranged in the Django architecture and used for repairing abnormal data appearing in the cache database based on the data check log.

12. An electronic device, comprising:

at least one processor;

and a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor to cause the electronic device to perform the data migration method of any of claims 1-8.

13. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the data migration method of any one of claims 1 to 8.