CN116910025A - Full-quantity and incremental data verification method from Oracle to openGauss - Google Patents

Full-quantity and incremental data verification method from Oracle to openGauss Download PDF

Info

Publication number
CN116910025A
CN116910025A CN202310884968.4A CN202310884968A CN116910025A CN 116910025 A CN116910025 A CN 116910025A CN 202310884968 A CN202310884968 A CN 202310884968A CN 116910025 A CN116910025 A CN 116910025A
Authority
CN
China
Prior art keywords
data
verification
incremental
full
oracle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310884968.4A
Other languages
Chinese (zh)
Inventor
杨杰
苏章艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Mass Database Technology Co ltd
Original Assignee
Guangzhou Mass Database Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Mass Database Technology Co ltd filed Critical Guangzhou Mass Database Technology Co ltd
Priority to CN202310884968.4A priority Critical patent/CN116910025A/en
Publication of CN116910025A publication Critical patent/CN116910025A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/214Database migration support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a full-quantity and increment data checking method from Oracle to openGauss. The method comprises the steps of executing full-volume data verification by a full-volume verification module after full-volume data migration, analyzing an archive log and a redo log of Oracle in real time through an increment al, collecting data required by incremental data verification, sending the data to kafka, and reading the data in the kafka by the incremental verification module and performing the incremental data verification. The method provides a feasible solution for solving the problem of consistency and integrity of the full-volume data and the incremental data in the database replacement process, can realize the full-volume live library verification and the incremental data verification of the incremental migration after the full migration of the Oracle to the openGauss database, and provides powerful technical support for ensuring the accuracy and consistency of the data in the process of executing the data migration from the Oracle to the openGauss.

Description

Full-quantity and incremental data verification method from Oracle to openGauss
Technical Field
The invention belongs to the technical field of data verification methods, and particularly relates to a full-quantity and incremental data verification method from Oracle to openGauss.
Background
With the perfection of domestic database technology and products, more and more institutions and enterprises are turning from original use of Oracle to use of domestic databases. Generally, after the database is replaced, in order to ensure consistency and integrity of data in heterogeneous databases, application support is required to run bi-directionally between a source database and a target database for a period of time, and incremental data of the period of time is required to ensure data consistency and integrity. There is therefore a need for a method of validating and detecting data that can compare the data of a source library to a target library without application downtime, however, until now, no report of the relevant method has been seen.
For the above reasons, a tool for data verification is urgently needed, which can perform full-quantity verification and incremental verification to ensure consistency of data migration.
Disclosure of Invention
In order to solve the problem of consistency and integrity of full data and incremental data in the process of replacing a database, the invention provides a novel data verification method after the data of an Oracle to openGauss database is migrated.
The scheme provides a full-quantity verification and increment verification module for ensuring the accuracy and consistency of data after the data migration is carried out from Oracle to openGauss.
Specifically, the invention provides a full-quantity and incremental data verification method from Oracle to openGauss, which comprises the following steps:
s1, executing full data verification by a full verification module after full data migration;
s2, analyzing an archiving log and a redo log of the Oracle in real time through an encrustement, collecting data required by incremental data verification and sending the data to the kafka;
s3, the incremental verification module reads the data in the kafka and performs incremental data verification.
Further, according to some embodiments of the present invention, in the step S1 of the method for verifying full-size and incremental data from Oracle to openGauss, the full-size data verification is performed by the full-size verification module after the full-size data is migrated, and includes:
s11, simultaneously carrying out query operation taking a table as a unit on a source library and a target library, carrying out data processing on the queried data in a row unit, and converting the queried data into data of a comparable character string type;
s12, carrying out hash value on the data of the character string type;
s13, dividing the data obtained in the previous step into a source library and a target library in a format of { primary key } | { hash }, and writing the source library and the target library into a file of a linux environment by taking a table as a unit;
s14, ordering the files obtained in the previous step by using a sort command of a linux system, and then performing comparison and verification operation on corresponding tables of a source library and a target library by using a diff command;
s15, obtaining a result file of comparison verification for a user to check so as to determine the correctness and consistency of the data verification, generating a data difference file, and storing the data difference file in a linux server file.
Further, according to some embodiments of the present invention, the incremental verification module in step S3 of the method for verifying full and incremental data from Oracle to openGauss of the present invention reads data in kafka and performs incremental data verification, including:
s31, collecting incremental verification data;
s32, integrating the data obtained in the previous step by an incremental verification module, and assembling the data into conditional query sentences to respectively query in a source library and a target library;
s33, processing the query result, and generating a new data difference file through sequencing and comparison;
s34, the operation is circulated until the incremental data migration is completed or the incremental verification task is stopped manually.
Further, according to some embodiments of the present invention, the data of the increment verification in step S31 of the method for verifying the full-amount and increment data from Oracle to openGauss of the present invention includes:
(1) Data stored in the last checked data difference file in the linux server file;
(2) incremental data obtained by analyzing the source library log by the encrustement.
Further, according to some embodiments of the present invention, in the method for verifying full and incremental data from Oracle to openGauss in step S14, the sort command of the linux system is used to sort the files obtained in the previous step, and the database sorting may be further adopted, that is, sorting conditions are added when the database is queried, so as to sort the files with a main key or other sortable fields, thereby obtaining a database with a sequence.
Further, according to some embodiments of the present invention, in the step S14 of the method for verifying full and incremental data from Oracle to openGauss, the diff command is used to perform a comparison verification operation on the corresponding tables of the source library and the target library, or a merkle tree may be used to compare, i.e. hash the queried data, and then the hash values of two adjacent data are combined together to calculate the hash value again, and so on, finally a root node hash value is obtained, and the root node hash values of the source library table and the target library table are compared, if they are consistent, the data are consistent, if they are inconsistent, binary tree query is performed, and inconsistent leaf nodes are found and a data difference file is generated.
Further, according to some embodiments of the present invention, the method for verifying full and incremental data from Oracle to openGauss further includes:
and establishing a management end, wherein the management end is used for reporting the state and the abnormality of the data verification, reporting the backlog state of the message queue, and carrying out integral data management on the incremental verification process, such as the verification speed, the difference result and the like of each incremental verification.
Further, in the method for verifying the full and incremental data from Oracle to openGauss, the management end is a SpringBoot item and operates on a linux server in a java-jar mode.
In addition, the invention also provides a computer readable storage medium, wherein the storage medium is stored with a computer program, and the program is executed by a processor to realize the steps of the method for verifying the full and incremental data from Oracle to openGauss.
In summary, the method for verifying the full-volume and incremental data from Oracle to openGauss provides a feasible solution for solving the problem of verifying the consistency and the integrity of the full-volume data and the incremental data in the process of replacing a database, and by using the method, the full-volume live library verification and the incremental data verification of the incremental migration after the full migration of the Oracle to the openGauss database can be realized, thereby providing powerful technical support for ensuring the accuracy and the consistency of the data in the process of executing the data migration from Oracle to the openGauss.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that need to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the following drawings are only some embodiments described in the present invention, and other drawings can be obtained according to the drawings without inventive effort for those skilled in the art.
FIG. 1 is a flow chart of an overall implementation of the method of the present invention.
FIG. 2 is a flow chart of the method of the present invention for performing full data verification.
FIG. 3 is a flow chart illustrating the incremental data verification in the method of the present invention.
FIG. 4 is a schematic diagram of the technical architecture of the method of the present invention.
Fig. 5 is a diagram illustrating an exemplary implementation of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to specific embodiments and corresponding drawings. It is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and the present invention may be implemented or applied by different specific embodiments, and that various modifications or changes may be made in the details of the present description based on different points of view and applications without departing from the spirit of the present invention.
Meanwhile, it should be understood that the scope of the present invention is not limited to the following specific embodiments; it is also to be understood that the terminology used in the examples of the invention is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the invention.
Examples: full-quantity and incremental data verification method from Oracle to openGauss
As shown in fig. 1-3, the method comprises the following steps:
full data verification
After the full-size data is migrated, a full-size verification module executes a full-size verification task, the task can simultaneously perform query operation taking a table as a unit on a source library and a target library, the queried data are subjected to different data processing according to different database types in a row unit, and finally the queried data are converted into data of comparable character string types and are subjected to hash value. Dividing the data into a source library and a target library in a { main key } | { hash } format, writing the source library and the target library into files in a linux environment by taking a table as a unit, sequencing each file by using a sort command of a linux system, and performing comparison checking operation on corresponding tables of the source library and the target library by using a diff command. And finally, obtaining a comparison verification result file for a user to check so as to determine the correctness and consistency of the data verification, generating a data difference file and storing the data difference file in a linux server file.
Incremental data verification
In order to avoid inconsistent newly added data, the source library and the target library are required to be in a static library state when full verification is performed, but verification of the stopped library by a user is difficult to accept. In addition, in order to achieve seamless switching of databases, it is necessary to ensure that the source and target databases run simultaneously for a period of time, and this period of time also requires migration of data, which is called incremental migration. In the method, an incremental verification module is used for solving the problems, and the verification mode of the incremental verification is consistent with that of a full-quantity verification module, and the difference is the collection of data. The increment check data source is increment data (stored in kafka) obtained by analyzing a source library log (archiving log and redo log) by an increment migration collector module, wherein the increment check data source is analysis of a last check data difference file result stored in a linux server file. The incremental verification module integrates the data and assembles the data into conditional query sentences to query in the source library and the target library respectively, then processes the query results, orders and compares the query results to generate new data difference files, and the task circulates the operations until the incremental data migration is completed or the incremental verification task is stopped manually.
Because the sort and diff commands are used in this solution, but there are no commands above on windows or some operating systems, the following alternatives are provided for this:
the sort operation may use database sorting, and sorting conditions are added when the database data is queried, and sorting is generally performed by a primary key or other sortable fields, so that the obtained database still has sequence.
The diff operation can use a merkle tree to compare, namely, the queried data is subjected to hash value, then the hash values of two adjacent data are combined together, the hash value is calculated once again, and the like, finally, a root node hash value is obtained, the root node hash values of the source library table and the target library table are compared, if the root node hash values are consistent, the data are consistent, if the root node hash values are inconsistent, binary tree query is performed, inconsistent leaf nodes are found, and a data difference file is generated.
In the scheme, the management end is established to report the state and the abnormality of the data verification, report the backlog state of the message queue, and carry out integral data management on the incremental verification process, such as the verification speed, the difference result and the like of each incremental verification. The management end is a SpringBoot project and can run on the linux server in a java-jar mode.
The complete implementation steps of the data verification method of the invention are as follows:
1. installation of an encremental
The encremental is divided into an acquisition end and an application end, and the acquisition end is used for mounting an archive log and a redox log of the Oracle in an SSRFS mode by adopting an independent deployment mode, so that the limitation of a machine operating system and resources where the Oracle is located can be avoided. Reading the Oracle log file in a remote mounting mode, and analyzing the file content. The application end and the management end are deployed as a process to finish the management of the incremental task created by the user and the interaction with the acquisition end. The application end needs to filter and process the data of the acquisition end into a data format required by incremental verification.
2. Installation management terminal
The management end is a SpringBoot item, and the packing is completed and sent to the linux server to run in a java-jar mode.
3. Starting a data verification task
And after the management end page configures the connection information of the source library and the target library, selecting a migration object for migration by newly-built operation.
And 3.1, after the migration is completed, running an incremental migration task, notifying an encremental module to start a collector and an application end, starting to collect incremental data of a source library log and synchronizing the incremental data to a target library.
And 3.2, after the migration is completed, running a full-quantity verification and increment migration task, and simultaneously notifying an encremental module to start generating increment verification data. The full-volume verification content comprises full-volume migration data and partial data which are subjected to incremental migration in task operation, and the verification is stopped after a verification difference data file is generated. And immediately running an incremental migration task after the full migration is stopped, starting to receive the kafka incremental verification data, acquiring the last verification difference result file integration, and then performing incremental data verification, thereby circulating.
FIG. 4 is a schematic diagram of a method of the present invention, showing a scenario in which full data migration has been performed and incremental data migration is being performed. Comprising the following steps: running a full-quantity verification task at a management end full-quantity verification module, generating a data difference file, and storing the data difference file in a linux server file; the increment checking module monitors the increment checking data of the kafka, acquires the data difference file checked last time in the linux server file after reading the data difference file, assembles the data difference file into a conditional query statement, queries in the Oracle database and the openGauss database respectively, and checks to generate a data checking result, and circulates until the increment data migration is stopped or the checking task is stopped manually.
FIG. 5 is an exemplary diagram of the implementation of the method of the present invention, comprising the steps of:
step one, after the user carries out full-quantity data migration, starting full-quantity data verification, an application end queries the data of a source library and a target library by taking a table as a unit, writes the processed data into a file of a server, then uses a sort command to sort, uses a diff command to compare the data of the source library and the target library, and generates a data difference file for the user to check if the difference exists.
And step two, starting incremental data verification after the full data verification is finished, informing an application end of an increment to start to collect data required by the incremental verification, sending the data to the kafka, merging the data difference file checked last time and the incremental data to generate a query sql condition, extracting the incremental data verification by using a condition extraction verification mode of the full data verification, and finally generating a new data difference file.
The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, replacement, etc. that comes within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (9)

1. The method for verifying full-quantity and increment data from Oracle to openGauss is characterized by comprising the following steps:
s1, executing full data verification by a full verification module after full data migration;
s2, analyzing an archiving log and a redo log of the Oracle in real time through an encrustement, collecting data required by incremental data verification and sending the data to the kafka;
s3, the incremental verification module reads the data in the kafka and performs incremental data verification.
2. The method for verifying full and incremental data from Oracle to openGauss according to claim 1, wherein the step S1 of performing full data verification by the full verification module after performing full data migration comprises:
s11, simultaneously carrying out query operation taking a table as a unit on a source library and a target library, carrying out data processing on the queried data in a row unit, and converting the queried data into data of a comparable character string type;
s12, carrying out hash value on the data of the character string type;
s13, dividing the data obtained in the previous step into a source library and a target library in a format of { primary key } | { hash }, and writing the source library and the target library into a file of a linux environment by taking a table as a unit;
s14, ordering the files obtained in the previous step by using a sort command of a linux system, and then performing comparison and verification operation on corresponding tables of a source library and a target library by using a diff command;
s15, obtaining a result file of comparison verification for a user to check so as to determine the correctness and consistency of the data verification, generating a data difference file, and storing the data difference file in a linux server file.
3. The method for verifying full and incremental data from Oracle to openGauss according to claim 2, wherein the incremental verification module in step S3 reads the data in kafka and performs incremental data verification, comprising:
s31, collecting incremental verification data;
s32, integrating the data obtained in the previous step by an incremental verification module, and assembling the data into conditional query sentences to respectively query in a source library and a target library;
s33, processing the query result, and generating a new data difference file through sequencing and comparison;
s34, the operation is circulated until the incremental data migration is completed or the incremental verification task is stopped manually.
4. The method for verifying full and incremental data from Oracle to openGauss as in claim 3, wherein the incremental verification data in step S31 comprises:
(1) Data stored in the last checked data difference file in the linux server file;
(2) incremental data obtained by analyzing the source library log by the encrustement.
5. The method for verifying full and incremental data from Oracle to openGauss according to claim 2, wherein in step S14, the sort command of the linux system is used to sort the files obtained in the previous step, and further a database sort may be adopted, that is, a sort condition is added when the database data is queried, so as to sort the files with a main key or other sortable fields, thereby obtaining a database with a sequence.
6. The method for verifying full and incremental data from Oracle to openGauss according to claim 2, wherein in step S14, the diff command is used to perform a comparison verification operation on the corresponding tables of the source library and the target library, a merkle tree is used to compare, i.e. the queried data is subjected to hash value, the hash values of two adjacent data are combined together to calculate the hash value again, and the like, finally a root node hash value is obtained, the root node hash values of the source library table and the target library table are compared, if the two hash values are consistent, the data are consistent, if the two hash values are inconsistent, a binary tree query is performed, inconsistent leaf nodes are found, and a data difference file is generated.
7. The method for verifying full and incremental data from Oracle to openGauss according to claim 1, further comprising:
and establishing a management end, wherein the management end is used for reporting the state and the abnormality of the data verification, reporting the backlog state of the message queue and carrying out overall data management on the incremental verification process.
8. The method for verifying full and incremental data from Oracle to openGauss according to claim 7, wherein the management end is a SpringBoot item and operates on a linux server in a java-jar mode.
9. A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the Oracle to openGauss full and delta data verification method of any of claims 1-8.
CN202310884968.4A 2023-07-19 2023-07-19 Full-quantity and incremental data verification method from Oracle to openGauss Pending CN116910025A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310884968.4A CN116910025A (en) 2023-07-19 2023-07-19 Full-quantity and incremental data verification method from Oracle to openGauss

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310884968.4A CN116910025A (en) 2023-07-19 2023-07-19 Full-quantity and incremental data verification method from Oracle to openGauss

Publications (1)

Publication Number Publication Date
CN116910025A true CN116910025A (en) 2023-10-20

Family

ID=88366380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310884968.4A Pending CN116910025A (en) 2023-07-19 2023-07-19 Full-quantity and incremental data verification method from Oracle to openGauss

Country Status (1)

Country Link
CN (1) CN116910025A (en)

Similar Documents

Publication Publication Date Title
US11461294B2 (en) System for importing data into a data repository
US10678810B2 (en) System for data management in a large scale data repository
US8140565B2 (en) Autonomic information management system (IMS) mainframe database pointer error diagnostic data extraction
CN102521316B (en) Pattern matching framework for log analysis
CN110781231B (en) Database-based batch import method, device, equipment and storage medium
EP3513314A1 (en) System for analysing data relationships to support query execution
CN111400408A (en) Data synchronization method, device, equipment and storage medium
CN108647357B (en) Data query method and device
CN103514223A (en) Data synchronism method and system of database
CN109063178B (en) Method and device for automatically expanding self-help analysis report
CN109299074B (en) Data verification method and system based on templated database view
CN113326247A (en) Cloud data migration method and device and electronic equipment
CN111611276A (en) Data query method, device and storage medium
CN112068981B (en) Knowledge base-based fault scanning recovery method and system in Linux operating system
CN111221698A (en) Task data acquisition method and device
CN116483831B (en) Recommendation index generation method for distributed database
CN110309206B (en) Order information acquisition method and system
CN107004036B (en) Method and system for searching logs containing a large number of entries
CN109426576B (en) Fault-tolerant processing method and fault-tolerant assembly
CN116910025A (en) Full-quantity and incremental data verification method from Oracle to openGauss
CN111259082B (en) Method for realizing full data synchronization in big data environment
CN112559641A (en) Processing method and device of pull chain table, readable storage medium and electronic equipment
US20020178140A1 (en) Method for characterizing and storing data analyses in an analysis database
CN117827920A (en) Method and device for screening temporary files, electronic equipment and storage medium
CN117909392A (en) Intelligent data asset inventory method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination