CN113704213A - Sqlldr2 and ogg data synchronization-based implementation method

Info

Publication number: CN113704213A
Application number: CN202110965086.1A
Authority: CN
Inventors: 陈典银; 张德权
Original assignee: Liaoning Zhenxing Bank Co ltd
Current assignee: Liaoning Zhenxing Bank Co ltd
Priority date: 2021-08-20
Filing date: 2021-08-20
Publication date: 2021-11-26

Abstract

The invention relates to the technical field of data processing and discloses an implementation method for data synchronization based on sqlldr2 and ogg. An sqlldr2 client exports Oracle database data as txt files and transmits them to an HDFS distributed file system. The source end of the ogg synchronization tool reads the redo log in real time, parses it into SQL statements, and transmits them to the target end as binary files; an ogg process at the target end reads these files and imports them into the HDFS distributed file system. The sqlldr2 tool formats the required data and outputs it to txt files according to different requirements, and the ogg tool collects data from specified tables and specified fields, so that Oracle data can be synchronously backed up and stored in the HDFS distributed file system. This addresses the problems of frequent data import and export, difficult cross-platform import, and the inability to add or remove synchronized tables on demand; incremental data is synchronized automatically, and the workload of manual synchronization is reduced.

Description

Sqlldr2 and ogg data synchronization-based implementation method

Technical Field

The invention relates to the technical field of data processing, in particular to a method for realizing data synchronization based on sqlldr2 and ogg.

Background

sqlldr2: a client tool that exports data from an Oracle database into txt files in a specified format.
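As a rough illustration of what exporting rows to a txt file in a specified format can look like, the sketch below renders rows as delimited lines. It does not call sqlldr2 itself; the function name `format_rows`, the `|` delimiter, and the sample rows are assumptions chosen for illustration only.

```python
from typing import Iterable, Sequence


def format_rows(rows: Iterable[Sequence[object]], delimiter: str = "|") -> str:
    """Render rows as delimited text, one record per line.

    None is written as an empty field; the delimiter is an assumed choice.
    """
    lines = []
    for row in rows:
        fields = ["" if value is None else str(value) for value in row]
        lines.append(delimiter.join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical sample rows standing in for an Oracle query result.
sample_rows = [(1, "alice", 100.5), (2, None, 20)]
exported_text = format_rows(sample_rows)
```

A fixed delimiter and an explicit rule for NULL values keep the exported file unambiguous for whatever loader reads it on the Hadoop side.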

ogg: a synchronization tool whose source end reads the redo log in real time, parses it into SQL statements, and transmits them to the target end as binary files.

The two tools serve the following functions: sqlldr2 performs the initial load of data from Oracle to Hadoop, and ogg synchronizes incremental data in real time, so that both the integrity of the initial data and the timeliness of the incremental data are guaranteed.

However, existing methods for synchronizing data with sqlldr2 and ogg suffer from frequent data import and export, difficult cross-platform data import, and the inability to add or remove tables to be synchronized on demand. An implementation method based on sqlldr2 and ogg data synchronization is therefore provided to solve these problems.

Disclosure of Invention

The invention aims to provide an implementation method based on sqlldr2 and ogg data synchronization, so as to solve the problems described in the background above.

In order to achieve the purpose, the invention provides the following technical scheme: an implementation method based on sqlldr2 and ogg data synchronization is characterized by comprising an Oracle database, an sqlldr2 client, an ogg synchronization tool and an HDFS distributed file system;

the sqlldr2 client is used for exporting Oracle database data in txt form and transmitting the data to the HDFS distributed file system;

the ogg synchronization tool is used for synchronously exporting the logged changes of the Oracle database. When a data record is modified, the Oracle database loads the record from disk into memory if necessary and records the modification in the redo log, and the archive log mode archives and backs up the redo logs. When data is exported, the source end of the ogg synchronization tool reads the redo log in real time, parses it into SQL statements and transmits them to the target end as binary files, and a process at the target end of the ogg synchronization tool reads the files and imports them into the HDFS distributed file system.
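The capture and delivery path described for the ogg synchronization tool can be sketched roughly as follows: change records are turned into SQL statements, written to a binary file at the source end, and read back at the target end. This is a toy model only; the real ogg trail file format is different, and the record layout, the length-prefixed encoding, and all function names here are assumptions made for illustration.

```python
import io
import struct
from typing import Any, BinaryIO, Dict, List


def change_to_sql(change: Dict[str, Any]) -> str:
    """Turn a simplified change record into an SQL statement (illustrative only)."""
    table = change["table"]
    if change["op"] == "INSERT":
        cols = ", ".join(change["values"].keys())
        vals = ", ".join(repr(v) for v in change["values"].values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    if change["op"] == "DELETE":
        return f"DELETE FROM {table} WHERE id = {change['id']!r}"
    raise ValueError(f"unsupported operation: {change['op']}")


def write_trail(statements: List[str], stream: BinaryIO) -> None:
    """Source side: write each statement as a length-prefixed binary record."""
    for statement in statements:
        payload = statement.encode("utf-8")
        stream.write(struct.pack(">I", len(payload)))
        stream.write(payload)


def read_trail(stream: BinaryIO) -> List[str]:
    """Target side: read the length-prefixed records back in order."""
    statements = []
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break
        (length,) = struct.unpack(">I", header)
        statements.append(stream.read(length).decode("utf-8"))
    return statements


# Hypothetical change records standing in for parsed redo log entries.
changes = [
    {"op": "INSERT", "table": "accounts", "values": {"id": 1, "name": "alice"}},
    {"op": "DELETE", "table": "accounts", "id": 1},
]
trail = io.BytesIO()  # in-memory stand-in for the binary file
write_trail([change_to_sql(c) for c in changes], trail)
trail.seek(0)
replayed = read_trail(trail)
```

Writing records in commit order and reading them back in the same order is what lets the target end reproduce the source changes faithfully.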

As a still further scheme of the invention: the redo log consists of two parts: the redo log buffer, a log cache in memory, and the redo log file on disk. Each time a data record is modified, the modification is first written into the redo log buffer and, at a suitable time, flushed from memory to the redo log file. The whole process is as follows: if the data is already in memory, it is modified directly; otherwise, it is first loaded from disk into memory. After the modification is finished, a redo log entry recording the modified value is generated and written into the redo log buffer, and the content of the redo log buffer is flushed to the redo log file according to the selected strategy.
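The buffer-then-flush behaviour described above can be illustrated with a small in-memory model. This is a sketch, not Oracle's implementation: the class name `RedoLog`, the dictionary-based "disk" and "memory", and the count-based flush threshold are all assumptions standing in for the unspecified flush strategy.

```python
from typing import Dict, List, Tuple


class RedoLog:
    """Toy model of a redo log buffer (memory) and a redo log file (disk)."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.disk: Dict[str, int] = {}             # data records on disk
        self.memory: Dict[str, int] = {}           # data records cached in memory
        self.buffer: List[Tuple[str, int]] = []    # redo log buffer
        self.log_file: List[Tuple[str, int]] = []  # redo log file
        self.flush_threshold = flush_threshold

    def modify(self, key: str, value: int) -> None:
        # Load the record from disk into memory if it is not cached yet.
        if key not in self.memory and key in self.disk:
            self.memory[key] = self.disk[key]
        self.memory[key] = value
        # Record the modified value in the redo log buffer first.
        self.buffer.append((key, value))
        # Assumed strategy: flush once the buffer holds enough entries.
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Move buffered redo entries to the redo log file, preserving order.
        self.log_file.extend(self.buffer)
        self.buffer.clear()


log = RedoLog(flush_threshold=2)
log.disk["a"] = 1
log.modify("a", 10)
log.modify("b", 20)  # second entry reaches the threshold and triggers a flush
log.modify("a", 30)  # stays in the buffer until the next flush
```

The direction matters: entries always travel from the buffer in memory to the file on disk, never the other way round.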

As a still further scheme of the invention: when the database runs in archive log mode, all transaction redo logs are preserved; the Oracle database suspends new operations until the copy of the redo log file is complete, and it does not overwrite an old transaction log file until that file has been copied.
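The rule that an old log file is not overwritten until it has been copied can be sketched as a small circular set of log groups. This is a simplified model, not Oracle's internals; the class name `ArchivedRedoGroups` and its methods are hypothetical.

```python
from typing import List, Optional


class ArchivedRedoGroups:
    """Toy model: a redo log group may be reused only after it has been archived."""

    def __init__(self, group_count: int) -> None:
        self.groups: List[Optional[List[str]]] = [None] * group_count
        self.archived: List[bool] = [True] * group_count  # empty groups are free
        self.archive: List[List[str]] = []
        self.current = 0  # index of the next group to write

    def switch_and_write(self, entries: List[str]) -> bool:
        """Write entries into the next group; return False if it must wait."""
        target = self.current
        if not self.archived[target]:
            return False  # old log not yet copied, so it is not overwritten
        self.groups[target] = list(entries)
        self.archived[target] = False
        self.current = (target + 1) % len(self.groups)
        return True

    def archive_oldest(self) -> bool:
        """Copy the oldest unarchived group to the archive, if there is one."""
        count = len(self.groups)
        for offset in range(count):
            index = (self.current + offset) % count
            if not self.archived[index]:
                self.archive.append(list(self.groups[index]))
                self.archived[index] = True
                return True
        return False


logs = ArchivedRedoGroups(group_count=2)
results = [logs.switch_and_write(["t1"]), logs.switch_and_write(["t2"])]
results.append(logs.switch_and_write(["t3"]))  # blocked: group 0 is unarchived
logs.archive_oldest()                          # copy group 0 to the archive
results.append(logs.switch_and_write(["t3"]))  # now group 0 can be reused
```

In this model a blocked write corresponds to the database waiting for archiving to finish before it continues.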

As a still further scheme of the invention: all processes of the ogg synchronization tool are started with start commands issued at the source end and the target end, in the following order: source mgr, target mgr, source extract, source pump, target replicat;

the configuration of the source end of the ogg synchronization tool comprises the following steps:

a. configuring the global parameters of ogg;

b. configuring the manager (mgr);

c. adding the tables to be replicated;

d. configuring the extract process;

e. configuring the pump process;

f. configuring the definitions file;

then sending the generated ogg-test file to the target end;

the configuration of the target end of the ogg synchronization tool comprises the following steps:

a. starting the HDFS distributed file system service;

b. configuring the manager (mgr);

c. configuring the checkpoint;

d. configuring the HDFS properties file (hdfs.props);

e. adding the trail file to the replicat process.
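The start order given above (source mgr, target mgr, source extract, source pump, then the replication process at the target end) can be written down as an ordered plan of GGSCI start commands. This is only a sketch: the process group names `ext1`, `pmp1` and `rep1` are hypothetical, and the plan is merely built as data here rather than executed against a GoldenGate installation.

```python
from typing import Dict, List, Tuple

# Start order taken from the text: source mgr, target mgr,
# source extract, source pump, target replication process.
START_ORDER: List[Tuple[str, str]] = [
    ("source", "mgr"),
    ("target", "mgr"),
    ("source", "extract"),
    ("source", "pump"),
    ("target", "replicat"),
]

# Hypothetical process group names; real names are chosen by the operator.
PROCESS_NAMES: Dict[str, str] = {
    "extract": "ext1",
    "pump": "pmp1",
    "replicat": "rep1",
}


def build_start_plan() -> List[Tuple[str, str]]:
    """Return (end, GGSCI command) pairs in the required start order."""
    plan = []
    for end, role in START_ORDER:
        if role == "mgr":
            command = "START MGR"
        elif role == "replicat":
            command = f"START REPLICAT {PROCESS_NAMES[role]}"
        else:
            # A data pump is itself an Extract group in GoldenGate.
            command = f"START EXTRACT {PROCESS_NAMES[role]}"
        plan.append((end, command))
    return plan


start_plan = build_start_plan()
```

Starting both managers first means each end is ready to supervise its processes before any capture, transfer or delivery process is launched.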

As a still further scheme of the invention: the sqlldr2 client transmits data from the Oracle database to the HDFS distributed file system by way of a Hadoop distributed computing platform. The specific flow is as follows: the sqlldr2 client exports the data in the Oracle database and converts it into txt format, then submits the txt data as a task to the Hadoop distributed computing platform; the MapReduce function of the platform splits the single task into fragments, sends the fragment tasks to a plurality of nodes, and then loads (Reduce) the results into the HDFS distributed file system as a single data set.
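The split, distribute and merge flow described above can be imitated locally to show the shape of the computation. The sketch below does not use Hadoop: worker threads stand in for nodes, and the function names and the per-fragment cleaning step are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_lines(lines: List[str], parts: int) -> List[List[str]]:
    """Break the lines into up to `parts` contiguous fragments."""
    size = max(1, -(-len(lines) // parts))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_fragment(fragment: List[str]) -> List[str]:
    """Per-node work: here, strip whitespace and drop empty lines."""
    return [line.strip() for line in fragment if line.strip()]


def load_as_single_dataset(lines: List[str], nodes: int = 3) -> List[str]:
    """Split the task, process fragments in parallel, then merge the results."""
    fragments = split_lines(lines, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        # pool.map returns results in the same order as the fragments.
        mapped = list(pool.map(map_fragment, fragments))
    # Reduce step: concatenate the fragment results into one data set.
    return [line for part in mapped for line in part]


# Hypothetical exported txt lines.
exported = ["1|alice ", "", "2|bob", " 3|carol", "4|dave"]
fragments = split_lines(exported, 2)
dataset = load_as_single_dataset(exported, nodes=2)
```

Because the fragments are merged in their original order, the final data set is the same as if the file had been processed by a single worker.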

Compared with the prior art, the invention has the beneficial effects that:

According to the invention, the sqlldr2 tool formats the required data and outputs it to txt files according to different requirements, and the ogg tool collects data from specified tables and specified fields. Oracle data can thus be converted and transmitted synchronously, and backed up and stored in the HDFS distributed file system. This effectively solves the problems of frequent data import and export, difficult cross-platform import, and the inability to add or remove synchronized tables on demand; incremental data is synchronized automatically, and the workload of manual synchronization is reduced.

Drawings

Fig. 1 is a schematic structural diagram of an implementation method based on sqlldr2 and ogg data synchronization.

Detailed Description

Referring to fig. 1, in the embodiment of the present invention, an implementation method based on sqlldr2 and ogg data synchronization includes an Oracle database, an sqlldr2 client, an ogg synchronization tool, and an HDFS distributed file system.

The redo log consists of two parts: the redo log buffer, a log cache in memory, and the redo log file on disk. Each time a data record is modified, the modification is first written into the redo log buffer and, at a suitable time, flushed from memory to the redo log file. The whole process is as follows: if the data is already in memory, it is modified directly; otherwise, it is first loaded from disk into memory. After the modification is finished, a redo log entry recording the modified value is generated and written into the redo log buffer, and the content of the redo log buffer is flushed to the redo log file according to the selected strategy.

When the database runs in archive log mode, all transaction redo logs are preserved; the Oracle database suspends new operations until the copy of the redo log file is complete, and it does not overwrite an old transaction log file until that file has been copied.

All processes of the ogg synchronization tool are started with start commands issued at the source end and the target end, in the following order: source mgr, target mgr, source extract, source pump, target replicat.

The configuration of the source end of the ogg synchronization tool comprises the following steps:

a. configuring the global parameters of ogg;

b. configuring the manager (mgr);

c. adding the tables to be replicated;

d. configuring the extract process;

e. configuring the pump process;

f. configuring the definitions file;

then sending the generated ogg-test file to the target end.

The configuration of the target end of the ogg synchronization tool comprises the following steps:

a. starting the HDFS distributed file system service;

b. configuring the manager (mgr);

c. configuring the checkpoint;

d. configuring the HDFS properties file (hdfs.props);

e. adding the trail file to the replicat process.

The sqlldr2 client transmits data from the Oracle database to the HDFS distributed file system by way of a Hadoop distributed computing platform. The specific flow is as follows: the sqlldr2 client exports the data in the Oracle database and converts it into txt format, then submits the txt data as a task to the Hadoop distributed computing platform; the MapReduce function of the platform splits the single task into fragments, sends the fragment tasks to a plurality of nodes, and then loads (Reduce) the results into the HDFS distributed file system as a single data set.

The work flow of the implementation method based on sqlldr2 and ogg data synchronization is as follows:

S1, the sqlldr2 client exports and converts the data in the Oracle database and writes it into a txt file, which is imported for the first data initialization; the sqlldr2 client then submits the txt data as a task to the Hadoop distributed computing platform, where the MapReduce function splits the single task into fragments, sends the fragment tasks to multiple nodes, and loads the results into the HDFS distributed file system as a single data set;

S2, synchronization: each time a data record in the Oracle database is modified, the data is modified directly if it is already in memory, and otherwise is first loaded from disk into memory; after the modification is completed, a redo log entry recording the modified value is generated and written into the redo log buffer, and the content of the redo log buffer is flushed to the redo log file according to the selected strategy; the database runs in archive log mode, so all transaction redo logs are preserved;

S3, the source process of the ogg synchronization tool reads the modified log file and generates the corresponding messages, and the target process of the ogg synchronization tool reads the files and imports them into the HDFS distributed file system.
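The division of labour between the initial load (S1) and the incremental synchronization (S2 and S3) can be sketched with an in-memory target. This is a toy model under stated assumptions: the two-column `id|name` layout, the tuple-based change records, and the function names are hypothetical, and a dictionary stands in for the data stored in HDFS.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Row]  # (operation, primary key, row values)


def initial_load(txt: str, delimiter: str = "|") -> Dict[int, Row]:
    """S1: build the target data set from an exported txt snapshot."""
    target: Dict[int, Row] = {}
    for line in txt.splitlines():
        if not line:
            continue
        key, name = line.split(delimiter)
        target[int(key)] = {"name": name}
    return target


def apply_changes(target: Dict[int, Row], changes: List[Change]) -> None:
    """S2 and S3: apply captured changes to the target in commit order."""
    for op, key, values in changes:
        if op == "INSERT":
            target[key] = dict(values)
        elif op == "UPDATE":
            target[key].update(values)
        elif op == "DELETE":
            target.pop(key, None)
        else:
            raise ValueError(f"unsupported operation: {op}")


# Hypothetical snapshot and incremental changes.
target = initial_load("1|alice\n2|bob\n")
apply_changes(
    target,
    [
        ("UPDATE", 1, {"name": "alicia"}),
        ("DELETE", 2, {}),
        ("INSERT", 3, {"name": "carol"}),
    ],
)
```

The snapshot is loaded once, and every later change arrives as a small increment, which is why no repeated full export is needed.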

The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims

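1. An implementation method based on sqlldr2 and ogg data synchronization, characterized by comprising an Oracle database, an sqlldr2 client, an ogg synchronization tool and an HDFS distributed file system; the sqlldr2 client is used for exporting Oracle database data in txt form and transmitting the data to the HDFS distributed file system; the ogg synchronization tool is used for synchronously exporting the logged changes of the Oracle database, wherein the source end of the ogg synchronization tool reads the redo log in real time, parses it into SQL statements and transmits them to the target end as binary files, and a process at the target end of the ogg synchronization tool reads the files and imports them into the HDFS distributed file system.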

2. The method for realizing data synchronization based on sqlldr2 and ogg according to claim 1, wherein the redo log consists of two parts: the redo log buffer, a log cache in memory, and the redo log file on disk; each time a data record is modified, the modification is first written into the redo log buffer and, at a suitable time, flushed from memory to the redo log file; the whole process is as follows: if the data is already in memory, it is modified directly, and otherwise it is first loaded from disk into memory; after the modification is completed, a redo log entry recording the modified value is generated and written into the redo log buffer, and the content of the redo log buffer is flushed to the redo log file according to the selected strategy.

3. The method for realizing data synchronization based on sqlldr2 and ogg according to claim 1, wherein when the database runs in archive log mode, all transaction redo logs are preserved; the Oracle database suspends new operations until the copy of the redo log file is complete, and does not overwrite old transaction records until they have been copied.

4. The method for realizing data synchronization based on sqlldr2 and ogg according to claim 1, wherein all processes of the ogg synchronization tool are started with start commands issued at the source end and the target end, in the following order: source mgr, target mgr, source extract, source pump, target replicat;

the configuration of the source end of the ogg synchronization tool comprises the following steps:

a. configuring the global parameters of ogg;

b. configuring the manager (mgr);

c. adding the tables to be replicated;

d. configuring the extract process;

e. configuring the pump process;

f. configuring the definitions file;

then sending the generated ogg-test file to the target end;

the configuration of the target end of the ogg synchronization tool comprises the following steps:

a. starting the HDFS distributed file system service;

b. configuring the manager (mgr);

c. configuring the checkpoint;

d. configuring the HDFS properties file (hdfs.props);

e. adding the trail file to the replicat process.

5. The method for realizing data synchronization based on sqlldr2 and ogg according to claim 1, wherein the sqlldr2 client transmits data from the Oracle database to the HDFS distributed file system by way of a Hadoop distributed computing platform, and the specific process is as follows: the sqlldr2 client exports the data in the Oracle database and converts it into txt format, then submits the txt data as a task to the Hadoop distributed computing platform; the MapReduce function of the platform splits the single task into fragments, sends the fragment tasks to a plurality of nodes, and then loads the results into the HDFS distributed file system as a single data set.