CN113704213A - Sqlldr2 and ogg data synchronization-based implementation method - Google Patents

Sqlldr2 and ogg data synchronization-based implementation method Download PDF

Info

Publication number
CN113704213A
CN113704213A CN202110965086.1A CN202110965086A CN113704213A CN 113704213 A CN113704213 A CN 113704213A CN 202110965086 A CN202110965086 A CN 202110965086A CN 113704213 A CN113704213 A CN 113704213A
Authority
CN
China
Prior art keywords
data
ogg
sqlldr2
redo log
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110965086.1A
Other languages
Chinese (zh)
Inventor
陈典银
张德权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Zhenxing Bank Co ltd
Original Assignee
Liaoning Zhenxing Bank Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Zhenxing Bank Co ltd filed Critical Liaoning Zhenxing Bank Co ltd
Priority to CN202110965086.1A priority Critical patent/CN113704213A/en
Publication of CN113704213A publication Critical patent/CN113704213A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/178Techniques for file synchronisation in file systems
    • G06F16/1794Details of file format conversion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data processing, and discloses a realization method based on sqlldr2 and ogg data synchronization, wherein an sqlldr2 client is used for exporting Oracle database data, exporting the data in a txt form and transmitting the data to an HDFS distributed file system, an oggogg synchronization tool source end reads a redo log in real time and analyzes the redo log into an sql statement and transmits the sql statement to a target end in a binary file manner, and an ogg synchronization tool process of the target end reads files and imports the files into the HDFS distributed file system. According to the invention, the required data is formatted and output to the txt file by using the sqlldr2 tool according to different requirements, and the data of the specific table and the specific field are appointed to be collected by adopting the ogg tool, so that the Oracle data can be synchronously backed up and stored in the HDFS distributed file system, the problems that the data is frequently imported and exported, the data is difficult to import across platforms, and the table required to be synchronized cannot be newly added or deleted according to the requirements are solved, the automatic synchronization of the incremental data is realized, and the workload of each manual synchronization is reduced.

Description

Sqlldr2 and ogg data synchronization-based implementation method
Technical Field
The invention relates to the technical field of data processing, in particular to a method for realizing data synchronization based on sqlldr2 and ogg.
Background
Sqlldr 2: sqlldr2 is a client tool that can export data in the oracle database into txt files according to a specific format.
Ogg: the source end reads redolog in real time, analyzes the redolog into sql statements and transmits the sql statements to the target end in a binary file mode.
The functions realized by the two are as follows: the sqlldr2 is used for initializing oracle-hadoop data, and the ogg is used for real-time synchronization of incremental data, so that the integrity of initial data and the timeliness of the incremental data are guaranteed.
However, the existing implementation method for synchronizing sqlldr2 and ogg data has the problems of frequent data import and export, difficult data cross-platform import and incapability of adding or deleting a table to be synchronized according to requirements. Therefore, those skilled in the art provide an implementation method based on sqlldr2 and ogg data synchronization to solve the problems in the background art.
Disclosure of Invention
The invention aims to provide an implementation method based on sqlldr2 and ogg data synchronization, so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme: an implementation method based on sqlldr2 and ogg data synchronization is characterized by comprising an Oracle database, an sqlldr2 client, an ogg synchronization tool and an HDFS distributed file system;
the sqlldr2 client is used for exporting Oracle database data, exporting the data in a txt form and transmitting the data to an HDFS distributed file system;
the ogg synchronization tool is used for synchronously exporting the log documents of the Oracle database, a redo log in the Oracle database loads the modified log documents into a memory from a disk, an archive log mode archives and backups the modified log documents, when data are exported, a source end of the ogg synchronization tool reads the redo log in real time, analyzes the redo log into sql statements and transmits the sql statements to a target end in a binary file mode, and a process of the target end of the ogg synchronization tool reads the files and imports the files into an HDFS distributed file system.
As a still further scheme of the invention: the redo log consists of two parts, one is a redo log buffer of a log cache in the memory, the other is a redo log file in the disk, when the log document data record is modified each time, the modified content is firstly written into the redo log buffer, and then the modification in the memory is refreshed back to the redo log file after waiting for a proper time, and the whole process is as follows: and if the data is in the memory, directly modifying, otherwise, loading the data into the memory from the disk, generating a redo log after the modification is finished, writing the redo log into a redo log buffer, recording the modified value, and flushing the content in the redo log file back to the redo log buffer according to the selected strategy.
As a still further scheme of the invention: when the archive log mode is operated, all transaction redo logs are stored, the Oracle database stops all new operations before the redo log file is copied, and the Oracle database does not cover the old transaction log file before the old transaction log file is copied.
As a still further scheme of the invention: the ogg synchronization tool starts all processes in the form of start at the commands of the source end and the target end, and the starting sequence is completed according to the source mgr, the target mgr, the source extract, the source pump and the target replay;
the configuration of the source end of the ogg synchronization tool comprises the following steps:
a. configuring global variables of the ogg;
b. configuration manager mgr;
c. adding a copy table;
d. configuring an extract process;
e. configuring a pump process;
f. configuring a define file;
sending the generated ogg-test file to a destination file;
the configuration of the target end of the ogg synchronization tool comprises the following steps:
a. starting HDFS distributed file system service;
b. configuration manager mgr;
c. configuring checkpoint;
d. prop, hdfs;
e. the trail file is added to the replication process.
As a still further scheme of the invention: the sqlldr2 client side transmits data in an Oracle database to an HDFS distributed file system by adopting a Hadoop distributed computing platform, and the specific flow is as follows: the sqlldr2 client exports and converts data in an Oracle database into a txt format, transmits a txt format data task to a Hadoop distributed computing platform, breaks single task data through a MapReduce function in the Hadoop distributed computing platform, sends the broken task to a plurality of nodes, and then loads (Reduce) the broken task to an HDFS distributed file system in a single data set form.
Compared with the prior art, the invention has the beneficial effects that:
according to the invention, the required data is formatted and output to the txt file by using the sqlldr2 tool according to different requirements, and the data of the specific table and the specific field is appointed to be collected by adopting the ogg tool, so that synchronous conversion and transmission can be carried out on Oracle data, synchronous backup and storage are carried out on the Oracle data in the HDFS distributed file system, the problems that the data is frequently imported and exported, the data is difficult to import across platforms, and the table which needs to be synchronized cannot be newly added or deleted according to the requirements are effectively solved, automatic synchronization of incremental data is realized, and the workload of each manual synchronization is reduced.
Drawings
Fig. 1 is a schematic structural diagram of an implementation method based on sqlldr2 and ogg data synchronization.
Detailed Description
Referring to fig. 1, in the embodiment of the present invention, an implementation method based on sqlldr2 and ogg data synchronization includes an Oracle database, an sqlldr2 client, an ogg synchronization tool, and an HDFS distributed file system;
the sqlldr2 client is used for exporting Oracle database data, exporting the data in a txt form and transmitting the data to an HDFS distributed file system;
the ogg synchronization tool is used for synchronously exporting the log documents of the Oracle database, a redo log in the Oracle database loads the modified log documents into a memory from a disk, an archive log mode archives and backups the modified log documents, when data are exported, a source end of the ogg synchronization tool reads the redo log in real time, analyzes the redo log into sql statements and transmits the sql statements to a target end in a binary file mode, and a process of the target end of the ogg synchronization tool reads the files and imports the files into an HDFS distributed file system.
The redo log consists of two parts, one is a redo log buffer of a log cache in the memory, the other is a redo log file in the disk, when the log document data record is modified each time, the modified content is firstly written into the redo log buffer, and then the modification in the memory is refreshed back to the redo log file after waiting for a proper time, and the whole process is as follows: and if the data is in the memory, directly modifying, otherwise, loading the data into the memory from the disk, generating a redo log after the modification is finished, writing the redo log into a redo log buffer, recording the modified value, and flushing the content in the redo log file back to the redo log buffer according to the selected strategy.
When the archive log mode is operated, all transaction redo logs are stored, the Oracle database stops all new operations before the redo log file is copied, and the Oracle database does not cover the old transaction log file before the old transaction log file is copied.
The ogg synchronization tool starts all processes in the form of start at the commands of the source end and the target end, and the starting sequence is completed according to the source mgr, the target mgr, the source extract, the source pump and the target replay;
the configuration of the source end of the ogg synchronization tool comprises the following steps:
a. configuring global variables of the ogg;
b. configuration manager mgr;
c. adding a copy table;
d. configuring an extract process;
e. configuring a pump process;
f. configuring a define file;
sending the generated ogg-test file to a destination file;
the configuration of the target end of the ogg synchronization tool comprises the following steps:
a. starting HDFS distributed file system service;
b. configuration manager mgr;
c. configuring checkpoint;
d. prop, hdfs;
e. the trail file is added to the replication process.
The sqlldr2 client side transmits data in an Oracle database to an HDFS distributed file system by adopting a Hadoop distributed computing platform, and the specific flow is as follows: the sqlldr2 client exports and converts data in an Oracle database into a txt format, transmits a txt format data task to a Hadoop distributed computing platform, breaks single task data through a MapReduce function in the Hadoop distributed computing platform, sends the broken task to a plurality of nodes, and then loads (Reduce) the broken task to an HDFS distributed file system in a single data set form.
An implementation method based on sqlldr2 and ogg data synchronization comprises the following work flows:
the method comprises the following steps that S1, an sqlldr2 client side exports and converts data in an Oracle database and writes the data into a txt file, the txt file is imported for data initialization for the first time, then the sqlldr2 client side transmits a txt format data task to a Hadoop distributed computing platform, a MapReduce function in the Hadoop distributed computing platform breaks single task data, fragment tasks are sent to multiple nodes, and then the single task data are loaded into an HDFS distributed file system in a single data set mode;
s2, synchronizing, when modifying the log document data record in the Oracle database each time, if the data is in the memory, directly modifying, otherwise, loading the data from the disk into the memory, after the modification is completed, generating a redo log, writing the redo log into a redo log buffer, recording the modified value, flushing the content in the redo log file back into the redo log buffer according to the selected strategy, after the modification, operating in an archive log mode, and storing all the transaction redo logs;
and S3, reading the modified log file and generating a message by the source process of the ogg synchronization tool, and reading the file and importing the file into the HDFS distributed file system by the target process of the ogg synchronization tool.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art should be considered to be within the technical scope of the present invention, and the technical solutions and the inventive concepts thereof according to the present invention are equivalent to or changed within the technical scope of the present invention.

Claims (5)

1. An implementation method based on sqlldr2 and ogg data synchronization is characterized by comprising an Oracle database, an sqlldr2 client, an ogg synchronization tool and an HDFS distributed file system;
the sqlldr2 client is used for exporting Oracle database data, exporting the data in a txt form and transmitting the data to an HDFS distributed file system;
the ogg synchronization tool is used for synchronously exporting the log documents of the Oracle database, a redo log in the Oracle database loads the modified log documents into a memory from a disk, an archive log mode archives and backups the modified log documents, when data are exported, a source end of the ogg synchronization tool reads the redo log in real time, analyzes the redo log into sql statements and transmits the sql statements to a target end in a binary file mode, and a process of the target end of the ogg synchronization tool reads the files and imports the files into an HDFS distributed file system.
2. The method for realizing data synchronization based on sqlldr2 and ogg according to claim 1, wherein the redo log is composed of two parts, one is a redo log buffer of a log cache in a memory, and the other is a redo log file of a log file in a disk, when a log document data record is modified each time, the modified content is written into the redo log buffer first, and then the modified content in the memory is flushed back to the redo log file after waiting for a proper time, and the whole process is as follows: and if the data is in the memory, directly modifying, otherwise, loading the data into the memory from the disk, generating a redo log after the modification is completed, writing the redo log into a redo log buffer, recording the modified value, and flushing the content in the redo log file back to the redo log buffer according to the selected strategy.
3. The method for realizing data synchronization based on sqlldr2 and ogg according to claim 1, wherein when the archivelog model is running, all the transaction redo logs are saved, before the redo log file replication is completed, the Oracle database stops all new operations, and before the old transaction records are completed, the Oracle database does not overwrite the old transaction records.
4. The method for realizing data synchronization based on sqlldr2 and ogg according to claim 1, wherein the ogg synchronization tool starts all processes in the form of start at the command of source end and target end, and the starting sequence is completed according to source mgr-target mgr-source extract-source pump-target replay;
the configuration of the source end of the ogg synchronization tool comprises the following steps:
a. configuring global variables of the ogg;
b. configuration manager mgr;
c. adding a copy table;
d. configuring an extract process;
e. configuring a pump process;
f. configuring a define file;
sending the generated ogg-test file to a destination file;
the configuration of the target end of the ogg synchronization tool comprises the following steps:
a. starting HDFS distributed file system service;
b. configuration manager mgr;
c. configuring checkpoint;
d. prop, hdfs;
e. the trail file is added to the replication process.
5. The method for achieving data synchronization based on sqlldr2 and ogg according to claim 1, wherein the sqlldr2 client side transmits data in an Oracle database to an HDFS distributed file system by using a Hadoop distributed computing platform, and the specific process is as follows: the sqlldr2 client exports and converts data in an Oracle database into a txt format, transmits a txt format data task to a Hadoop distributed computing platform, breaks single task data through a MapReduce function in the Hadoop distributed computing platform, sends the broken task to a plurality of nodes, and then loads the broken task to an HDFS distributed file system in a single data set mode.
CN202110965086.1A 2021-08-20 2021-08-20 Sqlldr2 and ogg data synchronization-based implementation method Pending CN113704213A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110965086.1A CN113704213A (en) 2021-08-20 2021-08-20 Sqlldr2 and ogg data synchronization-based implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110965086.1A CN113704213A (en) 2021-08-20 2021-08-20 Sqlldr2 and ogg data synchronization-based implementation method

Publications (1)

Publication Number Publication Date
CN113704213A true CN113704213A (en) 2021-11-26

Family

ID=78653801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110965086.1A Pending CN113704213A (en) 2021-08-20 2021-08-20 Sqlldr2 and ogg data synchronization-based implementation method

Country Status (1)

Country Link
CN (1) CN113704213A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048178A (en) * 2021-11-29 2022-02-15 众和空间(北京)科技有限责任公司 Dual-mode storage and synchronization method of data

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120030172A1 (en) * 2010-07-27 2012-02-02 Oracle International Corporation Mysql database heterogeneous log based replication
CN102752372A (en) * 2012-06-18 2012-10-24 天津神舟通用数据技术有限公司 File based database synchronization method
US20150193466A1 (en) * 2014-01-06 2015-07-09 Siegfried Luft Architecture and method for cloud provider selection and projection
CN106503104A (en) * 2016-10-17 2017-03-15 山东浪潮商用系统有限公司 Oracle database data copy method under a kind of tertiary-structure network pattern
CN107943979A (en) * 2017-11-29 2018-04-20 山东鲁能软件技术有限公司 The quasi real time synchronous method and device of data between a kind of database
CN109376196A (en) * 2018-09-25 2019-02-22 浪潮软件集团有限公司 Method and device for batch synchronization of redo logs
CN110727548A (en) * 2019-09-29 2020-01-24 上海英方软件股份有限公司 Continuous data protection method and device based on database DML synchronization
CN111427898A (en) * 2020-03-16 2020-07-17 上海爱数信息技术股份有限公司 Continuous data protection system and method based on analysis of Oracle log
CN111723160A (en) * 2020-08-24 2020-09-29 国网浙江省电力有限公司 Multi-source heterogeneous incremental data synchronization method and system
CN111797337A (en) * 2020-07-14 2020-10-20 上海乾臻信息科技有限公司 Method and device for generating export page
CN112181902A (en) * 2020-11-30 2021-01-05 阿里云计算有限公司 Database storage method and device and electronic equipment
CN112527801A (en) * 2020-12-21 2021-03-19 中国人民银行清算总中心 Data synchronization method and system between relational database and big data system

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120030172A1 (en) * 2010-07-27 2012-02-02 Oracle International Corporation Mysql database heterogeneous log based replication
CN102752372A (en) * 2012-06-18 2012-10-24 天津神舟通用数据技术有限公司 File based database synchronization method
US20150193466A1 (en) * 2014-01-06 2015-07-09 Siegfried Luft Architecture and method for cloud provider selection and projection
CN106503104A (en) * 2016-10-17 2017-03-15 山东浪潮商用系统有限公司 Oracle database data copy method under a kind of tertiary-structure network pattern
CN107943979A (en) * 2017-11-29 2018-04-20 山东鲁能软件技术有限公司 The quasi real time synchronous method and device of data between a kind of database
CN109376196A (en) * 2018-09-25 2019-02-22 浪潮软件集团有限公司 Method and device for batch synchronization of redo logs
CN110727548A (en) * 2019-09-29 2020-01-24 上海英方软件股份有限公司 Continuous data protection method and device based on database DML synchronization
CN111427898A (en) * 2020-03-16 2020-07-17 上海爱数信息技术股份有限公司 Continuous data protection system and method based on analysis of Oracle log
CN111797337A (en) * 2020-07-14 2020-10-20 上海乾臻信息科技有限公司 Method and device for generating export page
CN111723160A (en) * 2020-08-24 2020-09-29 国网浙江省电力有限公司 Multi-source heterogeneous incremental data synchronization method and system
CN112181902A (en) * 2020-11-30 2021-01-05 阿里云计算有限公司 Database storage method and device and electronic equipment
CN112527801A (en) * 2020-12-21 2021-03-19 中国人民银行清算总中心 Data synchronization method and system between relational database and big data system

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048178A (en) * 2021-11-29 2022-02-15 众和空间(北京)科技有限责任公司 Dual-mode storage and synchronization method of data

Similar Documents

Publication Publication Date Title
CN104850598B (en) A kind of real-time data base back-up restoring method
CN105868343B (en) Database migration method and system
CN101266606B (en) On-line data migration method based on Oracle database
US20110106768A1 (en) Backup using metadata virtual hard drive and differential virtual hard drive
EP2194467A1 (en) Extend CRUD to support lifecycle management and business continuity
CN100498796C (en) Logic log generation method, database backup/ restoration method and system
US20050278388A1 (en) Apparatus, system, and method for reliably updating a data group in a data replication environment
CN102339321A (en) Network file system with version control and method using same
CN109299072A (en) A kind of method and device that database migrates online
CN114036119A (en) Data synchronization method based on button and database log
CN103530290A (en) Method and system for data migration among databases
US8051054B2 (en) Method and system for data processing with database update for the same
CN102541694A (en) Method and device for database backup
CN103914359A (en) Data recovery method and device
CN110597821B (en) Method and device for changing database table structure
CN108614876B (en) Redis database-based system and data processing method
CN110727548A (en) Continuous data protection method and device based on database DML synchronization
CN111930850A (en) Data verification method and device, computer equipment and storage medium
CN111367994A (en) Method and system for synchronously backing up incremental data of database
CN113704213A (en) Sqlldr2 and ogg data synchronization-based implementation method
US8255367B2 (en) File management apparatus, method, and program
CN103176868A (en) File status backup method
CN115658391A (en) Backup recovery method of WAL mechanism based on QianBase MPP database
US20230306014A1 (en) Transactionally consistent database exports
WO2017067397A1 (en) Data recovery method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Kang Zhuang

Inventor before: Chen Dianyin

Inventor before: Zhang Dequan

RJ01 Rejection of invention patent application after publication

Application publication date: 20211126