CN110555055A - data mining method for redo log file of Oracle database - Google Patents

data mining method for redo log file of Oracle database Download PDF

Info

Publication number
CN110555055A
CN110555055A CN201910654000.6A CN201910654000A CN110555055A CN 110555055 A CN110555055 A CN 110555055A CN 201910654000 A CN201910654000 A CN 201910654000A CN 110555055 A CN110555055 A CN 110555055A
Authority
CN
China
Prior art keywords
log
file
redo
redo log
log file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910654000.6A
Other languages
Chinese (zh)
Inventor
杨万清
栾敬钊
张葆刚
赵云
于颖
李若斌
娄道国
王跃东
姜学朴
牛明珠
李君秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Liaoning Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Liaoning Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Liaoning Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201910654000.6A priority Critical patent/CN110555055A/en
Publication of CN110555055A publication Critical patent/CN110555055A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1471Saving, restoring, recovering or retrying involving logging of persistent data for recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/1805Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815Journaling file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining

Abstract

the invention discloses a data mining method for redo log files of an Oracle database. And the backup terminal synthesis process synthesizes the log fragments into a log file, finally, the application process analyzes and registers the log, and writes the log file into the disaster recovery database after the log file passes the verification and registration so as to realize the data replication of the database. The invention supports heterogeneous operating system and database management system, which is convenient for the data synchronization of the client between different database management systems and operating system platforms; compared with the transmission of the whole log file, the data volume transmitted by the method is greatly reduced, and the method can control the number of the sub-processes, so that the resources occupied by the sub-processes are less.

Description

data mining method for redo log file of Oracle database
Technical Field
the invention relates to the field of data backup and recovery of a database, in particular to a data mining method for redo log files of an Oracle database.
background
The primary purpose of the redo log file is that the redo log file can be dispatched to the farm in the event of an instance or media failure, or can be used as a way to maintain a standby database (standby database) to accomplish failover. If the host on which the database is located loses power, causing the instance to fail, Oracle will use the online redo log to restore the system to the point before the power failure. If the disk drive containing the data file has a permanent failure, Oracle will use the archive redo log and the online redo log to restore the backup of the disk drive to the appropriate point in time. In addition, if you inadvertently delete a table, or delete some important information, and commit the operation, a backup can be restored and Oracle can use the online and archive redo log files to restore it to the point before the incident occurred.
Disclosure of Invention
the invention provides a data mining method for redoing log files of an Oracle database aiming at the problems of data backup of the database and data recovery after the database fails, which comprises the following specific steps:
S1: acquiring a full path name of a database redo log, and judging whether the redo log is stored in a file system or on an Automatic Storage Manager (ASM) according to the full path name;
S2: if the log file exists on the ASM, acquiring the data block size information of the redo log file from the ASM; otherwise, accessing a file system, and acquiring data block size information of the redo log file from the file system, wherein the data block size information refers to the number of bytes of one data block;
S3: calculating the number of data blocks according to the size information of the data blocks of the current redo log file, starting a plurality of sub-processes, performing data mining of an independent application process on each data block, and setting the maximum number of processes if the performance factor of a processor is considered so as to avoid the excessive number of processes;
s4: database ID information of 24 th-27 th bytes of a file head of the current redo log file and thread information of 176 th-177 th bytes are mined, whether the database ID information is consistent with the corresponding information of the database or not is checked, and whether the obtained current redo log belongs to the database or not is judged; if not, returning an error, and if the error is consistent, entering the next step;
s5: reading and recording a log sequence number log sequence of 8 th-11 th bytes of a file header of the current redo log file, wherein the log sequence number represents the sequence of the current redo log file;
s6: mining a data block of recorded content after a file header of the current redo log file, mining 1M data each time, returning error information if the acquisition fails, and terminating the current mining;
S7: comparing the log sequence number log sequence in the 'file header' of the redo log file with the log sequence number log sequence in the data block of the recorded content, if the log sequence number log sequence is not consistent with the log sequence number log sequence, indicating that the redo recorded content recorded in the current redo log file is not the redo recorded content in the database, returning error information and terminating the current log mining;
s8: if the two log serial numbers are consistent, recording the current excavation position, and writing the excavated data block content into a newly-established temporary log record file, wherein the temporary file name is named according to the service system id, the online log group number, the log serial number and the current data block number;
S9: checking whether the redo log file finishes the mining work of all the data blocks, if not, jumping to S5, and continuing to mine the next data block in the redo log file;
S10: if the redo log file finishes mining, recording the number of data blocks of the current redo log file, and storing the number of the data blocks into 24-27 bytes of a file header of the redo log file;
s11: checking the redo log file, and writing the checking result into 14-15 bytes of a redo log file header;
S12: changing the online log type in the bytes 236 and 239 of the 'file head' of the redo log file into an archive log type;
s13: and finishing the mining of the current redo log file or performing the mining of the next redo log file.
in a preferred mode, the redo log file is divided into a plurality of data blocks, the first two data blocks of the redo log are respectively a file header and a log header, the file header and the log header record attribute information of the redo log file, and each data block records redo records in the database operation process from the third data block.
the invention has the beneficial effects that:
(1) supporting heterogeneous operating systems and database management systems, facilitating data synchronization of clients between different database management systems and operating system platforms, which is the core advantage;
(2) Compared with the transmission of the whole log file, the data volume transmitted by the method is greatly reduced, and the whole log file 1/4 or less is common;
(3) the method avoids the influence of the update of the copied database log on the copying process to the maximum extent, ensures zero loss of data, can realize limited copying of partial objects, meets the requirements of data distribution and data integration, and reduces the pressure of a source database;
(4) the method can control the number of the sub-processes, and occupies less system resources.
Drawings
FIG. 1 is a flow chart of a data method;
FIG. 2 is the format and specific information of the "header" of the redo log.
Detailed Description
the mining method for the redo log of the database is specifically described with reference to fig. 1, and a specific embodiment thereof is as follows;
The log change is tracked by rapidly analyzing the log, the log is mined at the first time of change to obtain data change fragments, and the log change data is cached in a local file system and then transmitted to the standby terminal through a transmission process. And the backup terminal synthesis process synthesizes the log fragments into a log file, finally, the application process analyzes and registers the log, and writes the log file into the disaster recovery database after the log file passes the verification and registration so as to realize the data replication of the database. The redo log file is divided into a plurality of data blocks, the first two data blocks of the redo log are respectively a file header and a log header, the file header and the log header record attribute information of the redo log file, and from the third data block, each data block records a redo record of the database operation process, as shown in fig. 2, the format and specific information of the file header of the redo log are shown. The specific data mining method comprises the following steps:
s1: acquiring a full path name of a database redo log, and judging whether the redo log is stored in a file system or on an Automatic Storage Manager (ASM) according to the full path name;
s2: if the log file exists on the ASM, acquiring data block size information of the redo log file from a table v $ ASM _ disk in the ASM; otherwise, accessing a file system, and acquiring data block size information of the redo log file from v $ log and v $ logfile in the file system, wherein the data block size information refers to the number of bytes of one data block; the table v $ ASM _ disk and the table v $ log, v $ logfile are tables inherent in the file system and the ASM, are used for storing basic information of the database, and include size information of the data block;
S3: calculating the number of data blocks according to the size information of the data blocks of the current redo log file, starting a plurality of sub-processes, performing data mining of an independent application process on each data block, and setting the maximum number of processes if the performance factor of a processor is considered so as to avoid the excessive number of processes;
S4: database ID information of 24 th-27 th bytes of a file head and thread information of 176 th-177 th bytes in the current redo log file are mined, whether the database ID information is consistent with the corresponding information of the database or not is checked, and whether the obtained current redo log belongs to the database or not is judged; if not, returning an error, and if the error is consistent, entering the next step;
S5: reading and recording a log sequence number log sequence of 8 th-11 th bytes of a file header of the current redo log file, wherein the log sequence number represents the sequence of the current redo log file;
S6: mining a data block of recorded content after a file header of the current redo log file, mining 1M data each time, returning error information if the acquisition fails, and terminating the current mining; generally, each data block has 512 bytes, and mining 1M is exactly the size of two data blocks, if the byte number of the mining data is increased, the mining speed is too low, if the database is changed again at the moment, the mined data is inconsistent with the actual redo record, and if the byte number of the mining data is reduced, the waste of computer operation resources is caused, so the mining speed is set to 1M;
S7: comparing the log sequence number log sequence in the 'file header' of the redo log file with the log sequence number log sequence in the data block of the recorded content, if the log sequence number log sequence is not consistent with the log sequence number log sequence, indicating that the redo recorded content recorded in the current redo log file is not the redo recorded content in the database, returning error information and terminating the current log mining;
S8: if the two log serial numbers are consistent, recording the current excavation position, and writing the excavated data block content into a newly-established temporary log record file, wherein the temporary file name is named according to the service system id, the online log group number, the log serial number and the current data block number;
s9: checking whether the redo log file finishes the mining work of all the data blocks, if not, jumping to S5, and continuing to mine the next data block in the redo log file;
s10: if the redo log file finishes mining, recording the number of data blocks of the current redo log file, and storing the number of the data blocks into 24-27 bytes of a file header of the redo log file;
S11: calculating the checksum of the redo log file, and writing the check result into 14-15 bytes of a file header of the redo log; the algorithm for the checksum is as follows:
(1) dividing the log header into a plurality of 64-byte blocks; the definition of 64 bytes is because the following calculation is convenient, and each block is divided into 4 segments of 16 bytes, 4 × 16 ═ 64;
(2) Firstly, dividing a first block into four 16-byte segments, and carrying out 16-byte XOR on the 1 st segment and the 2 nd segment to obtain a value a; carrying out 16-byte XOR on the 3 rd section and the 4 th section to obtain a value b; carrying out XOR on a and b to obtain new c, wherein the initial value of c is 0, and carrying out XOR on c and b to obtain new c;
(3) Circularly executing the operation on the large 64-byte section, and carrying out exclusive OR on the large 64-byte section and the value c obtained in the previous round each time, wherein c is a global variable and is not an initial value taken each time;
(4) waiting for all the calculation to obtain a final c value, and dividing 16 bytes of c into 4 segments of 4 bytes, r1, r2, r3 and r 4; carrying out bitwise XOR on the four values to obtain a value r 0;
(5) shifting r0 to the right by 16 bits to obtain g, and performing XOR on r0 and g to obtain m;
(6) performing bitwise AND on m and 0xFFFF to obtain a final checksum;
s12: changing the online log type in the bytes 236 and 239 of the 'file head' of the redo log file into an archive log type;
S13: and finishing the mining of the current redo log file or performing the mining of the next redo log file.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art should be able to cover the technical solutions and the inventive concepts of the present invention within the technical scope of the present invention.

Claims (2)

1. A data mining method for redo log files of an Oracle database is characterized by comprising the following steps:
s1: acquiring a full path name of a database redo log, and judging whether the redo log is stored in a file system or on an Automatic Storage Manager (ASM) according to the full path name;
s2: if the log file exists on the ASM, acquiring the data block size information of the redo log file from the ASM; otherwise, accessing a file system, and acquiring data block size information of the redo log file from the file system, wherein the data block size information refers to the number of bytes of one data block;
s3: calculating the number of data blocks according to the size information of the data blocks of the current redo log file, starting a plurality of sub-processes, performing data mining of an independent application process on each data block, and setting the maximum number of processes if the performance factor of a processor is considered so as to avoid the excessive number of processes;
s4: database ID information of 24 th-27 th bytes of a file head of the current redo log file and thread information of 176 th-177 th bytes are mined, whether the database ID information is consistent with the corresponding information of the database or not is checked, and whether the obtained current redo log belongs to the database or not is judged; if not, returning an error, and if the error is consistent, entering the next step;
s5: reading and recording a log sequence number log sequence of 8 th-11 th bytes of a file header of the current redo log file, wherein the log sequence number represents the sequence of the current redo log file;
S6: mining a data block of recorded content after a file header of the current redo log file, mining 1M data each time, returning error information if the acquisition fails, and terminating the current mining;
s7: comparing the log sequence number log sequence in the 'file header' of the redo log file with the log sequence number log sequence in the data block of the recorded content, if the log sequence number log sequence is not consistent with the log sequence number log sequence, indicating that the redo recorded content recorded in the current redo log file is not the redo recorded content in the database, returning error information and terminating the current log mining;
s8: if the two log serial numbers are consistent, recording the current excavation position, and writing the excavated data block content into a newly-established temporary log record file, wherein the temporary file name is named according to the service system id, the online log group number, the log serial number and the current data block number;
s9: checking whether the redo log file finishes the mining work of all the data blocks, if not, jumping to S5, and continuing to mine the next data block in the redo log file;
S10: if the redo log file finishes mining, recording the number of data blocks of the current redo log file, and storing the number of the data blocks into 24-27 bytes of a file header of the redo log file;
S11: checking the redo log file, and writing the checking result into 14-15 bytes of a redo log file header;
S12: changing the online log type in the bytes 236 and 239 of the 'file head' of the redo log file into an archive log type;
s13: and finishing the mining of the current redo log file or performing the mining of the next redo log file.
2. the data mining method for the Oracle database redo log file according to claim 1, wherein the redo log file is divided into a plurality of data blocks, the first two data blocks of the redo log are respectively a "file header" and a "log header", the "file header" and the "log header" record the attribute information of the redo log file, and each data block records a redo record of the database operation process from the third data block.
CN201910654000.6A 2019-07-19 2019-07-19 data mining method for redo log file of Oracle database Pending CN110555055A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910654000.6A CN110555055A (en) 2019-07-19 2019-07-19 data mining method for redo log file of Oracle database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910654000.6A CN110555055A (en) 2019-07-19 2019-07-19 data mining method for redo log file of Oracle database

Publications (1)

Publication Number Publication Date
CN110555055A true CN110555055A (en) 2019-12-10

Family

ID=68736477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910654000.6A Pending CN110555055A (en) 2019-07-19 2019-07-19 data mining method for redo log file of Oracle database

Country Status (1)

Country Link
CN (1) CN110555055A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309128A (en) * 2019-07-05 2019-10-08 广东铭太信息科技有限公司 Oracle backup file automatic leading-in device and its implementation, the method that the importing of backup file is carried out using the device
CN111190771A (en) * 2019-12-27 2020-05-22 柏科数据技术(深圳)股份有限公司 Block tracking technology for disaster recovery
CN111611107A (en) * 2020-05-21 2020-09-01 云和恩墨(北京)信息技术有限公司 Method and device for acquiring database logs
CN112965949A (en) * 2021-03-03 2021-06-15 上海英方软件股份有限公司 Method for remotely reading ORACLE log file
CN113515573A (en) * 2021-05-08 2021-10-19 上海英方软件股份有限公司 Method and system for one-to-many negotiation replication breakpoint of ORACLE database

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060117074A1 (en) * 2004-11-30 2006-06-01 Ezzat Ahmed K Method and apparatus for database cluster recovery
CN102629250A (en) * 2012-02-28 2012-08-08 杭州丰城信息技术有限公司 Recovery method of redo log files for main memory database
CN105022676A (en) * 2014-04-22 2015-11-04 大唐软件技术股份有限公司 Recovery method and device of main memory database redo log files
CN105677876A (en) * 2016-01-12 2016-06-15 国家电网公司 Method for log mining based on physical level database

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060117074A1 (en) * 2004-11-30 2006-06-01 Ezzat Ahmed K Method and apparatus for database cluster recovery
CN102629250A (en) * 2012-02-28 2012-08-08 杭州丰城信息技术有限公司 Recovery method of redo log files for main memory database
CN105022676A (en) * 2014-04-22 2015-11-04 大唐软件技术股份有限公司 Recovery method and device of main memory database redo log files
CN105677876A (en) * 2016-01-12 2016-06-15 国家电网公司 Method for log mining based on physical level database

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309128A (en) * 2019-07-05 2019-10-08 广东铭太信息科技有限公司 Oracle backup file automatic leading-in device and its implementation, the method that the importing of backup file is carried out using the device
CN110309128B (en) * 2019-07-05 2020-07-17 广东铭太信息科技有限公司 Oracle backup file automatic importing device, implementation method thereof and method for importing backup file by using device
CN111190771A (en) * 2019-12-27 2020-05-22 柏科数据技术(深圳)股份有限公司 Block tracking technology for disaster recovery
CN111611107A (en) * 2020-05-21 2020-09-01 云和恩墨(北京)信息技术有限公司 Method and device for acquiring database logs
CN112965949A (en) * 2021-03-03 2021-06-15 上海英方软件股份有限公司 Method for remotely reading ORACLE log file
CN113515573A (en) * 2021-05-08 2021-10-19 上海英方软件股份有限公司 Method and system for one-to-many negotiation replication breakpoint of ORACLE database

Similar Documents

Publication Publication Date Title
CN110555055A (en) data mining method for redo log file of Oracle database
US20200409791A1 (en) Serial Storage Node Processing of Data Functions
KR101863406B1 (en) Nonvolatile media journaling of verified data sets
US7421617B2 (en) Systems and methods for optimizing restoration of stored data
US7360113B2 (en) Protocol for communicating data block copies in an error recovery environment
CN106776130B (en) Log recovery method, storage device and storage node
US20160085630A1 (en) Hash collision recovery in a deduplication vault
US20100070478A1 (en) Retrieval and recovery of data chunks from alternate data stores in a deduplicating system
JP2005301497A (en) Storage management system, restoration method and its program
US20080162599A1 (en) Optimizing backup and recovery utilizing change tracking
US7487385B2 (en) Apparatus and method for recovering destroyed data volumes
JPH04230512A (en) Method and apparatus for updating record for dasd array
US7818524B2 (en) Data migration systems and methods for independent storage device expansion and adaptation
US7020805B2 (en) Efficient mechanisms for detecting phantom write errors
US10503620B1 (en) Parity log with delta bitmap
US11429498B2 (en) System and methods of efficiently resyncing failed components without bitmap in an erasure-coded distributed object with log-structured disk layout
CN105302665A (en) Improved copy-on-write snapshot method and system
EP2669804A1 (en) Storage system
US10990324B2 (en) Storage node processing of predefined data functions
CN113885809B (en) Data management system and method
CN113868028A (en) Method for replaying log on data node, data node and system
US8151069B1 (en) Multiprotection for snapsnots
CN113377569A (en) Method, apparatus and computer program product for recovering data
WO2022105442A1 (en) Erasure code-based data reconstruction method and appratus, device, and storage medium
US20120158652A1 (en) System and method for ensuring consistency in raid storage array metadata

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191210

RJ01 Rejection of invention patent application after publication