CN114385260A

CN114385260A - ROWID interval-based initialization loading method and equipment

Info

Publication number: CN114385260A
Application number: CN202111534715.1A
Authority: CN
Inventors: 孙峰; 余院兰; 彭青松; 刘启春
Original assignee: Wuhan Dream Database Co ltd
Current assignee: Wuhan Dream Database Co ltd
Priority date: 2021-12-15
Filing date: 2021-12-15
Publication date: 2022-04-22

Abstract

The invention relates to an initialization loading method and device based on ROWID intervals. The method mainly comprises the following steps: dividing a table data set into a plurality of small data sets in ROWID order and obtaining the corresponding ROWID intervals; taking one small data set, obtaining its loading LSN, and completing its loading; repeating this step until all the small data sets are loaded; after data synchronization is started, the destination end locates the ROWID interval to which each logged operation belongs according to the ROWID in the operation log, and filters the synchronized data by comparing the operation LSN with the corresponding loading LSN. The result set to be loaded is thus divided: the whole data set is split into N small data sets according to a specified value, and the N small data sets are loaded separately. Because fetching one small data set takes far less time than fetching the whole data set, the "snapshot too old" problem is avoided.

Description

ROWID interval-based initialization loading method and equipment

Technical Field

The invention relates to the technical field of database data processing, in particular to an initial loading method and equipment based on a ROWID interval.

Background

At present, heterogeneous database replication based on database log analysis is widely used. This technology captures incremental data from the database at the source end, sends it to the destination end, and applies it to the target database there through a general database access interface, thereby replicating the data. Because it uses a general database interface, it supports replication between heterogeneous database systems and heterogeneous operating system environments; the standby database system at the destination end remains readable and writable, so the two ends form an "active-active" system.

When database data is synchronized in real time, a data initialization operation must first be performed on the target database to establish a base point for synchronization. Once initialization is complete, real-time incremental synchronization can proceed on top of that base data. In a production environment, the source database may serve many applications that modify it every minute or every second. If the tables being modified hold a large volume of data, extracting their result sets during initialization from the source database to the destination database takes a long time, and in a database with multi-version support the extraction frequently fails with a "snapshot too old" error, causing initialization to fail. "Snapshot too old" is a common error in multi-version databases. Frequent operations on the database quickly consume rollback segment space, and the database replenishes it by releasing the rollback segment space of other committed transactions. When the current result set is extracted, the modification history of a record can then no longer be traced through its rollback segment, because that segment has already been released, and an error is reported.

Therefore, a technical problem to be solved in the art is to find a method that eliminates the impact of the "snapshot too old" error on initialization loading for data synchronization, and that ensures the source-end and destination-end databases are consistent once loading completes and synchronization starts.

Disclosure of Invention

In view of the above drawbacks or improvement requirements of the prior art, the present invention provides an initialization loading method and device based on ROWID intervals, which divides the result set to be loaded: the entire data set is divided into N small data sets according to a specified value, and the N small data sets are then loaded separately, so that fetching each small data set takes far less time than fetching the whole data set. In addition, the ROWID is the unique identifier of each row in a database table; using the ROWID as the data filter condition gives high query efficiency and has little impact on the database.

The embodiment of the invention adopts the following technical scheme:

In a first aspect, the present invention provides an initialization loading method based on ROWID intervals, including:

dividing the table data set into a plurality of small data sets according to the ROWID size sequence, and obtaining a plurality of corresponding ROWID intervals;

taking a small data set, obtaining loading LSN of the small data set, and completing the loading of the small data set; repeating the steps until all the small data sets are loaded;

after the data synchronization is started, the destination end positions the corresponding ROWID interval according to the ROWID of the operation log, and the data synchronization filtering is realized by comparing the operation LSN with the corresponding loading LSN.

Further, dividing the table data set into a plurality of small data sets in ROWID order and obtaining the corresponding ROWID intervals specifically includes:

acquiring all EXTENT information of a table data set, and sorting the EXTENT according to the file number and the page number of a first page of the EXTENT;

dividing the sorted EXTENT into n small data sets according to the designated grouping number n;

obtaining the starting ROWID value ROWIDi of each small data set, and dividing the ROWID interval of each small data set according to the starting value ROWIDi; wherein ROWIDi is the ROWID generated from the first row of the first page of the first EXTENT in the i-th small data set, i ∈ [1, n], and n is the number of small data sets.

Further, the dividing of the ROWID interval of each small data set according to the starting value ROWIDi specifically includes:

if the small data set is not the last small data set, the ROWID interval of the small data set is [ROWIDi, ROWIDj), where j = i + 1;

if the small data set is the last small data set, the ROWID interval of the small data set is [ROWIDi, ROWIDend], where ROWIDend is the ROWID generated from the last row of the last page of the last EXTENT in the last small data set.

Preferably, the obtaining a small data set, obtaining a loading LSN of the small data set, and completing the loading of the small data set specifically includes:

taking a small data set, carrying out S-lock on a table to be loaded in a source end database, and inquiring the current LSN of the source end database at the source end to be used as a loading LSN;

generating a corresponding query statement for the small data set according to the ROWID interval range of the small data set and executing;

and after the S lock is released, extracting a result set executed by the query statement, and sending the extracted result set to a destination terminal for storage.

Further, after the data synchronization is started, the destination locates the corresponding ROWID interval according to the ROWID of the operation log, and the filtering for realizing the data synchronization by comparing the operation LSN with the corresponding loading LSN specifically comprises:

the source end captures an operation log of a source end database, and sends the obtained operation information to the destination end to execute synchronization after analyzing the operation log;

after receiving the operation information sent by the source end, the destination end carries out classification management according to the transaction ID, and when receiving the commit operation, finds out the transaction corresponding to the commit operation to be ready for execution;

the destination terminal executes each operation in the transaction in turn, and positions the ROWID interval to which the operation belongs through the table ID and the ROWID value corresponding to the operation so as to obtain the corresponding loading LSN;

comparing the sizes of the operation LSN and the loading LSN of the operation, discarding the operation when the operation LSN is smaller than the loading LSN, otherwise synchronizing the operation to the destination end database.

Preferably, taking a small data set, obtaining the loading LSN of the small data set, and completing the loading of the small data set specifically includes:

taking a small data set, and querying the current LSN of the source database at the source end as the loading LSN;

generating a query statement for the small data set by using flashback technology according to the ROWID interval range of the small data set, and executing it;

and extracting the result set produced by the query statement, and sending the extracted result set to the destination end for storage.

comparing the commit LSN of the transaction to which the operation belongs with the loading LSN, discarding the operation when the commit LSN is smaller than the loading LSN, and otherwise synchronizing the operation to the destination-end database.

Furthermore, when the ROWID interval to which the operation belongs is positioned through the table ID and the ROWID value corresponding to the operation, if the corresponding interval is not found, the operation is synchronized to the destination end database.

Further, the destination end creates a LOAD_LSN table for storing the result set data extracted during the loading process, where the stored data includes the table ID, the loading LSN, START_ROWID, and END_ROWID; START_ROWID represents the starting value of the ROWID interval, END_ROWID represents the ending value of the ROWID interval, and a row in which START_ROWID and END_ROWID are NULL represents the starting LSN of the table.

In another aspect, the invention provides an initialization loading device based on ROWID intervals, which specifically comprises at least one processor and a memory connected through a data bus, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the processor, carry out the ROWID interval-based initialization loading method of the first aspect.

Compared with the prior art, the invention has the following beneficial effects: the result set to be loaded is divided, that is, the whole data set is divided into N small data sets according to a specified value, and the N small data sets are then loaded separately. In addition, the ROWID is the unique identifier of each row in a database table; using the ROWID as the data filter condition gives high query efficiency and has little impact on the database. Furthermore, the invention records the loading LSN when each small data set is loaded and compares it, during synchronization, with the LSN of each operation to be synchronized in order to decide whether that operation needs to be synchronized, thereby ensuring that the table affected by the operation is consistent between the source end and the destination end.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

FIG. 1 is a flowchart of an initialization loading method based on ROWID intervals according to embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of the components of a ROWID provided in embodiment 1 of the present invention;

FIG. 3 is an exemplary diagram of the ROWID format provided in embodiment 1 of the present invention;

FIG. 4 is an exemplary diagram of an EXTENT provided in embodiment 1 of the present invention;

FIG. 5 is an exemplary diagram of the first and last rows of an EXTENT provided in embodiment 1 of the present invention;

FIG. 6 is an exemplary diagram of table data organization by EXTENT provided in embodiment 1 of the present invention;

FIG. 7 is an exemplary diagram of EXTENT sorting provided in embodiment 1 of the present invention;

FIG. 8 is an exemplary diagram of EXTENT division provided in embodiment 1 of the present invention;

FIG. 9 is an exemplary diagram of small data set ROWID interval division provided in embodiment 1 of the present invention;

FIG. 10 is an exemplary diagram of data loading provided in embodiment 1 of the present invention;

FIG. 11 is a detailed flowchart of step 200 provided in embodiment 2 of the present invention;

FIG. 12 is a detailed flowchart of step 300 provided in embodiment 2 of the present invention;

FIG. 13 is a schematic diagram of the synchronization operation flow provided in embodiment 2 of the present invention;

FIG. 14 is an exemplary diagram of data loading time provided in embodiment 2 of the present invention;

FIG. 15 is a schematic diagram of query time provided in embodiment 3 of the present invention;

FIG. 16 is an explanatory diagram of flashback timing provided in embodiment 3 of the present invention;

FIG. 17 is a flowchart of flashback timing determination provided in embodiment 3 of the present invention;

FIG. 18 is a detailed flowchart of step 200 provided in embodiment 3 of the present invention;

FIG. 19 is a detailed flowchart of step 300 provided in embodiment 3 of the present invention;

FIG. 20 is a flowchart of the synchronization operation provided in embodiment 3 of the present invention;

FIG. 21 is an exemplary diagram of a flashback query provided in embodiment 3 of the present invention;

FIG. 22 is a schematic structural diagram of an initialization loading device based on ROWID intervals according to embodiment 4 of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The present invention concerns the system architecture of a system with a specific function, so the specific embodiments mainly describe the functional and logical relationships among the structural modules and do not limit the specific software or hardware implementation.

In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other. The invention will be described in detail below with reference to the figures and examples.

Embodiment 1:

Initialization loading from source database tables to target database tables for data synchronization differs considerably from migrating table data between two ordinary databases. In the former, the source table may be continuously modified by third-party applications while loading is in progress, so the consistency of the source and target tables must be considered when synchronization starts after loading completes; the latter only needs to copy the data image as of that moment to the destination. The former therefore records, before loading, the current database log LSN corresponding to the query result set, so that once synchronization starts, data modification operations in the log that are already covered by the load can be filtered out by LSN. Because loading is performed table by table, each table has its own loading LSN, and these LSNs are registered in the destination-end data synchronization system. After data synchronization starts, when the destination-end system receives an operation log entry for a table, it uses the commit LSN of the transaction to which the entry belongs to identify which operations have already been loaded (a transaction commit LSN smaller than the loading LSN indicates that the operation was visible at load time and need not be synchronized again). Log entries already covered by the load are discarded without being synchronized, which ensures that the data of the affected table is consistent between the source end and the destination end.

The root cause of the "snapshot too old" error is that reading the data set takes too long, so that data modified during loading can no longer find its historical version; to keep the read consistent and avoid dirty reads, the database aborts the application's data fetch. To solve this problem, the preferred embodiment of the present invention divides the result set to be loaded: the entire data set is divided into N small data sets according to a specified value, and the N small data sets are then loaded separately. Fetching one small data set takes far less time than fetching the whole data set, so the "snapshot too old" problem is avoided.

As shown in FIG. 1, to solve the above technical problem, an embodiment of the present invention provides an initialization loading method based on ROWID intervals, which specifically includes the following steps.

Step 100: Divide the table data set into a plurality of small data sets in ROWID order, and obtain the corresponding ROWID intervals. This step splits a table data set holding a large amount of data into several small data sets that are convenient to load and synchronize individually. Fetching one small data set takes far less time than fetching the whole data set, so the "snapshot too old" problem is avoided.

Step 200: Take one small data set, obtain its loading LSN, and complete its loading; repeat this step until all the small data sets are loaded.

Step 300: After data synchronization is started, the destination end locates the ROWID interval to which each operation belongs according to the ROWID in the operation log, and filters the synchronized data by comparing the operation LSN with the corresponding loading LSN. Deciding whether an operation needs to be synchronized by comparing its LSN with the corresponding loading LSN ensures that the data of the affected table is consistent between the source end and the destination end.

For the above steps, the present preferred embodiment is expanded as follows.

First, step 100 of the preferred embodiment segments the table data set by ROWID. In an ORACLE database, the ROWID uniquely identifies one row, and it also encodes the storage location of that row. Whether or not the table has an index, a row can be located quickly by using the ROWID as a filter condition. Using the ROWID as the boundary condition between data sets therefore makes the division fast, and also makes fetching the data of each small data set fast.

FIG. 2 is a schematic diagram of the components of a ROWID. In an ORACLE-like database, the ROWID is an 18-character string in base-64 encoding composed of a file number, a page number, a table id, and a row number, where: the table id is the database's internal identifier for the table; the file number is the database's internal identifier for the file; the data page is the database's internal unit for managing disk files: for ease of management, files are managed by page (BLOCK), each page is n × 1024 bytes, and page numbers start at 0 and increase sequentially; the row number is the database's internal identifier for a row within a data page: a BLOCK is n × 1024 bytes and can store N (N > 1) rows of table data, and to distinguish the rows within a BLOCK the database numbers them starting from 0.

In such a database, the ROWID contains the physical location at which the data is actually stored: the storage location of a row can be calculated from its ROWID, and conversely the ROWID of a row can be calculated from its storage location.

Taking the ROWID format shown in FIG. 3 as an example, the ROWID AAASxZAAEAAAACuAAA indicates that the row belongs to the table whose table id is 76889 and is located at row 0 of BLOCK 174 in file No. 4.
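As a non-limiting illustration, the decoding of the example ROWID can be sketched in Python. The sketch assumes the ORACLE extended ROWID layout (6 characters of table id, 3 of file number, 6 of page number, 3 of row number) and the base-64 alphabet A-Z, a-z, 0-9, +, /; the function names `b64_to_int` and `decode_rowid` are hypothetical and chosen only for this example.

```python
# Assumed base-64 alphabet of an extended ROWID: A-Z, a-z, 0-9, '+', '/'.
B64 = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
       "abcdefghijklmnopqrstuvwxyz"
       "0123456789+/")


def b64_to_int(text):
    """Interpret a run of base-64 digits as an unsigned integer."""
    value = 0
    for ch in text:
        value = value * 64 + B64.index(ch)
    return value


def decode_rowid(rowid):
    """Split an 18-character ROWID into its four components (assumed 6-3-6-3 layout)."""
    if len(rowid) != 18:
        raise ValueError("expected an 18-character ROWID")
    return {
        "table_id": b64_to_int(rowid[0:6]),
        "file_no": b64_to_int(rowid[6:9]),
        "page_no": b64_to_int(rowid[9:15]),
        "row_no": b64_to_int(rowid[15:18]),
    }


print(decode_rowid("AAASxZAAEAAAACuAAA"))
# {'table_id': 76889, 'file_no': 4, 'page_no': 174, 'row_no': 0}
```

Note that the base-64 alphabet order differs from ASCII order, so two ROWIDs should be compared through their decoded components rather than as plain strings.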

When the database allocates storage to a table, it allocates on demand: only when the storage space already allocated has been used up does it allocate another EXTENT to the table, where an EXTENT is N (N >= 8) consecutive data pages (BLOCKs). FIG. 4 is an exemplary diagram of an EXTENT.

If a newly allocated EXTENT is located in file No. 4, its first BLOCK has page number 174, the EXTENT contains 64 BLOCKs, and the maximum row number in a BLOCK is 65535, then the page number of the last data page of the EXTENT is 174 + 63 = 237, and the ROWID range of the data within the EXTENT is [AAASxZAAEAAAACuAAA, AAASxZAAEAAAADtf//]. FIG. 5 is an exemplary diagram of the first and last rows of an EXTENT.

FIG. 6 is an exemplary diagram of how table data is organized by EXTENT. A newly allocated storage space is not guaranteed to be contiguous with the space allocated earlier, and EXTENTs of other tables may lie between the EXTENTs of one table, so the ROWIDs of a table can span a wide range.

Based on the above, the preferred embodiment expands step 100 (dividing the table data set into several small data sets in ROWID order, and obtaining the corresponding ROWID intervals) into the following steps.

Step 101: Acquire all the EXTENT information of the table data set, and sort the EXTENTs by the file number and page number of the first page of each EXTENT. FIG. 7 shows an example of sorting the EXTENTs. For a database of the type described above, the information of all the EXTENTs of the table is obtained first, and the EXTENTs are then sorted by the file number and page number of their first page (BLOCK0): the file number is compared first, in ascending order, and when the file numbers are equal the page number is compared, in ascending order.
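As a non-limiting illustration, the ordering rule of step 101 amounts to a sort on the pair (file number, first page number). The following Python sketch assumes the EXTENT information has already been read from the database; the `Extent` record, its field names, and the sample values are hypothetical.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    file_no: int      # file that holds the EXTENT
    first_page: int   # page number of the EXTENT's first page (BLOCK0)
    page_count: int   # number of consecutive pages in the EXTENT


def sort_extents(extents):
    """Sort by file number first, then by first page number, both ascending."""
    return sorted(extents, key=lambda e: (e.file_no, e.first_page))


# Made-up EXTENTs in the order a catalog query might return them.
extents = [Extent(5, 128, 64), Extent(4, 302, 64), Extent(4, 174, 64)]
ordered = sort_extents(extents)
for e in ordered:
    print(e.file_no, e.first_page)
```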

Step 102: Divide the sorted EXTENTs into n small data sets according to the specified number of groups n. FIG. 8 is an exemplary diagram of the EXTENT division: the preferred embodiment takes the table data held in the sorted EXTENTs (EXTENT0 onward) as an example and divides it into n small data sets (RECORDSET1 through RECORDSETn in the figure), each containing several EXTENTs.
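As a non-limiting illustration, step 102 can be sketched as cutting the sorted EXTENT list into n contiguous groups. The text does not state how the EXTENTs are distributed among the groups; the sketch below assumes a near-even split and caps n at the number of EXTENTs so that no group is empty. The function name `split_into_groups` is hypothetical.

```python
def split_into_groups(sorted_extents, n):
    """Cut an already sorted list into at most n contiguous, near-equal groups."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not sorted_extents:
        return []
    n = min(n, len(sorted_extents))          # assumption: never create empty groups
    base, extra = divmod(len(sorted_extents), n)
    groups, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` groups get one more
        groups.append(sorted_extents[start:start + size])
        start += size
    return groups


# Seven EXTENTs, identified here only by their position in the sorted order.
print(split_into_groups(["EXTENT0", "EXTENT1", "EXTENT2", "EXTENT3",
                         "EXTENT4", "EXTENT5", "EXTENT6"], 3))
```

Keeping each group contiguous in the sorted order is what allows a group to be described later by a single ROWID interval.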

Step 103: Obtain the starting ROWID value ROWIDi of each small data set, and divide the ROWID interval of each small data set according to the starting values; where ROWIDi is the ROWID generated from the first row of the first page of the first EXTENT in the i-th small data set, i ∈ [1, n], and n is the number of small data sets. FIG. 9 is an exemplary diagram of the ROWID interval division of the small data sets. In the preferred embodiment, the starting value of RECORDSET1 is ROWID1, the starting value of RECORDSETi is ROWIDi, and so on up to the n-th small data set RECORDSETn, whose starting value is ROWIDn. From the starting values ROWIDi, the following ROWID intervals can be determined: if the small data set is not the last one, its ROWID interval is [ROWIDi, ROWIDj), that is, ROWIDi <= ROWID < ROWIDj in the figure, where j = i + 1; if the small data set is the last one, its ROWID interval is [ROWIDi, ROWIDend], where ROWIDi equals ROWIDn, the starting value of the n-th (last) small data set, and ROWIDend is the ROWID generated from the last row of the last page of the last EXTENT in the last small data set.
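As a non-limiting illustration, the interval rule of step 103 can be sketched as follows. For simplicity the sketch represents a ROWID position as a decoded tuple (file number, page number, row number) instead of the 18-character string, and it assumes the maximum row number in a page is supplied by the caller; the function name `rowid_intervals` and the sample EXTENTs are hypothetical.

```python
def rowid_intervals(groups, max_row_no):
    """Derive one ROWID interval per group of sorted EXTENTs.

    groups: non-empty lists of (file_no, first_page, page_count) tuples,
            already sorted and grouped.
    Returns (start, end, end_inclusive) triples, where start and end are
    (file_no, page_no, row_no) tuples.
    """
    if any(not g for g in groups):
        raise ValueError("every group must contain at least one EXTENT")
    # ROWIDi: first row of the first page of the first EXTENT of group i.
    starts = [(g[0][0], g[0][1], 0) for g in groups]
    intervals = []
    for i, group in enumerate(groups):
        if i + 1 < len(groups):
            # Not the last group: [ROWIDi, ROWIDj) with j = i + 1.
            intervals.append((starts[i], starts[i + 1], False))
        else:
            # Last group: [ROWIDi, ROWIDend], where ROWIDend is the last row
            # of the last page of the last EXTENT.
            file_no, first_page, page_count = group[-1]
            end = (file_no, first_page + page_count - 1, max_row_no)
            intervals.append((starts[i], end, True))
    return intervals


groups = [[(4, 174, 64), (4, 302, 64)], [(5, 128, 64)]]
for interval in rowid_intervals(groups, max_row_no=65535):
    print(interval)
```

Because each interval other than the last ends where the next one begins, ROWIDs that fall in the gaps between the EXTENTs of a group are still covered by exactly one interval.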

FIG. 10 is an exemplary diagram of data loading. The entire loaded ROWID range is [ROWID1, ROWIDend], and when data is loaded, each RECORDSET is loaded separately. The loading of the data of source database table T1 into the destination database in FIG. 10 is based on the small data set ROWID interval division of FIG. 9.

The above describes in detail how the preferred embodiment divides the table data set into a plurality of small data sets. After division, each small data set can be loaded and synchronized on its own, and fetching one small data set takes far less time than fetching the whole data set, which avoids the "snapshot too old" problem. In addition, the ROWID uniquely identifies each row in a database table, and using the ROWID intervals as the data filter condition gives high query efficiency with little impact on the database.

Before this scheme, when synchronization initialization hit a "snapshot too old" error, the usual remedy was to adjust the operating parameters of the source database, for example by enlarging the rollback segment space or delaying the release of rollback segment space. Such measures carry considerable uncertainty: adjusting operating parameters in a production system is risky, its effect on the production system cannot be estimated, and the error may still recur when loading is retried after the adjustment. This scheme handles the "snapshot too old" error effectively: segmenting by ROWID subdivides the table's data set, and when each segmented result set is small enough, the error can be avoided entirely.

Embodiment 2:

Based on the method in embodiment 1, embodiment 2 further expands on the loading and synchronization of the data after it has been divided into small data sets.

First, a synchronization system is deployed at both the source database and the destination database. The source-end synchronization system reads logs from the source database, and the destination-end synchronization system applies the synchronization operations sent by the source end to the destination database. During initialization, the destination-end data synchronization system creates a table LOAD_LSN at the destination end to store the loading LSNs of the synchronized tables. The table structure is as follows: CREATE TABLE LOAD_LSN (OBJID INT, LSN NUMBER(20), START_ROWID VARCHAR(18), END_ROWID VARCHAR(18)); The OBJID column stores the ID of the loaded table, the LSN column stores the loading LSN of the table for the corresponding ROWID interval, the START_ROWID column stores the starting value of the loaded ROWID interval, and the END_ROWID column stores its ending value. In addition, a row in which START_ROWID and END_ROWID are NULL holds the starting LSN of the table. (Each table has one global starting LSN; when START_ROWID and/or END_ROWID is NULL, the row is the table's global starting LSN record.)
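As a non-limiting illustration, the bookkeeping table can be exercised with Python's built-in sqlite3 module standing in for the destination database. The character type of the two ROWID columns is an assumption of this sketch, and all LSN and ROWID values below are made up for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the destination end.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LOAD_LSN ("
    "OBJID INT, LSN NUMBER(20), START_ROWID VARCHAR(18), END_ROWID VARCHAR(18))"
)

# One row with NULL bounds (the table's starting LSN) plus one row per interval.
conn.executemany(
    "INSERT INTO LOAD_LSN (OBJID, LSN, START_ROWID, END_ROWID) VALUES (?, ?, ?, ?)",
    [
        (76889, 1000, None, None),
        (76889, 1010, "AAASxZAAEAAAACuAAA", "AAASxZAAEAAAAEuAAA"),
        (76889, 1025, "AAASxZAAEAAAAEuAAA", "AAASxZAAEAAAAFtAAA"),
    ],
)

# Starting LSN of the table: the row whose interval bounds are NULL.
start_lsn = conn.execute(
    "SELECT LSN FROM LOAD_LSN "
    "WHERE OBJID = ? AND (START_ROWID IS NULL OR END_ROWID IS NULL)",
    (76889,),
).fetchone()[0]

# Loading LSN of every ROWID interval of the table.
interval_rows = conn.execute(
    "SELECT LSN, START_ROWID, END_ROWID FROM LOAD_LSN "
    "WHERE OBJID = ? AND START_ROWID IS NOT NULL ORDER BY LSN",
    (76889,),
).fetchall()

print(start_lsn)
print(interval_rows)
```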

Note that data loading does not require stopping the source-end database service, so table data keeps changing during loading. When incremental synchronization starts after loading completes, the synchronization software must therefore decide which operations need to be synchronized to the target database: if table data was loaded at time T, operations before T should not be synchronized to the target database, and operations after T must be. In an ORACLE database, an S lock can be placed on a table; when the lock succeeds, there are no unfinished transactions on the table at that moment. If all transactions on a table have finished, the table data is durable at that moment, and any client issuing the same query on the table gets the same result. The ORACLE database also defines an internal database clock (LSN), and each LSN corresponds to at most one data modification. The database records, for every data modification operation, the LSN at which it occurred, so the LSN together with the S lock can serve as the basis for deciding whether incremental data needs to be synchronized to the target database.

Specifically, based on the above, embodiment 2 expands step 200 of embodiment 1 ("taking one small data set, obtaining the loading LSN of the small data set, and completing the loading of the small data set") into the steps shown in FIG. 11:

Step 201: Take one small data set, place an S lock on the table to be loaded in the source database, and query the current LSN of the source database at the source end as the loading LSN. A successful S lock indicates that all transactions on the table have finished, and the data in the table cannot be modified until the S lock is released. While the S lock is held, the data in the table therefore does not change during the query that obtains the loading LSN.

Step 202: Generate a corresponding query statement for the small data set according to its ROWID interval range, and execute it. In this step, a query statement SQL_SET is generated for each RECORDSET according to the ROWID filter range (ROWIDi <= ROWID < ROWIDj); executing SQL_SET yields the result set that satisfies the ROWID filter range. Note that if the small data set is the last (n-th) one, the ROWID filter range becomes ROWIDi <= ROWID <= ROWIDend (that is, ROWIDn <= ROWID <= ROWIDend).
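As a non-limiting illustration, the generation of SQL_SET for one RECORDSET can be sketched in Python. The helper name `build_sql_set` is hypothetical, the statement shape (SELECT * with CHARTOROWID conversions) is an assumption of this sketch rather than the statement used by the embodiment, and the table name is assumed to come from trusted configuration.

```python
import re

# An extended ROWID is assumed to be exactly 18 base-64 characters.
_ROWID_RE = re.compile(r"[A-Za-z0-9+/]{18}")


def build_sql_set(table_name, start_rowid, end_rowid, is_last):
    """Build the range query for one small data set.

    Non-last sets use START <= ROWID < END; the last set uses
    START <= ROWID <= END so that the final row is included.
    """
    for rowid in (start_rowid, end_rowid):
        if not _ROWID_RE.fullmatch(rowid):
            raise ValueError(f"not an 18-character ROWID: {rowid!r}")
    upper_op = "<=" if is_last else "<"
    return (
        f"SELECT * FROM {table_name} "
        f"WHERE ROWID >= CHARTOROWID('{start_rowid}') "
        f"AND ROWID {upper_op} CHARTOROWID('{end_rowid}')"
    )


print(build_sql_set("T1", "AAASxZAAEAAAACuAAA", "AAASxZAAEAAAAEuAAA", False))
print(build_sql_set("T1", "AAASxZAAEAAAAEuAAA", "AAASxZAAEAAAAFtAAA", True))
```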

Step 203: After the S lock is released, extract the result set produced by the query statement, and send the extracted result set to the destination end for storage. Once the S lock has been released, other clients or applications may modify the data in the table. When the synchronization software later synchronizes incremental data, a data operation whose LSN is greater than the loading LSN obtained in step 201 is synchronized to the target database; a data operation whose LSN is smaller than the loading LSN obtained in step 201 can be discarded, that is, data operations that occurred before step 201 need not be synchronized to the destination database. In addition, the data extracted in this step includes the table ID, the loading LSN, START_ROWID, and END_ROWID; after extraction this information is sent to the destination-end data synchronization system, which stores it in the LOAD_LSN table for use when the corresponding table is synchronized. START_ROWID represents the starting value of the ROWID interval (that is, ROWIDi of the result set), and END_ROWID represents the ending value of the ROWID interval (that is, ROWIDj or ROWIDend of the result set). A row in which START_ROWID and END_ROWID are NULL holds the starting LSN of the table.
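As a non-limiting illustration, the ordering of steps 201 to 203 can be sketched in Python against a stand-in connection object. The class `FakeSourceDb` and all of its method names are hypothetical; they record only the order of the calls and do not correspond to any real database driver.

```python
class FakeSourceDb:
    """Stand-in for a source database connection; records the call order."""

    def __init__(self, current_lsn, rows):
        self.current_lsn = current_lsn
        self.rows = rows
        self.trace = []

    def lock_table_shared(self, table):
        self.trace.append("lock")

    def query_current_lsn(self):
        self.trace.append("read_lsn")
        return self.current_lsn

    def open_query(self, sql):
        self.trace.append("open_query")
        return iter(self.rows)

    def release_lock(self, table):
        self.trace.append("unlock")


def load_record_set(db, table, sql_set, send_to_destination):
    """Load one small data set and return the loading LSN recorded for it."""
    db.lock_table_shared(table)            # step 201: place the S lock
    try:
        load_lsn = db.query_current_lsn()  # step 201: current LSN becomes the loading LSN
        cursor = db.open_query(sql_set)    # step 202: execute the range query
    finally:
        db.release_lock(table)             # step 203: release the S lock first
    for row in cursor:                     # step 203: then extract and forward rows
        send_to_destination(row)
    return load_lsn


db = FakeSourceDb(current_lsn=1010, rows=[("row-1",), ("row-2",)])
received = []
lsn = load_record_set(db, "T1", "SELECT ...", received.append)
print(lsn, db.trace, received)
```

Releasing the lock before the rows are fetched keeps the time during which writers are blocked short; the lock only has to cover reading the LSN and opening the query.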

As shown in fig. 12, when loading is based on the S lock, step 300 of this embodiment (after data synchronization is started, the destination end locates the ROWID interval to which an operation belongs according to the ROWID in the operation log, and filters the synchronized data by comparing the operation LSN with the corresponding loading LSN) expands into the following steps.

Step 301: The source end captures the operation log of the source database, parses it, and sends the resulting operation information to the destination end for synchronization. When synchronization starts, the data in the LOAD_LSN table must also be loaded into memory in order of obj ID and recorded as GLOBAL_LOAD. The source-end data synchronization system captures the operation log of the source database and, after parsing, obtains the transaction ID, table ID, operation LSN, operation type, and operation data of each operation; the operation data includes the ROWID of the affected row in the source table. This information is then sent to the destination end for synchronization.

Step 302: After receiving the operation information sent by the source end, the destination end groups the operations by transaction ID. When it receives a commit operation, it finds the corresponding transaction and prepares it for execution.

Step 303: The destination end executes the operations in the transaction in turn, and locates the ROWID interval to which each operation belongs by the table ID and ROWID value of the operation, so as to obtain the corresponding loading LSN. If no matching interval is found, the operation is synchronized to the destination database: an operation whose ROWID falls in no interval acts on newly added data and can be treated directly as an operation that requires synchronization. In addition, before locating the ROWID interval, it is possible to check whether the operation LSN is smaller than the starting LSN of the table. If so, the operation was already visible before the lock was taken, does not need to be synchronized to the destination database, and can be discarded directly. Otherwise a further check is needed, that is, the ROWID interval to which the operation belongs is located for the next comparison.

Step 304: Compare the operation LSN with the loading LSN. If the operation LSN is smaller than the loading LSN, discard the operation; otherwise synchronize it to the destination database. An operation LSN smaller than the loading LSN means the operation was visible before the S lock was taken, so the operation can be discarded without being synchronized; an operation LSN greater than the loading LSN means the operation took place after the S lock was released, so it must be synchronized to the destination database.

The synchronization steps above are shown as a flow chart in fig. 13. After synchronization starts, the table ID (tabid), the operation LSN (oplsn), and the ROWID of the operated row (lROWID) are obtained for the operation. The records related to tabid are obtained from GLOBAL_LOAD and denoted LSN_LST. The row whose START_ROWID is NULL is found in LSN_LST, and the LSN value in that row is denoted slsn0 (i.e., the starting LSN of the table). It is then determined whether oplsn < slsn0; if so, the operation is discarded and the flow ends. Otherwise, the ROWID interval containing lROWID is looked up in GLOBAL_LOAD. If no such interval is found, the operation is synchronized to the destination database and the flow ends. If it is found, the LSN value of the interval is denoted RLSN (the loading LSN of the interval), and oplsn is compared with RLSN: if oplsn is not smaller than RLSN, the operation is synchronized to the destination database and the flow ends; otherwise the operation is discarded and the flow ends.
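The decision flow of fig. 13 can be sketched as a single function. This is an illustrative sketch under stated assumptions: `LoadLsnRow` and `should_sync` are hypothetical names, ROWIDs are treated as comparable strings, and the `end_inclusive` flag is an assumed way to mark the last interval (which is closed at ROWIDend); the source does not say how that is recorded.

```python
from typing import List, NamedTuple, Optional


class LoadLsnRow(NamedTuple):
    """In-memory copy of one LOAD_LSN row (one entry of GLOBAL_LOAD)."""
    table_id: int
    load_lsn: int
    start_rowid: Optional[str]   # None marks the row holding the table's starting LSN
    end_rowid: Optional[str]
    end_inclusive: bool = False  # assumption: True only for the last interval


def should_sync(global_load: List[LoadLsnRow], tab_id: int,
                op_lsn: int, rowid: str) -> bool:
    """Return True if the operation must be applied to the destination database."""
    lsn_lst = [r for r in global_load if r.table_id == tab_id]

    # 1. Operations older than the table's starting LSN (slsn0) are discarded.
    for row in lsn_lst:
        if row.start_rowid is None and op_lsn < row.load_lsn:
            return False

    # 2. Locate the ROWID interval that contains the operated row.
    for row in lsn_lst:
        if row.start_rowid is None:
            continue
        inside = (row.start_rowid <= rowid < row.end_rowid
                  or (row.end_inclusive and rowid == row.end_rowid))
        if inside:
            # 3. Discard if the loaded data already contains the change.
            return op_lsn >= row.load_lsn

    # No interval found: the row is new data, so the operation is synchronized.
    return True
```

For example, with intervals `["A", "M")` loaded at LSN 110 and `["M", "Z"]` loaded at LSN 120, an operation on ROWID `"B"` at LSN 105 is discarded, while the same operation at LSN 115 is synchronized.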

The scheme of embodiment 2 is illustrated below using the data loading timeline of the specific example shown in fig. 14. In fig. 14, R1, …, Rn are subsets of ROWID1-ROWIDn; SLSN0 is the LSN at which the ROWID set of the table was acquired; SLSN1, …, SLSNn are the LSNs at which the subsets R1, …, Rn were acquired; and INS0, …, INSn are user operations on the table being loaded.

The synchronization software handles each operation differently, depending on when INS0, …, INSn occur in the figure:

INS0: The operation occurs before SLSN0, so its data is visible at SLSN0. The data affected by the operation is therefore already among the data of ROWID1-ROWIDn, and the operation does not need to be synchronized to the destination database.

INS1: The operation occurs between SLSN0 and SLSN1, and the ROWID of the row it operates on lies within the ROWID range represented by R1. Since R1 is loaded at SLSN1 and INS1 is visible at SLSN1, the operation does not need to be synchronized to the destination database.

INS2: The operation occurs after SLSN1, and the ROWID of the row it operates on lies within the ROWID range represented by R1. Since R1 is loaded at SLSN1 and INS2 is not visible at SLSN1, the operation needs to be synchronized to the destination database.

INSm: The principle is the same as for INS2, and the operation needs to be synchronized to the destination database.

INSn: The operation occurs between SLSN0 and SLSN1, but the data it affects is not among the ROWID0-ROWIDn data. Although INSn is visible after SLSN1, only the ROWID0-ROWIDn data is loaded, so INSn needs to be synchronized to the destination database.

In summary, this embodiment screens each operation to be synchronized by comparing its ROWID value and operation LSN against the ROWID interval ranges, the starting LSN of the table, and the loading LSN of the small data set represented by each ROWID interval. This determines whether the operation needs to be synchronized and ensures that the data of the affected table is consistent between the source end and the destination end.
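The INS0 to INSn cases can be replayed with concrete numbers. All values below are invented for illustration and do not come from fig. 14: ROWIDs are modelled as integers, two intervals R1 = [10, 20) and R2 = [20, 30] stand in for R1 … Rn, and the LSNs 100, 110, and 120 stand in for SLSN0, SLSN1, and SLSN2. The interval lookup uses a binary search over the sorted interval starts, which is one possible choice and not something the source prescribes.

```python
import bisect

SLSN0 = 100              # starting LSN of the table (illustrative value)
STARTS = [10, 20]        # start ROWID of each interval, sorted ascending
ENDS = [20, 30]          # end ROWID of each interval; the last one is inclusive
LOAD_LSNS = [110, 120]   # loading LSN of each interval (SLSN1, SLSN2)


def decide(op_lsn: int, rowid: int) -> str:
    """Return 'discard' or 'sync' for one captured operation."""
    if op_lsn < SLSN0:
        return "discard"                       # already visible when loading began
    i = bisect.bisect_right(STARTS, rowid) - 1  # candidate interval index
    is_last = i == len(STARTS) - 1
    inside = i >= 0 and (rowid < ENDS[i] or (is_last and rowid == ENDS[i]))
    if not inside:
        return "sync"                          # row was not part of the loaded set
    return "discard" if op_lsn < LOAD_LSNS[i] else "sync"


cases = {
    "INS0": (95, 12),    # before SLSN0                      -> discard
    "INS1": (105, 12),   # between SLSN0 and SLSN1, in R1    -> discard
    "INS2": (115, 12),   # after SLSN1, in R1                -> sync
    "INSm": (125, 25),   # after SLSN2, in R2                -> sync
    "INSn": (105, 45),   # between SLSN0 and SLSN1, new row  -> sync
}
results = {name: decide(lsn, rid) for name, (lsn, rid) in cases.items()}
```

The INSn case shows why the ROWID check is needed in addition to the LSN check: its LSN alone would suggest discarding, but the row was never part of any loaded interval.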

Example 3:

Based on the ROWID-interval-based initialization loading methods provided in embodiments 1 and 2, embodiment 3 further expands the loading and synchronization of data that has been divided into small data sets.

Embodiment 2 described in detail how data is loaded and synchronized by ROWID interval, that is, it expanded steps 200 to 300 of embodiment 1. In embodiment 2, however, an S lock must be placed on the table being loaded before the data of each ROWID interval is loaded, in order to fix the version of the loaded data set. In a production system this locking often fails, which causes the data load to fail. Embodiment 3 therefore proposes a method that, when loading the data of a ROWID interval, reads a data set of a specified version through the database flashback technique without locking the table being loaded.

In a database, data changes in real time as the business system operates. In general, the query result set that the database returns to an application is filtered from the data visible at that moment. As shown in fig. 15, if data is modified between time t1 and time t2, the data visible at t1 differs from the data visible at t2, so the same filtering statement returns different result sets at t1 and t2.

In the database, data operations are performed in units of transactions; until a transaction has finished, the data it operates on is not visible to other database connections.

After the synchronization software finishes loading data and begins incremental data synchronization, it must determine which incremental data needs to be synchronized and which needs to be discarded. The database defines an internal time (LSN). The synchronization software loads data by fetching the result set of an SQL statement, and that result set is the data visible at the execution time of the SQL statement. To obtain the data as of the execution time of the SQL statement, embodiment 2 places an S lock on the table. If the lock succeeds, other applications cannot modify the table data until the S lock is released, so the table data is static from the moment the lock succeeds until the lock is released. In this case, the loading LSN can be used during incremental synchronization to determine which operations need to be synchronized and which do not. However, when there are uncommitted transactions on the table, the S lock cannot be placed on the table.

As shown in fig. 16, there is a transaction in the figure that starts at time lsn2 and ends at time lsn4. In this case the table tab1 cannot be S-locked when the current time is lsn3; likewise, the table cannot be S-locked from lsn2 to lsn5. The flashback technique of the database lets an application query the data of the table that was visible at a specified LSN. The table therefore does not need to be S-locked: querying the data visible at a point before the current time solves the problem of failed table locking. In fig. 16, at time lsn5 the table data at time lsn3 can be queried with the flashback technique, and incremental synchronization then reads incremental data starting from lsn2, which guarantees that no data is lost. This process requires the values lsn5 and lsn2 to be determined. In this embodiment, the procedure for obtaining lsn5 and lsn2 is shown in fig. 17: acquire the current LSN of the database (lsn2 at this time); query whether tab1 has uncommitted transactions in the current database (in the example of fig. 16 it has one at this time); if so, acquire the execution start time LSNt of the earliest uncommitted transaction and determine whether LSNt is greater than the previously acquired LSN (lsn2); if not, sleep for 10 s and return to the step of querying for uncommitted transactions; once LSNt is greater than lsn2 or the table has no uncommitted transactions, acquire the current LSN of the database (lsn5 in this example); finally, acquire the data of table tab1 as of lsn5 using the snapshot technique.
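The waiting procedure of fig. 17 can be sketched as a polling loop. This is a sketch under stated assumptions: `wait_for_flashback_point` is a hypothetical name, and the two callables stand in for database-specific queries (current LSN, and start LSN of the earliest uncommitted transaction on the table, returning `None` when there is none). The 10-second sleep follows the text; the `sleep` parameter is injectable only so that the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_flashback_point(
    get_current_lsn: Callable[[], int],
    earliest_uncommitted_start: Callable[[], Optional[int]],
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = 10.0,
) -> Tuple[int, int]:
    """Return (capture_start_lsn, flashback_lsn), i.e. lsn2 and lsn5 in fig. 16."""
    # LSN from which incremental capture will later start (lsn2).
    capture_start_lsn = get_current_lsn()

    while True:
        lsn_t = earliest_uncommitted_start()
        # Stop waiting once no transaction that began at or before
        # capture_start_lsn is still open on the table.
        if lsn_t is None or lsn_t > capture_start_lsn:
            break
        sleep(interval_s)

    # LSN at which the table is read with a flashback query (lsn5).
    flashback_lsn = get_current_lsn()
    return capture_start_lsn, flashback_lsn
```

Returning both values keeps the pairing explicit: the flashback read uses the second LSN, while incremental capture starts from the first, so changes made in between are seen by one or the other.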

In the above process, the snapshot query is performed only after all transactions that started before lsn2 have committed. This ensures that the query result at time lsn5 contains all modifications made by transactions that started before lsn2, while transactions that started after lsn2 are handled by incremental synchronization. The flashback query statement is as follows: select * from t1 as of scn 12345; where as of scn is the flashback query keyword and 12345 is the flashback query time (LSN).

Based on the above description of the flashback function, for the case where the S lock is not needed or fails, embodiment 3 expands step 200 of embodiment 1 (taking a small data set, obtaining the loading LSN of the small data set, and completing the loading of the small data set) into the steps shown in fig. 18:

Step 211: Take a small data set and query the current LSN of the source database at the source end to use as the loading LSN. Note that the current LSN obtained here as the loading LSN is the LSN at which the table data is read with the snapshot technique after the procedure shown in fig. 17. After querying the loading LSN, the source end sends the table ID, SLSN (the loading LSN), NULL, and NULL to the destination-end data synchronization system, which stores this information in the LOAD_LSN table. The row in which START_ROWID and END_ROWID are NULL holds the starting LSN of the table.

Step 212: Use the flashback technique to generate a query statement for the small data set according to its ROWID interval range, and execute it. Specifically, according to the ROWID screening range (ROWIDi <= ROWID < ROWIDj, or ROWIDi <= ROWID <= ROWIDend for the last ROWID interval), a query statement SQL_SET is generated for RECORDSET using the flashback technique, and SQL_SET is then executed.

Step 213: Extract the result set produced by the query statement and send it to the destination end for storage. Specifically, the result set of SQL_SET is extracted and sent to the destination end until extraction is complete. Note that the result set extracted in this step is the visible data set found by the flashback technique. The accompanying information includes the table ID, the loading LSN, START_ROWID, and END_ROWID, which the destination-end data synchronization system stores in the LOAD_LSN table for use when the corresponding table is synchronized. START_ROWID is the start value of the ROWID interval (i.e., ROWIDi of the result set), and END_ROWID is the end value of the ROWID interval (i.e., ROWIDj of the result set, or ROWIDend for the last interval). The row in which START_ROWID and END_ROWID are both NULL holds the starting LSN of the table.
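Step 212 can be sketched by combining the ROWID screening range with the `as of scn` form quoted earlier. This is only an illustration: `flashback_queries` is a hypothetical helper, ROWIDs are treated as strings, placing the WHERE clause after the `AS OF SCN` clause is an assumption about the dialect, and the exact flashback syntax depends on the database in use.

```python
from typing import List


def flashback_queries(table: str, starts: List[str], rowid_end: str,
                      load_lsns: List[int]) -> List[str]:
    """Build one flashback SQL_SET per ROWID interval.

    starts    -- start ROWID of every interval (ROWID1 ... ROWIDn), ascending
    rowid_end -- ROWIDend, the inclusive upper bound of the last interval
    load_lsns -- loading LSN obtained for each interval in step 211
    Values are inlined for readability only; real code should bind parameters.
    """
    if len(starts) != len(load_lsns):
        raise ValueError("one loading LSN is required per interval")

    queries = []
    for i, (start, lsn) in enumerate(zip(starts, load_lsns)):
        is_last = i == len(starts) - 1
        end = rowid_end if is_last else starts[i + 1]
        upper = "<=" if is_last else "<"   # only the last interval is closed
        queries.append(
            f"SELECT * FROM {table} AS OF SCN {lsn} "
            f"WHERE ROWID >= '{start}' AND ROWID {upper} '{end}'"
        )
    return queries
```

Because each statement names its own LSN, no lock is needed to pin the version of the data that the interval query reads.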

As shown in fig. 19, when loading is based on the flashback technique, step 300 of this embodiment (after data synchronization is started, the destination end locates the ROWID interval to which an operation belongs according to the ROWID in the operation log, and filters the synchronized data by comparing the operation LSN with the corresponding loading LSN) expands into the following steps.

Step 311: The source end captures the operation log of the source database, parses it, and sends the resulting operation information to the destination end for synchronization. When synchronization starts, the data in the LOAD_LSN table must also be loaded into memory in order of obj ID and recorded as GLOBAL_LOAD. The source-end data synchronization system captures the operation log of the source database and, after parsing, obtains the transaction ID, table ID, operation LSN, operation type, and operation data of each operation; the operation data includes the ROWID of the affected row in the source table. This information is then sent to the destination end for synchronization.

Step 312: After receiving the operation information sent by the source end, the destination end groups the operations by transaction ID. When it receives a commit operation, it finds the corresponding transaction and prepares it for execution.

Step 313: The destination end executes the operations in the transaction in sequence, and locates the ROWID interval to which each operation belongs by the table ID and ROWID value of the operation, so as to obtain the loading LSN corresponding to that ROWID interval. If no matching interval is found, the operation is synchronized to the destination database: an operation whose ROWID falls in no interval acts on newly added data and can be treated directly as an operation that requires synchronization. In addition, before locating the ROWID interval, it is possible to check whether the commit LSN of the transaction to which the operation belongs is smaller than the starting LSN of the table. If so, the transaction was visible at load time, does not need to be synchronized to the destination database, and can be discarded directly. Otherwise a further check is needed, that is, the ROWID interval to which the operation belongs is located for the next comparison.

Step 314: Compare the commit LSN of the transaction to which the operation belongs with the loading LSN. If the commit LSN is smaller than the loading LSN, discard the operation; otherwise synchronize it to the destination database. A commit LSN smaller than the loading LSN means the transaction was visible at load time, so it can be discarded without being synchronized; a commit LSN greater than the loading LSN means the transaction was not visible at load time, so it must be synchronized to the destination database.

The synchronization steps above are shown as a flow chart in fig. 20. After synchronization starts, the table ID (tabid) of the operation, the commit LSN (oplsn) of the transaction to which the operation belongs, and the ROWID of the operated row (lROWID) are obtained. The records related to tabid are obtained from GLOBAL_LOAD and denoted LSN_LST. The row whose START_ROWID is NULL is found in LSN_LST, and the LSN value in that row is denoted slsn0 (i.e., the starting LSN of the table). It is then determined whether oplsn < slsn0; if so, the operation is discarded and the flow ends. Otherwise, the ROWID interval containing lROWID is looked up in GLOBAL_LOAD. If no such interval is found, the operation is synchronized to the destination database and the flow ends. If it is found, the LSN value of the interval is denoted RLSN (the loading LSN of the interval), and it is determined whether RLSN is greater than oplsn: if so, the operation is discarded and the flow ends; otherwise the operation is synchronized to the destination database and the flow ends.
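Steps 312 to 314 differ from the S-lock variant in that operations are buffered per transaction and judged by the commit LSN of their transaction. The following sketch shows that shape for a single table. `Interval` and `Applier` are hypothetical names, ROWIDs are treated as comparable strings, `end_inclusive` is an assumed marker for the last interval, and appending to `applied` stands in for executing the operation on the destination database.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    start_rowid: str
    end_rowid: str
    load_lsn: int
    end_inclusive: bool = False  # assumption: True only for the last interval


class Applier:
    """Destination-end filter for one table, keyed on transaction commit LSN."""

    def __init__(self, start_lsn: int, intervals: List[Interval]) -> None:
        self.start_lsn = start_lsn            # starting LSN of the table
        self.intervals = intervals
        self.pending: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
        self.applied: List[str] = []          # stand-in for the destination database

    def on_operation(self, txn_id: int, rowid: str, payload: str) -> None:
        # Step 312: group incoming operations by transaction ID.
        self.pending[txn_id].append((rowid, payload))

    def on_commit(self, txn_id: int, commit_lsn: int) -> None:
        # Steps 313-314: replay the transaction in order, filtering each operation.
        for rowid, payload in self.pending.pop(txn_id, []):
            if self._keep(commit_lsn, rowid):
                self.applied.append(payload)

    def _keep(self, commit_lsn: int, rowid: str) -> bool:
        if commit_lsn < self.start_lsn:
            return False                      # transaction was visible at load time
        for iv in self.intervals:
            inside = (iv.start_rowid <= rowid < iv.end_rowid
                      or (iv.end_inclusive and rowid == iv.end_rowid))
            if inside:
                return commit_lsn >= iv.load_lsn
        return True                           # no interval: newly added data
```

Using the commit LSN rather than the LSN of each individual operation matches flashback visibility: a flashback read sees a transaction's changes only if the transaction had committed by the queried LSN.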

The scheme of embodiment 3 is illustrated next with the specific flashback query example shown in fig. 21. In fig. 21 there are three transactions in sequence. After data loading starts, the synchronization software finds that the current database LSN is lsn3. At this time the start time of the second transaction is lsn2, which is smaller than lsn3, so data loading must wait for the second transaction to end. At time lsnt a new transaction, the third transaction, starts, but lsnt is greater than lsn3, so data loading does not have to wait for the third transaction to end. At time lsn4 the second transaction ends, at which point the table data can be loaded using the flashback technique.

After data loading finishes and incremental data synchronization begins, the synchronization software captures data starting from lsn3. All operations of the second transaction are discarded because the end time of the second transaction is not greater than lsn4, whereas the end time of the third transaction is greater than lsn4, so the operations of the third transaction need to be synchronized. For a specific example of incremental synchronization, refer to the example of fig. 14 in embodiment 2; details are not repeated here.

Example 4:

On the basis of the ROWID-interval-based initialization loading method provided in embodiments 1 to 3, the present invention further provides a ROWID-interval-based initialization loading device that can be used to implement the method. Fig. 22 is a schematic diagram of the device architecture in an embodiment of the present invention. The ROWID-interval-based initialization loading device of this embodiment includes one or more processors 21 and a memory 22. In fig. 22, one processor 21 is taken as an example.

The processor 21 and the memory 22 may be connected by a bus or other means, and fig. 22 illustrates the connection by a bus as an example.

The memory 22, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the ROWID-interval-based initialization loading method of embodiments 1 to 3. By running the non-volatile software programs, instructions, and modules stored in the memory 22, the processor 21 executes the various functional applications and data processing of the ROWID-interval-based initialization loading device, that is, it implements the ROWID-interval-based initialization loading method of embodiments 1 to 3.

The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the method for initial loading based on the ROWID interval in embodiments 1 to 3, for example, perform the steps shown in FIG. 1, FIG. 11 to FIG. 12, and FIG. 18 to FIG. 19 described above.

Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. An initial loading method based on ROWID interval is characterized by comprising the following steps:

2. The method of claim 1, wherein the dividing the table dataset into a plurality of small datasets according to the order of the size of the ROWID interval and obtaining a plurality of corresponding ROWID intervals specifically comprises:

3. The method as claimed in claim 2, wherein the dividing of the ROWID interval of each small data set according to the starting value ROWIDi comprises:

4. The method of claim 2, wherein the taking a small dataset, obtaining a loading LSN of the small dataset, and completing the loading of the small dataset specifically comprises:

5. The method as claimed in claim 4, wherein after the data synchronization is started, the destination locates the corresponding ROWID interval according to the ROWID of the operation log, and the filtering for data synchronization by comparing the operation LSN with the corresponding loading LSN specifically comprises:

6. The method of claim 2, wherein for the case that the S-lock is not required or fails, the fetching of one small dataset, obtaining the loading LSN of the small dataset, and completing the loading of the small dataset specifically comprise:

7. The method as claimed in claim 6, wherein after the data synchronization is started, the destination locates the corresponding ROWID interval according to the ROWID of the operation log, and the filtering for data synchronization by comparing the operation LSN with the corresponding loading LSN specifically comprises:

comparing the sizes of the commit LSN and the load LSN of the transaction to which the operation belongs, discarding the operation when the commit LSN is smaller than the load LSN, otherwise synchronizing the operation to the destination end database.

8. The method of claim 5 or 7, wherein when the ROWID interval to which the operation belongs is located by the table ID and ROWID value corresponding to the operation, if the corresponding interval is not found, the operation is synchronized to the destination database.

9. The method of any one of claims 1-7, wherein the destination end creates a LOAD_LSN table for storing the result set data extracted during the loading process, the stored data including the table ID, the loading LSN, START_ROWID, and END_ROWID, where START_ROWID represents the start value of the ROWID interval, END_ROWID represents the end value of the ROWID interval, and the row in which START_ROWID and END_ROWID are NULL represents the starting LSN of the table.

10. An initialization loading device based on ROWID interval, which is characterized in that:

the device comprises at least one processor and a memory, wherein the at least one processor and the memory are connected through a data bus, and the memory stores instructions executable by the at least one processor, the instructions, when executed by the processor, being used to perform the ROWID interval-based initialization loading method according to any one of claims 1-9.