[ detailed description ] embodiments
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1:
in an actual application scenario, when data synchronization is performed by using log analysis, correct object definition information must be used, and analysis or synchronization errors may be caused when the object definition information is too new or too old. In this embodiment, the object definition information in the dictionary of the source database system is stored locally, and during the log analysis process, the local offline dictionary is accessed without accessing the database in real time, so that not only can the influence on the performance of the database be reduced, but also the correct object definition information can be obtained, and further the accuracy of data synchronization or log analysis is ensured. The following describes a method for data synchronization using an offline dictionary.
Step 10: an offline dictionary containing object definition information is established.
In this embodiment, the source database first establishes an offline dictionary containing object definition information according to actual requirements.
The source database comprises any one of an SQLSERVER database, a MYSQL database, a DB2 database, a DM6 database, a DM7 database, a POSTGRESQL database or an ORACLE database, the type of the database is not specifically limited, and the type of the database can be selected according to actual conditions.
Wherein the object comprises one or more of a table, an index, and a view.
Specifically, a table is composed of rows and columns, the columns are composed of the same kind of information, each column is also called a field, the header of each column is called a field name, and the rows include a plurality of columns of information items. A line of data is called a record or record and it expresses a meaningful combination of information. A table may consist of one or more records, and a table with no records is referred to as an empty table.
Indexes are built according to the order of the designated database table columns, provide a way to quickly access data, and can supervise the data of the tables so that the data in the columns to which the indexes point are not duplicated, such as clustered indexes.
A view looks just like a table, with a set of named fields and data items, but a view is actually a virtual table and does not actually exist in a database. Views are generated by querying database tables, which limits the data that a user can see and modify. The view may be used to control user access to the data and to simplify the display of the data, i.e. to display only those data information that is required through the view.
Referring to fig. 2, a method for creating an offline dictionary is described in detail.
Step 101: and acquiring object definition information in a dictionary of the source database system and a current log serial number corresponding to the source database at the current moment.
In this embodiment, definition information of an object matching a Structured Query Language (SQL) is obtained based on a SQL source-side database system dictionary. In an actual application scenario, SQL statements may be designed according to actual requirements to obtain definition information of corresponding objects. The definition information of the object comprises one or more items of a mode name, an object name and a partition position. In an alternative embodiment, the definition information of the object includes a mode name, an object name, and a partition position. For example, when the object is a table, the definition information of the object includes a schema name, a table name, a column name, and a partition position of the table.
In an actual application scenario, when the types of the database systems are different, the manner of acquiring the object definition information is also different. For convenience of explanation, the manner of obtaining the object definition information is explained by taking the database system as an ORACLE database and the object as a table as an example.
First, the initialization information of the table is obtained by using a select statement, a form statement and a where statement. For example, in an actual application scenario, the initialization information of the table is obtained as follows: a selection sc.obj #, sc.interactive #, sc.name, sc.type #, sc.charset form (a selection dataobj # from sys. obj # and sl. interactive #) local, sc.length, sc.precision #, sc.scale, sc.null, sc.default #, sc.pro, sc.property, (a selection toid. system, sc.internal, sc.1, sc.60, sc.surface, sc.pro, sc.60, sc.100, sc.internal, sc.100, sc.60, sc.1, sc.internal, sc.60, sc.100, 13, 2, 60, 183, 60, 2, 60, 2, 60, 2, 60, 2, 60, 2, 60.
Then, the column information of the table is obtained by adopting a select statement, a form statement and a where statement. For example, in an actual application scenario, the following method is adopted to obtain the column information of the table: a selection sc.obj #, sc.intcol #, sc.name, sc.type #, sc.charstform, (a selection dataobj # from sys.obj # and sl.intcol #) combined, sc.length, sc.precision #, sc.scale, sc.null, sc.default, sc.property, (a selection toid. sys.obj # and sl.intcol #) combined, sc.length, sc.integer # and sc.intcol # combined, sc.1, sc.1000, sc.kernel #, sc.32, sc.100, sc.60, sc.100, 13, etc. (sc.100, 13, 60, 183, 13, 60, etc.), and 60, 183, 60, 2, 60, 2, 60, 2, 60, 2, 60.
And finally, obtaining the partition information of the table by adopting a select statement, a form statement and a where statement. For example, in an actual application scenario, the following manner is adopted to obtain the partition information of the table: a selection tp.obj #, tp.databobj #, tp.bo #, tab.subnet from system.tabpart #, a.bbo # tp, a.bbo from system.tab from system.obj #, c.subnet from system.tabsub.p.a.tab.b.sub.p.a.tab.b.sub.b, a.ob.b.p.b.p.b.p.b.p.b.p.b.p.b.p.b.p.b.p.b.p.b.p.b.p.b.b.p.b.o.b.o.b.p.b.and a.obj # c.o.b.p.u.n.p.o.p.obj #, tcp.ob.p.p.p.p.b.p.p.p.b.p.p.b.p.p.b.p.p.b.p.p.b.p.p.b.p.p.b.p.p.b.p.p.b.p.p.p.b.p.p.b.p.p.b.p.p.b.p.p.p.b.p.p.b.p
// obtaining partition information
select obj#,parttype,mod(SPARE2,256)from sys.partobj$
// partition table Large object
select a.dataobj#,b.tabfragobj#,a.name from sys.obj$a left join sys.lobfrag$b on(a.obj#=b.fragobj#)where(a.type#=40or a.type#=41)。
The above-mentioned method for acquiring the definition information of the specification table is explained only according to a specific example, and in an actual application scenario, a corresponding SQL statement can be written according to the type of the database, so as to acquire the definition information of the table.
In this embodiment, the offline dictionary includes at least one offline dictionary file, and each offline dictionary file corresponds to definition information of one object, that is, the offline dictionary may include definition information of at least one object. Since the definition information of an object may be modified in an actual application scenario, in order to distinguish the definition information of an object at different times, in the present embodiment, each offline dictionary file has at least one version of the offline dictionary file. In order to distinguish the offline dictionary files of different versions of the same object, in this embodiment, the log sequence number is configured as a version number of the offline dictionary file.
Specifically, when the object definition information is acquired, a current log serial number corresponding to the source-end database at the current time is acquired, where the current log serial number may be a maximum log serial number in a log file of the source-end database at the current time. The log sequence number is a numerical value automatically maintained by a database system, and has the characteristics of automatic increment and global uniqueness.
In an actual application scenario, because a large number of objects exist in a database system, an operation log is correspondingly generated when each object is operated, and meanwhile, the system automatically generates a log serial number to distinguish different operations. The log sequence number is used to represent one physical transaction generated inside the database system, and has global uniqueness in the log file of the database system to distinguish different physical transactions. Currently, most database management systems use LSN (Log sequence number, LSN) to represent a physical transaction generated inside the database system, for example, SQLSERVER, MYSQL, DB2, DM6, DM7, and other database management systems. However, in the ORACLE database, SCN (System Change Number, SCN for short) is used as the identifier for generating a physical transaction, and LSN in ORACLE is only used as a serial Number for log switching, so SCN can be used as the log serial Number in the ORACLE database.
In addition, when the types of the database systems are different, the mode of obtaining the log serial number is correspondingly different. For example, when the database system is an ORACLE database, the SCN is used as a log serial number, and can be queried through an SQL statement, specifically, a select dbms _ flash. When the database system is a DM7 database, the log sequence number LSN is used as the log sequence number, and the query can be performed through SQL statements, specifically, a select cur _ LSN from sys.v. $ rlog command can be used to obtain the LSN value (log sequence number). When the database system is a DM6 database, the log sequence number LSN is used as a log sequence number, and the query may be performed through an SQL statement, specifically, a SELECT GET _ LAST _ LSN ('database name') command may be used to obtain an LSN value (log sequence number). When the database system is a MYSQL database, the log sequence number LSN is used as a log sequence number, and the query can be performed through SQL statements, and specifically, a show binary logs command can be used to obtain an LSN value (log sequence number). When the database system is a POSTGRESQL database, the log sequence number LSN is used as the log sequence number, and query may be performed through an SQL statement, and specifically, a SELECT cast (pg _ current _ xlog _ insert _ location () -cast ('0/0' as pg _ LSN) as binary) command may be used to obtain the LSN value (log sequence number). When the database system is an SQLSERVER database, the log sequence number LSN is used as the log sequence number, and the query can be performed through an SQL statement, specifically, a select sys. When the database system is a DB2 database, the log serial number LSN is used as a log serial number, and can be acquired through an Api interface, and specifically, a DB2ReadLog function can be used to acquire an LSN value (log serial number).
The foregoing is only a list of simple description of the manner of obtaining the log serial number, and specifically, the log serial number may be obtained by using an SQL statement or an Api interface according to an actual application scenario, which is not described herein again.
Step 102: integrating the object definition information and the current log serial number into a first version offline dictionary file, and storing the first version offline dictionary file in a local disk.
In this embodiment, after the definition information of the object and the current log serial number are obtained, the definition information corresponding to the object is integrated into a first version offline dictionary file, and the offline dictionary file is stored in a local disk. The definition information of the object in the first version offline dictionary file can be used as the reference definition information of the object. When the object definition information is modified, the modified object definition information may be added to the first version offline dictionary file, and the log sequence number corresponding to the object modification operation is configured as the version number of the version to distinguish.
Specifically, the definition information of each object may be integrated into an offline dictionary file in txt text format. Since the number of the objects is multiple, and correspondingly, the number of the offline dictionary files in the offline dictionary is also multiple, in order to distinguish different offline dictionary files, in this embodiment, a unique file identification number is further allocated to the offline dictionary file corresponding to each object, and after the unique file identification number is allocated to the offline dictionary file corresponding to each object, the offline dictionary file is stored in the local disk. The file identification number is automatically allocated by the database system and has global uniqueness.
After the definition information corresponding to the object is integrated into an offline dictionary file, configuring the log serial number as the version number of the offline dictionary file, and finally storing the offline dictionary file with the configured version number in a local disk.
Step 103: and analyzing the data operation of the source database, and updating the offline dictionary file of the object corresponding to the data operation.
In this embodiment, when the source database performs data operation, the definition information of the object may be changed. In order to ensure the accuracy of the definition information of the object, the embodiment further needs to analyze the data operation of the database system, and update the offline dictionary file of the object corresponding to the data operation.
In particular, the data operation is a DDL operation, wherein the DDL operation includes creating an object, modifying an object, and deleting an object.
In an alternative embodiment, the data operation is a create object. Referring to fig. 3, step 103: analyzing the data operation of the source database, and updating the offline dictionary file of the object corresponding to the data operation comprises the following steps:
step 1131: judging and determining whether the object created by the source database is an object needing synchronization.
In this embodiment, the object created by the source database belongs to a newly added object, and the local existing offline dictionary does not include a corresponding offline dictionary file. First, it is determined whether an object created by the source database is an object that needs to be synchronized, and specifically, it may be determined whether the created object is an object that needs to be synchronized according to the SQL statement that acquires the reference object definition information. If the object needs to be synchronized, step 1132 is executed, and if the object does not need to be synchronized, step 1134 is executed.
Step 1132: and if the object needs to be synchronized, acquiring definition information of the created object and a log serial number corresponding to the operation of the created object.
In the database system, each transaction is generated, and a corresponding log serial number is correspondingly distributed so as to distinguish different operation logs. In this embodiment, first, definition information of a created object and a log serial number corresponding to an operation of the created object are acquired.
Step 1133: and integrating the definition information of the created object and the corresponding log serial number into a first version offline dictionary file, and storing the first version offline dictionary file in a local disk.
And integrating the definition information of the created object and the corresponding log serial number into a first version offline dictionary file, and storing the first version offline dictionary file in a local disk. Please refer to step 102 for a specific method for creating an offline dictionary file, which is not described herein again.
Step 1134: and if the synchronous object is not needed, the offline dictionary is not updated.
And if the synchronous object is not needed, not establishing the off-line dictionary file corresponding to the created object.
The embodiment mainly illustrates how to update the offline dictionary when creating an object, so as to ensure the accuracy and comprehensiveness of the object definition information stored in the offline dictionary.
In an alternative embodiment, the data operation is a modified object. Referring to fig. 4, step 103: analyzing the data operation of the source database, and updating the offline dictionary file of the object corresponding to the data operation comprises the following steps:
step 1231: and acquiring definition information of the modified object and a log serial number corresponding to the operation of the modified object.
In this embodiment, when an object is modified, definition information of the object is modified. For example, the object is a table, and in an actual application scenario, after a table name or a column name is modified, definition information of the object is changed accordingly. At this time, the definition information of the modified object and the log serial number corresponding to the operation of the modified object in the log file of the database system are obtained.
Step 1232: and integrating the definition information of the modified object and the corresponding log serial number into a next version offline dictionary file.
In this embodiment, when the offline dictionary file corresponding to the modified object is updated, the offline dictionary file of the previous version is not covered, but the updated offline dictionary file of the updated version is added to the end of the offline dictionary file corresponding to the object. In order to distinguish the offline dictionary files with different versions of the same object, the log sequence number is configured to be the version number of the updated offline dictionary file. In the log analysis, corresponding definition information can be obtained according to the version number of the offline dictionary file. The following will explain in detail how to obtain the corresponding definition information according to the version number of the offline dictionary file.
In an alternative embodiment, the data operation is a delete object. Referring to fig. 5, step 103: analyzing the data operation of the source database, and updating the offline dictionary file of the object corresponding to the data operation comprises the following steps:
step 1331: acquiring a log serial number corresponding to the operation of the deleted object;
in this embodiment, after an object is deleted, an offline dictionary file corresponding to the deleted object and a log serial number corresponding to the operation of the deleted object in a log file of a database system are obtained. In an optional embodiment, the special character is used to mark that the offline dictionary file has been deleted, so that an object to be deleted is determined according to the special character in a subsequent operation, and the offline dictionary file corresponding to the deleted object is removed from the local disk, so as to save disk space.
Step 1332: and judging and determining whether the log sequence number corresponding to the operation of the deleted object is smaller than the initial log sequence number.
Specifically, when the source database receives a data synchronization point of the destination database, the source database determines an initial log serial number according to the data synchronization point, and determines data to be synchronized according to the log serial number. Specifically, the starting log sequence number may be a maximum corresponding log sequence number in a log file of the source-end database, and the maximum log sequence number is set as the starting log sequence number, where the first synchronization request that the source-end database receives the destination-end database is received by the source-end database.
In this embodiment, the source database determines whether the log sequence number corresponding to the delete object operation is smaller than the start log sequence number, and if so, it indicates that the object has been deleted before data synchronization, that is, the deleted object does not need to be synchronized to the destination database, and then step 1333 is executed. If the data needed to be synchronized may exist from the synchronization point to the time point when the object is deleted, the offline dictionary file is reserved, and the offline dictionary file corresponding to the deleted object is deleted from the local disk after the data synchronization is completed.
Step 1333: and if the number of the deleted objects is smaller than the preset number, deleting the offline dictionary file corresponding to the deleted object from the local disk.
The foregoing describes in detail a process of updating an offline dictionary according to different data operations (DDL operations) to ensure the accuracy and comprehensiveness of offline dictionary files in the offline dictionary, and further provide accurate and comprehensive definition information of objects for log analysis or data synchronization.
Step 11: and acquiring an operation log in a log file of a source database.
In a specific application scenario, first, a data synchronization point of a destination database is obtained, and a start log sequence number is determined according to the data synchronization point, where the start log sequence number may be a maximum log sequence number corresponding to a log file of the destination database when the source database receives a first synchronization request of the destination database, and the maximum log sequence number is set as a start log sequence number.
Then, obtaining an operation log of which the log serial number is not less than the initial log serial number in the source database log file, and performing log analysis on the operation log meeting the conditions.
Step 12: and acquiring corresponding object definition information from the offline dictionary according to the log serial number of the operation log, and analyzing the operation log according to the acquired object definition information.
In this embodiment, since there are a large number of operation logs in the log file, and the operation object corresponding to each operation log may be different, the source database parses the operation logs to obtain the target operation object. In an alternative embodiment, the operation object name may be obtained, and then the corresponding offline dictionary file may be obtained from the offline dictionary according to the operation object name, so as to obtain the correct object definition information.
Specifically, as can be seen from the foregoing analysis, each object may correspond to multiple versions of the offline dictionary file, in order to determine the object definition corresponding to the operated object at the operation time point. In this embodiment, log sequence numbers corresponding to all versions of offline dictionary files containing definition information of the target operation object are obtained, the log sequence numbers of the operation logs are compared with the log sequence numbers corresponding to all versions of offline dictionary files, and the offline dictionary files of corresponding versions are determined based on a rule greater than the latest rule, so as to obtain the definition information of the target operation object. The reason that the log sequence number corresponding to the operation log is possibly greater than the log sequence numbers corresponding to the offline dictionary files of the multiple versions is based on the principle that the log sequence number is greater than the most recent principle, and the offline dictionary file which is greater than the log sequence number corresponding to one of the multiple versions and has the smallest difference value is selected as the target version offline dictionary file. Then, correct object definition information is acquired from the target version offline dictionary file.
To more clearly illustrate the aforementioned process of acquiring the correct definition information, a specific application scenario is first illustrated.
Now, it is assumed that the offline dictionary file corresponding to the target operation object includes two versions of offline dictionary files, and the log sequence number corresponding to the first version of offline dictionary file is l0After the database synchronization service is started, a defined modification operation is performed on the object, and then an offline dictionary file corresponding to definition information after the modification operation is performed is added after the first version offline dictionary file, that is, a second version offline dictionary file, where a log sequence number corresponding to the second version offline dictionary file is l1。
In the database synchronization process, the log serial number is used for identifying the data synchronization sequence, and when the source end database performs data synchronization, the initial log serial number is l0I.e. from0I.e. synchronizing the source database data from the beginning. Three operation logs are assumed in a source database log file, and the log sequence number corresponding to the first operation log is ll-1The log sequence number corresponding to the second operation log is ll1And the log sequence number corresponding to the third operation log is ll2And ll is-1<l0<ll1<l1<ll2。
When the data of the source end database is synchronized, comparing the log serial number corresponding to each operation log with the initial log serial number, because ll-1Identified oplogs are synchronized in the database (corresponding to l)0) Before, ll was not analyzed-1Carrying out corresponding operation; when the log analysis reaches ll1When due to l0<ll1<l1That is, the second operation log occurs at the time point corresponding to the first version of the offline dictionary file and the time point corresponding to the second version of the offline dictionary file, and l in the offline dictionary file is selected based on the rule greater than the latest rule0Analyzing the definition corresponding to the corresponding operation object; when the log analysis reaches ll2When due to l1<ll2That is, the third operation log occurs after the time point corresponding to the second version of the offline dictionary file, and is selected from the offline dictionary file based on the rule greater than the latest rule1And analyzing the definition corresponding to the object. Therefore, the offline dictionary file with the correct version can be selected in the offline dictionary according to the principle that the log serial number of the current operation log is larger than the latest principle so as to obtain the correct definition information.
Step 13: and sending the data obtained by log analysis to a destination database.
In this embodiment, the data obtained by performing log analysis is sent to the destination database, but it is needless to say that the operation logs and the object definition information corresponding to each operation log may also be sent to the destination database, and the destination database restores the operation logs into SQL statements for data synchronization. The design may be specifically designed according to actual conditions, and is not specifically limited herein.
In an actual application scenario, in order to reduce the data volume of the source end database synchronized to the destination end database, the synchronization efficiency is improved. In an optional embodiment, before data is synchronized, it is determined whether an object that needs to be synchronized has been deleted before a synchronization point, and if the object has been deleted, log analysis on a plurality of DML operation logs corresponding to the object is not performed during data synchronization, and meanwhile, data corresponding to the deleted object does not need to be synchronized to a destination database. This reduces the amount of data and improves the efficiency of the synchronization.
Specifically, if the log sequence number corresponding to the last version of the offline dictionary file of the object is less than the start log sequence number determined according to the synchronization point, it is determined that the object to be synchronized has been deleted before the synchronization point, and the data corresponding to the object does not need to be synchronized to the destination-side database.
Different from the prior art, when the data synchronization is carried out in a log analysis mode, the method firstly establishes the offline dictionary containing the object definition information, can access the offline dictionary during log analysis, and establishes the complementary analysis with the operation log only containing the local key information in the prior art, thereby not only ensuring the correctness of the object definition information in the operation log, but also avoiding the condition that the performance of the database is influenced by accessing the database in real time.
Example 2:
Referring to fig. 6, fig. 6 is a schematic structural diagram of an apparatus for data synchronization according to an embodiment of the present invention. The means for data synchronization of the present embodiment comprises one or more processors 61 and a memory 62. In fig. 6, one processor 61 is taken as an example.
The processor 61 and the memory 62 may be connected by a bus or other means, such as the bus connection in fig. 6.
The memory 62, which is a non-volatile computer-readable storage medium based on data synchronization, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the method for data synchronization in embodiment 1and corresponding program instructions. The processor 61 executes various functional applications of the method of performing data synchronization and data processing, that is, functions of the method of data synchronization of embodiment 1, by executing nonvolatile software programs, instructions, and modules stored in the memory 62.
The memory 62 may include, among other things, high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 62 may optionally include memory located remotely from the processor 61, and these remote memories may be connected to the processor 61 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Please refer to fig. 1 to 5 and the related text description for a data synchronization method, which will not be described again.
It should be noted that, for the information interaction, execution process and other contents between the modules and units in the apparatus and system, the specific contents may refer to the description in the embodiment of the method of the present invention because the same concept is used as the embodiment of the processing method of the present invention, and are not described herein again.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.