CN111104445A - Data synchronization method, device and equipment - Google Patents
- Publication number
- CN111104445A (application CN201911243594.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- target data
- incremental
- increment
- timestamp
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a data synchronization method, apparatus and device. The method comprises: acquiring first incremental data of a first database, inserting the first incremental data into a full-scale table in a data warehouse, and writing an incremental identifier and a timestamp corresponding to the first incremental data into the full-scale table; determining target data meeting preset conditions from the full-scale table according to the timestamp; and inserting the target data into an increment table in the data warehouse, and synchronizing the target data in the increment table to a second database. An ETL incremental data synchronization scheme is thereby realized for database products that support only insertion and do not support modification or deletion. This avoids the long run time of periodic full synchronization when the full database table is large, and can serve service scenarios with high real-time requirements.
Description
Technical Field
The present application relates to the field of data transmission technologies, and in particular, to a data synchronization method, apparatus and device.
Background
ETL (Extract-Transform-Load) describes the process of extracting data from a source database, transforming it, and loading it into a destination database. Existing ETL systems generally provide two data synchronization modes: full synchronization and incremental synchronization. Incremental synchronization requires the database to support data insertion, data modification and data deletion.
In the related art, some database products, for example those used in big data computation and analysis scenarios, support only insertion and do not support modification or deletion. ETL cannot perform incremental data synchronization for such databases, and when the full table of the database is large, full synchronization takes a long time and cannot meet real-time service requirements.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a data synchronization method that implements an ETL incremental data synchronization scheme for database products that do not support modification or deletion, and that meets real-time service requirements.
A second object of the present application is to provide a data synchronization apparatus.
A third object of the present application is to propose a computer device.
A fourth object of the present application is to propose a computer readable storage medium.
An embodiment of a first aspect of the present application provides a data synchronization method, including:
acquiring first incremental data of a first database, inserting the first incremental data into a full-scale table in a data warehouse, and writing an incremental identifier and a timestamp corresponding to the first incremental data into the full-scale table;
determining target data meeting preset conditions from the full-scale table according to the timestamp;
inserting the target data into an increment table in the data warehouse, and synchronizing the target data in the increment table to a second database.
In addition, the data synchronization method according to the above embodiment of the present application may further have the following additional technical features:
Optionally, the inserting the target data into the increment table in the data warehouse comprises: taking the order of the timestamps corresponding to the target data as the insertion order of the target data, and inserting the target data into the increment table in that order; the determining, from the full-scale table according to the timestamp, target data meeting a preset condition comprises: acquiring the historical time at which target data was last inserted into the increment table; and comparing the timestamp with the historical time, and determining second incremental data in the full-scale table whose timestamp is after the historical time as the target data.
Optionally, the determining, from the full-scale table according to the timestamp, target data meeting a preset condition further comprises: acquiring the historical time at which target data was last inserted into the increment table; comparing the timestamp with the historical time, and determining second incremental data in the full-scale table whose timestamp is after the historical time; and acquiring an identification field in the second incremental data, screening the second incremental data having the same identification field according to the timestamp, and retaining as the target data, among the second incremental data having the same identification field, the incremental data with the latest timestamp, together with the second incremental data whose identification field is not shared with any other record.
Optionally, inserting the target data into an increment table in the data warehouse, comprising: judging whether the data inserted last time in the increment table is successfully synchronized to a second database; and after the last inserted data of the increment table is successfully synchronized, emptying the data in the increment table and inserting the target data into the increment table.
Optionally, after inserting the target data into an increment table in the data warehouse, further comprising: and acquiring a timestamp corresponding to the target data inserted into the increment table, and updating and recording the historical time according to the latest timestamp.
Optionally, the inserting the target data into an increment table in the data warehouse includes: and removing the time stamp in the target data, and inserting the target data with the time stamp removed into the increment table.
An embodiment of a second aspect of the present application provides a data synchronization apparatus, including:
an acquisition module, configured to acquire first incremental data of a first database, insert the first incremental data into a full-scale table in a data warehouse, and write an incremental identifier and a timestamp corresponding to the first incremental data into the full-scale table;
a determining module, configured to determine target data meeting preset conditions from the full-scale table according to the timestamp;
a processing module, configured to insert the target data into an increment table in the data warehouse;
and a synchronization module, configured to synchronize the target data in the increment table to a second database.
In addition, the data synchronization device according to the above-mentioned embodiment of the present application may further have the following additional technical features:
Optionally, the processing module is specifically configured to: take the order of the timestamps corresponding to the target data as the insertion order of the target data, and insert the target data into the increment table in that order; the determining module is specifically configured to: acquire the historical time at which target data was last inserted into the increment table; and compare the timestamp with the historical time, and determine second incremental data in the full-scale table whose timestamp is after the historical time as the target data.
Optionally, the determining module is specifically configured to: acquire the historical time at which target data was last inserted into the increment table; compare the timestamp with the historical time, and determine second incremental data in the full-scale table whose timestamp is after the historical time; and acquire an identification field in the second incremental data, screen the second incremental data having the same identification field according to the timestamp, and retain as the target data, among the second incremental data having the same identification field, the incremental data with the latest timestamp, together with the second incremental data whose identification field is not shared with any other record.
Optionally, the processing module is specifically configured to: judge whether the data inserted last time in the increment table has been successfully synchronized to a second database; and after the data inserted last time in the increment table has been successfully synchronized, empty the data in the increment table and insert the target data into the increment table.
Optionally, the apparatus further comprises: and the recording module is used for acquiring the time stamp corresponding to the target data inserted into the increment table, and updating and recording the historical time according to the latest time stamp.
Optionally, the processing module is specifically configured to: and removing the time stamp in the target data, and inserting the target data with the time stamp removed into the increment table.
An embodiment of a third aspect of the present application provides a computer device, including a processor and a memory; wherein the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the data synchronization method according to the embodiment of the first aspect.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the data synchronization method according to the embodiment of the first aspect.
One embodiment in the above application has the following advantages or benefits: first incremental data of the first database is acquired and inserted into the full-scale table in the data warehouse, and the incremental identifier and the timestamp corresponding to the first incremental data are written into the full-scale table. Target data meeting preset conditions is then determined from the full-scale table according to the timestamp. The target data is then inserted into an increment table in the data warehouse, and the target data in the increment table is synchronized to the second database. An ETL incremental data synchronization scheme is thereby realized for database products that support only insertion and do not support modification or deletion. This avoids the long run time of periodic full synchronization when the full database table is large, and can serve service scenarios with high real-time requirements.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
Fig. 1 is a schematic flowchart of a data synchronization method according to an embodiment of the present application;
Fig. 2 is a diagram of a data synchronization scenario;
Fig. 3 is a schematic flowchart of another data synchronization method according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a data synchronization scenario provided in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a data synchronization apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of another data synchronization apparatus according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes a data synchronization method, apparatus, and device according to an embodiment of the present application with reference to the drawings.
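Fig. 1 is a schematic flowchart of a data synchronization method according to an embodiment of the present application, and as shown in Fig. 1, the method includes:
Step 101, acquiring first incremental data of a first database, inserting the first incremental data into a full-scale table in a data warehouse, and writing an incremental identifier and a timestamp corresponding to the first incremental data into the full-scale table.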
The data synchronization method of the embodiment of the application can be applied to data synchronization between databases. For example, referring to the three databases shown in Fig. 2, data to be synchronized is acquired from database 1 and finally synchronized to database 2 through a data warehouse, where the data warehouse is used for data aggregation and analysis. Data synchronization includes full synchronization and incremental synchronization, and incremental synchronization requires the database to support insertion, modification and deletion of data. However, database products for big data computation and analysis scenarios in the related art, such as the data warehouse in the data aggregation and analysis scenario above, support only insertion and do not support modification or deletion, and therefore cannot support incremental data synchronization.
In this embodiment, when performing data synchronization in the data aggregation and analysis scenario, the first incremental data of the first database may be obtained and inserted into the full-scale table in the data warehouse. An incremental identifier and a timestamp corresponding to the first incremental data are written into the full-scale table, where the incremental identifier includes an insertion identifier, a deletion identifier and a modification identifier.
As an example, first incremental data is obtained from a first database through incremental extraction such as dynamic Change Data Capture (CDC), the first incremental data is output to a full-scale table in a data warehouse, and an incremental identifier and a timestamp field are written into the full-scale table when the first incremental data is output, wherein the data warehouse is used for data aggregation analysis, and the timestamp can comprise the time when the data in the first database is changed such as insertion, modification, deletion and the like, and can also comprise the time when the first incremental data is written into the full-scale table.
It should be noted that the manner of incremental extraction includes, but is not limited to, CDC, log increment, trigger, timestamp increment, etc., and is not limited herein.
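The write path described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the `FullTable` and `FullTableRow` names, the in-memory list, and the sample timestamps are assumptions introduced here. The point it shows is that a deletion or modification in the source is recorded as a new appended row carrying a `D` or `U` identifier, so the table itself only ever needs to support insertion.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

VALID_FLAGS = {"I", "D", "U"}  # insertion / deletion / modification identifiers


@dataclass(frozen=True)
class FullTableRow:
    id: int     # primary key carried over from the source table
    name: str   # data content
    flag: str   # incremental identifier: "I", "D" or "U"
    time: str   # timestamp, fixed-width "YYYYMMDD HH:MM:SS" (assumed format)


class FullTable:
    """Insert-only stand-in for the full-scale table (hypothetical class)."""

    def __init__(self) -> None:
        self._rows: List[FullTableRow] = []

    def append_changes(self, changes: Iterable[Tuple[int, str, str, str]]) -> int:
        """Append captured changes; existing rows are never modified or deleted."""
        added = 0
        for id_, name, flag, time in changes:
            if flag not in VALID_FLAGS:
                raise ValueError(f"unknown incremental identifier: {flag!r}")
            self._rows.append(FullTableRow(id_, name, flag, time))
            added += 1
        return added

    def rows(self) -> List[FullTableRow]:
        """Return a copy so callers cannot mutate the stored rows."""
        return list(self._rows)


# Hypothetical sample: data 1 is inserted, then later deleted in the source.
table = FullTable()
table.append_changes([(1, "a", "I", "20190720 07:20:14")])
table.append_changes([(1, "a", "D", "20190720 07:21:20")])
```

Both changes remain in the table as separate rows, which is what later lets the timestamp comparison pick out only the changes that have not yet been synchronized.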
Step 102, determining target data meeting preset conditions from the full-scale table according to the timestamp.
In this embodiment, an incremental synchronization condition may be preset, and target data meeting the incremental synchronization condition may be determined from all data stored in the full-scale table to further output the target data, without synchronizing all data in the full-scale table each time.
In one embodiment of the present application, determining target data satisfying a preset condition from the full-scale table according to the timestamp includes: acquiring the historical time at which target data was last inserted into the increment table, comparing the timestamp with the historical time, and determining second incremental data in the full-scale table whose timestamp is after the historical time as the target data. The latest historical time is updated and recorded after each round of processing; specifically, the timestamps corresponding to the target data inserted into the increment table are acquired, and the historical time is updated and recorded according to the latest of these timestamps. For example, if data 1 and data 2 in the full-scale table were synchronized to the increment table last time, with timestamps 6:00 and 7:00 respectively, the historical time is determined to be 7:00. If data 3 with timestamp 9:00 is then acquired, data 3 is determined to be the target data of the current data synchronization according to its timestamp and the historical time.
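The historical-time comparison in this embodiment can be sketched as below, using the 6:00 / 7:00 / 9:00 example from the text. The function names, the tuple layout `(ID, NAME, FLAG, TIME)`, the `I` flags and the zero-padded `HH:MM` strings are assumptions made for illustration; fixed-width timestamp strings are used so that string comparison matches chronological order.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str, str, str]  # (ID, NAME, FLAG, TIME)


def select_target_data(full_table: List[Row], history_time: Optional[str]) -> List[Row]:
    """Return rows whose timestamp is strictly after the historical time, oldest first."""
    newer = [row for row in full_table if history_time is None or row[3] > history_time]
    return sorted(newer, key=lambda row: row[3])


def updated_history_time(history_time: Optional[str], target_data: List[Row]) -> Optional[str]:
    """Latest timestamp among the inserted target data, or the old value if none."""
    return max((row[3] for row in target_data), default=history_time)


# Data 1 and data 2 were synchronized last time, so the historical time is 07:00.
full_table = [
    (1, "data 1", "I", "06:00"),
    (2, "data 2", "I", "07:00"),
    (3, "data 3", "I", "09:00"),
]
target = select_target_data(full_table, "07:00")
print(target)                                  # [(3, 'data 3', 'I', '09:00')]
print(updated_history_time("07:00", target))   # 09:00
```

Passing `None` as the historical time selects every row, which corresponds to the first run, when nothing has been inserted into the increment table yet.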
In one embodiment of the present application, determining target data satisfying a preset condition from the full-scale table according to the timestamp includes: acquiring preset time period information, comparing the timestamp with the time period information, and determining data in the full-scale table whose timestamp falls in the current time period as the target data. For example, the preset time period information includes 0:00-12:00 and 12:00-24:00; if the time period corresponding to the current data synchronization is 0:00-12:00, the data in the full-scale table whose timestamp falls in that period is determined as the target data.
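The time-period variant can be sketched in the same spirit. The two periods come from the example in the text; treating each period as half-open (start included, end excluded) and restricting it to the current calendar day are assumptions of this sketch, as are the function names and the sample rows.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Row = Tuple[int, str, str, datetime]  # (ID, NAME, FLAG, TIME)

# Preset periods in hours: 0:00-12:00 and 12:00-24:00.
PERIODS_HOURS = [(0, 12), (12, 24)]


def current_period(now: datetime) -> Tuple[datetime, datetime]:
    """Return the [start, end) bounds of the preset period containing `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    for start_h, end_h in PERIODS_HOURS:
        if start_h <= now.hour < end_h:
            return midnight + timedelta(hours=start_h), midnight + timedelta(hours=end_h)
    raise ValueError("PERIODS_HOURS does not cover the whole day")


def select_in_period(full_table: List[Row], now: datetime) -> List[Row]:
    """Return rows whose timestamp falls in the period of the current synchronization."""
    start, end = current_period(now)
    return [row for row in full_table if start <= row[3] < end]


# Hypothetical rows: one change in the morning, one in the afternoon.
rows = [
    (1, "a", "I", datetime(2019, 7, 20, 7, 20, 16)),
    (2, "b", "I", datetime(2019, 7, 20, 13, 0, 0)),
]
morning = select_in_period(rows, datetime(2019, 7, 20, 9, 0, 0))
```

With the synchronization running at 9:00, only the 7:20:16 row is selected; the 13:00 row waits for the 12:00-24:00 period.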
Step 103, inserting the target data carrying the incremental identifier into an increment table in the data warehouse, and synchronizing the target data in the increment table to a second database.
In this embodiment, an increment table may be preset in the data warehouse, and the target data may be inserted from the full-scale table into the preset increment table in the data warehouse, where the target data carries an incremental identifier. As a possible implementation, inserting the target data into the increment table preset in the data warehouse includes: clearing the increment table, and inserting the target data into the cleared increment table. Optionally, after the second incremental data in the full-scale table whose timestamp is after the historical time is determined as the target data, the insertion order of the target data is determined according to the order of the timestamps corresponding to the target data, and the target data is inserted into the increment table in that order. For example, if the timestamps corresponding to target data A and B are 7:00 and 8:00 respectively, target data A is inserted into the increment table first, and then target data B is inserted.
As one example, the increment table may be extracted to obtain the target data. Specifically, because the data warehouse does not support incremental synchronization, full-table data extraction can be performed on the increment table in the full synchronization mode to obtain the target data. The target data is then processed by simulating increments: the write type of each piece of target data is determined from its incremental identifier, the target data is converted into the incremental input data format according to that write type, and the existing incremental input data processing mechanism of the ETL framework processes it and outputs insert/modify/delete Structured Query Language (SQL) statements that the second database can execute. The SQL statements are output to the second database in the recorded order to insert, modify and delete data in the second database, thereby achieving incremental data synchronization across the first database, the data warehouse and the second database in the data aggregation scenario.
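The incremental simulation step can be illustrated with a small sketch that reads every row of the increment table and turns each incremental identifier into the matching SQL statement. It is an assumption-laden stand-in, not the ETL framework's own mechanism: an in-memory SQLite database plays the second database, the table name `table2`, the function name `apply_increment`, and the contents `b` and `c` are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple

IncrementRow = Tuple[int, str, str]  # (ID, NAME, FLAG)


def apply_increment(conn: sqlite3.Connection, rows: Iterable[IncrementRow]) -> None:
    """Replay increment-table rows, in order, as insert/modify/delete statements."""
    with conn:  # commits on success, rolls back if any row fails
        for id_, name, flag in rows:
            if flag == "I":
                conn.execute("INSERT INTO table2 (id, name) VALUES (?, ?)", (id_, name))
            elif flag == "U":
                conn.execute("UPDATE table2 SET name = ? WHERE id = ?", (name, id_))
            elif flag == "D":
                conn.execute("DELETE FROM table2 WHERE id = ?", (id_,))
            else:
                raise ValueError(f"unknown incremental identifier: {flag!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT)")

# First round: three inserted records. Second round: delete 1, modify 3, insert 4.
apply_increment(conn, [(1, "a", "I"), (2, "b", "I"), (3, "c", "I")])
apply_increment(conn, [(1, "a", "D"), (3, "x", "U"), (4, "y", "I")])

print(conn.execute("SELECT id, name FROM table2 ORDER BY id").fetchall())
# [(2, 'b'), (3, 'x'), (4, 'y')]
```

Applying each batch inside one transaction is a design choice of this sketch: if a row carries an unknown identifier, the whole batch is rolled back, so the second database never holds a partially applied round.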
According to the data synchronization method of this embodiment, the first incremental data of the first database is acquired and inserted into the full-scale table in the data warehouse, and the incremental identifier and the timestamp corresponding to the first incremental data are written into the full-scale table. Target data meeting preset conditions is then determined from the full-scale table according to the timestamp. The target data is then inserted into an increment table in the data warehouse, and the target data in the increment table is synchronized to the second database. An ETL incremental data synchronization scheme is thereby realized for database products that support only insertion and do not support modification or deletion. This avoids the long run time of periodic full synchronization when the full database table is large, and can serve service scenarios with high real-time requirements.
Based on the foregoing embodiment, further, in the foregoing data synchronization scenario, a data synchronization period may be set according to a service requirement, and there may be a case where the same piece of data changes multiple times within one period, so that after the second incremental data of the timestamp after the historical time is determined in the full-scale table according to matching between the timestamp and the historical time, the second incremental data may be further filtered according to the timestamp and the identification field.
Fig. 3 is a schematic flowchart of another data synchronization method according to an embodiment of the present application, and as shown in fig. 3, the method includes:
Optionally, when the first incremental data is output to the full-scale table in the data warehouse, the output order of the first incremental data is preserved so that the timestamp fields of the data in the full-scale table increase in sequence. The synchronization process may use real-time synchronization or periodic scheduled synchronization according to service requirements, which is not limited herein.
In this embodiment, the first incremental data may also carry a primary key field (ID field/identification field) in the first database table.
Step 303, comparing the timestamp with the historical time, and determining second incremental data in the full-scale table whose timestamp is after the historical time.
In this embodiment, the incremental identifier includes an insertion identifier, a deletion identifier, and a modification identifier. Since there may be multiple changes to the data of the same ID field, in some scenarios, the second delta data may be filtered according to the timestamp and the identification field. Specifically, an identification field in the second incremental data is obtained, the second incremental data with the same identification field is screened according to the timestamp, and the incremental data with the latest timestamp in the second incremental data with the same identification field and the incremental data without the same identification field in the second incremental data are reserved as target data.
As an example, the second incremental data includes, in timestamp order, [1, a, modification identifier] and [1, b, modification identifier]; that is, the changes to the data whose ID field is 1 are a modification to content a followed by a modification to content b. In some scenarios, the second incremental data may be filtered according to the timestamp and the ID field, and [1, b, modification identifier] is used as the target data.
As another example, the second incremental data includes, in timestamp order, [1, a, modification identifier], [1, a, deletion identifier] and [2, c, modification identifier]; that is, the changes to the data whose ID field is 1 are a modification to content a followed by a deletion of content a. In some scenarios, the second incremental data may be filtered according to the timestamp and the ID field, and [1, a, deletion identifier] and [2, c, modification identifier] are retained as the target data. The processing amount of data synchronization is thereby reduced and the processing efficiency improved.
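The screening by identification field and timestamp can be sketched as follows. The function name and the tuple layout `(ID, NAME, FLAG, TIME)` are assumptions for illustration; the sample rows are the ones given later in the practical example, where the record with ID 3 is modified twice within one synchronization period.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, str, str]  # (ID, NAME, FLAG, TIME)


def keep_latest_per_id(second_incremental_data: List[Row]) -> List[Row]:
    """For each ID keep only the row with the latest timestamp, ordered by timestamp."""
    latest: Dict[int, Row] = {}
    for row in second_incremental_data:
        id_, time = row[0], row[3]
        # On equal timestamps the later row in input order wins (assumption).
        if id_ not in latest or time >= latest[id_][3]:
            latest[id_] = row
    return sorted(latest.values(), key=lambda row: row[3])


second_incremental_data = [
    (1, "a", "D", "20190720 07:21:20"),
    (3, "x", "U", "20190720 07:21:22"),
    (4, "y", "I", "20190720 07:21:25"),
    (3, "t", "U", "20190720 07:25:25"),
]
for row in keep_latest_per_id(second_incremental_data):
    print(row)
# (1, 'a', 'D', '20190720 07:21:20')
# (4, 'y', 'I', '20190720 07:21:25')
# (3, 't', 'U', '20190720 07:25:25')
```

The earlier modification of ID 3 to `x` is dropped, and the rows with unique IDs pass through unchanged, so three rows are synchronized instead of four.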
In this embodiment, because the target data is output from the full-scale table to the increment table by overwriting, before outputting the data in the increment table of the data warehouse to the second database, it must also be determined whether the increment table can currently be overwritten. Specifically, it is determined whether the data inserted last time in the increment table has been successfully synchronized to the second database. If so, the increment table can be overwritten: the data in the increment table is cleared and the target data is inserted into the increment table.
As a possible implementation manner, the target data is extracted according to the increment table, the target data is synchronized to the second database according to the increment identifier, if the second database is successfully synchronized, the second database sends a signal that the data synchronization is successful to the data warehouse, and if the second database is unsuccessfully synchronized, the second database sends a signal that the data synchronization is unsuccessful to the data warehouse. And the data warehouse judges whether the data inserted last time in the increment table is successfully synchronized to the second database according to the received signal.
As another possible implementation manner, a database identification field is set in the data warehouse in advance, and if the synchronization to the second database is known to be successful, the data warehouse updates the database identification field. Whether the data inserted last time in the increment table was successfully synchronized to the second database is then judged according to the database identification field.
Optionally, if it is known that the data inserted last time in the increment table is not successfully synchronized to the second database, the data in the current increment table is retained until the data inserted last time in the increment table is successfully synchronized.
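The overwrite guard described in the preceding paragraphs can be sketched as a small state holder. The `IncrementTable` class, its `synced` attribute (standing in for the success signal or the database identification field) and the method names are hypothetical; the sketch only shows the rule that the increment table is cleared and refilled after the previous batch is confirmed as synchronized, and is otherwise retained.

```python
from typing import List, Tuple

Row = Tuple[int, str, str]  # (ID, NAME, FLAG)


class IncrementTable:
    """Stand-in for the increment table plus its sync-status marker (hypothetical)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.synced = True  # nothing is pending before the first batch

    def try_overwrite(self, target_data: List[Row]) -> bool:
        """Clear and refill the table only if the previous batch was synchronized."""
        if not self.synced:
            return False  # retain the current rows until they are synchronized
        self.rows = list(target_data)
        self.synced = False
        return True

    def mark_synced(self) -> None:
        """Record that the second database reported a successful synchronization."""
        self.synced = True


increment = IncrementTable()
first = increment.try_overwrite([(1, "a", "I")])    # accepted
second = increment.try_overwrite([(1, "a", "D")])   # refused: batch 1 still pending
increment.mark_synced()
third = increment.try_overwrite([(1, "a", "D")])    # accepted after confirmation
```

Refusing the overwrite rather than queuing the new batch keeps the sketch close to the text; the rejected target data stays in the full-scale table and is selected again on the next round because the historical time has not advanced.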
Step 307, extracting target data from the increment table, and synchronizing the target data in the increment table to a second database.
The explanation of step 103 in the foregoing embodiment is also applicable to step 306 and step 307, and is not described here again.
According to the data synchronization method of this embodiment, the second incremental data whose timestamp is after the historical time can be determined from the full-scale table according to the timestamp, the second incremental data is screened according to the identification field and the timestamp, and the target data meeting the preset conditions is determined. The processing amount of data synchronization is thereby further reduced and the processing efficiency improved. In addition, the data in the increment table is emptied and the target data inserted only after the data inserted last time in the increment table has been successfully synchronized to the second database, which improves the stability of the system.
The following is an example with reference to a practical application scenario.
The fields of table 1 are: primary key field ID and data content NAME.
The fields of the full-scale table are: ID, NAME, incremental identifier FLAG (I/D/U, indicating that the record is an insert/delete/modify operation), and timestamp TIME.
The fields of the increment table are: ID, NAME and FLAG.
The fields of table 2 are: ID and NAME.
Specifically, referring to the above tables and Fig. 4, when data changes for the first time, three pieces of first incremental data 1, 2 and 3 are newly added to table 1 of the first database; they are inserted into the full-scale table of the data warehouse, and the insertion identifier I and the timestamp are written. The data is then output in sequence to the increment table of the data warehouse, and the historical time 20190720 07:20:16 is recorded. Full-table data extraction is performed on the increment table in the full synchronization mode, and data synchronization of table 2 of the second database is realized through incremental simulation. As one example, the data warehouse includes, but is not limited to, a MaxCompute database, an EMR database, and the like.
When the data changes for the second time, data 1 is deleted from table 1 of the first database, the content of data 3 is modified, and data 4 is added; the resulting first incremental data is inserted into the full-scale table of the data warehouse. The historical time and the timestamp of each piece of data in the full-scale table are then acquired, the second incremental data after the historical time is determined as the target data, the increment table is emptied by overwriting, the target data is output in sequence to the increment table of the data warehouse, and the historical time is recorded as 20190720 07:21:25. Full-table data extraction is performed on the increment table in the full synchronization mode, and data synchronization of table 2 of the second database is realized through incremental simulation. It should be noted that, in the data synchronization method according to the embodiment of the present application, the target data includes an identification field, data content, an incremental identifier and a timestamp. The target data may be inserted into the increment table directly, or the timestamp may be removed first and the target data without the timestamp inserted into the increment table, which is not limited herein.
As another example, with reference to the following table,
in this example, a synchronization period is preset according to the data synchronization scenario, so data synchronization is not performed when the data changes for the second time. When the data changes for the third time, the data with an ID of 3 is modified again in table 1 of the first database, and periodic data synchronization is performed at this point. The historical time and the timestamp of each piece of data in the full-scale table are acquired, and the second incremental data whose timestamp is after the historical time is determined. The second incremental data is then screened according to the identification field and the timestamp, and (1, a, D, 20190720 07:21:20), (4, y, I, 20190720 07:21:25) and (3, t, U, 20190720 07:25:25) are retained as the target data. The increment table is emptied by overwriting, the target data is output in sequence to the increment table of the data warehouse, and the historical time is recorded as 20190720 07:25:25. Full-table data extraction is performed on the increment table in the full synchronization mode, and data synchronization of table 2 of the second database is realized through incremental simulation.
As yet another example, with reference to the following table,
in this example, data synchronization is not performed when the data changes for the second time. When the data changes for the third time, the timestamps are matched against the historical time, and the records in the full-scale table whose timestamps are after the historical time, namely (1, a, D, 20190720 07:21:20), (3, x, U, 20190720 07:21:22), (4, y, I, 20190720 07:21:25), and (3, t, U, 20190720 07:25), are determined as the target data. The chronological order of the timestamps is used as the insertion order, and the target data is inserted into the increment table in the data warehouse in that order. Full-table data extraction is then performed on the increment table in the full data synchronization mode, so that data synchronization of the second database table 2 is achieved by simulating incremental synchronization. In this way, incremental ETL data synchronization is achieved for database products that support only insertion and not modification or deletion, the problem that full periodic synchronization takes a long time because the full database table holds a large amount of data is solved, and service scenarios with high real-time requirements can be met.
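When every change is kept, as in this example, the order of the rows matters because two rows refer to the same ID. The following Python sketch illustrates one way a consumer of the increment table could replay the rows in timestamp order. It is an assumption-laden illustration: the `replay` function, the dictionary standing in for the second database table 2, its initial contents, and the full timestamp 07:25:25 used for the last row (borrowed from the preceding example) are not specified by this description.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# (ID, data content, increment identifier, timestamp)
Row = Tuple[int, str, str, datetime]


def ts(text: str) -> datetime:
    return datetime.strptime(text, "%Y%m%d %H:%M:%S")


def replay(target: Dict[int, str], rows: List[Row]) -> None:
    """Apply change rows to `target` in chronological order."""
    for row_id, content, op, _ in sorted(rows, key=lambda r: r[3]):
        if op == "D":
            target.pop(row_id, None)      # delete if present
        elif op in ("I", "U"):
            target[row_id] = content      # insert or update
        else:
            raise ValueError(f"unknown increment identifier: {op!r}")


# Hypothetical state of the second database table before synchronization.
second_db = {1: "a", 3: "c"}

# Rows deliberately listed out of order to show that sorting restores it.
increment_table = [
    (3, "t", "U", ts("20190720 07:25:25")),
    (1, "a", "D", ts("20190720 07:21:20")),
    (4, "y", "I", ts("20190720 07:21:25")),
    (3, "x", "U", ts("20190720 07:21:22")),
]
replay(second_db, increment_table)
```

If the two updates of ID 3 were applied in the opposite order, the table would end with the stale content `x`, which is why the insertion order follows the timestamps.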
In order to implement the above embodiments, the present application further provides a data synchronization apparatus.
Fig. 5 is a schematic structural diagram of a data synchronization apparatus according to an embodiment of the present application. As shown in Fig. 5, the apparatus includes an obtaining module 10, a determining module 20, a processing module 30, and a synchronization module 40.
The obtaining module 10 is configured to obtain first incremental data of the first database, insert the first incremental data into a full-scale table in the data warehouse, and write an increment identifier and a timestamp corresponding to the first incremental data into the full-scale table.
The determining module 20 is configured to determine, according to the timestamp, target data meeting a preset condition from the full-scale table.
The processing module 30 is configured to insert the target data carrying the increment identifier into an increment table in the data warehouse.
The synchronization module 40 is configured to synchronize the target data in the increment table to the second database.
In an embodiment of the present application, the determining module 20 is specifically configured to: obtain the historical time at which target data was last inserted into the increment table; and match the timestamps against the historical time, and determine the second incremental data in the full-scale table whose timestamps are after the historical time as the target data. The processing module 30 is specifically configured to: take the order of the timestamps corresponding to the target data as the insertion order of the target data, and insert the target data into the increment table in that order.
In an embodiment of the present application, the determining module 20 is specifically configured to: obtain the historical time at which target data was last inserted into the increment table; match the timestamps against the historical time, and determine the second incremental data in the full-scale table whose timestamps are after the historical time; and obtain the identification field in the second incremental data, screen the second incremental data having the same identification field according to the timestamp, and retain, as the target data, the incremental data with the latest timestamp among the second incremental data having the same identification field together with the incremental data in the second incremental data that does not share an identification field with any other record.
On the basis of Fig. 5, the data synchronization apparatus shown in Fig. 6 further includes a recording module 50.
The recording module 50 is configured to obtain a timestamp corresponding to the target data inserted into the increment table, and update and record the historical time according to the latest timestamp.
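The behaviour of the recording module can be illustrated with a short Python sketch. The function name `update_history_time` is hypothetical, and the rule that the historical time never moves backwards (and stays unchanged when nothing was inserted) is an assumption added here for safety rather than something stated in this description.

```python
from datetime import datetime
from typing import Iterable, Optional


def update_history_time(
    history_time: Optional[datetime],
    inserted_timestamps: Iterable[datetime],
) -> Optional[datetime]:
    """Return the new historical time: the latest timestamp among the rows
    just inserted into the increment table, never earlier than the old one."""
    candidates = list(inserted_timestamps)
    if history_time is not None:
        candidates.append(history_time)
    return max(candidates) if candidates else None


old = datetime(2019, 7, 20, 7, 21, 25)
new = update_history_time(
    old,
    [datetime(2019, 7, 20, 7, 21, 20), datetime(2019, 7, 20, 7, 25, 25)],
)
```

The next synchronization cycle then selects only rows whose timestamp is after `new`, so rows that were already written to the increment table are not selected again.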
In an embodiment of the present application, the processing module 30 is specifically configured to: determine whether the data last inserted into the increment table has been successfully synchronized to the second database; and, after the data last inserted into the increment table has been successfully synchronized, empty the data in the increment table and insert the target data into the increment table.
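The check described above guards against discarding a batch that the second database has not yet received. A minimal Python sketch of such a guard is shown below; the `IncrementTable` class, its `last_batch_synced` flag, and the `mark_synced` and `load` methods are hypothetical names, and how synchronization success is actually detected is outside the scope of this sketch.

```python
from typing import List, Tuple

# (ID, data content, increment identifier); timestamp already removed
Row = Tuple[int, str, str]


class IncrementTable:
    """In-memory stand-in for the increment table in the data warehouse."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        # An empty table has no pending batch, so it starts as "synced".
        self.last_batch_synced = True

    def mark_synced(self) -> None:
        """Called once the current batch has reached the second database."""
        self.last_batch_synced = True

    def load(self, target_data: List[Row]) -> bool:
        """Overwrite the table with new target data.

        Returns False and leaves the table untouched if the previous
        batch has not been synchronized yet."""
        if not self.last_batch_synced:
            return False
        self.rows = list(target_data)  # empty and insert in one overwrite
        self.last_batch_synced = False
        return True


table = IncrementTable()
first_ok = table.load([(1, "a", "D"), (3, "x", "U")])
second_ok = table.load([(4, "y", "I")])   # rejected: first batch still pending
table.mark_synced()
third_ok = table.load([(4, "y", "I")])    # accepted after the sync succeeded
```

A caller that receives `False` would typically retry in the next synchronization period, leaving the historical time unchanged so that the skipped rows are selected again.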
It should be noted that the explanation of the data synchronization method in the foregoing embodiment is also applicable to the data synchronization apparatus in this embodiment, and is not repeated herein.
With the data synchronization apparatus of this embodiment, the first incremental data of the first database is obtained and inserted into the full-scale table in the data warehouse, and the increment identifier and timestamp corresponding to the first incremental data are written into the full-scale table. Target data meeting the preset condition is then determined from the full-scale table according to the timestamp. The target data carrying the increment identifier is inserted into the increment table in the data warehouse, and the target data in the increment table is synchronized to the second database. In this way, incremental ETL data synchronization is achieved for database products that support only insertion and not modification or deletion, the problem that full periodic synchronization takes a long time because the full database table holds a large amount of data is solved, and service scenarios with high real-time requirements can be met.
In order to implement the above embodiments, the present application also provides a computer device, including a processor and a memory; wherein the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the data synchronization method according to any of the foregoing embodiments.
In order to implement the foregoing embodiments, the present application also proposes a computer program product, wherein instructions of the computer program product, when executed by a processor, implement the data synchronization method according to any of the foregoing embodiments.
In order to implement the above embodiments, the present application also proposes a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the data synchronization method according to any of the preceding embodiments.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, such uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided that they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (14)
1. A method of data synchronization, comprising:
acquiring first incremental data of a first database, inserting the first incremental data into a full-scale table in a data warehouse, and writing an increment identifier and a timestamp corresponding to the first incremental data into the full-scale table;
determining, according to the timestamp, target data meeting a preset condition from the full-scale table;
inserting the target data into an increment table in the data warehouse, and synchronizing the target data in the increment table to a second database.
2. The method of claim 1, wherein the determining, according to the timestamp, target data meeting a preset condition from the full-scale table comprises:
acquiring the historical time at which target data was last inserted into the increment table;
matching the timestamp with the historical time, and determining second incremental data in the full-scale table whose timestamp is after the historical time;
and acquiring an identification field in the second incremental data, screening the second incremental data having the same identification field according to the timestamp, and retaining, as the target data, the incremental data with the latest timestamp among the second incremental data having the same identification field and the incremental data in the second incremental data that does not share an identification field with any other record.
3. The method of claim 1, wherein the determining, according to the timestamp, target data meeting a preset condition from the full-scale table comprises:
acquiring the historical time at which target data was last inserted into the increment table;
matching the timestamp with the historical time, and determining second incremental data in the full-scale table whose timestamp is after the historical time as the target data;
the inserting the target data into an increment table in the data warehouse comprises:
and taking the order of the timestamps corresponding to the target data as the insertion order of the target data, and inserting the target data into the increment table in the data warehouse according to the insertion order.
4. The method of claim 2 or 3, wherein the inserting the target data into an increment table in the data warehouse comprises:
judging whether the data last inserted into the increment table has been successfully synchronized to the second database;
and after the data last inserted into the increment table has been successfully synchronized, emptying the data in the increment table and inserting the target data into the increment table.
5. The method of claim 4, further comprising, after the inserting the target data into an increment table in the data warehouse:
acquiring a timestamp corresponding to the target data inserted into the increment table, and updating and recording the historical time according to the latest timestamp.
6. The method of claim 1, wherein the inserting the target data into an increment table in the data warehouse comprises:
removing the timestamp from the target data, and inserting the target data with the timestamp removed into the increment table.
7. A data synchronization apparatus, comprising:
an obtaining module, configured to acquire first incremental data of a first database, insert the first incremental data into a full-scale table in a data warehouse, and write an increment identifier and a timestamp corresponding to the first incremental data into the full-scale table;
a determining module, configured to determine, according to the timestamp, target data meeting a preset condition from the full-scale table;
a processing module, configured to insert the target data into an increment table in the data warehouse;
and a synchronization module, configured to synchronize the target data in the increment table to a second database.
8. The apparatus of claim 7, wherein the determining module is specifically configured to:
acquire the historical time at which target data was last inserted into the increment table;
match the timestamp with the historical time, and determine second incremental data in the full-scale table whose timestamp is after the historical time;
and acquire an identification field in the second incremental data, screen the second incremental data having the same identification field according to the timestamp, and retain, as the target data, the incremental data with the latest timestamp among the second incremental data having the same identification field and the incremental data in the second incremental data that does not share an identification field with any other record.
9. The apparatus of claim 7, wherein the processing module is specifically configured to:
take the order of the timestamps corresponding to the target data as the insertion order of the target data, and insert the target data into the increment table according to the insertion order;
the determining module is specifically configured to:
acquire the historical time at which target data was last inserted into the increment table;
and match the timestamp with the historical time, and determine second incremental data in the full-scale table whose timestamp is after the historical time as the target data.
10. The apparatus of claim 8 or 9, wherein the processing module is specifically configured to:
judge whether the data last inserted into the increment table has been successfully synchronized to the second database;
and after the data last inserted into the increment table has been successfully synchronized, empty the data in the increment table and insert the target data into the increment table.
11. The apparatus of claim 10, further comprising:
a recording module, configured to acquire the timestamp corresponding to the target data inserted into the increment table, and update and record the historical time according to the latest timestamp.
12. The apparatus of claim 7, wherein the processing module is specifically configured to:
remove the timestamp from the target data, and insert the target data with the timestamp removed into the increment table.
13. A computer device comprising a processor and a memory;
wherein the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory for implementing the data synchronization method according to any one of claims 1 to 6.
14. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the data synchronization method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911243594.8A CN111104445A (en) | 2019-12-06 | 2019-12-06 | Data synchronization method, device and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911243594.8A CN111104445A (en) | 2019-12-06 | 2019-12-06 | Data synchronization method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111104445A (en) | 2020-05-05 |
Family
ID=70421830
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911243594.8A Pending CN111104445A (en) | 2019-12-06 | 2019-12-06 | Data synchronization method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111104445A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111914012A (en) * | 2020-08-12 | 2020-11-10 | 深圳市汉云科技有限公司 | Data extraction method, device, equipment and storage medium |
CN111966750A (en) * | 2020-08-12 | 2020-11-20 | 北京海致网聚信息技术有限公司 | Heterogeneous data source integration method and device based on data lake |
CN112256702A (en) * | 2020-10-23 | 2021-01-22 | 上海恒生聚源数据服务有限公司 | Increment identification correction method and device |
CN112527894A (en) * | 2020-11-27 | 2021-03-19 | 聚好看科技股份有限公司 | Database consistency checking method and system |
CN112632190A (en) * | 2020-12-26 | 2021-04-09 | 中国农业银行股份有限公司 | Data synchronization method and device |
CN112783848A (en) * | 2021-01-20 | 2021-05-11 | 杭州数梦工场科技有限公司 | Data synchronization method and device and electronic equipment |
CN112988916A (en) * | 2021-03-05 | 2021-06-18 | 杭州天阙科技有限公司 | Full and incremental synchronization method, device and storage medium for Clickhouse |
CN113360505A (en) * | 2021-07-02 | 2021-09-07 | 招商局金融科技有限公司 | Data processing method and device based on time sequence data, electronic equipment and readable storage medium |
CN113672692A (en) * | 2021-10-25 | 2021-11-19 | 腾讯科技(深圳)有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN113760910A (en) * | 2021-08-31 | 2021-12-07 | 中国银联股份有限公司 | Data synchronization method and device |
CN114157677A (en) * | 2021-12-14 | 2022-03-08 | 南京欧珀软件科技有限公司 | Data synchronization method and related product |
CN114398359A (en) * | 2022-01-17 | 2022-04-26 | 深圳依时货拉拉科技有限公司 | Order data automatic reconciliation method, device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101183387A (en) * | 2007-12-14 | 2008-05-21 | 沈阳东软软件股份有限公司 | Increment data capturing method and system |
CN105488187A (en) * | 2015-12-02 | 2016-04-13 | 北京四达时代软件技术股份有限公司 | Method and device for extracting multi-source heterogeneous data increment |
CN106682140A (en) * | 2016-12-20 | 2017-05-17 | 华北计算技术研究所(中国电子科技集团公司第十五研究所) | Multi-system user incremental synchronization method based on timestamps and mapping strategies |
CN108920698A (en) * | 2018-07-16 | 2018-11-30 | 北京京东金融科技控股有限公司 | A kind of method of data synchronization, device, system, medium and electronic equipment |
CN109033127A (en) * | 2018-05-31 | 2018-12-18 | 阿里巴巴集团控股有限公司 | A kind of synchrodata method of calibration, device and equipment |
CN109871378A (en) * | 2019-02-21 | 2019-06-11 | 杭州市商务委员会(杭州市粮食局) | The data acquisition and processing (DAP) method and system of big data platform |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111966750A (en) * | 2020-08-12 | 2020-11-20 | 北京海致网聚信息技术有限公司 | Heterogeneous data source integration method and device based on data lake |
CN111914012A (en) * | 2020-08-12 | 2020-11-10 | 深圳市汉云科技有限公司 | Data extraction method, device, equipment and storage medium |
CN112256702A (en) * | 2020-10-23 | 2021-01-22 | 上海恒生聚源数据服务有限公司 | Increment identification correction method and device |
CN112256702B (en) * | 2020-10-23 | 2023-12-22 | 上海恒生聚源数据服务有限公司 | Incremental identification correction method and device |
CN112527894A (en) * | 2020-11-27 | 2021-03-19 | 聚好看科技股份有限公司 | Database consistency checking method and system |
CN112527894B (en) * | 2020-11-27 | 2023-04-14 | 聚好看科技股份有限公司 | Database consistency checking method and system |
CN112632190A (en) * | 2020-12-26 | 2021-04-09 | 中国农业银行股份有限公司 | Data synchronization method and device |
CN112783848B (en) * | 2021-01-20 | 2023-12-26 | 杭州数梦工场科技有限公司 | Data synchronization method and device and electronic equipment |
CN112783848A (en) * | 2021-01-20 | 2021-05-11 | 杭州数梦工场科技有限公司 | Data synchronization method and device and electronic equipment |
CN112988916A (en) * | 2021-03-05 | 2021-06-18 | 杭州天阙科技有限公司 | Full and incremental synchronization method, device and storage medium for Clickhouse |
CN112988916B (en) * | 2021-03-05 | 2023-06-16 | 杭州天阙科技有限公司 | Full and incremental synchronization method, apparatus and storage medium for Clickhouse |
CN113360505A (en) * | 2021-07-02 | 2021-09-07 | 招商局金融科技有限公司 | Data processing method and device based on time sequence data, electronic equipment and readable storage medium |
CN113360505B (en) * | 2021-07-02 | 2023-09-26 | 招商局金融科技有限公司 | Time sequence data-based data processing method and device, electronic equipment and readable storage medium |
CN113760910A (en) * | 2021-08-31 | 2021-12-07 | 中国银联股份有限公司 | Data synchronization method and device |
CN113672692A (en) * | 2021-10-25 | 2021-11-19 | 腾讯科技(深圳)有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN113672692B (en) * | 2021-10-25 | 2022-02-22 | 腾讯科技(深圳)有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN114157677B (en) * | 2021-12-14 | 2023-11-28 | 南京欧珀软件科技有限公司 | Data synchronization method and related product |
CN114157677A (en) * | 2021-12-14 | 2022-03-08 | 南京欧珀软件科技有限公司 | Data synchronization method and related product |
CN114398359A (en) * | 2022-01-17 | 2022-04-26 | 深圳依时货拉拉科技有限公司 | Order data automatic reconciliation method, device and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111104445A (en) | Data synchronization method, device and equipment | |
CN109460349B (en) | Test case generation method and device based on log | |
US7610317B2 (en) | Synchronization with derived metadata | |
CN101127034B (en) | Data organization, inquiry, presentation, documentation, recovery, deletion, refining method, device and system | |
CN106874281B (en) | Method and device for realizing database read-write separation | |
CN110716739B (en) | Code change information statistical method, system and readable storage medium | |
US10210238B2 (en) | Continuous automatic update statistics evaluation using change data capture techniques | |
CN104270605B (en) | A kind of processing method and processing device of video monitoring data | |
CN110134689B (en) | Target group screening method and system based on main body object label change and computer equipment | |
CN106503158A (en) | Method of data synchronization and device | |
CN111400407A (en) | Data synchronization method and device, storage medium and electronic device | |
CN114691704A (en) | Metadata synchronization method based on MySQL binlog | |
CN111159020B (en) | Method and device applied to synchronous software test | |
CN113094442A (en) | Full data synchronization method, device, equipment and medium | |
CN115878027A (en) | Storage object processing method and device, terminal and storage medium | |
CN114116795B (en) | Data storage and query method, device, storage medium and electronic equipment | |
CN114169860A (en) | Enterprise organizational structure synchronization method | |
CN112434108B (en) | Database synchronization method, device and equipment | |
CN111694853B (en) | Data increment collection method and device based on lineage, storage medium and electronic equipment | |
CN115391355B (en) | Data processing method, device, equipment and storage medium | |
CN115455059A (en) | Method, device and related medium for analyzing user behavior based on underlying data | |
CN115658815A (en) | CDC (control data center) -based data synchronization method | |
CN116185986A (en) | Data migration method, device, equipment and computer readable storage medium | |
CN109684291B (en) | File data acquisition method, system, electronic equipment and medium | |
CN111694887B (en) | Data adaptive storage scheduling system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||