CN114706867A - Data synchronization method and device, electronic equipment and storage medium

Info

Publication number: CN114706867A
Application number: CN202210306867.4A
Authority: CN
Inventors: 陈伦; 王辉; 刘德华; 曾琳铖曦; 吴海英; 伍应标; 冯仕炳
Original assignee: Mashang Xiaofei Finance Co Ltd
Current assignee: Mashang Xiaofei Finance Co Ltd
Priority date: 2022-03-25
Filing date: 2022-03-25
Publication date: 2022-07-05

Abstract

The application discloses a data synchronization method and apparatus, an electronic device, and a storage medium, and relates to the technical field of data storage. When the source database includes incremental data, the incremental data is acquired from the source database at a first acquisition frequency and cached in a cache region, and is then acquired from the cache region at a second acquisition frequency, the first acquisition frequency being higher than the second acquisition frequency, so that incremental data accumulates in the cache region between acquisitions. The target database is updated according to the incremental data, so that the data in the updated target database is consistent with the data in the source database and data synchronization is achieved. Because the first acquisition frequency is higher than the second acquisition frequency, the target database is not updated every time incremental data appears, which prevents frequent jumps in the target database and saves processing resources of the electronic device.

Description

Data synchronization method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of big data technologies, and in particular, to a data synchronization method and apparatus, an electronic device, and a storage medium.

Background

With the rapid development of the internet industry and the rise of big data and artificial intelligence, enterprises, organizations, and individuals have accumulated large amounts of business data, operational data, and the like. Enterprises and organizations increasingly value this data as a resource, and the economic and social value produced by using and analyzing it is increasingly evident. As data continues to accumulate, it becomes larger and more complex, with huge volumes and many data types, which introduces potential safety hazards for the data. Synchronization and backup of the data are therefore very important.

Disclosure of Invention

In view of the above problems, the present application provides a data synchronization method, apparatus, electronic device and storage medium, which can solve the above problems.

In a first aspect, an embodiment of the present application provides a data synchronization method, which is applied to an electronic device, where the electronic device includes a source database, a cache region, and a target database, and the method includes: under the condition that the source database comprises incremental data, acquiring the incremental data from the source database according to a first acquisition frequency and caching the incremental data in the cache region; acquiring the incremental data from the cache region according to a second acquisition frequency, wherein the first acquisition frequency is higher than the second acquisition frequency; and updating the target database according to the incremental data, wherein the data included in the updated target database is consistent with the data included in the source database.

In a second aspect, an embodiment of the present application provides a data synchronization apparatus, which is applied to an electronic device, where the electronic device includes a source database, a cache region, and a target database, and the apparatus includes: a cache module, configured to acquire the incremental data from the source database according to a first acquisition frequency and cache the incremental data in the cache region under the condition that the source database comprises the incremental data; an acquisition module, configured to acquire the incremental data from the cache region according to a second acquisition frequency, where the first acquisition frequency is higher than the second acquisition frequency; and a synchronization module, configured to update the target database according to the incremental data, wherein the data included in the updated target database is consistent with the data included in the source database.

In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the above-described method.

In a fourth aspect, the present application provides a computer-readable storage medium, in which a program code is stored, and the program code can be called by a processor to execute the above method.

According to the data synchronization method, apparatus, electronic device, and storage medium provided by the present application, when the source database includes incremental data, the incremental data is acquired from the source database at a higher first acquisition frequency and cached in a cache region, and is then acquired from the cache region at a lower second acquisition frequency. Because the first acquisition frequency is higher than the second acquisition frequency, incremental data accumulates in the cache region. The target database is updated according to the incremental data, so that the data in the updated target database is consistent with the data in the source database and data synchronization is achieved. Because the first acquisition frequency is higher than the second acquisition frequency, the target database is not updated every time incremental data appears, which prevents frequent jumps in the target database caused by updating and saves processing resources of the electronic device. In addition, the incremental data is the changed data in the source database, and the target database must be updated with it to keep the two databases consistent. Because the incremental data is stored separately in the cache region, it can be obtained directly from the cache region and does not need to be looked up by index in a source database with a huge data volume.

These and other aspects of the present application will be more readily apparent from the following description of the embodiments.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.

FIG. 1 shows a flow diagram of a data synchronization method;

FIG. 2 is a schematic diagram illustrating an application environment of a data synchronization method according to an embodiment of the present application;

FIG. 3 is a block diagram of an electronic device for performing a data synchronization method according to an embodiment of the present application;

FIG. 4 is a flow chart illustrating a data synchronization method according to an embodiment of the present application;

FIG. 5 is a block diagram of an electronic device for performing a data synchronization method according to an embodiment of the present application;

FIG. 6 is a flow chart illustrating a data synchronization method according to another embodiment of the present application;

FIG. 7 is a block diagram of an electronic device for performing a data synchronization method according to another embodiment of the present application;

FIG. 8 is a flow chart illustrating step S210 of the data synchronization method of FIG. 6 of the present application;

FIG. 9 is a flow chart illustrating a data synchronization method according to another embodiment of the present application;

FIG. 10 is a flow chart illustrating a data synchronization method according to yet another embodiment of the present application;

FIG. 11 is a block diagram of a data synchronization apparatus provided by an embodiment of the present application;

FIG. 12 is a block diagram of an electronic device for performing a data synchronization method according to yet another embodiment of the present application;

FIG. 13 illustrates a storage unit for storing or carrying program codes for implementing a data synchronization method according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

With the rapid development of the internet industry and the rise of big data and artificial intelligence, enterprises, organizations, and individuals have accumulated large amounts of business data, operational data, and the like. Enterprises and organizations increasingly value this data as a resource, and the economic and social value produced by using and analyzing it is increasingly evident. Mining the potential value of this data requires the support of an offline data warehouse. A data warehouse is a strategic collection that provides all types of data support for decision-making processes at every level of an enterprise. It is created for analytical reporting and decision support, serves as a structured data environment for decision support systems (DSS) and as a data source for online analysis applications, and is characterized by being subject-oriented, integrated, stable, and time-variant. As data continues to accumulate, it becomes larger and more complex, with huge volumes and many data types, which introduces potential safety hazards for the data. The offline data warehouse also faces new challenges in processing massive data, including the acquisition and integration of large-scale raw data. Synchronization and backup of data are therefore of particular importance.

For data synchronization and integration, a traditional data warehouse usually connects directly to a database or its backup database and, after a preliminary data survey, uses a tool to synchronize and integrate the data in an incremental or full mode in order to store snapshot data, where snapshot data refers to all static data at a certain moment. This approach can solve about 90% of data integration problems while meeting timeliness requirements. It also has the following limitations:

1) Incremental synchronization is suitable for a database whose data volume exceeds a threshold, for example a threshold V of 10 million (which can be adjusted according to actual needs), and the update time field must have an index so that extraction is efficient and integration is fast. If no index exists, the source database (which can be in the form of a table) must be fully scanned, which greatly reduces the read-write performance of the service database, lowers the extraction efficiency, and fails to meet the timeliness requirement.

2) Full data synchronization is suitable for a database whose data volume is below a threshold, for example a threshold V of 10 million (which can be adjusted according to actual needs). An index may or may not exist, because the data volume is small and extraction is fast either way.

Consequently, once the data volume reaches hundreds of millions or billions of rows or more, a traditional data warehouse cannot meet the timeliness requirement for data extraction when the source database has no index. FIG. 1 illustrates this with a MySQL data source, a commonly used database, as the source database and an ODS layer as the target database, where ODS stands for Operational Data Store, a subject-oriented, integrated, and variable set of detailed data that reflects current data values. In the existing method shown in FIG. 1, the data volume in the MySQL data source is surveyed and compared with a threshold, for example 10 million. Here, full data refers to all data in the MySQL data source, and incremental data refers to the data newly added or updated in the MySQL data source since the last data acquisition (generally determined by the "update time"). When the data volume is smaller than the threshold, an extraction tool extracts the full data directly from the MySQL data source and merges it into the ODS layer, and the extraction is complete. When the data volume is greater than or equal to the threshold, incremental data is extracted instead, and it is determined whether the update time field has an index. If it has no index, the extraction tool must scan the full table to extract the increment, which is extremely inefficient. If it has an index, the extraction tool extracts the incremental data from the MySQL data source according to the index. The incremental data is then merged into the ODS layer, and the extraction is complete.
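The conventional decision flow described for FIG. 1 can be summarized in a few lines. The following is a minimal sketch only: the names `ROW_THRESHOLD`, `Strategy`, and `choose_strategy` are hypothetical and introduced for illustration, and the threshold uses the 10 million example value from the text.

```python
from enum import Enum

# Example threshold from the text (10 million rows); adjustable in practice.
ROW_THRESHOLD = 10_000_000


class Strategy(Enum):
    FULL_EXTRACT = "full extract"
    INDEXED_INCREMENTAL = "incremental extract via index"
    FULL_SCAN_INCREMENTAL = "incremental extract via full-table scan"


def choose_strategy(row_count: int, update_time_indexed: bool,
                    threshold: int = ROW_THRESHOLD) -> Strategy:
    """Pick an extraction mode the way the conventional flow does."""
    if row_count < threshold:
        # Small table: extracting everything is fast, with or without an index.
        return Strategy.FULL_EXTRACT
    if update_time_indexed:
        # Large table with an index on the update time: pull only changed rows.
        return Strategy.INDEXED_INCREMENTAL
    # Large table without an index: each incremental pull scans the whole table.
    return Strategy.FULL_SCAN_INCREMENTAL
```

The last branch is the case the application targets: a very large table whose update time field has no index.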

For a data table at the hundred-million level, the billion level, or above, if the "update time" field of the source table has no index (and an index cannot be created), a conventional data warehouse generally synchronizes the data to the Hadoop Distributed File System (HDFS) by means of full extraction, setting slice fields, increasing concurrency, lengthening the extraction time, and the like. HDFS is a distributed file system designed to run on commodity hardware. Even so, this approach cannot meet the timeliness requirement.

In view of the above technical problems, the inventors, through long-term research, propose a data synchronization method, apparatus, electronic device, and storage medium. When the source database includes incremental data, the incremental data is acquired from the source database at a higher first acquisition frequency and cached in a cache region, and is then acquired from the cache region at a lower second acquisition frequency. Because the first acquisition frequency is higher than the second acquisition frequency, incremental data has already accumulated in the cache region. The target database is updated according to the incremental data, so that the data in the updated target database is consistent with the data in the source database and data synchronization is achieved. Because the first acquisition frequency is higher than the second acquisition frequency, the target database is not updated as soon as incremental data appears, which prevents frequent jumps in the target database and saves processing resources of the electronic device. The data synchronization method is described in detail in the following embodiments.

In order to better understand a data synchronization method, an apparatus, an electronic device, and a storage medium provided in the embodiments of the present application, an application environment suitable for the embodiments of the present application is described below.

FIG. 2 shows a schematic application environment of the data synchronization method provided in an embodiment of the present application. Referring to FIG. 2, the data synchronization method is used in a data synchronization system 100, where the data synchronization system 100 includes a first electronic device 110, a second electronic device 120, and a communication network 130, and the first electronic device 110 and the second electronic device 120 are connected through the communication network 130.

The first electronic device 110 includes a source database, the second electronic device 120 includes a target database, the full data in the source database is synchronized to the target database for backup through the communication network 130, and when incremental data exists in the source database, the incremental data is synchronized to the target database of the second electronic device 120 through the communication network 130 for backup. For example, an application program is installed on the first electronic device 110 (e.g., a mobile phone), the same application program as the first electronic device 110 is installed on the second electronic device 120, or an application program for backup is installed on the second electronic device 120, and the application program on the first electronic device 110 generates various data, such as social chat records, video files, photos, and the like, during use, and in order to avoid the loss of the data, the data may be synchronized from the first electronic device 110 to the second electronic device 120 (e.g., a computer) through the communication network 130 for backup, so that when the data on the first electronic device 110 is lost, corresponding backup data may be obtained from the second electronic device 120.

The first electronic device 110 (or the second electronic device 120) may be various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), and wearable devices, among others. Among other things, portable handheld devices may include cellular phones, smart phones, tablet computers, Personal Digital Assistants (PDAs), and the like; wearable devices may include head mounted displays and other devices. The first electronic device 110 is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.

The first electronic device 110 (or the second electronic device 120) may also be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, and big data and artificial intelligence platforms.

The communication network 130 may be any type of network that supports data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, and IPX. Merely by way of example, the communication network 130 may be a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, Wi-Fi), and/or any combination of these and/or other networks.

Optionally, data synchronization may be performed within a single electronic device; that is, the source database and the target database may reside in the same electronic device. For example, referring to FIG. 3, an electronic device 200 includes a source database 210, a cache region 220, and a target database 230, which are connected in sequence. Incremental data in the source database 210 may be cached in the cache region 220, and the incremental data cached in the cache region 220 is synchronized to the target database 230.

FIG. 4 is a flowchart illustrating a data synchronization method according to an embodiment of the present application. Referring to FIG. 4, the embodiment of the present application provides a data synchronization method that can be used in the data synchronization system 100 shown in FIG. 2, the electronic device 200 shown in FIG. 3, and the data synchronization apparatus 300 shown in FIG. 11. The flow shown in FIG. 4 is described in detail below, taking the application of the data synchronization method to the electronic device 200 as an example. The data synchronization method may specifically include the following steps:

Step S110, when the source database includes incremental data, acquiring the incremental data from the source database according to a first acquisition frequency and caching the incremental data in the cache region.

Databases are important components of data-centric applications and can store large amounts of data. The source database stores data generated by an enterprise, an organization, or an individual, and optionally, the source database may be a MySQL database. For example, the data in the source database may be data generated by a certain application on the user's electronic device during use, or may be all data generated by the user's electronic device during use. As another example, the source database may be in the form of a table, such as an "account transaction flow table" (a table with more than 3 billion rows) or a "gold price change table". If the source database suffers from downtime, confusion, data damage, data loss, or similar problems, useful data may be lost. To prevent this, the data in the source database can be backed up to a specified location, namely the target database, so that the data can be recovered from the target database when such problems occur.

Incremental data is defined relative to full data. Full data refers to the data already existing in the source database at the backup start time, and can be understood as all data in the source database at that time. Incremental data refers to the data newly added to the source database between the last data backup and the current data backup. For example, the incremental data may be data generated while the application program runs between the last data backup and the current data backup. As another example, the incremental data may be the transaction changes occurring in the "account transaction flow table" between the last data backup and the current data backup, for example an increment of 0.005% or the balance details.

When the source database includes incremental data, only the incremental data may be backed up in order to reduce the load of data backup and save storage resources in the target database. According to a backup setting enabled by the user or a default backup function of the electronic device, the incremental data is acquired from the source database according to the first acquisition frequency and cached in the cache region.

Optionally, a high first acquisition frequency may be set so that the incremental data in the source database is acquired in real time and cached in the cache region; that is, incremental data is cached in the cache region as soon as it exists in the source database, which ensures that all data changes in the source database are recorded in the cache region. Alternatively, a somewhat longer acquisition interval may be set, for example once every 1 minute or once every 2 minutes, so that the incremental data in the source database is acquired in a quasi-real-time manner and continuously cached in the cache region. For example, if the first acquisition frequency is once every 2 minutes and data was backed up at 8:00, incremental data must be acquired again at 8:02 according to the first acquisition frequency; the data newly added to the source database between 8:00 and 8:02 is taken as the incremental data, acquired, and cached in the cache region.
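The 8:00 to 8:02 example can be expressed as a time-window selection. This is a minimal sketch of the windowing idea only, not of how the source database is actually read; the `updated_at` field, the `collect_increment` helper, and the sample rows are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def collect_increment(rows, last_run, now):
    """Return rows whose update time falls in the window (last_run, now]."""
    return [r for r in rows if last_run < r["updated_at"] <= now]


# Assumed first acquisition period: once every 2 minutes, as in the example.
FIRST_INTERVAL = timedelta(minutes=2)

# Hypothetical source rows; only the last two change after the 8:00 backup.
rows = [
    {"id": 1, "updated_at": datetime(2022, 3, 25, 7, 59)},
    {"id": 2, "updated_at": datetime(2022, 3, 25, 8, 1)},
    {"id": 3, "updated_at": datetime(2022, 3, 25, 8, 2)},
]

last_run = datetime(2022, 3, 25, 8, 0)
cache_region = collect_increment(rows, last_run, last_run + FIRST_INTERVAL)
```

Here `cache_region` ends up holding the rows with `id` 2 and 3, the data added between 8:00 and 8:02.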

In some embodiments, the cache region may include a Binlog, which is a binary log file and a temporary memory file. The Binlog records all Data Manipulation Language (DML) operations in the source database, where the DML operations include but are not limited to modification, creation, addition, insertion, and deletion, and the DML operations may be cached in the Binlog in the form of incremental data.

In other embodiments, the cache region includes a Canal module, a Kafka module, and a Hudi table. When the source database includes incremental data, the incremental data is cached at the first acquisition frequency by passing through the Canal module, the Kafka module, and the Hudi table in sequence. How the incremental data is cached through the Canal module, the Kafka module, and the Hudi table is described in subsequent embodiments.

Step S120, obtaining the incremental data from the cache region according to a second obtaining frequency, where the first obtaining frequency is higher than the second obtaining frequency.

With the incremental data cached in the cache region, the incremental data is acquired from the cache region for backup at a preset second acquisition frequency, either through Structured Query Language (SQL) or by using an offline tool (for example, Spark, Sqoop, or DataX).

In this embodiment, the second acquisition frequency is set to a lower frequency so that the first acquisition frequency is higher than the second acquisition frequency. For example, the first acquisition frequency may be once per minute and the second acquisition frequency once per day. Because the first acquisition frequency is higher, incremental data has been acquired from the source database many times before each acquisition from the cache region, and the incremental data generated over those acquisitions has accumulated in the cache region. The accumulated incremental data is then obtained from the cache region in a single pass at the lower second acquisition frequency. This ensures, as far as possible, the integrity of the data obtained from the cache region, and avoids pulling incremental data from the source database piece by piece whenever the target database needs to back it up, which would reduce backup efficiency.

The first acquisition frequency is a relatively high frequency. If the first acquisition frequency were lower than or equal to the second acquisition frequency, the second acquisition frequency would also be high, in which case the target database would be updated frequently and the results the user is watching would jump frequently. Setting the first acquisition frequency higher than the second acquisition frequency therefore prevents frequent jumps in the data in the target database.

For example, when the second acquisition frequency is once per day, acquiring incremental data from the cache region may be an offline operation. Depending on usage requirements, a backup may be started in the early morning every day, and the incremental data accumulated over a period of time (for example, one day) is acquired from the cache region and backed up to the target database. It should be noted that the second acquisition frequency is not limited to once per day, and may also be set to once every 12 hours or once every 2 days according to how quickly increments are generated in the source database, the service type, and the like.
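The effect of the two frequencies on the number of target updates can be checked with a small simulation. This is a sketch under simplifying assumptions: time advances in whole minutes, the source produces a fixed number of changes per minute, and the `simulate` function and its parameters are hypothetical names introduced here.

```python
def simulate(minutes, first_period, second_period, changes_per_minute=1):
    """Simulate capture into a cache region and batched flushes to a target.

    Periods are in minutes. Returns (target_updates, changes_applied).
    """
    pending_in_source = 0  # changes not yet captured from the source
    cache = 0              # changes currently held in the cache region
    target_updates = 0
    applied = 0
    for t in range(1, minutes + 1):
        pending_in_source += changes_per_minute
        if t % first_period == 0:
            # High-frequency capture: move new changes into the cache region.
            cache += pending_in_source
            pending_in_source = 0
        if t % second_period == 0 and cache:
            # Low-frequency flush: apply everything accumulated in one pass.
            applied += cache
            cache = 0
            target_updates += 1
    return target_updates, applied


ONE_DAY = 24 * 60
batched = simulate(ONE_DAY, first_period=1, second_period=ONE_DAY)
immediate = simulate(ONE_DAY, first_period=1, second_period=1)
```

With capture once per minute, a daily flush applies the same 1440 changes in a single target update, whereas flushing once per minute applies them in 1440 separate updates, which is the frequent jumping the text describes.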

Step S130, updating the target database according to the incremental data.

The data included in the updated target database is consistent with the data included in the source database, so that the data in the source database is backed up.

The incremental data is stored into the target database (which may be done in the form of a Merge command), and the target database is then updated according to the incremental data. There may be one or more target databases. Specifically, the original data in the target database is deleted, modified, replaced, added to, and so on according to the incremental data, so that the data included in the updated target database is consistent with the data included in the source database and the data in the source database is backed up to the target database. The target database may be an ODS table. For example, when the source database is the "account transaction flow table" and the second acquisition frequency is once per day, the "account transaction flow table" in the target database is updated once per day according to the incremental data, and the income and expense details of the flow table can be checked according to the incremental data.
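Updating the target according to the incremental data can be pictured as replaying an ordered list of change records. The following is a minimal sketch, assuming each record carries an operation, a primary key, and (for inserts and updates) the new row; the record layout and the `apply_increment` helper are hypothetical and are not the Merge command itself.

```python
def apply_increment(target, increment):
    """Apply ordered change records to a target table keyed by primary key."""
    for change in increment:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            target[key] = change["row"]  # upsert the latest row image
        elif op == "delete":
            target.pop(key, None)        # tolerate already-missing rows
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# Target holds the state as of the last backup.
target = {1: {"balance": 100}, 2: {"balance": 50}}

# Changes accumulated in the cache region since then (illustrative values).
increment = [
    {"op": "update", "key": 1, "row": {"balance": 80}},
    {"op": "insert", "key": 3, "row": {"balance": 20}},
    {"op": "delete", "key": 2},
]

# State of the source after the same writes, for comparison.
source_after = {1: {"balance": 80}, 3: {"balance": 20}}

apply_increment(target, increment)
```

After the call, `target` equals `source_after`, which is the consistency condition stated for Step S130.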

Optionally, after the incremental data in the cache region is used for updating the target database, the incremental data stored in the cache region is cleared to free up the storage space of the cache region for subsequent storage of the incremental data, so that the occupation of the storage space of the cache region is reduced, and the influence of the incremental data which has been used for updating the target database before on subsequent updating is prevented.
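Clearing the cache region after an update can be sketched as a drain operation. This is an illustrative sketch only; the `CacheRegion` class is a hypothetical in-memory stand-in, and it clears on read for brevity, whereas a real system would more likely clear only after the target update has succeeded.

```python
class CacheRegion:
    """Toy in-memory cache region for incremental records."""

    def __init__(self):
        self._records = []

    def cache(self, record):
        self._records.append(record)

    def drain(self):
        """Return all cached records and clear the region."""
        drained, self._records = self._records, []
        return drained

    def __len__(self):
        return len(self._records)


region = CacheRegion()
region.cache({"op": "insert", "key": 1})
region.cache({"op": "delete", "key": 1})

first_batch = region.drain()   # both records, used for one target update
second_batch = region.drain()  # empty: nothing is applied twice
```

Because the second drain returns nothing, records already used for an update cannot affect a later one.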

In the data synchronization method provided in this embodiment, when the source database includes incremental data, the incremental data is acquired from the source database at a higher first acquisition frequency and cached in the cache region, and is then acquired from the cache region at a lower second acquisition frequency. Because the first acquisition frequency is higher than the second acquisition frequency, incremental data has already accumulated in the cache region. The target database is updated according to the incremental data, so that the data included in the updated target database is consistent with the data included in the source database and data synchronization is achieved. Because the first acquisition frequency is higher than the second acquisition frequency, the target database is not updated as soon as incremental data appears, which prevents the target database from jumping frequently and saves processing resources of the electronic device. In addition, the incremental data is the changed data in the source database, and the target database must be updated with it to keep the two databases consistent. Because the incremental data is stored separately in the cache region, it can be obtained directly from the cache region and does not need to be looked up by index in a source database with a huge data volume.

Based on the above embodiment, referring to fig. 5, the source database 210 includes an incremental log 211, and the incremental log 211 is a binary log file. The incremental log 211 records all write operations (or DML operations) in the source database, where the write operations include but are not limited to modify, create, add, insert, and delete, and the write operations may be recorded in the incremental log 211 in the form of incremental data. Optionally, the incremental log may be a Binlog (it should be noted that, in this case, the Binlog is of the same type as the Binlog in the embodiment of step S110, but is not the same Binlog). Based on this, the present application further provides a data synchronization method. Fig. 6 shows a schematic flow chart of the data synchronization method provided in another embodiment of the present application. Referring to fig. 6, the data synchronization method specifically includes the following steps:

step S210, when the incremental log in the source database includes the incremental data, acquiring the incremental data from the incremental log according to the first acquisition frequency and caching the incremental data in the cache region.

The incremental log of the source database is enabled; specifically, the parameter log_bin is set to on, and the incremental log is enabled according to this setting. After the incremental log is enabled, data changes generated by all write operations in the source database are recorded in the incremental log. For example, a user performs write operations on the source database through a device, where the write operations include addition, deletion, modification, and the like. When a write operation exists in the source database, the data in the source database changes; the data written by the write operation (which can be understood as incremental data) is acquired and stored into the incremental log according to a preset storage path to obtain the incremental data. Because the incremental log is a binary log file, the incremental data in the source database is recorded in the incremental log in a binary format. When incremental data exists in the incremental log, the incremental data is acquired from the incremental log according to a preset first acquisition frequency, and the acquired incremental data is cached in the cache region.
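The recording of write operations in a binary incremental log can be sketched as follows. This is an illustrative model only: the record layout (`HEADER`), the operation codes, and the class `SourceDatabase` are assumptions made for the example and do not reproduce the actual MySQL Binlog format; the `log_enabled` flag merely plays the role of the log_bin switch.

```python
import struct

# Hypothetical record layout: 1-byte operation code, 4-byte row id,
# 4-byte payload length, followed by the payload bytes.
HEADER = struct.Struct(">BII")
OP_CODES = {"insert": 1, "update": 2, "delete": 3}


class SourceDatabase:
    def __init__(self, log_enabled=True):
        self.rows = {}
        self.log_enabled = log_enabled       # plays the role of the log_bin switch
        self.incremental_log = bytearray()   # binary incremental log

    def _record(self, op, row_id, payload=b""):
        # Only record the change when the incremental log is enabled.
        if self.log_enabled:
            self.incremental_log += HEADER.pack(OP_CODES[op], row_id, len(payload)) + payload

    def insert(self, row_id, value):
        self.rows[row_id] = value
        self._record("insert", row_id, value.encode("utf-8"))

    def update(self, row_id, value):
        self.rows[row_id] = value
        self._record("update", row_id, value.encode("utf-8"))

    def delete(self, row_id):
        self.rows.pop(row_id, None)
        self._record("delete", row_id)


db = SourceDatabase()
db.insert(1, "a")
db.update(1, "bc")
db.delete(1)
print(len(db.incremental_log))  # 30 bytes: (9 + 1) + (9 + 2) + 9
```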

In some embodiments, referring to fig. 7 on the basis of fig. 5, the source database 210 further includes a log reading module 212, and optionally, the log reading module 212 may be a MySQL Master module. Referring to fig. 7, the cache region 220 includes a format conversion module 221, a transition module 222, and a cache module 223, wherein the format conversion module 221, the transition module 222, and the cache module 223 are connected in sequence, and the cache module 223 is connected to the target database 230. Based on this, referring to fig. 8, step S210 may further include the following sub-steps:

and a substep S211, sending a backup request to the log reading module through the format conversion module according to the first obtaining frequency when the incremental log includes the incremental data.

In the case that there is incremental data in the incremental log, in order to ensure consistency of data in the two databases, the format conversion module 221 sends a backup request to the log reading module 212 in the source database to obtain the incremental data sent by the incremental log 211 in the source database.

Optionally, the format conversion module may be a Canal module, which is middleware developed in Java that provides subscription to and consumption of incremental data based on the incremental log of the database, and can be understood as a tool for synchronizing incremental data. The incremental data of the source database can be conveniently synchronized to other storage modules or storage applications through the Canal module.

The working principle of the Canal module is as follows: the Canal module disguises itself as a MySQL slave of the source database, simulates the interaction protocol of a MySQL slave to generate the backup request as a Dump request, and sends the Dump request to the MySQL Master module of the source database.

And a substep S212, converting the incremental data in the first preset format in the incremental log into the incremental data in the second preset format through the format conversion module when the log reading module receives the backup request.

When the log reading module 212 of the source database receives the backup request, the incremental log 211 of the source database sends incremental data in a first preset format to the format conversion module 221. Specifically, when the log reading module of the source database receives the backup request, the incremental log of the source database is controlled to send incremental data in the binary format to the format conversion module in response to the request, that is, the incremental log pushes the incremental data to the format conversion module. Optionally, the incremental log is a Binlog log, and the corresponding first preset format is a binary format.

The format conversion module converts the incremental data in the first preset format in the incremental log into incremental data in a second preset format, where the second preset format is a format convenient for a user to view; for example, the second preset format is the JSON format. The format conversion module 221 may send the incremental data in the second preset format to the transition module 222.
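The conversion from a binary first preset format to a JSON second preset format can be sketched as below. The binary layout (`HEADER`), the operation names, and the function `binary_to_json` are hypothetical and chosen only for illustration; a real Binlog has a different and more complex structure, and this sketch does not use the Canal API.

```python
import json
import struct

HEADER = struct.Struct(">BII")  # hypothetical layout: op code, row id, payload length
OP_NAMES = {1: "INSERT", 2: "UPDATE", 3: "DELETE"}


def binary_to_json(log_bytes):
    """Decode every binary record and return one JSON string per record."""
    records, offset = [], 0
    while offset < len(log_bytes):
        op_code, row_id, length = HEADER.unpack_from(log_bytes, offset)
        offset += HEADER.size
        payload = log_bytes[offset:offset + length].decode("utf-8")
        offset += length
        records.append(json.dumps({"type": OP_NAMES[op_code], "id": row_id, "data": payload}))
    return records


raw = HEADER.pack(1, 7, 5) + b"hello" + HEADER.pack(3, 7, 0)
for line in binary_to_json(raw):
    print(line)
# {"type": "INSERT", "id": 7, "data": "hello"}
# {"type": "DELETE", "id": 7, "data": ""}
```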

And a substep S213, sending the incremental data in the second preset format to the cache module through the transition module, wherein the cache module is configured to cache the incremental data.

Optionally, the transition module 222 may be a Kafka module. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data (for example, web browsing, searching, and other user actions) of consumers on a website.

Because the incremental data in the first preset format is in a format convenient for the electronic device to understand, and the format convenient for the user to understand is in the second preset format, the incremental data in the first preset format needs to be converted into the incremental data in the second preset format for the user to understand and view. The format conversion module 221 converts the incremental data in the first preset format into the incremental data in the second preset format, and sends the incremental data in the second preset format to the transition module 222 for temporary storage.

The transition module 222 receives the incremental data in the second preset format pushed by the format conversion module 221 and sends the incremental data in the second preset format to the cache module 223 for caching. The cache module may be a Hudi table, Kudu, Delta Lake, or the like. Hudi is an abbreviation of Hadoop Upserts Deletes and Incrementals, and is middleware for performing operations such as update, insertion, and deletion on Hadoop.
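The hand-off from the format conversion module through the transition module to the cache module can be sketched with in-process stand-ins. In the following, `queue.Queue` plays the transition module and a plain list plays the cache module; neither is a real Kafka or Hudi client, and the function names `publish` and `drain_into_cache` are assumptions made for the example.

```python
import json
import queue

# Stand-ins only: queue.Queue plays the transition module and a list plays the
# cache module. Neither is a real Kafka or Hudi client.
transition = queue.Queue()
cache_module = []


def publish(change):
    """Format conversion side: push one JSON-formatted increment."""
    transition.put(json.dumps(change))


def drain_into_cache():
    """Cache side: move every waiting message into the cache module, in order."""
    moved = 0
    while True:
        try:
            message = transition.get_nowait()
        except queue.Empty:
            break
        cache_module.append(json.loads(message))
        moved += 1
    return moved


publish({"type": "INSERT", "id": 1})
publish({"type": "DELETE", "id": 1})
moved = drain_into_cache()
print(moved, len(cache_module))  # 2 2
```

The queue decouples the producer from the consumer, so the format conversion side does not need to wait for the cache side to be ready.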

In this embodiment, the incremental data in the second preset format is sent to the cache module through the transition module. The cache module marks the incremental data in the second preset format by using a data identifier, such as Hoodie_type, DELETE/UPDATE/INSERT, where the data identifier indicates the processing type of the corresponding data; for example, the processing type includes a deletion type, an addition type, and the like. Optionally, for an Upsert operation performed by the cache module, the incremental data may be identified by the data identifier "Upsert" or "U"; for a DELETE operation, by "DELETE" or "D"; for an ADD operation, by "ADD" or "A"; for a transfer-IN operation, by "IN" or "I"; and for a transfer-OUT operation, by "OUT" or "O". The cache module caches the marked incremental data in the second preset format together with the data identifier, where the marked incremental data refers to incremental data carrying the data identifier. After the marked incremental data later enters the target database, the target database processes the incremental data according to the processing type indicated by the data identifier. For example, for incremental data 3 carrying the data identifier "D", the target database deletes the data 3 corresponding to the incremental data from the stored data (for example, data 1, 2, and 3), so that the data remaining in the target database is 1 and 2. In this embodiment, the target database adds or removes data according to the data identifier, so that the data stored in the processed target database is kept consistent with the data stored in the source database.
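The marking step can be sketched as attaching the short identifier to each record before it is cached. The identifier letters follow the ones listed in this embodiment; the record layout, the field name `data_identifier`, and the function `mark` are hypothetical.

```python
# Identifier table using the short forms listed in this embodiment.
IDENTIFIERS = {"upsert": "U", "delete": "D", "add": "A", "transfer_in": "I", "transfer_out": "O"}


def mark(record, operation):
    """Return a copy of the record that carries its data identifier."""
    if operation not in IDENTIFIERS:
        raise ValueError(f"unsupported operation: {operation}")
    return {**record, "data_identifier": IDENTIFIERS[operation]}


# The cache module stores the marked records, not the bare ones.
cache_module = [mark({"id": 3}, "delete"), mark({"id": 4, "value": "x"}, "add")]
print([r["data_identifier"] for r in cache_module])  # ['D', 'A']
```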

Step S220, obtaining the incremental data from the cache region according to a second obtaining frequency, where the first obtaining frequency is higher than the second obtaining frequency.

And acquiring the incremental data from the cache module according to the second acquisition frequency.

And step S230, updating the target database according to the incremental data.

In this embodiment, for a detailed description of steps S220 to S230, reference may be made to steps S120 to S130 in the above embodiments, which will not be described herein again.

In the data synchronization method provided in this embodiment, when the source database includes incremental data, the incremental data in the first preset format is obtained from the incremental log of the source database at a higher first acquisition frequency and passed to the format conversion module of the cache region. The format conversion module converts the incremental data in the first preset format into incremental data in a second preset format and sends it to the transition module, and the transition module sends the incremental data in the second preset format to the cache module for caching. The incremental data is then obtained from the cache module at a lower second acquisition frequency. Because the first acquisition frequency is higher than the second acquisition frequency, some incremental data has already accumulated in the cache region by the time it is read. The target database is updated according to the incremental data, so that the data included in the updated target database is consistent with the data included in the source database, thereby implementing data synchronization. Because the target database is not updated immediately whenever incremental data appears, frequent changes to the target database are avoided and processing resources of the electronic device are saved.

On the basis of the foregoing embodiment, the present embodiment provides a data synchronization method, fig. 9 shows a schematic flow chart of a data synchronization method provided in another embodiment of the present application, and referring to fig. 9, the data synchronization method may specifically include the following steps:

step S310, acquiring the full amount of data included in the source database.

When the source database is backed up to the target database for the first time, in order to ensure that the contents in the two databases are consistent, an initialization operation is required, that is, all the data in the source database are backed up to the target database. And acquiring the full amount of data stored in the source database.

And step S320, backing up the full data to the target database.

Optionally, the full data is sent to the cache module shown in fig. 7 for temporary storage, and is then sent to the target database through the cache module for backup, so that the full data in the source database is stored in the target database.

Optionally, when backing up the full amount of data, data search is not needed, so that the full amount of data in the source database can be directly sent to the target database for backup, so that the full amount of data in the source database is stored in the target database.

Step S330, when the source database includes incremental data, acquiring the incremental data from the source database according to a first acquisition frequency and caching the incremental data in the cache region, where the cache region includes a cache module.

Referring to fig. 7, the cache region 220 includes a format conversion module 221, a transition module 222, and a cache module 223. A backup request is sent to the log reading module of the source database through the format conversion module. When the log reading module of the source database receives the backup request, the incremental log of the source database is controlled to send incremental data in a first preset format to the format conversion module. The format conversion module converts the incremental data in the first preset format into incremental data in a second preset format and sends it to the transition module for temporary storage, and the transition module sends the incremental data in the second preset format to the cache module for caching.

Step S340, acquiring the marked incremental data from the cache module according to the second acquisition frequency when the cache module has finished caching the incremental data, wherein whether the cache module has finished caching the incremental data is determined from an acquired cache completion identifier.

It is necessary to judge whether data delay exists in the cache module. Specifically, when data of a previous period exists in the cache module, this indicates that there is a delay in the cache module due to factors such as the network. For example, suppose the second acquisition frequency is once per day, that is, the target database acquires incremental data from the cache module every day. If the current day is October 8 and data from before October 8 still exists in the cache module (for example, data from before October 7), this indicates that there is data delay in the cache module. It is then necessary to wait in a loop for the cache module to store the incremental data; that is, the target database needs to acquire the previous data from the cache module first, so that the cache module vacates storage space for caching the incremental data. When the cache module has stored the incremental data, a cache completion identifier is generated and the loop waiting ends.

When the cache completion identifier sent by the cache module is acquired, it indicates that the cache module has cached the incremental data, and the incremental data and the data identifier of the incremental data are acquired from the cache module according to a second acquisition frequency. And under the condition that the cache module does not have delay, directly acquiring marked incremental data from the cache module, wherein the marked incremental data comprises incremental data and data identification.
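The delay check and the cache completion identifier can be sketched as one low-frequency poll over date partitions. This is a simplified model under stated assumptions: the cache module is represented as a dictionary keyed by partition date, the presence of the T-1 partition stands in for the cache completion identifier, and the year 2022 and the function name `poll_cache` are chosen only for the example.

```python
from datetime import date, timedelta


def poll_cache(cache_partitions, merged_batches, today):
    """One low-frequency poll of the cache module.

    cache_partitions maps a partition date to a list of marked increments
    (a hypothetical layout). Returns (delayed, cache_complete).
    """
    t_minus_1 = today - timedelta(days=1)
    stale_dates = sorted(d for d in cache_partitions if d < t_minus_1)
    # Delay case: hand the older data to the target first to free space.
    for d in stale_dates:
        merged_batches.append(cache_partitions.pop(d))
    # Stand-in for the cache completion identifier: the T-1 partition is present.
    cache_complete = t_minus_1 in cache_partitions
    if cache_complete:
        merged_batches.append(cache_partitions.pop(t_minus_1))
    return bool(stale_dates), cache_complete


partitions = {date(2022, 10, 6): ["late"], date(2022, 10, 7): ["yesterday"]}
merged = []
delayed, complete = poll_cache(partitions, merged, today=date(2022, 10, 8))
print(delayed, complete, merged)  # True True [['late'], ['yesterday']]
```

When `cache_complete` is false, a caller would poll again later, which corresponds to the loop waiting described above.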

And step S350, updating the full data according to the data identification and the incremental data.

And updating the full data in the target database according to the data identification and the incremental data. For example, when the data is identified as "DELETE", the corresponding incremental data in the full amount of data is deleted, and for example, when the data is identified as "ADD", the incremental data is added to the full amount of data.
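Steps S310 to S350 can be sketched end to end on in-memory dictionaries: a one-time full backup, followed by an update driven by the data identifiers. The function names `full_backup` and `update_full_data` and the tuple layout of the marked increments are assumptions made for the example.

```python
def full_backup(source):
    """Initialization: copy all rows of the source into a new target."""
    return dict(source)


def update_full_data(target, marked_increments):
    """Apply (identifier, key, value) tuples to the full data in the target."""
    for identifier, key, value in marked_increments:
        if identifier == "DELETE":
            target.pop(key, None)
        elif identifier == "ADD":
            target[key] = value
        else:
            raise ValueError(f"unknown data identifier: {identifier}")
    return target


source = {"a": 1, "b": 2}
target = full_backup(source)

# Later changes in the source, captured as marked increments.
source.pop("a")
source["c"] = 3
update_full_data(target, [("DELETE", "a", None), ("ADD", "c", 3)])
print(target == source)  # True
```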

The data synchronization method provided in this embodiment obtains the full data included in the source database and backs up all of the full data to the target database. When the source database includes incremental data, the incremental data is obtained from the source database according to a first acquisition frequency and cached in the cache module of the cache region. When there is no delay in the cache module and the incremental data has been cached in the cache module, a cache completion identifier is generated. When the cache completion identifier sent by the cache module is received, the incremental data and the data identifier are obtained from the cache module according to a second acquisition frequency, and the full data in the target database is updated according to the incremental data and the data identifier, which completes the backup of the source database.

Optionally, in a scenario where the incremental log is a Binlog log, the format conversion module is a Canal module, the transition module is a Kafka module, and the cache module is a Hudi table, fig. 10 shows a schematic flow diagram of a data synchronization method provided in another embodiment of the present application. Referring to fig. 10, in the data synchronization method, the source database includes the Binlog, and the Binlog in the source database is enabled, so that all DML operations in the source database are recorded in the Binlog in the form of incremental data. If the backup is being initialized for the first time, the full data in the source database is written into the Hudi table and sent to the target database through the Hudi table to back up the full data. If the backup is not being initialized for the first time, the incremental data in the Binlog log is converted into the JSON format through the Canal module; because the log is binary data, this means that the incremental data in binary format is converted into incremental data in JSON format. The incremental data in JSON format is written into the Kafka module through the Canal module, and, in response to an Upsert operation, the Hudi table acquires the incremental data from the Kafka module. It is then judged whether delay exists in the Hudi table. If the Hudi table has delay, the previous data in the Hudi table is stored in the target database to free the storage space of the Hudi table for storing incremental data. If the Hudi table has no delay, the data of the T-1 partition (that is, yesterday's data), namely the incremental data, is acquired directly and merged into the target database, which backs up the incremental data; the data extraction (that is, the extraction of the incremental data) is then finished, indicating that the backup is successful.

Here, the T-2 partition refers to the data of the day before yesterday, and the T-3 partition refers to the data of three days ago.

In this embodiment, the specific description in fig. 10 may refer to the steps in the above embodiments, and is not repeated herein.

To implement the foregoing method embodiments, this embodiment provides a data synchronization apparatus, which is applied to an electronic device, where the electronic device includes a source database, a cache region, and a target database, fig. 11 shows a block diagram of the data synchronization apparatus provided in this embodiment of the present application, and referring to fig. 11, a data synchronization apparatus 300 includes: a caching module 310, an acquisition module 320, and a synchronization module 330.

A caching module 310, configured to, when the source database includes incremental data, obtain the incremental data from the source database according to a first obtaining frequency and cache the incremental data in the cache region;

an obtaining module 320, configured to obtain the incremental data from the cache region according to a second obtaining frequency, where the first obtaining frequency is higher than the second obtaining frequency;

a synchronization module 330, configured to update the target database according to the incremental data.

Optionally, the source database includes an incremental log, and the caching module 310 includes: an incremental caching module.

The incremental caching module is configured to, in the case that the incremental log in the source database includes the incremental data, acquire the incremental data from the incremental log according to the first acquisition frequency and cache the incremental data in the cache region.

Optionally, the data synchronization apparatus 300 further includes: a write module and a storage module.

A write-in module, configured to, when there is a write-in operation in the source database, obtain data written in by the write-in operation;

and the storage module is used for storing the written data to the incremental log according to a preset storage path to be used as incremental data.

Optionally, the source database further includes a log reading module, the cache region includes a format conversion module, a transition module, and a cache module, and the incremental caching module includes: a backup request module, a format conversion module, and a target cache module.

The backup request module is used for sending a backup request to the log reading module through the format conversion module according to the first acquisition frequency under the condition that the incremental log comprises the incremental data;

the format conversion module is used for converting the incremental data in the incremental log in a first preset format into the incremental data in a second preset format through the format conversion module under the condition that the log reading module receives the backup request;

and the target cache module is used for sending the incremental data in the second preset format to the cache module through the transition module, and the cache module is used for caching the incremental data.

Optionally, the target cache module includes: a sending module and a processing module.

The sending module is used for sending the incremental data in the second preset format to the cache module through the transition module;

and the processing module is used for marking the incremental data in the second preset format through the cache module according to a data identifier and caching the marked incremental data in the second preset format, wherein the data identifier is used for indicating the processing type of the corresponding data.

Optionally, the data synchronization apparatus 300 further includes: a full data acquisition module and a full data backup module.

A full data acquisition module, configured to acquire full data included in the source database;

and the full data backup module is used for backing up the full data to the target database.

Optionally, the obtaining module 320 includes: an identification acquisition module.

The identification acquisition module is configured to acquire the marked incremental data from the cache module according to the second acquisition frequency when the cache module has finished caching the incremental data, wherein whether the cache module has finished caching the incremental data is determined from an acquired cache completion identifier.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical or other type of coupling.

In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.

Fig. 12 is a block diagram of an electronic device 200 for executing a data synchronization method according to an embodiment of the present application. The electronic device 200 may be a smart phone, a tablet computer, an electronic book, or another electronic device capable of running an application program. The electronic device 200 in the present application may include one or more of the following components: a processor 240, a memory 250, and one or more applications, wherein the one or more applications may be stored in the memory 250 and configured to be executed by the one or more processors 240, and the one or more applications are configured to perform the method described in the foregoing method embodiments.

Processor 240 may include one or more processing cores, among others. The processor 240 interfaces with various components throughout the electronic device 200 using various interfaces and lines to perform various functions of the electronic device 200 and process data by executing or performing instructions, programs, code sets, or instruction sets stored in the memory 250 and invoking data stored in the memory 250. Alternatively, the processor 240 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 240 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for rendering and drawing the components to be displayed; the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 240, but may be implemented by a communication chip.

The memory 250 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 250 may be used to store instructions, programs, code sets, or instruction sets. The memory 250 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the foregoing method embodiments, and the like. The data storage area may store data created by the electronic device 200 in use (such as historical profiles) and the like.

Fig. 13 shows a block diagram of a computer-readable storage medium provided in an embodiment of the present application, which serves as a storage unit for storing or carrying program code that implements a data synchronization method according to an embodiment of the present application. The computer-readable storage medium 400 has stored therein program code that can be called by a processor to execute the methods described in the above method embodiments.

The computer-readable storage medium 400 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Alternatively, the computer-readable storage medium 400 includes a non-volatile computer-readable storage medium. The computer readable storage medium 400 has storage space for program code 410 for performing any of the method steps of the method described above. The program code can be read from and written to one or more computer program products. Program code 410 may be compressed, for example, in a suitable form.

Optionally, an embodiment of the present application further provides a computer program product, where the computer program product includes a computer program/instruction, and the computer program/instruction, when executed by a processor, implements the above method.

In summary, the present application provides a data synchronization method, an apparatus, an electronic device, and a storage medium. When the source database includes incremental data, the incremental data is acquired from the source database at a higher first acquisition frequency and cached in a cache region, and the incremental data is then acquired from the cache region at a lower second acquisition frequency. Because the first acquisition frequency is higher than the second acquisition frequency, some incremental data has already accumulated in the cache region by the time it is read. The target database is updated according to the incremental data, so that the data included in the updated target database is consistent with the data included in the source database, and data synchronization is achieved. Because the target database is not updated immediately whenever incremental data appears, frequent changes to the target database are avoided and processing resources of the electronic device are saved. In addition, the incremental data is the data that has changed in the source database, and the target database must be updated with it to keep the two databases consistent. Since the incremental data is stored separately in the cache region, it can be obtained directly from the cache region and does not need to be looked up by index in the source database with its huge data volume.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A data synchronization method is applied to an electronic device, wherein the electronic device comprises a source database, a cache region and a target database, and the method comprises the following steps:

under the condition that the source database comprises incremental data, acquiring the incremental data from the source database according to a first acquisition frequency and caching the incremental data in the cache region;

acquiring the incremental data from the cache region according to a second acquisition frequency, wherein the first acquisition frequency is higher than the second acquisition frequency;

and updating the target database according to the incremental data.

2. The method of claim 1, wherein the source database comprises an incremental log, and wherein the acquiring, in the case that the source database comprises incremental data, the incremental data from the source database according to a first acquisition frequency and caching the incremental data in the cache region comprises:

and under the condition that the incremental log in the source database comprises the incremental data, acquiring the incremental data from the incremental log according to the first acquisition frequency and caching the incremental data in the cache region.

3. The method according to claim 2, wherein before the acquiring, in the case that the incremental log in the source database comprises the incremental data, the incremental data from the incremental log according to the first acquisition frequency and caching the incremental data in the cache region, the method further comprises:

in the case that a write operation exists in the source database, acquiring the data written by the write operation;

and storing the written data into the incremental log according to a preset storage path to obtain the incremental data.
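As a non-limiting sketch of the two steps above, the written data could be appended to a log file located at a preset storage path. The JSON-lines layout, the function names `record_write` and `read_incremental_data`, and the record fields `key` and `value` are assumptions introduced here for illustration; the claim does not specify a log format.

```python
import json
from pathlib import Path


def record_write(log_path: Path, key: str, value) -> None:
    """Append the data written by one write operation to the incremental log.

    log_path plays the role of the preset storage path (hypothetical layout:
    one JSON object per line).
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")


def read_incremental_data(log_path: Path) -> list:
    """Return every record currently held in the incremental log, oldest first."""
    if not log_path.exists():
        return []  # no log yet means no incremental data
    with log_path.open(encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Appending rather than rewriting keeps the records in write order, so a later reader can replay them in the order the write operations occurred.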

4. The method according to claim 2, wherein the source database further comprises a log reading module, and the cache region comprises a format conversion module, a transition module and a cache module, and wherein, in the case that the incremental log in the source database comprises the incremental data, acquiring the incremental data from the incremental log according to the first acquisition frequency and caching the incremental data in the cache region comprises:

under the condition that the incremental log comprises the incremental data, sending a backup request to the log reading module through the format conversion module according to the first acquisition frequency;

under the condition that the log reading module receives the backup request, converting the incremental data in the first preset format in the incremental log into the incremental data in the second preset format through the format conversion module;

and sending the incremental data in the second preset format to the cache module through the transition module, wherein the cache module is used for caching the incremental data.

5. The method according to claim 4, wherein the sending, by the transition module, the incremental data in the second preset format to the cache module comprises:

sending the incremental data in the second preset format to the cache module through the transition module;

and marking the incremental data in the second preset format through the cache module according to a data identifier, and caching the marked incremental data in the second preset format, wherein the data identifier is used for indicating the processing type of the corresponding data.
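For illustration only, the marking step might look like the sketch below. The claim states only that the data identifier indicates the processing type of the corresponding data; the three processing types shown (insert, update, delete), the `op` field of the incoming record, and the names `ProcessingType`, `MarkedRecord`, `mark` and `apply_marked` are all assumptions. The `apply_marked` helper goes beyond this claim and shows one way the identifier could later drive the update of the target database.

```python
from dataclasses import dataclass
from enum import Enum


class ProcessingType(Enum):
    """Hypothetical data identifiers; the claim does not enumerate them."""
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


@dataclass(frozen=True)
class MarkedRecord:
    identifier: ProcessingType  # the data identifier attached by the cache module
    key: str
    value: object = None


def mark(record: dict) -> MarkedRecord:
    """Attach a data identifier to one record in the second preset format.

    Assumes the converted record carries an "op" field naming its operation.
    """
    return MarkedRecord(ProcessingType(record["op"]), record["key"], record.get("value"))


def apply_marked(target: dict, records) -> None:
    """Update a dict standing in for the target database from marked records."""
    for rec in records:
        if rec.identifier is ProcessingType.DELETE:
            target.pop(rec.key, None)
        else:
            # INSERT and UPDATE both set the key to the new value.
            target[rec.key] = rec.value
```

Carrying the identifier with each cached record means the component that updates the target does not need to re-inspect the source to decide how a record should be processed.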

6. The method of claim 5, wherein, in the case that the source database comprises incremental data, acquiring the incremental data from the source database according to the first acquisition frequency and caching the incremental data in the cache region further comprises:

acquiring full data included in the source database;

and backing up the full data to the target database.

7. The method of claim 5, wherein acquiring the incremental data from the cache region according to the second acquisition frequency comprises:

in the case that the cache module has finished caching the incremental data, acquiring the marked incremental data from the cache module according to the second acquisition frequency, wherein whether the cache module has finished caching the incremental data is determined according to an acquired cache-completion identifier.
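A minimal, non-limiting sketch of gating the read on a cache-completion identifier is given below. The class name `CacheModule`, the boolean attribute `cache_complete` standing in for the cache-completion identifier, and the method names are hypothetical; the claim does not say how the identifier is represented or set.

```python
class CacheModule:
    """Illustrative cache module guarded by a cache-completion identifier."""

    def __init__(self):
        self._records = []
        # Hypothetical stand-in for the cache-completion identifier.
        self.cache_complete = False

    def cache(self, record) -> None:
        """Cache one marked record; a batch in progress is not yet complete."""
        self.cache_complete = False
        self._records.append(record)

    def mark_complete(self) -> None:
        """Signal that the current batch has been fully cached."""
        self.cache_complete = True

    def take_if_complete(self) -> list:
        """Return the cached batch only if the completion identifier is set.

        Called at the second acquisition frequency; returns an empty list
        while caching is still in progress.
        """
        if not self.cache_complete:
            return []
        records, self._records = self._records, []
        self.cache_complete = False
        return records
```

Checking the identifier before reading prevents the lower-frequency reader from picking up a partially cached batch.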

8. A data synchronization device, applied to an electronic device, wherein the electronic device comprises a source database, a cache region and a target database, the device comprising:

a cache module, configured to, in the case that the source database comprises incremental data, acquire the incremental data from the source database according to a first acquisition frequency and cache the incremental data in the cache region;

an obtaining module, configured to acquire the incremental data from the cache region according to a second acquisition frequency, wherein the first acquisition frequency is higher than the second acquisition frequency;

and a synchronization module, configured to update the target database according to the incremental data, wherein the data included in the updated target database is consistent with the data included in the source database.

9. An electronic device, comprising:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-7.

10. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 7.