CN112347192A - Data synchronization method, device, platform and readable medium

Info

Publication number: CN112347192A
Application number: CN202011279897.8A
Authority: CN
Inventors: 巴铁凯
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-11-16
Filing date: 2020-11-16
Publication date: 2021-02-09

Abstract

A data synchronization method, a data synchronization device, a data synchronization platform and a storage medium are provided, which relate to the technical field of computers, in particular to the technical field of data processing. The data synchronization method comprises the following steps: in response to receiving a data synchronization request, determining a data synchronization type indicated in the data synchronization request; starting, according to the data synchronization type, a data synchronization process corresponding to the data synchronization type to extract data to be synchronized from a source database; determining table structure information of a target database; processing the data to be synchronized according to the table structure information of the target database; and writing the processed data to be synchronized into the target database.

Description

Data synchronization method, device, platform and readable medium

Technical Field

The present disclosure relates to the field of computer technologies, in particular to data processing and data synchronization technologies, and more specifically to a data synchronization method and apparatus, a data synchronization platform, and a computer-readable storage medium.

Background

With the development of internet technology, people's online activity generates a large amount of data, and this data is stored in different types of databases. How to achieve data synchronization between different types of databases is a common requirement.

In the related art, corresponding synchronization tools have been developed, but each such tool can be used only for data synchronization from one type of database to one other type of database, for example, from a MySQL database to a Hive database. If data synchronization between other types of databases is to be achieved, for example, from a Hive database to a Palo database, a separate set of synchronization tasks needs to be developed for that data synchronization type, which is time-consuming and labor-intensive, and the development process is difficult for ordinary users to carry out.

Disclosure of Invention

According to an aspect of an embodiment of the present disclosure, a data synchronization method is provided. The method comprises the following steps: in response to receiving a data synchronization request, determining a data synchronization type indicated in the data synchronization request; starting, according to the data synchronization type, a data synchronization process corresponding to the data synchronization type to extract data to be synchronized from a source database; determining table structure information of a target database; processing the data to be synchronized according to the table structure information of the target database; and writing the processed data to be synchronized into the target database.

According to another aspect of an embodiment of the present disclosure, a data synchronization apparatus is provided. The data synchronization apparatus includes: a determining unit configured to determine, in response to receiving a data synchronization request, a data synchronization type indicated in the data synchronization request; a calling unit configured to call, according to the data synchronization type, a data synchronization process corresponding to the data synchronization type so as to extract data to be synchronized from a source database; an acquisition unit configured to determine table structure information of a target database; a processing unit configured to process the data to be synchronized according to the table structure information of the target database; and a synchronization unit configured to write the processed data to be synchronized into the target database.

According to another aspect of an embodiment of the present disclosure, a data synchronization platform is provided. The data synchronization platform comprises: a processor and a memory storing a program. The program comprises instructions which, when executed by the processor, cause the processor to perform a data synchronization method according to some embodiments of the present disclosure.

According to another aspect of an embodiment of the present disclosure, there is provided a computer-readable storage medium storing a program. The program comprises instructions which, when executed by a processor of an electronic device, cause the electronic device to perform a data synchronization method according to some embodiments of the present disclosure.

With the scheme of the exemplary embodiments of the present disclosure, the data synchronization type can be determined from the data synchronization request, and the data synchronization process adapted to that type is started automatically, so that the data to be synchronized is extracted from the source database, processed, and written into the target database. Data synchronization among different types of databases can therefore be realized without developing a separate set of synchronization tasks for each type of data synchronization requirement, which reduces the synchronization development workload and improves data synchronization efficiency.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the embodiments and, together with the description, serve to explain the exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements, in which:

FIG. 1 is a schematic diagram of a data synchronization system, according to some example embodiments of the present disclosure;

FIG. 2 is a flow chart of a method of data synchronization according to some example embodiments of the present disclosure;

FIG. 3 is a schematic diagram of a data synchronization process in a database and table sharding scenario, according to some example embodiments of the present disclosure;

FIG. 4 is a flow chart of a method of data synchronization according to further exemplary embodiments of the present disclosure;

FIG. 5 is a schematic structural diagram of a data synchronization platform, according to some example embodiments of the present disclosure;

FIG. 6 is a schematic diagram of a framework of a data synchronization platform, according to some example embodiments of the present disclosure;

FIG. 7 is a schematic diagram of a functional hierarchical framework of a data synchronization platform according to some exemplary embodiments of the present disclosure;

FIG. 8 is a schematic block diagram of a data synchronization apparatus according to some example embodiments of the present disclosure; and

FIG. 9 is a schematic block diagram of an example computing device, according to an example embodiment of the present disclosure.

Detailed Description

In order to make the technical solutions of the present disclosure better understood, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure. It is to be understood that the described embodiments are merely a subset of the disclosed embodiments and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

The terms "first" and "second," and the like in the description and claims of the present disclosure and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Heterogeneous data refers to the same data being stored in different types of databases because of different application scenarios. Data synchronization between different types of databases is a common requirement. For example, data may be synchronized from a MySQL database to a Hive database so that Hive can perform offline analysis on the synchronized mass data, or from a Hive database to a Palo database so as to support online display of downstream report data. In the related art, corresponding synchronization tools have been developed, but each such tool can be used only for data synchronization from one type of database to one other type of database. If data synchronization between other types of databases is to be realized, a separate set of synchronization tasks needs to be developed for that data synchronization type, which is time-consuming and labor-intensive, and the development process is difficult for ordinary users to carry out.

In view of this, the present disclosure provides a data synchronization method, which can implement data synchronization between different types of databases. Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120.

Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.

In embodiments of the present disclosure, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model. In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.

A user may submit a data synchronization request and query data synchronization tasks via client devices 101, 102, 103, 104, 105, and/or 106. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.

Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computing systems, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computing devices may run various types and versions of software applications and operating systems, such as Microsoft Windows, Apple iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., Google Chrome OS); or include various mobile operating systems, such as Microsoft Windows Mobile OS, iOS, Windows Phone, Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head mounted displays and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.

Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.

The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.

The computing system in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.

In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.

The system 100 may also include one or more databases 130, and the one or more databases 130 may include a source database and a target database. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The database 130 may reside in various locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The databases 130 may be of different types. One or more of these databases may store, update, and retrieve data in response to commands.

The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure. It should be understood that the system 100 shown in fig. 1 is only an example, and no limitation is made to the system to which the method provided by the embodiments of the present disclosure is applicable.

Fig. 2 is a schematic flow diagram of a data synchronization method 200 according to some example embodiments of the present disclosure. As shown in fig. 2, the method includes: in response to receiving the data synchronization request, determining a data synchronization type indicated in the data synchronization request (step S201); according to the data synchronization type, starting a data synchronization process corresponding to the data synchronization type to extract data to be synchronized from a source database (step S202); determining table structure information of the target database (step S203); processing the data to be synchronized according to the table structure information of the target database (step S204); and writing the processed data to be synchronized into the target database (step S205).
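The five steps above can be sketched as a single function. This is only a minimal illustration under stated assumptions: the in-memory `source` list and `target` dict stand in for real databases, the names `synchronize`, `EXTRACTORS`, and the extractor functions are hypothetical, and the processing in S204 is reduced to keeping only the columns the target table declares.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


# Hypothetical extractors, one per data synchronization type (illustrative only).
def extract_from_relational(source: List[Row]) -> List[Row]:
    return list(source)


def extract_from_search_index(source: List[Row]) -> List[Row]:
    return list(source)


EXTRACTORS: Dict[str, Callable[[List[Row]], List[Row]]] = {
    "MySQL2UDW": extract_from_relational,
    "ES2UDW": extract_from_search_index,
}


def synchronize(request: dict, source: List[Row], target: dict) -> int:
    """Run steps S201 to S205 against in-memory stand-ins for the databases."""
    # S201: determine the data synchronization type indicated in the request.
    sync_type = request["sync_type"]
    if sync_type not in EXTRACTORS:
        raise ValueError(f"unsupported data synchronization type: {sync_type}")
    # S202: start the matching process to extract the data to be synchronized.
    rows = EXTRACTORS[sync_type](source)
    # S203: determine the table structure information of the target database.
    columns = target["columns"]
    # S204: process the rows according to that structure
    # (assumption: processing here means keeping only the declared columns).
    processed = [{c: row.get(c) for c in columns} for row in rows]
    # S205: write the processed rows into the target database.
    target["rows"].extend(processed)
    return len(processed)
```

Adding support for a new source type in this sketch amounts to registering one more extractor, which mirrors the stated goal of not developing a full synchronization task per database pair.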

In this way, the data synchronization type is determined from the data synchronization request, and the data synchronization process adapted to that type is started automatically, so that the data to be synchronized is extracted from the source database, processed, and written into the target database. Data synchronization among different types of databases can thus be realized without developing a separate set of synchronization tasks for each type of data synchronization requirement, which reduces the development cost and workload and improves data synchronization efficiency.

A database may refer to a repository that organizes, stores, and manages data by data structures, and is an organized, shareable, and uniformly managed collection of large amounts of data stored within a computer. The user can perform operations such as adding, inquiring, updating and deleting on the data in the database. The source database may refer to a database in which data to be synchronized is located, and the target database may refer to a database in which data to be synchronized is to be written.

In some embodiments, a user may submit a data synchronization request through a client or browser page. The link information of the source database may be indicated in the data synchronization request to enable the data synchronization process to locate the source database. Therefore, the accurate positioning of the source database can be realized according to the link information. In some examples, the link information of the source database may include, for example, an address of the source database, a port of the source database, a username and password for accessing the source database, a name of a data table included in the source database, and the like. The address of the source database may be its IP address. In some examples, the table structure information of the source database may also be parsed according to the link information of the source database. According to the table structure information of the source database, the table structure of the target database can be constructed. Therefore, the table structure of the target database can be kept consistent with the table structure of the source database, the data synchronization time is saved, and the data synchronization efficiency is improved.
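As a rough illustration of the link information and of constructing the target table structure from the source table structure, the following sketch may help. The `SourceLink` field names, the `TYPE_MAP` type mapping, and the `build_target_ddl` helper are assumptions made for this example and are not taken from the disclosure; a real mapping between two specific databases would need to cover many more types.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SourceLink:
    """Link information carried in a request (field names are hypothetical)."""
    address: str
    port: int
    username: str
    password: str
    table: str


# Assumed, deliberately small mapping from source column types to target column types.
TYPE_MAP = {
    "int": "INT",
    "bigint": "BIGINT",
    "varchar": "STRING",
    "double": "DOUBLE",
    "datetime": "TIMESTAMP",
}


def build_target_ddl(table: str, source_columns: List[Tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement that mirrors the source table structure."""
    parts = []
    for name, source_type in source_columns:
        base = source_type.lower().split("(")[0]  # "varchar(32)" -> "varchar"
        # Unknown source types fall back to STRING in this sketch.
        parts.append(f"{name} {TYPE_MAP.get(base, 'STRING')}")
    return f"CREATE TABLE {table} ({', '.join(parts)})"
```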

In some other embodiments, a mode of data synchronization may also be indicated in the data synchronization request, such as full data synchronization, incremental data synchronization, or real-time data synchronization. The data synchronization requirements of a user can therefore be supported in multiple modes. Full data synchronization means that all the data in the data tables of the source database is read and synchronized into the target database. Under full data synchronization, the data to be synchronized is all the data in the source database. Incremental data synchronization means reading the incremental data in a data table of the source database and synchronizing it to the target database. Under incremental data synchronization, the data to be synchronized may include the data that changed in the source database within a predetermined period of time, together with the type of each change. The types of data change may include, for example, modification of data, addition of data, and deletion of data. The predetermined period of time may be, for example, one day or two days, and the present disclosure is not limited in this respect. In some examples, incremental data synchronization may be achieved by reading a commit log of the database. Real-time data synchronization refers to a synchronization scenario with strict timeliness requirements, in which the delay between data appearing in the source database and being written to the target database is short. For real-time data synchronization, the source database may be a data stream pipeline (e.g., a distributed message queue), the content in the pipeline may be any custom data structure, and the data may be pushed continuously over time.
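To make the incremental mode concrete, the sketch below replays a list of changes onto a target keyed by unique key. The `(change type, key, row)` tuple layout and the `apply_changes` name are assumptions for illustration; the disclosure only states that the changed data and the type of change are synchronized, not how they are encoded.

```python
from typing import Dict, List, Tuple

# (change type, unique key, row); this layout is an assumption for the sketch.
Change = Tuple[str, str, dict]


def apply_changes(target: Dict[str, dict], changes: List[Change]) -> Dict[str, dict]:
    """Replay changes, in order, onto a copy of the target keyed by unique key."""
    result = dict(target)
    for change_type, key, row in changes:
        if change_type in ("add", "modify"):
            result[key] = row          # additions and modifications overwrite by key
        elif change_type == "delete":
            result.pop(key, None)      # deleting a missing key is treated as a no-op
        else:
            raise ValueError(f"unknown change type: {change_type}")
    return result
```

Full synchronization, by contrast, would simply rebuild the target from every row of the source table rather than from a change list.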

In some embodiments, a data synchronization type may be indicated in the data synchronization request. The data synchronization type may be determined based on the type of the source database. The source database may be one of several different types of databases, for example, a first type of source database, a second type of source database, or a third type of source database. In some examples, the first type of source database may be a relational database. A relational database may refer to a database that organizes data using a relational model, storing data in rows and columns. Examples of relational databases include MySQL, PostgreSQL, Oracle, SQLite, and the like. The second type of source database may be a non-relational database, such as Elasticsearch (ES), Solr, and the like. ES is a Lucene-based search server, and Solr is a Lucene-based full-text search server. The third type of source database may be a distributed data stream pipeline, such as Kafka. Kafka, developed by LinkedIn, is a distributed, partitioned, multi-replica, multi-subscriber log system coordinated through ZooKeeper, and can be used for Web/Nginx logs, access logs, message services, and the like. The target database may be the same type of database as the source database, or may be a different type of database.

In some examples, the target database may be Hive, Palo, etc. Hive is a data warehouse tool based on the Hadoop ecosystem; it can store, query and analyze large-scale data sets stored in Hadoop, and supports flexible data analysis by providing SQL queries. Palo is a high-performance parallel database supporting online reporting and multidimensional analysis applications. Formatted data may be imported into Palo, then analyzed and accessed through a MySQL interface. Palo can complete the analysis of TB-level data within seconds. In other examples, the target database may also be an intra-enterprise database, such as the User Data Warehouse (UDW) provided by Baidu for its employees.

Accordingly, the data synchronization types may include data synchronization from a first type of source database to a target database (e.g., MySQL to UDW data synchronization, abbreviated MySQL2UDW), data synchronization from a second type of source database to a target database (e.g., ES to UDW data synchronization, abbreviated ES2UDW), and data synchronization from a third type of source database to a target database (e.g., Kafka to UDW data synchronization, abbreviated Kafka2UDW). For each data synchronization type, a data synchronization process corresponding to that type can be invoked to extract data to be synchronized from the source database. It is understood that, when the source databases are of different types, the way in which the data synchronization process extracts the data to be synchronized from the source database will also differ.
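One way to picture how a type label maps onto the three source categories is the sketch below. The `parse_sync_type` helper, the category names, and the rule of splitting a label at its first "2" are assumptions made for illustration; the splitting rule would not work for database names that themselves contain the digit 2.

```python
from typing import Tuple

# Source database name -> category, following the examples given in the text.
SOURCE_CATEGORIES = {
    "mysql": "relational",
    "postgresql": "relational",
    "oracle": "relational",
    "sqlite": "relational",
    "es": "non_relational",
    "solr": "non_relational",
    "kafka": "stream",
}


def parse_sync_type(label: str) -> Tuple[str, str, str]:
    """Split a label such as "MySQL2UDW" into (source, target, source category)."""
    source, sep, target = label.partition("2")  # split at the first "2" only
    if not sep or not source or not target:
        raise ValueError(f"malformed data synchronization type: {label!r}")
    category = SOURCE_CATEGORIES.get(source.lower())
    if category is None:
        raise ValueError(f"unknown source database: {source!r}")
    return source, target, category
```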

In some embodiments, invoking, according to the data synchronization type, a data synchronization process corresponding to the data synchronization type to extract data to be synchronized from the source database may include: in response to determining that the data synchronization type is data synchronization from the first type of source database to the target database, sending a request to a data transmission service corresponding to the first type of source database, so as to request that the data to be synchronized be extracted from the first type of source database and stored in a temporary storage unit. In some examples, where the first type of source database is MySQL, the data to be synchronized may be extracted from the source database through a Data Transmission Service (DTS) interface and stored in a Hadoop Distributed File System (HDFS) cluster. A Spark task may then be called to process the data stored in the HDFS cluster, for example by data cleaning and data deduplication.

In some embodiments, invoking, according to the data synchronization type, a data synchronization process corresponding to the data synchronization type to extract data to be synchronized from the source database may further include: in response to determining that the data synchronization type is data synchronization from the second type of source database to the target database, acquiring a plug-in corresponding to the second type of source database; and extracting the data to be synchronized from the second type of source database through the plug-in. In some examples, where the second type of source database is ES, the data in ES (e.g., the data to be synchronized) may be read by an espark plug-in, and the extracted data may then be processed by a Spark task. The processed data may be written to the target database. The data thus reaches the target database directly, without passing through an intermediate storage medium, which reduces storage overhead.

In some embodiments, invoking, according to the data synchronization type, a data synchronization process corresponding to the data synchronization type to extract data to be synchronized from the source database may further include: in response to determining that the data synchronization type is data synchronization from the third type of source database to the target database, extracting data to be synchronized from the third type of source database at preset time intervals, and storing the extracted data in a temporary storage unit. In some examples, where the third type of source database is Kafka, data (e.g., data to be synchronized) may be extracted from Kafka by a Spark Streaming task at regular time intervals (e.g., 5 minutes, 10 minutes, etc.), and the extracted data may be temporarily stored in HDFS. The data in HDFS may then be processed by a Spark batch processing task.
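The interval-based extraction can be approximated, without any streaming framework, by bucketing timestamped messages into fixed windows. This is a plain-Python sketch under stated assumptions and does not use Spark Streaming or Kafka APIs: the `(timestamp in seconds, payload)` message layout and the `batch_by_interval` name are hypothetical, and each returned batch stands in for one temporary file handed to a later batch job.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def batch_by_interval(messages: List[Tuple[float, str]],
                      interval_s: float) -> Dict[int, List[str]]:
    """Group (timestamp, payload) messages into consecutive fixed-length windows."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    batches: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in messages:
        # Window index 0 covers [0, interval_s), index 1 covers [interval_s, 2*interval_s), ...
        batches[int(ts // interval_s)].append(payload)
    return dict(batches)
```

With a 5-minute interval (`interval_s=300`), messages at seconds 0 and 299 land in the same batch, while a message at second 300 starts the next one.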

After the data to be synchronized is extracted from the source database, it can be processed according to the table structure information of the target database and written into the target database. In some embodiments, the table structure information of the target database may include a unique key of a data table contained in the target database. Processing the data to be synchronized according to the table structure information of the target database may include: deduplicating the data to be synchronized according to the unique key of the data table contained in the target database. A unique key may refer to a field, or a combination of fields, whose value uniquely identifies a row of data. In some examples, the unique key of the data table may be a single field. As shown in Table 1, the student number may be the unique key of the data table. Each student number (e.g., 123XXX, 234XXX, 345XXX) is the value of the unique key for its row of data.

TABLE 1

Student number	Name	Score	Class
123XXX	Zhang XX	78	Grade 8, Class 8
234XXX	Li XX	85	Grade 8, Class 8
345XXX	Wang XX	95	Grade 8, Class 8
123XXX	Zhang XX	87	Grade 8, Class 8

In some other examples, the unique key of the data table may consist of multiple fields. As shown in Table 2, the order ID and the identity card number may together constitute the unique key of Table 2.

TABLE 2

Order ID	Identity card number	Order amount	Order date
123***	4114***	49.9 RMB	2020-05-20
234***	4223***	199.9 RMB	2020-06-15
345***	1123***	239.8 RMB	2020-10-11

In some embodiments, deduplicating the data to be synchronized according to the unique key of the data table contained in the target database may include: dividing data with the same unique key in the data to be synchronized into a group to obtain a plurality of groups; acquiring the timestamps of the data to be synchronized; for each of the plurality of groups, sorting the data in the group by timestamp, most recent first; and for each of the plurality of groups, selecting the top-ranked data in the group. In some examples, each piece of data in the database is associated with a timestamp that characterizes the time at which the data was written to the database. In some examples, the database may include multiple pieces of data with the same unique key, which may be the result of modifications or additions to the data. For example, Table 1 above includes two data rows having the same student number 123XXX. The first of these rows (i.e., the data in the first row of Table 1) may have been written to Table 1 at, for example, 9:00 a.m. on October 4, 2020. The second of these rows (i.e., the data in the fourth row of Table 1) may have been written to Table 1 at, for example, 2:00 p.m. on October 4, 2020. When grouping is performed, these two rows are placed in the same group and sorted by time. Since the second row was written closest to the current grouping time, it is ranked ahead of the first row. When data synchronization is performed, only the second row is selected for synchronization. The synchronized data is therefore the latest data, the amount of data to synchronize is reduced, and data synchronization efficiency is improved.
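The grouping and ranking steps can be sketched as follows. The `deduplicate` function, the column names, and the `written_at` timestamp field are hypothetical; timestamps are assumed to be comparable values (here ISO 8601 strings, which sort chronologically). The unique key is passed as a sequence of field names so that both the single-field case of Table 1 and the multi-field case of Table 2 are covered.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def deduplicate(rows: Iterable[dict],
                unique_key: Sequence[str],
                ts_field: str = "written_at") -> List[dict]:
    """Keep, for each unique-key value, only the most recently written row."""
    # Step 1: put rows that share the same unique key into one group.
    groups: Dict[Tuple, List[dict]] = {}
    for row in rows:
        groups.setdefault(tuple(row[k] for k in unique_key), []).append(row)
    latest = []
    for group in groups.values():
        # Steps 2 and 3: sort each group by timestamp, most recent first.
        group.sort(key=lambda r: r[ts_field], reverse=True)
        # Step 4: select the top-ranked row of the group.
        latest.append(group[0])
    return latest


# Rows modeled on Table 1; the timestamps are illustrative.
table_1 = [
    {"student_no": "123XXX", "score": 78, "written_at": "2020-10-04T09:00"},
    {"student_no": "234XXX", "score": 85, "written_at": "2020-10-04T09:00"},
    {"student_no": "345XXX", "score": 95, "written_at": "2020-10-04T09:00"},
    {"student_no": "123XXX", "score": 87, "written_at": "2020-10-04T14:00"},
]
synced = deduplicate(table_1, unique_key=["student_no"])
```

Here `synced` holds three rows, and the row kept for student number 123XXX is the one with score 87, since it was written later.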

As the amount of data increases, for example, when a data table holds tens of millions of rows, looking up data in the table takes a long time. To reduce the burden on the database and shorten the query time, the database is usually divided into a plurality of data tables (i.e., sub-databases). In the related art, for data synchronization of a database (e.g., MySQL) including a plurality of data tables, the data tables are generally synchronized one by one in a serial manner. Fig. 3 shows a schematic diagram of a data synchronization process in a database and table sharding scenario according to an example of the present disclosure. As shown in fig. 3, when the database includes n sub-databases (n being a positive integer), synchronizing the sub-databases serially takes a lot of time. For example, if synchronizing one sub-database takes time T, completing the synchronization of all n sub-databases takes n × T (before acceleration). Moreover, when the synchronization of one sub-database fails, the whole synchronization task fails.

In this regard, the present disclosure provides an asynchronous concurrency mechanism. In some embodiments, in response to determining that the data synchronization type is data synchronization from the source database of the first type to the target database, the data synchronization method of the present disclosure further includes: determining the number of data tables in the source database of the first type. Extracting the data to be synchronized from the source database of the first type may include: in response to determining that the source database of the first type includes a plurality of data tables, sending a request to a data transmission service corresponding to the plurality of data tables to request to simultaneously extract data to be synchronized corresponding to the plurality of data tables from the plurality of data tables, respectively, and storing the extracted data to be synchronized corresponding to the plurality of data tables in a temporary storage unit. As shown in fig. 3, in the case that the database includes n sub-databases, automatic synchronization threads Thread1, Thread2, …, Threadn may be created for sub-databases 1, 2, …, n, respectively. Each thread is responsible for handling the synchronization task of one sub-database. Therefore, in the case that it takes time T to synchronize one sub-database, after the concurrent execution mechanism is introduced (after acceleration), the time consumed by the synchronization process of all n sub-databases is close to T, which saves data synchronization time and improves data synchronization efficiency. Meanwhile, when the synchronization of one of the sub-databases fails, the synchronization processes of the other sub-databases are not affected.
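A thread-per-sub-database mechanism of this kind might be sketched as below, using the Python standard library. The names `sync_shards_concurrently` and `fake_sync`, the shard names, and the returned status strings are hypothetical; the stand-in `fake_sync` simply simulates one failing sub-database.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def sync_shards_concurrently(
    shard_names: List[str],
    sync_one_shard: Callable[[str], int],
) -> Dict[str, str]:
    """Run one synchronization task per sub-database in its own worker thread."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(shard_names))) as pool:
        futures = {pool.submit(sync_one_shard, name): name for name in shard_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                row_count = future.result()
                results[name] = f"ok ({row_count} rows)"
            except Exception as exc:
                # A failing sub-database is recorded but does not abort the others.
                results[name] = f"failed: {exc}"
    return results


def fake_sync(shard: str) -> int:
    """Stand-in for real extraction; the second shard is made to fail."""
    if shard == "shard_2":
        raise RuntimeError("connection refused")
    return 100


status = sync_shards_concurrently(["shard_1", "shard_2", "shard_3"], fake_sync)
```

Whether the total time actually approaches T depends on the workload being dominated by I/O waits and on the source database and network sustaining n simultaneous extractions.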

Fig. 4 is a flow diagram of a data synchronization method 400 according to some example embodiments of the present disclosure. As shown in fig. 4, the method includes the following steps.

In step 401, in response to receiving a data synchronization request, a data synchronization type indicated in the data synchronization request is determined.

In step 402, according to the data synchronization type, a data synchronization process corresponding to the data synchronization type is invoked to extract the data to be synchronized from the source database.

In step 403, table structure information of the target database is determined, and data to be synchronized is processed according to the table structure information of the target database.

In step 404, the processed data to be synchronized is written into the target database.

In step 405, an execution state of writing the processed data to be synchronized into the target database is detected at predetermined time intervals.

In step 406, if the execution status is determined to be a failure, the number of times the execution status is a failure is counted.

In step 407, it is determined whether the number of times the execution status is a failure is less than or equal to a preset threshold. If so, the method returns to step 404; otherwise, the method proceeds to step 408.

In step 408, a message that the execution status of writing the processed data to be synchronized into the target database is failed is sent to the initiator of the data synchronization request.

Thus, by allowing retries of the synchronization process and limiting the number of retries, the efficiency of data synchronization can be improved. Meanwhile, after the synchronization process fails, a synchronization process failure message is sent to the initiator of the data synchronization request, so that the initiator can take other remedial measures in time, and the user experience is improved.
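The retry logic of steps 404 to 408 can be illustrated with the following sketch. It simplifies the described flow by assuming that the write call itself reports success or failure, whereas the method above detects the execution state at predetermined intervals; the names `write_with_retries`, `max_failures`, and `check_interval_s`, and the notification text, are assumptions for the example.

```python
import time
from typing import Callable, List


def write_with_retries(
    write_to_target: Callable[[], bool],
    notify_initiator: Callable[[str], None],
    max_failures: int = 3,
    check_interval_s: float = 0.0,
) -> bool:
    """Retry the write until it succeeds or failures exceed the threshold."""
    failure_count = 0
    while True:
        succeeded = write_to_target()      # step 404: write to the target
        time.sleep(check_interval_s)       # step 405: wait before checking state
        if succeeded:
            return True
        failure_count += 1                 # step 406: count the failure
        if failure_count > max_failures:   # step 407: compare with the threshold
            # Step 408: report the failure to the initiator of the request.
            notify_initiator("writing to the target database failed")
            return False


attempts: List[int] = []
messages: List[str] = []


def flaky_write() -> bool:
    """Hypothetical writer that fails twice and then succeeds."""
    attempts.append(1)
    return len(attempts) >= 3


ok = write_with_retries(flaky_write, messages.append, max_failures=3)
```

With a threshold of 3, the write is attempted at most four times: the fourth failure exceeds the threshold and triggers the failure message.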

The data synchronization method according to the exemplary embodiment of the present disclosure is explained above. Although the operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, nor that all illustrated operations be performed, to achieve desirable results. A data synchronization platform and a data synchronization apparatus according to an exemplary embodiment of the present disclosure are described below.

The embodiment of the disclosure also provides a data synchronization platform, and the data synchronization method of the embodiment of the disclosure can be executed by the data synchronization platform, and the data synchronization platform is configured with a plurality of data synchronization components. Fig. 5 is a schematic structural diagram of a data synchronization platform 500 according to some exemplary embodiments of the present disclosure. For each data synchronization type, such as MySQL2UDW, ES2UDW, or Kafka2UDW, the data synchronization platform 500 may invoke the data synchronization component 501 corresponding to that data synchronization type to automatically implement data synchronization.
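One possible way to select a component per data synchronization type is a simple registry keyed by the type name, sketched below. Only the type names MySQL2UDW, ES2UDW, and Kafka2UDW come from the description; the registry `COMPONENTS`, the `register` decorator, the `dispatch` function, and the request fields are hypothetical, and the component bodies are placeholders.

```python
from typing import Callable, Dict

SyncComponent = Callable[[dict], str]

# Hypothetical registry mapping a synchronization type to its component.
COMPONENTS: Dict[str, SyncComponent] = {}


def register(sync_type: str) -> Callable[[SyncComponent], SyncComponent]:
    """Decorator that registers a component under a synchronization type."""
    def decorator(func: SyncComponent) -> SyncComponent:
        COMPONENTS[sync_type] = func
        return func
    return decorator


@register("MySQL2UDW")
def mysql_to_udw(request: dict) -> str:
    return f"MySQL2UDW component handling {request['source']}"


@register("ES2UDW")
def es_to_udw(request: dict) -> str:
    return f"ES2UDW component handling {request['source']}"


@register("Kafka2UDW")
def kafka_to_udw(request: dict) -> str:
    return f"Kafka2UDW component handling {request['source']}"


def dispatch(request: dict) -> str:
    """Invoke the component that matches the request's synchronization type."""
    sync_type = request["sync_type"]
    if sync_type not in COMPONENTS:
        raise ValueError(f"unsupported data synchronization type: {sync_type}")
    return COMPONENTS[sync_type](request)


result = dispatch({"sync_type": "ES2UDW", "source": "es-cluster"})
```

A registry of this kind lets a new synchronization type be added by registering one more component, without changing the dispatch logic.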

Fig. 6 is a framework diagram of a data synchronization platform 600 according to some exemplary embodiments of the present disclosure. As shown in fig. 6, the data synchronization platform 600 includes a plurality of service modules. The configuration entry module 601 is configured to enable a user to input a data synchronization request, including, for example, link information of a source database, a synchronization mode (e.g., full synchronization, incremental synchronization, or real-time synchronization), a synchronization type, and permission group information. The Tomcat container service module 602 is configured to respond to a request (e.g., a data synchronization request) submitted by a user and is responsible for routing the request to the corresponding service module of the platform, thereby triggering subsequent execution actions. The configuration parsing service module 603 is configured to parse configuration information submitted by a user, for example, parsing out the data table structure to be synchronized, the permission group used, the synchronization component to be used, and the like according to the link information. The user meta-information module 604 is configured to store the user's permission group information and corresponding task meta-information for subsequent task management and downstream task construction and execution. The rights management service module 605 is configured to determine the permission group used by each task and then authorize downstream synchronization tasks to read and write the target table constructed by the user. The component identification service module 606 is configured to identify the synchronization component to be used based on the meta-information submitted by the user. The task construction service module 607 is configured to automatically invoke the modular component library according to the component identification result and construct synchronization tasks, such as MySQL2UDW, ES2UDW, and Kafka2UDW. The routine result synchronization module 608 is configured to send the execution status of the synchronization task to the user, for example, to send the user a message that the synchronization task failed upon a synchronization task failure.
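As an illustration of what parsing a submitted configuration might produce, the sketch below turns a raw dictionary into a typed request. The class names `SyncMode`, `LinkInfo`, and `SyncRequest`, the function `parse_request`, and all keys and sample values are hypothetical; the link-information fields follow those listed in claim 10.

```python
from dataclasses import dataclass
from enum import Enum


class SyncMode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"
    REAL_TIME = "real_time"


@dataclass(frozen=True)
class LinkInfo:
    address: str
    port: int
    username: str
    password: str
    database: str
    table: str


@dataclass(frozen=True)
class SyncRequest:
    link: LinkInfo
    sync_mode: SyncMode
    sync_type: str
    permission_group: str


def parse_request(raw: dict) -> SyncRequest:
    """Validate a raw configuration and convert it into a typed request."""
    link = LinkInfo(
        address=raw["address"],
        port=int(raw["port"]),
        username=raw["username"],
        password=raw["password"],
        database=raw["database"],
        table=raw["table"],
    )
    return SyncRequest(
        link=link,
        sync_mode=SyncMode(raw["sync_mode"]),  # raises ValueError if unknown
        sync_type=raw["sync_type"],
        permission_group=raw["permission_group"],
    )


# Hypothetical configuration as a user might submit it.
request = parse_request({
    "address": "db.example.internal", "port": "3306",
    "username": "reader", "password": "secret",
    "database": "school", "table": "student",
    "sync_mode": "incremental", "sync_type": "MySQL2UDW",
    "permission_group": "analytics",
})
```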

Fig. 7 is a schematic diagram of a functional hierarchical framework of a data synchronization platform 700 according to some exemplary embodiments of the present disclosure. At the user configuration layer, the business side only needs to configure the information of the task to be synchronized (such as the synchronization type and the synchronization mode). At the platform service layer, the meta-information of each task is maintained, and a corresponding synchronization component is automatically selected according to the configuration information provided by the user to construct a synchronization task. The HQL (Hive SQL, a data analysis and processing language used in Hive) parsing service acquires structural information of a data table according to the link information of the source database, and then parses it to generate a table creation statement conforming to Hive SQL syntax, so as to create a Hive table. The Web UI is a front-end interactive page that interfaces with the user and serves as the entry point for operations such as submitting task information and querying task status. At the template component layer, various synchronization components including MySQL2UDW, ES2UDW, and Kafka2UDW are integrated, and the upstream and downstream of data synchronization form a closed loop, which facilitates data tracing.
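The generation of a table creation statement by the HQL parsing service could look roughly like the following sketch. The type mapping `MYSQL_TO_HIVE`, the function `build_hive_ddl`, and the sample table are assumptions for illustration; the mapping is deliberately incomplete and falls back to STRING for unlisted source types.

```python
from typing import List, Tuple

# Illustrative, incomplete mapping from MySQL column types to Hive types.
MYSQL_TO_HIVE = {
    "int": "INT",
    "bigint": "BIGINT",
    "varchar": "STRING",
    "text": "STRING",
    "double": "DOUBLE",
    "datetime": "TIMESTAMP",
}


def build_hive_ddl(table: str, columns: List[Tuple[str, str]]) -> str:
    """Build a Hive CREATE TABLE statement from (name, MySQL type) pairs."""
    definitions = []
    for name, mysql_type in columns:
        # Drop any length suffix such as "(32)" before looking up the type.
        base_type = mysql_type.split("(")[0].strip().lower()
        hive_type = MYSQL_TO_HIVE.get(base_type, "STRING")
        definitions.append(f"  `{name}` {hive_type}")
    body = ",\n".join(definitions)
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n{body}\n)"


ddl = build_hive_ddl(
    "student",
    [("study_no", "varchar(32)"), ("score", "int"), ("written_at", "datetime")],
)
```

In a real deployment the column list would be read from the source database's metadata using the link information, rather than supplied by hand as here.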

Fig. 8 is a schematic block diagram of a data synchronization apparatus 800 according to some example embodiments of the present disclosure. As shown in fig. 8, the data synchronization apparatus 800 includes a determining unit 801, an invoking unit 802, an acquisition unit 803, a processing unit 804, and a synchronization unit 805.

The determining unit 801 is configured to determine, in response to receiving a data synchronization request, a data synchronization type indicated in the data synchronization request.

The invoking unit 802 is configured to invoke a data synchronization process corresponding to the data synchronization type according to the data synchronization type, so as to extract data to be synchronized from the source database.

The acquisition unit 803 is configured to determine table structure information of the target database.

The processing unit 804 is configured to process the data to be synchronized according to the table structure information of the target database.

The synchronization unit 805 is configured to write the processed data to be synchronized to the target database.

In some examples, the operations of the determining unit 801, the invoking unit 802, the acquisition unit 803, the processing unit 804 and the synchronization unit 805 correspond to steps 201-205, respectively, of the method 200 described above with respect to fig. 2 and are therefore not described in detail herein. In some examples, the data synchronization platforms 500 and 600 described above may be examples of the data synchronization apparatus 800.
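The cooperation of the five units can be pictured as a simple pipeline, as in the sketch below. The class `DataSyncApparatus`, its field names, and the stand-in callables are hypothetical; in particular, the processing shown (keeping only columns present in the target table structure) is just one assumed example of processing according to table structure information.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class DataSyncApparatus:
    determine_type: Callable[[dict], str]                # determining unit 801
    invoke_extraction: Callable[[str, dict], List[Row]]  # invoking unit 802
    acquire_table_structure: Callable[[dict], dict]      # acquisition unit 803
    process: Callable[[List[Row], dict], List[Row]]      # processing unit 804
    write: Callable[[List[Row]], int]                    # synchronization unit 805

    def run(self, request: dict) -> int:
        """Pass a request through the five units in order."""
        sync_type = self.determine_type(request)
        rows = self.invoke_extraction(sync_type, request)
        structure = self.acquire_table_structure(request)
        processed = self.process(rows, structure)
        return self.write(processed)


target_table: List[Row] = []


def keep_known_columns(rows: List[Row], structure: dict) -> List[Row]:
    """Assumed processing step: drop columns absent from the target table."""
    return [{col: row.get(col) for col in structure["columns"]} for row in rows]


def write_rows(rows: List[Row]) -> int:
    target_table.extend(rows)
    return len(rows)


apparatus = DataSyncApparatus(
    determine_type=lambda req: req["sync_type"],
    invoke_extraction=lambda sync_type, req: [{"id": 1, "name": "a", "extra": "x"}],
    acquire_table_structure=lambda req: {"columns": ["id", "name"]},
    process=keep_known_columns,
    write=write_rows,
)
written = apparatus.run({"sync_type": "MySQL2UDW"})
```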

Although specific functionality is discussed above with reference to particular modules, it should be noted that the functionality of the various modules discussed herein may be divided into multiple modules and/or at least some of the functionality of multiple modules may be combined into a single module. Performing an action by a particular module discussed herein includes the particular module itself performing the action, or alternatively the particular module invoking or otherwise accessing another component or module that performs the action (or performs the action in conjunction with the particular module). Thus, a particular module that performs an action can include the particular module that performs the action itself and/or another module that the particular module invokes or otherwise accesses that performs the action.

An aspect of the present disclosure also provides an electronic device, which may include a processor; and a memory storing a program comprising instructions which, when executed by the processor, cause the processor to perform the steps of the method according to an embodiment of the disclosure.

An aspect of the present disclosure also provides a computer readable storage medium storing a program comprising instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the steps of a method according to an embodiment of the present disclosure.

An aspect of the present disclosure also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of a method according to an embodiment of the present disclosure.

Examples of such electronic devices and computer-readable storage media are described below with reference to fig. 9.

Fig. 9 illustrates an example configuration of a computing device 900 as an electronic device that may be used to implement the modules and functions described herein. Computing device 900 may be a variety of different types of devices, such as a server of a service provider, a device associated with a client (e.g., a client device), a system on a chip, and/or any other suitable computing device or computing system. Examples of computing device 900 include, but are not limited to: a desktop computer, a server computer, a notebook or netbook computer, a mobile device (e.g., a tablet or phablet device, a cellular or other wireless phone (e.g., a smartphone), a notepad computer, a mobile station), a wearable device (e.g., glasses, a watch), an entertainment device (e.g., an entertainment appliance, a set-top box communicatively coupled to a display device, a game console), a television or other display device, an automotive computer, and so forth. Thus, the computing device 900 may range from a full resource device with substantial memory and processor resources (e.g., personal computers, game consoles) to a low-resource device with limited memory and/or processing resources (e.g., traditional set-top boxes, hand-held game consoles).

Computing device 900 may include at least one processor 902, memory 904, communication interface(s) 906, display device 908, other input/output (I/O) devices 910, and one or more mass storage devices 912, which may be capable of communicating with each other, such as through a system bus 914 or other appropriate connection.

The processor 902 may be a single processing unit or multiple processing units, all of which may include single or multiple computing units or multiple cores. The processor 902 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitry, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 902 may be configured to retrieve and execute computer-readable instructions stored in the memory 904, mass storage device 912, or other computer-readable medium, such as program code for an operating system 916, program code for an application program 918, program code for other programs 920, and so forth.

Memory 904 and mass storage device 912 are examples of computer storage media for storing instructions that are executed by processor 902 to perform the various functions described above. By way of example, the memory 904 may generally include both volatile and nonvolatile memory (e.g., RAM, ROM, and the like). In addition, the mass storage device 912 may generally include a hard disk drive, solid state drive, removable media including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CDs, DVDs), storage arrays, network attached storage, storage area networks, and the like. Memory 904 and mass storage device 912 may both be collectively referred to herein as memory or computer storage media, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that may be executed by processor 902 as a particular machine configured to implement the operations and functions described in the examples herein.

A number of program modules may be stored on the mass storage device 912. These programs include an operating system 916, one or more application programs 918, other programs 920, and program data 922, which can be loaded into memory 904 for execution. Examples of such applications or program modules may include, for instance, computer program logic (e.g., computer program code or instructions) for implementing the following components/functions: the determining unit 801, the invoking unit 802, the acquisition unit 803, the processing unit 804 and the synchronization unit 805, the method 200, the method 400 and/or further embodiments described herein.

Although illustrated in fig. 9 as being stored in memory 904 of computing device 900, modules 916, 918, 920, and 922, or portions thereof, may be implemented using any form of computer-readable media that is accessible by computing device 900. As used herein, "computer-readable media" includes at least two types of computer-readable media, namely computer storage media and communication media.

Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.

In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism. Computer storage media, as defined herein, does not include communication media.

Computing device 900 may also include one or more communication interfaces 906 for exchanging data with other devices, such as over a network, direct connection, or the like, as previously discussed. Such communication interfaces may be one or more of the following: any type of network interface (e.g., a Network Interface Card (NIC)), a wired or wireless interface (such as an IEEE 802.11 Wireless LAN (WLAN) interface), a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and the like. Communication interface 906 may facilitate communications within a variety of networks and protocol types, including wired networks (e.g., LAN, cable, etc.) and wireless networks (e.g., WLAN, cellular, satellite, etc.), the Internet, and so forth. Communication interface 906 may also provide for communication with external storage devices (not shown), such as in storage arrays, network attached storage, storage area networks, and the like.

In some examples, a display device 908, such as a monitor, may be included for displaying information and images to a user. Other I/O devices 910 may be devices that receive various inputs from a user and provide various outputs to the user, and may include touch input devices, gesture input devices, cameras, keyboards, remote controls, mice, printers, audio input/output devices, and so forth.

While the disclosure has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative and exemplary and not restrictive; the present disclosure is not limited to the disclosed embodiments. Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps not listed, the indefinite article "a" or "an" does not exclude a plurality, and the term "a plurality" means two or more. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims

1. A method of data synchronization, comprising:

in response to receiving a data synchronization request, determining a data synchronization type indicated in the data synchronization request;

invoking, according to the data synchronization type, a data synchronization process corresponding to the data synchronization type to extract data to be synchronized from a source database;

determining table structure information of a target database;

processing the data to be synchronized according to the table structure information of the target database; and

writing the processed data to be synchronized into the target database.

2. The method of claim 1, wherein the data synchronization type is determined based on a type of the source database.

3. The method of claim 2, wherein according to the data synchronization type, invoking a data synchronization process corresponding to the data synchronization type to extract the data to be synchronized from the source database comprises:

in response to determining that the data synchronization type is data synchronization from the source database of the first type to the target database, sending a request to a data transmission service corresponding to the source database of the first type to request to extract the data to be synchronized from the source database of the first type and store the extracted data to be synchronized in a temporary storage unit.

4. The method of claim 2, wherein according to the data synchronization type, invoking a data synchronization process corresponding to the data synchronization type to extract the data to be synchronized from the source database comprises:

in response to determining that the data synchronization type is data synchronization from the source database of the second type to the target database, acquiring a plug-in corresponding to the source database of the second type; and

extracting the data to be synchronized from the source database of the second type through the plug-in.

5. The method of claim 2, wherein according to the data synchronization type, invoking a data synchronization process corresponding to the data synchronization type to extract the data to be synchronized from the source database comprises:

in response to determining that the data synchronization type is data synchronization from the source database of the third type to the target database, extracting the data to be synchronized from the source database of the third type at preset time intervals, and storing the extracted data to be synchronized in a temporary storage unit.

6. The method of any of claims 1-5, wherein the table structure information of the target database includes a unique key of a data table contained in the target database, and

wherein processing the data to be synchronized according to the table structure information of the target database comprises:

de-duplicating the data to be synchronized according to the unique key of the data table contained in the target database.

7. The method of claim 6, wherein de-duplicating the data to be synchronized according to the unique key of the data table contained in the target database comprises:

dividing the data with the same unique key in the data to be synchronized into a group to obtain a plurality of groups;

acquiring a timestamp of the data to be synchronized, wherein the timestamp is used for representing the time when the data to be synchronized is written into a corresponding source database;

for each group in the plurality of groups, sorting the data in the group in chronological order according to the timestamp; and

for each group in the plurality of groups, selecting the top-ranked data in the group.

8. The method of claim 2 or 3, wherein, in response to determining that the data synchronization type is data synchronization from a source database of a first type to the target database, the method further comprises:

determining the number of data tables in the first type of source database;

wherein, extracting the data to be synchronized from the source database of the first type comprises:

in response to determining that the source database of the first type includes a plurality of data tables, sending a request to a data transmission service corresponding to the plurality of data tables to request to simultaneously extract data to be synchronized corresponding to the plurality of data tables from the plurality of data tables, respectively, and storing the extracted data to be synchronized corresponding to the plurality of data tables in the temporary storage unit.

9. The method of claim 1, wherein the method further comprises:

determining link information of the source database indicated in the data synchronization request, wherein the link information is used for enabling the data synchronization process to locate the source database.

10. The method of claim 9, wherein the link information of the source database comprises one or more selected from the group consisting of an address of the source database, a port of the source database, a username and password to access the source database, a name of the source database, and a name of a table of data contained in the source database.

11. The method according to any one of claims 1-5, wherein the method further comprises:

detecting the execution state of writing the processed data to be synchronized into the target database at preset time intervals;

in response to determining that the execution state is a failure, counting a number of times that the execution state is a failure; and

in response to determining that the number of times is less than or equal to a preset threshold, continuing to execute the step of writing the processed data to be synchronized into the target database.

12. The method of claim 11, wherein the method further comprises:

in response to determining that the number of times is greater than the preset threshold, determining that the execution state of writing the processed data to be synchronized into the target database is a failure; and

sending, to an initiator of the data synchronization request, a message that the execution state of writing the processed data to be synchronized into the target database is a failure.

13. The method of claim 1, wherein the data to be synchronized comprises all data in the source database.

14. The method of claim 1, wherein the data to be synchronized comprises data changed in the source database within a predetermined period of time and a data change type,

wherein the data change type comprises data modification, data addition, and data deletion.

15. A data synchronization apparatus, comprising:

a determining unit configured to determine a data synchronization type indicated in a data synchronization request in response to receiving the data synchronization request;

an invoking unit configured to invoke, according to the data synchronization type, a data synchronization process corresponding to the data synchronization type to extract data to be synchronized from a source database;

an acquisition unit configured to determine table structure information of a target database;

a processing unit configured to process the data to be synchronized according to the table structure information of the target database; and

a synchronization unit configured to write the processed data to be synchronized into the target database.

16. A data synchronization platform, comprising:

a processor; and

a memory storing a program comprising instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1 to 14.

17. A computer readable storage medium storing a program, the program comprising instructions that when executed by a processor of an electronic device cause the electronic device to perform the method of any of claims 1-14.