CN112214543A

CN112214543A - Data synchronization method and device and terminal equipment

Info

Publication number: CN112214543A
Application number: CN201910626397.8A
Authority: CN
Inventors: 李伟; 熊友军
Original assignee: Ubtech Robotics Corp
Current assignee: Ubtech Robotics Corp
Priority date: 2019-07-11
Filing date: 2019-07-11
Publication date: 2021-01-12

Abstract

The invention belongs to the technical field of data transmission and provides a data synchronization method, a data synchronization device and terminal equipment. The method comprises the following steps: deploying an incremental reporting data synchronization agent service of a relational database; deploying a landing distributed file system agent service of a distributed publish-subscribe message system cluster; when a write request instruction is received, writing data into the relational database and synchronizing the data to the distributed publish-subscribe message system cluster through the incremental reporting data synchronization agent service; and synchronizing the data of the distributed publish-subscribe message system cluster to a distributed file system and storing the data. The invention realizes Flume-based data storage from the relational database to the distributed file system, achieves read-write separation during data synchronization, and improves the data reading speed.

Description

Data synchronization method and device and terminal equipment

Technical Field

The invention belongs to the technical field of data transmission, and particularly relates to a data synchronization method, a data synchronization device and terminal equipment.

Background

Flume is a distributed, reliable and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. Furthermore, Flume can acquire data from a plurality of sources and send the data to a sink through a channel, and it supports multiplexing and multi-hop flows. Accordingly, Flume is widely used in various fields.

However, Flume does not provide or support data synchronization for relational databases, nor does it provide a complete implementation of data storage from relational databases to non-relational databases.

Disclosure of Invention

In view of this, embodiments of the present invention provide a data synchronization method, an apparatus, and a terminal device, so as to solve the problem that, in the prior art, Flume does not provide or support data synchronization for a relational database and has no complete implementation method for storing data from the relational database to a non-relational database.

A first aspect of an embodiment of the present invention provides a data synchronization method, including:

deploying an incremental reporting data synchronization agent service of a relational database; the incremental reporting data synchronization agent service is used for synchronizing the data increment of the relational database to the distributed publish-subscribe message system cluster;

deploying a landing distributed file system agent service of the distributed publish-subscribe message system cluster; the landing distributed file system agent service is used for storing the data of the distributed publish-subscribe message system cluster to a distributed file system;

when a write request instruction is received, writing data into the relational database, and synchronizing the data to the distributed publish-subscribe message system cluster through the incremental report data synchronization proxy service;

and synchronizing the data of the distributed publishing and subscribing message system cluster to a distributed file system and storing the data.

Optionally, before deploying the incremental reporting data synchronization agent service of the relational database, the method includes:

customizing a source based on a log collection system;

defining an implementation mechanism for incremental synchronization of the relational database;

and building the distributed publishing and subscribing message system and the distributed file system cluster.

Optionally, deploying the incremental reporting data synchronization agent service of the relational database includes:

deploying the incremental reporting data synchronization agent service of the relational database, so that the relational database synchronizes data increments to the distributed publish-subscribe message system cluster according to the implementation mechanism.

Optionally, defining an implementation mechanism of incremental synchronization of the relational database includes:

reading the offset of the database table, and recording the offset in a preset file;

if the process is restarted, acquiring a reading record;

and according to the reading record, continuously reading the offset of the database table and recording.

A second aspect of an embodiment of the present invention provides a data synchronization apparatus, including:

the first deployment module is used for deploying the incremental reporting data synchronization proxy service of the relational database; the incremental reporting data synchronization agent service is used for synchronizing the data increment of the relational database to the distributed publishing and subscribing message system cluster;

the second deployment module is used for deploying the landing distributed file system agent service of the distributed publish-subscribe message system cluster; the landing distributed file system agent service is used for storing the data of the distributed publish-subscribe message system cluster to a distributed file system;

the receiving module is used for writing data into the relational database when a write request instruction is received, and synchronizing the data to the distributed publish-subscribe message system cluster through the incremental report data synchronization proxy service;

and the synchronization module is used for synchronizing the data of the distributed publishing and subscribing message system cluster to a distributed file system and storing the data.

Optionally, the apparatus further includes:

the self-defining module is used for self-defining the source based on the log collection system;

the defining module is used for defining an incremental synchronization realizing mechanism of the relational database;

and the building module is used for building the distributed publishing and subscribing message system and the distributed file system cluster.

Optionally, the first deployment module includes:

and the deployment unit is used for deploying the increment reporting data synchronization proxy service of the relational database, so that the relational database synchronizes the data increment to the distributed publish-subscribe message system cluster according to an implementation mechanism.

Optionally, the defining module includes:

the first reading unit is used for reading the offset of the database table and recording the offset in a preset file;

the acquisition unit is used for acquiring the read record if the process is restarted;

and the second reading unit is used for continuously reading the offset of the database table and recording the offset according to the reading record.

A third aspect of an embodiment of the present invention provides a terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the computer program.

A fourth aspect of embodiments of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method as described above.

In the embodiments of the invention, the incremental reporting data synchronization agent service of the relational database and the landing distributed file system agent service of the distributed publish-subscribe message system cluster are deployed; when a write request instruction is received, data is written into the relational database and synchronized to the distributed publish-subscribe message system cluster through the incremental reporting data synchronization agent service; and the data of the distributed publish-subscribe message system cluster is synchronized to a distributed file system and stored. Flume-based data storage from the relational database to the distributed file system is thereby realized, read-write separation during data synchronization is achieved, and the data reading speed is improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the descriptions of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without inventive effort.

Fig. 1 is a schematic flowchart of a data synchronization method according to an embodiment of the present invention;

fig. 2 is a schematic flow chart of a high availability clustering scheme according to an embodiment of the present invention;

fig. 3 is a schematic flowchart of a data synchronization method according to a second embodiment of the present invention;

fig. 4 is a schematic flowchart of a data synchronization method according to a third embodiment of the present invention;

fig. 5 is a schematic structural diagram of a data synchronization apparatus according to a fourth embodiment of the present invention;

fig. 6 is a schematic structural diagram of a data synchronization apparatus according to a fifth embodiment of the present invention;

fig. 7 is a schematic structural diagram of a data synchronization apparatus according to a sixth embodiment of the present invention;

fig. 8 is a schematic diagram of a terminal device according to a seventh embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood by those skilled in the art, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention and the above-described drawings are intended to cover non-exclusive inclusions. For example, a process, method, or system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used to distinguish between different objects and are not used to describe a particular order.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Example one

As shown in fig. 1, the present embodiment provides a data synchronization method, which can be applied to terminal devices such as a mobile phone, a PC, a tablet computer, and the like. The data synchronization method provided by the embodiment includes:

S101, deploying an incremental reporting data synchronization agent service of a relational database; the incremental reporting data synchronization agent service is used for synchronizing the data increment of the relational database to the distributed publish-subscribe message system cluster.

In a specific application, the incremental reporting data synchronization agent service of the relational database is deployed so that the data increment of the relational database is synchronized to the distributed publish-subscribe message system cluster. Relational databases include, but are not limited to, MySQL databases.

In this embodiment, the distributed publish-subscribe message system cluster is a Kafka cluster. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of a consumer-scale website.

In one embodiment, step S101 includes:

In a specific application, the incremental reporting data synchronization proxy service of the relational database is deployed, so that the relational database synchronizes data increments to the distributed publish-subscribe message system cluster according to an incremental synchronization implementation mechanism. The incremental synchronization mechanism is specifically an incremental synchronization mechanism of the relational database described in step S106 of the following embodiment.

S102, deploying a landing distributed file system agent service of the distributed publish-subscribe message system cluster; the landing distributed file system agent service is used for storing the data of the distributed publish-subscribe message system cluster to a distributed file system.

In a specific application, the landing distributed file system agent service of the distributed publish-subscribe message system cluster is deployed so that the data of the cluster is landed, that is, persisted, in the distributed file system.

In this embodiment, the landing distributed file system agent service refers to an HDFS agent service (kafka tohdfsent proxy). Hadoop is a distributed system infrastructure developed by the Apache Foundation, and it implements a distributed file system, the Hadoop Distributed File System, HDFS for short. HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware; it provides high-throughput access to application data and is suitable for applications with large data sets. HDFS relaxes some POSIX requirements so that file system data can be accessed as a stream (streaming access).

In this embodiment, the distributed file system proxy service of the distributed publish-subscribe messaging system cluster uses the high availability clustering scheme as shown in fig. 2.

S103, when a write request instruction is received, writing data into the relational database, and synchronizing the data to the distributed publish-subscribe message system cluster through the incremental reporting data synchronization agent service.

In a specific application, when a write request instruction is received, the data to be written is written into the relational database, and the incremental reporting data synchronization agent service then synchronizes that data from the relational database to the distributed publish-subscribe message system cluster.
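Step S103 can be illustrated with a minimal sketch. It is not the patented implementation: SQLite stands in for the relational database, an in-memory deque stands in for one topic of the publish-subscribe cluster, and the table name `events` and the functions `open_database`, `handle_write_request` and `sync_increment` are hypothetical names chosen for illustration.

```python
import sqlite3
from collections import deque


def open_database():
    """Create an in-memory SQLite database (assumed stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
    )
    return conn


def handle_write_request(conn, payload):
    """Write path: the application only writes to the relational database."""
    conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
    conn.commit()


def sync_increment(conn, topic, last_id):
    """Agent path: publish rows with id > last_id and return the new offset."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    for row_id, payload in rows:
        topic.append((row_id, payload))  # stand-in for publishing to the cluster
        last_id = row_id
    return last_id
```

In this sketch the write path never touches the topic. Only the agent reads the table and publishes, which is one way to picture the read-write separation the embodiment describes.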

S104, synchronizing the data of the distributed publish-subscribe message system cluster to a distributed file system and storing the data.

In a specific application, the data to be written in the distributed publish-subscribe message system cluster is synchronized to the distributed file system and stored by the landing distributed file system agent service of the cluster, so that Flume-based data synchronization is realized.

In this embodiment, the incremental reporting data synchronization agent service of the relational database and the landing distributed file system agent service of the distributed publish-subscribe message system cluster are deployed; when a write request instruction is received, data is written into the relational database and synchronized to the distributed publish-subscribe message system cluster through the incremental reporting data synchronization agent service; and the data of the distributed publish-subscribe message system cluster is synchronized to a distributed file system and stored. Flume-based data storage from the relational database to the distributed file system is thereby realized, read-write separation during data synchronization is achieved, and the data reading speed is improved.
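The landing step can be sketched as follows, under stated assumptions: a local directory stands in for the distributed file system, a deque of dictionaries stands in for the messages of one topic, and the function `land_batch`, the JSON Lines layout and the `events-<first>-<last>.jsonl` file naming are hypothetical choices that the source does not specify.

```python
import json
from collections import deque
from pathlib import Path


def land_batch(topic, target_dir, batch_size=100):
    """Drain up to batch_size messages from the topic into one new file.

    Returns the path of the file written, or None if the topic was empty.
    """
    batch = []
    while topic and len(batch) < batch_size:
        batch.append(topic.popleft())
    if not batch:
        return None
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    first_id, last_id = batch[0]["id"], batch[-1]["id"]
    # Name the file after the id range it holds so its contents are easy to locate
    out_path = target / f"events-{first_id}-{last_id}.jsonl"
    with out_path.open("w", encoding="utf-8") as fh:
        for message in batch:
            fh.write(json.dumps(message) + "\n")
    return out_path
```

Writing messages in batches rather than one file per message reflects a common concern with distributed file systems, which tend to handle a few large files better than many small ones. The batch size here is an arbitrary illustrative default.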

Example two

As shown in fig. 3, this embodiment is a further description of the method steps in the first embodiment. In this embodiment, before step S101, the method further includes:

S105, customizing the source based on the log collection system.

In a specific application, the source is customized based on the log collection system. Here, Source refers to the Flume component that collects data from a data source. Flume (the log collection system) is a highly available, highly reliable, distributed system for collecting, aggregating and transmitting massive logs. Flume supports customizing various data senders in the log system for collecting data; it also provides the ability to perform simple processing on data and write it to various (customizable) data receivers.
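The role of a custom source can be pictured with a small Python analog of the source, channel and sink arrangement. This is only a conceptual sketch and does not use Flume's actual Java interfaces; the classes `ListSource` and `CollectingSink` and the function `run_agent` are hypothetical.

```python
import queue


class ListSource:
    """Hypothetical custom source: emits each record of a list once as an event."""

    def __init__(self, records):
        self._records = list(records)
        self._next = 0

    def poll(self):
        """Return the next event, or None when nothing new is available."""
        if self._next >= len(self._records):
            return None
        event = self._records[self._next]
        self._next += 1
        return event


class CollectingSink:
    """Hypothetical sink: stores delivered events in memory."""

    def __init__(self):
        self.delivered = []

    def deliver(self, event):
        self.delivered.append(event)


def run_agent(source, sink, capacity=10):
    """Move events source -> channel -> sink until the source is exhausted."""
    channel = queue.Queue(maxsize=capacity)  # bounded buffer between the two ends
    while True:
        event = source.poll()
        if event is None:
            break
        channel.put(event)
        # Drain immediately so the bounded channel never blocks in this
        # single-threaded sketch
        sink.deliver(channel.get())
    return len(sink.delivered)
```

A source that reads a database table incrementally would replace `ListSource.poll` with a query for rows beyond the last recorded offset, while the channel and sink stay unchanged.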

S106, defining an implementation mechanism of incremental synchronization of the relational database.

In a specific application, an implementation mechanism of incremental synchronization of the relational database is defined, so that data increments of the relational database are synchronized to a specified position according to the implementation mechanism.

In this embodiment, the implementation mechanism of incremental synchronization of the relational database records the read offset of a database table in a preset file (which may be a [ status. If the process is restarted, the reading record can be obtained, so that the table continues to be read incrementally from that record and the new offset is recorded in the preset file.

S107, building the distributed publish-subscribe message system and the distributed file system cluster.

In a specific application, a distributed publish-subscribe message system and a distributed file system cluster are built. In this embodiment, the distributed publish-subscribe message system is a Kafka system, and the distributed file system cluster is a Hadoop cluster.

In this embodiment, the source is customized based on the log collection system, the implementation mechanism of incremental synchronization of the relational database is defined, and the distributed publish-subscribe message system and the distributed file system cluster are built, which lays a foundation for realizing data storage from the relational database to the distributed file system.

Example three

As shown in fig. 4, this embodiment is a further description of the method steps in the first embodiment. In this embodiment, step S106 includes:

S1061, reading the offset of the database table, and recording the offset in a preset file.

In a specific application, the offset of the database table in the relational database is read and recorded in a preset file, and in this embodiment, the preset file may be a [ status.

S1062, if the process is restarted, acquiring the reading record.

In a specific application, if the currently performed process is restarted, the reading record is acquired to check the last reading progress.

S1063, continuing to read the offset of the database table according to the reading record, and recording the offset.

In a specific application, the last reading progress is checked according to the reading record, reading of the database table in the relational database continues from that offset, and the new offset is recorded in the preset file.

The embodiment defines an implementation mechanism of incremental synchronization of the relational database, and realizes incremental synchronization operation of data from the relational database to the Kafka cluster.
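Steps S1061 to S1063 can be sketched as below. This is a minimal sketch under assumptions, not the patented implementation: SQLite stands in for the relational database, the auto-incrementing `id` column is assumed to serve as the offset, the state file is assumed to be a small JSON document, and the function names are hypothetical.

```python
import json
import os
import sqlite3


def load_offset(state_path):
    """S1062: return the last recorded offset, or 0 if no record exists yet."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path, encoding="utf-8") as fh:
        return json.load(fh)["offset"]


def save_offset(state_path, offset):
    """S1061/S1063: record the offset, replacing the state file atomically."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump({"offset": offset}, fh)
    os.replace(tmp_path, state_path)  # avoids leaving a half-written state file


def read_increment(conn, state_path):
    """Read rows past the recorded offset, then advance and record the offset."""
    offset = load_offset(state_path)
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (offset,)
    ).fetchall()
    if rows:
        save_offset(state_path, rows[-1][0])
    return rows
```

Because the offset lives in a file rather than in process memory, a restarted process that calls `read_increment` with the same path resumes after the last recorded row instead of rereading the whole table. Whether the offset should be recorded before or after the rows are handed downstream is a design choice the source does not address.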

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

Example four

As shown in fig. 5, the present embodiment provides a data synchronization apparatus 100 for performing the method steps in the first embodiment. The data synchronization apparatus 100 provided in this embodiment includes:

the first deployment module 101 is configured to deploy an incremental reporting data synchronization proxy service of the relational database; the incremental reporting data synchronization agent service is used for synchronizing the data increment of the relational database to the distributed publishing and subscribing message system cluster;

the second deployment module 102 is configured to deploy a landing distributed file system agent service of the distributed publish-subscribe message system cluster; the landing distributed file system agent service is used for storing the data of the distributed publish-subscribe message system cluster to a distributed file system;

the receiving module 103 is configured to, when receiving a write request instruction, write data into the relational database, and synchronize the data to the distributed publish-subscribe message system cluster through the incremental reporting data synchronization agent service;

a synchronization module 104, configured to synchronize and store the data of the distributed publish-subscribe message system cluster to a distributed file system.

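In one embodiment, the first deployment module 101 includes:

a deployment unit, configured to deploy the incremental reporting data synchronization agent service of the relational database, so that the relational database synchronizes the data increment to the distributed publish-subscribe message system cluster according to the implementation mechanism.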

Example five

As shown in fig. 6, in the present embodiment, the data synchronization apparatus 100 in the fourth embodiment further includes the following structure for executing the method steps in the second embodiment:

a customization module 105, configured to customize a source based on a log collection system;

a defining module 106, configured to define an implementation mechanism of incremental synchronization of the relational database;

and the building module 107 is used for building the distributed publish-subscribe message system and the distributed file system cluster.

Example six

As shown in fig. 7, in the present embodiment, the definition module 106 in the fourth embodiment further includes the following structure for executing the method steps in the third embodiment:

a first reading unit 1061, configured to read an offset of a database table, and record the offset in a preset file;

an obtaining unit 1062, configured to obtain a read record if the process is restarted;

and the second reading unit 1063 is configured to continue reading the offset of the database table according to the read record and record the offset.

Example seven

Fig. 8 is a schematic diagram of the terminal device provided in this embodiment. As shown in fig. 8, the terminal device 8 of this embodiment includes: a processor 80, a memory 81 and a computer program 82, such as a data synchronization program, stored in said memory 81 and operable on said processor 80. The processor 80, when executing the computer program 82, implements the steps in the various data synchronization method embodiments described above, such as the steps S101 to S104 shown in fig. 1. Alternatively, the processor 80, when executing the computer program 82, implements the functions of each module/unit in the above-described device embodiments, for example, the functions of the modules 101 to 104 shown in fig. 5.

Illustratively, the computer program 82 may be partitioned into one or more modules/units that are stored in the memory 81 and executed by the processor 80 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 82 in the terminal device 8. For example, the computer program 82 may be divided into a first deployment module, a second deployment module, a receiving module, and a synchronization module, and each module has the following specific functions:

The terminal device 8 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 80, a memory 81. Those skilled in the art will appreciate that fig. 8 is merely an example of a terminal device 8 and does not constitute a limitation of terminal device 8 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.

The Processor 80 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 81 may be an internal storage unit of the terminal device 8, such as a hard disk or a memory of the terminal device 8. The memory 81 may also be an external storage device of the terminal device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like, provided on the terminal device 8. Further, the memory 81 may include both an internal storage unit and an external storage device of the terminal device 8. The memory 81 is used for storing the computer program and other programs and data required by the terminal device. The memory 81 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A method of data synchronization, comprising:

2. The data synchronization method of claim 1, wherein before deploying the incremental reporting data synchronization agent service of the relational database, the method comprises:

customizing a source based on a log collection system;

3. The data synchronization method of claim 1, wherein deploying the incremental reporting data synchronization agent service of the relational database comprises:

4. The data synchronization method of claim 1, wherein defining an implementation mechanism for incremental synchronization of the relational database comprises:

if the process is restarted, acquiring a reading record;

5. A data synchronization apparatus, comprising:

6. The data synchronization apparatus of claim 5, wherein the apparatus further comprises:

7. The data synchronization apparatus of claim 5, wherein the first deployment module comprises:

8. The data synchronization apparatus of claim 6, wherein the definition module comprises:

9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.