CN113010608A - Data real-time synchronization method and device and computer readable storage medium
- Publication number
- CN113010608A CN202110371580.5A
- Authority
- CN
- China
- Prior art keywords
- data
- real
- time
- processed
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a real-time data synchronization method and device and a computer-readable storage medium. The method is applied to a Flink distributed framework. Data to be processed is read from a Binlog according to configuration information of a source database and ClickHouse and real-time synchronization task control information. The data to be processed is written into memory using a matched data insertion operation mode, according to whether the corresponding database operation type is a data insertion operation, a data deletion operation, or a data update operation. When the data import operation of the target database is triggered, the data in memory is loaded into ClickHouse. The method and device address the data inconsistency that tends to arise from long processing times in the related art, effectively improve the real-time performance of data synchronization, and meet the requirement for high-performance real-time synchronization.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for real-time synchronization of data, and a computer-readable storage medium.
Background
ClickHouse is a column-oriented database management system for OLAP (Online Analytical Processing). As data volumes grow rapidly, it is increasingly widely used in the big data field as a data analysis engine that is both compute-intensive and storage-intensive, owing to its lightweight and easy-to-use characteristics. The primary prerequisite for real-time data analysis with ClickHouse is real-time synchronization of data from a business database, such as a MySQL database or a MySQL-like database such as Alibaba Cloud RDS, into ClickHouse. Data synchronization is the process of migrating data from a source database to a target database; real-time data synchronization is the process of keeping the source database and the target database consistent in real time on the basis of the full data.
In synchronizing business data to ClickHouse in real time for OLAP real-time data analysis, the related technology places the data into Kafka or another message queue as a transit point between the source database and the target database, and then synchronizes the data from the transit point into the target database. However, this approach adds a transmission link, increases the complexity of the whole system, raises the probability of data delay, is prone to data inconsistency caused by long processing times, reduces the real-time performance of data synchronization, and cannot meet the requirement of real-time synchronization of large data volumes.
Disclosure of Invention
The application provides a real-time data synchronization method, a real-time data synchronization device and a computer readable storage medium, which solve the problem that data synchronization is easily inconsistent in long-time processing in the related art, effectively improve the real-time performance of data synchronization and meet the high-performance real-time synchronization requirement.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
an embodiment of the present invention provides a real-time data synchronization method, which is applied to a Flink distributed framework, and includes:
reading data to be processed from the Binlog according to configuration information of a source database and a ClickHouse and real-time synchronous task control information; the Binlog stores the change data of the source database;
writing the data to be processed into a memory by adopting a matched data insertion operation mode according to the database operation type corresponding to the data to be processed; the database operation type comprises a data insertion operation, a data deletion operation and a data updating operation;
and when detecting that the data import operation of the target database is triggered, loading the data in the memory into the ClickHouse.
Optionally, after writing the to-be-processed data into the memory by using the matched data insertion operation mode, the method further includes:
and recording the processing time point of the data to be processed and its position in the Binlog in Checkpoint, so as to record the Binlog synchronization site in the state in real time.
Optionally, after writing the to-be-processed data into the memory by using the matched data insertion operation mode, the method further includes:
and when the data real-time synchronization task is started again, automatically recovering the data real-time synchronization task from the position point corresponding to the fault point and recorded in the Binlog.
Optionally, when it is detected that a destination database data import operation is triggered, loading the data in the memory into the ClickHouse includes:
and when the number of operations storing data into the memory reaches a preset count threshold or the current time reaches a Checkpoint period, loading the data in the memory into the ClickHouse.
Optionally, writing the to-be-processed data into the memory by using a matched data insertion operation mode according to the database operation type corresponding to the to-be-processed data includes:
presetting the data insertion operation mode to comprise a field and a primary key, wherein the field is used for identifying the database operation type;
if the database operation type corresponding to the data to be processed is the data insertion operation, writing the data of the corresponding primary key into the memory;
if the database operation type corresponding to the data to be processed is the data deletion operation, deleting the corresponding data by utilizing the same-primary-key merging function;
if the database operation type corresponding to the data to be processed is the data updating operation, the data insertion operation mode comprises a first field, a first primary key, a second field and a second primary key; and writing the data of the first primary key and the data of the second primary key into the memory in sequence.
Optionally, before reading the data to be processed from the Binlog according to the configuration information of the source database and the ClickHouse and the real-time synchronization task control information, the method further includes:
receiving a concurrency degree setting instruction;
and setting the task thread number of the real-time synchronous task control information according to the concurrency degree in the concurrency degree setting instruction.
Another aspect of the embodiments of the present invention provides a real-time data synchronization apparatus, which is applied to a Flink distributed framework, and includes:
the data reading module is used for reading data to be processed from the Binlog according to the configuration information of the source database and the ClickHouse and the real-time synchronization task control information; the Binlog stores the change data of the source database;
the data writing module is used for writing the data to be processed into the memory by adopting a matched data insertion operation mode according to the database operation type corresponding to the data to be processed; the database operation type comprises a data insertion operation, a data deletion operation and a data updating operation;
and the data synchronization module is used for loading the data in the memory into the ClickHouse when detecting that the data import operation of the target database is triggered.
Optionally, the system further comprises a state recording module, configured to record the processing time point of the data to be processed and the position in the Binlog in Checkpoint, so as to record the Binlog synchronization site in a state in real time.
The embodiment of the present invention further provides a real-time data synchronization apparatus, which includes a processor, where the processor is configured to implement the steps of the real-time data synchronization method according to any one of the foregoing embodiments when executing a computer program stored in a memory.
Finally, an embodiment of the present invention provides a computer-readable storage medium, where a data real-time synchronization program is stored on the computer-readable storage medium, and when executed by a processor, the data real-time synchronization program implements the steps of the data real-time synchronization method according to any one of the foregoing items.
The technical scheme provided by the application has the following advantages. Insert, update, and delete operations on the business data of the source database can be converted and transmitted in memory by using the Flink distributed framework, which realizes end-to-end, high-performance real-time data synchronization from the source end to the destination end ClickHouse and ensures end-to-end data consistency. This addresses the data inconsistency that tends to arise from long processing times in the related technology, effectively improves the real-time performance of data synchronization, and meets the requirement for high-performance real-time synchronization. In addition, the Flink distributed framework can adjust the concurrency according to actual needs, which further helps meet that requirement.
In addition, the embodiment of the invention also provides a corresponding implementation device and a computer readable storage medium for the data real-time synchronization method, so that the method has higher practicability, and the device and the computer readable storage medium have corresponding advantages.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the related art, the drawings required to be used in the description of the embodiments or the related art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a real-time data synchronization method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another real-time data synchronization method according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of another real-time data synchronization method according to an embodiment of the present invention;
Fig. 4 is a structural diagram of a specific embodiment of a real-time data synchronization apparatus according to an embodiment of the present invention;
Fig. 5 is a structural diagram of another embodiment of a real-time data synchronization apparatus according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.
Having described the technical solutions of the embodiments of the present invention, various non-limiting embodiments of the present application are described in detail below.
Referring to fig. 1, fig. 1 is a schematic flow chart of a data real-time synchronization method provided by an embodiment of the present invention, and is applied to a Flink distributed framework, where the embodiment of the present invention may include the following contents:
S101: Read the data to be processed from the Binlog according to the configuration information of the source database and ClickHouse and the real-time synchronization task control information.
It can be understood that data synchronization is the process of migrating data from a source database to a destination database, and real-time data synchronization is the process of keeping the source and destination databases consistent in real time, on the basis of the full data, while the system is running. In other words, real-time synchronization is incremental data updating performed on top of data synchronization, as shown in fig. 2. The source database is where the data to be processed is stored, and ClickHouse serves as the destination database, that is, the location to which the data to be processed is migrated. Data in the source database can be written to a server node through a replication operation; at the same time, the data that changes after data synchronization (the incremental data) is written to the binary log, Binlog, which stores the change data of the source database. The Flink distributed framework reads data from the Binlog through Canal, calls a Binlog parser in the framework (for example, a Binlog Reader) to parse it, and then converts the parsed data and writes it into ClickHouse through a data processor in the framework (for example, a ClickHouse Writer). By default, the binlog file is stored in the directory where the database files are located. If the source database is MySQL, the binlog records the SQL statements that change data or may change data, and stores them on disk in binary form. The binlog can be used to view the change history of a database (for example, all SQL operations at a particular point in time), for incremental backup and recovery (for example, incremental backup and point-in-time recovery), and for MySQL replication (for example, master-master and master-slave replication).
The real-time synchronization task control information indicates which data in the source database is to be processed, where that data is stored in the source database, the database operation type corresponding to the data, the destination field types of the data, how the data is to be processed, and so on. That is, the real-time synchronization task control information determines the data to be processed and the specific task to be executed.
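As an illustration only, the configuration information and task control information described above could be modelled as plain data classes. Every class and field name below (`SourceConfig`, `SinkConfig`, `SyncTaskControl`, `flush_threshold`, and so on) is hypothetical and does not come from the application; the default values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceConfig:
    # Connection details for the source database (hypothetical fields).
    host: str
    port: int
    database: str
    table: str


@dataclass(frozen=True)
class SinkConfig:
    # Connection details for the ClickHouse destination (hypothetical fields).
    host: str
    port: int
    database: str
    table: str


@dataclass(frozen=True)
class SyncTaskControl:
    # Destination field name -> destination field type, e.g. {"id": "Int64"}.
    target_field_types: dict
    # Number of writer task threads (the concurrency degree).
    parallelism: int = 1
    # Flush the in-memory buffer after this many buffered operations.
    flush_threshold: int = 1000
    # Checkpoint period in seconds.
    checkpoint_interval_s: float = 10.0

    def __post_init__(self):
        # Reject settings that could never trigger a write or run a task.
        if self.parallelism < 1:
            raise ValueError("parallelism must be at least 1")
        if self.flush_threshold < 1:
            raise ValueError("flush_threshold must be at least 1")
```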
S102: Write the data to be processed into the memory by adopting a matched data insertion operation mode according to the database operation type corresponding to the data to be processed.
The database operation types of the present application include data insertion operations, data deletion operations, and data update operations. ClickHouse is a column-oriented storage database. Like other columnar storage products such as HBase and Kudu, it writes all imported data sequentially, and written data cannot be changed in place. Update and delete operations are implemented through the Mutation mechanism, which has a significant performance impact and is not officially recommended by ClickHouse. Consequently, data synchronization into ClickHouse, including real-time synchronization, generally supports only inserting data, not modifying or deleting it. Because a source database such as MySQL, a row-oriented database with random reads and writes, inevitably performs a large number of update and delete operations, this embodiment uses the ReplacingMergeTree engine of ClickHouse to synchronize MySQL insert, update, and delete operations in real time and to ensure end-to-end data consistency, rather than supporting only data insertion.
S103: When it is detected that the data import operation of the target database is triggered, load the data in the memory into ClickHouse.
Frequent writes occupy excessive resources, yet data must stay consistent in real time, the requirement of real-time synchronization of large data volumes must be met, and data inconsistency caused by long processing times must be avoided. To balance these goals, a trigger condition for the destination database data import operation can be preset; when the condition is detected to be met, data is written from the memory of the Flink distributed framework into ClickHouse. The trigger condition may be, for example: the number of operations storing data into the memory reaches a preset count threshold, or the current time reaches a Checkpoint period.
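The trigger condition can be sketched as a small in-memory buffer that flushes when either a count threshold or a time period is reached. This is a minimal sketch under stated assumptions: `BufferedSink` and `flush_fn` are hypothetical names, and a wall-clock period stands in for the Checkpoint period, which in a real Flink job would be driven by the framework's checkpoint callbacks rather than by a timer.

```python
import time


class BufferedSink:
    """Buffers rows in memory; flushes on a count threshold or a time period."""

    def __init__(self, flush_fn, max_ops=1000, period_s=10.0, clock=time.monotonic):
        self._flush_fn = flush_fn   # hypothetical callback that loads a batch into ClickHouse
        self._max_ops = max_ops     # preset count threshold
        self._period_s = period_s   # stand-in for the Checkpoint period
        self._clock = clock
        self._rows = []
        self._last_flush = clock()

    def write(self, row):
        # Store one row in memory, then check whether an import is due.
        self._rows.append(row)
        self.maybe_flush()

    def maybe_flush(self):
        due_by_count = len(self._rows) >= self._max_ops
        due_by_time = self._clock() - self._last_flush >= self._period_s
        if self._rows and (due_by_count or due_by_time):
            self.flush()

    def flush(self):
        # Swap the buffer out before calling the sink so new writes start clean.
        batch, self._rows = self._rows, []
        self._last_flush = self._clock()
        if batch:
            self._flush_fn(batch)
```

Passing the clock in as a parameter keeps the time-based trigger testable without sleeping.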
This embodiment is developed on Flink and synchronizes business data to ClickHouse in real time for OLAP real-time data analysis. It is a lightweight solution for real-time computation or analysis of big data, and it can also serve as the underlying real-time ETL (Extract-Transform-Load) tool in an HTAP (Hybrid Transactional and Analytical Processing) system, extracting, transforming, and loading data from the source end to the destination end.
In the technical scheme provided by the embodiment of the invention, insert, update, and delete operations on the business data of the source database can be converted and transmitted in memory by using the Flink distributed framework, which realizes end-to-end, high-performance real-time data synchronization from the source end to the destination end ClickHouse and ensures end-to-end data consistency. This addresses the data inconsistency that tends to arise from long processing times in the related technology, effectively improves the real-time performance of data synchronization, and meets the requirement for high-performance real-time synchronization. In addition, the Flink distributed framework can adjust the concurrency according to actual needs, which further helps meet that requirement.
The foregoing embodiment does not limit how step S102 is performed. This application also provides an implementation, described with reference to fig. 3. After the source data to be processed is parsed and extracted from the Binlog, it is converted into data that ClickHouse can recognize and process, based on the destination table field types read for that data (for example, integer or float), and is then written into ClickHouse through the Flink distributed framework. The implementation may include the following:
The preset data insertion operation mode comprises a field and a primary key, wherein the field identifies the database operation type and the data of the primary key is the data on which the insertion operation is executed. If the database operation type corresponding to the data to be processed is a data insertion operation, the data of the corresponding primary key is written into the memory. If the database operation type is a data deletion operation, the corresponding data is deleted by using the database's function of merging rows with the same primary key. If the database operation type is a data update operation, the data insertion operation mode comprises a first field, a first primary key, a second field, and a second primary key, and the data of the first primary key and the data of the second primary key are written into the memory in sequence. The first field identifies the execution of a data deletion operation, and the second field identifies the execution of a data write operation.
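The mapping from source operations to insert rows can be sketched as follows. This is an illustrative reading of the paragraph above, not the application's code: `to_insert_rows` is a hypothetical helper, the marker field is assumed to be named `_sign` (1 for a write, -1 for a delete), and `_version` is assumed to come from a monotonically increasing counter.

```python
from itertools import count

_version = count(1)  # hypothetical monotonically increasing version source


def to_insert_rows(op, before=None, after=None, next_version=lambda: next(_version)):
    """Map one source change event to ClickHouse insert rows.

    op is "insert", "delete", or "update"; before/after are row dicts.
    _sign = 1 marks a written row, _sign = -1 marks a deleted primary key.
    """
    if op == "insert":
        return [{**after, "_sign": 1, "_version": next_version()}]
    if op == "delete":
        return [{**before, "_sign": -1, "_version": next_version()}]
    if op == "update":
        # An update becomes a delete marker for the old row followed by a
        # write of the new row, written in that order.
        return [
            {**before, "_sign": -1, "_version": next_version()},
            {**after, "_sign": 1, "_version": next_version()},
        ]
    raise ValueError(f"unsupported operation type: {op!r}")
```

Writing the delete marker before the new row gives the new row the higher version, so it is the one that survives when rows with the same primary key are merged.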
As shown in fig. 3, during data synchronization the source database writes the data that needs to be updated synchronously in real time into the source table, and the destination database writes the data synchronized in real time into the destination table. To update all types of incremental data in real time, the destination table in ClickHouse may use the ReplacingMergeTree data engine. MergeTree is the most widely used data engine in ClickHouse and supports massive data, indexes, and partitions. ReplacingMergeTree is an upgraded version of it: among rows with the same primary key, it keeps the latest row according to a version field, and the other rows are automatically eliminated after the data is merged. That is, in this embodiment, any insert, delete, or update operation on the source database, such as a MySQL source table, is converted into an insert operation on the ClickHouse side. The inserted row is the latest version of the data, that is, the row with the maximum value of the _version field; if the operation is a delete, the _sign field is -1, indicating that the data of that primary key is deleted. To obtain the valid data of the table, the following ClickHouse SQL is used: select * from table final where _sign = 1.
This embodiment provides end-to-end real-time data synchronization from the source end to the destination end. Data is converted and transmitted in memory, making it an implementation of an efficient end-to-end real-time ETL tool.
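The effect of merging by version and then filtering on `_sign` can be modelled in a few lines of plain Python. This only imitates the described behaviour for illustration; it is not a ClickHouse client call, and `merged_view` and the `id` key column are hypothetical.

```python
def merged_view(rows, key="id"):
    """Keep the highest-_version row per primary key, then drop deleted keys."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # On equal versions the later row wins, mirroring insertion order.
        if current is None or row["_version"] >= current["_version"]:
            latest[row[key]] = row
    # Equivalent in spirit to filtering the merged table on _sign = 1.
    return [r for r in latest.values() if r["_sign"] == 1]
```

For a key that was inserted, updated, and never deleted, only the newest row remains; for a key whose newest row carries `_sign = -1`, nothing is returned.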
It can be understood that a real-time data synchronization task may end during synchronization because of a power failure, downtime, a system operation error, or a similar cause. This application therefore further provides an embodiment that gives the system fault recovery capability. After the data to be processed is written into the memory in S102 by adopting a matched data insertion operation mode according to the corresponding database operation type, the following may also be included:
Recording the processing time point of the data to be processed and its position in the Binlog in the Checkpoint, so as to record the Binlog synchronization site in the state in real time. When the real-time data synchronization task is started again, the task is automatically recovered from the position recorded in the Binlog that corresponds to the fault point.
A Checkpoint saves the state at a given moment by taking a snapshot of the program; after a task is interrupted, it is by default recovered from the last complete snapshot that was saved. In the Flink distributed framework, state is the data that functions and operators store while processing each element or event. State data can be modified and queried, can be maintained by the application itself, and can hold historical data or intermediate results as the business scenario requires. Using Flink's checkpoint mechanism, the binlog synchronization site can be recorded into Flink's state in real time; for example, the state can be recorded through a checkpoint every 10 seconds. If a synchronization fault occurs, the restarted task automatically recovers the previously recorded binlog synchronization site from the checkpoint, which realizes automatic recovery from the fault point. Unlike comparable real-time synchronization products, there is no need to determine the specific binlog site manually before performing fault recovery.
The real-time data synchronization task restarted in this embodiment may be one that is restarted after being forcibly exited because a fault interrupted it, one that is restarted after being deliberately stopped by an operator, or one that is restarted after exiting for any other reason.
In this embodiment, synchronization site information such as the Binlog position is recorded in the state in real time, and the state is persisted to the file system by Flink's checkpoint mechanism. If the real-time synchronization task fails, then when the task is restarted, as long as the data in the Binlog of the source database has not been cleared and the site has not expired, the task can be automatically recovered from the Binlog site recorded at the fault point, which ensures that the data of the real-time synchronization task is not lost.
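The record-then-restore cycle can be sketched with a small class that snapshots the last processed Binlog position to a file. This stands in for Flink's managed state and checkpoint storage purely for illustration; `BinlogPositionState`, the JSON file layout, and the example file name are assumptions, not part of the application or of Flink's API.

```python
import json
import os
import tempfile


class BinlogPositionState:
    """Tracks the last processed Binlog position and persists it on checkpoint."""

    def __init__(self, path):
        self._path = path
        self.position = None  # set by update(), e.g. file name, offset, timestamp

    def update(self, file, pos, ts):
        # Called for every processed event; only changes in-memory state.
        self.position = {"file": file, "pos": pos, "ts": ts}

    def checkpoint(self):
        # Write to a temp file and rename it so a crash never leaves a
        # partially written snapshot behind.
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.position, f)
        os.replace(tmp, self._path)

    def restore(self):
        # On restart, resume from the last complete snapshot if one exists.
        if os.path.exists(self._path):
            with open(self._path) as f:
                self.position = json.load(f)
        return self.position
```

Positions updated after the last checkpoint are not in the snapshot, so a restarted task re-reads from the checkpointed site; because re-applied events end up as rows with the same primary key, the merge behaviour on the destination side is what keeps the replay from producing duplicates.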
To improve practicality and the user experience, this application also provides a way of generating one parameter of the real-time synchronization task control information, which may include:
receiving a concurrency degree setting instruction;
and setting the task thread number of the real-time synchronization task control information according to the concurrency degree in the concurrency degree setting instruction.
In this embodiment, a Flink distributed framework is adopted to implement real-time data synchronization, and the Flink on Yarn mode may be used at run time. The write concurrency for ClickHouse can be adjusted in the real-time synchronization task control information as needed. Each unit of concurrency is equivalent to one task thread, and the write performance of the task increases approximately linearly with the concurrency.
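One way to picture "each unit of concurrency is one task thread" is to split the buffered rows across a fixed number of writer threads. The sketch below is an assumption-laden illustration: routing rows by a hash of the primary key is a choice made here so that all rows for one key go through the same writer, and `partition_by_key`, `write_concurrently`, and `write_fn` are hypothetical names rather than Flink or ClickHouse APIs.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(rows, parallelism, key="id"):
    """Split rows into `parallelism` buckets; a given key always maps to one bucket."""
    buckets = [[] for _ in range(parallelism)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % parallelism
        buckets[index].append(row)
    return buckets


def write_concurrently(rows, parallelism, write_fn, key="id"):
    """Run one writer task thread per bucket and return the per-bucket results."""
    buckets = partition_by_key(rows, parallelism, key)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(write_fn, buckets))
```

Keeping one key on one writer preserves the relative order of that key's rows within a batch, which matters when a delete marker and a new row for the same key are written in sequence.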
It should be noted that, in the present application, there is no strict sequential execution order among the steps, and as long as a logical order is met, the steps may be executed simultaneously or according to a certain preset order, and fig. 1 to fig. 3 are only schematic manners, and do not represent only such an execution order.
The embodiment of the invention also provides a corresponding apparatus for the real-time data synchronization method, which further improves the practicability of the method. The apparatus can be described separately from the perspective of functional modules and from the perspective of hardware. The real-time data synchronization apparatus provided by the embodiment of the present invention is introduced below; the apparatus described below and the method described above may be cross-referenced.
Based on the angle of the functional module, referring to fig. 4, fig. 4 is a structural diagram of a data real-time synchronization apparatus provided in an embodiment of the present invention, in a specific implementation manner, and is applied to a Flink distributed framework, where the apparatus may include:
a data reading module 401, configured to read data to be processed from the Binlog according to the configuration information of the source database and ClickHouse and the real-time synchronization task control information, where the Binlog stores the change data of the source database;
a data writing module 402, configured to write the data to be processed into the memory in a matched data insertion operation mode according to the database operation type corresponding to the data to be processed, where the database operation type includes a data insertion operation, a data deletion operation, and a data update operation;
a data synchronization module 403, configured to load the data in the memory into ClickHouse when detecting that a data import operation of the destination database is triggered.
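The division of work among the three modules can be sketched as follows. The class names mirror the module names in the text, but the method names, the event dictionaries, and the `load` callback (standing in for the write to ClickHouse) are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]


class DataReadingModule:
    """Reads change events; an in-memory iterable stands in for the Binlog."""

    def __init__(self, binlog: Iterable[Event]) -> None:
        self._binlog = binlog

    def read(self) -> Iterator[Event]:
        yield from self._binlog


class DataWritingModule:
    """Buffers events in memory until the synchronization module loads them."""

    def __init__(self) -> None:
        self.memory: List[Event] = []

    def write(self, event: Event) -> None:
        self.memory.append(event)


class DataSynchronizationModule:
    """Loads buffered data into the destination when an import is triggered."""

    def __init__(self, load: Callable[[List[Event]], None]) -> None:
        self._load = load

    def sync(self, memory: List[Event]) -> int:
        loaded = len(memory)
        if loaded:
            self._load(list(memory))
            memory.clear()
        return loaded


def run_once(binlog: Iterable[Event], load: Callable[[List[Event]], None]) -> int:
    """Read every event, buffer it, then trigger a single import."""
    reader = DataReadingModule(binlog)
    writer = DataWritingModule()
    syncer = DataSynchronizationModule(load)
    for event in reader.read():
        writer.write(event)
    return syncer.sync(writer.memory)
```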
Optionally, in some implementations of this embodiment, the apparatus may further include a state recording module, configured to record, in Checkpoint, the processing time point of the data to be processed and its position in the Binlog, so that the Binlog synchronization position is recorded in the state in real time.
In some implementations of this embodiment, the apparatus may further include a failure recovery module, configured to automatically resume the real-time data synchronization task from the Binlog position recorded at the failure point when the task is started again.
In some other implementations of this embodiment, the apparatus may further include a concurrency degree setting module, configured to set the number of task threads in the real-time synchronization task control information according to the concurrency degree in the received concurrency degree setting instruction.
Optionally, in other implementations of this embodiment, the data writing module 402 may be specifically configured to: preset a data insertion operation mode that includes a field and a primary key, where the field is used to identify the database operation type; if the database operation type corresponding to the data to be processed is a data insertion operation, write the data of the corresponding primary key into the memory; if the database operation type corresponding to the data to be processed is a data deletion operation, delete the corresponding data by using the same-primary-key merge function; if the database operation type corresponding to the data to be processed is a data update operation, the data insertion operation mode includes a first field, a first primary key, a second field, and a second primary key, and the data of the first primary key and the data of the second primary key are written into the memory in sequence.
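One way to read this mapping is that every change, including deletes and updates, is expressed as inserted rows carrying an operation field, and rows sharing a primary key are merged later. The sketch below assumes a `sign` field with values `1` and `-1` and an `id` primary key; these names, the event layout, and the merge rule are illustrative assumptions, not the encoding confirmed by the source.

```python
from typing import Dict, List

Row = Dict[str, object]
Event = Dict[str, object]


def to_insert_rows(event: Event) -> List[Row]:
    """Express a change event purely as rows to insert."""
    op = event["op"]
    if op == "insert":
        return [{"sign": 1, **event["after"]}]
    if op == "delete":
        # A delete becomes an inserted row that cancels the earlier one.
        return [{"sign": -1, **event["before"]}]
    if op == "update":
        # An update becomes two rows written in sequence: first a row that
        # cancels the old image, then a row that adds the new image.
        return [{"sign": -1, **event["before"]}, {"sign": 1, **event["after"]}]
    raise ValueError(f"unsupported operation type: {op}")


def merge_by_primary_key(rows: List[Row], key: str = "id") -> Dict[object, Row]:
    """Simulate a same-primary-key merge: cancelled rows disappear."""
    live: Dict[object, Row] = {}
    for row in rows:
        if row["sign"] == 1:
            live[row[key]] = {k: v for k, v in row.items() if k != "sign"}
        else:
            live.pop(row[key], None)
    return live
```

For example, an insert of `id=1` followed by an update of the same row yields three inserted rows, and the merge leaves only the updated image of `id=1`.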
In some implementations of this embodiment, the data synchronization module 403 may be specifically configured to load the data in the memory into ClickHouse when detecting that the number of operations storing data into the memory reaches a preset count threshold, or that the current time reaches the Checkpoint period.
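The two trigger conditions can be sketched with a small buffer that flushes either on a write count or on elapsed time. `BufferedLoader`, its parameters, and the injectable `clock` are hypothetical; the `load` callback stands in for the actual import into ClickHouse.

```python
import time
from typing import Callable, Dict, List

Row = Dict[str, object]


class BufferedLoader:
    """Buffers rows in memory and loads them when either trigger fires."""

    def __init__(self,
                 load: Callable[[List[Row]], None],
                 max_writes: int,
                 checkpoint_period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._max_writes = max_writes
        self._period = checkpoint_period_s
        self._clock = clock
        self._buffer: List[Row] = []
        self._writes = 0
        self._last_flush = clock()

    def write(self, row: Row) -> None:
        self._buffer.append(row)
        self._writes += 1
        self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Flush if the write count or the elapsed period is reached."""
        due_by_count = self._writes >= self._max_writes
        due_by_time = self._clock() - self._last_flush >= self._period
        if (due_by_count or due_by_time) and self._buffer:
            self._load(list(self._buffer))
            self._buffer.clear()
            self._writes = 0
            self._last_flush = self._clock()
            return True
        return False
```

In this sketch `maybe_flush` is also meant to be called from a periodic timer, so that a slow stream of writes is still loaded once the period elapses.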
The functions of the functional modules of the data real-time synchronization apparatus according to the embodiments of the present invention may be specifically implemented according to the method in the foregoing method embodiments, and the specific implementation process may refer to the related description of the foregoing method embodiments, which is not described herein again.
Therefore, the embodiment of the invention solves the problem in the related art that synchronized data tends to become inconsistent after long periods of processing, effectively improves the real-time performance of data synchronization, and meets the requirement for high-performance real-time synchronization.
The data real-time synchronization device has been described above from the perspective of functional modules. The present application further provides a data real-time synchronization device described from the perspective of hardware. Fig. 5 is a structural diagram of another data real-time synchronization apparatus according to an embodiment of the present application. As shown in fig. 5, the apparatus comprises a memory 50 for storing a computer program;
and a processor 51, configured to implement the steps of the data real-time synchronization method according to any of the above embodiments when executing the computer program.
The processor 51 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 51 may be implemented in at least one hardware form among a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 51 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 51 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 51 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 50 may include one or more computer-readable storage media, which may be non-transitory. Memory 50 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In this embodiment, the memory 50 is at least used for storing the following computer program 501, wherein after being loaded and executed by the processor 51, the computer program can implement the relevant steps of the data real-time synchronization method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 50 may also include an operating system 502, data 503, and the like, and the storage manner may be a transient storage manner or a permanent storage manner. Operating system 502 may include Windows, Unix, Linux, etc. The data 503 may include, but is not limited to, data corresponding to real-time synchronization results of the data, and the like.
In some embodiments, the real-time data synchronization device may further include a display 52, an input/output interface 53, a communication interface 54, a power supply 55, and a communication bus 56.
Those skilled in the art will appreciate that the configuration shown in fig. 5 does not constitute a limitation of the real-time data synchronization apparatus, and that the apparatus may include more or fewer components than those shown, such as a sensor 57.
The functions of the functional modules of the data real-time synchronization apparatus according to the embodiments of the present invention may be specifically implemented according to the method in the foregoing method embodiments, and the specific implementation process may refer to the related description of the foregoing method embodiments, which is not described herein again.
Therefore, the embodiment of the invention solves the problem in the related art that synchronized data tends to become inconsistent after long periods of processing, effectively improves the real-time performance of data synchronization, and meets the requirement for high-performance real-time synchronization.
It is to be understood that, if the data real-time synchronization method in the above embodiments is implemented in the form of software functional units and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application may be embodied, in whole or in part, in the form of a software product, which is stored in a storage medium and is used to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrically erasable programmable ROM, a register, a hard disk, a removable magnetic disk, a CD-ROM, a magnetic or optical disk, and various other media capable of storing program code.
Based on this, an embodiment of the present invention further provides a computer-readable storage medium storing a data real-time synchronization program which, when executed by a processor, implements the steps of the data real-time synchronization method according to any one of the above embodiments.
The functions of the functional modules of the computer-readable storage medium according to the embodiment of the present invention may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
Therefore, the embodiment of the invention solves the problem in the related art that synchronized data tends to become inconsistent after long periods of processing, effectively improves the real-time performance of data synchronization, and meets the requirement for high-performance real-time synchronization.
The embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood with reference to each other. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described briefly, and the relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The present application provides a method, an apparatus, and a computer-readable storage medium for real-time synchronization of data. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present application.
Claims (10)
1. A real-time data synchronization method, applied to a Flink distributed framework, comprising:
reading data to be processed from the Binlog according to configuration information of a source database and ClickHouse and real-time synchronization task control information; the Binlog stores the change data of the source database;
writing the data to be processed into a memory by adopting a matched data insertion operation mode according to the database operation type corresponding to the data to be processed; the database operation type comprises a data insertion operation, a data deletion operation and a data update operation;
and when detecting that the data import operation of the destination database is triggered, loading the data in the memory into the ClickHouse.
2. The real-time data synchronization method according to claim 1, wherein after writing the data to be processed into the memory by using the matched data insertion operation mode, the method further comprises:
recording the processing time point of the data to be processed and its position in the Binlog in Checkpoint, so as to record the Binlog synchronization position in a state in real time.
3. The real-time data synchronization method according to claim 2, wherein after the data to be processed is written into the memory by using the matched data insertion operation mode, the method further comprises:
when the data real-time synchronization task is started again, automatically resuming the data real-time synchronization task from the Binlog position recorded at the failure point.
4. The real-time data synchronization method according to claim 1, wherein loading the data in the memory into the ClickHouse when it is detected that a destination database data import operation is triggered comprises:
when the number of operations storing data into the memory reaches a preset count threshold or the current time reaches a Checkpoint period, loading the data in the memory into the ClickHouse.
5. The real-time data synchronization method according to any one of claims 1 to 4, wherein writing the data to be processed into the memory by using a matched data insertion operation mode according to the database operation type corresponding to the data to be processed comprises:
presetting the data insertion operation mode to comprise a field and a primary key, wherein the field is used for identifying the database operation type;
if the database operation type corresponding to the data to be processed is the data insertion operation, writing the data of the corresponding primary key into the memory;
if the database operation type corresponding to the data to be processed is the data deletion operation, deleting the corresponding data by using the same-primary-key merge function;
if the database operation type corresponding to the data to be processed is the data update operation, the data insertion operation mode comprises a first field, a first primary key, a second field and a second primary key; and writing the data of the first primary key and the data of the second primary key into the memory in sequence.
6. The method of claim 5, wherein before reading the data to be processed from the Binlog according to the configuration information of the source database and the ClickHouse and the real-time synchronization task control information, the method further comprises:
receiving a concurrency degree setting instruction;
and setting the number of task threads in the real-time synchronization task control information according to the concurrency degree in the concurrency degree setting instruction.
7. A real-time data synchronization device, applied to a Flink distributed framework, comprising:
a data reading module, used for reading data to be processed from the Binlog according to the configuration information of the source database and the ClickHouse and the real-time synchronization task control information; the Binlog stores the change data of the source database;
a data writing module, used for writing the data to be processed into the memory by adopting a matched data insertion operation mode according to the database operation type corresponding to the data to be processed; the database operation type comprises a data insertion operation, a data deletion operation and a data update operation;
and a data synchronization module, used for loading the data in the memory into the ClickHouse when detecting that the data import operation of the destination database is triggered.
8. The real-time data synchronization device according to claim 7, further comprising a state recording module for recording a processing time point of the data to be processed and its position in the Binlog in Checkpoint, so as to record the Binlog synchronization position in a state in real time.
9. A real-time data synchronization apparatus comprising a processor for implementing the steps of the real-time data synchronization method according to any one of claims 1 to 6 when executing a computer program stored in a memory.
10. A computer-readable storage medium, having stored thereon a real-time data synchronization program, which when executed by a processor implements the steps of the real-time data synchronization method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110371580.5A CN113010608A (en) | 2021-04-07 | 2021-04-07 | Data real-time synchronization method and device and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110371580.5A CN113010608A (en) | 2021-04-07 | 2021-04-07 | Data real-time synchronization method and device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113010608A true CN113010608A (en) | 2021-06-22 |
Family
ID=76387889
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110371580.5A Pending CN113010608A (en) | 2021-04-07 | 2021-04-07 | Data real-time synchronization method and device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113010608A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113791934A (en) * | 2021-08-13 | 2021-12-14 | 阿里云计算有限公司 | Data recovery method, computing device and storage medium |
CN114003660A (en) * | 2021-11-05 | 2022-02-01 | 广州宸祺出行科技有限公司 | Method and device for efficiently synchronizing real-time data to ClickHouse based on Flink |
CN114297216A (en) * | 2021-12-30 | 2022-04-08 | 北京金堤科技有限公司 | Data synchronization method and device, computer storage medium and electronic equipment |
CN114297214A (en) * | 2021-12-30 | 2022-04-08 | 北京金堤科技有限公司 | Data synchronization method and device, computer storage medium and electronic equipment |
CN117171811A (en) * | 2023-09-12 | 2023-12-05 | 浪潮数字(山东)建设运营有限公司 | Database synchronization and tamper-resistant tracing method and device and electronic equipment |
WO2024082693A1 (en) * | 2022-10-21 | 2024-04-25 | 华为云计算技术有限公司 | Data processing method, and apparatus |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112100147A (en) * | 2020-07-27 | 2020-12-18 | 杭州玳数科技有限公司 | Method and system for realizing real-time acquisition from Binlog to HIVE based on Flink |
-
2021
- 2021-04-07 CN CN202110371580.5A patent/CN113010608A/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112100147A (en) * | 2020-07-27 | 2020-12-18 | 杭州玳数科技有限公司 | Method and system for realizing real-time acquisition from Binlog to HIVE based on Flink |
Non-Patent Citations (4)
Title |
---|
BENOIT SANCHEZ: "clickhouse - Using ReplacingMergeTree as an updatable table: how to delete?", pages 1, Retrieved from the Internet <URL:《https://www.nuomiphp.com/a/stackoverflow/zh/647e0f08116dda29a74510a4.html》> * |
CLICKHOUSE EDITOR: "How to Update Data in ClickHouse", pages 1, Retrieved from the Internet <URL:《https://clickhouse.com/blog/how-to-update-data-in-click-house》> * |
阿里云云栖号: "Real-time data synchronization solution based on Flink SQL CDC", pages 1, Retrieved from the Internet <URL:《https://zhuanlan.zhihu.com/p/280742219》> * |
阿里云云栖号: "Real-time data synchronization solution based on Flink SQL CDC", pages 1, Retrieved from the Internet <URL:《https://zhuanlan.zhihu.com/p/280742219》> * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113791934A (en) * | 2021-08-13 | 2021-12-14 | 阿里云计算有限公司 | Data recovery method, computing device and storage medium |
CN114003660A (en) * | 2021-11-05 | 2022-02-01 | 广州宸祺出行科技有限公司 | Method and device for efficiently synchronizing real-time data to ClickHouse based on Flink |
CN114003660B (en) * | 2021-11-05 | 2022-06-03 | 广州宸祺出行科技有限公司 | Method and device for efficiently synchronizing real-time data to ClickHouse based on Flink |
CN114297216A (en) * | 2021-12-30 | 2022-04-08 | 北京金堤科技有限公司 | Data synchronization method and device, computer storage medium and electronic equipment |
CN114297214A (en) * | 2021-12-30 | 2022-04-08 | 北京金堤科技有限公司 | Data synchronization method and device, computer storage medium and electronic equipment |
CN114297216B (en) * | 2021-12-30 | 2022-09-02 | 北京金堤科技有限公司 | Data synchronization method and device, computer storage medium and electronic equipment |
WO2024082693A1 (en) * | 2022-10-21 | 2024-04-25 | 华为云计算技术有限公司 | Data processing method, and apparatus |
CN117171811A (en) * | 2023-09-12 | 2023-12-05 | 浪潮数字(山东)建设运营有限公司 | Database synchronization and tamper-resistant tracing method and device and electronic equipment |
CN117171811B (en) * | 2023-09-12 | 2024-04-05 | 浪潮数字(山东)建设运营有限公司 | Database synchronization and tamper-resistant tracing method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113010608A (en) | Data real-time synchronization method and device and computer readable storage medium | |
US10261869B2 (en) | Transaction processing using torn write detection | |
US9779128B2 (en) | System and method for massively parallel processing database | |
US8250033B1 (en) | Replication of a data set using differential snapshots | |
US10430298B2 (en) | Versatile in-memory database recovery using logical log records | |
CN113111129A (en) | Data synchronization method, device, equipment and storage medium | |
CN108920698A (en) | A kind of method of data synchronization, device, system, medium and electronic equipment | |
CN112286905A (en) | Data migration method and device, storage medium and electronic equipment | |
US11436139B2 (en) | Object storage change-events | |
CN105159818A (en) | Log recovery method in memory data management and log recovery simulation system in memory data management | |
WO2014060881A1 (en) | Consistency group management | |
CN109614054B (en) | data reading method and system | |
JP2022078978A (en) | Method for equipping computer for data synchronization in data analysis system, computer program product, and computer system (data synchronization in data analysis system) | |
CN110019063A (en) | Method, terminal device and the storage medium of calculate node data disaster tolerance playback | |
CN106155838A (en) | A kind of database back-up data restoration methods and device | |
CN112434108B (en) | Database synchronization method, device and equipment | |
US20180025034A1 (en) | Archival of data in a relational database management system using block level copy | |
US11093348B2 (en) | Method, device and computer program product for recovering metadata | |
CN109614273B (en) | Method and system for reading incremental data | |
CN105659214B (en) | The checkpointing of data cell set | |
CN114138424B (en) | Virtual machine memory snapshot generation method and device and electronic equipment | |
CN115640280A (en) | Data migration method and device | |
US20220253409A1 (en) | Cleaning compensated change records in transaction logs | |
CN115237875B (en) | Log data processing method, device, equipment and storage medium | |
CN111984460B (en) | Metadata recovery method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||