CN117149909A - Data synchronization method, device, storage medium and processor - Google Patents

Publication number
CN117149909A
Authority
CN
China
Prior art keywords
data
batch
synchronized
synchronization
data source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311139696.1A
Other languages
Chinese (zh)
Inventor
段雪超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202311139696.1A
Publication of CN117149909A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data synchronization method, a data synchronization device, a processor and a storage medium. The data synchronization method comprises the following steps: acquiring a stream processing configuration file in response to a real-time data synchronization instruction; when the external data source is a database-type data source, monitoring the log file of the database to judge whether the database has a new operation, and acquiring data to be synchronized from the external data source when a new operation is detected; when the external data source is a non-standardized data source, acquiring data to be synchronized from the external data source; and, based on the stream processing configuration file, converting the data to be synchronized into service data and sending the service data to a target location to realize data synchronization. Real-time streaming synchronization of heterogeneous data sources is thereby realized. Because synchronization is driven by rules in the stream processing configuration file, hot loading is supported, so a newly launched synchronization flow does not affect existing synchronization tasks and the impact on the business is extremely low.

Description

Data synchronization method, device, storage medium and processor
Technical Field
The present application relates to the field of computer technology, and in particular, to a data synchronization method, a data synchronization device, a machine-readable storage medium, a computer program product, and a processor.
Background
Generally, the automated operation and maintenance system of an enterprise needs to interface with cross-domain data (such as host, platform, application and network data) from multiple systems (such as a user center, a configuration management database, a big data platform, a log platform and a change platform). Because the upstream and downstream systems of the automated operation and maintenance systems of the enterprise's branches are built inconsistently, a unified synchronization scheme is difficult to design. When data synchronization is needed, separate development is required for the different upstream and downstream system interfaces of each branch, which is inefficient.
Kettle is an extract, transform, load (ETL) tool that moves data from a source to a destination. Kettle can perform extraction, loading, data lake injection, cleaning, transformation and blending on a variety of data sources. Existing automated operation and maintenance systems generally use Kettle for data synchronization. As the services of such systems grow more complex, the data sources to be interfaced with become more varied and many quasi-real-time synchronization scenarios arise. Kettle, however, does not support real-time streaming synchronization of data, so an existing automated operation and maintenance system that relies on Kettle cannot perform real-time streaming synchronization.
Disclosure of Invention
It is an object of embodiments of the present application to provide a data synchronization method, a data synchronization device, a machine-readable storage medium, a computer program product and a processor. The data synchronization method realizes real-time streaming synchronization of heterogeneous data sources. Because synchronization is driven by rules in the stream processing configuration file, hot loading is supported, so a newly launched synchronization flow does not affect existing synchronization tasks and the impact on the business is extremely low.
In order to achieve the above object, a first aspect of the present application provides a data synchronization method, including:
responding to a real-time data synchronization instruction, and acquiring a stream processing configuration file;
judging whether the database has new operation or not by monitoring a log file of the database under the condition that the external data source is a database type data source, and acquiring data to be synchronized from the external data source under the condition that the database is determined to have the new operation;
under the condition that an external data source is a non-standardized data source, acquiring data to be synchronized from the external data source;
and converting the data to be synchronized into service data based on the stream processing configuration file, and sending the service data to a target position to realize data synchronization.
In the embodiment of the present application, the obtaining the data to be synchronized from the external data source in the case that it is determined that the database has a new operation includes:
judging whether the new operation is an updating operation or an inserting operation;
when the new operation is determined to be an updating operation or an inserting operation, acquiring a primary key ID of data corresponding to the new operation;
and searching the data to be synchronized from the external data source based on the primary key ID of the data.
In an embodiment of the present application, the data synchronization method further includes:
responding to a batch data synchronization instruction to acquire a batch configuration file;
acquiring a batch arrangement flow under the condition that the external data source is a non-standardized data source; the batch arrangement flow is obtained by arranging pre-packaged node components, and the node components are used for realizing each stage in the data synchronization process;
and processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data, and sending the batch data to a target position to realize data synchronization.
In the embodiment of the present application, the batch configuration file includes a cron expression, and the processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data includes:
based on the cron expression, processing the data to be synchronized in the external data source at scheduled times according to the batch arrangement flow to obtain batch data.
In the embodiment of the present application, the batch configuration file includes a fragmentation parameter, and the processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data, and sending the batch data to a target location to realize data synchronization includes:
dividing the data to be synchronized in the external data source into a plurality of business data to be synchronized based on the slicing parameters; wherein, each business data to be synchronized is not related;
based on the batch configuration file, processing each service data to be synchronized according to the batch arrangement flow by adopting a distributed processing mode to obtain batch data, and sending the batch data to a target position to realize data synchronization.
In an embodiment of the present application, the sending the service data to a target location to achieve data synchronization includes:
selecting a corresponding output engine according to the data type of the external data source;
and selecting a pooling interface corresponding to the target position from a preset connection pool, and sending the service data to the target position based on the output engine and the pooling interface so as to realize data synchronization.
A second aspect of the present application provides a data synchronizing device, including:
the first configuration file acquisition module is used for responding to the real-time data synchronization instruction and acquiring a stream processing configuration file;
the first data acquisition module is used for judging whether the database has new operation or not by monitoring the log file of the database under the condition that the external data source is a database type data source, and acquiring data to be synchronized from the external data source under the condition that the database is determined to have the new operation;
the second data acquisition module is used for acquiring data to be synchronized from the external data source under the condition that the external data source is a non-standardized data source;
and the first synchronous output module is used for converting the data to be synchronized into service data based on the stream processing configuration file and sending the service data to a target position so as to realize data synchronization.
In an embodiment of the present application, the first data acquisition module includes:
A judging unit configured to judge whether the new operation is an update operation or an insert operation;
a primary key obtaining unit, configured to obtain a primary key ID of data corresponding to the new operation when the new operation is determined to be an update operation or an insert operation;
and the searching unit is used for searching the data to be synchronized from the external data source based on the primary key ID of the data.
In an embodiment of the present application, the data synchronization device further includes:
the second configuration file acquisition module is used for responding to the batch processing data synchronization instruction to acquire a batch processing configuration file;
the arrangement flow acquisition module is used for acquiring a batch arrangement flow under the condition that the external data source is a non-standardized data source; the batch arrangement flow is obtained by arranging pre-packaged node components, and the node components are used for realizing each stage in the data synchronization process;
and the second synchronous output module is used for processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data, and sending the batch data to a target position to realize data synchronization.
A third aspect of the present application provides a processor configured to perform the data synchronization method described above.
A fourth aspect of the application provides a machine-readable storage medium having stored thereon instructions which, when executed by a processor, cause the processor to be configured to perform the data synchronization method described above.
A fifth aspect of the application provides a computer program product comprising a computer program which, when executed by a processor, implements the data synchronization method described above.
According to the technical scheme, the stream processing configuration file is obtained by responding to the real-time data synchronization instruction; judging whether the database has new operation or not by monitoring a log file of the database under the condition that the external data source is a database type data source, and acquiring data to be synchronized from the external data source under the condition that the database is determined to have the new operation; under the condition that an external data source is a non-standardized data source, acquiring data to be synchronized from the external data source; and converting the data to be synchronized into service data based on the stream processing configuration file, and sending the service data to a target position to realize data synchronization. Therefore, real-time synchronization of heterogeneous data sources is realized, data synchronization can be realized based on rules in the stream processing configuration file, hot loading is supported, the new online synchronization flow cannot influence the existing synchronization task based on hot loading capacity, and the service influence rate is extremely low.
Additional features and advantages of embodiments of the application will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain, without limitation, the embodiments of the application. In the drawings:
FIG. 1 schematically illustrates an application environment of a data synchronization method according to an embodiment of the application;
FIG. 2 schematically shows a flow diagram of a data synchronization method according to an embodiment of the application;
FIG. 3 schematically illustrates a system architecture diagram according to an embodiment of the application;
FIG. 4 schematically shows a block diagram of a data synchronization device according to an embodiment of the application;
FIG. 5 schematically shows an internal structural view of a computer device according to an embodiment of the present application.
Description of the reference numerals
102-a terminal; 104-a server; 410-a first configuration file acquisition module; 420-a first data acquisition module; 430-a second data acquisition module; 440-a first synchronous output module; a01-a processor; a02-a network interface; a03-an internal memory; a04-a display screen; a05-an input device; a06-a nonvolatile storage medium; b01-an operating system; b02-a computer program.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the detailed description described herein is merely for illustrating and explaining the embodiments of the present application, and is not intended to limit the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present application, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.
In addition, if "first", "second", etc. appear in the embodiments of the present application, they are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. The technical solutions of the embodiments may also be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination is contradictory or cannot be realized, it should be considered not to exist and falls outside the scope of protection claimed in the present application.
It should be noted that, in the technical scheme of the application, the acquisition, storage, use, processing and the like of the data all meet the relevant regulations of national laws and regulations.
The data synchronization method provided by the application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. In this environment, a stream processing configuration file is obtained; when the external data source is a database-type data source, the log file of the database is monitored to judge whether the database has a new operation, and data to be synchronized is acquired from the external data source when a new operation is detected; when the external data source is a non-standardized data source, data to be synchronized is acquired from the external data source; and, based on the stream processing configuration file, the data to be synchronized is converted into service data and sent to a target location to realize data synchronization. Real-time synchronization of heterogeneous data sources is thereby realized, and because synchronization is driven by rules in the stream processing configuration file, hot loading is supported, so a newly launched synchronization flow does not affect existing synchronization tasks and the impact on the business is extremely low. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device, and the server 104 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
FIG. 2 schematically shows a flow diagram of a data synchronization method according to an embodiment of the application. As shown in FIG. 2, an embodiment of the present application provides a data synchronization method. The method is described here as applied to the terminal 102 (or the server 104) in FIG. 1, and includes the following steps:
step 210: responding to a real-time data synchronization instruction, and acquiring a stream processing configuration file;
In this embodiment, when real-time data synchronization is required, the user produces a stream processing configuration file by editing the configuration. The stream processing configuration file includes a stream processing engine configuration, in which different data sources (for example MySQL, Elasticsearch or Kafka) use different engine configurations. Configuration items include, but are not limited to, the data source IP, data source port, data source name, data source account/password, number of processing threads, output engine name and rule name, and dynamic refresh is supported. The stream processing configuration file also includes a stream processing rule configuration, which specifies the implementation class and parameters corresponding to each rule and likewise supports dynamic refresh. Different data sources correspond to different stream processing configuration files, which the user can set according to actual conditions.
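As a rough illustration of the configuration items listed above, a stream processing configuration file could be modelled as follows. The JSON layout, the field names, the example values and the `load_stream_config` helper are all assumptions made for this sketch; the application does not specify a file format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StreamRule:
    # Hypothetical rule entry: an implementation class plus its parameters.
    name: str
    impl_class: str
    params: dict = field(default_factory=dict)
    dynamic_refresh: bool = True


@dataclass
class StreamEngineConfig:
    # Configuration items named in the description; the field names are assumed.
    source_type: str
    host: str
    port: int
    source_name: str
    username: str
    password: str
    threads: int
    output_engine: str
    rule_name: str
    dynamic_refresh: bool = True


def load_stream_config(text: str):
    """Parse a JSON document into an engine configuration and its rules."""
    raw = json.loads(text)
    engine = StreamEngineConfig(**raw["engine"])
    rules = [StreamRule(**r) for r in raw.get("rules", [])]
    return engine, rules


EXAMPLE = """
{
  "engine": {
    "source_type": "mysql", "host": "10.0.0.1", "port": 3306,
    "source_name": "cmdb", "username": "sync", "password": "secret",
    "threads": 4, "output_engine": "kafka", "rule_name": "host_to_asset"
  },
  "rules": [
    {"name": "host_to_asset", "impl_class": "rules.RenameFields",
     "params": {"mapping": {"hostname": "asset_name"}}}
  ]
}
"""

engine, rules = load_stream_config(EXAMPLE)
print(engine.source_type, engine.port, rules[0].name)
```

Keeping one such file per data source mirrors the statement that different data sources correspond to different stream processing configuration files.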
Step 220: judging whether the database has new operation or not by monitoring a log file of the database under the condition that the external data source is a database type data source, and acquiring data to be synchronized from the external data source under the condition that the database is determined to have the new operation;
In this embodiment, when the external data source is a database-type data source, the data manipulation language (DML) log file of the database may be monitored, and the data to be synchronized is acquired once a new operation on the database is detected. Specifically:
firstly, judging whether the new operation is an updating operation or an inserting operation;
then, when the new operation is determined to be an updating operation or an inserting operation, acquiring the primary key ID of the data corresponding to the new operation;
and finally, searching out the data to be synchronized from the external data source based on the primary key ID of the data.
In this embodiment, when the database has a new operation, it is judged whether the new operation is an update operation or an insert operation. If it is either of the two, the primary key ID of the affected record can be obtained from the DML log; otherwise the operation is identified as neither an update nor an insert. The primary key ID is then used to look up the corresponding current data in the external data source, which yields the data to be synchronized.
By monitoring the log file, whether a new operation exists or not can be judged in real time, corresponding data to be synchronized can be obtained when the new operation is an updating operation or an inserting operation, and changed data in a database can be quickly obtained in real time, so that real-time synchronization is facilitated.
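The log-driven lookup described above can be sketched as follows. This is a minimal illustration that replaces the real database and its log with in-memory structures; the entry layout (`op`, `table`, `pk`), the names `DATABASE`, `DML_LOG` and `rows_to_sync`, and the choice to skip operations that are neither updates nor inserts are assumptions of the sketch.

```python
from typing import Iterable, Iterator

# Simulated external database: primary key ID -> current row.
DATABASE = {
    1: {"id": 1, "host": "app-01", "status": "online"},
    2: {"id": 2, "host": "app-02", "status": "offline"},
}

# Simulated log entries, in the order they were written (layout is assumed).
DML_LOG = [
    {"op": "insert", "table": "hosts", "pk": 1},
    {"op": "delete", "table": "hosts", "pk": 9},
    {"op": "update", "table": "hosts", "pk": 2},
]


def rows_to_sync(log_entries: Iterable[dict], database: dict) -> Iterator[dict]:
    """Yield the current row for every update or insert found in the log."""
    for entry in log_entries:
        # Only update and insert operations trigger a lookup (assumed policy
        # for other operations: skip them).
        if entry["op"] not in ("update", "insert"):
            continue
        # Fetch the current state of the record by its primary key ID.
        row = database.get(entry["pk"])
        if row is not None:
            yield row


pending = list(rows_to_sync(DML_LOG, DATABASE))
print(pending)
```

Reading the current row by primary key, rather than replaying the logged values, means the synchronized data reflects the state of the record at lookup time.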
Step 230: under the condition that an external data source is a non-standardized data source, acquiring data to be synchronized from the external data source;
In this embodiment, non-standardized data sources include network attached storage (Network Attached Storage, NAS) files, Hadoop Distributed File System (HDFS) files, platform interfaces, and the like. The data to be synchronized is collected by means of a file, an interface or a message queue. In a specific implementation, the non-standardized data source can push data to the streaming collection unit in any of these three ways, and the streaming collection unit obtains the data to be synchronized from the file, the interface or the message queue.
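The three collection paths can be pictured with a small sketch in which a collection unit drains a file, an interface and a message queue into one list of records. The function names, the in-memory stand-ins (`io.StringIO` for a file, a callable for a platform interface, `queue.Queue` for a message queue) and the sample records are assumptions made for illustration only.

```python
import io
import queue
from typing import Callable, Iterator


def from_file(handle) -> Iterator[str]:
    """Read one record per non-empty line of a file-like object."""
    for line in handle:
        line = line.strip()
        if line:
            yield line


def from_interface(fetch: Callable[[], list]) -> Iterator[str]:
    """Pull records from a platform interface, represented here by a callable."""
    yield from fetch()


def from_queue(q: queue.Queue) -> Iterator[str]:
    """Drain whatever is currently waiting in a message queue."""
    while True:
        try:
            yield q.get_nowait()
        except queue.Empty:
            return


def collect(*sources: Iterator[str]) -> list:
    """Gather the records of all sources into one list, in source order."""
    records = []
    for source in sources:
        records.extend(source)
    return records


mq = queue.Queue()
mq.put("mq-record")
collected = collect(
    from_file(io.StringIO("file-record-1\n\nfile-record-2\n")),
    from_interface(lambda: ["api-record"]),
    from_queue(mq),
)
print(collected)
```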
Step 240: and converting the data to be synchronized into service data based on the stream processing configuration file, and sending the service data to a target position to realize data synchronization.
In this embodiment, rules are configured in the stream processing configuration file. The acquired data to be synchronized is parsed and converted into service data according to these rules, and the service data is then sent to the target location. The target location may be a destination system, a database, or the like. Specifically, sending the service data to the target location comprises the following steps:
Firstly, selecting a corresponding output engine according to the data type of an external data source;
and then selecting a pooling interface corresponding to the target position from a preset connection pool, and sending the service data to the target position based on the output engine and the pooling interface so as to realize data synchronization.
In this embodiment, data can be output to different external systems or databases, each of which has its own output engine. The output engines can be preset to support HDFS, MySQL, Kafka, the time-series database InfluxDB, and the like, and the engine is selected according to the database or external system concerned. The connection pool stores, in advance, pooled interfaces for connecting to the different target locations, so that data can be exchanged with different databases or systems.
Connections are made through the pooled interfaces: when data is output, a connection is obtained from the connection pool, which prevents the CPU or memory resource anomalies caused by creating connections without limit. Because the output engines and their corresponding interfaces are configurable, new engines and rules can be developed as extensions to interface with different middleware, databases and systems.
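The combination of an output engine and a pooled connection can be sketched as below. This is an illustration under stated assumptions: `ConnectionPool`, `FakeConnection`, the `OUTPUT_ENGINES` registry, the target name `kafka://topic-a` and the module-level `SINK` list are hypothetical, and no real MySQL or Kafka client is involved.

```python
import json
import queue
from contextlib import contextmanager

SINK = []  # records what the fake connections "sent", for inspection


class FakeConnection:
    """Stand-in for a real client connection (hypothetical)."""

    def __init__(self, target: str):
        self.target = target

    def send(self, payload: str) -> None:
        SINK.append((self.target, payload))


class ConnectionPool:
    """Fixed-size pool: connections are borrowed and returned, never created ad hoc."""

    def __init__(self, factory, size: int):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def borrow(self, timeout: float = 1.0):
        # Raises queue.Empty if no connection becomes free within the timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)


# Hypothetical output engines: each one encodes a record for its destination type.
OUTPUT_ENGINES = {
    "mysql": lambda rec: "INSERT " + repr(sorted(rec.items())),
    "kafka": lambda rec: json.dumps(rec, sort_keys=True),
}

# One pool per target location, created in advance.
POOLS = {
    "kafka://topic-a": ConnectionPool(lambda: FakeConnection("kafka://topic-a"), size=2),
}


def send(record: dict, engine_name: str, target: str) -> None:
    """Encode the record with the chosen engine and send it over a pooled connection."""
    encode = OUTPUT_ENGINES[engine_name]
    with POOLS[target].borrow() as conn:
        conn.send(encode(record))


send({"id": 1, "host": "app-01"}, "kafka", "kafka://topic-a")
print(SINK)
```

Bounding the pool size is what caps resource use: a caller that finds the pool empty waits for a connection to be returned instead of opening a new one.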
In the implementation process, the stream processing configuration file is obtained by responding to the real-time data synchronization instruction; judging whether the database has new operation or not by monitoring a log file of the database under the condition that the external data source is a database type data source, and acquiring data to be synchronized from the external data source under the condition that the database is determined to have the new operation; under the condition that an external data source is a non-standardized data source, acquiring data to be synchronized from the external data source; and converting the data to be synchronized into service data based on the stream processing configuration file, and sending the service data to a target position to realize data synchronization. Therefore, real-time streaming synchronization of heterogeneous data sources is realized, the data synchronization can be realized based on rules in a streaming configuration file, hot loading is supported, the new online synchronization flow cannot influence the existing synchronization task based on hot loading capacity, and the service influence rate is extremely low.
In some embodiments, a batch synchronization method is further provided for a non-standardized data source, and specifically the data synchronization method further includes:
firstly, responding to a batch data synchronization instruction to acquire a batch configuration file;
then, under the condition that the external data source is a non-standardized data source, acquiring a batch arrangement flow; the batch arrangement flow is obtained by arranging pre-packaged node components, and the node components are used for realizing each stage in the data synchronization process;
and finally, processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data, and sending the batch data to a target position to realize data synchronization.
In this embodiment, when batch data synchronization is required, the user produces a batch configuration file by editing the configuration. The batch configuration file includes a batch arrangement configuration, a batch task configuration, and the like. In the batch arrangement configuration, a data synchronization flow is arranged by dragging node components and then connecting and configuring them, which yields the batch arrangement flow. Because the flow is built by dragging components and drawing connections, the tool is cheap to learn and synchronization flows are quick to develop. A node component is obtained by defining each step of data synchronization, such as reading an Excel file, field replacement, row filtering or field conversion, as a separate node and then packaging that node as a component. The batch arrangement flow is executed according to the node flow defined in the batch configuration file.
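One way to picture pre-packaged node components is as small functions that each take a stream of rows and return a stream of rows, with a flow being an ordered list of such nodes. The sketch below follows that idea; the node names, the `run_flow` helper and the use of CSV text in place of an Excel file are assumptions, not the implementation described in the application.

```python
import csv
import io
from typing import Callable, Iterable, List

Row = dict
Node = Callable[[Iterable[Row]], Iterable[Row]]


def read_csv(text: str) -> Node:
    """Source node; CSV text stands in for the Excel reader mentioned in the text."""
    def node(_rows):
        return csv.DictReader(io.StringIO(text))
    return node


def replace_field(name: str, old: str, new: str) -> Node:
    """Field replacement node: substitute a substring in one field."""
    def node(rows):
        for row in rows:
            yield {**row, name: row[name].replace(old, new)}
    return node


def convert_field(name: str, func) -> Node:
    """Field conversion node: apply a function to one field."""
    def node(rows):
        for row in rows:
            yield {**row, name: func(row[name])}
    return node


def filter_rows(predicate) -> Node:
    """Row filtering node: keep only rows that satisfy the predicate."""
    def node(rows):
        return (row for row in rows if predicate(row))
    return node


def run_flow(nodes: List[Node]) -> List[Row]:
    """Execute the nodes in their arranged order, feeding each one's output to the next."""
    rows: Iterable[Row] = []
    for node in nodes:
        rows = node(rows)
    return list(rows)


FLOW = [
    read_csv("host,cpu\napp_01,4\napp_02,0\n"),
    replace_field("host", "_", "-"),
    convert_field("cpu", int),
    filter_rows(lambda r: r["cpu"] > 0),
]
result = run_flow(FLOW)
print(result)
```

A drag-and-connect editor would then only need to produce the ordered node list and each node's parameters.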
It should be noted that, the above method for synchronizing the batch processing data to the target position and the method for synchronizing the stream processing data may be the same, and data output may be performed for different external systems or databases, where different external systems or databases have different output engines, and specific implementation processes will not be described herein.
The non-standardized data sources are subjected to batch processing data synchronization so as to meet different business requirements. Meanwhile, the batch processing arrangement flow is generated by supporting the mode of adopting the dragging component and the connecting line, so that the tool learning cost is low, and the synchronous flow development efficiency is high.
Kettle does not support distributed task scheduling management, so task re-running, distributed scheduling and task slicing cannot be guaranteed in batch scenarios, and batch performance is too slow for big data synchronization scenarios. In some embodiments, in order to synchronize batch data on a schedule, the batch configuration file includes a cron expression, and the processing, based on the batch configuration file, the data to be synchronized in the external data source according to the batch arrangement flow to obtain batch data includes: based on the cron expression, processing the data to be synchronized in the external data source at scheduled times according to the batch arrangement flow to obtain batch data.
In this embodiment, the timer may be implemented with Quartz integrated into Spring. A cron expression is a string that specifies that a task is to run at a particular point in time or periodically. It consists of 6 or 7 fields which, from left to right, denote seconds, minutes, hours, day of month, month, day of week and year; the year field is optional. The batch arrangement flow of the batch data synchronization process is executed at the points in time given by the configured cron expression, which realizes scheduled data synchronization. Cron expressions thus provide the timing function with very little configuration.
In some embodiments, batch data synchronization may involve a large volume of data that needs to be processed in slices. For this case, the batch configuration file includes slicing parameters, and processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data, and sending the batch data to a target location to realize data synchronization, includes:
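To make the field layout concrete, the sketch below splits a 6- or 7-field cron expression into named fields and checks whether it fires at a given moment. It is a deliberately simplified illustration written in Python, not the Quartz implementation: only `*`, `?` and comma-separated integers are handled, and the helper names `split_cron` and `fires_at` are assumptions.

```python
from datetime import datetime

# Field order of a 6- or 7-field cron expression; the year field is optional.
QUARTZ_FIELDS = ("second", "minute", "hour", "day_of_month",
                 "month", "day_of_week", "year")


def split_cron(expression: str) -> dict:
    """Map each field of the expression to its name, left to right."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    return dict(zip(QUARTZ_FIELDS, parts))


def _matches(spec: str, value: int) -> bool:
    # Simplification: only "*", "?" and comma-separated integers are handled.
    if spec in ("*", "?"):
        return True
    return value in {int(item) for item in spec.split(",")}


def fires_at(expression: str, moment: datetime) -> bool:
    """Return True if every field of the expression matches the given moment."""
    f = split_cron(expression)
    # Quartz numbers weekdays from 1 (Sunday) to 7 (Saturday).
    quartz_weekday = moment.isoweekday() % 7 + 1
    return (
        _matches(f["second"], moment.second)
        and _matches(f["minute"], moment.minute)
        and _matches(f["hour"], moment.hour)
        and _matches(f["day_of_month"], moment.day)
        and _matches(f["month"], moment.month)
        and _matches(f["day_of_week"], quartz_weekday)
        and _matches(f.get("year", "*"), moment.year)
    )


NIGHTLY = "0 30 2 * * ?"  # every day at 02:30:00
print(fires_at(NIGHTLY, datetime(2023, 9, 5, 2, 30, 0)))
print(fires_at(NIGHTLY, datetime(2023, 9, 5, 2, 31, 0)))
```

A real scheduler would also handle ranges, increments and named values, which this sketch omits.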
Firstly, dividing the data to be synchronized in the external data source into a plurality of pieces of service data to be synchronized based on the slicing parameters, wherein the pieces of service data to be synchronized are not associated with one another;
and then, based on the batch configuration file, processing each service data to be synchronized by adopting a distributed processing mode according to the batch arrangement flow to obtain batch data, and sending the batch data to a target position to realize data synchronization.
In this embodiment, the data to be synchronized is divided into a plurality of pieces of service data to be synchronized such that no piece is associated with any other, which facilitates distributed processing. The pieces are then processed in a distributed manner by service field according to the batch configuration file: each piece is processed according to the batch arrangement flow, and the processed data is sent to the target location to achieve synchronization.
By setting the slicing parameters, large volumes of data can be sliced and synchronized in batches and slice-based distributed task scheduling is realized, with high data throughput and high stability.
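The slicing described above may be sketched, purely as an illustration, as follows. The record key `business_id`, the function names and the use of a local thread pool in place of distributed workers are all assumptions of this example; the application does not specify how slices are assigned or executed.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(records, shard_count, key="business_id"):
    """Assign each record to exactly one shard by hashing an assumed service key."""
    shards = [[] for _ in range(shard_count)]
    for record in records:
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        index = zlib.crc32(str(record[key]).encode("utf-8")) % shard_count
        shards[index].append(record)
    return shards


def process_shard(shard):
    """Placeholder for running the batch arrangement flow on one shard."""
    return [{**record, "synced": True} for record in shard]


def run_sharded(records, shard_count):
    """Process unrelated shards concurrently and collect the batch data."""
    shards = split_into_shards(records, shard_count)
    # A thread pool stands in for distributed workers in this sketch.
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        results = list(pool.map(process_shard, shards))
    return [row for shard in results for row in shard]
```

Because every record with the same key lands in the same shard, the shards share no service data and can be processed independently.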
How data synchronization is achieved in this embodiment is further described with reference to fig. 3, where fig. 3 schematically illustrates a system configuration diagram according to an embodiment of the present application. The system is divided into 5 modules: the system comprises a stream processing module, a batch processing module, an output module, a task scheduling module and a configuration management module.
Wherein, the configuration management module includes:
stream processing engine configuration: different engine configurations are used for different data sources, such as MySQL, Elasticsearch and Kafka; the configuration items include the data source IP, data source port, data source name, data source account/password, number of processing threads, output engine name and rule name, and dynamic refreshing is supported;
stream processing rule configuration: configures the implementation class and parameters corresponding to each rule, and supports dynamic refreshing;
batch arrangement configuration: batch processing nodes are dragged into place, and a data synchronization flow is arranged by connecting and configuring them;
batch processing task configuration: configures the timed execution of the batch arrangement, slicing, notification mailboxes, retry policies, and the like.
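For illustration only, the configuration items listed above might be represented as follows. The class names, attribute names and all example values are hypothetical; the application names the configuration items but does not define a file format or schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamEngineConfig:
    """Assumed shape of one stream processing engine configuration."""
    source_type: str              # e.g. "mysql", "elasticsearch", "kafka"
    host: str                     # data source IP
    port: int                     # data source port
    source_name: str              # data source name
    username: str                 # data source account
    password: str                 # data source password
    worker_threads: int = 1       # number of processing threads
    output_engine: str = ""       # output engine name
    rule_names: List[str] = field(default_factory=list)


@dataclass
class BatchTaskConfig:
    """Assumed shape of one batch processing task configuration."""
    cron: str                     # timed execution of the batch arrangement
    shard_count: int = 1          # slicing parameter
    notify_mailboxes: List[str] = field(default_factory=list)
    max_retries: int = 0          # retry policy


# Example values are placeholders, not taken from the application.
engine = StreamEngineConfig(
    source_type="mysql", host="127.0.0.1", port=3306, source_name="orders",
    username="sync", password="change-me", worker_threads=4,
    output_engine="elasticsearch", rule_names=["rename_fields"],
)
task = BatchTaskConfig(cron="0 0 1 * * ?", shard_count=8,
                       notify_mailboxes=["ops@example.com"], max_retries=3)
```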
When the external data source is a database type data source, a processing engine in the stream processing module monitors the DML log file of the database. When the database has a new operation, the engine judges whether the operation is an update/insert operation and, if so, acquires the primary key ID of the affected data and uses the primary key ID to fetch the current data, which is the data to be synchronized. The acquired data is then parsed and converted into service data based on the rules in the configuration file: after the engine collects a row of data, it performs the parsing, filtering, merging and expansion defined by the rule content in the configuration file.
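The stream processing steps above may be sketched as follows, as one possible reading rather than the implementation of the application. The event layout (`op`, `pk`), the in-memory `table` standing in for the database, and the two example rules are assumptions of this sketch.

```python
def handle_dml_event(event, table, rules):
    """Return the service record to synchronize, or None if the event is skipped."""
    # Only update/insert operations trigger synchronization.
    if event["op"] not in ("INSERT", "UPDATE"):
        return None
    # Re-read the current row by primary key instead of trusting the log payload.
    row = table.get(event["pk"])
    if row is None:
        return None
    record = dict(row)
    # Apply the configured rules in order; a rule returning None filters the row out.
    for rule in rules:
        record = rule(record)
        if record is None:
            return None
    return record


def rename_amount(record):
    """Example conversion rule (hypothetical): rename a source column."""
    record = dict(record)
    record["amount_cents"] = record.pop("amt")
    return record


def drop_test_rows(record):
    """Example filter rule (hypothetical): discard rows flagged as test data."""
    return None if record.get("is_test") else record


table = {
    1: {"id": 1, "amt": 500, "is_test": False},
    2: {"id": 2, "amt": 10, "is_test": True},
}
rules = [rename_amount, drop_test_rows]
synced = handle_dml_event({"op": "UPDATE", "pk": 1}, table, rules)
```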
In the case that the external data source is a non-standardized data source and batch data synchronization is adopted, the batch processing module defines the stages of data synchronization as different nodes, and the batch arrangement flow is executed according to the node flow defined in the configuration file.
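As a minimal sketch of a node-based arrangement flow, assuming that each node is a function and that the configuration file lists node names in execution order, the batch processing could look as follows. The node names, the registry and the comma-separated input are hypothetical.

```python
def read_lines(text):
    """Node: split raw text from a non-standardized source into lines."""
    return text.splitlines()


def drop_blank(lines):
    """Node: trim whitespace and discard empty lines."""
    return [line.strip() for line in lines if line.strip()]


def to_records(lines):
    """Node: turn 'id,name' lines into dictionaries."""
    return [dict(zip(("id", "name"), line.split(","))) for line in lines]


# Pre-packaged nodes, looked up by the names used in the configured flow.
NODE_REGISTRY = {
    "read_lines": read_lines,
    "drop_blank": drop_blank,
    "to_records": to_records,
}


def run_flow(flow, payload, registry=NODE_REGISTRY):
    """Execute the nodes in the configured order, passing each output onward."""
    for node_name in flow:
        payload = registry[node_name](payload)
    return payload


batch = run_flow(["read_lines", "drop_blank", "to_records"], "1,alice\n\n 2,bob \n")
```

A linear list is the simplest possible flow; a flow arranged by dragging and connecting nodes could equally be a graph, which this sketch does not cover.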
The output module receives the output data of the stream processing module and the batch processing module and outputs it to different external systems/databases. Different external systems/databases have different output engines. Databases are connected through a pooling interface: a connection must be obtained from the connection pool when data is output, which prevents CPU/memory resource abnormalities caused by creating connections without limit.
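The role of the connection pool can be illustrated with the following sketch, which bounds the number of connections by creating them once and lending them out. The class name `ConnectionPool` and the `factory` callable are assumptions; a real deployment would more likely rely on the pooling facility of its database driver or framework.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are created once and reused (illustrative only)."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (or raises queue.Empty after `timeout` seconds) when every
        # connection is in use, so callers can never create connections without limit.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the output step failed.
            self._idle.put(conn)
```

An output engine would wrap each write in `with pool.connection() as conn:` so that the connection is returned to the pool when the write finishes.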
The task scheduling module schedules batch processing tasks at fixed times and is implemented on the basis of a Quartz timer integrated with Spring. According to the configured cron expression, it executes the arrangement in the batch processing module at fixed times and, according to the slicing parameters, processes the data in a distributed manner by service field. When a task fails or behaves abnormally, the relevant personnel are notified and the task is retried according to the retry policy.
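The notify-and-retry behaviour described above may be sketched as follows. The function name, the `notify` callback and the fixed delay between attempts are assumptions of this example; the application states only that personnel are notified and that a retry policy is applied.

```python
import time


def run_with_retry(task, max_retries, notify, delay_seconds=0.0):
    """Run `task`; on failure notify and retry up to `max_retries` more times."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task()
        except Exception as exc:
            # Every failure is reported, e.g. by sending mail to a configured mailbox.
            notify(f"attempt {attempts} failed: {exc}")
            if attempts > max_retries:
                # Retry budget exhausted: surface the last error to the scheduler.
                raise
            time.sleep(delay_seconds)
```

With `max_retries=2` a task is attempted at most three times in total, and the last error is re-raised if all attempts fail.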
FIG. 2 is a flow chart of a data synchronization method in one embodiment. It should be understood that, although the steps in the flowchart of fig. 2 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated herein, the execution of the steps is not strictly limited to this order, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
Based on the same inventive concept, the present embodiment provides a data synchronization device, referring to fig. 4, and fig. 4 schematically shows a block diagram of the data synchronization device according to an embodiment of the present application. The data synchronization device includes a first profile acquisition module 410, a first data acquisition module 420, a second data acquisition module 430, and a first synchronization output module 440, wherein:
a first profile acquisition module 410, configured to acquire a stream processing profile in response to a real-time data synchronization instruction;
the first data obtaining module 420 is configured to determine whether a database has a new operation by monitoring a log file of the database in a case where an external data source is a database type data source, and obtain data to be synchronized from the external data source in a case where it is determined that the database has the new operation;
the second data acquisition module 430 is configured to acquire data to be synchronized from an external data source if the external data source is a non-standardized data source;
the first synchronization output module 440 is configured to convert the data to be synchronized into service data based on the stream processing configuration file, and send the service data to a target location, so as to achieve data synchronization.
Wherein the first data acquisition module 420 includes:
a judging unit configured to judge whether the new operation is an update operation or an insert operation;
a primary key obtaining unit, configured to acquire, when the new operation is determined to be an update operation or an insert operation, the primary key ID of the data corresponding to the new operation;
and the searching unit is used for searching the data to be synchronized from the external data source based on the primary key ID of the data.
Wherein, the data synchronization device further includes:
the second configuration file acquisition module is used for responding to the batch processing data synchronization instruction to acquire a batch processing configuration file;
the arrangement flow acquisition module is used for acquiring a batch processing arrangement flow under the condition that the external data source is a non-standardized data source; the batch scheduling flow is obtained by scheduling pre-packaged node assemblies, and the node assemblies are used for realizing each stage in the data synchronization process;
and the second synchronous output module is used for processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data, and sending the batch data to a target position to realize data synchronization.
Wherein, the batch configuration file includes a cron expression, and the second synchronous output module includes:
and the timing unit is used for processing the data to be synchronized in the external data source according to the batch arrangement flow at fixed time based on the cron expression to obtain batch data.
The batch processing configuration file includes a slicing parameter, and the second synchronous output module includes:
the slicing unit is used for dividing the data to be synchronized in the external data source into a plurality of service data to be synchronized based on the slicing parameters; wherein, each business data to be synchronized is not related;
and the slicing processing unit is used for processing each business data to be synchronized according to the batch arrangement flow by adopting a distributed processing mode based on the batch configuration file to obtain batch data, and sending the batch data to a target position to realize data synchronization.
Wherein the first synchronous output module 440 includes:
the engine determining unit is used for selecting a corresponding output engine according to the data type of the external data source;
and the interface selection unit is used for selecting a pooling interface corresponding to the target position in a preset connection pool, and sending the service data to the target position based on the output engine and the pooling interface so as to realize data synchronization.
The data synchronization device includes a processor and a memory, where the first profile acquiring module 410, the first data acquiring module 420, the second data acquiring module 430, the first synchronization output module 440, etc. are stored as program units in the memory, and the processor executes the program modules stored in the memory to implement corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and real-time streaming data synchronization is realized by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present application provides a machine-readable storage medium having stored thereon a program which, when executed by a processor, implements the above-described data synchronization method.
The embodiment of the application provides a processor for running a program, wherein the data synchronization method is executed when the program runs.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 5. The computer apparatus includes a processor a01, a network interface a02, a display screen a04, an input device a05, and a memory (not shown in the figure) which are connected through a system bus. Wherein the processor a01 of the computer device is adapted to provide computing and control capabilities. The memory of the computer device includes an internal memory a03 and a nonvolatile storage medium a06. The nonvolatile storage medium a06 stores an operating system B01 and a computer program B02. The internal memory a03 provides an environment for the operation of the operating system B01 and the computer program B02 in the nonvolatile storage medium a06. The network interface a02 of the computer device is used for communication with an external terminal through a network connection. The computer program is executed by the processor a01 to implement a data synchronization method. The display screen a04 of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device a05 of the computer device may be a touch layer covered on the display screen, or may be a key, a track ball or a touch pad arranged on a casing of the computer device, or may be an external keyboard, a touch pad or a mouse.
It will be appreciated by those skilled in the art that the structure shown in FIG. 5 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, the data synchronization apparatus provided by the present application may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 5. The memory of the computer device may store various program modules constituting the data synchronization apparatus, such as the first profile acquisition module 410, the first data acquisition module 420, the second data acquisition module 430, and the first synchronization output module 440 shown in fig. 4. The computer program of each program module causes a processor to execute the steps of the data synchronization method of each embodiment of the present application described in the present specification.
The computer device shown in fig. 5 may perform step 210 through the first profile acquisition module 410 in the data synchronization apparatus shown in fig. 4, step 220 through the first data acquisition module 420, step 230 through the second data acquisition module 430, and step 240 through the first synchronization output module 440.
The application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
responding to a real-time data synchronization instruction, and acquiring a stream processing configuration file;
judging whether the database has new operation or not by monitoring a log file of the database under the condition that the external data source is a database type data source, and acquiring data to be synchronized from the external data source under the condition that the database is determined to have the new operation;
under the condition that an external data source is a non-standardized data source, acquiring data to be synchronized from the external data source;
and converting the data to be synchronized into service data based on the stream processing configuration file, and sending the service data to a target position to realize data synchronization.
In one embodiment, the obtaining the data to be synchronized from the external data source in the case that it is determined that the database has a new operation includes:
judging whether the new operation is an updating operation or an inserting operation;
when the new operation is determined to be an update operation or an insert operation, acquiring the primary key ID of the data corresponding to the new operation;
and searching the data to be synchronized from the external data source based on the primary key ID of the data.
In one embodiment, the data synchronization method further comprises:
responding to a batch data synchronization instruction to acquire a batch configuration file;
acquiring a batch scheduling flow under the condition that the external data source is a non-standardized data source; the batch scheduling flow is obtained by scheduling pre-packaged node assemblies, and the node assemblies are used for realizing each stage in the data synchronization process;
and processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data, and sending the batch data to a target position to realize data synchronization.
In one embodiment, the batch configuration file includes a cron expression, and the processing the data to be synchronized in the external data source according to the batch scheduling flow based on the batch configuration file to obtain batch data includes:
and based on the cron expression, processing the data to be synchronized in the external data source according to the batch processing arrangement flow at regular time to obtain batch processing data.
In one embodiment, the batch configuration file includes a fragmentation parameter, and the processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data, and sending the batch data to a target location to achieve data synchronization includes:
Dividing the data to be synchronized in the external data source into a plurality of business data to be synchronized based on the slicing parameters; wherein, each business data to be synchronized is not related;
based on the batch configuration file, processing each service data to be synchronized according to the batch arrangement flow by adopting a distributed processing mode to obtain batch data, and sending the batch data to a target position to realize data synchronization.
In one embodiment, the sending the service data to the target location to achieve data synchronization includes:
selecting a corresponding output engine according to the data type of the external data source;
and selecting a pooling interface corresponding to the target position from a preset connection pool, and sending the service data to the target position based on the output engine and the pooling interface so as to realize data synchronization.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or nonvolatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (12)

1. A data synchronization method, characterized in that the data synchronization method comprises:
responding to a real-time data synchronization instruction, and acquiring a stream processing configuration file;
judging whether the database has new operation or not by monitoring a log file of the database under the condition that the external data source is a database type data source, and acquiring data to be synchronized from the external data source under the condition that the database is determined to have the new operation;
Under the condition that an external data source is a non-standardized data source, acquiring data to be synchronized from the external data source;
and converting the data to be synchronized into service data based on the stream processing configuration file, and sending the service data to a target position to realize data synchronization.
2. The method according to claim 1, wherein said obtaining data to be synchronized from said external data source in case it is determined that the database has a new operation comprises:
judging whether the new operation is an updating operation or an inserting operation;
when the new operation is determined to be an update operation or an insert operation, acquiring the primary key ID of the data corresponding to the new operation;
and searching the data to be synchronized from the external data source based on the primary key ID of the data.
3. The method of claim 1, wherein the data synchronization method further comprises:
responding to a batch data synchronization instruction to acquire a batch configuration file;
acquiring a batch scheduling flow under the condition that the external data source is a non-standardized data source; the batch scheduling flow is obtained by scheduling pre-packaged node assemblies, and the node assemblies are used for realizing each stage in the data synchronization process;
And processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data, and sending the batch data to a target position to realize data synchronization.
4. The method of claim 3, wherein the batch configuration file includes a cron expression, and wherein the processing the data to be synchronized in the external data source according to the batch scheduling procedure based on the batch configuration file to obtain batch data includes:
and based on the cron expression, processing the data to be synchronized in the external data source according to the batch processing arrangement flow at regular time to obtain batch processing data.
5. The method of claim 3, wherein the batch configuration file includes a fragmentation parameter, and wherein the processing the data to be synchronized in the external data source according to the batch scheduling procedure to obtain batch data and sending the batch data to a target location based on the batch configuration file to achieve data synchronization includes:
dividing the data to be synchronized in the external data source into a plurality of business data to be synchronized based on the slicing parameters; wherein, each business data to be synchronized is not related;
Based on the batch configuration file, processing each service data to be synchronized according to the batch arrangement flow by adopting a distributed processing mode to obtain batch data, and sending the batch data to a target position to realize data synchronization.
6. The method of claim 1, wherein said sending the service data to a target location to achieve data synchronization comprises:
selecting a corresponding output engine according to the data type of the external data source;
and selecting a pooling interface corresponding to the target position from a preset connection pool, and sending the service data to the target position based on the output engine and the pooling interface so as to realize data synchronization.
7. A data synchronization device, the data synchronization device comprising:
the first configuration file acquisition module is used for responding to the real-time data synchronization instruction and acquiring a stream processing configuration file;
the first data acquisition module is used for judging whether the database has new operation or not by monitoring the log file of the database under the condition that the external data source is a database type data source, and acquiring data to be synchronized from the external data source under the condition that the database is determined to have the new operation;
The second data acquisition module is used for acquiring data to be synchronized from the external data source under the condition that the external data source is a non-standardized data source;
and the first synchronous output module is used for converting the data to be synchronized into service data based on the stream processing configuration file and sending the service data to a target position so as to realize data synchronization.
8. The apparatus of claim 7, wherein the first data acquisition module comprises:
a judging unit configured to judge whether the new operation is an update operation or an insert operation;
a primary key obtaining unit, configured to acquire, when the new operation is determined to be an update operation or an insert operation, the primary key ID of the data corresponding to the new operation;
and the searching unit is used for searching the data to be synchronized from the external data source based on the primary key ID of the data.
9. The apparatus of claim 7, wherein the data synchronization apparatus further comprises:
the second configuration file acquisition module is used for responding to the batch processing data synchronization instruction to acquire a batch processing configuration file;
the arrangement flow acquisition module is used for acquiring a batch processing arrangement flow under the condition that the external data source is a non-standardized data source; the batch scheduling flow is obtained by scheduling pre-packaged node assemblies, and the node assemblies are used for realizing each stage in the data synchronization process;
And the second synchronous output module is used for processing the data to be synchronized in the external data source according to the batch arrangement flow based on the batch configuration file to obtain batch data, and sending the batch data to a target position to realize data synchronization.
10. A processor configured to perform the data synchronization method according to any one of claims 1 to 6.
11. A machine-readable storage medium having instructions stored thereon, which when executed by a processor cause the processor to be configured to perform the data synchronization method according to any one of claims 1 to 6.
12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the data synchronization method according to any one of claims 1 to 6.
CN202311139696.1A 2023-09-05 2023-09-05 Data synchronization method, device, storage medium and processor Pending CN117149909A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311139696.1A CN117149909A (en) 2023-09-05 2023-09-05 Data synchronization method, device, storage medium and processor

Publications (1)

Publication Number Publication Date
CN117149909A true CN117149909A (en) 2023-12-01

Family

ID=88883928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311139696.1A Pending CN117149909A (en) 2023-09-05 2023-09-05 Data synchronization method, device, storage medium and processor

Country Status (1)

Country Link
CN (1) CN117149909A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118051564A (en) * 2024-04-16 2024-05-17 天津南大通用数据技术股份有限公司 Quart-based distributed asynchronous data synchronization method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination