CN116578650A

CN116578650A - Data synchronization method, device, computer equipment and storage medium

Info

Publication number: CN116578650A
Application number: CN202310841212.1A
Authority: CN
Inventors: 明培楠; 熊鹏飞; 阙金超; 李渊湲; 杨鑫颖
Original assignee: Taiping Finance Technology Services Shanghai Co ltd
Current assignee: Taiping Finance Technology Services Shanghai Co ltd
Priority date: 2023-07-11
Filing date: 2023-07-11
Publication date: 2023-08-11

Abstract

The application relates to a data synchronization method, a data synchronization device, computer equipment and a storage medium, in the technical field of databases. The method comprises: reading target task information of data to be synchronized into a memory list; assigning the target task information stored in the objects of the memory list to target variables according to the relationship between the objects in the memory list and the target variables; inputting the values of the target variables into a template of a service processing logic unit to generate a target service processing logic unit corresponding to each task; and completing the data synchronization of the corresponding task through the target service processing logic unit. The method allows service processing logic units to be reused (multiplexed), which reduces the number of service processing logic units in a project and reduces the development workload.

Description

Data synchronization method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of database technologies, and in particular, to a data synchronization method, apparatus, computer device, and storage medium.

Background

Kettle is an open-source ETL tool developed abroad. It is written in pure Java, runs on Windows, Linux, and Unix, and extracts data efficiently and stably.

In conventional ETL data synchronization scenarios, multiple tables of a source database need to be synchronized into a target database. If the Kettle platform is used as the synchronization tool, each data table has its own set of fields (Field), so a Transformation (service processing logic unit) must be created for each data table. When there are many data tables, the number of Transformations in the whole project grows accordingly, the development workload is large, and later maintenance is tedious and error-prone.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a data synchronization method, apparatus, computer device, and computer readable storage medium that allow service processing logic to be reused (multiplexed), reduce the number of service processing logic units in a project, and thereby reduce the development workload.

In a first aspect, the present application provides a data synchronization method, the method comprising:

reading target task information of data to be synchronized into a memory list;

assigning the stored target task information in the object of the memory list to the target variable according to the relation between the object and the target variable in the memory list;

inputting the value of the target variable into a template of a business processing logic unit to generate a target business processing logic unit corresponding to each task;

and finishing the data synchronization of the corresponding tasks through the target service processing logic unit.

In one embodiment, completing the data synchronization of the corresponding tasks through the target service processing logic unit includes:

completing the data synchronization of the corresponding tasks through parallel processing by the target service processing logic units.

In one embodiment, the source database and/or the target database corresponding to the data to be synchronized includes at least two databases; inputting the value of the target variable into the template of the business processing logic unit to generate a target business processing logic unit corresponding to each task includes:

splitting the task according to the source database and the target database, so that each resulting task has a unique source database and a unique target database;

and inputting the value of the target variable into a template of the business processing logic unit to generate a target business processing logic unit corresponding to each task.

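In one embodiment, completing the data synchronization of the corresponding tasks through the target service processing logic unit includes:

assigning the data in the target variable to a Java variable;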

reading database connection information from a metadata configuration library, and generating an operable Java object according to the database connection information;

generating a table structure of a cache table corresponding to a target database according to the method of the operable Java object and the Java variable;

and synchronizing the data to be synchronized in the source database to the cache table, and synchronizing the data to be synchronized in the cache table to the target database.

In one embodiment, reading the target task information of the data to be synchronized into the memory list includes:

reading initial task information of data to be synchronized, wherein the initial task information is generated according to information of a source database of the data to be synchronized configured by a user;

preprocessing the initial task information to obtain target task information corresponding to a memory list;

and loading the target task information into a memory list.

In one embodiment, the initial task information includes metadata information of a source database and necessary information of a data synchronization task; the metadata information of the source database changes along with the change of the data structure of the source database.

In one embodiment, after reading the target task information of the data to be synchronized into the memory list, the method further includes:

generating state monitoring information corresponding to the target task information;

after completing the data synchronization of the corresponding tasks through the target service processing logic unit, the method further includes:

and after all tasks in the target task information are executed, modifying the state monitoring information corresponding to the target task information.

In a second aspect, the present application also provides a data synchronization device, the device comprising:

the reading module is used for reading target task information of the data to be synchronized into the memory list;

the binding module is used for assigning the stored target task information in the object of the memory list to the target variable according to the relation between the object and the target variable in the memory list;

the business processing logic unit generating module is used for inputting the value of the target variable to a template of the business processing logic unit to generate a target business processing logic unit corresponding to each task;

and the data synchronization module is used for completing the data synchronization of the corresponding tasks through the target service processing logic unit.

In a third aspect, the present application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the method of any one of the embodiments described above when the computer program is executed by the processor.

In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the embodiments described above.

According to the data synchronization method, device, computer equipment and storage medium described above, the target task information of the data to be synchronized is read into the memory list and assigned to the corresponding target variables, and the values of the target variables are input into the template of the service processing logic unit to generate the target service processing logic unit corresponding to each task. A separate service processing logic unit therefore does not need to be developed for each service, the service processing logic unit is reused (multiplexed), the number of service processing logic units in the project is reduced, and the development workload is reduced accordingly.

Drawings

FIG. 1 is an application environment diagram of a data synchronization method in one embodiment;

FIG. 2 is a flow chart of a data synchronization method in one embodiment;

FIG. 3 is a schematic diagram of template multiplexing of a business processing logic unit in one embodiment;

FIG. 4 is a flow chart of a data synchronization method according to another embodiment;

FIG. 5 is a block diagram of a data synchronization device in one embodiment;

FIG. 6 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The data synchronization method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the source database 104 and the target database 106 via a network. The terminal 102 reads target task information corresponding to the data to be synchronized in the source database 104 into the memory list, and then assigns stored target task information in an object of the memory list to a target variable according to a relationship between the object and the target variable in the memory list, so that a value of the target variable is input to a template of the service processing logic unit, and a target service processing logic unit corresponding to each task is generated, thereby completing data synchronization of the corresponding task according to the target service processing logic unit. Therefore, multiplexing of the service processing logic units is realized, and the number of the service processing logic units in engineering is reduced, so that the development workload is reduced.

The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The source database 104 may be implemented as a stand-alone database or as a cluster of databases comprised of multiple databases.

For ease of understanding, the terms used in the present application are explained first.

ETL: short for the data processing flow of extraction, transformation, and loading.

Kettle: an open-source platform for ETL and for task flow configuration and scheduling.

Transformation: the basic logic unit for task flow processing on the Kettle platform.

Database: an organized collection of structured information or data, typically stored in electronic form in a computer system.

Table: an object in a database used to store structured data.

DDL: Data Definition Language, used to operate on objects and their properties. It is mainly used in initialization work to define or change the structure of tables, data types, and the links and constraints between tables.

DML: Data Manipulation Language, used to operate on the data contained in database objects. It mainly comprises the four commands SELECT, UPDATE, INSERT, and DELETE.

In one embodiment, as shown in fig. 2, a data synchronization method is provided, and the method is applied to the terminal in fig. 1 for illustration, and includes the following steps:

S202: reading target task information of the data to be synchronized into the memory list.

Specifically, the data to be synchronized is the data that needs to be synchronized from the source database to the target database. The target task information describes the data to be synchronized and may include metadata of that data. In one optional embodiment, the target task information further includes synchronization control information corresponding to the data to be synchronized. In one embodiment, the target task information is stored in an ETL task information table: the metadata information table of the source database is read first, and custom ETL control information is then added to obtain the target task information. The ETL control information may be, for example, tracking point (buried point) information and debugging information fields used in the ETL process, and is not specifically limited here. In one alternative embodiment, the metadata information table of the source database may be a table of all the metadata of the entire source database, or a metadata information table covering only the data that changed in a specific period of time.

The memory list is a memory list object in the terminal memory, for example, the terminal reads data from the target task information and copies the read data to the corresponding memory list object.

In practical applications, the target task information records the object information of each task. Kettle reads this object information and generates a conversion flow defined by dynamic parameters: the object information is read in a loop and loaded into runtime parameters, and a parameterized conversion component is reused for every task, so that the ETL task flow is driven by configuration.
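The loading step can be illustrated with a minimal Python sketch. It is not the Kettle implementation: the table name `etl_task_info`, its column names, and the use of an in-memory SQLite database are all assumptions made only so that the example is self-contained.

```python
import sqlite3

# Hypothetical ETL task information table; the column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE etl_task_info ("
    "tablename TEXT, srcschema TEXT, tgtschema TEXT, module TEXT, execind TEXT)"
)
conn.executemany(
    "INSERT INTO etl_task_info VALUES (?, ?, ?, ?, ?)",
    [
        ("policy", "src", "tgt", "underwriting", "Y"),
        ("claim", "src", "tgt", "claims", "Y"),
        ("legacy", "src", "tgt", "claims", "N"),
    ],
)


def load_memory_list(connection):
    """Read every task flagged for execution into an in-memory list of dicts."""
    rows = connection.execute(
        "SELECT * FROM etl_task_info WHERE execind = 'Y' ORDER BY tablename"
    ).fetchall()
    return [dict(row) for row in rows]


memory_list = load_memory_list(conn)
```

Each element of `memory_list` plays the role of one object in the memory list, holding the information of one task.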

The target task information may include the following fields. This embodiment is only illustrative; other embodiments may use more or fewer fields, depending on the requirements of the synchronization task.

Table name: the name of the table to be synchronized, that is, the table in the source database that holds the data to be synchronized; required.

Srcschema: the schema name of the source database.

Tgtschema: the schema name of the target database.

Module: the module group name. For example, in one embodiment the modules include an underwriting module and a claim settlement module, giving an underwriting group and a claim settlement group; other embodiments may define other groups. Tasks within the same module group are processed serially, and tasks in different module groups may be processed in parallel.

Batch id: the batch number.

Srcrowcnt: the row count of the source table, used for KPI comparison.

Tgtrowcnt: the row count of the table in the target database after synchronization, used for KPI comparison. The KPI comparison checks the row count of the source table against that of the target table to judge whether the data was synchronized correctly; if the counts differ, an error is reported promptly, for example by sending an email notifying the creator of the target task that the synchronized data has a problem.

Status: the status of the synchronization, one of W (waiting for execution), R (executing), F (execution completed), and E (execution failed). This status refers to the whole target task list rather than to any single target task: the completed status is shown only when all tasks in the list have been executed, and the other statuses are handled similarly.

Begin time: the execution start time.

Endtime: the execution completion time.

Primary key: the primary key of the table to be synchronized.

Update field: the name of the synchronization time field.

Fields: the list of fields to synchronize, comma separated.

Query condition: the query condition applied to the table to be synchronized, which supports partial data synchronization.

Globalind: whether to perform full synchronization. In partial synchronization, only the data selected by the query condition is queried and synchronized; in full synchronization, all data is synchronized.

Lastdealtime: the time of the last successful synchronization, that is, the completion time of the previous synchronization.

Checkind: whether to check the synchronized row count KPI; used together with Srcrowcnt and Tgtrowcnt. If the check is enabled, it is performed using Srcrowcnt and Tgtrowcnt; otherwise Srcrowcnt and Tgtrowcnt need not be processed during synchronization.

Execind: the flag indicating whether to execute the synchronization; used together with Status and not described in detail here.
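As an illustration of how Checkind, Srcrowcnt, and Tgtrowcnt work together, the following Python sketch models a subset of the fields as a record and implements the row count comparison. The class name `SyncTask`, the function name `row_count_kpi_ok`, and the lowercase attribute names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncTask:
    """One row of target task information (a subset of the fields above)."""
    tablename: str
    srcschema: str
    tgtschema: str
    module: str
    checkind: bool = False            # whether to check the row count KPI
    srcrowcnt: Optional[int] = None   # row count read from the source table
    tgtrowcnt: Optional[int] = None   # row count read from the target table


def row_count_kpi_ok(task: SyncTask) -> bool:
    """Return True when the check is disabled or both row counts agree."""
    if not task.checkind:
        return True
    return task.srcrowcnt is not None and task.srcrowcnt == task.tgtrowcnt


ok_task = SyncTask("policy", "src", "tgt", "underwriting", True, 100, 100)
bad_task = SyncTask("claim", "src", "tgt", "claims", True, 100, 98)
unchecked_task = SyncTask("agent", "src", "tgt", "claims")
```

A caller could report an error, for example by email, whenever `row_count_kpi_ok` returns False; the notification itself is outside this sketch.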

S204: assigning the target task information stored in the objects of the memory list to the target variables according to the relationship between the objects in the memory list and the target variables.

Specifically, the target variable may refer to a Kettle variable, which is a variable defined in the ETL tool Kettle. The target variable corresponds to a template of the service processing logic unit, wherein the Kettle provides a mapping relation between the target variable and the memory list, so that data in the memory list is assigned to the corresponding target variable according to the mapping relation.

It should be noted that the target task information may be the information of one target task or of a plurality of target tasks. When performing task synchronization, the terminal may query the target task information according to an input query parameter, for example the identifier of a scheduled task, extract the corresponding task rows, and store them in the memory list described above, which may also be called a result set object in Kettle memory. According to the mapping relationship between the result set object and the target variables, the terminal then provides the data in the result set object to the reusable service processing logic unit in a loop. For example, when Kettle detects that the result set object contains multiple rows, it binds the data of the current row to the corresponding Kettle target variables on each iteration and then passes them to the template of the service processing logic unit.

In practical applications, the mapping between Kettle target variable names and target task information field names is shown in the following table, where the variable scope type defines the scope within which the variable is valid; for an explanation of each target task information field, see above.

Table 1, mapping relation
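The row-by-row binding described for step S204 can be sketched in Python as follows. The mapping `FIELD_TO_VARIABLE` is a hypothetical stand-in for the mapping relation (the actual variable names of the embodiment are not reproduced here), and `describe` is a placeholder for the template of the service processing logic unit.

```python
# Hypothetical mapping from task information field names to variable names.
FIELD_TO_VARIABLE = {
    "tablename": "v_tablename",
    "srcschema": "v_srcschema",
    "tgtschema": "v_tgtschema",
}


def bind_row(row):
    """Assign the fields of one memory-list row to the mapped variables."""
    return {variable: row[field] for field, variable in FIELD_TO_VARIABLE.items()}


def run_template_for_each_row(memory_list, template):
    """Rebind the variables for every row and pass them to the same template."""
    results = []
    for row in memory_list:
        variables = bind_row(row)
        results.append(template(variables))
    return results


def describe(variables):
    """Placeholder template: only reports what would be synchronized."""
    table = variables["v_tablename"]
    return f"{variables['v_srcschema']}.{table} -> {variables['v_tgtschema']}.{table}"


memory_list = [
    {"tablename": "policy", "srcschema": "src", "tgtschema": "tgt"},
    {"tablename": "claim", "srcschema": "src", "tgtschema": "tgt"},
]
plans = run_template_for_each_row(memory_list, describe)
```

The same `describe` function is invoked once per row with different variable values, which is the reuse pattern the embodiment relies on.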

S206: inputting the value of the target variable into the template of the service processing logic unit to generate a target service processing logic unit corresponding to each task.

S208: completing the data synchronization of the corresponding task through the target service processing logic unit.

Specifically, the service processing logic unit may be a Transformation in Kettle. The terminal passes the Kettle target variables of each row, in a loop, to the template of the service processing logic unit to generate a target service processing logic unit, and the data synchronization is completed through that target service processing logic unit.

Specifically, FIG. 3 is a schematic diagram of template multiplexing of a service processing logic unit in one embodiment. The Kettle target variables of each row correspond to one multiplexing request task, and each such task is added to the multiplexing request queue. The module name (modulname) carried in the Kettle target variables of each task in the queue determines whether the multiplexing request tasks can be processed in parallel: for example, tasks in the same module group are processed serially, while tasks in different module groups may be processed in parallel.

The template of the service processing logic unit can be regarded as a template class, similar to generic programming. The template has a number of attributes and behaves differently at run time according to the attribute values passed in, so that different processing logic is achieved. According to the input attribute values, the template can dynamically generate the required data definition language (DDL), data query language (DQL), and data manipulation language (DML) statements.

DDL statements are used to create objects in a database, such as tables, views, indexes, synonyms, and clusters; in this embodiment they are used to generate the required temporary cache table isomorphic with the target database. DML statements mainly take three forms, INSERT, UPDATE, and DELETE; in this embodiment they are the statements obtained by instantiating an SQL script for data synchronization. DQL statements query the corresponding data in the source database; their basic structure is a query block consisting of a SELECT clause, a FROM clause, and a WHERE clause: SELECT < field list > FROM < table or view name > WHERE < query condition >.

The types of attribute values passed in are shown in FIG. 3, for example a database connection object, a database schema, a table name, a field list, a primary key, table constraints, a filtering condition, an ETL processing mode, and an ETL timestamp. They correspond to the target variables described above and are not specifically limited here.

In this embodiment, multiplexing the template of the service processing logic unit achieves N:1 parameter passing in Kettle: the same Transformation receives different parameters on each invocation, so that different DDL, DQL, and DML statements can be generated for different tables, and the same task can even be processed in parallel against different databases.
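The idea of generating DDL, DQL, and DML statements from one template can be sketched with Python's `string.Template`, whose `${name}` placeholders resemble Kettle variable references. The statement texts and all variable names below (`v_tmptable`, `v_columns`, `v_fields`, `v_condition`, and so on) are assumptions for illustration, not the scripts of the embodiment.

```python
from string import Template

# Hypothetical statement templates; placeholders stand in for Kettle variables.
DDL = Template("CREATE TABLE IF NOT EXISTS ${v_tmptable} (${v_columns})")
DQL = Template(
    "SELECT ${v_fields} FROM ${v_srcschema}.${v_tablename} WHERE ${v_condition}"
)
DML = Template("INSERT INTO ${v_tmptable} (${v_fields}) ${dql}")


def instantiate(variables):
    """Fill the same three templates with the variable values of one task."""
    dql = DQL.substitute(variables)
    return {
        "ddl": DDL.substitute(variables),
        "dql": dql,
        "dml": DML.substitute(variables, dql=dql),
    }


statements = instantiate(
    {
        "v_tablename": "policy",
        "v_srcschema": "src",
        "v_tmptable": "policy_tmp",
        "v_fields": "id, name",
        "v_columns": "id INTEGER, name TEXT",
        "v_condition": "1 = 1",
    }
)
```

Calling `instantiate` with the variables of another table yields a different set of statements from the same templates, which corresponds to the N:1 parameter passing described for the Transformation.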

In one embodiment, steps S204 to S208 can be regarded as a process of multiplexing templates of the service processing logic unit. That is to say, after the task information is read, the task information is added into the corresponding multiplexing request queue, then the template of the service processing logic unit is multiplexed, the data synchronization is completed once, and the next task information in the memory list is continuously acquired until all the data synchronization is completed.

According to the data synchronization method, the target task information of the data to be synchronized is input into the memory list and assigned to the corresponding target variable, so that the value of the target variable is input into the template of the service processing logic unit to generate the target service processing logic unit corresponding to each task, one service processing logic unit does not need to be developed for each service, multiplexing of the service processing logic units is achieved, the number of the service processing logic units of the engineering is reduced, and accordingly development workload is reduced.

In one embodiment, the data synchronization of the corresponding task is completed by the target service processing logic unit, including: and the data synchronization of the corresponding tasks is completed through parallel processing of each target service processing logic unit.

In one embodiment, the source database and/or the target database corresponding to the data to be synchronized comprises at least two databases; inputting the value of the target variable into the template of the service processing logic unit to generate a target service processing logic unit corresponding to each task comprises: splitting the task according to the source database and the target database, so that each resulting task has a unique source database and a unique target database; and inputting the value of the target variable into the template of the service processing logic unit to generate a target service processing logic unit corresponding to each task.

Specifically, the above two embodiments mainly describe a task parallel processing process, where, since different tasks can be processed in parallel, a terminal can generate multiple different target service processing logic units according to the tasks that can be processed in parallel in the processing process, so that each different target service processing logic unit can process in parallel, so as to improve the task processing efficiency.

In one embodiment, whether the target task can be processed in parallel may be determined according to the module group name in each row in the target task information, for example, each task in the same module group is processed in series, and tasks in different module groups may be processed in parallel. That is, the synchronization of all data in the underwriting module is serial, while the synchronization of data of the underwriting module and the claims module may be parallel. Therefore, in practical application, when new target task information is newly added in the multiplexing request queue, a module group of the target task information can be read, and then whether a new target service processing logic unit needs to be generated is judged based on the module group.
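The scheduling rule, serial within a module group and parallel across module groups, can be sketched in Python with a thread pool. This is only an illustration of the rule; the function names are hypothetical, and `run_task` stands in for one execution of the target service processing logic unit.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_module(tasks):
    """Collect tasks per module group, preserving their original order."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task["module"]].append(task)
    return groups


def run_group(group_tasks, run_task):
    """Tasks of one module group run one after another."""
    return [run_task(task) for task in group_tasks]


def run_by_module(tasks, run_task):
    """Different module groups run in parallel, one worker per group."""
    groups = group_by_module(tasks)
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = {
            module: pool.submit(run_group, group_tasks, run_task)
            for module, group_tasks in groups.items()
        }
        return {module: future.result() for module, future in futures.items()}


tasks = [
    {"tablename": "policy", "module": "underwriting"},
    {"tablename": "claim", "module": "claims"},
    {"tablename": "premium", "module": "underwriting"},
]
results = run_by_module(tasks, lambda task: task["tablename"])
```

Within each group the results keep the order of the task list, reflecting the serial processing of that group.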

In other embodiments, whether to process in parallel may also be determined by the task itself. For example, the task is split according to the source database and the target database so that each resulting task has a unique source database and a unique target database; the values of the target variables are then input into the template of the service processing logic unit to generate the target service processing logic unit corresponding to each task. Each pair of source database and target database thus forms one task, and the services within that task can be processed in sequence.

In this embodiment, parallel processing can be applied as needed, which improves processing efficiency.
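One possible reading of the splitting step is sketched below in Python: a task that names several source and/or target databases is expanded into one subtask per source and target pair. The pairing by Cartesian product, the function name `split_by_database`, and the keys `srcdbs`, `tgtdbs`, `srcdb`, and `tgtdb` are assumptions; the embodiment does not fix how the pairs are formed.

```python
from itertools import product


def split_by_database(task):
    """Expand a task into subtasks that each have one source and one target."""
    shared = {k: v for k, v in task.items() if k not in ("srcdbs", "tgtdbs")}
    return [
        dict(shared, srcdb=src, tgtdb=tgt)
        for src, tgt in product(task["srcdbs"], task["tgtdbs"])
    ]


task = {"tablename": "policy", "srcdbs": ["src_a", "src_b"], "tgtdbs": ["tgt"]}
subtasks = split_by_database(task)
```

Each subtask can then be bound to the target variables and passed to the template like any other task.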

In one embodiment, the data synchronization of the corresponding task is completed by the target service processing logic unit, including: assigning data in the target variable to the Java variable; reading database connection information from a metadata configuration library, and generating an operable Java object according to the database connection information; generating a table structure of a cache table corresponding to the target database according to the method of the operable Java object and Java variables; and synchronizing the data to be synchronized in the source database to the cache table, and synchronizing the data to be synchronized in the cache table to the target database.

Specifically, this embodiment mainly describes how the target service processing logic unit is generated and how data synchronization is performed through it. The process mainly involves passing the Kettle variables into a Java control of Kettle, generating a series of dynamic DDL and DML statements, and calling the database through the API methods provided by Kettle, so as to create the table structure, the primary key, the buried point information, and the temporary cache table of the target database.

Specifically, assigning the data in the target variable to the Java variable mainly means obtaining the name of the database connection to be processed, the table name, and the primary key from the Kettle variables and assigning them to Java variables.

Reading the database connection information from the metadata configuration library and generating an operable Java object according to it means reading the metadata configuration library through the API methods provided by Kettle, reading the database connection configuration from it to obtain the source database connection information and the target database connection information, and generating Java objects through which the databases can be operated.

Generating the table structure of the cache table corresponding to the target database according to the methods of the operable Java object and the Java variables may include: reading the metadata in the database (field names, field number, primary key, non-null constraints) through the methods of the database-operable Java object, and generating an executable SQL script string. The SQL script string is then further processed: it is formatted, support for idempotent execution is added, performance optimizations are applied (for example, not recording archive logs), case sensitivity is handled, and data types are converted and matched by category (date, numeric, binary, XML) according to the types supported by the different databases. Finally, the DDL method of the database-operable Java object is called to generate the table structure.
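The embodiment performs this step in Java through Kettle API methods; the following Python sketch only illustrates the same idea against SQLite. The type mapping `TYPE_MAP`, the `_tmp` suffix, and the function name `build_cache_table_ddl` are assumptions. Idempotence is shown with `CREATE TABLE IF NOT EXISTS`, so running the statement twice leaves the table unchanged.

```python
import sqlite3

# Hypothetical mapping from logical type categories to SQLite column types.
TYPE_MAP = {"date": "TEXT", "number": "NUMERIC", "binary": "BLOB", "xml": "TEXT"}


def build_cache_table_ddl(table, columns, primary_key):
    """Build an idempotent DDL string; columns are (name, type, not_null)."""
    parts = []
    for name, logical_type, not_null in columns:
        part = f"{name} {TYPE_MAP[logical_type]}"
        if not_null:
            part += " NOT NULL"
        parts.append(part)
    parts.append(f"PRIMARY KEY ({primary_key})")
    return f"CREATE TABLE IF NOT EXISTS {table}_tmp ({', '.join(parts)})"


columns = [("id", "number", True), ("created", "date", False), ("doc", "xml", False)]
ddl = build_cache_table_ddl("policy", columns, "id")

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute(ddl)  # second execution is a no-op thanks to IF NOT EXISTS
created = [row[1] for row in conn.execute("PRAGMA table_info(policy_tmp)")]
```

A mapping per target database type would be needed in practice, since the supported data types differ between databases.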

In one preferred embodiment, the terminal may further use Java code to splice the buried point field ${v_degradation} and the information field ${v_checkind} used for debugging and tracking in subsequent processing, both passed in through Kettle variables, into an executable DDL statement, and call the DDL method of the database-operable Java object to add the required fields. In one preferred embodiment, the terminal may also append DDL for creating the primary key and constraints to the SQL script based on the Kettle variable ${v_primary}, and then call the DDL method of the database-operable Java object again to create the isomorphic temporary cache table required by the ETL process. In this way, the table structure required by a single task is created by Java code in combination with the Kettle variables.

Synchronizing the data to be synchronized in the source database to the cache table, and then synchronizing it from the cache table to the target database, includes the following. To synchronize the source table data into the temporary cache table, Kettle variables are used in the SQL script template; once the external Kettle variables have been initialized, the SQL script template is instantiated into an executable DML statement, which is immediately submitted to the database for execution, and the result set is inserted into the table named by the temporary table variable ${v_tgtatblename_tmp}. The data is synchronized from the temporary cache table to the target table in a similar way: using the configured primary key, the Kettle variable ${v_primarykey} serves as the deletion condition to delete the outdated redundant records from the formal table ${v_tgtatblename}; the records in the temporary table ${v_tgtatblename_tmp} are then inserted into the formal table ${v_tgtatblename} to complete the update.
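The two-stage flow, source table to temporary cache table and then delete-by-primary-key followed by insert into the formal table, can be sketched in Python against SQLite. For simplicity the sketch keeps all three tables in one in-memory database, whereas the embodiment uses separate source and target databases; the function name `sync_table` and the table names are hypothetical.

```python
import sqlite3


def sync_table(conn, src, tmp, tgt, fields, primary_key, condition="1 = 1"):
    """Copy src rows into tmp, then replace matching tgt rows with tmp rows."""
    cols = ", ".join(fields)
    # Stage 1: source table -> temporary cache table.
    conn.execute(
        f"INSERT INTO {tmp} ({cols}) SELECT {cols} FROM {src} WHERE {condition}"
    )
    # Stage 2a: delete outdated rows from the formal table by primary key.
    conn.execute(
        f"DELETE FROM {tgt} WHERE {primary_key} IN (SELECT {primary_key} FROM {tmp})"
    )
    # Stage 2b: insert the fresh rows from the temporary cache table.
    conn.execute(f"INSERT INTO {tgt} ({cols}) SELECT {cols} FROM {tmp}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src_policy (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE policy_tmp (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE policy (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO src_policy VALUES (1, 'a2'), (2, 'b');
    INSERT INTO policy VALUES (1, 'a'), (3, 'c');
    """
)
sync_table(conn, "src_policy", "policy_tmp", "policy", ["id", "name"], "id")
target_rows = conn.execute("SELECT id, name FROM policy ORDER BY id").fetchall()
```

Row 1 is updated, row 2 is added, and row 3, which does not appear in the source, is left untouched. Identifiers are interpolated directly into the SQL text here, so this pattern is only appropriate when table and field names come from trusted configuration.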

Thus, by traversing each row in the memory list, the data of each row is assigned to the corresponding Kettle variable with different values, so that different tables can be realized, and the same Transformation is used for completing data synchronization.

In this embodiment, data structure generation for ETL on the Kettle platform is realized, and data synchronization is processed automatically. Reuse is achieved, repeated development tasks are avoided, and the efficiency of developing data synchronization with Kettle is improved. The threshold for data synchronization development is lowered: the developer does not need to know the operating procedures of Kettle, and seamless integration with Kettle is achieved merely by configuring the relevant information in a database.

In one embodiment, reading target task information of data to be synchronized into the memory list includes: reading initial task information of the data to be synchronized, wherein the initial task information is generated according to information of a source database of the data to be synchronized configured by a user; preprocessing the initial task information to obtain target task information corresponding to the memory list; and loading the target task information into the memory list.

Specifically, in this embodiment, the terminal obtains the detailed information of the tasks to be processed from the source database according to parameters such as the table name, the full/incremental identifier and the queue serial number. This detailed information, for example the source group, table name, target group, temporary cache table name, queue number, primary key, field list, last update time and screening condition, is the initial task information of the data to be synchronized. The detailed information is then further processed and formatted, and the values of other derived variables are defined by string splicing and similar means, so that they can be correctly received, recognized and processed by the Kettle components in subsequent steps.
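The derivation of additional variables by string splicing might, for example, look like the sketch below. The class TaskPreprocessSketch, the variable names it produces, the "_tmp" suffix and the form of the incremental filter are all assumptions made for illustration and are not specified by the application.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical preprocessing step: derives extra variable values from the raw task information. */
final class TaskPreprocessSketch {

    static Map<String, String> deriveVariables(String tableName,
                                               List<String> fields,
                                               boolean incremental,
                                               String updateColumn,
                                               String lastUpdateTime) {
        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("v_table", tableName);
        // Derived value: name of the temporary cache table.
        variables.put("v_table_tmp", tableName + "_tmp");
        // Derived value: field list formatted for direct use in a SELECT clause.
        variables.put("v_field_list", String.join(", ", fields));
        // Derived value: a full load keeps every row, an incremental load keeps only newer rows.
        variables.put("v_filter", incremental
                ? updateColumn + " > '" + lastUpdateTime + "'"
                : "1 = 1");
        return variables;
    }
}
```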

In one embodiment, the initial task information includes metadata information of the source database and necessary information of the data synchronization task; metadata information of the source database changes with changes in the data structure of the source database.

For new ETL development work, only a new data source needs to be configured; the target task information is generated from the metadata information table of the new database, so Job and Transformation need not be redesigned.

For ETL maintenance work, when the data structure of the source system changes, the latest information of the database is adapted automatically because the metadata information table of the database is used; the target task information is adjusted accordingly, and the latest DDL, DML and SQL are regenerated to realize data synchronization.

In this embodiment, dynamically configurable Job and Transformation code on the Kettle platform is realized, and seamless integration from the source database to Kettle and on to the target database is achieved. The degree of automation of the ETL task flow on the Kettle platform is improved, the flow becomes configurable, and later maintainability is improved.

In the embodiment, the configurable and automatic processing of the ETL based on the Kettle platform is realized.

In one embodiment, after reading the target task information of the data to be synchronized into the memory list, the method further includes: generating state monitoring information corresponding to the target task information; after finishing the data synchronization of the corresponding task by the target service processing logic unit, the method further comprises the following steps: and after all tasks in the target task information are executed, modifying the state monitoring information corresponding to the target task information.

Specifically, the state monitoring information refers to the state of the target task information. After all tasks in the target task information have been executed, the corresponding state monitoring information is modified to "completed"; until then it is "in execution". While the target task information is still waiting in the multiplexing request queue, the corresponding state monitoring information is "not executed", and if an execution error occurs, the corresponding state monitoring information is "error".
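The four states named above can be sketched as an enumeration together with one possible rule for deriving the overall state from the states of the individual tasks. The class TaskStatusSketch and the precedence chosen here (an error outranks every other state, and an empty task list counts as not executed) are assumptions for this example only.

```java
import java.util.List;

/** Hypothetical model of the state monitoring information. */
final class TaskStatusSketch {

    enum State { NOT_EXECUTED, IN_EXECUTION, COMPLETED, ERROR }

    /** Derives the overall state of the target task information from its individual tasks. */
    static State overall(List<State> taskStates) {
        if (taskStates.isEmpty()) {
            return State.NOT_EXECUTED;
        }
        boolean allCompleted = true;
        boolean allQueued = true;
        for (State state : taskStates) {
            if (state == State.ERROR) {
                return State.ERROR;            // any failed task marks the whole run as failed
            }
            if (state != State.COMPLETED) {
                allCompleted = false;
            }
            if (state != State.NOT_EXECUTED) {
                allQueued = false;
            }
        }
        if (allCompleted) {
            return State.COMPLETED;            // every task has been executed
        }
        return allQueued ? State.NOT_EXECUTED : State.IN_EXECUTION;
    }
}
```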

In one embodiment, referring to fig. 4, which is a flowchart of a data synchronization method in another embodiment, basic information of the tables to be processed is collected by reading a metadata information table of the database, for example a metadata information table built into the database system. This information is then stored in an ETL task information table of a designated database, and custom ETL control information, such as the tracking-point information and debugging information fields needed during the ETL process, is added, thereby obtaining the target task information.

When the terminal starts task scheduling, Kettle queries the task row information in the target task information table according to the query parameters passed in for the task, and adds the matching task rows to a memory list in the memory of Kettle, which may also be called a result set object.

The terminal then takes the corresponding task from the memory list and determines whether that task is already being executed. If so, the terminal records a log entry stating that the task is already in execution.

If not, the scheduling state of the task is updated. If the task is scheduled successfully, the following steps are performed: the task information in the memory list is read and assigned to the corresponding Kettle target variables; the task state is updated; DDL is dynamically generated to create the ETL temporary cache table and the target table; DQL is dynamically generated to query the source database table, and the data are synchronized to the temporary cache table; DML is dynamically generated to delete historical redundant data in the target table and to insert the records of the temporary table into the target table; and the task state is updated again. The next row of the memory list is then fetched, and the process is repeated until the task of every row in the memory list has been processed, after which the overall task state is updated.
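The row-by-row flow of fig. 4 can be outlined, under stated assumptions, as the loop below. The class SyncLoopSketch, the RowSynchronizer callback (standing in for the single reusable Transformation), the map keys "table" and "state", and the decision to continue with the next row after a failure are all hypothetical choices for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical outline of traversing the memory list with one reusable synchronization unit. */
final class SyncLoopSketch {

    /** Stand-in for the reusable Transformation; receives the variable values of one row. */
    interface RowSynchronizer {
        void sync(Map<String, String> variables) throws Exception;
    }

    /** Processes every row and returns the final state recorded for each table. */
    static Map<String, String> run(List<Map<String, String>> memoryList,
                                   RowSynchronizer synchronizer) {
        Map<String, String> finalStates = new LinkedHashMap<>();
        for (Map<String, String> row : memoryList) {
            String table = row.get("table");
            if ("IN_EXECUTION".equals(row.get("state"))) {
                // The task is already being executed elsewhere: record that and move on.
                finalStates.put(table, "IN_EXECUTION");
                continue;
            }
            try {
                synchronizer.sync(row);        // the DDL, DQL and DML steps would run here
                finalStates.put(table, "COMPLETED");
            } catch (Exception e) {
                finalStates.put(table, "ERROR");
            }
        }
        return finalStates;
    }
}
```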

The data synchronization method realizes configurable, automated ETL processing on the Kettle platform. Data structure generation for ETL on the Kettle platform is realized, and data synchronization is processed automatically. Reuse is achieved, repeated development tasks are avoided, and the efficiency of developing data synchronization with Kettle is improved. The threshold for data synchronization development is lowered: the developer does not need to know the operating procedures of Kettle, and seamless integration with Kettle is achieved merely by configuring the relevant information in a database.

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the application also provides a data synchronization device for realizing the above related data synchronization method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in one or more embodiments of the data synchronization device provided below may refer to the limitation of the data synchronization method hereinabove, and will not be repeated herein.

In one embodiment, as shown in fig. 5, there is provided a data synchronization apparatus including: a reading module 501, an assignment module 502, a service processing logic unit generating module 503 and a data synchronizing module 504, wherein:

The reading module 501 is configured to read target task information of data to be synchronized into the memory list.

The assignment module 502 is configured to assign the target task information stored in the object of the memory list to the target variable according to the relationship between the object of the memory list and the target variable.

The service processing logic unit generating module 503 is configured to input the value of the target variable to a template of the service processing logic unit, and generate a target service processing logic unit corresponding to each task.

The data synchronization module 504 is configured to complete data synchronization of the corresponding task through the target service processing logic unit.

In one embodiment, the data synchronization module 504 is further configured to perform data synchronization of the corresponding task through parallel processing of each target service processing logic unit.
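Purely as an illustration of parallel processing, the sketch below runs several synchronization units on a fixed thread pool and collects their results in submission order. The class ParallelSyncSketch and the modelling of each target service processing logic unit as a Callable are assumptions for this example; how Kettle itself schedules Transformations in parallel is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration: each Callable stands in for one target service processing logic unit. */
final class ParallelSyncSketch {

    /** Runs all units on a fixed-size pool and returns their results in submission order. */
    static List<String> runAll(List<Callable<String>> units, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every unit has finished or failed.
            for (Future<String> future : pool.invokeAll(units)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            throw new IllegalStateException("Interrupted while synchronizing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A synchronization unit failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```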

In one embodiment, the source database and/or the target database corresponding to the data to be synchronized at least comprise two databases; the service processing logic unit generating module 503 includes:

A splitting unit, configured to split the tasks according to the source database and the target database to obtain the tasks unique to each source database and each target database.

A service processing logic unit generating unit, configured to input the value of the target variable into the template of the service processing logic unit and generate the target service processing logic unit corresponding to each task.
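One possible reading of the splitting step, offered only as a sketch, is to group the tasks by their pair of source and target databases so that each group can be processed against a single pair of connections. The classes TaskSplitSketch and Task, the field names, and the "source->target" grouping key are hypothetical and are not defined by the application.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of splitting tasks by source and target database. */
final class TaskSplitSketch {

    /** Minimal task record: which table moves from which source to which target. */
    static final class Task {
        final String sourceDb;
        final String targetDb;
        final String table;

        Task(String sourceDb, String targetDb, String table) {
            this.sourceDb = sourceDb;
            this.targetDb = targetDb;
            this.table = table;
        }
    }

    /** Groups tasks by "source->target", keeping the order in which routes first appear. */
    static Map<String, List<Task>> splitByRoute(List<Task> tasks) {
        Map<String, List<Task>> groups = new LinkedHashMap<>();
        for (Task task : tasks) {
            String route = task.sourceDb + "->" + task.targetDb;
            groups.computeIfAbsent(route, key -> new ArrayList<>()).add(task);
        }
        return groups;
    }
}
```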

In one embodiment, the data synchronization module 504 includes:

An assignment unit, configured to assign the data in the target variable to a Java variable.

A Java object generating unit, configured to read database connection information from the metadata configuration library and generate an operable Java object according to the database connection information.

A table structure generating unit, configured to generate the table structure of the cache table corresponding to the target database according to the method of the operable Java object and the Java variable.

A synchronization unit, configured to synchronize the data to be synchronized in the source database to the cache table, and to synchronize the data to be synchronized in the cache table to the target database.

In one embodiment, the reading module 501 includes:

An initial task information reading unit, configured to read initial task information of the data to be synchronized, the initial task information being generated according to information, configured by a user, of the source database of the data to be synchronized.

A preprocessing unit, configured to preprocess the initial task information to obtain the target task information corresponding to the memory list.

A loading unit, configured to load the target task information into the memory list.

In one embodiment, the apparatus further includes:

A state monitoring information modification module, configured to generate state monitoring information corresponding to the target task information, and to modify the state monitoring information corresponding to the target task information after all tasks in the target task information have been executed.

The modules in the data synchronization device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store the memory list. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a data synchronization method.

It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of: reading target task information of data to be synchronized into a memory list; according to the relation between the object in the memory list and the target variable, assigning the stored target task information in the object in the memory list to the target variable; inputting the value of the target variable into a template of the business processing logic unit to generate a target business processing logic unit corresponding to each task; and the data synchronization of the corresponding tasks is completed through the target service processing logic unit.

In one embodiment, the step, implemented when the processor executes the computer program, of completing the data synchronization of the corresponding task through the target service processing logic unit includes: completing the data synchronization of the corresponding tasks through parallel processing of the target service processing logic units.

In one embodiment, the data to be synchronized involved when the processor executes the computer program corresponds to at least two source databases and/or at least two target databases; the step, implemented when the processor executes the computer program, of inputting the value of the target variable into the template of the service processing logic unit to generate the target service processing logic unit corresponding to each task includes: splitting the tasks according to the source database and the target database to obtain the tasks unique to each source database and each target database; and inputting the value of the target variable into the template of the service processing logic unit to generate the target service processing logic unit corresponding to each task.

In one embodiment, the step, implemented when the processor executes the computer program, of completing the data synchronization of the corresponding task through the target service processing logic unit includes: assigning the data in the target variable to a Java variable; reading database connection information from a metadata configuration library, and generating an operable Java object according to the database connection information; generating the table structure of the cache table corresponding to the target database according to the method of the operable Java object and the Java variable; and synchronizing the data to be synchronized in the source database to the cache table, and synchronizing the data to be synchronized in the cache table to the target database.

In one embodiment, the step, implemented when the processor executes the computer program, of reading target task information of data to be synchronized into the memory list includes: reading initial task information of the data to be synchronized, wherein the initial task information is generated according to information, configured by a user, of the source database of the data to be synchronized; preprocessing the initial task information to obtain target task information corresponding to the memory list; and loading the target task information into the memory list.

In one embodiment, the initial task information involved in executing the computer program by the processor includes metadata information of the source database and necessary information for the data synchronization task; metadata information of the source database changes with changes in the data structure of the source database.

In one embodiment, after the target task information of the data to be synchronized is read into the memory list, the method further includes: generating state monitoring information corresponding to the target task information; after the data synchronization of the corresponding tasks is completed by the target service processing logic unit when the processor executes the computer program, the method further comprises the following steps: and after all tasks in the target task information are executed, modifying the state monitoring information corresponding to the target task information.

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of: reading target task information of data to be synchronized into a memory list; according to the relation between the object in the memory list and the target variable, assigning the stored target task information in the object in the memory list to the target variable; inputting the value of the target variable into a template of the business processing logic unit to generate a target business processing logic unit corresponding to each task; and the data synchronization of the corresponding tasks is completed through the target service processing logic unit.

In one embodiment, the step, implemented when the computer program is executed by the processor, of completing the data synchronization of the corresponding task through the target service processing logic unit includes: completing the data synchronization of the corresponding tasks through parallel processing of the target service processing logic units.

In one embodiment, the data to be synchronized involved when the computer program is executed by the processor corresponds to at least two source databases and/or at least two target databases; the step, implemented when the computer program is executed by the processor, of inputting the value of the target variable into the template of the service processing logic unit to generate the target service processing logic unit corresponding to each task includes: splitting the tasks according to the source database and the target database to obtain the tasks unique to each source database and each target database; and inputting the value of the target variable into the template of the service processing logic unit to generate the target service processing logic unit corresponding to each task.

In one embodiment, the step, implemented when the computer program is executed by the processor, of completing the data synchronization of the corresponding task through the target service processing logic unit includes: assigning the data in the target variable to a Java variable; reading database connection information from a metadata configuration library, and generating an operable Java object according to the database connection information; generating the table structure of the cache table corresponding to the target database according to the method of the operable Java object and the Java variable; and synchronizing the data to be synchronized in the source database to the cache table, and synchronizing the data to be synchronized in the cache table to the target database.

In one embodiment, reading target task information of data to be synchronized into a memory list, which is implemented when a computer program is executed by a processor, includes: reading initial task information of the data to be synchronized, wherein the initial task information is generated according to information of a source database of the data to be synchronized configured by a user; preprocessing the initial task information to obtain target task information corresponding to the memory list; and loading the target task information into the memory list.

In one embodiment, the initial task information involved in the execution of the computer program by the processor includes metadata information of the source database and necessary information for the data synchronization task; metadata information of the source database changes with changes in the data structure of the source database.

In one embodiment, after the target task information of the data to be synchronized is read into the memory list, the method further includes: generating state monitoring information corresponding to the target task information; after the data synchronization of the corresponding tasks is completed by the target service processing logic unit when the computer program is executed by the processor, the method further comprises the following steps: and after all tasks in the target task information are executed, modifying the state monitoring information corresponding to the target task information.

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of: reading target task information of data to be synchronized into a memory list; according to the relation between the object in the memory list and the target variable, assigning the stored target task information in the object in the memory list to the target variable; inputting the value of the target variable into a template of the business processing logic unit to generate a target business processing logic unit corresponding to each task; and the data synchronization of the corresponding tasks is completed through the target service processing logic unit.

Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. The volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided herein may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this description.

The foregoing examples illustrate only a few embodiments of the application and are described in some detail, but they are not thereby to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims

1. A method of data synchronization, the method comprising:

reading target task information of data to be synchronized into a memory list;

2. The method of claim 1, wherein the data synchronization of the corresponding task performed by the target service processing logic unit comprises:

3. The method according to claim 2, wherein the source database and/or the target database corresponding to the data to be synchronized comprises at least two; the step of inputting the value of the target variable into a template of a business processing logic unit to generate a target business processing logic unit corresponding to each task, comprising the following steps:

4. A method according to any one of claims 1 to 3, wherein the data synchronization of the corresponding task performed by the target service processing logic unit comprises:

Assigning the data in the target variable to a Java variable;

5. The method of claim 1, wherein the reading the target task information of the data to be synchronized into the memory list comprises:

and loading the target task information into a memory list.

6. The method of claim 5, wherein the initial task information includes metadata information of a source database and necessary information of a data synchronization task; the metadata information of the source database changes along with the change of the data structure of the source database.

7. A method according to any one of claims 1 to 3, wherein after the reading the target task information of the data to be synchronized into the memory list, the method further comprises:

8. A data synchronization device, the device comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.