CN117633116A - Data synchronization method, device, electronic equipment and storage medium - Google Patents

Publication number
CN117633116A
Authority
CN
China
Prior art keywords
data
synchronous
information
synchronization
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311659172.5A
Other languages
Chinese (zh)
Inventor
孟洋
刘一阳
桂亦慧
许翠
左飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202311659172.5A
Publication of CN117633116A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data synchronization method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: dividing a target synchronous job into a plurality of synchronous tasks based on synchronous configuration information corresponding to a source database; determining at least one synchronous task group based on the synchronous tasks, and creating a task group queue to be executed based on the at least one synchronous task group; invoking synchronous inquiry instructions corresponding to synchronous tasks included in the current synchronous task group; and responding to a synchronous inquiry instruction corresponding to the current synchronous task, extracting data information to be synchronized of a data table to be synchronized corresponding to the current synchronous task from a source end database, and synchronizing the data information to be synchronized into a target data table which is created in advance in a target end database and corresponds to the current synchronous task. According to the technical scheme, the effect of real-time and high-efficiency synchronization of data among multiple data sources in a heterogeneous environment is achieved, and the data synchronization efficiency and the timeliness of the data synchronization process are improved.

Description

Data synchronization method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of big data processing technologies, and in particular, to a data synchronization method, a device, an electronic device, and a storage medium.
Background
With the gradual advancement of lakehouse technology (the integration of data lakes and data warehouses), the problem of synchronizing data between newly added data lakes and existing data warehouses has gradually emerged. Until the in-lake data warehouse is built, data synchronization between the data lake and the data warehouses outside the lake will remain necessary for a considerable time.
At present, the data synchronization tools in use can complete a synchronization process only with manual intervention, so synchronization efficiency is low and neither automatic nor real-time data synchronization can be achieved.
Disclosure of Invention
The invention provides a data synchronization method, a data synchronization device, electronic equipment and a storage medium, which are used for realizing the effect of real-time and efficient synchronization of data among multiple data sources in a heterogeneous environment and improving the data synchronization efficiency and the timeliness of a data synchronization process.
According to an aspect of the present invention, there is provided a data synchronization method, the method comprising:
dividing a target synchronous job into a plurality of synchronous tasks based on synchronous configuration information corresponding to a source database;
determining at least one synchronous task group based on the synchronous tasks, and creating a task group queue to be executed based on the synchronous task group;
for each synchronous task group in the task group queue to be executed, invoking a synchronous query instruction corresponding to each synchronous task included in the current synchronous task group;
for each synchronous task in the current synchronous task group, responding to a synchronous query instruction corresponding to the current synchronous task, extracting data information to be synchronized of a data table to be synchronized corresponding to the current synchronous task from the source end database, and synchronizing the data information to be synchronized into a target data table which is created in advance in a target end database and corresponds to the current synchronous task.
According to another aspect of the present invention, there is provided a data synchronizing device, comprising:
the job segmentation module is used for segmenting the target synchronous job into a plurality of synchronous tasks based on the synchronous configuration information corresponding to the source database;
a task group determining module, configured to determine at least one synchronous task group based on the plurality of synchronous tasks, and create a task group queue to be executed based on the at least one synchronous task group;
the instruction calling module is used for calling synchronous inquiry instructions corresponding to synchronous tasks included in the current synchronous task group for each synchronous task group in the task group queue to be executed;
the data synchronization module is used for responding to a synchronization query instruction corresponding to the current synchronization task for each synchronization task in the current synchronization task group, extracting data information to be synchronized of a data table to be synchronized corresponding to the current synchronization task from the source end database, and synchronizing the data information to be synchronized into a target data table which is created in advance in a target end database and corresponds to the current synchronization task.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data synchronization method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a data synchronization method according to any one of the embodiments of the present invention.
According to the technical solution of the embodiments of the present invention, the target synchronous job is split into a plurality of synchronous tasks based on the synchronous configuration information corresponding to the source database; at least one synchronous task group is determined based on the synchronous tasks, and a task group queue to be executed is created based on the synchronous task group; then, for each synchronous task group in the task group queue to be executed, the synchronous query instructions corresponding to the synchronous tasks included in the current synchronous task group are invoked.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data synchronization method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a data synchronization method according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data synchronization system according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data synchronization device according to a third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device implementing a data synchronization method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a data synchronization method according to a first embodiment of the present invention, where the method may be performed by a data synchronization device, which may be implemented in hardware and/or software, and the data synchronization device may be configured in a terminal and/or a server, where the data synchronization method is applicable to a case of synchronizing data stored in a source database to a destination database. As shown in fig. 1, the method includes:
S110, dividing the target synchronous job into a plurality of synchronous tasks based on synchronous configuration information corresponding to the source database.
The source database may be the database corresponding to the data-providing end of the data synchronization process, that is, the origin from which the synchronized data is obtained. The source database may be any form of repository or system; optionally, it may be a data lake or a data warehouse. As will be appreciated by those skilled in the art, a data lake is a repository or system that stores data in its raw format, as is, without requiring the data to be structured first. A data lake can store structured, semi-structured, unstructured, and binary data for various forms of data analysis such as machine learning, deep learning, and statistical analysis. A data warehouse is a subject-oriented, integrated, relatively stable data storage system that reflects historical changes and primarily stores structured data. The synchronization configuration information may be understood as information that configures the data synchronization process of the source database. Optionally, the synchronization configuration information may include the number of data tables to be synchronized, or the positions and number of split primary keys, and the like. It should be noted that the synchronization configuration information may be a system default or may be customized based on user requirements, which is not limited in this embodiment. The target synchronization job indicates the entire process of synchronizing data information from the source database to the target database. A synchronization task may be understood as a unit of that job which synchronizes data information from the source database to the target database.
In practical applications, before synchronizing data information in the source database to the target database, the target synchronization job corresponding to the synchronization process may be determined, and the synchronization configuration information of the source database may be obtained. The target synchronization job may then be split into a plurality of synchronization tasks according to the synchronization configuration information. It should be noted that the synchronization configuration information may include several kinds of configuration information, and the splitting process may differ for each kind; the splitting process under different configuration information is described below.
Optionally, the synchronization configuration information includes the number of data tables to be synchronized, and dividing the target synchronization job into a plurality of synchronization tasks based on the synchronization configuration information corresponding to the source database includes: determining the number of tasks of the synchronous task based on the number of data tables to be synchronized in the source database; and cutting the target synchronous job according to the task number of the synchronous tasks to obtain a plurality of synchronous tasks.
In this embodiment, the number of data tables to be synchronized is the number of data tables to be synchronized from the source database to the target database, and the data tables to be synchronized are at least part of the data tables stored in the source database.
In practical application, the number of the data tables to be synchronized can be determined based on the synchronization configuration information corresponding to the source database. Furthermore, the number of the data tables to be synchronized can be used as the task number of the synchronous tasks, that is, one data table to be synchronized corresponds to one task, so as to obtain the task number of the synchronous tasks. Furthermore, the target synchronous job can be segmented according to the task number of the synchronous tasks. Thus, a plurality of synchronous tasks can be obtained.
For example, if the number of data tables to be synchronized is 100, the target synchronization job may be split into 100 synchronization tasks.
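By way of illustration only, the one-table-per-task splitting described above can be sketched as follows. The `SyncTask` class, the `split_job` function, and the assumption that each target table shares the name of its source table are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncTask:
    """Hypothetical record describing one synchronization task."""
    task_id: int
    source_table: str
    target_table: str


def split_job(tables_to_sync: list[str]) -> list[SyncTask]:
    """Create one task per data table to be synchronized.

    Assumption: the target table has the same name as the source table.
    """
    return [SyncTask(i, name, name) for i, name in enumerate(tables_to_sync)]


# 100 tables to be synchronized yield 100 synchronization tasks.
tasks = split_job([f"table_{n}" for n in range(100)])
print(len(tasks))
```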
Optionally, the synchronization configuration information includes the positions and number of split primary keys. For each synchronization task, the current synchronization task may be split based on the positions and number of the split primary keys to obtain a plurality of synchronization subtasks corresponding to the current synchronization task; each synchronization subtask may then serve as a task execution unit in the data synchronization process.
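The embodiment does not specify how the split primary keys are applied. One possible reading, shown here purely as a sketch, is that an integer primary-key range is cut into contiguous sub-ranges, each of which becomes one subtask. The function name `split_key_range` and the assumption of an integer key are hypothetical.

```python
def split_key_range(min_key: int, max_key: int, parts: int) -> list[tuple[int, int]]:
    """Cut the inclusive range [min_key, max_key] into contiguous sub-ranges.

    Assumption: the split primary key is an integer column. Sub-range
    sizes differ by at most one row.
    """
    if parts < 1 or max_key < min_key:
        raise ValueError("parts must be >= 1 and max_key >= min_key")
    total = max_key - min_key + 1
    parts = min(parts, total)  # never create empty sub-ranges
    base, extra = divmod(total, parts)
    ranges = []
    start = min_key
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Each tuple could back one subtask, e.g. "WHERE id BETWEEN lo AND hi".
subtask_ranges = split_key_range(1, 10, 3)
print(subtask_ranges)  # [(1, 4), (5, 7), (8, 10)]
```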
S120, determining at least one synchronous task group based on the synchronous tasks, and creating a task group queue to be executed based on the synchronous task group.
In this embodiment, the synchronization task group may be a set including at least one synchronization task. The task group queue to be executed may be understood as a message queue storing synchronization task groups. The synchronization task groups in the task group queue to be executed are arranged in a preset order, and during execution they are executed sequentially in queue order.
In practical application, in order to improve data synchronization efficiency, after obtaining a plurality of synchronization tasks, the plurality of synchronization tasks may be divided into at least one synchronization task group according to a preset grouping criterion. Furthermore, each synchronization task included in each synchronization task group can be sequentially executed to complete the process of synchronizing the data information in the data table to be synchronized from the source database to the target database.
Optionally, determining at least one synchronization task group based on the at least one synchronization task includes: determining the number of task groups based on the preset number of job channels and the preset number of group channels; and according to the number of task groups and the number of synchronous tasks, at least one synchronous task is correspondingly distributed to each task group so as to obtain at least one synchronous task group.
In the present embodiment, the job channel number can be understood as the channel number required to complete the synchronization job. The number of job channels may correspond to the number of data tables to be synchronized included in the synchronization job. By way of example, assuming that the number of data tables to be synchronized required for the target synchronization job is 100, the number of job channels is 100, i.e., 100 channels are required to complete the target synchronization job. The number of group channels may be understood as the number of channels required for each task group. The number of group channels may be any predetermined value, alternatively, may be 5. The number of the synchronous tasks is the total number of all synchronous tasks obtained after the task division of the target synchronous job.
In practical application, the preset number of job channels and the preset number of group channels may be obtained, and the ratio of the number of job channels to the number of group channels may be taken as the number of task groups. The number of synchronization tasks allocated to each task group may then be determined from the number of task groups and the total number of synchronization tasks, and the synchronization tasks may be allocated to the task groups accordingly. Each task group that contains synchronization tasks is regarded as a synchronization task group.
By way of example, with continued reference to the above example, in the case where the job channel number is 100 and the group channel number is 5, the task group number may be 20. Furthermore, the total number of the synchronous tasks is 100, after the synchronous tasks are distributed to each task group, 20 synchronous task groups can be obtained, and each task group can comprise 5 synchronous tasks.
Furthermore, each synchronous task group can be stored in the message queue according to a preset sequence, and the task group queue to be executed can be obtained.
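As a minimal sketch of the grouping described above, the number of task groups is taken as the ratio of job channels to group channels, and the tasks are then distributed over the groups and placed in a queue. The function name `build_group_queue`, the rounding up of a non-integer ratio, and the round-robin allocation are assumptions; the embodiment only states that a preset grouping criterion and a preset order are used.

```python
import math
from collections import deque


def build_group_queue(tasks: list, job_channels: int, group_channels: int) -> deque:
    """Group tasks and return the groups as a FIFO queue.

    Assumptions: a non-integer ratio is rounded up, and tasks are
    allocated round-robin so that group sizes differ by at most one.
    """
    if job_channels < 1 or group_channels < 1:
        raise ValueError("channel counts must be positive")
    group_count = math.ceil(job_channels / group_channels)
    groups = [[] for _ in range(group_count)]
    for index, task in enumerate(tasks):
        groups[index % group_count].append(task)
    # Drop empty groups so every queued group holds at least one task.
    return deque(group for group in groups if group)


# 100 job channels and 5 group channels give 20 groups of 5 tasks each.
queue = build_group_queue(list(range(100)), job_channels=100, group_channels=5)
print(len(queue), len(queue[0]))  # 20 5
```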
S130, for each synchronous task group in the task group queue to be executed, a synchronous query instruction corresponding to each synchronous task included in the current synchronous task group is called.
In this embodiment, the synchronization inquiry instruction may be a piece of pre-written program code, which is used to trigger and complete the data synchronization process. The synchronization query instruction may include a plurality of field information associated with the data synchronization process, and optionally may include address information of the source database, address information of the target database, and data table index information corresponding to the synchronization task, where the data table index information may include information of a data table to be synchronized in the source database and information of a target data table in the target database. The synchronous query instruction may be any form of instruction, and alternatively may be an SQL instruction.
In practical application, after determining a plurality of synchronization tasks corresponding to the target synchronization job, a synchronization configuration file corresponding to each synchronization task may be determined according to the data table to be synchronized corresponding to the synchronization task and the data table information of the target data table corresponding to that data table. Each synchronization configuration file may then be parsed, and a synchronization query instruction corresponding to each synchronization task may be generated from the parsed file and stored. When executing the data synchronization task, for each synchronization task group in the task group queue to be executed, the synchronization query instruction corresponding to each synchronization task included in the current synchronization task group may be retrieved from the pre-stored synchronization query instructions.
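The following sketch illustrates how a synchronization query instruction might be generated from a parsed configuration file. The JSON layout, the key names (`source_address`, `target_address`, `source_table`, `target_table`, `columns`), and the function `build_sync_query` are hypothetical; the embodiment does not define the configuration file format.

```python
import json


def build_sync_query(config_text: str) -> dict:
    """Parse a per-task configuration file and derive SQL instructions.

    Assumption: table and column names in the configuration are trusted
    identifiers, since they are interpolated directly into the SQL text.
    """
    cfg = json.loads(config_text)
    columns = ", ".join(cfg["columns"])
    placeholders = ", ".join("?" for _ in cfg["columns"])
    return {
        "source_address": cfg["source_address"],
        "target_address": cfg["target_address"],
        "select_sql": f"SELECT {columns} FROM {cfg['source_table']}",
        "insert_sql": (
            f"INSERT INTO {cfg['target_table']} ({columns}) "
            f"VALUES ({placeholders})"
        ),
    }


example_config = json.dumps({
    "source_address": "lake.example.internal",   # placeholder address
    "target_address": "dw.example.internal",     # placeholder address
    "source_table": "accounts",
    "target_table": "dw_accounts",
    "columns": ["id", "name"],
})
instruction = build_sync_query(example_config)
print(instruction["select_sql"])
```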
It should be noted that, for each synchronous task group in the task group queue to be executed, before the synchronous query instruction corresponding to each synchronous task included in the current synchronous task group is invoked to execute each synchronous task based on the synchronous query instruction, the resource occupancy rate of the data node executing the synchronous task may also be detected to determine whether the related data node can execute the corresponding synchronous task.
Based on the above, before the synchronous query instruction corresponding to each synchronous task included in the current synchronous task group is invoked, the above technical solution further includes: determining the available resource amount corresponding to each data node associated with the current synchronous task group according to the pre-acquired flow control parameters; and executing the current synchronous task group in the case that each available resource amount meets the task execution standard corresponding to the current synchronous task group.
In this embodiment, the flow control parameter may be understood as a parameter for detecting and controlling the data flow of each data node in the data synchronization process. The flow control parameters may include any parameters associated with flow control, and optionally, may include a maximum number of data tables that each channel allows concurrent synchronization, a maximum number of records or channels that each channel can synchronize simultaneously, and so on. The resource availability may be understood as the remaining amount of resources of the data node at the current time, i.e. how much resources the data node has left available for performing the synchronization task at the current time. The task execution criteria may be understood as criteria set in advance for defining execution trigger timings of the synchronous task group. Alternatively, the task execution criteria may be that the available amounts of resources corresponding to each data node associated with the synchronized task group are each capable of executing a respective synchronized task.
In practical application, before a synchronous query instruction corresponding to each synchronous task included in the current synchronous task group is called to execute the current synchronous task group, the resource occupation amount corresponding to each data node associated with the current synchronous task group can be obtained, further, the difference between the flow control parameter and each resource occupation amount can be determined, and the difference can be used as the resource availability amount corresponding to the data node. Furthermore, the available resource amount corresponding to each data node can be obtained. Further, it may be determined whether each available amount of resources is capable of executing each synchronization task included in the current synchronization task group. Further, in the case where it is determined that each resource availability can execute each synchronous task included in the current synchronous task group, it is possible to determine that each resource availability satisfies the task execution criterion corresponding to the current synchronous task group. Further, a synchronization inquiry instruction corresponding to each synchronization task included in the current synchronization task group may be invoked and the current synchronization task group may be executed.
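The availability check described above can be sketched as follows, assuming for illustration that the flow control parameter is a single numeric limit per data node and that each synchronization task group declares a numeric demand on each node it uses. The function names and the treatment of unreported nodes as idle are assumptions.

```python
def available_resources(flow_limit: int, usage_by_node: dict[str, int]) -> dict[str, int]:
    """Available amount per node = flow control limit minus current usage."""
    return {node: max(0, flow_limit - used) for node, used in usage_by_node.items()}


def group_can_run(flow_limit: int,
                  usage_by_node: dict[str, int],
                  demand_by_node: dict[str, int]) -> bool:
    """Return True only if every node the group needs has enough capacity.

    Assumption: a node with no reported usage is treated as idle.
    """
    available = available_resources(flow_limit, usage_by_node)
    return all(
        available.get(node, flow_limit) >= needed
        for node, needed in demand_by_node.items()
    )


usage = {"node-1": 3, "node-2": 5}
print(group_can_run(5, usage, {"node-1": 2}))  # True: 2 units remain on node-1
print(group_can_run(5, usage, {"node-2": 1}))  # False: node-2 is fully used
```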
S140, for each synchronous task in the current synchronous task group, responding to a synchronous query instruction corresponding to the current synchronous task, extracting data information to be synchronized of a data table to be synchronized corresponding to the current synchronous task from a source end database, and synchronizing the data information to be synchronized into a target data table which is created in advance in a target end database and corresponds to the current synchronous task.
In this embodiment, the data information to be synchronized may be understood as the data information contained in the corresponding data table to be synchronized. The target database may be the database corresponding to the data-importing end of the data synchronization process, that is, the destination of the data synchronization. It should be noted that the source database and the target database may be databases of the same type or of different types, which is not particularly limited in the embodiment of the present invention. The target data table may be an empty table that is created in advance in the target database and not yet filled with data information. The data tables to be synchronized are in one-to-one correspondence with the target data tables.
In practical application, when executing the current synchronous task group, for each synchronous task in the current synchronous task group, the synchronous query instruction corresponding to the current synchronous task can be responded, and the data table to be synchronized corresponding to the current synchronous task in the source database is determined based on the synchronous query instruction. Furthermore, the data information to be synchronized in the data table to be synchronized can be extracted. Furthermore, the extracted data information to be synchronized can be synchronized to a target data table corresponding to the current synchronization task in the target database based on the synchronization query instruction.
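A minimal sketch of the extract-and-load step follows. Two in-memory SQLite databases stand in for the source and target databases purely for illustration; the embodiment targets data lakes and data warehouses, and the function `run_sync_task`, the table name `accounts`, and the batch size are hypothetical.

```python
import sqlite3


def run_sync_task(source: sqlite3.Connection,
                  target: sqlite3.Connection,
                  select_sql: str,
                  insert_sql: str,
                  batch_size: int = 1000) -> int:
    """Copy rows from the source table into the pre-created target table."""
    copied = 0
    cursor = source.execute(select_sql)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        target.executemany(insert_sql, rows)
        copied += len(rows)
    target.commit()
    return copied


# Stand-in source database with one table to be synchronized.
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
source_db.executemany("INSERT INTO accounts VALUES (?, ?)",
                      [(1, "a"), (2, "b"), (3, "c")])

# Stand-in target database with the empty target table created in advance.
target_db = sqlite3.connect(":memory:")
target_db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")

copied = run_sync_task(
    source_db, target_db,
    "SELECT id, name FROM accounts",
    "INSERT INTO accounts (id, name) VALUES (?, ?)",
    batch_size=2,
)
print(copied)  # 3
```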
In practical applications, after the data table information in the source database is synchronized to the target database, there may be a situation that the data table information in the source database is changed, so in order to achieve real-time synchronization between the source database and the target database, after the synchronization operation is completed, the data table information in the source database may be detected, and if the data change is detected, the data change is synchronized to the target database.
Based on the above, the above technical solution further includes: in the case that the target synchronous job is determined to be complete, collecting a data source log corresponding to the source database according to a preset collection period; parsing the data source log, and extracting data table information and data operation information from the data source log; and caching the data table information and the data operation information in a preset message queue in a preset format.
In this embodiment, the preset collection period may be any value; optionally, it may be 1 minute, 10 minutes, 30 minutes, or the like. In the field of data analysis, logs are a special type of data that is very important for handling historical data, diagnosing problems, and understanding system activity. The data source log may be understood as a log in which each data source records how its data is used. Optionally, depending on the data source, the data source log may include a server log, an application log, a network device log, and the like. The data table information may be information for identifying a data table, and may include a data table name and/or a data table identifier, etc. The data operation information may be understood as information describing the operations performed on the data. The preset format may be any format; optionally, it may be JSON. The preset message queue may be any queue; optionally, it may be Kafka. Kafka is a distributed stream processing platform that publishes and subscribes to message streams, stores them in a fault-tolerant, persistent manner, and processes messages as they arrive. It is mainly used for data pushing, large-scale data buffering, log collection, service middleware, and the like.
In practical application, in the case that the target synchronous job is determined to be complete, the data changes of the source database may be monitored by collecting the data source log corresponding to the source database according to the preset collection period. The data source log may then be parsed, and the data table information and data operation information in it may be extracted. The extracted data table information and data operation information may be cached in the preset message queue in the preset format. The advantage of this arrangement is that, because the data table information and data operation information are cached in the preset message queue, log information is not lost for a period of time, which improves the overall reliability of the system.
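The log parsing and caching step can be sketched as follows. The pipe-delimited log line layout (`table|operation|JSON row`) is a hypothetical format chosen for illustration, and Python's in-process `queue.Queue` stands in for the preset message queue; a real deployment would publish to Kafka or a similar broker instead.

```python
import json
from queue import Queue


def parse_log_line(line: str) -> dict:
    """Extract data table information and data operation information.

    Assumption: each log line has the form 'table|operation|json-row'.
    """
    table, operation, payload = line.strip().split("|", 2)
    return {
        "table": table,
        "operation": operation.upper(),
        "row": json.loads(payload),
    }


def cache_log_entries(lines: list[str], message_queue: Queue) -> int:
    """Serialize each parsed entry as JSON and put it on the queue."""
    cached = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        message_queue.put(json.dumps(parse_log_line(line)))
        cached += 1
    return cached


log_lines = [
    'accounts|insert|{"id": 1, "name": "alice"}',
    "",
    'accounts|delete|{"id": 1}',
]
pending = Queue()
cached_count = cache_log_entries(log_lines, pending)
print(cached_count)  # 2
```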
Further, in order to determine the data changes, the cached data table information and data operation information may be pulled from the preset message queue. The pulled data table information and data operation information may then be analyzed to determine the data change information, and the data change information may be synchronized to the target database.
Based on the above, the above technical solution further includes: retrieving data table information and data operation information from the preset message queue; analyzing the data table information and the data operation information to determine data editing information; and determining a target data table corresponding to the data editing information in the target database, and updating the data information in the target data table based on the data editing information.
In this embodiment, the data editing information may be log information characterizing data changes. The data editing information may include at least one of data addition information, data modification information, and data deletion information. The data addition information may be log information characterizing newly added data. The data modification information may be log information characterizing modified data. The data deletion information may be log information characterizing deleted data.
In this embodiment, the data table information and the data operation information may be retrieved from the preset message queue in any manner; optionally, the Flink tool may be used. Flink is a computational framework and distributed processing engine for stateful computation over unbounded and bounded data streams. Compared with other stream computation engines, Flink supports stateful computation, higher throughput and lower latency, and out-of-order event processing.
In practical application, the data table information and the data operation information can be fetched from the preset message queue. Further, the data table information and the data operation information may be analyzed to pick out the log information concerning data changes, and the data editing information is thereby obtained. Then, the target data table corresponding to the data editing information in the target database can be determined, and the data information in the target data table is updated based on the data editing information. Optionally, if the data editing information includes data newly-added information, the corresponding newly added data may be inserted into the target data table; if the data editing information includes data deleted information, the corresponding data may be deleted from the target data table; if the data editing information includes data modified information, the corresponding data in the target data table may be updated.
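As an illustration of the update step above, the following sketch applies insert, modify and delete entries to a target table. It assumes a hypothetical entry layout (`table`, `op`, `id`, `value`) and uses an in-memory SQLite database in place of the target database; the text does not prescribe either.

```python
import sqlite3


def apply_edit(conn, edit):
    """Apply one piece of data editing information to its target data table."""
    table = edit["table"]
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    if edit["op"] == "INSERT":  # data newly-added information
        conn.execute(
            f"INSERT INTO {table} (id, value) VALUES (?, ?)",
            (edit["id"], edit["value"]),
        )
    elif edit["op"] == "UPDATE":  # data modified information
        conn.execute(
            f"UPDATE {table} SET value = ? WHERE id = ?",
            (edit["value"], edit["id"]),
        )
    elif edit["op"] == "DELETE":  # data deleted information
        conn.execute(f"DELETE FROM {table} WHERE id = ?", (edit["id"],))
    else:
        raise ValueError(f"unknown operation: {edit['op']!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, value TEXT)")
edits = [
    {"table": "orders", "op": "INSERT", "id": 1, "value": "a"},
    {"table": "orders", "op": "INSERT", "id": 2, "value": "b"},
    {"table": "orders", "op": "UPDATE", "id": 1, "value": "c"},
    {"table": "orders", "op": "DELETE", "id": 2},
]
for edit in edits:
    apply_edit(conn, edit)
conn.commit()
rows = conn.execute("SELECT id, value FROM orders ORDER BY id").fetchall()
```

Applying the entries in log order matters: replaying them out of order could, for example, update a row before it has been inserted.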
In the actual application process, the resource use condition of each data node can be detected, the flow control parameters are adjusted based on the detection result, and the adjusted flow control parameters can be applied to the data synchronization process. The advantages of this arrangement are that: the data synchronization efficiency is improved, and the automatic flow control of data synchronization among multiple data sources in a heterogeneous environment is realized.
Based on the above, the above technical solution further includes: acquiring resource occupation information corresponding to each data node according to a preset acquisition period; determining the average resource occupation information of each data node in a preset time period; and adjusting the flow control parameters corresponding to each data node based on the average resource occupation information.
In this embodiment, the preset acquisition period may be any value; optionally, it may be 1 minute. The preset time period may be any time interval; optionally, it may be 10 minutes. The resource occupation information may include any information characterizing the use of resources; optionally, it may include CPU usage, memory usage, disk IO, network bandwidth occupation percentage, and the like.
In practical application, the resource occupation information corresponding to each data node can be collected according to a preset collection period, and further, the average resource occupation information of each data node in a preset time period can be determined. Then, if the average occupation information of the resources meets a first preset condition, the flow control parameters can be adjusted based on a first function; and if the average occupation information of the resources meets a second preset condition, adjusting the flow control parameters based on a second function.
Illustratively, the adjustment process of the flow control parameter may include the steps of:
1. CPU usage, memory usage, disk IO, network bandwidth occupancy percentage of each data node are collected every 1 minute and recorded into a configuration library.
2. Calculating the average use of resources: every 1 minute, the average CPU usage, the average memory usage, the average disk IO and the average network bandwidth occupation percentage over the previous 10 minutes are calculated.
3. If any of the following holds: average CPU usage = 100%, average memory usage > 90%, average disk IO > 90% of the maximum IO amount of the disk, or average network bandwidth occupation percentage > 90% (i.e., the first preset condition), the maximum number of data tables (maxTabNum) and the maximum number of records (maxTabCount) are adjusted. The adjustment formula (i.e., the first function) is: adjusted maxTabNum = maxTabNum × 80%, rounded down; adjusted maxTabCount = maxTabCount × 80%, rounded down. The adjusted parameters are recorded into the configuration library.
4. If any of the following holds: average CPU usage < 10%, average memory usage < 10%, average disk IO < 10% of the maximum IO amount of the disk, or average network bandwidth occupation percentage < 10% (i.e., the second preset condition), the maximum number of data tables (maxTabNum) and the maximum number of records (maxTabCount) are adjusted. The adjustment formula (i.e., the second function) is: adjusted maxTabNum = maxTabNum × 2; adjusted maxTabCount = maxTabCount × 2. The adjusted parameters are recorded into the configuration library.
5. If neither condition holds, the maximum number of data tables (maxTabNum) and the maximum number of records (maxTabCount) are not adjusted.
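Steps 3 to 5 above can be sketched as a single function. The function name, the dictionary keys and the representation of all averages as percentages are assumptions made for illustration. The text does not say what happens when both preset conditions hold at once (for example, CPU below 10% while memory is above 90%); this sketch checks the first condition first, which is also an assumption.

```python
def adjust_flow_control(avg, max_tab_num, max_tab_count):
    """Return the adjusted (maxTabNum, maxTabCount) pair.

    avg holds the averages over the previous 10 minutes as percentages;
    "disk_io" is the percentage of the disk's maximum IO amount.
    """
    first_condition = (
        avg["cpu"] >= 100
        or avg["memory"] > 90
        or avg["disk_io"] > 90
        or avg["network"] > 90
    )
    second_condition = (
        avg["cpu"] < 10
        or avg["memory"] < 10
        or avg["disk_io"] < 10
        or avg["network"] < 10
    )
    if first_condition:
        # First function: scale to 80% and round down (integer arithmetic).
        return max_tab_num * 80 // 100, max_tab_count * 80 // 100
    if second_condition:
        # Second function: double both parameters.
        return max_tab_num * 2, max_tab_count * 2
    return max_tab_num, max_tab_count  # no adjustment


busy = {"cpu": 100, "memory": 50, "disk_io": 50, "network": 50}
idle = {"cpu": 5, "memory": 50, "disk_io": 50, "network": 50}
steady = {"cpu": 50, "memory": 50, "disk_io": 50, "network": 50}

busy_result = adjust_flow_control(busy, 10, 1005)
idle_result = adjust_flow_control(idle, 10, 1000)
steady_result = adjust_flow_control(steady, 10, 1000)
```

Integer arithmetic (`* 80 // 100`) is used for the round-down so that the result does not depend on floating-point rounding.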
According to the technical scheme, the target synchronous job is divided into a plurality of synchronous tasks based on the synchronous configuration information corresponding to the source database, at least one synchronous task group is determined based on the synchronous tasks, a task group queue to be executed is created based on the synchronous task group, and then, for each synchronous task group in the task group queue to be executed, the synchronous query instruction corresponding to each synchronous task included in the current synchronous task group is called.
Example II
Fig. 2 is a flowchart of a data synchronization method according to a second embodiment of the present invention, where metadata information of a data table to be synchronized in a source database may be obtained and field type conversion may be performed on the metadata information, and further, a target data table corresponding to the data table to be synchronized may be created in a target database based on the converted metadata information. The specific implementation manner can be seen in the technical scheme of the embodiment. Wherein, the technical terms identical to or corresponding to the above embodiments are not repeated herein.
As shown in fig. 2, the method includes:
S210, for each data table to be synchronized in the source database, acquiring metadata information of the current data table to be synchronized.
The data table to be synchronized is understood to be a data table stored in the source database and to be subjected to data synchronization. The metadata information may be information characterizing a table structure of the data table. The metadata information may include a plurality of items of information associated with the table structure. Alternatively, the metadata information may include information of a source table field, a source table field type, a source table primary key, a source table partition, and the like.
In practical application, for each data table to be synchronized in the source database, metadata information corresponding to the current data table to be synchronized can be collected. Furthermore, metadata information corresponding to the current data table to be synchronized can be obtained.
S220, converting the source table field type in the metadata information into the target table field type based on a pre-established field mapping relation, and obtaining the converted metadata information.
The field mapping relationship may be used to indicate a correspondence between a source table field type and a target table field type. The source table field type may be understood as the field type of the data table in the source database. The source table field type may be matched to the database type of the source database. The target table field type may be understood as a field type of a data table in the target database. The target table field type may be matched to the database type of the target database.
In practical application, a plurality of databases may be predetermined, and the field types corresponding to each database may be obtained. It should be noted that these databases may include the source database and the target database, both of which may be determined based on the data synchronization requirement. Further, from the obtained database field types, the field type corresponding to the source database may be taken as the source table field type, and the field type corresponding to the target database may be taken as the target table field type. Then, a correspondence between the source table field type and the target table field type can be established to obtain the field mapping relation. Further, the field mapping relation may be stored in a configuration library, so that it can be called from the configuration library when performing field type conversion, and the conversion can be completed based on it.
In practical application, after obtaining metadata information corresponding to the current data table to be synchronized, a pre-established field mapping relation can be called from a configuration library. Furthermore, the source table field type in the metadata information can be converted into the target table field type based on the field mapping relation, and the converted metadata information can be obtained. Only the field type of the converted metadata information changes, and other information remains unchanged, that is, the converted metadata information differs from the unconverted metadata information in that the field type is different.
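The conversion described above can be sketched as a dictionary lookup. The type pairs in `FIELD_TYPE_MAPPING`, the metadata layout and the function name are hypothetical examples chosen for illustration; the text leaves the actual mapping to user configuration.

```python
# Hypothetical field mapping relation: source table field type -> target table field type.
FIELD_TYPE_MAPPING = {
    "VARCHAR2": "VARCHAR",
    "NUMBER": "DECIMAL",
    "DATE": "DATETIME",
}


def convert_metadata(metadata, mapping):
    """Return a copy of the metadata in which only the field types are converted."""
    converted_fields = []
    for field in metadata["fields"]:
        if field["type"] not in mapping:
            raise KeyError(f"no mapping configured for field type {field['type']!r}")
        converted_fields.append({**field, "type": mapping[field["type"]]})
    converted = dict(metadata)  # primary key, partition, etc. are carried over as-is
    converted["fields"] = converted_fields
    return converted


source_metadata = {
    "table": "orders",
    "fields": [
        {"name": "id", "type": "NUMBER"},
        {"name": "note", "type": "VARCHAR2"},
    ],
    "primary_key": ["id"],
}
converted_metadata = convert_metadata(source_metadata, FIELD_TYPE_MAPPING)
```

Raising on an unmapped type, rather than passing it through, is a design choice of this sketch: it surfaces a missing configuration entry before any table is created.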
S230, generating a target table-building script based on the converted metadata information, and creating a target data table corresponding to the current data table to be synchronized based on the target table-building script.
In this embodiment, the target table-building script may be understood as a script obtained by performing script conversion processing on the converted metadata information. The target table-building script may be a computer-executable file in which the table-building logic code is written in a certain format.
In practical application, after the converted metadata information is obtained, a script conversion mode for converting metadata information into a script may be invoked. Further, the converted metadata information may be converted into the target table-building script based on the invoked script conversion mode. Further, the target table-building script may be executed to create, in the target database, the target data table corresponding to the current data table to be synchronized.
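One possible form of the script conversion is to render the converted metadata as a `CREATE TABLE` statement. The sketch below assumes a hypothetical metadata layout (`table`, `fields`, `primary_key`) and executes the script against an in-memory SQLite database standing in for the target database.

```python
import sqlite3


def build_create_table_script(metadata):
    """Render converted metadata information as a CREATE TABLE statement."""
    columns = [f"{field['name']} {field['type']}" for field in metadata["fields"]]
    primary_key = metadata.get("primary_key")
    if primary_key:
        columns.append("PRIMARY KEY (" + ", ".join(primary_key) + ")")
    return f"CREATE TABLE {metadata['table']} (" + ", ".join(columns) + ")"


converted = {
    "table": "orders",
    "fields": [
        {"name": "id", "type": "INTEGER"},
        {"name": "note", "type": "TEXT"},
    ],
    "primary_key": ["id"],
}
script = build_create_table_script(converted)

# Execute the script to create the target data table.
target_db = sqlite3.connect(":memory:")
target_db.execute(script)
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
column_names = [row[1] for row in target_db.execute("PRAGMA table_info(orders)")]
```

A production version would also need to quote identifiers and emit partition clauses in the dialect of the target database, which this sketch omits.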
S240, dividing the target synchronous job into a plurality of synchronous tasks based on synchronous configuration information corresponding to the source database.
S250, determining at least one synchronous task group based on the synchronous tasks, and creating a task group queue to be executed based on the synchronous task group.
S260, for each synchronous task group in the task group queue to be executed, a synchronous query instruction corresponding to each synchronous task included in the current synchronous task group is called.
S270, for each synchronous task in the current synchronous task group, responding to a synchronous query instruction corresponding to the current synchronous task, extracting data information to be synchronized of a data table to be synchronized corresponding to the current synchronous task from a source end database, and synchronizing the data information to be synchronized into a target data table which is created in advance in a target end database and corresponds to the current synchronous task.
It should be noted that, the technical solution provided in this embodiment may be implemented based on a data synchronization system, and as illustrated in fig. 3, the data synchronization system may include a source database, a data synchronization module, a configuration management module, a traffic monitoring module, and a target database.
The data synchronization module is used for executing synchronization operation, collecting data information of a data table in the source end database and metadata information of the data table, and synchronizing the data information and the metadata information to the target end database;
The configuration management module is used for storing the field mapping relation and collecting the synchronization progress in the data synchronization process, and in practical application, the field mapping relation can be obtained and stored based on the field type mapping configuration input by the user and received by the front end interface in the configuration management module. During the data synchronization process, the user can view the data synchronization progress based on the front-end interface.
And the flow monitoring module is used for collecting the resource occupation information of each data node and automatically adjusting the flow control parameters.
According to the technical scheme, the target synchronous job is divided into a plurality of synchronous tasks based on the synchronous configuration information corresponding to the source database; at least one synchronous task group is determined based on the synchronous tasks, and a task group queue to be executed is created based on the synchronous task group. Then, for each synchronous task group in the task group queue to be executed, the synchronous query instruction corresponding to each synchronous task included in the current synchronous task group is called. Further, for each synchronous task in the current synchronous task group, in response to the synchronous query instruction corresponding to the current synchronous task, the data information to be synchronized of the data table to be synchronized corresponding to the current synchronous task is extracted from the source database and synchronized into the target data table that is created in advance in the target database and corresponds to the current synchronous task. This achieves automatic synchronization of metadata among multiple data sources in a heterogeneous environment, improves the efficiency and timeliness of data synchronization without manual intervention, and makes the field mapping relationship among multiple data sources in a heterogeneous environment configurable.
Example III
Fig. 4 is a schematic structural diagram of a data synchronization device according to a third embodiment of the present invention. As shown in fig. 4, the apparatus includes: job segmentation module 310, task group determination module 320, instruction invocation module 330, and data synchronization module 340.
The job segmentation module 310 is configured to segment a target synchronous job into a plurality of synchronous tasks based on synchronous configuration information corresponding to the source database; a task group determination module 320, configured to determine at least one synchronous task group based on the plurality of synchronous tasks, and create a task group queue to be executed based on the at least one synchronous task group; the instruction calling module 330 is configured to call, for each synchronous task group in the task group queue to be executed, a synchronous query instruction corresponding to each synchronous task included in the current synchronous task group; the data synchronization module 340 is configured to, for each synchronization task in the current synchronization task group, respond to a synchronization query instruction corresponding to the current synchronization task, extract data information to be synchronized of a data table to be synchronized corresponding to the current synchronization task from the source database, and synchronize the data information to be synchronized to a target data table corresponding to the current synchronization task, which is created in advance in a target database.
According to the technical scheme, the target synchronous job is divided into a plurality of synchronous tasks based on the synchronous configuration information corresponding to the source database, at least one synchronous task group is determined based on the synchronous tasks, a task group queue to be executed is created based on the synchronous task group, and then, for each synchronous task group in the task group queue to be executed, the synchronous query instruction corresponding to each synchronous task included in the current synchronous task group is called.
Optionally, the synchronization configuration information includes the number of data tables to be synchronized, and the job segmentation module 310 includes: and the task number determining unit and the job segmentation unit.
The task number determining unit is used for determining the task number of the synchronous task based on the number of the data tables to be synchronized corresponding to the source database;
and the job segmentation unit is used for segmenting the target synchronous job according to the task number of the synchronous tasks so as to obtain a plurality of synchronous tasks.
Optionally, the task group determination module 320 includes: a task group number determination unit and a task group determination unit.
The task group number determining unit is used for determining the task group number based on the preset job channel number and the preset group channel number;
and the task group determining unit is used for distributing the plurality of synchronous tasks to each task group according to the number of the task groups and the total number of the synchronous tasks so as to obtain at least one synchronous task group.
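The grouping performed by these two units can be illustrated as follows. This passage does not state how the number of task groups is derived from the two channel numbers, so the formula used here (job channels divided by channels per group, rounded up) and the round-robin distribution are assumptions made purely for illustration, as are the function and parameter names.

```python
import math


def group_tasks(tasks, job_channels, channels_per_group):
    """Distribute synchronous tasks over task groups.

    Assumption: group count = ceil(job_channels / channels_per_group),
    and tasks are assigned to groups in round-robin order.
    """
    if job_channels <= 0 or channels_per_group <= 0:
        raise ValueError("channel numbers must be positive")
    group_count = math.ceil(job_channels / channels_per_group)
    groups = [[] for _ in range(group_count)]
    for index, task in enumerate(tasks):
        groups[index % group_count].append(task)
    # Drop empty groups when there are fewer tasks than groups.
    return [group for group in groups if group]


tasks = [f"task-{i}" for i in range(7)]
task_groups = group_tasks(tasks, job_channels=6, channels_per_group=2)
```

Round-robin assignment keeps group sizes within one task of each other, so no single group becomes a bottleneck when tasks are of similar size.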
Optionally, the apparatus further includes: the system comprises a metadata information acquisition module, a field type conversion module and a target data table creation module.
The metadata information acquisition module is used for acquiring metadata information of the current data table to be synchronized for each data table to be synchronized in the source database, wherein the metadata information comprises a source table field type;
The field type conversion module is used for converting the source table field type in the metadata information into the target table field type based on a pre-established field mapping relation to obtain converted metadata information;
and the target data table creation module is used for generating a target table creation script based on the converted metadata information and creating a target data table corresponding to the current data table to be synchronized based on the target table creation script.
Optionally, the apparatus further includes: the resource availability determining module and the synchronous task group executing module.
The resource availability determining module is used for determining the resource availability corresponding to each data node associated with the current synchronous task group according to the pre-acquired flow control parameter before the synchronous query instruction corresponding to each synchronous task included in the current synchronous task group is called;
and the synchronous task group execution module is used for executing the current synchronous task group under the condition that the available amount of each resource meets the task execution standard corresponding to the current synchronous task group.
Optionally, the apparatus further includes: the system comprises a log acquisition module, a log analysis module and a data caching module.
The log acquisition module is used for acquiring a data source log corresponding to the source database according to a preset acquisition period under the condition that the target synchronous job is determined to be completed;
The log analysis module is used for analyzing the data source log and extracting data table information and data operation information in the data source log;
and the data caching module is used for caching the data table information and the data operation information into a preset message queue in a preset format.
Optionally, the apparatus further includes: the device comprises a data extraction module, a data analysis module and a data updating module.
The data extraction module is used for extracting the data table information and the data operation information from the preset message queue;
the data analysis module is used for analyzing the data table information and the data operation information to determine data editing information;
and the data updating module is used for determining a target data table corresponding to the data editing information in the target database and updating the data information in the target data table based on the data editing information, wherein the data editing information comprises at least one of data newly-added information, data modified information and data deleted information.
The data synchronization device provided by the embodiment of the invention can execute the data synchronization method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data synchronization method.
In some embodiments, the data synchronization method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the data synchronization method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data synchronization method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of data synchronization, comprising:
dividing a target synchronous job into a plurality of synchronous tasks based on synchronous configuration information corresponding to a source database;
determining at least one synchronous task group based on the plurality of synchronous tasks, and creating a task group queue to be executed based on the at least one synchronous task group;
for each synchronous task group in the task group queue to be executed, invoking a synchronous query instruction corresponding to each synchronous task included in the current synchronous task group;
for each synchronous task in the current synchronous task group, in response to the synchronous query instruction corresponding to the current synchronous task, extracting, from the source database, data information to be synchronized of a data table to be synchronized corresponding to the current synchronous task, and synchronizing the data information to be synchronized into a target data table that is created in advance in a target database and corresponds to the current synchronous task.
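The flow of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes one synchronous task per data table, a fixed group size, and in-memory SQLite databases standing in for the source and target databases. The names `split_job`, `group_tasks`, `run_job`, and `demo`, and the `accounts` and `branches` tables, are hypothetical.

```python
import sqlite3
from collections import deque


def split_job(config):
    """Split the job into one task per table named in the (hypothetical) config."""
    return [{"table": name, "query": f"SELECT * FROM {name}"} for name in config["tables"]]


def group_tasks(tasks, group_size):
    """Chunk the tasks into groups of at most group_size tasks (an assumed rule)."""
    return [tasks[i:i + group_size] for i in range(0, len(tasks), group_size)]


def run_job(source, target, config, group_size=2):
    """Drain the task group queue, copying each task's rows into the target."""
    pending = deque(group_tasks(split_job(config), group_size))
    synced = {}
    while pending:
        group = pending.popleft()
        for task in group:
            # The query stands in for the "synchronous query instruction".
            rows = source.execute(task["query"]).fetchall()
            if rows:
                marks = ", ".join("?" * len(rows[0]))
                target.executemany(
                    f"INSERT INTO {task['table']} VALUES ({marks})", rows
                )
            synced[task["table"]] = len(rows)
    target.commit()
    return synced


def demo():
    """Build two in-memory databases with pre-created target tables and sync them."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for db in (source, target):
        db.execute("CREATE TABLE accounts (id INTEGER, name TEXT)")
        db.execute("CREATE TABLE branches (id INTEGER, city TEXT)")
    source.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, "a"), (2, "b")])
    source.execute("INSERT INTO branches VALUES (?, ?)", (1, "x"))
    synced = run_job(source, target, {"tables": ["accounts", "branches"]})
    return synced, target
```

Calling `demo()` returns the per-table row counts together with the target connection, so the copied rows can be inspected. Table names are interpolated directly into SQL here, which is acceptable only because the sketch trusts its own configuration.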
2. The method according to claim 1, wherein the synchronous configuration information includes the number of data tables to be synchronized, and the dividing the target synchronous job into a plurality of synchronous tasks based on the synchronous configuration information corresponding to the source database includes:
determining the number of synchronous tasks based on the number of data tables to be synchronized corresponding to the source database;
and splitting the target synchronous job according to the number of synchronous tasks to obtain the plurality of synchronous tasks.
3. The method of claim 1, wherein said determining at least one synchronous task group based on said plurality of synchronous tasks comprises:
determining the number of task groups based on the preset number of job channels and the preset number of group channels;
and distributing the plurality of synchronous tasks to each task group according to the number of the task groups and the total number of the synchronous tasks so as to obtain at least one synchronous task group.
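One possible reading of claim 3 is sketched below. The claim does not fix the arithmetic, so the sketch assumes that the number of task groups is the preset number of job channels divided by the preset number of group channels, rounded up, and that tasks are distributed round-robin. Both rules and the function names are assumptions made for illustration.

```python
import math


def task_group_count(job_channels, group_channels):
    """Assumed rule: ceil(job channels / channels per group), at least one group."""
    if job_channels <= 0 or group_channels <= 0:
        raise ValueError("channel counts must be positive")
    return max(1, math.ceil(job_channels / group_channels))


def assign_tasks(tasks, group_count):
    """Distribute tasks round-robin and drop groups that received no task."""
    if group_count <= 0:
        raise ValueError("group_count must be positive")
    groups = [[] for _ in range(group_count)]
    for index, task in enumerate(tasks):
        groups[index % group_count].append(task)
    return [group for group in groups if group]
```

For example, 10 job channels with 4 channels per group yield 3 groups under this rule, and 7 tasks are then spread across those 3 groups as 3, 2, and 2 tasks.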
4. The method as recited in claim 1, further comprising:
for each data table to be synchronized in a source database, acquiring metadata information of the current data table to be synchronized, wherein the metadata information comprises a source table field type;
converting the source table field type in the metadata information into a target table field type based on a pre-established field mapping relation to obtain converted metadata information;
and generating a target table-building script based on the converted metadata information, and creating a target data table corresponding to the current data table to be synchronized based on the target table-building script.
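The type conversion and script generation of claim 4 can be sketched as follows. The mapping entries, the metadata layout, and the function names are hypothetical; a real field mapping relation would depend on the specific source and target database products, which the claim does not name.

```python
# Hypothetical field mapping relation from source types to target types.
FIELD_MAPPING = {"VARCHAR2": "VARCHAR", "NUMBER": "DECIMAL", "DATE": "TIMESTAMP"}


def convert_metadata(metadata, mapping=FIELD_MAPPING):
    """Replace each source table field type with its mapped target type."""
    converted = []
    for column in metadata["columns"]:
        source_type = column["type"].upper()
        if source_type not in mapping:
            raise KeyError(f"no mapping for source type {source_type}")
        converted.append({"name": column["name"], "type": mapping[source_type]})
    return {"table": metadata["table"], "columns": converted}


def build_create_table(converted):
    """Generate the target table-building script from converted metadata."""
    columns = ", ".join(f"{c['name']} {c['type']}" for c in converted["columns"])
    return f"CREATE TABLE IF NOT EXISTS {converted['table']} ({columns})"
```

Raising on an unmapped type is a design choice of the sketch: it stops table creation rather than guessing a target type.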
5. The method of claim 1, further comprising, prior to said invoking the synchronous query instruction corresponding to each synchronous task included in the current synchronous task group:
determining available resource information corresponding to each data node associated with the current synchronous task group according to the pre-acquired flow control parameters;
and executing the current synchronous task group under the condition that each piece of available resource information meets the task execution standard corresponding to the current synchronous task group.
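The admission check of claim 5 can be sketched under stated assumptions: the flow control parameters are treated as per-node limits, the available resource information is the limit minus current usage, and the task execution standard is a minimum amount of each resource. The resource names (`connections`, `bandwidth_mbps`) and function names are hypothetical.

```python
def available_resources(node_usage, flow_control):
    """Per node, available = flow-control limit minus current usage (assumed rule)."""
    return {
        node: {key: limit - used.get(key, 0) for key, limit in flow_control.items()}
        for node, used in node_usage.items()
    }


def can_execute(available, requirement):
    """True only if every data node meets every requirement of the task group."""
    return all(
        resources.get(key, 0) >= needed
        for resources in available.values()
        for key, needed in requirement.items()
    )
```

A caller would run the current synchronous task group only when `can_execute` returns `True`, and otherwise leave the group in the queue.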
6. The method as recited in claim 1, further comprising:
upon determining that the target synchronous job is completed, collecting a data source log corresponding to the source database according to a preset collection period;
analyzing the data source log, and extracting data table information and data operation information in the data source log;
and caching the data table information and the data operation information into a preset message queue in a preset format.
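The log parsing and caching steps of claim 6 can be sketched as follows. The log line format (`<OPERATION> <table> <JSON row>`), the JSON message format, and the use of an in-process `queue.Queue` in place of a message queue service are all assumptions; periodic collection is left to the caller.

```python
import json
import queue


def parse_log_line(line):
    """Extract data table information and data operation information from one line.

    Assumes the hypothetical format "<OPERATION> <table> <JSON row>".
    """
    operation, table, payload = line.split(" ", 2)
    return {"table": table, "op": operation.upper(), "row": json.loads(payload)}


def collect(log_lines, message_queue):
    """Parse non-empty log lines and cache them as JSON messages; return the count."""
    count = 0
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        message_queue.put(json.dumps(parse_log_line(line), sort_keys=True))
        count += 1
    return count
```

A scheduler could call `collect` once per collection period with the log lines written since the previous call.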
7. The method as recited in claim 6, further comprising:
retrieving the data table information and the data operation information from the preset message queue;
analyzing the data table information and the data operation information to determine data editing information;
determining a target data table corresponding to the data editing information in the target database, and updating data information in the target data table based on the data editing information, wherein the data editing information comprises at least one of data insertion information, data modification information, and data deletion information.
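The update step of claim 7 can be sketched as follows, assuming each queued message is a JSON object with `table`, `op`, and `row` fields and that every table has an `id` primary key. The message layout, the `id` convention, and the function names are hypothetical, and an in-memory SQLite database stands in for the target database.

```python
import json
import sqlite3


def apply_message(target, message):
    """Apply one insert, update, or delete message to the target data table."""
    event = json.loads(message)
    table, op, row = event["table"], event["op"], event["row"]
    # Table and column names are trusted in this sketch; values are bound.
    if op == "INSERT":
        columns = ", ".join(row)
        marks = ", ".join("?" * len(row))
        target.execute(
            f"INSERT INTO {table} ({columns}) VALUES ({marks})", list(row.values())
        )
    elif op == "UPDATE":
        assignments = ", ".join(f"{c} = ?" for c in row if c != "id")
        values = [v for c, v in row.items() if c != "id"] + [row["id"]]
        target.execute(f"UPDATE {table} SET {assignments} WHERE id = ?", values)
    elif op == "DELETE":
        target.execute(f"DELETE FROM {table} WHERE id = ?", (row["id"],))
    else:
        raise ValueError(f"unsupported operation: {op}")


def replay(messages):
    """Apply messages in order to a fresh target table and return its rows."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
    for message in messages:
        apply_message(target, message)
    target.commit()
    return target.execute("SELECT id, name FROM accounts ORDER BY id").fetchall()
```

Messages are applied strictly in queue order, since an update or delete is only meaningful after the insert it refers to.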
8. A data synchronization device, comprising:
the job segmentation module is used for segmenting the target synchronous job into a plurality of synchronous tasks based on the synchronous configuration information corresponding to the source database;
a task group determining module, configured to determine at least one synchronous task group based on the plurality of synchronous tasks, and create a task group queue to be executed based on the at least one synchronous task group;
an instruction calling module, configured to, for each synchronous task group in the task group queue to be executed, invoke a synchronous query instruction corresponding to each synchronous task included in the current synchronous task group;
a data synchronization module, configured to, for each synchronous task in the current synchronous task group, in response to the synchronous query instruction corresponding to the current synchronous task, extract, from the source database, data information to be synchronized of a data table to be synchronized corresponding to the current synchronous task, and synchronize the data information to be synchronized into a target data table that is created in advance in a target database and corresponds to the current synchronous task.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data synchronization method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the data synchronization method of any one of claims 1-7.
CN202311659172.5A 2023-12-05 2023-12-05 Data synchronization method, device, electronic equipment and storage medium Pending CN117633116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311659172.5A CN117633116A (en) 2023-12-05 2023-12-05 Data synchronization method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117633116A (en) 2024-03-01

Family

ID=90021324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311659172.5A Pending CN117633116A (en) 2023-12-05 2023-12-05 Data synchronization method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117633116A (en)

Similar Documents

Publication Publication Date Title
CN110147470B (en) Cross-machine-room data comparison system and method
CN114223189A (en) Duration statistical method and device, electronic equipment and computer readable medium
CN115291806A (en) Processing method, processing device, electronic equipment and storage medium
CN115408546A (en) Time sequence data management method, device, equipment and storage medium
CN115146000A (en) Database data synchronization method and device, electronic equipment and storage medium
CN112925811B (en) Method, apparatus, device, storage medium and program product for data processing
CN112433757A (en) Method and device for determining interface calling relationship
CN116383207A (en) Data tag management method and device, electronic equipment and storage medium
CN115905322A (en) Service processing method and device, electronic equipment and storage medium
CN115438007A (en) File merging method and device, electronic equipment and medium
CN117633116A (en) Data synchronization method, device, electronic equipment and storage medium
CN115454971A (en) Data migration method and device, electronic equipment and storage medium
CN112749204B (en) Method and device for reading data
CN114610719A (en) Cross-cluster data processing method and device, electronic equipment and storage medium
CN113722141A (en) Method and device for determining delay reason of data task, electronic equipment and medium
CN116431698B (en) Data extraction method, device, equipment and storage medium
CN113688159B (en) Data extraction method and device
CN111459981A (en) Query task processing method, device, server and system
CN115599806A (en) Method and device for inquiring presence, electronic equipment and storage medium
CN117667942A (en) Data synchronous integration method and device, electronic equipment and storage medium
CN115599863A (en) Bank data synchronization method and device based on Hudi, electronic equipment and medium
CN114969009A (en) Rainfall data processing system, rainfall data processing method, electronic device, and storage medium
CN117931805A (en) Data processing method and device, electronic equipment and storage medium
CN115730000A (en) Medical data integration method, device, equipment and medium based on data lake
CN117950850A (en) Data transmission method, device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination