CN111258742A - Data synchronization method, system, computing device and storage medium

Publication number
CN111258742A
Authority
CN
China
Prior art keywords
task
data
synchronization
data synchronization
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010095675.4A
Other languages
Chinese (zh)
Other versions
CN111258742B (en)
Inventor
郑永升
石磊
汤昭荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yitu Medical Technology Co ltd
Original Assignee
Hangzhou Yitu Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yitu Medical Technology Co ltd
Priority to CN202010095675.4A (patent CN111258742B)
Publication of CN111258742A
Application granted
Publication of CN111258742B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data synchronization method comprising the following steps: encapsulating a custom task type in Azkaban; configuring synchronization information of data, the synchronization information comprising an original data source, a data name and a target data source; generating a scheduling task according to the synchronization information and the custom task type; and executing the scheduling task to complete data synchronization. The disclosed method enables distributed computation for data synchronization, is efficient and stable, and is suitable for high-throughput, high-concurrency data scenarios. The invention also provides a system, a computing device and a storage medium for data synchronization.

Description

Data synchronization method, system, computing device and storage medium
Technical Field
The present invention relates to the field of big data processing, and in particular, to a method, a system, a computing device, and a storage medium for data synchronization.
Background
In the current big data era, data is massive, fast-growing and diverse. In practical big data processing, a data change in one system often has to be synchronized promptly to another system, or a data change in one database to another database; this is data synchronization. Existing data synchronization methods, such as data synchronization using DataX, often suffer from insufficient single-machine memory, insufficient CPU processing performance and insufficient network throughput, and therefore cannot be applied effectively to high-throughput, high-concurrency data scenarios.
Therefore, a data synchronization method that supports distributed computation, is efficient and stable, and suits high-throughput, high-concurrency data scenarios is urgently needed.
Disclosure of Invention
The invention aims to provide a data synchronization method to solve the problem of insufficient network throughput capacity during data synchronization in the prior art.
In order to solve the above technical problem, an embodiment of the present invention discloses a data synchronization method, which includes the following steps: encapsulating a custom task type in Azkaban; configuring synchronization information of data, the synchronization information comprising an original data source, a data name and a target data source; generating a scheduling task according to the synchronization information and the custom task type; and executing the scheduling task to complete data synchronization.
By adopting the technical scheme, distributed computation of data synchronization can be realized, the efficiency is high, the stability is high, and the method is suitable for high-throughput and large-concurrency data scenes.
Optionally, the custom task type is a Kubernetes task, and the data synchronization method further includes the following step: encapsulating a synchronization container in Kubernetes.
Optionally, the step of executing the scheduling task to complete data synchronization includes: calling Kubernetes to generate a Kubernetes Job task corresponding to the scheduling task; and executing the Kubernetes Job task to complete data synchronization.
Optionally, the synchronization container is a DataX synchronization container, and the step of executing the Kubernetes Job task to complete data synchronization includes: acquiring an original data table and the meta information corresponding to the original data table according to the synchronization information; generating configuration information according to the original data table and the meta information; and calling DataX to read the configuration information to complete data synchronization.
Optionally, the synchronization information further comprises speed limit information.
Optionally, the method of data synchronization further comprises the following steps: when the scheduled task fails to execute, the scheduled task is retried.
An embodiment of the invention also discloses a system for data synchronization, which comprises: a configuration module for configuring synchronization information of data, the synchronization information comprising an original data source, a data name and a target data source; an Azkaban module comprising a first encapsulation unit, a scheduling unit and an execution unit, wherein the first encapsulation unit is used for encapsulating a custom task type, the scheduling unit is used for generating a scheduling task, and the execution unit is used for executing the scheduling task; and a custom module that corresponds to the custom task type and is called by the execution unit to execute the scheduling task.
The data synchronization system adopting the technical scheme can realize distributed calculation, high efficiency and stability of data synchronization, and is suitable for high-throughput and large-concurrency data scenes.
Optionally, the system further includes a DataX module; the custom module is a Kubernetes module and the custom task type is a Kubernetes task; the execution unit includes a Kubernetes interface used for calling the Kubernetes module to generate a Kubernetes Job task; the Kubernetes module includes a second encapsulation unit used for encapsulating a DataX synchronization container; and the DataX module is called by the Kubernetes module to complete data synchronization.
Optionally, the synchronization information further comprises speed limit information.
Optionally, the Azkaban module further includes a monitoring unit, and the monitoring unit is configured to monitor a completion condition of the scheduling task and send a signal to the execution unit.
An embodiment of the invention also discloses a computing device, which comprises: a processor adapted to implement various instructions; and a memory adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute any of the aforementioned data synchronization methods.
The computing equipment adopting the technical scheme can realize distributed computation of data synchronization, is efficient and stable when in use, and is suitable for high-throughput and large-concurrency data scenes.
The embodiment of the invention also discloses a storage medium, wherein a plurality of instructions are stored in the storage medium, and the instructions are suitable for being loaded by a processor and executing any one of the data synchronization methods.
By adopting the storage medium of the technical scheme, distributed computation of data synchronization can be realized, the efficiency is high, the stability is high, and the method is suitable for high-throughput and large-concurrency data scenes.
Drawings
FIG. 1 illustrates a flow diagram of a method of data synchronization in accordance with an embodiment of the present invention;
FIG. 2 shows a flow diagram of a method of data synchronization of a further embodiment of the present invention;
FIG. 3 is a flowchart of step S4 in an embodiment of the present invention;
FIG. 4 is a flowchart illustrating step S42 according to an embodiment of the present invention;
FIG. 5 shows a flow diagram of a method of data synchronization of another embodiment of the present invention;
FIG. 6 shows a schematic block diagram of a system for data synchronization of an embodiment of the present invention;
FIG. 7 shows a schematic block diagram of a system for data synchronization of a further embodiment of the present invention;
FIG. 8 shows a schematic block diagram of an Azkaban module according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure. While the invention is described in conjunction with the preferred embodiments, its features are not limited to these embodiments. On the contrary, the invention is described in connection with the embodiments in order to cover alternatives or modifications that may be derived from the claims of the present invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may be practiced without these details. Moreover, some specific details have been omitted from the description in order to avoid obscuring the focus of the present invention. It should be noted that the embodiments and the features of the embodiments may be combined with each other where no conflict arises.
It should be noted that in this specification, like reference numerals and letters refer to like items in the following drawings, and thus, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
The terms "first," "second," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Referring to FIG. 1, an embodiment of the present invention discloses a data synchronization method including the following steps. S1: encapsulating a custom task type in Azkaban; S2: configuring synchronization information of data, the synchronization information including an original data source, a data name and a target data source; S3: generating a scheduling task according to the synchronization information and the custom task type; S4: executing the scheduling task to complete data synchronization.
In S1, because Azkaban is highly compatible, the user can encapsulate custom task types in Azkaban according to the data synchronization requirements, for example differences in the original data source or in the software being used. For example, when the user side mostly uses Hadoop, a custom task type corresponding to Hadoop can be encapsulated in Azkaban; in a subsequent data synchronization process, when the Hadoop task is selected as the custom task type, Hadoop can be called to perform data synchronization. Likewise, when the user side mostly uses Kubernetes, a corresponding Kubernetes task can be encapsulated in Azkaban; in a subsequent data synchronization process, when the Kubernetes task is selected as the custom task type, Kubernetes can be called to perform data synchronization. The invention does not limit the number or specific content of the custom task types; they can be chosen according to actual needs, provided that the corresponding interface in Azkaban is used and the corresponding module is called. It can be understood that S1 need not be performed in every data synchronization process: once a custom task type has been encapsulated, it can be called in subsequent data synchronization processes, and custom task types can be added, deleted or modified when necessary.
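The idea of encapsulating a task type once and reusing it later can be sketched as a small registry. This is only an illustrative model in Python, not Azkaban's plugin interface; the names `register_task_type`, `unregister_task_type`, `run_task` and the two handlers are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a custom task type name to the handler that runs it.
# It models the idea of S1 only; it is not Azkaban's job type plugin API.
TaskHandler = Callable[[dict], str]
_TASK_TYPES: Dict[str, TaskHandler] = {}


def register_task_type(name: str, handler: TaskHandler) -> None:
    """Encapsulate (register) a custom task type once; later runs reuse it."""
    _TASK_TYPES[name] = handler


def unregister_task_type(name: str) -> None:
    """Delete a custom task type that is no longer needed."""
    _TASK_TYPES.pop(name, None)


def run_task(task_type: str, sync_info: dict) -> str:
    """Dispatch a synchronization request to the handler of its task type."""
    if task_type not in _TASK_TYPES:
        raise KeyError(f"unknown custom task type: {task_type}")
    return _TASK_TYPES[task_type](sync_info)


def kubernetes_handler(sync_info: dict) -> str:
    # Placeholder for "call Kubernetes to perform the synchronization".
    return f"kubernetes:{sync_info['data_name']}"


def hadoop_handler(sync_info: dict) -> str:
    # Placeholder for "call Hadoop to perform the synchronization".
    return f"hadoop:{sync_info['data_name']}"


# Encapsulation happens once, before any synchronization is requested.
register_task_type("kubernetes", kubernetes_handler)
register_task_type("hadoop", hadoop_handler)
```

Registration is separate from dispatch, which mirrors the statement that S1 does not have to be repeated for every synchronization.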
In S2, the user can configure the synchronization information through the UI of the Web end according to the data synchronization requirements, for example the original data source, data name and target data source to be synchronized, the synchronization time, the synchronization speed and the custom task type. The synchronization information must include at least an original data source, a data name and a target data source, and may be entered as text or selected by ticking options, which is not limited in the present invention. It can be understood that, according to the selected "original data source" and "target data source", the meta information of the corresponding data source, such as JDBC (Java Database Connectivity) information, can be obtained, which facilitates the subsequent generation of the scheduling task. It can also be understood that a data synchronization process involves multiple data sources; one data source may include one or more databases, and one database may include multiple data tables, so the data name may be the name of a data table or the name of a database. For example, in a medical big data scenario, when a data table tableA in "nuclear medicine" needs to be synchronized into "electronic medical record", the original data source is "nuclear medicine", the data name is the table name "tableA", and the target data source is "electronic medical record"; other settings can be selected as needed. Each original data source and target data source has its own type information, such as MySQL, Oracle or SqlServer. When multiple custom task types have been encapsulated in Azkaban, the synchronization information can also include the custom task type, so that the user can select one as required.
After configuration is completed, the synchronization information is submitted to Azkaban through the UI to start the subsequent data synchronization process. Unlike Sqoop and DataX, which provide only a command-line interface, Azkaban provides a complete Web interface that helps the user define and submit tasks and jump directly to the task execution page to check task status and execution logs, which makes operation convenient and the user experience friendlier.
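The synchronization information described above can be modelled as a small validated record. The sketch below is an assumption about one possible shape, with the three mandatory fields and the optional ones named in the text; the class name `SyncInfo` and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncInfo:
    """Synchronization information configured in S2 (hypothetical field names)."""
    source: str                             # original data source, e.g. "nuclear medicine"
    data_name: str                          # table name or database name, e.g. "tableA"
    target: str                             # target data source, e.g. "electronic medical record"
    task_type: Optional[str] = None         # optional: selected custom task type
    speed_limit_mb_s: Optional[int] = None  # optional: speed limit in MB/s
    sync_time: Optional[str] = None         # optional: schedule, format left to the scheduler

    def __post_init__(self) -> None:
        # The three mandatory items must be non-empty strings.
        for field_name in ("source", "data_name", "target"):
            value = getattr(self, field_name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{field_name} is required")
        if self.speed_limit_mb_s is not None and self.speed_limit_mb_s <= 0:
            raise ValueError("speed_limit_mb_s must be positive")


# The medical example from the text.
info = SyncInfo(
    source="nuclear medicine",
    data_name="tableA",
    target="electronic medical record",
    task_type="kubernetes",
)
```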
In S3, Azkaban generates a corresponding scheduling task according to the synchronization information and the custom task type. For example, when there are multiple data synchronization tasks, the dependencies between the tasks can be derived from the synchronization information configured for each task, and the corresponding scheduling task is configured based on those dependencies. When only one custom task type has been encapsulated, Azkaban generates a scheduling task containing that custom task type; when several custom task types have been encapsulated, Azkaban generates the scheduling task according to the custom task type selected in the synchronization information configured by the user.
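As a rough illustration of S3, the sketch below renders one job definition per synchronization task and derives an execution order from the dependencies. Azkaban job files are key=value properties in which `type` selects the job type and `dependencies` lists upstream jobs; the job type name `kubernetes`, the `sync.*` keys and the function names are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List


def render_job(name: str, task_type: str, params: Dict[str, str],
               dependencies: List[str]) -> str:
    """Render a properties-style job definition for one synchronization task."""
    lines = [f"# {name}.job", f"type={task_type}"]
    if dependencies:
        lines.append("dependencies=" + ",".join(dependencies))
    # The sync.* parameter keys are hypothetical and sorted for stable output.
    lines += [f"{key}={value}" for key, value in sorted(params.items())]
    return "\n".join(lines)


def execution_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return task names so that every task comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


# Two synchronization tasks; the second must wait for the first.
deps = {"sync_tableA": [], "sync_tableB": ["sync_tableA"]}
jobs = {
    name: render_job(
        name,
        "kubernetes",
        {"sync.source": "nuclear medicine", "sync.target": "electronic medical record"},
        upstream,
    )
    for name, upstream in deps.items()
}
order = execution_order(deps)
```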
In S4, Azkaban executes the scheduling task generated in S3 to complete data synchronization. Azkaban can generate Job tasks corresponding to the custom task type according to the scheduling task and execute all of them; during execution it calls the custom module corresponding to the custom task type and uses that module to complete data synchronization. Azkaban offers three deployment modes: a single-server mode for trial use, a two-server mode for production environments, and a distributed multiple-executor mode. It can therefore be deployed in different modes according to the scale of the user side and the number of tasks to be synchronized, which gives high flexibility; Azkaban also has strong scheduling management capabilities and can execute tasks concurrently. Therefore, based on the synchronization information of the data and the custom task type, Azkaban enables distributed computation for data synchronization, is efficient and stable, and is suitable for high-throughput, high-concurrency data scenarios.
By adopting the technical scheme, distributed computation of data synchronization can be realized, the efficiency is high, the stability is high, and the method is suitable for high-throughput and large-concurrency data scenes.
In another embodiment of the present invention, the custom task type is a Kubernetes task, and the data synchronization method further includes the following step, S5: encapsulating a synchronization container in Kubernetes. Kubernetes has become the de facto standard for container orchestration systems. More and more enterprises are adopting containers and forming clusters through a container orchestration system, so as to make the most of the containers' good isolation, resource allocation and orchestration management. Kubernetes can be applied to big data scenarios such as online microservice environments and offline computation, and makes it convenient to build a big data technology stack. In Kubernetes, different synchronization containers, such as DataX, Sqoop or Kettle, can be encapsulated according to different data synchronization requirements, and the Azkaban executors can be deployed in a distributed (cluster) manner, which solves the problem that the executors become a performance bottleneck because Azkaban tasks can only run on fixed executors.
In still another embodiment of the present invention, step S4 of executing the scheduling task to complete data synchronization includes, S41: calling Kubernetes to generate a Kubernetes Job task corresponding to the scheduling task; S42: executing the Kubernetes Job task to complete data synchronization. When the custom task type used in the data synchronization is a Kubernetes task, it can be understood that the scheduling task generated in S3 records that the custom task type is a Kubernetes task. Therefore, while executing the scheduling task, Azkaban can use the previously encapsulated Kubernetes interface to call Kubernetes and generate the corresponding Kubernetes Job task. The Kubernetes Job task is then executed, and the synchronization container encapsulated in Kubernetes is called to complete data synchronization. Completing data synchronization with a synchronization container encapsulated in Kubernetes makes the most of the container's good isolation, resource allocation and orchestration management, and improves the compatibility and efficiency of distributed data synchronization.
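Step S41 can be pictured as building a Kubernetes Job object for each scheduling task. The sketch below only constructs the manifest as a Python dictionary using standard `batch/v1` Job fields; submitting it to a cluster is left out. The image name, the argument path and the helper names are hypothetical.

```python
import re
from typing import List


def k8s_name(raw: str) -> str:
    """Derive a DNS-label-style name: lowercase alphanumerics and '-', at most 63 chars."""
    name = re.sub(r"[^a-z0-9-]+", "-", raw.lower()).strip("-")
    return name[:63].strip("-") or "sync-job"


def build_job_manifest(task_name: str, image: str, args: List[str],
                       retries: int = 2) -> dict:
    """Build a batch/v1 Job that runs one synchronization container to completion."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": k8s_name(task_name)},
        "spec": {
            "backoffLimit": retries,  # how often Kubernetes retries a failed pod
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "sync", "image": image, "args": args}
                    ],
                }
            },
        },
    }


# Hypothetical image and config path for a DataX synchronization container.
manifest = build_job_manifest(
    "Sync_TableA",
    "registry.example.invalid/datax-sync:latest",
    ["/config/job.json"],
)
```

Running the synchronization as a Job rather than inside the executor process is what lets the work land on any node of the cluster, which matches the motivation given for S5.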
In another embodiment of the present invention, the synchronization container is a DataX synchronization container, and step S42 of executing the Kubernetes Job task to complete data synchronization includes, S421: acquiring an original data table and the meta information corresponding to the original data table according to the synchronization information; S422: generating configuration information according to the original data table and the meta information; S423: calling DataX to read the configuration information to complete data synchronization. It can be understood that in S2 the data name may be the name of a database or the name of a data table, while in the actual synchronization process the data table is the basic synchronization unit. When the data name is a database name, all original data tables in that database and their meta information are acquired for synchronization; when the data name is a data table name, the original data table with that name and its meta information are acquired. For example, the original data source is "nuclear medicine" of type SqlServer, the data name is tableA, and the target data source is "electronic medical record" of type Oracle. That is, the data table tableA in "nuclear medicine" needs to be synchronized to "electronic medical record" to generate a corresponding tableB. First, the data table tableA is determined from the original data source and the data name in the synchronization information, and the original data table tableA and its meta information, including the structure information, type information and JDBC information of tableA, are acquired. The configuration information is then generated by combining the synchronization information with the structure information, type information and JDBC information of the target data source "electronic medical record".
The configuration information includes the JDBC information of the data sources on both sides, the fields to be synchronized, and the corresponding types on both sides. DataX is then called to acquire and identify the table structure of tableA, to map and convert the structure and types according to tableA and the configuration information, to create the table in the target data source "electronic medical record", and to store the data in Oracle, which completes data synchronization. The present invention takes the synchronization of one data table as an example, but it can be understood that multiple data tables or databases can be synchronized in practice, and the present invention is not limited in this respect. DataX implements a JVM-based synchronization core entirely in Java. Through plug-ins it supports efficient data synchronization between heterogeneous data sources such as MySQL, Oracle, SqlServer, PostgreSQL, HDFS, Hive, ADS and HBase, and it supports storage formats such as ORC and Textfile; an upper-layer service needs only a very simple JDBC configuration to connect to and use the system. Therefore, using DataX as the synchronization container provides good compatibility and simple configuration, and improves the efficiency of data synchronization between heterogeneous original data sources.
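Steps S421 to S423 amount to turning the collected table and JDBC information into a configuration that DataX can read. The sketch below builds such a configuration in the general shape of a DataX job file (a `job` object with `setting` and one `content` entry holding a reader and a writer). The host names, credentials placeholders, column list and the function name are hypothetical, and the exact keys accepted by each plug-in should be checked against the DataX plug-in documentation.

```python
import json
from typing import List


def build_datax_config(source_jdbc: str, target_jdbc: str,
                       source_table: str, target_table: str,
                       columns: List[str],
                       reader: str = "sqlserverreader",
                       writer: str = "oraclewriter",
                       channels: int = 1) -> dict:
    """Assemble a DataX-style job configuration for one table (illustrative only)."""
    return {
        "job": {
            "setting": {"speed": {"channel": channels}},
            "content": [
                {
                    "reader": {
                        "name": reader,
                        "parameter": {
                            "username": "<source-user>",      # placeholder
                            "password": "<source-password>",  # placeholder
                            "column": columns,
                            "connection": [
                                {"table": [source_table], "jdbcUrl": [source_jdbc]}
                            ],
                        },
                    },
                    "writer": {
                        "name": writer,
                        "parameter": {
                            "username": "<target-user>",      # placeholder
                            "password": "<target-password>",  # placeholder
                            "column": columns,
                            "connection": [
                                {"table": [target_table], "jdbcUrl": target_jdbc}
                            ],
                        },
                    },
                }
            ],
        }
    }


# The tableA -> tableB example from the text, with made-up hosts and columns.
config = build_datax_config(
    source_jdbc="jdbc:sqlserver://source-host:1433;DatabaseName=nuclear_medicine",
    target_jdbc="jdbc:oracle:thin:@target-host:1521:emr",
    source_table="tableA",
    target_table="tableB",
    columns=["id", "patient_id", "exam_time"],
)
config_text = json.dumps(config, indent=2)
```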
In a further embodiment of the invention, the synchronization information further comprises speed limit information. The speed limit information may be entered or selected when the synchronization information is configured in S2, and the configuration information used during data synchronization in S4 then also includes the corresponding speed limit information. The speed limit information is a threshold on the speed of data synchronization while a subsequent scheduling task performs data synchronization; it is usually an upper limit, measured as the amount of data synchronized per second in megabytes per second (MB/s). For example, when the speed limit information is set to 50 MB/s, the speed of the subsequent data synchronization does not exceed 50 MB/s. Setting the speed limit information thus limits the speed of data synchronization. The specific value can be set according to the hardware: if the hardware performance of the database server hosting the original data source is low, lowering the upper limit is recommended; the invention does not limit this. Optionally, the speed limit information is 50 MB/s to 300 MB/s. Within this range the data synchronization speed can be guaranteed and the synchronization efficiency improved, while damage to equipment hardware caused by an excessive speed is prevented; at 50 MB/s, operation remains stable and smooth even for an original data source with poor hardware. Optionally, the synchronization information may further include a synchronization time, in which case Azkaban can perform data synchronization periodically at the set time points, which facilitates better task scheduling.
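The arithmetic behind the speed limit can be sketched as follows. The helper names are hypothetical, treating the optional 50 MB/s to 300 MB/s range as a hard validation rule is an assumption, and 1 MB is taken as 1024 × 1024 bytes. The byte value could, for instance, be placed in a DataX-style `job.setting.speed.byte` entry.

```python
MIN_MB_S = 50                 # lower end of the optional range in the text
MAX_MB_S = 300                # upper end of the optional range in the text
BYTES_PER_MB = 1024 * 1024    # assumption: binary megabytes


def speed_limit_bytes(limit_mb_s: int) -> int:
    """Validate a speed limit in MB/s and convert it to bytes per second."""
    if not MIN_MB_S <= limit_mb_s <= MAX_MB_S:
        raise ValueError(
            f"speed limit must be between {MIN_MB_S} and {MAX_MB_S} MB/s"
        )
    return limit_mb_s * BYTES_PER_MB


def min_sync_seconds(data_size_mb: float, limit_mb_s: int) -> float:
    """Lower bound on transfer time: the cap cannot be exceeded."""
    return data_size_mb / limit_mb_s
```

For example, a 3000 MB table capped at 50 MB/s needs at least 3000 / 50 = 60 seconds, so a lower cap protects a weak source server at the cost of a longer synchronization window.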
Optionally, the information of the original data source is editable. For example, when the original data source selected in S2 is "nuclear medicine", the user can edit its name ("nuclear medicine"), its database type (for example "SqlServer"), the corresponding database port number and other information, which makes management convenient. The information may also include a user name and a user password for authentication, so as to prevent data leakage and improve data confidentiality.
In another embodiment of the present invention, the data synchronization method further includes the following step, S6: when execution of the scheduling task fails, retrying the scheduling task. The invention refines and enriches the detection of error conditions and the disaster recovery mechanism, and adds corresponding processing logic for various error conditions, so that when a failed scheduling task is detected, Azkaban is notified to retry and the scheduling task is executed again. For example, whether the scheduling task has executed successfully can be judged by whether the target table corresponding to the scheduling task has been generated. When it is detected that the target table has not been generated, Azkaban is notified to retry; if the target table is still not generated after the retry, the retry can be repeated, or a corresponding error prompt can pop up in the UI so that the user can handle the problem. Retrying failed scheduling tasks improves the success rate of data synchronization.
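The retry logic of S6 can be sketched as a bounded loop whose success criterion is the existence of the target table. This is a minimal sketch under stated assumptions: `run_with_retry`, `execute`, `target_table_exists` and the default of three attempts are hypothetical, and the final error stands in for the error prompt shown in the UI.

```python
import time
from typing import Callable, Optional


def run_with_retry(execute: Callable[[], None],
                   target_table_exists: Callable[[], bool],
                   max_attempts: int = 3,
                   delay_seconds: float = 0.0) -> int:
    """Run a scheduling task until its target table exists.

    Returns the number of the attempt that succeeded and raises
    RuntimeError once max_attempts attempts have failed.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            execute()
        except Exception as exc:  # a crash counts as a failed attempt
            last_error = exc
        # Success is judged by the result, not by the absence of an exception.
        if target_table_exists():
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(
        f"scheduling task failed after {max_attempts} attempts"
    ) from last_error
```

Bounding the number of attempts is a design choice of this sketch; the text itself allows either continued retries or handing the failure to the user.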
Referring to fig. 6, an embodiment of the present invention further discloses a system 1 for data synchronization, including: the configuration module 11 is configured to configure synchronization information of data, where the synchronization information includes an original data source, a data name, and a target data source; the Azkaban module 12, the Azkaban module 12 includes a first encapsulating unit 121, a scheduling unit 122 and an executing unit 123, the first encapsulating unit 121 is used for encapsulating the custom task type, the scheduling unit 122 is used for generating a scheduling task, and the executing unit 123 is used for executing the scheduling task; the custom module 13 corresponding to the custom task type is used for being called by the execution unit 123 to execute the scheduling task.
Referring to the data synchronization method in the foregoing embodiment, the first encapsulation unit 121 of the Azkaban module 12 encapsulates the custom task type in advance, the configuration module 11 configures the synchronization information of the data and submits the information to the scheduling unit 122 of the Azkaban, the scheduling unit 122 generates a corresponding scheduling task according to the synchronization information and the corresponding custom task type, and the execution unit 123 invokes the custom module 13 corresponding to the custom task type to execute all scheduling tasks, thereby completing data synchronization. The distributed computation of data synchronization is realized, the efficiency is high, the stability is high, and the method is suitable for high-throughput and large-concurrency data scenes.
Referring to FIG. 7, in another embodiment of the present invention, the system 1 for data synchronization further includes a DataX module 14; the custom module 13 is a Kubernetes module and the custom task type is a Kubernetes task; the execution unit 123 includes a Kubernetes interface used for calling the Kubernetes module to generate a Kubernetes Job task; the Kubernetes module includes a second encapsulation unit 131 used for encapsulating a DataX synchronization container; and the DataX module 14 is called by the Kubernetes module to complete data synchronization. With reference to the data synchronization method in the foregoing embodiment, completing data synchronization with a DataX synchronization container encapsulated in Kubernetes makes the most of the container's good isolation, resource allocation and orchestration management, improves the compatibility and efficiency of distributed data synchronization, and solves the problem that the executors become a performance bottleneck because Azkaban tasks can only run on fixed executors. It also provides good compatibility and simple configuration, and improves the efficiency of data synchronization between heterogeneous original data sources.
In a further embodiment of the invention, the synchronization information further comprises speed limit information. Referring to the data synchronization method in the foregoing embodiment, the speed of data synchronization can be controlled by setting the speed limit information. This improves the efficiency of data synchronization, prevents an excessively high speed from damaging the hardware of the device, and keeps operation stable and smooth even when the original data source runs on low-end hardware.
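One way the speed limit information could be carried into the synchronization job is through the `job.setting.speed` block of a DataX job configuration. The sketch below assumes that block accepts `channel` and `byte` keys; the function name `build_datax_config` and its parameters are hypothetical, and the reader and writer sections are passed through unchanged rather than constructed here.

```python
def build_datax_config(reader: dict, writer: dict,
                       max_bytes_per_sec: int, channels: int = 1) -> dict:
    """Wrap a reader/writer pair in a DataX-style job config with a speed limit.

    Assumes the job.setting.speed block with 'channel' and 'byte' keys.
    """
    if max_bytes_per_sec <= 0 or channels <= 0:
        raise ValueError("speed limit and channel count must be positive")
    return {
        "job": {
            "setting": {
                "speed": {
                    "channel": channels,        # number of concurrent channels
                    "byte": max_bytes_per_sec,  # throughput cap in bytes per second
                }
            },
            "content": [{"reader": reader, "writer": writer}],
        }
    }
```

For example, `build_datax_config(reader, writer, max_bytes_per_sec=1048576, channels=2)` would cap the job at roughly 1 MiB per second under the stated assumption about how the cap is interpreted.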
In another embodiment of the present invention, the Azkaban module 12 further includes a monitoring unit 124, which is configured to monitor the completion of the scheduling task and send a signal to the execution unit 123. Referring to the data synchronization method in the foregoing embodiment, for example, the monitoring unit 124 may determine whether a scheduling task was executed successfully according to whether the target table corresponding to the scheduling task has been generated. When it detects that the target table has not been generated, the monitoring unit 124 sends a failure signal to the execution unit 123, and the execution unit 123 retries the failed scheduling task after receiving the failure signal. If the target table is still not generated after the retry, the retry may be continued, or a corresponding error prompt may be displayed through the UI so that the user can handle it. Retrying failed scheduling tasks improves the success rate of data synchronization. When the corresponding target table has been generated, the task has completed successfully; the monitoring unit 124 then sends a success signal to the execution unit 123, and the execution unit 123 continues with the other scheduling tasks to be executed, which optimizes task scheduling management and improves the efficiency of data synchronization. As another example, a completion signal may be generated automatically after a scheduling task is completed, and the monitoring unit 124 monitors task completion according to whether it detects the corresponding completion signal.
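The monitor-and-retry behaviour can be sketched in Python as follows. This is a simplified, single-process illustration: `run_with_retry`, `execute`, `target_table_exists` and `max_retries` are hypothetical names, and a bounded retry count followed by an error is an assumption (the text allows either continued retries or an error prompt).

```python
from typing import Callable


def run_with_retry(execute: Callable[[], None],
                   target_table_exists: Callable[[], bool],
                   max_retries: int = 3) -> int:
    """Run a scheduling task until its target table exists.

    Returns the number of attempts used. Raises RuntimeError when the target
    table is still missing after the initial attempt plus max_retries retries,
    which is where an error prompt would be shown to the user.
    """
    attempts = 0
    while attempts <= max_retries:
        attempts += 1
        execute()
        # Role of the monitoring unit: success means the target table was generated.
        if target_table_exists():
            return attempts
    raise RuntimeError(f"target table not generated after {attempts} attempts")
```

A caller would run the next pending scheduling task after `run_with_retry` returns, and surface the `RuntimeError` message in the UI otherwise.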
The embodiment of the invention also discloses a computing device, which comprises: a processor adapted to implement various instructions; a memory adapted to store a plurality of instructions adapted to be loaded by the processor and to implement the method of data synchronization of any of the preceding embodiments.
A computing device adopting this technical scheme can realize distributed computation of data synchronization, is efficient and stable in use, and is suitable for high-throughput, high-concurrency data scenarios.
The embodiment of the invention also discloses a storage medium, wherein a plurality of instructions are stored in the storage medium, and the instructions are adapted to be loaded by a processor to execute the data synchronization method of any of the foregoing embodiments.
A storage medium adopting this technical scheme can realize distributed computation of data synchronization, is efficient and stable, and is suitable for high-throughput, high-concurrency data scenarios.
The embodiments disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet via electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that all the modules/units mentioned in the apparatus embodiments of this application are logical modules/units. Physically, one logical module/unit may be one physical module/unit, may be a part of one physical module/unit, or may be implemented by a combination of multiple physical modules/units. The physical implementation of these logical modules/units is not what matters most; the combination of the functions they implement is the key to solving the technical problem proposed in this application. Furthermore, in order to highlight the innovative part of the present application, the above apparatus embodiments do not introduce modules/units that are not closely related to solving the technical problem proposed in this application, which does not mean that no other modules/units exist in the above apparatus embodiments.
While the invention has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing is a further detailed description of the invention in conjunction with specific embodiments, and that the specific implementation of the invention is not limited to these descriptions. Various changes in form and detail, including simple deductions or substitutions, may be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims (12)

1. A method of data synchronization, comprising the steps of:
encapsulating a custom task type in Azkaban;
configuring synchronization information of data, wherein the synchronization information comprises an original data source, a data name and a target data source;
generating a scheduling task according to the synchronization information and the custom task type;
and executing the scheduling task to complete data synchronization.
2. The method of data synchronization of claim 1, wherein the custom task type is a Kubernetes task, the method of data synchronization further comprising the step of:
encapsulating a synchronization container in Kubernetes.
3. The method of data synchronization of claim 2, wherein the step of executing the scheduling task to complete data synchronization comprises:
calling Kubernetes to generate a Kubernetes Job task corresponding to the scheduling task;
and executing the Kubernetes Job task to complete data synchronization.
4. The method of data synchronization of claim 3, wherein the synchronization container is a DataX synchronization container, and wherein the step of executing the Kubernetes Job task to complete data synchronization comprises:
acquiring an original data table and meta information corresponding to the original data table according to the synchronization information;
generating configuration information according to the original data table and the meta information;
and calling DataX to read the configuration information to complete data synchronization.
5. The method of data synchronization of claim 4, wherein the synchronization information further comprises speed limit information.
6. The method of data synchronization of claim 1, further comprising the step of:
when the execution of the scheduling task fails, retrying the scheduling task.
7. A system for data synchronization, comprising:
a configuration module, used for configuring synchronization information of data, wherein the synchronization information comprises an original data source, a data name and a target data source;
an Azkaban module, comprising a first encapsulation unit, a scheduling unit and an execution unit, wherein the first encapsulation unit is used for encapsulating a custom task type, the scheduling unit is used for generating a scheduling task, and the execution unit is used for executing the scheduling task;
and a custom module corresponding to the custom task type, the custom module being called by the execution unit to execute the scheduling task.
8. The system for data synchronization according to claim 7, further comprising a DataX module, wherein the custom module is a Kubernetes module, the custom task type is a Kubernetes task, the execution unit comprises a Kubernetes interface, the Kubernetes interface is used for calling the Kubernetes module to generate a Kubernetes Job task, the Kubernetes module comprises a second encapsulation unit, the second encapsulation unit is used for encapsulating a DataX synchronization container, and the DataX module is called by the Kubernetes module to complete data synchronization.
9. The system for data synchronization of claim 8, wherein the synchronization information further comprises speed limit information.
10. The system for data synchronization of claim 8, wherein the Azkaban module further comprises a monitoring unit configured to monitor completion of the scheduling task and send a signal to the execution unit.
11. A computing device, comprising:
a processor adapted to implement various instructions;
a memory adapted to store a plurality of instructions adapted to be loaded by the processor and to perform the method of data synchronization of any of claims 1-6.
12. A storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method of data synchronization according to any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010095675.4A CN111258742B (en) 2020-02-17 2020-02-17 Data synchronization method, system, computing device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010095675.4A CN111258742B (en) 2020-02-17 2020-02-17 Data synchronization method, system, computing device and storage medium

Publications (2)

Publication Number Publication Date
CN111258742A CN111258742A (en) 2020-06-09
CN111258742B CN111258742B (en) 2023-08-04

Family

ID=70949324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010095675.4A Active CN111258742B (en) 2020-02-17 2020-02-17 Data synchronization method, system, computing device and storage medium

Country Status (1)

Country Link
CN (1) CN111258742B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708659A (en) * 2020-06-10 2020-09-25 中国—东盟信息港股份有限公司 Method for constructing cloud native disaster tolerance architecture based on kubernets
CN111930466A (en) * 2020-05-28 2020-11-13 武汉达梦数据库有限公司 Kubernetes-based data synchronization environment deployment method and device
CN113407629A (en) * 2021-06-18 2021-09-17 湖南快乐阳光互动娱乐传媒有限公司 Data synchronization method and device, electronic equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398857A (en) * 2008-11-12 2009-04-01 北京星网锐捷网络技术有限公司 Data synchronization method in embedded distribution system and embedded distribution system
US20100185582A1 (en) * 2009-01-16 2010-07-22 Microsoft Corporation Web Deployment Functions and Interfaces
CN103023809A (en) * 2012-12-28 2013-04-03 中国船舶重工集团公司第七0九研究所 Information system synchronous data processing method utilizing secondary buffer technology
CN105095327A (en) * 2014-05-23 2015-11-25 深圳市珍爱网信息技术有限公司 Distributed ELT system and scheduling method
CN105162878A (en) * 2015-09-24 2015-12-16 网宿科技股份有限公司 Distributed storage based file distribution system and method
CN107766132A (en) * 2017-06-25 2018-03-06 平安科技(深圳)有限公司 Multi-task scheduling method, application server and computer-readable recording medium
CN108874524A (en) * 2018-06-21 2018-11-23 山东浪潮商用系统有限公司 Big data distributed task dispatching system
CN108958729A (en) * 2017-05-19 2018-12-07 腾讯科技(深圳)有限公司 A kind of data processing method, device and storage medium
WO2019000629A1 (en) * 2017-06-25 2019-01-03 平安科技(深圳)有限公司 Multi-data-source data synchronizing method and system, application server and computer readable storage medium
CN110069334A (en) * 2019-05-05 2019-07-30 重庆天蓬网络有限公司 A kind of method and system based on the distributed data job scheduling for assuring reason
CN110602253A (en) * 2019-09-30 2019-12-20 新华三大数据技术有限公司 Task scheduling method, device and system
CN110704458A (en) * 2019-08-15 2020-01-17 平安科技(深圳)有限公司 Data synchronization method and device, computer equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
刘英娜等: "基于大数据分析的船舶综合电网调度系统", 《舰船科学技术》 *
许皓皓等: "基于ETL的政务云气象数据仓库构建", 《计算机系统应用》 *
陈杰等: "游戏大数据平台工作流引擎研究与实践", 《电信科学》 *
骆金维等: "基于大数据平台的教学资源共享系统访问量实时统计", 《智能计算机与应用》 *

Also Published As

Publication number Publication date
CN111258742B (en) 2023-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant