CN114328722A - Data synchronization method and device supporting multiple data sources and computer equipment - Google Patents

Data synchronization method and device supporting multiple data sources and computer equipment

Info

Publication number
CN114328722A
CN114328722A CN202111480115.1A CN202111480115A CN 114328722 A
Authority
CN
China
Prior art keywords
data
message queue
supporting multiple
program
data sources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111480115.1A
Other languages
Chinese (zh)
Inventor
张星亮
石昌义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Workec Technology Co ltd
Original Assignee
Shenzhen Workec Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Workec Technology Co ltd filed Critical Shenzhen Workec Technology Co ltd
Priority to CN202111480115.1A priority Critical patent/CN114328722A/en
Publication of CN114328722A publication Critical patent/CN114328722A/en
Pending legal-status Critical Current

Abstract

The invention relates to the technical field of data processing, and particularly to a data synchronization method supporting multiple data sources, which comprises the following steps: acquiring the load value of each program node, the pressure index of each job, and the database increment log; assigning jobs to the corresponding program nodes according to the load values and pressure indexes; according to the database increment log, placing the assigned jobs into the corresponding in-memory message queues for parallel parsing; and writing the incremental data obtained from parallel parsing into a storage message queue. The invention also provides a data synchronization device supporting multiple data sources, computer equipment, and a computer-readable storage medium. The data synchronization method supporting multiple data sources provided by the embodiments of the invention can support security and performance configuration for different scenarios of different services; meanwhile, tasks are distributed evenly, so that the computer resources of each node are neither wasted nor left idle as far as possible.

Description

Data synchronization method and device supporting multiple data sources and computer equipment
Technical Field
The invention belongs to the field of data processing, and particularly relates to a data synchronization method and device supporting multiple data sources, computer equipment and a computer readable storage medium.
Background
Data processing covers the collection, transmission, storage, and processing of data. In some business scenarios, a large amount of MySQL (a relational database system) data needs to be synchronized to Kafka (a high-throughput distributed publish-subscribe messaging system) and then subjected to real-time big-data analysis, classification, and querying.
Existing solutions either poll the database to check whether data has changed, or adopt a Canal (a tool for synchronizing MySQL incremental data) scheme, which simulates a MySQL slave and subscribes to the database's binlog (the binary log of MySQL) to capture the latest data in real time.
Under the conditions of multiple data sources and multiple process nodes, the prior-art methods suffer from uneven task distribution, lack of support for local ordering, and inability to intercept abnormal traffic.
Disclosure of Invention
The embodiments of the invention provide a data synchronization method supporting multiple data sources, aiming to solve the prior-art problems that, under the conditions of multiple data sources and multiple process nodes, task distribution is uneven, local ordering is not supported, and abnormal traffic cannot be intercepted.
An embodiment of the invention provides a data synchronization method supporting multiple data sources, which comprises the following steps:
acquiring the load value of each program node, the pressure index of each job, and the database increment log;
assigning jobs to the corresponding program nodes according to the load values and pressure indexes;
according to the database increment log, placing the assigned jobs into the corresponding in-memory message queues for parallel parsing;
and writing the incremental data obtained from parallel parsing into a storage message queue.
An embodiment of the invention also provides a data synchronization device supporting multiple data sources, which comprises:
a data acquisition unit, used for acquiring the load value of each program node, the pressure index of each job, and the database increment log;
a job assignment unit, used for assigning jobs to the corresponding program nodes according to the load values and pressure indexes;
a parallel parsing unit, used for placing the assigned jobs into the corresponding in-memory message queues for parallel parsing according to the database increment log;
and a storage unit, used for writing the incremental data obtained from parallel parsing into the storage message queue.
An embodiment of the present invention further provides computer equipment comprising a memory and a processor, wherein the memory stores a computer program executable by the processor, and the processor comprises the above device.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the steps of the above method.
In the embodiments of the invention, the load value of each program node, the pressure index of each job, and the database increment log are acquired, and jobs are assigned to the corresponding program nodes according to the load values and pressure indexes; the assigned jobs are then placed into the corresponding in-memory message queues for parallel parsing according to the database increment log; finally, the incremental data obtained from parallel parsing is written into a storage message queue. The method supports security and performance configuration for different scenarios of different services; meanwhile, tasks are distributed evenly, so that the computer resources of each node are neither wasted nor left idle as far as possible.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the technical solutions in the prior art, the drawings required for describing them are briefly introduced below. The drawings in the following description show only some embodiments of the invention; other drawings can be derived from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a method for supporting data synchronization of multiple data sources according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for supporting data synchronization of multiple data sources according to another embodiment of the present invention;
FIG. 3 is a flow chart of a method for supporting data synchronization of multiple data sources according to another embodiment of the present invention;
fig. 4 is a block diagram of a data synchronization apparatus supporting multiple data sources according to an embodiment of the present invention;
fig. 5 is a block diagram of a data synchronization apparatus supporting multiple data sources according to another embodiment of the present invention;
fig. 6 is a block diagram of a data synchronization apparatus supporting multiple data sources according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the embodiments of the invention, the load value of each program node, the pressure index of each job, and the database increment log are acquired first; jobs are assigned to the corresponding program nodes according to the load values and pressure indexes; the assigned jobs are then placed into the corresponding in-memory message queues for parallel parsing according to the database increment log; finally, the incremental data obtained from parallel parsing is written into a storage message queue. This data synchronization method supporting multiple data sources solves the prior-art problems that, under the conditions of multiple data sources and multiple process nodes, task distribution is uneven, local ordering is not supported, and abnormal traffic cannot be intercepted.
Fig. 1 shows a flowchart of a data synchronization method supporting multiple data sources according to an embodiment of the present invention, which comprises the following steps:
Step S101: acquire the load value of each program node, the pressure index of each job, and the database increment log.
In the embodiment of the invention, the load value of each program node and the pressure index of each job can be sensed automatically through a pressure sensing component.
For example, each program node reports its memory, network, node load value, and other information to redis (a distributed key-value in-memory database) at a fixed interval (assumed here to be 30 seconds). A unique master node is elected through a distributed lock, and the master node periodically pulls the load condition of each program node to obtain its load value.
As an embodiment of the invention, the pressure index of a job is estimated from the pending tasks of the program node.
In an embodiment of the present invention, the database increment log may be obtained synchronously from the database through a corresponding component.
Such components include, but are not limited to, dbsync (a tool for synchronization between heterogeneous databases) and maxwell (a tool that reads the MySQL binlog).
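The heartbeat-and-election scheme above can be sketched as follows. This is a minimal illustration, not the patent's implementation: redis is replaced by a plain dict so the sketch is self-contained, and the key names and fields are assumptions.

```python
# Sketch of node heartbeats plus distributed-lock master election,
# assuming redis-like SET NX semantics. A dict stands in for redis here;
# "node:<id>" and "master_lock" are illustrative key names.
store = {}  # stands in for redis

def report_heartbeat(node_id, load, free_mem_mb):
    """Each node periodically (every ~30s) writes its load snapshot."""
    store[f"node:{node_id}"] = {"load": load, "free_mem_mb": free_mem_mb}

def try_become_master(node_id):
    """SET NX-style election: the first node to claim the lock is master."""
    return store.setdefault("master_lock", node_id) == node_id

report_heartbeat("n1", load=0.4, free_mem_mb=900)
report_heartbeat("n2", load=1.8, free_mem_mb=300)
assert try_become_master("n1") is True    # n1 claims the lock first
assert try_become_master("n2") is False   # n2 sees an existing master
# The master then pulls every node's load condition:
loads = {k: v["load"] for k, v in store.items() if k.startswith("node:")}
assert loads == {"node:n1": 0.4, "node:n2": 1.8}
```

A production version would add a lock expiry (lease) so a crashed master can be replaced, which redis supports via SET with NX and EX options.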
Step S102: assign jobs to the corresponding program nodes according to the load values and pressure indexes.
In the embodiment of the present invention, the idleness of each program node is calculated from its load value, and jobs are then assigned to the corresponding program nodes according to their estimated pressure indexes.
According to the embodiment of the invention, the load value of each program node is sensed automatically, and a job can be assigned intelligently to the optimal program node according to its estimated pressure index. For example, the currently most idle program node can be determined from the node load values, and the job is then assigned to that node according to its pressure index.
Preferably, a program node is deployed in advance and then registered as an idle node. There may be one or more idle nodes.
As an embodiment of the invention, a job set can also be registered in advance, and new jobs can be added to the job set at runtime.
Step S103: according to the database increment log, place the assigned jobs into the corresponding in-memory message queues for parallel parsing.
In the embodiment of the invention, the table name and database name can be parsed from the acquired database increment log, and the assigned job is then placed into the corresponding in-memory message queue according to the hash value of the parsed names, i.e. hash(database name + table name). The in-memory message queues are consumed concurrently, so the database increment log is parsed in parallel across queues.
For example, 10 in-memory message queues are maintained in program memory; messages in different queues are processed in parallel through a thread pool, while messages within the same queue are processed serially, which accelerates the parallel processing of different tables.
In the embodiment of the present invention, data of the same table still needs to be stored in order.
As an embodiment of the present invention, blocking in-memory queues are introduced, and global ordering is relaxed to per-table ordering. This can be realized as follows:
assuming 10 in-memory message queues are maintained in program memory, let Val = hash(database name + table name); the corresponding queue is then mqs[Val % 10]. That is, with index = Val % 10, the incremental data corresponding to the database increment log is written into the queue at subscript index.
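The routing rule above can be sketched as follows. The patent does not name a hash function, so zlib.crc32 is used here as a stable stand-in; the queue count and event fields are illustrative.

```python
# Sketch of per-table queue routing: same table -> same queue (per-table
# order preserved); different tables usually land in different queues
# (parallelism). Hash choice (crc32) is an assumption.
import zlib
from collections import deque

NUM_QUEUES = 10
mqs = [deque() for _ in range(NUM_QUEUES)]  # in-memory message queues

def route(db_name: str, table_name: str, event: dict) -> int:
    """Place one binlog event into mqs[hash(db+table) % NUM_QUEUES]."""
    val = zlib.crc32((db_name + table_name).encode("utf-8"))
    index = val % NUM_QUEUES
    mqs[index].append(event)
    return index

# All events of one table go to a single queue and stay ordered there.
i1 = route("crm", "orders", {"op": "insert", "id": 1})
i2 = route("crm", "orders", {"op": "update", "id": 1})
assert i1 == i2 and len(mqs[i1]) == 2
```

In the described design, one worker thread drains each queue serially while the thread pool runs the queues in parallel, which is exactly what makes the ordering per-table rather than global.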
Step S104: write the incremental data obtained from parallel parsing into the storage message queue.
In the embodiment of the invention, to facilitate data storage, consumption by downstream services, further processing, and the like, the incremental data parsed in parallel in the above steps is written into the storage message queue. The storage message queue may be a third-party message queue such as Kafka.
As an embodiment of the present invention, different Kafka Ack message acknowledgement mechanisms can be chosen per scenario according to the differing performance and security requirements of different services. Different acknowledgement mechanisms trade performance against safety: the higher the safety, the lower the performance.
For example, in the Kafka Ack message acknowledgement mechanism:
When Ack = -1, the producer (message producer) waits for acknowledgement from the leader (the replica the producer writes to) and all followers (replicas that synchronize from the leader) before sending the next piece of data.
When Ack = 0, the producer does not wait for any acknowledgement from the broker (a Kafka server node) and immediately sends the next piece (batch) of data.
When Ack = 1, the producer waits only for the leader to successfully receive and acknowledge the data before sending the next piece of data.
In terms of safety, Ack = -1 is better than Ack = 1, and Ack = 1 is better than Ack = 0; in terms of performance, Ack = 0 is better than Ack = 1, and Ack = 1 is better than Ack = -1. Thus, the higher the safety, the lower the performance.
Different service scenarios correspond to different topics, and a different Ack value can be selected according to the characteristics of each topic's service.
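A per-topic Ack selection can be sketched as a simple configuration table. The topic names and security levels below are illustrative assumptions; the returned strings match the `acks` values ("0", "1", "all") accepted by common Kafka producer clients, where "all" corresponds to Ack = -1.

```python
# Hypothetical sketch: pick a Kafka "acks" setting per topic from a
# declared security level. Topic names and levels are made up for the
# example, not taken from the patent.
ACKS_BY_LEVEL = {
    "high":   "all",  # Ack = -1: leader + all in-sync followers confirm
    "medium": "1",    # Ack = 1: leader only
    "low":    "0",    # Ack = 0: fire-and-forget, highest throughput
}

TOPIC_LEVELS = {
    "billing-events": "high",    # safety first
    "audit-trail":    "medium",
    "click-stream":   "low",     # throughput first
}

def acks_for_topic(topic: str) -> str:
    """Return the acks value for this topic; default to the middle ground."""
    return ACKS_BY_LEVEL[TOPIC_LEVELS.get(topic, "medium")]

assert acks_for_topic("billing-events") == "all"
assert acks_for_topic("click-stream") == "0"
```

The resulting value would be passed to the producer for that topic's writes, e.g. `KafkaProducer(acks=...)` in the kafka-python client.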
In summary, in the data synchronization method supporting multiple data sources provided in the embodiment of the present invention, the load value of each program node, the pressure index of each job, and the database increment log are acquired first, and jobs are assigned to the corresponding program nodes according to the load values and pressure indexes; the assigned jobs are then placed into the corresponding in-memory message queues for parallel parsing according to the database increment log; finally, the incremental data obtained from parallel parsing is written into a storage message queue. The method supports security and performance configuration for different scenarios of different services; meanwhile, tasks are distributed evenly, so that the computer resources of each node are neither wasted nor left idle as far as possible.
Fig. 2 is a flowchart of a data synchronization method supporting multiple data sources according to another embodiment of the present invention, in which step S102 specifically comprises:
Step S1021: determine the idle program nodes according to the load values.
In the embodiment of the invention, the idle program nodes can be determined from the load value of each program node.
For example, the currently most idle node may be calculated by the pressure sensing component using a specific algorithm. Each program node reports its memory, network, node load value, and other information to redis at a fixed interval (assumed here to be 30 seconds). A unique master node is elected through the distributed lock, and the master node periodically pulls the load condition of each program node to obtain its load value. The algorithm is exemplified as follows:
Free: idle degree (the higher the value, the more idle the node).
Tasks: the number of jobs on the program node.
Load: the load factor (representing the comprehensive load condition of the node).
Load1: load condition over the past 1 minute.
Load5: load condition over the past 5 minutes.
Load15: load condition over the past 15 minutes.
memoryFactor: memory correction factor.
networkFactor: network traffic correction factor.
Load calculation:
Load = Load1*3 + Load5*5 + Load15*1.
When the remaining available memory <= 512MB:
memoryFactor = 0.
When the remaining available memory > 512MB:
memoryFactor = (current remaining available memory - 512MB) / 512MB.
When the available network traffic < 5MB:
networkFactor = 0.
When the available network traffic is between 5MB and 10MB:
networkFactor = 0.3.
When the available network traffic > 10MB:
networkFactor = 1.
Free = 100 / (Load + Tasks*50) * memoryFactor * networkFactor.
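The idleness score above can be computed directly. The weights and thresholds follow the text; the sample node values below are illustrative.

```python
# Sketch of the "Free" idleness score from the text. Memory and bandwidth
# are in MB, matching the thresholds given above.
def memory_factor(free_mem_mb: float) -> float:
    if free_mem_mb <= 512:   # memory-starved node scores 0
        return 0.0
    return (free_mem_mb - 512) / 512

def network_factor(free_bw_mb: float) -> float:
    if free_bw_mb < 5:
        return 0.0
    if free_bw_mb <= 10:
        return 0.3
    return 1.0

def free_score(load1: float, load5: float, load15: float,
               tasks: int, free_mem_mb: float, free_bw_mb: float) -> float:
    load = load1 * 3 + load5 * 5 + load15 * 1
    return (100 / (load + tasks * 50)
            * memory_factor(free_mem_mb) * network_factor(free_bw_mb))

# A lightly loaded node with ample memory and bandwidth scores higher.
idle = free_score(0.1, 0.1, 0.1, tasks=1, free_mem_mb=1024, free_bw_mb=20)
busy = free_score(2.0, 2.0, 2.0, tasks=5, free_mem_mb=600, free_bw_mb=8)
assert idle > busy
```

Note how the Tasks*50 term dominates the denominator, so the job count weighs more heavily than the raw load averages, and either exhausted memory or exhausted bandwidth zeroes the score outright.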
It should be noted that the above example of the idle-node calculation method does not limit the scope of the present invention; other methods capable of determining an idle node also fall within the scope of the present invention.
Step S1022: assign the job to the corresponding idle program node according to its pressure index.
In the embodiment of the invention, the pressure indexes of different jobs correspond to different degrees of idleness required of the program node: the higher a job's pressure index, the more idle the node that processes it must be.
As one embodiment of the invention, a job is assigned to the most idle program node. If the calculated idleness of every program node is too low, that is, all program nodes are busy, the job cannot be assigned.
As another embodiment of the present invention, when a job cannot be assigned, an error log may be printed to remind the user to expand capacity.
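The assignment rule can be sketched as below. This is an illustrative reading of the text, assuming a node qualifies when its idleness score is at least the job's pressure index; node names and numbers are made up.

```python
# Sketch: assign a job to the most idle qualifying node, or report that
# no node can take it (the error-log / capacity-expansion case).
from typing import Dict, Optional

def assign(job_pressure: float, node_scores: Dict[str, float]) -> Optional[str]:
    """Return the most idle node whose score covers the job's pressure,
    or None (after logging an error) when every node is too busy."""
    candidates = {n: s for n, s in node_scores.items() if s >= job_pressure}
    if not candidates:
        print(f"ERROR: no node can take job (pressure={job_pressure}); "
              "consider expanding capacity")
        return None
    return max(candidates, key=candidates.get)

nodes = {"node-a": 0.8, "node-b": 1.9, "node-c": 0.1}
assert assign(0.5, nodes) == "node-b"   # most idle qualifying node
assert assign(5.0, nodes) is None       # all busy -> cannot assign
```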
Fig. 3 is a flowchart of a data synchronization method supporting multiple data sources according to another embodiment of the present invention, in which step S103 specifically comprises:
Step S1031: load the configuration data.
In embodiments of the present invention, configuration data may be loaded via synchronous configuration. For example, various configuration data is loaded from zookeeper (a distributed coordination service), configuration modification is realized by means of zookeeper's broadcast-subscription mechanism, and the configuration is then synchronized automatically to the relevant business programs.
As an embodiment of the present invention, the definition of multiple message body formats and the configuration of various serialization protocols may be supported, such as Protocol Buffers (a serialization framework), Kryo (a fast and efficient Java object serialization framework), and JSON (JavaScript Object Notation).
Step S1032: parse the hash value according to the database increment log and the configuration data.
In the embodiment of the present invention, a hash value is computed from the acquired database increment log and configuration data; its input includes, but is not limited to, the table name and database name, e.g. hash(database name + table name).
Step S1033: according to the hash value, place the assigned job into the corresponding in-memory message queue for parallel parsing.
In an embodiment of the present invention, the assigned jobs are placed into the corresponding in-memory message queues for parallel parsing according to the parsed hash(database name + table name) value. The in-memory message queues are consumed concurrently, so the database increment log is parsed in parallel across queues.
For example, 10 in-memory message queues are maintained in program memory; messages in different queues are processed in parallel through a thread pool, while messages within the same queue are processed serially, which accelerates the parallel processing of different tables.
As shown in fig. 1, a flowchart of a data synchronization method supporting multiple data sources according to another embodiment of the present invention further includes:
Step S201: judge whether abnormal traffic exists according to the parallel parsing result; if yes, execute step S105; if no, execute step S104.
In the embodiment of the invention, after the database increment logs are parsed in parallel, whether abnormal traffic exists can be judged from the parsing result.
For example, when the business scenario is a multi-tenant SaaS (Software-as-a-Service) business, each user is assigned a unique enterprise id. The abnormal-traffic judging component can obtain the enterprise id from the parsing result and judge whether that enterprise has a large-scale traffic burst. When the data updates of a certain enterprise are abnormal, a large number of messages are suddenly written into the storage message queue (such as Kafka), so that the traffic becomes excessive and downstream consumers are strained, which can affect the normal data updates of other enterprises.
When abnormal traffic is judged to exist, step S105 is executed; when no abnormal traffic is found, step S104 is executed, and the incremental data obtained from parallel parsing is written into the storage message queue.
Step S105: intercept the abnormal traffic.
In the embodiment of the invention, when abnormal traffic is detected, it is intercepted in order to keep the system operating normally.
Step S106: write the abnormal traffic into the storage message queue at a low rate.
In the embodiment of the invention, the intercepted abnormal traffic can be put into a slow queue, i.e. written slowly into the storage message queue, so as to avoid affecting the whole live service.
As an embodiment of the present invention, the intercepted abnormal traffic may be stored on disk (or in memory) and then consumed slowly, ensuring that normal traffic can still be updated and synchronized in time.
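The burst-detection idea in steps S201 and S105 can be sketched with a per-enterprise sliding window. The threshold, window length, and enterprise ids below are assumptions for illustration; the patent does not specify them.

```python
# Illustrative sketch: flag a per-enterprise traffic burst using a sliding
# one-second window; flagged messages would be diverted to the slow queue
# instead of the main storage queue.
import time
from collections import defaultdict, deque

BURST_LIMIT = 100  # assumed: max messages per enterprise per second

class BurstDetector:
    def __init__(self, limit=BURST_LIMIT, window=1.0):
        self.limit, self.window = limit, window
        self.events = defaultdict(deque)  # enterprise_id -> recent timestamps

    def is_abnormal(self, enterprise_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.events[enterprise_id]
        q.append(now)
        while q and now - q[0] > self.window:  # drop events outside window
            q.popleft()
        return len(q) > self.limit

det = BurstDetector(limit=3)
# Four messages from one tenant within one second: the fourth is a burst.
flags = [det.is_abnormal("ent-1", now=0.1 * i) for i in range(4)]
assert flags == [False, False, False, True]
```

Messages flagged as abnormal would be drained from the slow queue at a capped rate, so one tenant's burst cannot starve the storage message queue for everyone else.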
Fig. 4 is a block diagram of a data synchronization apparatus supporting multiple data sources according to an embodiment of the present invention, comprising:
a data acquisition unit 401, configured to acquire the load value of each program node, the pressure index of each job, and the database increment log.
In the embodiment of the present invention, the data acquisition unit 401 may automatically sense the load value of each program node and the pressure index of each job through the pressure sensing component.
For example, each program node reports its memory, network, node load value, and other information to redis (a distributed key-value in-memory database) at a fixed interval (which may be 30 seconds). The data acquisition unit 401 elects a unique master node through a distributed lock, and the master node periodically pulls the load condition of each program node to obtain its load value.
In the embodiment of the invention, the pressure index of a job is estimated from the pending tasks of the program node, and the database increment logs can be obtained synchronously from the database through corresponding components.
Such components include, but are not limited to, dbsync (a tool for synchronization between heterogeneous databases) and maxwell (a tool that reads the MySQL binlog).
a job assignment unit 402, configured to assign jobs to the corresponding program nodes according to the load values and pressure indexes.
In this embodiment of the present invention, the job assignment unit 402 calculates the idleness of each program node from its load value and assigns jobs to the corresponding program nodes according to their estimated pressure indexes.
According to the embodiment of the invention, the load value of each program node is sensed automatically, and a job can be assigned intelligently to the optimal program node according to its estimated pressure index. For example, the currently most idle program node can be determined from the node load values, and the job is then assigned to that node according to its pressure index.
Preferably, a program node is deployed in advance and then registered as an idle node. There may be one or more idle nodes.
As an embodiment of the present invention, the job assignment unit 402 comprises:
a job-set pre-registration module, configured to pre-register a job set to which new jobs can be added at runtime.
a parallel parsing unit 403, configured to place the assigned jobs into the corresponding in-memory message queues for parallel parsing according to the database increment log.
In this embodiment of the present invention, the parallel parsing unit 403 may parse the table name and database name from the acquired database increment log, and then place the assigned job into the corresponding in-memory message queue according to the hash value of the parsed names, i.e. hash(database name + table name). The parallel parsing unit 403 consumes the in-memory message queues concurrently, so the database increment log is parsed in parallel across queues.
For example, 10 in-memory message queues are maintained in program memory; messages in different queues are processed in parallel through a thread pool, while messages within the same queue are processed serially, which accelerates the parallel processing of different tables.
In the embodiment of the present invention, data of the same table still needs to be stored in order.
As an embodiment of the present invention, the parallel parsing unit 403 comprises:
an in-table ordering module, configured to introduce blocking in-memory queues and relax global ordering to per-table ordering. This can be realized as follows:
assuming 10 in-memory message queues are maintained in program memory, let Val = hash(database name + table name); the corresponding queue is then mqs[Val % 10]. That is, with index = Val % 10, the incremental data corresponding to the database increment log is written into the queue at subscript index.
a storage unit 404, configured to write the incremental data obtained from parallel parsing into the storage message queue.
In the embodiment of the present invention, to facilitate data storage, consumption by downstream services, further processing, and the like, the storage unit 404 writes the incremental data parsed in parallel into the storage message queue. The storage message queue may be a third-party message queue such as Kafka.
As an embodiment of the present invention, the storage unit 404 comprises:
a personalized message acknowledgement module, configured to use different Kafka Ack message acknowledgement mechanisms per scenario according to the differing performance and security requirements of different services. Different acknowledgement mechanisms trade performance against safety: the higher the safety, the lower the performance.
For example, in the Kafka Ack message acknowledgement mechanism:
When Ack = -1, the producer (message producer) waits for acknowledgement from the leader (the replica the producer writes to) and all followers (replicas that synchronize from the leader) before sending the next piece of data.
When Ack = 0, the producer does not wait for any acknowledgement from the broker (a Kafka server node) and immediately sends the next piece (batch) of data.
When Ack = 1, the producer waits only for the leader to successfully receive and acknowledge the data before sending the next piece of data.
In terms of safety, Ack = -1 is better than Ack = 1, and Ack = 1 is better than Ack = 0; in terms of performance, Ack = 0 is better than Ack = 1, and Ack = 1 is better than Ack = -1. Thus, the higher the safety, the lower the performance.
Different service scenarios correspond to different topics, and the personalized message acknowledgement module can select a different Ack value according to the characteristics of each topic's service.
In summary, in the embodiments of the present invention, the data synchronization apparatus supporting multiple data sources first acquires the load value of each program node, the pressure index of each job, and the database increment log, and assigns jobs to the corresponding program nodes according to the load values and pressure indexes; it then places the assigned jobs into the corresponding in-memory message queues for parallel parsing according to the database increment log; finally, it writes the incremental data obtained from parallel parsing into a storage message queue. The apparatus supports security and performance configuration for different scenarios of different services; meanwhile, tasks are distributed evenly, so that the computer resources of each node are neither wasted nor left idle as far as possible.
Fig. 5 is a block diagram of a data synchronization apparatus supporting multiple data sources according to another embodiment of the present invention, in which the job assignment unit 402 specifically comprises:
an idle node determination module 4021, configured to determine the idle program nodes according to the load values.
For example, the idle node determination module 4021 may calculate the currently most idle node through the pressure sensing component using a specific algorithm. Each program node reports its memory, network, node load value, and other information to redis at a fixed interval (assumed here to be 30 seconds). A unique master node is elected through the distributed lock, and the master node periodically pulls the load condition of each program node to obtain its load value. The algorithm is exemplified as follows:
free degree (the higher the value, the higher the degree of Free).
And Tasks is the number of the operation of the program node.
Load is the Load factor (representing the comprehensive Load condition of the node).
Load1 Load condition of past 1 minute.
Load5 Load condition of past 5 minutes.
Load15 Load condition of the past 15 minutes.
memoryFactor: and storing the correction factor.
network factor: a network traffic correction factor.
And (3) load calculation:
Load=Load1*3+Load5*5+Load15*1。
when the remaining available memory is 512MB < >:
memoryFactor=0。
when the remaining available memory > -512 MB:
memoryFactor ═ (current remaining available memory-512 MB)/512 MB.
When available network traffic is 5 MB:
networkFactor=0。
when available network traffic is between 5MB and 10 MB:
networkFactor=0.3。
when available network traffic >10 MB:
networkFactor=1。
Free=100/(Load+Tasks*50)*memoryFactor*networkFactor。
it should be noted that the above examples of the free node calculation method do not represent limitations to the scope of the present invention, and other methods capable of determining a free node are also within the scope of the present invention.
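As a minimal sketch, the idleness calculation described above can be expressed as follows; the function and parameter names are illustrative assumptions, not part of the patent:

```python
def free_degree(load1, load5, load15, tasks, avail_mem_mb, avail_net_mb):
    """Compute the Free degree of a program node (higher = more idle)."""
    # Weighted load factor over the past 1/5/15 minutes.
    load = load1 * 3 + load5 * 5 + load15 * 1
    # Memory correction factor: nodes with at most 512 MB free score 0.
    if avail_mem_mb <= 512:
        memory_factor = 0.0
    else:
        memory_factor = (avail_mem_mb - 512) / 512
    # Network traffic correction factor.
    if avail_net_mb < 5:
        network_factor = 0.0
    elif avail_net_mb <= 10:
        network_factor = 0.3
    else:
        network_factor = 1.0
    return 100 / (load + tasks * 50) * memory_factor * network_factor
```

The master node would evaluate this for the metrics reported by every program node and pick the node with the highest value.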
The job allocation module 4022 is configured to allocate the job to the corresponding idle program node according to the pressure index.
In the embodiment of the invention, different job pressure indexes correspond to different degrees of idleness required of the program node: the higher the pressure index of a job, the more idle the program node that processes it needs to be. The job allocation module 4022 allocates each job to a corresponding idle program node according to its pressure index.
As an embodiment of the present invention, the job allocation module 4022 includes:
a first job allocation module, configured to allocate the job to the most idle program node. If the calculated pressure of all program nodes is high, that is, all program nodes are busy, the job cannot be allocated.
As another embodiment of the present invention, the job allocation module 4022 includes:
a reminding module, configured to print an error log when the job cannot be allocated, reminding the user to expand capacity.
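The allocation-or-remind behavior can be sketched as follows; the busyness threshold `min_free` and all names are assumptions for illustration only:

```python
import logging

logger = logging.getLogger("job-allocator")

def allocate_job(job_pressure, nodes, min_free=1.0):
    """Allocate a job to the most idle node, or log an error when all are busy.

    `nodes` maps node id -> Free degree; `min_free` is an assumed threshold
    below which a node counts as too busy (the patent does not fix a value).
    """
    node_id, free = max(nodes.items(), key=lambda kv: kv[1])
    # A higher-pressure job demands a proportionally more idle node.
    if free < min_free * job_pressure:
        logger.error("no idle node for job (pressure=%s); please expand capacity",
                     job_pressure)
        return None
    return node_id
```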
Fig. 6 is a block diagram of a data synchronization apparatus supporting multiple data sources according to another embodiment of the present invention, in which the parallel parsing unit 403 specifically includes:
a configuration data loading module 4031, configured to load configuration data.
In an embodiment of the invention, the configuration data loading module 4031 may load the configuration data through configuration synchronization. For example, various configuration data is loaded from zookeeper (a coordination service), configuration modification is implemented by means of zookeeper's broadcast-subscription mechanism, and the configuration is then automatically synchronized to the relevant business programs.
As one embodiment of the invention, the configuration data loading module 4031 may support the definition of multiple message body formats and the configuration of various serialization protocols, such as Protocol Buffers (a serialization framework), Kryo (a fast binary object serialization framework for Java), and JSON (JavaScript Object Notation).
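A sketch of a pluggable serialization registry under this configuration model; only a JSON codec is shown, and all names are illustrative assumptions, with Protocol Buffers or Kryo codecs registering the same way:

```python
import json

SERIALIZERS = {}  # protocol name -> codec instance

def register(name):
    def wrap(cls):
        SERIALIZERS[name] = cls()
        return cls
    return wrap

@register("json")
class JsonSerializer:
    def dumps(self, obj):
        return json.dumps(obj).encode("utf-8")

    def loads(self, data):
        return json.loads(data.decode("utf-8"))

def serialize(protocol, message):
    # The protocol name would come from the configuration loaded by 4031.
    return SERIALIZERS[protocol].dumps(message)
```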
The parsing module 4032 is configured to parse out a hash value according to the database incremental log and the configuration data.
In the embodiment of the present invention, the parsing module 4032 parses out a hash value from the obtained database incremental log and configuration data; the input to the hash includes, but is not limited to, the table name and the database name, for example hash(database name + table name).
The parallel parsing module 4033 is configured to place the allocated jobs into the corresponding memory message queues for parallel parsing according to the hash value.
In the embodiment of the present invention, the parallel parsing module 4033 places each allocated job into the corresponding memory message queue according to the parsed hash(database name + table name) value. The parallel parsing module 4033 dispatches messages to the memory message queues concurrently and then parses the database incremental logs in those queues in parallel.
For example, 10 memory message queues are maintained in the program memory; messages in different memory message queues are processed in parallel through a thread pool, while messages in the same memory message queue are processed serially. This accelerates the parallel processing of different tables, while the serial processing within one queue preserves the order of changes to each table.
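The queue-routing step can be sketched as follows, assuming 10 queues as in the example; md5 is an illustrative choice of stable hash, not mandated by the patent:

```python
import hashlib

NUM_QUEUES = 10  # matches the example of 10 memory message queues

def route(db_name, table_name):
    """Map hash(database name + table name) to a memory message queue index.

    Any deterministic hash works; the point is that all changes to one table
    always land in the same queue (and are therefore parsed serially), while
    different tables spread across queues and are parsed in parallel.
    """
    digest = hashlib.md5((db_name + table_name).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_QUEUES
```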
As shown in fig. 4, the data synchronization apparatus supporting multiple data sources according to another embodiment of the present invention further includes:
an abnormal traffic determination unit 501, configured to determine whether there is abnormal traffic according to the parallel parsing result.
In the embodiment of the present invention, after the database incremental logs are parsed in parallel, the abnormal traffic determination unit 501 may determine from the parsing result whether there is abnormal traffic.
For example, when the business scenario is a multi-tenant SaaS (Software-as-a-Service) business, each user is assigned a unique enterprise id. The abnormal traffic determination component can obtain the enterprise id from the parsing result and determine whether that enterprise has large-scale burst traffic. When the data updates of a certain enterprise are abnormal, a large number of messages are suddenly written into the storage message queue (such as kafka), the traffic becomes excessive, and downstream business cannot keep up with consumption, which can affect the normal data updates of other enterprises.
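The per-enterprise burst check can be sketched with a sliding-window counter; the window length and threshold below are assumptions, since the patent does not fix concrete limits:

```python
import time
from collections import defaultdict, deque

class BurstDetector:
    """Flag enterprises whose change volume exceeds a threshold in a window."""

    def __init__(self, window_s=1.0, limit=1000):
        self.window_s = window_s
        self.limit = limit
        self.events = defaultdict(deque)  # enterprise id -> event timestamps

    def record(self, enterprise_id, now=None):
        """Record one parsed change; return True if traffic is abnormal."""
        now = time.monotonic() if now is None else now
        q = self.events[enterprise_id]
        q.append(now)
        # Drop events that have fallen out of the sliding window.
        while q and now - q[0] > self.window_s:
            q.popleft()
        return len(q) > self.limit
```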
When the abnormal traffic determination unit 501 determines that there is abnormal traffic, the abnormal traffic interception unit 405 processes it; when the abnormal traffic determination unit 501 determines that there is no abnormal traffic, the storage unit 404 writes the incremental data obtained by the parallel parsing into the storage message queue.
The abnormal traffic interception unit 405 is configured to intercept the abnormal traffic.
In the embodiment of the present invention, when the abnormal traffic determination unit 501 determines that there is abnormal traffic, the abnormal traffic interception unit 405 intercepts it in order to ensure that the system operates normally.
The abnormal traffic storage unit 406 is configured to write the abnormal traffic into the storage message queue at a low speed.
In the embodiment of the present invention, the abnormal traffic storage unit 406 may put the intercepted abnormal traffic into a slow queue, that is, write the abnormal traffic into the storage message queue at a low speed, so as to avoid affecting the services of the entire current network.
As an embodiment of the present invention, the abnormal traffic storage unit 406 includes:
and the abnormal traffic storage module, configured to store the intercepted abnormal traffic on disk (or in memory) and then consume it at a low speed, ensuring that normal traffic can still be updated and synchronized in time.
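A sketch of the slow consumption step; `write_to_storage` stands in for the storage-message-queue producer (for example a kafka client), and the 10 msg/s rate is an assumed value:

```python
import time
from queue import Queue

def drain_slowly(slow_queue, write_to_storage, rate_per_s=10, sleep=time.sleep):
    """Consume intercepted abnormal traffic at a capped rate.

    Throttling between writes keeps the slow queue from competing with
    normal traffic for storage-message-queue bandwidth.
    """
    interval = 1.0 / rate_per_s
    while not slow_queue.empty():
        write_to_storage(slow_queue.get())
        sleep(interval)  # throttle so normal traffic keeps priority
```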
The invention also proposes a computer device comprising a processor, the processor being configured to run the data synchronization apparatus supporting multiple data sources as described above.
The computer device provided by the embodiment of the invention also comprises a memory. Illustratively, a computer program can be partitioned into one or more modules, which are stored in the memory and executed by the processor to implement the present invention. One or more of the modules may be a series of computer program instruction segments capable of performing specific functions; the instruction segments are used to describe the execution of the computer program in the data synchronization apparatus supporting multiple data sources.
Those skilled in the art will appreciate that the above description of a data synchronization apparatus supporting multiple data sources is merely an example, and does not constitute a limitation of the data synchronization apparatus supporting multiple data sources, and may include more or less components than those described above, or combine some components, or different components, such as may include input-output devices, network access devices, buses, etc.
The processor may be a Central Processing Unit (CPU), another general purpose processor, a Micro Control Unit (MCU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the data synchronization apparatus supporting multiple data sources, and connects the various parts of the entire apparatus through various interfaces and lines.
The memory can be used for storing the computer program and/or the modules, and the processor implements the various functions of the computer device by running or executing the computer program and/or the modules stored in the memory and by calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application program required by at least one function, and the data storage area may store data created according to the use of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid state storage device.
If the modules/units integrated in the data synchronization apparatus supporting multiple data sources are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer readable storage medium. Based on this understanding, all or part of the functions of the units in the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and executed by a processor to implement the functions of the above embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A data synchronization method supporting multiple data sources, comprising the steps of:
acquiring a load value of a program node, a pressure index of a job, and a database incremental log;
allocating the job to the corresponding program node according to the load value and the pressure index;
placing the allocated job into a corresponding memory message queue for parallel parsing according to the database incremental log;
and writing the incremental data obtained by the parallel parsing into a storage message queue.
2. The data synchronization method supporting multiple data sources according to claim 1, wherein the step of allocating the job to the corresponding program node according to the load value and the pressure index specifically comprises:
determining an idle program node according to the load value;
and allocating the job to the corresponding idle program node according to the pressure index.
3. The data synchronization method supporting multiple data sources according to claim 1, wherein the step of placing the allocated job into a corresponding memory message queue for parallel parsing according to the database incremental log specifically comprises:
loading configuration data;
parsing out a hash value according to the database incremental log and the configuration data;
and placing the allocated job into the corresponding memory message queue for parallel parsing according to the hash value.
4. The data synchronization method supporting multiple data sources according to claim 1, wherein the method further comprises:
determining whether there is abnormal traffic according to the parallel parsing result;
if so, intercepting the abnormal traffic;
and writing the abnormal traffic into the storage message queue at a low speed.
5. A data synchronization apparatus supporting multiple data sources, comprising:
a data acquisition unit, configured to acquire a load value of a program node, a pressure index of a job, and a database incremental log;
a job allocation unit, configured to allocate the job to the corresponding program node according to the load value and the pressure index;
a parallel parsing unit, configured to place the allocated job into a corresponding memory message queue for parallel parsing according to the database incremental log;
and a storage unit, configured to write the incremental data obtained by the parallel parsing into a storage message queue.
6. The data synchronization apparatus supporting multiple data sources according to claim 5, wherein the job allocation unit specifically comprises:
an idle node determination module, configured to determine an idle program node according to the load value;
and a job allocation module, configured to allocate the job to the corresponding idle program node according to the pressure index.
7. The data synchronization apparatus supporting multiple data sources according to claim 5, wherein the parallel parsing unit specifically comprises:
a configuration data loading module, configured to load configuration data;
a parsing module, configured to parse out a hash value according to the database incremental log and the configuration data;
and a parallel parsing module, configured to place the allocated job into the corresponding memory message queue for parallel parsing according to the hash value.
8. The data synchronization apparatus supporting multiple data sources according to claim 5, wherein the apparatus further comprises:
an abnormal traffic determination unit, configured to determine whether there is abnormal traffic according to the parallel parsing result;
an abnormal traffic interception unit, configured to intercept the abnormal traffic;
and an abnormal traffic storage unit, configured to write the abnormal traffic into the storage message queue at a low speed.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program executable by the processor, wherein the processor runs the apparatus according to any one of claims 5-8.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1-4.
CN202111480115.1A 2021-12-06 2021-12-06 Data synchronization method and device supporting multiple data sources and computer equipment Pending CN114328722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111480115.1A CN114328722A (en) 2021-12-06 2021-12-06 Data synchronization method and device supporting multiple data sources and computer equipment


Publications (1)

Publication Number Publication Date
CN114328722A true CN114328722A (en) 2022-04-12

Family

ID=81048395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111480115.1A Pending CN114328722A (en) 2021-12-06 2021-12-06 Data synchronization method and device supporting multiple data sources and computer equipment

Country Status (1)

Country Link
CN (1) CN114328722A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116860869A (en) * 2023-05-29 2023-10-10 玖章算术(浙江)科技有限公司 Queue delivery method and system under primary key concurrency scene


Similar Documents

Publication Publication Date Title
CN108388479B (en) Delayed message pushing method and device, computer equipment and storage medium
CN108874558B (en) Message subscription method of distributed transaction, electronic device and readable storage medium
JP2019523462A (en) Multitask scheduling method, system, application server, and computer-readable storage medium
CN107807815B (en) Method and device for processing tasks in distributed mode
JP2011523738A (en) Mass data processing method and system
CN109388677B (en) Method, device and equipment for synchronizing data among clusters and storage medium thereof
WO2022132233A1 (en) Multi-tenant control plane management on computing platform
US20150112934A1 (en) Parallel scanners for log based replication
CN112579692B (en) Data synchronization method, device, system, equipment and storage medium
CN111966943A (en) Streaming data distribution method and system
CN113821506A (en) Task execution method, device, system, server and medium for task system
CN114328722A (en) Data synchronization method and device supporting multiple data sources and computer equipment
CN112860387A (en) Distributed task scheduling method and device, computer equipment and storage medium
CN112231073A (en) Distributed task scheduling method and device
CN113360577A (en) MPP database data processing method, device, equipment and storage medium
CN113422808A (en) Internet of things platform HTTP information pushing method, system, device and medium
CN112307046A (en) Data acquisition method and device, computer readable storage medium and electronic equipment
CN113761052A (en) Database synchronization method and device
CN116842090A (en) Accounting system, method, equipment and storage medium
CN111767126A (en) System and method for distributed batch processing
CN110750362A (en) Method and apparatus for analyzing biological information, and storage medium
CN112486638A (en) Method, apparatus, device and storage medium for executing processing task
CN115309827A (en) Data differentiation synchronization method, system, device and medium
CN110971664B (en) Interface service management system
CN111324668B (en) Database data synchronous processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination