CN109656999B - Method, device, storage medium and apparatus for synchronizing large data volume data

Info

Publication number: CN109656999B
Application number: CN201811188299.2A
Authority: CN
Inventors: 许永夫
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-10-11
Filing date: 2018-10-11
Publication date: 2024-03-15
Anticipated expiration: 2038-10-11
Also published as: CN109656999A

Abstract

The invention discloses a method, a device, a storage medium and an apparatus for synchronizing data of large data volume. The method comprises: acquiring source data of large data volume from a source end, and classifying the source data according to service type clusters to obtain link data of multiple categories; and synchronizing the link data of each category to a target end through a different link, so that the target end starts a plurality of preset processes to process the link data of each category in parallel. Because the link data of each category is synchronized to the target end through its own link and processed in parallel by a plurality of preset processes at the target end, the invention avoids the low efficiency of transmitting a large amount of source data through a single link and greatly shortens the time needed to synchronize data of large data volume.

Description

Method, device, storage medium and apparatus for synchronizing large data volume data

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a method, an apparatus, a storage medium, and a device for synchronizing data with a big data volume.

Background

At present, data changes in one system often need to be synchronized to another system in a timely manner. If the volume of updated data at a given moment is very large (more than ten million) and the data associations are complex, synchronization incurs a relatively large delay. How to improve the efficiency of synchronizing data of large data volume is therefore a technical problem to be solved.

The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.

Disclosure of Invention

The invention mainly aims to provide a large-data-volume data synchronization method, equipment, a storage medium and a device, and aims to solve the technical problem of low efficiency when large-data-volume data are synchronized in the prior art.

In order to achieve the above object, the present invention provides a data synchronization method of a large data volume, the data synchronization method of a large data volume comprising the steps of:

acquiring source data with large data volume from a source terminal, classifying the source data with large data volume according to a service type cluster, and acquiring link data with multiple categories;

and synchronizing the link data of each category to a target end through different links respectively, so that the target end starts a plurality of preset processes to process the link data of each category in parallel.

Preferably, acquiring the source data of large data volume from the source end and classifying the source data according to the service type clusters to obtain link data of multiple categories includes:

acquiring source data with large data volume from a source terminal, and analyzing the source data with large data volume to acquire each business keyword set corresponding to each source data;

acquiring the type cluster keyword set corresponding to each service type cluster;

matching each business keyword set with each type cluster keyword set to obtain a matching degree;

and classifying each source data corresponding to each business keyword set according to the matching degree to obtain link data of a plurality of categories.

Preferably, the service type cluster comprises a plurality of associated service types;

before matching each business keyword set with each type cluster keyword set to obtain the matching degree, the data synchronization method of a large data volume further comprises:

acquiring sample data corresponding to each associated service type in each service type cluster, and extracting keywords from the sample data to obtain sample keywords;

and constructing the type cluster keyword set corresponding to each service type cluster according to the sample keywords.

Preferably, matching each business keyword set with each type cluster keyword set to obtain the matching degree includes:

traversing the business keyword sets, and matching each traversed business keyword set with each type cluster keyword set to obtain the matching degree between each business keyword set and each type cluster keyword set.

Preferably, the synchronizing the link data of each category to the target terminal through different links, so that the target terminal starts a plurality of preset processes to process the link data of each category in parallel, includes:

reading an online redo log file from a database of the source end through an extraction process;

analyzing the online redo log file to obtain change data in the link data of each category;

and synchronizing the change data in the link data of each category to a target end through a transmission process, so that the target end starts a plurality of preset processes to process the change data in the link data of each category in parallel.

Preferably, after the analyzing the online redo log file to obtain the change data in the link data of each category, the data synchronization method of the large data volume further includes:

converting the format of the change data in the link data of each category according to a preset rule to obtain the data to be transmitted in the link data of each category, and storing the data to be transmitted in a preset queue file;

the step of synchronizing the change data in the link data of each category to the target end through the transmission process, so that the target end starts a plurality of preset processes to process the change data in the link data of each category in parallel, comprising the following steps:

synchronizing the data to be transmitted in the link data of each category to a target end through a transmission process, so that the target end starts a plurality of preset processes to process the data to be transmitted in the link data of each category in parallel.

Preferably, the synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process includes:

synchronizing the data to be transmitted in the link data of each category to a target end through a transmission process, and recording the synchronization time;

acquiring a current time, and detecting whether synchronization completion information sent by the target end is received or not when the time difference between the current time and the synchronization time exceeds a preset synchronization time;

and if not, repeating the step of synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process.

In addition, in order to achieve the above object, the present invention also proposes a large-data-volume data synchronization apparatus including a memory, a processor, and a large-data-volume data synchronization program stored on the memory and executable on the processor, the large-data-volume data synchronization program being configured to implement the steps of the large-data-volume data synchronization method as described above.

In addition, in order to achieve the above object, the present invention also proposes a storage medium having stored thereon a data synchronization program of a large data amount, which when executed by a processor, implements the steps of the data synchronization method of a large data amount as described above.

In addition, in order to achieve the above object, the present invention also proposes a data synchronization device of large data volume, the data synchronization device of large data volume comprising:

the classification module is used for acquiring source data with large data volume from a source terminal, classifying the source data with large data volume according to a service type cluster and acquiring link data with multiple categories;

and the synchronization module is used for synchronizing the link data of each category to the target end through different links respectively, so that the target end starts a plurality of preset processes to process the link data of each category in parallel.

In the invention, the source data with large data volume is obtained from the source end, and the source data with large data volume is classified according to the service type cluster to obtain link data with multiple categories, so that the source data with strong relevance is divided into one category, the data synchronization is carried out by the category in the follow-up, and the synchronization efficiency is improved; and synchronizing the link data of each category to the target end through different links respectively, so that the target end starts a plurality of preset processes to process the link data of each category in parallel, synchronization delay caused by simultaneous synchronization of data with large data quantity is avoided, and the time for data synchronization of the data with large data quantity is greatly shortened.

Drawings

FIG. 1 is a schematic diagram of a large data volume data synchronization device of a hardware runtime environment according to an embodiment of the present invention;

FIG. 2 is a flow chart of a first embodiment of a large data volume data synchronization method according to the present invention;

FIG. 3 is a flow chart of a second embodiment of a large data volume data synchronization method according to the present invention;

FIG. 4 is a flow chart of a third embodiment of a large data volume data synchronization method according to the present invention;

FIG. 5 is a block diagram of a first embodiment of a large data volume data synchronization apparatus according to the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to fig. 1, fig. 1 is a schematic diagram of a data synchronization device with a large data volume in a hardware running environment according to an embodiment of the present invention.

As shown in fig. 1, the data synchronization device of a large data volume may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display, and optionally may also include a standard wired interface and a wireless interface; in the present invention the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a non-volatile memory (NVM), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.

It will be appreciated by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of a data synchronization device for large data volumes, and may include more or fewer components than shown, or may combine certain components, or may be a different arrangement of components.

As shown in fig. 1, the memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a data synchronization program for a large data volume.

In the data synchronization device with large data volume shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server, and performing data communication with the background server; the user interface 1003 is mainly used for connecting user equipment; the large-data-volume data synchronization apparatus calls a large-data-volume data synchronization program stored in the memory 1005 through the processor 1001, and executes the large-data-volume data synchronization method provided by the embodiment of the present invention.

Based on the above hardware structure, an embodiment of the data synchronization method of the present invention for large data volume is presented.

Referring to fig. 2, fig. 2 is a flow chart of a first embodiment of a large data volume data synchronization method according to the present invention.

In a first embodiment, the data synchronization method of a large data volume includes the steps of:

Step S10: acquiring source data of large data volume from the source end, and classifying the source data according to service type clusters to obtain link data of multiple categories.

It should be understood that the execution body of this embodiment is the data synchronization device of a large data volume, which may be an electronic device such as a personal computer or a server. Synchronization of the source data of large data volume can be implemented through Oracle GoldenGate (OGG), which provides real-time capture, transformation and delivery of transaction data in heterogeneous environments. The source data of large data volume is obtained from the source end through OGG. To improve synchronization efficiency, the source data can be classified into link data of multiple categories such that associated service data is transmitted through the same link. This avoids the situation in which associated service data is transmitted through different links and, when one of those links fails, the rest of the associated service data has to wait, delaying its subsequent processing.

It can be understood that each service type cluster includes a plurality of associated service types, so the source data within a service type cluster is strongly associated. Service data processing usually also requires strongly associated source data to be analyzed together, which is why the source data of large data volume is classified according to service type clusters. The source data in each service type cluster is treated as link data of one category, and the link data of each category is transmitted through a different link. This improves transmission efficiency and avoids the low efficiency of transmitting a large amount of source data through a single link.

Step S20: synchronizing the link data of each category to a target end through a different link, so that the target end starts a plurality of preset processes to process the link data of each category in parallel.

It should be noted that the link data of each category is synchronized to the target end through a different link, so that the large amount of source data is split across links and transmission efficiency is improved. The preset processes include REP processes. To further improve synchronization efficiency, a plurality of REP processes can be started at the target end to process the link data in parallel. The link data is usually stored in a trail data file, so this means starting a plurality of REP processes to process the trail data file in parallel. The trail data file is a descriptive file that can be split by operation type, such as addition, deletion or modification, and processed in parallel by a plurality of processes.
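The idea of one link per category, with all links active at the same time, can be illustrated with a minimal Python sketch. It is not the patented implementation: `send_over_link`, `synchronize_by_category` and the category names are hypothetical, and a thread pool merely stands in for the separate links and processes.

```python
from concurrent.futures import ThreadPoolExecutor


def send_over_link(category, records):
    """Hypothetical stand-in for one link: report how many records it carried."""
    # A real link would push the records to the target end over the network.
    return category, len(records)


def synchronize_by_category(link_data, max_links=4):
    """Send each category's link data over its own link, all links in parallel."""
    with ThreadPoolExecutor(max_workers=max_links) as pool:
        futures = [pool.submit(send_over_link, category, records)
                   for category, records in link_data.items()]
        return dict(future.result() for future in futures)


# Illustrative data only; the category names are assumptions.
link_data = {
    "order_processing": ["order request", "transaction success order"],
    "other_cluster": ["record 1"],
}
sync_counts = synchronize_by_category(link_data)
```

Because each category is submitted as its own task, a slow or failed link delays only the data of that category.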

In a specific implementation, OGG reads the online redo log (Online Redo Log) or archive log (Archive Log) in the database of the source end through an extraction process (Extract Process) and parses it, extracting only the change information of the data, such as addition, deletion or modification operations. The extracted information is converted into the GoldenGate custom intermediate format and stored in a queue file (trail file), and a transmission process sends the queue file to the target end over TCP/IP. A process called Server Collector at the target end receives the data change information transmitted from the source end, caches it in a GoldenGate queue file (trail file), and waits for the replication process of the target end to read it. The GoldenGate replication process reads the data change information from the queue file, creates the corresponding Structured Query Language (SQL) statements, executes them through the native interface of the database, and commits them to the database of the target end. After a successful commit it updates its own checkpoint to record the position up to which replication is complete, which finishes the replication of the data.
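The target-side behaviour described here (read a change, build an SQL statement, apply it, then advance the checkpoint) can be sketched in Python as follows. This is an illustration under stated assumptions, not GoldenGate's actual format or API: the change-record layout (`op`, `table`, `key`, `values`), the `?` placeholder style and both function names are hypothetical, and transaction grouping is omitted.

```python
def change_to_sql(change):
    """Build a parameterized SQL statement from one change record.

    `change` is a hypothetical dict with the keys "op", "table", "key", "values".
    """
    table = change["table"]
    key = change.get("key", {})
    values = change.get("values", {})
    where = " AND ".join(f"{column} = ?" for column in key)
    if change["op"] == "insert":
        columns = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        return (f"INSERT INTO {table} ({columns}) VALUES ({marks})",
                list(values.values()))
    if change["op"] == "update":
        assignments = ", ".join(f"{column} = ?" for column in values)
        return (f"UPDATE {table} SET {assignments} WHERE {where}",
                list(values.values()) + list(key.values()))
    if change["op"] == "delete":
        return f"DELETE FROM {table} WHERE {where}", list(key.values())
    raise ValueError(f"unknown operation: {change['op']}")


def replicate(changes, execute, checkpoint=0):
    """Apply the changes after `checkpoint`; return the new checkpoint position."""
    for position, change in enumerate(changes[checkpoint:], start=checkpoint + 1):
        sql, params = change_to_sql(change)
        execute(sql, params)      # e.g. the execute method of a DB-API cursor
        checkpoint = position     # advance only after the change was applied
    return checkpoint
```

Restarting `replicate` with the returned checkpoint skips the changes that were already applied, which is the purpose of recording the position.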

In the embodiment, the source data with large data volume is obtained from the source end, and the source data with large data volume is classified according to the service type cluster to obtain link data with multiple categories, so that the source data with strong relevance is divided into one category, the data synchronization is carried out by the category in the follow-up, and the synchronization efficiency is improved; and synchronizing the link data of each category to the target end through different links respectively, so that the target end starts a plurality of preset processes to process the link data of each category in parallel, synchronization delay caused by simultaneous synchronization of data with large data quantity is avoided, and the time for data synchronization of the data with large data quantity is greatly shortened.

Referring to fig. 3, fig. 3 is a flowchart of a second embodiment of the large data volume data synchronization method according to the present invention, and based on the first embodiment shown in fig. 2, the second embodiment of the large data volume data synchronization method according to the present invention is proposed.

In a second embodiment, the step S10 includes:

Step S101: acquiring source data of large data volume from the source end, and parsing the source data to obtain the business keyword set corresponding to each source data.

It can be understood that, in order to classify the source data of large data volume, keywords need to be extracted from each source data. Word segmentation may be performed on each source data to obtain all of its words, the words are de-duplicated, and keywords are then extracted by calculating the TF-IDF value of each de-duplicated word. The TF-IDF value is the product TF × IDF, where TF is the term frequency (Term Frequency) and IDF is the inverse document frequency (Inverse Document Frequency). Common words are filtered out according to the TF-IDF value and important words are retained, which yields the business keyword set corresponding to each source data.
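A minimal Python sketch of TF-IDF keyword extraction is given below for illustration. Several details are assumptions not fixed by the text: whitespace splitting stands in for word segmentation, IDF is taken as log(N / document frequency), and `top_k` and the function name are hypothetical.

```python
import math
from collections import Counter


def extract_keywords(documents, top_k=3):
    """Return one keyword set per document, ranked by TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]  # naive segmentation
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    keyword_sets = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Sort by score (descending), then alphabetically for a stable result.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        # Words that occur in every document score 0 and are filtered out.
        keyword_sets.append({word for word in ranked[:top_k] if scores[word] > 0})
    return keyword_sets
```

Words that appear in every source data item receive an IDF of zero, so they never enter a keyword set; this is the filtering of common words mentioned above.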

Step S102: acquiring the type cluster keyword set corresponding to each service type cluster.

In this embodiment, the service type cluster includes a plurality of associated service types;

before the step S102, the method further includes:

acquiring sample data corresponding to each associated service type in each service type cluster, and extracting keywords from the sample data to obtain sample keywords;

and constructing the type cluster keyword set corresponding to each service type cluster according to the sample keywords.

It should be noted that a service type cluster includes a plurality of strongly associated service types. The service types can be clustered according to the dependency relationships between the sample data corresponding to each service type, which yields the service type clusters. Specifically, keywords are extracted from the sample data corresponding to each service type to obtain the keywords of that service type, and the keywords of the associated service types in each service type cluster are then collected to construct the corresponding type cluster keyword set. For example, if the service type cluster is order processing, the service types strongly associated with order processing include order request information, transaction success order information, transaction failure order information and other associated service types. Keywords are extracted from the sample data corresponding to order request information, transaction success order information and transaction failure order information to obtain the sample keywords of each associated service type, and these sample keywords together form the type cluster keyword set corresponding to order processing.
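As an illustration, the construction of a type cluster keyword set can be sketched in Python as the union of the sample keywords of its associated service types. The function name and the sample keywords are hypothetical; only the order processing cluster and its three service types come from the example above.

```python
def build_cluster_keyword_sets(clusters, sample_keywords):
    """Union the sample keywords of every associated service type in a cluster.

    clusters:        cluster name -> list of associated service type names
    sample_keywords: service type name -> set of sample keywords
    """
    return {
        cluster: set().union(*(sample_keywords[service_type]
                               for service_type in service_types))
        for cluster, service_types in clusters.items()
    }


clusters = {
    "order processing": [
        "order request",
        "transaction success order",
        "transaction failure order",
    ],
}
# Hypothetical sample keywords, for illustration only.
sample_keywords = {
    "order request": {"order", "request"},
    "transaction success order": {"order", "paid"},
    "transaction failure order": {"order", "declined"},
}
cluster_keyword_sets = build_cluster_keyword_sets(clusters, sample_keywords)
```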

Step S103: matching each business keyword set with each type cluster keyword set to obtain the matching degree.

It should be understood that each source data is classified according to the service type clusters. Matching the business keyword set corresponding to each source data with the type cluster keyword set corresponding to each service type cluster yields the matching degree between them, that is, the matching degree between each source data and each service type cluster. A higher matching degree means a stronger association between the source data and the service type cluster, so each source data can be classified into the service type cluster with which it has the highest matching degree.

In a specific implementation, the business keyword sets are matched with the type cluster keyword sets one by one. Typically, the business keyword sets corresponding to the source data are traversed, and each traversed business keyword set is matched with the type cluster keyword set corresponding to each service type cluster to obtain the matching degree between them. In this embodiment, the step S103 includes: traversing the business keyword sets, and matching each business keyword set with each type cluster keyword set to obtain the matching degree between each business keyword set and each type cluster keyword set.

For example, the source data includes A, B and C. The business keyword set corresponding to source data A is {a1, a2, a3, a4}, that corresponding to source data B is {a1, a2, a3, a4}, and that corresponding to source data C is {c1, c2, c3, c4}. The service type clusters include M and N. The type cluster keyword set corresponding to service type cluster M is {a1, a2, a3, a4, a5}, and that corresponding to service type cluster N is {a1, c2, c3, c4}. The business keyword sets of source data A and source data B have the highest matching degree with the type cluster keyword set of service type cluster M, so source data A and source data B can both be classified into service type cluster M and synchronized through the same link.

Step S104: classifying each source data corresponding to each business keyword set according to the matching degree to obtain link data of multiple categories.

It can be understood that the service type cluster whose type cluster keyword set has the highest matching degree is generally taken as the category of the source data corresponding to the business keyword set. In the above example, the business keyword sets of source data A and source data B have the highest matching degree with the type cluster keyword set of service type cluster M, so source data A and source data B can be classified into service type cluster M and serve as link data of one category. Classifying the source data of large data volume according to the matching degree divides it into link data of multiple categories, and link data of different categories is transmitted through different links. This improves transmission efficiency and avoids the low efficiency of transmitting a large amount of source data through a single link.
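Steps S103 and S104 can be sketched in Python using the A, B, C and M, N example. The text does not define how the matching degree is computed, so the sketch assumes it is the share of business keywords that also appear in the type cluster keyword set; this measure and the function names are assumptions.

```python
def matching_degree(business_keywords, cluster_keywords):
    """Share of the business keywords that also appear in the cluster set."""
    if not business_keywords:
        return 0.0
    return len(business_keywords & cluster_keywords) / len(business_keywords)


def classify(business_sets, cluster_sets):
    """Assign each source data item to the cluster with the highest degree."""
    link_data = {cluster: [] for cluster in cluster_sets}
    for source, keywords in business_sets.items():
        best = max(cluster_sets,
                   key=lambda cluster: matching_degree(keywords, cluster_sets[cluster]))
        link_data[best].append(source)
    return link_data


business_sets = {
    "A": {"a1", "a2", "a3", "a4"},
    "B": {"a1", "a2", "a3", "a4"},
    "C": {"c1", "c2", "c3", "c4"},
}
cluster_sets = {
    "M": {"a1", "a2", "a3", "a4", "a5"},
    "N": {"a1", "c2", "c3", "c4"},
}
link_data = classify(business_sets, cluster_sets)
# link_data == {"M": ["A", "B"], "N": ["C"]}
```

Under this measure A and B match M with degree 1.0 and N with 0.25, while C matches N with 0.75 and M with 0.0, so A and B share one link and C uses another.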

In this embodiment, source data of large data volume is obtained from the source end and parsed to obtain the business keyword set corresponding to each source data; the type cluster keyword set corresponding to each service type cluster is obtained; each business keyword set is matched with each type cluster keyword set to obtain a matching degree; and each source data is classified according to the matching degree to obtain link data of multiple categories. Matching between keyword sets reflects the degree of association between each source data and each service type cluster, so each source data is classified effectively and associated service data is transmitted through the same link. This avoids the situation in which associated service data is transmitted through different links and a failure of one link forces the rest to wait, delaying the processing of the associated service data.

Referring to fig. 4, fig. 4 is a flow chart of a third embodiment of the large data volume data synchronization method according to the present invention, and based on the second embodiment shown in fig. 3, the third embodiment of the large data volume data synchronization method according to the present invention is proposed.

In a third embodiment, the step S20 includes:

Step S201: reading the online redo log file from the database of the source end through an extraction process.

It should be understood that the online redo log file is created when the database is created, and the database cannot operate without it. OGG reads the online redo log file (Online Redo Log) in the database of the source end through an extraction process (Extract Process) and then parses it to obtain the data in the source data that needs to be synchronized.

Step S202: parsing the online redo log file to obtain the change data in the link data of each category.

It will be appreciated that, to avoid synchronizing a large amount of duplicate source data to the target end, every data synchronization after the first generally only needs to synchronize the data that has changed at the source end. The online redo log file is parsed to obtain the changed data in each source data, such as data produced by addition, deletion or modification operations, and the changed data in each source data makes up the change data in the link data of the corresponding category.

Step S203: synchronizing the change data in the link data of each category to a target end through a transmission process, so that the target end starts a plurality of preset processes to process the change data in the link data of each category in parallel.

It should be noted that the change data in the link data of each category is synchronized to the target end through a different link, so that the change data in a large amount of source data is split across links and transmission efficiency is improved. The preset processes include REP processes. To further improve synchronization efficiency, a plurality of REP processes can be started at the target end at the same time to process the change data in the link data in parallel.

In this embodiment, after the step S202, the method further includes:

converting the format of the change data in the link data of each category according to a preset rule to obtain the data to be transmitted in the link data of each category, and storing the data to be transmitted in a preset queue file;

the step S203 includes:

synchronizing the data to be transmitted in the link data of each category to a target end through a transmission process, so that the target end starts a plurality of preset processes to process the data to be transmitted in the link data of each category in parallel.

In a specific implementation, the link data is generally stored in a trail data file. The preset rule is the transaction data management (GoldenGate TDM) custom intermediate format: the change data in the link data of each category is converted into this format to obtain the data to be transmitted in the link data of each category. The preset queue file is the trail data file, in which the data to be transmitted is stored. The data to be transmitted in the link data of each category is synchronized to the target end through a different link, so that the change data in a large amount of source data is split across links and transmission efficiency is improved. The preset processes include REP processes. To further improve synchronization efficiency, a plurality of REP processes can be started at the same time to process the data to be transmitted in the link data in parallel.
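The conversion and queueing step can be illustrated with a short Python sketch. The GoldenGate intermediate format is proprietary and is not reproduced here; one JSON document per line is used purely as a stand-in, and the file name, record layout and function names are hypothetical.

```python
import json
import os
import tempfile


def to_intermediate(change):
    """Convert one change record to a single-line intermediate representation."""
    return json.dumps(change, sort_keys=True)


def append_to_queue_file(path, changes):
    """Append converted change records to the queue file of one category."""
    with open(path, "a", encoding="utf-8") as queue_file:
        for change in changes:
            queue_file.write(to_intermediate(change) + "\n")


def read_queue_file(path):
    """Read the records back, as a target-side process would."""
    with open(path, encoding="utf-8") as queue_file:
        return [json.loads(line) for line in queue_file]


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "order_processing.queue")
    append_to_queue_file(path, [{"op": "insert", "table": "orders", "id": 1}])
    restored = read_queue_file(path)
```

Keeping one queue file per category lets each file be shipped over its own link and read independently at the target end.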

In this embodiment, synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process includes:

synchronizing the data to be transmitted in the link data of each category to a target end through a transmission process, and recording the synchronization time;

acquiring a current time, and detecting whether synchronization completion information sent by the target end is received when the time difference between the current time and the synchronization time exceeds a preset synchronization time;

and if not, repeating the step of synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process.

It should be understood that, to avoid data synchronization failures caused by transmission link faults or connection delays, it is necessary to monitor whether each source data completes synchronization within the preset synchronization time. The preset synchronization time may be determined as follows: calculate the data volume synchronized per unit time from the records of previous data synchronizations, count the data volume of the source data of large data volume, and calculate the time required for synchronization from these two values; this required time is used as the preset synchronization time. Normally, the source data can be successfully synchronized to the target end within the preset synchronization time.

It can be understood that, to make the elapsed time easy to calculate, the synchronization time is recorded when the data to be transmitted in the link data of each category is synchronized to the target end through the transmission process, and the current time is then obtained. When the time difference obtained by subtracting the synchronization time from the current time exceeds the preset synchronization time, the target end would normally have finished synchronizing the data sent by the source end and would have sent synchronization completion information to the source end. If the synchronization completion information has not been received by then, the data synchronization is considered to have failed, and the step of synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process is executed again, to ensure that each source data of the source end is synchronized to the target end.
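The estimate of the preset synchronization time amounts to simple arithmetic, sketched below in Python. The function name, parameter names and figures are hypothetical and serve only to illustrate the calculation.

```python
def preset_sync_time(previous_volume, previous_seconds, current_volume):
    """Estimate the seconds needed to synchronize `current_volume` records."""
    if previous_volume <= 0 or previous_seconds <= 0:
        raise ValueError("previous run must have positive volume and duration")
    volume_per_second = previous_volume / previous_seconds  # data per unit time
    return current_volume / volume_per_second


# Illustrative figures: 2,000,000 records took 100 s previously,
# so 10,000,000 records are expected to take 500 s.
estimate = preset_sync_time(previous_volume=2_000_000, previous_seconds=100,
                            current_volume=10_000_000)
```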
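The timeout-and-resend behaviour can be sketched in Python as follows. The callbacks `send` and `completion_received` are hypothetical placeholders for the transmission process and for the check of the completion information; the `max_attempts` limit is an added assumption, since the text itself simply repeats the step.

```python
import time


def synchronize_with_retry(send, completion_received, preset_seconds,
                           max_attempts=3, poll_seconds=0.01,
                           clock=time.monotonic, sleep=time.sleep):
    """Resend until the target confirms completion or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        send()
        sync_time = clock()                    # record the synchronization time
        while clock() - sync_time <= preset_seconds:
            sleep(poll_seconds)                # wait out the preset time
        if completion_received():              # checked once the time is exceeded
            return attempt
    raise TimeoutError("target end did not confirm synchronization")
```

The function returns the number of the attempt that succeeded, so a caller can log how often a resend was needed.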

In this embodiment, an online redo log file is read from the database of the source end by an extraction process and parsed to obtain the change data in the link data of each category. The change data in the link data of each category is then synchronized to the target end by a transmission process, so that the target end starts a plurality of preset processes to process the change data in parallel. Because only the change data in the link data of each category is synchronized to the target end, the amount of data to be synchronized is reduced, thereby improving data synchronization efficiency.
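The parallel handling of per-category change data at the target end can be illustrated with a small Python sketch. It is only an analogy: worker threads stand in for the preset processes, an in-memory dictionary stands in for the target table, and the `(op, key, value)` change tuples are an assumed representation.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_changes(category, changes):
    """Apply one category's change data to an in-memory table (stand-in target)."""
    table = {}
    for op, key, value in changes:
        if op == "DELETE":
            table.pop(key, None)
        else:                        # "INSERT" or "UPDATE"
            table[key] = value
    return category, table


def process_in_parallel(link_data, workers=4):
    """Process each category's change data with its own worker.

    link_data maps a category name to its ordered list of (op, key, value)
    changes. Order is preserved within a category, while different
    categories are processed concurrently.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(apply_changes, category, changes)
                   for category, changes in link_data.items()]
        return dict(f.result() for f in futures)
```

Because changes within one category are applied in order by a single worker, related records are not reordered, while unrelated categories do not wait for one another.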

In addition, an embodiment of the invention further provides a storage medium, wherein the storage medium stores a data synchronization program with large data volume which, when executed by a processor, implements the steps of the data synchronization method with large data volume described above.

In addition, referring to fig. 5, an embodiment of the present invention further provides a data synchronization device with a large data volume, where the data synchronization device with a large data volume includes:

the classification module 10 is configured to obtain source data with a large data volume from a source end, classify the source data with the large data volume according to a service type cluster, and obtain link data with multiple categories;

and the synchronization module 20 is configured to synchronize the link data of each category to the target end through different links, so that the target end starts a plurality of preset processes to process the link data of each category in parallel.

In an embodiment, the data synchronization device for large data volume further includes:

the analysis module is used for acquiring the source data with large data volume from the source end, analyzing the source data with large data volume, and acquiring the service keyword set corresponding to each piece of source data;

the acquisition module is used for acquiring the cluster keyword set corresponding to each service type cluster;

the matching module is used for matching each service keyword set with each cluster keyword set to obtain the matching degree;

the classification module 10 is further configured to classify each source data corresponding to each service keyword set according to the matching degree, so as to obtain link data of multiple categories.
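The matching and classification performed by these modules can be sketched in Python as follows. The matching degree used here (the share of a source's keywords that also appear in the cluster keyword set) is an assumed metric chosen for illustration, since this passage does not fix a formula; the function and parameter names are likewise hypothetical.

```python
def matching_degree(service_keywords, cluster_keywords):
    """Share of the service keywords that also occur in the cluster set (assumed metric)."""
    if not service_keywords:
        return 0.0
    return len(service_keywords & cluster_keywords) / len(service_keywords)


def classify(source_keywords, cluster_keyword_sets):
    """Assign each piece of source data to the cluster with the highest matching degree.

    source_keywords:      {source_id: set of service keywords}
    cluster_keyword_sets: {cluster_name: set of cluster keywords}
    Returns {cluster_name: [source_id, ...]}, i.e. the link data per category.
    """
    link_data = {name: [] for name in cluster_keyword_sets}
    for source_id, keywords in source_keywords.items():
        best = max(
            cluster_keyword_sets,
            key=lambda name: matching_degree(keywords, cluster_keyword_sets[name]),
        )
        link_data[best].append(source_id)
    return link_data
```

On a tie, `max` keeps the first cluster in iteration order; a real system would need an explicit tie-breaking or threshold rule.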

In one embodiment, the service type cluster includes a plurality of associated service types;

the data synchronization device of large data volume further includes:

the extraction module is used for obtaining sample data corresponding to each associated service type in each service type cluster, extracting keywords from the sample data and obtaining sample keywords;

and the construction module is used for constructing, according to the sample keywords, the cluster keyword set corresponding to each service type cluster.
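As a hedged illustration of how the extraction and construction modules could cooperate, the Python sketch below builds one keyword set per service type cluster from sample data. Taking the most frequent words as keywords is a simple stand-in for whatever keyword extraction is actually used, and the data layout (`clusters`, `samples`) is assumed.

```python
import re
from collections import Counter


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent words in text (a stand-in for keyword extraction)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word, _ in Counter(words).most_common(top_n)]


def build_cluster_keyword_sets(clusters, samples, top_n=5):
    """Build one keyword set per service type cluster.

    clusters: {cluster_name: [associated service type, ...]}
    samples:  {service type: [sample text, ...]}
    """
    keyword_sets = {}
    for name, service_types in clusters.items():
        keywords = set()
        for service_type in service_types:
            for text in samples.get(service_type, []):
                keywords.update(extract_keywords(text, top_n))
        keyword_sets[name] = keywords
    return keyword_sets
```

Pooling the keywords of all associated service types into one set per cluster is what lets closely related data end up in the same category and therefore on the same link.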

In an embodiment, the matching module is further configured to traverse each service keyword set, and match the service keyword set with each type of cluster keyword set, so as to obtain a matching degree between each service keyword set and each type of cluster keyword set.

the reading module is used for reading the online redo log file from the database of the source end through the extraction process;

the analysis module is further used for analyzing the online redo log file to obtain change data in the link data of each category;

the synchronization module 20 is further configured to synchronize the change data in the link data of each category to the target end through the transmission process, so that the target end starts a plurality of preset processes to process the change data in the link data of each category in parallel.

the conversion module is used for carrying out format conversion on the change data in the link data of each category according to a preset rule to obtain data to be transmitted in the link data of each category, and storing the data to be transmitted in a preset queue file;

the synchronization module 20 is further configured to synchronize the data to be transmitted in the link data of each category to the target end through a transmission process, so that the target end starts a plurality of preset processes to process the data to be transmitted in the link data of each category in parallel.

In an embodiment, the synchronization module 20 is further configured to synchronize, by a transmission process, the data to be transmitted in the link data of each category to a target end, and record a synchronization time;

the data synchronization device of large data volume further includes:

the detection module is used for acquiring the current moment, and detecting whether synchronization completion information sent by the target end is received or not when the time difference between the current moment and the synchronization moment exceeds the preset synchronization time;

the synchronization module 20 is further configured to repeat the step of synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process if the synchronization completion information is not received.

Other embodiments or specific implementation manners of the data synchronization device with large data volume according to the present invention may refer to the above method embodiments, and are not described herein again.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present invention are for description only and do not represent the relative merits of the embodiments. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. does not denote any order; these terms are to be interpreted as labels.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., read-only memory (ROM)/random access memory (RAM), magnetic disk, optical disk) and comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods according to the embodiments of the present invention.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the invention. Any equivalent structure or equivalent process derived from the contents disclosed herein, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims

1. A data synchronization method for a large data volume, characterized in that the data synchronization method for a large data volume comprises the steps of:

synchronizing the link data of each category to a target end through different links respectively, so that the target end starts a plurality of preset processes to process the link data of each category in parallel;

after the step of synchronizing the link data of each category to the target end through different links, the method further includes:

recording a synchronization time, and detecting whether synchronization completion information sent by the target terminal is received or not when the time difference between the current time and the synchronization time exceeds a preset synchronization time;

if not, returning to the step of synchronizing the link data of each category to the target end through different links;

before the step of synchronizing the link data of each category to the target end through different links, the method further comprises:

calculating the synchronous data quantity in unit time according to the prior data synchronous time record, and counting the data quantity of the source data with large data quantity;

calculating the preset synchronization time required by the data synchronization of the large data amount of source data according to the data amount synchronized in the unit time and the data amount of the large data amount of source data;

the step of synchronizing the link data of each category to the target end through different links so that the target end starts a plurality of preset processes to process the link data of each category in parallel includes:

reading, by using OGG (Oracle GoldenGate), an online redo log file from a database of the source end through an extraction process;

2. The method for synchronizing large data volume data according to claim 1, wherein said obtaining large data volume source data from a source end, classifying said large data volume source data according to a service type cluster, and obtaining link data of a plurality of categories comprises:

3. The method for large data volume data synchronization of claim 2, wherein said service type cluster comprises a plurality of associated service types;

4. The method for synchronizing data of large data volume as recited in claim 3, wherein said matching each service keyword set with each type of cluster keyword set to obtain a matching degree comprises:

5. A large data amount data synchronization apparatus, characterized by comprising: a memory, a processor and a large data amount data synchronization program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the large data amount data synchronization method as claimed in any one of claims 1 to 4.

6. A storage medium having stored thereon a large data amount data synchronization program which, when executed by a processor, implements the steps of the large data amount data synchronization method according to any one of claims 1 to 4.

7. A large data amount data synchronization device, characterized in that the large data amount data synchronization device comprises:

the synchronization module is used for synchronizing the link data of each category to the target end through different links respectively, so that the target end starts a plurality of preset processes to process the link data of each category in parallel;

the synchronization module is further configured to:

if not, repeating the operation of synchronizing the link data of each category to the target end through different links;

the synchronization module is further configured to: