CN109656999B - Method, device, storage medium and apparatus for synchronizing large data volume data - Google Patents

Publication number
CN109656999B
CN109656999B CN201811188299.2A CN201811188299A CN109656999B CN 109656999 B CN109656999 B CN 109656999B CN 201811188299 A CN201811188299 A CN 201811188299A CN 109656999 B CN109656999 B CN 109656999B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811188299.2A
Other languages
Chinese (zh)
Other versions
CN109656999A (en)
Inventor
许永夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811188299.2A
Publication of CN109656999A
Application granted
Publication of CN109656999B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device, a storage medium and an apparatus for synchronizing large-volume data. The method comprises: acquiring large-volume source data from a source end, and classifying the source data according to service type clusters to obtain link data of multiple categories; and synchronizing the link data of each category to a target end through a different link, so that the target end starts a plurality of preset processes to process the link data of each category in parallel. Because the link data of each category is synchronized over its own link and processed in parallel at the target end, the inefficiency of transmitting a large amount of source data through a single link is avoided, and the time needed to synchronize large-volume data is greatly shortened.

Description

Method, device, storage medium and apparatus for synchronizing large data volume data
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a method, an apparatus, a storage medium, and a device for synchronizing data with a big data volume.
Background
At present, data changes in one system often need to be synchronized to another system in a timely manner. If the amount of updated data at a given moment is very large (more than ten million) and the data associations are complex, synchronization incurs a relatively large delay. How to improve the efficiency of synchronizing large-volume data is therefore a technical problem to be solved.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention mainly aims to provide a large-data-volume data synchronization method, equipment, a storage medium and a device, and aims to solve the technical problem of low efficiency when large-data-volume data are synchronized in the prior art.
In order to achieve the above object, the present invention provides a data synchronization method of a large data volume, the data synchronization method of a large data volume comprising the steps of:
acquiring source data with large data volume from a source terminal, classifying the source data with large data volume according to a service type cluster, and acquiring link data with multiple categories;
and synchronizing the link data of each category to a target end through different links respectively, so that the target end starts a plurality of preset processes to process the link data of each category in parallel.
Preferably, the obtaining the source data with large data volume from the source end classifies the source data with large data volume according to the service type cluster to obtain link data with multiple categories, including:
acquiring source data with large data volume from a source terminal, and analyzing the source data with large data volume to acquire each business keyword set corresponding to each source data;
acquiring a keyword set of each type of cluster corresponding to each service type cluster;
matching each business keyword set with each type of cluster keyword set to obtain a matching degree;
and classifying each source data corresponding to each business keyword set according to the matching degree to obtain link data of a plurality of categories.
Preferably, the service type cluster comprises a plurality of associated service types;
before matching each business keyword set with each type of cluster keyword set to obtain the matching degree, the data synchronization method of a large data volume further comprises:
acquiring sample data corresponding to each associated service type in each service type cluster, and extracting keywords from the sample data to obtain sample keywords;
and constructing various types of cluster keyword sets corresponding to the business type clusters according to the sample keywords.
Preferably, the matching the service keyword sets with the cluster keyword sets of each type to obtain a matching degree includes:
traversing each business keyword set, and respectively matching the business keyword sets with each type of cluster keyword set to obtain the matching degree between each business keyword set and each type of cluster keyword set.
Preferably, the synchronizing the link data of each category to the target terminal through different links, so that the target terminal starts a plurality of preset processes to process the link data of each category in parallel, includes:
reading an online redo log file from a database of the source end through an extraction process;
analyzing the online redo log file to obtain change data in the link data of each category;
and synchronizing the change data in the link data of each category to a target end through a transmission process, so that the target end starts a plurality of preset processes to process the change data in the link data of each category in parallel.
Preferably, after the analyzing the online redo log file to obtain the change data in the link data of each category, the data synchronization method of the large data volume further includes:
converting the format of the change data in the link data of each category according to a preset rule to obtain the data to be transmitted in the link data of each category, and storing the data to be transmitted in a preset queue file;
the step of synchronizing the change data in the link data of each category to the target end through the transmission process, so that the target end starts a plurality of preset processes to process the change data in the link data of each category in parallel, comprising the following steps:
synchronizing the data to be transmitted in the link data of each category to a target end through a transmission process, so that the target end starts a plurality of preset processes to process the data to be transmitted in the link data of each category in parallel.
Preferably, the synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process includes:
synchronizing the data to be transmitted in the link data of each category to a target end through a transmission process, and recording the synchronization time;
acquiring a current time, and detecting whether synchronization completion information sent by the target end is received or not when the time difference between the current time and the synchronization time exceeds a preset synchronization time;
and if not, repeating the step of synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process.
In addition, in order to achieve the above object, the present invention also proposes a large-data-volume data synchronization apparatus including a memory, a processor, and a large-data-volume data synchronization program stored on the memory and executable on the processor, the large-data-volume data synchronization program being configured to implement the steps of the large-data-volume data synchronization method as described above.
In addition, in order to achieve the above object, the present invention also proposes a storage medium having stored thereon a data synchronization program of a large data amount, which when executed by a processor, implements the steps of the data synchronization method of a large data amount as described above.
In addition, in order to achieve the above object, the present invention also proposes a data synchronization device of large data volume, the data synchronization device of large data volume comprising:
the classification module is used for acquiring source data with large data volume from a source terminal, classifying the source data with large data volume according to a service type cluster and acquiring link data with multiple categories;
and the synchronization module is used for synchronizing the link data of each category to the target end through different links respectively, so that the target end starts a plurality of preset processes to process the link data of each category in parallel.
In the invention, large-volume source data is acquired from the source end and classified according to service type clusters to obtain link data of multiple categories, so that strongly associated source data falls into one category and subsequent synchronization is carried out category by category, which improves synchronization efficiency. The link data of each category is synchronized to the target end through a different link, so that the target end starts a plurality of preset processes to process the link data of each category in parallel; this avoids the delay caused by synchronizing all of the large-volume data at once and greatly shortens the synchronization time.
Drawings
FIG. 1 is a schematic diagram of a large data volume data synchronization device of a hardware runtime environment according to an embodiment of the present invention;
FIG. 2 is a flow chart of a first embodiment of a large data volume data synchronization method according to the present invention;
FIG. 3 is a flow chart of a second embodiment of a large data volume data synchronization method according to the present invention;
FIG. 4 is a flow chart of a third embodiment of a large data volume data synchronization method according to the present invention;
FIG. 5 is a block diagram of a first embodiment of a large data volume data synchronization apparatus according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of a data synchronization device with a large data volume in a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the data synchronization device of a large data volume may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display, and may optionally also include a standard wired interface and a wireless interface; in the present invention, the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a non-volatile memory (NVM), such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001 described above.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of a data synchronization device for large data volumes, and may include more or fewer components than shown, or may combine certain components, or may be a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a data synchronization program for a large data volume.
In the data synchronization device with large data volume shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server, and performing data communication with the background server; the user interface 1003 is mainly used for connecting user equipment; the large-data-volume data synchronization apparatus calls a large-data-volume data synchronization program stored in the memory 1005 through the processor 1001, and executes the large-data-volume data synchronization method provided by the embodiment of the present invention.
Based on the above hardware structure, an embodiment of the data synchronization method of the present invention for large data volume is presented.
Referring to fig. 2, fig. 2 is a flow chart of a first embodiment of a large data volume data synchronization method according to the present invention.
In a first embodiment, the data synchronization method of a large data volume includes the steps of:
Step S10: acquiring large-volume source data from the source end, and classifying the source data according to service type clusters to obtain link data of multiple categories.
It should be understood that the execution body of this embodiment is the data synchronization device of a large data volume, which may be an electronic device such as a personal computer or a server. Synchronization of the large-volume source data can be implemented through Oracle GoldenGate (OGG), which provides real-time capture, transformation and delivery of transaction data in heterogeneous environments. The large-volume source data is obtained from the source end through OGG. To improve synchronization efficiency, the source data can be classified into link data of multiple categories, with associated service data transmitted through the same link. This avoids the situation in which associated service data is spread across different links and, when one of those links fails, the rest of the associated data has to wait, delaying its subsequent processing.
It can be understood that each service type cluster includes a plurality of associated service types, so the source data in each service type cluster is strongly associated. Because service data processing generally also analyzes strongly associated source data together, the large-volume source data is classified according to the service type clusters. The source data in each service type cluster is used as link data of one category, and the link data of each category is transmitted through a different link, which improves transmission efficiency and avoids the inefficiency of transmitting a large amount of source data through a single link.
Step S20: synchronizing the link data of each category to the target end through a different link, so that the target end starts a plurality of preset processes to process the link data of each category in parallel.
It should be noted that the link data of each category is synchronized to the target end through a different link, so that the large amount of source data is split across links and transmission efficiency is improved. The preset processes include REP (Replicat) processes. To further improve synchronization efficiency, a plurality of REP processes can be started at the target end to process the link data in parallel. The link data is usually stored in a trail file, so the REP processes process the trail file in parallel. The trail file is a descriptive file; it can be split by operation type, such as insert, delete or update, and the resulting parts can be processed in parallel by multiple processes.
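The per-category split and the parallel processing at the target end can be illustrated with a minimal Python sketch. It is not the OGG mechanism itself: the names `group_by_category`, `apply_link` and `process_in_parallel` are hypothetical, each "link" is modeled as an in-memory list, and a thread pool stands in for the multiple REP processes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_category(records):
    """Group (category, payload) pairs so that each category has its own link."""
    links = defaultdict(list)
    for category, payload in records:
        links[category].append(payload)
    return dict(links)


def apply_link(category, payloads):
    """Hypothetical target-side worker; here it only counts the payloads."""
    return category, len(payloads)


def process_in_parallel(links, max_workers=4):
    """Hand each category's link data to its own worker (threads as a stand-in)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(apply_link, c, p) for c, p in links.items()]
        return dict(future.result() for future in futures)


records = [("order", "o1"), ("billing", "b1"), ("order", "o2")]
links = group_by_category(records)
applied = process_in_parallel(links)
```

Keeping all records of one category on one link preserves their relative order, which is the reason the text groups associated service data together before parallelizing.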
In a specific implementation, OGG reads the online redo log or the archive log in the database of the source end through an extraction process (Extract process) and parses it, extracting only the change information of the data, such as insert, delete or update operations. The extracted information is converted into the GoldenGate-defined intermediate format and stored in a queue file (trail file), and the queue file is transmitted to the target end over TCP/IP by a transmission process. The target end runs a process called Server Collector, which receives the data change information transmitted from the source end, caches it in a GoldenGate queue file (trail file), and waits for the replication process of the target end to read the data. The GoldenGate replication process reads the data change information from the queue file, creates the corresponding Structured Query Language (SQL) statements, executes them through the native interface of the database, and commits them to the database of the target end. After a successful commit it updates its own checkpoint to record the position up to which replication has completed, which finishes the replication of the data.
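The replication side of this flow (read a change record, build an SQL statement, commit, then advance the checkpoint) can be sketched as follows. This is an illustration under stated assumptions, not GoldenGate code: the queue file is modeled as a Python list, the target database is an in-memory SQLite database, and the `orders` table, the record layout and the function names are all hypothetical.

```python
import sqlite3


def to_sql(change):
    """Build a parameterized SQL statement for one change record."""
    op, row = change["op"], change["row"]
    if op == "insert":
        return ("INSERT INTO orders (id, status) VALUES (?, ?)",
                (row["id"], row["status"]))
    if op == "update":
        return ("UPDATE orders SET status = ? WHERE id = ?",
                (row["status"], row["id"]))
    if op == "delete":
        return "DELETE FROM orders WHERE id = ?", (row["id"],)
    raise ValueError(f"unknown operation: {op}")


def replicate(conn, queue, checkpoint=0):
    """Apply queue entries from `checkpoint` onward; return the new checkpoint."""
    for position in range(checkpoint, len(queue)):
        sql, params = to_sql(queue[position])
        conn.execute(sql, params)
        conn.commit()
        checkpoint = position + 1  # advance only after a successful commit
    return checkpoint


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
queue = [
    {"op": "insert", "row": {"id": 1, "status": "requested"}},
    {"op": "insert", "row": {"id": 2, "status": "requested"}},
    {"op": "update", "row": {"id": 1, "status": "succeeded"}},
    {"op": "delete", "row": {"id": 2}},
]
checkpoint = replicate(conn, queue)
rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
```

Because the checkpoint only moves after a commit, a restarted replication can resume from the returned position without reapplying committed changes.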
In this embodiment, large-volume source data is acquired from the source end and classified according to service type clusters to obtain link data of multiple categories, so that strongly associated source data falls into one category and subsequent synchronization is carried out category by category, which improves synchronization efficiency. The link data of each category is synchronized to the target end through a different link, so that the target end starts a plurality of preset processes to process the link data of each category in parallel; this avoids the delay caused by synchronizing all of the large-volume data at once and greatly shortens the synchronization time.
Referring to fig. 3, fig. 3 is a flowchart of a second embodiment of the large data volume data synchronization method according to the present invention, and based on the first embodiment shown in fig. 2, the second embodiment of the large data volume data synchronization method according to the present invention is proposed.
In a second embodiment, the step S10 includes:
Step S101: acquiring large-volume source data from the source end, and parsing the source data to obtain the business keyword set corresponding to each piece of source data.
It can be understood that, in order to classify the large-volume source data, keywords need to be extracted from each piece of source data. Each piece of source data may be segmented into words, the resulting words de-duplicated, and keywords then extracted by calculating the TF-IDF value of each de-duplicated word. The TF-IDF value is the product TF × IDF, where TF is the term frequency and IDF is the inverse document frequency. Common words are filtered out according to their TF-IDF values and important words are retained, yielding the business keyword set corresponding to each piece of source data.
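The keyword extraction described above can be sketched in Python as follows. The source does not fix a tokenizer or an exact IDF formula, so this sketch makes its own assumptions: whitespace tokenization, TF as the relative frequency within one piece of source data, and IDF as log(N / df). The function name `tf_idf_keywords` and the `top_k` cutoff are hypothetical.

```python
import math
from collections import Counter


def tf_idf_keywords(documents, top_k=2):
    """Return one keyword set per document, ranked by TF x IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    keyword_sets = []
    for tokens in tokenized:
        counts = Counter(tokens)  # counting also de-duplicates the words
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        # Words that occur in every document score 0 and are filtered out.
        keyword_sets.append({w for w in ranked[:top_k] if scores[w] > 0})
    return keyword_sets


documents = [
    "order request created for customer",
    "order payment succeeded for customer",
    "inventory count updated",
]
keyword_sets = tf_idf_keywords(documents)
```

In this example "order", "for" and "customer" appear in two of the three inputs, so they rank below the words that are specific to a single input.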
Step S102: acquiring the type cluster keyword set corresponding to each service type cluster.
In this embodiment, the service type cluster includes a plurality of associated service types;
before the step S102, the method further includes:
acquiring sample data corresponding to each associated service type in each service type cluster, and extracting keywords from the sample data to obtain sample keywords;
and constructing various types of cluster keyword sets corresponding to the business type clusters according to the sample keywords.
It should be noted that a service type cluster includes a plurality of strongly associated service types. The service types can be clustered according to the dependency relationships between the sample data corresponding to each service type, yielding the service type clusters. Specifically, keywords are extracted from the sample data corresponding to each service type to obtain the keywords of that service type, and the keywords of the associated service types in each service type cluster are then collected to construct the corresponding type cluster keyword set. For example, if the service type cluster is order processing, the strongly associated service types include order request information, successful-transaction order information and failed-transaction order information. Keywords are extracted from the sample data corresponding to these three service types to obtain the sample keywords of each associated service type, and these sample keywords together form the type cluster keyword set corresponding to order processing.
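Constructing a type cluster keyword set from the sample keywords of its associated service types amounts to a set union, as the following sketch shows. The cluster name, the service type names and the sample keywords are invented for illustration, and `build_cluster_keyword_sets` is a hypothetical helper.

```python
def build_cluster_keyword_sets(clusters, sample_keywords):
    """Union the sample keywords of every service type in each cluster."""
    return {
        cluster: set().union(*(sample_keywords[t] for t in service_types))
        for cluster, service_types in clusters.items()
    }


# Hypothetical cluster definition and sample keywords.
clusters = {
    "order_processing": ["order_request", "trade_success", "trade_failure"],
}
sample_keywords = {
    "order_request": {"order", "request"},
    "trade_success": {"order", "paid"},
    "trade_failure": {"order", "rejected"},
}
cluster_keyword_sets = build_cluster_keyword_sets(clusters, sample_keywords)
```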
Step S103: matching each business keyword set with each type cluster keyword set to obtain the matching degree.
It should be understood that, to classify each piece of source data according to the service type clusters, the business keyword set corresponding to each piece of source data is matched with the type cluster keyword set corresponding to each service type cluster. The resulting matching degree between the two keyword sets serves as the matching degree between the source data and the service type cluster. A higher matching degree indicates a stronger association between the source data and the service type cluster, so each piece of source data can be classified into the service type cluster with which its matching degree is highest.
In a specific implementation, the business keyword sets corresponding to the pieces of source data are typically traversed, and each traversed business keyword set is matched with the type cluster keyword set of every service type cluster, giving the matching degree between each business keyword set and each type cluster keyword set. In this embodiment, the step S103 includes: traversing each business keyword set, and matching it with each type cluster keyword set to obtain the matching degree between each business keyword set and each type cluster keyword set.
For example, suppose the source data includes A, B and C, where the business keyword set of source data A is {a1, a2, a3, a4}, that of source data B is {a1, a2, a3, a4}, and that of source data C is {c1, c2, c3, c4}. The service type clusters include M and N, where the type cluster keyword set of M is {a1, a2, a3, a4, a5} and that of N is {a1, c2, c3, c4}. The business keyword sets of source data A and source data B both match the type cluster keyword set of M most closely, so source data A and source data B can be classified into service type cluster M and synchronized through the same link.
Step S104: classifying the source data corresponding to each business keyword set according to the matching degree to obtain link data of multiple categories.
It can be understood that the service type cluster whose type cluster keyword set has the highest matching degree with a business keyword set is generally taken as the category of the corresponding source data. In the above example, the business keyword sets of source data A and source data B have the highest matching degree with the type cluster keyword set of service type cluster M, so source data A and source data B can be classified into service type cluster M and used as link data of one category. Classifying the large-volume source data by matching degree divides it into link data of multiple categories; link data of different categories is transmitted through different links, which improves transmission efficiency and avoids the inefficiency of transmitting a large amount of source data through a single link.
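Steps S103 and S104 can be reproduced on the A, B, C / M, N example with a short Python sketch. The source does not define how the matching degree is computed; Jaccard similarity (intersection size over union size) is used here purely as an assumption, and the function names are hypothetical.

```python
def matching_degree(business_keywords, cluster_keywords):
    """Assumed measure: Jaccard similarity between the two keyword sets."""
    union = business_keywords | cluster_keywords
    if not union:
        return 0.0
    return len(business_keywords & cluster_keywords) / len(union)


def classify(business_sets, cluster_sets):
    """Assign each source datum to the cluster with the highest matching degree."""
    categories = {}
    for source, keywords in business_sets.items():
        best = max(cluster_sets,
                   key=lambda c: matching_degree(keywords, cluster_sets[c]))
        categories.setdefault(best, []).append(source)
    return categories


business_sets = {
    "A": {"a1", "a2", "a3", "a4"},
    "B": {"a1", "a2", "a3", "a4"},
    "C": {"c1", "c2", "c3", "c4"},
}
cluster_sets = {
    "M": {"a1", "a2", "a3", "a4", "a5"},
    "N": {"a1", "c2", "c3", "c4"},
}
categories = classify(business_sets, cluster_sets)
```

Under this measure A and B score 0.8 against M, while C scores 0.6 against N and 0 against M, so A and B form one category of link data and C forms another.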
In this embodiment, large-volume source data is acquired from the source end and parsed to obtain the business keyword set corresponding to each piece of source data; the type cluster keyword set corresponding to each service type cluster is acquired; each business keyword set is matched with each type cluster keyword set to obtain the matching degree; and the source data corresponding to each business keyword set is classified according to the matching degree to obtain link data of multiple categories. Matching between keyword sets reflects the degree of association between each piece of source data and each service type cluster, so the source data is classified effectively and associated service data is transmitted through the same link. This avoids the situation in which associated service data is transmitted through different links and, when one of those links fails, the rest of the associated data has to wait, which would delay its processing.
Referring to fig. 4, fig. 4 is a flow chart of a third embodiment of the large data volume data synchronization method according to the present invention, and based on the second embodiment shown in fig. 3, the third embodiment of the large data volume data synchronization method according to the present invention is proposed.
In a third embodiment, the step S20 includes:
Step S201: reading the online redo log file from the database of the source end through an extraction process.
It should be understood that the online redo log file is created when the database is created, and the database cannot do without it. OGG reads the online redo log file from the database of the source end through an extraction process (Extract process) and then parses it to obtain the data in the source data that needs to be synchronized.
Step S202: parsing the online redo log file to obtain the change data in the link data of each category.
It will be appreciated that, to avoid synchronizing a large amount of duplicate source data to the target end, every synchronization after the first generally only needs to synchronize the data that has changed at the source end. The online redo log file is parsed to obtain the changed data in each piece of source data, such as data produced by insert, delete or update operations, and the changed data in the pieces of source data forms the change data in the link data of the corresponding category.
Step S203: synchronizing the change data in the link data of each category to the target end through a transmission process, so that the target end starts a plurality of preset processes to process the change data in the link data of each category in parallel.
The change data in the link data of each category is synchronized to the target end through a different link, so that the change data in the large amount of source data is split across links and transmission efficiency is improved. The preset processes include REP processes; to further improve synchronization efficiency, a plurality of REP processes can be started simultaneously at the target end to process the change data in the link data in parallel.
In this embodiment, after the step S202, the method further includes:
converting the format of the change data in the link data of each category according to a preset rule to obtain the data to be transmitted in the link data of each category, and storing the data to be transmitted in a preset queue file;
the step S203 includes:
synchronizing the data to be transmitted in the link data of each category to a target end through a transmission process, so that the target end starts a plurality of preset processes to process the data to be transmitted in the link data of each category in parallel.
In a specific implementation, the link data is generally stored in a trail file. The preset rule is the custom intermediate format of GoldenGate Transaction Data Management (TDM): the change data in the link data of each category is converted into this format to obtain the data to be transmitted. The preset queue file is the trail file, in which the data to be transmitted is stored. The data to be transmitted in the link data of each category is synchronized to the target end through a different link, so that the change data in the large amount of source data is split across links and transmission efficiency is improved. The preset processes include REP processes; to further improve synchronization efficiency, a plurality of REP processes can be started simultaneously to process the data to be transmitted in the link data in parallel.
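The conversion-and-queue step can be illustrated with the sketch below. The GoldenGate intermediate format is proprietary and is not reproduced here; JSON Lines is used only as a stand-in "preset rule", and the file naming scheme, the record layout and the helper names are assumptions.

```python
import json
import os
import tempfile


def write_queue_files(changes_by_category, directory):
    """Serialize each category's change records into its own queue file."""
    paths = {}
    for category, changes in changes_by_category.items():
        path = os.path.join(directory, f"{category}.queue")
        with open(path, "w", encoding="utf-8") as handle:
            for change in changes:
                # One record per line, in a fixed key order (stand-in format).
                handle.write(json.dumps(change, sort_keys=True) + "\n")
        paths[category] = path
    return paths


def read_queue_file(path):
    """Read the change records back in the order they were written."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


changes_by_category = {
    "order": [{"op": "insert", "table": "orders", "row": {"id": 1}}],
    "billing": [{"op": "delete", "table": "invoices", "row": {"id": 7}}],
}
with tempfile.TemporaryDirectory() as directory:
    paths = write_queue_files(changes_by_category, directory)
    restored = {c: read_queue_file(p) for c, p in paths.items()}
```

Writing one queue file per category keeps the later transmission per link independent of the other categories.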
In this embodiment, the synchronizing, by a transmission process, the data to be transmitted in the link data of each category to the target end includes:
synchronizing the data to be transmitted in the link data of each category to a target end through a transmission process, and recording the synchronization time;
acquiring a current time, and detecting whether synchronization completion information sent by the target end is received or not when the time difference between the current time and the synchronization time exceeds a preset synchronization time;
and if not, repeating the step of synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process.
It should be understood that, in order to avoid data synchronization failure caused by a transmission link failure or connection delay, it is necessary to monitor whether each source data completes data synchronization within the preset synchronization time. The preset synchronization time may be obtained as follows: the data amount synchronized per unit time is calculated from the time records of previous data synchronizations, the data amount of the large-data-volume source data is counted, and the time required for data synchronization is calculated from these two quantities and used as the preset synchronization time. Normally, the source data can be successfully synchronized to the target end within the preset synchronization time.
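The calculation of the preset synchronization time described above can be sketched in Python. The function name `estimate_sync_time`, the representation of the history as (bytes, seconds) pairs, and the optional `safety_factor` margin are assumptions for illustration; the text itself only specifies dividing the pending data amount by the data amount synchronized per unit time.

```python
def estimate_sync_time(history, pending_bytes, safety_factor=1.0):
    """Estimate the preset synchronization time in seconds.

    history: list of (bytes_synced, seconds_taken) pairs from earlier runs.
    pending_bytes: data amount of the source data to be synchronized.
    safety_factor: hypothetical margin multiplier, not part of the described method.
    """
    total_bytes = sum(synced for synced, _ in history)
    total_seconds = sum(seconds for _, seconds in history)
    if total_bytes <= 0 or total_seconds <= 0:
        raise ValueError("history must contain positive byte and time totals")
    throughput = total_bytes / total_seconds   # data amount synchronized per unit time
    return pending_bytes / throughput * safety_factor
```

For example, if earlier runs moved 400 bytes in 20 seconds in total, 200 pending bytes yield an estimate of 10 seconds.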
It can be understood that, to make the elapsed time easy to calculate, the synchronization time is recorded when the data to be transmitted in the link data of each category is synchronized to the target end through the transmission process, and the current time is then acquired. When the time difference obtained by subtracting the synchronization time from the current time exceeds the preset synchronization time, the target end would under normal conditions already have completed the synchronization of the data sent by the source end and sent synchronization completion information to the source end. If the synchronization completion information has not been received at this point, the data synchronization is considered to have failed, and the step of synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process needs to be executed again, so as to ensure that each source data of the source end is synchronized to the target end.
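A minimal Python sketch of this timeout-and-resend logic follows. The callables `send` and `completion_received`, the `max_attempts` cap, and the injectable `clock` and `sleep` parameters are assumptions introduced for illustration; the text does not state a retry limit.

```python
import time


def sync_with_retry(send, completion_received, preset_sync_time,
                    max_attempts=3, poll_interval=0.01,
                    clock=time.monotonic, sleep=time.sleep):
    """Send the data, then resend if no completion notice arrived within the preset time.

    send: hypothetical callable that transmits the data to be transmitted.
    completion_received: hypothetical callable returning True once the target end
        has reported that synchronization is complete.
    max_attempts: an assumed cap on attempts; the source text does not state one.
    Returns the number of attempts used, or raises TimeoutError.
    """
    for attempt in range(1, max_attempts + 1):
        send()
        sync_moment = clock()                    # record the synchronization time
        # Wait until the elapsed time exceeds the preset synchronization time.
        while clock() - sync_moment <= preset_sync_time:
            sleep(poll_interval)
        if completion_received():                # checked only after the deadline
            return attempt
    raise TimeoutError("no synchronization completion information received")
```

Passing the clock and sleep functions as parameters is a design choice that lets the loop be exercised with a simulated clock instead of real waiting.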
In this embodiment, an online redo log file is read from the database of the source end by an extraction process and parsed to obtain the change data in the link data of each category. The change data in the link data of each category is then synchronized to the target end by a transmission process, so that the target end starts a plurality of preset processes to process the change data in the link data of each category in parallel. Because only the change data in the link data of each category is synchronized to the target end, the amount of data to be synchronized is reduced, which improves data synchronization efficiency.
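The idea of keeping only change data and grouping it by category can be illustrated with the Python sketch below. It does not parse a real online redo log: the log is modeled as a list of simple dictionaries, and the record layout, the `CHANGE_OPS` set, and the table-to-category mapping are all hypothetical.

```python
from collections import defaultdict

CHANGE_OPS = {"INSERT", "UPDATE", "DELETE"}


def extract_changes(log_records, category_of_table):
    """Keep only change operations and group them by the category of their table.

    log_records: iterable of dicts such as {"op": "INSERT", "table": "orders"};
        this layout is hypothetical and far simpler than a real redo log.
    category_of_table: mapping from table name to link category.
    """
    grouped = defaultdict(list)
    for record in log_records:
        if record.get("op") not in CHANGE_OPS:
            continue                      # skip non-change entries such as commits
        category = category_of_table.get(record.get("table"))
        if category is None:
            continue                      # table is not assigned to any link
        grouped[category].append(record)
    return dict(grouped)
```

Records that are not changes, or whose table has no assigned category, are dropped, so only change data reaches the transmission step.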
In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium stores a data synchronization program with large data volume, and the data synchronization program with large data volume realizes the steps of the data synchronization method with large data volume when being executed by a processor.
In addition, referring to fig. 5, an embodiment of the present invention further provides a data synchronization device with a large data volume, where the data synchronization device with a large data volume includes:
the classification module 10 is configured to obtain source data with a large data volume from a source end, classify the source data with the large data volume according to a service type cluster, and obtain link data with multiple categories;
and the synchronization module 20 is configured to synchronize the link data of each class to the target end through different links, so that the target end starts a plurality of preset processes to process the link data of each class in parallel.
In an embodiment, the data synchronization device for large data volume further includes:
the analysis module is used for acquiring the source data with large data volume from the source terminal, analyzing the source data with large data volume and acquiring each business keyword set corresponding to each source data;
the acquisition module is used for acquiring various types of cluster keyword sets corresponding to the business type clusters;
the matching module is used for respectively matching each business keyword set with each type of cluster keyword set to obtain the matching degree;
the classification module 10 is further configured to classify each source data corresponding to each service keyword set according to the matching degree, so as to obtain link data of multiple categories.
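The cooperation of the matching module and the classification module can be sketched in Python as follows. The text does not define how the matching degree is computed; the sketch assumes it is the fraction of a source's business keywords that appear in a cluster keyword set, and assumes each source is assigned to the cluster with the highest degree. The function names are hypothetical.

```python
def matching_degree(business_keywords, cluster_keywords):
    """Fraction of the business keywords that also appear in the cluster keyword set."""
    business_keywords = set(business_keywords)
    if not business_keywords:
        return 0.0
    return len(business_keywords & set(cluster_keywords)) / len(business_keywords)


def classify(business_keyword_sets, cluster_keyword_sets):
    """Assign each source data item to the cluster with the highest matching degree."""
    if not cluster_keyword_sets:
        raise ValueError("at least one cluster keyword set is required")
    categories = {}
    for source_id, keywords in business_keyword_sets.items():
        degrees = {
            cluster: matching_degree(keywords, cluster_keywords)
            for cluster, cluster_keywords in cluster_keyword_sets.items()
        }
        # On a tie, max() keeps the first cluster in dictionary order.
        best = max(degrees, key=degrees.get)
        categories.setdefault(best, []).append(source_id)
    return categories
```

Each resulting category then corresponds to one set of link data that is transmitted over its own link.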
In one embodiment, the service type cluster includes a plurality of associated service types;
the data synchronization device of large data volume further includes:
the extraction module is used for obtaining sample data corresponding to each associated service type in each service type cluster, extracting keywords from the sample data and obtaining sample keywords;
and the construction module is used for constructing various cluster keyword sets corresponding to the business type clusters according to the sample keywords.
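The extraction and construction modules can be illustrated with a short Python sketch. Keyword extraction is assumed here to mean taking the most frequent word tokens of each sample, which is a simplification chosen for illustration; the nested dictionary layout of the samples and the function names are likewise hypothetical.

```python
import re
from collections import Counter


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent lowercase word tokens in the text."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    return [word for word, _ in Counter(tokens).most_common(top_n)]


def build_cluster_keyword_sets(samples_by_cluster, top_n=5):
    """Union the sample keywords of every associated service type in each cluster.

    samples_by_cluster: {cluster: {service_type: sample_text}} (hypothetical layout).
    """
    cluster_keyword_sets = {}
    for cluster, samples in samples_by_cluster.items():
        keywords = set()
        for sample_text in samples.values():
            keywords.update(extract_keywords(sample_text, top_n))
        cluster_keyword_sets[cluster] = keywords
    return cluster_keyword_sets
```

Because a cluster groups several associated service types, its keyword set is built as the union of the sample keywords of all of them.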
In an embodiment, the matching module is further configured to traverse each service keyword set, and match the service keyword set with each type of cluster keyword set, so as to obtain a matching degree between each service keyword set and each type of cluster keyword set.
In an embodiment, the data synchronization device for large data volume further includes:
the reading module is used for reading the online redo log file from the database of the source end through the extraction process;
the analysis module is further used for analyzing the online redo log file to obtain change data in the link data of each category;
the synchronization module 20 is further configured to synchronize the change data in the link data of each class to the target through the transmission process, so that the target starts a plurality of preset processes to process the change data in the link data of each class in parallel.
In an embodiment, the data synchronization device for large data volume further includes:
the conversion module is used for carrying out format conversion on the change data in the link data of each category according to a preset rule to obtain data to be transmitted in the link data of each category, and storing the data to be transmitted in a preset queue file;
the synchronization module 20 is further configured to synchronize the data to be transmitted in the link data of each class to a target end through a transmission process, so that the target end starts a plurality of preset processes to process the data to be transmitted in the link data of each class in parallel.
In an embodiment, the synchronization module 20 is further configured to synchronize, by a transmission process, the data to be transmitted in the link data of each category to a target end, and record a synchronization time;
the data synchronization device of large data volume further includes:
the detection module is used for acquiring the current moment, and detecting whether synchronization completion information sent by the target end is received or not when the time difference between the current moment and the synchronization moment exceeds the preset synchronization time;
the synchronization module 20 is further configured to repeatedly execute the step of synchronizing the data to be transmitted in the link data of each category to the target end through the transmission process if the synchronization completion information is not received.
Other embodiments or specific implementation manners of the data synchronization device with large data volume according to the present invention may refer to the above method embodiments, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description and do not represent the advantages or disadvantages of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. does not denote any order; these terms are to be interpreted as labels.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk) and comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the content disclosed herein, or any direct or indirect application of it in other related technical fields, is likewise covered by the scope of protection of the present invention.

Claims (7)

1. A data synchronization method for a large data volume, characterized in that the data synchronization method for a large data volume comprises the steps of:
acquiring source data with large data volume from a source terminal, classifying the source data with large data volume according to a service type cluster, and acquiring link data with multiple categories;
synchronizing the link data of each category to a target end through different links respectively, so that the target end starts a plurality of preset processes to process the link data of each category in parallel;
after the step of synchronizing the link data of each category to the target end through different links, the method further includes:
recording a synchronization time, and detecting whether synchronization completion information sent by the target terminal is received or not when the time difference between the current time and the synchronization time exceeds a preset synchronization time;
if not, returning to the step of synchronizing the link data of each category to the target end through different links;
before the step of synchronizing the link data of each category to the target end through different links, the method further comprises:
calculating the synchronous data quantity in unit time according to the prior data synchronous time record, and counting the data quantity of the source data with large data quantity;
calculating the preset synchronization time required by the data synchronization of the large data amount of source data according to the data amount synchronized in the unit time and the data amount of the large data amount of source data;
the step of synchronizing the link data of each category to the target end through different links so that the target end starts a plurality of preset processes to process the link data of each category in parallel includes:
reading an online redo log file from a database of the source end through an extraction process of OGG (Oracle GoldenGate);
analyzing the online redo log file to obtain change data in the link data of each category;
converting the format of the change data in the link data of each category according to a preset rule to obtain the data to be transmitted in the link data of each category, and storing the data to be transmitted in a preset queue file;
synchronizing the data to be transmitted in the link data of each category to a target end through a transmission process, so that the target end starts a plurality of preset processes to process the data to be transmitted in the link data of each category in parallel.
2. The method for synchronizing large data volume data according to claim 1, wherein said obtaining large data volume source data from a source end, classifying said large data volume source data according to a service type cluster, obtaining a plurality of kinds of link data, comprises:
acquiring source data with large data volume from a source terminal, and analyzing the source data with large data volume to acquire each business keyword set corresponding to each source data;
acquiring a keyword set of each type of cluster corresponding to each service type cluster;
matching each business keyword set with each type of cluster keyword set to obtain a matching degree;
and classifying each source data corresponding to each business keyword set according to the matching degree to obtain link data of a plurality of categories.
3. The method for large data volume data synchronization of claim 2, wherein said service type cluster comprises a plurality of associated service types;
before the step of matching each business keyword set with each type of cluster keyword set to obtain the matching degree, the method for synchronizing data of large data volume further comprises:
acquiring sample data corresponding to each associated service type in each service type cluster, and extracting keywords from the sample data to obtain sample keywords;
and constructing various types of cluster keyword sets corresponding to the business type clusters according to the sample keywords.
4. The method for synchronizing data of large data volume as recited in claim 3, wherein said matching each service keyword set with each type of cluster keyword set to obtain a matching degree comprises:
traversing each business keyword set, and respectively matching the business keyword sets with each type of cluster keyword set to obtain the matching degree between each business keyword set and each type of cluster keyword set.
5. A large data amount data synchronization apparatus, characterized by comprising: a memory, a processor and a large data amount data synchronization program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the large data amount data synchronization method as claimed in any one of claims 1 to 4.
6. A storage medium having stored thereon a large data amount data synchronization program which, when executed by a processor, implements the steps of the large data amount data synchronization method according to any one of claims 1 to 4.
7. A large data amount data synchronization device, characterized in that the large data amount data synchronization device comprises:
the classification module is used for acquiring source data with large data volume from a source terminal, classifying the source data with large data volume according to a service type cluster and acquiring link data with multiple categories;
the synchronization module is used for synchronizing the link data of each category to the target end through different links respectively, so that the target end starts a plurality of preset processes to process the link data of each category in parallel;
the synchronization module is further configured to:
recording a synchronization time, and detecting whether synchronization completion information sent by the target terminal is received or not when the time difference between the current time and the synchronization time exceeds a preset synchronization time;
if not, repeating the operation of synchronizing the link data of each category to the target end through different links;
the synchronization module is further configured to:
calculating the synchronous data quantity in unit time according to the prior data synchronous time record, and counting the data quantity of the source data with large data quantity;
calculating the preset synchronization time required by the data synchronization of the large data amount of source data according to the data amount synchronized in the unit time and the data amount of the large data amount of source data;
the synchronization module is further configured to:
reading an online redo log file from a database of the source end through an extraction process of OGG (Oracle GoldenGate);
analyzing the online redo log file to obtain change data in the link data of each category;
converting the format of the change data in the link data of each category according to a preset rule to obtain the data to be transmitted in the link data of each category, and storing the data to be transmitted in a preset queue file;
synchronizing the data to be transmitted in the link data of each category to a target end through a transmission process, so that the target end starts a plurality of preset processes to process the data to be transmitted in the link data of each category in parallel.
CN201811188299.2A 2018-10-11 2018-10-11 Method, device, storage medium and apparatus for synchronizing large data volume data Active CN109656999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811188299.2A CN109656999B (en) 2018-10-11 2018-10-11 Method, device, storage medium and apparatus for synchronizing large data volume data

Publications (2)

Publication Number Publication Date
CN109656999A CN109656999A (en) 2019-04-19
CN109656999B true CN109656999B (en) 2024-03-15

Family

ID=66110491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811188299.2A Active CN109656999B (en) 2018-10-11 2018-10-11 Method, device, storage medium and apparatus for synchronizing large data volume data

Country Status (1)

Country Link
CN (1) CN109656999B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110417892B (en) * 2019-07-31 2021-08-27 中国工商银行股份有限公司 Message analysis-based data replication link optimization method and device
CN110750592B (en) * 2019-09-06 2023-10-20 中国平安财产保险股份有限公司 Data synchronization method, device and terminal equipment
CN110825711A (en) * 2019-10-17 2020-02-21 上海易点时空网络有限公司 Method and device for transmitting data in quick partitioning mode based on Flume
CN113051278B (en) * 2019-12-27 2023-04-07 中国移动通信集团湖北有限公司 Processing method and system for data replication process delay
CN111586099B (en) * 2020-04-01 2023-03-24 烽火通信科技股份有限公司 Cross-node data backup synchronization method and system
CN111541770B (en) * 2020-04-24 2023-08-15 深圳市元征科技股份有限公司 Vehicle data sorting method and related products
CN112347192A (en) * 2020-11-16 2021-02-09 百度在线网络技术(北京)有限公司 Data synchronization method, device, platform and readable medium
CN114911862B (en) * 2022-07-18 2022-12-06 国网江苏省电力有限公司营销服务中心 System and method for transmitting big data of network national network operation link

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101419615A (en) * 2008-12-10 2009-04-29 阿里巴巴集团控股有限公司 Method and apparatus for synchronizing foreground and background databases
CN102081665A (en) * 2008-12-10 2011-06-01 阿里巴巴集团控股有限公司 Data synchronization method and device
CN103561472A (en) * 2013-10-30 2014-02-05 中国人民解放军理工大学 Multi-service link distribution and reorganization device and method
CN103778136A (en) * 2012-10-19 2014-05-07 阿里巴巴集团控股有限公司 Cross-room database synchronization method and system
WO2016101752A1 (en) * 2014-12-22 2016-06-30 北京奇虎科技有限公司 Method and device for data synchronization
CN106528893A (en) * 2016-12-26 2017-03-22 北京奇虎科技有限公司 Data synchronization method and device
CN107656958A (en) * 2017-06-09 2018-02-02 平安科技(深圳)有限公司 A kind of classifying method and server of multi-data source data
CN107689982A (en) * 2017-06-25 2018-02-13 平安科技(深圳)有限公司 Multi-data source method of data synchronization, application server and computer-readable recording medium
CN108509482A (en) * 2018-01-23 2018-09-07 深圳市阿西莫夫科技有限公司 Question classification method, device, computer equipment and storage medium
CN108596785A (en) * 2018-04-26 2018-09-28 广州供电局有限公司 Processing method, device, computer equipment and the storage medium of power equipment data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101674128B (en) * 2008-09-12 2013-04-24 电信科学技术研究院 Method and device for controlling data transmission
CN101764746B (en) * 2009-12-17 2012-07-18 中国电力科学研究院 Method and device for sending data
CN108241693B (en) * 2016-12-26 2020-10-27 北京国双科技有限公司 Method and device for synchronizing data
CN107798039B (en) * 2017-05-19 2020-06-05 平安科技(深圳)有限公司 Data synchronization method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
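Oracle GoldenGate performance optimization method based on a power consumption information acquisition system (基于用电信息采集系统的Oracle GoldenGate性能优化方法); 杨国良; 董京; 唐如意; 刘金亮; 李雄; 赵佩; 河北电力技术; 2017-04-30; Vol. 36, No. 02; pp. 22-26 *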

Also Published As

Publication number Publication date
CN109656999A (en) 2019-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant