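WO2019047479A1 - Universal multi-source heterogeneous large-scale data synchronization system - Google Patents

Universal multi-source heterogeneous large-scale data synchronization system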

Info

Publication number
WO2019047479A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
synchronization
change
server
client
Application number
PCT/CN2018/076485
Inventor
杨海涛
徐飞
阮镇江
Original Assignee
广东省建设信息中心
Application filed by 广东省建设信息中心
Publication of WO2019047479A1
Priority to US16/812,243 (US11500903B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/61 Installation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5011 Pool
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5018 Thread allocation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/541 Client-server

Definitions

  • The invention relates to the field of Internet data processing technology, and in particular to a large-scale data synchronization middle-layer system for a loosely coupled wide-area network computing environment connected by Internet communication protocols. The system is applicable to a variety of mainstream data system types and can cover a large number of autonomously managed heterogeneous data source nodes.
  • Heterogeneous data refers to data of different structures.
  • The heterogeneity of data shows up in several ways: heterogeneous computer architectures, where the data is physically stored on computers with different architectures; heterogeneous operating systems, where the data is stored under different operating systems;
  • heterogeneous data formats, where the storage management mechanism may be a relational database system such as Oracle, SQL Server, or DB2, or file-type two-dimensional data such as txt, CSV, or XLS;
  • heterogeneous storage locations, where the data is stored in dispersed physical locations; and heterogeneous logical models, where the data is stored and maintained under different business logic, so that data with the same meaning is represented differently. For example, the codes for departments may be inconsistent between an independent sales system and an independent procurement system.
  • Heterogeneous data is often heterogeneous at several levels at once rather than at a single level.
  • The widespread use of mobile terminals (such as mobile phones, PDAs, iPads, and laptops) generates a large amount of individual mobile terminal data, including contacts, calendars, and files, and the logical or physical implementation of its storage structure may be heterogeneous. Backing up mobile terminal data and synchronizing data between mobile terminals therefore also require heterogeneous data synchronization.
  • The massive data generated by large-scale cloud computing applications also needs real-time data replication, and such applications likewise involve large amounts of heterogeneous data.
  • Existing heterogeneous data synchronization is implemented by transferring data between a source database and a target database. Because the data differs in structure and type, or shares the same semantics but is expressed in different forms, the throughput of synchronizing massive heterogeneous data is limited and synchronization efficiency is low. Existing synchronization schemes also have technical limitations. For example, when synchronizing loosely coupled autonomous applications over an unreliable Internet communication environment, a high-throughput synchronization may terminate abnormally, so that the data synchronization fails.
  • What is needed is a practical solution that meets current multi-source heterogeneous large-scale data synchronization needs: preferably generic, reliable, and efficient; relatively independent of the application-layer logic above and the operating system below; and independent of specific database products and computer operating systems, in the form of a loosely coupled middle-layer software component system.
  • The technical problem to be solved by the present invention is to overcome the limited data throughput of existing synchronization methods for massive heterogeneous data.
  • An embodiment of the present invention provides a universal multi-source heterogeneous large-scale data synchronization system, including: a synchronization network planning management unit, configured to construct a tree-structured synchronization topology, where the synchronization topology includes a plurality of synchronization pairs, each consisting of a pair of adjacent nodes, and each synchronization pair includes a client and a server;
  • a block pipeline processing unit, configured to divide the change data of the source data end participating in synchronization, as sent by the client, into a plurality of change blocks and transmit them to the server in batches according to network conditions and the processing capability of the server. Each time a batch of change blocks has been sent, the sending process is locked until the server returns a receipt confirmation message, after which sending of the next batch resumes; this loop repeats until all change data has been transmitted. The change blocks received by the server are stored in the corresponding message queue;
  • a one-way synchronization unit, configured to schedule a synchronization thread of the client to start a data synchronization operation when the change event monitoring thread of the client detects that the change log table is not empty, and to schedule a number of data update threads of the server to perform data update operations when the message queue monitoring thread of the server detects that the message queue is not empty;
  • a two-way synchronization unit, configured to perform two one-way synchronizations in opposite directions, one after the other, to complete a two-way synchronization; and
  • a synchronization correctness guarantee unit, configured to record, at the client, the data change events of the source data end in the change log table in their order of occurrence, and, at the server, to receive the data changes of the source data end, apply each data change in the original order, and record it in the synchronization log table.
  • The client includes a middle-layer software component for implementing full/incremental data synchronization;
  • the server includes a software component that receives the source data changes sent by the client and is responsible for applying the received data changes to the target data end.
  • The source data end includes the database or file directory holding the data copy that serves as the source of data changes in data synchronization; the target data end includes the database or file directory holding the data copy that responds to data changes from the source data end in data synchronization.
  • The intermediate nodes of the tree-structured synchronization topology are provided with both a client and a server, which belong to different synchronization pairs.
  • The change log table of the target data end is used to save data change information.
  • The installation configuration unit further includes a synchronization configuration subunit, configured to: control the client to acquire metadata information of the source data end participating in synchronization and transmit it to the server; control the server to build and store heterogeneous data mapping rules according to the metadata information; control the client to obtain the heterogeneous data mapping rules from the server and create three change capture triggers (insert, delete, and change) for each participating data table on the source data end; and control the server to provide a visual configuration tool for maintaining or adjusting the heterogeneous data mapping rules.
  • The data update thread pool that the installation configuration unit pre-creates on the server contains a preset number of data update threads, and the data update of each data table in the target data end is handled by one data update thread;
  • the synchronization thread pool that the installation configuration unit pre-creates on the client contains a preset number of data synchronization threads created according to the metadata information, and the incremental or full synchronization of each data table in the source data end is handled by one synchronization thread or one group of synchronization threads.
  • The one-way synchronization unit is further configured to control the change capture triggers so that, whenever a data change occurs on the source data end, the data change event is captured and the corresponding data change information is stored in the change log table of the client.
  • The data change information includes the table name of the data table in which the change occurred, the primary key value of the changed data record, and the type of change operation.
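
The trigger-based change capture described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The patent does not prescribe a database product or schema, so the table names (`dept`, `change_log`), column names, and trigger names below are hypothetical.

```python
import sqlite3

def install_change_capture(conn, table, pk):
    """Create a change log table plus insert/delete/update triggers for one table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS change_log ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"  # preserves order of occurrence
        " table_name TEXT, pk_value TEXT, op TEXT)"
    )
    # INSERT and UPDATE read the key from the NEW row, DELETE from the OLD row.
    for op, row in (("INSERT", "NEW"), ("DELETE", "OLD"), ("UPDATE", "NEW")):
        conn.execute(
            f"CREATE TRIGGER IF NOT EXISTS trg_{table}_{op.lower()} "
            f"AFTER {op} ON {table} BEGIN "
            f"INSERT INTO change_log (table_name, pk_value, op) "
            f"VALUES ('{table}', {row}.{pk}, '{op}'); END"
        )

# Hypothetical source table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (code TEXT PRIMARY KEY, name TEXT)")
install_change_capture(conn, "dept", "code")

conn.execute("INSERT INTO dept VALUES ('D01', 'Sales')")
conn.execute("UPDATE dept SET name = 'Procurement' WHERE code = 'D01'")
conn.execute("DELETE FROM dept WHERE code = 'D01'")

captured = conn.execute(
    "SELECT table_name, pk_value, op FROM change_log ORDER BY seq"
).fetchall()
```

Each log row carries exactly the three items listed above (table name, primary key value, operation type); the auto-incrementing `seq` column is an added assumption that keeps the order of occurrence.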
  • The one-way synchronization unit is further configured to control the synchronization thread of the client to divide the change data or synchronization operation records of each changed data table into multiple change blocks according to that table's synchronization preset value, encapsulate each change block into a SyncML message packet, and send the packets to the server in sequence.
  • The one-way synchronization unit is further configured to control the server to allocate, for the synchronization session request of each SyncML message packet, a data receiving thread that receives the SyncML message packet uploaded by the client. The data receiving thread receives the SyncML message packet, parses it to restore the change block, stores the restored change block in the designated message queue, and returns synchronization success information to the client once the block has been stored.
  • The one-way synchronization unit is further configured to control the synchronization thread of the client to join the change log table with the data table in which the change occurred, sort the records in ascending order of their insertion time into the change log table, and read the change log table in that order. After the server has received a change block and returned confirmation information, the synchronization thread deletes the change log records corresponding to that change block from the change log table.
  • The one-way synchronization unit is further configured to control a data update thread of the server to read change blocks from the message queue and apply the data changes locally according to the heterogeneous data mapping rules, so that the synchronized data in the target data copy at the target data end stays consistent with that in the source data copy at the source data end.
  • The one-way synchronization unit is further configured to control the synchronization thread of the client to compute a hash of each change block to be sent and encapsulate the hash value together with the change block in a SyncML message packet, and to control the data receiving thread of the server, after receiving the SyncML message packet, to hash the parsed change block and compare the result with the received hash value. The change block is stored in the message queue when verification succeeds; otherwise a synchronization failure message is returned.
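
A minimal sketch of this hash check follows. The patent does not name a hash algorithm or a serialization format, so SHA-256 and JSON are assumptions here, the function names are hypothetical, and the SyncML packaging is left out.

```python
import hashlib
import json

def pack_block(changes):
    """Client side: serialize a change block and attach its SHA-256 digest."""
    payload = json.dumps(changes, sort_keys=True).encode("utf-8")
    return {"payload": payload, "digest": hashlib.sha256(payload).hexdigest()}

def unpack_block(message):
    """Server side: recompute the digest; return the changes, or None on mismatch."""
    if hashlib.sha256(message["payload"]).hexdigest() != message["digest"]:
        return None  # the caller would answer with a synchronization failure message
    return json.loads(message["payload"].decode("utf-8"))

block = [{"table": "dept", "pk": "D01", "op": "UPDATE"}]
msg = pack_block(block)

ok = unpack_block(msg)                                  # intact message
tampered = dict(msg, payload=msg["payload"] + b" ")     # simulate corruption in transit
bad = unpack_block(tampered)
```

Only a block that passes the check would be placed on the message queue; a failed check leaves the block with the client for retransmission.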
  • The one-way synchronization unit is further configured to control the size of the change block and to control the synchronization thread of the client to enter a lock-wait state after each change block is sent, until the server returns an acknowledgement message or the wait times out.
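
The lock-wait behavior can be sketched as a stop-and-wait loop. This is an illustration under assumptions, not the patented implementation: `transmit` is a hypothetical transport callback, and the stand-in server simply drops one block to show the timeout path.

```python
import threading

def send_with_ack(blocks, transmit, timeout=1.0):
    """Send blocks one at a time, waiting for an acknowledgement after each.

    `transmit(block, ack)` is a hypothetical transport callback that must call
    ack.set() once the server confirms receipt. Returns the number of blocks
    acknowledged before the first timeout.
    """
    sent = 0
    for block in blocks:
        ack = threading.Event()
        transmit(block, ack)
        if not ack.wait(timeout):  # lock-wait state: blocks until ack or timeout
            break                  # stop here; unsent blocks remain for a later run
        sent += 1
    return sent

received = []

def fake_server(block, ack):
    """Stand-in server: acknowledges from another thread but drops block 'c'."""
    def handle():
        if block != "c":
            received.append(block)
            ack.set()
    threading.Thread(target=handle).start()

count = send_with_ack(["a", "b", "c", "d"], fake_server, timeout=0.2)
```

Because the sender never has more than one unacknowledged block in flight, a dropped connection costs at most one block of retransmission.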
  • The heterogeneous data mapping rules in the heterogeneous data mapping table that the installation configuration unit creates on the server include, for the synchronized source data end and target data end, the data table name, the primary key or virtual primary key, the field names, the field data types, the field data lengths, and the field mapping relationships.
  • The installation configuration unit is further configured to construct virtual primary keys in the heterogeneous data mapping rules: when a data table of the source data end does not define a primary key, the installation configuration unit controls the server to construct, from the field information in the metadata information, a virtual primary key for the data table that uniquely identifies its data records, and to store the construction rule of the virtual primary key on the server.
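
The patent does not spell out the virtual primary key algorithm at this point, so the following is only one plausible sketch: pick the shortest leading set of columns whose combined values are unique, store that column list as the construction rule, and hash the column values into a key. The function names, the sample table, and the use of SHA-1 are all assumptions.

```python
import hashlib

def choose_key_columns(columns, rows):
    """Return the shortest prefix of `columns` whose values are unique across `rows`."""
    for n in range(1, len(columns) + 1):
        keys = {tuple(row[c] for c in columns[:n]) for row in rows}
        if len(keys) == len(rows):
            return columns[:n]
    return None  # rows contain exact duplicates, so no column set can identify them

def virtual_key(row, key_columns):
    """Derive a fixed-length key from the chosen columns of one record."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(str(row[c]) for c in key_columns)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()

# Hypothetical table without a declared primary key.
rows = [
    {"dept": "Sales", "year": 2017, "budget": 10},
    {"dept": "Sales", "year": 2018, "budget": 12},
    {"dept": "Procurement", "year": 2018, "budget": 12},
]
rule = choose_key_columns(["dept", "year", "budget"], rows)
keys = [virtual_key(r, rule) for r in rows]
```

Here `rule` plays the part of the stored construction rule: the server can reapply it to any later record of the same table to obtain the same identifier.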
  • The installation configuration unit is further configured to check whether the server has a message queue corresponding to the client and, if not, to control the server to create one.
  • The message queue temporarily stores the change blocks that the server receives from the corresponding client.
  • An embodiment of the present invention provides a universal multi-source heterogeneous large-scale data synchronization method, including: acquiring heterogeneous data to be synchronized; acquiring at least one data table according to the heterogeneous data, where the data table is a table, created according to the mapping rule, that includes identification information of the heterogeneous data; establishing a data synchronization thread for each of the data tables; and dividing, by each data synchronization thread, the at least one data table into multiple data blocks according to a preset value.
  • The heterogeneous data includes incremental heterogeneous data.
  • Acquiring the heterogeneous data to be synchronized includes: creating a data change log table, where the data change log table includes a field description, a field name, and a field type; capturing change event information of the data by a trigger; and recording the change event information of the data in the data change log table.
  • The universal multi-source heterogeneous large-scale data synchronization method further includes uploading the data block to a server.
  • The data block is encapsulated as a message packet.
  • An embodiment of the present invention provides a universal multi-source heterogeneous large-scale data synchronization method, including: acquiring the data block of the embodiment of the second aspect; placing the data block into a message queue; and retrieving the data block from the message queue and synchronizing it to the target database.
  • Acquiring the data block of the embodiment of the second aspect includes: receiving a message packet uploaded by a client, where the message packet includes a plurality of data blocks; and parsing the message packet into data blocks.
  • Before the step of retrieving the data block from the message queue and synchronizing it to the target database, the method includes: determining whether the data block includes a primary key value; and, when the data block does not include a primary key value, constructing a virtual primary key from the values of multiple attribute columns of the data block.
  • An embodiment of the present invention provides a universal multi-source heterogeneous large-scale data synchronization apparatus, including: a heterogeneous data acquisition unit, configured to acquire heterogeneous data to be synchronized; a data table acquisition unit, configured to acquire at least one data table according to the heterogeneous data, where the data table is a table, created according to a mapping rule, that includes identification information of the heterogeneous data; a thread establishing unit, configured to establish a data synchronization thread for each of the data tables; and a data block dividing unit, configured to divide the at least one data table into a plurality of data blocks according to a preset value.
  • The heterogeneous data includes incremental heterogeneous data.
  • The heterogeneous data acquisition unit includes: a change log table creation subunit, configured to create a data change log table, where the data change log table includes a field description, a field name, and a field type; a data change capture subunit, configured to capture change event information of the data by a trigger; and a data change record subunit, configured to record the change event information of the data in the data change log table.
  • The universal multi-source heterogeneous large-scale data synchronization apparatus further includes a data uploading unit, configured to upload the data block to the server.
  • The data uploading unit further includes an encapsulating subunit, configured to encapsulate the data block into a message packet.
  • An embodiment of the present invention provides a universal multi-source heterogeneous large-scale data synchronization apparatus, including: an obtaining unit, configured to acquire the data block; a message queue unit, configured to place the data block in a message queue; and a synchronization unit, configured to retrieve the data block from the message queue and synchronize it to the target database.
  • The obtaining unit includes: a receiving subunit, configured to receive a message packet uploaded by the client, where the message packet includes a plurality of data blocks; and a parsing subunit, configured to parse the message packet into data blocks.
  • The universal multi-source heterogeneous large-scale data synchronization apparatus further includes: a determining unit, configured to determine whether the data block includes a primary key value; and a virtual primary key construction unit, configured to construct a virtual primary key from the values of a plurality of attribute columns of the data block when the data block does not include a primary key value.
  • An embodiment of the present invention provides a client, including: at least one processor; and a memory communicatively coupled to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the heterogeneous data synchronization method described in the second aspect.
  • An embodiment of the present invention provides a server, including a wireless network interface, a processor, and a memory interconnected by a bus, where the memory stores computer instructions that the processor executes to implement the heterogeneous data synchronization method described in the third aspect.
  • The present invention provides a universal multi-source heterogeneous large-scale data synchronization system that is distributed across the data nodes (that is, computer systems) participating in synchronization and works at an intermediate level between the node database layer and the application logic layer.
  • The client transmits the captured local change information to the server over the SyncML protocol according to the synchronization task plan; the server receives the change information and stores it in the corresponding message queue through an asynchronous parallel message processing mechanism.
  • The server polls the local message queue to read pending change information and then applies the data changes according to the heterogeneous data mapping rules, keeping the synchronized data of the source data copy and the target data copy consistent.
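
The server-side step can be sketched as follows, with Python's standard `queue` module standing in for the message queue and an in-memory dictionary standing in for the target database. The mapping rule layout, the table and field names, and the function names are hypothetical.

```python
import queue

# Hypothetical mapping rule: source table and field names -> target names.
MAPPING = {
    "dept": {"table": "department",
             "fields": {"code": "dept_code", "name": "dept_name"}},
}

def apply_block(block, target):
    """Apply each change in a block to `target`, a dict of {table: {pk: row}}."""
    for change in block:
        rule = MAPPING[change["table"]]
        rows = target.setdefault(rule["table"], {})
        if change["op"] == "DELETE":
            rows.pop(change["pk"], None)
        else:  # INSERT and UPDATE both write the mapped row
            rows[change["pk"]] = {rule["fields"][f]: v
                                  for f, v in change["row"].items()}

def drain(q, target):
    """Poll the message queue until it is empty, applying blocks in arrival order."""
    while True:
        try:
            block = q.get_nowait()
        except queue.Empty:
            return
        apply_block(block, target)

q = queue.Queue()
q.put([{"table": "dept", "pk": "D01", "op": "INSERT",
        "row": {"code": "D01", "name": "Sales"}}])
q.put([{"table": "dept", "pk": "D01", "op": "UPDATE",
        "row": {"code": "D01", "name": "Procurement"}},
       {"table": "dept", "pk": "D02", "op": "INSERT",
        "row": {"code": "D02", "name": "Audit"}}])
q.put([{"table": "dept", "pk": "D02", "op": "DELETE"}])

target = {}
drain(q, target)
```

Processing the queue in first-in, first-out order is what keeps the target copy consistent with the source: the update to `D01` lands after its insert, and `D02` is removed after it was created.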
  • The system of the present invention runs independently of, and in parallel with, the local applications of each synchronization node and, through loose coupling, provides a relaxed transaction guarantee mechanism for Internet-distributed multi-source heterogeneous data synchronization.
  • The present invention also provides a universal multi-source heterogeneous large-scale data synchronization method and apparatus, and a client and a server. At the client, a metadata table of the heterogeneous data to be synchronized is obtained; data mapping rules between the source data end and the target data end are constructed according to the metadata table; a data synchronization thread is established for each data table; and each data synchronization thread divides at least one data table into multiple data blocks according to a preset value.
  • Synchronization job scheduling implements single-node multi-threaded parallel processing, which can provide a dedicated synchronization thread for each synchronized data set.
  • The synchronization thread can horizontally split the synchronized data objects according to the preset synchronization arrangement to form finer-grained data blocks, and a reliable mechanism for advancing synchronization transactions is built on this data partitioning. At the server, the data blocks are obtained, placed into a message queue, and then retrieved from the message queue and synchronized to the target database.
  • Parallel processing of data updates is optimized on the server side by introducing a message queue mechanism.
  • Through a combination of concurrent multi-thread scheduling on the client, horizontal splitting of large data sets, and asynchronous parallel processing on the server, the invention provides a practical heterogeneous data synchronization service over a wide area network that offers no specific communication guarantees.
  • FIG. 1 is a schematic structural diagram of a universal multi-source heterogeneous large-scale data synchronization system;
  • FIG. 2 is a flow chart of data synchronization processing of a universal multi-source heterogeneous large-scale data synchronization system;
  • FIG. 3 is a flow chart of a heterogeneous data synchronization method;
  • FIG. 4 is a flow chart of a heterogeneous data synchronization method.
  • connection or integral connection; may be mechanical connection or electrical connection; may be directly connected, may also be indirectly connected through an intermediate medium, or may be internal communication of two components, may be wireless connection, or may be wired connection.
  • This embodiment provides a universal multi-source heterogeneous large-scale data synchronization system which, as shown in FIG. 1, comprises a synchronization network planning management unit 11, an installation configuration unit 12, a block pipeline processing unit 13, and a one-way synchronization unit 14.
  • The synchronization network planning management unit 11 is configured to construct a tree-structured synchronization topology. The synchronization topology includes a plurality of synchronization pairs, each composed of a pair of adjacent nodes and consisting of a client (the relatively lower node) and a server (the relatively upper node). An intermediate node of the synchronization tree (any node other than a leaf or the root) can hold both the client and server roles, but the two roles belong to different synchronization pairs.
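
A small sketch makes the pairing concrete. It assumes the tree is given as a child-to-parent map; the node names and both function names are hypothetical.

```python
def sync_pairs(parent_of):
    """Turn a child -> parent map into sorted (client, server) synchronization pairs."""
    return sorted((child, parent) for child, parent in parent_of.items())

def roles(parent_of):
    """Report which roles each node holds across all of its synchronization pairs."""
    result = {}
    for client, server in sync_pairs(parent_of):
        result.setdefault(client, set()).add("client")
        result.setdefault(server, set()).add("server")
    return result

# Hypothetical topology: two city nodes report to a province node,
# which in turn reports to the root.
tree = {"city_a": "province", "city_b": "province", "province": "root"}
pairs = sync_pairs(tree)
node_roles = roles(tree)
```

In this example the intermediate node `province` is the server for both city pairs and the client in the pair with `root`, which matches the description of intermediate nodes holding both roles in different pairs.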
  • The client includes a middle-layer software component for implementing full/incremental data synchronization; the server includes a software component that receives the source data changes sent by the client and is responsible for applying the received data changes to the target data end.
  • The client and the server can be installed and deployed on the same computer, each performing its own function in the synchronization jobs of the synchronization pair to which it belongs.
  • The source data end includes the database or file directory holding the data copy that serves as the source of data changes in data synchronization;
  • the target data end includes the database or file directory holding the data copy that responds to data changes from the source data end in data synchronization.
  • For one-way data synchronization in the synchronization tree (client → server), the system provides two configuration strategies, diffusion and non-diffusion.
  • Under the diffusion strategy, the data change information is also recorded in the change log table of the target data end (the server of the current synchronization pair); under the non-diffusion strategy, this recording step is not performed.
  • The installation configuration unit 12 is configured to execute an installation script on the server to create the heterogeneous data mapping table, synchronization monitoring table, synchronization log table, message queue, message queue monitoring thread, and data update thread pool, and to execute an installation script on the client to create the synchronization configuration table, change log table, change event monitoring thread, and synchronization thread pool.
  • The heterogeneous data mapping rules built by the server include the data table name, primary key or "virtual primary key", field names, field data types, field data lengths, and field mapping relationships of the source data end and the target data end.
  • The server uses the "virtual primary key" algorithm to construct, from the field information in the metadata information, a "virtual primary key" that uniquely identifies the data records of the data table, and stores the "virtual primary key" construction rule on the server.
  • The "virtual primary key" algorithm constructs the "virtual primary key" by screening the fields of the data table for those that can serve as a primary key.
  • The server also checks whether there is a message queue corresponding to the client and, if not, creates one. The message queue temporarily stores the change blocks that the server receives from the corresponding client.
  • The client obtains the metadata information of the source data end participating in synchronization (the structure data of all participating data tables of the source data end) and transmits it to the server; the server automatically builds and stores the "source data end" to "target data end" heterogeneous data mapping rules according to the metadata information.
  • The client obtains the heterogeneous data mapping rules from the server and, according to these rules, creates three change capture triggers ("insert", "delete", and "change") for each data table participating in synchronization on the source data end.
  • This embodiment provides a visual configuration tool on the server to maintain or adjust the heterogeneous data mapping rules.
  • The client pre-creates the synchronization thread pool by creating a preset number of data synchronization threads according to the metadata information.
  • The incremental or full synchronization of one data table is handled by one synchronization thread or one group of synchronization threads.
  • The server pre-creates the update thread pool by creating a preset number of data update threads.
  • The data update of one data table is handled by one data update thread.
  • The block pipeline processing unit 13 is configured to divide the change data of the source data end participating in synchronization, as sent by the client, into a plurality of change blocks and transmit them to the server in batches according to network conditions and the processing capability of the server.
  • Each time a batch of change blocks has been sent, the sending process is locked until the server returns a receipt confirmation message, after which sending of the next batch resumes; this loop repeats until all change data has been transmitted. The change blocks received by the server are stored in the corresponding message queue.
  • The one-way synchronization unit 14 is configured to schedule a synchronization thread of the client to start a data synchronization operation when the change event monitoring thread of the client detects that the change log table is not empty, and to schedule a number of data update threads of the server to perform data update operations when the message queue monitoring thread of the server detects that the message queue is not empty.
  • The block pipeline processing unit 13 and the one-way synchronization unit 14 cooperate to perform one data synchronization; the data processing flow is shown in FIG. 2.
  • Whenever a data change occurs on the source data end, the trigger captures the data change event and stores the data change information in the client's data change log table.
  • The data change information includes the table name of the data table that underwent the change, the primary key value of the data record, the type of the change operation, and the like.
  • The client's change event monitoring thread detects data changes on the source data end by monitoring the change log table, and the client then initiates a data synchronization job through a synchronization thread.
  • According to the synchronization preset value of each data table, the synchronization thread divides all of the table's pending change data or synchronization operation records into multiple change blocks, encapsulates each change block into a SyncML message packet, and sends the packets to the server in sequence.
  • During data synchronization, for each session request carrying a SyncML message packet, the server allocates a data receiving thread to receive the SyncML message packet uploaded by the client.
  • The data receiving thread receives the SyncML message packet, parses it to restore the change block, hands the block to the message processing mechanism for storage in the designated message queue, and after successful storage returns synchronization success information (a confirmation message) to the client.
  • The message queue monitoring thread of the server monitors whether the message queue is empty; if it is not empty, the server is notified to schedule several data update threads to perform data update jobs.
  • The data update threads of the server perform the subsequent processing in an asynchronous parallel manner: they read change blocks from the message queue and apply local data changes according to the heterogeneous data mapping rules, so that the synchronized data of the (local) target data copy is consistent with that of the source data copy.
  • When a data synchronization thread of the client reads the change data of a data table, it first joins the data change log table with the data table, sorts in ascending order of the time at which the change log records were inserted, and reads all changed data records of the data table; second, it transmits the SyncML message packet to the server through the general SyncML protocol; finally, after the server returns the confirmation (receipt) message, it deletes the data change log records corresponding to the change block in the SyncML message packet.
  • The heterogeneous data synchronization system of this embodiment is suitable for both incremental synchronization and full synchronization.
  • Incremental synchronization synchronizes to the target data end only the local data changes newly generated since the last successful synchronization.
  • During incremental synchronization, the client polls the local change log table, transmits the pending data changes to the server, and deletes the corresponding local change log records after the server returns the synchronization confirmation message.
  • Full synchronization synchronizes the full table data of the synchronized data tables of the source data end, that is, the data tables participating in the data synchronization, to the target data end.
  • Full synchronization in this embodiment comes in two types: merge synchronization and refresh synchronization.
  • Refresh synchronization completely erases the data table of the target data end before it starts updating data into the target data table; it is usually used to reset the synchronization task of a synchronization pair.
  • Merge synchronization checks, before updating data into the target data table, whether each data record already exists in the target table. If it does not exist, an Insert operation is performed; otherwise, an Update operation is performed. Note that during full synchronization, the activated triggers on the source data end still capture and record all local changes that occur in this period, for processing by subsequent incremental synchronization jobs.
  • The two-way synchronization unit 15 is configured to complete a two-way synchronization by sequentially executing two one-way synchronizations in opposite directions, that is, C←→S is equivalent to C→S and S→C.
  • Here C and S represent the client and the server respectively, and the arrow → or ← represents the synchronization direction (the direction in which change blocks are transmitted); the same applies below.
  • The synchronization correctness guarantee unit 16 is configured to record, on the client, the data change events of the source data end in the change log table in their order of occurrence and without duplication, and, on the server, to receive the data changes of the source data end in sequence, apply each data change in the original order, and record it in the synchronization log table.
  • In this embodiment the client checks and manages the triggers of the data tables participating in the synchronization; for example, the client is responsible for creating, modifying and activating triggers and for consistency checks.
  • To further guarantee the correctness of data synchronization, the incremental synchronization in this embodiment prohibits single-point multi-process synchronization, that is, it prohibits multiple clients from synchronizing data on the same source data end at the same time, so that the synchronized changes of the same data table do not get out of order.
  • The system of this embodiment further provides a general “multi-attribute primary key ←→ single-attribute primary key” mapping rule for recording the change events of data tables with single-attribute and non-single-attribute primary keys; such change events can be recorded in the client's change log table.
  • In this embodiment, data synchronization is always initiated by the client.
  • To supervise data synchronization uniformly, this embodiment further provides a unit for synchronization scheduling and real-time monitoring of the synchronization status of each client.
  • To improve the reliability and security of data synchronization, this embodiment provides two synchronization performance options, “optimistic” and “cautious”.
  • Under the “cautious” option, the change block to be sent is hashed, and its hash value and the change block are encapsulated together into a SyncML message packet. After receiving the SyncML message packet, the receiver must perform a hash check on the parsed change block, that is, recalculate the hash value of the change block and compare it with the hash value that was sent. If the two are the same, the check passes and the change block is stored in the message queue; otherwise, a synchronization failure message is returned.
  • Under the “optimistic” option, the above processing involving hash values is omitted.
  • The “optimistic” option suits scenarios in which the reliability of synchronous transmission is expected to be good and higher synchronization performance is preferred, for example when the quality of the synchronization communication network is good. Conversely, the “cautious” option suits scenarios in which the reliability of synchronous transmission is expected to be mediocre or even poor, or in which the correctness of the transmission results should be verified more carefully, for example when the quality of the synchronization communication network is not good.
  • The change block is the unit in which synchronization messages are sent, and a block lock-step sending mechanism is adopted: each time a batch of change blocks is sent, the sender locks and waits until the receiver returns a confirmation message or a timeout occurs.
  • The larger the change block, the higher the transmission efficiency. However, if a transmission failure occurs, the entire change block must be retransmitted, so the retransmission cost is also higher.
  • This embodiment provides a function for adjusting the change block size (how many change data items or change operation records it contains) in order to balance the efficiency and reliability of large-scale data synchronization. If the network environment of the application, including the endpoint system environment, has high communication reliability, the change block is enlarged; if its communication reliability is low, the change block is reduced.
  • Sending without lock-step can be regarded as a special case of the lock-step sending mechanism in which the change block is set to its maximum size, that is, all the content to be sent forms a single change block.
  • This embodiment provides a heterogeneous data synchronization method applicable to the universal multi-source heterogeneous large-scale data synchronization system described in Embodiment 1.
  • The method is first described from the client side. As shown in FIG. 3, it includes the following steps:
  • S11: Synchronization configuration initialization phase.
  • The client obtains the metadata information of the data tables of the source data end that participate in the synchronization and transmits it to the server. The client then creates a preset number of data synchronization threads and, according to the heterogeneous data mapping rules and the change capture rule template obtained from the server, creates three kinds of change capture triggers (collectively “triggers”) on the source data end: “delete”, “insert” and “change”.
  • S12: Change event triggering phase.
  • When a change event fires, the trigger stores the change event information in the local data change log table.
  • The change event information includes the table name of the data table involved, the primary key value of the data record, the change type, the change time, and the like.
  • S13: Data synchronization phase.
  • The change event monitoring thread polls the change log table to check whether it is empty. If the change log table is not empty, the client is notified to schedule several data synchronization threads to perform data synchronization jobs on the data tables involved.
  • Each data synchronization thread divides the change set of at least one data table into multiple change blocks according to the heterogeneous data mapping rules obtained from the server and the preset value.
  • Before each round of data synchronization starts, the synchronization thread counts the total number of records of the data table to be synchronized, informs the server of the table name and of the total number of records to be synchronized, and then starts the round.
  • When the synchronization thread sends the last change block of the round, it tells the server that this is the last change block of the current synchronization session; after receiving that change block, the server ends the round.
  • Preferably, the above step S11 includes the following sub-steps:
  • S111: Create the data change log table. The change log table includes the name of the data table involved in the change, the primary key value of the data record, the type of the change event, the change time, and the like. Specifically, the change log table is shown in Table 1 (applicable to both the client and the server):
  • S112: Capture data change events by using triggers. Operation events such as UPDATE (change), INSERT (insert) and DELETE (delete) are captured for each row of data records.
  • S113: Record the data change events in the change log table. Presetting the change log table and the triggers enables incremental data synchronization without interfering with upper-layer applications and without affecting normal database throughput.
  • The method is next described from the server side. As shown in FIG. 4, it includes the following steps:
  • S21: Synchronization configuration initialization phase.
  • The server automatically constructs and stores the heterogeneous data mapping rules between the source data end and the target data end according to the metadata information uploaded by the client; the server also creates a preset number of data update threads.
  • In this embodiment, the middle-layer system is written in the Java language, and a unified database access interface is implemented using the JDBC (Java Database Connectivity) specification.
  • In particular, data types of the various heterogeneous relational databases are mapped to unified Java data types (used as intermediate data types), which establishes data type mapping relationships among heterogeneous mainstream database products such as Oracle, MS SQL Server and Sybase ASE 15; these are the preset mapping rules.
  • Using Java data types as the intermediate data types also makes transmission and parsing between the client and the server less error-prone.
  • When the actual heterogeneous data mapping rules are constructed, the mapping relationships between the various heterogeneous data types and the Java data types serve as the medium for determining the specific data type correspondence between the two sides of the synchronization.
  • Constructing the mapping rules in step S21 further includes determining whether a primary key is defined for each data table involved. If a data table does not define a primary key, a virtual primary key is constructed for it to uniquely identify its data records, so that the synchronization process can be guaranteed.
  • S22: Data receiving phase. For each session request in which the client uploads a SyncML message packet, the server allocates a data receiving thread that receives and parses the packet and hands the restored change block to the message processing mechanism, which stores it in the message queue corresponding to that client.
  • To improve the overall efficiency of the synchronization process, the present invention embeds an asynchronous parallel message processing mechanism on the server side. After receiving a SyncML message packet from the client, the data receiving thread of the server performs only simple parsing and processing, that is, it calls the message processing mechanism to store the parsed change block in the queue and then returns a confirmation message to the client, indicating that the client can continue to send further SyncML message packets without waiting for the target data end to complete the synchronization change.
  • The change blocks stored in the message queue are handed to the synchronization processing module of the server, which processes them asynchronously and in parallel, that is, with multiple threads at the same time. This speeds up message sending on the client.
  • In addition, because the server does not have to complete the whole process immediately, its overall processing pressure is relieved.
  • In large-scale data synchronization applications, the server is usually the performance bottleneck.
  • S23: Data update phase.
  • The message queue monitoring thread monitors whether the message queue is empty. If the queue is not empty, the server is notified to schedule several data update threads to perform data update jobs on the synchronized data tables corresponding to the queue.
  • A data update thread reads change blocks from the message queue and then, according to the heterogeneous data mapping rules, applies them to the corresponding synchronized data table of the target data end.
  • Preferably, the above step S22 includes the following sub-steps:
  • S221: Receive a message packet uploaded by the client, where the message packet contains one or more change blocks.
  • For the sake of wider applicability, the present invention performs data synchronization based on the SyncML standard protocol that is in common use in the IT industry.
  • SyncML is now part of the Data Synchronization and Device Management protocol family of the OMA (Open Mobile Alliance); it enables data synchronization across compatible devices, programs and networks, so that any device or program can obtain consistent data.
  • However, SyncML provides only a basic communication framework, which is far from sufficient for production-grade large-scale data synchronization applications. For example, it does not provide a reliability guarantee mechanism, which is crucial in practical applications.
  • To fill this gap, the universal multi-source heterogeneous large-scale data synchronization system of Embodiment 1 is provided with the synchronization correctness guarantee unit 16, which guarantees the reliability of heterogeneous data synchronization.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

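A universal multi-source heterogeneous large-scale data synchronization system, comprising a synchronization network planning management unit (11), an installation configuration unit (12), a block pipeline processing unit (13), a one-way synchronization unit (14), a two-way synchronization unit (15) and a synchronization correctness guarantee unit (16). The system works at an intermediate layer above the node database layer and below the application logic layer. During data synchronization, the client transmits the captured local change information to the server according to the synchronization task plan; the server receives the change information and hands it to an asynchronous parallel message processing mechanism, which stores it in the corresponding message queue; the server polls the local message queue to read the pending change information and then applies the subsequent data changes according to the heterogeneous data mapping rules, so as to keep the synchronized data of the source data copy and the target data copy consistent. The system runs independently, in parallel with the local applications of the synchronization nodes, and through loosely coupled cooperation provides a relaxed transaction guarantee mechanism for distributed multi-source heterogeneous data synchronization over the Internet.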

Description

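A Universal Multi-Source Heterogeneous Large-Scale Data Synchronization System
Technical Field
The present invention relates to the technical field of Internet data processing, and in particular to a large-scale data synchronization middle-layer system that targets loosely coupled wide-area computing environments connected through Internet communication protocols, is applicable to multiple mainstream types of data systems, and can cover a large number of autonomously managed heterogeneous data source nodes.
Background Art
Heterogeneous data refers to data of different structures. The heterogeneity of data is mainly reflected in the following: heterogeneity of computer architecture, where the physical storage of the data originates from computers of different architectures; heterogeneity of operating systems, where the data is stored under different operating systems; heterogeneity of data format, where the storage management mechanisms differ and may be relational database systems such as Oracle, SQL Server and DB2, or file-based two-dimensional data such as txt, CSV and XLS; heterogeneity of storage location, where the data is stored at scattered physical locations; and heterogeneity of the logical model of data storage, where data is stored and maintained in different business logics so that data with the same meaning is represented differently, for example inconsistent department codes in an independent sales system and an independent purchasing system. Heterogeneous data is often heterogeneous not at a single level but at several levels at once. In addition, the wide use of mobile terminals (such as mobile phones, PDAs, iPads and laptops) has produced a large number of individual mobile-terminal data items, including contacts, calendars and files, whose storage structures may be logically or physically heterogeneous, so that heterogeneous data synchronization is needed for backing up mobile-terminal data and for synchronizing data between mobile terminals. Furthermore, the massive data generated by large-scale cloud computing applications also requires real-time data replication, and these applications likewise involve large amounts of heterogeneous data.
Existing heterogeneous data synchronization is implemented by data transmission between a source database and a target database. Because the data structures differ and the data types are numerous, or the semantics are the same but the forms of expression differ, the synchronization throughput for massive heterogeneous data is limited, the synchronization efficiency is low, and the technical solutions have certain limitations. For example, when synchronizing between loosely coupled autonomous applications over an unreliable Internet communication environment, a high-throughput data synchronization may end abnormally, so that the synchronization fails. A practical solution that meets current multi-source heterogeneous large-scale data synchronization needs should preferably be general, reliable and efficient: a loosely coupled middle-layer software component system that is relatively independent of application-layer logic, does not touch the lower layers of the operating system, and does not depend on particular database products or computer operating systems.
Summary of the Invention
The technical problem to be solved by the present invention is to overcome the limited data throughput of existing methods for synchronizing massive heterogeneous data.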
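According to a first aspect, an embodiment of the present invention provides a universal multi-source heterogeneous large-scale data synchronization system, comprising: a synchronization network planning management unit, configured to construct a tree-structured synchronization topology, where the synchronization topology comprises a plurality of synchronization pairs each formed by a pair of adjacent nodes, and each synchronization pair comprises a client and a server;
an installation configuration unit, configured to execute an installation script on the server to create a heterogeneous data mapping table, a synchronization log table, a message queue, a message queue monitoring thread and a data update thread pool, and to execute an installation script on the client to create a change log table, a change event monitoring thread and a synchronization thread pool;
a block pipeline processing unit, configured to divide the change data of the source data end participating in the synchronization, as sent by the client, into several change blocks and to transmit them to the server sequentially in batches according to the network condition and the processing capability of the server, where after each batch of change blocks is sent the sending process is locked until the server returns a receipt confirmation message, after which the sending of the next batch is resumed, and this loop repeats until all changes have been sent; and configured to have the server store the received change blocks in the corresponding message queue;
a one-way synchronization unit, configured to schedule several synchronization threads of the client to start a data synchronization job when the change event monitoring thread of the client finds that the change log table is not empty, and to schedule several data update threads of the server to perform a data update job when the message queue monitoring thread of the server finds that the message queue is not empty;
a two-way synchronization unit, configured to complete a two-way synchronization by sequentially executing two one-way synchronizations in opposite directions; and
a synchronization correctness guarantee unit, configured to record, on the client, the data change events of the source data end in the change log table in their order of occurrence and without duplication, and, on the server, to receive the data changes of the source data end in sequence, apply each data change in the original order, and record it in the synchronization log table.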
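Preferably, the client comprises a middle-layer software component for implementing full/incremental data synchronization; the server comprises a software component for receiving the data changes of the source data end sent by the client and for applying the received data changes to the target data end; the source data end comprises the database or file directory holding the data copy that serves as the source of data changes in the data synchronization; and the target data end comprises the database or file directory holding the data copy that responds to data changes from the source data end in the data synchronization.
Preferably, an intermediate node of the tree-structured synchronization topology is provided with both a client and a server, which belong to different synchronization pairs.
Preferably, the change log table of the target data end is used to store data change information.
Preferably, the installation configuration unit further comprises a synchronization configuration subunit, configured to control the client to obtain the metadata information of the source data end participating in the synchronization and transmit it to the server, to control the server to construct and store the heterogeneous data mapping rules according to the metadata information, to control the client to obtain the heterogeneous data mapping rules from the server and to create three change capture triggers (insert, delete and change) on the source data end for each data table participating in the synchronization, and to control the server to provide a visual configuration tool for maintaining or adjusting the heterogeneous data mapping rules.
Preferably, the data update thread pool pre-created on the server by the installation configuration unit contains a preset number of data update threads, and the data update of each data table in the target data end is handled by one data update thread; the synchronization thread pool pre-created on the client by the installation configuration unit contains a preset number of data synchronization threads created according to the metadata information, and the incremental or full synchronization of each data table in the source data end is handled by one synchronization thread or one group of synchronization threads.
Preferably, the one-way synchronization unit is further configured, whenever a data change occurs on the source data end, to control the change capture trigger to capture the data change event and to store the corresponding data change information in the change log table of the client; the data change information includes the table name of the data table that underwent the change, the primary key value of the data record and the type of the change operation.
Preferably, the one-way synchronization unit is further configured to control the synchronization thread of the client to divide, according to the synchronization preset value of each data table that underwent a data change, the corresponding change data or synchronization operation records into multiple change blocks, to encapsulate each change block into a SyncML message packet, and to send the packets to the server in sequence.
Preferably, the one-way synchronization unit is further configured to control the server, during data synchronization, to allocate for each session request carrying a SyncML message packet a data receiving thread that receives the SyncML message packet uploaded by the client; the data receiving thread is configured to receive the SyncML message packet and parse it to restore the change block, to store the parsed change block in the designated message queue, and to return synchronization success information to the client after the storage succeeds.
Preferably, the one-way synchronization unit is further configured to control the synchronization thread of the client to join the change log table with the data table that underwent the data change, to sort in ascending order of the time at which the change log records were inserted, and to read all changed data records of the change log table; and, after the server receives the change block and returns the confirmation information, to control the synchronization thread to delete the change log records in the change log table that correspond to the change block.
Preferably, the one-way synchronization unit is further configured to control the data update thread of the server to read change blocks from the message queue and to apply local data changes according to the heterogeneous data mapping rules, so that the synchronized data of the target data copy on the target data end is consistent with that of the source data copy on the source data end.
Preferably, the one-way synchronization unit is further configured to control the synchronization thread of the client to compute a hash of the change block to be sent and to encapsulate the hash value together with the change block into a SyncML message packet; and to control the data receiving thread of the server, after receiving the SyncML message packet, to perform a hash check on the parsed change block, storing the change block in the message queue when the check succeeds and otherwise returning a synchronization failure message.
Preferably, the one-way synchronization unit is further configured to control the size of the change block, and to control the synchronization thread of the client to enter a locked waiting state after each change block is sent, until the server returns a confirmation message or a timeout occurs.
Preferably, the heterogeneous data mapping rules in the heterogeneous data mapping table created on the server by the installation configuration unit include the data table names of the synchronized source data end and target data end, the primary keys or virtual primary keys, the field names, the field data types, the field data lengths and the field mapping relationships.
Preferably, the installation configuration unit is further configured to construct the virtual primary key in the heterogeneous data mapping rules; when a data table of the source data end does not define a primary key, the installation configuration unit controls the server to construct for the data table, according to the field information in the metadata information, a virtual primary key that uniquely identifies its data records, and to store the construction rule of the virtual primary key on the server.
Preferably, the installation configuration unit is further configured to check whether the server has a message queue corresponding to the client; when the server has no such message queue, the installation configuration unit controls the server to create a corresponding message queue for the client, and the message queue is used to temporarily store the change blocks that the server receives from the corresponding client.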
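According to a second aspect, an embodiment of the present invention provides a universal multi-source heterogeneous large-scale data synchronization method, comprising: obtaining heterogeneous data to be synchronized; obtaining at least one data table according to the heterogeneous data, where the data table is a table that is created according to mapping rules and contains identification information of the heterogeneous data; establishing a data synchronization thread for each data table; and dividing, by each data synchronization thread, at least one data table into multiple data blocks according to a preset value.
Preferably, the heterogeneous data includes incremental heterogeneous data, and obtaining the heterogeneous data to be synchronized comprises: creating a data change log table, where the data change log table includes field descriptions, field names and field types; capturing the change event information of the data by means of triggers; and recording the change event information of the data in the data change log table.
Preferably, the universal multi-source heterogeneous large-scale data synchronization method further comprises: uploading the data blocks to the server.
Preferably, the data blocks are encapsulated into message packets.
According to a third aspect, an embodiment of the present invention provides a universal multi-source heterogeneous large-scale data synchronization method, comprising: obtaining the data blocks described in the embodiment of the second aspect; putting the data blocks into a message queue; and taking the data blocks out of the message queue and synchronizing them to the target database.
Preferably, obtaining the data blocks described in the embodiment of the second aspect comprises: receiving a message packet uploaded by the client, where the message packet includes multiple data blocks; and parsing the message packet into data blocks.
Preferably, before the step of taking the data blocks out of the message queue and synchronizing them to the target database, the method comprises: determining whether the data block includes a primary key value; and, when the data block does not include a primary key value, constructing a virtual primary key from the values of multiple attribute columns of the data block.
According to a fourth aspect, an embodiment of the present invention provides a universal multi-source heterogeneous large-scale data synchronization apparatus, comprising: a heterogeneous data obtaining unit, configured to obtain heterogeneous data to be synchronized; a data table obtaining unit, configured to obtain at least one data table according to the heterogeneous data, where the data table is a table that is created according to mapping rules and contains identification information of the heterogeneous data; a thread establishing unit, configured to establish a data synchronization thread for each data table; and a data block dividing unit, configured to have each data synchronization thread divide at least one data table into multiple data blocks according to a preset value.
Preferably, the heterogeneous data includes incremental heterogeneous data, and the heterogeneous data obtaining unit comprises: a change log table creating subunit, configured to create a data change log table, where the data change log table includes field descriptions, field names and field types; a data change capturing subunit, configured to capture the change event information of the data by means of triggers; and a data change recording subunit, configured to record the change event information of the data in the data change log table.
Preferably, the universal multi-source heterogeneous large-scale data synchronization apparatus further comprises: a data uploading unit, configured to upload the data blocks to the server.
Preferably, the data uploading unit further comprises an encapsulating subunit, configured to encapsulate the data blocks into message packets.
According to a fifth aspect, an embodiment of the present invention provides a universal multi-source heterogeneous large-scale data synchronization apparatus, comprising: an obtaining unit, configured to obtain the above data blocks; a message queue enqueuing unit, configured to put the data blocks into a message queue; and a synchronization unit, configured to take the data blocks out of the message queue and synchronize them to the target database.
Preferably, the obtaining unit comprises: a receiving subunit, configured to receive a message packet uploaded by the client, where the message packet includes multiple data blocks; and a parsing subunit, configured to parse the message packet into data blocks.
Preferably, the universal multi-source heterogeneous large-scale data synchronization apparatus further comprises: a determining unit, configured to determine whether the data block includes a primary key value; and a virtual primary key constructing unit, configured to construct, when the data block does not include a primary key value, a virtual primary key from the values of multiple attribute columns of the data block.
According to a sixth aspect, an embodiment of the present invention provides a client, comprising: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the heterogeneous data synchronization method described in the embodiment of the second aspect.
According to a seventh aspect, an embodiment of the present invention provides a server, comprising: a wireless network interface, a processor and a memory, which are interconnected by a bus; the memory stores computer instructions, and the processor implements the heterogeneous data synchronization method described in the embodiment of the third aspect by executing the computer instructions.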
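The technical solution of the present invention has the following advantages:
1. The present invention provides a universal multi-source heterogeneous large-scale data synchronization system that is distributed over the data nodes participating in the synchronization, that is, over computer systems, and works at an intermediate layer above the node database layer and below the application logic layer. During data synchronization, the client transmits the captured local change information to the server through the SyncML protocol according to the synchronization task plan; the server receives the change information and hands it to an asynchronous parallel message processing mechanism, which stores it in the corresponding message queue. The server polls the local message queue to read the pending change information and then applies the subsequent data changes according to the heterogeneous data mapping rules, so as to keep the synchronized data of the source data copy and the target data copy consistent. The system of the present invention runs independently, in parallel with the local applications of the synchronization nodes, and through loosely coupled cooperation provides a relaxed transaction guarantee mechanism for distributed multi-source heterogeneous data synchronization over the Internet.
2. The present invention provides a universal multi-source heterogeneous large-scale data synchronization method and apparatus, a client and a server. On the client, the metadata tables of the heterogeneous data to be synchronized are obtained; according to the metadata tables, data mapping rules between the source data end and the target data end are constructed, and a data synchronization thread is established for each data table; each data synchronization thread divides at least one data table into multiple data blocks according to a preset value. Synchronization job scheduling implements multi-threaded parallel processing on a single node and can provide a dedicated synchronization thread for each synchronized data set; according to the preset synchronization arrangement, a synchronization thread can partition the synchronized data objects horizontally into multiple finer-grained data blocks, and on the basis of this data blocking a reliable mechanism for advancing the synchronization transaction is implemented. On the server, the data blocks are obtained, put into a message queue, and then taken out of the message queue and synchronized to the target database. Introducing the message queue mechanism on the server optimizes data updating through parallel processing. With the unified SyncML synchronization mode, a general data synchronization mechanism for large relational databases, computer document objects and mobile phone document objects is implemented over mobile networks and the Internet. Through the combination of concurrent multi-threaded scheduling on the client, horizontal partitioning of large data sets, and asynchronous parallel processing on the server, the present invention provides a practical heterogeneous data synchronization service over wide-area networks that offer no specific communication guarantees.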
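Brief Description of the Drawings
To describe the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a universal multi-source heterogeneous large-scale data synchronization system;
FIG. 2 is a flowchart of the data synchronization processing of a universal multi-source heterogeneous large-scale data synchronization system;
FIG. 3 is a flowchart of a heterogeneous data synchronization method;
FIG. 4 is a flowchart of a heterogeneous data synchronization method.
Detailed Description
The technical solutions of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that orientations or positional relationships indicated by terms such as “center”, “upper”, “lower”, “left”, “right”, “vertical”, “horizontal”, “inner” and “outer” are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be understood as limiting the present invention. In addition, the terms “first”, “second” and “third” are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms “installed”, “linked” and “connected” should be understood broadly. For example, a connection may be a fixed connection, a detachable connection or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection, an indirect connection through an intermediate medium, or an internal communication between two elements; and it may be a wireless connection or a wired connection. A person of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific situation.
In addition, the technical features involved in the different embodiments of the present invention described below can be combined with one another as long as they do not conflict with each other.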
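Embodiment 1
This embodiment provides a universal multi-source heterogeneous large-scale data synchronization system which, as shown in FIG. 1, comprises: a synchronization network planning management unit 11, an installation configuration unit 12, a block pipeline processing unit 13, a one-way synchronization unit 14, a two-way synchronization unit 15 and a synchronization correctness guarantee unit 16.
The synchronization network planning management unit 11 is configured to construct a tree-structured synchronization topology. The synchronization topology comprises a plurality of synchronization pairs each formed by a pair of adjacent nodes, and each synchronization pair comprises a client (the relatively lower-level node) and a server (the relatively upper-level node). An intermediate node of the synchronization tree (a node other than a leaf or the root) can hold both the client role and the server role, but the two roles belong to different synchronization pairs. The client comprises a middle-layer software component for implementing full/incremental data synchronization; the server comprises a software component for receiving the data changes of the source data end sent by the client and for applying the received data changes to the target data end; the client and the server can be installed and deployed on the same computer, each performing its function in the synchronization jobs of the synchronization pair to which it belongs; the source data end comprises the database or file directory holding the data copy that serves as the source of data changes in the data synchronization; and the target data end comprises the database or file directory holding the data copy that responds to data changes from the source data end in the data synchronization.
For the one-way (client→server) data synchronization of the synchronization tree, the system provides two configuration strategies, diffusion and non-diffusion. The diffusion strategy records the data change information in the change log table of the target data end (the server of the current synchronization pair); the non-diffusion strategy does not perform this recording.
The installation configuration unit 12 is configured to execute an installation script on the server to create a heterogeneous data mapping table, a synchronization monitoring table, a synchronization log table, a message queue, a message queue monitoring thread and a data update thread pool, and to execute an installation script on the client to create a synchronization configuration table, a change log table, a change event monitoring thread and a synchronization thread pool.
The heterogeneous data mapping rules constructed by the server include the data table names of the synchronized source data end and target data end, the primary keys or “virtual primary keys”, the field names, the field data types, the field data lengths, the field mapping relationships, and the like. When the server constructs the heterogeneous data mapping rules, if a data table of the source data end does not define a primary key, the server uses the “virtual primary key” algorithm to construct for the data table, according to the field information in the metadata information, a “virtual primary key” that uniquely identifies its data records, and stores the construction rule of the “virtual primary key” on the server. The “virtual primary key” algorithm constructs the “virtual primary key” by selecting the fields of the data table that can jointly serve as its key. In addition, the server checks whether a message queue corresponding to the client exists and, if not, creates one for the client. The message queue temporarily stores the change blocks that the server receives from the corresponding client.
The client obtains the metadata information of the source data end participating in the synchronization (the structure data of all data tables of the source data end that participate in the synchronization) and transmits it to the server; according to the metadata information, the server automatically constructs and stores the “source data end” ←→ “target data end” heterogeneous data mapping rules. The client obtains the heterogeneous data mapping rules from the server and, according to these rules, creates three change capture triggers (“triggers” for short), namely “insert”, “delete” and “change”, for each data table participating in the synchronization on the source data end. To make the heterogeneous data mapping rules easy to view and modify, this embodiment provides a visual configuration tool on the server to maintain or adjust them.
Specifically, the client pre-creates the synchronization thread pool by creating a preset number of data synchronization threads according to the metadata information. The incremental or full synchronization of one data table is handled by one synchronization thread or one group of synchronization threads. The server pre-creates the update thread pool by creating a preset number of data update threads. The data update of one data table is handled by one data update thread.
The block pipeline processing unit 13 is configured to divide the change data of the source data end participating in the synchronization, as sent by the client, into several change blocks and to transmit them to the server sequentially in batches according to the network condition and the processing capability of the server. After each batch of change blocks is sent, the sending process is locked until the server returns a receipt confirmation message, after which the sending of the next batch is resumed; this loop repeats until all changes have been sent. The unit is also configured to have the server store the received change blocks in the corresponding message queue.
The one-way synchronization unit 14 is configured to schedule several synchronization threads of the client to start a data synchronization job when the change event monitoring thread of the client finds that the change log table is not empty, and to schedule several data update threads of the server to perform a data update job when the message queue monitoring thread of the server finds that the message queue is not empty. The block pipeline processing unit 13 and the one-way synchronization unit 14 cooperate to complete one data synchronization, whose data processing flow is shown in FIG. 2.
Whenever a data change occurs on the source data end, the trigger captures the data change event and stores the data change information in the client's data change log table. The data change information includes the table name of the data table that underwent the change, the primary key value of the data record, the type of the change operation, and the like. The client's change event monitoring thread detects data changes on the source data end by monitoring the change log table, and the client then initiates a data synchronization job through a synchronization thread. According to the synchronization preset value of each data table, the synchronization thread divides all of the table's pending change data or synchronization operation records into multiple change blocks, encapsulates each change block into a SyncML message packet, and sends the packets to the server in sequence. During data synchronization, for each session request carrying a SyncML message packet, the server allocates a data receiving thread to receive the SyncML message packet uploaded by the client. The data receiving thread receives the SyncML message packet, parses it to restore the change block, hands the block to the message processing mechanism for storage in the designated message queue, and after successful storage returns synchronization success information (a confirmation message) to the client. The message queue monitoring thread of the server monitors whether the message queue is empty; if it is not empty, the server is notified to schedule several data update threads to perform data update jobs. The data update threads of the server perform the subsequent processing in an asynchronous parallel manner: they read change blocks from the message queue and apply local data changes according to the heterogeneous data mapping rules, so that the synchronized data of the (local) target data copy is consistent with that of the source data copy.
When a data synchronization thread of the client reads the change data of a data table, it first joins the data change log table with the data table, sorts in ascending order of the time at which the change log records were inserted, and reads all changed data records of the data table; second, it transmits the SyncML message packet to the server through the general SyncML protocol; finally, after the server returns the confirmation (receipt) message, it deletes the data change log records corresponding to the change block in the SyncML message packet.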
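The read, send and delete sequence of the client's synchronization thread can be illustrated with a minimal sketch. It assumes an SQLite database, a hypothetical `customer` data table, single-letter change types `I` and `D`, and a boolean `server_acknowledged` standing in for the server's confirmation message; none of these details come from the patent, and the SyncML transport is omitted.

```python
import sqlite3

def read_changes(conn):
    # Join the change log with the data table, oldest change first.
    # LEFT JOIN keeps DELETE entries whose data row no longer exists.
    return conn.execute(
        """
        SELECT l.chg_seq_num, l.chg_type, l.source_key_value, c.name
        FROM change_log AS l
        LEFT JOIN customer AS c ON c.id = l.source_key_value
        WHERE l.source_table = 'customer'
        ORDER BY l.chg_timestamp, l.chg_seq_num
        """
    ).fetchall()

def delete_acknowledged(conn, seq_nums):
    # Remove only the log records that the server has confirmed.
    conn.executemany(
        "DELETE FROM change_log WHERE chg_seq_num = ?",
        [(n,) for n in seq_nums],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE change_log (
    chg_seq_num INTEGER PRIMARY KEY,
    chg_timestamp TEXT, source_table TEXT,
    source_key_value TEXT, chg_type TEXT);
INSERT INTO customer VALUES ('c1', 'Ann');
INSERT INTO change_log VALUES (1, '2018-02-12 10:00:00', 'customer', 'c1', 'I');
INSERT INTO change_log VALUES (2, '2018-02-12 10:00:05', 'customer', 'c2', 'D');
""")

changes = read_changes(conn)
server_acknowledged = True  # assumption: stand-in for the confirmation message
if server_acknowledged:
    delete_acknowledged(conn, [row[0] for row in changes])
remaining = conn.execute("SELECT COUNT(*) FROM change_log").fetchone()[0]
```

Deleting log records only after the confirmation arrives means an interrupted transfer leaves the records in place, so they are picked up again by the next synchronization job.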
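The heterogeneous data synchronization system of this embodiment is suitable for both incremental synchronization and full synchronization. Incremental synchronization synchronizes to the target data end only the local data changes newly generated since the last successful synchronization. During incremental synchronization, the client polls the local change log table, transmits the pending data changes to the server, and deletes the corresponding local change log records after the server returns the synchronization confirmation message. Full synchronization synchronizes the full table data of the synchronized data tables of the source data end, that is, the data tables participating in the data synchronization, to the target data end. Full synchronization in this embodiment comes in two types: merge synchronization and refresh synchronization. Refresh synchronization completely erases the data table of the target data end before it starts updating data into the target data table; it is usually used to reset the synchronization task of a synchronization pair. Merge synchronization checks, before updating data into the target data table, whether each data record already exists in the target table; if it does not exist, an Insert operation is performed, otherwise an Update operation is performed. Note that during full synchronization, the activated triggers on the source data end still capture and record all local changes that occur in this period, for processing by subsequent incremental synchronization jobs.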
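The difference between merge synchronization and refresh synchronization can be sketched as follows. The sketch assumes an SQLite target with a hypothetical two-column `target` table; the table layout and function names are illustrative only and are not taken from the patent.

```python
import sqlite3

def refresh_sync(conn, rows):
    # Refresh: erase the target table completely, then load the full data.
    conn.execute("DELETE FROM target")
    conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", rows)
    conn.commit()

def merge_sync(conn, rows):
    # Merge: insert records that are missing, update records that exist.
    for key, name in rows:
        exists = conn.execute(
            "SELECT 1 FROM target WHERE id = ?", (key,)
        ).fetchone()
        if exists is None:
            conn.execute("INSERT INTO target (id, name) VALUES (?, ?)", (key, name))
        else:
            conn.execute("UPDATE target SET name = ? WHERE id = ?", (name, key))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO target VALUES ('a', 'old'), ('z', 'stale')")
source_rows = [("a", "new"), ("b", "added")]

merge_sync(conn, source_rows)
merged = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()

refresh_sync(conn, source_rows)
refreshed = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
```

After the merge, the record `z` that exists only on the target is still present; after the refresh it is gone, which is why refresh is the variant suited to resetting a synchronization pair.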
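The two-way synchronization unit 15 is configured to complete a two-way synchronization by sequentially executing two one-way synchronizations in opposite directions, that is, C←→S is equivalent to C→S and S→C. Here C and S represent the client and the server respectively, and the arrow → or ← represents the synchronization direction (the direction in which change blocks are transmitted); the same applies below.
The synchronization correctness guarantee unit 16 is configured to record, on the client, the data change events of the source data end in the change log table in their order of occurrence and without duplication, and, on the server, to receive the data changes of the source data end in sequence, apply each data change in the original order, and record it in the synchronization log table. In this embodiment, data synchronization is initiated by the client regardless of whether the direction is C→S or S→C. The client therefore checks and manages the triggers of the data tables participating in the synchronization; for example, the client is responsible for creating, modifying and activating triggers and for consistency checks. To further guarantee the correctness of data synchronization, the incremental synchronization in this embodiment prohibits single-point multi-process synchronization, that is, it prohibits multiple clients from synchronizing data on the same source data end at the same time, so that the synchronized changes of the same data table do not get out of order.
In addition, the system of this embodiment provides a general “multi-attribute primary key ←→ single-attribute primary key” mapping rule for recording the change events of data tables with single-attribute and non-single-attribute primary keys; such change events can be recorded in the client's change log table.
Because data synchronization in this embodiment is always initiated by the client, this embodiment further provides, in order to supervise data synchronization uniformly, a unit for synchronization scheduling and real-time monitoring of the synchronization status of each client.
To improve the reliability and security of data synchronization, this embodiment provides two synchronization performance options, “optimistic” and “cautious”. Under the “cautious” option, the change block to be sent is hashed, and its hash value and the change block are encapsulated together into a SyncML message packet. After receiving the SyncML message packet, the receiver must perform a hash check on the parsed change block, that is, recalculate the hash value of the change block and compare it with the hash value that was sent. If the two are the same, the check passes and the change block is stored in the message queue; otherwise, a synchronization failure message is returned. Under the “optimistic” option, the above processing involving hash values is omitted. The “optimistic” option suits scenarios in which the reliability of synchronous transmission is expected to be good and higher synchronization performance is preferred, for example when the quality of the synchronization communication network is good. Conversely, the “cautious” option suits scenarios in which the reliability of synchronous transmission is expected to be mediocre or even poor, or in which the correctness of the transmission results should be verified more carefully, for example when the quality of the synchronization communication network is not good.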
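A minimal sketch of the “cautious” and “optimistic” options is given below. The patent does not name a hash algorithm or a message layout, so SHA-256, the JSON payload, the dictionary message and the status strings are all assumptions made for illustration; a real implementation would carry these fields inside a SyncML message packet.

```python
import hashlib
import json

def pack_block(records, cautious=True):
    # Serialize the change block; attach a hash only under the cautious option.
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    message = {"payload": payload}
    if cautious:
        message["hash"] = hashlib.sha256(payload).hexdigest()
    return message

def receive_block(message, message_queue):
    # Recompute and compare the hash when one was sent; enqueue on success.
    sent_hash = message.get("hash")
    if sent_hash is not None:
        if hashlib.sha256(message["payload"]).hexdigest() != sent_hash:
            return "sync-failed"
    message_queue.append(json.loads(message["payload"].decode("utf-8")))
    return "ack"

message_queue = []
good = pack_block([{"id": 1, "op": "I"}])
status_good = receive_block(good, message_queue)

# Simulate corruption in transit: the payload changes but the hash does not.
tampered = dict(good, payload=good["payload"] + b" ")
status_bad = receive_block(tampered, message_queue)

# Optimistic option: no hash is attached, so no check is performed.
status_optimistic = receive_block(
    pack_block([{"id": 2, "op": "U"}], cautious=False), message_queue
)
```

The corrupted block is rejected before it reaches the queue, whereas the optimistic block is accepted without any check, which is the performance and safety trade-off described above.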
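In this embodiment the change block is the unit in which synchronization messages are sent, and a block lock-step sending mechanism is adopted: each time a batch of change blocks is sent, the sender locks and waits until the receiver returns a confirmation message or a timeout occurs. The larger the change block, the higher the transmission efficiency; however, if a transmission failure occurs, the entire change block must be retransmitted, so the retransmission cost is also higher. This embodiment provides a function for adjusting the change block size (how many change data items or change operation records it contains) in order to balance the efficiency and reliability of large-scale data synchronization. If the network environment of the application, including the endpoint system environment, has high communication reliability, the change block is enlarged; if its communication reliability is low, the change block is reduced. Sending without lock-step can be regarded as a special case of the lock-step sending mechanism in which the change block is set to its maximum size, that is, all the content to be sent forms a single change block.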
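The lock-step mechanism and the role of the block size can be sketched as follows. The function names, the boolean `send` callback (True for a confirmation, False for a timeout or failure) and the bounded retry policy are assumptions for illustration; the patent only states that the sender waits for a confirmation or a timeout.

```python
def split_into_blocks(records, block_size):
    # Horizontal split of the pending changes into fixed-size change blocks.
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def send_lock_step(records, block_size, send, max_retries=3):
    # Send one block, wait for its confirmation, and only then send the next.
    # A failed block is retransmitted as a whole (assumed retry policy).
    sent = 0
    for block in split_into_blocks(records, block_size):
        for _attempt in range(max_retries):
            if send(block):
                sent += len(block)
                break
        else:
            return sent  # give up; later blocks are not sent
    return sent

calls = []

def flaky_send(block):
    # Hypothetical transport: the second transmission attempt times out.
    calls.append(list(block))
    return len(calls) != 2

blocks = split_into_blocks(list(range(5)), 2)
sent_count = send_lock_step(list(range(5)), 2, flaky_send)
```

In this run the block `[2, 3]` is transmitted twice because its first attempt fails; with a larger `block_size` fewer round trips are needed, but each failure repeats more records.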
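Embodiment 2
This embodiment provides a heterogeneous data synchronization method applicable to the universal multi-source heterogeneous large-scale data synchronization system described in Embodiment 1. The method is first described from the client side. As shown in FIG. 3, it includes the following steps:
S11: Synchronization configuration initialization phase. The client obtains the metadata information of the data tables of the source data end that participate in the synchronization and transmits it to the server. The client then creates a preset number of data synchronization threads and, according to the heterogeneous data mapping rules and the change capture rule template obtained from the server, creates three kinds of change capture triggers (collectively “triggers”) on the source data end: “delete”, “insert” and “change”.
S12: Change event triggering phase. When a change event fires, the trigger stores the change event information in the local data change log table. The change event information includes the table name of the data table involved, the primary key value of the data record, the change type, the change time, and the like.
S13: Data synchronization phase. The change event monitoring thread polls the change log table to check whether it is empty; if the change log table is not empty, the client is notified to schedule several data synchronization threads to perform data synchronization jobs on the data tables involved. Each data synchronization thread divides the change set of at least one data table into multiple change blocks according to the heterogeneous data mapping rules obtained from the server and the preset value. When massive heterogeneous data is synchronized between loosely coupled autonomous applications over an unreliable Internet communication environment, unexpected aborts cannot be ignored. A breakpoint-resume synchronization mechanism is therefore needed: the synchronization process is divided into several segments, and at the end of each segment a savepoint is created to record the current synchronization progress, so that when synchronization is resumed after an abort, the position at which to pick up the synchronization process can be determined.
Before each round of data synchronization starts, the synchronization thread counts the total number of records of the data table to be synchronized, informs the server of the table name and of the total number of records to be synchronized, and then starts the round. When the synchronization thread sends the last change block of the round, it tells the server that this is the last change block of the current synchronization session; after receiving that change block, the server ends the round.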
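The savepoint idea and the last-block flag can be illustrated with a minimal sketch. It assumes that a savepoint is simply the index of the next unsent change block, kept here in an in-memory dictionary; the names `run_round`, `savepoint` and `is_last` are hypothetical, and a real client would persist the savepoint so that it survives a crash.

```python
def run_round(blocks, send, savepoint):
    # Resume from the saved position, or from the start if none is saved.
    start = savepoint.get("next_block", 0)
    for index in range(start, len(blocks)):
        is_last = index == len(blocks) - 1  # lets the server end the round
        if not send(blocks[index], is_last):
            return False  # aborted; the savepoint keeps the resume position
        savepoint["next_block"] = index + 1
    savepoint.pop("next_block", None)  # round complete, clear the savepoint
    return True

blocks = [["r1", "r2"], ["r3", "r4"], ["r5"]]
delivered = []
fail_once = {"armed": True}

def send(block, is_last):
    # Hypothetical transport that fails once, on the final block.
    if block == ["r5"] and fail_once["armed"]:
        fail_once["armed"] = False
        return False
    delivered.append((block, is_last))
    return True

savepoint = {}
first_attempt = run_round(blocks, send, savepoint)
after_abort = dict(savepoint)
second_attempt = run_round(blocks, send, savepoint)
```

The second attempt sends only the block that was outstanding when the first attempt aborted, so already confirmed blocks are not retransmitted.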
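Preferably, the above step S11 includes the following sub-steps:
S111: Create the data change log table. The change log table includes the name of the data table involved in the change, the primary key value of the data record, the type of the change event, the change time, and the like. Specifically, the change log table is shown in Table 1 (applicable to both the client and the server):
Table 1
Field description | Field name | Field type
Record identification number | CHG_SEQ_NUM | INTEGER
Change record time | CHG_TIMESTAMP | TIMESTAMP
URI of the synchronization initiator | SOURCE_URI | VARCHAR(256)
Data table of the synchronization initiator | SOURCE_TABLE | VARCHAR(128)
Primary key value of the changed record | SOURCE_KEY_VALUE | VARCHAR(128)
Change type | CHG_TYPE | CHAR(1)
S112: Capture data change events by using triggers. Operation events such as UPDATE (change), INSERT (insert) and DELETE (delete) are captured for each row of data records.
S113: Record the data change events in the change log table. Presetting the change log table and the triggers enables incremental data synchronization without interfering with upper-layer applications and without affecting normal database throughput.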
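Sub-steps S111 to S113 can be sketched with an in-memory SQLite database. The column names follow Table 1, while the SQLite trigger dialect, the hypothetical `customer` table, the placeholder URI `db://example/source` and the single-letter change types `I`, `U` and `D` are assumptions; the patent does not specify the values stored in CHG_TYPE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT);

-- S111: change log table with the columns of Table 1.
CREATE TABLE change_log (
    CHG_SEQ_NUM      INTEGER PRIMARY KEY AUTOINCREMENT,
    CHG_TIMESTAMP    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    SOURCE_URI       VARCHAR(256),
    SOURCE_TABLE     VARCHAR(128),
    SOURCE_KEY_VALUE VARCHAR(128),
    CHG_TYPE         CHAR(1)
);

-- S112/S113: one trigger per operation records the event in the log.
CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
    INSERT INTO change_log (SOURCE_URI, SOURCE_TABLE, SOURCE_KEY_VALUE, CHG_TYPE)
    VALUES ('db://example/source', 'customer', NEW.id, 'I');
END;
CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
    INSERT INTO change_log (SOURCE_URI, SOURCE_TABLE, SOURCE_KEY_VALUE, CHG_TYPE)
    VALUES ('db://example/source', 'customer', NEW.id, 'U');
END;
CREATE TRIGGER customer_del AFTER DELETE ON customer BEGIN
    INSERT INTO change_log (SOURCE_URI, SOURCE_TABLE, SOURCE_KEY_VALUE, CHG_TYPE)
    VALUES ('db://example/source', 'customer', OLD.id, 'D');
END;
""")

# The application changes its data as usual; the triggers fill the log.
conn.execute("INSERT INTO customer VALUES ('c1', 'Ann')")
conn.execute("UPDATE customer SET name = 'Anna' WHERE id = 'c1'")
conn.execute("DELETE FROM customer WHERE id = 'c1'")

log = conn.execute(
    "SELECT SOURCE_TABLE, SOURCE_KEY_VALUE, CHG_TYPE "
    "FROM change_log ORDER BY CHG_SEQ_NUM"
).fetchall()
```

The application statements contain no synchronization logic at all, which illustrates how change capture can stay transparent to upper-layer applications.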
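The heterogeneous data synchronization method of Embodiment 2 is next described from the server side. As shown in FIG. 4, it includes the following steps:
S21: Synchronization configuration initialization phase. The server automatically constructs and stores the heterogeneous data mapping rules between the source data end and the target data end according to the metadata information uploaded by the client; the server also creates a preset number of data update threads.
In this embodiment, the middle-layer system is written in the Java language, and a unified database access interface is implemented using the JDBC (Java Database Connectivity) specification. In particular, data types of the various heterogeneous relational databases are mapped to unified Java data types (used as intermediate data types), which establishes data type mapping relationships among heterogeneous mainstream database products such as Oracle, MS SQL Server and Sybase ASE 15; these are the preset mapping rules. Using Java data types as the intermediate data types also makes transmission and parsing between the client and the server less error-prone.
Part of the mapping between Java data types and other common heterogeneous data types is shown in Table 2.
Table 2
[Table 2 is provided as an image in the original: Figure PCTCN2018076485-appb-000001]
When the actual heterogeneous data mapping rules are constructed, the mapping relationships between the various heterogeneous data types and the Java data types serve as the medium for determining the specific data type correspondence between the two sides of the synchronization.
Preferably, constructing the heterogeneous mapping rules in the above step S21 further includes determining whether a primary key is defined for each data table involved. If a data table does not define a primary key, a virtual primary key is constructed for it to uniquely identify its data records, so that the synchronization process can be guaranteed.
Specifically, the algorithm for constructing the virtual primary key is as follows: virtual primary key = str(F_1) + ASCII(11) + str(F_2) + … + ASCII(11) + str(F_k), where F_1, F_2, …, F_k are the fields that can jointly constitute a primary key, the function str(X) computes the string value of the variable X, “+” is the string concatenation operator, and ASCII(11) denotes the character whose ASCII (American Standard Code for Information Interchange) decimal code is 11. This is a non-printing character (the “vertical tab”) that is taken not to occur in text content.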
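The virtual primary key formula can be written directly as a short function. The record layout (a dictionary) and the field names `dept` and `code` are hypothetical; the separator and the concatenation order follow the formula above.

```python
SEPARATOR = chr(11)  # ASCII decimal code 11, the vertical tab character

def virtual_primary_key(record, key_fields):
    # str(F_1) + ASCII(11) + str(F_2) + ... + ASCII(11) + str(F_k)
    return SEPARATOR.join(str(record[field]) for field in key_fields)

# Two records whose plain concatenation would collide ("abc" in both cases).
key_a = virtual_primary_key({"dept": "ab", "code": "c"}, ["dept", "code"])
key_b = virtual_primary_key({"dept": "a", "code": "bc"}, ["dept", "code"])
```

The separator is what keeps `key_a` and `key_b` distinct; this relies on the stated assumption that the vertical tab does not occur in the field values themselves.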
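S22: Data receiving phase. For each session request in which the client uploads a SyncML message packet, the server allocates a data receiving thread that receives and parses the packet and hands the restored change block to the message processing mechanism, which stores it in the message queue corresponding to that client. To improve the overall efficiency of the synchronization process, the present invention embeds an asynchronous parallel message processing mechanism on the server side. After receiving a SyncML message packet from the client, the data receiving thread of the server performs only simple parsing and processing, that is, it calls the message processing mechanism to store the parsed change block in the queue and then returns a confirmation message to the client, indicating that the client can continue to send further SyncML message packets without waiting for the target data end to complete the synchronization change. The change blocks stored in the message queue are handed to the synchronization processing module of the server, which processes them asynchronously and in parallel, that is, with multiple threads at the same time. This speeds up message sending on the client; in addition, because the server does not have to complete the whole process immediately, its overall processing pressure is relieved. In large-scale data synchronization applications, the server is usually the performance bottleneck.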
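The decoupling of receiving from updating can be sketched with the Python standard library. The in-process `queue.Queue`, the dictionary `target_copy` standing in for the target data end, the two worker threads and the `None` shutdown sentinel are all assumptions for illustration; the sketch also ignores the per-table ordering that the described system enforces by assigning each table's updates to one thread.

```python
import queue
import threading

message_queue = queue.Queue()
target_copy = {}  # stand-in for the target data end
target_lock = threading.Lock()

def receive(block):
    # Data receiving thread: enqueue the block and confirm immediately,
    # without waiting for the target data end to be updated.
    message_queue.put(block)
    return "ack"

def update_worker():
    # Data update thread: apply queued change blocks until told to stop.
    while True:
        block = message_queue.get()
        if block is None:  # shutdown sentinel
            message_queue.task_done()
            break
        with target_lock:
            for key, value in block:
                target_copy[key] = value
        message_queue.task_done()

workers = [threading.Thread(target=update_worker) for _ in range(2)]
for worker in workers:
    worker.start()

acks = [receive([("k1", "v1")]), receive([("k2", "v2")])]

for _ in workers:
    message_queue.put(None)
message_queue.join()
for worker in workers:
    worker.join()
```

The client-facing `receive` call returns as soon as the block is queued, so the sender's progress does not depend on how long the target update takes.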
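S23: Data update phase. The message queue monitoring thread monitors whether the message queue is empty. If the queue is not empty, the server is notified to schedule several data update threads to perform data update jobs on the synchronized data tables corresponding to the queue. A data update thread reads change blocks from the message queue and then, according to the heterogeneous data mapping rules, applies them to the corresponding synchronized data table of the target data end.
Preferably, the above step S22 includes the following sub-steps:
S221: Receive a message packet uploaded by the client, where the message packet contains one or more change blocks.
S222: Parse the message packet into change blocks.
For the sake of wider applicability, the present invention performs data synchronization based on the SyncML standard protocol that is in common use in the IT industry. SyncML was originally proposed as an open, platform-independent standard protocol for information synchronization and is now part of the Data Synchronization and Device Management protocol family of the OMA (Open Mobile Alliance); it enables data synchronization across compatible devices, programs and networks, so that any device or program can obtain consistent data. However, SyncML provides only a basic communication framework, which is far from sufficient for production-grade large-scale data synchronization applications. For example, it does not provide a reliability guarantee mechanism, which is crucial in practical applications. To fill this gap, the universal multi-source heterogeneous large-scale data synchronization system of Embodiment 1 is provided with the synchronization correctness guarantee unit 16, which guarantees the reliability of heterogeneous data synchronization.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. A person of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Obvious changes or variations derived from the above remain within the protection scope of the present invention.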

Claims (16)

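  1. A universal multi-source heterogeneous large-scale data synchronization system, characterized by comprising:
    a synchronization network planning management unit, configured to construct a tree-structured synchronization topology, where the synchronization topology comprises a plurality of synchronization pairs each formed by a pair of adjacent nodes, and each synchronization pair comprises a client and a server;
    an installation configuration unit, configured to execute an installation script on the server to create a heterogeneous data mapping table, a synchronization log table, a message queue, a message queue monitoring thread and a data update thread pool, and to execute an installation script on the client to create a change log table, a change event monitoring thread and a synchronization thread pool;
    a block pipeline processing unit, configured to divide the change data of the source data end participating in the synchronization, as sent by the client, into several change blocks and to transmit them to the server sequentially in batches according to the network condition and the processing capability of the server, where after each batch of change blocks is sent the sending process is locked until the server returns a receipt confirmation message, after which the sending of the next batch is resumed, and this loop repeats until all changes have been sent; and configured to have the server store the received change blocks in the corresponding message queue;
    a one-way synchronization unit, configured to schedule several synchronization threads of the client to start a data synchronization job when the change event monitoring thread of the client finds that the change log table is not empty, and to schedule several data update threads of the server to perform a data update job when the message queue monitoring thread of the server finds that the message queue is not empty;
    a two-way synchronization unit, configured to complete a two-way synchronization by sequentially executing two one-way synchronizations in opposite directions; and
    a synchronization correctness guarantee unit, configured to record, on the client, the data change events of the source data end in the change log table in their order of occurrence and without duplication, and, on the server, to receive the data changes of the source data end in sequence, apply each data change in the original order, and record it in the synchronization log table.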
  2. 如权利要求1所述的一种普适多源异构大规模数据同步系统,其特征在于,所述客户端包括用于实现全量/增量数据同步的中间层软件组件;所述服务端包括用于接收客户端发送的源数据端的数据变更,并负责将所接收到的数据变更更新到目标数据端的软件组件;所述源数据端包括在数据同步中作为数据变更来源的数据副本所在的数据库或文件目录;所述目标数据端包括在数据同步中响应来自源数据端的数据变更的数据副本所在的数据库或文件目录。
  3. 如权利要求1所述的一种普适多源异构大规模数据同步系统,其特征在于,对于所述树状结构的同步拓扑结构的中间节点,同时配有分属于不同的同步对的客户端和服务端。
  4. 如权利要求2所述的一种普适多源异构大规模数据同步系统,其特征在于,所述源数据端的变更日志表用于保存数据变更信息。
  5. 如权利要求1所述的一种普适多源异构大规模数据同步系统,其特征在于,所述安装配置单元还包括同步配置子单元,所述同步配置子单元用于控制客户端获取参与同步的源数据端的元数据信息并传输给服务端,控制服务端根据所述元数据信息构建并存储异构数据映射规则,控制客户端从服务端获取异构数据映射规则和在源数据端为每个参与同步的数据表创建插入、删除和更改三种变更捕获触发器,以及控制服务端提供可视化配置工具维护或调整异构数据映射规则。
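  6. The universal multi-source heterogeneous large-scale data synchronization system according to claim 1, characterized in that the data-update thread pool pre-created on the server by the installation and configuration unit contains a preset number of data-update threads, the data update of each data table in the target data end being handled by one data-update thread; and the synchronization thread pool pre-created on the client by the installation and configuration unit contains a preset number of data synchronization threads created according to the metadata information, the incremental or full synchronization of each data table in the source data end being handled by one synchronization thread or one group of synchronization threads.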
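  7. The universal multi-source heterogeneous large-scale data synchronization system according to claim 5, characterized in that the one-way synchronization unit is further configured to control the change-capture trigger, whenever a data change occurs at the source data end, to capture the data change event and store the corresponding data change information in the change log table of the client; the data change information comprises the table name of the data table subjected to the data change, the primary key value of the data record, and the type of change operation.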
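  8. The universal multi-source heterogeneous large-scale data synchronization system according to claim 7, characterized in that the one-way synchronization unit is further configured to control the synchronization thread of the client to divide, according to the synchronization preset value of each data table subjected to data change, the corresponding change data or synchronization operation records into a plurality of change blocks, to encapsulate each change block as a SyncML message packet, and to send the packets to the server in order.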
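  9. The universal multi-source heterogeneous large-scale data synchronization system according to claim 8, characterized in that the one-way synchronization unit is further configured to control the server, during data synchronization, to allocate for each session request of a SyncML message packet a data-receiving thread to receive the SyncML message packet uploaded by the client; the data-receiving thread is configured to receive the SyncML message packet and parse it to recover the change blocks, to store the parsed change blocks in the designated message queue, and, after the storage succeeds, to return synchronization success information to the client.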
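  10. The universal multi-source heterogeneous large-scale data synchronization system according to claim 9, characterized in that the one-way synchronization unit is further configured to control the synchronization thread of the client to join the change log table with the data table subjected to data change, sort in ascending order of the insertion time of the change log table records, and read all changed data records of the change log table; and, after the server receives a change block and returns acknowledgement information, to control the synchronization thread to delete the change log records in the change log table that correspond to the change block.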
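  11. The universal multi-source heterogeneous large-scale data synchronization system according to claim 9, characterized in that the one-way synchronization unit is further configured to control the data-update thread of the server to read change blocks from the message queue and apply local data changes according to the heterogeneous data mapping rules, so that the synchronized data of the target data replica at the target data end is consistent with that of the source data replica at the source data end.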
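  12. The universal multi-source heterogeneous large-scale data synchronization system according to claim 9, characterized in that the one-way synchronization unit is further configured to control the synchronization thread of the client to perform a hash calculation on each change block to be sent and to encapsulate the hash value together with the change block into a SyncML message packet; and to control the data-receiving thread of the server, after receiving the SyncML message packet, to perform a hash check on the parsed change block, storing the change block in the message queue when the check succeeds and otherwise returning a synchronization failure message.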
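  13. The universal multi-source heterogeneous large-scale data synchronization system according to claim 9, characterized in that the one-way synchronization unit is further configured to control the size of the change blocks, and to control the synchronization thread of the client to enter a locked waiting state after each change block is sent, until the server returns an acknowledgement message or a timeout occurs.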
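  14. The universal multi-source heterogeneous large-scale data synchronization system according to claim 1, characterized in that the heterogeneous data mapping rules in the heterogeneous data mapping table created on the server by the installation and configuration unit comprise, for the source data end and the target data end being synchronized, the data table names, primary keys or virtual primary keys, field names, field data types, field data lengths, and field mapping relationships.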
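  15. The universal multi-source heterogeneous large-scale data synchronization system according to claim 14, characterized in that the installation and configuration unit is further configured to construct the virtual primary key in the heterogeneous data mapping rules; when a data table of the source data end has no primary key defined, the installation and configuration unit controls the server to construct for the data table, according to the field information in the metadata information, a virtual primary key that uniquely identifies its data records, and to store the construction rule of the virtual primary key on the server.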
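  16. The universal multi-source heterogeneous large-scale data synchronization system according to claim 1, characterized in that the installation and configuration unit is further configured to check whether the server has a message queue corresponding to the client; when the server has no message queue corresponding to the client, the installation and configuration unit controls the server to create a corresponding message queue for the client, the message queue being used to temporarily store the change blocks of the corresponding client received by the server.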
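The client-side behaviour recited in claims 8, 10, 12, and 13 (split the change log into blocks, attach a hash, send in order, wait for an acknowledgement, and delete log entries only after the acknowledgement) can be sketched as follows. This is an illustration under stated assumptions: SHA-256 and JSON are stand-ins for the unspecified hash function and the SyncML encoding, the transport is replaced by a direct function call, and a timeout is modelled simply as `send` returning `False`. All function names are hypothetical.

```python
import hashlib
import json


def make_blocks(changes, block_size):
    """Split the ordered change list into blocks of at most block_size entries."""
    return [changes[i:i + block_size] for i in range(0, len(changes), block_size)]


def pack(block):
    """Encode a block and attach its hash (stand-in for a SyncML packet)."""
    body = json.dumps(block, sort_keys=True)
    return {"body": body, "hash": hashlib.sha256(body.encode()).hexdigest()}


def server_receive(packet, inbox):
    """Server side: verify the hash, enqueue on success, report failure otherwise."""
    digest = hashlib.sha256(packet["body"].encode()).hexdigest()
    if digest != packet["hash"]:
        return False
    inbox.append(json.loads(packet["body"]))
    return True


def sync(change_log, block_size, send):
    """Send blocks in order; remove log entries only after an acknowledgement."""
    for block in make_blocks(list(change_log), block_size):
        if not send(pack(block)):   # failure or timeout: stop and keep the log
            break
        del change_log[:len(block)]
    return change_log


inbox = []
log = [("orders", 1, "I"), ("orders", 2, "I"), ("orders", 1, "U")]
remaining = sync(log, 2, lambda packet: server_receive(packet, inbox))

# A packet altered in transit fails the hash check and is not enqueued.
tampered = pack([["orders", 3, "D"]])
tampered["body"] += " "
rejected_inbox = []
accepted = server_receive(tampered, rejected_inbox)
```

Because unacknowledged entries stay in the log, a later run of `sync` would resend them, which is the property that makes the acknowledgement-before-delete order useful.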
PCT/CN2018/076485 2017-09-08 2018-02-12 Universal multi-source heterogeneous large-scale data synchronization system WO2019047479A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/812,243 US11500903B2 (en) 2017-09-08 2020-03-06 Generic multi-source heterogeneous large-scale data synchronization client-server method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710805313.8A CN107729366B (zh) 2017-09-08 2017-09-08 Universal multi-source heterogeneous large-scale data synchronization system
CN201710805313.8 2017-09-08

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/812,243 Continuation-In-Part US11500903B2 (en) 2017-09-08 2020-03-06 Generic multi-source heterogeneous large-scale data synchronization client-server method

Publications (1)

Publication Number Publication Date
WO2019047479A1 true WO2019047479A1 (zh) 2019-03-14

Family

ID=61205807

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/076485 WO2019047479A1 (zh) 2017-09-08 2018-02-12 Universal multi-source heterogeneous large-scale data synchronization system

Country Status (3)

Country Link
US (1) US11500903B2 (zh)
CN (1) CN107729366B (zh)
WO (1) WO2019047479A1 (zh)

Also Published As

Publication number Publication date
CN107729366B (zh) 2021-01-05
US11500903B2 (en) 2022-11-15
CN107729366A (zh) 2018-02-23
US20200409977A1 (en) 2020-12-31

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18854536

Country of ref document: EP

Kind code of ref document: A1