WO2013071428A1 - System and method for data synchronization over a network - Google Patents

System and method for data synchronization over a network

Publication number
WO2013071428A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
file
version
files
difference
Prior art date
Application number
PCT/CA2012/050784
Other languages
English (en)
French (fr)
Other versions
WO2013071428A8 (en)
Inventor
Ram Sudama
Brad Moore
Balash Akbari
Charles Elliott
Michael Ye
Sam Demooy
Original Assignee
Dassault Systemes Geovia Inc., Dba Gemcom Software International Inc.
Priority date
Filing date
Publication date
Application filed by Dassault Systemes Geovia Inc., Dba Gemcom Software International Inc.
Priority to CN201280065691.2A (CN104272649A)
Priority to AU2012339532A (AU2012339532B2)
Publication of WO2013071428A1
Publication of WO2013071428A8

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Definitions

  • the following relates generally to data communication over a network.
  • synchronizing information between these devices may be difficult if the network connection has low bandwidth or is only intermittently available.
  • the method comprises a first node establishing a connection with a second node, the first node having stored thereon one or more data files, each being associated with a version identifier, and a first synchronization list of data files to be synchronized.
  • the second node has stored thereon one or more corresponding data files, each being associated with a version identifier.
  • the first node determines, based on the version identifiers, whether a more recent version of each of the one or more data files on the first synchronization list exists on the second node. Upon determining that the second node comprises a more recent version, the first node obtains, from the second node, a difference file to update the data file on the first node.
  • the difference file represents the difference between the version of the data file on one node and the version of the data file on the other node.
  • at least two of a sequence of difference files are used to update the file on the first node; one of the at least two difference files representing the difference between the version of the data file on the second node and an intermediate version.
  • the synchronization list may comprise data files selected by a user. Data files associated with those selected by the user may also be included in the synchronization list.
  • the second node may further comprise a second synchronization list of data files to be synchronized.
  • the second node may determine whether the first node comprises a more recent version of each of the files on the second synchronization list.
  • the second node obtains, from the first node, a difference file to update the data file on the second node.
  • a priority ranking may be associated with each of the files on the synchronization list, wherein the data files are synchronized according to the priority ranking.
  • the priority ranking may be generated based on metadata associated with each of the data files.
  • the priority ranking may be generated based on the magnitude of difference between the version identifier of the data file on the first node and the more recent version of the data file on the second node.
  • a reference file for each of the one or more data files is also stored on the first node.
  • a modification detection module on the first node determines the degree to which each of the data files on the first synchronization list differ with respect to the respective reference file.
  • the priority ranking is generated based on the magnitude of difference between the data file and the reference file.
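The version-based ranking described above can be sketched as follows. This is a minimal illustration, not the claimed method: the `SyncEntry` record and `rank_by_version_gap` function are hypothetical names, version identifiers are assumed to be integers, and the choice to synchronize the largest gap first is an assumption, since the text does not say which direction the ranking runs.

```python
from dataclasses import dataclass


@dataclass
class SyncEntry:
    """One file on the synchronization list (hypothetical record layout)."""
    name: str
    local_version: int   # version identifier held by the first node
    remote_version: int  # version identifier reported by the second node


def rank_by_version_gap(entries):
    """Return the entries that need updating, largest version gap first.

    Assumption: a bigger gap means a higher priority. Files that are
    already current on the first node are dropped from the ranking.
    """
    stale = [e for e in entries if e.remote_version > e.local_version]
    return sorted(
        stale,
        key=lambda e: e.remote_version - e.local_version,
        reverse=True,
    )
```

A ranking based on metadata or on the difference from the reference file would only change the sort key.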
  • FIG. 1 is a block diagram illustrating a system in accordance with the present invention.
  • FIG. 2 is a block diagram illustrating a node.
  • FIG. 3 is a flow diagram illustrating a process of a node updating a file on a server.
  • FIG. 4 is a flow diagram illustrating generating a difference file.
  • FIG. 5 is a flow diagram illustrating a version history of an example file.
  • FIG. 6 is a flow diagram illustrating an example process of a node obtaining, from a server, a more recent version of a file stored on the node.
  • FIG. 7 is an example diagram illustrating various types of data and links therebetween relevant to a mining operation.
  • the file management system enables transfer and synchronization of files over a network to enable data communication between two or more devices.
  • the network may be one that is unreliable or reliable and may exhibit characteristics comprising low bandwidth and/or low quality of service (QoS).
  • a node 10 may be a computer device, such as a desktop computer, a laptop computer, a mobile device such as a smartphone, a network-enabled piece of industrial equipment (e.g. an automated drill), a network-enabled piece of sensing equipment (e.g. an aerial gravimeter), a rack-mount server, a cloud-based server, or any other network-enabled computing device.
  • the node 10 may comprise, or be linked to, a processor 9 and a memory 8.
  • the memory 8 may have stored thereon computer instructions which, when executed by the processor 9, provide the functionality of the file management system as described herein.
  • the node 10 may further comprise, or be linked to, a transceiver 15 for communicating with a network 14, for example, an intranet or the Internet. Further nodes 10, being substantially similar to the aforementioned node 10, can also be linked to the network. Each node 10 may be operable to communicate with one or more other nodes 10 via the network.
  • the node 10 may be user-controllable or automatically controlled by computer executable instructions.
  • Each node 10 may further comprise, or be linked to, a data store 16.
  • the data store 16 is operable to have stored thereon at least one file.
  • the at least one file may comprise at least one reference file, and at least one difference file associated with each reference file.
  • a reference file is a file that can be considered a complete file, in that if a node 10 provides the reference file to another node 10, the other node 10 can receive and read the reference file for the purposes of operating upon it (i.e., opening file, modifying the file, etc.).
  • a difference file is a file providing information sufficient to generate a file that can be operated upon when used in conjunction with a reference file.
  • a difference file may map the differences between a first file and a second file.
  • a node 10 having received a difference file and a reference file can recover, from the difference file, information relating to modifications that have been made relative to the reference file.
  • the node 10, having the reference file and the difference file can generate a modified file corresponding to the modifications made relative to the reference file and can operate upon the modified file.
  • a differencing file which embodies the differences between two or more generations of file revisions, may also be used to generate a modified file based on a reference file.
  • a differencing file is a subset of a difference file which can be generated by combining two or more in a sequence of difference files.
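Combining a sequence of difference files into one differencing file can be sketched as follows. The representation is an assumption made for illustration: a file is a list of blocks, a difference file is a dict mapping a block index to its new content, and files only grow or change in place (no truncation). The function names are hypothetical.

```python
def apply_difference(blocks, diff):
    """Return a new block list with the changed blocks from `diff` applied."""
    result = list(blocks)
    for index in sorted(diff):
        if index < len(result):
            result[index] = diff[index]
        else:
            # Assumption: appended blocks arrive with contiguous indices.
            result.append(diff[index])
    return result


def combine_differences(diffs):
    """Merge an ordered sequence of difference files into one differencing file.

    Later difference files win, because they describe more recent versions.
    """
    combined = {}
    for diff in diffs:
        combined.update(diff)
    return combined
```

Applying the combined file to version one yields the same result as applying each difference file in turn, but the intermediate versions are not reconstructed along the way.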
  • a difference file may be substantially smaller than the size of a reference file.
  • Transferring only a difference file, rather than the entire reference file between nodes 10 over the network 14 may substantially reduce the amount of data that must be transferred over the network 14, which may enable faster and more efficient synchronization of files.
  • one of the nodes 10 may be referred to as a "server" for the purposes of storing one or more particular reference files and associated difference files.
  • the first node may act as a server for a first file or first group of files while the second node may act as a server for a second file or a second group of files.
  • the term "server" or "server node" identifies which node is acting as the server in a particular exchange. It will be understood that the server may act as a node other than the server in a different exchange.
  • a plurality of nodes 10 may each be referred to as a server, wherein each node 10 may be a server of particular reference files and associated difference files.
  • a first reference file may be provided on a first node 10 which is the server of the first reference file.
  • a second reference file may be provided on a second node 10 which is the server of the second reference file, and so on for any number of reference files.
  • a third node may act as a server for a third file but act as a node other than the server for the first and second files.
  • any particular node 10 could also be the server of a plurality of such reference files.
  • the server may track and store data associated with files, for example, the version number of the file. Newer versions of a file may comprise incremental changes from an older version of the file. Version information controlled by the server may be accessed by the nodes 10. By centralizing storage of the version number for a particular file, the server is operable to determine whether each node 10 is operating upon the most recent version of that file.
  • At least one of the nodes 10 further comprises, or is linked to, a database 17.
  • the database 17 may be used to store information relating to files stored on the data store 16.
  • a server may store on its database 17 a version number for each reference file (which may correspond to the number of associated difference files for that reference file) stored on its data store 16, as well as one or more annotations for each such reference file and/or difference file. These annotations may be implemented by metatags associated with such file.
  • Nodes other than the server may also comprise or be linked to a database for storing information relating to files stored on their respective data stores 16, for example version numbers of such files.
  • each node 10 includes a differencing module 20 and a synchronization module 22.
  • Node 10 may further comprise a modification detection module 28, a compression module 24, a search module 26 and a file monitor 30.
  • the differencing module 20 is operable to generate a difference file from a reference file and modifications made relative to the reference file, which may be embodied in a permanently or temporarily stored modified file.
  • the modified file need not be stored on the data store at any time, though it may be stored in memory 8 while the node 10 is operating upon it. However, it will be appreciated that the modified file may also be stored on the data store in some embodiments.
  • the differencing module 20 is also operable to apply the difference file to the corresponding reference file (or a differencing file) to generate a modified file to be operated upon.
  • the synchronization module 22 may be operable to facilitate the synchronization of files between a node 10 and a server.
  • the synchronization module 22 may cooperate with the node's transceiver 15 to enable a node 10 to communicate (receive and transmit) reference files, difference files and differencing files with another node 10.
  • the node's modification detection module 28 is operable to detect whether modifications have been made to a file on node 10 regardless of whether the node 10 is connected to a server over network 14.
  • the node 10 may comprise a compression module 24 to compress and decompress reference files and difference files to reduce the size of such files.
  • Compression of files may be provided to reduce the amount of data that must be sent over the network 14, thereby reducing the time required to send a file.
  • the node's search module 26 may be operable to perform searches for files regardless of the connectivity state of the node 10 to the network 14. For example, the node's search module 26 may enable searching for files on other nodes despite an unreliable connection using metadata, as will be further described herein. Typically, performing a search from a node 10 for information on a server requires a connection between the server and the node 10. Over an unreliable network, the connection may not always be available, which may cause long search times and timed-out searches.
  • In operation, in one example, a node 10 (or a user of the node 10) may request to access a particular file. The node's synchronization module 22 determines whether a version of the requested file is stored on the node's data store 16.
  • Referring to FIG. 3, an example process for a node obtaining a file from a server, updating the file, and providing the updated version of the file to the server is shown.
  • access to a file at the node is requested.
  • the synchronization module on the node determines that the file does not exist in the node's data store.
  • the node further determines the server for the requested file and requests the file from that server in 114.
  • the server accesses its database 17 to determine the version number of the requested file, and correspondingly accesses its data store 16 to retrieve the requested file.
  • the synchronization module of the server, in 116, provides to the node the file that the node has requested, as well as the associated version information.
  • the node stores the file in the node's data store 16 as a reference file and stores the corresponding version information in the database 17 of the node.
  • the node generates a modified file by making modifications to the reference file and stores, in the database 17, version information associated with the file.
  • step 120 could comprise a user making an addition to the file and storing the modified file as a new version.
  • differencing module 20 generates a difference file based on the modified file and the reference file stored in 118.
  • the node provides the server with the difference file and associated version information. Upon obtaining the difference file, the server updates the reference file based on the difference file and saves the updated file as a new version in 126.
  • the server node's synchronization module 22 accesses its database 17 to determine the most recent version number of the requested file, and correspondingly accesses its data store 16 to retrieve the reference file and one or more associated difference files.
  • the number of difference files may be derived from the version number. For example, a modified file generated from a reference file associated with four difference files may have a version number of five.
  • the server then provides the reference file and the associated difference files to the node, where the differencing module 20 of the node generates a first modified file based on the reference file and the associated difference files and stores the first modified file in 118.
  • upon the node further modifying the file to produce a second modified file, the differencing module of the node would then generate an additional difference file which maps the differences between the first modified file and the second modified file. As in the above description, the node would then provide the server with the difference file in 124. The server may then apply the difference file to a reference file or simply store the difference file to be shared with other nodes.
  • the first node's synchronization module 22 determines its corresponding version number, for example, by accessing the first node's database 17 which stores the version number for each file stored on the first node's data store 16. The first node's synchronization module 22 transmits the version number to the node functioning as a server.
  • the server node's synchronization module 22 accesses its database 17 to determine the version number corresponding to the requested file on the server's data store 16. If the version numbers of the first node and server node are identical, the server node need not transmit any file to the first node 10. If the version numbers differ, the server node's synchronization module 22 directs its differencing module 20 to generate a differencing file corresponding to the set of difference files for the intervening version numbers. The server's synchronization module 22 may transmit the differencing file and version number to the node's synchronization module 22.
  • the node's differencing module 20 then generates a modified file by applying the differencing file to its version of the file.
  • the first node 10 stores the modified file in its data store 16 and stores the version number in its database 17.
  • the first node 10 may overwrite its previous version of the requested file with the modified file.
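The version comparison on the server can be sketched as follows. The sketch assumes that version numbers are integers starting at one and that the version number equals the number of stored difference files plus one, as described earlier. The function name and list layout are hypothetical, and the case of a node reporting a version newer than the server's is treated as an error here only because the text does not address it.

```python
def differences_to_send(server_diffs, node_version):
    """Decide what the server must send to a node reporting `node_version`.

    Assumed layout: server_diffs[i] upgrades version i + 1 to version i + 2,
    so the server's current version is len(server_diffs) + 1.
    Returns (server_version, difference files for the intervening versions).
    """
    server_version = len(server_diffs) + 1
    if node_version == server_version:
        # Identical versions: nothing needs to be transmitted.
        return server_version, []
    if not 1 <= node_version < server_version:
        raise ValueError(f"unexpected node version {node_version}")
    return server_version, server_diffs[node_version - 1:]
```

The returned list is what the differencing module would combine into a single differencing file before transmission.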
  • the server node may store in its database 17 an indicator that the node has accessed the file.
  • the node 10 may operate upon a file to create a further modified file. Once the node 10 has finished modifying the file, resulting in a further modified file, the differencing module 20 constructs a difference file based on the further modified file and the (received) modified file. The node 10 stores the further modified file on its data store 16 and an updated version number on its database 17. The node 10 may overwrite the (received) modified file with the further modified file on its data store 16. The node's synchronization module 22 transmits the difference file to the server node's synchronization module 22. The server node's differencing module 20 may save the difference file to its data store 16 and update the version number corresponding to that file on its database 17.
  • the file management system may further enable a first node 10 and a second node 10 to synchronize a file through an intermediary server. Once modifications are made on a first node 10, the updated file may be provided to the second node 10 via the intermediary server.
  • this may be done, for example, by the first node 10 providing the difference file to a server in accordance with the foregoing, the server determining from its database other nodes that have accessed the file (in this case, the second node), the server updating the version number of the file in accordance with the foregoing, the server requesting that the second node 10 provide it with its version number for the file, and the server correspondingly generating a differencing file to enable the second node 10 to generate a modified file in accordance with the foregoing.
  • the difference file is provided to the second node in a push model, rather than as a response to a request by the second node.
  • Such a request from the node acting as a server to any one or more nodes may be provided as follows.
  • the server's synchronization module 22 transmits a notification to other nodes 10 that have previously accessed the reference file associated with a received difference file (from a node 10). Any such other node's synchronization module 22 may correspondingly determine the version number of its corresponding copy of the file and send the version number to the server.
  • the server's synchronization module 22 transmits to each node 10 a differencing file to update the node's file to the current version.
  • any two or more of such nodes 10 may receive a distinct differencing file as they may not have been previously updated to the same version number, for example if any such nodes 10 were offline at the previous update.
  • Such nodes' differencing module 20 may update such file on its data store, along with a corresponding version number on its database when it receives the differencing file.
  • the file management system also enables a node 10, or a user operating a node 10, to view the actual contents of the file, as well as the metatags associated with the state of the locally modified file and the state of the file on a server.
  • the server may be out of contact for a period of time while the file is being modified at the node 10.
  • a reference file preferably corresponds to the file on the server at the time that the file was most recently synchronized.
  • a difference file may be generated on the first node based on the reference file and a version of the file that was modified on the first node. For example, if the first node obtained a reference file from the server, the first node may modify the reference file to form a modified file. A difference file may then be generated on the first node based on the modified file and the reference file.
  • the modification detection module 28 may be operable to determine whether the reference file differs from a more recent version of the file on the node 10.
  • the modification detection module 28 may provide updated information regarding the state of the local file with respect to the state of a reference file in the temporary absence of a network connection.
  • the modification detection module 28 may also provide an indication to a user that the reference file differs from a modified file, as is described below.
  • although the file on the server may have been modified since the last synchronization by a second node, the first node or a user at the first node may be able to determine, from the indication, which of the files have been modified at the node since the time of the last synchronization.
  • the node may be provided with a display, the display being operable to provide a displayed list of files on a node.
  • the list of files may be provided with a visual indication that the version of the file on the node differs from the reference file, based on information received from the modification detection module 28.
  • the indication that the file on the node differs from the reference file may comprise, for example, a pair of arrows that point in opposite directions when the files differ.
  • the indication may also comprise further details, for example, the date and time that the file was last modified as well as the date and time that the last synchronization occurred, or a percentage outlining what percent of the blocks in the file are identical. Other indications that provide the user with a sensory experience based on differences in the files may also be possible.
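The "percentage of identical blocks" indication can be sketched as follows. This is an illustration under assumptions: blocks are fixed-size, SHA-256 stands in for the unspecified hash, blocks are compared position by position, and the function names are hypothetical.

```python
import hashlib


def block_hashes(data, block_size=4096):
    """Hash each fixed-size block of `data` (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def percent_identical(reference, local, block_size=4096):
    """Percentage of block positions whose contents match in both files."""
    ref = block_hashes(reference, block_size)
    loc = block_hashes(local, block_size)
    total = max(len(ref), len(loc))
    if total == 0:
        return 100.0  # two empty files are trivially identical
    same = sum(1 for a, b in zip(ref, loc) if a == b)
    return 100.0 * same / total
```

Because this only needs the local file and the locally stored reference file, it can be computed while the node is offline.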
  • the file management system may further provide compression of files.
  • the amount of time required to transfer the file between nodes 10 over the network 14 may be inconveniently or prohibitively long.
  • a node's compression module 24 may compress the file that is being transmitted by applying a compression algorithm.
  • the compression algorithm may comprise, for example, VCDIFF, another format for encoding compressed and/or differencing data, or any appropriate compression algorithms for the types of files being transmitted.
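A compression module of this kind can be sketched as follows. Note that this sketch swaps in zlib (DEFLATE) from the Python standard library as a stand-in; it is not VCDIFF, and the function names are hypothetical.

```python
import zlib


def compress_for_transfer(payload: bytes, level: int = 6) -> bytes:
    """Compress a reference or difference file before sending it.

    zlib is used here only as a readily available stand-in for whatever
    algorithm suits the file types being transmitted.
    """
    return zlib.compress(payload, level)


def decompress_after_transfer(blob: bytes) -> bytes:
    """Restore the original bytes on the receiving node."""
    return zlib.decompress(blob)
```

For already-compressed file types the output may be no smaller than the input, so a real module might skip compression for those files.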
  • the differencing module 20 segments a reference file into a plurality of blocks.
  • the size of the blocks may be determined based on preconfigured segmenting parameters which the differencing module 20 may adaptively adjust. Examples of block sizes may be 4096 bytes, 8192 bytes, etc.
  • the differencing module 20 may compute a hash of each block, and assign an identifier (e.g. a number) to each of the blocks in step 38.
  • the differencing module 20 may segment the corresponding modified file in step 40, assign an identifier (e.g. a number) and compute the hash of each of its blocks in step 42.
  • the hash of the blocks of the modified file may then be compared with the hash of the blocks of the reference file in step 44.
  • the differencing module 20 determines which blocks have been modified between the modified file and the reference file in step 46.
  • the differencing module 20 may generate a difference file comprising the modified blocks and modified block identifiers in step 48. It will be appreciated that the differencing module 20 may alternatively compare the blocks without hashing. It will also be appreciated that certain of the foregoing steps can be performed in different sequences without an effect on functionality.
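The segment, hash, compare and collect steps above can be sketched as follows. The sketch makes several assumptions: fixed-size blocks, SHA-256 as the hash, the block index as the block identifier, and a hypothetical dict layout for the difference file that also records the total block count so that a shortened file can be rebuilt.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed segmenting parameter


def split_blocks(data, block_size=BLOCK_SIZE):
    """Segment a file into fixed-size blocks; the index is the identifier."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def make_difference(reference, modified, block_size=BLOCK_SIZE):
    """Build a difference file holding only the blocks that changed."""
    ref_hashes = [
        hashlib.sha256(b).digest() for b in split_blocks(reference, block_size)
    ]
    mod_blocks = split_blocks(modified, block_size)
    changed = {}
    for ident, block in enumerate(mod_blocks):
        is_new = ident >= len(ref_hashes)
        if is_new or hashlib.sha256(block).digest() != ref_hashes[ident]:
            changed[ident] = block
    return {"block_count": len(mod_blocks), "blocks": changed}


def apply_difference(reference, diff, block_size=BLOCK_SIZE):
    """Rebuild the modified file from the reference file and a difference file."""
    ref_blocks = split_blocks(reference, block_size)
    out = []
    for ident in range(diff["block_count"]):
        if ident in diff["blocks"]:
            out.append(diff["blocks"][ident])
        else:
            out.append(ref_blocks[ident])  # unchanged block, taken as-is
    return b"".join(out)
```

With fixed-size blocks, an insertion near the start of the file shifts every later block, so nearly all blocks would be reported as changed.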
  • the differencing module 20 may segment blocks by setting markers based on the contents of the file, enabling a modified file to be compared to a reference file even if it comprises significant rearrangement. For example, each block may be hashed. The hashes of one file may then be compared with the hashes of another file to determine which blocks are identical and which blocks must be transmitted in a difference file over the network 14, to synchronize the files.
  • the hashes are compared locally by a node's differencing module 20 rather than at a server; however, the hashes may be transmitted to the server for comparison by its differencing module 20. Since only those blocks that have been changed may need to be transferred over the network 14, the generated difference file may comprise the data located in these blocks.
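Setting block markers from the file contents can be sketched as follows. This is a deliberately simple stand-in for content-defined segmentation: a block ends wherever the sum of the last `window` bytes, modulo `divisor`, hits a fixed value, or when the block reaches `max_size`. The marker rule, parameter names and defaults are all assumptions; SHA-256 again stands in for the unspecified hash.

```python
import hashlib


def content_defined_blocks(data, window=4, divisor=64, max_size=8192):
    """Split `data` at markers derived from the bytes themselves."""
    blocks = []
    start = 0
    for i in range(len(data)):
        size = i - start + 1
        at_marker = (
            size >= window
            and sum(data[i - window + 1:i + 1]) % divisor == divisor - 1
        )
        if at_marker or size >= max_size:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])  # trailing bytes after the last marker
    return blocks


def blocks_to_send(reference, modified, **params):
    """Blocks of `modified` whose hash matches no block of `reference`."""
    known = {
        hashlib.sha256(b).digest()
        for b in content_defined_blocks(reference, **params)
    }
    return [
        b for b in content_defined_blocks(modified, **params)
        if hashlib.sha256(b).digest() not in known
    ]
```

Because the markers move with the content, bytes inserted at the front of a file change only the block they land in, and the later blocks still match by hash.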
  • the differencing module 20 may generate and cache the difference file locally before the synchronization process is initiated. For example, if the differencing module 20 generates the difference file on a node 10, the difference file may be cached on the node's memory 8 prior to the synchronization process being initiated.
  • the difference files may be generated by the node prior to re-establishing the network connection.
  • the node may generate the difference files in the background while performing other tasks.
  • when the network connection is re-established, the difference files have already been generated in full or in part, expediting transmission over the network.
  • the computational steps required once the network connection is reestablished may be significantly lower than if the difference files had been generated at the time that the network is re-established.
  • the synchronization process may be completed more quickly, with a lower risk of again losing the connection during synchronization, and with less interruption to other activities that require use of the node's processor and must take place while the network is established.
  • the differencing module 20 may also generate a difference file as the modifications to the file are taking place.
  • the synchronization module 22 may apply a break and resume transfer algorithm to continue synchronization when the network connection is re-established.
  • the break and resume transfer algorithm may be any algorithm enabling a file to be transferred where it has previously not been transferred or only partially transferred.
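One possible break and resume scheme can be sketched as follows. The text allows any such algorithm; this one simply resumes from the number of bytes the receiver already holds. The `deliver` callable is a hypothetical stand-in for the network link and is assumed to raise `ConnectionError` when the connection drops.

```python
def resume_transfer(payload, received, deliver, chunk_size=4):
    """Send `payload` in chunks, continuing from what `received` already holds.

    `received` is a bytearray that survives across attempts; `deliver`
    appends a chunk to it or raises ConnectionError if the link is down.
    Returns True once the whole payload has arrived, False if interrupted.
    """
    try:
        while len(received) < len(payload):
            offset = len(received)  # resume point: bytes already transferred
            deliver(received, payload[offset:offset + chunk_size])
    except ConnectionError:
        return False
    return True
```

Calling the function again after a failure continues the transfer without resending the chunks that already arrived.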
  • some files may be associated with other files.
  • an executable program may require an input from a spreadsheet. If the executable file is synchronized but the spreadsheet is not, the executable file may not have access to the required input value.
  • a first file may comprise an executable that retrieves data from a comma separated value (.csv) file to perform a pre-determined calculation. If the node's synchronization module 22 is set to synchronize the executable file, all related data, including the .csv file, may also be synchronized. As the executable file may not be of use without the most recent .csv file, associating the .csv file with the executable file for the purposes of the synchronization process prevents erroneous, incomplete, or non-functional groups of files from synchronizing. It will be appreciated that other examples of associations between files may exist. If one of the group of associated files cannot be synchronized, a warning message may appear to alert a user at a node 10 that a related file is missing. Alternatively, the file management system may block the file from being shown, or even delete the file, as this file may contain or initiate an error at another node 10. The newly synchronized file may otherwise, or in addition, remain hidden until all associated files may be synchronized.
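Pulling associated files into the synchronization list can be sketched as follows. The association map, the file names and the function name are hypothetical; associations are assumed to be followed transitively, which the text does not state explicitly.

```python
def expand_sync_list(selected, associations):
    """Return the selected files plus every file associated with them.

    `associations` maps a file name to the files it depends on.
    Order is breadth-first from the user's selection, without duplicates.
    """
    to_sync = []
    seen = set()
    pending = list(selected)
    while pending:
        name = pending.pop(0)
        if name in seen:
            continue
        seen.add(name)
        to_sync.append(name)
        pending.extend(associations.get(name, []))
    return to_sync
```

A caller could keep a newly synchronized file hidden until every name returned for its group has been synchronized.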
  • only one node 10 may modify a particular file at any given time, to reduce the likelihood that two nodes 10 will simultaneously operate upon a particular version of the file and attempt to synchronize different modified files.
  • a node 10 may be restricted to updating only the most recent version of the file on the server. If a node 10 accesses a particular file, the server's synchronization module 22 may indicate in its database that the file is "checked out" by the node 10. The node 10 that has checked out the file may be given the authority to designate a file as a master copy when the synchronization module 22 synchronizes the file with the server.
  • the master copy designation may be saved on the database 17.
  • the node 10 may then check in the file to allow other nodes 10 to designate the file as a master copy.
  • the node 10 may save the file as a new version and store the version information in the database 17.
  • the version information which may comprise information indicating whether the file is a master copy, may be stored on the database 17, whereas the reference file itself, and any associated difference files (or differencing files), may be stored on the data store 16. If another node 10 attempts to access the file, the other node 10 may be provided the file, however, because the first node 10 had already checked out the file, any modifications made by the second node 10 will not be applied as differencing files for that file. The second node 10 may save its modifications to a new file, however.
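The check-out and check-in bookkeeping can be sketched as follows. The `CheckoutRegistry` class and its method names are hypothetical; the sketch assumes one holder per file and keeps the state in memory, whereas the text stores it in the server's database 17.

```python
class CheckoutRegistry:
    """Tracks which node, if any, has each file checked out."""

    def __init__(self):
        self._holder = {}  # file id -> id of the node holding the checkout

    def check_out(self, file_id, node_id):
        """Return True if node_id now holds (or already held) the checkout."""
        holder = self._holder.setdefault(file_id, node_id)
        return holder == node_id

    def check_in(self, file_id, node_id):
        """Release the checkout; only the holding node may do so."""
        if self._holder.get(file_id) != node_id:
            raise PermissionError(f"{node_id} does not hold {file_id}")
        del self._holder[file_id]

    def may_designate_master(self, file_id, node_id):
        """Only the node that checked the file out may mark a master copy."""
        return self._holder.get(file_id) == node_id
```

A node that fails `may_designate_master` can still receive the file; its modifications would be saved as a new file rather than applied to the checked-out one.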
  • when the synchronization module 22 synchronizes a file on a node 10 with the corresponding file in the data store 16, the file may remain on the server. Similarly, when the synchronization module 22 updates a file on a node 10 with a newer version that is available on the data store 16 and delivered through the server, the older version of the file may not be deleted. The older version of the file may be retained and the update may be stored by way of storing one or more difference files that are associated with the file.
  • in one example, a reference file on node 10 is version one of a file which has since been modified in two successive iterations to yield modified versions two and three.
  • the synchronization module 22 on node 10 provides the server node with both revisions in the form of two difference files or a single differencing file.
  • a first difference file may provide the differencing module on the server with the information necessary to construct the modified file corresponding to version two.
  • a second difference file may provide the differencing module on the server with the information necessary to construct version three of the file based on version two.
  • both the first version of the file and the difference files that enable the differencing module 20 to construct the second and third versions of the file are required to enable access to all three versions of the file.
  • the node may apply a single differencing file, corresponding to the difference between version one and version three, to obtain version three of the file.
  • access to version two of the file may not be available.
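The relationship between a reference file and its successive difference files can be sketched as follows. This is a minimal illustration under stated assumptions, not the differencing method of the described system: the copy/insert delta format and the helper names `make_diff` and `apply_diff` are hypothetical, built here on Python's standard `difflib`.

```python
from difflib import SequenceMatcher

def make_diff(old, new):
    """Build a hypothetical difference file: a list of copy/insert operations."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse lines old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # carry only the new lines
        # "delete": nothing is emitted, so the old lines are dropped
    return ops

def apply_diff(old, ops):
    """Rebuild the newer version from the older version and a difference file."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

v1 = ["collar 100", "depth 50", "assay pending"]
v2 = ["collar 100", "depth 75", "assay pending"]
v3 = ["collar 100", "depth 75", "assay 1.2 g/t"]

d12 = make_diff(v1, v2)   # first difference file
d23 = make_diff(v2, v3)   # second difference file
d13 = make_diff(v1, v3)   # single combined differencing file

# Two difference files keep every version reachable.
rebuilt_v2 = apply_diff(v1, d12)
rebuilt_v3 = apply_diff(rebuilt_v2, d23)

# The combined file reaches version three directly,
# but version two cannot be recovered from it.
direct_v3 = apply_diff(v1, d13)
```

Storing `d12` and `d23` separately costs one extra file but preserves access to version two, whereas `d13` alone trades that access for a single transfer.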
  • the synchronization module 22 on a node 10 other than the server may cause the server to update its data store 16. For example, if modifications to a reference file 50 were made on a node 10 that had checked out the file, the synchronization module 22 of that node may provide the server with the difference file that had been calculated by that node's differencing module 20. The server may store this difference file on its data store 16. The difference file may then be used by the differencing module of the server to construct the second version of the file 52.
  • the server may store the reference file on the data store 16 as well as each of the difference file updates. It may be noted that in this example, the reference file, as well as the difference files provided to the data store 16, are saved on the data store 16 of the server.
  • the reference file 50 may be uploaded to the data store 16 on the node acting as the server.
  • the synchronization module 22 of a node 10 not acting as the server then checks out the reference file 50. While the file is checked out, a second node 10, also not acting as a server, may access the reference file 50 on the data store 16 of the server by downloading the reference file. Once the second node 10 has finished modifying the file, the second node's synchronization module 22 may provide the modifications to the data store 16 on the server. To provide the modifications to the data store 16 on the server, the second node's differencing module 20 can calculate a difference file 54 locally on the second node based on the reference file 50 and the modified file 52.
  • the second node's synchronization module 22 may then provide the difference file 54 to the server to be stored on the data store. Since the reference file 50 was checked out by the first node 10, the file produced by the second node 10 may not be designated as a master copy. Hence, the difference file may be saved separately as B1. The information relating to the file's version may be stored on the database of the server.
  • a second difference file 56 may be saved on the data store 16 of the server.
  • the first node's differencing module 20 may compute the difference file, which may then be provided to the data store 16 of the server by the synchronization module 22.
  • the difference file uploaded to the data store 16 of the server corresponds to the second master version.
  • the synchronization module 22 may provide a further difference file 58 to the data store 16 of the server as a master version of the file and save the corresponding version information on the database 17 of the server.
  • the first node 10 may then check in the file once the first node 10 has completed any modifications.
  • the synchronization module 22 may provide the resulting difference file, as outlined in the process explained above, to the data store 16 of the server.
  • This difference file 62 may be stored on the data store 16 of the server with the master copy designation.
  • the information relating to the master copy designation may be saved in the database 17 of the server.
  • the modifications may be uploaded to the data store 16 of the server in the form of a difference file 60 by the synchronization module 22 but may not be saved as a master copy.
  • the synchronization module 22 may provide a copy of the most recent difference file 64 computed by the differencing module 20 to the data store 16 of the server. Hence, in this example, at each new update of the reference file, a new difference file is provided to the data store 16 and no files are deleted.
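The check-out rule described above, in which only the node holding the check-out produces master versions while other nodes' difference files are stored as side branches, might be modelled as in the sketch below. The class names, the `node-1`/`node-2` identifiers, and the `M2` style label for master versions are assumptions made for illustration; only the `B1` label appears in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StoredDiff:
    label: str       # "B1" for a side branch; "M2" is a hypothetical master label
    node_id: str
    is_master: bool

@dataclass
class FileRecord:
    checked_out_by: Optional[str] = None
    diffs: List[StoredDiff] = field(default_factory=list)
    masters: int = 1     # the reference file counts as master version 1
    branches: int = 0

    def check_out(self, node_id):
        if self.checked_out_by is not None:
            return False             # another node already holds the file
        self.checked_out_by = node_id
        return True

    def submit_diff(self, node_id):
        """Store a difference file; nothing already stored is deleted."""
        is_master = node_id == self.checked_out_by
        if is_master:
            self.masters += 1
            label = f"M{self.masters}"
        else:
            self.branches += 1
            label = f"B{self.branches}"
        entry = StoredDiff(label, node_id, is_master)
        self.diffs.append(entry)
        return entry

    def check_in(self, node_id):
        if node_id == self.checked_out_by:
            self.checked_out_by = None

record = FileRecord()
record.check_out("node-1")
side = record.submit_diff("node-2")     # edits made while node-1 holds the file
master = record.submit_diff("node-1")   # edits from the holder of the check-out
```

In this sketch the master designation is plain metadata kept next to the stored difference file, mirroring the split between the database 17 and the data store 16.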
  • a node 10 may log operations performed on files stored on its data store, for example to determine the history of file updates, particularly if the node 10 is a server for such files.
  • the operations performed on the data store 16 may be identified with, for example, a timestamp, identification of the node 10 (and/or its user), location of the node 10 (and/or its user), MAC address/computer ID, etc.
  • a node 10 may request a version of the file from the server that is not the most recent version of the file stored on the server.
  • the node 10 may also request a copy of the file that is more than one version behind the most recent version.
  • the server's differencing module may be operable to generate a checkpoint file.
  • a checkpoint file is a complete file that can be accessed by a node 10 without requiring the server to apply a difference file to a reference file.
  • a checkpoint file may be saved at predetermined intervals to reduce the number of computations that must be performed by the server's differencing module 20 if there are many versions of the file.
  • the file management system may be operable to track the number of times a new version of a particular file has been saved.
  • the file management system may be also operable to save a checkpoint file based on other parameters, for example, the number of version changes, the date, time elapsed since the last checkpoint was saved, the amount of content that has changed between version updates, etc.
  • the file management system may also be operable to save checkpoint files at intervals that are based upon how often nodes 10 update files and how often nodes 10 request older versions of the file.
  • the server can transmit the checkpoint file to the node 10.
  • the server's differencing module 20 may be operable to compute the requested version of the file by applying a difference file to the most appropriate checkpoint file.
  • the tenth version of a checkpoint file may be provided to the node 10 by the server.
  • the server may calculate the eleventh version by applying a difference file to the tenth version.
  • the server may also, depending on the difference files that are stored, apply one or more difference files to the fifteenth version to obtain the tenth version.
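How a requested version might be computed from the most appropriate checkpoint file can be sketched as below. The sketch assumes forward difference files only (each one maps version n-1 to version n) and a toy delta format of `(line_index, new_line)` pairs; the backward case mentioned above, from the fifteenth version to the tenth, is not modelled. All function names are hypothetical.

```python
def apply_diff(lines, diff):
    """Apply a hypothetical difference file: a list of (index, new_line) pairs."""
    out = list(lines)
    for index, new_line in diff:
        if index < len(out):
            out[index] = new_line
        else:
            out.append(new_line)
    return out

def reconstruct(version, checkpoints, diffs):
    """Return the requested version and the number of difference files applied.

    checkpoints: {version: complete file}
    diffs:       {version: difference file from version - 1 to version}
    """
    base = max(v for v in checkpoints if v <= version)   # nearest earlier checkpoint
    lines = checkpoints[base]
    for v in range(base + 1, version + 1):
        lines = apply_diff(lines, diffs[v])
    return lines, version - base

# Build twelve versions; each new version appends one line.
versions = {1: ["rev 1"]}
diffs = {}
for v in range(2, 13):
    diffs[v] = [(v - 1, f"rev {v}")]
    versions[v] = apply_diff(versions[v - 1], diffs[v])

checkpoints = {1: versions[1], 10: versions[10]}   # checkpoint at the tenth version

tenth, applied_10 = reconstruct(10, checkpoints, diffs)      # served directly
eleventh, applied_11 = reconstruct(11, checkpoints, diffs)   # one difference file
ninth, applied_9 = reconstruct(9, checkpoints, diffs)        # eight files, from version 1
```

The count returned alongside each file shows the trade-off: version nine is the most expensive to rebuild here because its nearest earlier checkpoint is version one.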
  • the node's synchronization module 22 may be operable to provide the requested version of the file to the node 10 through the network 14.
  • the server may compute a differencing file mapping the differences between the twentieth version and the ninth version and transmit this difference file to the node 10.
  • the transmission may be completed more quickly.
  • the node's differencing module 20 may be operable to compute version nine of the file. By saving only a certain number of version files but saving enough difference files to enable the intermediate versions of the file to be calculated by the server's differencing module, the required amount of memory on the data store 16 of the server may also be reduced.
  • the differencing module may be operable to calculate relevant combinations of difference files in coordination with the checkpoint files.
  • the server saves a checkpoint file for every ten versions of updates, and there are a total of fifty five versions of a particular file, then forty five difference files may be required to provide access to each of the versions in between the checkpoints.
  • most nodes 10 will have a version that is no more than ten versions old and will want to update the version of the file to the most recent version. This reduces the number of required difference files (or differencing files) to ten.
  • the difference files enabling a particular number of past versions to be updated to the most recent version can be stored in order to reply to any node's request for an update more rapidly.
  • a node 10 may be operable to generate a synchronization list to request a plurality of files from the server. All files on the synchronization list may be updated as a group, or in priority, when there is access to the network 14.
  • the node 10 may update the files on the synchronization list when the user is away from the node 10 or not using the network connection, resulting in more bandwidth being available for synchronization processes.
  • some version of each file may need to be stored on the node 10, for example, a reference file.
  • the node's synchronization module 22 requests that the server 16 provide it with corresponding difference files for each such reference file.
  • Referring to FIG. 6, a method by which a first node, which is not acting as a server, receives an updated version of a file from a server is explained.
  • an operator of the node requests access to a file.
  • the synchronization module on the node determines, from the node's database 17, the version of the file in the node's data store 16. The node determines whether the file is on the synchronization list.
  • the node determines that the file is on the synchronization list.
  • the node requests an updated version from a server and provides the server with the version identifier of the file that exists on the node in 136.
  • the server determines whether the version identifier of the file on the server is more recent than the version on the node. Upon determining that the version on the server is more recent, the server provides the node with the one or more difference files required to generate an updated version of the file from the reference file on the node in 140.
  • the difference files are generated as described above. As explained above, the difference files may be generated before, or after, the update request from the node.
  • the node stores the difference file in memory and generates a modified file based on the difference file obtained from the server and an existing reference file on the node.
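The exchange described above can be sketched as a pair of small classes. This is an illustration under stated assumptions rather than the method of FIG. 6 itself: the `Server` and `Node` classes, the integer version identifiers, the file name, and the `{line_index: new_text}` delta format are all hypothetical.

```python
def apply_diff(lines, diff):
    """Apply a hypothetical difference file: {line_index: new_text}."""
    out = list(lines)
    for index in sorted(diff):
        if index < len(out):
            out[index] = diff[index]
        else:
            out.append(diff[index])
    return out

class Server:
    def __init__(self, diffs, latest):
        self.diffs = diffs      # {version: difference file from version - 1}
        self.latest = latest

    def handle_update_request(self, node_version):
        """Compare version identifiers and return the difference files needed."""
        if self.latest <= node_version:
            return self.latest, []          # node is already up to date
        needed = [self.diffs[v] for v in range(node_version + 1, self.latest + 1)]
        return self.latest, needed

class Node:
    def __init__(self, reference, version, sync_list):
        self.reference = reference
        self.version = version
        self.sync_list = sync_list

    def open_file(self, name, server):
        """Operator requests a file; update it first if it is on the synchronization list."""
        if name in self.sync_list:
            latest, diffs = server.handle_update_request(self.version)
            current = self.reference
            for diff in diffs:
                current = apply_diff(current, diff)
            self.reference, self.version = current, latest
        return self.reference

server = Server(diffs={2: {1: "depth 75"}, 3: {2: "assay 1.2 g/t"}}, latest=3)
node = Node(["collar 100", "depth 50"], version=1, sync_list={"drillhole.csv"})
updated = node.open_file("drillhole.csv", server)
```

Only the version identifier travels from node to server and only difference files travel back, which is what keeps the exchange small on a constrained link.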
  • the node 10 may request, based on the synchronization list, an extensive list of files that are to be synchronized. To conserve memory on the node 10, and bandwidth over the network 14, the files on the synchronization list may be compressed by the node's compression module 24 and stored in a compressed format prior to transmission over the network 14. Similarly, the files transmitted by the server 16 to the node 10 may be compressed prior to
  • the time required for synchronization can be reduced. This can be particularly advantageous if network access is only available for a limited number of hours in a day.
  • the utility of the network 14 may be maximized while it is available.
  • since only one file is being transferred, the transfer may be interrupted and resumed in an intermittently available network without significant loss. Since only certain versions are saved as checkpoint versions and the other versions can be calculated based on difference files by the differencing module 20, the amount of space required from the data store 16 may be significantly reduced.
  • a difference file can be used for updating future files without requiring an extra computation step. This increases the efficiency of the synchronization system and reduces the load on both the server and the nodes 10.
  • the server may distribute updates to each of the nodes 10 in the form of difference files as a more recent version of the file is created, the number of difference files that must be calculated may be reduced. Moreover, since difference files are typically smaller than reference files, there may be a lower probability of file corruption during transmission.
  • Another advantage of the synchronization process of the current invention is that compression may be applied to a particular difference file, further reducing the quantity of data that must be transmitted over the network 14.
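Compressing a difference file before transmission might look like the sketch below. The JSON serialization, the use of `zlib`, and the shape of the example difference file are assumptions chosen for illustration; the text does not specify a compression scheme.

```python
import json
import zlib

def pack_diff(diff):
    """Serialize and compress a difference file before transmission."""
    raw = json.dumps(diff).encode("utf-8")
    return zlib.compress(raw, 9)    # 9 = highest compression level

def unpack_diff(payload):
    """Reverse pack_diff on the receiving side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))

# A hypothetical difference file with many similar entries.
diff = [["insert", 120 + i, "assay sample pending review"] for i in range(200)]

raw_size = len(json.dumps(diff).encode("utf-8"))
payload = pack_diff(diff)
restored = unpack_diff(payload)
```

Difference files with repetitive entries, as in this example, tend to compress well, so the two savings compound: fewer bytes describe the change, and fewer bytes again cross the network 14.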
  • the file management system enables metadata tagging of files in the data store 16 of the server or locally on the node.
  • Metadata tags may be stored in a database 17 of the node, as well as in the database of the server. By storing metadata tags on the database 17 of the node, the metadata tags may be used to perform searches during a temporary interruption of the network connection.
  • the database 17 of the node may provide metadata to the search module 26 of the node.
  • the search module 26 of the node provides search functionality to the nodes 10.
  • each file on a server may be tagged by the node 10, the server, or a search module 26, based on a class of the file.
  • Metadata searches may be performed by the search module on the server using metadata in the database 17 of the server.
  • the metadata may comprise a class.
  • Classes may be user-created or may be automatically created by the search module 26. Classes may exist for particular work sites, particular types of projects, particular employee types and/or particular file types, for example. Files may also be tagged based, for example, on the creator or editor of the file, the date that the file was created, the program used to create the file, the content in the file, the number of times that the file has been accessed, and particular information in the file. For example, in a mining operation, a certain file may be tagged as belonging to the class containing drill-hole data. Each of the files in this class may have a unique set of properties and the node's search module 26 may be operable to search for files based on their class.
  • Metadata tagging in accordance with the foregoing may optimally be applied in connection with data transmission over unreliable networks.
  • the class and tagging information may also be provided to the node 10. This may ensure that class information, as well as other metadata tags associated with the file, are available to a node's search module 26.
  • the node 10 may save the class and tagging data.
  • the server may also provide the tagging data of other files that have been tagged as being similar to the synchronized files.
  • the server may also provide a larger subset of the tagging metadata available on the data store 16 or may provide all metadata associated with the tagging to the node 10.
  • a user at a node 10 not acting as a server may apply metadata tags to search the entire body of files on the data store 16 or a subset of the files on the data store 16. For example, if the user is working at a node 10 that has downloaded the metadata tags for all the files on the data store 16 from a server, the user may search for all files of a specific class or all files tagged with particular information. For example, a user may wish to search information from all drill holes bored using tool steel bits in a particular area. A corresponding search may bring up all files in the class of drill holes bored using tool steel bits in the particular area.
  • the user may then select to have particularly relevant files incorporated into the synchronization list to enable the user to view the file and maintain the file in its most recent version. For example, if a file having an association with other files is added to the synchronization list, all associated files may be similarly added to the synchronization list.
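A search over locally stored metadata, usable while the network is unavailable, could be sketched as follows. The `FileMetadata` structure, the tag keys `bit` and `area`, and the file names are hypothetical; they stand in for whatever classes and tags a deployment defines.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FileMetadata:
    name: str
    file_class: str
    tags: Dict[str, str] = field(default_factory=dict)

def search(index, file_class=None, **required_tags):
    """Return names of files matching the class and every required tag."""
    hits = []
    for meta in index:
        if file_class is not None and meta.file_class != file_class:
            continue
        if all(meta.tags.get(k) == v for k, v in required_tags.items()):
            hits.append(meta.name)
    return hits

# Metadata previously downloaded from the server and kept on the node.
local_index = [
    FileMetadata("dh_001.csv", "drill-hole", {"bit": "tool steel", "area": "north pit"}),
    FileMetadata("dh_002.csv", "drill-hole", {"bit": "carbide", "area": "north pit"}),
    FileMetadata("plan_q3.doc", "short-term plan", {"area": "north pit"}),
]

matches = search(local_index, file_class="drill-hole", bit="tool steel", area="north pit")
sync_list = set(matches)   # selected results can be queued for synchronization
```

Because the search reads only the node's own copy of the metadata, it returns results for files whose contents have not yet been downloaded.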
  • files created by a node 10 can be added to the synchronization list of files that must be synchronized with the server of those files. Since no copy of the file may exist on the data store 16 of the server, the server's synchronization module 22 may upload the file to the data store 16 during the synchronization process. If modifications are made to the file either at a node 10 or at some other node, the differencing module 20 on the server may incorporate updates into the data store 16.
  • Metadata may also comprise folder information that may be relevant to the contents of a particular file. For example, if an existing folder structure is uploaded into the data store 16, the server may create metadata from the folder names or other information associated with the folders being uploaded.
  • the file management system enables off-line synchronization.
  • a node 10 may determine the synchronization status of each file on the node 10 or particular files on the node 10.
  • the system may implement a file monitor 30.
  • the file monitor is operable to determine the difference between modified files and the most recent version of the file downloaded to the node 10 from the server. Since, as explained above, the node 10 stores the most recent copy of the file downloaded from the server as a reference file, the node's modification detection module 24 may compare the modified version of the file to the reference file.
  • the node's modification detection module 24 may provide a warning to the user that the file should be synchronized when access to a network 14 becomes available.
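One way to detect that a file has diverged from its reference copy is to compare content digests, as in the sketch below. Hash comparison is an assumption made here for brevity; the text says only that the modified version is compared with the reference file. The function names and sample data are hypothetical.

```python
import hashlib

def digest(data: bytes) -> str:
    """Content fingerprint of a file's bytes."""
    return hashlib.sha256(data).hexdigest()

def needs_sync(reference: bytes, current: bytes) -> bool:
    """True when the working copy no longer matches the reference file."""
    return digest(reference) != digest(current)

def pending_warnings(files):
    """files: {name: (reference_bytes, current_bytes)} -> names to warn the user about."""
    return sorted(name for name, (ref, cur) in files.items() if needs_sync(ref, cur))

files = {
    "blast_holes.csv": (b"hole,depth\n1,12\n", b"hole,depth\n1,12\n2,14\n"),
    "mine_design.dwg": (b"design rev 4", b"design rev 4"),
}
warnings = pending_warnings(files)
```

A digest comparison only answers whether a file changed; measuring how much it changed, for prioritization, requires comparing the contents themselves.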
  • the file management system may further prioritize the synchronization of files that are most different from the version that had previously been synchronized with the server. For example, if the node's synchronization module 22 is set to synchronize two files, the file that has been most heavily modified compared with the file last accessed from the server will be synchronized first.
  • the user may also provide a manual priority ranking of which of the files on the synchronization list of node 10 should be synchronized first.
  • the priority ranking of the synchronization list may also be determined based on metadata tags or classes applied to the file. Synchronizing higher priority files first may ensure that the highest-priority files are synchronized prior to an interruption in network access.
  • Files may be modified without the node's knowledge; for example, if the file comprises information gleaned during a drilling process monitored by a sensor, the file may be updated in the background on the node 10.
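Ordering the synchronization list might be sketched as below. How a manual ranking and the amount of modification combine is not specified in the text; this sketch assumes that manually ranked files go first (lower rank first) and that the remaining files are ordered by how much they differ from their reference copies, measured with `difflib`. All names are hypothetical.

```python
from difflib import SequenceMatcher

def change_amount(reference, current):
    """0.0 means identical; values near 1.0 mean almost nothing is shared."""
    return 1.0 - SequenceMatcher(a=reference, b=current, autojunk=False).ratio()

def order_sync_list(entries):
    """entries: dicts with name, reference, current and an optional manual_rank."""
    def key(entry):
        rank = entry.get("manual_rank")
        unranked = 1 if rank is None else 0       # ranked files sort first
        change = change_amount(entry["reference"], entry["current"])
        return (unranked, rank if rank is not None else 0, -change)
    return [entry["name"] for entry in sorted(entries, key=key)]

entries = [
    {"name": "blast_holes", "reference": ["a", "b", "c", "d"],
     "current": ["a", "b", "c", "x"]},                      # lightly modified
    {"name": "block_model", "reference": ["a", "b", "c", "d"],
     "current": ["w", "x", "y", "d"]},                      # heavily modified
    {"name": "mine_design", "reference": ["a"], "current": ["a"],
     "manual_rank": 1},                                     # user-prioritized
]
order = order_sync_list(entries)
```

With this ordering, an interruption partway through synchronization leaves behind the files that had changed least.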
  • the node's modification detection module 24 may monitor for differences between the locally stored version of the most recent file during the last synchronization and the most recently updated file on the node 10.
  • the node's modification detection module 24 may be registered with the node's operating system in order to capture file changes from a plurality of programs and processes, similarly to a virus scanning program. All files that should be synchronized with the server may then be synchronized once the network 14 becomes available.
  • the file management system may be better suited for use with unreliable networks than past systems. This allows minimal data transfer and ensures that the files that should be synchronized are synchronized as soon as possible. Furthermore, by enabling a user to search for files on a node 10 when the node 10 is not acting as a server for those files, searches may be conducted in the absence of a network connection.
  • Referring to FIG. 7, an example of operation of the file manager is shown in one example implementation, in which various types of data relating to the preparation, construction and operation of a mine are shown. It shall be understood that this is but one example, and that numerous other example implementations, and processes related to this implementation, may be provided.
  • three types of data may be stored in files on a node acting as a server for those files.
  • the first type of data may be updated frequently.
  • blast-hole data 76, the ore control block model 82, and the short-term plan 90 may be updated on a daily basis.
  • the second type of data may be updated less frequently, for example, the mine design 84 may be updated on a weekly or monthly basis.
  • the third type of data may be updated infrequently, for example, the assay data 70, the drill hole data 72, the solids data 74, the block model 78, and the long-term plan 86 may be updated on a yearly basis.
  • Older versions of a file may not be replaced; however, they may be accessed, edited and saved as a new file or a new version. This ensures that the historical order of the files can always be retrieved from the data store 16.
  • When a user wishes to edit a document, the user must check out the document to make edits in the master copy. Other users may access the same document; however, changes made by these other users may not be saved as a newer version of the document. These changes may be saved as a side branch of the document, as is shown in FIG. 5. Only the user who has checked out the document may save the master version of the document.
PCT/CA2012/050784 2011-11-04 2012-11-05 System and method for data synchronization over a network WO2013071428A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201280065691.2A CN104272649A (zh) 2011-11-04 2012-11-05 用于通过网络进行数据同步的系统和方法
AU2012339532A AU2012339532B2 (en) 2011-11-04 2012-11-05 System and method for data communication over a network

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201161555999P 2011-11-04 2011-11-04
US61/555,999 2011-11-04
CA2769773A CA2769773C (en) 2011-11-04 2012-02-28 System and method for data communication over a network
CA2769773 2012-02-28

Publications (2)

Publication Number Publication Date
WO2013071428A1 true WO2013071428A1 (en) 2013-05-23
WO2013071428A8 WO2013071428A8 (en) 2013-10-31

Family

ID=48222505

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2012/050784 WO2013071428A1 (en) 2011-11-04 2012-11-05 System and method for data synchronization over a network

Country Status (4)

Country Link
CN (1) CN104272649A (zh)
AU (1) AU2012339532B2 (zh)
CA (1) CA2769773C (zh)
WO (1) WO2013071428A1 (zh)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105991685B (zh) * 2014-11-07 2019-06-25 天地融科技股份有限公司 数据更新方法及系统
CN105279100A (zh) * 2015-11-04 2016-01-27 杭州华为数字技术有限公司 链接克隆母卷更新方法及装置
CN106372199B (zh) * 2016-08-31 2019-07-05 镇江乐游网络科技有限公司 一种基于元数据支持的多版本文件管理系统
CN107172169A (zh) * 2017-05-27 2017-09-15 广东欧珀移动通信有限公司 数据同步方法、装置、服务器及存储介质
US10402311B2 (en) * 2017-06-29 2019-09-03 Microsoft Technology Licensing, Llc Code review rebase diffing
CN109308272A (zh) * 2017-07-28 2019-02-05 同星科技股份有限公司 通过数据储存装置控制外围装置的方法与可控制外围装置的数据储存装置
CN108121804B (zh) * 2017-12-22 2020-06-05 百度在线网络技术(北京)有限公司 跨地域分布式存储数据的方法、装置、终端及存储介质
CN110636090B (zh) * 2018-06-22 2022-09-20 北京东土科技股份有限公司 窄带宽条件下的数据同步方法和装置
CN109218447B (zh) * 2018-10-29 2021-09-17 中国建设银行股份有限公司 媒体文件分发方法及文件分发平台
CN111090835B (zh) * 2019-12-06 2022-04-19 支付宝(杭州)信息技术有限公司 一种文件衍生图的构建方法及装置
CN111259072B (zh) * 2020-01-08 2023-11-14 广州虎牙科技有限公司 数据同步方法、装置、电子设备和计算机可读存储介质
CN113094443A (zh) * 2021-05-21 2021-07-09 珠海金山网络游戏科技有限公司 数据同步方法及装置
CN114124928B (zh) * 2021-09-27 2023-07-14 苏州浪潮智能科技有限公司 设备间文件快速同步方法、装置及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2356269A (en) * 1999-06-17 2001-05-16 Ibm Multiple links to versions of a source file in a distributed computer environment
US20010048728A1 (en) * 2000-02-02 2001-12-06 Luosheng Peng Apparatus and methods for providing data synchronization by facilitating data synchronization system design
US20060161516A1 (en) * 2005-01-14 2006-07-20 Microsoft Corporation Method and system for synchronizing multiple user revisions to a shared object
US20070186069A1 (en) * 2004-08-10 2007-08-09 Moir Mark S Coordinating Synchronization Mechanisms using Transactional Memory
WO2011109049A1 (en) * 2010-03-04 2011-09-09 Alibaba Group Holding Limited Method and apparatus of backing-up subversion repository

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1249597C (zh) * 2002-09-03 2006-04-05 鸿富锦精密工业(深圳)有限公司 分布式文件同步系统及方法
CN1261877C (zh) * 2002-10-11 2006-06-28 鸿富锦精密工业(深圳)有限公司 多节点文件同步系统及方法
CN1756108A (zh) * 2004-09-29 2006-04-05 华为技术有限公司 主备系统数据同步方法
CN101142573A (zh) * 2004-10-25 2008-03-12 恩鲍尔技术公司 全局数据同步的系统和方法
US8185495B2 (en) * 2008-02-01 2012-05-22 Microsoft Corporation Representation of qualitative object changes in a knowledge based framework for a multi-master synchronization environment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017048409A1 (en) * 2015-09-15 2017-03-23 Microsoft Technology Licensing, Llc Synchronizing file data between computer systems
US10425477B2 (en) 2015-09-15 2019-09-24 Microsoft Technology Licensing, Llc Synchronizing file data between computer systems
US20170344594A1 (en) * 2016-05-27 2017-11-30 Cisco Technology, Inc. Delta database synchronization
US10671590B2 (en) * 2016-05-27 2020-06-02 Cisco Technology, Inc. Delta database synchronization
CN112714149A (zh) * 2020-11-27 2021-04-27 北京飞讯数码科技有限公司 数据同步方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN104272649A (zh) 2015-01-07
AU2012339532A1 (en) 2014-05-01
CA2769773A1 (en) 2013-05-04
AU2012339532B2 (en) 2016-12-01
CA2769773C (en) 2018-01-09
WO2013071428A8 (en) 2013-10-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12849812

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2012339532

Country of ref document: AU

Date of ref document: 20121105

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12849812

Country of ref document: EP

Kind code of ref document: A1