CN113282540A - Cloud object storage synchronization method and device, computer equipment and storage medium - Google Patents

Publication number
CN113282540A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110625241.5A
Other languages
Chinese (zh)
Inventor
陈飞
李志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/137 Hash-based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/184 Distributed file systems implemented as replicated file system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a cloud object storage synchronization method and device, computer equipment, and a storage medium. The method comprises: encoding the user data of the current client, based on a cyclic push-pull synchronization mode, to obtain a local history tree, a local current tree, a cloud history tree and a cloud current tree; in each round of synchronization, comparing the state information of the local history tree and the local current tree; if a data change has occurred, finding the changed first target data in the local history tree and the local current tree and synchronizing it to cloud storage; comparing the state information of the cloud history tree and the cloud current tree for the current client; if a data change has occurred, finding the changed second target data in the cloud history tree and the cloud current tree and synchronizing it to the local storage of the current client; and updating the user data to a global latest state. The invention keeps the user data in the global latest state.

Description

Cloud object storage synchronization method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a cloud object storage synchronization method and device, computer equipment and a storage medium.
Background
In recent years, cloud computing has become the dominant computing infrastructure because it provides better computing services at a lower cost. Businesses of all sizes are gradually choosing the cloud to manage their data and traffic. As an important component of cloud computing, cloud storage is also evolving rapidly. With cloud storage, users can access their data anytime and anywhere from a variety of platforms (personal computers, tablets, cell phones, etc.). This advantage attracts both enterprise and consumer users to host their data in cloud storage. Once data is hosted in cloud storage, how to manage it to meet the requirements of upper-layer applications or of individual users becomes a key issue.
Among the various data management applications, the cloud storage synchronization program is a useful and crucial one. For security reasons, enterprise users may wish to store data (e.g., still images) on both local storage and a remote cloud, or even on different clouds. The data in the cloud serves normal customer access through Web services, while the data in local storage serves as a backup. If cloud storage is unexpectedly disrupted, the enterprise can restore business from the local copy by redirecting cloud data access to traditional local access. In this scenario, it is crucial for enterprise users to synchronize their data between the cloud and the local data store. Consumer users may also require a cloud storage synchronization service. They typically process their data on several different devices (e.g., office computers, home computers, mobile phones) and naturally want to synchronize data between these devices. Cloud storage synchronization fits this requirement well. Popular cloud disk services such as Microsoft OneDrive are typical cloud storage synchronization applications.
Existing storage synchronization solutions can be divided into traditional network-based solutions and modern cloud-based solutions. In the first category, rsync (a remote data synchronization tool) is the typical solution. rsync synchronizes files and directories between two networked computers and supports block transfer and recursive transfer. It uses differential encoding to reduce the amount of data transferred. Because rsync needs to index the file blocks of both the local and the remote files, it must be installed on both the local and the remote computer. In contrast, current cloud storage services provide only read and write interfaces, which makes rsync and similar solutions unsuitable for cloud storage synchronization. Automatic, real-time synchronization is also a problem for these solutions.
In the second category, the major cloud storage service providers offer closed-source cloud synchronization programs; Microsoft OneDrive (cloud storage service software) and Dropbox (a network file synchronization tool) are two representative examples. These solutions require the user to hand control of the data over to the service provider, so data privacy is a key concern for cautious users. Flexible customization of the storage synchronization system is also not feasible. Furthermore, these solutions typically fix the back-end storage server and do not allow the user to choose among the existing cloud storage service providers, so transferring data from one cloud to another is often very difficult.
Disclosure of Invention
The embodiment of the invention provides a cloud object storage synchronization method and device, computer equipment and a storage medium, aiming at synchronizing user data for different client equipment and enabling the user data to be in a global latest state.
In a first aspect, an embodiment of the present invention provides a cloud object storage synchronization method, including:
based on a cyclic push-pull synchronization mode, encoding current client user data by using a tree data structure to obtain a local history tree and a local current tree, a cloud history tree and a cloud current tree which are used for storing the user data in each round of synchronization;
comparing the state information of the local history tree and the local current tree for each round of synchronization; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
if the state information of the local history tree is different from that of the local current tree, judging that data change occurs, then searching first target data which is changed in the local history tree and the local current tree, and synchronizing the first target data to cloud storage to complete the push synchronization process of the current round;
comparing the state information of the cloud history tree and the cloud current tree when the current client is used; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
if the state information of the cloud history tree and the cloud current tree is different, judging that data change occurs, then searching second target data which are changed in the cloud history tree and the cloud current tree, and synchronizing the second target data to the local storage of the current client so as to complete the pull synchronization process of the current round;
after cloud storage and local storage are synchronized, user data are updated to a global latest state, and next round of synchronization is continued until an instruction for stopping synchronization is received.
In a second aspect, an embodiment of the present invention provides a cloud object storage synchronization apparatus, including:
the tree structure coding unit is used for coding current client user data by utilizing a tree data structure based on a cyclic push-pull synchronization mode to obtain a local history tree and a local current tree, a cloud history tree and a cloud current tree which are used for storing the user data in each round of synchronization;
the first comparison unit is used for comparing the state information of the local historical tree and the local current tree aiming at each round of synchronization; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
the push synchronization unit is used for judging that data change occurs if the state information of the local history tree and the local current tree is different, then searching the changed first target data in the local history tree and the local current tree, and synchronizing the first target data to cloud storage to complete the push synchronization process of the current round;
the second comparison unit is used for comparing the state information of the cloud history tree and the cloud current tree when the current client is used; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
the pull synchronization unit is used for judging that data change occurs if the state information of the cloud history tree and the cloud current tree is different, then searching the changed second target data in the cloud history tree and the cloud current tree, and synchronizing the second target data to the local storage of the current client so as to complete the pull synchronization process of the current round;
and the data updating unit is used for updating the user data to a global latest state after the cloud storage and the local storage are synchronized, and continuing to perform the next round of synchronization until an instruction for stopping the synchronization is received.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the cloud object storage synchronization method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the cloud object storage synchronization method according to the first aspect.
The embodiment of the invention provides a cloud object storage synchronization method, a cloud object storage synchronization device, computer equipment and a storage medium, wherein the method comprises the following steps: based on a cyclic push-pull synchronization mode, encoding current client user data by using a tree data structure to obtain a local history tree and a local current tree, a cloud history tree and a cloud current tree which are used for storing the user data in each round of synchronization; comparing the state information of the local history tree and the local current tree for each round of synchronization; wherein the state information includes a file name, a modification time, and an identification ID of the user data; if the state information of the local history tree is different from that of the local current tree, judging that data change occurs, then searching first target data which is changed in the local history tree and the local current tree, and synchronizing the first target data to cloud storage to complete the push synchronization process of the current round; comparing the state information of the cloud history tree and the cloud current tree when the current client is used; wherein the state information includes a file name, a modification time, and an identification ID of the user data; if the state information of the cloud history tree and the cloud current tree is different, judging that data change occurs, then searching second target data which are changed in the cloud history tree and the cloud current tree, and synchronizing the second target data to the local storage of the current client so as to complete the pull synchronization process of the current round; after cloud storage and local storage are synchronized, user data are updated to a global latest state, and next round of synchronization is continued until an instruction for stopping synchronization is received. 
According to the embodiment of the invention, the local history tree, the local current tree, the cloud history tree and the cloud current tree are created through coding, the state information of the local history tree and the local current tree is compared, and the state information of the cloud history tree and the cloud current tree is compared, so that the user data is synchronized for different client devices, and the user data is in the global latest state.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a cloud object storage synchronization method according to an embodiment of the present invention;
Fig. 2 is a sub-flow diagram of a cloud object storage synchronization method according to an embodiment of the present invention;
Fig. 3 is a schematic block diagram of a cloud object storage synchronization apparatus according to an embodiment of the present invention;
Fig. 4 is a sub-schematic block diagram of a cloud object storage synchronization apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic flowchart of a cloud object storage synchronization method according to an embodiment of the present invention, which specifically includes: steps S101 to S106.
S101, coding current client user data by using a tree data structure based on a cyclic push-pull synchronization mode to obtain a local history tree and a local current tree as well as a cloud history tree and a cloud current tree for storing the user data in each round of synchronization;
S102, comparing the state information of the local history tree and the local current tree for each round of synchronization; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
S103, if the state information of the local history tree and the local current tree is different, judging that data change occurs, then searching the changed first target data in the local history tree and the local current tree, and synchronizing the first target data to cloud storage to complete the push synchronization process of the current round;
S104, comparing the state information of the cloud history tree and the state information of the cloud current tree when the current client is used; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
S105, if the state information of the cloud history tree and the cloud current tree is different, judging that data change occurs, searching second target data which are changed in the cloud history tree and the cloud current tree, and synchronizing the second target data to the local storage of the current client to complete the pull synchronization process of the current round;
and S106, after the cloud storage and the local storage are synchronized, updating the user data to a global latest state, and continuing to perform the next round of synchronization until an instruction for stopping synchronization is received.
In this embodiment, the tree data structure is used to encode and store the user data on the user's current client; that is, the local history tree and the local current tree, together with the cloud history tree and the cloud current tree, record data changes. The embodiment adopts a cyclic push-pull synchronization paradigm to realize storage synchronization. By comparing the state information of the local history tree and the local current tree, it determines whether a data change has occurred on the user's current client and synchronizes the changed data to cloud storage; this is the push process. By comparing the state information of the cloud history tree and the cloud current tree, it determines whether a data change has occurred in cloud storage and synchronizes the changed data to the user's current client; this is the pull process. Note that a data change in cloud storage indicates that data changed on another of the user's clients and was synchronized to cloud storage, so the data in cloud storage is no longer in sync with the data on the user's current client.
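As an illustration only, the following minimal sketch shows one way a directory could be encoded as a tree whose nodes carry the state information named above (file name, modification time, identification ID). The names `Node` and `build_tree` are hypothetical, and using a SHA-256 content hash as the ID is an assumption; the embodiment does not specify how the ID is computed.

```python
import hashlib
import os
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One file or directory in a state tree (hypothetical layout)."""
    name: str
    mtime: float
    file_id: Optional[str] = None  # stays None for directories
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_tree(path: str) -> Node:
    """Recursively encode the file or directory at `path` as a state tree."""
    node = Node(name=os.path.basename(path), mtime=os.path.getmtime(path))
    if os.path.isdir(path):
        for entry in sorted(os.listdir(path)):
            node.children[entry] = build_tree(os.path.join(path, entry))
    else:
        with open(path, "rb") as f:
            # Assumption: the ID is a hash of the file content.
            node.file_id = hashlib.sha256(f.read()).hexdigest()
    return node
```

Under these assumptions, taking a snapshot with `build_tree` at the start of a round would give a "current" tree, and the snapshot kept from the previous round would play the role of the "history" tree.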
After one round of pushing and pulling synchronization, the four trees are in the same state; i.e., local storage and cloud storage are synchronized. The next round of synchronization can then be performed. In a specific application scenario, the time interval between each round of synchronization may be set by the user, and may be once per second or once every few seconds.
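One round of the push-pull cycle can be sketched as follows. This is a simplified model, not the embodiment's implementation: each tree is flattened to a dictionary from file name to modification time, the function names are hypothetical, and conflicts are not handled (when both sides change the same file, the local version wins in this sketch).

```python
from typing import Dict, Optional

State = Dict[str, float]  # file name -> modification time (a flattened tree)


def changed(history: State, current: State) -> Dict[str, Optional[float]]:
    """Entries that differ between two states; None marks a deletion."""
    diff: Dict[str, Optional[float]] = {}
    for name, mtime in current.items():
        if history.get(name) != mtime:
            diff[name] = mtime
    for name in history:
        if name not in current:
            diff[name] = None
    return diff


def apply_changes(diff: Dict[str, Optional[float]], target: State) -> None:
    """Replay a diff onto another state."""
    for name, mtime in diff.items():
        if mtime is None:
            target.pop(name, None)
        else:
            target[name] = mtime


def sync_round(local_hist: State, local_cur: State,
               cloud_hist: State, cloud_cur: State) -> None:
    # Push: changes made on this client go to the cloud.
    apply_changes(changed(local_hist, local_cur), cloud_cur)
    # Pull: changes seen in the cloud come to this client.
    # Re-applying our own pushed changes here is a harmless no-op.
    apply_changes(changed(cloud_hist, cloud_cur), local_cur)
    # After the round, the history states catch up with the current states.
    local_hist.clear()
    local_hist.update(local_cur)
    cloud_hist.clear()
    cloud_hist.update(cloud_cur)
```

After `sync_round` returns, all four dictionaries hold the same entries, which mirrors the statement that the four trees are in the same state after one round.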
In the embodiment, the local history tree, the local current tree, the cloud history tree and the cloud current tree are created by encoding, the state information of the local history tree and the local current tree is compared, and the state information of the cloud history tree and the cloud current tree is compared, so that the user data is synchronized for different client devices, and the user data is in the global latest state. Meanwhile, the cloud object storage synchronization method provided by the embodiment can synchronize user data from different sources (i.e. different client devices) by using the cloud storage as an intermediary. In contrast to existing closed cloud storage synchronization solutions, the present embodiments allow a data owner to directly control data and allow the data owner to choose between existing mainstream cloud storage providers after balancing issues such as performance, cost, privacy, etc.
By the cloud object storage synchronization method provided by the embodiment, a user can execute synchronization operation only by installing a synchronization program locally, and does not need to install the program in the cloud. The user can also control the data uploaded by the user to avoid the privacy disclosure of a third party. The embodiment is suitable for the existing mainstream cloud object storage service, and has no special requirements on cloud providers. The user is also allowed to select different cloud storage services according to budgets and requirements. The present embodiment allows the user to have more control over his data than existing solutions.
For example, assume that a consumer user wants to synchronize data between an office computer and a home computer. During working hours, the user may change the data. To ensure the reliability of the data, the user synchronizes the changed data to the cloud in real time using the cloud object storage synchronization method provided by this embodiment. Although it is also possible to synchronize data only when the user leaves the office, real-time synchronization is more user friendly, because the user may forget the manual synchronization operation when leaving the office, or may want to share some data with colleagues through a Web link provided by the cloud service. When the user returns home, the user can run the cloud storage synchronization program again to synchronize the data in the cloud to the local computer. Thereafter, the user can continue to update the data, which is also synchronized to the cloud in real time. Note that the user may also update the data on the home computer before running the cloud storage synchronization program. The user can then run the program to synchronize the updates made on the office computer and the home computer. That is, the global latest state of the user data may be determined by a single client device or jointly by multiple client devices.
For another example, assume that a medium-sized enterprise hosting its data uses the cloud object storage method provided by the present embodiment, the enterprise having multiple branches in different cities. All branches run cloud storage synchronization, which can help them maintain the latest state of data in real time. In this case, all branches can update their data, and real-time synchronization is necessary for the enterprise. Similarly, in this case, the global latest state is determined by a plurality of client devices in a distributed manner.
The following is a simple case of synchronizing static data between two places. Assume that two static directories D1 and D2 are synchronized manually. D1 is a local directory on the user's computer and contains files A and B. D2 is a remote directory in cloud storage and contains files A and C. At the beginning of synchronization, the files in D1 are scanned first. File A is found first and is to be synchronized to D2; since D2 already has this file, no operation is needed. File B is processed next. File B is not found in D2, so it is copied to D2. At this point, all files in directory D1 have been traversed.
Next, all the files in directory D2 are traversed. File A is found first; since file A exists in D1, it is skipped. The next file, B, also exists in D1 and is skipped as well. File C is then processed; since it is not in D1, it is copied to D1. After all the data in D2 has been traversed and processed, the two directories are checked, and the entire synchronization process is complete: directories D1 and D2 are synchronized.
The synchronization process above shows the basic elements of synchronizing data between a local directory and a remote cloud directory. First, each directory must be traversed in order to synchronize all the data. During traversal, all differences between the two directories can be found. Essentially, traversing a directory yields its current state, which indicates which data is present in the directory. Two pieces of state information are used to synchronize the data: one for the local directory and one for the remote cloud directory. Second, some meta-information about each file is needed. The file name is the basic meta-information. In some cases, a file with the same name exists in both directories, so the modification time is needed to determine the latest version of the file.
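The static case with directories D1 and D2 can be sketched as two traversal passes. In this hypothetical model each directory is a dictionary from file name to modification time, and "copying" a file means copying its entry; the function names are illustrative.

```python
from typing import Dict

Directory = Dict[str, float]  # file name -> modification time


def copy_missing_or_older(src: Directory, dst: Directory) -> None:
    """Traverse `src`; copy each file that `dst` lacks or holds an older version of."""
    for name, mtime in src.items():
        if name not in dst or dst[name] < mtime:
            dst[name] = mtime


def static_sync(d1: Directory, d2: Directory) -> None:
    copy_missing_or_older(d1, d2)  # first pass: traverse D1
    copy_missing_or_older(d2, d1)  # second pass: traverse D2
```

For example, with `d1 = {"A": 1.0, "B": 1.0}` and `d2 = {"A": 2.0, "C": 1.0}`, the first pass copies B to D2, and the second pass copies C and the newer version of A to D1, so both directories end up with the same three entries.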
The static synchronization described above uses only current state information to synchronize the local directory and the remote cloud directory. In fact, the user may update data on a different client device, i.e., extending static synchronization to dynamic synchronization.
In particular, data updates make synchronization more complex. For example, assume that a user deletes some files locally while the files still exist in the cloud. Static synchronization copies the files from the cloud back to the local directory, which contradicts the user's intention to delete them. The same problem can occur for cloud files. For example, a user deletes certain files on an office computer, and the deletion is also synchronized to the cloud. When the user returns home, static synchronization copies the files, which still exist on the home computer, back into the cloud, which is an error. The goal of synchronization is to bring the user data to a global latest state, which in the dynamic case is determined jointly, in a distributed manner, by the local directory and the cloud directory.
The key to addressing dynamic synchronization is to track state information changes for local and remote directories. In static synchronization, the current state of the local directory and the cloud directory is used for synchronization. To support data dynamic operations, another historical state is recorded for each data. The historical state contains all data information of the directory in the previous synchronization round. In contrast to the historical state, the current state contains the most up-to-date information. By comparing the historical and current states of the local and remote directories, respectively, it is clear which data has changed since the last synchronization round. Finally, dynamic synchronization is supported by using four states, where there are two states locally: one is the historical state and the other is the current state. Likewise, there are historical and current states on the cloud storage.
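The comparison of a historical state with a current state can be sketched as below. This is an illustrative model in which a state is a dictionary from file name to modification time; the name `diff_states` is hypothetical.

```python
from typing import Dict, List, Tuple

State = Dict[str, float]  # file name -> modification time


def diff_states(history: State, current: State) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, modified, deleted) file names since the previous round."""
    added = sorted(current.keys() - history.keys())
    deleted = sorted(history.keys() - current.keys())
    modified = sorted(
        name for name in current.keys() & history.keys()
        if current[name] != history[name]
    )
    return added, modified, deleted
```

Because a deleted file appears in the historical state but not in the current one, the deletion can be propagated to the other side instead of the file being copied back, which is the failure of static synchronization described earlier.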
Storage synchronization is a classic topic in computer science. With the advent of new storage platforms and advances in technology (e.g., data structures, mathematics), researchers continue to seek new synchronization solutions. Existing research on synchronization can be divided into two categories: incremental synchronization and full synchronization.
The main idea of incremental synchronization is to synchronize only the modified portions of a file rather than the entire file content. It saves network resources at the cost of more computing resources. The most typical and influential solution in this class is the rsync application, and subsequent studies have attempted to improve rsync in various ways. Incremental synchronization approaches fall into two groups: one divides files into fixed-size blocks for synchronization, while the other determines the size of each block from the file content.
Fixed-size file blocks. The original rsync application uses fixed-size file blocks.
The operation of rsync is substantially as follows: assume that a data sender wants to synchronize data with a data receiver. In the rsync synchronization process, a sender first sends a synchronization request containing a target file name to a receiver. After receiving the request, the receiving side divides the target file into a plurality of blocks according to the fixed length, calculates the checksum of each block, and then forms the checksums into a list. The checksum of each block contains two hash values. One is a weak hash value that can be computed very quickly. The other is a strong hash value that consumes more computing power, but has a lower probability of collision than a weak hash. The receiver will then send the list of checksums to the sender. When the sender receives the checksum list, it will calculate the hash value of its file chunk and compare it with the checksum list of the receiver. When a block with the same strong hash value is found, the sender records the location of the block and skips it. If the hash values are not consistent, the sender processes the file byte by byte until a file block with the same hash value is found. After the comparison, the sender will generate a list of differences and the locations of the same blocks. This result is then sent to the recipient. Finally, the recipient reorganizes the new file according to its local file blocks and the received results.
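The exchange described above can be sketched in a few functions. This is a simplified illustration, not rsync's actual code: the block size is tiny, `zlib.adler32` stands in for the weak hash and SHA-256 for the strong hash, and the weak hash is recomputed for every window, whereas rsync uses a rolling checksum that is updated incrementally. All function names are hypothetical.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple, Union

BLOCK = 4  # tiny block size, for illustration only

# An int refers to a block the receiver already has; bytes are literal new data.
Op = Union[int, bytes]
Signature = Dict[int, List[Tuple[bytes, int]]]


def signature(old: bytes) -> Signature:
    """Receiver side: map weak checksum -> [(strong hash, block index)]."""
    table: Signature = {}
    for i in range(0, len(old), BLOCK):
        block = old[i:i + BLOCK]
        entry = (hashlib.sha256(block).digest(), i // BLOCK)
        table.setdefault(zlib.adler32(block), []).append(entry)
    return table


def delta(new: bytes, table: Signature) -> List[Op]:
    """Sender side: list matching block indices and literal differences."""
    ops: List[Op] = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        window = new[pos:pos + BLOCK]
        match = None
        # Check the cheap weak hash first, then confirm with the strong hash.
        for strong, index in table.get(zlib.adler32(window), []):
            if strong == hashlib.sha256(window).digest():
                match = index
                break
        if match is None:
            literal.append(new[pos])  # no match: advance byte by byte
            pos += 1
        else:
            if literal:
                ops.append(bytes(literal))
                literal.clear()
            ops.append(match)
            pos += len(window)
    if literal:
        ops.append(bytes(literal))
    return ops


def patch(old: bytes, ops: List[Op]) -> bytes:
    """Receiver side: rebuild the new file from local blocks and literals."""
    out = bytearray()
    for op in ops:
        out += old[op * BLOCK:(op + 1) * BLOCK] if isinstance(op, int) else op
    return bytes(out)
```

In this sketch, only the literal bytes and the block indices travel from sender to receiver, which is how the amount of transferred data is reduced when most blocks are unchanged.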
The WebSync application applies synchronization to the Web browser. The computing power of the browser is relatively weak. If the computing structure of rsync is used directly in a browser, it is easy to block a single thread of the browser and cause the browser to crash. Therefore, WebSync transfers the portion of the difference of the computed files that has higher performance requirements to the server side. The browser need only compute the file checksum list.
Notably, the rsync application needs to compare file contents byte by byte, which consumes a significant amount of computing resources. The DeltaCFS application later reduced the amount of computation required. DeltaCFS improves performance by first identifying the common modification types of files and then synchronizing the files without calculating differences. Specifically, it intercepts file system modification operations in the Linux kernel and directly obtains the file difference between the previous version and the current version. DeltaCFS then matches the modification operations to file update patterns. Using the matching results, the modified data can be synchronized directly without byte-by-byte comparison.
The use of incremental synchronization on small files can have negative effects, such as consuming unnecessary computing resources and increasing the amount of data transmitted over the network. The PandaSync application therefore adopts full synchronization when the file size is smaller than a certain threshold, and incremental synchronization otherwise. The threshold is typically 300KB-400KB and is adjusted based on system performance.
The widely used Dropbox application also uses fixed-size file blocks for incremental synchronization. To improve the performance of Dropbox, a batch update delay synchronization mechanism is employed to relieve network pressure. This is because synchronization of small files can result in a large number of network requests and reduced performance. It creates a "scratch box" to temporarily store all changes. After a period of time, the changes in the box will all be pushed together to the remote server.
Content-defined file block sizes. In addition to optimizing the process of calculating the differing portion of the file content, the prior art also divides files into blocks of variable size, where the size is determined by the block content. It typically applies some kind of hashing or fingerprinting algorithm to a file chunk. If the calculation result satisfies a predetermined condition, the block boundary is fixed at that point. Therefore, the generated block size is variable, unlike the fixed blocks used by rsync. Compared with the fixed block size approach, this approach reduces computational consumption at the cost of more network resources.
LBFS systems aim at synchronizing file systems rather than single files. It uses a truncated cryptographic hash value for fingerprinting and segmenting file blocks. LBFS maintains a database that stores locations of and indexes file blocks. When a file synchronization operation occurs, the system will look for the same file block in the database. LBFS reduces a large amount of computation because it does not require byte-by-byte comparison of files. But the disadvantage is that if a mismatch is found, the system needs to send the entire file block.
The QuickSync system also uses variable data block sizes. It selects the block size based on network conditions. It also adopts a batch update delay synchronization mechanism to synchronize all data blocks, thereby reducing the consumption of network resources. The Dsync system combines rsync with content-defined blocks to find a balance between computing resources and network resources. It also presents a new hash algorithm and a new communication protocol. The new hash algorithm is faster to compute and has a similarly low collision probability. Under the new communication protocol, the system first requests a weak hash for comparison and requests a strong hash only when the weak hashes match, which reduces the amount of data required for network communication. It also merges several consecutive weak hashes into one large block to further reduce the number of network requests.
The main idea of full-scale synchronization is to synchronize the entire contents of the modified file, rather than the modified portions. Synchronizing the entire file content requires high network bandwidth. However, full-scale synchronization, as compared to incremental synchronization, eliminates the need to compute the difference between the two files. This approach greatly reduces the computational requirements.
When synchronizing small files, the performance of full-scale synchronization is superior to incremental synchronization. The size of small files typically does not exceed 1 MB. For such files, incremental synchronization requires dividing the file into multiple blocks and computing a hash value. This can result in additional network traffic, the cost of which exceeds the size of the file content itself. Therefore, it is more suitable to use full-scale synchronization for small files.
The PandaSync system combines full-scale synchronization and incremental synchronization in a hybrid manner. Files smaller than 400KB are synchronized in full. The system makes this selection based on file size statistics. Previous studies have shown that small files account for the majority of all files: about 80% of files in daily use are small files.
In general, full-scale synchronization is easy to implement and less error-prone because it involves no complex computation. Some well-known companies also choose to use full-scale synchronization. Google Drive and Microsoft OneDrive apply full-scale synchronization techniques to their cloud disk services. Full-scale synchronization requires only light computation on the client, which makes it very mobile-friendly.
In one embodiment, the step S101 includes:
packaging meta information of user data by using a FileStatus object, wherein the meta information comprises a file name, modification time and an identification ID;
respectively recording historical state information and current state information of locally stored data by creating a local historical tree and a local current tree in each round of synchronization; and
respectively recording historical state information and current state information of the cloud storage data by creating a cloud historical tree and a cloud current tree.
In this embodiment, the tree structure is used to encode the state information of the synchronization directory. I.e. the state of the user data of the current client is encoded by the FileStatus structure. Further, a DirectoryStatus structure is used for the directory. DirectoryStatus contains a list of FileStatus/DirectoryStatus, since a directory consists of the files and subdirectories underneath it.
FileStatus contains the meta information of a file and is analogous to a leaf node in a tree. The meta information includes a file name, a modification time, and an identification ID. The file name may be obtained using a local system call or a cloud storage API call. The modification time is a crucial piece of information, because it is used to determine whether a file has been modified. The identification ID uniquely identifies the file: two files may sometimes have the same content but different file names, and the identification ID is used to tell such cases apart.
In a specific embodiment, the identification ID is calculated for the meta-information using a cryptographic hash function. For the DirectoryStatus structure, it contains the meta-information of the directory, similar to the middle/root nodes of the tree. Its meta information is similar to FileStatus.
The present embodiment adopts a DirectoryStatus structure to fully represent the state, i.e. two DirectoryStatus structures are used to represent the history and the current state of the local directory, i.e. the local history tree and the local current tree. Likewise, the history and current state of the cloud directory are encoded using two other DirectoryStatus structures, namely a cloud history tree and a cloud current tree.
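As a rough illustration of the two structures, the following Python sketch models FileStatus and DirectoryStatus as dataclasses. The field names, the use of a name-keyed dictionary for the child list (the text only says "list"), and the snake_case variable names for the four trees are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class FileStatus:
    """Leaf node: meta information of a single file."""
    name: str
    mtime: float  # modification time
    id: str       # identification ID


@dataclass
class DirectoryStatus:
    """Middle/root node: meta information of a directory plus its children."""
    name: str
    mtime: float
    id: str = ""  # directory ID; empty when not used
    children: Dict[str, Union["FileStatus", "DirectoryStatus"]] = field(
        default_factory=dict
    )

    def insert(self, child: Union["FileStatus", "DirectoryStatus"]) -> None:
        # Keyed by name so that lookups during comparison are O(1).
        self.children[child.name] = child


# Four trees encode the system state (names mirror the text's metaTree*).
meta_tree_lh = DirectoryStatus(name="", mtime=0.0)  # local history tree
meta_tree_lc = DirectoryStatus(name="", mtime=0.0)  # local current tree
meta_tree_ch = DirectoryStatus(name="", mtime=0.0)  # cloud history tree
meta_tree_cc = DirectoryStatus(name="", mtime=0.0)  # cloud current tree

docs = DirectoryStatus(name="docs", mtime=1.0)
docs.insert(FileStatus(name="a.txt", mtime=2.0, id="id-a"))
meta_tree_lc.insert(docs)
```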
Specifically, four trees are first initialized to encode the system state. metaTreeLH and metaTreeLC represent the local history tree and the local current tree, respectively. Likewise, metaTreeCH and metaTreeCC represent the cloud history tree and the cloud current tree, respectively. The history state of the data in the last round of synchronization is encoded using the local history tree and the cloud history tree. The current state of the data in the current round of synchronization is encoded using the local current tree and the cloud current tree. These four trees are instances of the DirectoryStatus data structure.
During synchronization, in order to find the latest state changes, a DirectoryStatus structure is used to construct the current state of the local directory, namely the local current tree metaTreeLC. Data changes in the local directory are then found by comparing the history record and the current state of the local directory. Similarly, data changes in the cloud directory are found from the history record and the current state of the cloud directory. After the local data and the cloud data are synchronized, the history states of the local directory and the cloud directory are updated to the current states. Further, the two history trees are stored on the local disk to guard against unexpected failures.
Additionally, at the beginning of each round of synchronization, a state tree (i.e., a local current tree and a cloud current tree) is computed to represent the local storage state and the cloud storage state. Specifically, first, the DirectoryStatus data structure is initialized using the path of the root directory. The meta information of the local directory is obtained by a local system call and the meta information of the cloud directory is obtained by calling an API of the cloud object storage, and then all subdirectories and files under the root directory are traversed in a recursive manner.
To process an ordinary file f, its meta information is acquired and a FileStatus object fs is constructed. Once fs is inserted as an entry in the child list of its parent directory state tree, processing of the file is complete. To process a subdirectory, BuildTree is run recursively to obtain the DirectoryStatus object subDir. The subDir is then inserted as a child entry into the child list of its parent directory state tree. This completes the traversal of all the files and subdirectories of the root directory. The returned metaTree is a tree data structure that contains all the state information of the file data, which can be used in the subsequent push and pull processes to synchronize the user's data.
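The recursive construction can be sketched for the local side as follows. This is an illustrative sketch only: the function name build_tree, the dictionary layout, and the choice to hash the file content with SHA-256 as the identification ID are assumptions; the cloud side would obtain the same fields through the object storage API instead of local system calls.

```python
import hashlib
import os
import tempfile


def file_id(path: str) -> str:
    """Assumed identification ID: SHA-256 over the file content."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_tree(path: str) -> dict:
    """Recursively encode a local directory as a nested state tree."""
    tree = {
        "name": os.path.basename(os.path.normpath(path)),
        "mtime": os.stat(path).st_mtime,
        "children": {},
    }
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                # Subdirectory: recurse and insert the subtree as a child.
                tree["children"][entry.name] = build_tree(entry.path)
            elif entry.is_file(follow_symlinks=False):
                # Ordinary file: build a leaf holding its meta information.
                tree["children"][entry.name] = {
                    "name": entry.name,
                    "mtime": entry.stat().st_mtime,
                    "id": file_id(entry.path),
                }
    return tree


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "docs"))
    with open(os.path.join(root, "docs", "a.txt"), "wb") as fh:
        fh.write(b"hello")
    meta_tree = build_tree(root)
```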
In one embodiment, as shown in fig. 2, the step S102 includes: steps S201 to S203.
S201, calling meta information of local storage data, and traversing the local history tree and the local current tree respectively in a recursive mode;
S202, comparing the traversal result of the local historical tree with the traversal result of the local current tree;
S203, judging whether data addition, modification, deletion and/or renaming occur between the local history tree and the local current tree according to the comparison result.
In this embodiment, data changes include addition, modification, deletion, renaming, and the like, and a renaming operation may be regarded as a combination of a deletion and an addition. In order to detect whether a data change has occurred, the present embodiment compares the state information of the local history tree metaTreeLH and the local current tree metaTreeLC. For changes corresponding to addition and modification, the changed data is located in the current tree, so the embodiment traverses the data in the local current tree and compares it with the local history tree, thereby detecting whether data addition or modification has occurred. For file/directory deletions, the deleted data does not exist in the local current tree but does exist in the local history tree, so the present embodiment further traverses the local history tree.
In summary, the present embodiment completes the push process through two traversals. One for the local current tree and one for the local history tree.
In one embodiment, the step S103 includes:
when data addition is judged to occur between the local history tree and the local current tree, acquiring corresponding local addition data from the local current tree, and uploading the local addition data to cloud storage;
when data modification is judged to occur between the local history tree and the local current tree, the modification time of the modified data in the local history tree and the local current tree is compared; if the modification time of the local history tree is different from that of the local current tree and the modification time of the local current tree is the latest modification time, uploading corresponding local modification data to cloud storage;
when data deletion is judged to occur between the local history tree and the local current tree, searching corresponding local deletion data in the local history tree, and deleting corresponding data in cloud storage;
and when the data renaming between the local history tree and the local current tree is judged, acquiring corresponding renaming data in the local current tree, uploading the renaming data to cloud storage, and deleting data corresponding to the renaming data in the local history tree.
In this embodiment, data addition means that the local data and the cloud data were synchronized in the previous round and that the user has since added a new file/directory d on the current client. Since the local current tree metaTreeLC is built at the beginning of each synchronization round, the newly added file/directory d is contained in the local current tree. However, d does not exist in the local history tree metaTreeLH. Therefore, when traversing metaTreeLC, the addition is detected first, and the added data is then uploaded to the cloud storage.
The data modification means that after the last round of synchronization, the user modifies the file/directory d in the current client. By comparing the modification times for file/directory d in the local current tree and the local history tree, it can be determined whether a modification has occurred. I.e. if the modification time of file/directory d is the latest time in the local current tree, this indicates that the data is changed. And then uploading the modified data to the cloud storage.
Data deletion means that after the previous round of synchronization, the user deletes the file/directory d on the current client. The file/directory d is therefore not contained in the local current tree, so this change cannot be detected while traversing the local current tree metaTreeLC. However, file/directory d still exists in the local history tree, which holds the state of the previous round of synchronization. Therefore, by further traversing the local history tree metaTreeLH and checking whether each item exists in the local current tree metaTreeLC, it can be determined whether a deletion has occurred. Once the deleted data is identified, it is deleted in the cloud storage through the API of the cloud storage.
It should be noted that the same applies when data addition and deletion are applied to directories. When a new directory is added, it is treated as new data and uploaded to the cloud storage. Unlike a file, a directory has no modification operation of its own. When a directory is deleted, all of its files and subdirectories are deleted. When traversing metaTreeLH, each file and subdirectory is checked for existence in the local current tree. Those that do not exist in metaTreeLC are deleted correspondingly in cloud storage.
For renaming files, the present embodiment treats the renaming files as new data and uploads the new data to the cloud storage when traversing the local current tree. The old file is treated as deleted data and is deleted when traversing the local history tree. For directory operations, the operations are similar, the only difference being that directories need to be added and deleted in a recursive manner.
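The push-side change detection described above might look like the following Python sketch. The nested-dictionary tree layout, the action names ("mkdir", "upload", "delete"), and the function name are illustrative assumptions. For brevity the two traversals are performed per directory during a single recursion, and the sketch assumes that a name never switches between file and directory from one round to the next.

```python
def push_changes(history, current, prefix=""):
    """Return a list of (action, path) pairs to apply to cloud storage.

    `history` and `current` are nested dicts; a node with a "children"
    key is a directory, any other node is a file with an "mtime" field.
    `history` may be None when the directory is new.
    """
    actions = []
    hist_children = history["children"] if history else {}

    # Traversal 1: walk the current tree to find additions and modifications.
    for name, node in current["children"].items():
        path = prefix + name
        old = hist_children.get(name)
        if "children" in node:
            if old is None:
                actions.append(("mkdir", path))  # new directory
            actions += push_changes(old, node, path + "/")
        elif old is None or node["mtime"] > old["mtime"]:
            actions.append(("upload", path))     # new or modified file

    # Traversal 2: walk the history tree to find deletions. Deleting a
    # directory path is understood to remove everything beneath it.
    for name in hist_children:
        if name not in current["children"]:
            actions.append(("delete", prefix + name))
    return actions


meta_tree_lh = {"children": {
    "a.txt": {"mtime": 1},
    "b.txt": {"mtime": 1},
    "docs": {"children": {"old.txt": {"mtime": 1}}},
}}
meta_tree_lc = {"children": {
    "a.txt": {"mtime": 2},                            # modified
    "c.txt": {"mtime": 3},                            # b.txt renamed to c.txt
    "docs": {"children": {"new.txt": {"mtime": 3}}},  # old.txt renamed
}}
push_actions = push_changes(meta_tree_lh, meta_tree_lc)
```

Note how each rename shows up as one upload of the new name plus one delete of the old name, matching the treatment of renaming as deletion plus addition.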
In one embodiment, the step S104 includes:
traversing the cloud current tree and comparing a traversal result with the cloud history tree to determine whether data addition and/or modification occurs;
and traversing the cloud history tree, and comparing a traversal result with the cloud current tree to determine whether data deletion occurs.
In this embodiment, after the data in the client is updated and synchronized to the cloud storage, a pull process is performed. Similar to the push process, the pull process will detect data changes in the cloud storage and synchronize them to the current client. It is noted that data changes in cloud storage are caused by other client devices, including data additions, modifications, and deletions. When data in other client devices changes, the clients synchronize the changed data to the cloud.
The pull process also identifies data changes in the cloud storage through two traversals. In the first traversal, the cloud current tree metaTreeCC is traversed and compared to the cloud history tree metaTreeCH to find new or modified data in the cloud. In the second traversal, the cloud history tree metaTreeCH is traversed and compared to the cloud current tree metaTreeCC to find the data that should be deleted in the cloud.
In one embodiment, the step S105 includes:
when data addition and/or modification are determined to occur, corresponding cloud addition data and/or cloud modification data are obtained from the current cloud tree, and the cloud addition data and/or the cloud modification data are downloaded to a local storage of a current client;
and when the data deletion is determined to occur, determining corresponding cloud deletion data in the cloud history tree, and deleting the corresponding data in the local storage of the current client.
In this embodiment, suppose that in one round of synchronization the current client (for example, Alice) synchronizes, and the state of the cloud storage at that time is stored in the cloud history tree metaTreeCH. Next, another client (for example, Bob) adds data d and synchronizes with the cloud storage. Because of Bob's update, the global latest state of the user data has changed, and Alice's data is no longer in the global latest state. Therefore, in Alice's next round of synchronization, the update is synchronized to Alice. At that point Alice obtains the latest cloud state in the cloud current tree metaTreeCC, which includes the added data d, while d does not exist in the cloud history tree metaTreeCH. Therefore, when traversing the cloud current tree metaTreeCC, d is detected as new data and is downloaded to the current client Alice.
When certain data d in the cloud storage is modified, the modification information is encoded in the cloud current tree metaTreeCC. Thus, data d is detected while traversing the cloud current tree metaTreeCC and is found to differ from d in the cloud history tree metaTreeCH. The modification is identified from the modification time and the identification ID in the meta information: if the modification time of data d in the cloud current tree metaTreeCC is newer, data d is downloaded to the current client.
When other clients delete data d in the cloud storage, then data d is not in the cloud current tree metaTreeCC. And the cloud history tree metaTreeCH stores the information in the last round of synchronization, that is, data d. Therefore, when data d in the cloud history tree is traversed and does not exist in the cloud current tree, the data d can be determined to be deleted, and therefore the locally stored data d of the current client can be deleted.
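A compact way to picture the pull process is to flatten both cloud trees into path-to-metadata maps and compare them, as in the Python sketch below. The flattening step, the field names, and the rule "download only when the modification time is newer and the identification ID differs" are one possible reading of the text rather than the embodiment's definitive logic; empty directories are ignored by this simplification.

```python
def flatten(tree, prefix=""):
    """Map every file path in a nested state tree to its meta information."""
    files = {}
    for name, node in tree["children"].items():
        path = prefix + name
        if "children" in node:
            files.update(flatten(node, path + "/"))
        else:
            files[path] = node
    return files


def pull_changes(cloud_history, cloud_current):
    """Return (paths to download, paths to delete locally)."""
    old, new = flatten(cloud_history), flatten(cloud_current)
    # Traversal 1: cloud current tree against cloud history tree.
    downloads = [
        path for path, meta in new.items()
        if path not in old
        or (meta["mtime"] > old[path]["mtime"] and meta["id"] != old[path]["id"])
    ]
    # Traversal 2: cloud history tree against cloud current tree.
    deletions = [path for path in old if path not in new]
    return downloads, deletions


# Alice's view: Bob modified a.txt, deleted b.txt, and added d.txt.
meta_tree_ch = {"children": {
    "a.txt": {"mtime": 1, "id": "v1"},
    "b.txt": {"mtime": 1, "id": "b1"},
}}
meta_tree_cc = {"children": {
    "a.txt": {"mtime": 5, "id": "v2"},
    "d.txt": {"mtime": 6, "id": "d1"},
}}
downloads, deletions = pull_changes(meta_tree_ch, meta_tree_cc)
```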
In an embodiment, the cloud object storage method further includes:
when data synchronization is carried out, whether target data identical to the identification ID of the data to be synchronized exists or not is searched in the cloud history tree or the local history tree, and if the target data is searched, the target data is copied in the cloud current tree or the local current tree.
In this embodiment, in order to improve data transmission efficiency and reduce resource consumption, the cloud history tree or the local history tree is searched for data whose identification ID is the same as that of the data to be synchronized, that is, the target data. The target data can then be copied directly into the corresponding cloud current tree or local current tree without being uploaded from the local client or downloaded from cloud storage.
For example, a user may copy some existing data into other subdirectories, and the file names of the data may change. This may occur locally or in the remote cloud. In this case, the synchronization system may copy the data directly, locally or in the cloud, rather than uploading/downloading it over the network. Specifically, when uploading data to the cloud storage, it is first checked whether its identification ID exists in the cloud history tree metaTreeCH. If the data to be uploaded already exists, the copy API of the cloud storage can be called directly in place of the upload operation. Similarly, when data is to be downloaded from the cloud storage, the corresponding identification ID is checked first to determine whether the data exists in the local history tree metaTreeLH; if it does, the data can be copied into place through a local copy operation without being downloaded from the cloud storage. If the file names differ, the system renames the data to the target file name in addition to the copy operation.
For renamed files, new data would otherwise need to be uploaded or downloaded during synchronization, which consumes a large amount of network resources when the files are large. However, if the content of the renamed data remains unchanged, there is no need to retransmit the data content. Specifically, when new data is detected, it is checked whether the ID of the new data is stored in the local history tree metaTreeLH; if corresponding data exists in the local history tree, the data is determined to have been renamed. The cloud data can then be handled through the copy API without transmitting the data over the network, which avoids repeated data transmission and reduces network overhead.
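The ID lookup that replaces an upload with a server-side copy can be sketched as below. The function names, the flat path-to-metadata map, and the action tuples are hypothetical; no real cloud SDK call is shown, since the actual copy API depends on the storage provider.

```python
def index_by_id(flat_history):
    """Build an identification ID -> existing path index from a history map."""
    return {meta["id"]: path for path, meta in flat_history.items()}


def plan_upload(path, meta, cloud_history_index):
    """Choose a server-side copy when the same content is already in the cloud."""
    source = cloud_history_index.get(meta["id"])
    if source is not None:
        return ("copy", source, path)  # no data crosses the network
    return ("upload", path)


# Flattened cloud history tree (path -> meta information) from the last round.
cloud_history = {
    "photos/cat.jpg": {"id": "abc123", "mtime": 1},
}
cloud_index = index_by_id(cloud_history)

# The user renamed cat.jpg and also created a brand-new file.
renamed_plan = plan_upload("photos/kitty.jpg", {"id": "abc123", "mtime": 2}, cloud_index)
fresh_plan = plan_upload("notes.txt", {"id": "ffff00", "mtime": 2}, cloud_index)
```

One design point worth noting under these assumptions: because a rename also produces a deletion of the old path, copy actions should be executed before deletions in the same round, so that the source object still exists when it is copied.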
For directory files, all files/subdirectories need to be traversed recursively during synchronization. If the directory structure has many subdirectories, then the directory is entered further recursively layer by layer. Until the deepest directory is traversed, it will gradually return to the previous directory and continue traversing. Such operations consume a lot of time and memory. The recursive traversal approach wastes resources, particularly when there are no data changes in the multi-level directory. For this reason, the present embodiment defines the identification ID of the directory as:
[Formula image BDA0003101916640000161: definition of the identification ID of a directory]
wherein the directory contains n files f1, ..., fn and l subdirectories d1, ..., dl. The key property of this definition is that any change to a file is eventually reflected in its ancestor directory nodes. When the content of a file in the i-th layer changes, its identification ID also changes, and by the definition above this change propagates up to the top-level directory. Conversely, if none of the files/subdirectories in a directory has changed, the identification ID of the directory also remains unchanged. Using the identification ID of the directory, directory copy and rename operations can be improved. When a new directory is found, the system checks whether its identification ID already exists; if so, the directory only needs to be copied locally, without recursively uploading/downloading all of its files and subdirectories.
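The formula itself is not reproduced in this text, so the following Python sketch shows only one construction that has the stated property: a Merkle-style hash over the identification IDs of the directory's files and subdirectories. The use of SHA-256, the sorting of children by name, and the decision to hash child IDs without their names are all assumptions of this sketch, not the embodiment's definition.

```python
import hashlib


def directory_id(node):
    """Merkle-style ID: a file returns its own ID, a directory hashes its children's IDs."""
    if "children" not in node:
        return node["id"]
    digest = hashlib.sha256()
    for name in sorted(node["children"]):
        digest.update(directory_id(node["children"][name]).encode("ascii"))
    return digest.hexdigest()


project = {"children": {
    "src": {"children": {"main.py": {"id": "aa11"}}},
    "README": {"id": "bb22"},
}}
# A renamed copy with identical contents.
project_copy = {"children": {
    "src": {"children": {"main.py": {"id": "aa11"}}},
    "README": {"id": "bb22"},
}}
# Same layout, but a deeply nested file has changed.
project_edited = {"children": {
    "src": {"children": {"main.py": {"id": "cc33"}}},
    "README": {"id": "bb22"},
}}

id_original = directory_id(project)
id_copy = directory_id(project_copy)
id_edited = directory_id(project_edited)
```

With such a definition, an unchanged directory can be recognized by a single ID comparison, while any change in a nested file alters the ID of every directory above it.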
Although this speeds up directory operations, it also incurs considerable computational cost. If the files in the synchronization directory change frequently, the system must frequently enter each subdirectory to recalculate the latest identification ID. In this case, calculating the directory ID increases resource consumption and becomes an unnecessary additional cost. Therefore, it is suggested to use the directory ID only where directories are frequently updated but files are not. In the general case, the directory ID field should be left empty.
In addition, in the last step of each round of synchronization, the history trees of the current client's local data and of the cloud storage data need to be updated. It is understood that, once a round completes, the cloud history tree metaTreeCH and the local history tree metaTreeLH are actually the same; that is, the local data of the current client and the data on the cloud storage can be represented by a single history state. This history state can also be updated dynamically during the push and pull processes: when some data is added or deleted, the change is applied to the history state. After all updates are complete, the history tree reflects the current state of the client for this round of synchronization.
In order to ensure the synchronization efficiency and reliability, the embodiment can call the local storage or cloud storage API to complete the renaming in place. Further, to support efficient renaming and move operations, the data content hash values may be encoded in four trees. That is, during data transfer to and from the cloud storage, it is detected whether its hash value exists locally. If a hash value is detected, then the synchronization task in local storage or cloud storage will be completed using rename and move operations. In order to improve reliability, after each round of synchronization, the history trees of local storage and cloud storage are simultaneously stored on the local disk. These history trees can be used to ensure the correctness of cloud storage synchronization if the device is turned off.
The correctness requirement of cloud storage synchronization is that user data is always kept in the global latest state. This embodiment discusses correctness in two cases. In the first, no conflict occurs for any individual data item (file or directory) within a single round of synchronization. In the second, some data is in conflict during synchronization. Conflicts may be caused by different client devices updating the same data item.
For the first case, the global latest state is determined by the updates to the local data and the updates to the remote cloud storage data, and the two sets of updates are independent. In this case, the updates to the local data are synchronized into the cloud storage in the push process. During the pull process, remote cloud storage updates are synchronized to the local client device. Combining these two processes ensures that the client device reaches the global latest state in this round of synchronization. The system keeps running over time, and data is synchronized in each discrete time period (i.e., each round). Even with multiple client devices, in most cases it should operate in such a conflict-free manner. This is because the present embodiment discretizes time into periods and performs synchronization within each period, so the chance of a conflict within any given period is small.
For the second case, the conflict needs to be handled. An example illustrates a conflict. Assume that a user has two client devices; one is running and the other is switched off. The user modifies a file on one client device at the office and synchronizes the modification to the cloud storage. After returning home, the user may forget to synchronize the data on the other device and modify the same file there. When the user then synchronizes, the file is in conflict because it was edited on two different client devices without proper synchronization in between. Generally, two methods can be used to resolve this. First, the user is prompted to select and retain the correct version. This method ensures correctness. Second, conflicts can be handled implicitly using a simple global latest time principle: the modification times of the data are compared, and the file with the latest modification time is taken as the global latest file. This is reasonable because the user's intent is typically to keep the files up to date. The push process does this by first uploading the most recent data to the cloud storage. The pull process then synchronizes the cloud's updates to the local client device.
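The implicit "global latest time" rule can be sketched in a few lines of Python. The conflict test (both sides differ from the shared history state), the field names, and the tie-breaking choice in favor of the local copy are assumptions added for illustration.

```python
def is_conflict(history, local, cloud):
    """A file is in conflict when both sides changed it since the last round."""
    return (
        local["id"] != history["id"]
        and cloud["id"] != history["id"]
        and local["id"] != cloud["id"]
    )


def resolve_latest(local, cloud):
    """Global latest time principle: keep whichever copy was modified last."""
    if local["id"] == cloud["id"]:
        return "none"  # identical content, nothing to transfer
    # Ties go to the local copy; this tie-break is an assumption.
    return "push" if local["mtime"] >= cloud["mtime"] else "pull"


last_round = {"id": "v0", "mtime": 100}
office_edit = {"id": "v1", "mtime": 200}  # already synchronized to the cloud
home_edit = {"id": "v2", "mtime": 300}    # edited later on the other device

conflict = is_conflict(last_round, local=home_edit, cloud=office_edit)
decision = resolve_latest(local=home_edit, cloud=office_edit)
```

Under this rule the later home edit is pushed and overwrites the office edit, which is why prompting the user remains the safer option when both versions may contain wanted changes.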
In an embodiment, the theoretical performance of the cloud object storage method is analyzed. Assume that there are n files in the synchronization directory and that m files have been modified since the last synchronization round. A directory is considered a special file. Table 1 gives a brief summary of the theoretical performance of the cloud object storage method.
                      State tree    Push    Pull
Storage cost          O(n)          O(1)    O(1)
Computation cost      O(n)          O(n)    O(n)
Communication cost    O(n)          O(m)    O(m)
TABLE 1
As can be seen from Table 1, the storage cost is at most O(n). The storage overhead of the cloud object storage method is mainly used for storing the local history tree and the local current tree, as well as the cloud history tree and the cloud current tree; the state trees in Table 1 refer to the local current tree and the cloud current tree. In particular, the cloud object storage method uses two current trees to encode the local data state and the cloud data state. Each node in a tree stores meta information including a file name, a modification time, and an identification ID, so one node occupies O(1) storage space. Thus, the storage overhead of a single tree storing meta information is O(n). In the optimized cloud object storage method, the total number of trees is 3. Therefore, the storage cost is O(n). The push and pull processes do not incur additional storage costs.
Computational cost is incurred in every synchronization process. In the stage of checking data changes and building the state trees, the state trees (i.e., the local current tree and the cloud current tree) need to be traversed, and the size of each state tree is O(n). Therefore, the traversal process requires O(n) time. Likewise, the traversals during push and pull also process n files and directories. Suppose that searching for a file takes O(1) time; this can be achieved with a hash table, or with a direct linear search if each directory holds O(1) files. In summary, both the push and pull processes incur a computational cost of O(n).
Communication costs are also incurred during synchronization. Before checking for data changes, the meta information of all files must be obtained from the cloud to build a state tree. For n files, n cloud storage API calls are required, which send n HTTP requests. Therefore, the stage of building the state tree incurs a communication cost of O(n). For push and pull, the communication cost depends on how many files need to be synchronized from the local client to the cloud or from the cloud to the local client. For m file changes, m cloud API requests need to be sent. Thus, the communication overhead is O(m).
The cloud object storage method was prototyped using Python 3.6. An object storage service of Alibaba Cloud (Aliyun) is used as the backend cloud storage. For the cloud storage service, the storage region is set to "Shenzhen"; no acceleration services from the service provider, such as load balancing, are used. This implementation uses two history trees, a local history tree and a cloud history tree, to simplify the coding and allow fine-grained evaluation.
In the implementation, SHA-256 is used as the cryptographic hash function to compute the identification ID of a file. The history state trees are serialized using Python's pickle library to facilitate reading from and writing to disk. Running time is measured using Python's built-in process_time function.
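The three building blocks named above can be combined as in the following minimal sketch. The helper `identification_id` and the flat dict used as a stand-in for a history tree are assumptions made for illustration; only `hashlib.sha256`, `pickle` and `time.process_time` are standard library calls.

```python
import hashlib
import pickle
import time


def identification_id(data: bytes) -> str:
    """Hex SHA-256 digest of the file content, used as the identification ID."""
    return hashlib.sha256(data).hexdigest()


# A history tree reduced to a plain dict (path -> metadata) for illustration.
history_tree = {
    "0001.txt": {"mtime": 1622764800, "id": identification_id(b"hello")},
}

start = time.process_time()
blob = pickle.dumps(history_tree)   # bytes that would be written to disk
restored = pickle.loads(blob)       # what a later round would read back
elapsed_ms = (time.process_time() - start) * 1000.0
```

Note that `time.process_time` counts CPU time of the current process only, so time spent waiting on the network is not included in such a measurement.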
To evaluate the performance of the cloud object storage method in a controlled manner, 1,000 ordinary files were first generated, with file names ranging from "0001.txt" to "1000.txt". Each file contains 1024 random bytes. The performance was then evaluated using real data sets from two popular open source projects. As performance indicators, the storage overhead, the computational overhead and the communication overhead are measured. Each experiment was performed 10 times and the average result is reported, which gives a more stable estimate of performance.
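A data set of this shape can be produced with a few lines of Python. This is a sketch of one possible generator, not the script used in the experiments; the function name `generate_files` is hypothetical.

```python
import os
import tempfile


def generate_files(directory: str, count: int = 1000, size: int = 1024) -> None:
    """Write files 0001.txt .. NNNN.txt, each holding `size` random bytes."""
    for i in range(1, count + 1):
        path = os.path.join(directory, f"{i:04d}.txt")
        with open(path, "wb") as fh:
            fh.write(os.urandom(size))


# Generate the data set in a temporary directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    generate_files(tmp)
    names = sorted(os.listdir(tmp))
    sizes = {os.path.getsize(os.path.join(tmp, name)) for name in names}
```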
Storage overhead. The storage cost is caused by storing the state trees locally. The cloud object storage method stores the local history state (i.e., the local history tree) and the cloud history state (i.e., the cloud history tree) on disk. The two trees are similar in content. Table 2 lists the storage costs of the local history tree and the cloud history tree for different file settings, with the number of files varying from 100 to 10,000. The results show that the storage cost increases linearly with the number of files. The maximum cost, for 10,000 files, is about 1.53 MB. This overhead is acceptable for modern devices (e.g., PCs and mobile phones).
Number of files | 100 | 1,000 | 10,000
Local history tree storage cost (bytes) | 15,640 | 160,540 | 1,609,558
Cloud history tree storage cost (bytes) | 15,639 | 160,539 | 1,609,557
TABLE 2
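The linear growth can be checked directly from the Table 2 figures for the local history tree. The short calculation below assumes the costs are in bytes and that the three columns correspond to 100, 1,000 and 10,000 files, as stated in the text.

```python
# Storage cost in bytes of the local history tree, taken from Table 2.
table2 = {100: 15_640, 1_000: 160_540, 10_000: 1_609_558}

# Cost per file stays nearly constant, which is what linear growth means.
bytes_per_file = {n: cost / n for n, cost in table2.items()}
largest_mib = table2[10_000] / (1024 * 1024)

for n, per_file in bytes_per_file.items():
    print(f"{n:>6} files: {per_file:.1f} bytes per file")
print(f"10,000 files: {largest_mib:.2f} MiB")
```

The per-file cost stays between roughly 156 and 161 bytes across all three settings, and the largest tree comes to about 1.53 MiB.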
Computational overhead. The computation consists of two parts: building the state trees, and running the synchronization algorithm to identify data changes. The first is the cost of building the two current trees; the second is the cost of the push and pull processes. The two parts were evaluated separately.
To build the state trees (i.e., the local current tree and the cloud current tree), 1,000 files are generated locally in advance and uploaded to the cloud. These files are used to build the local current tree and the cloud current tree. Table 3 shows the computational cost of building the state trees. From Table 3, it can be seen that building the cloud current tree is the most time-consuming step. This is because the cloud object storage must be accessed over the network through the API provided by the cloud, and network access is much slower than reading local files. However, the cost is still small and acceptable, on the order of several hundred milliseconds.
TABLE 3 (computational cost of building the state trees; provided as image BDA0003101916640000201 in the original publication)
For the push and pull processes, 1,000 files are likewise generated locally in advance and uploaded to the cloud. Some of the files are then changed to evaluate the performance of push and pull. Data dynamics include adding, modifying, renaming, and deleting files. Each data dynamic operation is applied to 10% of the total number of files, i.e., 100 files. Taking the add operation as an example, 100 files are created locally and then the push process is run. After the add operation has been evaluated, the 100 newly added files are deleted so that a clean environment is ready for the next evaluation. A total of 8 different tests were performed, and the results are shown in Table 4.
Process | Operation | Time cost (milliseconds)
Push | Adding files | 700.00
Push | Modifying files | 864.06
Push | Renaming files | 1201.56
Push | Deleting files | 653.13
Pull | Adding files | 600.00
Pull | Modifying files | 606.25
Pull | Renaming files | 862.50
Pull | Deleting files | 396.88
TABLE 4
From Table 4 it can be seen that the push process is typically slower than the pull process. This is because pushing requires access to the cloud through the cloud's API. Among the different operations, renaming files is the most time-consuming, because the operation must first be identified as a rename; this identification requires searching the state tree and therefore takes more time. Deleting files takes the least time, because a file can be deleted simply by calling the cloud API, without writing or modifying any file content.
Communication overhead. The communication cost has two parts: the number of HTTP requests and the length of the HTTP requests. The number of HTTP requests is evaluated first, and the results are shown in Table 5. During synchronization, the information of each file needs to be requested, which takes 1 HTTP request per file. A directory, however, must first be identified as a directory and then queried for its children, so the number of requests for a directory is 2.
TABLE 5 (number of HTTP requests; provided as image BDA0003101916640000211 in the original publication)
When using the cloud API, there is no difference between file operations and directory operations, so file operations are taken as the example. For push, there are four file operations: create, modify, rename and delete. The first and last of these, create and delete, each take 2 HTTP requests per file. A rename is split into a create and a delete, so it takes 4 HTTP requests. A modify takes 1 more request than a create, because the file hash value must be checked before the file is updated. For pull, the analysis is similar, except that a local delete operation issues no HTTP request, because the client only needs to retrieve existing files and directories.
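The per-operation counts for push described in this paragraph can be turned into a small cost model. The sketch below is derived only from the prose; Table 5 remains the authoritative source, and the names `PUSH_REQUESTS` and `push_request_count` are hypothetical.

```python
# HTTP requests per pushed change, as described in the text:
# create and delete take 2 each, modify takes one more than create,
# and rename is handled as a create followed by a delete.
CREATE = 2
DELETE = 2
PUSH_REQUESTS = {
    "create": CREATE,
    "modify": CREATE + 1,
    "rename": CREATE + DELETE,
    "delete": DELETE,
}


def push_request_count(changes: dict) -> int:
    """Total HTTP requests for a batch, e.g. {"create": 100, "rename": 5}."""
    return sum(PUSH_REQUESTS[op] * n for op, n in changes.items())


# Example batch: 100 creations, 10 modifications and 5 renames.
total = push_request_count({"create": 100, "modify": 10, "rename": 5})
```

For this example batch the model gives 100 × 2 + 10 × 3 + 5 × 4 = 250 requests, which grows linearly in the number of changes m.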
For the length of HTTP requests, the focus is on the extra cost of the metadata stored for each file or directory. The present embodiment does not count the length of the data itself, since the data must be sent to the cloud storage in any case. The extra communication cost consists of the modification time and the identification ID of the item. The modification time is stored in string format and is 10 bytes long. The identification ID is calculated using the SHA-256 algorithm and is 32 bytes long. The additional data for each file is therefore small, and negligible compared to the contents of a normal file.
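The two field lengths can be reproduced as follows. The encoding of the modification time is an assumption: a decimal Unix timestamp in whole seconds is one string format that is 10 bytes long, but the specification does not state which format is used.

```python
import hashlib
import time

# Assumption: the modification time is sent as a decimal Unix timestamp in
# whole seconds, which is 10 characters long for present-day dates.
mtime_field = str(int(time.time())).encode("ascii")

# A raw SHA-256 digest is always 32 bytes, whatever the input size.
id_field = hashlib.sha256(b"file contents").digest()

extra_bytes = len(mtime_field) + len(id_field)
overhead_ratio = extra_bytes / 1024   # relative to a 1024-byte test file
```

Under these assumptions the metadata adds 42 bytes per item, about 4% of even the small 1024-byte test files.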
The above evaluation uses small data sets to test performance in a fine-grained manner. To evaluate the scalability and practicality of the cloud object storage method, two real open source projects were selected for further evaluation, one small and one large. The smaller project is vue.js, a popular JavaScript framework for building user interfaces, which has 644 files. The larger project is vscode, a mainstream source code editor, which has 6080 files. For vue.js, the proposed cloud object storage method takes 171.875 milliseconds to build the state tree. For vscode, the corresponding cost is 2281.25 milliseconds. Since building the state tree is the most time-consuming operation, and the push and pull processes consume much less time, the overhead for vue.js and vscode is dominated by this step. Therefore, the experimental results show that the cloud object storage method scales and can handle synchronization well in real projects.
The experimental results show that the cloud object storage method provided by the embodiment of the invention has low storage cost and low computational cost. From the above tests, it can be seen that each data item in the local history tree and the cloud history tree costs about 160 bytes of local disk space. At the stage of building the state trees, building the local state tree costs less than 1 millisecond per file on average, and building the cloud state tree costs 1.6 milliseconds per file on average. During the push and pull processes, each data change takes approximately 7 milliseconds of computation time. The resource overhead of the pull algorithm is less than that of the push algorithm, because the pull algorithm performs updates through local system calls rather than through the network API used by the push algorithm. The communication overhead is also acceptable in today's network environment. Therefore, the cloud object storage method provided by the embodiment has practical value for use cases in which the user wants to control the data directly.
Fig. 3 is a schematic block diagram of a cloud object storage synchronization apparatus 300 according to an embodiment of the present invention, where the apparatus 300 includes:
a tree structure encoding unit 301, configured to encode current client user data by using a tree data structure based on a cyclic push-pull synchronization manner, so as to obtain a local history tree and a local current tree, and a cloud history tree and a cloud current tree, which are used for storing user data, in each round of synchronization;
a first comparing unit 302, configured to compare, for each synchronization round, state information of the local history tree and the local current tree; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
a push synchronization unit 303, configured to determine that data change occurs if the state information of the local history tree is different from the state information of the local current tree, then search for the changed first target data in the local history tree and the local current tree, and synchronize the first target data to a cloud storage, so as to complete a push synchronization process of a current round;
a second comparing unit 304, configured to compare state information of the cloud history tree and the cloud current tree when the current client is used; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
a pull synchronization unit 305, configured to determine that data change occurs if the state information of the cloud history tree and the cloud current tree are different, then search for second target data that has changed in the cloud history tree and the cloud current tree, and synchronize the second target data to a local storage of a current client, so as to complete a pull synchronization process of a current round;
a data updating unit 306, configured to update the user data to a global latest state after the cloud storage and the local storage are synchronized, and continue to perform the next round of synchronization until an instruction for stopping synchronization is received.
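The cooperation of units 301 to 306 can be pictured as a loop of push-then-pull rounds. The following is a simplified sketch under stated assumptions, not the implementation of the apparatus: each tree is reduced to a dict from path to identification ID, the function `sync_rounds` and its callback parameters are hypothetical, and the demonstration callbacks only handle additions.

```python
from typing import Callable, Dict

Tree = Dict[str, str]   # path -> identification ID (simplified state tree)


def sync_rounds(scan_local: Callable[[], Tree],
                scan_cloud: Callable[[], Tree],
                push: Callable[[Tree, Tree], None],
                pull: Callable[[Tree, Tree], None],
                should_stop: Callable[[], bool]) -> int:
    """Run push-then-pull rounds until should_stop() returns True."""
    local_history: Tree = {}
    cloud_history: Tree = {}
    rounds = 0
    while not should_stop():
        local_current = scan_local()
        if local_current != local_history:      # local change: push
            push(local_history, local_current)
        cloud_current = scan_cloud()
        if cloud_current != cloud_history:      # cloud change: pull
            pull(cloud_history, cloud_current)
        # Both sides now agree; the fresh scans become the next history.
        local_history, cloud_history = scan_local(), scan_cloud()
        rounds += 1
    return rounds


# In-memory demonstration that only handles additions.
local: Tree = {"a.txt": "id-a"}
cloud: Tree = {"b.txt": "id-b"}
stop_signals = iter([False, False, True])


def demo_push(old: Tree, new: Tree) -> None:
    for path in new.keys() - old.keys():
        cloud[path] = new[path]


def demo_pull(old: Tree, new: Tree) -> None:
    for path in new.keys() - old.keys():
        local[path] = new[path]


rounds_run = sync_rounds(lambda: dict(local), lambda: dict(cloud),
                         demo_push, demo_pull, lambda: next(stop_signals))
```

In the demonstration the first round pushes `a.txt` and pulls `b.txt`, the second round finds no changes, and the third check of `should_stop` ends the loop.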
In one embodiment, the tree structure encoding unit 301 includes:
a packaging unit, configured to encapsulate meta information of user data by using a FileStatus object, wherein the meta information includes a file name, a modification time, and an identification ID;
a first recording unit, configured to record historical state information and current state information of locally stored data, respectively, by creating a local history tree and a local current tree in each round of synchronization; and
a second recording unit, configured to record historical state information and current state information of cloud storage data, respectively, by creating a cloud history tree and a cloud current tree.
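A possible shape for the FileStatus object and the four trees is sketched below. Only the class name FileStatus and the three meta information items come from this specification; the field names, the `children` dict, and the helper `make_file` are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FileStatus:
    """Meta information of one file or directory."""
    name: str
    mtime: int          # modification time
    file_id: str        # identification ID (content hash)
    children: Dict[str, "FileStatus"] = field(default_factory=dict)

    def add(self, child: "FileStatus") -> "FileStatus":
        """Attach a child node and return it."""
        self.children[child.name] = child
        return child


def make_file(name: str, mtime: int, content: bytes) -> FileStatus:
    """Build a leaf node whose ID is the SHA-256 digest of the content."""
    return FileStatus(name, mtime, hashlib.sha256(content).hexdigest())


# Four trees per round: history and current, for local and for cloud.
trees = {key: FileStatus("/", 0, "") for key in
         ("local_history", "local_current", "cloud_history", "cloud_current")}
trees["local_current"].add(make_file("notes.txt", 1622764800, b"draft"))
```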
In an embodiment, as shown in fig. 4, the first comparing unit 302 includes:
a calling unit 401, configured to call metadata of local storage data, and respectively traverse the local history tree and the local current tree in a recursive manner;
a result comparing unit 402, configured to compare a traversal result of the local history tree with a traversal result of the local current tree;
a judging unit 403, configured to judge whether data addition, modification, deletion, and/or renaming occurs between the local history tree and the local current tree according to the comparison result.
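One way to classify the four kinds of change from two traversal results is sketched below. It assumes each tree has already been flattened into a dict from path to meta information, and it treats a deleted path and an added path that share an identification ID as a rename; the names `Meta` and `diff_trees` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Meta(NamedTuple):
    mtime: int
    file_id: str


def diff_trees(history: Dict[str, Meta], current: Dict[str, Meta]):
    """Classify changes between two flattened trees (path -> Meta)."""
    added = {p for p in current if p not in history}
    deleted = {p for p in history if p not in current}
    modified = sorted(p for p in current
                      if p in history and current[p] != history[p])
    # A deleted path and an added path with the same ID form a rename.
    by_id = {history[p].file_id: p for p in deleted}
    renamed: List[Tuple[str, str]] = []
    for new_path in sorted(added):
        old_path = by_id.pop(current[new_path].file_id, None)
        if old_path is not None:
            renamed.append((old_path, new_path))
            added.discard(new_path)
            deleted.discard(old_path)
    return sorted(added), modified, sorted(deleted), renamed


history = {"a.txt": Meta(1, "id-a"), "b.txt": Meta(1, "id-b"),
           "c.txt": Meta(1, "id-c")}
current = {"a.txt": Meta(2, "id-a2"), "d.txt": Meta(1, "id-b"),
           "e.txt": Meta(3, "id-e")}
added, modified, deleted, renamed = diff_trees(history, current)
```

In the example, `a.txt` is reported as modified, `e.txt` as added, `c.txt` as deleted, and `b.txt` as renamed to `d.txt`.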
In an embodiment, the push synchronization unit 303 includes:
a first judging unit, configured to, when it is judged that data addition occurs between the local history tree and the local current tree, acquire the corresponding locally added data from the local current tree and upload the locally added data to cloud storage;
a second judging unit, configured to, when it is judged that data modification occurs between the local history tree and the local current tree, compare the modification times of the modified data in the local history tree and the local current tree, and, if the modification times differ and the modification time in the local current tree is the latest, upload the corresponding locally modified data to cloud storage;
a third judging unit, configured to, when it is judged that data deletion occurs between the local history tree and the local current tree, search for the corresponding locally deleted data in the local history tree and delete the corresponding data in cloud storage;
a fourth judging unit, configured to, when it is judged that data renaming occurs between the local history tree and the local current tree, acquire the corresponding renamed data from the local current tree, upload the renamed data to cloud storage, and delete the data corresponding to the renamed data from the local history tree.
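The four push cases can be sketched against an in-memory stand-in for the cloud. `InMemoryCloud`, its `upload` and `delete` methods, and the function `push` are hypothetical and do not correspond to any real object storage SDK; in this simplified version a rename removes the object stored under the old name.

```python
from typing import Callable, Dict, NamedTuple


class Meta(NamedTuple):
    mtime: int
    file_id: str


class InMemoryCloud:
    """Hypothetical stand-in for an object storage client."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def upload(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


def push(history: Dict[str, Meta], current: Dict[str, Meta],
         read_local: Callable[[str], bytes], cloud: InMemoryCloud) -> None:
    """Apply local additions, modifications, renames and deletions."""
    old_ids = {m.file_id: p for p, m in history.items() if p not in current}
    for path, meta in current.items():
        if path not in history:
            # Addition, or the new name of a renamed file: upload it.
            cloud.upload(path, read_local(path))
            old_path = old_ids.pop(meta.file_id, None)
            if old_path is not None:
                cloud.delete(old_path)          # rename: drop the old name
        elif meta.mtime > history[path].mtime:
            cloud.upload(path, read_local(path))  # local copy is newer
    for old_path in old_ids.values():
        cloud.delete(old_path)                  # plain deletion


files = {"a.txt": b"new a", "d.txt": b"b-content", "e.txt": b"e"}
cloud = InMemoryCloud()
cloud.objects = {"a.txt": b"old a", "b.txt": b"b-content", "c.txt": b"c"}
history = {"a.txt": Meta(1, "id-a"), "b.txt": Meta(1, "id-b"),
           "c.txt": Meta(1, "id-c")}
current = {"a.txt": Meta(2, "id-a2"), "d.txt": Meta(1, "id-b"),
           "e.txt": Meta(3, "id-e")}
push(history, current, files.__getitem__, cloud)
```

After the call, the objects held by the stand-in cloud match the local files exactly.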
In one embodiment, the second comparing unit 304 includes:
a first traversal unit, configured to traverse the cloud current tree and compare the traversal result with the cloud history tree to determine whether data addition and/or modification occurs;
a second traversal unit, configured to traverse the cloud history tree and compare the traversal result with the cloud current tree to determine whether data deletion occurs.
In one embodiment, the pull synchronization unit 305 includes:
a first determining unit, configured to, when it is determined that data addition and/or modification occurs, acquire the corresponding cloud-added data and/or cloud-modified data from the cloud current tree and download the cloud-added data and/or cloud-modified data to the local storage of the current client;
a second determining unit, configured to, when it is determined that data deletion occurs, determine the corresponding cloud-deleted data in the cloud history tree and delete the corresponding data in the local storage of the current client.
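The two traversals and the resulting pull actions can be sketched as two passes over flattened trees. The function `pull`, the `download` callback and the dict used as local storage are assumptions for illustration only.

```python
from typing import Callable, Dict, NamedTuple


class Meta(NamedTuple):
    mtime: int
    file_id: str


def pull(cloud_history: Dict[str, Meta], cloud_current: Dict[str, Meta],
         download: Callable[[str], bytes],
         local_files: Dict[str, bytes]) -> None:
    """Bring local storage in line with the cloud current tree."""
    # Pass 1: walk the cloud current tree to find additions and modifications.
    for path, meta in cloud_current.items():
        if path not in cloud_history or cloud_history[path] != meta:
            local_files[path] = download(path)
    # Pass 2: walk the cloud history tree to find deletions.
    for path in cloud_history:
        if path not in cloud_current:
            local_files.pop(path, None)   # local delete, no network request


cloud_objects = {"a.txt": b"a v2", "c.txt": b"c"}
cloud_history = {"a.txt": Meta(1, "id-a"), "b.txt": Meta(1, "id-b")}
cloud_current = {"a.txt": Meta(2, "id-a2"), "c.txt": Meta(1, "id-c")}
local_files = {"a.txt": b"a v1", "b.txt": b"b"}
pull(cloud_history, cloud_current, cloud_objects.__getitem__, local_files)
```

Only the first pass touches the network, which is consistent with the observation that a local delete needs no HTTP request.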
In an embodiment, the cloud object storage synchronization apparatus 300 further includes:
a searching unit, configured to, during data synchronization, search the cloud history tree or the local history tree for target data whose identification ID is the same as that of the data to be synchronized, and, if the target data is found, copy the target data into the cloud current tree or the local current tree.
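The benefit of the searching unit is that content already known under another path can be copied instead of transferred. The sketch below illustrates the idea with a flat history dict from path to identification ID; `find_by_id`, `sync_one` and the `stats` counters are hypothetical, and updating the history inside `sync_one` is a simplification of the per-round tree update.

```python
import hashlib
from typing import Dict, Optional


def find_by_id(history: Dict[str, str], file_id: str) -> Optional[str]:
    """Return a path in the history tree whose identification ID matches."""
    for path, known_id in history.items():
        if known_id == file_id:
            return path
    return None


def sync_one(path: str, data: bytes, history: Dict[str, str],
             store: Dict[str, bytes], stats: Dict[str, int]) -> None:
    """Place `data` at `path` in `store`, copying in place when possible."""
    file_id = hashlib.sha256(data).hexdigest()
    source = find_by_id(history, file_id)
    if source is not None and source in store:
        store[path] = store[source]     # copy existing data, no transfer
        stats["copied"] += 1
    else:
        store[path] = data              # content is new, transfer it
        stats["transferred"] += 1
    history[path] = file_id


history: Dict[str, str] = {}
store: Dict[str, bytes] = {}
stats = {"copied": 0, "transferred": 0}
sync_one("a.txt", b"same", history, store, stats)
sync_one("copy_of_a.txt", b"same", history, store, stats)
sync_one("b.txt", b"other", history, store, stats)
```

In the example, the second file has the same content as the first and is therefore copied, so only two of the three files are transferred.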
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
Embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed, the steps provided by the above embodiments can be implemented. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present invention further provides a computer device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided in the above embodiments when calling the computer program in the memory. Of course, the computer device may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A cloud object storage synchronization method is characterized by comprising the following steps:
based on a cyclic push-pull synchronization mode, encoding current client user data by using a tree data structure to obtain a local history tree and a local current tree, a cloud history tree and a cloud current tree which are used for storing the user data in each round of synchronization;
comparing the state information of the local history tree and the local current tree for each round of synchronization; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
if the state information of the local history tree is different from that of the local current tree, judging that data change occurs, then searching first target data which is changed in the local history tree and the local current tree, and synchronizing the first target data to cloud storage to complete the push synchronization process of the current round;
comparing the state information of the cloud history tree and the cloud current tree when the current client is used; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
if the state information of the cloud history tree and the cloud current tree is different, judging that data change occurs, then searching second target data which are changed in the cloud history tree and the cloud current tree, and synchronizing the second target data to the local storage of the current client so as to complete the pull synchronization process of the current round;
after cloud storage and local storage are synchronized, user data are updated to a global latest state, and next round of synchronization is continued until an instruction for stopping synchronization is received.
2. The cloud object storage synchronization method according to claim 1, wherein the encoding current client user data by using a tree data structure to obtain a local history tree and a local current tree for storing user data and a cloud history tree and a cloud current tree in each synchronization round comprises:
packaging meta information of user data by using a FileStatus object, wherein the meta information comprises a file name, modification time and an identification ID;
respectively recording historical state information and current state information of locally stored data by creating a local historical tree and a local current tree in each round of synchronization; and
respectively recording historical state information and current state information of the cloud storage data by creating a cloud history tree and a cloud current tree.
3. The cloud object storage synchronization method of claim 2, wherein said comparing the state information of the local history tree and the local current tree for each synchronization round comprises:
calling the meta information of the local storage data, and respectively traversing the local history tree and the local current tree in a recursive mode;
comparing the traversal result of the local history tree with the traversal result of the local current tree;
and judging whether data addition, modification, deletion and/or renaming occur between the local history tree and the local current tree according to the comparison result.
4. The cloud object storage synchronization method according to claim 3, wherein if the state information of the local history tree and the local current tree are different, it is determined that data change occurs, then the first target data that is changed is found in the local history tree and the local current tree, and the first target data is synchronized to cloud storage, so as to complete a push synchronization process of a current round, including:
when data addition is judged to occur between the local history tree and the local current tree, acquiring corresponding local addition data from the local current tree, and uploading the local addition data to cloud storage;
when data modification is judged to occur between the local history tree and the local current tree, the modification time of the modified data in the local history tree and the local current tree is compared; if the modification time of the local history tree is different from that of the local current tree and the modification time of the local current tree is the latest modification time, uploading corresponding local modification data to cloud storage;
when data deletion is judged to occur between the local history tree and the local current tree, searching corresponding local deletion data in the local history tree, and deleting corresponding data in cloud storage;
and when the data renaming between the local history tree and the local current tree is judged, acquiring corresponding renaming data in the local current tree, uploading the renaming data to cloud storage, and deleting data corresponding to the renaming data in the local history tree.
5. The cloud object storage synchronization method of claim 1, wherein comparing the state information of the cloud history tree and the cloud current tree while using the current client comprises:
traversing the cloud current tree and comparing a traversal result with the cloud history tree to determine whether data addition and/or modification occurs;
and traversing the cloud history tree, and comparing a traversal result with the cloud current tree to determine whether data deletion occurs.
6. The cloud object storage synchronization method according to claim 5, wherein if the state information of the cloud history tree and the cloud current tree is different, it is determined that data change has occurred, then second target data that has changed is found in the cloud history tree and the cloud current tree, and the second target data is synchronized to a local storage of a current client, so as to complete a current round of pull synchronization process, and the method includes:
when data addition and/or modification is determined to occur, corresponding cloud addition data and/or cloud modification data are obtained from the cloud current tree, and the cloud addition data and/or the cloud modification data are downloaded to a local storage of a current client;
and when the data deletion is determined to occur, determining corresponding cloud deletion data in the cloud history tree, and deleting the corresponding data in the local storage of the current client.
7. The cloud object storage synchronization method of claim 1, further comprising:
when data synchronization is carried out, whether target data identical to the identification ID of the data to be synchronized exists or not is searched in the cloud history tree or the local history tree, and if the target data is searched, the target data is copied in the cloud current tree or the local current tree.
8. A cloud object storage synchronization apparatus, comprising:
a tree structure encoding unit, configured to encode current client user data by using a tree data structure based on a cyclic push-pull synchronization mode, so as to obtain a local history tree and a local current tree, and a cloud history tree and a cloud current tree, which are used for storing the user data, in each round of synchronization;
a first comparison unit, configured to compare the state information of the local history tree and the local current tree for each round of synchronization; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
a push synchronization unit, configured to judge that data change occurs if the state information of the local history tree and the local current tree is different, then search for the changed first target data in the local history tree and the local current tree, and synchronize the first target data to cloud storage, so as to complete the push synchronization process of the current round;
a second comparison unit, configured to compare the state information of the cloud history tree and the cloud current tree when the current client is used; wherein the state information includes a file name, a modification time, and an identification ID of the user data;
a pull synchronization unit, configured to judge that data change occurs if the state information of the cloud history tree and the cloud current tree is different, then search for the changed second target data in the cloud history tree and the cloud current tree, and synchronize the second target data to the local storage of the current client, so as to complete the pull synchronization process of the current round;
and a data updating unit, configured to update the user data to a global latest state after the cloud storage and the local storage are synchronized, and continue to perform the next round of synchronization until an instruction for stopping synchronization is received.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the cloud object storage synchronization method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the cloud object storage synchronization method of any of claims 1 to 7.
CN202110625241.5A 2021-06-04 2021-06-04 Cloud object storage synchronization method and device, computer equipment and storage medium Pending CN113282540A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110625241.5A CN113282540A (en) 2021-06-04 2021-06-04 Cloud object storage synchronization method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110625241.5A CN113282540A (en) 2021-06-04 2021-06-04 Cloud object storage synchronization method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113282540A true CN113282540A (en) 2021-08-20

Family

ID=77283455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110625241.5A Pending CN113282540A (en) 2021-06-04 2021-06-04 Cloud object storage synchronization method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113282540A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102790760A (en) * 2012-05-31 2012-11-21 清华大学 Data synchronization method based on directory tree in safe network disc system
CN105740418A (en) * 2016-01-29 2016-07-06 杭州亿方云网络科技有限公司 File monitoring and message pushing based real-time synchronization system
CN108573014A (en) * 2017-12-19 2018-09-25 北京金山云网络技术有限公司 A kind of file synchronisation method, device, electronic equipment and readable storage medium storing program for executing
CN110347651A (en) * 2019-06-11 2019-10-18 平安科技(深圳)有限公司 Method of data synchronization, device, equipment and storage medium based on cloud storage

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385747A (en) * 2021-09-22 2022-04-22 国家电网有限公司 Mobile internet rapid data synchronization method
CN114422503A (en) * 2022-01-24 2022-04-29 深圳市云语科技有限公司 Method for intelligently selecting file transmission mode of multi-node file transmission system
CN114422503B (en) * 2022-01-24 2024-01-30 深圳市云语科技有限公司 Method for intelligently selecting file transmission mode by multi-node file transmission system
CN115454720A (en) * 2022-09-20 2022-12-09 中电云数智科技有限公司 Data increment reconstruction system and method based on daos distributed storage system
CN115454720B (en) * 2022-09-20 2024-04-02 中电云计算技术有限公司 Data increment reconstruction system and method based on daos distributed storage system
CN116701380A (en) * 2023-08-02 2023-09-05 长扬科技(北京)股份有限公司 Method and device for clearing redundant data based on Openstack
CN116701380B (en) * 2023-08-02 2023-10-27 长扬科技(北京)股份有限公司 Method and device for clearing redundant data based on Openstack

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210820
