WO2018054497A1 - Method and device for synchronising data between devices

Info

Publication number: WO2018054497A1
Application number: PCT/EP2016/072754
Authority: WO
Inventors: Christopher Lowe
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2016-09-23
Filing date: 2016-09-23
Publication date: 2018-03-29

Abstract

A device is configured to apply a relatively strong hash function to data that is to be synchronised with data stored at an originating device to generate a strong hash result. The device is configured to divide the data that is to be synchronised into a plurality of sections and apply a relatively weak hash function to each of those sections to generate respective weak hash results. The device is also configured to communicate the weak hash results and the strong hash result to the originating device. The device is thus able to transfer information to the originating device that will help the originating device to identify sections of data that are different between the two devices, while keeping the amount of data that has to be transferred between the two devices relatively small.

Description

Method and Device for Synchronising Data Between Devices

This invention relates to devices and techniques for synchronising data between one device and another.

It is often necessary to push data to a device in order to synchronise the data held at a target device with data held at one or more other devices. Often the target device will already have some data, so what is required is to update the data that the device already has. An example is system configuration settings. This can be achieved by wirelessly transmitting the replacement settings to the target device. The target device then overwrites its old version of the settings with the replacement settings. Data updates such as this can often be large, so it is sometimes preferable to compress the replacement data before it is transmitted in order to use the available bandwidth more efficiently.

Transmitting the entirety of the replacement data is one option for pushing data to a device. This is not necessarily efficient, given that at least some of the replacement data will usually be the same as the original. It is more efficient to transmit the differences between the new data and the data that the device has already. The challenge with this approach is in identifying exactly what data a particular device might already have.

One option would be just to ask the device to send the data that it has, so that the new data can be checked against it, but this is not an efficient use of bandwidth. Another option is to use a versioning system, but that relies on the device having an up-to-date copy of the data. The rsync algorithm is another option. It identifies files that differ by checking various parameters for each file, such as modification time and file size, and also uses a rolling checksum to compare areas of a file. Rsync is efficient but bandwidth hungry.

It is an object of the invention to provide techniques for updating data stored on a device that are suitable for bandwidth-constrained devices. The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

According to a first aspect, a device is provided that is configured to apply a relatively strong hash function to data that is to be synchronised with data stored at an originating device to generate a strong hash result. The device is configured to divide the data that is to be synchronised into a plurality of sections and apply a relatively weak hash function to each of those sections to generate respective weak hash results. The device is also configured to communicate the weak hash results and the strong hash result to the originating device. The device is thus able to transfer information to the originating device that will help the originating device to identify sections of data that are different between the two devices, while keeping the amount of data that has to be transferred between the two devices relatively small.

The device may be configured to divide the data that is to be synchronised into a plurality of sections multiple times, the sections into which the data is divided becoming smaller each time. The device may also be configured to apply a hash function to every section each time that the data is divided, the hash function becoming progressively weaker each time, to create respective weak hash results that form a hierarchy of hashes in which progressively smaller sections of the data are represented by progressively weaker hash results. This hierarchy of hashes enables the originating device to focus on specific areas of difference between the data stored at the two devices. This helps to reduce the amount of replacement data that the originating device has to transmit to the target device, since discrepancies can be localised to relatively small areas of the overall data. The size of the sections and strength of the hash may be tuned to minimise the amount of data that has to be transferred.
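The hierarchy of hashes described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the use of truncated SHA-256 digests as progressively weaker hashes, the helper names and the level sizes in LEVELS are all assumptions chosen for illustration.

```python
import hashlib

def truncated_hash(data: bytes, nbytes: int) -> bytes:
    # Keep only the first nbytes of a SHA-256 digest: fewer bytes, weaker hash.
    return hashlib.sha256(data).digest()[:nbytes]

def hash_hierarchy(image: bytes, levels):
    # levels is a list of (section_size, hash_bytes) pairs, coarsest first.
    hierarchy = []
    for section_size, hash_bytes in levels:
        hierarchy.append([
            truncated_hash(image[start:start + section_size], hash_bytes)
            for start in range(0, len(image), section_size)
        ])
    return hierarchy

# Hypothetical tuning: sections shrink 4x per level, hashes shrink 32 -> 4 -> 1 byte.
LEVELS = [(1024, 32), (256, 4), (64, 1)]
image = bytes(range(256)) * 4  # 1024 bytes of sample data
hierarchy = hash_hierarchy(image, LEVELS)
total_hash_bytes = sum(len(h) for level in hierarchy for h in level)
```

With these illustrative values the three levels hold 1, 4 and 16 hashes, and the whole hierarchy occupies 64 bytes against a 1024-byte image.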

The device may be configured to receive a replacement data section and an originating strong hash result from the originating device. The device may be configured to alter the data that it stores to incorporate the replacement data section and to apply the relatively strong hash function to the altered data to generate an altered strong hash result. It may also be configured to compare the altered strong hash result with the originating strong hash result. This mechanism provides a way for the target device to check whether or not the replacement section of data has synchronised its data with that of the originating device.

The device may be configured to, if the altered strong hash result and the originating strong hash result are different from each other: (i) divide the altered data into sections; (ii) apply a relatively weak hash function to those sections to generate respective second weak hash results that are longer than the weak hash results; and (iii) communicate the second weak hash results to the originating device with the altered strong hash result. This helps the target device to address situations where the weak hashes are misleading and failing to flag up sections of the data where there is actually a mismatch between the data stored at the two devices.
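The target-side behaviour of the two preceding paragraphs might be sketched in Python as follows. This is an illustration under stated assumptions, not the specification's implementation: SHA-256 stands in for the strong hash, truncated SHA-256 for the weak hash, the sections are fixed-size and non-overlapping, and doubling the weak hash length is a hypothetical policy. The function name apply_and_verify is likewise invented here.

```python
import hashlib

def strong_hash(image: bytes) -> bytes:
    # Assumption: SHA-256 plays the role of the relatively strong hash.
    return hashlib.sha256(image).digest()

def weak_hashes(image: bytes, section_size: int, nbytes: int) -> list:
    # Assumption: a truncated digest plays the role of the relatively weak hash.
    return [hashlib.sha256(image[i:i + section_size]).digest()[:nbytes]
            for i in range(0, len(image), section_size)]

def apply_and_verify(image, index, replacement, origin_strong,
                     section_size=8, weak_bytes=1):
    # Incorporate the replacement section into the stored data.
    start = index * section_size
    altered = image[:start] + replacement + image[start + section_size:]
    altered_strong = strong_hash(altered)
    if altered_strong == origin_strong:
        return altered, None  # synchronised: nothing further to report
    # Still different: report the altered strong hash with longer weak hashes.
    longer = weak_hashes(altered, section_size, weak_bytes * 2)
    return altered, (altered_strong, longer)

origin = b"AAAAAAAABBBBBBBBCCCCCCCC"
target = b"AAAAAAAAxxxxxxxxyyyyyyyy"
# Only section 1 is replaced here, so section 2 still differs afterwards.
altered, report = apply_and_verify(target, 1, b"BBBBBBBB", strong_hash(origin))
# Replacing section 2 as well brings the two copies into agreement.
final, final_report = apply_and_verify(altered, 2, b"CCCCCCCC", strong_hash(origin))
```

In the first call the strong hashes still disagree, so a report containing three two-byte weak hashes is produced; in the second call the strong hashes agree and no report is needed.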

According to a second aspect, a device is provided that is configured to receive a strong hash result and a plurality of weak hash results from a target device, said strong hash result and weak hash results representing data stored at the target device that is to be synchronised with data stored by the device. The device is configured to identify, in dependence on the weak hash results and the strong hash result communicated by the target device, to what extent the data stored by the target device differs from the data stored by the device. In this way the originating device is able to focus on areas of difference between its data and that stored by the target device without the target device having to transmit an entire copy of that data to the originating device.

The device may be configured to apply a relatively strong hash function to the data stored at the device to generate an originating strong hash result. The data stored at the device may be divided into a plurality of sections. A relatively weak hash function may be applied to each of those sections to generate respective third weak hash results. The device may compare the third weak hash results and the originating strong hash result with the weak hash results and strong hash result received from the target device. This enables the originating device to identify discrepancies between the data stored by the two devices, since those discrepancies will be reflected in the respective hash results.

The device may be configured to divide the data stored at the originating device into a plurality of sections that are equivalent to the plurality of sections into which the data stored at the target device was divided. This mimics the procedure performed at the target device and enables the originating device to directly compare the hash results that it has received with the hash results that it has generated locally.

The device may be configured to determine, if the strong hash result is different from the originating strong hash result, that the data stored at the target device is different from the data stored at the device. This provides a quick, efficient test of whether the data is synchronised between the two devices.

The device may be configured to determine where the data stored by the target device differs from that stored by the device by identifying any section of the data stored by the target device that has a weak hash result different from the third weak hash result of the corresponding section of data stored by the device. This localises the differences between the data stored by the two devices to specific sections of that data. The device may be configured to communicate to the target device the section of its own data that corresponds to an identified section. This enables the data to be synchronised between the two devices by providing replacement data for sections that appear to show some discrepancy.

The device may be configured to communicate the section of data to the target device with the originating strong hash result. The originating strong hash result provides a mechanism for the target device to confirm whether or not replacing that particular section of data with the new section of data has been enough to synchronise the data between the two devices.

According to a third aspect, a method is provided that comprises identifying data, stored at a target device, that is to be synchronised with data stored at an originating device. A relatively strong hash function is applied to the data stored at the target device to generate a strong hash result. The data stored at the target device is divided into a plurality of sections and a relatively weak hash function is applied to each of those sections to generate respective weak hash results. The weak hash results and the strong hash result are communicated to the originating device. This technique enables the target device to provide the originating device with information that it can use to identify sections of the data that differ between the two devices, while limiting the amount of data that has to be transferred between the two devices.

According to a fourth aspect, a non-transitory machine readable storage medium is provided having stored thereon processor executable instructions implementing a method. The method comprises applying a relatively strong hash function to data, which is stored in one device and is to be synchronised with data stored at another device, to generate a strong hash result. The data that is to be synchronised is divided into a plurality of sections and a relatively weak hash function is applied to each of those sections to generate respective weak hash results. The weak hash results and the strong hash result are communicated to the other device.

The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:

Figure 1 shows an example of a device;

Figure 2 is a flowchart showing an example of a technique for processing data;

Figure 3 is a flowchart showing an example of a technique for analysing received data; and

Figure 4 shows an example of a process for synchronising data between two devices.

An example of a device for communicating a data update to another device is shown in Figure 1. The device comprises a number of functional blocks including a data store 101, an input/output 102, a hash calculator 103, a comparison unit 104 and a controller 105. The operations of these functional blocks will be described with reference to Figure 2, which provides an overview of a bandwidth efficient technique for updating the data stored on a device.

The device in Figure 1 is shown illustratively as comprising a number of functional blocks. In practice these functional blocks are likely to be implemented using software. The controller, hash calculator and comparison unit in particular are likely to be implemented wholly or partly by a processor acting under software control. The functional blocks shown in Figure 1 may be embodied by a single computer program stored on a non-transitory machine-readable storage medium. In other implementations, the functional blocks of Figure 1 could be embodied by a number of separate computer programs. Figure 1 is not intended to define a strict division between different programs, procedures or functions in software. In one implementation, the device may form part of a server tasked with generating data updates for a number of distributed devices.

In some implementations, some or all of the functions of the device could be implemented wholly or partly in hardware. The data store and input/output, in particular, are likely to incorporate hardware elements. In some implementations the device may be configured to communicate data with other devices via a wired connection. In these implementations the input/output may be any suitable input port. In some implementations the device may be configured to communicate data with other devices via a wireless connection. In these implementations the input/output may be a wireless transceiver configured to operate in accordance with any suitable communications protocol. A typical wireless transceiver will usually incorporate dedicated hardware for performing functions such as frequency mixing, filtering, demodulation, decoding etc.

The operations of the device shown in Figure 1 will now be described with reference to the process illustrated in Figure 2. The technique commences in step S201 with the controller 105 identifying data that needs to be synchronised with another device. Typically, that data will be stored in the data store 101. The hash calculator then calculates a strong hash of the entirety of the data (step S202). That data may be termed the "image", and it will frequently represent a program, procedure or function in the device's software, or any part thereof. Applying the strong hash function to the entire image generates a strong hash result. The image is divided into sections by the hash calculator (step S203). These sections might overlap. They might be of equal size or of different size. The hash calculator calculates a hash of each section (step S204), and those hashes are weaker than the strong hash calculated in step S202. In most implementations the same hash function will be applied to every section, but they could be different. The outcome is a plurality of weak hash results, which are then communicated to the other device with the strong hash result calculated in step S202 (step S205). The hashes provide a description of the data that has to be synchronised between two devices. The hashes are much smaller than the data itself, so they provide a way of communicating a description of the data between the two devices in a bandwidth efficient way. The different levels of hash enable the other device to identify specific areas of difference between the data stored at the two devices with some accuracy.
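Steps S202 to S205 can be sketched in Python as follows. The sketch is illustrative only: SHA-256 as the strong hash, a two-byte truncated digest as the weak hash, equal-sized non-overlapping sections and the helper names are assumptions, not details given in the specification.

```python
import hashlib

def strong_hash(image: bytes) -> bytes:
    # Step S202. Assumption: SHA-256 is used as the strong hash.
    return hashlib.sha256(image).digest()

def weak_hash(section: bytes, nbytes: int = 2) -> bytes:
    # Step S204. Assumption: a truncated digest serves as the weak hash.
    return hashlib.sha256(section).digest()[:nbytes]

def split_sections(image: bytes, size: int) -> list:
    # Step S203. Equal-sized, non-overlapping sections; the last may be shorter.
    return [image[i:i + size] for i in range(0, len(image), size)]

def build_digest(image: bytes, section_size: int = 64, weak_bytes: int = 2):
    # One strong hash for the whole image plus one weak hash per section,
    # ready to be communicated to the other device (step S205).
    sections = split_sections(image, section_size)
    return strong_hash(image), [weak_hash(s, weak_bytes) for s in sections]

image = bytes(range(256)) * 4  # 1024 bytes of sample data
strong, weak = build_digest(image)
payload_size = len(strong) + sum(len(w) for w in weak)
```

For this 1024-byte sample image the description that is communicated consists of one 32-byte strong hash and sixteen 2-byte weak hashes, 64 bytes in total.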

The hash calculator may be configured to use any suitable function to calculate the hashes. In general, a hash function maps data of arbitrary size to data of fixed size. The strength of a hash function is determined by the size of the fixed-sized data that it maps to. The larger the fixed-size data (e.g. the more bits it contains), the greater the number of different hashes that are available for a given data set. This means that fewer members of the data set map to the same hash, thus strengthening the hash. The converse is also true. The smaller the fixed-size data, the more members of a given data set will map to any given hash. The hash is consequently weaker because there is a greater likelihood of false positives. False positives occur when two hashes are identical, and thus assumed to correspond to the same member of the underlying data set, whereas in fact they were generated from different members of that data set. Thus where hash functions are referred to as being relatively "strong" or "weak" herein that relates to the size of the fixed size data that the respective hash function maps to. A relatively strong hash function maps to larger fixed size data than a relatively weak hash function, and vice versa.
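The relationship between hash length and false positives can be made concrete with a short calculation. The Python sketch below assumes an idealised hash whose results are uniformly distributed over all values of the given bit length; that model and the function names are assumptions added for illustration and do not appear in the specification.

```python
def false_positive_probability(hash_bits: int) -> float:
    # Chance that two different sections map to the same hash result,
    # assuming results are uniformly distributed over 2**hash_bits values.
    return 2.0 ** -hash_bits

def prob_any_missed(hash_bits: int, differing_sections: int) -> float:
    # Chance that at least one differing section goes unnoticed because
    # its hash happens to match, assuming independent sections.
    p = false_positive_probability(hash_bits)
    return 1.0 - (1.0 - p) ** differing_sections

p8 = false_positive_probability(8)    # 1 in 256
p16 = false_positive_probability(16)  # 1 in 65536
missed = prob_any_missed(8, 4)
```

Under this model each additional bit halves the false positive probability, which is the sense in which a longer hash result is "stronger".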

The weak hash functions will typically generate hash results that incorporate some uncertainty because the set of available hash results is smaller than the number of possible data permutations in the image. Thus image sections that are actually different may appear to be the same based on their weak hashes. The strong hash function preferably generates hash results that are very unlikely to generate false positives. Thus, although the weak hashes are imperfect, the strong hash ensures correctness with a minuscule chance of collision.

Only two layers of hashes are computed in the example of Figure 2 but the device could calculate more. This would involve the device repeatedly dividing the image into sections, with the sections getting smaller each time. The device calculates hashes for all of the sections, and suitably those hashes get weaker each time that the image is divided. The result is a hierarchy of hashes in which progressively smaller sections of the data are represented by progressively weaker hashes. This helps to limit the amount of data that is transferred between the two devices: the more hashes that are computed for a given image (due to the image having been divided into sections) the fewer bits that each of those individual hashes contains.

An example of the process that a device receiving the communication containing the strong and weak hashes may perform is shown in Figure 3. A device for performing this process may be the same as that shown in Figure 1. The process starts in step S301 with the device receiving a strong hash and weak hashes from another device via its input/output 102. The device applies the strong hash function to the entire image that it has in its own data store 101. The comparison unit 104 compares the strong hash result with the strong hash received from the other device. If the two are the same, the controller 105 determines that the data is synchronised (step S304). If the two are different, the device mimics the behaviour of the other device by having the hash calculator 103 divide the local image into sections (step S305) and apply a weak hash function to each section (step S306). Any areas in which there is a discrepancy between the local image and that held by the other device are then identified by comparing the weak hashes for each respective image section (step S307). If the weak hash generated locally for a given image section is different from that received from the other device, this indicates that there is a discrepancy between that part of the local image and the image at the other device.
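The receiving side of Figure 3 might be sketched in Python as follows. As before, this is a hedged illustration rather than the specification's implementation: SHA-256 and a four-byte truncated digest are assumed for the strong and weak hashes, both devices are assumed to use the same fixed section size, and find_differing_sections is a hypothetical name.

```python
import hashlib

def strong_hash(image: bytes) -> bytes:
    # Assumption: SHA-256 is used as the strong hash on both devices.
    return hashlib.sha256(image).digest()

def weak_hashes(image: bytes, section_size: int, nbytes: int) -> list:
    # Steps S305 and S306: divide the local image and hash each section.
    return [hashlib.sha256(image[i:i + section_size]).digest()[:nbytes]
            for i in range(0, len(image), section_size)]

def find_differing_sections(local_image, remote_strong, remote_weak,
                            section_size=8, weak_bytes=4):
    # Matching strong hashes mean the data is synchronised (step S304).
    if strong_hash(local_image) == remote_strong:
        return []
    local_weak = weak_hashes(local_image, section_size, weak_bytes)
    # Step S307: a section is flagged when its weak hashes disagree.
    # Assumes both images divide into the same number of sections.
    return [i for i, (mine, theirs) in enumerate(zip(local_weak, remote_weak))
            if mine != theirs]

local = b"AAAAAAAABBBBBBBBCCCCCCCC"
remote = b"AAAAAAAAxxxxxxxxCCCCCCCC"
flagged = find_differing_sections(local, strong_hash(remote),
                                  weak_hashes(remote, 8, 4))
unchanged = find_differing_sections(local, strong_hash(local),
                                    weak_hashes(local, 8, 4))
```

An empty list returned while the strong hashes differ would indicate that the weak hashes collided, which is the case in which longer weak hashes are needed.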

An example of a message flow between two devices in a specific scenario in which one device is a server and the other device is a client, such as a user equipment (UE), is shown in Figure 4. The key steps in the process are as follows:

1. Weak Hashes (WH in Figure 4) are calculated over several key/value pairs. A Strong Hash (SH in Figure 4) is calculated over the entirety of the same source data. This is done on both the client side and the server side.

2. The hashes are sent from the client to the server.

3. The server checks for matches and differences in the hashes. If the strong hash doesn't match, then the server checks for differences in the weak hashes. It is likely (with a probability based on the weak hash length) that the source of any discrepancy in the strong hash will be reflected in a corresponding mismatch in the weak hashes.

4. The key/value pairs that are likely to have caused the difference in the weak hashes are sent from the server to the client, together with a Strong Hash for the total result.

5. The client applies the changes by replacing its local copy of the key/value pairs with the versions received from the server. The client then recomputes the hashes. If the Strong Hash for the new local source data does not match the Strong Hash received from the server, synchronisation has not been achieved. In this situation the client computes longer Weak Hashes of the key/value pairs.

6. The Strong Hash and the Weak Hashes computed from the new client source data are sent to the server.

If necessary, the process detailed in 1 to 6 above is repeated until the data is synchronised, which will be identified by matching strong hashes from the server side and the client side.

If the strong hash still does not match even after the weak hashes all match, then the weak hashes were not long enough. The matching weak hashes in this situation are due to mismatching key/value pairs mapping onto the same weak hash. This is identified in step 5 above. The solution is to calculate weak hashes for the new local source data that are longer than the weak hashes that were originally calculated for the key/value pairs. The data sections that the weak hash function is applied to are suitably of the same length as in the original iteration of this technique, but the weak hash functions themselves may be relatively stronger, so that fewer data sets map to each weak hash result.
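Steps 1 to 6, together with the fallback to longer weak hashes, can be simulated with the following Python sketch. It rests on several assumptions that are not in the specification: the source data is a dictionary of string key/value pairs, the client holds no keys that the server lacks, SHA-256 and truncated SHA-256 serve as the Strong Hash and Weak Hashes, the weak hash length doubles after each failed round, and the function names and the round limit are hypothetical.

```python
import hashlib

def strong_hash(store: dict) -> bytes:
    # Strong Hash (SH) over the entirety of the source data, in key order.
    blob = b"".join(k.encode() + b"=" + v.encode() + b";"
                    for k, v in sorted(store.items()))
    return hashlib.sha256(blob).digest()

def weak_hashes(store: dict, nbytes: int) -> dict:
    # One Weak Hash (WH) per key/value pair, truncated to nbytes.
    return {k: hashlib.sha256((k + "=" + v).encode()).digest()[:nbytes]
            for k, v in store.items()}

def server_respond(server: dict, client_sh: bytes, client_wh: dict, nbytes: int):
    # Steps 3 and 4: compare hashes, return suspect pairs plus the server SH.
    server_sh = strong_hash(server)
    if server_sh == client_sh:
        return {}, server_sh
    server_wh = weak_hashes(server, nbytes)
    changed = {k: v for k, v in server.items()
               if client_wh.get(k) != server_wh[k]}
    return changed, server_sh

def synchronise(client: dict, server: dict, nbytes: int = 1, max_rounds: int = 8) -> int:
    for round_number in range(1, max_rounds + 1):
        # Steps 1 and 2: the client computes and sends its hashes.
        changed, server_sh = server_respond(
            server, strong_hash(client), weak_hashes(client, nbytes), nbytes)
        client.update(changed)  # step 5: apply the received key/value pairs
        if strong_hash(client) == server_sh:
            return round_number
        nbytes *= 2  # weak hashes were too short: use longer ones next round
    raise RuntimeError("data not synchronised within the round limit")

server = {"volume": "7", "locale": "en_GB", "timeout": "300"}
client = {"volume": "3", "locale": "en_GB", "timeout": "60"}
rounds = synchronise(client, server)
```

The loop ends only when the strong hashes agree, so a collision among the short weak hashes costs an extra round rather than leaving the client out of date.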

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

Claims

1. A device configured to:

apply a relatively strong hash function to data that is to be synchronised with data stored at an originating device to generate a strong hash result;

divide the data that is to be synchronised into a plurality of sections and apply a relatively weak hash function to each of those sections to generate respective weak hash results; and

communicate the weak hash results and the strong hash result to the originating device.

2. The device as claimed in claim 1, configured to:

divide the data that is to be synchronised into a plurality of sections multiple times, the sections into which the data is divided becoming smaller each time; and

apply a hash function to every section each time that the data is divided, the hash function becoming progressively weaker each time, to create respective weak hash results that form a hierarchy of hashes in which progressively smaller sections of the data are represented by progressively weaker hash results.

3. A device as claimed in claim 1 or 2, configured to:

receive a replacement data section and an originating strong hash result from the originating device;

alter the data that it stores to incorporate the replacement data section;

apply the relatively strong hash function to the altered data to generate an altered strong hash result; and

compare the altered strong hash result with the originating strong hash result.

4. A device as claimed in claim 3, configured to, if the altered strong hash result and the originating strong hash result are different from each other:

divide the altered data into sections;

apply a relatively weak hash function to each of those sections to generate respective second weak hash results that are longer than the weak hash results; and

communicate the second weak hash results to the originating device with the altered strong hash result.

5. A device configured to:

receive a strong hash result and a plurality of weak hash results from a target device, said strong hash result and weak hash results representing data stored at the target device that is to be synchronised with data stored by the device; and

identify, in dependence on the weak hash results and the strong hash result communicated by the target device, to what extent the data stored by the target device differs from the data stored by the device.

6. A device as claimed in claim 5, configured to:

apply a relatively strong hash function to the data stored at the device to generate an originating strong hash result;

divide the data stored at the device into a plurality of sections and apply a relatively weak hash function to each of those sections to generate respective third weak hash results; and

compare the third weak hash results and the originating strong hash result with the weak hash results and strong hash result received from the target device.

7. A device as claimed in claim 6, configured to divide the data stored at the originating device into a plurality of sections that are equivalent to the plurality of sections into which the data stored at the target device was divided.

8. A device as claimed in claim 6 or 7, configured to determine, if the strong hash result is different from the originating strong hash result, that the data stored at the target device is different from the data stored at the device.

9. A device as claimed in any of claims 6 to 8, configured to determine where the data stored by the target device differs from that stored by the device by identifying any section of the data stored by the target device that has a weak hash result different from the third weak hash result of the corresponding section of data stored by the device.

10. A device as claimed in claim 9, configured to communicate the corresponding section of data to an identified section of data to the target device.

11. A device as claimed in claim 10, configured to communicate the section of data to the target device with the originating strong hash result.

12. A method comprising:

identifying data, stored at a target device, that is to be synchronised with data stored at an originating device;

applying a relatively strong hash function to the data stored at the target device to generate a strong hash result;

dividing the data stored at the target device into a plurality of sections and applying a relatively weak hash function to each of those sections to generate respective weak hash results; and

communicating the weak hash results and the strong hash result to the originating device.

13. A non-transitory machine readable storage medium having stored thereon processor executable instructions implementing a method comprising:

applying a relatively strong hash function to data, which is stored in one device and is to be synchronised with data stored at another device, to generate a strong hash result;

dividing the data that is to be synchronised into a plurality of sections and applying a relatively weak hash function to each of those sections to generate respective weak hash results; and

communicating the weak hash results and the strong hash result to the other device.