US7984018B2 - Efficient point-to-multipoint data reconciliation - Google Patents
- Publication number: US7984018B2 (application US11/109,011)
- Authority: US (United States)
- Prior art keywords: hashes, hash, erasure, receiver, dataset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/12—Applying verification of the received information
- H04L63/123—Applying verification of the received information received data contents, e.g. message integrity
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/631—Multimode Transmission, e.g. transmitting basic layers and enhancement layers of the content over different transmission paths or transmitting with different error corrections, different keys or with different transmission protocols
Definitions
- This description relates generally to enabling efficient data reconciliation and more specifically to enabling efficient reconciliation of an outdated or modified version or copy of a master file or dataset.
- a dataset can be any arbitrary type of data, such as a file, a file system directory, a set of one or more web pages, a BLOB, a data structure, etc.
- a receiver with a dataset that needs to be updated may send feedback to a sender indicating how the receiver's dataset differs from the sender's master dataset. The sender will use that feedback to provide the receiver with individually tailored update information, which the receiver can use to update its version of the dataset to match the sender's master version.
- a receiver may provide a sender with clues or feedback about the particular data that the receiver needs to update its copy of the dataset.
- the sender may not be able to handle the overhead needed to form individual bi-directional connections with a large number of clients (receivers); one-way broadcasting may be the only means of propagating update information to clients. If a one-way communication medium is being used, for example broadcast radio, then feedback will not be possible. Whether feedback is possible or not, and regardless of the application, there is a general need to minimize the amount of information that a receiver or client needs to receive in order to be able to compare or update its version of a corresponding file, dataset, table, data store, etc. There is also a need to minimize the bandwidth used to update multiple receivers. Minimizing the amount of delta or update information can conserve network bandwidth, reduce the active listening time of a wireless device, conserve battery energy, and reduce the time that it takes to bring a receiver's version up to date.
- One source may enable updating at multiple receivers by sending each receiver the same update information. Any receiver can use the same update information to increase its knowledge about how its target dataset differs from the master dataset, even though its target dataset may uniquely differ from the master dataset.
- the master dataset may be divided into divisions and subdivisions, which may be hashed to form a hash hierarchy.
- the update information sent by the sender may include a top level of hashes of the hash hierarchy as well as encodings of the lower levels of the hash hierarchy and encodings of blocks of the content of the master dataset.
- the encodings may be erasure hashes, for example.
- An erasure hash may be computed, for example, as a random linear combination of the hashes of a given level of the hierarchy.
- Any receiver is highly likely to be able to use any hash encoding to improve its understanding about how its target dataset differs from the master dataset. More specifically, parts of the master that a receiver knows it already has may be hashed and those hashes may be used, based on a received encoding or erasure hash (and possibly based also on information about how the encoding was encoded) to reproduce a needed hash. A receiver can use received, computed, and/or reproduced hashes to determine which parts of the master it might need. A receiver may use encodings of the blocks of the master to obtain blocks of the master, which may be applied to the receiver's target to construct a local copy of the master.
- FIG. 1 shows a generic distribution arrangement.
- FIG. 2 shows an example of a wireless file distribution arrangement.
- FIG. 3 shows a timeline of receivers.
- FIG. 4 shows a hierarchical hash scheme.
- FIG. 5 shows an efficient hash hierarchy encoding scheme.
- FIG. 6 shows a process of a sender providing update information to a receiver and a receiver performing an update.
- FIG. 7 shows an erasure hash encoding scheme.
- FIG. 8 summarizes a hashing, encoding, and transmission process for a sender.
- FIG. 9 shows an overview of how a receiver can find unmatched blocks in its target dataset.
- FIG. 10 shows a reconstruction process performed by a receiver when the sender uses a decomposable erasure scheme.
- FIG. 11 shows a process a receiver may use to determine when to stop downloading erasure hashes.
- FIG. 12 shows a graph
- FIG. 13 shows three performance graphs.
- FIG. 14 shows a table of empirical results for different download methods.
- FIG. 1 shows a generic distribution arrangement.
- a sender 100 has a master dataset 102 .
- a communication medium 103 allows at least one-way communication from the sender 100 to receivers 104 , 106 , 108 , and 110 , each with its own target dataset, 112 , 114 , 116 , and 118 , respectively.
- the communication medium 103 could be a data network, a mobile wireless network, a radio broadcast, a system bus in a computer, or even physically distributed storage mediums such as diskettes.
- the target datasets 112 , 114 , 116 , and 118 are various outdated versions of the master dataset 102 .
- the target datasets 112 , 114 , 116 , and 118 may differ from the master dataset 102 by varying degrees.
- Target datasets 112 , 114 , 116 , and 118 are to be updated or synchronized to match the master dataset 102 .
- a target dataset may be empty or may not have any parts of the master dataset 102.
- FIG. 2 shows an example of a wireless file distribution arrangement.
- various wireless mobile devices such as cell phone 130 , PDA 132 , and laptop 134 receive radio signals from broadcast system 136 .
- Mobile devices, 130 , 132 , and 134 may each have their own outdated versions of master dataset 102 .
- the radio signals convey information about the master dataset 102 that receivers can use to bring their respective versions of the dataset up to date.
- wireless systems can particularly benefit from an efficient data synchronization scheme that minimizes listening time, minimizes latency, and minimizes battery consumption by minimizing CPU usage.
- If each receiver had the same version of a dataset and listened to the sender at the same time, updating would not be a difficult problem. The sender would send the particular differences between the master dataset 102 and the receivers' common version, and the receivers would all receive the same difference information at the same time and apply it to reconstruct the master dataset 102.
- receivers may have different versions of the master dataset 102 , and some receivers may listen to the sender at different times.
- FIG. 3 shows a timeline 150 of receivers.
- Four different receivers C 1 , C 2 , C 3 , and C 4 are active at different times.
- At time 1 only C 4 is active.
- At time 2 C 1 and C 2 are active.
- no receivers are active or receiving information from the sender.
- the receivers may have different versions of the master dataset 102 .
- the sender may take a one-size-fits-all approach. That is, the sender may send or broadcast one set of differential information that each receiver can apply to its version to identify differences or reconstruct a local copy of the master dataset 102 .
- One approach for providing differential information is to use a hierarchical hash scheme.
- FIG. 4 shows a hierarchical hash scheme.
- the master dataset 102 is implicitly or explicitly divided into coarse top level blocks 170 ($b_1, \ldots, b_j$).
- a hash function 171 is applied to the top level blocks 170 , creating top level block hashes 172 .
- the top level blocks 170 are subdivided into two or more smaller blocks 174 ($b_{2.1}^{1}, \ldots, b_{2.2}^{j}$).
- the hash function 171 is applied to the smaller blocks 174 to create second level hashes 176 , and so on.
- hash hierarchy 178 has three levels but two or more may be used.
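- A rough sketch of building a hash hierarchy like 178 appears below. It hashes fixed-size top level blocks and then halves the block size at each lower level. The sketch assumes two children per block and a power-of-two top-level block size that divides the data length. It also uses SHA-1 truncated to 7 bytes as a stand-in for hash function 171; the choice of SHA-1 is an assumption.

```python
import hashlib
from typing import List


def block_hash(block: bytes) -> bytes:
    """Stand-in for hash function 171: SHA-1 truncated to 7 bytes (assumed)."""
    return hashlib.sha1(block).digest()[:7]


def build_hierarchy(data: bytes, top_block: int, levels: int) -> List[List[bytes]]:
    """hierarchy[0] holds the top level block hashes; each later level halves
    the block size, so block k at one level splits into blocks 2k and 2k+1."""
    hierarchy = []
    size = top_block
    for _ in range(levels):
        blocks = [data[i:i + size] for i in range(0, len(data), size)]
        hierarchy.append([block_hash(b) for b in blocks])
        size //= 2  # next level: half-size sub-blocks
    return hierarchy


master = bytes(range(64))
for depth, level in enumerate(build_hierarchy(master, top_block=16, levels=3), start=1):
    print(f"level {depth}: {len(level)} hashes")
```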
- Any receiver possessing a complete hash hierarchy 178 can determine how its target dataset differs from the sender's master dataset 102.
- the sender 180 may broadcast or send each level of the hash hierarchy 178 on a different communication channel (not shown).
- the hashes of each level are repeatedly sent as though on a data carousel.
- the receiver 182 receives the transmitted hashes and also data blocks of the master dataset (not shown), and performs a reconstruction process 184 using its target dataset 183 (an outdated version of master dataset 102).
- receiver 182 initially reads or receives 186 the top level block hashes 172 .
- the receiver 182 compares 188 the top level block hashes 172 against the target dataset 183 ($F_{old}$). More specifically, the receiver 182 moves a sliding window (the size of a top level block 170) across the target dataset 183, applying the hash function 171 to windowed blocks in the target dataset 183.
- the matching process can also be done through hierarchical fingerprints as described elsewhere and as discussed further below.
- matched blocks from target dataset 183 may be accumulated into a temporary file.
- the receiver 182 in effect identifies top level blocks 170 in the master dataset 102 that are not found in the target dataset 183. Although the receiver 182 does not yet know the contents of these missing blocks, it does know that they are not in its target dataset 183. In other words, the receiver 182 can identify any top level hashes 172 of the master dataset 102 that are not matched by any part of the target dataset 183.
- If the receiver 182 compares 188 the top level hashes 172 and determines that top level block $b_1$ is not in the target dataset 183, then the receiver will need second level hashes $h(b_{2.1}^{1})$ and $h(b_{2.2}^{1})$ to identify the portion of block $b_1$ (e.g. $b_{2.1}^{1}$ or $b_{2.2}^{1}$) that is not in the target dataset 183. However, if h(b 2 .
- the receiver 182 will have to wait 190 for all of the other level two hashes 176 to transmit on the data carousel before actually receiving 192 the level two hash that it needs. This unproductive waiting time increases the time that it takes for the receiver 182 to synchronize its target dataset 183 . If the receiver 182 is a wireless device, then power may be consumed receiving unneeded hashes. On average, a receiver will wait for half the number of hashes of a hierarchy level before receiving a needed hash, and the cost will increase with the number of levels in the hash hierarchy.
- FIG. 5 shows an efficient hash hierarchy encoding scheme.
- sender 200 generates a hash hierarchy 202, preferably using a special hash function 204.
- the dashed lines in hash hierarchy 202 indicate a mathematical relationship between the hashes in neighboring levels. This relationship results from the choice of hash function 204 and will be explained further below.
- Sender 200 carousels the top level block hashes 206 . However, rather than carousel or transmit the lower level hashes 210 themselves, the sender 200 first encodes 208 the lower level block hashes 210 and transmits encodings 211 of the lower level hashes 210 .
- the sender 200 also encodes 208 data blocks 212 (b 3 .
- the encodings 213 of the data blocks 212 are also transmitted.
- the encoding 208 is discussed below with reference to FIG. 7 .
- Receiver 214 receives transmissions from the sender 200 and performs a synchronization or reconstruction process 216 .
- the receiver 214 receives 218 the top level block hashes 206. Similar to step 188 in FIG. 4, the receiver 214 compares 220 the top level hashes 206 against the target dataset 183 ($F_{old}$) to determine which second level hashes will be needed to identify unmatched parts of unmatched level one blocks 170. For example, the receiver 214 may determine that block $b_1$ is not found in the target dataset 183. It then needs level two hashes $h(b_{2.1}^{1})$ and $h(b_{2.2}^{1})$ to determine whether target dataset 183 has either block $b_{2.1}^{1}$ or block $b_{2.2}^{1}$.
- the receiver 214 first calculates 221 determinable hashes $h(b_{2.1}^{2}), \ldots, h(b_{2.2}^{j})$. It does this by applying the known hash function 204 to copies of blocks $b_{2.1}^{2}, \ldots, b_{2.2}^{j}$ from its target dataset 183, which it knows from step 220 match the master dataset 102.
- the receiver then receives 222 the next (or any) encodings 211 of the level 2 hashes and, using the calculated 221 hashes at the same level, decodes 224 them to produce the needed block hashes, in this example $h(b_{2.1}^{1})$ and $h(b_{2.2}^{1})$.
- steps 221 and 222 can occur in any order.
- the receiver 214 is easily able to determine which data blocks (e.g. $b_{3.2}^{1}$ and $b_{3.4}^{1}$) need to be applied to the target dataset 183 to reproduce the master dataset 102.
- the receiver 214 does not need to wait for a particular data block to arrive. Instead, the receiver 214 receives 226 any encodings 213 , preferably the next transmitted encodings 213 , and uses the received 226 encodings 213 to reproduce the needed data blocks.
- the receiver 214 is likely to be able to use any encodings 211/213 to help reconstruct the needed block hashes (e.g. $h(b_{2.1}^{1})$ and $h(b_{2.2}^{1})$) or data blocks 212.
- each encoding 211 received by receiver 214 is likely to contribute to the reconstruction of a needed block hash and the subsequent identification of portions of the master dataset 102 that are missing from the target dataset 183 .
- Each encoding 213 of the data blocks 212 is likely to contribute to the reconstruction of a needed data block.
- the decoding process may not be able to commence until all needed encodings are received, although progressive decoding is sometimes possible.
- FIG. 6 shows a process of a sender providing update information to a receiver and a receiver performing an update.
- the sender generates 230 a hash hierarchy as discussed above. Namely, the sender generates 230 block hashes of divisions and subdivisions of the master dataset 102.
- the sender generates 232 encodings of the hash hierarchy.
- the sender then: transmits 234 the top level block hashes of the hash hierarchy (the hashes of the largest divisions of the master dataset 102 ); transmits 234 encodings of the lower level hashes; and transmits 234 encodings of blocks of the actual content of the master dataset 102 .
- Level one hashes can also be encoded; however, most of the time this will not provide a significant benefit.
- the receiver iteratively downloads transmissions from the sender and reconstructs, level by level with increasing fineness, hashes of the hash hierarchy that it determines it needs.
- the receiver obtains the top level of the hash hierarchy by receiving 236 the top level hashes, which it uses as a current hash search set.
- the receiver uses 238 the current hash search set to search the target dataset 183 ($F_{old}$) for hashes in the current search set that do not have a matching block in target dataset 183.
- the receiver then goes 240 down to the next level of the hash hierarchy.
- the receiver receives 244 the encodings of the new level of the hash hierarchy.
- the receiver uses 246 the encodings and the sub-hashes for the matched blocks at the current level to reconstruct block hashes at the new/current level that will be used as the hash search set, again performing a search 238 .
- Sub-hashes for matched blocks can be easily calculated by the receiver since the receiver is also aware of the hashing algorithm used by the server and the receiver has the same content of the matched blocks in its target dataset.
- the searching 238, receiving 244, and reconstruction 246 are repeated at successively finer levels until there is a determination that the overall searching process is finished 242.
- This determination can be as simple as reaching a predetermined or lowest level of the hash hierarchy, or it could be a dynamic determination based, for example, on whether new searches 238 are improving the receiver's knowledge of the master dataset 102 . If the receiver realizes that it did not match anything at the first level, then it will not need to download the second level. If the receiver realizes that it has many matches on the first level, but on the second level and third levels it keeps matching the same things, then it realizes it cannot get any more information than it got on the third level, and it may stop.
- the receiver can measure the benefit at a given level and stop if there is no benefit.
- This adaptive search approach allows the receiver to search with fine granularity when there are only small differences between the master dataset 102 and the target dataset 183 .
- the receiver may search with coarse granularity when there are large differences between the datasets 102 , 183 .
- the receiver can dynamically adjust how much of the hash hierarchy it will need to download.
- arbitrary erasure encodings of the data blocks may be downloaded 248 in a quantity proportional to the number of unmatched hashes at the lowest level of the hash hierarchy.
- the encodings of the data blocks may be decoded and used to reconstruct 250 a copy 252 of the master dataset 102.
- any commonly transmitted encoding is highly likely to allow any receiver to determine, at that level, which parts of the master dataset are missing from its own particular target dataset.
- Receiver 1 and Receiver 2 both need to learn two numbers, “1” and “2”, and are listening to a server that has to provide these two numbers. If Receiver 1 already knows “1” and Receiver 2 already knows “2”, the server can tell Receiver 1: “2” and tell Receiver 2: “1”.
- Receiver 1 and Receiver 2 will then know both numbers. However, this will have cost the server two numbers/operations (inform Receiver 1, inform Receiver 2). Instead, the server can send both receivers “3” with information instructing them to “subtract the number that you have from the number that I'm sending you” (i.e. the server says, “3 and subtract”). Then Receiver 1 will subtract 1 from 3 to obtain 2, its missing number. Similarly, Receiver 2 will subtract 2 from 3 to obtain its missing number, 1. With the same transmitted information, each receiver can generate the number it needs.
- the server saved time and bandwidth by providing a number that is really a combination of the numbers that the receivers already have and by providing an operation that can reconstruct the number a receiver is missing. As discussed below, block hashes can be encoded with a similar concept.
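- The “3 and subtract” example can be written out directly in Python. The function names are illustrative only. The second half repeats the idea with XOR, which is how addition and subtraction work in the $GF(2^k)$ fields used for erasure hashes.

```python
def broadcast(numbers):
    # The server sends one combined value instead of one value per receiver.
    return sum(numbers)


def recover(combined, known):
    # Each receiver "subtracts" the number it already has.
    return combined - known


def broadcast_xor(numbers):
    # Same idea with XOR, which is both addition and subtraction in GF(2^k).
    out = 0
    for n in numbers:
        out ^= n
    return out


combined = broadcast([1, 2])
print(recover(combined, 1), recover(combined, 2))  # 2 1
x = broadcast_xor([1, 2])
print(x ^ 1, x ^ 2)  # 2 1
```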
- FIG. 7 shows an erasure hash encoding scheme.
- a sender can encode block hashes into erasure hashes (encodings of block hashes) by taking the block hashes at a given level and combining them.
- a random linear combination of block hashes is an efficient way to combine block hashes.
- an erasure hash 280 is produced by multiplying a vector 282 of preferably random coefficients by a matrix 284 of block hashes.
- Each erasure hash 280 is produced by its own corresponding vector of random coefficients. Operations happen in a finite field, e.g. the Galois field $GF(2^{16})$.
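- Below is a minimal sketch of producing one erasure hash as a random linear combination over $GF(2^{16})$. It assumes each block hash is split into 16-bit symbols. It also assumes the field is built from the polynomial $x^{16} + x^{12} + x^3 + x + 1$ (0x1100B), a common choice that the patent does not name. Multiplication is carry-less shift-and-add with reduction, and field addition is XOR.

```python
import random
from typing import List

POLY = 0x1100B  # x^16 + x^12 + x^3 + x + 1 (assumed field polynomial)


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^16)."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # field addition is XOR
        b >>= 1
        a <<= 1
        if a & 0x10000:        # reduce once the degree reaches 16
            a ^= POLY
    return result


def erasure_hash(coeffs: List[int], hashes: List[List[int]]) -> List[int]:
    """sum_i coeffs[i] * hashes[i], computed symbol by symbol."""
    out = [0] * len(hashes[0])
    for c, h in zip(coeffs, hashes):
        for s, sym in enumerate(h):
            out[s] ^= gf_mul(c, sym)
    return out


# Four block hashes at one level, each as four 16-bit symbols (8 bytes).
rng = random.Random(7)
level_hashes = [[rng.randrange(1 << 16) for _ in range(4)] for _ in range(4)]
coeffs = [rng.randrange(1, 1 << 16) for _ in level_hashes]
print(erasure_hash(coeffs, level_hashes))
```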
- the receiver can use known linear algebra techniques to solve for a missing hash.
- the receiver can hash blocks in its target dataset that are known to match the master dataset to locally obtain some or most of the hash vectors in the matrix 284 .
- Unknown hashes can then be solved using the known vectors in the matrix 284 , using the known coefficients 282 , and using one or more erasure hashes 280 .
- the number of erasure hashes needed by the receiver will be proportional to the number of unknown hashes (the blocks it has not matched), not to the total number of blocks at that level.
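- The following sketch makes the decoding step concrete. A receiver solves for its unknown hashes by Gauss-Jordan elimination over $GF(2^{16})$. It assumes the same 16-bit symbol layout and field polynomial (0x1100B) as above, and that each erasure hash arrives with its full coefficient vector. `known` holds the hashes the receiver computed from its own matched blocks. The function returns None until enough linearly independent erasure hashes have arrived, so the number needed tracks the number of unknowns. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

POLY = 0x1100B  # assumed GF(2^16) polynomial


def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10000:
            a ^= POLY
    return r


def gf_inv(a: int) -> int:
    # a^(2^16 - 2) is a^-1 because the multiplicative group has order 2^16 - 1.
    r, e = 1, (1 << 16) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def axpy(f: int, x: List[int], y: List[int]) -> List[int]:
    # y + f*x in GF(2^16); "+" and "-" are both XOR.
    return [yi ^ gf_mul(f, xi) for xi, yi in zip(x, y)]


def encode(coeffs: List[int], hashes: List[List[int]]) -> List[int]:
    out = [0] * len(hashes[0])
    for c, h in zip(coeffs, hashes):
        out = axpy(c, h, out)
    return out


Erasure = Tuple[List[int], List[int]]  # (coefficient vector, erasure hash)


def solve_unknown_hashes(n: int, erasures: List[Erasure],
                         known: Dict[int, List[int]]) -> Optional[Dict[int, List[int]]]:
    unknown = [i for i in range(n) if i not in known]
    rows = []
    for coeffs, e in erasures:
        rhs = list(e)
        for i, h in known.items():
            rhs = axpy(coeffs[i], h, rhs)        # move known terms to the right side
        rows.append(([coeffs[u] for u in unknown], rhs))
    for col in range(len(unknown)):
        pivot = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if pivot is None:
            return None                          # need more erasure hashes
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][0][col])
        rows[col] = ([gf_mul(inv, c) for c in rows[col][0]],
                     [gf_mul(inv, s) for s in rows[col][1]])
        for r in range(len(rows)):
            f = rows[r][0][col]
            if r != col and f:                   # clear this column from other rows
                rows[r] = (axpy(f, rows[col][0], rows[r][0]),
                           axpy(f, rows[col][1], rows[r][1]))
    return {u: rows[k][1] for k, u in enumerate(unknown)}


hashes = [[10, 20], [30, 40], [50, 60], [70, 80]]
erasures = [(c, encode(c, hashes)) for c in ([1, 1, 1, 1], [1, 2, 3, 4])]
print(solve_unknown_hashes(4, erasures, {0: hashes[0], 2: hashes[2]}))
```

- With two of the four hashes known locally, two erasure hashes suffice. With only one erasure hash, the solver reports that it needs more.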
- FIG. 8 summarizes a hashing, encoding, and transmission process for a sender.
- the sender can generate or otherwise obtain block hashes 300 for increasingly smaller blocks of subdivisions of the master dataset.
- the sender transmits 302 the level-1 block hashes, preferably on their own communication channel.
- For each level of block hashes 300 below level-1 the sender computes 304 an erasure hash as a random linear combination of the block hashes 300 at that level.
- the erasure hash is transmitted 306 .
- the computing 304 and transmitting 306 is repeated to produce a stream of substantially unique erasure hashes for each level.
- the sender also encodes 308 the data blocks 312 into an erasure block and transmits 310 the erasure block.
- the encoding 308 and transmitting 310 is repeated to provide a steady stream of erasure blocks of the content of the master dataset 102 .
- the stream of erasure blocks has its own communication channel.
- Each stream of erasure hashes preferably has its own communication channel, which allows different receivers to pull an erasure hash from any level at any time without having to wait.
- Parallel transmission 302 , 306 , 310 of top level block hashes, erasure hashes, and erasure blocks is preferred but not necessary.
- the sender may also transmit with each erasure hash the vector of coefficients that were used to produce the erasure hash.
- An efficient alternative is to have the sender transmit a seed for a random number generator.
- Each receiver can use the seed and generator to reproduce the same sequence of coefficients used by the sender to linearly combine hashes to produce erasure hashes.
- a large book of predetermined coefficients could be stored in advance at the sender and each receiver. Any coefficient sharing mechanism may be used.
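- Here is a sketch of the shared-seed alternative. Sender and receivers derive each erasure's coefficient vector from a broadcast seed plus the erasure's sequence number, so only the seed has to travel, not the vector. Deriving a separate generator per erasure index is an assumption made here so that a receiver can join mid-stream. Python's `random.Random` is used purely for illustration; in practice both sides would need the identical generator.

```python
import random
from typing import List


def coefficients(seed: int, index: int, n: int) -> List[int]:
    """Nonzero GF(2^16) coefficients for erasure number `index` at one level."""
    # random.Random hashes a str seed deterministically, so every party
    # seeding with the same "seed:index" string gets the same sequence.
    rng = random.Random(f"{seed}:{index}")
    return [rng.randrange(1, 1 << 16) for _ in range(n)]


sender_view = coefficients(seed=2005, index=17, n=8)
receiver_view = coefficients(seed=2005, index=17, n=8)
print(sender_view == receiver_view)  # True
```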
- the block hash is a weak rolling block hash, as used with the well known rsync algorithm.
- With a rolling block hash, if a block is hashed (producing a first hash) and the block is then extended by a small amount, the first hash can be used to cheaply hash the extended block.
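- The weak checksum used by rsync illustrates both cheap updates. Appending one byte to a block and sliding a fixed-size window forward by one byte each update the hash in constant time. The sketch follows the published rsync weak checksum: two 16-bit sums `a` and `b`, which rsync packs as `a + 2^16 * b`. The function names are illustrative.

```python
from typing import Tuple

M = 1 << 16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """rsync-style weak checksum: a = sum of bytes, b = position-weighted sum."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def extend(a: int, b: int, new_byte: int) -> Tuple[int, int]:
    """Checksum after appending one byte: every old weight grows by one."""
    a = (a + new_byte) % M
    b = (b + a) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, window: int) -> Tuple[int, int]:
    """Slide a window of length `window` forward by one byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - window * out_byte + a) % M
    return a, b


def packed(a: int, b: int) -> int:
    return a + (b << 16)


data = b"efficient point-to-multipoint"
a, b = roll(*weak_checksum(data[0:8]), data[0], data[8], 8)
print(packed(a, b) == packed(*weak_checksum(data[1:9])))  # True
```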
- a rateless erasure algorithm has been described above. If, in terms of network layers, the erasure algorithm is implemented at the application layer, then error correction can be presumed to be handled at a lower layer such as the link or transport layer. However, if error correction or redundancy is desired, a rated erasure may be used to provide error correction.
- an erasure hash may be a random linear combination of all the block hashes at the level of that erasure hash.
- a linear combination of a subset of block hashes can be used, which makes the encoding matrix sparser. If all block hashes are used then the encoding matrix will not be sparse.
- a non-sparse matrix may require a lot of time to decode at the receiver side because more equations need to be solved. For the block hashes this is not much of a concern because the hashes are small, but the actual downloaded content data blocks may be relatively large and solving a full matrix can be expensive. It is possible to probabilistically produce some linear combinations that are just a combination of one data block.
- a receiver can skip a data block when it knows that block's coefficients in advance and they show that the block would add nothing new. That is, suppose the receiver knows the coefficients of a forthcoming block, for example by receiving them before the block itself or by using the shared-seed scheme discussed above. The receiver can then use the coefficients for the data block that is going to be distributed next to determine whether that upcoming data block will provide information new to the receiver.
- This determination may be made by calculating the rank of the matrix of coefficient vectors; the receiver already has a set of blocks and it knows the coefficients of those blocks, so it can start building the matrix.
- a receiver can download the seed, calculate the coefficients that are going to be used for the next/new block, and add those coefficients to those that it already has stored locally.
- the receiver can then calculate the rank of the combined matrix. If the rank increases by one that indicates that whatever is being broadcast through the air is new to the receiver. Otherwise, the receiver won't need it. Calculating the rank is a way to make sure that the new coefficient vector is linearly independent of the coefficient vectors for blocks already received or known by the receiver. If the rank does not increase that indicates that the new coefficient vector is linearly dependent on what the receiver already has, i.e.
- If a data block is one megabyte, it might take 5 minutes for the receiver to download it. However, the receiver may only have to download the relatively small coefficients (e.g. 16 bytes) produced by the seed to decide whether to download the next data block. In that case it can skip the block, wait for the next coefficients/block, and so on.
- Whether to use block prediction as discussed above may be decided by weighing the overhead against the fairly low probability that a receiver will receive information that it will not need. Most of the time data blocks are linear combinations of all of the blocks and will be useful to a receiver.
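- One way to run the rank test incrementally is sketched below. The receiver keeps its coefficient vectors in row-echelon form over $GF(2^{16})$ and reduces each announced vector against them. A nonzero remainder means the rank rises by one, so the upcoming block is worth downloading. The echelon bookkeeping, the helper names, and the field polynomial 0x1100B are assumptions for illustration.

```python
from typing import Dict, List

POLY = 0x1100B  # assumed GF(2^16) polynomial


def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10000:
            a ^= POLY
    return r


def gf_inv(a: int) -> int:
    r, e = 1, (1 << 16) - 2   # a^(2^16 - 2) == a^-1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def is_innovative(basis: Dict[int, List[int]], coeffs: List[int]) -> bool:
    """Reduce coeffs against the echelon basis {pivot column: row}.

    A nonzero remainder is linearly independent of everything already held,
    so the rank grows by one: store it and report that the block is new.
    """
    v = list(coeffs)
    for p in sorted(basis):          # increasing pivots keep earlier zeros intact
        f = v[p]
        if f:
            v = [x ^ gf_mul(f, y) for x, y in zip(v, basis[p])]
    lead = next((i for i, x in enumerate(v) if x), None)
    if lead is None:
        return False                 # rank unchanged: skip this block
    inv = gf_inv(v[lead])
    basis[lead] = [gf_mul(inv, x) for x in v]  # normalize so the pivot is 1
    return True


basis: Dict[int, List[int]] = {}
for announced in ([1, 0, 0], [0, 1, 0], [5, 7, 0], [5, 7, 9]):
    print(announced, "download" if is_innovative(basis, announced) else "skip")
```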
- u(i) is a shorthand expression for the number of unmatched blocks at level i.
- the receiver downloads $2u(i-1)$ erasure hashes in order to reconstruct the correct set of hashes at level i.
- the number of erasure hashes downloaded at level i can be halved by using decomposable homomorphic hashes.
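- The saving is easy to tally. The sketch below counts erasure hashes over the levels below the top, assuming binary splitting. Feeding it the unmatched counts from the FIG. 10 walk-through (one unmatched block at level 1, one at level 2, two at level 3) gives the four erasure hashes read there, versus eight without decomposable hashes. The function name is illustrative.

```python
from typing import List


def erasure_hashes_needed(unmatched: List[int], decomposable: bool) -> int:
    """unmatched[k] is u(k+1), the unmatched block count at level k+1.
    Level i needs 2*u(i-1) erasure hashes, or u(i-1) with decomposable hashes."""
    per_parent = 1 if decomposable else 2
    return sum(per_parent * u for u in unmatched)


print(erasure_hashes_needed([1, 1, 2], decomposable=False))  # 8
print(erasure_hashes_needed([1, 1, 2], decomposable=True))   # 4
```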
- a hash function is decomposable if $h(f[m+1,r])$ can be computed from the values $h(f[l,r])$ and $h(f[l,m])$, and also $h(f[l,m])$ can be computed from $h(f[l,r])$ and $h(f[m+1,r])$, for $l \le m < r$.
- A decomposable hash function can save on the cost of delivering block hashes used to identify matching data. Since receivers already have a hash for the parent block, they need to receive only one additional hash per pair of sibling child blocks; the hash for the other sibling can then be computed from these two.
- for a decomposable function, the hash $h(f[l,r])$ of a block at a given level may equal $h(f[l,m]) + h(f[m+1,r])$, where $h(f[l,m])$ and $h(f[m+1,r])$ are the hashes of the corresponding sibling blocks at the next hierarchy level.
- h may be defined such that
- decomposable hashes can be efficiently used in combination with erasure hashes.
- Decomposable hashes at a given level can be used as input to the erasure decoding algorithm at the next level of hierarchy where, given the parent hash and an additional erasure block, any of the siblings can be reconstructed.
- decomposable parent hashes can be interpreted as simple linear combinations of the two child block hashes when the erasure hash is created as the addition of two hashes.
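- A polynomial hash is one simple hash that meets the general decomposability definition given earlier. With $h(s) = \sum_k s_k G^k \bmod P$, a block's hash equals the left sibling's hash plus $G^{|left|}$ times the right sibling's hash. Either sibling can therefore be recovered from the parent and the other sibling. This combines siblings with a scale factor rather than a plain sum. The modulus, base, and function names are assumptions, and this is not the patent's own definition of h. Negative exponents in `pow` require Python 3.8 or later.

```python
P = (1 << 61) - 1   # Mersenne prime modulus (assumed)
G = 1_000_003       # hash base (assumed)


def poly_hash(block: bytes) -> int:
    """h(s) = sum_k s[k] * G^k mod P, with k counted from the block start."""
    h, power = 0, 1
    for byte in block:
        h = (h + byte * power) % P
        power = (power * G) % P
    return h


def combine(left: int, right: int, left_len: int) -> int:
    """Parent hash from the two sibling hashes."""
    return (left + pow(G, left_len, P) * right) % P


def right_from_parent(parent: int, left: int, left_len: int) -> int:
    """Right sibling from the parent and the left sibling."""
    return (parent - left) * pow(G, -left_len, P) % P


def left_from_parent(parent: int, right: int, left_len: int) -> int:
    """Left sibling from the parent and the right sibling."""
    return (parent - right * pow(G, left_len, P)) % P


data = b"efficient point-to-multipoint"
parent = poly_hash(data)
print(right_from_parent(parent, poly_hash(data[:12]), 12) == poly_hash(data[12:]))  # True
```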
- Treat each erasure hash as an equation and each block hash as a variable. If there are 1,000 variables of which 999 are known, then only one equation that is linearly independent of the known equations is needed to solve for the unknown variable.
- To reproduce the newly needed block hash, the receiver only needs the 999 downloaded or known block hashes, the 1 erasure hash, and the corresponding vector of coefficients that was used to generate that erasure hash.
- FIG. 9 shows an overview of how a receiver can find unmatched blocks in its target dataset.
- the receiver reads 340 the top level of the hash hierarchy which it uses to determine 342 hashes—and therefore blocks—of the master dataset that do not have a top-level equivalent in the target dataset.
- the hashes can be stored in a dictionary.
- Each top-level block in the target dataset (moving one byte or symbol at a time) is hashed and its hash is searched for in the dictionary.
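- The dictionary lookup can be sketched as follows. The master's hashes are indexed in a dict, and every window of the target, advanced one byte at a time, is hashed and looked up. Truncated SHA-1 stands in for the block hash, and the names are hypothetical. Recomputing a strong hash at every offset costs time proportional to the block size at each position. This is why rsync-style schemes screen candidates with a rolling checksum first.

```python
import hashlib
from typing import Dict, List


def block_hash(block: bytes) -> bytes:
    return hashlib.sha1(block).digest()[:7]   # stand-in block hash (assumed)


def find_matches(master_hashes: List[bytes], target: bytes, block_size: int) -> Dict[int, int]:
    """Map master block index -> first target offset whose window has that hash."""
    wanted: Dict[bytes, List[int]] = {}
    for idx, mh in enumerate(master_hashes):
        wanted.setdefault(mh, []).append(idx)         # dictionary of master hashes
    found: Dict[int, int] = {}
    for off in range(len(target) - block_size + 1):   # slide one byte at a time
        for idx in wanted.get(block_hash(target[off:off + block_size]), []):
            found.setdefault(idx, off)
    return found


def unmatched(master_hashes: List[bytes], target: bytes, block_size: int) -> List[int]:
    found = find_matches(master_hashes, target, block_size)
    return [i for i in range(len(master_hashes)) if i not in found]


master = b"AAAABBBBCCCCDDDD"
target = b"xxBBBByyDDDDzz"
hashes = [block_hash(master[i:i + 4]) for i in range(0, len(master), 4)]
print(find_matches(hashes, target, 4), unmatched(hashes, target, 4))  # {1: 2, 3: 8} [0, 2]
```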
- the receiver then reads 344 , at any time, any portion of the transmitted encoding of the hash hierarchy, where the size of the downloaded portion is proportional to the number of unmatched blocks at the current level.
- the portion of the hash hierarchy that is known to the receiver is used 346 to decode the portion of the hash hierarchy needed by the receiver.
- These decoded block hashes are hashes of finer sub-blocks of the unmatched blocks in the level above.
- the decoded portions of the hash hierarchy (the decoded block hashes) are used 348 to determine unmatched sub-blocks corresponding to the determined 342 unmatched blocks in the level above. The process may be repeated downward as necessary and then the necessary data blocks may be obtained by the receiver.
- FIG. 10 shows a reconstruction process performed by a receiver when the sender uses a decomposable erasure scheme.
- Erasure hashes 360 are transmitted via 4 channels 362 .
- the “h2” erasure hashes are at level-2, the “h3” hashes are at level-3, and the “h4” hashes are at level-4.
- the receiver knows that hash 364 is the only unmatched level-1 block hash.
- block hashes 366 are shown with different size to emphasize the sizes of the blocks that they represent. In practice, the block hashes 366 are preferably the same size or same order of magnitude.
- the receiver did not match level-1 master hash 364, so the receiver needs to determine whether its target dataset has either of the master sub-blocks of hash 364's master block.
- Because only one master block/hash is unmatched at stage j, the receiver needs to download only one level-2 erasure hash. Therefore, the receiver reads erasure hash $e_{h2,j}$ from channel 1. Because the hash function is decomposable, the receiver uses hash 364, erasure hash $e_{h2,j}$, and the other hashes for matched blocks at level 2 to produce hash 368 and hash 370. The receiver then determines whether hashes 368 and 370 have matching blocks in its target dataset.
- hash 368 is not matched and hash 370 is matched.
- the receiver again needs to download only one erasure hash, level-3 erasure hash $e_{h3,j+1}$, which it reads from channel 2.
- Hash $e_{h3,j+1}$ is used together with hash 368 to compute hash 372 and hash 374.
- the receiver then uses these hashes to determine which sub-blocks of hash 368's master block are missing from the target dataset. Neither hash matches, so at stages j+2 and j+3 the receiver reads the next two level-4 erasure hashes from channel 3: $e_{h4,j+2}$ and $e_{h4,j+3}$.
- the child hashes 375 of hashes 372 and 374 are computed and used to determine that level-4 hashes 376 do not have a match.
- the receiver knows that it only needs two data erasures and reads data erasures ed j+4 and ed j+5 from channel 4.
- the receiver decodes the data erasures ed j+4 and ed j+5 to produce the missing master data blocks 378 that correspond to block hashes 376 .
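A single erasure hash suffices at each stage in FIG. 10 because of decomposability. The receiver already knows the parent hash and every matched sibling hash, so one unmatched parent leaves one unknown to solve for. The sketch below makes two assumptions that are not taken from the text. First, the hash is linear, with random coefficients indexed by absolute position modulo a prime, so a parent's hash is the sum of its children's hashes. Second, an erasure hash is a random linear combination of all hashes at one level. All names are hypothetical.

```python
import random

P = 2**31 - 1  # prime modulus for the hash arithmetic (assumption)


def block_hash(words, coeffs, offset):
    """Linear hash sum(c[offset + i] * w_i) mod P.

    Coefficients are indexed by absolute position in the dataset, so the hash
    of a parent block equals the sum of its children's hashes (decomposable).
    """
    return sum(coeffs[offset + i] * w for i, w in enumerate(words)) % P


def erasure_hash(hashes, weights):
    """One erasure hash: a random linear combination of all hashes at a level."""
    return sum(a * h for a, h in zip(weights, hashes)) % P


def recover_children(parent, erasure, weights, known, left, right):
    """Solve for the two unknown child hashes of a single unmatched parent.

    Two equations mod P:
        h_left + h_right                     = parent
        w_left * h_left + w_right * h_right  = erasure - sum of known terms
    """
    t = (erasure - sum(weights[k] * h for k, h in known.items())) % P
    denom = (weights[left] - weights[right]) % P
    if denom == 0:
        raise ValueError("degenerate weights; another erasure hash is needed")
    h_left = (t - weights[right] * parent) * pow(denom, -1, P) % P
    return h_left, (parent - h_left) % P


rng = random.Random(7)
BLOCK = 4  # 2-byte words per level-2 block (assumption)
master = [rng.randrange(65536) for _ in range(8 * BLOCK)]
coeffs = [rng.randrange(1, P) for _ in master]

# Sender side: eight level-2 hashes, four level-1 parents, one erasure hash.
h2 = [block_hash(master[k * BLOCK:(k + 1) * BLOCK], coeffs, k * BLOCK)
      for k in range(8)]
h1 = [(h2[2 * j] + h2[2 * j + 1]) % P for j in range(4)]
weights = [rng.randrange(1, P) for _ in h2]
e = erasure_hash(h2, weights)

# Receiver side: parent 1 is unmatched, so children 2 and 3 are unknown;
# the other level-2 hashes come from blocks it already holds.
known = {k: h2[k] for k in range(8) if k not in (2, 3)}
print(recover_children(h1[1], e, weights, known, 2, 3) == (h2[2], h2[3]))
```

With several unmatched parents, the receiver would need one independent erasure hash per unmatched parent. This is consistent with the text's point that the download grows with the number of unmatched blocks.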
- FIG. 11 shows a process a receiver may use to determine when to stop downloading erasure hashes.
- in step 400, the receiver downloads erasure hashes and searches for blocks at the current level of the hierarchy.
- in step 402, the results are compared to the results of searching at the previous level.
- in step 404, the receiver determines the rate or amount of new information being added. Methods for making this determination are discussed above. If the rate is low or no new information is being added, the receiver finishes searching (step 406) and may download content data. Otherwise, the receiver goes to the next level (step 408) and repeats steps 400, 402, and 404.
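The FIG. 11 loop can be sketched as shown below. The text leaves open how "new information" is measured. This sketch assumes it is the number of additional master-dataset bytes located in the target at each level. It stops once the gain falls below a hypothetical threshold, `min_new_bytes`. The search callback is also hypothetical. The level-1 result is always followed by at least one finer level, because a coarse level that matches nothing can precede finer levels that do match.

```python
from typing import Callable, List


def search_until_saturated(search_level: Callable[[int], int], max_levels: int,
                           min_new_bytes: int) -> List[int]:
    """Run the FIG. 11 loop and return the matched-byte total after each level.

    search_level(level) is a hypothetical callback that downloads the erasure
    hashes for `level`, searches the target, and returns how many bytes of the
    master dataset are now located in the target.
    """
    history: List[int] = []
    previous = 0
    for level in range(1, max_levels + 1):
        matched = search_level(level)         # step 400: download and search
        gain = matched - previous             # step 402: compare with last level
        history.append(matched)
        if level > 1 and gain < min_new_bytes:
            break                             # steps 404/406: too little new info
        previous = matched                    # step 408: go to the next level
    return history


matched_by_level = {1: 1000, 2: 1600, 3: 1900, 4: 1950, 5: 1960}
print(search_until_saturated(matched_by_level.__getitem__, 5, 100))
```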
- a receiver can quickly obtain the hashes it needs by downloading an amount of information proportional to the number of missing hashes/blocks, without having to provide feedback to the sender.
- any random number generator will suffice for the coefficients, but most processors will be most efficient with 2-byte coefficients.
- a hash size of 7 bytes was found to be optimal for many types of applications.
- An MD5 or SHA1 hash can be used as a signature to verify that a reconstructed dataset matches its master dataset.
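The parameter choices above can be illustrated with Python's standard library. The sketch draws 2-byte coefficients from a seeded generator and checks a reconstructed dataset against a SHA-1 signature of the master. Sender and receiver sharing a seed is an assumption made here so that coefficients need not be transmitted. The function names are hypothetical.

```python
import hashlib
import random
from typing import List


def coefficients(seed: int, count: int) -> List[int]:
    """Draw `count` random 2-byte coefficients from a seeded generator.

    Sharing the seed (an assumption) lets sender and receiver regenerate the
    same coefficients instead of transmitting them.
    """
    rng = random.Random(seed)
    return [rng.getrandbits(16) for _ in range(count)]


def signature(dataset: bytes) -> bytes:
    """Whole-dataset signature published by the sender (SHA-1; MD5 also works)."""
    return hashlib.sha1(dataset).digest()


def verify(reconstructed: bytes, master_signature: bytes) -> bool:
    """Check that a reconstructed dataset matches its master dataset."""
    return signature(reconstructed) == master_signature


sig = signature(b"master dataset")
print(verify(b"master dataset", sig), verify(b"master datasat", sig))
```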
- FIG. 12 shows a graph 430.
- from graph 430 it can be seen that, over a range of different block sizes, hierarchical hashing (“DeltaCast”) requires less than half the bandwidth of a single-round hashing scheme.
- the blocks of the master dataset do not have to be divided into fixed sizes. Variable-sized blocks can be used for greater efficiency. As long as the receiver knows the sizes in advance, the same techniques may be used. Rather than using a rolling window sized to the blocks at the current search level, the rolling window can be matched to a certain value (fingerprint) that identifies the border of a block. That is, the window keeps rolling until its edge matches another border identifier/value, and so on.
- the variable-block-size approach works well when the master file or dataset is updated with arbitrary insertions/deletions. There may be a range of block sizes at one level, and at the next level the range of block sizes will be halved. However, blocks will need to be padded to a multiple of a fixed block size.
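One common way to implement border fingerprints is content-defined chunking with a rolling hash. The sketch below uses a Rabin-Karp style polynomial hash and declares a border wherever the hash's low bits are zero. The window length, mask, and modulus are illustrative assumptions, not values from the text. The sketch also includes the padding step to a multiple of a fixed block size. The function names are hypothetical.

```python
import random
from typing import List

BASE = 257
MOD = (1 << 61) - 1  # large prime modulus for the rolling hash


def block_ends(data: bytes, window: int = 16, mask: int = 0x3F) -> List[int]:
    """Return end offsets of content-defined, variable-size blocks.

    A polynomial (Rabin-Karp style) hash of the last `window` bytes is rolled
    forward one byte at a time; a border is declared wherever the low bits of
    the hash are zero, i.e. where the window matches the border fingerprint.
    """
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    h, ends = 0, []
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % MOD
        h = (h * BASE + byte) % MOD
        if i + 1 >= window and (h & mask) == 0:
            ends.append(i + 1)
    if data and (not ends or ends[-1] != len(data)):
        ends.append(len(data))  # the final block ends at end-of-data
    return ends


def pad_block(block: bytes, unit: int) -> bytes:
    """Zero-pad a block to a multiple of a fixed unit size."""
    return block + bytes(-len(block) % unit)


rng = random.Random(1)
data = bytes(rng.getrandbits(8) for _ in range(4000))
edited = b"XYZ" + data  # an insertion at the front of the dataset
shifted = [e - 3 for e in block_ends(edited) if e >= 16 + 3]
print(shifted == block_ends(data))
```

Borders depend only on local content. As a result, inserting bytes shifts later borders by the insertion length instead of invalidating them, and the final check tests exactly that property.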
- FIG. 13 shows three performance graphs.
- Graph 440 shows how performance can vary with the number of levels in the hash hierarchy.
- Graph 442 shows how performance can vary when receivers have different outdated versions of the dataset.
- Graph 444 shows a comparison of average download latencies for four different schemes: (a) simple full file download with no hashes, (b) hierarchical hashing scheme with no encoded data, (c) single-layer hashing, and (d) hierarchical hashing with erasure encoding (“DeltaCast”).
- the total latency represented in graph 444 includes (i) the time to download the hashes or the erasure hashes, (ii) the time to download the missing data, (iii) the idle time spent waiting in the different carousels for the specific hashes and data to arrive, and (iv) the time to decode the encoded hashes and data. Note that not every latency factor applies to every scheme. For instance, the total latency in scheme (a) is determined by the latency factor (i). The latency in schemes (b) and (c), however, is determined by factors (i), (ii), and (iii), while the latency in scheme (d) is determined by factors (i), (ii), and (iv).
- FIG. 14 shows a table 470 of empirical results for different download methods.
- the “DeltaCast” approach offers significant performance improvement over other techniques.
- Three sub-columns represent three different sets of synchronized data. In each case, the “DeltaCast” approach was superior.
- Various embodiments discussed herein may involve reconstructing a master dataset at a receiver. However, there may be some applications where actual reconstruction is not needed or performed, and data blocks of the master dataset (or encodings thereof) may not be sent and/or received. For example, there may be cases where it is useful simply to determine where a target dataset differs from a master dataset. Stale data could be deleted from the target dataset to make the target dataset a proper subset of the master dataset. Or, if the master dataset is known to differ only by deletions, the target dataset could be made equivalent to the master dataset by deleting the portions determined to be absent from the master dataset. It should also be noted that embodiments discussed herein are highly useful for one-way sender-receiver communication.
- aspects of the embodiments may also be useful for two-way communication systems.
- a receiver could determine missing parts of a master dataset in ways discussed herein and then use a feedback communication to the sender to request specific missing parts of the master dataset.
- techniques discussed herein are useful with but not limited to one-way communication systems.
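Determining where a target differs from a master, without reconstructing anything, can be sketched with aligned fixed-size blocks. Master blocks with no match are the ones a two-way receiver would request over its feedback channel. Target blocks with no match are stale data that it could delete. Aligned blocks (no rolling search) and a truncated SHA-1 are simplifying assumptions, and the function names are hypothetical.

```python
import hashlib
from typing import List, Tuple

HASH_BYTES = 7  # hash size the text reports as optimal for many applications


def digest(block: bytes) -> bytes:
    """Stand-in block hash: SHA-1 truncated to HASH_BYTES (illustrative)."""
    return hashlib.sha1(block).digest()[:HASH_BYTES]


def split(data: bytes, size: int) -> List[bytes]:
    """Cut data into aligned blocks of `size` bytes (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compare(master_hashes: List[bytes], target: bytes,
            size: int) -> Tuple[List[int], List[int]]:
    """Return (missing, stale): master block indices absent from the target,
    and target block indices absent from the master."""
    target_blocks = split(target, size)
    in_target = {digest(b) for b in target_blocks}
    in_master = set(master_hashes)
    missing = [i for i, h in enumerate(master_hashes) if h not in in_target]
    stale = [j for j, b in enumerate(target_blocks) if digest(b) not in in_master]
    return missing, stale


def drop_stale(target: bytes, stale: List[int], size: int) -> bytes:
    """Delete stale blocks so the target keeps only master content."""
    drop = set(stale)
    return b"".join(b for j, b in enumerate(split(target, size)) if j not in drop)


master = b"AAAABBBBCCCCDDDD"
target = b"AAAAXXXXCCCCDDDD"
missing, stale = compare([digest(b) for b in split(master, 4)], target, 4)
print(missing, stale, drop_stale(target, stale, 4))
```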
- a remote computer may store an example of the process described as software.
- a local or terminal computer may access the remote computer and download a part or all of the software to run the program.
- the local computer may download pieces of the software as needed, or distributively process by executing some software instructions at the local terminal and some at the remote computer (or computer network).
- all or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Multimedia (AREA)
- Mobile Radio Communication Systems (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Circuits Of Receivers In General (AREA)
Abstract
Description
where $b_i$ is an individual block, $(b_1, \ldots, b_n)$ is the parent block made of the concatenation of
where $c_v$ are random coefficients.
Claims (20)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/109,011 US7984018B2 (en) | 2005-04-18 | 2005-04-18 | Efficient point-to-multipoint data reconciliation |
US13/155,356 US20110264629A1 (en) | 2005-04-18 | 2011-06-07 | Efficient point-to-multipoint data reconciliation |
US13/155,607 US20110238623A1 (en) | 2005-04-18 | 2011-06-08 | Efficient point-to-multipoint data reconciliation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/109,011 US7984018B2 (en) | 2005-04-18 | 2005-04-18 | Efficient point-to-multipoint data reconciliation |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/155,356 Continuation US20110264629A1 (en) | 2005-04-18 | 2011-06-07 | Efficient point-to-multipoint data reconciliation |
US13/155,607 Continuation US20110238623A1 (en) | 2005-04-18 | 2011-06-08 | Efficient point-to-multipoint data reconciliation |
Publications (2)
Publication Number | Publication Date |
---|---|
US20060235895A1 US20060235895A1 (en) | 2006-10-19 |
US7984018B2 true US7984018B2 (en) | 2011-07-19 |
Family
ID=37109808
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/109,011 Expired - Fee Related US7984018B2 (en) | 2005-04-18 | 2005-04-18 | Efficient point-to-multipoint data reconciliation |
US13/155,356 Abandoned US20110264629A1 (en) | 2005-04-18 | 2011-06-07 | Efficient point-to-multipoint data reconciliation |
US13/155,607 Abandoned US20110238623A1 (en) | 2005-04-18 | 2011-06-08 | Efficient point-to-multipoint data reconciliation |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/155,356 Abandoned US20110264629A1 (en) | 2005-04-18 | 2011-06-07 | Efficient point-to-multipoint data reconciliation |
US13/155,607 Abandoned US20110238623A1 (en) | 2005-04-18 | 2011-06-08 | Efficient point-to-multipoint data reconciliation |
Country Status (1)
Country | Link |
---|---|
US (3) | US7984018B2 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070208748A1 (en) * | 2006-02-22 | 2007-09-06 | Microsoft Corporation | Reliable, efficient peer-to-peer storage |
US20100281062A1 (en) * | 2009-05-01 | 2010-11-04 | Brother Kogyo Kabushiki Kaisha | Management apparatus, recording medium recording an information generation program , and information generating method |
US20120192272A1 (en) * | 2011-01-20 | 2012-07-26 | F-Secure Corporation | Mitigating multi-AET attacks |
US11115198B2 (en) * | 2018-09-19 | 2021-09-07 | Kabushiki Kaisha Toshiba | Key generation device, key generation method, and computer program product |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070061381A1 (en) * | 2005-09-15 | 2007-03-15 | Gregory Newton | Methods, systems and computer program products for managing user information across multiple devices associated with the user |
US7617231B2 (en) * | 2005-12-07 | 2009-11-10 | Electronics And Telecommunications Research Institute | Data hashing method, data processing method, and data processing system using similarity-based hashing algorithm |
US7844581B2 (en) * | 2006-12-01 | 2010-11-30 | Nec Laboratories America, Inc. | Methods and systems for data management using multiple selection criteria |
EP2103023B1 (en) | 2006-12-14 | 2015-04-15 | Thomson Licensing | Rateless codes decoding method for communication systems |
CN101563873A (en) | 2006-12-14 | 2009-10-21 | 汤姆逊许可证公司 | Modulation indication method for communication systems |
JP5297387B2 (en) * | 2006-12-14 | 2013-09-25 | トムソン ライセンシング | Rateless encoding in communication systems |
KR101367072B1 (en) * | 2006-12-14 | 2014-02-24 | 톰슨 라이센싱 | Arq with adaptive modulation for communication systems |
US7827137B2 (en) * | 2007-04-19 | 2010-11-02 | Emc Corporation | Seeding replication |
US8103718B2 (en) * | 2008-07-31 | 2012-01-24 | Microsoft Corporation | Content discovery and transfer between mobile communications nodes |
US20100088296A1 (en) * | 2008-10-03 | 2010-04-08 | Netapp, Inc. | System and method for organizing data to facilitate data deduplication |
US20100174968A1 (en) * | 2009-01-02 | 2010-07-08 | Microsoft Corporation | Heirarchical erasure coding |
US8200641B2 (en) * | 2009-09-11 | 2012-06-12 | Dell Products L.P. | Dictionary for data deduplication |
US9646105B2 (en) * | 2012-11-08 | 2017-05-09 | Texas Instruments Incorporated | Reduced complexity hashing |
US9680650B2 (en) * | 2013-08-23 | 2017-06-13 | Qualcomm Incorporated | Secure content delivery using hashing of pre-coded packets |
US10574438B2 (en) * | 2014-02-18 | 2020-02-25 | Nippon Telegraph And Telephone Corporation | Security apparatus, method thereof, and program |
US9922201B2 (en) | 2015-04-01 | 2018-03-20 | Dropbox, Inc. | Nested namespaces for selective content sharing |
US9697269B2 (en) * | 2015-10-29 | 2017-07-04 | Dropbox, Inc. | Content item block replication protocol for multi-premises hosting of digital content items |
WO2017223095A1 (en) * | 2016-06-20 | 2017-12-28 | Anacode Labs, Inc. | Parallel, block-based data encoding and decoding using multiple computational units |
US20200111555A2 (en) | 2016-07-26 | 2020-04-09 | Bayer Business Services Gmbh | Synchronization of hierarchical data |
US10749668B2 (en) * | 2017-05-03 | 2020-08-18 | International Business Machines Corporation | Reduction in storage usage in blockchain |
US11064055B2 (en) | 2019-07-22 | 2021-07-13 | Anacode Labs, Inc. | Accelerated data center transfers |
US11449278B2 (en) * | 2020-09-04 | 2022-09-20 | Netapp, Inc. | Methods for accelerating storage operations using computational network and storage components and devices thereof |
US20230216690A1 (en) * | 2021-12-30 | 2023-07-06 | Gm Cruise Holdings Llc | Data transfer acceleration via content-defined chunking |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4538240A (en) * | 1982-12-30 | 1985-08-27 | International Business Machines Corporation | Method and apparatus for performing hashing operations using Galois field multiplication |
US5199073A (en) * | 1990-10-30 | 1993-03-30 | International Business Machines Corporation | Key hashing in data processors |
US5390187A (en) * | 1990-10-23 | 1995-02-14 | Emc Corporation | On-line reconstruction of a failed redundant array system |
US5475826A (en) * | 1993-11-19 | 1995-12-12 | Fischer; Addison M. | Method for protecting a volatile file using a single hash |
US5530757A (en) * | 1994-06-28 | 1996-06-25 | International Business Machines Corporation | Distributed fingerprints for information integrity verification |
US5701418A (en) * | 1994-03-31 | 1997-12-23 | Chrysler Corporation | Intra-vehicular LAN and method of routing messages along it using hash functions |
US5909700A (en) * | 1996-12-23 | 1999-06-01 | Emc Corporation | Back-up data storage facility incorporating filtering to select data items to be backed up |
US6148382A (en) * | 1996-12-23 | 2000-11-14 | Emc Corporation | Arrangement for filtering data item updates to reduce the number of updates to a data item to be stored on mass data storage facility |
US6202135B1 (en) * | 1996-12-23 | 2001-03-13 | Emc Corporation | System and method for reconstructing data associated with protected storage volume stored in multiple modules of back-up mass data storage facility |
US6233589B1 (en) * | 1998-07-31 | 2001-05-15 | Novell, Inc. | Method and system for reflecting differences between two files |
US20010037323A1 (en) * | 2000-02-18 | 2001-11-01 | Moulton Gregory Hagan | Hash file system and method for use in a commonality factoring system |
US20020055991A1 (en) * | 1998-05-08 | 2002-05-09 | Apple Computer, Inc. | Method and apparatus for configuring a computer |
US20030182568A1 (en) * | 2002-03-21 | 2003-09-25 | Snapp Robert F. | Method and system for storing and retrieving data using hash-accessed multiple data stores |
US20030217058A1 (en) * | 2002-03-27 | 2003-11-20 | Edya Ladan-Mozes | Lock-free file system |
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4538240A (en) * | 1982-12-30 | 1985-08-27 | International Business Machines Corporation | Method and apparatus for performing hashing operations using Galois field multiplication |
US5390187A (en) * | 1990-10-23 | 1995-02-14 | Emc Corporation | On-line reconstruction of a failed redundant array system |
US5199073A (en) * | 1990-10-30 | 1993-03-30 | International Business Machines Corporation | Key hashing in data processors |
US5475826A (en) * | 1993-11-19 | 1995-12-12 | Fischer; Addison M. | Method for protecting a volatile file using a single hash |
US5694569A (en) * | 1993-11-19 | 1997-12-02 | Fischer; Addison M. | Method for protecting a volatile file using a single hash |
US5701418A (en) * | 1994-03-31 | 1997-12-23 | Chrysler Corporation | Intra-vehicular LAN and method of routing messages along it using hash functions |
US5530757A (en) * | 1994-06-28 | 1996-06-25 | International Business Machines Corporation | Distributed fingerprints for information integrity verification |
US6148382A (en) * | 1996-12-23 | 2000-11-14 | Emc Corporation | Arrangement for filtering data item updates to reduce the number of updates to a data item to be stored on mass data storage facility |
US5909700A (en) * | 1996-12-23 | 1999-06-01 | Emc Corporation | Back-up data storage facility incorporating filtering to select data items to be backed up |
US6202135B1 (en) * | 1996-12-23 | 2001-03-13 | Emc Corporation | System and method for reconstructing data associated with protected storage volume stored in multiple modules of back-up mass data storage facility |
US20010042222A1 (en) * | 1996-12-23 | 2001-11-15 | Emc Corporation | System and method for reconstructing data associated with protected storage volume stored in multiple modules of back-up mass data storage facility |
US6397309B2 (en) * | 1996-12-23 | 2002-05-28 | Emc Corporation | System and method for reconstructing data associated with protected storage volume stored in multiple modules of back-up mass data storage facility |
US20020055991A1 (en) * | 1998-05-08 | 2002-05-09 | Apple Computer, Inc. | Method and apparatus for configuring a computer |
US6233589B1 (en) * | 1998-07-31 | 2001-05-15 | Novell, Inc. | Method and system for reflecting differences between two files |
US20010037323A1 (en) * | 2000-02-18 | 2001-11-01 | Moulton Gregory Hagan | Hash file system and method for use in a commonality factoring system |
US20030182568A1 (en) * | 2002-03-21 | 2003-09-25 | Snapp Robert F. | Method and system for storing and retrieving data using hash-accessed multiple data stores |
US20030217058A1 (en) * | 2002-03-27 | 2003-11-20 | Edya Ladan-Mozes | Lock-free file system |
US6850969B2 (en) * | 2002-03-27 | 2005-02-01 | International Business Machined Corporation | Lock-free file system |
Non-Patent Citations (69)
Title |
---|
"Improved single-round protocols for remote file synchronization"; Irmak, U.; Mihaylov, S.; Suel, T.;INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE vol. 3, Mar. 13-17, 2005 pp. 1665-1676 vol. 3; Digital Object Identifier 10.1109/INFCOM.2005.1498448. * |
"Low cost comparisons of file copies"; Schwarz, T.; Bowdidge, R.W.; Burkhard, W.A.;Distributed Computing Systems, 1990. Proceedings., 10th International Conference on May 28-Jun. 1, 1990 pp. 196-202; Digital Object Identifier 10.1109/ICDCS.1990.89272. * |
"Multilevel error-control codes for data storage channels"; Abdel-Ghaffar, K.A.S.; Hassner, M.; Information Theory, IEEE Transactions on vol. 37, Issue 3, Part 2, May 1991 pp. 735-741; Digital Object Identifier 10.1109/18.79944. * |
"Shift-register synthesis and BCH decoding"; Massey, J.; Information Theory, IEEE Transactions on vol. 15, Issue 1, Jan. 1969 pp. 122-127. * |
"Software-based erasure codes for scalable distributed storage"; Cooley, J.A.; Mineweaser, J.L.; Servi, L.D.; Tsung, E.T.;Mass Storage Systems and Technologies, 2003. (MSST 2003). Proceedings. 20th IEEE/11th NASA Goddard Conference on Apr. 7-10, 2003 pp. 157-164. * |
A. Muthitacharoen, B. Chen, and D. Mazières, A low bandwidth network file system, in Proc. of the 18th ACM Symp. on Operating Systems Principles, Oct. 2001, pp. 174-187. |
A. Orlitsky and K. Viswanathan, One-way communication and error-correcting codes, Proc. of the 2002 IEEE Int. Symp. on Information Theory, Jun. 2002 p. 394. |
A. Orlitsky, Interactive communication of balanced distributions and of correlated files, SIAM Journal of Discrete Math, vol. 6, No. 4, pp. 548-564, 1993. |
A. Orlitsky, Worst-case interactive communication II: Two messages are not optimal, IEEE Transactions on Information Theory, vol. 37, No. 4, pp. 995-1005, Jul. 1991. |
A. Tridgell and P. MacKerras, The rsync algorithm, Technical Report TR-CS-96-05, Australian National University, Jun. 1996. |
Acharya, Swarup et al., "Broadcast Disks: Data Management for Asymmetric Communication Environments", Technical Report No. CS-94-43 (Dec. 1994), pp. 28. |
Agarwal, S., et al., "On the Scalability of Data Synchronization Protocols for PDAs and Mobile Devices", pp. 14. |
Baric, Niko et al., "Collision-Free Accumulators and Fail-Stop Signature Schemes Without Trees", Advances in Cryptology-EUROCRYPT '97, LNCS 1233, pp. 480-494. |
Bellare, Mihir et al., "A New Paradigm for Collision-free Hashing: Incrementality at Reduced Cost", pp. 21. |
Broadcast and Multicast Service in cdma2000 Wireless IP Network, http://www.3gpp2.org/, Oct. 2003. |
Broadcast and Multicast Service in cdma2000 Wireless IP Network., http://www.3gpp2.org/, Oct. 2003. |
Byers, J., et al., "Informed Content Delivery Across Adaptive Overlay Networks", pp. 12. |
Byers, John et al., "A Digital Fountain Approach to Reliable Distribution of Bulk Data", pp. 12. |
Chen, Ming-Syan et al., "Optimizing Index Allocation for Sequential Data Broadcasting in Wireless Mobile Computing", IEEE Transactions on Knowledge and Data Engineering, vol. 15, No. 1, Jan./Feb. 2003, pp. 161-173. |
Cormode, Graham "Sequence Distance Embeddings", Thesis Submitted to the University of Warwick for the degree of Doctor of Philosophy, Computer Science, Jan. 2003, pp. 1-174. |
Cormode, Graham et al., "Communication complexity of document exchange", pp. 17. |
Cox, Landon P., et al., "Pastiche: Making Backup Cheap and Easy", pp. 14. |
D. Starobinski, A. Trachtenberg, and S. Agarwal, Efficient PDA synchronization, IEEE Trans. on Mobile Computing, 2003. |
Digital Audio and Video Broadcasting systems, http://www.etsi.org/. |
DirectBand Network. Microsoft Smart Personal Objects Technology (SPOT), http://www.microsoft.com/resources/spot/. |
DirectBand Network. Microsoft Smart Personal Objects Technology (SPOT). http://www.microsoft.com/resources/spot/. |
First UK user trial of multi-channel TV to mobile phones, Nokia Press Release, IBC Amsterdam, 2004. |
First UK user trial of multi-channel TV to mobile phones, Nokia Press Release, IBC Amsterdam, 2004. http://press.nokia.com/PR/200409/960284-5.html. |
G. Cormode, M. Paterson, S. Sahinalp, and U. Vishkin, Communication complexity of document exchange, Proc. of the ACM-SIAM Symp. on Discrete Algorithms, Jan. 2000. |
G. Cormode, Sequence Distance Embeddings, Ph.D thesis, University of Warwick, Jan. 2003. |
Hu, Qingilong et al., "Performance Evaluation of a Wireless Hierarchical Data Dissemination System", Mobicom '99, pp. 163-173. |
Hughes Network Systems, http://www.direcway.com/. |
Hughes Network Systems., http://www.direcway.com/. |
Imielinski, T., et al., "Data on Air: Organization and Access", IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 3, May/Jun. 1997, pp. 353-372. |
Irmak, U., et al., "Improved Single-Round Protocols for Remote File Synchronization", pp. 12. |
J. Byers and J. Considine, Informed Content Delivery Across Adaptive Overlay Networks, Proc. of ACM SIGCOMM, Aug. 2002. |
Johnson, R., et al., "Homomorphic Signature Schemes", Progress in Cryptology CT-RSA 2002, 2002, pp. 18. |
Julian Chesterfield and Pablo Rodriguez; "DeltaCast: Efficient File Reconciliation in Wireless Broadcast Systems" Jun. 2005-Proceedings of the Third International Conference on Mobile Systems, Applications, and Services (MobiSys 2005) (1931971315); Berkley, CA; Seattle, WA USA; 14 pages. |
Karp, R., et al., "Efficient randomized pattern-matching algorithms", IBM J. Res. Develop. vol. 31 No. 2 Mar. 1987, pp. 249-260. |
L. Cox, C. Murray, and B. Noble, Pastiche: Making backup cheap and easy, in Proc of the 5th Symp. on Operating Systems Design and Implementation, Dec. 2002. |
M. Bellare and D. Micciancio, A new paradigm for collision-free hashing: Incrementality at reduced cost, Advances in Cryptology, EUROCRYPT 97, 1997. |
Minsky, Y., et al., "Set Reconciliation with Nearly Optimal Communication Complexit", Technical Report TR2000-1813, Cornell University, Apr. 29, 2004, pp. 1-18. |
Multimedia Broadcast/Multicast System (MBMS), http://www.3gpp.org/ftp/Specs/html-info/29846.htm. |
Multimedia Broadcast/Multicast System (MBMS), http://www.3gpp2.org/ftp/Specs/html-info/29846.htm. |
Muthitacharoen, A., et al., "A low bandwidth network file system", In Proc. of the 18th ACM Symp. on Operating Systems Principles, Oct. 2001, pp. 14. |
N. Baric and B. Pfitzmann, Collision-free accumulators and fail-stop signature schemes without trees, Advance in Cryptology, EUROCRYPT 97, 1997. |
N. Spring and D. Wetherall, A protocol independent technique for eliminating redundant network traffic, ACM SIGCOMM Conference, 2000. |
Orlitsky, A., et al., "One-Way Communication and Error-Correcting Codes", Correspondence, IEEE Transactions on Information Theory, vol. 49, No. 7, Jul. 2003, pp. 1781-1788. |
Orlitsky, Alon "Interactive Communication of Balanced Distributions and of Correlated Files", Siam J. Disc. Math. vol. 6, No. 4, Nov. 1993, pp. 548-564. |
Orlitsky, Alon "Worst-case Interactive Communication 11: Two Messages are Not Optimal", IEEE Transactions on Information Theory, vol. 37, No. 4, Jul. 1991, pp. 995-1005. |
Q. L. Hu, D.L. Lee, and W.C. Lee; Performance evaluation of a wireless hierarchical data dissemination system., In Proceedings of the 5th Annual ACM International Conference on Mobile Computing and Networking (MobiCom99), Seattle, WA, Aug. 1999. |
R. Johnson, D. Molnar, D. Song, and D. Wagner, Homomorphic signature schemes, Progress in Cryptology CT-RSA 2002, 2002. |
R. Karp and M. Rabin, Efficient randomized pattern-matching algorithms, IBM Journal of Research and Development, vol. 31, No. 2, pp. 249-260, 1987. |
Rhea, Sean C., et al., "Value Based Web Caching", Proceedings of the Twelfth International World Wide Web Conference, May 2003, pp. 1-10. |
S. Acharya, R. Alonso, M. Franklin, and S. Zdonik. Broadcast disks: Data management for asymmetric communications environments., In Proceeding of ACM SIGMOD Conference on Management of Data, San Jose, CA, May 1995. |
S. Agarwal, D. Starobinski, and A. Trachtenberg, On the scalability of data synchronization protocols for PDAs and mobile devices, IEEE Network Magazine, special issue on Scalability in Communication Networks, Jul. 2002. |
S. Rhea, K. Liang, and E. Brewer, Value-based web caching, Proc. of the 12th Int. World Wide Web Conference, May 2003. |
Schwarz, T., et al., "Low Cost Comparisons of File Copies", IEEE (1990), pp. 196-202. |
Shivakumar, N., et al., "Efficient indexing for broadcast based wireless systems", Mobile Networks and Applications 1 (1996), pp. 433-446. |
Spring, Neil et al., "A Protocol-Independent Technique for Eliminating Redundant Network Traffic", pp. 9. |
StarBand, http://www.starband.com/. |
StarBand., http://www.starband.com/. |
Starobinski, D., et al., "Efficient PDA Synchronization", IEEE Transactions on Mobile Computing, vol. 2, No. 1, Jan.-Mar. 2003, pp. 40-51. |
T. Imielinski, S. Viswanathan, and B.R. Badrinath. Data on air Organization and access., IEEE Transactions on Knowledge and Data Engineering (TKDE), May/Jun. 1997. |
T. Schwarz, R. Bowdidge, and W. Burkhard, Low cost comparison of File copies, Proc. of the 10th Int. Conf. on Distributed Computing Systems, 1990, pp. 196-202. |
Tridgell, Andrew et al., "The rsync algorithm", TR-CS-96-05, Jun. 18, 1996, pp. 1-6. |
Xu, J., et al., "Exponential Index: A Parameterized Distributed Indexing Scheme for Data on Air", MobiSys'04, Jun. 6-9, 2004, pp. 153-164. |
Y. Minsky, A. Trachtenberg, and R. Zippel, Set reconciliation with almost optimal communication complexity, Technical Report TR2000-1813, Cornell University, 2000. |
Zdelta Home Page, http://cis.poly.edu/zdelta/. |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070208748A1 (en) * | 2006-02-22 | 2007-09-06 | Microsoft Corporation | Reliable, efficient peer-to-peer storage |
US9047310B2 (en) * | 2006-02-22 | 2015-06-02 | Microsoft Technology Licensing, Llc | Reliable, efficient peer-to-peer storage |
US20100281062A1 (en) * | 2009-05-01 | 2010-11-04 | Brother Kogyo Kabushiki Kaisha | Management apparatus, recording medium recording an information generation program , and information generating method |
US8311976B2 (en) * | 2009-05-01 | 2012-11-13 | Brother Kogyo Kabushiki Kaisha | Management apparatus, recording medium recording an information generation program, and information generating method |
US20120192272A1 (en) * | 2011-01-20 | 2012-07-26 | F-Secure Corporation | Mitigating multi-AET attacks |
US8763121B2 (en) * | 2011-01-20 | 2014-06-24 | F-Secure Corporation | Mitigating multiple advanced evasion technique attacks |
US11115198B2 (en) * | 2018-09-19 | 2021-09-07 | Kabushiki Kaisha Toshiba | Key generation device, key generation method, and computer program product |
Also Published As
Publication number | Publication date |
---|---|
US20060235895A1 (en) | 2006-10-19 |
US20110238623A1 (en) | 2011-09-29 |
US20110264629A1 (en) | 2011-10-27 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: MICROSOFT CORPORATION, WASHINGTON; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RODRIGUEZ, PABLO RODRIGUEZ;CHESTERFIELD, JULIAN;REEL/FRAME:026304/0027; Effective date: 20050418 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034543/0001; Effective date: 20141014 |
FPAY | Fee payment | Year of fee payment: 4 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 8 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20230719 |