US20100123607A1 - Method and system for efficient data transmission with server side de-duplication - Google Patents

Info

Publication number
US20100123607A1
Authority
US
United States
Prior art keywords
length
bitstream
data blocks
decoding
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/273,329
Other versions
US7733247B1 (en
Inventor
Dake He
Vadim Sheinin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignment of assignors interest (see document for details). Assignors: SHEININ, VADIM; HE, DAKE
Priority to US12/273,329 (patent US7733247B1)
Priority to CN200910222445.3A (patent CN101741838B)
Priority to US12/751,888 (patent US8138954B2)
Publication of US20100123607A1
Publication of US7733247B1
Application granted
Assigned to GOOGLE INC. Assignment of assignors interest (see document for details). Assignors: INTERNATIONAL BUSINESS MACHINES CORPORATION
Priority to US13/412,200 (patent US8836547B1)
Assigned to GOOGLE LLC. Change of name (see document for details). Assignors: GOOGLE INC.
Current legal status: Active
Adjusted expiration

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Abstract

The invention provides a method and system for reducing redundant data blocks. The method includes encoding a first data block having a first length into a bitstream having a second length, transmitting the bitstream to a server device, and reducing redundant data blocks by decoding the first data block from a first plurality of data blocks and the bitstream where each block in the first plurality of data blocks has a length equal to the first length.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates generally to removing redundant data, and in particular to reducing data transmission for server side data de-duplication.
  • 2. Background Information
  • De-duplication processes partition data objects into smaller parts (named “chunks”) and retain only the unique chunks in a dictionary (repository) of chunks. To be able to reconstruct the object, a list of hashes (indexes or metadata) of the unique chunks is stored in place of original objects. The list of hashes is customarily ignored in the de-duplication compression ratios reported by various de-duplication product vendors. That is, vendors typically only report the unique chunk data size versus original size.
  • The list of hashes is relatively larger when smaller chunks are employed. Smaller chunks are more likely to match and can be used to achieve higher compression ratios. Known de-duplication systems try to diminish the significance of index metadata by using large chunk sizes, and therefore, accept lower overall compression ratios. Also, standard compression methods (LZ, Gzip, Compress, Bzip2, etc.) applied to the list of hashes perform poorly.
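  • The trade-off between chunk size and the size of the list of hashes can be illustrated with a minimal sketch. It assumes fixed-size chunking and SHA-1 digests as chunk identifiers; the function names `deduplicate` and `reconstruct` are hypothetical and do not come from the specification.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int):
    """Split data into fixed-size chunks and keep only the unique ones.

    Returns (store, hash_list): store maps digest -> chunk (the dictionary
    of unique chunks) and hash_list is the ordered list of digests that
    stands in for the original object.
    """
    store = {}
    hash_list = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha1(chunk).hexdigest()  # assumed identifier scheme
        store.setdefault(digest, chunk)           # retain the first copy only
        hash_list.append(digest)
    return store, hash_list


def reconstruct(store, hash_list) -> bytes:
    """Rebuild the original object from the chunk dictionary and hash list."""
    return b"".join(store[digest] for digest in hash_list)


data = b"ABCDABCDABCDXYZ!"
for size in (2, 4, 8):
    store, hash_list = deduplicate(data, size)
    unique_bytes = sum(len(chunk) for chunk in store.values())
    print(size, unique_bytes, len(hash_list))
```

  • On this input, halving the chunk size from 8 to 4 halves the unique chunk data that must be kept but doubles the number of entries in the list of hashes, which is the overhead described above.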
  • In order to reduce bandwidth requirements from client to server, (hash-based) data de-duplication has to be performed at the client. Client side data de-duplication has the following drawbacks: 1) It is difficult to deploy, as client side data de-duplication requires tighter integration into existing applications and systems; 2) It is difficult to do a direct comparison when using hashing methods in client side data de-duplication, and delta differencing requires a large local cache, which might not be available in a resource-limited client.
  • When client side data de-duplication is not possible, the alternative is to perform data de-duplication at the server. In server side data de-duplication, data is transmitted before de-duplication in the link from the client to server.
  • SUMMARY OF THE INVENTION
  • The invention provides a method and system for reducing redundant data blocks. The method includes encoding a first data block having a first length into a bitstream having a second length, transmitting the bitstream to a server device, and reducing redundant data blocks by decoding the first data block from a first plurality of data blocks and the bitstream where each block in the first plurality of data blocks has a length equal to the first length.
  • In one embodiment of the invention, the decoding is performed with a Slepian-Wolf decoder. In another embodiment of the invention, upon decoding being successful for the complete first length, de-duplication is performed on the first data block. In yet another embodiment of the invention, upon the decoding being unsuccessful for the complete first length, requesting further information of the first data block from a client. This embodiment of the invention further provides encoding the first data block having the first length into another bitstream having one of the second length and a third length, transmitting the other bitstream to the server device, and reducing redundant data blocks by decoding the first data block from a second plurality of data blocks and the other bitstream, where each block in the second plurality of data blocks has a length equal to the first length. In still another embodiment of the invention, de-duplication is performed by the decoding. In one embodiment of the invention, the decoding is performed with a variable length for a predetermined collision rate. In another embodiment of the invention, a transmission rate for the transmitting is variable to meet a predetermined collision rate.
  • In another provision of the invention, a system for reducing redundant data blocks includes a client device including an encoder module configured to encode a data block into a bitstream, a server device including a decoder module configured to decode the bitstream using a plurality of previously stored data blocks on the server device, and a de-duplication module coupled to the decoder module configured to deduplicate successful decoded portions of the data block.
  • One embodiment of the invention further includes a data storage device coupled to the server device. Another embodiment of the invention further includes at least another client device. In yet another embodiment of the invention, the encoder module performs a Slepian-Wolf encoding. In still another embodiment of the invention, the decoder module performs a Slepian-Wolf decoding. In one embodiment of the invention, the de-duplication module further includes a sequence identifier module configured to identify sequences of chunk portion identifiers of a data object, an indexing module configured to apply indexing to identification of chunk portions based on a sequence type, and an encoding module configured to encode first repeated sequences with a first encoding and encodes second repeated sequences with a second encoding, wherein storing repeated sequences of chunk portion identifiers is avoided.
  • Yet another embodiment of the invention provides a computer program product for reducing transmission of redundant data before de-duplication. The computer program product when executed by a processor encodes a first data block having a first length into a bitstream having a second length, transmits the bitstream to a server device, and reduces redundant data blocks by decoding the first data block from a first plurality of data blocks and the bitstream where each block in the first plurality of data blocks has a length equal to the first length.
  • In one embodiment of the invention, the decoding is performed with a Slepian-Wolf decoder. In another embodiment of the invention, upon decoding being successful for the complete first length, de-duplication is performed on the first data block. In yet another embodiment of the invention, upon decoding being unsuccessful for the complete first length, requesting further information of the first data block from a client. Still another embodiment of the invention further causes the computer to encode the first data block having the first length into another bitstream having one of the second length and a third length, transmit the other bitstream to the server device, and reduce redundant data blocks by decoding the first data block from a second plurality of data blocks and the other bitstream, where each block in the second plurality of data blocks has a length equal to the first length. In one embodiment of the invention, a transmission rate for the transmitting is variable to meet a predetermined collision rate.
  • Still another embodiment of the invention provides a method including encoding a first data block having a first length into a bitstream having a second length using a Slepian-Wolf encoding process, transmitting the bitstream to a server device, and reducing redundant data blocks before de-duplication using a Slepian-Wolf decoding process by decoding the first data block from a first plurality of data blocks and the bitstream, where each block in the first plurality of data blocks has a length equal to the first length.
  • In one embodiment of the invention, upon the decoding being unsuccessful for the complete first length, requesting further information of the first data block from a client. In another embodiment of the invention, the method further includes encoding the first data block having the first length into another bitstream having one of the second length and a third length, transmitting the other bitstream to the server device, and reducing redundant data blocks by decoding the first data block from a second plurality of data blocks and the other bitstream, where each block in the second plurality of data blocks has a length equal to the first length.
  • Yet another embodiment of the invention provides a system for reducing transmission of redundant data blocks. The system includes a client device including a Slepian-Wolf encoder module configured to encode a data block into a bitstream, a server device including a Slepian-Wolf decoder module configured to decode the bitstream using a plurality of previously stored data blocks on the server device, and a de-duplication module coupled to the decoder module configured to deduplicate successful decoded portions of the data block.
  • In one embodiment of the invention, the Slepian-Wolf decoder is configured to reduce redundant data blocks before de-duplication.
  • Other aspects and advantages of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the drawings, illustrate by way of example the principles of the invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For a fuller understanding of the nature and advantages of the invention, as well as a preferred mode of use, reference should be made to the following detailed description read in conjunction with the accompanying drawings, in which:
  • FIG. 1 illustrates a system for reducing redundant data that needs to be transmitted to a server device before data de-duplication according to one embodiment of the invention;
  • FIG. 2 illustrates a block diagram of a process for reducing redundant data that needs to be transmitted to a server before data de-duplication according to one embodiment of the invention; and
  • FIG. 3 illustrates a de-duplication module of the system illustrated in FIG. 1 according to one embodiment of the invention.
  • DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • The following description is made for the purpose of illustrating the general principles of the invention and is not meant to limit the inventive concepts claimed herein. Further, particular features described herein can be used in combination with other described features in each of the various possible combinations and permutations. Unless otherwise specifically defined herein, all terms are to be given their broadest possible interpretation including meanings implied from the specification as well as meanings understood by those skilled in the art and/or as defined in dictionaries, treatises, etc.
  • The description may disclose several preferred embodiments of reducing redundant data blocks before de-duplication, as well as operation and/or component parts thereof. While the following description will be described in terms of de-duplication reduction processes and devices for clarity and to place the invention in context, it should be kept in mind that the teachings herein may have broad application to all types of systems, devices and applications.
  • The invention provides a method and system for reducing redundant data blocks. The method includes encoding a first data block having a first length into a bitstream having a second length, transmitting the bitstream to a server device, and reducing redundant data blocks by decoding the first data block from a first plurality of data blocks and the bitstream where each block in the first plurality of data blocks has a length equal to the first length.
  • FIG. 1 illustrates a block diagram of a system 100 for reducing transmission of redundant data blocks before de-duplication according to one embodiment. As illustrated, system 100 includes client devices 1 to N 150 including an encoder module 160. In one embodiment of the invention, the encoder uses Slepian-Wolf encoding (David Slepian and J. K. Wolf “Noiseless Coding of Correlated Information Sources”; IEEE Transactions on Information Theory; July 1973; pp. 471-480; vol. 19.). It should be noted that Slepian-Wolf refers to the compression of the outputs of two or more physically separated sources that do not communicate with each other (hence distributed coding). These sources send their compressed outputs to a central point (e.g., the server device 110) for joint decoding. Other embodiments of the invention use other known distributed coding techniques for encoding and decoding.
  • The client devices 150 are connected to a server device 110 through a network, wireless connection, wired connection, etc. The server device includes a decoder module 120 and a de-duplication module 130. Coupled to the server device is a data storage device 140. In one embodiment of the invention, the decoder module performs decoding using Slepian-Wolf decoding. In one embodiment of the invention, the client devices 150 include data sources, such as uploaded/downloaded files (e.g., data files, video/audio files, streaming media, etc.) that can be resident or non-resident in client device 150. In one embodiment of the invention, the data source is downloaded from a network (wired or wirelessly), such as the Internet, a local area network (LAN), wide area network (WAN), a disk, a disk drive, flash card, memory, etc.
  • In one embodiment of the invention, the encoder module 160 uses a length-n Slepian-Wolf coder to encode a binary data block X having a length n into a bitstream Z of m bits, where m<n (m and n being integers greater than 0). The client device 150 transmits or routes Z to the server device 110. The server device 110 uses the decoder module 120, which implements the same Slepian-Wolf code used by the encoder module 160 of the client device 150, to decode X from Z and a set of N stored data blocks {Y_i}, i=1, ..., N, on the data storage device 140, each data block in the set having a length of n. If the decoding of (Z, Y_i) is successful for some i≦N, then X is forwarded to the de-duplication module 130 for de-duplication. If the decoding fails for every stored block (that is, it still fails at i=N), the server device 110 sends a request to the client device 150 for more information about X. When the client device 150 receives the server device 110 request for more information about X, the encoder module 160 uses the same or a different Slepian-Wolf code to generate another bitstream Z′ of m′ bits, where m+m′≦n, and transmits or routes Z′ to the server device 110 for further decoding by the decoder module 120, where Z is set to (Z, Z′) for the decoding. In one embodiment of the invention, m and the Slepian-Wolf codes are designed to meet a desired collision rate.
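  • The idea of sending only m<n bits and decoding against a stored block can be sketched with syndrome coding. The specification does not name a particular Slepian-Wolf code, so the Hamming(7,4) parity-check matrix below (n=7, m=3) is an assumption chosen for brevity, and the names `H`, `syndrome`, `encode` and `decode` are hypothetical. The sketch recovers X only when the stored block Y differs from X in at most one bit.

```python
# Parity-check matrix of the Hamming(7,4) code: column j (1-based) is the
# binary representation of j, least significant bit in the first row.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(block):
    """Return the 3-bit syndrome H * block over GF(2)."""
    return [sum(h * b for h, b in zip(row, block)) % 2 for row in H]


def encode(x):
    """Client side: compress the 7-bit block X into the 3-bit bitstream Z."""
    return syndrome(x)


def decode(z, y):
    """Server side: recover X from Z and a stored block Y.

    Assumes X and Y differ in at most one bit position. The XOR of the two
    syndromes is then the syndrome of the difference pattern, which for this
    matrix spells out the 1-based position of the differing bit (0 = none).
    """
    diff = [a ^ b for a, b in zip(z, syndrome(y))]
    position = diff[0] + 2 * diff[1] + 4 * diff[2]
    x_hat = list(y)
    if position:
        x_hat[position - 1] ^= 1
    return x_hat


x = [1, 0, 1, 1, 0, 0, 1]   # block held by the client
y = [1, 0, 1, 1, 1, 0, 1]   # stored block on the server, one bit different
z = encode(x)               # only 3 of the 7 bits are transmitted
print(decode(z, y) == x)
```

  • A practical system would use a much longer code and a decoder that can report failure, so that the server can move on to the next stored block or request more bits as described above.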
  • FIG. 2 illustrates the de-duplication module 130. The de-duplication module 130 performs de-duplication of the decoded data blocks in the data storage device 140. In one embodiment, metadata includes descriptions, parameters, priority, date, time, and other pertinent information regarding chunked object portions. A hash is a transformation of a string of characters (e.g., metadata) into a shorter fixed-length value or key that represents the original string. In one embodiment, hashing is used to index and retrieve chunk portions in the data storage device 140. It should be noted that it is faster to find a chunk portion using the shorter hashed metadata than to find it using the original value. In one embodiment a hashing function is used to create an indexed version of the represented value of chunk portions of data objects. That is, a hash function is used to index the original value and then used later each time the data associated with the value is to be retrieved. In one embodiment, known hash functions are used, such as a division-remainder method, folding, radix transformation, digit rearrangement, etc. In another embodiment, encryption hash functions are used, such as MD2, MD4, MD5, the Secure Hash Algorithm (SHA), etc.
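  • Two of the hash functions named above, the division-remainder method and folding, can be sketched as follows, together with a cryptographic digest used as a chunk key. The function names, the decimal-digit folding variant, and the choice of SHA-256 from the SHA family are assumptions made for illustration.

```python
import hashlib


def division_remainder(key: int, table_size: int) -> int:
    """Division-remainder method: the hash is the key modulo the table size."""
    return key % table_size


def folding(key: int, part_digits: int, table_size: int) -> int:
    """Folding: split the decimal digits into parts, add them, then reduce."""
    digits = str(key)
    parts = [int(digits[i:i + part_digits])
             for i in range(0, len(digits), part_digits)]
    return sum(parts) % table_size


def digest_key(chunk: bytes) -> str:
    """Fixed-length key for a chunk portion (SHA-256 is an assumed choice)."""
    return hashlib.sha256(chunk).hexdigest()


# Index a chunk portion by its digest, then retrieve it through the key.
index = {}
chunk = b"example chunk portion"
index[digest_key(chunk)] = chunk
print(division_remainder(123456, 97), folding(123456, 2, 100))
print(index[digest_key(b"example chunk portion")] == chunk)
```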
  • In one embodiment of the invention, the de-duplication module 130 includes chunking module 141, search module 142, sequence identifier module 143, indexing module 144, encoding module 145 and a removal module 146. In another embodiment of the invention, the individual modules included in the de-duplication module 130 can be a software process, a hardware module or a combination of software and hardware. In one embodiment of the invention, de-duplication module 130 reduces an index of identifiers for chunk portions in de-duplication where the identifiers are metadata hashes of objects. The chunking module 141 is configured to create smaller chunk portions from chunks received from a data chunker. In another embodiment of the invention, the chunking module 141 performs chunking of an input stream of larger chunks by one or more of: fixed size chunking, sliding window chunking, variable size chunking and content dependent chunking, in order to reduce the input stream of chunk portions to smaller chunk portions.
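  • Two of the chunking strategies listed above can be contrasted in a short sketch. The specification gives no parameters for the chunking module 141, so the window length, the divisor, and the byte-sum boundary test below are illustrative assumptions, as are the function names.

```python
def fixed_size_chunks(data: bytes, size: int):
    """Fixed size chunking: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_dependent_chunks(data: bytes, window: int = 4, divisor: int = 8):
    """Content dependent chunking: cut where a window of bytes hits a pattern.

    A boundary is placed after position i when the sum of the last `window`
    bytes is divisible by `divisor`. Boundaries therefore depend on the
    content near the cut rather than on absolute position in the stream.
    """
    chunks = []
    start = 0
    for i in range(len(data)):
        if i + 1 - start < window:
            continue  # keep every chunk at least one window long
        if sum(data[i + 1 - window:i + 1]) % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


data = b"the quick brown fox jumps over the lazy dog"
print(fixed_size_chunks(data, 8))
print(content_dependent_chunks(data))
```

  • With fixed size chunking, inserting one byte near the start of a stream shifts every later boundary, whereas content dependent boundaries tend to realign after the edit, which is why the two are commonly treated as distinct options.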
  • In one embodiment of the invention, the search module 142 searches the data storage device 140 to find matching chunks to a chunk originally destined for the data storage device 140. In one embodiment of the invention, the sequence identifier module 143 operates to identify sequences of chunk portion identifiers of a data object. The indexing module 144 operates to apply indexing to identification of chunk portions based on a chunk repeating sequence type according to one embodiment of the invention. In another embodiment of the invention, the stored identification (e.g., hashed metadata) of chunk portions includes a chronological pointer linking newly added identification of chunk portions in chronological order.
  • In one embodiment of the invention, the encoding module 145 is connected to the indexing module 144 and the encoding module 145 operates to encode first repeated chunk sequences with a first encoding and encodes second repeated chunk sequences with a second encoding, and repeated sequences of chunk portion identifiers are removed from a memory to reduce storage use. The second encoding identifies the first appearance of the first repeated sequences of chunk portions, according to one embodiment of the invention. In another embodiment of the invention, the second encoding includes a distance offset from a first appearance of a repeated chunk portion to a second appearance of the repeated chunk portion. In one embodiment of the invention, the sequence type is assigned based on a length of repeated chunk identification. In one embodiment of the invention, an optional removal module 146 removes repeated chunk portions from the data storage device 140 to reduce stored chunk portions stored in the data storage device 140.
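  • One way to avoid storing repeated sequences of chunk portion identifiers is to replace a repeat with a reference made of a distance offset and a length. The sketch below is only an assumed illustration of that idea: the token layout, the greedy longest-match search, and the names `encode_ids` and `decode_ids` are not taken from the specification.

```python
def encode_ids(ids, min_len=2):
    """Replace repeated runs of identifiers with ("ref", distance, length).

    Identifiers seen for the first time are emitted as ("id", value).
    `distance` counts back from the current position to the first
    appearance of the repeated run that was found.
    """
    tokens = []
    i = 0
    while i < len(ids):
        best_len, best_dist = 0, 0
        for start in range(i):
            length = 0
            while (i + length < len(ids) and start + length < i
                   and ids[start + length] == ids[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len >= min_len:
            tokens.append(("ref", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("id", ids[i]))
            i += 1
    return tokens


def decode_ids(tokens):
    """Expand the tokens back into the original identifier list."""
    ids = []
    for token in tokens:
        if token[0] == "id":
            ids.append(token[1])
        else:
            _, distance, length = token
            start = len(ids) - distance
            ids.extend(ids[start:start + length])
    return ids


ids = ["a", "b", "c", "a", "b", "c", "d"]
tokens = encode_ids(ids)
print(tokens)
print(decode_ids(tokens) == ids)
```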
  • In one embodiment of the invention, the reduction in the amount of data to be transmitted or routed from the client device 150 to the server device 110 reduces transmission time and/or lowers bandwidth requirements. Since the encoder module 160 includes Slepian-Wolf coding, which is simple to implement and computationally efficient, the embodiments of the invention can be easily integrated into existing applications and systems.
  • One advantage of using the embodiments of the invention over non-hash-valued server side de-duplication is that, in the case where data de-duplication is effective, the amount of data to be transmitted is significantly smaller than with lossless compression. For data having a duplicated copy stored at the server, de-duplication can be performed in one pass with Slepian-Wolf decoding, whereas with lossless compression, decompression and de-duplication have to be performed sequentially in two passes. Other advantages of using the embodiments of the invention over hash-valued server side data de-duplication are: the embodiments are more flexible and efficient than hash-valued server side data de-duplication in the sense that significant compression can still be achieved when no exact copy of the data is available at the server but slight variations of it are present (in this case, with hash values, no exact match of the hash value of the data to be transmitted can be found, and as a result the original data needs to be transmitted; for the Slepian-Wolf case, if the difference between stored uncompressed data and Slepian-Wolf decoded data is small, a second tier of de-duplication can be performed after decoding to further compress stored data); and, in contrast to sending only hash values within a definite range, the embodiments of the invention allow flexible adjustment of the transmission rate to meet the desired collision rate.
  • FIG. 3 illustrates a block diagram of a process 300 for reducing transmission of data from a client device to a server device. Process 300 begins with block 310 where a block of data X having a length n is encoded using a Slepian-Wolf encoder into a bitstream Z of m bits on a client device. In block 320, the bitstream Z is transmitted or routed to a server device. In block 330, a Slepian-Wolf decoder is initialized by setting i to one (“1”) where i is an index for decoding the ith block of data in the bitstream Z. In block 340, the bitstream Z is decoded to X for the ith block of data (which is currently i=1).
  • Block 350 determines whether decoding the ith block of data is successful or not. If the decoding for all data blocks is successful (i.e., i=1-n), process 300 continues to block 355 where data de-duplication is performed on the decoded data block X on the server device. If decoding the ith data block is not successful, process 300 continues with block 360. In block 360, it is determined if i is less than n (i<n). If i is less than n, process 300 continues with block 365 where i is incremented by 1 (i=i+1) and process 300 continues to block 340. If it is determined that i is not less than n, process 300 continues with block 370 where a request for more information about data block X is sent from the server device to the client device. Process 300 continues with block 380 where the client device encodes the data block X having a length n using the same or a different Slepian-Wolf encoder into a bitstream Z′ of m′ bits on the client device (where m+m′<n). Z is then set to (Z,Z′) and process 300 continues to block 320. Process 300 continues until data block X is recovered for data de-duplication on the server device.
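  • The control flow of process 300 can be sketched as a loop over the stored blocks with a fallback request for more bits. The sketch is not a real Slepian-Wolf code: as a loudly labelled stand-in, the "bitstream" is simply the leading bits of X, and decoding against a stored block succeeds when that block starts with the received bits. The names `toy_decode` and `run_process_300` and the parameter `m_extra` are hypothetical, and the loop runs over the stored blocks, following the set of N stored blocks described earlier.

```python
def toy_decode(z, y):
    """Stand-in decoder: succeed only if stored block y agrees with z.

    Returns a copy of y as the estimate of X, or None on failure. A stored
    block that shares the prefix but differs later would be accepted
    wrongly, which is the kind of collision a longer z makes less likely.
    """
    return list(y) if list(y[:len(z)]) == list(z) else None


def run_process_300(x, stored_blocks, m, m_extra):
    """Return (recovered block, number of requests for more information)."""
    assert m > 0 and m_extra > 0
    n = len(x)
    z = list(x[:m])                      # block 310: client encodes X into m bits
    requests = 0
    while True:                          # block 320: Z is sent to the server
        if len(z) == n:
            return list(z), requests     # all n bits received: X is known
        for y in stored_blocks:          # blocks 330 to 365: try each stored block
            x_hat = toy_decode(z, y)     # blocks 340 and 350
            if x_hat is not None:
                return x_hat, requests   # block 355: hand X to de-duplication
        requests += 1                    # block 370: ask the client for more
        z = z + list(x[len(z):len(z) + m_extra])   # block 380: Z = (Z, Z')


x = [1, 0, 1, 1, 0, 0, 1, 0]
other = [0, 0, 0, 0, 1, 1, 1, 1]
print(run_process_300(x, [other, x], m=4, m_extra=4))   # duplicate is stored
print(run_process_300(x, [other], m=4, m_extra=4))      # no duplicate stored
```

  • When a copy of X is already stored, only m bits cross the link; when none is stored, the server falls back to requesting more bits until X can be recovered.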
  • The embodiments of the invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • Furthermore, the embodiments of the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer, processing device, or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • The medium can be electronic, magnetic, optical, or a semiconductor system (or apparatus or device). Examples of a computer-readable medium include, but are not limited to, a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a RAM, a read-only memory (ROM), a rigid magnetic disk, an optical disk, etc. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
  • I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be connected to the system either directly or through intervening controllers. Network adapters may also be connected to the system to enable the data processing system to become connected to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
  • In the description above, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. For example, well-known equivalent components and elements may be substituted in place of those described herein, and similarly, well-known equivalent techniques may be substituted in place of the particular techniques disclosed. In other instances, well-known structures and techniques have not been shown in detail to avoid obscuring the understanding of this description.
  • Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments. The various appearances of “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments. If the specification states a component, feature, structure, or characteristic “may,” “might,” or “could” be included, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, that does not mean there is only one of the element. If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional element.
  • While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention not be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art.

Claims (24)

1. A method comprising:
encoding a first data block having a first length into a bitstream having a second length;
transmitting the bitstream to a server; and
reducing redundant data blocks by decoding the first data block from a first plurality of data blocks and the bitstream where each block in the first plurality of data blocks has a length equal to the first length.
2. The method of claim 1, wherein the decoding is performed with a Slepian-Wolf decoder.
3. The method of claim 1, wherein upon decoding being successful for the complete first length, de-duplication is performed on the first data block.
4. The method of claim 1, wherein upon the decoding being unsuccessful for the complete first length, requesting further information of the first data block from a client.
5. The method of claim 4, further comprising:
encoding the first data block having the first length into another bitstream having one of the second length and a third length;
transmitting the other bitstream to the server; and
reducing redundant data blocks by decoding the first data block from a second plurality of data blocks and the other bitstream, where each block in the second plurality of data blocks has a length equal to the first length.
6. The method of claim 1, wherein de-duplication is performed by the decoding.
7. The method of claim 1, wherein the decoding is performed with a variable length for a predetermined collision rate.
8. The method of claim 1, wherein a transmission rate for the transmitting is variable to meet a predetermined collision rate.
9. A system for reducing redundant data blocks, comprising:
a client device including an encoder module configured to encode a data block into a bitstream;
a server device including a decoder module configured to decode the bitstream using a plurality of previously stored data blocks on the server device; and
a de-duplication module coupled to the decoder module configured to deduplicate successful decoded portions of the data block.
10. The system of claim 9, further comprising a data storage device coupled to the server device.
11. The system of claim 9, further comprising at least another client device.
12. The system of claim 9, wherein the encoder module performs a Slepian-Wolf encoding.
13. The system of claim 9, wherein the decoder module performs a Slepian-Wolf decoding.
14. A computer program product comprising a computer usable medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to:
encode a first data block having a first length into a bitstream having a second length;
transmit the bitstream to a server; and
reduce redundant data blocks by decoding the first data block from a first plurality of data blocks and the bitstream, where each block in the first plurality of data blocks has a length equal to the first length.
15. The computer program product of claim 14, wherein the decoding is performed with a Slepian-Wolf decoder.
16. The computer program product of claim 14, wherein upon decoding being successful for the complete first length, de-duplication is performed on the first data block.
17. The computer program product of claim 14, wherein upon decoding being unsuccessful for the complete first length, requesting further information of the first data block from a client.
18. The computer program product of claim 14, further causing the computer to:
encode the first data block having the first length into another bitstream having one of the second length and a third length;
transmit the other bitstream to the server; and
reduce redundant data blocks by decoding the first data block from a second plurality of data blocks and the other bitstream, where each block in the second plurality of data blocks has a length equal to the first length.
19. The computer program product of claim 14, wherein a transmission rate for the transmitting is variable to meet a predetermined collision rate.
20. A method comprising:
encoding a first data block having a first length into a bitstream having a second length using a Slepian-Wolf encoding process;
transmitting the bitstream to a server device; and
reducing redundant data blocks before de-duplication using a Slepian-Wolf decoding process by decoding the first data block from a first plurality of data blocks and the bitstream, where each block in the first plurality of data blocks has a length equal to the first length.
21. The method of claim 20, wherein upon the decoding being unsuccessful for the complete first length, requesting further information of the first data block from a client.
22. The method of claim 21, further comprising:
encoding the first data block having the first length into another bitstream having one of the second length and a third length;
transmitting the other bitstream to the server device; and
reducing redundant data blocks by decoding the first data block from a second plurality of data blocks and the other bitstream, where each block in the second plurality of data blocks has a length equal to the first length.
23. A system for reducing transmission of redundant data blocks, comprising:
a client device including a Slepian-Wolf encoder module configured to encode a data block into a bitstream;
a server device including a Slepian-Wolf decoder module configured to decode the bitstream using a plurality of previously stored data blocks on the server device; and
a de-duplication module coupled to the decoder module configured to deduplicate successfully decoded portions of the data block.
24. The system of claim 23, wherein the Slepian-Wolf decoder is configured to reduce redundant data blocks before de-duplication.
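
As an illustration only, the flow recited in claims 1 and 3 to 5 can be sketched in Python. The sketch does not implement a Slepian-Wolf code. It swaps in a truncated SHA-256 hash as a bin index, in the spirit of random binning, so it can only recover a block that the server already stores verbatim. The function names (encode, decode, send_block) and the parameters start_bits, step_bits and max_bits are hypothetical and do not appear in the patent.

```python
import hashlib


def encode(block: bytes, n_bits: int) -> str:
    """Client side: reduce a block to an n_bits-long bin index (n_bits <= 256)."""
    digest = hashlib.sha256(block).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:n_bits]


def decode(bitstream: str, stored_blocks: set, block_len: int):
    """Server side: return the single stored block of length block_len that
    falls in the received bin, or None if zero or several candidates match."""
    candidates = [
        b for b in stored_blocks
        if len(b) == block_len and encode(b, len(bitstream)) == bitstream
    ]
    return candidates[0] if len(candidates) == 1 else None


def send_block(block: bytes, stored_blocks: set, start_bits: int = 32,
               step_bits: int = 32, max_bits: int = 256) -> str:
    """Try progressively longer bitstreams; fall back to storing the block.

    Assumption: start_bits stands in for a length chosen to meet a target
    collision rate. A short index can match a different stored block, in
    which case this sketch would wrongly report a duplicate.
    """
    n_bits = start_bits
    while n_bits <= max_bits:
        if decode(encode(block, n_bits), stored_blocks, len(block)) is not None:
            return "deduplicated"  # the server already holds this block
        n_bits += step_bits        # decoding failed: send a longer bitstream
    stored_blocks.add(block)       # nothing matched: store the block itself
    return "stored"
```

Calling send_block(b"AAAA", {b"AAAA", b"BBBB"}) returns "deduplicated" without the block itself being sent, while a block the server has never seen exhausts the retries and is stored. A real Slepian-Wolf decoder could also exploit stored blocks that are merely similar to the transmitted one, which this hash-based stand-in cannot do.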
US12/273,329 2008-11-18 2008-11-18 Method and system for efficient data transmission with server side de-duplication Active 2028-11-26 US7733247B1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US12/273,329 US7733247B1 (en) 2008-11-18 2008-11-18 Method and system for efficient data transmission with server side de-duplication
CN200910222445.3A CN101741838B (en) 2008-11-18 2009-11-13 Method and system for diminishing redundancy data block or its transmission
US12/751,888 US8138954B2 (en) 2008-11-18 2010-03-31 Method and system for efficient data transmission with server side de-duplication
US13/412,200 US8836547B1 (en) 2008-11-18 2012-03-05 Server side data storage and deduplication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/273,329 US7733247B1 (en) 2008-11-18 2008-11-18 Method and system for efficient data transmission with server side de-duplication

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/751,888 Continuation US8138954B2 (en) 2008-11-18 2010-03-31 Method and system for efficient data transmission with server side de-duplication

Publications (2)

Publication Number Publication Date
US20100123607A1 (en) 2010-05-20
US7733247B1 (en) 2010-06-08

Family

ID=42171584

Family Applications (3)

Application Number Title Priority Date Filing Date
US12/273,329 Active 2028-11-26 US7733247B1 (en) 2008-11-18 2008-11-18 Method and system for efficient data transmission with server side de-duplication
US12/751,888 Active - Reinstated 2028-12-04 US8138954B2 (en) 2008-11-18 2010-03-31 Method and system for efficient data transmission with server side de-duplication
US13/412,200 Expired - Fee Related US8836547B1 (en) 2008-11-18 2012-03-05 Server side data storage and deduplication

Country Status (2)

Country Link
US (3) US7733247B1 (en)
CN (1) CN101741838B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100281207A1 (en) * 2009-04-30 2010-11-04 Miller Steven C Flash-based data archive storage system
US20110218972A1 (en) * 2010-03-08 2011-09-08 Quantum Corporation Data reduction indexing
US20120150824A1 (en) * 2010-12-10 2012-06-14 Inventec Corporation Processing System of Data De-Duplication
US20130254441A1 (en) * 2012-03-20 2013-09-26 Sandisk Technologies Inc. Method and apparatus to process data based upon estimated compressibility of the data
US20140002918A1 (en) * 2012-06-27 2014-01-02 Marvell World Trade Ltd. Systems and methods for reading and decoding encoded data from a storage device
US20140181465A1 (en) * 2012-04-05 2014-06-26 International Business Machines Corporation Increased in-line deduplication efficiency
US20150181308A1 (en) * 2012-02-08 2015-06-25 Vixs Systems, Inc. Container agnostic decryption device and methods for use therewith
US9078015B2 (en) 2010-08-25 2015-07-07 Cable Television Laboratories, Inc. Transport of partially encrypted media
CN105258303A (en) * 2015-11-20 2016-01-20 珠海格力电器股份有限公司 Remote transmission control method and device for air conditioner unit operation data and air conditioner
CN113709510A (en) * 2021-08-06 2021-11-26 联想(北京)有限公司 High-speed data real-time transmission method and device, equipment and storage medium

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8938595B2 (en) * 2003-08-05 2015-01-20 Sepaton, Inc. Emulated storage system
US9372941B2 (en) 2007-10-25 2016-06-21 Hewlett Packard Enterprise Development Lp Data processing apparatus and method of processing data
US8140637B2 (en) * 2007-10-25 2012-03-20 Hewlett-Packard Development Company, L.P. Communicating chunks between devices
WO2009054827A1 (en) * 2007-10-25 2009-04-30 Hewlett-Packard Development Company, L.P. Data processing apparatus and method of processing data
US8510275B2 (en) 2009-09-21 2013-08-13 Dell Products L.P. File aware block level deduplication
US8228213B2 (en) * 2009-09-23 2012-07-24 International Business Machines Corporation Data compression system and associated methods
US9325625B2 (en) * 2010-01-08 2016-04-26 Citrix Systems, Inc. Mobile broadband packet switched traffic optimization
US8514697B2 (en) * 2010-01-08 2013-08-20 Sycamore Networks, Inc. Mobile broadband packet switched traffic optimization
US8560552B2 (en) * 2010-01-08 2013-10-15 Sycamore Networks, Inc. Method for lossless data reduction of redundant patterns
US8495312B2 (en) * 2010-01-25 2013-07-23 Sepaton, Inc. System and method for identifying locations within data
GB2470498B (en) * 2010-07-19 2011-04-06 Quantum Corp Establishing parse scope
US9122639B2 (en) 2011-01-25 2015-09-01 Sepaton, Inc. Detection and deduplication of backup sets exhibiting poor locality
CN102811212A (en) * 2011-06-02 2012-12-05 英业达集团(天津)电子技术有限公司 Data encryption method with repetitive data deleting function and system thereof
US8996800B2 (en) * 2011-07-07 2015-03-31 Atlantis Computing, Inc. Deduplication of virtual machine files in a virtualized desktop environment
US8484170B2 (en) 2011-09-19 2013-07-09 International Business Machines Corporation Scalable deduplication system with small blocks
US9417811B2 (en) 2012-03-07 2016-08-16 International Business Machines Corporation Efficient inline data de-duplication on a storage system
US9069472B2 (en) 2012-12-21 2015-06-30 Atlantis Computing, Inc. Method for dispersing and collating I/O's from virtual machines for parallelization of I/O access and redundancy of storing virtual machine data
US9277010B2 (en) 2012-12-21 2016-03-01 Atlantis Computing, Inc. Systems and apparatuses for aggregating nodes to form an aggregated virtual storage for a virtualized desktop environment
US9250946B2 (en) 2013-02-12 2016-02-02 Atlantis Computing, Inc. Efficient provisioning of cloned virtual machine images using deduplication metadata
US9372865B2 (en) 2013-02-12 2016-06-21 Atlantis Computing, Inc. Deduplication metadata access in deduplication file system
US9471590B2 (en) 2013-02-12 2016-10-18 Atlantis Computing, Inc. Method and apparatus for replicating virtual machine images using deduplication metadata
US9766832B2 (en) 2013-03-15 2017-09-19 Hitachi Data Systems Corporation Systems and methods of locating redundant data using patterns of matching fingerprints
US9256611B2 (en) 2013-06-06 2016-02-09 Sepaton, Inc. System and method for multi-scale navigation of data
US9678973B2 (en) 2013-10-15 2017-06-13 Hitachi Data Systems Corporation Multi-node hybrid deduplication
US10545918B2 (en) 2013-11-22 2020-01-28 Orbis Technologies, Inc. Systems and computer implemented methods for semantic data compression
CN106796809B (en) 2014-10-03 2019-08-09 杜比国际公司 The intellectual access of personalized audio
CN108260163B (en) * 2018-03-28 2023-03-24 中兴通讯股份有限公司 Information sending and receiving method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060200733A1 (en) * 2005-03-01 2006-09-07 Stankovic Vladimir M Multi-source data encoding, transmission and decoding using Slepian-Wolf codes based on channel code partitioning
US7271747B2 (en) * 2005-05-10 2007-09-18 Rice University Method and apparatus for distributed compressed sensing
US20070233707A1 (en) * 2006-03-29 2007-10-04 Osmond Roger F Combined content indexing and data reduction
US20070255758A1 (en) * 2006-04-28 2007-11-01 Ling Zheng System and method for sampling based elimination of duplicate data
US7295137B2 (en) * 2005-03-01 2007-11-13 The Texas A&M University System Data encoding and decoding using Slepian-Wolf coded nested quantization to achieve Wyner-Ziv coding
US20080005201A1 (en) * 2006-06-29 2008-01-03 Daniel Ting System and method for managing data deduplication of storage systems utilizing persistent consistency point images
US20080065633A1 (en) * 2006-09-11 2008-03-13 Simply Hired, Inc. Job Search Engine and Methods of Use
US20090103606A1 (en) * 2007-10-17 2009-04-23 Microsoft Corporation Progressive Distributed Video Coding

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2509579C (en) * 2002-12-12 2011-10-18 Finite State Machine Labs, Inc. Systems and methods for detecting a security breach in a computer system
US7240064B2 (en) * 2003-11-10 2007-07-03 Overture Services, Inc. Search engine with hierarchically stored indices
US7653867B2 (en) * 2005-03-01 2010-01-26 The Texas A&M University System Multi-source data encoding, transmission and decoding using Slepian-Wolf codes based on channel code partitioning
JP2007324754A (en) * 2006-05-30 2007-12-13 Ntt Docomo Inc Signal receiving section detector
US7504969B2 (en) * 2006-07-11 2009-03-17 Data Domain, Inc. Locality-based stream segmentation for data deduplication

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100281207A1 (en) * 2009-04-30 2010-11-04 Miller Steven C Flash-based data archive storage system
US20110218972A1 (en) * 2010-03-08 2011-09-08 Quantum Corporation Data reduction indexing
US8423519B2 (en) * 2010-03-08 2013-04-16 Jeffrey Vincent TOFANO Data reduction indexing
US9078015B2 (en) 2010-08-25 2015-07-07 Cable Television Laboratories, Inc. Transport of partially encrypted media
US20120150824A1 (en) * 2010-12-10 2012-06-14 Inventec Corporation Processing System of Data De-Duplication
US9641322B2 (en) * 2012-02-08 2017-05-02 Vixs Systems, Inc. Container agnostic decryption device and methods for use therewith
US20150181308A1 (en) * 2012-02-08 2015-06-25 Vixs Systems, Inc. Container agnostic decryption device and methods for use therewith
US9246511B2 (en) * 2012-03-20 2016-01-26 Sandisk Technologies Inc. Method and apparatus to process data based upon estimated compressibility of the data
US20130254441A1 (en) * 2012-03-20 2013-09-26 Sandisk Technologies Inc. Method and apparatus to process data based upon estimated compressibility of the data
US20140181465A1 (en) * 2012-04-05 2014-06-26 International Business Machines Corporation Increased in-line deduplication efficiency
US9268497B2 (en) * 2012-04-05 2016-02-23 International Business Machines Corporation Increased in-line deduplication efficiency
US20140002918A1 (en) * 2012-06-27 2014-01-02 Marvell World Trade Ltd. Systems and methods for reading and decoding encoded data from a storage device
US10255944B2 (en) * 2012-06-27 2019-04-09 Marvell World Trade Ltd. Systems and methods for reading and decoding encoded data from a storage device
CN105258303A (en) * 2015-11-20 2016-01-20 珠海格力电器股份有限公司 Remote transmission control method and device for air conditioner unit operation data and air conditioner
CN113709510A (en) * 2021-08-06 2021-11-26 联想(北京)有限公司 High-speed data real-time transmission method and device, equipment and storage medium

Also Published As

Publication number Publication date
US7733247B1 (en) 2010-06-08
US20100188273A1 (en) 2010-07-29
CN101741838B (en) 2013-07-31
US8836547B1 (en) 2014-09-16
CN101741838A (en) 2010-06-16
US8138954B2 (en) 2012-03-20

Similar Documents

Publication Publication Date Title
US7733247B1 (en) Method and system for efficient data transmission with server side de-duplication
US8645333B2 (en) Method and apparatus to minimize metadata in de-duplication
US8578058B2 (en) Real-time multi-block lossless recompression
US9680500B2 (en) Staged data compression, including block level long range compression, for data streams in a communications system
US8456332B2 (en) Systems and methods for compression of logical data objects for storage
CN107395209B (en) Data compression method, data decompression method and equipment thereof
US8872677B2 (en) Method and apparatus for compressing data-carrying signals
US20210297708A1 (en) Residual entropy compression for cloud-based video applications
CN112584155B (en) Video data processing method and device
Vestergaard et al. A randomly accessible lossless compression scheme for time-series data
CN112380196B (en) Server for data compression transmission
EP1751873A1 (en) Method and apparatus for structured block-wise compressing and decompressing of xml data
Rathore et al. A brief study of data compression algorithms
Talasila et al. Generalized deduplication: Lossless compression by clustering similar data
Shah et al. The improvised GZIP, a technique for real time lossless data compression
US7750826B2 (en) Data structure management for lossless data compression
Waghulde et al. New data compression algorithm and its comparative study with existing techniques
Hema Data Compression and Source Coding
Jena An improved Lempel-Ziv algorithm for sequential data compression

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION,NEW YO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HE, DAKE;SHEININ, VADIM;SIGNING DATES FROM 20080917 TO 20080919;REEL/FRAME:021853/0625

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: GOOGLE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:027463/0594

Effective date: 20111228

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: GOOGLE LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044101/0610

Effective date: 20170929

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552)

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12