EP2692112A1 - Method for data cache in a gateway - Google Patents

Method for data cache in a gateway

Info

Publication number
EP2692112A1
Authority
EP
European Patent Office
Prior art keywords
data
gateway
blocks
gateways
remote
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP11805865.0A
Other languages
German (de)
French (fr)
Inventor
Gilles Straub
Erwan Le Merrer
Nicolas Le Scouarnec
Serge Defrance
Alexandre Van Kempen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thomson Licensing SAS
Original Assignee
Thomson Licensing SAS
Application filed by Thomson Licensing SAS
Priority to EP11805865.0A
Publication of EP2692112A1
Withdrawn (current legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Definitions

  • the invention concerns a method to cache data in a gateway.
  • the present invention is particularly related to the backup of data.
  • Digital data (photos, videos, documents) is becoming increasingly common and increasingly important. Digital data is stored on disks, which are fragile media prone to failure, burglary or any catastrophic event such as fire. Yet very few people have good backup practices for their personal content, for reasons ranging from not being aware of the importance of backup to the lack of good automated backup policies suited to them.
  • the best approach to backup is to deploy an automated off-site backup, similar to what is done in corporate networks.
  • the off-site feature is very important, as a backup on a second disk attached to the computer leaves the user exposed to burglary, fire or lightning. Yet off-site backup, as well as the related problem of sharing, is inhibited by the slow uplink of the users' Internet connections. Hence, performing a backup requires long periods of uptime to upload the content.
  • the present invention proposes to cache content to backup/upload in a gateway between a local area network and the wide area network.
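  • the present invention proposes a method for data cache on a first gateway comprising a recording medium, the first gateway being connected through a first local area network to at least one local device, and connected through a wide area network to at least one remote device comprising a recording medium.
  • the method comprises the following steps: transmitting through the local area network, from the at least one local device, data to be stored on the at least one remote device, via the first gateway; receiving the data on the first gateway; caching the data in the first gateway; and transmitting, via the wide area network, the data from the first gateway to the at least one remote device for storing it.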
  • the method further comprises a step of ciphering the cached data on the local gateway before the transmitting step.
  • the method further comprises a step of splitting the data into k blocks, each of the k blocks being transmitted to one of k remote devices for storage.
  • the method comprises a further step of adding redundancy data to the data, before the transmission step, obtaining n blocks of data with n>k.
  • the n blocks replicate the k blocks, the n-k additional blocks being duplicates of some of the k blocks.
  • the n blocks are formed using the data of the k blocks to which error correction data are added.
  • the first gateway comprises an index table establishing an index between each block and the remote device storing the block.
  • the first gateway is therefore able to manage the allocation of blocks to the remote devices; this is needed for the retrieval of blocks. This can be managed as a file system.
  • the index, which is small but precious, can be stored in a resilient distributed structure such as a distributed hash table (DHT), or on a safe central server.
  • DHT: distributed hash table
  • remote devices are remote gateways.
  • as gateways are the devices commonly used for interconnecting local area networks and wide area networks, and are powered on most of the time, they are the preferred devices for caching data.
  • remote gateways are connected to local devices through local area networks, the method further comprising a step of transferring the data from the remote gateways to the devices local to the remote gateways through the local area networks, when the devices local to the remote gateways are available.
  • as the gateway may have limited recording capacity, it may be useful to obtain additional storage space by involving the devices local to the remote gateways in the backup process.
  • the k blocks of data or the n blocks of data are each stored on one device local to the remote gateways.
  • before the data are transferred to the devices local to the remote gateways, they are cached in the remote gateways.
  • the transfer of data from the remote gateways can be performed asynchronously, when the devices are available.
  • Each block can moreover be transferred to its destination device when that device is available, and each block can be transferred to its destination device asynchronously from the other blocks, depending on the availability of the destination devices.
  • FIG. 1 represents an overview of a system implementing a preferred embodiment of a method according to the invention
  • On figure 1, four local area networks LAN1, LAN2, LAN3 and LAN4 are connected by a wide area network, preferably of the Internet type. In a preferred embodiment, the four networks are Ethernet networks, but they could be any other networks interconnecting peripherals.
  • Each of the LANs comprises a gateway (GW1 in LAN1, GW2 in LAN2, GW3 in LAN3, GW4 in LAN4) and a set of peripherals: 11, 12, 13 and 14 on LAN1; 21, 22, 23 on LAN2; 31, 32, 33 on LAN3; and 41, 42, 43, 44, 45 on LAN4.
  • the gateways GW1, GW2, GW3 and GW4 are interconnected through the wide area network.
  • the recording medium is preferably a high-capacity recording medium such as a hard disk drive, an optical recording medium (e.g. Blu-ray), a USB memory stick or SDRAM memory.
  • Each device, or at least some of them, also comprises recording facilities.
  • Gateways are considered stable and almost always connected: their stability is much higher than that of the other devices connected to the LAN. To simplify, we can consider that gateways and peers (devices on the local area network) each contribute the same amount of storage space; simply because there are more peers than gateways, more storage is available on peers than on gateways. Gateways are interconnected using a P2P (peer-to-peer) grid or system over the wide area network, whose limited bandwidth leads to long transmission times.
  • P2P peer to peer
  • Peers are considered volatile in the sense that they are not always connected: they connect to the system when the user boots them. However, they provide significant free space and can contribute to the system by offering storage space, albeit with limited availability due to device churn. They also have a high-speed LAN connection to the gateway.
  • a backup application can also be run on the user device automatically without any user interaction, for instance for a backup of photos, personal movies, personal files or folders stored on the user device. This application can be run on a periodic basis (once a week or once a month, for example) or can be launched as soon as the user device connects to the local area network.
  • LAN1 can have a data rate of 7 MB/s, compared with 128 kB/s for the ADSL line interconnecting the gateways. It is therefore advantageous to cache the data on the gateway GW1 before transferring them to the cloud, which is composed of the other devices, mainly gateways, connected to the wide area network.
  • the invention takes full advantage of the difference in data rate between the LAN and the WAN.
  • some redundancy may be added (if the cloud is a distributed architecture): either pure replication of part or all of the data, in which case the data are uploaded several times, or an erasure code, in which case redundancy data are computed and added to the original data. The resulting data are then uploaded to a set of gateways, here GW2, GW3, GW4, using a distributed (P2P-like) protocol, which means that the data are spread over the gateways GW2, GW3, GW4 across the network. Gateways have the advantage of a very low churn and of being almost always online, so the only limiting factors are the Internet connection throughput and the volume of data to be transmitted.
  • the local gateway decides either to keep the data on its local storage (to provide local access to, or sharing of, the data) or to drop the data from its local storage, using its storage as a simple cache. Once all the data have been transmitted to the remote gateways, the data can be considered successfully uploaded / backed up.
  • Gateways among GW2, GW3, GW4 may or may not be selected according to criteria such as their availability or the number of peers connected to them.
  • GW1 is in charge of selecting the gateways that are used for the backup.
  • as the storage space available on the gateways GW2, GW3, GW4 may be limited, an additional step may be useful to offload the gateways toward their local volatile devices 21-23, 31-33, 41-45, if these devices themselves have some recording capacity.
  • Data is offloaded from the remote gateways to their local volatile devices. This allows maintaining the amount of required storage in the gateway at a reasonable level.
  • This transmission of data from the remote gateways to their local devices can be done as soon as the devices are available. "Available" can mean that the devices are connected to the local area network, the data remaining on the remote gateways as long as the devices are unavailable (i.e. not switched on). "Available" can also mean that the devices have enough storage space for the data they have to store.
  • Blocks of data are therefore stored on storage peers: for instance, each block can be stored on a different storage peer, or several blocks can be stored on one peer.
  • the selection of blocks and the selection of peers storing these blocks is done by GW1.
  • A remote peer that is always connected can be used more than a remote peer that is seldom connected.
  • the recording capacity of each peer is also a criterion that GW1 takes into account when assigning a peer to record one or more blocks.
  • the reliability of the remote peers and the type of peer (mobile or desktop) can also be taken into account.
  • Remote gateways GW2, GW3, GW4 can also be used as cache for blocks.
  • a block first reaches the remote (storage) gateways, where it is cached for a period of time. This period corresponds to the time spent waiting for the final storage peer 21-23, 31-33, 41-45 to join the home network, so that the storage gateway can push the data (the block) to it.
  • the transmission of data from computer 12 to GW1, the transmission of data from GW1 to GW2, GW3 and GW4, and the transmission of data from GW2, GW3 and GW4 to their local peripheral devices can overlap or can be done sequentially, one after the other.
  • the content to be backed up is assumed to be ciphered prior to its transmission to the cloud, for privacy reasons.
  • GW1 adds redundancy to the data to be stored.
  • the content is split into a plurality of k blocks and redundancy is added by expanding this set of k blocks into a bigger set of n blocks using erasure correcting codes so that any subset of k out of n blocks allows recovering the original data.
  • a Distributed Hash table in GW1 maintains an index allowing finding which peers and/or gateways store a given block; peers maintain the list of blocks they have uploaded to the application.
  • the present invention actively leverages gateways that have only been considered as transparent devices by state of the art approaches.
  • the gateway GW1 is in charge of adding the redundancy; this allows a faster transfer from the peer 12 to the gateway GW1, as a smaller volume of data is involved. Once this is done, GW1 starts uploading data to the other gateways GW2, GW3, GW4, this time at WAN speed.
  • the data can be downloaded to other gateways in a delayed manner.
  • the usage of bandwidth can be smoothed to provide users with a more transparent service (i.e., using the uplink for backup when users are not using their computer or Internet connection). Indeed, using the whole uplink while the user is using his computer may severely degrade his Internet browsing experience and thus discourage him from using the system.
  • Such architecture also allows the provider to delay transfer from gateways to the Internet so as to smooth the usage of its core network.
  • Data is quickly uploaded from a peer to the gateway.
  • the peer is free to leave the home network rapidly, as the gateway acts as a background agent on behalf of the peer. While saving an archive of 1 Gbyte over a 128 kB/s uplink takes a little more than 2 hours (with a replication ratio of 1; K times longer with a replication ratio of K), the same archive is uploaded from the peer to the gateway within 140 seconds through a 7 MB/s home network. After 140 seconds, the peer is free to leave the home network (which, for instance, makes an enjoyable mobile experience possible), while the gateway processes the archive (replication, block division, ciphering, ...) and backs it up as a background task.
  • the requesting peer 12 informs its gateway GW1 of the data it is interested in.
  • the gateway GW1 carries on the download on behalf of the client peer 12 by contacting the remote gateways to which the data were uploaded, or the gateways handling the peers to which the data were uploaded. If data were offloaded to some peer, they are fetched as soon as possible by the corresponding remote gateway, which then sends them to the requesting client's gateway GW1.
  • once the client gateway GW1 has succeeded in getting the whole content, it informs peer 12, as soon as the peer connects back, that its retrieval request has been completed.
  • caching the content to back up on the gateways allows the user to put his devices on standby as usual, and we also show that it greatly reduces the time to backup (i.e., the age of the backup).
  • caching at the edge is much more efficient than adding a cache within the Internet.
  • the total amount of storage needed for caching is also reduced.
  • the structure of the network is therefore taken into account. Indeed, most P2P applications ignore the presence of a gateway in between the peer and the Internet. As a result they do not leverage its presence while it could greatly improve the performance. Leveraging this gateway can make possible P2P storage even in cases where peers have a rather low availability, provided that they connect to the network frequently enough.
  • combining peers' fast but transient connections with gateways' slow but permanent connections is of real interest. If peers upload directly to the Internet, they can upload up to 460 MB/day. However, if the gateway is considered as an active equipment that can perform caching, a peer can upload 24 GB/day to the gateway and the gateway can upload up to 11 GB/day. Turning the gateway into an active device can significantly enhance online storage services (P2P or cloud). It is proposed to offload upload tasks to the gateway as it provides several advantages:
  • An average value of this waiting time for a peer whose availability is 30 per cent and which connects to the home network twice a day can be estimated at 3 hours, while the Time to Backup for an archive of 1 Gbyte with a replication ratio of 3 and an upload bandwidth of 128 kB/s is 6.5 hours. With no active gateways, this last value is increased by the waiting time, i.e. by about 50 per cent.
  • Figure 2 represents a flow chart of a preferred embodiment of the method according to the invention.
  • device 12 transfers data to gateway GW1.
  • the data transferred are preferably intended to be backed up on a remote device.
  • the data transfer can be automatic, as explained earlier or can be controlled by a user of device 12.
  • the data are transmitted to a gateway GW1 via a local area network connecting device 12 and gateway GW1 .
  • the data are cached into gateway GW1 in a step E2.
  • device 12 can then be disconnected; it is no longer involved in the backup process.
  • the devices take advantage of the high-speed data transfer on the local area network, so the data are transferred very quickly to the gateway GW1 and the resources of device 12 are not tied up for a long period by this task.
  • the gateway GW1 ciphers the data before transferring them to the cloud, composed of the Internet, the remote gateways and their devices.
  • Any ciphering methods can be used in order to protect the content in an efficient manner.
  • redundancy is added to the data in a step E4.
  • Redundancy can be added, for instance, by duplicating the content.
  • Duplication makes it possible to save the same data on several remote gateways and devices, which increases the chances of recovering the content. For instance, when a remote device containing part of a content to be retrieved is disconnected, the user may retrieve his content faster if that part is also stored on another device. The more the blocks are duplicated, the higher the chance of recovering a backup copy of the content, but the greater the storage resources consumed.
  • the duplication of data can be a parameter adjusted by the user device according to the type of data to be saved or by the user himself, or adjusted by the gateway GW1 according to the type of data, to the storage capacities of the cloud, to the requirements input by a user or a system administrator...
  • Redundancy can also be added for error protection.
  • in a step E5, the content is split into a number of blocks.
  • the number of blocks is adjusted by the gateway GW1 in accordance with the size of the data, with the bandwidth in the wide area network, with the number of selected remote gateways and/or devices local to these remote gateways...
  • the number of blocks can also be set by a network administrator.
  • in a step E6, the blocks are sent to the cloud in order to be recorded on the remote gateways GW2, GW3, GW4, each block being addressed to one of these gateways.
  • in a step E7, the blocks are received and cached in the remote gateways GW2, GW3 and GW4.
  • the data can be stored permanently in these gateways, but the gateways can also decide to store the content on local devices connected to them.
  • GW1 can also prevent the remote gateways from storing the data on their local devices, for instance if the data are important or confidential, or if they need to be retrieved often.
  • the data, in the form of blocks, are transferred to the local devices. For instance, one or several blocks can be stored on some or all of the devices connected to the gateway. These devices must be equipped with storage facilities.
  • the remote gateways can also store part of the content on their own recording medium. The remote gateways can also delay the transmission of the data to their local devices until these devices are connected or available.
  • the transfer of data from the remote gateways can be performed asynchronously for each block of data, when the devices are available.
  • At least the remote gateways are in charge of knowing which device on their local area network keeps which block.
  • GW1 can have a table indicating which remote gateway stores which data but each remote gateway GW2, GW3 and GW4 is in charge of keeping its own table indexing which local device keeps which block.
  • GW1 can also store a copy of the indexing tables of the remote gateways where data from its local devices have been stored.
  • the index which is precious can also be stored in some resilient data structure (distributed hash table) or also stored onto a central server.
  • the index can also be stored by peers, leaving a more "passive" role to the gateways, even if they keep caching the data
  • the invention is not limited to the embodiments given here but it also applies to other types of local area networks and wide area networks and to other types of devices.
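The throughput and time-to-backup figures quoted in the list above can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 GB = 10^9 bytes) and adds one labeled assumption of its own: a peer is connected about one hour per day. That value reproduces the 460 MB/day figure for a direct upload and gives about 25 GB/day over the LAN, the same order as the 24 GB/day quoted. The 3-hour waiting time and the replication ratio of 3 are taken from the text.

```python
HOUR = 3600.0
DAY = 24 * HOUR
LAN_RATE = 7e6          # bytes/s, home network (figure from the text)
WAN_RATE = 128e3        # bytes/s, ADSL uplink (figure from the text)
PEER_ONLINE = 1 * HOUR  # assumption: peer connected about one hour per day


def daily_volume(rate: float, seconds_online: float) -> float:
    """Bytes a link can carry in one day, given how long it is usable."""
    return rate * seconds_online


def time_to_backup(size: float, replication: int, wait: float = 0.0) -> float:
    """Seconds until all replicas have left through the WAN uplink, plus any
    time spent waiting before the upload can start."""
    return wait + replication * size / WAN_RATE


direct = daily_volume(WAN_RATE, PEER_ONLINE)  # peer uploads straight to the Internet
to_gw = daily_volume(LAN_RATE, PEER_ONLINE)   # peer uploads to its gateway
gw_out = daily_volume(WAN_RATE, DAY)          # always-on gateway uploads all day
print(f"{direct / 1e6:.1f} MB/day, {to_gw / 1e9:.1f} GB/day, {gw_out / 1e9:.1f} GB/day")
# prints "460.8 MB/day, 25.2 GB/day, 11.1 GB/day"

active = time_to_backup(1e9, 3)                  # active gateway: upload starts at once
passive = time_to_backup(1e9, 3, wait=3 * HOUR)  # no active gateway: waiting time added
print(f"{active / HOUR:.1f} h vs {passive / HOUR:.1f} h (+{passive / active - 1:.0%})")
# prints "6.5 h vs 9.5 h (+46%)"
```

The computed 46 per cent increase is what the text rounds to "about 50 per cent".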

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Small-Scale Networks (AREA)

Abstract

The present invention proposes to cache content to back up/upload in the gateway, i.e., the equipment at the edge between the home network and the Internet. Gateways are considered stable and almost always connected: their stability is much higher than that of the other devices connected to the LAN. Moreover, the data rate on local area networks is much higher than on the wide area network. When data are uploaded to the cloud, the present invention proposes to cache them in the local gateway, so that once the data have been uploaded to the local gateway, the device can leave the local area network and the gateway then uploads the content to the cloud (for instance remote gateways, or peers connected to these remote gateways) to back it up.

Description

METHOD FOR DATA CACHE IN A GATEWAY
FIELD OF THE INVENTION
The invention concerns a method to cache data in a gateway. The present invention is particularly related to the backup of data.
BACKGROUND OF THE INVENTION
Digital data (photos, videos, documents) is becoming increasingly common and increasingly important. Digital data is stored on disks, which are fragile media prone to failure, burglary or any catastrophic event such as fire. Yet very few people have good backup practices for their personal content, for reasons ranging from not being aware of the importance of backup to the lack of good automated backup policies suited to them. The best approach to backup is to deploy an automated off-site backup, similar to what is done in corporate networks. The off-site feature is very important, as a backup on a second disk attached to the computer leaves the user exposed to burglary, fire or lightning. Yet off-site backup, as well as the related problem of sharing, is inhibited by the slow uplink of the users' Internet connections. Hence, performing a backup requires long periods of uptime to upload the content. Furthermore, people are shifting their usage from desktop computers to portable/mobile devices that have much shorter uptimes, while at the same time producing more digital content to back up and share. Yet users would like any upload task required for sharing or backup to be handled transparently within the uptime of their devices, without having to ensure that the devices remain on long enough, as this requires intrusive changes such as disabling automatic standby and plugging the device in.
The slow uplink combined with short connection periods inhibits large-scale deployments of online storage applications (backup, sharing). This is becoming increasingly true as the size of the content to back up increases, while ADSL bandwidth has changed little in recent years. For example, uploading 1 GB (a 300-photo album) to online storage requires at least 2 hours of continuous uptime. Hence, these applications require users to change their behavior (i.e., to leave their computers powered on all night), which limits their deployment and makes automated and seamless backup impossible.
SUMMARY OF THE INVENTION
The present invention proposes to cache content to backup/upload in a gateway between a local area network and the wide area network.
To this end, the present invention proposes a method for data cache on a first gateway comprising a recording medium, the first gateway being connected through a first local area network to at least one local device, and connected through a wide area network to at least one remote device comprising a recording medium. According to the invention, the method comprises the following steps:
- Transmitting through the local area network, from the at least one local device, data to be stored on the at least one remote device, via the first gateway,
- Receiving the data on the first gateway,
- Caching the data in the first gateway,
- Transmitting, via the wide area network, the data from the first gateway to the at least one remote device for storing it.
According to a preferred embodiment, the method further comprises a step of ciphering the cached data on the local gateway before the transmitting step.
This preserves the privacy of user data stored on remote devices, which are outside the control of the user to whom the data belong.
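The four claimed steps, together with the optional ciphering step, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the class `FirstGateway`, the `send_wan` callback standing in for the WAN transport, and the `keystream_xor` function are all hypothetical. `keystream_xor` (XOR with a SHA-256 counter-mode keystream) is only a placeholder so that the example runs with the standard library. The text allows any ciphering method; a real gateway would use a vetted authenticated cipher such as AES-GCM.

```python
import hashlib
import os
from typing import Callable, Dict


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Placeholder cipher (illustration only): XOR with a SHA-256 counter-mode
    keystream. Applying it twice with the same key and nonce restores the input."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class FirstGateway:
    """Hypothetical model of the first gateway (GW1)."""

    def __init__(self, key: bytes, send_wan: Callable[[str, bytes], None]) -> None:
        self.key = key
        self.send_wan = send_wan           # assumed WAN transport: (remote_id, payload)
        self.cache: Dict[str, bytes] = {}  # stands in for the gateway's recording medium

    def receive(self, name: str, data: bytes) -> None:
        # Transmitting over the LAN, receiving and caching: once this returns,
        # the local device is free to leave the network.
        self.cache[name] = data

    def upload(self, name: str, remote_id: str, keep: bool = True) -> None:
        # Ciphering, then transmitting over the slow WAN to the remote device.
        nonce = os.urandom(16)
        payload = nonce + keystream_xor(self.key, nonce, self.cache[name])
        self.send_wan(remote_id, payload)
        if not keep:
            del self.cache[name]           # use the storage as a simple cache
```

Because `receive` only writes to the gateway's cache, the local device can disconnect as soon as it returns, while the slow `upload` runs later on its own. The `keep` flag mirrors the choice between keeping a local copy for sharing and using the gateway storage as a simple cache.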
Preferentially, the method further comprises a step of splitting the data into k blocks, each of the k blocks being transmitted to one of k remote devices for storage.
As the data are split, it is easier to find storage space for individual blocks than for the whole data on a single recording medium. Preferentially, the method comprises a further step of adding redundancy data to the data, before the transmission step, obtaining n blocks of data with n > k. In a preferred embodiment, the n blocks replicate the k blocks, the n-k additional blocks being duplicates of some of the k blocks.
This makes data retrieval easier and the system safer, as the data are duplicated: if one of the recording media is not available when the user needs to retrieve his data, the block it contains can be retrieved from another recording medium.
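The splitting and replication steps can be illustrated with a short sketch. The function names are hypothetical, and so is the policy of cycling through the k blocks to choose which ones to duplicate; the text only requires that the n-k extra blocks duplicate some of the k blocks.

```python
from typing import List, Tuple


def split_into_blocks(data: bytes, k: int) -> List[bytes]:
    """Split data into k consecutive blocks of at most ceil(len(data) / k) bytes."""
    if k < 1:
        raise ValueError("k must be at least 1")
    size = -(-len(data) // k)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(k)]


def replicate(blocks: List[bytes], n: int) -> List[Tuple[int, bytes]]:
    """Expand k blocks into n >= k (index, block) entries; the n - k extra
    entries duplicate existing blocks, cycling from the first one."""
    k = len(blocks)
    if k == 0 or n < k:
        raise ValueError("need at least one block and n >= k")
    return [(i % k, blocks[i % k]) for i in range(n)]
```

For example, `split_into_blocks(b"holiday photos", 3)` yields three blocks, and `replicate(blocks, 5)` adds copies of blocks 0 and 1. Keeping the block index with each entry lets the gateway record which copies belong together.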
According to a variant, the n blocks are formed using the data of the k blocks to which error correction data are added.
Adding error correction data protects the content against network failures.
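The text does not name an error-correcting code for this variant. The smallest code in which any k of the n blocks are enough to rebuild the data is a single XOR parity block (n = k + 1), sketched below with hypothetical function names. It assumes equal-length blocks, so a shorter last block would be padded beforehand. Codes such as Reed-Solomon extend the same property to larger n, at a higher computational cost.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_with_parity(blocks: List[bytes]) -> List[bytes]:
    """Return n = k + 1 blocks: the k data blocks plus one XOR parity block."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return blocks + [reduce(xor_bytes, blocks)]


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the k data blocks when at most one of the n blocks is missing (None)."""
    missing = [i for i, b in enumerate(received) if b is None]
    if len(missing) > 1:
        raise ValueError("a single parity block tolerates only one lost block")
    if missing:
        present = [b for b in received if b is not None]
        received = list(received)
        # The XOR of all surviving blocks equals the lost one.
        received[missing[0]] = reduce(xor_bytes, present)
    return received[:-1]  # drop the parity block
```

Losing any single block, data or parity, still leaves k blocks from which the original data can be rebuilt.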
Preferentially, the first gateway comprises an index table establishing an index between each block and the remote device storing the block.
The first gateway is therefore able to manage the allocation of blocks to the remote devices; this is needed for the retrieval of blocks. This can be managed as a file system.
The index, which is small but precious, can be stored in a resilient distributed structure such as a distributed hash table (DHT), or on a safe central server.
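Below is a minimal sketch of such an index table. It assumes that blocks and devices are identified by strings. The `BlockIndex` class and its JSON serialization are illustrative choices that do not come from the text. Serializing the table reflects the remark that the small index can be copied to a DHT or to a central server.

```python
import json
from collections import defaultdict
from typing import Dict, List


class BlockIndex:
    """Hypothetical index kept by the first gateway: block id -> remote devices."""

    def __init__(self) -> None:
        self._where: Dict[str, List[str]] = defaultdict(list)

    def record(self, block_id: str, device_id: str) -> None:
        if device_id not in self._where[block_id]:
            self._where[block_id].append(device_id)

    def locate(self, block_id: str) -> List[str]:
        # .get() does not create an entry for unknown blocks.
        return list(self._where.get(block_id, []))

    def to_json(self) -> str:
        # The index is small, so it can be serialized and pushed to a DHT
        # or a central server as a safety copy.
        return json.dumps(self._where, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "BlockIndex":
        index = cls()
        for block_id, devices in json.loads(text).items():
            for device_id in devices:
                index.record(block_id, device_id)
        return index
```

A duplicated block simply has several entries, so retrieval can try the next device when the first one is unavailable.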
According to the best embodiment, remote devices are remote gateways.
As gateways are the devices commonly used for interconnecting local area networks and wide area networks, and are powered on most of the time, they are the preferred devices for caching data.
In a preferred embodiment, remote gateways are connected to local devices through local area networks, the method further comprising a step of transferring the data from the remote gateways to the devices local to the remote gateways through the local area networks, when the devices local to the remote gateways are available.
As the gateway may have limited recording capacity, it may be useful to obtain additional storage space by involving the devices local to the remote gateways in the backup process.
Preferentially, the k blocks of data or the n blocks of data are each stored on one device local to the remote gateways.
In order to optimize retrieval, it is preferable that duplicated blocks are not stored on the same gateway. A duplicated block can then be retrieved when one of the devices used to back it up is unavailable, provided the other one is available.
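One way to honour this placement rule is sketched below. The function `place_copies` and its rotating-start policy are hypothetical. The function assigns every copy of a block to a different gateway, and it fails with an explicit error when a block has more copies than there are gateways, because the rule cannot then be met.

```python
from typing import Dict, List


def place_copies(copies: Dict[int, int], gateways: List[str]) -> Dict[int, List[str]]:
    """Map each block index to the gateways holding its copies, so that no
    gateway holds two copies of the same block; a rotating start spreads the load."""
    placement: Dict[int, List[str]] = {}
    start = 0
    for block_index, count in sorted(copies.items()):
        if count > len(gateways):
            raise ValueError(
                f"block {block_index}: {count} copies but only {len(gateways)} gateways")
        placement[block_index] = [gateways[(start + j) % len(gateways)] for j in range(count)]
        start = (start + count) % len(gateways)
    return placement
```

For example, with two copies of blocks 0 and 1, one copy of block 2, and gateways GW2, GW3 and GW4, each gateway ends up with no more than two blocks, and the two copies of any block always sit on different gateways.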
In one embodiment, before transferring the data to the devices local to the remote gateways, the data are cached into the remote gateways.
This makes it possible to use devices local to the remote gateways even when they are not immediately available: the transfer of data from the remote gateways can be performed asynchronously, when the devices are available. Each block can moreover be transferred to its destination device when that device is available, and each block can be transferred to its destination device asynchronously from the other blocks, depending on the availability of the destination devices.
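The caching and asynchronous offloading performed by a remote gateway can be sketched as one queue per destination device. In this hypothetical model, `RemoteGateway`, the `push` callback and the device identifiers are all assumptions. The sender names the destination device for each block, consistent with GW1 selecting the storage peers. Each queue drains independently: blocks for a connected device are delivered, while blocks for an absent device stay cached on the gateway.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set, Tuple


class RemoteGateway:
    """Hypothetical model of GW2..GW4: cache incoming blocks and push each one
    to its destination local device whenever that device is available."""

    def __init__(self, push: Callable[[str, str, bytes], None]) -> None:
        self.push = push  # assumed LAN transfer: (device_id, block_id, data)
        self.pending: Dict[str, Deque[Tuple[str, bytes]]] = defaultdict(deque)
        self.online: Set[str] = set()
        self.local_index: Dict[str, str] = {}  # block id -> local device keeping it

    def receive_block(self, block_id: str, data: bytes, device_id: str) -> None:
        # Cache the block; deliver it at once only if the device is already online.
        self.pending[device_id].append((block_id, data))
        if device_id in self.online:
            self._flush(device_id)

    def device_connected(self, device_id: str) -> None:
        self.online.add(device_id)
        self._flush(device_id)

    def device_disconnected(self, device_id: str) -> None:
        self.online.discard(device_id)

    def _flush(self, device_id: str) -> None:
        # Each device's queue drains independently of the others.
        queue = self.pending[device_id]
        while queue and device_id in self.online:
            block_id, data = queue.popleft()
            self.push(device_id, block_id, data)
            self.local_index[block_id] = device_id
```

The `local_index` dictionary plays the role of the per-gateway table recording which local device keeps which block.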
BRIEF DESCRIPTION OF THE DRAWINGS
Other characteristics and advantages of the invention will appear through the description of a non-limiting embodiment of the invention, which will be illustrated, with the help of the enclosed drawings among which:
- Figure 1 represents an overview of a system implementing a preferred embodiment of a method according to the invention,
- Figure 2 represents a flow chart of a preferred embodiment of the method according to the invention.
DETAILED DESCRIPTION OF THE INVENTION
On figure 1 , four local area networks LAN1 , LAN2, LAN3 and LAN4 are connected by a wide area network, preferably of internet type. In a preferred embodiment there 4 networks of Ethernet types but could be any other networks interconnecting peripherals.
Each of the LANs comprises a gateway GW, (GW1 in LAN1 , GW2 in LAN2, GW3 in LAN3, GW4 in LAN4) and comprises a set of peripherals, 1 1 , 12, 13 and 14 on LAN1 ; 21 , 22, 23 on LAN2; 31 , 32, 33 on LAN3 and 41 , 42, 43, 44, 45 on LAN4.
The gateways GW1 , GW2, GW3 and GW4 are interconnected though the wide area network.
Each gateway or at least some of them comprise a recording medium. The recording medium is preferably a high capacity recording medium such as a hard disk drive, optical recording medium, blu ray, USB stick memory, SDRAM memory....
Each device or at least some of them comprise also recording facilities.
Gateways are considered stable and almost always connected; their stability is much higher than that of the other devices connected to the LAN. For simplicity, we can consider that gateways and peers (devices on the local area networks) each contribute the same amount of storage space; simply because there are more peers than gateways, more storage is available on peers than on gateways. Gateways are interconnected with one another through a P2P (peer-to-peer) grid or system, and their access bandwidth is limited, which leads to long transmission times.
Peers are considered volatile in the sense that they are not always connected: they join the system when the user boots them. However, they offer significant free space and can contribute storage to the system, albeit with limited availability due to device churn. They also have a high-speed LAN connection to the gateway.
When a user wants to back up some data, the present invention proposes that the data be cached in the local gateway before being transferred to remote devices for storage. A backup application can also run automatically on the user device without any user interaction, for instance to back up photos, personal movies, or personal files or folders stored on the device. This application can run periodically (for example once a week or once a month) or can be launched as soon as the user device connects to the local area network.
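The two triggering policies just described (periodic, or on connection to the LAN) can be illustrated with a small decision function. This is a minimal sketch only: the function name `should_run_backup` and its parameters are hypothetical and do not come from the described system.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_run_backup(now: datetime,
                      last_backup: Optional[datetime],
                      period: timedelta,
                      just_connected: bool,
                      run_on_connect: bool = True) -> bool:
    """Hypothetical device-side check: should the backup application start now?"""
    if run_on_connect and just_connected:
        return True                      # launch as soon as the device joins the LAN
    if last_backup is None:
        return True                      # nothing has been backed up yet
    return now - last_backup >= period   # periodic mode, e.g. once a week


weekly = timedelta(weeks=1)
now = datetime(2011, 3, 31, 12, 0)
print(should_run_backup(now, now - timedelta(days=8), weekly, just_connected=False))
print(should_run_backup(now, now - timedelta(days=2), weekly, just_connected=False))
print(should_run_backup(now, now - timedelta(days=2), weekly, just_connected=True))
```

In a real deployment the `just_connected` flag would come from whatever network-event mechanism the device offers; that mechanism is outside the scope of this sketch.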
For instance, when the user of computer 12 on LAN1 wants to back up some of the data on device 12, the data are transferred to gateway GW1 at the speed of the local LAN interface; the gateway therefore acts as a cache. This takes advantage of the high-speed local area network LAN1: for instance, LAN1 can have a data rate of 7 MB/s, compared with 128 kB/s for an ADSL line interconnecting the gateways. It is therefore advantageous to cache the data on gateway GW1 before transferring them to the cloud, which is composed of the other devices, mainly gateways, connected to the wide area network.
Thus, once the data have been transferred to the local gateway GW1, computer 12 can be switched off; it plays no further part in the backup process.
The invention takes full advantage of the difference in data rate between the LAN and the WAN.
At this point, the data have been partially backed up, but the final level of reliability has not yet been reached.
As soon as some data are stored on gateway GW1, redundancy may be added (if the cloud has a distributed architecture), either by pure replication of part or all of the data, in which case the data are uploaded several times, or by an erasure code, in which case redundancy data are computed and added to the original data. The resulting data are then uploaded to a set of gateways, here GW2, GW3 and GW4, using a distributed (P2P-like) protocol, so that the data are spread over these gateways across the network. Gateways have the advantage of very low churn and of being almost always online; the only limiting factors are therefore the Internet connection throughput and the volume of data to be transmitted. Depending on how the gateway has been configured (there are two possible modes of operation), once a piece of data has been successfully transmitted to the remote gateways, the local gateway either keeps the data on its local storage (to provide local access to, or sharing of, the data) or drops it from its local storage, using that storage as a simple cache. Once all the data have been transmitted to the remote gateways, the data can be considered successfully uploaded, i.e. backed up.
Some of the gateways GW2, GW3, GW4 may or may not be selected according to criteria such as their availability or the number of peers connected to them. GW1 is in charge of selecting the gateways used for the backup.
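The two configuration modes of the local gateway (keep a copy, or use the storage as a pure cache) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the class names `CacheMode` and `GatewayCache` are hypothetical, and the acknowledgement from the remote gateways is modelled as a simple method call rather than a real network protocol.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class CacheMode(Enum):
    KEEP = "keep"  # keep a copy after upload, for local access/sharing
    DROP = "drop"  # local storage is only a cache


@dataclass
class GatewayCache:
    mode: CacheMode
    pending: Dict[str, bytes] = field(default_factory=dict)  # cached, not yet confirmed remotely
    kept: Dict[str, bytes] = field(default_factory=dict)     # retained after upload (KEEP mode only)

    def cache(self, piece_id: str, data: bytes) -> None:
        """Store a piece received from a local device at LAN speed."""
        self.pending[piece_id] = data

    def on_uploaded(self, piece_id: str) -> None:
        """Called once the remote gateways have acknowledged the piece."""
        data = self.pending.pop(piece_id)
        if self.mode is CacheMode.KEEP:
            self.kept[piece_id] = data
        # In DROP mode the piece is simply released from local storage.

    def backup_complete(self) -> bool:
        """The backup counts as done once every piece has reached the remote gateways."""
        return not self.pending


gw = GatewayCache(CacheMode.DROP)
gw.cache("p1", b"...")
gw.cache("p2", b"...")
gw.on_uploaded("p1")
print(gw.backup_complete())  # p2 is still pending
gw.on_uploaded("p2")
print(gw.backup_complete(), gw.kept)
```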
According to a preferred embodiment of the invention, since the storage space available on gateways GW2, GW3, GW4 may be limited, a further step may be useful to offload data from these gateways to their local volatile devices 21-23, 31-33, 41-45, provided these devices themselves have some recording capacity. Offloading data from the remote gateways to their local volatile devices keeps the amount of storage required on the gateways at a reasonable level. This transmission from the remote gateways to their local devices can take place as soon as the devices are available. "Available" can mean that the devices are connected to the local area network, the data remaining on the remote gateways as long as the devices are not available (e.g. switched off); it can also mean that the devices have enough storage space for the data they are to store.
Blocks of data are therefore stored on storage peers: for instance, each block can be stored on a different storage peer, or several blocks can be stored on the same peer. In either case, the selection of the blocks and of the peers storing them is made by GW1. A remote peer that is always connected can be used more than a remote peer that is seldom connected. The recording capacity of each peer is another criterion that GW1 takes into account when assigning a peer to record one or more blocks. The reliability of the remote peers and the type of peer (mobile, desktop, ...) can also be taken into account.
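The offloading step at a remote gateway, using both meanings of "available" given above, can be sketched as a loop over cached blocks in which each block moves independently of the others. This is a minimal sketch under stated assumptions: `LocalDevice`, `RemoteGateway` and `offload` are hypothetical names, the target device of each block is assumed to have been chosen already, and how the gateway learns that a device has connected is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LocalDevice:
    name: str
    connected: bool
    free_bytes: int
    blocks: Dict[str, bytes] = field(default_factory=dict)

    def available_for(self, size: int) -> bool:
        # "Available" in both senses of the text: on the LAN AND with enough free space.
        return self.connected and self.free_bytes >= size


@dataclass
class RemoteGateway:
    # block id -> (block data, name of the local device chosen to hold it)
    cached: Dict[str, Tuple[bytes, str]] = field(default_factory=dict)

    def offload(self, devices: Dict[str, LocalDevice]) -> List[str]:
        """Push each cached block whose target device is available; keep the others cached."""
        moved = []
        for block_id, (data, target) in list(self.cached.items()):
            device = devices[target]
            if device.available_for(len(data)):
                device.blocks[block_id] = data
                device.free_bytes -= len(data)
                del self.cached[block_id]
                moved.append(block_id)
        return moved


devices = {"21": LocalDevice("21", True, 1000), "22": LocalDevice("22", False, 1000)}
gw2 = RemoteGateway({"b1": (b"x" * 100, "21"), "b2": (b"y" * 100, "22")})
print(gw2.offload(devices))   # only b1 moves: device 22 is switched off
devices["22"].connected = True
print(gw2.offload(devices))   # b2 follows once device 22 joins the LAN
```

Calling `offload` again whenever a device connects gives the asynchronous, per-block behaviour described in the text.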
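The selection criteria listed above (connection rate, recording capacity, reliability, peer type) could be combined by GW1 into a placement rule such as the one below. The text only names the criteria; the weights, the score formula and the names `PeerInfo`, `score` and `assign_blocks` are all assumptions made for illustration. Dividing each peer's load by its score lets well-connected peers receive more blocks while still spreading blocks over several peers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PeerInfo:
    name: str
    availability: float  # fraction of time the peer is online (0..1)
    reliability: float   # hypothetical metric, e.g. past storage success rate (0..1)
    free_bytes: int
    mobile: bool


def score(p: PeerInfo) -> float:
    # Hypothetical weighting: the text lists the criteria, not how to combine them.
    type_factor = 0.8 if p.mobile else 1.0
    return (0.6 * p.availability + 0.4 * p.reliability) * type_factor


def assign_blocks(block_sizes: Dict[str, int], peers: List[PeerInfo]) -> Dict[str, str]:
    """GW1-side placement: map each block id to a peer able to hold it."""
    free = {p.name: p.free_bytes for p in peers}
    load = {p.name: 0 for p in peers}
    usable = [p for p in peers if score(p) > 0]
    placement = {}
    for block_id, size in block_sizes.items():
        candidates = [p for p in usable if free[p.name] >= size]
        if not candidates:
            raise RuntimeError(f"no peer can hold block {block_id}")
        # Lower (load + 1) / score wins: good peers take more blocks, but not all of them.
        best = min(candidates, key=lambda p: (load[p.name] + 1) / score(p))
        placement[block_id] = best.name
        free[best.name] -= size
        load[best.name] += 1
    return placement


peers = [
    PeerInfo("A", availability=0.9, reliability=0.9, free_bytes=100, mobile=False),
    PeerInfo("B", availability=0.9, reliability=0.9, free_bytes=100, mobile=True),
    PeerInfo("C", availability=0.2, reliability=0.5, free_bytes=100, mobile=False),
]
print(assign_blocks({"b1": 10, "b2": 10, "b3": 10}, peers))
```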
It is also possible that a part of the data remains on the remote gateways GW2, GW3, GW4.
Remote gateways GW2, GW3, GW4 (connected to storage peers 21-23, 31-33 and 41-45 respectively) can also be used as caches for blocks. A block first reaches the remote (storage) gateway, where it is cached for a period of time: the time it takes for the final storage peer 21-23, 31-33, 41-45 to join its home network so that the storage gateway can push the block to it. However, it is fair to consider the data backed up once the blocks have reached the storage gateways, because once the blocks are stored on gateways GW2, GW3, GW4, the data are secured.
The transmission of data from computer 12 to GW1, from GW1 to GW2, GW3 and GW4, and from GW2, GW3 and GW4 to their local peripheral devices can overlap or can take place sequentially, one after the other.
According to the best embodiment of the invention, the content to be backed up is assumed to be ciphered before its transmission to the cloud, for privacy reasons.
To achieve sufficient reliability, GW1 adds redundancy to the data to be stored. The content is split into k blocks, and redundancy is added by expanding this set of k blocks into a larger set of n blocks using erasure-correcting codes, so that any subset of k of the n blocks allows the original data to be recovered. A distributed hash table in GW1 maintains an index for finding which peers and/or gateways store a given block; peers maintain the list of blocks they have uploaded to the application. The present invention actively leverages gateways, which state-of-the-art approaches have treated only as transparent devices. Here, gateway GW1 is in charge of adding the redundancy, which makes the transfer from peer 12 to gateway GW1 faster since a smaller volume of data is involved. Once this is done, GW1 starts uploading the data to the other gateways GW2, GW3, GW4, this time at WAN speed.
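The k-out-of-n property described above can be obtained with a Reed-Solomon-style code. The sketch below is one possible instance, chosen by us rather than specified by the text: it works over the prime field GF(257) (so every byte is a field element, but redundancy symbols may take the value 256 and are therefore kept as integers), it is systematic (the first k blocks are the original data), and it recovers any lost block by Lagrange interpolation. The names `encode` and `decode` are hypothetical, and the code favours clarity over speed.

```python
from typing import Dict, List

P = 257  # prime field GF(257): every byte value 0..255 is a field element


def _interpolate(points: List[tuple], x: int) -> int:
    """Value at x of the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # pow(den, P-2, P) = 1/den
    return total


def encode(data: bytes, k: int, n: int) -> List[List[int]]:
    """Split data into k blocks and expand them to n blocks (systematic: first k = data)."""
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < 257")
    size = -(-len(data) // k)                      # ceil(len / k)
    padded = data + bytes(size * k - len(data))    # zero padding
    blocks = [list(padded[i * size:(i + 1) * size]) for i in range(k)]
    # Per byte position, data block i holds f(i + 1); redundancy block x - 1 holds f(x).
    for x in range(k + 1, n + 1):
        blocks.append([_interpolate([(i + 1, blocks[i][pos]) for i in range(k)], x)
                       for pos in range(size)])
    return blocks


def decode(available: Dict[int, List[int]], k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k blocks, keyed by block index 0..n-1."""
    chosen = sorted(available.items())[:k]
    if len(chosen) < k:
        raise ValueError("fewer than k blocks available")
    size = len(chosen[0][1])
    out = bytearray()
    for x in range(1, k + 1):                      # recover data blocks 1..k in order
        for pos in range(size):
            out.append(_interpolate([(idx + 1, blk[pos]) for idx, blk in chosen], x))
    return bytes(out[:length])


archive = b"photos and personal files"
blocks = encode(archive, k=3, n=5)
survivors = {1: blocks[1], 3: blocks[3], 4: blocks[4]}  # blocks 0 and 2 lost
print(decode(survivors, k=3, length=len(archive)))
```

Production systems would normally use an optimized erasure-coding library over GF(2^8); this version only demonstrates that any 3 of the 5 blocks suffice.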
In other embodiments, the data can be transferred to the other gateways in a delayed manner. Bandwidth usage can thus be smoothed to provide users with a more transparent service (i.e. using the uplink for backup when users are not using their computer or Internet connection). Indeed, using the whole uplink while the user is using his computer may severely degrade his Internet browsing experience and thus discourage him from using the system. Such an architecture also allows the provider to delay transfers from the gateways to the Internet so as to smooth the usage of its core network.
Data are quickly uploaded from a peer to the gateway. The peer is then free to leave the home network quickly, while the gateway acts as a background agent on its behalf. Whereas saving a 1 GB archive over a 128 kB/s uplink takes a little more than 2 hours (with a replication ratio of 1; K times longer with a replication ratio of K), the same archive is uploaded from the peer to the gateway within about 140 seconds over a 7 MB/s home network. After these 140 seconds, the peer is free to leave the home network (which, for instance, makes for an enjoyable mobile experience), while the gateway processes the archive (replication, block division, ciphering, ...) and backs it up as a background task.
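The figures above follow from dividing the volume to send by the link rate. The short calculation below reproduces them; it assumes decimal units (1 GB = 10^9 bytes, 1 kB = 10^3 bytes), which the text does not state, and the helper name `transfer_seconds` is ours.

```python
kB, MB, GB = 1e3, 1e6, 1e9  # decimal units (an assumption; the text does not specify)


def transfer_seconds(size_bytes: float, rate_bytes_per_s: float, replication: int = 1) -> float:
    """Time to push `replication` copies of an archive through a link of the given rate."""
    return size_bytes * replication / rate_bytes_per_s


archive = 1 * GB
wan = transfer_seconds(archive, 128 * kB)   # peer -> Internet over the ADSL uplink
lan = transfer_seconds(archive, 7 * MB)     # peer -> gateway over the home network
print(f"WAN: {wan / 3600:.2f} h, LAN: {lan:.0f} s, ratio: {wan / lan:.0f}x")
print(f"WAN with replication ratio 3: {transfer_seconds(archive, 128 * kB, 3) / 3600:.2f} h")
```

With these units the WAN upload takes about 2.17 h and the LAN transfer about 143 s, in line with the "little more than 2 hours" and "about 140 seconds" quoted above; the LAN is roughly 55 times faster.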
When it comes to retrieving backed-up data, the roles of all elements in the system are reversed. To access data, the requesting peer 12 informs its gateway GW1 of the data it is interested in. Gateway GW1 carries out the download on behalf of client peer 12 by contacting the remote gateways to which the data were uploaded, or the gateways handling the peers to which the data were offloaded. If data were offloaded to some peer, they are fetched as soon as possible by the corresponding remote gateway, which then sends them to the requesting client's gateway GW1. When client gateway GW1 has retrieved the whole content, it informs peer 12, as soon as it connects again, that its retrieval request has been completed. In the context of online services, caching the content to be backed up on the gateways allows users to put their devices on standby as they usually do, and we also show that it greatly reduces the time to backup (i.e. the age of the backup). In the context of transiently available P2P services, caching at the edge is much more efficient than adding a cache within the Internet. In addition to reducing the time to backup and the time to restore, it also reduces the total amount of storage needed for caching. The structure of the network is therefore taken into account: most P2P applications ignore the presence of a gateway between the peer and the Internet and, as a result, do not leverage it, although it could greatly improve performance. Leveraging this gateway can make P2P storage possible even when peers have rather low availability, provided that they connect to the network frequently enough.
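On the restore side, GW1 must find enough reachable blocks before it can rebuild the content. Assuming the erasure-coded layout described earlier (any k distinct blocks suffice), a minimal planning step might look like the sketch below. The function `plan_restore`, the shape of the index, and the retry behaviour are illustrative assumptions; in particular, the index here stands in for whatever table or distributed hash table the gateway actually uses.

```python
from typing import Dict, List, Optional, Set


def plan_restore(index: Dict[int, List[str]], online: Set[str], k: int) -> Optional[Dict[int, str]]:
    """Pick k distinct blocks that can be fetched right now, one source per block.

    index:  block number -> places holding that block (remote gateways or peers)
    online: places reachable at this moment
    Returns None when fewer than k blocks are reachable; GW1 would then retry later,
    e.g. once the remote gateways have fetched the blocks back from their peers.
    """
    plan: Dict[int, str] = {}
    for block, holders in sorted(index.items()):
        if len(plan) == k:
            break
        reachable = [h for h in holders if h in online]
        if reachable:
            plan[block] = reachable[0]
    return plan if len(plan) == k else None


index = {0: ["GW2"], 1: ["peer31"], 2: ["GW4"], 3: ["peer41", "GW3"]}
online = {"GW2", "GW3", "GW4"}             # gateways are almost always online
print(plan_restore(index, online, k=3))    # enough blocks: restore can proceed now
print(plan_restore(index, online, k=4))    # block 1 only on an offline peer: wait
```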
The combination of the peers' fast but transient connections with the gateways' slow but permanent connections is of real interest. If peers upload directly to the Internet, they can upload up to 460 MB/day. However, if the gateway is considered an active device that can perform caching, a peer can upload 24 GB/day to the gateway, and the gateway can upload up to 11 GB/day. Turning the gateway into an active device can significantly enhance online storage services (P2P or cloud). It is therefore proposed to offload upload tasks to the gateway, which provides several advantages:
In peer-to-peer systems known in the state of the art, fetching content may be slow because the presence of the server peer and that of the client peer are anti-correlated. By introducing gateways that are permanently available, synchronization between peers is no longer a problem. This is of interest for delay-tolerant applications such as backup. As already mentioned, once the peer has cached the data to back up on the gateway, the backup process continues as a background process on the gateway. On the gateway, the data are replicated according to the applicable protection policy (simple replication or erasure code) and divided into blocks. The advantage of the gateway-assisted model results from the almost permanent presence of the gateways, which spares the time spent waiting for the final storage peers to connect. For a peer with 30 per cent availability that connects to its home network twice a day, this waiting time can be estimated at 3 hours on average, while the Time to Backup for a 1 GB archive with a replication ratio of 3 and an upload bandwidth of 128 kB/s is 6.5 hours. Without active gateways, the Time to Backup is increased by the waiting time, i.e. by roughly 50 per cent.
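The Time to Backup and daily-volume figures in the two paragraphs above can be checked with simple arithmetic. The calculation below assumes decimal units and, for the daily volumes, that a peer is online about one hour per day. That connection time is our assumption, not a figure from the text; it happens to reproduce the 460 MB/day value and comes close to the 24 GB/day value.

```python
HOUR = 3600.0
kB, MB, GB = 1e3, 1e6, 1e9   # decimal units (assumption)

uplink = 128 * kB            # ADSL upload rate between gateways, bytes/s
lan = 7 * MB                 # home network rate, bytes/s

# Time to Backup: 1 GB archive, replication ratio 3, 128 kB/s uplink.
ttb = 1 * GB * 3 / uplink / HOUR
wait = 3.0                   # average peer waiting time quoted in the text, hours
print(f"TTB with active gateways:    {ttb:.1f} h")
print(f"TTB without active gateways: {ttb + wait:.1f} h (+{wait / ttb:.0%})")

# Daily volumes, ASSUMING the peer is online about one hour per day.
peer_online = 1 * HOUR
direct_mb = uplink * peer_online / MB        # peer uploading straight to the Internet
to_gateway_gb = lan * peer_online / GB       # peer uploading to its gateway over the LAN
gateway_gb = uplink * 24 * HOUR / GB         # always-on gateway uploading all day
print(f"{direct_mb:.0f} MB/day, {to_gateway_gb:.1f} GB/day, {gateway_gb:.1f} GB/day")
```

This gives about 6.5 h versus 9.5 h (+46 %, i.e. roughly the 50 per cent quoted), and about 461 MB/day, 25.2 GB/day and 11.1 GB/day. The text does not state the units or connection time behind its 24 GB/day figure, so the small gap is not meaningful.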
Figure 2 represents a preferred embodiment of the method according to the invention.
In a first step E1, device 12 transfers data to gateway GW1. The data transferred are preferably intended to be backed up on a remote device.
The data transfer can be automatic, as explained earlier, or can be controlled by a user of device 12.
The data are transmitted to a gateway GW1 via a local area network connecting device 12 and gateway GW1 .
The data are then cached in gateway GW1 in a step E2. Once the data are cached in GW1, device 12 can be disconnected; it is no longer involved in the backup process. The devices benefit from high-speed data transfer on the local area network, so the data are transferred to gateway GW1 very quickly and the resources of device 12 are not tied up by this task for a long time.
During a step E3, gateway GW1 ciphers the data before transferring them to the cloud, which is composed of the Internet, the remote gateways and their devices.
Any ciphering method can be used, provided it protects the content efficiently.
Once the data are enciphered, redundancy is added to them in a step E4.
Redundancy can be added, for example, by duplicating the content. Duplication makes it possible to save the same data on several remote gateways and devices, which increases the chances of recovering the content. For instance, when a remote device holding part of a content to be retrieved is disconnected, the user may still retrieve the content quickly if that part is also stored on another device. The more the blocks are duplicated, the higher the chance of recovering a backup copy of the content, but the greater the storage resources consumed. The degree of duplication can be a parameter adjusted by the user device according to the type of data to be saved, by the user himself, or by gateway GW1 according to the type of data, the storage capacity of the cloud, or requirements entered by a user or a system administrator.
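Duplication-based redundancy can be sketched as placing several copies of each block on distinct remote gateways, with the number of copies taken from a per-data-type policy. The policy values in `COPIES_BY_TYPE` and the name `place_copies` are hypothetical: the text says the degree of duplication may depend on the data type, user requirements or cloud capacity, but gives no numbers. Rotating the starting gateway spreads the load so that losing any single gateway still leaves a copy of every block when two or more copies are used.

```python
from typing import Dict, List

# Hypothetical policy table: the text gives no concrete duplication levels.
COPIES_BY_TYPE = {"photo": 3, "document": 2, "movie": 1}


def place_copies(blocks: List[str], gateways: List[str], copies: int) -> Dict[str, List[str]]:
    """Give each block `copies` replicas on distinct gateways, rotating to spread load."""
    if not 1 <= copies <= len(gateways):
        raise ValueError("copies must be between 1 and the number of gateways")
    return {
        block: [gateways[(i + r) % len(gateways)] for r in range(copies)]
        for i, block in enumerate(blocks)
    }


placement = place_copies(["b0", "b1", "b2"], ["GW2", "GW3", "GW4"], COPIES_BY_TYPE["document"])
print(placement)
```

Here k = 3 original blocks become n = 6 stored blocks, n minus k of them being duplicates, which matches the replication variant of the claims.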
Redundancy can also be added for error protection.
In a step E5, the content is split into a number of blocks. Gateway GW1 adjusts the number of blocks according to the size of the data, the bandwidth of the wide area network, the number of selected remote gateways and/or of devices local to these remote gateways, and so on.
The number of blocks can also be set by a network administrator.
In a step E6, the blocks are sent to the cloud to be recorded on the remote gateways GW2, GW3, GW4, each block being addressed to one of these gateways.
In a step E7, the blocks are received and cached in the remote gateways GW2, GW3 and GW4. The data can be stored permanently in these gateways, but the gateways can also decide to store the content on the local devices connected to them. GW1 can also prevent the remote gateways from storing the data on their local devices, for instance if the data are important or confidential, or if they need to be retrieved often.
When GW1 does not prevent it and the remote gateways decide to store the data on their local devices, the data, in the form of blocks, are transferred to those local devices. For instance, one or several blocks can be stored on some or all of the devices connected to the gateway; these devices must be equipped with storage facilities. The remote gateways can also store part of the content on their own recording medium, and they can delay the transmission of the data to their local devices until those devices are connected or available.
Devices local to the remote gateways can thus be used even when they are not immediately available: the transfer of each block of data from the remote gateways can be performed asynchronously, whenever the devices become available. Each block is transferred to its destination device when that device is available, independently of the other blocks, depending on the availability of the destination devices.
At least the remote gateways are in charge of knowing which device on their local area network keeps which block. GW1 can have a table indicating which remote gateway stores which data, while each remote gateway GW2, GW3 and GW4 is in charge of keeping its own table indexing which local device keeps which block. GW1 can also store a copy of the indexing tables of the remote gateways on which data from its local devices have been stored.
The index, which is precious, can also be stored in a resilient data structure (distributed hash table) or on a central server. The index can also be stored by the peers, leaving a more "passive" role to the gateways, even though they still cache the data.
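The two-level index just described (GW1 knows which remote gateway holds a block; each remote gateway knows which of its local devices holds it) can be sketched with two small tables. `RemoteGatewayIndex`, `OriginGatewayIndex`, `mirror` and `locate` are hypothetical names, and `locate` reads the remote table directly only for illustration; in a real deployment that lookup would be a request sent to the remote gateway.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class RemoteGatewayIndex:
    """Kept by each remote gateway (GW2, GW3, GW4): block id -> local device."""
    name: str
    block_to_device: Dict[str, str] = field(default_factory=dict)

    def record(self, block_id: str, device: str) -> None:
        self.block_to_device[block_id] = device


@dataclass
class OriginGatewayIndex:
    """Kept by GW1: block id -> remote gateway, plus optional copies of remote tables."""
    block_to_gateway: Dict[str, str] = field(default_factory=dict)
    mirrored: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def mirror(self, remote: RemoteGatewayIndex) -> None:
        """Keep a copy of a remote gateway's table, as GW1 optionally may."""
        self.mirrored[remote.name] = dict(remote.block_to_device)

    def locate(self, block_id: str,
               remotes: Dict[str, RemoteGatewayIndex]) -> Tuple[str, Optional[str]]:
        """Return (remote gateway, local device); device is None if still cached on the gateway."""
        gateway = self.block_to_gateway[block_id]
        return gateway, remotes[gateway].block_to_device.get(block_id)


gw2 = RemoteGatewayIndex("GW2")
gw3 = RemoteGatewayIndex("GW3")
gw2.record("b1", "21")
origin = OriginGatewayIndex({"b1": "GW2", "b2": "GW3"})
remotes = {"GW2": gw2, "GW3": gw3}
print(origin.locate("b1", remotes))  # ('GW2', '21')
print(origin.locate("b2", remotes))  # ('GW3', None): b2 not yet offloaded
origin.mirror(gw2)
```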
The invention is not limited to the embodiments given here but it also applies to other types of local area networks and wide area networks and to other types of devices.

Claims
1. Method for data cache on a first gateway (GW1) comprising a recording medium, said first gateway (GW1) being connected through a first local area network (LAN1) to at least one local device (12), and connected through a wide area network to at least one remote device (GW2, GW3, GW4) comprising a recording medium, said method being characterized in that it comprises the following steps:
- Transmitting (E1) through said local area network (LAN1), from said at least one local device (12), data to be stored on said at least one remote device (GW2, GW3, GW4), via said first gateway (GW1),
- Receiving said data on said first gateway (GW1),
- Caching (E2) said data in said first gateway (GW1);
- Transmitting (E6), via said wide area network, said data from said first gateway (GW1) to said at least one remote device (GW2, GW3, GW4) for storing (E7) it.
2. Method according to claim 1, characterized in that it further comprises a step (E3) of ciphering said cached data on the local gateway (GW1) before said second transmitting step (E6).
3. Method according to claim 1 or 2, characterized in that it further comprises a step (E5) of splitting said data into k blocks, each of said k blocks being transmitted to a respective one of k remote devices (GW2, GW3, GW4) for storage.
4. Method according to claim 3, characterized in that it comprises a further step (E4) of adding redundancy data to said data before the second transmitting step (E6), obtaining n blocks of data with n > k.
5. Method according to claim 4, characterized in that said n blocks replicate the k blocks, n minus k blocks being duplicates of some of the k blocks.
6. Method according to claim 4, characterized in that said n blocks are formed using the data of said k blocks to which error correction data are added.
7. Method according to claim 4, characterized in that said k blocks comprise said data to be recorded and associated error correction data, n minus k blocks being duplicates of some of the k blocks.
8. Method according to claim 3, characterized in that said first gateway (GW1) comprises an index table establishing an index between each block and the remote device (GW2, GW3, GW4) storing said block.
9. Method according to any of claims 1 to 8, characterized in that said remote devices (GW2, GW3, GW4) are remote gateways (GW2, GW3, GW4).
10. Method according to claim 9, characterized in that said remote gateways (GW2, GW3, GW4) are connected to local devices (21-23, 31-33, 41-45) through local area networks (LAN2, LAN3, LAN4), said method further comprising a step (E8) of transferring said data from said remote gateways (GW2, GW3, GW4) to said devices (21-23, 31-33, 41-45) local to said remote gateways through said local area networks (LAN2, LAN3, LAN4), when said devices (21-23, 31-33, 41-45) local to said remote gateways (GW2, GW3, GW4) are available.
11. Method according to claim 10 and any of claims 3 to 6, characterized in that the k blocks of data or the n blocks of data are each stored on one device (21-23, 31-33, 41-45) local to the remote gateways (GW2, GW3, GW4).
12. Method according to claim 10 or 11, characterized in that, before the data are transferred to the devices (21-23, 31-33, 41-45) local to the remote gateways (GW2, GW3, GW4), the data are cached in the remote gateways (GW2, GW3, GW4).
EP11805865.0A 2011-03-31 2011-12-23 Method for data cache in a gateway Withdrawn EP2692112A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP11805865.0A EP2692112A1 (en) 2011-03-31 2011-12-23 Method for data cache in a gateway

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP11305374 2011-03-31
EP11805865.0A EP2692112A1 (en) 2011-03-31 2011-12-23 Method for data cache in a gateway
PCT/EP2011/074007 WO2012130348A1 (en) 2011-03-31 2011-12-23 Method for data cache in a gateway

Publications (1)

Publication Number Publication Date
EP2692112A1 true EP2692112A1 (en) 2014-02-05

Family

ID=45464556

Family Applications (1)

Application Number Title Priority Date Filing Date
EP11805865.0A Withdrawn EP2692112A1 (en) 2011-03-31 2011-12-23 Method for data cache in a gateway

Country Status (6)

Country Link
EP (1) EP2692112A1 (en)
JP (1) JP2014512750A (en)
KR (1) KR20140018963A (en)
CN (1) CN103493461A (en)
WO (1) WO2012130348A1 (en)
ZA (1) ZA201308108B (en)

Also Published As

Publication number Publication date
KR20140018963A (en) 2014-02-13
ZA201308108B (en) 2015-01-28
WO2012130348A1 (en) 2012-10-04
CN103493461A (en) 2014-01-01
JP2014512750A (en) 2014-05-22

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20131008

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAX Request for extension of the european patent (deleted)
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1193520

Country of ref document: HK

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20160712

REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1193520

Country of ref document: HK