WO2014154973A1

WO2014154973A1 - Method for storing data in a computer system performing data deduplication

Info

Publication number: WO2014154973A1
Application number: PCT/FR2014/050653
Authority: WO
Inventors: Pierre OBAME MEYE; Philippe Raipin Parvedy
Original assignee: Orange
Priority date: 2013-03-28
Filing date: 2014-03-20
Publication date: 2014-10-02
Also published as: EP2979222B1; FR3003968A1; US20160054949A1; EP2979222A1

Abstract

The invention relates to a method for storing data in a computer system (SYS) including a plurality of first devices (PC1, PC2) storing data belonging to respective users (U1, U2), a second device (SS) capable of managing a backup of data from first devices, said backup including a step of deduplicating inter-user data, characterised in that an intermediate device (I) is inserted between the first devices (PC1, PC2) and the second device (SS), such as to initially perform an intra-user deduplication of the data to be backed up from first devices, and then to manage the inter-user deduplication in cooperation with the second device (SS).

Description

A method of storing data in a computer system performing data deduplication.

Technical area

The invention relates to a method for storing data in a computer system performing data deduplication.

Recall that in computer science, deduplication (also called factorization or single-instance storage) is a data backup technique that factors out identical data sequences in order to save memory space.

State of the art

Current networked storage systems perform data deduplication before storage. Deduplication consists in detecting redundancy between the data to be saved in a computer system and the data already saved, so as to store only the difference. Thus, if a first device requests storage of data on a second device, deduplication is performed. If the data to be saved has already been saved on the second device on behalf of a third device, only a reference to this data is created for the first device. When the first device later wishes to access the data, the second device follows the reference and obtains the data, which it can then transmit to the first device.
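As a rough illustration of the reference mechanism just described, the sketch below models the second device as a content-addressed store. It assumes that the SHA-256 digest of the content serves as the reference and keeps one stored copy plus per-owner references; the class name DedupStore and the owner labels are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy single-instance store: one copy per distinct content, references per owner."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}        # content hash -> the single stored copy
        self.refs: dict[str, set[str]] = {}      # owner -> content hashes it references

    def save(self, owner: str, data: bytes) -> tuple[str, bool]:
        """Record data for owner; return (reference, True if new bytes were written)."""
        ref = hashlib.sha256(data).hexdigest()
        written = ref not in self.blobs
        if written:
            self.blobs[ref] = data               # first copy: actually store the bytes
        self.refs.setdefault(owner, set()).add(ref)  # duplicate: only a reference is added
        return ref, written

    def load(self, owner: str, ref: str) -> bytes:
        """Follow the owner's reference back to the single stored copy."""
        if ref not in self.refs.get(owner, set()):
            raise KeyError("unknown reference for this owner")
        return self.blobs[ref]


store = DedupStore()
ref3, new3 = store.save("third-device", b"annual report")   # bytes stored
ref1, new1 = store.save("first-device", b"annual report")   # reference only
```

The second call writes no new bytes; both devices share one stored copy.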

According to certain applications, this deduplication technique yields storage savings of about 90%.

The deduplication operation is performed either on the source side (in our example, the first device) or on the target side of the backup (in our example, the second device). This second device is usually a storage server.

If the deduplication is done on the source side, namely on the first device, a client program installed in the first device performs the deduplication before transmitting data to be saved to the second device. This technique effectively saves bandwidth at the first device.

If the deduplication is carried out in the second device, namely the server, the client program referred to above transmits the data to be saved to the second device, which performs the deduplication. In this case, all the data is transmitted; there is no bandwidth saving at the second device.

Several solutions exist that reconcile deduplication with data confidentiality.

According to a first solution, called "per-user encryption", the first device encrypts the data to be saved with a private key before transmitting it to the second device. It is assumed that the second device does not know the public key corresponding to this private key. It is also assumed that several first devices may request storage on the second device, each device having its own private/public key pair.

In this configuration, identical data saved on the second device on behalf of the same user can be deduplicated. However, identical data saved on the second device on behalf of two different users cannot be detected as such by the second device, since it has no knowledge of the public keys required for decryption.

With this first solution, data confidentiality is ensured, but deduplication efficiency in the system is reduced because deduplication of data between different users is prevented. Consequently, on the second device side, there is no bandwidth saving and storage space is not managed optimally, because data belonging to different users is not deduplicated.

With this first solution, the client program does not resend data already transmitted to and stored on the second device, so bandwidth at the first device is optimized.

A second solution relies on convergent encryption. Convergent encryption is an encryption procedure designed to allow deduplication of content encrypted by different first devices on behalf of different users. This second solution encrypts a data item according to its content. The general idea is that a user computes a hash of the data and then uses the resulting hash as the key to encrypt the data. In this way, the same data encrypted by two different users is identical after encryption, and the second device can deduplicate data belonging to different users. This second solution thus aims to reconcile inter-user deduplication, that is to say deduplication between different users, with data confidentiality. As a result, bandwidth is not saved at the second device. On the other hand, storage savings are better than with per-user encryption, because the second device performs inter-user deduplication.
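The convergent principle (key = hash of the content, so equal plaintexts give equal ciphertexts) can be sketched as below. This assumes SHA-256 as the hash and, purely for a dependency-free illustration, swaps in a toy XOR keystream built from SHA-256 in counter mode instead of a real deterministic symmetric cipher such as AES; the toy cipher is not secure and the function names are hypothetical.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks. Illustration only, NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(data: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext) where the key is derived from the content itself."""
    key = hashlib.sha256(data).digest()            # key = Hash(data)
    ks = _keystream(key, len(data))
    return key, bytes(a ^ b for a, b in zip(data, ks))


def convergent_decrypt(key: bytes, ct: bytes) -> bytes:
    ks = _keystream(key, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))


k1, c1 = convergent_encrypt(b"holiday photo")     # user U1
k2, c2 = convergent_encrypt(b"holiday photo")     # user U2, independently
```

Because c1 and c2 are identical, the second device can store a single copy without ever seeing the plaintext.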

At the first device, this second solution saves bandwidth thanks to the deduplication performed on the second device side.

Of the existing approaches presented above, per-user encryption provides the strongest confidentiality. However, it greatly reduces deduplication efficiency. To improve deduplication efficiency while ensuring data confidentiality, convergent encryption looks better than the first solution. However, recent work has shown that the confidentiality offered by convergent encryption can be compromised. Recall that in convergent encryption, each data item corresponds to a data identifier computed as a function of the data, so the identifier and the data are intimately linked. This identifier is transmitted instead of the data to find out whether the data is already stored on the second device; if it is, that is to say if the second device already stores this identifier, the data is not transmitted.

In this convergent encryption method, it is the client program installed in the first device that creates the data identifier. As a result, malicious attacks are possible. For example, a malicious user can create an identifier ID that is completely independent of the data F to be saved, whereas the identifier should be computed from the data. The second device receives the pair ID/F (ID being the identifier of the data and F the data, for example a file F) and stores the pair ID/F. Later, another user, on another device, wishes to store data F'; the client program of this other user correctly computes an identifier from the data (for example a hash of the data) and obtains an identifier ID identical to the one used by the malicious device. The second device receives the identifier ID and finds that it already exists in memory. It therefore responds to this other user's device that the data is already present and that uploading it is not necessary. Later, when this device requests a download of the data, it receives the data F stored by the malicious device and not its legitimate data F'.
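The identifier-poisoning scenario above can be reproduced with a toy server that trusts client-supplied identifiers. This sketch assumes SHA-256 as the identifier function and assumes the attacker chooses an identifier that coincides with the one the victim's data F' will hash to; NaiveServer and ident_of are hypothetical names.

```python
import hashlib


class NaiveServer:
    """Server that trusts the identifier supplied by the client (the weakness described above)."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def has(self, ident: str) -> bool:
        return ident in self.store

    def put(self, ident: str, data: bytes) -> None:
        # No check that ident == Hash(data): the first upload under an identifier wins.
        self.store.setdefault(ident, data)

    def get(self, ident: str) -> bytes:
        return self.store[ident]


def ident_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


server = NaiveServer()
legit_f_prime = b"quarterly report v1"   # data F' of the honest user
malicious_f = b"tampered content"        # data F of the malicious user

# Malicious client: stores F under the identifier that F' will later hash to.
server.put(ident_of(legit_f_prime), malicious_f)

# Honest client: computes the correct identifier, sees it "already stored", skips the upload.
ident = ident_of(legit_f_prime)
if not server.has(ident):
    server.put(ident, legit_f_prime)

# Later download returns the attacker's F, not F'.
downloaded = server.get(ident)
```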

Another disadvantage relates to observation of the network by a malicious party. When a user saves data in the system, this user can observe the outgoing and incoming network traffic on the client device and check whether the data is actually transmitted to the second device. If it is not, the user deduces that another user has already saved the data in the system. This makes it possible to identify data already stored by a storage system.

The invention offers a solution that does not have the drawbacks of the state of the art.

The invention

For this purpose, according to a functional aspect, the subject of the invention is a method for storing data in a computer system comprising a plurality of first devices storing data belonging to respective users and a second device able to manage a backup of data from the first devices, said backup comprising an inter-user data deduplication step, characterized in that an intermediate device is interposed between the first devices and the second device, so as to perform intra-user deduplication on data to be backed up from the first devices, and then to manage the inter-user deduplication in cooperation with the second device.

Recall here that intra-user deduplication refers to deduplication operating on data of the same user, and that inter-user deduplication refers to deduplication operating on data of users who may be different.

The presence of an intermediate device enables a double deduplication: between data of the same user (intra-user) and between data of different users (inter-user).

This results in a bandwidth saving at the first device. Indeed, during a backup by a user, data already sent during a previous backup need not be sent again; only the difference between previous backups and the current backup is transmitted.

Also, this results in a saving in storage space at the second device. Indeed, the second device performs inter-user deduplication and thus optimizes its storage space by storing only one copy of each data item. At this second device, the confidentiality of the saved data is also ensured: the data can advantageously be encrypted according to the convergent encryption mode described in the section devoted to the state of the art. The second device thus guarantees data confidentiality to users; only authorized users can access the data in clear.

It should also be noted that, in the present application, a first device denotes indifferently a data processing device or a client program.

According to one embodiment, to manage inter-user deduplication, the intermediate device performs the following steps: a. A step of creating an identifier linked to data to be saved received from a first device, b. A transmission step during which the intermediate device transmits at least the identifier to the second device for managing the inter-user deduplication of the data.

The identifier of the data saved on the second device is created not by the first device but by a trusted intermediary. Since the identifier is no longer generated by a first device, malicious attacks based on identifier manipulation, as explained previously, are limited.

According to another embodiment, which may be implemented alternatively or cumulatively with the previous one, to manage intra-user deduplication: a. the first device creates a first identifier linked to data to be backed up; b. the first device transmits this identifier to the intermediate device for the management of intra-user deduplication.

The first device therefore only manages identifiers relating to its own data and not to data belonging to other users.

According to another embodiment, which can be implemented alternatively or cumulatively with the previous ones, the intermediate device stores a correspondence between the identifiers used for intra-user deduplication and the identifiers used for inter-user deduplication. The intermediate device thus acts as a mapping between the identifiers used for the two kinds of deduplication. When the intermediate device receives, from a first device, the identifier of data to be saved and this data is already saved on the second device, the intermediate device can find, through the correspondence, the identifier of the same data used by the intermediate device and the second device to manage inter-user deduplication. In other words, the client program does not handle identifiers related to inter-user deduplication. A malicious attack using random identifiers, as described in the section devoted to the state of the art, is therefore no longer possible thanks to the invention.

According to another mode, which can be implemented alternatively or cumulatively with the previous ones, the intermediate device is located on the communication link through which the first device communicates with the second device. In this way, the intermediate device does not change the path, often the shortest, taken by the data exchanged between the first and second devices. This intermediate device is ideally located in a place inaccessible to users, for example in the network of a telecommunications operator. As described below, the intermediate device is ideally a device capable of aggregating data streams from a plurality of first devices. Such an aggregation device is, for example, a point of presence (POP) in an xDSL infrastructure. The advantage of using a point of presence is that it is a mandatory point of passage for data coming from or going to the first devices; accordingly, it introduces no change in the path length between a user and the second device. In addition, placing the intermediary at the point of presence ensures that the data passes through an intermediary that is out of users' reach and completely secure.

Other aggregation devices exist, in particular an optical connection node (NRO) in an optical fiber network of a telecommunications operator.

According to another mode, which can be implemented alternatively or cumulatively with the previous ones, at the end of the inter-user deduplication, the intermediate device transmits information relating to the backup performed, and the moment at which this information is transmitted is delayed, in particular if the data is already stored on the second device. Indeed, by observing how long the deduplication takes, a user can in certain cases (especially if the data is large) deduce that an inter-user deduplication has taken place. To make inter-user deduplication completely transparent without consuming resources, the intermediary adds latency, where necessary, to the processing of a write request so that it lasts as long as a normal write of the data. In this way, a user cannot deduce whether the data to be saved has just been written to the second device or was already stored.

More generally, this mode makes inter-user deduplication totally transparent to users, which is not the case with existing solutions.

According to another aspect, the invention relates to a computer program comprising code instructions for implementing the method defined above, when this program is executed by a processor.

According to another aspect, the invention relates to a recording medium readable by a data processor, on which is recorded a program comprising program code instructions for executing the steps of the method defined above.

According to another hardware aspect, the invention relates to a device comprising a communication module for communicating with a plurality of first devices comprising respective storage modules for storing data belonging to respective users, and with a second device capable of managing a backup of data from the first devices, said backup comprising an inter-user data deduplication step, characterized in that it comprises: a. a first module for managing intra-user deduplication on data to be saved from the first devices; b. a second module for managing inter-user deduplication in cooperation with the second device.

According to another hardware aspect, the invention relates to a computer system comprising a plurality of first devices comprising respective storage modules for storing data belonging to respective users, and a second device capable of managing a backup of data originating from the first devices, said backup comprising an inter-user data deduplication step, characterized in that it comprises an intermediate device interposed between the first devices and the second device, the intermediate device comprising: a. a first module for managing intra-user deduplication on data to be saved from the first devices; b. a second module for managing inter-user deduplication in cooperation with the second device.

The invention will be better understood on reading the description which follows, given by way of example and with reference to the appended drawings in which:

FIG. 1 represents a computer system on which is illustrated an exemplary embodiment of the invention.

Figure 2 is a detailed view of the system including the intermediate device according to one embodiment of the invention.

Figure 3 is a schematic view of exchanges taking place during a write phase of a data on a second device.

FIG. 4 is a schematic view of exchanges taking place during a reading phase of data on a second device.

Figure 5 is a synthetic view of the system according to the embodiment described.

Figures 6a and 6b illustrate another embodiment in which the intermediate device performs the 2 phases described above.

Detailed description of an exemplary embodiment illustrating the invention

FIG. 1 represents a computer system SYS in which the invention can be implemented. This system comprises a plurality of data processing devices (PC1, ..., PCn).

To simplify the discussion, the following figures represent only two devices, called first devices PC1 and PC2.

In our exemplary embodiment, the system is based on the DSL network architecture of an access provider. This architecture includes: client programs C1 and C2 installed on the first devices PC1 and PC2, respectively; an intermediate device I, which is responsible for deduplicating data of the same user (intra-user), each intermediate device serving one or more client programs; and a second device SS, illustrated by a storage server, which is responsible for inter-user deduplication across the data of a plurality of users. In our example, this second device SS is also responsible for storing data, either locally or on storage nodes (SN1, ..., SNk).

Recall that this DSL architecture can be decomposed, in a simplified manner, into three layers: an access network, an aggregation network and a core network. These layers are illustrated in FIG. 2, which shows an access network R-ACC, an aggregation network R-AGR and a core network R-COR.

The access network R-ACC most often comprises gateways (home gateways) installed at customer premises and DSLAM multiplexers known to those skilled in the art. The subscriber lines of a region coming from the gateways are aggregated in the DSLAMs. A DSLAM can aggregate from a hundred to several thousand subscribers.

The aggregation network R-AGR connects the DSLAMs to the points of presence (POP). Lines collected by the DSLAMs are aggregated at a second level in the POPs.

Finally, the core network R-COR includes several points of presence (POP). A POP may aggregate streams from dozens of DSLAMs. Recall that a point of presence (POP) comprises a set of routers interconnected at the same place (building, room, etc.). These routers are equipped with physical resources and software dedicated to routing. There are two types of routers, namely access routers AR and core routers BR. Access routers are connected to the aggregation networks and are in turn connected to the core routers.

Each access router within a POP is connected to at least two core routers BR to provide protection in the event of outages within the POP. The core routers BR are connected together in a mesh network. The POPs provide access to the IP network of the ISP.

Deduplication can be done at different levels of data granularity, for example at file level, block level or byte level. In the following, a data item D is the subject of a backup. An embodiment of a data writing phase will be described with reference to FIG. 3. This mode comprises several steps referenced ET1-k (k = 1 to 10) in FIG. 3.

It is assumed that a user U1 with an identifier IDU wishes to save data D in a storage space SNk managed by the second device SS.

According to the method, with reference to FIG. 1, an intermediate device I will manage the intra-user deduplications INTRA, and the server SS will manage the inter-user deduplications INTER.

The location of the intermediate device in the network may vary: it can be located in a first device PC1/PC2, in the second device SS or on an intermediate node of the network. As we will see, the location of the intermediate device is chosen carefully, in particular to save bandwidth at the second device, because it is at this level that the volume of data is largest.

In our example, a POP is the location chosen to illustrate the embodiment. A POP has the advantage of being both a trusted device, because it is located in a trusted zone, namely the core network, and, within that network, as close as possible to the first devices.

The data D can be transmitted in clear, that is to say in an unencrypted manner; however, to ensure confidentiality, in our example, the data is encrypted by means of an encryption algorithm known to those skilled in the art.

In the following, a primitive can be written as follows:

Send (src, dest, COMMAND, param_1, param_2, ..., param_N)

This primitive is used to designate a command for transmitting parameters from a source "src", for example a first device, to a destination "dest", for example an intermediate device.
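For readers who prefer a concrete data structure, the Send primitive can be modeled as a plain message record. This is a sketch under the assumption that a message only needs to carry its source, destination, command and positional parameters; the Message class and send helper are hypothetical and perform no actual transmission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One Send(src, dest, COMMAND, param_1, ..., param_N) call, as an immutable record."""
    src: str
    dest: str
    command: str
    params: tuple = ()


def send(src: str, dest: str, command: str, *params) -> Message:
    # A real system would serialize and transmit the message; here we only build it.
    return Message(src, dest, command, tuple(params))


check = send("IDU", "I", "CHECK", "IDD")   # mirrors Send (IDU, I, CHECK, IDD)
```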

In the following:

- Hash (D) designates a hash function applied to the data D;

- Easym designates an asymmetric encryption function, and Esym designates a symmetric encryption function.

In our example, each user U1 and U2 has a public key and a private key.

The steps are as follows:

During a first step ET1-1, in our example, the client program C1 of the user U1 creates a hash of the data D to be sent:

HD = Hash (D)

In a second step ET1-2, which is optional, the client program C1 of the user U1 creates the identifier IDD of the data D, which will be used to manage intra-user deduplication on the intermediate device I. This step is optional but recommended, because comparing data bit by bit, especially when the number of bits is large, can be very long and expensive in computing resources. Also, the use of an identifier spares a first device from transmitting all the data when this data has already been backed up.

In our example, since the deduplication between the first device PC1 and the intermediate device I is intra-user, that is to say between data belonging to the same user, the identifier is created so that identifiers of different data created by the same user cannot collide. For example, the identifier may be a hash taking into account the value HD created in step ET1-1 and the user identifier IDU. The hash operation can be written as follows:

IDD = Hash (IDU, HD)
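Steps ET1-1 and ET1-2 can be sketched as follows. The choice of SHA-256 for Hash and the length-prefixed encoding of the pair (IDU, HD) are assumptions; the prefix simply prevents two different pairs from being concatenated into the same byte string. The helper names are hypothetical.

```python
import hashlib


def hash_parts(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so (a, bc) and (ab, c) hash differently."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))
        h.update(p)
    return h.digest()


def client_identifiers(idu: str, data: bytes) -> tuple[bytes, bytes]:
    hd = hashlib.sha256(data).digest()          # ET1-1: HD = Hash(D)
    idd = hash_parts(idu.encode(), hd)          # ET1-2: IDD = Hash(IDU, HD)
    return hd, idd


hd_u1, idd_u1 = client_identifiers("U1", b"contract.pdf bytes")
hd_u2, idd_u2 = client_identifiers("U2", b"contract.pdf bytes")
```

With this construction HD depends only on the data, while IDD also depends on the user, so the same data yields different IDD values for U1 and U2.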

During a third step ET1-3, the client program C1 of the user U1 transmits the identifier IDD of the data D to the intermediary I, to check whether the intermediary already has this data D.

More precisely, the transmitted primitive can take the following form:

Send (IDU, I, CHECK, IDD)

The primitive includes:

- the identifier IDU of the user and the identifier I of the intermediate device,

- the identifier IDD of the data,

- a CHECK command requesting verification of the presence or absence of the identifier IDD at the intermediate device.

During a fourth step ET1-4, upon receipt, the intermediate device I checks whether the identifier IDD exists in its index of the data of user U1.

If IDD exists in the index of user U1, the intermediate device I responds to the client program C1 of user U1 that it is not necessary to transmit the data D, and the operation of saving the data D ends.

Otherwise, the intermediate device I responds to the client program C1 of user U1 that it must transmit the data (in our example, the encrypted data) and its encrypted encryption key.

This step can be illustrated by the following syntax:

If index.get (IDU) .contains (IDD)

Send (IDU, IDD, CHECK_RESPONSE, YES)

else

Send (IDU, IDD, CHECK_RESPONSE, NO)
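The per-user index check of step ET1-4 can be sketched as below, assuming the index is an in-memory mapping from IDU to the set of IDD values that user has already saved. UserIndex and handle_check are hypothetical names; a real intermediate device would persist the index.

```python
from collections import defaultdict


class UserIndex:
    """Per-user index kept by the intermediate device I: IDU -> set of IDD already saved."""

    def __init__(self) -> None:
        self._by_user: defaultdict[str, set[bytes]] = defaultdict(set)

    def contains(self, idu: str, idd: bytes) -> bool:
        # .get avoids creating an empty entry for users who never saved anything
        return idd in self._by_user.get(idu, set())

    def add(self, idu: str, idd: bytes) -> None:
        self._by_user[idu].add(idd)


def handle_check(index: UserIndex, idu: str, idd: bytes) -> str:
    """ET1-4: YES if this user already saved IDD (nothing to send), otherwise NO."""
    return "YES" if index.contains(idu, idd) else "NO"


index = UserIndex()
first = handle_check(index, "U1", b"idd-1")    # not yet saved by U1
index.add("U1", b"idd-1")
second = handle_check(index, "U1", b"idd-1")   # now known for U1
```

The lookup is scoped to one user, which is what makes this stage intra-user: another user's identical IDD is never consulted here.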

During a fifth step ET1-5.1, when the client program C1 of user U1 obtains the response of the intermediate device I, if IDD already exists, the backup is considered completed and the operation ends.

If the identifier IDD does not exist, the client program C1 of user U1 transmits, during a step ET1-5.2, the encrypted data D and its encrypted encryption key.

The client program C1 of user U1 encrypts the data D with the key HD to obtain the encrypted data DE, and then encrypts the key HD with its public key Ku_pub to obtain HDE, so that only the client program C1 of user U1 has access to the decryption key in clear. The client program C1 then transmits the encrypted data DE and the encrypted encryption key HDE. These last steps can be illustrated by the following syntax:

If IDD exists

End of the backup (ET1-5.1)

Else (ET1-5.2)

DE = Esym (HD, D)

HDE = Easym (Ku_pub, HD)

Send (IDU, I, PUT, IDD, HDE, DE)
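The client side of step ET1-5 can be sketched as a function that either ends the backup or builds the PUT message. Esym and Easym are passed in as callables because the document does not name concrete algorithms; in practice they would be a deterministic symmetric cipher and an asymmetric scheme from a vetted library. The function name and tuple layout (mirroring Send (IDU, I, PUT, IDD, HDE, DE)) are assumptions.

```python
from typing import Callable, Optional

Cipher = Callable[[bytes, bytes], bytes]   # (key, plaintext) -> ciphertext


def client_step5(answer: str, idu: str, idd: bytes, hd: bytes, data: bytes,
                 ku_pub: bytes, esym: Cipher, easym: Cipher) -> Optional[tuple]:
    """Return None if the backup is already done, else the PUT message to send to I."""
    if answer == "YES":
        return None                          # ET1-5.1: already saved by this user; backup ends
    de = esym(hd, data)                      # ET1-5.2: DE = Esym(HD, D), key derived from D
    hde = easym(ku_pub, hd)                  # HDE = Easym(Ku_pub, HD), readable only by U1
    return (idu, "I", "PUT", idd, hde, de)   # Send (IDU, I, PUT, IDD, HDE, DE)
```

For a quick local check, any stand-in callables with the (key, plaintext) signature can be plugged in; they must not be mistaken for real encryption.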

At this point, a first phase of deduplication is complete.

A second phase starts during which an inter-user deduplication will be performed.

In a sixth step ET1 -6, when the intermediate device I receives the data, it creates a hash of DE to create an IDD_sys system identifier which will be used to manage the inter-user deduplication at the second SS device. Since all data is encrypted in the same way by all U1 and U2 users, two equal files before encryption will always be equal after the encryption and thus have the same system identifier.

IDD_sys = Hash (DE)

During a seventh step ET1-7, the intermediate device I updates its index to record the backup of IDD by U1 and the system identifier IDD_sys assigned to the data D.

Index.update (IDU, IDD, IDD_sys)

At this stage, in our example, at least three identifiers coexist at the intermediate device, namely the identifier IDU of the user U1, the identifier IDD of the data D and the system identifier IDD_sys.

During an eighth step ET1-8, the intermediate device I checks in its index whether IDD_sys exists.

If IDD_sys exists in the system (on behalf of another user), the data is already stored on a storage node SNk and there is no need to store it again. The intermediate device I then transmits only a reference to the data DE to the server SS, together with the encrypted encryption key HDE.

If IDD_sys does not exist, the data DE is not yet stored on the storage nodes SNk; DE and HDE are then transmitted to the server SS.

This step can be illustrated by the following syntax:

If IDD_sys exists

Send (I, FSS, PUT, IDU, IDD, IDD_sys, HDE)

else

Send (I, FSS, PUT, IDU, IDD, IDD_sys, HDE, DE)
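Steps ET1-6 to ET1-8 on the intermediate device can be sketched as follows, assuming SHA-256 for Hash(DE), an in-memory dictionary for the index, and tuples mirroring the Send (I, FSS, PUT, ...) messages above. The class name Intermediary and the known_sys set are hypothetical.

```python
import hashlib


class Intermediary:
    """Sketch of intermediate device I for steps ET1-6 to ET1-8."""

    def __init__(self) -> None:
        self.index: dict[tuple[str, bytes], str] = {}   # (IDU, IDD) -> IDD_sys
        self.known_sys: set[str] = set()                # IDD_sys already forwarded to SS

    def handle_put(self, idu: str, idd: bytes, hde: bytes, de: bytes) -> tuple:
        idd_sys = hashlib.sha256(de).hexdigest()        # ET1-6: IDD_sys = Hash(DE)
        self.index[(idu, idd)] = idd_sys                # ET1-7: Index.update(IDU, IDD, IDD_sys)
        if idd_sys in self.known_sys:                   # ET1-8: already stored for another user
            return ("I", "FSS", "PUT", idu, idd, idd_sys, hde)       # reference + HDE only
        self.known_sys.add(idd_sys)
        return ("I", "FSS", "PUT", idu, idd, idd_sys, hde, de)       # full upload


i = Intermediary()
m1 = i.handle_put("U1", b"idd-u1", b"hde-u1", b"same ciphertext")
m2 = i.handle_put("U2", b"idd-u2", b"hde-u2", b"same ciphertext")
```

In this sketch IDD_sys is marked as known as soon as DE is forwarded; a more careful device would probably wait for the PUT_ACK of step ET1-9 before treating the data as stored. Each user's HDE is still sent, since each user wraps HD under their own key.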

During a ninth step ET1-9, the server SS notifies the intermediate device I that the backup has been made.

Send (FSS, I, PUT_ACK, IDU, IDD, IDD_sys, OK)

During a tenth step ET1-10, the intermediate device I notifies the client program of user U1 of the end of the backup of DE.

This step can be illustrated by the following syntax:

Send (IDU, PUT_ACK, IDD, OK)

This writing phase can be followed by a reading phase involving the intermediate device I. This reading phase will be described with reference to FIG. 4, which shows steps referenced ET2-j.

The previous steps illustrate the writing phase. The following steps illustrate a reading phase of the data D.

During a first step ET2-1 of this reading phase, the client program C1 of user U1 transmits to the intermediate device I the identifier IDD of the data D it wishes to recover. This step can be illustrated by the following syntax:

Send (IDU, I, GET, IDD)

The intermediate device I then searches for the identifier IDD in the index of user U1.

If the identifier IDD exists in the data of the user U1, the intermediate device I searches the system identifier IDD_sys which corresponds to the identifier IDD in a system index. The system index can be represented by means of a table of correspondence between identifiers resulting from intra-user deduplication, for example IDD, and IDD_sys system identifiers.

Once the system identifier IDD_sys has been found, the encrypted data DE and the encrypted encryption key of D are retrieved from the server SS during a second step ET2-2 and transmitted to the user U1 during a third step ET2-3. If the identifier IDD does not exist in the data of user U1, a negative response is transmitted to the user U1 during a fourth step ET2-4 of this reading phase.

The previous steps of this reading phase are summarized by the following code, executed by the intermediate device I:

If Index_Users.get (IDU) .contains (IDD)

IDD_sys = user_index.getSystem_Index (IDU, IDD)

HDE, DE = Send (I, FSS, GET, IDU, IDD, IDD_sys) (ET2-2)

Send (IDU, GET_RESPONSE, IDD, HDE, DE) (ET2-3)

Else

Send (IDU, GET_RESPONSE, IDD, NO) (ET2-4)
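The reading phase on the intermediate device can be sketched as below. As a simplification, the per-user index and the system index are merged into one dictionary keyed by (IDU, IDD) that yields IDD_sys; fetch is a hypothetical callable standing for the GET request to the server SS and returning (HDE, DE).

```python
from typing import Callable, Optional


def handle_get(index: dict, idu: str, idd: bytes,
               fetch: Callable[[str], tuple[bytes, bytes]]) -> tuple:
    """Resolve (IDU, IDD) to IDD_sys, fetch (HDE, DE) from SS, or answer NO."""
    idd_sys: Optional[str] = index.get((idu, idd))   # user index + system index lookup
    if idd_sys is None:
        return (idu, "GET_RESPONSE", idd, "NO")       # ET2-4: not saved by this user
    hde, de = fetch(idd_sys)                          # ET2-2: retrieve from the server SS
    return (idu, "GET_RESPONSE", idd, hde, de)        # ET2-3: return to the user


index = {("U1", b"idd-1"): "sys-1"}
server_ss = {"sys-1": (b"hde-u1", b"ciphertext")}
ok = handle_get(index, "U1", b"idd-1", server_ss.__getitem__)
denied = handle_get(index, "U2", b"idd-1", server_ss.__getitem__)
```

Because the lookup is keyed by the requesting user, another user who happens to know IDD still receives a negative response.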

We have seen, in the foregoing, that the intermediary has an interest in being placed at a point of presence POP. However, another place may be possible.

Also, the number of intermediate devices I is arbitrary. A single intermediate device can be envisaged; however, to reduce resource consumption on any one intermediate device, it is preferable to distribute the management of intra-user deduplication over several intermediate devices and to associate several first devices with each intermediate device. We have also seen, with reference to FIG. 5, that the POP and the storage server SS are separate nodes on the network.

However, with reference to FIG. 6a or 6b, an intermediate device POP2 can play the role both of the intermediary I performing deduplication and of the storage server SS.

FIG. 6a shows two intermediate devices, namely a first POP1 and a second POP2. In this configuration, when the first intermediate device POP1 receives a request from a client program C1 associated with it, for example a client program installed on a device PC1 in the same geographical region, the intermediate device POP1 performs the data deduplication operation. When the second intermediate device POP2 receives a request from the first intermediate device POP1, the second intermediate device POP2 acts only as a storage server SS. Note that in FIG. 6a, a home gateway GTW lies between the first device PC1 and the point POP1.

FIG. 6b shows two intermediate devices, namely a first device POP1 and a second device POP2. In this example, both devices handle both deduplication and storage on the storage nodes SN.

It has been seen previously that in step ET1-10, the intermediate device I notifies the client program of user U1 of the end of the backup of DE. It has also been seen, with reference to the state of the art, that a user can observe the outgoing and incoming network traffic on the client device and check whether data to be saved is actually transmitted to the second device. If it is not, the user deduces that another user has already saved the file in the system, which makes it possible to identify files already stored by a storage system. In this configuration, according to a variant, the moment at which the response is sent is delayed, in particular if the data is already stored on the second device. Indeed, the duration of the deduplication depends on whether or not the data is already present on the second device. The trigger time is therefore chosen so that the overall time between the transmission of the request and the reception of the response in step ET1-10 is approximately the same in both cases. This feature makes it possible to hide from the first device that an inter-user deduplication has been performed. According to another variant, the moment of transmission may be random, so as to further mask the actual processing time of the deduplication operation.

Note that an intermediate device has the following modules (not shown in the figures) for carrying out the method of the invention: a. a first module for managing intra-user deduplication on data to be saved from first devices; b. a second module for managing inter-user deduplication in cooperation with the second device.
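Both timing variants can be sketched with one helper: run the processing, then wait until a target duration has elapsed, optionally adding a random extra delay. The target duration (for example, the expected time of a normal write to SS) is an assumption to be chosen by the operator; the function name is hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import random
import time
from typing import Callable, Optional


def respond_with_padding(process: Callable[[], object], target_s: float,
                         jitter_s: float = 0.0,
                         rng: Optional[random.Random] = None,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> object:
    """Run process(), then delay so the total lasts at least target_s (+ random jitter)."""
    start = clock()
    result = process()                         # e.g. ET1-6 to ET1-9, fast if data already on SS
    elapsed = clock() - start
    extra = (rng or random.Random()).uniform(0.0, jitter_s) if jitter_s > 0 else 0.0
    remaining = max(0.0, target_s + extra - elapsed)
    if remaining > 0:
        sleep(remaining)                       # pad so a deduplicated write looks like a full one
    return result                              # then send the ET1-10 acknowledgement
```

A deduplicated write that finishes early is padded up to target_s, while a full write that already exceeds it is not delayed further; a nonzero jitter_s adds the random variant on top.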

Note that the term "module" as used in this document may correspond to a software component, a hardware component, or a set of hardware and/or software components capable of implementing the function or functions described for the module.
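The two modules of the intermediate device, together with the identifiers IDD and IDsys and their correspondence, can be sketched as software components as follows. This is an illustrative sketch only: the text does not specify how IDD and IDsys are computed, so IDD is assumed here to be a per-user keyed hash (HMAC-SHA-256 with a hypothetical `user_key`) computed on the first device, and IDsys a plain SHA-256 content hash that lets the second device recognise the same data across users. Whole data items are hashed rather than chunks, and `StorageServer`, `IntermediateDevice`, `make_idd` and `fetch_data` are hypothetical names.

```python
import hashlib
import hmac
from typing import Callable, Dict, Tuple


class StorageServer:
    """Stand-in for the second device SS: data stored under IDsys."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def has(self, id_sys: str) -> bool:
        return id_sys in self.blobs

    def put(self, id_sys: str, data: bytes) -> None:
        self.blobs[id_sys] = data


def make_idd(user_key: bytes, data: bytes) -> str:
    # Assumed per-user identifier IDD, computed on the first device.
    return hmac.new(user_key, data, hashlib.sha256).hexdigest()


class IntermediateDevice:
    def __init__(self, server: StorageServer) -> None:
        self.server = server
        # Correspondence table: (user, IDD) -> IDsys.
        self.idd_to_idsys: Dict[Tuple[str, str], str] = {}

    def backup(self, user: str, idd: str,
               fetch_data: Callable[[], bytes]) -> str:
        # Module a: intra-user deduplication on the received IDD.
        if (user, idd) in self.idd_to_idsys:
            return "intra-user duplicate"  # data never leaves the first device
        data = fetch_data()                # only now is the data uploaded
        # Module b: inter-user deduplication in cooperation with SS.
        id_sys = hashlib.sha256(data).hexdigest()
        already_stored = self.server.has(id_sys)  # only IDsys is sent first
        if not already_stored:
            self.server.put(id_sys, data)
        self.idd_to_idsys[(user, idd)] = id_sys
        return "inter-user duplicate" if already_stored else "stored"
```

Because two users hashing the same data with different keys obtain different IDD values, it is the correspondence table that lets both of them resolve to a single stored copy under one IDsys.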

Note further that the embodiment described above is based on a DSL architecture. However, the invention can be implemented in other architectures in which data deduplication is possible, for example an optical fiber network.

Claims


1. A method of storing data in a computer system (SYS) comprising a plurality of first devices (PC1, PC2) storing data belonging to respective users (U1, U2), and a second device (SS) capable of managing a backup of data from the first devices, said backup comprising an inter-user data deduplication step, characterized in that an intermediate device (I) is interposed between the first devices (PC1, PC2) and the second device (SS), so as to perform intra-user deduplication on data to be backed up from the first devices, and then to manage the inter-user deduplication in cooperation with the second device (SS).

2. Storage method according to claim 1, characterized in that, for managing the inter-user deduplication, the intermediate device performs the following steps: a. a step of creating an identifier (IDsys) linked to data (DE) to be saved received from a first device, b. a transmission step during which the intermediate device transmits at least the identifier (IDsys) to the second device for managing the inter-user deduplication of the data (DE).

3. Storage method according to claim 1, characterized in that, for managing the intra-user deduplication, a. the first device creates a first identifier (IDD) linked to a data item (D) to be saved, b. the first device transmits the identifier (IDD) to the intermediate device (I) for the management of intra-user deduplication.

4. Storage method according to claims 2 and 3, characterized in that the intermediate device stores a correspondence between the identifiers related to intra-user deduplication and the identifiers related to inter-user deduplication.

5. Storage method according to claim 1, characterized in that the intermediate device (I) is located on the communication link through which the first device communicates with the second device.

6. Storage method according to claim 5, characterized in that, on the link, the intermediate device (POP) is a device capable of aggregating data streams from first devices.

7. Storage method according to claim 1, characterized in that, after the inter-user deduplication, the intermediate device (I) transmits information relating to the backup performed, and in that the instant at which the transmission of this information is triggered is delayed.

8. Computer program comprising code instructions for implementing the method according to one of the preceding claims, when the program is executed by a processor.

9. Apparatus (I) comprising a communication module for communicating with a plurality of first devices (PC1) comprising respective storage modules for storing data belonging to respective users (U1) and with a second device (SS) capable of managing a backup of data from the first devices, said backup comprising an inter-user data deduplication step, characterized in that it comprises a. a first module for managing intra-user deduplication of data to be saved from the first devices, b. a second module for managing inter-user deduplication in cooperation with the second device (SS).

10. Computer system (SYS) comprising a plurality of first devices (PC1) comprising respective storage modules for storing data belonging to respective users (U1), and a second device (SS) capable of managing a backup of data from the first devices, said backup comprising an inter-user data deduplication step, characterized in that it comprises an intermediate device interposed between the first devices and the second device, the intermediate device (I) comprising a. a first module for managing intra-user deduplication of data to be saved from the first devices, b. a second module for managing inter-user deduplication in cooperation with the second device (SS).