GB2422927A - File restore management - Google Patents
- Publication number
- GB2422927A (application GB0502425A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- file
- client
- restore
- files
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1451—Management of the data involved in backup or backup restore by selection of backup contents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1469—Backup restoration techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A distributed client network (100) comprises a file restore management apparatus (109). A network interface 201 of the file restore management apparatus (109) receives content identifications and location identifications of files stored at a plurality of clients. A grouping processor 203 groups the files in groups with similar content identifications in response to a similarity criterion. Upon receiving a file restore request for a first file from a first client, the file restore management apparatus (109) identifies a first group associated with the first file and a file location of a second file of the first group stored at a second client. The file restore management apparatus (109) further comprises a restore processor 211 which instigates a copy of the second file from the second client to the first client. The apparatus may provide an efficient back-up system in a distributed client network (100) with reduced storage requirements and/or increased protection against data loss.
Description
A FILE RESTORE MANAGEMENT APPARATUS AND A METHOD THEREFOR
Field of the invention
The invention relates to a file restore management apparatus and a method therefor and in particular to file restore management in a distributed client network.
Background of the Invention
Information is increasingly stored as data files in magnetic, electrical and/or optical storage means such as solid state memory, hard disks, magneto-optical discs etc. It is well known that a serious disadvantage of such systems is that data files may be corrupted, making all or part of a given data file inaccessible. In order to reduce the consequences of such data loss, such systems commonly back up and store multiple copies of the same file. For example, the memory or hard disk of a computer may be backed up on e.g. an optical disk or magnetic tape. If the original file is corrupted, a restore operation may be performed whereby the stored file copy is retrieved from the second source and loaded into the computer. Although such an approach provides improved security against data loss, it may require large amounts of locally available storage space.
In some networks, a centralised back-up storage memory may be provided. For example, systems are known wherein data files from distributed computers are copied to a central storage. Thus, if a data file is corrupted at a computer, the central back-up storage may be accessed and the copy of the corrupted file may be retrieved therefrom. However, such an approach requires a large amount of storage space in order to provide back-up of the files. This results in an increased cost of the central back-up server. Furthermore, the requirement to copy a large number of files across the network places a substantial load on the network, resulting in increased congestion or a requirement for increased bandwidth.
It has been proposed to reduce the storage and bandwidth requirements in a centralised back-up system by only storing one copy of a given file in the central back up storage even if the file is present on a plurality of computers. The central back up processor may compare files with currently stored files and a copy may be made only if there is no previous copy present in the central storage.
However, even in such systems a large central storage is required, resulting in high cost. Likewise, the bandwidth requirement is still high as all files are copied at least once. Furthermore, the risk of data loss is significant as there is no back-up available if the data file in the central storage is corrupted.
Hence, an improved file restore apparatus would be advantageous, and in particular a file restore system allowing increased flexibility, increased performance, reduced complexity, increased protection against data loss, reduced loading of a communication system and/or reduced storage requirements would be advantageous.
Summary of the Invention
Accordingly, the invention preferably seeks to mitigate, alleviate or eliminate one or more of the above-mentioned disadvantages singly or in any combination.
According to a first aspect of the invention there is provided a file restore management apparatus for a distributed client network; the file restore management apparatus comprising: receiving means for receiving content identifications and location identifications of files stored at a plurality of clients; grouping means for grouping files in groups with similar content identifications in response to a similarity criterion; means for receiving a file restore request for a first file from a first client; means for identifying a first group associated with the first file; means for determining a file location of a second file of the first group stored at a second client; and restore means for instigating a copy of the second file from the second client to the first client.
The file restore management apparatus may allow a file of the first client to be restored from a copy stored at the second client. The invention may allow file restore operations to be performed without requiring additional back-up copies to be made. In particular, the requirements for file copy storage may be reduced or completely eliminated and/or a centralised storing of file copies may be avoided or reduced. The bandwidth requirements and/or loading of the distributed network may e.g. be reduced. The invention may alternatively or additionally allow improved protection against data loss, and in particular an increased number of copies suitable for restoring a corrupt file may be available without increasing the storage requirements, as file copies stored by other clients for other purposes may also be used for restoring files.
Each client may correspond to a processing node in the distributed network. For example, each client may correspond to a separate computer or application capable of operation independently of other nodes of the network. In some embodiments, a plurality of clients may be co-located in the distributed network and in particular a single computer may comprise more than one client.
The instigation of the copying from the second client to the first client may be direct or indirect. For example, in some embodiments, the file restore management apparatus may control the file copy or may e.g. simply provide information allowing the clients to perform the file copy without involving the file restore management apparatus.
The client may specifically be an application or processing device which provides a service independent of other clients in the distributed client network.
A given content identification and/or location identification may be common to a plurality of files. For example, a plurality of files may be closely related and may be restored as a unit. A single content identification and/or location identification may be used to relate to the whole group of files.
According to an optional feature, the restore means is arranged to transmit a file location identification of the second file to the first client. This may e.g. provide an efficient system and/or a low-complexity file restore management apparatus. The transmission of the file location identification may specifically be considered a way of instigating the copy of the second file from the second client to the first client, and the first client may specifically communicate with the second client to retrieve the second file when the file location identification is received.
According to an optional feature, the restore means is arranged to retrieve the second file from the second client and to transmit the second file to the first client. This may e.g. allow an efficient system wherein clients interact with the file restore management apparatus as if it were a central back-up storage holding locally stored file copies. A simplified client organisation may e.g. be obtained as clients need only contact the file restore management apparatus to receive a file copy.
According to an optional feature, the file restore management apparatus further comprises copying means for copying a file from a client of the plurality of clients to a file storage, and wherein the grouping means is operable to include the file copy in a group.
The file restore management apparatus may e.g. control the number of available file copies for specific groups, thereby increasing the protection against data loss. The copying means may store a content identification and/or location identification for the new file copy.
According to an optional feature, the file storage is a file storage of a client of the plurality of the clients. This may reduce storage requirements at the file restore management apparatus and may e.g. reduce the overall storage requirements of the distributed network as unused storage capacity at individual clients may dynamically be used for file back-up storage.
According to an optional feature, the file storage is a local file storage of the apparatus. This may e.g. reduce the storage requirements for the clients and/or allow a lower complexity system.
According to an optional feature, the copying means is arranged to copy a file in response to a determination that a number of files in a group is below a threshold. This may allow improved protection against data loss as the file restore management apparatus may ensure that a number of back-up copies equal to the threshold is always available. The threshold may for example be one, resulting in the file restore management apparatus making a file copy if no other files exist in a group.
According to an optional feature, the grouping means is arranged to remove a file from a group in response to receiving a deletion indication from a client indicating that the file has been deleted. This may e.g. provide an efficient and/or low-complexity way of dynamically tracking file operations at the clients.
According to an optional feature, the grouping means is arranged to remove a file from a group in response to receiving a change indication from a client indicating that the file has been changed. The change indication may for example indicate that the file has been modified, altered or appended to.
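The threshold rule can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: groups are held as a hypothetical dictionary from content identification to a set of `(client_id, path)` locations, the threshold is read as the number of back-up copies beyond the first file (so a threshold of one triggers a copy when a group holds a single file), and `copy_file` is an assumed callback that performs the copy and returns the new location.

```python
from typing import Callable

# Hypothetical layout: content identification -> set of (client_id, path) locations.
Groups = dict[str, set[tuple[str, str]]]


def ensure_min_copies(
    groups: Groups,
    threshold: int,
    spare_clients: list[str],
    copy_file: Callable[[tuple[str, str], str], tuple[str, str]],
) -> list[tuple[str, str]]:
    """Copy files until every group has `threshold` back-up copies (if possible).

    `copy_file(source_location, target_client)` is an assumed callback that copies
    the file to the target client's storage and returns the new location.
    """
    new_copies = []
    for locations in groups.values():
        holders = {client for client, _ in locations}
        # Only clients that do not already hold a member of this group are useful targets.
        candidates = [c for c in spare_clients if c not in holders]
        # Back-up copies = files in the group other than one original.
        while locations and candidates and len(locations) - 1 < threshold:
            source = next(iter(locations))
            new_location = copy_file(source, candidates.pop(0))
            locations.add(new_location)  # the new copy joins the same group
            new_copies.append(new_location)
    return new_copies
```

Choosing targets only among clients that do not already hold the file is a design assumption: a second copy on the same client would not protect against that client's storage failing.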
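A minimal sketch of how the grouping means might act on deletion and change indications, assuming a hypothetical store keyed by content identification with a reverse index from location to content identification. A changed file is simply removed here; re-adding it under its new content identification when the client next reports is left to the normal reporting path (an assumption, not something the text specifies).

```python
from collections import defaultdict

Location = tuple[str, str]  # hypothetical (client identification, file path)


class FileGroups:
    """Minimal group store keyed by content identification (assumed layout)."""

    def __init__(self) -> None:
        self.groups: dict[str, set[Location]] = defaultdict(set)
        self.content_of: dict[Location, str] = {}  # reverse index: location -> content id

    def add(self, content_id: str, location: Location) -> None:
        self.groups[content_id].add(location)
        self.content_of[location] = content_id

    def on_modification(self, location: Location) -> None:
        """Handle a deletion or change indication: the file at `location` can no
        longer serve as a back-up for its group, so remove it."""
        content_id = self.content_of.pop(location, None)
        if content_id is None:
            return  # unknown file: nothing to remove
        members = self.groups[content_id]
        members.discard(location)
        if not members:
            del self.groups[content_id]  # drop groups that have become empty
```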
According to an optional feature, the similarity criterion is that content identifications correspond to identical file data. The files grouped together in a group may be files whose content identifications indicate that the data of the files are identical. For example, the content identifications may be generated as a function of all or some of the data in the file, and files resulting in identical content identifications may be grouped together.
This may e.g. allow a file to be restored from an identical file stored elsewhere.
According to an optional feature, the similarity criterion is that content identifications correspond to content items of the same content. For example, the content identification may comprise an identification of an author, artist and/or title of a content item, and the file restore management apparatus may group together files which correspond to the same content item even if the data of the files differ. This may improve the efficiency of the system and may for example allow a music content item to be restored from another, non-data-identical content item corresponding to the same music.
According to an optional feature, at least some of the content identifications comprise a file fingerprint. This may allow an efficient and reliable implementation and operation.
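The sketch below illustrates content identifications generated from "all or some" of the file data, together with the identical-data similarity criterion. SHA-256 and the sampling scheme (file length plus first and last 4096 bytes) are assumptions for illustration; the text does not name a fingerprint function.

```python
import hashlib


def full_fingerprint(data: bytes) -> str:
    """Content identification over all of the file data (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def partial_fingerprint(data: bytes, sample: int = 4096) -> str:
    """Cheaper content identification over only part of the data: the file length
    plus the first and last `sample` bytes. Files differing only in the middle
    collide, so this trades reliability for speed."""
    h = hashlib.sha256()
    h.update(len(data).to_bytes(8, "big"))
    h.update(data[:sample])
    h.update(data[-sample:])
    return h.hexdigest()


def same_content(id_a: str, id_b: str) -> bool:
    """Similarity criterion of this feature: identical content identifications."""
    return id_a == id_b
```

With the partial variant, two files that differ only in their middle bytes receive the same identification, which shows why the fingerprint must be "sufficiently specific" before identical identifications can be trusted to mean identical files.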
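Grouping by content item rather than by data can be sketched as follows. The `ContentId` record with `artist`, `title` and `data_hash` fields is hypothetical (for example, values read from audio tags), and the case and whitespace normalisation is an assumed choice so that trivially different tags still match.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentId:
    """Hypothetical metadata-style content identification."""
    artist: str
    title: str
    data_hash: str  # differs between, e.g., different encodings of the same recording


def content_key(cid: ContentId) -> tuple[str, str]:
    # Normalise case and whitespace (an assumption) so "The  Band" matches "the band".
    return (" ".join(cid.artist.lower().split()), " ".join(cid.title.lower().split()))


def group_by_content_item(files: dict[str, ContentId]) -> dict[tuple[str, str], list[str]]:
    """Group file locations naming the same content item, ignoring data_hash."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for location, cid in files.items():
        groups[content_key(cid)].append(location)
    return dict(groups)
```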
According to an optional feature, at least some of the location identifications comprise a file name. This may allow an efficient and reliable implementation and operation.
According to an optional feature, at least some of the location identifications comprise a client identification.
This may allow an efficient and reliable implementation and operation.
According to an optional feature, the distributed client network comprises the Internet. The file restore management apparatus may allow an improved file restore operation in a system utilising the advantages of the Internet.
According to an optional feature, the distributed client network comprises at least a first client comprising: means for transmitting a file restore request for a first file to the file restore management apparatus; means for receiving a file location identification of a second file of a second client from the file restore management apparatus; and means for retrieving the second file from the second client. This may e.g. allow an efficient file restore operation in a distributed client network and may in particular allow improved performance while retaining a low complexity of the file restore management apparatus.
According to an optional feature, the distributed client network comprises at least one client comprising: means for transmitting a modification indication to the file restore management apparatus in response to detecting a modification of a file stored at the client.
The modification indication could for example be an indication of a data change or a file deletion. This may provide an efficient means of dynamically updating information at the file restore management apparatus to reflect changes at the clients.
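On the client side, one simple way to detect modifications is to compare a fresh scan with the fingerprints last reported. The sketch below uses periodic re-scanning, which is a swapped-in technique (the text does not say how modifications are detected; operating-system file monitoring would be an alternative), with SHA-256 as an assumed fingerprint. Newly created files are not treated as modifications here; they would simply be reported as new content identifications.

```python
import hashlib
from pathlib import Path


def scan(root: Path) -> dict[str, str]:
    """Map each file path under `root` to its SHA-256 fingerprint (assumed choice)."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def modification_indications(previous: dict[str, str], current: dict[str, str]) -> list[tuple[str, str]]:
    """Compare the last reported scan with a new one and list the
    ("deleted" | "changed", path) indications a client might transmit."""
    indications = []
    for path, fp in previous.items():
        if path not in current:
            indications.append(("deleted", path))
        elif current[path] != fp:
            indications.append(("changed", path))
    return sorted(indications)
```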
According to another aspect of the invention, there is provided a method of file restore management in a distributed client network; the method comprising: receiving content identifications and location identifications of files stored at a plurality of clients; grouping files in groups with similar content identifications in response to a similarity criterion; receiving a file restore request for a first file from a first client; identifying a first group associated with the first file; determining a file location of a second file of the first group stored at a second client; and instigating a copy of the second file from the second client to the first client.
These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
Brief Description of the Drawings
Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which FIG. 1 illustrates a distributed client network 100 in accordance with some embodiments of the invention; FIG. 2 illustrates a file restore management apparatus in accordance with some embodiments of the invention; and FIG. 3 illustrates a method of file restore management in accordance with some embodiments of the invention.
Detailed Description of Embodiments of the Invention
The following description focuses on embodiments of the invention applicable to a distributed client network comprising the Internet but it will be appreciated that the invention is not limited to this application but may be applied to many other networks.
FIG. 1 illustrates a distributed client network 100 in accordance with some embodiments of the invention.
In the example, the Internet 101 connects a plurality of distributed clients 103, 105, 107, thereby forming a distributed client network 100. The individual clients 103-107 may for example be individual computers or may in some embodiments include different applications running on the same computer.
The distributed client network 100 furthermore comprises a file restore management apparatus 109 which is operable to provide back-up functionality.
In particular, the file restore management apparatus 109 allows clients to restore files from file copies stored at other clients of the distributed client network 100. In many communication systems, different clients may run similar or identical applications and may have local copies of, for example, executable program files or data files. As a specific example, most personal computers connected to the Internet today use the Windows™ operating system, and therefore a large number of identical files used by Windows™ are separately stored on the individual personal computers. The file restore management apparatus 109 of FIG. 1 may allow one of these files to be restored by copying the file from another client of the distributed client network 100.
FIG. 2 illustrates the file restore management apparatus 109 of FIG. 1 in more detail.
The file restore management apparatus 109 comprises a network interface 201 which couples the file restore management apparatus 109 to the Internet. The network interface is capable of receiving data from and transmitting data to clients 103-107 of the distributed client network through the Internet 101.
Specifically, the network interface 201 is capable of receiving, from the clients 103-107, content identifications and location identifications of files stored at the clients 103-107. In the example of FIG. 1, each of the clients 103-107 determines a suitable content identification for some or all of the files stored locally at the client. The content identification may for example be a digital fingerprint of the file. Each of the clients 103-107 furthermore determines a location identification for the files. The location identification may specifically comprise an identification of the client, such as an IP (Internet Protocol) address or a URL (Uniform Resource Locator). The location identification may further comprise a local identification such as a file name and a path indicating the directory of the client in which the file is stored.
As a specific example, the location identification for a file named dll32.dll stored in the directory /Windows on the C drive of a personal computer may be: 1.160.10.240 - C:/Windows/dll32.dll, where 1.160.10.240 is the IP address of the client. The fingerprint may for example be a 128-byte data value derived from the data of the file. Thus, the amount of data required for communicating the content identification and location identification for a given file is relatively low, typically less than 512 bytes.
Thus, in the example, the clients derive a fingerprint and a location for each (or some) locally stored file(s). The clients 103-107 subsequently transmit this information to the file restore management apparatus 109 over the Internet in one or more messages.
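A per-file report might look like the sketch below. The JSON encoding and field names are hypothetical, and SHA-256 (a 32-byte digest) stands in for the 128-byte fingerprint of the example; the location identification follows the "IP address - path" form given above. The point illustrated is that even a 1 MB file reduces to a message well under 512 bytes.

```python
import hashlib
import json


def location_identification(client_ip: str, path: str) -> str:
    """Location identification in the '<client IP> - <local path>' form of the example."""
    return f"{client_ip} - {path}"


def report_message(client_ip: str, path: str, data: bytes) -> bytes:
    """One hypothetical report entry (JSON is an assumed encoding) carrying the
    content identification and location identification of a single file."""
    entry = {
        "content_id": hashlib.sha256(data).hexdigest(),  # 32-byte digest as 64 hex chars
        "location_id": location_identification(client_ip, path),
    }
    return json.dumps(entry).encode("utf-8")
```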
In some embodiments, some or all of the content identifications and/or location identifications may each relate to more than one file. For example, the location identification may specify an entire file directory and the fingerprint may be determined on the basis of all files in the directory. As another example, the content identification and location identification may specify a specific application and/or operating system which may be backed up/restored as a single unit. Such approaches may allow a very efficient system wherein bandwidth requirements for exchanging back-up information may be substantially reduced.
Thus, the file restore management apparatus 109 receives content identifications and location identifications for the files stored at the clients 103-107. It will be appreciated that, for clarity, FIG. 1 illustrates only three clients 103-107, but that a practical system may comprise a large number of clients and that the file restore management apparatus 109 may receive content identifications and location identifications relating to thousands or even millions of files. However, as the content identifications and location identifications correspond to relatively short data messages, the overall loading of the network and the storage requirements at the file restore management apparatus 109 remain relatively low, and typically much lower than if the files were copied to a central storage.
The received content identifications and location identifications are fed from the network interface 201 to a grouping processor 203. The grouping processor 203 groups files in groups with similar content identifications in response to a similarity criterion. Thus, content identifications are considered to be similar if the similarity criterion results in a match indication and to be dissimilar if it does not. The term similar content identification relates to an evaluation of a similarity criterion, and it will be appreciated that any suitable criterion may be used. The term similar content identifications may thus allow any grouping of files using any grouping or similarity criterion. In particular, the content identifications may be considered similar if the corresponding files can be used for restoring other files in the group. Thus, the criterion may be selected to be indicative of a probability of one file of the group being suitable for replacing another file of the group. Such suitability may depend on the individual embodiment, e.g. on the type of file, the application(s) using the file and/or characteristics of the clients.
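A content identification covering an entire directory might be computed as below. This is a sketch under assumptions (SHA-256, recursive traversal, relative paths hashed alongside the data); the text only says the fingerprint "may be determined on the basis of all files in the directory".

```python
import hashlib
from pathlib import Path


def directory_fingerprint(directory: Path) -> str:
    """One content identification covering every file under `directory`.

    Files are visited in sorted order so the result does not depend on filesystem
    enumeration order; relative paths are hashed too, so renaming a file changes it.
    """
    h = hashlib.sha256()
    for p in sorted(directory.rglob("*")):
        if p.is_file():
            rel = p.relative_to(directory).as_posix().encode("utf-8")
            data = p.read_bytes()
            # Length-prefix each field so different splits of the same bytes cannot collide.
            for field in (rel, data):
                h.update(len(field).to_bytes(8, "big"))
                h.update(field)
    return h.hexdigest()
```

Two installations of the same application in different places then share one identification, and one message can stand for the whole directory.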
The grouping of the files may be performed without copying of the files to the file restore management apparatus 109.
Rather, the grouping processor 203 may simply group the content identifications and location identifications together such that a group of files is represented by the content identifications and location identifications of those files.
As a specific example, the grouping processor 203 may use the similarity criterion of the content identifications being identical. In such an embodiment, the grouping processor 203 may simply compare each new content identification to the previously received content identifications. If the new content identification is identical to one (or more) previously received content identifications, it is grouped together with this (these).
If the new content identification is different to all previously received content identifications, a new group is created comprising only this content identification (and thus the file corresponding to the content identification).
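The incremental procedure just described can be sketched as follows, with the similarity criterion passed in as a function so that identity is only the default. Comparing each new identification against the first identification of each group (its "representative") is an assumption for non-identity criteria. The linear scan mirrors the description; for the identity criterion a dictionary keyed by content identification would give constant-time lookup instead.

```python
from typing import Callable


class GroupingProcessor:
    """Incremental grouping: each new content identification either joins the
    first group it is similar to or starts a new group of its own."""

    def __init__(self, similar: Callable[[str, str], bool] = lambda a, b: a == b) -> None:
        self.similar = similar
        # Each group: (representative content id, list of location identifications).
        self.groups: list[tuple[str, list[str]]] = []

    def add(self, content_id: str, location_id: str) -> int:
        """Add one reported file and return the index of the group it joined."""
        for index, (representative, members) in enumerate(self.groups):
            if self.similar(representative, content_id):
                members.append(location_id)
                return index
        # No similar identification received so far: create a new group.
        self.groups.append((content_id, [location_id]))
        return len(self.groups) - 1
```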
In this way, the files stored at the individual clients 103-107 of the distributed client network 100 which may be used as back-ups for each other are grouped together without requiring any copying of the files. Specifically, if the content identifications are fingerprints of the files stored at the clients, the grouping processor may group the files having identical fingerprints together. If the fingerprint algorithm is sufficiently specific, identical fingerprints almost certainly indicate that the corresponding files are identical. Thus, the grouping processor 203 groups identical files stored at the individual clients together based on the content identification. As the files are identical, a corrupt file of one client may be restored from another file of this group.
Thus, the grouping processor 203 generates a number of groups comprising files that may be used as back-ups for each other. The files are represented by the content identifications and the location identifications, and the grouping processor 203 is coupled to a file group information storage 205 in which the grouped content identifications and location identifications are stored. The file group information storage 205 may for example comprise a hard disk on which the data is stored.
In the system of FIG. 1, a client may find that a file has been corrupted or unintentionally deleted and may seek to restore the file. In order to do this, the client 103 may transmit to the file restore management apparatus 109 a file restore request relating to the first file to be restored.
The file restore request may specifically comprise a content identification and a location identification of the file that should be restored. However, in some embodiments, the file restore request may simply comprise the location identification such as a file name and an identification of the client requesting the restore. In other embodiments, the file restore request may comprise the content identification but not any location identification.
The file restore request is received by the network interface 201 and fed to a restore group processor 207 coupled to the network interface 201.
When receiving the restore request, the restore group processor 207 identifies the group to which the first file belongs. For example, if the restore request comprises a content identification, this content identification may be used directly as a group identification. As another example, the restore group processor 207 may determine the appropriate group from a location identification only.
Specifically, the restore group processor 207 may search through the information stored in the file group information storage 205 to find a stored location identification matching the location identification of the restore request.
In other words, the restore group processor 207 may find the entry for the corrupted file in the file group information storage 205 and may identify the group as the group with which this entry is associated.
The restore group processor 207 is coupled to a file identification processor 209 which is further coupled to the file group information storage 205. The file identification processor 209 is arranged to determine a file location of a second file which belongs to the same group as the first file but which is stored at a second client.
As a specific example, a first file may be corrupted at a first client 103 which may therefore send a restore request to the file restore management apparatus 109. The restore request may for example comprise only the fingerprint of the first file. This fingerprint may be fed to the restore group processor 207 which searches through the file group information storage 205 to find the group comprising files having this fingerprint. The file identification processor 209 may then proceed to select another file from this group, i.e. it may select a file having a location identification corresponding to a different client 105. In this way, a file identical to the corrupted file at the first client 103 may be identified at a different client 105.
The file identification processor 209 retrieves the location identification of this second file from the file group information storage 205. The location identification thus identifies a file at a second client 105 which is (almost certainly) identical to the first file that was corrupted at the first client 103.
The file identification processor 209 is coupled to a restore processor 211 which receives the location identification and in response instigates a copy of the second file from the second client 105 to the first client 103.
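The group lookup for a restore request can be sketched as below, covering both cases described: a request carrying a content identification (used directly as the group identification) and a request carrying only a location identification (found by searching the stored entries). The `RestoreRequest` layout is hypothetical, and a plain dictionary stands in for the file group information storage 205.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestoreRequest:
    """Hypothetical request layout: either field may be absent."""
    content_id: Optional[str] = None
    location_id: Optional[str] = None


def find_group(request: RestoreRequest, groups: dict[str, list[str]]) -> Optional[str]:
    """Return the key (content identification) of the group holding the file to restore.

    `groups` maps content identification -> location identifications.
    """
    if request.content_id is not None and request.content_id in groups:
        return request.content_id  # the content id doubles as the group id
    if request.location_id is not None:
        # Search for the stored entry of the corrupted file itself.
        for content_id, locations in groups.items():
            if request.location_id in locations:
                return content_id
    return None  # unknown file: no group to restore from
```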
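Selecting the second file amounts to picking a group member whose location identification names a client other than the requesting one. This sketch assumes the "<client> - <path>" location form from the earlier example. Returning the first match is a simplification; a real system might prefer clients that are currently reachable.

```python
from typing import Optional


def client_of(location_id: str) -> str:
    """Client part of a location identification in the '<client> - <path>' form."""
    return location_id.split(" - ", 1)[0]


def select_second_file(group: list[str], first_location: str) -> Optional[str]:
    """Pick a member of the group stored at a client other than the requesting one."""
    requesting_client = client_of(first_location)
    for location_id in group:
        if client_of(location_id) != requesting_client:
            return location_id
    return None  # no copy at another client: the file cannot be restored from this group
```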
In some embodiments, the restore processor 211 may simply instigate the restore operation by transmitting the location identification to the first client 103. Upon receiving this information, the first client 103 may contact the second client 105 in order to retrieve a copy of the file. This may provide a very efficient system as the file is copied directly from the back-up source to the client.
In other embodiments, the restore processor 211 may be arranged to retrieve the second file from the second client 105. In particular, the restore processor 211 may transmit a message to the second client 105 requesting that the file identified by the path and directory of the location identification of the second file be transmitted to the file restore management apparatus 109. Upon receiving the second file, the restore processor 211 may transmit the retrieved file to the first client 103. Although such embodiments may introduce a higher loading of the network and a more complex file restore management apparatus 109, they may also allow the individual clients to treat the file restore management apparatus 109 as a centralised back-up source. Specifically, a simplified restore operation for the individual client may be obtained as the client simply transmits a restore request and in response receives a file copy from the file restore management apparatus 109. Thus, in such embodiments the file restore management apparatus 109 may appear to the client as a conventional centralised network back-up node.
This may reduce complexity for clients and provide improved backwards compatibility in many embodiments.
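The two restore modes described above can be contrasted in a minimal Python sketch. It assumes, purely for illustration, that a location identification is a (client identification, path) pair and that `fetch` stands in for whatever transport retrieves a file from a client; `Location`, `RestoreReply`, `instigate_copy` and `client_restore` are hypothetical names, not part of the described apparatus.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Location:
    client_id: str  # client identification (cf. claim 14)
    path: str       # path and file name on that client (cf. claim 13)

@dataclass
class RestoreReply:
    # Exactly one of these is set, depending on the restore mode.
    location: Optional[Location] = None
    content: Optional[bytes] = None

def instigate_copy(source: Location, relay: bool,
                   fetch: Callable[[Location], bytes]) -> RestoreReply:
    """Apparatus side: answer a restore request once a source file is known."""
    if relay:
        # Relay mode: the apparatus retrieves the second file itself and
        # forwards it, acting like a conventional centralised back-up node.
        return RestoreReply(content=fetch(source))
    # Direct mode: only the location is sent; the first client then
    # contacts the second client and copies the file itself.
    return RestoreReply(location=source)

def client_restore(reply: RestoreReply,
                   fetch: Callable[[Location], bytes]) -> bytes:
    """First-client side: obtain the file bytes from either kind of reply."""
    if reply.content is not None:
        return reply.content
    return fetch(reply.location)

# Usage: a dict stands in for the files held by the second client.
peer_files = {Location("client105", "/docs/report.pdf"): b"%PDF-1.4 ..."}
source = Location("client105", "/docs/report.pdf")
for relay in (False, True):
    reply = instigate_copy(source, relay, peer_files.__getitem__)
    print(relay, client_restore(reply, peer_files.__getitem__))
```

In both modes the first client ends up with the same bytes; the choice only shifts where the transfer, and hence the network load, is handled.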
The system of FIGS. 1 and 2 may thus provide a back-up and file restore system which exploits the fact that identical or sufficiently similar files are frequently stored independently at several clients of a distributed client network. The system allows these files to serve as back-ups for each other, thereby substantially reducing the storage and bandwidth requirements of conventional centralised back-up storage systems.
Specifically, file restore is enabled in a distributed client network without any files being copied for this purpose. Hence, file restore may even be performed in systems comprising no dedicated back-up storage, which may reduce the probability of data loss in such systems.
In some embodiments, the information exchange between the clients 103-107 and the file restore management apparatus 109 is synchronised for all clients. For example, at regular intervals, such as daily, the file restore management apparatus 109 may transmit an information request to the clients 103-107, which in response may determine fingerprints and location information for all files to be included in the back-up management by the file restore management apparatus 109. When this is completed, the clients 103-107 may upload the information to the file restore management apparatus 109. Thus, in such embodiments, the information stored at the file restore management apparatus 109 may be updated at specific times. Such embodiments may make more efficient use of the available communication resource, as updates can be scheduled for off-peak times. Furthermore, they may reduce the complexity of the file restore management apparatus 109 and the clients 103-107.
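As a rough illustration of what each client might compute when answering a periodic information request, the sketch below walks a directory tree and reports one record per file. SHA-256 is an assumed choice of fingerprint (the text does not name an algorithm), and `fingerprint`, `build_report` and the record field names are hypothetical.

```python
import hashlib
import os
from pathlib import Path

CHUNK = 1 << 20  # read files in 1 MiB chunks to bound memory use

def fingerprint(path: Path) -> str:
    """SHA-256 of the file bytes, used here as the content identification."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()

def build_report(client_id: str, root: str) -> list[dict]:
    """Collect one (content id, location id) record per file under root."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            p = Path(dirpath) / name
            records.append({
                "fingerprint": fingerprint(p),  # content identification
                "client": client_id,            # location: which client
                "path": str(p),                 # location: where on it
            })
    return records
```

The resulting list is what the client would upload in one batch, which is what allows such uploads to be scheduled for off-peak times.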
In other embodiments, the information may be updated more dynamically. For example, the clients 103-107 may monitor file operations and detect any modifications to files which use the file restore management apparatus 109 for back-ups.
Such modifications may, for example, be an addition, change or deletion of data in a file, or the deletion of an existing file or the creation of a new file. The clients 103-107 may furthermore comprise means for transmitting a modification indication in response to detecting such a modification. The modification indication may specifically comprise a new fingerprint for an existing file, a deletion indication for a specific file and/or a new content identification and location identification for a new file.
When the file restore management apparatus 109 receives the modification indication it updates the information of the file group information storage 205 accordingly.
Specifically, the file restore management apparatus 109 may remove the file associated with the modification indication from the group to which it previously belonged. For example, if a file is deleted at a client, the client may transmit a deletion indication to the file restore management apparatus 109 and the content identification and location identification for that file may be deleted from the file group information storage 205.
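The update of the group information on receipt of a modification indication could be handled along the following lines. This is a sketch assuming an identical-content similarity criterion and in-memory storage; `FileGroupStore` and the `"create"`/`"change"`/`"delete"` kinds are hypothetical names.

```python
from collections import defaultdict
from typing import Optional

class FileGroupStore:
    """Hypothetical in-memory stand-in for the file group information storage.

    Groups are keyed by fingerprint; a reverse index maps each location to
    its current fingerprint so a changed or deleted file can be removed from
    the group it previously belonged to.
    """

    def __init__(self) -> None:
        self.groups: dict[str, set[tuple[str, str]]] = defaultdict(set)
        self.where: dict[tuple[str, str], str] = {}

    def _remove(self, loc: tuple[str, str]) -> None:
        old = self.where.pop(loc, None)
        if old is not None:
            self.groups[old].discard(loc)
            if not self.groups[old]:
                del self.groups[old]  # drop empty groups

    def apply(self, client: str, path: str, kind: str,
              fingerprint: Optional[str] = None) -> None:
        loc = (client, path)
        if kind == "delete":
            self._remove(loc)
        elif kind in ("create", "change"):
            if fingerprint is None:
                raise ValueError("create/change requires a fingerprint")
            self._remove(loc)  # leave the previous group first
            self.groups[fingerprint].add(loc)
            self.where[loc] = fingerprint
        else:
            raise ValueError(f"unknown modification kind: {kind}")
```

Treating a change as "remove, then re-add under the new fingerprint" means a modified file never stays in a group whose other members no longer match it.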
In the above-described example, back-up was provided without any additional copies being made for the purpose of back-up.
However, in some embodiments, file copies may be generated to provide increased protection against data loss.
Specifically, the file restore management apparatus 109 may comprise a local storage for storing back-up copies of files. For example, the file restore management apparatus 109 may identify groups containing only a single entry, i.e. files for which no copy is available for restoring. The file restore management apparatus 109 may then comprise functionality for copying such a file from the client to the local storage. The location identification of the file copy may then be included in the group, thereby ensuring that the original file can be restored using the copy stored in the local storage. This may substantially increase the protection against data loss, as it may (in this example) be ensured that every file stored at a client has at least one back-up copy.
In some embodiments, the file copies generated for back-up purposes may also be stored at other clients. For example, the file restore management apparatus 109 may be informed of unused storage capacity at the clients and may copy the files to clients having spare storage capacity.
It will be appreciated that, in other embodiments, the approach may be used to ensure that at least a given number of files is available for restore operations within each group. Furthermore, this number may vary between groups. For example, files may have associated priority or importance ratings, and the minimum number of file copies may be determined in response thereto.
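A minimum-copy policy of this kind might be planned as below. The priority levels, the `MIN_COPIES` table and the greedy choice of the target with the most spare capacity are all assumptions made for illustration; targets may be clients with spare storage or a hypothetical `"local"` entry representing the apparatus's own storage.

```python
from dataclasses import dataclass, field

# Hypothetical policy: priority rating -> minimum number of group members
# (the original file plus any copies) that should exist.
MIN_COPIES = {"low": 1, "normal": 2, "high": 3}

@dataclass
class Group:
    fingerprint: str
    size: int                                      # file size in bytes
    priority: str = "normal"
    locations: list = field(default_factory=list)  # ids of holders of a copy

def plan_copies(groups: list, spare: dict[str, int]) -> list[tuple[str, str]]:
    """Return (fingerprint, target) copy jobs for under-replicated groups.

    spare maps each candidate target to its unused capacity in bytes and is
    decremented as jobs are planned.
    """
    jobs = []
    for g in groups:
        missing = MIN_COPIES[g.priority] - len(g.locations)
        # Try targets with the most free space first.
        for target in sorted(spare, key=spare.get, reverse=True):
            if missing <= 0:
                break
            if target in g.locations or spare[target] < g.size:
                continue  # already holds the file, or no room for it
            jobs.append((g.fingerprint, target))
            spare[target] -= g.size
            missing -= 1
    return jobs
```

Each planned job would then be followed by adding the new copy's location identification to the group, as described for the local-storage case.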
It will be appreciated that the system provides high flexibility in adjusting the trade-off between storage requirements and protection against data loss to the preferences and requirements of the individual application.
It will be appreciated that the grouping processor 203 may use any similarity criterion or grouping criterion for grouping the files. Specifically, the grouping processor 203 may group identical files together as previously described.
However, in other embodiments, the grouping processor 203 may group files which are not identical but which are still suitable for backing each other up. For example, content items, such as audiovisual material, may be encoded using different content encodings, resulting in different files.
However, files corresponding to the same content item may still be grouped together, as one file may be used to restore another. This may result in a corrupt file being replaced by, e.g., another file relating to the same content item but encoded at a different quality.
For example, a specific song encoded using an MP3 encoding algorithm may be restored by a file corresponding to the same song but encoded with a different data rate.
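Because the similarity criterion is interchangeable, one way to sketch the grouping processor is as a function parameterised by a key. The sketch contrasts an identical-data key with a looser content-item key; the `artist`/`title` metadata fields and all function names are hypothetical, and a real system would need some reliable way of deriving such fields.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable

def group_files(records: list, key: Callable[[dict], Hashable]) -> dict:
    """Group file records from all clients under a similarity key."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append((rec["client"], rec["path"]))
    return dict(groups)

def identical_data(rec: dict) -> Hashable:
    # Identical file data, detected via a byte-level fingerprint.
    return rec["fingerprint"]

def same_content_item(rec: dict) -> Hashable:
    # Looser criterion: same song regardless of encoding or data rate.
    # The "artist"/"title" fields are hypothetical metadata.
    return (rec["artist"].casefold(), rec["title"].casefold())

records = [
    {"client": "c103", "path": "/m/song_128k.mp3", "fingerprint": "a1",
     "artist": "X", "title": "Song"},
    {"client": "c105", "path": "/m/song_320k.mp3", "fingerprint": "b2",
     "artist": "x", "title": "song"},
    {"client": "c107", "path": "/m/other.mp3", "fingerprint": "c3",
     "artist": "Y", "title": "Other"},
]
```

With `identical_data` the two encodings of the song land in separate single-entry groups; with `same_content_item` they form one group, so either can restore the other.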
FIG. 3 illustrates a method 300 of file restore management in accordance with some embodiments of the invention. The method may be performed by a file restore management apparatus 109 in accordance with FIG. 2 and will be described with reference to it.
In step 301, the network interface 201 receives content identifications and location identifications of files stored at a plurality of clients 103-107. Step 301 is followed by step 303, wherein the grouping processor 203 groups files with similar content identifications into groups in response to a similarity criterion.
In step 305 the network interface 201 receives a file restore request for a first file from a first client 103.
Steps 301 and 303 may in some embodiments be continuously iterated until a file restore request is received. In some embodiments, the method may be paused in step 305 until a file restore request is received.
Step 305 is followed by step 307 wherein the restore group processor 207 identifies a first group associated with the first file.
Step 307 is followed by step 309, wherein the file identification processor 209 determines a file location of a second file of the first group stored at a second client.
Step 309 is followed by step 311 wherein the restore processor 211 instigates a copy of the second file from the second client to the first client.
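Steps 301 to 311 can be strung together in a compact Python sketch. It assumes identical-content grouping, in-memory state, and the direct restore mode (the apparatus returns the source location rather than relaying the file); `FileRestoreManager` and its methods are hypothetical names. Note that the fingerprint used in step 307 is the one last reported for the file, i.e. from before the corruption.

```python
from collections import defaultdict

class FileRestoreManager:
    """Hypothetical sketch of method 300 with identical-content grouping."""

    def __init__(self) -> None:
        self.groups = defaultdict(set)  # fingerprint -> {(client, path)}
        self.index = {}                 # (client, path) -> fingerprint

    def receive_report(self, records: list) -> None:
        # Steps 301 and 303: receive content/location identifications
        # and group files sharing a fingerprint.
        for rec in records:
            loc = (rec["client"], rec["path"])
            self.groups[rec["fingerprint"]].add(loc)
            self.index[loc] = rec["fingerprint"]

    def handle_restore_request(self, client: str, path: str):
        # Step 305: restore request for a first file from a first client.
        fp = self.index.get((client, path))
        if fp is None:
            return None  # file was never reported, so no group exists
        # Step 307: identify the first group (last-known-good fingerprint).
        group = self.groups[fp]
        # Step 309: find a second file of that group on a second client.
        for other_client, other_path in sorted(group):
            if other_client != client:
                # Step 311: instigate the copy by returning the source
                # location for the first client to fetch directly.
                return (other_client, other_path)
        return None  # no copy held by any other client
```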
It will be appreciated that the file restore system may be implemented as a stand-alone system or may be integrated with other systems or applications. For example, the system may be implemented as a plug-in to other applications, such as peer-to-peer file sharing applications operating over the Internet.
It will be appreciated that, for clarity, the above description has described embodiments of the invention with reference to different functional units and processors.
However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controllers. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.
The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.
Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus references to "a", "an", "first", "second" etc. do not preclude a plurality.
Claims (20)
1. A file restore management apparatus for a distributed client network; the file restore management apparatus comprising: receiving means for receiving content identifications and location identifications of files stored at a plurality of clients; grouping means for grouping files in groups with similar content identifications in response to a similarity criterion including those files located on different clients; means for receiving a file restore request for a first file from a first client; means for identifying a first group associated with the first file; means for determining a file location of a second file of the first group stored at a second client; and restore means for instigating a copy of the second file from the second client to the first client.
2. The apparatus of claim 1 wherein the restore means is arranged to transmit a file location identification of the second file to the first client, such that the first client can directly retrieve the file from the second client.
3. The apparatus of claim 1 or 2 wherein the restore means is arranged to retrieve the second file from the second client and to transmit the second file to the first client.
CE1 3535EP
4. The apparatus of any previous claim further comprising copying means for copying a file from a client of the plurality of clients to a file storage and wherein the grouping means is operable to include the file copy in a group.
5. The apparatus of claim 1 wherein the entire group of files with a common identifier is restored as a unit.
6. The apparatus of claim 4 wherein the file storage is a local file storage of the apparatus.
7. The apparatus of any of the claims 4 to 6 wherein the copying means is arranged to copy a file in response to a determination that a number of files in a group is below a threshold.
8. The apparatus of any previous claim wherein the grouping means is arranged to remove a file from a group in response to receiving a deletion indication from a client indicative of the file being deleted.
9. The apparatus of any previous claim wherein the grouping means is arranged to remove a file from a group in response to receiving a change indication from a client indicative of the file being changed.
10. The apparatus of any previous claim wherein the similarity criterion is that content identifications correspond to identical file data.
11. The apparatus of any previous claim wherein the similarity criterion is based upon the probability of one file of the group being suitable for replacing another file of the group.
12. The apparatus of any previous claim wherein at least some of the content identifications comprise a file fingerprint.
13. The apparatus of any previous claim wherein at least some of the location identifications comprise a file name.
14. The apparatus of any previous claim wherein at least some of the location identifications comprise a client identification.
15. The apparatus of any previous claim wherein the distributed client network comprises the Internet.
16. A distributed client network comprising a file restore management apparatus in accordance with any previous claim.
17. The distributed client network of claim 16 comprising at least a first client comprising: means for transmitting a file restore request for a first file to the file restore management apparatus; means for receiving a file location identification of a second file of a second client from the file restore management apparatus; means for retrieving the second file from the second client.
18. The distributed client network of claim 16 or 17 comprising at least a client comprising: means for transmitting a modification indication to the file restore management apparatus in response to detecting a modification of a file stored at the client.
19. A method of file restore management in a distributed client network; the method comprising: receiving content identifications and location identifications of files stored at a plurality of clients; grouping files in groups with similar content identifications in response to a similarity criterion including those files located on different clients; receiving a file restore request for a first file from a first client; identifying a first group associated with the first file; determining a file location of a second file of the first group stored at a second client; and instigating a copy of the second file from the second client to the first client.
20. A computer program enabling the carrying out of a method according to claim 19.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0502425A GB2422927B (en) | 2005-02-07 | 2005-02-07 | A file restore management apparatus and a method therefor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0502425A GB2422927B (en) | 2005-02-07 | 2005-02-07 | A file restore management apparatus and a method therefor |
Publications (4)
Publication Number | Publication Date |
---|---|
GB0502425D0 (en) | 2005-03-16 |
GB2422927A (en) | 2006-08-09 |
GB2422927A9 (en) | 2006-09-05 |
GB2422927B (en) | 2007-02-14 |
Family
ID=34355858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0502425A Active GB2422927B (en) | 2005-02-07 | 2005-02-07 | A file restore management apparatus and a method therefor |
Country Status (1)
Country | Link |
---|---|
GB (1) | GB2422927B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5649196A (en) * | 1993-07-01 | 1997-07-15 | Legent Corporation | System and method for distributed storage management on networked computer systems using binary object identifiers |
US20040003272A1 (en) * | 2002-06-28 | 2004-01-01 | International Business Machines Corporation | Distributed autonomic backup |
- 2005-02-07 GB GB0502425A patent/GB2422927B/en active Active
Also Published As
Publication number | Publication date |
---|---|
GB0502425D0 (en) | 2005-03-16 |
GB2422927A9 (en) | 2006-09-05 |
GB2422927B (en) | 2007-02-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1049989B1 (en) | Access to content addressable data over a network | |
RU2531869C2 (en) | Differential recoveries of file and system from peer-to-peer nodes of network and cloud | |
US7487551B2 (en) | Access to content addressable data over a network | |
CN102460398B (en) | Source classification for performing deduplication in a backup operation | |
US9916198B2 (en) | Erasure coding and replication in storage clusters | |
US8001096B2 (en) | Computer file system using content-dependent file identifiers | |
US8433732B2 (en) | System and method for storing data and accessing stored data | |
US9251160B1 (en) | Data transfer between dissimilar deduplication systems | |
EP2416236B1 (en) | Data restore system and method | |
US9354976B2 (en) | Locating previous versions of an object in a storage cluster | |
US20060218435A1 (en) | Method and system for a consumer oriented backup | |
US20130212070A1 (en) | Management apparatus and management method for hierarchical storage system | |
US20040236801A1 (en) | Systems and methods for distributed content storage and management | |
CN109144406A (en) | Metadata storing method, system and storage medium in distributed memory system | |
CN109947730B (en) | Metadata recovery method, device, distributed file system and readable storage medium | |
US9020902B1 (en) | Reducing head and tail duplication in stored data | |
US20120005162A1 (en) | Managing Copies of Data Structures in File Systems | |
GB2422927A (en) | File restore management | |
US8024354B2 (en) | System and method for managing data using a hierarchical metadata management system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
732E | Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977) |
Free format text: REGISTERED BETWEEN 20110127 AND 20110202 |
|
732E | Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977) |
Free format text: REGISTERED BETWEEN 20170831 AND 20170906 |